Hi,
I'm reasonably new to UIMA and am trying to get it to do what I want. I'm
attempting to perform entity extraction on three languages. A first analysis
engine (AE) detects the language of the document, and each later AE starts
with an if statement that skips the document when its language is not the one
that AE handles (English, for example).
The next AE then tokenises the document (space tokeniser), the AE after that
extracts entities, and a CAS consumer then writes the result to disk.
However, I don't want to write ALL the space-tokenised annotations to disk
as well, only the extracted entities, as the files get very large very
quickly! Once a token has been processed I want it to be removed from the
CAS/JCas, but token.removeFromIndexes() (I'm using Java) just throws a
ConcurrentModificationException.
How do I get around this?
This is my code:
    AnnotationIndex<Annotation> token = aJCas.getAnnotationIndex(Token.type);
    FSIterator<Annotation> timeIter = token.iterator();
    while (timeIter.hasNext()) {
        Token currentToken = (Token) timeIter.next();
        Token previousToken = null;
        if (englishNamesAsTrie.search(currentToken.getToken().toLowerCase())) {
            // Mark the matching token as a person name.
            PersonName annotation = new PersonName(aJCas);
            annotation.setBegin(currentToken.getBegin());
            annotation.setEnd(currentToken.getEnd());
            annotation.addToIndexes(aJCas);
            // Removing the token while timeIter is still in use is what fails.
            currentToken.removeFromIndexes(aJCas);
        }
    }
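
One possible way around this, sketched in plain Java rather than against the
UIMA API: do not remove anything while the iterator is still in use. Collect
the tokens in a separate list during the loop, then remove them in a second
pass once the iteration has finished. In the sketch below, Span,
DeferredRemovalSketch and extractAndClear are hypothetical stand-ins, an
ArrayList plays the role of the token index, and a Set replaces
englishNamesAsTrie. It also assumes every token should go once it has been
looked at, not only the ones that matched a name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Plain-Java stand-in for the annotator loop; nothing in here is UIMA API.
class DeferredRemovalSketch {

    // Hypothetical stand-in for a Token or PersonName annotation.
    static final class Span {
        final int begin;
        final int end;
        final String text;

        Span(int begin, int end, String text) {
            this.begin = begin;
            this.end = end;
            this.text = text;
        }
    }

    /**
     * Pass 1 only reads tokenIndex and remembers which tokens were processed.
     * Pass 2 removes them after the iteration over tokenIndex has ended.
     * Returns the spans whose lower-cased text is in names.
     */
    static List<Span> extractAndClear(List<Span> tokenIndex, Set<String> names) {
        List<Span> entities = new ArrayList<>();
        List<Span> processed = new ArrayList<>();

        for (Span token : tokenIndex) {
            if (names.contains(token.text.toLowerCase(Locale.ROOT))) {
                entities.add(new Span(token.begin, token.end, token.text));
            }
            processed.add(token); // remember it, but do not remove it yet
        }

        // No iterator over tokenIndex is live here, so removal is safe.
        // Assumption: in the annotator this would be removeFromIndexes() per token.
        for (Span token : processed) {
            tokenIndex.remove(token);
        }
        return entities;
    }

    public static void main(String[] args) {
        List<Span> tokens = new ArrayList<>(Arrays.asList(
                new Span(0, 4, "John"),
                new Span(5, 9, "went"),
                new Span(10, 14, "home")));
        Set<String> names = new HashSet<>(Arrays.asList("john"));

        List<Span> found = extractAndClear(tokens, names);
        // prints "1 entity, 0 tokens left"
        System.out.println(found.size() + " entity, " + tokens.size() + " tokens left");
    }
}
```

In the annotator itself the same shape should carry over: add each Token to a
List<Token> inside the while loop, and call removeFromIndexes() on every
element of that list only after the loop over timeIter has ended.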