Hello,
I am trying to get rid of duplicate annotations in the FSIndex. I thought a
simple way to do this would be to push them into a Java Set, which does not
allow duplicates. This is very (very) standard Java:
ArrayList al = new ArrayList();
// add elements to al, including duplicates
HashSet hs = new HashSet();
hs.addAll(al);
al.clear();
al.addAll(hs);
This list will contain no duplicates.
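
For reference, here is a self-contained version of that idiom as a minimal sketch. The class and method names (DedupeExample, withoutDuplicates) are made up for illustration, and it uses a LinkedHashSet instead of a HashSet so that the order of first occurrences is kept. Note that this only removes elements that are equal according to equals() and hashCode().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical helper class, for illustration only.
class DedupeExample {
    // Returns a new list without duplicates; first occurrences keep their order.
    // "Duplicate" here means equal according to equals()/hashCode().
    static <T> List<T> withoutDuplicates(List<T> items) {
        return new ArrayList<T>(new LinkedHashSet<T>(items));
    }

    public static void main(String[] args) {
        List<String> al = new ArrayList<String>(Arrays.asList("a", "b", "a", "c", "b"));
        System.out.println(withoutDuplicates(al)); // prints [a, b, c]
    }
}
```

This works for Strings because String overrides equals() and hashCode() to compare contents.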
However, I cannot get this to work in my UIMA code:
AnnotationIndex<Annotation> idx = aJCas.getAnnotationIndex();
System.out.println("Index size is: "+idx.size());
ArrayList<Annotation> tempList = new ArrayList<Annotation>(idx.size());
FSIterator<Annotation> it = idx.iterator();
// load the Annotations into a temporary list; this includes duplicates
while(it.hasNext())
{
tempList.add(it.next());
}
Iterator tempIt = tempList.iterator();
// remove all Annotations from the index. this works fine
while(tempIt.hasNext()){
((Annotation) tempIt.next()).removeFromIndexes(aJCas);
}
// push tempList into HashSet
HashSet<Annotation> hs = new HashSet<Annotation>();
hs.addAll(tempList);
// this should not allow duplicates
// size should be smaller than the size of the FSIndex by the number of
// duplicates. It is not. This is the main problem.
System.out.println("HS length: "+hs.size());
tempList.clear();
tempList.addAll(hs);
System.out.println("templist length: "+tempList.size());
// this should now be the clean list
Iterator<Annotation> it2 = tempList.iterator();
while(it2.hasNext()){
it2.next().addToIndexes(aJCas);
}
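
One possible explanation for the unchanged HashSet size, sketched here without UIMA: a HashSet only drops elements that are equal according to equals() and hashCode(). The sketch assumes that two annotations covering the same span are separate objects that do not compare as equal; I have not confirmed this for Annotation. The Span class below is a hypothetical stand-in, and dedupeBySpan is an illustrative helper that keys on type, begin and end instead.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for an annotation. It does NOT override
// equals()/hashCode(), so equality is object identity.
class Span {
    final String type;
    final int begin;
    final int end;

    Span(String type, int begin, int end) {
        this.type = type;
        this.begin = begin;
        this.end = end;
    }
}

// Hypothetical helper class, for illustration only.
class SpanDedupe {
    // Keeps the first Span seen for each (type, begin, end) combination.
    static List<Span> dedupeBySpan(List<Span> spans) {
        Map<String, Span> firstByKey = new LinkedHashMap<String, Span>();
        for (Span s : spans) {
            String key = s.type + "|" + s.begin + "|" + s.end;
            if (!firstByKey.containsKey(key)) {
                firstByKey.put(key, s);
            }
        }
        return new ArrayList<Span>(firstByKey.values());
    }

    public static void main(String[] args) {
        List<Span> spans = new ArrayList<Span>();
        spans.add(new Span("Token", 0, 5));
        spans.add(new Span("Token", 0, 5)); // same span, but a separate object
        spans.add(new Span("Token", 6, 9));

        // Identity-based equality: the HashSet keeps all three objects.
        System.out.println(new HashSet<Span>(spans).size()); // prints 3
        // Keyed on type and offsets: the repeated span is dropped.
        System.out.println(dedupeBySpan(spans).size());      // prints 2
    }
}
```

If the assumption holds, the same idea could be applied to the code above by building the key from getType().getName(), getBegin() and getEnd() of each Annotation before adding the survivors back to the indexes.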