Eclipse pointed out a bug in my code; the fix is inline below.
On 11/18/2014 9:37 AM, Marshall Schor wrote:
> Hi Kameron,
>
> Based on this code snippet, the two "cat" annotations you create are "different"
> under the HashSet definition, because they correspond to two distinct UIMA
> Annotations.  You could, for instance, update one of them and not the other;
> that is the sense in which they are distinct.  In the case below, the two
> "cat" annotations would have different begin and end offsets.
>
> I'm guessing that your goal was to have one of the two cat annotations
> dropped.
>
> You could do that by using your hash set approach, if you defined equal to
> mean that just the covered text of the annotation was equal.
>
> Here's one way to do this:  Create a "cover object" for your annotations that
> contains a reference to the annotation and defines equals and hashCode (you
> have to define these together).  The easy way to do this is using Eclipse:
> define a new class, e.g.
>
> public class MyAnnotationWithSpecialEquals {
>   final public Annotation annotation;   // the covered annotation
>  
>   public MyAnnotationWithSpecialEquals(Annotation annotation) {
>     this.annotation = annotation;
>   }
> }
>
> and then use Eclipse to define equals and hashCode:  go to Menu -> Source ->
> Generate hashCode() and equals()
> and have it generate them based on just "annotation".  This will not (yet) be
> correct - it should add two methods like this:
>
>   @Override
>   public int hashCode() {
>     final int prime = 31;
>     int result = 1;
>     result = prime * result + ((annotation == null) ? 0 : 
> annotation.hashCode());
>     return result;
>   }
>
>   @Override
>   public boolean equals(Object obj) {
>     if (this == obj)
>       return true;
>     if (obj == null)
>       return false;
>     if (getClass() != obj.getClass())
>       return false;
>     MyAnnotationWithSpecialEquals other = (MyAnnotationWithSpecialEquals) obj;
            // buggy lines
>     if (annotation == null) {
>       if (other.annotation != null)
>         return false;
            //  replace above with
      if (annotation == null && other.annotation != null)
        return false;
>     } else if (!annotation.equals(other.annotation))
>       return false;
>     return true;
>   }
>
> Now, to get these to be the definitions you want, which depend only on the
> covered text, modify these as follows:
>
> First, for hashCode, use only the string covered text:
>
>   @Override
>   public int hashCode() {
>     final int prime = 31;
>     int result = 1;
>     String coveredText = (annotation == null) ? null : annotation.getCoveredText();
>     result = prime * result + ((coveredText == null) ? 0 : coveredText.hashCode());
>     return result;
>   }
>
> and for equals: replace test for annotation being "equal" with
> annotation.getCoveredText() being "equal",
> with some additional edge case testing in case of nulls:
>
> @Override
>   public boolean equals(Object obj) {
>     if (this == obj)
>       return true;
>     if (obj == null)
>       return false;
>     if (getClass() != obj.getClass())
>       return false;
>     MyAnnotationWithSpecialEquals other = (MyAnnotationWithSpecialEquals) obj;
>     if (annotation == null || other.annotation == null)
>       return annotation == other.annotation;  // equal only if both are null
>     String coveredText = annotation.getCoveredText();
>     if (coveredText == null)
>       return other.annotation.getCoveredText() == null;  // handle special case if covered text is null
>     // coveredText is not null
>     return coveredText.equals(other.annotation.getCoveredText());
>   }
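
To show how such a cover object behaves in a set, here is a minimal, self-contained sketch. It does not use UIMA: Span is a hypothetical stand-in for an annotation that only knows its offsets into a text, and ByCoveredText is a hypothetical name for the cover class. LinkedHashSet is used instead of HashSet only so that the first occurrence of each covered text is kept in input order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

class CoverObjectDemo {

  // Hypothetical stand-in for a UIMA Annotation: begin/end offsets into a text.
  static final class Span {
    final String text;
    final int begin;
    final int end;

    Span(String text, int begin, int end) {
      this.text = text;
      this.begin = begin;
      this.end = end;
    }

    String getCoveredText() {
      return text.substring(begin, end);
    }
  }

  // Cover object: equals and hashCode depend only on the covered text.
  static final class ByCoveredText {
    final Span span;

    ByCoveredText(Span span) {
      this.span = span;
    }

    @Override
    public int hashCode() {
      return Objects.hashCode(span == null ? null : span.getCoveredText());
    }

    @Override
    public boolean equals(Object obj) {
      if (this == obj) return true;
      if (!(obj instanceof ByCoveredText)) return false;
      ByCoveredText other = (ByCoveredText) obj;
      if (span == null || other.span == null) return span == other.span;
      return Objects.equals(span.getCoveredText(), other.span.getCoveredText());
    }
  }

  // Keeps the first span seen for each distinct covered text, in input order.
  static List<Span> dedupe(List<Span> spans) {
    Set<ByCoveredText> seen = new LinkedHashSet<>();
    for (Span s : spans) {
      seen.add(new ByCoveredText(s)); // add() leaves an existing equal element in place
    }
    List<Span> result = new ArrayList<>();
    for (ByCoveredText c : seen) {
      result.add(c.span);
    }
    return result;
  }

  public static void main(String[] args) {
    String doc = "bird, cat, bush, cat";
    List<Span> spans = Arrays.asList(
        new Span(doc, 0, 4), new Span(doc, 6, 9),
        new Span(doc, 11, 15), new Span(doc, 17, 20));
    for (Span s : dedupe(spans)) {
      System.out.println(s.getCoveredText() + " @ " + s.begin);
    }
    // prints: bird @ 0, cat @ 6, bush @ 11 (one per line)
  }
}
```

In real UIMA code the same pattern would wrap each Annotation in the cover class before adding it to the set, and then add only the surviving annotations back to the indexes.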
>
> HTH.  -Marshall
>
>
> On 11/17/2014 4:49 PM, Kameron Cole wrote:
>> Input text:
>>
>> ------------------------------
>>
>> bird, cat, bush, cat
>>
>> ----------------------------
>>
>> Create the Annotations:
>>
>> -------------------------------
>> docText = aJCas.getDocumentText();
>>
>> int index = docText.indexOf("cat");
>> while (index >= 0) {
>>   int begin = index;
>>   int end = begin + 3;
>>   Animal animal = new Animal(aJCas);
>>   animal.setBegin(begin);
>>   animal.setEnd(end);
>>   animal.addToIndexes();
>>
>>   index = docText.indexOf("cat", index + 1);
>> }
>>
>> index = docText.indexOf("bird");
>> while (index >= 0) {
>>   int begin = index;
>>   int end = begin + 4;
>>   Animal animal = new Animal(aJCas);
>>   animal.setBegin(begin);
>>   animal.setEnd(end);
>>   animal.addToIndexes();
>>
>>   index = docText.indexOf("bird", index + 1);
>> }
>>
>> index = docText.indexOf("bush");
>> while (index >= 0) {
>>   int begin = index;
>>   int end = begin + 4;
>>   Vegetable vegetable = new Vegetable(aJCas);
>>   vegetable.setBegin(begin);
>>   vegetable.setEnd(end);
>>   vegetable.addToIndexes();
>>
>>   index = docText.indexOf("bush", index + 1);
>> }
>> ------------------------------------------------------
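
The three loops above differ only in the search term and its length, so the offset search could be factored into one helper. A minimal sketch in plain Java, without UIMA; findAll and OffsetFinder are hypothetical names. It returns only the begin offsets; the caller would create the annotation for each one with end = begin + term.length().

```java
import java.util.ArrayList;
import java.util.List;

class OffsetFinder {

  // Returns the begin offset of every occurrence of term in text, in order.
  static List<Integer> findAll(String text, String term) {
    List<Integer> begins = new ArrayList<>();
    if (term.isEmpty()) {
      return begins; // an empty term would otherwise match forever
    }
    int index = text.indexOf(term);
    while (index >= 0) {
      begins.add(index);
      index = text.indexOf(term, index + 1);
    }
    return begins;
  }

  public static void main(String[] args) {
    String docText = "bird, cat, bush, cat";
    System.out.println(findAll(docText, "cat"));  // [6, 17]
    System.out.println(findAll(docText, "bird")); // [0]
    System.out.println(findAll(docText, "bush")); // [11]
  }
}
```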
>>
>>     
>> --------------------------------------------------------------------------------
>>
>>     Kameron Arthur Cole
>>     Watson Content Analytics Applications and Support
>>     email: [email protected] | Tel: 305-389-8512
>>     upload logs here: <http://www.ecurep.ibm.com/app/upload>
>> --------------------------------------------------------------------------------
>>
>>
>> From: Marshall Schor <[email protected]>
>> To: [email protected]
>> Date: 11/17/2014 04:35 PM
>> Subject: Re: can't remove duplicate Annotations with Java Set Collection
>>
>> --------------------------------------------------------------------------------
>>
>>
>>
>> Hi,
>>
>> Two Feature Structures are considered "equal" in the sense used by HashSet
>> if fs1.equals(fs2).  The definition of "equals" for feature structures is: they
>> are equal if they refer to the same underlying CAS and the same "spot" in the
>> CAS Heap.
>>
>> How did you create the Annotations that you think are "equal" in the HashSet
>> sense?
>>
>> Here's an example of two annotations which are "equal" in the UIMA sorted
>> index sense, but unequal in the HashSet sense.
>>
>>    // each line creates an instance of Annotation in myJCas, with begin = 0 and end = 4
>>    Annotation fs1 = new Annotation(myJCas, 0, 4);
>>    Annotation fs2 = new Annotation(myJCas, 0, 4);
>>
>> These will be "equal" in the UIMA sense - the same kind of annotation, in the
>> same CAS, with the same feature values, but will be two distinct feature
>> structures, so HashSet will consider them to be unequal.
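
The same effect can be reproduced with plain Java objects. This is only an analogy, not UIMA code: Mark is a hypothetical class that does not override equals or hashCode, so HashSet falls back on object identity, much as two separately created feature structures are distinct even when their feature values match.

```java
import java.util.HashSet;
import java.util.Set;

class IdentityEqualsDemo {

  // Hypothetical value holder that inherits equals/hashCode from Object.
  static final class Mark {
    final int begin;
    final int end;

    Mark(int begin, int end) {
      this.begin = begin;
      this.end = end;
    }
  }

  // Number of elements a HashSet retains after adding all the given marks.
  static int distinctCount(Mark... marks) {
    Set<Mark> set = new HashSet<>();
    for (Mark m : marks) {
      set.add(m);
    }
    return set.size();
  }

  public static void main(String[] args) {
    Mark fs1 = new Mark(0, 4);
    Mark fs2 = new Mark(0, 4);
    System.out.println(distinctCount(fs1, fs2)); // 2: same values, two objects
    System.out.println(distinctCount(fs1, fs1)); // 1: the same object twice
  }
}
```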
>>
>> Could this be what is happening in your case?  Please respond so we can see
>> if there's another straightforward solution that does what you're looking for.
>>
>> -Marshall
>> On 11/17/2014 2:59 PM, Kameron Cole wrote:
>>> Hello,
>>>
>>> I am trying to get rid of duplicates in the FSIndex.  I thought a very
>>> clever way to do this would be to just push them into a Set Collection in
>>> Java, which does not allow duplicates. This is very (very) standard Java:
>>>
>>> ArrayList al = new ArrayList();
>>> // add elements to al, including duplicates
>>> HashSet hs = new HashSet();
>>> hs.addAll(al);
>>> al.clear();
>>> al.addAll(hs);
>>>
>>> This list will contain no duplicates.
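
For element types whose equals compares content, such as String, the idiom above does behave as described. A minimal generic sketch; withoutDuplicates and StringDedupDemo are hypothetical names, and LinkedHashSet is substituted for HashSet only to keep first-seen order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

class StringDedupDemo {

  // Returns a new list with duplicates removed, keeping first-seen order.
  static List<String> withoutDuplicates(List<String> al) {
    return new ArrayList<>(new LinkedHashSet<>(al));
  }

  public static void main(String[] args) {
    List<String> al = Arrays.asList("bird", "cat", "bush", "cat");
    System.out.println(withoutDuplicates(al)); // [bird, cat, bush]
  }
}
```

This works because String.equals and String.hashCode are defined on the characters, not on object identity.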
>>>
>>> However, I am not getting this to work in my UIMA code:
>>>
>>>
>>> AnnotationIndex<Annotation> idx = aJCas.getAnnotationIndex();
>>>
>>> System.out.println("Index size is: "+idx.size());
>>>
>>> ArrayList<Annotation> tempList = new ArrayList<Annotation>(idx.size());
>>>
>>> FSIterator it  = idx.iterator();
>>>
>>> //load the Annotations into a temporary list.  includes duplicates
>>>
>>> while(it.hasNext())
>>> {
>>>
>>> tempList.add((Annotation) it.next());
>>>
>>> }
>>>
>>> Iterator tempIt = tempList.iterator();
>>>
>>> // remove all Annotations from the index.  this works fine
>>>
>>> while(tempIt.hasNext()){
>>> ((Annotation) tempIt.next()).removeFromIndexes(aJCas);
>>> }
>>>
>>> // push tempList into HashSet
>>>
>>> HashSet<Annotation> hs = new HashSet<Annotation>();
>>>
>>> hs.addAll(tempList);
>>>
>>> // this should not allow duplicates
>>>
>>> // size should be less than the size of the FSIndex by the number of
>>> // duplicates.  it is not. This is the main problem
>>> System.out.println("HS length: "+hs.size());
>>>
>>> tempList.clear();
>>>
>>> tempList.addAll(hs);
>>>
>>> System.out.println("templist length: "+tempList.size());
>>>
>>>
>>> // this should now be the clean list
>>> Iterator<Annotation> it2 = tempList.iterator();
>>>
>>>
>>> while(it2.hasNext()){
>>> it2.next().addToIndexes(aJCas);
>>> }
>>
>
