Eddie, Thilo,
Sorry for not responding sooner.
And sorry again that I did not describe my scenario properly and
caused a misunderstanding.
The idea Eddie gave will be useful for my further development, but it
does not seem to fit my case.
I have already implemented what I wanted with ad-hoc code. But
because I still have the same question, let me explain the case again.
There are two sections below:
A: Example of What I wanted this time
B: The ad-hoc source code I made this time
As you can see from the source code in B, my code is
redundant and lacks extensibility (even setting aside my general Java
skills).
Simpler and more extensible code would be appreciated.
Thanks in advance,
Isaac
A: Example of What I wanted this time
I input text into an aggregate AE, which consists of AE-1 and AE-2
running in that order.
AE-1 adds <Person> and 20 other kinds of annotations.
AE-2 removes the other annotations if their positions are the same as
those of a <Person> annotation.
i) Input text
"The story of Mr.Saito is similar to the Isaac Foundation's mission."
#NOTE: actually we're handling mainly Japanese, but it doesn't matter here.
ii) Annotation Result of AE-1:
"The story of Mr.<Person>Saito</Person> is similar to the
<Organization><Person>Isaac Foundation</Person></Organization>'s
mission."
iii) Annotation Result of AE-2 (what I wanted this time):
"The story of Mr.<Person>Saito</Person> is similar to the Isaac
Foundation's mission."
B: The ad-hoc source code I made this time
import java.util.LinkedList;

import org.apache.uima.analysis_component.JCasAnnotator_ImplBase;
import org.apache.uima.analysis_engine.AnalysisEngineProcessException;
import org.apache.uima.cas.FSIterator;
import org.apache.uima.jcas.JCas;

// Person and NamedEntity are JCas cover classes from the type system;
// their imports are omitted here.
public class PersonAnnotator extends JCasAnnotator_ImplBase {

    private static final String CLASSNAME_ROOT = "com.ibm.omnifind.ne.types";
    private static final String CLASSNAME_ORG = CLASSNAME_ROOT + ".Org";
    private static final String CLASSNAME_COMPANY = CLASSNAME_ROOT + ".Company";
    private static final String CLASSNAME_PLACE = CLASSNAME_ROOT + ".Place";
    private static final String CLASSNAME_COUNTRY = CLASSNAME_ROOT + ".Country";
    private static final String CLASSNAME_AREA = CLASSNAME_ROOT + ".Area";
    private static final String CLASSNAME_ORDINAL = CLASSNAME_ROOT + ".Ordinal";

    @Override
    public void process(JCas jcas) throws AnalysisEngineProcessException {
        this.removeMultiplyAssignedNe(jcas);
    }
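    private void removeMultiplyAssignedNe(JCas jcas) {
        // Collect all Person annotations.
        FSIterator personIter = jcas.getJFSIndexRepository()
                .getAnnotationIndex(Person.type).iterator();
        LinkedList<Person> persons = new LinkedList<Person>();
        for (; personIter.isValid(); personIter.moveToNext()) {
            // get() reads the current element without advancing; next() here
            // would advance a second time and skip every other Person.
            persons.add((Person) personIter.get());
        }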
        // Collect the annotations whose types may have to be removed.
        FSIterator annotItr = jcas.getAnnotationIndex().iterator();
        LinkedList<NamedEntity> removalCandidates = new LinkedList<NamedEntity>();
        for (; annotItr.isValid(); annotItr.moveToNext()) {
            String typename = annotItr.get().getType().getName();
            if (PersonAnnotator.CLASSNAME_ORG.equals(typename)
                    || PersonAnnotator.CLASSNAME_COMPANY.equals(typename)
                    || PersonAnnotator.CLASSNAME_PLACE.equals(typename)
                    || PersonAnnotator.CLASSNAME_COUNTRY.equals(typename)
                    || PersonAnnotator.CLASSNAME_AREA.equals(typename)
                    || PersonAnnotator.CLASSNAME_ORDINAL.equals(typename)) {
                NamedEntity ne = (NamedEntity) annotItr.get();
                removalCandidates.add(ne);
            }
        }
        // Remove each candidate that covers exactly the same span as a Person.
        for (NamedEntity rn : removalCandidates) {
            boolean tobeRemoved = false;
            int startPos = rn.getBegin();
            int endPos = rn.getEnd();
            for (Person p : persons) {
                if (p.getBegin() == startPos && p.getEnd() == endPos) {
                    tobeRemoved = true;
                    break; // one matching Person is enough
                }
            }
            if (tobeRemoved) {
                System.out.println(getClass().getName()
                        + "#removeMultiplyAssignedNe: "
                        + rn.getLex()
                        + " removed.");
                rn.removeFromIndexes();
            }
        }
    }
}
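
For comparison, here is a minimal sketch of the same rule as implemented in B (drop every non-Person annotation whose begin and end match a Person exactly), written in plain Java without the UIMA API. The names Span, SameSpanFilterSketch and filter are hypothetical stand-ins, not UIMA classes, and the offsets in main were counted by hand for the example text in A. The point is only that a hash set of Person spans replaces the nested loop and the hard-coded list of type names.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for an annotation: a type name plus begin/end offsets.
final class Span {
    final String type;
    final int begin;
    final int end;

    Span(String type, int begin, int end) {
        this.type = type;
        this.begin = begin;
        this.end = end;
    }

    // Packs begin and end into one long so that a span can be used as a hash key.
    long key() {
        return ((long) begin << 32) | (end & 0xFFFFFFFFL);
    }
}

final class SameSpanFilterSketch {
    // Returns the spans that survive: every span of priorityType, plus every
    // other span whose (begin, end) differs from all priorityType spans.
    static List<Span> filter(List<Span> spans, String priorityType) {
        Set<Long> priorityKeys = new HashSet<>();
        for (Span s : spans) {
            if (s.type.equals(priorityType)) {
                priorityKeys.add(s.key());
            }
        }
        List<Span> kept = new ArrayList<>();
        for (Span s : spans) {
            if (s.type.equals(priorityType) || !priorityKeys.contains(s.key())) {
                kept.add(s);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Offsets counted by hand for the example text in section A.
        List<Span> spans = new ArrayList<>();
        spans.add(new Span("Person", 16, 21));        // "Saito"
        spans.add(new Span("Organization", 40, 56));  // "Isaac Foundation"
        spans.add(new Span("Person", 40, 56));        // "Isaac Foundation"
        for (Span s : filter(spans, "Person")) {
            System.out.println(s.type + " [" + s.begin + "," + s.end + ")");
        }
    }
}
```

In this sketch the Organization entry is dropped and both Person entries are kept. In a real annotator the kept/removed decision would presumably still end in removeFromIndexes() on the UIMA side; that part is not shown here.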
On Jan 27, 2008 5:06 AM, Eddie Epstein <[EMAIL PROTECTED]> wrote:
> Hi Isaac,
>
> If I understand your scenario, you want to ignore duplicate Person
> annotations. The set index type is useful for just this purpose.
>
> The javadocs for this index type say:
> Indexing strategy: set index. A set index contains no duplicates of the
> same type, where a duplicate is defined by the indexing comparator. A set
> index is not guaranteed to be sorted.
>
> A simple test shows that an iterator for a set index respects sort order, so
> I'm not sure what the documentation means by "not guaranteed to be
> sorted". We'll have to wait for Thilo to clarify this.
>
> The attached files are intended to be placed into
> $UIMA_HOME/examples/descriptors/analysis_engine/SetIndexTest.xml
> $UIMA_HOME/examples/src/org/apache/uima/examples/SetIndexTest.java
>
> The test prints the following:
>
> Set index contents:
> annotation at begin=0 end=3
> annotation at begin=10 end=13
> annotation at begin=20 end=23
>
> Annotation index contents:
> annotation at begin=0 end=3
> annotation at begin=10 end=15
> annotation at begin=10 end=13
> annotation at begin=20 end=23
>
> Note that the Person at (10,15) is treated as a duplicate of (10,13) because the set
> index is defined with only one key, the begin feature.
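
To illustrate the point above about a single-key comparator, here is a small sketch in plain Java using java.util.TreeSet rather than a UIMA set index. The class name SingleKeySetSketch and the method buildByBegin are hypothetical. It only shows the general effect: when the comparator looks at begin alone, two spans with the same begin count as duplicates. Which of the two duplicates is kept here is TreeSet behavior and says nothing about what a UIMA set index keeps.

```java
import java.util.Comparator;
import java.util.TreeSet;

final class SingleKeySetSketch {
    // Builds a sorted set whose comparator looks only at the begin offset
    // (element [0] of each pair); equal begins are treated as duplicates.
    static TreeSet<int[]> buildByBegin(int[][] spans) {
        TreeSet<int[]> set = new TreeSet<>(Comparator.comparingInt((int[] s) -> s[0]));
        for (int[] s : spans) {
            set.add(s); // TreeSet keeps the first element and ignores later duplicates
        }
        return set;
    }

    public static void main(String[] args) {
        // The same four (begin, end) pairs as in the annotation index listing above.
        int[][] spans = { {0, 3}, {10, 15}, {10, 13}, {20, 23} };
        for (int[] s : buildByBegin(spans)) {
            System.out.println("begin=" + s[0] + " end=" + s[1]);
        }
    }
}
```

Adding end as a second comparison key would make (10,15) and (10,13) distinct again.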
>
> Regards,
> Eddie
>
>
>
> On Jan 25, 2008 7:33 AM, SAITO, Isao Isaac <[EMAIL PROTECTED]> wrote:
> > Hi,
> >
> > I wonder if there is any method provided by the UIMA framework that can
> > be applied to my scenario below.
> >
> > My scenario:
> > - Regions annotated as Person are needed
> > - IF multiple annotations including Person are applied to a region
> > with the same start and end position, THEN remove the Person
> > annotation with that region from the index
> >
> >
> > Though I know I can write ad-hoc code for this,
> > I would like to find the best method, to avoid 1) decreasing system
> > performance and 2) the cost of writing ad-hoc code in the future.
> >
> > Thanks,
> > Isaac
> >
>