Hi Isaac,

If I understand your scenario, you want to ignore duplicate Person
annotations. The set index type is useful for just this purpose.

The javadocs for this index type say:
Indexing strategy: set index. A set index contains no duplicates of the same
type, where a duplicate is defined by the indexing comparator. A set index
is not guaranteed to be sorted.

A simple test shows that an iterator over a set index does respect sort
order, so I'm not sure what the documentation means by "not guaranteed to be
sorted". We'll have to wait for Thilo to clarify this.

The attached files are intended to be placed into
$UIMA_HOME/examples/descriptors/analysis_engine/SetIndexTest.xml
$UIMA_HOME/examples/src/org/apache/uima/examples/SetIndexTest.java

The test prints the following:
Set index contents:
annotation at begin=0 end=3
annotation at begin=10 end=13
annotation at begin=20 end=23

Annotation index contents:
annotation at begin=0 end=3
annotation at begin=10 end=15
annotation at begin=10 end=13
annotation at begin=20 end=23

Note that the Person at (10,15) is treated as a duplicate of the one at
(10,13), and so is missing from the set index, because the set index is
defined with only one key, the begin feature.
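
To make the duplicate rule concrete, here is a rough plain-Java analogy (not
UIMA code). It assumes a set index behaves like a java.util.TreeSet whose
comparator looks only at the index keys and which keeps the first entry added
for each key value. That matches the test output above, but it is not a
statement about UIMA's internals. The Span class and the names BEGIN_ONLY,
BEGIN_THEN_END, sample and contents are made up for this sketch.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

public class SetIndexSketch {
  /** Hypothetical stand-in for an annotation: only a begin and an end offset. */
  static final class Span {
    final int begin;
    final int end;

    Span(int begin, int end) {
      this.begin = begin;
      this.end = end;
    }

    @Override
    public String toString() {
      return "begin=" + begin + " end=" + end;
    }
  }

  /** One key: two spans with the same begin compare as equal. */
  static final Comparator<Span> BEGIN_ONLY = Comparator.comparingInt(s -> s.begin);

  /** Two keys: spans are equal only if both begin and end match. */
  static final Comparator<Span> BEGIN_THEN_END =
      Comparator.<Span>comparingInt(s -> s.begin).thenComparingInt(s -> s.end);

  /** The four spans from the test, in the order the test adds them. */
  static Span[] sample() {
    return new Span[] {new Span(10, 13), new Span(20, 23), new Span(0, 3), new Span(10, 15)};
  }

  /** Adds the spans to a comparator-ordered set and returns its contents in order. */
  static List<String> contents(Comparator<Span> keys, Span... spans) {
    TreeSet<Span> index = new TreeSet<>(keys);
    for (Span s : spans) {
      index.add(s); // leaves the set unchanged if the comparator finds an equal element
    }
    List<String> out = new ArrayList<>();
    for (Span s : index) {
      out.add(s.toString());
    }
    return out;
  }

  public static void main(String[] args) {
    System.out.println("One key (begin):");
    for (String line : contents(BEGIN_ONLY, sample())) {
      System.out.println("span at " + line);
    }
    System.out.println("\nTwo keys (begin, end):");
    for (String line : contents(BEGIN_THEN_END, sample())) {
      System.out.println("span at " + line);
    }
  }
}
```

With BEGIN_ONLY the sketch keeps (0,3), (10,13) and (20,23), the same three
entries as the set index in the test. With BEGIN_THEN_END all four spans are
kept, so only spans with identical begin and end would count as duplicates,
which is closer to your scenario. In a descriptor that would presumably mean
adding a second fsIndexKey on the end feature; I have not tested that.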

Regards,
Eddie

On Jan 25, 2008 7:33 AM, SAITO, Isao Isaac <[EMAIL PROTECTED]> wrote:

> Hi,
>
> I wonder if there is any method provided by the UIMA framework that
> can be applied to my scenario below.
>
> My scenario:
>  - Regions annotated as Person are needed
>  - IF multiple annotations, including Person, are applied to a region
> with the same start and end position, THEN remove the Person
> annotation for that region from the index
>
>
> Though I know I can write ad hoc code for this,
> I would like to use the best method, to avoid 1) decreasing system
> performance and 2) the cost of writing ad hoc code in the future.
>
> Thanks,
>  Isaac
>
<?xml version="1.0" encoding="UTF-8"?>
<analysisEngineDescription xmlns="http://uima.apache.org/resourceSpecifier">
  <frameworkImplementation>org.apache.uima.java</frameworkImplementation>
  <primitive>true</primitive>
  <annotatorImplementationName>org.apache.uima.examples.cas.PersonTitleAnnotator</annotatorImplementationName>
  <analysisEngineMetaData>
    <name>Person Title Annotator</name>
    <description>An example annotator that discovers Person Titles in text and classifies them
    into three categories - Civilian (e.g. Mr.,Ms.), Military (e.g. Lt. Col.) , and 
    Government (e.g. Gov., Sen.).  This annotator can be configured to only look for 
    titles within existing annotations of a particular type (for example, Person Name 
    annotations).</description>
    <version>1.0</version>
    <vendor>The Apache Software Foundation</vendor>
    <configurationParameters>
      <configurationParameter>
        <name>CivilianTitles</name>
        <description>List of Civilian Titles to be annotated.</description>
        <type>String</type>
        <multiValued>true</multiValued>
        <mandatory>true</mandatory>
      </configurationParameter>
      <configurationParameter>
        <name>MilitaryTitles</name>
        <description>List of Military Titles to be annotated.</description>
        <type>String</type>
        <multiValued>true</multiValued>
        <mandatory>true</mandatory>
      </configurationParameter>
      <configurationParameter>
        <name>GovernmentTitles</name>
        <description>List of Government Titles to be annotated.</description>
        <type>String</type>
        <multiValued>true</multiValued>
        <mandatory>true</mandatory>
      </configurationParameter>
      <configurationParameter>
        <name>ContainingAnnotationType</name>
        <description>Annotation type within which to search for Person Titles.  If no value is specified,
        the entire document will be searched.</description>
        <type>String</type>
        <multiValued>false</multiValued>
        <mandatory>false</mandatory>
      </configurationParameter>
    </configurationParameters>
    <configurationParameterSettings>
      <nameValuePair>
        <name>CivilianTitles</name>
        <value>
          <array>
            <string>Mr.</string>
            <string>Ms.</string>
            <string>Mrs.</string>
            <string>Dr.</string>
          </array>
        </value>
      </nameValuePair>
      <nameValuePair>
        <name>MilitaryTitles</name>
        <value>
          <array>
            <string>Gen.</string>
            <string>Col.</string>
            <string>Maj.</string>
            <string>Capt.</string>
            <string>Lt. Gen.</string>
            <string>Lt Col.</string>
            <string>Lt.</string>
          </array>
        </value>
      </nameValuePair>
      <nameValuePair>
        <name>GovernmentTitles</name>
        <value>
          <array>
            <string>Vice President</string>
            <string>President</string>
            <string>Vice Pres.</string>
            <string>Pres.</string>
            <string>Governor</string>
            <string>Lt. Governor</string>
            <string>Gov.</string>
            <string>Lt. Gov.</string>
            <string>Senator</string>
            <string>Sen.</string>
          </array>
        </value>
      </nameValuePair>
    </configurationParameterSettings>
    <typeSystemDescription>
      <types>
        <typeDescription>
          <name>test.Person</name>
          <description>A Person.</description>
          <supertypeName>uima.tcas.Annotation</supertypeName>
        </typeDescription>
        <typeDescription>
          <name>example.PersonTitle</name>
          <description>A Personal Title.</description>
          <supertypeName>uima.tcas.Annotation</supertypeName>
          <features>
            <featureDescription>
              <name>Kind</name>
              <description>The kind of title - Civilian, Military, or Government.</description>
              <rangeTypeName>example.PersonTitleKind</rangeTypeName>
            </featureDescription>
          </features>
        </typeDescription>
        <typeDescription>
          <name>example.PersonTitleKind</name>
          <description>A kind of person title - Civilian, Military, or Government.</description>
          <supertypeName>uima.cas.String</supertypeName>
          <allowedValues>
            <value>
              <string>Civilian</string>
              <description>Title of a person not in military or government service.</description>
            </value>
            <value>
              <string>Military</string>
              <description>Title of a person in the military.</description>
            </value>
            <value>
              <string>Government</string>
              <description>Title of a government official.</description>
            </value>
          </allowedValues>
        </typeDescription>
      </types>
    </typeSystemDescription>
    <typePriorities/>
    <fsIndexCollection>
      <fsIndexes>
        <fsIndexDescription>
          <label>testIdx</label>
          <typeName>test.Person</typeName>
          <kind>set</kind>
          <keys>
            <fsIndexKey>
              <featureName>begin</featureName>
              <comparator>standard</comparator>
            </fsIndexKey>
          </keys>
        </fsIndexDescription>
      </fsIndexes>
    </fsIndexCollection>
    <capabilities>
      <capability>
        <inputs/>
        <outputs>
          <type>example.PersonTitle</type>
          <feature>example.PersonTitle:Kind</feature>
        </outputs>
        <languagesSupported>
          <language>en</language>
        </languagesSupported>
      </capability>
    </capabilities>
    <operationalProperties>
      <modifiesCas>true</modifiesCas>
      <multipleDeploymentAllowed>true</multipleDeploymentAllowed>
      <outputsNewCASes>false</outputsNewCASes>
    </operationalProperties>
  </analysisEngineMetaData>
</analysisEngineDescription>
/*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 * 
 *   http://www.apache.org/licenses/LICENSE-2.0
 * 
 * Unless required by applicable law or agreed to in writing,
 * software distributed under the License is distributed on an
 * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 * KIND, either express or implied.  See the License for the
 * specific language governing permissions and limitations
 * under the License.
 */

package org.apache.uima.examples;

import java.io.File;

import org.apache.uima.UIMAFramework;
import org.apache.uima.analysis_engine.AnalysisEngine;
import org.apache.uima.cas.CAS;
import org.apache.uima.cas.FSIndex;
import org.apache.uima.cas.FSIterator;
import org.apache.uima.cas.Type;
import org.apache.uima.cas.text.AnnotationFS;
import org.apache.uima.resource.ResourceSpecifier;
import org.apache.uima.util.XMLInputSource;

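/**
 * Small test program that shows how a "set" index drops duplicates. It creates an analysis
 * engine from the descriptor given as the single command-line argument, adds four test.Person
 * annotations to a new CAS, and prints the contents of the "testIdx" set index and of the
 * default annotation index.
 */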
public class SetIndexTest {
  /**
   * Main program.
   * 
   * @param args
   *          Command-line arguments - see class description
   */
  public static void main(String[] args) {
    try {
      File taeDescriptor = null;

      // Read and validate command line arguments
      boolean validArgs = false;
      if (args.length == 1) {
        taeDescriptor = new File(args[0]);

        validArgs = taeDescriptor.exists();
      }
      if (!validArgs) {
        printUsageMessage();
      } else {
        // get Resource Specifier from XML file
        XMLInputSource in = new XMLInputSource(taeDescriptor);
        ResourceSpecifier specifier = UIMAFramework.getXMLParser().parseResourceSpecifier(in);

        // create Analysis Engine
        AnalysisEngine ae = UIMAFramework.produceAnalysisEngine(specifier);
        // create a CAS
        CAS cas = ae.newCAS();
        Type Person  = cas.getTypeSystem().getType("test.Person");
        AnnotationFS newfs = cas.createAnnotation(Person, 10, 13);
        cas.getIndexRepository().addFS(newfs);
        newfs = cas.createAnnotation(Person, 20, 23);
        cas.getIndexRepository().addFS(newfs);
        newfs = cas.createAnnotation(Person, 0, 3);
        cas.getIndexRepository().addFS(newfs);
        newfs = cas.createAnnotation(Person, 10, 15);
        cas.getIndexRepository().addFS(newfs);

        FSIndex testIdx = cas.getIndexRepository().getIndex("testIdx");
        FSIterator testItr = testIdx.iterator();
        System.out.println("Set index contents:");
        while (testItr.hasNext()) {
          newfs = (AnnotationFS) testItr.get();
          System.out.println("annotation at begin=" + newfs.getBegin() +
                  " end=" + newfs.getEnd());
          testItr.moveToNext();
        }

        testItr = cas.getAnnotationIndex().iterator();
        System.out.println("\nAnnotation index contents:");
        while (testItr.hasNext()) {
          newfs = (AnnotationFS) testItr.get();
          System.out.println("annotation at begin=" + newfs.getBegin() +
                  " end=" + newfs.getEnd());
          testItr.moveToNext();
        }
        
        ae.destroy();
      }
    } catch (Exception e) {
      e.printStackTrace();
    }
  }

  /**
   * Prints usage message.
   */
  private static void printUsageMessage() {
    System.err.println("Usage: java org.apache.uima.examples.SetIndexTest "
            + "<analysis engine descriptor file name>");
  }

}
