Hello,

I am trying to integrate UIMA into GATE. I got the RE annotator working from
the example in the IBM UIMA tutorial by Nicholas Chase,
http://www.ibm.com/developerworks/webservices/tutorials/ws-uima/section7.html
and I am now trying to embed that RE application in the UIMA plugin folder
in GATE, so that I can call UIMA functionality from GATE.



I followed the same structure as the CountLowercaseAnnotator that is
provided in gate/plugins/UIMA. The annotator seems to run, but when I write
the annotated document out as XML, ProductNumber is never written; only
Token and SpaceToken annotations appear.



I registered ProductNumberAnnotator.xml in AllTokenRelatedAnnotators.xml
as follows:



<taeDescription xmlns="http://uima.apache.org/resourceSpecifier">
<frameworkImplementation>org.apache.uima.java</frameworkImplementation>
<primitive>false</primitive>
<delegateAnalysisEngineSpecifiers>
  <delegateAnalysisEngine key="printer">
    <import location="TokenPrinterAnnotator.xml"/>
  </delegateAnalysisEngine>
  <delegateAnalysisEngine key="counter">
    <import location="CountLowercaseAnnotator.xml"/>
  </delegateAnalysisEngine>
  <delegateAnalysisEngine key="uppercaseCounter">
    <import location="CountUppercaseAnnotator.xml"/>
  </delegateAnalysisEngine>
  <delegateAnalysisEngine key="remover">
    <import location="RemoveEvenLengthTokens.xml"/>
  </delegateAnalysisEngine>
  <delegateAnalysisEngine key="annotatorRE">
    <import location="ProductNumberAnnotator.xml"/>
  </delegateAnalysisEngine>
</delegateAnalysisEngineSpecifiers>

<analysisEngineMetaData>
  <name>Token annotators</name>
  <description>
    Meta-AE that combines all the annotators that use Token annotations, so
    JCasGen can generate overarching JCas classes that will work for all of the
    annotators.
  </description>

  <flowConstraints>
    <fixedFlow>
      <node>printer</node>
      <node>counter</node>
      <node>remover</node>
      <node>uppercaseCounter</node>
      <node>annotatorRE</node>
    </fixedFlow>
  </flowConstraints>

  <capabilities>
    <capability>
      <inputs>
        <type allAnnotatorFeatures="true">gate.uima.cas.Token</type>
        <type allAnnotatorFeatures="true">gate.uima.cas.ProductNumber</type>
      </inputs>
      <outputs>
        <feature>gate.uima.cas.Token:LowerCaseLetters</feature>
        <feature>gate.uima.cas.Token:UpperCaseLetters</feature>
        <feature>gate.uima.cas.ProductNumber:ProductLine</feature>
      </outputs>
    </capability>
  </capabilities>
</analysisEngineMetaData>
</taeDescription>





I have the TokenHandlerGateMappingREUIMA.xml in
gate/plugins/UIMA/examples/conf/mapping with the content:



<uimaGateMapping>
  <inputs>
    <uimaAnnotation type="gate.uima.cas.Token" gateType="Token" indexed="true">
      <feature name="String" kind="string">
        <gateAnnotFeatureValue name="string" />
      </feature>
      <feature name="Kind" kind="string">
        <gateAnnotFeatureValue name="kind" />
      </feature>
      <feature name="Orth" kind="string">
        <gateAnnotFeatureValue name="orth" />
      </feature>
      <feature name="POS" kind="string">
        <gateAnnotFeatureValue name="category" />
      </feature>
    </uimaAnnotation>
    <uimaAnnotation type="gate.uima.cas.Sentence" gateType="Sentence" indexed="true">
    </uimaAnnotation>
    <uimaAnnotation type="gate.uima.cas.ProductNumber" gateType="ProductNumber" indexed="true">
      <feature name="ProductLine" kind="string">
        <gateAnnotFeatureValue name="productLine" />
      </feature>
    </uimaAnnotation>
  </inputs>
  <outputs>
    <updated>
      <gateAnnotation type="Token" uimaType="gate.uima.cas.Token">
        <feature name="numUpper">
          <uimaFSFeatureValue name="gate.uima.cas.Token:UpperCaseLetters"
                              kind="int" />
        </feature>
      </gateAnnotation>
      <gateAnnotation type="ProductNumber" uimaType="gate.uima.cas.ProductNumber">
        <feature name="productLine">
          <uimaFSFeatureValue name="gate.uima.cas.ProductNumber:ProductLine"
                              kind="string" />
        </feature>
      </gateAnnotation>
    </updated>
  </outputs>
</uimaGateMapping>



And I have the file TokenHandlerAggregateREUIMA.xml in
gate/plugins/UIMA/examples/conf/uima_descriptors with the content:





<taeDescription xmlns="http://uima.apache.org/resourceSpecifier">
<frameworkImplementation>org.apache.uima.java</frameworkImplementation>
<primitive>false</primitive>
<delegateAnalysisEngineSpecifiers>
  <delegateAnalysisEngine key="printer">
    <import location="TokenPrinterAnnotator.xml"/>
  </delegateAnalysisEngine>
  <delegateAnalysisEngine key="uppercaseCounter">
    <import location="CountUppercaseAnnotator.xml"/>
  </delegateAnalysisEngine>
  <delegateAnalysisEngine key="annotatorRE">
    <import location="ProductNumberAnnotator.xml"/>
  </delegateAnalysisEngine>
</delegateAnalysisEngineSpecifiers>

<analysisEngineMetaData>
  <name>Token Handler</name>
  <description>Prints the features of tokens</description>

  <flowConstraints>
    <fixedFlow>
      <node>printer</node>
      <node>uppercaseCounter</node>
      <node>annotatorRE</node>
    </fixedFlow>
  </flowConstraints>

  <capabilities>
    <capability>
      <inputs>
        <type allAnnotatorFeatures="true">gate.uima.cas.Token</type>
        <type allAnnotatorFeatures="true">gate.uima.cas.Sentence</type>
        <type allAnnotatorFeatures="true">gate.uima.cas.ProductNumber</type>
      </inputs>
      <outputs>
        <feature>gate.uima.cas.Token:UpperCaseLetters</feature>
        <feature>gate.uima.cas.ProductNumber:ProductLine</feature>
        <!--<type allAnnotatorFeatures="true">gate.uima.mapping.AnnotationSource</type>-->
      </outputs>
    </capability>
  </capabilities>
</analysisEngineMetaData>
</taeDescription>







Then I have ProductNumberAnnotator.xml:



<taeDescription xmlns="http://uima.apache.org/resourceSpecifier">
<frameworkImplementation>org.apache.uima.java</frameworkImplementation>
<primitive>true</primitive>
<annotatorImplementationName>gate.uima.examples.ProductNumberAnnotator</annotatorImplementationName>
<analysisEngineMetaData>
  <name>ProductNumber AE Descriptor</name>
  <description/>
  <version>1.0</version>
  <vendor/>
  <configurationParameters/>
  <configurationParameterSettings/>
<!-- TypeSystem Definition -->
  <typeSystemDescription>
      <imports>
        <import location="file:/C:/dev/workspace/gate/plugins/UIMA/examples/conf/uima_descriptors/typeSystemDescriptor.xml"/>
      </imports>
  </typeSystemDescription>
<typePriorities/>
<fsIndexCollection/>
<!-- Capabilities: Inputs, Outputs, and Preconditions -->
  <capabilities>
      <capability>
        <inputs>
          <type>gate.uima.cas.ProductNumber</type>
        </inputs>
        <outputs>
          <type>gate.uima.cas.ProductNumber</type>
          <feature>gate.uima.cas.ProductNumber:ProductLine</feature>
        </outputs>
        <languagesSupported/>
      </capability>
  </capabilities>

</analysisEngineMetaData>
</taeDescription>







And typeSystemDescriptor.xml:



<typeSystemDescription xmlns="http://uima.apache.org/resourceSpecifier">
  <name>ProductNumberTypeSystemDescriptor</name>
  <description>This type descriptor describes the ProductNumber type, which
can be used to search company reports, customer e-mails, and so
on.</description>
  <version>1.0</version>
  <vendor/>
  <types>
    <typeDescription>
      <name>gate.uima.cas.ProductNumber</name>
      <description>Product Type test Description</description>
      <supertypeName>uima.tcas.Annotation</supertypeName>
      <features>
        <featureDescription>
          <name>ProductLine</name>
          <description>Product line to which the product number belongs</description>
          <rangeTypeName>uima.cas.String</rangeTypeName>
        </featureDescription>
      </features>
    </typeDescription>
  </types>
</typeSystemDescription>



Finally, ProductNumberAnnotator.java is the original one from the tutorial.

(I renamed ProductNumberAEDescriptor.xml to ProductNumberAnnotator.xml to
follow the structure of the UIMA plugin in GATE, where the .xml and .java
file names coincide for a given annotator.)
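
To make the expected result concrete, here is a minimal stand-alone sketch
of the kind of regular-expression matching an RE annotator performs, without
any UIMA or GATE dependency. The class name ProductNumberSketch, the Span
helper and the pattern itself (three uppercase letters, a hyphen, five
digits) are assumptions made for illustration; they are not taken from the
tutorial's ProductNumberAnnotator.java.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ProductNumberSketch {

    // ASSUMPTION: illustrative pattern only (AAA-12345); the tutorial's
    // annotator may use different expressions.
    private static final Pattern PRODUCT_NUMBER =
            Pattern.compile("\\b[A-Z]{3}-\\d{5}\\b");

    /** Hypothetical holder for the offsets an annotation would carry. */
    static final class Span {
        final int begin;   // inclusive start offset
        final int end;     // exclusive end offset
        final String text; // covered text

        Span(int begin, int end, String text) {
            this.begin = begin;
            this.end = end;
            this.text = text;
        }

        @Override
        public String toString() {
            return "[" + begin + "," + end + ") " + text;
        }
    }

    /** Returns one Span per match of the assumed pattern in the text. */
    static List<Span> findSpans(String documentText) {
        List<Span> spans = new ArrayList<>();
        Matcher m = PRODUCT_NUMBER.matcher(documentText);
        while (m.find()) {
            spans.add(new Span(m.start(), m.end(), m.group()));
        }
        return spans;
    }

    public static void main(String[] args) {
        // Same text as the test document in the application code.
        for (Span s : findSpans("ONe Two THree four UNI-23456")) {
            System.out.println(s);
        }
    }
}
```

Under this assumed pattern the test text contains exactly one match,
UNI-23456, so a ProductNumber annotation over that span is what I would
expect to see in the GATE document.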


My application (a basic version of the provided TestUIMAInGATE.java)
looks like this:


...

if(!gateInited) {
      Gate.setGateHome(gateHomeDir);
      Gate.init();
      // load ANNIE
      Gate.getCreoleRegister().registerDirectories(
          new File(Gate.getPluginsHome(), "ANNIE").toURI().toURL());
      // load the UIMA plugin
      Gate.getCreoleRegister().registerDirectories(uimaPluginDir.toURI().toURL());
      // load the example annotators into the GATE classloader
      Gate.getCreoleRegister().registerDirectories(examplesDir.toURI().toURL());
      gateInited = true;
    }

...


    FeatureMap tokeniserParams = Factory.newFeatureMap();
    tokeniser = (LanguageAnalyser)Factory.createResource(
        "gate.creole.tokeniser.DefaultTokeniser", tokeniserParams);

    FeatureMap sacParams = Factory.newFeatureMap();
    app = (SerialAnalyserController)Factory.createResource(
        "gate.creole.SerialAnalyserController", sacParams);
    app.add(tokeniser);
    testCorpus = Factory.newCorpus("Test corpus");
    app.setCorpus(testCorpus);
  }

  /**
   * Test for a local analysis engine that updates a feature value.
   */
  public void testUpdatedOutput() throws Exception {
    File aeDescriptor = new File(descriptorsDir,
        "TokenHandlerAggregateREUIMA.xml");
    updatedOutput(aeDescriptor);
  }

  /**
   * Run "updated" test against the given analysis engine.
   */
  private void updatedOutput(File aeDescriptor) throws Exception {
    File gateMapping = new File(mappingDir,
        "TokenHandlerGateMappingREUIMA.xml");

    FeatureMap aeprParams = Factory.newFeatureMap();
    aeprParams.put("analysisEngineDescriptor", aeDescriptor.toURI().toURL());
    aeprParams.put("mappingDescriptor", gateMapping.toURI().toURL());

    LanguageAnalyser aepr = (LanguageAnalyser)Factory.createResource(
            "gate.uima.AnalysisEnginePR", aeprParams);
    app.add(aepr);

    try {
      // create test document
      Document testDoc = Factory.newDocument("ONe Two THree four UNI-23456");

      try {
        testCorpus.add(testDoc);
        try {
          app.execute();

          // Check the results (the default annotation set includes SpaceToken)
          AnnotationSet annots = testDoc.getAnnotations();
          System.out.println("XML: " + testDoc.toXml());
          // <- ONLY SHOWS TOKEN AND SPACETOKEN ANNOTATIONS, NEVER
          // PRODUCTNUMBER ANNOTATIONS, EVEN THOUGH THEY ARE ANNOTATED.
          System.out.println("ANNOTATIONS: " + annots.getAllTypes().toString());

          // Write the result to an XML file; try-with-resources closes the
          // writer even if the write fails
          try (BufferedWriter out = new BufferedWriter(
                   new FileWriter("C:/testFile/UIMATaggerREResult.xml"))) {
            out.write(testDoc.toXml());
          } catch (IOException e) {
            System.out.println("Error writing result into XML!!");
          }
         }
...
  }

Any ideas about what I am missing or doing wrong?

Thanks a lot!


-- 
********
 Natalia Díaz Rodríguez
 Åbo Akademi University, Turku, Finland
 Universidad de Granada, Spain
 [email protected]
