Hello,

I am trying to integrate UIMA into GATE. I got the RE annotator working from the example in the IBM UIMA tutorial by Nicholas Chase, http://www.ibm.com/developerworks/webservices/tutorials/ws-uima/section7.html. I am now trying to embed that RE application in the UIMA plugin folder of GATE so that I can call UIMA functionality from GATE.
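To make clear what I expect the annotator to find, here is a minimal standalone sketch of the matching step, with no UIMA or GATE involved. The class name ProductNumberSketch and the pattern (three capital letters, a hyphen, five digits) are my own simplification for illustration, not the tutorial's exact code; the input is the test sentence from my application.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ProductNumberSketch {
    // Hypothetical pattern (an assumption, not taken from the tutorial):
    // three capital letters, a hyphen, five digits, e.g. UNI-23456.
    private static final Pattern PRODUCT_NUMBER =
            Pattern.compile("\\b[A-Z]{3}-[0-9]{5}\\b");

    // Returns every substring of the text that matches the pattern.
    public static List<String> findProductNumbers(String text) {
        List<String> found = new ArrayList<>();
        Matcher matcher = PRODUCT_NUMBER.matcher(text);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Same sentence as the GATE test document in my application.
        System.out.println(findProductNumbers("ONe Two THree four UNI-23456"));
    }
}
```

In the real annotator each match would become a ProductNumber annotation over the matched span, and that is the annotation I expect to see in the GATE document.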
I have followed the same structure as the CountLowercaseAnnotator that is provided in gate/plugins/UIMA. The annotations seem to be created, but when I print the annotated document as XML, ProductNumber is never written. Only Token and SpaceToken are annotated.

I registered ProductNumberAnnotator.xml in AllTokenRelatedAnnotators.xml as follows:

<taeDescription xmlns="http://uima.apache.org/resourceSpecifier">
  <frameworkImplementation>org.apache.uima.java</frameworkImplementation>
  <primitive>false</primitive>
  <delegateAnalysisEngineSpecifiers>
    <delegateAnalysisEngine key="printer">
      <import location="TokenPrinterAnnotator.xml"/>
    </delegateAnalysisEngine>
    <delegateAnalysisEngine key="counter">
      <import location="CountLowercaseAnnotator.xml" />
    </delegateAnalysisEngine>
    <delegateAnalysisEngine key="uppercaseCounter">
      <import location="CountUppercaseAnnotator.xml" />
    </delegateAnalysisEngine>
    <delegateAnalysisEngine key="remover">
      <import location="RemoveEvenLengthTokens.xml" />
    </delegateAnalysisEngine>
    <delegateAnalysisEngine key="annotatorRE">
      <import location="ProductNumberAnnotator.xml" />
    </delegateAnalysisEngine>
  </delegateAnalysisEngineSpecifiers>
  <analysisEngineMetaData>
    <name>Token annotators</name>
    <description>
      Meta-AE that combines all the annotators that use Token annotations, so JCasGen can generate overarching JCas classes that will work for all of the annotators.
    </description>
    <flowConstraints>
      <fixedFlow>
        <node>printer</node>
        <node>counter</node>
        <node>remover</node>
        <node>uppercaseCounter</node>
        <node>annotatorRE</node>
      </fixedFlow>
    </flowConstraints>
    <capabilities>
      <capability>
        <inputs>
          <type allAnnotatorFeatures="true">gate.uima.cas.Token</type>
          <type allAnnotatorFeatures="true">gate.uima.cas.ProductNumber</type>
        </inputs>
        <outputs>
          <feature>gate.uima.cas.Token:LowerCaseLetters</feature>
          <feature>gate.uima.cas.Token:UpperCaseLetters</feature>
          <feature>gate.uima.cas.ProductNumber:ProductLine</feature>
        </outputs>
      </capability>
    </capabilities>
  </analysisEngineMetaData>
</taeDescription>

I have the file TokenHandlerGateMappingREUIMA.xml in gate/plugins/UIMA/examples/conf/mapping with the content:

<uimaGateMapping>
  <inputs>
    <uimaAnnotation type="gate.uima.cas.Token" gateType="Token" indexed="true">
      <feature name="String" kind="string">
        <gateAnnotFeatureValue name="string" />
      </feature>
      <feature name="Kind" kind="string">
        <gateAnnotFeatureValue name="kind" />
      </feature>
      <feature name="Orth" kind="string">
        <gateAnnotFeatureValue name="orth" />
      </feature>
      <feature name="POS" kind="string">
        <gateAnnotFeatureValue name="category" />
      </feature>
    </uimaAnnotation>
    <uimaAnnotation type="gate.uima.cas.Sentence" gateType="Sentence" indexed="true">
    </uimaAnnotation>
    <uimaAnnotation type="gate.uima.cas.ProductNumber" gateType="ProductNumber" indexed="true">
      <feature name="ProductLine" kind="string">
        <gateAnnotFeatureValue name="productLine" />
      </feature>
    </uimaAnnotation>
  </inputs>
  <outputs>
    <updated>
      <gateAnnotation type="Token" uimaType="gate.uima.cas.Token">
        <feature name="numUpper">
          <uimaFSFeatureValue name="gate.uima.cas.Token:UpperCaseLetters" kind="int" />
        </feature>
      </gateAnnotation>
      <gateAnnotation type="ProductNumber" uimaType="gate.uima.cas.ProductNumber">
        <feature name="productLine">
          <uimaFSFeatureValue name="gate.uima.cas.ProductNumber:ProductLine" kind="string" />
        </feature>
      </gateAnnotation>
    </updated>
  </outputs>
</uimaGateMapping>

And I have the file TokenHandlerAggregateREUIMA.xml in gate/plugins/UIMA/examples/conf/uima_descriptors with the content:

<taeDescription xmlns="http://uima.apache.org/resourceSpecifier">
  <frameworkImplementation>org.apache.uima.java</frameworkImplementation>
  <primitive>false</primitive>
  <delegateAnalysisEngineSpecifiers>
    <delegateAnalysisEngine key="printer">
      <import location="TokenPrinterAnnotator.xml"/>
    </delegateAnalysisEngine>
    <delegateAnalysisEngine key="uppercaseCounter">
      <import location="CountUppercaseAnnotator.xml"/>
    </delegateAnalysisEngine>
    <delegateAnalysisEngine key="annotatorRE">
      <import location="ProductNumberAnnotator.xml"/>
    </delegateAnalysisEngine>
  </delegateAnalysisEngineSpecifiers>
  <analysisEngineMetaData>
    <name>Token Handler</name>
    <description>Prints the features of tokens</description>
    <flowConstraints>
      <fixedFlow>
        <node>printer</node>
        <node>uppercaseCounter</node>
        <node>annotatorRE</node>
      </fixedFlow>
    </flowConstraints>
    <capabilities>
      <capability>
        <inputs>
          <type allAnnotatorFeatures="true">gate.uima.cas.Token</type>
          <type allAnnotatorFeatures="true">gate.uima.cas.Sentence</type>
          <type allAnnotatorFeatures="true">gate.uima.cas.ProductNumber</type>
        </inputs>
        <outputs>
          <feature>gate.uima.cas.Token:UpperCaseLetters</feature>
          <feature>gate.uima.cas.ProductNumber:ProductLine</feature>
          <!--<type allAnnotatorFeatures="true">gate.uima.mapping.AnnotationSource</type>-->
        </outputs>
      </capability>
    </capabilities>
  </analysisEngineMetaData>
</taeDescription>

Then I have ProductNumberAnnotator.xml:

<taeDescription xmlns="http://uima.apache.org/resourceSpecifier">
  <frameworkImplementation>org.apache.uima.java</frameworkImplementation>
  <primitive>true</primitive>
  <annotatorImplementationName>gate.uima.examples.ProductNumberAnnotator</annotatorImplementationName>
  <analysisEngineMetaData>
    <name>ProductNumber AE Descriptor</name>
    <description/>
    <version>1.0</version>
    <vendor/>
    <configurationParameters/>
    <configurationParameterSettings/>
    <!-- TypeSystem Definition -->
    <typeSystemDescription>
      <imports>
        <import location="file:/C:/dev/workspace/gate/plugins/UIMA/examples/conf/uima_descriptors/typeSystemDescriptor.xml"/>
      </imports>
    </typeSystemDescription>
    <typePriorities/>
    <fsIndexCollection/>
    <!-- Capabilities: Inputs, Outputs, and Preconditions -->
    <capabilities>
      <capability>
        <inputs>
          <type>gate.uima.cas.ProductNumber</type>
        </inputs>
        <outputs>
          <type>gate.uima.cas.ProductNumber</type>
          <feature>gate.uima.cas.ProductNumber:ProductLine</feature>
        </outputs>
        <languagesSupported/>
      </capability>
    </capabilities>
  </analysisEngineMetaData>
</taeDescription>

And typeSystemDescriptor.xml:

<typeSystemDescription xmlns="http://uima.apache.org/resourceSpecifier">
  <name>ProductNumberTypeSystemDescriptor</name>
  <description>This type descriptor describes the ProductNumber type, which can be used to search company reports, customer e-mails, and so on.</description>
  <version>1.0</version>
  <vendor/>
  <types>
    <typeDescription>
      <name>gate.uima.cas.ProductNumber</name>
      <description>Product Type test Description</description>
      <supertypeName>uima.tcas.Annotation</supertypeName>
      <features>
        <featureDescription>
          <name>ProductLine</name>
          <description>Product line to which the product number belongs</description>
          <rangeTypeName>uima.cas.String</rangeTypeName>
        </featureDescription>
      </features>
    </typeDescription>
  </types>
</typeSystemDescription>

Finally, I have ProductNumberAnnotator.java, unchanged from the original in the tutorial. (I renamed ProductNumberAEDescriptor.xml to ProductNumberAnnotator.xml to follow the same structure as the UIMA plugin in GATE, where the names of the .xml and the .java files coincide for a given annotator.)

My application (a basic version of the provided TestUIMAInGATE.java) looks like this:

...
    if(!gateInited) {
      Gate.setGateHome(gateHomeDir);
      Gate.init();
      // load ANNIE
      Gate.getCreoleRegister().registerDirectories(new File(Gate.getPluginsHome(), "ANNIE").toURI().toURL());
      // load the UIMA plugin
      Gate.getCreoleRegister().registerDirectories(uimaPluginDir.toURI().toURL());
      // load the example annotators into the GATE classloader
      Gate.getCreoleRegister().registerDirectories(examplesDir.toURI().toURL());
      gateInited = true;
    }
    ...
    FeatureMap tokeniserParams = Factory.newFeatureMap();
    tokeniser = (LanguageAnalyser)Factory.createResource("gate.creole.tokeniser.DefaultTokeniser", tokeniserParams);
    FeatureMap sacParams = Factory.newFeatureMap();
    app = (SerialAnalyserController)Factory.createResource("gate.creole.SerialAnalyserController", sacParams);
    app.add(tokeniser);
    testCorpus = Factory.newCorpus("Test corpus");
    app.setCorpus(testCorpus);
  }

  /**
   * Test for a local analysis engine that updates a feature value.
   */
  public void testUpdatedOutput() throws Exception {
    File aeDescriptor = new File(descriptorsDir, "TokenHandlerAggregateREUIMA.xml");
    updatedOutput(aeDescriptor);
  }

  /**
   * Run "updated" test against the given analysis engine.
   */
  private void updatedOutput(File aeDescriptor) throws Exception {
    File gateMapping = new File(mappingDir, "TokenHandlerGateMappingREUIMA.xml");
    FeatureMap aeprParams = Factory.newFeatureMap();
    aeprParams.put("analysisEngineDescriptor", aeDescriptor.toURI().toURL());
    aeprParams.put("mappingDescriptor", gateMapping.toURI().toURL());
    LanguageAnalyser aepr = (LanguageAnalyser)Factory.createResource(
        "gate.uima.AnalysisEnginePR", aeprParams);
    app.add(aepr);
    try {
      // create test document
      Document testDoc = Factory.newDocument("ONe Two THree four UNI-23456");
      try {
        testCorpus.add(testDoc);
        try {
          app.execute();
          // Check the results
          AnnotationSet annots = testDoc.getAnnotations(); // Includes SpaceToken
          System.out.println("XML: "+testDoc.toXml());
          // The next line only shows Token and SpaceToken annotations,
          // never ProductNumber annotations, even though they are annotated.
          System.out.println("ANNOTATIONS: "+annots.getAllTypes().toString());
          // Print the result to an .xml document
          try {
            BufferedWriter out = new BufferedWriter(new FileWriter("C:/testFile/UIMATaggerREResult.xml"));
            out.write(testDoc.toXml().toString());
            out.close();
          } catch (IOException e) {
            System.out.println("Error writing result into XML!!");
          }
        }
        ...
  }

Any ideas about what I am missing or doing wrong?

Thanks a lot!

--
Natalia Díaz Rodríguez
Åbo Akademi University, Turku, Finland
Universidad de Granada, Spain
