Hi, I just wanted to say thanks - your description gave me enough clues that I finally got this to work. I think I have some questions, though, about WHY certain things work, but since I am preparing to go out of town, I will wait on those. I need to understand what I did better so I can configure these things faster in the future.
thanks,
Bonnie MacKellar

On Thu, Jun 23, 2016 at 5:07 AM, Peter Klügl <[email protected]> wrote:

> Hi,
>
> sorry, here's just a short reply since I am currently travelling. If
> the problem still exists, I will try to reproduce it and reply with more
> details next week.
>
> Yes, in simple UIMA Ruta projects, these descriptors are copied to
> descriptor/utils when you create the project. The descriptor folder is
> listed in the buildpath as a "descriptor" folder, which is where imported
> descriptors are searched for.
>
> UIMA Ruta currently supports two ways to find the descriptors: the
> absolute paths specified in the descriptorPaths configuration parameter
> and the classpath. Thus, the simplest way for you would be to use the
> classpath to find the descriptor instead of the descriptorPaths (which
> points to the descriptor folder of your Ruta project).
>
> Changing the imports to something like:
>
>   UIMAFIT org.apache.uima.ruta.engine.PlainTextAnnotator
>
> should do the trick (you also need to adapt the TYPESYSTEM import). Then
> the script does not depend on the project structure.
>
> If you use the SourceDocumentInformation type system in your Ruta
> script, then you need to include it separately. In some situations, the
> Ruta Workbench does that automatically for you. However, it is not
> mentioned in types.txt in ruta-core. So you need to add it there in your
> Maven project so that the type system scanning of uimaFIT finds it.
>
> If you create the analysis engine (descriptor) for a Ruta script
> programmatically, there are sometimes additional configuration
> parameters that need to be set. In your use case, you import additional
> analysis engines in your script. These need to be mentioned in the
> corresponding configuration parameters, e.g., PARAM_ADDITIONAL_ENGINES
> or PARAM_ADDITIONAL_UIMAFIT_ENGINES. Since several of these parameters
> are rather technical, I normally use the generated descriptor in
> the uimaFIT factory.
>
> Best,
>
> Peter
>
> On 22.06.2016 at 21:55, Bonnie MacKellar wrote:
> > I am still trying to figure out how to count Ruta annotations across a
> > bunch of input files. There doesn't seem to be any Workbench way to do it.
> > So now I am trying to call Ruta from uimaFIT so I can do the job in Java.
> >
> > However, I am having serious configuration problems, plus I have a question
> > on how to bring in PlainTextAnnotator.
> >
> > I am using Maven, with the jcasgen-maven-plugin, the ruta-maven-plugin, and
> > the uimafit-maven-plugin. I will include the pom file at the end of this
> > post.
> >
> > I want my Java code to be aware of the types declared in the Ruta script -
> > that is the whole point - I want to count those annotations.
> >
> > My Ruta script also uses PlainTextAnnotator. The problem with this is that
> > I can't figure out where to put it. In a Workbench-based Ruta project,
> > PlainTextAnnotator.xml and PlainTextAnnotatorTypeSystem get put
> > automatically into descriptor/utils, along with a number of other
> > descriptors that seem to be built into Ruta. But when I create a project
> > using Maven, there is no such location, and these descriptors do not get
> > put anywhere. I tried a number of places but could not get my script to see
> > the type system for PlainTextAnnotator. Finally, I hit on putting the files
> > in target/generated-sources/ruta/descriptor/utils, and now my script is
> > able to see the types and I can run it. This is good because at that point,
> > the ruta-maven-plugin does its job and generates the descriptors for my
> > script. However, I suspect this is not a good place to put the
> > PlainTextAnnotator files since doing a clean removes them. Where should
> > they go? Is there any entry in the pom file that is needed?
> >
> > The second problem is that although my Ruta script works nicely on its own,
> > the Java code fails.
> > I get the following exception:
> >
> > Exception in thread "main" org.apache.uima.cas.CASRuntimeException: JCas
> > type "org.apache.uima.examples.SourceDocumentInformation" used in Java
> > code, but was not declared in the XML type descriptor.
> >   at org.apache.uima.jcas.impl.JCasImpl.getTypeInit(JCasImpl.java:435)
> >   at org.apache.uima.jcas.impl.JCasImpl.getType(JCasImpl.java:408)
> >   at org.apache.uima.jcas.cas.TOP.<init>(TOP.java:96)
> >   at org.apache.uima.jcas.cas.AnnotationBase.<init>(AnnotationBase.java:66)
> >   at org.apache.uima.jcas.tcas.Annotation.<init>(Annotation.java:54)
> >   at org.apache.uima.examples.SourceDocumentInformation.<init>(SourceDocumentInformation.java:80)
> >   at org.apache.uima.examples.cpe.FileSystemCollectionReader.getNext(FileSystemCollectionReader.java:162)
> >   at org.apache.uima.fit.pipeline.SimplePipeline.runPipeline(SimplePipeline.java:149)
> >   at PipelineSystem.<init>(PipelineSystem.java:59)
> >   at PipelineSystem.main(PipelineSystem.java:73)
> >
> > I am guessing that I need to put some other descriptor somewhere but I
> > can't figure out what it might be.
> > Here is the code that causes the problem:
> >
> > ----------------------------------------------------------------------
> > import java.io.IOException;
> > import java.util.Iterator;
> >
> > import org.apache.uima.UIMAException;
> > import org.apache.uima.analysis_engine.AnalysisEngine;
> > import org.apache.uima.analysis_engine.AnalysisEngineDescription;
> > import org.apache.uima.analysis_engine.AnalysisEngineProcessException;
> > import org.apache.uima.cas.Type;
> > import org.apache.uima.cas.TypeSystem;
> > import org.apache.uima.collection.CollectionReaderDescription;
> > import org.apache.uima.examples.cpe.FileSystemCollectionReader;
> > import org.apache.uima.fit.component.CasDumpWriter;
> > import org.apache.uima.fit.factory.AnalysisEngineFactory;
> > import org.apache.uima.fit.factory.CollectionReaderFactory;
> > import org.apache.uima.fit.pipeline.SimplePipeline;
> > import org.apache.uima.jcas.JCas;
> > import org.apache.uima.resource.ResourceInitializationException;
> > import org.apache.uima.ruta.engine.RutaEngine;
> >
> > public class PipelineSystem {
> >   public PipelineSystem() throws IOException, UIMAException
> >   {
> >     try {
> >       CollectionReaderDescription readerDesc =
> >           CollectionReaderFactory.createReaderDescription(
> >               FileSystemCollectionReader.class,
> >               FileSystemCollectionReader.PARAM_INPUTDIR,
> >               "/home/bonnie/Research/eclipse-uima-projects/PipeLineWithRuta/input",
> >               FileSystemCollectionReader.PARAM_ENCODING, "UTF-8",
> >               FileSystemCollectionReader.PARAM_LANGUAGE, "English");
> >       AnalysisEngine rae = AnalysisEngineFactory.createEngine(RutaEngine.class,
> >           RutaEngine.PARAM_MAIN_SCRIPT,
> >           "ecClassifierRules");
> >       AnalysisEngineDescription rutaEngineDesc =
> >           AnalysisEngineFactory.createEngineDescription(RutaEngine.class,
> >               RutaEngine.PARAM_MAIN_SCRIPT,
> >               "ecClassifierRules");
> >       AnalysisEngineDescription writerDesc =
> >           AnalysisEngineFactory.createEngineDescription(CasDumpWriter.class,
> >               CasDumpWriter.PARAM_OUTPUT_FILE, "dump.txt");
> >       JCas jCas = rae.newJCas();
> >       SimplePipeline.runPipeline(readerDesc, rutaEngineDesc);
> >       displayRutaResults(jCas);
> >     } catch (ResourceInitializationException e) {
> >       // TODO Auto-generated catch block
> >       e.printStackTrace();
> >     } catch (AnalysisEngineProcessException e) {
> >       // TODO Auto-generated catch block
> >       e.printStackTrace();
> >     }
> >   }
> >
> >   public static void main(String[] args) throws IOException, UIMAException {
> >     PipelineSystem p = new PipelineSystem();
> >
> >   }
> >
> >   public void displayRutaResults(JCas jCas)
> >   {
> >     System.out.println("in display ruta results");
> >     TypeSystem ts = jCas.getTypeSystem();
> >     Iterator<Type> typeItr = ts.getTypeIterator();
> >     while (typeItr.hasNext()) {
> >       Type type = (Type) typeItr.next();
> >       if (type.getName().equals("INCL")) {
> >         System.out.println("INCL was found");
> >       }
> >     }
> >   }
> > }
> > ----------------------------------------------------------------------
> >
> > Yes, I know the code doesn't actually count annotations yet - this is
> > strictly a test of the configuration. The type INCL is declared in the
> > script:
> >
> > ENGINE utils.PlainTextAnnotator;
> > TYPESYSTEM utils.PlainTextTypeSystem;
> > Document{-> RETAINTYPE(BREAK)};
> > Document{-> EXEC(PlainTextAnnotator, {Line})};
> >
> > DECLARE INCL;
> > "INCLUSION" -> INCL;
> >
> > And finally, here is the pom file. I note that the ruta plugin and the
> > jcasgen plugin are correctly generating the descriptor files for the
> > script and the Java classes for the types. I have this set up so that the
> > jcasgen plugin reads the type descriptors from the folder that is generated
> > by the ruta-maven-plugin (I saw this in one of the examples mentioned
> > elsewhere on this mailing list).
> > However, the uimafit plugin does not generate anything.
> >
> > thanks for any help. It is really hard to figure out all these moving parts.
> >
> > Bonnie MacKellar
> >
> > ----------------------------------------------------------------------
> >
> > <project xmlns="http://maven.apache.org/POM/4.0.0"
> >     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
> >     xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
> >   <modelVersion>4.0.0</modelVersion>
> >   <groupId>PipeLineWithRuta</groupId>
> >   <artifactId>PipeLineWithRuta</artifactId>
> >   <version>0.0.1-SNAPSHOT</version>
> >   <packaging>jar</packaging>
> >   <name>PipeLineWithRuta</name>
> >   <url>http://maven.apache.org</url>
> >   <properties>
> >     <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
> >   </properties>
> >   <build>
> >     <sourceDirectory>src/main/java</sourceDirectory>
> >     <resources>
> >       <resource>
> >         <directory>src/main/ruta</directory>
> >       </resource>
> >       <resource>
> >         <directory>src/desc</directory>
> >       </resource>
> >     </resources>
> >     <plugins>
> >       <plugin>
> >         <artifactId>maven-compiler-plugin</artifactId>
> >         <version>3.3</version>
> >         <configuration>
> >           <source>1.8</source>
> >           <target>1.8</target>
> >         </configuration>
> >       </plugin>
> >       <plugin>
> >         <groupId>org.apache.uima</groupId>
> >         <artifactId>jcasgen-maven-plugin</artifactId>
> >         <version>2.4.1</version> <!-- change this to the latest version -->
> >         <executions>
> >           <execution>
> >             <goals>
> >               <goal>generate</goal>
> >             </goals> <!-- this is the only goal -->
> >             <!-- runs in phase process-resources by default -->
> >             <configuration>
> >               <!-- REQUIRED -->
> >               <typeSystemIncludes>
> >                 <!-- one or more ant-like file patterns identifying top level descriptors -->
> >                 <typeSystemInclude>target/generated-sources/ruta/descriptor/ecClassifierRulesTypeSystem.xml</typeSystemInclude>
> >               </typeSystemIncludes>
> >               <!-- OPTIONAL -->
> >               <!-- a sequence of ant-like file patterns to exclude from the above include list -->
> >               <typeSystemExcludes>
> >               </typeSystemExcludes>
> >               <!-- OPTIONAL -->
> >               <!-- where the generated files go -->
> >               <!-- default value: ${project.build.directory}/generated-sources/jcasgen" -->
> >               <outputDirectory>
> >               </outputDirectory>
> >               <!-- true or false, default = false -->
> >               <!-- if true, then although the complete merged type system will be created internally,
> >                    only those types whose definition is contained within this maven project
> >                    will be generated. The others will be presumed to be available via other
> >                    projects. -->
> >               <!-- OPTIONAL -->
> >               <limitToProject>true</limitToProject>
> >             </configuration>
> >           </execution>
> >         </executions>
> >       </plugin>
> >       <plugin>
> >         <groupId>org.apache.uima</groupId>
> >         <artifactId>ruta-maven-plugin</artifactId>
> >         <version>2.3.1</version>
> >         <configuration>
> >           <scriptPaths>
> >             <scriptPath>src/main/ruta/</scriptPath>
> >           </scriptPaths>
> >           <!-- Descriptor paths of the generated analysis engine descriptor. -->
> >           <!-- default value: none -->
> >           <descriptorPaths>
> >             <descriptorPath>${project.build.directory}/generated-sources/ruta/descriptor</descriptorPath>
> >           </descriptorPaths>
> >           <!-- Resource paths of the generated analysis engine descriptor. -->
> >           <!-- default value: none -->
> >           <resourcePaths>
> >             <resourcePath>${project.build.directory}/generated-sources/ruta/resources/</resourcePath>
> >           </resourcePaths>
> >           <analysisEngineSuffix>Engine</analysisEngineSuffix>
> >           <typeSystemSuffix>TypeSystem</typeSystemSuffix>
> >           <!-- Type of type system imports. false = import by location. -->
> >           <!-- default value: false -->
> >           <importByName>false</importByName>
> >           <!-- Option to resolve imports while building. -->
> >           <!-- default value: false -->
> >           <resolveImports>false</resolveImports>
> >           <!-- List of packages with language extensions -->
> >           <!-- default value: none -->
> >           <extensionPackages>
> >             <extensionPackage>org.apache.uima.ruta</extensionPackage>
> >           </extensionPackages>
> >           <!-- Add UIMA Ruta nature to .project -->
> >           <!-- default value: false -->
> >           <addRutaNature>true</addRutaNature>
> >           <!-- Buildpath of the UIMA Ruta Workbench (IDE) for this project -->
> >           <!-- default value: none -->
> >           <buildPaths>
> >             <buildPath>script:src/main/ruta/</buildPath>
> >             <buildPath>descriptor:target/generated-sources/ruta/descriptor/</buildPath>
> >             <buildPath>resources:src/main/resources/</buildPath>
> >           </buildPaths>
> >         </configuration>
> >         <executions>
> >           <execution>
> >             <id>default</id>
> >             <phase>process-classes</phase>
> >             <goals>
> >               <goal>generate</goal>
> >             </goals>
> >           </execution>
> >         </executions>
> >       </plugin>
> >       <plugin>
> >         <groupId>org.apache.uima</groupId>
> >         <artifactId>uimafit-maven-plugin</artifactId>
> >         <version>2.2.0</version> <!-- change to latest version -->
> >         <configuration>
> >           <!-- OPTIONAL -->
> >           <!-- Path where the generated resources are written. -->
> >           <outputDirectory>${project.build.directory}/generated-sources/uimafit</outputDirectory>
> >           <!-- OPTIONAL -->
> >           <!-- Skip generation of META-INF/org.apache.uima.fit/components.txt -->
> >           <skipComponentsManifest>false</skipComponentsManifest>
> >           <!-- OPTIONAL -->
> >           <!-- Source file encoding. -->
> >           <encoding>${project.build.sourceEncoding}</encoding>
> >         </configuration>
> >         <executions>
> >           <execution>
> >             <id>default</id>
> >             <phase>process-classes</phase>
> >             <goals>
> >               <goal>generate</goal>
> >             </goals>
> >           </execution>
> >         </executions>
> >       </plugin>
> >     </plugins>
> >   </build>
> >   <dependencies>
> >     <dependency>
> >       <groupId>org.apache.uima</groupId>
> >       <artifactId>uimafit-core</artifactId>
> >       <version>2.2.0</version>
> >     </dependency>
> >     <dependency>
> >       <groupId>org.apache.uima</groupId>
> >       <artifactId>uimaj-core</artifactId>
> >       <version>2.8.1</version>
> >     </dependency>
> >     <dependency>
> >       <groupId>org.apache.uima</groupId>
> >       <artifactId>ruta-maven-plugin</artifactId>
> >       <version>2.3.1</version>
> >     </dependency>
> >     <dependency>
> >       <groupId>org.apache.uima</groupId>
> >       <artifactId>uimaj-cpe</artifactId>
> >       <version>2.8.1</version>
> >     </dependency>
> >     <dependency>
> >       <groupId>org.apache.uima</groupId>
> >       <artifactId>uimaj-examples</artifactId>
> >       <version>2.8.1</version>
> >     </dependency>
> >   </dependencies>
> > </project>
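
The reply in this thread names two ways a descriptor can be found: below the folders listed in descriptorPaths, or on the classpath. The following plain-Java sketch may help to see why the classpath variant does not depend on the project layout. It is only an illustration and not Ruta's actual resolver: the class DescriptorLookup and its methods are hypothetical names, and the mapping from a dotted import name to a path ending in ".xml" is an assumption made for this example.

```java
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

/** Hypothetical helper; it only mimics the two lookup strategies named in the thread. */
final class DescriptorLookup {

    /** Assumed mapping: "utils.PlainTextAnnotator" becomes "utils/PlainTextAnnotator.xml". */
    static String toRelativePath(String importName) {
        return importName.replace('.', '/') + ".xml";
    }

    /** Strategy 1: search below each configured descriptor folder (depends on the file system layout). */
    static Optional<Path> findInDescriptorPaths(String importName, List<Path> descriptorPaths) {
        String relative = toRelativePath(importName);
        for (Path root : descriptorPaths) {
            Path candidate = root.resolve(relative);
            if (Files.isRegularFile(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    /** Strategy 2: ask the class loader, which also sees resources inside dependency jars. */
    static Optional<URL> findOnClasspath(String importName, ClassLoader loader) {
        return Optional.ofNullable(loader.getResource(toRelativePath(importName)));
    }

    public static void main(String[] args) {
        // Folder taken from the pom in this thread; it is wiped by a Maven clean.
        List<Path> roots = Collections.singletonList(
                Paths.get("target/generated-sources/ruta/descriptor"));
        String byFolder = "utils.PlainTextAnnotator";
        String byClasspath = "org.apache.uima.ruta.engine.PlainTextAnnotator";

        System.out.println(byFolder + " found in descriptor folders: "
                + findInDescriptorPaths(byFolder, roots).isPresent());
        System.out.println(byClasspath + " found on classpath: "
                + findOnClasspath(byClasspath, DescriptorLookup.class.getClassLoader()).isPresent());
    }
}
```

What main prints depends on the environment. The first lookup only succeeds while the file physically exists below one of the listed folders. The second succeeds wherever a jar on the classpath contains a resource at that path, which is why copying files into target/ becomes unnecessary once the import is resolved that way.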
