Re: missing Ruta annotations from uimaFit

Peter Klügl Thu, 07 Jul 2016 01:48:56 -0700

Nope, that should not be a problem, since Ruta initializes its types in
process() anyway.
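A stdlib-only Java toy model of this point follows. An engine that looks up its types lazily in process() keeps working when a driver skips typeSystemInit(), as reported for iteratePipeline in uimaFIT 2.2.0. An engine that relies on typeSystemInit() fails. All names here (TypeSystem, Engine, EagerEngine, LazyEngine, runWithoutInit) are hypothetical stand-ins, not UIMA or Ruta API. The sketch assumes nothing about Ruta internals beyond the statement above.

```java
import java.util.Map;

// Toy model only: none of these types are UIMA or Ruta API.
class LazyInitDemo {

    // Hypothetical stand-in for a type system: maps type names to ids.
    static final class TypeSystem {
        private final Map<String, Integer> types;

        TypeSystem(Map<String, Integer> types) {
            this.types = types;
        }

        int lookup(String name) {
            Integer id = types.get(name);
            if (id == null) {
                throw new IllegalArgumentException("unknown type " + name);
            }
            return id;
        }
    }

    // Hypothetical engine lifecycle: a driver may or may not call typeSystemInit().
    interface Engine {
        default void typeSystemInit(TypeSystem ts) {}
        int process(TypeSystem ts);
    }

    // Resolves its type only in typeSystemInit(); fails if that call is skipped.
    static class EagerEngine implements Engine {
        private Integer lineType; // stays null unless typeSystemInit() runs

        @Override public void typeSystemInit(TypeSystem ts) {
            lineType = ts.lookup("Line");
        }

        @Override public int process(TypeSystem ts) {
            if (lineType == null) {
                throw new IllegalStateException("type not initialized");
            }
            return lineType;
        }
    }

    // Resolves its type in process(), re-checking whenever the type system changes.
    static class LazyEngine implements Engine {
        private TypeSystem seen;
        private int lineType;

        @Override public int process(TypeSystem ts) {
            if (ts != seen) {
                lineType = ts.lookup("Line");
                seen = ts;
            }
            return lineType;
        }
    }

    // Mimics a driver that calls process() without ever calling typeSystemInit().
    static int runWithoutInit(Engine engine, TypeSystem ts) {
        return engine.process(ts);
    }

    public static void main(String[] args) {
        TypeSystem ts = new TypeSystem(Map.of("Line", 7));
        System.out.println(runWithoutInit(new LazyEngine(), ts)); // prints 7
        try {
            runWithoutInit(new EagerEngine(), ts);
        } catch (IllegalStateException ex) {
            System.out.println("eager engine failed: " + ex.getMessage());
        }
    }
}
```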



On 07.07.2016 at 10:26, Richard Eckart de Castilho wrote:
> Hi,
>
> iteratePipeline and runPipeline should be mostly equivalent.
> A difference arises if, for example, you have a CAS multiplier
> within an aggregate engine.
>
> runPipeline delegates execution to the UIMA core and can handle
> CAS multipliers.
>
> iteratePipeline (re)uses a single CAS instance which is passed
> to the reader and all analysis engines in turn. It does not
> support CAS multipliers.
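The following stdlib-only sketch models the iterate-style contract described above. One shared container is reset, filled by a "reader", and passed through each "engine" in turn. The same instance is then handed back on every iteration. The hypothetical Doc class stands in for a CAS. Doc, iterate(), and the reset-per-document behavior are assumptions of this toy model, not uimaFIT's actual implementation.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Consumer;

// Toy model of an iterate-style pipeline; Doc and iterate() are hypothetical,
// not uimaFIT API.
class IteratePipelineDemo {

    // Hypothetical stand-in for a CAS: one mutable document plus its annotations.
    static final class Doc {
        String text = "";
        final List<String> annotations = new ArrayList<>();

        void reset() {
            text = "";
            annotations.clear();
        }
    }

    // Reuses ONE Doc: each next() resets it, lets the "reader" load the next
    // text, runs every "engine" on it in turn, and returns the same instance.
    static Iterable<Doc> iterate(List<String> texts, List<Consumer<Doc>> engines) {
        Doc shared = new Doc();
        return () -> new Iterator<Doc>() {
            private int pos = 0;

            @Override public boolean hasNext() {
                return pos < texts.size();
            }

            @Override public Doc next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                shared.reset();                 // results of the previous document are gone
                shared.text = texts.get(pos++); // "reader" step
                for (Consumer<Doc> engine : engines) {
                    engine.accept(shared);      // "engine" steps, in order
                }
                return shared;
            }
        };
    }

    public static void main(String[] args) {
        Consumer<Doc> upper = d -> d.annotations.add("Upper:" + d.text.toUpperCase());
        List<Doc> kept = new ArrayList<>();
        for (Doc d : iterate(List.of("a", "b"), List.of(upper))) {
            System.out.println(d.annotations); // consume results inside the loop
            kept.add(d);                       // every entry is the same object
        }
        System.out.println(kept.get(0) == kept.get(1)); // true
    }
}
```

Because the instance is reused in this model, results should be consumed inside the loop body rather than by keeping references from earlier iterations.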
>
> A user recently pointed out that uimaFIT 2.2.0 reintroduces a bug
> in iteratePipeline: typeSystemInit() is not called [1].
>
> @Peter: could the missing call to typeSystemInit() be a problem for Ruta?
>
> Cheers,
>
> -- Richard
>
> [1] https://issues.apache.org/jira/browse/UIMA-4998
>
>> On 07.07.2016, at 09:17, Peter Klügl <[email protected]> wrote:
>>
>> Hi,
>>
>> I have no idea yet why the code with iteratePipeline does not work.
>>
>> Richard, do you have an idea?
>>
>> Are there any exceptions? Do you use the rae object somewhere? Is your
>> code hosted somewhere, e.g., on GitHub? What do you mean by your own
>> annotations? Annotations of an external type system, or annotations added
>> by another engine or reader?
>>
>> Best,
>>
>> Peter
>>
>> On 06.07.2016 at 02:41, Bonnie MacKellar wrote:
>>> I have a very lengthy Ruta script which annotates my files successfully. I
>>> can see all the annotations in AnnotationBrowser, and they are correct.
>>> I want to get all the annotations in a Java program so I can count
>>> occurrences. I am using uimaFIT, and I am getting very odd results.
>>>
>>> When I use CasDumpWriter, I see all my annotations correctly written to
>>> the dump file. Here is the code that does this:
>>> -------------------------------------------------------------------------------------------------------
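>>> import org.apache.uima.analysis_engine.AnalysisEngine;
>>> import org.apache.uima.analysis_engine.AnalysisEngineDescription;
>>> import org.apache.uima.fit.component.CasDumpWriter;
>>> import org.apache.uima.fit.factory.AnalysisEngineFactory;
>>> import org.apache.uima.fit.pipeline.SimplePipeline;
>>> import org.apache.uima.ruta.engine.RutaEngine;
>>>
>>> // readerDesc (a CollectionReaderDescription) is created elsewhere.
>>> AnalysisEngineDescription rutaEngineDesc =
>>>     AnalysisEngineFactory.createEngineDescription(RutaEngine.class,
>>>         RutaEngine.PARAM_MAIN_SCRIPT, "ecClassifier",
>>>         RutaEngine.PARAM_SCRIPT_PATHS, new String[] {
>>>             "/home/bonnie/Research/eclipse-uima-projects/counttypes/src/main/ruta" },
>>>         RutaEngine.PARAM_DESCRIPTOR_PATHS, new String[] {
>>>             "/home/bonnie/Research/eclipse-uima-projects/counttypes/target/generated-sources/ruta/descriptor" },
>>>         // Additional uimaFIT engine made available to the Ruta script
>>>         RutaEngine.PARAM_ADDITIONAL_UIMAFIT_ENGINES,
>>>         "org.apache.uima.ruta.engine.PlainTextAnnotator");
>>> // Writes the annotations of every CAS to dump2.txt
>>> AnalysisEngineDescription writerDesc =
>>>     AnalysisEngineFactory.createEngineDescription(CasDumpWriter.class,
>>>         CasDumpWriter.PARAM_OUTPUT_FILE, "dump2.txt");
>>> // Note: rae is created but not used below
>>> AnalysisEngine rae = AnalysisEngineFactory.createEngine(rutaEngineDesc);
>>> SimplePipeline.runPipeline(readerDesc, rutaEngineDesc, writerDesc);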
>>> -----------------------------------------------------------------------------------------------------
>>>
>>> However, when I try to do this myself, using iteratePipeline to iterate
>>> through the JCas structures for each input file, many of the annotations
>>> are missing. I suspect that the missing annotations are the ones on text
>>> that is also covered by another annotation. For example, text will be
>>> annotated both with Line and with my own annotation. My code to print
>>> the annotations is based on the code in CasDumpWriter:
>>>
>>> -----------------------------------------------------------------------------------------------------
>>>
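>>> import org.apache.uima.cas.FSIterator;
>>> import org.apache.uima.cas.text.AnnotationFS;
>>> import org.apache.uima.fit.pipeline.SimplePipeline;
>>> import org.apache.uima.jcas.JCas;
>>> import org.apache.uima.jcas.tcas.Annotation;
>>>
>>> // Each iteration yields the JCas after the reader and the Ruta engine have run
>>> for (JCas jcas : SimplePipeline.iteratePipeline(readerDesc, rutaEngineDesc)) {
>>>     displayRutaResults(jcas);
>>> }
>>>
>>> public void displayRutaResults(JCas jcas)
>>> {
>>>     System.out.println("in display ruta results");
>>>
>>>     // Iterate over all annotations in the default annotation index
>>>     FSIterator<Annotation> annotationIter =
>>>         jcas.getAnnotationIndex().iterator();
>>>     while (annotationIter.hasNext())
>>>     {
>>>         AnnotationFS annotation = annotationIter.next();
>>>         System.out.println(annotation.getType().getName());
>>>         System.out.println(annotation.getCoveredText());
>>>         System.out.println("------------------------------------------");
>>>         // System.out.println(annotation.toString());
>>>     }
>>> }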
>>>
>>> ------------------------------------------------------------------------------------------------
>>>
>>> Why would this code produce different results from CasDumpWriter, which
>>> uses almost exactly the same code? Is it something to do with using
>>> runPipeline vs. iteratePipeline? Should I write my code so it can be placed
>>> inside runPipeline?
>>>
>>> Thanks so much!
>>> Bonnie MacKellar
>>>
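Since the stated goal is to count occurrences, here is a hedged, plain-Java sketch of just the counting step. It assumes the type names have already been collected, for example from annotation.getType().getName() inside the loop in displayRutaResults. AnnotationCounter, countByType, and the sample names Line and MyAnnotation are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for the counting step only; it works on type names that
// have already been collected, e.g. via annotation.getType().getName().
class AnnotationCounter {

    // Returns type name -> number of occurrences, sorted by type name.
    static Map<String, Integer> countByType(List<String> typeNames) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String name : typeNames) {
            counts.merge(name, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Sample names are made up; real ones would be fully qualified type names.
        List<String> names = List.of("Line", "MyAnnotation", "Line");
        System.out.println(countByType(names)); // {Line=2, MyAnnotation=1}
    }
}
```

To run this inside runPipeline instead of iteratePipeline, the same logic could plausibly live in the process() method of a class extending uimaFIT's JCasAnnotator_ImplBase, added as an extra engine in the runPipeline call. That wiring is not shown here.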
