Processing a Text Collection Several Times?

Susanne Neumann Mon, 17 Jun 2013 12:21:02 -0700

Hi,

I want to implement a bootstrapping algorithm using UIMA, which requires 
processing a whole text collection several times. With each iteration, new 
evidence based on the results of the previous runs on all the documents is 
collected and applied. The number of iterations is determined at runtime.


I planned to write a bootstrapping AE, but I can't figure out how to 
iteratively process the collection with UIMA, because the process method 
processes the text collection only once.

As a workaround, I am considering adding the annotator to the pipeline several 
times. Whether this works depends on the order in which documents are 
processed within the pipeline. In which order are documents processed in a 
pipeline? Does each component process the whole text collection before the 
next component starts, or does each document pass through all components 
before the next document is read? In the latter case, the workaround would 
not work. Another option is to run a whole pipeline containing the annotator 
several times.
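
To make the second workaround (running the whole pipeline several times) more 
concrete, here is a rough sketch in plain Java. It does not use any UIMA API: 
runPass is a hypothetical stand-in for one complete pipeline run over the 
collection, the documents are plain strings, and the "evidence" rule is a toy 
one chosen only for illustration. The point is that evidence found during a 
pass is applied only after all documents have been seen, and that the loop 
stops at runtime once a pass yields nothing new.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class BootstrapLoop {

    // Hypothetical stand-in for one full pipeline run over the collection.
    // It only reads the evidence gathered in earlier passes and returns
    // what it found in this pass.
    static Set<String> runPass(List<String> documents, Set<String> evidence) {
        Set<String> found = new HashSet<>();
        for (String doc : documents) {
            List<String> tokens = Arrays.asList(doc.split("\\s+"));
            // Toy rule (an assumption for illustration): a document that
            // contains a known term contributes all of its tokens.
            boolean matches = false;
            for (String token : tokens) {
                if (evidence.contains(token)) {
                    matches = true;
                    break;
                }
            }
            if (matches) {
                found.addAll(tokens);
            }
        }
        return found;
    }

    // Repeats full passes until a pass adds nothing new or the cap is hit.
    static Set<String> bootstrap(List<String> documents, Set<String> seeds,
                                 int maxIterations) {
        Set<String> evidence = new HashSet<>(seeds);
        for (int i = 0; i < maxIterations; i++) {
            Set<String> found = runPass(documents, evidence);
            // addAll returns false when the set did not change: converged.
            if (!evidence.addAll(found)) {
                break;
            }
        }
        return evidence;
    }
}
```

In a real setup, runPass would presumably be replaced by whatever starts the 
pipeline over the collection, with the evidence passed to the annotator as a 
parameter or shared resource.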

Is there a better way to iteratively process a text collection than these 
workarounds? Which of them would work? Any hints are welcome.

Thanks,
Susanne
