Hi, I want to implement a bootstrapping algorithm using UIMA, which requires processing a whole text collection several times. With each iteration, new evidence based on the results of the previous runs on all the documents is collected and applied. The number of iterations is determined at runtime.
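To make the intended control flow concrete, here is a rough sketch in plain Java with no UIMA calls at all. Every name in it (`BootstrapSketch`, `runPass`, `bootstrap`) is made up for illustration, and the toy rule "a token that directly follows known evidence becomes new evidence" is only a placeholder for the real evidence collection. The point is that the evidence is frozen during a pass and only updated once the whole collection has been seen.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class BootstrapSketch {

    // Hypothetical stand-in for one pass of the annotator over the whole collection.
    // The evidence set is only read here, never modified, so the order of the
    // documents within a pass does not influence the result.
    static Set<String> runPass(List<String> documents, Set<String> evidence) {
        Set<String> found = new TreeSet<>();
        for (String doc : documents) {
            String[] tokens = doc.split("\\s+");
            for (int i = 0; i + 1 < tokens.length; i++) {
                // Toy rule (an assumption): the token after known evidence is new evidence.
                if (evidence.contains(tokens[i])) {
                    found.add(tokens[i + 1]);
                }
            }
        }
        return found;
    }

    // Repeats full passes until a pass adds nothing new or maxIterations is reached,
    // so the number of iterations is only known at runtime.
    static Set<String> bootstrap(List<String> documents, Set<String> seeds, int maxIterations) {
        Set<String> evidence = new TreeSet<>(seeds);
        for (int i = 0; i < maxIterations; i++) {
            Set<String> found = runPass(documents, evidence);
            // addAll returns false when the pass contributed no new evidence.
            if (!evidence.addAll(found)) {
                break;
            }
        }
        return evidence;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("a b", "b c", "c d", "x y");
        System.out.println(bootstrap(docs, Collections.singleton("a"), 10));
    }
}
```

In this sketch each call to `runPass` corresponds to one complete run over the collection, which is the part I do not know how to express in UIMA.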
I planned to write a bootstrapping AE, but I can't figure out how to process the collection iteratively with UIMA, because a pipeline run passes each document to the process method only once.

As a workaround, I am considering adding the annotator to the pipeline several times. Whether that works as desired depends on the order in which the documents are processed within the pipeline. In which order are documents processed in a pipeline? Does each component process the whole text collection before the next component starts, or is each document passed through all components before the next document is read? In the latter case, the workaround would not work.

Another option is to run a whole pipeline containing the annotator several times.

Is there a better way to process a text collection iteratively than these workarounds? Which of them would work? Any hints on this are welcome.

Thanks,
Susanne
