Not sure this is going to help, but here goes. I run into
this kind of situation quite often, so I don't use a CPE
at all; I just run an aggregate analysis engine under my
own control. Create a UIMA app with something like this:
import org.apache.uima.UIMAFramework;
import org.apache.uima.analysis_engine.AnalysisEngine;
import org.apache.uima.cas.CAS;
import org.apache.uima.resource.ResourceSpecifier;
import org.apache.uima.util.XMLInputSource;

public class UimaApp {

  public static void main(String[] args) {
    try {
      // Get Resource Specifier from XML file (descriptor path is args[0])
      XMLInputSource in = new XMLInputSource(args[0]);
      ResourceSpecifier specifier =
          UIMAFramework.getXMLParser().parseResourceSpecifier(in);

      // Instantiate analysis engine
      AnalysisEngine ae = UIMAFramework.produceAnalysisEngine(specifier);

      // Create CAS
      CAS cas = ae.newCAS();

      // Now process documents (in a loop, hooked up to a queue or
      // whatever is convenient...)

      // Reset CAS before processing
      cas.reset();

      // Set document text and do other initialization
      cas.setDocumentText("Document text goes here");

      // Run ae on CAS
      ae.process(cas);

      // If necessary, get results out of CAS...
      System.out.println("Number of annotations: "
          + cas.getAnnotationIndex().size());
    } catch (Exception e) {
      e.printStackTrace();
    }
  }

}
So if the only thing you need the CPE for is the collection
reader, this is an alternative for you.
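For the "in a loop, hooked up to a queue" part, a rough sketch of
what I mean might look like the following. It uses only
java.util.concurrent, so it runs without UIMA on the classpath. The
class name QueueDrivenWorker, the drain method and the POISON_PILL
sentinel are made up for illustration, and the handler is a stand-in
for the cas.reset() / cas.setDocumentText() / ae.process() sequence
from the app above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

public class QueueDrivenWorker {

  // Hypothetical sentinel object that tells the worker to shut down.
  static final String POISON_PILL = new String("__STOP__");

  // Blocks on the queue and hands each document to the handler until the
  // sentinel arrives. Returns the number of documents handled.
  static int drain(BlockingQueue<String> queue, Consumer<String> handler)
      throws InterruptedException {
    int processed = 0;
    while (true) {
      String text = queue.take();   // waits here while the queue is empty
      if (text == POISON_PILL) {    // identity check, so no real document matches
        return processed;
      }
      handler.accept(text);
      processed++;
    }
  }

  public static void main(String[] args) throws Exception {
    BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    List<String> seen = new ArrayList<>();

    // In the UIMA app the handler would instead do something like:
    //   cas.reset(); cas.setDocumentText(text); ae.process(cas);
    Thread worker = new Thread(() -> {
      try {
        drain(queue, seen::add);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    worker.start();

    // The other application would put documents here as they arrive.
    queue.put("first document");
    queue.put("second document");
    queue.put(POISON_PILL);
    worker.join();

    System.out.println("Processed " + seen.size() + " documents");
  }
}
```

The point is that take() simply blocks while there is nothing to do,
so the worker sits in "waiting" mode without any restart. A single
worker thread also means the one CAS is only ever used by one thread
at a time.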
--Thilo
Christoph Büscher wrote:
Hi,
so far I've always used UIMA CPEs to read whole collections of documents
from e.g. a source directory. In a new application it will be necessary
to run a CPE on new documents being passed to it by another application
(outside UIMA). It would be nice to be able to simply hand single
documents over to a collection reader and then simply to "run/wake up"
the CPE to process the document.
My idea was to put the incoming documents into a waiting queue, register
this with a custom collection reader and then let the
hasNext/getNext methods simply ask the queue if there is work to do.
But when "hasNext()" in the collection reader returns "false", the CPE
stops execution.
Is it possible to put a reader or the whole CPE into a "waiting" mode,
or is the only solution to always restart the whole CPE once new
documents have arrived to be processed? Has anybody dealt with a similar
situation so far and has any "best practices" to share? How do you
handle them?
Thanks,