We've been using UIMA within a web service that processes one document at a time. The single document is sent as a SOAP attachment. Metadata configuring our UIMA components is included within the SOAP message. The service module (which can also be executed as a standalone application) gets a CAS from the pool, loads the document into the CAS, sequentially calls the requested UIMA components, pulls the resulting document(s) from the CAS and returns them to the service caller.
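A minimal sketch of that per-request flow, in plain Java, may make the sequence easier to follow. It is not our service code and it does not use the UIMA API: SimpleCas, DocumentService and the two lambda components are hypothetical stand-ins, and a bounded queue plays the part of the CAS pool. In a real UIMA application those roles would presumably be filled by CAS, CasPool and AnalysisEngine instances.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Consumer;

/** Hypothetical stand-in for a UIMA CAS: the document text plus collected results. */
class SimpleCas {
    String documentText;
    final List<String> annotations = new ArrayList<>();

    void reset() {
        documentText = null;
        annotations.clear();
    }
}

/** Sketch of a one-document-per-request service (assumed names, not the UIMA API). */
class DocumentService {
    private final BlockingQueue<SimpleCas> pool;
    private final List<Consumer<SimpleCas>> components;

    DocumentService(int poolSize, List<Consumer<SimpleCas>> components) {
        this.pool = new ArrayBlockingQueue<>(poolSize);
        for (int i = 0; i < poolSize; i++) {
            pool.add(new SimpleCas());
        }
        this.components = List.copyOf(components);
    }

    List<String> process(String document) {
        SimpleCas cas;
        try {
            cas = pool.take(); // blocks until a CAS is free
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a CAS", e);
        }
        try {
            cas.documentText = document; // load the document into the CAS
            for (Consumer<SimpleCas> component : components) {
                component.accept(cas); // call the requested components in order
            }
            return new ArrayList<>(cas.annotations); // copy results out before the CAS is reused
        } finally {
            cas.reset();
            pool.add(cas); // always hand the CAS back to the pool
        }
    }

    public static void main(String[] args) {
        Consumer<SimpleCas> tokenCounter =
            cas -> cas.annotations.add("tokens=" + cas.documentText.trim().split("\\s+").length);
        Consumer<SimpleCas> upperCaser =
            cas -> cas.annotations.add("upper=" + cas.documentText.toUpperCase());
        DocumentService service = new DocumentService(2, List.of(tokenCounter, upperCaser));
        System.out.println(service.process("hello uima world"));
    }
}
```

Releasing the CAS in a finally block means a failing component cannot drain the pool, which matters when every request borrows exactly one CAS.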
Rather than using a CPE, we wrote our own processing manager. We use an XML configuration file, similar in format to the CPE descriptor. The configuration file lists, in order, the readers, analysis engines, and consumers that need to be instantiated. It also allows settings in the respective descriptors to be overridden. For example, any parameters and resource file names can be changed via the configuration file. The processing manager reads in the descriptors, modifies their contents (if requested), and saves the modified descriptors as an array of InputStreams. It then creates the UIMA components and configures the CAS pool. At this point we are not using any of the sofa and type input/output information to validate the order of execution; this processing manager is used for well-defined applications.
The processing manager uses the metadata to control which reader(s) and consumer(s) to execute. The web service accepts documents in multiple formats (XML, HTML, text). The getNext() method on the relevant reader pulls the document from the CAS and parses it (the metadata may also include XPath or similar instructions on how to find the plain-text portions), placing the result into the CAS. After the analysis engines finish annotating, one or more consumers are executed in order to build the requested output (annotated original, annotated extract, annotations only, XMI, etc.).

Terry Heinze
Thomson Corporation R&D

-----Original Message-----
From: Christoph Büscher [mailto:[EMAIL PROTECTED]
Sent: Friday, February 15, 2008 8:31 AM
To: [email protected]
Subject: How to "push" documents into CPE/CollectionReader?

Hi,
so far I've always used UIMA CPEs to read whole collections of documents from e.g. a source directory. In a new application it will be necessary to run a CPE on new documents being passed to it by another application (outside UIMA).
It would be nice to be able to hand single documents over to a collection reader and then simply "run/wake up" the CPE to process the document. My idea was to put the incoming documents into a waiting queue, register this queue with a custom collection reader, and then let the hasNext()/getNext() methods simply ask the queue whether there is work to do. But when hasNext() in the collection reader returns "false", the CPE stops execution. Is it possible to put a reader or the whole CPE into a "waiting" mode, or is the only solution to restart the whole CPE every time new documents have arrived to be processed?
Has anybody dealt with a similar situation and has any "best practices" to share? How do you handle them?

Thanks,

--
--------------------------------
Christoph Büscher
