On Tue, Jan 27, 2009 at 3:00 PM, Burn Lewis <[email protected]> wrote:
> You could also consider UIMA-AS. If you have a multiple annotator pipeline
> it can run each annotator in a separate thread and so have multiple CASes
> active in the same pipeline. Individual annotators can be scaled out to get
> further speedups. You can specify an aggregate that starts with a
> collection reader and ends with a cas consumer and then adjust the
> deployment descriptor to get the appropriate CAS pool size and scaleout
> values.

At the moment my pipeline is composed of 5 annotators:

  tokenizer
  sentence
  morphology
  Jape (a lot of objects here: mapping forward to Jape and back to UIMA)
  Lucene indexer (more objects here: Lucene documents)

Architecturally speaking, I have a listener bound to a queue of "jobs" to be
done: each job is a "collection" to be passed to the CR. After a collection is
completed, I reset the underlying cp and restart it with the new collection
(job).

In the future I want to try to scale out by sharing the queue with Terracotta
in a master/worker architecture. But I will take a look at UIMA-AS too.

-- 
Roberto Franchini
http://www.celi.it
http://www.blogmeter.it
http://www.memesphere.it
Tel +39-011-6600814
jabber:[email protected]
skype:ro.franchini
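
For illustration, the listener-and-queue arrangement described above could be sketched roughly as follows. This is plain Java and not the actual code: `JobListener`, `Job`, and `Pipeline` are hypothetical names, and the real reset and restart of the UIMA pipeline is assumed to be hidden behind `Pipeline.process`.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

/** Sketch of a listener that feeds queued jobs, one at a time, to a pipeline. */
class JobListener implements Runnable {

    /** Hypothetical job: one collection, here just a name and its document paths. */
    static final class Job {
        final String name;
        final List<String> documents;

        Job(String name, List<String> documents) {
            this.name = name;
            this.documents = documents;
        }
    }

    /** Hypothetical wrapper around the real pipeline (reader, annotators, indexer). */
    interface Pipeline {
        /** Assumed to reset the pipeline, read the job's collection, and run to completion. */
        void process(Job job);
    }

    private final BlockingQueue<Job> queue = new LinkedBlockingQueue<>();
    private final AtomicInteger completed = new AtomicInteger();
    private final Pipeline pipeline;

    JobListener(Pipeline pipeline) {
        this.pipeline = pipeline;
    }

    /** Enqueue a job; the unbounded queue never blocks the producer. */
    void submit(Job job) {
        queue.add(job);
    }

    /** Process every job currently queued without blocking; returns how many were attempted. */
    int drain() {
        int attempted = 0;
        Job job;
        while ((job = queue.poll()) != null) {
            runOne(job);
            attempted++;
        }
        return attempted;
    }

    /** Blocking loop for a dedicated listener thread; interrupt the thread to stop it. */
    @Override
    public void run() {
        try {
            while (true) {
                runOne(queue.take());
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag and exit
        }
    }

    /** Run a single job; a failing job is logged and does not stop the listener. */
    private void runOne(Job job) {
        try {
            pipeline.process(job);
            completed.incrementAndGet();
        } catch (RuntimeException e) {
            System.err.println("Job " + job.name + " failed: " + e);
        }
    }

    /** Number of jobs that finished without an exception. */
    int completedCount() {
        return completed.get();
    }
}
```

With a single listener thread only one collection is in flight at a time, which matches the reset-and-restart cycle described above. In a master/worker setup the in-process `LinkedBlockingQueue` would be the piece to replace with a shared queue, with one such listener per worker.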
