On Tue, Jan 27, 2009 at 3:00 PM, Burn Lewis <[email protected]> wrote:
> You could also consider UIMA-AS.  If you have a multiple annotator pipeline
> it can run each annotator in a separate thread and so have multiple CASes
> active in the same pipeline.  Individual annotators can be scaled out to get
> further speedups.  You can specify an aggregate that starts with a
> collection reader and ends with a cas consumer and then adjust the
> deployment descriptor to get the appropriate CAS pool size and scaleout
> values.

At the moment my pipeline is composed of 5 annotators:
tokenizer
sentence
morphology
Jape (a lot of objects here: mapping forward to Jape and back to UIMA)
Lucene indexer (more objects here: Lucene documents)

Architecturally speaking, I have a Listener bound to a queue of "jobs"
to be done: each job is a "collection" to be passed to the CR.
After a collection is completed, I reset the underlying cp and restart
it with the new collection (job).
In the future I want to try scaling out by sharing the queue with
Terracotta in a master/worker architecture.
But I will take a look at UIMA-AS too.
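
To make the listener/queue arrangement above concrete, here is a rough
sketch in plain Java. It is not my actual code: the class name
JobListener, the STOP sentinel, and modelling a job as a list of
document paths are all assumptions made for illustration, and the
"pipeline" callback stands in for "reset the pipeline, hand the
collection to the CR, run to completion". No UIMA classes are used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.function.Consumer;

/** Hypothetical listener: takes jobs off a queue and runs them one at a time. */
public class JobListener implements Runnable {

    /** Sentinel job that stops the listener (an assumed convention, compared by identity). */
    public static final List<String> STOP = new ArrayList<>();

    private final BlockingQueue<List<String>> jobs;
    private final Consumer<List<String>> pipeline;

    public JobListener(BlockingQueue<List<String>> jobs,
                       Consumer<List<String>> pipeline) {
        this.jobs = jobs;
        this.pipeline = pipeline;
    }

    @Override
    public void run() {
        try {
            while (true) {
                // Block until the next job (one "collection") is available.
                List<String> job = jobs.take();
                if (job == STOP) {
                    return;
                }
                // Placeholder for: reset the pipeline, point the CR at
                // this collection, and process it to completion.
                pipeline.accept(job);
            }
        } catch (InterruptedException e) {
            // Restore the interrupt flag and exit the loop.
            Thread.currentThread().interrupt();
        }
    }
}
```

Jobs are processed strictly one after another here, which matches the
reset-and-restart approach; sharing the queue between several workers
would mean running one such listener per worker against the same queue.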

-- 
Roberto Franchini
http://www.celi.it
http://www.blogmeter.it
http://www.memesphere.it
Tel +39-011-6600814
jabber:[email protected] skype:ro.franchini
