Coming late to the conversation... just offering some Lucene perspective.
On Dec 4, 2008, at 1:36 PM, Niels Ott wrote:
What Lucene cannot do - or at least not without a lot of hacking - is
aggregate analyses the way UIMA can using the CAS. Usually your
knowledge grows during a UIMA-based NLP pipeline: you add a token
annotation, a lemma annotation, a POS annotation, and so on. In Lucene,
you have the classical pipeline: the output replaces the input. (Yes,
by subclassing Lucene's "Token" class, one can work around the issue,
but it is not elegant at all.)
You might find the TeeTokenFilter and SinkTokenizer interesting for
mapping/aggregating tokens/extractions out to other fields in Lucene.
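To make the tee/sink idea concrete, here is a minimal plain-Java sketch of the pattern - not the actual Lucene TeeTokenFilter/SinkTokenizer classes, whose API differs - where a filter passes tokens through unchanged while teeing copies of selected ones into a sink that could later feed a second field (the all-uppercase heuristic is just a stand-in for a real extraction rule):

```java
import java.util.ArrayList;
import java.util.List;

// A sink collects tokens teed off the main stream so they can be
// replayed into another field later.
class Sink {
    final List<String> tokens = new ArrayList<>();
}

// A tee filter passes every token downstream, but also copies the
// "interesting" ones into the sink. Here, any all-uppercase token
// (a crude stand-in for, say, a named-entity extraction) is teed.
class TeeFilter {
    private final Sink sink;

    TeeFilter(Sink sink) {
        this.sink = sink;
    }

    String next(String token) {
        if (token.equals(token.toUpperCase())) {
            sink.tokens.add(token);
        }
        return token; // the main stream is unchanged
    }
}
```

Running "NASA", "launched", "Apollo" through the filter leaves the main stream intact while the sink ends up holding just "NASA", ready to be indexed into a separate entities field.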
Also, Lucene is getting more flexible in terms of indexing and
searching. You can attach payloads (i.e., byte arrays) to terms, which
provides some crude annotation storage, and https://issues.apache.org/jira/browse/LUCENE-1422
and a couple of other issues are the start of more flexibility for
adding attributes that can then be indexed. We're still working on the
search side of it, but I think you will see more in the way of
flexible indexing in the coming months, which should be a nice win for
UIMA + Lucene users.
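As a sketch of the "crude annotation storage" idea, here is one way a POS tag could be packed into a one-byte payload per term and decoded again at search time. The tagset and encoding are made up for illustration; the real Lucene payload API is not shown here:

```java
// Hypothetical per-term payload codec: a POS tag is stored as a
// single byte indexing into a small fixed tagset, keeping the
// payload compact. This only illustrates the encoding idea, not
// Lucene's actual payload classes.
class PosPayload {
    static final String[] TAGS = {"NN", "VB", "JJ", "DT"};

    // Encode a tag as a one-byte payload (its index in TAGS).
    static byte[] encode(String tag) {
        for (int i = 0; i < TAGS.length; i++) {
            if (TAGS[i].equals(tag)) {
                return new byte[]{(byte) i};
            }
        }
        throw new IllegalArgumentException("unknown tag: " + tag);
    }

    // Decode the payload back into the tag string.
    static String decode(byte[] payload) {
        return TAGS[payload[0]];
    }
}
```

At query time, a payload-aware scorer could then read those bytes back and, for example, boost matches whose term carries a particular tag.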
What makes Lucene + UIMA interesting for me is a simple fact: I can do
all the NLP I want and be as flexible as I need in UIMA. Then I can
feed the outcome (or rather: a small part of it) into a Lucene index.
In my particular case, I'm not using a CAS Consumer, but I can imagine
other people would appreciate it in their application scenarios.
To conclude: Lucene and UIMA aren't competitors; in some cases,
having one feed the other is exactly what you want.
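The feeding step above can be sketched in a few lines: the NLP side produces many annotations, but only a chosen subset becomes fields of the index document. The Annotation class, field names, and KEEP list are all hypothetical, standing in for real UIMA types and a real Lucene Document:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical annotation: a type (e.g. "lemma") plus covered text.
// Not a real UIMA type; just enough to show the hand-off.
class Annotation {
    final String type;
    final String text;

    Annotation(String type, String text) {
        this.type = type;
        this.text = text;
    }
}

// Carry only a small part of the analysis into the index: keep the
// listed annotation types and fold each into one field of a
// field-name -> field-value document map.
class IndexFeeder {
    static final List<String> KEEP = List.of("lemma", "entity");

    static Map<String, String> toDocument(List<Annotation> annotations) {
        Map<String, String> doc = new LinkedHashMap<>();
        for (Annotation a : annotations) {
            if (KEEP.contains(a.type)) {
                // Concatenate values of the same type into one field.
                doc.merge(a.type, a.text, (old, v) -> old + " " + v);
            }
        }
        return doc;
    }
}
```

Token-level annotations are dropped here on purpose: the index only sees what the application actually wants to search on.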
Couldn't agree more!
Cheers,
Grant