Re: Patch for Lucene/Solr

Lance Norskog Thu, 31 May 2012 14:10:38 -0700

Thanks. I have looked at UIMA several times and it seemed very
complex. It has a lot of features, is mature, has an Eclipse app
builder, etc., but I could not keep it all in my head at once. The
Solr/Lucene document pipeline leaves little space for NLP
features. Hydra or OpenPipeline give UIMA and OpenNLP "room to
breathe".


Are there free annotated text databases for UIMA? OpenNLP does not use
any with open licences. Its binary models are built from copyrighted
annotations, so they cannot be checked into Apache source control.

On Wed, May 30, 2012 at 6:11 PM, Christian Moen <[email protected]> wrote:
> Hello Lance,
>
> This is very cool!  I'm looking forward to having a look at this.
>
>
> Christian Moen
> http://atilika.com
>
> On May 31, 2012, at 9:54 AM, Lance Norskog wrote:
>
>> I'm creating a patch to integrate OpenNLP into the Lucene/Solr
>> project. The SentenceDetector, Tokenizer, POS tagger, Chunker, and NER
>> tools are included. The SentenceDetector and Tokenizer are wrapped
>> as a Lucene Tokenizer, and a Lucene TokenFilter takes this stream and
>> runs POS/Chunking/NER on it, saving the tags as upper-case payloads.
>> The patch includes a couple of handy combinations. For example, you
>> can make a more focused search index by indexing only the nouns & verbs.
>>
>> Do you have any hints on how to package it? The documentation should
>> include how to download and install the models.
>>
>> --
>> Lance Norskog
>> [email protected]
>
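
For readers of the thread, the "index only the nouns & verbs" combination
described in the quoted message can be sketched roughly as below. This is a
dependency-free illustration in Python, not the patch itself and not the
Lucene or OpenNLP API: the Token class and both function names are
hypothetical, the tags are supplied by hand instead of coming from a POS
tagger, and the sketch assumes Penn Treebank style tags (NN*, VB*).

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional


@dataclass
class Token:
    """Hypothetical stand-in for a token that carries a payload."""
    text: str
    payload: Optional[bytes] = None  # POS tag stored as upper-case bytes


def attach_pos_payloads(words: List[str], tags: List[str]) -> Iterator[Token]:
    """Pair each word with its tag, saving the tag as an upper-case payload.

    In a real pipeline the tags would come from a POS tagger; here they are
    passed in by hand so the sketch stays self-contained.
    """
    if len(words) != len(tags):
        raise ValueError("words and tags must have the same length")
    for word, tag in zip(words, tags):
        yield Token(text=word, payload=tag.upper().encode("ascii"))


def keep_nouns_and_verbs(tokens: Iterator[Token]) -> Iterator[Token]:
    """Pass through only tokens whose payload marks a noun or a verb.

    Assumes Penn Treebank style tags, where noun tags start with NN and
    verb tags start with VB.
    """
    for token in tokens:
        tag = (token.payload or b"").decode("ascii")
        if tag.startswith(("NN", "VB")):
            yield token


words = ["The", "patch", "integrates", "OpenNLP", "quickly"]
tags = ["dt", "nn", "vbz", "nnp", "rb"]  # hand-assigned for illustration
kept = [t.text for t in keep_nouns_and_verbs(attach_pos_payloads(words, tags))]
print(kept)
```

Keeping the tag on the token as a payload, and filtering in a separate step,
mirrors the Tokenizer-then-TokenFilter split described in the quoted message:
the same tagged stream could feed other combinations without re-running the
tagger.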



-- 
Lance Norskog
[email protected]
