It's up!

https://issues.apache.org/jira/browse/LUCENE-2899

It has sentence detection/tokenization, POS tagging, chunking, and NER. Also
some utility filters to fiddle with payloads. It is smart about caching models.

It is implemented as a Lucene Tokenizer/TokenFilter, which is a fairly limiting arena.
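
To illustrate the "only index the nouns & verbs" idea from the original message
quoted below, here is a minimal sketch in plain Java. It deliberately does not
use the Lucene or OpenNLP APIs, and it is not the code in the patch: the class
PosPayloadSketch, the TaggedToken type, and keepNounsAndVerbs are hypothetical
names made up for this example. It also assumes Penn Treebank style tags (NN*
for nouns, VB* for verbs), which is an assumption about the tag set in use.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PosPayloadSketch {

    /** A token plus the upper-case POS tag that would be stored as its payload. */
    static final class TaggedToken {
        final String term;
        final String posTag;

        TaggedToken(String term, String posTag) {
            this.term = term;
            this.posTag = posTag;
        }
    }

    /**
     * Keep only the terms tagged as nouns or verbs.
     * Assumes Penn Treebank style tags: NN, NNS, NNP, ... and VB, VBZ, VBD, ...
     */
    static List<String> keepNounsAndVerbs(List<TaggedToken> tokens) {
        List<String> kept = new ArrayList<>();
        for (TaggedToken t : tokens) {
            if (t.posTag.startsWith("NN") || t.posTag.startsWith("VB")) {
                kept.add(t.term);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Hand-tagged input; in the real pipeline the tags would come from a POS model.
        List<TaggedToken> sentence = Arrays.asList(
            new TaggedToken("The", "DT"),
            new TaggedToken("patch", "NN"),
            new TaggedToken("adds", "VBZ"),
            new TaggedToken("useful", "JJ"),
            new TaggedToken("filters", "NNS"));

        System.out.println(keepNounsAndVerbs(sentence)); // prints [patch, adds, filters]
    }
}
```

In a real Lucene analysis chain the same decision would presumably be made
inside a TokenFilter that reads each token's payload; the sketch only shows the
filtering rule itself.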

The OpenNLP build needs a little upgrading to work with the license
validation in the Lucene build. OPENNLP-511 requests this.

On Fri, Jun 1, 2012 at 4:18 AM, Svetoslav Marinov
<[email protected]> wrote:
> At Findwise we actively use a number of OpenNLP components with both Hydra
> and OpenPipeline when indexing with Solr.
>
> I look forward to seeing the result of the patch!
>
> Best,
> Svetoslav
>
> On 2012-05-31 23:10, "Lance Norskog" <[email protected]> wrote:
>
>>Thanks. I have looked at UIMA several times and it seemed very
>>complex. It has a lot of features, is mature, has an Eclipse app
>>builder, etc. I could not keep it all in my head at once. The
>>Solr/Lucene document pipeline features give little space for NLP
>>features. Hydra or OpenPipeline give UIMA and OpenNLP "room to
>>breathe".
>>
>>Are there free annotated text databases for UIMA? OpenNLP does not use
>>any with open licences. It has binary models made from copyrighted
>>annotations and so they cannot be checked into Apache.
>>
>>On Wed, May 30, 2012 at 6:11 PM, Christian Moen <[email protected]> wrote:
>>> Hello Lance,
>>>
>>> This is very cool!  I'm looking forward to having a look at this.
>>>
>>>
>>> Christian Moen
>>> http://atilika.com
>>>
>>> On May 31, 2012, at 9:54 AM, Lance Norskog wrote:
>>>
>>>> I'm creating a patch to integrate OpenNLP into the Lucene/Solr
>>>> project. The SentenceDetector, Tokenizer, POS tagger, Chunker, and NER
>>>> tools are included. The SentenceDetector and Tokenizer are a Lucene
>>>> Tokenizer, and a Lucene TokenFilter takes this stream and runs
>>>> POS/Chunking/NER on it, saving the tags as upper-case payloads. The
>>>> patch includes a couple of handy combinations. For example, make a
>>>> more focused search index by only indexing the nouns & verbs.
>>>>
>>>> Do you have any hints on how to package it? The documentation should
>>>> include how to download and install the models.
>>>>
>>>> --
>>>> Lance Norskog
>>>> [email protected]
>>>
>>
>>
>>
>>--
>>Lance Norskog
>>[email protected]
>>
>
>



-- 
Lance Norskog
[email protected]
