I'm creating a patch to integrate OpenNLP into the Lucene/Solr project. It covers the SentenceDetector, Tokenizer, POS tagger, Chunker, and NER tools. The SentenceDetector and Tokenizer are exposed as a Lucene Tokenizer, and a Lucene TokenFilter takes the resulting token stream and runs POS tagging, chunking, and NER on it, saving the tags as upper-case payloads. The patch also includes a couple of handy combinations. For example, you can build a more focused search index by indexing only the nouns and verbs.
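To make the nouns-and-verbs combination concrete, here is a rough standalone sketch of the filtering step. It deliberately does not use the Lucene or OpenNLP APIs and is not the code in the patch: the `TaggedToken` class and `keepNounsAndVerbs` method are hypothetical names, and it assumes Penn Treebank style tags in which nouns start with `NN` and verbs with `VB`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class NounVerbFilterSketch {

    /** Hypothetical stand-in for a token whose POS tag is carried as an upper-case payload. */
    static final class TaggedToken {
        final String term;
        final String posTag;

        TaggedToken(String term, String posTag) {
            this.term = term;
            this.posTag = posTag;
        }
    }

    /**
     * Returns the terms whose tag marks a noun or a verb.
     * Assumption: Penn Treebank style tags (NN, NNS, NNP, ... and VB, VBD, VBZ, ...).
     */
    static List<String> keepNounsAndVerbs(List<TaggedToken> tokens) {
        List<String> kept = new ArrayList<>();
        for (TaggedToken token : tokens) {
            if (token.posTag.startsWith("NN") || token.posTag.startsWith("VB")) {
                kept.add(token.term);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Hand-tagged input; in the real pipeline the tags would come from the POS tagger.
        List<TaggedToken> tagged = Arrays.asList(
                new TaggedToken("The", "DT"),
                new TaggedToken("quick", "JJ"),
                new TaggedToken("fox", "NN"),
                new TaggedToken("jumps", "VBZ"));

        System.out.println(keepNounsAndVerbs(tagged)); // prints [fox, jumps]
    }
}
```

In an actual Lucene analysis chain, the same test would presumably sit inside a TokenFilter that reads each token's payload and drops the tokens that do not match.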
Do you have any hints on how to package it? The documentation should include how to download and install the models.
-- Lance Norskog [email protected]
