Thanks. I have looked at UIMA several times, and each time it seemed very complex. It is mature and has a lot of features, including an Eclipse app builder, but I could not keep it all in my head at once. The Solr/Lucene document pipeline leaves little room for NLP features, whereas Hydra or OpenPipeline give UIMA and OpenNLP "room to breathe".
Are there free annotated text databases for UIMA? OpenNLP does not use any with open licences. It has binary models made from copyrighted annotations and so they cannot be checked into Apache.

On Wed, May 30, 2012 at 6:11 PM, Christian Moen <[email protected]> wrote:
> Hello Lance,
>
> This is very cool! I'm looking forward to having a look at this.
>
>
> Christian Moen
> http://atilika.com
>
> On May 31, 2012, at 9:54 AM, Lance Norskog wrote:
>
>> I'm creating a patch to integrate OpenNLP into the Lucene/Solr
>> project. The SentenceDetector, Tokenizer, POS tagger, Chunker, and NER
>> tools are included. The SentenceDetector and Tokenizer are a Lucene
>> Tokenizer, and a Lucene TokenFilter takes this stream and runs
>> POS/Chunking/NER on it, saving the tags as upper-case payloads. The
>> patch includes a couple of handy combinations. For example, make a
>> more focused search index by only indexing the nouns & verbs.
>>
>> Do you have any hints on how to package it? The documentation should
>> include how to download and install the models.
>>
>> --
>> Lance Norskog
>> [email protected]
>

--
Lance Norskog
[email protected]
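
To make the "only index the nouns & verbs" combination from the original post concrete, here is a rough sketch of the filtering idea. It is not the patch's code and does not use the Lucene or OpenNLP APIs: the `TaggedToken` type and the `keep_nouns_and_verbs` function are hypothetical names, the tags are hard-coded rather than produced by a POS tagger, and it assumes Penn Treebank style tags where noun tags start with "NN" and verb tags start with "VB".

```python
from typing import Iterable, Iterator, NamedTuple


class TaggedToken(NamedTuple):
    """A token plus its POS tag (hypothetical stand-in for a token with a payload)."""
    text: str
    pos: str  # upper-case tag, assumed to follow Penn Treebank conventions


def keep_nouns_and_verbs(tokens: Iterable[TaggedToken]) -> Iterator[TaggedToken]:
    """Pass through only tokens whose tag marks a noun (NN*) or a verb (VB*)."""
    for token in tokens:
        if token.pos.startswith(("NN", "VB")):
            yield token


# Hand-tagged sample sentence; a real pipeline would get these tags from a tagger.
sentence = [
    TaggedToken("The", "DT"),
    TaggedToken("patch", "NN"),
    TaggedToken("integrates", "VBZ"),
    TaggedToken("OpenNLP", "NNP"),
    TaggedToken("into", "IN"),
    TaggedToken("Lucene", "NNP"),
]

indexed_terms = [t.text for t in keep_nouns_and_verbs(sentence)]
print(indexed_terms)
```

Writing the filter as a generator over a token stream mirrors the shape described in the post, where a filter consumes the tokenizer's output one token at a time and decides what reaches the index.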
