I like OpenNLP for this reason too. There are too many things that can be done wrong when making a framework, especially from my narrow perspective!
Peace.
Michael

On Mon, Jul 9, 2012 at 11:57 AM, John Stewart <[email protected]> wrote:
> FWIW I agree with this perspective. So many Java-based tools grow
> into configuration monsters (cough! Hadoop). I very much appreciate
> the simplicity and self-containedness of OpenNLP, which makes it
> straightforward to integrate into many different architectures, even
> non-Java. In fact, I'd argue for making this aspect of the project
> one of its defining characteristics.
>
> jds
>
> On Mon, Jul 9, 2012 at 2:18 PM, Jeyendran Balakrishnan
> <[email protected]> wrote:
>> It is exactly the "production"-oriented aspects of UIMA that, IMHO, make it
>> unattractive for many users.
>> Almost all commercial implementations of NLP use NLP as one of the
>> components in their overall software stack,
>> i.e., they typically have their own custom framework, platform, and
>> architecture that do many things other than NLP.
>> What these commercial systems need are open source libraries to do specific
>> things, leaving their developers free to put them together according to
>> their own requirements and tradeoffs.
>> Any NLP system that forces the developer to use one particular way (eg UIMA)
>> of putting things together will not be attractive,
>> and might steer them away from otherwise great algorithm implementations due
>> to the significant additional baggage that the algorithms come with.
>>
>> Today, there are so many ways of connecting components together, including
>> workflows, platforms, configuration management, parallel batch processing
>> (Hadoop, anyone?), parallel stream processing (Storm), etc. Almost the
>> entire code-base of UIMA has nothing to do with NLP. There's a big reason
>> why IBM gave away all this code to Apache, and kept their core algorithms to
>> themselves - it was clear to them where the value is. At least to me, it is
>> clear where the value is - algorithms, and not frameworks.
>>
>> Software developers in industry are very capable of easily putting together
>> their own frameworks. What they need help with are core NLP algorithms that
>> they don't have the background to do themselves. One example I would suggest
>> (at least according to my view), is the difference between Lucene and Nutch.
>> Being a library, Lucene has pretty much taken over search engine software
>> development. Nutch, on the other hand, tries to be a full-fledged platform
>> for crawling, indexing and search, and has not gathered anywhere near the
>> same usage levels.
>>
>> My vote is to please keep OpenNLP clean, smart, algorithm-centered,
>> user-focused.
>> Keep it simple.
>> Math, stat, and algos.
>> And excel at it.
>>
>> Please don’t dumb OpenNLP down with unnecessary bloat that any decent
>> software team can do easily, and might often prefer to implement in a
>> different way.
>> Connectors, not merging.
>>
>> My two bits... :-)
>>
>> Cheers,
>> Jeyendran
>>
>>
>> -----Original Message-----
>> From: Jörn Kottmann [mailto:[email protected]]
>> Sent: Monday, July 09, 2012 1:50 AM
>> To: [email protected]
>> Subject: Re: Apache "Text Analysis" top-level project?
>>
>> On 07/09/2012 05:56 AM, Lance Norskog wrote:
>>> Would it make sense to join OpenNLP, UIMA, and Open Relevance into one
>>> top-level "Text Analysis" project? There are already cross-project
>>> connections between UIMA and OpenNLP. ORP seems dormant. It also seems
>>> a more natural place than OpenNLP for a database of tagged text.
>>>
>>>
>>
>> OpenNLP and UIMA align nicely in my opinion. OpenNLP just implements engines
>> for various NLP tasks without any further support.
>> UIMA on the other side can do a lot of these additional things you need to
>> run OpenNLP in a production system e.g. scaling the engines to many
>> machines, providing workflow support, resource loading and management, etc.
>> So there is not really an overlap between the two.
>>
>> UIMA has some NLP-related addons in their sandbox, some of them duplicate
>> functionality which is also provided by OpenNLP e.g. POS tagging, or the
>> dictionary annotator, but that does not seem to be that much.
>>
>> Lucene contains a lot of NLP code for stemming and word segmentation in
>> different languages. That's probably the biggest NLP-related code base next
>> to OpenNLP at Apache.
>>
>> Jörn
