FWIW, I agree with this perspective. So many Java-based tools grow into configuration monsters (cough! Hadoop). I very much appreciate the simplicity and self-containedness of OpenNLP, which makes it straightforward to integrate into many different architectures, even non-Java ones. In fact, I'd argue for making this aspect of the project one of its defining characteristics.
jds

On Mon, Jul 9, 2012 at 2:18 PM, Jeyendran Balakrishnan <[email protected]> wrote:
> It is exactly the "production"-oriented aspects of UIMA that, IMHO, make it
> unattractive for many users.
> Almost all commercial implementations of NLP use NLP as one of the components
> in their overall software stack,
> i.e., they typically have their own custom framework, platform, and
> architecture that do many things other than NLP.
> What these commercial systems need are open source libraries to do specific
> things, leaving their developers free to put them together according to their
> own requirements and tradeoffs.
> Any NLP system that forces the developer to use one particular way (e.g., UIMA)
> of putting things together will not be attractive,
> and might steer them away from otherwise great algorithm implementations due
> to the significant additional baggage that the algorithms come with.
>
> Today, there are so many ways of connecting components together, including
> workflows, platforms, configuration management, parallel batch processing
> (Hadoop, anyone?), parallel stream processing (Storm), etc. Almost the entire
> code-base of UIMA has nothing to do with NLP. There's a big reason why IBM
> gave away all this code to Apache, and kept their core algorithms to
> themselves - it was clear to them where the value is. At least to me, it is
> clear where the value is - algorithms, and not frameworks.
>
> Software developers in industry are very capable of easily putting together
> their own frameworks. What they need help with are core NLP algorithms that
> they don't have the background to do themselves. One example I would suggest
> (at least according to my view) is the difference between Lucene and Nutch.
> Being a library, Lucene has pretty much taken over search engine software
> development. Nutch, on the other hand, tries to be a full-fledged platform
> for crawling, indexing and search, and has not gathered anywhere near the
> same usage levels.
>
> My vote is to please keep OpenNLP clean, smart, algorithm-centered,
> user-focused.
> Keep it simple.
> Math, stat, and algos.
> And excel at it.
>
> Please don’t dumb OpenNLP down with unnecessary bloat that any decent
> software team can do easily, and might often prefer to implement in a
> different way.
> Connectors, not merging.
>
> My two bits... :-)
>
> Cheers,
> Jeyendran
>
>
> -----Original Message-----
> From: Jörn Kottmann [mailto:[email protected]]
> Sent: Monday, July 09, 2012 1:50 AM
> To: [email protected]
> Subject: Re: Apache "Text Analysis" top-level project?
>
> On 07/09/2012 05:56 AM, Lance Norskog wrote:
>> Would it make sense to join OpenNLP, UIMA, and Open Relevance into one
>> top-level "Text Analysis" project? There are already cross-project
>> connections between UIMA and OpenNLP. ORP seems dormant. It also seems
>> a more natural place than OpenNLP for a database of tagged text.
>>
>
> OpenNLP and UIMA align nicely in my opinion. OpenNLP just implements engines
> for various NLP tasks without any further support.
> UIMA, on the other hand, can do a lot of these additional things you need to
> run OpenNLP in a production system, e.g. scaling the engines to many machines,
> providing workflow support, resource loading and management, etc.
> So there is not really an overlap between the two.
>
> UIMA has some NLP-related addons in their sandbox, some of them duplicate
> functionality which is also provided by OpenNLP, e.g. POS tagging, or the
> dictionary annotator, but that does not seem to be that much.
>
> Lucene contains a lot of NLP code for stemming and word segmentation in
> different languages. That's probably the biggest NLP-related code base next to
> OpenNLP at Apache.
>
> Jörn
