We could distribute it with our main release, similar to how we do with opennlp-uima. I think that would make sense. If people would like to use it they can add it as an extra dependency.
There are probably also other things we can distribute in a similar fashion with the next release.

Jörn

On Fri, Jul 15, 2016 at 3:34 PM, William Colen <william.co...@gmail.com> wrote:

> Not only licensing, but I also think we try to keep OpenNLP without
> external dependencies. Morfologik also has some dependencies itself.
>
> 2016-07-15 4:55 GMT-03:00 Rodrigo Agerri <rage...@apache.org>:
>
> > Great stuff, William.
> >
> > I have been using Morfologik stemming for a long time and when we
> > included it we put it as an addon. I assume that the reason was its
> > license, but reading the Morfologik license it is not clear to me why
> > it is not Apache compatible.
> >
> > If it is, it would be nice to include it directly in OpenNLP.
> >
> > Can anyone shed any light on this?
> >
> > Thanks,
> >
> > R
> >
> > On Fri, Jul 15, 2016 at 12:02 AM, William Colen <william.co...@gmail.com>
> > wrote:
> > > Hello,
> > >
> > > A while back we started working on a Morfologik Addon.
> > >
> > > http://svn.apache.org/viewvc/opennlp/addons/
> > >
> > > I checked it out last week and noticed it was outdated, especially
> > > because it was not using the latest Morfologik version. It was also
> > > missing documentation.
> > >
> > > You can find more about Morfologik here:
> > > https://github.com/morfologik/morfologik-stemming
> > >
> > > Morfologik provides tools for finite state automata (FSA) construction
> > > and dictionary-based morphological dictionaries.
> > >
> > > The Morfologik Addon implements some OpenNLP interfaces and extends
> > > some classes to make it easier to use Morfologik FSA dictionaries:
> > >
> > > - opennlp.morfologik.tagdict.MorfologikPOSTaggerFactory
> > >    - Extends: opennlp.tools.postag.POSTaggerFactory
> > >    - Helps create a POSTagger model with an embedded TagDictionary
> > >      based on FSA
> > > - opennlp.morfologik.tagdict.MorfologikTagDictionary
> > >    - Implements: opennlp.tools.postag.TagDictionary
> > >    - A TagDictionary based on FSA is much smaller than the default
> > >      XML-based one, and consumes less memory.
> > > - opennlp.morfologik.lemmatizer.MorfologikLemmatizer
> > >    - Implements: opennlp.tools.lemmatizer.DictionaryLemmatizer
> > >    - A dictionary-based lemmatizer that uses an FSA dictionary.
> > >
> > > It also provides a command line interface with the following tools:
> > >
> > > - MorfologikDictionaryBuilder
> > >    - builds a binary POS dictionary using Morfologik
> > > - XMLDictionaryToTable
> > >    - reads an OpenNLP XML tag dictionary and outputs it as a
> > >      tab-separated file that can be built into an FSA dictionary
> > >
> > > In a project I developed it was of great help. The tag dictionary for
> > > the POS tagger was huge (something like 50 MB), requiring a lot of
> > > memory. Migrating it to an FSA dictionary not only gave a smaller
> > > model, but also let me use the model without increasing the JVM
> > > memory.
> > >
> > > More here:
> > > https://cwiki.apache.org/confluence/display/OPENNLP/FSA+Dictionary+with+morfologik-addon
> > >
> > > Hope it will be helpful.
> > >
> > > William
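
For anyone who has not seen the "XML tag dictionary to tab-separated table" step mentioned in the quoted message, here is a rough, self-contained sketch of the idea. It is not the addon's XMLDictionaryToTable code: the class name TagDictionaryTable, the in-memory map standing in for the parsed XML dictionary, and the two-column "word, tag" layout are all assumptions made for illustration. The column layout that Morfologik actually expects may differ, so check the wiki page linked above before relying on it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical helper for illustration only; it is not part of OpenNLP
 * or of the Morfologik addon.
 */
final class TagDictionaryTable {

    private TagDictionaryTable() {}

    /**
     * Flattens a word -> tags mapping into "word<TAB>tag" lines,
     * emitting one line per (word, tag) pair.
     * The two-column layout is an assumption, not the addon's format.
     */
    static List<String> toTable(Map<String, List<String>> tagDictionary) {
        List<String> lines = new ArrayList<>();
        // Copy into a TreeMap so the output order is sorted and deterministic.
        for (Map.Entry<String, List<String>> entry
                : new TreeMap<>(tagDictionary).entrySet()) {
            for (String tag : entry.getValue()) {
                lines.add(entry.getKey() + "\t" + tag);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // Tiny stand-in for a parsed tag dictionary.
        Map<String, List<String>> dict = Map.of(
                "run", List.of("NN", "VB"),
                "house", List.of("NN"));
        toTable(dict).forEach(System.out::println);
    }
}
```

A flat table like this is the kind of input a dictionary builder can compile into a compact automaton, which is where the memory savings described in the quoted message would come from.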