We could distribute it with our main release, as we do with
opennlp-uima. I think that would make sense. If people would like to use it,
they can add it as an extra dependency.

There are probably also other things we can distribute in a similar fashion
with the next release.

Jörn

On Fri, Jul 15, 2016 at 3:34 PM, William Colen <william.co...@gmail.com>
wrote:

> It is not only licensing; I think we also try to keep OpenNLP free of
> external dependencies. Morfologik also has some dependencies of its own.
>
>
> 2016-07-15 4:55 GMT-03:00 Rodrigo Agerri <rage...@apache.org>:
>
> > Great stuff, William.
> >
> > I have been using Morfologik stemming for a long time and when we
> > included it we put it as an addon. I assume that the reason was its
> > license, but reading the Morfologik license it is not clear to me why
> > it is not Apache compatible.
> >
> > If it is, it would be nice to include it directly in OpenNLP.
> >
> > Can anyone shed any light on this?
> >
> > Thanks,
> >
> > R
> >
> > On Fri, Jul 15, 2016 at 12:02 AM, William Colen <william.co...@gmail.com>
> > wrote:
> > > Hello,
> > >
> > > A while back we started working on a Morfologik Addon.
> > >
> > > http://svn.apache.org/viewvc/opennlp/addons/
> > >
> > > I checked it out last week and noticed it was outdated, especially
> > > because it was not using the latest Morfologik version. It was also
> > > missing documentation.
> > >
> > > You can find more about Morfologik here:
> > > https://github.com/morfologik/morfologik-stemming
> > >
> > > Morfologik provides tools for finite state automata (FSA) construction
> > > and dictionary-based morphological dictionaries.
> > >
> > > The Morfologik Addon implements some OpenNLP interfaces and extends
> > > some classes to make it easier to use Morfologik FSA dictionaries:
> > >
> > >    - opennlp.morfologik.tagdict.MorfologikPOSTaggerFactory
> > >       - Extends: opennlp.tools.postag.POSTaggerFactory
> > >       - Helps create a POSTagger model with an embedded TagDictionary
> > >       based on FSA
> > >    - opennlp.morfologik.tagdict.MorfologikTagDictionary
> > >       - Implements: opennlp.tools.postag.TagDictionary
> > >       - A TagDictionary based on FSA is much smaller than the default
> > >       XML-based one and consumes less memory.
> > >    - opennlp.morfologik.lemmatizer.MorfologikLemmatizer
> > >       - Implements: opennlp.tools.lemmatizer.DictionaryLemmatizer
> > >       - A dictionary-based lemmatizer that uses an FSA dictionary.
> > >
> > > It also provides the following command line tools:
> > >
> > >    - MorfologikDictionaryBuilder
> > >       - builds a binary POS dictionary using Morfologik
> > >    - XMLDictionaryToTable
> > >       - reads an OpenNLP XML tag dictionary and outputs it as a
> > >       tab-separated file that can be built into an FSA dictionary
> > >
> > > It was of great help in a project I developed. The tag dictionary for
> > > POS tagging was huge (something like 50 MB) and required a lot of
> > > memory. Migrating it to an FSA dictionary not only gave a smaller model
> > > but also let me use the model without increasing the JVM memory.
> > >
> > > More here:
> > >
> > > https://cwiki.apache.org/confluence/display/OPENNLP/FSA+Dictionary+with+morfologik-addon
> > >
> > > Hope it will be helpful.
> > >
> > > William
> >
>

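For readers unfamiliar with the XMLDictionaryToTable step described in the
quoted message, here is a rough, standalone sketch of the idea: flatten an
XML tag dictionary into tab-separated rows. It is not the addon's code. The
class name TagDictionaryTableSketch, the XML layout (entry elements with a
"tags" attribute and a "token" child) and the "word<TAB>tag" column order are
all assumptions made for illustration; the thread does not state the exact
input or output format the real tool uses.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical illustration only; not part of OpenNLP or the Morfologik addon.
class TagDictionaryTableSketch {

    // Turns an XML tag dictionary (layout assumed, see text above) into
    // tab-separated "word<TAB>tag" rows, one row per word/tag pair.
    static List<String> toTable(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            List<String> rows = new ArrayList<>();
            NodeList entries = doc.getElementsByTagName("entry");
            for (int i = 0; i < entries.getLength(); i++) {
                Element entry = (Element) entries.item(i);
                // Assumption: each entry holds exactly one <token> child.
                String word = entry.getElementsByTagName("token")
                        .item(0).getTextContent().trim();
                // Assumption: tags are whitespace-separated in the "tags" attribute.
                for (String tag : entry.getAttribute("tags").trim().split("\\s+")) {
                    if (!tag.isEmpty()) {
                        rows.add(word + "\t" + tag);
                    }
                }
            }
            return rows;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not read tag dictionary XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<dictionary>"
                + "<entry tags=\"NN VB\"><token>walk</token></entry>"
                + "<entry tags=\"DT\"><token>the</token></entry>"
                + "</dictionary>";
        toTable(xml).forEach(System.out::println);
    }
}
```

A table like this would then be the input to a dictionary builder such as
the MorfologikDictionaryBuilder mentioned in the quoted message; the columns
that tool actually expects should be checked against the addon documentation.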