Yes. I see your point.

Here I store the dictionary as a text file, and encoding it is part of
the build process, so the dictionary is easy to update.
Maybe we should create an API that supports multiple implementations. A
default implementation could use JWNL, which is already available, and we
can create other implementations in the sandbox as optional packages.
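
To make the idea concrete, here is a minimal sketch of what such an API
could look like. Everything in it is an assumption for discussion: the
names Lemmatizer and DictionaryLemmatizer are hypothetical and do not
exist in OpenNLP, and the tab-separated line format (token, POS tag,
comma-separated lemmas) is only one possible way to store the text
dictionary.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical interface; not an existing OpenNLP type. */
interface Lemmatizer {
  /** Returns the candidate lemmas for a token and POS tag, or an empty list if unknown. */
  List<String> lemmatize(String token, String posTag);
}

/**
 * Hypothetical plain-text implementation.
 * Assumed line format: token TAB posTag TAB lemma[,lemma...]
 */
class DictionaryLemmatizer implements Lemmatizer {
  private final Map<String, List<String>> entries = new HashMap<>();

  DictionaryLemmatizer(Reader in) {
    try {
      BufferedReader reader = new BufferedReader(in);
      String line;
      while ((line = reader.readLine()) != null) {
        String[] cols = line.split("\t");
        if (cols.length != 3) {
          continue; // skip blank or malformed lines
        }
        entries.put(key(cols[0], cols[1]), Arrays.asList(cols[2].split(",")));
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** The lookup key is the [token + POS tag] pair. */
  private static String key(String token, String posTag) {
    return token + "\t" + posTag;
  }

  @Override
  public List<String> lemmatize(String token, String posTag) {
    return entries.getOrDefault(key(token, posTag), Collections.emptyList());
  }
}
```

A JWNL-backed or Morfologik-backed class would implement the same
interface, so callers would only ever depend on Lemmatizer and the
backing library stays optional.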


On Wed, Apr 10, 2013 at 10:14 AM, Rodrigo Agerri <[email protected]> wrote:

> Hello,
>
> I used Morfologik and LanguageTool for grammar correction. It can be
> tricky to create and re-create the binary dictionaries, although it is
> true that once a dictionary is created the speed is very good.
>
> In any case, that would also create a dependency on Morfologik for
> creating and accessing the dictionaries.
>
> Cheers,
>
> Rodrigo
>
>
>
> On Wed, Apr 10, 2013 at 2:34 PM, William Colen <[email protected]>
> wrote:
> > Hi,
> >
> > +1 for a lemmatizer API
> >
> > For my Master's project I created a lemma dictionary whose keys were
> > [token + POS tag] pairs and whose values were one or more lemmas.
> >
> > To store and access the entries I used a very nice Java tool, available
> > under a BSD license, that is part of Morfologik (
> > http://sourceforge.net/projects/morfologik). This tool encodes the
> > dictionary as a finite-state automaton, which allows very efficient
> > access and a compact dictionary.
> > The tool also provides an efficient way of encoding and accessing lexical
> > dictionaries.
> >
> > The LanguageTool members wrote a tutorial on how to use Morfologik for
> > this: http://wiki.languagetool.org/developing-a-tagger-dictionary
> >
> >
> >
> > On Wed, Apr 10, 2013 at 9:02 AM, Rodrigo Agerri <[email protected]>
> > wrote:
> >
> >> On Wed, Apr 10, 2013 at 1:00 PM, Jörn Kottmann <[email protected]>
> >> wrote:
> >> >
> >> > +1, it would be nice to have control over the dictionary. Maybe we can
> >> > come up with a format to store it in. That will allow us to easily
> >> > include it in our models as a resource for feature generation and
> >> > eliminate the dependency on external libraries.
> >>
> >> I do not know yet which dictionary format will be best, but I can try
> >> to come up with a proposal independent of WordNet or other third-party
> >> resources and, once I have it working, we can discuss it.
> >>
> >> >
> >> > +1
> >> >
> >> > We should define an interface which allows us to use different
> >> > implementations, like we did for the other components.
> >>
> >> OK.
> >>
> >> Cheers,
> >>
> >> Rodrigo
> >>
>
