Hello,

I used Morfologik and LanguageTool for grammar correction. It can be
tricky to create and re-create the binary dictionaries, although it is
true that once a dictionary is created the speed is very good.

In any case, that would also create a dependency on Morfologik for
creating and accessing the dictionaries.
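
One way to keep that dependency optional is to hide the storage behind a
small interface. The following is only a minimal sketch, assuming a
dictionary keyed by token + POS tag as described in the quoted message
below. The names Lemmatizer and MapLemmatizer are hypothetical and are not
part of OpenNLP or Morfologik; a Morfologik-backed class could implement the
same interface later.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical interface for illustration; not an existing OpenNLP API.
interface Lemmatizer {
    // Returns the candidate lemmas for a token with the given POS tag,
    // or an empty list when the pair is not in the dictionary.
    List<String> lemmatize(String token, String posTag);
}

// Plain in-memory implementation with no external dependencies.
final class MapLemmatizer implements Lemmatizer {
    private final Map<String, List<String>> entries = new HashMap<>();

    // Builds the lookup key from token and POS tag, separated by a tab.
    private static String key(String token, String posTag) {
        return token + "\t" + posTag;
    }

    // Registers one or more lemmas for a [token + POS tag] pair.
    void add(String token, String posTag, List<String> lemmas) {
        entries.put(key(token, posTag),
                Collections.unmodifiableList(new ArrayList<>(lemmas)));
    }

    @Override
    public List<String> lemmatize(String token, String posTag) {
        return entries.getOrDefault(key(token, posTag),
                Collections.<String>emptyList());
    }
}
```

Code written against Lemmatizer would not need to know whether the entries
come from a hash map, a text file, or a finite-state automaton.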

Cheers,

Rodrigo



On Wed, Apr 10, 2013 at 2:34 PM, William Colen <[email protected]> wrote:
> Hi,
>
> +1 for a lemmatizer API
>
> For my Master's project I created a lemma dictionary whose keys were
> [token + POS tag] and whose values were one or more lemmas.
>
> To store and access the entries I used a very nice Java tool, available
> under a BSD license, that is part of the Morfologik project (
> http://sourceforge.net/projects/morfologik). This tool encodes the
> dictionary as a finite-state automaton, allowing very efficient access and
> a compact dictionary.
> The tool also provides an efficient way of encoding and accessing lexical
> dictionaries.
>
> The LanguageTool members wrote a tutorial on how to use Morfologik for
> this: http://wiki.languagetool.org/developing-a-tagger-dictionary
>
>
>
> On Wed, Apr 10, 2013 at 9:02 AM, Rodrigo Agerri <[email protected]> wrote:
>
>> On Wed, Apr 10, 2013 at 1:00 PM, Jörn Kottmann <[email protected]>
>> wrote:
>> >
>> > +1, it would be nice to have control over the dictionary, maybe we can
>> > come up with a format to store it in. That will allow us to easily
>> > include it in our models as a resource for feature generation and
>> > eliminate the dependency on external libraries.
>>
>> I do not know yet which dictionary format will be best, but I can try
>> to come up with a proposal independent of WordNet or other third party
>> resources, when I have it working, and then discuss it.
>>
>> >
>> > +1
>> >
>> > We should define an interface which allows us to use different
>> > implementations, like we did for the other components.
>>
>> OK.
>>
>> Cheers,
>>
>> Rodrigo
>>
