On Tue, Aug 24, 2010 at 1:04 PM, mvillarino <mvillar...@gmail.com> wrote:
>>> I've got a question: does it make sense to use Spanish stemmer for
>>> Catalan?  How about Portuguese for Galician?
>> I don't really know about this. I'll forward this message to the
>> Galician free software localization mailing list.
> I have never tried to stem Galician with the Portuguese stemmer;
> however, it should fail for verbs with enclitic particles
> (amabachellela -> amava-che-lhe-la).
> Regarding stemmers and CAT tools, Lokalize from trunk (I have not yet
> tested KDE SC 4.5) stems prior to searching the glossary, that is, it
> stems the source text before searching for matches in the stemmed
> version of the glossary (because not everybody uses the glossary for
> terminology only). Although Mikola initially considered using
> Snowball, it has finally been done through Hunspell's stemmer, thus
> supporting a wider set of languages.
> The results? Well, it is so new that most of Lokalize's users have
> probably not yet noticed this functionality. I did some testing on a
> pre-release checkout of the sources and I personally like the result,
> but...
> ...but please take into account that by stemming, Hunspell means
> doing a "reverse spellchecking": for each word it offers as stems
> whatever words in the dictionary can be derived into the word in the
> text, so please expect a lot of "false matches". By the way, I find
> them very useful both to check the quality of the glossary and to
> pray for this process to use additional information from a POS
> tagger some day in the near future.

Now that you have implemented stemming, we cannot search for the
translation of some words such as "filtering", which is both a noun and
a verb. Have you considered any way to solve this, Jacek?
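
To make the problem concrete, here is a rough sketch of the
"stem before lookup" idea described in the quoted message. It is only an
illustration under stated assumptions: the suffix list and the functions
candidate_stems, build_index and lookup are hypothetical, and they stand
in for a real stemmer; this is not Lokalize's code nor the Hunspell API.
Like the "reverse spellchecking" described above, the toy stemmer returns
every candidate stem, so one source word can hit several glossary entries.

```python
from collections import defaultdict

# Hypothetical toy suffix list; a real tool would call a proper stemmer.
SUFFIXES = ("ing", "ed", "es", "s")


def candidate_stems(word):
    """Return the word itself plus every form obtained by stripping a suffix."""
    word = word.lower()
    stems = {word}
    for suffix in SUFFIXES:
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            stems.add(word[: -len(suffix)])
    return stems


def build_index(glossary):
    """Map each candidate stem of every glossary term to the matching terms."""
    index = defaultdict(set)
    for term in glossary:
        for stem in candidate_stems(term):
            index[stem].add(term)
    return index


def lookup(text, index):
    """Stem each source word and collect the glossary terms sharing a stem."""
    matches = set()
    for word in text.split():
        for stem in candidate_stems(word.strip(".,;:!?\"'()")):
            matches |= index.get(stem, set())
    return matches


# Illustrative glossary entries (assumed, not taken from a real glossary).
glossary = {"filter": "filtro", "filtering": "filtrado"}
index = build_index(glossary)
found = lookup("Filtering the messages", index)
```

In this sketch the single source word "Filtering" matches both the
"filter" and the "filtering" entries, because nothing tells the lookup
whether the word is used as a noun or as a verb. Part-of-speech
information, as wished for in the quoted message, would be one way to
narrow such matches.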