Re: [open-tran] Stemming

2010-08-24 Thread mvillarino
 I've got a question: does it make sense to use Spanish stemmer for
 Catalan?  How about Portuguese for Galician?

 I don't really know about this. I'll forward this message to the
 Galician free software localization mailing list.

I have never tried to stem Galician with the Portuguese stemmer, but it
should fail for verbs with enclitic particles (amabachellela -
amava-che-lhe-la).

Regarding stemmers and CAT tools, Lokalize from trunk (I have not yet
tested KDE SC 4.5) stems before searching the glossary; that is, it
stems the source text and then searches for matches in the stemmed
version of the glossary (because not everybody uses the glossary for
terminology only). Although Mikola initially considered using
Snowball, it was finally done with Hunspell's stemmer, which supports
a wider set of languages.
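
A minimal sketch of the "stem first, then look up" idea, in Python. This
is not Lokalize's code: toy_stem is a hypothetical stand-in that only
strips a plural "s", and the three glossary entries are invented for
illustration.

```python
# Toy illustration of stemming the source text before a glossary lookup.
# Every name here is hypothetical; none of it comes from Lokalize.

def toy_stem(word):
    """Stand-in for a real stemmer: lowercase and strip a final 's'."""
    word = word.lower()
    if word.endswith("s") and len(word) > 3:
        return word[:-1]
    return word


def build_stemmed_glossary(glossary):
    """Index glossary entries by the stem of their source term."""
    index = {}
    for source_term, target_term in glossary.items():
        index.setdefault(toy_stem(source_term), []).append((source_term, target_term))
    return index


def find_glossary_matches(text, index):
    """Stem each word of the source text and collect glossary hits in order."""
    matches = []
    for word in text.split():
        stem = toy_stem(word.strip(".,;:!?"))
        for entry in index.get(stem, []):
            if entry not in matches:
                matches.append(entry)
    return matches


glossary = {"file": "ficheiro", "window": "xanela", "folder": "cartafol"}
index = build_stemmed_glossary(glossary)
print(find_glossary_matches("Open the files in new windows.", index))
# prints [('file', 'ficheiro'), ('window', 'xanela')]
```

Because both sides are stemmed, "files" and "windows" match the singular
glossary entries; a plain string lookup would miss them.
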
The results? Well, it is so new that most Lokalize users have probably
not yet noticed this functionality. I did some testing on a
pre-release checkout of the sources and I personally like the result,
but...
...but please take into account that by "stemming" Hunspell means
doing a reverse spell check: for each word, it offers as stems
whatever dictionary words can be derived into the word in the text,
so please expect a lot of false matches. By the way, I find them very
useful, both for checking the quality of the glossary and as a reason
to pray that this process will use additional information from a POS
tagger some day in the near future.
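
To make the "reverse spell check" point concrete, here is a toy model in
Python. It is only a sketch: the dictionary and suffix rules below are
invented and far simpler than real Hunspell dictionary and affix files,
and the function names are hypothetical.

```python
# Toy model of stemming as reverse spell checking: a dictionary entry is
# offered as a stem whenever one of its rules can produce the word.

# Hypothetical entries; each rule is (ending_to_remove, ending_to_add).
DICTIONARY = {
    "amar": [("ar", "as"), ("ar", "abas")],  # verb forms: amas, amabas
    "ama": [("", "s")],                      # noun plural: amas
    "pedra": [("", "s")],                    # noun plural: pedras
}


def generate_forms(entry, rules):
    """Return the entry itself plus every form its rules can derive."""
    forms = {entry}
    for strip, add in rules:
        if entry.endswith(strip):
            forms.add(entry[: len(entry) - len(strip)] + add)
    return forms


def candidate_stems(word):
    """List every dictionary entry that can be derived into `word`."""
    return sorted(
        entry
        for entry, rules in DICTIONARY.items()
        if word in generate_forms(entry, rules)
    )


print(candidate_stems("amabas"))  # prints ['amar']
print(candidate_stems("amas"))    # prints ['ama', 'amar']
```

The second call returns two candidates for one surface form. Without
part-of-speech information there is no way to choose between them, which
is where the false matches come from.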

All the best,
Marce Villarino
___
Proxecto mailing list
Proxecto@trasno.net
http://listas.trasno.net/listinfo/proxecto


Re: [open-tran] Stemming

2010-08-24 Thread Miguel Solla
Actually, my gut tells me that the Hunspell stemmer does not behave that
way. In the tests I ran, the stemmer ($hunspell -s) identifies a lemma
only when the word has a single suffix (and does not identify it when
there is double recursion in the suffixes, as is the case with enclitic
pronouns):

amabas
amabas amar

amábachellela
amábachellela

trouxo
trouxo traer

tróuxocho
tróuxocho

pedras
pedras pedra

It gets nominal and verbal inflection right, but it misses with enclitic
pronouns. Maybe in future versions of Hunspell...
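
The difference between one level of suffix analysis and repeated
stripping can be sketched in Python. This is a toy under loud
assumptions: the lookup table, the enclitic list and the naive accent
removal are all invented for illustration, and none of it reflects how
Hunspell works internally.

```python
import unicodedata

# Hypothetical table of forms that a single suffix rule can resolve.
SINGLE_SUFFIX_LEMMAS = {
    "amabas": "amar",
    "amaba": "amar",
    "trouxo": "traer",
    "pedras": "pedra",
}
# Hypothetical, incomplete list of enclitic endings.
ENCLITICS = ("cho", "che", "lle", "la")


def strip_acute(text):
    """Naively drop acute accents (a toy shortcut, wrong in general)."""
    decomposed = unicodedata.normalize("NFD", text)
    return unicodedata.normalize("NFC", decomposed.replace("\u0301", ""))


def single_level_lemma(word):
    """One level of analysis: the form must be in the table as written."""
    return SINGLE_SUFFIX_LEMMAS.get(word)


def recursive_lemma(word):
    """Strip enclitic endings one at a time until the table matches."""
    current = word
    while True:
        lemma = SINGLE_SUFFIX_LEMMAS.get(strip_acute(current))
        if lemma is not None:
            return lemma
        for clitic in ENCLITICS:
            if current.endswith(clitic) and len(current) > len(clitic):
                current = current[: -len(clitic)]
                break
        else:
            return None  # nothing left to strip and still no match


for form in ("amabas", "tróuxocho", "amábachellela"):
    print(form, single_level_lemma(form), recursive_lemma(form))
```

In this toy, the single-level lookup returns None for the two forms with
enclitic pronouns, while the repeated stripping reaches traer and amar.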


Incomplete KDE translations in Ubuntu

2010-08-24 Thread Fran Dieguez
Hi guys,

looking through Ubuntu, I see that there are a lot of KDE packages that
are not complete:

https://translations.edge.launchpad.net/ubuntu/maverick/+lang/gl/+index?start=300&batch=50
and the pages before and after it.

and I wonder whether there is some problem that keeps them from being
imported correctly, or whether your group simply cannot complete them.

In GNOME I also have quite a few that have not been imported even
though they are translated upstream. In any case, I expect that they
will all be imported once the official, final version of GNOME 2.32 is
released. Is the same thing happening in KDE?

Regards


Re: [open-tran] Stemming

2010-08-24 Thread mvillarino
2010/8/24, Miguel Solla brado...@gmail.com:
 Actually, my gut tells me that the Hunspell stemmer does not behave that way.

Quite possibly

 Maybe in future versions of Hunspell...

Not if nobody files an RFE