[translate-pootle] Glossary stuff (was: Re: Frequency list)

2009-01-22 Thread Samuel Murray (Groenkloof)
Leandro Regueiro wrote:

 Another thing is that a good glossary does not contain words as such. A
 good glossary has only concepts as entries, and several entries could
 have the same word (because words can have several meanings).

That is fine, from an academic point of view, but the fact is that a 
glossary function must have the ability to recognise items from the 
source text that are in the glossary.  No program can recognise 
concepts.  Only words can be matched.  Therefore, glossaries must be 
word based.

 Sometimes it could be a good idea to have several glossaries, because
 you don't use the same words in Battle for Wesnoth as in Firefox, for
 example.

Well, I think a super list is not a bad idea.  Any project manager can
then take the super list and make the changes to it that he thinks are
best for his particular project, but the super list remains unchanged.

Isn't Martin Benjamin working on such a list via AnLoc?
http://africanlocalisation.net/en/terminology

 Good support (or even any support at all) for glossaries is sorely
 lacking in a lot of CAT programs. In Lokalize there is some support for
 this:
 http://youonlylivetwice.info/lokalize/lokalize-glossary.htm

Well, I think there are four important glossary tasks in CAT tools, 
namely term recognition, term insertion, term adding and term editing. 
Term recognition is an automatic process whereby the tool searches 
existing glossaries for matching terms in the current source text 
segment.  Term insertion is the ability to insert a term's translation 
into the target field in some easy way.  Term adding is the ability to 
add terms (and their translations) to glossaries used by the term 
recognition function.  Term editing is the ability to make changes to 
existing glossary entries.
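
To make the four tasks concrete, here is a minimal Python sketch.  It is
only an illustration: the Glossary class, its method names and the sample
Spanish translations are invented for this example and are not taken
from OmegaT, Pootle or any other tool.

```python
import re


class Glossary:
    """Toy word-based glossary; every name here is hypothetical."""

    def __init__(self):
        self.entries = {}  # source term (lowercase) -> translation

    def add_term(self, source, target):
        """Term adding: store a new term and its translation."""
        self.entries[source.lower()] = target

    def edit_term(self, source, new_target):
        """Term editing: change the translation of an existing entry."""
        key = source.lower()
        if key not in self.entries:
            raise KeyError(source)
        self.entries[key] = new_target

    def recognise(self, segment):
        """Term recognition: list glossary terms found in a source segment."""
        words = re.findall(r"\w+", segment.lower())
        return [word for word in words if word in self.entries]

    def insert(self, target_text, source_term):
        """Term insertion: append the term's translation to the target text."""
        return target_text + self.entries[source_term.lower()]


glossary = Glossary()
glossary.add_term("file", "fichero")    # sample data only
glossary.add_term("folder", "carpeta")
glossary.edit_term("file", "archivo")   # change an existing entry
found = glossary.recognise("Open the file in this folder")
draft = glossary.insert("Abrir el ", found[0])
```

Recognition here is exact, single-word matching only; multi-word terms
and inflected forms would need more than this.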

Most CAT tools that I know of offer term recognition.  Even if a tool
offers only term recognition, it can already benefit greatly from a 
pre-existing super glossary.

For comparison:  A CAT tool that offers only term recognition (not the
other three) is OmegaT.  A CAT tool that offers both term recognition
and term insertion is Pootle.  In both OmegaT and Pootle, it is not
possible to add terms to the glossary without using a separate program.
OmegaT's glossaries are easier to edit (use a text editor) but you
must reload the project each time.  Pootle's glossaries are more
difficult to edit (unless you're running a local Pootle), but new terms
are recognised immediately (if I remember correctly).

 From the presentation, it appears that KBab^H^H^H^HLokalize can do term 
recognition, term insertion and term adding (and possibly also term 
editing).

A way to judge a CAT tool's term recognition is (a) whether it can do 
fuzzy matching when doing glossary recognition, and (b) whether one can 
customise the matching process using techniques like (i) stemming and 
(ii) setting truncation rules.  If I remember correctly, Pootle can do 
#a but not #b.  OmegaT can do neither.  Wordfast can do #a, #b1 and #b2.
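
As a rough sketch of what customising the matching process might look
like, the function below takes a pluggable normalisation step, so that a
crude stemmer or a truncation rule can be switched in.  The names
naive_stem, truncate and recognise are hypothetical, and this makes no
claim about how Wordfast or Pootle actually do it.

```python
def naive_stem(word):
    """Very crude suffix stripping; a real tool would use a proper stemmer."""
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def truncate(word, keep=4):
    """Truncation rule: compare only the first `keep` characters."""
    return word[:keep]


def recognise(word, glossary, normalise=lambda w: w):
    """Return glossary terms whose normalised form equals that of `word`."""
    key = normalise(word.lower())
    return [term for term in glossary if normalise(term.lower()) == key]


glossary = ["file", "print", "folder"]
exact = recognise("files", glossary)                  # no normalisation
stemmed = recognise("files", glossary, naive_stem)    # stemming
truncated = recognise("printing", glossary, truncate) # truncation rule
```

With plain exact matching "files" finds nothing; with the stemmer it
finds "file", and with the four-character truncation rule "printing"
finds "print".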

A way to judge a CAT tool's term insertion is (a) whether it can be done 
using only the keyboard and (b) whether it can make changes to the 
target text term in the light of the current text (e.g. (i) if the SL word
starts with a capital letter, but the glossary item does not, will the 
CAT tool insert the target term with a capital letter, or (ii) if the SL 
word contains an accelerator, can the CAT tool give the inserted 
translation an accelerator also).  Pootle fails on both #a and #b. 
Wordfast can do #a and #b1 but not #b2.
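
A small sketch of the two adaptations in (b), assuming that "&" is the
accelerator marker (as in Qt or Windows resources) and, naively, that the
accelerator simply goes on the first letter of the translation.  The
function name adapt_term is made up for the example.

```python
def adapt_term(source_word, target_term, marker="&"):
    """Adapt a glossary translation to the form of the source word."""
    plain_source = source_word.replace(marker, "")
    result = target_term
    # (i) Copy an initial capital from the source word.
    if plain_source[:1].isupper():
        result = result[:1].upper() + result[1:]
    # (ii) If the source word has an accelerator, give the target one too.
    # Assumption: it is placed on the first letter, which may clash with
    # other accelerators in the same menu.
    if marker in source_word:
        result = marker + result
    return result


capitalised = adapt_term("File", "archivo")
accelerated = adapt_term("&File", "archivo")
unchanged = adapt_term("file", "archivo")
```

A real tool would also have to pick an accelerator letter that is not
already in use in the same menu, which this sketch ignores.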

How does Lokalize fare in the light of the above?

What other CAT tools were you thinking of when you made your comment?

Samuel



-- 
Samuel Murray
sam...@translate.org.za
Decathlon, for volunteer opensource translations
http://translate.sourceforge.net/wiki/decathlon/

--
This SF.net email is sponsored by:
SourcForge Community
SourceForge wants to tell your story.
http://p.sf.net/sfu/sf-spreadtheword
___
Translate-pootle mailing list
Translate-pootle@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/translate-pootle


Re: [translate-pootle] Glossary stuff (was: Re: Frequency list)

2009-01-22 Thread Leandro Regueiro
 Another thing is that a good glossary does not contain words as such. A
 good glossary has only concepts as entries, and several entries could
 have the same word (because words can have several meanings).

 That is fine, from an academic point of view, but the fact is that a
 glossary function must have the ability to recognise items from the source
 text that are in the glossary.  No program can recognise concepts.  Only
 words can be matched.  Therefore, glossaries must be word based.

Since glossaries are maintained by humans, I think the glossaries
could be concept based.


 Sometimes it could be a good idea to have several glossaries, because
 you don't use the same words in Battle for Wesnoth as in Firefox, for
 example.

 Well, I think a super list is not a bad idea.  Any project manager can then
 take the super list and make the changes to it that he thinks are best for
 his particular project, but the super list remains unchanged.

I think that the terminology server should maintain several glossaries
without merging them. The CAT tool should be able to work against one
of them, against several of them (at the same time, perhaps merging
some of them in some way), or against all of them (merging them all).
This terminology server could also offer the possibility of
downloading the glossaries in several formats, or of running queries
(searching for a word, as you can in open-tran against several
TMs).
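
A minimal sketch of that idea in Python: the glossaries stay separate,
and a query chooses which ones to consult. The lookup function, the
glossary names and the sample translations are all invented for the
example.

```python
def lookup(term, glossaries, use=None):
    """Search the selected glossaries for `term` without merging them.

    `glossaries` maps a glossary name to a dict of source term -> translation.
    `use` is a list of glossary names, or None to search all of them.
    """
    names = list(glossaries) if use is None else use
    hits = {}
    for name in names:
        translation = glossaries[name].get(term)
        if translation is not None:
            hits[name] = translation
    return hits


# Made-up sample data: the same source word, translated per project.
glossaries = {
    "wesnoth": {"save": "guardar partida", "unit": "unidad"},
    "firefox": {"save": "guardar", "bookmark": "marcador"},
}

all_hits = lookup("save", glossaries)
firefox_only = lookup("save", glossaries, use=["firefox"])
```

Because each hit is reported with the name of its glossary, the CAT
tool can decide how (or whether) to merge the results.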


 Isn't Martin Benjamin working on such a list via AnLoc?
 http://africanlocalisation.net/en/terminology

Perhaps. Lately there are a lot of tools for translating,
maintaining TMs, glossaries... Too much for me.


 Good support (or even any support at all) for glossaries is sorely
 lacking in a lot of CAT programs. In Lokalize there is some support for
 this:
 http://youonlylivetwice.info/lokalize/lokalize-glossary.htm

 Well, I think there are four important glossary tasks in CAT tools, namely
 term recognition, term insertion, term adding and term editing. Term
 recognition is an automatic process whereby the tool searches existing
 glossaries for matching terms in the current source text segment.  Term
 insertion is the ability to insert a term's translation into the target
 field in some easy way.  Term adding is the ability to add terms (and their
 translations) to glossaries used by the term recognition function.  Term
 editing is the ability to make changes to existing glossary entries.

 Most CAT tools that I know of offer term recognition.  Even if a tool
 offers only term recognition, it can already benefit greatly from a
 pre-existing super glossary.

 For comparison:  A CAT tool that offers only term recognition (not the other
 three) is OmegaT.  A CAT tool that offers both term recognition and term
 insertion is Pootle.  In both OmegaT and Pootle, it is not possible to add
 terms to the glossary without using a separate program.  OmegaT's glossaries
 are easier to edit (use a text editor) but you must reload the project each
 time.  Pootle's glossaries are more difficult to edit (unless you're running
 a local Pootle), but new terms are recognised immediately (if I remember
 correctly).

 From the presentation, it appears that KBab^H^H^H^HLokalize can do term
 recognition, term insertion and term adding (and possibly also term
 editing).

Yes, perhaps it can do term editing, but if we set up a terminology
server, term editing should be treated as a term suggestion that
must be approved by some user of the terminology server (a human).
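
Roughly, I imagine something like the following sketch, where an edit
only becomes visible once a person has approved it. The
TerminologyServer class and its methods are hypothetical, not an
existing API.

```python
class TerminologyServer:
    """Toy model of suggestions that need human approval."""

    def __init__(self):
        self.approved = {}  # source term -> translation
        self.pending = []   # list of (source term, translation, author)

    def suggest(self, source, target, author):
        """Record an edit as a suggestion; lookups do not see it yet."""
        self.pending.append((source, target, author))

    def approve(self, index):
        """A human reviewer accepts one pending suggestion."""
        source, target, _author = self.pending.pop(index)
        self.approved[source] = target

    def lookup(self, source):
        """Return the approved translation, or None if there is none."""
        return self.approved.get(source)


server = TerminologyServer()
server.suggest("file", "archivo", author="translator1")
before = server.lookup("file")  # suggestion not approved yet
server.approve(0)
after = server.lookup("file")
```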


 A way to judge a CAT tool's term recognition is (a) whether it can do fuzzy
 matching when doing glossary recognition, and (b) whether one can customise
 the matching process using techniques like (i) stemming and (ii) setting
 truncation rules.  If I remember correctly, Pootle can do #a but not #b.
  OmegaT can do neither.  Wordfast can do #a, #b1 and #b2.

Where is exact matching? I think that in TMs fuzzy matching is
very important, but in glossaries it isn't so important.


 A way to judge a CAT tool's term insertion is (a) whether it can be done
 using only the keyboard and (b) whether it can make changes to the target
 text term in the light of the current text (e.g. (i) if the SL word starts
 with a capital letter, but the glossary item does not, will the CAT tool
 insert the target term with a capital letter, or (ii) if the SL word
 contains an accelerator, can the CAT tool give the inserted translation an
 accelerator also).  Pootle fails on both #a and #b. Wordfast can do #a and
 #b1 but not #b2.

 How does Lokalize fare in the light of the above?

I really don't know. I don't use Lokalize yet. Ask Shaforostoff.


 What other CAT tools were you thinking of when you made your comment?

I was thinking of Gtranslator, Poedit...

Bye,
Leandro Regueiro


Re: [translate-pootle] Glossary stuff

2009-01-22 Thread Samuel Murray (Groenkloof)
Leandro Regueiro wrote:

 Samuel wrote:

 That is fine, from an academic point of view, but the fact is that
 a glossary function must have the ability to recognise items from
 the source text that are in the glossary.  No program can recognise
 concepts.  Only words can be matched.  Therefore, glossaries must
 be word based.

 Since glossaries are maintained by humans, I think the glossaries
 could be concept based.

I'm interested to know how a glossary server would match a concept (from
the glossary) to a word (in the source text).  Or... were you thinking
of having a glossary server that doesn't perform any automatic matching
of words from the source text?

 Isn't Martin Benjamin working on such a list via AnLoc? 
 http://africanlocalisation.net/en/terminology

 Perhaps. Lately there are a lot of tools for translating,
 maintaining TMs, glossaries... Too much for me.

No, the Anloc Terminology project is not a tool -- it is a list.  It is
a list of 2500 terms, to be translated into many African languages.  If
one can get one's hands on that list, it could be a useful start for a
super list of GUI terms.  Martin's list also has nothing to do with TM.

 Yes, perhaps it can do term editing, but if we set up a terminology
 server, term editing should be treated as a term suggestion that
 must be approved by some user of the terminology server (a human).

And this is why such a terminology server will fail.  If users who add
terms find that their expertise is not respected by the community, and
that their contributions are regarded as second-rate until formally
approved by some other guy, they will lose interest in participating.

 A way to judge a CAT tool's term recognition is (a) whether it can
 do fuzzy matching when doing glossary recognition...

 Where is exact matching? I think that in TMs fuzzy matching is 
 very important, but in glossaries it isn't so important.

I did not mention exact matching because I assumed that exact matching
is a given.

Fuzzy matching can be important in glossaries if the glossary does not
contain every inflected form of a word from the source text.  If
the glossary contains "file" but not "files", will the CAT tool give a
result if the source text contains "files"?
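
To illustrate the "file"/"files" case, here is a small sketch in which
difflib from the Python standard library stands in for whatever fuzzy
matcher a CAT tool might use.  The function name fuzzy_recognise and the
0.8 cutoff are arbitrary choices for the example.

```python
import difflib


def fuzzy_recognise(word, glossary_terms, cutoff=0.8):
    """Return glossary terms that are close enough to `word`."""
    return difflib.get_close_matches(
        word.lower(), glossary_terms, n=3, cutoff=cutoff
    )


terms = ["file", "folder", "filter"]
plural_hit = fuzzy_recognise("files", terms)
no_hit = fuzzy_recognise("window", terms)
```

At this cutoff "files" finds "file" but not "filter"; lowering the
cutoff would bring in more (and less relevant) candidates.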

Samuel

-- 
Samuel Murray
sam...@translate.org.za
Decathlon, for volunteer opensource translations
http://translate.sourceforge.net/wiki/decathlon/



Re: [translate-pootle] Glossary stuff (was: Re: Frequency list)

2009-01-22 Thread Marce Villarino
On Thursday 22 January 2009 18:13, Samuel Murray (Groenkloof) wrote:
  Another thing is that a good glossary does not contain words as such. A
  good glossary has only concepts as entries, and several entries could
  have the same word (because words can have several meanings).

 That is fine, from an academic point of view, but the fact is that a
 glossary function must have the ability to recognise items from the
 source text that are in the glossary.  No program can recognise
 concepts.  Only words can be matched.  Therefore, glossaries must be
 word based.

Both TMs and glossaries are useful to me because:
a) They make me translate faster.
b) They help me use the same target text for the same source text.
(Corollary: they help keep a consistent style.)
c) They help me use standard wording, particularly the glossary.

By language standardization I mean reduction of polysemy/synonymy, that is,
do not use the same word/expression to refer to several meanings, and also
do not use several words/expressions to refer to a single meaning.

So, my vote goes to a glossary with meaning as the primary key, and
languages and translations subordinated to meaning.

That still gives the chance to look up words!  It only requires that a
proper configuration is set (source and target languages) and that the
glossary contains the pair source-=meaning=-target.
Sure, if the glossary contains several entries with the same source word,
each for a different meaning (obviously), then several target words can be
suggested, each for its meaning, if the glossary contains a translation for
that meaning, of course.
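
A small Python sketch of such a structure, just to show that word lookup
still works when meaning is the key.  The Concept class, the suggest
function and the sample entries are my own invention for the example,
not an existing glossary format.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """One glossary entry: a meaning, with terms per language under it."""
    concept_id: str
    definition: str
    terms: dict = field(default_factory=dict)  # language code -> list of terms


def suggest(word, source_lang, target_lang, concepts):
    """Look up a word; return (definition, target terms) for each meaning."""
    results = []
    for concept in concepts:
        if word in concept.terms.get(source_lang, []):
            targets = concept.terms.get(target_lang, [])
            if targets:  # only meanings that have a translation
                results.append((concept.definition, targets))
    return results


# Sample data: one source word, three meanings, two of them translated.
concepts = [
    Concept("key-keyboard", "button on a keyboard",
            {"en": ["key"], "es": ["tecla"]}),
    Concept("key-crypto", "secret used for encryption",
            {"en": ["key"], "es": ["clave"]}),
    Concept("key-legend", "explanation of map symbols",
            {"en": ["key", "legend"]}),
]

suggestions = suggest("key", "en", "es", concepts)
```

Looking up "key" gives two suggestions, "tecla" and "clave", each
labelled with its meaning; the third meaning is skipped because it has
no translation yet.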

-- 
Best regards,
MV

