Re: [translate-pootle] Glossary stuff (was: Re: Frequency list)
On Thursday 22 January 2009 18:13, Samuel Murray (Groenkloof) wrote:

>> Another thing is that a good glossary does not contain words. A good
>> glossary has only concepts as entries, and several entries could have
>> the same word (because words can have several meanings).
>
> That is fine, from an academic point of view, but the fact is that a
> glossary function must have the ability to recognise items from the
> source text that are in the glossary. No program can recognise
> concepts. Only words can be matched. Therefore, glossaries must be
> word based.

Both TMs and glossaries are useful to me because:

a) They make me translate faster.
b) They help me use the same target text for the same source text.
   (Corollary: they help keep a consistent style.)
c) They help me use standard wording, particularly the glossary.

By language standardisation I mean the reduction of polysemy and
synonymy: do not use the same word/expression to refer to several
meanings, and do not use several words/expressions to refer to a
single meaning.

So my vote goes to a glossary with "meaning" as the "primary key", and
with languages and translations subordinated to meaning. That still
makes word lookup possible, provided a proper configuration is set
(source and target languages) and the glossary contains the pair
source <-> meaning <-> target. Sure, if the glossary contains several
entries with the same word, each for a different meaning (obviously),
then several translations can be suggested, one per meaning, provided
the glossary contains a translation for that meaning.

-- 
Best regards,
MV

--
This SF.net email is sponsored by SourceForge Community.
SourceForge wants to tell your story. http://p.sf.net/sfu/sf-spreadtheword
___
Translate-pootle mailing list
Translate-pootle@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/translate-pootle
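[Editorial note: MV's "meaning as primary key" proposal can be sketched as a
small data model. This is a hedged illustration only, not an existing tool's
design; the `Concept`, `glossary` and `lookup` names, the definitions, and the
Galician translations are all hypothetical. It shows that keying by meaning
still allows word lookup, and that a word with two meanings yields two
suggestions.]

```python
from dataclasses import dataclass, field

@dataclass
class Concept:
    """One glossary entry: a meaning, not a word."""
    definition: str
    # language code -> terms expressing this concept in that language
    terms: dict = field(default_factory=dict)

# Two concepts that happen to share the English word "file".
glossary = [
    Concept("a named collection of data on disk",
            {"en": ["file"], "gl": ["ficheiro"]}),
    Concept("an abrasive hand tool",
            {"en": ["file"], "gl": ["lima"]}),
]

def lookup(word, source_lang, target_lang):
    """Word lookup on a meaning-keyed glossary: find every concept whose
    source-language terms include the word, and return that concept's
    definition together with its target-language terms."""
    return [(c.definition, c.terms.get(target_lang, []))
            for c in glossary
            if word in c.terms.get(source_lang, [])]

# Both meanings of "file" are suggested, each with its own translation.
print(lookup("file", "en", "gl"))
```

Because translations hang off the meaning rather than the word, the same data
answers lookups in either direction (e.g. `lookup("ficheiro", "gl", "en")`).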
Re: [translate-pootle] Glossary stuff
Leandro Regueiro wrote:

> Samuel wrote:
>> That is fine, from an academic point of view, but the fact is that
>> a glossary function must have the ability to recognise items from
>> the source text that are in the glossary. No program can recognise
>> concepts. Only words can be matched. Therefore, glossaries must
>> be word based.
>
> Since glossaries are maintained by humans, I think they could be
> concept based.

I'm interested to know how a glossary server would match a concept
(from the glossary) to a word (in the source text). Or... were you
thinking of a glossary server that doesn't perform any automatic
matching of words from the source text?

>> Isn't Martin Benjamin working on such a list via AnLoc?
>> http://africanlocalisation.net/en/terminology
>
> Perhaps. Lately there have been lots of tools for translating and for
> maintaining TMs, glossaries... Too much for me.

No, the AnLoc Terminology project is not a tool -- it is a list. It is
a list of 2500 terms, to be translated into many African languages. If
one can get one's hands on that list, it could be a useful start for a
super list of GUI terms. Martin's list also has nothing to do with TM.

> Yes, perhaps it could do term editing, but if we set up a terminology
> server, term editing should be treated as a term suggestion that must
> be approved by some user of the terminology server (a human).

And this is why such a terminology server will fail. If users who add
terms find that their expertise is not respected by the community, and
that their contributions are regarded as second-rate until formally
approved by some other guy, they will lose interest in participating.

>> A way to judge a CAT tool's term recognition is (a) whether it can
>> do fuzzy matching when doing glossary recognition...
>
> Where is "exact matching"? I think that in TMs "fuzzy matching" is
> very important, but in glossaries it isn't so important.

I did not mention exact matching because I assumed that exact matching
is a given. Fuzzy matching can be important in glossaries if the
glossary does not contain all possible permutations of a word from the
source text. If the glossary contains "file" but not "files", will the
CAT tool give a result if the source text contains "files"?

Samuel

-- 
Samuel Murray
sam...@translate.org.za
Decathlon, for volunteer opensource translations
http://translate.sourceforge.net/wiki/decathlon/
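[Editorial note: Samuel's "file"/"files" question is exactly what stemmed
fallback matching answers. The sketch below is a hedged illustration, not any
real tool's matcher: the one-entry glossary, the suffix list, and the function
names are hypothetical, and the de-suffixing is deliberately naive (a real
tool would use a proper stemmer such as Porter's).]

```python
# Hypothetical one-entry English -> Galician glossary.
glossary = {"file": "ficheiro"}

def candidates(word):
    """Yield the word itself first, then crudely de-suffixed stems.
    Deliberately naive: just strips a few common English suffixes."""
    yield word
    for suffix in ("s", "es", "ing", "ed"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            yield word[: -len(suffix)]

def recognise(source_text):
    """Term recognition: exact match first, stemmed match as fallback,
    so a glossary containing only "file" still fires on "files"."""
    hits = []
    for word in source_text.lower().split():
        for cand in candidates(word):
            if cand in glossary:
                hits.append((word, cand, glossary[cand]))
                break  # first (most exact) match wins
    return hits

print(recognise("Delete the selected files"))
# -> [('files', 'file', 'ficheiro')]
```

Yielding the exact form before any stem keeps exact matching "a given", as
Samuel puts it, with fuzziness only as a fallback.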
Re: [translate-pootle] Glossary stuff (was: Re: Frequency list)
>> Another thing is that a good glossary does not contain words. A good
>> glossary has only concepts as entries, and several entries could have
>> the same word (because words can have several meanings).
>
> That is fine, from an academic point of view, but the fact is that a
> glossary function must have the ability to recognise items from the
> source text that are in the glossary. No program can recognise
> concepts. Only words can be matched. Therefore, glossaries must be
> word based.

Since glossaries are maintained by humans, I think they could be
concept based.

>> Sometimes it could be a good idea to have several glossaries, because
>> you don't use the same words in Battle for Wesnoth as in Firefox, for
>> example.
>
> Well, I think a super list is not a bad idea. Any project manager can
> then take the super list and make the changes to it that he thinks are
> best for his particular project, but the super list remains unchanged.

I think the terminology server should maintain several glossaries,
without merging them. The CAT tool should be able to work against one
of them, several of them (at the same time, perhaps merging some of
them in some way), or all of them (merged). The terminology server
could also offer downloads in several formats, or allow queries
(searching for a word against several TMs, as you can in open-tran).

> Isn't Martin Benjamin working on such a list via AnLoc?
> http://africanlocalisation.net/en/terminology

Perhaps. Lately there have been lots of tools for translating and for
maintaining TMs, glossaries... Too much for me.

>> A lack of good support (or even of any support) for glossaries is a
>> great shortcoming of a lot of CAT programs. In Lokalize there is some
>> support for this:
>> http://youonlylivetwice.info/lokalize/lokalize-glossary.htm
>
> Well, I think there are four important glossary tasks in CAT tools,
> namely term recognition, term insertion, term adding and term editing.
> Term recognition is an automatic process whereby the tool searches
> existing glossaries for matching terms in the current source text
> segment. Term insertion is the ability to insert a term's translation
> into the target field in some easy way. Term adding is the ability to
> add terms (and their translations) to glossaries used by the term
> recognition function. Term editing is the ability to make changes to
> existing glossary entries.
>
> Most CAT tools that I know of offer term recognition. Even if a tool
> offers only term recognition, it can already benefit greatly from a
> pre-existing super glossary.
>
> For comparison: A CAT tool that offers only term recognition (not the
> other three) is OmegaT. A CAT tool that offers both term recognition
> and term insertion is Pootle. In both OmegaT and Pootle, it is not
> possible to add terms to the glossary without using a separate
> program. OmegaT's glossaries are easier to edit (use a text editor)
> but you must reload the project each time. Pootle's glossaries are
> more difficult to edit (unless you're running a local Pootle), but
> new terms are recognised immediately (if I remember correctly).
>
> From the presentation, it appears that KBab^H^H^H^HLokalize can do
> term recognition, term insertion and term adding (and possibly also
> term editing).

Yes, perhaps it could do term editing, but if we set up a terminology
server, term editing should be treated as a term suggestion that must
be approved by some user of the terminology server (a human).

> A way to judge a CAT tool's term recognition is (a) whether it can do
> fuzzy matching when doing glossary recognition, and (b) whether one
> can customise the matching process using techniques like (i) stemming
> and (ii) setting truncation rules. If I remember correctly, Pootle
> can do #a but not #b. OmegaT can do neither. Wordfast can do #a, #b1
> and #b2.

Where is "exact matching"? I think that in TMs "fuzzy matching" is
very important, but in glossaries it isn't so important.

> A way to judge a CAT tool's term insertion is (a) whether it can be
> done using only the keyboard and (b) whether it can make changes to
> the target text term in the light of the current text (e.g. (i) if
> the SL word starts with a capital letter but the glossary item does
> not, will the CAT tool insert the target term with a capital letter,
> or (ii) if the SL word contains an accelerator, can the CAT tool give
> the inserted translation an accelerator also). Pootle fails on both
> #a and #b. Wordfast can do #a and #b1 but not #b2.
>
> How does Lokalize fare in the light of the above?

I really don't know. I don't use Lokalize yet. Ask Shaforostoff.

> What other CAT tools were you thinking of when you made your comment?

I was thinking of Gtranslator, Poedit...

Bye,
Leandro Regueiro
[translate-pootle] Glossary stuff (was: Re: Frequency list)
Leandro Regueiro wrote:

> Another thing is that a good glossary does not contain words. A good
> glossary has only concepts as entries, and several entries could have
> the same word (because words can have several meanings).

That is fine, from an academic point of view, but the fact is that a
glossary function must have the ability to recognise items from the
source text that are in the glossary. No program can recognise
concepts. Only words can be matched. Therefore, glossaries must be
word based.

> Sometimes it could be a good idea to have several glossaries, because
> you don't use the same words in Battle for Wesnoth as in Firefox, for
> example.

Well, I think a super list is not a bad idea. Any project manager can
then take the super list and make the changes to it that he thinks are
best for his particular project, but the super list remains unchanged.

Isn't Martin Benjamin working on such a list via AnLoc?
http://africanlocalisation.net/en/terminology

> A lack of good support (or even of any support) for glossaries is a
> great shortcoming of a lot of CAT programs. In Lokalize there is some
> support for this:
> http://youonlylivetwice.info/lokalize/lokalize-glossary.htm

Well, I think there are four important glossary tasks in CAT tools,
namely term recognition, term insertion, term adding and term editing.
Term recognition is an automatic process whereby the tool searches
existing glossaries for matching terms in the current source text
segment. Term insertion is the ability to insert a term's translation
into the target field in some easy way. Term adding is the ability to
add terms (and their translations) to glossaries used by the term
recognition function. Term editing is the ability to make changes to
existing glossary entries.

Most CAT tools that I know of offer term recognition. Even if a tool
offers only term recognition, it can already benefit greatly from a
pre-existing super glossary.

For comparison: A CAT tool that offers only term recognition (not the
other three) is OmegaT. A CAT tool that offers both term recognition
and term insertion is Pootle. In both OmegaT and Pootle, it is not
possible to add terms to the glossary without using a separate program.
OmegaT's glossaries are easier to edit (use a text editor) but you must
reload the project each time. Pootle's glossaries are more difficult to
edit (unless you're running a local Pootle), but new terms are
recognised immediately (if I remember correctly).

From the presentation, it appears that KBab^H^H^H^HLokalize can do term
recognition, term insertion and term adding (and possibly also term
editing).

A way to judge a CAT tool's term recognition is (a) whether it can do
fuzzy matching when doing glossary recognition, and (b) whether one can
customise the matching process using techniques like (i) stemming and
(ii) setting truncation rules. If I remember correctly, Pootle can do
#a but not #b. OmegaT can do neither. Wordfast can do #a, #b1 and #b2.

A way to judge a CAT tool's term insertion is (a) whether it can be
done using only the keyboard and (b) whether it can make changes to the
target text term in the light of the current text (e.g. (i) if the SL
word starts with a capital letter but the glossary item does not, will
the CAT tool insert the target term with a capital letter, or (ii) if
the SL word contains an accelerator, can the CAT tool give the inserted
translation an accelerator also). Pootle fails on both #a and #b.
Wordfast can do #a and #b1 but not #b2.

How does Lokalize fare in the light of the above?

What other CAT tools were you thinking of when you made your comment?

Samuel

-- 
Samuel Murray
sam...@translate.org.za
Decathlon, for volunteer opensource translations
http://translate.sourceforge.net/wiki/decathlon/
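[Editorial note: Samuel's term-insertion criteria (b)(i) (copy the source
word's capitalisation) and (b)(ii) (copy an accelerator marker) can be
sketched in a few lines. This is a hedged illustration, not any tool's actual
behaviour: the `insert_term` name is hypothetical, the `&` marker is just one
common accelerator convention, and the marker is naively re-placed before the
translation's first letter.]

```python
def insert_term(source_word, target_term, marker="&"):
    """Adapt a glossary translation to its context before insertion:
    (i)  if the source word is capitalised but the glossary entry is
         not, capitalise the inserted translation;
    (ii) if the source word carries an accelerator marker, give the
         inserted translation one too (naively, on its first letter)."""
    bare = source_word.replace(marker, "")
    if bare[:1].isupper() and not target_term[:1].isupper():
        target_term = target_term[:1].upper() + target_term[1:]
    if marker in source_word:
        target_term = marker + target_term
    return target_term

print(insert_term("Save", "gardar"))   # Gardar
print(insert_term("&Save", "gardar"))  # &Gardar
print(insert_term("save", "gardar"))   # gardar
```

A smarter (ii) would try to reuse the accelerator letter or pick a free one,
which is exactly why Samuel rates #b2 as the harder criterion.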
Re: [translate-pootle] Frequency list
On Thu, Jan 22, 2009 at 2:57 PM, Dwayne Bailey wrote:
> On Thu, 2009-01-22 at 12:38 +0100, Leandro Regueiro wrote:
>
>>>> A lack of good support (or even of any support) for glossaries is
>>>> a great shortcoming of a lot of CAT programs. In Lokalize there is
>>>> some support for this:
>>>> http://youonlylivetwice.info/lokalize/lokalize-glossary.htm
>>>
>>> I must say I've been underwhelmed by most glossary solutions. Thanks
>>> for that link; the Lokalize approach looks interesting.
>>
>> Yes. It could be a good idea if, instead of saving new glossary
>> entries locally, the tool could connect to a terminology server and
>> add them there. Perhaps it is time to specify a kind of protocol that
>> covers everything needed for this.
>
> I think before that it's probably best to think of the type of data
> that is needed and exchanged.
>
> My thoughts would be:
> request_term(term) -> (term, disambiguation, definition, translation,
> translated_definition)*N

Um, this is difficult to say while we haven't defined the structure of
the glossary. In the Trasno Project we are trying to distil what our
current glossary contains (maintained by hand on a wiki) into a
specification for our new glossary, and to find a system that allows
us to maintain glossaries and exchange them in several formats, and in
several ways, with several CAT programs.

>>> More importantly: how did they create that Flash presentation?
>>
>> :) I think the important thing is what the presentation shows.
>
> Not if you want to make some of your own presentations like that.

Then contact Shaforostoff: http://youonlylivetwice.info/

Bye,
Leandro Regueiro
Re: [translate-pootle] Frequency list
On Thu, 2009-01-22 at 12:38 +0100, Leandro Regueiro wrote:

>>> A lack of good support (or even of any support) for glossaries is a
>>> great shortcoming of a lot of CAT programs. In Lokalize there is
>>> some support for this:
>>> http://youonlylivetwice.info/lokalize/lokalize-glossary.htm
>>
>> I must say I've been underwhelmed by most glossary solutions. Thanks
>> for that link; the Lokalize approach looks interesting.
>
> Yes. It could be a good idea if, instead of saving new glossary
> entries locally, the tool could connect to a terminology server and
> add them there. Perhaps it is time to specify a kind of protocol that
> covers everything needed for this.

I think before that it's probably best to think of the type of data
that is needed and exchanged.

My thoughts would be:

    request_term(term) -> (term, disambiguation, definition,
                           translation, translated_definition)*N

>> More importantly: how did they create that Flash presentation?
>
> :) I think the important thing is what the presentation shows.

Not if you want to make some of your own presentations like that.

-- 
Dwayne Bailey
Associate                 +27 12 460 1095 (w)
Translate.org.za          +27 83 443 7114 (c)

Recent blog posts:
* xclip - where have you been all of my life!
  http://www.translate.org.za/blogs/dwayne/en/content/xclip-where-have-you-been-all-my-life
* Virtaal on Fedora
* Translate Toolkit on Fedora. Status of Virtaal and Pootle.

Stop Digital Apartheid! - http://www.digitalapartheid.com
Firefox web browser in Afrikaans - http://af.www.mozilla.com/af/
African Network for Localisation (ANLoc) - http://africanlocalisation.net/
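[Editorial note: Dwayne's `request_term(term) -> (term, disambiguation,
definition, translation, translated_definition)*N` sketch can be made concrete
as a data shape. This is a hedged toy, not a proposed implementation: the
in-memory `SERVER_DATA` dictionary stands in for a real server answering over
the wire, and every record's content is invented for illustration.]

```python
from typing import NamedTuple

class TermRecord(NamedTuple):
    """One element of the '... * N' reply in Dwayne's sketch."""
    term: str
    disambiguation: str
    definition: str
    translation: str
    translated_definition: str

# Toy in-memory "server"; a real one would answer over HTTP or similar.
SERVER_DATA = {
    "file": [
        TermRecord("file", "computing", "a named collection of data",
                   "ficheiro", "colección de datos cun nome"),
        TermRecord("file", "hand tool", "an abrasive metal tool",
                   "lima", "ferramenta abrasiva de metal"),
    ],
}

def request_term(term):
    """request_term(term) -> zero or more TermRecords, one per meaning."""
    return SERVER_DATA.get(term, [])

for record in request_term("file"):
    print(record.disambiguation, "->", record.translation)
```

The `*N` part is the interesting bit: one request returns one record per
meaning, which matches Leandro's point that the glossary structure (concepts
with disambiguations, not bare words) has to be settled first.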
Re: [translate-pootle] Frequency list
>>>> The challenge in trying to leverage this without extensive
>>>> reference to context is that many short strings can have ambiguous
>>>> meanings:
>>>>
>>>> Left (remaining) or Left (direction), Clear (erase) or Clear
>>>> (transparent), and so on.
>>>
>>> Yes, I also found this in the short frequency word lists I created
>>> for the Decathlon (see my mail to Asiri).
>>>
>>> I think the most practical solution would be to create such a list
>>> anyway, and then try to find as many different meanings for each
>>> word, and include all those meanings in the list. You'll end up with
>>> meanings that are not common, but at least you'll cover all the
>>> meanings that are important.
>>>
>>> For example, if the list contains "file", you might put both
>>> computer file and nail file in the word list, even though nail file
>>> is very unlikely to occur in a software translation. In this way,
>>> translators (who must use these lists intelligently) can easily
>>> spot the appropriate meaning.
>>
>> I think the terminology should be created and maintained via a
>> program specific to this task. Using a program to see which words
>> are used most is only useful up to a point, because a very common
>> word is "the", a word that I think doesn't need to be in a glossary.
>
> That's why you have to use stoplists, like poterminology does.

Someday I will read all the Pootle wiki material to get to know the
Pootle environment better.

>> Another thing is that a good glossary does not contain words. A good
>> glossary has only concepts as entries, and several entries could
>> have the same word (because words can have several meanings).
>>
>> Sometimes it could be a good idea to have several glossaries,
>> because you don't use the same words in Battle for Wesnoth as in
>> Firefox, for example.
>
> Or maybe groups of terminology that cover common and then
> domain-specific stuff.

Yes, I forgot the common-stuff glossary, but the others are specific.

>> A lack of good support (or even of any support) for glossaries is a
>> great shortcoming of a lot of CAT programs. In Lokalize there is
>> some support for this:
>> http://youonlylivetwice.info/lokalize/lokalize-glossary.htm
>
> I must say I've been underwhelmed by most glossary solutions. Thanks
> for that link; the Lokalize approach looks interesting.

Yes. It could be a good idea if, instead of saving new glossary entries
locally, the tool could connect to a terminology server and add them
there. Perhaps it is time to specify a kind of protocol that covers
everything needed for this.

> More importantly: how did they create that Flash presentation?

:) I think the important thing is what the presentation shows.

Bye,
Leandro Regueiro
Re: [translate-pootle] Frequency list
On Wed, 2009-01-21 at 12:23 +0100, Leandro Regueiro wrote:

>>> The challenge in trying to leverage this without extensive
>>> reference to context is that many short strings can have ambiguous
>>> meanings:
>>>
>>> Left (remaining) or Left (direction), Clear (erase) or Clear
>>> (transparent), and so on.
>>
>> Yes, I also found this in the short frequency word lists I created
>> for the Decathlon (see my mail to Asiri).
>>
>> I think the most practical solution would be to create such a list
>> anyway, and then try to find as many different meanings for each
>> word, and include all those meanings in the list. You'll end up with
>> meanings that are not common, but at least you'll cover all the
>> meanings that are important.
>>
>> For example, if the list contains "file", you might put both
>> computer file and nail file in the word list, even though nail file
>> is very unlikely to occur in a software translation. In this way,
>> translators (who must use these lists intelligently) can easily spot
>> the appropriate meaning.
>
> I think the terminology should be created and maintained via a
> program specific to this task. Using a program to see which words are
> used most is only useful up to a point, because a very common word is
> "the", a word that I think doesn't need to be in a glossary.

That's why you have to use stoplists, like poterminology does.

> Another thing is that a good glossary does not contain words. A good
> glossary has only concepts as entries, and several entries could have
> the same word (because words can have several meanings).
>
> Sometimes it could be a good idea to have several glossaries, because
> you don't use the same words in Battle for Wesnoth as in Firefox, for
> example.

Or maybe groups of terminology that cover common and then
domain-specific stuff.

> A lack of good support (or even of any support) for glossaries is a
> great shortcoming of a lot of CAT programs. In Lokalize there is some
> support for this:
> http://youonlylivetwice.info/lokalize/lokalize-glossary.htm

I must say I've been underwhelmed by most glossary solutions. Thanks
for that link; the Lokalize approach looks interesting.

More importantly: how did they create that Flash presentation?

-- 
Dwayne Bailey
Translate.org.za