Re: [translate-pootle] Glossary stuff (was: Re: Frequncy list)

2009-01-22 Thread Marce Villarino
O Xoves 22 Xaneiro 2009 18:13, Samuel Murray (Groenkloof) escribiu:
> > Another thing is that in a good glossary doesn't appear words. A good
> > glossary has only concepts as entries, and several entries could have
> > the same word (because words could have several meanings).
>
> That is fine, from an academic point of view, but the fact is that a
> glossary function must have the ability to recognise items from the
> source text that are in the glossary.  No program can recognise
> concepts.  Only words can be matched.  Therefore, glossaries must be
> word based.

Both TM and glossaries are useful to me because:
a) They make me translate faster.
b) They help me use the same target text for the same source text.
(Corollary: they help keep a consistent style.)
c) They help me use standard wording, particularly the glossary.

By language standardization I mean reducing polysemy and synonymy: do
not use the same word/expression to refer to several meanings, and do
not use several words/expressions to refer to a single meaning.

So my vote goes to a glossary with "meaning" as the "primary key" concept,
with languages and translations subordinated to meaning.

That still makes it possible to look up words, provided that a proper
configuration is set (source and target languages) and that the glossary
contains the pair source<-=meaning=->target.
Of course, if the glossary contains several entries with the same source
word, each for a different meaning, then several translations can be
suggested, each for its meaning, provided the glossary contains a
translation for that meaning.
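
A minimal sketch of the idea above, assuming a toy in-memory data model: each entry is a meaning, terms hang off it per language, and a word lookup returns one suggestion per meaning. The class name, the function name and the Galician sample terms are illustrative assumptions, not part of any existing tool.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """One glossary entry: a meaning, with its terms grouped by language."""
    definition: str
    terms: dict = field(default_factory=dict)  # language code -> list of terms


def lookup(glossary, word, source_lang, target_lang):
    """Return (definition, translations) for every concept that lists `word`."""
    results = []
    for concept in glossary:
        source_terms = [t.lower() for t in concept.terms.get(source_lang, [])]
        if word.lower() in source_terms:
            results.append((concept.definition, concept.terms.get(target_lang, [])))
    return results


# Illustrative data: the same source word appears under two meanings.
glossary = [
    Concept("remaining", {"en": ["left"], "gl": ["restante"]}),
    Concept("direction opposite to right", {"en": ["left"], "gl": ["esquerda"]}),
    Concept("to erase", {"en": ["clear"], "gl": ["limpar"]}),
]

suggestions = lookup(glossary, "Left", "en", "gl")
```

Here the word is only an index into the meanings, so matching stays word based while the stored data stays concept based.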

-- 
Best regards,
MV


--
This SF.net email is sponsored by:
SourcForge Community
SourceForge wants to tell your story.
http://p.sf.net/sfu/sf-spreadtheword
___
Translate-pootle mailing list
Translate-pootle@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/translate-pootle


Re: [translate-pootle] Glossary stuff

2009-01-22 Thread Samuel Murray (Groenkloof)
Leandro Regueiro wrote:

> Samuel wrote:

>> That is fine, from an academic point of view, but the fact is that
>> a glossary function must have the ability to recognise items from
>> the source text that are in the glossary.  No program can recognise
>> concepts.  Only words can be matched.  Therefore, glossaries must
>> be word based.

> Since I think glossaries are maintained by humans, the glossaries 
> could be concept based.

I'm interested to know how a glossary server would match a concept (from
the glossary) to a word (in the source text).  Or... were you thinking
of having a glossary server that doesn't perform any automatic matching
of words from the source text?

>> Isn't Martin Benjamin working on such a list via AnLoc? 
>> http://africanlocalisation.net/en/terminology

> Perhaps. In the last times there are lot of tools for translating, 
> maintaining TM, glossaries... Too much for me.

No, the Anloc Terminology project is not a tool -- it is a list.  It is
a list of 2500 terms, to be translated into many African languages.  If
one can get one's hands on that list, it could be a useful start for a
super list of GUI terms.  Martin's list also has nothing to do with TM.

> Yes, perhaps could do term editing, but if we set up a terminology 
> server, the term editing should be considered term suggestion that 
> must be approved by some user of the terminology server (a human).

And this is why such a terminology server will fail.  If users who add
terms find that their expertise is not respected by the community, and
that their contributions are regarded as second-rate until formally
approved by some other guy, they will lose interest in participating.

>> A way to judge a CAT tool's term recognition is (a) whether it can
>> do fuzzy matching when doing glossary recognition...

> Where is "exact matching"? I think that in TMs "fuzzy matching" is 
> very important, but in glossaries it isn't so important.

I did not mention exact matching because I assumed that exact matching
is a given.

Fuzzy matching can be important in glossaries if the glossary does not
contain all possible permutations of a word from the source text.  If
the glossary contains "file" but not "files", will the CAT tool give a
result if the source text contains "files"?
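
The "file"/"files" case can be sketched with the Python standard library. This is only an illustration of fuzzy term recognition, not how any of the CAT tools mentioned here implement it; the glossary entries and the 0.8 cutoff are assumptions.

```python
import difflib

GLOSSARY = {"file": "ficheiro", "folder": "cartafol"}  # illustrative entries


def recognise(word, glossary=GLOSSARY, cutoff=0.8):
    """Return glossary terms matching `word`: exact match first, then fuzzy."""
    word = word.lower()
    if word in glossary:
        return [word]
    # get_close_matches ranks candidates by similarity ratio and drops
    # everything below `cutoff`.
    return difflib.get_close_matches(word, list(glossary), n=3, cutoff=cutoff)
```

With these assumptions, recognise("files") finds the entry "file" although only the singular is in the glossary, while an unrelated word such as "window" finds nothing.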

Samuel

-- 
Samuel Murray
sam...@translate.org.za
Decathlon, for volunteer opensource translations
http://translate.sourceforge.net/wiki/decathlon/



Re: [translate-pootle] Glossary stuff (was: Re: Frequncy list)

2009-01-22 Thread Leandro Regueiro
>> Another thing is that in a good glossary doesn't appear words. A good
>> glossary has only concepts as entries, and several entries could have
>> the same word (because words could have several meanings).
>
> That is fine, from an academic point of view, but the fact is that a
> glossary function must have the ability to recognise items from the source
> text that are in the glossary.  No program can recognise concepts.  Only
> words can be matched.  Therefore, glossaries must be word based.

Since I think glossaries are maintained by humans, the glossaries
could be concept based.


>> Sometimes could be a good idea having several glossaries, because you
>> don't use the same words in Battle for Wesnoth or in Firefox, for
>> example.
>
> Well, I think a super list is not a bad idea.  Any project manager can then
> take the super list and make the changes to it that he thinks is best for
> his particular project, but the super list remains unchanged.

I think the terminology server should maintain several glossaries,
without merging them. The CAT tool should be able to work against one
of them, several of them (at the same time, perhaps merging some of
them in some way), or all of them (merging them all). The terminology
server could also offer downloads in several formats, or queries
(searching for a word across several glossaries, as you can do in
open-tran across several TMs).
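
A rough sketch of that idea, assuming a hypothetical in-memory store: the glossaries are kept separate, and a query can target one, several or all of them. The glossary names and entries are made up for illustration.

```python
# Hypothetical store: glossary name -> {source term: translation}.
GLOSSARIES = {
    "common": {"file": "ficheiro"},
    "wesnoth": {"unit": "unidade"},
    "firefox": {"tab": "lapela"},
}


def query(term, names=None, glossaries=GLOSSARIES):
    """Look up `term` in the named glossaries (all of them if `names` is None).

    Results stay labelled with their glossary name, so nothing is merged
    in the store; the caller decides how to combine them.
    """
    selected = glossaries if names is None else {n: glossaries[n] for n in names}
    return {name: entries[term] for name, entries in selected.items() if term in entries}
```

Calling query("tab") searches everything, while query("tab", ["common", "wesnoth"]) restricts the search to two glossaries.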


> Isn't Martin Benjamin working on such a list via AnLoc?
> http://africanlocalisation.net/en/terminology

Perhaps. Lately there are a lot of tools for translating,
maintaining TMs, glossaries... Too many for me.


>> A good support (or even only support) for glossaries is a great lack
>> of a lot of CAT programs. In Lokalize there is some support for this
>> http://youonlylivetwice.info/lokalize/lokalize-glossary.htm
>
> Well, I think there are four important glossary tasks in CAT tools, namely
> term recognition, term insertion, term adding and term editing. Term
> recognition is an automatic process whereby the tool searches existing
> glossaries for matching terms in the current source text segment.  Term
> insertion is the ability to insert a term's translation into the target
> field in some easy way.  Term adding is the ability to add terms (and their
> translations) to glossaries used by the term recognition function.  Term
> editing is the ability to make changes to existing glossary entries.
>
> Most CAT tools that I know of, offer term recognition.  Even if a tool
> offers only term recognition, it can already benefit greatly from a
> pre-existing super glossary.
>
> For comparison:  A CAT tool that offers only term recognition (not the other
> three) is OmegaT.  A CAT tool that offers both term recognition and term
> insertion, is Pootle.  In both OmegaT and Pootle, it is not possible to add
> terms to the glossary without using a separate program.  OmegaT's glossaries
> are easier to edit (use a text editor) but you must reload the project each
> time.  Pootle's glossaries are more difficult to edit (unless you're running
> a local Pootle), but new terms are recognised immediately (if I remember
> correctly).
>
> From the presentation, it appears that KBab^H^H^H^HLokalize can do term
> recognition, term insertion and term adding (and possibly also term
> editing).

Yes, perhaps it could do term editing, but if we set up a terminology
server, term editing should be treated as a term suggestion that
must be approved by some user of the terminology server (a human).


> A way to judge a CAT tool's term recognition is (a) whether it can do fuzzy
> matching when doing glossary recognition, and (b) whether one can customise
> the matching process using techniques like (i) stemming and (ii) setting
> truncation rules.  If I remember correctly, Pootle can do #a but not #b.
>  OmegaT can do neither.  Wordfast can do #a, #b1 and #b2.

Where is "exact matching"? I think that in TMs "fuzzy matching" is
very important, but in glossaries it isn't so important.


> A way to judge a CAT tool's term insertion is (a) whether it can be done
> using only the keyboard and (b) whether it can make changes to the target
> text term in the light of the current text (eg (i) if the SL word starts
> with a capital letter, but the glossary item does not, will the CAT tool
> insert the target term with a capital letter, or (ii) if the SL word
> contains an accelerator, can the CAT tool give the inserted translation an
> accelerator also).  Pootle fails on both #a and #b. Wordfast can do #a and
> #b1 but not #b2.
>
> How does Lokalize fare in the light of the above?

I really don't know. I don't use Lokalize yet. Ask Shaforostoff.


> What other CAT tools were you thinking of when you made your comment?

I was thinking of Gtranslator, Poedit...

Bye,
Leandro Regueiro


[translate-pootle] Glossary stuff (was: Re: Frequncy list)

2009-01-22 Thread Samuel Murray (Groenkloof)
Leandro Regueiro wrote:

> Another thing is that in a good glossary doesn't appear words. A good
> glossary has only concepts as entries, and several entries could have
> the same word (because words could have several meanings).

That is fine, from an academic point of view, but the fact is that a 
glossary function must have the ability to recognise items from the 
source text that are in the glossary.  No program can recognise 
concepts.  Only words can be matched.  Therefore, glossaries must be 
word based.

> Sometimes could be a good idea having several glossaries, because you
> don't use the same words in Battle for Wesnoth or in Firefox, for
> example.

Well, I think a super list is not a bad idea.  Any project manager can 
then take the super list and make the changes to it that he thinks are 
best for his particular project, but the super list remains unchanged.

Isn't Martin Benjamin working on such a list via AnLoc?
http://africanlocalisation.net/en/terminology

> A good support (or even only support) for glossaries is a great lack
> of a lot of CAT programs. In Lokalize there is some support for this
> http://youonlylivetwice.info/lokalize/lokalize-glossary.htm

Well, I think there are four important glossary tasks in CAT tools, 
namely term recognition, term insertion, term adding and term editing. 
Term recognition is an automatic process whereby the tool searches 
existing glossaries for matching terms in the current source text 
segment.  Term insertion is the ability to insert a term's translation 
into the target field in some easy way.  Term adding is the ability to 
add terms (and their translations) to glossaries used by the term 
recognition function.  Term editing is the ability to make changes to 
existing glossary entries.
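
The four tasks can be pictured as four operations on one object. This is a toy sketch under simple assumptions (single-word terms, whitespace tokenisation, an in-memory dictionary); the class and method names are hypothetical and do not describe OmegaT, Pootle or Lokalize.

```python
class Glossary:
    """Toy in-memory glossary exposing the four tasks described above."""

    def __init__(self):
        self.entries = {}  # source term -> translation

    def recognise(self, segment):
        """Term recognition: glossary terms that occur in the source segment."""
        words = segment.lower().split()
        return [term for term in self.entries if term in words]

    def insert(self, target, term):
        """Term insertion: append the term's translation to the target text."""
        return (target + " " + self.entries[term]).strip()

    def add(self, term, translation):
        """Term adding: create a new entry."""
        if term in self.entries:
            raise KeyError(f"{term!r} already exists; use edit()")
        self.entries[term] = translation

    def edit(self, term, translation):
        """Term editing: change an existing entry."""
        if term not in self.entries:
            raise KeyError(f"{term!r} not found; use add()")
        self.entries[term] = translation
```

Because recognise() reads the same dictionary that add() and edit() write to, new or changed terms are recognised immediately in this sketch.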

Most CAT tools that I know of offer term recognition.  Even if a tool 
offers only term recognition, it can already benefit greatly from a 
pre-existing super glossary.

For comparison:  A CAT tool that offers only term recognition (not the 
other three) is OmegaT.  A CAT tool that offers both term recognition 
and term insertion, is Pootle.  In both OmegaT and Pootle, it is not 
possible to add terms to the glossary without using a separate program. 
  OmegaT's glossaries are easier to edit (use a text editor) but you 
must reload the project each time.  Pootle's glossaries are more 
difficult to edit (unless you're running a local Pootle), but new terms 
are recognised immediately (if I remember correctly).

From the presentation, it appears that KBab^H^H^H^HLokalize can do term 
recognition, term insertion and term adding (and possibly also term 
editing).

A way to judge a CAT tool's term recognition is (a) whether it can do 
fuzzy matching when doing glossary recognition, and (b) whether one can 
customise the matching process using techniques like (i) stemming and 
(ii) setting truncation rules.  If I remember correctly, Pootle can do 
#a but not #b.  OmegaT can do neither.  Wordfast can do #a, #b1 and #b2.
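
To make (i) and (ii) concrete, here is a deliberately crude sketch. The suffix list and the convention of marking a truncation rule with a trailing asterisk are assumptions chosen for illustration; they are not taken from Wordfast or any other tool, and a real stemmer is far more careful.

```python
def stem(word, suffixes=("ing", "ed", "s")):
    """Crude suffix stripping, keeping at least three characters of the word."""
    word = word.lower()
    for suffix in suffixes:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def matches(word, term):
    """Match a source-text `word` against a glossary `term`.

    A term ending in "*" is treated as a truncation rule: it matches any
    word that starts with the text before the asterisk. Any other term
    matches when both sides reduce to the same stem.
    """
    word = word.lower()
    if term.endswith("*"):
        return word.startswith(term[:-1].lower())
    return stem(word) == stem(term)
```

Stemming lets a single entry "file" cover "files", while a truncation rule such as "translat*" lets the glossary maintainer decide explicitly how much of a term must match.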

A way to judge a CAT tool's term insertion is (a) whether it can be done 
using only the keyboard and (b) whether it can make changes to the 
target text term in the light of the current text (eg (i) if the SL word 
starts with a capital letter, but the glossary item does not, will the 
CAT tool insert the target term with a capital letter, or (ii) if the SL 
word contains an accelerator, can the CAT tool give the inserted 
translation an accelerator also).  Pootle fails on both #a and #b. 
Wordfast can do #a and #b1 but not #b2.
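
Points (b)(i) and (b)(ii) could look roughly like this. It is a sketch under assumptions: the accelerator marker is taken to be "&", and when the source accelerator letter does not occur in the translation the marker falls back to the first letter. Real tools and real translators may well choose differently.

```python
def adapt_translation(source_word, translation, marker="&"):
    """Adjust a glossary translation to the form of the source word."""
    # (i) Carry over an initial capital letter.
    plain_source = source_word.replace(marker, "")
    if plain_source[:1].isupper():
        translation = translation[:1].upper() + translation[1:]
    # (ii) Carry over an accelerator, preferably on the same letter.
    if marker in source_word:
        start = source_word.index(marker) + 1
        key = source_word[start:start + 1].lower()
        pos = translation.lower().find(key) if key else -1
        if pos == -1:
            pos = 0  # assumed fallback: mark the first letter
        translation = translation[:pos] + marker + translation[pos:]
    return translation
```

For example, with a lower-case glossary entry "pechar" for "close", the source word "&Close" would yield "Pe&char".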

How does Lokalize fare in the light of the above?

What other CAT tools were you thinking of when you made your comment?

Samuel



-- 
Samuel Murray
sam...@translate.org.za
Decathlon, for volunteer opensource translations
http://translate.sourceforge.net/wiki/decathlon/



Re: [translate-pootle] Frequncy list

2009-01-22 Thread Leandro Regueiro
On Thu, Jan 22, 2009 at 2:57 PM, Dwayne Bailey  wrote:
> On Thu, 2009-01-22 at 12:38 +0100, Leandro Regueiro wrote:
>
> 
>
>> >> A good support (or even only support) for glossaries is a great lack
>> >> of a lot of CAT programs. In Lokalize there is some support for this
>> >> http://youonlylivetwice.info/lokalize/lokalize-glossary.htm
>> >
>> > I must say I've been underwhelmed by most glossary solutions.  Thanks
>> > for that link the lokalize approach looks interesting.
>>
>> Yes. It could be a good idea that instead of saving the new entries
>> added to the glossary in local, it could connect to a terminology
>> server and add them there. Perhaps is time to specify a kind of
>> protocol that have all the things needed for this things.
>
> I think before that its probably best to think of the type of data that
> is needed and exchanged.
>
> My thoughts would be:
> request_term(term) -> (term, disambiguation, definition, translation,
> translated_definition)*N

Um, this is difficult to say while we haven't defined the structure of
the glossary. In the Trasno Project we are trying to summarize what our
current glossary (maintained by hand on a wiki) contains, in order to
write a specification for our new glossary and find a system that allows
us to maintain the glossaries and exchange them in several formats, and
in several ways, with several CAT programs.


>> > More importantly.  How did they create that flash presentation!
>>
>> :) I think the important thing is what the presentation shows.
>
> Not if you want to make some of your own presentations like that.

Then contact Shaforostoff http://youonlylivetwice.info/

Bye,
   Leandro Regueiro



Re: [translate-pootle] Frequncy list

2009-01-22 Thread Dwayne Bailey
On Thu, 2009-01-22 at 12:38 +0100, Leandro Regueiro wrote:



> >> A good support (or even only support) for glossaries is a great lack
> >> of a lot of CAT programs. In Lokalize there is some support for this
> >> http://youonlylivetwice.info/lokalize/lokalize-glossary.htm
> >
> > I must say I've been underwhelmed by most glossary solutions.  Thanks
> > for that link the lokalize approach looks interesting.
> 
> Yes. It could be a good idea that instead of saving the new entries
> added to the glossary in local, it could connect to a terminology
> server and add them there. Perhaps is time to specify a kind of
> protocol that have all the things needed for this things.

I think before that it's probably best to think of the type of data that
is needed and exchanged.

My thoughts would be:
request_term(term) -> (term, disambiguation, definition, translation,
translated_definition)*N
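
As a sketch of what that signature might mean in practice: one record per meaning, and zero or more records per request. The record type, the in-memory store standing in for a terminology server, and the sample data are all assumptions; the translated definitions are left empty here.

```python
from dataclasses import dataclass


@dataclass
class TermResult:
    term: str
    disambiguation: str
    definition: str
    translation: str
    translated_definition: str


# Hypothetical in-memory store standing in for a terminology server.
_STORE = [
    TermResult("left", "remaining", "Still available.", "restante", ""),
    TermResult("left", "direction", "Opposite of right.", "esquerda", ""),
]


def request_term(term):
    """Return zero or more results (the "*N" above), one per meaning."""
    return [r for r in _STORE if r.term == term.lower()]
```

The disambiguation field is what lets a translator choose between the two results returned for "left".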

> > More importantly.  How did they create that flash presentation!
> 
> :) I think the important thing is what the presentation shows.

Not if you want to make some of your own presentations like that.

-- 
Dwayne Bailey
Associate  +27 12 460 1095 (w)
Translate.org.za   +27 83 443 7114 (c)

Recent blog posts:
* xclip - where have you been all of my life!
http://www.translate.org.za/blogs/dwayne/en/content/xclip-where-have-you-been-all-my-life
* Virtaal on Fedora
* Translate Toolkit on Fedora.  Status of Virtaal and Pootle

Stop Digital Apartheid! - http://www.digitalapartheid.com
Firefox web browser in Afrikaans - http://af.www.mozilla.com/af/
African Network for Localisation (ANLoc) -
http://africanlocalisation.net/





Re: [translate-pootle] Frequncy list

2009-01-22 Thread Leandro Regueiro
>> >> The  challenge in trying to
>> >> leverage this without extensive reference to context is that many short
>> >> strings can have ambiguous meanings
>> >>
>> >> Left (remaining) or Left (direction), Clear (erase) or Clear (transparent)
>> >> and so on.
>> >
>> > Yes, I also found this in the short frequency words lists I created for
>> > the Decathlon (see my mail to Asiri).
>> >
>> > I think the most practical solution would be to create such a list
>> > anyway, and then try to find as many different meanings for each word,
>> > and include all those meanings in the list.  You'll end up with meanings
>> > that are not common, but at least you'll cover all the meanings that are
>> > important.
>> >
>> > For example, if the list contains "file", you might put both computer
>> > file and nail file in the word list, even though nail file is very
>> > unlikely to occur in a software translation.  In this way, translators
>> > (who must use these lists intelligently) can easily spot the appropriate
>> > meaning.
>>
>> I think the terminology should be created and maintained via a
>> specific program for this task. Using a program for seeing the words
>> that are more used could be useful until certain point, because a very
>> common word is "the", a word that I think doesn't need to be in a
>> glossary.
>
> That's why you have to use stoplists, like poterminology does.

Someday I will read all the Pootle wiki material to get to know the
Pootle environment better.


>> Another thing is that in a good glossary doesn't appear words. A good
>> glossary has only concepts as entries, and several entries could have
>> the same word (because words could have several meanings).
>>
>> Sometimes could be a good idea having several glossaries, because you
>> don't use the same words in Battle for Wesnoth or in Firefox, for
>> example.
>
> Or maybe groups of terminology that cover common and then domain
> specific stuff.

Yes, I forgot the common stuff glossary, but the others are specific.


>> A good support (or even only support) for glossaries is a great lack
>> of a lot of CAT programs. In Lokalize there is some support for this
>> http://youonlylivetwice.info/lokalize/lokalize-glossary.htm
>
> I must say I've been underwhelmed by most glossary solutions.  Thanks
> for that link the lokalize approach looks interesting.

Yes. It could be a good idea if, instead of saving new glossary
entries locally, it connected to a terminology server and added them
there. Perhaps it is time to specify a kind of protocol that covers
everything needed for this.


> More importantly.  How did they create that flash presentation!

:) I think the important thing is what the presentation shows.

Bye,
  Leandro Regueiro



Re: [translate-pootle] Frequncy list

2009-01-22 Thread Dwayne Bailey
On Wed, 2009-01-21 at 12:23 +0100, Leandro Regueiro wrote:
> >> The  challenge in trying to
> >> leverage this without extensive reference to context is that many short
> >> strings can have ambiguous meanings
> >>
> >> Left (remaining) or Left (direction), Clear (erase) or Clear (transparent)
> >> and so on.
> >
> > Yes, I also found this in the short frequency words lists I created for
> > the Decathlon (see my mail to Asiri).
> >
> > I think the most practical solution would be to create such a list
> > anyway, and then try to find as many different meanings for each word,
> > and include all those meanings in the list.  You'll end up with meanings
> > that are not common, but at least you'll cover all the meanings that are
> > important.
> >
> > For example, if the list contains "file", you might put both computer
> > file and nail file in the word list, even though nail file is very
> > unlikely to occur in a software translation.  In this way, translators
> > (who must use these lists intelligently) can easily spot the appropriate
> > meaning.
> 
> I think the terminology should be created and maintained via a
> specific program for this task. Using a program for seeing the words
> that are more used could be useful until certain point, because a very
> common word is "the", a word that I think doesn't need to be in a
> glossary.

That's why you have to use stoplists, like poterminology does.
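
The stoplist idea in general terms, as a standalone sketch: count word frequencies over the source strings and skip words such as "the". This is not poterminology's code; the stoplist contents and the function name are assumptions.

```python
import re
from collections import Counter

STOPLIST = {"the", "a", "an", "of", "to", "and", "in", "is"}  # illustrative


def candidate_terms(strings, stoplist=STOPLIST, top=10):
    """Return the most frequent words in `strings`, ignoring stoplist words."""
    counts = Counter()
    for text in strings:
        for word in re.findall(r"[a-z]+", text.lower()):
            if word not in stoplist:
                counts[word] += 1
    return counts.most_common(top)
```

Run over "Open the file", "Save the file" and "Close the window", this ranks "file" first and never reports "the", although "the" is the most frequent word.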

> Another thing is that in a good glossary doesn't appear words. A good
> glossary has only concepts as entries, and several entries could have
> the same word (because words could have several meanings).
> 
> Sometimes could be a good idea having several glossaries, because you
> don't use the same words in Battle for Wesnoth or in Firefox, for
> example.

Or maybe groups of terminology that cover common and then domain
specific stuff.

> A good support (or even only support) for glossaries is a great lack
> of a lot of CAT programs. In Lokalize there is some support for this
> http://youonlylivetwice.info/lokalize/lokalize-glossary.htm

I must say I've been underwhelmed by most glossary solutions.  Thanks
for that link; the Lokalize approach looks interesting.

More importantly: how did they create that Flash presentation?

-- 
Dwayne Bailey
Associate  +27 12 460 1095 (w)
Translate.org.za   +27 83 443 7114 (c)

Recent blog posts:
* xclip - where have you been all of my life!
http://www.translate.org.za/blogs/dwayne/en/content/xclip-where-have-you-been-all-my-life
* Virtaal on Fedora
* Translate Toolkit on Fedora.  Status of Virtaal and Pootle

Stop Digital Apartheid! - http://www.digitalapartheid.com
Firefox web browser in Afrikaans - http://af.www.mozilla.com/af/
African Network for Localisation (ANLoc) - http://africanlocalisation.net/


