Re: [translate-pootle] when is real-time translation memory support coming, and what does it mean?

2010-09-01 Thread Leandro Regueiro
On Wed, Aug 25, 2010 at 1:22 PM, F Wolff frie...@translate.org.za wrote:
 On Wed, 2010-08-25 at 11:07 +0200, Leandro Regueiro wrote:
 On Mon, Aug 23, 2010 at 4:01 PM, Dwayne Bailey dwa...@translate.org.za wrote:
  On Fri, 2010-08-20 at 17:52 -0700, The Language Techie wrote:
  Dwayne:
 
  In Twitter you mentioned that Pootle core code is in place for real-time TM.
  Can you give a little more detail, and when is this functionality coming?
 
  We've just got 2.1 out the door and will probably focus on bug fixes for
  the next little while.  After that we'll be looking at our roadmap.
 
  This is a wish list description of what I consider to be real-time TM
  support:
 
  (1) Project initiators can upload a TMX TM file to the project to apply the
  TM matching to all the files for that project.
 
  Apart from the files being translated?  You mean a separate TM, right?
 
  (2) At the translate interface, each source segment will have a list of
  fuzzy-match suggestions based on the active TM for the project.  The score
  of the fuzzy match (1%-99%) or exact match (100%) will be displayed.  The
  difference in the source text and the text being considered for translation
  will be highlighted for the translator to see the difference right away.
 
  We do this on Virtaal, have a look and tell us what you think.  This
  would probably be the approach that we'll take.
 
  (3) The translator can choose to see fuzzy matches at a pre-configured
  level, such as 0% (no fuzzy), 50%, 60%, 75%, etc.  This should be a user
  preference setting.
 
  While setting a threshold is nice, a really low threshold has no real
  value and loads the server quite considerably.

 IMHO matches below 60% are not useful at all. I would even suggest showing
 only the top five results at most. For short strings (fewer than 10 words)
 I suggest showing only matches of 85% or higher, and for longer strings
 (paragraphs and long phrases) you can show matches of 60% or more. These
 are my suggestions after using the Lokalize TM matching feature for several
 months; I am still looking for a way to set up the TM matching feature in
 Virtaal.


 Hallo Leandro

 I agree about 60%, although you might be surprised about what people do
 in the translation industry :-)  Our defaults are usually at about
 70%-75%.  I also agree about limiting the matches, although for Pootle,
 I would probably want to limit it to even fewer than 5 (probably only one
 or two suggestions), but that is a detail for later.

You may be surprised: for some long strings, it is the third or the fourth
match that is the most suitable basis for the new translation. TM matching
sometimes behaves in odd ways.

 Virtaal doesn't currently support configuring the match percentage
 through the GUI, but you can easily edit the options in
 ~/.virtaal/plugins.ini

 Keep well
 Friedel

___
Translate-pootle mailing list
Translate-pootle@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/translate-pootle


Re: [translate-pootle] when is real-time translation memory support coming, and what does it mean?

2010-08-25 Thread Leandro Regueiro
On Mon, Aug 23, 2010 at 4:01 PM, Dwayne Bailey dwa...@translate.org.za wrote:
 On Fri, 2010-08-20 at 17:52 -0700, The Language Techie wrote:
 Dwayne:

 In Twitter you mentioned that Pootle core code is in place for real-time TM.
 Can you give a little more detail, and when is this functionality coming?

 We've just got 2.1 out the door and will probably focus on bug fixes for
 the next little while.  After that we'll be looking at our roadmap.

 This is a wish list description of what I consider to be real-time TM
 support:

 (1) Project initiators can upload a TMX TM file to the project to apply the
 TM matching to all the files for that project.

 Apart from the files being translated?  You mean a separate TM, right?

 (2) At the translate interface, each source segment will have a list of
 fuzzy-match suggestions based on the active TM for the project.  The score
 of the fuzzy match (1%-99%) or exact match (100%) will be displayed.  The
 difference in the source text and the text being considered for translation
 will be highlighted for the translator to see the difference right away.

 We do this on Virtaal, have a look and tell us what you think.  This
 would probably be the approach that we'll take.

 (3) The translator can choose to see fuzzy matches at a pre-configured
 level, such as 0% (no fuzzy), 50%, 60%, 75%, etc.  This should be a user
 preference setting.

 While setting a threshold is nice, a really low threshold has no real
 value and loads the server quite considerably.

IMHO matches below 60% are not useful at all. I would even suggest showing
only the top five results at most. For short strings (fewer than 10 words)
I suggest showing only matches of 85% or higher, and for longer strings
(paragraphs and long phrases) you can show matches of 60% or more. These
are my suggestions after using the Lokalize TM matching feature for several
months; I am still looking for a way to set up the TM matching feature in
Virtaal.
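
The thresholds suggested above could be sketched roughly as follows. This is
only an illustration: difflib.SequenceMatcher is a stand-in for whatever
similarity measure the TM backend really uses, and the names min_score_for
and filter_matches are made up for this example.

```python
from difflib import SequenceMatcher

SHORT_STRING_WORDS = 10   # "short" means fewer than 10 words
SHORT_MIN_SCORE = 85      # percent, for short strings
LONG_MIN_SCORE = 60       # percent, for longer strings
MAX_RESULTS = 5           # show at most the top five matches


def similarity(a, b):
    """Return a 0-100 similarity score (stand-in metric, not Pootle's)."""
    return SequenceMatcher(None, a, b).ratio() * 100


def min_score_for(source):
    """Pick the threshold from the length of the string being translated."""
    if len(source.split()) < SHORT_STRING_WORDS:
        return SHORT_MIN_SCORE
    return LONG_MIN_SCORE


def filter_matches(source, tm_entries):
    """Return at most MAX_RESULTS (score, tm_source, tm_target), best first."""
    threshold = min_score_for(source)
    scored = [(similarity(source, s), s, t) for s, t in tm_entries]
    kept = [m for m in scored if m[0] >= threshold]
    kept.sort(key=lambda m: m[0], reverse=True)
    return kept[:MAX_RESULTS]
```

Making the cut-off depend on string length keeps weak matches for short
strings off the screen while still surfacing partial matches for paragraphs.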

 (4) After one translation is committed or suggested (by pressing the
 button), that translation goes into the active TM pool immediately, so that
 it can become a 100% match or a fuzzy match for the next sentence, without
 waiting until the entire project is completed or the TM is updated.

 With 2.1 all our strings are in the database so that should become
 easier to do.  But the issue of load comes up again.  Matching is pretty
 expensive.

 For example,

 EN The quick brown fox jumps over the lazy dog. = zh-TW 敏捷的棕狐狸跳過懶狗。

 As soon as this translation is made, and committed or suggested, if at a
 later time, or in the next sentence, the translator sees:

 EN The quick brown foxes jump over the lazy dog.

 They will see in the fuzzy-match section that

 EN The quick brown fox jumps over the lazy dog. = zh-TW 敏捷的棕狐狸跳過懶狗。

 is a high fuzzy match, with a score of about 75-80%.  If this TM match is
 only a suggestion, it will be nice to show that.  Any TM match should show
 information like creation-user, change-user, creation-time, change-time,
 etc., if available.

 We probably won't do the TM approach that loads a 100% match into your
 text. We prefer users to make their own selections.

 This real-time fuzzy match behavior is what commercial CAT tools offer
 already.

 Sure, so do we in Virtaal.  It's pretty easy to do on a desktop tool.
 But matching a lot of strings across a lot of languages and projects will
 be where we need to see how well we can scale.

 (5) At any time during the project, the active TM can be downloaded in
 TMX format.  It should contain all committed and suggested translations.  Or,
 the inclusion of suggested translations in the download should be an
 option.  The downloaded TMX should contain creation-user, change-user,
 creation-time, change-time, etc. metadata.

 (6) When the user chooses to download files for offline translation with
 Virtaal, the relevant part of the active TM (over the chosen % threshold)
 should be downloaded along with the files, so that in Virtaal, users will
 see exactly the same fuzzy matches as they would see in Pootle online.  The
 only difference is that translations made by users offline in Virtaal will
 not be uploaded to the Pootle server in real time to benefit other translators.

 Hope my description of the TM functionality makes sense to the team, and I
 look forward to hearing about this feature!

 Yes, it does indeed and is very much in line with our own thinking.

 --
 Dwayne Bailey
 Associate             Research Director        +27 12 460 1095 (w)
 Translate.org.za      ANLoc                    +27 83 443 7114 (c)

 Recent blog posts:
 * Localizing Mac OS X strings files using open source PO editors
 http://www.translate.org.za/blogs/dwayne/en/content/localizing-mac-os-x-strings-files-using-open-source-po-editors
 * What's new in Virtaal 0.6.1
 * Localisation: How we guess the target translation language in Virtaal

 Firefox web browser in Afrikaans - http://af.www.mozilla.com/af/
 African Network for 

Re: [translate-pootle] when is real-time translation memory support coming, and what does it mean?

2010-08-23 Thread Dwayne Bailey
On Fri, 2010-08-20 at 17:52 -0700, The Language Techie wrote:
 Dwayne:
 
 In Twitter you mentioned that Pootle core code is in place for real-time TM.
 Can you give a little more detail, and when is this functionality coming?

We've just got 2.1 out the door and will probably focus on bug fixes for
the next little while.  After that we'll be looking at our roadmap.

 This is a wish list description of what I consider to be real-time TM
 support:
 
 (1) Project initiators can upload a TMX TM file to the project to apply the
 TM matching to all the files for that project.

Apart from the files being translated?  You mean a separate TM, right?

 (2) At the translate interface, each source segment will have a list of
 fuzzy-match suggestions based on the active TM for the project.  The score
 of the fuzzy match (1%-99%) or exact match (100%) will be displayed.  The
 difference in the source text and the text being considered for translation
 will be highlighted for the translator to see the difference right away.

We do this on Virtaal, have a look and tell us what you think.  This
would probably be the approach that we'll take.

 (3) The translator can choose to see fuzzy matches at a pre-configured
 level, such as 0% (no fuzzy), 50%, 60%, 75%, etc.  This should be a user
 preference setting.

While setting a threshold is nice, a really low threshold has no real
value and loads the server quite considerably.

 (4) After one translation is committed or suggested (by pressing the
 button), that translation goes into the active TM pool immediately, so that
 it can become a 100% match or a fuzzy match for the next sentence, without
 waiting until the entire project is completed or the TM is updated.

With 2.1 all our strings are in the database so that should become
easier to do.  But the issue of load comes up again.  Matching is pretty
expensive.

 For example,
 
 EN The quick brown fox jumps over the lazy dog. = zh-TW 敏捷的棕狐狸跳過懶狗。
 
 As soon as this translation is made, and committed or suggested, if at a
 later time, or in the next sentence, the translator sees:
 
 EN The quick brown foxes jump over the lazy dog.
 
 They will see in the fuzzy-match section that
 
 EN The quick brown fox jumps over the lazy dog. = zh-TW 敏捷的棕狐狸跳過懶狗。
 
 is a high fuzzy match, with a score of about 75-80%.  If this TM match is
 only a suggestion, it will be nice to show that.  Any TM match should show
 information like creation-user, change-user, creation-time, change-time,
 etc., if available.

We probably won't do the TM approach that loads a 100% match into your
text. We prefer users to make their own selections.
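
A toy version of point (4), under the approach described here where matches
are offered as suggestions and never written into the target text
automatically. Everything in it is an assumption made for illustration: the
class and field names are invented, the pool lives in memory rather than in
the database, and difflib.SequenceMatcher replaces the real matcher, so the
score it reports need not agree with the 75-80% estimate above.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from difflib import SequenceMatcher


@dataclass
class TMUnit:
    source: str
    target: str
    creation_user: str
    is_suggestion: bool = False
    creation_time: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))


class InMemoryTM:
    """Hypothetical per-project TM pool; a real server would use a database."""

    def __init__(self):
        self.units = []

    def add(self, source, target, user, is_suggestion=False):
        # Called as soon as a translation is committed or suggested.
        self.units.append(TMUnit(source, target, user, is_suggestion))

    def lookup(self, source, min_score=70):
        """Return (score, unit) pairs at or above min_score, best first."""
        hits = []
        for unit in self.units:  # linear scan: this is the expensive part
            score = SequenceMatcher(None, source, unit.source).ratio() * 100
            if score >= min_score:
                hits.append((score, unit))
        hits.sort(key=lambda h: h[0], reverse=True)
        return hits


tm = InMemoryTM()
tm.add("The quick brown fox jumps over the lazy dog.",
       "敏捷的棕狐狸跳過懶狗。", user="translator1")

# The very next segment already sees the unit that was just added.
hits = tm.lookup("The quick brown foxes jump over the lazy dog.")
for score, unit in hits:
    flag = "suggestion" if unit.is_suggestion else "committed"
    print(f"{score:.0f}% [{flag}, by {unit.creation_user}] "
          f"{unit.source} = {unit.target}")
```

The loop in lookup compares the new segment against every stored unit, which
is why matching across many projects and languages becomes a load concern.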

 This real-time fuzzy match behavior is what commercial CAT tools offer
 already.

Sure, so do we in Virtaal.  It's pretty easy to do on a desktop tool.
But matching a lot of strings across a lot of languages and projects will
be where we need to see how well we can scale.

 (5) At any time during the project, the active TM can be downloaded in
 TMX format.  It should contain all committed and suggested translations.  Or,
 the inclusion of suggested translations in the download should be an
 option.  The downloaded TMX should contain creation-user, change-user,
 creation-time, change-time, etc. metadata.
 
 (6) When the user chooses to download files for offline translation with
 Virtaal, the relevant part of the active TM (over the chosen % threshold)
 should be downloaded along with the files, so that in Virtaal, users will
 see exactly the same fuzzy matches as they would see in Pootle online.  The
 only difference is that translations made by users offline in Virtaal will
 not be uploaded to the Pootle server in real time to benefit other translators.
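
For point (5), an export carrying that metadata might look something like
the sketch below. It uses only the Python standard library and the TMX 1.4
attribute names creationid, creationdate, changeid and changedate; the
build_tmx function, the dict layout of each unit and the header values are
placeholders invented for this example, not Pootle code.

```python
import xml.etree.ElementTree as ET

# ElementTree serialises this qualified name as the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_tmx(units, source_lang, target_lang, include_suggestions=True):
    """Build a TMX 1.4 document from a list of dicts (hypothetical shape)."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "example-export",   # placeholder values
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "example",
        "adminlang": "en",
        "srclang": source_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for unit in units:
        # Optionally leave unreviewed suggestions out of the download.
        if unit.get("is_suggestion") and not include_suggestions:
            continue
        tu = ET.SubElement(body, "tu", {
            "creationid": unit["creation_user"],
            "creationdate": unit["creation_time"],  # e.g. 20100823T160100Z
            "changeid": unit["change_user"],
            "changedate": unit["change_time"],
        })
        pairs = ((source_lang, unit["source"]), (target_lang, unit["target"]))
        for lang, text in pairs:
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")
```

Passing include_suggestions=False gives the variant where suggested
translations are an optional part of the download.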
 
 Hope my description of the TM functionality makes sense to the team, and I
 look forward to hearing about this feature!

Yes, it does indeed and is very much in line with our own thinking.

-- 
Dwayne Bailey
Associate             Research Director        +27 12 460 1095 (w)
Translate.org.za      ANLoc                    +27 83 443 7114 (c)

Recent blog posts:
* Localizing Mac OS X strings files using open source PO editors
http://www.translate.org.za/blogs/dwayne/en/content/localizing-mac-os-x-strings-files-using-open-source-po-editors
* What's new in Virtaal 0.6.1
* Localisation: How we guess the target translation language in Virtaal

Firefox web browser in Afrikaans - http://af.www.mozilla.com/af/
African Network for Localisation (ANLoc) - http://africanlocalisation.net/


