Re: [translate-pootle] when is real-time translation memory support coming, and what does it mean?
On Wed, Aug 25, 2010 at 1:22 PM, F Wolff frie...@translate.org.za wrote:
> On Wed, 2010-08-25 at 11:07 +0200, Leandro Regueiro wrote:
>> On Mon, Aug 23, 2010 at 4:01 PM, Dwayne Bailey dwa...@translate.org.za wrote:
>>> On Fri, 2010-08-20 at 17:52 -0700, The Language Techie wrote:
>>>> Dwayne: On Twitter you mentioned that Pootle core code is in place for real-time TM. Can you give a few more details, and when is this functionality coming?

>>> We've just got 2.1 out the door and will probably focus on bug fixes for the next little while. After that we'll be looking at our roadmap.

>>>> This is a wish list description of what I consider to be real-time TM support:
>>>> (1) Project initiators can upload a TMX TM file to the project to apply the TM matching to all the files for that project.

>>> Apart from the files being translated? You mean a separate TM, right?

>>>> (2) At the translate interface, each source segment will have a list of fuzzy-match suggestions based on the active TM for the project. The score of the fuzzy match (1%-99%) or exact match (100%) will be displayed. The difference between the source text and the text being considered for translation will be highlighted so the translator can see the difference right away.

>>> We do this in Virtaal; have a look and tell us what you think. This would probably be the approach that we'll take.

>>>> (3) The translator can choose to see fuzzy matches at a pre-configured level, such as 0% (no fuzzy), 50%, 60%, 75%, etc. This should be a user preference setting.

>>> While setting a threshold is nice, a really low threshold has no real value and loads the server quite considerably.

>> IMHO below 60% is not useful at all. I would even suggest showing at most the top five results. For short strings (fewer than 10 words) I suggest showing only matches of 85% or higher, and for longer strings (paragraphs and long phrases) you can show matches of 60% or more.
>> These are my suggestions after using the Lokalize TM matching feature for several months; I am still looking for a way to set up the TM matching feature in Virtaal.

> Hello Leandro
> I agree about 60%, although you might be surprised about what people do in the translation industry :-) Our defaults are usually at about 70%-75%. I also agree about limiting the matches, although for Pootle, I would probably want to limit it to even fewer than 5 (probably only one or two suggestions), but that is a detail for later.

You may be surprised: for some long strings, the third or fourth match turns out to be the most suitable basis for the new translation. TM matching sometimes behaves strangely.

> Virtaal doesn't currently support configuring the match percentage through the GUI, but you can easily edit the options in ~/.virtaal/plugins.ini
> Keep well
> Friedel

___
Translate-pootle mailing list
Translate-pootle@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/translate-pootle
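Editor's note: the threshold and result-limit suggestions discussed in this message can be sketched roughly as follows. This is only an illustration, not Pootle or Virtaal code. The function names are hypothetical, the TM is assumed to be a plain dict of source to target strings, and Python's difflib ratio stands in for whatever similarity measure a real TM engine uses.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a 0-100 character-based similarity score (difflib ratio)."""
    return round(SequenceMatcher(None, a, b).ratio() * 100)


def min_score_for(source):
    """Length-dependent cutoff: 85 for strings under 10 words, else 60."""
    return 85 if len(source.split()) < 10 else 60


def top_matches(source, tm, max_results=5):
    """Return up to max_results (score, tm_source, tm_target) tuples, best first.

    `tm` is assumed to be a dict mapping source strings to translations.
    """
    cutoff = min_score_for(source)
    scored = [(similarity(source, s), s, t) for s, t in tm.items()]
    kept = [m for m in scored if m[0] >= cutoff]
    kept.sort(key=lambda m: m[0], reverse=True)
    return kept[:max_results]


if __name__ == "__main__":
    tm = {
        "Open the file": "Abrir o ficheiro",
        "Close the window": "Pechar a xanela",
    }
    for score, tm_source, tm_target in top_matches("Open the file", tm):
        print(score, tm_source, "=", tm_target)
```

Lowering `max_results` to 1 or 2 gives the tighter limit suggested for Pootle, while keeping it at 5 leaves room for the case where a lower-ranked match is the better starting point.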
Re: [translate-pootle] when is real-time translation memory support coming, and what does it mean?
On Mon, Aug 23, 2010 at 4:01 PM, Dwayne Bailey dwa...@translate.org.za wrote:
> On Fri, 2010-08-20 at 17:52 -0700, The Language Techie wrote:
>> Dwayne: On Twitter you mentioned that Pootle core code is in place for real-time TM. Can you give a few more details, and when is this functionality coming?

> We've just got 2.1 out the door and will probably focus on bug fixes for the next little while. After that we'll be looking at our roadmap.

>> This is a wish list description of what I consider to be real-time TM support:
>> (1) Project initiators can upload a TMX TM file to the project to apply the TM matching to all the files for that project.

> Apart from the files being translated? You mean a separate TM, right?

>> (2) At the translate interface, each source segment will have a list of fuzzy-match suggestions based on the active TM for the project. The score of the fuzzy match (1%-99%) or exact match (100%) will be displayed. The difference between the source text and the text being considered for translation will be highlighted so the translator can see the difference right away.

> We do this in Virtaal; have a look and tell us what you think. This would probably be the approach that we'll take.

>> (3) The translator can choose to see fuzzy matches at a pre-configured level, such as 0% (no fuzzy), 50%, 60%, 75%, etc. This should be a user preference setting.

> While setting a threshold is nice, a really low threshold has no real value and loads the server quite considerably.

IMHO below 60% is not useful at all. I would even suggest showing at most the top five results. For short strings (fewer than 10 words) I suggest showing only matches of 85% or higher, and for longer strings (paragraphs and long phrases) you can show matches of 60% or more.
These are my suggestions after using the Lokalize TM matching feature for several months; I am still looking for a way to set up the TM matching feature in Virtaal.

>> (4) After one translation is committed or suggested (by pressing the button), that translation goes into the active TM pool immediately, so that it can become a 100% match or a fuzzy match for the next sentence, without waiting until the entire project is completed or the TM is updated.

> With 2.1 all our strings are in the database so that should become easier to do. But the issue of load comes up again. Matching is pretty expensive.

>> For example,
>> EN The quick brown fox jumps over the lazy dog. = zh-TW 敏捷的棕狐狸跳過懶狗。
>> As soon as this translation is made, and committed or suggested, if at a later time, or in the next sentence, the translator sees:
>> EN The quick brown foxes jump over the lazy dog.
>> They will see in the fuzzy-match section that
>> EN The quick brown fox jumps over the lazy dog. = zh-TW 敏捷的棕狐狸跳過懶狗。
>> is a high fuzzy match, with a score of about 75-80%. If this TM match is only a suggestion, it would be nice to show that. Any TM match should show information like creation-user, change-user, creation-time, change-time, etc., if available.

> We probably won't do the TM approach that loads a 100% match into your text. We prefer users to make their own selections.

>> This real-time fuzzy match behavior is what commercial CAT tools offer already.

> Sure, so do we in Virtaal. It's pretty easy to do in a desktop tool. But matching a lot of strings across a lot of languages and projects is where we need to see how well we can scale.

>> (5) At any time during the project, the active TM can be downloaded in TMX format. It should contain all committed and suggested translations. Or, the inclusion of suggested translations in the download should be an option. The downloaded TMX should contain creation-user, change-user, creation-time, change-time, etc. metadata.
>> (6) When the user chooses to download files for offline translation with Virtaal, the relevant part of the active TM (over the chosen % threshold) should be downloaded along with the files, so that in Virtaal, users will see exactly the same fuzzy matches as they would see in Pootle online. The only difference is that translations made by users offline in Virtaal will not be uploaded to the Pootle server in real time to benefit other translators.
>> Hope my description of the TM functionality makes sense to the team, and I look forward to hearing about this feature!

> Yes, it does indeed and is very much in line with our own thinking.
> --
> Dwayne Bailey
> Associate Research Director    +27 12 460 1095 (w)
> Translate.org.za ANLoc         +27 83 443 7114 (c)
> Recent blog posts:
> * Localizing Mac OS X strings files using open source PO editors
>   http://www.translate.org.za/blogs/dwayne/en/content/localizing-mac-os-x-strings-files-using-open-source-po-editors
> * What's new in Virtaal 0.6.1
> * Localisation: How we guess the target translation language in Virtaal
> Firefox web browser in Afrikaans - http://af.www.mozilla.com/af/
> African Network for
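Editor's note: item (4) above, where a committed or suggested translation becomes matchable immediately, can be sketched as a small in-memory pool. This is a rough illustration under stated assumptions, not how Pootle stores or matches units. `LiveTM` and its methods are hypothetical names, and difflib's character-based ratio is not the measure a CAT tool uses, so the number it gives for the fox example need not fall in the 75-80% range mentioned in the message.

```python
from difflib import SequenceMatcher


class LiveTM:
    """In-memory TM whose units are searchable as soon as they are added."""

    def __init__(self):
        # Each unit is a (source, target, is_suggestion) tuple.
        self.units = []

    def add(self, source, target, is_suggestion=False):
        """Store a committed translation or a suggestion right away."""
        self.units.append((source, target, is_suggestion))

    def lookup(self, source, min_score=60):
        """Return (score, tm_source, target, is_suggestion) tuples, best first."""
        results = []
        for tm_source, target, is_suggestion in self.units:
            ratio = SequenceMatcher(None, source, tm_source).ratio()
            score = round(ratio * 100)
            if score >= min_score:
                results.append((score, tm_source, target, is_suggestion))
        return sorted(results, key=lambda r: r[0], reverse=True)


if __name__ == "__main__":
    tm = LiveTM()
    tm.add("The quick brown fox jumps over the lazy dog.", "敏捷的棕狐狸跳過懶狗。")
    # The next segment can be matched against the unit added a moment ago.
    for match in tm.lookup("The quick brown foxes jump over the lazy dog."):
        print(match)
```

The linear scan in `lookup` is exactly the cost that makes matching expensive on a server: every lookup compares the new segment against every stored unit, which is why a real deployment would need an index or a candidate pre-filter.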
Re: [translate-pootle] when is real-time translation memory support coming, and what does it mean?
On Fri, 2010-08-20 at 17:52 -0700, The Language Techie wrote:
> Dwayne: On Twitter you mentioned that Pootle core code is in place for real-time TM. Can you give a few more details, and when is this functionality coming?

We've just got 2.1 out the door and will probably focus on bug fixes for the next little while. After that we'll be looking at our roadmap.

> This is a wish list description of what I consider to be real-time TM support:
> (1) Project initiators can upload a TMX TM file to the project to apply the TM matching to all the files for that project.

Apart from the files being translated? You mean a separate TM, right?

> (2) At the translate interface, each source segment will have a list of fuzzy-match suggestions based on the active TM for the project. The score of the fuzzy match (1%-99%) or exact match (100%) will be displayed. The difference between the source text and the text being considered for translation will be highlighted so the translator can see the difference right away.

We do this in Virtaal; have a look and tell us what you think. This would probably be the approach that we'll take.

> (3) The translator can choose to see fuzzy matches at a pre-configured level, such as 0% (no fuzzy), 50%, 60%, 75%, etc. This should be a user preference setting.

While setting a threshold is nice, a really low threshold has no real value and loads the server quite considerably.

> (4) After one translation is committed or suggested (by pressing the button), that translation goes into the active TM pool immediately, so that it can become a 100% match or a fuzzy match for the next sentence, without waiting until the entire project is completed or the TM is updated.

With 2.1 all our strings are in the database so that should become easier to do. But the issue of load comes up again. Matching is pretty expensive.

> For example,
> EN The quick brown fox jumps over the lazy dog. = zh-TW 敏捷的棕狐狸跳過懶狗。
> As soon as this translation is made, and committed or suggested, if at a later time, or in the next sentence, the translator sees:
> EN The quick brown foxes jump over the lazy dog.
> They will see in the fuzzy-match section that
> EN The quick brown fox jumps over the lazy dog. = zh-TW 敏捷的棕狐狸跳過懶狗。
> is a high fuzzy match, with a score of about 75-80%. If this TM match is only a suggestion, it would be nice to show that. Any TM match should show information like creation-user, change-user, creation-time, change-time, etc., if available.

We probably won't do the TM approach that loads a 100% match into your text. We prefer users to make their own selections.

> This real-time fuzzy match behavior is what commercial CAT tools offer already.

Sure, so do we in Virtaal. It's pretty easy to do in a desktop tool. But matching a lot of strings across a lot of languages and projects is where we need to see how well we can scale.

> (5) At any time during the project, the active TM can be downloaded in TMX format. It should contain all committed and suggested translations. Or, the inclusion of suggested translations in the download should be an option. The downloaded TMX should contain creation-user, change-user, creation-time, change-time, etc. metadata.
> (6) When the user chooses to download files for offline translation with Virtaal, the relevant part of the active TM (over the chosen % threshold) should be downloaded along with the files, so that in Virtaal, users will see exactly the same fuzzy matches as they would see in Pootle online. The only difference is that translations made by users offline in Virtaal will not be uploaded to the Pootle server in real time to benefit other translators.
> Hope my description of the TM functionality makes sense to the team, and I look forward to hearing about this feature!

Yes, it does indeed and is very much in line with our own thinking.

--
Dwayne Bailey
Associate Research Director    +27 12 460 1095 (w)
Translate.org.za ANLoc         +27 83 443 7114 (c)
Recent blog posts:
* Localizing Mac OS X strings files using open source PO editors
  http://www.translate.org.za/blogs/dwayne/en/content/localizing-mac-os-x-strings-files-using-open-source-po-editors
* What's new in Virtaal 0.6.1
* Localisation: How we guess the target translation language in Virtaal
Firefox web browser in Afrikaans - http://af.www.mozilla.com/af/
African Network for Localisation (ANLoc) - http://africanlocalisation.net/
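Editor's note: item (5) above, a TMX download carrying creation and change metadata with suggestions optionally excluded, might look roughly like the sketch below. It is an illustration only and does not reflect Pootle's actual export code. The unit dict shape, the `build_tmx` name, and the `example-exporter` tool name are all assumptions; the `creationid`, `creationdate`, `changeid` and `changedate` attributes are the ones TMX 1.4 defines for this kind of metadata.

```python
import xml.etree.ElementTree as ET

# Namespaced form of the xml:lang attribute, serialized by ElementTree as xml:lang.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
META_KEYS = ("creationid", "creationdate", "changeid", "changedate")


def build_tmx(units, srclang="en", include_suggestions=True):
    """Build a TMX 1.4 element tree from unit dicts (hypothetical shape).

    Each unit needs "source", "target" and "lang"; the metadata keys and the
    "suggestion" flag are optional.
    """
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "example-exporter",  # hypothetical tool name
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "unknown",
        "adminlang": "en",
        "srclang": srclang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for unit in units:
        if unit.get("suggestion") and not include_suggestions:
            continue  # leave suggested translations out of the download
        tu = ET.SubElement(body, "tu")
        for key in META_KEYS:
            if unit.get(key):
                tu.set(key, unit[key])
        for lang, text in ((srclang, unit["source"]), (unit["lang"], unit["target"])):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return tmx


if __name__ == "__main__":
    units = [{
        "source": "The quick brown fox jumps over the lazy dog.",
        "target": "敏捷的棕狐狸跳過懶狗。",
        "lang": "zh-TW",
        "creationid": "alice",
        "creationdate": "20100820T175200Z",
    }]
    print(ET.tostring(build_tmx(units), encoding="unicode"))
```

Keeping suggestions behind a flag mirrors the option requested in item (5): the same unit list can produce either a download with every translation or one restricted to committed translations.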