People who want to make use of the TMX available here:

http://ooo.services.openoffice.org/pub/OpenOffice.org/cws/upload/localization/tmx21/

have noticed that their structure does not match that of the PO files output by the oo2po utility.

The reason is that the TMX are created directly from the existing SDF files (or something close enough) while the PO files add an extra layer of escape characters ("\") to the SDF contents:

For example:

*String in the SDF (and in the TMX):

 \<ahelp hid=\".\" visibility=\"hidden\"\>something\</ahelp\>

*The same string in the PO will become:

 \\<ahelp hid=\\\".\\\" visibility=\\\"hidden\\\"\\>something\\</ahelp\\>
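To make the extra layer concrete, here is a minimal Python sketch that reproduces the PO form from the SDF form of the example above. The rule it applies (double every backslash, then put a backslash in front of every double quote) is an assumption inferred from that single example; it is not taken from the oo2po source, and the function name po_escape is made up for the illustration.

```python
def po_escape(sdf_text: str) -> str:
    """Add one more layer of escaping, as seen in the PO example.

    Assumption: every backslash is doubled and every double quote gets a
    backslash in front of it. This mirrors the example only.
    """
    # Backslashes are doubled first; otherwise the backslashes added in
    # front of the quotes would be doubled as well.
    return sdf_text.replace("\\", "\\\\").replace('"', '\\"')


# The SDF/TMX form of the string, written as a raw Python string.
sdf_string = r'\<ahelp hid=\".\" visibility=\"hidden\"\>something\</ahelp\>'
print(po_escape(sdf_string))
```

Doing this by hand for every tag and every quote in a help file is exactly the kind of work where mistakes creep in.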


So, when people use the TMX to match the contents of the PO file, they have to add all the extra "\" by hand, which is very error-prone. On top of that, only a few PO editors (OmegaT only?) can make real use of TMX files...


The solution is to directly work from the SDF files.


But since their structure is a little complex, it is easier to extract the translatable contents first, translate them in a tool that supports TMX (OmegaT, OpenLanguageTools, etc.) and then merge the translation back into the SDF to deliver an error-free file.


People who still work with PO files in PO _editors_ don't really know what it is like to translate with _real_ translation tools, and I hope to convince them that the PO based processes that are used for OOo's localization are quickly becoming obsolete.

If you want to have fun translating while maintaining a professional level to your work, it is time to consider using tools that are created for translation... :)



A few days before this round started, I mentioned that Alex Buloichik had created a utility that extracts and merges the translatable contents of an SDF file. I tested it on the French file set (in a trial and error process, apologies to Sophie for the weird intermediate files...) and now it is ready to be used by the OOo localization community.

The tool is hosted here:

http://alex73.zaval.org/snapshots/OpenOffice/sdf2txt.jar

The source code is included in the Jar, and the license is GPL.

This tool is mostly for team coordinators: they use it to split the SDF into its module parts and to extract the translatable contents to simple key=value text files.

The syntax is as follows:

To store the SDF messages in text files:
java -jar sdf2txt.jar --extract <source-sdf-file-name> <source-lang> <output-dir>
To create an SDF file with the translated messages:
java -jar sdf2txt.jar --merge <source-sdf-file-name> <source-lang> <input-dir> <target-sdf-file-name> <language>
Examples:
   java -jar sdf2txt.jar --extract en-US.sdf en-US data/
   java -jar sdf2txt.jar --merge en-US.sdf en-US data/ be-BY.sdf be-BY
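For coordinators who want to script the two calls, a small Python wrapper could look like the sketch below. It only assembles the argument lists in the order given in the syntax above. The helper names are invented for the illustration, the jar is assumed to sit in the current directory, and whether sdf2txt signals failure through its exit status is also an assumption.

```python
import subprocess

JAR = "sdf2txt.jar"  # assumption: the jar is in the current directory


def extract_command(source_sdf, source_lang, output_dir):
    """Build the --extract call, arguments in the documented order."""
    return ["java", "-jar", JAR, "--extract",
            source_sdf, source_lang, output_dir]


def merge_command(source_sdf, source_lang, input_dir, target_sdf, target_lang):
    """Build the --merge call, arguments in the documented order."""
    return ["java", "-jar", JAR, "--merge",
            source_sdf, source_lang, input_dir, target_sdf, target_lang]


def run(command):
    """Run one command; raises CalledProcessError on a non-zero exit status.

    Assumption: sdf2txt reports failure through its exit status.
    """
    subprocess.run(command, check=True)


if __name__ == "__main__":
    # Print the commands instead of running them, to check them first.
    print(" ".join(extract_command("en-US.sdf", "en-US", "data/")))
    print(" ".join(merge_command("en-US.sdf", "en-US", "data/",
                                 "be-BY.sdf", "be-BY")))
```

Passing the arguments as a list rather than as one string avoids any shell quoting problems with file names.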


The output creates a folder architecture with the module names and the word count appended to each name. A summary is also output in a count.log file (file name, line number, word count).

For example, the file that contains the strings of the original [res_DataLabel_tmpl.hrc] will be found in the following folder structure:

[extraction folder]/chart2-84/source-84/controller-84/dialogs-84/res_DataLabel_tmpl.hrc.utf8.ini
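If you need to map an extracted file back to its place in the source tree, the word counts can be stripped from the folder names. The following Python sketch assumes that the count is always appended as "-" followed by digits and that the extracted file always ends in ".utf8.ini", as in the example above. Both are guesses based on that one path, and the function name is made up.

```python
import re
from pathlib import PurePosixPath

COUNT_SUFFIX = re.compile(r"-\d+$")  # assumed form of the appended word count
FILE_SUFFIX = ".utf8.ini"            # suffix seen in the example path


def original_path(extracted: str) -> str:
    """Return the path without word counts and without the .utf8.ini suffix.

    The path is given relative to the extraction folder.
    """
    path = PurePosixPath(extracted)
    # Only the last "-<digits>" of each folder name is removed, so a
    # folder name that itself contains a hyphen is left otherwise intact.
    folders = [COUNT_SUFFIX.sub("", part) for part in path.parts[:-1]]
    name = path.name
    if name.endswith(FILE_SUFFIX):
        name = name[: -len(FILE_SUFFIX)]
    return str(PurePosixPath(*folders, name))


print(original_path(
    "chart2-84/source-84/controller-84/dialogs-84/"
    "res_DataLabel_tmpl.hrc.utf8.ini"
))
```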

The structure of the file to translate is:

checkbox.RESOURCE_DATALABEL( xpos, ypos ).CB_CATEGORY=Show ~category
checkbox.RESOURCE_DATALABEL( xpos, ypos ).CB_SYMBOL=Show ~legend key
checkbox.RESOURCE_DATALABEL( xpos, ypos ).CB_VALUE_AS_NUMBER=Show value as ~number
checkbox.RESOURCE_DATALABEL( xpos, ypos ).CB_VALUE_AS_PERCENTAGE=Show value as ~percentage
fixedtext.RESOURCE_DATALABEL( xpos, ypos ).FT_LABEL_PLACEMENT=Place~ment
pushbutton.RESOURCE_DATALABEL( xpos, ypos ).PB_NUMBERFORMAT=Number ~format...
pushbutton.RESOURCE_DATALABEL( xpos, ypos ).PB_PERCENT_NUMBERFORMAT=Percentage f~ormat...
stringlist.WORKAROUND.1=Best fit
stringlist.WORKAROUND.10=Top right
stringlist.WORKAROUND.11=Inside
stringlist.WORKAROUND.12=Outside
stringlist.WORKAROUND.13=Near origin
stringlist.WORKAROUND.2=Center
stringlist.WORKAROUND.3=Above
stringlist.WORKAROUND.4=Top left
stringlist.WORKAROUND.5=Left
stringlist.WORKAROUND.6=Bottom left
stringlist.WORKAROUND.7=Below
stringlist.WORKAROUND.8=Bottom right
stringlist.WORKAROUND.9=Right

instead of the escapedly ugly PO format.
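The key=value layout is simple enough to be read with a few lines of code, for instance to count or check entries before sending files out. Here is a minimal Python sketch. It assumes one entry per line and that the key never contains "=", so the line is split at the first "=". Nothing here comes from sdf2txt itself, and the function name is invented.

```python
def parse_entries(text: str) -> dict:
    """Read key=value lines into a dict, splitting at the first '='.

    Assumptions: one entry per line, no '=' inside a key.
    """
    entries = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip empty lines
        key, separator, value = line.partition("=")
        if not separator:
            raise ValueError("not a key=value line: " + line)
        entries[key] = value
    return entries


sample = """\
checkbox.RESOURCE_DATALABEL( xpos, ypos ).CB_CATEGORY=Show ~category
stringlist.WORKAROUND.1=Best fit
stringlist.WORKAROUND.10=Top right
"""

for key, value in parse_entries(sample).items():
    print(key, "->", value)
```

Splitting at the first "=" only means that a translation which itself contains "=" stays in one piece.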


Once the extraction is complete, the translation coordinator divides the extracted package between the translators.

The translators can use OmegaT to translate the files.

-> put the files in /source/
-> put the TMX files in /tm/
-> put the glossary files (Sun Gloss) in /glossary/
-> reload the project

Make sure that the files are handled as UTF-8 in the File Filters option.
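A coordinator can also check that the files coming back from translators are still valid UTF-8 before merging them. The Python sketch below only tests whether the bytes decode as UTF-8; it does not touch OmegaT's settings. The helper names and the "*.utf8.ini" pattern are assumptions based on the file name shown earlier.

```python
from pathlib import Path


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes can be decoded as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def non_utf8_files(folder: str) -> list:
    """List the *.utf8.ini files under folder that are not valid UTF-8."""
    return [
        path
        for path in sorted(Path(folder).rglob("*.utf8.ini"))
        if path.is_file() and not is_valid_utf8(path.read_bytes())
    ]
```

Calling non_utf8_files() on the folder that holds the translated files returns an empty list when everything is fine.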

This time, the TMX will perfectly match the contents of the translatable files and there won't be any need to manually add "\".


For your information, I did the French UI with sdf2txt and OmegaT and the results were the following:

-about half of the 400 segments were already in the TMX
-more than half of the remaining segments had a very close equivalent in the TMX
-the rest was about 60~70 segments _out of 400_



When the translation coordinator has received all the translated files, they are merged into the original SDF file and the result is submitted to the issue tracker.


Also, the current CVS version of OmegaT includes Hunspell. You can use OOo's dictionaries directly with it. You just need "ant" to build OmegaT.


I hope this post will help ease the OOo localization process! And I would like to thank Alex for the numerous test versions he produced before _I_ was satisfied with sdf2txt!

Don't hesitate to ask questions if you have any !

Jean-Christophe Helary

==============================
==sdf2txt.jar is a Java utility.
 http://alex73.zaval.org/snapshots/OpenOffice/sdf2txt.jar
==============================
==OmegaT is a Java Computer Aided Translation tool.
 (Version 1.7.3)
 http://sourceforge.net/project/showfiles.php?group_id=68187&package_id=214253
 (Version 1.8, CVS, with Hunspell)
 cvs -z3 -d:pserver:[email address protected]:/cvsroot/omegat co -P omegat
 to build, enter the /omegat/ folder and type "ant"
 the dictionary setup is relatively straightforward.
==============================
