[ja-discuss] OOo/ODF translation work (seminar in Kyoto)

Jean-Christophe Helary Tue, 10 Jul 2007 22:54:32 -0700

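Hello everyone,

I sent an English post titled "Translating .sdf files directly with OmegaT" to [email address protected]. I am pasting it at the bottom of this mail.

Also, as the latest news: a new test version of OmegaT has been released, and it includes the latest version of the Japanese user guide.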

The link is:
http://sourceforge.net/project/showfiles.php?group_id=68187

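(Look under "Other - Development | OmegaT 1.7.1".)

An interesting new feature: OmegaT can now import mediawiki source files. With it, you can easily import mediawiki files written in other languages and translate them. The translated files still have to be uploaded manually, though...


The online manual is: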
http://sourceforge.net/docman/display_doc.php?docid=61937&group_id=68187

Wikipedia:
http://ja.wikipedia.org/wiki/OmegaT

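(If you are interested, try translating the en page using the mediawiki import!)

Translating Word files is difficult unless you buy very expensive translation-support software, but with OmegaT you can very easily translate ODF (ISO/IEC 26300), the OOo file format.

At next week's Open Source Conference Kansai (Kyoto), I will give seminars on OmegaT's new features and workflow (on both Friday and Saturday). A two-hour FOSS translation hands-on session is also planned for Saturday morning!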


http://www.ospn.jp/osc2007-kansai/

Everyone involved in translation (including, of course, the OOo l10n people), please come!


Helary


        Subject:        [l10n-dev] Translating .sdf files directly with OmegaT
        Date:   7 July 2007 00:41:25 JST
        To:     [email address protected]
        Reply-To:       [email address protected]

The reason I tried this is that using the .po created with oo2po along with the TMX created with po2tmx does not work well. po2tmx removes data from escape sequences, and that means more things to type in the OmegaT edit window.

So, the idea was to treat the .sdf file as a pseudo-HTML file to benefit from a few automatic goodies offered by OmegaT:
1) tag reduction (so that one needs to type less when tags are inline) and
2) tag protection (for block tags like <ahelp>...</ahelp> when they open and close the segment)

If the TMX could be hacked to show formatting tags similar to the modified source file, it would become trivial to edit the tags and reflect the new contents found in the source.

The problem is that an .sdf file is not an HTML file: there is plenty of meta information and a lot of escaped "<", ">" and others. Also, an .sdf file seems to be made up of two-line blocks: the source line and the target line.

The first problem will be solved later. For now, to extract the translatable contents we need to change the two-line blocks into one-line blocks with source and target data next to each other.

This is done using a regexp like the following (these are not exact: I write them from memory, and they may change depending on the editor you choose):


search for:
^(.*)(en-US)(.*)\r^(.*)(fr)(.*)
replace with:
\1\2\3\t\4\5\6
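
As a rough illustration, the same linearization could be scripted instead of done in an editor. The Python sketch below is not what I actually ran: the function name `linearize` is made up for this example, and it assumes "\n" line endings and that the language code sits in its own tab-separated field.

```python
import re


def linearize(sdf_text, source_lang="en-US", target_lang="fr"):
    """Join each source line with the target line right after it, using a tab.

    Assumption: every line is tab-separated and contains the language code
    as a field of its own, and lines end with "\n".
    """
    pattern = re.compile(
        rf"^(.*\t{re.escape(source_lang)}\t.*)\n(.*\t{re.escape(target_lang)}\t.*)$",
        re.MULTILINE,
    )
    # Group 1 is the whole source line, group 2 the whole target line.
    return pattern.sub(r"\1\t\2", sdf_text)


if __name__ == "__main__":
    sample = "mod\tfile\ten-US\tHello\nmod\tfile\tfr\tBonjour\n"
    print(linearize(sample))
```

Lines that do not form a source/target pair are left untouched, so it is easy to spot them afterwards.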

Now that your .sdf is "linearized", change its extension to .csv and open it in OpenOffice, using "tab" as the field separator and "nothing" as the text delimiter.

The tabs in the original .sdf create a number of columns, from which you just need to copy the column with the en-US translatable contents.


Paste that into a text file and give it the .html extension.

Now we need to convert this to pseudo-HTML, the idea being that OmegaT will smoothly handle all the <ahelp> etc. tags found there.

First of all, we need to understand that not all the "<" are tag-opening characters; a number of them are simply "less than" characters. So we grab those first:
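
If you prefer not to go through a spreadsheet, the column copy could probably be scripted too. A minimal sketch in Python, where `extract_column` and the column index are hypothetical: you have to check in your own file which column holds the en-US text.

```python
def extract_column(linearized_text, column):
    """Return one tab-separated column (0-based) from every non-empty line."""
    cells = []
    for line in linearized_text.splitlines():
        if not line:
            continue
        fields = line.split("\t")
        # Short lines yield an empty cell so that line numbers stay aligned.
        cells.append(fields[column] if column < len(fields) else "")
    return cells


if __name__ == "__main__":
    sample = "mod\tfile\ten-US\tHello\tmod\tfile\tfr\tBonjour\n"
    print(extract_column(sample, 3))
```

Keeping one output cell per input line matters, because the translated lines are pasted back in the same order later.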


search for:
([^\])<
replace with:
\1&lt;

">" are less of a problem but let's do them anyway:

search for:
([^\])>
replace with:
\1&gt;

Now we can safely assume that all the remaining "<" or ">" are escaped with "\". To correct that (so that the unescaped tags can be recognized in OmegaT) do:


search for:
\\<
replace with:
<

search for:
\\>
replace with:
>

Last but not least, to ensure that OmegaT will consider each line as a segment, we need to add the "paragraph" mark to the beginning of each line:


search for:
^
replace with:
<p>
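
The three replacements above can be put together in a few lines of Python. This is a sketch under assumptions, not the exact procedure: `to_pseudo_html` is a name invented here, and it uses a negative lookbehind instead of the `([^\])` group, so that a "<" at the very start of a line is also caught.

```python
import re


def to_pseudo_html(line):
    """Turn one line of .sdf text into a pseudo-HTML paragraph."""
    # 1) Unescaped "<" and ">" are plain characters: make them entities.
    line = re.sub(r"(?<!\\)<", "&lt;", line)
    line = re.sub(r"(?<!\\)>", "&gt;", line)
    # 2) What is left is escaped with "\": turn it into real tag delimiters.
    line = line.replace("\\<", "<").replace("\\>", ">")
    # 3) Paragraph mark, so that each line is handled as one segment.
    return "<p>" + line


if __name__ == "__main__":
    print(to_pseudo_html("\\<ahelp\\>a < b > c\\</ahelp\\>"))
```

The order matters: the entities have to be created before the backslashes are removed, otherwise the two kinds of "<" can no longer be told apart.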

Save; the file should be ready to be processed.

Now we need to get matches from the TMX files that either we have created (oo2po -> po2tmx) or that Rafaella & all have provided us with.

The problem is that the TMX files reflect the original contents of the .sdf that we have just modified.

In the TMX, we are likely to find an <ahelp> tag written as \<ahelp something\>, which will not be helpful: in OmegaT the <ahelp> tag will be displayed as <a0> and thus will not match the \<ahelp something\> string.

So we need to hack the file so that it looks close enough to what the source expects...

In the TMX we want to reduce _all_ the escaped tags to a short expression that looks like <a> for a tag starting with "a".


So we would do something like this (here again, the regexps are not 100% exact):

search for:
\\<(.)[^>]*>
replace with:
&lt;\1&gt;

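same for tail tags (run this one first, since the pattern above would otherwise also match the closing tags):
search for: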
\\</(.)[^>]*>
replace with:
&lt;/\1&gt;
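
For what it is worth, here is how the two TMX replacements might look in Python. It is only a sketch: `reduce_tmx_tags` is a made-up name, and it assumes the escaped tags appear literally as \<...\> in the text, as in the patterns above.

```python
import re


def reduce_tmx_tags(segment):
    """Shrink escaped tags such as \\<ahelp ...\\> to the short form &lt;a&gt;."""
    # Closing tags go first: the opening-tag pattern would also match them
    # and capture "/" as the tag's first character.
    segment = re.sub(r"\\</(.)[^>]*>", r"&lt;/\1&gt;", segment)
    segment = re.sub(r"\\<(.)[^>]*>", r"&lt;\1&gt;", segment)
    return segment


if __name__ == "__main__":
    print(reduce_tmx_tags('\\<ahelp hid="x"\\>Text\\</ahelp\\>'))
```

Only the first character of the tag name is kept, which is what makes the TMX text look close to the shortened tags shown in the edit window.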

If I remember correctly everything I did in the last few days, that is about it. Save the TMX, put it in /tm/, load the project and translate...

You can also put the Sun glossaries in /glossary/ after a little bit of formatting. But that too is trivial.

When the translation is done, it is important to verify the tags (Tools -> Validate tags): click on each segment where the tags don't match the source and correct the target.


Then Project -> Create translated files

Get the translated .html file from /target/

And now we need to back-process the whole thing to revert it to its original .sdf form.


1) remove all the <p> at the beginning of the lines

2) replace all the < with \<, all the > with \>, then all the &lt; with < and all the &gt; with > (in that order)
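
Steps 1) and 2) could be sketched in Python as below. The name `from_pseudo_html` is invented for this example, and I am assuming that the last two replacements in step 2) are about the &lt; and &gt; entities created earlier.

```python
def from_pseudo_html(line):
    """Revert one translated pseudo-HTML line to .sdf escaping."""
    # 1) Drop the paragraph mark added before translation.
    if line.startswith("<p>"):
        line = line[len("<p>"):]
    # 2a) Real tag delimiters get their backslash back.
    line = line.replace("<", "\\<").replace(">", "\\>")
    # 2b) Entities become plain characters again (must come after 2a).
    line = line.replace("&lt;", "<").replace("&gt;", ">")
    return line


if __name__ == "__main__":
    print(from_pseudo_html("<p><ahelp>a &lt; b &gt; c</ahelp>"))
```

If 2b) ran before 2a), the plain "<" and ">" characters would wrongly receive a backslash as well.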

This should be enough. Now copy the whole file and paste it into the target contents part of the still-open .csv file.

The .csv file now contains the source part and the target part next to each other.

Let's save this (be careful: "tab" as field separator and "nothing" as text delimiter).


Open the result in the text editor.

The pattern we need to find to revert the one-line blocks to two-line blocks is something like:

(something)(followed by lots of en-US stuff)a tab(the same something)(followed by lots of translated stuff)


^([^\t]+)(.*)\t\1(.*)$
and we need to replace it with:
\1\2\r\1\3
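
Here is a possible Python version of that last regexp, again only as a sketch. `delinearize` is a hypothetical name, and the code assumes that the first tab-separated field is the same in the source half and the target half of each line, and that the file uses "\n" line endings.

```python
import re


def delinearize(csv_text):
    """Split each one-line block back into a source line and a target line."""
    # Group 1: first field; group 2: rest of the source half;
    # group 3: rest of the target half, which starts with the same first field.
    pattern = re.compile(r"^([^\t\n]+)(.*)\t\1(\t.*)$", re.MULTILINE)
    return pattern.sub(r"\1\2\n\1\3", csv_text)


if __name__ == "__main__":
    sample = "mod\tfile\ten-US\tHello\tmod\tfile\tfr\tBonjour\n"
    print(delinearize(sample))
```

Because `(.*)` is greedy, the split happens at the last place where the first field shows up again as a whole field, so check a few lines by hand if that value can also occur inside the translated text.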

Make sure there are no mistakes (if there are any, they are likely to appear right in the first lines).


Now you should have your two-line blocks.

Rename the file to .sdf and here you are.

This whole process is a bit tricky, but the advantage of using OmegaT on such a big translation is so great that I really suggest you give it a try. The conversion does not take much time and requires only one person to prepare the files.


I hope that helps, even if it is a bit late :)

Jean-Christophe
