On Jul 18, 2009, at 5:06 PM, Hannes Magnusson wrote:

On Sat, Jul 18, 2009 at 12:44, pedram salehpoor wrote:
Hi
I wanted to know how the changes for PO files are progressing and if there
is anything that I can do to ease the change?

There was a discussion about this recently. It seemed that
people had great fears of needing to read over every single snippet
(thousands, probably hundreds of thousands) and verify their
correctness.

I didn't sense this fear, or likely ignored it. The conversion marks everything as fuzzy by default, but we can easily change that.

A larger fear is lost content, because the conversion is not perfect. A rough test (which could easily be way off) shows Japanese at 78% translated after a test conversion to PO files, so roughly a 20% loss. Assuming this is correct, it's a real problem, but it is a one-time deal and could likely be improved with better conversion methods. If a translation is moderately up to date and follows the same structure as en/ (e.g., the same number of <para>'s), then it should [theoretically] convert fine. This conversion deserves better testing and debugging.
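The "same structure as en/" condition could be checked cheaply before converting, to predict which files are likely to lose content. A minimal sketch, assuming that comparing the number of opening <para> tags in an English file and its translation is a good-enough proxy; the function names are hypothetical and not part of any existing phpdoc tool:

```python
import re

# Matches an opening <para> tag, with or without attributes (e.g. <para xml:id="...">),
# but not <parameter> or other tags that merely start with "para".
PARA_OPEN = re.compile(r"<para[\s>]")


def count_paras(docbook_text: str) -> int:
    """Count opening <para> tags in a DocBook source string."""
    return len(PARA_OPEN.findall(docbook_text))


def likely_converts_cleanly(en_text: str, translated_text: str) -> bool:
    """Hypothetical heuristic: a translation whose <para> count matches the
    English file is assumed to map onto it paragraph by paragraph."""
    return count_paras(en_text) == count_paras(translated_text)
```

Running this over every en/ file and its translation would give a rough list of files worth inspecting by hand after a test conversion. It says nothing about whether the text inside each paragraph is current.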

I don't know what the exact situation is, but I think we should
probably cast a vote on this: do people want to keep using DocBook
XML for translations (and continue with all the problems that has,
broken builds being the most annoying part) or do people want to switch
to po (with the biggest disadvantage being that it is non-contextual)?

How about taking more time before choosing any method? Yesterday I chatted with the Transifex folks, and the possibility of them taking charge of this design came up. They live and breathe this sort of thing, so it seems natural. However, their hosted service is a business, so they are seeking some sponsorship in order to properly commit dedicated time to this. What do people think about this idea? It does not appear we have people in-house who want to lead such a charge, and I'm certainly not ideal for designing this. And thanks to Nilgün, current DocBook translations with SVN seem to be working fine now, so we're not in a huge rush. I don't think this discussion should stop translators from translating today.

If we vote for po, I would recommend scripting the conversion
to mark "up2date" translations as OK, i.e. err on the side of "OK" rather than
"fuzzy", to ease the pain of needing to sanity-check way too much text.

Sounds reasonable.
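The scripted conversion suggested above might look roughly like this: when a translated file is known to be up to date, strip the `fuzzy` flag from its PO entries instead of leaving every entry for review. This is a minimal sketch; how "up to date" is decided (here, comparing a hypothetical English revision number recorded for the translation against the current English revision) is an assumption, and a real script would more likely use a PO library than plain string handling.

```python
def strip_fuzzy(po_text: str) -> str:
    """Remove the 'fuzzy' flag from every '#,' flags comment in PO text.
    Other flags (e.g. 'php-format') are kept; a flags line left empty is dropped."""
    out = []
    for line in po_text.splitlines(keepends=True):
        if line.startswith("#,"):
            flags = [f.strip() for f in line[2:].split(",")]
            kept = [f for f in flags if f and f != "fuzzy"]
            if not kept:
                continue  # the line only carried 'fuzzy'
            newline = "\n" if line.endswith("\n") else ""
            line = "#, " + ", ".join(kept) + newline
        out.append(line)
    return "".join(out)


def convert_flags(po_text: str, translated_rev: int, current_en_rev: int) -> str:
    """Hypothetical policy: mark translations made against the current English
    revision as OK; leave everything else fuzzy for a human to review."""
    if translated_rev == current_en_rev:
        return strip_fuzzy(po_text)
    return po_text
```

Outdated files keep their fuzzy flags, so translators only need to sanity-check text that is actually behind the English source.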

To be honest, I don't really understand how the build process will
work with .po files. Will the "core files" (English) be automatically
generated? Will those files be in SVN? Does PhD need changes? ...

Building EN will not change, but building translations is a different story. I believe it goes something like this:

foreach (english docbook file as enfile) {
   if (po file exists for enfile) {
       build_file = turn_po_into_docbook(enfile);
   } else {
       build_file = enfile;
   }
   use_this_for_build(build_file);
}

Turning a PO file into DocBook is our main change, and it is done by external tools (like po4a or po2xml). Of course there are other considerations, like dealing with entities, but the above is a simplified flow.
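The same flow can be sketched in Python. This is a minimal sketch, not how PhD or any existing build script works: the directory layout (one .po file per English .xml file, mirrored under a PO directory) is an assumption, and `po_to_docbook` is a hypothetical stand-in for whichever external tool (po4a, po2xml, ...) ends up doing the real conversion.

```python
from pathlib import Path
from typing import Callable


def collect_build_files(
    en_dir: Path,
    po_dir: Path,
    po_to_docbook: Callable[[Path, Path], str],
) -> dict[Path, str]:
    """Return the DocBook source to build for every English file.

    po_to_docbook stands in for an external converter; its
    (english_file, po_file) -> docbook_text signature is an assumption.
    """
    build_files: dict[Path, str] = {}
    for en_file in sorted(en_dir.rglob("*.xml")):
        rel = en_file.relative_to(en_dir)
        po_file = (po_dir / rel).with_suffix(".po")
        if po_file.exists():
            # A translation exists: regenerate translated DocBook from the PO file.
            build_files[rel] = po_to_docbook(en_file, po_file)
        else:
            # No translation yet: fall back to the English source.
            build_files[rel] = en_file.read_text(encoding="utf-8")
    return build_files
```

Passing the converter in as a function keeps the loop independent of which external tool is chosen, and lets a test swap in a stub.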

I'm unsure exactly how POT files come into play here. They are basically English-only PO files (templates), and are most useful for starting a new translation of a file or for determining whether translations are outdated (by comparing the en/ strings).
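Comparing a translation against its POT file could work roughly like this: any msgid in the freshly generated template that the translation lacks is new or changed English text, and any msgid only in the translation is obsolete. A minimal sketch that recognises only single-line `msgid "..."` entries; real PO files also use multi-line strings, plurals and contexts, which a proper PO parser would handle.

```python
import re

# Only single-line msgid entries are recognised in this sketch.
MSGID = re.compile(r'^msgid "(.*)"$', re.MULTILINE)


def msgids(po_text: str) -> set[str]:
    """Collect the non-empty msgid strings from PO or POT text
    (the empty msgid belongs to the header entry and is skipped)."""
    return {m for m in MSGID.findall(po_text) if m}


def compare_with_template(pot_text: str, po_text: str) -> tuple[set[str], set[str]]:
    """Return (new_or_changed, obsolete) English strings."""
    template, translated = msgids(pot_text), msgids(po_text)
    return template - translated, translated - template
```

A non-empty first set means the translation of that file is behind the English source.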

As we progress, I reckon we'll figure out these finer details. I imagine it'll be important for us to track which files changed since the last build, so we won't have to convert every PO file to XML on every run. I don't know if we want updated POT files in SVN, because that's a pain, but maybe we do. We could explore magic where these POT commits are done automatically for us, but that seems odd, and I imagine us avoiding too much magic. We could also add QA checks that do a full, uncached build about once a week.
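Tracking which files changed since the last build could be as simple as remembering a content hash per PO file and reconverting only those whose hash differs. A minimal sketch, assuming the hashes are kept in a JSON cache file (the cache location and layout are hypothetical); the weekly full, uncached QA build would simply ignore or delete this cache.

```python
import hashlib
import json
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of a file's bytes; changes whenever the content changes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def changed_po_files(po_dir: Path, cache_file: Path) -> list[Path]:
    """Return PO files whose content differs from the last recorded run,
    then record the current digests for the next run."""
    try:
        previous = json.loads(cache_file.read_text(encoding="utf-8"))
    except FileNotFoundError:
        previous = {}  # no cache yet: every file counts as changed

    current = {
        str(p.relative_to(po_dir)): file_digest(p)
        for p in sorted(po_dir.rglob("*.po"))
    }
    changed = [po_dir / rel for rel, digest in current.items() if previous.get(rel) != digest]

    cache_file.write_text(json.dumps(current, indent=2), encoding="utf-8")
    return changed
```

This sketch updates the cache immediately; a real build would probably only record a file's digest after it converted successfully, so a failed conversion is retried on the next run.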

As far as I have gathered, there is some work going on by a 3rd party
called "transferex" (or something similar) that offers a web-based system
for translation work. Currently phpdoc is larger than their system is
capable of handling, but there exists a good chunk of desktop applications
that are used by others to translate these files...

A few systems come to mind and are being tested:

- Pootle : Offers online editing, and various statistics
--- http://translate.php.net/
--- Mostly working now (kudos to Michael), but it seems buggy

- Transifex : Offers various statistics, and collaboration with other projects
--- http://www.transifex.net/projects/php/
--- Will also offer online editing soon

- Our online editor (beta): Knows our docbook files
--- http://doc.php.net/editor
--- Can add po related tools in the future

Transifex is hosted offsite and is essentially a place where many projects gather. Unfortunately we are too large for them today, but they are looking into it. Transifex is also Open Source software that we could host ourselves, but I think it's better that we live on their server alongside other projects (which hopefully means additional translators). I consider transifex.net to be an optional addition to our translation process, and certainly not a requirement for anyone to use. We may or may not allow Transifex to commit to our repository on behalf of translators there.

Whatever the path, I'm hopeful we can make useful translation-related tools available to us. This includes TM (translation memory), CAT (computer-assisted translation), and other such tools.

Also, translators will choose to edit online or to download, translate, and commit PO files themselves, depending on the situation or their preference.

I am not a translator, and have never really looked into gettext and
po in any serious way, so I really lack the experience on the topic.

I, and without a doubt Philip too (who has been looking into this the
most), would greatly appreciate feedback from all translators here
(especially people like Masahiro and Nilgün!)...

This is true; all thoughts and feedback are welcome and needed.

Regards,
Philip
