Re: OLPC Software Code Localization - A Few Things I've Noticed

2007-10-27 Thread Ed Trager
Hi, Xavier and everyone,

 ET
 ET Does everyone agree that there needs to be a way that
 ET all of the .po files for all languages get updated with the
 ET latest messages extracted via xgettext from the latest
 ET codebase (toolbar.py, etc.)?

 Yes, there's a problem. Reviewing what you've noted, the problem
 appears to be a mix of things. Just for the record, we are
 sticking to the POT files found in d.l.o git (not fedora)

 1) the POT in dlo only has 9 strings
 http://dev.laptop.org/git?p=projects/write;a=blob_plain;f=po/write.pot;hb=HEAD

 I personally believe that developers should generate the POT file
 and make sure that it's in d.l.o git.


Using the Write activity as an example once again, it doesn't take
that much more work to translate all 32 or 36 msgids in the latest
code base, and doing so yields a nearly fully localized activity.
Translating only 9 msgids is, IMHO, a much greater problem: the Write
activity ends up only ~30%* localized, which is too low a standard.

(* 30% based on the fraction of translated msgids, not word count)
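As a rough check of that figure, here is a minimal sketch of the arithmetic. The function name is hypothetical; the counts (9 translated msgids out of 32 or 36) are the ones quoted above.

```python
def fraction_localized(translated, total):
    """Share of msgids that have a translation, as a percentage."""
    return 100.0 * translated / total

# 9 msgids in the d.l.o POT versus 32 or 36 in the latest code base
for total in (32, 36):
    print(f"9 of {total} msgids: {fraction_localized(9, total):.0f}%")
```

Depending on whether the total is 32 or 36, the result lands between 25% and 28%, so "~30%" is a slightly generous round-up.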

There is work involved in running xgettext against the latest
dev.laptop.org git tree snapshot, checking the resulting POT files,
and merging as needed to get a more nearly complete set of msgids into
the existing PO files in the d.l.o git code base.  But --if it is not
too much work-- it might be worth doing once to provide a more uniform
base in dev.laptop.org's git repository from which all developers
--soon to be duly informed of their responsibility to help keep the
POT files current from this point forward-- can work.
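
The merge step could be sketched roughly as follows. This is only a simplified illustration of the idea, not what msgmerge actually does (the real tool also does fuzzy matching and keeps obsolete entries as comments). The function name and the sample strings are made up for the example.

```python
def merge_catalog(pot_msgids, po_entries):
    """Return a catalog containing exactly the template's msgids.

    Existing translations are kept, msgids new to the template get an
    empty msgstr, and msgids no longer in the template are dropped.
    """
    return {msgid: po_entries.get(msgid, "") for msgid in pot_msgids}

# Hypothetical data: a fresh template and an older French catalog
pot = ["Bold", "Italic", "Underline"]
fr = {"Bold": "Gras", "Obsolete": "Obsolète"}

merged = merge_catalog(pot, fr)
print(merged)
```

Run once per language, a step like this would give every PO file the same set of msgids as the current template.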


 On top, some of the quirks and particularities of the tools do
 seem to get in the way, but I think that most stem from the fact
 that we don't have a 'base' POT population.


Exactly.


 Still working on it,
 Xavier


And no doubt it is a lot of work, and there are only 24 hours in each
day ... ;-)


 PS: The issue regarding lists is an interesting one that I think
 may be much broader than the XO... :)


Yes, I think the lists issue is much broader than XO too.  OLPC is
already setting new, innovative, and higher standards in many areas,
including the novel area of appropriate internationalization and
localization of software for children.  (Has anyone even done that
before?  Maybe not).  The OLPC has a unique opportunity to invent good
solutions that set new standards for the rest of the world to learn
from, not just for kids, but for adults as well.

- Ed
___
Devel mailing list
Devel@lists.laptop.org
http://lists.laptop.org/listinfo/devel


Re: OLPC Software Code Localization - A Few Things I've Noticed

2007-10-26 Thread Xavier Alvarez
On Friday 26 October 2007 16:54, you wrote:
ET Hi, everyone,
ET
ET In response to Xavier Alvarez' request on 10/25 for
ET translators and coordinators, I decided to get off the
ET sidelines and take a look at OLPC's new Pootle-based L10N
ET infrastructure.
ET
ET Here are a few things I noticed which I think will be of
ET general interest and concern:
ET
ET (0) CASING/NAMING OF PO FILES PROBLEM:

The 'rule' is quite simple (but not necessarily as intuitive as 
may be expected): given that we are bundling several d.l.o 
projects into pootle-projects, we need to avoid (or at least 
minimize the possibility of) having 2 POT files with the same 
name.

Solution? We prefix whatever filename used for the POT in d.l.o 
with the name of its project...

journal-activity.Journal.po
<dlo-project>.<filename>

Thus, any 'inconsistencies' are really the product of other 
inconsistencies... they just happen to be more evident (and ugly) 
within Pootle.
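
A minimal sketch of that prefixing rule, assuming the d.l.o template for the Journal is named Journal.pot (the function name is hypothetical, and this is not the actual import script):

```python
def pootle_po_name(dlo_project, pot_filename):
    """Prefix the POT base name with its d.l.o project name."""
    base = pot_filename
    if base.endswith(".pot"):
        base = base[:-len(".pot")]
    return f"{dlo_project}.{base}.po"

print(pootle_po_name("journal-activity", "Journal.pot"))
print(pootle_po_name("write", "write.pot"))
```

Because the casing of the second part is copied straight from the file name in d.l.o, a capital J there shows up unchanged in Pootle.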

ET
ET   (Upper/Lower) Casing of names of po files is
ET inconsistent: For example, in Core there is
ET journal-activity.Journal.po with upper case J for
ET the 2nd occurrence of Journal but then why isn't
ET write.write.po written write.Write.po? 
ET
ET   This is a small point, but consistent and intuitive
ET naming of these PO files will help everyone. Or am I just
ET failing to understand or intuit what the pattern is supposed
ET to be here?
ET
ET (1)  INCONSISTENT NUMBER OF MSGIDs ACROSS DIFFERENT
ET LANGUAGES: 

Yes and no.

The numbers shown in the statistics do not represent quantity of 
MSGIDs but WORDS in the file. So I presume that for untranslated 
strings it takes the MSGID words, and for translated strings, the 
MSGSTR. Thus two languages with all things translated and up to 
date may still show different numbers (although conceptually 
they are the same). BTW, it does show the number of strings in 
other 'statistic levels'.

Yes, I was quite baffled too... translators are more worried about 
the word-count than 'lines of code'... ;)
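
To illustrate the presumption above, here is a small sketch of word-based counting. It is only a guess at the behaviour, not Pootle's actual code. The catalog entries and translations are made-up examples, and fuzzy entries are ignored.

```python
def word_stats(entries):
    """entries: list of (msgid, msgstr); an empty msgstr means untranslated.

    Counts MSGSTR words for translated entries and MSGID words for
    untranslated ones, as presumed above.
    """
    translated_words = total_words = translated_strings = 0
    for msgid, msgstr in entries:
        words = len((msgstr or msgid).split())
        total_words += words
        if msgstr:
            translated_words += words
            translated_strings += 1
    return translated_words, total_words, translated_strings, len(entries)

# Same three msgids, same two translated, but different word totals
es = [("Save as", "Guardar como"), ("Keep", "Guardar una copia"), ("Font size", "")]
pt = [("Save as", "Salvar como"), ("Keep", "Manter"), ("Font size", "")]

print(word_stats(es))
print(word_stats(pt))
```

Under this assumption the two catalogs report the same string counts but different word totals, which is the kind of mismatch the statistics page shows.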

In http://solar.laptop.org:5080/projects/xo_core/
Language             Trans.     Fuzzy     Untrans.   Total
Portuguese (Brazil)  162  42%   4   1%    213  56%   379
Spanish              219  62%   0   0%    132  37%   351

While in each language+project
[pt_BR] 8 files, 162/379 words (42%) translated [118/247 strings]
[es]    8 files, 219/351 words (62%) translated [157/234 strings]

Note that, even so, there's still a difference in the number of 
strings... see below.


ET
ET The other day when I looked at write.write.po for
ET French, there were only 10 messages in the catalog.  Today, I
ET see that there are 36 messages which looks a lot closer to
ET what I myself get from xgettext toolbar.py on the latest
ET code.
ET However, when I checked write.write.po for Thai today,
ET I see that it still has only 10 messages.
ET
ET Solution (Or at least  A Question Posing As A Possible
ET Solution): 
ET
ET Does everyone agree that there needs to be a way that
ET all of the .po files for all languages get updated with the 
ET latest messages extracted via xgettext from the latest
ET codebase (toolbar.py, etc.)?

Yes, there's a problem. Reviewing what you've noted, the problem 
appears to be a mix of things. Just for the record, we are 
sticking to the POT files found in d.l.o git (not fedora)

1) the POT in dlo only has 9 strings
http://dev.laptop.org/git?p=projects/write;a=blob_plain;f=po/write.pot;hb=HEAD

2) the POT creation dates have probably been tampered with 
externally so it's impossible to determine which one makes sense 
without going into the source code:
FR.PO   POT-Creation-Date: 2007-06-21 17:33+0200\n
DLO POT POT-Creation-Date: 2007-06-21 17:33+0200\n

I personally believe that developers should generate the POT file 
and make sure that it's in d.l.o git. 


Overall, I find these inconsistencies a direct result of the messy 
flow we've had with t.fp.o. As a matter of fact, I've been trying 
to process the tickets in d.l.o holding PO submissions and things 
haven't been very nice. The current situation is:

0) only some projects have been injected into Pootle
   (core and bundled activities, with few exceptions like Etoys)
1) d.l.o POT files are being considered the standard
2) d.l.o PO files have been injected but not fully verified
2.1) many have lost their (UTF-8) encoding
2.2) many PO files seem not to correspond to their POT (1)
3) tickets (submitting PO files) seem to have the issues noted in (2)
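
For the encoding problem in (2.1), a first-pass check could look like the sketch below. It only tests whether the bytes decode as UTF-8, so it will not catch text that was mis-converted and then saved as valid UTF-8. The function name and sample data are hypothetical.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the raw file contents decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# The same PO line saved in two different encodings
good = 'msgstr "Español"\n'.encode("utf-8")
bad = 'msgstr "Español"\n'.encode("latin-1")

print(is_valid_utf8(good), is_valid_utf8(bad))
```

In practice the bytes would come from open(path, "rb").read() for each PO file attached to a ticket.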

On top, some of the quirks and particularities of the tools do 
seem to get in the way, but I think that most stem from the fact 
that we don't have a 'base' POT population.



Still working on it,
Xavier

PS: The issue regarding lists is an interesting one that I think 
may be much broader than the XO... :)

...snip...
ET 
ET  Questions, suggestions, ideas, etc. are all welcome!
ET 
ET 
ET  Cheers,
ET  Xavier
ET 
ET