Re: Pootle Data

Andre Fischer Thu, 09 Feb 2012 07:10:35 -0800

On 09.02.2012 15:23, Huaidong Qiu wrote:

Where can we get the data? I can help to check and understand the toolset
and process.


That sounds great.  Please have a look at

http://people.apache.org/~af/index.html

-Andre


On Thu, Feb 9, 2012 at 1:04 AM, Louis Suárez-Potts
<[email protected]> wrote:

Hi

On 8 February 2012 11:49, Andre Fischer <[email protected]> wrote:

On 08.02.2012 17:31, Stuart Swales wrote:


On 07/02/2012 14:02, Andre Fischer wrote:


Hi,

I recently had a little time to look at the pootle data. Here is what I
have found out so far. Please keep in mind that this is new for me and
that my interpretations may be wrong.

For context I will start with a short description of the directory
structure of the 80 GB of the backup disk:

In the top-level podirectory/ there is a sub-directory openoffice_org/
that probably is the translation data of OpenOffice.org. It contains
sub-directories for most languages (more on the exact set below.)
The content of podirectory is available at [1].

Below the top-level backup/ there are two directories DEV_m103/ and
DEV_94/ for two milestones. Below these you can find directories like
backconvert-110326/ that probably contain backups for certain dates
(March 26, 2011 in this example). The most recent is
DEV_m103/backconvert-110401 from April 1st of last year.

After comparing time stamps I now think that we can disregard the whole
backup/ directory. There are .po files under podirectory/ that are from
later than April 1st. Some files are from May.
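
The time stamp comparison described above could be repeated with a short
script. This is only a minimal sketch: the function name newest_file is
made up for illustration, and it assumes that the file modification times
survived the copy from the backup disk, which may not be the case.

```python
import os


def newest_file(root, suffix=".po"):
    """Return (path, mtime) of the most recently modified file under root
    whose name ends with suffix, or None if there is no such file.

    Hypothetical helper, not part of any existing l10n tool.
    """
    newest = None
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(suffix):
                continue
            path = os.path.join(dirpath, name)
            mtime = os.path.getmtime(path)  # seconds since the epoch
            if newest is None or mtime > newest[1]:
                newest = (path, mtime)
    return newest
```

Calling newest_file("podirectory") and converting the returned mtime with
datetime.fromtimestamp() would show whether any file is newer than the
April 1st backup.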

I then tried to find out whether the pootle data are older or newer than
the data in the extras/l10n module in our SVN repository. The timestamps
in the .sdf files are useless; our tools set them all to 2002-02-02. The
file time stamps cannot be used directly because of the differing
directory structures.

Comparing the set of languages on the pootle server with that in
extras/l10n/ was also inconclusive:
The set of languages that are present in both data sets is
af ar as ast bg bn bo bs ca cs cy da dz es et fa fr fur ga gd gl gu he
hi hu id is it ja jbo ka kab kn ko ku lt lv ml mr my nb nl nn nr nso ny
oc om or pap pl ps pt ru sc si sk so sq ss st sv ta te th tn tr ts ug
uk uz ve vi xh zu

Languages only in extras/l10n/ are:
be-BY br brx de dgo el eo eu fi hr kid kk km kok ks ky mai mk mn mni ne
pa-IN ro rw sa-IN sat sd sh sl sr sw-TZ tg

Languages only on the pootle server are:
pyg son tk tlh

See [2] for a list of language ids. (tlh, for example, is Klingon.)
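
For anyone who wants to reproduce the three lists above, the comparison
is plain set arithmetic. The following is a sketch only: the function
name compare_language_sets is made up, and the two sample lists are a
small excerpt of the ids quoted above, not the full data. In practice the
ids would presumably come from listing the sub-directories of
podirectory/openoffice_org/ and of extras/l10n/, which is an assumption
about the layout.

```python
def compare_language_sets(pootle_ids, l10n_ids):
    """Split two collections of language ids into the ids present in both,
    only in extras/l10n/, and only on the pootle server."""
    pootle, l10n = set(pootle_ids), set(l10n_ids)
    return {
        "both": sorted(pootle & l10n),
        "only_l10n": sorted(l10n - pootle),
        "only_pootle": sorted(pootle - l10n),
    }


# Illustrative excerpt of the ids listed above, not the complete sets.
pootle_sample = ["af", "ar", "fr", "tk", "tlh"]
l10n_sample = ["af", "ar", "de", "fi", "fr"]

result = compare_language_sets(pootle_sample, l10n_sample)
print(result["both"])         # ['af', 'ar', 'fr']
print(result["only_l10n"])    # ['de', 'fi']
print(result["only_pootle"])  # ['tk', 'tlh']
```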


So, we probably have to merge both data sets and hope for the best.
Any information from people who know the localization process better is
welcome.


Regards,
Andre


[1] http://people.apache.org/~af/index.html
[2] http://www.loc.gov/standards/iso639-2/php/code_list.php




And what has happened to en-GB and en-ZA ?



Ah, at least one person who reads my mails :-)

I forgot to add the following languages as being present in both
locations:

    ca-XV en-GB en-ZA pt-BR zh-CN zh-TW

Reason: These six language ids are written slightly differently on the
pootle server (with a '_' (underscore) in the middle) and in l10n/ (with a
'-' (hyphen)).  I sorted them differently and then forgot about them.
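
One way to avoid this kind of mismatch is to normalize the ids before
comparing them. A minimal sketch, assuming the separator is the only
difference between the two spellings; normalize_language_id is a
hypothetical helper name, and the lists contain just the six ids above
plus one unrelated id on each side:

```python
def normalize_language_id(lang_id):
    """Rewrite the pootle spelling (en_GB) to the l10n spelling (en-GB)."""
    return lang_id.replace("_", "-")


pootle_ids = ["ca_XV", "en_GB", "en_ZA", "pt_BR", "zh_CN", "zh_TW", "tlh"]
l10n_ids = ["ca-XV", "en-GB", "en-ZA", "pt-BR", "zh-CN", "zh-TW", "de"]

# Normalize both sides so the comparison does not depend on the separator.
common = sorted(
    {normalize_language_id(i) for i in pootle_ids}
    & {normalize_language_id(i) for i in l10n_ids}
)
print(common)  # ['ca-XV', 'en-GB', 'en-ZA', 'pt-BR', 'zh-CN', 'zh-TW']
```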

Sorry.

Thanks. And I too actually read your mail messages :-)--and deeply
appreciate the work.

ciao
louis


-Andre
