Re: Pootle Data

Huaidong Qiu Thu, 09 Feb 2012 06:24:17 -0800

Where can we get the data? I can help to check and understand the toolset
and process.


On Thu, Feb 9, 2012 at 1:04 AM, Louis Suárez-Potts
<[email protected]> wrote:

> Hi
>
> On 8 February 2012 11:49, Andre Fischer <[email protected]> wrote:
> > On 08.02.2012 17:31, Stuart Swales wrote:
> >>
> >> On 07/02/2012 14:02, Andre Fischer wrote:
> >>>
> >>> Hi,
> >>>
> >>> I recently had a little time to look at the pootle data. Here is what I
> >>> have found out so far. Please keep in mind that this is new for me and
> >>> that my interpretations may be wrong.
> >>>
> >>> For context I will start with a short description of the directory
> >>> structure of the 80 GB of the backup disk:
> >>>
> >>> In the top-level podirectory/ there is a sub-directory openoffice_org/
> >>> that probably is the translation data of OpenOffice.org. It contains
> >>> sub-directories for most languages (more on the exact set below.)
> >>> The content of podirectory is available at [1].
> >>>
> >>> Below the top-level backup/ there are two directories DEV_m103/ and
> >>> DEV_94/ for two milestones. Below these you can find directories like
> >>> backconvert-110326/ that probably contain backups for certain dates
> >>> (March 26 2011 in this example). The most recent is
> >>> DEV_m103/backconvert-110401 from April 1st of last year.
> >>>
> >>> After comparing time stamps I now think that we can disregard the whole
> >>> backup/ directory. There are .po files under podirectory/ that are from
> >>> later than April 1st. Some files are from May.
> >>>
> >>> I then tried to find out whether the pootle data are older or newer than
> >>> the data in the extras/l10n module in our SVN repository. The timestamps
> >>> in the .sdf files are useless: our tools set them all to 2002-02-02. The
> >>> file time stamps cannot be used directly because of the differing
> >>> directory structures.
> >>>
> >>> Comparing the set of languages of the pootle server and that in
> >>> extras/l10n/ was also inconclusive:
> >>> The set of languages that are present in both data sets is
> >>> af ar as ast bg bn bo bs ca cs cy da dz es et fa fr fur ga gd gl gu he
> >>> hi hu id is it ja jbo ka kab kn ko ku lt lv ml mr my nb nl nn nr nso ny
> >>> oc om or pap pl ps pt ru sc si sk so sq ss st sv ta te th tn tr ts ug
> >>> uk uz ve vi xh zu
> >>>
> >>> Languages only in extras/l10n/ are:
> >>> be-BY br brx de dgo el eo eu fi hr kid kk km kok ks ky mai mk mn mni ne
> >>> pa-IN ro rw sa-IN sat sd sh sl sr sw-TZ tg
> >>>
> >>> Languages only on the pootle server are:
> >>> pyg son tk tlh
> >>>
> >>> See [2] for a list of language ids. (tlh, for example, is Klingon.)
> >>>
> >>>
> >>> So, we probably have to merge both data sets and hope for the best.
> >>> Any information from people who know the localization process better is
> >>> welcome.
> >>>
> >>>
> >>> Regards,
> >>> Andre
> >>>
> >>>
> >>> [1] http://people.apache.org/~af/index.html
> >>> [2] http://www.loc.gov/standards/iso639-2/php/code_list.php
> >>
> >>
> >>
> >> And what has happened to en-GB and en-ZA ?
> >
> >
> > Ah, at least one person who reads my mails :-)
> >
> > I forgot to add the following languages as being present in both
> > locations:
> >    ca-XV en-GB en-ZA pt-BR zh-CN zh-TW
> >
> > Reason: These six language ids are written slightly differently on the
> > pootle server (with a '_' (underline) in the middle) and in l10n/ (with a
> > '-' (dash)).  I sorted them differently and then forgot about them. Sorry.
>
> Thanks. And I too actually read your mail messages :-) and deeply
> appreciate the work.
>
> ciao
> louis
> >
> > -Andre
>
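
For anyone repeating the language comparison described in the quoted mail,
here is a minimal sketch in Python. It assumes the only spelling difference
between the two locations is '_' on the pootle server versus '-' in l10n/,
as Andre describes. The function names and the short sample lists are
hypothetical; the samples are a small subset of the ids listed above, not
the full data.

```python
def normalize(lang_id):
    """Map pootle-style ids (pt_BR) to the l10n/ spelling (pt-BR)."""
    return lang_id.replace("_", "-")


def compare_language_sets(pootle_ids, l10n_ids):
    """Return ids present in both sets and ids unique to each side."""
    pootle = {normalize(i) for i in pootle_ids}
    l10n = {normalize(i) for i in l10n_ids}
    return {
        "both": sorted(pootle & l10n),
        "only_l10n": sorted(l10n - pootle),
        "only_pootle": sorted(pootle - l10n),
    }


# Small illustrative subset of the ids mentioned in the thread.
pootle_sample = ["af", "ar", "pt_BR", "zh_CN", "en_GB", "tlh", "son"]
l10n_sample = ["af", "ar", "pt-BR", "zh-CN", "en-GB", "de", "fi"]

result = compare_language_sets(pootle_sample, l10n_sample)
for key, ids in result.items():
    print(key, " ".join(ids))
```

Normalizing before the set operations keeps ids such as pt_BR and pt-BR
from being reported as two different languages.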
