Hi Andre,

On Feb 9, 2012, at 7:09 AM, Andre Fischer wrote:

> On 09.02.2012 15:23, Huaidong Qiu wrote:
>> Where can we get the data? I can help to check and understand the toolset
>> and process.
>
> That sounds great. Please have a look at
>
> http://people.apache.org/~af/index.html

It would be a good idea to add an INFRA issue to JIRA to track loading of
this data to the Apache Pootle Server.

Regards,
Dave

>
> -Andre
>
>>
>> On Thu, Feb 9, 2012 at 1:04 AM, Louis Suárez-Potts
>> <[email protected]> wrote:
>>
>>> Hi
>>>
>>> On 8 February 2012 11:49, Andre Fischer <[email protected]> wrote:
>>>> On 08.02.2012 17:31, Stuart Swales wrote:
>>>>>
>>>>> On 07/02/2012 14:02, Andre Fischer wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I recently had a little time to look at the pootle data. Here is what I
>>>>>> have found out so far. Please keep in mind that this is new for me and
>>>>>> that my interpretations may be wrong.
>>>>>>
>>>>>> For context I will start with a short description of the directory
>>>>>> structure of the 80 GB of the backup disk:
>>>>>>
>>>>>> In the top-level podirectory/ there is a sub-directory openoffice_org/
>>>>>> that probably is the translation data of OpenOffice.org. It contains
>>>>>> sub-directories for most languages (more on the exact set below).
>>>>>> The content of podirectory is available at [1].
>>>>>>
>>>>>> Below the top-level backup/ there are two directories DEV_m103/ and
>>>>>> DEV_94/ for two milestones. Below these you can find directories like
>>>>>> backconvert-110326/ that probably contain backups for certain dates
>>>>>> (March 26 2011 in this example). The most recent is
>>>>>> DEV_m103/backconvert-110401 from April 1st of last year.
>>>>>>
>>>>>> After comparing time stamps I now think that we can disregard the whole
>>>>>> backup/ directory. There are .po files under podirectory/ that are from
>>>>>> later than April 1st. Some files are from May.
>>>>>>
>>>>>> I then tried to find out whether the pootle data are older or newer than
>>>>>> the data in the extras/l10n module in our SVN repository. The timestamps
>>>>>> in the .sdf files are useless; our tools set them all to 2002-02-02. The
>>>>>> file time stamps cannot be used directly because of the differing
>>>>>> directory structures.
>>>>>>
>>>>>> Comparing the set of languages of the pootle server and that in
>>>>>> extras/l10n/ was also inconclusive:
>>>>>> The set of languages that are present in both data sets is
>>>>>> af ar as ast bg bn bo bs ca cs cy da dz es et fa fr fur ga gd gl gu he
>>>>>> hi hu id is it ja jbo ka kab kn ko ku lt lv ml mr my nb nl nn nr nso ny
>>>>>> oc om or pap pl ps pt ru sc si sk so sq ss st sv ta te th tn tr ts ug
>>>>>> uk uz ve vi xh zu
>>>>>>
>>>>>> Languages only in extras/l10n/ are:
>>>>>> be-BY br brx de dgo el eo eu fi hr kid kk km kok ks ky mai mk mn mni ne
>>>>>> pa-IN ro rw sa-IN sat sd sh sl sr sw-TZ tg
>>>>>>
>>>>>> Languages only on the pootle server are:
>>>>>> pyg son tk tlh
>>>>>>
>>>>>> See [2] for a list of language ids. (tlh for example is Klingon)
>>>>>>
>>>>>>
>>>>>> So, we probably have to merge both data sets and hope for the best.
>>>>>> Any information from people who know the localization process better is
>>>>>> welcome.
>>>>>>
>>>>>>
>>>>>> Regards,
>>>>>> Andre
>>>>>>
>>>>>>
>>>>>> [1] http://people.apache.org/~af/index.html
>>>>>> [2] http://www.loc.gov/standards/iso639-2/php/code_list.php
>>>>>
>>>>>
>>>>>
>>>>> And what has happened to en-GB and en-ZA ?
>>>>
>>>>
>>>> Ah, at least one person who reads my mails :-)
>>>>
>>>> I forgot to add the following languages as being present in both locations:
>>>> ca-XV en-GB en-ZA pt-BR zh-CN zh-TW
>>>>
>>>> Reason: These six language ids are written slightly differently on the
>>>> pootle server (with a '_' (underline) in the middle) and in l10n/ (with a
>>>> '-' (dash)). I sorted them differently and then forgot about them.
>>>> Sorry.
>>>
>>> Thanks. And I too actually read your mail messages :-)--and deeply
>>> appreciate the work.
>>>
>>> ciao
>>> louis
>>>>
>>>> -Andre
>>>
>>
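
For anyone repeating the language-set comparison discussed in the quoted thread, here is a minimal sketch in Python. It is not the procedure Andre actually used; the function names and the short sample lists are assumptions made up for illustration, and only a handful of the ids from the thread are included. The one point it tries to show is that ids should be normalized ('_' to '-') before the sets are compared, so that en_GB on the pootle server and en-GB in l10n/ count as the same language.

```python
def normalize(lang_id: str) -> str:
    """Map underscore-style ids such as 'en_GB' to the dashed form 'en-GB'."""
    return lang_id.replace("_", "-")


def compare_language_sets(pootle_ids, l10n_ids):
    """Return the ids present in both sets and those unique to each side.

    Both arguments are iterables of language id strings. The names are
    hypothetical; in practice they would come from listing the language
    sub-directories of each data set.
    """
    pootle = {normalize(i) for i in pootle_ids}
    l10n = {normalize(i) for i in l10n_ids}
    return {
        "both": sorted(pootle & l10n),
        "only_l10n": sorted(l10n - pootle),
        "only_pootle": sorted(pootle - l10n),
    }


# Small excerpt of the ids mentioned in the thread, not the full lists.
pootle_sample = ["af", "ar", "en_GB", "pt_BR", "tlh", "zh_CN"]
l10n_sample = ["af", "ar", "de", "en-GB", "fi", "pt-BR", "zh-CN"]

result = compare_language_sets(pootle_sample, l10n_sample)
print("both:       ", " ".join(result["both"]))
print("only l10n:  ", " ".join(result["only_l10n"]))
print("only pootle:", " ".join(result["only_pootle"]))
```

Without the normalize() step the six regional ids would show up twice, once as "only on the pootle server" and once as "only in l10n/", which is the kind of mismatch described above.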
