Where can we get the data? I can help check it and understand the toolset and process.

On Thu, Feb 9, 2012 at 1:04 AM, Louis Suárez-Potts <[email protected]> wrote:

> Hi
>
> On 8 February 2012 11:49, Andre Fischer <[email protected]> wrote:
> > On 08.02.2012 17:31, Stuart Swales wrote:
> >>
> >> On 07/02/2012 14:02, Andre Fischer wrote:
> >>>
> >>> Hi,
> >>>
> >>> I recently had a little time to look at the pootle data. Here is what I
> >>> have found out so far. Please keep in mind that this is new for me and
> >>> that my interpretations may be wrong.
> >>>
> >>> For context I will start with a short description of the directory
> >>> structure of the 80 GB of the backup disk:
> >>>
> >>> In the top-level podirectory/ there is a sub-directory openoffice_org/
> >>> that probably is the translation data of OpenOffice.org. It contains
> >>> sub-directories for most languages (more on the exact set below).
> >>> The content of podirectory is available at [1].
> >>>
> >>> Below the top-level backup/ there are two directories DEV_m103/ and
> >>> DEV_94/ for two milestones. Below these you can find directories like
> >>> backconvert-110326/ that probably contain backups for certain dates
> >>> (March 26 2011 in this example). The most recent is
> >>> DEV_m103/backconvert-110401 from April 1st of last year.
> >>>
> >>> After comparing time stamps I now think that we can disregard the whole
> >>> backup/ directory. There are .po files under podirectory/ that are from
> >>> later than April 1st. Some files are from May.
> >>>
> >>> I then tried to find out whether the pootle data are older or newer than
> >>> the data in the extras/l10n module in our SVN repository. The timestamps
> >>> in the .sdf files are useless: our tools set them all to 2002-02-02. The
> >>> file time stamps cannot be used directly because of the differing
> >>> directory structures.
> >>>
> >>> Comparing the set of languages of the pootle server and that in
> >>> extras/l10n/ was also inconclusive:
> >>> The set of languages that are present in both data sets is
> >>> af ar as ast bg bn bo bs ca cs cy da dz es et fa fr fur ga gd gl gu he
> >>> hi hu id is it ja jbo ka kab kn ko ku lt lv ml mr my nb nl nn nr nso ny
> >>> oc om or pap pl ps pt ru sc si sk so sq ss st sv ta te th tn tr ts ug
> >>> uk uz ve vi xh zu
> >>>
> >>> Languages only in extras/l10n/ are:
> >>> be-BY br brx de dgo el eo eu fi hr kid kk km kok ks ky mai mk mn mni ne
> >>> pa-IN ro rw sa-IN sat sd sh sl sr sw-TZ tg
> >>>
> >>> Languages only on the pootle server are:
> >>> pyg son tk tlh
> >>>
> >>> See [2] for a list of language ids. (tlh for example is Klingon)
> >>>
> >>>
> >>> So, we probably have to merge both data sets and hope for the best.
> >>> Any information from people who know the localization process better is
> >>> welcome.
> >>>
> >>>
> >>> Regards,
> >>> Andre
> >>>
> >>>
> >>> [1] http://people.apache.org/~af/index.html
> >>> [2] http://www.loc.gov/standards/iso639-2/php/code_list.php
> >>
> >>
> >> And what has happened to en-GB and en-ZA?
> >
> >
> > Ah, at least one person who reads my mails :-)
> >
> > I forgot to add the following languages as being present in both locations:
> > ca-XV en-GB en-ZA pt-BR zh-CN zh-TW
> >
> > Reason: These six language ids are written slightly differently on the
> > pootle server (with a '_' (underline) in the middle) and in l10n/ (with a
> > '-' (dash)). I sorted them differently and then forgot about them. Sorry.
>
> Thanks. And I too actually read your mail messages :-)--and deeply
> appreciate the work.
>
> ciao
> louis
>
> >
> > -Andre
>
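
The language-set comparison described in the thread, including the '_' versus '-' mismatch that hid six languages, could be sketched in Python roughly as follows. This is only a minimal sketch: the function names normalize and compare_language_sets are hypothetical, and the two sample lists are a small illustrative subset of the ids mentioned above, not the real directory listings. It assumes that replacing '_' with '-' is the only normalization needed, which the thread suggests but does not confirm.

```python
def normalize(lang_id: str) -> str:
    """Map Pootle-style ids such as 'en_GB' to the l10n-style 'en-GB'."""
    return lang_id.replace("_", "-")


def compare_language_sets(pootle_ids, l10n_ids):
    """Return (common, only_l10n, only_pootle) as sorted lists of normalized ids."""
    pootle = {normalize(i) for i in pootle_ids}
    l10n = {normalize(i) for i in l10n_ids}
    return sorted(pootle & l10n), sorted(l10n - pootle), sorted(pootle - l10n)


# Illustrative subset only (assumption): in practice these lists would come
# from the directory names under podirectory/openoffice_org/ and extras/l10n/.
pootle_sample = ["af", "en_GB", "en_ZA", "pt_BR", "tlh", "zh_CN"]
l10n_sample = ["af", "de", "en-GB", "en-ZA", "pt-BR", "zh-CN"]

common, only_l10n, only_pootle = compare_language_sets(pootle_sample, l10n_sample)
print("both:", " ".join(common))
print("only in extras/l10n:", " ".join(only_l10n))
print("only on pootle:", " ".join(only_pootle))
```

Normalizing both sides before taking the set intersection and differences means ids like en_GB and en-GB land in the "both" group instead of showing up as two unrelated entries.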
