On Thu, 4 Jul 2013 22:22:40 +0800 Marguerite Su wrote:

Hi Marguerite,

> >> It is possible that some kind guys writing some perl scripts to
> >> extract the translator credits in every po, generate a new xml, and
> >> integrate it into our existing documentation?

Attached is a shell script that, sort of, does what you want--based on
the existing data.

The biggest problem with your .po files is the inconsistency of the
translator data. Some translator credits are added as comments at the
beginning of the file, whereas others are listed in the
"translator-credits" section. Combinations of both exist, too.

The "translator-credits" section holds the data in a number of
variants, which makes it difficult to parse (there are even two
different commas!). Individual translators also entered their credits
differently from file to file (there are, for example, nine different
credits from yourself ;-)).

Interestingly, there is almost no multi-line data in the
"translator-credits" section (a fact I made use of).

The fact that the data is inconsistent also makes the msggrep output
useless for your purpose. It typically looks like this:

 > msggrep -K -e 'translator-credits' po/zypper.xml.zh_CN.po
#
# Translators:
#   <[email protected]>, 2013.
# Guo Yunhe <[email protected]>, 2013.
msgid ""
msgstr ""
"Project-Id-Version: opensuse-manuals\n"
"POT-Creation-Date: 2013-03-17 02:02+0800\n"
"PO-Revision-Date: 2013-03-02 03:36+0000\n"
"Last-Translator: guoyunhebrave <[email protected]>\n"
"Language-Team: Chinese (China) (http://www.transifex.com/projects/p/opensuse-"
"manuals/language/zh_CN/)\n"
"Language: zh_CN\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Plural-Forms: nplurals=1; plural=0;\n"

#. Put one translator per line, in the form of NAME <EMAIL>, YEAR1, YEAR2
#: zypper.xml:0(None)
msgid "translator-credits"
msgstr "Sign Guo Yunhe <[email protected]>"

Parsing that output is not much different from parsing the complete .po
file, so I did not make use of msggrep.
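
For illustration only (this is not the attached script): a minimal
sketch of pulling the credits straight out of a .po file. It assumes
the msgstr directly follows the msgid and fits on one line; the
function name extract_credits is made up for this example.

```shell
#!/bin/sh
# Print the msgstr that follows msgid "translator-credits" in a .po file.
# Assumption: the msgstr is a single line (no continuation lines).
extract_credits() {
    awk '
        /^msgid "translator-credits"$/ { found = 1; next }
        found && /^msgstr "/ {
            sub(/^msgstr "/, "")    # strip the keyword and opening quote
            sub(/"[ \t]*$/, "")     # strip the closing quote
            print
            exit
        }
        found { found = 0 }         # msgstr did not follow directly: skip
    ' "$1"
}
```

Called as "extract_credits po/zypper.xml.zh_CN.po", it prints only the
credit string and ignores the comments at the top of the file.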

My script focuses on the "translator-credits" section (ignoring the
comments) and tries to unify the existing variants - just a quick hack
that allows you to quickly make use of the data without having to do
too much manual work.

This version assumes that multiple entries occur on one line and are
separated by ";" - that is the case in all but three files.
It converts all usable data to

<member>2012, 玛丽苏 <ulink url="mailto:[email protected]"/></member>

which allows you to just Cut & Paste it into a <simplelist>. The output
can easily be changed by adjusting the script.
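
The conversion step could look roughly like the sketch below. This is
not the attached script, only an illustration under assumptions: each
entry has the form NAME <EMAIL>, YEAR..., entries are split on
RECORD_DELIMITER, only the ASCII comma is handled, and the function
name credits_to_members is invented for this example.

```shell
#!/bin/sh
RECORD_DELIMITER=';'    # assumed separator between entries on one line

# Read credit lines on stdin, write one <member> element per entry to
# stdout, and report entries that cannot be parsed on stderr.
credits_to_members() {
    tr "$RECORD_DELIMITER" '\n' | awk '
        {
            gsub(/^[ \t]+|[ \t]+$/, "")            # trim the entry
            if ($0 == "") next
            if (match($0, /<[^<>@ ]+@[^<>@ ]+>/)) {
                name  = substr($0, 1, RSTART - 1)
                email = substr($0, RSTART + 1, RLENGTH - 2)
                years = substr($0, RSTART + RLENGTH)
                gsub(/^[ \t]+|[ \t]+$/, "", name)
                gsub(/^[ \t,]+|[ \t,.]+$/, "", years)
                if (name != "" && years != "") {
                    printf "<member>%s, %s <ulink url=\"mailto:%s\"/></member>\n", years, name, email
                    next
                }
            }
            print "unusable entry: " $0 > "/dev/stderr"
        }'
}
```

Entries without a name, an address in angle brackets, or a year end up
on STDERR instead of in the list, so they can be fixed in the .po file.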

Errors and missing data are reported on STDERR, which will hopefully
help with fixing the data.

To run it, go to the po/ directory and run the script like this:

extract-translators.sh 2>translator_errors.txt | sort -u

That will put the data on STDOUT (sorted, with duplicates removed) and
the errors to translator_errors.txt.

For better results, the translator data needs to be cleaned up (at
least the comments from the top of the files need to be moved to the
"translator-credits" section). And as long as there is no way or tool
that lets you properly extract the "translator-credits" section from
the .po file, I suggest putting the data on one line, separated by a
RECORD_DELIMITER (see line 4 in the script).

Hope this helps.

-- 
Regards
        Frank

Frank Sundermeyer, Technical Writer, Documentation
SUSE Linux Products GmbH, Maxfeldstr. 5, D-90409 Nuernberg
Tel: +49-911-74053-0, Fax: +49-911-7417755;  http://www.opensuse.org/
SUSE Linux Products GmbH, GF:
Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 16746 (AG Nürnberg) 
"Reality is always controlled by the people who are most insane" Dogbert

Attachment: extract-translators.sh
Description: application/shellscript
