In reply to the message from Hallvard B Furuseth:


I don't know the syntax of a po file, but this works for the
snippet you posted:

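import re
# One double-quoted string; a backslash escapes the character after it.
arg_re = r'"[^\\\"]*(?:\\.[^\\\"]*)*"'
# One or more such strings separated by whitespace (newlines included).
arg_re = r'%s(?:\s+%s)*' % (arg_re, arg_re)
find_re = re.compile(
     r'^msgid\s+(' + arg_re + r')\s*\nmsgstr\s+(' + arg_re + r')\s*\n', re.M)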

However, can \ quote a newline? If so, replace \\. with \\[\s\S] or
something.
Can there be other keywords between msgid and msgstr?  If so,
add something like (?:\w+\s+<arg_re>\s*\n)*? between them.
Can msgstr come before msgid? If so, forget using a single regexp.
Anything else to the syntax to look out for?  Single quotes, maybe?
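
To make the two "if so" adjustments above concrete, here is a rough sketch with both applied. The names string_re and other_re are mine, and the sample input is contrived (a keyword line placed between msgid and msgstr purely to exercise the pattern), so treat it as an illustration rather than a statement about what real po files contain.

```python
import re

# One quoted string; \\[\s\S] lets a backslash escape any character,
# including a newline.
string_re = r'"[^\\"]*(?:\\[\s\S][^\\"]*)*"'
# One or more quoted strings separated by whitespace.
arg_re = r'%s(?:\s+%s)*' % (string_re, string_re)
# Optional "keyword value" lines between msgid and msgstr, matched lazily.
other_re = r'(?:\w+\s+' + arg_re + r'\s*\n)*?'

find_re = re.compile(
    r'^msgid\s+(' + arg_re + r')\s*\n'
    + other_re
    + r'msgstr\s+(' + arg_re + r')\s*\n', re.M)

# Contrived input: one extra keyword line between msgid and msgstr.
sample = (
    'msgid "one file"\n'
    'msgid_plural "many files"\n'
    'msgstr "un file"\n'
)
m = find_re.search(sample)
```

The lazy `*?` keeps the optional group from swallowing the msgstr line itself, since msgstr also looks like a keyword followed by a string.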

Is it a problem if the regexp isn't quite right and doesn't match all
cases, yet doesn't report an error when that happens?

All in all, it may be a bad idea to squeeze this into a single regexp.
It gets ugly real fast.  Might be better to parse the file in a more
regular way, maybe using regexps just to extract each (keyword, "value")
pair.
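
A minimal sketch of that line-by-line approach, under some assumptions of mine: keywords are plain words, comment lines start with #, and a line holding only a quoted string continues the previous keyword. The names line_re and parse_pairs are made up for this example. Unlike the single regexp, it complains about any line it does not understand instead of silently skipping it.

```python
import re

# Optional keyword, then one quoted string; a line with only a quoted
# string is treated as a continuation of the previous keyword.
line_re = re.compile(r'^(?:(\w+)\s+)?("[^\\"]*(?:\\.[^\\"]*)*")\s*$')

def parse_pairs(text):
    """Return a list of (keyword, [quoted strings]) in file order."""
    pairs = []
    for lineno, line in enumerate(text.splitlines(), 1):
        if not line.strip() or line.startswith('#'):
            continue  # blank line or comment (assumed comment syntax)
        m = line_re.match(line)
        if m is None:
            raise ValueError('line %d not understood: %r' % (lineno, line))
        keyword, value = m.groups()
        if keyword:
            pairs.append((keyword, [value]))
        elif pairs:
            pairs[-1][1].append(value)  # continuation string
        else:
            raise ValueError('line %d: string without a keyword' % lineno)
    return pairs
```

Grouping the pairs into (msgid, msgstr) entries is then ordinary list handling, and the order of keywords can be checked explicitly.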

Thank you very much, Hallvard, it seems to work. There is a strange match in the file header, but I can skip the first match.


The po files have this structure:
http://bit.ly/18qbVc

msgid "string to translate"
"   second string to match"
"   n string to match"
msgstr "translated string"
"   second translated string"
"  n translated string"
One or more blank lines separate each group from the next.
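
For illustration, here is a rough sketch of running the regexp on that structure and joining the pieces of each group. It assumes the strange header match is the usual entry with an empty msgid, so it skips on that condition rather than by position. The helper names join_strings and entries are invented for this example, and escape sequences inside the strings are left as they are.

```python
import re

string_re = r'"[^\\"]*(?:\\.[^\\"]*)*"'
arg_re = r'%s(?:\s+%s)*' % (string_re, string_re)
find_re = re.compile(
    r'^msgid\s+(' + arg_re + r')\s*\nmsgstr\s+(' + arg_re + r')\s*\n', re.M)

def join_strings(arg):
    """Concatenate the contents of adjacent quoted strings, quotes removed."""
    return ''.join(s[1:-1] for s in re.findall(string_re, arg))

def entries(text):
    """Yield (msgid, msgstr) pairs, skipping entries whose msgid is empty."""
    for m in find_re.finditer(text):
        msgid = join_strings(m.group(1))
        if msgid == '':
            continue  # assumed to be the header entry
        yield msgid, join_strings(m.group(2))

sample = '''msgid ""
msgstr ""
"Content-Type: text/plain; charset=UTF-8\\n"

msgid "string to translate"
"   second string to match"
msgstr "translated string"
"   second translated string"
'''
result = list(entries(sample))
```

Skipping on the empty msgid avoids depending on the header being the first match in every file.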

In the past I wrote a Python script to parse PO files where msgid and msgstr are on two consecutive lines, for example:

msgid "string to translate"
msgstr "translated string"

Now the problem is how to also match the optional strings between msgid and msgstr.

Sandro





--
http://mail.python.org/mailman/listinfo/python-list
