In reply to the message from Hallvard B Furuseth:
I don't know the syntax of a po file, but this works for the
snippet you posted:
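import re
# One double-quoted string; a backslash escapes the next character.
arg_re = r'"[^\\\"]*(?:\\.[^\\\"]*)*"'
# One or more such strings separated by whitespace (continuation lines).
arg_re = r'%s(?:\s+%s)*' % (arg_re, arg_re)
find_re = re.compile(
r'^msgid\s+(' + arg_re + r')\s*\nmsgstr\s+(' + arg_re + r')\s*\n', re.M)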
However, can \ quote a newline? If so, replace \\. with \\[\s\S] or
something.
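To make the difference concrete, here is a small sketch comparing the two escape patterns on a string that contains a backslash followed by a newline. The names arg_dot_re, arg_nl_re and sample are only for illustration, and whether PO files actually allow such an escape is still the open question above.

```python
import re

# Original form: '.' does not match a newline, so a backslash followed by
# a newline cannot be consumed as an escape.
arg_dot_re = re.compile(r'"[^\\"]*(?:\\.[^\\"]*)*"')

# Variant: [\s\S] matches any character, including a newline.
arg_nl_re = re.compile(r'"[^\\"]*(?:\\[\s\S][^\\"]*)*"')

# A quoted string containing a backslash immediately followed by a newline.
sample = '"first line \\\nsecond line"'

dot_result = arg_dot_re.fullmatch(sample)   # no match
nl_result = arg_nl_re.fullmatch(sample)     # matches the whole string
```

Passing re.S (DOTALL) would have a similar effect on this pattern, since '.' is its only dot.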
Can there be other keywords between msgid and msgstr? If so,
add something like (?:\w+\s+<arg_re>\s*\n)*? between them.
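A minimal sketch of that suggestion, with <arg_re> expanded. The keyword "somekeyword" in the sample is made up for the illustration; I am not claiming it is a real PO keyword.

```python
import re

arg_re = r'"[^\\"]*(?:\\.[^\\"]*)*"'
arg_re = r'%s(?:\s+%s)*' % (arg_re, arg_re)

# Zero or more "keyword value" lines tolerated between msgid and msgstr.
# The lazy *? tries the shortest run of extra lines first.
between_re = r'(?:\w+\s+' + arg_re + r'\s*\n)*?'

find_re = re.compile(
    r'^msgid\s+(' + arg_re + r')\s*\n' + between_re +
    r'msgstr\s+(' + arg_re + r')\s*\n', re.M)

# "somekeyword" is a hypothetical keyword used only for this example.
with_extra = 'msgid "a"\nsomekeyword "x"\nmsgstr "b"\n'
without_extra = 'msgid "a"\nmsgstr "b"\n'

m1 = find_re.search(with_extra)
m2 = find_re.search(without_extra)
```

Both samples should yield the same two groups, because the extra line is matched but not captured.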
Can msgstr come before msgid? If so, forget using a single regexp.
Anything else to the syntax to look out for? Single quotes, maybe?
Is it a problem if the regexp isn't quite right and doesn't match all
cases, yet doesn't report an error when that happens?
All in all, it may be a bad idea to squeeze this into a single regexp.
It gets ugly real fast. Might be better to parse the file in a more
regular way, maybe using regexps just to extract each (keyword, "value")
pair.
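A rough sketch of that line-by-line approach, under some assumptions of mine: keywords are plain words, every value is a double-quoted string, lines starting with # are comments, and anything else is reported as an error rather than silently skipped. The names line_re and parse_pairs are made up for the example.

```python
import re

# An optional keyword followed by one quoted string. A line with no keyword
# is treated as a continuation of the previous keyword.
line_re = re.compile(r'^(?:(\w+)\s+)?("[^\\"]*(?:\\.[^\\"]*)*")$')

def parse_pairs(text):
    """Return a list of (keyword, [quoted strings]) tuples in file order."""
    pairs = []
    for number, raw in enumerate(text.splitlines(), 1):
        line = raw.strip()
        if not line or line.startswith('#'):
            continue  # blank line or comment
        m = line_re.match(line)
        if m is None:
            raise ValueError('line %d not understood: %r' % (number, raw))
        keyword, value = m.groups()
        if keyword:
            pairs.append((keyword, [value]))
        elif pairs:
            pairs[-1][1].append(value)  # continuation string
        else:
            raise ValueError('line %d: string without a keyword' % number)
    return pairs

sample = 'msgid "a"\n" b"\nmsgstr "c"\n\n# comment\nmsgid "d"\nmsgstr "e"\n'
result = parse_pairs(sample)
```

The caller can then walk the list and pair each msgid with the msgstr that follows it, which also makes it easy to notice when the order is not what was expected.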
Thank you very much, Hallvard, it seems to work. There is a strange
match in the file header, but I can skip the first match.
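For the header, here is a sketch that filters on content instead of position. It assumes the strange match is the header entry, which by gettext convention has an empty msgid. The sample text and the name entries are only for illustration.

```python
import re

arg_re = r'"[^\\"]*(?:\\.[^\\"]*)*"'
arg_re = r'%s(?:\s+%s)*' % (arg_re, arg_re)
find_re = re.compile(
    r'^msgid\s+(' + arg_re + r')\s*\nmsgstr\s+(' + arg_re + r')\s*\n', re.M)

# A tiny made-up file: a header entry (empty msgid) plus one real entry.
sample = (
    'msgid ""\n'
    'msgstr ""\n'
    '"Content-Type: text/plain; charset=UTF-8\\n"\n'
    '\n'
    'msgid "string to translate"\n'
    'msgstr "translated string"\n'
)

# Keep every match except the one whose msgid is the empty string.
entries = [m.groups() for m in find_re.finditer(sample)
           if m.group(1) != '""']
```

Filtering on the empty msgid rather than dropping the first match keeps the code correct for a file that happens to have no header.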
The po files have this structure:
http://bit.ly/18qbVc
msgid "string to translate"
" second string to match"
" n string to match"
msgstr "translated string"
" second translated string"
" n translated string"
One or more newlines separate each group from the next.
In the past I wrote a Python script to parse PO files where msgid
and msgstr are on two consecutive lines, for example:
msgid "string to translate"
msgstr "translated string"
Now the problem is how to also match the (optional) strings between
msgid and msgstr.
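A sketch of how the multi-line case could be handled with the regexp from the reply, joining the continuation strings afterwards. The helper join_strings and the sample are my own names, and backslash escapes inside the strings are left undecoded here.

```python
import re

string_re = r'"[^\\"]*(?:\\.[^\\"]*)*"'
# One quoted string followed by any number of continuation strings.
arg_re = r'%s(?:\s+%s)*' % (string_re, string_re)
find_re = re.compile(
    r'^msgid\s+(' + arg_re + r')\s*\nmsgstr\s+(' + arg_re + r')\s*\n', re.M)

def join_strings(block):
    """Concatenate adjacent quoted strings, dropping the surrounding quotes."""
    return ''.join(s[1:-1] for s in re.findall(string_re, block))

sample = (
    'msgid "string to translate"\n'
    '" second string to match"\n'
    'msgstr "translated string"\n'
    '" second translated string"\n'
)

pairs = [(join_strings(a), join_strings(b))
         for a, b in find_re.findall(sample)]
```

The same code also handles the simple two-line case, since the continuation group may match zero times.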
Sandro