iconv -f utf-8 -t cp936 RGui-zh_CN.po > RGui-zh_CN.po.cp936
iconv: illegal input sequence at position 19303
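
The position iconv reports is a byte offset, which is awkward to map back to a particular msgstr. As a rough cross-check, here is a minimal sketch in Python that lists every character of a UTF-8 text that cannot be encoded in CP936. The function name find_unencodable is only illustrative, and the sketch assumes that Python's "cp936" codec (an alias for its GBK codec) agrees closely enough with the table iconv uses, which may not hold for every character.

```python
def find_unencodable(text, encoding="cp936"):
    """Return a list of (line_number, column, character) tuples, one for
    each character in `text` that cannot be encoded in `encoding`.

    Line and column numbers are 1-based; columns count characters, not bytes.
    """
    problems = []
    # Split on "\n" only, so line numbers match what grep or an editor shows.
    for lineno, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            try:
                ch.encode(encoding)
            except UnicodeEncodeError:
                problems.append((lineno, col, ch))
    return problems
```

Reading RGui-zh_CN.po with open(..., encoding="utf-8") and passing its contents to this function should point at the same entry that the round trip through iconv -c exposes. Calling it again with encoding="gb18030" gives a way to check whether that target encoding covers the offending character.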
iconv -c -f utf-8 -t cp936 RGui-zh_CN.po > RGui-zh_CN.po.cp936
      ^^
iconv -f cp936 -t utf-8 RGui-zh_CN.po.cp936 > RGui-zh_CN.po.cp936utf8
diff -uN RGui-zh_CN.po RGui-zh_CN.po.cp936utf8
@@ -852,7 +852,7 @@
 #: rui.c:1283 rui.c:1404
 msgid "menu + item is limited to 1000 bytes"
-msgstr "xxx"
+msgstr "xxx"

grep -C1 "menu + item is limited to 1000 bytes" RGui-zh_CN.po

A translator should be asked to supply new text for the part of this
msgstr that differs.
BTW, there is no such problem with GB18030.

2006/10/7, Duncan Murdoch <[EMAIL PROTECTED]>:
> On 10/6/2006 1:35 PM, Hin-Tak Leung wrote:
> > Duncan Murdoch wrote:
> >> On 2006-10-5 8:06, Ei-ji Nakama wrote:
> >>> I do not understand Chinese, but recognize kanji.
> >>> RGui-zh_CN.po is written in utf-8, but declares charset=CP936.
> >>>
> >>> perl -p -i -e 's#charset=CP936#charset=utf-8#' RGui-zh_CN.po
> >>> msgfmt -o RGui.mo RGui-zh_CN.po
> >>
> >> Thanks!!  That does fix the error, at least on my system.  I'll commit
> >> the change to R-devel and R-patched.
> >
> > Hmm, I do understand Chinese, and I can confirm that the content
> > of RGui-zh_CN.po in R 2.4 is in utf-8 rather than CP936.
> >
> > I can also confirm that CP950(big5) for RGui-zh_TW.po is correct, and
> > CP932(shift-JIS) for RGui-ja.po is also correct. (so you'll need to
> > find some Korean to verify CP949 for RGui-ko.po).
> >
> > However, the fix is slightly "asymmetric". Out of ru, zh_CN, zh_TW,
> > ja, ko, only ru in R-2.4.0/po/*.po is in localised encoding,
> > (the other 4 in UTF-8), whereas RGui-*.po, after the fix, all
> > are in localised encoding except RGui-zh_CN.po .
> >
> > I would propose correcting the encoding of the *content*, rather
> > than the charset tag, so that Rgui-* all use localised ones (CP932,
> > CP936, CP949, CP950). That should be better for older Windows...
>
> I did try that, but iconv didn't want to convert the file from UTF-8 to
> CP936.  I've no idea why not.
>
> In any case, those files only need to be readable by the translation
> teams, not by end-users, so I don't think the asymmetry matters:  if a
> translator finds it easy to work in UTF-8 that's fine for R, as long as
> it is correctly recorded.
>
> Duncan Murdoch
>
>

-- 
EI-JI Nakama  <[EMAIL PROTECTED]>
"\u4e2d\u9593\u6804\u6cbb"  <[EMAIL PROTECTED]>

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel