[issue23050] Add Japanese legacy encodings

2014-12-15 STINNER Victor
STINNER Victor added the comment: These character encodings are legacy, but are still used. Do you have an idea of how many users still have documents stored or exchanged using these encodings? The patch is not trivial: the legacy Japanese codecs are complex and therefore error-prone :-/ For

2014-12-15 Tetsuya Morimoto
Tetsuya Morimoto added the comment:
> These character encodings are legacy, but are still used. Do you have an idea of how many users still have documents stored or exchanged using these encodings?
Hmm, I guess the iso-2022-jp codec is still the default charset of MUAs (Mail User Agents) on Japanese
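
Japanese mail commonly carries header text as RFC 2047 encoded-words labeled `iso-2022-jp`, of the form `=?iso-2022-jp?B?...?=`. A minimal sketch using only the standard library's `email.header.decode_header` and the existing `iso2022_jp` codec; the sample text is an arbitrary choice. Note that `decode_header` returns raw bytes plus the declared charset, so picking the codec that actually decodes those bytes is left to the caller.

```python
import base64
from email.header import decode_header

# Build an RFC 2047 encoded-word labeled "iso-2022-jp" (base64 "B" form),
# as a Japanese MUA might emit in a Subject header. Sample text is arbitrary.
raw = "日本語".encode("iso-2022-jp")
header = "=?iso-2022-jp?B?" + base64.b64encode(raw).decode("ascii") + "?="

# decode_header only splits the header and undoes the base64 step:
# it yields (bytes, charset) pairs, so the caller still chooses the codec.
pairs = decode_header(header)
texts = [chunk.decode(charset) for chunk, charset in pairs]
# texts == ['日本語']
```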

2014-12-15 STINNER Victor
STINNER Victor added the comment: I refactored some parts of the CJK codecs for performance after PEP 393 was implemented. A blocking point was that these codecs have very few tests, specifically for invalid data rather than valid data. It may be a little better now. I tried to write a test for each path in
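
The coverage gap described here is about invalid input. A minimal sketch of what such tests can look like for codecs that already ship with CPython (`shift_jis`, `cp932`); the byte strings and test names are illustrative choices, not taken from CPython's test suite or from the patch. Truncated multibyte sequences are a cheap way to exercise the strict decoder, the `replace` error handler, and the incremental decoder.

```python
import codecs
import unittest


class InvalidInputTests(unittest.TestCase):
    """Illustrative invalid-input tests for existing CPython Japanese codecs."""

    def test_truncated_sequence_raises(self):
        # 0x82 is a Shift_JIS/cp932 lead byte; alone it is an incomplete sequence.
        for codec in ("shift_jis", "cp932"):
            with self.subTest(codec=codec):
                with self.assertRaises(UnicodeDecodeError):
                    b"\x82".decode(codec)

    def test_replace_handler_inserts_u_fffd(self):
        # The dangling lead byte becomes U+FFFD; the ASCII prefix is kept.
        self.assertEqual(b"a\x82".decode("shift_jis", "replace"), "a\ufffd")

    def test_incremental_decoder_buffers_lead_byte(self):
        # Without final=True, a lone lead byte is held back instead of raising.
        decoder = codecs.getincrementaldecoder("shift_jis")()
        self.assertEqual(decoder.decode(b"\x82"), "")
        self.assertEqual(decoder.decode(b"\xa0", final=True), "\u3042")  # HIRAGANA A
```

Saved as a module, this runs under `python -m unittest`.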

2014-12-15 Martin v. Löwis
Martin v. Löwis added the comment: Another traditional issue with Japanese codecs is that people have different opinions on what the encoding should do. It may be that when we release the codec, somebody comes up and says that the codec is incorrect, and it should do something different for
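
A well-known instance of this kind of disagreement (an illustration added here, not one raised in the thread) is JIS X 0208 row 1, cell 33: JIS-based tables map it to U+301C WAVE DASH, while Microsoft's cp932 maps the same code to U+FF5E FULLWIDTH TILDE. CPython's existing codecs show the split directly:

```python
# JIS X 0208 row 1, cell 33 (JIS code 0x2141) in two byte encodings.
SJIS_FORM = b"\x81\x60"  # Shift_JIS / cp932 form
EUC_FORM = b"\xa1\xc1"   # EUC-JP form (JIS code + 0x8080)

for codec, data in (("shift_jis", SJIS_FORM), ("cp932", SJIS_FORM), ("euc_jp", EUC_FORM)):
    ch = data.decode(codec)
    print(f"{codec:9} U+{ord(ch):04X}")
# shift_jis U+301C
# cp932     U+FF5E
# euc_jp    U+301C
```

A cp932-compatible EUC-JP such as the proposed eucjp_ms would presumably follow the cp932 choice for this code, which is exactly the kind of behavior users may disagree about.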

2014-12-15 Tetsuya Morimoto
Tetsuya Morimoto added the comment:
> By error-prone, I mean that it's easy to introduce a bug or a regression, since the code is complex and almost nobody maintains it.
Indeed. Actually, I encountered some faults when I migrated the original patch. The character encoding is a kind of specialty

2014-12-15 Tetsuya Morimoto
Tetsuya Morimoto added the comment:
> Another traditional issue with Japanese codecs is that people have different opinions on what the encoding should do. It may be that when we release the codec, somebody comes up and says that the codec is incorrect, and it should do something different

2014-12-14 Tetsuya Morimoto
New submission from Tetsuya Morimoto:
This patch adds the Japanese legacy encodings listed below.
https://bitbucket.org/t2y/cpython/branches/compare/japanese-legacy-encoding..default

* eucjp_ms (euc-jp compatible with cp932)
* iso2022_jp_ms (yet another iso-2022-jp variant compatible with cp932, similar to
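
The gap these codecs target can be seen with CPython's existing `cp932` and `euc_jp` codecs. A minimal sketch, assuming U+2460 CIRCLED DIGIT ONE as a representative NEC special character (cp932 includes it; the JIS X 0208/0212 repertoire behind `euc_jp` does not). `encodable` is a hypothetical helper defined here, and the proposed `eucjp_ms` is not used because it is not assumed to be installed.

```python
def encodable(text: str, codec: str) -> bool:
    """Hypothetical helper: True if *text* encodes under *codec* in strict mode."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


SAMPLE = "\u2460"  # CIRCLED DIGIT ONE, an NEC special character in cp932

print(SAMPLE.encode("cp932"))       # b'\x87@'
print(encodable(SAMPLE, "cp932"))   # True
print(encodable(SAMPLE, "euc_jp"))  # False
```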

2014-12-14 R. David Murray
R. David Murray added the comment: In emails these are labeled as, say, iso-2022-jp-ms? See also issue 8898 with regard to email encodings.
--
nosy: +r.david.murray
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23050

2014-12-14 Tetsuya Morimoto
Tetsuya Morimoto added the comment:
On Mon, Dec 15, 2014 at 1:04 AM, R. David Murray rep...@bugs.python.org wrote:
> In emails these are labeled as, say, iso-2022-jp-ms?
No. They are labeled just 'iso-2022-jp', and we (Japanese) choose the proper charset to decode the encoded text. You can
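
This answer describes a reader-side policy: the label stays `iso-2022-jp`, and the decoder is chosen to fit the bytes. A minimal sketch of that policy, assuming a hypothetical helper `decode_labeled_iso2022jp` and a candidate order chosen for illustration, restricted to codecs that ship with CPython; the proposed `iso2022_jp_ms` is not assumed to exist. Half-width katakana (designated with ESC ( I), for example, is rejected by strict `iso2022_jp` but accepted by `iso2022_jp_ext`.

```python
from typing import Iterable, Tuple

# Hypothetical candidate order, limited to codecs that ship with CPython.
CANDIDATES = ("iso2022_jp", "iso2022_jp_ext", "iso2022_jp_2")


def decode_labeled_iso2022jp(
    data: bytes, candidates: Iterable[str] = CANDIDATES
) -> Tuple[str, str, bool]:
    """Decode bytes labeled 'iso-2022-jp' by trying codecs in order.

    Returns (text, codec_used, strict). If no candidate decodes the bytes
    cleanly, fall back to the first candidate with errors="replace" and
    report strict=False so the caller knows U+FFFD may be present.
    """
    names = tuple(candidates)
    for name in names:
        try:
            return data.decode(name), name, True
        except UnicodeDecodeError:
            continue
    return data.decode(names[0], errors="replace"), names[0], False


# Half-width katakana uses the ESC ( I designation, which plain iso2022_jp rejects.
text, codec, strict = decode_labeled_iso2022jp(b"\x1b(I1\x1b(B")
print(codec, strict)  # iso2022_jp_ext True
```

As far as I understand cp932-compatible variants, they encode NEC special characters such as ① as JIS row 13 (e.g. ESC $ B 0x2D 0x21). None of the three candidates maps that row, so such input falls through to the `replace` path, which is the case a codec like the proposed iso2022_jp_ms would cover.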

2014-12-14 Serhiy Storchaka
Changes by Serhiy Storchaka storch...@gmail.com:
--
nosy: +lemburg, loewis, serhiy.storchaka
stage: -> patch review