STINNER Victor added the comment:
These character encodings are legacy, but are still used.
Do you have an idea of how many users still have documents stored or exchanged
using these encodings? The patch is not trivial; the legacy Japanese codecs are
complex and therefore error-prone :-/
For
Tetsuya Morimoto added the comment:
> These character encodings are legacy, but are still used.
> Do you have an idea of how many users still have documents stored or
> exchanged using these encodings?
Hmm, I guess the iso-2022-jp codec is still the default charset of MUAs (Mail
User Agents) on Japanese
STINNER Victor added the comment:
I refactored some parts of the CJK codecs for performance after PEP 393
was implemented. A blocking point was that these codecs have very few tests.
Not for valid data but for invalid data. It may be a little bit better. I
tried to write a test for each path in
Martin v. Löwis added the comment:
Another traditional issue with Japanese codecs is that people have different
opinions on what the encoding should do. It may be that when we release the
codec, somebody comes up and says that the codec is incorrect, and it should do
something different for
Tetsuya Morimoto added the comment:
> By error prone, I mean that it's easy to introduce a bug or a regression,
> since the code is complex and almost nobody maintains it.
Indeed. Actually, I encountered some faults when I migrated the original
patch. The character encoding is a kind of specialty
Tetsuya Morimoto added the comment:
> Another traditional issue with Japanese codecs is that people have different
> opinions on what the encoding should do. It may be that when we release the
> codec, somebody comes up and says that the codec is incorrect, and it should
> do something different
New submission from Tetsuya Morimoto:
This patch adds Japanese legacy encodings as below.
https://bitbucket.org/t2y/cpython/branches/compare/japanese-legacy-encoding..default
* eucjp_ms (euc-jp compatible with cp932)
* iso2022_jp_ms (yet another iso-2022-jp compatible with cp932, similar to
R. David Murray added the comment:
In emails these are labeled as, say, iso-2022-jp-ms?
See also issue 8898 with regard to email encodings.
--
nosy: +r.david.murray
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23050
Tetsuya Morimoto added the comment:
On Mon, Dec 15, 2014 at 1:04 AM, R. David Murray rep...@bugs.python.org wrote:
> In emails these are labeled as, say, iso-2022-jp-ms?
No. These are labeled just 'iso-2022-jp' and we (Japanese) choose the
proper charset encoding to decode the encoded text. You can
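
A rough sketch of what "choosing the proper charset" can look like in practice, for text labeled 'iso-2022-jp': try a short list of codecs in order and keep the first one that decodes without error. The helper name decode_with_fallback and the candidate order are assumptions made for illustration, not something proposed in the issue. Only codecs already shipped with CPython are used; the iso2022_jp_ms codec from the patch is deliberately left out because it is not part of the standard library.

```python
from typing import Iterable, Tuple

# Hypothetical fallback order: the strict codec first, then a more
# permissive variant. Both names are codecs shipped with CPython.
DEFAULT_CANDIDATES = ("iso2022_jp", "iso2022_jp_ext")


def decode_with_fallback(
    data: bytes, candidates: Iterable[str] = DEFAULT_CANDIDATES
) -> Tuple[str, str]:
    """Return (decoded text, name of the codec that succeeded)."""
    for name in candidates:
        try:
            return data.decode(name), name
        except UnicodeDecodeError:
            # This codec rejected the bytes; try the next candidate.
            continue
    raise ValueError("none of the candidate codecs could decode the data")


if __name__ == "__main__":
    payload = "日本語のメール".encode("iso2022_jp")
    text, used = decode_with_fallback(payload)
    print(used, text)
```

A successful decode only shows that the bytes are valid for that codec, not that the result is what the sender intended, so the order of the candidates matters.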
Changes by Serhiy Storchaka storch...@gmail.com:
--
nosy: +lemburg, loewis, serhiy.storchaka
stage:  -> patch review