[issue15866] encode(..., 'xmlcharrefreplace') produces entities for surrogate pairs

2013-08-06 Thread Roundup Robot
Roundup Robot added the comment: New changeset 719ee60fc5e2 by Serhiy Storchaka in branch '2.7': Issue #15866: The xmlcharrefreplace error handler no more produces two XML http://hg.python.org/cpython/rev/719ee60fc5e2 -- nosy: +python-dev

[issue15866] encode(..., 'xmlcharrefreplace') produces entities for surrogate pairs

2013-08-06 Thread Serhiy Storchaka
Changes by Serhiy Storchaka storch...@gmail.com: -- assignee: -> serhiy.storchaka resolution: -> fixed stage: patch review -> committed/rejected status: open -> closed -- Python tracker rep...@bugs.python.org http://bugs.python.org/issue15866

[issue15866] encode(..., 'xmlcharrefreplace') produces entities for surrogate pairs

2013-04-14 Thread STINNER Victor
STINNER Victor added the comment: Should we really invest time in fixing bugs related to astral (non-BMP) characters with rare codecs and error handlers (CJK codecs, xmlcharrefreplace error handler)? Python 3.3 is released and has much better support for astral characters (in many places). I

[issue15866] encode(..., 'xmlcharrefreplace') produces entities for surrogate pairs

2013-04-14 Thread Ezio Melotti
Ezio Melotti added the comment: I tend to agree with Victor: if you want to fix 2.7 go ahead, but if that's too much work it's OK with me to close this issue.

[issue15866] encode(..., 'xmlcharrefreplace') produces entities for surrogate pairs

2013-04-13 Thread Serhiy Storchaka
Changes by Serhiy Storchaka storch...@gmail.com: -- versions: -Python 3.2

[issue15866] encode(..., 'xmlcharrefreplace') produces entities for surrogate pairs

2013-03-11 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: Here is a patch which fixes xmlcharrefreplace error handling in other places. Unfortunately, the multibyte Asian encoders are still broken. I'll open a separate issue for this. -- Added file: http://bugs.python.org/file29378/issue15866_2.patch

[issue15866] encode(..., 'xmlcharrefreplace') produces entities for surrogate pairs

2013-03-05 Thread Serhiy Storchaka
Serhiy Storchaka added the comment:
> I think it's better to be compatible with 3.3+. This is anyway a rather obscure corner case.
Well, we should not introduce new divergence between 3.2 wide build and 3.3.
> Do you want to propose a new patch?
I will do it.

[issue15866] encode(..., 'xmlcharrefreplace') produces entities for surrogate pairs

2013-03-04 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: I prefer a little different (simpler for me) form: for (p = collstart; p < collend;) { Py_UCS4 ch = *p++; if ((0xD800 <= ch && ch <= 0xDBFF) && (p < collend)
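The loop in this comment is cut off in the archive, so the following is only a Python sketch of the idea it appears to describe: walk the UTF-16 code units, join a high surrogate with the low surrogate that follows it, and emit one character reference per resulting code point. The function name `xmlcharref_units` and the low-surrogate range check are assumptions made for illustration, not part of the actual patch.

```python
def xmlcharref_units(units):
    """Hypothetical helper: build XML character references from UTF-16 code units."""
    out = []
    i = 0
    while i < len(units):
        ch = units[i]
        i += 1
        # A high surrogate followed by a low surrogate encodes one code point.
        # (Assumed check: the archived snippet is truncated before this part.)
        if (0xD800 <= ch <= 0xDBFF and i < len(units)
                and 0xDC00 <= units[i] <= 0xDFFF):
            ch = 0x10000 + ((ch - 0xD800) << 10) + (units[i] - 0xDC00)
            i += 1
        out.append("&#%d;" % ch)
    return "".join(out)


print(xmlcharref_units([0xD83D, 0xDC9D]))  # &#128157;
print(xmlcharref_units([0xD83D]))          # &#55357;
```

A lone high surrogate is left as it is, which is why the second call produces a reference to the surrogate value itself.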

[issue15866] encode(..., 'xmlcharrefreplace') produces entities for surrogate pairs

2013-03-04 Thread Ezio Melotti
Ezio Melotti added the comment:
> I doubt about '\ud83d\udc9d' on wide build. Is it right to encode it as b'&#128157;' and not as b'&#55357;&#56477;'?
I don't think so. IIRC surrogates are invalid in UTF-32, and certainly shouldn't be recombined.
> This will be compatible with narrow build but
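For reference, the numbers in this exchange follow from the standard UTF-16 surrogate arithmetic. A minimal sketch (the helper name `to_surrogates` is made up for illustration):

```python
def to_surrogates(cp):
    """Split a non-BMP code point into its (high, low) UTF-16 surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a non-BMP code point: %#x" % cp)
    offset = cp - 0x10000
    # The top 10 bits select the high surrogate, the bottom 10 bits the low one.
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


high, low = to_surrogates(0x1F49D)
print(0x1F49D, high, low)  # 128157 55357 56477
```

So `&#128157;` refers to U+1F49D itself, while `&#55357;&#56477;` refers to the two surrogate code points U+D83D and U+DC9D.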

[issue15866] encode(..., 'xmlcharrefreplace') produces entities for surrogate pairs

2013-03-02 Thread Ezio Melotti
Changes by Ezio Melotti ezio.melo...@gmail.com: -- nosy: +serhiy.storchaka

[issue15866] encode(..., 'xmlcharrefreplace') produces entities for surrogate pairs

2012-09-06 Thread Ezio Melotti
Ezio Melotti added the comment: Attached patch against 3.2 seems to fix the problem. -- keywords: +patch stage: -> patch review versions: +Python 3.2 Added file: http://bugs.python.org/file27134/issue15866.diff

[issue15866] encode(..., 'xmlcharrefreplace') produces entities for surrogate pairs

2012-09-06 Thread Ezio Melotti
Ezio Melotti added the comment: Note that there's similar code in charmap_encoding_error, PyUnicode_EncodeCharmap, PyUnicode_TranslateCharmap, and PyUnicode_EncodeDecimal; however, I'm not sure how to reach these paths. -- nosy: +lemburg

[issue15866] encode(..., 'xmlcharrefreplace') produces entities for surrogate pairs

2012-09-06 Thread STINNER Victor
STINNER Victor added the comment: Thanks to PEP 393, this issue is already fixed in Python 3.3.
$ ./python
Python 3.3.0rc1+ (default:ba2c1def3710+, Sep 3 2012, 23:20:25) [GCC 4.6.3 20120306 (Red Hat 4.6.3-2)] on linux
>>> ( u'\U0001f49d' ).encode('ascii', errors='xmlcharrefreplace')

[issue15866] encode(..., 'xmlcharrefreplace') produces entities for surrogate pairs

2012-09-05 Thread Wim
New submission from Wim: Encoding a (well-formed) Unicode string containing a non-BMP character, using the xmlcharrefreplace error handler, will produce two XML entities for surrogate codepoints instead of one entity for the actual character. Here's a transcript (Python 2.7.3, x86_64): b
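The transcript is cut off in this archive. As an illustration of the behaviour the report asks for, here is a small check that assumes Python 3.3 or later, where the thread says the problem no longer occurs; it is not the reporter's original transcript.

```python
# Assumes Python 3.3+, where a non-BMP character is a single code point.
text = "\U0001f49d"

encoded = text.encode("ascii", errors="xmlcharrefreplace")
print(encoded)  # b'&#128157;'

# Encodable characters pass through unchanged around the reference.
print(("a" + text + "b").encode("ascii", errors="xmlcharrefreplace"))  # b'a&#128157;b'
```

On the affected builds described in the report, the same call instead produced two references, one for each surrogate.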