Serhiy Storchaka added the comment:
Could you please finish this issue, Victor?
--
assignee: - haypo
stage: resolved -
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue13916
___
Serhiy Storchaka added the comment:
I have no opinion.
--
assignee: serhiy.storchaka -
Serhiy Storchaka added the comment:
Here is a patch which checks the encoding name against cp65001 instead of CP_UTF8.
I can't test on Windows, and I don't know which of the two patches is correct.
--
Added file: http://bugs.python.org/file35262/surrogatepass_cp65001.patch
Roundup Robot added the comment:
New changeset 8ee2b73cda7a by Victor Stinner in branch 'default':
Issue #13916: Fix surrogatepass error handler on Windows
http://hg.python.org/cpython/rev/8ee2b73cda7a
STINNER Victor added the comment:
> But an exception reports about CP_UTF8.
Oh, that's my fault! And it is a bug: CP_UTF8 is the Windows constant, but it
is not a valid Python codec name.
Attached patch cp_encoding_name.patch fixes this issue.
I don't think that Py_LOWER() is needed because
Serhiy Storchaka added the comment:
This issue was mainly resolved in issue12892. The surrogatepass error handler
now works with UTF-16* and UTF-32* encodings. For other encodings, however, it
still behaves as it does for UTF-8 (preserving the old behavior). Should we change the behavior for
non-UTF encodings end
Serhiy Storchaka added the comment:
Here is a patch which disallows the surrogatepass handler for non-UTF
encodings. Please test it on Windows.
--
type: behavior - enhancement
versions: +Python 3.5 -Python 3.1, Python 3.2, Python 3.3
STINNER Victor added the comment:
Serhiy Storchaka wrote:
> Here is a patch
I don't see your patch.
Serhiy Storchaka added the comment:
Oh, sorry.
--
keywords: +patch
Added file: http://bugs.python.org/file35257/surrogatepass_non_utf.patch
Martin v. Löwis added the comment:
LGTM
Roundup Robot added the comment:
New changeset 5e98a50e0f55 by Serhiy Storchaka in branch 'default':
Issue #13916: Disallowed the surrogatepass error handler for non UTF-*
http://hg.python.org/cpython/rev/5e98a50e0f55
--
nosy: +python-dev
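A minimal sketch of the behavior this changeset describes, assuming Python 3.5 or later. The helper name try_encode is illustrative only and is not part of the patch. With a UTF-* codec the handler still emits the surrogate; with a non-UTF codec the original UnicodeEncodeError is raised.

```python
# Assumes Python 3.5+, where surrogatepass is restricted to UTF-* codecs.
# try_encode is a hypothetical helper used only for this illustration.
def try_encode(text, encoding):
    """Return the encoded bytes, or the exception class name on failure."""
    try:
        return text.encode(encoding, "surrogatepass")
    except UnicodeEncodeError as exc:
        return type(exc).__name__


# One lone low surrogate, encoded with a UTF codec and two non-UTF codecs.
results = {enc: try_encode("\udc80", enc)
           for enc in ("utf-8", "latin-1", "ascii")}

for enc, outcome in results.items():
    print(enc, outcome)
```

Note that the failure is the codec's own UnicodeEncodeError, not a LookupError: the handler is still registered, it simply declines to handle encodings it does not know.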
Changes by Serhiy Storchaka storch...@gmail.com:
--
assignee: - serhiy.storchaka
resolution: - fixed
stage: - resolved
status: open - closed
STINNER Victor added the comment:
It makes sense to restrict surrogatepass to UTF-* encodings. UTF-8, UTF-16 and
UTF-32 encoders reject surrogate characters, but not UTF-7. Is it a bug? I'm
asking because PyCodec_SurrogatePassErrors() doesn't support UTF-7.
IMO your change is important enough
STINNER Victor added the comment:
Windows buildbots are unhappy.
http://buildbot.python.org/all/builders/x86%20Windows7%203.x/builds/8355/steps/test/logs/stdio
==
ERROR: test_surrogatepass_handler
Serhiy Storchaka added the comment:
Here is a patch, which adds support for cp65001 and fixes test_cp1252. Please
test it on Windows Vista.
Lone surrogates are not illegal in UTF-7 (see RFC 1642), so the error handler is
never called and explicit support for UTF-7 is not needed.
Could you please
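The difference between UTF-7 and the other UTF codecs can be checked with a short sketch, assuming Python 3.4 or later (where the UTF-16 and UTF-32 encoders also reject lone surrogates). The helper name strict_encode_ok is made up for this example.

```python
# Assumes Python 3.4+. strict_encode_ok is a hypothetical helper.
def strict_encode_ok(text, encoding):
    """Return True if text encodes with the default 'strict' handler."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


lone = "\udc80"  # a lone low surrogate
accepts = {enc: strict_encode_ok(lone, enc)
           for enc in ("utf-7", "utf-8", "utf-16", "utf-32")}

for enc, ok in accepts.items():
    print(enc, "accepts" if ok else "rejects", "a lone surrogate")
```

Because the UTF-7 encoder never raises here, no error handler (surrogatepass or otherwise) is ever consulted for it.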
STINNER Victor added the comment:
> Here is a patch, which adds support for cp65001
The name of the encoding is cp65001, not something like cp-utf8.
And there is no alias like cp_65001, there is only cp65001.
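Whether a given string is a usable Python codec name can be checked with codecs.lookup(), which raises LookupError for unknown names. A small sketch follows; is_known_codec is a hypothetical helper, and whether "cp65001" itself resolves depends on the Python version and platform, so it is printed but not assumed.

```python
import codecs


# is_known_codec is a hypothetical helper, not part of any patch here.
def is_known_codec(name):
    """Return True if the codec registry can resolve this encoding name."""
    try:
        codecs.lookup(name)
    except LookupError:
        return False
    return True


# CP_UTF8 is a Windows constant, not a Python codec name.
for name in ("utf-8", "CP_UTF8", "cp65001"):
    print(name, is_known_codec(name))
```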
Serhiy Storchaka added the comment:
But an exception reports about CP_UTF8.
--
title: disallow the surrogatepass handler for non utf-* encodings -
disallow the surrogatepass handler for non utf-* encodings
Martin v. Löwis mar...@v.loewis.de added the comment:
I fail to see the problem. If the error handler does not produce meaningful
results in some context, then just don't use it.
The whole point of error handlers is that they handle errors; using them
shouldn't ever cause errors/exceptions.
Serhiy Storchaka storch...@gmail.com added the comment:
The problem is that surrogatepass is specific to UTF-8, and there is no standard
way to decode lone surrogates in UTF-16.
>>> '\udc80\udc80'.encode('utf-16', 'surrogatepass').decode('utf-16', 'surrogatepass')
Traceback (most recent call last):
File
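For comparison, a sketch of the same round trip on a later interpreter, assuming Python 3.4 or newer, where (per the issue12892 comment elsewhere in this thread) surrogatepass also works with the UTF-16* codecs. On the version under discussion in this message the UTF-16 case fails with the traceback shown above.

```python
# Assumes Python 3.4+, where the UTF-16 codecs honor surrogatepass.
text = "\udc80\udc80"  # two lone low surrogates

for encoding in ("utf-8", "utf-16-le", "utf-16"):
    data = text.encode(encoding, "surrogatepass")
    restored = data.decode(encoding, "surrogatepass")
    # Each surrogate is written as an ordinary code unit and read back.
    print(encoding, data, restored == text)
```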
Martin v. Löwis mar...@v.loewis.de added the comment:
I see. The proper reaction for a codec that can't handle a certain error then
is to raise the original exception. I'm -1 on raising LookupError when trying
to find the error handler - this would suggest that the error handler does not
Changes by Serhiy Storchaka storch...@gmail.com:
--
nosy: +storchaka
New submission from Kang-Hao (Kenny) Lu kennyl...@csail.mit.edu:
Currently the surrogatepass handler always encodes the surrogates in UTF-8,
and hence the behavior of, say, '\udc80'.encode('latin-1',
'surrogatepass').decode('latin-1') might be unexpected, and I don't even know
what would, say,
Changes by STINNER Victor victor.stin...@haypocalc.com:
--
nosy: +haypo