[issue39337] codecs.lookup() ignores non-ASCII characters, whereas encodings.normalize_encoding() copies them

2020-10-14 Thread hai shi


hai shi  added the comment:

Thanks for everyone's continuous review :)

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue39337] codecs.lookup() ignores non-ASCII characters, whereas encodings.normalize_encoding() copies them

2020-10-14 Thread STINNER Victor


STINNER Victor  added the comment:

Thanks, Hai Shi, for the change and for the new codecs and encodings tests.

--
resolution:  -> fixed
stage: patch review -> resolved
status: open -> closed
versions: +Python 3.10 -Python 3.9




[issue39337] codecs.lookup() ignores non-ASCII characters, whereas encodings.normalize_encoding() copies them

2020-10-14 Thread STINNER Victor


STINNER Victor  added the comment:


New changeset c5b049b91ca50c615f9a5425055c2b79a82ac547 by Hai Shi in branch 
'master':
bpo-39337: encodings.normalize_encoding() now ignores non-ASCII characters 
(GH-22219)
https://github.com/python/cpython/commit/c5b049b91ca50c615f9a5425055c2b79a82ac547


--




[issue39337] codecs.lookup() ignores non-ASCII characters, whereas encodings.normalize_encoding() copies them

2020-10-08 Thread STINNER Victor


STINNER Victor  added the comment:


New changeset 3f342376ab0da3b4c8a38a27edfafba8e8cdf52d by Hai Shi in branch 
'master':
bpo-39337: Add a test case for normalizing of codec names (GH-19069)
https://github.com/python/cpython/commit/3f342376ab0da3b4c8a38a27edfafba8e8cdf52d


--




[issue39337] codecs.lookup() ignores non-ASCII characters, whereas encodings.normalize_encoding() copies them

2020-10-01 Thread STINNER Victor


Change by STINNER Victor :


--
pull_requests: +21499
pull_request: https://github.com/python/cpython/pull/17997




[issue39337] codecs.lookup() ignores non-ASCII characters, whereas encodings.normalize_encoding() copies them

2020-10-01 Thread STINNER Victor


STINNER Victor  added the comment:

See also bpo-37751 and my PR 17997.

--




[issue39337] codecs.lookup() ignores non-ASCII characters, whereas encodings.normalize_encoding() copies them

2020-09-22 Thread hai shi


Change by hai shi :


--
pull_requests: +21401
pull_request: https://github.com/python/cpython/pull/22360




[issue39337] codecs.lookup() ignores non-ASCII characters, whereas encodings.normalize_encoding() copies them

2020-09-12 Thread hai shi


Change by hai shi :


--
pull_requests: +21274
pull_request: https://github.com/python/cpython/pull/22219




[issue39337] codecs.lookup() ignores non-ASCII characters, whereas encodings.normalize_encoding() copies them

2020-03-19 Thread hai shi


Change by hai shi :


--
pull_requests: +18424
pull_request: https://github.com/python/cpython/pull/19069




[issue39337] codecs.lookup() ignores non-ASCII characters, whereas encodings.normalize_encoding() copies them

2020-03-17 Thread hai shi


hai shi  added the comment:

> Use _testinternalcapi in this case.
Oh, forgive me. I had not noticed this extension module before :(

--




[issue39337] codecs.lookup() ignores non-ASCII characters, whereas encodings.normalize_encoding() copies them

2020-03-17 Thread STINNER Victor


STINNER Victor  added the comment:

> _testcapi must test the public Python C API, not CPython internal C API

Use _testinternalcapi in this case.

--




[issue39337] codecs.lookup() ignores non-ASCII characters, whereas encodings.normalize_encoding() copies them

2020-03-17 Thread hai shi


hai shi  added the comment:

> Maybe we should just add a private function for test in _testcapi.

Oh, there is a problem with this idea:
`struct _is` is defined in internal/pycore_pystate.h, and
_testcapi must test the public Python C API, not the CPython internal C API.

--




[issue39337] codecs.lookup() ignores non-ASCII characters, whereas encodings.normalize_encoding() copies them

2020-03-14 Thread hai shi


Change by hai shi :


--
pull_requests: +18344
pull_request: https://github.com/python/cpython/pull/18987




[issue39337] codecs.lookup() ignores non-ASCII characters, whereas encodings.normalize_encoding() copies them

2020-03-14 Thread hai shi


hai shi  added the comment:

> How about calling `encodings.normalize_encoding()` in
> `codecs.normalizestring()` to keep the same behavior? (I created PR 18845.)

I tried this idea, but it makes test cases in test_io.py fail: some objects
call `codecs.lookup()` in `__del__()`, and the extension module is cleaned up
before `__del__()` is called.

> I would prefer that codecs.lookup() and encodings.normalize_encoding() behave 
> the same. Either always ignore or always copy.

I tried adding a `_Py_normalize_unicode_encoding()` function in unicodeobject.c
to support normalization of non-ASCII encoding names (PR 18987), but this PR
caused many test cases to fail.

For example:

In master:
python3.9 -c "print('a\xac\u1234\u20ac\u8000\U0010'.encode('iso-8859-15', 'namereplace'))"
result:
b'a\xac\\N{ETHIOPIC SYLLABLE SEE}\xa4\\N{CJK UNIFIED IDEOGRAPH-8000}\\U0010'

After PR 18987:
./python -c "print('a\xac\u1234\u20ac\u8000\U0010'.encode('iso-8859-15', 'namereplace'))"
result:
b'a\xac\\N{ETHIOPIC SYLLABLE SEE}\\N{EURO SIGN}\\N{CJK UNIFIED IDEOGRAPH-8000}\\U0010'
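
The two results above differ only in how the euro sign (U+20AC) is encoded.
As a small illustration, the sketch below encodes the same two characters with
ISO-8859-15, which has a byte for the euro sign, and with Latin-1, which does
not. That the PR made the name resolve to a Latin-1-like code path is only a
guess from the output; the thread does not say so.

```python
# NOT SIGN (U+00AC) has the same byte in both charsets; EURO SIGN (U+20AC)
# exists only in ISO-8859-15, at byte 0xA4.
text = "\xac\u20ac"

as_8859_15 = text.encode("iso-8859-15", "namereplace")
as_latin_1 = text.encode("latin-1", "namereplace")

print(as_8859_15)  # b'\xac\xa4'
print(as_latin_1)  # b'\xac\\N{EURO SIGN}'
```

The second output has the same shape as the "after PR 18987" result: the euro
sign falls through to the namereplace error handler instead of being encoded
as a single byte.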

--




[issue39337] codecs.lookup() ignores non-ASCII characters, whereas encodings.normalize_encoding() copies them

2020-03-08 Thread hai shi


hai shi  added the comment:

> I would prefer that codecs.lookup() and encodings.normalize_encoding() behave 
> the same. Either always ignore or always copy.

How about calling `encodings.normalize_encoding()` in `codecs.normalizestring()`
to keep the same behavior? (I created PR 18845.)

> Maybe we should just add a private function for test in _testcapi

I can try to add some test cases next weekend ;)

--




[issue39337] codecs.lookup() ignores non-ASCII characters, whereas encodings.normalize_encoding() copies them

2020-03-08 Thread hai shi


Change by hai shi :


--
keywords: +patch
nosy: +shihai1991
nosy_count: 3.0 -> 4.0
pull_requests: +18202
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/18845




[issue39337] codecs.lookup() ignores non-ASCII characters, whereas encodings.normalize_encoding() copies them

2020-01-14 Thread STINNER Victor

New submission from STINNER Victor :

bpo-37751 changed codecs.lookup() in a subtle way: non-ASCII characters are now 
ignored, whereas they were copied unmodified previously.

I would prefer that codecs.lookup() and encodings.normalize_encoding() behave 
the same. Either always ignore or always copy.

Moreover, there seems to be no test of how encoding names are normalized for
codecs.register(). I recall that using codecs.register() in a unit test causes
trouble, since there is no API to unregister a search function. Maybe we
should just add a private function for test in _testcapi.
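
For illustration only, here is a minimal sketch of how a search function can
record the normalized name that codecs.lookup() passes to it. The helper name
record_normalized_name and the codec name "Hypothetical-Test Codec" are made up
for this example. It relies on codecs.unregister(), which was added in Python
3.10, after this report; on older versions the search function simply stays
registered, which is the problem described above.

```python
import codecs


def record_normalized_name(raw_name: str) -> str:
    """Return the name that codecs.lookup() hands to a search function."""
    seen = []

    def search(name):
        seen.append(name)
        return None  # decline, so lookup() ends with LookupError

    codecs.register(search)
    try:
        try:
            codecs.lookup(raw_name)
        except LookupError:
            pass  # expected: no codec has this made-up name
    finally:
        # codecs.unregister() exists only on Python 3.10+ (assumption checked
        # at run time); without it the search function cannot be removed.
        unregister = getattr(codecs, "unregister", None)
        if unregister is not None:
            unregister(search)
    return seen[0]


normalized = record_normalized_name("Hypothetical-Test Codec")
print(normalized)
```

The exact normalized spelling depends on the Python version, so no expected
output is given here.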

Serhiy Storchaka wrote an example on my PR:
https://github.com/python/cpython/pull/17997/files

> There are other differences. For example, normalize_encoding("КОИ-8") returns 
> "кои_8", but codecs.lookup normalizes it to "8".

> The comment in the sources is also not correct.
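
The difference between "copy" and "ignore" can be sketched in pure Python. The
function below is a simplified model written for this note, not the code of
encodings.normalize_encoding() or codecs.lookup(); the name normalize and the
keep_non_ascii flag are assumptions. It lowercases its result only so that the
output matches the quoted example, and makes no claim about how the real
functions handle case.

```python
def normalize(name: str, keep_non_ascii: bool) -> str:
    """Collapse runs of separators to '_' and optionally drop non-ASCII."""
    out = []
    pending_sep = False
    for ch in name:
        keep = (ch.isalnum() or ch == ".") and (keep_non_ascii or ch.isascii())
        if keep:
            # Emit one '_' for any run of skipped characters, but never
            # at the very start of the result.
            if pending_sep and out:
                out.append("_")
            out.append(ch.lower())
            pending_sep = False
        else:
            pending_sep = True
    return "".join(out)


print(normalize("КОИ-8", keep_non_ascii=True))   # кои_8  ("copy" policy)
print(normalize("КОИ-8", keep_non_ascii=False))  # 8      ("ignore" policy)
```

For pure-ASCII names such as "UTF-8" both policies give the same result in
this sketch, which is why the discrepancy only shows up with non-ASCII input.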

--
components: Library (Lib)
messages: 360004
nosy: lemburg, serhiy.storchaka, vstinner
priority: normal
severity: normal
status: open
title: codecs.lookup() ignores non-ASCII characters, whereas 
encodings.normalize_encoding() copies them
versions: Python 3.9
