[issue24214] UTF-8 incremental decoder doesn't support surrogatepass correctly

2019-07-02 Thread Ned Deily


Ned Deily  added the comment:


New changeset 30c2ae4dcfd19acbdfb7045151c73d5700eec7b4 by Ned Deily (Miss 
Islington (bot)) in branch '3.7':
[3.7] bpo-24214: Fixed the UTF-8 and UTF-16 incremental decoders. (GH-14304) 
(GH-14369)
https://github.com/python/cpython/commit/30c2ae4dcfd19acbdfb7045151c73d5700eec7b4


--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue24214] UTF-8 incremental decoder doesn't support surrogatepass correctly

2019-06-28 Thread Ned Deily


Ned Deily  added the comment:

@xtreak, if you set the priority to "release blocker", the appropriate release
managers will be nosied (added to the issue's nosy list) automatically. I'd
rather see too many potential "release blockers" than too few.


[issue24214] UTF-8 incremental decoder doesn't support surrogatepass correctly

2019-06-28 Thread Karthikeyan Singaravelan


Karthikeyan Singaravelan  added the comment:

Thanks, Ned, for the details. In future reports of regressions, I will also try
to add the respective release manager to the issue.


[issue24214] UTF-8 incremental decoder doesn't support surrogatepass correctly

2019-06-28 Thread Ned Deily


Ned Deily  added the comment:

"The only changes allowed to occur in a maintenance branch without debate are 
bug fixes. Also, a general rule for maintenance branches is that compatibility 
must not be broken at any point between sibling minor releases (3.5.1, 3.5.2, 
etc.). For both rules, only rare exceptions are accepted and must be discussed 
first."

https://devguide.python.org/devcycle/#maintenance-branches

The principle here is that we "promise" users that they can upgrade from any
version of 3.n.x to the latest version of 3.n without needing to make any
changes, except in very rare cases when there is an overriding concern, which
must be well documented in the release materials. We're human, so we sometimes
slip up and inadvertently break that promise, but that's the goal. And it's
because of that promise that we can take the approach of immediately obsoleting
older micro releases when a new micro release occurs, i.e. we don't provide
fixes for 3.7.2 once 3.7.3 is released.

So, from my perspective, pretty much *any* regression between micro releases is 
a release blocker but especially in a case like this where it can be addressed 
before a final release.  That's basically why we do release candidates :)


[issue24214] UTF-8 incremental decoder doesn't support surrogatepass correctly

2019-06-28 Thread Karthikeyan Singaravelan


Karthikeyan Singaravelan  added the comment:

Thanks, Ned. Is there a general policy on marking regressions as release
blockers? For example, if a regression was introduced in 3.7.3, does it
automatically act as a release blocker for 3.7.4, or does it depend on severity?


[issue24214] UTF-8 incremental decoder doesn't support surrogatepass correctly

2019-06-28 Thread Ned Deily


Ned Deily  added the comment:

Since the latest fix (GH-14369) addresses a regression between 3.7.3 and 
3.7.4rc1, I am going to cherry-pick it into 3.7.4 final.  (In retrospect, this 
regression should have been marked as a "release blocker".)

--
nosy: +ned.deily


[issue24214] UTF-8 incremental decoder doesn't support surrogatepass correctly

2019-06-25 Thread STINNER Victor


STINNER Victor  added the comment:

Thanks, Karthikeyan Singaravelan, for the bug report about the regression.
You're right that I misunderstood it. Thanks, Serhiy, for the second fix. The
regression should now be fixed as well, so I am closing the issue again.

--
resolution:  -> fixed
stage: patch review -> resolved
status: open -> closed


[issue24214] UTF-8 incremental decoder doesn't support surrogatepass correctly

2019-06-25 Thread STINNER Victor


STINNER Victor  added the comment:


New changeset c755ca89c75252a7aae9beae82fd47787a76b9e2 by Victor Stinner (Miss 
Islington (bot)) in branch '3.7':
[3.7] bpo-24214: Fixed the UTF-8 and UTF-16 incremental decoders. (GH-14304) 
(GH-14369)
https://github.com/python/cpython/commit/c755ca89c75252a7aae9beae82fd47787a76b9e2



[issue24214] UTF-8 incremental decoder doesn't support surrogatepass correctly

2019-06-25 Thread miss-islington


miss-islington  added the comment:


New changeset d32594ad27f48a898d42a0ea30b9d007b1c57de9 by Miss Islington (bot) 
in branch '3.8':
bpo-24214: Fixed the UTF-8 and UTF-16 incremental decoders. (GH-14304)
https://github.com/python/cpython/commit/d32594ad27f48a898d42a0ea30b9d007b1c57de9


--
nosy: +miss-islington


[issue24214] UTF-8 incremental decoder doesn't support surrogatepass correctly

2019-06-25 Thread miss-islington


Change by miss-islington :


--
pull_requests: +14186
pull_request: https://github.com/python/cpython/pull/14369


[issue24214] UTF-8 incremental decoder doesn't support surrogatepass correctly

2019-06-25 Thread miss-islington


Change by miss-islington :


--
pull_requests: +14185
pull_request: https://github.com/python/cpython/pull/14368


[issue24214] UTF-8 incremental decoder doesn't support surrogatepass correctly

2019-06-25 Thread Serhiy Storchaka


Serhiy Storchaka  added the comment:


New changeset 894263ba80af4b7733c2df95b527e96953922656 by Serhiy Storchaka in 
branch 'master':
bpo-24214: Fixed the UTF-8 and UTF-16 incremental decoders. (GH-14304)
https://github.com/python/cpython/commit/894263ba80af4b7733c2df95b527e96953922656



[issue24214] UTF-8 incremental decoder doesn't support surrogatepass correctly

2019-06-22 Thread Serhiy Storchaka


Serhiy Storchaka  added the comment:

Victor, I think you misunderstood the issue. The problem is not that a decoding
error is raised. The problem is that the incremental decoder no longer raises
where it raised before.

I think that both behaviors may be correct, and that it is better not to rely on
the ability of the incremental decoder with final=False to detect invalid
encoded data, but I see now that it is possible to fix the original issue more
carefully, without changing that behavior. PR 14304 does this.

It also changes the UTF-16 incremental decoder with the surrogatepass error
handler to return non-empty data when decoding a low surrogate with
final=False. This is not necessary, but it makes all UTF-* decoders consistent
and makes the tests simpler.
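Serhiy's advice not to rely on final=False for error detection can be followed with a minimal sketch like the one below, assuming the caller knows where the stream ends. Chunks are fed with final=False, and the decoder is always flushed with decode(b"", final=True), so invalid or truncated input is reported whichever of the two behaviors a given release has. The helper name `decode_chunks` is hypothetical, not part of the codecs module.

```python
import codecs


def decode_chunks(chunks, encoding="utf-8", errors="strict"):
    """Decode an iterable of byte chunks with an incremental decoder.

    Every chunk is passed with final=False; the closing call with
    final=True makes the decoder report any bytes still buffered, so
    truncated or invalid input cannot slip through at end of stream.
    """
    decoder = codecs.getincrementaldecoder(encoding)(errors)
    parts = [decoder.decode(chunk, final=False) for chunk in chunks]
    parts.append(decoder.decode(b"", final=True))
    return "".join(parts)


# A two-byte character split across chunks is reassembled.
text = decode_chunks([b"f\xc3", b"\xb6rd"])  # text == "f\u00f6rd"

# Invalid UTF-8 raises, whether the error surfaces mid-stream or at the flush.
try:
    decode_chunks([b"f\xf1\xf6rd"])
except UnicodeDecodeError as exc:
    print("rejected:", exc.reason)
```

The design choice is simply that the end-of-stream flush is never skipped; it costs one extra call and removes any dependence on when exactly the decoder chooses to raise.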

--
resolution: fixed -> 
stage: resolved -> patch review
status: closed -> open
versions: +Python 3.9


[issue24214] UTF-8 incremental decoder doesn't support surrogatepass correctly

2019-06-22 Thread Serhiy Storchaka


Change by Serhiy Storchaka :


--
pull_requests: +14128
pull_request: https://github.com/python/cpython/pull/14304


[issue24214] UTF-8 incremental decoder doesn't support surrogatepass correctly

2019-06-21 Thread STINNER Victor


STINNER Victor  added the comment:

> UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf1 in position 1: 
> invalid continuation byte

Python is right: b'f\xf1\xf6rd' is not a valid UTF-8 string:

$ python3
Python 3.7.3 (default, May 11 2019, 00:38:04)
>>> b'f\xf1\xf6rd'.decode()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf1 in position 1: invalid continuation byte

This change is deliberate: it makes the UTF-8 incremental decoder correct (it
respects the UTF-8 standard). I am closing the issue.
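Victor's diagnosis can be checked programmatically: the UnicodeDecodeError raised by a one-shot decode records where the invalid sequence starts and ends, and why it was rejected. A small sketch; `first_utf8_error` is a hypothetical helper name used only for illustration.

```python
def first_utf8_error(data: bytes):
    """Return (start, end, reason) for the first UTF-8 error, or None if valid."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.start and exc.end index into the original bytes object.
        return exc.start, exc.end, exc.reason
    return None


print(first_utf8_error(b"f\xf1\xf6rd"))  # (1, 2, 'invalid continuation byte')
print(first_utf8_error(b"f\xc3\xb6rd"))  # None
```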

--
resolution:  -> fixed
status: open -> closed


[issue24214] UTF-8 incremental decoder doesn't support surrogatepass correctly

2019-06-21 Thread Roufique Hossain


Change by Roufique Hossain :


--
nosy: +roufique7


[issue24214] UTF-8 incremental decoder doesn't support surrogatepass correctly

2019-06-20 Thread STINNER Victor


Change by STINNER Victor :


--
resolution: fixed -> 
status: closed -> open


[issue24214] UTF-8 incremental decoder doesn't support surrogatepass correctly

2019-06-20 Thread Karthikeyan Singaravelan

Karthikeyan Singaravelan  added the comment:

This change seems to have caused the test failure reported in
https://github.com/python-hyper/wsproto/issues/126


from codecs import getincrementaldecoder
decoder = getincrementaldecoder("utf-8")()
print(decoder.decode(b'f\xf1\xf6rd', False))

With commit 7a465cb5ee:

➜  cpython git:(7a465cb5ee) ./python.exe /tmp/foo.py
f

Before 7a465cb5ee:

➜  cpython git:(38f4e468d4) ./python.exe /tmp/foo.py
Traceback (most recent call last):
  File "/tmp/foo.py", line 3, in <module>
    print(decoder.decode(b'f\xf1\xf6rd', False))
  File "/Users/karthikeyansingaravelan/stuff/python/cpython/Lib/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf1 in position 1: invalid continuation byte
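To see where the error surfaces in a fixed decoder, the same bytes can be fed one at a time. This sketch assumes a release that contains the fixes from this thread (3.7.4 or later, or 3.8+): \xf1 on its own is a valid lead byte of a four-byte sequence and should be buffered, while the following \xf6 is not a continuation byte, so the decoder should raise at that point even with final=False. The helper `feed_bytewise` is hypothetical.

```python
from codecs import getincrementaldecoder


def feed_bytewise(data: bytes):
    """Feed data one byte at a time with final=False.

    Returns the decoded pieces and the index of the byte whose arrival
    triggered UnicodeDecodeError, or None if the whole input was valid.
    """
    decoder = getincrementaldecoder("utf-8")()
    pieces = []
    for i in range(len(data)):
        try:
            pieces.append(decoder.decode(data[i:i + 1], False))
        except UnicodeDecodeError:
            return pieces, i
    # Flush anything still buffered once the input is exhausted.
    pieces.append(decoder.decode(b"", True))
    return pieces, None


print(feed_bytewise(b"f\xf1\xf6rd"))  # (['f', ''], 2)
```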

--
nosy: +xtreak


[issue24214] UTF-8 incremental decoder doesn't support surrogatepass correctly

2019-03-30 Thread Serhiy Storchaka


Change by Serhiy Storchaka :


--
resolution:  -> fixed
stage: patch review -> resolved
status: open -> closed


[issue24214] UTF-8 incremental decoder doesn't support surrogatepass correctly

2019-03-30 Thread Serhiy Storchaka


Serhiy Storchaka  added the comment:


New changeset bd48280cb66544827952ca91e326cbb178c8c461 by Serhiy Storchaka 
(Miss Islington (bot)) in branch '3.7':
bpo-24214: Fixed the UTF-8 incremental decoder. (GH-12603) (GH-12627)
https://github.com/python/cpython/commit/bd48280cb66544827952ca91e326cbb178c8c461



[issue24214] UTF-8 incremental decoder doesn't support surrogatepass correctly

2019-03-30 Thread miss-islington


Change by miss-islington :


--
pull_requests: +12560


[issue24214] UTF-8 incremental decoder doesn't support surrogatepass correctly

2019-03-30 Thread Serhiy Storchaka


Serhiy Storchaka  added the comment:


New changeset 7a465cb5ee7e298cae626ace1fc3e7d97df79f2e by Serhiy Storchaka in 
branch 'master':
bpo-24214: Fixed the UTF-8 incremental decoder. (GH-12603)
https://github.com/python/cpython/commit/7a465cb5ee7e298cae626ace1fc3e7d97df79f2e



[issue24214] UTF-8 incremental decoder doesn't support surrogatepass correctly

2019-03-28 Thread Serhiy Storchaka


Serhiy Storchaka  added the comment:

PR 12603 fixes this issue in a more general way and does not affect performance.


[issue24214] UTF-8 incremental decoder doesn't support surrogatepass correctly

2019-03-28 Thread Serhiy Storchaka


Change by Serhiy Storchaka :


--
versions: +Python 3.7 -Python 3.9


[issue24214] UTF-8 incremental decoder doesn't support surrogatepass correctly

2019-03-28 Thread Serhiy Storchaka


Change by Serhiy Storchaka :


--
pull_requests: +12543


[issue24214] UTF-8 incremental decoder doesn't support surrogatepass correctly

2019-03-28 Thread Serhiy Storchaka


Change by Serhiy Storchaka :


--
versions: +Python 3.8, Python 3.9 -Python 3.5, Python 3.6


[issue24214] UTF-8 incremental decoder doesn't support surrogatepass correctly

2019-03-28 Thread Serhiy Storchaka


Change by Serhiy Storchaka :


--
assignee:  -> serhiy.storchaka


[issue24214] UTF-8 incremental decoder doesn't support surrogatepass correctly

2019-03-28 Thread Inada Naoki


Change by Inada Naoki :


--
nosy: +inada.naoki


[issue24214] UTF-8 incremental decoder doesn't support surrogatepass correctly

2016-08-02 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

The patch slows down decoding by up to 20%.

$ ./python -m timeit -s 'b = b"\xc4\x80"*1' -- 'b.decode()'
Unpatched:  1 loops, best of 3: 50.8 usec per loop
Patched:    1 loops, best of 3: 63.3 usec per loop

And I'm not sure that fixing this only for the surrogatepass handler is enough.
Other standard error handlers appear to work, but what if a user-defined handler
consumes more than one byte?
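The concern about user handlers can be illustrated with codecs.register_error: a decode error handler returns a (replacement, position) tuple, and the position may lie beyond exc.end, so a single call can consume several bytes. A minimal sketch, assuming a policy of replacing a whole run of non-ASCII bytes with one "?"; the function `skip_non_ascii_run` and the handler name "skiprun" are hypothetical. Inside an incremental decoder with final=False, such a handler only sees the bytes passed so far, which is the interaction questioned above.

```python
import codecs


def skip_non_ascii_run(exc):
    """Replace the failing byte and every non-ASCII byte after it with '?'."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    data, pos = exc.object, exc.start
    # Advance past the whole run of bytes >= 0x80, not just up to exc.end.
    while pos < len(data) and data[pos] >= 0x80:
        pos += 1
    # Returning a position beyond exc.end tells the codec to resume there.
    return "?", pos


codecs.register_error("skiprun", skip_non_ascii_run)

print(b"f\xf1\xf6rd".decode("utf-8", "skiprun"))  # f?rd
```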

--
components: +Interpreter Core
priority: normal -> high


[issue24214] UTF-8 incremental decoder doesn't support surrogatepass correctly

2016-07-27 Thread Serhiy Storchaka

Changes by Serhiy Storchaka :


--
nosy: +serhiy.storchaka
stage:  -> patch review
versions: +Python 3.5 -Python 3.4


[issue24214] UTF-8 incremental decoder doesn't support surrogatepass correctly

2016-07-27 Thread STINNER Victor

Changes by STINNER Victor :


--
title: Exception with utf-8, surrogatepass and incremental decoding -> UTF-8 incremental decoder doesn't support surrogatepass correctly
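The behavior named in the new title can be exercised directly. A sketch, assuming a release that includes the fixes from this issue (3.7.4 or later, or 3.8+): U+D901 is a lone surrogate, which the surrogatepass handler lets UTF-8 encode as the three bytes b'\xed\xa4\x81'. Feeding those bytes to the incremental decoder one at a time should buffer the first two and yield the surrogate once the third arrives, instead of raising.

```python
import codecs

# U+D901 is a lone (high) surrogate; "surrogatepass" lets UTF-8 encode it.
data = "\ud901".encode("utf-8", "surrogatepass")
print(data)  # b'\xed\xa4\x81'

# Feed the three bytes one at a time, then flush with final=True.
decoder = codecs.getincrementaldecoder("utf-8")("surrogatepass")
pieces = [decoder.decode(data[i:i + 1]) for i in range(len(data))]
pieces.append(decoder.decode(b"", final=True))

# ascii() is used because printing a lone surrogate would fail to encode.
print(ascii("".join(pieces)))  # '\ud901'
```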
