[issue36311] Flaw in Windows code page decoder for large input

2019-09-09 Thread Steve Dower
Steve Dower added the comment: Declaring this out-of-scope for 2.7, unless someone wants to insist (and provide a PR).
--
resolution: -> fixed
stage: backport needed -> resolved
status: open -> closed
versions: -Python 2.7

2019-08-21 Thread miss-islington
miss-islington added the comment: New changeset 735a960ac98cf414caf910565220ab2761fa542a by Miss Islington (bot) in branch '3.7': bpo-36311: Fixes decoding multibyte characters around chunk boundaries and improves decoding performance (GH-15083)

2019-08-21 Thread miss-islington
miss-islington added the comment: New changeset f93c15aedc2ea2cb8b56fc9dbb0d412918992e86 by Miss Islington (bot) in branch '3.8': bpo-36311: Fixes decoding multibyte characters around chunk boundaries and improves decoding performance (GH-15083)

2019-08-21 Thread Steve Dower
Steve Dower added the comment: I'll get the 3.7 and 3.8 backports merged - looks like they're trivial. Going to need some help with the 2.7 backport, but I'm happy to approve a PR.
--
stage: patch review -> backport needed

2019-08-21 Thread miss-islington
Change by miss-islington:
--
pull_requests: +15086
pull_request: https://github.com/python/cpython/pull/15375

2019-08-21 Thread miss-islington
Change by miss-islington:
--
pull_requests: +15085
pull_request: https://github.com/python/cpython/pull/15374

2019-08-21 Thread Steve Dower
Steve Dower added the comment: New changeset 7ebdda0dbee7df6f0c945a7e1e623e47676e112d by Steve Dower in branch 'master': bpo-36311: Fixes decoding multibyte characters around chunk boundaries and improves decoding performance (GH-15083)

2019-08-02 Thread Steve Dower
Change by Steve Dower:
--
keywords: +patch
pull_requests: +14828
stage: test needed -> patch review
pull_request: https://github.com/python/cpython/pull/15083

2019-08-02 Thread Steve Dower
Change by Steve Dower:
--
assignee: -> steve.dower
versions: +Python 3.9

2019-08-02 Thread Steve Dower
Steve Dower added the comment: If we reduce our chunk size below INT_MAX, then we avoid the issue entirely. Our logic for hitting the middle of a multibyte character is fine (perhaps fixed since this issue was opened?); there's just a weird edge case at 2 GiB in the API call. As a bonus,
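The chunking described in this comment can be sketched in Python. This is a minimal illustration, not the code merged in GH-15083: `CHUNK_SIZE = INT_MAX // 4` and the helper `chunk_spans()` are hypothetical choices for the example. It only shows the arithmetic of keeping every piece passed to an int-sized length parameter (such as the input size of `MultiByteToWideChar()`) below `INT_MAX`; it does not reproduce the 2 GiB edge case itself.

```python
INT_MAX = 2**31 - 1  # largest value of a 32-bit C int

# Hypothetical chunk size for illustration; the thread only says "below INT_MAX".
CHUNK_SIZE = INT_MAX // 4


def chunk_spans(total_len: int, chunk_size: int = CHUNK_SIZE) -> list:
    """Return (start, end) byte offsets covering total_len in pieces of at most chunk_size."""
    if not 0 < chunk_size < INT_MAX:
        raise ValueError("chunk_size must be positive and below INT_MAX")
    return [
        (start, min(start + chunk_size, total_len))
        for start in range(0, total_len, chunk_size)
    ]


spans = chunk_spans(5 * 2**30)  # a 5 GiB input, described by offsets only
print(len(spans))               # 11
print(spans[-1])                # (5368709110, 5368709120)
```

Only offsets are computed, so no 5 GiB buffer is allocated; each span could then be handed to one API call.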

2019-03-22 Thread Terry J. Reedy
Terry J. Reedy added the comment: I have 24 GB (if all working) and would be willing to try to run a test case.
--
nosy: +terry.reedy
stage: -> test needed

2019-03-16 Thread Serhiy Storchaka
New submission from Serhiy Storchaka: There is a flaw in PyUnicode_DecodeCodePageStateful() (exposed as _codecs.code_page_decode() at the Python level). Since MultiByteToWideChar() takes the size of the input as a C int, it cannot be used to decode more than 2 GiB. Large input is split on
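When large input is split into chunks, a chunk boundary can fall inside a multibyte character; the merged fix is titled "Fixes decoding multibyte characters around chunk boundaries". The sketch below illustrates that failure mode under stated assumptions: `_codecs.code_page_decode()` is available only on Windows, so UTF-8's incremental decoder from the standard `codecs` module stands in for a Windows code page, and the helpers `decode_in_chunks()` and `naive_decode_in_chunks()` are hypothetical. The stateful version carries an incomplete byte sequence from one chunk into the next; the naive version decodes each chunk on its own and corrupts any character that straddles a boundary.

```python
import codecs


def decode_in_chunks(data: bytes, chunk_size: int, encoding: str = "utf-8") -> str:
    """Decode `data` chunk by chunk, keeping decoder state between chunks (hypothetical helper)."""
    decoder = codecs.getincrementaldecoder(encoding)(errors="strict")
    parts = []
    for start in range(0, len(data), chunk_size):
        # final=False: an incomplete multibyte sequence at the end of this
        # chunk is buffered and completed by the next chunk's bytes.
        parts.append(decoder.decode(data[start:start + chunk_size], final=False))
    # final=True flushes the buffer and raises if the input ends mid-character.
    parts.append(decoder.decode(b"", final=True))
    return "".join(parts)


def naive_decode_in_chunks(data: bytes, chunk_size: int, encoding: str = "utf-8") -> str:
    """Decode every chunk independently, losing characters split across chunks."""
    return "".join(
        data[start:start + chunk_size].decode(encoding, errors="replace")
        for start in range(0, len(data), chunk_size)
    )


text = "héllo wörld " * 3
data = text.encode("utf-8")
print(decode_in_chunks(data, 2) == text)        # True
print(naive_decode_in_chunks(data, 2) == text)  # False: U+FFFD replaces the split bytes
```

The sketch models only the boundary problem in general; it does not reproduce how CPython's Windows code page decoder handles partial characters internally.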