[issue17381] IGNORECASE breaks unicode literal range matching

2014-10-31 Thread Roundup Robot
Roundup Robot added the comment: New changeset 6f52a3d0f548 by Serhiy Storchaka in branch 'default': Issue #17381: Fixed handling of case-insensitive ranges in regular expressions. https://hg.python.org/cpython/rev/6f52a3d0f548 New changeset 7981cb1556cf by Serhiy Storchaka in branch '3.4':

[issue17381] IGNORECASE breaks unicode literal range matching

2014-10-31 Thread Roundup Robot
Roundup Robot added the comment: New changeset ebd48b4f650d by Serhiy Storchaka in branch '2.7': Backported the optimization of compiling charsets in regular expressions https://hg.python.org/cpython/rev/ebd48b4f650d New changeset 6cd4b9827755 by Serhiy Storchaka in branch '2.7': Issue #17381:

[issue17381] IGNORECASE breaks unicode literal range matching

2014-10-31 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: Thank you, Antoine, for your review. -- resolution: -> fixed stage: patch review -> resolved status: open -> closed

[issue17381] IGNORECASE breaks unicode literal range matching

2014-10-24 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: Does the patch look good to you now, Antoine? If there are no objections, I'm going to commit it soon. To apply the 3.4 patch to 2.7, we need either to modify the patch significantly or to first backport the issue19329 changes to 2.7 (which would be easier).

[issue17381] IGNORECASE breaks unicode literal range matching

2014-10-09 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: The updated patch for 3.5 addresses Antoine's comments. Note that 3.4 and 3.5 use different solutions to this issue. -- dependencies: +Get rid of SRE character tables Added file: http://bugs.python.org/file36842/re_ignore_case_range-3.5_3.patch

[issue17381] IGNORECASE breaks unicode literal range matching

2014-10-08 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: Actually, the 3.5 patch can be simpler. -- Added file: http://bugs.python.org/file36839/re_ignore_case_range-3.5_2.patch

[issue17381] IGNORECASE breaks unicode literal range matching

2014-09-24 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: Here is another patch for 3.4. It is more than 10 times faster than the initial patch in the worst case. -- Added file: http://bugs.python.org/file36712/re_ignore_case_range-3.4_2.patch

[issue17381] IGNORECASE breaks unicode literal range matching

2014-09-17 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: This patch has a disadvantage: it slows down case-insensitive compiling of some very wide ranges, e.g. compile(r'[\x00-\U0010ffff]+', re.I) (this is the worst case). In most cases this is not important, because such wide ranges are rare enough and compiled
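The cost of compiling a very wide case-insensitive range can be measured with a short script. This is only a rough sketch: the helper name time_compile is hypothetical, the pattern is assumed to span the whole code point range, and absolute timings depend on the interpreter version and machine.

```python
import re
import time


def time_compile(pattern: str, flags: int = 0) -> float:
    """Return the wall-clock seconds taken to compile `pattern` once."""
    re.purge()  # empty the re module's cache so the pattern is really compiled
    start = time.perf_counter()
    re.compile(pattern, flags)
    return time.perf_counter() - start


# Assumed worst case: a character class covering every code point.
WIDE_RANGE = r'[\x00-\U0010ffff]+'

plain = time_compile(WIDE_RANGE)
icase = time_compile(WIDE_RANGE, re.IGNORECASE)
print(f"plain: {plain:.6f}s  ignorecase: {icase:.6f}s")
```

Comparing the two numbers on an interpreter with and without the patch gives an idea of the slowdown being discussed; a single run is noisy, so repeating the measurement is advisable.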

[issue17381] IGNORECASE breaks unicode literal range matching

2014-09-08 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: No, issue12728 is a more complicated case. Here is a patch which fixes this issue and issue3511. -- assignee: -> serhiy.storchaka keywords: +patch stage: -> patch review versions: +Python 3.4, Python 3.5 -Python 3.3 Added file:

[issue17381] IGNORECASE breaks unicode literal range matching

2013-03-12 Thread Ezio Melotti
Ezio Melotti added the comment: Is this the same issue described in #12728?

[issue17381] IGNORECASE breaks unicode literal range matching

2013-03-11 Thread Ezio Melotti
Ezio Melotti added the comment: Matthew, should this be closed then?

[issue17381] IGNORECASE breaks unicode literal range matching

2013-03-11 Thread Chris Adams
Chris Adams added the comment: Ezio: given the non-obvious failure, what do you think of at least documenting this and issuing a warning any time both re.UNICODE and re.IGNORECASE are set?

[issue17381] IGNORECASE breaks unicode literal range matching

2013-03-11 Thread Matthew Barnett
Matthew Barnett added the comment: In issue #3511 the range was slightly unusual, so closing it seemed a reasonable approach, but the range in this issue is less clearly a problem. My preference would be to fix it, if possible.

[issue17381] IGNORECASE breaks unicode literal range matching

2013-03-11 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: I'm working on the patch. -- nosy: +serhiy.storchaka

[issue17381] IGNORECASE breaks unicode literal range matching

2013-03-08 Thread Chris Adams
Chris Adams added the comment: Ah, that explains it - I'd been hoping based on the re.DEBUG output that the explicit unicode ranges were preserved. I found #3511 before opening this one but don't believe the decision should be the same since this isn't a mixed numeric/alphabetic range.

[issue17381] IGNORECASE breaks unicode literal range matching

2013-03-07 Thread Chris Adams
New submission from Chris Adams: I noticed an interesting failure while using re.match / re.sub to look for non-Cyrillic characters in allegedly Russian text: re.sub(r'[\s\u0400-\u0527]+', ' ', 'Архангельская губерния', flags=re.IGNORECASE) 'Архангельская губерния'
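The report can be checked on a given interpreter with a small script along these lines. It is a sketch, not part of the original submission: the helper name collapse_cyrillic and the constants are made up for illustration. Without the flag, the whole sample is one run of Cyrillic letters and whitespace, so it collapses to a single space; an affected interpreter returns the text unchanged once re.IGNORECASE is added.

```python
import re

# Pattern from the report: whitespace plus the code points U+0400..U+0527.
CYRILLIC_OR_SPACE = r'[\s\u0400-\u0527]+'
SAMPLE = 'Архангельская губерния'


def collapse_cyrillic(text: str, flags: int = 0) -> str:
    """Replace each run of whitespace/Cyrillic characters with one space."""
    return re.sub(CYRILLIC_OR_SPACE, ' ', text, flags=flags)


without_flag = collapse_cyrillic(SAMPLE)
with_flag = collapse_cyrillic(SAMPLE, re.IGNORECASE)

print(repr(without_flag))
print(repr(with_flag))
print("affected" if with_flag != without_flag else "not affected")
```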

[issue17381] IGNORECASE breaks unicode literal range matching

2013-03-07 Thread Matthew Barnett
Matthew Barnett added the comment: The way the re module handles ranges is to convert the two endpoints to lowercase and then check whether the lowercase form of the character in the text is in that range. For example, [A-Z] is converted to the range [\x61-\x7A], and the lowercase form of 'Q'