[issue28727] Implement comparison (x==y and x!=y) for _sre.SRE_Pattern

2017-03-31 Thread Donald Stufft

Changes by Donald Stufft :


--
pull_requests: +981

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue28727] Implement comparison (x==y and x!=y) for _sre.SRE_Pattern

2016-11-22 Thread STINNER Victor

STINNER Victor added the comment:

Serhiy Storchaka: "&& left->codesize && right->codesize)"

Oops! Fixed!

"While we are here, it is perhaps worth adding a fast path for self == other."

Done.

--
status: open -> closed




[issue28727] Implement comparison (x==y and x!=y) for _sre.SRE_Pattern

2016-11-22 Thread Roundup Robot

Roundup Robot added the comment:

New changeset 6b43d15fd2d7 by Victor Stinner in branch '3.6':
Issue #28727: Fix typo in pattern_richcompare()
https://hg.python.org/cpython/rev/6b43d15fd2d7

New changeset c2cb70c97163 by Victor Stinner in branch '3.6':
Issue #28727: Optimize pattern_richcompare() for a==a
https://hg.python.org/cpython/rev/c2cb70c97163

--




[issue28727] Implement comparison (x==y and x!=y) for _sre.SRE_Pattern

2016-11-22 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

While we are here, it is perhaps worth adding a fast path for self == other.
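
A minimal Python sketch of what such a fast path looks like. The class
PatternLike and its fields are hypothetical stand-ins for the C struct, not the
actual _sre code; the only point is that identity is checked before any
field-by-field comparison.

```python
class PatternLike:
    """Hypothetical stand-in for the C pattern object (illustration only)."""

    def __init__(self, pattern, flags, code):
        self.pattern = pattern
        self.flags = flags
        self.code = tuple(code)

    def __eq__(self, other):
        if not isinstance(other, PatternLike):
            return NotImplemented
        # Fast path: an object is always equal to itself, so the
        # field-by-field comparison can be skipped entirely.
        if self is other:
            return True
        return (self.flags == other.flags
                and self.code == other.code
                and self.pattern == other.pattern)

    def __hash__(self):
        return hash((self.pattern, self.flags, self.code))


p = PatternLike("abc", 0, [1, 2, 3])
q = PatternLike("abc", 0, [1, 2, 3])
same_object_equal = (p == p)   # answered by the identity fast path
distinct_equal = (p == q)      # answered by comparing the fields
```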

--




[issue28727] Implement comparison (x==y and x!=y) for _sre.SRE_Pattern

2016-11-22 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

+   && left->codesize && right->codesize);

There is a typo. Should be:

+   && left->codesize == right->codesize);
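
To illustrate why the typo matters, here is a small Python model of the two
conditions. The function names are hypothetical. In C, `&&` applied to two
integers only checks that both are non-zero, whereas `==` checks that the two
code arrays have the same length.

```python
def sizes_ok_with_typo(left_codesize: int, right_codesize: int) -> bool:
    # Models C's `left->codesize && right->codesize`: true whenever both
    # sizes are non-zero, even if they differ.
    return bool(left_codesize) and bool(right_codesize)


def sizes_ok_fixed(left_codesize: int, right_codesize: int) -> bool:
    # Models C's `left->codesize == right->codesize`.
    return left_codesize == right_codesize


typo_result = sizes_ok_with_typo(5, 9)   # different sizes slip through
fixed_result = sizes_ok_fixed(5, 9)      # different sizes are rejected
```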

--
status: closed -> open




[issue28727] Implement comparison (x==y and x!=y) for _sre.SRE_Pattern

2016-11-21 Thread STINNER Victor

STINNER Victor added the comment:

For stricter checks on _sre.compile() arguments, I created issue #28765: 
"_sre.compile(): be more strict on types of indexgroup and groupindex".

--




[issue28727] Implement comparison (x==y and x!=y) for _sre.SRE_Pattern

2016-11-21 Thread STINNER Victor

STINNER Victor added the comment:

Serhiy Storchaka: "pattern_compare-6.patch LGTM."

Thank you very much for your very useful reviews! I pushed the change.

--
resolution:  -> fixed
status: open -> closed




[issue28727] Implement comparison (x==y and x!=y) for _sre.SRE_Pattern

2016-11-21 Thread Roundup Robot

Roundup Robot added the comment:

New changeset 5e8ef1493843 by Victor Stinner in branch '3.6':
Implement rich comparison for _sre.SRE_Pattern
https://hg.python.org/cpython/rev/5e8ef1493843

--
nosy: +python-dev




[issue28727] Implement comparison (x==y and x!=y) for _sre.SRE_Pattern

2016-11-21 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

pattern_compare-6.patch LGTM.

--
assignee:  -> haypo
stage: patch review -> commit review




[issue28727] Implement comparison (x==y and x!=y) for _sre.SRE_Pattern

2016-11-21 Thread STINNER Victor

STINNER Victor added the comment:

Back to basics, patch 6:

* revert changes on indexgroup and groupindex types: I will fix this later, 
once this issue is fixed
* pattern_richcompare() and pattern_hash() also use pattern, but no longer use 
groups, indexgroup or groupindex

I removed the @cpython_only unit test and rewrote test_pattern_compare_bytes() 
to make it easier to follow.

re.compile('abc', re.IGNORECASE) is different from re.compile('ABC', 
re.IGNORECASE), but it's a deliberate choice not to test it. I consider that 
the behaviour may change depending on the Python implementation and in a future 
version.
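
For illustration, this is how the case-sensitivity point looks from Python on a
CPython build that includes this change. As noted above, it is an
implementation detail, so the result of the comparison should not be relied on.

```python
import re

lower = re.compile('abc', re.IGNORECASE)
upper = re.compile('ABC', re.IGNORECASE)

# Both patterns match the same text...
both_match = bool(lower.fullmatch('AbC')) and bool(upper.fullmatch('AbC'))

# ...but they still compare as different on CPython, because the
# pattern strings 'abc' and 'ABC' differ.
patterns_equal = (lower == upper)
```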

--
Added file: http://bugs.python.org/file45587/pattern_compare-6.patch




[issue28727] Implement comparison (x==y and x!=y) for _sre.SRE_Pattern

2016-11-20 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

This looks too complicated. groups, indexgroup and groupindex are unambiguously 
derived from the pattern string. If caching works, different pattern strings are 
compiled to different pattern objects. Currently they are not equal, even if 
their codes are equal. And I don't see much need to consider them equal.

--




[issue28727] Implement comparison (x==y and x!=y) for _sre.SRE_Pattern

2016-11-18 Thread STINNER Victor

STINNER Victor added the comment:

New approach: patch 5 now compares indexgroup, groupindex and code instead of 
comparing pattern, to correctly handle two equal pattern strings compiled with 
the re.LOCALE flag in two different locales.

The patch also converts indexgroup list to a tuple to be able to hash it. (It 
also prevents modification, but this is just a side effect, and groupindex 
remains a mutable dictionary.)
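
A short Python illustration of why the conversion is needed for hashing. The
group names are made up for the example.

```python
indexgroup_list = [None, 'year', 'month']   # hypothetical group names

try:
    hash(indexgroup_list)
    list_hashable = True
except TypeError:
    # Lists are mutable and therefore unhashable.
    list_hashable = False

indexgroup_tuple = tuple(indexgroup_list)
tuple_hash = hash(indexgroup_tuple)   # tuples of hashable items are hashable
```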

_sre.compile() checks types, which helps to identify a bug in the unit tests.

--
Added file: http://bugs.python.org/file45541/pattern_compare-5.patch




[issue28727] Implement comparison (x==y and x!=y) for _sre.SRE_Pattern

2016-11-18 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

I see two options:

* Compare flags, isbytes and code. This makes some different patterns be 
compiled to equal objects. For example, spaces in verbose mode are ignored. But 
this doesn't make all equivalent patterns equal. '[aA]' would be equal to 
'(?:a|A)' but still would not be equal to '(?i)a' with the current implementation.

* Compare flags, isbytes, code and pattern. This makes literally different 
patterns be compiled to unequal objects even if the difference is not 
significant. '[abc]' would be different from '[cba]' despite the fact that 
matching both always returns the same result.
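
A sketch of what the second option means in practice, assuming a CPython build
where the pattern string is part of the comparison (which is what was
eventually committed for this issue):

```python
import re

a = re.compile('[abc]')
b = re.compile('[cba]')

# The two patterns accept exactly the same single characters...
same_matches = all(
    bool(a.fullmatch(ch)) == bool(b.fullmatch(ch)) for ch in 'abcdxyz'
)
# ...but because the pattern strings differ, the objects are not equal.
equal_objects = (a == b)

# Same situation for insignificant whitespace in verbose mode.
v1 = re.compile('(?x)a')
v2 = re.compile('(?x) a ')
verbose_equal = (v1 == v2)
```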

Since this issue becomes a little ambiguous, I would target the patch to 3.7 
only. Maybe we will find other subtle details or will decide to change the 
meaning of equality of pattern objects before releasing 3.7.

--




[issue28727] Implement comparison (x==y and x!=y) for _sre.SRE_Pattern

2016-11-18 Thread STINNER Victor

STINNER Victor added the comment:

> '[abc]' and '[cba]' are compiled to the same code. Do we want to handle them 
> as equal?

Comparison must be fast. If the comparison is just memcmp(code1,
code2, codesize), it's ok.

I agree that we must put a note somewhere to say that some parts
are implementation details. Maybe split the test in two parts, and
mark the second part with @cpython_only.

--




[issue28727] Implement comparison (x==y and x!=y) for _sre.SRE_Pattern

2016-11-18 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

'[abc]' and '[cba]' are compiled to the same code. Do we want to handle them as 
equal? This is implementation defined.

--




[issue28727] Implement comparison (x==y and x!=y) for _sre.SRE_Pattern

2016-11-18 Thread STINNER Victor

STINNER Victor added the comment:

Serhiy: "There is a problem with locale-dependent flags. The same pattern may 
be compiled with the LOCALE flag to different pattern objects in different 
locales."

Oh, I didn't know and you are right.

"Perhaps we should compare the compiled code instead of pattern strings. Or 
both."

PatternObject contains many fields. I used the following two fields which come 
from re.compile():

* pattern
* flags

I considered that they were enough because pattern_repr() only displays these 
ones. Other fields:

* groups
* groupindex
* indexgroup
* weakreflist
* isbytes
* codesize, code

weakreflist can be skipped, isbytes is already tested in my patch.

Would it be possible to only compare code instead of pattern? What are groups, 
groupindex and indexgroup: should we also compare them?

Maybe I can start from  pattern_compare-4.patch and only add a test comparing 
code?

--




[issue28727] Implement comparison (x==y and x!=y) for _sre.SRE_Pattern

2016-11-18 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

There is a problem with locale-dependent flags. The same pattern may be 
compiled with the LOCALE flag to different pattern objects in different 
locales. Perhaps we should compare the compiled code instead of pattern 
strings. Or both.

--




[issue28727] Implement comparison (x==y and x!=y) for _sre.SRE_Pattern

2016-11-18 Thread STINNER Victor

STINNER Victor added the comment:

Ok, I hope that it's the last attempt: patch 4

* Remove the hash(b) != hash(a) checks: only keep tests that hash(b) == hash(a) when b == a
* Replace re.ASCII flag with no flag to test two patterns with different flags
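
The reasoning behind the first bullet: equal objects must have equal hashes,
but unequal objects are allowed to collide, so asserting that two hashes differ
is not a valid test. A minimal check of the guaranteed direction, assuming a
Python version that includes this change:

```python
import re

a = re.compile('abc')
re.purge()               # drop the cache so the next compile builds a new object
b = re.compile('abc')

equal = (a == b)
hashes_agree = (hash(a) == hash(b))   # required whenever a == b
distinct_keys = len({a, b})           # equal objects collapse to one set entry
```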

--
Added file: http://bugs.python.org/file45529/pattern_compare-4.patch




[issue28727] Implement comparison (x==y and x!=y) for _sre.SRE_Pattern

2016-11-17 Thread STINNER Victor

STINNER Victor added the comment:

Patch version 3:

* pattern_hash() includes isbytes, I also shifted flags by 1 bit to not erase 
the isbytes bit (FYI maximum value of flags is 256)
* pattern_richcompare() avoids calling PyObject_RichCompareBool() if flags or 
isbytes is different
* the unit test ensures that no BytesWarning is raised
* check hash() in unit tests
* also fix the unit test with a different flag (use the same pattern)
* also document in the unit test that the comparison is case sensitive
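
A Python sketch of the bit layout described in the first bullet. The helper
name is hypothetical and this is not the actual C hash function; it only shows
why shifting flags by one bit keeps the isbytes bit from colliding with a flag
bit.

```python
def mix_flags_and_isbytes(flags: int, isbytes: bool) -> int:
    # Shift flags left by one bit so that bit 0 is free to hold isbytes
    # without overwriting any flag bit.
    return (flags << 1) | int(isbytes)


# Without the shift, flags=1/isbytes=False and flags=0/isbytes=True would
# both produce 1. With the shift, every combination stays distinct.
combos = {
    mix_flags_and_isbytes(flags, isbytes)
    for flags in (0, 1, 2, 256)
    for isbytes in (False, True)
}
```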

--
Added file: http://bugs.python.org/file45528/pattern_compare-3.patch




[issue28727] Implement comparison (x==y and x!=y) for _sre.SRE_Pattern

2016-11-17 Thread Matthew Barnett

Matthew Barnett added the comment:

I hope you make it clear what you mean by 'equal', i.e. that it's comparing the 
pattern and the flags (AFAICT), so re.compile('(?x)a') != re.compile('(?x) a ').

--




[issue28727] Implement comparison (x==y and x!=y) for _sre.SRE_Pattern

2016-11-17 Thread Serhiy Storchaka

Changes by Serhiy Storchaka :


--
components: +Regular Expressions
nosy: +ezio.melotti, mrabarnett




[issue28727] Implement comparison (x==y and x!=y) for _sre.SRE_Pattern

2016-11-17 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Added comments on Rietveld.

--
nosy: +serhiy.storchaka
stage:  -> patch review




[issue28727] Implement comparison (x==y and x!=y) for _sre.SRE_Pattern

2016-11-17 Thread STINNER Victor

STINNER Victor added the comment:

> Ten subtests in test_re fail with: TypeError: unhashable type: 
> '_sre.SRE_Pattern'

Oops, right. The updated patch also implements hash() on patterns.
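
A Python-level analogue of the failure, using hypothetical classes: a type that
defines equality without also defining a hash becomes unhashable, and adding a
hash that is consistent with equality fixes it.

```python
class OnlyEq:
    """Defines equality but no hash, like the first version of the patch."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        return isinstance(other, OnlyEq) and self.value == other.value


class EqAndHash(OnlyEq):
    """Adds a hash that is consistent with the inherited __eq__."""

    def __hash__(self):
        return hash(self.value)


try:
    hash(OnlyEq('abc'))
    only_eq_hashable = True
except TypeError:
    # Python sets __hash__ to None when a class defines only __eq__.
    only_eq_hashable = False

eq_and_hash_ok = hash(EqAndHash('abc')) == hash(EqAndHash('abc'))
```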

--
Added file: http://bugs.python.org/file45523/pattern_compare-2.patch




[issue28727] Implement comparison (x==y and x!=y) for _sre.SRE_Pattern

2016-11-17 Thread SilentGhost

SilentGhost added the comment:

Ten subtests in test_re fail with

TypeError: unhashable type: '_sre.SRE_Pattern'

--
nosy: +SilentGhost




[issue28727] Implement comparison (x==y and x!=y) for _sre.SRE_Pattern

2016-11-17 Thread STINNER Victor

New submission from STINNER Victor:

The attached patch implements rich comparison for _sre.SRE_Pattern objects 
created by re.compile().

Comparison between patterns is used in the warnings module to avoid adding 
duplicate filters, see issue #18383:

New changeset f57f4e33ba5e by Martin Panter in branch '3.5':
Issue #18383: Avoid adding duplicate filters when warnings is reloaded
https://hg.python.org/cpython/rev/f57f4e33ba5e

For the warnings module, it became a problem in test_warnings once the Python 
test runner started to clear all caches. When re.purge() is called, 
re.compile() creates a new object, whereas with the cache it returns the same 
object, so the two patterns are equal simply because they are the same object. 
See issue #28688.
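
The cache behaviour described above can be reproduced with a few lines. The
last comparison is True only on a Python version that includes this change;
without it, two distinct pattern objects never compare equal.

```python
import re

first = re.compile('abc')
cached = re.compile('abc')     # served from the re module cache
re.purge()                     # clear the cache
rebuilt = re.compile('abc')    # a brand-new pattern object

cache_hit_same_object = (cached is first)
rebuilt_is_new_object = (rebuilt is not first)
rebuilt_equal = (rebuilt == first)   # relies on the rich comparison
```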

--
files: pattern_compare.patch
keywords: patch
messages: 281046
nosy: haypo
priority: normal
severity: normal
status: open
title: Implement comparison (x==y and x!=y) for _sre.SRE_Pattern
type: enhancement
versions: Python 3.7
Added file: http://bugs.python.org/file45520/pattern_compare.patch
