[issue30349] Preparation for advanced set syntax in regular expressions

2021-09-21 Thread Philippe Ombredanne


Philippe Ombredanne  added the comment:

Sorry, my comment was at best nonsensical gibberish!

I meant to say that this warning message should include the actual regex at 
fault. Otherwise it is hard to fix when the regex in question comes from some 
data structure such as a list: the line number where the warning occurs is 
not enough to locate the problem, so the code first needs to be instrumented 
to catch the warning, which is rather heavy-handed for a warning.
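
For illustration, the kind of instrumentation meant here could look like the 
sketch below. It is only a sketch under stated assumptions: the helper name 
find_warning_patterns and the sample patterns are hypothetical, and it relies 
on turning FutureWarning into an exception so that each warning can be tied 
back to the pattern that triggered it.

```python
import re
import warnings


def find_warning_patterns(patterns):
    """Return (pattern, message) pairs for patterns that raise FutureWarning."""
    offenders = []
    for pattern in patterns:
        with warnings.catch_warnings():
            # Turn FutureWarning into an exception so it can be tied to a pattern.
            warnings.simplefilter("error", FutureWarning)
            re.purge()  # drop cached patterns so each one is parsed again
            try:
                re.compile(pattern)
            except FutureWarning as exc:
                offenders.append((pattern, str(exc)))
    return offenders


# Hypothetical patterns loaded from data; the middle two use constructs
# that the re module flags as possible future set syntax.
patterns_from_data = [r"[a-z]+", r"[[a-z]]", r"[a&&b]", r"\d{3}"]
offenders = find_warning_patterns(patterns_from_data)
for pattern, message in offenders:
    print(f"{pattern!r}: {message}")
```

Having to write a helper like this just to learn which pattern is at fault is 
the overhead that including the regex in the message would avoid.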

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30349] Preparation for advanced set syntax in regular expressions

2021-09-21 Thread Philippe Ombredanne


Philippe Ombredanne  added the comment:

FWIW, this warning is annoying because it is hard to fix when the regexes are 
sourced from data: the warning message does not include the regex at fault. 
It should; otherwise the warning is noisy and ineffective IMHO.

--
nosy: +pombredanne




[issue30349] Preparation for advanced set syntax in regular expressions

2018-02-05 Thread Tim Graham

Tim Graham  added the comment:

Okay, I created #32775.

--




[issue30349] Preparation for advanced set syntax in regular expressions

2018-02-05 Thread Serhiy Storchaka

Serhiy Storchaka  added the comment:

Good catch! fnmatch.translate() can produce a pattern which emits a warning 
when compiled. Could you please open a separate issue for this?

--




[issue30349] Preparation for advanced set syntax in regular expressions

2018-02-05 Thread Tim Graham

Tim Graham  added the comment:

It might be worth adding part of the problematic regex to the warning message. 
For Django's tests, I see an error like "FutureWarning: Possible nested set at 
position 17 return re.compile(res).match". It took some effort to track down 
the source.

A partial traceback is:
  File "/home/tim/code/django/django/core/management/commands/loaddata.py", line 247, in find_fixtures
    for candidate in glob.iglob(glob.escape(path) + '*'):
  File "/home/tim/code/cpython/Lib/glob.py", line 72, in _iglob
    for name in glob_in_dir(dirname, basename, dironly):
  File "/home/tim/code/cpython/Lib/glob.py", line 83, in _glob1
    return fnmatch.filter(names, pattern)
  File "/home/tim/code/cpython/Lib/fnmatch.py", line 52, in filter
    match = _compile_pattern(pat)
  File "/home/tim/code/cpython/Lib/fnmatch.py", line 46, in _compile_pattern
    return re.compile(res).match
  File "/home/tim/code/cpython/Lib/re.py", line 240, in compile
    return _compile(pattern, flags)
  File "/home/tim/code/cpython/Lib/re.py", line 292, in _compile
    p = sre_compile.compile(pattern, flags)
  File "/home/tim/code/cpython/Lib/sre_compile.py", line 764, in compile
    p = sre_parse.parse(p, flags)
  File "/home/tim/code/cpython/Lib/sre_parse.py", line 930, in parse
    p = _parse_sub(source, pattern, flags & SRE_FLAG_VERBOSE, 0)
  File "/home/tim/code/cpython/Lib/sre_parse.py", line 426, in _parse_sub
    not nested and not items))
  File "/home/tim/code/cpython/Lib/sre_parse.py", line 816, in _parse
    p = _parse_sub(source, state, sub_verbose, nested + 1)
  File "/home/tim/code/cpython/Lib/sre_parse.py", line 426, in _parse_sub
    not nested and not items))
  File "/home/tim/code/cpython/Lib/sre_parse.py", line 524, in _parse
    FutureWarning, stacklevel=nested + 6
FutureWarning: Possible nested set at position 17

As an aside, I'm not sure how to fix the warning in Django. It comes from the 
test added in 
https://github.com/django/django/commit/98df288ddaba9787e4a370f12aba51c2b9133142
where a path like 'tests/fixtures/fixtures/fixture_with[special]chars' is run 
through glob.escape(), which produces 
'tests/fixtures/fixtures/fixture_with[[]special]chars'.
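
A minimal sketch of that path through the stdlib, in case it helps anyone 
reproduce it. The '.json' suffix on the file name is an assumption made for the 
example, and whether any FutureWarning is recorded depends on the Python 
version in use, so the sketch only collects the warnings and does not claim a 
particular result for them.

```python
import fnmatch
import glob
import re
import warnings

path = "tests/fixtures/fixtures/fixture_with[special]chars"

# glob.escape() wraps each special character in brackets, so '[' becomes '[[]'.
escaped = glob.escape(path)
print(escaped)

# fnmatch.translate() turns the glob into a regular expression; the '[[]'
# group ends up as a character set whose first member is '['.
regex = fnmatch.translate(escaped + "*")
print(regex)

# Record any warnings raised while compiling, instead of printing them.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    compiled = re.compile(regex)

future_warnings = [w for w in caught if issubclass(w.category, FutureWarning)]
print(len(future_warnings), "FutureWarning(s) recorded")

# The compiled pattern still matches the literal file name either way.
matched = compiled.match(path + ".json") is not None
```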

--
nosy: +Tim.Graham




[issue30349] Preparation for advanced set syntax in regular expressions

2017-11-16 Thread Serhiy Storchaka

Change by Serhiy Storchaka :


--
resolution:  -> fixed
stage: patch review -> resolved
status: open -> closed




[issue30349] Preparation for advanced set syntax in regular expressions

2017-11-16 Thread Serhiy Storchaka

Serhiy Storchaka  added the comment:


New changeset 05cb728d68a278d11466f9a6c8258d914135c96c by Serhiy Storchaka in 
branch 'master':
bpo-30349: Raise FutureWarning for nested sets and set operations (#1553)
https://github.com/python/cpython/commit/05cb728d68a278d11466f9a6c8258d914135c96c


--




[issue30349] Preparation for advanced set syntax in regular expressions

2017-10-05 Thread Serhiy Storchaka

Serhiy Storchaka  added the comment:

Made the warning for '[' be emitted only at the start of a set. This 
significantly decreases the breakage of other code. I think we can get by 
without implicit union of nested sets, like in [_[0-9][:Latin:]]. This can be 
written as [_||[0-9]||[:Latin:]].
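
The effect of restricting the warning to a leading '[' can be checked with a 
small sketch like the one below. The helper name emits_future_warning and the 
two sample patterns are made up for the example; it assumes a Python version 
that includes this change.

```python
import re
import warnings


def emits_future_warning(pattern):
    """Compile pattern and report whether a FutureWarning was raised."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        re.purge()  # make sure the pattern is parsed, not taken from the cache
        re.compile(pattern)
    return any(issubclass(w.category, FutureWarning) for w in caught)


# '[' directly after the opening bracket looks like the start of a nested set.
starts_with_bracket = emits_future_warning(r"[[a]")

# '[' later in the set is treated as an ordinary member and is not flagged.
bracket_later = emits_future_warning(r"[a[]")

print(starts_with_bracket, bracket_later)
```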

--




[issue30349] Preparation for advanced set syntax in regular expressions

2017-05-12 Thread Serhiy Storchaka

Changes by Serhiy Storchaka :


--
pull_requests: +1650




[issue30349] Preparation for advanced set syntax in regular expressions

2017-05-12 Thread Serhiy Storchaka
New submission from Serhiy Storchaka:

Currently the re module supports only simple sets. They can include literal 
characters, character ranges, and some simple character classes, and they 
support negation. The Unicode standard [1] defines set operations (union, 
intersection, difference and symmetric difference) and nested sets. Some 
regular expression engines implement these features; for example, the regex 
module supports all TR18 features except non-nested POSIX character classes.

If we replace the re module with the regex module, or add support for these 
features to the re module and enable this syntax by default, some code will 
break. It is very unlikely that a regular expression contains doubled 
characters ('--', '||', '&&' or '~~'), but nested sets use just '[', and an 
unescaped '[' does occur in character sets in regular expressions (even the 
stdlib contains several occurrences).

The proposed patch adds FutureWarnings that are emitted when a possibly 
breaking set construct ('--', '||', '&&', '~~' or '[') occurs in a regular 
expression. We need one or two releases with a warning before changing the 
syntax. The patch also makes re.escape() escape '&' and '~' and fixes several 
regular expressions in the stdlib.
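
As a rough illustration of the re.escape() part, the sketch below builds a 
character set from arbitrary text. The variable names and the sample string are 
made up for the example, and it assumes a Python version in which re.escape() 
escapes '&' and '~'.

```python
import re
import warnings

# Hypothetical user-supplied characters that happen to contain '&&' and '~~'.
user_chars = "a&&b~~c"

# re.escape() puts a backslash before '&' and '~', so they stay literal.
escaped = re.escape(user_chars)
print(escaped)

# Treat FutureWarning as an error: compiling must not trigger one, because
# the escaped characters are not read as possible set operators.
with warnings.catch_warnings():
    warnings.simplefilter("error", FutureWarning)
    char_set = re.compile("[" + escaped + "]")

found = char_set.findall("x&y~z")
print(found)
```

Building sets from re.escape() output therefore stays safe regardless of 
whether the set operators are ever enabled.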

Alternatively, support for the new set syntax could be enabled by a special flag.

I'm not sure that support for set operations and nested sets is necessary. 
It complicates the syntax of regular expressions (which is already not 
simple). Currently set operations can be emulated with lookarounds:

[set1||set2] -- (?:[set1]|[set2])
[set1&&set2] -- [set1](?<=[set2]) or (?=[set1])[set2]
[set1--set2] -- [set1](?<![set2]) or (?![set2])[set1]

[1] http://unicode.org/reports/tr18/#Subtraction_and_Intersection
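
A small sketch of these lookaround emulations with the current re module. The 
concrete sets are arbitrary examples, and the difference case uses a negative 
lookbehind by analogy with the positive lookbehind used for intersection.

```python
import re

# Union: [a-c||x-z] written as an alternation of two simple sets.
union = re.compile(r"(?:[a-c]|[x-z])")

# Intersection: [a-z&&a-f0-9], a character from the first set that a
# lookbehind confirms is also in the second set.
intersection = re.compile(r"[a-z](?<=[a-f0-9])")

# Difference: [a-z--aeiou], a character from the first set that a negative
# lookbehind confirms is NOT in the second set.
difference = re.compile(r"[a-z](?<![aeiou])")

union_hits = union.findall("abxq")
intersection_hits = intersection.findall("a1g2f")
difference_hits = difference.findall("python")

print(union_hits, intersection_hits, difference_hits)
```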

--
assignee: serhiy.storchaka
components: Library (Lib), Regular Expressions
messages: 293532
nosy: ezio.melotti, mrabarnett, r.david.murray, rhettinger, serhiy.storchaka
priority: normal
severity: normal
stage: patch review
status: open
title: Preparation for advanced set syntax in regular expressions
type: enhancement
versions: Python 3.7

___
Python tracker 
<http://bugs.python.org/issue30349>
___