[issue40480] "fnmatch" exponential execution time

2020-05-12 Thread Tim Peters


Tim Peters  added the comment:

Ned, I'm happy to do this. While the ability to join wasn't documented, it's 
not an unreasonable expectation. I'm not sure it's possible to fail _unless_ 
the regexps use named groups (and/or numbered backreferences) - and nobody in 
their right mind would expect regexps for such simple patterns to do such a 
thing ;-)

So chances seem decent the regression your user stumbled into wouldn't be the 
only one to pop up.  The fix was actually quite easy (and thanks to Anthony for 
nudging me in that direction!).  The annoying part was writing a test given 
that the precise group names generated are no longer predictable :-(

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue40480] "fnmatch" exponential execution time

2020-05-12 Thread Ned Batchelder


Ned Batchelder  added the comment:

Wow, thanks Tim. To be honest, I was coming around to your original point of 
view that it was too much to promise that these regexes could be combined the 
way I'm doing it.  If we have to undo this latest change, I can live with it.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue40480] "fnmatch" exponential execution time

2020-05-11 Thread Tim Peters


Tim Peters  added the comment:


New changeset b1b4c790e7d3b5f4244450aefe3d8f01710c13f7 by Tim Peters in branch 
'master':
bpo-40480: restore ability to join fnmatch.translate() results (GH-20049)
https://github.com/python/cpython/commit/b1b4c790e7d3b5f4244450aefe3d8f01710c13f7


--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue40480] "fnmatch" exponential execution time

2020-05-11 Thread Tim Peters


Change by Tim Peters :


--
pull_requests: +19358
pull_request: https://github.com/python/cpython/pull/20049

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue40480] "fnmatch" exponential execution time

2020-05-11 Thread Tim Peters


Tim Peters  added the comment:

I don't want something probabilistic.  Fix it or don't ;-)

One thing that would work, but at the cost of non-determinism:  do the same as 
now, but obtain the number part of the group name by applying next() to a 
module-global private instance of itertools.count().  That will keep the 
numbers increasing "forever", and across calls.  The point to using .count() is 
that it's atomic (i.e., won't repeat a number if multiple threads happen to be 
constructing regexps simultaneously).

It's a darned silly amount of effort, though ;-)

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue40480] "fnmatch" exponential execution time

2020-05-11 Thread Anthony Sottile


Anthony Sottile  added the comment:

one way might be to give the groups "unique" names (perhaps hashing the input 
string?)  ((this is what I attempted to do in a little bit of code which tried 
to "backport" (group)*+ and (group)++))

--
nosy: +Anthony Sottile

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue40480] "fnmatch" exponential execution time

2020-05-11 Thread Tim Peters


Tim Peters  added the comment:

Ned, would it be possible to rewrite code of the form:

if giant pasted regexp matches:

to:

if any(p matches for p in patterns):

That should work under any version of Python.

There's no guarantee that regexps _can_ be pasted together and still work, so I 
can't call this change "a bug".  That pasting regexps together "worked" before 
was an implementation accident.

I'd be happy to change it anyway, except I know of no way to use Python's re 
engine without backreferences that can avoid exponential-time behavior in some 
cases.  In some other regexp engines, yes (e.g., as the code comments note, in 
those that support "atomic grouping"), but not in Python's.  Nor does Python's 
re engine support reusing backreference names or numbers.

So I know of no way to restore the ability to paste regexps together that 
wouldn't reintroduce the possiblity of exponential time failure :-(

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue40480] "fnmatch" exponential execution time

2020-05-11 Thread Ned Batchelder


Ned Batchelder  added the comment:

This change has caused a problem for coverage.py.  The full details are here: 
https://github.com/nedbat/coveragepy/issues/988#issuecomment-626926513

Coverage.py combines fnmatch-produced regexes by joining them with pipes into 
one larger regex.  The \ group in the regexes now makes that larger regex 
invalid.

--
nosy: +nedbat

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue40480] "fnmatch" exponential execution time

2020-05-05 Thread Tim Peters


Change by Tim Peters :


--
assignee:  -> tim.peters
resolution:  -> fixed
stage: patch review -> resolved
status: open -> closed

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue40480] "fnmatch" exponential execution time

2020-05-05 Thread Tim Peters


Tim Peters  added the comment:


New changeset b9c46a2c2d7fc68457bff641f78932d66f5e5f59 by Tim Peters in branch 
'master':
bpo-40480 "fnmatch" exponential execution time (GH-19908)
https://github.com/python/cpython/commit/b9c46a2c2d7fc68457bff641f78932d66f5e5f59


--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue40480] "fnmatch" exponential execution time

2020-05-04 Thread Tim Peters


Change by Tim Peters :


--
keywords: +patch
pull_requests: +19223
stage: needs patch -> patch review
pull_request: https://github.com/python/cpython/pull/19908

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue40480] "fnmatch" exponential execution time

2020-05-04 Thread Tim Peters


Tim Peters  added the comment:

Changed version to 3.9, because anything done would change the regexp 
generated, and fnmatch.translate()` makes that regexp visible.

--
stage:  -> needs patch
versions: +Python 3.9 -Python 3.8

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue40480] "fnmatch" exponential execution time

2020-05-03 Thread Tim Peters


Tim Peters  added the comment:

"The trick" with this kind of pattern is that a `*` should consume as little as 
possible to allow the next fixed portion to match.  Apply that at every stage, 
and it succeeds or it doesn't.  Backtracking can't change that outcome.  For, 
e.g., the shell pattern:

a*bc*d*

a regexp to do this without backtracking would be like this, IF Python's re 
engine supported "atomic groups" (but it doesn't):

a(?>.*?bc)(?>.*?d).*\Z

The same effect can be gotten in a more convoluted way, via positive lookahead 
assertions and backreferencing (which Python's re engine does support):

a(?=(.*?bc))\1(?=(.*?d))\2.*\Z

Assertions are "one and done":  if the overall match fails, assertions don't 
try alternatives.

So that's _a_ way to proceed.  I'm unclear on that it's worth it, though.  
Stuff like "*a*a*a*a*a*a*" is just hard to swallow as a shell pattern that 
would occur in real life.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue40480] "fnmatch" exponential execution time

2020-05-03 Thread Tim Peters


Tim Peters  added the comment:

Note that doctest has the same kind of potential problem with matching ellipsis 
(0 or more characters) in expected output blocks.  Backtracking isn't needed at 
all to resolve patterns of that limited kind, but I don't think Python's re 
module supports enough gimmicks to disable backtracking.

So instead doctest has its own

_ellipsis_match()

function to do it in worst-case linear time.

--
nosy: +tim.peters

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue40480] "fnmatch" exponential execution time

2020-05-03 Thread kleshni


kleshni  added the comment:

>So the solution is to replace multiple asterisks with a single one
No, this is not a solution. This glob also hangs:

fnmatch.fnmatchcase("a" * 32, "*a" * 16 + "b")

And again, Bash and SQLite 3 work perfectly.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue40480] "fnmatch" exponential execution time

2020-05-03 Thread kleshni


New submission from kleshni :

Hello. The following code hangs:

import fnmatch
fnmatch.fnmatchcase("a" * 32, "*" * 16 + "b")

Measurements show that its execution time rises exponentially with the number 
of asterisks. Bash and SQLite 3 process this glob instantly.

This is because "fnmatchcase" generates a regular expression with repeated dots:

(?s:.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*b)\\Z

It's equivalent to:

(?s:.*b)\\Z

But works in exponential time. So the solution is to replace multiple asterisks 
with a single one, so the latter regexp is generated instead.

--
components: Library (Lib)
messages: 367963
nosy: kleshni
priority: normal
severity: normal
status: open
title: "fnmatch" exponential execution time
type: performance
versions: Python 3.8

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com