[issue30148] Pathological regex behaviour

2017-11-16 Thread Serhiy Storchaka

Change by Serhiy Storchaka :


--
nosy: +serhiy.storchaka
resolution:  -> wont fix
stage:  -> resolved
status: open -> closed

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30148] Pathological regex behaviour

2017-04-24 Thread Tim Peters

Tim Peters added the comment:

Yes, that example takes time exponential in the number of blanks to (fail to) 
match - each time you add a blank to `input`, it essentially doubles the time 
required.

It's _possible_ for an implementation to deduce that `(\s+)+` is an insanely 
inefficient way to spell `\s+`, like it's _possible_ for an implementation to 
deduce that 10**10**10 - 10**10**10 is an insanely inefficient way to spell 0.

Python's does not.  To understand what's going on, Friedl's book "Mastering 
Regular Expressions" is an excellent source.

--
nosy: +tim.peters

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30148] Pathological regex behaviour

2017-04-24 Thread Jussi Pakkanen

Jussi Pakkanen added the comment:

This is slow even when ignores is set to a non-empty value. It's not as slow 
but the real slowdown is in the whitespace regex. Here is a minimal sample:

input = '  abc'
re.search(r'(\s+)+d', input)

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30148] Pathological regex behaviour

2017-04-23 Thread Matthew Barnett

Matthew Barnett added the comment:

If 'ignores' is '', you get this:

(?:\b(?:extern|G_INLINE_FUNC|%s)\s*)

which can match an empty string, and it's tried repeatedly.

That's inadvisable.

There's also:

(?:\s+|\*)+

which can match whitespace in multiple ways.

That's inadvisable too.

If the pattern really doesn't match the string (and it doesn't!), then it won't 
find out until it has tried _all_ of the possibilities.

Some implementations, such as Perl's, have extra checks to try to reduce the 
problem.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30148] Pathological regex behaviour

2017-04-23 Thread Jussi Pakkanen

New submission from Jussi Pakkanen:

Attached is a script that runs a single regex against one line of text taking 
over 12 seconds.

If you run the exact same regex in Perl it finishes immediately.

The slowness has something to do with spaces. If you replace consecutive spaces 
in the input with one, the evaluation is immediate.

This bug was originally discovered here: 
https://bugzilla.gnome.org/show_bug.cgi?id=781569

--
components: Regular Expressions
files: retest.py
messages: 292181
nosy: ezio.melotti, jpakkane, mrabarnett
priority: normal
severity: normal
status: open
title: Pathological regex behaviour
type: resource usage
Added file: http://bugs.python.org/file46828/retest.py

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com