New issue 2777: re: incorrect behaviour for long patterns that are used
repeatedly (possible JIT bug?)
https://bitbucket.org/pypy/pypy/issues/2777/re-incorrect-behaviour-for-long-patterns
Andrew Stepanov:
I've observed that the `re` module gives incorrect results for very long patterns
that are used repeatedly (possibly a JIT bug?).
The following code raises an error on both pypy2 and pypy3 (latest version from
mercurial), although it passes on CPython 3.5:
```
import re

pattern = ".a" * 2500
text = "a" * 6000
match = re.compile(pattern).match
for idx in range(len(text) - len(pattern) + 1):
    substr = text[idx:idx+len(pattern)]
    if match(substr) is None:
        raise RuntimeError("This shouldn't have happened at {}".format(idx))
```
```
Traceback (most recent call last):
  File "pypy_re_bug.py", line 9, in <module>
    raise RuntimeError("This shouldn't have happened at {}".format(idx))
RuntimeError: This shouldn't have happened at 632
```
This also happens for other long patterns (I tried `pattern = "." * 5000`,
`pattern = "a" * 5000`, and random strings over the `{".", "a"}` alphabet of
length >= 5000).
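A minimal sketch for checking each of the pattern families above in a single run, assuming the same all-`a` text of length 6000 as the original script. The helper name `first_failure` and the fixed random seed are hypothetical additions for illustration, not part of the original report; on CPython every pattern should report "ok".

```python
import random
import re


def first_failure(pattern, text):
    """Return the first window offset where `pattern` fails to match, or None.

    Mirrors the loop from the report: slide a window of len(pattern)
    characters over `text` and call the compiled pattern's match() on it.
    """
    match = re.compile(pattern).match
    width = len(pattern)
    for idx in range(len(text) - width + 1):
        if match(text[idx:idx + width]) is None:
            return idx
    return None


# Pattern families mentioned in the report. The random one uses a fixed,
# arbitrarily chosen seed so the run is reproducible.
rng = random.Random(2777)
patterns = {
    '".a" * 2500': ".a" * 2500,
    '"." * 5000': "." * 5000,
    '"a" * 5000': "a" * 5000,
    'random {".", "a"} * 5000': "".join(rng.choice(".a") for _ in range(5000)),
}
text = "a" * 6000

for name, pattern in patterns.items():
    idx = first_failure(pattern, text)
    status = "ok" if idx is None else "FAILED at {}".format(idx)
    print("{}: {}".format(name, status))
```

Every pattern here consists only of `.` and `a`, so each window of the all-`a` text should match; any non-`None` offset indicates the bug.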
The exact iteration at which the error occurs can vary slightly; e.g., if I
move `match = re.compile(pattern).match` inside the loop, I get the exception
at iteration 668 on pypy3 and 643 on pypy2.
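A sketch of the variant described above, with `re.compile()` called on every iteration instead of once before the loop. The function name `first_failure_recompiling` is hypothetical. Note that `re.compile()` keeps an internal cache of compiled patterns, so this may hand back the same compiled object each time rather than building a fresh one.

```python
import re


def first_failure_recompiling(pattern, text):
    """Like the report's loop, but calls re.compile() on every iteration.

    Returns the first window offset where the match fails, or None.
    """
    width = len(pattern)
    for idx in range(len(text) - width + 1):
        # re.compile() caches compiled patterns, so repeated calls with the
        # same pattern string may return the same compiled object.
        if re.compile(pattern).match(text[idx:idx + width]) is None:
            return idx
    return None


idx = first_failure_recompiling(".a" * 2500, "a" * 6000)
print("ok" if idx is None else "failed at {}".format(idx))
```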
The code works fine for shorter patterns.
___
pypy-issue mailing list
pypy-issue@python.org
https://mail.python.org/mailman/listinfo/pypy-issue