[issue18685] Restore re performance to pre-PEP393 level

2013-10-26 Thread Roundup Robot

Roundup Robot added the comment:

New changeset 66e2dfbb1d70 by Serhiy Storchaka in branch 'default':
Issue #18685: Restore re performance to pre-PEP 393 levels.
http://hg.python.org/cpython/rev/66e2dfbb1d70

--
nosy: +python-dev

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue18685
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18685] Restore re performance to pre-PEP393 level

2013-10-26 Thread STINNER Victor

STINNER Victor added the comment:

Sorry, I was busy with my tracemalloc PEP, I didn't have time to review your 
patch. I'm happy that you restored Python 3.2 performance! Thanks Serhiy.

--




[issue18685] Restore re performance to pre-PEP393 level

2013-10-26 Thread Roundup Robot

Roundup Robot added the comment:

New changeset 00e61cb3b11c by Serhiy Storchaka in branch 'default':
Issue #18685: Extract template part of _sre.c into separated sre_lib.h file.
http://hg.python.org/cpython/rev/00e61cb3b11c

--




[issue18685] Restore re performance to pre-PEP393 level

2013-10-26 Thread Antoine Pitrou

Antoine Pitrou added the comment:

Didn't you forget to add sre_lib.h?

--




[issue18685] Restore re performance to pre-PEP393 level

2013-10-26 Thread Antoine Pitrou

Antoine Pitrou added the comment:

Ah, sorry, no. I was fooled by the commit e-mail.

--




[issue18685] Restore re performance to pre-PEP393 level

2013-10-26 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Yes, the commit e-mail looks queer.

--




[issue18685] Restore re performance to pre-PEP393 level

2013-10-26 Thread Antoine Pitrou

Antoine Pitrou added the comment:

I suppose this issue can be fixed then. Thanks for doing this!

--




[issue18685] Restore re performance to pre-PEP393 level

2013-10-26 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Thank you for your review Antoine and Victor.

--
resolution:  -> fixed
stage: commit review -> committed/rejected
status: open -> closed




[issue18685] Restore re performance to pre-PEP393 level

2013-10-25 Thread Antoine Pitrou

Antoine Pitrou added the comment:

Posted review on Rietveld. See also issue #19387.

--




[issue18685] Restore re performance to pre-PEP393 level

2013-10-25 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Updated patch addresses Antoine's comments.

--
Added file: http://bugs.python.org/file32363/sre_optimize_3.patch




[issue18685] Restore re performance to pre-PEP393 level

2013-10-25 Thread Antoine Pitrou

Antoine Pitrou added the comment:

Looks good to me.

--
stage: patch review -> commit review




[issue18685] Restore re performance to pre-PEP393 level

2013-10-24 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Please review this patch. I will extract the template part into a separate file
in a separate commit.

--




[issue18685] Restore re performance to pre-PEP393 level

2013-10-23 Thread Serhiy Storchaka

Changes by Serhiy Storchaka storch...@gmail.com:


--
dependencies:  -Pointers point out of array bound in _sre.c




[issue18685] Restore re performance to pre-PEP393 level

2013-10-21 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Rebased patch to tip and added non-ASCII tests for main re functions.

--
Added file: http://bugs.python.org/file32283/sre_optimize_2.patch




[issue18685] Restore re performance to pre-PEP393 level

2013-08-29 Thread Serhiy Storchaka

Changes by Serhiy Storchaka storch...@gmail.com:


--
dependencies: +Fix format specifiers for debug output in _sre.c, Pointers point 
out of array bound in _sre.c




[issue18685] Restore re performance to pre-PEP393 level

2013-08-25 Thread Tal Weiss

Changes by Tal Weiss t...@evature.com:


--
nosy: +Tal.Weiss




[issue18685] Restore re performance to pre-PEP393 level

2013-08-14 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

> Using #include "_sre.c" in _sre.c looks weird. Instead of huge sections
> delimited by #ifdef SRE_RECURSIVE, I would prefer something similar to the
> stringlib: .h template files included more than once. I also expect shorter
> files: _sre.c is close to 4000 lines of C code :-(

Agreed, but the patch would be larger and harder to keep synchronized and to
review in Rietveld. I'm going to solve other issues first (issue18647,
issue18672) before creating a large final patch.

> The value of SIZEOF_SRE_CHAR looks suspicious.

Good catch. Actually this macro is used only in order to skip some checks for
UCS4. It should not affect correctness, only possibly performance.

> Does test_re have some non-ASCII tests? If not, we should probably start by
> adding such tests!

There is a small number (about 10) of tests for non-ASCII data.

--




[issue18685] Restore re performance to pre-PEP393 level

2013-08-13 Thread STINNER Victor

STINNER Victor added the comment:

Using #include "_sre.c" in _sre.c looks weird. Instead of huge sections
delimited by #ifdef SRE_RECURSIVE, I would prefer something similar to the
stringlib: .h template files included more than once. I also expect shorter
files: _sre.c is close to 4000 lines of C code :-(

If you move code from _sre.c to a new file, you should use hg cp to keep the 
history. For the review, it's maybe better to continue with your SRE_RECURSIVE 
hack :)

--

#define SRE_CHAR Py_UCS1
#define SIZEOF_SRE_CHAR 1
..
#define SRE_CHAR Py_UCS2
#define SIZEOF_SRE_CHAR 1
...
#define SRE_CHAR Py_UCS4
#define SIZEOF_SRE_CHAR 1

The value of SIZEOF_SRE_CHAR looks suspicious.
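
For illustration only, a sketch of how each instantiation could declare a width
that matches its character type, assuming the intended values are the byte
widths 1, 2 and 4 (the thread does not state them). The typedefs and the
width_* helpers are hypothetical stand-ins for the real Py_UCS* types, and
_Static_assert requires a C11 compiler.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for Py_UCS1, Py_UCS2 and Py_UCS4. */
typedef uint8_t  ucs1_t;
typedef uint16_t ucs2_t;
typedef uint32_t ucs4_t;

/* Instantiation for 1-byte characters. */
#define SRE_CHAR ucs1_t
#define SIZEOF_SRE_CHAR 1
_Static_assert(sizeof(SRE_CHAR) == SIZEOF_SRE_CHAR, "SIZEOF_SRE_CHAR mismatch");
static size_t width_ucs1(void) { return SIZEOF_SRE_CHAR; }
#undef SRE_CHAR
#undef SIZEOF_SRE_CHAR

/* Instantiation for 2-byte characters. */
#define SRE_CHAR ucs2_t
#define SIZEOF_SRE_CHAR 2
_Static_assert(sizeof(SRE_CHAR) == SIZEOF_SRE_CHAR, "SIZEOF_SRE_CHAR mismatch");
static size_t width_ucs2(void) { return SIZEOF_SRE_CHAR; }
#undef SRE_CHAR
#undef SIZEOF_SRE_CHAR

/* Instantiation for 4-byte characters. */
#define SRE_CHAR ucs4_t
#define SIZEOF_SRE_CHAR 4
_Static_assert(sizeof(SRE_CHAR) == SIZEOF_SRE_CHAR, "SIZEOF_SRE_CHAR mismatch");
static size_t width_ucs4(void) { return SIZEOF_SRE_CHAR; }
#undef SRE_CHAR
#undef SIZEOF_SRE_CHAR
```

With a check like this, a copy-pasted "1" in the UCS2 or UCS4 section would
fail at compile time instead of silently changing which checks are skipped.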

Does test_re have some non-ASCII tests? If not, we should probably start by 
adding such tests!

--




[issue18685] Restore re performance to pre-PEP393 level

2013-08-09 Thread Antoine Pitrou

Changes by Antoine Pitrou pit...@free.fr:


--
nosy: +tim_one




[issue18685] Restore re performance to pre-PEP393 level

2013-08-09 Thread Antoine Pitrou

Antoine Pitrou added the comment:

I get the same kind of results as Serhiy:

$ python3.2 -m timeit -s "import re; f = re.compile(b'abc').search; x = b'x'*10" "f(x)"
1 loops, best of 3: 81.7 usec per loop
$ python3.2 -m timeit -s "import re; f = re.compile('abc').search; x = 'x'*10" "f(x)"
1 loops, best of 3: 31.1 usec per loop
$ python3.2 -m timeit -s "import re; f = re.compile('abc').search; x = '\u20ac'*10" "f(x)"
1 loops, best of 3: 31.1 usec per loop

Unpatched 3.4:

$ ./python -m timeit -s "import re; f = re.compile(b'abc').search; x = b'x'*10" "f(x)"
1 loops, best of 3: 81.6 usec per loop
$ ./python -m timeit -s "import re; f = re.compile('abc').search; x = 'x'*10" "f(x)"
1 loops, best of 3: 163 usec per loop
$ ./python -m timeit -s "import re; f = re.compile('abc').search; x = '\u20ac'*10" "f(x)"
1 loops, best of 3: 190 usec per loop

Patched 3.4:

$ ./python -m timeit -s "import re; f = re.compile(b'abc').search; x = b'x'*10" "f(x)"
1 loops, best of 3: 54.4 usec per loop
$ ./python -m timeit -s "import re; f = re.compile('abc').search; x = 'x'*10" "f(x)"
1 loops, best of 3: 54.2 usec per loop
$ ./python -m timeit -s "import re; f = re.compile('abc').search; x = '\u20ac'*10" "f(x)"
1 loops, best of 3: 54.5 usec per loop

--
nosy: +pitrou




[issue18685] Restore re performance to pre-PEP393 level

2013-08-09 Thread Matthew Barnett

Matthew Barnett added the comment:

@Antoine: Are you on the same OS as Serhiy?

IIRC, wasn't the performance regression that wxjmfauth complained about in 
Python 3.3 apparent on Windows, but not on Linux?

--




[issue18685] Restore re performance to pre-PEP393 level

2013-08-09 Thread Antoine Pitrou

Antoine Pitrou added the comment:

> @Antoine: Are you on the same OS as Serhiy?

I don't know, I'm under Linux with gcc (on an x86-64 machine) :-)

> IIRC, wasn't the performance regression that wxjmfauth complained
> about in Python 3.3 apparent on Windows, but not on Linux?

I don't know, but I'm not willing to give any attention to something
reported by jmfauth. He's very far in the trollzone, as far as I'm
concerned.

However, if you are under Windows and can give it a try, it would be
nice to have performance numbers for Serhiy's patch.

--




[issue18685] Restore re performance to pre-PEP393 level

2013-08-09 Thread STINNER Victor

Changes by STINNER Victor victor.stin...@gmail.com:


--
nosy: +haypo




[issue18685] Restore re performance to pre-PEP393 level

2013-08-09 Thread Matthew Barnett

Matthew Barnett added the comment:

With the patch the results are:

C:\Python34\python.exe -m timeit -s "import re; f = re.compile(b'abc').search; x = b'x'*10" "f(x)"
1 loops, best of 3: 113 usec per loop

C:\Python34\python.exe -m timeit -s "import re; f = re.compile('abc').search; x = 'x'*10" "f(x)"
1 loops, best of 3: 113 usec per loop

C:\Python34\python.exe -m timeit -s "import re; f = re.compile('abc').search; x = '\u20ac'*10" "f(x)"
1 loops, best of 3: 113 usec per loop

I'm most impressed! :-)

--




[issue18685] Restore re performance to pre-PEP393 level

2013-08-09 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

I'm under 32-bit Linux with gcc 4.6.3.

The above test is only one example, the one for which I expect the largest
difference. I suppose other tests will show a gain too.

--




[issue18685] Restore re performance to pre-PEP393 level

2013-08-09 Thread Antoine Pitrou

Antoine Pitrou added the comment:

Ok, here are some results from the benchmarks suite:

Report on Linux fsol 3.8.0-27-generic #40-Ubuntu SMP Tue Jul 9 00:17:05 UTC 2013 x86_64 x86_64
Total CPU cores: 4

### regex_effbot ###
Min: 0.058952 -> 0.054367: 1.08x faster
Avg: 0.059060 -> 0.054378: 1.09x faster
Significant (t=132.69)
Stddev: 0.8 -> 0.1: 5.9597x smaller

### regex_v8 ###
Min: 0.063401 -> 0.050701: 1.25x faster
Avg: 0.066147 -> 0.053530: 1.24x faster
Significant (t=3.22)
Stddev: 0.00608 -> 0.00630: 1.0363x larger

The following not significant results are hidden, use -v to show them:
regex_compile.

--




[issue18685] Restore re performance to pre-PEP393 level

2013-08-08 Thread Serhiy Storchaka

New submission from Serhiy Storchaka:

Before PEP 393 the regex functions scanned an array of char or Py_UNICODE and
character testing was cheap. After PEP 393 they check the kind of a unicode
string for every tested character, so processing of unicode strings became
slower. _sre.c already generates two sets of functions from one source -- for
byte and unicode strings. The proposed patch uses the same technique to generate
three sets of functions -- for byte/UCS1, UCS2 and UCS4 strings. This
simplifies the code (it is now more similar to the pre-PEP 393 version) and
makes character testing faster.
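
A minimal sketch of the technique, not taken from the patch: one function body
is written once as a macro "template" and expanded for each character width, so
the string kind is checked once per call instead of once per character. The
names ucs1_t/ucs2_t/ucs4_t, DEFINE_FIND and find_char are hypothetical
stand-ins; the real code works on Py_UCS1/Py_UCS2/Py_UCS4 and is far larger.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for Py_UCS1, Py_UCS2 and Py_UCS4. */
typedef uint8_t  ucs1_t;
typedef uint16_t ucs2_t;
typedef uint32_t ucs4_t;

/* "Template": expands to one search function specialized for CHAR_T.
 * The inner loop reads s[i] directly, with no per-character kind test. */
#define DEFINE_FIND(NAME, CHAR_T)                                   \
    static ptrdiff_t NAME(const CHAR_T *s, size_t n, uint32_t ch)   \
    {                                                               \
        for (size_t i = 0; i < n; i++) {                            \
            if (s[i] == ch)                                         \
                return (ptrdiff_t)i;                                \
        }                                                           \
        return -1;                                                  \
    }

/* Three instantiations generated from the same source text. */
DEFINE_FIND(find_ucs1, ucs1_t)
DEFINE_FIND(find_ucs2, ucs2_t)
DEFINE_FIND(find_ucs4, ucs4_t)

/* Dispatch on the character width (in bytes) once per call. */
static ptrdiff_t find_char(const void *data, int kind, size_t n, uint32_t ch)
{
    switch (kind) {
    case 1: return find_ucs1((const ucs1_t *)data, n, ch);
    case 2: return find_ucs2((const ucs2_t *)data, n, ch);
    case 4: return find_ucs4((const ucs4_t *)data, n, ch);
    default: return -1;  /* unknown width */
    }
}
```

A caller would pass the buffer together with its width, for example
find_char(buf, 2, len, 0x20ac) for a UCS2 buffer.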

Benchmark example:

Python 3.2:
$ python3.2 -m timeit -s "import re; f = re.compile(b'abc').search; x = b'x'*10" "f(x)"
1000 loops, best of 3: 613 usec per loop
$ python3.2 -m timeit -s "import re; f = re.compile('abc').search; x = 'x'*10" "f(x)"
1000 loops, best of 3: 232 usec per loop
$ python3.2 -m timeit -s "import re; f = re.compile('abc').search; x = '\u20ac'*10" "f(x)"
1000 loops, best of 3: 217 usec per loop

Python 3.4.0a1+ unpatched:
$ ./python -m timeit -s "import re; f = re.compile(b'abc').search; x = b'x'*10" "f(x)"
1000 loops, best of 3: 485 usec per loop
$ ./python -m timeit -s "import re; f = re.compile('abc').search; x = 'x'*10" "f(x)"
1000 loops, best of 3: 790 usec per loop
$ ./python -m timeit -s "import re; f = re.compile('abc').search; x = '\u20ac'*10" "f(x)"
1000 loops, best of 3: 1.09 msec per loop

Python 3.4.0a1+ patched:
$ ./python -m timeit -s "import re; f = re.compile(b'abc').search; x = b'x'*10" "f(x)"
1000 loops, best of 3: 250 usec per loop
$ ./python -m timeit -s "import re; f = re.compile('abc').search; x = 'x'*10" "f(x)"
1000 loops, best of 3: 250 usec per loop
$ ./python -m timeit -s "import re; f = re.compile('abc').search; x = '\u20ac'*10" "f(x)"
1000 loops, best of 3: 256 usec per loop

For simplicity, I also propose extracting the template part of _sre.c into a
separate file (e.g. srelib.h) and getting rid of the recursion.

--
assignee: serhiy.storchaka
components: Regular Expressions, Unicode
files: sre_optimize.patch
keywords: patch
messages: 194669
nosy: ezio.melotti, mrabarnett, serhiy.storchaka
priority: normal
severity: normal
stage: patch review
status: open
title: Restore re performance to pre-PEP393 level
type: performance
versions: Python 3.4
Added file: http://bugs.python.org/file31198/sre_optimize.patch




[issue18685] Restore re performance to pre-PEP393 level

2013-08-08 Thread Matthew Barnett

Matthew Barnett added the comment:

It appears that in your tests Python 3.2 is faster with Unicode than 
bytestrings and that unpatched Python 3.4 is a lot slower.

I get somewhat different results (Windows XP Pro, 32-bit):

C:\Python32\python.exe -m timeit -s "import re; f = re.compile(b'abc').search; x = b'x'*10" "f(x)"
1000 loops, best of 3: 449 usec per loop

C:\Python32\python.exe -m timeit -s "import re; f = re.compile('abc').search; x = 'x'*10" "f(x)"
1000 loops, best of 3: 506 usec per loop

C:\Python32\python.exe -m timeit -s "import re; f = re.compile('abc').search; x = '\u20ac'*10" "f(x)"
1000 loops, best of 3: 506 usec per loop


C:\Python34\python.exe -m timeit -s "import re; f = re.compile(b'abc').search; x = b'x'*10" "f(x)"
1000 loops, best of 3: 227 usec per loop

C:\Python34\python.exe -m timeit -s "import re; f = re.compile('abc').search; x = 'x'*10" "f(x)"
1000 loops, best of 3: 339 usec per loop

C:\Python34\python.exe -m timeit -s "import re; f = re.compile('abc').search; x = '\u20ac'*10" "f(x)"
1000 loops, best of 3: 504 usec per loop

For comparison, in the regex module I don't duplicate whole sections of code, 
but instead have a pointer to one of 3 functions (for UCS1, UCS2 and UCS4) that 
gets the codepoint, except for some tight loops. Doing that might be too much 
of a change for re.
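
A rough sketch of that alternative, assuming nothing about the regex module's
actual source: a getter function is selected once per subject string and then
called through a pointer for each position. All names here (char_at_fn,
select_char_at, find_char) are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Reads the code point at index pos from a buffer of fixed-width units. */
typedef uint32_t (*char_at_fn)(const void *text, size_t pos);

static uint32_t char_at_ucs1(const void *text, size_t pos)
{
    return ((const uint8_t *)text)[pos];
}

static uint32_t char_at_ucs2(const void *text, size_t pos)
{
    return ((const uint16_t *)text)[pos];
}

static uint32_t char_at_ucs4(const void *text, size_t pos)
{
    return ((const uint32_t *)text)[pos];
}

/* Pick the getter once, based on the width in bytes of each character. */
static char_at_fn select_char_at(int width)
{
    switch (width) {
    case 1: return char_at_ucs1;
    case 2: return char_at_ucs2;
    case 4: return char_at_ucs4;
    default: return NULL;
    }
}

/* Generic matching code: a single copy, one indirect call per character. */
static ptrdiff_t find_char(const void *text, int width, size_t n, uint32_t ch)
{
    char_at_fn char_at = select_char_at(width);
    if (char_at == NULL)
        return -1;
    for (size_t i = 0; i < n; i++) {
        if (char_at(text, i) == ch)
            return (ptrdiff_t)i;
    }
    return -1;
}
```

The trade-off is a single copy of the matching code in exchange for an indirect
call per character, which is presumably why tight loops are handled separately.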

However, the speed appears to be a lot more consistent:

C:\Python32\python.exe -m timeit -s "import regex; f = regex.compile(b'abc').search; x = b'x'*10" "f(x)"
1 loops, best of 3: 113 usec per loop

C:\Python32\python.exe -m timeit -s "import regex; f = regex.compile('abc').search; x = 'x'*10" "f(x)"
1 loops, best of 3: 113 usec per loop

C:\Python32\python.exe -m timeit -s "import regex; f = regex.compile('abc').search; x = '\u20ac'*10" "f(x)"
1 loops, best of 3: 113 usec per loop


C:\Python34\python.exe -m timeit -s "import regex; f = regex.compile(b'abc').search; x = b'x'*10" "f(x)"
1 loops, best of 3: 113 usec per loop

C:\Python34\python.exe -m timeit -s "import regex; f = regex.compile('abc').search; x = 'x'*10" "f(x)"
1 loops, best of 3: 113 usec per loop

C:\Python34\python.exe -m timeit -s "import regex; f = regex.compile('abc').search; x = '\u20ac'*10" "f(x)"
1 loops, best of 3: 113 usec per loop

--
