[issue29995] re.escape() escapes too much

2019-01-31 Thread Salvo Tomaselli


Salvo Tomaselli added the comment:

Aaaand this broke my unit tests when moving from 3.6 to 3.7!
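
For anyone hitting the same breakage: a test that compares the output of re.escape() with a hard-coded string depends on the Python version. A minimal sketch of a check that holds on both 3.6 and 3.7, assuming the test only needs the escaped text to match the original literally (the helper name assert_escape_round_trips is hypothetical):

```python
import re

def assert_escape_round_trips(text):
    """Hypothetical test helper: check behavior, not the exact escaped text."""
    pattern = re.escape(text)
    # The escaped pattern must match the original text exactly ...
    assert re.fullmatch(pattern, text) is not None
    # ... and must still do so when the pattern is compiled in verbose mode.
    assert re.fullmatch(pattern, text, re.VERBOSE) is not None

# Brittle: the exact escaped form of "/" differs between 3.6 and 3.7.
#     assert re.escape("a/b") == "a\\/b"
# Robust: passes on both versions.
assert_escape_round_trips("a/b")
assert_escape_round_trips("http://example.com/?q=1#top")
```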

--
nosy: +LtWorf

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue29995] re.escape() escapes too much

2017-06-11 Thread Terry J. Reedy

Terry J. Reedy added the comment:

Serhiy, please nosy me when you change idlelib files.

--
versions: +Python 3.6




[issue29995] re.escape() escapes too much

2017-06-11 Thread Terry J. Reedy

Terry J. Reedy added the comment:


New changeset a895f91a46c65a6076e8c6a28af0df1a07ed60a2 by terryjreedy in branch '3.6':
[3.6]bpo-29995: Adjust IDLE test for 3.7 re.escape change [GH-1007] (#2114)
https://github.com/python/cpython/commit/a895f91a46c65a6076e8c6a28af0df1a07ed60a2


--
nosy: +terry.reedy




[issue29995] re.escape() escapes too much

2017-06-11 Thread Terry J. Reedy

Changes by Terry J. Reedy:


--
pull_requests: +2167




[issue29995] re.escape() escapes too much

2017-04-13 Thread Serhiy Storchaka

Changes by Serhiy Storchaka:


--
resolution:  -> fixed
stage: patch review -> resolved
status: open -> closed




[issue29995] re.escape() escapes too much

2017-04-13 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:


New changeset 5908300e4b0891fc5ab8bd24fba8fac72012eaa7 by Serhiy Storchaka in branch 'master':
bpo-29995: re.escape() now escapes only special characters. (#1007)
https://github.com/python/cpython/commit/5908300e4b0891fc5ab8bd24fba8fac72012eaa7


--




[issue29995] re.escape() escapes too much

2017-04-12 Thread Serhiy Storchaka

Changes by Serhiy Storchaka:


--
assignee:  -> serhiy.storchaka
dependencies: +Add examples for re.escape()




[issue29995] re.escape() escapes too much

2017-04-05 Thread Serhiy Storchaka

Changes by Serhiy Storchaka:


--
pull_requests: +1175




[issue29995] re.escape() escapes too much

2017-04-05 Thread Serhiy Storchaka

New submission from Serhiy Storchaka:

re.escape() escapes every character except ASCII letters, digits and '_'. 
This is excessive: it makes escaping and compiling slower and makes the 
pattern less human-readable. The characters "!\"%&\',/:;<=>@_`~" and all 
non-ASCII characters are always literal in a regular expression and don't need 
escaping.
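
The claim that these characters are always literal can be checked directly. The sketch below uses each listed character, plus one non-ASCII letter chosen only for illustration, as an unescaped one-character pattern and confirms that it matches itself, with and without re.VERBOSE:

```python
import re

# Punctuation listed above as always literal, plus one non-ASCII letter
# (the letter is an arbitrary illustrative choice).
always_literal = "!\"%&',/:;<=>@_`~" + "ї"

for ch in always_literal:
    # Used as an unescaped one-character pattern, each character matches itself.
    assert re.fullmatch(ch, ch) is not None
    # Verbose mode does not change that.
    assert re.fullmatch(ch, ch, re.VERBOSE) is not None

unescaped_matches = all(re.fullmatch(ch, ch) for ch in always_literal)
```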

The proposed patch makes re.escape() escape only the minimal set of characters 
that can have special meaning in regular expressions: the special characters 
".\\[]{}()*+?^$|", "-" (a range in a character set), "#" (starts a comment in 
verbose mode) and ASCII whitespace (ignored in verbose mode).
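
A rough sketch of the idea, not the code from the patch: build a translation table for the character set described above and apply it with str.translate(). The names SPECIAL, ESCAPE_TABLE and minimal_escape are made up for this illustration, and only str input is handled (bytes would need separate treatment).

```python
import re

# Assumed from the description above, not copied from the actual patch:
# regex special characters, "-", "#", and ASCII whitespace.
SPECIAL = ".\\[]{}()*+?^$|" + "-" + "#" + " \t\n\r\v\f"
ESCAPE_TABLE = {ord(ch): "\\" + ch for ch in SPECIAL}

def minimal_escape(text):
    """Hypothetical helper: backslash-escape only the characters in SPECIAL."""
    return text.translate(ESCAPE_TABLE)

sample = "price: $5.00 (approx) # a-z\t100%"
escaped = minimal_escape(sample)

# The escaped text matches the original, in normal and in verbose mode.
assert re.fullmatch(escaped, sample) is not None
assert re.fullmatch(escaped, sample, re.VERBOSE) is not None

# Because "-" is escaped, it stays literal inside a character set.
char_set = "[" + minimal_escape("a-z") + "]"
assert re.fullmatch(char_set, "-") is not None
assert re.fullmatch(char_set, "m") is None
```

Everything outside SPECIAL, such as ":" and "%" in the sample, passes through unchanged, which is what keeps the escaped pattern readable.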

The null character no longer needs special escaping.
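
A quick check that a raw null character works as a literal in a pattern, which is why no special handling is needed:

```python
import re

text = "a\0b"

# A raw NUL inside the pattern is treated as an ordinary literal character.
assert re.fullmatch(text, text) is not None

# Whatever form re.escape() chooses for NUL, the result still matches the text.
assert re.fullmatch(re.escape(text), text) is not None
```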

The patch also speeds up re.escape() itself, even for input where it produces 
the same result as before.

$ ./python -m perf timeit -s 'from re import escape; s = "()[]{}?*+-|^$\\.# \t\n\r\v\f"' -- --duplicate 100 'escape(s)'
Unpatched:  Median +- std dev: 42.2 us +- 0.8 us
Patched:    Median +- std dev: 11.4 us +- 0.1 us

$ ./python -m perf timeit -s 'from re import escape; s = b"()[]{}?*+-|^$\\.# \t\n\r\v\f"' -- --duplicate 100 'escape(s)'
Unpatched:  Median +- std dev: 38.7 us +- 0.7 us
Patched:    Median +- std dev: 18.4 us +- 0.2 us

$ ./python -m perf timeit -s 'from re import escape; s = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"' -- --duplicate 100 'escape(s)'
Unpatched:  Median +- std dev: 40.3 us +- 0.5 us
Patched:    Median +- std dev: 33.1 us +- 0.6 us

$ ./python -m perf timeit -s 'from re import escape; s = b"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"' -- --duplicate 100 'escape(s)'
Unpatched:  Median +- std dev: 54.4 us +- 0.7 us
Patched:    Median +- std dev: 40.6 us +- 0.5 us

$ ./python -m perf timeit -s 'from re import escape; s = "абвгґдеєжзиіїйклмнопрстуфхцчшщьюяАБВГҐДЕЄЖЗИІЇЙКЛМНОПРСТУФХЦЧШЩЬЮЯ"' -- --duplicate 100 'escape(s)'
Unpatched:  Median +- std dev: 156 us +- 3 us
Patched:    Median +- std dev: 43.5 us +- 0.5 us

$ ./python -m perf timeit -s 'from re import escape; s = "абвгґдеєжзиіїйклмнопрстуфхцчшщьюяАБВГҐДЕЄЖЗИІЇЙКЛМНОПРСТУФХЦЧШЩЬЮЯ".encode()' -- --duplicate 100 'escape(s)'
Unpatched:  Median +- std dev: 200 us +- 4 us
Patched:    Median +- std dev: 77.0 us +- 0.6 us

Compiling the escaped string is faster too:

$ ./python -m perf timeit -s 'from re import escape; from sre_compile import compile; s = "абвгґдеєжзиіїйклмнопрстуфхцчшщьюяАБВГҐДЕЄЖЗИІЇЙКЛМНОПРСТУФХЦЧШЩЬЮЯ"; p = escape(s)' -- --duplicate 100 'compile(p)'
Unpatched:  Median +- std dev: 1.96 ms +- 0.02 ms
Patched:    Median +- std dev: 1.16 ms +- 0.02 ms

$ ./python -m perf timeit -s 'from re import escape; from sre_compile import compile; s = "абвгґдеєжзиіїйклмнопрстуфхцчшщьюяАБВГҐДЕЄЖЗИІЇЙКЛМНОПРСТУФХЦЧШЩЬЮЯ".encode(); p = escape(s)' -- --duplicate 100 'compile(p)'
Unpatched:  Median +- std dev: 3.69 ms +- 0.04 ms
Patched:    Median +- std dev: 2.13 ms +- 0.03 ms
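
To get comparable numbers without the perf module, the measurements above can be approximated with the standard timeit module. This is only a sketch: the helper names are invented for the example, the timings are much noisier than perf's, and re.purge() followed by re.compile() stands in for the direct sre_compile.compile() call used above so that the regex cache does not hide the compilation cost. Run it under the interpreters you want to compare.

```python
import re
import timeit

# Same kinds of input as in the measurements above.
samples = {
    "special": "()[]{}?*+-|^$\\.# \t\n\r\v\f",
    "ascii": "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789",
    "cyrillic": "абвгґдеєжзиіїйклмнопрстуфхцчшщьюяАБВГҐДЕЄЖЗИІЇЙКЛМНОПРСТУФХЦЧШЩЬЮЯ",
}

def seconds_per_call(func, arg, number=200):
    """Hypothetical helper: mean wall-clock time of func(arg) over `number` calls."""
    return timeit.timeit(lambda: func(arg), number=number) / number

def compile_uncached(pattern):
    """Hypothetical helper: empty the regex cache so the pattern is really compiled."""
    re.purge()
    return re.compile(pattern)

results = {}
for name, text in samples.items():
    escape_time = seconds_per_call(re.escape, text)
    compile_time = seconds_per_call(compile_uncached, re.escape(text))
    results[name] = (escape_time, compile_time)
    print(f"{name:9} escape: {escape_time * 1e6:8.2f} us   "
          f"compile: {compile_time * 1e6:8.2f} us")
```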

--
components: Library (Lib), Regular Expressions
messages: 291177
nosy: ezio.melotti, mrabarnett, serhiy.storchaka
priority: normal
severity: normal
stage: patch review
status: open
title: re.escape() escapes too much
type: enhancement
versions: Python 3.7
