Roundup Robot devnull@devnull added the comment:
New changeset dda33191f7f5 by Ezio Melotti in branch 'default':
#2650: re.escape() no longer escapes the '_'.
http://hg.python.org/cpython/rev/dda33191f7f5
--
___
Python tracker rep...@bugs.python.org
Changes by Ezio Melotti ezio.melo...@gmail.com:
--
resolution: -> fixed
stage: needs patch -> committed/rejected
status: open -> closed
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue2650
___
Ezio Melotti ezio.melo...@gmail.com added the comment:
Georg, do you think a versionchanged note should be added for this? The change
is minor and the patch updates the documentation to reflect the change.
Roundup Robot devnull@devnull added the comment:
New changeset 1402c719b7cf by Ezio Melotti in branch '3.1':
#2650: Refactor the tests for re.escape.
http://hg.python.org/cpython/rev/1402c719b7cf
New changeset 9147f7ed75b3 by Ezio Melotti in branch '3.1':
#2650: Add tests with non-ascii chars
Roundup Robot devnull@devnull added the comment:
New changeset d52b1faa7b11 by Ezio Melotti in branch '2.7':
#2650: Refactor re.escape and its tests.
http://hg.python.org/cpython/rev/d52b1faa7b11
Ezio Melotti ezio.melo...@gmail.com added the comment:
I did a few more tests and using a re.sub seems indeed slower (the
implementation is just 4 lines though, and it's more readable):
wolf@hp:~/dev/py/3.1$ ./python -m timeit -s 'import re,string; escape_pattern =
Ezio Melotti ezio.melo...@gmail.com added the comment:
The attached patch (issue2650.diff) adds '_' to the list of chars that are not
escaped.
--
keywords: +patch
Added file: http://bugs.python.org/file21390/issue2650.diff
SilentGhost ghost@gmail.com added the comment:
I think these are two different questions:
1. What to escape
2. What to do about poor performance of the re.escape when re.sub is used
In my opinion, there isn't any justifiable reason to escape non-meta
characters: it doesn't affect
Ezio Melotti ezio.melo...@gmail.com added the comment:
re.escape and its tests can be refactored in 2.7/3.1, the '_' can be added to
the list of chars that are not escaped in 3.3.
I'll put together a patch and fix this unless someone thinks that the '_'
should be escaped in 3.3 too.
Ezio Melotti ezio.melo...@gmail.com added the comment:
I took a look at what other languages do, and it turned out that:
perl escapes [^A-Za-z_0-9] [0];
.net escapes the metachars and whitespace [1];
java escapes the metachars or escape sequences [2];
ruby escapes the metachars [3];
It might
SilentGhost ghost@gmail.com added the comment:
Here is the latest patch for test_re incorporating review suggestions by Ezio
and some improvements along the way.
--
Added file: http://bugs.python.org/file21096/test_re.diff
Changes by SilentGhost ghost@gmail.com:
Added file: http://bugs.python.org/file20860/test_re.diff
Changes by SilentGhost ghost@gmail.com:
Removed file: http://bugs.python.org/file20389/test_re.diff
SilentGhost ghost@gmail.com added the comment:
Here is the patch, including adjustment to the test.
--
Added file: http://bugs.python.org/file20388/issue2650.diff
Changes by SilentGhost ghost@gmail.com:
Removed file: http://bugs.python.org/file20388/issue2650.diff
SilentGhost ghost@gmail.com added the comment:
The naïve version of the proposed code was about 3 times slower than the existing
version. However, the test, I think, is valuable enough. So, I'm reinstating it.
--
Added file: http://bugs.python.org/file20389/test_re.diff
James Y Knight f...@users.sourceforge.net added the comment:
Show your speed test? Looks 2.5x faster to me. But I'm running this on python
2.6, so I guess it's possible that the re module's speed was decimated in Py3k.
python -m timeit -s $(printf import re\ndef escape(s):\n return
SilentGhost ghost@gmail.com added the comment:
James, I think the setup statement should have been:
import re\ndef escape(s):\n return re.sub(r'([][.^$*+?{}\\|()])', r'\\\1',
s))
note the raw string literals.
The timings that I got after applying file20388
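The point about raw string literals can be illustrated with a short sketch. This is not the patch attached to the issue; the names `_META` and `escape_meta` are hypothetical, and the character class is written with backslash-escaped brackets for readability rather than copied from the setup statement above.

```python
import re

# Hypothetical metacharacter-only pattern (an assumption for illustration).
# The brackets are backslash-escaped inside the character class.
_META = re.compile(r'([\[\].^$*+?{}\\|()])')

def escape_meta(s):
    """Prefix each regex metacharacter in s with a backslash."""
    # r'\\\1' is a literal backslash followed by group 1.
    # Without the r prefix, '\\\1' would be a backslash followed by chr(1).
    return _META.sub(r'\\\1', s)

sample = '!@#$%^*()'
pattern = escape_meta(sample)
print(pattern)
# The escaped pattern should match the original string literally.
print(re.fullmatch(pattern, sample) is not None)
```

Characters such as '_' and '/' are left alone by this sketch because they are not in the metacharacter class.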
James Y Knight f...@users.sourceforge.net added the comment:
Right you are, it seems that python's regexp implementation is terribly slow
when doing replacements with a substitution in them. (fixing the broken test,
as you pointed out changed the timing to 97.6 usec vs the in-error-reported
yeswanth swamiyeswa...@yahoo.com added the comment:
@James test results for py3k
python -m timeit -s $(printf import re\ndef escape(s):\n return
re.sub('([][.^$*+?{}\\|()])', '\\\1', s))
'escape(!@#$%^*()!@#$%^*()!@#$%^*()!@#$%^*()!@#$%^*()!@#$%^*()!@#$%^*()!@#$%^*())'
10 loops, best
Changes by A.M. Kuchling li...@amk.ca:
--
nosy: -akuchling
yeswanth swamiyeswa...@yahoo.com added the comment:
As James said, I have written the patch using only regular expressions. This is
going to be my first patch. I need help writing the test for it.
Antoine Pitrou pit...@free.fr added the comment:
> As James said I have written the patch using only regular expressions.
> This is going to be my first patch. I need help writing the test for it
You will find the current tests in Lib/test/test_re.py.
To execute them, run:
$ ./python -m
Changes by SilentGhost ghost@gmail.com:
--
nosy: +SilentGhost
Changes by yeswanth swamiyeswa...@yahoo.com:
--
nosy: +swamiyeswanth
Georg Brandl ge...@python.org added the comment:
The loop looks strange to me too, not to mention inefficient compared with a
regex replacement done in C.
--
nosy: +georg.brandl
Antoine Pitrou pit...@free.fr added the comment:
James, could you propose a proper patch? Even better if you also give a couple
of timing results, just for the record?
--
versions: +Python 3.2 -Python 2.7, Python 3.1
James Y Knight f...@users.sourceforge.net added the comment:
I just ran into the impl of escape after being surprised that '/' was being
escaped, and then was completely amazed that it wasn't just implemented as a
one-line re.subn. Come on, a loop for string replacement? This is *in* the
Matthew Barnett pyt...@mrabarnett.plus.com added the comment:
Re the regex module (issue #2636), would a good compromise be:
regex.escape(user_input, special_only=True)
to maintain compatibility?
--
nosy: +mrabarnett
Björn Lindqvist bjou...@gmail.com added the comment:
In my app, I need to transform the regexp created from user input so
that it matches unicode characters with their ascii equivalents. For
example, if someone searches for el nino, that should match the string
el ñino. Similarly, searching for
Changes by Ezio Melotti ezio.melo...@gmail.com:
--
nosy: +ezio.melotti
Changes by Jeffrey C. Jacobs [EMAIL PROTECTED]:
--
versions: +Python 2.7, Python 3.1 -Python 2.6, Python 3.0
Changes by Jeffrey C. Jacobs [EMAIL PROTECTED]:
--
nosy: +timehorse
Antoine Pitrou [EMAIL PROTECTED] added the comment:
The escaped regexp is not utf-8 (why should it be?)
I suppose it is annoying if you want to print the escaped regexp for
debugging purposes.
Anyway, I suppose someone should really decide if improving re.escape is
worth it, and if not, close
Morten Lied Johansen [EMAIL PROTECTED] added the comment:
In my particular case, we were passing the regex on to a database which
has regex support syntactically equal to Python, so it seemed natural
to use re.escape to make sure we weren't matching against the pattern
we really wanted.
The
Morten Lied Johansen [EMAIL PROTECTED] added the comment:
One issue that the current implementation has, which I can't see has
been commented on here, is that it kills utf8 characters (and probably
every other character encoding that is multi-byte).
An é character in a utf8 encoded string
Amaury Forgeot d'Arc [EMAIL PROTECTED] added the comment:
The escaped regexp is not utf-8 (why should it be?), but it still
matches the same bytes in the searched text, which has to be utf-8
encoded anyway:
text = u'été'.encode('utf-8')
regexp = u'é'.encode('utf-8')
re.findall(regexp, text)
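A Python 3 version of this check might look like the sketch below. It assumes bytes patterns and a UTF-8 source file; the variable names are chosen for illustration and are not from the thread.

```python
import re

text = 'été'.encode('utf-8')     # b'\xc3\xa9t\xc3\xa9'
needle = 'é'.encode('utf-8')     # b'\xc3\xa9'

plain = re.findall(needle, text)
escaped = re.findall(re.escape(needle), text)

# Both searches should find the same byte sequences, whether or not
# re.escape put backslashes in front of the non-ASCII bytes.
print(plain == escaped)
print(len(escaped))
```

The escaped pattern may not itself be valid UTF-8, but it is only ever compared byte for byte against the encoded text.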
Antoine Pitrou [EMAIL PROTECTED] added the comment:
Talking about performance, why use a loop to escape special characters
when you could use a regular expression to escape them all at once?
--
nosy: +pitrou
Alexander Belopolsky [EMAIL PROTECTED] added the comment:
Lorenz's patch uses a set, not a list for special characters. Set
lookup is as fast as dict lookup, but a set takes less memory because it
does not have to store dummy values. More importantly, use of frozenset
instead of dict makes
Russ Cox [EMAIL PROTECTED] added the comment:
> Lorenz's patch uses a set, not a list for special characters. Set
> lookup is as fast as dict lookup, but a set takes less memory because it
> does not have to store dummy values. More importantly, use of frozenset
> instead of dict makes the code
Alexander Belopolsky [EMAIL PROTECTED] added the comment:
On Thu, May 8, 2008 at 10:36 AM, Russ Cox [EMAIL PROTECTED] wrote:
..
The title of this issue (#2650) is "re.escape should not escape underscore",
not "re.escape is too slow and too easy to read".
Neither does the title say re.escape
Russ Cox [EMAIL PROTECTED] added the comment:
You don't need to get so defensive. I did not raise a performance
problem, I was simply responding to Rafael's "AFAIK the lookup on
dictionaries is faster than on lists" comment. I did not say that you
*should* rewrite your patch the way I
Russ Cox [EMAIL PROTECTED] added the comment:
On Thu, May 8, 2008 at 12:12 PM, Alexander Belopolsky
[EMAIL PROTECTED] wrote:
Alexander Belopolsky [EMAIL PROTECTED] added the comment:
On Thu, May 8, 2008 at 11:45 AM, Russ Cox [EMAIL PROTECTED] wrote:
..
My argument is only that Python
A.M. Kuchling [EMAIL PROTECTED] added the comment:
I haven't assessed the patch, but wouldn't mind to see it applied to
an alpha release or to 3.0; +0 from me. Given that the next 2.6 release
is planned to be a beta, though, the release manager would have to rule.
Note that I don't think this
Rafael Zanella [EMAIL PROTECTED] added the comment:
AFAIK the lookup on dictionaries is faster than on lists.
Patch added, mainly a compilation of the previous patches with an
expanded test.
--
nosy: +zanella
Added file: http://bugs.python.org/file10215/re_patch.diff
Lorenz Quack [EMAIL PROTECTED] added the comment:
The loop in escape should really use enumerate
instead of for i in range(len(pattern)).
It needs i to edit s[i].
enumerate(iterable) returns a tuple for each element in iterable
containing the index and the element itself.
I attached a
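The enumerate-based loop under discussion can be sketched roughly as follows. This is a simplified illustration, not the code in Lib/re.py or in any attached patch: `_SAFE` and `escape_loop` are hypothetical names, and any special-casing the real function performs is omitted.

```python
import string

# Characters left untouched; a frozenset gives constant-time membership tests.
_SAFE = frozenset(string.ascii_letters + string.digits + '_')

def escape_loop(pattern):
    """Backslash-escape every character of pattern that is not in _SAFE."""
    chars = list(pattern)
    # enumerate yields (index, element) pairs, so the index is still
    # available for the in-place assignment chars[i] = ...
    for i, c in enumerate(chars):
        if c not in _SAFE:
            chars[i] = '\\' + c
    return ''.join(chars)

print(escape_loop('a_b.c'))
```

Using enumerate keeps the index needed for the assignment while avoiding the `range(len(pattern))` idiom.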
Russ Cox [EMAIL PROTECTED] added the comment:
> The loop in escape should really use enumerate
> instead of for i in range(len(pattern)).
It needs i to edit s[i].
> Instead of using a loop, can't the test just
> use self.assertEqual(re.escape(same), same)?
Done.
> Also, please add tests for
Changes by Russ Cox [EMAIL PROTECTED]:
--
keywords: +patch
Added file: http://bugs.python.org/file10080/re.patch
Benjamin Peterson [EMAIL PROTECTED] added the comment:
Thanks.
The loop in escape should really use enumerate instead of for i in
range(len(pattern)).
Instead of using a loop, can't the test just use
self.assertEqual(re.escape(same), same)? Also, please add tests for
what re.escape should
New submission from Russ Cox [EMAIL PROTECTED]:
import re
print re.escape("_")
Prints \_ but should be _.
This behavior differs from Perl and other systems: _ is an identifier
character and as such does not need to be escaped.
--
messages: 65585
nosy: rsc
severity: normal
status: open
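The reported behaviour can be checked with a short script. This is a minimal sketch, assuming Python 3.3 or later, where '_' is no longer escaped; the sample strings are made up for illustration.

```python
import re

# '_' is an identifier character, not a regex metacharacter,
# so escaping it changes nothing about what the pattern matches.
print(re.escape("_"))

# Real metacharacters are still escaped so that they match literally.
escaped = re.escape("a.b")
print(escaped)
print(re.fullmatch(escaped, "a.b") is not None)
print(re.fullmatch(escaped, "axb") is not None)
```

On interpreters older than 3.3 the first call would print the underscore with a leading backslash, which is the behaviour this report objects to.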
Changes by Russ Cox [EMAIL PROTECTED]:
--
components: +Regular Expressions
Changes by Guido van Rossum [EMAIL PROTECTED]:
--
keywords: +easy
Changes by Guido van Rossum [EMAIL PROTECTED]:
--
versions: +Python 2.6, Python 3.0 -Python 2.5
Benjamin Peterson [EMAIL PROTECTED] added the comment:
It seems that escape is pretty dumb. The documentation says that
re.escape escapes all non-alphanumeric characters, and it does that
faithfully. It would seem more useful to have a list of meta-characters
and just escape those. This is more
Russ Cox [EMAIL PROTECTED] added the comment:
> It seems that escape is pretty dumb. The documentation says that
> re.escape escapes all non-alphanumeric characters, and it does that
> faithfully. It would seem more useful to have a list of meta-characters
> and just escape those. This is more true