[issue2650] re.escape should not escape underscore

2011-04-10 Thread Roundup Robot
Roundup Robot devnull@devnull added the comment: New changeset dda33191f7f5 by Ezio Melotti in branch 'default': #2650: re.escape() no longer escapes the _. http://hg.python.org/cpython/rev/dda33191f7f5 -- ___ Python tracker rep...@bugs.python.org

2011-04-10 Thread Ezio Melotti
Changes by Ezio Melotti ezio.melo...@gmail.com: -- resolution: -> fixed, stage: needs patch -> committed/rejected, status: open -> closed

2011-04-03 Thread Ezio Melotti
Ezio Melotti ezio.melo...@gmail.com added the comment: Georg, do you think a versionchanged note should be added for this? The change is minor and the patch updates the documentation to reflect the change.

2011-03-25 Thread Roundup Robot
Roundup Robot devnull@devnull added the comment: New changeset 1402c719b7cf by Ezio Melotti in branch '3.1': #2650: Refactor the tests for re.escape. http://hg.python.org/cpython/rev/1402c719b7cf New changeset 9147f7ed75b3 by Ezio Melotti in branch '3.1': #2650: Add tests with non-ascii chars

2011-03-25 Thread Roundup Robot
Roundup Robot devnull@devnull added the comment: New changeset d52b1faa7b11 by Ezio Melotti in branch '2.7': #2650: Refactor re.escape and its tests. http://hg.python.org/cpython/rev/d52b1faa7b11

2011-03-25 Thread Ezio Melotti
Ezio Melotti ezio.melo...@gmail.com added the comment: I did a few more tests and using re.sub does indeed seem slower (the implementation is just 4 lines, though, and it's more readable): wolf@hp:~/dev/py/3.1$ ./python -m timeit -s 'import re,string; escape_pattern =
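A minimal Python 3 sketch of the two approaches being timed, for anyone who wants to reproduce this kind of comparison. The names escape_loop, escape_sub and _META are hypothetical, and the metacharacter set (characters that are special outside a character class) is an assumption, not the exact set used by any patch in this thread. Timings depend on the interpreter and the input, so the sketch only asserts that both versions produce identical output and prints the numbers.

```python
import re
import timeit

# Assumed set of characters that are special outside a character class.
_META = frozenset(".^$*+?{}[]\\|()")


def escape_loop(pattern: str) -> str:
    """Loop-based escape: replace each metacharacter in a list, in place."""
    chars = list(pattern)
    for i, c in enumerate(chars):
        if c in _META:
            chars[i] = "\\" + c
    return "".join(chars)


_META_RE = re.compile(r"([.^$*+?{}\[\]\\|()])")


def escape_sub(pattern: str) -> str:
    """re.sub-based escape: one substitution call with a \\1 template."""
    return _META_RE.sub(r"\\\1", pattern)


if __name__ == "__main__":
    sample = "!@#$%^&*()" * 8
    assert escape_loop(sample) == escape_sub(sample)
    for fn in (escape_loop, escape_sub):
        secs = timeit.timeit(lambda: fn(sample), number=10_000)
        print(f"{fn.__name__}: {secs:.3f}s for 10,000 calls")
```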

2011-03-25 Thread Ezio Melotti
Ezio Melotti ezio.melo...@gmail.com added the comment: The attached patch (issue2650.diff) adds '_' to the list of chars that are not escaped. -- keywords: +patch Added file: http://bugs.python.org/file21390/issue2650.diff

2011-03-14 Thread SilentGhost
SilentGhost ghost@gmail.com added the comment: I think these are two different questions: 1. what to escape; 2. what to do about the poor performance of re.escape when re.sub is used. In my opinion, there isn't any justifiable reason to escape non-meta characters: it doesn't affect
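A quick Python 3 check of the point that escaping a non-metacharacter does not change what a pattern matches (it assumes Python 3.4 or later for re.fullmatch). It also shows the one case where an unnecessary backslash does matter in modern Python: from 3.7 on, a backslash before an ASCII letter with no special meaning, such as \q, is rejected as a bad escape. The helper name is_bad_escape is hypothetical.

```python
import re
import sys

# Escaping a non-metacharacter is harmless: "\_" and "_" match the same text.
for escaped, plain in [(r"\_", "_"), (r"\/", "/"), (r"\@", "@")]:
    assert re.fullmatch(escaped, plain) is not None
    assert re.fullmatch(plain, plain) is not None


def is_bad_escape(pattern: str) -> bool:
    """Return True if the re module refuses to compile the pattern."""
    try:
        re.compile(pattern)
    except re.error:
        return True
    return False


# On Python 3.7+, an escaped ASCII letter with no special meaning is an error.
if sys.version_info >= (3, 7):
    assert is_bad_escape(r"\q")
```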

2011-03-14 Thread Ezio Melotti
Ezio Melotti ezio.melo...@gmail.com added the comment: re.escape and its tests can be refactored in 2.7/3.1, and the '_' can be added to the list of chars that are not escaped in 3.3. I'll put together a patch and fix this unless someone thinks that the '_' should be escaped in 3.3 too.

2011-03-13 Thread Ezio Melotti
Ezio Melotti ezio.melo...@gmail.com added the comment: I took a look at what other languages do, and it turned out that: Perl escapes [^A-Za-z_0-9] [0]; .NET escapes the metachars and whitespace [1]; Java escapes the metachars or escape sequences [2]; Ruby escapes the metachars [3]. It might
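For comparison, a Python 3 sketch of the Perl rule quoted above (escape everything outside [A-Za-z_0-9]). The function name quotemeta is borrowed from Perl for illustration only; it is not a Python API. Digits are deliberately left alone, since a backslash followed by a digit would be read by re as a backreference.

```python
import re

# Perl-style rule: backslash-escape every character outside [A-Za-z_0-9].
_NON_WORD_ASCII = re.compile(r"([^A-Za-z_0-9])")


def quotemeta(text: str) -> str:
    """Hypothetical Perl-like escape: keep ASCII word characters, escape the rest."""
    return _NON_WORD_ASCII.sub(r"\\\1", text)


escaped = quotemeta("a_b.c d")
print(escaped)  # a_b\.c\ d
# The result still matches the original text literally.
assert re.fullmatch(escaped, "a_b.c d") is not None
```

The underscore passes through untouched, which is exactly the behaviour this issue asks for; the price is that spaces and punctuation that are not special in Python patterns still get a backslash.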

2011-03-12 Thread SilentGhost
SilentGhost ghost@gmail.com added the comment: Here is the latest patch for test_re incorporating review suggestions by Ezio and some improvements along the way. -- Added file: http://bugs.python.org/file21096/test_re.diff

2011-02-23 Thread SilentGhost
Changes by SilentGhost ghost@gmail.com: Added file: http://bugs.python.org/file20860/test_re.diff

2011-02-23 Thread SilentGhost
Changes by SilentGhost ghost@gmail.com: Removed file: http://bugs.python.org/file20389/test_re.diff

2011-01-13 Thread SilentGhost
SilentGhost ghost@gmail.com added the comment: Here is the patch, including adjustment to the test. -- Added file: http://bugs.python.org/file20388/issue2650.diff

2011-01-13 Thread SilentGhost
Changes by SilentGhost ghost@gmail.com: Removed file: http://bugs.python.org/file20388/issue2650.diff

2011-01-13 Thread SilentGhost
SilentGhost ghost@gmail.com added the comment: The naïve version of the proposed code was about 3 times slower than the existing version. However, I think the test is valuable enough, so I'm reinstating it. -- Added file: http://bugs.python.org/file20389/test_re.diff

2011-01-13 Thread James Y Knight
James Y Knight f...@users.sourceforge.net added the comment: Show your speed test? Looks 2.5x faster to me. But I'm running this on Python 2.6, so I guess it's possible that the re module's speed was decimated in Py3k. python -m timeit -s $(printf import re\ndef escape(s):\n return

2011-01-13 Thread SilentGhost
SilentGhost ghost@gmail.com added the comment: James, I think the setup statement should have been: import re\ndef escape(s):\n return re.sub(r'([][.^$*+?{}\\|()])', r'\\\1', s) (note the raw string literals). The timings that I got after applying file20388
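Why the raw string literals matter, as a Python 3 sketch: without the r prefix, Python's own string-literal escapes are processed before re ever sees the text, so the replacement '\\\1' reaches re.sub as just two characters (a backslash and chr(1)). The escape function below reuses the setup statement from the comment above.

```python
import re

# Raw literal: backslash, backslash, backslash, "1" -> re sees "\\" then "\1".
assert len(r"\\\1") == 4
# Plain literal: "\\" becomes one backslash and "\1" becomes chr(1).
assert len("\\\1") == 2
assert "\\\1" == "\\" + chr(1)


def escape(s: str) -> str:
    # The corrected setup statement: pattern and template are both raw strings.
    return re.sub(r'([][.^$*+?{}\\|()])', r'\\\1', s)


assert escape("a.b(c)") == r"a\.b\(c\)"
```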

2011-01-13 Thread James Y Knight
James Y Knight f...@users.sourceforge.net added the comment: Right you are; it seems that Python's regexp implementation is terribly slow when doing replacements with a substitution in them. (Fixing the broken test, as you pointed out, changed the timing to 97.6 usec vs the in-error-reported
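Two ways to avoid template expansion in the replacement, sketched in Python 3: passing a function to re.sub, or skipping regular expressions entirely with str.translate and a precomputed table. The function names and the metacharacter set are assumptions for illustration; whether either is actually faster on a given interpreter would need to be measured with timeit, as in the thread.

```python
import re

# Assumed set of characters that are special outside a character class.
_META_CHARS = ".^$*+?{}[]\\|()"
_META_RE = re.compile(r"[.^$*+?{}\[\]\\|()]")
# str.translate table: code point -> backslash-escaped replacement string.
_ESCAPE_TABLE = {ord(c): "\\" + c for c in _META_CHARS}


def escape_callable(s: str) -> str:
    # The replacement is built by a Python function, so no template is parsed.
    return _META_RE.sub(lambda m: "\\" + m.group(0), s)


def escape_translate(s: str) -> str:
    # No regular expression at all: one str.translate call with the table.
    return s.translate(_ESCAPE_TABLE)


sample = "1+1=2 (really?) [yes]"
assert escape_callable(sample) == escape_translate(sample)
assert re.fullmatch(escape_translate(sample), sample) is not None
```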

2011-01-13 Thread yeswanth
yeswanth swamiyeswa...@yahoo.com added the comment: @James test results for py3k python -m timeit -s $(printf import re\ndef escape(s):\n return re.sub('([][.^$*+?{}\\|()])', '\\\1', s)) 'escape(!@#$%^*()!@#$%^*()!@#$%^*()!@#$%^*()!@#$%^*()!@#$%^*()!@#$%^*()!@#$%^*())' 10 loops, best

2011-01-13 Thread A.M. Kuchling
Changes by A.M. Kuchling li...@amk.ca: -- nosy: -akuchling

2011-01-12 Thread yeswanth
yeswanth swamiyeswa...@yahoo.com added the comment: As James said, I have written the patch using only regular expressions. This is going to be my first patch. I need help writing the test for it.

2011-01-12 Thread Antoine Pitrou
Antoine Pitrou pit...@free.fr added the comment: "As James said, I have written the patch using only regular expressions. This is going to be my first patch. I need help writing the test for it." You will find the current tests in Lib/test/test_re.py. To execute them, run: $ ./python -m

2011-01-12 Thread SilentGhost
Changes by SilentGhost ghost@gmail.com: -- nosy: +SilentGhost

2011-01-11 Thread yeswanth
Changes by yeswanth swamiyeswa...@yahoo.com: -- nosy: +swamiyeswanth

2011-01-08 Thread Georg Brandl
Georg Brandl ge...@python.org added the comment: The loop looks strange to me too, not to mention inefficient compared with a regex replacement done in C. -- nosy: +georg.brandl

2011-01-08 Thread Antoine Pitrou
Antoine Pitrou pit...@free.fr added the comment: James, could you propose a proper patch? Even better if you also give a couple of timing results, just for the record? -- versions: +Python 3.2 -Python 2.7, Python 3.1

2011-01-07 Thread James Y Knight
James Y Knight f...@users.sourceforge.net added the comment: I just ran into the impl of escape after being surprised that '/' was being escaped, and then was completely amazed that it wasn't just implemented as a one-line re.subn. Come on, a loop for string replacement? This is *in* the

2010-11-25 Thread Matthew Barnett
Matthew Barnett pyt...@mrabarnett.plus.com added the comment: Re the regex module (issue #2636), would a good compromise be: regex.escape(user_input, special_only=True) to maintain compatibility? -- nosy: +mrabarnett
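A hypothetical Python 3 sketch of the compromise proposed above: a flag whose default keeps roughly the old escape-every-non-alphanumeric behaviour, so existing callers see no change, while special_only=True opts in to escaping only metacharacters. This is a standalone function written for illustration; it is not the regex module's implementation, and the metacharacter set is an assumption.

```python
import re
import string

# Assumed set of characters that are special outside a character class.
_SPECIAL = frozenset(".^$*+?{}[]\\|()")
_ALNUM = frozenset(string.ascii_letters + string.digits)


def escape(text: str, special_only: bool = False) -> str:
    """Hypothetical escape with an opt-in special_only switch.

    special_only=False (default): escape every character that is not an
    ASCII letter or digit, like the old re.escape.
    special_only=True: escape only regex metacharacters.
    """
    out = []
    for ch in text:
        needs_escape = ch in _SPECIAL if special_only else ch not in _ALNUM
        out.append("\\" + ch if needs_escape else ch)
    return "".join(out)


user_input = "a_b.c"
assert escape(user_input) == r"a\_b\.c"
assert escape(user_input, special_only=True) == r"a_b\.c"
# Either form still matches the input literally.
assert re.fullmatch(escape(user_input), user_input) is not None
assert re.fullmatch(escape(user_input, special_only=True), user_input) is not None
```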

2009-09-12 Thread Björn Lindqvist
Björn Lindqvist bjou...@gmail.com added the comment: In my app, I need to transform the regexp created from user input so that it matches Unicode characters with their ASCII equivalents. For example, if someone searches for "el nino", that should match the string "el ñino". Similarly, searching for
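One way to approach the accent-insensitive search described above, sketched in Python 3: escape the user's query one character at a time and substitute a character class for letters that have accented equivalents, instead of post-processing an already escaped pattern. This is a swapped-in technique, not Björn's code. The EQUIVALENTS table and the function name are hypothetical, and the table is deliberately tiny; a real application would need a much fuller mapping (and re.IGNORECASE for case-insensitive matching).

```python
import re

# Hypothetical, deliberately tiny table: ASCII letter -> letters it should match.
EQUIVALENTS = {
    "a": "aáàâä",
    "e": "eéèêë",
    "i": "iíìîï",
    "n": "nñ",
    "o": "oóòôö",
    "u": "uúùûü",
}


def accent_insensitive_pattern(query: str) -> str:
    """Build a pattern from user input, widening letters with accented forms."""
    parts = []
    for ch in query:
        alts = EQUIVALENTS.get(ch)
        if alts:
            parts.append("[" + alts + "]")
        else:
            # Everything else is matched literally.
            parts.append(re.escape(ch))
    return "".join(parts)


pattern = accent_insensitive_pattern("el nino")
assert re.search(pattern, "el niño") is not None
assert re.search(pattern, "el ñino") is not None
```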

2009-04-29 Thread Ezio Melotti
Changes by Ezio Melotti ezio.melo...@gmail.com: -- nosy: +ezio.melotti

2008-09-28 Thread Jeffrey C. Jacobs
Changes by Jeffrey C. Jacobs [EMAIL PROTECTED]: -- versions: +Python 2.7, Python 3.1 -Python 2.6, Python 3.0

2008-09-28 Thread Jeffrey C. Jacobs
Changes by Jeffrey C. Jacobs [EMAIL PROTECTED]: -- nosy: +timehorse

2008-06-28 Thread Antoine Pitrou
Antoine Pitrou [EMAIL PROTECTED] added the comment: "The escaped regexp is not utf-8 (why should it be?)" I suppose it is annoying if you want to print the escaped regexp for debugging purposes. Anyway, I suppose someone should really decide if improving re.escape is worth it, and if not, close

2008-06-28 Thread Morten Lied Johansen
Morten Lied Johansen [EMAIL PROTECTED] added the comment: In my particular case, we were passing the regex on to a database which has regex support syntactically equal to Python, so it seemed natural to use re.escape to make sure we weren't matching against the pattern we really wanted. The

2008-06-26 Thread Morten Lied Johansen
Morten Lied Johansen [EMAIL PROTECTED] added the comment: One issue with the current implementation, which I can't see has been commented on here, is that it kills UTF-8 characters (and probably characters in every other multi-byte encoding). An é character in a UTF-8 encoded string

2008-06-26 Thread Amaury Forgeot d'Arc
Amaury Forgeot d'Arc [EMAIL PROTECTED] added the comment: The escaped regexp is not utf-8 (why should it be?), but it still matches the same bytes in the searched text, which has to be utf-8 encoded anyway: text = u"été".encode('utf-8'); regexp = u"é".encode('utf-8'); re.findall(regexp, text)
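The same check in Python 3, where text and pattern have to be bytes explicitly and the pattern is passed through re.escape. The assertion holds whether or not a given Python version escapes the non-ASCII bytes: either way the escaped pattern matches the same two-byte UTF-8 sequence.

```python
import re

text = "été".encode("utf-8")               # b'\xc3\xa9t\xc3\xa9'
pattern = re.escape("é".encode("utf-8"))    # bytes in, bytes out

matches = re.findall(pattern, text)
assert matches == ["é".encode("utf-8")] * 2
print(matches)  # [b'\xc3\xa9', b'\xc3\xa9']
```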

2008-06-14 Thread Antoine Pitrou
Antoine Pitrou [EMAIL PROTECTED] added the comment: Talking about performance, why use a loop to escape special characters when you could use a regular expression to escape them all at once? -- nosy: +pitrou

2008-05-08 Thread Alexander Belopolsky
Alexander Belopolsky [EMAIL PROTECTED] added the comment: Lorenz's patch uses a set, not a list for special characters. Set lookup is as fast as dict lookup, but a set takes less memory because it does not have to store dummy values. More importantly, use of frozenset instead of dict makes
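A Python 3 sketch of the frozenset idea (not Lorenz's actual patch): keep the special characters in a module-level frozenset and build the result with a generator expression. The character set and the function name are assumptions for illustration.

```python
import re

# frozenset: hashed membership test, no dummy values as a dict would need,
# and immutable, so the module-level constant cannot be changed by accident.
_SPECIAL_CHARS = frozenset("][.^$*+?{}\\|()")


def escape(pattern: str) -> str:
    """Backslash-escape only the characters in _SPECIAL_CHARS."""
    return "".join("\\" + c if c in _SPECIAL_CHARS else c for c in pattern)


assert escape("1+1=2") == r"1\+1=2"
assert re.fullmatch(escape("f(x) = [x]"), "f(x) = [x]") is not None
```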

2008-05-08 Thread Russ Cox
Russ Cox [EMAIL PROTECTED] added the comment: > Lorenz's patch uses a set, not a list for special characters. Set lookup is as fast as dict lookup, but a set takes less memory because it does not have to store dummy values. More importantly, use of frozenset instead of dict makes the code

2008-05-08 Thread Alexander Belopolsky
Alexander Belopolsky [EMAIL PROTECTED] added the comment: On Thu, May 8, 2008 at 10:36 AM, Russ Cox [EMAIL PROTECTED] wrote: .. The title of this issue (#2650) is "re.escape should not escape underscore", not "re.escape is too slow and too easy to read". Neither does the title say re.escape

2008-05-08 Thread Russ Cox
Russ Cox [EMAIL PROTECTED] added the comment: You don't need to get so defensive. I did not raise a performance problem; I was simply responding to Rafael's "AFAIK the lookup on dictionaries is faster than on lists" comment. I did not say that you *should* rewrite your patch the way I

2008-05-08 Thread Russ Cox
Russ Cox [EMAIL PROTECTED] added the comment: On Thu, May 8, 2008 at 12:12 PM, Alexander Belopolsky [EMAIL PROTECTED] wrote: Alexander Belopolsky [EMAIL PROTECTED] added the comment: On Thu, May 8, 2008 at 11:45 AM, Russ Cox [EMAIL PROTECTED] wrote: .. My argument is only that Python

2008-05-08 Thread A.M. Kuchling
A.M. Kuchling [EMAIL PROTECTED] added the comment: I haven't assessed the patch, but wouldn't mind seeing it applied to an alpha release or to 3.0; +0 from me. Given that the next 2.6 release is planned to be a beta, though, the release manager would have to rule. Note that I don't think this

2008-05-07 Thread Rafael Zanella
Rafael Zanella [EMAIL PROTECTED] added the comment: AFAIK the lookup on dictionaries is faster than on lists. Patch added, mainly a compilation of the previous patches with an expanded test. -- nosy: +zanella Added file: http://bugs.python.org/file10215/re_patch.diff

2008-04-28 Thread Lorenz Quack
Lorenz Quack [EMAIL PROTECTED] added the comment: "The loop in escape should really use enumerate instead of for i in range(len(pattern))." "It needs i to edit s[i]." enumerate(iterable) returns a tuple for each element in iterable containing the index and the element itself. I attached a

2008-04-24 Thread Russ Cox
Russ Cox [EMAIL PROTECTED] added the comment: "The loop in escape should really use enumerate instead of for i in range(len(pattern))." It needs i to edit s[i]. "Instead of using a loop, can't the test just use self.assertEqual(re.escape(same), same)?" Done. Also, please add tests for

2008-04-23 Thread Russ Cox
Changes by Russ Cox [EMAIL PROTECTED]: -- keywords: +patch Added file: http://bugs.python.org/file10080/re.patch

2008-04-23 Thread Benjamin Peterson
Benjamin Peterson [EMAIL PROTECTED] added the comment: Thanks. The loop in escape should really use enumerate instead of "for i in range(len(pattern))". Instead of using a loop, can't the test just use self.assertEqual(re.escape(same), same)? Also, please add tests for what re.escape should
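A Python 3 sketch of the suggested test style, written with unittest: compare re.escape's output for a whole string directly instead of looping over characters. The class, test names and run_tests helper are hypothetical and are not the actual tests in Lib/test/test_re.py. It assumes Python 3.4 or later (re.fullmatch, and '_' no longer escaped).

```python
import io
import re
import unittest


class EscapeTests(unittest.TestCase):
    def test_identifier_chars_unchanged(self):
        same = "abcXYZ0123456789_"
        # One direct comparison instead of a per-character loop.
        self.assertEqual(re.escape(same), same)

    def test_escaped_pattern_matches_literally(self):
        special = ".^$*+?{}[]\\|()"
        self.assertIsNotNone(re.fullmatch(re.escape(special), special))


def run_tests() -> bool:
    """Run EscapeTests quietly and report whether every test passed."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(EscapeTests)
    runner = unittest.TextTestRunner(stream=io.StringIO(), verbosity=0)
    return runner.run(suite).wasSuccessful()
```

The class can also be picked up by python -m unittest in the usual way.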

2008-04-17 Thread Russ Cox
New submission from Russ Cox [EMAIL PROTECTED]: import re; print re.escape("_") prints \_ but should print _. This behavior differs from Perl and other systems: _ is an identifier character and as such does not need to be escaped. -- messages: 65585 nosy: rsc severity: normal status: open
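For reference, the reported case as a Python 3 check. It assumes Python 3.4 or later (for re.fullmatch); the underscore fix itself landed in the default branch that became 3.3, so on the Python 2 versions discussed in this report the first assertion may fail. The helper name matches_literally is hypothetical.

```python
import re

# "_" is left unescaped, matching Perl's behaviour.
assert re.escape("_") == "_"
# Identifier characters pass through unchanged.
assert re.escape("snake_case_42") == "snake_case_42"


def matches_literally(text: str) -> bool:
    """True if re.escape(text), used as a pattern, matches exactly text."""
    return re.fullmatch(re.escape(text), text) is not None


for sample in ["a_b", "a.b*c", "[x](y){z}", "100$ ^ 2"]:
    assert matches_literally(sample), sample
```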

2008-04-17 Thread Russ Cox
Changes by Russ Cox [EMAIL PROTECTED]: -- components: +Regular Expressions

2008-04-17 Thread Guido van Rossum
Changes by Guido van Rossum [EMAIL PROTECTED]: -- keywords: +easy

2008-04-17 Thread Guido van Rossum
Changes by Guido van Rossum [EMAIL PROTECTED]: -- versions: +Python 2.6, Python 3.0 -Python 2.5

2008-04-17 Thread Benjamin Peterson
Benjamin Peterson [EMAIL PROTECTED] added the comment: It seems that escape is pretty dumb. The documentation says that re.escape escapes all non-alphanumeric characters, and it does that faithfully. It would seem more useful to have a list of meta-characters and just escape those. This is more

2008-04-17 Thread Russ Cox
Russ Cox [EMAIL PROTECTED] added the comment: > It seems that escape is pretty dumb. The documentation says that re.escape escapes all non-alphanumeric characters, and it does that faithfully. It would seem more useful to have a list of meta-characters and just escape those. This is more true