SilentGhost <[email protected]> added the comment:
I think these are two different questions:
1. What to escape
2. What to do about the poor performance of re.escape when it is
   implemented with re.sub
In my opinion, there isn't any justifiable reason to escape non-meta
characters: it doesn't affect matching; escaped strings are typically just
re-used in regex.
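A minimal sketch of that claim follows. The helper names escape_all and
escape_meta are hypothetical and are not part of the re module; the sample
string is made up. It assumes the escaped text is used outside a character
class and without re.VERBOSE, where '-', '#' and whitespace would need
separate care.

```python
import re

# The metacharacters discussed in this issue.
META = set('][.^$*+?{}\\|()')

def escape_all(s):
    # Hypothetical helper: escape every non-alphanumeric character.
    return ''.join('\\' + c if not c.isalnum() else c for c in s)

def escape_meta(s):
    # Hypothetical helper: escape only the metacharacters.
    return ''.join('\\' + c if c in META else c for c in s)

sample = 'price: $4.50 (approx) @home-office'
other = 'price: $4x50 (approx) @home-office'

# Both escaped forms match the literal text and nothing else.
for esc in (escape_all, escape_meta):
    assert re.fullmatch(esc(sample), sample)
    assert re.fullmatch(esc(sample), other) is None

# The metacharacter-only form is simply shorter.
print(len(escape_all(sample)), len(escape_meta(sample)))
```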
I would favour simpler and cleaner code with re.sub. I don't think that
re.escape could be a performance bottleneck in any application. I did some
profiling with python3.2 and it seems that the reason for this poor
performance is the many abstraction layers involved when using re.sub.
However, we need to bear in mind that we're only talking about a 40 usec
difference for a 100-char string (string.printable): I'd think that strings
being escaped are typically shorter.
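For anyone who wants to repeat the measurement, something along these lines
should do. Note that escape_loop is only a hypothetical stand-in for a
character-by-character implementation, not the actual stdlib code, and
escape_sub is one possible re.sub-based version. Absolute numbers will vary
by machine and Python version.

```python
import re
import string
import timeit

_META = '][.^$*+?{}\\|()'
_META_RE = re.compile(r'([][.^$*+?{}\\|()])')

def escape_loop(s):
    # Hypothetical stand-in: build the result one character at a time.
    out = []
    for c in s:
        out.append('\\' + c if c in _META else c)
    return ''.join(out)

def escape_sub(s):
    # One possible re.sub-based version: prefix each match with a backslash.
    return _META_RE.sub(r'\\\1', s)

sample = string.printable  # the 100-char string mentioned above
assert escape_loop(sample) == escape_sub(sample)

n = 10000
t_loop = timeit.timeit(lambda: escape_loop(sample), number=n)
t_sub = timeit.timeit(lambda: escape_sub(sample), number=n)
print('loop: %.1f usec per call' % (t_loop / n * 1e6))
print('sub:  %.1f usec per call' % (t_sub / n * 1e6))
```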
As a compromise, I tested this code:
    _mp = {ord(i): '\\' + i for i in '][.^$*+?{}\\|()'}

    def escape(pattern):
        if isinstance(pattern, str):
            return pattern.translate(_mp)
        return sub(br'([][.^$*+?{}\\|()])', br'\\\1', pattern)
which is fast (faster than the existing code) for str patterns and slow for
bytes patterns. I don't particularly like it, because of the difference
between str and bytes handling, but I do think that it will be much easier
to "fix" once/when/if the re module is improved.
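A quick self-contained check of the compromise quoted above, run outside
re.py (hence re.sub instead of the bare sub). The sample strings are made up
for illustration. It only verifies that the str and bytes paths agree and
that the escaped pattern matches the original text literally.

```python
import re

_mp = {ord(i): '\\' + i for i in '][.^$*+?{}\\|()'}

def escape(pattern):
    if isinstance(pattern, str):
        # str path: a single table lookup per character.
        return pattern.translate(_mp)
    # bytes path: falls back to a regex substitution.
    return re.sub(br'([][.^$*+?{}\\|()])', br'\\\1', pattern)

samples = ['a.b', '1+1=2', '[x]', 'C:\\dir', 'plain']
for s in samples:
    b = s.encode('ascii')
    # Both code paths produce the same escaping.
    assert escape(s).encode('ascii') == escape(b)
    # The escaped pattern matches the original text literally.
    assert re.fullmatch(escape(s), s)
    assert re.fullmatch(escape(b), b)
```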
----------
keywords: -patch
_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue2650>
_______________________________________