On Tue, May 19, 2009 at 5:36 AM, spir wrote:
> Hello,
>
> This is a follow-up to the post on performance issues.
> Using a profiler, I realized that inside error message creation, most of the
> time was spent in a tool func used to clean up source text output.
> The issue is that when the source text holds control chars such as \n, then
> the error message is hardly readable. My solution is to replace such chars
> with their repr():
>
> def _cleanRepr(text):
>     ''' text with control chars replaced by repr() equivalent '''
>     result = ""
>     for char in text:
>         n = ord(char)
>         if (n < 32) or (n > 126 and n < 160):
>             char = repr(char)[1:-1]
>         result += char
>     return result
>
> For some reason, this func is extremely slow. While the rest of error message
> creation looks very complicated, this seemingly innocent func consumes > 90% of the
> time. The issue is that I cannot use repr(text), because repr will replace
> all non-ASCII characters. I need to replace only control characters.
> How else could I do that?

I would get rid of the calls to ord() and repr() to start. There is no
need for ord() at all, just compare characters directly:

    if char < '\x20' or '\x7e' < char < '\xa0':

To eliminate repr() you could create a dict mapping chars to their
repr and look up in that.
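
To make the dict idea concrete, here is a minimal sketch of the loop rewritten around a lookup table. The names CONTROL_REPRS and clean_repr_lookup are made up for illustration, and the code assumes Python 3 (it should also run on 2.7). It also builds the result with str.join() instead of repeated +=, which is an extra change beyond the dict suggestion.

```python
# Hypothetical names: CONTROL_REPRS and clean_repr_lookup are illustrative only.
# Build the table once, at import time, so repr() is never called per character.
CONTROL_REPRS = {}
for i in list(range(0, 32)) + list(range(127, 160)):
    c = chr(i)
    CONTROL_REPRS[c] = repr(c)[1:-1]  # strip the surrounding quotes


def clean_repr_lookup(text):
    """Return text with control chars replaced by their repr() escapes."""
    # dict.get(char, char) hands back the char unchanged when it is not
    # a control char, so no ord() call or range comparison is needed.
    return "".join(CONTROL_REPRS.get(char, char) for char in text)


print(clean_repr_lookup('Harder\ntest'))
```

The loop still runs in Python here, so this mainly removes the per-character ord() and repr() calls.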

You should also look for a way to get the loop into C code. One way to
do that would be to use a regex to search and replace. The replacement
argument in a call to re.sub() can be a function; that function can
do the dict lookup. Here is something to try:

import re

# Create a dictionary of replacement strings
repl = {}
for i in range(0, 32) + range(127, 160):
    c = chr(i)
    repl[c] = repr(c)[1:-1]

def sub(m):
    ''' Helper function for re.sub(). m will be a Match object. '''
    return repl[m.group()]

# Regex to match control chars
controlsRe = re.compile(r'[\x00-\x1f\x7f-\x9f]')

def replaceControls(s):
    ''' Replace all control chars in s with their repr() '''
    return controlsRe.sub(sub, s)

for s in [
    'Simple test',
    'Harder\ntest',
    'Final\x00\x1f\x20\x7e\x7f\x9f\xa0test'
]:
    print replaceControls(s)

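
To check whether the regex version actually helps on your own data, a rough timing harness along these lines may be useful. It is only a sketch: the function names, the sample text, and the repeat count are arbitrary choices for illustration, and it assumes Python 3 (it should also run on 2.7). Timings vary by machine, so no expected numbers are shown.

```python
import re
import timeit


def clean_repr_loop(text):
    """Character-by-character approach, as in the original post."""
    result = ""
    for char in text:
        n = ord(char)
        if n < 32 or 126 < n < 160:
            char = repr(char)[1:-1]
        result += char
    return result


# Hypothetical names; the table and pattern mirror the regex approach above.
REPL = {}
for i in list(range(0, 32)) + list(range(127, 160)):
    REPL[chr(i)] = repr(chr(i))[1:-1]
CONTROLS_RE = re.compile(r'[\x00-\x1f\x7f-\x9f]')


def clean_repr_regex(text):
    """Regex approach: the scan runs in C, the callback does a dict lookup."""
    return CONTROLS_RE.sub(lambda m: REPL[m.group()], text)


# Arbitrary sample: mostly plain text with a few control chars mixed in.
sample = 'Some source text\nwith a few\tcontrol chars\x00 in it. ' * 200

# Both versions must agree before their timings are worth comparing.
assert clean_repr_loop(sample) == clean_repr_regex(sample)

for func in (clean_repr_loop, clean_repr_regex):
    seconds = timeit.timeit(lambda: func(sample), number=20)
    print('%s: %.4f s for 20 runs' % (func.__name__, seconds))
```

How much the regex gains depends on how many control chars the text holds, since each match still calls back into Python.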
Kent