Re: [Python-Dev] inspect.py very slow under 2.5

2006-09-07 Thread Fernando Perez
Nick Coghlan wrote:

> I've updated the patch on SF, and committed the fix (including PJE's and
> Neal's comments) to the trunk.
> 
> I'll backport it tomorrow night (assuming I don't hear any objections in the
> meantime :).

I just wanted to thank you all for taking the time to work on this, even with
my 11-th hour report.  Greatly appreciated, really.

Looking forward to 2.5!

f

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] inspect.py very slow under 2.5

2006-09-07 Thread Nick Coghlan
Ralf Schmitt wrote:
> Nick Coghlan wrote:
>> Good point. I modified the patch so it does the latter (it only calls 
>> getabspath() again for a module if the value of module.__file__ changes).
> 
> with _filesbymodname[modname] = file changed to _filesbymodname[modname] 
> = f
> it seems to work ok.

I checked the inspect module unit tests and discovered the test for this 
function was only covering one of the half dozen possible execution paths.

I've updated the patch on SF, and committed the fix (including PJE's and 
Neal's comments) to the trunk.

I'll backport it tomorrow night (assuming I don't hear any objections in the 
meantime :).

Cheers,
Nick.

-- 
Nick Coghlan   |   [EMAIL PROTECTED]   |   Brisbane, Australia
---
 http://www.boredomandlaziness.org
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] inspect.py very slow under 2.5

2006-09-06 Thread Ralf Schmitt
Nick Coghlan wrote:
> Ralf Schmitt wrote:
>> Nick Coghlan wrote:
>>> It looks like the problem is the call to getabspath() in getmodule(). 
>>> This happens every time, even if the file name is already in the 
>>> modulesbyfile cache. This calls os.path.abspath() and 
>>> os.path.normpath() every time that inspect.findsource() is called.
>>>
>>> That can be fixed by having findsource() pass the filename argument to 
>>> getmodule(), and adding a check of the modulesbyfile cache *before* 
>>> the call to getabspath().
>>>
>>> Can you try this patch and see if you get 2.4 level performance back 
>>> on Fernando's test?:
>> no. this doesn't work. getmodule always iterates over 
>> sys.modules.values() and only returns None afterwards.
>> One would have to cache the bad file value, or only inspect new/changed 
>> modules from sys.modules.
> 
> Good point. I modified the patch so it does the latter (it only calls 
> getabspath() again for a module if the value of module.__file__ changes).

with _filesbymodname[modname] = file changed to 
_filesbymodname[modname] = f
it seems to work ok.

diff -r d41ffd2faa28 inspect.py
--- a/inspect.pyWed Sep 06 13:01:12 2006 +0200
+++ b/inspect.pyWed Sep 06 16:52:39 2006 +0200
@@ -403,6 +403,7 @@ def getabsfile(object, _filename=None):
  return os.path.normcase(os.path.abspath(_filename))

  modulesbyfile = {}
+_filesbymodname = {}

  def getmodule(object, _filename=None):
  """Return the module an object was defined in, or None if not 
found."""
@@ -410,17 +411,23 @@ def getmodule(object, _filename=None):
  return object
  if hasattr(object, '__module__'):
  return sys.modules.get(object.__module__)
+if _filename is not None and _filename in modulesbyfile:
+return sys.modules.get(modulesbyfile[_filename])
  try:
  file = getabsfile(object, _filename)
  except TypeError:
  return None
  if file in modulesbyfile:
  return sys.modules.get(modulesbyfile[file])
-for module in sys.modules.values():
+for modname, module in sys.modules.iteritems():
  if ismodule(module) and hasattr(module, '__file__'):
+f = module.__file__
+if f == _filesbymodname.get(modname, None):
+continue
+_filesbymodname[modname] = f
  f = getabsfile(module)
  modulesbyfile[f] = modulesbyfile[
-os.path.realpath(f)] = module.__name__
+os.path.realpath(f)] = modname
  if file in modulesbyfile:
  return sys.modules.get(modulesbyfile[file])
  main = sys.modules['__main__']
@@ -444,7 +451,7 @@ def findsource(object):
  in the file and the line number indexes a line in that list.  An 
IOError
  is raised if the source code cannot be retrieved."""
  file = getsourcefile(object) or getfile(object)
-module = getmodule(object)
+module = getmodule(object, file)
  if module:
  lines = linecache.getlines(file, module.__dict__)
  else:

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] inspect.py very slow under 2.5

2006-09-06 Thread Nick Coghlan
Ralf Schmitt wrote:
> Nick Coghlan wrote:
>>
>> It looks like the problem is the call to getabspath() in getmodule(). 
>> This happens every time, even if the file name is already in the 
>> modulesbyfile cache. This calls os.path.abspath() and 
>> os.path.normpath() every time that inspect.findsource() is called.
>>
>> That can be fixed by having findsource() pass the filename argument to 
>> getmodule(), and adding a check of the modulesbyfile cache *before* 
>> the call to getabspath().
>>
>> Can you try this patch and see if you get 2.4 level performance back 
>> on Fernando's test?:
> 
> no. this doesn't work. getmodule always iterates over 
> sys.modules.values() and only returns None afterwards.
> One would have to cache the bad file value, or only inspect new/changed 
> modules from sys.modules.

Good point. I modified the patch so it does the latter (it only calls 
getabspath() again for a module if the value of module.__file__ changes).

Cheers,
Nick.


-- 
Nick Coghlan   |   [EMAIL PROTECTED]   |   Brisbane, Australia
---
 http://www.boredomandlaziness.org
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] inspect.py very slow under 2.5

2006-09-06 Thread Ralf Schmitt
Nick Coghlan wrote:
> 
> It looks like the problem is the call to getabspath() in getmodule(). This 
> happens every time, even if the file name is already in the modulesbyfile 
> cache. This calls os.path.abspath() and os.path.normpath() every time that 
> inspect.findsource() is called.
> 
> That can be fixed by having findsource() pass the filename argument to 
> getmodule(), and adding a check of the modulesbyfile cache *before* the call 
> to getabspath().
> 
> Can you try this patch and see if you get 2.4 level performance back on 
> Fernando's test?:

no. this doesn't work. getmodule always iterates over 
sys.modules.values() and only returns None afterwards.
One would have to cache the bad file value, or only inspect new/changed 
modules from sys.modules.

> 
> http://www.python.org/sf/1553314
> 
> (Assigned to Neal in the hopes of making 2.5rc2)
> 
> Cheers,
> Nick.
> 

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] inspect.py very slow under 2.5

2006-09-06 Thread Nick Coghlan
Ralf Schmitt wrote:
> The problem seems to originate from the module=getmodule(object) in 
> findsource. If I outcomment that code (or rather do a module=None),
> things seem to be back as normal. (linecache.getlines has been called 
> with a None module in python 2.4's inspect.py).

It looks like the problem is the call to getabspath() in getmodule(). This 
happens every time, even if the file name is already in the modulesbyfile 
cache. This calls os.path.abspath() and os.path.normpath() every time that 
inspect.findsource() is called.

That can be fixed by having findsource() pass the filename argument to 
getmodule(), and adding a check of the modulesbyfile cache *before* the call 
to getabspath().

Can you try this patch and see if you get 2.4 level performance back on 
Fernando's test?:

http://www.python.org/sf/1553314

(Assigned to Neal in the hopes of making 2.5rc2)

Cheers,
Nick.

-- 
Nick Coghlan   |   [EMAIL PROTECTED]   |   Brisbane, Australia
---
 http://www.boredomandlaziness.org
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] inspect.py very slow under 2.5

2006-09-06 Thread Ralf Schmitt
Fernando Perez wrote:
> 
> These enormous numbers of calls are the origin of the slowdown, and the more
> modules have been imported, the worse it gets.


--- /exp/lib/python2.5/inspect.py   2006-08-28 11:53:36.0 +0200
+++ inspect.py  2006-09-06 12:10:45.0 +0200
@@ -444,7 +444,8 @@
  in the file and the line number indexes a line in that list.  An 
IOError
  is raised if the source code cannot be retrieved."""
  file = getsourcefile(object) or getfile(object)
-module = getmodule(object)
+#module = getmodule(object)
+module = None
  if module:
  lines = linecache.getlines(file, module.__dict__)
  else:

The problem seems to originate from the module=getmodule(object) in 
findsource. If I outcomment that code (or rather do a module=None),
things seem to be back as normal. (linecache.getlines has been called 
with a None module in python 2.4's inspect.py).

- Ralf

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


[Python-Dev] inspect.py very slow under 2.5

2006-09-05 Thread Fernando Perez
Hi all,

I know that the 2.5 release is extremely close, so this will probably be
2.5.1 material.  I discussed it briefly with Guido at scipy'06, and he
asked for some profile-based info, which I've only now had time to gather. 
I hope this will be of some use, as I think the problem is rather serious.

For context: I am the IPython lead developer (http://ipython.scipy.org), and
ipython is used as the base shell for several interactive environments, one
of which is the mathematics system SAGE
(http://modular.math.washington.edu/sage).  It was the SAGE lead who first
ran into this problem while testing SAGE with 2.5.

The issue is the following: ipython provides several exception reporting
modes which give a lot more information than python's default tracebacks. 
In order to generate this info, it makes extensive use of the inspect
module.  The module in ipython responsible for these fancy tracebacks is:

http://projects.scipy.org/ipython/ipython/browser/ipython/trunk/IPython/ultraTB.py

which is an enhanced port of Ka Ping-Yee's old cgitb module.

Under 2.5, the generation of one of these detailed tracebacks is /extremely/
expensive, and the cost goes up very quickly the more modules have been
imported into the current session.  While in a new ipython session the
slowdown is not crippling, under SAGE (which starts with a lot of loaded
modules) it is bad enough to make the system nearly unusable.

I'm attaching a little script which can be run to show the problem, but you
need IPython to be installed to run it.  If any of you run ubuntu, fedora,
suse or almost any other major linux distro, it's already available via the
usual channels.

In case you don't want to (or can't) run the attached code, here's a summary
of what I see on my machine (ubuntu dapper).  Using ipython under python
2.4.3, I get:

 2268 function calls (2225 primitive calls) in 0.020 CPU seconds

   Ordered by: call count
   List reduced from 127 to 32 due to restriction <0.25>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
  3050.0000.0000.0000.000 :0(append)
  259/2530.0100.0000.0100.000 :0(len)
  1770.0000.0000.0000.000 :0(isinstance)
   900.0000.0000.0000.000 :0(match)
   680.0000.0000.0000.000 ultraTB.py:539(tokeneater)
   680.0000.0000.0000.000 tokenize.py:16
(generate_tokens)
   610.0000.0000.0000.000 :0(span)
   570.0000.0000.0000.000 sre_parse.py:130(__getitem__)
   560.0000.0000.0000.000 string.py:220(lower)

etc, while running the same script under ipython/python2.5 and no other
changes gives:

 230370 function calls (229754 primitive calls) in 3.340 CPU seconds

   Ordered by: call count
   List reduced from 83 to 21 due to restriction <0.25>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
550030.4200.0000.4200.000 :0(startswith)
450260.2640.0000.2640.000 :0(endswith)
200130.1480.0000.1480.000 :0(append)
121380.1800.0000.6600.000 posixpath.py:156(islink)
121380.1920.0000.1920.000 :0(lstat)
121380.1800.0000.2880.000 stat.py:60(S_ISLNK)
121380.1080.0000.1080.000 stat.py:29(S_IFMT)
118380.6800.0001.2440.000 posixpath.py:56(join)
 48370.0520.0000.0520.000 :0(len)
 43620.0280.0000.0280.000 :0(split)
 43620.0480.0000.1000.000 posixpath.py:47(isabs)
 35980.0360.0000.0560.000 string.py:218(lower)
 35980.0200.0000.0200.000 :0(lower)
 28150.0320.0000.0320.000 :0(isinstance)
 28090.0280.0000.0280.000 :0(join)
 28080.2640.0000.5200.000 posixpath.py:374(normpath)
 26320.0400.0000.0680.000 inspect.py:35(ismodule)
 21430.0160.0000.0160.000 :0(hasattr)
 18840.0280.0000.4440.000 posixpath.py:401(abspath)
 15570.0160.0000.0160.000 :0(range)
 10780.0080.0000.0440.000 inspect.py:342(getfile)


These enormous numbers of calls are the origin of the slowdown, and the more
modules have been imported, the worse it gets.

I haven't had time to dive deep into inspect.py to try and fix this, but I
figured it would be best to at least report it now.  As far as IPython and
its user projects is concerned, I'll probably hack things to overwrite
inspect.py from 2.4 over the 2.5 version in the exception reporter, because
the current code is simply unusable for detailed tracebacks.  It would be
great if this could be fixed in the trunk at some point.

I'll be happy to provide further feedback or put this information elsewhere. 
Guido suggested initially posting here, but if you prefer it on the SF
tracker (even a