Re: [Numpy-discussion] import numpy is slow
Robert Kern wrote:
> It's still pretty bad, though. I do recommend running Disk Repair like
> Bill did.

I did that, and it found and did nothing -- I suspect it ran when I re-booted -- it did take a while to reboot. However, this is pretty consistently what I'm getting now:

$ time python -c "import numpy"
real    0m0.728s
user    0m0.327s
sys     0m0.398s

Which is apparently pretty slow. Robert gets:

$ time python -c "import numpy"
python -c "import numpy"  0.18s user 0.46s system 88% cpu 0.716 total

Is that on a similar machine? Are you running Universal binaries? Would that make any difference? I wouldn't think so; I'm just grasping at straws here. This is a dual 1.8 GHz G5 desktop, running OS-X 10.4.11, Python 2.5.2 (python.org build), numpy 1.1.1 (from the binary on SourceForge).

I just tried this on a colleague's machine that is identical, and got about 0.4 seconds real -- so faster than mine, but still slow. This still feels blazingly fast to me, as I was getting something like 7+ seconds!

thanks for all the help,
-Chris

--
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/ORR            (206) 526-6959 voice
7600 Sand Point Way NE  (206) 526-6329 fax
Seattle, WA 98115       (206) 526-6317 main reception
[EMAIL PROTECTED]

_______________________________________________
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion
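The `time` numbers above mix interpreter startup with the import itself. A quick way to isolate the import cost from inside Python is a sketch like the one below; it uses a stdlib module as a stand-in so it runs anywhere (substitute "numpy" to reproduce the thread's numbers):

```python
import sys
import time

modname = "json"  # stand-in module; substitute "numpy" to measure it instead

t0 = time.time()
__import__(modname)   # first import pays the full filesystem/init cost
cold = time.time() - t0

t0 = time.time()
__import__(modname)   # already cached in sys.modules: essentially a dict lookup
cached = time.time() - t0

print("cold import:   %.6f s" % cold)
print("cached import: %.6f s" % cached)
```

The cached time is why only the *first* import in a process matters: repeated imports are nearly free, which is also what makes the deferred-import optimizations discussed later in this thread cheap after the first call.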
Re: [Numpy-discussion] import numpy is slow
Robert Kern wrote:
> It isn't. The problem is on Chris's file system.

Thanks for all your help, Robert. Interestingly, I haven't noticed any problems anywhere else, but who knows? I guess this is what Linus Torvalds meant when he said that OS-X's file system was brain-dead.

> Whatever is wrong with his file system (Bill Spotz's identical problem
> suggests too many temporary but unused inodes)

I didn't see anything about Bill having similar issues -- was it on this list?

> But the problem really is his disk; it's not a problem with numpy or
> Python or anything else.

so the question is: what can I do about it? Do I have any other choice than wiping the disk and re-installing?

-Chris
Re: [Numpy-discussion] import numpy is slow
On Mon, Aug 4, 2008 at 14:24, Christopher Barker [EMAIL PROTECTED] wrote:
> Robert Kern wrote:
>> It isn't. The problem is on Chris's file system.
>> Whatever is wrong with his file system (Bill Spotz's identical problem
>> suggests too many temporary but unused inodes)
>
> I didn't see anything about Bill having similar issues -- was it on
> this list?

From my earlier message in this thread:

Looking at the Shark results you sent me, it looks like all of your time is getting sucked up by the system call getdirentries(). Googling for some of the function names in that stack brings me to the message "Slow python initialization" on the Pythonmac-SIG:

http://mail.python.org/pipermail/pythonmac-sig/2005-December/015542.html

The ultimate resolution was that Bill Spotz, the original poster, ran Disk Utility and used the Disk Repair option to clean up a large number of unused inodes. This solved the problem for him:

http://mail.python.org/pipermail/pythonmac-sig/2005-December/015548.html

--
Robert Kern

I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth.
  -- Umberto Eco
Re: [Numpy-discussion] import numpy is slow
OK, so I'm an idiot. After reading this, I thought "I haven't rebooted for a while." It turns out it's been 35 days. I think I've been having slow startup for longer than that, but nonetheless, I re-booted (which took a long time), and presto:

$ time python -c "import numpy"
real    0m0.686s
user    0m0.322s
sys     0m0.363s

much better! I suspect OS-X did some disk-cleaning on re-boot. Frankly, 35 days is pretty pathetic for an uptime, but as I said, I think this issue has been going on longer. Perhaps OS-X runs a disk check every n re-boots, like some Linux distros do.

Sorry about the noise, and thanks, particularly to Robert, for taking an interest in this.

-Chris

Robert Kern wrote:
> From my earlier message in this thread: Looking at the Shark results
> you sent me, it looks like all of your time is getting sucked up by the
> system call getdirentries(). [...] The ultimate resolution was that
> Bill Spotz, the original poster, ran Disk Utility and used the Disk
> Repair option to clean up a large number of unused inodes. This solved
> the problem for him.

--
Christopher Barker, Ph.D.
Re: [Numpy-discussion] import numpy is slow
On Mon, Aug 4, 2008 at 18:01, Christopher Barker [EMAIL PROTECTED] wrote:
> OK, so I'm an idiot. After reading this, I thought "I haven't rebooted
> for a while." It turns out it's been 35 days. [...] I re-booted (which
> took a long time), and presto:
>
> $ time python -c "import numpy"
> real    0m0.686s
> user    0m0.322s
> sys     0m0.363s
>
> much better!

It's still pretty bad, though. I do recommend running Disk Repair like Bill did.

--
Robert Kern
Re: [Numpy-discussion] import numpy is slow
On Sat, Aug 2, 2008 at 00:06, David Cournapeau [EMAIL PROTECTED] wrote:
> Christopher Barker wrote:
>> OK, I just installed wxPython, and whoa!
>>
>> $ time python -c "import numpy"
>> real    0m2.793s
>> user    0m0.294s
>> sys     0m2.494s
>>
>> so it's taking almost two seconds more to import numpy, now that
>> wxPython is installed. I haven't even imported it yet. importing wx
>> isn't as bad:
>>
>> $ time python -c "import wx"
>> real    0m1.589s
>> user    0m0.274s
>> sys     0m1.000s
>
> Since the numpy-without-wx and wx import times add up to the numpy
> import time, this suggests that numpy may import wx. Which it
> shouldn't, obviously. There is something strange happening here. Please
> check whether wx really is imported when you do "import numpy":
>
> python -c "import numpy; import sys; print sys.modules"
>
> And if it is, we have to know why it is imported at all when doing
> "import numpy".

It isn't. The problem is on Chris's file system. Whatever is wrong with his file system (Bill Spotz's identical problem suggests too many temporary but unused inodes) increases the traversal of the file system. wx has a .pth file which adds entries to sys.path. Every time one tries to import something, the entries on sys.path are examined for the module. So increasing the number of entries on sys.path exacerbates the problem. But the problem really is his disk; it's not a problem with numpy or Python or anything else.

--
Robert Kern
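Robert's point about .pth files and sys.path scanning can be made concrete. The sketch below is illustrative only; `STATS_PER_ENTRY` is an assumed round figure, not something measured in this thread:

```python
import sys

# Every absolute import walks sys.path in order until some entry provides
# the module, so each extra entry (e.g. the ones a wx .pth file appends)
# adds filesystem lookups to every import that misses the earlier entries.
print("sys.path has %d entries" % len(sys.path))

# Rough model: failing to find a module in one directory costs a handful
# of stat() calls (package dir, .py, compiled file, extension module).
# On a sick filesystem each of those calls is what gets slow.
STATS_PER_ENTRY = 4  # assumed figure, for illustration only
worst_case = len(sys.path) * STATS_PER_ENTRY
print("worst case ~%d stat() calls to fail one import" % worst_case)
```

This is why the same filesystem problem hurt more after wxPython was installed: the .pth file lengthened sys.path, multiplying the number of slow directory traversals per import.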
Re: [Numpy-discussion] import numpy is slow
Robert Kern wrote:
> It isn't. The problem is on Chris's file system. Whatever is wrong with
> his file system (Bill Spotz's identical problem suggests too many
> temporary but unused inodes) increases the traversal of the file
> system.

Ah, I did not think it could indeed affect the whole fs. This seems much more likely, then. I guess I was confused because wx caused me some problems a long time ago, with scipy, and thought maybe there were some leftovers in Chris' system.

It would also explain why "import numpy" is still kind of slow on his machine. I don't remember the numbers, but I think it was quicker on my PPC minimac (under Mac OS X) than on his computer.

> wx has a .pth file which adds entries to sys.path. Every time one tries
> to import something, the entries on sys.path are examined for the
> module. So increasing the number of entries on sys.path exacerbates the
> problem. But the problem really is his disk; it's not a problem with
> numpy or Python or anything else.

It was an fs problem, after all. I am a bit surprised this can happen in such an aggravated manner, though.

cheers,
David
Re: [Numpy-discussion] import numpy is slow
I've got a proof of concept that takes the time on my machine to import numpy from 0.21 seconds down to 0.08 seconds. Doing that required some somewhat awkward things, like deferring all 'import re' statements. I don't think that's stable in the long run because people will blithely import re in the future and not care that it takes 0.02 seconds to import. I don't blame them for complaining; I was curious how fast I could get things. Note that when I started complaining about this a month ago the import time on my machine was about 0.3 seconds.

I'll work on patches within the next couple of days. Here's an outline of what I did, along with some questions about what's feasible.

1) don't import 'numpy.testing'. Savings = 0.012s. Doing so required patches like:

-from numpy.testing import Tester
-test = Tester().test
-bench = Tester().bench
+def test(label='fast', verbose=1, extra_argv=None, doctests=False,
+         coverage=False, **kwargs):
+    from testing import Tester
+    import numpy
+    Tester(numpy).test(label, verbose, extra_argv, doctests,
+                       coverage, **kwargs)
+def bench(label='fast', verbose=1, extra_argv=None):
+    from testing import Tester
+    import numpy
+    Tester(numpy).bench(label, verbose, extra_argv)

QUESTION: since numpy is moving to nose, and the documentation only describes doing 'import numpy; numpy.test()', can I remove all other definitions of test and bench?

2) removing 'import ctypeslib' at top level - 0.023 seconds. QUESTION: is this considered part of the API that must be preserved? The primary use case is supposed to be to help interactive users. I don't think interactive users spend much time using ctypes, and those that do are also those that aren't confused about needing an extra import statement.

3) removing 'import string' in numerictypes.py - 0.008 seconds. This requires some ugly but simple changes to the code.

4) remove the 'import re' in _internal, numpy/lib/, function_base, and other places. This reduced my overall startup cost by 0.013s.
5) defer bzip and gzip imports in _datasource: 0.009s. This will require non-trivial code changes.

6) defer 'format' from io.py: 0.007s

7) _datasource imports shutil in order to use shutil.rmtree in a __del__. I don't think this can be deferred, because I don't want to do an import during system shutdown, which is when the __del__ might be called. It would save 0.004s.

8) If I can remove 'import doc' from the top-level numpy (is that part of the required API?) then I can save 0.004s.

9) defer urlparse in _datasource: about 0.003s

10) If I get rid of the cPickle import in top-level numeric.py then I can save 0.006 seconds.

11) not importing add_newdocs saves 0.005s. This might be possible by moving all of the docstrings to the actual functions. I haven't looked into this much and it might not be possible.

Those millisecond improvements add up! When I do an interactive 'import numpy' on my system I don't notice the import time like I did before.

Andrew
[EMAIL PROTECTED]
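The pattern behind most of these items (1, 4, 5, 6, 9) is the same: move a module-level import into the function that needs it, so the cost is paid on first use instead of at import time. A generic sketch -- the function and regex here are made up for illustration, not taken from the numpy patches:

```python
def strip_comments(text):
    # Deferred import: 're' is loaded on the first call to this function,
    # not when this module is imported. Later calls hit the sys.modules
    # cache, so the per-call overhead is just a dict lookup.
    import re
    return re.sub(r"(?m)#.*$", "", text)
```

The trade-off is readability: a reader scanning the top of the file no longer sees the dependency, which is why Robert's guideline below asks that each localized import carry a comment explaining it exists for import-time reasons.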
Re: [Numpy-discussion] import numpy is slow
On Fri, Aug 01, 2008 at 09:18:48AM -0700, Christopher Barker wrote:
>> What does python -c "import sys; print sys.path" say?
>
> A lot! 41 entries, and lots of eggs -- are eggs an issue? I'm also
> wondering how the order is determined -- if it looked in site-packages
> first, it would find numpy a whole lot faster.

AFAIK this is a setuptools issue. From what I hear, it might be fixed in the svn version of setuptools, but they still have to make a release that has this feature. The two issues I can see are: import path priority, which shouldn't be screwed up like it is, and speed. Speed is obviously a hard problem.

> I suspect the thing to do is to re-install from scratch, and only add
> in packages I'm really using now.

Avoid eggs if you can. This has been my policy. I am not sure how much this is just superstition or a real problem, though.

I realize that you are on Mac, and that Mac, unlike some distributions of Linux, does not have a good dependency-tracking system. Thus setuptools and eggs are a great temptation. They come at a cost, but it can probably be improved. If you care about this problem, you could try and work with the setuptools developers to improve the situation.

I must say that I am under Ubuntu, and I don't have the dependency problem at all, so setuptools does not answer an important need for me. I however realize that not everybody wants to use Ubuntu, and I thus care about the problem -- maybe not enough to invest much time in setuptools, but at least enough to try to report problems and track solutions. Do not underestimate how difficult it is to get a package manager that works well.

If you ever do verify that it is indeed eggs that is slowing down your import, I'd be interested in having the confirmation, just so that I am sure I am not blaming them for nothing.

Cheers,
Gaël
Re: [Numpy-discussion] import numpy is slow
On Fri, Aug 1, 2008 at 11:53, Gael Varoquaux [EMAIL PROTECTED] wrote:
> On Fri, Aug 01, 2008 at 09:18:48AM -0700, Christopher Barker wrote:
>>> What does python -c "import sys; print sys.path" say?
>> A lot! 41 entries, and lots of eggs -- are eggs an issue? I'm also
>> wondering how the order is determined -- if it looked in site-packages
>> first, it would find numpy a whole lot faster.
>
> AFAIK this is a setuptools issue. [...]
>
>> I suspect the thing to do is to re-install from scratch, and only add
>> in packages I'm really using now.
>
> Avoid eggs if you can. This has been my policy. I am not sure how much
> this is just superstition or a real problem, though.

Superstition.

[~]$ python -c "import sys; print len(sys.path)"
269
[~]$ python -v -v -c "import numpy" 2> foo.txt
[~]$ wc -l foo.txt
42500 foo.txt
[~]$ time python -c "import numpy"
python -c "import numpy"  0.18s user 0.46s system 88% cpu 0.716 total

So cut it out. Chris, please profile your import so we actually have some real information to work with instead of prejudices.

--
Robert Kern
Re: [Numpy-discussion] import numpy is slow
On Thu, Jul 31, 2008 at 10:02 PM, Robert Kern [EMAIL PROTECTED] wrote:
> On Thu, Jul 31, 2008 at 05:43, Andrew Dalke [EMAIL PROTECTED] wrote:
>> On Jul 31, 2008, at 12:03 PM, Robert Kern wrote:
>>> But you still can't remove them since they are being used inside
>>> numerictypes. That's why I labeled them "internal utility functions"
>>> instead of leaving them with minimal docstrings such that you would
>>> have to guess.
>>
>> My proposal is to replace that code with a table mapping the type name
>> to the uppercase/lowercase/capitalized forms, thus eliminating the
>> (small) amount of time needed to import string. It makes adding new
>> types slightly more difficult. I know it's a tradeoff. Probably not a
>> bad one.
>
> Write up the patch, and then we'll see how much it affects the import
> time. I would much rather that we discuss concrete changes like this
> rather than rehash the justifications of old decisions. Regardless of
> the merits about the old decisions (and I agreed with your position at
> the time), it's a pointless and irrelevant conversation. The decisions
> were made, and now we have a user base to whom we have promised not to
> break their code so egregiously again. The relevant conversation is
> what changes we can make now.
>
> Some general guidelines:
>
> 1) Everything exposed by "from numpy import *" still needs to work.
>    a) The layout of everything under numpy.core is an implementation
>       detail.
>    b) _underscored functions and explicitly labeled internal functions
>       can probably be modified.
>    c) Ask about specific functions when in doubt.
> 2) The improvement in import times should be substantial. Feel free to
>    bundle up the optimizations for consideration.
> 3) Moving imports from module-level down into the functions where they
>    are used is generally okay if we get a reasonable win from it. The
>    local imports should be commented, explaining that they are made
>    local in order to improve the import times.
> 4) __import__ hacks are off the table.
> 5) Proxy objects ... I would really like to avoid proxy objects. They
>    have caused fragility in the past.
> 6) I'm not a fan of having environment variables control the way numpy
>    gets imported, but I'm willing to consider it. For example, I might
>    go for having proxy objects for linalg et al. *only* if a particular
>    environment variable were set. But there had better be a very large
>    improvement in import times.

I just want to say that I agree with Andrew that slow imports just suck. But it's not really that bad, for example on my system:

In [1]: %time import numpy
CPU times: user 0.11 s, sys: 0.01 s, total: 0.12 s
Wall time: 0.12 s

so that's ok. For comparison:

In [1]: %time import sympy
CPU times: user 0.12 s, sys: 0.02 s, total: 0.14 s
Wall time: 0.14 s

But I am still unhappy about it. I'd like it if the package could import much faster, because it adds up: when you need to import 7 packages like that, it's suddenly 1s, and that's just too much. But of course everything within the constraints that Robert has outlined.

From the theoretical point of view, I don't understand why Python cannot just "import numpy" (or any other package) immediately, and only at the moment the user actually accesses something, import it for real. Mercurial uses a lazy import module that does exactly this. Maybe that's an option? Look into mercurial/demandimport.py. Use it like this:

In [1]: import demandimport

In [2]: demandimport.enable()

In [3]: %time import numpy
CPU times: user 0.00 s, sys: 0.00 s, total: 0.00 s
Wall time: 0.00 s

That's pretty good, huh? :)

Unfortunately, numpy cannot work with lazy import (yet):

In [5]: %time from numpy import array
ERROR: An unexpected error occurred while tokenizing input
The following traceback may be corrupted or invalid
The error message is: ('EOF in multi-line statement', (17, 0))
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
[skip]
/usr/lib/python2.5/site-packages/numpy/lib/index_tricks.py in <module>()
     14 import function_base
     15 import numpy.core.defmatrix as matrix
---> 16 makemat = matrix.matrix
     17
     18 # contributed by Stefan van der Walt
/home/ondra/ext/sympy/demandimport.pyc in __getattribute__(self, attr)
     73             return object.__getattribute__(self, attr)
     74         self._load()
---> 75         return getattr(self._module, attr)
     76     def __setattr__(self, attr, val):
     77         self._load()
AttributeError: 'module' object has no attribute 'matrix'

BTW, neither can SymPy. However, maybe it shows some possibilities, and maybe it's possible to fix numpy to work with such a lazy import. On the other hand, I can imagine it can bring a lot more trouble, so it should probably only be optional.

Ondrej
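For reference, the core of the demandimport idea is a module object that defers loading until the first attribute access. The sketch below is a minimal illustration, not mercurial's actual code -- the real thing hooks `__import__` and has to special-case `from x import y` and submodule imports, which is exactly where the traceback above shows numpy tripping it up:

```python
import importlib
import types

class LazyModule(types.ModuleType):
    """Placeholder that imports the real module on first attribute access."""

    def __init__(self, name):
        types.ModuleType.__init__(self, name)
        self._real = None  # the real module, once loaded

    def __getattr__(self, attr):
        # Only called when normal lookup fails, i.e. for attributes that
        # belong to the not-yet-loaded real module.
        if self._real is None:
            self._real = importlib.import_module(self.__name__)
        return getattr(self._real, attr)

json = LazyModule("json")    # nothing is actually imported yet
print(json.dumps([1, 2]))    # first use triggers the real import -> [1, 2]
```

numpy's own package __init__ does things like `import numpy.core.defmatrix as matrix` and then touches `matrix.matrix` while numpy itself is still being set up; a proxy that hasn't resolved the submodule yet hands back an object without that attribute, which is the AttributeError Ondrej hit.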
Re: [Numpy-discussion] import numpy is slow
David Cournapeau wrote:
> IOW, I don't think the problem is the numbers themselves. It has to be
> something else. A simple profiling like
>
> python -m cProfile -o foo.stats foo.py
>
> and then:
>
> python -c "import pstats; p = pstats.Stats('foo.stats'); p.sort_stats('cumulative').print_stats(50)"

OK, see the results -- I think (though I may be wrong) this means that the problem isn't in finding the numpy package.

As for Shark, I'm sorry I missed that message, but I'm trying to see if I can do that now -- I don't seem to have Shark installed, and the ADC site doesn't seem to be working, but I'll keep looking.

Thanks for all your help with this...

-Chris

Fri Aug  1 15:14:10 2008    ImportNumpy.stats

         26987 function calls (26098 primitive calls) in 5.150 CPU seconds

   Ordered by: cumulative time
   List reduced from 631 to 50 due to restriction <50>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    5.151    5.151 {execfile}
        1    0.036    0.036    5.151    5.151 ImportNumpy.py:1(<module>)
        1    0.146    0.146    5.115    5.115 /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/numpy/__init__.py:63(<module>)
        1    0.026    0.026    3.941    3.941 /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/numpy/add_newdocs.py:9(<module>)
        1    0.064    0.064    3.903    3.903 /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/numpy/lib/__init__.py:1(<module>)
        1    0.179    0.179    2.077    2.077 /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/numpy/lib/io.py:1(<module>)
        1    0.483    0.483    1.735    1.735 /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/numpy/lib/_datasource.py:33(<module>)
        1    0.035    0.035    1.582    1.582 /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/numpy/lib/type_check.py:3(<module>)
        1    0.112    0.112    1.547    1.547 /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/numpy/core/__init__.py:2(<module>)
        1    0.010    0.010    1.348    1.348 /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/numpy/core/defmatrix.py:1(<module>)
        1    0.302    0.302    1.338    1.338 /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/numpy/lib/utils.py:1(<module>)
        1    0.518    0.518    1.236    1.236 /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/urllib2.py:74(<module>)
        1    0.012    0.012    0.696    0.696 /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/numpy/testing/__init__.py:2(<module>)
        1    0.327    0.327    0.683    0.683 /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/numpy/testing/numpytest.py:1(<module>)
        1    0.011    0.011    0.681    0.681 /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/compiler/__init__.py:22(<module>)
        1    0.447    0.447    0.650    0.650 /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/httplib.py:67(<module>)
        1    0.351    0.351    0.356    0.356 /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/compiler/transformer.py:9(<module>)
        1    0.012    0.012    0.314    0.314 /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/compiler/pycodegen.py:1(<module>)
        1    0.181    0.181    0.300    0.300 /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/compiler/pyassem.py:1(<module>)
        1    0.162    0.162    0.205    0.205 /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/inspect.py:24(<module>)
        1    0.061    0.061    0.194    0.194 /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/numpy/testing/utils.py:3(<module>)
        1    0.163    0.163    0.163    0.163 /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/mimetools.py:1(<module>)
        1    0.131    0.131    0.163    0.163 /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/tempfile.py:18(<module>)
        1    0.161    0.161    0.162    0.162 /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/unittest.py:45(<module>)
        1    0.131    0.131    0.149    0.149 /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/pydoc.py:35(<module>)
        1    0.117    0.117    0.132    0.132 /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/difflib.py:29(<module>)
        1    0.061    0.061    0.122    0.122 /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/numpy/_import_tools.py:2(<module>)
Re: [Numpy-discussion] import numpy is slow
Robert Kern wrote:
> File/Save As..., pick a file name. When asked about whether to embed
> source files or strip them out, choose Strip. Then email the resulting
> .mshark file to me.

I've done that, and sent it to you directly -- it's too big to put on the mailing list.

> It looks like your Python just takes a truly inordinate amount of time
> to execute any code. Some of the problematic modules like httplib have
> been moved to local imports, but the time it takes for your Python to
> execute the code in that module is still ridiculously large. Can you
> profile just importing httplib instead of numpy?

I've got to go catch a bus now, and I don't have a Mac at home, so this will have to wait till next Monday -- thanks for all your time on this.

-Chris
Re: [Numpy-discussion] import numpy is slow
On Sat, Aug 2, 2008 at 5:33 AM, Ondrej Certik [EMAIL PROTECTED] wrote:
> But I am still unhappy about it. I'd like it if the package could
> import much faster, because it adds up: when you need to import 7
> packages like that, it's suddenly 1s, and that's just too much.

Too much for what? We need more information on the kind of things people who complain about numpy startup cost are doing. I suggested lazy import a few weeks ago when this discussion started (with the example of bzr instead of hg), but I am less convinced that it would be that useful, because numpy is fundamentally different from bzr/hg. As Robert said, it would bring some complexity, and in an area where Python is already fishy.

When you import numpy, you expect some core things to be available, and they are the ones which take the most time. In bzr/hg, you use a *program*, and you can relatively easily change the API because not many people use it. But numpy is essentially an API, not a tool, so we don't have this freedom. Also, it means it is relatively easy for bzr/hg developers to control lazy import, because they are the users, and users of bzr/hg don't deal with Python directly. If our own lazy import has some bugs, it will impact many people who will not be able to trace it.

The main advantage I see with lazy imports is that it avoids someone else breaking the speed-up work by re-importing a costly package globally.

> But of course everything within the constraints that Robert has
> outlined. From the theoretical point of view, I don't understand why
> Python cannot just "import numpy" (or any other package) immediately,
> and only at the moment the user actually accesses something, import it
> for real.

I guess because it would be complex to do everywhere while keeping all the semantics of Python import. Also, like everything lazy, it means it is more complicated to follow what's happening. Your examples show that it would be complex to do.

As I see it, there are some things in numpy we could do a bit differently to cut import times significantly, without changing much. Let's try that first.

> Mercurial uses a lazy import module, that does exactly this. Maybe
> that's an option?

Note that mercurial is under the GPL :)

cheers,
David
Re: [Numpy-discussion] import numpy is slow
On Jul 31, 2008, at 3:53 AM, David Cournapeau wrote:
> You are supposed to run the tests on an installed numpy, not in the
> sources:
>
> import numpy
> numpy.test(verbose=10)

Doesn't that make things more cumbersome to test? That is, if I were to make a change I would need to:

 - python setup.py build (to put the code into the build/* subdirectory)
 - cd to the build directory, or switch to a terminal which was already there
 - manually do the import/test code you wrote, or write a two-line program for it

I would rather do 'nosetests' in the source tree, if at all feasible, although that might only be possible for the Python source.

Hmm. And it looks like testing/nosetester.py (which implements the 'test' function above) is meant to make it easier to run nose, except my feeling is the extra level of wrapping makes things more complicated. The nosetests command line appears to be more flexible, with support for, for example, dropping into the debugger on errors, and resetting the coverage test files.

I'm speaking out of ignorance, btw.

Cheers,
Andrew
[EMAIL PROTECTED]
Re: [Numpy-discussion] import numpy is slow
Hi All,

I've been reading this discussion with interest. I would just like to highlight an alternate use of numpy to interactive use. We have a cluster of machines which process tasks on an individual basis, where a master task may spawn 600 slave tasks to be processed. These tasks are spread across the cluster and processed as scripts in an individual Python thread. Although reducing the process time by 300 seconds for the master task is only about a 1.5% speedup (total time can be in excess of 24000s), we process a large number of these tasks in any given year, and every little helps!

Hanni

2008/7/31 Stéfan van der Walt [EMAIL PROTECTED]
> 2008/7/31 Andrew Dalke [EMAIL PROTECTED]:
>> The user base for numpy might be .. 10,000 people? 100,000 people?
>> Let's go with the latter, and assume that with command-line scripts,
>> CGI scripts, and the other programs that people write in order to help
>> do research means that numpy is started on average 10 times a day.
>>
>> 100,000 people * 10 times / day * 0.1 seconds per startup = almost 28
>> people-hours spent each day waiting for numpy to start.
>
> I don't buy that argument. No single person is agile enough to do
> anything useful in the half a second or so it takes to start up NumPy.
> No one is *waiting* for NumPy to start. Just by answering this e-mail
> I could have (and maybe should have) started NumPy three hundred and
> sixty times.
>
> I don't want to argue about this, though. Write the patches, file a
> ticket, and hopefully someone will deem them important enough to apply
> them.
>
> Stéfan
Re: [Numpy-discussion] import numpy is slow
On Thu, Jul 31, 2008 at 7:31 AM, Hanni Ali [EMAIL PROTECTED] wrote: I would just like to highlight an alternate use of numpy, as opposed to interactive use. We have a cluster of machines which process tasks on an individual basis, where a master task may spawn 600 slave tasks to be processed. These tasks are spread across the cluster and processed as scripts in an individual Python thread. Although reducing the process time by 300 seconds for the master task is only about a 1.5% speedup (total time can be in excess of 24000s), we process a large number of these tasks in any given year and every little helps! There are other components of NumPy/SciPy that are more worthy of optimization. Given that programmer time is a scarce resource, it's more sensible to direct our efforts towards making the other 98.5% of the computation faster. /law of diminishing returns -- Nathan Bell [EMAIL PROTECTED] http://graphics.cs.uiuc.edu/~wnbell/
Re: [Numpy-discussion] import numpy is slow
Nathan Bell wrote: There are other components of NumPy/SciPy that are more worthy of optimization. Given that programmer time is a scarce resource, it's more sensible to direct our efforts towards making the other 98.5% of the computation faster. To be fair, when I took a look at the problem last month, it took a few of us (Robert and me, IIRC) at most two man-hours altogether to halve numpy import times on Linux, without altering the API at all. Maybe there are more things which can be done to get to a 'flatter' profile. cheers, David
Re: [Numpy-discussion] import numpy is slow
On Thu, Jul 31, 2008 at 5:36 AM, Andrew Dalke [EMAIL PROTECTED] wrote: The user base for numpy might be .. 10,000 people? 100,000 people? Let's go with the latter, and assume that with command-line scripts, CGI scripts, and the other programs that people write in order to help do research, numpy is started on average 10 times a day. 100,000 people * 10 times / day * 0.1 seconds per startup = almost 28 people-hours spent each day waiting for numpy to start. I'm willing to spend a few days to achieve that. Perhaps there are fewer people than I'm estimating. OTOH, perhaps there are more imports of numpy per day. An order of magnitude less time is still a couple of hours each day as the world waits to import all of the numpy libraries. If on average people import numpy 10 times a day and it could be made 0.1 seconds faster, then that's 1 second per person per day. If it takes on average 5 minutes to learn to import the module directly and the onus is all on numpy, then after 1 year of use the efficiency has made up for it, and the benefits continue to grow. Just think of the savings that could be achieved if all 2.1 million Walmart employees were outfitted with colostomy bags. 0.5 hours / day for bathroom breaks * 2,100,000 employees * 365 days/year * $7/hour = $2,682,750,000/year Granted, I'm probably not the first to run these numbers. -- Nathan Bell [EMAIL PROTECTED] http://graphics.cs.uiuc.edu/~wnbell/
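The arithmetic behind Andrew's "almost 28 people-hours" figure is easy to check; here is a quick sketch (the user count, imports per day, and startup cost are his assumptions, not measurements):

```python
# Reproducing the back-of-the-envelope numbers from the thread.
users = 100000        # assumed numpy user base
starts_per_day = 10   # assumed 'import numpy' invocations per user per day
startup_cost = 0.1    # assumed seconds per import

seconds_per_day = users * starts_per_day * startup_cost
hours_per_day = seconds_per_day / 3600.0
print(round(hours_per_day, 1))  # prints 27.8, i.e. "almost 28 people-hours"
```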
Re: [Numpy-discussion] import numpy is slow
On Thu, Jul 31, 2008 at 03:41:15PM +0900, David Cournapeau wrote: Yes. Nothing that an easy makefile cannot solve, nonetheless (I am sure I am not the only one with a makefile/script which automates the above, to test a newly svn-updated numpy in one command). That's why distutils has a test target. You can do python setup.py test, and if you have set up your setup.py properly it should work (obviously it is easy to make this statement, and harder to get the thing working). Gaël
Re: [Numpy-discussion] import numpy is slow
On Thu, Jul 31, 2008 at 12:43:17PM +0200, Andrew Dalke wrote: Startup performance has not been a numpy concern. It is a concern for me, and it has been (for other packages) a concern for some of my clients. I am curious: if startup performance is a problem, I guess it is because you are running lots of little scripts where startup time is big compared to run time. Did you think of forking them from an already-started process? I had this same problem (with libraries way slower than numpy to load) and used os.fork to great success. Gaël
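Gaël's fork-based approach can be sketched roughly as follows; the `run_forked`/`heavy_work` names and the pipe-based result passing are illustrative, not his actual setup (POSIX only, since it relies on os.fork):

```python
import os

def run_forked(task, arg):
    """Run task(arg) in a forked child.  The child inherits every module
    already imported by the parent, so the import cost is paid only once."""
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:                 # child process: compute and send the result
        os.close(r)
        os.write(w, str(task(arg)).encode())
        os.close(w)
        os._exit(0)              # exit without running cleanup handlers
    os.close(w)                  # parent process: read until child closes pipe
    result = os.fdopen(r, "rb").read().decode()
    os.waitpid(pid, 0)
    return result

# Pay for the slow import once, up front; every forked task reuses it.
import math

def heavy_work(x):               # stand-in for a numpy-using slave task
    return math.sqrt(x)

print(run_forked(heavy_work, 144.0))  # prints 12.0
```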
Re: [Numpy-discussion] import numpy is slow
On Thu, Jul 31, 2008 at 2:12 AM, Andrew Dalke [EMAIL PROTECTED] wrote: Hmm. And it looks like testing/nosetester.py (which implements the 'test' function above) is meant to make it easier to run nose, except my feeling is that the extra level of wrapping makes things more complicated. The nosetests command line appears to be more flexible, with support for, for example, dropping into the debugger on errors and resetting the coverage test files. You can actually pass those sorts of options to nose through the extra_argv parameter in test(). That might be a little cumbersome, but (as far as I know) it's something I'm going to do so infrequently that it's not a big deal.
Re: [Numpy-discussion] import numpy is slow
Gael Varoquaux wrote: That's why distutils has a test target. You can do python setup.py test, and if you have set up your setup.py properly it should work (obviously it is easy to make this statement, and harder to get the thing working). I have already seen some discussion about distutils like this, if you mean something like this: http://blog.ianbicking.org/pythons-makefile.html but I would take rake and make over this anytime. I just don't understand why something like rake does not exist in Python, but well, let's not go there. David
Re: [Numpy-discussion] import numpy is slow
On Thu, Jul 31, 2008 at 11:05:33PM +0900, David Cournapeau wrote: Gael Varoquaux wrote: That's why distutils has a test target. You can do python setup.py test, and if you have set up your setup.py properly it should work (obviously it is easy to make this statement, and harder to get the thing working). I have already seen some discussion about distutils like this, if you mean something like this: http://blog.ianbicking.org/pythons-makefile.html but I would take rake and make over this anytime. I just don't understand why something like rake does not exist in Python, but well, let's not go there. Well, actually, in the Enthought tool suite we use setuptools for packaging (I don't want to start a controversy, I am not advocating the use of setuptools, just stating a fact) and nose for testing, and getting setup.py test to work, including doing the build and downloading nose if not there, is a matter of adding these two lines to the setup.py: tests_require = ['nose >= 0.10.3'], test_suite = 'nose.collector', Obviously, the build part has to be well-tuned for the machinery to work, but there is a lot of value here. Gaël
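A minimal setup.py along the lines Gaël describes might look like this sketch; the project name and layout are placeholders, and only the last two keywords are the ones he quotes:

```python
# Hypothetical minimal setup.py: 'python setup.py test' builds the package,
# fetches nose if it is missing, and lets nose collect and run the tests.
from setuptools import setup

setup(
    name="mypackage",                    # placeholder project name
    version="0.1",
    packages=["mypackage"],              # placeholder package layout
    tests_require=["nose >= 0.10.3"],    # downloaded on demand if missing
    test_suite="nose.collector",         # nose discovers and runs the tests
)
```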
Re: [Numpy-discussion] import numpy is slow
Gael Varoquaux wrote: Obviously, the build part has to be well-tuned for the machinery to work, but there is a lot of value here. Ah yes, setuptools does have this. But this is specific to setuptools; bare distutils does not have this test command, right? cheers, David
Re: [Numpy-discussion] import numpy is slow
On Thu, Jul 31, 2008 at 10:14 AM, Gael Varoquaux [EMAIL PROTECTED] wrote: On Thu, Jul 31, 2008 at 12:43:17PM +0200, Andrew Dalke wrote: Startup performance has not been a numpy concern. It is a concern for me, and it has been (for other packages) a concern for some of my clients. I am curious: if startup performance is a problem, I guess it is because you are running lots of little scripts where startup time is big compared to run time. Did you think of forking them from an already-started process? I had this same problem (with libraries way slower than numpy to load) and used os.fork to great success. Start-up time is an issue for me, but in a larger sense than just numpy. I do run many scripts, some that are ephemeral and some that take significant amounts of time. However, numpy is just one of many, many libraries that I must import, so improvements, even minor ones, are appreciated. The moral of this discussion, for me, is that just because _you_ don't care about a particular aspect or feature doesn't mean that others don't or shouldn't. Your workarounds may not be viable for me and vice versa. So let's just go with the spirit of open source and encourage those motivated to contribute to do so, provided their suggestions are sensible and do not break code. -Kevin
Re: [Numpy-discussion] import numpy is slow
On Thu, Jul 31, 2008 at 11:16:12PM +0900, David Cournapeau wrote: Gael Varoquaux wrote: Obviously, the build part has to be well-tuned for the machinery to work, but there is a lot of value here. Ah yes, setuptools does have this. But this is specific to setuptools; bare distutils does not have this test command, right? Dunno, sorry. The scale of my ignorance of distutils and related subjects would probably impress you :). Gaël, looking forward to your tutorial on scons.
Re: [Numpy-discussion] import numpy is slow
On Thu, Jul 31, 2008 at 10:34:04AM -0400, Kevin Jacobs [EMAIL PROTECTED] wrote: The moral of this discussion, for me, is that just because _you_ don't care about a particular aspect or feature doesn't mean that others don't or shouldn't. Your workarounds may not be viable for me and vice versa. So let's just go with the spirit of open source and encourage those motivated to contribute to do so, provided their suggestions are sensible and do not break code. I fully agree here. And if people improve numpy's startup time without breaking or obfuscating stuff, I am very happy. I was just trying to help :). Yes, the value of open source is that different people improve the same tools to meet different goals, thus we should always keep an open ear to other people's requirements, especially if they come up with high-quality code. Gaël
Re: [Numpy-discussion] import numpy is slow
Andrew Dalke wrote: If I had my way, remove things like (in numpy/__init__.py) import linalg import fft import random import ctypeslib import ma As a side benefit, this might help folks using py2exe, py2app and friends -- as it stands, all those sub-modules need to be included in your app bundle regardless of whether they are used. I recall having to explicitly add them by hand, too, though that may have been a matplotlib.numerix issue. but leave the list of submodules in __all__ so that from numpy import * works. Of course, no one should be doing that anyway ;-) And for what it's worth, I've found myself very frustrated by how long it takes to start up Python and import numpy. I often do whip out the interpreter to do something fast, and I didn't used to have to wait for it. On my OS-X box (10.4.11, python2.5, numpy '1.1.1rc2'), it takes about 7 seconds to import numpy! -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/ORR (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception [EMAIL PROTECTED]
Re: [Numpy-discussion] import numpy is slow
Christopher Barker wrote: On my OS-X box (10.4.11, python2.5, numpy '1.1.1rc2'), it takes about 7 seconds to import numpy! Hot or cold? If hot, there is something horribly wrong with your setup. On my MacBook, it takes ~180 ms to run python -c 'import numpy', and ~100 ms on Linux (same machine). cheers, David
Re: [Numpy-discussion] import numpy is slow
Stéfan van der Walt wrote: No one is *waiting* for NumPy to start. I am, probably 10 times a day, yes. And it's a major issue for CGI, though maybe no one's using that anymore anyway. Just by answering this e-mail I could have (and maybe should have) started NumPy three hundred and sixty times. Sure, but I like wasting my time on mailing lists -Chris
Re: [Numpy-discussion] import numpy is slow
David Cournapeau wrote: Christopher Barker wrote: On my OS-X box (10.4.11, python2.5, numpy '1.1.1rc2'), it takes about 7 seconds to import numpy! Hot or cold? If hot, there is something horribly wrong with your setup. Hot -- it takes about 10 cold. I've been wondering about that. time python -c 'import numpy' real 0m8.383s user 0m0.320s sys 0m7.805s and similar results if run multiple times in a row. Any idea what could be wrong? I have no clue where to start, though I suppose a complete clean-out and re-install of Python comes to mind. Oh, and this is a dual G5 PPC (which should have a faster disk than your MacBook) -Chris
Re: [Numpy-discussion] import numpy is slow
On Thu, 31 Jul 2008 10:12:22 -0700 Christopher Barker [EMAIL PROTECTED] wrote: David Cournapeau wrote: Christopher Barker wrote: On my OS-X box (10.4.11, python2.5, numpy '1.1.1rc2'), it takes about 7 seconds to import numpy! Hot or cold? If hot, there is something horribly wrong with your setup. hot -- it takes about 10 cold. I've been wondering about that. time python -c 'import numpy' real 0m8.383s user 0m0.320s sys 0m7.805s and similar results if run multiple times in a row. Any idea what could be wrong? I have no clue where to start, though I suppose a complete clean out and re-install of python comes to mind. oh, and this is a dual G5 PPC (which should have a faster disk than your Macbook) -Chris No idea, but for comparison: time /usr/bin/python -c 'import numpy' real 0m0.295s user 0m0.236s sys 0m0.050s [EMAIL PROTECTED]:~/svn/matplotlib cat /proc/cpuinfo processor : 0 vendor_id : AuthenticAMD cpu family : 6 model : 10 model name : mobile AMD Athlon (tm) 2500+ stepping : 0 cpu MHz : 662.592 cache size : 512 KB fdiv_bug : no hlt_bug : no f00f_bug : no coma_bug : no fpu : yes fpu_exception : yes cpuid level : 1 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 sep mtrr pge mca cmov pat pse36 mmx fxsr sse pni syscall mp mmxext 3dnowext 3dnow bogomips : 1316.57 Nils
Re: [Numpy-discussion] import numpy is slow
On Thu, Jul 31, 2008 at 1:12 PM, Christopher Barker [EMAIL PROTECTED] wrote: David Cournapeau wrote: Christopher Barker wrote: On my OS-X box (10.4.11, python2.5, numpy '1.1.1rc2'), it takes about 7 seconds to import numpy! Hot or cold? If hot, there is something horribly wrong with your setup. hot -- it takes about 10 cold. I've been wondering about that. time python -c 'import numpy' real 0m8.383s user 0m0.320s sys 0m7.805s and similar results if run multiple times in a row. Any idea what could be wrong? I have no clue where to start, though I suppose a complete clean out and re-install of python comes to mind. Is only 'import numpy' slow, or do other packages import slowly too? Are there remote directories in your PYTHONPATH? Do you have old eggs in the site-packages directory that point to remote directories (installed with setuptools develop)? Try cleaning the site-packages directory. That did the trick for me once. David oh, and this is a dual G5 PPC (which should have a faster disk than your Macbook) -Chris
Re: [Numpy-discussion] import numpy is slow
hot -- it takes about 10 cold. I've been wondering about that. time python -c 'import numpy' real 0m8.383s user 0m0.320s sys 0m7.805s and similar results if run multiple times in a row. What does python -c 'import sys; print sys.path' say? Any idea what could be wrong? I have no clue where to start, though I suppose a complete clean out and re-install of python comes to mind. oh, and this is a dual G5 PPC (which should have a faster disk than your Macbook) Disk should not matter. If hot, everything should be in the IO buffer; opening a file is of the order of a few microseconds (that's certainly the order on Linux; the VM on Mac OS X is likely not as good, but still). cheers, David
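A quick way to see where the time goes, in the spirit of the instrumentation scripts mentioned elsewhere in this thread, is to time an import and record which modules it drags in. This is a rough sketch, not anyone's actual tool; a fresh interpreter gives the most honest numbers:

```python
import sys
import time

def timed_import(name):
    """Import `name`, returning the wall-clock cost and the list of modules
    it newly pulled into sys.modules.  A crude profiler: the numbers are
    only meaningful the first time a module is imported in this process."""
    before = set(sys.modules)
    t0 = time.time()
    __import__(name)
    elapsed = time.time() - t0
    return elapsed, sorted(set(sys.modules) - before)

elapsed, pulled = timed_import("tempfile")
print("%.3fs, %d new modules" % (elapsed, len(pulled)))
```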
Re: [Numpy-discussion] import numpy is slow
On Thu, Jul 31, 2008 at 10:12:22AM -0700, Christopher Barker wrote: I've been wondering about that. time python -c 'import numpy' real 0m8.383s user 0m0.320s sys 0m7.805s I don't know what is wrong, but this is plain wrong, unless you are on a distant file system, or something unusual. On the box I am currently on, I get: python -c 'import numpy' 0.10s user 0.03s system 101% cpu 0.122 total And this matches my overall experience. Gaël
Re: [Numpy-discussion] import numpy is slow
On Thu, Jul 31, 2008 at 07:46:20AM -0500, Nathan Bell wrote: On Thu, Jul 31, 2008 at 7:31 AM, Hanni Ali [EMAIL PROTECTED] wrote: I would just like to highlight an alternate use of numpy, as opposed to interactive use. We have a cluster of machines which process tasks on an individual basis, where a master task may spawn 600 slave tasks to be processed. These tasks are spread across the cluster and processed as scripts in an individual Python thread. Although reducing the process time by 300 seconds for the master task is only about a 1.5% speedup (total time can be in excess of 24000s), we process a large number of these tasks in any given year and every little helps! There are other components of NumPy/SciPy that are more worthy of optimization. Given that programmer time is a scarce resource, it's more sensible to direct our efforts towards making the other 98.5% of the computation faster. This is true in general, but I have a different use case for one of my programs that uses numpy on a cluster. Basically, the program gets called thousands of times per day and the runtime for each is only a second or two. In this case I am much more dominated by numpy's import time. Scott PS: Yes, I could change the way that the routine works so that it is called many fewer times; however, that would be very difficult (although not impossible). A free speedup due to faster numpy import would be very nice. -- Scott M. Ransom Address: NRAO Phone: (434) 296-0320 520 Edgemont Rd. email: [EMAIL PROTECTED] Charlottesville, VA 22903 USA GPG Fingerprint: 06A9 9553 78BE 16DB 407B FFCA 9BFA B6FF FFD3 2989
Re: [Numpy-discussion] import numpy is slow
On Thu, Jul 31, 2008 at 05:43, Andrew Dalke [EMAIL PROTECTED] wrote: On Jul 31, 2008, at 12:03 PM, Robert Kern wrote: But you still can't remove them since they are being used inside numerictypes. That's why I labeled them internal utility functions instead of leaving them with minimal docstrings such that you would have to guess. My proposal is to replace that code with a table mapping the type name to the uppercase/lowercase/capitalized forms, thus eliminating the (small) amount of time needed to import string. It makes adding new types slightly more difficult. I know it's a tradeoff. Probably not a bad one. Write up the patch, and then we'll see how much it affects the import time. I would much rather that we discuss concrete changes like this rather than rehash the justifications of old decisions. Regardless of the merits about the old decisions (and I agreed with your position at the time), it's a pointless and irrelevant conversation. The decisions were made, and now we have a user base to whom we have promised not to break their code so egregiously again. The relevant conversation is what changes we can make now. Some general guidelines: 1) Everything exposed by from numpy import * still needs to work. a) The layout of everything under numpy.core is an implementation detail. b) _underscored functions and explicitly labeled internal functions can probably be modified. c) Ask about specific functions when in doubt. 2) The improvement in import times should be substantial. Feel free to bundle up the optimizations for consideration. 3) Moving imports from module-level down into the functions where they are used is generally okay if we get a reasonable win from it. The local imports should be commented, explaining that they are made local in order to improve the import times. 4) __import__ hacks are off the table. 5) Proxy objects ... I would really like to avoid proxy objects. They have caused fragility in the past. 
6) I'm not a fan of having environment variables control the way numpy gets imported, but I'm willing to consider it. For example, I might go for having proxy objects for linalg et al. *only* if a particular environment variable were set. But there had better be a very large improvement in import times. -- Robert Kern I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth. -- Umberto Eco
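Guideline 3 above, in practice, might look like the following sketch; the function name and the deferred module are illustrative, not numpy's actual io code:

```python
import os
import tempfile

def save_compressed(filename, text):
    # Local import per guideline 3: zipfile is only needed by this
    # function, so deferring the import keeps the package's top-level
    # import cheap.  This comment documents why the import is local.
    import zipfile
    zf = zipfile.ZipFile(filename, "w")
    try:
        zf.writestr("data.txt", text)
    finally:
        zf.close()

# Usage: nothing zipfile-related is loaded until the first call.
path = os.path.join(tempfile.mkdtemp(), "example.zip")
save_compressed(path, "hello")
```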
Re: [Numpy-discussion] import numpy is slow
On Fri, Aug 1, 2008 at 5:02 AM, Robert Kern [EMAIL PROTECTED] wrote: 5) Proxy objects ... I would really like to avoid proxy objects. They have caused fragility in the past. One recurrent problem with import-time optimization is that it is some work to improve it, but it takes one line to destroy it all. For example, the inspect import came back, and this alone is ~10-15% of my import time on Mac OS X (from ~180 ms to ~160 ms). Avoiding this would be the main advantage of lazy imports; but is it really worth the trouble, since it brings some complexity, as you mentioned last time we had this discussion? Maybe a simple test script to check for known costly imports would be enough (run from time to time?). Maybe ctypes can be loaded on the fly, too. Those are the two obvious hotspots (~25% altogether) with a recent SVN checkout. 6) I'm not a fan of having environment variables control the way numpy gets imported, but I'm willing to consider it. For example, I might go for having proxy objects for linalg et al. *only* if a particular environment variable were set. But there had better be a very large improvement in import times. linalg does not seem to have a huge impact. It is typically much faster to load than ctypeslib or inspect. cheers, David
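David's "simple test script to check for known costly imports" could be as small as the following sketch; the watch list and the package names used are examples, not an agreed-upon list:

```python
import subprocess
import sys

# Hypothetical watch list: the two hotspots named in this thread.
COSTLY = ["inspect", "ctypes"]

def costly_imports(package):
    """Import `package` in a fresh interpreter and report which of the
    known-costly modules ended up in sys.modules as a side effect."""
    code = ("import sys, %s; "
            "print('\\n'.join(m for m in %r if m in sys.modules))"
            % (package, COSTLY))
    out = subprocess.check_output([sys.executable, "-c", code])
    return out.decode().split()

# An empty list means the package does not drag in any watched module;
# a buildbot could run this from time to time and fail on regressions.
print(costly_imports("math"))
```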
Re: [Numpy-discussion] import numpy is slow
On Jul 4, 2008, at 2:22 PM, Andrew Dalke wrote: [josiah:numpy/build/lib.macosx-10.3-fat-2.5] dalke% time python -c 'pass' 0.015u 0.042s 0:00.06 83.3% 0+0k 0+0io 0pf+0w [josiah:numpy/build/lib.macosx-10.3-fat-2.5] dalke% time python -c 'import numpy' 0.084u 0.231s 0:00.33 93.9% 0+0k 0+8io 0pf+0w [josiah:numpy/build/lib.macosx-10.3-fat-2.5] dalke% For one of my clients I wrote a tool to analyze import times. I don't have it, but here's something similar I just now whipped up: Based on those results I've been digging into the code trying to figure out why numpy imports so many files, and at the same time I've been trying to guess at the use case Robert Kern regards as typical when he wrote: Your use case isn't so typical and so suffers on the import time end of the balance and trying to figure out what code would break if those modules weren't all eagerly imported and were instead written as most other Python modules are written. I have two thoughts for why mega-importing might be useful: - interactive users get to do tab complete and see everything (eg, import numpy means numpy.fft.ifft works, without having to do import numpy.fft manually) - class inspectors don't need to do directory checks to find possible modules (This is a stretch, since every general-purpose inspector I know of has to know how to frob the directories to find directories.) Are these the reasons numpy imports everything, or are there other reasons? The first guess comes from the comment in numpy/__init__.py The following sub-packages must be explicitly imported: meaning, I take it, that the other modules (core, lib, random, linalg, fft, testing) do not need to be explicitly imported. Is the numpy recommendation that people should do: import numpy numpy.fft.ifft(data) ? If so, the documentation should be updated to say that random, ma, ctypeslib and several other libraries are included in that list. Why is the last so important that it should be in the top-level namespace? 
In my opinion, this assistance is counter to standard practice in effectively every other Python package. I don't see the benefit. You may ask if there are possible improvements. There's no obvious place taking up a bunch of time, but there are plenty of small places which add up. For example: 1) I wondered why 'cPickle' needed to be imported. One of the places it's used is numpy.lib.format, which is only imported by numpy.lib.io. It's easy to defer the 'import format' to be inside the functions which need it. Note that io.py already defers the import of zipfile, so function-local imports are not inappropriate. 'io' imports 'tempfile', needing 0.016 seconds. This can be a deferred cost only incurred by those who use io.savez, which already has some function-local imports. The reason for the high import costs? Here's what tempfile itself imports. tempfile: 0.016 (io) errno: 0.000 (tempfile) random: 0.010 (tempfile) binascii: 0.003 (random) _random: 0.003 (random) fcntl: 0.003 (tempfile) thread: 0.000 (tempfile) (This is read as: 'tempfile' is imported by 'io' and takes 0.016 seconds total, including all children, and the directly imported children of 'tempfile' are 'errno', 'random', 'fcntl' and 'thread'. 'random' imports 'binascii' and '_random'.) BTW, the load and save commands in io do an incorrect check. if isinstance(file, type('')): fid = _file(file, 'rb') else: fid = file Filenames can be unicode strings. This test should either be isinstance(file, basestring) or not hasattr(file, 'read') 2) What's the point of add_newdocs? According to the top of the module # This is only meant to add docs to objects defined in C-extension modules. # The purpose is to allow easier editing of the docstrings without # requiring a re-compile. which implies this aids development, but not deployment. The import takes a minuscule 0.006 seconds of the 0.225 (importing lib and its subimports takes 0.141 seconds) but seems to add no direct end-user benefit. 
Shouldn't this documentation be pushed into the C code, at least for each release? 3) I see that numpy/core/numerictypes.py imports 'string', which takes 0.008 seconds. I wondered why. It's part of english_lower, english_upper, and english_capitalize, which are functions defined in that module. The implementation can't be improved, and using string.translate is the right approach. However, 3a) the two functions have no leading underscore and have docstrings that imply this is part of the public API (although they are not included in __all__). Are they meant for general use? Note that english_capitalize is over-engineered for the use case in that file. There are no empty type names, so the test 'if s' is never false. 3b) there are only 33
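Andrew's table proposal from earlier in the thread (a mapping from each type name to its case variants, so 'import string' is no longer needed at numpy import time) might look like this hypothetical sketch; the entries shown are illustrative, not the full list:

```python
# Hypothetical table-driven replacement for the string-module-based
# english_lower / english_upper / english_capitalize helpers.
_CASE_TABLE = {
    # name: (lower, UPPER, Capitalized) -- illustrative entries only
    "int8":    ("int8", "INT8", "Int8"),
    "float64": ("float64", "FLOAT64", "Float64"),
    "bool":    ("bool", "BOOL", "Bool"),
}

def english_capitalize(s):
    """ASCII-only capitalization via table lookup, with a plain fallback
    for names missing from the table (and for the empty string)."""
    try:
        return _CASE_TABLE[s][2]
    except KeyError:
        return (s[0].upper() + s[1:]) if s else s
```

The trade-off Andrew concedes applies here too: adding a new type means adding a table row by hand.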
Re: [Numpy-discussion] import numpy is slow
2008/7/30 Andrew Dalke [EMAIL PROTECTED]: Based on those results I've been digging into the code trying to figure out why numpy imports so many files, and at the same time I've been trying to guess at the use case Robert Kern regards as typical when he wrote: Your use case isn't so typical and so suffers on the import time end of the balance I.e. most people don't start up NumPy all the time -- they import NumPy, and then do some calculations, which typically take longer than the import time. and trying to figure out what code would break if those modules weren't all eagerly imported and were instead written as most other Python modules are written. For a benefit of 0.03s, I don't think it's worth it. I have two thoughts for why mega-importing might be useful: - interactive users get to do tab complete and see everything (eg, import numpy means numpy.fft.ifft works, without having to do import numpy.fft manually) Numpy has a very flat namespace, for better or worse, which implies many imports. This can't be easily changed without modifying the API. Is the numpy recommendation that people should do: import numpy numpy.fft.ifft(data) That's the way many people use it. ? If so, the documentation should be updated to say that random, ma, ctypeslib and several other libraries are included in that list. Thanks for pointing that out, I'll edit the documentation wiki. Why is the last so important that it should be in the top- level namespace? It's a single Python file -- does it make much of a difference? In my opinion, this assistance is counter to standard practice in effectively every other Python package. I don't see the benefit. How do you propose we change this? BTW, the load and save commands in io do an incorrect check. if isinstance(file, type()): fid = _file(file,rb) else: fid = file Thanks, fixed. [snip lots of suggestions] Getting rid of these functions, and thus getting rid of the import speeds numpy startup time by 3.5%. 
While I appreciate you taking the time to find these niggles, we are short on developer time as it is. Asking developers to spend their precious time on making a 3.5% improvement in startup time does not make much sense. If you provide a patch, on the other hand, it would only take a matter of seconds to decide whether to apply it or not. You've already done most of the sleuth work. I could probably get another 0.05 seconds if I dug around more, but I can't without knowing what use case numpy is trying to achieve. Why are all those ancillary modules (testing, ctypeslib) eagerly loaded when there seems no need for that feature? Need is relative. You need fast startup time, but most of our users need quick access to whichever functions they want (and often use from an interactive terminal). I agree that testing and ctypeslib do not belong in that category, but they don't seem to do much harm either. Regards Stéfan
Re: [Numpy-discussion] import numpy is slow
On Jul 30, 2008, at 10:59 PM, Stéfan van der Walt wrote: I.e. most people don't start up NumPy all the time -- they import NumPy, and then do some calculations, which typically take longer than the import time.

Is that interactively, or is that through programs?

For a benefit of 0.03s, I don't think it's worth it.

The final number with all the hundredths of a second added up to 0.08 seconds, which was about 30% of the 'import numpy' cost.

Numpy has a very flat namespace, for better or worse, which implies many imports.

I don't get the feeling that numpy is flat. Python's stdlib is flat. Numpy has many 2- and 3-level modules.

Is the numpy recommendation that people should do: import numpy numpy.fft.ifft(data)

That's the way many people use it.

The normal Python way is:

    from numpy import fft
    fft.ifft(data)

because in most packages, parent modules don't import all of their children. I acknowledge that existing numpy code will break with my desired change, as will this example from the tutorial:

    import numpy
    import pylab
    # Build a vector of 1 normal deviates with variance 0.5^2 and mean 2
    mu, sigma = 2, 0.5
    v = numpy.random.normal(mu, sigma, 1)

and I am not saying to change this code. Instead, I am asking for limits on the eagerness, with a long-term goal of minimizing its use.

Why is [ctypeslib] so important that it should be in the top-level namespace? It's a single Python file -- does it make much of a difference?

The file imports other files. Here's the import chain:

    ctypeslib: 0.047 (numpy)
      ctypes: -1.000 (ctypeslib)
        _ctypes: 0.003 (ctypes)
        gestalt: -1.000 (ctypes)
    ma: 0.005 (numpy)
      extras: 0.001 (ma)
        numpy.lib.index_tricks: 0.000 (extras)
        numpy.lib.polynomial: 0.000 (extras)

(The -1.000 indicates a bug in my instrumentation script, which I worked around with a -1.0 value.)

Every numpy program, because it eagerly imports 'ctypeslib' to make it accessible as a top-level variable, ends up importing ctypes:

    >>> if 1:
    ...     t1 = time.time()
    ...     import ctypes
    ...     t2 = time.time()
    ...
    >>> t2 - t1
    0.032159090042114258

That's 10% of the import time.

In my opinion, this assistance is counter to standard practice in effectively every other Python package. I don't see the benefit.

How do you propose we change this?

If I had my way, remove things like (in numpy/__init__.py):

    import linalg
    import fft
    import random
    import ctypeslib
    import ma

but leave the list of submodules in __all__ so that from numpy import * works. Perhaps add a top-level function 'import_all()' which mimics the current behavior, and have IPython know about it so interactive users get it automatically. Or something like that.

Yes, I know the numpy team won't change this behavior. I want to know what you all will consider changing.

Something more concrete: change the top-level definitions in 'numpy' from:

    from testing import Tester
    test = Tester().test
    bench = Tester().bench

to:

    def test(label='fast', verbose=1, extra_argv=None, doctests=False,
             coverage=False, **kwargs):
        from testing import Tester
        Tester().test(label, verbose, extra_argv, doctests, coverage, **kwargs)

and do something similar for 'bench'. Note that numpy currently implements:

    numpy.test -- this is a Tester().test
    numpy.testing.test -- another Tester().test bound method

so there's some needless and distracting, but extremely minor, duplication. Getting rid of these functions, and thus getting rid of the import, speeds numpy startup time by 3.5%.

While I appreciate you taking the time to find these niggles, we are short on developer time as it is. Asking them to spend their precious time on making a 3.5% improvement in startup time does not make much sense. If you provide a patch, on the other hand, it would only take a matter of seconds to decide whether to apply or not. You've already done most of the sleuth work.

I wrote that I don't know the reasons for why the design was as it is.
Are those functions (english_upper, english_lower, english_capitalize) expected as part of the public interface for the module? The lack of a _ prefix and their verbose docstrings imply that they are for general use. In that case, they can't easily be gotten rid of. Yet it doesn't make sense for them to be part of 'numerictypes'. Why would I submit a patch if there's no way those definitions will disappear, for reasons I am not aware of? I am not asking you all to make these changes. I'm asking about how much change is acceptable, what the restrictions are, and why they are there. I also haven't yet figured out how to get the regression tests to run, and I'm not going to contribute patches without at least passing that bare minimum. BTW, how do I do that? In the top-level there's a 'test.sh' command but when I run it I get: % mkdir
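Andrew's proposal -- keep the public function but defer the heavy import until the first call -- is a general pattern. A runnable sketch with stand-in modules rather than numpy's (the helper name make_lazy is invented, and 'fractions' stands in for a heavy submodule like numpy.testing; importlib.import_module is a later addition than the Python of this thread):

```python
import importlib
import sys

def make_lazy(module_name, attr_name):
    """Return a callable that imports module_name only on first use,
    then delegates to its attr_name attribute."""
    def wrapper(*args, **kwargs):
        module = importlib.import_module(module_name)
        return getattr(module, attr_name)(*args, **kwargs)
    return wrapper

sys.modules.pop("fractions", None)       # make sure it isn't cached yet
Fraction = make_lazy("fractions", "Fraction")
assert "fractions" not in sys.modules    # nothing imported at definition time
half = Fraction(1, 2)                    # first call triggers the import
assert "fractions" in sys.modules
```

The API stays unchanged for callers, but programs that never call the function never pay for the import.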
Re: [Numpy-discussion] import numpy is slow
On Jul 30, 2008, at 10:51 PM, Alan McIntyre wrote: I suppose it's necessary for providing the test() and bench() functions in subpackages, but I think that isn't a good reason to impose upon all users the time required to set up numpy.testing.

I just posted this in my reply to Stéfan, but I'll say it again here. numpy defines:

    numpy.test
    numpy.bench
    numpy.testing.test

The two 'test's use the same implementation. This is likely unneeded duplication and one should be removed. The choice depends on whether people think the name should be 'numpy.test' or 'numpy.testing.test'.

BTW, where's the on-line documentation for these functions? They are actually bound methods, and I wondered if the doc programs handle them okay. If they should be top-level functions then I would prefer they be actual functions that hide an import. In that case, replace:

    from testing import Tester
    test = Tester().test

with:

    def test(label='fast', verbose=1, extra_argv=None, doctests=False,
             coverage=False, **kwargs):
        from testing import Tester
        Tester().test(label, verbose, extra_argv, doctests, coverage, **kwargs)

or something similar. This would keep the API unchanged (assuming those are important in the top-level) and reduce the number of imports. Else I would keep/move them in 'numpy.testing' and require that anyone who wants to use 'test' or 'bench' get them after a 'from numpy import testing'.

Thanks for taking the time to find those; I just removed the unused glob and delayed the import of shlex, difflib, and inspect in numpy.testing.

Thanks!

Andrew [EMAIL PROTECTED]
Re: [Numpy-discussion] import numpy is slow
On Thu, 2008-07-31 at 02:07 +0200, Andrew Dalke wrote: On Jul 30, 2008, at 10:59 PM, Stéfan van der Walt wrote: I.e. most people don't start up NumPy all the time -- they import NumPy, and then do some calculations, which typically take longer than the import time.

Is that interactively, or is that through programs?

Most people use it interactively, or for long-running programs. Import times only matter for interactive commands depending on numpy.

and I am not saying to change this code. Instead, I am asking for limits on the eagerness, with a long-term goal of minimizing its use.

For new API, this is never done, and it is a bug if it is. In scipy, typically, import scipy does not import the whole subpackages list.

I also haven't yet figured out how to get the regression tests to run, and I'm not going to contribute patches without at least passing that bare minimum. BTW, how do I do that? In the top-level there's a 'test.sh' command but when I run it I get:

Argh, this file should never have ended up here; that's entirely my fault. It was a merge from a (at the time) experimental branch. I can't remove it now because my company does not allow subversion access, but I will fix this tonight. Sorry for the confusion.

and when I run 'nosetests' in the top-level directory I get: ImportError: Error importing numpy: you should not try to import numpy from its source directory; please exit the numpy source tree, and relaunch your python intepreter from there. I couldn't find (in a cursory search) instructions for running self-tests or regression tests.

You are supposed to run the tests on an installed numpy, not in the sources:

    import numpy
    numpy.test(verbose=10)

You can't really run numpy without installing it first (which is what the message is about).

cheers, David
Re: [Numpy-discussion] import numpy is slow
On Wed, Jul 30, 2008 at 8:19 PM, Andrew Dalke [EMAIL PROTECTED] wrote: numpy defines numpy.test numpy.bench and numpy.testing.test The two 'test's use the same implementation. This is likely unneeded duplication and one should be removed. The choice depends on whether people think the name should be 'numpy.test' or 'numpy.testing.test'.

They actually do two different things; numpy.test() runs tests for all of numpy, and numpy.testing.test() runs tests for numpy.testing only. There are similar functions in numpy.lib, numpy.core, etc.
Re: [Numpy-discussion] import numpy is slow
On Jul 31, 2008, at 4:21 AM, Alan McIntyre wrote: They actually do two different things; numpy.test() runs tests for all of numpy, and numpy.testing.test() runs tests for numpy.testing only. There are similar functions in numpy.lib, numpy.core, etc.

Really? This is the code from numpy/__init__.py:

    from testing import Tester
    test = Tester().test
    bench = Tester().bench

This is the code from numpy/testing/__init__.py:

    test = Tester().test

... ahhh, here's the magic, from testing/nosetester.py:NoseTester:

    if package is None:
        f = sys._getframe(1)
        package = f.f_locals.get('__file__', None)
        assert package is not None
        package = os.path.dirname(package)

Why are 'test' and 'bench' part of the general API instead of something only used during testing?

Andrew [EMAIL PROTECTED]
Re: [Numpy-discussion] import numpy is slow
On Jul 3, 2008, at 9:06 AM, Robert Kern wrote: Can you try the SVN trunk?

Sure. Though did you know it's not easy to find how to get numpy from SVN? I had to go to the second page of Google, which linked to someone's talk. I expected to find a link to it at http://numpy.scipy.org/ . Just like I expected to find a link to the numpy mailing list.

Okay, compiled.

    [josiah:numpy/build/lib.macosx-10.3-fat-2.5] dalke% time python -c 'pass'
    0.015u 0.042s 0:00.06 83.3% 0+0k 0+0io 0pf+0w
    [josiah:numpy/build/lib.macosx-10.3-fat-2.5] dalke% time python -c 'import numpy'
    0.084u 0.231s 0:00.33 93.9% 0+0k 0+8io 0pf+0w

Previously it took 0.44 seconds so it's now 24% faster.

I would be interested to know how significantly it improves your use case.

For one of my clients I wrote a tool to analyze import times. I don't have it, but here's something similar I just now whipped up:

    import time

    seen = set()
    import_order = []
    elapsed_times = {}
    level = 0
    parent = None
    children = {}

    def new_import(name, globals, locals, fromlist):
        global level, parent
        if name in seen:
            return old_import(name, globals, locals, fromlist)
        seen.add(name)
        import_order.append((name, level, parent))
        t1 = time.time()
        old_parent = parent
        parent = name
        level += 1
        module = old_import(name, globals, locals, fromlist)
        level -= 1
        parent = old_parent
        t2 = time.time()
        elapsed_times[name] = t2 - t1
        return module

    old_import = __builtins__.__import__
    __builtins__.__import__ = new_import

    import numpy

    parents = {}
    for name, level, parent in import_order:
        parents[name] = parent

    print "== Tree =="
    for name, level, parent in import_order:
        print "%s%s: %.3f (%s)" % ("  " * level, name, elapsed_times[name], parent)
    print "\n"
    print "== Slowest (including children) =="
    slowest = sorted((t, name) for (name, t) in elapsed_times.items())[-20:]
    for elapsed_time, name in slowest[::-1]:
        print "%.3f %s (%s)" % (elapsed_time, name, parents[name])

The result using the version out of subversion is:

    == Tree ==
    numpy: 0.237 (None)
      numpy.__config__: 0.000 (numpy)
      version: 0.000 (numpy)
        os: 0.000 (version)
        imp: 0.000 (version)
      _import_tools: 0.024 (numpy)
        sys: 0.000 (_import_tools)
        glob: 0.024 (_import_tools)
          fnmatch: 0.020 (glob)
            re: 0.018 (fnmatch)
              sre_compile: 0.009 (re)
                _sre: 0.000 (sre_compile)
                sre_constants: 0.004 (sre_compile)
              sre_parse: 0.006 (re)
              copy_reg: 0.000 (re)
      add_newdocs: 0.156 (numpy)
        lib: 0.150 (add_newdocs)
          info: 0.000 (lib)
          numpy.version: 0.000 (lib)
          type_check: 0.091 (lib)
    ... many lines removed ...
      mtrand: 0.021 (numpy)
      ctypeslib: 0.024 (numpy)
        ctypes: 0.023 (ctypeslib)
          _ctypes: 0.003 (ctypes)
          gestalt: 0.013 (ctypes)
          ctypes._endian: 0.001 (ctypes)
        numpy.core._internal: 0.000 (ctypeslib)
      ma: 0.005 (numpy)
        extras: 0.001 (ma)
          numpy.lib.index_tricks: 0.000 (extras)
          numpy.lib.polynomial: 0.000 (extras)

    == Slowest (including children) ==
    0.237 numpy (None)
    0.156 add_newdocs (numpy)
    0.150 lib (add_newdocs)
    0.091 type_check (lib)
    0.090 numpy.core.numeric (type_check)
    0.049 io (lib)
    0.048 numpy.testing (numpy.core.numeric)
    0.024 _import_tools (numpy)
    0.024 ctypeslib (numpy)
    0.024 glob (_import_tools)
    0.023 ctypes (ctypeslib)
    0.022 utils (numpy.testing)
    0.022 difflib (utils)
    0.021 mtrand (numpy)
    0.020 fnmatch (glob)
    0.020 _datasource (io)
    0.020 tempfile (io)
    0.018 re (fnmatch)
    0.018 heapq (difflib)
    0.013 gestalt (ctypes)

This only reports the first time a module is imported, so fixing, say, the 'glob' in _import_tools doesn't mean it won't appear elsewhere.

Andrew [EMAIL PROTECTED]
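A later note: CPython (3.7+) eventually shipped this kind of instrumentation built in as the -X importtime flag, which prints a per-module breakdown much like the script above; the flag did not exist at the time of this thread. A small sketch driving it from Python, with 'decimal' standing in for any package:

```python
import subprocess
import sys

# Run a fresh interpreter with -X importtime; the per-module
# breakdown is written to stderr, one line per imported module.
result = subprocess.run(
    [sys.executable, "-X", "importtime", "-c", "import decimal"],
    capture_output=True,
    text=True,
)
assert "decimal" in result.stderr
assert "import time:" in result.stderr   # prefix of each breakdown line
```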
Re: [Numpy-discussion] import numpy is slow
On Mon, Jun 30, 2008 at 18:32, Andrew Dalke [EMAIL PROTECTED] wrote: Why does numpy/__init__.py need to import all of these other modules and submodules? Any chance of cutting down on the number, in order to improve startup costs?

Can you try the SVN trunk? In another thread (it must be "numpy imports slowly!" week), David Cournapeau found some optimizations that could be done that don't affect the API. They seem to cut down my import times (on OS X) by about 1/3; on his Linux machine, it seems to be more. I would be interested to know how significantly it improves your use case.

-- Robert Kern

I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth. -- Umberto Eco
Re: [Numpy-discussion] import numpy is slow
Would it not be possible to import just the necessary module of numpy to meet the necessary functionality of your application? i.e.:

    import numpy.core

or whatever you're using. You could even do:

    import numpy.core as numpy

I think, to simplify your code. I'm no expert though.

Hanni

2008/7/1 Andrew Dalke [EMAIL PROTECTED]: [snip]
Re: [Numpy-discussion] import numpy is slow
Hi,

IIRC, if you do import numpy.core as numpy, it starts by importing numpy, so it will be even slower.

Matthieu

2008/7/1 Hanni Ali [EMAIL PROTECTED]: [snip]

-- French PhD student Website : http://matthieu-brucher.developpez.com/ Blogs : http://matt.eifelle.com and http://blog.developpez.com/?blog=92 LinkedIn : http://www.linkedin.com/in/matthieubrucher
Re: [Numpy-discussion] import numpy is slow
You are correct, it appears to take slightly longer to import numpy.core, and longer again to import numpy.core as numpy. I should obviously check first in future.

Hanni

2008/7/1 Matthieu Brucher [EMAIL PROTECTED]: [snip]
Re: [Numpy-discussion] import numpy is slow
2008/7/1 Hanni Ali [EMAIL PROTECTED]: Would it not be possible to import just the necessary module of numpy to meet the necessary functionality of your application.

Matthieu Brucher responded: IIRC, if you do import numpy.core as numpy, it starts by importing numpy, so it will be even slower.

which you can see if you start python with the -v option to display imports:

    >>> import numpy.core
    import numpy # directory /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/numpy
    # /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/numpy/__init__.pyc matches /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/numpy/__init__.py
    import numpy # precompiled from /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/numpy/__init__.pyc
    # /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/numpy/__config__.pyc matches /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/numpy/__config__.py
    import numpy.__config__ # precompiled from /Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/numpy/__config__.pyc
    ... and many more

Andrew [EMAIL PROTECTED]
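Matthieu's point follows directly from how Python packages work: importing a submodule always executes the parent package's __init__ first. A throwaway demonstration (the package names parentpkg/sub are invented for this sketch):

```python
import os
import sys
import tempfile

# Build a tiny on-disk package whose __init__.py records that it ran.
root = tempfile.mkdtemp()
sub = os.path.join(root, "parentpkg", "sub")
os.makedirs(sub)
with open(os.path.join(root, "parentpkg", "__init__.py"), "w") as f:
    f.write("PARENT_RAN = True\n")
with open(os.path.join(sub, "__init__.py"), "w") as f:
    f.write("")

sys.path.insert(0, root)
import parentpkg.sub            # asking only for the subpackage...
assert parentpkg.PARENT_RAN     # ...still executed parentpkg/__init__.py
```

So `import numpy.core` can never be cheaper than `import numpy` plus whatever numpy.core itself costs.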
[Numpy-discussion] import numpy is slow
(Trying again now that I'm subscribed. BTW, there's no link to the subscription page from numpy.scipy.org .)

The initial 'import numpy' loads a huge number of modules, even when I don't need them.

    Python 2.5 (r25:51918, Sep 19 2006, 08:49:13)
    [GCC 4.0.1 (Apple Computer, Inc. build 5341)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import sys
    >>> len(sys.modules)
    28
    >>> import numpy
    >>> len(sys.modules)
    256
    >>> len([s for s in sorted(sys.modules) if 'numpy' in s])
    127
    >>> numpy.__version__
    '1.1.0'

As a result, I assume that's the reason my program's startup cost is quite high.

    [josiah:~/src/fp] dalke% time python -c 'a=4'
    0.014u 0.038s 0:00.05 80.0% 0+0k 0+1io 0pf+0w
    [josiah:~/src/fp] dalke% time python -c 'import numpy'
    0.161u 0.279s 0:00.44 97.7% 0+0k 0+9io 0pf+0w

My total runtime is something like 1.4 seconds, and the only thing I'm using NumPy for is to make an array of doubles that I can pass to a C extension. (I could use the array module or ctypes, but figured numpy is more useful for downstream code.)

Why does numpy/__init__.py need to import all of these other modules and submodules? Any chance of cutting down on the number, in order to improve startup costs?

Andrew [EMAIL PROTECTED]
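Because sys.modules caches every imported module, re-importing inside a running interpreter is nearly free; a meaningful benchmark has to spawn a fresh interpreter per measurement, which is what the `time python -c 'import numpy'` invocations in this thread do. The same measurement from within Python (a sketch; 'json' stands in for numpy so it runs anywhere, and the timing includes interpreter startup):

```python
import subprocess
import sys
import time

def fresh_import_cost(module_name):
    """Time 'import <module_name>' in a brand-new interpreter, so the
    current process's module cache cannot hide the real cost."""
    t0 = time.time()
    subprocess.check_call([sys.executable, "-c", "import " + module_name])
    return time.time() - t0

cost = fresh_import_cost("json")
assert cost > 0.0
```

Subtracting the cost of an empty program (`-c 'pass'`), as Andrew does, isolates the import itself from interpreter startup.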
Re: [Numpy-discussion] import numpy is slow
On Mon, Jun 30, 2008 at 18:32, Andrew Dalke [EMAIL PROTECTED] wrote: [snip]

Why does numpy/__init__.py need to import all of these other modules and submodules?

Strictly speaking, there is no *need* for any of it. It was a judgment call trading off import time for the convenience in fairly typical use cases which do use functions across the breadth of the library. Your use case isn't so typical and so suffers on the import time end of the balance.

Any chance of cutting down on the number, in order to improve startup costs?

Not at this point in time, no. That would break too much code.

-- Robert Kern

I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth. -- Umberto Eco
Re: [Numpy-discussion] import numpy is slow
On Jul 1, 2008, at 2:22 AM, Robert Kern wrote: Your use case isn't so typical and so suffers on the import time end of the balance.

I'm working on my presentation for EuroSciPy. "Isn't so typical" seems to be a good summary of my first slide. :)

Any chance of cutting down on the number, in order to improve startup costs?

Not at this point in time, no. That would break too much code.

Understood. Thanks for the response,

Andrew [EMAIL PROTECTED]