Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-10 Thread Nick Coghlan
Guido van Rossum wrote: Just to add some skepticism, has anyone done any kind of instrumentation of bzr start-up behavior? IIRC every time I was asked to reduce the start-up cost of some Python app, the cause was too many imports, and the solution was either to speed up import itself (.pyc

Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-10 Thread Robert Collins
On Thu, 2009-04-09 at 21:26 -0700, Guido van Rossum wrote: Just to add some skepticism, has anyone done any kind of instrumentation of bzr start-up behavior? We sure have. 'bzr --profile-imports' reports on the time to import different modules (both cumulative and individually). We have a

Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-10 Thread Antoine Pitrou
Robert Collins robert.collins at canonical.com writes: (errors.py is 3027 lines long with 347 exception classes). 347 exception classes? Perhaps your framework is over-engineered. Similarly, when using a heavy Web framework, reloading a Web app can take several seconds... but I won't blame

Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-10 Thread Robert Collins
On Fri, 2009-04-10 at 11:52 +, Antoine Pitrou wrote: Robert Collins robert.collins at canonical.com writes: (errors.py is 3027 lines long with 347 exception classes). 347 exception classes? Perhaps your framework is over-engineered. Similarly, when using a heavy Web framework,

Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-10 Thread Peter Otten
John Arbash Meinel wrote: Not as big of a difference as I thought it would be... But I bet if there was a way to put the random shuffle in the inner loop, so you weren't accessing the same identical 25k keys internally, you might get more interesting results. You can prepare a few random

Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-10 Thread Toshio Kuratomi
Robert Collins wrote: Certainly, import time is part of it: robe...@lifeless-64:~$ python -m timeit -s 'import sys; import bzrlib.errors' del sys.modules['bzrlib.errors']; import bzrlib.errors 10 loops, best of 3: 18.7 msec per loop (errors.py is 3027 lines long with 347 exception

Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-10 Thread P.J. Eby
At 06:52 PM 4/10/2009 +1000, Nick Coghlan wrote: This problem (slow application startup times due to too many imports at startup, which can in turn can be due to top level imports for library or framework functionality that a given application doesn't actually use) is actually the main reason I

Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-09 Thread Aahz
On Thu, Apr 09, 2009, John Arbash Meinel wrote: PS I'm not yet subscribed to python-dev, so if you could make sure to CC me in replies, I would appreciate it. Please do subscribe to python-dev ASAP; I also suggest that you subscribe to python-ideas, because I suspect that this is sufficiently

Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-09 Thread Dirkjan Ochtman
On Thu, Apr 9, 2009 at 17:31, Aahz a...@pythoncraft.com wrote: Please do subscribe to python-dev ASAP; I also suggest that you subscribe to python-ideas, because I suspect that this is sufficiently blue-sky to start there. It might also be interesting to the unladen-swallow guys. Cheers,

Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-09 Thread Collin Winter
Hi John, On Thu, Apr 9, 2009 at 8:02 AM, John Arbash Meinel j...@arbash-meinel.com wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 I've been doing some memory profiling of my application, and I've found some interesting results with how intern() works. I was pretty surprised to see

Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-09 Thread John Arbash Meinel
... Anyway, I the internals of intern() could be done a bit better. Here are some concrete things: [snip] Memory usage is definitely something we're interested in improving. Since you've already looked at this in some detail, could you try implementing one or two of your ideas and

Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-09 Thread Collin Winter
On Thu, Apr 9, 2009 at 9:34 AM, John Arbash Meinel john.arbash.mei...@gmail.com wrote: ... Anyway, I the internals of intern() could be done a bit better. Here are some concrete things: [snip] Memory usage is definitely something we're interested in improving. Since you've already looked

Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-09 Thread Christian Heimes
John Arbash Meinel wrote: When I looked at the actual references from interned, I saw mostly variable names. Considering that every variable goes through the python intern dict. And when you look at the intern function, it doesn't use setdefault logic, it actually does a get() followed by a

Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-09 Thread John Arbash Meinel
Christian Heimes wrote: John Arbash Meinel wrote: When I looked at the actual references from interned, I saw mostly variable names. Considering that every variable goes through the python intern dict. And when you look at the intern function, it doesn't use setdefault logic, it actually does

Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-09 Thread John Arbash Meinel
Alexander Belopolsky wrote: On Thu, Apr 9, 2009 at 11:02 AM, John Arbash Meinel j...@arbash-meinel.com wrote: ... a) Don't keep a double reference to both key and value to the same object (1 pointer per entry), this could be as simple as using a Set() instead of a dict() There

Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-09 Thread Martin v. Löwis
So I guess some of it comes down to whether loweis would also reject this change on the basis that mathematically a set is not a dict. I'd like to point out that this was not the reason to reject it. Instead, this (or, the opposite of it) was given as a reason why this patch should be accepted

Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-09 Thread John Arbash Meinel
... I like your rationale (save memory) much more, and was asking in the tracker for specific numbers, which weren't forthcoming. ... Now that you brought up a specific numbers, I tried to verify them, and found them correct (although a bit unfortunate), please see my test script below.

Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-09 Thread Martin v. Löwis
I don't have numbers on how much that would improve CPU times, I would imagine improving 'intern()' would impact import times more than run times, simply because import time is interning a *lot* of strings. Though honestly, Bazaar would really like this, because startup overhead for us is

Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-09 Thread John Arbash Meinel
Martin v. Löwis wrote: I don't have numbers on how much that would improve CPU times, I would imagine improving 'intern()' would impact import times more than run times, simply because import time is interning a *lot* of strings. Though honestly, Bazaar would really like this, because startup

Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-09 Thread Martin v. Löwis
Also, consider that resizing has to evaluate every object, thus paging in all X bytes, and assigning to another 2X bytes. Cutting X by (potentially 3), would probably have a small but measurable effect. I'm *very* skeptical about claims on performance in the absence of actual measurements. Too

Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-09 Thread Jake McGuire
On Apr 9, 2009, at 12:06 PM, Martin v. Löwis wrote: Now that you brought up a specific numbers, I tried to verify them, and found them correct (although a bit unfortunate), please see my test script below. Up to 21800 interned strings, the dict takes (only) 384kiB. It then grows, requiring

Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-09 Thread Greg Ewing
John Arbash Meinel wrote: And when you look at the intern function, it doesn't use setdefault logic, it actually does a get() followed by a set(), which means the cost of interning is 1-2 lookups depending on likelyhood, etc. Keep in mind that intern() is called fairly rarely, mostly only at

Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-09 Thread Benjamin Peterson
2009/4/9 Greg Ewing greg.ew...@canterbury.ac.nz: John Arbash Meinel wrote: And when you look at the intern function, it doesn't use setdefault logic, it actually does a get() followed by a set(), which means the cost of interning is 1-2 lookups depending on likelyhood, etc. Keep in mind

Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-09 Thread John Arbash Meinel
Greg Ewing wrote: John Arbash Meinel wrote: And the way intern is currently written, there is a third cost when the item doesn't exist yet, which is another lookup to insert the object. That's even rarer still, since it only happens the first time you load a piece of code that uses a given

Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-09 Thread Mike Klaas
On 9-Apr-09, at 6:24 PM, John Arbash Meinel wrote: Greg Ewing wrote: John Arbash Meinel wrote: And the way intern is currently written, there is a third cost when the item doesn't exist yet, which is another lookup to insert the object. That's even rarer still, since it only happens the

Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-09 Thread Jeffrey Yasskin
On Thu, Apr 9, 2009 at 6:24 PM, John Arbash Meinel john.arbash.mei...@gmail.com wrote: Greg Ewing wrote: John Arbash Meinel wrote: And the way intern is currently written, there is a third cost when the item doesn't exist yet, which is another lookup to insert the object. That's even rarer

Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-09 Thread Collin Winter
On Thu, Apr 9, 2009 at 6:24 PM, John Arbash Meinel john.arbash.mei...@gmail.com wrote: Greg Ewing wrote: John Arbash Meinel wrote: And the way intern is currently written, there is a third cost when the item doesn't exist yet, which is another lookup to insert the object. That's even rarer

Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-09 Thread Guido van Rossum
On Thu, Apr 9, 2009 at 9:07 PM, Collin Winter coll...@gmail.com wrote: On Thu, Apr 9, 2009 at 6:24 PM, John Arbash Meinel john.arbash.mei...@gmail.com wrote: And I would be a *lot* happier if startup time was 100ms instead of 400ms. Quite so. We have a number of internal tools, and they

Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-09 Thread John Arbash Meinel
... Somewhat true, though I know it happens 25k times during startup of bzr... And I would be a *lot* happier if startup time was 100ms instead of 400ms. I don't want to quash your idealism too severely, but it is extremely unlikely that you are going to get anywhere near that kind of