subject:"[Python-Dev] Rethinking intern and its data structure"

Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-10 Thread Nick Coghlan

Guido van Rossum wrote: Just to add some skepticism, has anyone done any kind of instrumentation of bzr start-up behavior? IIRC every time I was asked to reduce the start-up cost of some Python app, the cause was too many imports, and the solution was either to speed up import itself (.pyc

Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-10 Thread Robert Collins

On Thu, 2009-04-09 at 21:26 -0700, Guido van Rossum wrote: Just to add some skepticism, has anyone done any kind of instrumentation of bzr start-up behavior? We sure have. 'bzr --profile-imports' reports on the time to import different modules (both cumulative and individually). We have a

Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-10 Thread Antoine Pitrou

Robert Collins robert.collins at canonical.com writes: (errors.py is 3027 lines long with 347 exception classes). 347 exception classes? Perhaps your framework is over-engineered. Similarly, when using a heavy Web framework, reloading a Web app can take several seconds... but I won't blame

Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-10 Thread Robert Collins

On Fri, 2009-04-10 at 11:52 +, Antoine Pitrou wrote: Robert Collins robert.collins at canonical.com writes: (errors.py is 3027 lines long with 347 exception classes). 347 exception classes? Perhaps your framework is over-engineered. Similarly, when using a heavy Web framework,

Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-10 Thread Peter Otten

John Arbash Meinel wrote: Not as big of a difference as I thought it would be... But I bet if there was a way to put the random shuffle in the inner loop, so you weren't accessing the same identical 25k keys internally, you might get more interesting results. You can prepare a few random
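Peter Otten's reply is cut off in this excerpt. John's underlying idea is to shuffle the keys so the inner loop does not keep hitting the same 25k keys in the same order, and it can be sketched as below. The sketch assumes the shuffled orders are prepared once, outside the timed code. The key names, counts and the `make_lookup_bench` helper are hypothetical placeholders, not bzr data.

```python
import random
import timeit


def make_lookup_bench(n_keys: int = 25_000, n_orders: int = 4, seed: int = 0):
    """Build a dict of string keys plus several pre-shuffled key orders.

    Shuffling happens here, once, so the returned function only measures
    dict lookups and not the cost of random.shuffle().
    """
    rng = random.Random(seed)
    keys = [f"name_{i}" for i in range(n_keys)]  # hypothetical key names
    table = {k: k for k in keys}

    orders = []
    for _ in range(n_orders):
        order = list(keys)
        rng.shuffle(order)
        orders.append(order)

    def run() -> int:
        # Walk every prepared order so consecutive passes don't visit
        # the same keys in the same sequence.
        get = table.get
        hits = 0
        for order in orders:
            for key in order:
                if get(key) is not None:
                    hits += 1
        return hits

    return run


if __name__ == "__main__":
    bench = make_lookup_bench()
    best = min(timeit.repeat(bench, number=5, repeat=3))
    print(f"best of 3: {best:.4f} s for 5 runs")
```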

Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-10 Thread Toshio Kuratomi

Robert Collins wrote: Certainly, import time is part of it: robe...@lifeless-64:~$ python -m timeit -s 'import sys; import bzrlib.errors' "del sys.modules['bzrlib.errors']; import bzrlib.errors" 10 loops, best of 3: 18.7 msec per loop (errors.py is 3027 lines long with 347 exception
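The timeit command quoted above re-imports a module on every loop by first deleting it from sys.modules. Here is a minimal sketch of the same measurement driven from Python. It uses a small stdlib module (colorsys) as a stand-in for bzrlib.errors, and `time_reimport` is a hypothetical helper name.

```python
import timeit


def time_reimport(module_name: str, number: int = 10, repeat: int = 3) -> float:
    """Return the best per-loop time, in seconds, of re-importing module_name.

    Mirrors: python -m timeit -s 'import sys; import M' "del sys.modules['M']; import M"
    """
    # Setup imports the module once so the first `del` in the loop succeeds.
    setup = f"import sys; import {module_name}"
    # Each loop drops the cached module and forces the module body to run again.
    stmt = f"del sys.modules[{module_name!r}]; import {module_name}"
    timings = timeit.repeat(stmt, setup=setup, number=number, repeat=repeat)
    return min(timings) / number


if __name__ == "__main__":
    best = time_reimport("colorsys")
    print(f"colorsys: {best * 1000:.3f} msec per loop")
```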

Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-10 Thread P.J. Eby

At 06:52 PM 4/10/2009 +1000, Nick Coghlan wrote: This problem (slow application startup times due to too many imports at startup, which can in turn can be due to top level imports for library or framework functionality that a given application doesn't actually use) is actually the main reason I
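A common way to avoid paying at startup for functionality an application may never use is to move the import into the function that needs it. This is a generic sketch, not necessarily the approach described in P.J. Eby's (truncated) message. `report_as_json` is a hypothetical example function.

```python
def report_as_json(data: dict) -> str:
    # Deferred import: json is loaded the first time this function runs,
    # so a program that never calls it doesn't pay for the import at startup.
    # Later calls find the module already cached in sys.modules.
    import json

    return json.dumps(data, sort_keys=True)


if __name__ == "__main__":
    print(report_as_json({"b": 1, "a": 2}))
```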

Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-09 Thread Aahz

On Thu, Apr 09, 2009, John Arbash Meinel wrote: PS I'm not yet subscribed to python-dev, so if you could make sure to CC me in replies, I would appreciate it. Please do subscribe to python-dev ASAP; I also suggest that you subscribe to python-ideas, because I suspect that this is sufficiently

Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-09 Thread Dirkjan Ochtman

On Thu, Apr 9, 2009 at 17:31, Aahz a...@pythoncraft.com wrote: Please do subscribe to python-dev ASAP; I also suggest that you subscribe to python-ideas, because I suspect that this is sufficiently blue-sky to start there. It might also be interesting to the unladen-swallow guys. Cheers,

Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-09 Thread Collin Winter

Hi John, On Thu, Apr 9, 2009 at 8:02 AM, John Arbash Meinel j...@arbash-meinel.com wrote: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 I've been doing some memory profiling of my application, and I've found some interesting results with how intern() works. I was pretty surprised to see

Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-09 Thread John Arbash Meinel

... Anyway, I the internals of intern() could be done a bit better. Here are some concrete things: [snip] Memory usage is definitely something we're interested in improving. Since you've already looked at this in some detail, could you try implementing one or two of your ideas and

Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-09 Thread Collin Winter

On Thu, Apr 9, 2009 at 9:34 AM, John Arbash Meinel john.arbash.mei...@gmail.com wrote: ... Anyway, I the internals of intern() could be done a bit better. Here are some concrete things: [snip] Memory usage is definitely something we're interested in improving. Since you've already looked

Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-09 Thread Christian Heimes

John Arbash Meinel wrote: When I looked at the actual references from interned, I saw mostly variable names. Considering that every variable goes through the python intern dict. And when you look at the intern function, it doesn't use setdefault logic, it actually does a get() followed by a
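John's point is that interning a new string costs a get() followed by a set(), where setdefault-style logic could do the same work in one call. Below is a pure-Python model of the two strategies. It assumes a plain dict as the intern table. It is not CPython's C code, and both function names are made up. (In Python 3 the builtin intern() became sys.intern().)

```python
def intern_get_then_set(table: dict, s: str) -> str:
    """Model of the get()-then-set() strategy: up to two table operations."""
    existing = table.get(s)  # first lookup
    if existing is not None:
        return existing  # an equal string is already interned
    table[s] = s  # second operation: insert the new canonical object
    return s


def intern_setdefault(table: dict, s: str) -> str:
    """Model of setdefault logic: one call that looks up and inserts if missing."""
    return table.setdefault(s, s)


if __name__ == "__main__":
    table: dict = {}
    a = "".join(["ab", "c"])  # two equal but distinct str objects
    b = "".join(["a", "bc"])
    print(intern_setdefault(table, a) is a)  # first one becomes canonical
    print(intern_setdefault(table, b) is a)  # later equal strings map to it
```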

Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-09 Thread John Arbash Meinel

Christian Heimes wrote: John Arbash Meinel wrote: When I looked at the actual references from interned, I saw mostly variable names. Considering that every variable goes through the python intern dict. And when you look at the intern function, it doesn't use setdefault logic, it actually does

Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-09 Thread John Arbash Meinel

Alexander Belopolsky wrote: On Thu, Apr 9, 2009 at 11:02 AM, John Arbash Meinel j...@arbash-meinel.com wrote: ... a) Don't keep a double reference to both key and value to the same object (1 pointer per entry), this could be as simple as using a Set() instead of a dict() There
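Suggestion (a) is to stop storing the same object as both key and value. You can get a rough sense of the size difference with sys.getsizeof(), assuming CPython; exact byte counts depend on version and platform and will not match the 2009 layout. A pure-Python set can only illustrate the size, though. Unlike a dict, it has no direct lookup that returns the stored element for an equal key, and intern() needs exactly that, so a set-based intern table would have to be implemented at the C level. `table_sizes` and the key names are hypothetical.

```python
import sys


def table_sizes(n: int) -> tuple[int, int]:
    """Return (dict_bytes, set_bytes) for n distinct string keys.

    sys.getsizeof() reports only the container itself (its hash table and
    entries), not the string objects, which both containers share.
    """
    keys = [f"var_{i}" for i in range(n)]  # hypothetical identifier-like names
    as_dict = {k: k for k in keys}  # key and value refer to the same object
    as_set = set(keys)  # one reference per entry
    return sys.getsizeof(as_dict), sys.getsizeof(as_set)


if __name__ == "__main__":
    for n in (1_000, 25_000):
        d, s = table_sizes(n)
        print(f"{n:>6} keys: dict {d:>9,} bytes, set {s:>9,} bytes")
```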

Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-09 Thread Martin v. Löwis

So I guess some of it comes down to whether loweis would also reject this change on the basis that mathematically a set is not a dict. I'd like to point out that this was not the reason to reject it. Instead, this (or, the opposite of it) was given as a reason why this patch should be accepted

Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-09 Thread John Arbash Meinel

... I like your rationale (save memory) much more, and was asking in the tracker for specific numbers, which weren't forthcoming. ... Now that you brought up specific numbers, I tried to verify them, and found them correct (although a bit unfortunate), please see my test script below.

Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-09 Thread Martin v. Löwis

I don't have numbers on how much that would improve CPU times, I would imagine improving 'intern()' would impact import times more than run times, simply because import time is interning a *lot* of strings. Though honestly, Bazaar would really like this, because startup overhead for us is

Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-09 Thread John Arbash Meinel

Martin v. Löwis wrote: I don't have numbers on how much that would improve CPU times, I would imagine improving 'intern()' would impact import times more than run times, simply because import time is interning a *lot* of strings. Though honestly, Bazaar would really like this, because startup

Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-09 Thread Martin v. Löwis

Also, consider that resizing has to evaluate every object, thus paging in all X bytes, and assigning to another 2X bytes. Cutting X by (potentially 3), would probably have a small but measurable effect. I'm *very* skeptical about claims on performance in the absence of actual measurements. Too

Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-09 Thread Jake McGuire

On Apr 9, 2009, at 12:06 PM, Martin v. Löwis wrote: Now that you brought up specific numbers, I tried to verify them, and found them correct (although a bit unfortunate), please see my test script below. Up to 21800 interned strings, the dict takes (only) 384kiB. It then grows, requiring
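Martin's figures (384 KiB up to about 21,800 interned strings, then a jump) come from 2009-era CPython 2.x. The stepwise growth itself is easy to observe. The sketch below records the key counts at which a dict's reported size changes. It assumes a current CPython, where the thresholds and byte counts will differ from Martin's. `dict_resize_points` is a hypothetical helper.

```python
import sys


def dict_resize_points(max_keys: int) -> list[tuple[int, int]]:
    """Insert keys one at a time and record (key_count, new_size_in_bytes)
    whenever sys.getsizeof() of the dict changes, i.e. after each resize."""
    table: dict[str, str] = {}
    points = []
    last_size = sys.getsizeof(table)
    for i in range(max_keys):
        key = f"name_{i}"
        table[key] = key
        size = sys.getsizeof(table)
        if size != last_size:
            points.append((len(table), size))
            last_size = size
    return points


if __name__ == "__main__":
    for count, size in dict_resize_points(50_000):
        print(f"after {count:>6} keys: {size / 1024:8.1f} KiB")
```

Between two resize points the size stays flat, and each resize reallocates the whole table. That is the "paging in all X bytes" cost discussed in the thread.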

Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-09 Thread Greg Ewing

John Arbash Meinel wrote: And when you look at the intern function, it doesn't use setdefault logic, it actually does a get() followed by a set(), which means the cost of interning is 1-2 lookups depending on likelihood, etc. Keep in mind that intern() is called fairly rarely, mostly only at

Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-09 Thread Benjamin Peterson

2009/4/9 Greg Ewing greg.ew...@canterbury.ac.nz: John Arbash Meinel wrote: And when you look at the intern function, it doesn't use setdefault logic, it actually does a get() followed by a set(), which means the cost of interning is 1-2 lookups depending on likelihood, etc. Keep in mind

Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-09 Thread John Arbash Meinel

Greg Ewing wrote: John Arbash Meinel wrote: And the way intern is currently written, there is a third cost when the item doesn't exist yet, which is another lookup to insert the object. That's even rarer still, since it only happens the first time you load a piece of code that uses a given

Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-09 Thread Mike Klaas

On 9-Apr-09, at 6:24 PM, John Arbash Meinel wrote: Greg Ewing wrote: John Arbash Meinel wrote: And the way intern is currently written, there is a third cost when the item doesn't exist yet, which is another lookup to insert the object. That's even rarer still, since it only happens the

Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-09 Thread Jeffrey Yasskin

On Thu, Apr 9, 2009 at 6:24 PM, John Arbash Meinel john.arbash.mei...@gmail.com wrote: Greg Ewing wrote: John Arbash Meinel wrote: And the way intern is currently written, there is a third cost when the item doesn't exist yet, which is another lookup to insert the object. That's even rarer

Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-09 Thread Collin Winter

On Thu, Apr 9, 2009 at 6:24 PM, John Arbash Meinel john.arbash.mei...@gmail.com wrote: Greg Ewing wrote: John Arbash Meinel wrote: And the way intern is currently written, there is a third cost when the item doesn't exist yet, which is another lookup to insert the object. That's even rarer

Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-09 Thread Guido van Rossum

On Thu, Apr 9, 2009 at 9:07 PM, Collin Winter coll...@gmail.com wrote: On Thu, Apr 9, 2009 at 6:24 PM, John Arbash Meinel john.arbash.mei...@gmail.com wrote: And I would be a *lot* happier if startup time was 100ms instead of 400ms. Quite so. We have a number of internal tools, and they

Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-09 Thread John Arbash Meinel

... Somewhat true, though I know it happens 25k times during startup of bzr... And I would be a *lot* happier if startup time was 100ms instead of 400ms. I don't want to quash your idealism too severely, but it is extremely unlikely that you are going to get anywhere near that kind of

29 matches
