Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-10 Thread Nick Coghlan
Guido van Rossum wrote:
 Just to add some skepticism, has anyone done any kind of
 instrumentation of bzr start-up behavior?  IIRC every time I was asked
 to reduce the start-up cost of some Python app, the cause was too many
 imports, and the solution was either to speed up import itself (.pyc
 files were the first thing ever that came out of that -- importing
 from a single .zip file is one of the more recent tricks) or to reduce
 the number of modules imported at start-up (or both :-). Heavy-weight
 frameworks are usually the root cause, but usually there's nothing
 that can be done about that by the time you've reached this point. So,
 amen on the good luck, but please start with a bit of analysis.

This problem (slow application startup times due to too many imports at
startup, which can in turn be due to top level imports for library
or framework functionality that a given application doesn't actually
use) is actually the main reason I sometimes wish for a nice, solid lazy
module import mechanism that manages to avoid the potential deadlock
problems created by using import statements inside functions.

Providing a clean API and implementation for that functionality is a
pretty tough nut to crack though, so I'm not holding my breath...
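For illustration, the core of such a mechanism can at least be sketched in a few lines of modern Python (all names here are invented; bzrlib's lazy_import and PEAK's Importing package are real implementations of the idea):

```python
import importlib

class LazyModule:
    """Proxy that defers the real import until first attribute access.

    Rough sketch only: a production mechanism must also deal with the
    import lock and with submodules, which is exactly the hard part.
    """

    def __init__(self, name):
        # Write straight into the instance dict to bypass __getattr__.
        self.__dict__["_name"] = name
        self.__dict__["_module"] = None

    def __getattr__(self, attr):
        # First attribute access triggers the real import.
        if self.__dict__["_module"] is None:
            self.__dict__["_module"] = importlib.import_module(self.__dict__["_name"])
        return getattr(self.__dict__["_module"], attr)

json = LazyModule("json")       # startup: no import cost yet
print(json.dumps({"a": 1}))     # the real json module loads here
```

The deadlock question is precisely about what happens when that first attribute access occurs while another thread holds the import lock.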

Cheers,
Nick.

P.S. It's only an occasional fairly idle wish for me though, or I'd have
at least tried to come up with something myself by now.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
---
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-10 Thread Robert Collins
On Thu, 2009-04-09 at 21:26 -0700, Guido van Rossum wrote:

 Just to add some skepticism, has anyone done any kind of
 instrumentation of bzr start-up behavior?

We sure have. 'bzr --profile-imports' reports on the time to import
different modules (both cumulative and individually).

We have a lazy module loader that allows us to defer loading modules we
might not use (though if they are needed we are in fact going to pay for
loading them eventually).

We monkeypatch the standard library where modules we want are
unreasonably expensive to import (for instance by making a regex we
wouldn't use be lazy compiled rather than compiled at import time).

   IIRC every time I was asked
 to reduce the start-up cost of some Python app, the cause was too many
 imports, and the solution was either to speed up import itself (.pyc
 files were the first thing ever that came out of that -- importing
 from a single .zip file is one of the more recent tricks) or to reduce
 the number of modules imported at start-up (or both :-). Heavy-weight
 frameworks are usually the root cause, but usually there's nothing
 that can be done about that by the time you've reached this point. So,
 amen on the good luck, but please start with a bit of analysis.

Certainly, import time is part of it:
robe...@lifeless-64:~$ python -m timeit -s 'import sys; import bzrlib.errors' "del sys.modules['bzrlib.errors']; import bzrlib.errors"
10 loops, best of 3: 18.7 msec per loop

(errors.py is 3027 lines long with 347 exception classes).

We've also looked lower - python does a lot of stat operations searching
for imports and determining whether the pyc is up to date; these appear
to only really matter on cold-cache imports (but they matter a lot then);
in hot-cache situations they are insignificant.

Uhm, there's probably more - but I just wanted to note that we have done
quite a bit of analysis. I think a large chunk of our problem is having
too much code loaded when only a small fraction will be used in any one
operation. Consider importing bzrlib errors - 10% of the startup time
for 'bzr help'. In any operation only a few of those exceptions will be
used - and typically 0.

-Rob




Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-10 Thread Antoine Pitrou
Robert Collins robert.collins at canonical.com writes:
 
 (errors.py is 3027 lines long with 347 exception classes).

347 exception classes? Perhaps your framework is over-engineered.

Similarly, when using a heavy Web framework, reloading a Web app can take
several seconds... but I won't blame Python for that.

Regards

Antoine.




Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-10 Thread Robert Collins
On Fri, 2009-04-10 at 11:52 +, Antoine Pitrou wrote:
 Robert Collins robert.collins at canonical.com writes:
  
  (errors.py is 3027 lines long with 347 exception classes).
 
 347 exception classes? Perhaps your framework is over-engineered.
 
 Similarly, when using a heavy Web framework, reloading a Web app can take
 several seconds... but I won't blame Python for that.

Well, we've added exceptions as we needed them. This isn't much
different to errno in C programs; the errno range has expanded as people
have wanted to signal that specific situations have arisen. The key
thing for us is to have both something that can be caught (for library
users of bzrlib) and something that can be formatted with variable
substitution (for displaying to users). If there are better ways to
approach this in python than what we've done, that would be great.
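The pattern being described might look like this (a hedged sketch with invented class names and templates, not bzrlib's actual hierarchy): each error is both a catchable exception type and a display template with variable substitution:

```python
class BzrError(Exception):
    """Base class: catchable by library users, formattable for display."""
    _fmt = "a generic error occurred"

    def __init__(self, **kwargs):
        self.kwargs = kwargs
        Exception.__init__(self, self._fmt % kwargs if kwargs else self._fmt)

class NoSuchRevision(BzrError):
    _fmt = "revision %(revision)s is not present in %(branch)s"

try:
    raise NoSuchRevision(revision="abc123", branch="/srv/trunk")
except BzrError as e:    # library users catch the type...
    print(e)             # ...front ends show the substituted message
```

One class per error condition is what makes catching precise, and it is also what makes errors.py grow to hundreds of classes.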

-Rob




Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-10 Thread Peter Otten
John Arbash Meinel wrote:

 Not as big of a difference as I thought it would be... But I bet if
 there was a way to put the random shuffle in the inner loop, so you
 weren't accessing the same identical 25k keys internally, you might get
 more interesting results.

You can prepare a few random samples during startup:

$ python -m timeit -s "from random import sample; d = dict.fromkeys(xrange(10**7)); nextrange = iter([sample(xrange(10**7), 25000) for i in range(200)]).next" "for x in nextrange(): d.get(x)"
10 loops, best of 3: 20.2 msec per loop

To put it into perspective:
 
$ python -m timeit -s "d = dict.fromkeys(xrange(10**7)); nextrange = iter([range(25000)]*200).next" "for x in nextrange(): d.get(x)"
100 loops, best of 3: 10.9 msec per loop

Peter



Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-10 Thread Toshio Kuratomi
Robert Collins wrote:

 Certainly, import time is part of it:
 robe...@lifeless-64:~$ python -m timeit -s 'import sys; import bzrlib.errors' "del sys.modules['bzrlib.errors']; import bzrlib.errors"
 10 loops, best of 3: 18.7 msec per loop
 
 (errors.py is 3027 lines long with 347 exception classes).
 
 We've also looked lower - python does a lot of stat operations searching
 for imports and determining whether the pyc is up to date; these appear to
 only really matter on cold-cache imports (but they matter a lot then);
 in hot-cache situations they are insignificant.
 
Tarek, Georg, and I talked at pycon about a way to handle both
multi-versioning and a speedup of this exact import problem.  I had
to leave before the hackfest got started, though, so I don't know where
the idea went from there.  Tarek, did this idea progress any?

-Toshio





Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-10 Thread P.J. Eby

At 06:52 PM 4/10/2009 +1000, Nick Coghlan wrote:

This problem (slow application startup times due to too many imports at
startup, which can in turn be due to top level imports for library
or framework functionality that a given application doesn't actually
use) is actually the main reason I sometimes wish for a nice, solid lazy
module import mechanism that manages to avoid the potential deadlock
problems created by using import statements inside functions.


Have you tried http://pypi.python.org/pypi/Importing ? Or more 
specifically, http://peak.telecommunity.com/DevCenter/Importing#lazy-imports ?


It does of course use the import lock, but as long as your top-level 
module code doesn't acquire locks (directly or indirectly), it 
shouldn't be possible to deadlock.  (Or more precisely, to add any 
*new* deadlocks that you didn't already have.)




Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-09 Thread Aahz
On Thu, Apr 09, 2009, John Arbash Meinel wrote:

 PS I'm not yet subscribed to python-dev, so if you could make sure to
 CC me in replies, I would appreciate it.

Please do subscribe to python-dev ASAP; I also suggest that you subscribe
to python-ideas, because I suspect that this is sufficiently blue-sky to
start there.

As always, this is the kind of thing where code trumps gedanken, so you
shouldn't expect much activity unless either you are willing to make at
least initial attempts at trying out your ideas or someone else just
happens to find it interesting.  In general, the core Python
implementation strives for simplicity, so there's already some built-in
pushback.
-- 
Aahz (a...@pythoncraft.com)   * http://www.pythoncraft.com/

Why is this newsgroup different from all other newsgroups?


Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-09 Thread Dirkjan Ochtman
On Thu, Apr 9, 2009 at 17:31, Aahz a...@pythoncraft.com wrote:
 Please do subscribe to python-dev ASAP; I also suggest that you subscribe
 to python-ideas, because I suspect that this is sufficiently blue-sky to
 start there.

It might also be interesting to the unladen-swallow guys.

Cheers,

Dirkjan


Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-09 Thread Collin Winter
Hi John,

On Thu, Apr 9, 2009 at 8:02 AM, John Arbash Meinel
j...@arbash-meinel.com wrote:
 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1

 I've been doing some memory profiling of my application, and I've found
 some interesting results with how intern() works. I was pretty surprised
 to see that the interned dict was actually consuming a significant
 amount of total memory.
 To give the specific values, after doing:
  bzr branch A B
 of a small project, the total memory consumption is ~21MB

[snip]

 Anyway, I think the internals of intern() could be done a bit better. Here are
 some concrete things:

[snip]

Memory usage is definitely something we're interested in improving.
Since you've already looked at this in some detail, could you try
implementing one or two of your ideas and see if it makes a difference
in memory consumption? Changing from a dict to a set looks promising,
and should be a fairly self-contained way of starting on this. If it
works, please post the patch on http://bugs.python.org with your
results and assign it to me for review.

Thanks,
Collin Winter


Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-09 Thread John Arbash Meinel
...

 Anyway, I think the internals of intern() could be done a bit better. Here are
 some concrete things:
 

 [snip]

 Memory usage is definitely something we're interested in improving.
 Since you've already looked at this in some detail, could you try
 implementing one or two of your ideas and see if it makes a difference
 in memory consumption? Changing from a dict to a set looks promising,
 and should be a fairly self-contained way of starting on this. If it
 works, please post the patch on http://bugs.python.org with your
 results and assign it to me for review.

 Thanks,
 Collin Winter
   
(I did end up subscribing, just with a different email address :)

What is the best branch to start working from? trunk?

John
=:-



Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-09 Thread Collin Winter
On Thu, Apr 9, 2009 at 9:34 AM, John Arbash Meinel
john.arbash.mei...@gmail.com wrote:
 ...

 Anyway, I think the internals of intern() could be done a bit better. Here are
 some concrete things:


 [snip]

 Memory usage is definitely something we're interested in improving.
 Since you've already looked at this in some detail, could you try
 implementing one or two of your ideas and see if it makes a difference
 in memory consumption? Changing from a dict to a set looks promising,
 and should be a fairly self-contained way of starting on this. If it
 works, please post the patch on http://bugs.python.org with your
 results and assign it to me for review.

 Thanks,
 Collin Winter

 (I did end up subscribing, just with a different email address :)

 What is the best branch to start working from? trunk?

That's a good place to start, yes. If the idea works well, we'll want
to port it to the py3k branch, too, but that can wait.

Collin


Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-09 Thread Christian Heimes
John Arbash Meinel wrote:
 When I looked at the actual references from interned, I saw mostly
 variable names. Considering that every variable goes through the python
 intern dict. And when you look at the intern function, it doesn't use
 setdefault logic, it actually does a get() followed by a set(), which
 means the cost of interning is 1-2 lookups depending on likelihood, etc.
 (I saw a whole lot of strings as the error codes in win32all /
 winerror.py, and windows error codes tend to be longer-than-average
 variable length.)

I've read your posting twice but I'm still not sure if you are aware of
the most important feature of interned strings. In the first place,
interning is not about saving some bytes of memory but a speed
optimization: interned strings can be compared with a simple and fast
pointer comparison. With interned strings you can simply write:

char *a, *b;
if (a == b) {
...
}

Instead of:

char *a, *b;
if (strcmp(a, b) == 0) {
...
}

A compiler can optimize the pointer comparison much better than a
function call.
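The same effect is visible from Python code (shown here with sys.intern, where the function lives in Python 3; in the 2.x of this thread it is the intern() builtin):

```python
import sys

a = "".join(["inter", "ning"])   # built at runtime: a fresh object
b = "interning"                  # identifier-like literal
assert a == b and a is not b     # equal contents, distinct objects

ia, ib = sys.intern(a), sys.intern(b)
assert ia is ib                  # one canonical object, so equality
                                 # checks reduce to a pointer compare
```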

 Anyway, I think the internals of intern() could be done a bit better. Here are
 some concrete things:
 
   a) Don't keep a double reference to both key and value to the same
  object (1 pointer per entry), this could be as simple as using a
  Set() instead of a dict()
 
   b) Don't cache the hash key in the set, as strings already cache them.
  (1 long per entry). This is a big win for space, but would need to
  be balanced against lookup and collision resolving speed.
 
  My guess is that reducing the size of the set will actually improve
  speed more, because more items can fit in cache. It depends on how
  many times you need to resolve a collision. If the string hash is
  sufficiently spread out, and the load factor is reasonable, then
  likely when you actually find an item in the set, it will be the
  item you want, and you'll need to bring the string object into
  cache anyway, so that you can do a string comparison (rather than
  just a hash comparison.)
 
   c) Use the existing lookup function one time. (PySet->lookup())
  Sets already have a lookup which is optimized for strings, and
  returns a pointer to where the object would go if it exists. Which
  means the intern() function can do a single lookup resolving any
  collisions, and return the object or insert without doing a second
  lookup.
 
   d) Having a special structure might also allow for separate optimizing
  of things like 'default size', 'grow rate', 'load factor', etc. A
  lot of this could be tuned specifically knowing that we really only
  have 1 of these objects, and it is going to be pointing at a lot of
  strings that are < 50 bytes long.
 
  If hashes of variable name strings are well distributed, we could
  probably get away with a load factor of 2. If we know we are likely
  to have lots and lots that never go away (you rarely *unload*
  modules, and all variable names are in the intern dict), that would
  suggest having a large initial size, and probably a wide growth
  factor to avoid spending a lot of time resizing the set.

I agree that a dict is not the most memory efficient data structure for
interned strings. However dicts are extremely well tested and highly
optimized. Any specialized data structure needs to be designed and
tested very carefully. If you happen to break the interning system it's
going to lead to rather nasty and hard to debug problems.
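Rough arithmetic for the proposals above, on a 64-bit build. The table size is an assumption, picked so the dict figure matches the 1536 KiB that Martin measures later in this thread for ~22k interned strings:

```python
WORD = 8           # bytes per pointer / cached hash on a 64-bit build
SLOTS = 65536      # power-of-two table able to hold ~22k entries at a
                   # load factor below 2/3 (assumption)

dict_entry = 3 * WORD   # cached hash + key + value   (proposal a targets this)
set_entry = 2 * WORD    # cached hash + key
bare_entry = 1 * WORD   # key only: hash re-read from the string (proposal b)

for name, entry in [("dict", dict_entry), ("set", set_entry),
                    ("string-set", bare_entry)]:
    print("%10s: %4d KiB" % (name, SLOTS * entry // 1024))
```

That reproduces the 1.5MB -> 1.0MB -> 0.5MB progression John reports elsewhere in this thread for bzr's interned strings.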

   e) How tuned is String.hash() for the fact that most of these strings
  are going to be ascii text? (I know that python wants to support
  non-ascii variable names, but I still think there is going to be an
  overwhelming bias towards characters in the range 65-122 ('A'-'z').)

Python 3.0 uses unicode for all names. You have to design something that
can be adopted to unicode, too. By the way do you know that dicts have
an optimized lookup function for strings? It's called lookdict_unicode /
 lookdict_string.

 Also note that the performance of the interned dict gets even worse on
 64-bit platforms. Where the size of a 'dictentry' doubles, but the
 average length of a variable name wouldn't change.
 
 Anyway, I would be happy to implement something along the lines of a
 StringSet, or maybe the InternSet, etc. I just wanted to check if
 people would be interested or not.

Since interning is mostly used in the core and extension modules you
might want to experiment with a different growth rate. The interning
data structure could start with a larger size and have a slower,
non-progressive growth rate.

Christian


Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-09 Thread John Arbash Meinel
Christian Heimes wrote:
 John Arbash Meinel wrote:
 When I looked at the actual references from interned, I saw mostly
 variable names. Considering that every variable goes through the python
 intern dict. And when you look at the intern function, it doesn't use
 setdefault logic, it actually does a get() followed by a set(), which
 means the cost of interning is 1-2 lookups depending on likelihood, etc.
 (I saw a whole lot of strings as the error codes in win32all /
 winerror.py, and windows error codes tend to be longer-than-average
 variable length.)
 
 I've read your posting twice but I'm still not sure if you are aware of
 the most important feature of interned strings. In the first place,
 interning is not about saving some bytes of memory but a speed
 optimization: interned strings can be compared with a simple and fast
 pointer comparison. With interned strings you can simply write:
 
 char *a, *b;
 if (a == b) {
 ...
 }
 
 Instead of:
 
 char *a, *b;
 if (strcmp(a, b) == 0) {
 ...
 }
 
 A compiler can optimize the pointer comparison much better than a
 function call.
 

Certainly. But there is a cost associated with calling intern() in the
first place. You created a string, and you are now trying to de-dup it.
That cost is both in the memory to track all strings interned so far,
and the cost to do a dict lookup. And the way intern is currently
written, there is a third cost when the item doesn't exist yet, which is
another lookup to insert the object.

I'll also note that increasing memory does have a semi-direct effect on
performance, because more memory requires more time to bring memory back
and forth from main memory to CPU caches.

...

 I agree that a dict is not the most memory efficient data structure for
 interned strings. However dicts are extremely well tested and highly
 optimized. Any specialized data structure needs to be designed and
 tested very carefully. If you happen to break the interning system it's
 going to lead to rather nasty and hard to debug problems.

Sure. My plan was to basically take the existing Set/Dict design, and
just tweak it slightly for the expected operations of interned.

 
   e) How tuned is String.hash() for the fact that most of these strings
  are going to be ascii text? (I know that python wants to support
  non-ascii variable names, but I still think there is going to be an
 overwhelming bias towards characters in the range 65-122 ('A'-'z').)
 
 Python 3.0 uses unicode for all names. You have to design something that
 can be adopted to unicode, too. By the way do you know that dicts have
 an optimized lookup function for strings? It's called lookdict_unicode /
  lookdict_string.

Sure, but so does PySet. I'm not sure about lookset_unicode, but I would
guess that exists or should exist for py3k.

 
 Also note that the performance of the interned dict gets even worse on
 64-bit platforms. Where the size of a 'dictentry' doubles, but the
 average length of a variable name wouldn't change.

 Anyway, I would be happy to implement something along the lines of a
 StringSet, or maybe the InternSet, etc. I just wanted to check if
 people would be interested or not.
 
 Since interning is mostly used in the core and extension modules you
 might want to experiment with a different growth rate. The interning
 data structure could start with a larger value and have a slower, non
 progressive data growth rate.
 
 Christian

I'll also mention that there are other uses for intern() where it is
uniquely suitable. Namely, if you are parsing lots of text with
redundant strings, it is a way to decrease total memory consumption.
(And potentially speed up future comparisons, etc.)

The main reason why intern() is uniquely suitable for this is that it
doesn't make strings immortal, as would happen if you used some other
structure: string objects themselves track their interned state, so an
interned string can be reclaimed when its last outside reference dies.

The options for a 3rd-party structure fall down into something like:

1) A cache that makes the strings immortal. (IIRC this is what older
versions of Python did.)

2) A cache that is periodically walked to see if any of the objects are
no longer externally referenced. The main problem here is that walking
is O(all-objects), whereas doing the checking at refcount=0 time means
you only check objects when you think the last reference has gone away.

3) Hijacking PyStringType->dealloc, so that when the refcount goes to 0
and Python wants to destroy the string, you then trigger your own cache
to look and see if it should remove the object.

Even further, you either have to check on every string dealloc, or
re-use PyStringObject->ob_sstate to track that you have placed this
string into your custom structure. Which would preclude ever calling
intern() on this string, because intern() doesn't just check a couple of
bits; it looks at the entire ob_sstate value.
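For illustration, the effect of a non-immortal cache can be approximated in pure Python with a weak-value mapping, at the cost of a str subclass, since plain str does not support weak references. A sketch only, with invented names, not a replacement for intern():

```python
import weakref

class _WeakStr(str):
    """str subclass that permits weak references (plain str does not)."""

_cache = weakref.WeakValueDictionary()

def intern_nonimmortal(s):
    """Return a canonical copy of s without making it immortal.

    When the last outside reference to the canonical copy goes away,
    the WeakValueDictionary drops the whole entry automatically --
    roughly the behaviour the refcount trickery buys intern() in C.
    Assumes s is a plain str.
    """
    canonical = _cache.get(s)
    if canonical is None:
        canonical = _WeakStr(s)
        # The key is held strongly, but the entire entry disappears
        # once the weakly-held value is collected.
        _cache[s] = canonical
    return canonical
```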

I think you could make it work, such that if your custom cache had set
some values, then intern() would just 

Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-09 Thread John Arbash Meinel
Alexander Belopolsky wrote:
 On Thu, Apr 9, 2009 at 11:02 AM, John Arbash Meinel
 j...@arbash-meinel.com wrote:
 ...
  a) Don't keep a double reference to both key and value to the same
 object (1 pointer per entry), this could be as simple as using a
 Set() instead of a dict()

 
 There is a rejected patch implementing just that:
 http://bugs.python.org/issue1507011 .
 

Thanks for the heads up.


So reading that thread, the final reason it was rejected was two-part:

  Without reviewing the patch again, I also doubt it is capable of
  getting rid of the reference count cheating: essentially, this
  cheating enables the interning dictionary to have weak references to
  strings, this is important to allow automatic collection of certain
  interned strings. This feature needs to be preserved, so the cheating
  in the reference count must continue.

That specific argument was invalid, because the patch just changed the
refcount trickery to use +/- 1. And I'm pretty sure Alexander's argument
was just that +/- 2 was weird, not that the weakref behavior was bad.

The other argument against the patch was based on the idea that:
  The operation "give me the member equal but not identical to E" is
  conceptually a lookup operation; the mathematical set construct has no
  such operation, and the Python set models it closely. IOW, set is
  *not* a dict with key==value.


I don't know if there was any consensus reached on this, since only
Martin responded this way.


I can say that for my use case (doing some work with a medium-size code
base), the overhead of interned as a dictionary was 1.5MB out of 20MB
total memory.

Simply changing it to a Set would drop this to 1.0MB. I have no proof
about the impact on performance, since I haven't benchmarked it yet.

Changing it to a StringSet could further drop it to 0.5MB. I would guess
that any performance impact would depend on whether the total size of
'interned' would fit inside L2 cache or not.


There is a small bug in the original patch when adding the string to the
set failed. Namely, it would return with t == NULL (which would mean
t != s), and the intern-in-place would end up setting your pointer to
NULL rather than doing nothing and clearing the error code.


So I guess some of it comes down to whether loewis would also reject
this change on the basis that mathematically a set is not a dict.
Though his claim that nobody else is speaking in favor of the patch no
longer holds, given that at least Collin Winter has expressed some
interest at this point.

John
=:-


Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-09 Thread Martin v. Löwis
 So I guess some of it comes down to whether loewis would also reject
 this change on the basis that mathematically a set is not a dict.

I'd like to point out that this was not the reason to reject it.
Instead, this (or, the opposite of it) was given as a reason why this
patch should be accepted (in msg50482). I found that a weak rationale
for making that change, in particular because I think the rationale
is incorrect.

I like your rationale (save memory) much more, and was asking in the
tracker for specific numbers, which weren't forthcoming.

 Though his claim that nobody else is speaking in favor of the patch no
 longer holds, given that at least Collin Winter has expressed some
 interest at this point.

Again, at that point in the tracker, none of the other committers had
spoken in favor of the patch. Since I wasn't convinced of its
correctness, and nobody else (whom I trust) had reviewed it as correct,
I rejected it.

Now that you brought up specific numbers, I tried to verify them,
and found them correct (although a bit unfortunate), please see my
test script below. Up to 21800 interned strings, the dict takes (only)
384kiB. It then grows, requiring 1536kiB. Whether or not having 22k
interned strings is typical, I still don't know.

Wrt. your proposed change, I would be worried about maintainability,
in particular if it would copy parts of the set implementation.

Regards,
Martin

import gc, sys

def find_interned_dict():
    cand = None
    for o in gc.get_objects():
        if not isinstance(o, dict):
            continue
        if 'find_interned_dict' not in o:
            continue
        for k, v in o.iteritems():
            if k is not v:
                break
        else:
            assert not cand
            cand = o
    return cand

d = find_interned_dict()
print len(d), sys.getsizeof(d)

l = []
for i in range(2):
    if i % 100 == 0:
        print len(d), sys.getsizeof(d)
    l.append(intern(repr(i)))


Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-09 Thread John Arbash Meinel

...

 I like your rationale (save memory) much more, and was asking in the
 tracker for specific numbers, which weren't forthcoming.
 

...

 Now that you brought up specific numbers, I tried to verify them,
 and found them correct (although a bit unfortunate), please see my
 test script below. Up to 21800 interned strings, the dict takes (only)
 384kiB. It then grows, requiring 1536kiB. Whether or not having 22k
 interned strings is typical, I still don't know.

Given that every variable name in any file is interned, it can grow
pretty rapidly. As an extreme case, consider the file
win32/lib/winerror.py which tracks all possible win32 errors.

>>> import winerror
>>> print len(winerror.__dict__)
1872

So a single error file has 1.9k strings.

My python version (2.5.2) doesn't have 'sys.getsizeof()', but otherwise
your code looks correct.

If all I do is find the interned dict, I see:
>>> print len(d)
5037

So stock python, without importing much extra (just os, sys, gc, etc.)
has almost 5k strings already.

I don't have a great regex yet for just extracting how many unique
strings there are in a given bit of source code.

However, if I do:

import gc, sys

def find_interned_dict():
    cand = None
    for o in gc.get_objects():
        if not isinstance(o, dict):
            continue
        if 'find_interned_dict' not in o:
            continue
        for k, v in o.iteritems():
            if k is not v:
                break
        else:
            assert not cand
            cand = o
    return cand

d = find_interned_dict()
print len(d)

# Just import a few of the core structures
from bzrlib import branch, repository, workingtree, builtins
print len(d)

I start at 5k strings, and after just importing the important bits of
bzrlib, I'm at:
19,316

Now, the bzrlib source code isn't particularly huge. It is about 3.7MB /
91k lines of .py files (that is, without importing the test suite).

Memory consumption with just importing bzrlib shows up at 15MB, with
300kB taken up by the intern dict.

If I then import some extra bits of bzrlib, like http support, ftp
support, and sftp support (which brings in python's httplib, and
paramiko, and ssh/sftp implementation), I'm up to:
>>> print len(d)
25186

Memory has jumped to 23MB (interned is now 1.57MB), and I haven't
actually done anything but import python code yet. If I sum the size of
the PyString objects held in intern() it amounts to 940KB, though they
refer to only 335KB of char data (an average of 13 bytes per string).

 
 Wrt. your proposed change, I would be worried about maintainability,
 in particular if it would copy parts of the set implementation.

Right, so in the first part, I would just use Set(), as it could then
save 1/3rd of the memory it uses today. (Dropping down to 1MB from 1.5MB.)
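The one-third figure is just the ratio of the table entry structs: a
Python 2 dictentry holds {hash, key, value} where a setentry holds only
{hash, key}, i.e. 12 vs. 8 bytes per slot on a 32-bit build (struct
sizes assumed, not measured):

```python
# Back-of-envelope arithmetic for the 1536 KiB table mentioned earlier.
slots = 131072                 # 1536 KiB / 12 bytes per dict entry
dict_kib = slots * 12 // 1024  # {hash, key, value} per slot
set_kib = slots * 8 // 1024    # {hash, key} per slot
print(dict_kib, set_kib)       # 1536 1024
```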

I don't have numbers on how much that would improve CPU times, I would
imagine improving 'intern()' would impact import times more than run
times, simply because import time is interning a *lot* of strings.

Though honestly, Bazaar would really like this, because startup overhead
for us is almost 400ms to 'do nothing', which is a lot for a command
line app.

John
=:-

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-09 Thread Martin v. Löwis
 I don't have numbers on how much that would improve CPU times, I would
 imagine improving 'intern()' would impact import times more than run
 times, simply because import time is interning a *lot* of strings.
 
 Though honestly, Bazaar would really like this, because startup overhead
 for us is almost 400ms to 'do nothing', which is a lot for a command
 line app.

Maybe I misunderstand your proposed change: how could the representation
of the interning dict possibly change the runtime of interning? (let
alone significantly)

Regards,
Martin


Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-09 Thread John Arbash Meinel
Martin v. Löwis wrote:
 I don't have numbers on how much that would improve CPU times, I would
 imagine improving 'intern()' would impact import times more than run
 times, simply because import time is interning a *lot* of strings.

 Though honestly, Bazaar would really like this, because startup overhead
 for us is almost 400ms to 'do nothing', which is a lot for a command
 line app.
 
 Maybe I misunderstand your proposed change: how could the representation
 of the interning dict possibly change the runtime of interning? (let
 alone significantly)
 
 Regards,
 Martin
 

Decreasing memory consumption lets more things fit in cache. Once the
size of 'interned' exceeds what fits in the L2 cache, you start paying
the cost of a full memory fetch, which is usually measured in 100s of
CPU cycles.

Avoiding double lookups in the dictionary would be less overhead, though
the second lookup is probably pretty fast if there are no collisions,
since everything would already be in the local CPU cache.

If we were dealing in objects that were KB in size, it wouldn't matter.
But as the intern dict quickly gets into MB, it starts to make a bigger
difference.

How big of a difference would be very CPU and dataset size specific. But
certainly caches make certain things much faster, and once you overflow
a cache, performance can take a surprising turn.

So my primary goal is certainly a decrease in memory consumption. I
think it will have a small knock-on effect of improving performance,
though I don't have concrete numbers to back that up.

Also, consider that resizing has to evaluate every object, thus paging
in all X bytes and assigning to another 2X bytes. Cutting X (potentially
by a third) would probably have a small but measurable effect.

John
=:-


Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-09 Thread Martin v. Löwis
 Also, consider that resizing has to evaluate every object, thus paging
 in all X bytes and assigning to another 2X bytes. Cutting X (potentially
 by a third) would probably have a small but measurable effect.

I'm *very* skeptical about claims on performance in the absence of
actual measurements. Too many effects come together, so the actual
performance is difficult to predict (and, for that prediction, you
would need *at least* a work load that you want to measure - starting
bzr would be such a workload, of course).

Regards,
Martin


Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-09 Thread Jake McGuire

On Apr 9, 2009, at 12:06 PM, Martin v. Löwis wrote:

Now that you brought up a specific numbers, I tried to verify them,
and found them correct (although a bit unfortunate), please see my
test script below. Up to 21800 interned strings, the dict takes (only)
384kiB. It then grows, requiring 1536kiB. Whether or not having 22k
interned strings is typical, I still don't know.

Wrt. your proposed change, I would be worried about maintainability,
in particular if it would copy parts of the set implementation.



I connected to a random one of our processes, which has been running  
for a typical amount of time and is currently at ~300MB RSS.


(gdb) p *(PyDictObject*)interned
$2 = {ob_refcnt = 1,
  ob_type = 0x8121240,
  ma_fill = 97239,
  ma_used = 95959,
  ma_mask = 262143,
  ma_table = 0xa493c008,
  }

Going from 3MB to 2.25MB isn't much, but it's not nothing, either.

I'd be skeptical of cache performance arguments given that the strings  
used in any particular bit of code should be spread pretty much evenly  
throughout the hash table, and 3MB seems solidly bigger than any L2  
cache I know of.  You should be able to get meaningful numbers out of  
a C profiler, but I'd be surprised to see the act of interning taking  
a noticeable amount of time.


-jake


Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-09 Thread Greg Ewing

John Arbash Meinel wrote:

And when you look at the intern function, it doesn't use
setdefault logic, it actually does a get() followed by a set(), which
means the cost of interning is 1-2 lookups depending on likelihood, etc.


Keep in mind that intern() is called fairly rarely, mostly
only at module load time. It may not be worth attempting
to speed it up.
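For reference, the get()-then-set() pattern can be collapsed to a single
probe at the Python level with dict.setdefault; a toy interning table
along those lines (hypothetical helper, not CPython's actual C
implementation):

```python
_interned = {}

def my_intern(s):
    # One probe: return the stored string if present, otherwise
    # insert s and return it.
    return _interned.setdefault(s, s)

# Equal strings built at runtime are distinct objects...
a = "-".join(["x", "y"])
b = "-".join(["x", "y"])
print(a is b)                      # False: two separate objects
# ...but interning collapses them to one canonical object:
print(my_intern(a) is my_intern(b))
```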

--
Greg


Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-09 Thread Benjamin Peterson
2009/4/9 Greg Ewing greg.ew...@canterbury.ac.nz:
 John Arbash Meinel wrote:

 And when you look at the intern function, it doesn't use
 setdefault logic, it actually does a get() followed by a set(), which
 means the cost of interning is 1-2 lookups depending on likelihood, etc.

 Keep in mind that intern() is called fairly rarely, mostly
 only at module load time. It may not be worth attempting
 to speed it up.

That's very important, though, for a command-line tool like Bazaar.
Even a few fractions of a second can make a difference in the user's
perception of speed.



-- 
Regards,
Benjamin


Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-09 Thread John Arbash Meinel
Greg Ewing wrote:
 John Arbash Meinel wrote:
 And the way intern is currently
 written, there is a third cost when the item doesn't exist yet, which is
 another lookup to insert the object.
 
 That's even rarer still, since it only happens the first
 time you load a piece of code that uses a given variable
 name anywhere in any module.
 

Somewhat true, though I know it happens 25k times during startup of
bzr... And I would be a *lot* happier if startup time was 100ms instead
of 400ms.

John
=:-



Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-09 Thread Mike Klaas


On 9-Apr-09, at 6:24 PM, John Arbash Meinel wrote:


 Greg Ewing wrote:
 John Arbash Meinel wrote:
 And the way intern is currently
 written, there is a third cost when the item doesn't exist yet, which is
 another lookup to insert the object.

 That's even rarer still, since it only happens the first
 time you load a piece of code that uses a given variable
 name anywhere in any module.

 Somewhat true, though I know it happens 25k times during startup of
 bzr... And I would be a *lot* happier if startup time was 100ms instead
 of 400ms.


I don't want to quash your idealism too severely, but it is extremely  
unlikely that you are going to get anywhere near that kind of speed up  
by tweaking string interning.  25k times doing anything (computation)  
just isn't all that much.


$ python -mtimeit -s 'd=dict.fromkeys(xrange(10000000))' \
  'for x in xrange(25000): d.get(x)'
100 loops, best of 3: 8.28 msec per loop

Perhaps this isn't representative (int hashing is ridiculously cheap,  
for instance), but the dict itself is far bigger than the dict you are  
dealing with and such would have similar cache-busting properties.   
And yet, 25k accesses (plus python-c dispatching costs which you are  
paying with interning) consume only ~10ms.  You could do more good by  
eliminating a handful of disk seeks by reducing the number of imported  
modules...


-Mike


Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-09 Thread Jeffrey Yasskin
On Thu, Apr 9, 2009 at 6:24 PM, John Arbash Meinel
john.arbash.mei...@gmail.com wrote:
 Greg Ewing wrote:
 John Arbash Meinel wrote:
 And the way intern is currently
 written, there is a third cost when the item doesn't exist yet, which is
 another lookup to insert the object.

 That's even rarer still, since it only happens the first
 time you load a piece of code that uses a given variable
 name anywhere in any module.


 Somewhat true, though I know it happens 25k times during startup of
 bzr... And I would be a *lot* happier if startup time was 100ms instead
 of 400ms.

I think you have plenty of a case to try it out. If you code it up and
it doesn't speed anything up, well then we've learned something, and
maybe it'll be useful anyway for the memory savings. If it does speed
things up, well then Python's faster. I wouldn't waste time arguing
about it before you have the change written.

Good luck!
Jeffrey


Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-09 Thread Collin Winter
On Thu, Apr 9, 2009 at 6:24 PM, John Arbash Meinel
john.arbash.mei...@gmail.com wrote:
 Greg Ewing wrote:
 John Arbash Meinel wrote:
 And the way intern is currently
 written, there is a third cost when the item doesn't exist yet, which is
 another lookup to insert the object.

 That's even rarer still, since it only happens the first
 time you load a piece of code that uses a given variable
 name anywhere in any module.


 Somewhat true, though I know it happens 25k times during startup of
 bzr... And I would be a *lot* happier if startup time was 100ms instead
 of 400ms.

Quite so. We have a number of internal tools, and they find that
frequently just starting up Python takes several times the duration of
the actual work unit itself. I'd be very interested to review any
patches you come up with to improve start-up time; so far on this
thread, there's been a lot of theory and not much practice. I'd
approach this iteratively: first replace the dict with a set, then if
that bears fruit, consider a customized data structure; if that bears
fruit, etc.

Good luck, and be sure to let us know what you find,
Collin Winter


Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-09 Thread Guido van Rossum
On Thu, Apr 9, 2009 at 9:07 PM, Collin Winter coll...@gmail.com wrote:
 On Thu, Apr 9, 2009 at 6:24 PM, John Arbash Meinel 
 john.arbash.mei...@gmail.com wrote:

 And I would be a *lot* happier if startup time was 100ms instead
  of 400ms.

 Quite so. We have a number of internal tools, and they find that
 frequently just starting up Python takes several times the duration of
 the actual work unit itself. I'd be very interested to review any
 patches you come up with to improve start-up time; so far on this
 thread, there's been a lot of theory and not much practice. I'd
 approach this iteratively: first replace the dict with a set, then if
 that bears fruit, consider a customized data structure; if that bears
 fruit, etc.

 Good luck, and be sure to let us know what you find,

Just to add some skepticism, has anyone done any kind of
instrumentation of bzr start-up behavior?  IIRC every time I was asked
to reduce the start-up cost of some Python app, the cause was too many
imports, and the solution was either to speed up import itself (.pyc
files were the first thing ever that came out of that -- importing
from a single .zip file is one of the more recent tricks) or to reduce
the number of modules imported at start-up (or both :-). Heavy-weight
frameworks are usually the root cause, but usually there's nothing
that can be done about that by the time you've reached this point. So,
amen on the good luck, but please start with a bit of analysis.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)


Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-09 Thread John Arbash Meinel

...
 Somewhat true, though I know it happens 25k times during startup of
 bzr... And I would be a *lot* happier if startup time was 100ms instead
 of 400ms.
 
 I don't want to quash your idealism too severely, but it is extremely
 unlikely that you are going to get anywhere near that kind of speed up
 by tweaking string interning.  25k times doing anything (computation)
 just isn't all that much.
 
 $ python -mtimeit -s 'd=dict.fromkeys(xrange(10000000))' \
   'for x in xrange(25000): d.get(x)'
 100 loops, best of 3: 8.28 msec per loop
 
 Perhaps this isn't representative (int hashing is ridiculously cheap,
 for instance), but the dict itself is far bigger than the dict you are
 dealing with and such would have similar cache-busting properties.  And
 yet, 25k accesses (plus python-c dispatching costs which you are paying
 with interning) consume only ~10ms.  You could do more good by
 eliminating a handful of disk seeks by reducing the number of imported
 modules...
 
 -Mike
 

You're also using timeit over the same set of 25k keys, which means it
only has to load that subset. And as you are using identical runs each
time, those keys are already loaded into your cache lines... And given
how hash(int) works, they are all sequential in memory, and all 10M in
your original set have 0 collisions. Actually, at 10M, you'll have a
dict of size 20M entries, and the first 10M entries will be full, and
the trailing 10M entries will all be empty.
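The zero-collision claim follows from CPython hashing small non-negative
ints to themselves, so slot = hash(key) & mask walks the table
sequentially; a quick check with an illustrative mask:

```python
# Small non-negative ints hash to themselves, so consecutive keys map
# to consecutive table slots (slot = hash(key) & mask, as in the C code).
mask = (1 << 21) - 1           # a 2M-slot table, size chosen for illustration
slot_list = [hash(n) & mask for n in range(1000)]
print(slot_list == list(range(1000)))
```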

That said, you're right, the benefits of a smaller structure are going
to be small. I'll just point out that if I make a small tweak to your
timing and do:

$ python -mtimeit -s 'd=dict.fromkeys(xrange(10000000))' \
  'for x in xrange(25000): d.get(x)'
100 loops, best of 3: 6.27 msec per loop

So slightly faster than yours, *but*, lets try a much smaller dict:

$ python -mtimeit -s 'd=dict.fromkeys(xrange(25000))' 'for x in
  xrange(25000): d.get(x)'
100 loops, best of 3: 6.35 msec per loop

Pretty much the same time. Well within the noise margin. But if I go
back to the big dict and actually select 25k keys across the whole set:

$ TIMEIT -s 'd=dict.fromkeys(xrange(10000000))' \
  -s 'keys=range(0, 10000000, 10000000/25000)' \
  'for x in keys: d.get(x)'
100 loops, best of 3: 13.1 msec per loop

Now I'm still accessing 25k keys, but I'm doing it across the whole
range, and suddenly the time *doubled*.

What about slightly more random access:
$ TIMEIT -s 'import random; d=dict.fromkeys(xrange(10000000))' \
  -s 'bits = range(0, 10000000, 400); random.shuffle(bits)' \
  'for x in bits: d.get(x)'
100 loops, best of 3: 15.5 msec per loop

Not as big of a difference as I thought it would be... But I bet if
there was a way to put the random shuffle in the inner loop, so you
weren't accessing the same identical 25k keys internally, you might get
more interesting results.

As for other bits about exercising caches:

$ shuffle(range(0, 10000000, 400))
100 loops, best of 3: 15.5 msec per loop

$ shuffle(range(0, 10000000, 40))
10 loops, best of 3: 175 msec per loop

10x more keys, costs 11.3x, pretty close to linear.

$ shuffle(range(0, 10000000, 10))
10 loops, best of 3: 739 msec per loop

4x the keys, 4.5x the time, starting to get more into nonlinear effects.

Anyway, you're absolutely right. intern() overhead is a tiny fraction of
'import bzrlib.*' time, so I don't expect to see amazing results. That
said, accessing 25k keys in a smaller structure is 2x faster than
accessing 25k keys spread across a larger structure.

John
=:-