Re: [Python-Dev] Rethinking intern() and its data structure
Guido van Rossum wrote:
> Just to add some skepticism, has anyone done any kind of instrumentation of bzr start-up behavior? IIRC every time I was asked to reduce the start-up cost of some Python app, the cause was too many imports, and the solution was either to speed up import itself (.pyc files were the first thing ever that came out of that -- importing from a single .zip file is one of the more recent tricks) or to reduce the number of modules imported at start-up (or both :-). Heavy-weight frameworks are usually the root cause, but usually there's nothing that can be done about that by the time you've reached this point. So, amen on the good luck, but please start with a bit of analysis.

This problem (slow application startup times due to too many imports at startup, which in turn can be due to top-level imports for library or framework functionality that a given application doesn't actually use) is actually the main reason I sometimes wish for a nice, solid lazy module import mechanism that manages to avoid the potential deadlock problems created by using import statements inside functions.

Providing a clean API and implementation for that functionality is a pretty tough nut to crack, though, so I'm not holding my breath...

Cheers,
Nick.

P.S. It's only an occasional fairly idle wish for me though, or I'd have at least tried to come up with something myself by now.

--
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
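[Editor's note: the kind of lazy module import mechanism wished for here can be sketched with a small proxy object. This is an illustrative toy, not a proposed stdlib API, and it does not address the import-lock deadlock concerns Nick raises; the class name is invented for the example.]

```python
import importlib

class LazyModule(object):
    """Defer the real import until the first attribute access."""
    def __init__(self, name):
        self._name = name
        self._module = None

    def __getattr__(self, attr):
        # __getattr__ only fires for attributes not found on the proxy
        # itself, so the real module is imported exactly when first needed.
        if self._module is None:
            self._module = importlib.import_module(self._name)
        return getattr(self._module, attr)

json = LazyModule("json")     # no import cost paid at this point
print(json.dumps([1, 2, 3]))  # the real import happens here, on first use
```

The cost of this style is that every attribute access goes through an extra indirection, and `isinstance`/`type` checks see the proxy rather than a real module object.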
Re: [Python-Dev] Rethinking intern() and its data structure
On Thu, 2009-04-09 at 21:26 -0700, Guido van Rossum wrote:
> Just to add some skepticism, has anyone done any kind of instrumentation of bzr start-up behavior?

We sure have. 'bzr --profile-imports' reports on the time to import different modules (both cumulative and individual). We have a lazy module loader that allows us to defer loading modules we might not use (though if they are needed we are in fact going to pay for loading them eventually). We monkeypatch the standard library where modules we want are unreasonably expensive to import (for instance by making a regex we wouldn't use be lazily compiled rather than compiled at import time).

> IIRC every time I was asked to reduce the start-up cost of some Python app, the cause was too many imports, and the solution was either to speed up import itself (.pyc files were the first thing ever that came out of that -- importing from a single .zip file is one of the more recent tricks) or to reduce the number of modules imported at start-up (or both :-). Heavy-weight frameworks are usually the root cause, but usually there's nothing that can be done about that by the time you've reached this point. So, amen on the good luck, but please start with a bit of analysis.

Certainly, import time is part of it:

robe...@lifeless-64:~$ python -m timeit -s 'import sys; import bzrlib.errors' "del sys.modules['bzrlib.errors']; import bzrlib.errors"
10 loops, best of 3: 18.7 msec per loop

(errors.py is 3027 lines long with 347 exception classes.)

We've also looked lower: python does a lot of stat operations searching for imports and determining if the .pyc is up to date; these appear to only really matter on cold-cache imports (but they matter a lot then); in hot-cache situations they are insignificant.

Uhm, there's probably more - but I just wanted to note that we have done quite a bit of analysis. I think a large chunk of our problem is having too much code loaded when only a small fraction will be used in any one operation.
Consider importing bzrlib.errors - 10% of the startup time for 'bzr help'. In any operation only a few of those exceptions will be used - and typically 0.

-Rob
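[Editor's note: the lazily-compiled regex trick Robert mentions can be sketched as below. This is illustrative only; bzrlib's actual lazy_regex module differs in detail, and the class name here is invented.]

```python
import re

class LazyRegex(object):
    """Store the pattern at import time; compile only on first use."""
    def __init__(self, pattern, flags=0):
        self._args = (pattern, flags)
        self._compiled = None

    def __getattr__(self, name):
        # Fires for match/search/findall/...; compiles once, lazily.
        if self._compiled is None:
            self._compiled = re.compile(*self._args)
        return getattr(self._compiled, name)

WORD = LazyRegex(r"\w+")                  # no re.compile() cost at import time
print(WORD.findall("intern the names"))   # compiled here, on first use
```

A module full of such patterns pays essentially nothing at import time, which matters exactly when only a small fraction of the loaded code will be used in any one operation.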
Re: [Python-Dev] Rethinking intern() and its data structure
Robert Collins <robert.collins at canonical.com> writes:
> (errors.py is 3027 lines long with 347 exception classes).

347 exception classes? Perhaps your framework is over-engineered. Similarly, when using a heavy Web framework, reloading a Web app can take several seconds... but I won't blame Python for that.

Regards

Antoine.
Re: [Python-Dev] Rethinking intern() and its data structure
On Fri, 2009-04-10 at 11:52, Antoine Pitrou wrote:
> Robert Collins <robert.collins at canonical.com> writes:
> > (errors.py is 3027 lines long with 347 exception classes).
>
> 347 exception classes? Perhaps your framework is over-engineered. Similarly, when using a heavy Web framework, reloading a Web app can take several seconds... but I won't blame Python for that.

Well, we've added exceptions as we needed them. This isn't much different to errno in C programs; the errno range has expanded as people have wanted to signal that specific situations have arisen. The key thing for us is to have both something that can be caught (for library users of bzrlib) and something that can be formatted with variable substitution (for displaying to users). If there are better ways to approach this in python than what we've done, that would be great.

-Rob
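[Editor's note: the two requirements Robert describes - catchable classes plus variable substitution for display - are commonly met with a format-template base class. The sketch below is along the lines of what bzrlib does, but the class names and the _fmt attribute are illustrative, not bzrlib's exact API.]

```python
class BzrError(Exception):
    """Base class: subclasses are catchable types; _fmt formats for users."""
    _fmt = "A generic error occurred"

    def __init__(self, **kwargs):
        Exception.__init__(self)
        self.kwargs = kwargs

    def __str__(self):
        # Variable substitution happens only when the error is displayed.
        return self._fmt % self.kwargs

class NoSuchFile(BzrError):
    _fmt = "No such file: %(path)s"

try:
    raise NoSuchFile(path="/tmp/missing")
except BzrError as err:   # library users catch the class hierarchy...
    print(str(err))       # ...while UIs get a formatted message
```

The trade-off under discussion is that each such class costs import time and interned-string memory even when, as Robert notes, typically zero of them are raised in a given operation.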
Re: [Python-Dev] Rethinking intern() and its data structure
John Arbash Meinel wrote:
> Not as big of a difference as I thought it would be... But I bet if there was a way to put the random shuffle in the inner loop, so you weren't accessing the same identical 25k keys internally, you might get more interesting results.

You can prepare a few random samples during startup:

$ python -m timeit -s "from random import sample; d = dict.fromkeys(xrange(10**7)); nextrange = iter([sample(xrange(10**7), 25000) for i in range(200)]).next" "for x in nextrange(): d.get(x)"
10 loops, best of 3: 20.2 msec per loop

To put it into perspective:

$ python -m timeit -s "d = dict.fromkeys(xrange(10**7)); nextrange = iter([range(25000)]*200).next" "for x in nextrange(): d.get(x)"
100 loops, best of 3: 10.9 msec per loop

Peter
Re: [Python-Dev] Rethinking intern() and its data structure
Robert Collins wrote:
> Certainly, import time is part of it:
>
> robe...@lifeless-64:~$ python -m timeit -s 'import sys; import bzrlib.errors' "del sys.modules['bzrlib.errors']; import bzrlib.errors"
> 10 loops, best of 3: 18.7 msec per loop
>
> (errors.py is 3027 lines long with 347 exception classes).
>
> We've also looked lower - python does a lot of stat operations searching for imports and determining if the pyc is up to date; these appear to only really matter on cold-cache imports (but they matter a lot then); in hot-cache situations they are insignificant.

Tarek, Georg, and I talked at PyCon about a way to address both multi-versioning and a speedup of this exact import problem in the future. I had to leave before the hackfest got started, though, so I don't know where the idea went from there. Tarek, did this idea progress any?

-Toshio
Re: [Python-Dev] Rethinking intern() and its data structure
At 06:52 PM 4/10/2009 +1000, Nick Coghlan wrote:
> This problem (slow application startup times due to too many imports at startup, which in turn can be due to top-level imports for library or framework functionality that a given application doesn't actually use) is actually the main reason I sometimes wish for a nice, solid lazy module import mechanism that manages to avoid the potential deadlock problems created by using import statements inside functions.

Have you tried http://pypi.python.org/pypi/Importing ? Or more specifically, http://peak.telecommunity.com/DevCenter/Importing#lazy-imports ?

It does of course use the import lock, but as long as your top-level module code doesn't acquire locks (directly or indirectly), it shouldn't be possible to deadlock. (Or more precisely, to add any *new* deadlocks that you didn't already have.)
Re: [Python-Dev] Rethinking intern() and its data structure
On Thu, Apr 09, 2009, John Arbash Meinel wrote:
> PS: I'm not yet subscribed to python-dev, so if you could make sure to CC me in replies, I would appreciate it.

Please do subscribe to python-dev ASAP; I also suggest that you subscribe to python-ideas, because I suspect that this is sufficiently blue-sky to start there. As always, this is the kind of thing where code trumps gedanken, so you shouldn't expect much activity unless either you are willing to make at least initial attempts at trying out your ideas or someone else just happens to find it interesting. In general, the core Python implementation strives for simplicity, so there's already some built-in pushback.

--
Aahz (a...@pythoncraft.com) * http://www.pythoncraft.com/

Why is this newsgroup different from all other newsgroups?
Re: [Python-Dev] Rethinking intern() and its data structure
On Thu, Apr 9, 2009 at 17:31, Aahz <a...@pythoncraft.com> wrote:
> Please do subscribe to python-dev ASAP; I also suggest that you subscribe to python-ideas, because I suspect that this is sufficiently blue-sky to start there.

It might also be interesting to the unladen-swallow guys.

Cheers,

Dirkjan
Re: [Python-Dev] Rethinking intern() and its data structure
Hi John,

On Thu, Apr 9, 2009 at 8:02 AM, John Arbash Meinel <j...@arbash-meinel.com> wrote:
> I've been doing some memory profiling of my application, and I've found some interesting results with how intern() works. I was pretty surprised to see that the interned dict was actually consuming a significant amount of total memory. To give the specific values, after doing: bzr branch A B of a small project, the total memory consumption is ~21MB
[snip]
> Anyway, I think the internals of intern() could be done a bit better. Here are some concrete things:
[snip]

Memory usage is definitely something we're interested in improving. Since you've already looked at this in some detail, could you try implementing one or two of your ideas and see if it makes a difference in memory consumption?

Changing from a dict to a set looks promising, and should be a fairly self-contained way of starting on this. If it works, please post the patch on http://bugs.python.org with your results and assign it to me for review.

Thanks,
Collin Winter
Re: [Python-Dev] Rethinking intern() and its data structure
> > Anyway, I think the internals of intern() could be done a bit better. Here are some concrete things:
> [snip]
>
> Memory usage is definitely something we're interested in improving. Since you've already looked at this in some detail, could you try implementing one or two of your ideas and see if it makes a difference in memory consumption? Changing from a dict to a set looks promising, and should be a fairly self-contained way of starting on this. If it works, please post the patch on http://bugs.python.org with your results and assign it to me for review.
>
> Thanks,
> Collin Winter

(I did end up subscribing, just with a different email address :)

What is the best branch to start working from? trunk?

John
=:-
Re: [Python-Dev] Rethinking intern() and its data structure
On Thu, Apr 9, 2009 at 9:34 AM, John Arbash Meinel <john.arbash.mei...@gmail.com> wrote:
> (I did end up subscribing, just with a different email address :)
>
> What is the best branch to start working from? trunk?

That's a good place to start, yes. If the idea works well, we'll want to port it to the py3k branch, too, but that can wait.

Collin
Re: [Python-Dev] Rethinking intern() and its data structure
John Arbash Meinel wrote:
> When I looked at the actual references from interned, I saw mostly variable names. Considering that every variable goes through the python intern dict. And when you look at the intern function, it doesn't use setdefault logic, it actually does a get() followed by a set(), which means the cost of interning is 1-2 lookups depending on likelihood, etc. (I saw a whole lot of strings as the error codes in win32all / winerror.py, and windows error codes tend to be longer-than-average variable length.)

I've read your posting twice but I'm still not sure if you are aware of the most important feature of interned strings. In the first place interning is not about saving some bytes of memory but a speed optimization. Interned strings can be compared with a simple and fast pointer comparison. With interned strings you can simply write:

  char *a, *b;
  if (a == b) { ... }

Instead of:

  char *a, *b;
  if (strcmp(a, b) == 0) { ... }

A compiler can optimize the pointer comparison much better than a function call.

> Anyway, I think the internals of intern() could be done a bit better. Here are some concrete things:
>
> a) Don't keep a double reference to both key and value to the same object (1 pointer per entry); this could be as simple as using a Set() instead of a dict().
>
> b) Don't cache the hash key in the set, as strings already cache them (1 long per entry). This is a big win for space, but would need to be balanced against lookup and collision-resolving speed. My guess is that reducing the size of the set will actually improve speed more, because more items can fit in cache. It depends on how many times you need to resolve a collision. If the string hash is sufficiently spread out, and the load factor is reasonable, then likely when you actually find an item in the set, it will be the item you want, and you'll need to bring the string object into cache anyway, so that you can do a string comparison (rather than just a hash comparison).
>
> c) Use the existing lookup function one time (PySet->lookup()). Sets already have a lookup which is optimized for strings, and returns a pointer to where the object would go if it exists. Which means the intern() function can do a single lookup, resolving any collisions, and return the object or insert without doing a second lookup.
>
> d) Having a special structure might also allow for separate optimizing of things like 'default size', 'grow rate', 'load factor', etc. A lot of this could be tuned specifically knowing that we really only have 1 of these objects, and it is going to be pointing at a lot of strings that are 50 bytes long. If hashes of variable name strings are well distributed, we could probably get away with a load factor of 2. If we know we are likely to have lots and lots that never go away (you rarely *unload* modules, and all variable names are in the intern dict), that would suggest having a large initial size, and probably a wide growth factor to avoid spending a lot of time resizing the set.

I agree that a dict is not the most memory efficient data structure for interned strings. However, dicts are extremely well tested and highly optimized. Any specialized data structure needs to be designed and tested very carefully. If you happen to break the interning system it's going to lead to rather nasty and hard to debug problems.

> e) How tuned is String.hash() for the fact that most of these strings are going to be ascii text? (I know that python wants to support non-ascii variable names, but I still think there is going to be an overwhelming bias towards characters in the range 65-122 ('A'-'z').)

Python 3.0 uses unicode for all names. You have to design something that can be adapted to unicode, too. By the way, do you know that dicts have an optimized lookup function for strings? It's called lookdict_unicode / lookdict_string.

> Also note that the performance of the interned dict gets even worse on 64-bit platforms, where the size of a 'dictentry' doubles but the average length of a variable name wouldn't change.
>
> Anyway, I would be happy to implement something along the lines of a StringSet, or maybe the InternSet, etc. I just wanted to check if people would be interested or not.

Since interning is mostly used in the core and extension modules you might want to experiment with a different growth rate. The interning data structure could start with a larger value and have a slower, non-progressive growth rate.

Christian
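[Editor's note: point (c) above - doing one probe instead of a get() followed by a set() - can be illustrated in pure Python with dict.setdefault. The real change would of course be in the C implementation; these helper names are invented for the sketch.]

```python
_interned = {}

def intern_two_lookups(s):
    """Mirrors the current C code's shape: a get(), then an insert on miss."""
    t = _interned.get(s)   # lookup 1
    if t is not None:
        return t
    _interned[s] = s       # lookup 2: the insert probes the table again
    return s

def intern_one_lookup(s):
    """A single probe: returns the existing entry, or inserts s and returns it."""
    return _interned.setdefault(s, s)

a = intern_one_lookup("".join(["na", "me"]))  # a runtime-built string
b = intern_one_lookup("name")
assert a is b   # both calls return the same canonical object
```

Both variants give the same canonical-object semantics; the difference is purely the number of hash-table probes on the insert path, which is what the C-level proposal targets.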
Re: [Python-Dev] Rethinking intern() and its data structure
Christian Heimes wrote:
> I've read your posting twice but I'm still not sure if you are aware of the most important feature of interned strings. In the first place interning is not about saving some bytes of memory but a speed optimization. Interned strings can be compared with a simple and fast pointer comparison. [snip] A compiler can optimize the pointer comparison much better than a function call.

Certainly. But there is a cost associated with calling intern() in the first place. You created a string, and you are now trying to de-dup it. That cost is both in the memory to track all strings interned so far, and the cost to do a dict lookup. And the way intern is currently written, there is a third cost when the item doesn't exist yet, which is another lookup to insert the object. I'll also note that increasing memory does have a semi-direct effect on performance, because more memory requires more time to bring memory back and forth from main memory to CPU caches.

> I agree that a dict is not the most memory efficient data structure for interned strings. However, dicts are extremely well tested and highly optimized. Any specialized data structure needs to be designed and tested very carefully. If you happen to break the interning system it's going to lead to rather nasty and hard to debug problems.

Sure. My plan was to basically take the existing Set/Dict design, and just tweak it slightly for the expected operations of interned.

> > e) How tuned is String.hash() for the fact that most of these strings are going to be ascii text? (I know that python wants to support non-ascii variable names, but I still think there is going to be an overwhelming bias towards characters in the range 65-122 ('A'-'z').)
>
> Python 3.0 uses unicode for all names. You have to design something that can be adapted to unicode, too. By the way, do you know that dicts have an optimized lookup function for strings? It's called lookdict_unicode / lookdict_string.

Sure, but so does PySet. I'm not sure about lookset_unicode, but I would guess that exists or should exist for py3k.

> Since interning is mostly used in the core and extension modules you might want to experiment with a different growth rate. The interning data structure could start with a larger value and have a slower, non-progressive growth rate.

I'll also mention that there are other uses for intern() where it is uniquely suitable. Namely, if you are parsing lots of text with redundant strings, it is a way to decrease total memory consumption (and potentially speed up future comparisons, etc.). The main reason why intern() is useful for this is that it doesn't make strings immortal, as would happen if you used some other structure, because strings know about the interned object. The options for a 3rd-party structure fall into something like:

1) A cache that makes the strings immortal. (IIRC this is what older versions of Python did.)

2) A cache that is periodically walked to see if any of the objects are no longer externally referenced. The main problem here is that walking is O(all-objects), whereas doing the checking at refcount=0 time means you only check objects when you think the last reference has gone away.

3) Hijacking PyStringType->dealloc, so that when the refcount goes to 0 and Python wants to destroy the string, you then trigger your own cache to look and see if it should remove the object. Even further, you either have to check on every string dealloc, or re-use PyStringObject->ob_sstate to track that you have placed this string into your custom structure. Which would preclude ever calling intern() on this string, because intern() doesn't just check a couple of bits, it looks at the entire ob_sstate value. I think you could make it work, such that if your custom cache had set some values, then intern() would just ...
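[Editor's note: the memory-deduplication use of intern() described above can be demonstrated directly. Python 3 spells it sys.intern; the token counts below are illustrative.]

```python
import sys

# Simulate a parser that sees the same 100 token spellings 10,000 times.
tokens = ["tok%d" % (i % 100) for i in range(10000)]
# Each %-formatting call builds a fresh string object:
assert len(set(map(id, tokens))) == 10000

# Interning collapses every equal spelling onto one canonical object,
# so the 10,000 references now share just 100 string objects.
interned = [sys.intern(t) for t in tokens]
assert len(set(map(id, interned))) == 100
```

Once the `tokens` list is dropped, the duplicates can be freed; and because CPython's interning is non-immortalizing (the point made above), the 100 canonical strings can also be collected when nothing else references them.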
Re: [Python-Dev] Rethinking intern() and its data structure
Alexander Belopolsky wrote:
> On Thu, Apr 9, 2009 at 11:02 AM, John Arbash Meinel <j...@arbash-meinel.com> wrote:
> > a) Don't keep a double reference to both key and value to the same object (1 pointer per entry); this could be as simple as using a Set() instead of a dict()
>
> There is a rejected patch implementing just that: http://bugs.python.org/issue1507011 .

Thanks for the heads up. So reading that thread, the final reason it was rejected was 2-part:

> Without reviewing the patch again, I also doubt it is capable of getting rid of the reference count cheating: essentially, this cheating enables the interning dictionary to have weak references to strings, this is important to allow automatic collection of certain interned strings. This feature needs to be preserved, so the cheating in the reference count must continue.

That specific argument was invalid, because the patch just changed the refcount trickery to use +/- 1. And I'm pretty sure Alexander's argument was just that +/- 2 was weird, not that the weakref behavior was bad.

The other argument against the patch was based on the idea that:

> The operation "give me the member equal but not identical to E" is conceptually a lookup operation; the mathematical set construct has no such operation, and the Python set models it closely. IOW, set is *not* a dict with key==value.

I don't know if there was any consensus reached on this, since only Martin responded this way.

I can say that for the workload I looked at, with a medium-size code base, the overhead of interned as a dictionary was 1.5MB out of 20MB total memory. Simply changing it to a Set would drop this to 1.0MB. I have no proof about the impact on performance, since I haven't benchmarked it yet. Changing it to a StringSet could further drop it to 0.5MB. I would guess that any performance impact would depend on whether the total size of 'interned' would fit inside the L2 cache or not.

There is also a small bug in the original patch when adding the string to the set failed: it would return t == NULL, which would be t != s, and the intern-in-place would end up setting your pointer to NULL rather than doing nothing and clearing the error code.

So I guess some of it comes down to whether loweis would also reject this change on the basis that mathematically a set is not a dict. Though his claim that nobody else was speaking in favor of the patch no longer holds, since at least Collin Winter has expressed some interest at this point.

John
=:-
Re: [Python-Dev] Rethinking intern() and its data structure
> So I guess some of it comes down to whether loweis would also reject this change on the basis that mathematically a set is not a dict.

I'd like to point out that this was not the reason to reject it. Instead, this (or, the opposite of it) was given as a reason why this patch should be accepted (in msg50482). I found that a weak rationale for making the change, in particular because I think the rationale is incorrect. I like your rationale (save memory) much more, and was asking in the tracker for specific numbers, which weren't forthcoming.

> Though his claim that nobody else was speaking in favor of the patch no longer holds, since at least Collin Winter has expressed some interest at this point.

Again, at that point in the tracker, none of the other committers had spoken in favor of the patch. Since I wasn't convinced of its correctness, and nobody else (whom I trust) had reviewed it as correct, I rejected it.

Now that you brought up specific numbers, I tried to verify them, and found them correct (although a bit unfortunate); please see my test script below. Up to 21800 interned strings, the dict takes (only) 384KiB. It then grows, requiring 1536KiB. Whether or not having 22k interned strings is typical, I still don't know.

Wrt. your proposed change, I would be worried about maintainability, in particular if it would copy parts of the set implementation.

Regards,
Martin

import gc, sys

def find_interned_dict():
    # The interned dict is the only dict in which every key is its own value.
    cand = None
    for o in gc.get_objects():
        if not isinstance(o, dict):
            continue
        if "find_interned_dict" not in o:
            continue
        for k, v in o.iteritems():
            if k is not v:
                break
        else:
            assert not cand
            cand = o
    return cand

d = find_interned_dict()
print len(d), sys.getsizeof(d)
l = []
for i in range(2):
    if i % 100 == 0:
        print len(d), sys.getsizeof(d)
    l.append(intern(repr(i)))
Re: [Python-Dev] Rethinking intern() and its data structure
Martin v. Löwis wrote:
> I like your rationale (save memory) much more, and was asking in the tracker for specific numbers, which weren't forthcoming. [snip] Now that you brought up specific numbers, I tried to verify them, and found them correct (although a bit unfortunate); please see my test script below. Up to 21800 interned strings, the dict takes (only) 384KiB. It then grows, requiring 1536KiB. Whether or not having 22k interned strings is typical, I still don't know.

Given that every variable name in any file is interned, it can grow pretty rapidly. As an extreme case, consider the file win32/lib/winerror.py which tracks all possible win32 errors:

>>> import winerror
>>> print len(winerror.__dict__)
1872

So a single error file has 1.9k strings.

My python version (2.5.2) doesn't have 'sys.getsizeof()', but otherwise your code looks correct. If all I do is find the interned dict, I see:

>>> print len(d)
5037

So stock python, without importing much extra (just os, sys, gc, etc.), has almost 5k strings already. I don't have a great regex yet for just extracting how many unique strings there are in a given bit of source code. However, if I do:

import gc

def find_interned_dict():
    cand = None
    for o in gc.get_objects():
        if not isinstance(o, dict):
            continue
        if "find_interned_dict" not in o:
            continue
        for k, v in o.iteritems():
            if k is not v:
                break
        else:
            assert not cand
            cand = o
    return cand

d = find_interned_dict()
print len(d)

# Just import a few of the core structures
from bzrlib import branch, repository, workingtree, builtins
print len(d)

I start at 5k strings, and after just importing the important bits of bzrlib, I'm at 19,316.

Now, the bzrlib source code isn't particularly huge. It is about 3.7MB / 91k lines of .py files (that is, without the test suite). Memory consumption with just importing bzrlib shows up at 15MB, with 300kB taken up by the intern dict.

If I then import some extra bits of bzrlib, like http support, ftp support, and sftp support (which brings in python's httplib, and paramiko, an ssh/sftp implementation), I'm up to:

>>> print len(d)
25186

Memory has jumped to 23MB (interned is now 1.57MB), and I haven't actually done anything but import python code yet. If I sum the size of the PyString objects held in intern() it amounts to 940KB, though they refer to only 335KB of char data (or an average of 13 bytes per string).

> Wrt. your proposed change, I would be worried about maintainability, in particular if it would copy parts of the set implementation.

Right, so in the first part, I would just use Set(), as it could then save 1/3rd of the memory it uses today (dropping down to 1MB from 1.5MB). I don't have numbers on how much that would improve CPU times; I would imagine improving intern() would impact import times more than run times, simply because import time is interning a *lot* of strings. Though honestly, Bazaar would really like this, because startup overhead for us is almost 400ms to 'do nothing', which is a lot for a command line app.

John
=:-
Re: [Python-Dev] Rethinking intern() and its data structure
> I don't have numbers on how much that would improve CPU times; I would imagine improving intern() would impact import times more than run times, simply because import time is interning a *lot* of strings. Though honestly, Bazaar would really like this, because startup overhead for us is almost 400ms to 'do nothing', which is a lot for a command line app.

Maybe I misunderstand your proposed change: how could the representation of the interning dict possibly change the runtime of interning? (let alone significantly)

Regards,
Martin
Re: [Python-Dev] Rethinking intern() and its data structure
Martin v. Löwis wrote: I don't have numbers on how much that would improve CPU times, I would imagine improving 'intern()' would impact import times more than run times, simply because import time is interning a *lot* of strings. Though honestly, Bazaar would really like this, because startup overhead for us is almost 400ms to 'do nothing', which is a lot for a command line app. Maybe I misunderstand your proposed change: how could the representation of the interning dict possibly change the runtime of interning? (let alone significantly) Regards, Martin Decreasing memory consumption lets more things fit in cache. Once the size of 'interned' is larger than the L2 cache, you start paying the cost of a full memory fetch, which is usually measured in 100s of cpu cycles. Avoiding double lookups in the dictionary would be less overhead, though the second lookup is probably pretty fast if there are no collisions, since everything would already be in the local CPU cache. If we were dealing in objects that were KB in size, it wouldn't matter. But as the intern dict quickly gets into MB, it starts to make a bigger difference. How big a difference would be very CPU- and dataset-size-specific. But certainly caches make certain things much faster, and once you overflow a cache, performance can take a surprising turn. So my primary goal is certainly a decrease of memory consumption. I think it will have a small knock-on effect of improving performance, though I don't have concrete numbers to offer. Also, consider that resizing has to evaluate every object, thus paging in all X bytes, and assigning to another 2X bytes. Cutting X by a factor of (potentially) 3 would probably have a small but measurable effect. John =:-
Re: [Python-Dev] Rethinking intern() and its data structure
Also, consider that resizing has to evaluate every object, thus paging in all X bytes, and assigning to another 2X bytes. Cutting X by (potentially 3), would probably have a small but measurable effect. I'm *very* skeptical about claims on performance in the absence of actual measurements. Too many effects come together, so the actual performance is difficult to predict (and, for that prediction, you would need *at least* a work load that you want to measure - starting bzr would be such a workload, of course). Regards, Martin
Re: [Python-Dev] Rethinking intern() and its data structure
On Apr 9, 2009, at 12:06 PM, Martin v. Löwis wrote: Now that you brought up a specific numbers, I tried to verify them, and found them correct (although a bit unfortunate), please see my test script below. Up to 21800 interned strings, the dict takes (only) 384kiB. It then grows, requiring 1536kiB. Whether or not having 22k interned strings is typical, I still don't know. Wrt. your proposed change, I would be worried about maintainability, in particular if it would copy parts of the set implementation. I connected to a random one of our processes, which has been running for a typical amount of time and is currently at ~300MB RSS. (gdb) p *(PyDictObject*)interned $2 = {ob_refcnt = 1, ob_type = 0x8121240, ma_fill = 97239, ma_used = 95959, ma_mask = 262143, ma_table = 0xa493c008, } Going from 3MB to 2.25MB isn't much, but it's not nothing, either. I'd be skeptical of cache performance arguments given that the strings used in any particular bit of code should be spread pretty much evenly throughout the hash table, and 3MB seems solidly bigger than any L2 cache I know of. You should be able to get meaningful numbers out of a C profiler, but I'd be surprised to see the act of interning taking a noticeable amount of time. -jake
Re: [Python-Dev] Rethinking intern() and its data structure
John Arbash Meinel wrote: And when you look at the intern function, it doesn't use setdefault logic, it actually does a get() followed by a set(), which means the cost of interning is 1-2 lookups depending on likelihood, etc. Keep in mind that intern() is called fairly rarely, mostly only at module load time. It may not be worth attempting to speed it up. -- Greg
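As a point of reference, the get()-followed-by-set() pattern can be collapsed into a single lookup at the Python level with dict.setdefault. Here is a minimal sketch of an interning table along those lines; CPython's real intern() is implemented in C and also manages reference counts, which this ignores:

```python
# Minimal sketch of an interning table using a single-lookup setdefault,
# instead of the get()-then-set() double lookup discussed above.
# CPython's actual intern() is C code with extra refcount handling.
_interned = {}

def my_intern(s):
    # setdefault does one hash lookup: it returns the existing entry if
    # present, otherwise it inserts s and returns it.
    return _interned.setdefault(s, s)

a = my_intern("some_identifier")
b = my_intern("".join(["some_", "identifier"]))  # equal but distinct object
print(a is b)  # True: both calls map to the first stored string object
```

The join is there to defeat the compiler's constant folding, so the second call really does start from a distinct string object and is unified only by the table.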
Re: [Python-Dev] Rethinking intern() and its data structure
2009/4/9 Greg Ewing greg.ew...@canterbury.ac.nz: John Arbash Meinel wrote: And when you look at the intern function, it doesn't use setdefault logic, it actually does a get() followed by a set(), which means the cost of interning is 1-2 lookups depending on likelihood, etc. Keep in mind that intern() is called fairly rarely, mostly only at module load time. It may not be worth attempting to speed it up. That matters a great deal, though, for a command line tool like bazaar. Even a few fractions of a second can make a difference in user perception of speed. -- Regards, Benjamin
Re: [Python-Dev] Rethinking intern() and its data structure
Greg Ewing wrote: John Arbash Meinel wrote: And the way intern is currently written, there is a third cost when the item doesn't exist yet, which is another lookup to insert the object. That's even rarer still, since it only happens the first time you load a piece of code that uses a given variable name anywhere in any module. Somewhat true, though I know it happens 25k times during startup of bzr... And I would be a *lot* happier if startup time was 100ms instead of 400ms. John =:-
Re: [Python-Dev] Rethinking intern() and its data structure
On 9-Apr-09, at 6:24 PM, John Arbash Meinel wrote: Greg Ewing wrote: John Arbash Meinel wrote: And the way intern is currently written, there is a third cost when the item doesn't exist yet, which is another lookup to insert the object. That's even rarer still, since it only happens the first time you load a piece of code that uses a given variable name anywhere in any module. Somewhat true, though I know it happens 25k times during startup of bzr... And I would be a *lot* happier if startup time was 100ms instead of 400ms. I don't want to quash your idealism too severely, but it is extremely unlikely that you are going to get anywhere near that kind of speed up by tweaking string interning. 25k times doing anything (computation) just isn't all that much. $ python -mtimeit -s 'd=dict.fromkeys(xrange(10000000))' 'for x in xrange(25000): d.get(x)' 100 loops, best of 3: 8.28 msec per loop Perhaps this isn't representative (int hashing is ridiculously cheap, for instance), but the dict itself is far bigger than the dict you are dealing with and such would have similar cache-busting properties. And yet, 25k accesses (plus python-c dispatching costs which you are paying with interning) consume only ~10ms. You could do more good by eliminating a handful of disk seeks by reducing the number of imported modules... -Mike
Re: [Python-Dev] Rethinking intern() and its data structure
On Thu, Apr 9, 2009 at 6:24 PM, John Arbash Meinel john.arbash.mei...@gmail.com wrote: Greg Ewing wrote: John Arbash Meinel wrote: And the way intern is currently written, there is a third cost when the item doesn't exist yet, which is another lookup to insert the object. That's even rarer still, since it only happens the first time you load a piece of code that uses a given variable name anywhere in any module. Somewhat true, though I know it happens 25k times during startup of bzr... And I would be a *lot* happier if startup time was 100ms instead of 400ms. I think you have plenty of a case to try it out. If you code it up and it doesn't speed anything up, well then we've learned something, and maybe it'll be useful anyway for the memory savings. If it does speed things up, well then Python's faster. I wouldn't waste time arguing about it before you have the change written. Good luck! Jeffrey
Re: [Python-Dev] Rethinking intern() and its data structure
On Thu, Apr 9, 2009 at 6:24 PM, John Arbash Meinel john.arbash.mei...@gmail.com wrote: Greg Ewing wrote: John Arbash Meinel wrote: And the way intern is currently written, there is a third cost when the item doesn't exist yet, which is another lookup to insert the object. That's even rarer still, since it only happens the first time you load a piece of code that uses a given variable name anywhere in any module. Somewhat true, though I know it happens 25k times during startup of bzr... And I would be a *lot* happier if startup time was 100ms instead of 400ms. Quite so. We have a number of internal tools, and they find that frequently just starting up Python takes several times the duration of the actual work unit itself. I'd be very interested to review any patches you come up with to improve start-up time; so far on this thread, there's been a lot of theory and not much practice. I'd approach this iteratively: first replace the dict with a set, then if that bears fruit, consider a customized data structure; if that bears fruit, etc. Good luck, and be sure to let us know what you find, Collin Winter
Re: [Python-Dev] Rethinking intern() and its data structure
On Thu, Apr 9, 2009 at 9:07 PM, Collin Winter coll...@gmail.com wrote: On Thu, Apr 9, 2009 at 6:24 PM, John Arbash Meinel john.arbash.mei...@gmail.com wrote: And I would be a *lot* happier if startup time was 100ms instead of 400ms. Quite so. We have a number of internal tools, and they find that frequently just starting up Python takes several times the duration of the actual work unit itself. I'd be very interested to review any patches you come up with to improve start-up time; so far on this thread, there's been a lot of theory and not much practice. I'd approach this iteratively: first replace the dict with a set, then if that bears fruit, consider a customized data structure; if that bears fruit, etc. Good luck, and be sure to let us know what you find, Just to add some skepticism, has anyone done any kind of instrumentation of bzr start-up behavior? IIRC every time I was asked to reduce the start-up cost of some Python app, the cause was too many imports, and the solution was either to speed up import itself (.pyc files were the first thing ever that came out of that -- importing from a single .zip file is one of the more recent tricks) or to reduce the number of modules imported at start-up (or both :-). Heavy-weight frameworks are usually the root cause, but usually there's nothing that can be done about that by the time you've reached this point. So, amen on the good luck, but please start with a bit of analysis. -- --Guido van Rossum (home page: http://www.python.org/~guido/)
Re: [Python-Dev] Rethinking intern() and its data structure
... Somewhat true, though I know it happens 25k times during startup of bzr... And I would be a *lot* happier if startup time was 100ms instead of 400ms. I don't want to quash your idealism too severely, but it is extremely unlikely that you are going to get anywhere near that kind of speed up by tweaking string interning. 25k times doing anything (computation) just isn't all that much. $ python -mtimeit -s 'd=dict.fromkeys(xrange(10000000))' 'for x in xrange(25000): d.get(x)' 100 loops, best of 3: 8.28 msec per loop Perhaps this isn't representative (int hashing is ridiculously cheap, for instance), but the dict itself is far bigger than the dict you are dealing with and such would have similar cache-busting properties. And yet, 25k accesses (plus python-c dispatching costs which you are paying with interning) consume only ~10ms. You could do more good by eliminating a handful of disk seeks by reducing the number of imported modules... -Mike You're also using timeit over the same set of 25k keys, which means it only has to load that subset. And as you are using identical runs each time, those keys are already loaded into your cache lines... And given how hash(int) works, they are all sequential in memory, and all 10M in your original set have 0 collisions. Actually, at 10M, you'll have a dict of size 20M entries, and the first 10M entries will be full, and the trailing 10M entries will all be empty. That said, you're right, the benefits of a smaller structure are going to be small. I'll just point out that if I do a small tweak to your timing: $ python -mtimeit -s 'd=dict.fromkeys(xrange(10000000))' 'for x in xrange(25000): d.get(x)' 100 loops, best of 3: 6.27 msec per loop So slightly faster than yours, *but*, let's try a much smaller dict: $ python -mtimeit -s 'd=dict.fromkeys(xrange(25000))' 'for x in xrange(25000): d.get(x)' 100 loops, best of 3: 6.35 msec per loop Pretty much the same time. Well within the noise margin.
But if I go back to the big dict and actually select 25k keys across the whole set:

$ python -mtimeit -s 'd=dict.fromkeys(xrange(10000000))' \
  -s 'keys=range(0, 10000000, 10000000/25000)' \
  'for x in keys: d.get(x)'
100 loops, best of 3: 13.1 msec per loop

Now I'm still accessing 25k keys, but I'm doing it across the whole range, and suddenly the time *doubled*. What about slightly more random access:

$ python -mtimeit -s 'import random; d=dict.fromkeys(xrange(10000000))' \
  -s 'bits = range(0, 10000000, 400); random.shuffle(bits)' \
  'for x in bits: d.get(x)'
100 loops, best of 3: 15.5 msec per loop

Not as big of a difference as I thought it would be... But I bet if there was a way to put the random shuffle in the inner loop, so you weren't accessing the same identical 25k keys internally, you might get more interesting results. As for other bits about exercising caches:

$ shuffle(range(0, 10000000, 400)) 100 loops, best of 3: 15.5 msec per loop
$ shuffle(range(0, 10000000, 40)) 10 loops, best of 3: 175 msec per loop

10x more keys costs 11.3x, pretty close to linear.

$ shuffle(range(0, 10000000, 10)) 10 loops, best of 3: 739 msec per loop

4x the keys, 4.5x the time, starting to get more into nonlinear effects. Anyway, you're absolutely right. intern() overhead is a tiny fraction of 'import bzrlib.*' time, so I don't expect to see amazing results. That said, accessing 25k keys in a smaller structure is 2x faster than accessing 25k keys spread across a larger structure. John =:-
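For anyone who wants to replay the experiment, here is a self-contained version, scaled down to a 1M-key dict to keep memory modest and written in Python 3 spelling (range instead of xrange). Absolute times are machine-dependent, but the dense-vs-spread gap John describes should still be visible:

```python
# Scaled-down, self-contained version of the dense-vs-spread experiment
# above (1M keys instead of 10M, Python 3's range instead of xrange).
# Absolute numbers are machine-dependent; the point is the relative gap.
import timeit

big = dict.fromkeys(range(1_000_000))
small = dict.fromkeys(range(25_000))

dense = list(range(25_000))             # 25k keys clustered at the start
spread = list(range(0, 1_000_000, 40))  # 25k keys across the whole table

def probe(d, keys):
    for k in keys:
        d.get(k)

for name, d, keys in [("small dict, dense keys", small, dense),
                      ("big dict, dense keys", big, dense),
                      ("big dict, spread keys", big, spread)]:
    t = min(timeit.repeat(lambda: probe(d, keys), number=10, repeat=3))
    print("%-24s %.2f ms per pass" % (name, t / 10 * 1e3))
```

Taking the minimum over several repeats is timeit's own recommendation for noisy measurements; the spread-key pass over the big dict is the one that forces cache misses across the whole table.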