Re: [Python-Dev] Repeatability of looping over dicts
Guido What code would break if we loosened this restriction? I don't know how much, but I do know I've relied on this behavior before. (In fact, I've asked about it before.) I guess the counter question to yours would be, What would be gained by loosening this restriction? If the answer is, not much, then I don't see why this is even an idle thought. Skip ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Repeatability of looping over dicts
On Jan 5, 2008 6:58 AM, [EMAIL PROTECTED] wrote: Guido What code would break if we loosened this restriction? I don't know how much, but I do know I've relied on this behavior before. (In fact, I've asked about it before.) I guess the counter question to yours would be, What would be gained by loosening this restriction? If the answer is, not much, then I don't see why this is even an idle thought. It sounds like loosening the restriction would allow Jython to use an implementation that is more efficient when used concurrently. That may not be sufficient reason; Jython apps that need a more efficient concurrent dict could import the ConcurrentHashMap class directly, and CPython apps are out of luck anyway. -- --Guido van Rossum (home page: http://www.python.org/~guido/) ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Repeatability of looping over dicts
On Jan 4, 2008 12:05 PM, Tim Delaney [EMAIL PROTECTED] wrote: history of insertions and deletions. If items(), keys(), values(), iteritems(), iterkeys(), and itervalues() are called with no intervening modifications to the dictionary, the lists will directly correspond. I looked over the Java code briefly (http://gee.cs.oswego.edu/dl/classes/EDU/oswego/cs/dl/util/concurrent/ConcurrentHashMap.java:HashIterator) and I don't see anything that would cause it to violate this condition. In particular, the locks don't affect the order. It's a complicated class though, so I could have missed it. Do you see a reason for it to change its iteration order spontaneously? If another thread is concurrently modifying the dict, that's an intervening modification, and changing the iteration order is fine. There's the second question of whether using ConcurrentHashMap for python dicts is a good idea. I can't find any mention of thread-safety in the Jython docs, so I assume it follows Java's rules, which are that most variables must be locked in order to be accessed and modified concurrently. Making dict a ConcurrentHashMap would provide a stronger guarantee for some built-in types but not others, which is likely to confuse people. Further, not all synchronized maps can be replaced by ConcurrentHashMap because you may need groups of operations to be atomic, while CHM only provides atomicity across single method calls. I think a new ConcurrentDict class would probably be a better way to integrate ConcurrentHashMap. -- Namasté, Jeffrey Yasskin ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Repeatability of looping over dicts
A.M. Kuchling wrote: So, do Python implementations need to guarantee that list(dict_var) == a later result from list(dict_var)? As I just posted to the blog, yes. Look at section 3.8 of the reference manual (Mapping Types), specifically footnote 3: http://docs.python.org/lib/typesmapping.html (3) Keys and values are listed in an arbitrary order which is non-random, varies across Python implementations, and depends on the dictionary's history of insertions and deletions. If items(), keys(), values(), iteritems(), iterkeys(), and itervalues() are called with no intervening modifications to the dictionary, the lists will directly correspond. This allows the creation of (value, key) pairs using zip(): pairs = zip(a.values(), a.keys()). The same relationship holds for the iterkeys() and itervalues() methods: pairs = zip(a.itervalues(), a.iterkeys()) provides the same value for pairs. Another way to create the same list is pairs = [(v, k) for (k, v) in a.iteritems()]. Tim Delaney ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Repeatability of looping over dicts
On Sat, Jan 05, 2008 at 07:05:55AM +1100, Tim Delaney wrote: So, do Python implementations need to guarantee that list(dict_var) == a later result from list(dict_var)? As I just posted to the blog, yes. Look at section 3.8 of the reference manual (Mapping Types), specifically footnote 3: Ah, thanks! I guess that rules out using ConcurrentHashMap, short of materializing the entire list and sorting it in some way. Oh well. --amk ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Repeatability of looping over dicts
A.M. Kuchling wrote: So, do Python implementations need to guarantee that list(dict_var) == a later result from list(dict_var)? As I just posted to the blog, yes. Look at section 3.8 of the reference manual (Mapping Types), specifically footnote 3: http://docs.python.org/lib/typesmapping.html (3) Keys and values are listed in an arbitrary order which is non-random, varies across Python implementations, and depends on the dictionary's history of insertions and deletions. If items(), keys(), values(), iteritems(), iterkeys(), and itervalues() are called with no intervening modifications to the dictionary, the lists will directly correspond. This allows the creation of (value, key) pairs using zip(): pairs = zip(a.values(), a.keys()). The same relationship holds for the iterkeys() and itervalues() methods: pairs = zip(a.itervalues(), a.iterkeys()) provides the same value for pairs. Another way to create the same list is pairs = [(v, k) for (k, v) in a.iteritems()]. Tim Delaney ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Repeatability of looping over dicts
On Jan 4, 2008, at 5:54 PM, Guido van Rossum wrote: What code would break if we loosened this restriction? I guess defining d.items() as zip(d.keys(), d.values()) would no longer fly, but does anyone actually depend on this? I don't know what code would break today; this was initially added to the set of promises made by the dict methods a decode ago, but it was in response to a need in the software I was working on at the time (possibly Grail, for which 2.6/3.0 isn't an issue). That question should probably be addressed to a fairly wide audience (comp.lang.python) since the promise has been there for so long. -Fred -- Fred Drake fdrake at acm.org ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Repeatability of looping over dicts
On Jan 4, 2008 11:50 AM, A.M. Kuchling [EMAIL PROTECTED] wrote: This post describes work aimed at getting Django to run on Jython: http://zyasoft.com/pythoneering/2008/01/django-on-jython-minding-gap.html One outstanding issue is whether to use Java's ConcurrentHashMap type to underly Jython's dict type. See http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/ConcurrentHashMap.html. ConcurrentHashMap scales better in the face of threading because it doesn't lock the whole table when updating it, but iterating over the map can return elements in a different order each time. This would mean that list(dict_var) doesn't return values in the same order as a later call to list(dict_var), even if dict_var hasn't been modified. Why? Under the hood, there are 32 different locks, each guarding a subset of the hash buckets, so if there are multiple threads iterating over the dictionary, they may not go through the buckets in order. http://www.ibm.com/developerworks/java/library/j-jtp08223/ discusses the implementation, at least in 2003. So, do Python implementations need to guarantee that list(dict_var) == a later result from list(dict_var)? What code would break if we loosened this restriction? I guess defining d.items() as zip(d.keys(), d.values()) would no longer fly, but does anyone actually depend on this? Just like we changed how we think about auto-closing files once Jython came along, I think this is at least worth considering. -- --Guido van Rossum (home page: http://www.python.org/~guido/) ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Repeatability of looping over dicts
ConcurrentHashMap scales better in the face of threading .. . So, do Python implementations need to guarantee that list(dict_var) == a later result from list(dict_var)? What code would break if we loosened this restriction? I can imagine someone has code like this: for k in d: print '%5s' % k for k in d: print '-' for v in d.values(): print '%5s' % v It seems likely to me that a lot of code using d.values() would care about the order of the values and the only order that matters is corresponding to some set of keys. There are probably a lot of functions that take keys and values separately, so it would not be uncommon to call those with a pattern like: save_config(configfile, d.keys(), d.values()). In the OP's context where multiple threads are running, it may be fair to say that all bets are off. Raymond ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Repeatability of looping over dicts
On Fri, Jan 04, 2008 at 02:54:49PM -0800, Guido van Rossum wrote: What code would break if we loosened this restriction? I guess defining d.items() as zip(d.keys(), d.values()) would no longer fly, but does anyone actually depend on this? Just like we changed how we http://www.google.com/codesearch?hl=enq=+lang:python+zip+keysstart=10sa=N turns up a few pieces of code that would break: trac-0.10.3/trac/versioncontrol/cache.py Twisted-2.2.0/twisted/pb/test/test_banana.py Mattricks-0.7/Mattricks/MatchPrediction.py FlickrUploadr-1.0.0/src/xmltramp.py Projects trying to stay compatible with Python versions that don't have .items() may be more likely to use this idiom. Some classes may do this to implement .items(); openbabel-2.1.1/scripts/python/pybel.py does this. So some code will break, and I don't see a way to warn about this problem before breaking compatibility. --amk ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Repeatability of looping over dicts
On 1/4/08, A.M. Kuchling [EMAIL PROTECTED] wrote: This post describes work aimed at getting Django to run on Jython: http://zyasoft.com/pythoneering/2008/01/django-on-jython-minding-gap.html One outstanding issue is whether to use Java's ConcurrentHashMap type to underly Jython's dict type. See http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/ConcurrentHashMap.html. As a side note, you may also want to consider using a NonBlockingHashMap as implemented by Cliff Click instead of ConcurrentHashMap: http://blogs.azulsystems.com/cliff/2007/03/a_nonblocking_h.html http://blogs.azulsystems.com/cliff/2007/03/talking_with_go.html http://video.google.com/videoplay?docid=2139967204534450862 http://sourceforge.net/projects/high-scale-lib I haven't looked to see if it has the same issues repeatability issues but it does scale even better. Anyways, could you not use something with repeatability issues with a wrapper that caches lists of keys to satisfy the repeatability? (don't lock and maintain the list, just regenerate the list on demand when a caller needs it and discard the cached list anytime a key is added or deleted) Also, should we drop this guarantee in py3k to give future implementations more flexibility? -gps ConcurrentHashMap scales better in the face of threading because it doesn't lock the whole table when updating it, but iterating over the map can return elements in a different order each time. This would mean that list(dict_var) doesn't return values in the same order as a later call to list(dict_var), even if dict_var hasn't been modified. Why? Under the hood, there are 32 different locks, each guarding a subset of the hash buckets, so if there are multiple threads iterating over the dictionary, they may not go through the buckets in order. http://www.ibm.com/developerworks/java/library/j-jtp08223/ discusses the implementation, at least in 2003. So, do Python implementations need to guarantee that list(dict_var) == a later result from list(dict_var)? --amk ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com