Re: [Python-Dev] Repeatability of looping over dicts

2008-01-05 Thread skip

Guido What code would break if we loosened this restriction? 

I don't know how much, but I do know I've relied on this behavior before.
(In fact, I've asked about it before.)  I guess the counter question to
yours would be, What would be gained by loosening this restriction?  If
the answer is, not much, then I don't see why this is even an idle
thought.

Skip
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Repeatability of looping over dicts

2008-01-05 Thread Guido van Rossum
On Jan 5, 2008 6:58 AM,  [EMAIL PROTECTED] wrote:

 Guido What code would break if we loosened this restriction?

 I don't know how much, but I do know I've relied on this behavior before.
 (In fact, I've asked about it before.)  I guess the counter question to
 yours would be, What would be gained by loosening this restriction?  If
 the answer is, not much, then I don't see why this is even an idle
 thought.

It sounds like loosening the restriction would allow Jython to use an
implementation that is more efficient when used concurrently.

That may not be sufficient reason; Jython apps that need a more
efficient concurrent dict could import the ConcurrentHashMap class
directly, and CPython apps are out of luck anyway.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Repeatability of looping over dicts

2008-01-05 Thread Jeffrey Yasskin
On Jan 4, 2008 12:05 PM, Tim Delaney [EMAIL PROTECTED] wrote:
 history of insertions and deletions. If items(), keys(), values(),
 iteritems(), iterkeys(), and itervalues() are called with no intervening
 modifications to the dictionary, the lists will directly correspond.

I looked over the Java code briefly
(http://gee.cs.oswego.edu/dl/classes/EDU/oswego/cs/dl/util/concurrent/ConcurrentHashMap.java:HashIterator)
and I don't see anything that would cause it to violate this
condition. In particular, the locks don't affect the order.  It's a
complicated class though, so I could have missed it. Do you see a
reason for it to change its iteration order spontaneously? If another
thread is concurrently modifying the dict, that's an intervening
modification, and changing the iteration order is fine.

There's the second question of whether using ConcurrentHashMap for
python dicts is a good idea. I can't find any mention of thread-safety
in the Jython docs, so I assume it follows Java's rules, which are
that most variables must be locked in order to be accessed and
modified concurrently. Making dict a ConcurrentHashMap would provide a
stronger guarantee for some built-in types but not others, which is
likely to confuse people. Further, not all synchronized maps can be
replaced by ConcurrentHashMap because you may need groups of
operations to be atomic, while CHM only provides atomicity across
single method calls. I think a new ConcurrentDict class would probably
be a better way to integrate ConcurrentHashMap.

-- 
Namasté,
Jeffrey Yasskin
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Repeatability of looping over dicts

2008-01-04 Thread Tim Delaney
A.M. Kuchling wrote:

 So, do Python implementations need to guarantee that list(dict_var) ==
 a later result from list(dict_var)?

As I just posted to the blog, yes. Look at section 3.8 of the reference 
manual
(Mapping Types), specifically footnote 3:

http://docs.python.org/lib/typesmapping.html

(3) Keys and values are listed in an arbitrary order which is non-random,
varies across Python implementations, and depends on the dictionary's
history of insertions and deletions. If items(), keys(), values(),
iteritems(), iterkeys(), and itervalues() are called with no intervening
modifications to the dictionary, the lists will directly correspond.
This allows the creation of (value, key) pairs using zip(): pairs =
zip(a.values(), a.keys()). The same relationship holds for the
iterkeys() and itervalues() methods: pairs = zip(a.itervalues(),
a.iterkeys()) provides the same value for pairs. Another way to create
the same list is pairs = [(v, k) for (k, v) in a.iteritems()].

Tim Delaney 

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Repeatability of looping over dicts

2008-01-04 Thread A.M. Kuchling
On Sat, Jan 05, 2008 at 07:05:55AM +1100, Tim Delaney wrote:
 So, do Python implementations need to guarantee that list(dict_var) ==
 a later result from list(dict_var)?
 
 As I just posted to the blog, yes. Look at section 3.8 of the reference 
 manual
 (Mapping Types), specifically footnote 3:

Ah, thanks!  I guess that rules out using ConcurrentHashMap, short of
materializing the entire list and sorting it in some way.  Oh well.

--amk


___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Repeatability of looping over dicts

2008-01-04 Thread Tim Delaney
A.M. Kuchling wrote:

 So, do Python implementations need to guarantee that list(dict_var) ==
 a later result from list(dict_var)?

As I just posted to the blog, yes. Look at section 3.8 of the reference 
manual
(Mapping Types), specifically footnote 3:

http://docs.python.org/lib/typesmapping.html

(3) Keys and values are listed in an arbitrary order which is non-random,
varies across Python implementations, and depends on the dictionary's
history of insertions and deletions. If items(), keys(), values(),
iteritems(), iterkeys(), and itervalues() are called with no intervening
modifications to the dictionary, the lists will directly correspond.
This allows the creation of (value, key) pairs using zip(): pairs =
zip(a.values(), a.keys()). The same relationship holds for the
iterkeys() and itervalues() methods: pairs = zip(a.itervalues(),
a.iterkeys()) provides the same value for pairs. Another way to create
the same list is pairs = [(v, k) for (k, v) in a.iteritems()].

Tim Delaney 

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Repeatability of looping over dicts

2008-01-04 Thread Fred Drake
On Jan 4, 2008, at 5:54 PM, Guido van Rossum wrote:
 What code would break if we loosened this restriction? I guess
 defining d.items() as zip(d.keys(), d.values()) would no longer fly,
 but does anyone actually depend on this?

I don't know what code would break today; this was initially added to  
the set of promises made by the dict methods a decode ago, but it was  
in response to a need in the software I was working on at the time  
(possibly Grail, for which 2.6/3.0 isn't an issue).

That question should probably be addressed to a fairly wide audience  
(comp.lang.python) since the promise has been there for so long.


   -Fred

-- 
Fred Drake   fdrake at acm.org




___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Repeatability of looping over dicts

2008-01-04 Thread Guido van Rossum
On Jan 4, 2008 11:50 AM, A.M. Kuchling [EMAIL PROTECTED] wrote:
 This post describes work aimed at getting Django to run on Jython:
 http://zyasoft.com/pythoneering/2008/01/django-on-jython-minding-gap.html

 One outstanding issue is whether to use Java's ConcurrentHashMap type
 to underly Jython's dict type.  See
 http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/ConcurrentHashMap.html.

 ConcurrentHashMap scales better in the face of threading because it
 doesn't lock the whole table when updating it, but iterating over the
 map can return elements in a different order each time.  This would
 mean that list(dict_var) doesn't return values in the same order as a
 later call to list(dict_var), even if dict_var hasn't been modified.

 Why?  Under the hood, there are 32 different locks, each guarding a
 subset of the hash buckets, so if there are multiple threads iterating
 over the dictionary, they may not go through the buckets in order.
 http://www.ibm.com/developerworks/java/library/j-jtp08223/ discusses
 the implementation, at least in 2003.

 So, do Python implementations need to guarantee that list(dict_var) ==
 a later result from list(dict_var)?

What code would break if we loosened this restriction? I guess
defining d.items() as zip(d.keys(), d.values()) would no longer fly,
but does anyone actually depend on this? Just like we changed how we
think about auto-closing files once Jython came along, I think this is
at least worth considering.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Repeatability of looping over dicts

2008-01-04 Thread Raymond Hettinger
 ConcurrentHashMap scales better in the face of threading 
 .. .
 So, do Python implementations need to guarantee that list(dict_var) ==
 a later result from list(dict_var)?

 What code would break if we loosened this restriction? 

I can imagine someone has code like this:
  
   for k in d:
   print '%5s' % k
   for k in d:
   print '-'
   for v in d.values():
   print '%5s' % v

It seems likely to me that a lot of code using d.values() would care about the 
order of the values and the only order that matters is corresponding to some 
set of keys.  There are probably a lot of functions that take keys and values 
separately, so it would not be uncommon to call those with a pattern like: 
save_config(configfile, d.keys(), d.values()).

In the OP's context where multiple threads are running, it may be fair to say 
that all bets are off.


Raymond

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Repeatability of looping over dicts

2008-01-04 Thread A.M. Kuchling
On Fri, Jan 04, 2008 at 02:54:49PM -0800, Guido van Rossum wrote:
 What code would break if we loosened this restriction? I guess
 defining d.items() as zip(d.keys(), d.values()) would no longer fly,
 but does anyone actually depend on this? Just like we changed how we

http://www.google.com/codesearch?hl=enq=+lang:python+zip+keysstart=10sa=N
turns up a few pieces of code that would break:

trac-0.10.3/trac/versioncontrol/cache.py 
Twisted-2.2.0/twisted/pb/test/test_banana.py 
Mattricks-0.7/Mattricks/MatchPrediction.py 
FlickrUploadr-1.0.0/src/xmltramp.py 

Projects trying to stay compatible with Python versions that don't
have .items() may be more likely to use this idiom.  Some classes may
do this to implement .items(); openbabel-2.1.1/scripts/python/pybel.py
does this.  So some code will break, and I don't see a way to warn
about this problem before breaking compatibility.

--amk

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Repeatability of looping over dicts

2008-01-04 Thread Gregory P. Smith
On 1/4/08, A.M. Kuchling [EMAIL PROTECTED] wrote:
 This post describes work aimed at getting Django to run on Jython:
 http://zyasoft.com/pythoneering/2008/01/django-on-jython-minding-gap.html

 One outstanding issue is whether to use Java's ConcurrentHashMap type
 to underly Jython's dict type.  See
 http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/ConcurrentHashMap.html.


As a side note, you may also want to consider using a
NonBlockingHashMap as implemented by Cliff Click instead of
ConcurrentHashMap:

 http://blogs.azulsystems.com/cliff/2007/03/a_nonblocking_h.html
 http://blogs.azulsystems.com/cliff/2007/03/talking_with_go.html
 http://video.google.com/videoplay?docid=2139967204534450862
 http://sourceforge.net/projects/high-scale-lib

I haven't looked to see if it has the same issues repeatability issues
but it does scale even better.

Anyways, could you not use something with repeatability issues with a
wrapper that caches lists of keys to satisfy the repeatability?
(don't lock and maintain the list, just regenerate the list on demand
when a caller needs it and discard the cached list anytime a key is
added or deleted)

Also, should we drop this guarantee in py3k to give future
implementations more flexibility?

-gps

 ConcurrentHashMap scales better in the face of threading because it
 doesn't lock the whole table when updating it, but iterating over the
 map can return elements in a different order each time.  This would
 mean that list(dict_var) doesn't return values in the same order as a
 later call to list(dict_var), even if dict_var hasn't been modified.

 Why?  Under the hood, there are 32 different locks, each guarding a
 subset of the hash buckets, so if there are multiple threads iterating
 over the dictionary, they may not go through the buckets in order.
 http://www.ibm.com/developerworks/java/library/j-jtp08223/ discusses
 the implementation, at least in 2003.

 So, do Python implementations need to guarantee that list(dict_var) ==
 a later result from list(dict_var)?

 --amk
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com