[Numpy-discussion] A change with minor compatibility questions

2012-10-17 Thread Travis Oliphant
Hey all, 

https://github.com/numpy/numpy/pull/482

is  a pull request that changes the hash function for numpy void scalars.   
These are the objects returned from fully indexing a structured array:  
array[i] if array is a 1-d structured array. 

Currently their hash function just hashes the pointer to the underlying data.   
 This means that void scalars can be used as keys in a dictionary but the 
behavior is non-intuitive because another void scalar with the same data but 
pointing to a different region of memory will hash differently.  

The pull request makes it so that two void scalars with the same data will hash 
to the same value (using the same algorithm as a tuple hash).This pull 
request also only allows read-only scalars to be hashed. 

There is a small chance this will break someone's code if they relied on this 
behavior.  I don't believe anyone is currently relying on this behavior -- but 
I've been proven wrong before.   What do people on this list think?   

Should we raise a warning in the next release when a hash function on a void 
scalar is called or just make the change, put it in the release notes and make 
a few people change their code if needed.  The problem was identified by a 
couple of users of NumPy currently which is why I think that people who have 
tried using numpy void scalars as keys aren't doing it right now but are 
instead converting them to tuples first. 

-Travis

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] A change with minor compatibility questions

2012-10-17 Thread Gael Varoquaux
On Wed, Oct 17, 2012 at 10:22:28AM -0500, Travis Oliphant wrote:
 There is a small chance this will break someone's code if they relied on this
 behavior.  I don't believe anyone is currently relying on this behavior -- but
 I've been proven wrong before.   What do people on this list think?   

I think that the change is acceptable. Thanks for asking!

 Should we raise a warning in the next release when a hash function on a void
 scalar is called or just make the change, put it in the release notes and make
 a few people change their code if needed.

I think that there should be a section in the release notes that list
such changes in a very visible way. I don't think that a warning is
necessary here, but I guess that others might have a different point of
view.

G
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] A change with minor compatibility questions

2012-10-17 Thread Dag Sverre Seljebotn
On 10/17/2012 05:22 PM, Travis Oliphant wrote:
 Hey all,

 https://github.com/numpy/numpy/pull/482

 is  a pull request that changes the hash function for numpy void
 scalars.   These are the objects returned from fully indexing a
 structured array:  array[i] if array is a 1-d structured array.

 Currently their hash function just hashes the pointer to the underlying
 data.This means that void scalars can be used as keys in a
 dictionary but the behavior is non-intuitive because another void scalar
 with the same data but pointing to a different region of memory will
 hash differently.

 The pull request makes it so that two void scalars with the same data
 will hash to the same value (using the same algorithm as a tuple hash).
 This pull request also only allows read-only scalars to be hashed.

 There is a small chance this will break someone's code if they relied on
 this behavior.  I don't believe anyone is currently relying on this
 behavior -- but I've been proven wrong before.   What do people on this
 list think?

I support working on fixing this, but if I understand your fix correctly 
this change just breaks things in a different way.

Specifically, in this example:

arr = np.ones(4, dtype=[('a', np.int64)])
x = arr[0]
d = { x : 'value' }
arr[0]['a'] = 4
print d[x]

Does the last line raise a KeyError? If I understand correctly it does.

(Of course, the current situation just breaks lookups in another situation.)

I propose to BOTH make x unhashable (thus being a good Python citizen 
and following Python rules) AND provide x.askey() or x.immutable() 
which returns something immutable you can use as a key.

The places where that breaks things is probably buggy code that must be 
fixed (either one way or the other) anyway. Perhaps a warning period is 
in order then (one would raise a warning in __hash__, telling people to 
use the askey() method).

(I would really prefer to always have x be immutable, but that 
probably breaks working code.)

Dag Sverre
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] A change with minor compatibility questions

2012-10-17 Thread Dag Sverre Seljebotn
On 10/17/2012 06:56 PM, Dag Sverre Seljebotn wrote:
 On 10/17/2012 05:22 PM, Travis Oliphant wrote:
 Hey all,

 https://github.com/numpy/numpy/pull/482

 is  a pull request that changes the hash function for numpy void
 scalars.   These are the objects returned from fully indexing a
 structured array:  array[i] if array is a 1-d structured array.

 Currently their hash function just hashes the pointer to the underlying
 data.This means that void scalars can be used as keys in a
 dictionary but the behavior is non-intuitive because another void scalar
 with the same data but pointing to a different region of memory will
 hash differently.

 The pull request makes it so that two void scalars with the same data
 will hash to the same value (using the same algorithm as a tuple hash).
  This pull request also only allows read-only scalars to be hashed.

 There is a small chance this will break someone's code if they relied on
 this behavior.  I don't believe anyone is currently relying on this
 behavior -- but I've been proven wrong before.   What do people on this
 list think?

 I support working on fixing this, but if I understand your fix correctly
 this change just breaks things in a different way.

 Specifically, in this example:

 arr = np.ones(4, dtype=[('a', np.int64)])
 x = arr[0]
 d = { x : 'value' }
 arr[0]['a'] = 4
 print d[x]

 Does the last line raise a KeyError? If I understand correctly it does.

Argh. I overlooked both Travis' second commit, and the explicit mention 
of read-only above.

Isn't it possible to produce a read-only array from a writeable one 
though, and so get a read-only scalar whose underlying value can still 
change?

Anyway, sorry about being so quick to post.

Dag Sverre
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] A change with minor compatibility questions

2012-10-17 Thread Travis Oliphant

On Oct 17, 2012, at 12:48 PM, Dag Sverre Seljebotn wrote:

 On 10/17/2012 06:56 PM, Dag Sverre Seljebotn wrote:
 On 10/17/2012 05:22 PM, Travis Oliphant wrote:
 Hey all,
 
 https://github.com/numpy/numpy/pull/482
 
 is  a pull request that changes the hash function for numpy void
 scalars.   These are the objects returned from fully indexing a
 structured array:  array[i] if array is a 1-d structured array.
 
 Currently their hash function just hashes the pointer to the underlying
 data.This means that void scalars can be used as keys in a
 dictionary but the behavior is non-intuitive because another void scalar
 with the same data but pointing to a different region of memory will
 hash differently.
 
 The pull request makes it so that two void scalars with the same data
 will hash to the same value (using the same algorithm as a tuple hash).
 This pull request also only allows read-only scalars to be hashed.
 
 There is a small chance this will break someone's code if they relied on
 this behavior.  I don't believe anyone is currently relying on this
 behavior -- but I've been proven wrong before.   What do people on this
 list think?
 
 I support working on fixing this, but if I understand your fix correctly
 this change just breaks things in a different way.
 
 Specifically, in this example:
 
 arr = np.ones(4, dtype=[('a', np.int64)])
 x = arr[0]
 d = { x : 'value' }
 arr[0]['a'] = 4
 print d[x]
 
 Does the last line raise a KeyError? If I understand correctly it does.
 
 Argh. I overlooked both Travis' second commit, and the explicit mention 
 of read-only above.
 
 Isn't it possible to produce a read-only array from a writeable one 
 though, and so get a read-only scalar whose underlying value can still 
 change?

Yes, it is possible to do that (just like it is currently possible to change a 
tuple with a C-extension or even Cython or a string with NumPy). 

We won't be able to prevent people from writing code that will have odd 
behavior, but we can communicate correctly about what one should do. 

-Travis



 
 Anyway, sorry about being so quick to post.
 
 Dag Sverre
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion