[issue15814] memoryview: equality-hash invariant

2012-11-02 Thread Roundup Robot

Roundup Robot added the comment:

New changeset 969069d464bc by Stefan Krah in branch '3.3':
Issue #15814: Use hash function that is compatible with the equality
http://hg.python.org/cpython/rev/969069d464bc

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15814
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15814] memoryview: equality-hash invariant

2012-11-02 Thread Stefan Krah

Changes by Stefan Krah stefan-use...@bytereef.org:


--
resolution:  - fixed
stage: needs patch - committed/rejected
status: open - closed

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15814
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15814] memoryview: equality-hash invariant

2012-10-17 Thread Mark Lawrence

Mark Lawrence added the comment:

The 3.3.0 docs now state Note Hashing of memoryviews with formats other than 
‘B’, ‘b’ or ‘c’ as well as hashing of multi-dimensional memoryviews is possible 
in version 3.3.0, but will raise an error in 3.3.1 in order to be compatible 
with the new memoryview equality definition.  I assume 3.3.1 should read 3.4, 
or have I missed something?

--
nosy: +BreamoreBoy

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15814
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15814] memoryview: equality-hash invariant

2012-10-17 Thread Martin v . Löwis

Martin v. Löwis added the comment:

3.3.1 is correct, you apparently missed msg169425.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15814
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15814] memoryview: equality-hash invariant

2012-09-09 Thread Roundup Robot

Roundup Robot added the comment:

New changeset 7734eb2707a1 by Stefan Krah in branch 'default':
Issue #15814: Document planned restrictions for memoryview hashes in 3.3.1.
http://hg.python.org/cpython/rev/7734eb2707a1

New changeset 71f4d80400f2 by Stefan Krah in branch 'default':
Issue #15814: Documentation: disallow hashing of multi-dimensional memoryviews.
http://hg.python.org/cpython/rev/71f4d80400f2

New changeset 3ffd6ad93fe4 by Nick Coghlan in branch 'default':
Issue #15814: Add NEWS entry regarding intended memoryview hashing restrictions
http://hg.python.org/cpython/rev/3ffd6ad93fe4

New changeset 88a0792e8ba3 by Stefan Krah in branch 'default':
Issue #15814: Update whatsnew to the current state of hashing memoryviews.
http://hg.python.org/cpython/rev/88a0792e8ba3

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15814
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15814] memoryview: equality-hash invariant

2012-09-08 Thread Roundup Robot

Roundup Robot added the comment:

New changeset ca81b9a3a015 by Stefan Krah in branch 'default':
Issue #15814: Update whatsnew to the current state of hashing memoryviews.
http://hg.python.org/cpython/rev/ca81b9a3a015

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15814
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15814] memoryview: equality-hash invariant

2012-09-08 Thread Stefan Krah

Stefan Krah added the comment:

Georg, thanks for including all changes that I've asked for!

If you still have the patience for the constant stream of memoryview
doc changes, there are three new ones that might be worth including:

3b2597c1fe35
c9c9d890400c
ca81b9a3a015

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15814
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15814] memoryview: equality-hash invariant

2012-09-03 Thread Nick Coghlan

Nick Coghlan added the comment:

The main issue is that it's not quite clear how to deal with problems like 
C-style vs FORTRAN-style memory layouts and strides vs suboffsets in defining 
multidimensional hash equality. Without a use case, it's easier to just punt on 
the question and declare it illegal.

The 1D hashing case really just comes from wanting to have 1D bytes views 
behave as much like a bytes object as is practical.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15814
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15814] memoryview: equality-hash invariant

2012-09-03 Thread Roundup Robot

Roundup Robot added the comment:

New changeset c9c9d890400c by Nick Coghlan in branch 'default':
Issue #15814: Add NEWS entry regarding intended memoryview hashing restrictions
http://hg.python.org/cpython/rev/c9c9d890400c

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15814
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15814] memoryview: equality-hash invariant

2012-09-03 Thread Stefan Krah

Stefan Krah added the comment:

Small nitpick: multi-dimensional hashing wasn't really accidental,
it was perfectly aligned with the previous statically typed equality
definition.

When I suggested PyBuffer_Hash = hash(obj.tobytes()) on python-dev for
non-contiguous and multi-dimensional arrays, I got a +1 from Raymond
(which I now see was a private message).


I don't see what could possibly be ill-defined about using the
tobytes() definition for ND-arrays. In all places memoryview now
uses the logical array, which is displayed by tolist().

Leaving aside static typing, both the previous and the new equality
definitions regarded C and Fortran arrays with the same list outputs
as equal.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15814
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15814] memoryview: equality-hash invariant

2012-09-03 Thread Alexander Belopolsky

Alexander Belopolsky added the comment:

On Mon, Sep 3, 2012 at 8:38 AM, Stefan Krah rep...@bugs.python.org wrote:
 I don't see what could possibly be ill-defined about using the
 tobytes() definition for ND-arrays. In all places memoryview now
 uses the logical array, which is displayed by tolist().

+1.  The key restriction is that to byte views.  ND aspect does not
complicate the logic once .tobytes() is properly implemented.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15814
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15814] memoryview: equality-hash invariant

2012-09-03 Thread Stefan Behnel

Changes by Stefan Behnel sco...@users.sourceforge.net:


--
nosy:  -scoder

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15814
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15814] memoryview: equality-hash invariant

2012-09-03 Thread Martin v . Löwis

Martin v. Löwis added the comment:

Am 02.09.2012 16:21, schrieb Alexander Belopolsky:
 I have refrained from voting because in my line of work buffers or
 memoryviews deal with large objects that rarely serve as dictionary
 keys.  As a result, I have zero experince with hashing of buffers.
 This observation supports the current consensus to limit hashing to
 1d and 0d cases.  My only concern is that with hash(m) ==
 hash(m.tobytes()) implementing multidimensional restriction will
 require artificial  if ndim  1 check and an extra sentence in the
 docs while not simplifying anything.

As for the not simplifying argument: if hashing was restricted
to contiguous bytes, then the implementation would certainly be
simplified quite a bit: currently, if it's not contiguous, it needs
to make a separate copy and hash that. This code could go away
if hashing would only work for true memory blocks.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15814
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15814] memoryview: equality-hash invariant

2012-09-03 Thread Alexander Belopolsky

Alexander Belopolsky added the comment:

On Mon, Sep 3, 2012 at 3:59 PM, Martin v. Löwis rep...@bugs.python.org wrote:
 if hashing was restricted
 to contiguous bytes, then the implementation would certainly be
 simplified quite a bit: currently, if it's not contiguous, it needs
 to make a separate copy and hash that.

I would be happy if this whole business of strided memory layouts
would go away.  I find it to be a failed attempt of
over-generalization.  It is failed because it does not cover many
important formats even in 2-d: sparse matrices, symmetrical matrices,
banded matrices, etc., etc.  (Check level 3 BLAS or NAG for a longer
list.)  It is an over-generalizarion because in practice it is often
more efficient to gather your data in contiguous memory, perform
vector calculations and then scatter the results than to operate on
scattered data to avoid copying.

I would much rather see a generalization of suboffsets that would
allow me to have views into arrays of pointers to variable width
arrays than strides.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15814
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15814] memoryview: equality-hash invariant

2012-09-02 Thread Stefan Krah

Stefan Krah added the comment:

The totals are +11.5 :) for hashing, +1 for allowing non-contiguous and
-2 for multi-dimensional. I'll update the docs soon.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15814
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15814] memoryview: equality-hash invariant

2012-09-02 Thread Roundup Robot

Roundup Robot added the comment:

New changeset 3b2597c1fe35 by Stefan Krah in branch 'default':
Issue #15814: Documentation: disallow hashing of multi-dimensional memoryviews.
http://hg.python.org/cpython/rev/3b2597c1fe35

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15814
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15814] memoryview: equality-hash invariant

2012-09-02 Thread Alexander Belopolsky

Alexander Belopolsky added the comment:

On Sep 2, 2012, at 8:44 AM, Stefan Krah rep...@bugs.python.org wrote:

 The totals are +11.5 :) for hashing, +1 for allowing non-contiguous and
 -2 for multi-dimensional

I have refrained from voting because in my line of work buffers or memoryviews 
deal with large objects that rarely serve as dictionary keys.  As a result, I 
have zero experince with hashing of buffers.  This observation supports the 
current consensus to limit hashing to 1d and 0d cases.  My only concern is that 
with hash(m) == hash(m.tobytes()) implementing multidimensional restriction 
will require artificial  if ndim  1 check and an extra sentence in the docs 
while not simplifying anything.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15814
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15814] memoryview: equality-hash invariant

2012-09-01 Thread Stefan Krah

Stefan Krah added the comment:

Here's a patch that enforces byte formats.


Does everyone agree on (or tolerate at least) allowing 'B', 'b' and 'c'?
I think we should at least commit the doc patch for 3.3.0. Otherwise
implementors of exporting objects might waste time on a feature that's
deprecated.

--
Added file: http://bugs.python.org/file27085/issue15814-2.diff

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15814
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15814] memoryview: equality-hash invariant

2012-09-01 Thread Nick Coghlan

Nick Coghlan added the comment:

+1 for docs patch for 3.3.0 and then enforcing the format restriction for 3.3.1.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15814
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15814] memoryview: equality-hash invariant

2012-09-01 Thread Roundup Robot

Roundup Robot added the comment:

New changeset 895e123d9476 by Stefan Krah in branch 'default':
Issue #15814: Document planned restrictions for memoryview hashes in 3.3.1.
http://hg.python.org/cpython/rev/895e123d9476

--
nosy: +python-dev

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15814
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15814] memoryview: equality-hash invariant

2012-09-01 Thread Martin v . Löwis

Martin v. Löwis added the comment:

Am 01.09.12 13:22, schrieb Stefan Krah:
 Does everyone agree on (or tolerate at least) allowing 'B', 'b' and 'c'?

Why be more permissive than necessary? -0 on the committed version;
it should IMO further restrict it to 1D contiguous byte arrays.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15814
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15814] memoryview: equality-hash invariant

2012-09-01 Thread Stefan Krah

Stefan Krah added the comment:

Martin v. L??wis rep...@bugs.python.org wrote:
 Why be more permissive than necessary? -0 on the committed version;
 it should IMO further restrict it to 1D contiguous byte arrays.

Does byte arrays include 'b' and 'c' or just 'B'? I don't see a reason 
to allow 'B' but not the others.


I'm +-0 on allowing multi-dimensional arrays, but it would be
odd to restrict hashing to contiguous arrays:

 b = b'abcdefhhijkl'
 m = memoryview(b)
 b[::-1]
b'lkjihhfedcba'
 bytes(m[::-1]) 
b'lkjihhfedcba'
 hash(b[::-1]) == hash(m[::-1])
True


My reasoning was: If non-contiguous arrays are allowed (and I think they
should be), why not allow multi-dimensional arrays, too?


The definition hash(m) == hash(m.tobytes()) is pretty straightforward.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15814
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15814] memoryview: equality-hash invariant

2012-09-01 Thread Alexander Belopolsky

Alexander Belopolsky added the comment:

On Sep 1, 2012, at 10:24 AM, Stefan Krah rep...@bugs.python.org wrote:

 
 The definition hash(m) == hash(m.tobytes()) is pretty straightforward.

I probably missed something from the early discussion, but doesn't this 
definition only work for 1d (or 0d) views? Shouldn't shape come into 
calculation for multidimensional views?

--
nosy: +Alexander.Belopolsky

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15814
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15814] memoryview: equality-hash invariant

2012-09-01 Thread Stefan Krah

Stefan Krah added the comment:

tobytes() is the same as the flattened multi-dimensional list representation
with all elements converted to bytes. If I'm not mistaken, that's how NumPy's
tostring() behaves.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15814
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15814] memoryview: equality-hash invariant

2012-09-01 Thread Alexander Belopolsky

Alexander Belopolsky added the comment:

On Sep 1, 2012, at 11:06 AM, Stefan Krah rep...@bugs.python.org wrote:

 tobytes() is the same as the flattened multi-dimensional list representation
 with all elements converted to bytes.

This is correct, but why is it desirable to have deliberate hash collisions 
between views with different shapes?

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15814
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15814] memoryview: equality-hash invariant

2012-09-01 Thread Nick Coghlan

Nick Coghlan added the comment:

Keep in mind that its OK if hash(m) == hash(m.tobytes()) in some cases where m 
!= m.tobytes(). The only cases we really need to kill are those that break the 
hash invariant.

I don't like the idea of making the definition of hash more complicated just to 
rule out cases that aren't actually broken.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15814
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15814] memoryview: equality-hash invariant

2012-09-01 Thread Stefan Krah

Stefan Krah added the comment:

 why is it desirable to have deliberate hash collisions between views with 
 different shapes?

Since we're now restricting everything to bytes, the multi-dimensional case
is probably not useful at all. As I said above, I would leave it in because
it actually saves one check for ndim and as Nick said, the definition is
simpler. Now the user only has to remember that hashing works for 'B', 'b'
and 'c'.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15814
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15814] memoryview: equality-hash invariant

2012-09-01 Thread Martin v . Löwis

Martin v. Löwis added the comment:

Am 01.09.12 16:24, schrieb Stefan Krah:
 Does byte arrays include 'b' and 'c' or just 'B'? I don't see a reason
 to allow 'B' but not the others.

Either type is fine with me. It's the multi-dimensional aspect I'd like
to ban.

 My reasoning was: If non-contiguous arrays are allowed (and I think they
 should be), why not allow multi-dimensional arrays, too?

I think neither should be allowed. The explicit request (from Antoine)
was that memoryviews of bytes objects should be hashable. I haven't
heard anybody asking for something more general (but I could see that
people want to hash other memory blocks as well, such as mmap views).

I can see that it is tempting to provide the most general definition,
but see where this approach got us. It's always easier to widen an
interface later than to narrow it.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15814
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15814] memoryview: equality-hash invariant

2012-09-01 Thread Stefan Krah

Stefan Krah added the comment:

Disallowing non-contiguous arrays leads to very strange situations though.
I'm positive that there will be a bug report about this:

 x = memoryview(b'abc')[::-1]
 b = b'cba'
 d = {b'cba': 101}
 
 b in d
True
 x == b
True
 x in d
Traceback (most recent call last):
  File stdin, line 1, in module
ValueError: memoryiew: hashing is restricted to C-contiguous arrays

Could we perhaps take a small poll? My own vote is:

1) Allow bytes hashing at all: +0.5

2) If 1) is allowed, then also non-contiguous hashing is allowed: +1

3) Allow multi-dimensional hashing: +-0

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15814
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15814] memoryview: equality-hash invariant

2012-09-01 Thread Martin v . Löwis

Martin v. Löwis added the comment:

Am 01.09.12 19:20, schrieb Stefan Krah:
 Disallowing non-contiguous arrays leads to very strange situations though.

I don't find that strange. That two object compare equal doesn't imply
that they both hash - only that *if* they hash, they should hash equal.

In any case, this can happen already:

py x = memoryview(array.array('B',b'cba'))
py b = b'cba'
py d = {b'cba': 101}
py b in d
True
py x == b
True
py x in d
Traceback (most recent call last):
   File stdin, line 1, in module
ValueError: cannot hash writable memoryview object

It can also happen with other types (although I had to look around
a bit):

py x=set((1,2,3))
py b=frozenset(x)
py d={b:101}
py b in d
True
py x==b
True
py x in d
Traceback (most recent call last):
   File stdin, line 1, in module
TypeError: unhashable type: 'set'

 1) Allow bytes hashing at all: +0.5

+0

 2) If 1) is allowed, then also non-contiguous hashing is allowed: +1

-1

 3) Allow multi-dimensional hashing: +-0

-1

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15814
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15814] memoryview: equality-hash invariant

2012-09-01 Thread Antoine Pitrou

Antoine Pitrou added the comment:

 Could we perhaps take a small poll? My own vote is:
 
 1) Allow bytes hashing at all: +0.5

+10. The buffer() object in 2.x was hashable, and a very important use
case of memoryview is replacing buffer().

 2) If 1) is allowed, then also non-contiguous hashing is allowed: +1
 3) Allow multi-dimensional hashing: +-0

I don't care about these. I kind of agree with Martin that hashability
should be motivated by an use case, so -0.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15814
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15814] memoryview: equality-hash invariant

2012-09-01 Thread Stefan Krah

Stefan Krah added the comment:

 py x = memoryview(array.array('B',b'cba'))

I find the array example is different. The user has to remember one thing:
memoryviews based on arrays don't hash.

For memoryviews based on bytes one would have to remember:

  - 'B', 'c' and 'b' hash

  - only C-contiguous arrays hash

  - b'abc'[::-1] hashes, but memoryview(b'abc')[::-1] does not

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15814
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15814] memoryview: equality-hash invariant

2012-09-01 Thread Martin v . Löwis

Martin v. Löwis added the comment:

Am 01.09.12 20:06, schrieb Stefan Krah:
- b'abc'[::-1] hashes, but memoryview(b'abc')[::-1] does not

I find that memoryview(b'abc')[::-1] is a strange thing to have,
anyway, so I'm not bothered by it behaving different. I can accept
that it needs to be supported for consistency, but I would recommend
against using it in applications. Who needs reversed memory, anyway?
The only case were I ever needed reversal is lists (typically sorted,
perhaps cronological).

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15814
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15814] memoryview: equality-hash invariant

2012-09-01 Thread Nick Coghlan

Nick Coghlan added the comment:

+1 for allowing bytes hashing. As Antoine noted, the 1D bytes variant of 
memoryview() fills the role previously handled by buffer().

+1 for allowing 1D non-contiguous hashing. This is from a simplicity 
perspective, as I don't want to have to explain to people why 
hash(memoryview(b'12'[::2])), hash(memoryview(b'123'[::2])) and 
hash(memoryview(b'12')[::2]) all work, but hash(memoryview(b'123')[::2]) 
fails.

-1 for allowing hashing when ndim  1. We don't have a solid use case, and 
there's no odd value based corner cases to be considered.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15814
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15814] memoryview: equality-hash invariant

2012-08-31 Thread Stefan Krah

Stefan Krah added the comment:

Dag Sverre Seljebotn rep...@bugs.python.org wrote:
 It is perfectly possible for an object to export memory in a read-only
 way that may still change. Another object may have a writeable view:
 
 x = obj.readonly_view
 y = obj.writable_view
 obj.move_to_next_image() # changes memory under x, y

PEP-3118 always means different things to different people. :)

Your distinction between constant, readonly, writable is one way to
interpret things. But looking at the use in the Python source tree,
readonly does mean immutable.

Unfortunately the PEP doesn't specify readonly: Is it the same as
mount a partition readonly, readonly CD or O_RDONLY?

Also, existing practice was that in response to a PyBUF_FULL_RO request
the exporter was free to hand out a writable buffer!

This is now documented and subverts reliable tagging for consumers:

http://docs.python.org/dev/c-api/buffer.html#readonly-format

If any of this is a major inconvenience to NumPy or other users of the
buffer API, perhaps you could open a separate issue.

 (FWIW, not sure how relevant this is; in NumPy, == does 
 
 In [1]: np.array([1,2,3]) == np.array([1,3,3])
 Out[1]: array([ True, False,  True], dtype=bool)

Since afa3dedfee18 we compare memoryviews and any exporter in the same manner
as NumPy, except that the return value is a single boolean. So I think the new
comparison is equivalent to all(a == b) in NumPy.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15814
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15814] memoryview: equality-hash invariant

2012-08-31 Thread Stefan Krah

Stefan Krah added the comment:

Dag Sverre Seljebotn rep...@bugs.python.org wrote:
 OK, I can understand the desire to make memoryviews be bytes-like objects
 (though to my mind, bytes is frozen in a very different way...)

We have two desires that sometimes conflict: People who want to use
memoryview as a *buffer* legitimately want it to behave like bytes.

People who want to have an arrayview want it to behave like NumPy
or PIL arrays. The new implementation preserves fast paths for bytes
and otherwise behaves pretty much like a very restricted version of
NumPy's ndarray (with support for suboffsets), except that not all
planned multi-dimensional features have been implemented yet.

 If so, and it is deemed worth it, my suggestion is to add a new
 PyBUF_CONST flag to the buffer acquisition in that case

Note that memory_hash() raises an error if the exporter *itself* is
not hashable, so it only hashes immutable objects by design.

But the PyBUF_CONST flag might be worth considering (in another issue).

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15814
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15814] memoryview: equality-hash invariant

2012-08-31 Thread Martin v . Löwis

Martin v. Löwis added the comment:

Am 31.08.12 06:14, schrieb Stefan Behnel:
 To add on Dag's comments, this essentially means that any caching of
 the hash value is dangerous

Stefan Krah is right here: the proper test (which is already 
implemented) is whether the underlying object is hashable; this
is a sufficient test.

Caching of the hash is always correct, or else the hash function
is incorrect: a hash of an object must not change. If it would
change, you couldn't find the key anymore in a dictionary.

 unless it can be assured that the
 underlying buffer definitely has not changed in the meantime. There
 is no way for users to explicitly tell a memoryview to rehash itself,
 so putting it into one dict, then modifying the content, then putting
 it into another is a perfectly reasonable use case for users,

No, it's not:

d = {}
o = some_object
m = memoryview(o)
d[m] = -1
o.modify()  # so that hash(m) changes
print(d[m]) # keyerror, even though m in d.keys()

 As far as I can see it, any obvious fix for this creates a whole
 bath tub full of worms: updating views transitively, adding new APIs,
 supporting new buffer parameters, ...

So above (in this thread): the easiest fix is to just drop hashing
support for memoryview objects. This can be extended in steps:
- support hashing if the underlying object is bytes
- support hashing if the shape is 1D B, and the underlying object
   is hashable
- support hashing if the exporter has somehow promised to not modify
   the memory

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15814
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15814] memoryview: equality-hash invariant

2012-08-30 Thread Stefan Krah

Stefan Krah added the comment:

Martin v. L??wis rep...@bugs.python.org wrote:
  hash(x) == hash(x.tobytes())
 In the light of this requirement, it's even more difficult to ask
 people that they change their hashing, since some exporters may already
 comply with that original request.

I don't think so. memoryview.__hash__() is new in 3.3 and the requirement
is not documented at all in the general PEP-3118 sections.

[Adding Stefan Behnel to nosy, since Cython is pretty quick to pick up
new features.]

  The new equality definition and any possible new hash definition should
  probably also be part of the buffer API documentation, since they
  aren't memoryview specific.
 
 That's not true: they *are* memoryview-specific. The notion of equality
 is entirely one of memoryview objects, not of buffers.

Could you name a part of the equality definition that is memoryview-specific?

 I still maintain that specifying hashing for memoryviews under the
 new equality definition is just not feasible, and that we should give
 up on it (except perhaps supporting the hashing of bytes views).
 I also question whether it is useful to hash arbitrarily-shaped
 read-only buffers (along with questioning whether people will actually
 *have* arbitrarily-shaped read-only buffers).

Useful, perhaps. I don't know if it is worth the effort though. We
could restrict hashing to contiguous bytes views in 3.3.1.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15814
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15814] memoryview: equality-hash invariant

2012-08-30 Thread Stefan Krah

Changes by Stefan Krah stefan-use...@bytereef.org:


--
nosy: +scoder

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15814
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15814] memoryview: equality-hash invariant

2012-08-30 Thread Stefan Krah

Stefan Krah added the comment:

Here's a doc patch restricting the hash to formats 'B', 'b' and 'c'.
I think non-contiguous views are fine: both __eq__() and tobytes()
handle these, so the equality-hash invariant is preserved.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15814
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15814] memoryview: equality-hash invariant

2012-08-30 Thread Stefan Krah

Changes by Stefan Krah stefan-use...@bytereef.org:


--
keywords: +patch
Added file: http://bugs.python.org/file27058/issue15814-doc.diff

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15814
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15814] memoryview: equality-hash invariant

2012-08-30 Thread Martin v . Löwis

Martin v. Löwis added the comment:

Am 30.08.12 11:24, schrieb Stefan Krah:
 The new equality definition and any possible new hash definition should
 probably also be part of the buffer API documentation, since they
 aren't memoryview specific.

 That's not true: they *are* memoryview-specific. The notion of equality
 is entirely one of memoryview objects, not of buffers.

 Could you name a part of the equality definition that is memoryview-specific?

As literally spelled out, it starts with a memoryview and a PEP 3118 
exporter. But I was actually thinking of the implementation...

In any case, I'm skeptical that this is a useful extension to the buffer 
interface. You would imply two exporters are equal iff their exported 
buffers are equal. I think this is asking too much.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15814
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15814] memoryview: equality-hash invariant

2012-08-30 Thread Dag Sverre Seljebotn

Dag Sverre Seljebotn added the comment:

It is perfectly possible for an object to export memory in a read-only way that 
may still change. Another object may have a writeable view:

x = obj.readonly_view
y = obj.writable_view
obj.move_to_next_image() # changes memory under x, y

So, hashing using tobytes() doesn't make any sense at all to me. A memoryview 
!= its contents.

You could compare memoryviews simply by comparing the Py_buffer structure, but 
that's going to be confusing for a lot of users. I would really prefer 
unhashable (+ if needed a method for comparing contents).

(FWIW, not sure how relevant this is; in NumPy, == does 

In [1]: np.array([1,2,3]) == np.array([1,3,3])
Out[1]: array([ True, False,  True], dtype=bool)

Cython will follow this behaviour for its typed memory views, which is 
Cython's builting mechanism for PEP 3118 which is not quite the same as CPython 
memoryview. But I understand that following this behaviour is probably out of 
the question for CPython.)

--
nosy: +Dag.Sverre.Seljebotn

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15814
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15814] memoryview: equality-hash invariant

2012-08-30 Thread Dag Sverre Seljebotn

Dag Sverre Seljebotn added the comment:

OK, I can understand the desire to make memoryviews be bytes-like objects 
(though to my mind, bytes is frozen in a very different way...)

If so, and it is deemed worth it, my suggestion is to add a new PyBUF_CONST 
flag to the buffer acquisition in that case (which can not be used together 
with PyBUF_WRITABLE). Given that flag, the exporter guarantees that the 
contents does not change (or fails to give away a buffer).

Perhaps it could be possible for hash() to try to re-acquire a const buffer, so 
that some buffers are hashable (by contents) and others not.

(I really think relying on buffers that are not writeable to be constant is 
dangerous. A readonly memoryview could for instance point straight to the live 
output of a webcam.)

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15814
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15814] memoryview: equality-hash invariant

2012-08-30 Thread Stefan Behnel

Stefan Behnel added the comment:

To add on Dag's comments, this essentially means that any caching of the hash 
value is dangerous, unless it can be assured that the underlying buffer 
definitely has not changed in the meantime. There is no way for users to 
explicitly tell a memoryview to rehash itself, so putting it into one dict, 
then modifying the content, then putting it into another is a perfectly 
reasonable use case for users, but fails when the memory view caches its hash 
value.

As far as I can see it, any obvious fix for this creates a whole bath tub 
full of worms: updating views transitively, adding new APIs, supporting new 
buffer parameters, ...

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15814
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15814] memoryview: equality-hash invariant

2012-08-29 Thread Stefan Krah

New submission from Stefan Krah:

The new PEP-3118 equality definition from #15573 that is based
on element-wise comparisons breaks the equality-hash invariant:

 from _testbuffer import ndarray
 x = ndarray([1,2,3], shape=[3], format='f')
 y = ndarray([1,2,3], shape=[3], format='B')
 a = memoryview(x)
 b = memoryview(y)
 a == b
True
 hash(a) == hash(b)
False


--
assignee: skrah
messages: 169398
nosy: Arfrever, christian.heimes, georg.brandl, haypo, loewis, mark.dickinson, 
meador.inge, ncoghlan, pitrou, skrah
priority: normal
severity: normal
stage: needs patch
status: open
title: memoryview: equality-hash invariant
type: behavior
versions: Python 3.4

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15814
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15814] memoryview: equality-hash invariant

2012-08-29 Thread Martin v . Löwis

Martin v. Löwis added the comment:

I think the proper solution is to make memoryview objects unhashable. Any other 
approach will have flaws of some kind.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15814
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15814] memoryview: equality-hash invariant

2012-08-29 Thread Antoine Pitrou

Antoine Pitrou added the comment:

 I think the proper solution is to make memoryview objects unhashable.

Disagreed. If memoryviews are to be bytes-like objects they should be
hashable (at least when readonly).

 Any other approach will have flaws of some kind.

Not more so than equality between memoryviews.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15814
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15814] memoryview: equality-hash invariant

2012-08-29 Thread Martin v . Löwis

Martin v. Löwis added the comment:

Am 29.08.12 20:01, schrieb Antoine Pitrou:
 I think the proper solution is to make memoryview objects unhashable.

 Disagreed. If memoryviews are to be bytes-like objects they should be
 hashable (at least when readonly).

So what specific hash algorithm do you propose?

 Any other approach will have flaws of some kind.

 Not more so than equality between memoryviews.

Well, memoryviews now have a definition of equality as discussed in 
issue15573. You may disagree whether it's a useful definition, but I
don't believe it has actual flaws (in the sense that it is incorrect
or inconsistent).

My claim is that any hash definition for memoryviews will have a
*fundamental* flaw, failing to provide the basic property
that A==B must imply hash(A)==hash(B), making it actually work
incorrectly (when used for its primary application, namely keys
in dictionaries).

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15814
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15814] memoryview: equality-hash invariant

2012-08-29 Thread Antoine Pitrou

Antoine Pitrou added the comment:

 Am 29.08.12 20:01, schrieb Antoine Pitrou:
  I think the proper solution is to make memoryview objects unhashable.
 
  Disagreed. If memoryviews are to be bytes-like objects they should be
  hashable (at least when readonly).
 
 So what specific hash algorithm do you propose?

The current algorithm works well in conjunction with bytes objects.

 My claim is that any hash definition for memoryviews will have a
 *fundamental* flaw, failing to provide the basic property
 that A==B must imply hash(A)==hash(B), making it actually work
 incorrectly

Why is there such a fundamental flaw?

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15814
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15814] memoryview: equality-hash invariant

2012-08-29 Thread Martin v . Löwis

Martin v. Löwis added the comment:

Am 29.08.12 21:06, schrieb Antoine Pitrou:
 So what specific hash algorithm do you propose?

 The current algorithm works well in conjunction with bytes objects.

That's about the only type if works for.

 My claim is that any hash definition for memoryviews will have a
 *fundamental* flaw, failing to provide the basic property
 that A==B must imply hash(A)==hash(B), making it actually work
 incorrectly

 Why is there such a fundamental flaw?

The current algorithm is flawed as described in Stefan's original
message: two objects compare equal, yet hash different. That means
that if you use them as dictionary keys, you may end up with two
different values for the same key, depending on the size of the
dictionary (as the modulo operation in the dictionary still may
map the different hashes to the same dictionary slot).

In general, since memoryview(obj)==obj, it would be necessary that
hash(memoryview(obj))==hash(obj). However, since memoryview cannot
know what hashing algorithm obj uses, it cannot compute the hash
value with the same algorithm.

IOW, hashing is mutually exclusive with comparison with the
underlying object, unless you know what the hash algorithm
of the underlying object is.

So restricting tp_hash to memoryview objects where the underlying
object is the bytes type would work.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15814
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15814] memoryview: equality-hash invariant

2012-08-29 Thread Stefan Krah

Stefan Krah added the comment:

Martin v. L??wis rep...@bugs.python.org wrote:
 In general, since memoryview(obj)==obj, it would be necessary that
 hash(memoryview(obj))==hash(obj). However, since memoryview cannot
 know what hashing algorithm obj uses, it cannot compute the hash
 value with the same algorithm.

In the memoryview-hash thread on python-dev [1] this objection was
addressed by demanding from exporters that they all use:

   hash(x) == hash(x.tobytes())

Since the previous equality concept was also based on
x.tobytes() == y.tobytes(), this wasn't a problem.

The new equality definition and any possible new hash definition should
probably also be part of the buffer API documentation, since they
aren't memoryview specific.

[1] http://mail.python.org/pipermail/python-dev/2011-November/114459.html

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15814
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15814] memoryview: equality-hash invariant

2012-08-29 Thread Stefan Krah

Stefan Krah added the comment:

And since memory_richcompare() and any potentially compatible hash function
are quite tricky to implement by now, perhaps the generally useful parts
could be exposed as PyBuffer_RichCompareBool() and PyBuffer_Hash().

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15814
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15814] memoryview: equality-hash invariant

2012-08-29 Thread Martin v . Löwis

Martin v. Löwis added the comment:

Am 29.08.12 22:04, schrieb Stefan Krah:
 In the memoryview-hash thread on python-dev [1] this objection was
 addressed by demanding from exporters that they all use:

 hash(x) == hash(x.tobytes())

 Since the previous equality concept was also based on
 x.tobytes() == y.tobytes(), this wasn't a problem.

In the light of this requirement, it's even more difficult to ask
people that they change their hashing, since some exporters may already
comply with that original request.

 The new equality definition and any possible new hash definition should
 probably also be part of the buffer API documentation, since they
 aren't memoryview specific.

That's not true: they *are* memoryview-specific. The notion of equality
is entirely one of memoryview objects, not of buffers.

I still maintain that specifying hashing for memoryviews under the
new equality definition is just not feasible, and that we should give
up on it (except perhaps supporting the hashing of bytes views).
I also question whether it is useful to hash arbitrarily-shaped
read-only buffers (along with questioning whether people will actually
*have* arbitrarily-shaped read-only buffers).

Stefan: please do propose a semantics also along with proposing interfaces.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15814
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15814] memoryview: equality-hash invariant

2012-08-29 Thread Alexander Belopolsky

Changes by Alexander Belopolsky alexander.belopol...@gmail.com:


--
nosy: +belopolsky

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15814
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15814] memoryview: equality-hash invariant

2012-08-29 Thread Nick Coghlan

Nick Coghlan added the comment:

My perspective is that hashing a memoryview only makes sense when the 
memoryview is read-only and m == m.tobytes() (i.e. it's a C contiguous 1D 
view of bytes, either because that's what the original object exported as a 
buffer or because the view has been cast that way)

So, I think we should document that restriction in What's New and the 
memoryview docs for 3.3.0, and actually enforce it in 3.3.1 by raising 
ValueError from hash(m) when the assumption of a 1D C contiguous bytes view is 
violated (the read-only restriction is already enforced).

In previous versions, this constraint didn't need to be explicitly enforced, 
since memoryview basically treated *everything* as a 1D C contiguous view of 
bytes.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15814
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com