Roundup Robot added the comment:
New changeset 969069d464bc by Stefan Krah in branch '3.3':
Issue #15814: Use hash function that is compatible with the equality
http://hg.python.org/cpython/rev/969069d464bc
--
___
Python tracker
Stefan Krah added the comment:
We overlooked one thing. Since hashing is defined in terms of
tobytes(), the equality-hash invariant is now broken:
from _testbuffer import ndarray
x = ndarray([1,2,3], shape=[3], format='f')
y = ndarray([1,2,3], shape=[3], format='B')
a = memoryview(x)
b =
Nick Coghlan added the comment:
Perhaps the hash could just be based on a subset of the items? For example,
hash the format, shape and the first and last items?
Yes, such an approach means you're more likely to get collisions if you do
start hashing these, but that seems better than making
Stefan Krah added the comment:
I'm trying to think of an optimization for the native types. It should
be possible to cast any native type element to unsigned char and use the
truncated value for the bytes hash.
Well, for double probably it's required to go from double - int64_t -
unsigned
Martin v. Löwis added the comment:
1. I STRONGLY request to take hashing out of this issue. The only further
action that this issue can have is complete reversal of the patch, starting
over. Now that the issue has been resolved, a NEW issue arises, namely that
hashing is inconsistent with
Stefan Krah added the comment:
I agree that this isn't a blocker due to the limited use of
memoryview hashing. Tracking this in #15814.
--
status: open - closed
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15573
Changes by Arfrever Frehtes Taifersar Arahesis arfrever@gmail.com:
--
resolution: - fixed
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15573
___
STINNER Victor added the comment:
Even if I would prefer to see a fully working new implementation of the
buffer interface, I agree with Georg: it's too late for such huge change in
Python 3.3. Can' we wait Python 3.4 to change the equality operator?
It also took me three major releases to fix
Stefan Krah added the comment:
The effect of the change is pretty minimal though: Previously
not equal was returned for unknown formats, now the struct
module figures it out.
memoryobject.c has 100% coverage except for a couple of lines
that are either impossible to reach or very hard to reach.
Nick Coghlan added the comment:
With Stefan's major improvements to the test suite for 3.3, the risk is low and
I *really* don't want to spend the life of the 3.3 series excusing the current
comparison behaviour.
By contrast, explaining the shift in definition as moving from raw memory
Georg Brandl added the comment:
Well, I'm not against you committing it as long as you commit it *right now*.
I'm about to cut the tag.
--
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15573
Stefan Krah added the comment:
You can look at the attached memoryobject.c.gcov: *All* failure paths
are taken since Python API functions are wrapped in macros that
set PyExc_FailAPIError and return failure at predefined points
the test suite.
That should read: With a patch that wraps
Stefan Krah added the comment:
Great, give me 20 minutes or so.
--
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15573
___
___
Python-bugs-list
Nick Coghlan added the comment:
I'm currently on IRC and double checking the patch locally.
--
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15573
___
Nick Coghlan added the comment:
Stefan, was there something specific you wanted to do before committing?
Otherwise I can commit it as soon as this full test run finishes.
--
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15573
Roundup Robot added the comment:
New changeset afa3dedfee18 by Nick Coghlan in branch 'default':
Close #15573: use value-based memoryview comparisons (patch by Stefan Krah)
http://hg.python.org/cpython/rev/afa3dedfee18
--
resolution: - fixed
stage: patch review - committed/rejected
Stefan Krah added the comment:
Except for Misc/NEWS the patch can be committed unchanged. Please
go ahead!
[The remaining doc changes are tracked elsewhere.]
--
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15573
Martin v. Löwis added the comment:
Is this still blocking the release? If so, it should be resolved within the
next twelve hours, or else it may block the release until September.
--
___
Python tracker rep...@bugs.python.org
Georg Brandl added the comment:
I'm not very keen to hold up the release for long times again, especially for a
patch of this size and lots of potential breakage.
--
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15573
Changes by Georg Brandl ge...@python.org:
--
priority: deferred blocker - release blocker
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15573
___
Stefan Krah added the comment:
I think issue15573-struct-2.diff is ready to go and I'd rather commit
sooner than later. Nick, can I interpret your last review comment
as go ahead? :)
--
___
Python tracker rep...@bugs.python.org
Georg Brandl added the comment:
Small nit: when you put versionchanged:: 3.3 there should be a hint of *what*
changed exactly :)
--
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15573
___
Stefan Krah added the comment:
Right. I'm tracking all versionchanged issues in #15724.
--
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15573
___
Stefan Krah added the comment:
There is still one corner case involving NaNs: Released memoryviews always
compare equal. I took that over from the 3.2 implementation.
import array
a = array.array('d', [float('nan')])
m = memoryview(a)
m == m
False
m.release()
m == m
True
I guess we have
Stefan Krah added the comment:
Here is a new patch that hopefully addresses all comments.
--
Added file: http://bugs.python.org/file26831/issue15573-struct-2.diff
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15573
Stefan Krah added the comment:
Here is a patch implementing by-value comparisons for all format strings
understood by the struct module. It is slightly longer than promised, since
for larger arrays it is necessary to cache an unpacking object for acceptable
performance. The fast path for
Stefan Krah added the comment:
I didn't get my own comments as mail, so this is just a notification that I've
answered Nick's comments.
--
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15573
Nick Coghlan added the comment:
Yeah, as soon as I got your email I realised I'd only been searching for usage
*in the patch* rather than the full file. Oops :(
More comments in the review.
--
___
Python tracker rep...@bugs.python.org
Stefan Krah added the comment:
Martin v. L??wis rep...@bugs.python.org wrote:
Here is a more formal definition of my last proposal,
v and w are equal iff
v.shape() == w.shape() and
((v.format == w.format and v.tobytes() == w.tobytes()) or
v.tolist() == w.tolist())
if tolist raises
Stefan Krah added the comment:
To be specific, after a quick look the only function from _testbuffer.c
that would be needed is unpack_single(). This really does not look very
complex.
--
___
Python tracker rep...@bugs.python.org
Nick Coghlan added the comment:
We do import modules in the Object/* code if we really need to (Grep for
PyImport_Import in the Objects dir)
It seems to me like this case is rapidly reaching (or has even gone well past)
the point where that's the sensible thing to do.
A couple of key issues
Stefan Krah added the comment:
Thanks, Nick. I'll work on a struct module patch then. At this point for me
this is the only satisfying solution.
--
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15573
Nick Coghlan added the comment:
Stefan's proposed definition is correct. Shapes define array types, format
strings define entry types, then the actual memory contents define the
value.
It's not Stefan's definition of equivalence, it's what a statically typed
array *means*.
The 3.2 way is
Martin v. Löwis added the comment:
Nick: I still disagree. Would you agree that array.array constitutes a
statically typed array? Yet
py array.array('b',b'foo') == array.array('B',b'foo')
True
py array.array('i',[1,2,3]) == array.array('L', [1,2,3])
True
So the array object (rightfully)
Martin v. Löwis added the comment:
it might *still* happen, I should say, since this problem is exactly what
Victor says this issue intends to solve (for comparison of two 'u' arrays).
--
___
Python tracker rep...@bugs.python.org
Stefan Krah added the comment:
So we have two competing proposals:
1) Py_buffers are strongly typed arrays in the ML sense (e.g. array of float,
array of int).
This is probably what I'd prefer on the C level, imagine a function like
function like PyBuffer_Compare(v, w).
Backwards
Martin v. Löwis added the comment:
Here is a patch doing the comparison on abstract values if the formats are
different.
--
Added file: http://bugs.python.org/file26766/abstractcmp.diff
___
Python tracker rep...@bugs.python.org
Nick Coghlan added the comment:
OK, I think I finally understand what Martin is getting at from a semantic
point of view, and I think I can better explain the background of the issue and
why Stefan's proposed solution is both necessary and correct.
The ideal definition of equivalence for
Nick Coghlan added the comment:
Short version:
3.3 implemented new constraints on memoryview equality comparisons to eliminate
cases that were incorrectly returning True when they should have returned False.
However, the format constraints are currently too restrictive, so some
memoryview
Nick Coghlan added the comment:
I can see value in adopting the abstraction approach, but it's not part of this
issue. That problem already existed when using memoryview in 3.2, we can wait
until 3.4 to fix it.
There's an actual, concrete regression in 3.3 that may break code, and
comparing
Nick Coghlan added the comment:
And, to be honest, I'd be quite happy for congratulations, you have reached
the threshold where you need NumPy rather than memoryview to be the long term
answer to getting by value comparison semantics.
--
___
Python
Martin v. Löwis added the comment:
However, the format constraints are currently too restrictive, so
some memoryview comparisons that correctly returned True in 3.2 are
now also returning False. This patch fixes *those* comparisons to
once again return True.
No, it doesn't - at least not in
Martin v. Löwis added the comment:
And, to be honest, I'd be quite happy for congratulations, you have
reached the threshold where you need NumPy rather than memoryview to
be the long term answer to getting by value comparison semantics.
IMO, this threshold is already reached when you start
Nick Coghlan added the comment:
Hmm, you're right. OK, here's a simpler proposal for 3.3:
1. Always look at shape first. If they're different, then they're not equal
2. If both formats are known, compare by unpacked value
3. If either format is unknown, compare by memory contents (just like
Antoine Pitrou added the comment:
Le samedi 11 août 2012 à 19:52 +, Nick Coghlan a écrit :
I'd be happier if the compare-by-value didn't make complete copies of
the entire array though.
Ditto. If a and b are bytes objects, comparing memoryview(a) and
memoryview(b) should be as cheap as
Martin v. Löwis added the comment:
Am 11.08.12 21:59, schrieb Antoine Pitrou:
Le samedi 11 août 2012 à 19:52 +, Nick Coghlan a écrit :
I'd be happier if the compare-by-value didn't make complete copies of
the entire array though.
Ditto. If a and b are bytes objects, comparing
Martin v. Löwis added the comment:
Here is a more formal definition of my last proposal,
v and w are equal iff
v.shape() == w.shape() and
((v.format == w.format and v.tobytes() == w.tobytes()) or
v.tolist() == w.tolist())
if tolist raises an exception for unsupported codes, they are not
Martin v. Löwis added the comment:
haypo: thanks for stating the issue.
ISTM that this classifies as an obscure bug: you have to use memoryviews, and
you need to compare them for equality, and the comparison needs to be
non-trivial, where trivial is defined
by both are 1D byte arrays.
While
Stefan Krah added the comment:
I can easily provide a specification that makes the current implementation
correct
Yes, the current specification is: memoryview only attempts to
compare arrays with known (single character native) formats and
returns not equal otherwise.
The problem is that
Martin v. Löwis added the comment:
Note that in 3.2 memoryview would return equal for arrays that
simply aren't equal, if those arrays happen to have the same bit
pattern.
This is exactly my point. If a memoryview has the same memory
as another, why are they then not rightfully considered
Stefan Krah added the comment:
PEP-3118 specifies strongly typed multi-dimensional arrays. The existing
code in the 3.2 memoryview as well as numerous comments by Travis Oliphant
make it clear that these capabilities were intended from the start for
memoryview as well.
Perhaps the name
Martin v. Löwis added the comment:
Can you please elaborate on the specification:
1. what does it mean that the formats of v and w are equal?
2. Victor's clarification about this issue isn't about comparing two arrays,
but an array with a string object. So: when is an array equal to some
Stefan Krah added the comment:
1. what does it mean that the formats of v and w are equal?
I'm using array and Py_buffer interchangeably since a Py_buffer struct
actually describes a multi-dimensional array. v and w are Py_buffer
structs.
So v.format must equal w.format, where format is a
Changes by Stefan Krah stefan-use...@bytereef.org:
--
stage: needs patch - patch review
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15573
___
Martin v. Löwis added the comment:
So v.format must equal w.format, where format is a format string in
struct module syntax. The topic of this issue is to determine under
what circumstances two strings in struct module syntax are considered
equal.
And that is exactly my question: We don't
STINNER Victor added the comment:
Can't we start with something simple (for ptyhon 3.3?), and elaborate
later? In my specific example, both object have the same format string and
the same content. So i expect that they are equal.
Le 10 août 2012 19:47, Martin v. Löwis rep...@bugs.python.org a
Martin v. Löwis added the comment:
Can't we start with something simple (for ptyhon 3.3?), and elaborate
later?
Sure: someone would have to make a proposal what exactly that is.
IMO, the 3.2 definition *was* simple, but apparently it was considered
too simple. So either Stefan gets to define
Stefan Krah added the comment:
The ideal specification is:
1) Arrays v and w are equal iff format and shape are equal and for all valid
indices allowed by shape
memcmp((char *)PyBuffer_GetPointer(v, indices),
(char *)PyBuffer_GetPointer(w, indices),
itemsize) ==
Stefan Krah added the comment:
Martin v. Loewis rep...@bugs.python.org wrote:
Sure: someone would have to make a proposal what exactly that is.
IMO, the 3.2 definition *was* simple, but apparently it was considered
too simple.
It was simply broken in multiple ways. Example:
from numpy
Martin v. Löwis added the comment:
I find Stefan's proposed equality confusing. Why is it based on memcmp? Either
it compares memory (i.e. internal representations), or it compares abstract
values. If it compares abstract values, then it shouldn't be a requirement that
the format strings are
Martin v. Löwis added the comment:
Can someone please explain why this is a release blocker? What is the specific
regression from 3.2 that this deals with?
--
nosy: +loewis
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15573
STINNER Victor added the comment:
What is the specific regression from 3.2 that this deals with?
I don't know if it must be called a regression, but at least the behaviour is
different in Python 3.2 and 3.3. For example, an Unicode array is no more equal
to its memoryview:
Python 3.3.0b1
Stefan Krah added the comment:
Right, byte order specifiers are always at the beginning of the string.
That is at least something. I wonder if we should tighten PEP-3118 to
demand a canonical form of format strings, such as (probably incomplete):
- Whitespace is disallowed.
- Except for
New submission from Stefan Krah:
Continuing the discussion from #13072. I hit a snag here:
Determining in full generality whether two format strings describe
identical items is pretty complicated, see also #3132.
I'm attaching a best effort fmtcmp() function that should do the
following:
-
Nick Coghlan added the comment:
I confess I was thinking of an even simpler format strings must be identical
fallback, but agree your way is better, since it reproduces the 3.2 behaviour
in many more cases where ignoring the format string actually did the right
thing.
The struct docs for the
Changes by Christian Heimes li...@cheimes.de:
--
nosy: +christian.heimes
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15573
___
___
Meador Inge added the comment:
I agree that the general case is complicated. It will get even more
complicated if the full of PEP 3118 gets implemented since it turns into a tree
comparison. In general, I think you will probably have to compute some
canonical form and then compare the
67 matches
Mail list logo