[Python-Dev] PyObject_RichCompareBool identity shortcut
The other day I was surprised to learn this:
>>> nan = float('nan')
>>> nan == nan
False
>>> [nan] == [nan]
True # also True in tuples, dicts, etc.
# also:
>>> l = [nan]
>>> nan in l
True
>>> l.index(nan)
0
>>> l[0] == nan
False
The identity test is not in container comparators, but in
PyObject_RichCompareBool:
    /* Quick result when objects are the same.
       Guarantees that identity implies equality. */
    if (v == w) {
        if (op == Py_EQ)
            return 1;
        else if (op == Py_NE)
            return 0;
    }
The guarantee referred to in the comment is not only (AFAICT)
undocumented, but contradicts the documentation, which states that the
result should be the "equivalent of o1 op o2".
Calling PyObject_RichCompareBool is inconsistent with calling
PyObject_RichCompare and converting its result to bool manually,
something that wrappers (C++) and generators (cython) might reasonably
want to do themselves, for various reasons.
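A rough Python model of the discrepancy may help here. This is only a sketch: `rich_compare_bool` is a hypothetical stand-in written for illustration, not CPython's actual implementation, and it models only the == and != cases.

```python
import operator


def rich_compare_bool(v, w, op):
    """Hypothetical Python model of PyObject_RichCompareBool for == and !=."""
    if v is w:
        # The identity shortcut: the objects' own __eq__/__ne__ are never called.
        if op is operator.eq:
            return True
        if op is operator.ne:
            return False
    # Otherwise behave like PyObject_RichCompare followed by a truth test.
    return bool(op(v, w))


nan = float('nan')
shortcut = rich_compare_bool(nan, nan, operator.eq)  # identity decides the result
manual = bool(operator.eq(nan, nan))                 # comparison converted by hand
print(shortcut, manual)
```

The two results differ for the same pair of operands, which is the inconsistency a wrapper author would run into.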
If this is considered a bug, I can open an issue.
Hrvoje
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe:
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PyObject_RichCompareBool identity shortcut
Hrvoje Niksic wrote on 2011-04-27 at 11:37:
> The other day I was surprised to learn this:
>
> >>> nan = float('nan')
> >>> nan == nan
> False
> >>> [nan] == [nan]
> True # also True in tuples, dicts, etc.
>
> # also:
> >>> l = [nan]
> >>> nan in l
> True
> >>> l.index(nan)
> 0
> >>> l[0] == nan
> False
>
This surprises me as well. I guess this is all related to the fact that:
>>> nan is nan
True
Have a look at this as well:
>>> inf = float('inf')
>>> inf == inf
True
>>> [inf] == [inf]
True
>>> l = [inf]
>>> inf in l
True
>>> l.index(inf)
0
>>> l[0] == inf
True
# Or even:
>>> inf+1 == inf-1
True
For the infinity part, I believe this is related to the funky IEEE 754
standard. I found some discussion about this here:
http://compilers.iecc.com/comparch/article/98-07-134
--
Best regards,
Łukasz Langa
Re: [Python-Dev] PyObject_RichCompareBool identity shortcut
2011/4/27 Łukasz Langa:
> # Or even:
> >>> inf+1 == inf-1
> True
>
> For the infinity part, I believe this is related to the funky IEEE 754
> standard. I found
> some discussion about this here:
> http://compilers.iecc.com/comparch/article/98-07-134

The inf behaviour is fine (inf != inf only when you start talking about
aleph levels, and IEEE 754 doesn't handle those).

It's specifically `nan` that is problematic, as it is one of the very
few cases that breaks the reflexivity of equality.

That said, the current behaviour was chosen deliberately so that
containers could cope with `nan` at least somewhat gracefully:
http://bugs.python.org/issue4296

Issue 10912 added an explicit note about this behaviour to the 3.x
series documentation, but that has not as yet been backported to 2.7
(I reopened the issue to request such a backport).

Cheers,
Nick.

--
Nick Coghlan | [email protected] | Brisbane, Australia
Re: [Python-Dev] PyObject_RichCompareBool identity shortcut
On Apr 27, 2011, at 2:37 AM, Hrvoje Niksic wrote:
> The other day I was surprised to learn this:
>
> >>> nan = float('nan')
> >>> nan == nan
> False
> >>> [nan] == [nan]
> True # also True in tuples, dicts, etc.
Would you also be surprised if you put an object in a dictionary but couldn't get
it out? Or added it to a list but its count was zero?
Identity-implies-equality is necessary so that classes can maintain their
invariants and so that programmers can reason about their code. It is not just
in PyObject_RichCompareBool, it is deeply embedded in the language (the logic
inside dicts for example). It is not a short-cut, it is a way of making sure
that internally we can count on equality relations being reflexive, symmetric, and
transitive. A programmer needs to be able to make basic deductions such as the
relationship between the two forms of the in-operator:
    for elem in somelist:
        assert elem in somelist # this should never fail.
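As a small illustration of those invariants, the sketch below puts a NaN into a list and a dict and checks that it can still be found. The variable names are chosen for the example only.

```python
nan = float('nan')
somelist = [1.0, nan, 2.0]

# The two forms of the in-operator stay consistent, even for NaN,
# because the builtin containers check identity before equality.
for elem in somelist:
    assert elem in somelist

count = somelist.count(nan)   # the NaN that was added is counted
lookup = {nan: 'value'}
found = lookup[nan]           # the object put in the dict can be taken out again
direct = (nan == nan)         # the direct comparison still follows IEEE 754
print(count, found, direct)
```

Note that all of this relies on using the same NaN object throughout; a second `float('nan')` would be a different object and would not be found.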
What surprises me is that anyone gets surprised by anything when experimenting
with an object that isn't equal to itself. It is roughly in the same category
as creating a __hash__ that has no relationship to __eq__ or making
self-referencing sets or setting False,True=1,0 in python 2. See
http://bertrandmeyer.com/2010/02/06/reflexivity-and-other-pillars-of-civilization/
for a nice blog post on the subject.
Raymond
Re: [Python-Dev] PyObject_RichCompareBool identity shortcut
On Wed, Apr 27, 2011 at 7:39 AM, Raymond Hettinger
wrote:
>
> On Apr 27, 2011, at 2:37 AM, Hrvoje Niksic wrote:
>
> The other day I was surprised to learn this:
>
> >>> nan = float('nan')
> >>> nan == nan
> False
> >>> [nan] == [nan]
> True # also True in tuples, dicts, etc.
>
> Would also be surprised if you put an object in a dictionary but couldn't
> get it out? Or added it to a list but its count was zero?
> Identity-implies-equality is necessary so that classes can maintain their
> invariants and so that programmers can reason about their code. It is not
> just in PyObject_RichCompareBool, it is deeply embedded in the language (the
> logic inside dicts for example). It is not a short-cut, it is a way of
> making sure that internally we can count on equality relations reflexive,
> symmetric, and transitive. A programmer needs to be able to make basic
> deductions such as the relationship between the two forms of the
> in-operator: for elem in somelist: assert elem in somelist # this should
> never fail.
> What surprises me is that anyone gets surprised by anything when
> experimenting with an object that isn't equal to itself. It is roughly in
> the same category as creating a __hash__ that has no relationship to __eq__
> or making self-referencing sets or setting False,True=1,0 in python 2.
> See http://bertrandmeyer.com/2010/02/06/reflexivity-and-other-pillars-of-civilization/ for
> a nice blog post on the subject.
Maybe we should just call off the odd NaN comparison behavior?
--
--Guido van Rossum (python.org/~guido)
Re: [Python-Dev] PyObject_RichCompareBool identity shortcut
On Thu, Apr 28, 2011 at 12:53 AM, Guido van Rossum wrote:
>> What surprises me is that anyone gets surprised by anything when
>> experimenting with an object that isn't equal to itself. It is roughly in
>> the same category as creating a __hash__ that has no relationship to __eq__
>> or making self-referencing sets or setting False,True=1,0 in python 2.
>> See http://bertrandmeyer.com/2010/02/06/reflexivity-and-other-pillars-of-civilization/ for
>> a nice blog post on the subject.
>
> Maybe we should just call off the odd NaN comparison behavior?

Rereading Meyer's article (I read it last time this came up, but it's
a nice piece, so I ended up going over it again this time) the quote
that leapt out at me was this one:

"""A few of us who had to examine the issue recently think that --
whatever the standard says at the machine level -- a programming
language should support the venerable properties that equality is
reflexive and that assignment yields equality. Every programming
language should decide this on its own; for Eiffel we think this
should be the specification. Do you agree?"""

Currently, Python tries to split the difference: "==" and "!=" follow
IEEE754 for NaN, but most other operations involving builtin types
rely on the assumption that equality is always reflexive (and IEEE754
be damned).

What that means is that "correct" implementations of methods like
__contains__, __eq__, __ne__, index() and count() on containers should
be using "x is y or x == y" to enforce reflexivity, but most such code
does not (e.g. our own collections.abc.Sequence implementation gets
those of these that it implements wrong, and hence Sequence based
containers will handle NaN in a way that differs from the builtin
containers)

And none of that is actually documented anywhere (other than a
behavioural note in the 3.x documentation for
PyObject_RichCompareBool), so it's currently just an implementation
detail of CPython that most of the builtin containers behave that way
in practice.

Given the status quo, what would seem to be the path of least resistance is to:
- articulate in the language specification which container special
  methods are expected to enforce reflexivity of equality (even for
  non-reflexive types)
- articulate in the library specification which ordinary container
  methods enforce reflexivity of equality
- fix any standard library containers that don't enforce reflexivity
  to do so where appropriate (e.g. collections.abc.Sequence)

Types with a non-reflexive notion of equality still wouldn't play
nicely with containers that didn't enforce reflexivity where
appropriate, but bad interactions between 3rd party types isn't really
something we can prevent.

Backing away from having float and decimal.Decimal respect the IEEE754
notion of NaN inequality at this late stage of the game seems like one
for the "too hard" basket. It also wouldn't achieve much, since we
want the builtin containers to preserve their invariants even for 3rd
party types with a non-reflexive notion of equality.

Cheers,
Nick.

--
Nick Coghlan | [email protected] | Brisbane, Australia
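The "x is y or x == y" rule can be sketched as a small pure-Python container. `ReflexiveSeq` and its `_same` helper are hypothetical names invented for this illustration; this is not the code of any standard library class.

```python
class ReflexiveSeq:
    """Hypothetical list wrapper that enforces reflexivity of equality."""

    def __init__(self, items):
        self._items = list(items)

    @staticmethod
    def _same(x, y):
        # Identity first, then equality: the rule described above.
        return x is y or x == y

    def __contains__(self, value):
        return any(self._same(item, value) for item in self._items)

    def index(self, value):
        for position, item in enumerate(self._items):
            if self._same(item, value):
                return position
        raise ValueError('value not in sequence')

    def count(self, value):
        return sum(1 for item in self._items if self._same(item, value))

    def __eq__(self, other):
        if not isinstance(other, ReflexiveSeq):
            return NotImplemented
        return (len(self._items) == len(other._items)
                and all(self._same(a, b)
                        for a, b in zip(self._items, other._items)))


nan = float('nan')
seq = ReflexiveSeq([1.0, nan])
print(nan in seq, seq.index(nan), seq.count(nan), seq == ReflexiveSeq([1.0, nan]))
```

A container written with a bare `item == value` test instead of `_same` would report that the NaN it holds is not a member.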
Re: [Python-Dev] PyObject_RichCompareBool identity shortcut
On Wed, Apr 27, 2011 at 10:53 AM, Guido van Rossum wrote:
..
> Maybe we should just call off the odd NaN comparison behavior?

+1

There was a long thread on this topic last year:

http://mail.python.org/pipermail/python-dev/2010-March/098832.html

I was trying to find a rationale for non-reflexivity of equality in
IEEE and although it is often mentioned that this property simplifies
some numerical algorithms, I am yet to find an important algorithm
that would benefit from it. I also believe that long history of
suboptimal hardware implementations of nan arithmetics has stifled the
development of practical applications.

High performance applications that rely on non-reflexivity will still
have an option of using ctypes.c_float type or NumPy.
Re: [Python-Dev] PyObject_RichCompareBool identity shortcut
On Thu, Apr 28, 2011 at 1:43 AM, Alexander Belopolsky wrote:
> High performance applications that rely on non-reflexivity will still
> have an option of using ctypes.c_float type or NumPy.

However, that's exactly the reason I don't see any reason to reverse
course on having float() and Decimal() follow IEEE754 semantics,
regardless of how irritating we may find those semantics to be.

Since we allow types to customise __eq__ and __ne__ with non-standard
behaviour, if we want to permit *any* type to have a non-reflexive
notion of equality, then we need to write our container types to
enforce reflexivity when appropriate. Many of the builtin types
already do this, by virtue of it being built in to RichCompareBool.
It's now a matter of documenting that properly and updating the
non-conformant types accordingly.

Cheers,
Nick.

--
Nick Coghlan | [email protected] | Brisbane, Australia
Re: [Python-Dev] PyObject_RichCompareBool identity shortcut
On Wed, Apr 27, 2011 at 11:31 AM, Nick Coghlan wrote:
..
> Backing away from having float and decimal.Decimal respect the IEEE754
> notion of NaN inequality at this late stage of the game seems like one
> for the "too hard" basket.
Why? float('nan') has always been in the use-at-your-own-risk
territory despite recent efforts to support it across Python
platforms. I cannot speak about decimal.Decimal (and decimal is a
different story because it is tied to a particular standard), but the
only use of non-reflexivity for float nans I've seen was use of x != x
instead of math.isnan(x).
> It also wouldn't achieve much, since we
> want the builtin containers to preserve their invariants even for 3rd
> party types with a non-reflexive notion of equality.
These are orthogonal issues. A third party type that plays with
__eq__ and other basic operations can easily break stdlib algorithms
no matter what we do. Therefore it is important to document the
properties of the types that each algorithm relies on. It is more
important, however, that stdlib types do not break 3rd party's
algorithms. I don't think I've ever seen a third party type that
deliberately defines a non-reflexive __eq__ except as a side effect of
using float attributes or C float members in the underlying structure.
(Yes, decimal is a counter-example, but this is a very special case.)
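To make the "side effect of float attributes" case concrete, here is a minimal sketch. `Point` is a hypothetical third-party type made up for this example; its __eq__ simply delegates to its float fields.

```python
class Point:
    """Hypothetical third-party type whose equality delegates to float fields."""

    def __init__(self, x, y):
        self.x = float(x)
        self.y = float(y)

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        # Non-reflexive whenever a coordinate is NaN, purely as a side effect.
        return self.x == other.x and self.y == other.y


p = Point(float('nan'), 0.0)
direct = (p == p)        # False: the float comparison leaks through
contained = p in [p]     # True: list membership checks identity first
print(direct, contained)
```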
Re: [Python-Dev] PyObject_RichCompareBool identity shortcut
On Wed, Apr 27, 2011 at 12:05 PM, Isaac Morland wrote:
..
> Of course, the definition of math.isnan cannot then be by checking its
> argument by comparison with itself - it would have to check the appropriate
> bits of the float representation.

math.isnan() is implemented in C and does not rely on float.__eq__ in any way.
Re: [Python-Dev] PyObject_RichCompareBool identity shortcut
On Wed, 27 Apr 2011, Alexander Belopolsky wrote:

> High performance applications that rely on non-reflexivity will still
> have an option of using ctypes.c_float type or NumPy.

Python could also provide IEEE-754 equality as a function (perhaps in
"math"), something like:

def ieee_equal (a, b):
    return a == b and not isnan (a) and not isnan (b)

Of course, the definition of math.isnan cannot then be by checking its
argument by comparison with itself - it would have to check the
appropriate bits of the float representation.

Isaac Morland			CSCF Web Guru
DC 2554C, x36650		WWW Software Specialist
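A runnable form of the proposed function, as a sketch only. It assumes `isnan` means `math.isnan`; no such function exists in the math module under this or any other name as far as this thread states.

```python
from math import isnan


def ieee_equal(a, b):
    """Sketch of the proposed helper: IEEE 754 equality for two floats."""
    # With today's float.__eq__ the isnan checks are redundant; they would
    # only matter if == itself were changed to treat NaN as equal to NaN.
    return a == b and not isnan(a) and not isnan(b)


nan = float('nan')
inf = float('inf')
print(ieee_equal(nan, nan), ieee_equal(inf, inf), ieee_equal(0.0, -0.0))
```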
Re: [Python-Dev] PyObject_RichCompareBool identity shortcut
On Wed, 27 Apr 2011 12:05:12 -0400 (EDT) Isaac Morland wrote:
> On Wed, 27 Apr 2011, Alexander Belopolsky wrote:
>
> > High performance applications that rely on non-reflexivity will still
> > have an option of using ctypes.c_float type or NumPy.
>
> Python could also provide IEEE-754 equality as a function (perhaps in
> "math"), something like:
>
> def ieee_equal (a, b):
>     return a == b and not isnan (a) and not isnan (b)

+1 (perhaps call it math.eq()).

Regards

Antoine.
Re: [Python-Dev] PyObject_RichCompareBool identity shortcut
On Apr 27, 2011, at 7:53 AM, Guido van Rossum wrote:

> Maybe we should just call off the odd NaN comparison behavior?

I'm reluctant to suggest changing such enshrined behavior.

ISTM, the current state of affairs is reasonable.
Exotic objects are allowed to generate exotic behaviors
but consumers of those objects are free to ignore some
of those behaviors by making reasonable assumptions
about how an object should behave.

It's possible to make objects where the __hash__ doesn't
correspond to __eq__; they just won't behave well with
hash tables. Likewise, it's possible for a sequence to
define a __len__ that is different from its true length; it
just won't behave well with the various pieces of code
that assume collections are unequal if the lengths are unequal.

All of this seems reasonable to me.

Raymond
Re: [Python-Dev] PyObject_RichCompareBool identity shortcut
On Wed, 27 Apr 2011, Antoine Pitrou wrote:

> Isaac Morland wrote:
>
>> Python could also provide IEEE-754 equality as a function (perhaps in
>> "math"), something like:
>>
>> def ieee_equal (a, b):
>>     return a == b and not isnan (a) and not isnan (b)
>
> +1 (perhaps call it math.eq()).

Alexander Belopolsky pointed out to me (thanks!) that isnan is implemented
in C so my caveat about the implementation of isnan is not an issue. But
then that made me realize the ieee_equal (or just "eq" if that's
preferable) probably ought to be implemented in C using a floating point
comparison - i.e., use the processor implementation of the comparison
operation.

Isaac Morland			CSCF Web Guru
DC 2554C, x36650		WWW Software Specialist
Re: [Python-Dev] Issue Tracker
Ezio Melotti wrote:
> On 26/04/2011 22.32, Ethan Furman wrote:
>> Okay, I finally found a little time and got roundup installed and
>> operating.
>>
>> Only major complaint at this point is that the issue messages are
>> presented in top-post format (argh).
>>
>> Does anyone know off the top of one's head what to change to put
>> roundup in bottom-post (chronological) format?
>>
>> TIA!
>>
>> ~Ethan~
>
> See line 309 of
> http://svn.python.org/view/tracker/instances/python-dev/html/issue.item.html?view=markup
>
> If you have other questions about Roundup see
> https://lists.sourceforge.net/lists/listinfo/roundup-users

Thanks so much! That was just what I needed.

~Ethan~
Re: [Python-Dev] PyObject_RichCompareBool identity shortcut
On Wed, Apr 27, 2011 at 12:28 PM, Raymond Hettinger wrote:
>
> On Apr 27, 2011, at 7:53 AM, Guido van Rossum wrote:
>
>> Maybe we should just call off the odd NaN comparison behavior?
>
> I'm reluctant to suggest changing such enshrined behavior.
>
> ISTM, the current state of affairs is reasonable.
> Exotic objects are allowed to generate exotic behaviors
> but consumers of those objects are free to ignore some
> of those behaviors by making reasonable assumptions
> about how an object should behave.

Unfortunately NaNs are not that exotic. They can be silently
produced in calculations and lead to hard to find errors. For
example:

>>> x = 1e300*1e300
>>> x - x
nan

This means that every program dealing with float data has to detect
nans at every step and handle them correctly. This in turn makes it
impossible to write efficient code that works equally well with floats
and integers.

Note that historically, Python was trying hard to prevent production
of non-finite floats. AFAICT, none of the math functions would
produce inf or nan. I am not sure why arithmetic operations are
different. For example:

>>> 1e300*1e300
inf

but

>>> 1e300**2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: (34, 'Result too large')

and

>>> math.pow(1e300,2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: math range error
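The same observations can be collected into a script. This is a sketch assuming CPython float semantics as shown in the session above; `raises_overflow` is a helper invented for the example, and the exact wording of the error messages can vary by platform.

```python
import math


def raises_overflow(func):
    """Hypothetical helper: report whether calling func raises OverflowError."""
    try:
        func()
    except OverflowError:
        return True
    return False


big = 1e300
product = big * big      # multiplication overflows silently to inf
difference = product - product   # inf - inf quietly yields nan

power_raises = raises_overflow(lambda: big ** 2)
math_pow_raises = raises_overflow(lambda: math.pow(big, 2))
print(product, difference, power_raises, math_pow_raises)
```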
Re: [Python-Dev] PyObject_RichCompareBool identity shortcut
On Apr 27, 2011, at 10:16 AM, Alexander Belopolsky wrote:

> Unfortunately NaNs are not that exotic.

They're exotic in the sense that they have the unusual property of not
being equal to themselves.

Exotic (adj) strikingly strange or unusual

Raymond
Re: [Python-Dev] PyObject_RichCompareBool identity shortcut
On 4/27/2011 10:53 AM, Guido van Rossum wrote:
> On Wed, Apr 27, 2011 at 7:39 AM, Raymond Hettinger wrote:
>> Identity-implies-equality is necessary so that classes can maintain
>> their invariants and so that programmers can reason about their code.
[snip]
>> See
>> http://bertrandmeyer.com/2010/02/06/reflexivity-and-other-pillars-of-civilization/
>> for a nice blog post on the subject.
I carefully reread this, with the comments, and again came to the
conclusion that the committee left us no *good* answer, only a choice
between various more-or-less unsatisfactory answers. The current Python
compromise may be as good as anything. In any case, I think it should be
explicitly documented with an indexed paragraph, perhaps as follows:
"The IEEE-754 committee defined the float Not_a_Number (NaN) value as
being incomparable with all other floats, including itself. This
violates the math and logic rule that equality is reflexive, that 'a ==
a' is always True. And Python collection classes depend on that rule for
their proper operation. So Python makes the following compromise. Direct
equality comparisons involving NaN, such as "NaN=float('NaN'); NaN ==
ob", follow the IEEE-754 rule and return False. Indirect comparisons
conducted internally as part of a collection operation, such as 'NaN in
someset' or 'seq.count()' or 'somedict[x]', follow the reflexive rule
and act as if 'NaN == NaN' were True. Most Python programmers will never
see a NaN in real programs."
This might best be an entry in the Glossary under "NaN -- Not a Number".
It should be the first reference for NaN in the General Index and linked
to from the float() builtin and float type NaN mentions.
> Maybe we should just call off the odd NaN comparison behavior?
Eiffel seems to have survived, though I do not know if it is used for
numerical work. I wonder how much code would break and what the scipy
folks would think. 3.0 would have been the time, though.
--
Terry Jan Reedy
Re: [Python-Dev] PyObject_RichCompareBool identity shortcut
On 4/27/2011 8:31 AM, Nick Coghlan wrote:
> What that means is that "correct" implementations of methods like
> __contains__, __eq__, __ne__, index() and count() on containers should
> be using "x is y or x == y" to enforce reflexivity, but most such code
> does not (e.g. our own collections.abc.Sequence implementation gets
> those of these that it implements wrong, and hence Sequence based
> containers will handle NaN in a way that differs from the builtin
> containers)

+1 to everything Nick said.

One issue that I don't fully understand: I know there is only one
instance of None in Python, but I'm not sure where to discover whether
there is only a single, or whether there can be multiple, instances of
NaN or Inf. The IEEE 754 spec is clear that there are multiple bit
sequences that can be used to represent these, so I would hope that
there can be, in fact, more than one value containing NaN (and Inf).

This would properly imply that a collection should correctly handle the
case of storing multiple, different items using different NaN (and Inf)
instances. A dict, for example, should be able to hold hundreds of
items with the index value of NaN.

The distinction between "is" and "==" would permit proper operation, and
I believe that Python's "rebinding" of names to values rather than the
copying of values to variables makes such a distinction possible to use
in a correct manner.

Can someone confirm or explain this issue?
Re: [Python-Dev] PyObject_RichCompareBool identity shortcut
On 4/27/11 12:44 PM, Terry Reedy wrote:
> On 4/27/2011 10:53 AM, Guido van Rossum wrote:
>> Maybe we should just call off the odd NaN comparison behavior?
>
> Eiffel seems to have survived, though I do not know if it used for
> numerical work. I wonder how much code would break and what the scipy
> folks would think.

I suspect most of us would oppose changing it on general
backwards-compatibility grounds rather than actually *liking* the current
behavior. If the behavior changed with Python floats, we'd have to mull
over whether we try to match that behavior with our scalar types (one of
which subclasses from float) and our arrays. We would be either
incompatible with Python or C, and we'd probably end up choosing Python to
diverge from. It would make a mess, honestly. We already have to explain
why equality is funky for arrays (arr1 == arr2 is a rich comparison that
gives an array, not a bool, so we can't do containment tests for lists of
arrays), so NaN is pretty easy to explain afterward.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
 that is made terrible by our own mad attempt to interpret it as though it had
 an underlying truth."
  -- Umberto Eco
Re: [Python-Dev] PyObject_RichCompareBool identity shortcut
On 4/27/2011 2:41 PM, Glenn Linderman wrote:
> One issue that I don't fully understand: I know there is only one
> instance of None in Python, but I'm not sure where to discover whether
> there is only a single, or whether there can be multiple, instances of
> NaN or Inf.
I am sure there are multiple instances with just one bit pattern, the
same as other floats. Otherwise, float('nan') would have to either
randomly or systematically choose from among the possibilities. Ugh.
There are functions in the math module that pull apart (and put
together) floats.
> The IEEE 754 spec is clear that there are multiple bit
> sequences that can be used to represent these,
Anyone actually interested in those should use C or possibly the math
module float assembly function.
> so I would hope that
> there can be, in fact, more than one value containing NaN (and Inf).
If you do not know which pattern is which, what use could such possibly be?
--
Terry Jan Reedy
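The "multiple instances with just one bit pattern" point can be checked with a short sketch. It assumes CPython, where each `float('nan')` call returns a new object; the `'<d'` format (little-endian IEEE 754 double) is chosen here for illustration.

```python
import struct

x, y = float('nan'), float('nan')

distinct_objects = x is not y                                # two separate float objects
same_bits = struct.pack('<d', x) == struct.pack('<d', y)     # one bit pattern

# Because key lookup falls back on identity for NaN, both objects
# can live in the same dict as separate keys.
d = {x: 1, y: 2}
print(distinct_objects, same_bits, len(d))
```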
Re: [Python-Dev] PyObject_RichCompareBool identity shortcut
On 4/27/2011 11:31 AM, Nick Coghlan wrote:
> Currently, Python tries to split the difference: "==" and "!=" follow
> IEEE754 for NaN, but most other operations involving builtin types
> rely on the assumption that equality is always reflexive (and IEEE754
> be damned).
>
> What that means is that "correct" implementations of methods like
> __contains__, __eq__, __ne__, index() and count() on containers should
> be using "x is y or x == y" to enforce reflexivity, but most such code
> does not (e.g. our own collections.abc.Sequence implementation gets
> those of these that it implements wrong, and hence Sequence based
> containers will handle NaN in a way that differs from the builtin
> containers)
>
> And none of that is actually documented anywhere (other than a
> behavioural note in the 3.x documentation for
> PyObject_RichCompareBool), so it's currently just an implementation
> detail of CPython that most of the builtin containers behave that way
> in practice.

Which is why I proposed a Glossary entry in another post.

> Given the status quo, what would seem to be the path of least resistance is to:
> - articulate in the language specification which container special
>   methods are expected to enforce reflexivity of equality (even for
>   non-reflexive types)
> - articulate in the library specification which ordinary container
>   methods enforce reflexivity of equality
> - fix any standard library containers that don't enforce reflexivity
>   to do so where appropriate (e.g. collections.abc.Sequence)

+1 to making my proposed text consistently true if not now ;-).

> Backing away from having float and decimal.Decimal respect the IEEE754
> notion of NaN inequality at this late stage of the game seems like one
> for the "too hard" basket.

Robert Kern confirmed my suspicion about this relative to numpy.

> It also wouldn't achieve much, since we
> want the builtin containers to preserve their invariants even for 3rd
> party types with a non-reflexive notion of equality.

Good point.

--
Terry Jan Reedy
Re: [Python-Dev] PyObject_RichCompareBool identity shortcut
On Wed, Apr 27, 2011 at 10:37 AM, Hrvoje Niksic wrote:
> The other day I was surprised to learn this:
>
> >>> nan = float('nan')
> >>> nan == nan
> False
> >>> [nan] == [nan]
> True # also True in tuples, dicts, etc.
That one surprises me a bit too: I knew we were using
identity-then-equality checks for containment (nan in [nan]), but I
hadn't realised identity-then-equality was also used for the
item-by-item comparisons when comparing two lists. It's defensible,
though: [nan] == [nan] should presumably produce the same result as
{nan} == {nan}, and the latter is a test that's arguably based on
containment (for sets s and t, s == t if each element of s is in t,
and vice versa).
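A short sketch of that comparison, assuming CPython's builtin list and set. The second NaN object, `other`, is introduced here only to show that the result depends on identity, not on the value.

```python
nan = float('nan')
other = float('nan')   # a second, distinct NaN object

same_in_lists = [nan] == [nan]      # identity-then-equality, item by item
same_in_sets = {nan} == {nan}       # set equality, based on containment

# With two different NaN objects the identity check no longer applies,
# so both comparisons fall back on nan == nan and fail.
mixed_lists = [nan] == [other]
mixed_sets = {nan} == {other}
print(same_in_lists, same_in_sets, mixed_lists, mixed_sets)
```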
I don't think any of this should change. It seems to me that we've
currently got something approaching the best approximation to
consistency and sanity achievable, given the fundamental
incompatibility of (1) nan breaking reflexivity of equality and (2)
containment being based on equality. That incompatibility is bound to
create inconsistencies somewhere along the line.
Declaring that 'nan == nan' should be True seems attractive in theory,
but I agree that it doesn't really seem like a realistic option in
terms of backwards compatibility and compatibility with other
mainstream languages.
Mark
Re: [Python-Dev] PyObject_RichCompareBool identity shortcut
On Wed, Apr 27, 2011 at 7:41 PM, Glenn Linderman wrote:
> One issue that I don't fully understand: I know there is only one instance
> of None in Python, but I'm not sure where to discover whether there is only
> a single, or whether there can be multiple, instances of NaN or Inf. The
> IEEE 754 spec is clear that there are multiple bit sequences that can be
> used to represent these, so I would hope that there can be, in fact, more
> than one value containing NaN (and Inf).
>
> This would properly imply that a collection should correctly handle the case
> of storing multiple, different items using different NaN (and Inf)
> instances. A dict, for example, should be able to hold hundreds of items
> with the index value of NaN.
>
> The distinction between "is" and "==" would permit proper operation, and I
> believe that Python's "rebinding" of names to values rather than the copying
> of values to variables makes such a distinction possible to use in a correct
> manner.
For infinities, there's no issue: there are exactly two distinct
infinities (+inf and -inf), and they don't have any special properties
that affect membership tests. Your float-keyed dict can contain both
+inf and -inf keys, or just one, or neither, in exactly the same way
that it can contain both +5.0 and -5.0 as keys, or just one, or
neither.
For nans, you *can* put multiple nans into a dictionary as separate
keys, but under the current rules the test for 'sameness' of two nan
keys becomes a test of object identity, not of bitwise equality.
Python takes no notice of the sign bits and 'payload' bits of a float
nan, except in operations like struct.pack and struct.unpack. For
example:
>>> x, y = float('nan'), float('nan')
>>> d = {x: 1, y:2}
>>> x in d
True
>>> y in d
True
>>> d[x]
1
>>> d[y]
2
But using struct.pack, you can see that x and y are bitwise identical:
>>> struct.pack('
[remainder of message missing from the archive]
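The struct.pack example above is cut off, so the following is only a sketch of the kind of check it seems to be making. The format string '<d' (little-endian IEEE 754 double) is an assumption, not recovered from the original message, and whether the two byte strings match is platform behaviour rather than a language guarantee.

```python
import struct

x, y = float('nan'), float('nan')
d = {x: 1, y: 2}          # two distinct nan objects give two separate keys

# '<d' is an assumed format; the one used in the original message was lost.
bits_x = struct.pack('<d', x)
bits_y = struct.pack('<d', y)

# On typical CPython builds both nans carry the same bit pattern,
# even though x and y are different objects and x != y.
bitwise_identical = bits_x == bits_y
```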
[Python-Dev] Socket servers in the test suite
I've been recently trying to improve the test coverage for the logging package,
and have got to a not unreasonable point:

logging/__init__.py 99% (96%)
logging/config.py 89% (85%)
logging/handlers.py 60% (54%)

where the figures in parentheses include branch coverage measurements.

I'm at the point where to appreciably increase coverage, I'd need to write some
test servers to exercise client code in SocketHandler, DatagramHandler and
HTTPHandler.

I notice there are no utility classes in test.support to help with this kind of
thing - would there be any mileage in adding such things? Of course I could add
test server code just to test_logging (which already contains some socket server
code to exercise the configuration functionality), but rolling a test server
involves boilerplate such as using a custom RequestHandler-derived class for
each application. I had in mind a more streamlined approach where you can just
pass a single callable to a server to handle requests, e.g. as outlined in

https://gist.github.com/945157

I'd be grateful for any comments about adding such functionality to e.g.
test.support.

Regards,

Vinay Sajip
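To make the "pass a single callable to a server" idea concrete, here is a minimal sketch built on the standard socketserver module. It is not the code from the gist linked above; the names `CallableTCPServer`, `echo_upper` and `demo` are hypothetical and exist only for illustration.

```python
import socket
import socketserver
import threading


class CallableTCPServer(socketserver.TCPServer):
    """Hypothetical helper: a TCP server that hands each request to one callable."""

    allow_reuse_address = True

    def __init__(self, address, handle_request_func):
        # Build the RequestHandler subclass here so callers never write one.
        class _Handler(socketserver.StreamRequestHandler):
            def handle(self):
                handle_request_func(self)

        super().__init__(address, _Handler)


def echo_upper(handler):
    """Example callable: read one line and send it back upper-cased."""
    line = handler.rfile.readline()
    handler.wfile.write(line.upper())


def demo():
    """Start the server on an ephemeral port, make one request, shut down."""
    server = CallableTCPServer(("127.0.0.1", 0), echo_upper)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        with socket.create_connection(server.server_address, timeout=5) as conn:
            conn.sendall(b"ping\n")
            return conn.makefile("rb").readline()
    finally:
        server.shutdown()
        server.server_close()
        thread.join()
```

A test would then only supply the callable; binding to port 0 lets the operating system pick a free port, which avoids clashes between tests.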
Re: [Python-Dev] PyObject_RichCompareBool identity shortcut
On 4/27/2011 2:15 PM, Mark Dickinson wrote:
> On Wed, Apr 27, 2011 at 7:41 PM, Glenn Linderman wrote:
>> One issue that I don't fully understand: I know there is only one instance
>> of None in Python, but I'm not sure where to discover whether there is only
>> a single, or whether there can be multiple, instances of NaN or Inf. The
>> IEEE 754 spec is clear that there are multiple bit sequences that can be
>> used to represent these, so I would hope that there can be, in fact, more
>> than one value containing NaN (and Inf).
>>
>> This would properly imply that a collection should correctly handle the case
>> of storing multiple, different items using different NaN (and Inf)
>> instances. A dict, for example, should be able to hold hundreds of items
>> with the index value of NaN.
>>
>> The distinction between "is" and "==" would permit proper operation, and I
>> believe that Python's "rebinding" of names to values rather than the copying
>> of values to variables makes such a distinction possible to use in a correct
>> manner.
>
> For infinities, there's no issue: there are exactly two distinct
> infinities (+inf and -inf), and they don't have any special properties
> that affect membership tests. Your float-keyed dict can contain both
> +inf and -inf keys, or just one, or neither, in exactly the same way
> that it can contain both +5.0 and -5.0 as keys, or just one, or
> neither.
>
> For nans, you *can* put multiple nans into a dictionary as separate
> keys, but under the current rules the test for 'sameness' of two nan
> keys becomes a test of object identity, not of bitwise equality.
> Python takes no notice of the sign bits and 'payload' bits of a float
> nan, except in operations like struct.pack and struct.unpack. For
> example:

Thanks, Mark, for the succinct description and demonstration. Yes, only
two Inf values, many possible NaNs. And this is what I would expect.
I would not, however, expect the original case that was described:
>>> nan = float('nan')
>>> nan == nan
False
>>> [nan] == [nan]
True # also True in tuples, dicts, etc.
Re: [Python-Dev] PyObject_RichCompareBool identity shortcut
Guido van Rossum wrote:
> Maybe we should just call off the odd NaN comparison behavior?

That's probably as good an idea as anything.

The weirdness of NaNs is supposed to ensure that they propagate
through a computation as a kind of exception signal. But to make
that work properly, comparing two NaNs should really give you
a NaB (Not a Boolean). As long as we're not doing that, we might
as well treat NaNs sanely as Python objects.

--
Greg
Re: [Python-Dev] PyObject_RichCompareBool identity shortcut
On 4/27/2011 2:04 PM, Mark Dickinson wrote:
> On Wed, Apr 27, 2011 at 10:37 AM, Hrvoje Niksic wrote:
>> The other day I was surprised to learn this:
>>
>> >>> nan = float('nan')
>> >>> nan == nan
>> False
>> >>> [nan] == [nan]
>> True # also True in tuples, dicts, etc.
>
> That one surprises me a bit too: I knew we were using
> identity-then-equality checks for containment (nan in [nan]), but I
> hadn't realised identity-then-equality was also used for the
> item-by-item comparisons when comparing two lists. It's defensible,
> though: [nan] == [nan] should presumably produce the same result as
> {nan} == {nan}, and the latter is a test that's arguably based on
> containment (for sets s and t, s == t if each element of s is in t,
> and vice versa).
>
> I don't think any of this should change. It seems to me that we've
> currently got something approaching the best approximation to
> consistency and sanity achievable, given the fundamental
> incompatibility of (1) nan breaking reflexivity of equality and (2)
> containment being based on equality. That incompatibility is bound to
> create inconsistencies somewhere along the line.
>
> Declaring that 'nan == nan' should be True seems attractive in theory,
> but I agree that it doesn't really seem like a realistic option in
> terms of backwards compatibility and compatibility with other
> mainstream languages.

I think it should change. Inserting a NaN, even the same instance of
NaN into a list shouldn't suddenly make it compare equal to itself,
especially since the docs (section 5.9. Comparisons) say:
  * Tuples and lists are compared lexicographically using comparison
    of corresponding elements. This means that to compare equal, each
    element must compare equal and the two sequences must be of the
    same type and have the same length.

    If not equal, the sequences are ordered the same as their first
    differing elements. For example, [1,2,x] <= [1,2,y] has the same
    value as x <= y. If the corresponding element does not exist, the
    shorter sequence is ordered first (for example, [1,2] < [1,2,3]).

The principle of least surprise says that if two unequal items are
inserted into otherwise equal lists, the lists should be unequal. NaN
is unequal to itself.
Re: [Python-Dev] PyObject_RichCompareBool identity shortcut
Guido van Rossum wrote:
> Maybe we should just call off the odd NaN comparison behavior?

This doesn't solve the broader problem that *any* type might
deliberately define non-reflexive equality, and therefore people will
still be surprised by

>>> x = SomeObject()
>>> x == x
False
>>> [x] == [x]
True

The "problem" (if it is a problem) here is list, not NANs. Please don't
break NANs to not-fix a problem with list.

Since we can't (can we?) prohibit non-reflexivity, and even if we can,
we shouldn't, reasonable solutions are:

(1) live with the fact that lists and other built-in containers will
short-cut equality with identity for speed, ignoring __eq__;

(2) slow containers down by guaranteeing that they will use __eq__;
(but how much will it actually hurt performance for real-world cases?
and this will have the side-effect that non-reflexivity will propagate
to containers)

(3) allow types to register that they are non-reflexive, allowing
containers to skip the identity shortcut when necessary.
(but it is not clear to me that the extra complexity will be worth the
cost)

My vote is the status quo, (1).

--
Steven
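The point that this is not specific to NaN can be reproduced with any user-defined type. A minimal sketch, where `NeverEqual` is a hypothetical class standing in for the SomeObject of the example:

```python
class NeverEqual:
    """Hypothetical type whose equality is deliberately non-reflexive."""

    def __eq__(self, other):
        return False

    # Defining __eq__ disables the default hash; restore it so instances
    # can still be used as dict keys and set members.
    __hash__ = object.__hash__


x = NeverEqual()

direct = x == x              # calls __eq__, so False
in_list = [x] == [x]         # containers check identity first, so True
in_tuple = (x,) == (x,)
membership = x in [x]
as_key = {x: 1} == {x: 1}
```

This corresponds to option (1) in the list of solutions: the built-in containers short-cut on identity and never reach `__eq__` when both sides hold the same object.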
Re: [Python-Dev] PyObject_RichCompareBool identity shortcut
Terry Reedy wrote:
> On 4/27/2011 2:41 PM, Glenn Linderman wrote:
>> One issue that I don't fully understand: I know there is only one
>> instance of None in Python, but I'm not sure where to discover whether
>> there is only a single, or whether there can be multiple, instances of
>> NaN or Inf.
>
> I am sure there are multiple instances with just one bit pattern, the
> same as other floats. Otherwise, float('nan') would have to either
> randomly or systematically choose from among the possibilities. Ugh.

I think Glenn is asking whether NANs are singletons. They're not:
>>> x = float('nan')
>>> y = float('nan')
>>> x is y
False
>>> [x] == [y]
False

> There are functions in the math module that pull apart (and put
> together) floats.
>
>> The IEEE 754 spec is clear that there are multiple bit
>> sequences that can be used to represent these,
>
> Anyone actually interested in those should use C or possibly the math
> module float assembly function.

I'd like to point out that way back in the 1980s, Apple's Hypercard
allowed users to construct, and compare, distinct NANs without needing
to use C or check bit patterns. I think it is painful and ironic that a
development system aimed at non-programmers released by a company
notorious for "dumbing down" interfaces over 20 years ago had better and
simpler support for NANs than we have now.
--
Steven
Re: [Python-Dev] PyObject_RichCompareBool identity shortcut
Greg Ewing wrote:
> Guido van Rossum wrote:
>> Maybe we should just call off the odd NaN comparison behavior?
>
> That's probably as good an idea as anything.
>
> The weirdness of NaNs is supposed to ensure that they propagate
> through a computation as a kind of exception signal. But to make
> that work properly, comparing two NaNs should really give you
> a NaB (Not a Boolean). As long as we're not doing that, we might
> as well treat NaNs sanely as Python objects.

That doesn't follow. You can compare NANs, and the results of the
comparisons are perfectly well defined as either True or False. There's
no need for a NAB comparison flag.

--
Steven
Re: [Python-Dev] [Python-checkins] cpython: PyGILState_Ensure(), PyGILState_Release(), PyGILState_GetThisThreadState() are
Would it be a problem to make them available as no-ops?
On 4/26/11, victor.stinner wrote:
> http://hg.python.org/cpython/rev/75503c26a17f
> changeset: 69584:75503c26a17f
> user:Victor Stinner
> date:Tue Apr 26 23:34:58 2011 +0200
> summary:
> PyGILState_Ensure(), PyGILState_Release(), PyGILState_GetThisThreadState()
> are
> not available if Python is compiled without threads.
>
> files:
> Include/pystate.h | 10 +++---
> 1 files changed, 7 insertions(+), 3 deletions(-)
>
>
> diff --git a/Include/pystate.h b/Include/pystate.h
> --- a/Include/pystate.h
> +++ b/Include/pystate.h
> @@ -73,9 +73,9 @@
> struct _frame *frame;
> int recursion_depth;
> char overflowed; /* The stack has overflowed. Allow 50 more calls
> - to handle the runtime error. */
> -char recursion_critical; /* The current calls must not cause
> - a stack overflow. */
> +to handle the runtime error. */
> +char recursion_critical; /* The current calls must not cause
> +a stack overflow. */
> /* 'tracing' keeps track of the execution depth when tracing/profiling.
> This is to prevent the actual trace/profile code from being recorded
> in
> the trace/profile. */
> @@ -158,6 +158,8 @@
> enum {PyGILState_LOCKED, PyGILState_UNLOCKED}
> PyGILState_STATE;
>
> +#ifdef WITH_THREAD
> +
> /* Ensure that the current thread is ready to call the Python
> C API, regardless of the current state of Python, or of its
> thread lock. This may be called as many times as desired
> @@ -199,6 +201,8 @@
> */
> PyAPI_FUNC(PyThreadState *) PyGILState_GetThisThreadState(void);
>
> +#endif /* #ifdef WITH_THREAD */
> +
> /* The implementation of sys._current_frames() Returns a dict mapping
> thread id to that thread's current frame.
> */
>
> --
> Repository URL: http://hg.python.org/cpython
>
Re: [Python-Dev] PyObject_RichCompareBool identity shortcut
Mark Dickinson wrote:
> On Wed, Apr 27, 2011 at 10:37 AM, Hrvoje Niksic wrote:
>> The other day I was surprised to learn this:
>>
>> >>> nan = float('nan')
>> >>> nan == nan
>> False
>> >>> [nan] == [nan]
>> True # also True in tuples, dicts, etc.
>
> That one surprises me a bit too: I knew we were using
> identity-then-equality checks for containment (nan in [nan]), but I
> hadn't realised identity-then-equality was also used for the
> item-by-item comparisons when comparing two lists. It's defensible,
> though: [nan] == [nan] should presumably produce the same result as
> {nan} == {nan}, and the latter is a test that's arguably based on
> containment (for sets s and t, s == t if each element of s is in t,
> and vice versa).
>
> I don't think any of this should change. It seems to me that we've
> currently got something approaching the best approximation to
> consistency and sanity achievable, given the fundamental
> incompatibility of (1) nan breaking reflexivity of equality and (2)
> containment being based on equality. That incompatibility is bound to
> create inconsistencies somewhere along the line.
>
> Declaring that 'nan == nan' should be True seems attractive in theory,
> but I agree that it doesn't really seem like a realistic option in
> terms of backwards compatibility and compatibility with other
> mainstream languages.

Totally out of my depth, but what if a NaN object was allowed to
compare equal to itself, but different NaN objects still compared
unequal? If NaN was a singleton then the current behavior makes more
sense, but since we get a new NaN with each instance creation is there
really a good reason why the same NaN can't be equal to itself?
~Ethan~
Re: [Python-Dev] PyObject_RichCompareBool identity shortcut
On 4/27/2011 5:05 PM, Steven D'Aprano wrote:
> (2) slow containers down by guaranteeing that they will use __eq__;
> (but how much will it actually hurt performance for real-world cases?
> and this will have the side-effect that non-reflexivity will propagate
> to containers)

I think it is perfectly reasonable that containers containing items with
non-reflexive equality should sometimes have non-reflexive equality also
(depends on the placement of the item in the container, and the values
of other items, whether the non-reflexive equality of an internal item
will actually affect the equality of the container in practice).
I quoted the docs for tuple and list comparisons in a different part of
this thread, and for those types, the docs are very clear that the items
must compare equal for the lists or tuples to compare equal. For other
built-in types, the docs are less clear:
  * Mappings (dictionaries) compare equal if and only if they have the
    same (key, value) pairs. Order comparisons ('<', '<=', '>=', '>')
    raise TypeError.

So we can immediately conclude that mappings do not provide an ordering
for sorts. But, the language "same (key, value)" pairs implies identity
comparisons, rather than equality comparisons. But in practice,
equality is used sometimes, and identity sometimes:
>>> nan = float('NaN')
>>> d1 = dict( a=1, nan=2 )
>>> d2 = dict( a=1, nan=2.0 )
>>> d1 == d2
True
>>> 2 is 2.0
False
"nan" and "nan" are being compared using identity, 2 and 2.0 by
equality. While that may be clear to those of you that know the
implementation (and even have described it somewhat in this thread), it
is certainly not clear in the docs. And I think it should read much
more like lists and tuples... "if all the (key, value) pairs, considered
as tuples, are equal".
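One caveat about the session above: `dict(a=1, nan=2)` creates the *string* key 'nan', not a key holding the nan object, so that comparison never involves a NaN at all. The following sketch, with variable names chosen only for illustration, uses real nan keys and shows the mix of identity and equality that the paragraph describes.

```python
nan = float('nan')

# Keyword arguments become string keys, so build the dicts literally.
string_keys = list(dict(a=1, nan=2))      # ['a', 'nan']

d1 = {'a': 1, nan: 2}
d2 = {'a': 1, nan: 2.0}
d3 = {'a': 1, float('nan'): 2}            # a different nan object as key

same_nan_object = d1 == d2    # key found by identity, values 2 == 2.0 by equality
other_nan_object = d1 == d3   # a second nan is never found as a key
```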
  * Sets and frozensets define comparison operators to mean subset and
    superset tests. Those relations do not define total orderings (the
    two sets {1,2} and {2,3} are not equal, nor subsets of one
    another, nor supersets of one another). Accordingly, sets are not
    appropriate arguments for functions which depend on total
    ordering. For example, min(), max(), and sorted() produce
    undefined results given a list of sets as inputs.

This clearly talks about sets and subsets, but it doesn't define those
concepts well in this section. It should refer to where that concept
is defined, perhaps. The intuitive definition of "subset" to me is if,
for every item in set A, if an equal item is found in set B, then set A
is a subset of set B. That's what I learned back in math classes.
Since NaN is not equal to NaN, however, I would not expect a set
containing NaN to compare equal to any other set.
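For comparison with that expectation, this is a short sketch of what CPython's sets actually do with nan members, as far as I can tell; the variable names are illustrative only.

```python
nan = float('nan')
other_nan = float('nan')

shared = {nan} == {nan}            # same object on both sides
distinct = {nan} == {other_nan}    # two different nan objects
subset = {nan} <= {nan, 1.0}       # subset test also finds nan by identity
both = {nan, other_nan}            # neither nan displaces the other
```

So a set containing a nan compares equal to another set only when both hold the very same nan object, which is the identity shortcut again rather than the mathematical definition.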
Re: [Python-Dev] PyObject_RichCompareBool identity shortcut
On 4/27/2011 6:11 PM, Ethan Furman wrote:
> Mark Dickinson wrote:
>> On Wed, Apr 27, 2011 at 10:37 AM, Hrvoje Niksic wrote:
>>> The other day I was surprised to learn this:
>>>
>>> >>> nan = float('nan')
>>> >>> nan == nan
>>> False
>>> >>> [nan] == [nan]
>>> True # also True in tuples, dicts, etc.
>>
>> That one surprises me a bit too: I knew we were using
>> identity-then-equality checks for containment (nan in [nan]), but I
>> hadn't realised identity-then-equality was also used for the
>> item-by-item comparisons when comparing two lists. It's defensible,
>> though: [nan] == [nan] should presumably produce the same result as
>> {nan} == {nan}, and the latter is a test that's arguably based on
>> containment (for sets s and t, s == t if each element of s is in t,
>> and vice versa).
>>
>> I don't think any of this should change. It seems to me that we've
>> currently got something approaching the best approximation to
>> consistency and sanity achievable, given the fundamental
>> incompatibility of (1) nan breaking reflexivity of equality and (2)
>> containment being based on equality. That incompatibility is bound to
>> create inconsistencies somewhere along the line.
>>
>> Declaring that 'nan == nan' should be True seems attractive in theory,
>> but I agree that it doesn't really seem like a realistic option in
>> terms of backwards compatibility and compatibility with other
>> mainstream languages.
>
> Totally out of my depth, but what if a NaN object was allowed to
> compare equal to itself, but different NaN objects still compared
> unequal? If NaN was a singleton then the current behavior makes more
> sense, but since we get a new NaN with each instance creation is there
> really a good reason why the same NaN can't be equal to itself?

>>> n1 = float('NaN')
>>> n2 = float('NaN')
>>> n3 = n1
>>> n1
nan
>>> n2
nan
>>> n3
nan
>>> [n1] == [n2]
False
>>> [n1] == [n3]
True
This is the current situation: some NaNs compare equal sometimes, and
some don't. And unless you are particularly aware of the identity of
the object containing the NaN (not the list, but the particular NaN
value) it is surprising and confusing, because the mathematical
definition of NaN is that it should not be equal to itself.
Re: [Python-Dev] PyObject_RichCompareBool identity shortcut
On 4/27/2011 6:15 PM, Glenn Linderman wrote:
> I think it is perfectly reasonable that containers containing items
> with non-reflexive equality should sometimes have non-reflexive
> equality also (depends on the placement of the item in the container,
> and the values of other items, whether the non-reflexive equality of
> an internal item will actually affect the equality of the container in
> practice).

Pardon me, please ignore the parenthetical statement... it was really
inspired by inequality comparisons, not equality comparisons.
Re: [Python-Dev] Socket servers in the test suite
On Thu, Apr 28, 2011 at 7:23 AM, Vinay Sajip wrote:
> I've been recently trying to improve the test coverage for the logging
> package, and have got to a not unreasonable point:
>
> logging/__init__.py 99% (96%)
> logging/config.py 89% (85%)
> logging/handlers.py 60% (54%)
>
> where the figures in parentheses include branch coverage measurements.
>
> I'm at the point where to appreciably increase coverage, I'd need to
> write some test servers to exercise client code in SocketHandler,
> DatagramHandler and HTTPHandler.
>
> I notice there are no utility classes in test.support to help with this
> kind of thing - would there be any mileage in adding such things? Of
> course I could add test server code just to test_logging (which already
> contains some socket server code to exercise the configuration
> functionality), but rolling a test server involves boilerplate such as
> using a custom RequestHandler-derived class for each application. I had
> in mind a more streamlined approach where you can just pass a single
> callable to a server to handle requests, e.g. as outlined in
>
> https://gist.github.com/945157
>
> I'd be grateful for any comments about adding such functionality to
> e.g. test.support.

If you poke around in the test directory a bit, you may find there is
already some code along these lines in other tests (e.g. I'm pretty
sure the urllib tests already fire up a local server). Starting down
the path of standardisation of that test functionality would be good.

For larger components like this, it's also reasonable to add a
dedicated helper module rather than using test.support directly. I
started (and Antoine improved) something along those lines with the
test.script_helper module for running Python subprocesses and checking
their output, although it lacks documentation and there are lots of
older tests that still use subprocess directly.

Cheers,
Nick.

--
Nick Coghlan | [email protected] | Brisbane, Australia
Re: [Python-Dev] PyObject_RichCompareBool identity shortcut
Glenn Linderman writes:
> I would not, however expect the original case that was described:
> >>> nan = float('nan')
> >>> nan == nan
> False
> >>> [nan] == [nan]
> True # also True in tuples, dicts, etc.
Are you saying you would expect that
>>> nan = float('nan')
>>> a = [1, ..., 499, nan, 501, ..., 999]# meta-ellipsis, not Ellipsis
>>> a == a
False
??
I wouldn't even expect
>>> a = [1, ..., 499, float('nan'), 501, ..., 999]
>>> b = [1, ..., 499, float('nan'), 501, ..., 999]
>>> a == b
False
but I guess I have to live with that. While I wouldn't apply it
to other people, I have to admit Raymond's aphorism applies to me (the
surprising thing is not the behavior of NaNs, but that I'm surprised
by anything that happens in the presence of NaNs!)
Re: [Python-Dev] PyObject_RichCompareBool identity shortcut
Mark Dickinson writes:
> Declaring that 'nan == nan' should be True seems attractive in
> theory,

No, it's intuitively attractive, but that's because humans like nice
continuous behavior. In *theory*, it's true that some singularities
are removable, and the NaN that occurs when evaluating at that point
is actually definable in a broader context, but the point of NaN is
that some singularities are *not* removable. This is somewhat
Pythonic: "In the presence of ambiguity, refuse to guess."
Re: [Python-Dev] PyObject_RichCompareBool identity shortcut
Glenn Linderman writes:
> On 4/27/2011 6:11 PM, Ethan Furman wrote:
> > Totally out of my depth, but what if a NaN object was allowed to
> > compare equal to itself, but different NaN objects still compared
> > unequal? If NaN was a singleton then the current behavior makes more
> > sense, but since we get a new NaN with each instance creation is there
> > really a good reason why the same NaN can't be equal to itself?
Yes. A NaN is a special object that means "the computation that
produced this object is undefined." For example, consider the
computation 1/x at x = 0. If you approach from the left, 1/0
"obviously" means minus infinity, while if you approach from the right
just as obviously it means plus infinity. So what does the 1/0 that
occurs in [1/x for x in range(-5, 6)] mean? In what sense is it
"equal to itself"? How can something which is not a number be
compared for numerical equality?
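A small caveat for anyone trying the example above in Python itself: float division by zero raises ZeroDivisionError rather than returning a nan, so the list comprehension does not actually produce one. The sketch below shows that, and two operations on infinities that do yield nan under IEEE 754 rules; the variable names are illustrative only.

```python
import math

inf = float('inf')

# Python raises for 1/0 instead of producing an IEEE 754 special value.
try:
    [1 / x for x in range(-5, 6)]
    raised = False
except ZeroDivisionError:
    raised = True

# Computations with no sensible numeric answer do give nan.
undefined = inf - inf
also_undefined = inf * 0.0
```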
> >>> n1 = float('NaN')
> >>> n2 = float('NaN')
> >>> n3 = n1
>
> >>> n1
> nan
> >>> n2
> nan
> >>> n3
> nan
>
> >>> [n1] == [n2]
> False
> >>> [n1] == [n3]
> True
>
> This is the current situation: some NaNs compare equal sometimes, and
> some don't.
No, Ethan is asking for "n1 == n3" => True. As Mark points out, "[n1]
== [n3]" can be interpreted as a containment question, rather than an
equality question, with respect to the NaNs themselves. In standard
set theory, these are the same question, but that's not necessarily so
in other set-like toposes. In particular, getting equality and set
membership to behave reasonably with respect to each other is one of the
problems faced in developing a workable theory of fuzzy sets.
I don't think it matters what behavior you choose for NaNs; somebody
is going to be unhappy sometimes.
Re: [Python-Dev] PyObject_RichCompareBool identity shortcut
On Wed, Apr 27, 2011 at 9:28 AM, Raymond Hettinger wrote:
>
> On Apr 27, 2011, at 7:53 AM, Guido van Rossum wrote:
>
>> Maybe we should just call off the odd NaN comparison behavior?
>
> I'm reluctant to suggest changing such enshrined behavior.

No doubt there would be some problems; probably more for decimals than
for floats.

> ISTM, the current state of affairs is reasonable.

Hardly; when I picked the NaN behavior I knew the IEEE std prescribed
it but had never seen any code that used this.

> Exotic objects are allowed to generate exotic behaviors
> but consumers of those objects are free to ignore some
> of those behaviors by making reasonable assumptions
> about how an object should behave.

I'd say that the various issues and inconsistencies brought up (e.g. x
in A even though no a in A equals x) make it clear that one ignores
NaN's exoticness at one's peril.

> It's possible to make objects where the __hash__ doesn't
> correspond to __eq__; they just won't behave well with
> hash tables.

That's not the same thing at all. Such an object would violate a rule
of the language (although one that Python cannot strictly enforce) and
it would always be considered a bug. Currently NaN is not violating
any language rules -- it is just violating users' intuition, in a much
worse way than Inf does. (All in all, Inf behaves pretty intuitively,
at least for someone who was awake during at least a few high school
math classes. NaN is not discussed there. :-)

> Likewise, it's possible for a sequence to
> define a __len__ that is different from its true length; it
> just won't behave well with the various pieces of code
> that assume collections are equal if the lengths are unequal.

(you probably meant "are never equal")

Again, typically a bug.

> All of this seems reasonable to me.

Given the IEEE std and Python's history, it's defensible and hard to
change, but still, I find reasonable too strong a word for the
situation.

I expect that if 15 years or so ago I had decided to ignore the
IEEE std and declare that object identity always implies equality it
would have seemed quite reasonable as well... The rule could be
something like "the == operator first checks for identity and if left
and right are the same object, the answer is True without calling the
object's __eq__ method; similarly the != would always return False
when an object is compared to itself". We wouldn't change the
inequalities, nor the outcome if a NaN is compared to another NaN (not
the same object). But we would extend the special case for object
identity from containers to all == and != operators. (Currently it
seems that all NaNs have a hash() of 0. That hasn't hurt anyone so
far.)

Doing this in 3.3 would, alas, be a huge undertaking -- I expect that
there are tons of unittests that depend either on the current NaN
behavior or on x == x calling x.__eq__(x). Plus the decimal unittests
would be affected. Perhaps somebody could try?

--
--Guido van Rossum (python.org/~guido)
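The rule floated in this message can be modelled in a few lines. This is only a sketch of the proposal as worded above, not anything Python implements for the == operator; `proposed_eq` and `proposed_ne` are hypothetical names.

```python
def proposed_eq(a, b):
    """Model of the proposed '==': identity answers True without __eq__."""
    if a is b:
        return True
    return a == b


def proposed_ne(a, b):
    """Model of the proposed '!=': an object is never unequal to itself."""
    if a is b:
        return False
    return a != b


nan = float('nan')
other_nan = float('nan')
```

Under this model a nan would equal itself, two distinct nan objects would still compare unequal, and the ordering comparisons (<, <=, >, >=) would be left untouched, as the message says.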
Re: [Python-Dev] PyObject_RichCompareBool identity shortcut
On Wed, Apr 27, 2011 at 11:48 AM, Robert Kern wrote:
> On 4/27/11 12:44 PM, Terry Reedy wrote:
>>
>> On 4/27/2011 10:53 AM, Guido van Rossum wrote:
>
>>> Maybe we should just call off the odd NaN comparison behavior?
>>
>> Eiffel seems to have survived, though I do not know if it is used for
>> numerical work. I wonder how much code would break and what the scipy
>> folks would think.
>
> I suspect most of us would oppose changing it on general
> backwards-compatibility grounds rather than actually *liking* the current
> behavior. If the behavior changed with Python floats, we'd have to mull over
> whether we try to match that behavior with our scalar types (one of which
> subclasses from float) and our arrays. We would be either incompatible with
> Python or C, and we'd probably end up choosing Python to diverge from. It
> would make a mess, honestly. We already have to explain why equality is
> funky for arrays (arr1 == arr2 is a rich comparison that gives an array, not
> a bool, so we can't do containment tests for lists of arrays), so NaN is
> pretty easy to explain afterward.

So does NumPy also follow Python's behavior about ignoring the NaN
special-casing when doing array ops?

--
--Guido van Rossum (python.org/~guido)
Re: [Python-Dev] PyObject_RichCompareBool identity shortcut
On 2011-04-27 22:16 , Guido van Rossum wrote:
> On Wed, Apr 27, 2011 at 11:48 AM, Robert Kern wrote:
>> On 4/27/11 12:44 PM, Terry Reedy wrote:
>>> On 4/27/2011 10:53 AM, Guido van Rossum wrote:
>>>> Maybe we should just call off the odd NaN comparison behavior?
>>>
>>> Eiffel seems to have survived, though I do not know if it is used for
>>> numerical
>>> work. I wonder how much code would break and what the scipy folks would
>>> think.
>>
>> I suspect most of us would oppose changing it on general
>> backwards-compatibility grounds rather than actually *liking* the current
>> behavior. If the behavior changed with Python floats, we'd have to mull over
>> whether we try to match that behavior with our scalar types (one of which
>> subclasses from float) and our arrays. We would be either incompatible with
>> Python or C, and we'd probably end up choosing Python to diverge from. It
>> would make a mess, honestly. We already have to explain why equality is
>> funky for arrays (arr1 == arr2 is a rich comparison that gives an array, not
>> a bool, so we can't do containment tests for lists of arrays), so NaN is
>> pretty easy to explain afterward.
>
> So does NumPy also follow Python's behavior about ignoring the NaN
> special-casing when doing array ops?

By "ignoring the NaN special-casing", do you mean that identity is checked
first? When we use dtype=object arrays (arrays that contain Python objects as
their data), yes:
[~]
|1> nan = float('nan')
[~]
|2> import numpy as np
[~]
|3> a = np.array([1, 2, nan], dtype=object)
[~]
|4> nan in a
True
[~]
|5> float('nan') in a
False
Just like lists:
[~]
|6> nan in [1, 2, nan]
True
[~]
|7> float('nan') in [1, 2, nan]
False
Actually, we go a little further by using PyObject_RichCompareBool() rather than
PyObject_RichCompare() to implement the array-wise comparisons in addition to
containment:
[~]
|8> a == nan
array([False, False, True], dtype=bool)
[~]
|9> [x == nan for x in [1, 2, nan]]
[False, False, False]
But for dtype=float arrays (which contain C doubles, not Python objects) we use
C semantics. Literally, we use whatever C's == operator gives us for the two
double values. Since there is no concept of identity for this case, there is no
cognate behavior of Python to match.
[~]
|10> b = np.array([1.0, 2.0, nan], dtype=float)
[~]
|11> b == nan
array([False, False, False], dtype=bool)
[~]
|12> nan in b
False
--
Robert Kern
"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco
Re: [Python-Dev] PyObject_RichCompareBool identity shortcut
On Thu, Apr 28, 2011 at 12:42 PM, Stephen J. Turnbull wrote:
> Mark Dickinson writes:
>
> > Declaring that 'nan == nan' should be True seems attractive in
> > theory,
>
> No, it's intuitively attractive, but that's because humans like nice
> continuous behavior. In *theory*, it's true that some singularities
> are removable, and the NaN that occurs when evaluating at that point
> is actually definable in a broader context, but the point of NaN is
> that some singularities are *not* removable. This is somewhat
> Pythonic: "In the presence of ambiguity, refuse to guess."

Refusing to guess in this case would be to treat all NaNs as
signalling NaNs, and that wouldn't be good, either :)

I like Terry's suggestion for a glossary entry, and have created an
updated proposal at http://bugs.python.org/issue11945

(I also noted that array.array is like collections.Sequence in failing
to enforce the container invariants in the presence of NaN values)

Cheers,
Nick.

--
Nick Coghlan | [email protected] | Brisbane, Australia
Re: [Python-Dev] PyObject_RichCompareBool identity shortcut
On Wed, Apr 27, 2011 at 8:42 PM, Robert Kern wrote:
> On 2011-04-27 22:16 , Guido van Rossum wrote:
>> So does NumPy also follow Python's behavior about ignoring the NaN
>> special-casing when doing array ops?
>
> By "ignoring the NaN special-casing", do you mean that identity is checked
> first? When we use dtype=object arrays (arrays that contain Python objects
> as their data), yes:
>
> [~]
> |1> nan = float('nan')
>
> [~]
> |2> import numpy as np
>
> [~]
> |3> a = np.array([1, 2, nan], dtype=object)
>
> [~]
> |4> nan in a
> True
>
> [~]
> |5> float('nan') in a
> False
>
>
> Just like lists:
>
> [~]
> |6> nan in [1, 2, nan]
> True
>
> [~]
> |7> float('nan') in [1, 2, nan]
> False
>
>
> Actually, we go a little further by using PyObject_RichCompareBool() rather
> than PyObject_RichCompare() to implement the array-wise comparisons in
> addition to containment:
>
> [~]
> |8> a == nan
> array([False, False, True], dtype=bool)
Hm, this sounds like NumPy always considers a NaN equal to *itself* as
long as objects are concerned.
> [~]
> |9> [x == nan for x in [1, 2, nan]]
> [False, False, False]
>
>
> But for dtype=float arrays (which contain C doubles, not Python objects) we
> use C semantics. Literally, we use whatever C's == operator gives us for the
> two double values. Since there is no concept of identity for this case,
> there is no cognate behavior of Python to match.
>
> [~]
> |10> b = np.array([1.0, 2.0, nan], dtype=float)
>
> [~]
> |11> b == nan
> array([False, False, False], dtype=bool)
>
> [~]
> |12> nan in b
> False
And I wouldn't want to change that. It sounds like NumPy wouldn't be
much affected if we were to change this (which I'm not saying we
would).
Thanks!
--
--Guido van Rossum (python.org/~guido)
Re: [Python-Dev] PyObject_RichCompareBool identity shortcut
On Wed, Apr 27, 2011 at 8:43 PM, Nick Coghlan wrote:
> (I also noted that array.array is like collections.Sequence in failing
> to enforce the container invariants in the presence of NaN values)

Regardless of whether we go any further it would indeed be good to be
explicit about the rules in the language reference and fix the
behavior of collections.Sequence. I'm not sure about array.array -- it
doesn't hold objects so I don't think there's anything to enforce. It
seems to behave the same way as NumPy arrays when they don't contain
objects.

--
--Guido van Rossum (python.org/~guido)
Re: [Python-Dev] PyObject_RichCompareBool identity shortcut
On Wed, Apr 27, 2011 at 2:48 PM, Robert Kern wrote:
..
> I suspect most of us would oppose changing it on general
> backwards-compatibility grounds rather than actually *liking* the current
> behavior. If the behavior changed with Python floats, we'd have to mull over
> whether we try to match that behavior with our scalar types (one of which
> subclasses from float) and our arrays. We would be either incompatible with
> Python or C, and we'd probably end up choosing Python to diverge from. It
> would make a mess, honestly. We already have to explain why equality is
> funky for arrays (arr1 == arr2 is a rich comparison that gives an array, not
> a bool, so we can't do containment tests for lists of arrays), so NaN is
> pretty easy to explain afterward.

Most NumPy applications are actually not exposed to NaN problems
because it is recommended that NaNs be avoided in computations and
when missing or undefined values are necessary, the recommended
solution is to use ma.array or masked array which is a drop-in
replacement for numpy array type and carries a boolean "mask" value
with every element. This allows undefined elements in arrays
of any type: float, integer or even boolean. Masked values propagate
through all computations including comparisons.
Re: [Python-Dev] PyObject_RichCompareBool identity shortcut
On Wed, Apr 27, 2011 at 9:15 PM, Alexander Belopolsky wrote:
> On Wed, Apr 27, 2011 at 2:48 PM, Robert Kern wrote:
> ..
>> I suspect most of us would oppose changing it on general
>> backwards-compatibility grounds rather than actually *liking* the current
>> behavior. If the behavior changed with Python floats, we'd have to mull over
>> whether we try to match that behavior with our scalar types (one of which
>> subclasses from float) and our arrays. We would be either incompatible with
>> Python or C, and we'd probably end up choosing Python to diverge from. It
>> would make a mess, honestly. We already have to explain why equality is
>> funky for arrays (arr1 == arr2 is a rich comparison that gives an array, not
>> a bool, so we can't do containment tests for lists of arrays), so NaN is
>> pretty easy to explain afterward.
>
> Most NumPy applications are actually not exposed to NaN problems
> because it is recommended that NaNs be avoided in computations and
> when missing or undefined values are necessary, the recommended
> solution is to use ma.array or masked array which is a drop-in
> replacement for numpy array type and carries a boolean "mask" value
> with every element. This allows undefined elements in arrays
> of any type: float, integer or even boolean. Masked values propagate
> through all computations including comparisons.

So do new masks get created when the outcome of an elementwise
operation is a NaN? Because that's the only reason why one should have
NaNs in one's data in the first place -- not to indicate missing
values!

--
--Guido van Rossum (python.org/~guido)
Re: [Python-Dev] PyObject_RichCompareBool identity shortcut
On 2011-04-27 23:01 , Guido van Rossum wrote:
> On Wed, Apr 27, 2011 at 8:42 PM, Robert Kern wrote:
>> But for dtype=float arrays (which contain C doubles, not Python objects) we
>> use C semantics. Literally, we use whatever C's == operator gives us for the
>> two double values. Since there is no concept of identity for this case,
>> there is no cognate behavior of Python to match.
>>
>> [~]
>> |10> b = np.array([1.0, 2.0, nan], dtype=float)
>>
>> [~]
>> |11> b == nan
>> array([False, False, False], dtype=bool)
>>
>> [~]
>> |12> nan in b
>> False
>
> And I wouldn't want to change that. It sounds like NumPy wouldn't be
> much affected if we were to change this (which I'm not saying we
> would).

Well, I didn't say that. If Python changed its behavior for (float('nan') ==
float('nan')), we'd have to seriously consider some changes. We do like to keep
*some* amount of correspondence with Python semantics. In particular, we like
our scalar types that match Python types to work as close to the Python type as
possible. We have the np.float64 type, which represents a C double scalar and
corresponds to a Python float. It is used when a single item is indexed out of a
float64 array. We even subclass from the Python float type to help working with
libraries that may not know about numpy:
[~]
|5> import numpy as np
[~]
|6> nan = np.array([1.0, 2.0, float('nan')])[2]
[~]
|7> nan == nan
False
[~]
|8> type(nan)
numpy.float64
[~]
|9> type(nan).mro()
[numpy.float64,
numpy.floating,
numpy.inexact,
numpy.number,
numpy.generic,
float,
object]
If the Python float type changes behavior, we'd have to consider whether to keep
that for np.float64 or change it to match the usual C semantics used elsewhere.
So there *would* be a dilemma. Not necessarily the most nerve-wracking one, but
a dilemma nonetheless.
--
Robert Kern
"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco
Re: [Python-Dev] PyObject_RichCompareBool identity shortcut
On 2011-04-27 23:24 , Guido van Rossum wrote:
> On Wed, Apr 27, 2011 at 9:15 PM, Alexander Belopolsky wrote:
>> On Wed, Apr 27, 2011 at 2:48 PM, Robert Kern wrote:
>> ..
>>> I suspect most of us would oppose changing it on general
>>> backwards-compatibility grounds rather than actually *liking* the current
>>> behavior. If the behavior changed with Python floats, we'd have to mull over
>>> whether we try to match that behavior with our scalar types (one of which
>>> subclasses from float) and our arrays. We would be either incompatible with
>>> Python or C, and we'd probably end up choosing Python to diverge from. It
>>> would make a mess, honestly. We already have to explain why equality is
>>> funky for arrays (arr1 == arr2 is a rich comparison that gives an array, not
>>> a bool, so we can't do containment tests for lists of arrays), so NaN is
>>> pretty easy to explain afterward.
>>
>> Most NumPy applications are actually not exposed to NaN problems
>> because it is recommended that NaNs be avoided in computations and
>> when missing or undefined values are necessary, the recommended
>> solution is to use ma.array or masked array which is a drop-in
>> replacement for numpy array type and carries a boolean "mask" value
>> with every element. This allows undefined elements in arrays
>> of any type: float, integer or even boolean. Masked values propagate
>> through all computations including comparisons.
>
> So do new masks get created when the outcome of an elementwise
> operation is a NaN?

No.

> Because that's the only reason why one should have NaNs in one's data
> in the first place -- not to indicate missing values!

Yes. I'm not sure that Alexander was being entirely clear. Masked arrays are
intended to solve just the missing data problem and not the occurrence of NaNs
from computations. There is still a persistent part of the community that
really does like to use NaNs for missing data, though. I don't think that's
entirely relevant to this discussion[1].

I wouldn't say that numpy applications aren't exposed to NaN problems. They are
just as exposed to computational NaNs as you would expect any application that
does that many flops to be.

[1] Okay, that's a lie. I'm sure that persistent minority would *love* to have
NaN == NaN, because that would make their (ab)use of NaNs easier to work with.

--
Robert Kern
"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco
Re: [Python-Dev] PyObject_RichCompareBool identity shortcut
On Thu, Apr 28, 2011 at 2:07 PM, Guido van Rossum wrote:
> I'm not sure about array.array -- it doesn't hold objects so I don't
> think there's anything to enforce. It seems to behave the same way as
> NumPy arrays when they don't contain objects.

Yep, after reading Robert's post I realised the point about native
arrays in NumPy (and the lack of "object identity" in those cases)
applied equally well to the array module.

Cheers,
Nick.

--
Nick Coghlan | [email protected] | Brisbane, Australia
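The difference between a container of objects and a container of raw C doubles can be seen with the standard library alone. A minimal sketch, assuming CPython's array module; the variable names are illustrative only.

```python
from array import array

nan = float('nan')

boxed = [1.0, 2.0, nan]                 # list keeps a reference to the nan object
unboxed = array('d', [1.0, 2.0, nan])   # array keeps only the raw C double

in_list = nan in boxed        # identity shortcut finds the very same object
in_array = nan in unboxed     # elements are re-boxed, so only == can match
same_object = unboxed[2] is nan   # indexing builds a fresh float each time
```

Because the array never stores the original object, there is no identity left to compare, which is why the containment test falls back to plain NaN inequality.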
Re: [Python-Dev] PyObject_RichCompareBool identity shortcut
On 4/27/2011 7:31 PM, Stephen J. Turnbull wrote:
> Glenn Linderman writes:
>
> > I would not, however expect the original case that was described:
> > >>> nan = float('nan')
> > >>> nan == nan
> > False
> > >>> [nan] == [nan]
> > True # also True in tuples, dicts, etc.
>
> Are you saying you would expect that
>
> nan = float('nan')
> a = [1, ..., 499, nan, 501, ..., 999]  # meta-ellipsis, not Ellipsis
> a == a
> False
>
> ??

Yes, absolutely. Once you understand the definition of NaN, it
certainly cannot be True. a is a, but a is not equal to a.

> I wouldn't even expect
>
> a = [1, ..., 499, float('nan'), 501, ..., 999]
> b = [1, ..., 499, float('nan'), 501, ..., 999]
> a == b
> False
>
> but I guess I have to live with that. While I wouldn't apply it
> to other people, I have to admit Raymond's aphorism applies to me (the
> surprising thing is not the behavior of NaNs, but that I'm surprised
> by anything that happens in the presence of NaNs!)

The only thing that should happen in the presence of NaNs is more NaNs :)
Re: [Python-Dev] PyObject_RichCompareBool identity shortcut
On Wed, Apr 27, 2011 at 9:25 PM, Robert Kern wrote:
> On 2011-04-27 23:01 , Guido van Rossum wrote:
>> And I wouldn't want to change that. It sounds like NumPy wouldn't be
>> much affected if we were to change this (which I'm not saying we
>> would).
>
> Well, I didn't say that. If Python changed its behavior for (float('nan') ==
> float('nan')), we'd have to seriously consider some changes.
Ah, but I'm not proposing anything of the sort! float('nan') returns a
new object each time and two NaNs that are not the same *object* will
still follow the IEEE std. It's just when comparing a NaN-valued
*object* to *itself* (i.e. the *same* object) that I would consider
following the lead of Python's collections.
> We do like to
> keep *some* amount of correspondence with Python semantics. In particular,
> we like our scalar types that match Python types to work as close to the
> Python type as possible. We have the np.float64 type, which represents a C
> double scalar and corresponds to a Python float. It is used when a single
> item is indexed out of a float64 array. We even subclass from the Python
> float type to help working with libraries that may not know about numpy:
>
> [~]
> |5> import numpy as np
>
> [~]
> |6> nan = np.array([1.0, 2.0, float('nan')])[2]
>
> [~]
> |7> nan == nan
> False
Yeah, this is where things might change, because it is the same
*object* left and right.
> [~]
> |8> type(nan)
> numpy.float64
>
> [~]
> |9> type(nan).mro()
> [numpy.float64,
> numpy.floating,
> numpy.inexact,
> numpy.number,
> numpy.generic,
> float,
> object]
>
>
> If the Python float type changes behavior, we'd have to consider whether to
> keep that for np.float64 or change it to match the usual C semantics used
> elsewhere. So there *would* be a dilemma. Not necessarily the most
> nerve-wracking one, but a dilemma nonetheless.
Given what I just said, would it still be a dilemma? Maybe a smaller one?
--
--Guido van Rossum (python.org/~guido)
Re: [Python-Dev] PyObject_RichCompareBool identity shortcut
On Wed, Apr 27, 2011 at 9:33 PM, Robert Kern wrote:
> [1] Okay, that's a lie. I'm sure that persistent minority would *love* to
> have NaN == NaN, because that would make their (ab)use of NaNs easier to
> work with.

Too bad, because that won't change. :-) I agree that this is abuse of
NaNs and shouldn't be encouraged.

--
--Guido van Rossum (python.org/~guido)
Re: [Python-Dev] PyObject_RichCompareBool identity shortcut
On 4/27/2011 8:06 PM, Stephen J. Turnbull wrote:
> Glenn Linderman writes:
> > On 4/27/2011 6:11 PM, Ethan Furman wrote:
> > > Totally out of my depth, but what if a NaN object was allowed to
> > > compare equal to itself, but different NaN objects still compared
> > > unequal? If NaN was a singleton then the current behavior makes more
> > > sense, but since we get a new NaN with each instance creation is there
> > > really a good reason why the same NaN can't be equal to itself?
>
> Yes. A NaN is a special object that means "the computation that
> produced this object is undefined." For example, consider the
> computation 1/x at x = 0. If you approach from the left, 1/0
> "obviously" means minus infinity, while if you approach from the right
> just as obviously it means plus infinity. So what does the 1/0 that
> occurs in [1/x for x in range(-5, 6)] mean? In what sense is it
> "equal to itself"? How can something which is not a number be
> compared for numerical equality?
>
> > >>> n1 = float('NaN')
> > >>> n2 = float('NaN')
> > >>> n3 = n1
> >
> > >>> n1
> > nan
> > >>> n2
> > nan
> > >>> n3
> > nan
> >
> > >>> [n1] == [n2]
> > False
> > >>> [n1] == [n3]
> > True
> >
> > This is the current situation: some NaNs compare equal sometimes, and
> > some don't.
>
> No, Ethan is asking for "n1 == n3" => True. As Mark points out, "[n1]
> == [n3]" can be interpreted as a containment question, rather than an
> equality question, with respect to the NaNs themselves.

It _can_ be interpreted as a containment question, but doing so is
contrary to the documentation of Python list comparison, which presently
doesn't match the implementation. The intuitive definition of equality
of lists is that each member is equal. The presence of NaN destroys
intuition of people that don't expect them to be as different from
numbers as they actually are, but for people that understand NaNs and
expect them to behave according to their definition, then the presence
of a NaN in a list would be expected to cause the list to not be equal
to itself, because a NaN is not equal to itself.

> In standard
> set theory, these are the same question, but that's not necessarily so
> in other set-like toposes. In particular, getting equality and set
> membership to behave reasonably with respect to each other is one of the
> problems faced in developing a workable theory of fuzzy sets.
>
> I don't think it matters what behavior you choose for NaNs, somebody
> is going to be unhappy sometimes.

Some people will be unhappy just because they exist in the language, so
I agree :)
Re: [Python-Dev] PyObject_RichCompareBool identity shortcut
On Thu, Apr 28, 2011 at 2:54 PM, Guido van Rossum wrote:
>> Well, I didn't say that. If Python changed its behavior for (float('nan') ==
>> float('nan')), we'd have to seriously consider some changes.
>
> Ah, but I'm not proposing anything of the sort! float('nan') returns a
> new object each time and two NaNs that are not the same *object* will
> still follow the IEEE std. It's just when comparing a NaN-valued
> *object* to *itself* (i.e. the *same* object) that I would consider
> following the lead of Python's collections.
The reason this possibility bothers me is that it doesn't mesh well
with the "implementations are free to cache and reuse immutable
objects" rule. Although, if the updated NaN semantics were explicit
that identity was now considered part of the value of NaN objects
(thus ruling out caching them at the implementation layer), I guess
that objection would go away.
Regards,
Nick.
--
Nick Coghlan | [email protected] | Brisbane, Australia
Re: [Python-Dev] PyObject_RichCompareBool identity shortcut
On Wed, Apr 27, 2011 at 11:14 PM, Guido van Rossum wrote:
..
>> ISTM, the current state of affairs is reasonable.
>
> Hardly; when I picked the NaN behavior I knew the IEEE std prescribed
> it but had never seen any code that used this.
>
Same here. The only code I've seen that depended on this NaN behavior
was either buggy (programmer did not consider NaN case) or was using x
== x as a way to detect nans. The latter idiom is universally frowned
upon regardless of the language. In Python one should use
math.isnan() for this purpose.
I would like to present a challenge to the proponents of the status
quo. Look through your codebase and find code that will behave
differently if nan == nan were True. Then come back and report how
many bugs you have found. :-) Seriously, though, I bet that if you
find anything, it will fall into one of the two cases I mentioned
above.
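The two ways of detecting a NaN mentioned above can be put side by side. A small sketch; the helper name is_nan_by_self_compare is made up for illustration, while math.isnan() is the standard-library call recommended in the message.

```python
import math


def is_nan_by_self_compare(x):
    """The discouraged idiom: relies on NaN being unequal to itself."""
    return x != x


nan = float('nan')
values = [0.0, 1.5, float('inf'), nan]

by_idiom = [is_nan_by_self_compare(v) for v in values]
by_isnan = [math.isnan(v) for v in values]
```

Both approaches agree today, but only the self-comparison idiom would silently change meaning if x == x were made True for the same object.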
..
> I expect that if 15 years or so ago I had decided to ignore the
> IEEE std and declare that object identity always implies equality it
> would have seemed quite reasonable as well... The rule could be
> something like "the == operator first checks for identity and if left
> and right are the same object, the answer is True without calling the
> object's __eq__ method; similarly the != would always return False
> when an object is compared to itself".
Note that ctypes' floats already behave this way:
>>> from ctypes import c_double
>>> x = c_double(float('nan'))
>>> x == x
True
..
> Doing this in 3.3 would, alas, be a huge undertaking -- I expect that
> there are tons of unittests that depend either on the current NaN
> behavior or on x == x calling x.__eq__(x). Plus the decimal unittests
> would be affected. Perhaps somebody could try?
Before we go down this path, I would like to discuss another
peculiarity of NaNs:
>>> float('nan') < 0
False
>>> float('nan') > 0
False
This property in my experience causes much more trouble than nan ==
nan being false. The problem is that common sorting or binary search
algorithms may degenerate into infinite loops in the presence of nans.
This may even happen when searching for a finite value in a large
array that contains a single nan. Errors like this do happen in the
wild, and after chasing a bug like this programmers tend to avoid
nans at all costs. Oftentimes this leads to using "magic"
placeholders such as 1e300 for missing data.
Since py3k has already made None < 0 an error, it may be reasonable
for float('nan') < 0 to raise an error as well (probably ValueError
rather than TypeError). This will not make lists with nans sortable
or searchable using binary search, but will make associated bugs
easier to find.
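The binary-search failure described above is easy to reproduce with the standard bisect module. The sketch below also includes a hypothetical key function, checked_sort_key, that mimics the suggested "raise instead of returning False" behaviour; it is an illustration of the idea, not an existing Python feature.

```python
import bisect
import math

nan = float('nan')
data = [1.0, 2.0, nan, 3.0, 4.0]   # ascending apart from the single NaN

# "nan < 3.0" is False, so the search discards the half that holds 3.0.
pos = bisect.bisect_left(data, 3.0)
found = pos < len(data) and data[pos] == 3.0


def checked_sort_key(x):
    """Hypothetical helper: make NaN fail loudly instead of mis-ordering."""
    if isinstance(x, float) and math.isnan(x):
        raise ValueError("NaN cannot be ordered")
    return x


try:
    sorted(data, key=checked_sort_key)
    sort_raised = False
except ValueError:
    sort_raised = True
```

Here the search lands on the NaN slot and reports that 3.0 is missing even though it is present one position later, which is the kind of silent wrong answer an exception would expose.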
Re: [Python-Dev] PyObject_RichCompareBool identity shortcut
On Thu, Apr 28, 2011 at 12:33 AM, Robert Kern wrote:
> On 2011-04-27 23:24 , Guido van Rossum wrote:
..
>> So do new masks get created when the outcome of an elementwise
>> operation is a NaN?
>
> No.

Yes.

>>> from MA import array
>>> print array([0])/array([0])
[-- ]

(I don't have numpy on this laptop, so the example is using Numeric,
but I hope you guys did not change that while I was not looking:-)
Re: [Python-Dev] PyObject_RichCompareBool identity shortcut
Steven D'Aprano wrote:
> You can compare NANs, and the result of the comparisons are perfectly
> well defined by either True or False.

But it's *arbitrarily* defined, and it's far from clear that the
definition chosen is useful in any way.

If you perform a computation and get a NaN as the result, you know
that something went wrong at some point. But if you subject that NaN
to a comparison, your code takes some arbitrarily-chosen branch and
produces a result that may look plausible but is almost certainly
wrong.

The Pythonic thing to do (in the Python 3 world at least) would be to
regard NaNs as non-comparable and raise an exception.

--
Greg
Re: [Python-Dev] PyObject_RichCompareBool identity shortcut
On Thu, Apr 28, 2011 at 12:24 AM, Guido van Rossum wrote:
> So do new masks get created when the outcome of an elementwise
> operation is a NaN? Because that's the only reason why one should have
> NaNs in one's data in the first place.

If this is the case, why does Python almost never produce NaNs as the
IEEE standard prescribes?

>>> 0.0/0.0
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ZeroDivisionError: float division

> -- not to indicate missing values!

Sometimes you don't have a choice. For example, when your data comes
from a database that uses NaNs for missing values.
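For contrast, here is a short sketch of one arithmetic operation that does yield a NaN in Python next to the division that raises instead. The variable names are illustrative only.

```python
import math

inf = float('inf')

# An IEEE 754 invalid operation that Python lets through as a quiet NaN.
quiet_nan = inf - inf

# Division by zero is intercepted and turned into an exception.
try:
    0.0 / 0.0
    division_raised = False
except ZeroDivisionError:
    division_raised = True
```

So computational NaNs can arise in pure Python, just through fewer routes than IEEE 754 arithmetic in C would allow.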
Re: [Python-Dev] PyObject_RichCompareBool identity shortcut
Stephen J. Turnbull wrote:
> So what does the 1/0 that occurs in [1/x for x in range(-5, 6)] mean?
> In what sense is it "equal to itself"? How can something which is not
> a number be compared for numerical equality?

I would say it *can't* be compared for *numerical* equality. It might
make sense to compare it using some other notion of equality.

One of the problems here, I think, is that Python only lets you define
one notion of equality for each type, and that notion is the one that
gets used when you compare collections of that type. (Or at least it's
supposed to, but the identity-implies-equality shortcut that gets
taken in some places interferes with that.)

So if you're going to decide that it doesn't make sense to compare
undefined numeric quantities, then it doesn't make sense to compare
lists containing them either.

--
Greg
Re: [Python-Dev] PyObject_RichCompareBool identity shortcut
Guido van Rossum wrote:
> Currently NaN is not violating any language rules -- it is just
> violating users' intuition, in a much worse way than Inf does.

If it's to be an official language non-rule (by which I mean that
types are officially allowed to compare non-reflexively) then any code
assuming that identity implies equality for arbitrary objects is
broken and should be fixed.

--
Greg
Re: [Python-Dev] PyObject_RichCompareBool identity shortcut
On Thu, Apr 28, 2011 at 1:40 AM, Greg Ewing wrote:
..
> The Pythonic thing to do (in the Python 3 world at least) would
> be to regard NaNs as non-comparable and raise an exception.
As I mentioned in a previous post, I agree in case of <, <=, >, or >=
comparisons, but == and != are a harder case because you don't want,
for example:
>>> [1,2,float('nan'),3].index(3)
3
to raise an exception.
Re: [Python-Dev] PyObject_RichCompareBool identity shortcut
On 4/27/2011 8:43 PM, Nick Coghlan wrote:
> On Thu, Apr 28, 2011 at 12:42 PM, Stephen J. Turnbull wrote:
>> Mark Dickinson writes:
>>
>> > Declaring that 'nan == nan' should be True seems attractive in
>> > theory,
>>
>> No, it's intuitively attractive, but that's because humans like nice
>> continuous behavior. In *theory*, it's true that some singularities
>> are removable, and the NaN that occurs when evaluating at that point
>> is actually definable in a broader context, but the point of NaN is
>> that some singularities are *not* removable. This is somewhat
>> Pythonic: "In the presence of ambiguity, refuse to guess."
>
> Refusing to guess in this case would be to treat all NaNs as
> signalling NaNs, and that wouldn't be good, either :)
>
> I like Terry's suggestion for a glossary entry, and have created an
> updated proposal at http://bugs.python.org/issue11945
>
> (I also noted that array.array is like collections.Sequence in failing
> to enforce the container invariants in the presence of NaN values)

In that bug, Nick, you mention that reflexive equality is something that
container classes rely on in their implementation. Such reliance seems to
me to be a bug, or an inappropriate optimization, rather than a necessity.
I realize that classes that do not define equality use identity as their
default equality operator, and that is acceptable for items that do not or
cannot have any better equality operator. It does lead to the situation
where two objects that are bit-for-bit clones get separate entries in a
set... exactly the same as how NaNs of different identity work... the
situation with a NaN of the same identity not being added to the set
multiple times seems to simply be a bug because of conflating identity and
equality, and should not be relied on in container implementations.
Re: [Python-Dev] PyObject_RichCompareBool identity shortcut
On Thu, Apr 28, 2011 at 2:20 AM, Glenn Linderman wrote:
..
> In that bug, Nick, you mention that reflexive equality is something that
> container classes rely on in their implementation. Such reliance seems to
> me to be a bug, or an inappropriate optimization, ..

An alternative interpretation would be that it is a bug to use NaN
values in lists. It is certainly nonsensical to use NaNs as keys in
dictionaries, and that reportedly led Java designers to forgo the
nonreflexivity of NaNs:

"""
A "NaN" value is not equal to itself. However, a "NaN" Java "Float"
object is equal to itself. The semantic is defined this way, because
otherwise "NaN" Java "Float" objects cannot be retrieved from a hash
table.
""" - http://www.concentric.net/~ttwang/tech/javafloat.htm

With the status quo in Python, it may only make sense to store NaNs in
array.array, but not in a list.
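For comparison with the Java quote, here is a minimal sketch of what happens with NaN as a dictionary key in CPython. The variable names are illustrative only. Retrieval works through the identity check rather than through reflexive equality, so it succeeds only with the very same NaN object.

```python
nan = float('nan')
table = {nan: 'stored'}

# Same object as the key: the lookup matches on identity before trying ==.
found_by_identity = table.get(nan)

# A freshly created NaN is a different object and is not equal to the key,
# so the lookup falls through to the default (None).
found_by_value = table.get(float('nan'))

print(found_by_identity, found_by_value)  # prints: stored None
```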
Re: [Python-Dev] PyObject_RichCompareBool identity shortcut
On Thu, Apr 28, 2011 at 4:20 PM, Glenn Linderman wrote:
> In that bug, Nick, you mention that reflexive equality is something that
> container classes rely on in their implementation. Such reliance seems to
> me to be a bug, or an inappropriate optimization, rather than a necessity.
> I realize that classes that do not define equality use identity as their
> default equality operator, and that is acceptable for items that do not or
> cannot have any better equality operator. It does lead to the situation
> where two objects that are bit-for-bit clones get separate entries in a
> set... exactly the same as how NaNs of different identity work... the
> situation with a NaN of the same identity not being added to the set
> multiple times seems to simply be a bug because of conflating identity and
> equality, and should not be relied on in container implementations.

No, as Raymond has articulated a number of times over the years, it's
a property of the equivalence relation that is needed in order to
present sane invariants to users of the container. I included in the
bug report the critical invariants I am currently aware of that should
hold, even when the container may hold types with a non-reflexive
definition of equality:

    assert [x] == [x]        # Generalised to all container types
    assert not [x] != [x]    # Generalised to all container types
    for x in c:
        assert x in c
        assert c.count(x) > 0             # If applicable
        assert 0 <= c.index(x) < len(c)   # If applicable

The builtin types all already work this way, and that's a deliberate
choice - my proposal is simply to document the behaviour as
intentional, and fix the one case I know of in the standard library
where we don't implement these semantics correctly (i.e.
collections.Sequence).

The question of whether or not float and decimal.Decimal should be
modified to have reflexive definitions of equality (even for NaN
values) is actually orthogonal to the question of clarifying and
documenting the expected semantics of containers in the face of
non-reflexive definitions of equality.

Cheers,
Nick.

--
Nick Coghlan | [email protected] | Brisbane, Australia
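The invariants listed in this message can be exercised against the builtin containers with a small sketch. The helper name invariants_hold and the choice of one-element containers are assumptions made for illustration; they are not taken from the bug report.

```python
nan = float('nan')


def invariants_hold(factory, x):
    """Check the listed invariants for one-element containers built by factory.

    Hypothetical helper: 'factory' is any callable that accepts an iterable,
    such as list or set.
    """
    a, b = factory([x]), factory([x])
    # [x] == [x] and not [x] != [x], generalised to the container type.
    if not (a == b) or (a != b):
        return False
    for item in a:
        if item not in a:
            return False
        # count() and index() are checked only where the type provides them.
        if hasattr(a, 'count') and not a.count(item) > 0:
            return False
        if hasattr(a, 'index') and not 0 <= a.index(item) < len(a):
            return False
    return True


results = {f.__name__: invariants_hold(f, nan)
           for f in (list, tuple, set, frozenset)}
print(results)
```

In CPython all four builtin types should report True here, even though nan == nan is False, because each comparison goes through the identity check first.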
