Re: Rich Comparisons Gotcha

2009-01-07 Thread Steven D'Aprano
On Wed, 07 Jan 2009 01:23:19 +, Mark Wooding wrote:

 A case-sensitive string is /not the same/ as a case-insensitive string.
 One's a duck, the other's a goose.  I'd claim here that i"abc" =~ "ABC"
 must be False, because i"abc" =~ "abc" must be false also!  To define it
 otherwise leads to the incoherence you describe.

It's only incoherent if you need equality to be an equivalence relation. 
If you don't, it is perfectly reasonable to declare that i"abc" equals 
"abc".


-- 
Steven


--
http://mail.python.org/mailman/listinfo/python-list


Re: Rich Comparisons Gotcha

2009-01-07 Thread Mark Wooding
Steven D'Aprano ste...@remove.this.cybersource.com.au wrote:

 It's only incoherent if you need equality to be an equivalence relation. 
 If you don't, it is perfectly reasonable to declare that i"abc" equals 
 "abc".

Right!  And if you didn't want an equivalence relation, then `==' will
suit you fine.  The problem is that some applications seem to /want/ an
equivalence relation, and one that's more useful (i.e., less
discriminating) than `is'.

-- [mdw]


Re: Rich Comparisons Gotcha

2009-01-06 Thread Mark Wooding
Steven D'Aprano ste...@remove.this.cybersource.com.au wrote:

 Such assumptions only hold under particular domains though. You can't
 assume equality is an equivalence relation once you start thinking
 about arbitrary domains.

From a formal mathematical point of view, equality /is/ an equivalence
relation.  If you have a relation on some domain, and it's not an
equivalence relation, then it ain't the equality relation, and that's
flat.

 But there cannot be any such function which is a domain-independent 
 equivalence relation, not if we're talking about arbitrarily wacky 
 domains.

That looks like a claim which requires a proof to me.  But it could also
do with a definition of `domain', so I'll settle for one of those first.

If we're dealing with sets (i.e., `domain's form a subclass of `sets')
then the claim is clearly false, and equality (determined by comparison
of elements) is indeed a domain-independent equivalence relation.

 Even something as straight-forward as 'is' can't be an equivalence
 relation under a domain where identity isn't well-defined.

You've completely lost me here.  The Python `is' operator is (the
characteristic function of) an equivalence relation on Python values:
that's its definition.  You could describe an extension of the `is'
relation to a larger set of items, such that it fails to be an
equivalence relation on that set, but you'd be (rightly) criticized for
failing to preserve one of its two defining properties.  (The other is
that `is' makes distinctions between values which are at least as fine
as any other method, and this property should also be extended.)

Let me have another go.

All Python objects are instances of `object' or of some more specific
class.  The `==' operator on `object' is (the characteristic function
of) an equivalence relation.  In fact, it's the same as `is' -- but
`==' can be overridden by subclasses, and subclasses are permitted --
according to the interface definition -- to coarsen the relation.  In
fact, they're permitted to make it not be an equivalence relation at all.

I claim that this is a problem.  I /agree/ that domain-specific
predicates are useful, and can be sufficiently useful that they deserve
the `==' name -- as well as floats and numpy, I've provided SAGE and
sympy as examples myself.  But I also believe that there are good
reasons to want an `equivalence' operator (I'll write it as `=~', though
I don't propose this as Python syntax -- see below) with the following
properties:

  * `=~' is the characteristic function[1] of an equivalence relation,
    i.e., for all values x, y, z: x =~ y in (True, False); (x =~ x) ==
    True; if x =~ y then y =~ x; and if x =~ y and y =~ z then x =~ z.

  * Moreover, `=~' is a coarsening of `is', i.e. for all values x, y: if
    x is y then x =~ y.
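
These properties can be tested on a finite sample of values.  The sketch
below assumes two hypothetical helpers, is_equivalence_on and coarsens_is,
written only for illustration.  Passing on a sample proves nothing about
other values, but a failure is a genuine counterexample.

```python
import operator
from itertools import product

def is_equivalence_on(rel, values):
    """Check reflexivity, symmetry and transitivity of rel on a finite sample.

    rel is any two-argument predicate; this helper is hypothetical.
    """
    values = list(values)
    reflexive = all(rel(x, x) is True for x in values)
    symmetric = all(rel(y, x)
                    for x, y in product(values, repeat=2) if rel(x, y))
    transitive = all(rel(x, z)
                     for x, y, z in product(values, repeat=3)
                     if rel(x, y) and rel(y, z))
    return reflexive and symmetric and transitive

def coarsens_is(rel, values):
    """Check that `x is y` implies rel(x, y) on the sample."""
    values = list(values)
    return all(rel(x, y) for x, y in product(values, repeat=2) if x is y)

sample = [1, 2.5, 'abc', (1, 2)]
print(is_equivalence_on(operator.eq, sample))                   # True
print(is_equivalence_on(operator.eq, sample + [float('nan')]))  # False
print(coarsens_is(operator.eq, sample + [float('nan')]))        # False
```

The last two results are False for the same reason: a NaN is not == to
itself, so == is neither reflexive nor a coarsening of `is'.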

A valuable property might be that x =~ y if x and y are
indistinguishable without using `is'.  That would mean immediately that
'xyz' =~ 'xy' + 'z' (regardless of interning, because strings are
immutable).  But for tuples this would imply elementwise comparison,
which may be expensive -- and, in the case of tuples manufactured by C
extensions, nontrivial because manufactured tuples need not be acyclic.
On the other hand, `==' is already recursive on tuples.

We can envisage a collection of different relations, according to which
distinguishing methods we're willing to disallow.  For example, for
numerical types, there are actually a number of interesting relations,
according to whether you think the answers to the following questions
are true or false.

  * Is 1 =~ 1/1?  (Here, 1 is an integer, and 1/1 is a rational number;
    both are the multiplicative identities of their respective rings.
    I'd suggest that it doesn't seem very useful to say `no' here, but
    there might be reasons why one would want type(x) is type(y) if
    x =~ y.)

  * Is 1 =~ 1.0?  (This is trickier.  Numerically the values are equal;
    but the former is exact and the latter inexact, and this is a good
    reason to want a separation.)

Essentially, these are asking whether `type' is a legitimate
distinguisher, and I think that the answer, unhelpful as it may be, is
`sometimes'.
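
For comparison, this is how Python's existing == answers both questions,
using fractions.Fraction as the rational type.  The function
same_type_and_equal is a hypothetical name for one possible finer relation
that does treat `type' as a distinguisher; it is a sketch, not a proposal.

```python
from fractions import Fraction

values = [1, Fraction(1, 1), 1.0]

# Python's == ignores the type distinction for these three values ...
print(all(x == y for x in values for y in values))  # True

# ... and so do hash tables, because equal numbers hash equal.
print(len(set(values)))  # 1

def same_type_and_equal(x, y):
    """A finer relation that treats type as a legitimate distinguisher."""
    return type(x) is type(y) and x == y

print(same_type_and_equal(1, 1.0))             # False
print(same_type_and_equal(1, Fraction(1, 1)))  # False
print(same_type_and_equal(1, 1))               # True
```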

A third useful distinguishing technique is mutation.  Given two
singleton lists whose respective elements compare equivalent, I can
mutate one of them to decide whether the other is in fact the same.  Is
this something which `=~' should distinguish?  Again, the answer is
probably `sometimes'.

To summarize: we're left with at least three different characteristics
which an equivalence predicate might have:

  * efficient (e.g., bounded recursion depth, works on circular values);
  * neglects irrelevant (to whom?) differences of type; and
  * neglects differences due to mutability.

A predicate used to compare set elements or hash-table keys should
probably /respect/ mutability.  (Associating hashing with this
predicate, rather than `==', would coherently allow mutable objects such
as lists to be 

Re: Rich Comparisons Gotcha

2009-01-06 Thread Steven D'Aprano
On Tue, 06 Jan 2009 12:42:13 +, Mark Wooding wrote:

 Steven D'Aprano ste...@remove.this.cybersource.com.au wrote:
 
 Such assumptions only hold under particular domains though. You can't
 assume equality is an equivalence relation once you start thinking
 about arbitrary domains.
 
 From a formal mathematical point of view, equality /is/ an equivalence
 relation.  If you have a relation on some domain, and it's not an
 equivalence relation, then it ain't the equality relation, and that's
 flat.

Okay, fair enough. In the formal mathematical sense, equality is always 
an equivalence relation. So there are certain domains which don't have 
equality, e.g. floating point, since nan != nan. Also Python objects, 
since x.__eq__(y) is not necessarily the same as y.__eq__(x).
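
Both failures are easy to reproduce.  In the sketch below, Yes and No are
hypothetical classes invented for the illustration; the point is only that
the left operand's __eq__ is tried first, so the answer depends on the
order of the operands.

```python
nan = float('nan')
print(nan == nan)       # False: == is not reflexive on floats

class Yes:
    """Claims to be equal to everything."""
    def __eq__(self, other):
        return True

class No:
    """Claims to be equal to nothing."""
    def __eq__(self, other):
        return False

print(Yes() == No())    # True:  Yes.__eq__ is consulted first
print(No() == Yes())    # False: No.__eq__ is consulted first
```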



 But there cannot be any such function which is a domain-independent
 equivalence relation, not if we're talking about arbitrarily wacky
 domains.
 
 That looks like a claim which requires a proof to me.  But it could also
 do with a definition of `domain', so I'll settle for one of those first.

I'm talking about domain in the sense of a particular problem domain. 
That is, the model, data and operations used to solve a problem. I don't 
know that I can be more formal than that.

To prove my claim, all you need is two domains with a mutually 
incompatible definition of equality. That's not so difficult, surely? How 
about equality of integers, versus equality of integers modulo some N?


 
 If we're dealing with sets (i.e., `domain's form a subclass of `sets')
 then the claim is clearly false, and equality (determined by comparison
 of elements) is indeed a domain-independent equivalence relation.

It isn't domain-independent in my sense, because you have specified one 
specific domain, namely set equality.

 
 Even something as straight-forward as 'is' can't be an equivalence
 relation under a domain where identity isn't well-defined.
 
 You've completely lost me here.  The Python `is' operator is (the
 characteristic function of) an equivalence relation on Python values:
 that's its definition.

Yes, that's because identity is well-defined in Python. I'm saying that 
if identity isn't well-defined, then neither is the 'is' operator, and 
therefore it isn't an equivalence relation. That shouldn't be 
controversial.



 All Python objects are instances of `object' or of some more specific
 class.  The `==' operator on `object' is (the characteristic function
 of) an equivalence relation.  In fact, it's the same as `is' -- but
 `==' can be overridden by subclasses, and subclasses are permitted --
 according to the interface definition -- to coarsen the relation.  In
 fact, they're permitted to make it not be an equivalence relation at all.
 
 I claim that this is a problem.  

It *can* be a problem, if you insist on using == on arbitrary types while 
still expecting it to be an equivalence relation.

If you drop the requirement that it remain an e-r, then you can apply == 
to arbitrary types. And if you limit yourself to non-arbitrary types, 
then you can safely use (say) any strings you like, and == will remain an 
e-r. 



I /agree/ that domain-specific
 predicates are useful, and can be sufficiently useful that they deserve
 the `==' name -- as well as floats and numpy, I've provided SAGE and
 sympy as examples myself.  But I also believe that there are good
 reasons to want an `equivalence' operator (I'll write it as `=~', though
 I don't propose this as Python syntax -- see below) with the following
 properties:
 
   * `=~' is the characteristic function[1] of an equivalence relation,
 i.e., for all values x, y, z: x =~ y in (True, False); (x =~ x) ==
 True; if x =~ y then y =~ x; and if x =~ y and y =~ z then x =~ z

   * Moreover, `=~' is a coarsening of `is', i.e. for all values x, y: if
 x is y then x =~ y.


Ah, but you can't have such a generic e-r that applies across all problem 
domains. Consider:

Let's denote regular, case-sensitive strings using "abc", and special, 
case-insensitive strings using i"abc". So for regular strings, equality 
is an e-r; for case-insensitive strings, equality is also an e-r (I 
trust that the truth of this is obvious). But if you try to use equality 
on *both* regular and case-insensitive strings, it fails to be an e-r:

i"abc" =~ "ABC" returns True if you use the case-insensitive definition 
of equality, but returns False if you use the case-sensitive definition. 
There is no single definition of equality that is *simultaneously* 
case-sensitive and case-insensitive.
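
The failure can be made concrete with a case-insensitive string type that
is allowed to compare equal to plain strings.  IStr is a hypothetical class
written for this sketch; it is not a standard library type.

```python
class IStr:
    """Hypothetical case-insensitive string that also equals plain str."""
    def __init__(self, text):
        self.text = text

    def __eq__(self, other):
        if isinstance(other, IStr):
            return self.text.lower() == other.text.lower()
        if isinstance(other, str):
            # Crossing the boundary: compare with a plain string, ignoring case.
            return self.text.lower() == other.lower()
        return NotImplemented

i_abc = IStr('abc')
print('abc' == i_abc)   # True: str defers to IStr.__eq__
print(i_abc == 'ABC')   # True
print('abc' == 'ABC')   # False, so transitivity fails
```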


 A valuable property might be that x =~ y if x and y are
 indistinguishable without using `is'.

That's a little strong, because it implies that equality must look at 
*everything* about a particular object, not just whatever bits of data 
are relevant for the problem domain.

For example, consider storing data in a dict.

>>> D1 = {-1: 0, -2: 0}
>>> D2 = {-2: 0}
>>> D2[-1] = 0
>>> D1 == D2
True


We certainly want D1 and D2 to be equal.

Re: Rich Comparisons Gotcha

2009-01-06 Thread Mark Wooding
Steven D'Aprano ste...@remove.this.cybersource.com.au wrote:

 To prove my claim, all you need is two domains with a mutually 
 incompatible definition of equality. That's not so difficult, surely? How 
 about equality of integers, versus equality of integers modulo some N?

No, that's not an example.  The integers modulo N form a ring Z/NZ of
residue classes.  Such residue classes are distinct from the integers --
e.g., an integer 3 (say) is not the same as the set 3 + NZ = { ..., 3 - 2N,
3 - N, 3, 3 + N, 3 + 2N, ... } -- but there is a homomorphism from Z
to Z/NZ under which 3 + NZ is the image of 3.

If we decide to define the == operator such that 3 == 3 + NZ and 3 + N
== 3 + NZ then == is not an equivalence relation (in particular,
transitivity fails).  But that's just an artifact of the definition.  If
we distinguish 3 from 3 + NZ then everything is fine.  3 + NZ == (3 + N)
+ NZ correctly, but 3 != 3 + N, and all is well.
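
A minimal sketch of the well-behaved choice, assuming a hypothetical
Residue class that stands for the residue class r + NZ.  It compares equal
only to other residue classes with the same modulus, never to a bare
integer.

```python
class Residue:
    """The residue class r + NZ; a hypothetical model of an element of Z/NZ."""
    def __init__(self, r, n):
        self.n = n
        self.r = r % n          # canonical representative in range(n)

    def __eq__(self, other):
        # Only residue classes are comparable; integers are a different domain.
        if isinstance(other, Residue):
            return self.n == other.n and self.r == other.r
        return NotImplemented

    def __hash__(self):
        return hash((self.r, self.n))

N = 5
print(Residue(3, N) == Residue(3 + N, N))  # True:  3 + NZ == (3 + N) + NZ
print(3 == 3 + N)                          # False
print(3 == Residue(3, N))                  # False: kept apart at the boundary
```

When both operands return NotImplemented, Python falls back to an identity
comparison, which is why the last line prints False rather than raising.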

Here, at least, the problem is not that == as an equivalence relation
fails in some particular domain -- because in both Z and Z/NZ it can be
a perfectly fine equivalence relation -- but that it can potentially
fail on the boundaries between domains.  Easy answer: don't mess it up
at the boundaries.

Proposition.  Let U, U' be disjoint sets, and let E, E' be equivalence
relations on U, U' respectively.  Define E^ on U union U' as E^ = E
union E', i.e.,

  E^(x, y) iff x in U and y in U and E(x, y) or
   x in U' and y in U' and E'(x, y)

Then E^ is an equivalence relation.

Proof.  Reflexivity and symmetry are trivial; transitivity follows from
disjointness of U and U'.

 It *can* be a problem, if you insist on using == on arbitrary types
 while still expecting it to be an equivalence relation.

Unfortunately, from the surrounding discussion, it seems that container
types particularly want to be able to contain arbitrary objects, and the
failure of == to be an equivalence relation makes this fail.  The problem
is that objects with wacky == operators are still more or less quacking
like the more usual kinds of ducks; but they turn out to taste very
different.

 Let's denote regular, case-sensitive strings using "abc", and special, 
 case-insensitive strings using i"abc". So for regular strings, equality 
 is an e-r; for case-insensitive strings, equality is also an e-r (I 
 trust that the truth of this is obvious). But if you try to use equality 
 on *both* regular and case-insensitive strings, it fails to be an e-r:
 
 i"abc" =~ "ABC" returns True if you use the case-insensitive definition 
 of equality, but returns False if you use the case-sensitive definition. 
 There is no single definition of equality that is *simultaneously* 
 case-sensitive and case-insensitive.

A case-sensitive string is /not the same/ as a case-insensitive string.
One's a duck, the other's a goose.  I'd claim here that i"abc" =~ "ABC"
must be False, because i"abc" =~ "abc" must be false also!  To define it
otherwise leads to the incoherence you describe.  But the above
proposition provides an easy answer.
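
The construction E^ from the proposition can be sketched directly, taking
U to be plain strings and U' to be instances of a hypothetical IStr class
(deliberately not a subclass of str, so that the two sets are disjoint).
The function name equiv is likewise an assumption of this sketch.

```python
from itertools import product

class IStr:
    """Hypothetical case-insensitive string; not a subclass of str."""
    def __init__(self, text):
        self.text = text

def equiv(x, y):
    """E^ = E union E': each relation on its own set, False across the boundary."""
    if type(x) is str and type(y) is str:
        return x == y                               # E: case-sensitive
    if type(x) is IStr and type(y) is IStr:
        return x.text.lower() == y.text.lower()     # E': case-insensitive
    return False

sample = ['abc', 'ABC', IStr('abc'), IStr('ABC'), IStr('xyz')]

# Exhaustively check the three laws on the sample.
reflexive = all(equiv(x, x) for x in sample)
symmetric = all(equiv(y, x)
                for x, y in product(sample, repeat=2) if equiv(x, y))
transitive = all(equiv(x, z)
                 for x, y, z in product(sample, repeat=3)
                 if equiv(x, y) and equiv(y, z))
print(reflexive, symmetric, transitive)     # True True True
print(equiv(IStr('abc'), IStr('ABC')))      # True
print(equiv(IStr('abc'), 'abc'))            # False
```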

  A valuable property might be that x =~ y if x and y are
  indistinguishable without using `is'.
 
 That's a little strong, because it implies that equality must look at 
 *everything* about a particular object, not just whatever bits of data 
 are relevant for the problem domain.

Yes.  That's one of the reasons that =~ isn't the same as ==.

I've been thinking on my feet in this thread, so I haven't thought
everything through.  And as I mention below, there are /many/ useful
equality predicates on values.  As I didn't mention (but hope is
obvious) having a massively-parametrized equality predicate is daft, and
providing enough to suit every possible application equally so.  But we
might be able to do well enough with just one or two -- or maybe by just
leaving things as they are.

 For example, consider storing data in a dict.
 
 >>> D1 = {-1: 0, -2: 0}
 >>> D2 = {-2: 0}
 >>> D2[-1] = 0
 >>> D1 == D2
 True
 
 
 We certainly want D1 and D2 to be equal.

Do we?  If we're using my `indistinguishable without using ``is'''
criterion from above, then D1 and D2 are certainly different!  To detect
the difference, mutate one and see if the other changes:

def distinct_dictionaries_p(D1, D2):
  """
  Decide whether D1 and D2 are distinct dictionaries (True) or the
  same dictionary (False).  Not threadsafe.
  """
  magic = []
  more_magic = [magic]
  old = D1.get('mumble', more_magic)
  D1['mumble'] = magic
  # If the write to D1 is visible through D2, they are the same object.
  result = D2.get('mumble', more_magic) is not magic
  if old is more_magic:
    del D1['mumble']
  else:
    D1['mumble'] = old
  return result

But that criterion was a suggestion -- a way of defining a coherent
equivalence relation on the whole of the Python value space which is
coarser than `is' and maybe more useful.  My primary purpose in
proposing it was to stimulate discussion: what /do/ we want from
equality predicates?  We already have `is', which is too fine-grained to
be widely useful: it distinguishes between different instances of the
number 50, for 

Re: Rich Comparisons Gotcha

2009-01-05 Thread Mark Wooding
Steven D'Aprano st...@remove-this-cybersource.com.au wrote:

 There is nothing to blame them for. This is the correct behaviour. NaNs 
 should *not* compare equal to themselves, that's mathematically 
 incoherent.

Indeed.  The problem is a paucity of equality predicates.  This is
hardly surprising: Common Lisp has four general-purpose equality
predicates (EQ, EQL, EQUAL and EQUALP), and many more type-specific ones
(=, STRING=, STRING-EQUAL (yes, I know...), CHAR=, ...), and still
doesn't really have enough.  For example, EQUAL compares strings
case-sensitively, but other arrays are compared by address; EQUALP will
recurse into arbitrary arrays, but compares strings
case-insensitively...

For the purposes of this discussion, however, it has enough to be able
to distinguish between

  * numerical comparisons, which (as you explain later) should /not/
    claim that two NaNs are equal, and

  * object comparisons, which clearly must declare an object equal to
    itself.

For example, I had the following edifying conversation with SBCL.

CL-USER> ;; Return NaNs rather than signalling errors.
         (sb-int:set-floating-point-modes :traps nil)
; No value
CL-USER> (defconstant nan (/ 0.0 0.0))
NAN
CL-USER> (loop for func in '(eql equal equalp =)
               collect (list func (funcall func nan nan)))
((EQL T) (EQUAL T) (EQUALP T) (= NIL))
CL-USER>

That is, a NaN is EQL, EQUAL and EQUALP to itself, but not = to itself.
(Due to the vagaries of EQ, a NaN might or might not be EQ to itself or
other NaNs.)

Python has a much more limited selection of equality predicates -- in
fact, just == and is.  The is operator is Python's equivalent of Lisp's
EQ predicate: it compares objects by address.  I can have a similar chat
with Python.

In [12]: nan = float('nan')

In [13]: nan is nan
Out[13]: True

In [14]: nan == nan
Out[14]: False

In [16]: nan is float('nan')
Out[16]: False

Python numbers are the same as themselves reliably, unlike in Lisp.  But
there's no sensible way of asking whether something is `basically the
same as' nan, like Lisp's EQL or EQUAL.  I agree that the primary
equality predicate for numbers must be the numerical comparison, and
NaNs can't (sensibly) be numerically equal to themselves.
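
The gap shows up in containers.  List membership checks identity before
equality, so a NaN is found in a list holding that very object but not in
a list holding a different NaN.  The predicate same_number below is a
hypothetical helper that treats all NaNs as one value; it is a sketch of
what a `basically the same' test might look like, not an existing Python
function.

```python
import math

nan = float('nan')
other_nan = float('nan')

print(nan in [nan])         # True:  identity is checked before ==
print(other_nan in [nan])   # False: different object, and nan != nan

def same_number(x, y):
    """Numerically equal, or both NaN (hypothetical helper)."""
    if isinstance(x, float) and isinstance(y, float):
        if math.isnan(x) and math.isnan(y):
            return True
    return x == y

print(same_number(nan, other_nan))                      # True
print(any(same_number(other_nan, item) for item in [nan]))  # True
```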

Address comparisons are great when you're dealing with singletons, or
when you carefully intern your objects.  In other cases, you're left
with ==.  This puts a great deal of responsibility on the programmer of
an == method to weigh carefully the potentially conflicting demands of 
compatibility (many other libraries just expect == to be an equality
operator returning a straightforward truth value, and given that there
isn't a separate dedicated equality operator, this isn't unreasonable),
and doing something more domain-specifically useful.

It's worth pointing out that numpy isn't unique in having == not return
a straightforward truth value.  The SAGE computer algebra system (and
sympy, I believe) implement the == operator on algebraic formulae so as
to construct equations.  For example, the following is syntactically and
semantically Python, with fancy libraries.

sage: var('x')  # x is now a variable
x
sage: solve(x**2 + 2*x - 4 == 1)
[x == -sqrt(6) - 1, x == sqrt(6) - 1]

(SAGE has some syntactic tweaks, such as ^ meaning the same as **, but I
didn't use them.)

I think this is an excellent use of the == operator -- but it does have
some potential to interfere with other libraries which make assumptions
about how == behaves.  The SAGE developers have been clever here,
though:

sage: 2*x + 1 == (2 + 4*x)/2
2*x + 1 == (4*x + 2)/2
sage: bool(2*x + 1 == (2 + 4*x)/2)
True
sage: bool(2*x + 1 == (2 + 4*x)/3)
False
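
The same trick can be imitated in plain Python.  The classes Linear and
Equation below are hypothetical toys, not SAGE's implementation: == builds
an equation object, and that object's __bool__ supplies a truth value for
code that expects one.

```python
class Linear:
    """The expression a*x + b with numeric a and b (toy example)."""
    def __init__(self, a, b):
        self.a, self.b = a, b

    def __eq__(self, other):
        # Build an equation instead of returning a truth value.
        return Equation(self, other)

    def __repr__(self):
        return '%r*x + %r' % (self.a, self.b)

class Equation:
    def __init__(self, lhs, rhs):
        self.lhs, self.rhs = lhs, rhs

    def __bool__(self):
        # Truth value: do both sides denote the same expression?
        return (self.lhs.a, self.lhs.b) == (self.rhs.a, self.rhs.b)

    def __repr__(self):
        return '%r == %r' % (self.lhs, self.rhs)

eq = Linear(2, 1) == Linear(2, 1)
print(eq)                                            # 2*x + 1 == 2*x + 1
print(bool(eq))                                      # True
print(Linear(2, 1) in [Linear(4, 2), Linear(2, 1)])  # True
```

Because list membership converts each comparison result with bool(), the
last line works even though == never returns a bool itself.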

I think Python manages surprisingly well with its limited equality
predicates.  But the keyword there is `surprisingly' -- and it may not
continue this trick forever.

-- [mdw]


Re: Rich Comparisons Gotcha

2009-01-05 Thread Mark Wooding
Steven D'Aprano ste...@remove.this.cybersource.com.au wrote:

 I've already mentioned NaNs. Sentinel values also sometimes need to
 compare not equal with themselves. Forcing them to compare equal will
 cause breakage.

There's a conflict between such domain-specific considerations (NaNs,
strange sentinels, SAGE's equations), and relatively natural assumptions
about an == operator, such as it being an equivalence relation.

I don't know how to resolve this conflict without introducing a new
function which is (or at least strongly encourages developers to arrange
for it to be) an equivalence relation.

-- [mdw]


Re: Rich Comparisons Gotcha

2009-01-05 Thread Steven D'Aprano
On Tue, 06 Jan 2009 01:24:58 +, Mark Wooding wrote:

 Steven D'Aprano ste...@remove.this.cybersource.com.au wrote:
 
 I've already mentioned NaNs. Sentinel values also sometimes need to
 compare not equal with themselves. Forcing them to compare equal will
 cause breakage.
 
 There's a conflict between such domain-specific considerations (NaNs,
 strange sentinels, SAGE's equations), and relatively natural assumptions
 about an == operator, such as it being an equivalence relation.

Such assumptions only hold under particular domains though. You can't 
assume equality is an equivalence relation once you start thinking about 
arbitrary domains.


 I don't know how to resolve this conflict without introducing a new
 function which is (or at least strongly encourages developers to arrange
 for it to be) an equivalence relation.

But there cannot be any such function which is a domain-independent 
equivalence relation, not if we're talking about arbitrarily wacky 
domains. Even something as straight-forward as 'is' can't be an 
equivalence relation under a domain where identity isn't well-defined.


-- 
Steven


Re: Rich Comparisons Gotcha

2008-12-11 Thread Steven D'Aprano
On Wed, 10 Dec 2008 17:58:49 -0500, Luis Zarrabeitia wrote:

 On Sunday 07 December 2008 09:21:18 pm Robert Kern wrote:
 The deficiency is in the feature of rich comparisons, not numpy's
 implementation of it. __eq__() is allowed to return non-booleans;
 however, there are some parts of Python's implementation like
 list.__contains__() that still expect the return value of __eq__() to
 be meaningfully cast to a boolean.
 
 list.__contains__, tuple.__contains__, the 'if' keyword...
 
 How do you suggest fixing the list.__contains__ implementation?


I suggest you don't, because I don't think it's broken. I think it's 
working as designed. It doesn't succeed with arbitrary data types which 
may be broken, buggy or incompatible with __contains__'s design, but 
that's okay, it's not supposed to.


 Should I wrap all my ifs with this?:
 
 if isinstance(a, numpy.ndarray) or isinstance(b, numpy.ndarray):
     res = compare_numpy(a, b)
 elif isinstance(a, someotherclass) or isinstance(b, someotherclass):
     res = compare_someotherclass(a, b)
 ...
 else:
     res = (a == b)
 if res:
     # do whatever

No, inlining that code everywhere you have an if would be stupid. What 
you should do is write a single function equals(x, y) that does precisely 
what you want it to do, in whatever way you want, and then call it:

if equals(a, b):
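
One possible shape for such a function, as a minimal sketch.  It assumes
that an elementwise comparison result offers an all() method, as numpy's
boolean arrays do; anything else is converted with bool().

```python
def equals(x, y):
    """Return a plain bool for x == y, even if == yields an elementwise result.

    Assumption of this sketch: elementwise results provide an all() method.
    """
    result = (x == y)
    if isinstance(result, bool):
        return result
    all_method = getattr(result, 'all', None)
    if callable(all_method):
        # Collapse the elementwise answer: equal only if every element is.
        return bool(all_method())
    return bool(result)

print(equals(1, 1.0))            # True
print(equals([1, 2], [1, 3]))    # False
```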

Or, put your data inside a wrapper. If you read back over my earlier 
posts in this thread, I suggested a lightweight wrapper class you could 
use. You could make it even more useful by using delegation to make the 
wrapped class behave *exactly* like the original, except for __eq__.

You don't even need to wrap every single item:

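def wrap_or_not(obj):
    # Test the type, not the object: "obj in some_list" would itself
    # call the == we are trying to avoid.
    if type(obj) in list_of_bad_types_i_know_about:
        return EqualityWrapper(obj)
    return obj

data = [1, 2, 3, BadData, 4]
data = [wrap_or_not(obj) for obj in data]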
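
The wrapper class from the earlier posts is not reproduced in this part of
the thread, so the following is only a guess at its shape: a hypothetical
EqualityWrapper that compares by identity and delegates ordinary attribute
access to the wrapped object.  Awkward is a stand-in for a type whose ==
cannot be used as a truth value.

```python
class EqualityWrapper:
    """Identity-based equality; other attributes are delegated (a sketch)."""
    def __init__(self, obj):
        self._obj = obj

    def __eq__(self, other):
        if isinstance(other, EqualityWrapper):
            other = other._obj
        return self._obj is other

    def __ne__(self, other):
        return not self.__eq__(other)

    def __hash__(self):
        return id(self._obj)

    def __getattr__(self, name):
        # Called only for attributes not found on the wrapper itself.
        return getattr(self._obj, name)

class Awkward:
    """Stand-in for a type with an unusable == operator."""
    name = 'awkward'
    def __eq__(self, other):
        raise ValueError('truth value is ambiguous')

bad = Awkward()
data = [1, EqualityWrapper(bad), 3]
print(3 in data)                        # True
print(data.index(EqualityWrapper(bad))) # 1
print(data[1].name)                     # awkward
```

Note that __getattr__ does not forward special methods such as __len__ or
__add__, which Python looks up on the type; a wrapper meant to behave
*exactly* like the original would have to define those explicitly.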



It isn't really that hard to deal with these things, once you give up the 
illusion that your code should automatically work with arbitrarily wacky 
data types that you don't control.


-- 
Steven


Re: Rich Comparisons Gotcha

2008-12-11 Thread M.-A. Lemburg
On 2008-12-10 23:21, Luis Zarrabeitia wrote:
 On Wednesday 10 December 2008 02:44:45 pm you wrote:
 Even in statically typed languages, when you override the equality
 operator/function you can choose not to return a valid answer (raise an
 exception). And it would break all the cases mentioned above (element in
 list, etc). But that isn't the right thing to do. The language
 doesn't/can't prohibit you from breaking the equality test, but that
 shouldn't be considered a feature. (a==b).all() makes no sense.
 Perhaps not in your application, but it does make sense in other
 numeric applications, e.g. ones that work on vectors or matrixes.

 I'd suggest you simply wrap the comparison in a function and then
 have that apply the necessary conversion to a boolean.
 
 I do numeric work... I'm finishing my MSc in applied math and I'm programming 
 mostly with python. And I'd rather have a.compare_with(b), or 
 a.elementwise_compare(b), or whatever name, rather than (a==b).all(). In 
 fact, I'd very much like to have an a.compare_with(b, epsilon=e).all() (to 
 account for rounding errors), and with python2.5, all(a.compare_with(b)). 
 
 Yes, I could create an element_compare(a,b) function. But I still can't use 
 a==b and have a meaningful result. Ok, I can (and do) ignore that, it's just 
 one library, I'll keep track of the types before asking for equality (already 
 an ugly thing to do in python), but the a==b behaviour breaks the lists (a in 
 ll, ll.index(a)) even for elements not in numpy. Should I also ignore 
 lists?

You should perhaps reconsider your use of lists. Lists with elements
of different types can be tricky at times, so perhaps you either need
a different data type which doesn't scan all elements or a separate
search function that knows about your type setup.

The fact that comparisons can raise exceptions is not new to Python,
so this problem can pop up in other areas as well, esp. when using
3rd party extensions.

Regarding the other issues like new methods you should really talk
to the numpy developers, since they are the ones who could fix this.

 The concept of equality between two arrays is very well defined, as is the
 element-by-element comparison. There is a need to test for both - then the
 way to test for equality should be the equality test.
 
 I'm certain that something could be worked out. A quick paragraph that
 took me just a few minutes to type shouldn't be construed as a PEP that
 will solve all the problems :D.
 As always: the Devil is in the details :-)
 
 Of course... 

-- 
Marc-Andre Lemburg
eGenix.com



Re: Rich Comparisons Gotcha

2008-12-10 Thread Rasmus Fogh

Rhamphoryncus wrote:
 You grossly overvalue using the in operator on lists.

Maybe. But there is more to it than just 'in'. If you do:
 c = numpy.zeros((2,))
 ll = [1, c, 3.]
then the following all throw errors:
3 in ll, 3 not in ll, ll.index(3), ll.count(3), ll.remove(3)
c in ll, c not in ll, ll.index(c), ll.count(c), ll.remove(c)

Note how the presence of c in the list makes it behave wrong for 3 as
well.
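
The effect can be reproduced without numpy.  In this sketch, Elementwise
and Ambiguous are hypothetical stand-ins for an array type and for its
comparison result, and contains is a small helper that reports the
exception name instead of propagating it.

```python
class Ambiguous:
    """Comparison result that refuses to be a single truth value."""
    def __bool__(self):
        raise ValueError('truth value of an elementwise comparison is ambiguous')

class Elementwise:
    """Stand-in for an array type whose == is elementwise."""
    def __init__(self, items):
        self.items = list(items)

    def __eq__(self, other):
        return Ambiguous()

def contains(seq, obj):
    """Return the result of `obj in seq`, or the name of the exception raised."""
    try:
        return obj in seq
    except ValueError as exc:
        return type(exc).__name__

c = Elementwise([0.0, 0.0])
ll = [1, c, 3.]

print(contains(ll, 1))       # True: found at index 0, before c is reached
print(contains(ll, 3))       # ValueError: the scan has to compare c with 3
print(contains(ll, c))       # ValueError: comparing 1 with c is already ambiguous
print(contains([c, 1], c))   # True: the identity check comes before ==
```

Whether a lookup fails therefore depends on the position of c relative to
the item being searched for.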

 It's far more
 common to use a dict or set for containment tests, due to O(1)
 performance rather than O(n).  I doubt the numpy array supports
 hashing, so an error for misuse is all you should expect.

Indeed it does not. So there is not much to be gained from modifying
equality comparison with sets/dicts.

 In the rare case that you want to test for identity in a list, you can
 easily write your own function to do it upfront:

 def idcontains(seq, obj):
     for i in seq:
         if i is obj:
             return True
     return False

Again, you can code around any particular case (though wrappers look like
a more robust solution). Still, why not get rid of this wart, if we can
find a way?


---
Dr. Rasmus H. Fogh  Email: [EMAIL PROTECTED]
Dept. of Biochemistry, University of Cambridge,
80 Tennis Court Road, Cambridge CB2 1GA, UK. FAX (01223)766002


Re: Rich Comparisons Gotcha

2008-12-10 Thread Rasmus Fogh

Rhodri James wrote:
 On Mon, 08 Dec 2008 14:24:59 -, Rasmus Fogh  wrote:

 On the minus side there would be the difference between
 '__equal__' and '__eq__' to confuse people.

 This is a very big minus.  It would be far better to spell __equal__ in
 such a way as to make it clear why it wasn't the same as __eq__,
 otherwise
 you end up with the confusion that the Perl == and eq operators
 regularly cause.

You are probably right, unfortunately. That proposal is unlikely to fly.
Do you think my latest proposal, raising BoolNotDefinedError, has better
chances?

---
Dr. Rasmus H. Fogh  Email: [EMAIL PROTECTED]
Dept. of Biochemistry, University of Cambridge,
80 Tennis Court Road, Cambridge CB2 1GA, UK. FAX (01223)766002


Re: Rich Comparisons Gotcha

2008-12-10 Thread Luis Zarrabeitia

Quoting Rasmus Fogh [EMAIL PROTECTED]:
 Rhamphoryncus wrote:
  You grossly overvalue using the in operator on lists.
 
 Maybe. But there is more to it than just 'in'. If you do:
  c = numpy.zeros((2,))
  ll = [1, c, 3.]
 then the following all throw errors:
 3 in ll, 3 not in ll, ll.index(3), ll.count(3), ll.remove(3)
 c in ll, c not in ll, ll.index(c), ll.count(c), ll.remove(c)
 
 Note how the presence of c in the list makes it behave wrong for 3 as
 well.

I think I lost the first messages on this thread, but... Wouldn't it be easier to
just fix numpy? I see no need to have the == return anything but a boolean, at
least in numpy's case. The syntax 'a == b' is an equality test, not a detailed
summary of why they may be different, and (a==b).all() makes little sense to
read unless you know beforehand that a and b are numpy arrays. When I'm comparing
normal objects, I do not expect (nor should I) the == operator to return an
attribute-by-attribute summary of what was equal and what wasn't.

Why is numpy's == overloaded in such a counterintuitive way? I realize that an
elementwise comparison makes a lot of sense, but it could have been done instead
with a.compare_with(b) (or even better, a.compare_with(b, epsilon=e)). No
unexpected breakage, and you have the chance of specifying when you consider two
elements to be equal - very useful. 
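
A minimal sketch of such a comparison, written as a free function over
plain sequences rather than as an array method.  The name compare_with and
the epsilon parameter follow the suggestion above; the default tolerance
and the absolute-difference test are assumptions of this sketch.

```python
def compare_with(a, b, epsilon=1e-9):
    """Elementwise comparison of two equal-length sequences of numbers.

    Returns a list of bools; two elements count as equal when they differ
    by at most epsilon.
    """
    if len(a) != len(b):
        raise ValueError('sequences differ in length')
    return [abs(x - y) <= epsilon for x, y in zip(a, b)]

a = [0.1 + 0.2, 1.0]
b = [0.3, 1.0]
print(a == b)                                # False: 0.1 + 0.2 != 0.3 exactly
print(compare_with(a, b))                    # [True, True]
print(all(compare_with(a, b)))               # True
print(all(compare_with(a, b, epsilon=0.0)))  # False
```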

Even the transition itself could be done without breaking much code... Make the
== op return an object that wraps the array of bools (instead of the array
itself), give it the any() and all() methods, and make __nonzero__/__bool__
equivalent to all().
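
The idea can be sketched with two hypothetical classes: ComparisonResult
wraps the elementwise answers, and Vector is a toy array type whose ==
returns it.  Neither is part of numpy.

```python
class ComparisonResult:
    """Wraps elementwise comparison results; truth value means all()."""
    def __init__(self, flags):
        self.flags = [bool(f) for f in flags]

    def all(self):
        return all(self.flags)

    def any(self):
        return any(self.flags)

    def __bool__(self):          # spelled __nonzero__ in Python 2
        return self.all()

    def __iter__(self):
        return iter(self.flags)

class Vector:
    """Toy array type whose == returns a ComparisonResult."""
    def __init__(self, items):
        self.items = list(items)

    def __eq__(self, other):
        if not isinstance(other, Vector) or len(other.items) != len(self.items):
            return False
        return ComparisonResult(x == y for x, y in zip(self.items, other.items))

v = Vector([1, 2, 3])
print(bool(v == Vector([1, 2, 3])))     # True
print(bool(v == Vector([1, 0, 3])))     # False
print((v == Vector([1, 0, 3])).any())   # True
print(list(v == Vector([1, 0, 3])))     # [True, False, True]
print(3 in [1, v, 3])                   # True
```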

-- 
Luis Zarrabeitia
Facultad de Matemática y Computación, UH
http://profesores.matcom.uh.cu/~kyrie


Re: Rich Comparisons Gotcha

2008-12-10 Thread M.-A. Lemburg
On 2008-12-10 16:40, Luis Zarrabeitia wrote:
 Quoting Rasmus Fogh [EMAIL PROTECTED]:
 Rhamphoryncus wrote:
 You grossly overvalue using the in operator on lists.
 Maybe. But there is more to it than just 'in'. If you do:
 c = numpy.zeros((2,))
 ll = [1, c, 3.]
 then the following all throw errors:
 3 in ll, 3 not in ll, ll.index(3), ll.count(3), ll.remove(3)
 c in ll, c not in ll, ll.index(c), ll.count(c), ll.remove(c)

 Note how the presence of c in the list makes it behave wrong for 3 as
 well.
 
 I think I lost the first messages on this thread, but... Wouldn't it be easier to
 just fix numpy? I see no need to have the == return anything but a boolean, at
 least in Numpy's case. The syntax 'a == b' is an equality test, not a detailed
 summary of why they may be different, and (a==b).all() makes little sense to
 read unless you know beforehand that a and b are numpy arrays. When I'm comparing
 normal objects, I do not expect (nor should I) the == operator to return an
 attribute-by-attribute summary of what was equal and what wasn't.
 
 Why is numpy's == overloaded in such a counterintuitive way? I realize that an
 elementwise comparison makes a lot of sense, but it could have been done instead
 with a.compare_with(b) (or even better, a.compare_with(b, epsilon=e)). No
 unexpected breakage, and you have the chance of specifying when you consider two
 elements to be equal - very useful. 

Rich comparisons were added to Python at the request of the
Numeric (now numpy) developers and they have been part of Python
and Numeric for many many years.

I don't think it's likely they'll change things back to the days
of Python 1.5.2 ;-)

 Even the transition itself could be done without breaking much code... Make the
 == op return an object that wraps the array of bools (instead of the array
 itself), give it the any() and all() methods, and make __nonzero__/__bool__
 equivalent to all().

That would cause a lot of confusion on its own, since such an
object wouldn't behave in the same way as say a regular Python
list (bool([0]) == True).

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Dec 10 2008)
 Python/Zope Consulting and Support ...http://www.egenix.com/
 mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
 mxODBC, mxDateTime, mxTextTools ...http://python.egenix.com/

2008-12-02: Released mxODBC.Connect 1.0.0  http://python.egenix.com/

::: Try our new mxODBC.Connect Python Database Interface for free ! 


   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
   Registered at Amtsgericht Duesseldorf: HRB 46611
   http://www.egenix.com/company/contact/


Re: Rich Comparisons Gotcha

2008-12-10 Thread Rhamphoryncus
On Dec 10, 7:49 am, Rasmus Fogh [EMAIL PROTECTED] wrote:
 Rhamphoryncus wrote:
  You grossly overvalue using the in operator on lists.

 Maybe. But there is more to it than just 'in'. If you do:
  c = numpy.zeros((2,))
  ll = [1, c, 3.]

 then the following all throw errors:
 3 in ll, 3 not in ll, ll.index(3), ll.count(3), ll.remove(3)
 c in ll, c not in ll, ll.index(c), ll.count(c), ll.remove(c)

 Note how the presence of c in the list makes it behave wrong for 3 as
 well.

All of these are O(n).  Use a set or dict.  What is your use case
anyway?


  It's far more
  common to use a dict or set for containment tests, due to O(1)
  performance rather than O(n).  I doubt the numpy array supports
  hashing, so an error for misuse is all you should expect.

 Indeed it does not. So there is not much to be gained from modifying
 equality comparison with sets/dicts.

  In the rare case that you want to test for identity in a list, you can
  easily write your own function to do it upfront:
  def idcontains(seq, obj):
      for i in seq:
          if i is obj:
              return True
      return False

 Again, you can code around any particular case (though wrappers look like
 a more robust solution). Still, why not get rid of this wart, if we can
 find a way?

The wart is a feature.  I agree that it's confusing, but the cost of
adding a special case to work around it is far in excess of the
original problem.

Now if you phrased it as a hypothetical discussion for the purpose of
learning about language design, that'd be another matter.


Re: Rich Comparisons Gotcha

2008-12-10 Thread Terry Reedy

Rasmus Fogh wrote:

Rhamphoryncus wrote:

You grossly overvalue using the in operator on lists.


Maybe. But there is more to it than just 'in'. If you do:

c = numpy.zeros((2,))
ll = [1, c, 3.]

then the following all throw errors:
3 in ll, 3 not in ll, ll.index(3), ll.count(3), ll.remove(3)
c in ll, c not in ll, ll.index(c), ll.count(c), ll.remove(c)

Note how the presence of c in the list makes it behave wrong for 3 as
well.


So do not put numpy arrays into lists without wrapping them.  They were 
designed and semi-optimized, by a separate community, for a specific 
purpose -- numerical computation -- and not for 'playing nice' with 
other Python objects.
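
One way the wrapping suggested above could look, as a sketch under stated
assumptions: ById is a hypothetical wrapper that compares and hashes by the
identity of its payload, and Elementwise is only a stand-in for an array type
whose '==' result has no truth value. Neither name comes from numpy.

```python
class ById:
    """Hypothetical wrapper: equal only to a wrapper of the same object."""

    def __init__(self, obj):
        self.obj = obj

    def __eq__(self, other):
        return isinstance(other, ById) and self.obj is other.obj

    def __hash__(self):
        return id(self.obj)


class _NoTruth:
    def __bool__(self):
        raise ValueError("truth value is ambiguous")


class Elementwise:
    """Stand-in for an array: '==' returns something that is not a bool."""

    def __eq__(self, other):
        return _NoTruth()


c = Elementwise()
ll = [1, ById(c), 3.0]  # only the array-like object is wrapped

found_three = 3 in ll
found_c = ById(c) in ll
position = ll.index(ById(c))
```

The list operations never reach Elementwise.__eq__, because every comparison
involving the wrapped element is answered by ById.__eq__.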


It is a design feature of Python that people can implement specialized 
objects with specialized behaviors for specialized purposes.




Re: Rich Comparisons Gotcha

2008-12-10 Thread Luis Zarrabeitia
On Wednesday 10 December 2008 10:50:57 am M.-A. Lemburg wrote:
 On 2008-12-10 16:40, Luis Zarrabeitia wrote:
  Quoting Rasmus Fogh [EMAIL PROTECTED]:
  Rhamphoryncus wrote:

 Rich comparisons were added to Python at the request of the
 Numeric (now numpy) developers and they have been part of Python
 and Numeric for many many years.

 I don't think it's likely they'll change things back to the days
 of Python 1.5.2 ;-)

Please define rich comparisons for me. It seems that I do not understand the 
term - I was thinking it meant the ability to override the comparison 
operators, and specially, the ability to override them independently.

Even in statically typed languages, when you override the equality 
operator/function you can choose not to return a valid answer (raise an 
exception). And it would break all the cases mentioned above (element in 
list, etc). But that isn't the right thing to do. The language doesn't/can't 
prohibit you from breaking the equality test, but that shouldn't be 
considered a feature. (a==b).all() makes no sense. 

  Even the transition itself could be done without breaking much code...
  Make the == op return an object that wraps the array of bools (instead of
  the array itself), give it the any() and all() methods, and make
  __nonzero__/__bool__ equivalent to all().

 That would cause a lot of confusion on its own, since such an
 object wouldn't behave in the same way as say a regular Python
 list (bool([0]) == True).

I'm certain that something could be worked out. A quick paragraph that took me 
just a few minutes to type shouldn't be construed as a PEP that will solve 
all the problems :D.

-- 
Luis Zarrabeitia (aka Kyrie)
Fac. de Matemática y Computación, UH.
http://profesores.matcom.uh.cu/~kyrie


Re: Rich Comparisons Gotcha

2008-12-10 Thread M.-A. Lemburg
On 2008-12-10 20:01, Luis Zarrabeitia wrote:
 On Wednesday 10 December 2008 10:50:57 am M.-A. Lemburg wrote:
 On 2008-12-10 16:40, Luis Zarrabeitia wrote:
 Quoting Rasmus Fogh [EMAIL PROTECTED]:
 Rhamphoryncus wrote:
 Rich comparisons were added to Python at the request of the
 Numeric (now numpy) developers and they have been part of Python
 and Numeric for many many years.

 I don't think it's likely they'll change things back to the days
 of Python 1.5.2 ;-)
 
 Please define rich comparisons for me. It seems that I do not understand the
 term - I was thinking it meant the ability to override the comparison
 operators, and specially, the ability to override them independently.

That's one of the features rich comparisons added. Another is
the ability to return arbitrary objects instead of just booleans
or integers:

http://www.python.org/dev/peps/pep-0207/

David was a Numeric developer at the time (among other things).

 Even in statically typed languages, when you override the equality 
 operator/function you can choose not to return a valid answer (raise an 
 exception). And it would break all the cases mentioned above (element in 
 list, etc). But that isn't the right thing to do. The language doesn't/can't 
 prohibit you from breaking the equality test, but that shouldn't be 
 considered a feature. (a==b).all() makes no sense. 

Perhaps not in your application, but it does make sense in other
numeric applications, e.g. ones that work on vectors or matrixes.

I'd suggest you simply wrap the comparison in a function and then
have that apply the necessary conversion to a boolean.

 Even the transition itself could be done without breaking much code...
 Make the == op return an object that wraps the array of bools (instead of
 the array itself), give it the any() and all() methods, and make
 __nonzero__/__bool__ equivalent to all().
 That would cause a lot of confusion on its own, since such an
 object wouldn't behave in the same way as say a regular Python
 list (bool([0]) == True).
 
 I'm certain that something could be worked out. A quick paragraph that took me
 just a few minutes to type shouldn't be construed as a PEP that will solve
 all the problems :D.

As always: the Devil is in the details :-)

-- 
Marc-Andre Lemburg
eGenix.com



Re: Rich Comparisons Gotcha

2008-12-10 Thread Luis Zarrabeitia
On Wednesday 10 December 2008 02:44:45 pm you wrote:
  Even in statically typed languages, when you override the equality
  operator/function you can choose not to return a valid answer (raise an
  exception). And it would break all the cases mentioned above (element in
  list, etc). But that isn't the right thing to do. The language
  doesn't/can't prohibit you from breaking the equality test, but that
  shouldn't be considered a feature. (a==b).all() makes no sense.

 Perhaps not in your application, but it does make sense in other
 numeric applications, e.g. ones that work on vectors or matrixes.

 I'd suggest you simply wrap the comparison in a function and then
 have that apply the necessary conversion to a boolean.

I do numeric work... I'm finishing my MSc in applied math and I'm programming 
mostly with python. And I'd rather have a.compare_with(b), or 
a.elementwise_compare(b), or whatever name, rather than (a==b).all(). In 
fact, I'd very much like to have an a.compare_with(b, epsilon=e).all() (to 
account for rounding errors), and with python2.5, all(a.compare_with(b)). 
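
A sketch of what such a tolerance-aware comparison might look like as a free
function on plain sequences of floats. The name compare_with and the epsilon
parameter follow the wording above; treating epsilon as an absolute tolerance
is an assumption, and nothing here is an existing numpy API.

```python
import math


def compare_with(a, b, epsilon=1e-9):
    """Elementwise comparison of two equal-length sequences of floats.

    Returns a list of booleans; two elements count as equal when they
    differ by at most epsilon (absolute tolerance, an assumption).
    """
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return [math.isclose(x, y, rel_tol=0.0, abs_tol=epsilon)
            for x, y in zip(a, b)]


flags = compare_with([0.1 + 0.2, 1.0], [0.3, 1.0])
mixed = compare_with([1.0, 2.0], [1.0, 2.5])
```

all(flags) then answers "are these equal up to rounding error", while the list
itself keeps the elementwise detail.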

Yes, I could create an element_compare(a,b) function. But I still can't use 
a==b and have a meaningful result. Ok, I can (and do) ignore that, it's just 
one library, I'll keep track of the types before asking for equality (already 
an ugly thing to do in python), but the a==b behaviour breaks the lists (a in 
ll, ll.index(a)) even for elements not in numpy. Should I also ignore 
lists?

The concept of equality between two arrays is very well defined, as is the 
element-by-element comparison. There is a need to test for both, so the way 
to test for equality should be the equality test.

  I'm certain that something could be worked out. A quick paragraph that
  took me just a few minutes to type shouldn't be construed as a PEP that
  will solve all the problems :D.

 As always: the Devil is in the details :-)

Of course... 

-- 
Luis Zarrabeitia (aka Kyrie)
Fac. de Matemática y Computación, UH.
http://profesores.matcom.uh.cu/~kyrie


Re: Rich Comparisons Gotcha

2008-12-10 Thread Luis Zarrabeitia
On Sunday 07 December 2008 09:21:18 pm Robert Kern wrote:
 The deficiency is in the feature of rich comparisons, not numpy's
 implementation of it. __eq__() is allowed to return non-booleans; however,
 there are some parts of Python's implementation like list.__contains__()
 that still expect the return value of __eq__() to be meaningfully cast to a
 boolean.

list.__contains__, tuple.__contains__, the 'if' keyword...

How do you suggest fixing the list.__contains__ implementation?

Should I wrap all my ifs with this?:

if isinstance(a, numpy.ndarray) or isinstance(b, numpy.ndarray):
    res = compare_numpy(a, b)
elif isinstance(a, some_otherclass) or isinstance(b, some_otherclass):
    res = compare_someotherclass(a, b)
# ...
else:
    res = (a == b)
if res:
    pass  # do whatever

-- 
Luis Zarrabeitia (aka Kyrie)
Fac. de Matemática y Computación, UH.
http://profesores.matcom.uh.cu/~kyrie


Re: Rich Comparisons Gotcha

2008-12-10 Thread Steven D'Aprano
On Wed, 10 Dec 2008 17:21:51 -0500, Luis Zarrabeitia wrote:

 I do numeric work... I'm finishing my MSc in applied math and I'm
 programming mostly with python. And I'd rather have a.compare_with(b), or
 a.elementwise_compare(b), or whatever name, rather than (a==b).all(). 

Unluckily for you, the Numeric/Numpy people wanted something else. They 
asked first, there's a lot more of them, and their project is very 
important to Python's continued success.



 In
 fact, I'd very much like to have an a.compare_with(b, epsilon=e).all()
 (to account for rounding errors), and with python2.5,
 all(a.compare_with(b)).
 
 Yes, I could create an element_compare(a,b) function. 

Absolutely.


 But I still can't use a==b and have a meaningful result. 

That's right. *ANY* operation in Python can fail, given arbitrary data, 
with the possible exception of the id() function and the "is" and "is 
not" operators. You have to deal with it.


 Ok, I can (and do) ignore that,
 it's just one library, I'll keep track of the types before asking for
 equality (already an ugly thing to do in python), but the a==b behaviour
 breaks the lists (a in ll, ll.index(a)) even for elements not in
 numpy. Should I also ignore lists?

That depends on what sort of contract your code is giving. Does it 
promise to work with any imaginable data whatsoever, no matter how badly 
broken or poorly designed or incompatible with what you're trying to do?

If so, then I suggest your contract is broken, not the behaviour of list. 
You can't make trustworthy promises to deal with arbitrary data types 
that you don't control, that can fail in arbitrary ways. Here's something 
for you to consider:

class Boobytrap:
    def __eq__(self, other):
        if other == 1:
            return True
        elif other == 2:
            while True:
                pass
        return False

>>> alist = [0, Boobytrap(), 2, 3]
>>> 1 in alist
True
>>> 3 in alist
True
>>> 5 in alist
False
>>> 2 in alist


What do you expect should happen?


 
 The concept of equality between two arrays is very well defined, as is
 the element-by-element comparison. There is a need to test for both, so
 the way to test for equality should be the equality test.

The Numpy people disagree with you. It was from their request that Python 
was changed to allow __eq__ to return arbitrary objects.




-- 
Steven


Re: Rich Comparisons Gotcha

2008-12-09 Thread Rasmus Fogh

Steven DAprano wrote:
 On Mon, 08 Dec 2008 14:24:59 +, Rasmus Fogh wrote:

 For my personal problem I could indeed wrap all objects in a wrapper
 with whatever 'correct' behaviour I want (thanks, TJR). It does seem a
 bit much, though, just to get code like this to work as intended:
   alist.append(x)
   print ('x is present: ', x in alist)

 So, I would much prefer a language change. I am not competent to even
 propose one properly, but I'll try.

 You think changing the language is easier than applying a wrapper to
 your own data??? Oh my, that's too funny for words.

Any individual case of the problem can be hacked somehow - I have already
fixed this one.

My point is that python would be a better language if well-written classes
that followed normal python conventions could be relied on to work
correctly with list, and that it is worth trying to bring this about.
Lists are a central structure of the language after all. Of course you can
disagree, or think the work required would be disproportionate, but surely
there is nothing unreasonable about my point?

Rasmus

---
Dr. Rasmus H. Fogh  Email: [EMAIL PROTECTED]
Dept. of Biochemistry, University of Cambridge,
80 Tennis Court Road, Cambridge CB2 1GA, UK. FAX (01223)766002


Re: Rich Comparisons Gotcha

2008-12-09 Thread Rasmus Fogh
Steven DAprano wrote:
 On Mon, 08 Dec 2008 14:24:59 +, Rasmus Fogh wrote:

snip

 What might be a sensible behaviour (unlike your proposed wrapper)

Sorry
1) I was rude,
2) I thanked TJR for your wrapper class proposal in a later mail. It is
yours.

 What do you dislike about my wrapper class? Perhaps it is fixable.

I think it is a basic requirement for functioning lists that you get
>>> alist = [1,x]
>>> x in alist
True
>>> alist.remove(x)
>>> alist
[1] # unless of course x == 1, in which case the list is [x].

Your wrapper would not provide this behaviour. It is necessary to do
if x is y:
  return True
be it in the eq() function, or in the list implementation. Note that this
is the current python behaviour for nan in lists, whatever the mathematics
say.
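
The NaN behaviour mentioned above can be checked directly in CPython. This is
a small sketch using only built-ins; the variable names are arbitrary.

```python
x = float('nan')
y = float('nan')

assert not (x == x)    # a NaN never compares equal, not even to itself

alist = [1, x]
assert x in alist      # found through the identity check, not through ==
assert y not in alist  # a different NaN object: identity and == both fail

alist.remove(x)        # remove() applies the same identity-then-== test
assert alist == [1]
```

So lists already apply identity semantics to NaN, even though x == x is False.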

 would be the following:

 def eq(x, y):
     if x is y:
         return True

 I've already mentioned NaNs. Sentinel values also sometimes need to
 compare not equal with themselves. Forcing them to compare equal will
 cause breakage.

The list.__contains__ method already checks 'x is y' before it checks 'x
== y'. I'd say that a list where my example above does not work is broken
already, but of course I do not want to break further code. Could you give
an example of this use of sentinel values?

     else:
         try:
             return (x == y)
         except Exception:
             return False

 Why False? Why not True? If an error occurs inside __eq__, how do you
 know that the correct result was False?

 class Broken(object):
     def __eq__(self, other):
         return Treu  # oops, raises NameError

In managing collections the purpose of eq would be to divide objects into
a small set that are all equal to each other, and a larger set that are
all unequal to all members of the first set. That requires default to
False. If you default to True then eq(aNumpyArray, x) would return True
for all x.

If an error occurs inside __eq__ it could be 1) because __eq__ is badly
written, or 2) because the type of y was not considered by the
implementers of x or is in some deep way incompatible with x. 1) I cannot
help, and for 2) I am simply saying that value semantics require an __eq__
that returns a truth value. In the absence of that I want identity
semantics.

Rasmus



Re: Rich Comparisons Gotcha

2008-12-09 Thread Mark Dickinson
On Dec 8, 2:24 pm, Rasmus Fogh [EMAIL PROTECTED] wrote:

 So, I would much prefer a language change. I am not competent to even
 propose one properly, but I'll try.

I don't see any technical problems in what you propose:  as
far as I can see it's entirely feasible.  However:

 should. On the minus side there would be the difference between
 '__equal__' and '__eq__' to confuse people.

I think this is exactly what makes the idea a non-starter. There
are already enough questions on the lists about when to use 'is'
and when to use '==', without adding an 'equals' function into
the mix.  It would add significant extra complexity to the core
language, for questionable (IMO) gain.


There are certainly other languages for which this distinction
would make sense;  I just don't think it's appropriate
for Python, with its emphasis on practicality and
simplicity.

Mark



 On the plus side the behaviour
 of objects inside collections would now be explicitly defined, and __eq__
 and __equal__ would be so similar that most people could ignore the
 distinction.

 Some examples:

 # NaN:
 # For floats, __equal__ would be the same as __eq__. For NaN this gives
  x = float('NaN')
  y = float('NaN')
  x == x
 False
  equal(x,x)
 True
  equal(x,y)
 False
 # It may be problematical mathematically, but computationally it makes
 # perfect sense that looking in a given storage location will give you the
 # same value every time, even if the actual value happens to be undefined.
 # The behaviour is simple to describe, and indeed NaN does behave this way
 # in collections at the moment. All we are doing is documenting it clearly.

 # numpy
 Numpy would have no __equal__ function, so we would have pure identity
 semantics - 'equals(x,y)' would be the same as 'x is y'

 # ordinary numbers.
 Any Python object with value semantics would need an __equal__ function
 with the correct behaviour.
 Mark Dickinson pointed out the thread Comparing float and decimal, which
 shows that comparisons between float and decimal numbers do not currently
 satisfy 3). It would not be attractive to have __equal__ and __eq__ behave
 differently for ordinary numbers, so if the relevant __eq__ can not be
 fixed that is a problem for my proposal.

 At this point I shall try to retire gracefully. Regrettably I am not
 competent to discuss if this can be done, how it can be done, and how
 much work is required.

 Rasmus

 ---
 Dr. Rasmus H. Fogh                  Email: [EMAIL PROTECTED]
 Dept. of Biochemistry, University of Cambridge,
 80 Tennis Court Road, Cambridge CB2 1GA, UK.     FAX (01223)766002



Re: Rich Comparisons Gotcha

2008-12-09 Thread Rasmus Fogh

Mark Dickinson wrote:
 On Dec 8, 2:24 pm, Rasmus Fogh [EMAIL PROTECTED] wrote:

 So, I would much prefer a language change. I am not competent to even
 propose one properly, but I'll try.

 I don't see any technical problems in what you propose:  as
 far as I can see it's entirely feasible.  However:

 should. On the minus side there would be the difference between
 '__equal__' and '__eq__' to confuse people.

 I think this is exactly what makes the idea a non-starter. There
 are already enough questions on the lists about when to use 'is'
 and when to use '==', without adding an 'equals' function into
 the mix.  It would add significant extra complexity to the core
 language, for questionable (IMO) gain.

So:

It is perfectly acceptable behaviour to have __eq__ return a value that
cannot be cast to a boolean, but it still does break the python list. The
fixes proposed so far all get the thumbs down, for various good reasons.

How about:

- Define a new built-in Exception
BoolNotDefinedError(ValueError)

- Have list.__contains__ (etc.) use the following comparison internally:
def newCollectionTest(x,y):
    if x is y:
        return True
    else:
        try:
            return bool(x == y)
        except BoolNotDefinedError:
            return False

- Recommend that numpy.array.__nonzero__ and similar cases
  raise BoolNotDefinedError instead of ValueError

Objects that choose to raise BoolNotDefinedError will now work in lists,
with identity semantics.
Objects that do not raise BoolNotDefinedError have no change in behaviour.
Remains to be seen how hard it is to implement, and how much it slows down
list.__contains__

Rasmus

---
Dr. Rasmus H. Fogh  Email: [EMAIL PROTECTED]
Dept. of Biochemistry, University of Cambridge,
80 Tennis Court Road, Cambridge CB2 1GA, UK. FAX (01223)766002


Re: Rich Comparisons Gotcha

2008-12-09 Thread Rhamphoryncus
You grossly overvalue using the in operator on lists.  It's far more
common to use a dict or set for containment tests, due to O(1)
performance rather than O(n).  I doubt the numpy array supports
hashing, so an error for misuse is all you should expect.

In the rare case that you want to test for identity in a list, you can
easily write your own function to do it upfront:

def idcontains(seq, obj):
    for i in seq:
        if i is obj:
            return True
    return False


Re: Rich Comparisons Gotcha

2008-12-09 Thread Rhodri James
On Mon, 08 Dec 2008 14:24:59 -, Rasmus Fogh [EMAIL PROTECTED]  
wrote:



On the minus side there would be the difference between
'__equal__' and '__eq__' to confuse people.


This is a very big minus.  It would be far better to spell __equal__ in  
such a way as to make it clear why it wasn't the same as __eq__, otherwise  
you end up with the confusion that the Perl == and eq operators  
regularly cause.


--
Rhodri James *-* Wildebeeste Herder to the Masses


Re: Rich Comparisons Gotcha

2008-12-08 Thread James Stroud

Robert Kern wrote:

James Stroud wrote:
I think it skips straight to __eq__ if the element is not the first in 
the list.


No, it doesn't skip straight to __eq__(). y is 1 returns False, so 
(y==1) is checked. When y is a numpy array, this returns an array of 
bools. list.__contains__() tries to convert this array to a bool and 
ndarray.__nonzero__() raises the exception.


list.__contains__() checks "is" then __eq__() for each element before 
moving on to the next element. It does not try "is" for all elements, 
then try __eq__() for all elements.
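
The order dependence described above can be reproduced without numpy. In this
sketch Elementwise is a hypothetical stand-in for an array: its '==' returns
an object whose truth value raises ValueError, as numpy does for arrays with
more than one element.

```python
class NoTruth:
    def __bool__(self):
        raise ValueError("truth value is ambiguous")


class Elementwise:
    """Stand-in for an array type (an assumption, not numpy itself)."""

    def __eq__(self, other):
        return NoTruth()


y = Elementwise()
ll1 = [y, 1]
ll2 = [1, y]

# y sits first in ll1, so the identity check succeeds before == is tried.
found_first = y in ll1

# In ll2 the element 1 is examined first: "1 is y" is False, so "1 == y"
# is evaluated, and converting its result to a bool raises.
try:
    y in ll2
    raised = False
except ValueError:
    raised = True
```

Here found_first ends up True and raised ends up True, matching the ll1/ll2
behaviour reported in this thread.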


Ok. Thanks for the explanation.


  That no one acknowledges this makes me feel like a conspiracy
  is afoot.

I don't know what you think I'm not acknowledging.


Sorry. That was a failed attempt at humor.

James


Re: Rich Comparisons Gotcha

2008-12-08 Thread Rasmus Fogh
Robert Kern wrote:
James Stroud wrote:
 Steven D'Aprano wrote:
 On Sun, 07 Dec 2008 13:57:54 -0800, James Stroud wrote:

 Rasmus Fogh wrote:

 ll1 = [y,1]
 y in ll1
 True
 ll2 = [1,y]
 y in ll2
 Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
 ValueError: The truth value of an array with more than one element is
 ambiguous. Use a.any() or a.all()
 I think you could be safe calling this a bug with numpy.

 Only in the sense that there are special cases where the array
 elements are all true, or all false, and numpy *could* safely return a
 bool. But special cases are not special enough to break the rules.
 Better for the numpy caller to write this:

 a.all() # or any()

 instead of:

 try:
     bool(a)
 except ValueError:
     a.all()

 as they would need to do if numpy sometimes returned a bool and
 sometimes raised an exception.

 I'm missing how a.all() solves the problem Rasmus describes, namely that
 the order of a python *list* affects the results of containment tests by
 numpy.array. E.g. y in ll1 and y in ll2 evaluate to different
 results in his example. It still seems like a bug in numpy to me, even
 if too much other stuff is broken if you fix it (in which case it
 apparently becomes an issue).

 It's an issue, if anything, not a bug. There is no consistent
 implementation of bool(some_array) that works in all cases. numpy's
 predecessor Numeric used to implement this as returning True if at least
 one element was non-zero. This works well for bool(x!=y) (which is
 equivalent to (x!=y).any()) but does not work well for bool(x==y) (which
 should be (x==y).all()), but many people got confused and thought that
 bool(x==y) worked. When we made numpy, we decided to explicitly not allow
 bool(some_array) so that people will not write buggy code like this again.
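
The any/all asymmetry described above can be illustrated with plain lists
standing in for arrays (an assumption made so that the sketch runs without
numpy or Numeric installed).

```python
x = [1, 2, 3]
y = [1, 5, 3]

ne = [a != b for a, b in zip(x, y)]  # elementwise x != y
eq = [a == b for a, b in zip(x, y)]  # elementwise x == y

# "True if at least one element is non-zero" is the right reading for !=
arrays_differ = any(ne)

# The same rule applied to == claims equality although one element differs.
equal_by_any = any(eq)

# Equality needs every element to match.
equal_by_all = all(eq)
```

A single truth-value rule cannot serve both comparisons, which is the reason
given above for refusing bool(some_array) altogether.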

You are so right, Robert:

 The deficiency is in the feature of rich comparisons, not numpy's
 implementation of it. __eq__() is allowed to return non-booleans;
 however, there are some parts of Python's implementation like
 list.__contains__() that still expect the return value of __eq__() to be
 meaningfully cast to a boolean.

One might argue whether this is a deficiency in rich comparisons or rather a
bug in list, set and dict. Certainly numpy is following the rules. In fact
numpy should be applauded for throwing an error rather than returning a
misleading value.

For my personal problem I could indeed wrap all objects in a wrapper with
whatever 'correct' behaviour I want (thanks, TJR). It does seem a bit
much, though, just to get code like this to work as intended:
  alist.append(x)
  print ('x is present: ', x in alist)

So, I would much prefer a language change. I am not competent to even
propose one properly, but I'll try.

First, to clear the air:
Rich comparisons, the ability to overload '==', and the constraints (or
lack of them) on __eq__ must stay unchanged. There are reasons for their
current behaviour - ieee754 is particularly convincing - and anyway they
are not going to change. No point in trying.

There remains the problem that __eq__ is used inside python
'collections' (list, set, dict etc.), and that the kind of overloading
used (quite legitimately) in numpy etc. breaks the collection behaviour.
It seems that proper behaviour of the collections requires an equality
test that satisfies:
1) x equal x
2) x equal y => y equal x
3) x equal y and y equal z => x equal z
4) (x equal y) is a boolean
5) (x equal y) is defined (and will not throw an error) for all x,y
6) x unequal y <==> not(x equal y) (by definition)
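
Properties 1) to 3) above can be spot-checked on a handful of sample values.
This is only a sketch: check_equivalence is a hypothetical helper, and passing
on a finite sample proves nothing about a type in general.

```python
from itertools import product


def check_equivalence(equal, samples):
    """Return the names of the properties violated on the given samples."""
    failures = set()
    for x in samples:
        if not equal(x, x):
            failures.add("reflexive")
    for x, y in product(samples, repeat=2):
        if equal(x, y) != equal(y, x):
            failures.add("symmetric")
    for x, y, z in product(samples, repeat=3):
        if equal(x, y) and equal(y, z) and not equal(x, z):
            failures.add("transitive")
    return failures


samples = [1, 1.0, float('nan')]

plain = check_equivalence(lambda a, b: a == b, samples)
identity_first = check_equivalence(lambda a, b: a is b or a == b, samples)
```

On these samples plain '==' fails only reflexivity (because of the NaN), while
the identity-first test that the collections use passes all three checks.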

Note to TJR: 5) does not mean that Python should magically shield me from
errors. All I am asking is that programmers design their equal() function
to avoid raising errors, and that errors raised from equal() clearly
count as bugs.

I cannot imagine getting the collections to work in a simple and intuitive
manner without an equality test that satisfies 1)-6). Maybe somebody else
can. Instead I would propose adding an __equal__ special method for the
purpose.

It looks like the current collections use the following, at least in part

def oldCollectionTest(x,y):
    if x is y:
        return True
    else:
        return (x == y)

I would propose adding a new __equal__ method that satisfies 2) - 6)
above.

We could then define

def newCollectionTest(x,y):
    if x is y:
        # this takes care of satisfying 1)
        return True
    elif hasattr(x, '__equal__'):
        return x.__equal__(y)
    elif hasattr(y, '__equal__'):
        return y.__equal__(x)
    else:
        return False

The implementations for list, set and dict would then behave according to
newCollectionTest. We would also want an equal() built-in with the same
behaviour.

In plain words, the default behaviour would be identity semantics. Objects
that wanted value semantics could implement an __equal__ function with the
correct behaviour. Wherever possible __equal__ would be the same as
__eq__.  This function may deviate from 'proper' behaviour in some cases.
All I claim 

Re: Rich Comparisons Gotcha

2008-12-08 Thread Rhamphoryncus
On Dec 7, 4:20 pm, Steven D'Aprano [EMAIL PROTECTED]
cybersource.com.au wrote:
 On Sun, 07 Dec 2008 15:32:53 -0600, Robert Kern wrote:
  Rasmus Fogh wrote:

  Current behaviour is both inconsistent and counterintuitive, as these
  examples show.

  x = float('NaN')
  x == x
  False

  Blame IEEE for that one. Rich comparisons have nothing to do with that
  one.

 There is nothing to blame them for. This is the correct behaviour. NaNs
 should *not* compare equal to themselves, that's mathematically
 incoherent.

Mathematically, NaNs shouldn't be comparable at all.  They should
raise an exception when compared.  In fact, they should raise an
exception when *created*.  But that's not what we want.  What we want
is a dummy value that silently plods through our calculations.  For a
dummy value it seems to make a lot more sense to pick an arbitrary yet
consistent sort order (I suggest just above -Inf), rather than quietly
screwing up the sort.
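
A sort order like the one suggested here can already be had with a key function, without changing how == works. This is only a sketch; nan_low_key is a made-up name, and placing NaN just above -inf is the arbitrary choice described above.

```python
import math

NEG_INF = float('-inf')


def nan_low_key(value):
    """Sort key placing NaN just above -inf and below every other float."""
    if value == NEG_INF:
        return (0, 0.0)
    if math.isnan(value):
        return (1, 0.0)
    return (2, value)


data = [3.0, float('nan'), float('-inf'), 1.0, float('inf')]
ordered = sorted(data, key=nan_low_key)
print(ordered)
```

Because the key never compares a NaN against another float directly, the sort sees a consistent total order.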

Regarding the mythical IEEE 754, although it's extremely rare to find
quotations, I have one on just this subject.  And it does NOT say x
== NaN gives false.  It says it gives *unordered*.  It is C and
probably most other languages that turn that into false (as they want
a dummy value, not an error.)

http://groups.google.ca/group/sci.math.num-analysis/browse_thread/thread/ead0392e646b7cc0/a5bc354cd46f2c49?lnk=st&q=why+does+NaN+not+equal+itself%3F&rnum=3&hl=en&pli=1


Re: Rich Comparisons Gotcha

2008-12-08 Thread Terry Reedy

Robert Kern wrote:

There is an explicit policy that __eq__() methods can return non-bools 
for various purposes. I consider that policy to be a presence that can be 
removed. There is no check because that policy exists, not the other 
way around.


OK, presence in manual versus presence in code.


Anyways, this is really a semantic digression, and not particularly 
important. Peace?


Yes




Re: Rich Comparisons Gotcha

2008-12-08 Thread Robert Kern

Rhamphoryncus wrote:

On Dec 7, 4:20 pm, Steven D'Aprano [EMAIL PROTECTED]
cybersource.com.au wrote:

On Sun, 07 Dec 2008 15:32:53 -0600, Robert Kern wrote:

Rasmus Fogh wrote:

Current behaviour is both inconsistent and counterintuitive, as these
examples show.

x = float('NaN')
x == x

False

Blame IEEE for that one. Rich comparisons have nothing to do with that
one.

There is nothing to blame them for. This is the correct behaviour. NaNs
should *not* compare equal to themselves, that's mathematically
incoherent.


Mathematically, NaNs shouldn't be comparable at all.  They should
raise an exception when compared.  In fact, they should raise an
exception when *created*.  But that's not what we want.  What we want
is a dummy value that silently plods through our calculations.  For a
dummy value it seems a lot more sense to pick an arbitrary yet
consistent sort order (I suggest just above -Inf), rather than quietly
screwing up the sort.


Well, there are explicitly two kinds of NaNs: signalling NaNs and quiet NaNs, to 
accommodate both requirements. Additionally, there is significant flexibility in 
trapping the signals.



Regarding the mythical IEEE 754, although it's extremely rare to find
quotations, I have one on just this subject.  And it does NOT say x
== NaN gives false.  It says it gives *unordered*.  It is C and
probably most other languages that turn that into false (as they want
a dummy value, not an error.)

http://groups.google.ca/group/sci.math.num-analysis/browse_thread/thread/ead0392e646b7cc0/a5bc354cd46f2c49?lnk=st&q=why+does+NaN+not+equal+itself%3F&rnum=3&hl=en&pli=1


Table 4 on page 9 of the standard is pretty clear on the subject. When the two 
operands are unordered, the operator == returns False. The standard defines how 
to do comparisons notionally; two operands can be greater than, less than, 
equal or unordered. It then goes on to map these notional concepts to 
programming language boolean predicates.
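
To make that mapping concrete, here is a small sketch of a four-way comparison with boolean predicates derived from it. The function names and the string results are invented for illustration and are not taken from the standard; only the idea that "unordered" maps to False for == follows the description above.

```python
def compare(x, y):
    """Return 'less', 'equal', 'greater' or 'unordered' for two floats."""
    if x != x or y != y:      # at least one operand is a NaN
        return 'unordered'
    if x < y:
        return 'less'
    if x > y:
        return 'greater'
    return 'equal'


def eq(x, y):
    return compare(x, y) == 'equal'


def ne(x, y):
    return compare(x, y) != 'equal'


def lt(x, y):
    return compare(x, y) == 'less'


nan = float('nan')
print(compare(1.0, 2.0), compare(nan, nan))
print(eq(nan, nan), ne(nan, nan), lt(nan, 1.0))
```

Seen this way, nan == nan being False is just what the "unordered" outcome looks like once it is squeezed into a boolean.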


--
Robert Kern

I have come to believe that the whole world is an enigma, a harmless enigma
 that is made terrible by our own mad attempt to interpret it as though it had
 an underlying truth.
  -- Umberto Eco



Re: Rich Comparisons Gotcha

2008-12-08 Thread Rhamphoryncus
On Dec 8, 11:54 am, Robert Kern [EMAIL PROTECTED] wrote:
 Rhamphoryncus wrote:
  On Dec 7, 4:20 pm, Steven D'Aprano [EMAIL PROTECTED]
  cybersource.com.au wrote:
  On Sun, 07 Dec 2008 15:32:53 -0600, Robert Kern wrote:
  Rasmus Fogh wrote:
  Current behaviour is both inconsistent and counterintuitive, as these
  examples show.
  x = float('NaN')
  x == x
  False
  Blame IEEE for that one. Rich comparisons have nothing to do with that
  one.
  There is nothing to blame them for. This is the correct behaviour. NaNs
  should *not* compare equal to themselves, that's mathematically
  incoherent.

  Mathematically, NaNs shouldn't be comparable at all.  They should
  raise an exception when compared.  In fact, they should raise an
  exception when *created*.  But that's not what we want.  What we want
  is a dummy value that silently plods through our calculations.  For a
  dummy value it seems a lot more sense to pick an arbitrary yet
  consistent sort order (I suggest just above -Inf), rather than quietly
  screwing up the sort.

 Well, there are explicitly two kinds of NaNs: signalling NaNs and quiet NaNs, 
 to
 accommodate both requirements. Additionally, there is significant flexibility 
 in
 trapping the signals.

Right, but most of that's lower level.  By the time it reaches Python
we only care about quiet NaNs.


  Regarding the mythical IEEE 754, although it's extremely rare to find
  quotations, I have one on just this subject.  And it does NOT say x
  == NaN gives false.  It says it gives *unordered*.  It is C and
  probably most other languages that turn that into false (as they want
  a dummy value, not an error.)

 http://groups.google.ca/group/sci.math.num-analysis/browse_thread/thr...

 Table 4 on page 9 of the standard is pretty clear on the subject. When the two
 operands are unordered, the operator == returns False. The standard defines 
 how
 to do comparisons notionally; two operands can be greater than, less than,
 equal or unordered. It then goes on to map these notional concepts to
 programming language boolean predicates.

Ahh, interesting.  Still though, does it give an explanation for such
behaviour, or use cases?  There must be some situation where blindly
returning false is enough benefit to trump screwing up sorting.


Re: Rich Comparisons Gotcha

2008-12-08 Thread Robert Kern

Rhamphoryncus wrote:

On Dec 8, 11:54 am, Robert Kern [EMAIL PROTECTED] wrote:

Rhamphoryncus wrote:

On Dec 7, 4:20 pm, Steven D'Aprano [EMAIL PROTECTED]
cybersource.com.au wrote:

On Sun, 07 Dec 2008 15:32:53 -0600, Robert Kern wrote:

Rasmus Fogh wrote:

Current behaviour is both inconsistent and counterintuitive, as these
examples show.

x = float('NaN')
x == x

False

Blame IEEE for that one. Rich comparisons have nothing to do with that
one.

There is nothing to blame them for. This is the correct behaviour. NaNs
should *not* compare equal to themselves, that's mathematically
incoherent.

Mathematically, NaNs shouldn't be comparable at all.  They should
raise an exception when compared.  In fact, they should raise an
exception when *created*.  But that's not what we want.  What we want
is a dummy value that silently plods through our calculations.  For a
dummy value it seems a lot more sense to pick an arbitrary yet
consistent sort order (I suggest just above -Inf), rather than quietly
screwing up the sort.

Well, there are explicitly two kinds of NaNs: signalling NaNs and quiet NaNs, to
accommodate both requirements. Additionally, there is significant flexibility in
trapping the signals.


Right, but most of that's lower level.  By the time it reaches Python
we only care about quiet NaNs.


No, signaling NaNs raise the exception that you are asking for. You're right 
that if you get a Python float object that is a NaN, it is probably going to be 
quiet, but signaling NaNs can affect Python in the way that you want.



Regarding the mythical IEEE 754, although it's extremely rare to find
quotations, I have one on just this subject.  And it does NOT say x
== NaN gives false.  It says it gives *unordered*.  It is C and
probably most other languages that turn that into false (as they want
a dummy value, not an error.)
http://groups.google.ca/group/sci.math.num-analysis/browse_thread/thr...

Table 4 on page 9 of the standard is pretty clear on the subject. When the two
operands are unordered, the operator == returns False. The standard defines how
to do comparisons notionally; two operands can be greater than, less than,
equal or unordered. It then goes on to map these notional concepts to
programming language boolean predicates.


Ahh, interesting.  Still though, does it give an explanation for such
behaviour, or use cases?  There must be some situation where blindly
returning false is enough benefit to trump screwing up sorting.


Well, the standard was written in the days of Fortran. You didn't really have 
generic sorting routines. You *could* implement whatever ordering you wanted 
because you *had* to implement the ordering yourself. You didn't have to use a 
limited boolean predicate.


Basically, the boolean predicates have to return either True or False. Neither 
one is really satisfactory, but that's the constraint you're under.


--
Robert Kern

I have come to believe that the whole world is an enigma, a harmless enigma
 that is made terrible by our own mad attempt to interpret it as though it had
 an underlying truth.
  -- Umberto Eco



Re: Rich Comparisons Gotcha

2008-12-08 Thread Rhamphoryncus
On Dec 8, 1:04 pm, Robert Kern [EMAIL PROTECTED] wrote:
 Rhamphoryncus wrote:
  On Dec 8, 11:54 am, Robert Kern [EMAIL PROTECTED] wrote:
  Rhamphoryncus wrote:
  On Dec 7, 4:20 pm, Steven D'Aprano [EMAIL PROTECTED]
  cybersource.com.au wrote:
  On Sun, 07 Dec 2008 15:32:53 -0600, Robert Kern wrote:
  Rasmus Fogh wrote:
  Current behaviour is both inconsistent and counterintuitive, as these
  examples show.
  x = float('NaN')
  x == x
  False
  Blame IEEE for that one. Rich comparisons have nothing to do with that
  one.
  There is nothing to blame them for. This is the correct behaviour. NaNs
  should *not* compare equal to themselves, that's mathematically
  incoherent.
  Mathematically, NaNs shouldn't be comparable at all.  They should
  raise an exception when compared.  In fact, they should raise an
  exception when *created*.  But that's not what we want.  What we want
  is a dummy value that silently plods through our calculations.  For a
  dummy value it seems a lot more sense to pick an arbitrary yet
  consistent sort order (I suggest just above -Inf), rather than quietly
  screwing up the sort.
  Well, there are explicitly two kinds of NaNs: signalling NaNs and quiet 
  NaNs, to
  accommodate both requirements. Additionally, there is significant 
  flexibility in
  trapping the signals.

  Right, but most of that's lower level.  By the time it reaches Python
  we only care about quiet NaNs.

 No, signaling NaNs raise the exception that you are asking for. You're right
 that if you get a Python float object that is a NaN, it is probably going to 
 be
 quiet, but signaling NaNs can affect Python in the way that you want.

  Regarding the mythical IEEE 754, although it's extremely rare to find
  quotations, I have one on just this subject.  And it does NOT say x
  == NaN gives false.  It says it gives *unordered*.  It is C and
  probably most other languages that turn that into false (as they want
  a dummy value, not an error.)
 http://groups.google.ca/group/sci.math.num-analysis/browse_thread/thr...
  Table 4 on page 9 of the standard is pretty clear on the subject. When the 
  two
  operands are unordered, the operator == returns False. The standard 
  defines how
  to do comparisons notionally; two operands can be greater than, less 
  than,
  equal or unordered. It then goes on to map these notional concepts to
  programming language boolean predicates.

  Ahh, interesting.  Still though, does it give an explanation for such
  behaviour, or use cases?  There must be some situation where blindly
  returning false is enough benefit to trump screwing up sorting.

 Well, the standard was written in the days of Fortran. You didn't really have
 generic sorting routines. You *could* implement whatever ordering you wanted
 because you *had* to implement the ordering yourself. You didn't have to use a
 limited boolean predicate.

 Basically, the boolean predicates have to return either True or False. Neither
 one is really satisfactory, but that's the constraint you're under.

"We've always done it that way" is NOT a use case!  Certainly, it's a
factor, but it seems quite weak compared to the sort use case.

I suppose what I'm hoping for is a small example program (one or a
few functions) that needs the always false behaviour of NaN.


Re: Rich Comparisons Gotcha

2008-12-08 Thread Robert Kern

Rhamphoryncus wrote:

On Dec 8, 1:04 pm, Robert Kern [EMAIL PROTECTED] wrote:

Rhamphoryncus wrote:

On Dec 8, 11:54 am, Robert Kern [EMAIL PROTECTED] wrote:

Rhamphoryncus wrote:

On Dec 7, 4:20 pm, Steven D'Aprano [EMAIL PROTECTED]
cybersource.com.au wrote:

On Sun, 07 Dec 2008 15:32:53 -0600, Robert Kern wrote:

Rasmus Fogh wrote:

Current behaviour is both inconsistent and counterintuitive, as these
examples show.

x = float('NaN')
x == x

False

Blame IEEE for that one. Rich comparisons have nothing to do with that
one.

There is nothing to blame them for. This is the correct behaviour. NaNs
should *not* compare equal to themselves, that's mathematically
incoherent.

Mathematically, NaNs shouldn't be comparable at all.  They should
raise an exception when compared.  In fact, they should raise an
exception when *created*.  But that's not what we want.  What we want
is a dummy value that silently plods through our calculations.  For a
dummy value it seems a lot more sense to pick an arbitrary yet
consistent sort order (I suggest just above -Inf), rather than quietly
screwing up the sort.

Well, there are explicitly two kinds of NaNs: signalling NaNs and quiet NaNs, to
accommodate both requirements. Additionally, there is significant flexibility in
trapping the signals.

Right, but most of that's lower level.  By the time it reaches Python
we only care about quiet NaNs.

No, signaling NaNs raise the exception that you are asking for. You're right
that if you get a Python float object that is a NaN, it is probably going to be
quiet, but signaling NaNs can affect Python in the way that you want.


Regarding the mythical IEEE 754, although it's extremely rare to find
quotations, I have one on just this subject.  And it does NOT say x
== NaN gives false.  It says it gives *unordered*.  It is C and
probably most other languages that turn that into false (as they want
a dummy value, not an error.)
http://groups.google.ca/group/sci.math.num-analysis/browse_thread/thr...

Table 4 on page 9 of the standard is pretty clear on the subject. When the two
operands are unordered, the operator == returns False. The standard defines how
to do comparisons notionally; two operands can be greater than, less than,
equal or unordered. It then goes on to map these notional concepts to
programming language boolean predicates.

Ahh, interesting.  Still though, does it give an explanation for such
behaviour, or use cases?  There must be some situation where blindly
returning false is enough benefit to trump screwing up sorting.

Well, the standard was written in the days of Fortran. You didn't really have
generic sorting routines. You *could* implement whatever ordering you wanted
because you *had* to implement the ordering yourself. You didn't have to use a
limited boolean predicate.

Basically, the boolean predicates have to return either True or False. Neither
one is really satisfactory, but that's the constraint you're under.


We've always done it that way is NOT a use case!  Certainly, it's a
factor, but it seems quite weak compared to the sort use case.


I didn't say it was. I was explaining that sorting was probably *not* a use case 
for the boolean predicates at the time of writing of the standard. In fact, it 
suggests implementing a Compare() function that returns greater than, less 
than, equal or unordered in addition to the boolean predicates. That Python 
eventually chose to use a generic boolean predicate as the basis of its sorting 
routine many years after the IEEE-754 standard is another matter entirely.


In any case, the standard itself is quite short, and does not spend much time 
justifying itself in any detail.



I suppose what I'm hoping for is an small example program (one or a
few functions) that needs the always false behaviour of NaN.


Steven D'Aprano gave one earlier in the thread. Additionally, (x!=x) is a simple 
test for NaNs if an IsNaN(x) function is not available. Really, though, the 
result falls out from the way that IEEE-754 constructed the logic of the 
system. It is not defined that (NaN==NaN) should return False, per se. Rather, 
all of the boolean predicates are defined in terms of that Compare(x,y) 
function. If that function returns unordered, then (x==y) is False. It doesn't 
matter if one or both are NaNs; in either case, the result is unordered.
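
The (x!=x) test mentioned above looks like this in Python. It is only a sketch; is_nan is a made-up helper name, and math.isnan() from the standard library is shown alongside for comparison.

```python
import math


def is_nan(x):
    """True only for NaN, the one float value for which x != x holds."""
    return x != x


values = [1.5, float('nan'), float('inf'), 0.0]
print([is_nan(v) for v in values])
print([math.isnan(v) for v in values])
```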


--
Robert Kern

I have come to believe that the whole world is an enigma, a harmless enigma
 that is made terrible by our own mad attempt to interpret it as though it had
 an underlying truth.
  -- Umberto Eco



Re: Rich Comparisons Gotcha

2008-12-08 Thread Terry Reedy

Rasmus Fogh wrote:


For my personal problem I could indeed wrap all objects in a wrapper with
whatever 'correct' behaviour I want (thanks, TJR). It does seem a bit


I was not suggesting that you wrap *everything*, merely an adaptor for 
numpy arrays in whatever subclass and source it is that feeds them to 
your code.  It is fairly unusual, I think, to find numpy arrays 'in the 
wild', outside the constrained context of numerical code where the 
programmer uses them intentionally and hopefully understands their 
peculiarities.
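
A minimal sketch of such an adaptor follows. ByIdentity is a hypothetical name, and Awkward is only a stand-in for an array-like object whose == does not produce a usable truth value; real numpy arrays are not used here so the example stays self-contained.

```python
class ByIdentity(object):
    """Adaptor giving a wrapped object plain identity-based equality."""

    def __init__(self, wrapped):
        self.wrapped = wrapped

    def __eq__(self, other):
        return isinstance(other, ByIdentity) and self.wrapped is other.wrapped

    def __ne__(self, other):
        return not self == other

    def __hash__(self):
        return id(self.wrapped)


class Awkward(object):
    """Stand-in for an array-like object whose == cannot be used as a bool."""

    def __eq__(self, other):
        raise ValueError('truth value is ambiguous')


x = Awkward()
alist = [ByIdentity(Awkward()), ByIdentity(x)]
print(ByIdentity(x) in alist)
```

The membership test only ever calls ByIdentity.__eq__, so the wrapped object's own comparison is never triggered.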



much, though, just to get code like this to work as intended:
  alist.append(x)
  print ('x is present: ', x in alist)


Even with rich comparisons changed as you propose, the above would *still* not 
necessarily work.  Collection classes can define a __contains__ that 
overrides the default and that can do anything, though True/False is 
recommended.


As best I can think of at the moment, the only things you can absolutely 
depend on are that builtin id(ob) will return an int, that 'ob1 is ob2' 
(based on id()) will be True or False, and that builtin type(ob) will be 
a class (at least in 3.0, not sure of 2.x).  The names can be rebound 
but you can control that within the module you write.


This is what I meant when I said that 'generic' nearly always needs to 
be qualified to something like 'generic for objects that meet the 
interface requirements'.  Every function has that precondition as part 
of its implied contract.  Your code has an interface requirement that 'x 
in y' not raise an exception.  An x,y pair that does is outside its 
contract.


Terry Jan Reedy



Re: Rich Comparisons Gotcha

2008-12-08 Thread Robert Kern

Terry Reedy wrote:

Rasmus Fogh wrote:



much, though, just to get code like this to work as intended:
  alist.append(x)
  print ('x is present: ', x in alist)


Even if rich comparisons as you propose, the above would *still* not 
necessarily work.  Collection classes can define a __contains__ that 
overrides the default and that can do anything, though True/False is 
recommended.


No, it's actually required.

In [4]: class A(object):
   ...:     def __contains__(self, other):
   ...:         return 'foo'
   ...:
   ...:

In [7]: a = A()

In [8]: 1 in a
Out[8]: True


Okay, so it will coerce to True/False for you, but unlike rich comparisons, the 
return value must be interpretable as a boolean.
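
The difference between the two hooks can be seen side by side in a short sketch. The class names are invented for illustration.

```python
class AlwaysFoo(object):
    def __contains__(self, item):
        return 'foo'      # truthy, so `in` reports True


class AlwaysEmpty(object):
    def __contains__(self, item):
        return ''         # falsy, so `in` reports False


class Chatty(object):
    def __eq__(self, other):
        return 'maybe'    # == hands this object back unchanged


print(1 in AlwaysFoo())
print(1 in AlwaysEmpty())
print(Chatty() == 1)
```

The `in` operator always yields a bool, while == returns whatever __eq__ returned.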


--
Robert Kern

I have come to believe that the whole world is an enigma, a harmless enigma
 that is made terrible by our own mad attempt to interpret it as though it had
 an underlying truth.
  -- Umberto Eco



Re: Rich Comparisons Gotcha

2008-12-08 Thread MRAB

Terry Reedy wrote:

Rasmus Fogh wrote:


For my personal problem I could indeed wrap all objects in a wrapper with
whatever 'correct' behaviour I want (thanks, TJR). It does seem a bit


I was not suggesting that you wrap *everything*, merely an adaptor for 
numpy arrays in whatever subclass and source it is that feeds them to 
your code.  It is fairly unusual, I think, to find numpy arrays 'in the 
wild', outside the constrained context of numerical code where the 
programmer uses them intentionally and hopefully understands their 
peculiarities.



much, though, just to get code like this to work as intended:
  alist.append(x)
  print ('x is present: ', x in alist)


Even if rich comparisons as you propose, the above would *still* not 
necessarily work.  Collection classes can define a __contains__ that 
overrides the default and that can do anything, though True/False is 
recommended.


If you have a list of results and you want to see whether one of them is 
Nan then the obvious way is "Nan in results", but __contains__ uses 
__eq__ and "Nan == Nan" returns False, so "Nan in results" returns False. 
Hmm... "Nan is Nan" returns True, so if there was a version of 
__contains__ which used "is" then "Nan in results" would return True. 
Perhaps "Nan is in results"? Or would that be too confusing, ie "in" vs 
"is in"?


As best I can think of at the moment, the only things you can absolutely 
depend on is that builtin id(ob) will return an int, that 'ob1 is ob2' 
(based in id()) will be True or False, and that builtin type(ob) will be 
a class (at least in 3.0, not sure of 2.x).  The names can be rebound 
but you can control that within the module you write.


I wonder whether there could be some syntactic sugar which would wrap 
try...except... around an expression, eg except(foo(), False), which 
would return False if foo() raised an exception, otherwise return the 
result of foo().
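
Without new syntax, something close to this can be written as an ordinary helper that takes a callable. This is just a sketch; call_or_default and Touchy are made-up names, and catching Exception this broadly can hide genuine bugs.

```python
def call_or_default(func, default, *args, **kwargs):
    """Return func(*args, **kwargs), or default if the call raises."""
    try:
        return func(*args, **kwargs)
    except Exception:
        return default


class Touchy(object):
    """Stand-in for an object whose == raises instead of returning a bool."""

    def __eq__(self, other):
        raise ValueError('cannot compare')


alist = [Touchy()]
print(call_or_default(lambda: 1 in alist, False))   # comparison raises
print(call_or_default(lambda: 1 in [1, 2], False))  # normal case passes through
```

The expression has to be wrapped in a lambda (or passed as a function plus arguments) so that it is evaluated inside the try block rather than before the call.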


This is what I meant when I said that 'generic' nearly always needs to 
be qualified to something like 'generic for objects that meet the 
interface requirements'.  Every function has that precondition as part 
of its implied contract.  Your code has an interface requirement that 'x 
in y' not raise an exception.  An x,y pair that does it outside its 
contract.





Re: Rich Comparisons Gotcha

2008-12-08 Thread Robert Kern

MRAB wrote:

Terry Reedy wrote:

Rasmus Fogh wrote:

For my personal problem I could indeed wrap all objects in a wrapper 
with

whatever 'correct' behaviour I want (thanks, TJR). It does seem a bit


I was not suggesting that you wrap *everything*, merely an adaptor for 
numpy arrays in whatever subclass and source it is that feeds them to 
your code.  It is fairly unusual, I think, to find numpy arrays 'in 
the wild', outside the constrained context of numerical code where the 
programmer uses them intentionally and hopefully understands their 
peculiarities.



much, though, just to get code like this to work as intended:
  alist.append(x)
  print ('x is present: ', x in alist)


Even if rich comparisons as you propose, the above would *still* not 
necessarily work.  Collection classes can define a __contains__ that 
overrides the default and that can do anything, though True/False is 
recommended.


If you have a list of results and you want to see whether one of them is 
Nan then the obvious way is Nan in results, but __contains__ uses 
__eq__ and Nan == Nan returns False, so Nan in results returns False. 
Hmm... Nan is Nan returns True,


However, "Nan is SomeOtherNan" does not return True.

so if there was a version of 
__contains__ which used is then Nan in results would return True. 
Perhaps Nan is in results? Or would that be too confusing, ie in vs 
is in?


list.__contains__() already checks with "is" before it tries "==".


In [65]: from numpy import nan, inf

In [66]: other_nan = inf/inf

In [67]: nan in [nan]
Out[67]: True

In [68]: nan in [other_nan]
Out[68]: False
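
The same behaviour can be reproduced without numpy. This sketch assumes CPython, where each float('nan') call creates a separate float object.

```python
import math

nan = float('nan')
other_nan = float('nan')    # a second, distinct NaN object

print(nan in [nan])         # found by the identity check
print(nan in [other_nan])   # falls through to ==, which is False
print(any(math.isnan(v) for v in [other_nan]))  # explicit NaN test
```

So a reliable "is there a NaN in here" check needs an explicit NaN test rather than `in`.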


--
Robert Kern

I have come to believe that the whole world is an enigma, a harmless enigma
 that is made terrible by our own mad attempt to interpret it as though it had
 an underlying truth.
  -- Umberto Eco



Re: Rich Comparisons Gotcha

2008-12-08 Thread Rhamphoryncus
On Dec 8, 2:51 pm, Robert Kern [EMAIL PROTECTED] wrote:
 Rhamphoryncus wrote:
  We've always done it that way is NOT a use case!  Certainly, it's a
  factor, but it seems quite weak compared to the sort use case.

 I didn't say it was. I was explaining that sorting was probably *not* a use 
 case
 for the boolean predicates at the time of writing of the standard. In fact, it
 suggests implementing a Compare() function that returns greater than, less
 than, equal or unordered in addition to the boolean predicates. That 
 Python
 eventually chose to use a generic boolean predicate as the basis of its 
 sorting
 routine many years after the IEEE-754 standard is another matter entirely.

I interpret that to mean IEEE 754's semantics are for different
circumstances and are inapplicable to Python.


 In any case, the standard itself is quite short, and does not spend much time
 justifying itself in any detail.

A pity, as it is often invoked to explain language design.


  I suppose what I'm hoping for is an small example program (one or a
  few functions) that needs the always false behaviour of NaN.

 Steven D'Aprano gave one earlier in the thread.

I see examples of behaviour, but no use cases.


 Additionally, (x!=x) is a simple
 test for NaNs if an IsNaN(x) function is not available.

That's a trick to work around the lack of IsNaN(x).  Again, not a use
case.


 Really, though, the
 result falls out from the way that IEEE-754 constructed the logic of the
 system. It is not defined that (NaN==NaN) should return False, per se. Rather,
 all of the boolean predicates are defined in terms of that Compare(x,y)
 function. If that function returns unordered, then (x==y) is False. It 
 doesn't
 matter if one or both are NaNs; in either case, the result is unordered.

And if I arbitrarily dictate that NaN is a single value which is
orderable, sorting just above -Infinity, then all the behaviour makes
a lot more sense AND I fix sort.

So you see the predicament I'm in.  On the one hand we have a problem
and an obvious solution.  On the other hand we've got historical
behaviour which everybody insists *must* remain, reasons unknown.  It
reeks of the Parable of the Monkeys.

I think I should head over to one of the math groups and see if they
can find a reason for it.


Re: Rich Comparisons Gotcha

2008-12-08 Thread Terry Reedy

Robert Kern wrote:

Terry Reedy wrote:

Rasmus Fogh wrote:



much, though, just to get code like this to work as intended:
  alist.append(x)
  print ('x is present: ', x in alist)


Even if rich comparisons as you propose, the above would *still* not 
necessarily work.  Collection classes can define a __contains__ that 
overrides the default and that can do anything, though True/False is 
recommended.


No, it's actually required.

In [4]: class A(object):
   ...:     def __contains__(self, other):
   ...:         return 'foo'
   ...:
   ...:

In [7]: a = A()

In [8]: 1 in a
Out[8]: True


Okay, so it will coerce to True/False for you, but unlike rich 
comparisons, the return value must be interpretable as a boolean.


Interesting.  I did not expect that from "Should return true if item is 
in self, false otherwise.", but maybe the lowercase true/false is an 
(undocumented?) abbreviation for 'object with Boolean value True/False'.


Of course, if the return value is not so interpretable, or if 
__contains__ raises an exception, there is no coercion and the OP's code 
will not work.


A different summary of my main point in this thread: Dynamic binding and 
special method hooks make somewhat generic code possible, but the same 
special method hooks make absolutely generic code nearly impossible.


tjr




Re: Rich Comparisons Gotcha

2008-12-08 Thread Steven D'Aprano
On Mon, 08 Dec 2008 14:24:59 +, Rasmus Fogh wrote:

 For my personal problem I could indeed wrap all objects in a wrapper
 with whatever 'correct' behaviour I want (thanks, TJR). It does seem a
 bit much, though, just to get code like this to work as intended:
   alist.append(x)
   print ('x is present: ', x in alist)
 
 So, I would much prefer a language change. I am not competent to even
 propose one properly, but I'll try.

You think changing the language is easier than applying a wrapper to your 
own data??? Oh my, that's too funny for words.



-- 
Steven


Re: Rich Comparisons Gotcha

2008-12-08 Thread Steven D'Aprano
On Mon, 08 Dec 2008 10:20:56 -0800, Rhamphoryncus wrote:

 On Dec 7, 4:20 pm, Steven D'Aprano [EMAIL PROTECTED]
 cybersource.com.au wrote:
 On Sun, 07 Dec 2008 15:32:53 -0600, Robert Kern wrote:
  Rasmus Fogh wrote:

  Current behaviour is both inconsistent and counterintuitive, as
  these examples show.

  x = float('NaN')
  x == x
  False

  Blame IEEE for that one. Rich comparisons have nothing to do with
  that one.

 There is nothing to blame them for. This is the correct behaviour. NaNs
 should *not* compare equal to themselves, that's mathematically
 incoherent.
 
 Mathematically, NaNs shouldn't be comparable at all.  They should raise
 an exception when compared.  In fact, they should raise an exception
 when *created*.  But that's not what we want.  What we want is a dummy
 value that silently plods through our calculations.  For a dummy value
 it seems a lot more sense to pick an arbitrary yet consistent sort order
 (I suggest just above -Inf), rather than quietly screwing up the sort.
 
 Regarding the mythical IEEE 754, 

It's hardly mythical.

http://ieeexplore.ieee.org/ISOL/standardstoc.jsp?punumber=4610933


 although it's extremely rare to find
 quotations, I have one on just this subject.  And it does NOT say x ==
 NaN gives false.  It says it gives *unordered*.


Unordered means that none of the following is true:

x < NaN
x > NaN
x == NaN


It doesn't mean that comparing a NaN with something else is an error.
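
This is easy to check in Python itself. A quick sketch:

```python
nan = float('nan')

for x in (1.0, float('inf'), nan):
    # None of the three ordered comparisons holds, and none raises.
    print(x, x < nan, x > nan, x == nan, x != nan)
```

Every row shows False for <, > and ==, and True for !=, with no exception raised.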


-- 
Steven


Re: Rich Comparisons Gotcha

2008-12-08 Thread Steven D'Aprano
On Sun, 07 Dec 2008 16:24:58 -0800, George Sakkis wrote:

 On Dec 7, 6:37 pm, Steven D'Aprano [EMAIL PROTECTED]
 cybersource.com.au wrote:
...
 Given:

 x = log(-5)  # a NaN
 y = log(-2)  # the same NaN
 x == y  # Some people want this to be true for NaNs.

 Then:

 # Compare x and y directly.
 log(-5) == log(-2)
 # If x == y then exp(x) == exp(y) for all x, y.
 exp(log(-5)) == exp(log(-2))
 -5 == -2

 and now the entire foundations of mathematics collapses into a steaming
 pile of rubble.
 
 And why doesn't this happen with the current behavior if x = y = log
 (-5) ? According to the same proof,  -5 != -5.

You're right, I was a little sloppy in my proof. There are additional 
subtleties going on.



-- 
Steven


Re: Rich Comparisons Gotcha

2008-12-08 Thread Rhamphoryncus
On Dec 8, 7:44 pm, Steven D'Aprano
[EMAIL PROTECTED] wrote:
 On Mon, 08 Dec 2008 10:20:56 -0800, Rhamphoryncus wrote:
  On Dec 7, 4:20 pm, Steven D'Aprano [EMAIL PROTECTED]
  cybersource.com.au wrote:
  On Sun, 07 Dec 2008 15:32:53 -0600, Robert Kern wrote:
   Rasmus Fogh wrote:

   Current behaviour is both inconsistent and counterintuitive, as
   these examples show.

   x = float('NaN')
   x == x
   False

   Blame IEEE for that one. Rich comparisons have nothing to do with
   that one.

  There is nothing to blame them for. This is the correct behaviour. NaNs
  should *not* compare equal to themselves, that's mathematically
  incoherent.

  Mathematically, NaNs shouldn't be comparable at all.  They should raise
  an exception when compared.  In fact, they should raise an exception
  when *created*.  But that's not what we want.  What we want is a dummy
  value that silently plods through our calculations.  For a dummy value
  it seems to make a lot more sense to pick an arbitrary yet consistent sort order
  (I suggest just above -Inf), rather than quietly screwing up the sort.

  Regarding the mythical IEEE 754,

 It's hardly mythical.

 http://ieeexplore.ieee.org/ISOL/standardstoc.jsp?punumber=4610933

I consider it to be mythical because most knowledge of it is
indirect.  Few who use floating point have the documents available to
them.  Requiring purchase/membership is the cause of this.


  although it's extremely rare to find
  quotations, I have one on just this subject.  And it does NOT say x ==
  NaN gives false.  It says it gives *unordered*.

 Unordered means that none of the following is true:

 x < NaN
 x > NaN
 x == NaN

 It doesn't mean that comparing a NaN with something else is an error.

Robert Kern already clarified that.  My confusion was due to relying
on second-hand knowledge.


Re: Rich Comparisons Gotcha

2008-12-07 Thread James Stroud

Rasmus Fogh wrote:

Dear All,

For the first time I have come across a Python feature that seems
completely wrong. After the introduction of rich comparisons, equality
comparison does not have to return a truth value, and may indeed return
nothing at all and throw an error instead. As a result, code like
  if foo == bar:
or
  foo in alist
cannot be relied on to work.

This is clearly no accident. According to the documentation all comparison
operators are allowed to return non-booleans, or to throw errors. There is
explicitly no guarantee that x == x is True.


I'm not a computer scientist, so my language and perspective on the 
topic may be a bit naive, but I'll try to demonstrate my caveman 
understanding by example.


First, here is why the ability to throw an error is a feature:

class Apple(object):
  def __init__(self, appleness):
    self.appleness = appleness
  def __cmp__(self, other):
    assert isinstance(other, Apple), 'must compare apples to apples'
    return cmp(self.appleness, other.appleness)

class Orange(object): pass

Apple(42) == Orange()


Second, consider that any value in python also evaluates to a truth 
value in boolean context.


Third, every function returns something. A function's returning nothing 
is not a possibility in the python language. None is something but 
evaluates to False in boolean context.



But surely you can define an equal/unequal classification for all
types of object, if you want to?


This reminds me of complex numbers: would 4 + 4i be equal to sqrt(32)? 
Even in the realm of pure mathematics, the generality of objects (i.e. 
numbers) can not be assumed.



James


--
James Stroud
UCLA-DOE Institute for Genomics and Proteomics
Box 951570
Los Angeles, CA 90095

http://www.jamesstroud.com


Re: Rich Comparisons Gotcha

2008-12-07 Thread Rasmus Fogh
Robert Kern Wrote:
Terry Reedy wrote:
 Rasmus Fogh wrote:
 Personally I would like to get these [EMAIL PROTECTED]* misfeatures 
 removed,

 What you are calling a misfeature is an absence, not a presence that
 can be removed.

 That's not quite true. Rich comparisons explicitly allow non-boolean
 return values. Breaking up __cmp__ into multiple __special__ methods was
 not the sole purpose of rich comparisons. One of the prime examples at the
 time was numpy (well, Numeric at the time). We wanted to use == to be able
 to return an array
 with boolean values where the two operand arrays were equal. E.g.

 In [1]: from numpy import *

 In [2]: array([1, 2, 3]) == array([4, 2, 3])
 Out[2]: array([False,  True,  True], dtype=bool)

 SQLAlchemy uses these operators to build up objects that will be turned
 into SQL expressions.

  print users.c.id==addresses.c.user_id
 users.id = addresses.user_id

 Basically, the idea was to turn these operators into full-fledged
 operators like +-/*. Returning a non-boolean violates neither the letter,
 nor the spirit of the feature.

 Unfortunately, if you do overload __eq__ to build up expressions or
 whatnot, the other places where users of __eq__ are implicitly expecting
 a boolean break.
 While I was (and am) a supporter of rich comparisons, I feel Rasmus's
 pain from time to time. It would be nice to have an alternate method to
 express the boolean yes, this thing is equal in value to that other thing.
 Unfortunately, I haven't figured out a good way to fit it in now without
 sacrificing rich comparisons entirely.

The best way, IMHO, would have been to use an alternative notation in
numpy and SQLalchemy, and have '==' always return only a truth value - it
could be a non-boolean as long as the bool() function gave the correct
result. Surely the extra convenience of overloading '==' in special cases
was not worth breaking such basic operations as 'bool(x == y)' or
'x in alist'. Again, the problem is only with '==', not with '<', '<='
etc. Of course it is done now, and unlikely to be reversed.

 and constrain the __eq__ function to always return a truth value.

 It is impossible to do that with certainty by any mechanical
 creation-time checking.  So the implementation of operator.eq would
 have to check the return value of the ob.__eq__ function it calls *every
 time*.  That would slow down the speed of the 99.xx% of cases where the
 check is not needed and would still not prevent exceptions.  And if the
 return value was bad, all operator.eq could do is raise an exception
 anyway.

Sure, but then it would be a bug to return a non-boolean from __eq__ and
friends. It is not a bug today. I think that's what Rasmus is proposing.

Yes, that is the point. If __eq__ functions are *supposed* to return
booleans I can write generic code that will work for well-behaved objects,
and any errors will be somebody else's fault. If __eq__ is free to return
anything, or throw an error, it becomes my responsibility to write generic
code that will work anyway, including with floating point numbers, numpy,
or SQLalchemy. And I cannot see any way to do that (suggestions welcome).
If purportedly general code does not work with numpy, your average numpy
user will not be receptive to the idea that it is all numpy's fault.

Current behaviour is both inconsistent and counterintuitive, as these
examples show.

>>> x = float('NaN')
>>> x == x
False
>>> ll = [x]
>>> x in ll
True
>>> x == ll[0]
False

>>> import numpy
>>> y = numpy.zeros((3,))
>>> y
array([ 0.,  0.,  0.])
>>> bool(y==y)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: The truth value of an array with more than one element is
ambiguous. Use a.any() or a.all()
>>> ll1 = [y,1]
>>> y in ll1
True
>>> ll2 = [1,y]
>>> y in ll2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: The truth value of an array with more than one element is
ambiguous. Use a.any() or a.all()
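
The order dependence in the last two membership tests can be reproduced without numpy. The sketch below is written for Python 3 and uses two hypothetical stand-in classes (FakeArray and ElementwiseResult are invented for illustration, they are not numpy names). It relies on the documented behaviour that list membership checks identity before it falls back to ==.

```python
class ElementwiseResult:
    """Stand-in for an array of booleans: it has no single truth value."""
    def __bool__(self):
        raise ValueError("truth value is ambiguous")


class FakeArray:
    """Hypothetical class whose == returns a non-boolean result object."""
    def __eq__(self, other):
        return ElementwiseResult()

    __hash__ = object.__hash__


y = FakeArray()

# Identity is checked before ==, so a match on the first item never
# needs the truth value of an == result.
found_first = y in [y, 1]

# Here 1 is examined first; == is evaluated and bool() of its result raises.
try:
    y in [1, y]
    outcome = "no error"
except ValueError as exc:
    outcome = "ValueError: %s" % exc

print(found_first)
print(outcome)
```

So the result of 'y in alist' depends on whether an identical object is reached before any object that merely has to be compared with ==.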


Can anybody see a way this could be fixed (please)? I may well have to
live with it, but I would really prefer not to.

---
Dr. Rasmus H. Fogh  Email: [EMAIL PROTECTED]
Dept. of Biochemistry, University of Cambridge,
80 Tennis Court Road, Cambridge CB2 1GA, UK. FAX (01223)766002



Re: Rich Comparisons Gotcha

2008-12-07 Thread Rasmus Fogh
James Stroud wrote:
 Rasmus Fogh wrote:
 Dear All,

 For the first time I have come across a Python feature that seems
 completely wrong. After the introduction of rich comparisons, equality
 comparison does not have to return a truth value, and may indeed return
 nothing at all and throw an error instead. As a result, code like
   if foo == bar:
 or
   foo in alist
 cannot be relied on to work.

 This is clearly no accident. According to the documentation all
 comparison operators are allowed to return non-booleans, or to throw
 errors. There is
 explicitly no guarantee that x == x is True.

 I'm not a computer scientist, so my language and perspective on the
 topic may be a bit naive, but I'll try to demonstrate my caveman
 understanding by example.

 First, here is why the ability to throw an error is a feature:

 class Apple(object):
   def __init__(self, appleness):
     self.appleness = appleness
   def __cmp__(self, other):
     assert isinstance(other, Apple), 'must compare apples to apples'
     return cmp(self.appleness, other.appleness)

 class Orange(object): pass

 Apple(42) == Orange()

True, but that does not hold for __eq__, only for __cmp__, and for
__gt__, __le__, etc.
Consider:

class Apple(object):
  def __init__(self, appleness):
    self.appleness = appleness
  def __gt__(self, other):
    assert isinstance(other, Apple), 'must compare apples to apples'
    return (self.appleness > other.appleness)
  def __eq__(self, other):
    if isinstance(other, Apple):
      return (self.appleness == other.appleness)
    else:
      return False
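
For comparison, the same split between equality and ordering can be sketched in Python 3, where __cmp__ no longer exists. This is an illustration under that assumption, not code from the thread: returning NotImplemented hands the decision back to the interpreter, which answers False for == and raises TypeError for ordering.

```python
class Apple:
    def __init__(self, appleness):
        self.appleness = appleness

    def __eq__(self, other):
        if not isinstance(other, Apple):
            return NotImplemented   # interpreter falls back; == gives False
        return self.appleness == other.appleness

    def __hash__(self):
        return hash(self.appleness)

    def __lt__(self, other):
        if not isinstance(other, Apple):
            return NotImplemented   # Python 3 then raises TypeError
        return self.appleness < other.appleness


class Orange:
    pass


print(Apple(42) == Orange())                  # equality across types
print(Apple(42) in [Orange(), Apple(42)])     # search in a mixed-type list
try:
    Apple(42) < Orange()
except TypeError as exc:
    print("ordering refused:", exc)
```

With this shape, searching a mixed-type collection needs no type checks by the caller, while meaningless orderings still fail loudly.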

 Second, consider that any value in python also evaluates to a truth
 value in boolean context.

 Third, every function returns something. A function's returning nothing
 is not a possibility in the python language. None is something but
 evaluates to False in boolean context.

Indeed. The requirement would be not that return_value was a boolean, but
that bool(return_value) was defined and gave the correct result. I
understand that in some old Numeric/numpy version the numpy array __eq__
function returned a non-empty array, so that
bool(numarray1 == numarray2)
was true for any pair of arguments, which is one way of breaking '=='.
In current numpy, even
bool(numarray1 == 1)
throws an error, which is another way of breaking '=='.

 But surely you can define an equal/unequal classification for all
 types of object, if you want to?

 This reminds me of complex numbers: would 4 + 4i be equal to sqrt(32)?
 Even in the realm of pure mathematics, the generality of objects (i.e.
 numbers) can not be assumed.

It sounds like that problem is simpler in computing. sqrt(32) evaluates to
5.6568542494923806 on my computer. A complex number c with non-zero
imaginary part would be unequal to sqrt(32) even if it so happened that
c*c==32.
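
A quick check of this point, sketched with Python's built-in complex type and the standard math and cmath modules (the variable names are illustrative):

```python
import cmath
import math

c = complex(4, 4)
root = math.sqrt(32)               # a plain float, roughly 5.657

print(c == root)                   # False: the imaginary part is non-zero
print(c * c == 32j)                # True: (4+4i) squared is 32i, not 32
print(cmath.isclose(cmath.sqrt(32j), c))
print(math.isclose(abs(c), root))  # only the magnitudes agree
```

The comparison c == root is well defined and simply False; the two values agree only once one decides to compare magnitudes instead.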

Yours,

Rasmus

---
Dr. Rasmus H. Fogh  Email: [EMAIL PROTECTED]
Dept. of Biochemistry, University of Cambridge,
80 Tennis Court Road, Cambridge CB2 1GA, UK. FAX (01223)766002



Re: Rich Comparisons Gotcha

2008-12-07 Thread Luis Zarrabeitia


Quoting James Stroud [EMAIL PROTECTED]:

 First, here is why the ability to throw an error is a feature:
 
 class Apple(object):
   def __init__(self, appleness):
     self.appleness = appleness
   def __cmp__(self, other):
     assert isinstance(other, Apple), 'must compare apples to apples'
     return cmp(self.appleness, other.appleness)
 
 class Orange(object): pass
 
 Apple(42) == Orange()

I beg to disagree.
The right answer for the question "Am I equal to this chair right here?" is not
"I don't know", nor "I can't compare". The answer is "No, I'm not a chair, thus
I'm not equal to this chair right here". If someone comes to my house, looking
for me, he will not run away because he sees a chair before he sees me. Your
assert doesn't belong inside the method; it should be up to the caller to decide
if the human-chair comparisons make sense or not. I certainly don't want to be
type-checking when looking for an object within a mixed-type collection. 

 This reminds me of complex numbers: would 4 + 4i be equal to sqrt(32)? 

I assume you meant sqrt(32i).
Well, sqrt is a function, and if its result value is defined as 4+4i, then the
answer is 'yes', otherwise, the answer should be no.

sqrt(4) is *not* -2, and should not be equal to -2. The standard definition of
the square root _function_ for real numbers is to take the non-negative real
root. I haven't heard of a standard square root _function_ for complex numbers
(there is of course, a definition of square root, but it is not a function).

So, if by your definition of sqrt, sqrt(32i) returns a number, there is no
ambiguity. -2 is not sqrt(4). If you need the answer to be 'True', you may be
asking the wrong question. 




Re: Rich Comparisons Gotcha

2008-12-07 Thread Steven D'Aprano
On Sun, 07 Dec 2008 13:03:43 +, Rasmus Fogh wrote:

 James Stroud wrote:
...
 Second, consider that any value in python also evaluates to a truth
 value in boolean context.

But bool(x) can fail too. So not every object in Python can be 
interpreted as a truth value.


 Third, every function returns something. 

Unless it doesn't return at all.


 A function's returning nothing
 is not a possibility in the python language. None is something but
 evaluates to False in boolean context.
 
 Indeed. The requirement would be not that return_value was a boolean,
 but that bool(return_value) was defined and gave the correct result.

If __bool__ or __nonzero__ raises an exception, you would like Python to 
ignore the exception and return True or False. Which should it be? How do 
you know what the correct result should be?

From the Zen of Python:

In the face of ambiguity, refuse the temptation to guess.


All binary operators are ambiguous when dealing with vector or array 
operands. Should the operator operate on the array as a whole, or on each 
element? The numpy people have decided that element-wise equality testing 
is more useful for them, and this is their prerogative to do so. In fact, 
the move to rich comparisons was driven by the needs of numpy. 

http://www.python.org/dev/peps/pep-0207/

It is a *VERY* important third-party library, and this was not the first 
and probably won't be the last time that their needs will move into 
Python the language.

Python encourages such domain-specific behaviour. In fact, that's what 
operator-overloading is all about: classes can define what any operator 
means for *them*. There's no requirement that the infinity of potential 
classes must all define operators in a mutually compatible fashion, not 
even for comparison operators.

For example, consider a class implementing one particular version of 
three-value logic. It isn't enough for == to only return True or False, 
because you also need Maybe:

True == False => returns False
True == True => returns True
True == Maybe => returns Maybe
etc.

Or consider fuzzy logic, where instead of two truth values, you have a 
continuum of truth values between 0.0 and 1.0. What should comparing two 
such fuzzy values for equality return? A boolean True/False? Another 
fuzzy value?
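
As an illustration of the three-valued case, here is a minimal Python 3 sketch. The class name Tri and the constants TRUE, FALSE and MAYBE are hypothetical, and the rule that any comparison involving MAYBE yields MAYBE is just one possible convention.

```python
class Tri:
    """Hypothetical three-valued truth value: TRUE, FALSE or MAYBE."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return self.name

    def __eq__(self, other):
        if not isinstance(other, Tri):
            return NotImplemented
        if self is MAYBE or other is MAYBE:
            return MAYBE                      # unknown stays unknown
        return TRUE if self is other else FALSE

    __hash__ = object.__hash__

    def __bool__(self):
        if self is MAYBE:
            raise ValueError("MAYBE has no boolean value")
        return self is TRUE


TRUE, FALSE, MAYBE = Tri("TRUE"), Tri("FALSE"), Tri("MAYBE")

print(TRUE == FALSE)   # FALSE
print(TRUE == TRUE)    # TRUE
print(TRUE == MAYBE)   # MAYBE
```

Here == returns a Tri rather than a bool, and forcing MAYBE into a boolean context raises, which is exactly the kind of behaviour under discussion.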


Another one from the Zen:

Special cases aren't special enough to break the rules.

The rules are that classes can customize their behaviour, that methods 
can fail, and that Python should not try to guess what the correct value 
should have been in the event of such a failure. Equality is a special 
case, but it isn't so special that it needs to be an exception from those 
rules.

If you really need a guaranteed-can't-fail[1] equality test, try 
something like this untested wrapper class:

class EqualityWrapper(object):
    def __init__(self, obj):
        self.wrapped = obj
    def __eq__(self, other):
        try:
            return bool(self.wrapped == other)
        except Exception:
            return False  # or maybe True?

Now wrap all your data:

data = [a list of arbitrary objects]
data = map(EqualityWrapper, data)
process(data)




[1] Not a guarantee.

-- 
Steven


Re: Rich Comparisons Gotcha

2008-12-07 Thread Rasmus Fogh
 On Sun, 07 Dec 2008 13:03:43 +, Rasmus Fogh wrote:
 James Stroud wrote:
 ...
 Second, consider that any value in python also evaluates to a truth
 value in boolean context.

 But bool(x) can fail too. So not every object in Python can be
 interpreted as a truth value.

 Third, every function returns something.

 Unless it doesn't return at all.

 A function's returning nothing
 is not a possibility in the python language. None is something but
 evaluates to False in boolean context.

 Indeed. The requirement would be not that return_value was a boolean,
 but that bool(return_value) was defined and gave the correct result.

 If __bool__ or __nonzero__ raises an exception, you would like Python to
 ignore the exception and return True or False. Which should it be? How
 do you know what the correct result should be?

 From the Zen of Python:

 In the face of ambiguity, refuse the temptation to guess.

 All binary operators are ambiguous when dealing with vector or array
 operands. Should the operator operate on the array as a whole, or on
 each element? The numpy people have decided that element-wise equality
 testing is more useful for them, and this is their prerogative to do so.
 In fact, the move to rich comparisons was driven by the needs of numpy.

 http://www.python.org/dev/peps/pep-0207/

 It is a *VERY* important third-party library, and this was not the first
 and probably won't be the last time that their needs will move into
 Python the language.

 Python encourages such domain-specific behaviour. In fact, that's what
 operator-overloading is all about: classes can define what any operator
 means for *them*. There's no requirement that the infinity of potential
 classes must all define operators in a mutually compatible fashion, not
 even for comparison operators.

 For example, consider a class implementing one particular version of
 three-value logic. It isn't enough for == to only return True or False,
 because you also need Maybe:

 True == False => returns False
 True == True => returns True
 True == Maybe => returns Maybe
 etc.

 Or consider fuzzy logic, where instead of two truth values, you have a
 continuum of truth values between 0.0 and 1.0. What should comparing two
 such fuzzy values for equality return? A boolean True/False? Another
 fuzzy value?

 Another one from the Zen:

 Special cases aren't special enough to break the rules.

 The rules are that classes can customize their behaviour, that methods
 can fail, and that Python should not try to guess what the correct value
 should have been in the event of such a failure. Equality is a special
 case, but it isn't so special that it needs to be an exception from
 those rules.

 If you really need a guaranteed-can't-fail[1] equality test, try
 something like this untested wrapper class:

 class EqualityWrapper(object):
     def __init__(self, obj):
         self.wrapped = obj
     def __eq__(self, other):
         try:
             return bool(self.wrapped == other)
         except Exception:
             return False  # or maybe True?

 Now wrap all your data:

 data = [a list of arbitrary objects]
 data = map(EqualityWrapper, data)
 process(data)

 [1] Not a guarantee.

Well, lots to think about.

Just to keep you from shooting at straw men:

I would have liked it to be part of the design contract (a convention, if
you like) that
1) bool(x == y) should return a boolean and never throw an error
2) x == x return True

I do *not* say that bool(x) should never throw an error.
I do *not* say that Python should guess a return value if an __eq__
function throws an error, only that it should have been considered a bug,
or at least bad form, for __eq__ functions to do so.

What might be a sensible behaviour (unlike your proposed wrapper) would be
the following:

def eq(x, y):
  if x is y:
    return True
  else:
    try:
      return (x == y)
    except Exception:
      return False
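
A runnable variant of this helper is sketched below for Python 3. It differs from the function above in one respect, which is an assumption added here: the result of == is passed through bool() inside the try block, so that a result with no truth value is also treated as "not equal". The names safe_eq and Grumpy are hypothetical.

```python
def safe_eq(x, y):
    """Return True or False; never propagate an error from == or bool()."""
    if x is y:
        return True
    try:
        return bool(x == y)
    except Exception:
        return False


class Grumpy:
    """Hypothetical class whose __eq__ always raises."""
    def __eq__(self, other):
        raise TypeError("refusing to compare")

    __hash__ = object.__hash__


nan = float("nan")
g = Grumpy()

print(safe_eq(nan, nan))   # True, by identity
print(safe_eq(1, 1.0))     # True
print(safe_eq(g, 1))       # False: the TypeError is swallowed
print(safe_eq(g, g))       # True, by identity
```

Swallowing every Exception hides genuine bugs in __eq__ implementations, so this is a trade-off rather than a free fix.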

If it is possible to change the language, how about having two
different functions, one for overloading the '==' operator, and another
for testing list and set membership, dictionary key identity, etc.?
For instance like this
- Add a new function __equals__; x.__equals__(y) could default to
  bool(x.__eq__(y))
- Establish by convention that x.__equals__(y) must return a boolean and
  may not intentionally throw an error.
- Establish by convention that 'x is y' implies 'x.__equals__(y)'
  in the sense that (not (x is y and not x.__equals__(y))) must always hold
- Have the Python data structures call __equals__ when they want to
  compare objects internally (e.g. for 'x in alist', 'x in adict',
  'set(alist)', etc.)
- Provide an equals(x,y) built-in that calls the __equals__ function
- numpy and others who (mis)use '==' for their own purposes could use
  def __equals__(self, other): return (self is other)
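
The proposal in the list above can be approximated in today's Python as ordinary functions. Everything in this sketch is hypothetical: __equals__ is not a real special method, and equals, contains and Vector are names invented for illustration.

```python
def equals(x, y):
    """Hypothetical built-in: boolean equality, identity implies equality."""
    if x is y:
        return True
    method = getattr(type(x), "__equals__", None)
    if method is not None:
        return bool(method(x, y))
    return bool(x == y)          # default: bool(x.__eq__(y))


def contains(items, x):
    """What 'x in alist' might do under the proposal."""
    return any(equals(item, x) for item in items)


class Vector:
    """Hypothetical array-like class that overloads == element-wise."""
    def __init__(self, values):
        self.values = list(values)

    def __eq__(self, other):
        if not isinstance(other, Vector):
            return NotImplemented
        return [a == b for a, b in zip(self.values, other.values)]

    __hash__ = object.__hash__

    def __equals__(self, other):
        return self is other


v = Vector([1, 2, 3])
w = Vector([1, 2, 3])

print(v == w)                 # [True, True, True]
print(equals(v, w))           # False: distinct objects
print(contains([1, v], v))    # True
print(equals(1, 1.0))         # True, via the default
```

The element-wise == keeps working for the array-like class, while membership tests go through the boolean path only.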


For the float NaN case it looks like things are already behaving like
this. For numpy objects you would not lose anything, since
'numpyArray in alist' is broken 

Re: Rich Comparisons Gotcha

2008-12-07 Thread Mark Dickinson
On Dec 7, 4:23 pm, Rasmus Fogh [EMAIL PROTECTED] wrote:

 If it is possible to change the language, how about having two
 different functions, one for overloading the '==' operator, and another
 for testing list and set membership, dictionary key identity, etc.?

I've often thought that this would have made a lot of sense too, though
I'd probably choose to spell the well-behaved structural equality ==
and the flexible numeric equality eq (a la Fortran).  Hey, we could
have *six* new keywords: eq, ne, le, lt, ge, gt!

See the recent (September?) thread "Comparing float and decimal"
for some of the fun that results from lack of transitivity of
equality.

But I think there's essentially no chance of Python changing to
support this.  And even if there were, Python's conflation of
structural equality with numeric equality brings significant
benefits in terms of readability of code, ease of learning,
and general friendliness; it's only really troublesome in
a few corner cases.  Is the tradeoff worth it?

So for me, this comes down to a case of 'practicality beats purity'.

Mark


Re: Rich Comparisons Gotcha

2008-12-07 Thread Robert Kern

Rasmus Fogh wrote:


Current behaviour is both inconsistent and counterintuitive, as these
examples show.


x = float('NaN')
x == x

False


Blame IEEE for that one. Rich comparisons have nothing to do with that one.


ll = [x]
x in ll

True

x == ll[0]

False


import numpy
y = numpy.zeros((3,))
y

array([ 0.,  0.,  0.])

bool(y==y)

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: The truth value of an array with more than one element is
ambiguous. Use a.any() or a.all()

ll1 = [y,1]
y in ll1

True

ll2 = [1,y]
y in ll2

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: The truth value of an array with more than one element is
ambiguous. Use a.any() or a.all()

Can anybody see a way this could be fixed (please)? I may well have to
live with it, but I would really prefer not to.


Make a concrete proposal for fixing it that does not break backwards 
compatibility.

--
Robert Kern

I have come to believe that the whole world is an enigma, a harmless enigma
 that is made terrible by our own mad attempt to interpret it as though it had
 an underlying truth.
  -- Umberto Eco



Re: Rich Comparisons Gotcha

2008-12-07 Thread James Stroud

Luis Zarrabeitia wrote:


Quoting James Stroud [EMAIL PROTECTED]:


First, here is why the ability to throw an error is a feature:

class Apple(object):
  def __init__(self, appleness):
    self.appleness = appleness
  def __cmp__(self, other):
    assert isinstance(other, Apple), 'must compare apples to apples'
    return cmp(self.appleness, other.appleness)

class Orange(object): pass

Apple(42) == Orange()


I beg to disagree.
The right answer for the question "Am I equal to this chair right here?" is not
"I don't know", nor "I can't compare". The answer is "No, I'm not a chair, thus
I'm not equal to this chair right here". If someone comes to my house, looking
for me, he will not run away because he sees a chair before he sees me. Your
assert doesn't belong inside the method; it should be up to the caller to decide
if the human-chair comparisons make sense or not. I certainly don't want to be
type-checking when looking for an object within a mixed-type collection. 

This reminds me of complex numbers: would 4 + 4i be equal to sqrt(32)? 


I assume you meant sqrt(32i).


No, I definitely didn't mean sqrt(32i). I'm using sqrt() to represent 
the mathematical square root, and not an arbitrary function one might 
define, by the way.


My point is that 4 + 4i, sqrt(32), and sqrt(-32) all exist in different 
spaces. They are not comparable, even when testing for equality in a 
pure mathematical sense. If we encounter these values in our programs, 
we might like the power to decide the results of these comparisons. In 
one context it might make sense to throw an exception, in another, it 
might make sense to return False based on the fact that we consider them 
different types, in yet another context, it might make sense to look 
at complex plane values as vectors and return their scalar magnitude for 
comparison to real numbers. I think this ability to define the results 
of comparisons is not a shortcoming of the language but a strength.


--
James Stroud
UCLA-DOE Institute for Genomics and Proteomics
Box 951570
Los Angeles, CA 90095

http://www.jamesstroud.com


Re: Rich Comparisons Gotcha

2008-12-07 Thread James Stroud

Rasmus Fogh wrote:

Current behaviour is both inconsistent and counterintuitive, as these
examples show.


x = float('NaN')
x == x

False


Perhaps this should raise an exception? I think the problem is not with 
comparisons in general but with the fact that nan is type float:


py> type(float('NaN'))
<type 'float'>

No float can be equal to nan, but nan is a float. How can something be 
not a number and a float at the same time? The illogicality of nan's 
type creates the possibility for the illogical results of comparisons to 
nan including comparing nan to itself.



ll = [x]
x in ll

True

x == ll[0]

False


But there is consistency on the basis of identity which is the test for 
containment (in):


py> x is x
True
py> x in [x]
True

Identity and equality are two different concepts. Comparing identity to 
equality is like comparing apples to oranges ;o)





import numpy
y = numpy.zeros((3,))
y

array([ 0.,  0.,  0.])

bool(y==y)

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: The truth value of an array with more than one element is
ambiguous. Use a.any() or a.all()


But the equality test is not what fails here. It's the cast to bool that 
fails, which for numpy works like a unary ufunc. The designers of numpy 
thought that this would be a more desirable behavior. The test for 
equality likewise is a binary ufunc and the behavior was chosen in numpy 
for practical reasons. I don't know if you can overload the == operator 
in C, but if you can, you would be able to achieve the same behavior.



ll1 = [y,1]
y in ll1

True

ll2 = [1,y]
y in ll2

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: The truth value of an array with more than one element is
ambiguous. Use a.any() or a.all()


I think you could be safe calling this a bug with numpy. But the fact 
that someone can create a bug with a language is not a condemnation of 
the language. For example, C makes it real easy to crash a program by 
overrunning the limits of an array, but no one would suggest to remove 
arrays from C.



Can anybody see a way this could be fixed (please)? I may well have to
live with it, but I would really prefer not to.


Your only hope is to somehow convince the language designers to remove 
the ability to overload == then get them to agree on what you think the 
proper behavior should be for comparisons. I think the probability of 
that happening is about zero, though, because such a change would run 
counter to the dynamic nature of the language.


James


--
James Stroud
UCLA-DOE Institute for Genomics and Proteomics
Box 951570
Los Angeles, CA 90095

http://www.jamesstroud.com


Re: Rich Comparisons Gotcha

2008-12-07 Thread Terry Reedy

Robert Kern wrote:

Terry Reedy wrote:

Rasmus Fogh wrote:



Personally I would like to get these [EMAIL PROTECTED]* misfeatures removed,


What you are calling a misfeature is an absence, not a presence that 
can be removed.


That's not quite true.


In what way, pray tell.  My statement still looks quite true to me.

 Rich comparisons explicitly allow non-boolean return values.

They do so by not doing anything to the return value of the underlying 
method.  As I said, the OP is complaining about an absence of a check. 
Moreover, the absence is intentional as I explained in the part snipped 
and as you further explained.



And if the return value was bad, all operator.eq could do is raise an 
exception anyway.


Sure, but then it would be a bug to return a non-boolean from __eq__ and 
friends. It is not a bug today. I think that's what Rasmus is proposing.


Right, the addition of a check that is absent today.

tjr




Re: Rich Comparisons Gotcha

2008-12-07 Thread James Stroud

James Stroud wrote:

[cast to bool] for numpy works like a unary ufunc.


Scratch that. Not thinking and typing at same time.


--
James Stroud
UCLA-DOE Institute for Genomics and Proteomics
Box 951570
Los Angeles, CA 90095

http://www.jamesstroud.com


Re: Rich Comparisons Gotcha

2008-12-07 Thread Terry Reedy

Rasmus Fogh wrote:



Can anybody see a way this could be fixed (please)? I may well have to
live with it, but I would really prefer not to.


I made a suggestion in my first response, which perhaps you missed.

tjr



Re: Rich Comparisons Gotcha

2008-12-07 Thread Steven D'Aprano
On Sun, 07 Dec 2008 15:32:53 -0600, Robert Kern wrote:

 Rasmus Fogh wrote:
 
 Current behaviour is both inconsistent and counterintuitive, as these
 examples show.
 
 x = float('NaN')
 x == x
 False
 
 Blame IEEE for that one. Rich comparisons have nothing to do with that
 one.

There is nothing to blame them for. This is the correct behaviour. NaNs 
should *not* compare equal to themselves, that's mathematically 
incoherent.

-- 
Steven


Re: Re: Rich Comparisons Gotcha

2008-12-07 Thread acerimusdux

James Stroud wrote:
Rasmus Fogh wrote:

Current behaviour is both inconsistent and counterintuitive, as these
examples show.


x = float('NaN')
x == x

False


Perhaps this should raise an exception? I think the problem is not 
with comparisons in general but with the fact that nan is type float:


py> type(float('NaN'))
<type 'float'>

No float can be equal to nan, but nan is a float. How can something be 
not a number and a float at the same time? The illogicality of nan's 
type creates the possibility for the illogical results of comparisons 
to nan including comparing nan to itself.





I initially thought that looked like a bug to me.  But, this is 
apparently standard behavior required for NaN.  I'm only using 
Wikipedia as a reference here, but about 80% of the way down, under 
standard operations:

http://en.wikipedia.org/wiki/IEEE_754-1985

Comparison operations. NaN is treated specially in that NaN=NaN always 
returns false.


Presumably since floating point calculations return NaN for some 
operations, and one NaN is usually not equal to another, this is the 
required behavior. So not a Python issue (though understandably a bit 
confusing).


The array issue seems to be with one 3rd party library, and one can 
choose to use or not use their library, to ask them to change it, or 
even to decide to override their == operator, if one doesn't like the 
way it is designed.




Re: Rich Comparisons Gotcha

2008-12-07 Thread Steven D'Aprano
On Sun, 07 Dec 2008 23:20:12 +, Steven D'Aprano wrote:

 On Sun, 07 Dec 2008 15:32:53 -0600, Robert Kern wrote:
 
 Rasmus Fogh wrote:
 
 Current behaviour is both inconsistent and counterintuitive, as these
 examples show.
 
 x = float('NaN')
 x == x
 False
 
 Blame IEEE for that one. Rich comparisons have nothing to do with that
 one.
 
 There is nothing to blame them for. This is the correct behaviour. NaNs
 should *not* compare equal to themselves, that's mathematically
 incoherent.


Sorry, I should explain why.

Given:

x = log(-5)  # a NaN
y = log(-2)  # the same NaN
x == y  # Some people want this to be true for NaNs.

Then:

# Compare x and y directly.
log(-5) == log(-2)
# If x == y then exp(x) == exp(y) for all x, y.
exp(log(-5)) == exp(log(-2))
-5 == -2


and now the entire foundations of mathematics collapse into a steaming 
pile of rubble.


-- 
Steven
--
http://mail.python.org/mailman/listinfo/python-list


Re: Rich Comparisons Gotcha

2008-12-07 Thread Steven D'Aprano
On Sun, 07 Dec 2008 13:57:54 -0800, James Stroud wrote:

 Rasmus Fogh wrote:
 Current behaviour is both inconsistent and counterintuitive, as these
 examples show.
 
 x = float('NaN')
 x == x
 False
 
 Perhaps this should raise an exception?

Why on earth would you want checking equality on NaN to raise an 
exception??? What benefit does it give?


 I think the problem is not with
 comparisons in general but with the fact that nan is type float:
 
 py> type(float('NaN'))
 <type 'float'>
 
 No float can be equal to nan, but nan is a float. How can something be
 not a number and a float at the same time? 

Because floats are not real numbers. They are *almost* numbers, they 
often (but not always) behave like numbers, but they're actually not 
numbers.

The difference is subtle enough that it is easy to forget that floats are 
not numbers, but it's easy enough to find examples proving it:

Some perfectly good numbers don't exist as floats:

 2**-10000 == 0.0
True

Try as you might, you can't get the number 0.1 *exactly* as a float:

 0.1
0.10000000000000001


For any numbers x and y not equal to zero, x+y != x. But that fails for 
floats:

 1001.0 + 1e99 == 1e99
True

The above is because of limited precision: 1001.0 is too small to register 
next to 1e99. But even avoiding such extreme magnitudes doesn't solve the 
problem. With a little effort, you can also find examples of 
ordinary sized floats where (x+y)-y != x.

 0.9+0.1-0.9 == 0.1
False
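
The same points can be checked with exact arithmetic. This is only a sketch in Python 3: Fraction(0.1) converts the stored double exactly, so it shows which value the float 0.1 really holds. The names stored, tenth, absorbed and round_trip are made up for the example.

```python
from fractions import Fraction

# Fraction(float) converts the stored binary value exactly, so this shows
# what 0.1 really is once it has been rounded to a double.
stored = Fraction(0.1)
tenth = Fraction(1, 10)
print(stored == tenth)   # is the stored value exactly 1/10?
print(stored - tenth)    # the tiny rounding error, as an exact fraction

# Absorption: 1001.0 is far below the precision available at 1e99.
absorbed = (1001.0 + 1e99 == 1e99)

# Round-off at ordinary magnitudes: (x + y) - y need not give x back.
round_trip = ((0.9 + 0.1) - 0.9 == 0.1)
print(absorbed, round_trip)
```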



 import numpy
 y = numpy.zeros((3,))
 y
 array([ 0.,  0.,  0.])
 bool(y==y)
 Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
 ValueError: The truth value of an array with more than one element is
 ambiguous. Use a.any() or a.all()
 
 But the equality test is not what fails here. It's the cast to bool that
 fails

And it is right to do so, because it is ambiguous and the library 
designers rightly avoided the temptation of guessing what result is 
needed.


 ll1 = [y,1]
 y in ll1
 True
 ll2 = [1,y]
 y in ll2
 Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
 ValueError: The truth value of an array with more than one element is
 ambiguous. Use a.any() or a.all()
 
 I think you could be safe calling this a bug with numpy. 

Only in the sense that there are special cases where the array elements 
are all true, or all false, and numpy *could* safely return a bool. But 
special cases are not special enough to break the rules. Better for the 
numpy caller to write this:

a.all() # or any()

instead of:

try:
    bool(a)
except ValueError:
    a.all()

as they would need to do if numpy sometimes returned a bool and sometimes 
raised an exception.
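
A minimal sketch of that design choice, without importing numpy. Vec is a hypothetical stand-in for an array type, not numpy's implementation, and unlike numpy it refuses bool() for any length. It only illustrates why the caller has to say all() or any() explicitly. Python 3 syntax (__bool__ rather than __nonzero__).

```python
class Vec:
    """Hypothetical stand-in for an array type; not numpy itself."""

    def __init__(self, items):
        self.items = list(items)

    def __eq__(self, other):
        # Elementwise comparison against another Vec or a scalar.
        if isinstance(other, Vec):
            others = other.items
        else:
            others = [other] * len(self.items)
        return Vec(a == b for a, b in zip(self.items, others))

    def __bool__(self):
        # Refuse to guess which reduction the caller meant.
        raise ValueError("truth value of a multi-element Vec is ambiguous")

    def all(self):
        return all(self.items)

    def any(self):
        return any(self.items)


a = Vec([1, 2, 3])
b = Vec([1, 0, 3])

mask = (a == b)                  # a Vec of booleans, not a bool
print(mask.items)
print(mask.all(), mask.any())    # the caller states which question is meant

try:
    bool(mask)
    refused = False
except ValueError:
    refused = True
print(refused)
```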



-- 
Steven
--
http://mail.python.org/mailman/listinfo/python-list


Re: Rich Comparisons Gotcha

2008-12-07 Thread Robert Kern

Steven D'Aprano wrote:

On Sun, 07 Dec 2008 23:20:12 +, Steven D'Aprano wrote:


On Sun, 07 Dec 2008 15:32:53 -0600, Robert Kern wrote:


Rasmus Fogh wrote:


Current behaviour is both inconsistent and counterintuitive, as these
examples show.


x = float('NaN')
x == x

False

Blame IEEE for that one. Rich comparisons have nothing to do with that
one.

There is nothing to blame them for. This is the correct behaviour. NaNs
should *not* compare equal to themselves, that's mathematically
incoherent.


Sorry, I should explain why.

Given:

x = log(-5)  # a NaN
y = log(-2)  # the same NaN
x == y  # Some people want this to be true for NaNs.

Then:

# Compare x and y directly.
log(-5) == log(-2)
# If x == y then exp(x) == exp(y) for all x, y.
exp(log(-5)) == exp(log(-2))
-5 == -2


and now the entire foundations of mathematics collapses into a steaming 
pile of rubble.


I didn't mean to suggest that it was incorrect, just that that particular 
surprising behavior is not related to rich comparisons. Even if the OP gets an 
__equals__() or some such, NaN will still not compare equal to NaN.


--
Robert Kern

I have come to believe that the whole world is an enigma, a harmless enigma
 that is made terrible by our own mad attempt to interpret it as though it had
 an underlying truth.
  -- Umberto Eco

--
http://mail.python.org/mailman/listinfo/python-list


Re: Rich Comparisons Gotcha

2008-12-07 Thread George Sakkis
On Dec 7, 6:37 pm, Steven D'Aprano [EMAIL PROTECTED]
cybersource.com.au wrote:
 On Sun, 07 Dec 2008 23:20:12 +, Steven D'Aprano wrote:
  On Sun, 07 Dec 2008 15:32:53 -0600, Robert Kern wrote:

  Rasmus Fogh wrote:

  Current behaviour is both inconsistent and counterintuitive, as these
  examples show.

  x = float('NaN')
  x == x
  False

  Blame IEEE for that one. Rich comparisons have nothing to do with that
  one.

  There is nothing to blame them for. This is the correct behaviour. NaNs
  should *not* compare equal to themselves, that's mathematically
  incoherent.

 Sorry, I should explain why.

 Given:

 x = log(-5)  # a NaN
 y = log(-2)  # the same NaN
 x == y  # Some people want this to be true for NaNs.

 Then:

 # Compare x and y directly.
 log(-5) == log(-2)
 # If x == y then exp(x) == exp(y) for all x, y.
 exp(log(-5)) == exp(log(-2))
 -5 == -2

 and now the entire foundations of mathematics collapses into a steaming
 pile of rubble.

And why doesn't this happen with the current behavior if x = y = log
(-5) ? According to the same proof,  -5 != -5.

George
--
http://mail.python.org/mailman/listinfo/python-list


Re: Rich Comparisons Gotcha

2008-12-07 Thread Robert Kern

Terry Reedy wrote:

Robert Kern wrote:

Terry Reedy wrote:

Rasmus Fogh wrote:



Personally I would like to get these [EMAIL PROTECTED]* misfeatures removed,


What you are calling a misfeature is an absence, not a presence that 
can be removed.


That's not quite true.


In what way, pray tell.  My statement still looks quite true to me.


There is an explicit policy that __eq__() methods can return non-bools for 
various purposes. I consider that policy to be a presence that can be removed. 
There is no check because that policy exists, not the other way around.


Anyways, this is really a semantic digression, and not particularly important. 
Peace?


--
Robert Kern


--
http://mail.python.org/mailman/listinfo/python-list


Re: Rich Comparisons Gotcha

2008-12-07 Thread James Stroud

Steven D'Aprano wrote:

On Sun, 07 Dec 2008 13:57:54 -0800, James Stroud wrote:


Rasmus Fogh wrote:



ll1 = [y,1]
y in ll1

True

ll2 = [1,y]
y in ll2

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: The truth value of an array with more than one element is
ambiguous. Use a.any() or a.all()
I think you could be safe calling this a bug with numpy. 


Only in the sense that there are special cases where the array elements 
are all true, or all false, and numpy *could* safely return a bool. But 
special cases are not special enough to break the rules. Better for the 
numpy caller to write this:


a.all() # or any()

instead of:

try:
    bool(a)
except ValueError:
    a.all()

as they would need to do if numpy sometimes returned a bool and sometimes 
raised an exception.


I'm missing how a.all() solves the problem Rasmus describes, namely that 
the order of a python *list* affects the results of containment tests by 
numpy.array. E.g. y in ll1 and y in ll2 evaluate to different 
results in his example. It still seems like a bug in numpy to me, even 
if too much other stuff is broken if you fix it (in which case it 
apparently becomes an issue).


James
--
http://mail.python.org/mailman/listinfo/python-list


Re: Rich Comparisons Gotcha

2008-12-07 Thread Robert Kern

James Stroud wrote:

Steven D'Aprano wrote:

On Sun, 07 Dec 2008 13:57:54 -0800, James Stroud wrote:


Rasmus Fogh wrote:



ll1 = [y,1]
y in ll1

True

ll2 = [1,y]
y in ll2

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: The truth value of an array with more than one element is
ambiguous. Use a.any() or a.all()
I think you could be safe calling this a bug with numpy. 


Only in the sense that there are special cases where the array 
elements are all true, or all false, and numpy *could* safely return a 
bool. But special cases are not special enough to break the rules. 
Better for the numpy caller to write this:


a.all() # or any()

instead of:

try:
    bool(a)
except ValueError:
    a.all()

as they would need to do if numpy sometimes returned a bool and 
sometimes raised an exception.


I'm missing how a.all() solves the problem Rasmus describes, namely that 
the order of a python *list* affects the results of containment tests by 
numpy.array. E.g. y in ll1 and y in ll2 evaluate to different 
results in his example. It still seems like a bug in numpy to me, even 
if too much other stuff is broken if you fix it (in which case it 
apparently becomes an issue).


It's an issue, if anything, not a bug. There is no consistent implementation of 
bool(some_array) that works in all cases. numpy's predecessor Numeric used to 
implement this as returning True if at least one element was non-zero. This 
works well for bool(x!=y) (which is equivalent to (x!=y).any()) but does not 
work well for bool(x==y) (which should be (x==y).all()), but many people got 
confused and thought that bool(x==y) worked. When we made numpy, we decided to 
explicitly not allow bool(some_array) so that people will not write buggy code 
like this again.


The deficiency is in the feature of rich comparisons, not numpy's implementation 
of it. __eq__() is allowed to return non-booleans; however, there are some parts 
of Python's implementation like list.__contains__() that still expect the return 
value of __eq__() to be meaningfully cast to a boolean.


--
Robert Kern


--
http://mail.python.org/mailman/listinfo/python-list


Re: Rich Comparisons Gotcha

2008-12-07 Thread James Stroud

Robert Kern wrote:

James Stroud wrote:
I'm missing how a.all() solves the problem Rasmus describes, namely 
that the order of a python *list* affects the results of containment 
tests by numpy.array. E.g. y in ll1 and y in ll2 evaluate to 
different results in his example. It still seems like a bug in numpy 
to me, even if too much other stuff is broken if you fix it (in which 
case it apparently becomes an issue).


It's an issue, if anything, not a bug. There is no consistent 
implementation of bool(some_array) that works in all cases. numpy's 
predecessor Numeric used to implement this as returning True if at least 
one element was non-zero. This works well for bool(x!=y) (which is 
equivalent to (x!=y).any()) but does not work well for bool(x==y) (which 
should be (x==y).all()), but many people got confused and thought that 
bool(x==y) worked. When we made numpy, we decided to explicitly not 
allow bool(some_array) so that people will not write buggy code like 
this again.


The deficiency is in the feature of rich comparisons, not numpy's 
implementation of it. __eq__() is allowed to return non-booleans; 
however, there are some parts of Python's implementation like 
list.__contains__() that still expect the return value of __eq__() to be 
meaningfully cast to a boolean.




You have explained

py> ll2 = [1, y]
py> y in ll2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: The truth value of an array with more than one element is...

but not

py> ll1 = [y,1]
py> y in ll1
True

It's this discrepancy that seems like a bug, not that a ValueError is 
raised in the former case, which is perfectly reasonable to me.



All I can imagine is that something like the following lives in the 
bowels of the python code for list:


def __contains__(self, other):
  foundit = False
  for i, v in enumerate(self):
    if i == 0:
      # evaluates to bool numpy array
      foundit = one_kind_of_test(v, other)
    else:
      # raises exception for numpy array
      foundit = another_kind_of_test(v, other)
    if foundit:
      break
  return foundit

I'm trying to imagine some other way to get the results mentioned but I 
honestly can't. It's beyond me why someone would do such a thing, but 
perhaps it's an optimization of some sort.


James
--
http://mail.python.org/mailman/listinfo/python-list


Re: Rich Comparisons Gotcha

2008-12-07 Thread Steven D'Aprano
On Sun, 07 Dec 2008 16:23:59 +, Rasmus Fogh wrote:

 Just to keep you from shooting at straw men:
 
 I would have liked it to be part of the design contract (a convention,
 if you like) that
 1) bool(x == y) should return a boolean and never throw an error 


Can't be done without making bool a magic function. If x==y raises an 
exception, bool() won't even be called. The only way around that would be 
for the Python compiler to recognise bool(x==y) and perform special magic.

What if you did this?

trueorfalse = bool  # I don't like George Boole
trueorfalse( [x][0].__class__.__getattr__('__dict__')['__eq__'](y) )


Should that have special magic performed too? Just how much work must the 
compiler put in to special-casing bool?



 2) x == x return True

Which goes against the IEEE 754 floating-point standard.

http://grouper.ieee.org/groups/754/

Python used to optimize x==x and always return True. This was removed 
because it caused problems.


 
 I do *not* say that bool(x) should never throw an error. I do *not* say
 that Python should guess a return value if an __eq__ function throws an
 error,

But to get what you want, the above is implied.

I suppose, just barely, that you could avoid making bool() magic and just 
make if magic. When the compiler sees if expr: it could swallow all 
exceptions inside expr and force it to evaluate to True or False. (How? 
By guessing? Randomly?) This would cause many problems, but it could be 
done, and much easier than ensuring that bool(x) always succeeds.


 only that it should have been considered a bug, or at least bad
 form, for __eq__ functions to do so.


It's certainly *unusual* for comparisons to return non-bools, but it's 
not bad form.


 What might be a sensible behaviour (unlike your proposed wrapper) 

What do you dislike about my wrapper class? Perhaps it is fixable.



 would be the following:
 
 def eq(x, y):
   if x is y:
     return True

I've already mentioned NaNs. Sentinel values also sometimes need to 
compare not equal with themselves. Forcing them to compare equal will 
cause breakage.


   else:
     try:
       return (x == y)
     except Exception:
       return False

Why False? Why not True? If an error occurs inside __eq__, how do you 
know that the correct result was False?

class Broken(object):
    def __eq__(self, other):
        return Treu  # oops, raises NameError
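
To make both objections concrete, here is a sketch that puts the quoted eq() wrapper next to the Broken class and a NaN. It assumes Python 3 and is only an illustration of the argument, not a recommended helper. The names masked_bug and nan_result are made up for the example.

```python
def eq(x, y):
    """The proposed wrapper: identity wins, any exception means 'not equal'."""
    if x is y:
        return True
    else:
        try:
            return (x == y)
        except Exception:
            return False


class Broken(object):
    def __eq__(self, other):
        return Treu  # typo: raises NameError when __eq__ is called


nan = float("nan")

# The NameError inside Broken.__eq__ is swallowed and reported as "not equal".
masked_bug = eq(Broken(), Broken())

# The identity shortcut says a NaN equals itself, although == says otherwise.
nan_result = eq(nan, nan)

print(masked_bug, nan_result, nan == nan)
```

The caller gets a plain False for the broken class and never learns that the comparison itself is buggy.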



-- 
Steven
--
http://mail.python.org/mailman/listinfo/python-list


Re: Rich Comparisons Gotcha

2008-12-07 Thread Robert Kern

James Stroud wrote:

Robert Kern wrote:

James Stroud wrote:
I'm missing how a.all() solves the problem Rasmus describes, namely 
that the order of a python *list* affects the results of containment 
tests by numpy.array. E.g. y in ll1 and y in ll2 evaluate to 
different results in his example. It still seems like a bug in numpy 
to me, even if too much other stuff is broken if you fix it (in which 
case it apparently becomes an issue).


It's an issue, if anything, not a bug. There is no consistent 
implementation of bool(some_array) that works in all cases. numpy's 
predecessor Numeric used to implement this as returning True if at 
least one element was non-zero. This works well for bool(x!=y) (which 
is equivalent to (x!=y).any()) but does not work well for bool(x==y) 
(which should be (x==y).all()), but many people got confused and 
thought that bool(x==y) worked. When we made numpy, we decided to 
explicitly not allow bool(some_array) so that people will not write 
buggy code like this again.


The deficiency is in the feature of rich comparisons, not numpy's 
implementation of it. __eq__() is allowed to return non-booleans; 
however, there are some parts of Python's implementation like 
list.__contains__() that still expect the return value of __eq__() to 
be meaningfully cast to a boolean.




You have explained

py> ll2 = [1, y]
py> y in ll2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: The truth value of an array with more than one element is...

but not

py> ll1 = [y,1]
py> y in ll1
True

It's this discrepancy that seems like a bug, not that a ValueError is 
raised in the former case, which is perfectly reasonable to me.


Nothing to do with numpy. list.__contains__() checks for identity with "is" 
before it goes to __eq__().


--
Robert Kern


--
http://mail.python.org/mailman/listinfo/python-list


Re: Rich Comparisons Gotcha

2008-12-07 Thread James Stroud

Robert Kern wrote:

James Stroud wrote:

py> ll2 = [1, y]
py> y in ll2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: The truth value of an array with more than one element is...

but not

py> ll1 = [y,1]
py> y in ll1
True

It's this discrepancy that seems like a bug, not that a ValueError is 
raised in the former case, which is perfectly reasonable to me.


Nothing to do with numpy. list.__contains__() checks for identity with 
is before it goes to __eq__().


...but only for the first element of the list:

py> import numpy
py> y = numpy.array([1,2,3])
py> y
array([1, 2, 3])
py> y in [1, y]

Traceback (most recent call last):
  File "<ipython console>", line 1, in <module>
<type 'exceptions.ValueError'>: The truth value of an array with more 
than one element is ambiguous. Use a.any() or a.all()

py> y is [1, y][1]
True

I think it skips straight to __eq__ if the element is not the first in 
the list. That no one acknowledges this makes me feel like a conspiracy 
is afoot.

--
http://mail.python.org/mailman/listinfo/python-list


Re: Rich Comparisons Gotcha

2008-12-07 Thread Robert Kern

James Stroud wrote:

Robert Kern wrote:

James Stroud wrote:

py> ll2 = [1, y]
py> y in ll2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: The truth value of an array with more than one element is...

but not

py> ll1 = [y,1]
py> y in ll1
True

It's this discrepancy that seems like a bug, not that a ValueError is 
raised in the former case, which is perfectly reasonable to me.


Nothing to do with numpy. list.__contains__() checks for identity with 
is before it goes to __eq__().


...but only for the first element of the list:

py> import numpy
py> y = numpy.array([1,2,3])
py> y
array([1, 2, 3])
py> y in [1, y]

Traceback (most recent call last):
  File "<ipython console>", line 1, in <module>
<type 'exceptions.ValueError'>: The truth value of an array with more 
than one element is ambiguous. Use a.any() or a.all()

py> y is [1, y][1]
True

I think it skips straight to __eq__ if the element is not the first in 
the list.


No, it doesn't skip straight to __eq__(). "y is 1" returns False, so (y==1) is 
checked. When y is a numpy array, this returns an array of bools. 
list.__contains__() tries to convert this array to a bool and 
ndarray.__nonzero__() raises the exception.


list.__contains__() checks "is" then __eq__() for each element before moving on 
to the next element. It does not try "is" for all elements, then try __eq__() 
for all elements.
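
In pure Python the per-element order looks roughly like the sketch below. It follows the documented equivalence "x in y" == any(x is e or x == e for e in y) for lists; the function name contains and the class Ambiguous are hypothetical, and Ambiguous merely imitates an object whose == result cannot be turned into a bool. Python 3 syntax, so __bool__ plays the role of __nonzero__.

```python
def contains(items, target):
    """Pure-Python model of `target in items` for a list."""
    for item in items:
        # Identity is tried first; == (and then bool()) only on a miss.
        if item is target or item == target:
            return True
    return False


class Ambiguous:
    """Hypothetical object whose == result cannot be turned into a bool."""

    class _Result:
        def __bool__(self):
            raise ValueError("ambiguous truth value")

    def __eq__(self, other):
        return Ambiguous._Result()


y = Ambiguous()

first = contains([y, 1], y)     # `y is y` succeeds before == ever runs
builtin_first = y in [y, 1]     # the real operator agrees

try:
    contains([1, y], y)         # 1 is not y, so 1 == y is evaluated first
    second = "no error"
except ValueError:
    second = "ValueError"

print(first, builtin_first, second)
```

The order dependence comes entirely from which element is reached first: a hit on identity returns before any __eq__() result has to be converted to a bool.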


 That no one acknowledges this makes me feel like a conspiracy
 is afoot.

I don't know what you think I'm not acknowledging.

--
Robert Kern


--
http://mail.python.org/mailman/listinfo/python-list


Re: Rich Comparisons Gotcha

2008-12-06 Thread Terry Reedy

Rasmus Fogh wrote:

Dear All,

For the first time I have come across a Python feature that seems
completely wrong. After the introduction of rich comparisons, equality
comparison does not have to return a truth value, and may indeed return
nothing at all and throw an error instead. As a result, code like
  if foo == bar:
or
  foo in alist
cannot be relied on to work.

This is clearly no accident. According to the documentation all comparison
operators are allowed to return non-booleans, or to throw errors. There is
explicitly no guarantee that x == x is True.


You have touched on a real and known issue that accompanies dynamic 
typing and the design of Python.  *Every* Python function can return any 
Python object and may raise any exception either actively, by design, or 
passively, by not catching exceptions raised in the functions *it* calls.



Personally I would like to get these [EMAIL PROTECTED]* misfeatures removed,


What you are calling a misfeature is an absence, not a presence that can 
be removed.



and constrain the __eq__ function to always return a truth value.


It is impossible to do that with certainty by any mechanical 
creation-time checking.  So the implementation of operator.eq would have 
to check the return value of the ob.__eq__ function it calls *every 
time*.  That would slow down the speed of the 99.xx% of cases where the 
check is not needed and would still not prevent exceptions.  And if the 
return value was bad, all operator.eq could do is raise an exception 
anyway.
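
A sketch of what such a per-call check would look like. strict_eq and Chatty are hypothetical names invented for this example. Nothing like strict_eq exists in the operator module.

```python
def strict_eq(a, b):
    """Hypothetical checked equality: insist that == produced a real bool."""
    result = (a == b)
    if not isinstance(result, bool):
        raise TypeError("== returned %s, not bool" % type(result).__name__)
    return result


class Chatty:
    def __eq__(self, other):
        return "maybe"   # legal today: __eq__ may return any object


ok = strict_eq(1, 1.0)

try:
    strict_eq(Chatty(), 1)
    outcome = "accepted"
except TypeError as exc:
    outcome = "rejected: %s" % exc

print(ok, outcome)
```

The isinstance test runs on every comparison, and the only thing it can do with a bad result is raise, which is the cost described above.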



That is clearly not likely to happen. Unless I have misunderstood something, 
could somebody explain to me.


a. See above.
b. Python programmers are allowed to define 'weird' but possibly 
useful-in-context behaviors, such as try out 3-value logic, or to 
operate on collections element by element (as with numpy).



1) Why was this introduced?


The 6 comparisons were previously done with one __cmp__ function that 
was supposed to return -1, 0, or 1 and which worked with negative, 0, or 
positive response, but which could return anything or raise an 
exception.  The compare functions could mask but not prevent weird returns.


 I can understand relaxing the restrictions on
'>', '<=' etc. - after all you cannot define an ordering for all types of
object. But surely you can define an equal/unequal classification for all
types of object, if you want to? Is it just the numpy people wanting to
type 'a == b' instead of 'equals(a,b)', or is there a better reason?

2) If I want to write generic code, can I somehow work around the fact
that
  if foo == bar:
or
  foo in alist
does not work for arbitrary objects?


Every Python function is 'generic' unless restrained by type tests. 
However, even 'generic' functions can only work as expected with objects 
that meet the assumptions embodied in the function.  In my Python-based 
algorithm book-in-progress, I am stating this explicitly.  In particular, 
I say that the book only applies to objects for which '==' gives a 
boolean result that is reflexive, symmetric, and transitive.  This 
excludes float('nan'), for instance (as I see you discovered), which 
follows the IEEE mandate to act otherwise.
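
Those three properties can be spot-checked on a handful of sample objects. The helper below is a hypothetical sketch (is_equivalence is not a standard function). It can only refute the assumption for the samples given, never prove it in general.

```python
from itertools import product


def is_equivalence(samples):
    """Check reflexivity, symmetry and transitivity of == on the samples."""
    reflexive = all(a == a for a in samples)
    symmetric = all((a == b) == (b == a)
                    for a, b in product(samples, repeat=2))
    transitive = all(a == c
                     for a, b, c in product(samples, repeat=3)
                     if a == b and b == c)
    return reflexive and symmetric and transitive


well_behaved = is_equivalence([1, 1.0, "a", (1, 2)])
with_nan = is_equivalence([1, 2.5, float("nan")])   # fails reflexivity

print(well_behaved, with_nan)
```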



CCPN has a table display class that maintains a list of arbitrary objects,
one per line in the table. The table class is completely generic,


but only for the objects that meet the implied assumption.  This is true 
for *all* Python code.  If you want to apply the function to other 
objects, you must either adapt the function or adapt or wrap the objects 
to give them an interface that does meet the assumptions.


 and subclassed for individual cases. It contains the code:


  if foo in tbllist:
    ...
  else:
    ...
    tbllist.append(foo)
    ...

One day the 'if' statement gave this rather obscure error:
ValueError:
 The truth value of an array with more than one element is ambiguous.
 Use a.any() or a.all()
A subclass had used objects passed in from some third party code, and as
it turned out foo happened to be a tuple containing a tuple containing a
numpy array.


Right.  'in' calls '==' and assumes a boolean return.  Assumption 
violated, exception raised.  Completely normal.  The error message even 
suggests a solution: wrap the offending objects in an adaptor class that 
gives them a normal interface with .all (or perhaps the all() builtin).
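
One possible shape for such an adaptor, as a sketch only. BoolEq is a hypothetical wrapper, and Vec is a tiny stand-in for an array type so that the example runs without numpy. With real arrays the .all() call would be the array's own method. Python 3 syntax.

```python
class BoolEq:
    """Hypothetical adaptor: wraps a value so that == always yields a bool."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        other_value = other.value if isinstance(other, BoolEq) else other
        if self.value is other_value:
            return True
        result = (self.value == other_value)
        # Array-like results expose .all(); collapse them explicitly.
        if hasattr(result, "all"):
            return bool(result.all())
        return bool(result)


class Vec:
    """Tiny stand-in for an array type with elementwise ==."""

    def __init__(self, items):
        self.items = list(items)

    def __eq__(self, other):
        if not isinstance(other, Vec) or len(other.items) != len(self.items):
            return False
        return Vec(a == b for a, b in zip(self.items, other.items))

    def __bool__(self):
        raise ValueError("ambiguous truth value")

    def all(self):
        return all(self.items)


# The table keeps wrapped objects, so `in` only ever sees boolean results.
tbllist = [BoolEq(1), BoolEq(Vec([1, 2, 3]))]

found = BoolEq(Vec([1, 2, 3])) in tbllist
missing = BoolEq(Vec([9, 9, 9])) in tbllist
print(found, missing)
```

The wrapper has to be applied where objects enter the table; it does nothing for arrays nested inside tuples, as in the case described above, unless the wrapping code recurses into them.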


Terry Jan Reedy


--
http://mail.python.org/mailman/listinfo/python-list


Re: Rich Comparisons Gotcha

2008-12-06 Thread Robert Kern

Terry Reedy wrote:

Rasmus Fogh wrote:

Dear All,

For the first time I have come across a Python feature that seems
completely wrong. After the introduction of rich comparisons, equality
comparison does not have to return a truth value, and may indeed return
nothing at all and throw an error instead. As a result, code like
  if foo == bar:
or
  foo in alist
cannot be relied on to work.

This is clearly no accident. According to the documentation all comparison
operators are allowed to return non-booleans, or to throw errors. There is
explicitly no guarantee that x == x is True.


You have touched on a real and known issue that accompanies dynamic 
typing and the design of Python.  *Every* Python function can return any 
Python object and may raise any exception either actively, by design, or 
passively, by not catching exceptions raised in the functions *it* calls.



Personally I would like to get these [EMAIL PROTECTED]* misfeatures removed,


What you are calling a misfeature is an absence, not a presence that can 
be removed.


That's not quite true. Rich comparisons explicitly allow non-boolean return 
values. Breaking up __cmp__ into multiple __special__ methods was not the sole 
purpose of rich comparisons. One of the prime examples at the time was numpy 
(well, Numeric at the time). We wanted to use == to be able to return an array 
with boolean values where the two operand arrays were equal. E.g.


In [1]: from numpy import *

In [2]: array([1, 2, 3]) == array([4, 2, 3])
Out[2]: array([False,  True,  True], dtype=bool)

SQLAlchemy uses these operators to build up objects that will be turned into SQL 
expressions.


 print users.c.id==addresses.c.user_id
users.id = addresses.user_id

Basically, the idea was to turn these operators into full-fledged operators like 
+-/*. Returning a non-boolean violates neither the letter, nor the spirit of the 
feature.
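
A toy version of the expression-building idea, to show the mechanism. Column and Expr are hypothetical classes written for this sketch and are not SQLAlchemy's API. The only point is that == returns an object describing the comparison instead of a bool.

```python
class Expr:
    """A recorded comparison, to be rendered later (e.g. as SQL text)."""

    def __init__(self, left, op, right):
        self.left, self.op, self.right = left, op, right

    def __str__(self):
        return "%s %s %s" % (self.left, self.op, self.right)


class Column:
    """Hypothetical column reference; not SQLAlchemy's real API."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        # Build an expression object instead of answering True/False.
        return Expr(self, "=", other)

    def __str__(self):
        return self.name


user_id = Column("users.id")
address_user_id = Column("addresses.user_id")

clause = (user_id == address_user_id)
print(clause)
```

Note that an Expr like this one is truthy by default, so "if user_id == address_user_id:" would silently take the branch; that is exactly the kind of implicit boolean use that stops making sense once == is overloaded this way.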


Unfortunately, if you do overload __eq__ to build up expressions or whatnot, the 
other places where users of __eq__ are implicitly expecting a boolean break. 
While I was (and am) a supporter of rich comparisons, I feel Rasmus's pain from 
time to time. It would be nice to have an alternate method to express the 
boolean "yes, this thing is equal in value to that other thing." Unfortunately, 
I haven't figured out a good way to fit it in now without sacrificing rich 
comparisons entirely.



and constrain the __eq__ function to always return a truth value.


It is impossible to do that with certainty by any mechanical 
creation-time checking.  So the implementation of operator.eq would have 
to check the return value of the ob.__eq__ function it calls *every 
time*.  That would slow down the speed of the 99.xx% of cases where the 
check is not needed and would still not prevent exceptions.  And if the 
return value was bad, all operator.eq could do is raise an exception 
anyway.


Sure, but then it would be a bug to return a non-boolean from __eq__ and 
friends. It is not a bug today. I think that's what Rasmus is proposing.


--
Robert Kern


--
http://mail.python.org/mailman/listinfo/python-list