#17086: GenericGraph's documentation, __eq__, and __hash__ out of sync
-------------------------------------+-------------------------------------
Reporter: emassop | Owner:
Type: defect | Status: new
Priority: major | Milestone: sage-6.4
Component: graph theory | Resolution:
Keywords: | Merged in:
Authors: | Reviewers:
Report Upstream: N/A | Work issues:
Branch: | Commit:
u/emassop/graph_hash | 82b245ee0b092a7a994074e27cc47686d43551c9
Dependencies: | Stopgaps:
-------------------------------------+-------------------------------------
Old description:
> 1. The documentation of `GenericGraph.__eq__` contains "For equality,
> ..., output the same vertex list (in order), ...". However, since
> [http://git.sagemath.org/sage.git/commit/?id=b59285e6f9b91f25ee2c77bfe940a45c51b58f80
> b59285e6] (Trac #14806: Immutable graph backend), the order of the
> vertices is no longer checked. So either that check should be
> reintroduced, or the documentation should be updated.
>
> The list returned by `.vertices()` is sorted, so that checking the
> order only affects the result in rare cases. Hence I think updating the
> documentation should be okay, even while it is technically backwards
> incompatible.
>
> 2. While `GenericGraph.__eq__` no longer takes the ordering of vertices
> into account, `GenericGraph.__hash__` still does. Consequently graphs
> that compare equal can have different hashes.
>
> 3. For graphs that are not weighted, `GenericGraph.__eq__` ignores the
> edge labels, while `GenericGraph.__hash__` does not.
>
> Item 3 is most easily shown:
> {{{
> sage: G1 = Graph({0: {1: 'edge label A'}}, immutable=True)
> sage: G2 = Graph({0: {1: 'edge label B'}}, immutable=True)
> sage: G1 == G2
> True
> sage: G1.__hash__() == G2.__hash__()
> False
> }}}
>
> For items 2 and 3, here is some contrived but deterministic code showing
> that the order is not checked in `.__eq__`, while it is in `.__hash__`:
> {{{
> #!python
> import functools
> @functools.total_ordering
> class C:
> def __init__(self, val, reverse):
> self.val = val
> self.reverse = reverse
> def __repr__(self):
> return 'C(%r, %r)' % (self.val, self.reverse)
> def __hash__(self):
> return hash(self.val)
> def __eq__(self, other):
> return self.val == other.val
> def __lt__(self, other):
> if self.reverse:
> return self.val > other.val
> else:
> return self.val < other.val
> G1 = Graph({C(0, False): [], C(1, False): []}, immutable=True)
> G2 = Graph({C(0, True ): [], C(1, True ): []}, immutable=True)
> }}}
> {{{
> sage: [G1.vertices(), G2.vertices()]
> [[C(0, False), C(1, False)], [C(1, True), C(0, True)]]
> sage: G1 == G2
> True
> sage: hash(G1) == hash(G2)
> False
> }}}
>
> The trick in the code above is to obtain tuples (x1, y1) and (x2, y2)
> that compare equal but sort differently. Above this is achieved
> explicitly using the argument `reverse`. However, this can also happen by
> coincidence, as the code below shows. Here the inequality comparison
> falls back to comparison of ids, so we just generate objects until the
> ids are in an unlucky order.
> {{{
> #!python
> from random import randint
> from sage.graphs.graph import Graph
>
> class Foo:
> def __init__(self, b):
> self.b = b
> def __eq__(self, other):
> return self.b == other.b
> def __hash__(self):
> return hash(self.b)
> def __repr__(self):
> return "foo(%r)" % (self.b,)
>
> def gen(f):
> print "* Trying to obtain res = ((f(0),f(1)), (f(0),f(1))) with\n" \
> " (not res[0][0] < res[0][1]) and (res[1][0] < res[1][1])."
> candidates = [[],[]]
> res = [None, None]
> while res[False] is None or res[True] is None:
> bit = randint(0,1)
> candidates[bit].append( f(bit) )
> for x in candidates[0]:
> for y in candidates[1]:
> res[bool(x < y)] = (x,y)
> print "* Succeeded after %i random picks." \
> % (len(candidates[0]) + len(candidates[1]),)
> print ("* res[0] = %r\n res[1] = %r") % (res[0], res[1])
> return res
>
> def doit(f):
> print
> print "* f = %r" % (f,)
> res = gen(f)
> print "* For i=0 and i=1 creating graph G[i] with vertices " \
> "res[i] and no edges"
> G = [Graph(dict((x, []) for x in res[bit]), immutable = True)
> for bit in range(2)]
> print "* G[0] = %r\n G[1] = %r" % (G[0], G[1])
> print "* G[0].vertices() = %r\n G[1].vertices() = %r" % \
> (G[0].vertices(), G[1].vertices())
> print "* res[0] == res[1] : %r" % \
> (res[0] == res[1],)
> print "* sorted(res[0]) == sorted(res[1]) : %r" % \
> (sorted(res[0]) == sorted(res[1]),)
> print "* G[0] == G[1] : %r" % (G[0] == G[1],)
> print "* hash(G[0]) == hash(G[1]) : %r" % \
> (hash(G[0]) == hash(G[1]),)
> print
> }}}
> {{{
> sage: doit(Foo)
>
> * f = <class __main__.Foo at 0xacb694ac>
> * Trying to obtain res = ((f(0),f(1)), (f(0),f(1))) with
> (not res[0][0] < res[0][1]) and (res[1][0] < res[1][1]).
> * Succeeded after 6 random picks.
> * res[0] = (foo(0), foo(1))
> res[1] = (foo(0), foo(1))
> * For i=0 and i=1 creating graph G[i] with vertices res[i] and no edges
> * G[0] = Graph on 2 vertices
> G[1] = Graph on 2 vertices
> * G[0].vertices() = [foo(1), foo(0)]
> G[1].vertices() = [foo(0), foo(1)]
> * res[0] == res[1] : True
> * sorted(res[0]) == sorted(res[1]) : False
> * G[0] == G[1] : True
> * hash(G[0]) == hash(G[1]) : False
>
> }}}
>
> The custom class `Foo` above is not necessary:
> {{{
> sage: doit(lambda x: sage.graphs.graph.Graph(x, immutable=True))
>
> * f = <function <lambda> at 0xac556b1c>
> * Trying to obtain res = ((f(0),f(1)), (f(0),f(1))) with
> (not res[0][0] < res[0][1]) and (res[1][0] < res[1][1]).
> * Succeeded after 4 random picks.
> * res[0] = (Graph on 0 vertices, Graph on 1 vertex)
> res[1] = (Graph on 0 vertices, Graph on 1 vertex)
> * For i=0 and i=1 creating graph G[i] with vertices res[i] and no edges
> * G[0] = Graph on 2 vertices
> G[1] = Graph on 2 vertices
> * G[0].vertices() = [Graph on 1 vertex, Graph on 0 vertices]
> G[1].vertices() = [Graph on 0 vertices, Graph on 1 vertex]
> * res[0] == res[1] : True
> * sorted(res[0]) == sorted(res[1]) : False
> * G[0] == G[1] : True
> * hash(G[0]) == hash(G[1]) : False
>
> }}}
New description:
1. The documentation of `GenericGraph.__eq__` contains "For equality, ...,
output the same vertex list (in order), ...". However, since
[http://git.sagemath.org/sage.git/commit/?id=b59285e6f9b91f25ee2c77bfe940a45c51b58f80
b59285e6] (Trac #14806: Immutable graph backend), the order of the
vertices is no longer checked. So either that check should be
reintroduced, or the documentation should be updated.
The list returned by `.vertices()` is sorted, so that checking the
order only affects the result in rare cases. Hence I think updating the
documentation should be okay, even while it is technically backwards
incompatible.
2. While `GenericGraph.__eq__` no longer takes the ordering of vertices
into account, `GenericGraph.__hash__` still does. Consequently graphs that
compare equal can have different hashes.
3. For graphs that are not weighted, `GenericGraph.__eq__` ignores the
edge labels, while `GenericGraph.__hash__` does not.
Item 3 is most easily shown:
{{{
sage: G1 = Graph({0: {1: 'edge label A'}}, immutable=True)
sage: G2 = Graph({0: {1: 'edge label B'}}, immutable=True)
sage: G1 == G2
True
sage: G1.__hash__() == G2.__hash__()
False
}}}
For items 2 and 3, here is some contrived but deterministic code showing
that the order is not checked in `.__eq__`, while it is in `.__hash__`:
{{{
#!python
import functools
from sage.graphs.graph import Graph
@functools.total_ordering
class C:
order = ((0,0), (0,1), (1,1), (1,0))
# Hasse diagram:
# 0,0 < 0,1
# ^ ^
# 1,0 > 1,1
def __init__(self, x):
assert x in self.order
self.x = x
def __repr__(self):
return 'C(%r)' % (self.x,)
# ordering depends on full self.x
def __lt__(self, other):
return self.order.index(self.x) < self.order.index(other.x)
# equality depends only on the second coordinate.
def __eq__(self, other):
return self.x[1] == other.x[1]
def __hash__(self):
return hash(self.x[1])
G1 = Graph({C((0, 0)): [], C((0, 1)): []}, immutable=True)
G2 = Graph({C((1, 0)): [], C((1, 1)): []}, immutable=True)
print (G1.vertices(), G2.vertices())
print G1 == G2
print hash(G1) == hash(G2)
}}}
{{{
sage: %run /tmp/evil2.py
([C((0, 0)), C((0, 1))], [C((1, 1)), C((1, 0))])
True
False
}}}
The trick in the code above is to obtain lists [x1, y1] and [x2, y2] that
compare equal but sort differently. In this case we use [C((0,0)),
C((0,1))] and [C((1,0)), C((1,1))].
Such lists also occur naturally, as the code below shows. Here the
inequality comparison falls back to comparison of ids, so we just generate
objects until the ids are in the unlucky order.
{{{
#!python
from random import randint
from sage.graphs.graph import Graph
class Foo:
def __init__(self, b):
self.b = b
def __eq__(self, other):
return self.b == other.b
def __hash__(self):
return hash(self.b)
def __repr__(self):
return "<foo(%r) at %r>" % (self.b, id(self))
def gen(f):
print "* Trying to obtain res = [[f(0),f(1)], [f(0),f(1)]] with\n" \
" (not res[0][0] < res[0][1]) and (res[1][0] < res[1][1])."
candidates = [[],[]]
res = [None, None]
while res[False] is None or res[True] is None:
bit = randint(0,1)
candidates[bit].append( f(bit) )
for x in candidates[0]:
for y in candidates[1]:
res[bool(x < y)] = [x,y]
print "* Succeeded after %i random picks." \
% (len(candidates[0]) + len(candidates[1]),)
print ("* res[0] = %r\n res[1] = %r") % (res[False], res[True])
return res
def doit(f):
print
print "* f = %r" % (f,)
res = gen(f)
print "* For i=0 and i=1 creating graph G[i] with vertices " \
"res[i] and no edges"
G = [Graph(dict((x, []) for x in res[bit]), immutable = True)
for bit in range(2)]
print "* G[0] = %r\n G[1] = %r" % (G[0], G[1])
print "* G[0].vertices() = %r\n G[1].vertices() = %r" % \
(G[0].vertices(), G[1].vertices())
print "* res[0] == res[1] : %r" % \
(res[0] == res[1],)
print "* sorted(res[0]) == sorted(res[1]) : %r" % \
(sorted(res[0]) == sorted(res[1]),)
print "* G[0] == G[1] : %r" % (G[0] == G[1],)
print "* hash(G[0]) == hash(G[1]) : %r" % \
(hash(G[0]) == hash(G[1]),)
print
}}}
{{{
sage: %run /tmp/evil.py
sage: doit(Foo)
* f = <class __main__.Foo at 0xad1caadc>
* Trying to obtain res = [[f(0),f(1)], [f(0),f(1)]] with
(not res[0][0] < res[0][1]) and (res[1][0] < res[1][1]).
* Succeeded after 5 random picks.
* res[0] = [<foo(0) at 2904422892L>, <foo(1) at 2904420844L>]
res[1] = [<foo(0) at 2904422892L>, <foo(1) at 2904424140L>]
* For i=0 and i=1 creating graph G[i] with vertices res[i] and no edges
* G[0] = Graph on 2 vertices
G[1] = Graph on 2 vertices
* G[0].vertices() = [<foo(1) at 2904420844L>, <foo(0) at 2904422892L>]
G[1].vertices() = [<foo(0) at 2904422892L>, <foo(1) at 2904424140L>]
* res[0] == res[1] : True
* sorted(res[0]) == sorted(res[1]) : False
* G[0] == G[1] : True
* hash(G[0]) == hash(G[1]) : False
}}}
The custom class `Foo` above is not necessary:
{{{
sage: doit(lambda x: sage.graphs.graph.Graph(x, immutable=True))
* f = <function <lambda> at 0xad1d9e9c>
* Trying to obtain res = [[f(0),f(1)], [f(0),f(1)]] with
(not res[0][0] < res[0][1]) and (res[1][0] < res[1][1]).
* Succeeded after 3 random picks.
* res[0] = [Graph on 0 vertices, Graph on 1 vertex]
res[1] = [Graph on 0 vertices, Graph on 1 vertex]
* For i=0 and i=1 creating graph G[i] with vertices res[i] and no edges
* G[0] = Graph on 2 vertices
G[1] = Graph on 2 vertices
* G[0].vertices() = [Graph on 1 vertex, Graph on 0 vertices]
G[1].vertices() = [Graph on 0 vertices, Graph on 1 vertex]
* res[0] == res[1] : True
* sorted(res[0]) == sorted(res[1]) : False
* G[0] == G[1] : True
* hash(G[0]) == hash(G[1]) : False
}}}
--
Comment (by emassop):
Replying to [comment:3 ncohen]:
> Thank you for this 'Counter' thing, I did not know that it existed `^^;`
It's shiny new Python 2.7 material :).
[[BR]]
> 1) About `self.edge_iterator(labels = self._weighted)`:
>
> You can have graphs with edge labels for which _weighted is not True
>
> {{{
> sage: Graph([(1,2,"Hey")])._weighted
> False
> sage: Graph([(1,2,"Hey")],immutable=True)._weighted
> False
> }}}
>
> I do not think that this 'weighted' notion makes a lot of sense anyway,
perhaps we should remove it. Either way it is probably best to set
'labels' to `True` anytime.
Indeed you can set edge labels on unweighted graphs, but the documented
behaviour is that they are ignored in `__eq__`: "Note that graphs must be
considered weighted, or Sage will not pay attention to edge label data in
equality testing". Whether edge labels ''should'' be checked or not, is a
different question, that I would like to keep outside the scope of this
ticket. Either way, when `__eq__` is true, the hashes have to be the same.
So if `__eq__` ignores labels, then so should `__hash__`. I really see no
point in ignoring edge labels in `__hash__` when they are not ignored in
`__eq__`. I think that would just makes the hash weaker.
[[BR]]
> 2) Perhaps I do not understand what your tests do, but even when the
graph is immutable the list of vertices returned by `.vertices()` should
be sorted. Do you have a counterexample to that ?
>
> Indeed, we cannot assume that sorting a list produces a unique sorted
list (because the order is not always a total order) but that list,
however, should be sorted.
>
> This, because even when the immutable graph is created, the vertices
are labelled using the order returned by `.vertices()`. So I hope that you
found no bad behaviour there.
>
No, I do not have a counter example. I might look into it, or not. Either
way, I would like to restrict this ticket to the totally ordered case
where the axioms for ordering use {{{is}}} and not {{{==}}}.
When an {{{is}}}-based ordering does not respect {{{==}}}, such as
Python's default ordering using {{{id}}}, you can have {{{a1==a2 and
b1==b2}}} and {{{a1<b1 and a2>b2}}} at the same time. Within a single
graph this cannot happen as there {{{a1==a2}}} implies {{{a1 is a2}}}
(assuming that the given vertex labels are `==`-distinct). However, it
''is'' possible to construct ''a pair'' of graphs using {{{==}}}-equal
vertex lists that nonetheless have distinct {{{.vertices()}}}. This is
what happens above.
In the example using {{{class C}}}, I have explicitly constructed a
comparison that gives such evil results. (Presently I have modified it to
obey the axioms of an {{{is}}}-based ordering.) In the example using
{{{class Foo}}}, I show that the problem also occurs with Python's default
ordering. Since Python's default ordering depends on {{{id}}} over which
you have no control, you need a bit of luck. The example using {{{lambda
x: ...}}}, shows that the behaviour of {{{class Foo}}} occurs in the wild.
> But indeed, wrapping it with a frozenset if the right thing to do.
:)
[[BR]]
> 3) Technically, you do not need to add `._is_weighted` in the hash
function, as graphs with different values for `._is_weighted` are nonequal
anyway. But that does not hurt either.
In {{{hash(..., self._is_weighted, ...)}}} I indeed included it
redundantly.
[[BR]]
> 4) `O_O`
>
> {{{
> sage: G1 = Graph({0: {1: 'edge label A'}})
> sage: G2 = Graph({0: {1: 'edge label B'}})
> sage: G1 == G2
> True
> }}}
>
> This is *REALLY* bad ! Graphs with different labels should be non-equal
!!! `O_O;;;;;`
>
> But it seems that it has been like that forever... Gosh that's bad `O_o`
According to the documentation the labels are ignored. Let's defer what is
right(tm) to a different ticket.
> Hmmmm.. Now I understand why you added this `self._weighted` in the hash
function `:-/`
:)
--
Ticket URL: <http://trac.sagemath.org/ticket/17086#comment:4>
Sage <http://www.sagemath.org>
Sage: Creating a Viable Open Source Alternative to Magma, Maple, Mathematica,
and MATLAB
--
You received this message because you are subscribed to the Google Groups
"sage-trac" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/sage-trac.
For more options, visit https://groups.google.com/d/optout.