Re: [sage-trac] #17086: GenericGraph's documentation, eq, and hash out of sync

sage-trac Thu, 02 Oct 2014 04:57:13 -0700

#17086: GenericGraph's documentation, __eq__, and __hash__ out of sync
-------------------------------------+-------------------------------------
       Reporter:  emassop            |        Owner:
           Type:  defect             |       Status:  new
       Priority:  major              |    Milestone:  sage-6.4
      Component:  graph theory       |   Resolution:
       Keywords:                     |    Merged in:
        Authors:                     |    Reviewers:
Report Upstream:  N/A                |  Work issues:
         Branch:                     |       Commit:
  u/emassop/graph_hash               |  82b245ee0b092a7a994074e27cc47686d43551c9
   Dependencies:                     |     Stopgaps:
-------------------------------------+-------------------------------------


Old description:

> 1. The documentation of `GenericGraph.__eq__` contains "For equality,
> ..., output the same vertex list (in order), ...". However, since
> [http://git.sagemath.org/sage.git/commit/?id=b59285e6f9b91f25ee2c77bfe940a45c51b58f80
> b59285e6] (Trac #14806: Immutable graph backend), the order of the
> vertices is no longer checked. So either that check should be
> reintroduced, or the documentation should be updated.
>
>    The list returned by `.vertices()` is sorted, so that checking the
> order only affects the result in rare cases. Hence I think updating the
> documentation should be okay, even while it is technically backwards
> incompatible.
>
> 2. While `GenericGraph.__eq__` no longer takes the ordering of vertices
> into account, `GenericGraph.__hash__` still does. Consequently graphs
> that compare equal can have different hashes.
>
> 3. For graphs that are not weighted, `GenericGraph.__eq__` ignores the
> edge labels, while `GenericGraph.__hash__` does not.
>
> Item 3 is most easily shown:
> {{{
> sage: G1 = Graph({0: {1: 'edge label A'}}, immutable=True)
> sage: G2 = Graph({0: {1: 'edge label B'}}, immutable=True)
> sage: G1 == G2
> True
> sage: G1.__hash__() == G2.__hash__()
> False
> }}}
>
> For items 2 and 3, here is some contrived but deterministic code showing
> that the order is not checked in `.__eq__`, while it is in `.__hash__`:
> {{{
> #!python
> import functools
> @functools.total_ordering
> class C:
>     def __init__(self, val, reverse):
>         self.val = val
>         self.reverse = reverse
>     def __repr__(self):
>         return 'C(%r, %r)' % (self.val, self.reverse)
>     def __hash__(self):
>         return hash(self.val)
>     def __eq__(self, other):
>         return self.val == other.val
>     def __lt__(self, other):
>         if self.reverse:
>             return self.val > other.val
>         else:
>             return self.val < other.val
> G1 = Graph({C(0, False): [], C(1, False): []}, immutable=True)
> G2 = Graph({C(0, True ): [], C(1, True ): []}, immutable=True)
> }}}
> {{{
> sage: [G1.vertices(), G2.vertices()]
> [[C(0, False), C(1, False)], [C(1, True), C(0, True)]]
> sage: G1 == G2
> True
> sage: hash(G1) == hash(G2)
> False
> }}}
>
> The trick in the code above is to obtain tuples (x1, y1) and (x2, y2)
> that compare equal but sort differently. Above this is achieved
> explicitly using the argument `reverse`. However, this can also happen by
> coincidence, as the code below shows. Here the inequality comparison
> falls back to comparison of ids, so we just generate objects until the
> ids are in an unlucky order.
> {{{
> #!python
> from random import randint
> from sage.graphs.graph import Graph
>
> class Foo:
>     def __init__(self, b):
>         self.b = b
>     def __eq__(self, other):
>         return self.b == other.b
>     def __hash__(self):
>         return hash(self.b)
>     def __repr__(self):
>         return "foo(%r)" % (self.b,)
>
> def gen(f):
>     print "* Trying to obtain res = ((f(0),f(1)), (f(0),f(1))) with\n" \
>           "  (not res[0][0] < res[0][1]) and (res[1][0] < res[1][1])."
>     candidates = [[],[]]
>     res = [None, None]
>     while res[False] is None or res[True] is None:
>         bit = randint(0,1)
>         candidates[bit].append( f(bit) )
>         for x in candidates[0]:
>             for y in candidates[1]:
>                 res[bool(x < y)] = (x,y)
>     print "* Succeeded after %i random picks." \
>            % (len(candidates[0]) + len(candidates[1]),)
>     print ("* res[0] = %r\n  res[1] = %r") % (res[0], res[1])
>     return res
>
> def doit(f):
>     print
>     print "* f = %r" % (f,)
>     res = gen(f)
>     print "* For i=0 and i=1 creating graph G[i] with vertices " \
>           "res[i] and no edges"
>     G = [Graph(dict((x, []) for x in res[bit]), immutable = True)
>          for bit in range(2)]
>     print "* G[0] = %r\n  G[1] = %r" % (G[0], G[1])
>     print "* G[0].vertices() = %r\n  G[1].vertices() = %r" % \
>             (G[0].vertices(), G[1].vertices())
>     print "*         res[0] == res[1]         : %r" % \
>             (res[0] == res[1],)
>     print "* sorted(res[0]) == sorted(res[1]) : %r" % \
>             (sorted(res[0]) == sorted(res[1]),)
>     print "*           G[0] == G[1]           : %r" % (G[0] == G[1],)
>     print "*     hash(G[0]) == hash(G[1])     : %r" % \
>             (hash(G[0]) == hash(G[1]),)
>     print
> }}}
> {{{
> sage: doit(Foo)
>
> * f = <class __main__.Foo at 0xacb694ac>
> * Trying to obtain res = ((f(0),f(1)), (f(0),f(1))) with
>   (not res[0][0] < res[0][1]) and (res[1][0] < res[1][1]).
> * Succeeded after 6 random picks.
> * res[0] = (foo(0), foo(1))
>   res[1] = (foo(0), foo(1))
> * For i=0 and i=1 creating graph G[i] with vertices res[i] and no edges
> * G[0] = Graph on 2 vertices
>   G[1] = Graph on 2 vertices
> * G[0].vertices() = [foo(1), foo(0)]
>   G[1].vertices() = [foo(0), foo(1)]
> *         res[0] == res[1]         : True
> * sorted(res[0]) == sorted(res[1]) : False
> *           G[0] == G[1]           : True
> *     hash(G[0]) == hash(G[1])     : False
>
> }}}
>
> The custom class `Foo` above is not necessary:
> {{{
> sage: doit(lambda x: sage.graphs.graph.Graph(x, immutable=True))
>
> * f = <function <lambda> at 0xac556b1c>
> * Trying to obtain res = ((f(0),f(1)), (f(0),f(1))) with
>   (not res[0][0] < res[0][1]) and (res[1][0] < res[1][1]).
> * Succeeded after 4 random picks.
> * res[0] = (Graph on 0 vertices, Graph on 1 vertex)
>   res[1] = (Graph on 0 vertices, Graph on 1 vertex)
> * For i=0 and i=1 creating graph G[i] with vertices res[i] and no edges
> * G[0] = Graph on 2 vertices
>   G[1] = Graph on 2 vertices
> * G[0].vertices() = [Graph on 1 vertex, Graph on 0 vertices]
>   G[1].vertices() = [Graph on 0 vertices, Graph on 1 vertex]
> *         res[0] == res[1]         : True
> * sorted(res[0]) == sorted(res[1]) : False
> *           G[0] == G[1]           : True
> *     hash(G[0]) == hash(G[1])     : False
>
> }}}

New description:

 1. The documentation of `GenericGraph.__eq__` contains "For equality, ...,
 output the same vertex list (in order), ...". However, since
 
[http://git.sagemath.org/sage.git/commit/?id=b59285e6f9b91f25ee2c77bfe940a45c51b58f80
 b59285e6] (Trac #14806: Immutable graph backend), the order of the
 vertices is no longer checked. So either that check should be
 reintroduced, or the documentation should be updated.

    The list returned by `.vertices()` is sorted, so that checking the
 order only affects the result in rare cases. Hence I think updating the
 documentation should be okay, even while it is technically backwards
 incompatible.

 2. While `GenericGraph.__eq__` no longer takes the ordering of vertices
 into account, `GenericGraph.__hash__` still does. Consequently graphs that
 compare equal can have different hashes.

 3. For graphs that are not weighted, `GenericGraph.__eq__` ignores the
 edge labels, while `GenericGraph.__hash__` does not.

 Item 3 is most easily shown:
 {{{
 sage: G1 = Graph({0: {1: 'edge label A'}}, immutable=True)
 sage: G2 = Graph({0: {1: 'edge label B'}}, immutable=True)
 sage: G1 == G2
 True
 sage: G1.__hash__() == G2.__hash__()
 False
 }}}

 For items 2 and 3, here is some contrived but deterministic code showing
 that the order is not checked in `.__eq__`, while it is in `.__hash__`:
 {{{
 #!python
 import functools
 from sage.graphs.graph import Graph

 @functools.total_ordering
 class C:
     order = ((0,0), (0,1), (1,1), (1,0))
     # Hasse diagram:
     #   0,0 < 0,1
     #    ^     ^
     #   1,0 > 1,1

     def __init__(self, x):
         assert x in self.order
         self.x = x
     def __repr__(self):
         return 'C(%r)' % (self.x,)

     # ordering depends on full self.x
     def __lt__(self, other):
         return self.order.index(self.x) < self.order.index(other.x)

     # equality depends only on the second coordinate.
     def __eq__(self, other):
         return self.x[1] == other.x[1]
     def __hash__(self):
         return hash(self.x[1])


 G1 = Graph({C((0, 0)): [], C((0, 1)): []}, immutable=True)
 G2 = Graph({C((1, 0)): [], C((1, 1)): []}, immutable=True)
 print (G1.vertices(), G2.vertices())
 print G1 == G2
 print hash(G1) == hash(G2)
 }}}
 {{{
 sage: %run /tmp/evil2.py
 ([C((0, 0)), C((0, 1))], [C((1, 1)), C((1, 0))])
 True
 False
 }}}

 The trick in the code above is to obtain lists [x1, y1] and [x2, y2] that
 compare equal but sort differently. In this case we use [C((0,0)),
 C((0,1))] and [C((1,0)), C((1,1))].

 Such lists also occur naturally, as the code below shows. Here the
 inequality comparison falls back to comparison of ids, so we just generate
 objects until the ids are in the unlucky order.
 {{{
 #!python
 from random import randint
 from sage.graphs.graph import Graph

 class Foo:
     def __init__(self, b):
         self.b = b
     def __eq__(self, other):
         return self.b == other.b
     def __hash__(self):
         return hash(self.b)
     def __repr__(self):
         return "<foo(%r) at %r>" % (self.b, id(self))

 def gen(f):
     print "* Trying to obtain res = [[f(0),f(1)], [f(0),f(1)]] with\n" \
           "  (not res[0][0] < res[0][1]) and (res[1][0] < res[1][1])."
     candidates = [[],[]]
     res = [None, None]
     while res[False] is None or res[True] is None:
         bit = randint(0,1)
         candidates[bit].append( f(bit) )
         for x in candidates[0]:
             for y in candidates[1]:
                 res[bool(x < y)] = [x,y]
     print "* Succeeded after %i random picks." \
            % (len(candidates[0]) + len(candidates[1]),)
     print ("* res[0] = %r\n  res[1] = %r") % (res[False], res[True])
     return res

 def doit(f):
     print
     print "* f = %r" % (f,)
     res = gen(f)
     print "* For i=0 and i=1 creating graph G[i] with vertices " \
           "res[i] and no edges"
     G = [Graph(dict((x, []) for x in res[bit]), immutable = True)
          for bit in range(2)]
     print "* G[0] = %r\n  G[1] = %r" % (G[0], G[1])
     print "* G[0].vertices() = %r\n  G[1].vertices() = %r" % \
             (G[0].vertices(), G[1].vertices())
     print "*         res[0] == res[1]         : %r" % \
             (res[0] == res[1],)
     print "* sorted(res[0]) == sorted(res[1]) : %r" % \
             (sorted(res[0]) == sorted(res[1]),)
     print "*           G[0] == G[1]           : %r" % (G[0] == G[1],)
     print "*     hash(G[0]) == hash(G[1])     : %r" % \
             (hash(G[0]) == hash(G[1]),)
     print
 }}}
 {{{
 sage: %run /tmp/evil.py
 sage: doit(Foo)

 * f = <class __main__.Foo at 0xad1caadc>
 * Trying to obtain res = [[f(0),f(1)], [f(0),f(1)]] with
   (not res[0][0] < res[0][1]) and (res[1][0] < res[1][1]).
 * Succeeded after 5 random picks.
 * res[0] = [<foo(0) at 2904422892L>, <foo(1) at 2904420844L>]
   res[1] = [<foo(0) at 2904422892L>, <foo(1) at 2904424140L>]
 * For i=0 and i=1 creating graph G[i] with vertices res[i] and no edges
 * G[0] = Graph on 2 vertices
   G[1] = Graph on 2 vertices
 * G[0].vertices() = [<foo(1) at 2904420844L>, <foo(0) at 2904422892L>]
   G[1].vertices() = [<foo(0) at 2904422892L>, <foo(1) at 2904424140L>]
 *         res[0] == res[1]         : True
 * sorted(res[0]) == sorted(res[1]) : False
 *           G[0] == G[1]           : True
 *     hash(G[0]) == hash(G[1])     : False

 }}}

 The custom class `Foo` above is not necessary:
 {{{
 sage: doit(lambda x: sage.graphs.graph.Graph(x, immutable=True))

 * f = <function <lambda> at 0xad1d9e9c>
 * Trying to obtain res = [[f(0),f(1)], [f(0),f(1)]] with
   (not res[0][0] < res[0][1]) and (res[1][0] < res[1][1]).
 * Succeeded after 3 random picks.
 * res[0] = [Graph on 0 vertices, Graph on 1 vertex]
   res[1] = [Graph on 0 vertices, Graph on 1 vertex]
 * For i=0 and i=1 creating graph G[i] with vertices res[i] and no edges
 * G[0] = Graph on 2 vertices
   G[1] = Graph on 2 vertices
 * G[0].vertices() = [Graph on 1 vertex, Graph on 0 vertices]
   G[1].vertices() = [Graph on 0 vertices, Graph on 1 vertex]
 *         res[0] == res[1]         : True
 * sorted(res[0]) == sorted(res[1]) : False
 *           G[0] == G[1]           : True
 *     hash(G[0]) == hash(G[1])     : False

 }}}

--

Comment (by emassop):

 Replying to [comment:3 ncohen]:

 > Thank you for this 'Counter' thing, I did not know that it existed `^^;`

 It's shiny new Python 2.7 material :).

 [[BR]]
 > 1) About `self.edge_iterator(labels = self._weighted)`:
 >
 > You can have graphs with edge labels for which _weighted is not True
 >
 > {{{
 > sage: Graph([(1,2,"Hey")])._weighted
 > False
 > sage: Graph([(1,2,"Hey")],immutable=True)._weighted
 > False
 > }}}
 >
 > I do not think that this 'weighted' notion makes a lot of sense anyway,
 perhaps we should remove it. Either way it is probably best to set
 'labels' to `True` anytime.

 Indeed you can set edge labels on unweighted graphs, but the documented
 behaviour is that they are ignored in `__eq__`: "Note that graphs must be
 considered weighted, or Sage will not pay attention to edge label data in
 equality testing". Whether edge labels ''should'' be checked or not, is a
 different question, that I would like to keep outside the scope of this
 ticket. Either way, when `__eq__` is true, the hashes have to be the same.
 So if `__eq__` ignores labels, then so should `__hash__`. I really see no
 point in ignoring edge labels in `__hash__` when they are not ignored in
 `__eq__`. I think that would just makes the hash weaker.

 [[BR]]
 > 2) Perhaps I do not understand what your tests do, but even when the
 graph is immutable the list of vertices returned by `.vertices()` should
 be sorted. Do you have a counterexample to that ?
 >
 >   Indeed, we cannot assume that sorting a list produces a unique sorted
 list (because the order is not always a total order) but that list,
 however, should be sorted.
 >
 >   This, because even when the immutable graph is created, the vertices
 are labelled using the order returned by `.vertices()`. So I hope that you
 found no bad behaviour there.
 >

 No, I do not have a counter example. I might look into it, or not. Either
 way, I would like to restrict this ticket to the totally ordered case
 where the axioms for ordering use {{{is}}} and not {{{==}}}.

 When an {{{is}}}-based ordering does not respect {{{==}}}, such as
 Python's default ordering using {{{id}}}, you can have {{{a1==a2 and
 b1==b2}}} and {{{a1<b1 and a2>b2}}} at the same time. Within a single
 graph this cannot happen as there {{{a1==a2}}} implies {{{a1 is a2}}}
 (assuming that the given vertex labels are `==`-distinct). However, it
 ''is'' possible to construct ''a pair'' of graphs using {{{==}}}-equal
 vertex lists that nonetheless have distinct {{{.vertices()}}}. This is
 what happens above.

 In the example using {{{class C}}}, I have explicitly constructed a
 comparison that gives such evil results. (Presently I have modified it to
 obey the axioms of an {{{is}}}-based ordering.) In the example using
 {{{class Foo}}}, I show that the problem also occurs with Python's default
 ordering. Since Python's default ordering depends on {{{id}}} over which
 you have no control, you need a bit of luck. The example using {{{lambda
 x: ...}}}, shows that the behaviour of {{{class Foo}}} occurs in the wild.

 >   But indeed, wrapping it with a frozenset if the right thing to do.

 :)

 [[BR]]
 > 3) Technically, you do not need to add `._is_weighted` in the hash
 function, as graphs with different values for `._is_weighted` are nonequal
 anyway. But that does not hurt either.

 In {{{hash(..., self._is_weighted, ...)}}} I indeed included it
 redundantly.

 [[BR]]
 > 4) `O_O`
 >
 > {{{
 > sage: G1 = Graph({0: {1: 'edge label A'}})
 > sage: G2 = Graph({0: {1: 'edge label B'}})
 > sage: G1 == G2
 > True
 > }}}
 >
 > This is *REALLY* bad ! Graphs with different labels should be non-equal
 !!! `O_O;;;;;`
 >
 > But it seems that it has been like that forever... Gosh that's bad `O_o`

 According to the documentation the labels are ignored. Let's defer what is
 right(tm) to a different ticket.

 > Hmmmm.. Now I understand why you added this `self._weighted` in the hash
 function `:-/`

 :)

--
Ticket URL: <http://trac.sagemath.org/ticket/17086#comment:4>
Sage <http://www.sagemath.org>
Sage: Creating a Viable Open Source Alternative to Magma, Maple, Mathematica, 
and MATLAB

-- 
You received this message because you are subscribed to the Google Groups 
"sage-trac" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/sage-trac.
For more options, visit https://groups.google.com/d/optout.

Re: [sage-trac] #17086: GenericGraph's documentation, __eq__, and __hash__ out of sync

Reply via email to

Re: [sage-trac] #17086: GenericGraph's documentation, eq, and hash out of sync