[sage-devel] Re: BipartiteGraph status

Ryan Hinton Mon, 01 Mar 2010 14:45:50 -0800

For the impatient, I'll give the summary here at the top.

For what it's worth, my current opinion is that a mixture of options
(B) and (C) are probably best.  Given the backend abstraction,
overriding the core public interface (add/delete_edge, add/
delete_vertex) will result in most methods working correctly or
raising an exception (which are better than silent errors).  There are
a few methods like __copy__ that are much easier to slip in the
BipartiteGraph handling directly.

I still think there is a significant chunk of code necessary fix or
override GenericGraph and Graph methods that will fail for
BipartiteGraph instances.  In particular, any method that tests the
class or creates a new object via a constructor will probably fail
(and some will "fail" silently by returning a Graph when a
BipartiteGraph is expected).  Methods that access the backend also
deserve scrutiny.

I also think a battery of tests should be included in
bipartite_graph.py and/or tests on the individual methods in
generic_graph.py and graph.py.

But these are all opinions and suggestions since I don't have time to
do it at the moment.

On Mar 1, 3:18 pm, Robert Miller <r...@rlmiller.org> wrote:
> A.  Change BipartiteGraph so that it doesn't inherit from Graph.
>
> B.  Make BipartiteGraph handling an integral part of the Graph class...
>
> C. ... BipartiteGraph is a subclass of Graph and overrides methods when 
> necessary.

Good summary.  I'd like to clarify.  Option (A) removes BipartiteGraph
completely from the GenericGraph inheritance tree.  So every Graph and
GenericGraph method would need to be included.  Option (B) still has
BipartiteGraph as a subclass of Graph, but the methods in Graph and
GenericGraph are modified directly to handle the bipartite case when
necessary.  Option (C) doesn't touch the Graph or GenericGraph
classes.  Any necessary changes are handled by overriding methods.

> On Mon, Mar 1, 2010 at 9:55 AM, Ryan Hinton <iob...@email.com> wrote:
> > It looks like you're also supporting option (C) for solving problem
> > (2): very interesting. Since you wrote most of this code, your opinion
> > is the most important to me.  But I personally prefer both other
> > options over (C).  Option (A) puts the maintenance responsibility
> > completely on BipartiteGraph developers (a.k.a. users).  Option (B)
> > puts the maintenance responsibility on Graph/GenericGraph developers.
> > Option (C) just assumes everything works.
>
> Option C does not just assume everything works. For example, if you
> make "side" a required option to add_vertex, and you call a function
> which uses add_vertex without that argument, it will blow up, loudly
> and honestly. If it blows up, that means that nobody has ever tried it
> before and fixed it. It is true (maybe) that some things will appear
> to work when in fact they don't make sense or give the wrong answer.
> However it should be relatively easy to peruse the code now, to see
> which functions are like this, and fix them, so the real issue at hand
> is when someone writes a new function for Graph without being aware of
> BipartiteGraph's.

Option C assumes that any future method written for a Graph will work
on a BipartiteGraph as well.  This is a consequence of the class
inheritance.  Since BG "isa" Graph, any method that works on a Graph
should work on a BG as well.  So the "wishful thinking" I refer to
later is this "everything works" assumption.

Of course, options A and B also make assumptions.  Option A assumes a
non-empty set of BipartiteGraph developer(s)/user(s) willing to copy
everything over and keep the class up to date -- or at least maintain
the minimum functionality they want.  Option B assumes the graph.py
developers will know about and code for the BG case.

> > A Graph method that do something bad for a BG will *only* complain
> > when someone writes a doctest *in graph.py* for this method *using a
> > BipartiteGraph*.  Or when someone writes a doctest in
> > bipartite_graph.py for a method that only appears in graph.py.  Or,
> > most likely, when someone starts using a BG and finds that some
> > methods just blow up.
>
> I don't think that this is necessarily the picture. For one, your
> recent work has made many more people who would be reviewing new graph
> functions aware that BG's are around, and should be checked. Also,
> what is the real harm of having a function on Graph's that doesn't
> work for BG's but nobody ever tries to use for BG's?

I'm glad to be helpful. :-)

No harm if it's never used.  Annoyance or possible invalid results if
someone does use it.  If it's there, someone will eventually use it.
Tab completion and docstrings are my primary means for learning Sage
-- and they're inherited, so any Graph methods are assumed to work as
described for BG as well.

> > I like option (A) because it acknowledges that BipartiteGraph and
> > Graph have important differences.  The functionality of BG will lag
> > that of Graph, but explicitly so -- if a method exists for a BG, it
> > should work with high confidence.  I like option (B) because it is the
> > similar to the solution for DiGraph: it allows for the important
> > differences to be handled where necessary, and anyone working in
> > graph.py will quickly become aware of the cases that need to be
> > handled.  (Notice the surprise of several on this list that
> > BipartiteGraph even existed!)
>
> Let's clarify a bit: DiGraph and Graph both inherit from GenericGraph.
> This is already a mixture of what you are calling option A and B, for
> DiGraphs. Perhaps this approach makes sense for BipartiteGraphs too,
> but the fact is that a BipartiteGraph *is* a Graph, and that
> relationship should be preserved by class hierarchy. Also, the
> problems won't be solved by making BipartiteGraph a GenericGraph,
> since that will just push some of the functions out of the way, but
> not others.

Good point on the inheritance.  Also, the bulk of the methods are
implemented in GenericGraph and inherited in Graph and DiGraph.
What's the difference in adding code to handle the "grandchild"
BipartiteGraph (i.e. option B)?

Here's a little more detail on options.

(A)  My concept is that BipartiteGraph would have a Graph member that
it would delegate most calls.  Breaking the class hierarchy is a
significant disadvantage that I forgot to mention earlier.

(B)  My concept is that BipartiteGraph would add assign the _bipartite
attribute to True.  Then the GenericGraph code can use it like it uses
the _directed attribute now.

I think option (C) is well understood judging from your comments.

> > It seems like we should either separate the classes as in (A) so that
> > someone working in graph.py doesn't have to know or worry about
> > bipartite_graph.py, or integrate them as in (B) so Graph and
> > GenericGraph developers are expected to account for (and more likely
> > be aware of) the BipartiteGraph case.  Now that I've presented it as a
> > viable option, the wishful thinking of (C) appears unsustainable.
>
> I don't understand what is wishful about option C. If a BG has a
> function inherited from Graph which is buggy on BG input, then that
> function has a bug and needs to be fixed. Maybe it is wishful to
> believe that people will fix bugs as they find them, but I prefer this
> to major class obfuscation and additional complications in the code
> which will most likely cause more bugs.

Again, the wishful thinking is the assumptions.  Here are a few
examples.

* The copy() method calls the DiGraph and Graph constructors
directly.  To get all the same functionality (e.g. copying the _assoc
and _embedding attributes) in a BipartiteGraph override (option C)
would require duplicating the code.  Or is there a better way?

* union(), disjoint_union(), cartesian_product(), tensor_product(),
etc. also create Graph() and DiGraph() objects directly -- similar to
copy().  So they would need to be overridden in BG -- possibly with
the same code/algorithm -- to produce outputs of the correct class.

They also use tests like "self._directed and not other._directed" to
make judgements about the *class* of the objects.  Shouldn't the
__class__ be tested directly?

* Algorithms like the one for is_circular_planar() add/remove edges
and vertices from the graph will fail (loudly) once these are
overridden for BipartiteGraph.

Maybe these are the only cases to worry about: creating objects and
testing the class.  Since the implementation (e.g. NetworkX vs.
c_graph) is usually abstracted, most of the algorithms use the
"public" add_edge-style interface.  Some do access the _backend member
directly, so this is another red flag to watch for.  But the methods I
saw that did this *were* the core methods like add_edge().

> > I think option (A) gives the fewest bugs, and option (C) the most.  I
> > think option (A) implies slightly more work now simply because a large
> > number of one-line methods (with documentation and doctests copied
> > from graph.py) would need to be written.  I would put options (B) and
> > (C) as roughly equal work now.
>
> Here's my breakdown of your proposed options:
>
> Work now:
> Option A & B: Both of these are a lot of work now. And there is nobody
> who is obviously ready to do the work, all of which should be done at
> around the same time, with deprecations, etc.
> Option C: This just requires finishing #1941 and checking that the
> rest of the functions make sense. No major overhauling, just fixing
> some bugs.

Option (A) is a lot of work now.  Option (B) shouldn't be any more
work now than option (C).  It can be implemented gradually as
necessary.

> Bugs:

I neglected to mention my assumption that the code added was free of
bugs for its intended target, i.e. a method in Graph is correct for
Graph instances (though not necessarily BipartiteGraph instances).

> Option A: This will introduce more bugs, guaranteed. Any time you
> rearrange class structure, in my experience, it takes a long time to
> track down all the bugs, some of which may be quite subtle. However,
> adding new functions to Graph will not introduce bugs to
> BipartiteGraph (although it may to Graph).

True, it might take a long time for the BipartiteGraph class to
stabilize.  But apparently nobody is using it now since it's not
complete.  In other words, it hasn't stabilized yet using the current
approach.  Still, I think this is simply due to lack of time,
interest, and people knowing it existed.  (Hopefully the latter is
fixed now. :-)

> Option B: Basically the same a A. New bugs guaranteed, subtle and a
> long time to fix them all. I think this option is even more prone to
> subtle bugs and endless work than A. Adding new functions to Graph or
> BipartiteGraph will likely introduce bugs, since author and reviewer
> are likely to forget checks all over the place.

Maybe we're thinking of different things.  I don't see that it will
take any more work to add some case checking in the GenericGraph
methods that option (C) would have to override.

> Option C: This approach only fixes bugs, and will not introduce new
> ones. At worst, some functions on BG's which used to give wrong
> answers will complain loudly instead. Adding new functions to Graph
> may introduce bugs in BG's (or in Graphs).

My point is that code written *correctly* for a Graph can fail for a
BipartiteGraph, see the examples above.

> > Work later comes from changes or additions to graph.py.
>
> > i.  A new method is added.
> >  A.  A BipartiteGraph user who discovers the new method and wants to
> > use it needs to copy the documentation, make a few small changes to
> > the doctest, and handle the call appropriately (likely 1-line
> > delegation).
>
> This means massive amounts of code duplication. Any functions on BG's
> (nearly all the graph functions, potentially) will be duplicated.

Yes and no.  Every method signature and documentation would be
duplicated -- but most calls would be delegated directly to a member
object of class Graph.  Maintaining the docstrings would be the
biggest headache.

> >  B.  The graph.py developer is expected to handle the bipartite case.
>
> This may introduce a lot of resistance to people who want to develop
> for graphs in Sage. "What is all this complicated infrastructure? I
> don't care about BipartiteGraphs, and I don't have time, so I'll just
> use my patch myself and not submit it."

Possible.  I'm wondering if I have time to finish BipartiteGraph
enough to use it myself. :-)

> >  C.  A BipartiteGraph user discovers the new method and tries to use
> > it.  It might work, it might raise an exception, or it might give a
> > wrong answer.
>
> I believe that option C can be implemented so that new methods rarely
> give wrong answers. By making all the basic functions (i.e.
> add_vertex) fail loudly if they are not being used "Bipartite"-ly, we
> can make it very likely that such a new function will also blow up,
> which is the optimal solution for me.

I don't know that blowing up is optimal, but it's certainly better
than silent failure.  :-)  And I think it's good enough -- perhaps as
good as we can expect.

> > ii.  The algorithm for an existing method is changed.
> >  A.  There will be a BipartiteGraph doctest for this same method, so
> > an exception or wrong result will usually be discovered before
> > release.
> >  B.  Again, the graph.py developer should have included a
> > BipartiteGraph doctest and handled any necessary special cases.
> >  C.  Existing, correct code using BipartiteGraphs may begin to raise
> > exceptions or silently give wrong results at the next release.
>
> This disadvantage to option C can be overcome by including doctests
> for the functions which work for BG's which actually test the function
> for BG's, in the docstring in the graph code itself. Any function
> which is *expected* to work for BG's should be tested for BG's, and
> there are plenty of places in the documentation these tests could be
> put.

Are you suggesting the doctest for each method in generic_graph.py
should also test a BipartiteGraph instance (at least for those that
work without modification)?  That starts to sound alot like my
"mixing" option (B), because then the responsibility falls on the
Graph developer.  I would love to see more BG's tested directly in
Graph and GenericGraph doctests -- but expecting/requiring the
doctests should produce resistance roughly equal to the resistance you
predicted for option (B).

One possibility is to include doctests in the initial comments for
bipartite_graph.py for all the Graph and GenericGraph methods that are
expected to work.  This is a little bit like option (A)  That way it
is explicitly recorded *somewhere* (though not in the "method?"
documentation) which methods are known to work.  I think this is a
great idea.  In fact, I think including BG's in the generic_graph.py
doctests is a great idea, too.

> >> Back when #1941 was created, this thorough review was done. It may
> >> need to be updated.
>
> > I think it does.  There are lots of methods in graph.py; at least a
> > few need updating that are absent from the current list.
> > I'm not sure what class-handling capabilities you are referring to,
> > but I'm not a Python expert.  I don't see this as a language issue.
> > Python doesn't get in the way, but it certainly doesn't solve the
> > problem.  Well, perhaps we could use some of the introspective
> > features of Python in option (A) to indicate when Graph and
> > BipartiteGraph don't have the same set of methods.
>
> To me, having {{{isinstance(BipartiteGraph(), Graph) is True}}} is the
> only right way to go. All over Sage, the "is a" relationship is very
> carefully preserved, and makes sense mathematically and in terms of
> code. I think the main thing you are worried about in option C is
> people writing new code for Graphs and inadvertently introducing bugs
> in BipartiteGraphs. My main answer is documentation: if it is expected
> to work, it is tested, and those tests protect its proper functioning.
> If there is no test, then there is no reason to expect it to work,
> before or after the patch. And also, new code always introduces bugs,
> you just have to pick your poison and decide how much work you want to
> do.

OK, I'm willing to accept BG as inheriting from Graph.  It's unlikely
anyone will write or maintain option (A).

As for your "documentation" solution, I think this is where we
disagree.  If I use tab completion and find a method hanging on a
BipartiteGraph instance, I expect it to work.  In other words, if it
is written, documented, and tested in generic_graph.py *and inherited*
in BG, I expect it to work.  Having taken part in this discussion, I
know to look carefully at where the method is defined and whether it
includes any explicit tests of BG instances.  But in general, I expect
the methods defined on an object to work.  (Maybe this is a bad
assumption since it has failed several times for me with sage!)

You can read the summary again if you like -- this is where I
originally wrote it.  :-)

Thanks!

- Ryan

-- 
To post to this group, send an email to sage-devel@googlegroups.com
To unsubscribe from this group, send an email to 
sage-devel+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/sage-devel
URL: http://www.sagemath.org

[sage-devel] Re: BipartiteGraph status

Reply via email to