Re: [sage-devel] Re: Code duplication and aliases in methods

Nathann Cohen Tue, 19 Jun 2012 14:06:17 -0700

Hellooooooo everybody !!

As in the best times, I tried 5 times to answer all who answered this
thread, got angry, erased the message, and began again. But Robert Miller
has a nice saying when we get to such situations : "shut the fuck up and
show me the code". So that's what I will try to do. Give examples instead
of wondering aloud whether the world has gone mad.


Be warned, for it is a long mail (I --love-- long emails).  It also gives
some idea of what is happening in Sage's graph library these days, if
anybody (but me) cares :-)

1) Some history : the BipartiteGraph class.
-----------------------------------------------------------
The purpose of this class was to *at long last* have Graph classes that
represent graph properties. There are two possibilities when writing such a
thing :
a) make it inherit all the Graph functions
b) make it a new class
The problem when picking b) is that this class gets totally useless as it
has none of the interesting methods of Graph. So a) was picked, and of
course most if not all methods of BipartiteGraph break when called on a
BipartiteGraph object.
The reason why is easy : some of the most fundamental graph operations are
invalid. Like add_edge.
I assure you that not being able to add an edge in a graph is *very*
problematic.
The code ?

Try the following :
sage: BipartiteGraph(graphs.CompleteBipartiteGraph(3,3)).complement()

Of course, the complement of a bipartite graph needs not be bipartite,
right ? Yeah, but the graph methods do not know that, nor should they mind
most of the time. Try that :

sage: BipartiteGraph(graphs.CompleteBipartiteGraph(3,3)).independent_set()

Basically all graphs functions would need to be rewritten to work with
BipartiteGraph. The only sane design choice would be for BipartiteGraph
*not* to inherit Graph methods, but of course it would be pretty empty and
all methods would need to be copied anyway.

There is another problem with Bipartite Graph. The add_edge function needs
to be wrapped so that no edge is added to the graph that creates an odd
cycle. This is a costly operation. *VERY* costly operation (compared to
add_edge, of course). And so it was not done this way, for efficiency.
Hence instead of checking whether the graph "plus the new edge" stays
bipartite, the vertices are bicolored from the start, and an exception is
raised when add_edge tries to add an edge that does not fit *the former
bicoloring*.
That's mostly incorrect, does not correspond to the user's intent, but it
is....faster >_<

sage: g = BipartiteGraph(5)
sage: g.add_edge(0,1)

This could could be fixed by checking that the graph remains bipartite each
time an edge is added, but of course that comes with a cost, and a cost
that it is not healthy to pay when you *just* want to add an edge. The most
natural explanation behind is that you usually know what you are doing when
you add an edge to a graph, and that you should know for yourself whether
you graph remains bipartite or not. If not, please check it with
is_bipartite when you need it, but *do not make everybody's code 10000
times slower out of sheer laziness*.

At some point somebody proposed to override the complement() function so
that the complement of a graph stays bipartite : this can be achieved by
switching edges and nonedges *between the two sets* and nowhere else.
Though this operation makes perfect sense in graph theory, it would of
course be very dangereous to replace the .complement() function by doing
that, for all algorithms that take a BipartiteGraph as an instance and
compute graph complements would then return wrong result *and not know it*.

Sooooooo, these are the problems created by a property we want to keep on a
graph. It is not *yet* about immutable graphs, because NONE of these
methods would be available at all. This would also prevent you from running
any method that think they can use the other ones, even when they do not
actually change the graph itself (like independent_set()).

I said it several times, but for me this class should be removed. It is not
just bad from the computational point of view, but it also creates problems
that new users have no idea how to cope with, because what the methods do
is not what they have in mind.

2) Immutable graphs : storing the graph's properties
-----------------------------------------------------------------------

Recently in Sage was merged a patch which is an interface with the graph
class database ISGCI [1,2]. There are many "graph classes" (let us
understand this as 'graph property' for the length of this discussion), and
this database can tell us to some extend which properties are implied by
others. When I first wrote this patch, what I had in mind is precisely what
Dima mentionned : some algorithms work for general graphs and have a bad
running time, some other algorithms have a better running time because they
are only meant to be run on graphs with a specific property [3].
This is tempting, because the graph theory litterature is full of this kind
of algorithms "which work only if my graph has this and this and that".
Hence the goal was to run on each graph the implementation most adapted to
it.

~~~~~~
By the way Dima, there are "very sane ways" to write perfect-graph-specific
algorithms in Graph.py, hence that can be accessed for all graphs. What
would you think of :
Graph.independent_set(algorithm = "perfect_graph") ?
It would not be the default algorithm, it would not be run by mistake, and
because perfect independent set are not only interesting for perfect
graphs, this is actually what would make the most sense. No new class, no
overhead over the most important graph functions...
~~~~~~

Of course, we cannot attempt, each time independent_set is run, to check
all graphs properties for which we have a specific algorithm. Unless the
current implementation (very bad, but -correct-) is changed, it is very
costly to check whether a graph is perfect. Much more than to compute an
independent set, actually.

Hence it would be nice to have a class of immutable graphs, to which could
be associated (cached) some properties that would not have to be computed
over and over afterwards. The most natural question, now : what about the
a) and b) options from the BipartiteGraph class ? Should they inherit all
methods, or none ?
My personal advice is clear --> absolutely none. Unless somebody can think
of a way to avoid all the problems we had with BipartiteGraph. Hence
immutable graphs would only be used for "storage purposes", and the
combinat guys already (repeatedly) asked for such a thing, as they
sometimes need their graphs to be hashable (and to store them in sets, or
as dictionary keys). It also happened to me recently, and I had to store
spar6_string instead. Not so bad, but not a good answer either.
I also wrote some kind of static graph structure myself -- low level ones
-- because you can perform some optimizations when you know from the start
that the graph will not be modified. Having a static graph structure thus
is a way to have a faster data structure, which we use for instance to
compute distances between all pairs of vertices [4].
In particular, the purpose of these speedups is to save the time it takes
to call has_edge, or to iterate over the neighbors of a graph.
In particular, it is totally unthinkable that graphs you actually want to
compute things on could be immutable. We already lose time compared to
networkX for stupid things like add/remove edges, there is no way on earth
that add_edge would copy the whole thing.

Dima, I have been working with Thomas on groups recently, with GAP and
Magma, which was pretty new to me. Of course their objects are usually
immutable, of course you create them once and never touch them, but the way
graph theory works is totally different. We spend our lifetimes adding and
removing edges, and vertices, and building a complete graph on n vertices
edge by edge *cannot* take n^4 time, which is what happens if you copy the
graph each time you want to add an edge.
Honestly. All algorithms are built like that. Any greedy algorithm works
like that, and graph theory has a lot of them. If you want to compute
matchings, if you want to compute decompositions, god, everything is being
done step by step, corrected 10 times, and you just cannot afford to copy
your whole data each time.

3) Splitting the files
---------------------------

We have something like "three files" for graph methods (data structures
excluded). generic_graph, graph, and digraph. The graph one is pretty
thick, and should be even thicker considering that some methods stored in
generic_graph should be moved to graph.py (old mistakes, most probably
mine). I do not see among the methods any clear way to split them (I
already had to list all graphs methods thematically twice, each time I
reached the same conclusion -- that it is totally arbitrary and unnatural),
so I will refrain from doing it myself. Of course anybody else can write a
patch doing that, which somebody else will review, Sage is a free software
so my personal advice does not matter much. But I do not see how it could
be done smartly.
Some recent additions to Sage's graph library were done in modules. These
functions are not necessarily made available through functions from the
Graph class, but can be imported.
I think that it is the best way to procede now that the algorithms we add
get more and more specific, somehow "research-level", at least in the view
I have of graphs ("topological things", like connectivity, distances,
sparse graphs, that stuff).

Having a large amount of graph function available in modules only is also a
way to get the users used to query the documentation when they wonder what
Sage can do, and not only the automatic completion of "g.<tab>". Many graph
methods are not that fundamental anyway, and the same way that it would not
make sense to import them systematically to Graph, there is no sense in
creating a "GraphThatYouWantToColor" class whose methods would be the
coloring-related stuff (currently stored in graph_coloring.py).

Hence, as this thread originally suggested, I think the best is to work
with modules, and to lessen the load of Graph.py by "importing" the useful
functions there, which just requires one line (thank you Florent for your
trick !!!).
I am sorry to only think of the matrix0, matrix1, matrix2 and matrix3 files
as a joke. The only reason I see to keep them this way is that "we always
kept them this way", and that people are usually scared of backward
compatibility. Of course, I do not know this code at all, so I will not
come and dirty it with my hands.
Jason suggested that it originally came from a technical problem in Cython,
and that makes sense. Honestly this is just a terrible design, and unless
you are forced to do it, you just don't. That's common sense. You do not
use object-oriented programming to create files matrix0...matrix3. That's
confusing, that's...
Come on, that's just a mistake.
It shows.

I mean.... The thing is that for me importing stuff from module to classes
just solves everything at no cost. Of course I may have missed things, in
which case somebody will probably tell me what eventually, but why look
anywhere else if we have an answer ? Is there a bad side to this way of
doing ? I work with group theory these days, so I am used to missing
obvious things :-)

-------------

Well. By the academic world's standard this mail is long only forgot
unimportant things, so I better resume filling boxes as I will be leaving
Belgium the day after tomorrow. Gosh, my appartment is a mess ^^;

Oh, and by the way I expect to be very close to totally unreachable for a
long period, probably two months and a half, from early july to end of
september. If anybody around happens to be in southeast Asia during that
time (Thailand/Vietnam/Cambodge/Laos), I will be backpacking there, let's
meet somewhere !!!

Haaaaaaaaaaaaaaaaaaaaaaaaaaave
fuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuun ! :-)

Nathann

[1] http://www.graphclasses.org/
[2] http://combinat.sagemath.org/doc/reference/sage/graphs/isgci.html
[3] Of course Dima I only talked about methods *implemented* in Sage
[4]
http://www.sagemath.org/doc/reference/sage/graphs/distances_all_pairs.html
[5]
http://www.sagemath.org/doc/reference/sage/graphs/base/static_sparse_graph.html

-- 
To post to this group, send an email to sage-devel@googlegroups.com
To unsubscribe from this group, send an email to 
sage-devel+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/sage-devel
URL: http://www.sagemath.org

Re: [sage-devel] Re: Code duplication and aliases in methods

Reply via email to