Re: [sage-devel] Re: Code duplication and aliases in methods

Dima Pasechnik Wed, 20 Jun 2012 02:34:03 -0700


On Wednesday, 20 June 2012 00:06:11 UTC+3, Nathann Cohen wrote:
>
> Hellooooooo everybody !!
>
> As in the best times, I tried 5 times to answer all who answered this 
> thread, got angry, erased the message, and began again. But Robert Miller 
> has a nice saying when we get to such situations : "shut the fuck up and 
> show me the code". So that's what I will try to do. Give examples instead 
> of wondering aloud whether the world has gone mad.
>
> Be warned, for it is a long mail (I --love-- long emails).  It also gives 
> some idea of what is happening in Sage's graph library these days, if 
> anybody (but me) cares :-)
>
> 1) Some history : the BipartiteGraph class.
> -----------------------------------------------------------
> The purpose of this class was to *at long last* have Graph classes that 
> represent graph properties. There are two possibilities when writing such a 
> thing :
> a) make it inherit all the Graph functions
> b) make it a new class
> The problem when picking b) is that this class gets totally useless as it 
> has none of the interesting methods of Graph. So a) was picked, and of 
> course most if not all methods of BipartiteGraph break when called on a 
> BipartiteGraph object.
> The reason why is easy : some of the most fundamental graph operations are 
> invalid. Like add_edge.
>


IMHO it's a problem of the Graph class itself. There should be no add_edge 
or remove_edge there at all.
All this mutability stuff must be done elsewhere. (In a subclass 
DurtyMutableGraph, if you wish)

As, indeed, these operations will break most of specific graph properties.
 

> I assure you that not being able to add an edge in a graph is *very* 
> problematic.
> The code ?
>
> Try the following :
> sage: BipartiteGraph(graphs.CompleteBipartiteGraph(3,3)).complement()
>
> Of course, the complement of a bipartite graph needs not be bipartite, 
> right ? Yeah, but the graph methods do not know that, nor should they mind 
> most of the time. Try that :
>
> sage: BipartiteGraph(graphs.CompleteBipartiteGraph(3,3)).independent_set()
>
> Basically all graphs functions would need to be rewritten to work with 
> BipartiteGraph. The only sane design choice would be for BipartiteGraph 
> *not* to inherit Graph methods, but of course it would be pretty empty and 
> all methods would need to be copied anyway.
>
> There is another problem with Bipartite Graph. The add_edge function needs 
> to be wrapped so that no edge is added to the graph that creates an odd 
> cycle. This is a costly operation. *VERY* costly operation (compared to 
> add_edge, of course). And so it was not done this way, for efficiency. 
> Hence instead of checking whether the graph "plus the new edge" stays 
> bipartite, the vertices are bicolored from the start, and an exception is 
> raised when add_edge tries to add an edge that does not fit *the former 
> bicoloring*.
> That's mostly incorrect, does not correspond to the user's intent, but it 
> is....faster >_<
>
> sage: g = BipartiteGraph(5)
> sage: g.add_edge(0,1)
>
> This could could be fixed by checking that the graph remains bipartite 
> each time an edge is added, but of course that comes with a cost, and a 
> cost that it is not healthy to pay when you *just* want to add an edge. The 
> most natural explanation behind is that you usually know what you are doing 
> when you add an edge to a graph, and that you should know for yourself 
> whether you graph remains bipartite or not. If not, please check it with 
> is_bipartite when you need it, but *do not make everybody's code 10000 
> times slower out of sheer laziness*.
>
> At some point somebody proposed to override the complement() function so 
> that the complement of a graph stays bipartite : this can be achieved by 
> switching edges and nonedges *between the two sets* and nowhere else. 
> Though this operation makes perfect sense in graph theory, it would of 
> course be very dangereous to replace the .complement() function by doing 
> that, for all algorithms that take a BipartiteGraph as an instance and 
> compute graph complements would then return wrong result *and not know it*.
>
> Sooooooo, these are the problems created by a property we want to keep on 
> a graph. It is not *yet* about immutable graphs, because NONE of these 
> methods would be available at all. This would also prevent you from running 
> any method that think they can use the other ones, even when they do not 
> actually change the graph itself (like independent_set()).
>
> I said it several times, but for me this class should be removed. It is 
> not just bad from the computational point of view, but it also creates 
> problems that new users have no idea how to cope with, because what the 
> methods do is not what they have in mind.
>
> 2) Immutable graphs : storing the graph's properties
> -----------------------------------------------------------------------
>
> Recently in Sage was merged a patch which is an interface with the graph 
> class database ISGCI [1,2]. There are many "graph classes" (let us 
> understand this as 'graph property' for the length of this discussion), and 
> this database can tell us to some extend which properties are implied by 
> others. When I first wrote this patch, what I had in mind is precisely what 
> Dima mentionned : some algorithms work for general graphs and have a bad 
> running time, some other algorithms have a better running time because they 
> are only meant to be run on graphs with a specific property [3].
> This is tempting, because the graph theory litterature is full of this 
> kind of algorithms "which work only if my graph has this and this and 
> that". Hence the goal was to run on each graph the implementation most 
> adapted to it.
>
> ~~~~~~
> By the way Dima, there are "very sane ways" to write 
> perfect-graph-specific algorithms in Graph.py, hence that can be accessed 
> for all graphs. What would you think of :
> Graph.independent_set(algorithm = "perfect_graph") ?
>

But it would return rubbish on a non-perfect graph, won't it?
This (non-generic) implementation of independent_set() for perfect graphs 
does not belong to this class.
It must be in a subclass implementing perfect graphs, which should 
overload independent_set()
with the more efficient version.

It would not be the default algorithm, it would not be run by mistake, and 
> because perfect independent set are not only interesting for perfect 
> graphs, this is actually what would make the most sense. No new class, no 
> overhead over the most important graph functions... 
> ~~~~~~
>
> Of course, we cannot attempt, each time independent_set is run, to check 
> all graphs properties for which we have a specific algorithm. Unless the 
> current implementation (very bad, but -correct-) is changed, it is very 
> costly to check whether a graph is perfect. Much more than to compute an 
> independent set, actually.
>
> Hence it would be nice to have a class of immutable graphs, to which could 
> be associated (cached) some properties that would not have to be computed 
> over and over afterwards. The most natural question, now : what about the 
> a) and b) options from the BipartiteGraph class ? Should they inherit all 
> methods, or none ?
> My personal advice is clear --> absolutely none. Unless somebody can think 
> of a way to avoid all the problems we had with BipartiteGraph. Hence 
> immutable graphs would only be used for "storage purposes", and the 
> combinat guys already (repeatedly) asked for such a thing, as they 
> sometimes need their graphs to be hashable (and to store them in sets, or 
> as dictionary keys). It also happened to me recently, and I had to store 
> spar6_string instead. Not so bad, but not a good answer either.
> I also wrote some kind of static graph structure myself -- low level ones 
> -- because you can perform some optimizations when you know from the start 
> that the graph will not be modified. Having a static graph structure thus 
> is a way to have a faster data structure, which we use for instance to 
> compute distances between all pairs of vertices [4].
> In particular, the purpose of these speedups is to save the time it takes 
> to call has_edge, or to iterate over the neighbors of a graph.
> In particular, it is totally unthinkable that graphs you actually want to 
> compute things on could be immutable. We already lose time compared to 
> networkX for stupid things like add/remove edges, there is no way on earth 
> that add_edge would copy the whole thing.
>
> Dima, I have been working with Thomas on groups recently, with GAP and 
> Magma, which was pretty new to me. Of course their objects are usually 
> immutable, of course you create them once and never touch them, but the way 
> graph theory works is totally different. We spend our lifetimes adding and 
> removing edges, and vertices, and building a complete graph on n vertices 
> edge by edge *cannot* take n^4 time, which is what happens if you copy the 
> graph each time you want to add an edge

 
Try working with GRAPE in GAP. Surely you can add and remove edges there. 
I am perfectly aware of this. GRAPE is not object-oriented in any way, 
that's why you can
do whatever you want with your graph.

Honestly. All algorithms are built like that. Any greedy algorithm works 
> like that, and graph theory has a lot of them. If you want to compute 
> matchings, if you want to compute decompositions, god, everything is being 
> done step by step, corrected 10 times, and you just cannot afford to copy 
> your whole data each time.
>
> 3) Splitting the files
> ---------------------------
>
> We have something like "three files" for graph methods (data structures 
> excluded). generic_graph, graph, and digraph. The graph one is pretty 
> thick, and should be even thicker considering that some methods stored in 
> generic_graph should be moved to graph.py (old mistakes, most probably 
> mine). I do not see among the methods any clear way to split them (I 
> already had to list all graphs methods thematically twice, each time I 
> reached the same conclusion -- that it is totally arbitrary and unnatural), 
> so I will refrain from doing it myself. Of course anybody else can write a 
> patch doing that, which somebody else will review, Sage is a free software 
> so my personal advice does not matter much. But I do not see how it could 
> be done smartly.
> Some recent additions to Sage's graph library were done in modules. These 
> functions are not necessarily made available through functions from the 
> Graph class, but can be imported.
> I think that it is the best way to procede now that the algorithms we add 
> get more and more specific, somehow "research-level", at least in the view 
> I have of graphs ("topological things", like connectivity, distances, 
> sparse graphs, that stuff).
>
> Having a large amount of graph function available in modules only is also 
> a way to get the users used to query the documentation when they wonder 
> what Sage can do, and not only the automatic completion of "g.<tab>". Many 
> graph methods are not that fundamental anyway, and the same way that it 
> would not make sense to import them systematically to Graph, there is no 
> sense in creating a "GraphThatYouWantToColor" class whose methods would be 
> the coloring-related stuff (currently stored in graph_coloring.py).
>
> Hence, as this thread originally suggested, I think the best is to work 
> with modules, and to lessen the load of Graph.py by "importing" the useful 
> functions there, which just requires one line (thank you Florent for your 
> trick !!!).
> I am sorry to only think of the matrix0, matrix1, matrix2 and matrix3 
> files as a joke. The only reason I see to keep them this way is that "we 
> always kept them this way", and that people are usually scared of backward 
> compatibility. Of course, I do not know this code at all, so I will not 
> come and dirty it with my hands.
> Jason suggested that it originally came from a technical problem in 
> Cython, and that makes sense. Honestly this is just a terrible design, and 
> unless you are forced to do it, you just don't. That's common sense. You do 
> not use object-oriented programming to create files matrix0...matrix3. 
> That's confusing, that's...
> Come on, that's just a mistake. 
> It shows.
>
> I mean.... The thing is that for me importing stuff from module to classes 
> just solves everything at no cost. Of course I may have missed things, in 
> which case somebody will probably tell me what eventually, but why look 
> anywhere else if we have an answer ? Is there a bad side to this way of 
> doing ? I work with group theory these days, so I am used to missing 
> obvious things :-)
>
> -------------
>
> Well. By the academic world's standard this mail is long only forgot 
> unimportant things, so I better resume filling boxes as I will be leaving 
> Belgium the day after tomorrow. Gosh, my appartment is a mess ^^;
>

I am writing this using a network connection via a borrowed 3G usb stick, 
and it only works outdoors here :-)
So I better continue later next week.

Dima
 

>
> Oh, and by the way I expect to be very close to totally unreachable for a 
> long period, probably two months and a half, from early july to end of 
> september. If anybody around happens to be in southeast Asia during that 
> time (Thailand/Vietnam/Cambodge/Laos), I will be backpacking there, let's 
> meet somewhere !!!
>
> Haaaaaaaaaaaaaaaaaaaaaaaaaaave 
> fuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuun ! :-)
>
> Nathann
>
> [1] http://www.graphclasses.org/
> [2] http://combinat.sagemath.org/doc/reference/sage/graphs/isgci.html
> [3] Of course Dima I only talked about methods *implemented* in Sage
> [4] 
> http://www.sagemath.org/doc/reference/sage/graphs/distances_all_pairs.html
> [5] 
> http://www.sagemath.org/doc/reference/sage/graphs/base/static_sparse_graph.html
>  

-- 
To post to this group, send an email to sage-devel@googlegroups.com
To unsubscribe from this group, send an email to 
sage-devel+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/sage-devel
URL: http://www.sagemath.org

Re: [sage-devel] Re: Code duplication and aliases in methods

Reply via email to