Re: [sage-support] Re: sage graph -- why is c_graph so slow??

Nathann Cohen Sun, 25 Dec 2011 01:53:31 -0800

Helloooooooo !

> Thanks for looking into this. I believe that I stayed within Sage's
> library when I wrote my test code.


You did ! My mistake ! But you actually call
strongly_connected_components_digraph instead of
strongly_connected_components, which mean the function is spending most of
its time building a DiGraph (which tells you how the components are related
to each other), instead of just computing them (as NetworkX does) :-)

> The general outline and some
> classes were originally written by a collaborator (I don't want to
> take credit, but I'll take responsibility where there are errors!)

Ahahah. I'm told that "it is how management should be done" : credit goes
to the group, mistakes are the leader; s responsibility. Instead of the
usual other way around :-)

> I couldn't find a way to attach files, so I've pasted the two classes
> below (plus a base class):

Thank you very much, it is perfect like that :-)

>    def scc(self):
>        self.M.strongly_connected_components_digraph()

That's where the problem lies. Just remove the _digraph part and it should
be *much* better. Creating a digraph (with fancy nodes, if I may say) is
not as cheap as BFS.

Well. All in all, it looks like your tests are sound, so it's now my turn
to work to reduce these runnings times. We can not afford to be slower than
NetworkX, because I want to be able to say that there is no library like
Sage and stop there without hearing anybody complaining :-D

Sooo.. The reason why it's slow on our side, and the reason why our
"NetworkX" backend is slower than pure networkX is exactly because of what
I said earlier : it is used as a backend, which means the "real graph" is
stored as Graph._backend, which means that each time you call one of
Graph's method the stuff has to be forwarded to another method, and we are
talking of so short methods that calling times makes a difference. I do not
know how I will improve that, but I will find ways :-p

Then, there is actually a nasty thing in our strongly-connected-components
method. It is calling in_neighbors at some point for each vertex of the
node, and it is just too stupid. Our graphs (c_graphs) are handwritten in
C, and for this reason they should be *MUCH* lighter in memory than
networkX graphs, because we use less complicated structures to store
adjacencies. But, for instance, if we store all the out-neighbors of a
node, we do not store its in-neighbors. Which means the in-neighbors
function is doing a bad job :-D

So either I should change that in scc, or else change the backend. I'll see
that, it probably will be the scc method. Anyway many basic functions of
Sage should be updated now that we have iterators in Cython.

Well. This is for my work. I also have something to add about the benchmark
you are doing : as usual, benchmarks try to compare what they can compare.
That is, you took the most essential graph functions and you tried to
compare their performances in different graph libraries. At least for the
edd/remove edge/vertices, this will apparently reflect badly on us, but
that's my fault and I'll try to change that :-p
The thing is that by doing benchmarks this way, you will probably miss all
of Sage's strengths. In particular :
* That we use Cython for many hardcore methods
* That we use external C software for some stuff
* That we use linear programming for *A LOT* of stuff

So, why would you miss that ? Well, always for the same reason. Because the
methods that are written this way in our library are far from being basic
graphs. They solve harder (sometimes NP-Hard) problems, and chances are
that these methods are never available in other libraries.

In particular, none of this stuff *can be coded* with NetworkX. You would
need Cython to do so, and we can afford to write them and to have short
running time because we use both at once. So in the first category (Cython
stuff), you have :

* Graph.distances_all_pairs which computes the distances (or the paths too
with another method if you want) between all pairs of nodes. This method is
actually slow (but faster than NetworkX) because it returns its result as a
double dictionary. Well, this method actually exists in Sage at C-level,
which means that when we need the result from the inside of another C
method we save all this dictionary creation. An example of that is
Graph.diameter(), which I encourage you to include in your benchmark :-p
(but please use the last version of Sage, 4.8.alpha4, I do not remember
when we added that)

So, well, let's add it :

* Graph.diameter()

These two methods are not about complicated computations. But we have
things like

* Graph.subgraph_search, Graph.subgraph_search_count

Which checks whether a graph contains another as a subgraph, or return the
number of instances. This thing is done in C, and though I should improve
it again it is already way faster than anything you could code in Python.
To give you an idea, the C implementation of the distances_all_pairs, and
of the diameter() methods were compared (and totaly crushed) a Java library
some friend of my lab were writing. Something like a 10x speedup compared
to Java :-p

* Graph.is_isomorphic (there we use either Nauty or a Sage implementation
from ... I guess Robert Miller, as he wrote all the cool stuff in the Graph
library :-p)

also * Graph.clique_number(), or Graph.coloring() and its 3 different
methods. That sometimes uses the external software Cliquer (C)

or * Graph.modular_decomposition, which also uses dedicated code in C

Or all the NP-hard methods. dominating_set,
is_hamiltonian/traveling_salesman_problem, multiway_cut,
fractional_chromatic_index (this last one isn't NP-Hard).

Well. Sage can do all that, and when it does, it is not using these methods
you are testing, because it is done in C, or in linear programs, well, by
other means. Hence if you benchmark libraries using running times for the
add/remove edge/vertex methods, well, you can see that it would be missing
a great part of what we do well, probably much better than anybody else.

And now it is my turn to go to work, because you told me that we are slower
for some functions, and I hate that :-D

Merry Christmas ! And thank you very much for reporting all this ! :-)

Nathann

-- 
To post to this group, send email to sage-support@googlegroups.com
To unsubscribe from this group, send email to 
sage-support+unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/sage-support
URL: http://www.sagemath.org

Re: [sage-support] Re: sage graph -- why is c_graph so slow??

Reply via email to