Re: [sage-support] Re: sage graph -- why is c_graph so slow??

2011-12-31 Thread Nathann Cohen
Hello !!!

> I'll try to test it asap. Having some patch application issues. :-|
> Just in case you're familiar with this, I'll throw it out there. But
> this is probably a job for google:

Oh, not really. This problem comes from the fact that I am using a more
recent version of Sage than you do :

~$ sage
--
| Sage Version 4.8.alpha4, Release Date: 2011-12-13


This one is the developper's version, and our patches should be applied on
this one in case of doubt :-)

Here is where you can find it : http://www.sagemath.org/download-latest.html

By the way, I think some of the recent optimizations in the Graph code that
are included in this version may not be available in the one you are
currently using :-)

Have fuun !

Nathann

-- 
To post to this group, send email to sage-support@googlegroups.com
To unsubscribe from this group, send email to 
sage-support+unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/sage-support
URL: http://www.sagemath.org


[sage-support] Re: sage graph -- why is c_graph so slow??

2011-12-30 Thread entropy
Nathann,

Many thanks! You're awesome! I looked at the patch, it looks tiny. :)
I'll try to test it asap. Having some patch application issues. :-|
Just in case you're familiar with this, I'll throw it out there. But
this is probably a job for google:

cd "/Applications/sage/devel/sage" && hg import   "/Users/jberwald/src/
trac_12235.patch"
applying /Users/jberwald/src/trac_12235.patch
patching file sage/graphs/base/c_graph.pyx
Hunk #1 FAILED at 2977
1 out of 1 hunks FAILED -- saving rejects to file sage/graphs/base/
c_graph.pyx.rej
patching file sage/graphs/digraph.py
Hunk #1 FAILED at 2561
1 out of 1 hunks FAILED -- saving rejects to file sage/graphs/
digraph.py.rej
abort: patch failed to apply

Thanks!
Jesse

On Dec 29, 10:48 am, Nathann Cohen  wrote:
> Hell !!!
>
> and still much faster than the c_graph implementation.
>
>
>
> Well... I spent *quite* some time over this problem, wrote a LOT of code
> and documentation , to find out later that this could be solved in a *very
> small* patch. I hope all the work I did could be used later on anyway, but
> for the moment there should be no further worries about this SCC method. I
> created a patch for this just there [1], which you will find with some
> benchmarks.
>
> http://trac.sagemath.org/sage_trac/ticket/12235
>
> As a side note, I've also been
>
> > testing subgraph functionality. Eg.,
> >  self.M.subgraph(self.rand_verts(K)), which maybe has a better
> > implementation using subgraph_search() ??
>
> Nonononono ! This subgraph method has nothing to do with subgraph_search !
> The subgraph method takes as an argument a set of vertices and returns the
> graph induced by those vertices. The subgraph_search (and all the
> subgraph_search_* method) take as an argument *another graph*, and look for
> copies of this other graph inside of the first one. Which is dead harder :-D
>
> > Anyways, I greatly appreciate your help with this. It would be great
> > to be able to use Sage/Python to run all of our code.
>
> Please complain whenever you have the slightest thought that Sage may not
> be the best graph library in the world :-p
>
> Have fuun ! :-p
>
> Nathann

-- 
To post to this group, send email to sage-support@googlegroups.com
To unsubscribe from this group, send email to 
sage-support+unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/sage-support
URL: http://www.sagemath.org


[sage-support] Re: sage graph -- why is c_graph so slow??

2011-12-29 Thread Nathann Cohen
Hell !!!

and still much faster than the c_graph implementation. 
>

Well... I spent *quite* some time over this problem, wrote a LOT of code 
and documentation , to find out later that this could be solved in a *very 
small* patch. I hope all the work I did could be used later on anyway, but 
for the moment there should be no further worries about this SCC method. I 
created a patch for this just there [1], which you will find with some 
benchmarks.

http://trac.sagemath.org/sage_trac/ticket/12235


As a side note, I've also been 
> testing subgraph functionality. Eg., 
>  self.M.subgraph(self.rand_verts(K)), which maybe has a better 
> implementation using subgraph_search() ?? 
>

Nonononono ! This subgraph method has nothing to do with subgraph_search ! 
The subgraph method takes as an argument a set of vertices and returns the 
graph induced by those vertices. The subgraph_search (and all the 
subgraph_search_* method) take as an argument *another graph*, and look for 
copies of this other graph inside of the first one. Which is dead harder :-D
 

> Anyways, I greatly appreciate your help with this. It would be great 
> to be able to use Sage/Python to run all of our code. 
>

Please complain whenever you have the slightest thought that Sage may not 
be the best graph library in the world :-p

Have fuun ! :-p

Nathann

-- 
To post to this group, send email to sage-support@googlegroups.com
To unsubscribe from this group, send email to 
sage-support+unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/sage-support
URL: http://www.sagemath.org


[sage-support] Re: sage graph -- why is c_graph so slow??

2011-12-25 Thread entropy
Hi Nathann,
Thank you for the timely updates! I agree with you about the calls to
random. I can move those out of the timing portion. I suspected that
the passage to the backend was probably responsible for the slow speed
of the add/remove edge/vertices calls. The cost of method calls is
overhead I'm willing to swallow if the harder algorithms are
faster. :)
As for the scc() method, the "_digraph()" problem was my fault--I
didn't understand how scc_digraph() worked. Unfortunately, after
removing "_digraph()" the timings are just as bad. Sample below:
(1000 node graph)
scc() -- 10 trials: sage_cgraph 1362871.89 usec [ now computed using
self.M.strongly_connected_components() (see previous post in thread) ]
scc() -- 10 trials:    networkx    6015.06 usec I'm afraid I'm still
missing something crucial here? :-? Also, as a confirmation of your
argument concerning the calls to the backend, the Sage implementation
of NetworkX (implementation='networkx' instead of 'c_graph') does work
nearly as as fast as pure NetworkX after removing "_digraph()"--and
still much faster than the c_graph implementation.
As far as which functions/methods to benchmark, I am (or rather we
are) interested in a few specific graph algorithms, scc() among them.
That's why I've been picking on scc(). As a side note, I've also been
testing subgraph functionality. Eg.,
 self.M.subgraph(self.rand_verts(K)), which maybe has a better
implementation using subgraph_search() ??
Anyways, I greatly appreciate your help with this. It would be great
to be able to use Sage/Python to run all of our code.
Merry Christmas,Jesse
On Dec 25, 3:08 am, Nathann Cohen  wrote:
> Oh yes, and something else about your benchmark : try to avoid using "rand"
> methods when you are doing one, especially when you test such low-level
> methods, because often the rand() method represents  an important part of
> the time.
>
> The best would be to compute all the random number you need in a first
> phase, then run %timeit on the add_edge part.
>
> Well, it probably will not reflect well on Sage because it should increase
> the differences between the libraries, but I think that it is very
> important in your benchmark, To give you an idea :
>
> sage: from numpy import random as rnd
> sage:
> sage: g = Graph(500)
> sage: def rand_entry(G):
> :     ...    n = G.order()
> :     ...    i = rnd.randint(0,n-1)
> :     ...    j = rnd.randint(0,n-1)
> :     ...    G.add_edge(i,j)
> :     ...    G.delete_edge(i,j)
> :
> sage: def just_rand(G):
> :     ...    n = G.order()
> :     ...    i = rnd.randint(0,n-1)
> :     ...    j = rnd.randint(0,n-1)
> :     ...    return i*j
> :
> sage: %timeit rand_entry(g)
> 625 loops, best of 3: 20.4 µs per loop
> sage: %timeit just_rand(g)
> 625 loops, best of 3: 4.93 µs per loop
>
> So 20% of the time used by this test method is actualy used by calls to
> "random()" :-)
>
> Nathann

-- 
To post to this group, send email to sage-support@googlegroups.com
To unsubscribe from this group, send email to 
sage-support+unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/sage-support
URL: http://www.sagemath.org


Re: [sage-support] Re: sage graph -- why is c_graph so slow??

2011-12-25 Thread Nathann Cohen
Oh yes, and something else about your benchmark : try to avoid using "rand"
methods when you are doing one, especially when you test such low-level
methods, because often the rand() method represents  an important part of
the time.

The best would be to compute all the random number you need in a first
phase, then run %timeit on the add_edge part.

Well, it probably will not reflect well on Sage because it should increase
the differences between the libraries, but I think that it is very
important in your benchmark, To give you an idea :

sage: from numpy import random as rnd
sage:
sage: g = Graph(500)
sage: def rand_entry(G):
: ...n = G.order()
: ...i = rnd.randint(0,n-1)
: ...j = rnd.randint(0,n-1)
: ...G.add_edge(i,j)
: ...G.delete_edge(i,j)
:
sage: def just_rand(G):
: ...n = G.order()
: ...i = rnd.randint(0,n-1)
: ...j = rnd.randint(0,n-1)
: ...return i*j
:
sage: %timeit rand_entry(g)
625 loops, best of 3: 20.4 µs per loop
sage: %timeit just_rand(g)
625 loops, best of 3: 4.93 µs per loop

So 20% of the time used by this test method is actualy used by calls to
"random()" :-)

Nathann

-- 
To post to this group, send email to sage-support@googlegroups.com
To unsubscribe from this group, send email to 
sage-support+unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/sage-support
URL: http://www.sagemath.org


Re: [sage-support] Re: sage graph -- why is c_graph so slow??

2011-12-25 Thread Nathann Cohen
Hell !

> Thanks for looking into this. I believe that I stayed within Sage's
> library when I wrote my test code.

You did ! My mistake ! But you actually call
strongly_connected_components_digraph instead of
strongly_connected_components, which mean the function is spending most of
its time building a DiGraph (which tells you how the components are related
to each other), instead of just computing them (as NetworkX does) :-)

> The general outline and some
> classes were originally written by a collaborator (I don't want to
> take credit, but I'll take responsibility where there are errors!)

Ahahah. I'm told that "it is how management should be done" : credit goes
to the group, mistakes are the leader; s responsibility. Instead of the
usual other way around :-)

> I couldn't find a way to attach files, so I've pasted the two classes
> below (plus a base class):

Thank you very much, it is perfect like that :-)

>def scc(self):
>self.M.strongly_connected_components_digraph()

That's where the problem lies. Just remove the _digraph part and it should
be *much* better. Creating a digraph (with fancy nodes, if I may say) is
not as cheap as BFS.

Well. All in all, it looks like your tests are sound, so it's now my turn
to work to reduce these runnings times. We can not afford to be slower than
NetworkX, because I want to be able to say that there is no library like
Sage and stop there without hearing anybody complaining :-D

Sooo.. The reason why it's slow on our side, and the reason why our
"NetworkX" backend is slower than pure networkX is exactly because of what
I said earlier : it is used as a backend, which means the "real graph" is
stored as Graph._backend, which means that each time you call one of
Graph's method the stuff has to be forwarded to another method, and we are
talking of so short methods that calling times makes a difference. I do not
know how I will improve that, but I will find ways :-p

Then, there is actually a nasty thing in our strongly-connected-components
method. It is calling in_neighbors at some point for each vertex of the
node, and it is just too stupid. Our graphs (c_graphs) are handwritten in
C, and for this reason they should be *MUCH* lighter in memory than
networkX graphs, because we use less complicated structures to store
adjacencies. But, for instance, if we store all the out-neighbors of a
node, we do not store its in-neighbors. Which means the in-neighbors
function is doing a bad job :-D

So either I should change that in scc, or else change the backend. I'll see
that, it probably will be the scc method. Anyway many basic functions of
Sage should be updated now that we have iterators in Cython.

Well. This is for my work. I also have something to add about the benchmark
you are doing : as usual, benchmarks try to compare what they can compare.
That is, you took the most essential graph functions and you tried to
compare their performances in different graph libraries. At least for the
edd/remove edge/vertices, this will apparently reflect badly on us, but
that's my fault and I'll try to change that :-p
The thing is that by doing benchmarks this way, you will probably miss all
of Sage's strengths. In particular :
* That we use Cython for many hardcore methods
* That we use external C software for some stuff
* That we use linear programming for *A LOT* of stuff

So, why would you miss that ? Well, always for the same reason. Because the
methods that are written this way in our library are far from being basic
graphs. They solve harder (sometimes NP-Hard) problems, and chances are
that these methods are never available in other libraries.

In particular, none of this stuff *can be coded* with NetworkX. You would
need Cython to do so, and we can afford to write them and to have short
running time because we use both at once. So in the first category (Cython
stuff), you have :

* Graph.distances_all_pairs which computes the distances (or the paths too
with another method if you want) between all pairs of nodes. This method is
actually slow (but faster than NetworkX) because it returns its result as a
double dictionary. Well, this method actually exists in Sage at C-level,
which means that when we need the result from the inside of another C
method we save all this dictionary creation. An example of that is
Graph.diameter(), which I encourage you to include in your benchmark :-p
(but please use the last version of Sage, 4.8.alpha4, I do not remember
when we added that)

So, well, let's add it :

* Graph.diameter()

These two methods are not about complicated computations. But we have
things like

* Graph.subgraph_search, Graph.subgraph_search_count

Which checks whether a graph contains another as a subgraph, or return the
number of instances. This thing is done in C, and though I should improve
it again it is already way faster than anything you could code in Python.
To give you an idea, the C implementation of the distances_all_pairs, and
of the diamet

[sage-support] Re: sage graph -- why is c_graph so slow??

2011-12-24 Thread entropy
Hi Nathan,

Thanks for looking into this. I believe that I stayed within Sage's
library when I wrote my test code. The general outline and some
classes were originally written by a collaborator (I don't want to
take credit, but I'll take responsibility where there are errors!) I
couldn't find a way to attach files, so I've pasted the two classes
below (plus a base class):

SAGE
sage_cgraph_tester.py:

import numpy as np
from numpy import random as rnd
from sage.all import DiGraph
from tester import Tester

class Tester_sage_cgraph( Tester ):
def __init__(self):
self.name = 'sage_cgraph'

def set_graph(self,E,N):
self.N = N
self.M = DiGraph(N, implementation="networkx" ) #"c_graph",
sparse=True)
self.M.add_edges(E)

def rand_row(self,R):
i = self.rand_vert()
self.M.add_edges([(i,self.rand_vert()) for j in range(R)])

def rand_col(self,R):
i = self.rand_vert()
self.M.add_edges([(self.rand_vert(),i) for j in range(R)])

def rand_entry(self):
i = self.rand_vert()
j = self.rand_vert()
if not self.M.has_edge(i,j):
self.M.add_edge(i,j)

def subgraph(self,K):
self.M.subgraph(self.rand_verts(K))

def scc(self):
self.M.strongly_connected_components_digraph()

NETWORKX
networkx_tester.py (originally coded by my collaborator, but I'll take
full responsibility for any errors :) ):

import numpy as np
from numpy import random as rnd
import networkx
from tester import Tester
import rads_nx

class Tester_networkx(Tester):

def __init__(self):
self.name = 'networkx'

def set_graph(self,E,N):
self.N = N
self.M = networkx.DiGraph()
self.M.add_nodes_from(range(N))
self.M.add_edges_from(E)

def rand_row(self,R):
i = self.rand_vert()
self.M.add_edges_from([(i,self.rand_vert()) for j in range(R)])

def rand_col(self,R):
i = self.rand_vert()
self.M.add_edges_from([(self.rand_vert(),i) for j in range(R)])

def rand_entry(self):
i = self.rand_vert()
j = self.rand_vert()
if not self.M.has_edge(i,j):
self.M.add_edge(i,j)

def subgraph(self,K):
networkx.subgraph(self.M,self.rand_verts(K))

def scc(self):

networkx.algorithms.components.strongly_connected_components(self.M)

The base class Tester is in tester.py:

import numpy as np
from numpy import random as rnd

class Tester:
def rand_vert(self):
return rnd.randint(0,self.N-1)

def rand_verts(self,R):
return rnd.randint(0,self.N-1,R)

The NetworkX and Sage graph testers are initialize as follows:

G = networkx.read_adjlist('testmat%i.adjlist' % (N))
N = len(G.nodes())
E = G.edges()
E = map(lambda x: (int(x[0]),int(x[1])), E)

print 'initializing testers... ',
tester.set_graph(E,N)

Operations are timed using the timeit module. Eg.,

 test_str = 'testers[%i].%s(%s)' % ( tester, test['F'],test['args'] )
 timer = timeit.Timer( test_str, import_str),

where test['F'] is the method names and test['args'] contain any
necessary args.

Thanks again for your help.

Happy holidays!
Jesse

On Dec 23, 6:43 pm, Nathann Cohen  wrote:
> Hello Jesse !
>
> Well, for a start it wouldn't be very fair to compare graph libraries if
> you do not use our graph methods and recode your own ! You seem to have
> rewritten your version of "strongly connected components" to test the
> libraries, and such low-level methods are in Sage written in Cython, so
> this kind of running times are only those you would get if you use Sage
> graphs but refuse to use any of the methods present in the library :-D
>
> This being said, I just did some tests and if they are far from being as
> bad for Sage as yours are, I was quite disappointed myself. I was under the
> impression we were leaving NetworkX far behind, and it looks like we
> actually are behind in some cases, which will need to be fixed. Could I ask
> you to provide examples of codes which have different running times for
> NetworkX and Sage ? I guess you only use the add/remove edge/vertices
> methods in your code, which may be the explanation. When you are doing that
> you are actually calling Cython methods through Python functions, and
> spending more time calling methods than actually getting the job done
> Though to be honest I do not want to have to explain why Sage is slower, I
> would like to show that it is faster :-)
>
> Hence, if you can provide the code, we could begin to talk about the
> technical reasons.
>
> Good night !
>
> Nathann

-- 
To post to this group, send email to sage-support@googlegroups.com
To unsubscribe from this group, send email to 
sage-support+unsubscr...@googlegroups.com
For more options, visit this group at 
http

[sage-support] Re: sage graph -- why is c_graph so slow??

2011-12-23 Thread Nathann Cohen
Hello Jesse !

Well, for a start it wouldn't be very fair to compare graph libraries if 
you do not use our graph methods and recode your own ! You seem to have 
rewritten your version of "strongly connected components" to test the 
libraries, and such low-level methods are in Sage written in Cython, so 
this kind of running times are only those you would get if you use Sage 
graphs but refuse to use any of the methods present in the library :-D

This being said, I just did some tests and if they are far from being as 
bad for Sage as yours are, I was quite disappointed myself. I was under the 
impression we were leaving NetworkX far behind, and it looks like we 
actually are behind in some cases, which will need to be fixed. Could I ask 
you to provide examples of codes which have different running times for 
NetworkX and Sage ? I guess you only use the add/remove edge/vertices 
methods in your code, which may be the explanation. When you are doing that 
you are actually calling Cython methods through Python functions, and 
spending more time calling methods than actually getting the job done
Though to be honest I do not want to have to explain why Sage is slower, I 
would like to show that it is faster :-)

Hence, if you can provide the code, we could begin to talk about the 
technical reasons.

Good night !

Nathann

-- 
To post to this group, send email to sage-support@googlegroups.com
To unsubscribe from this group, send email to 
sage-support+unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/sage-support
URL: http://www.sagemath.org