#10433: clean up the code for Kruskal's algorithm
-------------------------------+--------------------------------------------
   Reporter:  mvngu            |       Owner:  jason, ncohen, rlm
       Type:  enhancement      |      Status:  needs_review      
   Priority:  major            |   Milestone:  sage-4.6.2        
  Component:  graph theory     |    Keywords:                    
     Author:  Minh Van Nguyen  |    Upstream:  N/A               
   Reviewer:                   |      Merged:                    
Work_issues:                   |  
-------------------------------+--------------------------------------------
Changes (by mvngu):

  * status:  needs_info => needs_review


Comment:

 Replying to [comment:11 ncohen]:
 > Hello Minh !!!
 >
 > I'm sorry to come back with another thing to fix after having made so
 many remarks, but I really hadn't thought this problem would occur.. I
 just did some profiling, and even though your code is faster when no
 checkings are required, there is a large slowdown by default :

 What can be done is to turn off the sanity checks by default. I have done
 this in the updated patch. But note that the function now assumes that by
 default the input graph is simple, connected, not a tree, and has at least
 one vertex. If you can't be certain that the input graph meet all of the
 latter conditions, then toggle `check=True`. With the updated patch, here
 are some timing results. First, I generate the reference graph to be used
 for all profiling:

 {{{
 #!python
 sage: g = graphs.RandomGNP(2000, 0.3)
 sage: save(g, "graphs")
 }}}

 I get some timing for the case where we don't use the patch:

 {{{
 #!python
 sage: g = load("graphs")
 sage: %time g.min_spanning_tree();
 CPU times: user 15.57 s, sys: 0.00 s, total: 15.57 s
 Wall time: 15.57 s
 sage: %time g.min_spanning_tree();
 CPU times: user 16.29 s, sys: 0.00 s, total: 16.29 s
 Wall time: 16.29 s
 sage: %time g.min_spanning_tree();
 CPU times: user 16.02 s, sys: 0.00 s, total: 16.02 s
 Wall time: 16.02 s
 sage: %time g.min_spanning_tree();
 CPU times: user 16.50 s, sys: 0.00 s, total: 16.50 s
 Wall time: 16.50 s
 }}}

 Now with the patch, but use `kruskal()` indirectly via
 `min_spanning_tree()`:

 {{{
 #!python
 sage: g = load("graphs")
 sage: %time g.min_spanning_tree();
 CPU times: user 15.22 s, sys: 0.00 s, total: 15.22 s
 Wall time: 15.22 s
 sage: %time g.min_spanning_tree();
 CPU times: user 14.86 s, sys: 0.00 s, total: 14.86 s
 Wall time: 14.86 s
 sage: %time g.min_spanning_tree();
 CPU times: user 14.72 s, sys: 0.00 s, total: 14.72 s
 Wall time: 14.71 s
 sage: %time g.min_spanning_tree();
 CPU times: user 15.10 s, sys: 0.00 s, total: 15.10 s
 Wall time: 15.09 s
 }}}

 Finally, with the patch and directly import and use `kruskal()`:

 {{{
 #!python
 sage: g = load("graphs")
 sage: from sage.graphs.spanning_tree import kruskal
 sage: %time kruskal(g);
 CPU times: user 15.11 s, sys: 0.00 s, total: 15.11 s
 Wall time: 15.11 s
 sage: %time kruskal(g);
 CPU times: user 14.79 s, sys: 0.00 s, total: 14.79 s
 Wall time: 14.79 s
 sage: %time kruskal(g);
 CPU times: user 14.66 s, sys: 0.00 s, total: 14.66 s
 Wall time: 14.66 s
 sage: %time kruskal(g);
 CPU times: user 14.99 s, sys: 0.00 s, total: 14.99 s
 Wall time: 14.99 s
 }}}
 [[BR]]


 > Calling ``to_undirected()`` ::
 >
 > {{{
 > #!python
 > for i in range(10):
 >     g = graphs.RandomGNP(50,.3)
 >     timeit("g.to_undirected()")
 >
 > 125 loops, best of 3: 4.43 ms per loop
 > 125 loops, best of 3: 4.56 ms per loop
 > 125 loops, best of 3: 4.88 ms per loop
 > 125 loops, best of 3: 4.59 ms per loop
 > 125 loops, best of 3: 4.92 ms per loop
 > 125 loops, best of 3: 4.54 ms per loop
 > 125 loops, best of 3: 4.57 ms per loop
 > 125 loops, best of 3: 4.25 ms per loop
 > 125 loops, best of 3: 4.69 ms per loop
 > 125 loops, best of 3: 4.63 ms per loop
 > }}}

 The call to `to_undirected()`, `is_connected()`, and `to_simple()` are
 only performed when you do sanity checks. There are no other way around
 them. If sanity checking is toggled on, we need to somehow get the input
 graph to be simple, connected, undirected, not a tree, and has at least
 one vertex. Note that the current implementation of the function
 `kruskal()` is meant to be a generic implementation, so it needs to handle
 various types of graphs.
 [[BR]][[BR]]


 > So in general the default behaviour leads to a much slower performance,
 even though disabling the checkings makes it better than the current
 implementation. I think most of it is because of the time required to copy
 the graph (checking the connectivity is comparatively *much* faster). I
 tried to find some cheap trick to avoid the checking, at least when the
 graph is simple and unweighted. Actually, I rewrote a computation of
 spanning trees based on depth-first search, but I wasn't able to do better
 than what your code does when the checkings are disabled, so it's not
 useful by itself. I think the best way is to find a way around graph
 copies, but no cheap tricks so far `^^;`

 Here's one way to improve performance: get rid of the keyword "as_edges"
 and return only the edges of a minimum spanning tree. You might want
 `kruskal()` to return an actual tree object that represents a minimum
 spanning tree, but the behaviour of returning the edges can later on be
 tweaked to use generators and the yield statement to increase performance.
 Currently, the latest version of Cython (version 0.14) doesn't support
 generators and the yield statement, so there's no way to use them yet. But
 I have updated the patch to get rid of the keyword "as_edges". Later on
 when Cython supports the yield statement, we could use that. The advantage
 then would be that once an edge of a minimum spanning tree is found, it is
 returned via a yield statement and you could immediately use that edge to
 build your tree while the function `kruskal()` is running. This has the
 effect of returning an iterator over the edges of a minimum spanning tree,
 instead of first waiting for all such edges to be found and returned
 before you could start using them.
 [[BR]]

 What could also increase performance slightly is to use a priority queue
 to iterate over edges of the input graph, instead of using a sorting
 function. But a Cython implementation of priority queue is still not yet
 available.

-- 
Ticket URL: <http://trac.sagemath.org/sage_trac/ticket/10433#comment:12>
Sage <http://www.sagemath.org>
Sage: Creating a Viable Open Source Alternative to Magma, Maple, Mathematica, 
and MATLAB

-- 
You received this message because you are subscribed to the Google Groups 
"sage-trac" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/sage-trac?hl=en.

Reply via email to