#10433: clean up the code for Kruskal's algorithm
-------------------------------+--------------------------------------------
Reporter: mvngu | Owner: jason, ncohen, rlm
Type: enhancement | Status: needs_review
Priority: major | Milestone: sage-4.6.2
Component: graph theory | Keywords:
Author: Minh Van Nguyen | Upstream: N/A
Reviewer: | Merged:
Work_issues: |
-------------------------------+--------------------------------------------
Changes (by mvngu):
* status: needs_info => needs_review
Comment:
Replying to [comment:11 ncohen]:
> Hello Minh !!!
>
> I'm sorry to come back with another thing to fix after having made so
many remarks, but I really hadn't thought this problem would occur.. I
just did some profiling, and even though your code is faster when no
checkings are required, there is a large slowdown by default :
What can be done is to turn off the sanity checks by default. I have done
this in the updated patch. But note that the function now assumes that by
default the input graph is simple, connected, not a tree, and has at least
one vertex. If you can't be certain that the input graph meet all of the
latter conditions, then toggle `check=True`. With the updated patch, here
are some timing results. First, I generate the reference graph to be used
for all profiling:
{{{
#!python
sage: g = graphs.RandomGNP(2000, 0.3)
sage: save(g, "graphs")
}}}
I get some timing for the case where we don't use the patch:
{{{
#!python
sage: g = load("graphs")
sage: %time g.min_spanning_tree();
CPU times: user 15.57 s, sys: 0.00 s, total: 15.57 s
Wall time: 15.57 s
sage: %time g.min_spanning_tree();
CPU times: user 16.29 s, sys: 0.00 s, total: 16.29 s
Wall time: 16.29 s
sage: %time g.min_spanning_tree();
CPU times: user 16.02 s, sys: 0.00 s, total: 16.02 s
Wall time: 16.02 s
sage: %time g.min_spanning_tree();
CPU times: user 16.50 s, sys: 0.00 s, total: 16.50 s
Wall time: 16.50 s
}}}
Now with the patch, but use `kruskal()` indirectly via
`min_spanning_tree()`:
{{{
#!python
sage: g = load("graphs")
sage: %time g.min_spanning_tree();
CPU times: user 15.22 s, sys: 0.00 s, total: 15.22 s
Wall time: 15.22 s
sage: %time g.min_spanning_tree();
CPU times: user 14.86 s, sys: 0.00 s, total: 14.86 s
Wall time: 14.86 s
sage: %time g.min_spanning_tree();
CPU times: user 14.72 s, sys: 0.00 s, total: 14.72 s
Wall time: 14.71 s
sage: %time g.min_spanning_tree();
CPU times: user 15.10 s, sys: 0.00 s, total: 15.10 s
Wall time: 15.09 s
}}}
Finally, with the patch and directly import and use `kruskal()`:
{{{
#!python
sage: g = load("graphs")
sage: from sage.graphs.spanning_tree import kruskal
sage: %time kruskal(g);
CPU times: user 15.11 s, sys: 0.00 s, total: 15.11 s
Wall time: 15.11 s
sage: %time kruskal(g);
CPU times: user 14.79 s, sys: 0.00 s, total: 14.79 s
Wall time: 14.79 s
sage: %time kruskal(g);
CPU times: user 14.66 s, sys: 0.00 s, total: 14.66 s
Wall time: 14.66 s
sage: %time kruskal(g);
CPU times: user 14.99 s, sys: 0.00 s, total: 14.99 s
Wall time: 14.99 s
}}}
[[BR]]
> Calling ``to_undirected()`` ::
>
> {{{
> #!python
> for i in range(10):
> g = graphs.RandomGNP(50,.3)
> timeit("g.to_undirected()")
>
> 125 loops, best of 3: 4.43 ms per loop
> 125 loops, best of 3: 4.56 ms per loop
> 125 loops, best of 3: 4.88 ms per loop
> 125 loops, best of 3: 4.59 ms per loop
> 125 loops, best of 3: 4.92 ms per loop
> 125 loops, best of 3: 4.54 ms per loop
> 125 loops, best of 3: 4.57 ms per loop
> 125 loops, best of 3: 4.25 ms per loop
> 125 loops, best of 3: 4.69 ms per loop
> 125 loops, best of 3: 4.63 ms per loop
> }}}
The call to `to_undirected()`, `is_connected()`, and `to_simple()` are
only performed when you do sanity checks. There are no other way around
them. If sanity checking is toggled on, we need to somehow get the input
graph to be simple, connected, undirected, not a tree, and has at least
one vertex. Note that the current implementation of the function
`kruskal()` is meant to be a generic implementation, so it needs to handle
various types of graphs.
[[BR]][[BR]]
> So in general the default behaviour leads to a much slower performance,
even though disabling the checkings makes it better than the current
implementation. I think most of it is because of the time required to copy
the graph (checking the connectivity is comparatively *much* faster). I
tried to find some cheap trick to avoid the checking, at least when the
graph is simple and unweighted. Actually, I rewrote a computation of
spanning trees based on depth-first search, but I wasn't able to do better
than what your code does when the checkings are disabled, so it's not
useful by itself. I think the best way is to find a way around graph
copies, but no cheap tricks so far `^^;`
Here's one way to improve performance: get rid of the keyword "as_edges"
and return only the edges of a minimum spanning tree. You might want
`kruskal()` to return an actual tree object that represents a minimum
spanning tree, but the behaviour of returning the edges can later on be
tweaked to use generators and the yield statement to increase performance.
Currently, the latest version of Cython (version 0.14) doesn't support
generators and the yield statement, so there's no way to use them yet. But
I have updated the patch to get rid of the keyword "as_edges". Later on
when Cython supports the yield statement, we could use that. The advantage
then would be that once an edge of a minimum spanning tree is found, it is
returned via a yield statement and you could immediately use that edge to
build your tree while the function `kruskal()` is running. This has the
effect of returning an iterator over the edges of a minimum spanning tree,
instead of first waiting for all such edges to be found and returned
before you could start using them.
[[BR]]
What could also increase performance slightly is to use a priority queue
to iterate over edges of the input graph, instead of using a sorting
function. But a Cython implementation of priority queue is still not yet
available.
--
Ticket URL: <http://trac.sagemath.org/sage_trac/ticket/10433#comment:12>
Sage <http://www.sagemath.org>
Sage: Creating a Viable Open Source Alternative to Magma, Maple, Mathematica,
and MATLAB
--
You received this message because you are subscribed to the Google Groups
"sage-trac" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to
[email protected].
For more options, visit this group at
http://groups.google.com/group/sage-trac?hl=en.