#10433: clean up the code for Kruskal's algorithm
-------------------------------+--------------------------------------------
Reporter: mvngu | Owner: jason, ncohen, rlm
Type: enhancement | Status: needs_review
Priority: major | Milestone: sage-4.6.2
Component: graph theory | Keywords:
Author: Minh Van Nguyen | Upstream: N/A
Reviewer: | Merged:
Work_issues: |
-------------------------------+--------------------------------------------
Changes (by mvngu):
* status: needs_work => needs_review
Comment:
Replying to [comment:7 ncohen]:
> * it is not possible to turn the sanity checks off from the
min_spanning_tree method. If a users wants to do so, he has to import the
kruskal method himself
I've added a new parameter "check" to the method `min_spanning_tree()`.
Wherever appropriate, this parameter will be passed on to any function
used in computing a minimum spanning tree.
[[BR]][[BR]]
> * the min_spanning_tree algororithm is one of the most basic, and is
meant to be *fast*. That's also why it is very good to have it in Cython.
But when the algorithm is run, here is what happens : test of
connectedness --> BFS/DFS --> the whole graph is read. When this is done,
you are testing whether the graph is a tree : first is_connected is
called, then the edges are counted. 2 traversals at this moment. You could
save one by just counting the edges of the graph, as you just checked it
is connected ( that's exactly how the is_tree method is written ). Anyway,
the best would really be to run kruskal's algorithm, to theck afterwards
whether the result is connected.
You're right. That's two computation of connectedness. This is now fixed
to only do one computation of connectedness and afterward to check for
tree.
[[BR]][[BR]]
> * I was not able to produce a counterexample, but I think there's
something wrong in how you remove multiple edges by keeping the minimum.
You list all the multiple edges (and all the edges of a complete graph may
be doubled or tripled). You then remember for each vertex *one of the
neighbors* with which it has many edges. You are then selecting a neighbor
for each vertex in the worst case, which mean a linear number of
information. Then, for each vertex and the neighbor you selected, you
remember the edge of smallest weight, which is ok... But why should you
have removed all the double edges ? What happens if a vertex has multiple
edges in common with many of its neighbors ? O_o
Here is an explanation of the code block on retaining only the minimum
weighted edge among all weighted multiple edges with the same start and
edge vertices. Let G be an undirected, weighted multigraph. If there are
multiple edges from u to v, retain only the start and end vertices of such
edges; at this point we don't care about the edge weights. Let a and b be
the start and end vertices, respectively, of a weighted edge `(a, b, w)`
having weight w. Then there are multiple weighted edges from a to b if and
only if the dictionary `uniqE` has the key/value pair `(a, b)`. Let `(u,
v)` be a key/value pair in `uniqE`. Then there are multiple weighted edges
from u to v. Let W be a list of all edge weights of multiple edges from u
to v, sorted in nondecreasing order. If w is the first element in W, then
`(u, v, w)` is an edge of minimum weight (there may be several edges of
minimum weight) among all weighted edges from u to v. If `i >= 2` is the
i-th element in W, delete the multiple weighted edge `(u, v, i)`. The only
remaining weighted edge is `(u, v, w)` with w being the first element of
W.
[[BR]][[BR]]
> * Personnal question : why is itemgetter(2) better than lambda
x:x[1] ? Or is it just because there are no lambda functions available in
Cython ?
Specifically, I'm using the idiom:
{{{
from operator import itemgetter
sorted(iterable, key=itemgetter(n))
}}}
It is more or less equivalent to using a lambda, but there are clear
documentation on the above sorting idiom. See
[http://docs.python.org/library/operator.html#operator.itemgetter this
page] in the Python library. And this
[http://wiki.python.org/moin/HowTo/Sorting/ other page] from the Python
wiki, especially the section "Operator Module Functions". Also note that
what you're suggesting is using a closure, which is not supported in the
version of Cython currently distributed with Sage 4.6.1.alpha3.
[[BR]][[BR]]
> * Whatever happens, you seem to need a copy of your graph... What
about dealing with multiple edges on the way ? Sorting all the edges
according to their weight, even with multiple ones. When you then try to
add the heavier copy of an edge you already tried, it is bound to be
refused anyway... Do you think it would be slower like that ?
I work with the assumption that we can only compute minimum spanning trees
for undirected simple graphs. Let g be a copy of the input graph and
suppose g is an unweighted multigraph. By the working assumption, we
merely convert g to be a simple graph and then run the resulting simple
graph through Kruskal's algorithm. I think working directly with the input
graph, and not a copy of this graph, would result in more complicated
code. If you want speed, make sure your input graph is a simple unweighted
graph that is connected and not a tree. The keyword `check` was designed
for this purpose
[[BR]][[BR]]
> * Why do you write "for startv in iter(e[0:2]):" instead of "for
startv in e[0:2]:" ? O_o
Because I want an iterator over the list returned by "e[0:2]". If I do
{{{
for startv in e[0:2]:
}}}
then I would first produce a list as requested by the slice operation
"e[0:2]" and I would need to wait for this production process to end.
After I have finished producing the list, I would then begin iterating
over items in the list. However, if I do
{{{
for startv in iter(e[0:2]):
}}}
then I don't need to first wait for the list "e[0:2]" to be produced
before I can start iterating over its elements. The function `iter()`
returns an iterator over an iterable such that the next item in the
iterable is only produced when necessary. Using iterators with `iter()` or
the `yield` statement can save some computation time.
[[BR]][[BR]]
> * Your algorithm returns a list of edges. Many of the methods I
first wrote were doing the same, and returned a list of edges instead of
returning a graph, to save the time required to build the graph
structure... After using them for a while, I noticed I was ALWAYS casting
the result to a graph, as it is, in the end, the best way to "represent a
set of pairs"... That's what it is made to be in the first place :-D
I've added the keyword `as_edges`. If the value of this boolean is true,
then output a list of weighted edges comprising a minimum spanning tree.
If `as_edges=False`, then output an actual tree that is a minimum spanning
tree.
[[BR]][[BR]]
--
Ticket URL: <http://trac.sagemath.org/sage_trac/ticket/10433#comment:8>
Sage <http://www.sagemath.org>
Sage: Creating a Viable Open Source Alternative to Magma, Maple, Mathematica,
and MATLAB
--
You received this message because you are subscribed to the Google Groups
"sage-trac" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to
[email protected].
For more options, visit this group at
http://groups.google.com/group/sage-trac?hl=en.