#10433: clean up the code for Kruskal's algorithm
-------------------------------+--------------------------------------------
Reporter: mvngu | Owner: jason, ncohen, rlm
Type: enhancement | Status: needs_work
Priority: major | Milestone: sage-4.6.2
Component: graph theory | Keywords:
Author: Minh Van Nguyen | Upstream: N/A
Reviewer: | Merged:
Work_issues: |
-------------------------------+--------------------------------------------
Changes (by ncohen):
* status: needs_review => needs_work
Comment:
Hello Minh !!!!
I just looked at your code... Here are several questions/remarks about it
:
* it is not possible to turn the sanity checks off from the
min_spanning_tree method. If a users wants to do so, he has to import the
kruskal method himself
* the min_spanning_tree algororithm is one of the most basic, and is
meant to be *fast*. That's also why it is very good to have it in Cython.
But when the algorithm is run, here is what happens : test of
connectedness --> BFS/DFS --> the whole graph is read. When this is done,
you are testing whether the graph is a tree : first is_connected is
called, then the edges are counted. 2 traversals at this moment. You could
save one by just counting the edges of the graph, as you just checked it
is connected ( that's exactly how the is_tree method is written ). Anyway,
the best would really be to run kruskal's algorithm, to theck afterwards
whether the result is connected.
* I was not able to produce a counterexample, but I think there's
something wrong in how you remove multiple edges by keeping the minimum.
You list all the multiple edges (and all the edges of a complete graph may
be doubled or tripled). You then remember for each vertex *one of the
neighbors* with which it has many edges. You are then selecting a neighbor
for each vertex in the worst case, which mean a linear number of
information. Then, for each vertex and the neighbor you selected, you
remember the edge of smallest weight, which is ok... But why should you
have removed all the double edges ? What happens if a vertex has multiple
edges in common with many of its neighbors ? O_o
* Personnal question : why is itemgetter(2) better than lambda x:x[1]
? Or is it just because there are no lambda functions available in Cython
?
* Whatever happens, you seem to need a copy of your graph... What
about dealing with multiple edges on the way ? Sorting all the edges
according to their weight, even with multiple ones. When you then try to
add the heavier copy of an edge you already tried, it is bound to be
refused anyway... Do you think it would be slower like that ?
* Why do you write "for startv in iter(e[0:2]):" instead of "for
startv in e[0:2]:" ? O_o
* Your algorithm returns a list of edges. Many of the methods I first
wrote were doing the same, and returned a list of edges instead of
returning a graph, to save the time required to build the graph
structure... After using them for a while, I noticed I was ALWAYS casting
the result to a graph, as it is, in the end, the best way to "represent a
set of pairs"... That's what it is made to be in the first place :-D
That's really up to you on that one. I wouldn't say returning a list of
edges has any special fault, but I am now writing my methods so that they
return graphs directly... Probably just a matter of taste.
* I had never seen this union-find method... It took me some time, but
that's a pretty nice trick :-)
Nathann
--
Ticket URL: <http://trac.sagemath.org/sage_trac/ticket/10433#comment:7>
Sage <http://www.sagemath.org>
Sage: Creating a Viable Open Source Alternative to Magma, Maple, Mathematica,
and MATLAB
--
You received this message because you are subscribed to the Google Groups
"sage-trac" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to
[email protected].
For more options, visit this group at
http://groups.google.com/group/sage-trac?hl=en.