#10433: clean up the code for Kruskal's algorithm
-------------------------------+--------------------------------------------
   Reporter:  mvngu            |       Owner:  jason, ncohen, rlm
       Type:  enhancement      |      Status:  needs_review      
   Priority:  major            |   Milestone:  sage-4.6.2        
  Component:  graph theory     |    Keywords:                    
     Author:  Minh Van Nguyen  |    Upstream:  N/A               
   Reviewer:                   |      Merged:                    
Work_issues:                   |  
-------------------------------+--------------------------------------------
Changes (by mvngu):

  * status:  needs_work => needs_review


Comment:

 Replying to [comment:7 ncohen]:
 >     * it is not possible to turn the sanity checks off from the
 min_spanning_tree method. If a users wants to do so, he has to import the
 kruskal method himself

 I've added a new parameter "check" to the method `min_spanning_tree()`.
 Wherever appropriate, this parameter will be passed on to any function
 used in computing a minimum spanning tree.
 [[BR]][[BR]]


 >     * the min_spanning_tree algororithm is one of the most basic, and is
 meant to be *fast*. That's also why it is very good to have it in Cython.
 But when the algorithm is run, here is what happens : test of
 connectedness --> BFS/DFS --> the whole graph is read. When this is done,
 you are testing whether the graph is a tree : first is_connected is
 called, then the edges are counted. 2 traversals at this moment. You could
 save one by just counting the edges of the graph, as you just checked it
 is connected ( that's exactly how the is_tree method is written ). Anyway,
 the best would really be to run kruskal's algorithm, to theck afterwards
 whether the result is connected.

 You're right. That's two computation of connectedness. This is now fixed
 to only do one computation of connectedness and afterward to check for
 tree.
 [[BR]][[BR]]


 >     * I was not able to produce a counterexample, but I think there's
 something wrong in how you remove multiple edges by keeping the minimum.
 You list all the multiple edges (and all the edges of a complete graph may
 be doubled or tripled). You then remember for each vertex *one of the
 neighbors* with which it has many edges. You are then selecting a neighbor
 for each vertex in the worst case, which mean a linear number of
 information. Then, for each vertex and the neighbor you selected, you
 remember the edge of smallest weight, which is ok... But why should you
 have removed all the double edges ? What happens if a vertex has multiple
 edges in common with many of its neighbors ? O_o

 Here is an explanation of the code block on retaining only the minimum
 weighted edge among all weighted multiple edges with the same start and
 edge vertices. Let G be an undirected, weighted multigraph. If there are
 multiple edges from u to v, retain only the start and end vertices of such
 edges; at this point we don't care about the edge weights. Let a and b be
 the start and end vertices, respectively, of a weighted edge `(a, b, w)`
 having weight w. Then there are multiple weighted edges from a to b if and
 only if the dictionary `uniqE` has the key/value pair `(a, b)`. Let `(u,
 v)` be a key/value pair in `uniqE`. Then there are multiple weighted edges
 from u to v. Let W be a list of all edge weights of multiple edges from u
 to v, sorted in nondecreasing order. If w is the first element in W, then
 `(u, v, w)` is an edge of minimum weight (there may be several edges of
 minimum weight) among all weighted edges from u to v. If `i >= 2` is the
 i-th element in W, delete the multiple weighted edge `(u, v, i)`. The only
 remaining weighted edge is `(u, v, w)` with w being the first element of
 W.
 [[BR]][[BR]]


 >     * Personnal question : why is itemgetter(2) better than lambda
 x:x[1] ? Or is it just because there are no lambda functions available in
 Cython ?

 Specifically, I'm using the idiom:

 {{{
 from operator import itemgetter
 sorted(iterable, key=itemgetter(n))
 }}}

 It is more or less equivalent to using a lambda, but there are clear
 documentation on the above sorting idiom. See
 [http://docs.python.org/library/operator.html#operator.itemgetter this
 page] in the Python library. And this
 [http://wiki.python.org/moin/HowTo/Sorting/ other page] from the Python
 wiki, especially the section "Operator Module Functions". Also note that
 what you're suggesting is using a closure, which is not supported in the
 version of Cython currently distributed with Sage 4.6.1.alpha3.
 [[BR]][[BR]]


 >     * Whatever happens, you seem to need a copy of your graph... What
 about dealing with multiple edges on the way ? Sorting all the edges
 according to their weight, even with multiple ones. When you then try to
 add the heavier copy of an edge you already tried, it is bound to be
 refused anyway... Do you think it would be slower like that ?

 I work with the assumption that we can only compute minimum spanning trees
 for undirected simple graphs. Let g be a copy of the input graph and
 suppose g is an unweighted multigraph. By the working assumption, we
 merely convert g to be a simple graph and then run the resulting simple
 graph through Kruskal's algorithm. I think working directly with the input
 graph, and not a copy of this graph, would result in more complicated
 code. If you want speed, make sure your input graph is a simple unweighted
 graph that is connected and not a tree. The keyword `check` was designed
 for this purpose
 [[BR]][[BR]]

 >     * Why do you write "for startv in iter(e[0:2]):" instead of "for
 startv in e[0:2]:" ? O_o

 Because I want an iterator over the list returned by "e[0:2]". If I do

 {{{
 for startv in e[0:2]:
 }}}

 then I would first produce a list as requested by the slice operation
 "e[0:2]" and I would need to wait for this production process to end.
 After I have finished producing the list, I would then begin iterating
 over items in the list. However, if I do

 {{{
 for startv in iter(e[0:2]):
 }}}

 then I don't need to first wait for the list "e[0:2]" to be produced
 before I can start iterating over its elements. The function `iter()`
 returns an iterator over an iterable such that the next item in the
 iterable is only produced when necessary. Using iterators with `iter()` or
 the `yield` statement can save some computation time.
 [[BR]][[BR]]


 >     * Your algorithm returns a list of edges. Many of the methods I
 first wrote were doing the same, and returned a list of edges instead of
 returning a graph, to save the time required to build the graph
 structure... After using them for a while, I noticed I was ALWAYS casting
 the result to a graph, as it is, in the end, the best way to "represent a
 set of pairs"... That's what it is made to be in the first place :-D

 I've added the keyword `as_edges`. If the value of this boolean is true,
 then output a list of weighted edges comprising a minimum spanning tree.
 If `as_edges=False`, then output an actual tree that is a minimum spanning
 tree.
 [[BR]][[BR]]

-- 
Ticket URL: <http://trac.sagemath.org/sage_trac/ticket/10433#comment:8>
Sage <http://www.sagemath.org>
Sage: Creating a Viable Open Source Alternative to Magma, Maple, Mathematica, 
and MATLAB

-- 
You received this message because you are subscribed to the Google Groups 
"sage-trac" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/sage-trac?hl=en.

Reply via email to