#19014: Tarjan Strongly Connected Components Algorithm
-------------------------------------+-------------------------------------
       Reporter:  borassi            |        Owner:
           Type:  enhancement        |       Status:  needs_review
       Priority:  major              |    Milestone:  sage-6.9
      Component:  graph theory       |   Resolution:
       Keywords:  Strongly           |    Merged in:
  connected components, Tarjan       |    Reviewers:
        Authors:  Michele Borassi    |  Work issues:
Report Upstream:  N/A                |       Commit:
         Branch:                     |  8df530ab5c761501b10ba397e361889b4be91833
  u/borassi/tarjan_strongly_connected_components_algorithm|     Stopgaps:
   Dependencies:                     |
-------------------------------------+-------------------------------------

Comment (by borassi):

 Replying to [comment:5 dcoudert]:

 > Hello,
 >
 > You have inverted the timing for GNM. I got
 > {{{
 > OLD algorithm
 > sage: set_random_seed(0)
 > sage: g = digraphs.RandomDirectedGNM(10000,30000)
 > sage: %timeit g.strongly_connected_components()
 > 1 loops, best of 3: 7.87 s per loop
 > sage: g.add_cycle(g.vertices())
 > sage: %timeit g.strongly_connected_components()
 > 100 loops, best of 3: 21 ms per loop
 > sage: g = digraphs.Circuit(100000)
 > sage: %timeit g.strongly_connected_components()
 > 10 loops, best of 3: 142 ms per loop
 >
 > NEW algorithm
 > sage: set_random_seed(0)
 > sage: g = digraphs.RandomDirectedGNM(10000,30000)
 > sage: %timeit g.strongly_connected_components()
 > 100 loops, best of 3: 21.5 ms per loop
 > sage: g.add_cycle(g.vertices())
 > sage: %timeit g.strongly_connected_components()
 > 10 loops, best of 3: 22.1 ms per loop
 > sage: g = digraphs.Circuit(100000)
 > sage: %timeit g.strongly_connected_components()
 > 10 loops, best of 3: 260 ms per loop
 > }}}
 >
 > I agree that the gain for non-strongly connected digraphs is so high
 that it is worth the small increase for strongly connected digraphs.

 Ups, sorry! Clearly, the fastest method was the new one.

 > I have some remarks:
 > - Do we really need to have both `_tarjan_strongly_connected_components`
 and `tarjan_strongly_connected_components` ? Same for other methods.
 What's the motivation behind?

 Yes, I think I need both methods. The point is that I want to implement a
 fast algorithm for closeness centrality ![1], and the first step is
 computing SCCs. In that implementation, I will first convert the graph to
 a `static_sparse_graph`, and then I will have to run the SCC algorithm on
 this graph, using the C routine `_tarjan_strongly_connected_components`. I
 also need the other routine to access
 `_tarjan_strongly_connected_components` from Python code (for instance, in
 a .py file). Do you have better ideas?

 > - What's the need for returning a dictionary? it is immediately
 converted to a list of lists in `strongly_connected_components`. Do you
 need this feature for something else?

 You are right: I have moved all the code in `static_sparse_graph.pyx`, and
 I removed the dictionary. In order to import the method to !DiGraph, I
 used types.methodtype.

 > - You have let another version of `strongly_connected_components` in
 `static_sparse_graph.pyx`. Is this method used somewhere?

 You are right, it is an old remnant of old routines, used only in a test.
 I removed everything.

 > - Have you tried using an array of short instead of a bitset for
 `in_scc_stack` ? We don't have memory issue here so we can save some
 operations, plus you already do that for `visited`

 Hmmm, I should really understand better how bitset work. I used it only
 because I thought it was faster than standard arrays. In any case, I
 removed all bitsets from this code.

 > - In `_strongly_connected_components_digraph` you have but you don't use
 `MemoryAllocator`. Here also, you could use array of shorts instead of
 bitset.

 Since I removed the bitsets, now I use `MemoryAllocator`.

 > - I don't really see the benefit of using numpy (I don't know it). Is it
 just more convenient or also faster that other methods? You could use
 something like
 > {{{
 > d = {i:list() for i in range(nscc)}
 > for u,i in scc.iteritems():
 > d[i].append(u)
 > output = [d[i] for i in range(nscc)]
 > }}}

 Well, probably the problem is that I'm a theoretical guy, and if the
 algorithm is linear I want a linear implementation. This is why I used an
 array and not a dictionary (and, as far as I know, numpy is the only way
 to build an array of Python objects). However, if we use a dictionary, the
 running-time is almost the same, even when there are a lot of SCCs. I
 deleted numpy and now I use dictionaries.

 {{{
 WITH NUMPY
 sage: g = DiGraph(10000000)
 sage: %timeit g.strongly_connected_components()
 1 loops, best of 3: 9.44 s per loop

 WITH DICTIONARIES
 sage: g = DiGraph(10000000)
 sage: %timeit g.strongly_connected_components()
 1 loops, best of 3: 9.87 s per loop
 }}}
 > David.

 [1] !http://arxiv.org/abs/1507.01490

--
Ticket URL: <http://trac.sagemath.org/ticket/19014#comment:11>
Sage <http://www.sagemath.org>
Sage: Creating a Viable Open Source Alternative to Magma, Maple, Mathematica, 
and MATLAB

-- 
You received this message because you are subscribed to the Google Groups 
"sage-trac" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/sage-trac.
For more options, visit https://groups.google.com/d/optout.

Reply via email to