#19014: Tarjan Strongly Connected Components Algorithm
-------------------------------------+-------------------------------------
Reporter: borassi | Owner:
Type: enhancement | Status: needs_review
Priority: major | Milestone: sage-6.9
Component: graph theory | Resolution:
Keywords: Strongly | Merged in:
connected components, Tarjan | Reviewers:
Authors: Michele Borassi | Work issues:
Report Upstream: N/A | Commit:
Branch: | 8df530ab5c761501b10ba397e361889b4be91833
u/borassi/tarjan_strongly_connected_components_algorithm| Stopgaps:
Dependencies: |
-------------------------------------+-------------------------------------
Comment (by borassi):
Replying to [comment:5 dcoudert]:
> Hello,
>
> You have inverted the timing for GNM. I got
> {{{
> OLD algorithm
> sage: set_random_seed(0)
> sage: g = digraphs.RandomDirectedGNM(10000,30000)
> sage: %timeit g.strongly_connected_components()
> 1 loops, best of 3: 7.87 s per loop
> sage: g.add_cycle(g.vertices())
> sage: %timeit g.strongly_connected_components()
> 100 loops, best of 3: 21 ms per loop
> sage: g = digraphs.Circuit(100000)
> sage: %timeit g.strongly_connected_components()
> 10 loops, best of 3: 142 ms per loop
>
> NEW algorithm
> sage: set_random_seed(0)
> sage: g = digraphs.RandomDirectedGNM(10000,30000)
> sage: %timeit g.strongly_connected_components()
> 100 loops, best of 3: 21.5 ms per loop
> sage: g.add_cycle(g.vertices())
> sage: %timeit g.strongly_connected_components()
> 10 loops, best of 3: 22.1 ms per loop
> sage: g = digraphs.Circuit(100000)
> sage: %timeit g.strongly_connected_components()
> 10 loops, best of 3: 260 ms per loop
> }}}
>
> I agree that the gain for non-strongly connected digraphs is so high
that it is worth the small increase for strongly connected digraphs.
Ups, sorry! Clearly, the fastest method was the new one.
> I have some remarks:
> - Do we really need to have both `_tarjan_strongly_connected_components`
and `tarjan_strongly_connected_components` ? Same for other methods.
What's the motivation behind?
Yes, I think I need both methods. The point is that I want to implement a
fast algorithm for closeness centrality ![1], and the first step is
computing SCCs. In that implementation, I will first convert the graph to
a `static_sparse_graph`, and then I will have to run the SCC algorithm on
this graph, using the C routine `_tarjan_strongly_connected_components`. I
also need the other routine to access
`_tarjan_strongly_connected_components` from Python code (for instance, in
a .py file). Do you have better ideas?
> - What's the need for returning a dictionary? it is immediately
converted to a list of lists in `strongly_connected_components`. Do you
need this feature for something else?
You are right: I have moved all the code in `static_sparse_graph.pyx`, and
I removed the dictionary. In order to import the method to !DiGraph, I
used types.methodtype.
> - You have let another version of `strongly_connected_components` in
`static_sparse_graph.pyx`. Is this method used somewhere?
You are right, it is an old remnant of old routines, used only in a test.
I removed everything.
> - Have you tried using an array of short instead of a bitset for
`in_scc_stack` ? We don't have memory issue here so we can save some
operations, plus you already do that for `visited`
Hmmm, I should really understand better how bitset work. I used it only
because I thought it was faster than standard arrays. In any case, I
removed all bitsets from this code.
> - In `_strongly_connected_components_digraph` you have but you don't use
`MemoryAllocator`. Here also, you could use array of shorts instead of
bitset.
Since I removed the bitsets, now I use `MemoryAllocator`.
> - I don't really see the benefit of using numpy (I don't know it). Is it
just more convenient or also faster that other methods? You could use
something like
> {{{
> d = {i:list() for i in range(nscc)}
> for u,i in scc.iteritems():
> d[i].append(u)
> output = [d[i] for i in range(nscc)]
> }}}
Well, probably the problem is that I'm a theoretical guy, and if the
algorithm is linear I want a linear implementation. This is why I used an
array and not a dictionary (and, as far as I know, numpy is the only way
to build an array of Python objects). However, if we use a dictionary, the
running-time is almost the same, even when there are a lot of SCCs. I
deleted numpy and now I use dictionaries.
{{{
WITH NUMPY
sage: g = DiGraph(10000000)
sage: %timeit g.strongly_connected_components()
1 loops, best of 3: 9.44 s per loop
WITH DICTIONARIES
sage: g = DiGraph(10000000)
sage: %timeit g.strongly_connected_components()
1 loops, best of 3: 9.87 s per loop
}}}
> David.
[1] !http://arxiv.org/abs/1507.01490
--
Ticket URL: <http://trac.sagemath.org/ticket/19014#comment:11>
Sage <http://www.sagemath.org>
Sage: Creating a Viable Open Source Alternative to Magma, Maple, Mathematica,
and MATLAB
--
You received this message because you are subscribed to the Google Groups
"sage-trac" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/sage-trac.
For more options, visit https://groups.google.com/d/optout.