#19014: Tarjan Strongly Connected Components Algorithm
-------------------------------------+-------------------------------------
Reporter: borassi | Owner:
Type: enhancement | Status: needs_review
Priority: major | Milestone: sage-6.9
Component: graph theory | Resolution:
Keywords: Strongly | Merged in:
connected components, Tarjan | Reviewers:
Authors: Michele Borassi | Work issues:
Report Upstream: N/A | Commit:
Branch: | 0f855c6676d6f10a6f68b64b4c2be693c08eba91
u/borassi/tarjan_strongly_connected_components_algorithm| Stopgaps:
Dependencies: |
-------------------------------------+-------------------------------------
Comment (by borassi):
Hello!
Let me try to address these problems!
Thank you very much,
Michele
Replying to [comment:13 dcoudert]:
> > > I have some remarks:
> > > - Do we really need to have both
`_tarjan_strongly_connected_components` and
`tarjan_strongly_connected_components` ? Same for other methods. What's
the motivation behind?
> > >
> > >
> > >
> > >
> >
> > Yes, I think I need both methods. The point is that I want to
implement a fast algorithm for closeness centrality ![1], and the first
step is computing SCCs. In that implementation, I will first convert the
graph to a `static_sparse_graph`, and then I will have to run the SCC
algorithm on this graph, using the C routine
`_tarjan_strongly_connected_components`. I also need the other routine to
access `_tarjan_strongly_connected_components` from Python code (for
instance, in a .py file). Do you have better ideas?
> >
> >
> >
> >
>
> If you plan to call these methods directly from other cython files, it
make sense to split the method in two pieces.
> With Nathann we recently started to name such methods
`compute_cool_stuff_C` rather than `_compute_cool_stuff`. I don't know if
there is a general rule for that.
Done!
> > > - Have you tried using an array of short instead of a bitset for
`in_scc_stack` ? We don't have memory issue here so we can save some
operations, plus you already do that for `visited`
> > >
> > >
> > >
> > >
> >
> > Hmmm, I should really understand better how bitset work. I used it
only because I thought it was faster than standard arrays. In any case, I
removed all bitsets from this code.
> >
> >
> >
> >
>
> Well, bitsets are really fast, and to be honest we save very little
using an array instead. The code is shorter. This is already something.
Ok, let's leave it as it is!
> > > - I don't really see the benefit of using numpy (I don't know it).
Is it just more convenient or also faster that other methods? You could
use something like
> > > {{{
> > > d = {i:list() for i in range(nscc)}
> > > for u,i in scc.iteritems():
> > > d[i].append(u)
> > > output = [d[i] for i in range(nscc)]
> > > }}}
> > >
> > >
> > >
> > >
> >
> > Well, probably the problem is that I'm a theoretical guy, and if the
algorithm is linear I want a linear implementation. This is why I used an
array and not a dictionary (and, as far as I know, numpy is the only way
to build an array of Python objects). However, if we use a dictionary, the
running-time is almost the same, even when there are a lot of SCCs. I
deleted numpy and now I use dictionaries.
> >
> > {{{
> > WITH NUMPY
> > sage: g = DiGraph(10000000)
> > sage: %timeit g.strongly_connected_components()
> > 1 loops, best of 3: 9.44 s per loop
> >
> > WITH DICTIONARIES
> > sage: g = DiGraph(10000000)
> > sage: %timeit g.strongly_connected_components()
> > 1 loops, best of 3: 9.87 s per loop
> > }}}
> >
> >
> >
> >
>
> another option (don't know if it is faster or not)
> {{{
> cdef int i
> cdef list output = list(list() for i in range(nscc)) # We cannot use []
here
> for i,v in enumerate(G.vertex_iterator()):
> output[scc[i]].append(v)
> }}}
Cool! Not only it is simpler, it is also faster!
{{{
sage: g = DiGraph(10000000)
sage: %timeit g.strongly_connected_components()
1 loops, best of 3: 7.89 s per loop
}}}
> Method `strongly_connected_components_digraph`:
> - is not currently called by
`DiGraph().strongly_connected_components_digraph()` and will certainly
never be
> - seems more complicated than what it should. Indeed, as soon as you
have a mapping `vertex->scc` you can simply iterate over the edges `(u,v)`
of the input digraph and add edges `(scc[u],scc[v])` to the output
digraph. I don't see the need for the
`_strongly_connected_components_digraph` method and it is rather
complicated.
Hmmm, this is the hard part of my answer...
The problem is that I will need a C algorithm working on
`static_sparse_graphs` to compute the digraph of strongly connected
components in linear time, with no parallel edge. I think this is the only
way to do it: I know it is complicated, but I do not know any other way.
Do you think I should use n dictionaries, to check for parallel edges?
> General comments:
> - `if visited[u] == 0:` -> `if not visited[u]`
Done!
> - `return [output, {v:scc[i] for i,v in enumerate(G.vertices())}]` ->
`return output, {v:scc[i] for i,v in enumerate(G.vertices())}` no need
for explicitely creating a list
Done!
> - you have broken doctests caused by this ticket. I have not tracked the
dependencies.
The problem is the order in which strongly connected components are
outputted in the list: we can change the results without any problem. In
the following, I try to explain why.
> {{{
> sage -t --long src/sage/monoids/automatic_semigroup.py
> **********************************************************************
> File "src/sage/monoids/automatic_semigroup.py", line 135, in
sage.monoids.automatic_semigroup.[wiki:AutomaticSemigroup]
> Failed example:
> map(sorted, M.j_classes())
> Expected:
> [[[], [2]], [[1, 1], [1]]]
> Got:
> [[[1], [1, 1]], [[], [2]]]
The J-classes are sets of elements in a semigroups satisfying a specific
property (for more information, see
http://www.liafa.jussieu.fr/~jep/PDF/HandBook.pdf). The difference amont
the results is the order in which the elements are listed, but since in
sets the order does not count, both results are correct.
> **********************************************************************
> File "src/sage/monoids/automatic_semigroup.py", line 137, in
sage.monoids.automatic_semigroup.[wiki:AutomaticSemigroup]
> Failed example:
> M.j_classes_of_idempotents()
> Expected:
> [[[]], [[1, 1]]]
> Got:
> [[[1, 1]], [[]]]
Here, the difference is the order in which the two classes of idempotents
are outputted. Since method j_classes_of_idempotents does not set a
specific order, both results are correct.
> **********************************************************************
> File "src/sage/monoids/automatic_semigroup.py", line 139, in
sage.monoids.automatic_semigroup.[wiki:AutomaticSemigroup]
> Failed example:
> M.j_transversal_of_idempotents()
> Expected:
> [[], [1, 1]]
> Got:
> [[1, 1], []]
Here, we simply get one element from the classes found in the previous
example: the order is maintained. Hence, both results are again correct.
> **********************************************************************
> 1 item had failures:
> 3 of 80 in sage.monoids.automatic_semigroup.[wiki:AutomaticSemigroup]
> [264 tests, 3 failures, 1.94 s]
> sage -t --long src/sage/categories/finite_semigroups.py
> **********************************************************************
> File "src/sage/categories/finite_semigroups.py", line 119, in
sage.categories.finite_semigroups.[wiki:FiniteSemigroups].[wiki:ParentMethods].j_transversal_of_idempotents
> Failed example:
> sorted(S.j_transversal_of_idempotents())
> Expected:
> ['a', 'ab', 'ac', 'acb', 'b', 'c', 'cb']
> Got:
> ['a', 'ab', 'abc', 'ac', 'b', 'bc', 'c']
> }}}
Here, 'abc' is in the same class of idempotents as 'acb', and 'bc' is in
the same class of idempotents as 'cb', as shown by the example before this
one. Since this routine should output one idempotent per class, we are
fine!
--
Ticket URL: <http://trac.sagemath.org/ticket/19014#comment:14>
Sage <http://www.sagemath.org>
Sage: Creating a Viable Open Source Alternative to Magma, Maple, Mathematica,
and MATLAB
--
You received this message because you are subscribed to the Google Groups
"sage-trac" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/sage-trac.
For more options, visit https://groups.google.com/d/optout.