Re: [sage-trac] #19014: Tarjan Strongly Connected Components Algorithm

sage-trac Sun, 16 Aug 2015 10:42:52 -0700

#19014: Tarjan Strongly Connected Components Algorithm
-------------------------------------+-------------------------------------
       Reporter:  borassi            |        Owner:
           Type:  enhancement        |       Status:  needs_review
       Priority:  major              |    Milestone:  sage-6.9
      Component:  graph theory       |   Resolution:
       Keywords:  Strongly           |    Merged in:
  connected components, Tarjan       |    Reviewers:
        Authors:  Michele Borassi    |  Work issues:
Report Upstream:  N/A                |       Commit:
         Branch:                     |  0f855c6676d6f10a6f68b64b4c2be693c08eba91
  u/borassi/tarjan_strongly_connected_components_algorithm|     Stopgaps:
   Dependencies:                     |
-------------------------------------+-------------------------------------


Comment (by borassi):

 Hello!
 Let me try to address these problems!
 Thank you very much,
 Michele

 Replying to [comment:13 dcoudert]:

 > > > I have some remarks:
 > > > - Do we really need to have both
 `_tarjan_strongly_connected_components` and
 `tarjan_strongly_connected_components` ? Same for other methods. What's
 the motivation behind?
 > > >
 > > >
 > > >
 > > >
 > >
 > > Yes, I think I need both methods. The point is that I want to
 implement a fast algorithm for closeness centrality ![1], and the first
 step is computing SCCs. In that implementation, I will first convert the
 graph to a `static_sparse_graph`, and then I will have to run the SCC
 algorithm on this graph, using the C routine
 `_tarjan_strongly_connected_components`. I also need the other routine to
 access `_tarjan_strongly_connected_components` from Python code (for
 instance, in a .py file). Do you have better ideas?
 > >
 > >
 > >
 > >
 >
 > If you plan to call these methods directly from other cython files, it
 make sense to split the method in two pieces.
 > With Nathann we recently started to name such methods
 `compute_cool_stuff_C` rather than `_compute_cool_stuff`. I don't know if
 there is a general rule for that.

 Done!

 > > > - Have you tried using an array of short instead of a bitset for
 `in_scc_stack` ? We don't have memory issue here so we can save some
 operations, plus you already do that for `visited`
 > > >
 > > >
 > > >
 > > >
 > >
 > > Hmmm, I should really understand better how bitset work. I used it
 only because I thought it was faster than standard arrays. In any case, I
 removed all bitsets from this code.
 > >
 > >
 > >
 > >
 >
 > Well, bitsets are really fast, and to be honest we save very little
 using an array instead. The code is shorter. This is already something.

 Ok, let's leave it as it is!

 > > > - I don't really see the benefit of using numpy (I don't know it).
 Is it just more convenient or also faster that other methods? You could
 use something like
 > > > {{{
 > > > d = {i:list() for i in range(nscc)}
 > > > for u,i in scc.iteritems():
 > > > d[i].append(u)
 > > > output = [d[i] for i in range(nscc)]
 > > > }}}
 > > >
 > > >
 > > >
 > > >
 > >
 > > Well, probably the problem is that I'm a theoretical guy, and if the
 algorithm is linear I want a linear implementation. This is why I used an
 array and not a dictionary (and, as far as I know, numpy is the only way
 to build an array of Python objects). However, if we use a dictionary, the
 running-time is almost the same, even when there are a lot of SCCs. I
 deleted numpy and now I use dictionaries.
 > >
 > > {{{
 > > WITH NUMPY
 > > sage: g = DiGraph(10000000)
 > > sage: %timeit g.strongly_connected_components()
 > > 1 loops, best of 3: 9.44 s per loop
 > >
 > > WITH DICTIONARIES
 > > sage: g = DiGraph(10000000)
 > > sage: %timeit g.strongly_connected_components()
 > > 1 loops, best of 3: 9.87 s per loop
 > > }}}
 > >
 > >
 > >
 > >
 >
 > another option (don't know if it is faster or not)
 > {{{
 > cdef int i
 > cdef list output = list(list() for i in range(nscc))  # We cannot use []
 here
 > for i,v in enumerate(G.vertex_iterator()):
 > output[scc[i]].append(v)
 > }}}

 Cool! Not only it is simpler, it is also faster!

 {{{
 sage: g = DiGraph(10000000)
 sage: %timeit g.strongly_connected_components()
 1 loops, best of 3: 7.89 s per loop
 }}}
 > Method `strongly_connected_components_digraph`:
 > - is not currently called by
 `DiGraph().strongly_connected_components_digraph()` and will certainly
 never be
 > - seems more complicated than what it should. Indeed, as soon as you
 have a mapping `vertex->scc` you can simply iterate over the edges `(u,v)`
 of the input digraph and add edges `(scc[u],scc[v])` to the output
 digraph. I don't see the need for the
 `_strongly_connected_components_digraph` method and it is rather
 complicated.

 Hmmm, this is the hard part of my answer...

 The problem is that I will need a C algorithm working on
 `static_sparse_graphs` to compute the digraph of strongly connected
 components in linear time, with no parallel edge. I think this is the only
 way to do it: I know it is complicated, but I do not know any other way.
 Do you think I should use n dictionaries, to check for parallel edges?

 > General comments:
 > - `if visited[u] == 0:` -> `if not visited[u]`

 Done!

 > - `return [output, {v:scc[i] for i,v in enumerate(G.vertices())}]` ->
 `return output, {v:scc[i] for i,v in enumerate(G.vertices())}`  no need
 for explicitely creating a list

 Done!

 > - you have broken doctests caused by this ticket. I have not tracked the
 dependencies.

 The problem is the order in which strongly connected components are
 outputted in the list: we can change the results without any problem. In
 the following, I try to explain why.

 > {{{
 > sage -t --long src/sage/monoids/automatic_semigroup.py
 > **********************************************************************
 > File "src/sage/monoids/automatic_semigroup.py", line 135, in
 sage.monoids.automatic_semigroup.[wiki:AutomaticSemigroup]
 > Failed example:
 > map(sorted, M.j_classes())
 > Expected:
 > [[[], [2]], [[1, 1], [1]]]
 > Got:
 > [[[1], [1, 1]], [[], [2]]]

 The J-classes are sets of elements in a semigroups satisfying a specific
 property (for more information, see
 http://www.liafa.jussieu.fr/~jep/PDF/HandBook.pdf). The difference amont
 the results is the order in which the elements are listed, but since in
 sets the order does not count, both results are correct.

 > **********************************************************************
 > File "src/sage/monoids/automatic_semigroup.py", line 137, in
 sage.monoids.automatic_semigroup.[wiki:AutomaticSemigroup]
 > Failed example:
 > M.j_classes_of_idempotents()
 > Expected:
 > [[[]], [[1, 1]]]
 > Got:
 > [[[1, 1]], [[]]]

 Here, the difference is the order in which the two classes of idempotents
 are outputted. Since method j_classes_of_idempotents does not set a
 specific order, both results are correct.

 > **********************************************************************
 > File "src/sage/monoids/automatic_semigroup.py", line 139, in
 sage.monoids.automatic_semigroup.[wiki:AutomaticSemigroup]
 > Failed example:
 > M.j_transversal_of_idempotents()
 > Expected:
 > [[], [1, 1]]
 > Got:
 > [[1, 1], []]

 Here, we simply get one element from the classes found in the previous
 example: the order is maintained. Hence, both results are again correct.

 > **********************************************************************
 > 1 item had failures:
 > 3 of  80 in sage.monoids.automatic_semigroup.[wiki:AutomaticSemigroup]
 > [264 tests, 3 failures, 1.94 s]
 > sage -t --long src/sage/categories/finite_semigroups.py
 > **********************************************************************
 > File "src/sage/categories/finite_semigroups.py", line 119, in
 
sage.categories.finite_semigroups.[wiki:FiniteSemigroups].[wiki:ParentMethods].j_transversal_of_idempotents
 > Failed example:
 > sorted(S.j_transversal_of_idempotents())
 > Expected:
 > ['a', 'ab', 'ac', 'acb', 'b', 'c', 'cb']
 > Got:
 > ['a', 'ab', 'abc', 'ac', 'b', 'bc', 'c']
 > }}}

 Here, 'abc' is in the same class of idempotents as 'acb', and 'bc' is in
 the same class of idempotents as 'cb', as shown by the example before this
 one. Since this routine should output one idempotent per class, we are
 fine!

--
Ticket URL: <http://trac.sagemath.org/ticket/19014#comment:14>
Sage <http://www.sagemath.org>
Sage: Creating a Viable Open Source Alternative to Magma, Maple, Mathematica, 
and MATLAB

-- 
You received this message because you are subscribed to the Google Groups 
"sage-trac" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/sage-trac.
For more options, visit https://groups.google.com/d/optout.

Re: [sage-trac] #19014: Tarjan Strongly Connected Components Algorithm

Reply via email to