#19031: New Algorithm for Top-K Closeness Centralities
-------------------------------------+-------------------------------------
       Reporter:  borassi            |        Owner:
           Type:  enhancement        |       Status:  needs_review
       Priority:  major              |    Milestone:  sage-6.9
      Component:  graph theory       |   Resolution:
       Keywords:  Closeness          |    Merged in:
  Centrality, top-k                  |    Reviewers:
        Authors:  Michele Borassi    |  Work issues:
Report Upstream:  N/A                |       Commit:
         Branch:                     |  2c738015b029f7c93017607a9a26bb5617ab683f
  u/borassi/new_algorithm_for_top_k_closeness_centralities|     Stopgaps:
   Dependencies:  #19007,#19014      |
-------------------------------------+-------------------------------------

Comment (by dcoudert):

 > > - What's the need for using type `uint64_t` instead of `int`? the
 corresponding variables are later cast to `int`.
 >
 > Well, it's a bit tricky. The point is that in the recursion for
 `reachU_scc` I might find values that are much bigger than `g.n`. Since I
 did not want to test at each step if this occurs, I used `uint64_t` in
 order to avoid overflow (these values are smaller than `g.n^2`). Then,
 only at the end, I use min (so that now the values do not overflow), and I
 set `reachU[i] = <int> min(reachU_scc[scc[i]], g.n)`. However, you are
 right with `reachL`, and I modified it.
 Is it mentioned somewhere?

 > > - You could use `maxscc = max(scc_sizes[i] for i in range(nscc)`
 >
 > I don't think so, `maxscc` is not the maximum size of a SCC, but it is
 the SCC with maximum size. In other words, your instruction outputs
 `scc_sizes[maxscc]`, not `maxscc`. Am I missing something?
 Right, it should be `maxscc = max((scc_sizes[i],i) for i in
 range(nscc))[1]`, but this is ugly.

 > > - the tests for the validity of the results are excessive. There is no
 need for the loop `range(1000)`. Testing for *one* randomly chosen graph
 with say `n=20` nodes and `random.randint(1, n*(n-1) / 2)` edges is
 sufficient.
 >
 > Done! I left it because I had to test it 100,000 times to find the last
 bug...
 That's a lot!
 However, I don't like the `n = random.randint(2,20)`. You could safely let
 `n=20`, no?

 > > - we can do that `for x in sorted_vert[:n]:` when `sorted_vert` is an
 array of ints ???
 >
 > Yep: !http://docs.cython.org/src/reference/language_basics.html
 That's funny.

 Can we always assume that `farness[v]>0`? This is for the `return
 sorted([(1.0/farness[v],...`.

 Instead of `G.vertices()[v]`, you should add a line like `cdef list V =
 G.vertices()` before the return statement and then use `V[v]`.


 About the behavior when `k>=n`. What you propose is in fact to return the
 centrality closeness (as a list of couples (centrality,vertex) ).
 - Is your method faster than `centrality_closeness` in this case?
 - This behavior should be documented for the parameter `k`

 That's all for now ;)

 David.

--
Ticket URL: <http://trac.sagemath.org/ticket/19031#comment:9>
Sage <http://www.sagemath.org>
Sage: Creating a Viable Open Source Alternative to Magma, Maple, Mathematica, 
and MATLAB

-- 
You received this message because you are subscribed to the Google Groups 
"sage-trac" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/sage-trac.
For more options, visit https://groups.google.com/d/optout.

Reply via email to