#19031: New Algorithm for Top-K Closeness Centralities
-------------------------------------+-------------------------------------
       Reporter:  borassi            |        Owner:
           Type:  enhancement        |       Status:  needs_review
       Priority:  major              |    Milestone:  sage-6.9
      Component:  graph theory       |   Resolution:
       Keywords:  Closeness          |    Merged in:
  Centrality, top-k                  |    Reviewers:
        Authors:  Michele Borassi    |  Work issues:
Report Upstream:  N/A                |       Commit:
         Branch:                     |  06d2c00fbfdf6f0166834b68e5e9b0929c5abb89
  u/borassi/new_algorithm_for_top_k_closeness_centralities|     Stopgaps:
   Dependencies:  #19007,#19014      |
-------------------------------------+-------------------------------------

Comment (by borassi):

 > > > - What's the need for using type `uint64_t` instead of `int`? the
 corresponding variables are later cast to `int`.
 > > >
 > > >
 > > >
 > > >
 > >
 > > Well, it's a bit tricky. The point is that in the recursion for
 `reachU_scc` I might find values that are much bigger than `g.n`. Since I
 did not want to test at each step if this occurs, I used `uint64_t` in
 order to avoid overflow (these values are smaller than `g.n^2`). Then,
 only at the end, I use min (so that now the values do not overflow), and I
 set `reachU[i] = <int> min(reachU_scc[scc[i]], g.n)`. However, you are
 right with `reachL`, and I modified it.
 > >
 > >
 > >
 > >
 > Is it mentioned somewhere?
 >

 I added two inline comments to explain it.

 > > > - the tests for the validity of the results are excessive. There is
 no need for the loop `range(1000)`. Testing for *one* randomly chosen
 graph with say `n=20` nodes and `random.randint(1, n*(n-1) / 2)` edges is
 sufficient.
 > > >
 > > >
 > > >
 > > >
 > >
 > > Done! I left it because I had to test it 100,000 times to find the
 last bug...
 > >
 > >
 > >
 > >
 > That's a lot!
 > However, I don't like the `n = random.randint(2,20)`. You could safely
 let `n=20`, no?

 Done!

 > Can we always assume that farness[v]>0? This is for the return
 sorted([(1.0/farness[v],....

 Yep.

 The only instructions that modify the farness are `farness[x] = n` (if a
 visit is stopped) and `farness[x] = ((<double> f) * (n-1)) /
 (<double>(nd-1) * (nd-1))`, where `f` is a (positive) sum of distances,
 `nd-1` is positive because `nd` is the number of reachable vertices from a
 vertex `x`. If `nd==1`, it means that the out-degree of `x` is 0, and we
 skip the analysis of `x`, so this case never occurs. Then, `n>=nd`, and
 hence also `n-1` is positive.

 > Instead of `G.vertices()[v]`, you should add a line like `cdef list V =
 G.vertices()` before the return statement and then use `V[v]`.

 Done!

 > About the behavior when `k>=n`. What you propose is in fact to return
 the centrality closeness (as a list of couples (centrality,vertex) ).
 > - Is your method faster than `centrality_closeness` in this case?

 No, it should be the same (actually, a bit slower because we make some
 computations during the BFSes).

 > - This behavior should be documented for the parameter `k`

 Done!

--
Ticket URL: <http://trac.sagemath.org/ticket/19031#comment:11>
Sage <http://www.sagemath.org>
Sage: Creating a Viable Open Source Alternative to Magma, Maple, Mathematica, 
and MATLAB

-- 
You received this message because you are subscribed to the Google Groups 
"sage-trac" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/sage-trac.
For more options, visit https://groups.google.com/d/optout.

Reply via email to