#19031: New Algorithm for Top-K Closeness Centralities
-------------------------------------+-------------------------------------
Reporter: borassi | Owner:
Type: enhancement | Status: needs_review
Priority: major | Milestone: sage-6.9
Component: graph theory | Resolution:
Keywords: Closeness | Merged in:
Centrality, top-k | Reviewers:
Authors: Michele Borassi | Work issues:
Report Upstream: N/A | Commit:
Branch: | 2c738015b029f7c93017607a9a26bb5617ab683f
u/borassi/new_algorithm_for_top_k_closeness_centralities| Stopgaps:
Dependencies: #19007,#19014 |
-------------------------------------+-------------------------------------
Comment (by dcoudert):
> > - What's the need for using type `uint64_t` instead of `int`? the
corresponding variables are later cast to `int`.
>
> Well, it's a bit tricky. The point is that in the recursion for
`reachU_scc` I might find values that are much bigger than `g.n`. Since I
did not want to test at each step if this occurs, I used `uint64_t` in
order to avoid overflow (these values are smaller than `g.n^2`). Then,
only at the end, I use min (so that now the values do not overflow), and I
set `reachU[i] = <int> min(reachU_scc[scc[i]], g.n)`. However, you are
right with `reachL`, and I modified it.
Is it mentioned somewhere?
> > - You could use `maxscc = max(scc_sizes[i] for i in range(nscc)`
>
> I don't think so, `maxscc` is not the maximum size of a SCC, but it is
the SCC with maximum size. In other words, your instruction outputs
`scc_sizes[maxscc]`, not `maxscc`. Am I missing something?
Right, it should be `maxscc = max((scc_sizes[i],i) for i in
range(nscc))[1]`, but this is ugly.
> > - the tests for the validity of the results are excessive. There is no
need for the loop `range(1000)`. Testing for *one* randomly chosen graph
with say `n=20` nodes and `random.randint(1, n*(n-1) / 2)` edges is
sufficient.
>
> Done! I left it because I had to test it 100,000 times to find the last
bug...
That's a lot!
However, I don't like the `n = random.randint(2,20)`. You could safely let
`n=20`, no?
> > - we can do that `for x in sorted_vert[:n]:` when `sorted_vert` is an
array of ints ???
>
> Yep: !http://docs.cython.org/src/reference/language_basics.html
That's funny.
Can we always assume that `farness[v]>0`? This is for the `return
sorted([(1.0/farness[v],...`.
Instead of `G.vertices()[v]`, you should add a line like `cdef list V =
G.vertices()` before the return statement and then use `V[v]`.
About the behavior when `k>=n`. What you propose is in fact to return the
centrality closeness (as a list of couples (centrality,vertex) ).
- Is your method faster than `centrality_closeness` in this case?
- This behavior should be documented for the parameter `k`
That's all for now ;)
David.
--
Ticket URL: <http://trac.sagemath.org/ticket/19031#comment:9>
Sage <http://www.sagemath.org>
Sage: Creating a Viable Open Source Alternative to Magma, Maple, Mathematica,
and MATLAB
--
You received this message because you are subscribed to the Google Groups
"sage-trac" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/sage-trac.
For more options, visit https://groups.google.com/d/optout.