Hi,
I have looked through the archives and tried to find more information about how
to sample the nodes of a Neo4j instance.
It seems that one way to go is to iterate over 'getAllNodes' and keep
collecting nodes until you are happy with the sample size. However, this
approach is not random: you just get the first N nodes of the 'getAllNodes'
iterator. Is there an efficient way to draw a random sample of N nodes? (I
believe one way is to iterate through _all_ results from 'getAllNodes' and pick
among them randomly, but this is not efficient and scales pretty badly.)
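For reference, the "pick among them randomly" idea does not require holding all nodes in memory. A standard technique for this is reservoir sampling (Algorithm R). It is swapped in here purely as an illustration and is not a Neo4j API. Below is a minimal Java sketch, assuming the nodes are available as a plain java.util.Iterator. It still makes one full pass over the data (time grows with the total node count), but it keeps only N items in memory and gives every item the same probability of ending up in the sample. The class name ReservoirSampler is hypothetical.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.SplittableRandom;

// Hypothetical helper class: uniform sampling without knowing the total count.
public final class ReservoirSampler {
    private ReservoirSampler() {}

    // Returns a uniform random sample of at most k items from `items`,
    // using a single pass (Algorithm R). Memory use is O(k).
    public static <T> List<T> sample(Iterator<T> items, int k, SplittableRandom rng) {
        if (k < 0) {
            throw new IllegalArgumentException("k must be non-negative");
        }
        List<T> reservoir = new ArrayList<>(k);
        long seen = 0;
        while (items.hasNext()) {
            T item = items.next();
            seen++;
            if (reservoir.size() < k) {
                // The first k items go straight into the reservoir.
                reservoir.add(item);
            } else {
                // Keep the seen-th item with probability k / seen: draw j
                // uniformly from [0, seen) and overwrite slot j if j < k.
                long j = rng.nextLong(seen);
                if (j < k) {
                    reservoir.set((int) j, item);
                }
            }
        }
        return reservoir;
    }
}
```

Inside a plugin, this might be called as something like ReservoirSampler.sample(graphDb.getAllNodes().iterator(), 1000, new SplittableRandom()), assuming getAllNodes() returns an Iterable of nodes in the Neo4j version at hand. If even a single full scan is too slow, an alternative is to probe random node IDs and skip the ones that do not exist. However, that approach only yields a uniform sample if IDs are reasonably densely allocated, which is an assumption you would need to verify.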
In case it is relevant, the sample will be used as input to a clustering
algorithm of sorts, which will try to group nodes of similar semantic type into
separate clusters (e.g., in the IMDb case, it should be able to tell which nodes
are movies and which are actors).
I intend to write my own server plugin to do this and then fetch the results
from another application over the REST API. I suspect this could be rather
slow, though. Are there any faster alternatives for transferring the data?
Thanks!
Regards,
Anders Lindström
_______________________________________________
Neo4j mailing list
https://lists.neo4j.org/mailman/listinfo/user