Hi,
I have looked through the archives and tried to find more information about how 
to sample the nodes of a Neo4j instance.
It seems that one way to go is to iterate using 'getAllNodes' and keep 
sampling until you are happy with the sample size. However, the limitation 
of this approach is that it is not random -- you just get the first N nodes 
of the 'getAllNodes' iterator. Is there an efficient way to do a random 
sampling of N nodes? (I believe one way is to iterate through _all_ results 
from 'getAllNodes' and pick among these randomly -- but this is not 
efficient and scales pretty badly.)
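To make the full-scan idea concrete: as far as I can tell it amounts to 
reservoir sampling, which needs one pass over the iterator and only keeps N 
items in memory. Below is a minimal sketch in Python, not Neo4j code. The 
function name 'reservoir_sample' and the plain iterable standing in for the 
'getAllNodes' iterator are my own assumptions for illustration. It still 
visits every node, so it does not solve the scaling problem.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(items: Iterable[T], n: int,
                     rng: Optional[random.Random] = None) -> List[T]:
    """Return a uniform random sample of up to n items in a single pass.

    `items` is a hypothetical stand-in for the node iterator; any iterable works.
    """
    if rng is None:
        rng = random.Random()
    reservoir: List[T] = []
    for i, item in enumerate(items):
        if i < n:
            # Fill the reservoir with the first n items.
            reservoir.append(item)
        else:
            # Keep the new item with probability n / (i + 1),
            # replacing a uniformly chosen slot.
            j = rng.randint(0, i)  # both bounds inclusive
            if j < n:
                reservoir[j] = item
    return reservoir


# Example: sample 10 "node ids" from a stream of one million.
sample = reservoir_sample(range(1_000_000), 10)
```

If the stream holds fewer than N items, the sketch simply returns all of them.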
In case it is relevant: the sample will be used as input to a kind of 
clustering algorithm, which will try to group nodes of the same semantic 
type into separate clusters (e.g., in the IMDb case, it can distinguish 
which nodes are movies and which are actors).
I intend to write my own server plugin to do this and then get the results from 
another application over the REST API. I suspect this could be rather slow, 
though. Are there any faster alternatives for transferring the data?
Thanks!
Regards,
Anders Lindström
                                          
_______________________________________________
Neo4j mailing list
[email protected]
https://lists.neo4j.org/mailman/listinfo/user