Re: [Neo4j] finding all shortest paths between one node and all other nodes in a large scale databse

2011-05-12 Thread
Thanks for all your response,

Here is the size of the grapth db:

NodesSize
-
100,000  97MB
200,000  182MB
300,000  267MB
...
5,000,000   expect 5GB

I've tried to use 5 virtual machines, each one has 2 cores and 1G memory,
Running 2 threads on each VM.

Nodes   Time
--
50,000  20mins
150,00040hrs

Obviously, when the amount of node increases, the spending time of executing
GraphAlgoFactory.shortestPath increases heavily and non-linearly.
It's hard to estimate how many machines I need if I want to deal with
5,000,000 nodes or even 10,000,000,
I think Map/Reduce will be one solution for me, and I'll try to use BFS
traversal which may reduce some duplicate procedures.


Nice to discuss with you,

Thanks.

2011/5/12 Michael Hunger michael.hun...@neotechnology.com

 Hey JueiTing,

 I'm not sure if Hadoop is needed here.
 What is the current performance characteristics for the shortest path you
 are using?

 You could take a decent machine and just fire up, e.g. blocks of 10k node
 pairs to a ThreadPoolExecutor with cores*2 threads.
 Each of those tasks only has to return the sum of the path lengths (and you
 know the block size) so you can sum the whole thing up onto a long and
 divide it by the
 number of pairs processed at the end?

 Perhaps instead of brute force looping one and a half times over all nodes
 it is perhaps better to do a breath first traversal over all nodes
 (regardless of relationships, just outgoing rels) and returning unique paths
 from the traverser
 and calculating the path lenghts from the start nodes to each _connected_
 node on the paths (start node - path[0..n].nodes[1..n])

 Cheers

 Michael

 Am 11.05.2011 um 21:12 schrieb Peter Neubauer:

  Hi JueiTing,
  I think this is a typical case for a massive Map/Reduce job. I am
 thinking
  of combining Hadoop works with replicas of the graph and then do the
  computation. I believe Paddy Fitzgerald has been working with these
  approaches and can give some feedback.
 
  Of course, given the size of the graph, that might prove a problem. OTOH,
 if
  there are no modifications during the computation, you could run the
  calculations on read-only databases from the same store. Would that work?
 
  Cheers,
 
  /peter neubauer
 
  GTalk:  neubauer.peter
  Skype   peter.neubauer
  Phone   +46 704 106975
  LinkedIn   http://www.linkedin.com/in/neubauer
  Twitter  http://twitter.com/peterneubauer
 
  http://www.neo4j.org   - Your high performance graph
 database.
  http://startupbootcamp.org/- Öresund - Innovation happens HERE.
  http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party.
 
 
  On Wed, May 11, 2011 at 12:47 AM, 翁瑞廷 wshir...@gmail.com wrote:
 
  Hi,
 
  I'm trying to use Neo4j graph database to store a
  large social network(more than 5,000,000 nodes) for academic research.
 
  I need to compute the separation degree(path length) between any two
 nodes
  in the graph then get the average degree of whole database.
  The solution I'm using use now is archieved by executing API
  GraphAlgoFactory.shortestPath,
  but it means I need to execute (n*(n-1))/2 times to get all path length.
 I
  don't think it's a very good idea :(
 
  So, I'm wondering that is there any function which can assign one node
 and
  whole DB as Input, and return the
  paths or only path lengths between the node and all other nodes in the
  graph
  as Output? Just like Dijkstra algorithm did.
  I know there is a Dijkstra function in the API, but it only return the
 path
  between two nodes.
  It'll be really helpful for me if there is any function can return all
 the
  result in one computation(even the whole result array computing by
  Dijkstra)
 
  Or does anyone has better idea to deal with this issue?
  I tried to implement Dijkstra by myself, but it was impossible to
 declare
  such big 2D array in the program...(OutOfMemoryException)
 
  If there is any advice, I'll really appreciate it.
 
  Thanks.
  ___
  Neo4j mailing list
  User@lists.neo4j.org
  https://lists.neo4j.org/mailman/listinfo/user
 
  ___
  Neo4j mailing list
  User@lists.neo4j.org
  https://lists.neo4j.org/mailman/listinfo/user

 ___
 Neo4j mailing list
 User@lists.neo4j.org
 https://lists.neo4j.org/mailman/listinfo/user

___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


[Neo4j] finding all shortest paths between one node and all other nodes in a large scale databse

2011-05-11 Thread
 Hi,

I'm trying to use Neo4j graph database to store a
large social network(more than 5,000,000 nodes) for academic research.

I need to compute the separation degree(path length) between any two nodes
in the graph then get the average degree of whole database.
The solution I'm using use now is archieved by executing API
GraphAlgoFactory.shortestPath,
but it means I need to execute (n*(n-1))/2 times to get all path length. I
don't think it's a very good idea :(

So, I'm wondering that is there any function which can assign one node and
whole DB as Input, and return the
paths or only path lengths between the node and all other nodes in the graph
as Output? Just like Dijkstra algorithm did.
I know there is a Dijkstra function in the API, but it only return the path
between two nodes.
It'll be really helpful for me if there is any function can return all the
result in one computation(even the whole result array computing by Dijkstra)

Or does anyone has better idea to deal with this issue?
I tried to implement Dijkstra by myself, but it was impossible to declare
such big 2D array in the program...(OutOfMemoryException)

If there is any advice, I'll really appreciate it.

Thanks.
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user