Re: [Neo4j] finding all shortest paths between one node and all other nodes in a large scale databse

jacopo . farina Thu, 12 May 2011 03:24:18 -0700

Hi,

I had the same problem and solved it by assigning a distance label to any
node.


The procedure is:

1-take the starting node N and add it to a set A, define the set B

2-set the value d=1

3-for any node M in A:

 3.1 set the label "distance" of M to d

3.2 for any node X which is connected to M and doesn't have the label
distance add X to the set B

4-increase d of 1

5-empty A and put all the entries of B in it, then empty B

 6-go to step 3 until A is empty


I have an implementation of it, but it's a little bit old (I created it for
a Neo4j 0.1 graph)

--------- Original Message --------

 Da: "Neo4j user discussions" &lt;[email protected]&gt;

 To: "Neo4j user discussions" &lt;[email protected]&gt;

 Oggetto: Re: [Neo4j] finding all shortest paths between one node and all
other nodes in a large scale databse

 Data: 12/05/11 09:30

 

  Thanks for all your response,

 

 Here is the size of the grapth db:

 

 Nodes        Size

 -----------------------------

 100,000      97MB

 200,000      182MB

 300,000      267MB

 ...

 5,000,000   expect 5GB

 

 I've tried to use 5 virtual machines, each one has 2 cores and 1G memory,

 Running 2 threads on each VM.

 

 Nodes       Time

 ------------------------------

 50,000      &gt;20mins

 150,000    &gt;40hrs

 

 Obviously, when the amount of node increases, the spending time of
executing

 GraphAlgoFactory.shortestPath increases heavily and non-linearly.

 It's hard to estimate how many machines I need if I want to deal with

 5,000,000 nodes or even 10,000,000....,

 I think Map/Reduce will be one solution for me, and I'll try to use BFS

 traversal which may reduce some duplicate procedures.

 

 

 Nice to discuss with you,

 

 Thanks.

 

 2011/5/12 Michael Hunger &lt;[email protected]&gt;

 

 &gt; Hey JueiTing,

 &gt;

 &gt; I'm not sure if Hadoop is needed here.

 &gt; What is the current performance characteristics for the shortest path
you

 &gt; are using?

 &gt;

 &gt; You could take a decent machine and just fire up, e.g. blocks of 10k
node

 &gt; pairs to a ThreadPoolExecutor with cores*2 threads.

 &gt; Each of those tasks only has to return the sum of the path lengths
(and you

 &gt; know the block size) so you can sum the whole thing up onto a long and

 &gt; divide it by the

 &gt; number of pairs processed at the end?

 &gt;

 &gt; Perhaps instead of brute force looping one and a half times over all
nodes

 &gt; it is perhaps better to do a breath first traversal over all nodes

 &gt; (regardless of relationships, just outgoing rels) and returning unique
paths

 &gt; from the traverser

 &gt; and calculating the path lenghts from the start nodes to each
_connected_

 &gt; node on the paths (start node -&gt; path[0..n].nodes[1..n])

 &gt;

 &gt; Cheers

 &gt;

 &gt; Michael

 &gt;

 &gt; Am 11.05.2011 um 21:12 schrieb Peter Neubauer:

 &gt;

 &gt; &gt; Hi JueiTing,

 &gt; &gt; I think this is a typical case for a massive Map/Reduce job. I am

 &gt; thinking

 &gt; &gt; of combining Hadoop works with replicas of the graph and then do
the

 &gt; &gt; computation. I believe Paddy Fitzgerald has been working with
these

 &gt; &gt; approaches and can give some feedback.

 &gt; &gt;

 &gt; &gt; Of course, given the size of the graph, that might prove a
problem. OTOH,

 &gt; if

 &gt; &gt; there are no modifications during the computation, you could run
the

 &gt; &gt; calculations on read-only databases from the same store. Would
that work?

 &gt; &gt;

 &gt; &gt; Cheers,

 &gt; &gt;

 &gt; &gt; /peter neubauer

 &gt; &gt;

 &gt; &gt; GTalk:      neubauer.peter

 &gt; &gt; Skype       peter.neubauer

 &gt; &gt; Phone       +46 704 106975

 &gt; &gt; LinkedIn   http://www.linkedin.com/in/neubauer

 &gt; &gt; Twitter      http://twitter.com/peterneubauer

 &gt; &gt;

 &gt; &gt; http://www.neo4j.org - Your high performance graph

 &gt; database.

 &gt; &gt; http://startupbootcamp.org/ - &Ouml;resund - Innovation happens
HERE.

 &gt; &gt; http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing
party.

 &gt; &gt;

 &gt; &gt;

 &gt; &gt; On Wed, May 11, 2011 at 12:47 AM,
&ccedil;&iquest;&ccedil;&lsquo;&aring;&raquo;&middot;
&lt;[email protected]&gt; wrote:

 &gt; &gt;

 &gt; &gt;&gt; Hi,

 &gt; &gt;&gt;

 &gt; &gt;&gt; I'm trying to use Neo4j graph database to store a

 &gt; &gt;&gt; large social network(more than 5,000,000 nodes) for academic
research.

 &gt; &gt;&gt;

 &gt; &gt;&gt; I need to compute the separation degree(path length) between
any two

 &gt; nodes

 &gt; &gt;&gt; in the graph then get the average degree of whole database.

 &gt; &gt;&gt; The solution I'm using use now is archieved by executing API

 &gt; &gt;&gt; "GraphAlgoFactory.shortestPath",

 &gt; &gt;&gt; but it means I need to execute (n*(n-1))/2 times to get all
path length.

 &gt; I

 &gt; &gt;&gt; don't think it's a very good idea :(

 &gt; &gt;&gt;

 &gt; &gt;&gt; So, I'm wondering that is there any function which can assign
one node

 &gt; and

 &gt; &gt;&gt; whole DB as Input, and return the

 &gt; &gt;&gt; paths or only path lengths between the node and all other
nodes in the

 &gt; &gt;&gt; graph

 &gt; &gt;&gt; as Output? Just like Dijkstra algorithm did.

 &gt; &gt;&gt; I know there is a Dijkstra function in the API, but it only
return the

 &gt; path

 &gt; &gt;&gt; between two nodes.

 &gt; &gt;&gt; It'll be really helpful for me if there is any function can
return all

 &gt; the

 &gt; &gt;&gt; result in one computation(even the whole result array
computing by

 &gt; &gt;&gt; Dijkstra)

 &gt; &gt;&gt;

 &gt; &gt;&gt; Or does anyone has better idea to deal with this issue?

 &gt; &gt;&gt; I tried to implement Dijkstra by myself, but it was
impossible to

 &gt; declare

 &gt; &gt;&gt; such big 2D array in the program...(OutOfMemoryException)

 &gt; &gt;&gt;

 &gt; &gt;&gt; If there is any advice, I'll really appreciate it.

 &gt; &gt;&gt;

 &gt; &gt;&gt; Thanks.

 &gt; &gt;&gt; _______________________________________________

 &gt; &gt;&gt; Neo4j mailing list

 &gt; &gt;&gt; [email protected]

 &gt; &gt;&gt; https://lists.neo4j.org/mailman/listinfo/user

 &gt; &gt;&gt;

 &gt; &gt; _______________________________________________

 &gt; &gt; Neo4j mailing list

 &gt; &gt; [email protected]

 &gt; &gt; https://lists.neo4j.org/mailman/listinfo/user

 &gt;

 &gt; _______________________________________________

 &gt; Neo4j mailing list

 &gt; [email protected]

 &gt; https://lists.neo4j.org/mailman/listinfo/user

 &gt;

 _______________________________________________

 Neo4j mailing list

 [email protected]

 https://lists.neo4j.org/mailman/listinfo/user

  
 
 --
 Caselle da 1GB, trasmetti allegati fino a 3GB e in piu' IMAP, POP3 e SMTP
autenticato? GRATIS solo con Email.it: http://www.email.it/f
 
 Sponsor:
 Per il ponte del 2 giugno scegli le offerte speciali dei Riccione Family
Hotels! Bimbi gratis, spiaggia, baby menu, miniclub, parchi divertimento
 Clicca qui: http://adv.email.it/cgi-bin/foclick.cgi?mid=11447&d=20110512

_______________________________________________
Neo4j mailing list
[email protected]
https://lists.neo4j.org/mailman/listinfo/user

Re: [Neo4j] finding all shortest paths between one node and all other nodes in a large scale databse

Reply via email to