Hi, I had the same problem and solved it by assigning a distance label to any node.

The procedure is: 1-take the starting node N and add it to a set A, define the set B 2-set the value d=1 3-for any node M in A: 3.1 set the label "distance" of M to d 3.2 for any node X which is connected to M and doesn't have the label distance add X to the set B 4-increase d of 1 5-empty A and put all the entries of B in it, then empty B 6-go to step 3 until A is empty I have an implementation of it, but it's a little bit old (I created it for a Neo4j 0.1 graph) --------- Original Message -------- Da: "Neo4j user discussions" <user@lists.neo4j.org> To: "Neo4j user discussions" <user@lists.neo4j.org> Oggetto: Re: [Neo4j] finding all shortest paths between one node and all other nodes in a large scale databse Data: 12/05/11 09:30 Thanks for all your response, Here is the size of the grapth db: Nodes Size ----------------------------- 100,000 97MB 200,000 182MB 300,000 267MB ... 5,000,000 expect 5GB I've tried to use 5 virtual machines, each one has 2 cores and 1G memory, Running 2 threads on each VM. Nodes Time ------------------------------ 50,000 >20mins 150,000 >40hrs Obviously, when the amount of node increases, the spending time of executing GraphAlgoFactory.shortestPath increases heavily and non-linearly. It's hard to estimate how many machines I need if I want to deal with 5,000,000 nodes or even 10,000,000...., I think Map/Reduce will be one solution for me, and I'll try to use BFS traversal which may reduce some duplicate procedures. Nice to discuss with you, Thanks. 2011/5/12 Michael Hunger <michael.hun...@neotechnology.com> > Hey JueiTing, > > I'm not sure if Hadoop is needed here. > What is the current performance characteristics for the shortest path you > are using? > > You could take a decent machine and just fire up, e.g. blocks of 10k node > pairs to a ThreadPoolExecutor with cores*2 threads. > Each of those tasks only has to return the sum of the path lengths (and you > know the block size) so you can sum the whole thing up onto a long and > divide it by the > number of pairs processed at the end? > > Perhaps instead of brute force looping one and a half times over all nodes > it is perhaps better to do a breath first traversal over all nodes > (regardless of relationships, just outgoing rels) and returning unique paths > from the traverser > and calculating the path lenghts from the start nodes to each _connected_ > node on the paths (start node -> path[0..n].nodes[1..n]) > > Cheers > > Michael > > Am 11.05.2011 um 21:12 schrieb Peter Neubauer: > > > Hi JueiTing, > > I think this is a typical case for a massive Map/Reduce job. I am > thinking > > of combining Hadoop works with replicas of the graph and then do the > > computation. I believe Paddy Fitzgerald has been working with these > > approaches and can give some feedback. > > > > Of course, given the size of the graph, that might prove a problem. OTOH, > if > > there are no modifications during the computation, you could run the > > calculations on read-only databases from the same store. Would that work? > > > > Cheers, > > > > /peter neubauer > > > > GTalk: neubauer.peter > > Skype peter.neubauer > > Phone +46 704 106975 > > LinkedIn http://www.linkedin.com/in/neubauer > > Twitter http://twitter.com/peterneubauer > > > > http://www.neo4j.org - Your high performance graph > database. > > http://startupbootcamp.org/ - Öresund - Innovation happens HERE. > > http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party. > > > > > > On Wed, May 11, 2011 at 12:47 AM, ç¿ç‘å»· <wshir...@gmail.com> wrote: > > > >> Hi, > >> > >> I'm trying to use Neo4j graph database to store a > >> large social network(more than 5,000,000 nodes) for academic research. > >> > >> I need to compute the separation degree(path length) between any two > nodes > >> in the graph then get the average degree of whole database. > >> The solution I'm using use now is archieved by executing API > >> "GraphAlgoFactory.shortestPath", > >> but it means I need to execute (n*(n-1))/2 times to get all path length. > I > >> don't think it's a very good idea :( > >> > >> So, I'm wondering that is there any function which can assign one node > and > >> whole DB as Input, and return the > >> paths or only path lengths between the node and all other nodes in the > >> graph > >> as Output? Just like Dijkstra algorithm did. > >> I know there is a Dijkstra function in the API, but it only return the > path > >> between two nodes. > >> It'll be really helpful for me if there is any function can return all > the > >> result in one computation(even the whole result array computing by > >> Dijkstra) > >> > >> Or does anyone has better idea to deal with this issue? > >> I tried to implement Dijkstra by myself, but it was impossible to > declare > >> such big 2D array in the program...(OutOfMemoryException) > >> > >> If there is any advice, I'll really appreciate it. > >> > >> Thanks. > >> _______________________________________________ > >> Neo4j mailing list > >> User@lists.neo4j.org > >> https://lists.neo4j.org/mailman/listinfo/user > >> > > _______________________________________________ > > Neo4j mailing list > > User@lists.neo4j.org > > https://lists.neo4j.org/mailman/listinfo/user > > _______________________________________________ > Neo4j mailing list > User@lists.neo4j.org > https://lists.neo4j.org/mailman/listinfo/user > _______________________________________________ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user -- Caselle da 1GB, trasmetti allegati fino a 3GB e in piu' IMAP, POP3 e SMTP autenticato? GRATIS solo con Email.it: http://www.email.it/f Sponsor: Per il ponte del 2 giugno scegli le offerte speciali dei Riccione Family Hotels! Bimbi gratis, spiaggia, baby menu, miniclub, parchi divertimento Clicca qui: http://adv.email.it/cgi-bin/foclick.cgi?mid=11447&d=20110512 _______________________________________________ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user