Hi,

I had the same problem and solved it by assigning a distance label to any
node.

The procedure is:

1-take the starting node N and add it to a set A, define the set B

2-set the value d=1

3-for any node M in A:

 3.1 set the label "distance" of M to d

3.2 for any node X which is connected to M and doesn't have the label
distance add X to the set B

4-increase d of 1

5-empty A and put all the entries of B in it, then empty B

 6-go to step 3 until A is empty


I have an implementation of it, but it's a little bit old (I created it for
a Neo4j 0.1 graph)

--------- Original Message --------

 Da: "Neo4j user discussions" <user@lists.neo4j.org>

 To: "Neo4j user discussions" <user@lists.neo4j.org>

 Oggetto: Re: [Neo4j] finding all shortest paths between one node and all
other nodes in a large scale databse

 Data: 12/05/11 09:30

 

  Thanks for all your response,

 

 Here is the size of the grapth db:

 

 Nodes        Size

 -----------------------------

 100,000      97MB

 200,000      182MB

 300,000      267MB

 ...

 5,000,000   expect 5GB

 

 I've tried to use 5 virtual machines, each one has 2 cores and 1G memory,

 Running 2 threads on each VM.

 

 Nodes       Time

 ------------------------------

 50,000      >20mins

 150,000    >40hrs

 

 Obviously, when the amount of node increases, the spending time of
executing

 GraphAlgoFactory.shortestPath increases heavily and non-linearly.

 It's hard to estimate how many machines I need if I want to deal with

 5,000,000 nodes or even 10,000,000....,

 I think Map/Reduce will be one solution for me, and I'll try to use BFS

 traversal which may reduce some duplicate procedures.

 

 

 Nice to discuss with you,

 

 Thanks.

 

 2011/5/12 Michael Hunger <michael.hun...@neotechnology.com>

 

 > Hey JueiTing,

 >

 > I'm not sure if Hadoop is needed here.

 > What is the current performance characteristics for the shortest path
you

 > are using?

 >

 > You could take a decent machine and just fire up, e.g. blocks of 10k
node

 > pairs to a ThreadPoolExecutor with cores*2 threads.

 > Each of those tasks only has to return the sum of the path lengths
(and you

 > know the block size) so you can sum the whole thing up onto a long and

 > divide it by the

 > number of pairs processed at the end?

 >

 > Perhaps instead of brute force looping one and a half times over all
nodes

 > it is perhaps better to do a breath first traversal over all nodes

 > (regardless of relationships, just outgoing rels) and returning unique
paths

 > from the traverser

 > and calculating the path lenghts from the start nodes to each
_connected_

 > node on the paths (start node -> path[0..n].nodes[1..n])

 >

 > Cheers

 >

 > Michael

 >

 > Am 11.05.2011 um 21:12 schrieb Peter Neubauer:

 >

 > > Hi JueiTing,

 > > I think this is a typical case for a massive Map/Reduce job. I am

 > thinking

 > > of combining Hadoop works with replicas of the graph and then do
the

 > > computation. I believe Paddy Fitzgerald has been working with
these

 > > approaches and can give some feedback.

 > >

 > > Of course, given the size of the graph, that might prove a
problem. OTOH,

 > if

 > > there are no modifications during the computation, you could run
the

 > > calculations on read-only databases from the same store. Would
that work?

 > >

 > > Cheers,

 > >

 > > /peter neubauer

 > >

 > > GTalk:      neubauer.peter

 > > Skype       peter.neubauer

 > > Phone       +46 704 106975

 > > LinkedIn   http://www.linkedin.com/in/neubauer

 > > Twitter      http://twitter.com/peterneubauer

 > >

 > > http://www.neo4j.org - Your high performance graph

 > database.

 > > http://startupbootcamp.org/ - Öresund - Innovation happens
HERE.

 > > http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing
party.

 > >

 > >

 > > On Wed, May 11, 2011 at 12:47 AM,
翁瑞廷
<wshir...@gmail.com> wrote:

 > >

 > >> Hi,

 > >>

 > >> I'm trying to use Neo4j graph database to store a

 > >> large social network(more than 5,000,000 nodes) for academic
research.

 > >>

 > >> I need to compute the separation degree(path length) between
any two

 > nodes

 > >> in the graph then get the average degree of whole database.

 > >> The solution I'm using use now is archieved by executing API

 > >> "GraphAlgoFactory.shortestPath",

 > >> but it means I need to execute (n*(n-1))/2 times to get all
path length.

 > I

 > >> don't think it's a very good idea :(

 > >>

 > >> So, I'm wondering that is there any function which can assign
one node

 > and

 > >> whole DB as Input, and return the

 > >> paths or only path lengths between the node and all other
nodes in the

 > >> graph

 > >> as Output? Just like Dijkstra algorithm did.

 > >> I know there is a Dijkstra function in the API, but it only
return the

 > path

 > >> between two nodes.

 > >> It'll be really helpful for me if there is any function can
return all

 > the

 > >> result in one computation(even the whole result array
computing by

 > >> Dijkstra)

 > >>

 > >> Or does anyone has better idea to deal with this issue?

 > >> I tried to implement Dijkstra by myself, but it was
impossible to

 > declare

 > >> such big 2D array in the program...(OutOfMemoryException)

 > >>

 > >> If there is any advice, I'll really appreciate it.

 > >>

 > >> Thanks.

 > >> _______________________________________________

 > >> Neo4j mailing list

 > >> User@lists.neo4j.org

 > >> https://lists.neo4j.org/mailman/listinfo/user

 > >>

 > > _______________________________________________

 > > Neo4j mailing list

 > > User@lists.neo4j.org

 > > https://lists.neo4j.org/mailman/listinfo/user

 >

 > _______________________________________________

 > Neo4j mailing list

 > User@lists.neo4j.org

 > https://lists.neo4j.org/mailman/listinfo/user

 >

 _______________________________________________

 Neo4j mailing list

 User@lists.neo4j.org

 https://lists.neo4j.org/mailman/listinfo/user

  
 
 --
 Caselle da 1GB, trasmetti allegati fino a 3GB e in piu' IMAP, POP3 e SMTP
autenticato? GRATIS solo con Email.it: http://www.email.it/f
 
 Sponsor:
 Per il ponte del 2 giugno scegli le offerte speciali dei Riccione Family
Hotels! Bimbi gratis, spiaggia, baby menu, miniclub, parchi divertimento
 Clicca qui: http://adv.email.it/cgi-bin/foclick.cgi?mid=11447&d=20110512

_______________________________________________
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user

Reply via email to