Hey JueiTing, I'm not sure if Hadoop is needed here. What is the current performance characteristics for the shortest path you are using?
You could take a decent machine and just fire up, e.g. blocks of 10k node pairs to a ThreadPoolExecutor with cores*2 threads. Each of those tasks only has to return the sum of the path lengths (and you know the block size) so you can sum the whole thing up onto a long and divide it by the number of pairs processed at the end? Perhaps instead of brute force looping one and a half times over all nodes it is perhaps better to do a breath first traversal over all nodes (regardless of relationships, just outgoing rels) and returning unique paths from the traverser and calculating the path lenghts from the start nodes to each _connected_ node on the paths (start node -> path[0..n].nodes[1..n]) Cheers Michael Am 11.05.2011 um 21:12 schrieb Peter Neubauer: > Hi JueiTing, > I think this is a typical case for a massive Map/Reduce job. I am thinking > of combining Hadoop works with replicas of the graph and then do the > computation. I believe Paddy Fitzgerald has been working with these > approaches and can give some feedback. > > Of course, given the size of the graph, that might prove a problem. OTOH, if > there are no modifications during the computation, you could run the > calculations on read-only databases from the same store. Would that work? > > Cheers, > > /peter neubauer > > GTalk: neubauer.peter > Skype peter.neubauer > Phone +46 704 106975 > LinkedIn http://www.linkedin.com/in/neubauer > Twitter http://twitter.com/peterneubauer > > http://www.neo4j.org - Your high performance graph database. > http://startupbootcamp.org/ - Öresund - Innovation happens HERE. > http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party. > > > On Wed, May 11, 2011 at 12:47 AM, 翁瑞廷 <wshir...@gmail.com> wrote: > >> Hi, >> >> I'm trying to use Neo4j graph database to store a >> large social network(more than 5,000,000 nodes) for academic research. >> >> I need to compute the separation degree(path length) between any two nodes >> in the graph then get the average degree of whole database. >> The solution I'm using use now is archieved by executing API >> "GraphAlgoFactory.shortestPath", >> but it means I need to execute (n*(n-1))/2 times to get all path length. I >> don't think it's a very good idea :( >> >> So, I'm wondering that is there any function which can assign one node and >> whole DB as Input, and return the >> paths or only path lengths between the node and all other nodes in the >> graph >> as Output? Just like Dijkstra algorithm did. >> I know there is a Dijkstra function in the API, but it only return the path >> between two nodes. >> It'll be really helpful for me if there is any function can return all the >> result in one computation(even the whole result array computing by >> Dijkstra) >> >> Or does anyone has better idea to deal with this issue? >> I tried to implement Dijkstra by myself, but it was impossible to declare >> such big 2D array in the program...(OutOfMemoryException) >> >> If there is any advice, I'll really appreciate it. >> >> Thanks. >> _______________________________________________ >> Neo4j mailing list >> User@lists.neo4j.org >> https://lists.neo4j.org/mailman/listinfo/user >> > _______________________________________________ > Neo4j mailing list > User@lists.neo4j.org > https://lists.neo4j.org/mailman/listinfo/user _______________________________________________ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user