Hey JueiTing,

I'm not sure if Hadoop is needed here.
What is the current performance characteristics for the shortest path you are 
using?

You could take a decent machine and just fire up, e.g. blocks of 10k node pairs 
to a ThreadPoolExecutor with cores*2 threads.
Each of those tasks only has to return the sum of the path lengths (and you 
know the block size) so you can sum the whole thing up onto a long and divide 
it by the
number of pairs processed at the end? 

Perhaps instead of brute force looping one and a half times over all nodes it 
is perhaps better to do a breath first traversal over all nodes (regardless of 
relationships, just outgoing rels) and returning unique paths from the 
traverser 
and calculating the path lenghts from the start nodes to each _connected_ node 
on the paths (start node -> path[0..n].nodes[1..n])

Cheers

Michael

Am 11.05.2011 um 21:12 schrieb Peter Neubauer:

> Hi JueiTing,
> I think this is a typical case for a massive Map/Reduce job. I am thinking
> of combining Hadoop works with replicas of the graph and then do the
> computation. I believe Paddy Fitzgerald has been working with these
> approaches and can give some feedback.
> 
> Of course, given the size of the graph, that might prove a problem. OTOH, if
> there are no modifications during the computation, you could run the
> calculations on read-only databases from the same store. Would that work?
> 
> Cheers,
> 
> /peter neubauer
> 
> GTalk:      neubauer.peter
> Skype       peter.neubauer
> Phone       +46 704 106975
> LinkedIn   http://www.linkedin.com/in/neubauer
> Twitter      http://twitter.com/peterneubauer
> 
> http://www.neo4j.org               - Your high performance graph database.
> http://startupbootcamp.org/    - Öresund - Innovation happens HERE.
> http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party.
> 
> 
> On Wed, May 11, 2011 at 12:47 AM, 翁瑞廷 <wshir...@gmail.com> wrote:
> 
>> Hi,
>> 
>> I'm trying to use Neo4j graph database to store a
>> large social network(more than 5,000,000 nodes) for academic research.
>> 
>> I need to compute the separation degree(path length) between any two nodes
>> in the graph then get the average degree of whole database.
>> The solution I'm using use now is archieved by executing API
>> "GraphAlgoFactory.shortestPath",
>> but it means I need to execute (n*(n-1))/2 times to get all path length. I
>> don't think it's a very good idea :(
>> 
>> So, I'm wondering that is there any function which can assign one node and
>> whole DB as Input, and return the
>> paths or only path lengths between the node and all other nodes in the
>> graph
>> as Output? Just like Dijkstra algorithm did.
>> I know there is a Dijkstra function in the API, but it only return the path
>> between two nodes.
>> It'll be really helpful for me if there is any function can return all the
>> result in one computation(even the whole result array computing by
>> Dijkstra)
>> 
>> Or does anyone has better idea to deal with this issue?
>> I tried to implement Dijkstra by myself, but it was impossible to declare
>> such big 2D array in the program...(OutOfMemoryException)
>> 
>> If there is any advice, I'll really appreciate it.
>> 
>> Thanks.
>> _______________________________________________
>> Neo4j mailing list
>> User@lists.neo4j.org
>> https://lists.neo4j.org/mailman/listinfo/user
>> 
> _______________________________________________
> Neo4j mailing list
> User@lists.neo4j.org
> https://lists.neo4j.org/mailman/listinfo/user

_______________________________________________
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user

Reply via email to