Hi Venkata! 2012/9/25 Venkata Sastry Malladi <[email protected]>: > Thanks for the reply, Sebastian. > > The problem I'm working on is a routing algorithm for a country wide railway > network. I was thinking about two ways to approach it.
How many vertices and edges do you have? Seems of reasonable size... > 1. Pre calculate the routes. I understand that this is quadratic, but I was > thinking of using some optimizations to limit the calculations. Like > restrict the number of hops from the source node, only calculate the route > to the nearest hub and not every other node, etc. If this is really just the railroad network, that should be fine. I have done something similar at my previous job with the road network on country basis and it took only a few minutes to actually run the thing on a small-ish cluster. Those networks are usually not billions of vertices, so no problem as n^2 is still pretty small. The giraph job we made, was part of a bigger pipeline of jobs (we used crunch for that btw). The first job would cut out the bare minimum of data (the graph minus any other attribution), which was the input for giraph. Then a giraph job read that input created an output file, which was used as an index by the final algorithm. (We actually calculated all the routes from all the vertices twice (in both directions) and filtered in the output writer and that was still fast enough for us.) You can do something similar, where you basically use giraph to calculate your graph index and then stick that into some database (hbase, redis, mysql whatever works for your dataset) and query that from your program. > 2. Use Giraph as a real time query engine - The graph is loaded into the > memory of a cluster of nodes, and given the source and destination nodes of > a query, a single source shortest path algorithm is run. It is not made for real-time queries. As I said, just use giraph to make your index and keep that in memory with the db of your choice. HTH --Andre
