I have an application that evaluates a graph using this algorithm:
- Use a parallel for loop to evaluate all nodes in the graph (to evaluate a
  node, an image is read, and then the result of this node is calculated).
- Use a second parallel for loop to evaluate all edges in the graph. The
  function takes in the results from both nodes of the edge, and then
  calculates the answer for the edge.
The final result consists of the calculated results of each edge. So each
node and each edge is essentially a job; in this case, an edge is more
like a job than a message.
As you can see, the above algorithm employs two map functions but no
reduce function. The total data size can be very large (say 100 GB). Also,
the workload of each node and each edge is highly irregular, so load
balancing mechanisms are essential.
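To make the two-pass structure concrete, here is a minimal single-machine sketch in Python. It is not my real code: the toy graph and the names `evaluate_node`, `evaluate_edge`, and `evaluate_graph` are placeholders, and the node function just returns a number instead of reading an image. A thread pool stands in for the parallel for loops.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy graph: node ids and (u, v) edges.
nodes = [0, 1, 2, 3]
edges = [(0, 1), (1, 2), (2, 3), (0, 3)]


def evaluate_node(node_id):
    # Placeholder for "read an image and compute the node's result".
    return node_id * 10


def evaluate_edge(edge, node_results):
    # Placeholder for the edge function; it needs both endpoint results.
    u, v = edge
    return node_results[u] + node_results[v]


def evaluate_graph(nodes, edges, workers=4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # First map: one independent job per node.
        node_results = dict(zip(nodes, pool.map(evaluate_node, nodes)))
        # Second map: one independent job per edge; there is no reduce step.
        edge_results = dict(
            zip(edges, pool.map(lambda e: evaluate_edge(e, node_results), edges))
        )
    return edge_results


if __name__ == "__main__":
    print(evaluate_graph(nodes, edges))
```

In the real application the second pass is where the data transfer matters, because each edge job needs the results of two node jobs that may have been computed on different machines.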
In this case, will Giraph suit this application? If so, what would my
program look like? And will Giraph be able to strike a balance between
good load balancing of the second map function and minimizing the data
transfer of the results from the first map function?