Vasia Kalavri created FLINK-5127:
------------------------------------

             Summary: Reduce the amount of intermediate data in vertex-centric iterations
                 Key: FLINK-5127
                 URL: https://issues.apache.org/jira/browse/FLINK-5127
             Project: Flink
          Issue Type: Improvement
          Components: Gelly
            Reporter: Vasia Kalavri
            Assignee: Vasia Kalavri


The vertex-centric plan contains a join between the workset (messages) and the 
solution set (vertices) that outputs <Vertex, Message> tuples. This 
intermediate dataset is then co-grouped with the edges to provide the Pregel 
interface directly.

This issue proposes an improvement to reduce the size of this intermediate 
dataset. In particular, the vertex state does not have to be attached to all 
the output tuples of the join. If we replace the join with a coGroup and use an 
`Either` type, we can attach the vertex state to the first tuple only. The 
subsequent coGroup can retrieve the vertex state from the first tuple and 
correctly expose the Pregel interface.
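
To illustrate the idea, here is a minimal plain-Java sketch that does not use 
the Flink API. The class name, the method names, and the small `Either` 
stand-in are hypothetical and exist only for this example (Flink ships its own 
`org.apache.flink.types.Either`). Vertex state and messages are modelled as 
strings. The sketch assumes that a vertex without messages produces no output, 
mirroring the inner join it would replace; the actual patch may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class EitherCoGroupSketch {

    /** Hypothetical stand-in for an Either type: holds a Left or a Right value. */
    public static final class Either<L, R> {
        public final L left;
        public final R right;
        public final boolean isLeft;

        private Either(L left, R right, boolean isLeft) {
            this.left = left;
            this.right = right;
            this.isLeft = isLeft;
        }

        public static <L, R> Either<L, R> left(L value) {
            return new Either<L, R>(value, null, true);
        }

        public static <L, R> Either<L, R> right(R value) {
            return new Either<L, R>(null, value, false);
        }
    }

    /** Join-style output: the vertex state is copied into every <Vertex, Message> pair. */
    public static List<String[]> joinOutput(String vertexState, List<String> messages) {
        List<String[]> out = new ArrayList<String[]>();
        for (String message : messages) {
            out.add(new String[] {vertexState, message});
        }
        return out;
    }

    /** CoGroup-style output: the vertex state once (Left), then only messages (Right). */
    public static List<Either<String, String>> coGroupOutput(
            String vertexState, List<String> messages) {
        List<Either<String, String>> out = new ArrayList<Either<String, String>>();
        if (messages.isEmpty()) {
            return out; // assumption: no messages means no output, as with an inner join
        }
        out.add(Either.<String, String>left(vertexState));
        for (String message : messages) {
            out.add(Either.<String, String>right(message));
        }
        return out;
    }

    /** Downstream step: recover the vertex state from the first tuple of the group. */
    public static String vertexStateOf(List<Either<String, String>> group) {
        Either<String, String> first = group.get(0);
        if (!first.isLeft) {
            throw new IllegalStateException("expected the vertex state in the first tuple");
        }
        return first.left;
    }

    /** Downstream step: collect the messages that follow the vertex state. */
    public static List<String> messagesOf(List<Either<String, String>> group) {
        List<String> messages = new ArrayList<String>();
        for (Either<String, String> tuple : group) {
            if (!tuple.isLeft) {
                messages.add(tuple.right);
            }
        }
        return messages;
    }

    public static void main(String[] args) {
        List<String> messages = Arrays.asList("m1", "m2", "m3");
        List<String[]> joined = joinOutput("state", messages);
        List<Either<String, String>> grouped = coGroupOutput("state", messages);
        System.out.println("join copies of vertex state:    " + joined.size());
        System.out.println("coGroup copies of vertex state: 1");
        System.out.println("recovered state:    " + vertexStateOf(grouped));
        System.out.println("recovered messages: " + messagesOf(grouped));
    }
}
```

In the join variant the vertex state travels once per message, while in the 
coGroup variant it travels once per vertex, and the downstream step still sees 
both the state and all messages.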

In my preliminary experiments, this change reduces intermediate data by 2x for 
small vertex states and by 4-5x for large vertex states.
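
As a rough back-of-the-envelope model (an illustration, not the measurement 
setup behind the numbers above), assume a vertex receives m messages, its state 
occupies s_v bytes, and each message occupies s_m bytes. Vertex IDs, the 
`Either` tag, and serialization overhead are ignored. The symbols m, s_v, and 
s_m are introduced here for illustration only.

```latex
\begin{align*}
  \text{join output per vertex}    &= m\,(s_v + s_m) \\
  \text{coGroup output per vertex} &= s_v + m\,s_m \\
  \text{reduction factor}          &= \frac{m\,(s_v + s_m)}{s_v + m\,s_m}
      \;\longrightarrow\; 1 + \frac{s_v}{s_m} \quad (m \to \infty)
\end{align*}
```

Under these assumptions the saving grows with the ratio of vertex state size to 
message size, which matches the direction of the trend reported above.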



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
