Chengxiang Li created FLINK-2241:
------------------------------------

             Summary: Use BloomFilter to minimize build side records which 
are spilled to disk in Hybrid-Hash-Join
                 Key: FLINK-2241
                 URL: https://issues.apache.org/jira/browse/FLINK-2241
             Project: Flink
          Issue Type: Improvement
          Components: Core
            Reporter: Chengxiang Li
            Priority: Minor


In Hybrid-Hash-Join, when the small (build side) table does not fit into memory, 
part of its data is spilled to disk, and the corresponding partitions of the big 
(probe side) table are spilled to disk during the probe phase as well. If we 
build a BloomFilter while spilling the small table to disk during the build 
phase, and use it to filter the big table records that would otherwise be 
spilled to disk, this may greatly reduce the size of the spilled big table 
files, and save the disk IO cost of writing and later re-reading them.
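
A minimal sketch of the idea, not Flink's actual implementation: the class 
SpillBloomFilter, its sizing parameters, and the helper 
countProbeRecordsToSpill are hypothetical names made up for illustration. It 
assumes an inner join, where a probe record without a build side match 
produces no output and can therefore be dropped instead of spilled.

```java
import java.util.BitSet;

/** Hypothetical sketch; these names are not part of Flink. */
public class SpillBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    public SpillBloomFilter(int numBits, int numHashes) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    // Bit-mixing step (the murmur3 32-bit finalizer) to spread key hashes.
    private static int mix(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    // Double hashing: derive the i-th bit position from two base hashes.
    private int position(int keyHash, int i) {
        int h1 = mix(keyHash);
        int h2 = mix(h1 ^ 0x9e3779b9) | 1;
        return Math.floorMod(h1 + i * h2, numBits);
    }

    /** Build phase: call for every build side record written to a spilled partition. */
    public void add(int keyHash) {
        for (int i = 0; i < numHashes; i++) {
            bits.set(position(keyHash, i));
        }
    }

    /** False means the key was definitely not spilled; true means it may have been. */
    public boolean mightContain(int keyHash) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(position(keyHash, i))) {
                return false;
            }
        }
        return true;
    }

    /**
     * Probe phase, assuming an inner join: counts the probe records that would
     * still have to be spilled after filtering. All other records cannot match
     * any spilled build side record and are dropped.
     */
    public static int countProbeRecordsToSpill(SpillBloomFilter filter, int[] probeKeyHashes) {
        int toSpill = 0;
        for (int keyHash : probeKeyHashes) {
            if (filter.mightContain(keyHash)) {
                toSpill++;
            }
        }
        return toSpill;
    }
}
```

Because a Bloom filter has no false negatives, every probe record that does 
have a spilled build side match is still spilled. False positives only cost 
some extra disk IO; they do not affect the correctness of the join result.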



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
