Hello everyone,

In our group at EPFL we're doing research on understanding and potentially improving the performance of data-parallel frameworks that use secondary storage. I was looking at the Flink code to understand how spilling to disk actually works. So far I have gotten as far as UnilateralSortMerger.java and its spilling and reading threads. I also saw that some spilling markers are used.

I am curious whether there is any design document available on this topic. I was not able to find much online. If there is no such design document, I would appreciate it if someone could help me understand how these spilling markers are used.

At a higher level, I am trying to understand how much data Flink spills to disk once it has concluded that it needs to spill.

Thank you very much,
Florin Dinu