Hello,

Regarding the new BATCH mode of the DataStream API, I see that the
documentation states that some operators will process all data for a given key
before moving on to the next one. However, I don't see how Flink is supposed to
know whether the input will provide all data for a given key sequentially. In
the DataSet API, an (undocumented?) feature is using SplitDataProperties
(https://ci.apache.org/projects/flink/flink-docs-release-1.12/api/java/org/apache/flink/api/java/io/SplitDataProperties.html)
to declare grouping/partitioning/sorting properties of the input splits, so
that if the data is pre-sorted (e.g. when reading from a database), some
operations can be optimized. Will the DataStream API get something similar?

Regards,
Alexis.
