If a flink pipeline with a kafka source was to be re stored from a
checkpoint, would a partition be assigned to a the same sub task index ? It
does not seem so, but wanted to confirm. We are trying to simulate an edge
case where we have dangling files. We create these files by  forcing 2 or
more restores between 2 consecutive checkpoints. These files are created
after the first checkpoint but are rendered redundant by another restore
before the next checkpoint. We want to be sure that all data in these files
are accounted for by checking for the records in resolved files but were
not sure we should check the ones with the same task index or all sub
indexes...


for example, if we have 2 sub task index ids
if _in-progress-part-1-1 is  a dangling/redundant file would that data be
in part-1-2 ( the next finalized file but with the same sub task id ) or
could it be in part-2-2 plus for example ? If the partitions were to
maintain affinity, we would imagine that it would be former but wanted to
confirm.

Regards.

Reply via email to