Hi Vishal,

Just wanted to comment on this bit:

> My job has very large amount of state (>100GB) and I have no option but
to use unaligned checkpoints.

I successfully ran Flink jobs with 10+ TB of state and no unaligned
checkpoints enabled. Usually, you consider enabling them when there is some
kind of skew in the topology, but IMO it's unrelated to the state size.

> Reducing the checkpoint interval is not really an option given the size
of the checkpoint

Do you use RocksDB state backend with incremental checkpointing?

On Tue, Nov 15, 2022 at 12:07 AM Vishal Surana <vis...@moengage.com> wrote:

> I wanted to achieve exactly once semantics in my job and wanted to make
> sure I understood the current behaviour correctly:
>
>    1. Only one Kafka transaction at a time (no concurrent checkpoints)
>    2. Only one transaction per checkpoint
>
>
> My job has very large amount of state (>100GB) and I have no option but to
> use unaligned checkpoints. With the above limitation, it seems to me that
> if checkpoint interval is 1 minute and checkpoint takes about 10 seconds to
> complete then only one Kafka transaction can happen in 70 seconds. All of
> the output records will not be visible until the transaction completes.
> This way a steady stream of inputs will result in an buffered output stream
> where data is only visible after a minute, thereby destroying any sort of
> real time streaming use cases. Reducing the checkpoint interval is not
> really an option given the size of the checkpoint. Only way out would be to
> allow for multiple transactions per checkpoint.
>
> Thanks,
> Vishal
>

Reply via email to