Hi,

Flink maintains its own Kafka offsets in its checkpoints and does not rely on Kafka's offset management. That way, Flink guarantees that read offsets and checkpointed operator state are always aligned. The BucketingSink is designed not to lose any data: on a checkpoint it records the valid length of each in-progress file, and on recovery files are truncated to that length (or a .valid-length file is written), so records written after the last completed checkpoint are replayed from Kafka rather than lost. The mode of operation is described in detail in the JavaDocs of the class [1].
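To make the interplay concrete, here is a minimal job sketch (not code from this thread; the topic name, HDFS path, broker address, and interval values are placeholders I picked for illustration):

// Minimal sketch for Flink 1.4 -- topic, path, and intervals are placeholders.
import java.util.Properties;

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink;
import org.apache.flink.streaming.connectors.fs.bucketing.DateTimeBucketer;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer011;
import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

public class KafkaToHdfsJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Kafka offsets are snapshotted into Flink's checkpoints; without
        // checkpointing enabled there is no exactly-once guarantee for the sink.
        env.enableCheckpointing(60_000); // 60s; a larger interval also means larger part files

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "broker:9092"); // placeholder
        props.setProperty("group.id", "my-group"); // not used for restoring offsets

        DataStream<String> stream = env.addSource(
            new FlinkKafkaConsumer011<>("my-topic", new SimpleStringSchema(), props));

        // On a checkpoint the sink records the valid length of in-progress files;
        // on recovery it truncates them (or writes a .valid-length marker) and
        // Kafka replays everything after the restored offsets.
        BucketingSink<String> sink = new BucketingSink<>("hdfs:///tmp/out"); // placeholder path
        sink.setBucketer(new DateTimeBucketer<String>("yyyy-MM-dd--HH"));
        sink.setBatchSize(128L * 1024 * 1024); // roll part files at ~128 MB

        stream.addSink(sink);
        env.execute("kafka-to-hdfs");
    }
}

With a setup like this, a crash rolls the job back to the last completed checkpoint: the truncated in-progress files and the restored offsets line up, so the scenario you describe below does not lose data.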
Please check the JavaDocs and let me know if you have questions or doubts about the mechanism.

Best, Fabian

[1] https://github.com/apache/flink/blob/release-1.4/flink-connectors/flink-connector-filesystem/src/main/java/org/apache/flink/streaming/connectors/fs/bucketing/BucketingSink.java

2018-03-21 7:38 GMT+01:00 XilangYan <xilang....@gmail.com>:
> Thank you! Fabian
>
> The HDFS small-file problem can be avoided with a large checkpoint interval.
>
> Meanwhile, there is a potential data loss problem in the current BucketingSink.
> Say we consume data from Kafka: when a checkpoint is requested, the Kafka
> offset is updated, but the in-progress file in the BucketingSink remains.
> If Flink crashes after that, the data in the in-progress file is lost.
> Am I right?