Hi Amit,

Can you elaborate on how you write using the "S3 sink", and which
version of Flink you are using?

If you are using BucketingSink [1], you can check out the API doc and
configure it to flush before closing your sink. This way your sink is
"integrated with the checkpointing mechanism to provide exactly once
semantics" [2].
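
For reference, a minimal sketch in Java (against the Flink 1.5-era
BucketingSink API; the bucket path, checkpoint interval, and batch
size below are placeholders, not recommendations):

  import org.apache.flink.streaming.api.datastream.DataStream;
  import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
  import org.apache.flink.streaming.connectors.fs.StringWriter;
  import org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink;
  import org.apache.flink.streaming.connectors.fs.bucketing.DateTimeBucketer;

  StreamExecutionEnvironment env =
      StreamExecutionEnvironment.getExecutionEnvironment();
  // Checkpointing must be enabled; on each checkpoint the sink flushes
  // its in-progress part file and records the valid length, so recovery
  // can roll back to the last checkpointed position.
  env.enableCheckpointing(60_000); // e.g. every 60 seconds

  DataStream<String> clicks = ...; // your Kafka source goes here

  BucketingSink<String> sink =
      new BucketingSink<>("s3://my-bucket/clicks"); // placeholder path
  sink.setBucketer(new DateTimeBucketer<>("yyyy-MM-dd--HH"));
  sink.setWriter(new StringWriter<>());
  sink.setBatchSize(1024L * 1024 * 128); // roll part files at ~128 MB

  clicks.addSink(sink);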

Thanks,
Rong

[1] https://ci.apache.org/projects/flink/flink-docs-master/dev/connectors/filesystem_sink.html
[2] https://ci.apache.org/projects/flink/flink-docs-master/api/java/org/apache/flink/streaming/connectors/fs/bucketing/BucketingSink.html

On Thu, May 17, 2018 at 2:57 AM, Amit Jain <aj201...@gmail.com> wrote:

> Hi,
>
> We are using Flink to process clickstream data from Kafka and push
> it to S3 in 128 MB files.
>
> What are the message processing guarantees with the S3 sink? In my
> understanding, the S3A client buffers the data in memory/on disk. In
> a failure scenario on a particular node, the TM would not trigger
> Writer#close, hence the buffered data can be lost entirely, assuming
> this buffer contains data from the last successful checkpoint.
>
> --
> Thanks,
> Amit
>
