Hi Amit,

The BucketingSink does not have well-defined semantics when used with S3. Data loss is possible, but I am not sure whether it is the only problem. There are plans to rewrite the BucketingSink in Flink 1.6 to support eventually consistent file systems such as S3 [1][2].
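As far as I understand, this is not something you can configure away: even with checkpointing enabled, the sink's recovery path assumes truncate/rename semantics that S3 does not provide atomically. For concreteness, here is a minimal sketch of the kind of job you describe (Kafka into 128 MB files on S3) against the current APIs; the topic name, bootstrap servers, bucket path, and checkpoint interval are placeholders, not a recommendation.

import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink;
import org.apache.flink.streaming.connectors.fs.bucketing.DateTimeBucketer;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer011;

public class ClickStreamToS3 {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();
        // Checkpointing lets the sink track in-progress/pending files,
        // but finalizing them on recovery relies on truncate/rename
        // behaviour that S3 cannot guarantee.
        env.enableCheckpointing(60_000);

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "kafka:9092"); // placeholder
        props.setProperty("group.id", "clickstream");         // placeholder

        DataStream<String> clicks = env.addSource(
                new FlinkKafkaConsumer011<>(
                        "clicks", new SimpleStringSchema(), props));

        BucketingSink<String> sink =
                new BucketingSink<>("s3a://my-bucket/clicks"); // placeholder path
        sink.setBucketer(new DateTimeBucketer<String>("yyyy-MM-dd--HH"));
        sink.setBatchSize(128L * 1024 * 1024); // roll part files at 128 MB

        clicks.addSink(sink);
        env.execute("clickstream-to-s3");
    }
}

Note that the sink only renames a pending part file to its final name once the checkpoint covering it completes, which is exactly where the rename assumption bites on S3.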
Best,
Gary

[1] http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/sink-with-BucketingSink-to-S3-files-override-td18433.html
[2] https://issues.apache.org/jira/browse/FLINK-6306

On Thu, May 17, 2018 at 11:57 AM, Amit Jain <aj201...@gmail.com> wrote:
> Hi,
>
> We are using Flink to process click-stream data from Kafka and to
> write it out as 128 MB files to S3.
>
> What are the message processing guarantees with the S3 sink? In my
> understanding, the S3A client buffers the data in memory/on disk. In a
> failure scenario on a particular node, the TM would not trigger
> Writer#close, so the buffered data could be lost entirely, assuming
> this buffer contains data from the last successful checkpoint.
>
> --
> Thanks,
> Amit
>