Re: StreamingFileSink doesn't close multipart uploads to s3?

2020-01-15 Thread Kostas Kloudas
Hi Ken, Jingsong and Li, Sorry for the late reply. As Jingsong pointed out, upon calling close() the StreamingFileSink does not commit the in-progress/pending files. The reason for this is that the close() method of any UDF including sink functions is called on both normal termination and

Re: StreamingFileSink doesn't close multipart uploads to s3?

2020-01-10 Thread Ken Krugler
Hi Kostas, I didn’t see a follow-up to this, and have also run into this same issue of winding up with a bunch of .inprogress files when a bounded input stream ends and the job terminates. When StreamingFileSystem.close() is called, shouldn’t all buckets get auto-rolled, so that the

Re: StreamingFileSink doesn't close multipart uploads to s3?

2019-12-09 Thread Jingsong Li
Hi Kostas, I took a look to StreamingFileSink.close, it just delete all temporary files. I know it is for failover. When Job fail, it should just delete temp files for next restart. But for testing purposes, we just want to run a bounded streaming job. If there is no checkpoint trigger, no one

Re: StreamingFileSink doesn't close multipart uploads to s3?

2019-12-06 Thread Li Peng
Ok I seem to have solved the issue by enabling checkpointing. Based on the docs (I'm using 1.9.0), it seemed like only StreamingFileSink.forBulkFormat() should've required checkpointing, but based on this

StreamingFileSink doesn't close multipart uploads to s3?

2019-12-06 Thread Li Peng
Hey folks, I'm trying to get StreamingFileSink to write to s3 every minute, with flink-s3-fs-hadoop, and based on the default rolling policy, which is configured to "roll" every 60 seconds, I thought that would be automatic (I interpreted rolling to mean actually close a multipart upload to s3).