Re: S3/S3A support

2016-10-12 Thread Cliff Resnick
Regarding S3 and the Rolling/BucketingSink, we've seen data loss when resuming from checkpoints, as S3 FileSystem implementations flush to temporary files while the RollingSink expects a direct flush to in-progress files. Because there is no such thing as "flush and resume writing" to S3, I don't k

Re: S3/S3A support

2016-10-11 Thread Stephan Ewen
Hi! The "truncate()" functionality is only needed for the rolling/bucketing sink. The core checkpoint functionality does not need any truncate() behavior... Best, Stephan
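For context on what truncate() buys the rolling/bucketing sink: on restore it cuts an in-progress part file back to the length recorded in the checkpoint. Where truncate() is unavailable, a known workaround (to my understanding, Flink's BucketingSink writes a similar "valid-length" marker file) is to record the valid length next to the part file and have downstream readers ignore everything past it. The Java below is a minimal sketch of that idea on the local filesystem only; the class, the method names, and the `_<name>.valid-length` naming are assumptions for illustration, not Flink's API. Note that it does not help the S3 case Cliff describes, where un-uploaded bytes are simply missing rather than extra.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

/**
 * Hypothetical sketch: recovering a part file without truncate().
 * Instead of truncating back to the checkpointed length, we record that
 * length in a side file; readers only trust bytes up to that length.
 */
public final class ValidLengthSketch {

    // Hypothetical naming convention for the marker file (assumption).
    static Path markerFor(Path partFile) {
        return partFile.resolveSibling("_" + partFile.getFileName() + ".valid-length");
    }

    /** On restore: record how many bytes were valid at the last checkpoint. */
    static void writeValidLength(Path partFile, long checkpointedLength) throws IOException {
        Files.write(markerFor(partFile),
                Long.toString(checkpointedLength).getBytes(StandardCharsets.UTF_8));
    }

    /** Reader side: return only the bytes covered by the marker, if one exists. */
    static byte[] readValidBytes(Path partFile) throws IOException {
        byte[] all = Files.readAllBytes(partFile);
        Path marker = markerFor(partFile);
        if (!Files.exists(marker)) {
            return all; // no marker: the whole file is considered valid
        }
        long valid = Long.parseLong(
                new String(Files.readAllBytes(marker), StandardCharsets.UTF_8).trim());
        return Arrays.copyOf(all, (int) Math.min(valid, all.length));
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("sink");
        Path part = dir.resolve("part-0-0");
        // "abc" was covered by the checkpoint; "XYZ" was written after it.
        Files.write(part, "abcXYZ".getBytes(StandardCharsets.UTF_8));
        writeValidLength(part, 3);
        System.out.println(new String(readValidBytes(part), StandardCharsets.UTF_8)); // abc
    }
}
```

The trade-off versus a real truncate() is that every consumer of the output must know to honor the marker file.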

Re: S3/S3A support

2016-10-11 Thread Vijay Srinivasaraghavan
Thanks Stephan. My understanding is that checkpointing uses the truncate API, but S3A does not support it. Will this have any impact? Some of the known S3A client limitations are captured on the Hortonworks site https://hortonworks.github.io/hdp-aws/s3-s3aclient/index.html and I am wondering if that has any impact on

Re: S3/S3A support

2016-10-11 Thread Stephan Ewen
Hi! In 1.2-SNAPSHOT, we recently fixed issues caused by the "eventual consistency" nature of S3. The fix is not in v1.1 - that is the only known issue I can think of. It results in occasional (rare) periods of heavy restart retries, until all files are visible to all participants. If you run into
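To illustrate the failure mode Stephan describes: with an eventually consistent store, a file written by one participant may not yet be visible to another, and treating "not visible yet" as a hard failure triggers a restart. One generic mitigation is to re-check visibility with exponential backoff before giving up. The Java below is a minimal sketch of that pattern only; it is not the actual 1.2-SNAPSHOT fix, and `VisibilityRetry`, `awaitVisible`, and its parameters are hypothetical names. The visibility check is passed in as a `BooleanSupplier` so no particular S3 client API is assumed.

```java
import java.util.function.BooleanSupplier;

/**
 * Hypothetical sketch: wait for an eventually consistent store to show a file,
 * retrying with exponential backoff instead of failing (and restarting) at once.
 */
public final class VisibilityRetry {

    /**
     * Polls isVisible up to maxAttempts times, doubling the wait between
     * attempts (capped at maxDelayMillis). Returns true as soon as the check
     * succeeds, false if it never does.
     */
    static boolean awaitVisible(BooleanSupplier isVisible, int maxAttempts,
                                long initialDelayMillis, long maxDelayMillis)
            throws InterruptedException {
        long delay = initialDelayMillis;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (isVisible.getAsBoolean()) {
                return true;
            }
            if (attempt < maxAttempts) {
                Thread.sleep(delay);
                delay = Math.min(delay * 2, maxDelayMillis);
            }
        }
        return false;
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulated store: the file "appears" only on the third check.
        int[] checks = {0};
        BooleanSupplier simulated = () -> ++checks[0] >= 3;
        boolean ok = awaitVisible(simulated, 5, 10, 100);
        System.out.println(ok + " after " + checks[0] + " checks"); // true after 3 checks
    }
}
```

Capping the delay keeps the worst-case wait bounded, so a file that truly never appears still surfaces as a failure rather than hanging.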

S3/S3A support

2016-10-10 Thread Vijay Srinivasaraghavan
Hello, Per the documentation (https://ci.apache.org/projects/flink/flink-docs-master/setup/aws.html), it looks like the S3/S3A FS implementation is supported using the standard Hadoop S3 FS client APIs. If we use S3/S3A instead of standard HCFS, 1) Are there any known limitations/issues?