Re: Using S3 as a sink (StreamingFileSink)

2019-08-19 Thread Swapnil Kumar
We are on 1.8 as of now and will give "stop with savepoint" a try once we upgrade. I am trying to cancel the job with a savepoint and restore it again. I think there is an issue with how our S3 lifecycle is configured. Thank you for your help. On Sun, Aug 18, 2019 at 8:10 AM Stephan Ewen

Re: Using S3 as a sink (StreamingFileSink)

2019-08-19 Thread Swapnil Kumar
Thank you, Taher. We are not on EMR, but it is great to know that S3 and the streaming sink should work fine based on your explanation. On Sun, Aug 18, 2019 at 8:23 AM taher koitawala wrote: > Hi Swapnil, > We faced this problem once, I think changing checkpoint dir to hdfs > and keeping sink

Re: Using S3 as a sink (StreamingFileSink)

2019-08-19 Thread Swapnil Kumar
Hello Rafi, Thank you for getting back. We have a lifecycle rule set up for the sink, not for the S3 bucket used for savepoints. This was my initial hunch too, but we tried restarting the job immediately after canceling it and it failed. Best, Swapnil Kumar On Sat, Aug 17, 2019 at 2:23 PM Rafi Aroch

Re: Using S3 as a sink (StreamingFileSink)

2019-08-18 Thread Ayush Verma
Hi, I would suggest you upgrade Flink to 1.7.x and flink-s3-fs-hadoop to 1.7.2. You might be facing one of these issues: - https://issues.apache.org/jira/browse/FLINK-11496 - https://issues.apache.org/jira/browse/FLINK-11302 Kind regards Ayush Verma On Sun, Aug 18, 2019 at 6:02 PM taher koitawala

Re: Using S3 as a sink (StreamingFileSink)

2019-08-18 Thread taher koitawala
We used EMR version 5.20 which has Flink 1.6.2 and all other libraries were according to this version. So flink-s3-fs-hadoop was 1.6.2 as well. On Sun, Aug 18, 2019, 9:55 PM Ayush Verma wrote: > Hello, could you tell us the version of flink-s3-fs-hadoop library that > you are using ? > > On Sun

Re: Using S3 as a sink (StreamingFileSink)

2019-08-18 Thread Ayush Verma
Hello, could you tell us the version of the flink-s3-fs-hadoop library that you are using? On Sun 18 Aug 2019 at 16:24, taher koitawala wrote: > Hi Swapnil, > We faced this problem once, I think changing checkpoint dir to hdfs > and keeping sink dir to s3 with EMRFS s3 consistency enabled

Re: Using S3 as a sink (StreamingFileSink)

2019-08-18 Thread taher koitawala
Hi Swapnil, We faced this problem once. I think changing the checkpoint dir to HDFS and keeping the sink dir on S3 with EMRFS S3 consistency enabled solves this problem. If you are not using EMR, then I don't know how else it can be solved. But in a nutshell, because EMRFS S3 consistency uses Dynamo

Re: Using S3 as a sink (StreamingFileSink)

2019-08-18 Thread Stephan Ewen
My first guess would also be the same as Rafi's: The lifetime of the MPU part files is too low for that use case. Maybe this can help: - If you want to stop a job with a savepoint and plan to restore later from it (possibly much later, so that the MPU part lifetime might be exceeded), then

Re: Using S3 as a sink (StreamingFileSink)

2019-08-17 Thread Rafi Aroch
Hi, S3 would delete files only if you have 'lifecycle rules' [1] defined on the bucket. Could that be the case? If so, make sure to disable / extend the object expiration period. [1] https://docs.aws.amazon.com/AmazonS3/latest/dev/object-lifecycle-mgmt.html Thanks, Rafi On Sat, Aug 17, 2019
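
For illustration only (this is not from the thread): a minimal sketch of what "extending the expiration period" could look like, assuming the rule that removes the sink's in-progress part files is an `AbortIncompleteMultipartUpload` lifecycle action. The rule ID, the `flink-output/` prefix, and the 30-day value are hypothetical placeholders; only the JSON element names follow the S3 lifecycle configuration format.

```python
import json


def build_lifecycle_config(prefix: str, abort_mpu_after_days: int) -> dict:
    """Build an S3 lifecycle configuration with a single rule that aborts
    incomplete multipart uploads under `prefix` after the given number of days."""
    if abort_mpu_after_days < 1:
        raise ValueError("abort_mpu_after_days must be at least 1")
    return {
        "Rules": [
            {
                # Hypothetical rule name; pick whatever fits your bucket.
                "ID": "abort-stale-multipart-uploads",
                # Limit the rule to the sink's output prefix (assumed value).
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                # Should be longer than the longest expected gap between
                # taking a savepoint and restoring from it.
                "AbortIncompleteMultipartUpload": {
                    "DaysAfterInitiation": abort_mpu_after_days
                },
            }
        ]
    }


if __name__ == "__main__":
    # Print the configuration as JSON so it can be saved to a file.
    print(json.dumps(build_lifecycle_config("flink-output/", 30), indent=2))
```

The printed JSON could then be saved to a file and applied with `aws s3api put-bucket-lifecycle-configuration`, or the dict passed to boto3's `put_bucket_lifecycle_configuration`. Whether this is the rule actually causing the missing part files would need to be checked against the bucket's existing lifecycle configuration.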

Re: Using S3 as a sink (StreamingFileSink)

2019-08-16 Thread Oytun Tez
Hi Swapnil, I am not familiar with the StreamingFileSink; however, this sounds like a checkpointing issue to me. FileSink should keep its sink state, and remove from the state the files that it *really successfully* sinks (perhaps you may want to add a validation here with S3 to check file

Using S3 as a sink (StreamingFileSink)

2019-08-16 Thread Swapnil Kumar
Hello, We are using Flink to process and aggregate input events, and we write the output of our streaming job to S3 using StreamingFileSink. However, whenever we try to restore the job from a savepoint, the restore fails with a missing part files error. As per my understanding, S3 deletes those