S3A "Data read has a different length than the expected" issue root cause

2019-12-17 Thread spoganshev
In case you experience an exception similar to the following:

    org.apache.flink.fs.s3base.shaded.com.amazonaws.SdkClientException: Data read has a different length than the expected: dataLength=53562; expectedLength=65536; includeSkipped=true; in.getClass()=class

Re: No FileSystem for scheme "file" for S3A in and state processor api in 1.9

2019-11-01 Thread spoganshev
No, I didn't, because it's inconvenient for us to maintain two different Docker images for streaming and batch jobs.

Re: No FileSystem for scheme "file" for S3A in and state processor api in 1.9

2019-10-31 Thread spoganshev
The problem happens in batch jobs (the ones that use ExecutionEnvironment) that use the State Processor API for bootstrapping the initial savepoint for a streaming job. We are building a single Docker image for the streaming and batch versions of the job. In that image we put both presto (which we use for
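
For context, a minimal sketch of the kind of bootstrap job being described, assuming Flink 1.9's State Processor API; the state name, operator uid, and s3a:// paths are placeholders, not taken from the thread:

    import org.apache.flink.api.common.state.ValueState;
    import org.apache.flink.api.common.state.ValueStateDescriptor;
    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.runtime.state.filesystem.FsStateBackend;
    import org.apache.flink.state.api.BootstrapTransformation;
    import org.apache.flink.state.api.OperatorTransformation;
    import org.apache.flink.state.api.Savepoint;
    import org.apache.flink.state.api.functions.KeyedStateBootstrapFunction;

    public class BootstrapSavepointJob {

        public static void main(String[] args) throws Exception {
            // Batch environment: this is what pulls in ExecutionEnvironment.
            ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

            DataSet<Long> ids = env.fromElements(1L, 2L, 3L);

            // Write each id into keyed state for the streaming job to pick up.
            BootstrapTransformation<Long> transformation = OperatorTransformation
                .bootstrapWith(ids)
                .keyBy(id -> id)
                .transform(new IdBootstrapper());

            // The s3a:// path is where the "No FileSystem for scheme" error surfaces.
            Savepoint
                .create(new FsStateBackend("s3a://bucket/savepoints"), 128)
                .withOperator("my-operator-uid", transformation)
                .write("s3a://bucket/savepoints/bootstrap");

            env.execute("bootstrap-savepoint");
        }

        static class IdBootstrapper extends KeyedStateBootstrapFunction<Long, Long> {
            private transient ValueState<Long> lastSeen;

            @Override
            public void open(Configuration parameters) {
                lastSeen = getRuntimeContext().getState(
                    new ValueStateDescriptor<>("lastSeen", Long.class));
            }

            @Override
            public void processElement(Long value, Context ctx) throws Exception {
                lastSeen.update(value);
            }
        }
    }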

Re: No FileSystem for scheme "file" for S3A in and state processor api in 1.9

2019-10-30 Thread spoganshev
Actually, I forgot to mention that it happens when there is also a presto library in the plugins folder (we are using presto for checkpoints and hadoop for file sinks in the job itself).
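
As an illustration of that split, a minimal sketch assuming Flink 1.9 with both s3 filesystems loaded; the bucket names and paths are hypothetical:

    import org.apache.flink.api.common.serialization.SimpleStringEncoder;
    import org.apache.flink.core.fs.Path;
    import org.apache.flink.runtime.state.filesystem.FsStateBackend;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;

    public class DualFileSystemJob {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            // Checkpoints go through the presto filesystem (s3p:// scheme).
            env.setStateBackend(new FsStateBackend("s3p://checkpoint-bucket/checkpoints"));
            env.enableCheckpointing(60_000);

            // The file sink goes through the hadoop filesystem (s3a:// scheme),
            // which provides the recoverable writer StreamingFileSink requires.
            StreamingFileSink<String> sink = StreamingFileSink
                .forRowFormat(new Path("s3a://output-bucket/data"),
                              new SimpleStringEncoder<String>("UTF-8"))
                .build();

            env.fromElements("a", "b", "c").addSink(sink);
            env.execute("dual-filesystem-job");
        }
    }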

No FileSystem for scheme "file" for S3A in and state processor api in 1.9

2019-10-26 Thread spoganshev
We've added the flink-s3-fs-hadoop library to the plugins folder and are trying to bootstrap state to S3 using the S3A protocol. The following exception happens (unless the hadoop library is put in the lib folder instead of plugins). It looks like the S3A filesystem is trying to use the "local" filesystem for temporary files and
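
For reference, the two layouts being compared; the 1.9.0 jar names are assumed for illustration:

    # Layout that triggers the exception (both filesystems as plugins):
    plugins/s3-fs-presto/flink-s3-fs-presto-1.9.0.jar
    plugins/s3-fs-hadoop/flink-s3-fs-hadoop-1.9.0.jar

    # Workaround mentioned above (hadoop filesystem moved to lib/):
    plugins/s3-fs-presto/flink-s3-fs-presto-1.9.0.jar
    lib/flink-s3-fs-hadoop-1.9.0.jar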

Post-processing batch JobExecutionResult

2019-09-06 Thread spoganshev
Because OptimizerPlanEnvironment.execute() throws an exception on the last line, there is no way to post-process a batch job's execution result, like:

    JobExecutionResult r = env.execute(); // execute batch job
    analyzeResult(r); // this will never get executed due to plan optimization

KryoSerializer is used for List type instead of ListSerializer

2019-08-21 Thread spoganshev
The following code (the TypeHint's generic parameter was stripped by the archive; per the subject it was a List type):

    val MAILBOX_SET_TYPE_INFO = object : TypeHint<...>() {}.typeInfo
    val env = StreamExecutionEnvironment.getExecutionEnvironment()
    println(MAILBOX_SET_TYPE_INFO.createSerializer(env.config))

prints org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializer@2c39e53. While there
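
A Java rendering of the same check, assuming a List element type of String purely for illustration; in line with what the post reports, type extraction from a TypeHint for a List falls back to Kryo, while an explicit Types.LIST produces a ListSerializer:

    import java.util.List;

    import org.apache.flink.api.common.ExecutionConfig;
    import org.apache.flink.api.common.typeinfo.TypeHint;
    import org.apache.flink.api.common.typeinfo.TypeInformation;
    import org.apache.flink.api.common.typeinfo.Types;

    public class ListSerializerCheck {
        public static void main(String[] args) {
            ExecutionConfig config = new ExecutionConfig();

            // TypeHint-based extraction: List is resolved generically,
            // so this prints a KryoSerializer, matching the post.
            TypeInformation<List<String>> hinted =
                TypeInformation.of(new TypeHint<List<String>>() {});
            System.out.println(hinted.createSerializer(config));

            // Explicit ListTypeInfo: this one does use ListSerializer.
            TypeInformation<List<String>> explicit = Types.LIST(Types.STRING);
            System.out.println(explicit.createSerializer(config));
        }
    }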

Re: Checkpoints timing out for no apparent reason

2019-07-29 Thread spoganshev
Switching to 1.8 didn't help. The timeout exception from Kinesis is a consequence, not the cause.

Re: Checkpoints timing out for no apparent reason

2019-07-23 Thread spoganshev
Looks like this is the issue: https://issues.apache.org/jira/browse/FLINK-11164 We'll try switching to 1.8 and see if it helps.

Re: Checkpoints timing out for no apparent reason

2019-07-23 Thread spoganshev
I've looked into this problem a bit more, and it looks like it is caused by a problem with the Kinesis sink. There is an exception in the logs at the moment the job gets restored after being stalled for about 15 minutes: Encountered an unexpected expired iterator

Re: Checkpoints timing out for no apparent reason

2019-07-18 Thread spoganshev
The image should be visible now at http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Checkpoints-timing-out-for-no-apparent-reason-td28793.html#none. It doesn't look like a disk performance or network issue; it feels more like a buffer overflowing or a timeout due to slightly

Checkpoints timing out for no apparent reason

2019-07-16 Thread spoganshev
We have an issue with a job that occasionally times out while creating snapshots for no apparent reason. Details:
- Flink 1.7.2
- Checkpoints are saved to S3 with presto
- Incremental checkpoints are used
What might be the cause of this issue? It feels like some internal s3 client timeout
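
For reference, a minimal sketch of the setup described above, assuming Flink 1.7 with RocksDB incremental checkpoints on the presto s3 filesystem; the bucket, interval, and timeout values are illustrative only:

    import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class CheckpointedJob {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            // RocksDB state backend with incremental checkpoints enabled,
            // writing to S3 through the presto filesystem.
            env.setStateBackend(new RocksDBStateBackend("s3://bucket/checkpoints", true));

            // Checkpoint every minute; abort a checkpoint that has not
            // completed within 10 minutes.
            env.enableCheckpointing(60_000);
            env.getCheckpointConfig().setCheckpointTimeout(600_000);

            env.fromElements(1, 2, 3).print();
            env.execute("checkpointed-job");
        }
    }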

Re: FileInputFormat that processes files in chronological order

2019-05-27 Thread spoganshev
I've tried that, but the problem is:
- FileInputFormat#getInputSplitAssigner's return type is LocatableInputSplitAssigner
- LocatableInputSplitAssigner is final
which unfortunately makes it impossible to override the split assigner.
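
To make the limitation concrete, a sketch (illustrative, not from the thread) of an attempted override: it compiles, but the covariant return type forces it to hand back a LocatableInputSplitAssigner, and the class being final leaves no way to change the assignment order.

    import org.apache.flink.api.common.io.LocatableInputSplitAssigner;
    import org.apache.flink.api.java.io.TextInputFormat;
    import org.apache.flink.core.fs.FileInputSplit;
    import org.apache.flink.core.fs.Path;

    public class ChronologicalTextInputFormat extends TextInputFormat {

        public ChronologicalTextInputFormat(Path filePath) {
            super(filePath);
        }

        @Override
        public LocatableInputSplitAssigner getInputSplitAssigner(FileInputSplit[] splits) {
            // The declared return type is the concrete, final
            // LocatableInputSplitAssigner, so a chronological assigner
            // cannot be returned or subclassed in here.
            return new LocatableInputSplitAssigner(splits);
        }
    }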

Re: FileInputFormat that processes files in chronological order

2019-05-27 Thread spoganshev
Why is FileInputFormat#getInputSplitAssigner not configurable, though? It would make sense to let those who use FileInputFormat set the desired split assigner (and make LocatableInputSplitAssigner just the default one).
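
A sketch of what that could look like, purely hypothetical and not Flink code: the broader InputSplitAssigner interface as the return type, with LocatableInputSplitAssigner as the default when nothing is set.

    import org.apache.flink.api.common.io.LocatableInputSplitAssigner;
    import org.apache.flink.core.fs.FileInputSplit;
    import org.apache.flink.core.io.InputSplitAssigner;

    // Hypothetical API sketch, not part of Flink: what a configurable
    // split assigner on FileInputFormat could look like.
    public class ConfigurableSplitAssignment {

        private InputSplitAssigner assigner; // null means "use the default"

        public void setInputSplitAssigner(InputSplitAssigner assigner) {
            this.assigner = assigner;
        }

        public InputSplitAssigner getInputSplitAssigner(FileInputSplit[] splits) {
            return assigner != null
                ? assigner
                : new LocatableInputSplitAssigner(splits);
        }
    }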