Hello there Jean!

This may be related to a network issue between Flink and MinIO/S3. I have had
this in the past, and I configured Flink to fail on startup if the state could
not be restored. So every time I hit one of those errors, Flink would restart
and try again.
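Inside Flink, the built-in way to get this "fail, wait, try again" behaviour is a restart strategy such as `restart-strategy.type: fixed-delay`. If you would rather retry outside Flink, for example around the step that submits the State Processor API job, a minimal plain-Java sketch of the same idea might look like the one below. `FixedDelayRetry` and its parameters are hypothetical names (no Flink API is used). Treating a not-found error as retryable is also an assumption: it only makes sense if the object really exists and the miss is transient.

```java
import java.time.Duration;
import java.util.concurrent.Callable;
import java.util.function.Predicate;

/**
 * Hypothetical helper (not part of Flink): retries an action a bounded number
 * of times with a fixed delay between attempts, similar in spirit to a
 * fixed-delay restart strategy.
 */
final class FixedDelayRetry {

    private FixedDelayRetry() {}

    /**
     * Runs {@code action} up to {@code maxAttempts} times, sleeping {@code delay}
     * between attempts. Only exceptions accepted by {@code retryable} trigger
     * another attempt. A non-retryable error, or running out of attempts, is
     * rethrown wrapped in an IllegalStateException with the last error as cause.
     */
    static <T> T retry(Callable<T> action, int maxAttempts, Duration delay,
                       Predicate<Exception> retryable) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                last = e;
                // Stop on errors we do not consider transient, or on the final attempt.
                if (!retryable.test(e) || attempt == maxAttempts) {
                    break;
                }
                try {
                    Thread.sleep(delay.toMillis());
                } catch (InterruptedException ie) {
                    // Preserve the interrupt status for the caller.
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("Interrupted while waiting to retry", ie);
                }
            }
        }
        throw new IllegalStateException("Giving up after error", last);
    }
}
```

Keep the attempt count small: if the key is genuinely missing, retrying only delays the failure.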



Regards,
Pedro Mázala
Be awesome


On Thu, 22 May 2025 at 12:29, Jean-Marc Paulin <jm.pau...@gmail.com> wrote:

> Hi,
>
> We are running Flink 1.20.1 and see a strange issue when trying to read a
> savepoint from MinIO/S3 into a hashmap backend. At first we thought the file
> was not there, but when we check the S3 bucket, the file is there. This is
> not systematic and only happens from time to time. We think it's an
> environmental issue. We were wondering if there are any options available
> to retry the read?
> This is the exception we see:
>
> org.apache.flink.runtime.state.BackendBuildingException: Failed when trying to restore heap backend
>         at org.apache.flink.runtime.state.heap.HeapKeyedStateBackendBuilder.restoreState(HeapKeyedStateBackendBuilder.java:174)
>         at org.apache.flink.runtime.state.heap.HeapKeyedStateBackendBuilder.build(HeapKeyedStateBackendBuilder.java:108)
>         at org.apache.flink.runtime.state.hashmap.HashMapStateBackend.createKeyedStateBackend(HashMapStateBackend.java:119)
>         at org.apache.flink.runtime.state.hashmap.HashMapStateBackend.createKeyedStateBackend(HashMapStateBackend.java:61)
>         at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.lambda$keyedStatedBackend$3(StreamTaskStateInitializerImpl.java:446)
>         at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:173)
>         at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:137)
>         at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:457)
>         at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:203)
>         at org.apache.flink.state.api.input.StreamOperatorContextBuilder.build(StreamOperatorContextBuilder.java:129)
>         at org.apache.flink.state.api.input.KeyedStateInputFormat.open(KeyedStateInputFormat.java:176)
>         at org.apache.flink.state.api.input.KeyedStateInputFormat.open(KeyedStateInputFormat.java:66)
>         at org.apache.flink.streaming.api.functions.source.InputFormatSourceFunction.run(InputFormatSourceFunction.java:92)
>         at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:113)
>         at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:71)
>         at org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.run(SourceStreamTask.java:338)
> Caused by: com.facebook.presto.hive.s3.PrestoS3FileSystem$UnrecoverableS3OperationException: com.amazonaws.services.s3.model.AmazonS3Exception: The specified key does not exist. (Service: Amazon S3; Status Code: 404; Error Code: NoSuchKey; Request ID: 1841CDA87F9BAC8F; S3 Extended Request ID: e5c2c7654856b7589c81653d762ab26f50a21aeaa0de520cb1263639b9f43328; Proxy: null), S3 Extended Request ID: e5c2c7654856b7589c81653d762ab26f50a21aeaa0de520cb1263639b9f43328 (Path: s3://aiops-ir-lifecycle/savepoints/savepoint-7a276c-8ba7a1a7741b/2bef5371-e008-4e36-a0fe-c7e6fe11c844)
>         at com.facebook.presto.hive.s3.PrestoS3FileSystem$PrestoS3InputStream.lambda$openStream$2(PrestoS3FileSystem.java:1114)
>         at com.facebook.presto.hive.RetryDriver.run(RetryDriver.java:139)
>         at com.facebook.presto.hive.s3.PrestoS3FileSystem$PrestoS3InputStream.openStream(PrestoS3FileSystem.java:1099)
>         at com.facebook.presto.hive.s3.PrestoS3FileSystem$PrestoS3InputStream.openStream(PrestoS3FileSystem.java:1084)
>         at com.facebook.presto.hive.s3.PrestoS3FileSystem$PrestoS3InputStream.seekStream(PrestoS3FileSystem.java:1077)
>         at com.facebook.presto.hive.s3.PrestoS3FileSystem$PrestoS3InputStream.lambda$read$1(PrestoS3FileSystem.java:1021)
>         at com.facebook.presto.hive.RetryDriver.run(RetryDriver.java:139)
>         at com.facebook.presto.hive.s3.PrestoS3FileSystem$PrestoS3InputStream.read(PrestoS3FileSystem.java:1020)
>         at java.base/java.io.BufferedInputStream.fill(BufferedInputStream.java:244)
>         at java.base/java.io.BufferedInputStream.read(BufferedInputStream.java:263)
>         at java.base/java.io.FilterInputStream.read(FilterInputStream.java:82)
>         at org.apache.flink.fs.s3presto.common.HadoopDataInputStream.read(HadoopDataInputStream.java:88)
>         at java.base/java.io.DataInputStream.readInt(DataInputStream.java:381)
>         at org.apache.flink.core.io.VersionedIOReadableWritable.read(VersionedIOReadableWritable.java:47)
>         at org.apache.flink.runtime.state.KeyedBackendSerializationProxy.read(KeyedBackendSerializationProxy.java:143)
>         at org.apache.flink.runtime.state.restore.FullSnapshotRestoreOperation.readMetaData(FullSnapshotRestoreOperation.java:194)
>         at org.apache.flink.runtime.state.restore.FullSnapshotRestoreOperation.restoreKeyGroupsInStateHandle(FullSnapshotRestoreOperation.java:171)
>         at org.apache.flink.runtime.state.restore.FullSnapshotRestoreOperation.access$100(FullSnapshotRestoreOperation.java:113)
>         at org.apache.flink.runtime.state.restore.FullSnapshotRestoreOperation$1.next(FullSnapshotRestoreOperation.java:158)
>         at org.apache.flink.runtime.state.restore.FullSnapshotRestoreOperation$1.next(FullSnapshotRestoreOperation.java:140)
>         at org.apache.flink.runtime.state.heap.HeapSavepointRestoreOperation.restore(HeapSavepointRestoreOperation.java:116)
>         at org.apache.flink.runtime.state.heap.HeapSavepointRestoreOperation.restore(HeapSavepointRestoreOperation.java:58)
>         at org.apache.flink.runtime.state.heap.HeapKeyedStateBackendBuilder.restoreState(HeapKeyedStateBackendBuilder.java:171)
>         ... 15 more
> Caused by: com.amazonaws.services.s3.model.AmazonS3Exception: The specified key does not exist. (Service: Amazon S3; Status Code: 404; Error Code: NoSuchKey; Request ID: 1841CDA87F9BAC8F; S3 Extended Request ID: e5c2c7654856b7589c81653d762ab26f50a21aeaa0de520cb1263639b9f43328; Proxy: null), S3 Extended Request ID: e5c2c7654856b7589c81653d762ab26f50a21aeaa0de520cb1263639b9f43328
>         at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1912)
>         at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleServiceErrorResponse(AmazonHttpClient.java:1450)
>         at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1419)
>         at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1183)
>         at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:838)
>         at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:805)
>         at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:779)
>         at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:735)
>         at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:717)
>         at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:581)
>         at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:559)
>         at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5593)
>         at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5540)
>         at com.amazonaws.services.s3.AmazonS3Client.getObject(AmazonS3Client.java:1574)
>         at com.facebook.presto.hive.s3.PrestoS3FileSystem$PrestoS3InputStream.lambda$openStream$2(PrestoS3FileSystem.java:1102)
>
>
>
> Thanks
>
> JM
>
>
>
