Hello there, Jean! This may be related to a network issue between Flink and MinIO/S3. I ran into this in the past, and I configured Flink to fail instead of starting when the state could not be restored. So every time I hit one of those errors, Flink would restart the job and try again.
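If you submit the state-processor job from your own Java code, you could also apply the same retry idea on the client side. Here is a minimal sketch, assuming that setup. `RetryingRunner.retryWithBackoff` is a hypothetical helper, not a Flink API. The `Callable` you pass in would wrap whatever submits your job and waits for it (for example `env.execute(...)`). Inside a running cluster, Flink's own restart strategy plays this role.

```java
import java.io.FileNotFoundException;
import java.time.Duration;
import java.util.concurrent.Callable;

// Hypothetical helper for illustration; not part of Flink.
public final class RetryingRunner {

    private RetryingRunner() {}

    /**
     * Runs {@code action} up to {@code maxAttempts} times, doubling the sleep
     * between attempts. If every attempt fails, the last exception is rethrown.
     */
    public static <T> T retryWithBackoff(Callable<T> action, int maxAttempts, Duration initialDelay)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Duration delay = initialDelay;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                last = e;
                if (attempt == maxAttempts) {
                    break; // out of attempts: fall through and rethrow
                }
                System.err.printf("Attempt %d/%d failed (%s), retrying in %d ms%n",
                        attempt, maxAttempts, e, delay.toMillis());
                Thread.sleep(delay.toMillis());
                delay = delay.multipliedBy(2); // exponential backoff
            }
        }
        throw last; // non-null here: we only reach this line after a failure
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical stand-in for a job submission such as env.execute(...):
        // it fails twice with a simulated missing-object error, then succeeds.
        int[] calls = {0};
        String result = retryWithBackoff(() -> {
            calls[0]++;
            if (calls[0] < 3) {
                throw new FileNotFoundException("simulated NoSuchKey");
            }
            return "restored";
        }, 5, Duration.ofMillis(10));
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

In practice I would narrow the catch to the transient case (for instance, only retry when the cause chain contains the S3 `NoSuchKey` 404), so that unrelated failures such as a serialization error still fail fast.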
Att,
Pedro Mázala
Be awesome


On Thu, 22 May 2025 at 12:29, Jean-Marc Paulin <jm.pau...@gmail.com> wrote:

> Hi,
>
> We are running Flink 1.20.1, and see a strange issue when trying to read a
> savepoint from minio/S3 to a hashmap backend. At first we'd think the file
> is not there, but when checking the S3 bucket the file is there. This is
> not systematic and only happens from time to time. We think it's an
> environmental issue. We were wondering if there were any options available
> to maybe give it a retry?
>
> This is the exception we see:
>
> org.apache.flink.runtime.state.BackendBuildingException: Failed when trying to restore heap backend
>     at org.apache.flink.runtime.state.heap.HeapKeyedStateBackendBuilder.restoreState(HeapKeyedStateBackendBuilder.java:174)
>     at org.apache.flink.runtime.state.heap.HeapKeyedStateBackendBuilder.build(HeapKeyedStateBackendBuilder.java:108)
>     at org.apache.flink.runtime.state.hashmap.HashMapStateBackend.createKeyedStateBackend(HashMapStateBackend.java:119)
>     at org.apache.flink.runtime.state.hashmap.HashMapStateBackend.createKeyedStateBackend(HashMapStateBackend.java:61)
>     at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.lambda$keyedStatedBackend$3(StreamTaskStateInitializerImpl.java:446)
>     at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:173)
>     at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:137)
>     at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:457)
>     at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:203)
>     at org.apache.flink.state.api.input.StreamOperatorContextBuilder.build(StreamOperatorContextBuilder.java:129)
>     at org.apache.flink.state.api.input.KeyedStateInputFormat.open(KeyedStateInputFormat.java:176)
>     at org.apache.flink.state.api.input.KeyedStateInputFormat.open(KeyedStateInputFormat.java:66)
>     at org.apache.flink.streaming.api.functions.source.InputFormatSourceFunction.run(InputFormatSourceFunction.java:92)
>     at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:113)
>     at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:71)
>     at org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.run(SourceStreamTask.java:338)
> Caused by: com.facebook.presto.hive.s3.PrestoS3FileSystem$UnrecoverableS3OperationException: com.amazonaws.services.s3.model.AmazonS3Exception: The specified key does not exist. (Service: Amazon S3; Status Code: 404; Error Code: NoSuchKey; Request ID: 1841CDA87F9BAC8F; S3 Extended Request ID: e5c2c7654856b7589c81653d762ab26f50a21aeaa0de520cb1263639b9f43328; Proxy: null), S3 Extended Request ID: e5c2c7654856b7589c81653d762ab26f50a21aeaa0de520cb1263639b9f43328 (Path: s3://aiops-ir-lifecycle/savepoints/savepoint-7a276c-8ba7a1a7741b/2bef5371-e008-4e36-a0fe-c7e6fe11c844)
>     at com.facebook.presto.hive.s3.PrestoS3FileSystem$PrestoS3InputStream.lambda$openStream$2(PrestoS3FileSystem.java:1114)
>     at com.facebook.presto.hive.RetryDriver.run(RetryDriver.java:139)
>     at com.facebook.presto.hive.s3.PrestoS3FileSystem$PrestoS3InputStream.openStream(PrestoS3FileSystem.java:1099)
>     at com.facebook.presto.hive.s3.PrestoS3FileSystem$PrestoS3InputStream.openStream(PrestoS3FileSystem.java:1084)
>     at com.facebook.presto.hive.s3.PrestoS3FileSystem$PrestoS3InputStream.seekStream(PrestoS3FileSystem.java:1077)
>     at com.facebook.presto.hive.s3.PrestoS3FileSystem$PrestoS3InputStream.lambda$read$1(PrestoS3FileSystem.java:1021)
>     at com.facebook.presto.hive.RetryDriver.run(RetryDriver.java:139)
>     at com.facebook.presto.hive.s3.PrestoS3FileSystem$PrestoS3InputStream.read(PrestoS3FileSystem.java:1020)
>     at java.base/java.io.BufferedInputStream.fill(BufferedInputStream.java:244)
>     at java.base/java.io.BufferedInputStream.read(BufferedInputStream.java:263)
>     at java.base/java.io.FilterInputStream.read(FilterInputStream.java:82)
>     at org.apache.flink.fs.s3presto.common.HadoopDataInputStream.read(HadoopDataInputStream.java:88)
>     at java.base/java.io.DataInputStream.readInt(DataInputStream.java:381)
>     at org.apache.flink.core.io.VersionedIOReadableWritable.read(VersionedIOReadableWritable.java:47)
>     at org.apache.flink.runtime.state.KeyedBackendSerializationProxy.read(KeyedBackendSerializationProxy.java:143)
>     at org.apache.flink.runtime.state.restore.FullSnapshotRestoreOperation.readMetaData(FullSnapshotRestoreOperation.java:194)
>     at org.apache.flink.runtime.state.restore.FullSnapshotRestoreOperation.restoreKeyGroupsInStateHandle(FullSnapshotRestoreOperation.java:171)
>     at org.apache.flink.runtime.state.restore.FullSnapshotRestoreOperation.access$100(FullSnapshotRestoreOperation.java:113)
>     at org.apache.flink.runtime.state.restore.FullSnapshotRestoreOperation$1.next(FullSnapshotRestoreOperation.java:158)
>     at org.apache.flink.runtime.state.restore.FullSnapshotRestoreOperation$1.next(FullSnapshotRestoreOperation.java:140)
>     at org.apache.flink.runtime.state.heap.HeapSavepointRestoreOperation.restore(HeapSavepointRestoreOperation.java:116)
>     at org.apache.flink.runtime.state.heap.HeapSavepointRestoreOperation.restore(HeapSavepointRestoreOperation.java:58)
>     at org.apache.flink.runtime.state.heap.HeapKeyedStateBackendBuilder.restoreState(HeapKeyedStateBackendBuilder.java:171)
>     ... 15 more
> Caused by: com.amazonaws.services.s3.model.AmazonS3Exception: The specified key does not exist. (Service: Amazon S3; Status Code: 404; Error Code: NoSuchKey; Request ID: 1841CDA87F9BAC8F; S3 Extended Request ID: e5c2c7654856b7589c81653d762ab26f50a21aeaa0de520cb1263639b9f43328; Proxy: null), S3 Extended Request ID: e5c2c7654856b7589c81653d762ab26f50a21aeaa0de520cb1263639b9f43328
>     at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1912)
>     at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleServiceErrorResponse(AmazonHttpClient.java:1450)
>     at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1419)
>     at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1183)
>     at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:838)
>     at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:805)
>     at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:779)
>     at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:735)
>     at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:717)
>     at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:581)
>     at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:559)
>     at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5593)
>     at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5540)
>     at com.amazonaws.services.s3.AmazonS3Client.getObject(AmazonS3Client.java:1574)
>     at com.facebook.presto.hive.s3.PrestoS3FileSystem$PrestoS3InputStream.lambda$openStream$2(PrestoS3FileSystem.java:1102)
>
> Thanks
>
> JM