[jira] [Commented] (FLINK-13159) java.lang.ClassNotFoundException when restore job
[ https://issues.apache.org/jira/browse/FLINK-13159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16900961#comment-16900961 ] Tzu-Li (Gordon) Tai commented on FLINK-13159: - I've made this a blocker for 1.8.2. Ideally, it would be best if we can fix this for the upcoming 1.9.0 as well. > java.lang.ClassNotFoundException when restore job > - > > Key: FLINK-13159 > URL: https://issues.apache.org/jira/browse/FLINK-13159 > Project: Flink > Issue Type: Bug > Components: API / Type Serialization System >Affects Versions: 1.8.0, 1.8.1 >Reporter: kring >Assignee: Yun Tang >Priority: Blocker > Attachments: image-2019-08-05-17-29-40-351.png, > image-2019-08-05-17-32-44-988.png > > > {code:java} > java.lang.Exception: Exception while creating StreamOperatorStateContext. > at > org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:195) > at > org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:250) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:738) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:289) > at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711) > at java.lang.Thread.run(Thread.java:748) > Caused by: org.apache.flink.util.FlinkException: Could not restore keyed > state backend for WindowOperator_b398b3dd4c544ddf2d47a0cc47d332f4_(1/6) from > any of the 1 prov > ided restore options. > at > org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:135) > at > org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:307) > at > org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:135) > ... 5 common frames omitted > Caused by: org.apache.flink.runtime.state.BackendBuildingException: Failed > when trying to restore heap backend > at > org.apache.flink.runtime.state.heap.HeapKeyedStateBackendBuilder.build(HeapKeyedStateBackendBuilder.java:130) > at > org.apache.flink.runtime.state.filesystem.FsStateBackend.createKeyedStateBackend(FsStateBackend.java:489) > at > org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.lambda$keyedStatedBackend$1(StreamTaskStateInitializerImpl.java:291) > at > org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:142) > at > org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:121) > ... 7 common frames omitted > Caused by: java.lang.RuntimeException: Cannot instantiate class. > at > org.apache.flink.api.java.typeutils.runtime.PojoSerializer.deserialize(PojoSerializer.java:384) > at > org.apache.flink.runtime.state.heap.StateTableByKeyGroupReaders.lambda$createV2PlusReader$0(StateTableByKeyGroupReaders.java:74) > at > org.apache.flink.runtime.state.KeyGroupPartitioner$PartitioningResultKeyGroupReader.readMappingsInKeyGroup(KeyGroupPartitioner.java:297) > at > org.apache.flink.runtime.state.heap.HeapRestoreOperation.readKeyGroupStateData(HeapRestoreOperation.java:290) > at > org.apache.flink.runtime.state.heap.HeapRestoreOperation.readStateHandleStateData(HeapRestoreOperation.java:251) > at > org.apache.flink.runtime.state.heap.HeapRestoreOperation.restore(HeapRestoreOperation.java:153) > at > org.apache.flink.runtime.state.heap.HeapKeyedStateBackendBuilder.build(HeapKeyedStateBackendBuilder.java:127) > ... 11 common frames omitted > Caused by: java.lang.ClassNotFoundException: xxx > at java.lang.Class.forName0(Native Method) > at java.lang.Class.forName(Class.java:348) > at > org.apache.flink.api.java.typeutils.runtime.PojoSerializer.deserialize(PojoSerializer.java:382) > ... 17 common frames omitted > {code} > A strange problem with Flink is that after a task has been running properly > for a period of time, if any exception (such as ask timeout or ES request > timeout) is thrown, the task restart will report the above error (xxx is a > business model), and ten subsequent retries will not succeed, but the task > will be resubmitted. Then it can run normally. In addition, there are three > other tasks running at the same time, none of which has the problem. > My flink version is 1.8.0. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Commented] (FLINK-13159) java.lang.ClassNotFoundException when restore job
[ https://issues.apache.org/jira/browse/FLINK-13159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16900957#comment-16900957 ] Tzu-Li (Gordon) Tai commented on FLINK-13159: - Thanks for the diagnosis [~yunta] [~kring]. The observations makes sense to me, and this indeed is a bug since the user classloader is not being used when deserializing. Please ping me for a review [~yunta], I've assigned you to the JIRA ticket. > java.lang.ClassNotFoundException when restore job > - > > Key: FLINK-13159 > URL: https://issues.apache.org/jira/browse/FLINK-13159 > Project: Flink > Issue Type: Bug > Components: API / Type Serialization System >Reporter: kring >Assignee: Yun Tang >Priority: Critical > Attachments: image-2019-08-05-17-29-40-351.png, > image-2019-08-05-17-32-44-988.png > > > {code:java} > java.lang.Exception: Exception while creating StreamOperatorStateContext. > at > org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:195) > at > org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:250) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:738) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:289) > at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711) > at java.lang.Thread.run(Thread.java:748) > Caused by: org.apache.flink.util.FlinkException: Could not restore keyed > state backend for WindowOperator_b398b3dd4c544ddf2d47a0cc47d332f4_(1/6) from > any of the 1 prov > ided restore options. > at > org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:135) > at > org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:307) > at > org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:135) > ... 5 common frames omitted > Caused by: org.apache.flink.runtime.state.BackendBuildingException: Failed > when trying to restore heap backend > at > org.apache.flink.runtime.state.heap.HeapKeyedStateBackendBuilder.build(HeapKeyedStateBackendBuilder.java:130) > at > org.apache.flink.runtime.state.filesystem.FsStateBackend.createKeyedStateBackend(FsStateBackend.java:489) > at > org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.lambda$keyedStatedBackend$1(StreamTaskStateInitializerImpl.java:291) > at > org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:142) > at > org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:121) > ... 7 common frames omitted > Caused by: java.lang.RuntimeException: Cannot instantiate class. > at > org.apache.flink.api.java.typeutils.runtime.PojoSerializer.deserialize(PojoSerializer.java:384) > at > org.apache.flink.runtime.state.heap.StateTableByKeyGroupReaders.lambda$createV2PlusReader$0(StateTableByKeyGroupReaders.java:74) > at > org.apache.flink.runtime.state.KeyGroupPartitioner$PartitioningResultKeyGroupReader.readMappingsInKeyGroup(KeyGroupPartitioner.java:297) > at > org.apache.flink.runtime.state.heap.HeapRestoreOperation.readKeyGroupStateData(HeapRestoreOperation.java:290) > at > org.apache.flink.runtime.state.heap.HeapRestoreOperation.readStateHandleStateData(HeapRestoreOperation.java:251) > at > org.apache.flink.runtime.state.heap.HeapRestoreOperation.restore(HeapRestoreOperation.java:153) > at > org.apache.flink.runtime.state.heap.HeapKeyedStateBackendBuilder.build(HeapKeyedStateBackendBuilder.java:127) > ... 11 common frames omitted > Caused by: java.lang.ClassNotFoundException: xxx > at java.lang.Class.forName0(Native Method) > at java.lang.Class.forName(Class.java:348) > at > org.apache.flink.api.java.typeutils.runtime.PojoSerializer.deserialize(PojoSerializer.java:382) > ... 17 common frames omitted > {code} > A strange problem with Flink is that after a task has been running properly > for a period of time, if any exception (such as ask timeout or ES request > timeout) is thrown, the task restart will report the above error (xxx is a > business model), and ten subsequent retries will not succeed, but the task > will be resubmitted. Then it can run normally. In addition, there are three > other tasks running at the same time, none of which has the problem. > My flink version is 1.8.0. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Commented] (FLINK-13159) java.lang.ClassNotFoundException when restore job
[ https://issues.apache.org/jira/browse/FLINK-13159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16900733#comment-16900733 ] Yun Tang commented on FLINK-13159: -- Before Flink-1.8, {{PojoSerializer}} would always use the [1st constructor|https://github.com/apache/flink/blob/56c3e7cd653e4cb2ad0a76ca317aa9fa1d564dc2/flink-core/src/main/java/org/apache/flink/api/java/typeutils/runtime/PojoSerializer.java#L119] which assign {{Thread.currentThread().getContextClassLoader()}} to private field {{ClassLoader cl}}. The [2nd constructor|https://github.com/apache/flink/blob/56c3e7cd653e4cb2ad0a76ca317aa9fa1d564dc2/flink-core/src/main/java/org/apache/flink/api/java/typeutils/runtime/PojoSerializer.java#L147] without {{cl}} assigned would only be used to ensure the compatibility. However, after Flink-1.8, we would use the [2nd constructor|https://github.com/apache/flink/blob/a0d236fba7c6abdabb461aa504b1e088a3982c31/flink-core/src/main/java/org/apache/flink/api/java/typeutils/runtime/PojoSerializer.java#L142] to create a restore serializer or a reconfigured serializer from a {{PojoSerializerSnapshot}}. Unfortunately, the restored serializer does not contain the class loader. I could use below code within {{PojoSubclassSerializerTest}} to reproduce this. {code:java} @Test public void testRestorePojoSerializer() throws IOException { TypeSerializer serializer = createSerializer(); TestUserClass2 bar = new TestUserClass2( ThreadLocalRandom.current().nextInt(), "bar", ThreadLocalRandom.current().nextFloat()); ByteArrayOutputStream out = new ByteArrayOutputStream(); try (DataOutputViewStreamWrapper outView = new DataOutputViewStreamWrapper(out)) { serializer.serialize(bar, outView); DataInputViewStreamWrapper inputViewStreamWrapper = new DataInputViewStreamWrapper(new ByteArrayInputStream(out.toByteArray())); serializer.deserialize(inputViewStreamWrapper); } TypeSerializerSnapshot snapshot = serializer.snapshotConfiguration(); TypeSerializer restoreSerializer = snapshot.restoreSerializer(); try (DataInputViewStreamWrapper inputView = new DataInputViewStreamWrapper(new ByteArrayInputStream(out.toByteArray( { restoreSerializer.deserialize(inputView); } } {code} [~tzulitai], what do you think of this? If this is really a bug, I am glad to help to resolve this issue. > java.lang.ClassNotFoundException when restore job > - > > Key: FLINK-13159 > URL: https://issues.apache.org/jira/browse/FLINK-13159 > Project: Flink > Issue Type: Bug > Components: API / Type Serialization System >Reporter: kring >Priority: Critical > Attachments: image-2019-08-05-17-29-40-351.png, > image-2019-08-05-17-32-44-988.png > > > {code:java} > java.lang.Exception: Exception while creating StreamOperatorStateContext. > at > org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:195) > at > org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:250) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:738) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:289) > at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711) > at java.lang.Thread.run(Thread.java:748) > Caused by: org.apache.flink.util.FlinkException: Could not restore keyed > state backend for WindowOperator_b398b3dd4c544ddf2d47a0cc47d332f4_(1/6) from > any of the 1 prov > ided restore options. > at > org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:135) > at > org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:307) > at > org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:135) > ... 5 common frames omitted > Caused by: org.apache.flink.runtime.state.BackendBuildingException: Failed > when trying to restore heap backend > at > org.apache.flink.runtime.state.heap.HeapKeyedStateBackendBuilder.build(HeapKeyedStateBackendBuilder.java:130) > at > org.apache.flink.runtime.state.filesystem.FsStateBackend.createKeyedStateBackend(FsStateBackend.java:489) > at > org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.lambda$keyedStatedBackend$1(StreamTaskStateInitializerImpl.java:291) > at > org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:142) > at > org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.j
[jira] [Commented] (FLINK-13159) java.lang.ClassNotFoundException when restore job
[ https://issues.apache.org/jira/browse/FLINK-13159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16899946#comment-16899946 ] kring commented on FLINK-13159: --- My code uses a parent reference, which actually points to a subclass object. Then I tried debug Flink and found that CL was not passed in when this constructor !image-2019-08-05-17-29-40-351.png! was used to create PojoSerializer, resulting in no CL leading to Cannot instantiate class error when executing the org.apache.flink.api.java.typeutils.runtime.PojoSerializer#deserialize(org.apache.flink.core.memory.DataInputView) method.So I added !image-2019-08-05-17-32-44-988.png! to the last line of the constructor, and then recompiled and packaged, which successfully solved the problem.[~ssailappan] I guess you have the same problem as me. > java.lang.ClassNotFoundException when restore job > - > > Key: FLINK-13159 > URL: https://issues.apache.org/jira/browse/FLINK-13159 > Project: Flink > Issue Type: Bug > Components: API / Type Serialization System >Reporter: kring >Priority: Critical > Attachments: image-2019-08-05-17-32-44-988.png > > > {code:java} > java.lang.Exception: Exception while creating StreamOperatorStateContext. > at > org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:195) > at > org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:250) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:738) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:289) > at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711) > at java.lang.Thread.run(Thread.java:748) > Caused by: org.apache.flink.util.FlinkException: Could not restore keyed > state backend for WindowOperator_b398b3dd4c544ddf2d47a0cc47d332f4_(1/6) from > any of the 1 prov > ided restore options. > at > org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:135) > at > org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:307) > at > org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:135) > ... 5 common frames omitted > Caused by: org.apache.flink.runtime.state.BackendBuildingException: Failed > when trying to restore heap backend > at > org.apache.flink.runtime.state.heap.HeapKeyedStateBackendBuilder.build(HeapKeyedStateBackendBuilder.java:130) > at > org.apache.flink.runtime.state.filesystem.FsStateBackend.createKeyedStateBackend(FsStateBackend.java:489) > at > org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.lambda$keyedStatedBackend$1(StreamTaskStateInitializerImpl.java:291) > at > org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:142) > at > org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:121) > ... 7 common frames omitted > Caused by: java.lang.RuntimeException: Cannot instantiate class. > at > org.apache.flink.api.java.typeutils.runtime.PojoSerializer.deserialize(PojoSerializer.java:384) > at > org.apache.flink.runtime.state.heap.StateTableByKeyGroupReaders.lambda$createV2PlusReader$0(StateTableByKeyGroupReaders.java:74) > at > org.apache.flink.runtime.state.KeyGroupPartitioner$PartitioningResultKeyGroupReader.readMappingsInKeyGroup(KeyGroupPartitioner.java:297) > at > org.apache.flink.runtime.state.heap.HeapRestoreOperation.readKeyGroupStateData(HeapRestoreOperation.java:290) > at > org.apache.flink.runtime.state.heap.HeapRestoreOperation.readStateHandleStateData(HeapRestoreOperation.java:251) > at > org.apache.flink.runtime.state.heap.HeapRestoreOperation.restore(HeapRestoreOperation.java:153) > at > org.apache.flink.runtime.state.heap.HeapKeyedStateBackendBuilder.build(HeapKeyedStateBackendBuilder.java:127) > ... 11 common frames omitted > Caused by: java.lang.ClassNotFoundException: xxx > at java.lang.Class.forName0(Native Method) > at java.lang.Class.forName(Class.java:348) > at > org.apache.flink.api.java.typeutils.runtime.PojoSerializer.deserialize(PojoSerializer.java:382) > ... 17 common frames omitted > {code} > A strange problem with Flink is that after a task has been running properly > for a period of time, if any exception (such as ask timeout or ES request > timeout) is thrown, the task restart will report the above error (xxx is a > business model), and ten subsequent retries will not succeed, but the task > will be resubmitted. Then it
[jira] [Commented] (FLINK-13159) java.lang.ClassNotFoundException when restore job
[ https://issues.apache.org/jira/browse/FLINK-13159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16895797#comment-16895797 ] Subramanian Sailappan commented on FLINK-13159: --- I am running into the same issue. I am running Flink 1.8.0 on EMR on a yarn cluster. The checkpoint is externalized and uses S3. The job runs fine for long times before it fails with the below exception and never recovers after that. It keeps trying to restore from the checkpoint and runs into the same error repeatedly. java.lang.Exception: Exception while creating StreamOperatorStateContext. at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:195) at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:250) at org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:738) at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:289) at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711) at java.lang.Thread.run(Thread.java:748) Caused by: org.apache.flink.util.FlinkException: Could not restore keyed state backend for WindowOperator_d28463815a9d818025b1ff96211a6dc9_(2/2) from any of the 1 provided restore options. at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:135) at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:307) at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:135) ... 5 more Caused by: org.apache.flink.runtime.state.BackendBuildingException: Failed when trying to restore heap backend at org.apache.flink.runtime.state.heap.HeapKeyedStateBackendBuilder.build(HeapKeyedStateBackendBuilder.java:130) at org.apache.flink.runtime.state.filesystem.FsStateBackend.createKeyedStateBackend(FsStateBackend.java:489) at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.lambda$keyedStatedBackend$1(StreamTaskStateInitializerImpl.java:291) at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:142) at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:121) ... 7 more Caused by: java.lang.RuntimeException: Cannot instantiate class. at org.apache.flink.api.java.typeutils.runtime.PojoSerializer.deserialize(PojoSerializer.java:384) at org.apache.flink.api.common.typeutils.base.ListSerializer.deserialize(ListSerializer.java:133) at org.apache.flink.api.common.typeutils.base.ListSerializer.deserialize(ListSerializer.java:42) at org.apache.flink.runtime.state.heap.StateTableByKeyGroupReaders.lambda$createV2PlusReader$0(StateTableByKeyGroupReaders.java:74) at org.apache.flink.runtime.state.KeyGroupPartitioner$PartitioningResultKeyGroupReader.readMappingsInKeyGroup(KeyGroupPartitioner.java:297) at org.apache.flink.runtime.state.heap.HeapRestoreOperation.readKeyGroupStateData(HeapRestoreOperation.java:290) at org.apache.flink.runtime.state.heap.HeapRestoreOperation.readStateHandleStateData(HeapRestoreOperation.java:251) at org.apache.flink.runtime.state.heap.HeapRestoreOperation.restore(HeapRestoreOperation.java:153) at org.apache.flink.runtime.state.heap.HeapKeyedStateBackendBuilder.build(HeapKeyedStateBackendBuilder.java:127) ... 11 more Caused by: java.lang.ClassNotFoundException: xxx at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:348) at org.apache.flink.api.java.typeutils.runtime.PojoSerializer.deserialize(PojoSerializer.java:382) ... 19 more > java.lang.ClassNotFoundException when restore job > - > > Key: FLINK-13159 > URL: https://issues.apache.org/jira/browse/FLINK-13159 > Project: Flink > Issue Type: Bug > Components: API / Type Serialization System >Reporter: kring >Priority: Critical > > {code:java} > java.lang.Exception: Exception while creating StreamOperatorStateContext. > at > org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:195) > at > org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:250) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:738) > at >
[jira] [Commented] (FLINK-13159) java.lang.ClassNotFoundException when restore job
[ https://issues.apache.org/jira/browse/FLINK-13159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16894254#comment-16894254 ] Liu Bo commented on FLINK-13159: I'm using flink 1.8.1 on yarn session with rocksdb as state backend. got the same problem when starting flink jobs with checkpoints: bin/flink run -s /flink/checkpoints/c2b8fa3c51393a2c6865ca13045eccad/chk-84 deploy/xxx.jar job keep retrying and fail after about 8 seconds. Our job runs on flink 1.7.2 with no problem for about 4 months and could recover successfully. Full stack trace: java.lang.RuntimeException: Exception occurred while processing valve output watermark: at org.apache.flink.streaming.runtime.io.StreamInputProcessor$ForwardingValveOutputHandler.handleWatermark(StreamInputProcessor.java:265) at org.apache.flink.streaming.runtime.streamstatus.StatusWatermarkValve.findAndOutputNewMinWatermarkAcrossAlignedChannels(StatusWatermarkValve.java:189) at org.apache.flink.streaming.runtime.streamstatus.StatusWatermarkValve.inputWatermark(StatusWatermarkValve.java:111) at org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:184) at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:105) at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:300) at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711) at java.lang.Thread.run(Thread.java:748) Caused by: java.lang.RuntimeException: Cannot instantiate class. at org.apache.flink.api.java.typeutils.runtime.PojoSerializer.deserialize(PojoSerializer.java:384) at org.apache.flink.api.java.typeutils.runtime.TupleSerializer.deserialize(TupleSerializer.java:143) at org.apache.flink.api.java.typeutils.runtime.TupleSerializer.deserialize(TupleSerializer.java:37) at org.apache.flink.streaming.api.datastream.CoGroupedStreams$UnionSerializer.deserialize(CoGroupedStreams.java:581) at org.apache.flink.streaming.api.datastream.CoGroupedStreams$UnionSerializer.deserialize(CoGroupedStreams.java:506) at org.apache.flink.contrib.streaming.state.RocksDBListState.deserializeNextElement(RocksDBListState.java:144) at org.apache.flink.contrib.streaming.state.RocksDBListState.deserializeList(RocksDBListState.java:135) at org.apache.flink.contrib.streaming.state.RocksDBListState.getInternal(RocksDBListState.java:119) at org.apache.flink.contrib.streaming.state.RocksDBListState.get(RocksDBListState.java:111) at org.apache.flink.contrib.streaming.state.RocksDBListState.get(RocksDBListState.java:60) at org.apache.flink.streaming.runtime.operators.windowing.WindowOperator.onEventTime(WindowOperator.java:452) at org.apache.flink.streaming.api.operators.InternalTimerServiceImpl.advanceWatermark(InternalTimerServiceImpl.java:255) at org.apache.flink.streaming.api.operators.InternalTimeServiceManager.advanceWatermark(InternalTimeServiceManager.java:128) at org.apache.flink.streaming.api.operators.AbstractStreamOperator.processWatermark(AbstractStreamOperator.java:775) at org.apache.flink.streaming.runtime.io.StreamInputProcessor$ForwardingValveOutputHandler.handleWatermark(StreamInputProcessor.java:262) ... 7 more Caused by: java.lang.ClassNotFoundException: com/xx/xx/xx at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:348) at org.apache.flink.api.java.typeutils.runtime.PojoSerializer.deserialize(PojoSerializer.java:382) > java.lang.ClassNotFoundException when restore job > - > > Key: FLINK-13159 > URL: https://issues.apache.org/jira/browse/FLINK-13159 > Project: Flink > Issue Type: Bug > Components: API / Type Serialization System >Reporter: kring >Priority: Critical > > {code:java} > java.lang.Exception: Exception while creating StreamOperatorStateContext. > at > org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:195) > at > org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:250) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:738) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:289) > at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711) > at java.lang.Thread.run(Thread.java:748) > Caused by: org.apache.flink.util.FlinkException: Could not restore keyed > state backend for WindowOperator_b398b3dd4c544ddf2d47a0cc47d332f4_(1/6) from > any of the 1 prov > ided restore options. > at > org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:135) > at > org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedSta
[jira] [Commented] (FLINK-13159) java.lang.ClassNotFoundException when restore job
[ https://issues.apache.org/jira/browse/FLINK-13159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16880987#comment-16880987 ] kring commented on FLINK-13159: --- My taskManager and JobManager are on the same machine, using local files as backends. Therefore, I did not configure local recovery. Just now, I created a savepoint for this task and tried to recover from the savepoint. Then I reproduced the exception 100 percent, so I'm going to debug flink. > java.lang.ClassNotFoundException when restore job > - > > Key: FLINK-13159 > URL: https://issues.apache.org/jira/browse/FLINK-13159 > Project: Flink > Issue Type: Bug >Reporter: kring >Priority: Critical > > {code:java} > java.lang.Exception: Exception while creating StreamOperatorStateContext. > at > org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:195) > at > org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:250) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:738) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:289) > at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711) > at java.lang.Thread.run(Thread.java:748) > Caused by: org.apache.flink.util.FlinkException: Could not restore keyed > state backend for WindowOperator_b398b3dd4c544ddf2d47a0cc47d332f4_(1/6) from > any of the 1 prov > ided restore options. > at > org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:135) > at > org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:307) > at > org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:135) > ... 5 common frames omitted > Caused by: org.apache.flink.runtime.state.BackendBuildingException: Failed > when trying to restore heap backend > at > org.apache.flink.runtime.state.heap.HeapKeyedStateBackendBuilder.build(HeapKeyedStateBackendBuilder.java:130) > at > org.apache.flink.runtime.state.filesystem.FsStateBackend.createKeyedStateBackend(FsStateBackend.java:489) > at > org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.lambda$keyedStatedBackend$1(StreamTaskStateInitializerImpl.java:291) > at > org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:142) > at > org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:121) > ... 7 common frames omitted > Caused by: java.lang.RuntimeException: Cannot instantiate class. > at > org.apache.flink.api.java.typeutils.runtime.PojoSerializer.deserialize(PojoSerializer.java:384) > at > org.apache.flink.runtime.state.heap.StateTableByKeyGroupReaders.lambda$createV2PlusReader$0(StateTableByKeyGroupReaders.java:74) > at > org.apache.flink.runtime.state.KeyGroupPartitioner$PartitioningResultKeyGroupReader.readMappingsInKeyGroup(KeyGroupPartitioner.java:297) > at > org.apache.flink.runtime.state.heap.HeapRestoreOperation.readKeyGroupStateData(HeapRestoreOperation.java:290) > at > org.apache.flink.runtime.state.heap.HeapRestoreOperation.readStateHandleStateData(HeapRestoreOperation.java:251) > at > org.apache.flink.runtime.state.heap.HeapRestoreOperation.restore(HeapRestoreOperation.java:153) > at > org.apache.flink.runtime.state.heap.HeapKeyedStateBackendBuilder.build(HeapKeyedStateBackendBuilder.java:127) > ... 11 common frames omitted > Caused by: java.lang.ClassNotFoundException: xxx > at java.lang.Class.forName0(Native Method) > at java.lang.Class.forName(Class.java:348) > at > org.apache.flink.api.java.typeutils.runtime.PojoSerializer.deserialize(PojoSerializer.java:382) > ... 17 common frames omitted > {code} > A strange problem with Flink is that after a task has been running properly > for a period of time, if any exception (such as ask timeout or ES request > timeout) is thrown, the task restart will report the above error (xxx is a > business model), and ten subsequent retries will not succeed, but the task > will be resubmitted. Then it can run normally. In addition, there are three > other tasks running at the same time, none of which has the problem. > My flink version is 1.8.0. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-13159) java.lang.ClassNotFoundException when restore job
[ https://issues.apache.org/jira/browse/FLINK-13159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16880972#comment-16880972 ] Yun Tang commented on FLINK-13159: -- [~kring] would you please give more details. From your description, you would meet the exception if your job failover. However, this could come to normal just after several times retries. Have you ever turned on [local recovery|https://ci.apache.org/projects/flink/flink-docs-stable/ops/state/large_state_tuning.html#task-local-recovery], and if the failling task would come to normal once it run on another worker node? > java.lang.ClassNotFoundException when restore job > - > > Key: FLINK-13159 > URL: https://issues.apache.org/jira/browse/FLINK-13159 > Project: Flink > Issue Type: Bug >Reporter: kring >Priority: Critical > > {code:java} > java.lang.Exception: Exception while creating StreamOperatorStateContext. > at > org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:195) > at > org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:250) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:738) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:289) > at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711) > at java.lang.Thread.run(Thread.java:748) > Caused by: org.apache.flink.util.FlinkException: Could not restore keyed > state backend for WindowOperator_b398b3dd4c544ddf2d47a0cc47d332f4_(1/6) from > any of the 1 prov > ided restore options. > at > org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:135) > at > org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:307) > at > org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:135) > ... 5 common frames omitted > Caused by: org.apache.flink.runtime.state.BackendBuildingException: Failed > when trying to restore heap backend > at > org.apache.flink.runtime.state.heap.HeapKeyedStateBackendBuilder.build(HeapKeyedStateBackendBuilder.java:130) > at > org.apache.flink.runtime.state.filesystem.FsStateBackend.createKeyedStateBackend(FsStateBackend.java:489) > at > org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.lambda$keyedStatedBackend$1(StreamTaskStateInitializerImpl.java:291) > at > org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:142) > at > org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:121) > ... 7 common frames omitted > Caused by: java.lang.RuntimeException: Cannot instantiate class. > at > org.apache.flink.api.java.typeutils.runtime.PojoSerializer.deserialize(PojoSerializer.java:384) > at > org.apache.flink.runtime.state.heap.StateTableByKeyGroupReaders.lambda$createV2PlusReader$0(StateTableByKeyGroupReaders.java:74) > at > org.apache.flink.runtime.state.KeyGroupPartitioner$PartitioningResultKeyGroupReader.readMappingsInKeyGroup(KeyGroupPartitioner.java:297) > at > org.apache.flink.runtime.state.heap.HeapRestoreOperation.readKeyGroupStateData(HeapRestoreOperation.java:290) > at > org.apache.flink.runtime.state.heap.HeapRestoreOperation.readStateHandleStateData(HeapRestoreOperation.java:251) > at > org.apache.flink.runtime.state.heap.HeapRestoreOperation.restore(HeapRestoreOperation.java:153) > at > org.apache.flink.runtime.state.heap.HeapKeyedStateBackendBuilder.build(HeapKeyedStateBackendBuilder.java:127) > ... 11 common frames omitted > Caused by: java.lang.ClassNotFoundException: xxx > at java.lang.Class.forName0(Native Method) > at java.lang.Class.forName(Class.java:348) > at > org.apache.flink.api.java.typeutils.runtime.PojoSerializer.deserialize(PojoSerializer.java:382) > ... 17 common frames omitted > {code} > A strange problem with Flink is that after a task has been running properly > for a period of time, if any exception (such as ask timeout or ES request > timeout) is thrown, the task restart will report the above error (xxx is a > business model), and ten subsequent retries will not succeed, but the task > will be resubmitted. Then it can run normally. In addition, there are three > other tasks running at the same time, none of which has the problem. > My flink version is 1.8.0. -- This message was sent by Atlassian JIRA (v7.6.3#76005)