[jira] [Commented] (FLINK-13159) java.lang.ClassNotFoundException when restore job

2019-08-06 Thread Tzu-Li (Gordon) Tai (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-13159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16900961#comment-16900961
 ] 

Tzu-Li (Gordon) Tai commented on FLINK-13159:
-

I've made this a blocker for 1.8.2.
Ideally, it would be best if we can fix this for the upcoming 1.9.0 as well.

> java.lang.ClassNotFoundException when restore job
> -
>
> Key: FLINK-13159
> URL: https://issues.apache.org/jira/browse/FLINK-13159
> Project: Flink
>  Issue Type: Bug
>  Components: API / Type Serialization System
>Affects Versions: 1.8.0, 1.8.1
>Reporter: kring
>Assignee: Yun Tang
>Priority: Blocker
> Attachments: image-2019-08-05-17-29-40-351.png, 
> image-2019-08-05-17-32-44-988.png
>
>
> {code:java}
> java.lang.Exception: Exception while creating StreamOperatorStateContext.
> at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:195)
> at 
> org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:250)
> at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:738)
> at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:289)
> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.flink.util.FlinkException: Could not restore keyed 
> state backend for WindowOperator_b398b3dd4c544ddf2d47a0cc47d332f4_(1/6) from 
> any of the 1 prov
> ided restore options.
> at 
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:135)
> at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:307)
> at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:135)
> ... 5 common frames omitted
> Caused by: org.apache.flink.runtime.state.BackendBuildingException: Failed 
> when trying to restore heap backend
> at 
> org.apache.flink.runtime.state.heap.HeapKeyedStateBackendBuilder.build(HeapKeyedStateBackendBuilder.java:130)
> at 
> org.apache.flink.runtime.state.filesystem.FsStateBackend.createKeyedStateBackend(FsStateBackend.java:489)
> at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.lambda$keyedStatedBackend$1(StreamTaskStateInitializerImpl.java:291)
> at 
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:142)
> at 
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:121)
> ... 7 common frames omitted
> Caused by: java.lang.RuntimeException: Cannot instantiate class.
> at 
> org.apache.flink.api.java.typeutils.runtime.PojoSerializer.deserialize(PojoSerializer.java:384)
> at 
> org.apache.flink.runtime.state.heap.StateTableByKeyGroupReaders.lambda$createV2PlusReader$0(StateTableByKeyGroupReaders.java:74)
> at 
> org.apache.flink.runtime.state.KeyGroupPartitioner$PartitioningResultKeyGroupReader.readMappingsInKeyGroup(KeyGroupPartitioner.java:297)
> at 
> org.apache.flink.runtime.state.heap.HeapRestoreOperation.readKeyGroupStateData(HeapRestoreOperation.java:290)
> at 
> org.apache.flink.runtime.state.heap.HeapRestoreOperation.readStateHandleStateData(HeapRestoreOperation.java:251)
> at 
> org.apache.flink.runtime.state.heap.HeapRestoreOperation.restore(HeapRestoreOperation.java:153)
> at 
> org.apache.flink.runtime.state.heap.HeapKeyedStateBackendBuilder.build(HeapKeyedStateBackendBuilder.java:127)
> ... 11 common frames omitted
> Caused by: java.lang.ClassNotFoundException: xxx
> at java.lang.Class.forName0(Native Method)
> at java.lang.Class.forName(Class.java:348)
> at 
> org.apache.flink.api.java.typeutils.runtime.PojoSerializer.deserialize(PojoSerializer.java:382)
> ... 17 common frames omitted
> {code}
> A strange problem with Flink is that after a task has been running properly 
> for a period of time, if any exception (such as ask timeout or ES request 
> timeout) is thrown, the task restart will report the above error (xxx is a 
> business model), and ten subsequent retries will not succeed, but the task 
> will be resubmitted. Then it can run normally. In addition, there are three 
> other tasks running at the same time, none of which has the problem.
> My flink version is 1.8.0.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (FLINK-13159) java.lang.ClassNotFoundException when restore job

2019-08-06 Thread Tzu-Li (Gordon) Tai (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-13159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16900957#comment-16900957
 ] 

Tzu-Li (Gordon) Tai commented on FLINK-13159:
-

Thanks for the diagnosis [~yunta] [~kring].
The observations makes sense to me, and this indeed is a bug since the user 
classloader is not being used when deserializing.

Please ping me for a review [~yunta], I've assigned you to the JIRA ticket.

> java.lang.ClassNotFoundException when restore job
> -
>
> Key: FLINK-13159
> URL: https://issues.apache.org/jira/browse/FLINK-13159
> Project: Flink
>  Issue Type: Bug
>  Components: API / Type Serialization System
>Reporter: kring
>Assignee: Yun Tang
>Priority: Critical
> Attachments: image-2019-08-05-17-29-40-351.png, 
> image-2019-08-05-17-32-44-988.png
>
>
> {code:java}
> java.lang.Exception: Exception while creating StreamOperatorStateContext.
> at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:195)
> at 
> org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:250)
> at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:738)
> at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:289)
> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.flink.util.FlinkException: Could not restore keyed 
> state backend for WindowOperator_b398b3dd4c544ddf2d47a0cc47d332f4_(1/6) from 
> any of the 1 prov
> ided restore options.
> at 
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:135)
> at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:307)
> at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:135)
> ... 5 common frames omitted
> Caused by: org.apache.flink.runtime.state.BackendBuildingException: Failed 
> when trying to restore heap backend
> at 
> org.apache.flink.runtime.state.heap.HeapKeyedStateBackendBuilder.build(HeapKeyedStateBackendBuilder.java:130)
> at 
> org.apache.flink.runtime.state.filesystem.FsStateBackend.createKeyedStateBackend(FsStateBackend.java:489)
> at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.lambda$keyedStatedBackend$1(StreamTaskStateInitializerImpl.java:291)
> at 
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:142)
> at 
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:121)
> ... 7 common frames omitted
> Caused by: java.lang.RuntimeException: Cannot instantiate class.
> at 
> org.apache.flink.api.java.typeutils.runtime.PojoSerializer.deserialize(PojoSerializer.java:384)
> at 
> org.apache.flink.runtime.state.heap.StateTableByKeyGroupReaders.lambda$createV2PlusReader$0(StateTableByKeyGroupReaders.java:74)
> at 
> org.apache.flink.runtime.state.KeyGroupPartitioner$PartitioningResultKeyGroupReader.readMappingsInKeyGroup(KeyGroupPartitioner.java:297)
> at 
> org.apache.flink.runtime.state.heap.HeapRestoreOperation.readKeyGroupStateData(HeapRestoreOperation.java:290)
> at 
> org.apache.flink.runtime.state.heap.HeapRestoreOperation.readStateHandleStateData(HeapRestoreOperation.java:251)
> at 
> org.apache.flink.runtime.state.heap.HeapRestoreOperation.restore(HeapRestoreOperation.java:153)
> at 
> org.apache.flink.runtime.state.heap.HeapKeyedStateBackendBuilder.build(HeapKeyedStateBackendBuilder.java:127)
> ... 11 common frames omitted
> Caused by: java.lang.ClassNotFoundException: xxx
> at java.lang.Class.forName0(Native Method)
> at java.lang.Class.forName(Class.java:348)
> at 
> org.apache.flink.api.java.typeutils.runtime.PojoSerializer.deserialize(PojoSerializer.java:382)
> ... 17 common frames omitted
> {code}
> A strange problem with Flink is that after a task has been running properly 
> for a period of time, if any exception (such as ask timeout or ES request 
> timeout) is thrown, the task restart will report the above error (xxx is a 
> business model), and ten subsequent retries will not succeed, but the task 
> will be resubmitted. Then it can run normally. In addition, there are three 
> other tasks running at the same time, none of which has the problem.
> My flink version is 1.8.0.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (FLINK-13159) java.lang.ClassNotFoundException when restore job

2019-08-06 Thread Yun Tang (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-13159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16900733#comment-16900733
 ] 

Yun Tang commented on FLINK-13159:
--

Before Flink-1.8, {{PojoSerializer}} would always use the [1st 
constructor|https://github.com/apache/flink/blob/56c3e7cd653e4cb2ad0a76ca317aa9fa1d564dc2/flink-core/src/main/java/org/apache/flink/api/java/typeutils/runtime/PojoSerializer.java#L119]
 which assign {{Thread.currentThread().getContextClassLoader()}} to private 
field {{ClassLoader cl}}. The [2nd 
constructor|https://github.com/apache/flink/blob/56c3e7cd653e4cb2ad0a76ca317aa9fa1d564dc2/flink-core/src/main/java/org/apache/flink/api/java/typeutils/runtime/PojoSerializer.java#L147]
 without {{cl}} assigned would only be used to ensure the compatibility.

However, after Flink-1.8, we would use the [2nd 
constructor|https://github.com/apache/flink/blob/a0d236fba7c6abdabb461aa504b1e088a3982c31/flink-core/src/main/java/org/apache/flink/api/java/typeutils/runtime/PojoSerializer.java#L142]
 to create a restore serializer or a reconfigured serializer from a 
{{PojoSerializerSnapshot}}. Unfortunately, the restored serializer does not 
contain the class loader.

I could use below code within {{PojoSubclassSerializerTest}} to reproduce this.
{code:java}
@Test
public void testRestorePojoSerializer() throws IOException {
   TypeSerializer serializer = createSerializer();
   TestUserClass2 bar = new TestUserClass2(
  ThreadLocalRandom.current().nextInt(), "bar", 
ThreadLocalRandom.current().nextFloat());
   ByteArrayOutputStream out = new ByteArrayOutputStream();
   try (DataOutputViewStreamWrapper outView = new 
DataOutputViewStreamWrapper(out)) {
  serializer.serialize(bar, outView);
  DataInputViewStreamWrapper inputViewStreamWrapper =
 new DataInputViewStreamWrapper(new 
ByteArrayInputStream(out.toByteArray()));
  serializer.deserialize(inputViewStreamWrapper);
   }
   TypeSerializerSnapshot snapshot = 
serializer.snapshotConfiguration();
   TypeSerializer restoreSerializer = 
snapshot.restoreSerializer();
   try (DataInputViewStreamWrapper inputView =
  new DataInputViewStreamWrapper(new 
ByteArrayInputStream(out.toByteArray( {
  restoreSerializer.deserialize(inputView);
   }
}
 {code}

[~tzulitai], what do you think of this? If this is really a bug, I am glad to 
help to resolve this issue.

> java.lang.ClassNotFoundException when restore job
> -
>
> Key: FLINK-13159
> URL: https://issues.apache.org/jira/browse/FLINK-13159
> Project: Flink
>  Issue Type: Bug
>  Components: API / Type Serialization System
>Reporter: kring
>Priority: Critical
> Attachments: image-2019-08-05-17-29-40-351.png, 
> image-2019-08-05-17-32-44-988.png
>
>
> {code:java}
> java.lang.Exception: Exception while creating StreamOperatorStateContext.
> at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:195)
> at 
> org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:250)
> at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:738)
> at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:289)
> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.flink.util.FlinkException: Could not restore keyed 
> state backend for WindowOperator_b398b3dd4c544ddf2d47a0cc47d332f4_(1/6) from 
> any of the 1 prov
> ided restore options.
> at 
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:135)
> at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:307)
> at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:135)
> ... 5 common frames omitted
> Caused by: org.apache.flink.runtime.state.BackendBuildingException: Failed 
> when trying to restore heap backend
> at 
> org.apache.flink.runtime.state.heap.HeapKeyedStateBackendBuilder.build(HeapKeyedStateBackendBuilder.java:130)
> at 
> org.apache.flink.runtime.state.filesystem.FsStateBackend.createKeyedStateBackend(FsStateBackend.java:489)
> at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.lambda$keyedStatedBackend$1(StreamTaskStateInitializerImpl.java:291)
> at 
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:142)
> at 
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.j

[jira] [Commented] (FLINK-13159) java.lang.ClassNotFoundException when restore job

2019-08-05 Thread kring (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-13159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16899946#comment-16899946
 ] 

kring commented on FLINK-13159:
---

My code uses a parent reference, which actually points to a subclass object. 
Then I tried debug Flink and found that CL was not passed in when this 
constructor !image-2019-08-05-17-29-40-351.png! was used to create 
PojoSerializer, resulting in no CL leading to Cannot instantiate class error 
when executing the 
org.apache.flink.api.java.typeutils.runtime.PojoSerializer#deserialize(org.apache.flink.core.memory.DataInputView)
 method.So I added  !image-2019-08-05-17-32-44-988.png! to the last line of the 
constructor, and then recompiled and packaged, which successfully solved the 
problem.[~ssailappan] I guess you have the same problem as me.

> java.lang.ClassNotFoundException when restore job
> -
>
> Key: FLINK-13159
> URL: https://issues.apache.org/jira/browse/FLINK-13159
> Project: Flink
>  Issue Type: Bug
>  Components: API / Type Serialization System
>Reporter: kring
>Priority: Critical
> Attachments: image-2019-08-05-17-32-44-988.png
>
>
> {code:java}
> java.lang.Exception: Exception while creating StreamOperatorStateContext.
> at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:195)
> at 
> org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:250)
> at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:738)
> at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:289)
> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.flink.util.FlinkException: Could not restore keyed 
> state backend for WindowOperator_b398b3dd4c544ddf2d47a0cc47d332f4_(1/6) from 
> any of the 1 prov
> ided restore options.
> at 
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:135)
> at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:307)
> at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:135)
> ... 5 common frames omitted
> Caused by: org.apache.flink.runtime.state.BackendBuildingException: Failed 
> when trying to restore heap backend
> at 
> org.apache.flink.runtime.state.heap.HeapKeyedStateBackendBuilder.build(HeapKeyedStateBackendBuilder.java:130)
> at 
> org.apache.flink.runtime.state.filesystem.FsStateBackend.createKeyedStateBackend(FsStateBackend.java:489)
> at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.lambda$keyedStatedBackend$1(StreamTaskStateInitializerImpl.java:291)
> at 
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:142)
> at 
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:121)
> ... 7 common frames omitted
> Caused by: java.lang.RuntimeException: Cannot instantiate class.
> at 
> org.apache.flink.api.java.typeutils.runtime.PojoSerializer.deserialize(PojoSerializer.java:384)
> at 
> org.apache.flink.runtime.state.heap.StateTableByKeyGroupReaders.lambda$createV2PlusReader$0(StateTableByKeyGroupReaders.java:74)
> at 
> org.apache.flink.runtime.state.KeyGroupPartitioner$PartitioningResultKeyGroupReader.readMappingsInKeyGroup(KeyGroupPartitioner.java:297)
> at 
> org.apache.flink.runtime.state.heap.HeapRestoreOperation.readKeyGroupStateData(HeapRestoreOperation.java:290)
> at 
> org.apache.flink.runtime.state.heap.HeapRestoreOperation.readStateHandleStateData(HeapRestoreOperation.java:251)
> at 
> org.apache.flink.runtime.state.heap.HeapRestoreOperation.restore(HeapRestoreOperation.java:153)
> at 
> org.apache.flink.runtime.state.heap.HeapKeyedStateBackendBuilder.build(HeapKeyedStateBackendBuilder.java:127)
> ... 11 common frames omitted
> Caused by: java.lang.ClassNotFoundException: xxx
> at java.lang.Class.forName0(Native Method)
> at java.lang.Class.forName(Class.java:348)
> at 
> org.apache.flink.api.java.typeutils.runtime.PojoSerializer.deserialize(PojoSerializer.java:382)
> ... 17 common frames omitted
> {code}
> A strange problem with Flink is that after a task has been running properly 
> for a period of time, if any exception (such as ask timeout or ES request 
> timeout) is thrown, the task restart will report the above error (xxx is a 
> business model), and ten subsequent retries will not succeed, but the task 
> will be resubmitted. Then it 

[jira] [Commented] (FLINK-13159) java.lang.ClassNotFoundException when restore job

2019-07-29 Thread Subramanian Sailappan (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-13159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16895797#comment-16895797
 ] 

Subramanian Sailappan commented on FLINK-13159:
---

I am running into the same issue. I am running Flink 1.8.0 on EMR on a yarn 
cluster. The checkpoint is externalized and uses S3. The job runs fine for long 
times before it fails with the below exception and never recovers after that. 
It keeps trying to restore from the checkpoint and runs into the same error 
repeatedly.

 
java.lang.Exception: Exception while creating StreamOperatorStateContext.
at 
org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:195)
at 
org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:250)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:738)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:289)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.flink.util.FlinkException: Could not restore keyed state 
backend for WindowOperator_d28463815a9d818025b1ff96211a6dc9_(2/2) from any of 
the 1 provided restore options.
at 
org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:135)
at 
org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:307)
at 
org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:135)
... 5 more
Caused by: org.apache.flink.runtime.state.BackendBuildingException: Failed when 
trying to restore heap backend
at 
org.apache.flink.runtime.state.heap.HeapKeyedStateBackendBuilder.build(HeapKeyedStateBackendBuilder.java:130)
at 
org.apache.flink.runtime.state.filesystem.FsStateBackend.createKeyedStateBackend(FsStateBackend.java:489)
at 
org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.lambda$keyedStatedBackend$1(StreamTaskStateInitializerImpl.java:291)
at 
org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:142)
at 
org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:121)
... 7 more
Caused by: java.lang.RuntimeException: Cannot instantiate class.
at 
org.apache.flink.api.java.typeutils.runtime.PojoSerializer.deserialize(PojoSerializer.java:384)
at 
org.apache.flink.api.common.typeutils.base.ListSerializer.deserialize(ListSerializer.java:133)
at 
org.apache.flink.api.common.typeutils.base.ListSerializer.deserialize(ListSerializer.java:42)
at 
org.apache.flink.runtime.state.heap.StateTableByKeyGroupReaders.lambda$createV2PlusReader$0(StateTableByKeyGroupReaders.java:74)
at 
org.apache.flink.runtime.state.KeyGroupPartitioner$PartitioningResultKeyGroupReader.readMappingsInKeyGroup(KeyGroupPartitioner.java:297)
at 
org.apache.flink.runtime.state.heap.HeapRestoreOperation.readKeyGroupStateData(HeapRestoreOperation.java:290)
at 
org.apache.flink.runtime.state.heap.HeapRestoreOperation.readStateHandleStateData(HeapRestoreOperation.java:251)
at 
org.apache.flink.runtime.state.heap.HeapRestoreOperation.restore(HeapRestoreOperation.java:153)
at 
org.apache.flink.runtime.state.heap.HeapKeyedStateBackendBuilder.build(HeapKeyedStateBackendBuilder.java:127)
... 11 more
Caused by: java.lang.ClassNotFoundException: xxx
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at 
org.apache.flink.api.java.typeutils.runtime.PojoSerializer.deserialize(PojoSerializer.java:382)
... 19 more

> java.lang.ClassNotFoundException when restore job
> -
>
> Key: FLINK-13159
> URL: https://issues.apache.org/jira/browse/FLINK-13159
> Project: Flink
>  Issue Type: Bug
>  Components: API / Type Serialization System
>Reporter: kring
>Priority: Critical
>
> {code:java}
> java.lang.Exception: Exception while creating StreamOperatorStateContext.
> at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:195)
> at 
> org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:250)
> at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:738)
> at 
>

[jira] [Commented] (FLINK-13159) java.lang.ClassNotFoundException when restore job

2019-07-26 Thread Liu Bo (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-13159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16894254#comment-16894254
 ] 

Liu Bo commented on FLINK-13159:


I'm using flink 1.8.1 on yarn session with rocksdb as state backend.

got the same problem when starting flink jobs with checkpoints: bin/flink run 
-s /flink/checkpoints/c2b8fa3c51393a2c6865ca13045eccad/chk-84 deploy/xxx.jar

job keep retrying and fail after about 8 seconds. 

Our job runs on flink 1.7.2 with no problem for about 4 months and could 
recover successfully. 

 

Full stack trace: 

java.lang.RuntimeException: Exception occurred while processing valve output 
watermark:
 at 
org.apache.flink.streaming.runtime.io.StreamInputProcessor$ForwardingValveOutputHandler.handleWatermark(StreamInputProcessor.java:265)
 at 
org.apache.flink.streaming.runtime.streamstatus.StatusWatermarkValve.findAndOutputNewMinWatermarkAcrossAlignedChannels(StatusWatermarkValve.java:189)
 at 
org.apache.flink.streaming.runtime.streamstatus.StatusWatermarkValve.inputWatermark(StatusWatermarkValve.java:111)
 at 
org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:184)
 at 
org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:105)
 at 
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:300)
 at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
 at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: Cannot instantiate class.
 at 
org.apache.flink.api.java.typeutils.runtime.PojoSerializer.deserialize(PojoSerializer.java:384)
 at 
org.apache.flink.api.java.typeutils.runtime.TupleSerializer.deserialize(TupleSerializer.java:143)
 at 
org.apache.flink.api.java.typeutils.runtime.TupleSerializer.deserialize(TupleSerializer.java:37)
 at 
org.apache.flink.streaming.api.datastream.CoGroupedStreams$UnionSerializer.deserialize(CoGroupedStreams.java:581)
 at 
org.apache.flink.streaming.api.datastream.CoGroupedStreams$UnionSerializer.deserialize(CoGroupedStreams.java:506)
 at 
org.apache.flink.contrib.streaming.state.RocksDBListState.deserializeNextElement(RocksDBListState.java:144)
 at 
org.apache.flink.contrib.streaming.state.RocksDBListState.deserializeList(RocksDBListState.java:135)
 at 
org.apache.flink.contrib.streaming.state.RocksDBListState.getInternal(RocksDBListState.java:119)
 at 
org.apache.flink.contrib.streaming.state.RocksDBListState.get(RocksDBListState.java:111)
 at 
org.apache.flink.contrib.streaming.state.RocksDBListState.get(RocksDBListState.java:60)
 at 
org.apache.flink.streaming.runtime.operators.windowing.WindowOperator.onEventTime(WindowOperator.java:452)
 at 
org.apache.flink.streaming.api.operators.InternalTimerServiceImpl.advanceWatermark(InternalTimerServiceImpl.java:255)
 at 
org.apache.flink.streaming.api.operators.InternalTimeServiceManager.advanceWatermark(InternalTimeServiceManager.java:128)
 at 
org.apache.flink.streaming.api.operators.AbstractStreamOperator.processWatermark(AbstractStreamOperator.java:775)
 at 
org.apache.flink.streaming.runtime.io.StreamInputProcessor$ForwardingValveOutputHandler.handleWatermark(StreamInputProcessor.java:262)
 ... 7 more
Caused by: java.lang.ClassNotFoundException: com/xx/xx/xx
 at java.lang.Class.forName0(Native Method)
 at java.lang.Class.forName(Class.java:348)
 at 
org.apache.flink.api.java.typeutils.runtime.PojoSerializer.deserialize(PojoSerializer.java:382)

> java.lang.ClassNotFoundException when restore job
> -
>
> Key: FLINK-13159
> URL: https://issues.apache.org/jira/browse/FLINK-13159
> Project: Flink
>  Issue Type: Bug
>  Components: API / Type Serialization System
>Reporter: kring
>Priority: Critical
>
> {code:java}
> java.lang.Exception: Exception while creating StreamOperatorStateContext.
> at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:195)
> at 
> org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:250)
> at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:738)
> at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:289)
> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.flink.util.FlinkException: Could not restore keyed 
> state backend for WindowOperator_b398b3dd4c544ddf2d47a0cc47d332f4_(1/6) from 
> any of the 1 prov
> ided restore options.
> at 
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:135)
> at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedSta

[jira] [Commented] (FLINK-13159) java.lang.ClassNotFoundException when restore job

2019-07-08 Thread kring (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-13159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16880987#comment-16880987
 ] 

kring commented on FLINK-13159:
---

My taskManager and JobManager are on the same machine, using local files as 
backends. Therefore, I did not configure local recovery. Just now, I created a 
savepoint for this task and tried to recover from the savepoint. Then I 
reproduced the exception 100 percent, so I'm going to debug flink.

> java.lang.ClassNotFoundException when restore job
> -
>
> Key: FLINK-13159
> URL: https://issues.apache.org/jira/browse/FLINK-13159
> Project: Flink
>  Issue Type: Bug
>Reporter: kring
>Priority: Critical
>
> {code:java}
> java.lang.Exception: Exception while creating StreamOperatorStateContext.
> at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:195)
> at 
> org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:250)
> at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:738)
> at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:289)
> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.flink.util.FlinkException: Could not restore keyed 
> state backend for WindowOperator_b398b3dd4c544ddf2d47a0cc47d332f4_(1/6) from 
> any of the 1 prov
> ided restore options.
> at 
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:135)
> at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:307)
> at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:135)
> ... 5 common frames omitted
> Caused by: org.apache.flink.runtime.state.BackendBuildingException: Failed 
> when trying to restore heap backend
> at 
> org.apache.flink.runtime.state.heap.HeapKeyedStateBackendBuilder.build(HeapKeyedStateBackendBuilder.java:130)
> at 
> org.apache.flink.runtime.state.filesystem.FsStateBackend.createKeyedStateBackend(FsStateBackend.java:489)
> at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.lambda$keyedStatedBackend$1(StreamTaskStateInitializerImpl.java:291)
> at 
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:142)
> at 
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:121)
> ... 7 common frames omitted
> Caused by: java.lang.RuntimeException: Cannot instantiate class.
> at 
> org.apache.flink.api.java.typeutils.runtime.PojoSerializer.deserialize(PojoSerializer.java:384)
> at 
> org.apache.flink.runtime.state.heap.StateTableByKeyGroupReaders.lambda$createV2PlusReader$0(StateTableByKeyGroupReaders.java:74)
> at 
> org.apache.flink.runtime.state.KeyGroupPartitioner$PartitioningResultKeyGroupReader.readMappingsInKeyGroup(KeyGroupPartitioner.java:297)
> at 
> org.apache.flink.runtime.state.heap.HeapRestoreOperation.readKeyGroupStateData(HeapRestoreOperation.java:290)
> at 
> org.apache.flink.runtime.state.heap.HeapRestoreOperation.readStateHandleStateData(HeapRestoreOperation.java:251)
> at 
> org.apache.flink.runtime.state.heap.HeapRestoreOperation.restore(HeapRestoreOperation.java:153)
> at 
> org.apache.flink.runtime.state.heap.HeapKeyedStateBackendBuilder.build(HeapKeyedStateBackendBuilder.java:127)
> ... 11 common frames omitted
> Caused by: java.lang.ClassNotFoundException: xxx
> at java.lang.Class.forName0(Native Method)
> at java.lang.Class.forName(Class.java:348)
> at 
> org.apache.flink.api.java.typeutils.runtime.PojoSerializer.deserialize(PojoSerializer.java:382)
> ... 17 common frames omitted
> {code}
> A strange problem with Flink is that after a task has been running properly 
> for a period of time, if any exception (such as ask timeout or ES request 
> timeout) is thrown, the task restart will report the above error (xxx is a 
> business model), and ten subsequent retries will not succeed, but the task 
> will be resubmitted. Then it can run normally. In addition, there are three 
> other tasks running at the same time, none of which has the problem.
> My flink version is 1.8.0.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-13159) java.lang.ClassNotFoundException when restore job

2019-07-08 Thread Yun Tang (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-13159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16880972#comment-16880972
 ] 

Yun Tang commented on FLINK-13159:
--

[~kring] would you please give more details. From your description, you would 
meet the exception if your job failover. However, this could come to normal 
just after several times retries. Have you ever turned on [local 
recovery|https://ci.apache.org/projects/flink/flink-docs-stable/ops/state/large_state_tuning.html#task-local-recovery],
 and if the failling task would come to normal once it run on another worker 
node?

> java.lang.ClassNotFoundException when restore job
> -
>
> Key: FLINK-13159
> URL: https://issues.apache.org/jira/browse/FLINK-13159
> Project: Flink
>  Issue Type: Bug
>Reporter: kring
>Priority: Critical
>
> {code:java}
> java.lang.Exception: Exception while creating StreamOperatorStateContext.
> at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:195)
> at 
> org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:250)
> at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:738)
> at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:289)
> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.flink.util.FlinkException: Could not restore keyed 
> state backend for WindowOperator_b398b3dd4c544ddf2d47a0cc47d332f4_(1/6) from 
> any of the 1 prov
> ided restore options.
> at 
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:135)
> at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:307)
> at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:135)
> ... 5 common frames omitted
> Caused by: org.apache.flink.runtime.state.BackendBuildingException: Failed 
> when trying to restore heap backend
> at 
> org.apache.flink.runtime.state.heap.HeapKeyedStateBackendBuilder.build(HeapKeyedStateBackendBuilder.java:130)
> at 
> org.apache.flink.runtime.state.filesystem.FsStateBackend.createKeyedStateBackend(FsStateBackend.java:489)
> at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.lambda$keyedStatedBackend$1(StreamTaskStateInitializerImpl.java:291)
> at 
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:142)
> at 
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:121)
> ... 7 common frames omitted
> Caused by: java.lang.RuntimeException: Cannot instantiate class.
> at 
> org.apache.flink.api.java.typeutils.runtime.PojoSerializer.deserialize(PojoSerializer.java:384)
> at 
> org.apache.flink.runtime.state.heap.StateTableByKeyGroupReaders.lambda$createV2PlusReader$0(StateTableByKeyGroupReaders.java:74)
> at 
> org.apache.flink.runtime.state.KeyGroupPartitioner$PartitioningResultKeyGroupReader.readMappingsInKeyGroup(KeyGroupPartitioner.java:297)
> at 
> org.apache.flink.runtime.state.heap.HeapRestoreOperation.readKeyGroupStateData(HeapRestoreOperation.java:290)
> at 
> org.apache.flink.runtime.state.heap.HeapRestoreOperation.readStateHandleStateData(HeapRestoreOperation.java:251)
> at 
> org.apache.flink.runtime.state.heap.HeapRestoreOperation.restore(HeapRestoreOperation.java:153)
> at 
> org.apache.flink.runtime.state.heap.HeapKeyedStateBackendBuilder.build(HeapKeyedStateBackendBuilder.java:127)
> ... 11 common frames omitted
> Caused by: java.lang.ClassNotFoundException: xxx
> at java.lang.Class.forName0(Native Method)
> at java.lang.Class.forName(Class.java:348)
> at 
> org.apache.flink.api.java.typeutils.runtime.PojoSerializer.deserialize(PojoSerializer.java:382)
> ... 17 common frames omitted
> {code}
> A strange problem with Flink is that after a task has been running properly 
> for a period of time, if any exception (such as ask timeout or ES request 
> timeout) is thrown, the task restart will report the above error (xxx is a 
> business model), and ten subsequent retries will not succeed, but the task 
> will be resubmitted. Then it can run normally. In addition, there are three 
> other tasks running at the same time, none of which has the problem.
> My flink version is 1.8.0.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)