[jira] [Closed] (FLINK-7760) Restore failing from external checkpointing metadata.

2018-02-07 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-7760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek closed FLINK-7760.
---
Resolution: Fixed

Fixed on release-1.4 in
5f9e367be6429383be5d0f1ff80e3b77d5a0dda8

Fixed on master in
c212701d56cfe9cffd9e5dc1e34c3483a50f8182

[~shashank734], please reopen the issue if this doesn't fix your problem.

> Restore failing from external checkpointing metadata.
> -
>
> Key: FLINK-7760
> URL: https://issues.apache.org/jira/browse/FLINK-7760
> Project: Flink
>  Issue Type: Sub-task
>  Components: CEP, State Backends, Checkpointing
>Affects Versions: 1.4.0, 1.3.2
> Environment: YARN, Flink 1.3.2, HDFS, FsStateBackend
>Reporter: Shashank Agarwal
>Priority: Blocker
> Fix For: 1.5.0, 1.4.1
>
>
> My job failed because of a Cassandra failure. I have enabled externalized
> checkpoints, but when the job tries to restore from that checkpoint, it fails
> continuously with the following error.
> {code:java}
> 2017-10-04 09:39:20,611 INFO  org.apache.flink.runtime.taskmanager.Task - KeyedCEPPatternOperator -> Map (1/2) (8ff7913f820ead571c8b54ccc6b16045) switched from RUNNING to FAILED.
> java.lang.IllegalStateException: Could not initialize keyed state backend.
>   at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initKeyedState(AbstractStreamOperator.java:321)
>   at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:217)
>   at org.apache.flink.streaming.runtime.tasks.StreamTask.initializeOperators(StreamTask.java:676)
>   at org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:663)
>   at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:252)
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:702)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.StreamCorruptedException: invalid type code: 00
>   at java.io.ObjectInputStream$BlockDataInputStream.readBlockHeader(ObjectInputStream.java:2519)
>   at java.io.ObjectInputStream$BlockDataInputStream.refill(ObjectInputStream.java:2553)
>   at java.io.ObjectInputStream$BlockDataInputStream.skipBlockData(ObjectInputStream.java:2455)
>   at java.io.ObjectInputStream.skipCustomData(ObjectInputStream.java:1951)
>   at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1621)
>   at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
>   at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1623)
>   at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
>   at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
>   at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
>   at org.apache.flink.cep.nfa.NFA$NFASerializer.deserializeCondition(NFA.java:1211)
>   at org.apache.flink.cep.nfa.NFA$NFASerializer.deserializeStates(NFA.java:1169)
>   at org.apache.flink.cep.nfa.NFA$NFASerializer.deserialize(NFA.java:957)
>   at org.apache.flink.cep.nfa.NFA$NFASerializer.deserialize(NFA.java:852)
>   at org.apache.flink.runtime.state.heap.StateTableByKeyGroupReaders$StateTableByKeyGroupReaderV2V3.readMappingsInKeyGroup(StateTableByKeyGroupReaders.java:132)
>   at org.apache.flink.runtime.state.heap.HeapKeyedStateBackend.restorePartitionedState(HeapKeyedStateBackend.java:518)
>   at org.apache.flink.runtime.state.heap.HeapKeyedStateBackend.restore(HeapKeyedStateBackend.java:397)
>   at org.apache.flink.streaming.runtime.tasks.StreamTask.createKeyedStateBackend(StreamTask.java:772)
>   at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initKeyedState(AbstractStreamOperator.java:311)
>   ... 6 more
> {code}
> I have also tried to start a new job after the failure with the parameter
> {code:java}-s [checkpoint meta data path]{code}
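The root cause in the trace above is `java.io.StreamCorruptedException: invalid type code: 00`, which `ObjectInputStream` throws when it reads a byte that is not a valid type marker at a position where it expects one, i.e. when the bytes being read do not line up with what the writer produced. The stdlib-only sketch below is not Flink code and does not reproduce this bug; it only shows how that exact message can arise. `InvalidTypeCodeDemo`, `serialize`, and `readAndReport` are hypothetical names. In the reported trace the message comes from a different internal path (`skipCustomData` while reading a class descriptor inside `NFASerializer.deserializeCondition`), so treat this purely as an illustration of the exception.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.StreamCorruptedException;

// Hypothetical demo class, not part of Flink.
class InvalidTypeCodeDemo {

    // Valid stream header (magic 0xACED, version 0x0005) followed by a single
    // 0x00 byte where ObjectInputStream expects a type code such as TC_OBJECT (0x73).
    static final byte[] MISALIGNED = {(byte) 0xAC, (byte) 0xED, 0x00, 0x05, 0x00};

    // Java-serializes one object into a byte array.
    static byte[] serialize(Object value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        return bytes.toByteArray();
    }

    // Reads one object; returns the StreamCorruptedException message, or null if the read succeeds.
    static String readAndReport(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            in.readObject();
            return null;
        } catch (StreamCorruptedException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        // Bytes that match what the writer produced: the read succeeds.
        System.out.println(readAndReport(serialize("condition")));  // null
        // Bytes that do not line up with a valid type code.
        System.out.println(readAndReport(MISALIGNED));              // invalid type code: 00
    }
}
```

The point of the contrast is that the same reader accepts bytes produced by the matching writer and rejects bytes whose layout is off by even a single position.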



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (FLINK-7760) Restore failing from external checkpointing metadata.

2017-10-13 Thread Kostas Kloudas (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-7760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kostas Kloudas closed FLINK-7760.
-
Resolution: Fixed

Hi, 

We just merged two fixes into master for the following issues:

https://issues.apache.org/jira/browse/FLINK-7835
https://issues.apache.org/jira/browse/FLINK-7484

I believe that these also fix the issue in this JIRA, which is why I am closing
it.
Please try master, and if the problem persists, feel free to reopen the issue.

> Restore failing from external checkpointing metadata.
> -
>
> Key: FLINK-7760
> URL: https://issues.apache.org/jira/browse/FLINK-7760
> Project: Flink
>  Issue Type: Sub-task
>  Components: CEP, State Backends, Checkpointing
>Affects Versions: 1.3.2
> Environment: YARN, Flink 1.3.2, HDFS, FsStateBackend
>Reporter: Shashank Agarwal
>Priority: Blocker
> Fix For: 1.4.0



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)