Hey Folks:
Please let me know how to resolve this issue. Using
--allowNonRestoredState without knowing whether any state will be lost seems risky.
Thanks

On Friday, November 22, 2019, 02:55:09 PM EST, M Singh
<mans2si...@yahoo.com> wrote:
 
Hi:
I have a Flink application in which some of the operators have a uid and name,
and some stateless ones don't.
I took a savepoint and tried to start another instance of the application
from it. I get the following exception, which indicates that an operator is
not available in the new program, even though the second job is the same as
the first and is just running from the first job's savepoint.

 

Caused by: java.lang.IllegalStateException: Failed to rollback to checkpoint/savepoint s3://mybucket/state/savePoint/mysavepointfolder/66s4c6402d7532801287290436fa9fadd/savepoint-664c64-fa235d26d379. Cannot map checkpoint/savepoint state for operator d1a56c5a9ce5e3f1b03e01cac458bb4f to the new program, because the operator is not available in the new program. If you want to allow to skip this, you can set the --allowNonRestoredState option on the CLI.
    at org.apache.flink.runtime.checkpoint.Checkpoints.loadAndValidateCheckpoint(Checkpoints.java:205)
    at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.restoreSavepoint(CheckpointCoordinator.java:1102)
    at org.apache.flink.runtime.jobmaster.JobMaster.tryRestoreExecutionGraphFromSavepoint(JobMaster.java:1219)
    at org.apache.flink.runtime.jobmaster.JobMaster.createAndRestoreExecutionGraph(JobMaster.java:1143)
    at org.apache.flink.runtime.jobmaster.JobMaster.<init>(JobMaster.java:294)
    at org.apache.flink.runtime.jobmaster.JobManagerRunner.<init>(JobManagerRunner.java:157)
    ... 10 more
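
For anyone comparing this against their own logs, here is a minimal sketch of pulling the operator ID out of the exception message above so it can be searched for or compared across runs. It assumes the message wording shown in this trace ("state for operator" followed by a 32-character hex ID); the class and method names are hypothetical and this is plain Java, not a Flink API.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Flink: extracts the operator ID from the
// "Cannot map checkpoint/savepoint state for operator ..." message text.
public class OperatorIdExtractor {

    // Assumes the ID is 32 lowercase hex characters, as in the trace above.
    // \s+ tolerates the line wraps that mail clients insert into the message.
    private static final Pattern OPERATOR_ID =
        Pattern.compile("state\\s+for\\s+operator\\s+([0-9a-f]{32})");

    // Returns the first operator ID found in the message, if any.
    public static Optional<String> extract(String message) {
        Matcher m = OPERATOR_ID.matcher(message);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String message = "Cannot map checkpoint/savepoint state for operator \n"
            + "d1a56c5a9ce5e3f1b03e01cac458bb4f to the new program, because the operator is \n"
            + "not available in the new program.";
        System.out.println(extract(message).orElse("no operator ID found"));
    }
}
```

This only recovers the ID string that the message reports; it does not say which operator in the job graph that ID belongs to.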




I've also tried to start an application instance from a checkpoint of the
first instance, but it gives the same exception indicating that the operator is
not available.
Questions:
1. Is this a problem because some of the operators don't have a uid?
2. Is it required to have uids even for stateless operators like simple map or
filter operators?
3. Is there a way to find out which operator is not available in the new
application even though I am running the same application?
4. Is there a way to figure out if this is the only missing operator, or are
there others whose mapping is missing for the second instance run?
5. Is this issue resolved in Apache Flink 1.9 (I am still using Flink 1.6)?
If there are any additional pointers, please let me know.
Thanks
Mans

  
