[jira] [Commented] (FLINK-5407) Savepoint for iterative Task fails.

2017-01-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15821531#comment-15821531
 ] 

ASF GitHub Bot commented on FLINK-5407:
---

Github user senorcarbone commented on the issue:

https://github.com/apache/flink/pull/3088
  
Hey! In this upcoming [PR ](https://github.com/apache/flink/pull/1668) 
there is a custom operator responsible for the logging at the iteration head. 
Hope this does not break your tests in case they assume a null op.
Also, in case any of you is familiar with asynchronous checkpoints with 
ListState, could you check my last question in the same PR? thanks :)


> Savepoint for iterative Task fails.
> ---
>
> Key: FLINK-5407
> URL: https://issues.apache.org/jira/browse/FLINK-5407
> Project: Flink
>  Issue Type: Bug
>  Components: State Backends, Checkpointing
>Affects Versions: 1.2.0
>Reporter: Stephan Ewen
>Assignee: Stefan Richter
> Fix For: 1.2.0, 1.3.0
>
> Attachments: SavepointBug.java
>
>
> Flink 1.2-SNAPSHOT (Commit: 5b54009) on Windows.
> Triggering a savepoint for a streaming job, both the savepoint and the job 
> failed.
> The job failed with the following exception:
> {code}
> java.lang.RuntimeException: Error while triggering checkpoint for 
> IterationSource-7 (1/1)
>   at org.apache.flink.runtime.taskmanager.Task$3.run(Task.java:1026)
>   at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
>   at java.util.concurrent.FutureTask.run(Unknown Source)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>   at java.lang.Thread.run(Unknown Source)
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.createOperatorIdentifier(StreamTask.java:767)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.access$500(StreamTask.java:115)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask$CheckpointingOperation.createStreamFactory(StreamTask.java:986)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask$CheckpointingOperation.executeCheckpointing(StreamTask.java:956)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.checkpointState(StreamTask.java:583)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.performCheckpoint(StreamTask.java:551)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpoint(StreamTask.java:511)
>   at org.apache.flink.runtime.taskmanager.Task$3.run(Task.java:1019)
>   ... 5 more
> And the savepoint failed with the following exception:
> Using address /127.0.0.1:6123 to connect to JobManager.
> Triggering savepoint for job 153310c4a836a92ce69151757c6b73f1.
> Waiting for response...
> 
>  The program finished with the following exception:
> java.lang.Exception: Failed to complete savepoint
> at 
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anon$7.apply(JobManager.scala:793)
> at 
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anon$7.apply(JobManager.scala:782)
> at 
> org.apache.flink.runtime.concurrent.impl.FlinkFuture$6.recover(FlinkFuture.java:263)
> at akka.dispatch.Recover.internal(Future.scala:267)
> at akka.dispatch.japi$RecoverBridge.apply(Future.scala:183)
> at akka.dispatch.japi$RecoverBridge.apply(Future.scala:181)
> at scala.util.Failure$$anonfun$recover$1.apply(Try.scala:185)
> at scala.util.Try$.apply(Try.scala:161)
> at scala.util.Failure.recover(Try.scala:185)
> at scala.concurrent.Future$$anonfun$recover$1.apply(Future.scala:324)
> at scala.concurrent.Future$$anonfun$recover$1.apply(Future.scala:324)
> at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
> at 
> akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.processBatch$1(BatchingExecutor.scala:67)
> at 
> akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:82)
> at 
> akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
> at 
> akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
> at 
> scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
> at akka.dispatch.BatchingExecutor$Batch.run(BatchingExecutor.scala:58)
> at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41)
> at 
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:401)
> at 
> 

[jira] [Commented] (FLINK-5407) Savepoint for iterative Task fails.

2017-01-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15821489#comment-15821489
 ] 

ASF GitHub Bot commented on FLINK-5407:
---

Github user StefanRRichter closed the pull request at:

https://github.com/apache/flink/pull/3088


> Savepoint for iterative Task fails.
> ---
>
> Key: FLINK-5407
> URL: https://issues.apache.org/jira/browse/FLINK-5407
> Project: Flink
>  Issue Type: Bug
>  Components: State Backends, Checkpointing
>Affects Versions: 1.2.0
>Reporter: Stephan Ewen
>Assignee: Stefan Richter
> Fix For: 1.2.0, 1.3.0
>
> Attachments: SavepointBug.java
>
>
> Flink 1.2-SNAPSHOT (Commit: 5b54009) on Windows.
> Triggering a savepoint for a streaming job, both the savepoint and the job 
> failed.
> The job failed with the following exception:
> {code}
> java.lang.RuntimeException: Error while triggering checkpoint for 
> IterationSource-7 (1/1)
>   at org.apache.flink.runtime.taskmanager.Task$3.run(Task.java:1026)
>   at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
>   at java.util.concurrent.FutureTask.run(Unknown Source)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>   at java.lang.Thread.run(Unknown Source)
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.createOperatorIdentifier(StreamTask.java:767)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.access$500(StreamTask.java:115)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask$CheckpointingOperation.createStreamFactory(StreamTask.java:986)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask$CheckpointingOperation.executeCheckpointing(StreamTask.java:956)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.checkpointState(StreamTask.java:583)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.performCheckpoint(StreamTask.java:551)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpoint(StreamTask.java:511)
>   at org.apache.flink.runtime.taskmanager.Task$3.run(Task.java:1019)
>   ... 5 more
> And the savepoint failed with the following exception:
> Using address /127.0.0.1:6123 to connect to JobManager.
> Triggering savepoint for job 153310c4a836a92ce69151757c6b73f1.
> Waiting for response...
> 
>  The program finished with the following exception:
> java.lang.Exception: Failed to complete savepoint
> at 
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anon$7.apply(JobManager.scala:793)
> at 
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anon$7.apply(JobManager.scala:782)
> at 
> org.apache.flink.runtime.concurrent.impl.FlinkFuture$6.recover(FlinkFuture.java:263)
> at akka.dispatch.Recover.internal(Future.scala:267)
> at akka.dispatch.japi$RecoverBridge.apply(Future.scala:183)
> at akka.dispatch.japi$RecoverBridge.apply(Future.scala:181)
> at scala.util.Failure$$anonfun$recover$1.apply(Try.scala:185)
> at scala.util.Try$.apply(Try.scala:161)
> at scala.util.Failure.recover(Try.scala:185)
> at scala.concurrent.Future$$anonfun$recover$1.apply(Future.scala:324)
> at scala.concurrent.Future$$anonfun$recover$1.apply(Future.scala:324)
> at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
> at 
> akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.processBatch$1(BatchingExecutor.scala:67)
> at 
> akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:82)
> at 
> akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
> at 
> akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
> at 
> scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
> at akka.dispatch.BatchingExecutor$Batch.run(BatchingExecutor.scala:58)
> at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41)
> at 
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:401)
> at 
> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> at 
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> at 
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> at 
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> Caused by: java.lang.Exception: Checkpoint failed: Checkpoint 

[jira] [Commented] (FLINK-5407) Savepoint for iterative Task fails.

2017-01-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15821488#comment-15821488
 ] 

ASF GitHub Bot commented on FLINK-5407:
---

Github user StefanRRichter commented on the issue:

https://github.com/apache/flink/pull/3088
  
Thanks for reviewing and merging @aljoscha !


> Savepoint for iterative Task fails.
> ---
>
> Key: FLINK-5407
> URL: https://issues.apache.org/jira/browse/FLINK-5407
> Project: Flink
>  Issue Type: Bug
>  Components: State Backends, Checkpointing
>Affects Versions: 1.2.0
>Reporter: Stephan Ewen
>Assignee: Stefan Richter
> Fix For: 1.2.0, 1.3.0
>
> Attachments: SavepointBug.java
>
>
> Flink 1.2-SNAPSHOT (Commit: 5b54009) on Windows.
> Triggering a savepoint for a streaming job, both the savepoint and the job 
> failed.
> The job failed with the following exception:
> {code}
> java.lang.RuntimeException: Error while triggering checkpoint for 
> IterationSource-7 (1/1)
>   at org.apache.flink.runtime.taskmanager.Task$3.run(Task.java:1026)
>   at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
>   at java.util.concurrent.FutureTask.run(Unknown Source)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>   at java.lang.Thread.run(Unknown Source)
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.createOperatorIdentifier(StreamTask.java:767)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.access$500(StreamTask.java:115)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask$CheckpointingOperation.createStreamFactory(StreamTask.java:986)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask$CheckpointingOperation.executeCheckpointing(StreamTask.java:956)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.checkpointState(StreamTask.java:583)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.performCheckpoint(StreamTask.java:551)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpoint(StreamTask.java:511)
>   at org.apache.flink.runtime.taskmanager.Task$3.run(Task.java:1019)
>   ... 5 more
> And the savepoint failed with the following exception:
> Using address /127.0.0.1:6123 to connect to JobManager.
> Triggering savepoint for job 153310c4a836a92ce69151757c6b73f1.
> Waiting for response...
> 
>  The program finished with the following exception:
> java.lang.Exception: Failed to complete savepoint
> at 
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anon$7.apply(JobManager.scala:793)
> at 
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anon$7.apply(JobManager.scala:782)
> at 
> org.apache.flink.runtime.concurrent.impl.FlinkFuture$6.recover(FlinkFuture.java:263)
> at akka.dispatch.Recover.internal(Future.scala:267)
> at akka.dispatch.japi$RecoverBridge.apply(Future.scala:183)
> at akka.dispatch.japi$RecoverBridge.apply(Future.scala:181)
> at scala.util.Failure$$anonfun$recover$1.apply(Try.scala:185)
> at scala.util.Try$.apply(Try.scala:161)
> at scala.util.Failure.recover(Try.scala:185)
> at scala.concurrent.Future$$anonfun$recover$1.apply(Future.scala:324)
> at scala.concurrent.Future$$anonfun$recover$1.apply(Future.scala:324)
> at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
> at 
> akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.processBatch$1(BatchingExecutor.scala:67)
> at 
> akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:82)
> at 
> akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
> at 
> akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
> at 
> scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
> at akka.dispatch.BatchingExecutor$Batch.run(BatchingExecutor.scala:58)
> at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41)
> at 
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:401)
> at 
> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> at 
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> at 
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> at 
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> Caused by: 

[jira] [Commented] (FLINK-5407) Savepoint for iterative Task fails.

2017-01-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15821482#comment-15821482
 ] 

ASF GitHub Bot commented on FLINK-5407:
---

Github user aljoscha commented on the issue:

https://github.com/apache/flink/pull/3088
  
Merged, could you please close this PR?


> Savepoint for iterative Task fails.
> ---
>
> Key: FLINK-5407
> URL: https://issues.apache.org/jira/browse/FLINK-5407
> Project: Flink
>  Issue Type: Bug
>  Components: State Backends, Checkpointing
>Affects Versions: 1.2.0
>Reporter: Stephan Ewen
>Assignee: Stefan Richter
> Fix For: 1.2.0, 1.3.0
>
> Attachments: SavepointBug.java
>
>
> Flink 1.2-SNAPSHOT (Commit: 5b54009) on Windows.
> Triggering a savepoint for a streaming job, both the savepoint and the job 
> failed.
> The job failed with the following exception:
> {code}
> java.lang.RuntimeException: Error while triggering checkpoint for 
> IterationSource-7 (1/1)
>   at org.apache.flink.runtime.taskmanager.Task$3.run(Task.java:1026)
>   at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
>   at java.util.concurrent.FutureTask.run(Unknown Source)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>   at java.lang.Thread.run(Unknown Source)
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.createOperatorIdentifier(StreamTask.java:767)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.access$500(StreamTask.java:115)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask$CheckpointingOperation.createStreamFactory(StreamTask.java:986)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask$CheckpointingOperation.executeCheckpointing(StreamTask.java:956)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.checkpointState(StreamTask.java:583)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.performCheckpoint(StreamTask.java:551)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpoint(StreamTask.java:511)
>   at org.apache.flink.runtime.taskmanager.Task$3.run(Task.java:1019)
>   ... 5 more
> And the savepoint failed with the following exception:
> Using address /127.0.0.1:6123 to connect to JobManager.
> Triggering savepoint for job 153310c4a836a92ce69151757c6b73f1.
> Waiting for response...
> 
>  The program finished with the following exception:
> java.lang.Exception: Failed to complete savepoint
> at 
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anon$7.apply(JobManager.scala:793)
> at 
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anon$7.apply(JobManager.scala:782)
> at 
> org.apache.flink.runtime.concurrent.impl.FlinkFuture$6.recover(FlinkFuture.java:263)
> at akka.dispatch.Recover.internal(Future.scala:267)
> at akka.dispatch.japi$RecoverBridge.apply(Future.scala:183)
> at akka.dispatch.japi$RecoverBridge.apply(Future.scala:181)
> at scala.util.Failure$$anonfun$recover$1.apply(Try.scala:185)
> at scala.util.Try$.apply(Try.scala:161)
> at scala.util.Failure.recover(Try.scala:185)
> at scala.concurrent.Future$$anonfun$recover$1.apply(Future.scala:324)
> at scala.concurrent.Future$$anonfun$recover$1.apply(Future.scala:324)
> at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
> at 
> akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.processBatch$1(BatchingExecutor.scala:67)
> at 
> akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:82)
> at 
> akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
> at 
> akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
> at 
> scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
> at akka.dispatch.BatchingExecutor$Batch.run(BatchingExecutor.scala:58)
> at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41)
> at 
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:401)
> at 
> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> at 
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> at 
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> at 
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> Caused by: 

[jira] [Commented] (FLINK-5407) Savepoint for iterative Task fails.

2017-01-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15820786#comment-15820786
 ] 

ASF GitHub Bot commented on FLINK-5407:
---

Github user aljoscha commented on the issue:

https://github.com/apache/flink/pull/3088
  
Changes look very good! I fixed the formatting of the newly added methods 
in `TestingCluster` to conform to Scala coding guidelines.

I rebased on master, will wait for Travis to give the green light and then 
merge.


> Savepoint for iterative Task fails.
> ---
>
> Key: FLINK-5407
> URL: https://issues.apache.org/jira/browse/FLINK-5407
> Project: Flink
>  Issue Type: Bug
>  Components: State Backends, Checkpointing
>Affects Versions: 1.2.0
>Reporter: Stephan Ewen
>Assignee: Stefan Richter
> Fix For: 1.2.0, 1.3.0
>
> Attachments: SavepointBug.java
>
>
> Flink 1.2-SNAPSHOT (Commit: 5b54009) on Windows.
> Triggering a savepoint for a streaming job, both the savepoint and the job 
> failed.
> The job failed with the following exception:
> {code}
> java.lang.RuntimeException: Error while triggering checkpoint for 
> IterationSource-7 (1/1)
>   at org.apache.flink.runtime.taskmanager.Task$3.run(Task.java:1026)
>   at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
>   at java.util.concurrent.FutureTask.run(Unknown Source)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>   at java.lang.Thread.run(Unknown Source)
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.createOperatorIdentifier(StreamTask.java:767)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.access$500(StreamTask.java:115)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask$CheckpointingOperation.createStreamFactory(StreamTask.java:986)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask$CheckpointingOperation.executeCheckpointing(StreamTask.java:956)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.checkpointState(StreamTask.java:583)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.performCheckpoint(StreamTask.java:551)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpoint(StreamTask.java:511)
>   at org.apache.flink.runtime.taskmanager.Task$3.run(Task.java:1019)
>   ... 5 more
> And the savepoint failed with the following exception:
> Using address /127.0.0.1:6123 to connect to JobManager.
> Triggering savepoint for job 153310c4a836a92ce69151757c6b73f1.
> Waiting for response...
> 
>  The program finished with the following exception:
> java.lang.Exception: Failed to complete savepoint
> at 
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anon$7.apply(JobManager.scala:793)
> at 
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anon$7.apply(JobManager.scala:782)
> at 
> org.apache.flink.runtime.concurrent.impl.FlinkFuture$6.recover(FlinkFuture.java:263)
> at akka.dispatch.Recover.internal(Future.scala:267)
> at akka.dispatch.japi$RecoverBridge.apply(Future.scala:183)
> at akka.dispatch.japi$RecoverBridge.apply(Future.scala:181)
> at scala.util.Failure$$anonfun$recover$1.apply(Try.scala:185)
> at scala.util.Try$.apply(Try.scala:161)
> at scala.util.Failure.recover(Try.scala:185)
> at scala.concurrent.Future$$anonfun$recover$1.apply(Future.scala:324)
> at scala.concurrent.Future$$anonfun$recover$1.apply(Future.scala:324)
> at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
> at 
> akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.processBatch$1(BatchingExecutor.scala:67)
> at 
> akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:82)
> at 
> akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
> at 
> akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
> at 
> scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
> at akka.dispatch.BatchingExecutor$Batch.run(BatchingExecutor.scala:58)
> at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41)
> at 
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:401)
> at 
> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> at 
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> at 
> 

[jira] [Commented] (FLINK-5407) Savepoint for iterative Task fails.

2017-01-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15815285#comment-15815285
 ] 

ASF GitHub Bot commented on FLINK-5407:
---

GitHub user StefanRRichter opened a pull request:

https://github.com/apache/flink/pull/3088

[FLINK-5407] Fix savepoints for iterative jobs

This PR fixes savepoints for iterative jobs. Savepoints failed with NPE 
because the code assumed that operators in an operator chain are never null. 
For iterative jobs, this can happen.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/StefanRRichter/flink NPE-Iterative-Snapshot

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/3088.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3088


commit 984d596c063b5082520d8d58baa6b7361b1e9921
Author: Stefan Richter 
Date:   2017-01-05T13:28:50Z

[FLINK-5407] Handle snapshoting null-operator in chain

commit c96fe7ba35764b4f9e05ed61199b2027981daa54
Author: Stefan Richter 
Date:   2017-01-10T15:08:06Z

[FLINK-5407] IT case for savepoint with iterative job




> Savepoint for iterative Task fails.
> ---
>
> Key: FLINK-5407
> URL: https://issues.apache.org/jira/browse/FLINK-5407
> Project: Flink
>  Issue Type: Bug
>  Components: State Backends, Checkpointing
>Affects Versions: 1.2.0
>Reporter: Stephan Ewen
>Assignee: Stefan Richter
> Fix For: 1.2.0, 1.3.0
>
> Attachments: SavepointBug.java
>
>
> Flink 1.2-SNAPSHOT (Commit: 5b54009) on Windows.
> Triggering a savepoint for a streaming job, both the savepoint and the job 
> failed.
> The job failed with the following exception:
> {code}
> java.lang.RuntimeException: Error while triggering checkpoint for 
> IterationSource-7 (1/1)
>   at org.apache.flink.runtime.taskmanager.Task$3.run(Task.java:1026)
>   at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
>   at java.util.concurrent.FutureTask.run(Unknown Source)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>   at java.lang.Thread.run(Unknown Source)
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.createOperatorIdentifier(StreamTask.java:767)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.access$500(StreamTask.java:115)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask$CheckpointingOperation.createStreamFactory(StreamTask.java:986)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask$CheckpointingOperation.executeCheckpointing(StreamTask.java:956)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.checkpointState(StreamTask.java:583)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.performCheckpoint(StreamTask.java:551)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpoint(StreamTask.java:511)
>   at org.apache.flink.runtime.taskmanager.Task$3.run(Task.java:1019)
>   ... 5 more
> And the savepoint failed with the following exception:
> Using address /127.0.0.1:6123 to connect to JobManager.
> Triggering savepoint for job 153310c4a836a92ce69151757c6b73f1.
> Waiting for response...
> 
>  The program finished with the following exception:
> java.lang.Exception: Failed to complete savepoint
> at 
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anon$7.apply(JobManager.scala:793)
> at 
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anon$7.apply(JobManager.scala:782)
> at 
> org.apache.flink.runtime.concurrent.impl.FlinkFuture$6.recover(FlinkFuture.java:263)
> at akka.dispatch.Recover.internal(Future.scala:267)
> at akka.dispatch.japi$RecoverBridge.apply(Future.scala:183)
> at akka.dispatch.japi$RecoverBridge.apply(Future.scala:181)
> at scala.util.Failure$$anonfun$recover$1.apply(Try.scala:185)
> at scala.util.Try$.apply(Try.scala:161)
> at scala.util.Failure.recover(Try.scala:185)
> at scala.concurrent.Future$$anonfun$recover$1.apply(Future.scala:324)
> at scala.concurrent.Future$$anonfun$recover$1.apply(Future.scala:324)
> at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
> at 
> akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.processBatch$1(BatchingExecutor.scala:67)
> at 
> 

[jira] [Commented] (FLINK-5407) Savepoint for iterative Task fails.

2017-01-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15815287#comment-15815287
 ] 

ASF GitHub Bot commented on FLINK-5407:
---

Github user StefanRRichter commented on the issue:

https://github.com/apache/flink/pull/3088
  
cc @aljoscha 


> Savepoint for iterative Task fails.
> ---
>
> Key: FLINK-5407
> URL: https://issues.apache.org/jira/browse/FLINK-5407
> Project: Flink
>  Issue Type: Bug
>  Components: State Backends, Checkpointing
>Affects Versions: 1.2.0
>Reporter: Stephan Ewen
>Assignee: Stefan Richter
> Fix For: 1.2.0, 1.3.0
>
> Attachments: SavepointBug.java
>
>
> Flink 1.2-SNAPSHOT (Commit: 5b54009) on Windows.
> Triggering a savepoint for a streaming job, both the savepoint and the job 
> failed.
> The job failed with the following exception:
> {code}
> java.lang.RuntimeException: Error while triggering checkpoint for 
> IterationSource-7 (1/1)
>   at org.apache.flink.runtime.taskmanager.Task$3.run(Task.java:1026)
>   at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
>   at java.util.concurrent.FutureTask.run(Unknown Source)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>   at java.lang.Thread.run(Unknown Source)
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.createOperatorIdentifier(StreamTask.java:767)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.access$500(StreamTask.java:115)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask$CheckpointingOperation.createStreamFactory(StreamTask.java:986)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask$CheckpointingOperation.executeCheckpointing(StreamTask.java:956)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.checkpointState(StreamTask.java:583)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.performCheckpoint(StreamTask.java:551)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpoint(StreamTask.java:511)
>   at org.apache.flink.runtime.taskmanager.Task$3.run(Task.java:1019)
>   ... 5 more
> And the savepoint failed with the following exception:
> Using address /127.0.0.1:6123 to connect to JobManager.
> Triggering savepoint for job 153310c4a836a92ce69151757c6b73f1.
> Waiting for response...
> 
>  The program finished with the following exception:
> java.lang.Exception: Failed to complete savepoint
> at 
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anon$7.apply(JobManager.scala:793)
> at 
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anon$7.apply(JobManager.scala:782)
> at 
> org.apache.flink.runtime.concurrent.impl.FlinkFuture$6.recover(FlinkFuture.java:263)
> at akka.dispatch.Recover.internal(Future.scala:267)
> at akka.dispatch.japi$RecoverBridge.apply(Future.scala:183)
> at akka.dispatch.japi$RecoverBridge.apply(Future.scala:181)
> at scala.util.Failure$$anonfun$recover$1.apply(Try.scala:185)
> at scala.util.Try$.apply(Try.scala:161)
> at scala.util.Failure.recover(Try.scala:185)
> at scala.concurrent.Future$$anonfun$recover$1.apply(Future.scala:324)
> at scala.concurrent.Future$$anonfun$recover$1.apply(Future.scala:324)
> at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
> at 
> akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.processBatch$1(BatchingExecutor.scala:67)
> at 
> akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:82)
> at 
> akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
> at 
> akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
> at 
> scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
> at akka.dispatch.BatchingExecutor$Batch.run(BatchingExecutor.scala:58)
> at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41)
> at 
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:401)
> at 
> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> at 
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> at 
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> at 
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> Caused by: java.lang.Exception: Checkpoint