[jira] [Commented] (FLINK-2929) Recovery of jobs on cluster restarts

2015-11-03 Thread Till Rohrmann (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-2929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14987374#comment-14987374 ]

Till Rohrmann commented on FLINK-2929:
--

We could auto-generate a random ZNode path for each cluster start. In case of a 
clean shutdown, this path could be removed unless it is explicitly configured to be kept. 
When starting a new cluster, we could then add an option to start with a 
specific ZNode path in order to recover from it, e.g. in case of an upgrade. 
However, this has the disadvantage that the user would be responsible for cleaning up 
the state data once it is no longer needed.
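A minimal sketch of the per-start ZNode path idea, in Python for brevity. Everything in it is hypothetical: `FakeZooKeeper` is an in-memory stand-in (a dict of paths) rather than a real ZooKeeper client, and `ClusterSession`, `recover_from`, `keep_on_shutdown` and the `jobgraphs` sub-path are illustrative names, not Flink configuration keys or APIs.

```python
import uuid
from typing import Dict, List, Optional


class FakeZooKeeper:
    """Hypothetical in-memory stand-in for ZooKeeper: maps ZNode paths to data."""

    def __init__(self) -> None:
        self.nodes: Dict[str, bytes] = {}

    def create(self, path: str, data: bytes = b"") -> None:
        self.nodes[path] = data

    def delete_recursive(self, root: str) -> None:
        # Remove the root and every path below it.
        prefix = root.rstrip("/") + "/"
        for path in [p for p in self.nodes if p == root or p.startswith(prefix)]:
            del self.nodes[path]


class ClusterSession:
    """One cluster start (hypothetical); all recovery data lives under self.root."""

    def __init__(self, zk: FakeZooKeeper, base: str = "/flink",
                 recover_from: Optional[str] = None,
                 keep_on_shutdown: bool = False) -> None:
        self.zk = zk
        self.keep_on_shutdown = keep_on_shutdown
        if recover_from is not None:
            # Explicitly requested path (e.g. for an upgrade): reuse its state.
            self.root = recover_from
        else:
            # Default: a fresh random path, so no old jobs are picked up.
            self.root = f"{base}/{uuid.uuid4().hex}"
            self.zk.create(self.root)

    def submit(self, job_id: str) -> None:
        self.zk.create(f"{self.root}/jobgraphs/{job_id}")

    def recoverable_jobs(self) -> List[str]:
        prefix = f"{self.root}/jobgraphs/"
        return sorted(p[len(prefix):] for p in self.zk.nodes if p.startswith(prefix))

    def shutdown(self) -> None:
        # Clean shutdown: drop the state unless it was explicitly kept.
        if not self.keep_on_shutdown:
            self.zk.delete_recursive(self.root)


# Usage: keep one cluster's state, start fresh, then recover explicitly.
zk = FakeZooKeeper()
old = ClusterSession(zk, keep_on_shutdown=True)
old.submit("job-a")
old.shutdown()
fresh = ClusterSession(zk)
print(fresh.recoverable_jobs())                                      # []
print(ClusterSession(zk, recover_from=old.root).recoverable_jobs())  # ['job-a']
```

A session started with keep_on_shutdown=True leaves its path behind in ZooKeeper, which is the cleanup burden on the user described in the comment.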

> Recovery of jobs on cluster restarts
> 
>
> Key: FLINK-2929
> URL: https://issues.apache.org/jira/browse/FLINK-2929
> Project: Flink
>  Issue Type: Improvement
>Affects Versions: 0.10
>Reporter: Ufuk Celebi
>
> Recovery information is stored in ZooKeeper under a static root like 
> {{/flink}}. In case of a cluster restart without canceling running jobs, old 
> jobs will be recovered from ZooKeeper.
> This can be confusing or helpful depending on the use case.
> I suspect that the confusing case will be more common.
> We can change the default cluster startup (e.g. new YARN session or new 
> ./start-cluster call) to purge all existing data in ZooKeeper and add a flag 
> to skip this if needed.
> [~trohrm...@apache.org], [~aljoscha], [~StephanEwen] what's your opinion?
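A minimal sketch of the proposed startup behaviour, modelling ZooKeeper as a plain set of absolute ZNode paths. The function `start_cluster`, the `keep_existing` flag and the `jobgraphs` sub-path are hypothetical; they only illustrate "purge by default, keep on request" and are not Flink's actual startup code.

```python
from typing import Set


def start_cluster(znodes: Set[str], root: str = "/flink",
                  keep_existing: bool = False) -> Set[str]:
    """Simulate a cluster start and return the job IDs it would recover.

    `znodes` models ZooKeeper as a set of absolute ZNode paths (hypothetical).
    With keep_existing=False, everything under `root` is purged first.
    """
    prefix = root.rstrip("/") + "/"
    under_root = {p for p in znodes if p == root or p.startswith(prefix)}
    if not keep_existing:
        znodes.difference_update(under_root)  # purge old recovery data in place
        return set()
    jobs_prefix = prefix + "jobgraphs/"
    return {p[len(jobs_prefix):] for p in under_root if p.startswith(jobs_prefix)}


state = {"/flink/jobgraphs/job-a", "/other-cluster/jobgraphs/job-b"}
print(start_cluster(set(state), keep_existing=True))  # {'job-a'}
print(start_cluster(state))                           # set()
print(state)                                          # {'/other-cluster/jobgraphs/job-b'}
```

Flipping the default of keep_existing to True gives the opposite policy: old jobs are recovered unless the user explicitly asks for a purge.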



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2929) Recovery of jobs on cluster restarts

2015-11-03 Thread Ufuk Celebi (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-2929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14987120#comment-14987120 ]

Ufuk Celebi commented on FLINK-2929:


Another related issue: running multiple Flink HA clusters with the same root 
ZNode path is also problematic. You can work around this by configuring a different 
root path for each cluster, but the out-of-the-box behaviour might be confusing.



[jira] [Commented] (FLINK-2929) Recovery of jobs on cluster restarts

2015-11-02 Thread Till Rohrmann (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-2929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14985022#comment-14985022 ]

Till Rohrmann commented on FLINK-2929:
--

Sure, but to me it would make more sense to add the option for the less likely 
case, which is probably the upgrade case.



[jira] [Commented] (FLINK-2929) Recovery of jobs on cluster restarts

2015-11-02 Thread Stephan Ewen (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-2929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14986310#comment-14986310 ]

Stephan Ewen commented on FLINK-2929:
-

That sounds good, but shouldn't we have the option in place before we make the 
"purge ZooKeeper" mode the default, so that there is always a way to perform 
upgrades?



[jira] [Commented] (FLINK-2929) Recovery of jobs on cluster restarts

2015-10-27 Thread Aljoscha Krettek (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-2929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14976712#comment-14976712 ]

Aljoscha Krettek commented on FLINK-2929:
-

I think we have to fix it, yes. I'm not sure which one should be the default 
behavior, though. I lean towards making recovery of old jobs the default, 
but I see how it could be confusing...



[jira] [Commented] (FLINK-2929) Recovery of jobs on cluster restarts

2015-10-27 Thread Ufuk Celebi (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-2929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14976835#comment-14976835 ]

Ufuk Celebi commented on FLINK-2929:


Maybe you are right. Keeping it as it is (and adding an option to purge) ensures 
that jobs are not removed accidentally, and it is still possible to cancel an old 
job after a restart.
