[jira] [Updated] (FLINK-10694) ZooKeeperHaServices Cleanup

2020-05-18, Till Rohrmann (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-10694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann updated FLINK-10694:
--
Fix Version/s: (was: 1.12.0)
   1.11.0

> ZooKeeperHaServices Cleanup
> ---
>
> Key: FLINK-10694
> URL: https://issues.apache.org/jira/browse/FLINK-10694
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.6.1, 1.7.0
>Reporter: Mikhail Pryakhin
>Priority: Critical
> Fix For: 1.11.0
>
>
> When a streaming job with ZooKeeper HA enabled gets cancelled, the 
> job-related ZooKeeper nodes are not removed. Is there a reason behind that? 
>  I noticed that the ZooKeeper paths are created as "container" nodes (nodes 
> the server deletes automatically once their last child is removed) and fall 
> back to persistent nodes in case ZooKeeper doesn't support this node type 
> (a sketch of this creation pattern follows the quoted description). 
>  But anyway, the job's ZooKeeper nodes are worth removing when the job is 
> cancelled, aren't they?
> ZooKeeper version: 3.4.10
>  Flink version: 1.6.1
>  # The job is deployed as a YARN cluster with the following properties set:
> {noformat}
>  high-availability: zookeeper
>  high-availability.zookeeper.quorum: <zookeeper-quorum>
>  high-availability.zookeeper.storageDir: hdfs:///<storage-dir>
>  high-availability.zookeeper.path.root: <root>
>  high-availability.zookeeper.path.namespace: <namespace>
> {noformat}
>  # The job is cancelled via the flink cancel <job-id> command.
> What I've noticed:
>  while the job is running, the following directory structure is created in 
> ZooKeeper:
> {noformat}
> /<root>/<namespace>/leader/resource_manager_lock
> /<root>/<namespace>/leader/rest_server_lock
> /<root>/<namespace>/leader/dispatcher_lock
> /<root>/<namespace>/leader/5c21f00b9162becf5ce25a1cf0e67cde/job_manager_lock
> /<root>/<namespace>/leaderlatch/resource_manager_lock
> /<root>/<namespace>/leaderlatch/rest_server_lock
> /<root>/<namespace>/leaderlatch/dispatcher_lock
> /<root>/<namespace>/leaderlatch/5c21f00b9162becf5ce25a1cf0e67cde/job_manager_lock
> /<root>/<namespace>/checkpoints/5c21f00b9162becf5ce25a1cf0e67cde/041
> /<root>/<namespace>/checkpoint-counter/5c21f00b9162becf5ce25a1cf0e67cde
> /<root>/<namespace>/running_job_registry/5c21f00b9162becf5ce25a1cf0e67cde
> {noformat}
> When the job is cancelled, some ephemeral nodes disappear, but most of the 
> nodes are still there:
> {noformat}
> /<root>/<namespace>/leader/5c21f00b9162becf5ce25a1cf0e67cde
> /<root>/<namespace>/leaderlatch/resource_manager_lock
> /<root>/<namespace>/leaderlatch/rest_server_lock
> /<root>/<namespace>/leaderlatch/dispatcher_lock
> /<root>/<namespace>/leaderlatch/5c21f00b9162becf5ce25a1cf0e67cde/job_manager_lock
> /<root>/<namespace>/checkpoints/
> /<root>/<namespace>/checkpoint-counter/
> /<root>/<namespace>/running_job_registry/
> {noformat}
> Here is the method responsible for cleaning the ZooKeeper folders up [1], 
> which is called when a job manager has stopped [2]. 
>  It seems it only cleans up the *running_job_registry* folder; the other 
> folders stay untouched. I suppose that everything under the 
> */<root>/<namespace>/* folder should be cleaned up when the job is 
> cancelled (see the cleanup sketch below).
> [1] 
> [https://github.com/apache/flink/blob/8674b69964eae50cad024f2c5caf92a71bf21a09/flink-runtime/src/main/java/org/apache/flink/runtime/highavailability/zookeeper/ZooKeeperRunningJobsRegistry.java#L107]
>  [2] 
> [https://github.com/apache/flink/blob/f087f57749004790b6f5b823d66822c36ae09927/flink-runtime/src/main/java/org/apache/flink/runtime/dispatcher/Dispatcher.java#L332]
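
On the container-node fallback mentioned in the description: Curator applies
this fallback internally, so the following is only a minimal sketch of the
pattern against the plain ZooKeeper client; the class and helper names are
ours, not Flink code. On a 3.4.x server, which predates container nodes, the
create call lands in the persistent branch, so the znode is never
auto-deleted:

{code:java}
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public final class ContainerOrPersistent {

    // Try to create a container znode (the server deletes it automatically
    // once its last child is removed); fall back to a plain persistent
    // znode when the server does not know the container create mode.
    static void create(ZooKeeper zk, String path) throws Exception {
        try {
            zk.create(path, new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.CONTAINER);
        } catch (KeeperException.UnimplementedException e) {
            // Pre-3.5 server: a persistent node survives until somebody
            // deletes it explicitly -- exactly the leftovers reported here.
            zk.create(path, new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        }
    }
}
{code}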
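
As for the cleanup itself, Flink talks to ZooKeeper through Curator, so
removing the per-job subtrees could look roughly like the sketch below. This
is an illustration under stated assumptions, not the actual fix:
deleteJobNodes is a hypothetical helper, and the connect string and namespace
stand in for the elided configuration values above.

{code:java}
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.ExponentialBackoffRetry;
import org.apache.zookeeper.KeeperException;

public final class ZkJobNodeCleanup {

    // Remove everything a job left behind under /<root>/<namespace>:
    // the per-job children of leader/, leaderlatch/, checkpoints/,
    // checkpoint-counter/ and running_job_registry/.
    static void deleteJobNodes(CuratorFramework client, String jobId) throws Exception {
        String[] parents = {
                "/leader", "/leaderlatch", "/checkpoints",
                "/checkpoint-counter", "/running_job_registry"};
        for (String parent : parents) {
            try {
                // deletingChildrenIfNeeded() removes the whole subtree,
                // e.g. /checkpoints/<jobId>/... before /checkpoints/<jobId>.
                client.delete().deletingChildrenIfNeeded().forPath(parent + "/" + jobId);
            } catch (KeeperException.NoNodeException ignored) {
                // The job never created anything under this parent.
            }
        }
    }

    public static void main(String[] args) throws Exception {
        CuratorFramework client = CuratorFrameworkFactory.builder()
                .connectString("localhost:2181")                 // assumed quorum
                .namespace("flink/default")                      // assumed <root>/<namespace>
                .retryPolicy(new ExponentialBackoffRetry(1000, 3))
                .build();
        client.start();
        deleteJobNodes(client, "5c21f00b9162becf5ce25a1cf0e67cde");
        client.close();
    }
}
{code}

For a per-job YARN cluster, deleting the whole /<root>/<namespace> subtree on
cancellation would be an alternative, since nothing else lives under it.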



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-10694) ZooKeeperHaServices Cleanup

2020-05-18, Till Rohrmann (Jira)



Till Rohrmann updated FLINK-10694:
--
Fix Version/s: (was: 1.7.3)
   1.12.0



[jira] [Updated] (FLINK-10694) ZooKeeperHaServices Cleanup

2020-04-08, Aljoscha Krettek (Jira)



Aljoscha Krettek updated FLINK-10694:
-
Component/s: (was: API / DataStream)
 Runtime / Coordination



[jira] [Updated] (FLINK-10694) ZooKeeperHaServices Cleanup

2020-02-17, Chesnay Schepler (Jira)



Chesnay Schepler updated FLINK-10694:
-
Fix Version/s: (was: 1.6.5)



[jira] [Updated] (FLINK-10694) ZooKeeperHaServices Cleanup

2019-02-14, sunjincheng (JIRA)



sunjincheng updated FLINK-10694:

Fix Version/s: (was: 1.6.4)
   1.6.5



[jira] [Updated] (FLINK-10694) ZooKeeperHaServices Cleanup

2019-02-11, Tzu-Li (Gordon) Tai (JIRA)



Tzu-Li (Gordon) Tai updated FLINK-10694:

Fix Version/s: (was: 1.7.2)
   1.7.3



[jira] [Updated] (FLINK-10694) ZooKeeperHaServices Cleanup

2018-12-17, Tzu-Li (Gordon) Tai (JIRA)



Tzu-Li (Gordon) Tai updated FLINK-10694:

Fix Version/s: (was: 1.6.3)
   1.6.4



[jira] [Updated] (FLINK-10694) ZooKeeperHaServices Cleanup

2018-11-18, Till Rohrmann (JIRA)



Till Rohrmann updated FLINK-10694:
--
Fix Version/s: (was: 1.7.0)



[jira] [Updated] (FLINK-10694) ZooKeeperHaServices Cleanup

2018-11-18, Till Rohrmann (JIRA)



Till Rohrmann updated FLINK-10694:
--
Fix Version/s: 1.7.1



[jira] [Updated] (FLINK-10694) ZooKeeperHaServices Cleanup

2018-11-18, Till Rohrmann (JIRA)



Till Rohrmann updated FLINK-10694:
--
Fix Version/s: 1.8.0



[jira] [Updated] (FLINK-10694) ZooKeeperHaServices Cleanup

2018-11-03, Mikhail Pryakhin (JIRA)



Mikhail Pryakhin updated FLINK-10694:
-
Description: (minor wording edits to the issue text quoted above)

[jira] [Updated] (FLINK-10694) ZooKeeperHaServices Cleanup

2018-10-28, Till Rohrmann (JIRA)



Till Rohrmann updated FLINK-10694:
--
Summary: ZooKeeperHaServices Cleanup  (was: ZooKeeperRunningJobsRegistry 
Cleanup)



[jira] [Updated] (FLINK-10694) ZooKeeperHaServices Cleanup

2018-10-28, Till Rohrmann (JIRA)



Till Rohrmann updated FLINK-10694:
--
Affects Version/s: 1.7.0



[jira] [Updated] (FLINK-10694) ZooKeeperHaServices Cleanup

2018-10-28, Till Rohrmann (JIRA)



Till Rohrmann updated FLINK-10694:
--
Priority: Critical  (was: Major)



[jira] [Updated] (FLINK-10694) ZooKeeperHaServices Cleanup

2018-10-28, Till Rohrmann (JIRA)



Till Rohrmann updated FLINK-10694:
--
Fix Version/s: 1.7.0
   1.6.3
