[jira] [Commented] (FLINK-3517) Number of job and task managers not checked in scripts
[ https://issues.apache.org/jira/browse/FLINK-3517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15171641#comment-15171641 ]

ASF GitHub Bot commented on FLINK-3517:
---

Github user asfgit closed the pull request at:

    https://github.com/apache/flink/pull/1716

> Number of job and task managers not checked in scripts
> --
>
> Key: FLINK-3517
> URL: https://issues.apache.org/jira/browse/FLINK-3517
> Project: Flink
> Issue Type: Test
> Components: Start-Stop Scripts
> Reporter: Ufuk Celebi
> Assignee: Ufuk Celebi
> Priority: Minor
>
> The startup scripts determine whether a job or task manager is running via a
> pid file. If a process that is listed in the pid file is destroyed (for
> example on failure) outside of the scripts, a warning about multiple job
> managers is printed even though they are not running.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-3517) Number of job and task managers not checked in scripts
[ https://issues.apache.org/jira/browse/FLINK-3517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15171635#comment-15171635 ]

ASF GitHub Bot commented on FLINK-3517:
---

Github user uce commented on the pull request:

    https://github.com/apache/flink/pull/1716#issuecomment-190121633

I'm merging this to `master` and `release-1.0`.
[jira] [Commented] (FLINK-3517) Number of job and task managers not checked in scripts
[ https://issues.apache.org/jira/browse/FLINK-3517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168725#comment-15168725 ]

ASF GitHub Bot commented on FLINK-3517:
---

Github user tillrohrmann commented on the pull request:

    https://github.com/apache/flink/pull/1716#issuecomment-189191740

Code changes look good :-) +1 for merging.
[jira] [Commented] (FLINK-3517) Number of job and task managers not checked in scripts
[ https://issues.apache.org/jira/browse/FLINK-3517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168723#comment-15168723 ]

ASF GitHub Bot commented on FLINK-3517:
---

Github user mxm commented on the pull request:

    https://github.com/apache/flink/pull/1716#issuecomment-189190990

+1 to merge!

I find "No taskmanager daemon (pid: 27140) is running anymore on pablo." confusing. I think we could change it to something like "TaskManager couldn't be stopped. It has already been shut down." Anyways, it's not part of this issue.
[jira] [Commented] (FLINK-3517) Number of job and task managers not checked in scripts
[ https://issues.apache.org/jira/browse/FLINK-3517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168098#comment-15168098 ]

ASF GitHub Bot commented on FLINK-3517:
---

GitHub user uce opened a pull request:

    https://github.com/apache/flink/pull/1716

    [FLINK-3517] [dist] Only count active PIDs in start script

```bash
$ bin/start-cluster.sh
Starting cluster.
Starting jobmanager daemon on host pablo.
Starting taskmanager daemon on host pablo.

$ bin/taskmanager.sh start
[INFO] 1 instance(s) of taskmanager are already running on pablo.
Starting taskmanager daemon on host pablo.

$ bin/taskmanager.sh start
[INFO] 2 instance(s) of taskmanager are already running on pablo.
Starting taskmanager daemon on host pablo.

$ bin/taskmanager.sh start
[INFO] 3 instance(s) of taskmanager are already running on pablo.
Starting taskmanager daemon on host pablo.

$ jps
27328 TaskManager
27140 TaskManager
26949 TaskManager
26523 JobManager
26716 TaskManager

$ kill -9 27140

$ bin/taskmanager.sh start
>>> [INFO] 3 instance(s) of taskmanager are already running on pablo <<< Correct now
Starting taskmanager daemon on host pablo.

$ bin/stop-cluster.sh
Stopping taskmanager daemon (pid: 27545) on host pablo.
Stopping jobmanager daemon (pid: 26523) on host pablo.

$ bin/taskmanager.sh stop
Stopping taskmanager daemon (pid: 27328) on host pablo.

$ bin/taskmanager.sh stop
No taskmanager daemon (pid: 27140) is running anymore on pablo.

$ bin/taskmanager.sh stop
Stopping taskmanager daemon (pid: 26949) on host pablo.

$ bin/taskmanager.sh stop
Stopping taskmanager daemon (pid: 26716) on host pablo.

$ bin/taskmanager.sh stop
No taskmanager daemon to stop on host pablo.
```

We can further improve the stop part by repeatedly checking the PIDs in the pid file when a value does not match an active PID.
You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/uce/flink 3517-scripts

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/1716.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1716

commit e037c89404704b8f8bd02911e65dc1dd24b1e836
Author: Ufuk Celebi
Date:   2016-02-25T23:11:48Z

    [FLINK-3517] [dist] Only count active PIDs in start script
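
For readers who want a concrete picture of "only count active PIDs", the idea can be sketched roughly as follows. This is an illustration only and not the code from the pull request: the function name `count_active_pids` and the one-PID-per-line file layout are assumptions made for the example.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not taken from the Flink scripts).
# Prints how many PIDs listed in a pid file still belong to a live process.
# Assumes the pid file holds one PID per line.
count_active_pids() {
    local pid_file="$1"
    local count=0
    local pid

    # A missing pid file means nothing was started through the scripts.
    if [ ! -f "$pid_file" ]; then
        echo 0
        return 0
    fi

    # The "|| [ -n "$pid" ]" part handles a last line without a trailing newline.
    while read -r pid || [ -n "$pid" ]; do
        # Skip blank lines and anything that is not a plain number.
        case "$pid" in
            ''|*[!0-9]*) continue ;;
        esac
        # kill -0 delivers no signal; it only reports whether the process
        # exists and may be signalled by the current user. A live process
        # owned by another user is therefore not counted.
        if kill -0 "$pid" 2>/dev/null; then
            count=$((count + 1))
        fi
    done < "$pid_file"

    echo "$count"
}
```

A start script could call it as `count_active_pids "$pid_file"` and print the "already running" message only when the result is greater than zero, so that a PID killed outside the scripts (such as 27140 in the transcript) no longer inflates the count.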