[jira] [Commented] (FLINK-3517) Number of job and task managers not checked in scripts

2016-02-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15171641#comment-15171641
 ] 

ASF GitHub Bot commented on FLINK-3517:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/1716


> Number of job and task managers not checked in scripts
> --
>
>                 Key: FLINK-3517
>                 URL: https://issues.apache.org/jira/browse/FLINK-3517
>             Project: Flink
>          Issue Type: Test
>          Components: Start-Stop Scripts
>            Reporter: Ufuk Celebi
>            Assignee: Ufuk Celebi
>            Priority: Minor
>
> The startup scripts determine whether a job or task manager is running via a 
> pid file. If a process listed in the pid file is destroyed outside of the 
> scripts (for example, on failure), a warning about multiple job managers is 
> printed even though they are not running.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3517) Number of job and task managers not checked in scripts

2016-02-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15171635#comment-15171635
 ] 

ASF GitHub Bot commented on FLINK-3517:
---

Github user uce commented on the pull request:

https://github.com/apache/flink/pull/1716#issuecomment-190121633
  
I'm merging this to `master` and `release-1.0`.




[jira] [Commented] (FLINK-3517) Number of job and task managers not checked in scripts

2016-02-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168725#comment-15168725
 ] 

ASF GitHub Bot commented on FLINK-3517:
---

Github user tillrohrmann commented on the pull request:

https://github.com/apache/flink/pull/1716#issuecomment-189191740
  
Code changes look good :-) +1 for merging.




[jira] [Commented] (FLINK-3517) Number of job and task managers not checked in scripts

2016-02-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168723#comment-15168723
 ] 

ASF GitHub Bot commented on FLINK-3517:
---

Github user mxm commented on the pull request:

https://github.com/apache/flink/pull/1716#issuecomment-189190990
  
+1 to merge!

I find "No taskmanager daemon (pid: 27140) is running anymore on pablo." 
confusing. I think we could change it to something like "TaskManager couldn't 
be stopped. It has already been shut down." Anyways, it's not part of this 
issue.




[jira] [Commented] (FLINK-3517) Number of job and task managers not checked in scripts

2016-02-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168098#comment-15168098
 ] 

ASF GitHub Bot commented on FLINK-3517:
---

GitHub user uce opened a pull request:

https://github.com/apache/flink/pull/1716

[FLINK-3517] [dist] Only count active PIDs in start script

```bash
$ bin/start-cluster.sh
Starting cluster.
Starting jobmanager daemon on host pablo.
Starting taskmanager daemon on host pablo.
$ bin/taskmanager.sh start
[INFO] 1 instance(s) of taskmanager are already running on pablo.
Starting taskmanager daemon on host pablo.
$ bin/taskmanager.sh start
[INFO] 2 instance(s) of taskmanager are already running on pablo.
Starting taskmanager daemon on host pablo.
$ bin/taskmanager.sh start
[INFO] 3 instance(s) of taskmanager are already running on pablo.
Starting taskmanager daemon on host pablo.
$ jps
27328 TaskManager
27140 TaskManager
26949 TaskManager
26523 JobManager
26716 TaskManager
$ kill -9 27140
$ bin/taskmanager.sh start
>>> [INFO] 3 instance(s) of taskmanager are already running on pablo <<< Correct now
Starting taskmanager daemon on host pablo.
$ bin/stop-cluster.sh
Stopping taskmanager daemon (pid: 27545) on host pablo.
Stopping jobmanager daemon (pid: 26523) on host pablo.
$ bin/taskmanager.sh stop
Stopping taskmanager daemon (pid: 27328) on host pablo.
$ bin/taskmanager.sh stop
No taskmanager daemon (pid: 27140) is running anymore on pablo.
$ bin/taskmanager.sh stop
Stopping taskmanager daemon (pid: 26949) on host pablo.
$ bin/taskmanager.sh stop
Stopping taskmanager daemon (pid: 26716) on host pablo.
$ bin/taskmanager.sh stop
No taskmanager daemon to stop on host pablo.
```
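
As a rough illustration of the idea behind "only count active PIDs" (a minimal sketch, not the code from this pull request), the pid file can be filtered with `kill -0`, which sends no signal and only reports whether the process can be signalled. The function name `count_active_pids` and the one-PID-per-line file layout are assumptions made for this example.

```bash
# Hypothetical helper: print how many entries in a pid file refer to a
# process that is still alive. Assumes one PID per line.
# The body runs in a subshell so its variables do not leak to the caller.
count_active_pids() (
    pidfile="$1"
    active=0
    # A missing pid file means nothing is running.
    [ -f "$pidfile" ] || { echo 0; exit 0; }
    # The "|| [ -n ... ]" part also handles a last line without a newline.
    while read -r pid || [ -n "$pid" ]; do
        # Skip blank or non-numeric lines defensively.
        case "$pid" in
            ''|*[!0-9]*) continue ;;
        esac
        # kill -0 succeeds only if the process exists and may be signalled.
        if kill -0 "$pid" 2>/dev/null; then
            active=$((active + 1))
        fi
    done < "$pidfile"
    echo "$active"
)
```

Two caveats apply to this sketch: `kill -0` also fails for processes owned by another user, and a stale PID may have been reused by an unrelated process, so the count is a best-effort estimate.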

We can further improve the stop part by repeatedly trying the PIDs in the pid 
file when a value does not match an active PID.
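
One way the stop part could skip stale entries is sketched below. This is only an assumption about how such an improvement might look, not something contained in this pull request: the function name `pop_active_pid` is hypothetical, and the sketch assumes the stop script consumes the pid file from the last line upwards, one PID per line.

```bash
# Hypothetical sketch: remove entries from the end of the pid file until one
# refers to a live process, then print that PID so the caller can stop it.
# Returns non-zero if the file runs out of entries first.
pop_active_pid() (
    pidfile="$1"
    while [ -s "$pidfile" ]; do
        pid=$(tail -n 1 "$pidfile")
        # Drop the last line whether or not its process is still alive.
        sed '$d' "$pidfile" > "$pidfile.tmp" && mv "$pidfile.tmp" "$pidfile"
        # Ignore blank or non-numeric lines.
        case "$pid" in
            ''|*[!0-9]*) continue ;;
        esac
        if kill -0 "$pid" 2>/dev/null; then
            echo "$pid"
            exit 0
        fi
    done
    exit 1
)
```

A caller might then use `pid=$(pop_active_pid "$pidfile") && kill "$pid"`, so that a single stop invocation would pass over an entry such as the killed 27140 above instead of only reporting it.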

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/uce/flink 3517-scripts

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1716.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1716


commit e037c89404704b8f8bd02911e65dc1dd24b1e836
Author: Ufuk Celebi 
Date:   2016-02-25T23:11:48Z

[FLINK-3517] [dist] Only count active PIDs in start script



