[jira] [Commented] (HADOOP-13837) Always get unable to kill error message even the hadoop process was successfully killed

2018-02-15 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16366608#comment-16366608
 ] 

genericqa commented on HADOOP-13837:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 29s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 41s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 30s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} shellcheck {color} | {color:green}  0m  4s{color} | {color:green} There were no new shellcheck issues. {color} |
| {color:green}+1{color} | {color:green} shelldocs {color} | {color:green}  0m 11s{color} | {color:green} There were no new shelldocs issues. {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 29s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 15s{color} | {color:green} hadoop-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 25s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 53m 30s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | HADOOP-13837 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12900444/HADOOP-13837.05.patch |
| Optional Tests |  asflicense  mvnsite  unit  shellcheck  shelldocs  |
| uname | Linux 69217260963f 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 8013475 |
| maven | version: Apache Maven 3.3.9 |
| shellcheck | v0.4.6 |
| Test Results | https://builds.apache.org/job/PreCommit-HADOOP-Build/14145/testReport/ |
| Max. process+thread count | 341 (vs. ulimit of 5500) |
| modules | C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common |
| Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/14145/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.




[jira] [Commented] (HADOOP-13837) Always get unable to kill error message even the hadoop process was successfully killed

2018-02-15 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16366574#comment-16366574
 ] 

Wangda Tan commented on HADOOP-13837:
-

Moved to 3.2.0, please revert if you disagree.

> Always get unable to kill error message even the hadoop process was 
> successfully killed
> ---
>
> Key: HADOOP-13837
> URL: https://issues.apache.org/jira/browse/HADOOP-13837
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: scripts
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Critical
> Attachments: HADOOP-13837.01.patch, HADOOP-13837.02.patch, 
> HADOOP-13837.03.patch, HADOOP-13837.04.patch, HADOOP-13837.05.patch, 
> check_proc.sh
>
>
> *Steps to reproduce*
> # Set up a Hadoop cluster.
> # Stop the resource manager: {{yarn --daemon stop resourcemanager}}
> # Stop the node manager: {{yarn --daemon stop nodemanager}}
> {noformat}
> WARNING: nodemanager did not stop gracefully after 5 seconds: Trying to kill with kill -9
> ERROR: Unable to kill 20325
> {noformat}
> The "Unable to kill" error message appears every time. This gives the user
> the impression that something is wrong with the node manager process because
> it could not be forcibly killed, but in fact the kill command works as
> expected.
> This happens because hadoop-functions.sh does not properly check whether the
> process still exists after the kill. Currently it checks process liveness
> immediately after the kill command:
> {code}
> ...
> kill -9 "${pid}" >/dev/null 2>&1
> # ps -p runs right away, before the killed process has fully exited
> if ps -p "${pid}" > /dev/null 2>&1; then
>   hadoop_error "ERROR: Unable to kill ${pid}"
> ...
> {code}
> When the resource manager has been stopped before the node managers, the node
> manager process always takes some additional time to terminate completely. I
> printed the output of {{ps -p}} and its exit code in a while loop after
> {{kill -9}} and found the following:
> {noformat}
> 16212 ?        00:00:11 java
> 0
>   PID TTY          TIME CMD
> 16212 ?        00:00:11 java
> 0
>   PID TTY          TIME CMD
> 16212 ?        00:00:11 java
> 0
>   PID TTY          TIME CMD
> 1
>   PID TTY          TIME CMD
> 1
>   PID TTY          TIME CMD
> 1
>   PID TTY          TIME CMD
> ...
> {noformat}
> During the first three iterations of the loop the process had not yet
> terminated, so {{ps -p}} still exited with {{0}}; only afterwards did it
> return {{1}}.
> *Proposed fix*
> My first idea was a more comprehensive PID check that polls the PID's
> liveness until {{HADOOP_STOP_TIMEOUT}} is reached, but that seemed to add too
> much complexity. The second option is simply to add a {{sleep 3}} after
> {{kill -9}}; it should fix the error in most cases with relatively small
> changes to the script.
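The first idea in the proposal, polling until {{HADOOP_STOP_TIMEOUT}}, need not be much code. A minimal sketch, assuming bash: {{wait_for_pid_exit}} is a hypothetical helper, not a function from hadoop-functions.sh, and it uses {{kill -0}} rather than {{ps -p}} as the liveness test. Both succeed while the PID exists; {{kill -0}} also fails if the caller may not signal the process, which should not matter here because the same user has just sent {{kill -9}}.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not from hadoop-functions.sh): wait until a PID has
# gone away, polling once per second for at most <timeout> seconds.
# Returns 0 once the process is gone, 1 if it is still present at the timeout.
wait_for_pid_exit() {
  local pid=$1
  local timeout=${2:-5}
  local waited=0

  # kill -0 delivers no signal; it only reports whether the PID can be signalled.
  while kill -0 "${pid}" >/dev/null 2>&1; do
    if [ "${waited}" -ge "${timeout}" ]; then
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}
```

In the stop path it would take the place of the immediate check, e.g. {{kill -9 "${pid}"}} followed by {{wait_for_pid_exit "${pid}" "${HADOOP_STOP_TIMEOUT}" || hadoop_error "ERROR: Unable to kill ${pid}"}}. Unlike a fixed {{sleep 3}}, it returns as soon as the process is gone.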



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13837) Always get unable to kill error message even the hadoop process was successfully killed

2017-12-04 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16276464#comment-16276464
 ] 

genericqa commented on HADOOP-13837:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 11m 50s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 34s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  1s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  9m 54s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} shellcheck {color} | {color:green}  0m  2s{color} | {color:green} There were no new shellcheck issues. {color} |
| {color:green}+1{color} | {color:green} shelldocs {color} | {color:green}  0m  9s{color} | {color:green} There were no new shelldocs issues. {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 59s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 11s{color} | {color:green} hadoop-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 22s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 54m 31s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | HADOOP-13837 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12900444/HADOOP-13837.05.patch |
| Optional Tests |  asflicense  mvnsite  unit  shellcheck  shelldocs  |
| uname | Linux dffc115af861 3.13.0-129-generic #178-Ubuntu SMP Fri Aug 11 12:48:20 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 37ca416 |
| maven | version: Apache Maven 3.3.9 |
| shellcheck | v0.4.6 |
| Test Results | https://builds.apache.org/job/PreCommit-HADOOP-Build/13775/testReport/ |
| Max. process+thread count | 343 (vs. ulimit of 5000) |
| modules | C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common |
| Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/13775/console |
| Powered by | Apache Yetus 0.7.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.




[jira] [Commented] (HADOOP-13837) Always get unable to kill error message even the hadoop process was successfully killed

2017-12-04 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16276454#comment-16276454
 ] 

genericqa commented on HADOOP-13837:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 10m 43s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  1s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m  0s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} shellcheck {color} | {color:green}  0m  3s{color} | {color:green} There were no new shellcheck issues. {color} |
| {color:green}+1{color} | {color:green} shelldocs {color} | {color:green}  0m  9s{color} | {color:green} There were no new shelldocs issues. {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m  5s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 12s{color} | {color:green} hadoop-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 22s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 53m 21s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | HADOOP-13837 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12900444/HADOOP-13837.05.patch |
| Optional Tests |  asflicense  mvnsite  unit  shellcheck  shelldocs  |
| uname | Linux d5d6bb24eafb 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 37ca416 |
| maven | version: Apache Maven 3.3.9 |
| shellcheck | v0.4.6 |
| Test Results | https://builds.apache.org/job/PreCommit-HADOOP-Build/13774/testReport/ |
| Max. process+thread count | 315 (vs. ulimit of 5000) |
| modules | C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common |
| Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/13774/console |
| Powered by | Apache Yetus 0.7.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.




[jira] [Commented] (HADOOP-13837) Always get unable to kill error message even the hadoop process was successfully killed

2017-12-03 Thread Weiwei Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16276404#comment-16276404
 ] 

Weiwei Yang commented on HADOOP-13837:
--

I am still seeing this issue in the latest trunk. I am raising the severity to
Critical because it happens almost every time, which creates a bad user experience.

> Always get unable to kill error message even the hadoop process was 
> successfully killed
> ---
>
> Key: HADOOP-13837
> URL: https://issues.apache.org/jira/browse/HADOOP-13837
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: scripts
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
> Attachments: HADOOP-13837.01.patch, HADOOP-13837.02.patch, 
> HADOOP-13837.03.patch, HADOOP-13837.04.patch, check_proc.sh



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)




[jira] [Commented] (HADOOP-13837) Always get unable to kill error message even the hadoop process was successfully killed

2017-06-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16069326#comment-16069326
 ] 

Hadoop QA commented on HADOOP-13837:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 13s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 14m 15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  6s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} shellcheck {color} | {color:green}  0m  4s{color} | {color:green} There were no new shellcheck issues. {color} |
| {color:green}+1{color} | {color:green} shelldocs {color} | {color:green}  0m  9s{color} | {color:green} There were no new shelldocs issues. {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 54s{color} | {color:green} hadoop-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 22s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 19m 24s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:14b5c93 |
| JIRA Issue | HADOOP-13837 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12840813/HADOOP-13837.04.patch |
| Optional Tests |  asflicense  mvnsite  unit  shellcheck  shelldocs  |
| uname | Linux dbc81c67da12 3.13.0-119-generic #166-Ubuntu SMP Wed May 3 12:18:55 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / af2773f |
| shellcheck | v0.4.6 |
| Test Results | https://builds.apache.org/job/PreCommit-HADOOP-Build/12673/testReport/ |
| modules | C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common |
| Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/12673/console |
| Powered by | Apache Yetus 0.5.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Always get unable to kill error message even the hadoop process was 
> successfully killed
> ---
>
> Key: HADOOP-13837
> URL: https://issues.apache.org/jira/browse/HADOOP-13837
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: scripts
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
> Attachments: check_proc.sh, HADOOP-13837.01.patch, 
> HADOOP-13837.02.patch, HADOOP-13837.03.patch, HADOOP-13837.04.patch

[jira] [Commented] (HADOOP-13837) Always get unable to kill error message even the hadoop process was successfully killed

2017-02-15 Thread Weiwei Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15869219#comment-15869219
 ] 

Weiwei Yang commented on HADOOP-13837:
--

Hi [~aw], [~ajisakaa]

This one seems to have gone stale :( so I'd like to summarize the issue again to
avoid distractions and try to get it done.

*Issue Summary*

Currently, when hadoop-functions.sh forcibly kills a process, it checks the
result immediately without waiting, so the check always fails with the error
{{ERROR: Unable to kill daemon_pid}}. This is a false alarm: the process is
actually killed successfully.
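The false alarm can be observed with a small loop in the spirit of the {{ps -p}} trace in the description. This is an independent, hypothetical sketch (the attached check_proc.sh is not reproduced in this thread): {{trace_liveness}} prints one line per interval, 0 while the PID still exists and 1 once it is gone, using {{kill -0}} as the liveness test. Fractional intervals such as 0.1 assume GNU or BusyBox {{sleep}}.

```shell
#!/usr/bin/env bash

# Hypothetical diagnostic (not part of Hadoop): check <pid> every <interval>
# seconds, <iterations> times, printing 0 while the PID still exists and 1
# once it is gone (the same convention as the ps -p exit codes in the trace).
trace_liveness() {
  local pid=$1
  local iterations=${2:-10}
  local interval=${3:-0.1}
  local i=0

  while [ "${i}" -lt "${iterations}" ]; do
    if kill -0 "${pid}" >/dev/null 2>&1; then
      echo 0
    else
      echo 1
    fi
    sleep "${interval}"
    i=$((i + 1))
  done
}
```

Run right after {{kill -9}} on a busy daemon, any leading 0 lines mark the window in which an immediate check reports "Unable to kill" even though the process is on its way out.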

*The Fix*

The patch fixes two things:
# Sleep 3 seconds after {{kill -9}}
# Replace the PID check with the existing function {{hadoop_status_daemon}}
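The two changes in the patch can be sketched as below, under stated assumptions. The real patch calls {{hadoop_status_daemon}}, whose arguments and return codes are not shown in this thread, so {{pid_is_running}} is a hypothetical stand-in that tests the PID with {{kill -0}}; {{force_kill_and_verify}} is likewise a hypothetical name, and writing to stderr stands in for {{hadoop_error}}.

```shell
#!/usr/bin/env bash

# Hypothetical stand-in for hadoop_status_daemon: succeeds while the PID exists.
pid_is_running() {
  kill -0 "$1" >/dev/null 2>&1
}

# Sketch of the patched stop sequence: force-kill, wait a fixed delay
# (3 seconds in the patch) so the process can finish exiting, and only then
# report an error if it is still present.
force_kill_and_verify() {
  local pid=$1
  local delay=${2:-3}

  kill -9 "${pid}" >/dev/null 2>&1
  sleep "${delay}"
  if pid_is_running "${pid}"; then
    echo "ERROR: Unable to kill ${pid}" >&2
    return 1
  fi
  return 0
}
```

The fixed delay is the trade-off the description accepts: every forced stop pays the full 3 seconds, and a process that needs longer than that to exit would still trigger the message. Polling up to {{HADOOP_STOP_TIMEOUT}} avoids both at the cost of a loop.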

Hopefully we can get this fixed in 3.0 alpha3, thanks!

> Always get unable to kill error message even the hadoop process was 
> successfully killed
> ---
>
> Key: HADOOP-13837
> URL: https://issues.apache.org/jira/browse/HADOOP-13837
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: scripts
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
> Attachments: check_proc.sh, HADOOP-13837.01.patch, 
> HADOOP-13837.02.patch, HADOOP-13837.03.patch, HADOOP-13837.04.patch



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)




[jira] [Commented] (HADOOP-13837) Always get unable to kill error message even the hadoop process was successfully killed

2017-01-02 Thread Weiwei Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15794334#comment-15794334
 ] 

Weiwei Yang commented on HADOOP-13837:
--

Hi [~aw]

I have updated the title and description of this JIRA; hopefully they now
describe the issue and the fix better. Please help review the v4 patch.
Thank you.

> Always get unable to kill error message even the hadoop process was 
> successfully killed
> ---
>
> Key: HADOOP-13837
> URL: https://issues.apache.org/jira/browse/HADOOP-13837
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: scripts
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
> Attachments: HADOOP-13837.01.patch, HADOOP-13837.02.patch, 
> HADOOP-13837.03.patch, HADOOP-13837.04.patch, check_proc.sh



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
