[jira] [Commented] (YARN-6966) NodeManager metrics may return wrong negative values when NM restart

2018-08-02 Thread Haibo Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16567394#comment-16567394
 ] 

Haibo Chen commented on YARN-6966:
--

The build has finished at 
[https://builds.apache.org/job/PreCommit-YARN-Build/21489/console], but for some reason 
the report was not posted here.

Checking the branch-2 patch in shortly.

> NodeManager metrics may return wrong negative values when NM restart
> 
>
> Key: YARN-6966
> URL: https://issues.apache.org/jira/browse/YARN-6966
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Yang Wang
> Assignee: Szilard Nemeth
> Priority: Major
> Fix For: 3.2.0, 3.0.4, 3.1.2
>
> Attachments: YARN-6966-branch-2.001.patch, 
> YARN-6966-branch-2.002.patch, YARN-6966-branch-2.002.patch, 
> YARN-6966-branch-2.002.patch, YARN-6966-branch-3.0.0.001.patch, 
> YARN-6966-branch-3.0.001.patch, YARN-6966.001.patch, YARN-6966.002.patch, 
> YARN-6966.003.patch, YARN-6966.004.patch, YARN-6966.005.patch, 
> YARN-6966.005.patch, YARN-6966.006.patch
>
>
> This is the same problem as YARN-6212; however, I do not think it is a duplicate of YARN-3933.
> The primary cause of the negative values is that the metrics are not recovered properly 
> when the NM restarts.
> AllocatedContainers, ContainersLaunched, AllocatedGB, AvailableGB, AllocatedVCores and AvailableVCores
>  in the metrics also need to be recovered when the NM restarts.
> This should be done in ContainerManagerImpl#recoverContainer.
> The scenario can be reproduced with the following steps:
> # Make sure YarnConfiguration.NM_RECOVERY_ENABLED=true and YarnConfiguration.NM_RECOVERY_SUPERVISED=true in the NM
> # Submit an application and keep it running
> # Restart the NM
> # Stop the application
> # The metrics now show negative values:
> {code}
> /jmx?qry=Hadoop:service=NodeManager,name=NodeManagerMetrics
> {code}
> {code}
> {
> name: "Hadoop:service=NodeManager,name=NodeManagerMetrics",
> modelerType: "NodeManagerMetrics",
> tag.Context: "yarn",
> tag.Hostname: "hadoop.com",
> ContainersLaunched: 0,
> ContainersCompleted: 0,
> ContainersFailed: 2,
> ContainersKilled: 0,
> ContainersIniting: 0,
> ContainersRunning: 0,
> AllocatedGB: 0,
> AllocatedContainers: -2,
> AvailableGB: 160,
> AllocatedVCores: -11,
> AvailableVCores: 3611,
> ContainerLaunchDurationNumOps: 2,
> ContainerLaunchDurationAvgTime: 6,
> BadLocalDirs: 0,
> BadLogDirs: 0,
> GoodLocalDirsDiskUtilizationPerc: 2,
> GoodLogDirsDiskUtilizationPerc: 2
> }
> {code}
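The failure mode described above can be illustrated with a small, self-contained model. This is a hypothetical sketch, not Hadoop code: the class names, the fields standing in for the AllocatedContainers/AllocatedGB/AllocatedVCores/Available* gauges, and the node capacity (64 GB, 32 vcores) are all assumptions chosen for illustration, and the numbers are not meant to match the JMX output in the report. It shows that if the gauges start from zero after an NM restart and recovered containers are later released, the "allocated" gauges go negative; re-applying each recovered container's allocation (the approach the report proposes for ContainerManagerImpl#recoverContainer) keeps them balanced.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical, simplified model of the NodeManager gauges discussed in
 * YARN-6966. None of these names are the real NodeManagerMetrics API.
 */
public class MetricsRecoverySketch {

    /** Resources held by one container (illustrative units: GB and vcores). */
    static final class Res {
        final int memGB;
        final int vcores;

        Res(int memGB, int vcores) {
            this.memGB = memGB;
            this.vcores = vcores;
        }
    }

    /** A few in-memory gauges; they start from scratch on every NM start. */
    static final class NmMetrics {
        int allocatedContainers;
        int allocatedGB;
        int allocatedVCores;
        int availableGB;
        int availableVCores;

        NmMetrics(int capacityGB, int capacityVCores) {
            availableGB = capacityGB;
            availableVCores = capacityVCores;
        }

        void allocate(Res r) {
            allocatedContainers++;
            allocatedGB += r.memGB;
            allocatedVCores += r.vcores;
            availableGB -= r.memGB;
            availableVCores -= r.vcores;
        }

        void release(Res r) {
            allocatedContainers--;
            allocatedGB -= r.memGB;
            allocatedVCores -= r.vcores;
            availableGB += r.memGB;
            availableVCores += r.vcores;
        }
    }

    /**
     * Replays the reported scenario: the containers are running, the NM
     * restarts (gauges are rebuilt empty), the containers are recovered,
     * and then the application stops, releasing every container.
     */
    static NmMetrics simulate(List<Res> running, boolean recoverMetrics) {
        // Gauges before the restart; they are lost when the NM process exits.
        NmMetrics beforeRestart = new NmMetrics(64, 32);
        for (Res r : running) {
            beforeRestart.allocate(r);
        }

        // NM restart: a fresh, empty set of gauges is created.
        NmMetrics afterRestart = new NmMetrics(64, 32);
        if (recoverMetrics) {
            // Proposed fix: re-apply the allocation of each recovered container.
            for (Res r : running) {
                afterRestart.allocate(r);
            }
        }

        // Application stops: each recovered container releases its resources.
        for (Res r : running) {
            afterRestart.release(r);
        }
        return afterRestart;
    }

    public static void main(String[] args) {
        List<Res> running = Arrays.asList(new Res(2, 4), new Res(2, 4));
        NmMetrics broken = simulate(running, false);
        NmMetrics fixed = simulate(running, true);
        // prints "without recovery: AllocatedContainers=-2 AllocatedVCores=-8"
        System.out.println("without recovery: AllocatedContainers="
            + broken.allocatedContainers + " AllocatedVCores=" + broken.allocatedVCores);
        // prints "with recovery: AllocatedContainers=0 AllocatedVCores=0"
        System.out.println("with recovery: AllocatedContainers="
            + fixed.allocatedContainers + " AllocatedVCores=" + fixed.allocatedVCores);
    }
}
```

Without recovery the released resources are also added back on top of the full capacity, so the "available" gauges overshoot the node's real capacity in the same way.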



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6966) NodeManager metrics may return wrong negative values when NM restart

2018-08-02 Thread Haibo Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16567076#comment-16567076
 ] 

Haibo Chen commented on YARN-6966:
--

I have backported YARN-7542 to branch-2. Will kick off another Jenkins job 
shortly.




[jira] [Commented] (YARN-6966) NodeManager metrics may return wrong negative values when NM restart

2018-08-02 Thread Szilard Nemeth (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16566683#comment-16566683
 ] 

Szilard Nemeth commented on YARN-6966:
--

Hi [~haibochen]!
I found out what is causing the test to fail.
There is a missing backport on branch-2: YARN-7542.
In {{RecoveredContainerLaunch#call}}, the {{ContainerEventType}} being sent is {{PAUSED}} instead of {{CONTAINER_LAUNCHED}}. This ultimately puts the container in the PAUSED state instead of RUNNING.
The running containers metric is only increased when the container becomes RUNNING, so it is never increased for the recovered container.
Could you please backport YARN-7542 to branch-2?
Thanks!
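The mechanism described in this comment can be reduced to a tiny state machine. This is a hypothetical sketch under stated assumptions: the enums, the {{Container}} and {{Metrics}} classes and the {{recoverOne}} helper are illustrative stand-ins, not the real YARN {{ContainerImpl}} transitions or {{ContainerEventType}} values. It only shows that when the gauge is incremented solely on the transition to RUNNING, a recovered container that is sent a PAUSED event leaves the gauge at 0, which is what a recovery test expecting 1 running container would observe.

```java
/**
 * Hypothetical sketch: a recovered container moved to PAUSED instead of
 * RUNNING never increments the running-containers gauge. Names are
 * illustrative, not the real YARN state machine.
 */
public class RecoveredLaunchSketch {

    enum State { RECOVERING, RUNNING, PAUSED }

    /** Event the recovered-launch step sends once the process is reacquired. */
    enum LaunchEvent { LAUNCHED, PAUSED }

    static final class Metrics {
        int runningContainers;
    }

    static final class Container {
        State state = State.RECOVERING;
        final Metrics metrics;

        Container(Metrics metrics) {
            this.metrics = metrics;
        }

        void handle(LaunchEvent event) {
            switch (event) {
                case LAUNCHED:
                    state = State.RUNNING;
                    // The gauge is only incremented on the transition to RUNNING.
                    metrics.runningContainers++;
                    break;
                case PAUSED:
                    // A PAUSED container leaves the gauge untouched.
                    state = State.PAUSED;
                    break;
                default:
                    throw new IllegalArgumentException("unexpected event " + event);
            }
        }
    }

    /** Recovers a single container by sending it the given event. */
    static Container recoverOne(LaunchEvent eventSent) {
        Container container = new Container(new Metrics());
        container.handle(eventSent);
        return container;
    }

    public static void main(String[] args) {
        Container paused = recoverOne(LaunchEvent.PAUSED);
        Container launched = recoverOne(LaunchEvent.LAUNCHED);
        // prints "PAUSED event: state=PAUSED runningContainers=0"
        System.out.println("PAUSED event: state=" + paused.state
            + " runningContainers=" + paused.metrics.runningContainers);
        // prints "LAUNCHED event: state=RUNNING runningContainers=1"
        System.out.println("LAUNCHED event: state=" + launched.state
            + " runningContainers=" + launched.metrics.runningContainers);
    }
}
```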





[jira] [Commented] (YARN-6966) NodeManager metrics may return wrong negative values when NM restart

2018-08-01 Thread Szilard Nemeth (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16564972#comment-16564972
 ] 

Szilard Nemeth commented on YARN-6966:
--

Thanks [~haibochen] for taking care of the Docker build issue.
Checking the unit test failures now.




[jira] [Commented] (YARN-6966) NodeManager metrics may return wrong negative values when NM restart

2018-07-31 Thread Haibo Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16564512#comment-16564512
 ] 

Haibo Chen commented on YARN-6966:
--

[~snemeth] The unit test failures are related. It seems that the new unit test does not 
clean up properly, causing subsequent tests to fail.




[jira] [Commented] (YARN-6966) NodeManager metrics may return wrong negative values when NM restart

2018-07-31 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16564474#comment-16564474
 ] 

genericqa commented on YARN-6966:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 21m 42s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  0s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 5 new or modified test files. {color} |
|| || || || {color:brown} branch-2 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 10m 34s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 50s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 24s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 36s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 26s{color} | {color:green} branch-2 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 46s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  0m 20s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: The patch generated 4 new + 189 unchanged - 3 fixed = 193 total (was 192) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 22s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 15m 24s{color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 25s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 53m 59s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.server.nodemanager.containermanager.TestContainerManagerRecovery |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:a716388 |
| JIRA Issue | YARN-6966 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12933626/YARN-6966-branch-2.002.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 425b686bdc93 4.4.0-130-generic #156-Ubuntu SMP Thu Jun 14 08:53:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | branch-2 / a716388 |
| maven | version: Apache Maven 3.3.9 (bb52d8502b132ec0a5a3f4c09453c07478323dc5; 2015-11-10T16:41:47+00:00) |
| Default Java | 1.7.0_181 |
| checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/21455/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt |
| unit | https://builds.apache.org/job/PreCommit-YARN-Build/21455/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt |
|  Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/21455/testReport/ |
| Max. process+thread count | 160 (vs. ulimit of 1) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager |
| Console output | 
https://

[jira] [Commented] (YARN-6966) NodeManager metrics may return wrong negative values when NM restart

2018-07-31 Thread Haibo Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16564421#comment-16564421
 ] 

Haibo Chen commented on YARN-6966:
--

Retriggered a Jenkins job now that HADOOP-15644 is fixed.




[jira] [Commented] (YARN-6966) NodeManager metrics may return wrong negative values when NM restart

2018-07-31 Thread Haibo Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16564137#comment-16564137
 ] 

Haibo Chen commented on YARN-6966:
--

I created HADOOP-15644 to fix the branch-2 Docker build issue.




[jira] [Commented] (YARN-6966) NodeManager metrics may return wrong negative values when NM restart

2018-07-30 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16562373#comment-16562373
 ] 

genericqa commented on YARN-6966:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} docker {color} | {color:red} 20m  9s{color} | {color:red} Docker failed to build yetus/hadoop:f667ef1. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-6966 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12933626/YARN-6966-branch-2.002.patch |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/21440/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.






[jira] [Commented] (YARN-6966) NodeManager metrics may return wrong negative values when NM restart

2018-07-30 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16562095#comment-16562095
 ] 

genericqa commented on YARN-6966:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} docker {color} | {color:red}  0m  6s{color} | {color:red} Docker failed to build yetus/hadoop:f667ef1. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-6966 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12933611/YARN-6966-branch-2.002.patch |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/21436/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.






[jira] [Commented] (YARN-6966) NodeManager metrics may return wrong negative values when NM restart

2018-07-30 Thread Haibo Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16562091#comment-16562091
 ] 

Haibo Chen commented on YARN-6966:
--

Reattached the branch-2 patch to trigger a Jenkins job.




[jira] [Commented] (YARN-6966) NodeManager metrics may return wrong negative values when NM restart

2018-07-30 Thread Haibo Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16562088#comment-16562088
 ] 

Haibo Chen commented on YARN-6966:
--

+1 on the branch-3.0 patch. Checking it in shortly.

> NodeManager metrics may return wrong negative values when NM restart
> 
>
> Key: YARN-6966
> URL: https://issues.apache.org/jira/browse/YARN-6966
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yang Wang
>Assignee: Szilard Nemeth
>Priority: Major
> Fix For: 3.2.0, 3.1.2
>
> Attachments: YARN-6966-branch-2.001.patch, 
> YARN-6966-branch-2.002.patch, YARN-6966-branch-3.0.0.001.patch, 
> YARN-6966-branch-3.0.001.patch, YARN-6966.001.patch, YARN-6966.002.patch, 
> YARN-6966.003.patch, YARN-6966.004.patch, YARN-6966.005.patch, 
> YARN-6966.005.patch, YARN-6966.006.patch
>
>



[jira] [Commented] (YARN-6966) NodeManager metrics may return wrong negative values when NM restart

2018-07-28 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16560687#comment-16560687
 ] 

genericqa commented on YARN-6966:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 15m 19s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 5 new or modified test files. {color} |
|| || || || {color:brown} branch-3.0 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m  2s{color} | {color:green} branch-3.0 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 51s{color} | {color:green} branch-3.0 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 27s{color} | {color:green} branch-3.0 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 35s{color} | {color:green} branch-3.0 passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 34s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 50s{color} | {color:green} branch-3.0 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 24s{color} | {color:green} branch-3.0 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 46s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  0m 22s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: The patch generated 4 new + 191 unchanged - 3 fixed = 195 total (was 194) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 36s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 21s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 16m 18s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 22s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 82m 46s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:1776208 |
| JIRA Issue | YARN-6966 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12933466/YARN-6966-branch-3.0.001.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 4d9cf83791ae 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | branch-3.0 / eb40d4f |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
| checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/21420/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/21420/testReport/ |
| Max. process+thread count | 332 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/h

[jira] [Commented] (YARN-6966) NodeManager metrics may return wrong negative values when NM restart

2018-07-28 Thread Szilard Nemeth (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16560658#comment-16560658
 ] 

Szilard Nemeth commented on YARN-6966:
--

Hi [~haibochen]!
I see, I didn't know it worked this way.
I added new patches for branch-2 and branch-3.0.
Both compile fine.
I still don't know what these Docker errors are.
Thanks!

> NodeManager metrics may return wrong negative values when NM restart
> 
>
> Key: YARN-6966
> URL: https://issues.apache.org/jira/browse/YARN-6966
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yang Wang
>Assignee: Szilard Nemeth
>Priority: Major
> Fix For: 3.2.0, 3.1.2
>
> Attachments: YARN-6966-branch-2.001.patch, 
> YARN-6966-branch-2.002.patch, YARN-6966-branch-3.0.0.001.patch, 
> YARN-6966-branch-3.0.001.patch, YARN-6966.001.patch, YARN-6966.002.patch, 
> YARN-6966.003.patch, YARN-6966.004.patch, YARN-6966.005.patch, 
> YARN-6966.005.patch, YARN-6966.006.patch
>
>



[jira] [Commented] (YARN-6966) NodeManager metrics may return wrong negative values when NM restart

2018-07-28 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16560654#comment-16560654
 ] 

genericqa commented on YARN-6966:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} docker {color} | {color:red}  0m  5s{color} | {color:red} Docker failed to build yetus/hadoop:f667ef1. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-6966 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12933464/YARN-6966-branch-2.002.patch |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/21419/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> NodeManager metrics may return wrong negative values when NM restart
> 
>
> Key: YARN-6966
> URL: https://issues.apache.org/jira/browse/YARN-6966
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yang Wang
>Assignee: Szilard Nemeth
>Priority: Major
> Fix For: 3.2.0, 3.1.2
>
> Attachments: YARN-6966-branch-2.001.patch, 
> YARN-6966-branch-2.002.patch, YARN-6966-branch-3.0.0.001.patch, 
> YARN-6966.001.patch, YARN-6966.002.patch, YARN-6966.003.patch, 
> YARN-6966.004.patch, YARN-6966.005.patch, YARN-6966.005.patch, 
> YARN-6966.006.patch
>
>



[jira] [Commented] (YARN-6966) NodeManager metrics may return wrong negative values when NM restart

2018-07-27 Thread Haibo Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16560503#comment-16560503
 ] 

Haibo Chen commented on YARN-6966:
--

The branch-2 patch does not compile for me either.

> NodeManager metrics may return wrong negative values when NM restart
> 
>
> Key: YARN-6966
> URL: https://issues.apache.org/jira/browse/YARN-6966
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yang Wang
>Assignee: Szilard Nemeth
>Priority: Major
> Fix For: 3.2.0, 3.1.2
>
> Attachments: YARN-6966-branch-2.001.patch, 
> YARN-6966-branch-3.0.0.001.patch, YARN-6966.001.patch, YARN-6966.002.patch, 
> YARN-6966.003.patch, YARN-6966.004.patch, YARN-6966.005.patch, 
> YARN-6966.005.patch, YARN-6966.006.patch
>
>



[jira] [Commented] (YARN-6966) NodeManager metrics may return wrong negative values when NM restart

2018-07-27 Thread Haibo Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16560486#comment-16560486
 ] 

Haibo Chen commented on YARN-6966:
--

[~snemeth] We should not create a patch for branch-3.0.0, as it has already been 
released. Instead, we should create a patch for branch-3.0, which is the branch for 
the next 3.0.x maintenance release.

> NodeManager metrics may return wrong negative values when NM restart
> 
>
> Key: YARN-6966
> URL: https://issues.apache.org/jira/browse/YARN-6966
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yang Wang
>Assignee: Szilard Nemeth
>Priority: Major
> Fix For: 3.2.0, 3.1.2
>
> Attachments: YARN-6966-branch-2.001.patch, 
> YARN-6966-branch-3.0.0.001.patch, YARN-6966.001.patch, YARN-6966.002.patch, 
> YARN-6966.003.patch, YARN-6966.004.patch, YARN-6966.005.patch, 
> YARN-6966.005.patch, YARN-6966.006.patch
>
>



[jira] [Commented] (YARN-6966) NodeManager metrics may return wrong negative values when NM restart

2018-07-26 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16558940#comment-16558940
 ] 

genericqa commented on YARN-6966:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} docker {color} | {color:red} 12m 38s{color} | {color:red} Docker failed to build yetus/hadoop:20ca677. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-6966 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12933195/YARN-6966-branch-3.0.0.001.patch |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/21384/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |





> NodeManager metrics may return wrong negative values when NM restart
> 
>
> Key: YARN-6966
> URL: https://issues.apache.org/jira/browse/YARN-6966
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yang Wang
>Assignee: Szilard Nemeth
>Priority: Major
> Fix For: 3.2.0, 3.1.2
>
> Attachments: YARN-6966-branch-2.001.patch, 
> YARN-6966-branch-3.0.0.001.patch, YARN-6966.001.patch, YARN-6966.002.patch, 
> YARN-6966.003.patch, YARN-6966.004.patch, YARN-6966.005.patch, 
> YARN-6966.005.patch, YARN-6966.006.patch
>
>



[jira] [Commented] (YARN-6966) NodeManager metrics may return wrong negative values when NM restart

2018-07-26 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16558250#comment-16558250
 ] 

genericqa commented on YARN-6966:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} docker {color} | {color:red}  0m 10s{color} | {color:red} Docker failed to build yetus/hadoop:20ca677. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-6966 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12933195/YARN-6966-branch-3.0.0.001.patch |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/21373/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |





> NodeManager metrics may return wrong negative values when NM restart
> 
>
> Key: YARN-6966
> URL: https://issues.apache.org/jira/browse/YARN-6966
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yang Wang
>Assignee: Szilard Nemeth
>Priority: Major
> Fix For: 3.2.0, 3.1.2
>
> Attachments: YARN-6966-branch-2.001.patch, 
> YARN-6966-branch-3.0.0.001.patch, YARN-6966.001.patch, YARN-6966.002.patch, 
> YARN-6966.003.patch, YARN-6966.004.patch, YARN-6966.005.patch, 
> YARN-6966.005.patch, YARN-6966.006.patch
>
>



[jira] [Commented] (YARN-6966) NodeManager metrics may return wrong negative values when NM restart

2018-07-26 Thread Szilard Nemeth (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16558248#comment-16558248
 ] 

Szilard Nemeth commented on YARN-6966:
--

Hi [~haibochen]!
I uploaded a patch for branch-3.0.0.
I hope the patch is named correctly.
Is there anything I should do with this JIRA at this point? 
Thanks!

> NodeManager metrics may return wrong negative values when NM restart
> 
>
> Key: YARN-6966
> URL: https://issues.apache.org/jira/browse/YARN-6966
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yang Wang
>Assignee: Szilard Nemeth
>Priority: Major
> Fix For: 3.2.0, 3.1.2
>
> Attachments: YARN-6966-branch-2.001.patch, 
> YARN-6966-branch-3.0.0.001.patch, YARN-6966.001.patch, YARN-6966.002.patch, 
> YARN-6966.003.patch, YARN-6966.004.patch, YARN-6966.005.patch, 
> YARN-6966.005.patch, YARN-6966.006.patch
>
>



[jira] [Commented] (YARN-6966) NodeManager metrics may return wrong negative values when NM restart

2018-07-25 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16556179#comment-16556179
 ] 

genericqa commented on YARN-6966:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} docker {color} | {color:red} 12m  7s{color} | {color:red} Docker failed to build yetus/hadoop:f667ef1. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-6966 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12933031/YARN-6966-branch-2.001.patch |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/21368/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |





> NodeManager metrics may return wrong negative values when NM restart
> 
>
> Key: YARN-6966
> URL: https://issues.apache.org/jira/browse/YARN-6966
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yang Wang
>Assignee: Szilard Nemeth
>Priority: Major
> Fix For: 3.2.0, 3.1.2
>
> Attachments: YARN-6966-branch-2.001.patch, YARN-6966.001.patch, 
> YARN-6966.002.patch, YARN-6966.003.patch, YARN-6966.004.patch, 
> YARN-6966.005.patch, YARN-6966.005.patch, YARN-6966.006.patch
>
>



[jira] [Commented] (YARN-6966) NodeManager metrics may return wrong negative values when NM restart

2018-07-25 Thread Haibo Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16556155#comment-16556155
 ] 

Haibo Chen commented on YARN-6966:
--

Yes, please add a patch for branch-3.0.

> NodeManager metrics may return wrong negative values when NM restart
> 
>
> Key: YARN-6966
> URL: https://issues.apache.org/jira/browse/YARN-6966
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yang Wang
>Assignee: Szilard Nemeth
>Priority: Major
> Fix For: 3.2.0, 3.1.2
>
> Attachments: YARN-6966-branch-2.001.patch, YARN-6966.001.patch, 
> YARN-6966.002.patch, YARN-6966.003.patch, YARN-6966.004.patch, 
> YARN-6966.005.patch, YARN-6966.005.patch, YARN-6966.006.patch



[jira] [Commented] (YARN-6966) NodeManager metrics may return wrong negative values when NM restart

2018-07-25 Thread Szilard Nemeth (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16555480#comment-16555480
 ] 

Szilard Nemeth commented on YARN-6966:
--

Hi [~haibochen]!
Reopened and moved this issue to Patch Available as I think Yetus won't pick 
this up otherwise.
Added the patch for branch-2.
Should I add another patch to branch-3.0?
Thanks!

> NodeManager metrics may return wrong negative values when NM restart
> 
>
> Key: YARN-6966
> URL: https://issues.apache.org/jira/browse/YARN-6966
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yang Wang
>Assignee: Szilard Nemeth
>Priority: Major
> Fix For: 3.2.0, 3.1.2
>
> Attachments: YARN-6966-branch-2.001.patch, YARN-6966.001.patch, 
> YARN-6966.002.patch, YARN-6966.003.patch, YARN-6966.004.patch, 
> YARN-6966.005.patch, YARN-6966.005.patch, YARN-6966.006.patch



[jira] [Commented] (YARN-6966) NodeManager metrics may return wrong negative values when NM restart

2018-07-24 Thread Haibo Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16554735#comment-16554735
 ] 

Haibo Chen commented on YARN-6966:
--

I have backported it to branch-3.1; the cherry-pick into branch-3.0 and 
branch-2, however, failed. [~snemeth] Can you look into whether a backport to those branches is feasible?

I'd guess we won't be able to backport it to 2.8 or earlier release 
lines, given that ContainerScheduler has undergone a lot of changes. But let's 
see what happens.

> NodeManager metrics may return wrong negative values when NM restart
> 
>
> Key: YARN-6966
> URL: https://issues.apache.org/jira/browse/YARN-6966
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yang Wang
>Assignee: Szilard Nemeth
>Priority: Major
> Fix For: 3.2.0, 3.1.2
>
> Attachments: YARN-6966.001.patch, YARN-6966.002.patch, 
> YARN-6966.003.patch, YARN-6966.004.patch, YARN-6966.005.patch, 
> YARN-6966.005.patch, YARN-6966.006.patch



[jira] [Commented] (YARN-6966) NodeManager metrics may return wrong negative values when NM restart

2018-07-24 Thread Ruslan Dautkhanov (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16554573#comment-16554573
 ] 

Ruslan Dautkhanov commented on YARN-6966:
-

+1 on backporting it.

We have the same problem in Hadoop 2.6 (CDH 5.14).

> NodeManager metrics may return wrong negative values when NM restart
> 
>
> Key: YARN-6966
> URL: https://issues.apache.org/jira/browse/YARN-6966
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yang Wang
>Assignee: Szilard Nemeth
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: YARN-6966.001.patch, YARN-6966.002.patch, 
> YARN-6966.003.patch, YARN-6966.004.patch, YARN-6966.005.patch, 
> YARN-6966.005.patch, YARN-6966.006.patch



[jira] [Commented] (YARN-6966) NodeManager metrics may return wrong negative values when NM restart

2018-07-24 Thread Erik Krogen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16554491#comment-16554491
 ] 

Erik Krogen commented on YARN-6966:
---

Hi [~haibochen] / [~snemeth], thanks for taking on this work! If I understand 
correctly, this is a long-standing bug; should we backport it to older branches?

> NodeManager metrics may return wrong negative values when NM restart
> 
>
> Key: YARN-6966
> URL: https://issues.apache.org/jira/browse/YARN-6966
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yang Wang
>Assignee: Szilard Nemeth
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: YARN-6966.001.patch, YARN-6966.002.patch, 
> YARN-6966.003.patch, YARN-6966.004.patch, YARN-6966.005.patch, 
> YARN-6966.005.patch, YARN-6966.006.patch



[jira] [Commented] (YARN-6966) NodeManager metrics may return wrong negative values when NM restart

2018-07-23 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16553227#comment-16553227
 ] 

Hudson commented on YARN-6966:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14617 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/14617/])
YARN-6966. NodeManager metrics may return wrong negative values when NM 
(haibochen: rev 9d3c39e9dd88b8f32223c01328581bb68507d415)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMLeveldbStateStoreService.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/scheduler/TestContainerSchedulerRecovery.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/metrics/TestNodeManagerMetrics.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/recovery/TestNMLeveldbStateStoreService.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/scheduler/ContainerScheduler.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMStateStoreService.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/BaseContainerManagerTest.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/ContainerManagerImpl.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMMemoryStateStoreService.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/TestContainerManagerRecovery.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMNullStateStoreService.java
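
As a rough illustration of the failure mode described in the issue (not the committed code), the toy model below assumes that NodeManager metrics live only in memory. An NM restart therefore resets them while the recovered containers keep running. When those containers later finish, each release decrements a gauge that was never re-incremented. The names `RecoveryMetricsDemo`, `ContainerSpec`, `ToyMetrics` and `simulate` are hypothetical. Only the general idea of re-registering recovered containers with the metrics comes from the issue, which suggests doing this during container recovery.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Toy model (hypothetical names, NOT the real NodeManagerMetrics) of why
 * allocation gauges go negative when metrics are not recovered after an
 * NM restart.
 */
public class RecoveryMetricsDemo {

  /** Resources of one container, as a state store might persist them. */
  static final class ContainerSpec {
    final long memMB;
    final int vcores;

    ContainerSpec(long memMB, int vcores) {
      this.memMB = memMB;
      this.vcores = vcores;
    }
  }

  /** In-memory gauges, loosely modeled on the JMX fields in the report. */
  static final class ToyMetrics {
    int allocatedContainers;
    long allocatedMB;
    int allocatedVCores;

    void allocateContainer(ContainerSpec c) {
      allocatedContainers++;
      allocatedMB += c.memMB;
      allocatedVCores += c.vcores;
    }

    void releaseContainer(ContainerSpec c) {
      allocatedContainers--;
      allocatedMB -= c.memMB;
      allocatedVCores -= c.vcores;
    }
  }

  /**
   * Containers are allocated, the NM restarts (a fresh metrics object),
   * and then the application stops, releasing every container.
   */
  static ToyMetrics simulate(List<ContainerSpec> containers,
      boolean recoverMetrics) {
    // State before the restart; it is lost when the NM process restarts.
    ToyMetrics beforeRestart = new ToyMetrics();
    for (ContainerSpec c : containers) {
      beforeRestart.allocateContainer(c);
    }

    // Restart: the in-memory gauges start again from zero.
    ToyMetrics afterRestart = new ToyMetrics();
    if (recoverMetrics) {
      // Re-register each container recovered from the state store.
      for (ContainerSpec c : containers) {
        afterRestart.allocateContainer(c);
      }
    }

    // The application is stopped and every container is released.
    for (ContainerSpec c : containers) {
      afterRestart.releaseContainer(c);
    }
    return afterRestart;
  }

  public static void main(String[] args) {
    List<ContainerSpec> containers = Arrays.asList(
        new ContainerSpec(2048, 1), new ContainerSpec(4096, 10));
    for (boolean recover : new boolean[] {false, true}) {
      ToyMetrics m = simulate(containers, recover);
      System.out.println("recover=" + recover
          + " AllocatedContainers=" + m.allocatedContainers
          + " AllocatedVCores=" + m.allocatedVCores);
    }
  }
}
```

With two containers using 1 and 10 vcores, the unrecovered run ends at AllocatedContainers = -2 and AllocatedVCores = -11, the same values shown in the JMX output quoted in the issue. The recovered run ends at zero.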


> NodeManager metrics may return wrong negative values when NM restart
> 
>
> Key: YARN-6966
> URL: https://issues.apache.org/jira/browse/YARN-6966
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yang Wang
>Assignee: Szilard Nemeth
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: YARN-6966.001.patch, YARN-6966.002.patch, 
> YARN-6966.003.patch, YARN-6966.004.patch, YARN-6966.005.patch, 
> YARN-6966.005.patch, YARN-6966.006.patch



[jira] [Commented] (YARN-6966) NodeManager metrics may return wrong negative values when NM restart

2018-07-23 Thread Haibo Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16553183#comment-16553183
 ] 

Haibo Chen commented on YARN-6966:
--

The createRecoveredContainerState() line in TestContainerSchedulerRecovery is still longer 
than 80 characters; I'll fix it while committing the patch. +1.

I'll create another Jira for the TODO comment I came across.

> NodeManager metrics may return wrong negative values when NM restart
> 
>
> Key: YARN-6966
> URL: https://issues.apache.org/jira/browse/YARN-6966
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yang Wang
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-6966.001.patch, YARN-6966.002.patch, 
> YARN-6966.003.patch, YARN-6966.004.patch, YARN-6966.005.patch, 
> YARN-6966.005.patch, YARN-6966.006.patch



[jira] [Commented] (YARN-6966) NodeManager metrics may return wrong negative values when NM restart

2018-07-22 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16552033#comment-16552033
 ] 

genericqa commented on YARN-6966:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 27s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 6 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 50s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 1s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 55s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 55s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 35s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 18m 5s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 25s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 77m 25s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-6966 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12932595/YARN-6966.006.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux bf9c4ff4f9f0 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 9fa9e30 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/21333/testReport/ |
| Max. process+thread count | 301 (vs. ulimit of 1) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/21333/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> NodeManager metrics may return wrong negative values when NM restart

[jira] [Commented] (YARN-6966) NodeManager metrics may return wrong negative values when NM restart

2018-07-22 Thread Szilard Nemeth (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16552006#comment-16552006
 ] 

Szilard Nemeth commented on YARN-6966:
--

Hi [~haibochen]!
Thanks for the comments.
1. Fixed.
2. Fixed.
I fixed the whitespace issues as well.
About the TODO and filing a separate Jira: 
I am not completely sure what you meant. Does an additional check need to be 
performed when calling {{metrics.allocateContainer}}, or is the call simply in 
the wrong place? If the latter, where should it be? 
Given this uncertainty, I might not describe the Jira correctly, so could you 
please file it? 
Thanks!

> NodeManager metrics may return wrong negative values when NM restart
> 
>
> Key: YARN-6966
> URL: https://issues.apache.org/jira/browse/YARN-6966
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yang Wang
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-6966.001.patch, YARN-6966.002.patch, 
> YARN-6966.003.patch, YARN-6966.004.patch, YARN-6966.005.patch, 
> YARN-6966.005.patch, YARN-6966.006.patch



[jira] [Commented] (YARN-6966) NodeManager metrics may return wrong negative values when NM restart

2018-07-20 Thread Haibo Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16551028#comment-16551028
 ] 

Haibo Chen commented on YARN-6966:
--

+1 from me, pending a few other checkstyle issues I came across:

1) An unnecessary new empty line was added in TestNMLeveldbStateStoreService.

2) There is an unused import in TestContainerSchedulerRecovery, and the 
createRecoveredContainerState() line exceeds 80 characters.

P.S. According to the TODO comment in ContainerManagerImpl ("// TODO launchedContainer misplaced -> doesn't necessarily mean a container // launch. A finished Application will not launch containers."), 
we are not updating metrics in the right place. For example, if a container is 
queued in ContainerScheduler, the metrics would show it as launched even though 
it has not been launched yet. We can file a separate Jira to fix that.
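
Haibo's point about queued containers being reported as launched can be illustrated with a small sketch in which the launch counter is incremented only when the scheduler actually starts a container, not when the NodeManager merely accepts and queues it. `LaunchAccountingDemo`, `ToyScheduler`, `schedule` and `onContainerCompleted` are invented names for illustration; they do not correspond to the real ContainerScheduler API, and an actual fix may look quite different.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical sketch: count a container as "launched" only when the
 * scheduler actually starts it, not when it is merely queued.
 * None of these names are the real YARN ContainerScheduler API.
 */
public class LaunchAccountingDemo {

  static final class ToyScheduler {
    private final int maxRunning;
    private final Deque<String> queued = new ArrayDeque<>();
    private int running;
    int containersLaunched;  // metric: bumped only on a real launch

    ToyScheduler(int maxRunning) {
      this.maxRunning = maxRunning;
    }

    /** Accepts a container; it starts right away only if capacity allows. */
    void schedule(String containerId) {
      queued.addLast(containerId);
      startPending();
    }

    /** Called when a running container finishes, freeing one slot. */
    void onContainerCompleted() {
      if (running > 0) {
        running--;
      }
      startPending();
    }

    int queuedCount() {
      return queued.size();
    }

    private void startPending() {
      while (running < maxRunning && !queued.isEmpty()) {
        queued.pollFirst();
        running++;
        containersLaunched++;  // the metric update happens at launch time
      }
    }
  }

  public static void main(String[] args) {
    ToyScheduler s = new ToyScheduler(1);
    s.schedule("c1");
    s.schedule("c2");  // no free slot, so c2 stays queued
    System.out.println("launched=" + s.containersLaunched
        + " queued=" + s.queuedCount());
    s.onContainerCompleted();  // c1 finishes and c2 starts
    System.out.println("launched=" + s.containersLaunched
        + " queued=" + s.queuedCount());
  }
}
```

Running `main` prints `launched=1 queued=1` and then `launched=2 queued=0`: c2 is counted as launched only after c1 completes and frees the single slot.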

> NodeManager metrics may return wrong negative values when NM restart
> 
>
> Key: YARN-6966
> URL: https://issues.apache.org/jira/browse/YARN-6966
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yang Wang
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-6966.001.patch, YARN-6966.002.patch, 
> YARN-6966.003.patch, YARN-6966.004.patch, YARN-6966.005.patch, 
> YARN-6966.005.patch



[jira] [Commented] (YARN-6966) NodeManager metrics may return wrong negative values when NM restart

2018-07-20 Thread Szilard Nemeth (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16550315#comment-16550315
 ] 

Szilard Nemeth commented on YARN-6966:
--

Hi [~rkanter]!
Sorry, I just realized you had an LGTM comment, so my previous comment may be 
confusing. I guess it's up to [~haibochen] whether he has further comments.




[jira] [Commented] (YARN-6966) NodeManager metrics may return wrong negative values when NM restart

2018-07-19 Thread Szilard Nemeth (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16549889#comment-16549889
 ] 

Szilard Nemeth commented on YARN-6966:
--

I will fix the whitespace issues tomorrow; apart from that, the patch is ready 
for review.




[jira] [Commented] (YARN-6966) NodeManager metrics may return wrong negative values when NM restart

2018-07-19 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16549729#comment-16549729
 ] 

genericqa commented on YARN-6966:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 27s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 6 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 27m  3s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  1s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m  7s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 59s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 26s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m  9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 37s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  0s{color} | {color:red} The patch has 7 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 21s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 25s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 18m  8s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 25s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 78m  3s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-6966 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12932271/YARN-6966.005.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 023796d8093b 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 5836e0a |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
| whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/21302/artifact/out/whitespace-eol.txt |
|  Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/21302/testReport/ |
| Max. process+thread count | 303 (vs. ulimit of 1) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager |
| Console output | 
https://builds.apache.org/job/PreCommit-YAR

[jira] [Commented] (YARN-6966) NodeManager metrics may return wrong negative values when NM restart

2018-07-19 Thread Szilard Nemeth (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16549642#comment-16549642
 ] 

Szilard Nemeth commented on YARN-6966:
--

Re-uploaded the same patch to retrigger the build.




[jira] [Commented] (YARN-6966) NodeManager metrics may return wrong negative values when NM restart

2018-07-18 Thread Robert Kanter (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16548546#comment-16548546
 ] 

Robert Kanter commented on YARN-6966:
-

It looks like the job is broken even though it's up now.  All of the tests from 
today ran for < 1 min before failing :(

+1 LGM pending Jenkins
[~haibochen] any other comments?




[jira] [Commented] (YARN-6966) NodeManager metrics may return wrong negative values when NM restart

2018-07-18 Thread Szilard Nemeth (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16547480#comment-16547480
 ] 

Szilard Nemeth commented on YARN-6966:
--

Hi [~rkanter]!
builds.apache.org is up now.
Could you please retrigger the build?
Thanks!




[jira] [Commented] (YARN-6966) NodeManager metrics may return wrong negative values when NM restart

2018-07-17 Thread Robert Kanter (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16547233#comment-16547233
 ] 

Robert Kanter commented on YARN-6966:
-

Looks like http://builds.apache.org/ is down...




[jira] [Commented] (YARN-6966) NodeManager metrics may return wrong negative values when NM restart

2018-07-17 Thread Szilard Nemeth (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16546599#comment-16546599
 ] 

Szilard Nemeth commented on YARN-6966:
--

It looks like there were global build issues, as a patch on another one of my 
JIRAs also hit a Docker error.




[jira] [Commented] (YARN-6966) NodeManager metrics may return wrong negative values when NM restart

2018-07-17 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16546457#comment-16546457
 ] 

genericqa commented on YARN-6966:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} docker {color} | {color:red}  0m 10s{color} | {color:red} Docker failed to build yetus/hadoop:abb62dd. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-6966 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12931926/YARN-6966.005.patch |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/21271/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.






[jira] [Commented] (YARN-6966) NodeManager metrics may return wrong negative values when NM restart

2018-07-17 Thread Szilard Nemeth (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16546439#comment-16546439
 ] 

Szilard Nemeth commented on YARN-6966:
--

Hi [~haibochen] and [~rkanter]!
Thanks for taking the time to review. My latest patch fixes the issues you 
raised.
[~haibochen]: Yes, you were right: it wasn't necessary to store the container 
token, as it can be created from the startRequest.
I moved the container recovery logic to the recoverActiveContainer() method.

[~rkanter]: Your first comment no longer applies, as I'm not saving the 
container token separately (see my answer to Haibo above).
As for your second comment, having the negative values in the testcase is a 
very good idea.
I struggled to reproduce it in a test and ultimately gave up, as it's not that 
straightforward. I hope we can live with this, and that you think patch 005 is 
fine even without this kind of testcase.




[jira] [Commented] (YARN-6966) NodeManager metrics may return wrong negative values when NM restart

2018-07-05 Thread Robert Kanter (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16534069#comment-16534069
 ] 

Robert Kanter commented on YARN-6966:
-

Thanks [~snemeth] and [~fly_in_gis] for the patches.  Looks good overall.  
Here are a couple of comments:
# For {{keyStartToken}}, instead of {{CONTAINERS_KEY_PREFIX + 
containerId.toString() + CONTAINER_START_TOKEN_SUFFIX;}}, we can simply call 
{{getContainerKey(idStr, CONTAINER_START_TOKEN_SUFFIX)}}.
# I ran the {{testNodeManagerMetricsRecovery}} unit test without the actual 
fix, and it fails, but it only complains that the metric was {{0}} instead 
of {{1}}.  While that's good, it would be better if we could reproduce the 
original issue with the negative values, if possible.

{noformat}
[ERROR] testNodeManagerMetricsRecovery(org.apache.hadoop.yarn.server.nodemanager.containermanager.TestContainerManagerRecovery)  Time elapsed: 1.352 s  <<< FAILURE!
java.lang.AssertionError: Bad value for metric ContainersLaunched expected:<1> but was:<0>
	at org.junit.Assert.fail(Assert.java:88)
	at org.junit.Assert.failNotEquals(Assert.java:743)
	at org.junit.Assert.assertEquals(Assert.java:118)
	at org.junit.Assert.assertEquals(Assert.java:555)
	at org.apache.hadoop.test.MetricsAsserts.assertCounter(MetricsAsserts.java:169)
	at org.apache.hadoop.yarn.server.nodemanager.metrics.TestNodeManagerMetrics.checkMetrics(TestNodeManagerMetrics.java:121)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.TestContainerManagerRecovery.testNodeManagerMetricsRecovery(TestContainerManagerRecovery.java:454)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
	at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
	at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
	at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
	at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
	at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:379)
	at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:340)
	at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:125)
	at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:413)
{noformat}
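Robert's first suggestion can be sketched as follows, assuming `getContainerKey` simply concatenates the containers key prefix, the container id string, and a per-field suffix. The constant values and the container id below are hypothetical placeholders, not the real values from the NM's LevelDB state store; the point is only that the helper yields the same key as the hand-built string while keeping the key format in one place.

```java
// Hypothetical sketch: constant values and the container id are placeholders,
// not taken from the NodeManager's LevelDB state store implementation.
public class ContainerKeySketch {
    static final String CONTAINERS_KEY_PREFIX = "ContainerManager/containers/";
    static final String CONTAINER_START_TOKEN_SUFFIX = "/starttoken";

    /** Assumed helper shape: prefix + container id + per-field suffix. */
    static String getContainerKey(String containerId, String suffix) {
        return CONTAINERS_KEY_PREFIX + containerId + suffix;
    }

    public static void main(String[] args) {
        String idStr = "container_1530000000000_0001_01_000002"; // hypothetical id
        String handBuilt = CONTAINERS_KEY_PREFIX + idStr + CONTAINER_START_TOKEN_SUFFIX;
        String viaHelper = getContainerKey(idStr, CONTAINER_START_TOKEN_SUFFIX);
        System.out.println(handBuilt.equals(viaHelper)); // prints "true"
    }
}
```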


[jira] [Commented] (YARN-6966) NodeManager metrics may return wrong negative values when NM restart

2018-07-05 Thread Haibo Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16534066#comment-16534066
 ] 

Haibo Chen commented on YARN-6966:
--

Thanks [~fly_in_gis] for the original report and patch, and [~snemeth] for 
updating! The patch saves the ContainerTokenIdentifier again in the LevelDB state 
store, but it is already stored as part of RecoveredContainerState.startRequest, 
so I think that is unnecessary. On the container recovery path, the metrics 
can be updated inside containerScheduler.recoverActiveContainer(), where the rest 
of the recovery logic is located.
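Why the counters go negative, and why re-incrementing them on the recovery path fixes it, can be shown with a toy model. This is a minimal sketch, assuming hypothetical classes `SimpleNmMetrics` and `SketchScheduler` that are not the real NodeManager API; the method name `recoverActiveContainer` is borrowed from the comment above, but its signature here is invented for illustration.

```java
// Hypothetical, heavily simplified stand-in for NodeManagerMetrics:
// only the counters needed to show the restart problem.
class SimpleNmMetrics {
    int allocatedContainers;
    long allocatedMB;
    int allocatedVCores;

    void allocateContainer(long mb, int vcores) {
        allocatedContainers++;
        allocatedMB += mb;
        allocatedVCores += vcores;
    }

    void releaseContainer(long mb, int vcores) {
        allocatedContainers--;
        allocatedMB -= mb;
        allocatedVCores -= vcores;
    }
}

// Hypothetical scheduler; the signature of recoverActiveContainer is invented.
class SketchScheduler {
    // A fresh NM process starts with zeroed in-memory metrics.
    final SimpleNmMetrics metrics = new SimpleNmMetrics();
    final boolean recoverMetrics;

    SketchScheduler(boolean recoverMetrics) {
        this.recoverMetrics = recoverMetrics;
    }

    void recoverActiveContainer(long mb, int vcores) {
        // ... the rest of the recovery bookkeeping would live here ...
        if (recoverMetrics) {
            // Re-account the still-running container in the metrics.
            metrics.allocateContainer(mb, vcores);
        }
    }

    void containerFinished(long mb, int vcores) {
        metrics.releaseContainer(mb, vcores);
    }
}

class RecoverySketch {
    // Simulates an NM restart with one running container (1024 MB, 1 vcore)
    // that finishes after the restart.
    static SimpleNmMetrics simulate(boolean recoverMetrics) {
        SketchScheduler scheduler = new SketchScheduler(recoverMetrics);
        scheduler.recoverActiveContainer(1024, 1);
        scheduler.containerFinished(1024, 1);
        return scheduler.metrics;
    }

    public static void main(String[] args) {
        System.out.println("without recovery: " + simulate(false).allocatedContainers); // -1
        System.out.println("with recovery:    " + simulate(true).allocatedContainers);  // 0
    }
}
```

Without the recovery-time increment, the release on completion is applied to counters that started from zero, which matches the negative AllocatedContainers and AllocatedVCores values in the report.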

> NodeManager metrics may return wrong negative values when NM restart
> 
>
> Key: YARN-6966
> URL: https://issues.apache.org/jira/browse/YARN-6966
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yang Wang
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-6966.001.patch, YARN-6966.002.patch, 
> YARN-6966.003.patch, YARN-6966.004.patch
>
>
> Similar to YARN-6212. However, I think it is not a duplicate of YARN-3933.
> The primary cause of the negative values is that the metrics are not recovered 
> properly when the NM restarts.
> AllocatedContainers, ContainersLaunched, AllocatedGB, AvailableGB, AllocatedVCores 
> and AvailableVCores in the metrics also need to be recovered when the NM restarts.
> This should be done in ContainerManagerImpl#recoverContainer.
> The scenario can be reproduced with the following steps:
> # Make sure YarnConfiguration.NM_RECOVERY_ENABLED=true and YarnConfiguration.NM_RECOVERY_SUPERVISED=true are set in the NM
> # Submit an application and keep it running
> # Restart the NM
> # Stop the application
> # The metrics now report negative values:
> {code}
> /jmx?qry=Hadoop:service=NodeManager,name=NodeManagerMetrics
> {code}
> {code}
> {
> name: "Hadoop:service=NodeManager,name=NodeManagerMetrics",
> modelerType: "NodeManagerMetrics",
> tag.Context: "yarn",
> tag.Hostname: "hadoop.com",
> ContainersLaunched: 0,
> ContainersCompleted: 0,
> ContainersFailed: 2,
> ContainersKilled: 0,
> ContainersIniting: 0,
> ContainersRunning: 0,
> AllocatedGB: 0,
> AllocatedContainers: -2,
> AvailableGB: 160,
> AllocatedVCores: -11,
> AvailableVCores: 3611,
> ContainerLaunchDurationNumOps: 2,
> ContainerLaunchDurationAvgTime: 6,
> BadLocalDirs: 0,
> BadLogDirs: 0,
> GoodLocalDirsDiskUtilizationPerc: 2,
> GoodLogDirsDiskUtilizationPerc: 2
> }
> {code}
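To spot the symptom in a metrics dump like the one above, a small checker can flag gauges whose value is negative. This is a minimal sketch, assuming one `Name: value,` pair per line as in the dump above (quotes around names, as in real JSON output, are stripped); the class name `NegativeMetricCheck` is hypothetical, and only integer values are checked.

```java
import java.util.ArrayList;
import java.util.List;

class NegativeMetricCheck {
    // Returns the names of metrics whose integer value is negative.
    // Lines without a colon, and non-integer values (strings, doubles), are ignored.
    static List<String> negativeMetrics(String dump) {
        List<String> result = new ArrayList<>();
        for (String line : dump.split("\n")) {
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue;
            }
            String name = line.substring(0, colon).trim().replace("\"", "");
            String value = line.substring(colon + 1).trim();
            if (value.endsWith(",")) {
                value = value.substring(0, value.length() - 1);
            }
            try {
                if (Long.parseLong(value.trim()) < 0) {
                    result.add(name);
                }
            } catch (NumberFormatException e) {
                // Non-integer value such as a quoted tag: not a counter, skip it.
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String sample = String.join("\n",
            "ContainersFailed: 2,",
            "AllocatedGB: 0,",
            "AllocatedContainers: -2,",
            "AvailableGB: 160,",
            "AllocatedVCores: -11,");
        System.out.println(negativeMetrics(sample)); // [AllocatedContainers, AllocatedVCores]
    }
}
```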



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6966) NodeManager metrics may return wrong negative values when NM restart

2018-06-21 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16519599#comment-16519599
 ] 

genericqa commented on YARN-6966:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 24s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 5 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m  1s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 56s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 35s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 54s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 48s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 22s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m  9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 10s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 21s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 19m 56s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 23s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 73m 24s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-6966 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12928643/YARN-6966.004.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 096162f9277b 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 11:50:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 9f15483 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
|  Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/21074/testReport/ |
| Max. process+thread count | 407 (vs. ulimit of 1) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/21074/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> NodeManager metrics may return wrong negative v

[jira] [Commented] (YARN-6966) NodeManager metrics may return wrong negative values when NM restart

2018-06-21 Thread Szilard Nemeth (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16519536#comment-16519536
 ] 

Szilard Nemeth commented on YARN-6966:
--

Hi [~fly_in_gis]!

See my updated patch.

I wanted to get the failing testcase 
(TestContainerManagerRecovery#testNodeManagerMetricsRecovery) passing first.

A quick workaround would have been to send a container update event with some 
resource and then check, after NM recovery, whether the metrics match.

Then I realized it should work out of the box: when a container is created, 
its resource requests need to be saved to the NM state store.

This piece was missing from the current implementation, so I extended 
ContainerManagerImpl.startContainerInternal() to do this and adjusted the tests 
accordingly.

[~wilfreds]: Could you please have a look and check whether this makes sense?
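The idea of saving what recovery needs when a container starts, then rebuilding the metrics from it after a restart, can be sketched as follows. This is a minimal sketch, assuming hypothetical types: `Res` stands in for YARN's `Resource`, `InMemoryStateStore` is an in-memory stand-in for the LevelDB-backed NM state store (the real one persists to disk, which is what lets it survive the restart), and `PersistAndRecover` is invented for illustration. None of these are real NodeManager classes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical value type standing in for YARN's Resource.
final class Res {
    final long memoryMB;
    final int vcores;

    Res(long memoryMB, int vcores) {
        this.memoryMB = memoryMB;
        this.vcores = vcores;
    }
}

// Hypothetical in-memory stand-in for the NM state store.
class InMemoryStateStore {
    private final Map<String, Res> started = new LinkedHashMap<>();

    void storeContainer(String containerId, Res resource) {
        started.put(containerId, resource);
    }

    Map<String, Res> loadContainers() {
        return new LinkedHashMap<>(started);
    }
}

class PersistAndRecover {
    int allocatedContainers;
    long allocatedMB;
    int allocatedVCores;

    // Start path: persist the container's resource, then account for it.
    void startContainer(InMemoryStateStore store, String id, Res r) {
        store.storeContainer(id, r);
        add(r);
    }

    // Recovery path: rebuild the counters from what was persisted.
    void recover(InMemoryStateStore store) {
        for (Res r : store.loadContainers().values()) {
            add(r);
        }
    }

    private void add(Res r) {
        allocatedContainers++;
        allocatedMB += r.memoryMB;
        allocatedVCores += r.vcores;
    }

    public static void main(String[] args) {
        InMemoryStateStore store = new InMemoryStateStore();
        PersistAndRecover beforeRestart = new PersistAndRecover();
        beforeRestart.startContainer(store, "container_1", new Res(2048, 2));
        beforeRestart.startContainer(store, "container_2", new Res(1024, 1));

        // Simulated restart: the in-memory counters are lost, the store is not.
        PersistAndRecover afterRestart = new PersistAndRecover();
        afterRestart.recover(store);
        // Prints "2 containers, 3072 MB, 3 vcores"
        System.out.println(afterRestart.allocatedContainers + " containers, "
            + afterRestart.allocatedMB + " MB, " + afterRestart.allocatedVCores + " vcores");
    }
}
```

Whether the resource must be stored separately depends on what the real state store already keeps for each container; the sketch only shows the persist-then-rebuild pattern.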

> NodeManager metrics may return wrong negative values when NM restart
> 
>
> Key: YARN-6966
> URL: https://issues.apache.org/jira/browse/YARN-6966
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yang Wang
>Assignee: Yang Wang
>Priority: Major
> Attachments: YARN-6966.001.patch, YARN-6966.002.patch, 
> YARN-6966.003.patch






[jira] [Commented] (YARN-6966) NodeManager metrics may return wrong negative values when NM restart

2018-06-21 Thread Szilard Nemeth (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16519408#comment-16519408
 ] 

Szilard Nemeth commented on YARN-6966:
--

Hi [~fly_in_gis]!

Do you mind if I take this over? I want this to be merged soon.

Moreover, the testcase fails; I have a fix for that, and I see some cases 
where I can extend your patch.

Thanks!

> NodeManager metrics may return wrong negative values when NM restart
> 
>
> Key: YARN-6966
> URL: https://issues.apache.org/jira/browse/YARN-6966
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yang Wang
>Assignee: Yang Wang
>Priority: Major
> Attachments: YARN-6966.001.patch, YARN-6966.002.patch, 
> YARN-6966.003.patch






[jira] [Commented] (YARN-6966) NodeManager metrics may return wrong negative values when NM restart

2018-06-08 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16506591#comment-16506591
 ] 

genericqa commented on YARN-6966:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  5s{color} | {color:red} YARN-6966 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-6966 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12881884/YARN-6966.003.patch |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/20989/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> NodeManager metrics may return wrong negative values when NM restart
> 
>
> Key: YARN-6966
> URL: https://issues.apache.org/jira/browse/YARN-6966
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yang Wang
>Assignee: Yang Wang
>Priority: Major
> Attachments: YARN-6966.001.patch, YARN-6966.002.patch, 
> YARN-6966.003.patch






[jira] [Commented] (YARN-6966) NodeManager metrics may return wrong negative values when NM restart

2018-06-08 Thread Ruslan Dautkhanov (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16506584#comment-16506584
 ] 

Ruslan Dautkhanov commented on YARN-6966:
-

Is it possible to backport this fix to Hadoop 2.6?

Thanks.

 

> NodeManager metrics may return wrong negative values when NM restart
> 
>
> Key: YARN-6966
> URL: https://issues.apache.org/jira/browse/YARN-6966
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yang Wang
>Assignee: Yang Wang
>Priority: Major
> Attachments: YARN-6966.001.patch, YARN-6966.002.patch, 
> YARN-6966.003.patch






[jira] [Commented] (YARN-6966) NodeManager metrics may return wrong negative values when NM restart

2017-08-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16126937#comment-16126937
 ] 

Hadoop QA commented on YARN-6966:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 21s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 3 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 14m 18s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 29s{color} | {color:green} trunk passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 46s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager in trunk has 1 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 19s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 39s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  0m 17s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: The patch generated 2 new + 90 unchanged - 2 fixed = 92 total (was 92) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 16s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 13m 12s{color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 17s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 34m 57s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.server.nodemanager.containermanager.TestContainerManagerRecovery |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:14b5c93 |
| JIRA Issue | YARN-6966 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12881884/YARN-6966.003.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  unit  findbugs  checkstyle  |
| uname | Linux bbcec4e095ba 3.13.0-119-generic #166-Ubuntu SMP Wed May 3 12:18:55 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 645a8f2 |
| Default Java | 1.8.0_144 |
| findbugs | v3.1.0-RC1 |
| findbugs | https://builds.apache.org/job/PreCommit-YARN-Build/16901/artifact/patchprocess/branch-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager-warnings.html |
| checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/16901/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt |
| unit | https://builds.apache.org/job/PreCommit-YARN-Build/16901/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt |
|  Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/16901/testRepo

[jira] [Commented] (YARN-6966) NodeManager metrics may return wrong negative values when NM restart

2017-08-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16126862#comment-16126862
 ] 

Hadoop QA commented on YARN-6966:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 21s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 3 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 13m 48s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 41s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 28s{color} | {color:green} trunk passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 43s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager in trunk has 1 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 18s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 37s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  0m 17s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: The patch generated 8 new + 90 unchanged - 2 fixed = 98 total (was 92) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 15s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 13m 22s{color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 15s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 34m 19s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.server.nodemanager.containermanager.TestContainerManagerRecovery |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:14b5c93 |
| JIRA Issue | YARN-6966 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12881870/YARN-6966.002.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  unit  findbugs  checkstyle  |
| uname | Linux c50c9679697f 3.13.0-119-generic #166-Ubuntu SMP Wed May 3 12:18:55 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 645a8f2 |
| Default Java | 1.8.0_144 |
| findbugs | v3.1.0-RC1 |
| findbugs | https://builds.apache.org/job/PreCommit-YARN-Build/16899/artifact/patchprocess/branch-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager-warnings.html |
| checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/16899/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt |
| unit | https://builds.apache.org/job/PreCommit-YARN-Build/16899/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt |
|  Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/16899/testRepo

[jira] [Commented] (YARN-6966) NodeManager metrics may return wrong negative values when NM restart

2017-08-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16118202#comment-16118202
 ] 

Hadoop QA commented on YARN-6966:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
25s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 
19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
29s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
28s{color} | {color:green} trunk passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
44s{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 in trunk has 5 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
19s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 27s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  0m 16s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: The patch generated 8 new + 86 unchanged - 2 fixed = 94 total (was 88) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 16s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 14m  0s{color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 15s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 36m 19s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.server.nodemanager.containermanager.TestContainerManagerRecovery |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:14b5c93 |
| JIRA Issue | YARN-6966 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12880805/YARN-6966.001.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  unit  findbugs  checkstyle  |
| uname | Linux e0ccb3f9385f 3.13.0-116-generic #163-Ubuntu SMP Fri Mar 31 14:13:22 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 9891295 |
| Default Java | 1.8.0_131 |
| findbugs | v3.1.0-RC1 |
| findbugs | https://builds.apache.org/job/PreCommit-YARN-Build/16769/artifact/patchprocess/branch-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager-warnings.html |
| checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/16769/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt |
| unit | https://builds.apache.org/job/PreCommit-YARN-Build/16769/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/16769/testRep