[jira] [Updated] (YARN-10381) Send out application attempt state along with other elements in the application attempt object returned from appattempts REST API call
[ https://issues.apache.org/jira/browse/YARN-10381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Ahuja updated YARN-10381: --- Component/s: yarn-ui-v2 > Send out application attempt state along with other elements in the > application attempt object returned from appattempts REST API call > -- > > Key: YARN-10381 > URL: https://issues.apache.org/jira/browse/YARN-10381 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn-ui-v2 >Reporter: Siddharth Ahuja >Assignee: Siddharth Ahuja >Priority: Minor > > The [ApplicationAttempts RM REST > API|https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Application_Attempts_API] > : > {code} > http://rm-http-address:port/ws/v1/cluster/apps/{appid}/appattempts > {code} > returns a collection of Application Attempt objects, where each application > attempt object contains elements like id, nodeId, startTime etc. > This JIRA has been raised to send out the Application Attempt state as well, as > part of the application attempt information from this REST API call. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10381) Send out application attempt state along with other elements in the application attempt object returned from appattempts REST API call
[ https://issues.apache.org/jira/browse/YARN-10381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Ahuja updated YARN-10381: --- Affects Version/s: 3.3.0 > Send out application attempt state along with other elements in the > application attempt object returned from appattempts REST API call > -- > > Key: YARN-10381 > URL: https://issues.apache.org/jira/browse/YARN-10381 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn-ui-v2 >Affects Versions: 3.3.0 >Reporter: Siddharth Ahuja >Assignee: Siddharth Ahuja >Priority: Minor > > The [ApplicationAttempts RM REST > API|https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Application_Attempts_API] > : > {code} > http://rm-http-address:port/ws/v1/cluster/apps/{appid}/appattempts > {code} > returns a collection of Application Attempt objects, where each application > attempt object contains elements like id, nodeId, startTime etc. > This JIRA has been raised to send out the Application Attempt state as well, as > part of the application attempt information from this REST API call. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-10381) Send out application attempt state along with other elements in the application attempt object returned from appattempts REST API call
Siddharth Ahuja created YARN-10381: -- Summary: Send out application attempt state along with other elements in the application attempt object returned from appattempts REST API call Key: YARN-10381 URL: https://issues.apache.org/jira/browse/YARN-10381 Project: Hadoop YARN Issue Type: Improvement Reporter: Siddharth Ahuja The [ApplicationAttempts RM REST API|https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Application_Attempts_API] : {code} http://rm-http-address:port/ws/v1/cluster/apps/{appid}/appattempts {code} returns a collection of Application Attempt objects, where each application attempt object contains elements like id, nodeId, startTime etc. This JIRA has been raised to send out the Application Attempt state as well, as part of the application attempt information from this REST API call. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
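For illustration, a rough sketch of what this could look like on the web-service side, assuming the attempt JSON/XML is produced by a JAXB DAO class along the lines of the existing AppAttemptInfo. The class and field names below are simplified stand-ins, not the actual patch:

{code:java}
import javax.xml.bind.annotation.XmlAccessType;
import javax.xml.bind.annotation.XmlAccessorType;
import javax.xml.bind.annotation.XmlRootElement;

// Simplified stand-in for the appattempts DAO; the real class has more
// fields and is built from the RM's view of the attempt.
@XmlRootElement(name = "appAttempt")
@XmlAccessorType(XmlAccessType.FIELD)
public class AppAttemptInfoSketch {
  protected int id;
  protected String nodeId;
  protected long startTime;
  // The new element this JIRA proposes to expose along with the others.
  protected String appAttemptState;

  public AppAttemptInfoSketch() {
    // JAXB needs a no-arg constructor.
  }

  public AppAttemptInfoSketch(int id, String nodeId, long startTime,
      String appAttemptState) {
    this.id = id;
    this.nodeId = nodeId;
    this.startTime = startTime;
    this.appAttemptState = appAttemptState;
  }
}
{code}

With such a field in place, each attempt object in the appattempts response would carry an element such as appAttemptState alongside id, nodeId and startTime.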
[jira] [Assigned] (YARN-10381) Send out application attempt state along with other elements in the application attempt object returned from appattempts REST API call
[ https://issues.apache.org/jira/browse/YARN-10381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Ahuja reassigned YARN-10381: -- Assignee: Siddharth Ahuja > Send out application attempt state along with other elements in the > application attempt object returned from appattempts REST API call > -- > > Key: YARN-10381 > URL: https://issues.apache.org/jira/browse/YARN-10381 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Siddharth Ahuja >Assignee: Siddharth Ahuja >Priority: Minor > > The [ApplicationAttempts RM REST > API|https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Application_Attempts_API] > : > {code} > http://rm-http-address:port/ws/v1/cluster/apps/{appid}/appattempts > {code} > returns a collection of Application Attempt objects, where each application > attempt object contains elements like id, nodeId, startTime etc. > This JIRA has been raised to send out the Application Attempt state as well, as > part of the application attempt information from this REST API call. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10229) [Federation] Client should be able to submit application to RM directly using normal client conf
[ https://issues.apache.org/jira/browse/YARN-10229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17168334#comment-17168334 ] Hadoop QA commented on YARN-10229: -- | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 41s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} dupname {color} | {color:green} 0m 0s{color} | {color:green} No case conflicting files found. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 44s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 10s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 31s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 43s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 59s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 32s{color} | {color:green} trunk passed {color} | | {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 1m 18s{color} | {color:blue} Used deprecated FindBugs config; considering switching to SpotBugs. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 16s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 37s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 0s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 0s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 22s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 34s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 49s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 21s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 22m 14s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 31s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 79m 50s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | ClientAPI=1.40 ServerAPI=1.40 base: https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/27/artifact/out/Dockerfile | | JIRA Issue | YARN-10229 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/13006679/YARN-10229.008.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 2466c29a4332 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 16:55:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | personality/hadoop.sh | | git revision | trunk / 05b3337a460 | | Default Java | Private Build-1.8.0_252-8u252-b09-1~18.04-b09 | | Test Results | https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/27/testReport/ | | Max. process+thread count | 424 (vs. ulimit of 5500) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-pr
[jira] [Commented] (YARN-10229) [Federation] Client should be able to submit application to RM directly using normal client conf
[ https://issues.apache.org/jira/browse/YARN-10229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17168317#comment-17168317 ] Íñigo Goiri commented on YARN-10229: +1 on [^YARN-10229.008.patch]. > [Federation] Client should be able to submit application to RM directly using > normal client conf > > > Key: YARN-10229 > URL: https://issues.apache.org/jira/browse/YARN-10229 > Project: Hadoop YARN > Issue Type: Bug > Components: amrmproxy, federation >Affects Versions: 3.1.1 >Reporter: JohnsonGuo >Assignee: Bilwa S T >Priority: Major > Attachments: YARN-10229.001.patch, YARN-10229.002.patch, > YARN-10229.003.patch, YARN-10229.004.patch, YARN-10229.005.patch, > YARN-10229.006.patch, YARN-10229.007.patch, YARN-10229.008.patch > > > Scenario: When enabling the YARN federation feature with multiple YARN clusters, > one can submit jobs to the yarn-router by *modifying* the client > configuration with the yarn-router address. > But if one still wants to submit jobs via the original client (from before > federation was enabled) to the RM directly, they will encounter an AMRMToken exception. > That means once federation is enabled, if someone wants to submit a job, they have > to modify the client conf. > > One possible solution for this scenario is: > In the NodeManager, when the client ApplicationMaster request comes: > * get the client job.xml from HDFS "". > * parse the "yarn.resourcemanager.scheduler.address" parameter in job.xml > * if the value of the parameter is "localhost:8049" (the AMRMProxy address), then do > the AMRMToken validation process > * if the value of the parameter is "rm:port" (the RM address), then skip the > AMRMToken validation process > > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
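For illustration, a minimal sketch of the conf-check idea from the proposal quoted above; the class, method and constant names are hypothetical, and where exactly this would hook into the NodeManager is left open:

{code:java}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Illustrative sketch only; names and wiring are not the actual patch.
public class SchedulerAddressCheck {

  // The AMRMProxy endpoint a federation-enabled client would point at.
  static final String AMRM_PROXY_ADDRESS = "localhost:8049";

  // Decide whether AMRMToken validation should run, based on the
  // scheduler address found in the client's job.xml on HDFS.
  static boolean shouldValidateAmrmToken(FileSystem fs, Path jobXml)
      throws IOException {
    Configuration jobConf = new Configuration(false);
    jobConf.addResource(fs.open(jobXml));
    String schedulerAddr =
        jobConf.get("yarn.resourcemanager.scheduler.address", "");
    // Submitted through the proxy: validate the token. Pointing straight
    // at an RM ("rm:port"): skip the AMRMToken validation.
    return AMRM_PROXY_ADDRESS.equals(schedulerAddr);
  }
}
{code}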
[jira] [Commented] (YARN-1529) Add Localization overhead metrics to NM
[ https://issues.apache.org/jira/browse/YARN-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17168213#comment-17168213 ] Jim Brennan commented on YARN-1529: --- Thanks [~epayne]! > Add Localization overhead metrics to NM > --- > > Key: YARN-1529 > URL: https://issues.apache.org/jira/browse/YARN-1529 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Reporter: Gera Shegalov >Assignee: Jim Brennan >Priority: Major > Fix For: 3.2.2, 2.10.1, 3.4.0, 3.3.1, 3.1.5 > > Attachments: YARN-1529-branch-2.10.001.patch, YARN-1529.005.patch, > YARN-1529.006.patch, YARN-1529.v01.patch, YARN-1529.v02.patch, > YARN-1529.v03.patch, YARN-1529.v04.patch > > > Users are often unaware of localization cost that their jobs incur. To > measure effectiveness of localization caches it is necessary to expose the > overhead in the form of metrics. > We propose addition of the following metrics to NodeManagerMetrics. > When a container is about to launch, its set of LocalResources has to be > fetched from a central location, typically on HDFS, that results in a number > of download requests for the files missing in caches. > LocalizedFilesMissed: total files (requests) downloaded from DFS. Cache > misses. > LocalizedFilesCached: total localization requests that were served from local > caches. Cache hits. > LocalizedBytesMissed: total bytes downloaded from DFS due to cache misses. > LocalizedBytesCached: total bytes satisfied from local caches. > Localized(Files|Bytes)CachedRatio: percentage of localized (files|bytes) that > were served out of cache: ratio = 100 * caches / (caches + misses) > LocalizationDownloadNanos: total elapsed time in nanoseconds for a container > to go from ResourceRequestTransition to LocalizedTransition -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
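As background on the proposal quoted above: the counters map naturally onto Hadoop's metrics2 library. A hedged sketch follows, using an illustrative standalone class (the actual change adds the counters to NodeManagerMetrics):

{code:java}
import org.apache.hadoop.metrics2.annotation.Metric;
import org.apache.hadoop.metrics2.annotation.Metrics;
import org.apache.hadoop.metrics2.lib.MutableCounterLong;

// Illustrative sketch; field names follow the proposal, the class is a
// stand-in for NodeManagerMetrics.
@Metrics(about = "Localization overhead metrics", context = "yarn")
public class LocalizationMetricsSketch {
  @Metric("Files downloaded from DFS (cache misses)")
  MutableCounterLong localizedFilesMissed;
  @Metric("Localization requests served from local caches (cache hits)")
  MutableCounterLong localizedFilesCached;
  @Metric("Bytes downloaded from DFS due to cache misses")
  MutableCounterLong localizedBytesMissed;
  @Metric("Bytes served from local caches")
  MutableCounterLong localizedBytesCached;

  // Mirrors the proposal: ratio = 100 * caches / (caches + misses).
  public long localizedFilesCachedRatio() {
    long hits = localizedFilesCached.value();
    long misses = localizedFilesMissed.value();
    long total = hits + misses;
    return total == 0 ? 0 : 100 * hits / total;
  }
}
{code}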
[jira] [Commented] (YARN-1529) Add Localization overhead metrics to NM
[ https://issues.apache.org/jira/browse/YARN-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17168212#comment-17168212 ] Hadoop QA commented on YARN-1529: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 11m 23s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} dupname {color} | {color:green} 0m 0s{color} | {color:green} No case conflicting files found. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 3 new or modified test files. {color} | || || || || {color:brown} branch-2.10 Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 2m 20s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 12m 45s{color} | {color:green} branch-2.10 passed {color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red} 3m 57s{color} | {color:red} hadoop-yarn in branch-2.10 failed with JDK Oracle Corporation-1.7.0_95-b00. {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 7s{color} | {color:green} branch-2.10 passed with JDK Private Build-1.8.0_252-8u252-b09-1~16.04-b09 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 8s{color} | {color:green} branch-2.10 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 33s{color} | {color:green} branch-2.10 passed {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 29s{color} | {color:red} hadoop-yarn-api in branch-2.10 failed with JDK Oracle Corporation-1.7.0_95-b00. {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 29s{color} | {color:red} hadoop-yarn-server-nodemanager in branch-2.10 failed with JDK Oracle Corporation-1.7.0_95-b00. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 7s{color} | {color:green} branch-2.10 passed with JDK Private Build-1.8.0_252-8u252-b09-1~16.04-b09 {color} | | {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 1m 16s{color} | {color:blue} Used deprecated FindBugs config; considering switching to SpotBugs. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 55s{color} | {color:green} branch-2.10 passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 20s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 4s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 46s{color} | {color:green} the patch passed with JDK Oracle Corporation-1.7.0_95-b00 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 8m 46s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 20s{color} | {color:green} the patch passed with JDK Private Build-1.8.0_252-8u252-b09-1~16.04-b09 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 20s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 1m 14s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch generated 2 new + 535 unchanged - 0 fixed = 537 total (was 535) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 35s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 37s{color} | {color:red} hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager-jdkOracleCorporation-1.7.0_95-b00 with JDK Oracle Corporation-1.7.0_95-b00 generated 2 new + 0 unchanged - 0 fixed = 2 total (was 0) {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 2s{color} | {color:green} the patch passed with JDK Private Build-1.8.0_252-8u252-b09-1~16.04-b09 {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 25s{color} | {color:green} the patch passed {
[jira] [Comment Edited] (YARN-4575) ApplicationResourceUsageReport should return ALL reserved resource
[ https://issues.apache.org/jira/browse/YARN-4575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17168205#comment-17168205 ] Eric Payne edited comment on YARN-4575 at 7/30/20, 8:45 PM: I'm not sure why 2 pre-commit builds are being triggered. Nevertheless, the unit tests are not failing for me and I think the TestFairSchedulerPreemption failure is YARN-9333. None of the others fail for me locally. was (Author: eepayne): I'm not sure why 2 pre-commit builds are being triggered. Nevertheless, the unit tests are not failing for me and I think the TestFairSchedulerPreemption failure is YARN-9333. > ApplicationResourceUsageReport should return ALL reserved resource > --- > > Key: YARN-4575 > URL: https://issues.apache.org/jira/browse/YARN-4575 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin Chundatt >Priority: Major > Labels: oct16-easy > Attachments: 0001-YARN-4575.patch, 0002-YARN-4575.patch, > YARN-4575.003.patch, YARN-4575.004.patch > > > The reserved resource report in ApplicationResourceUsageReport covers only the default > partition; it should cover all partitions -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-4575) ApplicationResourceUsageReport should return ALL reserved resource
[ https://issues.apache.org/jira/browse/YARN-4575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17168205#comment-17168205 ] Eric Payne commented on YARN-4575: -- I'm not sure why 2 pre-commit builds are being triggered. Nevertheless, the unit tests are not failing for me and I think the TestFairSchedulerPreemption failure is YARN-9333. > ApplicationResourceUsageReport should return ALL reserved resource > --- > > Key: YARN-4575 > URL: https://issues.apache.org/jira/browse/YARN-4575 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin Chundatt >Priority: Major > Labels: oct16-easy > Attachments: 0001-YARN-4575.patch, 0002-YARN-4575.patch, > YARN-4575.003.patch, YARN-4575.004.patch > > > The reserved resource report in ApplicationResourceUsageReport covers only the default > partition; it should cover all partitions -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-1529) Add Localization overhead metrics to NM
[ https://issues.apache.org/jira/browse/YARN-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17168204#comment-17168204 ] Hadoop QA commented on YARN-1529: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 58s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} dupname {color} | {color:green} 0m 0s{color} | {color:green} No case conflicting files found. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 3 new or modified test files. {color} | || || || || {color:brown} branch-2.10 Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 2m 18s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 14m 50s{color} | {color:green} branch-2.10 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 15s{color} | {color:green} branch-2.10 passed with JDK Oracle Corporation-1.7.0_95-b00 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 4s{color} | {color:green} branch-2.10 passed with JDK Private Build-1.8.0_252-8u252-b09-1~16.04-b09 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 12s{color} | {color:green} branch-2.10 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 36s{color} | {color:green} branch-2.10 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 20s{color} | {color:green} branch-2.10 passed with JDK Oracle Corporation-1.7.0_95-b00 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 10s{color} | {color:green} branch-2.10 passed with JDK Private Build-1.8.0_252-8u252-b09-1~16.04-b09 {color} | | {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 1m 13s{color} | {color:blue} Used deprecated FindBugs config; considering switching to SpotBugs. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 47s{color} | {color:green} branch-2.10 passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 20s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 0s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 34s{color} | {color:green} the patch passed with JDK Oracle Corporation-1.7.0_95-b00 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 6m 34s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 5m 57s{color} | {color:green} the patch passed with JDK Private Build-1.8.0_252-8u252-b09-1~16.04-b09 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 5m 57s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 1m 5s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch generated 2 new + 535 unchanged - 0 fixed = 537 total (was 535) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 25s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 12s{color} | {color:green} the patch passed with JDK Oracle Corporation-1.7.0_95-b00 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 3s{color} | {color:green} the patch passed with JDK Private Build-1.8.0_252-8u252-b09-1~16.04-b09 {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 56s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 46s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 15m 16s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. {color} |
[jira] [Updated] (YARN-1529) Add Localization overhead metrics to NM
[ https://issues.apache.org/jira/browse/YARN-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Payne updated YARN-1529: - Fix Version/s: 3.1.5 3.3.1 3.4.0 2.10.1 3.2.2 > Add Localization overhead metrics to NM > --- > > Key: YARN-1529 > URL: https://issues.apache.org/jira/browse/YARN-1529 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Reporter: Gera Shegalov >Assignee: Jim Brennan >Priority: Major > Fix For: 3.2.2, 2.10.1, 3.4.0, 3.3.1, 3.1.5 > > Attachments: YARN-1529-branch-2.10.001.patch, YARN-1529.005.patch, > YARN-1529.006.patch, YARN-1529.v01.patch, YARN-1529.v02.patch, > YARN-1529.v03.patch, YARN-1529.v04.patch > > > Users are often unaware of localization cost that their jobs incur. To > measure effectiveness of localization caches it is necessary to expose the > overhead in the form of metrics. > We propose addition of the following metrics to NodeManagerMetrics. > When a container is about to launch, its set of LocalResources has to be > fetched from a central location, typically on HDFS, that results in a number > of download requests for the files missing in caches. > LocalizedFilesMissed: total files (requests) downloaded from DFS. Cache > misses. > LocalizedFilesCached: total localization requests that were served from local > caches. Cache hits. > LocalizedBytesMissed: total bytes downloaded from DFS due to cache misses. > LocalizedBytesCached: total bytes satisfied from local caches. > Localized(Files|Bytes)CachedRatio: percentage of localized (files|bytes) that > were served out of cache: ratio = 100 * caches / (caches + misses) > LocalizationDownloadNanos: total elapsed time in nanoseconds for a container > to go from ResourceRequestTransition to LocalizedTransition -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-1529) Add Localization overhead metrics to NM
[ https://issues.apache.org/jira/browse/YARN-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17168161#comment-17168161 ] Jim Brennan commented on YARN-1529: --- [~epayne] I have uploaded a patch for branch-2.10. Incidentally, the compilation error was related to the fact that [YARN-7677] has not been pulled back to branch-2.10. We might want to consider doing that. > Add Localization overhead metrics to NM > --- > > Key: YARN-1529 > URL: https://issues.apache.org/jira/browse/YARN-1529 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Reporter: Gera Shegalov >Assignee: Jim Brennan >Priority: Major > Attachments: YARN-1529-branch-2.10.001.patch, YARN-1529.005.patch, > YARN-1529.006.patch, YARN-1529.v01.patch, YARN-1529.v02.patch, > YARN-1529.v03.patch, YARN-1529.v04.patch > > > Users are often unaware of localization cost that their jobs incur. To > measure effectiveness of localization caches it is necessary to expose the > overhead in the form of metrics. > We propose addition of the following metrics to NodeManagerMetrics. > When a container is about to launch, its set of LocalResources has to be > fetched from a central location, typically on HDFS, that results in a number > of download requests for the files missing in caches. > LocalizedFilesMissed: total files (requests) downloaded from DFS. Cache > misses. > LocalizedFilesCached: total localization requests that were served from local > caches. Cache hits. > LocalizedBytesMissed: total bytes downloaded from DFS due to cache misses. > LocalizedBytesCached: total bytes satisfied from local caches. > Localized(Files|Bytes)CachedRatio: percentage of localized (files|bytes) that > were served out of cache: ratio = 100 * caches / (caches + misses) > LocalizationDownloadNanos: total elapsed time in nanoseconds for a container > to go from ResourceRequestTransition to LocalizedTransition -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-1529) Add Localization overhead metrics to NM
[ https://issues.apache.org/jira/browse/YARN-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Brennan updated YARN-1529: -- Attachment: YARN-1529-branch-2.10.001.patch > Add Localization overhead metrics to NM > --- > > Key: YARN-1529 > URL: https://issues.apache.org/jira/browse/YARN-1529 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Reporter: Gera Shegalov >Assignee: Jim Brennan >Priority: Major > Attachments: YARN-1529-branch-2.10.001.patch, YARN-1529.005.patch, > YARN-1529.006.patch, YARN-1529.v01.patch, YARN-1529.v02.patch, > YARN-1529.v03.patch, YARN-1529.v04.patch > > > Users are often unaware of localization cost that their jobs incur. To > measure effectiveness of localization caches it is necessary to expose the > overhead in the form of metrics. > We propose addition of the following metrics to NodeManagerMetrics. > When a container is about to launch, its set of LocalResources has to be > fetched from a central location, typically on HDFS, that results in a number > of download requests for the files missing in caches. > LocalizedFilesMissed: total files (requests) downloaded from DFS. Cache > misses. > LocalizedFilesCached: total localization requests that were served from local > caches. Cache hits. > LocalizedBytesMissed: total bytes downloaded from DFS due to cache misses. > LocalizedBytesCached: total bytes satisfied from local caches. > Localized(Files|Bytes)CachedRatio: percentage of localized (files|bytes) that > were served out of cache: ratio = 100 * caches / (caches + misses) > LocalizationDownloadNanos: total elapsed time in nanoseconds for a container > to go from ResourceRequestTransition to LocalizedTransition -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-1529) Add Localization overhead metrics to NM
[ https://issues.apache.org/jira/browse/YARN-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17168136#comment-17168136 ] Jim Brennan commented on YARN-1529: --- Thanks [~epayne]! I will put up a patch for branch-2.10. > Add Localization overhead metrics to NM > --- > > Key: YARN-1529 > URL: https://issues.apache.org/jira/browse/YARN-1529 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Reporter: Gera Shegalov >Assignee: Jim Brennan >Priority: Major > Attachments: YARN-1529.005.patch, YARN-1529.006.patch, > YARN-1529.v01.patch, YARN-1529.v02.patch, YARN-1529.v03.patch, > YARN-1529.v04.patch > > > Users are often unaware of localization cost that their jobs incur. To > measure effectiveness of localization caches it is necessary to expose the > overhead in the form of metrics. > We propose addition of the following metrics to NodeManagerMetrics. > When a container is about to launch, its set of LocalResources has to be > fetched from a central location, typically on HDFS, that results in a number > of download requests for the files missing in caches. > LocalizedFilesMissed: total files (requests) downloaded from DFS. Cache > misses. > LocalizedFilesCached: total localization requests that were served from local > caches. Cache hits. > LocalizedBytesMissed: total bytes downloaded from DFS due to cache misses. > LocalizedBytesCached: total bytes satisfied from local caches. > Localized(Files|Bytes)CachedRatio: percentage of localized (files|bytes) that > were served out of cache: ratio = 100 * caches / (caches + misses) > LocalizationDownloadNanos: total elapsed time in nanoseconds for a container > to go from ResourceRequestTransition to LocalizedTransition -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-1529) Add Localization overhead metrics to NM
[ https://issues.apache.org/jira/browse/YARN-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17168122#comment-17168122 ] Eric Payne commented on YARN-1529: -- I don't know why 2 pre-commit builds were kicked off. The first was fine but the second one had several unit test failures. Those unit tests all succeed for me locally. I have committed this from branch-3.1 to trunk. However, although there were no merge conflicts in backporting to 2.10, the following code does not compile: {code:title=ContainerLaunch#sanitizeEnv} addToEnvMap(environment, nmVars, Environment.LOCALIZATION_COUNTERS.name(), container.localizationCountersAsString()); {code} > Add Localization overhead metrics to NM > --- > > Key: YARN-1529 > URL: https://issues.apache.org/jira/browse/YARN-1529 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Reporter: Gera Shegalov >Assignee: Jim Brennan >Priority: Major > Attachments: YARN-1529.005.patch, YARN-1529.006.patch, > YARN-1529.v01.patch, YARN-1529.v02.patch, YARN-1529.v03.patch, > YARN-1529.v04.patch > > > Users are often unaware of localization cost that their jobs incur. To > measure effectiveness of localization caches it is necessary to expose the > overhead in the form of metrics. > We propose addition of the following metrics to NodeManagerMetrics. > When a container is about to launch, its set of LocalResources has to be > fetched from a central location, typically on HDFS, that results in a number > of download requests for the files missing in caches. > LocalizedFilesMissed: total files (requests) downloaded from DFS. Cache > misses. > LocalizedFilesCached: total localization requests that were served from local > caches. Cache hits. > LocalizedBytesMissed: total bytes downloaded from DFS due to cache misses. > LocalizedBytesCached: total bytes satisfied from local caches. > Localized(Files|Bytes)CachedRatio: percentage of localized (files|bytes) that > were served out of cache: ratio = 100 * caches / (caches + misses) > LocalizationDownloadNanos: total elapsed time in nanoseconds for a container > to go from ResourceRequestTransition to LocalizedTransition -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10380) Import logic of multi-node allocation in CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-10380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17168116#comment-17168116 ] Wangda Tan commented on YARN-10380: --- cc: [~prabhujoseph], I think we identified more issues during a debug session. I saw YARN-10360 was filed, but I think there are more issues; do you remember? Also + [~sunil.gov...@gmail.com], [~tangzhankun]. I checked the logic of the other parts and didn't see many other issues, but I didn't spend much time on this, so it is possible I missed something. > Import logic of multi-node allocation in CapacityScheduler > -- > > Key: YARN-10380 > URL: https://issues.apache.org/jira/browse/YARN-10380 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Wangda Tan >Priority: Critical > > *1) Entry point:* > When we do multi-node allocation, we're using the same logic as async > scheduling: > {code:java} > // Allocate containers of node [start, end) > for (FiCaSchedulerNode node : nodes) { > if (current++ >= start) { > if (shouldSkipNodeSchedule(node, cs, printSkipedNodeLogging)) { > continue; > } > cs.allocateContainersToNode(node.getNodeID(), false); > } > } {code} > Is this the most effective way to do multi-node scheduling? Should we allocate > based on partitions? In the above logic, if we have thousands of nodes in one > partition, we will repeatedly access all nodes of the partition thousands of > times. > I would suggest looking at making the entry points for node-heartbeat, > async-scheduling (single node), and async-scheduling (multi-node) > different. > Node-heartbeat and async-scheduling (single node) can still be similar and > share most of the code. > async-scheduling (multi-node): should iterate partitions first, using pseudo > code like: > {code:java} > for (partition : all partitions) { > allocateContainersOnMultiNodes(getCandidate(partition)) > } {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-10380) Import logic of multi-node allocation in CapacityScheduler
Wangda Tan created YARN-10380: - Summary: Import logic of multi-node allocation in CapacityScheduler Key: YARN-10380 URL: https://issues.apache.org/jira/browse/YARN-10380 Project: Hadoop YARN Issue Type: Improvement Reporter: Wangda Tan *1) Entry point:* When we do multi-node allocation, we're using the same logic as async scheduling: {code:java} // Allocate containers of node [start, end) for (FiCaSchedulerNode node : nodes) { if (current++ >= start) { if (shouldSkipNodeSchedule(node, cs, printSkipedNodeLogging)) { continue; } cs.allocateContainersToNode(node.getNodeID(), false); } } {code} Is this the most effective way to do multi-node scheduling? Should we allocate based on partitions? In the above logic, if we have thousands of nodes in one partition, we will repeatedly access all nodes of the partition thousands of times. I would suggest looking at making the entry points for node-heartbeat, async-scheduling (single node), and async-scheduling (multi-node) different. Node-heartbeat and async-scheduling (single node) can still be similar and share most of the code. async-scheduling (multi-node): should iterate partitions first, using pseudo code like: {code:java} for (partition : all partitions) { allocateContainersOnMultiNodes(getCandidate(partition)) } {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
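To make the pseudo code above slightly more concrete, here is a compilable sketch of a partition-first entry point. All type and method names (Partition, CandidateNodeSet, getCandidates, allocateContainersOnMultiNodes) are placeholders, not existing scheduler APIs:

{code:java}
import java.util.Collection;

// Illustrative sketch of a partition-first multi-node scheduling loop.
public class MultiNodeEntryPointSketch {

  interface Partition {
    String getName();
  }

  interface CandidateNodeSet { }

  // Iterate partitions once per scheduling cycle instead of walking
  // every node of every partition on each pass.
  void scheduleMultiNode(Collection<Partition> partitions) {
    for (Partition partition : partitions) {
      allocateContainersOnMultiNodes(getCandidates(partition));
    }
  }

  CandidateNodeSet getCandidates(Partition partition) {
    // Placeholder: look up the sorted candidate nodes for this partition.
    return new CandidateNodeSet() { };
  }

  void allocateContainersOnMultiNodes(CandidateNodeSet candidates) {
    // Placeholder: run one multi-node allocation attempt on the candidates.
  }
}
{code}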
[jira] [Created] (YARN-10379) Refactor ContainerExecutor exit code Exception handling
Benjamin Teke created YARN-10379: Summary: Refactor ContainerExecutor exit code Exception handling Key: YARN-10379 URL: https://issues.apache.org/jira/browse/YARN-10379 Project: Hadoop YARN Issue Type: Improvement Components: yarn Reporter: Benjamin Teke Assignee: Benjamin Teke Currently, every time a shell command is executed and returns with a non-zero exit code, an exception gets thrown. But along the call tree this exception gets caught, and after some info/warn logging and other processing steps it is rethrown, possibly packaged into another exception. For example: * in PrivilegedOperationExecutor.executePrivilegedOperation - ExitCodeException is caught (as an IOException) and a PrivilegedOperationException is thrown * then in LinuxContainerExecutor.startLocalizer - the PrivilegedOperationException is caught, the exit code is collected and logged, and an IOException is rethrown * then in ResourceLocalizationService.run - a generic Exception is caught; there is a TODO for separate ExitCodeException handling, but at that point the exit code information is only present in an error message string This flow could be simplified and unified in the different executors. For example, use one specific exception until the last possible step, catch it only where necessary, and keep the exit code as it could be used later in the process. This change could help with maintainability and readability. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
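A minimal sketch of the "one specific exception until the last possible step" idea: a single exception type that carries the exit code along the whole call tree. The class name here is hypothetical, not a proposed API:

{code:java}
import java.io.IOException;

// Illustrative: one exception type keeps the shell exit code available
// wherever the exception is finally handled, instead of leaving it to
// survive only inside an error-message string.
public class ExecutorExitCodeException extends IOException {
  private final int exitCode;

  public ExecutorExitCodeException(String message, int exitCode) {
    super(message);
    this.exitCode = exitCode;
  }

  public ExecutorExitCodeException(String message, int exitCode,
      Throwable cause) {
    super(message, cause);
    this.exitCode = exitCode;
  }

  public int getExitCode() {
    return exitCode;
  }
}
{code}

Intermediate layers could then log and rethrow the same object, and the final catch site would still see the original exit code.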
[jira] [Commented] (YARN-1529) Add Localization overhead metrics to NM
[ https://issues.apache.org/jira/browse/YARN-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17168064#comment-17168064 ] Hudson commented on YARN-1529: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #18481 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/18481/]) YARN-1529: Add Localization overhead metrics to NM. Contributed by (ericp: rev e0c9653166df48a47267dbc81d124ab78267e039) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/Container.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ContainerResourceLocalizedEvent.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalizedResource.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/MockContainer.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainerLaunch.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/TestContainerLaunch.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ContainerImpl.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/metrics/NodeManagerMetrics.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/TestContainerManager.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/ContainerManagerImpl.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationConstants.java > Add Localization overhead metrics to NM > --- > > Key: YARN-1529 > URL: https://issues.apache.org/jira/browse/YARN-1529 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Reporter: Gera Shegalov >Assignee: Jim Brennan >Priority: Major > Attachments: YARN-1529.005.patch, YARN-1529.006.patch, > YARN-1529.v01.patch, YARN-1529.v02.patch, YARN-1529.v03.patch, > YARN-1529.v04.patch > > > Users are often unaware of localization cost that their jobs incur. To > measure effectiveness of localization caches it is necessary to expose the > overhead in the form of metrics. > We propose addition of the following metrics to NodeManagerMetrics. > When a container is about to launch, its set of LocalResources has to be > fetched from a central location, typically on HDFS, that results in a number > of download requests for the files missing in caches. > LocalizedFilesMissed: total files (requests) downloaded from DFS. Cache > misses. > LocalizedFilesCached: total localization requests that were served from local > caches. Cache hits. 
> LocalizedBytesMissed: total bytes downloaded from DFS due to cache misses. > LocalizedBytesCached: total bytes satisfied from local caches. > Localized(Files|Bytes)CachedRatio: percentage of localized (files|bytes) that > were served out of cache: ratio = 100 * caches / (caches + misses) > LocalizationDownloadNanos: total elapsed time in nanoseconds for a container > to go from ResourceRequestTransition to LocalizedTransition -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-4783) Log aggregation failure for application when Nodemanager is restarted
[ https://issues.apache.org/jira/browse/YARN-4783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andras Gyori updated YARN-4783: --- Attachment: YARN-4783.001.patch > Log aggregation failure for application when Nodemanager is restarted > -- > > Key: YARN-4783 > URL: https://issues.apache.org/jira/browse/YARN-4783 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.7.1 >Reporter: Surendra Singh Lilhore >Assignee: Andras Gyori >Priority: Major > Attachments: YARN-4783.001.patch > > > Scenario : > = > 1.Start NM with user dsperf:hadoop > 2.Configure linux-execute user as dsperf > 3.Submit application with yarn user > 4.Once few containers are allocated to NM 1 > 5.Nodemanager 1 is stopped (wait for expiry) > 6.Start node manager after application is completed > 7.Check the log aggregation is happening for the containers log in NMLocal > directory > Expected Output: > === > Log aggregation should be successful > Actual Output: > === > Log aggregation not successful -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Reopened] (YARN-4783) Log aggregation failure for application when Nodemanager is restarted
[ https://issues.apache.org/jira/browse/YARN-4783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andras Gyori reopened YARN-4783: I am reopening this issue in order to find a less invasive approach to handling this corner case, since it was reported a long time ago and still has not been resolved. I have uploaded a new patch without a test case for now. The main idea is to try to renew the token stored in the application credentials on the application state transition from NEW to INITING. If the renewal succeeds, the token is valid and nothing needs to be done from the application's point of view. However, if the renewal fails with an InvalidToken error, we request a new one on behalf of the user. If a new token is requested, it is the application's responsibility to clean it up when the corresponding operations are done; therefore it is canceled when log aggregation finishes. > Log aggregation failure for application when Nodemanager is restarted > -- > > Key: YARN-4783 > URL: https://issues.apache.org/jira/browse/YARN-4783 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.7.1 >Reporter: Surendra Singh Lilhore >Assignee: Andras Gyori >Priority: Major > > Scenario : > = > 1.Start NM with user dsperf:hadoop > 2.Configure linux-execute user as dsperf > 3.Submit application with yarn user > 4.Once few containers are allocated to NM 1 > 5.Nodemanager 1 is stopped (wait for expiry) > 6.Start node manager after application is completed > 7.Check the log aggregation is happening for the containers log in NMLocal > directory > Expected Output: > === > Log aggregation should be successful > Actual Output: > === > Log aggregation not successful -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
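A rough sketch of the renew-or-refetch step described in the comment above, using the public Token and FileSystem APIs; the class and method names are illustrative, error handling is simplified, and the actual patch would presumably wire this into the NEW-to-INITING transition:

{code:java}
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.security.token.SecretManager.InvalidToken;
import org.apache.hadoop.security.token.Token;

// Illustrative sketch of the renew-or-refetch idea.
public class HdfsTokenRefreshSketch {

  // Returns the token to use for log aggregation: the stored one if it is
  // still valid, otherwise a freshly obtained one (which the application
  // then owns and must cancel once log aggregation finishes).
  static Token<?> renewOrRefetch(Token<?> storedToken, FileSystem fs,
      String renewer) throws IOException {
    try {
      storedToken.renew(fs.getConf());
      return storedToken; // Renewal succeeded: the token is still valid.
    } catch (InvalidToken e) {
      // The stored token expired while the NM was down: request a new
      // one on behalf of the user.
      return fs.getDelegationToken(renewer);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IOException("Interrupted while renewing token", e);
    }
  }
}
{code}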
[jira] [Assigned] (YARN-4783) Log aggregation failure for application when Nodemanager is restarted
[ https://issues.apache.org/jira/browse/YARN-4783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andras Gyori reassigned YARN-4783: -- Assignee: Andras Gyori > Log aggregation failure for application when Nodemanager is restarted > -- > > Key: YARN-4783 > URL: https://issues.apache.org/jira/browse/YARN-4783 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.7.1 >Reporter: Surendra Singh Lilhore >Assignee: Andras Gyori >Priority: Major > > Scenario : > = > 1.Start NM with user dsperf:hadoop > 2.Configure linux-execute user as dsperf > 3.Submit application with yarn user > 4.Once few containers are allocated to NM 1 > 5.Nodemanager 1 is stopped (wait for expiry) > 6.Start node manager after application is completed > 7.Check the log aggregation is happening for the containers log in NMLocal > directory > Expected Output: > === > Log aggregation should be successful > Actual Output: > === > Log aggregation not successful -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9136) getNMResourceInfo NodeManager REST API method is not documented
[ https://issues.apache.org/jira/browse/YARN-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17167782#comment-17167782 ] Hadoop QA commented on YARN-9136: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 1m 14s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} dupname {color} | {color:green} 0m 0s{color} | {color:green} No case conflicting files found. {color} | | {color:blue}0{color} | {color:blue} markdownlint {color} | {color:blue} 0m 0s{color} | {color:blue} markdownlint was not available. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 59s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 22s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 37m 52s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 11s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 16s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 15m 45s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 29s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 56m 32s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | ClientAPI=1.40 ServerAPI=1.40 base: https://builds.apache.org/job/PreCommit-YARN-Build/26327/artifact/out/Dockerfile | | JIRA Issue | YARN-9136 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/13008739/YARN-9136.002.patch | | Optional Tests | dupname asflicense mvnsite markdownlint | | uname | Linux e20d3254f4b3 4.15.0-101-generic #102-Ubuntu SMP Mon May 11 10:07:26 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | personality/hadoop.sh | | git revision | trunk / cf4eb756085 | | Max. process+thread count | 308 (vs. ulimit of 5500) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/26327/console | | versions | git=2.17.1 maven=3.6.0 | | Powered by | Apache Yetus 0.12.0 https://yetus.apache.org | This message was automatically generated. 
> getNMResourceInfo NodeManager REST API method is not documented > --- > > Key: YARN-9136 > URL: https://issues.apache.org/jira/browse/YARN-9136 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Szilard Nemeth >Assignee: Hudáky Márton Gyula >Priority: Major > Attachments: YARN-9136.001.patch, YARN-9136.002.patch > > > I cannot find documentation for the resources endpoint in NMWebServices: > /ws/v1/node/resources/\{resourcename\} > I looked in the file NodeManagerRest.md for documentation but haven't found > any. > This was presumably left undocumented unintentionally: > https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/NodeManagerRest.md -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9136) getNMResourceInfo NodeManager REST API method is not documented
[ https://issues.apache.org/jira/browse/YARN-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17167780#comment-17167780 ] Hadoop QA commented on YARN-9136: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 53s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} dupname {color} | {color:green} 0m 1s{color} | {color:green} No case conflicting files found. {color} | | {color:blue}0{color} | {color:blue} markdownlint {color} | {color:blue} 0m 1s{color} | {color:blue} markdownlint was not available. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 54s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 26s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 34m 16s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 14s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 18s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 49s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 32s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 50m 41s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | ClientAPI=1.40 ServerAPI=1.40 base: https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/25/artifact/out/Dockerfile | | JIRA Issue | YARN-9136 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/13008739/YARN-9136.002.patch | | Optional Tests | dupname asflicense mvnsite markdownlint | | uname | Linux 985367a2636f 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 16:55:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | personality/hadoop.sh | | git revision | trunk / cf4eb756085 | | Max. process+thread count | 433 (vs. ulimit of 5500) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site | | Console output | https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/25/console | | versions | git=2.17.1 maven=3.6.0 | | Powered by | Apache Yetus 0.12.0 https://yetus.apache.org | This message was automatically generated. 
[jira] [Resolved] (YARN-10378) When NM goes down and comes back up, PC allocation tags are not removed for completed containers
[ https://issues.apache.org/jira/browse/YARN-10378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tarun Parimi resolved YARN-10378.
Resolution: Duplicate
It looks like YARN-10034 also fixes this issue for the NM-going-down scenario. Closing as a duplicate.
> When NM goes down and comes back up, PC allocation tags are not removed for
> completed containers
>
> Key: YARN-10378
> URL: https://issues.apache.org/jira/browse/YARN-10378
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacity scheduler
> Affects Versions: 3.2.0, 3.1.1
> Reporter: Tarun Parimi
> Assignee: Tarun Parimi
> Priority: Major
>
> We are using placement constraints (anti-affinity) in an application, along
> with a node label. The application requests two containers with anti-affinity
> on a node label that covers only two nodes, so one container is allocated on
> each of the two nodes, satisfying anti-affinity.
> When one NodeManager goes down for some time, the RM marks the node as lost
> and kills all containers on that node. The AM then has one pending container
> request, since its previous container was killed.
> When the NodeManager comes back up, the pending container is never allocated
> on that node again, and the application waits forever for it. If the
> ResourceManager is restarted, the issue disappears and the container is
> allocated on the NodeManager that recently came back up.
> This appears to be caused by allocation tags not being removed. The allocation
> tag is added for container container_e68_1595886973474_0005_01_03:
> {code:java}
> 2020-07-28 17:02:04,091 DEBUG constraint.AllocationTagsManager
> (AllocationTagsManager.java:addContainer(355)) - Added
> container=container_e68_1595886973474_0005_01_03 with tags=[hbase]
> {code}
> However, the allocation tag is not removed when container
> container_e68_1595886973474_0005_01_03 is released; there is no
> corresponding DEBUG message for tag removal. With the tag still in place, the
> scheduler will not allocate on that node due to anti-affinity, which produces
> the behaviour observed.
> {code:java}
> 2020-07-28 17:19:34,353 DEBUG scheduler.AbstractYarnScheduler
> (AbstractYarnScheduler.java:updateCompletedContainers(1038)) - Container
> FINISHED: container_e68_1595886973474_0005_01_03
> 2020-07-28 17:19:34,353 INFO scheduler.AbstractYarnScheduler
> (AbstractYarnScheduler.java:completedContainer(669)) - Container
> container_e68_1595886973474_0005_01_03 completed with event FINISHED, but
> corresponding RMContainer doesn't exist.
> {code}
> This appears to be due to YARN-8511, which changed tag removal to happen only
> after the NM confirms the container is released. In this scenario that
> confirmation never arrives, so the tag is never removed until an RM restart.
> Reverting YARN-8511 fixes this particular issue and the tags are removed, but
> that is not a valid solution, since the problem YARN-8511 solves is also real.
> We need a fix that addresses this issue without breaking YARN-8511.
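For reference, this is roughly how the anti-affinity described above is expressed with the placement-constraints API available since Hadoop 3.1. A sketch only: the "hbase" tag is taken from the log above, while the request id, priority, and container sizing are illustrative values, and the node-label targeting is omitted for brevity.
{code:java}
import java.util.Collections;

import org.apache.hadoop.yarn.api.records.ExecutionType;
import org.apache.hadoop.yarn.api.records.ExecutionTypeRequest;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.api.records.ResourceSizing;
import org.apache.hadoop.yarn.api.records.SchedulingRequest;
import org.apache.hadoop.yarn.api.resource.PlacementConstraint;
import org.apache.hadoop.yarn.api.resource.PlacementConstraints;

import static org.apache.hadoop.yarn.api.resource.PlacementConstraints.NODE;
import static org.apache.hadoop.yarn.api.resource.PlacementConstraints.targetNotIn;
import static org.apache.hadoop.yarn.api.resource.PlacementConstraints.PlacementTargets.allocationTag;

public class AntiAffinityRequest {
    public static SchedulingRequest build() {
        // Node-scope anti-affinity on the "hbase" allocation tag: no two
        // containers carrying this tag may be placed on the same node.
        PlacementConstraint antiAffinity =
                PlacementConstraints.build(targetNotIn(NODE, allocationTag("hbase")));

        // Ask for 2 containers of 2 GB / 2 vcores, each tagged "hbase". If a
        // stale "hbase" tag lingers on a node (the bug above), that node is
        // rejected and one request stays pending indefinitely.
        return SchedulingRequest.newBuilder()
                .allocationRequestId(1L)
                .priority(Priority.newInstance(1))
                .executionType(ExecutionTypeRequest.newInstance(ExecutionType.GUARANTEED))
                .allocationTags(Collections.singleton("hbase"))
                .resourceSizing(ResourceSizing.newInstance(2, Resource.newInstance(2048, 2)))
                .placementConstraintExpression(antiAffinity)
                .build();
    }
}
{code}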
[jira] [Updated] (YARN-9136) getNMResourceInfo NodeManager REST API method is not documented
[ https://issues.apache.org/jira/browse/YARN-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hudáky Márton Gyula updated YARN-9136:
Attachment: YARN-9136.002.patch