[jira] [Commented] (YARN-1529) Add Localization overhead metrics to NM
[ https://issues.apache.org/jira/browse/YARN-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17169137#comment-17169137 ]

Gera Shegalov commented on YARN-1529:
-------------------------------------

I am glad this is still useful. Thanks for committing, [~Jim_Brennan] [~epayne]!

> Add Localization overhead metrics to NM
> ---------------------------------------
>
>                 Key: YARN-1529
>                 URL: https://issues.apache.org/jira/browse/YARN-1529
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: nodemanager
>            Reporter: Gera Shegalov
>            Assignee: Jim Brennan
>            Priority: Major
>             Fix For: 3.2.2, 2.10.1, 3.4.0, 3.3.1, 3.1.5
>
>         Attachments: YARN-1529-branch-2.10.001.patch, YARN-1529.005.patch,
> YARN-1529.006.patch, YARN-1529.v01.patch, YARN-1529.v02.patch,
> YARN-1529.v03.patch, YARN-1529.v04.patch
>
>
> Users are often unaware of localization cost that their jobs incur. To
> measure effectiveness of localization caches it is necessary to expose the
> overhead in the form of metrics.
> We propose addition of the following metrics to NodeManagerMetrics.
> When a container is about to launch, its set of LocalResources has to be
> fetched from a central location, typically on HDFS, that results in a number
> of download requests for the files missing in caches.
> LocalizedFilesMissed: total files (requests) downloaded from DFS. Cache
> misses.
> LocalizedFilesCached: total localization requests that were served from local
> caches. Cache hits.
> LocalizedBytesMissed: total bytes downloaded from DFS due to cache misses.
> LocalizedBytesCached: total bytes satisfied from local caches.
> Localized(Files|Bytes)CachedRatio: percentage of localized (files|bytes) that
> were served out of cache: ratio = 100 * caches / (caches + misses)
> LocalizationDownloadNanos: total elapsed time in nanoseconds for a container
> to go from ResourceRequestTransition to LocalizedTransition

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
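The ratio defined in the issue description can be sketched in a few lines of plain Java. This is only an illustration under stated assumptions, not the code from the committed patch: the class `LocalizationStats` and its method names are hypothetical, the counters are simple `AtomicLong` fields rather than real NodeManagerMetrics gauges, the ratio uses truncating integer division, and returning 0 when nothing has been localized yet is a choice made here, since the description does not say what happens in that case.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical stand-in for the counters listed in the issue description.
 * It does not use the Hadoop metrics API; names and behavior are assumptions.
 */
final class LocalizationStats {
  private final AtomicLong filesMissed = new AtomicLong();
  private final AtomicLong filesCached = new AtomicLong();
  private final AtomicLong bytesMissed = new AtomicLong();
  private final AtomicLong bytesCached = new AtomicLong();
  private final AtomicLong downloadNanos = new AtomicLong();

  /** A resource was absent from the local cache and had to be downloaded. */
  void recordMiss(long bytes, long elapsedNanos) {
    filesMissed.incrementAndGet();
    bytesMissed.addAndGet(bytes);
    downloadNanos.addAndGet(elapsedNanos);
  }

  /** A resource was served from the local cache. */
  void recordHit(long bytes) {
    filesCached.incrementAndGet();
    bytesCached.addAndGet(bytes);
  }

  /** ratio = 100 * caches / (caches + misses); 0 if there were no requests (assumption). */
  static long cachedRatio(long cached, long missed) {
    long total = cached + missed;
    return total == 0 ? 0 : 100 * cached / total;
  }

  long filesCachedRatio() {
    return cachedRatio(filesCached.get(), filesMissed.get());
  }

  long bytesCachedRatio() {
    return cachedRatio(bytesCached.get(), bytesMissed.get());
  }

  long totalDownloadNanos() {
    return downloadNanos.get();
  }
}
```

The file ratio and the byte ratio are kept separate because they can diverge sharply: many small cached files alongside one large download gives a high file ratio but a low byte ratio.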
[jira] [Commented] (YARN-1529) Add Localization overhead metrics to NM
[ https://issues.apache.org/jira/browse/YARN-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17168213#comment-17168213 ]

Jim Brennan commented on YARN-1529:
-----------------------------------

Thanks [~epayne]!
[jira] [Commented] (YARN-1529) Add Localization overhead metrics to NM
[ https://issues.apache.org/jira/browse/YARN-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17168212#comment-17168212 ]

Hadoop QA commented on YARN-1529:
---------------------------------

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 11m 23s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | dupname | 0m 0s | No case conflicting files found. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 3 new or modified test files. |
|| || || || branch-2.10 Compile Tests ||
| 0 | mvndep | 2m 20s | Maven dependency ordering for branch |
| +1 | mvninstall | 12m 45s | branch-2.10 passed |
| -1 | compile | 3m 57s | hadoop-yarn in branch-2.10 failed with JDK Oracle Corporation-1.7.0_95-b00. |
| +1 | compile | 7m 7s | branch-2.10 passed with JDK Private Build-1.8.0_252-8u252-b09-1~16.04-b09 |
| +1 | checkstyle | 1m 8s | branch-2.10 passed |
| +1 | mvnsite | 1m 33s | branch-2.10 passed |
| -1 | javadoc | 0m 29s | hadoop-yarn-api in branch-2.10 failed with JDK Oracle Corporation-1.7.0_95-b00. |
| -1 | javadoc | 0m 29s | hadoop-yarn-server-nodemanager in branch-2.10 failed with JDK Oracle Corporation-1.7.0_95-b00. |
| +1 | javadoc | 1m 7s | branch-2.10 passed with JDK Private Build-1.8.0_252-8u252-b09-1~16.04-b09 |
| 0 | spotbugs | 1m 16s | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 | findbugs | 2m 55s | branch-2.10 passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 20s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 4s | the patch passed |
| +1 | compile | 8m 46s | the patch passed with JDK Oracle Corporation-1.7.0_95-b00 |
| +1 | javac | 8m 46s | the patch passed |
| +1 | compile | 7m 20s | the patch passed with JDK Private Build-1.8.0_252-8u252-b09-1~16.04-b09 |
| +1 | javac | 7m 20s | the patch passed |
| -0 | checkstyle | 1m 14s | hadoop-yarn-project/hadoop-yarn: The patch generated 2 new + 535 unchanged - 0 fixed = 537 total (was 535) |
| +1 | mvnsite | 1m 35s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| -1 | javadoc | 0m 37s | hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager-jdkOracleCorporation-1.7.0_95-b00 with JDK Oracle Corporation-1.7.0_95-b00 generated 2 new + 0 unchanged - 0 fixed = 2 total (was 0) |
| +1 | javadoc | 1m 2s | the patch passed with JDK Private Build-1.8.0_252-8u252-b09-1~16.04-b09 |
| +1 | findbugs | 3m 25s | the patch passed |
[jira] [Commented] (YARN-1529) Add Localization overhead metrics to NM
[ https://issues.apache.org/jira/browse/YARN-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17168204#comment-17168204 ]

Hadoop QA commented on YARN-1529:
---------------------------------

| (/) *+1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 58s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | dupname | 0m 0s | No case conflicting files found. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 3 new or modified test files. |
|| || || || branch-2.10 Compile Tests ||
| 0 | mvndep | 2m 18s | Maven dependency ordering for branch |
| +1 | mvninstall | 14m 50s | branch-2.10 passed |
| +1 | compile | 7m 15s | branch-2.10 passed with JDK Oracle Corporation-1.7.0_95-b00 |
| +1 | compile | 6m 4s | branch-2.10 passed with JDK Private Build-1.8.0_252-8u252-b09-1~16.04-b09 |
| +1 | checkstyle | 1m 12s | branch-2.10 passed |
| +1 | mvnsite | 1m 36s | branch-2.10 passed |
| +1 | javadoc | 1m 20s | branch-2.10 passed with JDK Oracle Corporation-1.7.0_95-b00 |
| +1 | javadoc | 1m 10s | branch-2.10 passed with JDK Private Build-1.8.0_252-8u252-b09-1~16.04-b09 |
| 0 | spotbugs | 1m 13s | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 | findbugs | 2m 47s | branch-2.10 passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 20s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 0s | the patch passed |
| +1 | compile | 6m 34s | the patch passed with JDK Oracle Corporation-1.7.0_95-b00 |
| +1 | javac | 6m 34s | the patch passed |
| +1 | compile | 5m 57s | the patch passed with JDK Private Build-1.8.0_252-8u252-b09-1~16.04-b09 |
| +1 | javac | 5m 57s | the patch passed |
| -0 | checkstyle | 1m 5s | hadoop-yarn-project/hadoop-yarn: The patch generated 2 new + 535 unchanged - 0 fixed = 537 total (was 535) |
| +1 | mvnsite | 1m 25s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | javadoc | 1m 12s | the patch passed with JDK Oracle Corporation-1.7.0_95-b00 |
| +1 | javadoc | 1m 3s | the patch passed with JDK Private Build-1.8.0_252-8u252-b09-1~16.04-b09 |
| +1 | findbugs | 2m 56s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 0m 46s | hadoop-yarn-api in the patch passed. |
| +1 | unit | 15m 16s | hadoop-yarn-server-nodemanager in the patch passed. |
[jira] [Commented] (YARN-1529) Add Localization overhead metrics to NM
[ https://issues.apache.org/jira/browse/YARN-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17168161#comment-17168161 ]

Jim Brennan commented on YARN-1529:
-----------------------------------

[~epayne] I have uploaded a patch for branch-2.10.
Incidentally, the compilation error was related to the fact that [YARN-7677] has not been pulled back to branch-2.10. We might want to consider doing that.
[jira] [Commented] (YARN-1529) Add Localization overhead metrics to NM
[ https://issues.apache.org/jira/browse/YARN-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17168136#comment-17168136 ]

Jim Brennan commented on YARN-1529:
-----------------------------------

Thanks [~epayne]! I will put up a patch for branch-2.10.
[jira] [Commented] (YARN-1529) Add Localization overhead metrics to NM
[ https://issues.apache.org/jira/browse/YARN-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17168122#comment-17168122 ]

Eric Payne commented on YARN-1529:
----------------------------------

I don't know why 2 pre-commit builds were kicked off. The first was fine but the second one had several unit test failures. Those unit tests all succeed for me locally.

I have committed this to branch-3.1 through trunk. However, although there were no merge conflicts in backporting to 2.10, the following code does not compile:
{code:title=ContainerLaunch#sanitizeEnv}
addToEnvMap(environment, nmVars,
    Environment.LOCALIZATION_COUNTERS.name(),
    container.localizationCountersAsString());
{code}
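The snippet in Eric Payne's comment exports the counters into the container environment under the key `LOCALIZATION_COUNTERS` (the enum constant's `name()`). A process inside the container could read them back roughly as sketched below. The thread does not show what `localizationCountersAsString()` returns, so the format here is an assumption made purely for illustration: a comma-separated list of decimal longs. The class `LocalizationCountersParser` is hypothetical and is not part of Hadoop.

```java
import java.util.Arrays;

/**
 * Hypothetical helper, not a Hadoop class. It ASSUMES the environment value
 * is a comma-separated list of longs; the real format should be checked
 * against ContainerImpl before relying on this.
 */
final class LocalizationCountersParser {
  private LocalizationCountersParser() {
  }

  /** Parses the raw value; a missing or blank value yields an empty array. */
  static long[] parse(String raw) {
    if (raw == null || raw.trim().isEmpty()) {
      return new long[0];
    }
    return Arrays.stream(raw.split(","))
        .map(String::trim)
        .mapToLong(Long::parseLong)
        .toArray();
  }
}
```

A caller would typically pass `System.getenv("LOCALIZATION_COUNTERS")`. Treating a missing variable as an empty array keeps the sketch usable on NodeManagers that predate this change and never set it.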
[jira] [Commented] (YARN-1529) Add Localization overhead metrics to NM
[ https://issues.apache.org/jira/browse/YARN-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17168064#comment-17168064 ]

Hudson commented on YARN-1529:
------------------------------

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #18481 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/18481/])
YARN-1529: Add Localization overhead metrics to NM. Contributed by (ericp: rev e0c9653166df48a47267dbc81d124ab78267e039)
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/Container.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ContainerResourceLocalizedEvent.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalizedResource.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/MockContainer.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainerLaunch.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/TestContainerLaunch.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ContainerImpl.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/metrics/NodeManagerMetrics.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/TestContainerManager.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/ContainerManagerImpl.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationConstants.java
[jira] [Commented] (YARN-1529) Add Localization overhead metrics to NM
[ https://issues.apache.org/jira/browse/YARN-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17167554#comment-17167554 ]

Hadoop QA commented on YARN-1529:
---------------------------------

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 1m 23s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | dupname | 0m 0s | No case conflicting files found. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 3 new or modified test files. |
|| || || || trunk Compile Tests ||
| 0 | mvndep | 1m 12s | Maven dependency ordering for branch |
| +1 | mvninstall | 26m 55s | trunk passed |
| +1 | compile | 12m 4s | trunk passed |
| +1 | checkstyle | 1m 56s | trunk passed |
| +1 | mvnsite | 2m 6s | trunk passed |
| +1 | shadedclient | 23m 11s | branch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 1m 28s | trunk passed |
| 0 | spotbugs | 1m 44s | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 | findbugs | 4m 5s | trunk passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 30s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 25s | the patch passed |
| +1 | compile | 10m 1s | the patch passed |
| +1 | javac | 10m 1s | the patch passed |
| -0 | checkstyle | 1m 58s | hadoop-yarn-project/hadoop-yarn: The patch generated 2 new + 468 unchanged - 0 fixed = 470 total (was 468) |
| +1 | mvnsite | 1m 54s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 18m 53s | patch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 1m 25s | the patch passed |
| +1 | findbugs | 4m 14s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 1m 3s | hadoop-yarn-api in the patch passed. |
| -1 | unit | 27m 35s | hadoop-yarn-server-nodemanager in the patch passed. |
| +1 | asflicense | 0m 46s | The patch does not generate ASF License warnings. |
| | | 142m 43s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.server.nodemanager.containermanager.linux.runtime.docker.TestDockerClient |
| | hadoop.yarn.server.nodemanager.containermanager.monitor.TestContainersMonitor |
| | hadoop.yarn.server.nodemanager.webapp.TestNMWebServices |
| | hadoop.yarn.server.nodemanager.TestNodeManagerShutdown |
| | hadoop.yarn.server.nodemanager.TestLinuxContainerExecutorWithMocks |
| | hadoop.yarn.server
[jira] [Commented] (YARN-1529) Add Localization overhead metrics to NM
[ https://issues.apache.org/jira/browse/YARN-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17167547#comment-17167547 ]

Hadoop QA commented on YARN-1529:
---------------------------------

| (/) *+1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 1m 7s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | dupname | 0m 0s | No case conflicting files found. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 3 new or modified test files. |
|| || || || trunk Compile Tests ||
| 0 | mvndep | 3m 14s | Maven dependency ordering for branch |
| +1 | mvninstall | 27m 45s | trunk passed |
| +1 | compile | 8m 14s | trunk passed |
| +1 | checkstyle | 1m 31s | trunk passed |
| +1 | mvnsite | 1m 36s | trunk passed |
| +1 | shadedclient | 19m 12s | branch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 1m 17s | trunk passed |
| 0 | spotbugs | 1m 39s | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 | findbugs | 3m 50s | trunk passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 25s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 24s | the patch passed |
| +1 | compile | 10m 46s | the patch passed |
| +1 | javac | 10m 46s | the patch passed |
| -0 | checkstyle | 1m 43s | hadoop-yarn-project/hadoop-yarn: The patch generated 2 new + 469 unchanged - 0 fixed = 471 total (was 469) |
| +1 | mvnsite | 1m 41s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 15m 47s | patch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 1m 7s | the patch passed |
| +1 | findbugs | 4m 3s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 1m 4s | hadoop-yarn-api in the patch passed. |
| +1 | unit | 22m 58s | hadoop-yarn-server-nodemanager in the patch passed. |
| +1 | asflicense | 0m 43s | The patch does not generate ASF License warnings. |
| | | 128m 27s | |

|| Subsystem || Report/Notes ||
| Docker | ClientAPI=1.40 ServerAPI=1.40 base: https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/23/artifact/out/Dockerfile |
| JIRA Issue | YARN-1529 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/13008719/YARN-1529.006.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux f37ddd218c
[jira] [Commented] (YARN-1529) Add Localization overhead metrics to NM
[ https://issues.apache.org/jira/browse/YARN-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17167516#comment-17167516 ]

Jim Brennan commented on YARN-1529:

[~epayne] I trust you to resolve the minor conflicts in the other branches.

> Add Localization overhead metrics to NM
> ---
>
> Key: YARN-1529
> URL: https://issues.apache.org/jira/browse/YARN-1529
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: nodemanager
> Reporter: Gera Shegalov
> Assignee: Jim Brennan
> Priority: Major
> Attachments: YARN-1529.005.patch, YARN-1529.006.patch,
> YARN-1529.v01.patch, YARN-1529.v02.patch, YARN-1529.v03.patch,
> YARN-1529.v04.patch
>
>
> Users are often unaware of localization cost that their jobs incur. To
> measure effectiveness of localization caches it is necessary to expose the
> overhead in the form of metrics.
> We propose addition of the following metrics to NodeManagerMetrics.
> When a container is about to launch, its set of LocalResources has to be
> fetched from a central location, typically on HDFS, that results in a number
> of download requests for the files missing in caches.
> LocalizedFilesMissed: total files (requests) downloaded from DFS. Cache
> misses.
> LocalizedFilesCached: total localization requests that were served from local
> caches. Cache hits.
> LocalizedBytesMissed: total bytes downloaded from DFS due to cache misses.
> LocalizedBytesCached: total bytes satisfied from local caches.
> Localized(Files|Bytes)CachedRatio: percentage of localized (files|bytes) that
> were served out of cache: ratio = 100 * caches / (caches + misses)
> LocalizationDownloadNanos: total elapsed time in nanoseconds for a container
> to go from ResourceRequestTransition to LocalizedTransition

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
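The ratio formula in the issue description above, ratio = 100 * caches / (caches + misses), can be sketched in a few lines. This is only an illustration, not the code from the patch: the class and method names are hypothetical, and returning 0 when nothing has been localized yet is an assumption made here to avoid dividing by zero.

```java
public class CachedRatio {
    /**
     * Percentage of localization requests (or bytes) served from cache.
     * Hypothetical helper; the zero-total behaviour is an assumption.
     */
    static long cachedRatio(long cached, long missed) {
        long total = cached + missed;
        if (total == 0) {
            return 0; // nothing localized yet, so report 0 rather than divide by zero
        }
        return 100 * cached / total; // integer percentage, truncated toward zero
    }

    public static void main(String[] args) {
        // 75 cache hits and 25 misses out of 100 requests
        System.out.println(cachedRatio(75, 25));
        // no localization activity yet
        System.out.println(cachedRatio(0, 0));
    }
}
```

The same helper would apply to either the files or the bytes counters, since both ratios use the same formula.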
[jira] [Commented] (YARN-1529) Add Localization overhead metrics to NM
[ https://issues.apache.org/jira/browse/YARN-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17167513#comment-17167513 ]

Eric Payne commented on YARN-1529:

[~Jim_Brennan], one thing I should point out is that the backports aren't 100% clean, but the conflicts are fairly minor. If you trust me to resolve them myself, I can just do it as part of the commit process. If you'd prefer, you can create separate patches for branch-3.2 and branch-2.10.

> Add Localization overhead metrics to NM
> ---
>
> Key: YARN-1529
> URL: https://issues.apache.org/jira/browse/YARN-1529
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: nodemanager
> Reporter: Gera Shegalov
> Assignee: Jim Brennan
> Priority: Major
> Attachments: YARN-1529.005.patch, YARN-1529.006.patch,
> YARN-1529.v01.patch, YARN-1529.v02.patch, YARN-1529.v03.patch,
> YARN-1529.v04.patch
[jira] [Commented] (YARN-1529) Add Localization overhead metrics to NM
[ https://issues.apache.org/jira/browse/YARN-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17167510#comment-17167510 ]

Eric Payne commented on YARN-1529:

Thanks for the updated patch, [~Jim_Brennan]. The changes LGTM. +1

I will commit tomorrow if all looks well with the pre-commit build and there are no objections.

> Add Localization overhead metrics to NM
> ---
>
> Key: YARN-1529
> URL: https://issues.apache.org/jira/browse/YARN-1529
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: nodemanager
> Reporter: Gera Shegalov
> Assignee: Jim Brennan
> Priority: Major
> Attachments: YARN-1529.005.patch, YARN-1529.006.patch,
> YARN-1529.v01.patch, YARN-1529.v02.patch, YARN-1529.v03.patch,
> YARN-1529.v04.patch
[jira] [Commented] (YARN-1529) Add Localization overhead metrics to NM
[ https://issues.apache.org/jira/browse/YARN-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17167499#comment-17167499 ]

Jim Brennan commented on YARN-1529:

[~epayne], for the checkstyle issues:
{quote}./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationConstants.java:227: /**: First sentence should end with a period. [JavadocStyle]
{quote}
I did not fix this because the added code follows the convention for the file.
{quote}./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/Container.java:145: * \{@link org.apache.hadoop.yarn.api.ApplicationConstants.Environment#LOCALIZATION_COUNTERS} : Line is longer than 80 characters (found 94). [LineLength]
{quote}
I did not fix this because it would require breaking up the link string.
{quote}./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ContainerImpl.java:104: private static enum LocalizationCounter {:11: Redundant 'static' modifier. [RedundantModifier]
{quote}
I fixed this one.

I also fixed the unit test and whitespace issues. I am putting up patch 006 with these fixes.

> Add Localization overhead metrics to NM
> ---
>
> Key: YARN-1529
> URL: https://issues.apache.org/jira/browse/YARN-1529
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: nodemanager
> Reporter: Gera Shegalov
> Assignee: Jim Brennan
> Priority: Major
> Attachments: YARN-1529.005.patch, YARN-1529.v01.patch,
> YARN-1529.v02.patch, YARN-1529.v03.patch, YARN-1529.v04.patch
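The RedundantModifier warning quoted in this message rests on a Java language rule: a nested enum is implicitly static, so writing `private static enum` adds nothing. A minimal sketch that demonstrates the rule follows. The class name and the enum constants are hypothetical and are not taken from ContainerImpl.

```java
import java.lang.reflect.Modifier;

public class NestedEnumDemo {
    // Hypothetical constants, for illustration only.
    // No 'static' keyword here: a nested enum is implicitly static.
    private enum LocalizationCounter { BYTES_MISSED, BYTES_CACHED }

    /** Reports whether the nested enum carries the static modifier. */
    static boolean isNestedEnumStatic() {
        return Modifier.isStatic(LocalizationCounter.class.getModifiers());
    }

    public static void main(String[] args) {
        // The enum is static even though the keyword was never written.
        System.out.println(isNestedEnumStatic());
    }
}
```

Because the modifier is implied, dropping the keyword changes nothing at runtime, which is presumably why this was the one checkstyle finding that could be fixed without side effects.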
[jira] [Commented] (YARN-1529) Add Localization overhead metrics to NM
[ https://issues.apache.org/jira/browse/YARN-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17167495#comment-17167495 ]

Hadoop QA commented on YARN-1529:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 55s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | dupname | 0m 0s | No case conflicting files found. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
|| || || || trunk Compile Tests ||
| 0 | mvndep | 3m 20s | Maven dependency ordering for branch |
| +1 | mvninstall | 27m 41s | trunk passed |
| +1 | compile | 8m 34s | trunk passed |
| +1 | checkstyle | 1m 39s | trunk passed |
| +1 | mvnsite | 1m 48s | trunk passed |
| +1 | shadedclient | 17m 58s | branch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 1m 22s | trunk passed |
| 0 | spotbugs | 1m 33s | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 | findbugs | 3m 35s | trunk passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 30s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 17s | the patch passed |
| +1 | compile | 7m 51s | the patch passed |
| +1 | javac | 7m 51s | the patch passed |
| -0 | checkstyle | 1m 29s | hadoop-yarn-project/hadoop-yarn: The patch generated 3 new + 373 unchanged - 0 fixed = 376 total (was 373) |
| +1 | mvnsite | 1m 47s | the patch passed |
| -1 | whitespace | 0m 0s | The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply |
| +1 | shadedclient | 13m 47s | patch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 1m 22s | the patch passed |
| +1 | findbugs | 3m 47s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 1m 4s | hadoop-yarn-api in the patch passed. |
| -1 | unit | 22m 41s | hadoop-yarn-server-nodemanager in the patch passed. |
| +1 | asflicense | 0m 49s | The patch does not generate ASF License warnings. |
| | | 122m 25s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch |

|| Subsystem || Report/Notes ||
| Docker | ClientAPI=1.40 ServerAPI=1.40 base: https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/19/artifact/out/Dockerfile |
| JIRA Issue | YARN-1529 |
| JIRA Patch URL | https://issues.apache.or
[jira] [Commented] (YARN-1529) Add Localization overhead metrics to NM
[ https://issues.apache.org/jira/browse/YARN-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17167469#comment-17167469 ]

Eric Payne commented on YARN-1529:

bq. The TestContainerLaunch failures look like they are relevant. I will investigate.

Thanks [~Jim_Brennan]. Also, as long as you're at it, can you please look at the whitespace and checkstyle warnings?

> Add Localization overhead metrics to NM
> ---
>
> Key: YARN-1529
> URL: https://issues.apache.org/jira/browse/YARN-1529
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: nodemanager
> Reporter: Gera Shegalov
> Assignee: Jim Brennan
> Priority: Major
> Attachments: YARN-1529.005.patch, YARN-1529.v01.patch,
> YARN-1529.v02.patch, YARN-1529.v03.patch, YARN-1529.v04.patch
[jira] [Commented] (YARN-1529) Add Localization overhead metrics to NM
[ https://issues.apache.org/jira/browse/YARN-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17167465#comment-17167465 ]

Jim Brennan commented on YARN-1529:

The TestContainerLaunch failures look like they are relevant. I will investigate.

> Add Localization overhead metrics to NM
> ---
>
> Key: YARN-1529
> URL: https://issues.apache.org/jira/browse/YARN-1529
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: nodemanager
> Reporter: Gera Shegalov
> Assignee: Jim Brennan
> Priority: Major
> Attachments: YARN-1529.005.patch, YARN-1529.v01.patch,
> YARN-1529.v02.patch, YARN-1529.v03.patch, YARN-1529.v04.patch
[jira] [Commented] (YARN-1529) Add Localization overhead metrics to NM
[ https://issues.apache.org/jira/browse/YARN-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17167429#comment-17167429 ]

Hadoop QA commented on YARN-1529:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 36s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | dupname | 0m 0s | No case conflicting files found. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
|| || || || trunk Compile Tests ||
| 0 | mvndep | 1m 12s | Maven dependency ordering for branch |
| +1 | mvninstall | 21m 21s | trunk passed |
| +1 | compile | 8m 19s | trunk passed |
| +1 | checkstyle | 1m 38s | trunk passed |
| +1 | mvnsite | 1m 49s | trunk passed |
| +1 | shadedclient | 17m 50s | branch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 1m 29s | trunk passed |
| 0 | spotbugs | 1m 31s | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 | findbugs | 3m 28s | trunk passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 26s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 14s | the patch passed |
| +1 | compile | 7m 11s | the patch passed |
| +1 | javac | 7m 11s | the patch passed |
| -0 | checkstyle | 1m 32s | hadoop-yarn-project/hadoop-yarn: The patch generated 3 new + 374 unchanged - 0 fixed = 377 total (was 374) |
| +1 | mvnsite | 1m 42s | the patch passed |
| -1 | whitespace | 0m 0s | The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply |
| +1 | shadedclient | 16m 8s | patch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 1m 17s | the patch passed |
| +1 | findbugs | 3m 38s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 0m 56s | hadoop-yarn-api in the patch passed. |
| -1 | unit | 22m 25s | hadoop-yarn-server-nodemanager in the patch passed. |
| +1 | asflicense | 0m 51s | The patch does not generate ASF License warnings. |
| | | 113m 59s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch |

|| Subsystem || Report/Notes ||
| Docker | ClientAPI=1.40 ServerAPI=1.40 base: https://builds.apache.org/job/PreCommit-YARN-Build/26321/artifact/out/Dockerfile |
| JIRA Issue | YARN-1529 |
| JIRA Patch URL | https://issues.apache.or
[jira] [Commented] (YARN-1529) Add Localization overhead metrics to NM
[ https://issues.apache.org/jira/browse/YARN-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17090843#comment-17090843 ]

Eric Payne commented on YARN-1529:

bq. It might be worth splitting the patch so the less controversial NM-level metrics can go in earlier and we can discuss the per-container metrics API in another.

+1 for this idea. We would like to see the NM metrics piece integrated.

> Add Localization overhead metrics to NM
> ---
>
> Key: YARN-1529
> URL: https://issues.apache.org/jira/browse/YARN-1529
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: nodemanager
> Reporter: Gera Shegalov
> Assignee: Chris Trezzo
> Priority: Major
> Attachments: YARN-1529.v01.patch, YARN-1529.v02.patch,
> YARN-1529.v03.patch, YARN-1529.v04.patch
[jira] [Commented] (YARN-1529) Add Localization overhead metrics to NM
[ https://issues.apache.org/jira/browse/YARN-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15427301#comment-15427301 ]

Jason Lowe commented on YARN-1529:

bq. One comment that I have is we are adding a new API, albeit a small one, for YARN application developers.

That's a great point, and actually I'd be perfectly happy if this JIRA simply added the NM-level metric source and skipped the container API part for now. If we're moving towards doing this via the ATS anyway, we may not want/need the env variable API. It might be worth splitting the patch so the less controversial NM-level metrics can go in earlier and we can discuss the per-container metrics API in another.

If the consensus is that this patch should include the per-container metrics API via the container env as well then I'm OK with that too. I also agree that hiding the implementation details of that API would be important, whether that's in this JIRA or another.

Either way the patch needs an update, and please feel free to do so.

> Add Localization overhead metrics to NM
> ---
>
> Key: YARN-1529
> URL: https://issues.apache.org/jira/browse/YARN-1529
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: nodemanager
> Reporter: Gera Shegalov
> Assignee: Chris Trezzo
> Attachments: YARN-1529.v01.patch, YARN-1529.v02.patch,
> YARN-1529.v03.patch, YARN-1529.v04.patch

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-1529) Add Localization overhead metrics to NM
[ https://issues.apache.org/jira/browse/YARN-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15427222#comment-15427222 ]

Chris Trezzo commented on YARN-1529:

Thanks [~jlowe] for the rebased patch! I agree that it would be nice to not tie these localization metrics to ATS so that more people can leverage them earlier.

One comment that I have is we are adding a new API, albeit a small one, for YARN application developers. This API is the serialized data we put into the environment variable (LOCALIZATION_COUNTERS) to communicate the localization statistics to the application-level container. Currently, if a YARN developer wants to leverage these metrics, they have to figure out how information is serialized into this env var and hope it doesn't change.

What do you think about adding a small class/method that defines this a little more formally and contains the deserialization logic? That way if another application, let's say TEZ, wants to leverage this data, they can just call the new deserialize method. If you think this is a good idea, I can post another patch with the added class. Thanks!

> Add Localization overhead metrics to NM
> ---
>
> Key: YARN-1529
> URL: https://issues.apache.org/jira/browse/YARN-1529
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: nodemanager
> Reporter: Gera Shegalov
> Assignee: Chris Trezzo
> Attachments: YARN-1529.v01.patch, YARN-1529.v02.patch,
> YARN-1529.v03.patch, YARN-1529.v04.patch
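The deserialization helper proposed in this comment could look roughly like the sketch below. This is not the class from any of the attached patches. The thread does not state how LOCALIZATION_COUNTERS is serialized, so the wire format (comma-separated decimal longs), the field order, and every name in the sketch are assumptions made purely for illustration.

```java
public final class LocalizationCounters {
    // Hypothetical field order; the real order is whatever the NM writes.
    public enum Field { BYTES_MISSED, BYTES_CACHED, FILES_MISSED, FILES_CACHED, MILLIS }

    private final long[] values;

    private LocalizationCounters(long[] values) {
        this.values = values;
    }

    /** Parses an assumed comma-separated list of longs, one per Field. */
    public static LocalizationCounters parse(String serialized) {
        String[] parts = serialized.trim().split(",");
        if (parts.length != Field.values().length) {
            throw new IllegalArgumentException(
                "expected " + Field.values().length + " counters, got " + parts.length);
        }
        long[] parsed = new long[parts.length];
        for (int i = 0; i < parts.length; i++) {
            parsed[i] = Long.parseLong(parts[i].trim());
        }
        return new LocalizationCounters(parsed);
    }

    public long get(Field field) {
        return values[field.ordinal()];
    }

    public static void main(String[] args) {
        // Inside a container the value would come from the environment.
        String raw = System.getenv("LOCALIZATION_COUNTERS");
        if (raw != null) {
            LocalizationCounters counters = parse(raw);
            System.out.println("files cached: " + counters.get(Field.FILES_CACHED));
        }
    }
}
```

Keeping the parsing in one place means an application such as Tez would depend only on `parse` and `get`, so the serialized form could change later without breaking callers, which is the concern the comment raises.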
[jira] [Commented] (YARN-1529) Add Localization overhead metrics to NM
[ https://issues.apache.org/jira/browse/YARN-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15389871#comment-15389871 ]

Chris Trezzo commented on YARN-1529:

I can take a crack at rebasing this patch and adjusting it so that it writes to ATS.

> Add Localization overhead metrics to NM
> ---
>
> Key: YARN-1529
> URL: https://issues.apache.org/jira/browse/YARN-1529
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: nodemanager
> Reporter: Gera Shegalov
> Assignee: Gera Shegalov
> Attachments: YARN-1529.v01.patch, YARN-1529.v02.patch, YARN-1529.v03.patch
[jira] [Commented] (YARN-1529) Add Localization overhead metrics to NM
[ https://issues.apache.org/jira/browse/YARN-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15389855#comment-15389855 ]

Chris Trezzo commented on YARN-1529:

Scratch that last comment. If the metrics are written to ATS, the application doesn't have to be aware of them at all. They are simply surfaced through the ATS UI.

> Add Localization overhead metrics to NM
> ---
>
> Key: YARN-1529
> URL: https://issues.apache.org/jira/browse/YARN-1529
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: nodemanager
> Reporter: Gera Shegalov
> Assignee: Gera Shegalov
> Attachments: YARN-1529.v01.patch, YARN-1529.v02.patch, YARN-1529.v03.patch
[jira] [Commented] (YARN-1529) Add Localization overhead metrics to NM
[ https://issues.apache.org/jira/browse/YARN-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15389841#comment-15389841 ]

Chris Trezzo commented on YARN-1529:

[~mingma] that makes total sense.

[~sjlee0] Is there anything that would prevent an application-level process running in a container from querying ATS for framework-level metrics about the container itself while the container is running?

As a side note, one interesting thing about these particular metrics is that, as they stand now, they do not change once the container is up and running (i.e. all localization for the container is done).

> Add Localization overhead metrics to NM
> ---
>
> Key: YARN-1529
> URL: https://issues.apache.org/jira/browse/YARN-1529
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: nodemanager
> Reporter: Gera Shegalov
> Assignee: Gera Shegalov
> Attachments: YARN-1529.v01.patch, YARN-1529.v02.patch, YARN-1529.v03.patch
[jira] [Commented] (YARN-1529) Add Localization overhead metrics to NM
[ https://issues.apache.org/jira/browse/YARN-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15388826#comment-15388826 ]

Ming Ma commented on YARN-1529:

With ATS v2 in trunk and other frameworks such as Tez wanting such a feature, I wonder if there is a way to implement it completely in YARN (without the MR change in MAPREDUCE-5696) by having YARN write framework-independent application metrics directly to ATS.

> Add Localization overhead metrics to NM
> ---
>
> Key: YARN-1529
> URL: https://issues.apache.org/jira/browse/YARN-1529
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: nodemanager
> Reporter: Gera Shegalov
> Assignee: Gera Shegalov
> Attachments: YARN-1529.v01.patch, YARN-1529.v02.patch, YARN-1529.v03.patch
[jira] [Commented] (YARN-1529) Add Localization overhead metrics to NM
[ https://issues.apache.org/jira/browse/YARN-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14267075#comment-14267075 ]

Hadoop QA commented on YARN-1529:

-1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12621292/YARN-1529.v03.patch
  against trunk revision 788ee35.

    -1 patch. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6260//console

This message is automatically generated.

> Add Localization overhead metrics to NM
> ---
>
> Key: YARN-1529
> URL: https://issues.apache.org/jira/browse/YARN-1529
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: nodemanager
> Reporter: Gera Shegalov
> Assignee: Gera Shegalov
> Attachments: YARN-1529.v01.patch, YARN-1529.v02.patch, YARN-1529.v03.patch
[jira] [Commented] (YARN-1529) Add Localization overhead metrics to NM
[ https://issues.apache.org/jira/browse/YARN-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14267068#comment-14267068 ]

Andy Schlaikjer commented on YARN-1529:

Any update on this? These new metrics look valuable.

> Add Localization overhead metrics to NM
> ---
>
> Key: YARN-1529
> URL: https://issues.apache.org/jira/browse/YARN-1529
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: nodemanager
> Reporter: Gera Shegalov
> Assignee: Gera Shegalov
> Attachments: YARN-1529.v01.patch, YARN-1529.v02.patch, YARN-1529.v03.patch
[jira] [Commented] (YARN-1529) Add Localization overhead metrics to NM
[ https://issues.apache.org/jira/browse/YARN-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13861403#comment-13861403 ]

Hadoop QA commented on YARN-1529:

+1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12621292/YARN-1529.v03.patch
  against trunk revision .

    +1 @author. The patch does not contain any @author tags.
    +1 tests included. The patch appears to include 2 new or modified test files.
    +1 javac. The applied patch does not increase the total number of javac compiler warnings.
    +1 javadoc. The javadoc tool did not generate any warning messages.
    +1 eclipse:eclipse. The patch built with eclipse:eclipse.
    +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
    +1 release audit. The applied patch does not increase the total number of release audit warnings.
    +1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.
    +1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2791//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2791//console

This message is automatically generated.

> Add Localization overhead metrics to NM
> ---
>
> Key: YARN-1529
> URL: https://issues.apache.org/jira/browse/YARN-1529
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: nodemanager
> Reporter: Gera Shegalov
> Assignee: Gera Shegalov
> Attachments: YARN-1529.v01.patch, YARN-1529.v02.patch, YARN-1529.v03.patch
[jira] [Commented] (YARN-1529) Add Localization overhead metrics to NM
[ https://issues.apache.org/jira/browse/YARN-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13861211#comment-13861211 ]

Hadoop QA commented on YARN-1529:

-1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12621247/YARN-1529.v02.patch
  against trunk revision .

    +1 @author. The patch does not contain any @author tags.
    +1 tests included. The patch appears to include 2 new or modified test files.
    +1 javac. The applied patch does not increase the total number of javac compiler warnings.
    -1 javadoc. The javadoc tool appears to have generated 5 warning messages.
    +1 eclipse:eclipse. The patch built with eclipse:eclipse.
    +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
    +1 release audit. The applied patch does not increase the total number of release audit warnings.
    +1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.
    +1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2787//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2787//console

This message is automatically generated.

> Add Localization overhead metrics to NM
> ---
>
> Key: YARN-1529
> URL: https://issues.apache.org/jira/browse/YARN-1529
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: nodemanager
> Reporter: Gera Shegalov
> Assignee: Gera Shegalov
> Attachments: YARN-1529.v01.patch, YARN-1529.v02.patch
[jira] [Commented] (YARN-1529) Add Localization overhead metrics to NM
[ https://issues.apache.org/jira/browse/YARN-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13861167#comment-13861167 ]

Hadoop QA commented on YARN-1529:

-1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12621228/YARN-1529.v02.patch
  against trunk revision .

    -1 patch. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2783//console

This message is automatically generated.

> Add Localization overhead metrics to NM
> ---
>
> Key: YARN-1529
> URL: https://issues.apache.org/jira/browse/YARN-1529
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: nodemanager
> Reporter: Gera Shegalov
> Assignee: Gera Shegalov
> Attachments: YARN-1529.v01.patch, YARN-1529.v02.patch
[jira] [Commented] (YARN-1529) Add Localization overhead metrics to NM
[ https://issues.apache.org/jira/browse/YARN-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13856439#comment-13856439 ]

Gera Shegalov commented on YARN-1529:

bq. Also, is there an MR jira for the per job stats?

I linked MAPREDUCE-5696 to this JIRA.

bq. Furthermore, shouldn't the per application implementation be such that all applications on YARN can leverage it as compared to just an MR specific implementation.

Ideally, yes. As stated in the previous comment, I am open to suggestions. As of now there seem to be no common application metrics. In MAPREDUCE-5696, I expose localization cost to containers as an environment variable (LOCALIZATION_COUNTERS). MR containers add them as TaskCounter. We can also include it in MRAppMetrics. Other applications can use this variable in some other way.

bq. Is there any comment/doc that describes the overall plan/approach that you are trying to implement?

The background is in YARN-1492.

bq. I am not sure how these metrics translate into any actionable insights for a cluster admin to act upon.

Users will see how localization overhead (shipping computation to data) compares to their container execution times. This should help them reconsider build/packaging strategies, encourage better use of DistributedCache, etc. Admins will be able to better dissect network utilization in the cluster. Our particular goal is to clearly demonstrate the usefulness of YARN-1492.

> Add Localization overhead metrics to NM
> ---
>
> Key: YARN-1529
> URL: https://issues.apache.org/jira/browse/YARN-1529
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: nodemanager
> Reporter: Gera Shegalov
> Assignee: Gera Shegalov
> Attachments: YARN-1529.v01.patch
[jira] [Commented] (YARN-1529) Add Localization overhead metrics to NM
[ https://issues.apache.org/jira/browse/YARN-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13856421#comment-13856421 ]

Hitesh Shah commented on YARN-1529:

bq. I am preparing a patch that exposes this information as MR counters for MRv2. Is there a better way to achieve this in an application-agnostic manner such that it is visible in the webUI?

Also, is there an MR jira for the per job stats? Furthermore, shouldn't the per application implementation be such that all applications on YARN can leverage it, as compared to just an MR specific implementation?

bq. Currently all resource types are lumped together. We can have a discussion whether it's helpful to expose a finer break down at the NM level or the app-level.

Is there any comment/doc that describes the overall plan/approach that you are trying to implement? I am not sure how these metrics translate into any actionable insights for a cluster admin to act upon.

> Add Localization overhead metrics to NM
> ---
>
> Key: YARN-1529
> URL: https://issues.apache.org/jira/browse/YARN-1529
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: nodemanager
> Reporter: Gera Shegalov
> Assignee: Gera Shegalov
> Attachments: YARN-1529.v01.patch
[jira] [Commented] (YARN-1529) Add Localization overhead metrics to NM
[ https://issues.apache.org/jira/browse/YARN-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13856042#comment-13856042 ]

Gera Shegalov commented on YARN-1529:

Hi [~hitesh], thanks for chiming in!

> Does the cache ratio account for the local resource visibility i.e. public cache misses are more important than cache misses for application visibility?

The current patch does not differentiate between cache visibilities. I am open to suggestions on whether a finer breakdown for cache misses would be helpful. The goal of this and a follow-up MAPREDUCE JIRA is to raise awareness at the aggregate level that shipping computation to data is not free.

> I assume the "LocalizationDownloadNanos" is an average per container? How does an average help when there are numerous application types with diff no. of resources and each container facing a different cache hit ratio? Is this something which needs to be augmented into the container status and not a general NM metric?

LocalizationDownloadNanos is the total sum of container launch delay due to localization. An average can be obtained as {code}LocalizationDownloadNanos / ContainersLaunched{code}.

> For that matter, what is the better option - tracking localization metrics on the NM level or tracking them on a per container/per app level?

I am preparing a patch that exposes this information as MR counters for MRv2. Is there a better way to achieve this in an application-agnostic manner such that it is visible in the webUI?

> Shouldn't there be a metric that tracks the actual size of the local resource cache on disk?

This is a very good idea in my opinion.

> What about different resource types - file/archive/pattern?

Currently all resource types are lumped together. We can have a discussion whether it's helpful to expose a finer breakdown at the NM level or the app level.

> Add Localization overhead metrics to NM
> ---
>
> Key: YARN-1529
> URL: https://issues.apache.org/jira/browse/YARN-1529
> Project: Hadoop YARN
> Issue Type: Bug
> Components: nodemanager
> Reporter: Gera Shegalov
> Assignee: Gera Shegalov
> Attachments: YARN-1529.v01.patch
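The two derived values discussed in this comment, the cached ratio from the issue description (ratio = 100 * caches / (caches + misses)) and the average launch delay (LocalizationDownloadNanos / ContainersLaunched), can be sketched as below. The class and method names are hypothetical, and returning 0 when the denominator is zero is an assumed convention rather than something taken from the patch.

```java
/** Illustrative helpers only; not part of NodeManagerMetrics. */
final class LocalizationMath {
  private LocalizationMath() {
  }

  /**
   * Percentage of localized files (or bytes) served from cache:
   * 100 * cached / (cached + missed).
   * Returns 0 when nothing has been localized yet (assumed convention).
   */
  static double cachedRatioPercent(long cached, long missed) {
    long total = cached + missed;
    return total == 0 ? 0.0 : 100.0 * cached / total;
  }

  /**
   * Average localization delay per container, from the running total
   * LocalizationDownloadNanos and the number of containers launched.
   * Returns 0 when no container has been launched (assumed convention).
   */
  static double averageDownloadNanos(long localizationDownloadNanos,
      long containersLaunched) {
    return containersLaunched == 0
        ? 0.0
        : (double) localizationDownloadNanos / containersLaunched;
  }
}
```

For example, 3 cache hits and 1 miss give a ratio of 75, and a total of 6,000,000,000 ns over 4 launched containers averages 1.5 s per container. Because both inputs are NM-wide totals, this average hides the per-application differences raised earlier in the thread.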
[jira] [Commented] (YARN-1529) Add Localization overhead metrics to NM
[ https://issues.apache.org/jira/browse/YARN-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13856023#comment-13856023 ]

Hitesh Shah commented on YARN-1529:

[~jira.shegalov] Could you add more details on how users should interpret these new metrics?

Does the cache ratio account for the local resource visibility, i.e. public cache misses are more important than cache misses for application visibility?

I assume the "LocalizationDownloadNanos" is an average per container? How does an average help when there are numerous application types with different numbers of resources and each container faces a different cache hit ratio? Is this something which needs to be augmented into the container status and not a general NM metric? For that matter, what is the better option - tracking localization metrics on the NM level or tracking them on a per container/per app level?

Further thoughts:
- Shouldn't there be a metric that tracks the actual size of the local resource cache on disk?
- How are public/private/application caches being considered?
- What about different resource types - file/archive/pattern?

> Add Localization overhead metrics to NM
> ---
>
> Key: YARN-1529
> URL: https://issues.apache.org/jira/browse/YARN-1529
> Project: Hadoop YARN
> Issue Type: Bug
> Components: nodemanager
> Reporter: Gera Shegalov
> Assignee: Gera Shegalov
> Attachments: YARN-1529.v01.patch
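One way the per-visibility breakdown asked about above could look, purely as a sketch: the thread states that the patch does not differentiate between cache visibilities, so nothing below reflects the actual implementation. The nested Visibility enum is a local stand-in for the PUBLIC/PRIVATE/APPLICATION visibilities, and the class name and methods are hypothetical.

```java
import java.util.EnumMap;
import java.util.Map;

/** Hypothetical per-visibility hit/miss counters; not thread-safe. */
final class VisibilityBreakdown {
  /** Local stand-in for the three local resource visibilities. */
  enum Visibility { PUBLIC, PRIVATE, APPLICATION }

  // For each visibility: index 0 holds cache hits, index 1 holds cache misses.
  private final Map<Visibility, long[]> counts = new EnumMap<>(Visibility.class);

  VisibilityBreakdown() {
    for (Visibility v : Visibility.values()) {
      counts.put(v, new long[2]);
    }
  }

  /** Records a localization request served from the local cache. */
  void recordHit(Visibility v) {
    counts.get(v)[0]++;
  }

  /** Records a localization request that required a download. */
  void recordMiss(Visibility v) {
    counts.get(v)[1]++;
  }

  /** 100 * hits / (hits + misses) for one visibility; 0 if no requests were seen. */
  double cachedRatioPercent(Visibility v) {
    long[] c = counts.get(v);
    long total = c[0] + c[1];
    return total == 0 ? 0.0 : 100.0 * c[0] / total;
  }
}
```

Keeping a separate ratio per visibility would let a low public-cache hit rate stand out instead of being averaged together with application-scoped resources, which are expected to miss more often.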
[jira] [Commented] (YARN-1529) Add Localization overhead metrics to NM
[ https://issues.apache.org/jira/browse/YARN-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13856014#comment-13856014 ]

Hadoop QA commented on YARN-1529:
---------------------------------

+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12620280/YARN-1529.v01.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 1 new or modified test files.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.
+1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2718//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2718//console

This message is automatically generated.