[jira] [Updated] (YARN-7673) ClassNotFoundException: org.apache.hadoop.yarn.server.api.DistributedSchedulingAMProtocol when using hadoop-client-minicluster
[ https://issues.apache.org/jira/browse/YARN-7673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeff Zhang updated YARN-7673:
-----------------------------
    Affects Version/s: 3.0.0

> ClassNotFoundException: org.apache.hadoop.yarn.server.api.DistributedSchedulingAMProtocol when using hadoop-client-minicluster
> ------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-7673
>                 URL: https://issues.apache.org/jira/browse/YARN-7673
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 3.0.0
>            Reporter: Jeff Zhang
>
> I'd like to use hadoop-client-minicluster for a Hadoop downstream project, but I encounter the following exception when starting the Hadoop minicluster. I checked hadoop-client-minicluster, and it indeed does not contain this class. Is this something missing when packaging the published jar?
> {code}
> java.lang.NoClassDefFoundError: org/apache/hadoop/yarn/server/api/DistributedSchedulingAMProtocol
> 	at java.lang.ClassLoader.defineClass1(Native Method)
> 	at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
> 	at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
> 	at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
> 	at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
> 	at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
> 	at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
> 	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> 	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
> 	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> 	at org.apache.hadoop.yarn.server.MiniYARNCluster.createResourceManager(MiniYARNCluster.java:851)
> 	at org.apache.hadoop.yarn.server.MiniYARNCluster.serviceInit(MiniYARNCluster.java:285)
> 	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
> {code}

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7673) ClassNotFoundException: org.apache.hadoop.yarn.server.api.DistributedSchedulingAMProtocol when using hadoop-client-minicluster
[ https://issues.apache.org/jira/browse/YARN-7673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16296390#comment-16296390 ]

Jeff Zhang commented on YARN-7673:
----------------------------------
\cc [~djp]

> ClassNotFoundException: org.apache.hadoop.yarn.server.api.DistributedSchedulingAMProtocol when using hadoop-client-minicluster
[jira] [Created] (YARN-7673) ClassNotFoundException: org.apache.hadoop.yarn.server.api.DistributedSchedulingAMProtocol when using hadoop-client-minicluster
Jeff Zhang created YARN-7673:
-----------------------------
             Summary: ClassNotFoundException: org.apache.hadoop.yarn.server.api.DistributedSchedulingAMProtocol when using hadoop-client-minicluster
                 Key: YARN-7673
                 URL: https://issues.apache.org/jira/browse/YARN-7673
             Project: Hadoop YARN
          Issue Type: Bug
            Reporter: Jeff Zhang

I'd like to use hadoop-client-minicluster for a Hadoop downstream project, but I encounter a NoClassDefFoundError for org/apache/hadoop/yarn/server/api/DistributedSchedulingAMProtocol when starting the Hadoop minicluster (full stack trace quoted in the issue updates above). I checked hadoop-client-minicluster, and it indeed does not contain this class. Is this something missing when packaging the published jar?
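When triaging a report like this, one quick way to confirm whether a class made it into a published jar is to probe for it from the classpath with `Class.forName` (or, equivalently, `jar tf hadoop-client-minicluster-*.jar | grep DistributedSchedulingAMProtocol` on the shaded jar itself). A minimal self-contained sketch, using the class name from the stack trace above:

```java
public class ClasspathProbe {
    // Returns true if the named class can be located on the current
    // classpath, false if loading it throws ClassNotFoundException.
    static boolean isPresent(String className) {
        try {
            // initialize=false: locate and define the class only,
            // without running its static initializers
            Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The class reported missing from hadoop-client-minicluster above
        String missing = "org.apache.hadoop.yarn.server.api.DistributedSchedulingAMProtocol";
        System.out.println(missing + " present: " + isPresent(missing));
        // A JDK class as a sanity check
        System.out.println("java.util.List present: " + isPresent("java.util.List"));
    }
}
```

Run with the same classpath the minicluster will use; if the probe returns false there, the failure is a packaging gap rather than a classloading quirk.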
[jira] [Commented] (YARN-6364) How to set the resource queue when starting a Spark job on YARN
[ https://issues.apache.org/jira/browse/YARN-6364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15932404#comment-15932404 ]

Jeff Zhang commented on YARN-6364:
----------------------------------
Set spark.yarn.queue in the Zeppelin interpreter setting.

> How to set the resource queue when starting a Spark job on YARN
> ---------------------------------------------------------------
>
>                 Key: YARN-6364
>                 URL: https://issues.apache.org/jira/browse/YARN-6364
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: sydt
>
> As we all know, YARN takes charge of resource management for Hadoop. When Zeppelin starts a Spark job in yarn-client mode, how do we set a designated resource queue on YARN, so that the Spark applications belonging to different users run in their respective YARN resource queues?

--
This message was sent by Atlassian JIRA (v6.3.15#6346)
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
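For reference, the property named in the comment above can be set either in Zeppelin's Spark interpreter settings or at submission time; a sketch, where the queue name `myqueue` is a placeholder:

```
# In Zeppelin's Spark interpreter properties (or spark-defaults.conf):
spark.yarn.queue	myqueue

# Equivalent flag when submitting directly with spark-submit:
spark-submit --queue myqueue ...
```

Either form routes the application to the named capacity/fair-scheduler queue; if the queue does not exist, YARN rejects the submission.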
[jira] [Resolved] (YARN-6364) How to set the resource queue when starting a Spark job on YARN
[ https://issues.apache.org/jira/browse/YARN-6364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeff Zhang resolved YARN-6364.
------------------------------
    Resolution: Invalid
[jira] [Commented] (YARN-3603) Application Attempts page confusing
[ https://issues.apache.org/jira/browse/YARN-3603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14904342#comment-14904342 ]

Jeff Zhang commented on YARN-3603:
----------------------------------
I still feel that excluding failed containers doesn't make sense. If some of my containers fail while the job is running, how can I check their logs through the web UI without killing the job?

> Application Attempts page confusing
> -----------------------------------
>
>                 Key: YARN-3603
>                 URL: https://issues.apache.org/jira/browse/YARN-3603
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: webapp
>    Affects Versions: 2.8.0
>            Reporter: Thomas Graves
>            Assignee: Sunil G
>         Attachments: 0001-YARN-3603.patch, 0002-YARN-3603.patch, 0003-YARN-3603.patch, ahs1.png
>
> The application attempts page (http://RM:8088/cluster/appattempt/appattempt_1431101480046_0003_01) is a bit confusing about what is going on. I think the table of containers there is only for running containers, and when the app is completed or killed it is empty. The table should have a label stating so.
> Also, the "AM Container" field is a link while the app is running but not once it is killed. That might be confusing.
> There is no link to the logs on this page, but there is one in the app attempt table at http://rm:8088/cluster/app/application_1431101480046_0003

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-4182) Killed containers disappear on app attempt page
[ https://issues.apache.org/jira/browse/YARN-4182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeff Zhang resolved YARN-4182.
------------------------------
    Resolution: Duplicate

> Killed containers disappear on app attempt page
> -----------------------------------------------
>
>                 Key: YARN-4182
>                 URL: https://issues.apache.org/jira/browse/YARN-4182
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: webapp
>            Reporter: Jeff Zhang
>         Attachments: 2015-09-18_1601.png
[jira] [Commented] (YARN-4182) Killed containers disappear on app attempt page
[ https://issues.apache.org/jira/browse/YARN-4182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14805223#comment-14805223 ]

Jeff Zhang commented on YARN-4182:
----------------------------------
Thanks [~bibinchundatt]. Closing this as a duplicate and adding a comment on YARN-3603.
[jira] [Commented] (YARN-3603) Application Attempts page confusing
[ https://issues.apache.org/jira/browse/YARN-3603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14805222#comment-14805222 ]

Jeff Zhang commented on YARN-3603:
----------------------------------
[~sunilg] If it is "Running Container ID", is it still necessary to include "Container Exit Status"? And is there any reason to exclude the killed containers here? I think it would help diagnosis if all the containers were included.
[jira] [Updated] (YARN-4182) Killed containers disappear on app attempt page
[ https://issues.apache.org/jira/browse/YARN-4182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeff Zhang updated YARN-4182:
-----------------------------
    Attachment: 2015-09-18_1601.png
[jira] [Created] (YARN-4182) Killed containers disappear on app attempt page
Jeff Zhang created YARN-4182:
-----------------------------
             Summary: Killed containers disappear on app attempt page
                 Key: YARN-4182
                 URL: https://issues.apache.org/jira/browse/YARN-4182
             Project: Hadoop YARN
          Issue Type: Bug
          Components: webapp
            Reporter: Jeff Zhang
[jira] [Updated] (YARN-4154) Tez Build with hadoop 2.6.1 fails due to MiniYarnCluster change
[ https://issues.apache.org/jira/browse/YARN-4154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeff Zhang updated YARN-4154:
-----------------------------
    Affects Version/s: 2.6.1
     Target Version/s: 2.6.1
             Priority: Blocker  (was: Major)

> Tez build with Hadoop 2.6.1 fails due to MiniYARNCluster change
> ---------------------------------------------------------------
>
>                 Key: YARN-4154
>                 URL: https://issues.apache.org/jira/browse/YARN-4154
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.6.1
>            Reporter: Jeff Zhang
>            Priority: Blocker
>
> {code}
> [ERROR] /mnt/nfs0/jzhang/tez-autobuild/tez/tez-plugins/tez-yarn-timeline-history/src/test/java/org/apache/tez/tests/MiniTezClusterWithTimeline.java:[92,5] no suitable constructor found for MiniYARNCluster(java.lang.String,int,int,int,int,boolean)
>     constructor org.apache.hadoop.yarn.server.MiniYARNCluster.MiniYARNCluster(java.lang.String,int,int,int,int) is not applicable
>       (actual and formal argument lists differ in length)
>     constructor org.apache.hadoop.yarn.server.MiniYARNCluster.MiniYARNCluster(java.lang.String,int,int,int) is not applicable
>       (actual and formal argument lists differ in length)
> {code}
> MR might have the same issue.
> \cc [~vinodkv]

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
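Downstream projects that must compile against several Hadoop versions sometimes work around a constructor change like this with reflection, trying the newer signature first and falling back to the older one. A self-contained sketch of that pattern, using a hypothetical stand-in `Cluster` class (which only has the older 5-arg constructor) instead of the real MiniYARNCluster:

```java
import java.lang.reflect.Constructor;

public class ConstructorFallback {
    // Stand-in for MiniYARNCluster: here only the older
    // (String,int,int,int,int) constructor exists.
    static class Cluster {
        final String desc;
        Cluster(String name, int rms, int nms, int localDirs, int logDirs) {
            this.desc = name + "/5-arg";
        }
    }

    // Try the newer (String,int,int,int,int,boolean) signature first;
    // on NoSuchMethodException fall back to the older 5-arg one.
    static Cluster create(String name) throws Exception {
        Class<Cluster> c = Cluster.class;
        try {
            Constructor<Cluster> newer = c.getDeclaredConstructor(
                String.class, int.class, int.class, int.class, int.class, boolean.class);
            return newer.newInstance(name, 1, 1, 1, 1, false);
        } catch (NoSuchMethodException e) {
            Constructor<Cluster> older = c.getDeclaredConstructor(
                String.class, int.class, int.class, int.class, int.class);
            return older.newInstance(name, 1, 1, 1, 1);
        }
    }

    public static void main(String[] args) throws Exception {
        // Falls back to the 5-arg constructor, since no 6-arg one exists here
        System.out.println(create("test").desc);
    }
}
```

The trade-off is losing compile-time checking of the constructor call, so this is usually confined to a small compatibility shim.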
[jira] [Updated] (YARN-4154) Tez Build with hadoop 2.6.1 fails due to MiniYarnCluster change
[ https://issues.apache.org/jira/browse/YARN-4154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeff Zhang updated YARN-4154:
-----------------------------
    Description:
{code}
[ERROR] /mnt/nfs0/jzhang/tez-autobuild/tez/tez-plugins/tez-yarn-timeline-history/src/test/java/org/apache/tez/tests/MiniTezClusterWithTimeline.java:[92,5] no suitable constructor found for MiniYARNCluster(java.lang.String,int,int,int,int,boolean)
    constructor org.apache.hadoop.yarn.server.MiniYARNCluster.MiniYARNCluster(java.lang.String,int,int,int,int) is not applicable
      (actual and formal argument lists differ in length)
    constructor org.apache.hadoop.yarn.server.MiniYARNCluster.MiniYARNCluster(java.lang.String,int,int,int) is not applicable
      (actual and formal argument lists differ in length)
{code}
MR might have the same issue.
\cc [~vinodkv]
[jira] [Updated] (YARN-4154) Tez Build with hadoop 2.6.1 fails due to MiniYarnCluster change
[ https://issues.apache.org/jira/browse/YARN-4154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeff Zhang updated YARN-4154:
-----------------------------
    Description:
{code}
[ERROR] /mnt/nfs0/jzhang/tez-autobuild/tez/tez-plugins/tez-yarn-timeline-history/src/test/java/org/apache/tez/tests/MiniTezClusterWithTimeline.java:[92,5] no suitable constructor found for MiniYARNCluster(java.lang.String,int,int,int,int,boolean)
    constructor org.apache.hadoop.yarn.server.MiniYARNCluster.MiniYARNCluster(java.lang.String,int,int,int,int) is not applicable
      (actual and formal argument lists differ in length)
    constructor org.apache.hadoop.yarn.server.MiniYARNCluster.MiniYARNCluster(java.lang.String,int,int,int) is not applicable
      (actual and formal argument lists differ in length)
{code}
\cc [~vinodkv]
[jira] [Updated] (YARN-4154) Tez Build with hadoop 2.6.1 fails due to MiniYarnCluster change
[ https://issues.apache.org/jira/browse/YARN-4154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeff Zhang updated YARN-4154:
-----------------------------
    Description:
{code}
[ERROR] /mnt/nfs0/jzhang/tez-autobuild/tez/tez-plugins/tez-yarn-timeline-history/src/test/java/org/apache/tez/tests/MiniTezClusterWithTimeline.java:[92,5] no suitable constructor found for MiniYARNCluster(java.lang.String,int,int,int,int,boolean)
    constructor org.apache.hadoop.yarn.server.MiniYARNCluster.MiniYARNCluster(java.lang.String,int,int,int,int) is not applicable
      (actual and formal argument lists differ in length)
    constructor org.apache.hadoop.yarn.server.MiniYARNCluster.MiniYARNCluster(java.lang.String,int,int,int) is not applicable
      (actual and formal argument lists differ in length)
{code}
[jira] [Created] (YARN-4154) Tez Build with hadoop 2.6.1 fails due to MiniYarnCluster change
Jeff Zhang created YARN-4154:
-----------------------------
             Summary: Tez build with Hadoop 2.6.1 fails due to MiniYARNCluster change
                 Key: YARN-4154
                 URL: https://issues.apache.org/jira/browse/YARN-4154
             Project: Hadoop YARN
          Issue Type: Bug
            Reporter: Jeff Zhang
[jira] [Commented] (YARN-2262) Few fields displaying wrong values in Timeline server after RM restart
[ https://issues.apache.org/jira/browse/YARN-2262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700867#comment-14700867 ]

Jeff Zhang commented on YARN-2262:
----------------------------------
And the documentation needs to be updated for the deprecation of FileSystemApplicationHistoryStore: http://hadoop.apache.org/docs/r2.6.0/hadoop-yarn/hadoop-yarn-site/TimelineServer.html

> Few fields displaying wrong values in Timeline server after RM restart
> ----------------------------------------------------------------------
>
>                 Key: YARN-2262
>                 URL: https://issues.apache.org/jira/browse/YARN-2262
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>    Affects Versions: 2.4.0
>            Reporter: Nishan Shetty
>            Assignee: Naganarasimha G R
>         Attachments: Capture.PNG, Capture1.PNG, yarn-testos-historyserver-HOST-10-18-40-95.log, yarn-testos-resourcemanager-HOST-10-18-40-84.log, yarn-testos-resourcemanager-HOST-10-18-40-95.log
>
> A few fields display wrong values in the Timeline server after RM restart:
> State: null
> FinalStatus: UNDEFINED
> Started: 8-Jul-2014 14:58:08
> Elapsed: 2562047397789hrs, 44mins, 47sec
[jira] [Commented] (YARN-2262) Few fields displaying wrong values in Timeline server after RM restart
[ https://issues.apache.org/jira/browse/YARN-2262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700861#comment-14700861 ]

Jeff Zhang commented on YARN-2262:
----------------------------------
But my app still cannot be recovered. Does that mean YARN cannot recover a running app?
{code}
2015-08-18 15:18:35,270 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Recovering attempt: appattempt_1439882258172_0001_01 with final state: null
2015-08-18 15:18:35,270 INFO org.apache.hadoop.yarn.server.resourcemanager.security.AMRMTokenSecretManager: Create AMRMToken for ApplicationAttempt: appattempt_1439882258172_0001_01
2015-08-18 15:18:35,273 INFO org.apache.hadoop.yarn.server.resourcemanager.security.AMRMTokenSecretManager: Creating password for appattempt_1439882258172_0001_01
2015-08-18 15:18:35,277 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: Application added - appId: application_1439882258172_0001 user: jzhang leaf-queue of parent: root #applications: 1
2015-08-18 15:18:35,278 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Accepted application application_1439882258172_0001 from user: jzhang, in queue: default
2015-08-18 15:18:35,278 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1439882258172_0001_01 State change from NEW to LAUNCHED
2015-08-18 15:18:35,278 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1439882258172_0001 State change from NEW to ACCEPTED
{code}
{code}
2015-08-18 15:18:36,305 ERROR org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: Application attempt appattempt_1439882258172_0001_01 doesn't exist in ApplicationMasterService cache.
2015-08-18 15:18:36,306 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 8030, call org.apache.hadoop.yarn.api.ApplicationMasterProtocolPB.allocate from 192.168.3.3:56241 Call#56 Retry#0
org.apache.hadoop.yarn.exceptions.ApplicationAttemptNotFoundException: Application attempt appattempt_1439882258172_0001_01 doesn't exist in ApplicationMasterService cache.
	at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:436)
	at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
	at org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)
2015-08-18 15:18:37,298 INFO org.apache.hadoop.yarn.util.RackResolver: Resolved 192.168.3.3 to /default-rack
{code}
[jira] [Commented] (YARN-2262) Few fields displaying wrong values in Timeline server after RM restart
[ https://issues.apache.org/jira/browse/YARN-2262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700845#comment-14700845 ]

Jeff Zhang commented on YARN-2262:
----------------------------------
Please ignore my last comment; I finally found how to use ATS for storing history data.
{code}
  private ApplicationHistoryManager createApplicationHistoryManager(
      Configuration conf) {
    // Backward compatibility:
    // APPLICATION_HISTORY_STORE is neither null nor empty, it means that the
    // user has enabled it explicitly.
    if (conf.get(YarnConfiguration.APPLICATION_HISTORY_STORE) == null
        || conf.get(YarnConfiguration.APPLICATION_HISTORY_STORE).length() == 0
        || conf.get(YarnConfiguration.APPLICATION_HISTORY_STORE).equals(
            NullApplicationHistoryStore.class.getName())) {
      return new ApplicationHistoryManagerOnTimelineStore(
          timelineDataManager, aclsManager);
    } else {
      LOG.warn("The filesystem based application history store is deprecated.");
      return new ApplicationHistoryManagerImpl();
    }
  }
{code}
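Reading the branching logic above, the timeline-backed history manager is selected when the history-store property is left unset, empty, or pointed at NullApplicationHistoryStore. A sketch of the corresponding yarn-site.xml fragment; the property key is my reading of YarnConfiguration.APPLICATION_HISTORY_STORE and the class path is an assumption, so both should be verified against the Hadoop release in use:

```
<!-- yarn-site.xml: leaving this property unset (or pointing it at
     NullApplicationHistoryStore, as here) selects the timeline-store-backed
     ApplicationHistoryManagerOnTimelineStore in the code above. -->
<property>
  <name>yarn.timeline-service.generic-application-history.store-class</name>
  <value>org.apache.hadoop.yarn.server.applicationhistoryservice.NullApplicationHistoryStore</value>
</property>
```

Setting any other non-empty store class takes the deprecated FileSystemApplicationHistoryStore code path instead.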
[jira] [Commented] (YARN-2262) Few fields displaying wrong values in Timeline server after RM restart
[ https://issues.apache.org/jira/browse/YARN-2262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700830#comment-14700830 ]

Jeff Zhang commented on YARN-2262:
----------------------------------
bq. No longer maintain FS based generic history store.
I can reproduce this issue easily by restarting the RM while an app is running. Looking at YARN-2033, I do see that app history data can now be stored in the Timeline service, but it looks like there is no ATS implementation of ApplicationHistoryStore; FileSystemApplicationHistoryStore is still the only feasible one for RM recovery. So does it make sense to stop maintaining it? Or am I missing something? [~zjshen] [~djp]
[jira] [Updated] (YARN-3763) Support fuzzy search in ATS
[ https://issues.apache.org/jira/browse/YARN-3763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeff Zhang updated YARN-3763:
-----------------------------
    Description:
Currently ATS only supports exact match. Sometimes fuzzy match may be helpful when the entities in ATS share a common prefix or suffix. Linked with TEZ-2531.
[jira] [Created] (YARN-3763) Support for fuzzy search in ATS
Jeff Zhang created YARN-3763:
-----------------------------
             Summary: Support for fuzzy search in ATS
                 Key: YARN-3763
                 URL: https://issues.apache.org/jira/browse/YARN-3763
             Project: Hadoop YARN
          Issue Type: Improvement
          Components: timelineserver
    Affects Versions: 2.7.0
            Reporter: Jeff Zhang

Currently ATS only supports exact match. Sometimes fuzzy match may be helpful when the entities in ATS share a common prefix or suffix.
[jira] [Updated] (YARN-3763) Support fuzzy search in ATS
[ https://issues.apache.org/jira/browse/YARN-3763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Zhang updated YARN-3763: - Summary: Support fuzzy search in ATS (was: Support for fuzzy search in ATS) > Support fuzzy search in ATS > --- > > Key: YARN-3763 > URL: https://issues.apache.org/jira/browse/YARN-3763 > Project: Hadoop YARN > Issue Type: Improvement > Components: timelineserver >Affects Versions: 2.7.0 >Reporter: Jeff Zhang > > Currently ATS only supports exact match. Sometimes fuzzy match may be helpful > when the entities in ATS have some common prefix or suffix. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
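The prefix/suffix matching proposed in YARN-3763 can be illustrated with a small sketch. This is not an actual ATS API — FuzzyEntityFilter and its methods are hypothetical names — it only shows the filtering semantics a fuzzy search might apply to entity IDs instead of exact match:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/**
 * Hypothetical sketch of fuzzy (prefix/suffix) matching over entity IDs,
 * illustrating the semantics proposed in YARN-3763. Not an ATS API.
 */
public class FuzzyEntityFilter {

  /** Returns entity IDs that start with the given prefix. */
  public static List<String> byPrefix(List<String> entityIds, String prefix) {
    return entityIds.stream()
        .filter(id -> id.startsWith(prefix))
        .collect(Collectors.toList());
  }

  /** Returns entity IDs that end with the given suffix. */
  public static List<String> bySuffix(List<String> entityIds, String suffix) {
    return entityIds.stream()
        .filter(id -> id.endsWith(suffix))
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<String> ids = Arrays.asList("tez_dag_1", "tez_dag_2", "mr_job_1");
    System.out.println(byPrefix(ids, "tez_"));  // [tez_dag_1, tez_dag_2]
    System.out.println(bySuffix(ids, "_1"));    // [tez_dag_1, mr_job_1]
  }
}
```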
[jira] [Commented] (YARN-3755) Log the command of launching containers
[ https://issues.apache.org/jira/browse/YARN-3755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14570276#comment-14570276 ] Jeff Zhang commented on YARN-3755: -- Closing it as won't fix. > Log the command of launching containers > --- > > Key: YARN-3755 > URL: https://issues.apache.org/jira/browse/YARN-3755 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.7.0 >Reporter: Jeff Zhang >Assignee: Jeff Zhang > Attachments: YARN-3755-1.patch, YARN-3755-2.patch > > > In the resource manager log, YARN logs the command for launching the AM, > which is very useful. But there is no such log in the NM log for launching > containers, which makes it difficult to diagnose failures when containers > fail to launch due to an issue in the commands. Although users can look at > the commands in the container launch script file, that is an internal detail > of YARN that users usually don't know about. From the user's perspective, > they only know the commands they specify when building a YARN application. > {code} > 2015-06-01 16:06:42,245 INFO > org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Command > to launch container container_1433145984561_0001_01_01 : > $JAVA_HOME/bin/java -server -Djava.net.preferIPv4Stack=true > -Dhadoop.metrics.log.level=WARN -Xmx1024m > -Dlog4j.configuratorClass=org.apache.tez.common.TezLog4jConfigurator > -Dlog4j.configuration=tez-container-log4j.properties > -Dyarn.app.container.log.dir= -Dtez.root.logger=info,CLA > -Dsun.nio.ch.bugLevel='' org.apache.tez.dag.app.DAGAppMaster > 1>/stdout 2>/stderr > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3755) Log the command of launching containers
[ https://issues.apache.org/jira/browse/YARN-3755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14570275#comment-14570275 ] Jeff Zhang commented on YARN-3755: -- bq. How about we let individual frameworks like MapReduce/Tez log them as needed? That seems like the right place for debugging too - app developers don't always get access to the daemon logs. Makes sense. > Log the command of launching containers > --- > > Key: YARN-3755 > URL: https://issues.apache.org/jira/browse/YARN-3755 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.7.0 >Reporter: Jeff Zhang >Assignee: Jeff Zhang > Attachments: YARN-3755-1.patch, YARN-3755-2.patch > > > In the resource manager log, YARN logs the command for launching the AM, > which is very useful. But there is no such log in the NM log for launching > containers, which makes it difficult to diagnose failures when containers > fail to launch due to an issue in the commands. Although users can look at > the commands in the container launch script file, that is an internal detail > of YARN that users usually don't know about. From the user's perspective, > they only know the commands they specify when building a YARN application. > {code} > 2015-06-01 16:06:42,245 INFO > org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Command > to launch container container_1433145984561_0001_01_01 : > $JAVA_HOME/bin/java -server -Djava.net.preferIPv4Stack=true > -Dhadoop.metrics.log.level=WARN -Xmx1024m > -Dlog4j.configuratorClass=org.apache.tez.common.TezLog4jConfigurator > -Dlog4j.configuration=tez-container-log4j.properties > -Dyarn.app.container.log.dir= -Dtez.root.logger=info,CLA > -Dsun.nio.ch.bugLevel='' org.apache.tez.dag.app.DAGAppMaster > 1>/stdout 2>/stderr > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3755) Log the command of launching containers
[ https://issues.apache.org/jira/browse/YARN-3755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Zhang updated YARN-3755: - Attachment: YARN-3755-2.patch Uploaded a new patch to address the checkstyle issue. > Log the command of launching containers > --- > > Key: YARN-3755 > URL: https://issues.apache.org/jira/browse/YARN-3755 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.7.0 >Reporter: Jeff Zhang >Assignee: Jeff Zhang > Attachments: YARN-3755-1.patch, YARN-3755-2.patch > > > In the resource manager log, YARN logs the command for launching the AM, > which is very useful. But there is no such log in the NM log for launching > containers, which makes it difficult to diagnose failures when containers > fail to launch due to an issue in the commands. Although users can look at > the commands in the container launch script file, that is an internal detail > of YARN that users usually don't know about. From the user's perspective, > they only know the commands they specify when building a YARN application. > {code} > 2015-06-01 16:06:42,245 INFO > org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Command > to launch container container_1433145984561_0001_01_01 : > $JAVA_HOME/bin/java -server -Djava.net.preferIPv4Stack=true > -Dhadoop.metrics.log.level=WARN -Xmx1024m > -Dlog4j.configuratorClass=org.apache.tez.common.TezLog4jConfigurator > -Dlog4j.configuration=tez-container-log4j.properties > -Dyarn.app.container.log.dir= -Dtez.root.logger=info,CLA > -Dsun.nio.ch.bugLevel='' org.apache.tez.dag.app.DAGAppMaster > 1>/stdout 2>/stderr > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3755) Log the command of launching containers
[ https://issues.apache.org/jira/browse/YARN-3755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Zhang updated YARN-3755: - Target Version/s: 2.7.1 > Log the command of launching containers > --- > > Key: YARN-3755 > URL: https://issues.apache.org/jira/browse/YARN-3755 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.7.0 >Reporter: Jeff Zhang >Assignee: Jeff Zhang > Attachments: YARN-3755-1.patch > > > In the resource manager log, YARN logs the command for launching the AM, > which is very useful. But there is no such log in the NM log for launching > containers, which makes it difficult to diagnose failures when containers > fail to launch due to an issue in the commands. Although users can look at > the commands in the container launch script file, that is an internal detail > of YARN that users usually don't know about. From the user's perspective, > they only know the commands they specify when building a YARN application. > {code} > 2015-06-01 16:06:42,245 INFO > org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Command > to launch container container_1433145984561_0001_01_01 : > $JAVA_HOME/bin/java -server -Djava.net.preferIPv4Stack=true > -Dhadoop.metrics.log.level=WARN -Xmx1024m > -Dlog4j.configuratorClass=org.apache.tez.common.TezLog4jConfigurator > -Dlog4j.configuration=tez-container-log4j.properties > -Dyarn.app.container.log.dir= -Dtez.root.logger=info,CLA > -Dsun.nio.ch.bugLevel='' org.apache.tez.dag.app.DAGAppMaster > 1>/stdout 2>/stderr > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3755) Log the command of launching containers
[ https://issues.apache.org/jira/browse/YARN-3755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Zhang updated YARN-3755: - Attachment: YARN-3755-1.patch > Log the command of launching containers > --- > > Key: YARN-3755 > URL: https://issues.apache.org/jira/browse/YARN-3755 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Jeff Zhang >Assignee: Jeff Zhang > Attachments: YARN-3755-1.patch > > > In the resource manager log, YARN logs the command for launching the AM, > which is very useful. But there is no such log in the NM log for launching > containers, which makes it difficult to diagnose failures when containers > fail to launch due to an issue in the commands. Although users can look at > the commands in the container launch script file, that is an internal detail > of YARN that users usually don't know about. From the user's perspective, > they only know the commands they specify when building a YARN application. > {code} > 2015-06-01 16:06:42,245 INFO > org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Command > to launch container container_1433145984561_0001_01_01 : > $JAVA_HOME/bin/java -server -Djava.net.preferIPv4Stack=true > -Dhadoop.metrics.log.level=WARN -Xmx1024m > -Dlog4j.configuratorClass=org.apache.tez.common.TezLog4jConfigurator > -Dlog4j.configuration=tez-container-log4j.properties > -Dyarn.app.container.log.dir= -Dtez.root.logger=info,CLA > -Dsun.nio.ch.bugLevel='' org.apache.tez.dag.app.DAGAppMaster > 1>/stdout 2>/stderr > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3755) Log the command of launching containers
[ https://issues.apache.org/jira/browse/YARN-3755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Zhang updated YARN-3755: - Description: In the resource manager log, YARN logs the command for launching the AM, which is very useful. But there is no such log in the NM log for launching containers, which makes it difficult to diagnose failures when containers fail to launch due to an issue in the commands. Although users can look at the commands in the container launch script file, that is an internal detail of YARN that users usually don't know about. From the user's perspective, they only know the commands they specify when building a YARN application. {code} 2015-06-01 16:06:42,245 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Command to launch container container_1433145984561_0001_01_01 : $JAVA_HOME/bin/java -server -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Xmx1024m -Dlog4j.configuratorClass=org.apache.tez.common.TezLog4jConfigurator -Dlog4j.configuration=tez-container-log4j.properties -Dyarn.app.container.log.dir= -Dtez.root.logger=info,CLA -Dsun.nio.ch.bugLevel='' org.apache.tez.dag.app.DAGAppMaster 1>/stdout 2>/stderr {code} was: In the resource manager, yarn would log the command for launching AM, this is very useful. But there's no such log in the NN log for launching containers. It would be difficult to diagnose when containers fails to launch due to some issue in the commands. Although use can look at the commands in the container launch script file, this is an internal things of yarn, usually user don't know that. In user's perspective, they only know what command they specify when building yarn application.
{code} 2015-06-01 16:06:42,245 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Command to launch container container_1433145984561_0001_01_01 : $JAVA_HOME/bin/java -server -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Xmx1024m -Dlog4j.configuratorClass=org.apache.tez.common.TezLog4jConfigurator -Dlog4j.configuration=tez-container-log4j.properties -Dyarn.app.container.log.dir= -Dtez.root.logger=info,CLA -Dsun.nio.ch.bugLevel='' org.apache.tez.dag.app.DAGAppMaster 1>/stdout 2>/stderr {code} > Log the command of launching containers > --- > > Key: YARN-3755 > URL: https://issues.apache.org/jira/browse/YARN-3755 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Jeff Zhang >Assignee: Jeff Zhang > Attachments: YARN-3755-1.patch > > > In the resource manager log, YARN logs the command for launching the AM, > which is very useful. But there is no such log in the NM log for launching > containers, which makes it difficult to diagnose failures when containers > fail to launch due to an issue in the commands. Although users can look at > the commands in the container launch script file, that is an internal detail > of YARN that users usually don't know about. From the user's perspective, > they only know the commands they specify when building a YARN application. > {code} > 2015-06-01 16:06:42,245 INFO > org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Command > to launch container container_1433145984561_0001_01_01 : > $JAVA_HOME/bin/java -server -Djava.net.preferIPv4Stack=true > -Dhadoop.metrics.log.level=WARN -Xmx1024m > -Dlog4j.configuratorClass=org.apache.tez.common.TezLog4jConfigurator > -Dlog4j.configuration=tez-container-log4j.properties > -Dyarn.app.container.log.dir= -Dtez.root.logger=info,CLA > -Dsun.nio.ch.bugLevel='' org.apache.tez.dag.app.DAGAppMaster > 1>/stdout 2>/stderr > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3755) Log the command of launching containers
Jeff Zhang created YARN-3755: Summary: Log the command of launching containers Key: YARN-3755 URL: https://issues.apache.org/jira/browse/YARN-3755 Project: Hadoop YARN Issue Type: Improvement Reporter: Jeff Zhang In the resource manager log, YARN logs the command for launching the AM, which is very useful. But there is no such log in the NM log for launching containers, which makes it difficult to diagnose failures when containers fail to launch due to an issue in the commands. Although users can look at the commands in the container launch script file, that is an internal detail of YARN that users usually don't know about. From the user's perspective, they only know the commands they specify when building a YARN application. {code} 2015-06-01 16:06:42,245 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Command to launch container container_1433145984561_0001_01_01 : $JAVA_HOME/bin/java -server -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Xmx1024m -Dlog4j.configuratorClass=org.apache.tez.common.TezLog4jConfigurator -Dlog4j.configuration=tez-container-log4j.properties -Dyarn.app.container.log.dir= -Dtez.root.logger=info,CLA -Dsun.nio.ch.bugLevel='' org.apache.tez.dag.app.DAGAppMaster 1>/stdout 2>/stderr {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3755) Log the command of launching containers
[ https://issues.apache.org/jira/browse/YARN-3755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Zhang reassigned YARN-3755: Assignee: Jeff Zhang > Log the command of launching containers > --- > > Key: YARN-3755 > URL: https://issues.apache.org/jira/browse/YARN-3755 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Jeff Zhang >Assignee: Jeff Zhang > > In the resource manager log, YARN logs the command for launching the AM, > which is very useful. But there is no such log in the NM log for launching > containers, which makes it difficult to diagnose failures when containers > fail to launch due to an issue in the commands. Although users can look at > the commands in the container launch script file, that is an internal detail > of YARN that users usually don't know about. From the user's perspective, > they only know the commands they specify when building a YARN application. > {code} > 2015-06-01 16:06:42,245 INFO > org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Command > to launch container container_1433145984561_0001_01_01 : > $JAVA_HOME/bin/java -server -Djava.net.preferIPv4Stack=true > -Dhadoop.metrics.log.level=WARN -Xmx1024m > -Dlog4j.configuratorClass=org.apache.tez.common.TezLog4jConfigurator > -Dlog4j.configuration=tez-container-log4j.properties > -Dyarn.app.container.log.dir= -Dtez.root.logger=info,CLA > -Dsun.nio.ch.bugLevel='' org.apache.tez.dag.app.DAGAppMaster > 1>/stdout 2>/stderr > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
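The logging YARN-3755 asks for would look like the AMLauncher line quoted in the description. A minimal sketch of building such a log line before launch (ContainerCommandLogger is a hypothetical helper for illustration, not the actual NodeManager code):

```java
import java.util.Arrays;
import java.util.List;

/**
 * Sketch of the logging proposed in YARN-3755: join the launch-context
 * commands into one log line, the same way AMLauncher logs the AM
 * command. Hypothetical helper, not actual Hadoop code.
 */
public class ContainerCommandLogger {

  /** Builds the log line; in the NM this would be passed to LOG.info(...). */
  public static String format(String containerId, List<String> commands) {
    return "Command to launch container " + containerId + " : "
        + String.join(" ", commands);
  }

  public static void main(String[] args) {
    List<String> cmds = Arrays.asList(
        "$JAVA_HOME/bin/java", "-Xmx1024m",
        "org.apache.tez.dag.app.DAGAppMaster",
        "1>/stdout", "2>/stderr");
    System.out.println(format("container_1433145984561_0001_01_01", cmds));
  }
}
```

As the discussion concluded, this logging fits better in the individual frameworks than in the NM daemon, since app developers don't always have access to daemon logs.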
[jira] [Updated] (YARN-1609) Add Service Container type to NodeManager in YARN
[ https://issues.apache.org/jira/browse/YARN-1609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Zhang updated YARN-1609: - Assignee: (was: Jeff Zhang) > Add Service Container type to NodeManager in YARN > - > > Key: YARN-1609 > URL: https://issues.apache.org/jira/browse/YARN-1609 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 2.2.0 >Reporter: Wangda Tan (No longer used) > Attachments: Add Service Container type to NodeManager in YARN-V1.pdf > > > From our work to support running OpenMPI on YARN (MAPREDUCE-2911), we found > that it’s important to have a framework-specific daemon process manage the > tasks on each node directly. The daemon process, most likely similar in other > frameworks as well, provides critical services to tasks running on that node > (for example “wireup”, spawning user processes in large numbers at once, etc.). > In YARN, it’s hard, if not impossible, to have those processes managed by > YARN. > We propose to extend the container model on the NodeManager side to support a > “Service Container” to run/manage such framework daemon/service processes. We > believe this is very useful to other application framework developers as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3171) Sort by application id doesn't work in ATS web ui
[ https://issues.apache.org/jira/browse/YARN-3171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14315680#comment-14315680 ] Jeff Zhang commented on YARN-3171: -- [~Naganarasimha] Please go ahead. > Sort by application id doesn't work in ATS web ui > - > > Key: YARN-3171 > URL: https://issues.apache.org/jira/browse/YARN-3171 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineserver >Affects Versions: 2.6.0 >Reporter: Jeff Zhang >Assignee: Naganarasimha G R >Priority: Minor > Attachments: ats_webui.png > > > The order doesn't change when I click the column header -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3171) Sort by application id doesn't work in ATS web ui
[ https://issues.apache.org/jira/browse/YARN-3171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Zhang updated YARN-3171: - Attachment: ats_webui.png Attached a screenshot. > Sort by application id doesn't work in ATS web ui > - > > Key: YARN-3171 > URL: https://issues.apache.org/jira/browse/YARN-3171 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineserver >Affects Versions: 2.6.0 >Reporter: Jeff Zhang >Assignee: Naganarasimha G R > Attachments: ats_webui.png > > > The order doesn't change when I click the column header -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3171) Sort by application id doesn't work in ATS web ui
[ https://issues.apache.org/jira/browse/YARN-3171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Zhang updated YARN-3171: - Priority: Minor (was: Major) > Sort by application id doesn't work in ATS web ui > - > > Key: YARN-3171 > URL: https://issues.apache.org/jira/browse/YARN-3171 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineserver >Affects Versions: 2.6.0 >Reporter: Jeff Zhang >Assignee: Naganarasimha G R >Priority: Minor > Attachments: ats_webui.png > > > The order doesn't change when I click the column header -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3171) Sort by application id doesn't work in ATS web ui
[ https://issues.apache.org/jira/browse/YARN-3171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Zhang updated YARN-3171: - Summary: Sort by application id doesn't work in ATS web ui (was: Sort by application id don't work in ATS web ui) > Sort by application id doesn't work in ATS web ui > - > > Key: YARN-3171 > URL: https://issues.apache.org/jira/browse/YARN-3171 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineserver >Affects Versions: 2.6.0 >Reporter: Jeff Zhang > > The order doesn't change when I click the column header -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3171) Sort by application id don't work in ATS web ui
Jeff Zhang created YARN-3171: Summary: Sort by application id don't work in ATS web ui Key: YARN-3171 URL: https://issues.apache.org/jira/browse/YARN-3171 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Affects Versions: 2.6.0 Reporter: Jeff Zhang The order doesn't change when I click the column header -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1743) Decorate event transitions and the event-types with their behaviour
[ https://issues.apache.org/jira/browse/YARN-1743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Zhang updated YARN-1743: - Attachment: YARN-1743-3.patch [~leftnoteasy] Attached the updated patch with the Apache license header added. > Decorate event transitions and the event-types with their behaviour > --- > > Key: YARN-1743 > URL: https://issues.apache.org/jira/browse/YARN-1743 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Vinod Kumar Vavilapalli >Assignee: Jeff Zhang > Labels: documentation > Attachments: NodeManager.gv, NodeManager.pdf, YARN-1743-2.patch, > YARN-1743-3.patch, YARN-1743.patch > > > Helps to annotate the transitions with (start-state, end-state) pair and the > events with (source, destination) pair. > Not just readability, we may also use them to generate the event diagrams > across components. > Not a blocker for 0.23, but let's see. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1743) Decorate event transitions and the event-types with their behaviour
[ https://issues.apache.org/jira/browse/YARN-1743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14287192#comment-14287192 ] Jeff Zhang commented on YARN-1743: -- [~leftnoteasy] Uploaded a new patch: * Changed the annotation type to be Class * Added more javadoc to explain the usage of the 2 annotations * The patch only uses the annotation on ApplicationEventType; for the other events we can create follow-up JIRAs. > Decorate event transitions and the event-types with their behaviour > --- > > Key: YARN-1743 > URL: https://issues.apache.org/jira/browse/YARN-1743 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Vinod Kumar Vavilapalli >Assignee: Jeff Zhang > Labels: documentation > Attachments: NodeManager.gv, NodeManager.pdf, YARN-1743-2.patch, > YARN-1743.patch > > > Helps to annotate the transitions with (start-state, end-state) pair and the > events with (source, destination) pair. > Not just readability, we may also use them to generate the event diagrams > across components. > Not a blocker for 0.23, but let's see. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
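A sketch of what the (source, destination) annotation discussed in YARN-1743 might look like, with the annotation type holding Class values as the comment describes. All names here (EventMeta, the stand-in component classes) are hypothetical illustrations, not the actual patch contents:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

/**
 * Hypothetical sketch of annotating an event-type enum constant with its
 * (source, destination) components, so event diagrams could be generated
 * from the annotations via reflection.
 */
public class EventAnnotationSketch {

  @Retention(RetentionPolicy.RUNTIME)
  @Target(ElementType.FIELD)
  public @interface EventMeta {
    Class<?> source();
    Class<?> destination();
  }

  // Stand-ins for the real source/destination components.
  static class ResourceManager {}
  static class ApplicationImpl {}

  public enum ApplicationEventType {
    @EventMeta(source = ResourceManager.class, destination = ApplicationImpl.class)
    APP_START
  }

  /** Reads the annotation back, as a diagram generator would. */
  public static String describe() {
    try {
      EventMeta meta = ApplicationEventType.class
          .getField("APP_START")
          .getAnnotation(EventMeta.class);
      return meta.source().getSimpleName() + " -> "
          + meta.destination().getSimpleName();
    } catch (NoSuchFieldException e) {
      throw new AssertionError(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(describe());  // ResourceManager -> ApplicationImpl
  }
}
```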
[jira] [Updated] (YARN-1743) Decorate event transitions and the event-types with their behaviour
[ https://issues.apache.org/jira/browse/YARN-1743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Zhang updated YARN-1743: - Attachment: YARN-1743-2.patch > Decorate event transitions and the event-types with their behaviour > --- > > Key: YARN-1743 > URL: https://issues.apache.org/jira/browse/YARN-1743 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Vinod Kumar Vavilapalli >Assignee: Jeff Zhang > Labels: documentation > Attachments: NodeManager.gv, NodeManager.pdf, YARN-1743-2.patch, > YARN-1743.patch > > > Helps to annotate the transitions with (start-state, end-state) pair and the > events with (source, destination) pair. > Not just readability, we may also use them to generate the event diagrams > across components. > Not a blocker for 0.23, but let's see. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3000) YARN_PID_DIR should be visible in yarn-env.sh
[ https://issues.apache.org/jira/browse/YARN-3000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14265426#comment-14265426 ] Jeff Zhang commented on YARN-3000: -- [~aw] Thanks for the clarification; then it makes sense to deprecate YARN_PID_DIR. > YARN_PID_DIR should be visible in yarn-env.sh > - > > Key: YARN-3000 > URL: https://issues.apache.org/jira/browse/YARN-3000 > Project: Hadoop YARN > Issue Type: Bug > Components: scripts >Affects Versions: 2.6.0 >Reporter: Jeff Zhang >Assignee: Rohith >Priority: Minor > Attachments: 0001-YARN-3000.patch > > > Currently YARN_PID_DIR only shows up in yarn-daemon.sh, which is not the > place for users to set up environment variables. IMO, yarn-env.sh is the > place for users to set up environment variables, just like hadoop-env.sh, so > it's better to put YARN_PID_DIR into yarn-env.sh (it can be put in a comment > just like YARN_RESOURCEMANAGER_HEAPSIZE). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3000) YARN_PID_DIR should be visible in yarn-env.sh
[ https://issues.apache.org/jira/browse/YARN-3000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14264373#comment-14264373 ] Jeff Zhang commented on YARN-3000: -- BTW, what's the JIRA for making HADOOP_PID_DIR replace YARN_PID_DIR? As far as I know, hadoop-2.6 didn't do that. > YARN_PID_DIR should be visible in yarn-env.sh > - > > Key: YARN-3000 > URL: https://issues.apache.org/jira/browse/YARN-3000 > Project: Hadoop YARN > Issue Type: Bug > Components: scripts >Affects Versions: 2.6.0 >Reporter: Jeff Zhang >Assignee: Rohith >Priority: Minor > Attachments: 0001-YARN-3000.patch > > > Currently YARN_PID_DIR only shows up in yarn-daemon.sh, which is not the > place for users to set up environment variables. IMO, yarn-env.sh is the > place for users to set up environment variables, just like hadoop-env.sh, so > it's better to put YARN_PID_DIR into yarn-env.sh (it can be put in a comment > just like YARN_RESOURCEMANAGER_HEAPSIZE). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3000) YARN_PID_DIR should be visible in yarn-env.sh
[ https://issues.apache.org/jira/browse/YARN-3000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14264371#comment-14264371 ] Jeff Zhang commented on YARN-3000: -- It makes sense to deprecate YARN_PID_DIR if HADOOP_PID_DIR is used for YARN in trunk. > YARN_PID_DIR should be visible in yarn-env.sh > - > > Key: YARN-3000 > URL: https://issues.apache.org/jira/browse/YARN-3000 > Project: Hadoop YARN > Issue Type: Bug > Components: scripts >Affects Versions: 2.6.0 >Reporter: Jeff Zhang >Assignee: Rohith >Priority: Minor > Attachments: 0001-YARN-3000.patch > > > Currently YARN_PID_DIR only shows up in yarn-daemon.sh, which is not the > place for users to set up environment variables. IMO, yarn-env.sh is the > place for users to set up environment variables, just like hadoop-env.sh, so > it's better to put YARN_PID_DIR into yarn-env.sh (it can be put in a comment > just like YARN_RESOURCEMANAGER_HEAPSIZE). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3000) YARN_PID_DIR should be visible in yarn-env.sh
[ https://issues.apache.org/jira/browse/YARN-3000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14262022#comment-14262022 ] Jeff Zhang commented on YARN-3000: -- Sure, please take it over. > YARN_PID_DIR should be visible in yarn-env.sh > - > > Key: YARN-3000 > URL: https://issues.apache.org/jira/browse/YARN-3000 > Project: Hadoop YARN > Issue Type: Bug > Components: scripts >Affects Versions: 2.6.0 >Reporter: Jeff Zhang >Priority: Minor > > Currently YARN_PID_DIR only shows up in yarn-daemon.sh, which is not the > place for users to set up environment variables. IMO, yarn-env.sh is the > place for users to set up environment variables, just like hadoop-env.sh, so > it's better to put YARN_PID_DIR into yarn-env.sh (it can be put in a comment > just like YARN_RESOURCEMANAGER_HEAPSIZE). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3000) YARN_PID_DIR should be visible in yarn-env.sh
Jeff Zhang created YARN-3000: Summary: YARN_PID_DIR should be visible in yarn-env.sh Key: YARN-3000 URL: https://issues.apache.org/jira/browse/YARN-3000 Project: Hadoop YARN Issue Type: Bug Components: scripts Affects Versions: 2.6.0 Reporter: Jeff Zhang Priority: Minor Currently YARN_PID_DIR only shows up in yarn-daemon.sh, which is not the place for users to set up environment variables. IMO, yarn-env.sh is the place for users to set up environment variables, just like hadoop-env.sh, so it's better to put YARN_PID_DIR into yarn-env.sh (it can be put in a comment just like YARN_RESOURCEMANAGER_HEAPSIZE). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
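The YARN-3000 proposal amounts to surfacing YARN_PID_DIR as a commented-out entry in yarn-env.sh, mirroring how YARN_RESOURCEMANAGER_HEAPSIZE is surfaced there. A sketch of such an entry (the directory path shown is only an example, not a Hadoop default):

```shell
# Sketch of the proposed yarn-env.sh addition (YARN-3000).
# The directory where daemon pid files are stored; picked up by
# yarn-daemon.sh. Uncomment and adjust to override.
# export YARN_PID_DIR=/var/run/hadoop-yarn
```

As the discussion concluded, YARN_PID_DIR was instead deprecated in favor of HADOOP_PID_DIR in trunk, so this entry would only matter for the 2.x line.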
[jira] [Commented] (YARN-2560) Diagnostics is delayed to passed to ApplicationReport
[ https://issues.apache.org/jira/browse/YARN-2560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14261855#comment-14261855 ] Jeff Zhang commented on YARN-2560: -- Checked RMAppImpl & RMAppAttemptImpl and found that FinalApplicationStatus is retrieved from the current attempt, while diagnostics are retrieved from the RMApp, which is not updated until it gets the AttemptFinishedEvent. This is the root cause that sometimes makes the FinalApplicationStatus and diagnostics inconsistent. > Diagnostics is delayed to passed to ApplicationReport > - > > Key: YARN-2560 > URL: https://issues.apache.org/jira/browse/YARN-2560 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jeff Zhang > > The diagnostics of an Application may be delayed in being passed to the > ApplicationReport. Here's one example where the ApplicationStatus has changed > to FAILED but the diagnostics are still empty, and the next call of > getApplicationReport gets the diagnostics. > {code} > while(true) { > appReport = yarnClient.getApplicationReport(appId); > Thread.sleep(1000); > LOG.info("AppStatus:" + appReport.getFinalApplicationStatus()); > LOG.info("Diagnostics:" + appReport.getDiagnostics()); > > } > {code} > *Output:* > {code} > AppStatus:FAILED > Diagnostics: // empty > // get diagnostics for the next getApplicationReport > AppStatus:FAILED > Diagnostics: // diagnostics info here > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
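Given that diagnostics lag the final status (they are only updated on the AttemptFinishedEvent), a client-side workaround suggested by the output above is to re-poll after a terminal status until the diagnostics are populated. A minimal sketch under that assumption — Report is a stand-in for ApplicationReport; in real code the supplier would call yarnClient.getApplicationReport(appId):

```java
import java.util.function.Supplier;

/**
 * Client-side workaround sketch for YARN-2560: after seeing a terminal
 * status, re-poll until diagnostics are populated or a retry budget is
 * exhausted. Hypothetical types; not actual YARN client code.
 */
public class DiagnosticsPoller {

  public static class Report {
    public final String status;
    public final String diagnostics;
    public Report(String status, String diagnostics) {
      this.status = status;
      this.diagnostics = diagnostics;
    }
  }

  public static Report awaitDiagnostics(Supplier<Report> fetch, int maxRetries) {
    Report report = fetch.get();
    int retries = 0;
    while ("FAILED".equals(report.status)
        && report.diagnostics.isEmpty()
        && retries++ < maxRetries) {
      try {
        Thread.sleep(100);  // brief back-off before re-polling
      } catch (InterruptedException ie) {
        Thread.currentThread().interrupt();
        break;
      }
      report = fetch.get();
    }
    return report;
  }

  public static void main(String[] args) {
    // Simulate a report whose diagnostics arrive only on the second poll.
    final int[] calls = {0};
    Report r = awaitDiagnostics(
        () -> new Report("FAILED", calls[0]++ == 0 ? "" : "AM exited"),
        5);
    System.out.println(r.status + ": " + r.diagnostics);  // FAILED: AM exited
  }
}
```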
[jira] [Updated] (YARN-1197) Support changing resources of an allocated container
[ https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Zhang updated YARN-1197: - Assignee: (was: Jeff Zhang) > Support changing resources of an allocated container > > > Key: YARN-1197 > URL: https://issues.apache.org/jira/browse/YARN-1197 > Project: Hadoop YARN > Issue Type: Task > Components: api, nodemanager, resourcemanager >Affects Versions: 2.1.0-beta >Reporter: Wangda Tan > Attachments: mapreduce-project.patch.ver.1, > tools-project.patch.ver.1, yarn-1197-scheduler-v1.pdf, yarn-1197-v2.pdf, > yarn-1197-v3.pdf, yarn-1197-v4.pdf, yarn-1197-v5.pdf, yarn-1197.pdf, > yarn-api-protocol.patch.ver.1, yarn-pb-impl.patch.ver.1, > yarn-server-common.patch.ver.1, yarn-server-nodemanager.patch.ver.1, > yarn-server-resourcemanager.patch.ver.1 > > > The current YARN resource management logic assumes the resource allocated to > a container is fixed during its lifetime. When users want to change the > resource of an allocated container, the only way is to release it and > allocate a new container with the expected size. > Allowing run-time changes to the resources of an allocated container will > give us better control of resource usage on the application side -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2750) Allow StateMachine has callback when transition fail
[ https://issues.apache.org/jira/browse/YARN-2750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Zhang updated YARN-2750: - Attachment: YARN-2750-2.patch > Allow StateMachine has callback when transition fail > > > Key: YARN-2750 > URL: https://issues.apache.org/jira/browse/YARN-2750 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.5.1 >Reporter: Jeff Zhang > Attachments: YARN-2750-2.patch, YARN-2750.patch > > > We have a situation that sometimes Transition may fail, but we don't want to > handle the fail in each Transition, we'd like to handle it in one centralized > place, Allow StateMachine has a callback would be good for us. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2750) Allow StateMachine has callback when transition fail
[ https://issues.apache.org/jira/browse/YARN-2750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Zhang updated YARN-2750: - Attachment: YARN-2750.patch Attached a patch for initial review. > Allow StateMachine has callback when transition fail > > > Key: YARN-2750 > URL: https://issues.apache.org/jira/browse/YARN-2750 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.5.1 >Reporter: Jeff Zhang > Attachments: YARN-2750.patch > > > We have a situation where a Transition may sometimes fail, but we don't want to > handle the failure in each Transition; we'd like to handle it in one centralized > place. Allowing the StateMachine to have a callback would be good for us. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2750) Allow StateMachine has callback when transition fail
[ https://issues.apache.org/jira/browse/YARN-2750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Zhang updated YARN-2750: - Affects Version/s: 2.5.1 > Allow StateMachine has callback when transition fail > > > Key: YARN-2750 > URL: https://issues.apache.org/jira/browse/YARN-2750 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.5.1 >Reporter: Jeff Zhang > > We have a situation that sometimes Transition may fail, but we don't want to > handle the fail in each Transition, we'd like to handle it in one centralized > place, Allow StateMachine has a callback would be good for us. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2750) Allow StateMachine has callback when transition fail
[ https://issues.apache.org/jira/browse/YARN-2750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Zhang updated YARN-2750: - Description: We have a situation where a Transition may sometimes fail, but we don't want to handle the failure in each Transition; we'd like to handle it in one centralized place. Allowing the StateMachine to have a callback would be good for us. > Allow StateMachine has callback when transition fail > > > Key: YARN-2750 > URL: https://issues.apache.org/jira/browse/YARN-2750 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Jeff Zhang > > We have a situation where a Transition may sometimes fail, but we don't want to > handle the failure in each Transition; we'd like to handle it in one centralized > place. Allowing the StateMachine to have a callback would be good for us. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2750) Allow StateMachine has callback when transition fail
Jeff Zhang created YARN-2750: Summary: Allow StateMachine has callback when transition fail Key: YARN-2750 URL: https://issues.apache.org/jira/browse/YARN-2750 Project: Hadoop YARN Issue Type: Improvement Reporter: Jeff Zhang -- This message was sent by Atlassian JIRA (v6.3.4#6332)
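The proposal above, one centralized hook instead of try/catch duplicated inside every Transition, can be sketched with a toy state machine. All names here (`SimpleStateMachine`, `Transition`, `onTransitionFailure`) are illustrative and are not Hadoop's `StateMachineFactory` API:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiConsumer;

// Toy state machine with a single, centralized transition-failure callback.
public class SimpleStateMachine<S, E> {
    public interface Transition<T, V> { T apply(T from, V event) throws Exception; }

    private final Map<S, Map<E, Transition<S, E>>> table = new HashMap<>();
    private final BiConsumer<E, Exception> onTransitionFailure; // the proposed hook
    private S current;

    public SimpleStateMachine(S initial, BiConsumer<E, Exception> onTransitionFailure) {
        this.current = initial;
        this.onTransitionFailure = onTransitionFailure;
    }

    public SimpleStateMachine<S, E> addTransition(S from, E event, Transition<S, E> t) {
        table.computeIfAbsent(from, k -> new HashMap<>()).put(event, t);
        return this;
    }

    public S handle(E event) {
        try {
            Transition<S, E> t = table.getOrDefault(current, new HashMap<>()).get(event);
            if (t == null) {
                throw new IllegalStateException("invalid event " + event + " in state " + current);
            }
            current = t.apply(current, event);
        } catch (Exception e) {
            onTransitionFailure.accept(event, e); // every transition failure lands here
        }
        return current;
    }
}
```

Under this scheme, a class like RMAppImpl would register one failure listener when building its state machine instead of wrapping each individual transition.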
[jira] [Created] (YARN-2560) Diagnostics is delayed to passed to ApplicationReport
Jeff Zhang created YARN-2560: Summary: Diagnostics is delayed to passed to ApplicationReport Key: YARN-2560 URL: https://issues.apache.org/jira/browse/YARN-2560 Project: Hadoop YARN Issue Type: Bug Reporter: Jeff Zhang The diagnostics of Application may be delayed to pass to ApplicationReport. Here's one example when ApplicationStatus has changed to FAILED, but the diagnostics is still empty. And the next call of getApplicationReport could get the diagnostics. {code} while(true) { appReport = yarnClient.getApplicationReport(appId); Thread.sleep(1000); LOG.info("AppStatus:" + appReport.getFinalApplicationStatus()); LOG.info("Diagnostics:" + appReport.getDiagnostics()); } {code} *Output:* {code} AppStatus:FAILED Diagnostics: // empty // get diagnostics for the next getApplicationReport AppStatus:FAILED Diagnostics: // diagnostics info here {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1743) Decorate event transitions and the event-types with their behaviour
[ https://issues.apache.org/jira/browse/YARN-1743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14056976#comment-14056976 ] Jeff Zhang commented on YARN-1743: -- BTW, this patch is based on branch-2.4.0 > Decorate event transitions and the event-types with their behaviour > --- > > Key: YARN-1743 > URL: https://issues.apache.org/jira/browse/YARN-1743 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Vinod Kumar Vavilapalli >Assignee: Jeff Zhang > Attachments: NodeManager.gv, NodeManager.pdf, YARN-1743.patch > > > Helps to annotate the transitions with (start-state, end-state) pair and the > events with (source, destination) pair. > Not just readability, we may also use them to generate the event diagrams > across components. > Not a blocker for 0.23, but let's see. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (YARN-1743) Decorate event transitions and the event-types with their behaviour
[ https://issues.apache.org/jira/browse/YARN-1743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Zhang reassigned YARN-1743: Assignee: Jeff Zhang > Decorate event transitions and the event-types with their behaviour > --- > > Key: YARN-1743 > URL: https://issues.apache.org/jira/browse/YARN-1743 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Vinod Kumar Vavilapalli >Assignee: Jeff Zhang > Attachments: NodeManager.gv, NodeManager.pdf, YARN-1743.patch > > > Helps to annotate the transitions with (start-state, end-state) pair and the > events with (source, destination) pair. > Not just readability, we may also use them to generate the event diagrams > across components. > Not a blocker for 0.23, but let's see. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1743) Decorate event transitions and the event-types with their behaviour
[ https://issues.apache.org/jira/browse/YARN-1743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14056931#comment-14056931 ] Jeff Zhang commented on YARN-1743: -- Attached an initial patch. It may need refinement later; this patch just illustrates my basic idea for generating the event flow between different entities. Each event has a source and a destination, and this information is used to generate the Graphviz file. The state-machine diagram describes the internal transitions of one entity, while this patch describes the interaction between different entities. Any comments and feedback are welcome. > Decorate event transitions and the event-types with their behaviour > --- > > Key: YARN-1743 > URL: https://issues.apache.org/jira/browse/YARN-1743 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Vinod Kumar Vavilapalli > Attachments: NodeManager.gv, NodeManager.pdf, YARN-1743.patch > > > Helps to annotate the transitions with (start-state, end-state) pair and the > events with (source, destination) pair. > Not just readability, we may also use them to generate the event diagrams > across components. > Not a blocker for 0.23, but let's see. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1743) Decorate event transitions and the event-types with their behaviour
[ https://issues.apache.org/jira/browse/YARN-1743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Zhang updated YARN-1743: - Attachment: NodeManager.pdf > Decorate event transitions and the event-types with their behaviour > --- > > Key: YARN-1743 > URL: https://issues.apache.org/jira/browse/YARN-1743 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Vinod Kumar Vavilapalli > Attachments: NodeManager.gv, NodeManager.pdf, YARN-1743.patch > > > Helps to annotate the transitions with (start-state, end-state) pair and the > events with (source, destination) pair. > Not just readability, we may also use them to generate the event diagrams > across components. > Not a blocker for 0.23, but let's see. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1743) Decorate event transitions and the event-types with their behaviour
[ https://issues.apache.org/jira/browse/YARN-1743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Zhang updated YARN-1743: - Attachment: NodeManager.gv YARN-1743.patch > Decorate event transitions and the event-types with their behaviour > --- > > Key: YARN-1743 > URL: https://issues.apache.org/jira/browse/YARN-1743 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Vinod Kumar Vavilapalli > Attachments: NodeManager.gv, NodeManager.pdf, YARN-1743.patch > > > Helps to annotate the transitions with (start-state, end-state) pair and the > events with (source, destination) pair. > Not just readability, we may also use them to generate the event diagrams > across components. > Not a blocker for 0.23, but let's see. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2184) ResourceManager may fail due to name node in safe mode
Jeff Zhang created YARN-2184: Summary: ResourceManager may fail due to name node in safe mode Key: YARN-2184 URL: https://issues.apache.org/jira/browse/YARN-2184 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.4.0 Reporter: Jeff Zhang Assignee: Jeff Zhang If the history service is enabled in the ResourceManager, it will try to mkdir when the service is inited. At that time the NameNode may still be in safe mode, which can cause the history service to fail and then the ResourceManager to fail. This is very likely when the cluster is restarted, since the NameNode may stay in safe mode for a long time. Here are the error logs: {code} Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.SafeModeException): Cannot create directory /Users/jzhang/Java/lib/hadoop-2.4.0/logs/yarn/system/history/ApplicationHistoryDataRoot. Name node is in safe mode. The reported blocks 85 has reached the threshold 0.9990 of total blocks 85. The number of live datanodes 1 has reached the minimum number 0. In safe mode extension. Safe mode will be turned off automatically in 19 seconds. 
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkNameNodeSafeMode(FSNamesystem.java:1195) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:3564) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:3540) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:754) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:558) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007) at org.apache.hadoop.ipc.Client.call(Client.java:1410) at org.apache.hadoop.ipc.Client.call(Client.java:1363) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) at com.sun.proxy.$Proxy14.mkdirs(Unknown Source) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103) at com.sun.proxy.$Proxy14.mkdirs(Unknown Source) at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.mkdirs(ClientNamenodeProtocolTranslatorPB.java:500) at org.apache.hadoop.hdfs.DFSClient.primitiveMkdir(DFSClient.java:2553) at org.apache.hadoop.hdfs.DFSClient.mkdirs(DFSClient.java:2524) at org.apache.hadoop.hdfs.DistributedFileSystem$16.doCall(DistributedFileSystem.java:827) at org.apache.hadoop.hdfs.DistributedFileSystem$16.doCall(DistributedFileSystem.java:823) at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirsInternal(DistributedFileSystem.java:823) at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirs(DistributedFileSystem.java:816) at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1815) at org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore.serviceInit(FileSystemApplicationHistoryStore.java:120) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) ... 10 more 2014-06-20 11:06:25,220 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: SHUTDOWN_MSG: / SHUTDOWN_MSG: Shutting down ResourceManager at jzhangMBPr.local/192.168.100.152 {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
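Since the log itself says safe mode "will be turned off automatically in 19 seconds", one possible mitigation (my suggestion, not a committed fix from this issue) is to retry the mkdir with backoff during service init instead of failing immediately. A self-contained sketch; `SafeModeRetry` and its parameters are hypothetical:

```java
// Illustrative retry helper: keep retrying an action that fails while the
// NameNode is in safe mode, sleeping between attempts so safe mode can lift.
public class SafeModeRetry {
    public interface Action { void run() throws Exception; }

    public static void retry(Action action, int maxAttempts, long sleepMillis)
            throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                action.run();   // e.g. fs.mkdirs(historyRoot)
                return;         // succeeded
            } catch (Exception e) {
                last = e;       // e.g. SafeModeException from the NameNode
                if (attempt < maxAttempts) {
                    Thread.sleep(sleepMillis); // wait for safe mode to be lifted
                }
            }
        }
        throw last;             // still failing after all attempts: give up
    }
}
```

A FileSystemApplicationHistoryStore-style serviceInit could wrap its mkdirs call in such a loop so a restarting cluster does not take the ResourceManager down.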
[jira] [Assigned] (YARN-1197) Support changing resources of an allocated container
[ https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Zhang reassigned YARN-1197: Assignee: Jeff Zhang > Support changing resources of an allocated container > > > Key: YARN-1197 > URL: https://issues.apache.org/jira/browse/YARN-1197 > Project: Hadoop YARN > Issue Type: Task > Components: api, nodemanager, resourcemanager >Affects Versions: 2.1.0-beta >Reporter: Wangda Tan >Assignee: Jeff Zhang > Attachments: mapreduce-project.patch.ver.1, > tools-project.patch.ver.1, yarn-1197-scheduler-v1.pdf, yarn-1197-v2.pdf, > yarn-1197-v3.pdf, yarn-1197-v4.pdf, yarn-1197-v5.pdf, yarn-1197.pdf, > yarn-api-protocol.patch.ver.1, yarn-pb-impl.patch.ver.1, > yarn-server-common.patch.ver.1, yarn-server-nodemanager.patch.ver.1, > yarn-server-resourcemanager.patch.ver.1 > > > The current YARN resource management logic assumes resource allocated to a > container is fixed during the lifetime of it. When users want to change a > resource > of an allocated container the only way is releasing it and allocating a new > container with expected size. > Allowing run-time changing resources of an allocated container will give us > better control of resource usage in application side -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (YARN-1609) Add Service Container type to NodeManager in YARN
[ https://issues.apache.org/jira/browse/YARN-1609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Zhang reassigned YARN-1609: Assignee: Jeff Zhang > Add Service Container type to NodeManager in YARN > - > > Key: YARN-1609 > URL: https://issues.apache.org/jira/browse/YARN-1609 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 2.2.0 >Reporter: Wangda Tan >Assignee: Jeff Zhang > Attachments: Add Service Container type to NodeManager in YARN-V1.pdf > > > From our work to support running OpenMPI on YARN (MAPREDUCE-2911), we found > that it’s important to have a framework-specific daemon process manage the > tasks on each node directly. The daemon process, most likely similar in other > frameworks as well, provides critical services to tasks running on that > node (for example “wireup”, spawning user processes in large numbers at once, etc.). > In YARN, it’s hard, if not impossible, to have those processes > managed by YARN. > We propose to extend the container model on the NodeManager side to support a > “Service Container” to run/manage such framework daemon/service processes. We > believe this is very useful to other application framework developers as well. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1754) Container process is not really killed
[ https://issues.apache.org/jira/browse/YARN-1754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Zhang updated YARN-1754: - Description: I test the following distributed shell example on my mac: hadoop jar share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.2.0.jar -appname shell -jar share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.2.0.jar -shell_command=sleep -shell_args=10 -num_containers=1 And it will start 2 process for one container, one is the shell process, another is the real command I execute ( here is "sleep 10"). And then I kill this application by running command "yarn application -kill app_id" it will kill the shell process, but won't kill the real command process. The reason is that yarn use kill command to kill process, but it won't kill its child process. use pkill could resolve this issue. IMHO, it is a very important case which will make the resource usage inconsistency, and have potential security problem. was: I test the following distributed shell example on my mac: hadoop jar share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.2.0.jar -appname shell -jar share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.2.0.jar -shell_command=sleep -shell_args=10 -num_containers=1 And it will start 2 process for one container, one is the shell process, another is the real command I execute ( here is "sleep 10"). And then I kill this application by running command "yarn application -kill app_id" it will kill the shell process, but won't kill the real command process. The reason is that yarn use kill command to kill process, but it won't kill its child process. use pkill could resolve this issue. I also verify this case on centos which is the same as mac. IMHO, it is a very important case which will make the resource usage inconsistency, and have potential security problem. 
> Container process is not really killed > -- > > Key: YARN-1754 > URL: https://issues.apache.org/jira/browse/YARN-1754 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.2.0 > Environment: Mac >Reporter: Jeff Zhang > > I test the following distributed shell example on my mac: > hadoop jar > share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.2.0.jar > -appname shell -jar > share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.2.0.jar > -shell_command=sleep -shell_args=10 -num_containers=1 > And it will start 2 process for one container, one is the shell process, > another is the real command I execute ( here is "sleep 10"). > And then I kill this application by running command "yarn application -kill > app_id" > it will kill the shell process, but won't kill the real command process. The > reason is that yarn use kill command to kill process, but it won't kill its > child process. use pkill could resolve this issue. > IMHO, it is a very important case which will make the resource usage > inconsistency, and have potential security problem. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1754) Container process is not really killed
[ https://issues.apache.org/jira/browse/YARN-1754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13910006#comment-13910006 ] Jeff Zhang commented on YARN-1754: -- It looks like it only happens on mac, but works normally in linux > Container process is not really killed > -- > > Key: YARN-1754 > URL: https://issues.apache.org/jira/browse/YARN-1754 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.2.0 > Environment: Mac >Reporter: Jeff Zhang > > I test the following distributed shell example on my mac: > hadoop jar > share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.2.0.jar > -appname shell -jar > share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.2.0.jar > -shell_command=sleep -shell_args=10 -num_containers=1 > And it will start 2 process for one container, one is the shell process, > another is the real command I execute ( here is "sleep 10"). > And then I kill this application by running command "yarn application -kill > app_id" > it will kill the shell process, but won't kill the real command process. The > reason is that yarn use kill command to kill process, but it won't kill its > child process. use pkill could resolve this issue. > IMHO, it is a very important case which will make the resource usage > inconsistency, and have potential security problem. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (YARN-1754) Container process is not really killed
Jeff Zhang created YARN-1754: Summary: Container process is not really killed Key: YARN-1754 URL: https://issues.apache.org/jira/browse/YARN-1754 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.2.0 Environment: Mac Reporter: Jeff Zhang I tested the following distributed shell example on my mac: hadoop jar share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.2.0.jar -appname shell -jar share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.2.0.jar -shell_command=sleep -shell_args=10 -num_containers=1 It starts 2 processes for one container: one is the shell process, the other is the real command I execute (here, "sleep 10"). Then I kill this application by running the command "yarn application -kill app_id"; it kills the shell process, but doesn't kill the real command process. The reason is that YARN uses the kill command to kill the process, and kill doesn't kill child processes; using pkill could resolve this issue. I also verified this case on CentOS, which behaves the same as Mac. IMHO, this is a very important case: it makes resource usage accounting inconsistent and has potential security problems. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
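On POSIX systems, the usual way to take down the shell wrapper and its children together is to signal the process group (`kill -- -PGID`) rather than a single PID. A small sketch of building that command line; `ProcessGroupKill` and `buildKillCommand` are hypothetical helpers, not the actual ContainerExecutor code:

```java
// "kill -SIG -- -<pgid>" signals every process in the group, so the shell
// wrapper and the real command ("sleep 10") die together.
public class ProcessGroupKill {
    public static String[] buildKillCommand(int pgid, String signal) {
        if (pgid <= 0) {
            throw new IllegalArgumentException("pgid must be positive");
        }
        // "--" ends option parsing so the negative number is read as a group id.
        return new String[] {"kill", "-" + signal, "--", "-" + pgid};
    }
    // Usage (not run here): new ProcessBuilder(buildKillCommand(pgid, "TERM")).start();
}
```

This only works if the container was launched in its own process group (e.g. via setsid), which is part of why a plain `kill <pid>` leaves the "sleep 10" child behind.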
[jira] [Commented] (YARN-321) Generic application history service
[ https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13838492#comment-13838492 ] Jeff Zhang commented on YARN-321: - Thanks Shen. Is there any estimation for the release of 2.4 ? > Generic application history service > --- > > Key: YARN-321 > URL: https://issues.apache.org/jira/browse/YARN-321 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Luke Lu >Assignee: Vinod Kumar Vavilapalli > Attachments: AHS Diagram.pdf, ApplicationHistoryServiceHighLevel.pdf, > HistoryStorageDemo.java > > > The mapreduce job history server currently needs to be deployed as a trusted > server in sync with the mapreduce runtime. Every new application would need a > similar application history server. Having to deploy O(T*V) (where T is > number of type of application, V is number of version of application) trusted > servers is clearly not scalable. > Job history storage handling itself is pretty generic: move the logs and > history data into a particular directory for later serving. Job history data > is already stored as json (or binary avro). I propose that we create only one > trusted application history server, which can have a generic UI (display json > as a tree of strings) as well. Specific application/version can deploy > untrusted webapps (a la AMs) to query the application history server and > interpret the json for its specific UI and/or analytics. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-321) Generic application history service
[ https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13837281#comment-13837281 ] Jeff Zhang commented on YARN-321: - Another question about this jira: I found that the container logURL is hard-coded there, so users still cannot see the logs of each container (stdout, stderr). Is it on the roadmap to allow users to see the logs? And which jira is tracking this? Thanks. > Generic application history service > --- > > Key: YARN-321 > URL: https://issues.apache.org/jira/browse/YARN-321 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Luke Lu >Assignee: Vinod Kumar Vavilapalli > Attachments: AHS Diagram.pdf, ApplicationHistoryServiceHighLevel.pdf, > HistoryStorageDemo.java > > > The mapreduce job history server currently needs to be deployed as a trusted > server in sync with the mapreduce runtime. Every new application would need a > similar application history server. Having to deploy O(T*V) (where T is > number of type of application, V is number of version of application) trusted > servers is clearly not scalable. > Job history storage handling itself is pretty generic: move the logs and > history data into a particular directory for later serving. Job history data > is already stored as json (or binary avro). I propose that we create only one > trusted application history server, which can have a generic UI (display json > as a tree of strings) as well. Specific application/version can deploy > untrusted webapps (a la AMs) to query the application history server and > interpret the json for its specific UI and/or analytics. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-321) Generic application history service
[ https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13837276#comment-13837276 ] Jeff Zhang commented on YARN-321: - Will this jira be included in the next release? > Generic application history service > --- > > Key: YARN-321 > URL: https://issues.apache.org/jira/browse/YARN-321 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Luke Lu >Assignee: Vinod Kumar Vavilapalli > Attachments: AHS Diagram.pdf, ApplicationHistoryServiceHighLevel.pdf, > HistoryStorageDemo.java > > > The mapreduce job history server currently needs to be deployed as a trusted > server in sync with the mapreduce runtime. Every new application would need a > similar application history server. Having to deploy O(T*V) (where T is > number of type of application, V is number of version of application) trusted > servers is clearly not scalable. > Job history storage handling itself is pretty generic: move the logs and > history data into a particular directory for later serving. Job history data > is already stored as json (or binary avro). I propose that we create only one > trusted application history server, which can have a generic UI (display json > as a tree of strings) as well. Specific application/version can deploy > untrusted webapps (a la AMs) to query the application history server and > interpret the json for its specific UI and/or analytics. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1440) Yarn aggregated logs are difficult for external tools to understand
[ https://issues.apache.org/jira/browse/YARN-1440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13831275#comment-13831275 ] Jeff Zhang commented on YARN-1440: -- @ledion, the current implementation uses one TFile per application, while your method would create one file per container, which would generate more files. I guess the reason the original author adopted TFile is that TFile has an index block that lets a user quickly find a value. In this way, a user can quickly find one container's log within an application. > Yarn aggregated logs are difficult for external tools to understand > --- > > Key: YARN-1440 > URL: https://issues.apache.org/jira/browse/YARN-1440 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: ledion bitincka > Labels: log-aggregation, logs, tfile, yarn > > The log aggregation feature in Yarn is awesome! However, the file type and > format in which the log files are aggregated into (TFile) should either be > much simpler or be made pluggable. The current TFile format forces anyone who > wants to see the files to either > a) use the web UI > b) use the CLI tools (yarn logs) or > c) write custom code to read the files > My suggestion would be to simplify the log collection by collecting and > writing the raw log files into a directory structure as follows: > {noformat} > /{log-collection-dir}/{app-id}/{container-id}/{log-file-name} > {noformat} > This way the application developers can (re)use a much wider array of tools > to process the logs. > For the readers who are not familiar with logs and their format you can find > more info the following two blog posts: > http://hortonworks.com/blog/simplifying-user-logs-management-and-access-in-yarn/ > http://blogs.splunk.com/2013/11/18/hadoop-2-0-rant/ -- This message was sent by Atlassian JIRA (v6.1#6144)
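The raw-file layout proposed in the description is mechanical to construct: one plain file per container log under {log-collection-dir}/{app-id}/{container-id}/{log-file-name}. A tiny sketch of that path scheme; `RawLogLayout` is a hypothetical helper, not part of YARN's log aggregation code:

```java
// Builds the per-container raw-log path from YARN-1440's proposal, so any
// external tool can locate a log without understanding TFile.
public class RawLogLayout {
    public static String logPath(String collectionDir, String appId,
                                 String containerId, String logFileName) {
        return String.join("/", collectionDir, appId, containerId, logFileName);
    }
}
```

The trade-off Jeff's comment points at still applies: this layout produces many small files, whereas TFile's index block lets one aggregated file per application be searched quickly for a given container.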