[jira] [Updated] (YARN-1232) Configuration to support multiple RMs
[ https://issues.apache.org/jira/browse/YARN-1232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1232: --- Attachment: yarn-1232-7.patch Configuration to support multiple RMs - Key: YARN-1232 URL: https://issues.apache.org/jira/browse/YARN-1232 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Karthik Kambatla Assignee: Karthik Kambatla Labels: ha Attachments: yarn-1232-1.patch, yarn-1232-2.patch, yarn-1232-3.patch, yarn-1232-4.patch, yarn-1232-5.patch, yarn-1232-6.patch, yarn-1232-7.patch We should augment the configuration to allow users to specify two RMs and the individual RPC addresses for each.
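For context, a minimal sketch of how such a configuration might be consumed from Java. The property names (yarn.resourcemanager.ha.rm-ids and the per-RM address keys) are illustrative assumptions, not necessarily what this patch introduces:
{code}
import org.apache.hadoop.conf.Configuration;

public class RMHAConfigSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Hypothetical keys: a list of logical RM ids, plus one RPC address per id.
    conf.set("yarn.resourcemanager.ha.rm-ids", "rm1,rm2");
    conf.set("yarn.resourcemanager.address.rm1", "rm1.example.com:8032");
    conf.set("yarn.resourcemanager.address.rm2", "rm2.example.com:8032");

    // Clients could then resolve the address of whichever RM id is active.
    for (String id : conf.getStrings("yarn.resourcemanager.ha.rm-ids")) {
      System.out.println(id + " -> " + conf.get("yarn.resourcemanager.address." + id));
    }
  }
}
{code}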
[jira] [Commented] (YARN-1232) Configuration to support multiple RMs
[ https://issues.apache.org/jira/browse/YARN-1232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785929#comment-13785929 ] Karthik Kambatla commented on YARN-1232: s/tertiary/ternary/ in the above comment. Configuration to support multiple RMs - Key: YARN-1232 URL: https://issues.apache.org/jira/browse/YARN-1232 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Karthik Kambatla Assignee: Karthik Kambatla Labels: ha Attachments: yarn-1232-1.patch, yarn-1232-2.patch, yarn-1232-3.patch, yarn-1232-4.patch, yarn-1232-5.patch, yarn-1232-6.patch, yarn-1232-7.patch We should augment the configuration to allow users to specify two RMs and the individual RPC addresses for each.
[jira] [Commented] (YARN-1219) FSDownload changes file suffix making FileUtil.unTar() throw exception
[ https://issues.apache.org/jira/browse/YARN-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785928#comment-13785928 ] Hudson commented on YARN-1219: -- SUCCESS: Integrated in Hadoop-trunk-Commit #4537 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4537/]) YARN-1219. FSDownload changes file suffix making FileUtil.unTar() throw exception. Contributed by Shanyu Zhao. (cnauroth: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1529084) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/FSDownload.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestFSDownload.java FSDownload changes file suffix making FileUtil.unTar() throw exception -- Key: YARN-1219 URL: https://issues.apache.org/jira/browse/YARN-1219 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 3.0.0, 2.1.1-beta, 2.1.2-beta Reporter: shanyu zhao Assignee: shanyu zhao Fix For: 2.1.2-beta Attachments: YARN-1219.patch While running a Hive join operation on YARN, I saw the exception described below. This is caused by FSDownload copying the files into a temp file and changing the suffix to .tmp before unpacking. In unpack(), it uses FileUtil.unTar(), which determines whether the file is gzipped by looking at the file suffix: {code} boolean gzipped = inFile.toString().endsWith("gz"); {code} To fix this problem, we can remove the .tmp from the temp file name. Here is the detailed exception: org.apache.commons.compress.archivers.tar.TarArchiveInputStream.getNextTarEntry(TarArchiveInputStream.java:240) at org.apache.hadoop.fs.FileUtil.unTarUsingJava(FileUtil.java:676) at org.apache.hadoop.fs.FileUtil.unTar(FileUtil.java:625) at org.apache.hadoop.yarn.util.FSDownload.unpack(FSDownload.java:203) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:287) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:50) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:722)
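To illustrate why the .tmp suffix breaks untarring, here is a self-contained sketch of the suffix check quoted above. The real logic lives in FileUtil.unTar(); this is only a rough reconstruction under that assumption:
{code}
public class SuffixCheckSketch {
  // Mirrors the quoted check: gzip detection is purely suffix-based.
  static boolean looksGzipped(String path) {
    return path.endsWith("gz");
  }

  public static void main(String[] args) {
    // The localized resource really is a gzipped tar...
    System.out.println(looksGzipped("/tmp/foo.tar.gz"));      // true
    // ...but FSDownload renamed it with a .tmp suffix before unpacking,
    // so unTar() treats it as an uncompressed tar and fails.
    System.out.println(looksGzipped("/tmp/foo.tar.gz.tmp"));  // false
  }
}
{code}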
[jira] [Commented] (YARN-1232) Configuration to support multiple RMs
[ https://issues.apache.org/jira/browse/YARN-1232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785931#comment-13785931 ] Hadoop QA commented on YARN-1232: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12606732/yarn-1232-7.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2092//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2092//console This message is automatically generated. Configuration to support multiple RMs - Key: YARN-1232 URL: https://issues.apache.org/jira/browse/YARN-1232 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Karthik Kambatla Assignee: Karthik Kambatla Labels: ha Attachments: yarn-1232-1.patch, yarn-1232-2.patch, yarn-1232-3.patch, yarn-1232-4.patch, yarn-1232-5.patch, yarn-1232-6.patch, yarn-1232-7.patch We should augment the configuration to allow users to specify two RMs and the individual RPC addresses for each.
[jira] [Resolved] (YARN-1219) FSDownload changes file suffix making FileUtil.unTar() throw exception
[ https://issues.apache.org/jira/browse/YARN-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth resolved YARN-1219. - Resolution: Fixed I've committed this to trunk, branch-2, and branch-2.1-beta. Shanyu, thank you for the patch. Omkar, thank you for help with the code review. FSDownload changes file suffix making FileUtil.unTar() throw exception -- Key: YARN-1219 URL: https://issues.apache.org/jira/browse/YARN-1219 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 3.0.0, 2.1.1-beta, 2.1.2-beta Reporter: shanyu zhao Assignee: shanyu zhao Fix For: 2.1.2-beta Attachments: YARN-1219.patch While running a Hive join operation on YARN, I saw the exception described below. This is caused by FSDownload copying the files into a temp file and changing the suffix to .tmp before unpacking. In unpack(), it uses FileUtil.unTar(), which determines whether the file is gzipped by looking at the file suffix: {code} boolean gzipped = inFile.toString().endsWith("gz"); {code} To fix this problem, we can remove the .tmp from the temp file name. Here is the detailed exception: org.apache.commons.compress.archivers.tar.TarArchiveInputStream.getNextTarEntry(TarArchiveInputStream.java:240) at org.apache.hadoop.fs.FileUtil.unTarUsingJava(FileUtil.java:676) at org.apache.hadoop.fs.FileUtil.unTar(FileUtil.java:625) at org.apache.hadoop.yarn.util.FSDownload.unpack(FSDownload.java:203) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:287) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:50) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:722)
[jira] [Updated] (YARN-7) Add support for DistributedShell to ask for CPUs along with memory
[ https://issues.apache.org/jira/browse/YARN-7?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-7: -- Attachment: YARN-7-v3.patch Synced up the patch and removed unnecessary whitespace/tab changes. Add support for DistributedShell to ask for CPUs along with memory -- Key: YARN-7 URL: https://issues.apache.org/jira/browse/YARN-7 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.0.3-alpha Reporter: Arun C Murthy Assignee: Junping Du Labels: patch Attachments: YARN-7.patch, YARN-7-v2.patch, YARN-7-v3.patch
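As a rough sketch of what asking for CPUs alongside memory looks like against the public YARN records API; the exact wiring inside DistributedShell may differ from this:
{code}
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;

public class CpuRequestSketch {
  public static ContainerRequest buildRequest(int memoryMb, int vCores) {
    // The capability now carries both dimensions instead of memory alone.
    Resource capability = Resource.newInstance(memoryMb, vCores);
    Priority priority = Priority.newInstance(0);
    // nodes/racks left null: no locality constraint.
    return new ContainerRequest(capability, null, null, priority);
  }
}
{code}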
[jira] [Commented] (YARN-7) Add support for DistributedShell to ask for CPUs along with memory
[ https://issues.apache.org/jira/browse/YARN-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785988#comment-13785988 ] Hadoop QA commented on YARN-7: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12606739/YARN-7-v3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell: org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2094//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2094//console This message is automatically generated. Add support for DistributedShell to ask for CPUs along with memory -- Key: YARN-7 URL: https://issues.apache.org/jira/browse/YARN-7 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.0.3-alpha Reporter: Arun C Murthy Assignee: Junping Du Labels: patch Attachments: YARN-7.patch, YARN-7-v2.patch, YARN-7-v3.patch
[jira] [Updated] (YARN-1167) Submitted distributed shell application shows appMasterHost = empty
[ https://issues.apache.org/jira/browse/YARN-1167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1167: Attachment: YARN-1167.5.patch Got lucky to pass the test on my local machine. --- T E S T S --- Running org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 32.464 sec - in org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell Results : Tests run: 2, Failures: 0, Errors: 0, Skipped: 0 Submitted distributed shell application shows appMasterHost = empty --- Key: YARN-1167 URL: https://issues.apache.org/jira/browse/YARN-1167 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Reporter: Tassapol Athiapinya Assignee: Xuan Gong Fix For: 2.1.2-beta Attachments: YARN-1167.1.patch, YARN-1167.2.patch, YARN-1167.3.patch, YARN-1167.4.patch, YARN-1167.5.patch Submit a distributed shell application. Once the application reaches the RUNNING state, the app master host should not be empty. In reality, it is empty. ==console logs== distributedshell.Client: Got application report from ASM for, appId=12, clientToAMToken=null, appDiagnostics=, appMasterHost=, appQueue=default, appMasterRpcPort=0, appStartTime=1378505161360, yarnAppState=RUNNING, distributedFinalState=UNDEFINED,
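For reference, the appMasterHost in the application report comes from AM registration. A minimal sketch, assuming the standard registerApplicationMaster call is where an empty host string would originate:
{code}
import java.net.InetAddress;
import org.apache.hadoop.yarn.client.api.AMRMClient;

public class AmRegisterSketch {
  public static void register(AMRMClient<AMRMClient.ContainerRequest> client)
      throws Exception {
    // If the AM registers with an empty host string, the RM has nothing
    // better to report, and appMasterHost shows up empty in the client.
    String host = InetAddress.getLocalHost().getHostName();
    client.registerApplicationMaster(host, /* rpcPort */ 0, /* trackingUrl */ "");
  }
}
{code}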
[jira] [Updated] (YARN-1166) YARN 'appsFailed' metric should be of type 'counter'
[ https://issues.apache.org/jira/browse/YARN-1166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-1166: -- Attachment: YARN-1166.4.patch [~ajisakaa], thanks! I've uploaded a new patch to do the following two things: 1. Change appsFailed to be a counter 2. Expose RMContext to AppSchedulerInfo, such that QueueMetrics can use the app info to determine whether it is the last attempt or not. The counter only increases at the last attempt. Modified the test cases to verify the logic. It's a compromise to do the trick here. I considered correcting the logic to only increment on the last attempt's failure, but that turns out to require a lot of changes along the path of the APP_REMOVE event from RMApp/RMAppAttempt to QueueMetrics. IMHO, I'm conservative about making that kind of change when release 2.2.0 is coming. YARN 'appsFailed' metric should be of type 'counter' Key: YARN-1166 URL: https://issues.apache.org/jira/browse/YARN-1166 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Srimanth Gunturi Assignee: Zhijie Shen Priority: Blocker Attachments: YARN-1166.2.patch, YARN-1166.3.patch, YARN-1166.4.patch, YARN-1166.patch Currently in YARN's queue metrics, the cumulative metric 'appsFailed' is of type 'gauge', which means the exact value will be reported. All other cumulative queue metrics (AppsSubmitted, AppsCompleted, AppsKilled) are of type 'counter', meaning Ganglia will use slope to provide deltas between time-points. To be consistent, the AppsFailed metric should also be of type 'counter'.
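The gauge-versus-counter distinction in Hadoop's metrics2 library, as a hedged sketch; the field names are illustrative and QueueMetrics' actual declarations may differ:
{code}
import org.apache.hadoop.metrics2.annotation.Metric;
import org.apache.hadoop.metrics2.annotation.Metrics;
import org.apache.hadoop.metrics2.lib.MutableCounterInt;
import org.apache.hadoop.metrics2.lib.MutableGaugeInt;

@Metrics(context = "yarn")
public class QueueMetricsSketch {
  // A counter is monotonically increasing, so tools like Ganglia can
  // derive rates/deltas between time points.
  @Metric("# of failed apps") MutableCounterInt appsFailed;

  // A gauge reports an exact current value and may go up or down;
  // using it for a cumulative metric is what this JIRA fixes.
  @Metric("# of pending apps") MutableGaugeInt appsPending;

  public void onAppFailed(boolean isLastAttempt) {
    // Per the comment above: count the app once, on its last attempt only.
    if (isLastAttempt) {
      appsFailed.incr();
    }
  }
}
{code}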
[jira] [Commented] (YARN-1167) Submitted distributed shell application shows appMasterHost = empty
[ https://issues.apache.org/jira/browse/YARN-1167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13786015#comment-13786015 ] Hadoop QA commented on YARN-1167: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12606760/YARN-1167.5.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell: org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2096//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2096//console This message is automatically generated. Submitted distributed shell application shows appMasterHost = empty --- Key: YARN-1167 URL: https://issues.apache.org/jira/browse/YARN-1167 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Reporter: Tassapol Athiapinya Assignee: Xuan Gong Fix For: 2.1.2-beta Attachments: YARN-1167.1.patch, YARN-1167.2.patch, YARN-1167.3.patch, YARN-1167.4.patch, YARN-1167.5.patch Submit a distributed shell application. Once the application reaches the RUNNING state, the app master host should not be empty. In reality, it is empty. ==console logs== distributedshell.Client: Got application report from ASM for, appId=12, clientToAMToken=null, appDiagnostics=, appMasterHost=, appQueue=default, appMasterRpcPort=0, appStartTime=1378505161360, yarnAppState=RUNNING, distributedFinalState=UNDEFINED,
[jira] [Commented] (YARN-1166) YARN 'appsFailed' metric should be of type 'counter'
[ https://issues.apache.org/jira/browse/YARN-1166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13786020#comment-13786020 ] Hadoop QA commented on YARN-1166: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12606761/YARN-1166.4.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestLeafQueue org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2095//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2095//console This message is automatically generated. YARN 'appsFailed' metric should be of type 'counter' Key: YARN-1166 URL: https://issues.apache.org/jira/browse/YARN-1166 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Srimanth Gunturi Assignee: Zhijie Shen Priority: Blocker Attachments: YARN-1166.2.patch, YARN-1166.3.patch, YARN-1166.4.patch, YARN-1166.patch Currently in YARN's queue metrics, the cumulative metric 'appsFailed' is of type 'gauge', which means the exact value will be reported. All other cumulative queue metrics (AppsSubmitted, AppsCompleted, AppsKilled) are of type 'counter', meaning Ganglia will use slope to provide deltas between time-points. To be consistent, the AppsFailed metric should also be of type 'counter'.
[jira] [Assigned] (YARN-1197) Support changing resources of an allocated container
[ https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan reassigned YARN-1197: Assignee: Wangda Tan Support changing resources of an allocated container Key: YARN-1197 URL: https://issues.apache.org/jira/browse/YARN-1197 Project: Hadoop YARN Issue Type: Task Components: api, nodemanager, resourcemanager Affects Versions: 2.1.0-beta Reporter: Wangda Tan Assignee: Wangda Tan Attachments: yarn-1197.pdf, yarn-1197-v2.pdf Currently, YARN cannot merge several containers on one node into a bigger container, which would let us incrementally ask for resources, merge them into a bigger one, and launch our processes. The user scenario is described in the comments.
[jira] [Updated] (YARN-1197) Support changing resources of an allocated container
[ https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-1197: - Attachment: yarn-1197-v2.pdf Guys, I attached an updated doc based on our discussion, mainly focused on the workflow diagram and detailed API changes; many thanks to [~bikassaha], [~sandyr] and [~tucu00]. Hope to get your feedback. I'll start working on it. Support changing resources of an allocated container Key: YARN-1197 URL: https://issues.apache.org/jira/browse/YARN-1197 Project: Hadoop YARN Issue Type: Task Components: api, nodemanager, resourcemanager Affects Versions: 2.1.0-beta Reporter: Wangda Tan Attachments: yarn-1197.pdf, yarn-1197-v2.pdf Currently, YARN cannot merge several containers on one node into a bigger container, which would let us incrementally ask for resources, merge them into a bigger one, and launch our processes. The user scenario is described in the comments.
[jira] [Commented] (YARN-621) RM triggers web auth failure before first job
[ https://issues.apache.org/jira/browse/YARN-621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13786060#comment-13786060 ] Hudson commented on YARN-621: - FAILURE: Integrated in Hadoop-Yarn-trunk #352 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/352/]) YARN-621. Changed YARN web app to not add paths that can cause duplicate additions of authenticated filters, thereby causing kerberos replay errors. Contributed by Omkar Vinit Joshi. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1529030) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/WebApps.java RM triggers web auth failure before first job - Key: YARN-621 URL: https://issues.apache.org/jira/browse/YARN-621 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.0.4-alpha Reporter: Allen Wittenauer Assignee: Omkar Vinit Joshi Priority: Critical Fix For: 2.1.2-beta Attachments: YARN-621.20131001.1.patch On a secure YARN setup, before the first job is executed, going to the web interface of the resource manager triggers authentication errors.
[jira] [Commented] (YARN-677) Increase coverage to FairScheduler
[ https://issues.apache.org/jira/browse/YARN-677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13786057#comment-13786057 ] Hudson commented on YARN-677: - FAILURE: Integrated in Hadoop-Yarn-trunk #352 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/352/]) Revert YARN-677. Increase coverage to FairScheduler (Vadim Bondarev and Dennis Y via jeagles) (jeagles: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1528914) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java Increase coverage to FairScheduler -- Key: YARN-677 URL: https://issues.apache.org/jira/browse/YARN-677 Project: Hadoop YARN Issue Type: Test Affects Versions: 3.0.0, 2.0.3-alpha, 0.23.6 Reporter: Vadim Bondarev Assignee: Andrey Klochkov Attachments: HADOOP-4536-branch-2-a.patch, HADOOP-4536-branch-2c.patch, HADOOP-4536-trunk-a.patch, HADOOP-4536-trunk-c.patch, HDFS-4536-branch-2--N7.patch, HDFS-4536-branch-2--N8.patch, HDFS-4536-branch-2-N9.patch, HDFS-4536-trunk--N6.patch, HDFS-4536-trunk--N7.patch, HDFS-4536-trunk--N8.patch, HDFS-4536-trunk-N9.patch
[jira] [Commented] (YARN-1236) FairScheduler setting queue name in RMApp is not working
[ https://issues.apache.org/jira/browse/YARN-1236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13786061#comment-13786061 ] Hudson commented on YARN-1236: -- FAILURE: Integrated in Hadoop-Yarn-trunk #352 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/352/]) YARN-1236. FairScheduler setting queue name in RMApp is not working. (Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1529034) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java FairScheduler setting queue name in RMApp is not working - Key: YARN-1236 URL: https://issues.apache.org/jira/browse/YARN-1236 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Sandy Ryza Assignee: Sandy Ryza Fix For: 2.1.2-beta Attachments: YARN-1236.patch The fair scheduler sometimes picks a different queue than the one an application was submitted to, such as when user-as-default-queue is turned on. It needs to update the queue name in the RMApp so that this choice will be reflected in the UI. This isn't working because the scheduler is looking up the RMApp by application attempt id instead of app id and failing to find it.
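The root cause is a map keyed by ApplicationId being probed with an ApplicationAttemptId. A hedged sketch of the lookup and the likely shape of the fix (the map here is a stand-in for the RMContext's app table):
{code}
import java.util.Map;
import org.apache.hadoop.yarn.api.records.ApplicationAttemptId;
import org.apache.hadoop.yarn.api.records.ApplicationId;

public class QueueLookupSketch {
  // rmApps is keyed by ApplicationId, so probing it with an attempt id
  // (or a string built from one) never matches.
  public static <V> V lookup(Map<ApplicationId, V> rmApps,
                             ApplicationAttemptId attemptId) {
    // Fix: derive the application id from the attempt id first.
    ApplicationId appId = attemptId.getApplicationId();
    return rmApps.get(appId);
  }
}
{code}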
[jira] [Commented] (YARN-1199) Make NM/RM Versions Available
[ https://issues.apache.org/jira/browse/YARN-1199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13786055#comment-13786055 ] Hudson commented on YARN-1199: -- FAILURE: Integrated in Hadoop-Yarn-trunk #352 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/352/]) YARN-1199. Make NM/RM Versions Available (Mit Desai via jeagles) (jeagles: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1529003) * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/v2/TestRMNMInfo.java * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/nodemanager/NodeInfo.java * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/RMNodeWrapper.java * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMNMInfo.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNode.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/NodesPage.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/NodeInfo.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNM.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNodes.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMNodeTransitions.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestNodesPage.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesNodes.java Make NM/RM Versions Available - Key: YARN-1199 URL: https://issues.apache.org/jira/browse/YARN-1199 Project: Hadoop YARN Issue Type: Improvement Reporter: Mit Desai Assignee: Mit Desai Fix For: 3.0.0, 2.3.0 Attachments: YARN-1199.patch, YARN-1199.patch, YARN-1199.patch, YARN-1199.patch Now that we have the NM and RM versions available, we can display the YARN version of the nodes running in the cluster.
[jira] [Commented] (YARN-1219) FSDownload changes file suffix making FileUtil.unTar() throw exception
[ https://issues.apache.org/jira/browse/YARN-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13786059#comment-13786059 ] Hudson commented on YARN-1219: -- FAILURE: Integrated in Hadoop-Yarn-trunk #352 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/352/]) YARN-1219. FSDownload changes file suffix making FileUtil.unTar() throw exception. Contributed by Shanyu Zhao. (cnauroth: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1529084) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/FSDownload.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestFSDownload.java FSDownload changes file suffix making FileUtil.unTar() throw exception -- Key: YARN-1219 URL: https://issues.apache.org/jira/browse/YARN-1219 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 3.0.0, 2.1.1-beta, 2.1.2-beta Reporter: shanyu zhao Assignee: shanyu zhao Fix For: 2.1.2-beta Attachments: YARN-1219.patch While running a Hive join operation on YARN, I saw the exception described below. This is caused by FSDownload copying the files into a temp file and changing the suffix to .tmp before unpacking. In unpack(), it uses FileUtil.unTar(), which determines whether the file is gzipped by looking at the file suffix: {code} boolean gzipped = inFile.toString().endsWith("gz"); {code} To fix this problem, we can remove the .tmp from the temp file name. Here is the detailed exception: org.apache.commons.compress.archivers.tar.TarArchiveInputStream.getNextTarEntry(TarArchiveInputStream.java:240) at org.apache.hadoop.fs.FileUtil.unTarUsingJava(FileUtil.java:676) at org.apache.hadoop.fs.FileUtil.unTar(FileUtil.java:625) at org.apache.hadoop.yarn.util.FSDownload.unpack(FSDownload.java:203) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:287) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:50) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:722)
[jira] [Commented] (YARN-890) The roundup for memory values on resource manager UI is misleading
[ https://issues.apache.org/jira/browse/YARN-890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13786064#comment-13786064 ] Hudson commented on YARN-890: - FAILURE: Integrated in Hadoop-Yarn-trunk #352 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/352/]) YARN-890. Ensure CapacityScheduler doesn't round-up metric for available resources. Contributed by Xuan Gong & Hitesh Shah. (acmurthy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1529015) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSQueueUtils.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java The roundup for memory values on resource manager UI is misleading -- Key: YARN-890 URL: https://issues.apache.org/jira/browse/YARN-890 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Trupti Dhavle Assignee: Xuan Gong Fix For: 2.1.2-beta Attachments: Screen Shot 2013-07-10 at 10.43.34 AM.png, YARN-890.1.patch, YARN-890.2.patch From the yarn-site.xml, I see the following values:
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>4192</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>4192</value>
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>1024</value>
</property>
However the resourcemanager UI shows total memory as 5MB
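A worked example of the misleading round-up, assuming the UI rounds node memory up to whole gigabytes; 4192 MB becomes 5, which the page then renders with the wrong unit:
{code}
public class RoundUpSketch {
  public static void main(String[] args) {
    int nodeMemoryMb = 4192;           // yarn.nodemanager.resource.memory-mb
    // Ceiling division: 4192 MB / 1024 rounds up to 5 "GB".
    int roundedGb = (nodeMemoryMb + 1024 - 1) / 1024;
    System.out.println(roundedGb);     // prints 5
  }
}
{code}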
[jira] [Commented] (YARN-1256) NM silently ignores non-existent service in StartContainerRequest
[ https://issues.apache.org/jira/browse/YARN-1256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13786066#comment-13786066 ] Hudson commented on YARN-1256: -- FAILURE: Integrated in Hadoop-Yarn-trunk #352 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/352/]) Addendum for missing file YARN-1256. NM silently ignores non-existent service in StartContainerRequest (Xuan Gong via bikas) (bikas: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1529048) * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/exceptions/InvalidAuxServiceException.java YARN-1256. NM silently ignores non-existent service in StartContainerRequest (Xuan Gong via bikas) (bikas: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1529039) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/AuxiliaryServiceHelper.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/AuxServices.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/ContainerManagerImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestContainerManagerWithLCE.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/TestContainerManager.java NM silently ignores non-existent service in StartContainerRequest - Key: YARN-1256 URL: https://issues.apache.org/jira/browse/YARN-1256 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.1.1-beta Reporter: Bikas Saha Assignee: Xuan Gong Priority: Critical Fix For: 2.1.2-beta Attachments: YARN-1256.1.patch, YARN-1256.2.patch, YARN-1256.3.patch, YARN-1256.4.patch, YARN-1256.5.patch A container can set token service metadata for a service, say shuffle_service. If that service does not exist, then the error is silently ignored. Later, when the next container wants to access data written to shuffle_service by the first task, it fails because the service does not have the token that was supposed to be set by the first task.
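For context, service data is attached to containers through the public ContainerLaunchContext API; the service name below matches the example in the description, and the failure behavior noted in the comment is what the committed InvalidAuxServiceException enables. A hedged sketch:
{code}
import java.nio.ByteBuffer;
import java.util.Collections;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;

public class ServiceDataSketch {
  public static void attachShuffleToken(ContainerLaunchContext ctx,
                                        ByteBuffer serviceToken) {
    // If "shuffle_service" is not configured as an aux service on the NM,
    // the NM used to drop this entry silently; after this fix the
    // StartContainerRequest should fail instead.
    ctx.setServiceData(Collections.singletonMap("shuffle_service", serviceToken));
  }
}
{code}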
[jira] [Commented] (YARN-1131) $yarn logs command should return an appropriate error message if YARN application is still running
[ https://issues.apache.org/jira/browse/YARN-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13786058#comment-13786058 ] Hudson commented on YARN-1131: -- FAILURE: Integrated in Hadoop-Yarn-trunk #352 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/352/]) YARN-1131. logs command should return an appropriate error message if YARN application is still running. Contributed by Siddharth Seth. (hitesh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1529068) * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/tools/CLI.java * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/bin/yarn * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/LogsCLI.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestLogsCLI.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/LogCLIHelpers.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/LogDumper.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/logaggregation/TestLogDumper.java $yarn logs command should return an appropriate error message if YARN application is still running -- Key: YARN-1131 URL: https://issues.apache.org/jira/browse/YARN-1131 Project: Hadoop YARN Issue Type: Sub-task Components: client Affects Versions: 2.1.1-beta Reporter: Tassapol Athiapinya Assignee: Siddharth Seth Priority: Minor Fix For: 2.1.2-beta Attachments: YARN-1131.1.txt, YARN-1131.2.txt, YARN-1131.3.txt In the case when log aggregation is enabled, if a user submits a MapReduce job and runs $ yarn logs -applicationId <app ID> while the YARN application is running, the command will return no message and return the user back to the shell. It would be nice to tell the user that log aggregation is in progress. {code} -bash-4.1$ /usr/bin/yarn logs -applicationId application_1377900193583_0002 -bash-4.1$ {code} At the same time, if an invalid application ID is given, the YARN CLI should say that the application ID is incorrect rather than throwing NoSuchElementException. {code} $ /usr/bin/yarn logs -applicationId application_0 Exception in thread "main" java.util.NoSuchElementException at com.google.common.base.AbstractIterator.next(AbstractIterator.java:75) at org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:124) at org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:119) at org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:110) at org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:255) {code}
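The NoSuchElementException comes from parsing an application id with too few underscore-separated fields. A defensive sketch of the validation the CLI could perform; ConverterUtils.toApplicationId is the real parser, while the wrapper here is illustrative:
{code}
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.util.ConverterUtils;

public class AppIdParseSketch {
  public static ApplicationId parseOrExplain(String arg) {
    try {
      return ConverterUtils.toApplicationId(arg);
    } catch (Exception e) {
      // e.g. "application_0" has too few fields and used to surface as a
      // raw NoSuchElementException from the iterator inside the parser.
      throw new IllegalArgumentException("Invalid ApplicationId: " + arg, e);
    }
  }
}
{code}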
[jira] [Commented] (YARN-1271) Text file busy errors launching containers again
[ https://issues.apache.org/jira/browse/YARN-1271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13786062#comment-13786062 ] Hudson commented on YARN-1271: -- FAILURE: Integrated in Hadoop-Yarn-trunk #352 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/352/]) YARN-1271. Text file busy errors launching containers again (Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1529058) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/ContainerExecutor.java Text file busy errors launching containers again -- Key: YARN-1271 URL: https://issues.apache.org/jira/browse/YARN-1271 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.1.1-beta Reporter: Sandy Ryza Assignee: Sandy Ryza Fix For: 2.1.2-beta Attachments: YARN-1271.patch The error is shown below in the comments. MAPREDUCE-2374 fixed this by removing -c when running the container launch script. It looks like the -c got brought back during the windows branch merge, so we should remove it again.
[jira] [Commented] (YARN-1149) NM throws InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING
[ https://issues.apache.org/jira/browse/YARN-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13786065#comment-13786065 ] Hudson commented on YARN-1149: -- FAILURE: Integrated in Hadoop-Yarn-trunk #352 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/352/]) YARN-1149. NM throws InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING. Contributed by Xuan Gong. (hitesh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1529043) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestAMRMClient.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/CMgrCompletedAppsEvent.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/CMgrCompletedContainersEvent.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/ContainerManagerImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/application/ApplicationImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeManagerReboot.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeManagerResync.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeManagerShutdown.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/BaseContainerManagerTest.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/TestContainerManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/application/TestApplication.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java NM throws InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING - Key: YARN-1149 URL: https://issues.apache.org/jira/browse/YARN-1149 Project: Hadoop YARN Issue Type: Bug Reporter: Ramya Sunil Assignee: Xuan Gong Fix For: 2.1.2-beta Attachments: YARN-1149.1.patch, YARN-1149.2.patch, YARN-1149.3.patch, YARN-1149.4.patch, YARN-1149.5.patch, YARN-1149.6.patch, YARN-1149.7.patch, YARN-1149.8.patch, YARN-1149.9.patch, YARN-1149_branch-2.1-beta.1.patch When the nodemanager receives a kill signal after an application has finished execution but log aggregation has not kicked in, InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING is thrown {noformat} 2013-08-25 20:45:00,875 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:finishLogAggregation(254)) - Application just finished : application_1377459190746_0118 2013-08-25 20:45:00,876 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:uploadLogsForContainer(105)) - Starting aggregate log-file for app application_1377459190746_0118 at /app-logs/foo/logs/application_1377459190746_0118/host_45454.tmp 2013-08-25 20:45:00,876 INFO
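Hadoop's NM state machines are built with StateMachineFactory, and the usual fix for an InvalidStateTransitonException is to register the missing (state, event) pair. A hedged sketch with simplified enum names, not the actual ApplicationImpl code:
{code}
import org.apache.hadoop.yarn.state.SingleArcTransition;
import org.apache.hadoop.yarn.state.StateMachineFactory;

public class AppStateSketch {
  enum AppState { RUNNING, FINISHED }
  enum AppEventType { APPLICATION_LOG_HANDLING_FINISHED }
  static class AppEvent { }
  static class App { }

  // Without a transition for (RUNNING, APPLICATION_LOG_HANDLING_FINISHED),
  // delivering that event at RUNNING throws InvalidStateTransitonException.
  static final StateMachineFactory<App, AppState, AppEventType, AppEvent>
      FACTORY = new StateMachineFactory<App, AppState, AppEventType, AppEvent>(
          AppState.RUNNING)
          .addTransition(AppState.RUNNING, AppState.RUNNING,
              AppEventType.APPLICATION_LOG_HANDLING_FINISHED,
              new SingleArcTransition<App, AppEvent>() {
                @Override
                public void transition(App app, AppEvent event) {
                  // Tolerate the event, e.g. when a kill races log aggregation.
                }
              })
          .installTopology();
}
{code}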
[jira] [Commented] (YARN-677) Increase coverage to FairScheduler
[ https://issues.apache.org/jira/browse/YARN-677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13786135#comment-13786135 ] Hudson commented on YARN-677: - SUCCESS: Integrated in Hadoop-Hdfs-trunk #1542 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1542/]) Revert YARN-677. Increase coverage to FairScheduler (Vadim Bondarev and Dennis Y via jeagles) (jeagles: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1528914) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java Increase coverage to FairScheduler -- Key: YARN-677 URL: https://issues.apache.org/jira/browse/YARN-677 Project: Hadoop YARN Issue Type: Test Affects Versions: 3.0.0, 2.0.3-alpha, 0.23.6 Reporter: Vadim Bondarev Assignee: Andrey Klochkov Attachments: HADOOP-4536-branch-2-a.patch, HADOOP-4536-branch-2c.patch, HADOOP-4536-trunk-a.patch, HADOOP-4536-trunk-c.patch, HDFS-4536-branch-2--N7.patch, HDFS-4536-branch-2--N8.patch, HDFS-4536-branch-2-N9.patch, HDFS-4536-trunk--N6.patch, HDFS-4536-trunk--N7.patch, HDFS-4536-trunk--N8.patch, HDFS-4536-trunk-N9.patch
[jira] [Commented] (YARN-1131) $yarn logs command should return an appropriate error message if YARN application is still running
[ https://issues.apache.org/jira/browse/YARN-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13786136#comment-13786136 ] Hudson commented on YARN-1131: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1542 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1542/]) YARN-1131. logs command should return an appropriate error message if YARN application is still running. Contributed by Siddharth Seth. (hitesh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1529068) * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/tools/CLI.java * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/bin/yarn * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/LogsCLI.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestLogsCLI.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/LogCLIHelpers.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/LogDumper.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/logaggregation/TestLogDumper.java $yarn logs command should return an appropriate error message if YARN application is still running -- Key: YARN-1131 URL: https://issues.apache.org/jira/browse/YARN-1131 Project: Hadoop YARN Issue Type: Sub-task Components: client Affects Versions: 2.1.1-beta Reporter: Tassapol Athiapinya Assignee: Siddharth Seth Priority: Minor Fix For: 2.1.2-beta Attachments: YARN-1131.1.txt, YARN-1131.2.txt, YARN-1131.3.txt In the case when log aggregation is enabled, if a user submits a MapReduce job and runs $ yarn logs -applicationId <app ID> while the YARN application is running, the command will return no message and return the user back to the shell. It would be nice to tell the user that log aggregation is in progress. {code} -bash-4.1$ /usr/bin/yarn logs -applicationId application_1377900193583_0002 -bash-4.1$ {code} At the same time, if an invalid application ID is given, the YARN CLI should say that the application ID is incorrect rather than throwing NoSuchElementException. {code} $ /usr/bin/yarn logs -applicationId application_0 Exception in thread "main" java.util.NoSuchElementException at com.google.common.base.AbstractIterator.next(AbstractIterator.java:75) at org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:124) at org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:119) at org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:110) at org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:255) {code}
[jira] [Commented] (YARN-621) RM triggers web auth failure before first job
[ https://issues.apache.org/jira/browse/YARN-621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13786138#comment-13786138 ] Hudson commented on YARN-621: - SUCCESS: Integrated in Hadoop-Hdfs-trunk #1542 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1542/]) YARN-621. Changed YARN web app to not add paths that can cause duplicate additions of authenticated filters, thereby causing kerberos replay errors. Contributed by Omkar Vinit Joshi. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1529030) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/WebApps.java RM triggers web auth failure before first job - Key: YARN-621 URL: https://issues.apache.org/jira/browse/YARN-621 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.0.4-alpha Reporter: Allen Wittenauer Assignee: Omkar Vinit Joshi Priority: Critical Fix For: 2.1.2-beta Attachments: YARN-621.20131001.1.patch On a secure YARN setup, before the first job is executed, going to the web interface of the resource manager triggers authentication errors.
[jira] [Commented] (YARN-1219) FSDownload changes file suffix making FileUtil.unTar() throw exception
[ https://issues.apache.org/jira/browse/YARN-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13786137#comment-13786137 ] Hudson commented on YARN-1219: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1542 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1542/]) YARN-1219. FSDownload changes file suffix making FileUtil.unTar() throw exception. Contributed by Shanyu Zhao. (cnauroth: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1529084) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/FSDownload.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestFSDownload.java FSDownload changes file suffix making FileUtil.unTar() throw exception -- Key: YARN-1219 URL: https://issues.apache.org/jira/browse/YARN-1219 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 3.0.0, 2.1.1-beta, 2.1.2-beta Reporter: shanyu zhao Assignee: shanyu zhao Fix For: 2.1.2-beta Attachments: YARN-1219.patch While running a Hive join operation on YARN, I saw the exception described below. This is caused by FSDownload copying the files into a temp file and changing the suffix to .tmp before unpacking. In unpack(), it uses FileUtil.unTar(), which determines whether the file is gzipped by looking at the file suffix: {code} boolean gzipped = inFile.toString().endsWith("gz"); {code} To fix this problem, we can remove the .tmp from the temp file name. Here is the detailed exception: org.apache.commons.compress.archivers.tar.TarArchiveInputStream.getNextTarEntry(TarArchiveInputStream.java:240) at org.apache.hadoop.fs.FileUtil.unTarUsingJava(FileUtil.java:676) at org.apache.hadoop.fs.FileUtil.unTar(FileUtil.java:625) at org.apache.hadoop.yarn.util.FSDownload.unpack(FSDownload.java:203) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:287) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:50) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:722)
[jira] [Commented] (YARN-1256) NM silently ignores non-existent service in StartContainerRequest
[ https://issues.apache.org/jira/browse/YARN-1256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13786144#comment-13786144 ] Hudson commented on YARN-1256: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1542 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1542/]) Addendum for missing file YARN-1256. NM silently ignores non-existent service in StartContainerRequest (Xuan Gong via bikas) (bikas: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1529048) * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/exceptions/InvalidAuxServiceException.java YARN-1256. NM silently ignores non-existent service in StartContainerRequest (Xuan Gong via bikas) (bikas: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1529039) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/AuxiliaryServiceHelper.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/AuxServices.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/ContainerManagerImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestContainerManagerWithLCE.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/TestContainerManager.java NM silently ignores non-existent service in StartContainerRequest - Key: YARN-1256 URL: https://issues.apache.org/jira/browse/YARN-1256 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.1.1-beta Reporter: Bikas Saha Assignee: Xuan Gong Priority: Critical Fix For: 2.1.2-beta Attachments: YARN-1256.1.patch, YARN-1256.2.patch, YARN-1256.3.patch, YARN-1256.4.patch, YARN-1256.5.patch A container can set token service metadata for a service, say shuffle_service. If that service does not exist, then the error is silently ignored. Later, when the next container wants to access data written to shuffle_service by the first task, it fails because the service does not have the token that was supposed to be set by the first task.
[jira] [Commented] (YARN-890) The roundup for memory values on resource manager UI is misleading
[ https://issues.apache.org/jira/browse/YARN-890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13786142#comment-13786142 ] Hudson commented on YARN-890: - SUCCESS: Integrated in Hadoop-Hdfs-trunk #1542 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1542/]) YARN-890. Ensure CapacityScheduler doesn't round-up metric for available resources. Contributed by Xuan Gong & Hitesh Shah. (acmurthy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1529015) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSQueueUtils.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java The roundup for memory values on resource manager UI is misleading -- Key: YARN-890 URL: https://issues.apache.org/jira/browse/YARN-890 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Trupti Dhavle Assignee: Xuan Gong Fix For: 2.1.2-beta Attachments: Screen Shot 2013-07-10 at 10.43.34 AM.png, YARN-890.1.patch, YARN-890.2.patch From the yarn-site.xml, I see the following values: {code} <property> <name>yarn.nodemanager.resource.memory-mb</name> <value>4192</value> </property> <property> <name>yarn.scheduler.maximum-allocation-mb</name> <value>4192</value> </property> <property> <name>yarn.scheduler.minimum-allocation-mb</name> <value>1024</value> </property> {code} However the resourcemanager UI shows total memory as 5MB -- This message was sent by Atlassian JIRA (v6.1#6144)
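The numbers in the report line up with the scheduler normalizing node memory up to a multiple of the minimum allocation before aggregating. A back-of-the-envelope sketch of that arithmetic; the round-up rule is an assumption for illustration, and "5 GB" is presumably what the description's "5MB" intends:
{code}
// Illustrative arithmetic only; RoundUpDemo is not YARN code.
public class RoundUpDemo {
    // Ceiling of value to the nearest multiple of step.
    static int roundUp(int value, int step) {
        return ((value + step - 1) / step) * step;
    }

    public static void main(String[] args) {
        int nodeMemoryMb = 4192; // yarn.nodemanager.resource.memory-mb
        int minAllocMb = 1024;   // yarn.scheduler.minimum-allocation-mb
        // 4192 MB rounds up to 5120 MB, which a UI renders as "5 GB":
        // more memory than the node actually has, hence "misleading".
        System.out.println(roundUp(nodeMemoryMb, minAllocMb) + " MB");
    }
}
{code}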
[jira] [Commented] (YARN-1149) NM throws InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING
[ https://issues.apache.org/jira/browse/YARN-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13786143#comment-13786143 ] Hudson commented on YARN-1149: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1542 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1542/]) YARN-1149. NM throws InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING. Contributed by Xuan Gong. (hitesh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1529043) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestAMRMClient.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/CMgrCompletedAppsEvent.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/CMgrCompletedContainersEvent.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/ContainerManagerImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/application/ApplicationImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeManagerReboot.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeManagerResync.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeManagerShutdown.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/BaseContainerManagerTest.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/TestContainerManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/application/TestApplication.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java NM throws InvalidStateTransitonException: Invalid event: 
APPLICATION_LOG_HANDLING_FINISHED at RUNNING - Key: YARN-1149 URL: https://issues.apache.org/jira/browse/YARN-1149 Project: Hadoop YARN Issue Type: Bug Reporter: Ramya Sunil Assignee: Xuan Gong Fix For: 2.1.2-beta Attachments: YARN-1149.1.patch, YARN-1149.2.patch, YARN-1149.3.patch, YARN-1149.4.patch, YARN-1149.5.patch, YARN-1149.6.patch, YARN-1149.7.patch, YARN-1149.8.patch, YARN-1149.9.patch, YARN-1149_branch-2.1-beta.1.patch When nodemanager receives a kill signal when an application has finished execution but log aggregation has not kicked in, InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING is thrown {noformat} 2013-08-25 20:45:00,875 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:finishLogAggregation(254)) - Application just finished : application_1377459190746_0118 2013-08-25 20:45:00,876 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:uploadLogsForContainer(105)) - Starting aggregate log-file for app application_1377459190746_0118 at /app-logs/foo/logs/application_1377459190746_0118/host_45454.tmp 2013-08-25 20:45:00,876 INFO
[jira] [Updated] (YARN-1251) TestDistributedShell#TestDSShell failed with timeout
[ https://issues.apache.org/jira/browse/YARN-1251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-1251: - Attachment: YARN-1225-kickOffTestDS.patch The test fails on Jenkins. Attaching a test patch to kick off the test and reproduce the failure, although I cannot reproduce it locally. TestDistributedShell#TestDSShell failed with timeout Key: YARN-1251 URL: https://issues.apache.org/jira/browse/YARN-1251 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Reporter: Junping Du Attachments: YARN-1225-kickOffTestDS.patch The stacktrace: {code} java.lang.Exception: test timed out after 90000 milliseconds at com.google.protobuf.LiteralByteString.<init>(LiteralByteString.java:234) at com.google.protobuf.ByteString.copyFromUtf8(ByteString.java:255) at org.apache.hadoop.ipc.protobuf.ProtobufRpcEngineProtos$RequestHeaderProto.getMethodNameBytes(ProtobufRpcEngineProtos.java:286) at org.apache.hadoop.ipc.protobuf.ProtobufRpcEngineProtos$RequestHeaderProto.getSerializedSize(ProtobufRpcEngineProtos.java:462) at com.google.protobuf.AbstractMessageLite.writeDelimitedTo(AbstractMessageLite.java:84) at org.apache.hadoop.ipc.ProtobufRpcEngine$RpcMessageWithHeader.write(ProtobufRpcEngine.java:302) at org.apache.hadoop.ipc.Client$Connection.sendRpcRequest(Client.java:989) at org.apache.hadoop.ipc.Client.call(Client.java:1377) at org.apache.hadoop.ipc.Client.call(Client.java:1357) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) at $Proxy70.getApplicationReport(Unknown Source) at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationReport(ApplicationClientProtocolPBClientImpl.java:137) at sun.reflect.GeneratedMethodAccessor40.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:185) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:101) at $Proxy71.getApplicationReport(Unknown Source) at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplicationReport(YarnClientImpl.java:195) at org.apache.hadoop.yarn.applications.distributedshell.Client.monitorApplication(Client.java:622) at org.apache.hadoop.yarn.applications.distributedshell.Client.run(Client.java:597) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:125) {code} For details, please refer to: https://builds.apache.org/job/PreCommit-YARN-Build/2039//testReport/ -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1236) FairScheduler setting queue name in RMApp is not working
[ https://issues.apache.org/jira/browse/YARN-1236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13786139#comment-13786139 ] Hudson commented on YARN-1236: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1542 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1542/]) YARN-1236. FairScheduler setting queue name in RMApp is not working. (Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1529034) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java FairScheduler setting queue name in RMApp is not working - Key: YARN-1236 URL: https://issues.apache.org/jira/browse/YARN-1236 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Sandy Ryza Assignee: Sandy Ryza Fix For: 2.1.2-beta Attachments: YARN-1236.patch The fair scheduler sometimes picks a different queue than the one an application was submitted to, such as when user-as-default-queue is turned on. It needs to update the queue name in the RMApp so that this choice will be reflected in the UI. This isn't working because the scheduler is looking up the RMApp by application attempt id instead of app id and failing to find it. -- This message was sent by Atlassian JIRA (v6.1#6144)
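The bug described above boils down to querying a map with the wrong kind of identifier. An illustrative sketch, with plain strings standing in for ApplicationId and ApplicationAttemptId; this is not the FairScheduler code itself:
{code}
import java.util.HashMap;
import java.util.Map;

// Sketch of the lookup mismatch: the RMApp table is keyed by application id,
// so a query by attempt id always misses and the queue name is never updated.
public class RmAppLookupDemo {
    public static void main(String[] args) {
        Map<String, String> rmApps = new HashMap<>(); // appId -> queue name
        rmApps.put("application_1380000000000_0001", "default");

        String attemptId = "appattempt_1380000000000_0001_000001";
        String appId = "application_1380000000000_0001";

        System.out.println(rmApps.get(attemptId)); // null -> the reported bug
        System.out.println(rmApps.get(appId));     // "default" -> intended lookup
    }
}
{code}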
[jira] [Commented] (YARN-1199) Make NM/RM Versions Available
[ https://issues.apache.org/jira/browse/YARN-1199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13786133#comment-13786133 ] Hudson commented on YARN-1199: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1542 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1542/]) YARN-1199. Make NM/RM Versions Available (Mit Desai via jeagles) (jeagles: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1529003) * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/v2/TestRMNMInfo.java * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/nodemanager/NodeInfo.java * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/RMNodeWrapper.java * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMNMInfo.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNode.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/NodesPage.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/NodeInfo.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNM.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNodes.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMNodeTransitions.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestNodesPage.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesNodes.java Make NM/RM Versions Available - Key: YARN-1199 URL: https://issues.apache.org/jira/browse/YARN-1199 Project: Hadoop YARN Issue Type: Improvement Reporter: Mit Desai Assignee: Mit Desai Fix For: 3.0.0, 2.3.0 Attachments: YARN-1199.patch, YARN-1199.patch, YARN-1199.patch, YARN-1199.patch Now that we have the NM and RM versions available, we can display the YARN version of the nodes running in the cluster. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1271) Text file busy errors launching containers again
[ https://issues.apache.org/jira/browse/YARN-1271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13786140#comment-13786140 ] Hudson commented on YARN-1271: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1542 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1542/]) YARN-1271. Text file busy errors launching containers again (Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1529058) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/ContainerExecutor.java Text file busy errors launching containers again -- Key: YARN-1271 URL: https://issues.apache.org/jira/browse/YARN-1271 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.1.1-beta Reporter: Sandy Ryza Assignee: Sandy Ryza Fix For: 2.1.2-beta Attachments: YARN-1271.patch The error is shown below in the comments. MAPREDUCE-2374 fixed this by removing -c when running the container launch script. It looks like the -c got brought back during the windows branch merge, so we should remove it again. -- This message was sent by Atlassian JIRA (v6.1#6144)
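For readers unfamiliar with the "-c" distinction the description relies on: with "bash -c <path>" the shell treats the path as a command and exec()s the script file itself, which can fail with ETXTBSY ("text file busy") if the NodeManager still has the file open for writing; "bash <path>" merely opens and interprets it. A sketch of the two argument vectors, with a hypothetical script path; nothing is executed here:
{code}
import java.util.Arrays;

// Contrast of the two launch forms discussed above.
public class LaunchFormDemo {
    public static void main(String[] args) {
        String script = "/tmp/launch_container.sh"; // illustrative path
        // Problematic form: the script file is exec()'d as a program.
        System.out.println(Arrays.asList("bash", "-c", script));
        // Fixed form: the shell reads the file; no exec of the file itself.
        System.out.println(Arrays.asList("bash", script));
    }
}
{code}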
[jira] [Commented] (YARN-7) Add support for DistributedShell to ask for CPUs along with memory
[ https://issues.apache.org/jira/browse/YARN-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13786146#comment-13786146 ] Junping Du commented on YARN-7: --- The test failure is unrelated, as YARN-1251 shows. Add support for DistributedShell to ask for CPUs along with memory -- Key: YARN-7 URL: https://issues.apache.org/jira/browse/YARN-7 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.0.3-alpha Reporter: Arun C Murthy Assignee: Junping Du Labels: patch Attachments: YARN-7.patch, YARN-7-v2.patch, YARN-7-v3.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1251) TestDistributedShell#TestDSShell failed with timeout
[ https://issues.apache.org/jira/browse/YARN-1251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-1251: - Description: TestDistributedShell#TestDSShell has been failing consistently on trunk Jenkins recently. The stacktrace is: {code} java.lang.Exception: test timed out after 90000 milliseconds at com.google.protobuf.LiteralByteString.<init>(LiteralByteString.java:234) at com.google.protobuf.ByteString.copyFromUtf8(ByteString.java:255) at org.apache.hadoop.ipc.protobuf.ProtobufRpcEngineProtos$RequestHeaderProto.getMethodNameBytes(ProtobufRpcEngineProtos.java:286) at org.apache.hadoop.ipc.protobuf.ProtobufRpcEngineProtos$RequestHeaderProto.getSerializedSize(ProtobufRpcEngineProtos.java:462) at com.google.protobuf.AbstractMessageLite.writeDelimitedTo(AbstractMessageLite.java:84) at org.apache.hadoop.ipc.ProtobufRpcEngine$RpcMessageWithHeader.write(ProtobufRpcEngine.java:302) at org.apache.hadoop.ipc.Client$Connection.sendRpcRequest(Client.java:989) at org.apache.hadoop.ipc.Client.call(Client.java:1377) at org.apache.hadoop.ipc.Client.call(Client.java:1357) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) at $Proxy70.getApplicationReport(Unknown Source) at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationReport(ApplicationClientProtocolPBClientImpl.java:137) at sun.reflect.GeneratedMethodAccessor40.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:185) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:101) at $Proxy71.getApplicationReport(Unknown Source) at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplicationReport(YarnClientImpl.java:195) at org.apache.hadoop.yarn.applications.distributedshell.Client.monitorApplication(Client.java:622) at org.apache.hadoop.yarn.applications.distributedshell.Client.run(Client.java:597) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:125) {code} For details, please refer to: https://builds.apache.org/job/PreCommit-YARN-Build/2039//testReport/ was: The stacktrace: {code} java.lang.Exception: test timed out after 90000 milliseconds at com.google.protobuf.LiteralByteString.<init>(LiteralByteString.java:234) at com.google.protobuf.ByteString.copyFromUtf8(ByteString.java:255) at org.apache.hadoop.ipc.protobuf.ProtobufRpcEngineProtos$RequestHeaderProto.getMethodNameBytes(ProtobufRpcEngineProtos.java:286) at org.apache.hadoop.ipc.protobuf.ProtobufRpcEngineProtos$RequestHeaderProto.getSerializedSize(ProtobufRpcEngineProtos.java:462) at com.google.protobuf.AbstractMessageLite.writeDelimitedTo(AbstractMessageLite.java:84) at org.apache.hadoop.ipc.ProtobufRpcEngine$RpcMessageWithHeader.write(ProtobufRpcEngine.java:302) at org.apache.hadoop.ipc.Client$Connection.sendRpcRequest(Client.java:989) at org.apache.hadoop.ipc.Client.call(Client.java:1377) at org.apache.hadoop.ipc.Client.call(Client.java:1357) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) at $Proxy70.getApplicationReport(Unknown Source) at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationReport(ApplicationClientProtocolPBClientImpl.java:137) at sun.reflect.GeneratedMethodAccessor40.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:185) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:101) at $Proxy71.getApplicationReport(Unknown Source) at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplicationReport(YarnClientImpl.java:195) at org.apache.hadoop.yarn.applications.distributedshell.Client.monitorApplication(Client.java:622) at org.apache.hadoop.yarn.applications.distributedshell.Client.run(Client.java:597) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:125) {code} For details, please refer to: https://builds.apache.org/job/PreCommit-YARN-Build/2039//testReport/ TestDistributedShell#TestDSShell failed with timeout
[jira] [Commented] (YARN-1251) TestDistributedShell#TestDSShell failed with timeout
[ https://issues.apache.org/jira/browse/YARN-1251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13786159#comment-13786159 ] Hadoop QA commented on YARN-1251: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12606782/YARN-1225-kickOffTestDS.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell: org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2097//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2097//console This message is automatically generated. TestDistributedShell#TestDSShell failed with timeout Key: YARN-1251 URL: https://issues.apache.org/jira/browse/YARN-1251 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Reporter: Junping Du Attachments: YARN-1225-kickOffTestDS.patch TestDistributedShell#TestDSShell on trunk Jenkins are failed consistently recently. 
The stacktrace is: {code} java.lang.Exception: test timed out after 90000 milliseconds at com.google.protobuf.LiteralByteString.<init>(LiteralByteString.java:234) at com.google.protobuf.ByteString.copyFromUtf8(ByteString.java:255) at org.apache.hadoop.ipc.protobuf.ProtobufRpcEngineProtos$RequestHeaderProto.getMethodNameBytes(ProtobufRpcEngineProtos.java:286) at org.apache.hadoop.ipc.protobuf.ProtobufRpcEngineProtos$RequestHeaderProto.getSerializedSize(ProtobufRpcEngineProtos.java:462) at com.google.protobuf.AbstractMessageLite.writeDelimitedTo(AbstractMessageLite.java:84) at org.apache.hadoop.ipc.ProtobufRpcEngine$RpcMessageWithHeader.write(ProtobufRpcEngine.java:302) at org.apache.hadoop.ipc.Client$Connection.sendRpcRequest(Client.java:989) at org.apache.hadoop.ipc.Client.call(Client.java:1377) at org.apache.hadoop.ipc.Client.call(Client.java:1357) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) at $Proxy70.getApplicationReport(Unknown Source) at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationReport(ApplicationClientProtocolPBClientImpl.java:137) at sun.reflect.GeneratedMethodAccessor40.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:185) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:101) at $Proxy71.getApplicationReport(Unknown Source) at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplicationReport(YarnClientImpl.java:195) at org.apache.hadoop.yarn.applications.distributedshell.Client.monitorApplication(Client.java:622) at org.apache.hadoop.yarn.applications.distributedshell.Client.run(Client.java:597) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:125) {code} For details, please refer to: https://builds.apache.org/job/PreCommit-YARN-Build/2039//testReport/ -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1225) FinishApplicationMasterRequest should also have a final IPC/RPC address.
[ https://issues.apache.org/jira/browse/YARN-1225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13786155#comment-13786155 ] Junping Du commented on YARN-1225: -- Hi [~vinodkv], would you help review the patch here? I guess any protocol changes had best happen before the branch-2 GA, no? The test failure on TestDistributedShell seems to be unrelated (it also appears on previous JIRAs like YARN-49). Thanks! FinishApplicationMasterRequest should also have a final IPC/RPC address. Key: YARN-1225 URL: https://issues.apache.org/jira/browse/YARN-1225 Project: Hadoop YARN Issue Type: Sub-task Components: api Reporter: Vinod Kumar Vavilapalli Assignee: Junping Du Attachments: YARN-1225-kickOffTestDS.patch, YARN-1225-v1.patch, YARN-1225-v2.patch, YARN-1225-v3.patch AMs can already report a final HTTP URL via FinishApplicationMasterRequest, but there is no field to report an IPC/RPC address. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-7) Add support for DistributedShell to ask for CPUs along with memory
[ https://issues.apache.org/jira/browse/YARN-7?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-7: -- Target Version/s: 2.1.2-beta (was: 2.1.0-beta, 2.0.4-alpha) Affects Version/s: (was: 2.0.3-alpha) 2.1.1-beta Add support for DistributedShell to ask for CPUs along with memory -- Key: YARN-7 URL: https://issues.apache.org/jira/browse/YARN-7 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.1.1-beta Reporter: Arun C Murthy Assignee: Junping Du Labels: patch Attachments: YARN-7.patch, YARN-7-v2.patch, YARN-7-v3.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-890) The roundup for memory values on resource manager UI is misleading
[ https://issues.apache.org/jira/browse/YARN-890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13786182#comment-13786182 ] Hudson commented on YARN-890: - FAILURE: Integrated in Hadoop-Mapreduce-trunk #1568 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1568/]) YARN-890. Ensure CapacityScheduler doesn't round-up metric for available resources. Contributed by Xuan Gong & Hitesh Shah. (acmurthy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1529015) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSQueueUtils.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java The roundup for memory values on resource manager UI is misleading -- Key: YARN-890 URL: https://issues.apache.org/jira/browse/YARN-890 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Trupti Dhavle Assignee: Xuan Gong Fix For: 2.1.2-beta Attachments: Screen Shot 2013-07-10 at 10.43.34 AM.png, YARN-890.1.patch, YARN-890.2.patch From the yarn-site.xml, I see the following values: {code} <property> <name>yarn.nodemanager.resource.memory-mb</name> <value>4192</value> </property> <property> <name>yarn.scheduler.maximum-allocation-mb</name> <value>4192</value> </property> <property> <name>yarn.scheduler.minimum-allocation-mb</name> <value>1024</value> </property> {code} However the resourcemanager UI shows total memory as 5MB -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1271) Text file busy errors launching containers again
[ https://issues.apache.org/jira/browse/YARN-1271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13786180#comment-13786180 ] Hudson commented on YARN-1271: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1568 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1568/]) YARN-1271. Text file busy errors launching containers again (Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1529058) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/ContainerExecutor.java Text file busy errors launching containers again -- Key: YARN-1271 URL: https://issues.apache.org/jira/browse/YARN-1271 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.1.1-beta Reporter: Sandy Ryza Assignee: Sandy Ryza Fix For: 2.1.2-beta Attachments: YARN-1271.patch The error is shown below in the comments. MAPREDUCE-2374 fixed this by removing -c when running the container launch script. It looks like the -c got brought back during the windows branch merge, so we should remove it again. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1219) FSDownload changes file suffix making FileUtil.unTar() throw exception
[ https://issues.apache.org/jira/browse/YARN-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13786177#comment-13786177 ] Hudson commented on YARN-1219: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1568 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1568/]) YARN-1219. FSDownload changes file suffix making FileUtil.unTar() throw exception. Contributed by Shanyu Zhao. (cnauroth: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1529084) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/FSDownload.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestFSDownload.java FSDownload changes file suffix making FileUtil.unTar() throw exception -- Key: YARN-1219 URL: https://issues.apache.org/jira/browse/YARN-1219 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 3.0.0, 2.1.1-beta, 2.1.2-beta Reporter: shanyu zhao Assignee: shanyu zhao Fix For: 2.1.2-beta Attachments: YARN-1219.patch While running a Hive join operation on YARN, I saw the exception described below. This is caused by FSDownload copying the files into a temp file and changing the suffix to .tmp before unpacking it. In unpack(), it uses FileUtil.unTar(), which determines whether the file is gzipped by looking at the file suffix: {code} boolean gzipped = inFile.toString().endsWith("gz"); {code} To fix this problem, we can remove the .tmp from the temp file name. Here is the detailed exception: org.apache.commons.compress.archivers.tar.TarArchiveInputStream.getNextTarEntry(TarArchiveInputStream.java:240) at org.apache.hadoop.fs.FileUtil.unTarUsingJava(FileUtil.java:676) at org.apache.hadoop.fs.FileUtil.unTar(FileUtil.java:625) at org.apache.hadoop.yarn.util.FSDownload.unpack(FSDownload.java:203) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:287) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:50) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:722) -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-621) RM triggers web auth failure before first job
[ https://issues.apache.org/jira/browse/YARN-621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13786178#comment-13786178 ] Hudson commented on YARN-621: - FAILURE: Integrated in Hadoop-Mapreduce-trunk #1568 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1568/]) YARN-621. Changed YARN web app to not add paths that can cause duplicate additions of authenticated filters, thereby causing kerberos replay errors. Contributed by Omkar Vinit Joshi. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1529030) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/WebApps.java RM triggers web auth failure before first job - Key: YARN-621 URL: https://issues.apache.org/jira/browse/YARN-621 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.0.4-alpha Reporter: Allen Wittenauer Assignee: Omkar Vinit Joshi Priority: Critical Fix For: 2.1.2-beta Attachments: YARN-621.20131001.1.patch On a secure YARN setup, before the first job is executed, going to the web interface of the resource manager triggers authentication errors. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1236) FairScheduler setting queue name in RMApp is not working
[ https://issues.apache.org/jira/browse/YARN-1236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13786179#comment-13786179 ] Hudson commented on YARN-1236: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1568 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1568/]) YARN-1236. FairScheduler setting queue name in RMApp is not working. (Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1529034) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java FairScheduler setting queue name in RMApp is not working - Key: YARN-1236 URL: https://issues.apache.org/jira/browse/YARN-1236 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Sandy Ryza Assignee: Sandy Ryza Fix For: 2.1.2-beta Attachments: YARN-1236.patch The fair scheduler sometimes picks a different queue than the one an application was submitted to, such as when user-as-default-queue is turned on. It needs to update the queue name in the RMApp so that this choice will be reflected in the UI. This isn't working because the scheduler is looking up the RMApp by application attempt id instead of app id and failing to find it. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-677) Increase coverage to FairScheduler
[ https://issues.apache.org/jira/browse/YARN-677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13786175#comment-13786175 ] Hudson commented on YARN-677: - FAILURE: Integrated in Hadoop-Mapreduce-trunk #1568 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1568/]) Revert YARN-677. Increase coverage to FairScheduler (Vadim Bondarev and Dennis Y via jeagles) (jeagles: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1528914) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java Increase coverage to FairScheduler -- Key: YARN-677 URL: https://issues.apache.org/jira/browse/YARN-677 Project: Hadoop YARN Issue Type: Test Affects Versions: 3.0.0, 2.0.3-alpha, 0.23.6 Reporter: Vadim Bondarev Assignee: Andrey Klochkov Attachments: HADOOP-4536-branch-2-a.patch, HADOOP-4536-branch-2c.patch, HADOOP-4536-trunk-a.patch, HADOOP-4536-trunk-c.patch, HDFS-4536-branch-2--N7.patch, HDFS-4536-branch-2--N8.patch, HDFS-4536-branch-2-N9.patch, HDFS-4536-trunk--N6.patch, HDFS-4536-trunk--N7.patch, HDFS-4536-trunk--N8.patch, HDFS-4536-trunk-N9.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1256) NM silently ignores non-existent service in StartContainerRequest
[ https://issues.apache.org/jira/browse/YARN-1256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13786184#comment-13786184 ] Hudson commented on YARN-1256: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1568 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1568/]) Addendum for missing file YARN-1256. NM silently ignores non-existent service in StartContainerRequest (Xuan Gong via bikas) (bikas: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1529048) * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/exceptions/InvalidAuxServiceException.java YARN-1256. NM silently ignores non-existent service in StartContainerRequest (Xuan Gong via bikas) (bikas: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1529039) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/AuxiliaryServiceHelper.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/AuxServices.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/ContainerManagerImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestContainerManagerWithLCE.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/TestContainerManager.java NM silently ignores non-existent service in StartContainerRequest - Key: YARN-1256 URL: https://issues.apache.org/jira/browse/YARN-1256 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.1.1-beta Reporter: Bikas Saha Assignee: Xuan Gong Priority: Critical Fix For: 2.1.2-beta Attachments: YARN-1256.1.patch, YARN-1256.2.patch, YARN-1256.3.patch, YARN-1256.4.patch, YARN-1256.5.patch A container can set token service metadata for a service, say shuffle_service. If that service does not exist, then the error is silently ignored. Later, when the next container wants to access data written to shuffle_service by the first task, it fails because the service does not have the token that was supposed to be set by the first task. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1131) $yarn logs command should return an appropriate error message if YARN application is still running
[ https://issues.apache.org/jira/browse/YARN-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13786176#comment-13786176 ] Hudson commented on YARN-1131: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1568 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1568/]) YARN-1131. logs command should return an appropriate error message if YARN application is still running. Contributed by Siddharth Seth. (hitesh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1529068) * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/tools/CLI.java * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/bin/yarn * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/LogsCLI.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestLogsCLI.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/LogCLIHelpers.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/LogDumper.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/logaggregation/TestLogDumper.java $yarn logs command should return an appropriate error message if YARN application is still running -- Key: YARN-1131 URL: https://issues.apache.org/jira/browse/YARN-1131 Project: Hadoop YARN Issue Type: Sub-task Components: client Affects Versions: 2.1.1-beta Reporter: Tassapol Athiapinya Assignee: Siddharth Seth Priority: Minor Fix For: 2.1.2-beta Attachments: YARN-1131.1.txt, YARN-1131.2.txt, YARN-1131.3.txt In the case when log aggregation is enabled, if a user submits a MapReduce job and runs $ yarn logs -applicationId <app ID> while the YARN application is still running, the command returns no message and drops the user back to the shell. It would be nice to tell the user that log aggregation is in progress. {code} -bash-4.1$ /usr/bin/yarn logs -applicationId application_1377900193583_0002 -bash-4.1$ {code} At the same time, if an invalid application ID is given, the YARN CLI should say that the application ID is incorrect rather than throwing a NoSuchElementException. {code} $ /usr/bin/yarn logs -applicationId application_0 Exception in thread "main" java.util.NoSuchElementException at com.google.common.base.AbstractIterator.next(AbstractIterator.java:75) at org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:124) at org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:119) at org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:110) at org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:255) {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
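A sketch of the CLI-side hardening the second half of the report asks for: validate the id and print a readable message instead of letting NoSuchElementException escape. The parsing below is deliberately simplified and illustrative, not the actual LogsCLI or ConverterUtils code:
{code}
// Hypothetical validation sketch; real ids are parsed by ConverterUtils.
public class AppIdParseDemo {
    static void parseOrExplain(String arg) {
        // Valid ids look like application_<clusterTimestamp>_<sequenceNumber>.
        String[] parts = arg.split("_");
        if (parts.length != 3 || !"application".equals(parts[0])) {
            System.err.println("Invalid ApplicationId: " + arg);
            return;
        }
        System.out.println("clusterTimestamp=" + Long.parseLong(parts[1])
            + " id=" + Integer.parseInt(parts[2]));
    }

    public static void main(String[] args) {
        parseOrExplain("application_1377900193583_0002"); // parses cleanly
        parseOrExplain("application_0"); // friendly error, no stack trace
    }
}
{code}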
[jira] [Commented] (YARN-1199) Make NM/RM Versions Available
[ https://issues.apache.org/jira/browse/YARN-1199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13786173#comment-13786173 ] Hudson commented on YARN-1199: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1568 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1568/]) YARN-1199. Make NM/RM Versions Available (Mit Desai via jeagles) (jeagles: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1529003) * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/v2/TestRMNMInfo.java * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/nodemanager/NodeInfo.java * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/RMNodeWrapper.java * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMNMInfo.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNode.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/NodesPage.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/NodeInfo.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNM.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNodes.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMNodeTransitions.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestNodesPage.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesNodes.java Make NM/RM Versions Available - Key: YARN-1199 URL: https://issues.apache.org/jira/browse/YARN-1199 Project: Hadoop YARN Issue Type: Improvement Reporter: Mit Desai Assignee: Mit Desai Fix For: 3.0.0, 2.3.0 Attachments: YARN-1199.patch, YARN-1199.patch, YARN-1199.patch, YARN-1199.patch Now that we have the NM and RM versions available, we can display the YARN version of the nodes running in the cluster. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1149) NM throws InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING
[ https://issues.apache.org/jira/browse/YARN-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13786183#comment-13786183 ] Hudson commented on YARN-1149: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1568 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1568/]) YARN-1149. NM throws InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING. Contributed by Xuan Gong. (hitesh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1529043) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestAMRMClient.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/CMgrCompletedAppsEvent.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/CMgrCompletedContainersEvent.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/ContainerManagerImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/application/ApplicationImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeManagerReboot.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeManagerResync.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeManagerShutdown.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/BaseContainerManagerTest.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/TestContainerManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/application/TestApplication.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java NM throws InvalidStateTransitonException: Invalid 
event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING - Key: YARN-1149 URL: https://issues.apache.org/jira/browse/YARN-1149 Project: Hadoop YARN Issue Type: Bug Reporter: Ramya Sunil Assignee: Xuan Gong Fix For: 2.1.2-beta Attachments: YARN-1149.1.patch, YARN-1149.2.patch, YARN-1149.3.patch, YARN-1149.4.patch, YARN-1149.5.patch, YARN-1149.6.patch, YARN-1149.7.patch, YARN-1149.8.patch, YARN-1149.9.patch, YARN-1149_branch-2.1-beta.1.patch When nodemanager receives a kill signal when an application has finished execution but log aggregation has not kicked in, InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING is thrown {noformat} 2013-08-25 20:45:00,875 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:finishLogAggregation(254)) - Application just finished : application_1377459190746_0118 2013-08-25 20:45:00,876 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:uploadLogsForContainer(105)) - Starting aggregate log-file for app application_1377459190746_0118 at /app-logs/foo/logs/application_1377459190746_0118/host_45454.tmp 2013-08-25 20:45:00,876 INFO
[jira] [Assigned] (YARN-913) Add a way to register long-lived services in a YARN cluster
[ https://issues.apache.org/jira/browse/YARN-913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Joseph Evans reassigned YARN-913: Assignee: Robert Joseph Evans Add a way to register long-lived services in a YARN cluster --- Key: YARN-913 URL: https://issues.apache.org/jira/browse/YARN-913 Project: Hadoop YARN Issue Type: New Feature Components: api Affects Versions: 3.0.0 Reporter: Steve Loughran Assignee: Robert Joseph Evans In a YARN cluster you can't predict where services will come up -or on what ports. The services need to work those things out as they come up and then publish them somewhere. Applications need to be able to find the service instance they are to bond to -and not any others in the cluster. Some kind of service registry -in the RM, in ZK, could do this. If the RM held the write access to the ZK nodes, it would be more secure than having apps register with ZK themselves. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1270) TestSLSRunner test is failing
[ https://issues.apache.org/jira/browse/YARN-1270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated YARN-1270: Description: Added in the YARN-1021 patch, the test TestSLSRunner is now failing. (was: Added in the YARn-1021 patch, the test TestSLSRunner is now failing.) TestSLSRunner test is failing - Key: YARN-1270 URL: https://issues.apache.org/jira/browse/YARN-1270 Project: Hadoop YARN Issue Type: Bug Reporter: Mit Desai Added in the YARN-1021 patch, the test TestSLSRunner is now failing. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-913) Add a way to register long-lived services in a YARN cluster
[ https://issues.apache.org/jira/browse/YARN-913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13786229#comment-13786229 ] Steve Loughran commented on YARN-913: - what I'm doing right now is just enumerating all instances of my app's type and verifying that the (username, instance-name) pair is unique: [https://github.com/hortonworks/hoya/blob/master/hoya-core/src/main/java/org/apache/hadoop/hoya/yarn/client/HoyaClient.java#L841] That's got a race condition built into it. Add a way to register long-lived services in a YARN cluster --- Key: YARN-913 URL: https://issues.apache.org/jira/browse/YARN-913 Project: Hadoop YARN Issue Type: New Feature Components: api Affects Versions: 3.0.0 Reporter: Steve Loughran Assignee: Robert Joseph Evans In a YARN cluster you can't predict where services will come up -or on what ports. The services need to work those things out as they come up and then publish them somewhere. Applications need to be able to find the service instance they are to bond to -and not any others in the cluster. Some kind of service registry -in the RM, in ZK, could do this. If the RM held the write access to the ZK nodes, it would be more secure than having apps register with ZK themselves. -- This message was sent by Atlassian JIRA (v6.1#6144)
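The race Steve describes is the classic check-then-act window: two clients can both observe "no instance with this name" before either becomes visible. A self-contained sketch of that window, with illustrative names only:
{code}
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Two launchers racing through the same enumerate-then-register sequence.
public class CheckThenActRace {
    public static void main(String[] args) throws InterruptedException {
        Set<String> registered =
            Collections.newSetFromMap(new ConcurrentHashMap<String, Boolean>());
        Runnable launch = () -> {
            String key = "user/instance-1";
            if (!registered.contains(key)) { // check: "name is free"
                // Window: the other launcher can pass the same check here.
                registered.add(key);         // act: both may claim the name
                System.out.println(Thread.currentThread().getName()
                    + " claimed " + key);
            }
        };
        Thread a = new Thread(launch), b = new Thread(launch);
        a.start(); b.start();
        a.join(); b.join();
        // Both threads may print "claimed": uniqueness was never enforced.
    }
}
{code}
Only an atomic registration primitive (e.g. a single create-if-absent operation in the registry) closes the window; checking first and registering second cannot.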
[jira] [Commented] (YARN-867) Isolation of failures in aux services
[ https://issues.apache.org/jira/browse/YARN-867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13786288#comment-13786288 ] Alejandro Abdelnur commented on YARN-867: - Patch6 does not look good to me; the try/catch is not correct, as an exception in ANY auxiliary service will halt delivery to the other auxiliary services. The try/catch should be done around each call to the auxiliary service interface methods, as done in patch4. Isolation of failures in aux services -- Key: YARN-867 URL: https://issues.apache.org/jira/browse/YARN-867 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Hitesh Shah Assignee: Xuan Gong Priority: Critical Attachments: YARN-867.1.sampleCode.patch, YARN-867.3.patch, YARN-867.4.patch, YARN-867.5.patch, YARN-867.6.patch, YARN-867.sampleCode.2.patch Today, a malicious application can bring down the NM by sending bad data to a service. For example, sending data to the ShuffleService such that it results in any non-IOException will cause the NM's async dispatcher to exit, as the service's INIT APP event is not handled properly. -- This message was sent by Atlassian JIRA (v6.1#6144)
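A sketch of the per-call isolation the review asks for: each service call gets its own try/catch, so one throwing service is logged and skipped rather than halting delivery to the rest. The interface and names are illustrative, not the actual AuxServices code:
{code}
import java.util.Arrays;
import java.util.List;

// Per-call isolation: a failure in one aux service is confined to it.
public class AuxDispatchDemo {
    interface AuxService {
        String name();
        void initApp(String appId); // may throw anything, not just IOException
    }

    static void dispatchInitApp(List<AuxService> services, String appId) {
        for (AuxService s : services) {
            try {
                s.initApp(appId);
            } catch (Throwable t) {
                // The loop continues; the dispatcher never sees the exception.
                System.err.println("Aux service " + s.name() + " failed: " + t);
            }
        }
    }

    public static void main(String[] args) {
        AuxService ok = new AuxService() {
            public String name() { return "ok_service"; }
            public void initApp(String appId) { System.out.println("init " + appId); }
        };
        AuxService bad = new AuxService() {
            public String name() { return "bad_service"; }
            public void initApp(String appId) { throw new IllegalStateException("boom"); }
        };
        dispatchInitApp(Arrays.asList(bad, ok), "application_1_0001"); // ok still runs
    }
}
{code}
Wrapping the whole loop in one try/catch instead (the patch6 shape being objected to) would abort the loop at the first failing service.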
[jira] [Commented] (YARN-913) Add a way to register long-lived services in a YARN cluster
[ https://issues.apache.org/jira/browse/YARN-913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13786299#comment-13786299 ] Robert Joseph Evans commented on YARN-913: -- Yes it does have plenty of races. I'll try to get some detailed designs up shortly, but at a high level the general idea is to have a RESTful web service. For the most common use case there just need to be two interfaces: - Register/Monitor a Service - Query for Services Part of the reason we need the service registry is to securely verify that a client is talking to the real service, and that no one has grabbed the service's port after it registered. To do that I want to have the concept of a verified service. For that we would need an admin interface for adding, updating, and removing verified services. The registry would provide a number of pluggable ways for services to authenticate. Part of adding a verified service would include indicating which authentication models the service can use to register and which users are allowed to register that service. The registry could also act like a trusted Certificate Authority. Another part of adding in a verified service would include indicating how clients could verify they are talking to the true service. This could include just publishing an application id so the client can go to the RM and get a delegation token. Another option would be having the service generate a public/private key pair. When the service registers it would get the private key, and the public key would be available through the discovery interface. The plan is to also have the registry monitor the service, similar to ZK. The service would heartbeat in to the registry periodically (could be on the order of minutes, depending on the service); after a certain period of inactivity the service would be removed from the registry. Perhaps we should add in an explicit unregister as well. I want to make sure that the data model is generic enough that we could support something like a web service on the grid where each server can register itself and all of them would show up in the registry, so a service could have one or more servers that are a part of it, and each server could have some separate metadata about it. I also want to have a plug-in interface for discovery, so we could potentially make the registry look like a DNS server or an SSL Certificate Authority, which would make compatibility with existing applications and clients a lot simpler. Add a way to register long-lived services in a YARN cluster --- Key: YARN-913 URL: https://issues.apache.org/jira/browse/YARN-913 Project: Hadoop YARN Issue Type: New Feature Components: api Affects Versions: 3.0.0 Reporter: Steve Loughran Assignee: Robert Joseph Evans In a YARN cluster you can't predict where services will come up -or on what ports. The services need to work those things out as they come up and then publish them somewhere. Applications need to be able to find the service instance they are to bond to -and not any others in the cluster. Some kind of service registry -in the RM, in ZK, could do this. If the RM held the write access to the ZK nodes, it would be more secure than having apps register with ZK themselves. -- This message was sent by Atlassian JIRA (v6.1#6144)
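Reading the comment as an API, the two common-case interfaces might look like the following. Every name here is hypothetical, sketched only to make the register/heartbeat/query split concrete; it is not a proposed design:
{code}
import java.util.List;
import java.util.Map;

// Purely hypothetical registry surface distilled from the comment above.
public interface ServiceRegistrySketch {
    // "Register/Monitor a Service": returns a token used for later heartbeats.
    String register(String serviceName, String host, int port,
                    Map<String, String> metadata);

    // Liveness: instances that stop heartbeating are expired by the registry.
    void heartbeat(String registrationToken);

    // The explicit unregister the comment suggests adding.
    void unregister(String registrationToken);

    // "Query for Services": all live servers that are part of the named service.
    List<Map<String, String>> lookup(String serviceName);
}
{code}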
[jira] [Updated] (YARN-1268) TestFairScheduler.testContinuousScheduling is flaky
[ https://issues.apache.org/jira/browse/YARN-1268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-1268: - Attachment: YARN-1268.patch TestFairScheduler.testContinuousScheduling is flaky --- Key: YARN-1268 URL: https://issues.apache.org/jira/browse/YARN-1268 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Sandy Ryza Assignee: Wei Yan Attachments: YARN-1268.patch It looks like there's a timeout in it that's causing it to be flaky. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1166) YARN 'appsFailed' metric should be of type 'counter'
[ https://issues.apache.org/jira/browse/YARN-1166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-1166: -- Attachment: YARN-1166.5.patch Fix the test failure YARN 'appsFailed' metric should be of type 'counter' Key: YARN-1166 URL: https://issues.apache.org/jira/browse/YARN-1166 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Srimanth Gunturi Assignee: Zhijie Shen Priority: Blocker Attachments: YARN-1166.2.patch, YARN-1166.3.patch, YARN-1166.4.patch, YARN-1166.5.patch, YARN-1166.patch Currently in YARN's queue metrics, the cumulative metric 'appsFailed' is of type 'gauge' - which means the exact value will be reported. All other cumulative queue metrics (AppsSubmitted, AppsCompleted, AppsKilled) are of type 'counter' - meaning Ganglia will use slope to provide deltas between time-points. To be consistent, AppsFailed metric should also be of type 'counter'. -- This message was sent by Atlassian JIRA (v6.1#6144)
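In Hadoop metrics2 terms, the fix amounts to declaring the metric as a counter instead of a gauge; the class and field below are a hedged sketch of the idea, not the actual QueueMetrics change:
{code}
import org.apache.hadoop.metrics2.annotation.Metric;
import org.apache.hadoop.metrics2.lib.MutableCounterInt;

public class QueueMetricsSketch {
  // Declared as a counter rather than a gauge: counters are monotonically
  // increasing, so tools like Ganglia can derive deltas (slope) between
  // time points, consistent with appsSubmitted/appsCompleted/appsKilled.
  @Metric("# of apps failed") MutableCounterInt appsFailed;

  public void failApp() {
    appsFailed.incr();
  }
}
{code}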
[jira] [Commented] (YARN-1271) Text file busy errors launching containers again
[ https://issues.apache.org/jira/browse/YARN-1271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13786315#comment-13786315 ] Sandy Ryza commented on YARN-1271: -- Posted branch-2 addendum Text file busy errors launching containers again -- Key: YARN-1271 URL: https://issues.apache.org/jira/browse/YARN-1271 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.1.1-beta Reporter: Sandy Ryza Assignee: Sandy Ryza Fix For: 2.1.2-beta Attachments: YARN-1271-branch-2.patch, YARN-1271.patch The error is shown below in the comments. MAPREDUCE-2374 fixed this by removing -c when running the container launch script. It looks like the -c got brought back during the windows branch merge, so we should remove it again. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1271) Text file busy errors launching containers again
[ https://issues.apache.org/jira/browse/YARN-1271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-1271: - Attachment: YARN-1271-branch-2.patch Text file busy errors launching containers again -- Key: YARN-1271 URL: https://issues.apache.org/jira/browse/YARN-1271 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.1.1-beta Reporter: Sandy Ryza Assignee: Sandy Ryza Fix For: 2.1.2-beta Attachments: YARN-1271-branch-2.patch, YARN-1271.patch The error is shown below in the comments. MAPREDUCE-2374 fixed this by removing -c when running the container launch script. It looks like the -c got brought back during the windows branch merge, so we should remove it again. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-867) Isolation of failures in aux services
[ https://issues.apache.org/jira/browse/YARN-867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13786317#comment-13786317 ] Bikas Saha commented on YARN-867: - tucu, your comments were addressed in YARN-1256. This jira is now targeted for more elaborate changes. Isolation of failures in aux services -- Key: YARN-867 URL: https://issues.apache.org/jira/browse/YARN-867 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Hitesh Shah Assignee: Xuan Gong Priority: Critical Attachments: YARN-867.1.sampleCode.patch, YARN-867.3.patch, YARN-867.4.patch, YARN-867.5.patch, YARN-867.6.patch, YARN-867.sampleCode.2.patch Today, a malicious application can bring down the NM by sending bad data to a service. For example, sending data to the ShuffleService such that it results in any non-IOException will cause the NM's async dispatcher to exit as the service's INIT APP event is not handled properly. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-445) Ability to signal containers
[ https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13786318#comment-13786318 ] Steve Loughran commented on YARN-445: - ctrl-break is special in that it can talk to the whole process group: [http://msdn.microsoft.com/en-us/library/windows/desktop/ms683155(v=vs.85).aspx] process-group signalling should be good (make it an option from the sender?) so that I can send a signal to a process started by its own bash script (e.g. bin/hbase-java). However, we do need to remember that some recent ubuntu versions (mistakenly) require a -- between signal and process group id. This is quite a significant patch -and it adds a feature that many will find useful - but it is going to need careful review by the YARN experts (of which I am not). Some quick points: # I wouldn't mark the interface/methods as stable yet # some of the diffs in the tests look bigger than they should be -reformatting/refactoring? It just makes it harder to distinguish changes. Ideally all the existing tests should be left alone (that way we can be confident that they will catch regressions), with new tests underneath or in their own class. Ability to signal containers Key: YARN-445 URL: https://issues.apache.org/jira/browse/YARN-445 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: Jason Lowe Attachments: YARN-445--n2.patch, YARN-445.patch It would be nice if an ApplicationMaster could send signals to containers such as SIGQUIT, SIGUSR1, etc. For example, in order to replicate the jstack-on-task-timeout feature implemented by MAPREDUCE-1119 in Hadoop 0.21 the NodeManager needs an interface for sending SIGQUIT to a container. For that specific feature we could implement it as an additional field in the StopContainerRequest. However that would not address other potential features like the ability for an AM to trigger jstacks on arbitrary tasks *without* killing them. The latter feature would be a very useful debugging tool for users who do not have shell access to the nodes. -- This message was sent by Atlassian JIRA (v6.1#6144)
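On the Linux side, the process-group signalling Steve mentions comes down to signalling a negative pid, and the "--" separator is what keeps kill from parsing that pid as an option. A rough sketch from Java, with an invented process-group id; this is an illustration of the mechanism, not code from the patch:
{code}
// Sketch: signal a whole process group from Java by exec-ing kill(1).
// A negative pid addresses the group; "--" stops option parsing so the
// leading '-' of the pgid is not read as a flag (required on some
// recent Ubuntu releases, as noted above).
int pgid = 12345;  // illustrative process-group id
Process p = new ProcessBuilder("kill", "-SIGQUIT", "--", "-" + pgid)
    .redirectErrorStream(true)
    .start();
int rc = p.waitFor();  // 0 on success
{code}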
[jira] [Commented] (YARN-1166) YARN 'appsFailed' metric should be of type 'counter'
[ https://issues.apache.org/jira/browse/YARN-1166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13786348#comment-13786348 ] Hadoop QA commented on YARN-1166: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12606822/YARN-1166.5.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2099//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2099//console This message is automatically generated. YARN 'appsFailed' metric should be of type 'counter' Key: YARN-1166 URL: https://issues.apache.org/jira/browse/YARN-1166 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Srimanth Gunturi Assignee: Zhijie Shen Priority: Blocker Attachments: YARN-1166.2.patch, YARN-1166.3.patch, YARN-1166.4.patch, YARN-1166.5.patch, YARN-1166.patch Currently in YARN's queue metrics, the cumulative metric 'appsFailed' is of type 'gauge' - which means the exact value will be reported. All other cumulative queue metrics (AppsSubmitted, AppsCompleted, AppsKilled) are of type 'counter' - meaning Ganglia will use slope to provide deltas between time-points. To be consistent, AppsFailed metric should also be of type 'counter'. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1251) TestDistributedShell#TestDSShell failed with timeout
[ https://issues.apache.org/jira/browse/YARN-1251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1251: Attachment: error.log attaching thread dump TestDistributedShell#TestDSShell failed with timeout Key: YARN-1251 URL: https://issues.apache.org/jira/browse/YARN-1251 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Reporter: Junping Du Attachments: error.log, YARN-1225-kickOffTestDS.patch TestDistributedShell#TestDSShell on trunk Jenkins are failed consistently recently. The Stacktrace is: {code} java.lang.Exception: test timed out after 9 milliseconds at com.google.protobuf.LiteralByteString.init(LiteralByteString.java:234) at com.google.protobuf.ByteString.copyFromUtf8(ByteString.java:255) at org.apache.hadoop.ipc.protobuf.ProtobufRpcEngineProtos$RequestHeaderProto.getMethodNameBytes(ProtobufRpcEngineProtos.java:286) at org.apache.hadoop.ipc.protobuf.ProtobufRpcEngineProtos$RequestHeaderProto.getSerializedSize(ProtobufRpcEngineProtos.java:462) at com.google.protobuf.AbstractMessageLite.writeDelimitedTo(AbstractMessageLite.java:84) at org.apache.hadoop.ipc.ProtobufRpcEngine$RpcMessageWithHeader.write(ProtobufRpcEngine.java:302) at org.apache.hadoop.ipc.Client$Connection.sendRpcRequest(Client.java:989) at org.apache.hadoop.ipc.Client.call(Client.java:1377) at org.apache.hadoop.ipc.Client.call(Client.java:1357) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) at $Proxy70.getApplicationReport(Unknown Source) at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationReport(ApplicationClientProtocolPBClientImpl.java:137) at sun.reflect.GeneratedMethodAccessor40.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:185) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:101) at $Proxy71.getApplicationReport(Unknown Source) at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplicationReport(YarnClientImpl.java:195) at org.apache.hadoop.yarn.applications.distributedshell.Client.monitorApplication(Client.java:622) at org.apache.hadoop.yarn.applications.distributedshell.Client.run(Client.java:597) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:125) {code} For details, please refer: https://builds.apache.org/job/PreCommit-YARN-Build/2039//testReport/ -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1251) TestDistributedShell#TestDSShell failed with timeout
[ https://issues.apache.org/jira/browse/YARN-1251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13786377#comment-13786377 ] Hadoop QA commented on YARN-1251: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12606834/error.log against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2100//console This message is automatically generated. TestDistributedShell#TestDSShell failed with timeout Key: YARN-1251 URL: https://issues.apache.org/jira/browse/YARN-1251 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Reporter: Junping Du Attachments: error.log, YARN-1225-kickOffTestDS.patch TestDistributedShell#TestDSShell on trunk Jenkins are failed consistently recently. The Stacktrace is: {code} java.lang.Exception: test timed out after 9 milliseconds at com.google.protobuf.LiteralByteString.init(LiteralByteString.java:234) at com.google.protobuf.ByteString.copyFromUtf8(ByteString.java:255) at org.apache.hadoop.ipc.protobuf.ProtobufRpcEngineProtos$RequestHeaderProto.getMethodNameBytes(ProtobufRpcEngineProtos.java:286) at org.apache.hadoop.ipc.protobuf.ProtobufRpcEngineProtos$RequestHeaderProto.getSerializedSize(ProtobufRpcEngineProtos.java:462) at com.google.protobuf.AbstractMessageLite.writeDelimitedTo(AbstractMessageLite.java:84) at org.apache.hadoop.ipc.ProtobufRpcEngine$RpcMessageWithHeader.write(ProtobufRpcEngine.java:302) at org.apache.hadoop.ipc.Client$Connection.sendRpcRequest(Client.java:989) at org.apache.hadoop.ipc.Client.call(Client.java:1377) at org.apache.hadoop.ipc.Client.call(Client.java:1357) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) at $Proxy70.getApplicationReport(Unknown Source) at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationReport(ApplicationClientProtocolPBClientImpl.java:137) at sun.reflect.GeneratedMethodAccessor40.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:185) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:101) at $Proxy71.getApplicationReport(Unknown Source) at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplicationReport(YarnClientImpl.java:195) at org.apache.hadoop.yarn.applications.distributedshell.Client.monitorApplication(Client.java:622) at org.apache.hadoop.yarn.applications.distributedshell.Client.run(Client.java:597) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:125) {code} For details, please refer: https://builds.apache.org/job/PreCommit-YARN-Build/2039//testReport/ -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-867) Isolation of failures in aux services
[ https://issues.apache.org/jira/browse/YARN-867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated YARN-867: - Target Version/s: 2.3.0 Isolation of failures in aux services -- Key: YARN-867 URL: https://issues.apache.org/jira/browse/YARN-867 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Hitesh Shah Assignee: Xuan Gong Priority: Critical Attachments: YARN-867.1.sampleCode.patch, YARN-867.3.patch, YARN-867.4.patch, YARN-867.5.patch, YARN-867.6.patch, YARN-867.sampleCode.2.patch Today, a malicious application can bring down the NM by sending bad data to a service. For example, sending data to the ShuffleService such that it results in any non-IOException will cause the NM's async dispatcher to exit as the service's INIT APP event is not handled properly. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-867) Isolation of failures in aux services
[ https://issues.apache.org/jira/browse/YARN-867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13786402#comment-13786402 ] Alejandro Abdelnur commented on YARN-867: - [~bikassaha] got it, missed that it was moved to another jira, thx Isolation of failures in aux services -- Key: YARN-867 URL: https://issues.apache.org/jira/browse/YARN-867 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Hitesh Shah Assignee: Xuan Gong Priority: Critical Attachments: YARN-867.1.sampleCode.patch, YARN-867.3.patch, YARN-867.4.patch, YARN-867.5.patch, YARN-867.6.patch, YARN-867.sampleCode.2.patch Today, a malicious application can bring down the NM by sending bad data to a service. For example, sending data to the ShuffleService such that it results in any non-IOException will cause the NM's async dispatcher to exit as the service's INIT APP event is not handled properly. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Assigned] (YARN-1270) TestSLSRunner test is failing
[ https://issues.apache.org/jira/browse/YARN-1270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Yan reassigned YARN-1270: - Assignee: Wei Yan TestSLSRunner test is failing - Key: YARN-1270 URL: https://issues.apache.org/jira/browse/YARN-1270 Project: Hadoop YARN Issue Type: Bug Reporter: Mit Desai Assignee: Wei Yan Added in the YARN-1021 patch, the test TestSLSRunner is now failing. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1021) Yarn Scheduler Load Simulator
[ https://issues.apache.org/jira/browse/YARN-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13786407#comment-13786407 ] Wei Yan commented on YARN-1021: --- Thanks, [~mitdesai]. I'll look into it. Yarn Scheduler Load Simulator - Key: YARN-1021 URL: https://issues.apache.org/jira/browse/YARN-1021 Project: Hadoop YARN Issue Type: New Feature Components: scheduler Reporter: Wei Yan Assignee: Wei Yan Fix For: 2.3.0 Attachments: YARN-1021-demo.tar.gz, YARN-1021-images.tar.gz, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.pdf The Yarn Scheduler is a fertile area of interest with different implementations, e.g., Fifo, Capacity and Fair schedulers. Meanwhile, several optimizations are also made to improve scheduler performance for different scenarios and workloads. Each scheduler algorithm has its own set of features, and drives scheduling decisions by many factors, such as fairness, capacity guarantee, resource availability, etc. It is very important to evaluate a scheduler algorithm well before we deploy it in a production cluster. Unfortunately, currently it is non-trivial to evaluate a scheduling algorithm. Evaluating in a real cluster is always time and cost consuming, and it is also very hard to find a large-enough cluster. Hence, a simulator which can predict how well a scheduler algorithm works for some specific workload would be quite useful. We want to build a Scheduler Load Simulator to simulate large-scale Yarn clusters and application loads in a single machine. This would be invaluable in furthering Yarn by providing a tool for researchers and developers to prototype new scheduler features and predict their behavior and performance with a reasonable amount of confidence, thereby aiding rapid innovation. The simulator will exercise the real Yarn ResourceManager, removing the network factor by simulating NodeManagers and ApplicationMasters via handling and dispatching NM/AMs heartbeat events from within the same JVM. To keep track of scheduler behavior and performance, a scheduler wrapper will wrap the real scheduler. The simulator will produce real time metrics while executing, including: * Resource usage for the whole cluster and each queue, which can be utilized to configure cluster and queue capacity. * The detailed application execution trace (recorded in relation to simulated time), which can be analyzed to understand/validate the scheduler behavior (individual jobs' turnaround time, throughput, fairness, capacity guarantee, etc). * Several key metrics of the scheduler algorithm, such as time cost of each scheduler operation (allocate, handle, etc), which can be utilized by Hadoop developers to find the code hot spots and scalability limits. The simulator will provide real time charts showing the behavior of the scheduler and its performance. A short demo is available http://www.youtube.com/watch?v=6thLi8q0qLE, showing how to use the simulator to simulate the Fair Scheduler and Capacity Scheduler. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1167) Submitted distributed shell application shows appMasterHost = empty
[ https://issues.apache.org/jira/browse/YARN-1167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1167: Attachment: YARN-1167.6.patch New patch contains a modified test case. Passing the TestDistributedShell test locally. Check if Jenkins likes it. Submitted distributed shell application shows appMasterHost = empty --- Key: YARN-1167 URL: https://issues.apache.org/jira/browse/YARN-1167 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Reporter: Tassapol Athiapinya Assignee: Xuan Gong Fix For: 2.1.2-beta Attachments: YARN-1167.1.patch, YARN-1167.2.patch, YARN-1167.3.patch, YARN-1167.4.patch, YARN-1167.5.patch, YARN-1167.6.patch Submit distributed shell application. Once the application turns to be RUNNING state, app master host should not be empty. In reality, it is empty. ==console logs== distributedshell.Client: Got application report from ASM for, appId=12, clientToAMToken=null, appDiagnostics=, appMasterHost=, appQueue=default, appMasterRpcPort=0, appStartTime=1378505161360, yarnAppState=RUNNING, distributedFinalState=UNDEFINED, -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1167) Submitted distributed shell application shows appMasterHost = empty
[ https://issues.apache.org/jira/browse/YARN-1167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13786434#comment-13786434 ] Omkar Vinit Joshi commented on YARN-1167: - [~xgong] can you please verify why we had to reduce memory arguments and container nos? Is that because we don't have memory or some race condition? Submitted distributed shell application shows appMasterHost = empty --- Key: YARN-1167 URL: https://issues.apache.org/jira/browse/YARN-1167 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Reporter: Tassapol Athiapinya Assignee: Xuan Gong Fix For: 2.1.2-beta Attachments: YARN-1167.1.patch, YARN-1167.2.patch, YARN-1167.3.patch, YARN-1167.4.patch, YARN-1167.5.patch, YARN-1167.6.patch Submit distributed shell application. Once the application turns to be RUNNING state, app master host should not be empty. In reality, it is empty. ==console logs== distributedshell.Client: Got application report from ASM for, appId=12, clientToAMToken=null, appDiagnostics=, appMasterHost=, appQueue=default, appMasterRpcPort=0, appStartTime=1378505161360, yarnAppState=RUNNING, distributedFinalState=UNDEFINED, -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1167) Submitted distributed shell application shows appMasterHost = empty
[ https://issues.apache.org/jira/browse/YARN-1167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1167: Attachment: YARN-1167.7.patch set AMRPCPort to -1 and restore all parameters for the MinYarnCluster Submitted distributed shell application shows appMasterHost = empty --- Key: YARN-1167 URL: https://issues.apache.org/jira/browse/YARN-1167 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Reporter: Tassapol Athiapinya Assignee: Xuan Gong Fix For: 2.1.2-beta Attachments: YARN-1167.1.patch, YARN-1167.2.patch, YARN-1167.3.patch, YARN-1167.4.patch, YARN-1167.5.patch, YARN-1167.6.patch, YARN-1167.7.patch Submit distributed shell application. Once the application turns to be RUNNING state, app master host should not be empty. In reality, it is empty. ==console logs== distributedshell.Client: Got application report from ASM for, appId=12, clientToAMToken=null, appDiagnostics=, appMasterHost=, appQueue=default, appMasterRpcPort=0, appStartTime=1378505161360, yarnAppState=RUNNING, distributedFinalState=UNDEFINED, -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1167) Submitted distributed shell application shows appMasterHost = empty
[ https://issues.apache.org/jira/browse/YARN-1167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13786453#comment-13786453 ] Hadoop QA commented on YARN-1167: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12606848/YARN-1167.6.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2101//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2101//console This message is automatically generated. Submitted distributed shell application shows appMasterHost = empty --- Key: YARN-1167 URL: https://issues.apache.org/jira/browse/YARN-1167 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Reporter: Tassapol Athiapinya Assignee: Xuan Gong Fix For: 2.1.2-beta Attachments: YARN-1167.1.patch, YARN-1167.2.patch, YARN-1167.3.patch, YARN-1167.4.patch, YARN-1167.5.patch, YARN-1167.6.patch, YARN-1167.7.patch Submit distributed shell application. Once the application turns to be RUNNING state, app master host should not be empty. In reality, it is empty. ==console logs== distributedshell.Client: Got application report from ASM for, appId=12, clientToAMToken=null, appDiagnostics=, appMasterHost=, appQueue=default, appMasterRpcPort=0, appStartTime=1378505161360, yarnAppState=RUNNING, distributedFinalState=UNDEFINED, -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1232) Configuration to support multiple RMs
[ https://issues.apache.org/jira/browse/YARN-1232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13786459#comment-13786459 ] Bikas Saha commented on YARN-1232: -- Looks good. +1. Thanks for being patient with the reviews! Configuration to support multiple RMs - Key: YARN-1232 URL: https://issues.apache.org/jira/browse/YARN-1232 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Karthik Kambatla Assignee: Karthik Kambatla Labels: ha Attachments: yarn-1232-1.patch, yarn-1232-2.patch, yarn-1232-3.patch, yarn-1232-4.patch, yarn-1232-5.patch, yarn-1232-6.patch, yarn-1232-7.patch, yarn-1232-7.patch We should augment the configuration to allow users specify two RMs and the individual RPC addresses for them. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1167) Submitted distributed shell application shows appMasterHost = empty
[ https://issues.apache.org/jira/browse/YARN-1167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13786487#comment-13786487 ] Hadoop QA commented on YARN-1167: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12606854/YARN-1167.7.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell: org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2102//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2102//console This message is automatically generated. Submitted distributed shell application shows appMasterHost = empty --- Key: YARN-1167 URL: https://issues.apache.org/jira/browse/YARN-1167 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Reporter: Tassapol Athiapinya Assignee: Xuan Gong Fix For: 2.1.2-beta Attachments: YARN-1167.1.patch, YARN-1167.2.patch, YARN-1167.3.patch, YARN-1167.4.patch, YARN-1167.5.patch, YARN-1167.6.patch, YARN-1167.7.patch Submit distributed shell application. Once the application turns to be RUNNING state, app master host should not be empty. In reality, it is empty. ==console logs== distributedshell.Client: Got application report from ASM for, appId=12, clientToAMToken=null, appDiagnostics=, appMasterHost=, appQueue=default, appMasterRpcPort=0, appStartTime=1378505161360, yarnAppState=RUNNING, distributedFinalState=UNDEFINED, -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1167) Submitted distributed shell application shows appMasterHost = empty
[ https://issues.apache.org/jira/browse/YARN-1167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13786491#comment-13786491 ] Omkar Vinit Joshi commented on YARN-1167: - bq. + private int appMasterRpcPort = -1; why? Submitted distributed shell application shows appMasterHost = empty --- Key: YARN-1167 URL: https://issues.apache.org/jira/browse/YARN-1167 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Reporter: Tassapol Athiapinya Assignee: Xuan Gong Fix For: 2.1.2-beta Attachments: YARN-1167.1.patch, YARN-1167.2.patch, YARN-1167.3.patch, YARN-1167.4.patch, YARN-1167.5.patch, YARN-1167.6.patch, YARN-1167.7.patch Submit distributed shell application. Once the application turns to be RUNNING state, app master host should not be empty. In reality, it is empty. ==console logs== distributedshell.Client: Got application report from ASM for, appId=12, clientToAMToken=null, appDiagnostics=, appMasterHost=, appQueue=default, appMasterRpcPort=0, appStartTime=1378505161360, yarnAppState=RUNNING, distributedFinalState=UNDEFINED, -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1167) Submitted distributed shell application shows appMasterHost = empty
[ https://issues.apache.org/jira/browse/YARN-1167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13786497#comment-13786497 ] Vinod Kumar Vavilapalli commented on YARN-1167: --- Latest patch looks good. Can you debug why TestDistributedShell works with patch #6 but not with #7 ? Submitted distributed shell application shows appMasterHost = empty --- Key: YARN-1167 URL: https://issues.apache.org/jira/browse/YARN-1167 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Reporter: Tassapol Athiapinya Assignee: Xuan Gong Fix For: 2.1.2-beta Attachments: YARN-1167.1.patch, YARN-1167.2.patch, YARN-1167.3.patch, YARN-1167.4.patch, YARN-1167.5.patch, YARN-1167.6.patch, YARN-1167.7.patch Submit distributed shell application. Once the application turns to be RUNNING state, app master host should not be empty. In reality, it is empty. ==console logs== distributedshell.Client: Got application report from ASM for, appId=12, clientToAMToken=null, appDiagnostics=, appMasterHost=, appQueue=default, appMasterRpcPort=0, appStartTime=1378505161360, yarnAppState=RUNNING, distributedFinalState=UNDEFINED, -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-445) Ability to signal containers
[ https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13786506#comment-13786506 ] Andrey Klochkov commented on YARN-445: -- The large diffs in the tests are not due to reformatting but because of refactoring needed to implement an additional test without lots of copy/paste. Ability to signal containers Key: YARN-445 URL: https://issues.apache.org/jira/browse/YARN-445 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: Jason Lowe Attachments: YARN-445--n2.patch, YARN-445.patch It would be nice if an ApplicationMaster could send signals to containers such as SIGQUIT, SIGUSR1, etc. For example, in order to replicate the jstack-on-task-timeout feature implemented by MAPREDUCE-1119 in Hadoop 0.21 the NodeManager needs an interface for sending SIGQUIT to a container. For that specific feature we could implement it as an additional field in the StopContainerRequest. However that would not address other potential features like the ability for an AM to trigger jstacks on arbitrary tasks *without* killing them. The latter feature would be a very useful debugging tool for users who do not have shell access to the nodes. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (YARN-1272) Add a link to cluster/application page on node manager's list of application page
Paul Han created YARN-1272: -- Summary: Add a link to cluster/application page on node manager's list of application page Key: YARN-1272 URL: https://issues.apache.org/jira/browse/YARN-1272 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.0.5-alpha Reporter: Paul Han On node manager's application/application page, the content is significantly less than the content on the resource manager's application page /cluster/application. Adding a link from nodemanager's application page to resourcemanager's application page will help users get info faster and more efficiently. Please see the screenshot for the benefit. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-465) fix coverage org.apache.hadoop.yarn.server.webproxy
[ https://issues.apache.org/jira/browse/YARN-465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13786532#comment-13786532 ] Ravi Prakash commented on YARN-465: --- Thanks for the updates Andrey! Here are some questions I had and comments. - Why did you remove the proxy.join() from startup? - If you removed proxy.join(), you didn't need to create a new method (startServer). Just call main() on the class. - In the test file in start(), did you mean to log the originalPort instead of port? port would always be 0. - Why did you have to create a core-default.xml file? Could you not have hardcoded the port inside the test file? Also, could you please tell me where hadoop.common.configuration.version is being used? I wasn't able to find it. - Nit: Can you setName("proxy") -> setName("Proxy for test"); - Nit: Could you please put a more detailed message in WebAppProxyForTest.start() when the proxy server starts up? - In testWebAppProxyServlet(), what is the significance of proxyConn.setRequestProperty("Cookie", "checked_application_0_=true"); ? The test passes after commenting out that line too. - What is testWebAppProxyServer testing that testWebAppProxyServlet()'s first test isn't? - What is testWebAppProxyServerMainMethod actually testing? counter is set to 0 on the very first successful try. Shouldn't that be the expected behavior? If we are having to start the proxy server more than 1 time for the test to pass, that is bad, and should make the test fail. fix coverage org.apache.hadoop.yarn.server.webproxy Key: YARN-465 URL: https://issues.apache.org/jira/browse/YARN-465 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 3.0.0, 0.23.7, 2.0.4-alpha Reporter: Aleksey Gorshkov Assignee: Andrey Klochkov Attachments: YARN-465-branch-0.23-a.patch, YARN-465-branch-0.23.patch, YARN-465-branch-2-a.patch, YARN-465-branch-2--n3.patch, YARN-465-branch-2.patch, YARN-465-trunk-a.patch, YARN-465-trunk--n3.patch, YARN-465-trunk.patch fix coverage org.apache.hadoop.yarn.server.webproxy patch YARN-465-trunk.patch for trunk patch YARN-465-branch-2.patch for branch-2 patch YARN-465-branch-0.23.patch for branch-0.23 There is an issue in branch-0.23: the patch does not create the .keep file. To fix it, run these commands: mkdir yhadoop-common/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/webapps/proxy touch yhadoop-common/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/webapps/proxy/.keep -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1254) NM is polluting container's credentials
[ https://issues.apache.org/jira/browse/YARN-1254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13786539#comment-13786539 ] Vinod Kumar Vavilapalli commented on YARN-1254: --- Don't think this patch is correct. The fundamental problem is that ResourceLocalizationService.writeCredentials() is polluting container's credentials by adding LocalizerToken. We should just clone container's credentials before writing the token file for the localizer. NM is polluting container's credentials --- Key: YARN-1254 URL: https://issues.apache.org/jira/browse/YARN-1254 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Omkar Vinit Joshi Attachments: YARN-1254.20131030.1.patch Before launching the container, NM is using the same credential object and so is polluting what container should see. We should fix this. -- This message was sent by Atlassian JIRA (v6.1#6144)
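A minimal sketch of that approach, assuming stand-in variables (containerCreds, alias, localizerToken, tokenPath, conf) for the actual ResourceLocalizationService fields:
{code}
// Sketch: copy the container's credentials instead of mutating them, so
// the localizer token never leaks into the credentials the container sees.
Credentials localizerCreds = new Credentials();
localizerCreds.addAll(containerCreds);          // container's copy stays clean
localizerCreds.addToken(alias, localizerToken); // only the localizer sees this
localizerCreds.writeTokenStorageFile(tokenPath, conf);
{code}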
[jira] [Commented] (YARN-1232) Configuration to support multiple RMs
[ https://issues.apache.org/jira/browse/YARN-1232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13786549#comment-13786549 ] Hudson commented on YARN-1232: -- SUCCESS: Integrated in Hadoop-trunk-Commit #4539 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4539/]) YARN-1232. Configuration to support multiple RMs (Karthik Kambatla via bikas) (bikas: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1529251) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/HAUtil.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/ClientRMProxy.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/conf/TestHAUtil.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/ServerRMProxy.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMHAProtocolService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMHA.java Configuration to support multiple RMs - Key: YARN-1232 URL: https://issues.apache.org/jira/browse/YARN-1232 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Karthik Kambatla Assignee: Karthik Kambatla Labels: ha Attachments: yarn-1232-1.patch, yarn-1232-2.patch, yarn-1232-3.patch, yarn-1232-4.patch, yarn-1232-5.patch, yarn-1232-6.patch, yarn-1232-7.patch, yarn-1232-7.patch We should augment the configuration to allow users specify two RMs and the individual RPC addresses for them. -- This message was sent by Atlassian JIRA (v6.1#6144)
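For readers following along, the shape of the committed configuration can be sketched as below; the property keys are recalled from the yarn-default.xml change in this commit and should be checked against the committed file:
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class RmHaConfSketch {
  public static Configuration rmHaConf() {
    Configuration conf = new YarnConfiguration();
    conf.setBoolean("yarn.resourcemanager.ha.enabled", true);
    // Enumerate the RMs by logical id...
    conf.set("yarn.resourcemanager.ha.rm-ids", "rm1,rm2");
    // ...then give each RM its own RPC address under an id-suffixed key.
    conf.set("yarn.resourcemanager.address.rm1", "rm1.example.com:8032");
    conf.set("yarn.resourcemanager.address.rm2", "rm2.example.com:8032");
    return conf;
  }
}
{code}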
[jira] [Updated] (YARN-1268) TestFairScheduler.testContinuousScheduling is flaky
[ https://issues.apache.org/jira/browse/YARN-1268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-1268: - Attachment: YARN-1268-1.patch TestFairScheduler.testContinuousScheduling is flaky --- Key: YARN-1268 URL: https://issues.apache.org/jira/browse/YARN-1268 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-1268-1.patch, YARN-1268.patch It looks like there's a timeout in it that's causing it to be flaky. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-445) Ability to signal containers
[ https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13786572#comment-13786572 ] Andrey Klochkov commented on YARN-445: -- Steve, the current implementation will send the signal to the java started with bin/hbase as it sends it to all processes in the job object, e.g. all processes of the main container process. It can be replaced with sending the signal to all processes in the group instead, and I think the behavior will be the same. BTW I don't know how to do the opposite - i.e. how to avoid sending the signal to all processes of the container, on Windows (so the behavior on Linux is different as bin/hbase will receive the signal). I think this is fine as long as this difference is documented. In case of hbase the shell script can create a custom hook for SIGTERM and do whatever is needed in that case (e.g. send SIGTERM to the java process it started). There is one caveat in ctrl+break handling in case of a batch file starting a java process: 1. the batch file starts the java process 2. user sends ctrl+break to all processes in the group (or job object). java process prints thread dump. batch file doesn't react yet. 3. the java process completes successfully 4. the batch file will not exit, it will print "Terminate batch job? (Y/N)" as it received the ctrl+break signal earlier. The only way I see to overcome this problem with batch file processes is to identify them somehow (by executable name?) when walking through the processes in the job object, and do not send them the signal. Sending ctrl+break to batch file processes doesn't make sense anyway as in newer Windows there's no way to disable or customize ctrl+break handling in batch files. Ability to signal containers Key: YARN-445 URL: https://issues.apache.org/jira/browse/YARN-445 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: Jason Lowe Attachments: YARN-445--n2.patch, YARN-445.patch It would be nice if an ApplicationMaster could send signals to containers such as SIGQUIT, SIGUSR1, etc. For example, in order to replicate the jstack-on-task-timeout feature implemented by MAPREDUCE-1119 in Hadoop 0.21 the NodeManager needs an interface for sending SIGQUIT to a container. For that specific feature we could implement it as an additional field in the StopContainerRequest. However that would not address other potential features like the ability for an AM to trigger jstacks on arbitrary tasks *without* killing them. The latter feature would be a very useful debugging tool for users who do not have shell access to the nodes. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-415) Capture memory utilization at the app-level for chargeback
[ https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13786577#comment-13786577 ] Jason Lowe commented on YARN-415: - The latest patch no longer applies to trunk. Could you please refresh it? Some review comments: General: * Nit: the extra plurality of VirtualCoresSeconds sounds a bit odd, wondering if it should be VirtualCoreSeconds or VcoreSeconds in the various places it appears. ApplicationCLI: * UI wording: In the code it's vcore-seconds but the UI says CPU-seconds. I'm wondering if users are going to interpret CPU to be a hardware core, and I'm not sure a vcore will map to a hardware core in the typical case. The configuration properties refer to vcores, so we should probably use vcore-seconds here for consistency. Curious what others think about this, as I could be convinced to leave it as CPU. RMAppAttempt has just a spurious whitespace change RMAppAttemptImpl: * Nit: containerAllocated and containerFinished are private and always called from transitions, so acquiring the write lock is unnecessary. * ContainerFinishedTransition.transition does not call containerFinished when it's the AM container. We leak the AM container and consider it always running if an AM crashes. RMContainerEvent: * Nit: whitespace between the constructor definitions would be nice. TestRMAppAttemptTransitions: * Nit: it would be cleaner and easier to read if we add a new allocateApplicationAttemptAtTime method and have the existing allocateApplicationAttempt method simply call it with -1 rather than change all those places to pass -1. Speaking of leaking containers, is there something we can do to audit/assert that applications that have completed don't have running containers? If we lose track of a container finished event, the consumed resources are going to keep increasing indefinitely. It's a bug in the RM either way but wondering if there's some warning/sanity checking we can do to keep the metric from becoming utterly useless when it occurs. Capping it at the end of the application would at least prevent it from growing beyond the application lifetime. Then again, letting it grow continuously at least is more indicative something went terribly wrong with the accounting and therefore the metric can't be trusted. Just thinking out loud, not sure what the best solution is. Capture memory utilization at the app-level for chargeback -- Key: YARN-415 URL: https://issues.apache.org/jira/browse/YARN-415 Project: Hadoop YARN Issue Type: New Feature Components: resourcemanager Affects Versions: 0.23.6 Reporter: Kendall Thrapp Assignee: Andrey Klochkov Attachments: YARN-415--n2.patch, YARN-415--n3.patch, YARN-415.patch For the purpose of chargeback, I'd like to be able to compute the cost of an application in terms of cluster resource usage. To start out, I'd like to get the memory utilization of an application. The unit should be MB-seconds or something similar and, from a chargeback perspective, the memory amount should be the memory reserved for the application, as even if the app didn't use all that memory, no one else was able to use it. (reserved ram for container 1 * lifetime of container 1) + (reserved ram for container 2 * lifetime of container 2) + ... + (reserved ram for container n * lifetime of container n) It'd be nice to have this at the app level instead of the job level because: 1. We'd still be able to get memory usage for jobs that crashed (and wouldn't appear on the job history server). 2. 
We'd be able to get memory usage for future non-MR jobs (e.g. Storm). This new metric should be available both through the RM UI and RM Web Services REST API. -- This message was sent by Atlassian JIRA (v6.1#6144)
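As a worked example of the description's (reserved ram * lifetime) formula, with invented container sizes and lifetimes:
{code}
// Worked example of the MB-seconds formula above, with invented values.
long[][] containers = {
    {2048, 600},   // container 1: 2048 MB reserved for 600 seconds
    {1024, 1200},  // container 2: 1024 MB reserved for 1200 seconds
};
long mbSeconds = 0;
for (long[] c : containers) {
  mbSeconds += c[0] * c[1];  // reserved MB * lifetime in seconds
}
// 2048*600 + 1024*1200 = 2457600 MB-seconds
System.out.println(mbSeconds + " MB-seconds");
{code}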
[jira] [Commented] (YARN-1167) Submitted distributed shell application shows appMasterHost = empty
[ https://issues.apache.org/jira/browse/YARN-1167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13786596#comment-13786596 ] Xuan Gong commented on YARN-1167: - [~vinodkv] We can set appMasterRpcPort as -1. Because there is the parameter check for AMRMClientImpl. {code} Preconditions.checkArgument(appHostPort >= 0, "Port number of the host should not be negative"); {code} Submitted distributed shell application shows appMasterHost = empty --- Key: YARN-1167 URL: https://issues.apache.org/jira/browse/YARN-1167 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Reporter: Tassapol Athiapinya Assignee: Xuan Gong Fix For: 2.1.2-beta Attachments: YARN-1167.1.patch, YARN-1167.2.patch, YARN-1167.3.patch, YARN-1167.4.patch, YARN-1167.5.patch, YARN-1167.6.patch, YARN-1167.7.patch Submit distributed shell application. Once the application turns to be RUNNING state, app master host should not be empty. In reality, it is empty. ==console logs== distributedshell.Client: Got application report from ASM for, appId=12, clientToAMToken=null, appDiagnostics=, appMasterHost=, appQueue=default, appMasterRpcPort=0, appStartTime=1378505161360, yarnAppState=RUNNING, distributedFinalState=UNDEFINED, -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1167) Submitted distributed shell application shows appMasterHost = empty
[ https://issues.apache.org/jira/browse/YARN-1167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13786597#comment-13786597 ] Xuan Gong commented on YARN-1167: - Can not set the port as -1 Submitted distributed shell application shows appMasterHost = empty --- Key: YARN-1167 URL: https://issues.apache.org/jira/browse/YARN-1167 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Reporter: Tassapol Athiapinya Assignee: Xuan Gong Fix For: 2.1.2-beta Attachments: YARN-1167.1.patch, YARN-1167.2.patch, YARN-1167.3.patch, YARN-1167.4.patch, YARN-1167.5.patch, YARN-1167.6.patch, YARN-1167.7.patch Submit distributed shell application. Once the application turns to be RUNNING state, app master host should not be empty. In reality, it is empty. ==console logs== distributedshell.Client: Got application report from ASM for, appId=12, clientToAMToken=null, appDiagnostics=, appMasterHost=, appQueue=default, appMasterRpcPort=0, appStartTime=1378505161360, yarnAppState=RUNNING, distributedFinalState=UNDEFINED, -- This message was sent by Atlassian JIRA (v6.1#6144)
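In other words, going through the client library, a registration like the one below would trip that check; amRMClient stands for an initialized AMRMClient instance and the host name is illustrative:
{code}
// Sketch: with the precondition quoted above, an AM that has no RPC
// endpoint cannot simply pass -1 through the client library; this call
// throws IllegalArgumentException("Port number of the host should not
// be negative").
amRMClient.registerApplicationMaster("am-host.example.com", -1, "");
{code}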
[jira] [Commented] (YARN-1268) TestFairScheduler.testContinuousScheduling is flaky
[ https://issues.apache.org/jira/browse/YARN-1268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13786621#comment-13786621 ] Hadoop QA commented on YARN-1268: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12606878/YARN-1268-1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2103//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2103//console This message is automatically generated. TestFairScheduler.testContinuousScheduling is flaky --- Key: YARN-1268 URL: https://issues.apache.org/jira/browse/YARN-1268 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-1268-1.patch, YARN-1268.patch It looks like there's a timeout in it that's causing it to be flaky. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Assigned] (YARN-1183) MiniYARNCluster shutdown takes several minutes intermittently
[ https://issues.apache.org/jira/browse/YARN-1183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrey Klochkov reassigned YARN-1183: - Assignee: Andrey Klochkov MiniYARNCluster shutdown takes several minutes intermittently - Key: YARN-1183 URL: https://issues.apache.org/jira/browse/YARN-1183 Project: Hadoop YARN Issue Type: Bug Reporter: Andrey Klochkov Assignee: Andrey Klochkov Attachments: YARN-1183--n2.patch, YARN-1183--n3.patch, YARN-1183--n4.patch, YARN-1183.patch As described in MAPREDUCE-5501 sometimes M/R tests leave MRAppMaster java processes living for several minutes after successful completion of the corresponding test. There is a concurrency issue in MiniYARNCluster shutdown logic which leads to this. Sometimes RM stops before an app master sends its last report, and then the app master keeps retrying for 6 minutes. In some cases it leads to failures in subsequent tests, and it affects performance of tests as app masters eat resources. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1253) Changes to LinuxContainerExecutor to run containers as a single dedicated user in non-secure mode
[ https://issues.apache.org/jira/browse/YARN-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13786679#comment-13786679 ] Hudson commented on YARN-1253: -- SUCCESS: Integrated in Hadoop-trunk-Commit #4541 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4541/]) YARN-1253. Changes to LinuxContainerExecutor to run containers as a single dedicated user in non-secure mode. (rvs via tucu) (tucu: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1529325) * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/site/apt/ClusterSetup.apt.vm * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LinuxContainerExecutor.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/main.c * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/test/test-container-executor.c * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestLinuxContainerExecutor.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestLinuxContainerExecutorWithMocks.java Changes to LinuxContainerExecutor to run containers as a single dedicated user in non-secure mode - Key: YARN-1253 URL: https://issues.apache.org/jira/browse/YARN-1253 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager Affects Versions: 2.1.0-beta Reporter: Alejandro Abdelnur Assignee: Roman Shaposhnik Priority: Blocker Fix For: 2.3.0 Attachments: YARN-1253.patch.txt When using cgroups we require LCE to be configured in the cluster to start containers. LCE starts containers as the user that submitted the job. While this works correctly in a secure setup, in an un-secure setup this presents a couple of issues: * LCE requires all Hadoop users submitting jobs to be Unix users in all nodes * Because users can impersonate other users, any user would have access to any local file of other users Particularly, the second issue is not desirable as a user could get access to ssh keys of other users in the nodes or, if there are NFS mounts, get to other users' data outside of the cluster. -- This message was sent by Atlassian JIRA (v6.1#6144)
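A hedged sketch of turning the new behavior on; the property keys are recalled from the yarn-default.xml change in this commit and may not match exactly:
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class LceNonSecureConfSketch {
  public static Configuration lceConf() {
    Configuration conf = new YarnConfiguration();
    conf.set("yarn.nodemanager.container-executor.class",
        "org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor");
    // Run every container as one dedicated local user in non-secure mode,
    // instead of the (unauthenticated) submitting user.
    conf.set(
        "yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user",
        "nobody");
    return conf;
  }
}
{code}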
[jira] [Updated] (YARN-465) fix coverage org.apache.hadoop.yarn.server.webproxy
[ https://issues.apache.org/jira/browse/YARN-465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrey Klochkov updated YARN-465: - Attachment: YARN-465-trunk--n4.patch Ravi, this is not my patch, so please keep in mind that I'm digging into this code just as you are. Alexey won't be available to make fixes, so I'm taking this on myself so that the contribution isn't lost.
1-2. As far as I can see, the WebAppProxy.start() method is used in another test, which should be the reason it's not part of the main method. The join method is removed as it's no longer used.
3. I think it is meant to log port, not originalPort. The port variable is set in WebAppProxyForTest.start() to the actual port the server binds to.
4. Indeed, core-default.xml is not needed. I'm replacing it by setting this configuration in the code of the test itself.
5. It must be setName("proxy"), as this is the name of the webapp under hadoop-yarn-common/src/main/resources/webapps. Setting it to anything else would lead to a ClassNotFoundException. I made the message about the port number more detailed.
6. I added a check which verifies that the cookie is present in one case and absent in the other.
7. Yes, I don't see why testWebAppProxyServer is needed in the presence of testWebAppProxyServlet. Removing it.
8. testWebAppProxyServerMainMethod tests that the server starts successfully; the counter is used to wait for the server to start.
Attaching the updated patch for trunk. fix coverage org.apache.hadoop.yarn.server.webproxy Key: YARN-465 URL: https://issues.apache.org/jira/browse/YARN-465 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 3.0.0, 0.23.7, 2.0.4-alpha Reporter: Aleksey Gorshkov Assignee: Andrey Klochkov Attachments: YARN-465-branch-0.23-a.patch, YARN-465-branch-0.23.patch, YARN-465-branch-2-a.patch, YARN-465-branch-2--n3.patch, YARN-465-branch-2.patch, YARN-465-trunk-a.patch, YARN-465-trunk--n3.patch, YARN-465-trunk--n4.patch, YARN-465-trunk.patch fix coverage org.apache.hadoop.yarn.server.webproxy patch YARN-465-trunk.patch for trunk patch YARN-465-branch-2.patch for branch-2 patch YARN-465-branch-0.23.patch for branch-0.23 There is an issue in branch-0.23: the patch does not create the .keep file. To fix it, run: mkdir yhadoop-common/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/webapps/proxy touch yhadoop-common/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/webapps/proxy/.keep -- This message was sent by Atlassian JIRA (v6.1#6144)
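For point 5 above, a minimal sketch of why the name matters; WebApps.$for is the real builder in org.apache.hadoop.yarn.webapp, but the surrounding test wiring is assumed:
{code}
// The name selects the resource directory
// hadoop-yarn-common/src/main/resources/webapps/<name>, so it must be
// "proxy" here; any other value fails to resolve the webapp and the
// server will not start.
WebApp proxyApp = WebApps.$for("proxy").at(port).start();
{code}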
[jira] [Updated] (YARN-465) fix coverage org.apache.hadoop.yarn.server.webproxy
[ https://issues.apache.org/jira/browse/YARN-465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrey Klochkov updated YARN-465: - Attachment: YARN-465-branch-2--n4.patch Attaching the updated patch for branch-2. fix coverage org.apache.hadoop.yarn.server.webproxy Key: YARN-465 URL: https://issues.apache.org/jira/browse/YARN-465 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 3.0.0, 0.23.7, 2.0.4-alpha Reporter: Aleksey Gorshkov Assignee: Andrey Klochkov Attachments: YARN-465-branch-0.23-a.patch, YARN-465-branch-0.23.patch, YARN-465-branch-2-a.patch, YARN-465-branch-2--n3.patch, YARN-465-branch-2--n4.patch, YARN-465-branch-2.patch, YARN-465-trunk-a.patch, YARN-465-trunk--n3.patch, YARN-465-trunk--n4.patch, YARN-465-trunk.patch fix coverage org.apache.hadoop.yarn.server.webproxy patch YARN-465-trunk.patch for trunk patch YARN-465-branch-2.patch for branch-2 patch YARN-465-branch-0.23.patch for branch-0.23 There is an issue in branch-0.23: the patch does not create the .keep file. To fix it, run: mkdir yhadoop-common/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/webapps/proxy touch yhadoop-common/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/webapps/proxy/.keep -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-465) fix coverage org.apache.hadoop.yarn.server.webproxy
[ https://issues.apache.org/jira/browse/YARN-465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13786720#comment-13786720 ] Hadoop QA commented on YARN-465: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12606913/YARN-465-branch-2--n4.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2104//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2104//console This message is automatically generated. fix coverage org.apache.hadoop.yarn.server.webproxy Key: YARN-465 URL: https://issues.apache.org/jira/browse/YARN-465 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 3.0.0, 0.23.7, 2.0.4-alpha Reporter: Aleksey Gorshkov Assignee: Andrey Klochkov Attachments: YARN-465-branch-0.23-a.patch, YARN-465-branch-0.23.patch, YARN-465-branch-2-a.patch, YARN-465-branch-2--n3.patch, YARN-465-branch-2--n4.patch, YARN-465-branch-2.patch, YARN-465-trunk-a.patch, YARN-465-trunk--n3.patch, YARN-465-trunk--n4.patch, YARN-465-trunk.patch fix coverage org.apache.hadoop.yarn.server.webproxy patch YARN-465-trunk.patch for trunk patch YARN-465-branch-2.patch for branch-2 patch YARN-465-branch-0.23.patch for branch-0.23 There is an issue in branch-0.23: the patch does not create the .keep file. To fix it, run: mkdir yhadoop-common/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/webapps/proxy touch yhadoop-common/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/webapps/proxy/.keep -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1254) NM is polluting container's credentials
[ https://issues.apache.org/jira/browse/YARN-1254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi updated YARN-1254: Attachment: YARN-1254.20131004.1.patch NM is polluting container's credentials --- Key: YARN-1254 URL: https://issues.apache.org/jira/browse/YARN-1254 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Omkar Vinit Joshi Attachments: YARN-1254.20131004.1.patch, YARN-1254.20131030.1.patch Before launching the container, the NM uses the same credentials object and so pollutes what the container should see. We should fix this. -- This message was sent by Atlassian JIRA (v6.1#6144)
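A minimal sketch of the kind of fix the summary above suggests; org.apache.hadoop.security.Credentials does provide a copy constructor, but where exactly the copy happens in the NM is an assumption:
{code}
// Give the container its own copy instead of sharing the NM's object, so
// tokens added on either side no longer leak to the other.
Credentials containerCredentials = new Credentials(nmCredentials);
{code}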
[jira] [Updated] (YARN-415) Capture memory utilization at the app-level for chargeback
[ https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrey Klochkov updated YARN-415: - Attachment: YARN-415--n4.patch Jason, thanks for the thorough review. Attaching the patch with fixes. I basically made all the fixes you proposed except the last one about capturing the leak. Capture memory utilization at the app-level for chargeback -- Key: YARN-415 URL: https://issues.apache.org/jira/browse/YARN-415 Project: Hadoop YARN Issue Type: New Feature Components: resourcemanager Affects Versions: 0.23.6 Reporter: Kendall Thrapp Assignee: Andrey Klochkov Attachments: YARN-415--n2.patch, YARN-415--n3.patch, YARN-415--n4.patch, YARN-415.patch For the purpose of chargeback, I'd like to be able to compute the cost of an application in terms of cluster resource usage. To start out, I'd like to get the memory utilization of an application. The unit should be MB-seconds or something similar and, from a chargeback perspective, the memory amount should be the memory reserved for the application, as even if the app didn't use all that memory, no one else was able to use it. (reserved ram for container 1 * lifetime of container 1) + (reserved ram for container 2 * lifetime of container 2) + ... + (reserved ram for container n * lifetime of container n) It'd be nice to have this at the app level instead of the job level because: 1. We'd still be able to get memory usage for jobs that crashed (and wouldn't appear on the job history server). 2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm). This new metric should be available through both the RM UI and the RM Web Services REST API. -- This message was sent by Atlassian JIRA (v6.1#6144)
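The proposed metric is easy to state in code. A hedged sketch of the aggregation; ContainerUsage and its getters are illustrative names, not from the patch:
{code}
// MB-seconds for one application: reserved memory times lifetime, summed
// over all of its containers, whether or not the memory was fully used.
long mbSeconds = 0;
for (ContainerUsage c : appContainers) { // ContainerUsage is hypothetical
  long lifetimeSecs = (c.getFinishTime() - c.getStartTime()) / 1000;
  mbSeconds += c.getReservedMB() * lifetimeSecs;
}
{code}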
[jira] [Commented] (YARN-1254) NM is polluting container's credentials
[ https://issues.apache.org/jira/browse/YARN-1254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13786745#comment-13786745 ] Omkar Vinit Joshi commented on YARN-1254: - I was not able to find a good way to test container credential contamination. NM is polluting container's credentials --- Key: YARN-1254 URL: https://issues.apache.org/jira/browse/YARN-1254 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Omkar Vinit Joshi Attachments: YARN-1254.20131004.1.patch, YARN-1254.20131030.1.patch Before launching the container, the NM uses the same credentials object and so pollutes what the container should see. We should fix this. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-445) Ability to signal containers
[ https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrey Klochkov updated YARN-445: - Attachment: YARN-445--n3.patch Attaching the patch that marks all new interfaces/methods as unstable. Ability to signal containers Key: YARN-445 URL: https://issues.apache.org/jira/browse/YARN-445 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: Jason Lowe Attachments: YARN-445--n2.patch, YARN-445--n3.patch, YARN-445.patch It would be nice if an ApplicationMaster could send signals to containers, such as SIGQUIT, SIGUSR1, etc. For example, in order to replicate the jstack-on-task-timeout feature implemented by MAPREDUCE-1119 in Hadoop 0.21, the NodeManager needs an interface for sending SIGQUIT to a container. For that specific feature we could implement it as an additional field in the StopContainerRequest. However, that would not address other potential features, like the ability for an AM to trigger jstacks on arbitrary tasks *without* killing them. The latter feature would be a very useful debugging tool for users who do not have shell access to the nodes. -- This message was sent by Atlassian JIRA (v6.1#6144)
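One possible API shape for the non-killing case, purely as a sketch; the request and command types here are hypothetical, not a committed interface:
{code}
// Hypothetical request: deliver a signal to a container without stopping it,
// e.g. SIGQUIT to make a JVM dump thread stacks for debugging.
SignalContainerRequest request = SignalContainerRequest.newInstance(
    containerId, Signal.QUIT); // both types are illustrative only
containerManager.signalContainer(request);
{code}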
[jira] [Updated] (YARN-1167) Submitted distributed shell application shows appMasterHost = empty
[ https://issues.apache.org/jira/browse/YARN-1167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1167: Attachment: YARN-1167.8.patch Submitted distributed shell application shows appMasterHost = empty --- Key: YARN-1167 URL: https://issues.apache.org/jira/browse/YARN-1167 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Reporter: Tassapol Athiapinya Assignee: Xuan Gong Fix For: 2.1.2-beta Attachments: YARN-1167.1.patch, YARN-1167.2.patch, YARN-1167.3.patch, YARN-1167.4.patch, YARN-1167.5.patch, YARN-1167.6.patch, YARN-1167.7.patch, YARN-1167.8.patch Submit a distributed shell application. Once the application reaches the RUNNING state, the app master host should not be empty; in reality, it is. ==console logs== distributedshell.Client: Got application report from ASM for, appId=12, clientToAMToken=null, appDiagnostics=, appMasterHost=, appQueue=default, appMasterRpcPort=0, appStartTime=1378505161360, yarnAppState=RUNNING, distributedFinalState=UNDEFINED, -- This message was sent by Atlassian JIRA (v6.1#6144)
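For context, the report's appMasterHost field is populated from what the AM passes when registering; AMRMClient.registerApplicationMaster does take these three arguments, while the surrounding variables are assumed:
{code}
// Register with the real host name instead of an empty string so the RM's
// application report carries a usable appMasterHost.
String amHost = InetAddress.getLocalHost().getHostName(); // java.net.InetAddress
RegisterApplicationMasterResponse resp =
    amRMClient.registerApplicationMaster(amHost, amRpcPort, amTrackingUrl);
{code}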
[jira] [Commented] (YARN-1254) NM is polluting container's credentials
[ https://issues.apache.org/jira/browse/YARN-1254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13786759#comment-13786759 ] Hadoop QA commented on YARN-1254: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12606924/YARN-1254.20131004.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2105//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2105//console This message is automatically generated. NM is polluting container's credentials --- Key: YARN-1254 URL: https://issues.apache.org/jira/browse/YARN-1254 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Omkar Vinit Joshi Attachments: YARN-1254.20131004.1.patch, YARN-1254.20131030.1.patch Before launching the container, the NM uses the same credentials object and so pollutes what the container should see. We should fix this. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (YARN-1273) Distributed shell does not account for start container failures reported asynchronously.
Hitesh Shah created YARN-1273: - Summary: Distributed shell does not account for start container failures reported asynchronously. Key: YARN-1273 URL: https://issues.apache.org/jira/browse/YARN-1273 Project: Hadoop YARN Issue Type: Bug Reporter: Hitesh Shah Assignee: Hitesh Shah 2013-10-04 22:09:15,234 ERROR [org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl #1] distributedshell.ApplicationMaster (ApplicationMaster.java:onStartContainerError(719)) - Failed to start Container container_1380920347574_0018_01_06 -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1273) Distributed shell does not account for start container failures reported asynchronously.
[ https://issues.apache.org/jira/browse/YARN-1273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated YARN-1273: -- Attachment: YARN-1273.1.patch Trivial patch. No tests - manually tested by changing NMClientAsync to trigger failures. Distributed shell does not account for start container failures reported asynchronously. Key: YARN-1273 URL: https://issues.apache.org/jira/browse/YARN-1273 Project: Hadoop YARN Issue Type: Bug Reporter: Hitesh Shah Assignee: Hitesh Shah Attachments: YARN-1273.1.patch 2013-10-04 22:09:15,234 ERROR [org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl #1] distributedshell.ApplicationMaster (ApplicationMaster.java:onStartContainerError(719)) - Failed to start Container container_1380920347574_0018_01_06 -- This message was sent by Atlassian JIRA (v6.1#6144)
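A sketch of the accounting the fix needs in the distributed shell AM's callback; onStartContainerError is the real NMClientAsync.CallbackHandler signature, while the counter names are assumptions:
{code}
@Override
public void onStartContainerError(ContainerId containerId, Throwable t) {
  LOG.error("Failed to start Container " + containerId, t);
  // Record the asynchronous start failure so the AM does not wait forever
  // for a container that never ran, and can re-request it or fail fast.
  numCompletedContainers.incrementAndGet();
  numFailedContainers.incrementAndGet();
}
{code}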
[jira] [Updated] (YARN-1272) Add a link to cluster/application page on node manager's list of application page
[ https://issues.apache.org/jira/browse/YARN-1272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Han updated YARN-1272: --- Attachment: YARN-1272.patch Add a link to cluster/application page on node manager's list of application page - Key: YARN-1272 URL: https://issues.apache.org/jira/browse/YARN-1272 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.0.5-alpha Reporter: Paul Han Attachments: YARN-1272.patch On the node manager's application/application page, the content is significantly less than the content on the resource manager's application page /cluster/application. Adding a link from the node manager's application page to the resource manager's application page will help users get information faster and more efficiently. Please see the screenshot for the benefit. -- This message was sent by Atlassian JIRA (v6.1#6144)
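A rough sketch of the proposed cross-link using the webapp's Hamlet API; the RM web address lookup and the exact page wiring are assumptions, not the attached patch:
{code}
// On the NM's application page, link to the richer RM view of the same app.
String rmAppUrl = "http://" + rmWebAppAddress + "/cluster/app/" + appId;
html.p()
    ._("For full details, see the ")
    .a(rmAppUrl, "application page on the ResourceManager")
    ._();
{code}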