[jira] [Commented] (YARN-3400) [JDK 8] Build Failure due to unreported exceptions in RPCUtil
[ https://issues.apache.org/jira/browse/YARN-3400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14381661#comment-14381661 ] Steve Loughran commented on YARN-3400: -- I'd seen this too. Given that jenkins is happy with it, and you can replicate with the javac version updated: +1 [JDK 8] Build Failure due to unreported exceptions in RPCUtil -- Key: YARN-3400 URL: https://issues.apache.org/jira/browse/YARN-3400 Project: Hadoop YARN Issue Type: Bug Reporter: Robert Kanter Assignee: Robert Kanter Attachments: YARN-3400.patch When I try compiling Hadoop with JDK 8 like this {noformat} mvn clean package -Pdist -Dtar -DskipTests -Djavac.version=1.8 {noformat} I get this error: {noformat} [ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) on project hadoop-yarn-common: Compilation failure: Compilation failure: [ERROR] /Users/rkanter/dev/hadoop-common2/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ipc/RPCUtil.java:[101,11] unreported exception java.lang.Throwable; must be caught or declared to be thrown [ERROR] /Users/rkanter/dev/hadoop-common2/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ipc/RPCUtil.java:[104,11] unreported exception java.lang.Throwable; must be caught or declared to be thrown [ERROR] /Users/rkanter/dev/hadoop-common2/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ipc/RPCUtil.java:[107,11] unreported exception java.lang.Throwable; must be caught or declared to be thrown {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
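The "unreported exception java.lang.Throwable" failure is characteristic of the generic "sneaky throw" idiom, whose inferred type argument changed between JDK 7 and JDK 8. A minimal standalone sketch of the problem and one fix (illustrative names only, not the actual RPCUtil code; the real patch is in YARN-3400.patch):

```java
import java.io.IOException;

public class SneakyThrowDemo {
    // Generic helper that rethrows a checked Throwable as if unchecked.
    // Under JDK 8's improved inference, calling `throw rethrow(t);` with t
    // of static type Throwable can infer T = Throwable, so javac reports
    // "unreported exception java.lang.Throwable; must be caught or declared".
    @SuppressWarnings("unchecked")
    private static <T extends Throwable> RuntimeException rethrow(Throwable t) throws T {
        throw (T) t;
    }

    // Pinning the type argument to RuntimeException keeps the call site
    // compiling under both -source 1.7 and 1.8.
    static RuntimeException rethrowUnchecked(Throwable t) {
        return SneakyThrowDemo.<RuntimeException>rethrow(t);
    }

    // Demonstrates that the original checked exception still propagates.
    static String classify(Throwable t) {
        try {
            throw rethrowUnchecked(t);
        } catch (Exception e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(classify(new IOException("boom")));
    }
}
```

The checked IOException crosses the unchecked boundary intact at runtime; only the compile-time bookkeeping changes.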
[jira] [Commented] (YARN-2893) AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream
[ https://issues.apache.org/jira/browse/YARN-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14381605#comment-14381605 ] Anubhav Dhoot commented on YARN-2893: - The AMLauncher changes look like a possible fix, though there is no matching unit test that demonstrates the root cause of this bug. The changes for RMAppManager#submitApplication seem to no longer return RMAppRejectedEvent for any exception in getDelegationTokenRenewer().addApplicationAsync. Is that deliberate? AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream -- Key: YARN-2893 URL: https://issues.apache.org/jira/browse/YARN-2893 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Gera Shegalov Assignee: zhihai xu Attachments: YARN-2893.000.patch, YARN-2893.001.patch MapReduce jobs on our clusters experience sporadic failures due to corrupt tokens in the AM launch context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
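The EOFException symptom is what DataInput-style readers raise when a length-prefixed record is cut short, consistent with a corrupt or truncated token buffer in the AM launch context. A self-contained sketch using plain java.io (not the Hadoop Credentials/readTokenStorageStream API, which follows the same wire pattern):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.util.Arrays;

public class TruncatedTokenDemo {
    // Serialize one "token": a UTF identifier plus a length-prefixed payload,
    // mimicking a credentials record.
    static byte[] writeRecord(String id, byte[] payload) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        out.writeUTF(id);
        out.writeInt(payload.length);
        out.write(payload);
        return bos.toByteArray();
    }

    // Read it back; a truncated buffer surfaces as EOFException, the same
    // symptom AMLauncher hits when the launch-context tokens are corrupt.
    static String readRecord(byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        String id = in.readUTF();
        byte[] payload = new byte[in.readInt()];
        in.readFully(payload); // throws EOFException if the stream was cut short
        return id;
    }

    public static void main(String[] args) throws IOException {
        byte[] full = writeRecord("yarn.am.token", new byte[]{1, 2, 3, 4});
        System.out.println(readRecord(full));
        try {
            readRecord(Arrays.copyOf(full, full.length - 2));
        } catch (EOFException e) {
            System.out.println("truncated record -> EOFException");
        }
    }
}
```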
[jira] [Commented] (YARN-2213) Change proxy-user cookie log in AmIpFilter to DEBUG
[ https://issues.apache.org/jira/browse/YARN-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14381750#comment-14381750 ] Hudson commented on YARN-2213: -- FAILURE: Integrated in Hadoop-Yarn-trunk #878 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/878/]) YARN-2213. Change proxy-user cookie log in AmIpFilter to DEBUG. (xgong: rev e556198e71df6be3a83e5598265cb702fc7a668b) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/main/java/org/apache/hadoop/yarn/server/webproxy/amfilter/AmIpFilter.java Change proxy-user cookie log in AmIpFilter to DEBUG --- Key: YARN-2213 URL: https://issues.apache.org/jira/browse/YARN-2213 Project: Hadoop YARN Issue Type: Task Reporter: Ted Yu Assignee: Varun Saxena Priority: Minor Fix For: 2.7.0 Attachments: YARN-2213.001.patch, YARN-2213.02.patch I saw a lot of the following lines in AppMaster log: {code} 14/06/24 17:12:36 WARN web.SliderAmIpFilter: Could not find proxy-user cookie, so user will not be set 14/06/24 17:12:39 WARN web.SliderAmIpFilter: Could not find proxy-user cookie, so user will not be set 14/06/24 17:12:39 WARN web.SliderAmIpFilter: Could not find proxy-user cookie, so user will not be set {code} For long running app, this would consume considerable log space. Log level should be changed to DEBUG. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
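The fix is essentially a log-level demotion in AmIpFilter. The general pattern, sketched here with java.util.logging standing in for the commons-logging API Hadoop uses (method and logger names are illustrative):

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class ProxyCookieLogDemo {
    private static final Logger LOG = Logger.getLogger("AmIpFilter");

    static String describeMissingCookie() {
        String msg = "Could not find proxy-user cookie, so user will not be set";
        // Before the patch: emitted at WARN on every request, flooding the
        // logs of long-running application masters.
        // After the patch: only emitted when debug-level logging is enabled,
        // and guarded so the message is not even built otherwise.
        if (LOG.isLoggable(Level.FINE)) {
            LOG.fine(msg);
        }
        return msg;
    }
}
```

The guard matters for hot paths: at default log levels the call is a cheap boolean check instead of a formatted write per request.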
[jira] [Commented] (YARN-3397) yarn rmadmin should skip -failover
[ https://issues.apache.org/jira/browse/YARN-3397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14381748#comment-14381748 ] Hudson commented on YARN-3397: -- FAILURE: Integrated in Hadoop-Yarn-trunk #878 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/878/]) YARN-3397. yarn rmadmin should skip -failover. (J.Andreina via kasha) (kasha: rev c906a1de7280dabd9d9d8b6aeaa060898e6d17b6) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestRMAdminCLI.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/RMAdminCLI.java yarn rmadmin should skip -failover -- Key: YARN-3397 URL: https://issues.apache.org/jira/browse/YARN-3397 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.7.0 Reporter: J.Andreina Assignee: J.Andreina Priority: Minor Fix For: 2.8.0 Attachments: YARN-3397.1.patch The -failover option should be filtered out of HAAdmin so the CLI stays in sync with the documentation. Since -failover is not a supported operation, it is not mentioned in the docs, and listing it in the CLI usage is misleading. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3397) yarn rmadmin should skip -failover
[ https://issues.apache.org/jira/browse/YARN-3397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14381737#comment-14381737 ] Hudson commented on YARN-3397: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #144 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/144/]) YARN-3397. yarn rmadmin should skip -failover. (J.Andreina via kasha) (kasha: rev c906a1de7280dabd9d9d8b6aeaa060898e6d17b6) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/RMAdminCLI.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestRMAdminCLI.java * hadoop-yarn-project/CHANGES.txt yarn rmadmin should skip -failover -- Key: YARN-3397 URL: https://issues.apache.org/jira/browse/YARN-3397 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.7.0 Reporter: J.Andreina Assignee: J.Andreina Priority: Minor Fix For: 2.8.0 Attachments: YARN-3397.1.patch The -failover option should be filtered out of HAAdmin so the CLI stays in sync with the documentation. Since -failover is not a supported operation, it is not mentioned in the docs, and listing it in the CLI usage is misleading. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2213) Change proxy-user cookie log in AmIpFilter to DEBUG
[ https://issues.apache.org/jira/browse/YARN-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14381739#comment-14381739 ] Hudson commented on YARN-2213: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #144 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/144/]) YARN-2213. Change proxy-user cookie log in AmIpFilter to DEBUG. (xgong: rev e556198e71df6be3a83e5598265cb702fc7a668b) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/main/java/org/apache/hadoop/yarn/server/webproxy/amfilter/AmIpFilter.java Change proxy-user cookie log in AmIpFilter to DEBUG --- Key: YARN-2213 URL: https://issues.apache.org/jira/browse/YARN-2213 Project: Hadoop YARN Issue Type: Task Reporter: Ted Yu Assignee: Varun Saxena Priority: Minor Fix For: 2.7.0 Attachments: YARN-2213.001.patch, YARN-2213.02.patch I saw a lot of the following lines in AppMaster log: {code} 14/06/24 17:12:36 WARN web.SliderAmIpFilter: Could not find proxy-user cookie, so user will not be set 14/06/24 17:12:39 WARN web.SliderAmIpFilter: Could not find proxy-user cookie, so user will not be set 14/06/24 17:12:39 WARN web.SliderAmIpFilter: Could not find proxy-user cookie, so user will not be set {code} For long running app, this would consume considerable log space. Log level should be changed to DEBUG. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3397) yarn rmadmin should skip -failover
[ https://issues.apache.org/jira/browse/YARN-3397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14381858#comment-14381858 ] Hudson commented on YARN-3397: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #144 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/144/]) YARN-3397. yarn rmadmin should skip -failover. (J.Andreina via kasha) (kasha: rev c906a1de7280dabd9d9d8b6aeaa060898e6d17b6) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/RMAdminCLI.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestRMAdminCLI.java yarn rmadmin should skip -failover -- Key: YARN-3397 URL: https://issues.apache.org/jira/browse/YARN-3397 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.7.0 Reporter: J.Andreina Assignee: J.Andreina Priority: Minor Fix For: 2.8.0 Attachments: YARN-3397.1.patch The -failover option should be filtered out of HAAdmin so the CLI stays in sync with the documentation. Since -failover is not a supported operation, it is not mentioned in the docs, and listing it in the CLI usage is misleading. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3397) yarn rmadmin should skip -failover
[ https://issues.apache.org/jira/browse/YARN-3397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14381879#comment-14381879 ] Hudson commented on YARN-3397: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2094 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2094/]) YARN-3397. yarn rmadmin should skip -failover. (J.Andreina via kasha) (kasha: rev c906a1de7280dabd9d9d8b6aeaa060898e6d17b6) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/RMAdminCLI.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestRMAdminCLI.java yarn rmadmin should skip -failover -- Key: YARN-3397 URL: https://issues.apache.org/jira/browse/YARN-3397 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.7.0 Reporter: J.Andreina Assignee: J.Andreina Priority: Minor Fix For: 2.8.0 Attachments: YARN-3397.1.patch The -failover option should be filtered out of HAAdmin so the CLI stays in sync with the documentation. Since -failover is not a supported operation, it is not mentioned in the docs, and listing it in the CLI usage is misleading. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2893) AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream
[ https://issues.apache.org/jira/browse/YARN-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14383245#comment-14383245 ] Hadoop QA commented on YARN-2893: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12707662/YARN-2893.002.patch against trunk revision 47782cb. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestRMHA org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler org.apache.hadoop.yarn.server.resourcemanager.TestMoveApplication org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebappAuthentication org.apache.hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStore org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart org.apache.hadoop.yarn.server.resourcemanager.TestRMAdminService Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7118//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7118//console This message is automatically generated. 
AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream -- Key: YARN-2893 URL: https://issues.apache.org/jira/browse/YARN-2893 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Gera Shegalov Assignee: zhihai xu Attachments: YARN-2893.000.patch, YARN-2893.001.patch, YARN-2893.002.patch MapReduce jobs on our clusters experience sporadic failures due to corrupt tokens in the AM launch context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3324) TestDockerContainerExecutor should clean test docker image from local repository after test is done
[ https://issues.apache.org/jira/browse/YARN-3324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14383323#comment-14383323 ] Chen He commented on YARN-3324: --- +1, sounds good to me. Thanks, [~ravindra.naik] TestDockerContainerExecutor should clean test docker image from local repository after test is done --- Key: YARN-3324 URL: https://issues.apache.org/jira/browse/YARN-3324 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.6.0 Reporter: Chen He Attachments: YARN-3324-branch-2.6.0.002.patch, YARN-3324-trunk.002.patch Current TestDockerContainerExecutor only cleans the temp directory in local file system but leaves the test docker image in local docker repository. It should be cleaned. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3324) TestDockerContainerExecutor should clean test docker image from local repository after test is done
[ https://issues.apache.org/jira/browse/YARN-3324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14383324#comment-14383324 ] Chen He commented on YARN-3324: --- Make sure there is no side effect if there are parallel docker tests running when you do your 1st step. TestDockerContainerExecutor should clean test docker image from local repository after test is done --- Key: YARN-3324 URL: https://issues.apache.org/jira/browse/YARN-3324 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.6.0 Reporter: Chen He Attachments: YARN-3324-branch-2.6.0.002.patch, YARN-3324-trunk.002.patch Current TestDockerContainerExecutor only cleans the temp directory in local file system but leaves the test docker image in local docker repository. It should be cleaned. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2336) Fair scheduler REST api returns a missing '[' bracket JSON for deep queue tree
[ https://issues.apache.org/jira/browse/YARN-2336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated YARN-2336: Attachment: YARN-2336-4.patch Fair scheduler REST api returns a missing '[' bracket JSON for deep queue tree -- Key: YARN-2336 URL: https://issues.apache.org/jira/browse/YARN-2336 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.4.1 Reporter: Kenji Kikushima Assignee: Kenji Kikushima Attachments: YARN-2336-2.patch, YARN-2336-3.patch, YARN-2336-4.patch, YARN-2336.patch When we have sub queues in Fair Scheduler, the REST API returns JSON with a missing '[' bracket for childQueues. This issue was found by [~ajisakaa] at YARN-1050. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3404) View the queue name to YARN Application page
[ https://issues.apache.org/jira/browse/YARN-3404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ryu Kobayashi updated YARN-3404: Attachment: screenshot.png View the queue name to YARN Application page Key: YARN-3404 URL: https://issues.apache.org/jira/browse/YARN-3404 Project: Hadoop YARN Issue Type: Improvement Reporter: Ryu Kobayashi Priority: Minor Attachments: screenshot.png We want to display the name of the queue that each application uses on the YARN Application page. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2336) Fair scheduler REST api returns a missing '[' bracket JSON for deep queue tree
[ https://issues.apache.org/jira/browse/YARN-2336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated YARN-2336: Attachment: (was: YARN-2336-4.patch) Fair scheduler REST api returns a missing '[' bracket JSON for deep queue tree -- Key: YARN-2336 URL: https://issues.apache.org/jira/browse/YARN-2336 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.4.1 Reporter: Kenji Kikushima Assignee: Kenji Kikushima Attachments: YARN-2336-2.patch, YARN-2336-3.patch, YARN-2336.patch When we have sub queues in Fair Scheduler, the REST API returns JSON with a missing '[' bracket for childQueues. This issue was found by [~ajisakaa] at YARN-1050. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2336) Fair scheduler REST api returns a missing '[' bracket JSON for deep queue tree
[ https://issues.apache.org/jira/browse/YARN-2336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated YARN-2336: Attachment: YARN-2336-4.patch Rebased for the latest trunk. Fair scheduler REST api returns a missing '[' bracket JSON for deep queue tree -- Key: YARN-2336 URL: https://issues.apache.org/jira/browse/YARN-2336 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.4.1 Reporter: Kenji Kikushima Assignee: Kenji Kikushima Attachments: YARN-2336-2.patch, YARN-2336-3.patch, YARN-2336-4.patch, YARN-2336.patch When we have sub queues in Fair Scheduler, the REST API returns JSON with a missing '[' bracket for childQueues. This issue was found by [~ajisakaa] at YARN-1050. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
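For illustration only (field names abbreviated, not actual scheduler output): the symptom described is that nested childQueues deeper in the tree were emitted without the opening '[', i.e. as consecutive objects instead of an array, which is invalid JSON. The corrected shape keeps childQueues an array at every depth:

```json
{
  "childQueues": [
    {
      "queueName": "root.parent",
      "childQueues": [
        { "queueName": "root.parent.child1" },
        { "queueName": "root.parent.child2" }
      ]
    }
  ]
}
```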
[jira] [Commented] (YARN-3403) Nodemanager dies after a small typo in mapred-site.xml is induced
[ https://issues.apache.org/jira/browse/YARN-3403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14383362#comment-14383362 ] Naganarasimha G R commented on YARN-3403: - Hi [~mnikhil], which version are you testing with ? Nodemanager dies after a small typo in mapred-site.xml is induced - Key: YARN-3403 URL: https://issues.apache.org/jira/browse/YARN-3403 Project: Hadoop YARN Issue Type: Bug Reporter: Nikhil Mulley Priority: Critical Hi, We have noticed that a small typo in an XML config file (mapred-site.xml) can cause the NodeManager to go down completely without anyone stopping/restarting it externally. I find it a little weird that editing the config files on the filesystem can shut down the running slave daemon, the YARN NodeManager. In this case, a closing '/' was missing from an end tag in a property, and that took the NodeManager down in a cluster. Why would the NodeManager reload the configs while it is running? Aren't they picked up when it is started? Even if new configs are picked up dynamically by design, I think an xmllint/config check should run before the NodeManager is asked to reload/restart. --- java.lang.RuntimeException: org.xml.sax.SAXParseException; systemId: file:/etc/hadoop/conf/mapred-site.xml; lineNumber: 228; columnNumber: 3; The element type value must be terminated by the matching end-tag /value. at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2348) --- Please shed light on this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3403) Nodemanager dies after a small typo in mapred-site.xml is induced
[ https://issues.apache.org/jira/browse/YARN-3403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14383285#comment-14383285 ] Nikhil Mulley commented on YARN-3403: - The more stack trace is here: this is reproducible. --- 2015-03-26 20:04:43,690 FATAL org.apache.hadoop.conf.Configuration: error parsing conf mapred-site.xml org.xml.sax.SAXParseException; systemId: file:/etc/hadoop/conf/mapred-site.xml; lineNumber: 316; columnNumber: 3; The element type property must be terminated by the matching end-tag /property. at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:257) at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:347) at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:150) at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2183) at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2171) at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2242) at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2195) at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2112) at org.apache.hadoop.conf.Configuration.get(Configuration.java:858) at org.apache.hadoop.conf.Configuration.getTrimmed(Configuration.java:877) at org.apache.hadoop.conf.Configuration.getBoolean(Configuration.java:1278) at org.apache.hadoop.io.compress.zlib.ZlibFactory.isNativeZlibLoaded(ZlibFactory.java:65) at org.apache.hadoop.io.compress.zlib.ZlibFactory.getZlibCompressorType(ZlibFactory.java:82) at org.apache.hadoop.io.compress.DefaultCodec.getCompressorType(DefaultCodec.java:74) at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:148) at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:163) at org.apache.hadoop.io.file.tfile.Compression$Algorithm.getCompressor(Compression.java:274) at org.apache.hadoop.io.file.tfile.BCFile$Writer$WBlockState.init(BCFile.java:129) 
at org.apache.hadoop.io.file.tfile.BCFile$Writer.prepareDataBlock(BCFile.java:430) at org.apache.hadoop.io.file.tfile.TFile$Writer.initDataBlock(TFile.java:642) at org.apache.hadoop.io.file.tfile.TFile$Writer.prepareAppendKey(TFile.java:533) at org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogWriter.writeVersion(AggregatedLogFormat.java:276) at org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogWriter.init(AggregatedLogFormat.java:272) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.uploadLogsForContainer(AppLogAggregatorImpl.java:108) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.doAppLogAggregation(AppLogAggregatorImpl.java:166) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.run(AppLogAggregatorImpl.java:140) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService$2.run(LogAggregationService.java:354) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) 2015-03-26 20:04:43,691 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl: Aggregation did not complete for application application_1426202183036_103251 2015-03-26 20:04:43,691 ERROR org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread Thread[LogAggregationService #2,5,main] threw an Throwable, but we are shutting down, so ignoring this java.lang.RuntimeException: org.xml.sax.SAXParseException; systemId: file:/etc/hadoop/conf/mapred-site.xml; lineNumber: 316; columnNumber: 3; The element type property must be terminated by the matching end-tag /property. 
-- Nodemanager dies after a small typo in mapred-site.xml is induced - Key: YARN-3403 URL: https://issues.apache.org/jira/browse/YARN-3403 Project: Hadoop YARN Issue Type: Bug Reporter: Nikhil Mulley Priority: Critical Hi, We have noticed that a small typo in an XML config file (mapred-site.xml) can cause the NodeManager to go down completely without anyone stopping/restarting it externally. I find it a little weird that editing the config files on the filesystem can shut down the running slave daemon, the YARN NodeManager. In this case, a closing '/' was missing from an end tag in a property, and that took the NodeManager down in a cluster.
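Nikhil's suggestion, checking well-formedness before acting on a config file, can be sketched with the JDK's own XML parser. This is an illustrative pre-check under assumed semantics, not code from Hadoop's Configuration class or any patch:

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilderFactory;

public class ConfigSanityCheck {
    // Returns true when the XML is well-formed. A missing end tag, such as
    // the unterminated <property>/<value> in this report, surfaces here as
    // a SAXParseException instead of taking the daemon down later.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes("UTF-8")));
            return true;
        } catch (Exception e) { // SAXParseException, parser config, I/O
            return false;
        }
    }

    public static void main(String[] args) {
        String good = "<configuration><property><name>a</name>"
            + "<value>b</value></property></configuration>";
        String bad = "<configuration><property><name>a</name>"
            + "<value>b</property></configuration>"; // missing </value>
        System.out.println(isWellFormed(good));
        System.out.println(isWellFormed(bad));
    }
}
```

A daemon that validates a candidate file this way before reloading can keep serving with its last-known-good configuration when the check fails.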
[jira] [Updated] (YARN-3404) View the queue name to YARN Application page
[ https://issues.apache.org/jira/browse/YARN-3404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ryu Kobayashi updated YARN-3404: Attachment: YARN-3404.1.patch View the queue name to YARN Application page Key: YARN-3404 URL: https://issues.apache.org/jira/browse/YARN-3404 Project: Hadoop YARN Issue Type: Improvement Reporter: Ryu Kobayashi Priority: Minor Attachments: YARN-3404.1.patch, screenshot.png We want to display the name of the queue that each application uses on the YARN Application page. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3334) [Event Producers] NM start to posting some app related metrics in early POC stage of phase 2.
[ https://issues.apache.org/jira/browse/YARN-3334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-3334: - Attachment: YARN-3334-v2.patch [Event Producers] NM start to posting some app related metrics in early POC stage of phase 2. - Key: YARN-3334 URL: https://issues.apache.org/jira/browse/YARN-3334 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: YARN-2928 Reporter: Junping Du Assignee: Junping Du Attachments: YARN-3334-demo.patch, YARN-3334-v1.patch, YARN-3334-v2.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3040) [Data Model] Make putEntities operation be aware of the app's context
[ https://issues.apache.org/jira/browse/YARN-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382191#comment-14382191 ] Junping Du commented on YARN-3040: -- OK. I have commit v6 patch to branch YARN-2928. Thanks [~zjshen] for contributing the patch, and review comments from [~sjlee0], [~vinodkv], [~gtCarrera9], [~kasha] and [~Naganarasimha]! [Data Model] Make putEntities operation be aware of the app's context - Key: YARN-3040 URL: https://issues.apache.org/jira/browse/YARN-3040 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Zhijie Shen Attachments: YARN-3040.1.patch, YARN-3040.2.patch, YARN-3040.3.patch, YARN-3040.4.patch, YARN-3040.5.patch, YARN-3040.6.patch Per design in YARN-2928, implement client-side API for handling *flows*. Frameworks should be able to define and pass in all attributes of flows and flow runs to YARN, and they should be passed into ATS writers. YARN tags were discussed as a way to handle this piece of information. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3044) [Event producers] Implement RM writing app lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14382308#comment-14382308 ] Sangjin Lee commented on YARN-3044: --- {quote} Well, it's not a limitation of the RM timeline collector that I am trying to point out, but that the writer interface is like TimelineWriter.write(TimelineEntities). The writer would not be aware whether the client is writing an ApplicationEntity or an AppAttemptEntity. IIUC, it will just try to write the fields of the TimelineEntity to the storage. Maybe if it's just storing the entity as a JSON object directly it might not be an issue, but that will not be the case in an HBase column storage, right? {quote} I see. So your point is whether the storage implementation can recognize different entity types and act accordingly? If so, the answer is yes. The storage implementation can easily introspect the type of the entity and do the right thing based on the type if needed. + [~zjshen] [Event producers] Implement RM writing app lifecycle events to ATS -- Key: YARN-3044 URL: https://issues.apache.org/jira/browse/YARN-3044 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Naganarasimha G R Attachments: YARN-3044.20150325-1.patch Per design in YARN-2928, implement RM writing app lifecycle events to ATS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
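Sangjin's point, that the storage layer can introspect the concrete entity type and branch on it, can be sketched as follows. The class names are stand-ins for the real timeline data model, not the actual YARN-2928 API:

```java
public class EntityDispatchDemo {
    // Hypothetical stand-ins for the timeline data model.
    static class TimelineEntity {
        final String type;
        TimelineEntity(String type) { this.type = type; }
    }
    static class ApplicationEntity extends TimelineEntity {
        ApplicationEntity() { super("YARN_APPLICATION"); }
    }
    static class ContainerEntity extends TimelineEntity {
        ContainerEntity() { super("YARN_CONTAINER"); }
    }

    // A writer receiving the generic supertype can still dispatch on the
    // concrete subclass (or on the declared type string) to pick, say, a
    // different HBase table or column family per entity kind.
    static String route(TimelineEntity e) {
        if (e instanceof ApplicationEntity) return "application-table";
        if (e instanceof ContainerEntity) return "container-table";
        return "generic-table";
    }
}
```

So a `TimelineWriter.write(TimelineEntities)`-shaped interface does not by itself lose type information; the loss would only occur if the writer flattened every entity identically.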
[jira] [Commented] (YARN-3040) [Data Model] Make putEntities operation be aware of the app's context
[ https://issues.apache.org/jira/browse/YARN-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382226#comment-14382226 ] Sangjin Lee commented on YARN-3040: --- Thanks much [~zjshen]! [Data Model] Make putEntities operation be aware of the app's context - Key: YARN-3040 URL: https://issues.apache.org/jira/browse/YARN-3040 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Zhijie Shen Attachments: YARN-3040.1.patch, YARN-3040.2.patch, YARN-3040.3.patch, YARN-3040.4.patch, YARN-3040.5.patch, YARN-3040.6.patch Per design in YARN-2928, implement client-side API for handling *flows*. Frameworks should be able to define and pass in all attributes of flows and flow runs to YARN, and they should be passed into ATS writers. YARN tags were discussed as a way to handle this piece of information. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3334) [Event Producers] NM TimelineClient life cycle handling and container metrics posting to new timeline service.
[ https://issues.apache.org/jira/browse/YARN-3334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-3334: - Description: After YARN-3039, we have service discovery mechanism to pass app-collector service address among collectors, NMs and RM. In this JIRA, we will handle service address setting for TimelineClients in NodeManager, and put container metrics to the backend storage. [Event Producers] NM TimelineClient life cycle handling and container metrics posting to new timeline service. -- Key: YARN-3334 URL: https://issues.apache.org/jira/browse/YARN-3334 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: YARN-2928 Reporter: Junping Du Assignee: Junping Du Attachments: YARN-3334-demo.patch, YARN-3334-v1.patch, YARN-3334-v2.patch After YARN-3039, we have service discovery mechanism to pass app-collector service address among collectors, NMs and RM. In this JIRA, we will handle service address setting for TimelineClients in NodeManager, and put container metrics to the backend storage. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests
[ https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14382333#comment-14382333 ] Jian Fang commented on YARN-796: Coming back to this issue again since I am trying to merge the latest YARN-796 into our Hadoop code base. It seems one thing is missing: how do we specify the labels for application masters? The application master is special: it is the task manager of a specific YARN application. It also has some special requirements for its allocation on a Hadoop cluster running in the cloud. For example, on Amazon EC2, we do not want any application masters to be launched on spot instances if we have both spot and on-demand instances available. YARN-796 should provide a mechanism to achieve this goal. Allow for (admin) labels on nodes and resource-requests --- Key: YARN-796 URL: https://issues.apache.org/jira/browse/YARN-796 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.4.1 Reporter: Arun C Murthy Assignee: Wangda Tan Attachments: LabelBasedScheduling.pdf, Node-labels-Requirements-Design-doc-V1.pdf, Node-labels-Requirements-Design-doc-V2.pdf, Non-exclusive-Node-Partition-Design.pdf, YARN-796-Diagram.pdf, YARN-796.node-label.consolidate.1.patch, YARN-796.node-label.consolidate.10.patch, YARN-796.node-label.consolidate.11.patch, YARN-796.node-label.consolidate.12.patch, YARN-796.node-label.consolidate.13.patch, YARN-796.node-label.consolidate.14.patch, YARN-796.node-label.consolidate.2.patch, YARN-796.node-label.consolidate.3.patch, YARN-796.node-label.consolidate.4.patch, YARN-796.node-label.consolidate.5.patch, YARN-796.node-label.consolidate.6.patch, YARN-796.node-label.consolidate.7.patch, YARN-796.node-label.consolidate.8.patch, YARN-796.node-label.demo.patch.1, YARN-796.patch, YARN-796.patch4 It will be useful for admins to specify labels for nodes. Examples of labels are OS, processor architecture etc. 
We should expose these labels and allow applications to specify labels on resource-requests. Obviously we need to support admin operations on adding/removing node labels. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3334) [Event Producers] NM start to posting some app related metrics in early POC stage of phase 2.
[ https://issues.apache.org/jira/browse/YARN-3334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382171#comment-14382171 ] Junping Du commented on YARN-3334: -- Thanks [~zjshen] for the review and comments! In v2, I incorporated all of your comments above except one: replacing TimelineEntity with ContainerEntity. I agree that the latter sounds better. However, the test cannot pass locally if we replace {code} TimelineEntity entity = new TimelineEntity(); entity.setId(containerId.toString()); entity.setType(TimelineEntityType.YARN_CONTAINER.toString()); {code} with: {code} ContainerEntity entity = new ContainerEntity(); entity.setId(containerId.toString()); {code} Do we expect some extra info to be necessary to set on ContainerEntity? If not, I suspect some bug (NPE, etc.) could be hidden in putEntity for ContainerEntity. If so, can we fix it separately? I added a TODO here though. [Event Producers] NM start to posting some app related metrics in early POC stage of phase 2. - Key: YARN-3334 URL: https://issues.apache.org/jira/browse/YARN-3334 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: YARN-2928 Reporter: Junping Du Assignee: Junping Du Attachments: YARN-3334-demo.patch, YARN-3334-v1.patch, YARN-3334-v2.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
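The substitution discussed in the comment above rests on the idea that a typed subclass pins the entity type in its constructor, so only the id still needs to be set. A minimal, self-contained sketch of that pattern (simplified classes, not Hadoop's actual TimelineEntity/ContainerEntity):

```java
// Sketch of the pattern under discussion: a generic entity with a settable
// type vs. a subclass that fixes the type up front. Class and type names are
// simplified stand-ins for the real Hadoop timeline classes.
public class EntitySketch {
    static class TimelineEntity {
        private String id;
        private String type;
        public void setId(String id) { this.id = id; }
        public void setType(String type) { this.type = type; }
        public String getId() { return id; }
        public String getType() { return type; }
    }

    // Mirrors ContainerEntity: the type is set in the constructor, so
    // forgetting setType can no longer leave the type null.
    static class ContainerEntity extends TimelineEntity {
        ContainerEntity() { setType("YARN_CONTAINER"); }
    }

    public static void main(String[] args) {
        TimelineEntity generic = new TimelineEntity();
        generic.setId("container_1_0001_01_000001");
        generic.setType("YARN_CONTAINER");

        ContainerEntity specific = new ContainerEntity();
        specific.setId("container_1_0001_01_000001");

        // From a writer's point of view both carry the same id and type; if
        // the subclass form fails in putEntity, the writer likely depends on
        // something else, as the comment suspects.
        System.out.println(generic.getType().equals(specific.getType())); // true
    }
}
```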
[jira] [Commented] (YARN-3040) [Data Model] Make putEntities operation be aware of the app's context
[ https://issues.apache.org/jira/browse/YARN-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382152#comment-14382152 ] Zhijie Shen commented on YARN-3040: --- Sure, let's return null for now. [Data Model] Make putEntities operation be aware of the app's context - Key: YARN-3040 URL: https://issues.apache.org/jira/browse/YARN-3040 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Zhijie Shen Attachments: YARN-3040.1.patch, YARN-3040.2.patch, YARN-3040.3.patch, YARN-3040.4.patch, YARN-3040.5.patch, YARN-3040.6.patch Per design in YARN-2928, implement client-side API for handling *flows*. Frameworks should be able to define and pass in all attributes of flows and flow runs to YARN, and they should be passed into ATS writers. YARN tags were discussed as a way to handle this piece of information. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3044) [Event producers] Implement RM writing app lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382343#comment-14382343 ] Naganarasimha G R commented on YARN-3044: - [~sjlee0], bq. I see. So your point is whether the storage implementation can recognize different entity types and act accordingly? If so, the answer is yes. The storage implementation can easily introspect the type of the entity and do the right thing based on the type if needed. Well, if introspection means checking TimelineEntity.getType and then casting to the specific TimelineEntity subclass, it can break if the client/AM tries to post a plain TimelineEntity with its type set to TimelineEntityType.YARN_APPLICATION or another system entity type. And other approaches, like checking with {{instanceof}} or the like, sound inappropriate. [Event producers] Implement RM writing app lifecycle events to ATS -- Key: YARN-3044 URL: https://issues.apache.org/jira/browse/YARN-3044 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Naganarasimha G R Attachments: YARN-3044.20150325-1.patch Per design in YARN-2928, implement RM writing app lifecycle events to ATS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
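The hazard raised in the comment above can be shown in a self-contained sketch (simplified names, not the actual Hadoop classes): the type string on a plain entity can claim to be a system type, so a storage implementation that trusts getType() alone and then casts would fail, while an instanceof check distinguishes the cases:

```java
// Demonstrates why getType() + cast is unsafe when a client can post a plain
// entity carrying a system type string. Classes are illustrative stand-ins.
public class IntrospectionHazard {
    static class TimelineEntity {
        private String type;
        void setType(String t) { type = t; }
        String getType() { return type; }
    }

    static class ApplicationEntity extends TimelineEntity {
        ApplicationEntity() { setType("YARN_APPLICATION"); }
    }

    // A cast guarded by the actual runtime class, not the type string.
    static boolean safeToCastToApplication(TimelineEntity e) {
        return e instanceof ApplicationEntity;
    }

    public static void main(String[] args) {
        TimelineEntity spoofed = new TimelineEntity();
        spoofed.setType("YARN_APPLICATION"); // looks like a system entity
        System.out.println(safeToCastToApplication(spoofed));               // false
        System.out.println(safeToCastToApplication(new ApplicationEntity())); // true
    }
}
```

This is exactly the gap YARN-3401 is about: as long as the base class is instantiable with an arbitrary type, the type string and the runtime class can disagree.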
[jira] [Commented] (YARN-3047) [Data Serving] Set up ATS reader with basic request serving structure and lifecycle
[ https://issues.apache.org/jira/browse/YARN-3047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382352#comment-14382352 ] Varun Saxena commented on YARN-3047: [~zjshen], the patch YARN-3047.04.patch applies for me using {{patch -p0}}. I had updated to the latest code as well. May I know where it is failing for you? [Data Serving] Set up ATS reader with basic request serving structure and lifecycle --- Key: YARN-3047 URL: https://issues.apache.org/jira/browse/YARN-3047 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Varun Saxena Attachments: Timeline_Reader(draft).pdf, YARN-3047.001.patch, YARN-3047.003.patch, YARN-3047.02.patch, YARN-3047.04.patch Per design in YARN-2928, set up the ATS reader as a service and implement its basic structure. It includes lifecycle management, request serving, and so on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)
[ https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382299#comment-14382299 ] Jian Fang commented on YARN-2495: - In a cloud environment such as Amazon EMR, a Hadoop cluster is launched as a service by a single command line. There is no admin at all and everything is automated. The labels are basically of two types. One is static: for example, the nature of an EC2 instance such as spot or on-demand. The other is dynamic: for example, the cluster controller process can mark an instance as a candidate for termination during a graceful shrink so that the resource manager will not assign new tasks to it. Most likely, the labels specified from each NM are static and are provided by a cluster controller process that writes them into yarn-site.xml based on the EC2 metadata available on each EC2 instance. As a result, you should at least define a static label provider (plus a dynamic label provider? not sure) so that these labels are only sent to the resource manager at NM registration time. There is no point in adding the static labels to each heartbeat. I think the ideas of central and distributed label configuration are not ideal for a cloud environment. Usually we have a mix of static labels from each node and dynamic labels that are specified against the resource manager directly. Static and dynamic label concepts are more appropriate, at least for Amazon EMR.
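The "static label provider" idea described above could be sketched as follows: derive the node's label once from instance metadata and report it only at NM registration. This is a hypothetical illustration, not Hadoop code; the metadata value would come from the EC2 instance metadata service in practice, and the label names are made up:

```java
import java.util.Collections;
import java.util.Set;

// Hypothetical static label provider: maps instance metadata (spot vs.
// on-demand) to a node label, computed once at NM registration time rather
// than on every heartbeat.
public class StaticLabelProvider {
    public static Set<String> labelsFor(String instanceLifecycle) {
        if ("spot".equalsIgnoreCase(instanceLifecycle)) {
            return Collections.singleton("SPOT");
        }
        return Collections.singleton("ON_DEMAND");
    }

    public static void main(String[] args) {
        // In a real provider, the argument would be fetched from EC2 metadata.
        System.out.println(labelsFor("spot"));       // [SPOT]
        System.out.println(labelsFor("on-demand"));  // [ON_DEMAND]
    }
}
```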
Allow admin specify labels from each NM (Distributed configuration) --- Key: YARN-2495 URL: https://issues.apache.org/jira/browse/YARN-2495 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Naganarasimha G R Attachments: YARN-2495.20141023-1.patch, YARN-2495.20141024-1.patch, YARN-2495.20141030-1.patch, YARN-2495.20141031-1.patch, YARN-2495.20141119-1.patch, YARN-2495.20141126-1.patch, YARN-2495.20141204-1.patch, YARN-2495.20141208-1.patch, YARN-2495.20150305-1.patch, YARN-2495.20150309-1.patch, YARN-2495.20150318-1.patch, YARN-2495.20150320-1.patch, YARN-2495.20150321-1.patch, YARN-2495.20150324-1.patch, YARN-2495_20141022.1.patch Target of this JIRA is to allow admin specify labels in each NM, this covers - User can set labels in each NM (by setting yarn-site.xml (YARN-2923) or using script suggested by [~aw] (YARN-2729) ) - NM will send labels to RM via ResourceTracker API - RM will set labels in NodeLabelManager when NM register/update labels -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3388) userlimit isn't playing well with DRF calculator
[ https://issues.apache.org/jira/browse/YARN-3388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nathan Roberts updated YARN-3388: - Attachment: YARN-3388-v0.patch Initial patch for comments on approach. Seems to work well in basic testing on 2.6. I don't know how this interacts with label support + userlimit which I think is still lacking in some cases anyway. Hoping [~leftnoteasy] and others can comment. userlimit isn't playing well with DRF calculator Key: YARN-3388 URL: https://issues.apache.org/jira/browse/YARN-3388 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 2.6.0 Reporter: Nathan Roberts Assignee: Nathan Roberts Attachments: YARN-3388-v0.patch When there are multiple active users in a queue, it should be possible for those users to make use of capacity up-to max_capacity (or close). The resources should be fairly distributed among the active users in the queue. This works pretty well when there is a single resource being scheduled. However, when there are multiple resources the situation gets more complex and the current algorithm tends to get stuck at Capacity. Example illustrated in subsequent comment. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
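For background on the interaction described above, a minimal sketch of how a Dominant Resource Fairness calculator derives a user's dominant share (illustrative arithmetic only, not the CapacityScheduler's DominantResourceCalculator): the dominant share is the largest per-resource utilization fraction, and the user-limit computation compares these shares across active users.

```java
// Minimal DRF dominant-share computation: for each resource type, take the
// user's usage as a fraction of the cluster total, and keep the maximum.
public class DominantShare {
    public static double dominantShare(double[] used, double[] clusterTotal) {
        double max = 0.0;
        for (int i = 0; i < used.length; i++) {
            max = Math.max(max, used[i] / clusterTotal[i]);
        }
        return max;
    }

    public static void main(String[] args) {
        // A user holding 8 GB of 100 GB memory and 4 of 50 vcores has
        // per-resource shares 0.08 and 0.08, so the dominant share is 0.08.
        System.out.println(dominantShare(new double[]{8, 4},
                                         new double[]{100, 50}));
        // A memory-heavy user: 30 GB of 100 GB but only 2 of 50 vcores is
        // dominated by memory (0.30 vs 0.04).
        System.out.println(dominantShare(new double[]{30, 2},
                                         new double[]{100, 50}));
    }
}
```

When users have different dominant resources, comparing only one resource (or only capacity) can leave the queue unable to grow toward max_capacity, which is the symptom this issue reports.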
[jira] [Commented] (YARN-3401) [Data Model] users should not be able to create a generic TimelineEntity and associate arbitrary type
[ https://issues.apache.org/jira/browse/YARN-3401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382575#comment-14382575 ] Sangjin Lee commented on YARN-3401: --- Thanks for reminding me of that discussion. Yes, we definitely discussed that, and we said that only YARN daemons are allowed to post system entities. If any non-YARN daemons (e.g. AMs, clients, tasks, etc.) try to post YARN system entities they would be rejected. That said, they can still refer to a YARN system entity. For example, if you're an MR AM then you might refer to the container id to post metrics for the container in which your tasks are running. So we need to be precise exactly what is disallowed. bq. if so if we add a check @ Timelineclient will it impact NM from posting container metrics entities ? NM is a YARN daemon, so it should be able to post container metrics and entities with no issues. [Data Model] users should not be able to create a generic TimelineEntity and associate arbitrary type - Key: YARN-3401 URL: https://issues.apache.org/jira/browse/YARN-3401 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Naganarasimha G R IIUC it is possible for users to create a generic TimelineEntity and set an arbitrary entity type. For example, for a YARN app, the right entity API is ApplicationEntity. However, today nothing stops users from instantiating a base TimelineEntity class and set the application type on it. This presents a problem in handling these YARN system entities in the storage layer for example. We need to ensure that the API allows only the right type of the class to be created for a given entity type. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
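The policy sketched in the comment above (only YARN daemons may post system entities, while anyone may refer to them) could look like the following guard. Names and the reserved-type list are illustrative, not the actual TimelineClient code:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical guard for posting timeline entities: a non-daemon caller is
// rejected only when the entity's type collides with a reserved YARN system
// type. Referring to a system entity's id (e.g. a container id) is unaffected,
// since that never goes through this check.
public class SystemTypeGuard {
    static final Set<String> RESERVED = new HashSet<>(Arrays.asList(
        "YARN_CLUSTER", "YARN_APPLICATION",
        "YARN_APPLICATION_ATTEMPT", "YARN_CONTAINER")); // illustrative subset

    public static boolean shouldReject(String entityType, boolean isYarnDaemon) {
        return RESERVED.contains(entityType) && !isYarnDaemon;
    }

    public static void main(String[] args) {
        System.out.println(shouldReject("YARN_CONTAINER", false)); // true: AM/client blocked
        System.out.println(shouldReject("YARN_CONTAINER", true));  // false: NM is a daemon
        System.out.println(shouldReject("MAPREDUCE_TASK", false)); // false: framework type ok
    }
}
```

Under such a check, the NM posting container metrics is unaffected, which answers the question quoted in the comment.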
[jira] [Commented] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)
[ https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382506#comment-14382506 ] Allen Wittenauer commented on YARN-2495: That's effectively what the executable interface is for Allow admin specify labels from each NM (Distributed configuration) --- Key: YARN-2495 URL: https://issues.apache.org/jira/browse/YARN-2495 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Naganarasimha G R Attachments: YARN-2495.20141023-1.patch, YARN-2495.20141024-1.patch, YARN-2495.20141030-1.patch, YARN-2495.20141031-1.patch, YARN-2495.20141119-1.patch, YARN-2495.20141126-1.patch, YARN-2495.20141204-1.patch, YARN-2495.20141208-1.patch, YARN-2495.20150305-1.patch, YARN-2495.20150309-1.patch, YARN-2495.20150318-1.patch, YARN-2495.20150320-1.patch, YARN-2495.20150321-1.patch, YARN-2495.20150324-1.patch, YARN-2495_20141022.1.patch Target of this JIRA is to allow admin specify labels in each NM, this covers - User can set labels in each NM (by setting yarn-site.xml (YARN-2923) or using script suggested by [~aw] (YARN-2729) ) - NM will send labels to RM via ResourceTracker API - RM will set labels in NodeLabelManager when NM register/update labels -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)
[ https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382555#comment-14382555 ] Jian Fang commented on YARN-2495: - Great, thanks. Will try them. Allow admin specify labels from each NM (Distributed configuration) --- Key: YARN-2495 URL: https://issues.apache.org/jira/browse/YARN-2495 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Naganarasimha G R Attachments: YARN-2495.20141023-1.patch, YARN-2495.20141024-1.patch, YARN-2495.20141030-1.patch, YARN-2495.20141031-1.patch, YARN-2495.20141119-1.patch, YARN-2495.20141126-1.patch, YARN-2495.20141204-1.patch, YARN-2495.20141208-1.patch, YARN-2495.20150305-1.patch, YARN-2495.20150309-1.patch, YARN-2495.20150318-1.patch, YARN-2495.20150320-1.patch, YARN-2495.20150321-1.patch, YARN-2495.20150324-1.patch, YARN-2495_20141022.1.patch Target of this JIRA is to allow admin specify labels in each NM, this covers - User can set labels in each NM (by setting yarn-site.xml (YARN-2923) or using script suggested by [~aw] (YARN-2729) ) - NM will send labels to RM via ResourceTracker API - RM will set labels in NodeLabelManager when NM register/update labels -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)
[ https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382389#comment-14382389 ] Wangda Tan commented on YARN-2495: -- Hi [~john.jian.fang], Thanks for your comments. I'm not sure I completely understood what you said. Did you mean there are two different types of labels on NMs: some labels that are not changed in the NM's lifetime, and some labels that could be modified while the NM is running? (I think the decommission case you provided is better resolved by graceful NM decommission instead of node labels.) Having a centralized node label list is mostly for resource planning; you can take a look at the conversations on YARN-3214 for more details about resource planning. Regardless of the centralized node label list on the RM side, I think the current implementation in the attached patch should work for you. Even though labels could be modified via heartbeat, you can simply not change them in your own script; if there are no changes to the NM's labels, no duplicated data will be sent to the RM side.
Wangda Allow admin specify labels from each NM (Distributed configuration) --- Key: YARN-2495 URL: https://issues.apache.org/jira/browse/YARN-2495 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Naganarasimha G R Attachments: YARN-2495.20141023-1.patch, YARN-2495.20141024-1.patch, YARN-2495.20141030-1.patch, YARN-2495.20141031-1.patch, YARN-2495.20141119-1.patch, YARN-2495.20141126-1.patch, YARN-2495.20141204-1.patch, YARN-2495.20141208-1.patch, YARN-2495.20150305-1.patch, YARN-2495.20150309-1.patch, YARN-2495.20150318-1.patch, YARN-2495.20150320-1.patch, YARN-2495.20150321-1.patch, YARN-2495.20150324-1.patch, YARN-2495_20141022.1.patch Target of this JIRA is to allow admin specify labels in each NM, this covers - User can set labels in each NM (by setting yarn-site.xml (YARN-2923) or using script suggested by [~aw] (YARN-2729) ) - NM will send labels to RM via ResourceTracker API - RM will set labels in NodeLabelManager when NM register/update labels -- This message was sent by Atlassian JIRA (v6.3.4#6332)
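The "no duplicated data" behavior described in the comment above amounts to the NM remembering the last label set it reported and sending labels only when they differ. A self-contained sketch of that logic (illustrative only, not the actual NodeStatusUpdater code):

```java
import java.util.HashSet;
import java.util.Set;

// Tracks the last label set reported to the RM; a heartbeat carries labels
// only when they changed since the previous report.
public class LabelChangeTracker {
    private Set<String> lastReported = null;

    // Returns the labels to include in this heartbeat, or null if unchanged.
    public Set<String> labelsToSend(Set<String> current) {
        if (current.equals(lastReported)) {
            return null; // unchanged: omit labels from this heartbeat
        }
        lastReported = new HashSet<>(current); // defensive copy
        return lastReported;
    }
}
```

So a script that returns the same static labels every time incurs no extra heartbeat payload after the first report, which is the point Wangda makes above.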
[jira] [Commented] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)
[ https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382388#comment-14382388 ] Naganarasimha G R commented on YARN-2495: - Hi [~john.jian.fang], Well, this JIRA is followed by YARN-2729, wherein labels obtained from the script are passed as part of the heartbeat, which makes the distributed label configuration dynamic. Also, as part of this JIRA we have tried to ensure that labels are sent only when there is a change; static labels are not re-sent on each heartbeat. And for your case, if the cluster controller process wants to label a node for a graceful shrink, it can be done in 2 ways: * Use the REST API and change the label of the node to some unique label which is not visible to other users * After YARN-2729, you could have a script with appropriate logic to update the RM with some unique label when the node wants to shrink itself gracefully. Hope I have addressed your scenario Allow admin specify labels from each NM (Distributed configuration) --- Key: YARN-2495 URL: https://issues.apache.org/jira/browse/YARN-2495 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Naganarasimha G R Attachments: YARN-2495.20141023-1.patch, YARN-2495.20141024-1.patch, YARN-2495.20141030-1.patch, YARN-2495.20141031-1.patch, YARN-2495.20141119-1.patch, YARN-2495.20141126-1.patch, YARN-2495.20141204-1.patch, YARN-2495.20141208-1.patch, YARN-2495.20150305-1.patch, YARN-2495.20150309-1.patch, YARN-2495.20150318-1.patch, YARN-2495.20150320-1.patch, YARN-2495.20150321-1.patch, YARN-2495.20150324-1.patch, YARN-2495_20141022.1.patch Target of this JIRA is to allow admin specify labels in each NM, this covers - User can set labels in each NM (by setting yarn-site.xml (YARN-2923) or using script suggested by [~aw] (YARN-2729) ) - NM will send labels to RM via ResourceTracker API - RM will set labels in NodeLabelManager when NM register/update labels --
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2740) ResourceManager side should properly handle node label modifications when distributed node label configuration enabled
[ https://issues.apache.org/jira/browse/YARN-2740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-2740: Description: According to YARN-2495, when distributed node label configuration is enabled: - RMAdmin / REST API should reject change labels on node operations. - CommonNodeLabelsManager shouldn't persist labels on nodes when NM do heartbeat. was: According to YARN-2495, when distributed node label configuration is enabled: - RMAdmin / REST API should reject change labels on node operations. - RMNodeLabelsManager shouldn't persistent labels on nodes when NM do heartbeat. ResourceManager side should properly handle node label modifications when distributed node label configuration enabled -- Key: YARN-2740 URL: https://issues.apache.org/jira/browse/YARN-2740 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Naganarasimha G R Attachments: YARN-2740-20141024-1.patch, YARN-2740.20150320-1.patch According to YARN-2495, when distributed node label configuration is enabled: - RMAdmin / REST API should reject change labels on node operations. - CommonNodeLabelsManager shouldn't persist labels on nodes when NM do heartbeat. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
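The first bullet of this issue's description (RMAdmin / REST API should reject label changes when distributed configuration is enabled) reduces to a guard of roughly the following shape. This is a sketch under that assumption, not the actual RM admin-service code:

```java
import java.io.IOException;

// Hypothetical RM-side guard: when node labels come from the NMs themselves
// (distributed configuration), centrally-initiated replace-labels-on-node
// operations are refused so the two sources cannot conflict.
public class DistributedConfigGuard {
    private final boolean distributedNodeLabelConfig;

    public DistributedConfigGuard(boolean distributed) {
        this.distributedNodeLabelConfig = distributed;
    }

    public void checkReplaceLabelsAllowed() throws IOException {
        if (distributedNodeLabelConfig) {
            throw new IOException(
                "Replacing labels on nodes via RMAdmin/REST is not allowed "
                + "when distributed node label configuration is enabled");
        }
    }
}
```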
[jira] [Updated] (YARN-2740) ResourceManager side should properly handle node label modifications when distributed node label configuration enabled
[ https://issues.apache.org/jira/browse/YARN-2740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-2740: Attachment: YARN-2740.20150327-1.patch Hi [~wangda], Have rebased the patch and updated the patch to handle the second scenario {{CommonNodeLabelsManager shouldn't persist labels on nodes when NM do heartbeat.}} ResourceManager side should properly handle node label modifications when distributed node label configuration enabled -- Key: YARN-2740 URL: https://issues.apache.org/jira/browse/YARN-2740 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Naganarasimha G R Attachments: YARN-2740-20141024-1.patch, YARN-2740.20150320-1.patch, YARN-2740.20150327-1.patch According to YARN-2495, when distributed node label configuration is enabled: - RMAdmin / REST API should reject change labels on node operations. - CommonNodeLabelsManager shouldn't persist labels on nodes when NM do heartbeat. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2618) Avoid over-allocation of disk resources
[ https://issues.apache.org/jira/browse/YARN-2618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382181#comment-14382181 ] Hadoop QA commented on YARN-2618: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12707506/YARN-2618-6.patch against trunk revision 2228456. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 20 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. 
The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.client.TestResourceTrackerOnHA org.apache.hadoop.yarn.client.api.impl.TestYarnClient org.apache.hadoop.yarn.client.TestResourceManagerAdministrationProtocolPBClientImpl org.apache.hadoop.yarn.client.api.impl.TestAMRMClient org.apache.hadoop.yarn.client.TestGetGroups org.apache.hadoop.yarn.client.api.impl.TestNMClient org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA org.apache.hadoop.yarn.client.TestApplicationMasterServiceProtocolOnHA org.apache.hadoop.yarn.client.TestRMFailover org.apache.hadoop.yarn.server.resourcemanager.TestRMHA org.apache.hadoop.yarn.server.resourcemanager.TestRMAdminService org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestApplicationLimits org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestLeafQueue org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebappAuthentication org.apache.hadoop.yarn.server.resourcemanager.TestMoveApplication org.apache.hadoop.yarn.server.resourcemanager.TestRM org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler org.apache.hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStore Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7116//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7116//console This message is automatically generated. 
Avoid over-allocation of disk resources --- Key: YARN-2618 URL: https://issues.apache.org/jira/browse/YARN-2618 Project: Hadoop YARN Issue Type: Sub-task Reporter: Wei Yan Assignee: Wei Yan Attachments: YARN-2618-1.patch, YARN-2618-2.patch, YARN-2618-3.patch, YARN-2618-4.patch, YARN-2618-5.patch, YARN-2618-6.patch Subtask of YARN-2139. This should include - Add API support for introducing disk I/O as the 3rd type resource. - NM should report this information to the RM - RM should consider this to avoid over-allocation -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3378) a load test client that can replay a volume of history files
[ https://issues.apache.org/jira/browse/YARN-3378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382435#comment-14382435 ] Sangjin Lee commented on YARN-3378: --- cc [~jeagles], [~lichangleo] I'm working on this based on what you have on YARN-2556, with the major differences being - write it against the v.2 API (obviously) - add an ability to replay things like a bunch of history files to generate more realistic and non-trivial entities and data We'll also look into benchmarks more appropriate for the v.2 work, as Li mentioned. We need a little bit of discussion on how this will proceed in parallel with YARN-2556. I'm taking the latest patch on YARN-2556 as the basis. Should we go ahead and commit the work done in YARN-2556 first? Thoughts? a load test client that can replay a volume of history files Key: YARN-3378 URL: https://issues.apache.org/jira/browse/YARN-3378 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Sangjin Lee It might be good to create a load test client that can replay a large volume of history files into the timeline service. One can envision running such a load test client as a mapreduce job to generate a fair amount of load. It would be useful to spot check correctness and, more importantly, observe performance characteristics. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
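The replay idea above boils down to iterating over a directory of history files, pushing each through a writer, and timing the run for throughput. A minimal sketch with a stand-in writer interface (the real client would parse each file and post entities via the v.2 API):

```java
import java.io.File;

// Skeleton of a history-file replay loop for load testing. EntityWriter is a
// hypothetical stand-in for the component that parses one history file and
// posts its entities to the timeline service.
public class HistoryReplaySketch {
    interface EntityWriter { void write(File historyFile); }

    // Replays every file in dir and returns the elapsed nanoseconds, from
    // which a files-per-second throughput figure can be derived.
    public static long replay(File dir, EntityWriter writer) {
        long start = System.nanoTime();
        File[] files = dir.listFiles();
        if (files != null) {
            for (File f : files) {
                writer.write(f); // parse + post entities for one history file
            }
        }
        return System.nanoTime() - start;
    }
}
```

Run as a MapReduce job, each mapper would execute this loop over its own shard of history files to generate load in parallel.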
[jira] [Created] (YARN-3401) [Data Model] users should not be able to create a generic TimelineEntity and associate arbitrary type
Sangjin Lee created YARN-3401: - Summary: [Data Model] users should not be able to create a generic TimelineEntity and associate arbitrary type Key: YARN-3401 URL: https://issues.apache.org/jira/browse/YARN-3401 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee IIUC it is possible for users to create a generic TimelineEntity and set an arbitrary entity type. For example, for a YARN app, the right entity API is ApplicationEntity. However, today nothing stops users from instantiating a base TimelineEntity class and set the application type on it. This presents a problem in handling these YARN system entities in the storage layer for example. We need to ensure that the API allows only the right type of the class to be created for a given entity type. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3400) [JDK 8] Build Failure due to unreported exceptions in RPCUtil
[ https://issues.apache.org/jira/browse/YARN-3400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382371#comment-14382371 ] Hudson commented on YARN-3400: -- FAILURE: Integrated in Hadoop-trunk-Commit #7441 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7441/]) YARN-3400. [JDK 8] Build Failure due to unreported exceptions in RPCUtil (rkanter) (rkanter: rev 87130bf6b22f538c5c26ad5cef984558a8117798) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ipc/RPCUtil.java [JDK 8] Build Failure due to unreported exceptions in RPCUtil -- Key: YARN-3400 URL: https://issues.apache.org/jira/browse/YARN-3400 Project: Hadoop YARN Issue Type: Bug Reporter: Robert Kanter Assignee: Robert Kanter Fix For: 2.8.0 Attachments: YARN-3400.patch When I try compiling Hadoop with JDK 8 like this {noformat} mvn clean package -Pdist -Dtar -DskipTests -Djavac.version=1.8 {noformat} I get this error: {noformat} [ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) on project hadoop-yarn-common: Compilation failure: Compilation failure: [ERROR] /Users/rkanter/dev/hadoop-common2/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ipc/RPCUtil.java:[101,11] unreported exception java.lang.Throwable; must be caught or declared to be thrown [ERROR] /Users/rkanter/dev/hadoop-common2/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ipc/RPCUtil.java:[104,11] unreported exception java.lang.Throwable; must be caught or declared to be thrown [ERROR] /Users/rkanter/dev/hadoop-common2/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ipc/RPCUtil.java:[107,11] unreported exception java.lang.Throwable; must be caught or declared to be thrown {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3044) [Event producers] Implement RM writing app lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382372#comment-14382372 ] Sangjin Lee commented on YARN-3044: --- YARN-3401 [Event producers] Implement RM writing app lifecycle events to ATS -- Key: YARN-3044 URL: https://issues.apache.org/jira/browse/YARN-3044 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Naganarasimha G R Attachments: YARN-3044.20150325-1.patch Per design in YARN-2928, implement RM writing app lifecycle events to ATS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests
[ https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382404#comment-14382404 ] Wangda Tan commented on YARN-796: - [~john.jian.fang], The patch attached to this JIRA is stale; instead you should merge the patches under YARN-2492. For more usage info, you can take a look at http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.2.0/YARN_RM_v22/node_labels/index.html#Item1.1. Specifically to your question, we now support 4 ways to specify labels for applications (CapacityScheduler only for now): 1) Specify default-node-label-expression on each queue; all containers under the queue will be assigned to the specified label 2) Specify ApplicationSubmissionContext.appLabelExpression; all containers under the app will be assigned to the specified label 3) Specify ApplicationSubmissionContext.amContainerLabelExpression; the AM container will be assigned to the specified label 4) Specify ResourceRequest.nodeLabelExpression; individual containers will be assigned to the specified label. Let me know if you have more questions.
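As a concrete illustration of option 1 above, the queue-level default follows the yarn.scheduler.capacity.&lt;queue-path&gt; property convention in capacity-scheduler.xml. The queue name "batch" and label "ON_DEMAND" here are made-up examples:

```xml
<property>
  <name>yarn.scheduler.capacity.root.batch.default-node-label-expression</name>
  <value>ON_DEMAND</value>
  <description>
    Containers for applications submitted to root.batch are placed on nodes
    carrying the ON_DEMAND label unless the request specifies otherwise.
  </description>
</property>
```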
Allow for (admin) labels on nodes and resource-requests --- Key: YARN-796 URL: https://issues.apache.org/jira/browse/YARN-796 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.4.1 Reporter: Arun C Murthy Assignee: Wangda Tan Attachments: LabelBasedScheduling.pdf, Node-labels-Requirements-Design-doc-V1.pdf, Node-labels-Requirements-Design-doc-V2.pdf, Non-exclusive-Node-Partition-Design.pdf, YARN-796-Diagram.pdf, YARN-796.node-label.consolidate.1.patch, YARN-796.node-label.consolidate.10.patch, YARN-796.node-label.consolidate.11.patch, YARN-796.node-label.consolidate.12.patch, YARN-796.node-label.consolidate.13.patch, YARN-796.node-label.consolidate.14.patch, YARN-796.node-label.consolidate.2.patch, YARN-796.node-label.consolidate.3.patch, YARN-796.node-label.consolidate.4.patch, YARN-796.node-label.consolidate.5.patch, YARN-796.node-label.consolidate.6.patch, YARN-796.node-label.consolidate.7.patch, YARN-796.node-label.consolidate.8.patch, YARN-796.node-label.demo.patch.1, YARN-796.patch, YARN-796.patch4 It will be useful for admins to specify labels for nodes. Examples of labels are OS, processor architecture etc. We should expose these labels and allow applications to specify labels on resource-requests. Obviously we need to support admin operations on adding/removing node labels. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3401) [Data Model] users should not be able to create a generic TimelineEntity and associate arbitrary type
[ https://issues.apache.org/jira/browse/YARN-3401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382412#comment-14382412 ] Naganarasimha G R commented on YARN-3401: - Hi [~sjlee0], IIRC, as part of the doc or some JIRA discussion, we agreed that only the RM/NM should be able to send the YARN system entities and that other clients should not, right? Do we need to completely block it? If so, and we add a check in TimelineClient, will it prevent the NM from posting container metrics entities? [Data Model] users should not be able to create a generic TimelineEntity and associate arbitrary type - Key: YARN-3401 URL: https://issues.apache.org/jira/browse/YARN-3401 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Naganarasimha G R IIUC it is possible for users to create a generic TimelineEntity and set an arbitrary entity type. For example, for a YARN app, the right entity API is ApplicationEntity. However, today nothing stops users from instantiating a base TimelineEntity class and setting the application type on it. This presents a problem in handling these YARN system entities in the storage layer, for example. We need to ensure that the API allows only the right type of class to be created for a given entity type. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests
[ https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382486#comment-14382486 ] Jian Fang commented on YARN-796: Thanks. It seems ApplicationSubmissionContext.amContainerLabelExpression is the one I am looking for; I will try it and see if it works. Any plans for the fair scheduler? We need that as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3044) [Event producers] Implement RM writing app lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382362#comment-14382362 ] Sangjin Lee commented on YARN-3044: --- I'll file a separate JIRA for this. [Event producers] Implement RM writing app lifecycle events to ATS -- Key: YARN-3044 URL: https://issues.apache.org/jira/browse/YARN-3044 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Naganarasimha G R Attachments: YARN-3044.20150325-1.patch Per design in YARN-2928, implement RM writing app lifecycle events to ATS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3044) [Event producers] Implement RM writing app lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382361#comment-14382361 ] Sangjin Lee commented on YARN-3044: --- That's a fair point. As a rule, we need to prevent users of the TimelineEntity API from setting arbitrary types. The only way of creating a YARN app timeline entity, for example, should be through instantiating ApplicationEntity. We may need to make some of the methods that make this possible non-public, etc., although it remains to be seen how much of that is doable, given that JSON serialization needs to be able to handle them. If we have that, IMO the type-based casting should be acceptable (it should reject the entity if the type says one thing and the class is not the right one). Thoughts? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
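A rough sketch of the type/class consistency check discussed above (class and type names are illustrative, not the actual YARN API): a generic entity claiming a YARN system type is rejected unless it is an instance of the dedicated subclass.

```python
# Illustrative sketch only; modeled on the discussion, not the real YARN classes.
class TimelineEntity:
    def __init__(self, entity_type):
        self.type = entity_type

class ApplicationEntity(TimelineEntity):
    TYPE = "YARN_APPLICATION"
    def __init__(self):
        super().__init__(ApplicationEntity.TYPE)

def validate(entity):
    # type-based check: the application type is only valid on ApplicationEntity
    if entity.type == ApplicationEntity.TYPE and not isinstance(entity, ApplicationEntity):
        raise TypeError("type %s requires ApplicationEntity" % entity.type)
    return entity

validate(ApplicationEntity())  # accepted
try:
    validate(TimelineEntity("YARN_APPLICATION"))  # generic entity is rejected
    rejected = False
except TypeError:
    rejected = True
```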
[jira] [Commented] (YARN-3401) [Data Model] users should not be able to create a generic TimelineEntity and associate arbitrary type
[ https://issues.apache.org/jira/browse/YARN-3401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382406#comment-14382406 ] Junping Du commented on YARN-3401: -- We also need to ensure compatibility between old-version applications and the new-version timeline service. Typically this won't be an issue, but I am putting it here as a reminder. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)
[ https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382496#comment-14382496 ] Jian Fang commented on YARN-2495: - On each EC2 instance, metadata about that instance, such as its market type (i.e., spot or on-demand), CPUs, memory, etc., is available when the instance starts up. All of this information is injected into yarn-site.xml by our instance controller and will not change afterwards. Different instances in an EMR cluster could have different static labels, since one EMR Hadoop cluster consists of multiple instance groups, i.e., different types of instances. I think it is OK that no duplicate data is sent to the RM if the NM labels do not change. Thanks. Allow admin specify labels from each NM (Distributed configuration) --- Key: YARN-2495 URL: https://issues.apache.org/jira/browse/YARN-2495 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Naganarasimha G R Attachments: YARN-2495.20141023-1.patch, YARN-2495.20141024-1.patch, YARN-2495.20141030-1.patch, YARN-2495.20141031-1.patch, YARN-2495.20141119-1.patch, YARN-2495.20141126-1.patch, YARN-2495.20141204-1.patch, YARN-2495.20141208-1.patch, YARN-2495.20150305-1.patch, YARN-2495.20150309-1.patch, YARN-2495.20150318-1.patch, YARN-2495.20150320-1.patch, YARN-2495.20150321-1.patch, YARN-2495.20150324-1.patch, YARN-2495_20141022.1.patch Target of this JIRA is to allow admins to specify labels on each NM; this covers: - Users can set labels on each NM (by setting yarn-site.xml (YARN-2923) or using the script suggested by [~aw] (YARN-2729)) - The NM will send labels to the RM via the ResourceTracker API - The RM will set labels in the NodeLabelManager when the NM registers/updates labels -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)
[ https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382501#comment-14382501 ] Jian Fang commented on YARN-2495: - BTW, I haven't gone through all the details of YARN-2492 yet. Is it possible to provide a configuration to hook in different label providers on the NM, for example a third-party one? (Sorry if this feature already exists.) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3399) Consider having a Default cluster ID
[ https://issues.apache.org/jira/browse/YARN-3399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14383041#comment-14383041 ] Zhijie Shen commented on YARN-3399: --- Thanks, Vinod! This proposal sounds almost good to me, but I think we need to rethink what the default cluster ID should be. default-$(RM-host-name)-cluster may not work because yarn.resourcemanager.hostname is 0.0.0.0 by default, so different RMs may still use the same cluster ID. Even if we use the IP address to look up the host name, we are likely to end up with the same localhost. Consider having a Default cluster ID Key: YARN-3399 URL: https://issues.apache.org/jira/browse/YARN-3399 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Zhijie Shen Assignee: Brahma Reddy Battula In YARN-3040, the timeline service will set the default cluster ID if users don't provide one. RM HA's current behavior is a bit different when users don't provide a cluster ID: an IllegalArgumentException is thrown instead. Let's continue the discussion here on whether RM HA needs a default cluster ID, and what the proper default cluster ID would be. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
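A self-contained sketch of the collision described above. The config key is real; {{derive_cluster_id}} is a hypothetical helper that just mirrors the proposed default-$(RM-host-name)-cluster scheme.

```python
# yarn.resourcemanager.hostname defaults to 0.0.0.0, which is not a usable
# discriminator: every RM left at the default derives the same cluster ID.
DEFAULT_RM_HOSTNAME = "0.0.0.0"

def derive_cluster_id(conf):
    # hypothetical helper, for illustration only
    rm_host = conf.get("yarn.resourcemanager.hostname", DEFAULT_RM_HOSTNAME)
    return "default-%s-cluster" % rm_host

# Two RMs in two unrelated clusters, both with default config, collide.
id_a = derive_cluster_id({})
id_b = derive_cluster_id({})
assert id_a == id_b == "default-0.0.0.0-cluster"
```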
[jira] [Commented] (YARN-3331) NodeManager should use directory other than tmp for extracting and loading leveldbjni
[ https://issues.apache.org/jira/browse/YARN-3331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14383078#comment-14383078 ] Zhijie Shen commented on YARN-3331: --- bq. I am not sure which value in core-site would fix this after going through the core-default documentation. I'm afraid we can't set it in a config file, because the config file is read by the daemon, but we need to start the daemon with this opt. And IMHO, {{-Dlibrary.leveldbjni.path}} alone cannot fix the problem. If the temporary native lib is redirected to another dir, we also need to add that dir to {{JAVA_LIBRARY_PATH}}. Otherwise, we may still end up with the native lib not being found. NodeManager should use directory other than tmp for extracting and loading leveldbjni - Key: YARN-3331 URL: https://issues.apache.org/jira/browse/YARN-3331 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-3331.001.patch, YARN-3331.002.patch /tmp can be required to be noexec in many environments. This causes a problem when the nodemanager tries to load the leveldbjni library, which gets unpacked into and executed from /tmp. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
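A minimal yarn-env.sh sketch of what the comment describes ({{-Dlibrary.leveldbjni.path}} comes from the discussion above; the target directory is illustrative):

```shell
# yarn-env.sh fragment (hypothetical path): extract leveldbjni somewhere
# executable instead of a noexec /tmp, and keep the JVM's native search
# path pointed at the same directory so the lib is still found.
export YARN_NODEMANAGER_OPTS="$YARN_NODEMANAGER_OPTS -Dlibrary.leveldbjni.path=/var/lib/hadoop-yarn/native"
export JAVA_LIBRARY_PATH="/var/lib/hadoop-yarn/native:$JAVA_LIBRARY_PATH"
```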
[jira] [Commented] (YARN-3323) Task UI, sort by name doesn't work
[ https://issues.apache.org/jira/browse/YARN-3323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14381899#comment-14381899 ] Akira AJISAKA commented on YARN-3323: - Hi, [~brahmareddy], looks like the version of {{jquery.dataTables.min.js.gz}} included in v2 patch is still 1.9.4. Would you include the latest version? Task UI, sort by name doesn't work -- Key: YARN-3323 URL: https://issues.apache.org/jira/browse/YARN-3323 Project: Hadoop YARN Issue Type: Bug Components: webapp Affects Versions: 2.5.1 Reporter: Thomas Graves Assignee: Brahma Reddy Battula Attachments: YARN-3323-002.patch, YARN-3323.patch If you go to the MapReduce ApplicationMaster or HistoryServer UI and open the list of tasks, then try to sort by the task name/id, it does nothing. Note that if you go to the task attempts, that seem to sort fine. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3040) [Data Model] Make putEntities operation be aware of the app's context
[ https://issues.apache.org/jira/browse/YARN-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-3040: - Attachment: YARN-3040.6.patch [Data Model] Make putEntities operation be aware of the app's context - Key: YARN-3040 URL: https://issues.apache.org/jira/browse/YARN-3040 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Zhijie Shen Attachments: YARN-3040.1.patch, YARN-3040.2.patch, YARN-3040.3.patch, YARN-3040.4.patch, YARN-3040.5.patch, YARN-3040.6.patch Per design in YARN-2928, implement client-side API for handling *flows*. Frameworks should be able to define and pass in all attributes of flows and flow runs to YARN, and they should be passed into ATS writers. YARN tags were discussed as a way to handle this piece of information. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3040) [Data Model] Make putEntities operation be aware of the app's context
[ https://issues.apache.org/jira/browse/YARN-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14381908#comment-14381908 ] Junping Du commented on YARN-3040: -- Sounds like there is a build failure with the v5 patch: RMTimelineCollector (just added in YARN-3034) needs to override the abstract method getTimelineEntityContext() in TimelineCollector. Given there is YARN-3390 to track this issue separately, I think we can simply add a stub method (e.g., return null) to RMTimelineCollector, as the v6 patch shows. [~zjshen], can you confirm this? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3304) ResourceCalculatorProcessTree#getCpuUsagePercent default return value is inconsistent with other getters
[ https://issues.apache.org/jira/browse/YARN-3304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14381913#comment-14381913 ] Hadoop QA commented on YARN-3304: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12707398/YARN-3304-v3.patch against trunk revision b4b4fe9. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7115//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7115//console This message is automatically generated. 
ResourceCalculatorProcessTree#getCpuUsagePercent default return value is inconsistent with other getters Key: YARN-3304 URL: https://issues.apache.org/jira/browse/YARN-3304 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Junping Du Assignee: Karthik Kambatla Priority: Blocker Attachments: YARN-3304-v2.patch, YARN-3304-v3.patch, YARN-3304.patch Per discussions in YARN-3296, getCpuUsagePercent() will return -1 for unavailable case while other resource metrics are return 0 in the same case which sounds inconsistent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3402) Security support for new timeline service.
Junping Du created YARN-3402: Summary: Security support for new timeline service. Key: YARN-3402 URL: https://issues.apache.org/jira/browse/YARN-3402 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Junping Du Assignee: Junping Du We should support YARN security for new TimelineService. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp
[ https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382799#comment-14382799 ] Yongjun Zhang commented on YARN-3021: - Hi [~jianhe], would you please take a look at the latest patch? thanks a lot. YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp --- Key: YARN-3021 URL: https://issues.apache.org/jira/browse/YARN-3021 Project: Hadoop YARN Issue Type: Bug Components: security Affects Versions: 2.3.0 Reporter: Harsh J Assignee: Yongjun Zhang Attachments: YARN-3021.001.patch, YARN-3021.002.patch, YARN-3021.003.patch, YARN-3021.004.patch, YARN-3021.005.patch, YARN-3021.006.patch, YARN-3021.patch Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN clusters. Now if one logs in with a COMMON credential, and runs a job on A's YARN that needs to access B's HDFS (such as a DistCp), the operation fails in the RM, as it attempts a renewDelegationToken(…) synchronously during application submission (to validate the managed token before it adds it to a scheduler for automatic renewal). The call obviously fails cause B realm will not trust A's credentials (here, the RM's principal is the renewer). In the 1.x JobTracker the same call is present, but it is done asynchronously and once the renewal attempt failed we simply ceased to schedule any further attempts of renewals, rather than fail the job immediately. We should change the logic such that we attempt the renewal but go easy on the failure and skip the scheduling alone, rather than bubble back an error to the client, failing the app submission. This way the old behaviour is retained. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)
[ https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382542#comment-14382542 ] Wangda Tan commented on YARN-2495: -- bq. is it possible to provide a configuration to hook in different label providers on NM, for example, a third party one? (Sorry if this feature already exists). Yes. You can see in this patch that how the LabelProvider is created is left open, and we have two JIRAs to make it configurable: - YARN-2729 for script-based - YARN-2923 for config-based This should be pluggable, so new providers can be added in the future. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
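As a rough illustration only, a yarn-site.xml fragment of the shape such a pluggable provider configuration might take (these property keys are hypothetical; the actual keys were still being defined under YARN-2729/YARN-2923 at the time):

```xml
<!-- Hypothetical keys, illustrating the pluggable-provider idea -->
<property>
  <name>yarn.nodemanager.node-labels.provider</name>
  <!-- e.g. "config", "script", or a custom provider class name -->
  <value>config</value>
</property>
<property>
  <name>yarn.nodemanager.node-labels.provider.configured-node-labels</name>
  <!-- static label injected at instance startup, e.g. by an instance controller -->
  <value>SPOT</value>
</property>
```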
[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests
[ https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382552#comment-14382552 ] Wangda Tan commented on YARN-796: - Fair scheduler efforts are tracked by YARN-2497; you can check the plans in that JIRA. Thanks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3304) ResourceCalculatorProcessTree#getCpuUsagePercent default return value is inconsistent with other getters
[ https://issues.apache.org/jira/browse/YARN-3304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382759#comment-14382759 ] Junping Du commented on YARN-3304: -- Hi [~kasha] and [~adhoot], the v3 patch should be a complete and clean solution for this blocker. Can you help review and comment? Thanks! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3401) [Data Model] users should not be able to create a generic TimelineEntity and associate arbitrary type
[ https://issues.apache.org/jira/browse/YARN-3401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382794#comment-14382794 ] Junping Du commented on YARN-3401: -- [~sjlee0] and [~Naganarasimha], I think this falls under preventing malicious behavior. I would suggest deferring it until we discuss support for YARN security in the TimelineService, which shouldn't happen very soon. I just filed YARN-3402 to track the security issue for the new timeline service. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3402) Security support for new timeline service.
[ https://issues.apache.org/jira/browse/YARN-3402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-3402: - Description: We should support YARN security for the new TimelineService. Basically, there should be a security token exchange between the AM, NMs, and app-collectors, to prevent anyone who knows the service address of an app-collector from posting faked/unwanted information. was: We should support YARN security for new TimelineService. Security support for new timeline service. -- Key: YARN-3402 URL: https://issues.apache.org/jira/browse/YARN-3402 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Junping Du Assignee: Junping Du -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3047) [Data Serving] Set up ATS reader with basic request serving structure and lifecycle
[ https://issues.apache.org/jira/browse/YARN-3047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382690#comment-14382690 ] Hadoop QA commented on YARN-3047: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12707592/YARN-3047.005.patch against trunk revision 61df1b2. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:red}-1 javac{color}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7117//console This message is automatically generated. [Data Serving] Set up ATS reader with basic request serving structure and lifecycle --- Key: YARN-3047 URL: https://issues.apache.org/jira/browse/YARN-3047 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Varun Saxena Attachments: Timeline_Reader(draft).pdf, YARN-3047.001.patch, YARN-3047.003.patch, YARN-3047.005.patch, YARN-3047.02.patch, YARN-3047.04.patch Per design in YARN-2938, set up the ATS reader as a service and implement the basic structure as a service. It includes lifecycle management, request serving, and so on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3047) [Data Serving] Set up ATS reader with basic request serving structure and lifecycle
[ https://issues.apache.org/jira/browse/YARN-3047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-3047: --- Attachment: YARN-3047.005.patch Uploaded a new patch. Verified that the patch applies with {{ patch -p0 }}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3044) [Event producers] Implement RM writing app lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382387#comment-14382387 ] Junping Du commented on YARN-3044: -- Thanks, guys, for the good discussion above, especially on the topic of posting app lifecycle events from the NM or the RM. Can I propose that we do it both ways during the development stage? I fully understand [~sjlee0]'s concern that the RM may not be able to handle tens of thousands of containers in a large cluster. However, we can disable the RM-side posting by default in production environments. We can use different entity types, e.g., NM_CONTAINER_EVENT and RM_CONTAINER_EVENT, for container events posted from the NM and the RM, so we can fully understand how the two views differ (start time, end time, etc.). This would not only benefit the development cycle but also troubleshooting in production, as the apples-to-apples comparison may provide some hints to users. Given that doing both doesn't sound like too much work, I think it is worth doing. Thoughts? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3401) [Data Model] users should not be able to create a generic TimelineEntity and associate arbitrary type
[ https://issues.apache.org/jira/browse/YARN-3401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R reassigned YARN-3401: --- Assignee: Naganarasimha G R [Data Model] users should not be able to create a generic TimelineEntity and associate arbitrary type - Key: YARN-3401 URL: https://issues.apache.org/jira/browse/YARN-3401 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Naganarasimha G R IIUC it is possible for users to create a generic TimelineEntity and set an arbitrary entity type. For example, for a YARN app, the right entity API is ApplicationEntity. However, today nothing stops users from instantiating the base TimelineEntity class and setting the application type on it. This presents a problem in handling these YARN system entities, for example in the storage layer. We need to ensure that the API allows only the right class to be created for a given entity type. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
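The kind of restriction being asked for could be sketched as a factory that owns the mapping from reserved entity types to their dedicated subclasses. This is an illustrative sketch only; the class and method names here are assumptions, not the actual YARN data-model API:

```java
import java.util.Set;

// Sketch: the factory, not the caller, decides which concrete class backs a
// reserved entity type, so a generic entity can never carry a reserved type.
public class EntityTypes {
    public static class TimelineEntity {
        private final String type;
        protected TimelineEntity(String type) { this.type = type; }
        public String getType() { return type; }
    }

    // Dedicated subclass for YARN applications (hypothetical name).
    public static class ApplicationEntity extends TimelineEntity {
        public ApplicationEntity() { super("YARN_APPLICATION"); }
    }

    private static final Set<String> RESERVED = Set.of("YARN_APPLICATION");

    // The only public way to create a generic entity: reserved types are rejected.
    public static TimelineEntity createGeneric(String type) {
        if (RESERVED.contains(type)) {
            throw new IllegalArgumentException(
                "reserved entity type, use the dedicated subclass: " + type);
        }
        return new TimelineEntity(type) {};
    }
}
```

With a protected constructor plus a guarded factory, user code can still define arbitrary non-reserved types, but the storage layer can rely on every YARN system entity being an instance of its dedicated class.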
[jira] [Updated] (YARN-3403) Nodemanager dies after a small typo in mapred-site.xml is induced
[ https://issues.apache.org/jira/browse/YARN-3403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nikhil Mulley updated YARN-3403: Priority: Critical (was: Major) Nodemanager dies after a small typo in mapred-site.xml is induced - Key: YARN-3403 URL: https://issues.apache.org/jira/browse/YARN-3403 Project: Hadoop YARN Issue Type: Bug Reporter: Nikhil Mulley Priority: Critical Hi, We have noticed that a small typo in an XML config file (mapred-site.xml) can cause the nodemanager to go down completely, without anyone stopping/restarting it externally. I find it a little weird that editing the config files on the filesystem can cause the running slave daemon, the yarn nodemanager, to shut down. In this case, an ending tag '/' was missing in a property, and that caused the nodemanager to go down in a cluster. Why would the nodemanager reload the configs while it is running? Aren't they picked up when it is started? Even if it is automated to pick up new configs dynamically, I think an xmllint/config check should run before the nodemanager is asked to reload/restart. --- java.lang.RuntimeException: org.xml.sax.SAXParseException; systemId: file:/etc/hadoop/conf/mapred-site.xml; lineNumber: 228; columnNumber: 3; The element type "value" must be terminated by the matching end-tag "</value>". at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2348) --- Please shed light on this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3403) Nodemanager dies after a small typo in mapred-site.xml is induced
Nikhil Mulley created YARN-3403: --- Summary: Nodemanager dies after a small typo in mapred-site.xml is induced Key: YARN-3403 URL: https://issues.apache.org/jira/browse/YARN-3403 Project: Hadoop YARN Issue Type: Bug Reporter: Nikhil Mulley Hi, We have noticed that a small typo in an XML config file (mapred-site.xml) can cause the nodemanager to go down completely, without anyone stopping/restarting it externally. I find it a little weird that editing the config files on the filesystem can cause the running slave daemon, the yarn nodemanager, to shut down. In this case, an ending tag '/' was missing in a property, and that caused the nodemanager to go down in a cluster. Why would the nodemanager reload the configs while it is running? Aren't they picked up when it is started? Even if it is automated to pick up new configs dynamically, I think an xmllint/config check should run before the nodemanager is asked to reload/restart. --- java.lang.RuntimeException: org.xml.sax.SAXParseException; systemId: file:/etc/hadoop/conf/mapred-site.xml; lineNumber: 228; columnNumber: 3; The element type "value" must be terminated by the matching end-tag "</value>". at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2348) --- Please shed light on this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
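The pre-check the reporter asks for is essentially an XML well-formedness test: parse the file with a non-validating SAX parser and only hand it to the running daemon if parsing succeeds. A minimal stdlib-only sketch (not the actual Configuration code path):

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class ConfigPreCheck {
    /** Returns true iff the XML text is well-formed. */
    public static boolean isWellFormed(String xml) {
        try {
            // Non-validating parse; a missing or mismatched end tag (like the
            // one in the stack trace above) raises a SAXParseException.
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (Exception e) {
            return false;
        }
    }
}
```

A reload path could call such a check first and keep the previous in-memory configuration when the edited file fails it, rather than throwing a RuntimeException in the daemon.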
[jira] [Updated] (YARN-3402) Security support for new timeline service.
[ https://issues.apache.org/jira/browse/YARN-3402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-3402: - Description: We should support YARN security for new TimelineService. Basically, there should be security token exchange between AM, NMs and app-collectors to prevent anyone who knows the service address of app-collector can post faked/unwanted information. Also, there should be tokens exchange between app-collector/RMTimelineCollector and backend storage (HBase, Phoenix, etc.) that enabling security. was: We should support YARN security for new TimelineService. Basically, there should be security token exchange between AM, NMs and app-collectors to prevent anyone who knows the service address of app-collector can post faked/unwanted information. Security support for new timeline service. -- Key: YARN-3402 URL: https://issues.apache.org/jira/browse/YARN-3402 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Junping Du Assignee: Junping Du We should support YARN security for new TimelineService. Basically, there should be security token exchange between AM, NMs and app-collectors to prevent anyone who knows the service address of app-collector can post faked/unwanted information. Also, there should be tokens exchange between app-collector/RMTimelineCollector and backend storage (HBase, Phoenix, etc.) that enabling security. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Unable to run Hadoop on Windows 8.1 64bit
As per Brahma, I followed the procedure he mentioned to build Hadoop on a Windows 8.1 64-bit system and was successful, but I am unable to run Hadoop. https://issues.apache.org/jira/browse/HADOOP-11752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel Followed the procedure below for building Hadoop and succeeded: http://zutai.blogspot.in/2014/06/build-install-and-run-hadoop-24-240-on.html?showComment=1422091525887#c2264594416650430988 *Runtime error while running Hadoop on a Windows 8.1 64-bit system:* When I try to do hdfs namenode -format, I get the error below: *C:\Users\..\hadoop> hdfs namenode -format* 'hdfs' is not recognized as an internal or external command, operable program or batch file. *C:\Users\..\hadoop> start-dfs* 'start-dfs' is not recognized as an internal or external command, operable program or batch file. *C:\Users\..\hadoop\hadoop-dist\target\hadoop-3.0.0-SNAPSHOT\sbin> hdfs namenode -format* 'hdfs' is not recognized as an internal or external command, operable program or batch file. *C:\Users\..\hadoop\hadoop-dist\target\hadoop-3.0.0-SNAPSHOT\sbin> start-dfs* *The system cannot find the file hadoop.* Can you please let me know how to format HDFS, start DFS and YARN, and run Hadoop on a Windows 8.1 64-bit system. -- Thanks Regards, Sravan CPChem 281-757-6777 (C) | kum...@cpchem.com kum...@cpchemt.com
[jira] [Resolved] (YARN-3040) [Data Model] Make putEntities operation be aware of the app's context
[ https://issues.apache.org/jira/browse/YARN-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du resolved YARN-3040. -- Resolution: Fixed Fix Version/s: YARN-2928 Hadoop Flags: Reviewed [Data Model] Make putEntities operation be aware of the app's context - Key: YARN-3040 URL: https://issues.apache.org/jira/browse/YARN-3040 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Zhijie Shen Fix For: YARN-2928 Attachments: YARN-3040.1.patch, YARN-3040.2.patch, YARN-3040.3.patch, YARN-3040.4.patch, YARN-3040.5.patch, YARN-3040.6.patch Per design in YARN-2928, implement client-side API for handling *flows*. Frameworks should be able to define and pass in all attributes of flows and flow runs to YARN, and they should be passed into ATS writers. YARN tags were discussed as a way to handle this piece of information. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3399) Consider having a Default cluster ID
[ https://issues.apache.org/jira/browse/YARN-3399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-3399: -- Summary: Consider having a Default cluster ID (was: Default cluster ID for RM HA) Editing title to be appropriate. Others commented on YARN-3040. So I'll try to summarize the discussion from YARN-1029 and YARN-3040. - We should have a generic {{yarn.cluster-id}} and deprecate the current RM-only configuration - We need to have a reasonable default cluster-id -- This is needed for the Timeline service functionality - we want to gather insights per cluster -- Forcing admins to set the ID explicitly is one more hurdle w.r.t configuration -- For single node non-HA clusters, forcing the dev/admin to set it is unnecessary. - But there are concerns too -- Default cluster-id can potentially cause hard-to-debug issues in HA mode. - Other constraints while picking a default cluster ID -- Restarting RM on the same node shouldn't change the cluster-id So, I propose that we set the default cluster-ID to be something like default-$(RM-host-name)-cluster. This way - by default, single node clusters are good across RM restarts, unless you are running active/standby RMs on the same machine (dev environments) - HA RMs have to be set up explicitly to be part of the same cluster - thereby avoiding debuggability issues. - For real life use, in order to facilitate RM migrations, administrators will set their own cluster-id. Consider having a Default cluster ID Key: YARN-3399 URL: https://issues.apache.org/jira/browse/YARN-3399 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Zhijie Shen Assignee: Brahma Reddy Battula In YARN-3040, the timeline service will set the default cluster ID if users don't provide one. RM HA's current behavior is a bit different when users don't provide a cluster ID: an IllegalArgumentException is thrown instead. 
Let's continue the discussion if RM HA needs the default cluster ID or not here, and what's the proper default cluster ID. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
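The proposed fallback above can be sketched in a few lines. Names are illustrative (the configured/derived split is an assumption drawn from the comment, not code in the RM):

```java
// Sketch: use an explicitly configured yarn.cluster-id when present, otherwise
// derive a default from the RM host name so the id is stable across restarts
// of the RM on the same machine.
public class ClusterIdDefault {
    public static String clusterId(String configured, String rmHostName) {
        if (configured != null && !configured.trim().isEmpty()) {
            return configured.trim();
        }
        return "default-" + rmHostName + "-cluster";
    }
}
```

This captures the trade-off in the summary: single-node clusters keep a stable id with zero configuration, while HA setups must set the id explicitly because two RMs on different hosts would otherwise derive different defaults.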
[jira] [Commented] (YARN-3047) [Data Serving] Set up ATS reader with basic request serving structure and lifecycle
[ https://issues.apache.org/jira/browse/YARN-3047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382866#comment-14382866 ] Li Lu commented on YARN-3047: - Hi [~varun_saxena], thanks for the doc! I have two general questions about your proposed plan: # I'm a little bit confused about whether the Timeline Reader will be a single daemon (in the initial phase). In the reader overview section there are multiple threads in the reader; are those threads managed in YARN-3047? Specifically, what is the concrete plan for Phase 1 of the reader's architecture: a single daemon with multiple threads, or a single daemon with a single thread? If it's the former, you may want to update YARN-3047's patch, while if it's the latter, you may want to confirm this and update the figure afterwards (not the top priority for now). # On the storage layer we're prioritizing timeline entities and metrics; it would be great if there were some API support at the reader level for metrics. Given the current progress on the storage layer, I'm not sure we can finish V1 storage support by the time you finish reader phase 1. We may need some coordination on this. [Data Serving] Set up ATS reader with basic request serving structure and lifecycle --- Key: YARN-3047 URL: https://issues.apache.org/jira/browse/YARN-3047 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Varun Saxena Attachments: Timeline_Reader(draft).pdf, YARN-3047.001.patch, YARN-3047.003.patch, YARN-3047.005.patch, YARN-3047.02.patch, YARN-3047.04.patch Per design in YARN-2928, set up the ATS reader as a service and implement the basic structure. It includes lifecycle management, request serving, and so on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2893) AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream
[ https://issues.apache.org/jira/browse/YARN-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-2893: Attachment: YARN-2893.002.patch AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream -- Key: YARN-2893 URL: https://issues.apache.org/jira/browse/YARN-2893 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Gera Shegalov Assignee: zhihai xu Attachments: YARN-2893.000.patch, YARN-2893.001.patch, YARN-2893.002.patch MapReduce jobs on our clusters experience sporadic failures due to corrupt tokens in the AM launch context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2893) AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream
[ https://issues.apache.org/jira/browse/YARN-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14383151#comment-14383151 ] zhihai xu commented on YARN-2893: - [~adhoot], thanks for the review. I added a test case for the AMLauncher changes in the new patch YARN-2893.002.patch. The root cause of this bug is at the job client, which submitted a bad token in the ApplicationSubmissionContext. The change to RMAppManager#submitApplication is to surface this error earlier, so the user who submits the application knows the real cause of the issue. bq. The changes for RMAppManager#submitApplication seems to no longer return RMAppRejectedEvent for any exception in getDelegationTokenRenewer().addApplicationAsync. Is that deliberate? I checked the code for DelegationTokenRenewer#addApplicationAsync and didn't find any exception that can be generated from addApplicationAsync itself. addApplicationAsync launches a thread to run handleDTRenewerAppSubmitEvent, and any exception from handleDTRenewerAppSubmitEvent will result in an RMAppRejectedEvent:
{code}
private void handleDTRenewerAppSubmitEvent(
    DelegationTokenRenewerAppSubmitEvent event) {
  try {
    // Setup tokens for renewal
    DelegationTokenRenewer.this.handleAppSubmitEvent(event);
    rmContext.getDispatcher().getEventHandler()
        .handle(new RMAppEvent(event.getApplicationId(),
            RMAppEventType.START));
  } catch (Throwable t) {
    LOG.warn(
        "Unable to add the application to the delegation token renewer.", t);
    // Sending APP_REJECTED is fine, since we assume that the
    // RMApp is in NEW state and thus we haven't yet informed the
    // Scheduler about the existence of the application
    rmContext.getDispatcher().getEventHandler().handle(
        new RMAppRejectedEvent(event.getApplicationId(), t.getMessage()));
  }
}
{code}
This is why I only check for an exception from parseCredentials. Also, the original code only expected an exception from parseCredentials, based on the exception message. 
{code} LOG.warn("Unable to parse credentials.", e); {code} AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream -- Key: YARN-2893 URL: https://issues.apache.org/jira/browse/YARN-2893 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Gera Shegalov Assignee: zhihai xu Attachments: YARN-2893.000.patch, YARN-2893.001.patch, YARN-2893.002.patch MapReduce jobs on our clusters experience sporadic failures due to corrupt tokens in the AM launch context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
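The fail-fast idea being discussed can be illustrated with stdlib-only Java. This is NOT the Hadoop Credentials API; it is a stand-in that shows why a truncated (corrupt) token blob surfaces as EOFException, and why parsing at submission time reports the problem to the submitter instead of failing later in AMLauncher:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;

public class TokenBlobCheck {
    /** Serializes one length-prefixed record, as a stand-in for a token blob. */
    public static byte[] write(byte[] payload) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bos);
            out.writeInt(payload.length);
            out.write(payload);
            out.flush();
            return bos.toByteArray();
        } catch (IOException e) {
            throw new AssertionError(e); // cannot happen for in-memory streams
        }
    }

    /** Returns true iff the blob parses completely; truncation means corrupt. */
    public static boolean parses(byte[] blob) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(blob))) {
            byte[] buf = new byte[in.readInt()];
            in.readFully(buf); // throws EOFException on a truncated blob
            return true;
        } catch (IOException e) { // includes EOFException
            return false;
        }
    }
}
```

Running such a check when the application is submitted turns a sporadic AM-launch failure into an immediate, attributable rejection at the client.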
[jira] [Commented] (YARN-2893) AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream
[ https://issues.apache.org/jira/browse/YARN-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14383157#comment-14383157 ] zhihai xu commented on YARN-2893: - By the way, the newly added test case in TestApplicationMasterLauncher will fail without the AMLauncher changes. The following is a sample failure message without the AMLauncher changes:
{code}
-------------------------------------------------------
 T E S T S
-------------------------------------------------------
Running org.apache.hadoop.yarn.server.resourcemanager.TestApplicationMasterLauncher
Tests run: 3, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 12.838 sec - in org.apache.hadoop.yarn.server.resourcemanager.TestApplicationMasterLauncher
testSetupTokens(org.apache.hadoop.yarn.server.resourcemanager.TestApplicationMasterLauncher)  Time elapsed: 2.101 sec  FAILURE!
java.lang.AssertionError: EOFException should not happen.
	at org.junit.Assert.fail(Assert.java:88)
	at org.apache.hadoop.yarn.server.resourcemanager.TestApplicationMasterLauncher.testSetupTokens(TestApplicationMasterLauncher.java:278)
{code}
AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream -- Key: YARN-2893 URL: https://issues.apache.org/jira/browse/YARN-2893 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Gera Shegalov Assignee: zhihai xu Attachments: YARN-2893.000.patch, YARN-2893.001.patch, YARN-2893.002.patch MapReduce jobs on our clusters experience sporadic failures due to corrupt tokens in the AM launch context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3395) [Fair Scheduler] Handle the user name correctly when user name is used as default queue name.
[ https://issues.apache.org/jira/browse/YARN-3395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14381436#comment-14381436 ] Hadoop QA commented on YARN-3395: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12707428/YARN-3395.000.patch against trunk revision 44809b8. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestRMHA org.apache.hadoop.yarn.server.resourcemanager.TestRMAdminService org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebappAuthentication org.apache.hadoop.yarn.server.resourcemanager.TestMoveApplication org.apache.hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStore Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7114//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7114//console This message is automatically generated. [Fair Scheduler] Handle the user name correctly when user name is used as default queue name. 
- Key: YARN-3395 URL: https://issues.apache.org/jira/browse/YARN-3395 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-3395.000.patch Handle the user name correctly when user name is used as default queue name in fair scheduler. It will be better to remove the trailing and leading whitespace of the user name when we use user name as default queue name, otherwise it will be rejected by InvalidQueueNameException from QueueManager. I think it is reasonable to make this change, because we already did special handling for '.' in user name. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3365) Add support for using the 'tc' tool via container-executor
[ https://issues.apache.org/jira/browse/YARN-3365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14381441#comment-14381441 ] Sidharta Seethana commented on YARN-3365: - That should read : {{container-executor --tc-read-state tmp-file-with-tc-commands.txt}} Add support for using the 'tc' tool via container-executor -- Key: YARN-3365 URL: https://issues.apache.org/jira/browse/YARN-3365 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: Sidharta Seethana Assignee: Sidharta Seethana Attachments: YARN-3365.001.patch, YARN-3365.002.patch, YARN-3365.003.patch We need the following functionality : 1) modify network interface traffic shaping rules - to be able to attach a qdisc, create child classes etc 2) read existing rules in place 3) read stats for the various classes Using tc requires elevated privileges - hence this functionality is to be made available via container-executor. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3044) [Event producers] Implement RM writing app lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14381453#comment-14381453 ] Naganarasimha G R commented on YARN-3044: - Thanks [~vinodkv], [~vrushalic], [~sjlee0] and [~zjshen] for reviewing and providing your viewpoints: 1 {{source of life-cycle events of container}} is a debatable topic; to summarize the pros and cons of doing it in the NM: Pros * Even though the load is not as high as publishing container metrics, life cycle events might put considerable load on a large cluster, as explained by [~sjlee0]. So I feel it is better to distribute this load. * If the start and end times of life cycle events are logged from the NM, it will be easier to analyze the flow of a container, as that is the actual time when it started. * IMO it would be good to have all the metrics and events raised from the NM itself, as there might be a race condition if container entities are raised from the RM and metrics and a few other life cycle events from the NM, e.g. when the RM is slow to dispatch the events and the NM is faster at doing it (HBase as storage will be able to handle it well, but I am not sure about the other storages we are planning for). Cons * The start and end times of life cycle events might not match what is displayed by the RM (web UI etc.). * In terms of scheduling, the start and end times of life cycle events might not be as accurate as they would have been from the RM. Please correct me on these and add any I have missed. 2 ??But the life-cycle events of container should definitely originate at the RM; NMs don't even know many of them.?? I am not much aware of this; can you please elaborate on what might be missed? 3 ??Why would that be the case? Can the RM timeline collector not use specific subclasses of TimelineEntity?? 
Well, it is not a limitation of the RM timeline collector that I am trying to point out, but that the writer interface is like {{TimelineWriter.write(TimelineEntities)}}. The writer would not be aware of whether the client is writing an ApplicationEntity or an AppAttemptEntity. IIUC it will just try to write the fields of the TimelineEntity to the storage. If it is just storing the entity as a JSON object directly, that might not be an issue, but that will not be the case in HBase column storage, right? 4 ??My suggestion is that we start with reimplementing what we provided in YTS v1, and add more timeline data on demand later?? True, to start with this would be sufficient, but in future I would like to capture all the events, as currently, to analyze/debug issues with a container, we usually search the NM and RM logs for the container string to find what state the application/container is in. Your opinion? [Event producers] Implement RM writing app lifecycle events to ATS -- Key: YARN-3044 URL: https://issues.apache.org/jira/browse/YARN-3044 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Naganarasimha G R Attachments: YARN-3044.20150325-1.patch Per design in YARN-2928, implement RM writing app lifecycle events to ATS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2618) Avoid over-allocation of disk resources
[ https://issues.apache.org/jira/browse/YARN-2618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Yan updated YARN-2618: -- Attachment: YARN-2618-6.patch Rebase the patch. Avoid over-allocation of disk resources --- Key: YARN-2618 URL: https://issues.apache.org/jira/browse/YARN-2618 Project: Hadoop YARN Issue Type: Sub-task Reporter: Wei Yan Assignee: Wei Yan Attachments: YARN-2618-1.patch, YARN-2618-2.patch, YARN-2618-3.patch, YARN-2618-4.patch, YARN-2618-5.patch, YARN-2618-6.patch Subtask of YARN-2139. This should include - Add API support for introducing disk I/O as the 3rd type resource. - NM should report this information to the RM - RM should consider this to avoid over-allocation -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2213) Change proxy-user cookie log in AmIpFilter to DEBUG
[ https://issues.apache.org/jira/browse/YARN-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14381965#comment-14381965 ] Hudson commented on YARN-2213: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #135 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/135/]) YARN-2213. Change proxy-user cookie log in AmIpFilter to DEBUG. (xgong: rev e556198e71df6be3a83e5598265cb702fc7a668b) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/main/java/org/apache/hadoop/yarn/server/webproxy/amfilter/AmIpFilter.java * hadoop-yarn-project/CHANGES.txt Change proxy-user cookie log in AmIpFilter to DEBUG --- Key: YARN-2213 URL: https://issues.apache.org/jira/browse/YARN-2213 Project: Hadoop YARN Issue Type: Task Reporter: Ted Yu Assignee: Varun Saxena Priority: Minor Fix For: 2.7.0 Attachments: YARN-2213.001.patch, YARN-2213.02.patch I saw a lot of the following lines in AppMaster log: {code} 14/06/24 17:12:36 WARN web.SliderAmIpFilter: Could not find proxy-user cookie, so user will not be set 14/06/24 17:12:39 WARN web.SliderAmIpFilter: Could not find proxy-user cookie, so user will not be set 14/06/24 17:12:39 WARN web.SliderAmIpFilter: Could not find proxy-user cookie, so user will not be set {code} For long running app, this would consume considerable log space. Log level should be changed to DEBUG. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3397) yarn rmadmin should skip -failover
[ https://issues.apache.org/jira/browse/YARN-3397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14381963#comment-14381963 ] Hudson commented on YARN-3397: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #135 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/135/]) YARN-3397. yarn rmadmin should skip -failover. (J.Andreina via kasha) (kasha: rev c906a1de7280dabd9d9d8b6aeaa060898e6d17b6) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestRMAdminCLI.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/RMAdminCLI.java yarn rmadmin should skip -failover -- Key: YARN-3397 URL: https://issues.apache.org/jira/browse/YARN-3397 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.7.0 Reporter: J.Andreina Assignee: J.Andreina Priority: Minor Fix For: 2.8.0 Attachments: YARN-3397.1.patch Failover should be filtered out from HAAdmin to be in sync with doc. Since -failover is not supported operation in doc it is not been mentioned, cli usage is misguiding (can be in sync with doc) . -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2213) Change proxy-user cookie log in AmIpFilter to DEBUG
[ https://issues.apache.org/jira/browse/YARN-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14381956#comment-14381956 ] Hudson commented on YARN-2213: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2076 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2076/]) YARN-2213. Change proxy-user cookie log in AmIpFilter to DEBUG. (xgong: rev e556198e71df6be3a83e5598265cb702fc7a668b) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/main/java/org/apache/hadoop/yarn/server/webproxy/amfilter/AmIpFilter.java * hadoop-yarn-project/CHANGES.txt Change proxy-user cookie log in AmIpFilter to DEBUG --- Key: YARN-2213 URL: https://issues.apache.org/jira/browse/YARN-2213 Project: Hadoop YARN Issue Type: Task Reporter: Ted Yu Assignee: Varun Saxena Priority: Minor Fix For: 2.7.0 Attachments: YARN-2213.001.patch, YARN-2213.02.patch I saw a lot of the following lines in AppMaster log: {code} 14/06/24 17:12:36 WARN web.SliderAmIpFilter: Could not find proxy-user cookie, so user will not be set 14/06/24 17:12:39 WARN web.SliderAmIpFilter: Could not find proxy-user cookie, so user will not be set 14/06/24 17:12:39 WARN web.SliderAmIpFilter: Could not find proxy-user cookie, so user will not be set {code} For long running app, this would consume considerable log space. Log level should be changed to DEBUG. -- This message was sent by Atlassian JIRA (v6.3.4#6332)