[jira] [Commented] (YARN-3400) [JDK 8] Build Failure due to unreported exceptions in RPCUtil
[ https://issues.apache.org/jira/browse/YARN-3400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14381661#comment-14381661 ] Steve Loughran commented on YARN-3400: -- I'd seen this too. Given that jenkins is happy with it, and you can replicate with the javac version updated: +1 [JDK 8] Build Failure due to unreported exceptions in RPCUtil -- Key: YARN-3400 URL: https://issues.apache.org/jira/browse/YARN-3400 Project: Hadoop YARN Issue Type: Bug Reporter: Robert Kanter Assignee: Robert Kanter Attachments: YARN-3400.patch When I try compiling Hadoop with JDK 8 like this {noformat} mvn clean package -Pdist -Dtar -DskipTests -Djavac.version=1.8 {noformat} I get this error: {noformat} [ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) on project hadoop-yarn-common: Compilation failure: Compilation failure: [ERROR] /Users/rkanter/dev/hadoop-common2/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ipc/RPCUtil.java:[101,11] unreported exception java.lang.Throwable; must be caught or declared to be thrown [ERROR] /Users/rkanter/dev/hadoop-common2/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ipc/RPCUtil.java:[104,11] unreported exception java.lang.Throwable; must be caught or declared to be thrown [ERROR] /Users/rkanter/dev/hadoop-common2/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ipc/RPCUtil.java:[107,11] unreported exception java.lang.Throwable; must be caught or declared to be thrown {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
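The "unreported exception java.lang.Throwable" failure is characteristic of the generic "sneaky throw" idiom, whose inferred type argument changed between JDK 7 and JDK 8. A minimal standalone sketch of the problem and one fix (illustrative names only, not the actual RPCUtil code; the real patch is in YARN-3400.patch):

```java
import java.io.IOException;

public class SneakyThrowDemo {
    // Generic helper that rethrows a checked Throwable as if unchecked.
    // Under JDK 8's improved inference, calling `throw rethrow(t);` with t
    // of static type Throwable can infer T = Throwable, so javac reports
    // "unreported exception java.lang.Throwable; must be caught or declared".
    @SuppressWarnings("unchecked")
    private static <T extends Throwable> RuntimeException rethrow(Throwable t) throws T {
        throw (T) t;
    }

    // Pinning the type argument to RuntimeException keeps the call site
    // compiling under both -source 1.7 and 1.8.
    static RuntimeException rethrowUnchecked(Throwable t) {
        return SneakyThrowDemo.<RuntimeException>rethrow(t);
    }

    // Demonstrates that the original checked exception still propagates.
    static String classify(Throwable t) {
        try {
            throw rethrowUnchecked(t);
        } catch (Exception e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(classify(new IOException("boom")));
    }
}
```

The checked IOException crosses the unchecked boundary intact at runtime; only the compile-time bookkeeping changes.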
[jira] [Commented] (YARN-2893) AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream
[ https://issues.apache.org/jira/browse/YARN-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14381605#comment-14381605 ] Anubhav Dhoot commented on YARN-2893: - The AMLauncher changes look like a possible fix, though there is no matching unit test that demonstrates the root cause of this bug. The changes for RMAppManager#submitApplication seem to no longer return RMAppRejectedEvent for any exception in getDelegationTokenRenewer().addApplicationAsync. Is that deliberate? AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream -- Key: YARN-2893 URL: https://issues.apache.org/jira/browse/YARN-2893 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Gera Shegalov Assignee: zhihai xu Attachments: YARN-2893.000.patch, YARN-2893.001.patch MapReduce jobs on our clusters experience sporadic failures due to corrupt tokens in the AM launch context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
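The EOFException symptom is what DataInput-style readers raise when a length-prefixed record is cut short, consistent with a corrupt or truncated token buffer in the AM launch context. A self-contained sketch using plain java.io (not the Hadoop Credentials/readTokenStorageStream API, which follows the same wire pattern):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.util.Arrays;

public class TruncatedTokenDemo {
    // Serialize one "token": a UTF identifier plus a length-prefixed payload,
    // mimicking a credentials record.
    static byte[] writeRecord(String id, byte[] payload) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        out.writeUTF(id);
        out.writeInt(payload.length);
        out.write(payload);
        return bos.toByteArray();
    }

    // Read it back; a truncated buffer surfaces as EOFException, the same
    // symptom AMLauncher hits when the launch-context tokens are corrupt.
    static String readRecord(byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        String id = in.readUTF();
        byte[] payload = new byte[in.readInt()];
        in.readFully(payload); // throws EOFException if the stream was cut short
        return id;
    }

    public static void main(String[] args) throws IOException {
        byte[] full = writeRecord("yarn.am.token", new byte[]{1, 2, 3, 4});
        System.out.println(readRecord(full));
        try {
            readRecord(Arrays.copyOf(full, full.length - 2));
        } catch (EOFException e) {
            System.out.println("truncated record -> EOFException");
        }
    }
}
```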
[jira] [Commented] (YARN-2213) Change proxy-user cookie log in AmIpFilter to DEBUG
[ https://issues.apache.org/jira/browse/YARN-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14381750#comment-14381750 ] Hudson commented on YARN-2213: -- FAILURE: Integrated in Hadoop-Yarn-trunk #878 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/878/]) YARN-2213. Change proxy-user cookie log in AmIpFilter to DEBUG. (xgong: rev e556198e71df6be3a83e5598265cb702fc7a668b) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/main/java/org/apache/hadoop/yarn/server/webproxy/amfilter/AmIpFilter.java Change proxy-user cookie log in AmIpFilter to DEBUG --- Key: YARN-2213 URL: https://issues.apache.org/jira/browse/YARN-2213 Project: Hadoop YARN Issue Type: Task Reporter: Ted Yu Assignee: Varun Saxena Priority: Minor Fix For: 2.7.0 Attachments: YARN-2213.001.patch, YARN-2213.02.patch I saw a lot of the following lines in AppMaster log: {code} 14/06/24 17:12:36 WARN web.SliderAmIpFilter: Could not find proxy-user cookie, so user will not be set 14/06/24 17:12:39 WARN web.SliderAmIpFilter: Could not find proxy-user cookie, so user will not be set 14/06/24 17:12:39 WARN web.SliderAmIpFilter: Could not find proxy-user cookie, so user will not be set {code} For long running app, this would consume considerable log space. Log level should be changed to DEBUG. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
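The fix is essentially a log-level demotion in AmIpFilter. The general pattern, sketched here with java.util.logging standing in for the commons-logging API Hadoop uses (method and logger names are illustrative):

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class ProxyCookieLogDemo {
    private static final Logger LOG = Logger.getLogger("AmIpFilter");

    static String describeMissingCookie() {
        String msg = "Could not find proxy-user cookie, so user will not be set";
        // Before the patch: emitted at WARN on every request, flooding the
        // logs of long-running application masters.
        // After the patch: only emitted when debug-level logging is enabled,
        // and guarded so the message is not even built otherwise.
        if (LOG.isLoggable(Level.FINE)) {
            LOG.fine(msg);
        }
        return msg;
    }
}
```

The guard matters for hot paths: at default log levels the call is a cheap boolean check instead of a formatted write per request.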
[jira] [Commented] (YARN-3397) yarn rmadmin should skip -failover
[ https://issues.apache.org/jira/browse/YARN-3397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14381748#comment-14381748 ] Hudson commented on YARN-3397: -- FAILURE: Integrated in Hadoop-Yarn-trunk #878 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/878/]) YARN-3397. yarn rmadmin should skip -failover. (J.Andreina via kasha) (kasha: rev c906a1de7280dabd9d9d8b6aeaa060898e6d17b6) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestRMAdminCLI.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/RMAdminCLI.java yarn rmadmin should skip -failover -- Key: YARN-3397 URL: https://issues.apache.org/jira/browse/YARN-3397 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.7.0 Reporter: J.Andreina Assignee: J.Andreina Priority: Minor Fix For: 2.8.0 Attachments: YARN-3397.1.patch The -failover option should be filtered out of HAAdmin so the CLI stays in sync with the documentation. Since -failover is not a supported operation, it is not mentioned in the docs, and listing it in the CLI usage is misleading. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3397) yarn rmadmin should skip -failover
[ https://issues.apache.org/jira/browse/YARN-3397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14381737#comment-14381737 ] Hudson commented on YARN-3397: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #144 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/144/]) YARN-3397. yarn rmadmin should skip -failover. (J.Andreina via kasha) (kasha: rev c906a1de7280dabd9d9d8b6aeaa060898e6d17b6) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/RMAdminCLI.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestRMAdminCLI.java * hadoop-yarn-project/CHANGES.txt yarn rmadmin should skip -failover -- Key: YARN-3397 URL: https://issues.apache.org/jira/browse/YARN-3397 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.7.0 Reporter: J.Andreina Assignee: J.Andreina Priority: Minor Fix For: 2.8.0 Attachments: YARN-3397.1.patch The -failover option should be filtered out of HAAdmin so the CLI stays in sync with the documentation. Since -failover is not a supported operation, it is not mentioned in the docs, and listing it in the CLI usage is misleading. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2213) Change proxy-user cookie log in AmIpFilter to DEBUG
[ https://issues.apache.org/jira/browse/YARN-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14381739#comment-14381739 ] Hudson commented on YARN-2213: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #144 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/144/]) YARN-2213. Change proxy-user cookie log in AmIpFilter to DEBUG. (xgong: rev e556198e71df6be3a83e5598265cb702fc7a668b) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/main/java/org/apache/hadoop/yarn/server/webproxy/amfilter/AmIpFilter.java Change proxy-user cookie log in AmIpFilter to DEBUG --- Key: YARN-2213 URL: https://issues.apache.org/jira/browse/YARN-2213 Project: Hadoop YARN Issue Type: Task Reporter: Ted Yu Assignee: Varun Saxena Priority: Minor Fix For: 2.7.0 Attachments: YARN-2213.001.patch, YARN-2213.02.patch I saw a lot of the following lines in AppMaster log: {code} 14/06/24 17:12:36 WARN web.SliderAmIpFilter: Could not find proxy-user cookie, so user will not be set 14/06/24 17:12:39 WARN web.SliderAmIpFilter: Could not find proxy-user cookie, so user will not be set 14/06/24 17:12:39 WARN web.SliderAmIpFilter: Could not find proxy-user cookie, so user will not be set {code} For long running app, this would consume considerable log space. Log level should be changed to DEBUG. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3397) yarn rmadmin should skip -failover
[ https://issues.apache.org/jira/browse/YARN-3397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14381858#comment-14381858 ] Hudson commented on YARN-3397: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #144 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/144/]) YARN-3397. yarn rmadmin should skip -failover. (J.Andreina via kasha) (kasha: rev c906a1de7280dabd9d9d8b6aeaa060898e6d17b6) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/RMAdminCLI.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestRMAdminCLI.java yarn rmadmin should skip -failover -- Key: YARN-3397 URL: https://issues.apache.org/jira/browse/YARN-3397 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.7.0 Reporter: J.Andreina Assignee: J.Andreina Priority: Minor Fix For: 2.8.0 Attachments: YARN-3397.1.patch The -failover option should be filtered out of HAAdmin so the CLI stays in sync with the documentation. Since -failover is not a supported operation, it is not mentioned in the docs, and listing it in the CLI usage is misleading. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3397) yarn rmadmin should skip -failover
[ https://issues.apache.org/jira/browse/YARN-3397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14381879#comment-14381879 ] Hudson commented on YARN-3397: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2094 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2094/]) YARN-3397. yarn rmadmin should skip -failover. (J.Andreina via kasha) (kasha: rev c906a1de7280dabd9d9d8b6aeaa060898e6d17b6) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/RMAdminCLI.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestRMAdminCLI.java yarn rmadmin should skip -failover -- Key: YARN-3397 URL: https://issues.apache.org/jira/browse/YARN-3397 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.7.0 Reporter: J.Andreina Assignee: J.Andreina Priority: Minor Fix For: 2.8.0 Attachments: YARN-3397.1.patch The -failover option should be filtered out of HAAdmin so the CLI stays in sync with the documentation. Since -failover is not a supported operation, it is not mentioned in the docs, and listing it in the CLI usage is misleading. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2893) AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream
[ https://issues.apache.org/jira/browse/YARN-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14383245#comment-14383245 ] Hadoop QA commented on YARN-2893: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12707662/YARN-2893.002.patch against trunk revision 47782cb. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestRMHA org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler org.apache.hadoop.yarn.server.resourcemanager.TestMoveApplication org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebappAuthentication org.apache.hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStore org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart org.apache.hadoop.yarn.server.resourcemanager.TestRMAdminService Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7118//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7118//console This message is automatically generated. 
AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream -- Key: YARN-2893 URL: https://issues.apache.org/jira/browse/YARN-2893 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Gera Shegalov Assignee: zhihai xu Attachments: YARN-2893.000.patch, YARN-2893.001.patch, YARN-2893.002.patch MapReduce jobs on our clusters experience sporadic failures due to corrupt tokens in the AM launch context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3324) TestDockerContainerExecutor should clean test docker image from local repository after test is done
[ https://issues.apache.org/jira/browse/YARN-3324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14383323#comment-14383323 ] Chen He commented on YARN-3324: --- +1, sounds good to me. Thanks, [~ravindra.naik] TestDockerContainerExecutor should clean test docker image from local repository after test is done --- Key: YARN-3324 URL: https://issues.apache.org/jira/browse/YARN-3324 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.6.0 Reporter: Chen He Attachments: YARN-3324-branch-2.6.0.002.patch, YARN-3324-trunk.002.patch Current TestDockerContainerExecutor only cleans the temp directory in local file system but leaves the test docker image in local docker repository. It should be cleaned. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3324) TestDockerContainerExecutor should clean test docker image from local repository after test is done
[ https://issues.apache.org/jira/browse/YARN-3324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14383324#comment-14383324 ] Chen He commented on YARN-3324: --- Make sure there is no side effect if there are parallel docker tests running when you do your 1st step. TestDockerContainerExecutor should clean test docker image from local repository after test is done --- Key: YARN-3324 URL: https://issues.apache.org/jira/browse/YARN-3324 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.6.0 Reporter: Chen He Attachments: YARN-3324-branch-2.6.0.002.patch, YARN-3324-trunk.002.patch Current TestDockerContainerExecutor only cleans the temp directory in local file system but leaves the test docker image in local docker repository. It should be cleaned. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2336) Fair scheduler REST api returns a missing '[' bracket JSON for deep queue tree
[ https://issues.apache.org/jira/browse/YARN-2336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated YARN-2336: Attachment: YARN-2336-4.patch Fair scheduler REST api returns a missing '[' bracket JSON for deep queue tree -- Key: YARN-2336 URL: https://issues.apache.org/jira/browse/YARN-2336 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.4.1 Reporter: Kenji Kikushima Assignee: Kenji Kikushima Attachments: YARN-2336-2.patch, YARN-2336-3.patch, YARN-2336-4.patch, YARN-2336.patch When we have sub queues in Fair Scheduler, the REST API returns JSON with a missing '[' bracket for childQueues. This issue was found by [~ajisakaa] at YARN-1050. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3404) View the queue name to YARN Application page
[ https://issues.apache.org/jira/browse/YARN-3404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ryu Kobayashi updated YARN-3404: Attachment: screenshot.png View the queue name to YARN Application page Key: YARN-3404 URL: https://issues.apache.org/jira/browse/YARN-3404 Project: Hadoop YARN Issue Type: Improvement Reporter: Ryu Kobayashi Priority: Minor Attachments: screenshot.png We want to display the name of the queue that each application uses on the YARN Application page. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2336) Fair scheduler REST api returns a missing '[' bracket JSON for deep queue tree
[ https://issues.apache.org/jira/browse/YARN-2336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated YARN-2336: Attachment: (was: YARN-2336-4.patch) Fair scheduler REST api returns a missing '[' bracket JSON for deep queue tree -- Key: YARN-2336 URL: https://issues.apache.org/jira/browse/YARN-2336 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.4.1 Reporter: Kenji Kikushima Assignee: Kenji Kikushima Attachments: YARN-2336-2.patch, YARN-2336-3.patch, YARN-2336.patch When we have sub queues in Fair Scheduler, the REST API returns JSON with a missing '[' bracket for childQueues. This issue was found by [~ajisakaa] at YARN-1050. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2336) Fair scheduler REST api returns a missing '[' bracket JSON for deep queue tree
[ https://issues.apache.org/jira/browse/YARN-2336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated YARN-2336: Attachment: YARN-2336-4.patch Rebased for the latest trunk. Fair scheduler REST api returns a missing '[' bracket JSON for deep queue tree -- Key: YARN-2336 URL: https://issues.apache.org/jira/browse/YARN-2336 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.4.1 Reporter: Kenji Kikushima Assignee: Kenji Kikushima Attachments: YARN-2336-2.patch, YARN-2336-3.patch, YARN-2336-4.patch, YARN-2336.patch When we have sub queues in Fair Scheduler, the REST API returns JSON with a missing '[' bracket for childQueues. This issue was found by [~ajisakaa] at YARN-1050. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
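For illustration only (field names abbreviated, not actual scheduler output): the symptom described is that nested childQueues deeper in the tree were emitted without the opening '[', i.e. as consecutive objects instead of an array, which is invalid JSON. The corrected shape keeps childQueues an array at every depth:

```json
{
  "childQueues": [
    {
      "queueName": "root.parent",
      "childQueues": [
        { "queueName": "root.parent.child1" },
        { "queueName": "root.parent.child2" }
      ]
    }
  ]
}
```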
[jira] [Commented] (YARN-3403) Nodemanager dies after a small typo in mapred-site.xml is induced
[ https://issues.apache.org/jira/browse/YARN-3403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14383362#comment-14383362 ] Naganarasimha G R commented on YARN-3403: - Hi [~mnikhil], which version are you testing with ? Nodemanager dies after a small typo in mapred-site.xml is induced - Key: YARN-3403 URL: https://issues.apache.org/jira/browse/YARN-3403 Project: Hadoop YARN Issue Type: Bug Reporter: Nikhil Mulley Priority: Critical Hi, We have noticed that a small typo in an XML config file (mapred-site.xml) can cause the NodeManager to go down completely without anyone stopping/restarting it externally. I find it a little weird that editing the config files on the filesystem can shut down the running slave daemon, the YARN NodeManager. In this case, a closing '/' was missing from an end tag in a property, and that took the NodeManager down in a cluster. Why would the NodeManager reload the configs while it is running? Aren't they picked up when it is started? Even if new configs are picked up dynamically by design, I think an xmllint/config check should run before the NodeManager is asked to reload/restart. --- java.lang.RuntimeException: org.xml.sax.SAXParseException; systemId: file:/etc/hadoop/conf/mapred-site.xml; lineNumber: 228; columnNumber: 3; The element type value must be terminated by the matching end-tag /value. at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2348) --- Please shed light on this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3403) Nodemanager dies after a small typo in mapred-site.xml is induced
[ https://issues.apache.org/jira/browse/YARN-3403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14383285#comment-14383285 ] Nikhil Mulley commented on YARN-3403: - The more stack trace is here: this is reproducible. --- 2015-03-26 20:04:43,690 FATAL org.apache.hadoop.conf.Configuration: error parsing conf mapred-site.xml org.xml.sax.SAXParseException; systemId: file:/etc/hadoop/conf/mapred-site.xml; lineNumber: 316; columnNumber: 3; The element type property must be terminated by the matching end-tag /property. at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:257) at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:347) at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:150) at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2183) at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2171) at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2242) at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2195) at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2112) at org.apache.hadoop.conf.Configuration.get(Configuration.java:858) at org.apache.hadoop.conf.Configuration.getTrimmed(Configuration.java:877) at org.apache.hadoop.conf.Configuration.getBoolean(Configuration.java:1278) at org.apache.hadoop.io.compress.zlib.ZlibFactory.isNativeZlibLoaded(ZlibFactory.java:65) at org.apache.hadoop.io.compress.zlib.ZlibFactory.getZlibCompressorType(ZlibFactory.java:82) at org.apache.hadoop.io.compress.DefaultCodec.getCompressorType(DefaultCodec.java:74) at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:148) at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:163) at org.apache.hadoop.io.file.tfile.Compression$Algorithm.getCompressor(Compression.java:274) at org.apache.hadoop.io.file.tfile.BCFile$Writer$WBlockState.init(BCFile.java:129) 
at org.apache.hadoop.io.file.tfile.BCFile$Writer.prepareDataBlock(BCFile.java:430) at org.apache.hadoop.io.file.tfile.TFile$Writer.initDataBlock(TFile.java:642) at org.apache.hadoop.io.file.tfile.TFile$Writer.prepareAppendKey(TFile.java:533) at org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogWriter.writeVersion(AggregatedLogFormat.java:276) at org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogWriter.init(AggregatedLogFormat.java:272) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.uploadLogsForContainer(AppLogAggregatorImpl.java:108) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.doAppLogAggregation(AppLogAggregatorImpl.java:166) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.run(AppLogAggregatorImpl.java:140) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService$2.run(LogAggregationService.java:354) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) 2015-03-26 20:04:43,691 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl: Aggregation did not complete for application application_1426202183036_103251 2015-03-26 20:04:43,691 ERROR org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread Thread[LogAggregationService #2,5,main] threw an Throwable, but we are shutting down, so ignoring this java.lang.RuntimeException: org.xml.sax.SAXParseException; systemId: file:/etc/hadoop/conf/mapred-site.xml; lineNumber: 316; columnNumber: 3; The element type property must be terminated by the matching end-tag /property. 
-- Nodemanager dies after a small typo in mapred-site.xml is induced - Key: YARN-3403 URL: https://issues.apache.org/jira/browse/YARN-3403 Project: Hadoop YARN Issue Type: Bug Reporter: Nikhil Mulley Priority: Critical Hi, We have noticed that a small typo in an XML config file (mapred-site.xml) can cause the NodeManager to go down completely without anyone stopping/restarting it externally. I find it a little weird that editing the config files on the filesystem can shut down the running slave daemon, the YARN NodeManager. In this case, a closing '/' was missing from an end tag in a property, and that took the NodeManager down in a cluster.
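Nikhil's suggestion, checking well-formedness before acting on a config file, can be sketched with the JDK's own XML parser. This is an illustrative pre-check under assumed semantics, not code from Hadoop's Configuration class or any patch:

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilderFactory;

public class ConfigSanityCheck {
    // Returns true when the XML is well-formed. A missing end tag, such as
    // the unterminated <property>/<value> in this report, surfaces here as
    // a SAXParseException instead of taking the daemon down later.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes("UTF-8")));
            return true;
        } catch (Exception e) { // SAXParseException, parser config, I/O
            return false;
        }
    }

    public static void main(String[] args) {
        String good = "<configuration><property><name>a</name>"
            + "<value>b</value></property></configuration>";
        String bad = "<configuration><property><name>a</name>"
            + "<value>b</property></configuration>"; // missing </value>
        System.out.println(isWellFormed(good));
        System.out.println(isWellFormed(bad));
    }
}
```

A daemon that validates a candidate file this way before reloading can keep serving with its last-known-good configuration when the check fails.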
[jira] [Updated] (YARN-3404) View the queue name to YARN Application page
[ https://issues.apache.org/jira/browse/YARN-3404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ryu Kobayashi updated YARN-3404: Attachment: YARN-3404.1.patch View the queue name to YARN Application page Key: YARN-3404 URL: https://issues.apache.org/jira/browse/YARN-3404 Project: Hadoop YARN Issue Type: Improvement Reporter: Ryu Kobayashi Priority: Minor Attachments: YARN-3404.1.patch, screenshot.png We want to display the name of the queue that each application uses on the YARN Application page. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3334) [Event Producers] NM start to posting some app related metrics in early POC stage of phase 2.
[ https://issues.apache.org/jira/browse/YARN-3334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-3334: - Attachment: YARN-3334-v2.patch [Event Producers] NM start to posting some app related metrics in early POC stage of phase 2. - Key: YARN-3334 URL: https://issues.apache.org/jira/browse/YARN-3334 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: YARN-2928 Reporter: Junping Du Assignee: Junping Du Attachments: YARN-3334-demo.patch, YARN-3334-v1.patch, YARN-3334-v2.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3040) [Data Model] Make putEntities operation be aware of the app's context
[ https://issues.apache.org/jira/browse/YARN-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382191#comment-14382191 ] Junping Du commented on YARN-3040: -- OK. I have commit v6 patch to branch YARN-2928. Thanks [~zjshen] for contributing the patch, and review comments from [~sjlee0], [~vinodkv], [~gtCarrera9], [~kasha] and [~Naganarasimha]! [Data Model] Make putEntities operation be aware of the app's context - Key: YARN-3040 URL: https://issues.apache.org/jira/browse/YARN-3040 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Zhijie Shen Attachments: YARN-3040.1.patch, YARN-3040.2.patch, YARN-3040.3.patch, YARN-3040.4.patch, YARN-3040.5.patch, YARN-3040.6.patch Per design in YARN-2928, implement client-side API for handling *flows*. Frameworks should be able to define and pass in all attributes of flows and flow runs to YARN, and they should be passed into ATS writers. YARN tags were discussed as a way to handle this piece of information. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3044) [Event producers] Implement RM writing app lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14382308#comment-14382308 ] Sangjin Lee commented on YARN-3044: --- {quote} Well, it's not a limitation of the RM timeline collector that I am trying to point out, but that the writer interface is like TimelineWriter.write(TimelineEntities). The writer would not be aware whether the client is writing an ApplicationEntity or an AppAttemptEntity. IIUC, it will just try to write the fields of the TimelineEntity to the storage. Maybe if it's just storing the entity as a JSON object directly it might not be an issue, but that will not be the case in an HBase column storage, right? {quote} I see. So your point is whether the storage implementation can recognize different entity types and act accordingly? If so, the answer is yes. The storage implementation can easily introspect the type of the entity and do the right thing based on the type if needed. + [~zjshen] [Event producers] Implement RM writing app lifecycle events to ATS -- Key: YARN-3044 URL: https://issues.apache.org/jira/browse/YARN-3044 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Naganarasimha G R Attachments: YARN-3044.20150325-1.patch Per design in YARN-2928, implement RM writing app lifecycle events to ATS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
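Sangjin's point, that the storage layer can introspect the concrete entity type and branch on it, can be sketched as follows. The class names are stand-ins for the real timeline data model, not the actual YARN-2928 API:

```java
public class EntityDispatchDemo {
    // Hypothetical stand-ins for the timeline data model.
    static class TimelineEntity {
        final String type;
        TimelineEntity(String type) { this.type = type; }
    }
    static class ApplicationEntity extends TimelineEntity {
        ApplicationEntity() { super("YARN_APPLICATION"); }
    }
    static class ContainerEntity extends TimelineEntity {
        ContainerEntity() { super("YARN_CONTAINER"); }
    }

    // A writer receiving the generic supertype can still dispatch on the
    // concrete subclass (or on the declared type string) to pick, say, a
    // different HBase table or column family per entity kind.
    static String route(TimelineEntity e) {
        if (e instanceof ApplicationEntity) return "application-table";
        if (e instanceof ContainerEntity) return "container-table";
        return "generic-table";
    }
}
```

So a `TimelineWriter.write(TimelineEntities)`-shaped interface does not by itself lose type information; the loss would only occur if the writer flattened every entity identically.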
[jira] [Commented] (YARN-3040) [Data Model] Make putEntities operation be aware of the app's context
[ https://issues.apache.org/jira/browse/YARN-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382226#comment-14382226 ] Sangjin Lee commented on YARN-3040: --- Thanks much [~zjshen]! [Data Model] Make putEntities operation be aware of the app's context - Key: YARN-3040 URL: https://issues.apache.org/jira/browse/YARN-3040 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Zhijie Shen Attachments: YARN-3040.1.patch, YARN-3040.2.patch, YARN-3040.3.patch, YARN-3040.4.patch, YARN-3040.5.patch, YARN-3040.6.patch Per design in YARN-2928, implement client-side API for handling *flows*. Frameworks should be able to define and pass in all attributes of flows and flow runs to YARN, and they should be passed into ATS writers. YARN tags were discussed as a way to handle this piece of information. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3334) [Event Producers] NM TimelineClient life cycle handling and container metrics posting to new timeline service.
[ https://issues.apache.org/jira/browse/YARN-3334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-3334: - Description: After YARN-3039, we have service discovery mechanism to pass app-collector service address among collectors, NMs and RM. In this JIRA, we will handle service address setting for TimelineClients in NodeManager, and put container metrics to the backend storage. [Event Producers] NM TimelineClient life cycle handling and container metrics posting to new timeline service. -- Key: YARN-3334 URL: https://issues.apache.org/jira/browse/YARN-3334 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: YARN-2928 Reporter: Junping Du Assignee: Junping Du Attachments: YARN-3334-demo.patch, YARN-3334-v1.patch, YARN-3334-v2.patch After YARN-3039, we have service discovery mechanism to pass app-collector service address among collectors, NMs and RM. In this JIRA, we will handle service address setting for TimelineClients in NodeManager, and put container metrics to the backend storage. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests
[ https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14382333#comment-14382333 ] Jian Fang commented on YARN-796: Coming back to this issue again since I am trying to merge the latest YARN-796 into our Hadoop code base. It seems one thing is missing: how do we specify the labels for application masters? The application master is special: it is the task manager of a specific YARN application. It also has some special requirements for its allocation on a Hadoop cluster running in the cloud. For example, on Amazon EC2, we do not want any application masters to be launched on spot instances if we have both spot and on-demand instances available. YARN-796 should provide a mechanism to achieve this goal. Allow for (admin) labels on nodes and resource-requests --- Key: YARN-796 URL: https://issues.apache.org/jira/browse/YARN-796 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.4.1 Reporter: Arun C Murthy Assignee: Wangda Tan Attachments: LabelBasedScheduling.pdf, Node-labels-Requirements-Design-doc-V1.pdf, Node-labels-Requirements-Design-doc-V2.pdf, Non-exclusive-Node-Partition-Design.pdf, YARN-796-Diagram.pdf, YARN-796.node-label.consolidate.1.patch, YARN-796.node-label.consolidate.10.patch, YARN-796.node-label.consolidate.11.patch, YARN-796.node-label.consolidate.12.patch, YARN-796.node-label.consolidate.13.patch, YARN-796.node-label.consolidate.14.patch, YARN-796.node-label.consolidate.2.patch, YARN-796.node-label.consolidate.3.patch, YARN-796.node-label.consolidate.4.patch, YARN-796.node-label.consolidate.5.patch, YARN-796.node-label.consolidate.6.patch, YARN-796.node-label.consolidate.7.patch, YARN-796.node-label.consolidate.8.patch, YARN-796.node-label.demo.patch.1, YARN-796.patch, YARN-796.patch4 It will be useful for admins to specify labels for nodes. Examples of labels are OS, processor architecture etc. 
We should expose these labels and allow applications to specify labels on resource-requests. Obviously we need to support admin operations on adding/removing node labels. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3334) [Event Producers] NM start to posting some app related metrics in early POC stage of phase 2.
[ https://issues.apache.org/jira/browse/YARN-3334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382171#comment-14382171 ] Junping Du commented on YARN-3334: -- Thanks [~zjshen] for the review and comments! In v2, I incorporated all of your comments above except one: replacing TimelineEntity with ContainerEntity. I agree that the latter sounds better. However, the test cannot pass locally if we replace {code} TimelineEntity entity = new TimelineEntity(); entity.setId(containerId.toString()); entity.setType(TimelineEntityType.YARN_CONTAINER.toString()); {code} with: {code} ContainerEntity entity = new ContainerEntity(); entity.setId(containerId.toString()); {code} Do we expect some extra info to be necessary to set on ContainerEntity? If not, I suspect some bug (NPE, etc.) could be hidden in putEntity for ContainerEntity. If so, can we fix it separately? I added a TODO here though. [Event Producers] NM start to posting some app related metrics in early POC stage of phase 2. - Key: YARN-3334 URL: https://issues.apache.org/jira/browse/YARN-3334 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: YARN-2928 Reporter: Junping Du Assignee: Junping Du Attachments: YARN-3334-demo.patch, YARN-3334-v1.patch, YARN-3334-v2.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
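The substitution discussed in the comment above rests on the idea that a typed subclass pins the entity type in its constructor, so only the id still needs to be set. A minimal, self-contained sketch of that pattern (simplified classes, not Hadoop's actual TimelineEntity/ContainerEntity):

```java
// Sketch of the pattern under discussion: a generic entity with a settable
// type vs. a subclass that fixes the type up front. Class and type names are
// simplified stand-ins for the real Hadoop timeline classes.
public class EntitySketch {
    static class TimelineEntity {
        private String id;
        private String type;
        public void setId(String id) { this.id = id; }
        public void setType(String type) { this.type = type; }
        public String getId() { return id; }
        public String getType() { return type; }
    }

    // Mirrors ContainerEntity: the type is set in the constructor, so
    // forgetting setType can no longer leave the type null.
    static class ContainerEntity extends TimelineEntity {
        ContainerEntity() { setType("YARN_CONTAINER"); }
    }

    public static void main(String[] args) {
        TimelineEntity generic = new TimelineEntity();
        generic.setId("container_1_0001_01_000001");
        generic.setType("YARN_CONTAINER");

        ContainerEntity specific = new ContainerEntity();
        specific.setId("container_1_0001_01_000001");

        // From a writer's point of view both carry the same id and type; if
        // the subclass form fails in putEntity, the writer likely depends on
        // something else, as the comment suspects.
        System.out.println(generic.getType().equals(specific.getType())); // true
    }
}
```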
[jira] [Commented] (YARN-3040) [Data Model] Make putEntities operation be aware of the app's context
[ https://issues.apache.org/jira/browse/YARN-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382152#comment-14382152 ] Zhijie Shen commented on YARN-3040: --- Sure, let's return null for now. [Data Model] Make putEntities operation be aware of the app's context - Key: YARN-3040 URL: https://issues.apache.org/jira/browse/YARN-3040 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Zhijie Shen Attachments: YARN-3040.1.patch, YARN-3040.2.patch, YARN-3040.3.patch, YARN-3040.4.patch, YARN-3040.5.patch, YARN-3040.6.patch Per design in YARN-2928, implement client-side API for handling *flows*. Frameworks should be able to define and pass in all attributes of flows and flow runs to YARN, and they should be passed into ATS writers. YARN tags were discussed as a way to handle this piece of information. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3044) [Event producers] Implement RM writing app lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382343#comment-14382343 ] Naganarasimha G R commented on YARN-3044: - [~sjlee0], bq. I see. So your point is whether the storage implementation can recognize different entity types and act accordingly? If so, the answer is yes. The storage implementation can easily introspect the type of the entity and do the right thing based on the type if needed. Well, if introspection means checking TimelineEntity.getType and then casting to the specific TimelineEntity subclass, it can break if the client/AM tries to post a plain TimelineEntity with its type set to TimelineEntityType.YARN_APPLICATION or another system entity type. And other approaches, like checking with {{instanceof}} or the like, sound inappropriate. [Event producers] Implement RM writing app lifecycle events to ATS -- Key: YARN-3044 URL: https://issues.apache.org/jira/browse/YARN-3044 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Naganarasimha G R Attachments: YARN-3044.20150325-1.patch Per design in YARN-2928, implement RM writing app lifecycle events to ATS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
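The hazard raised in the comment above can be shown in a self-contained sketch (simplified names, not the actual Hadoop classes): the type string on a plain entity can claim to be a system type, so a storage implementation that trusts getType() alone and then casts would fail, while an instanceof check distinguishes the cases:

```java
// Demonstrates why getType() + cast is unsafe when a client can post a plain
// entity carrying a system type string. Classes are illustrative stand-ins.
public class IntrospectionHazard {
    static class TimelineEntity {
        private String type;
        void setType(String t) { type = t; }
        String getType() { return type; }
    }

    static class ApplicationEntity extends TimelineEntity {
        ApplicationEntity() { setType("YARN_APPLICATION"); }
    }

    // A cast guarded by the actual runtime class, not the type string.
    static boolean safeToCastToApplication(TimelineEntity e) {
        return e instanceof ApplicationEntity;
    }

    public static void main(String[] args) {
        TimelineEntity spoofed = new TimelineEntity();
        spoofed.setType("YARN_APPLICATION"); // looks like a system entity
        System.out.println(safeToCastToApplication(spoofed));               // false
        System.out.println(safeToCastToApplication(new ApplicationEntity())); // true
    }
}
```

This is exactly the gap YARN-3401 is about: as long as the base class is instantiable with an arbitrary type, the type string and the runtime class can disagree.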
[jira] [Commented] (YARN-3047) [Data Serving] Set up ATS reader with basic request serving structure and lifecycle
[ https://issues.apache.org/jira/browse/YARN-3047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382352#comment-14382352 ] Varun Saxena commented on YARN-3047: [~zjshen], the patch YARN-3047.04.patch applies for me using {{patch -p0}}. I had updated to the latest code as well. May I know where it is failing for you? [Data Serving] Set up ATS reader with basic request serving structure and lifecycle --- Key: YARN-3047 URL: https://issues.apache.org/jira/browse/YARN-3047 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Varun Saxena Attachments: Timeline_Reader(draft).pdf, YARN-3047.001.patch, YARN-3047.003.patch, YARN-3047.02.patch, YARN-3047.04.patch Per design in YARN-2928, set up the ATS reader as a service and implement its basic structure. It includes lifecycle management, request serving, and so on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)
[ https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382299#comment-14382299 ] Jian Fang commented on YARN-2495: - In a cloud environment such as Amazon EMR, a Hadoop cluster is launched as a service by a single command line. There is no admin at all and everything is automated. The labels are basically of two types. One is static: for example, the nature of an EC2 instance such as spot or on-demand. The other is dynamic: for example, the cluster controller process can mark an instance as a candidate for termination during a graceful shrink so that the resource manager will not assign new tasks to it. Most likely, the labels specified from each NM are static and are provided by a cluster controller process that writes them into yarn-site.xml based on the EC2 metadata available on each EC2 instance. As a result, you should at least define a static label provider (plus a dynamic label provider? not sure) so that these labels are only sent to the resource manager at NM registration time. There is no point in adding the static labels to each heartbeat. I think the ideas of central and distributed label configuration are not ideal for a cloud environment. Usually we have a mix of static labels from each node and dynamic labels that are specified against the resource manager directly. Static and dynamic label concepts are more appropriate, at least for Amazon EMR.
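The "static label provider" idea described above could be sketched as follows: derive the node's label once from instance metadata and report it only at NM registration. This is a hypothetical illustration, not Hadoop code; the metadata value would come from the EC2 instance metadata service in practice, and the label names are made up:

```java
import java.util.Collections;
import java.util.Set;

// Hypothetical static label provider: maps instance metadata (spot vs.
// on-demand) to a node label, computed once at NM registration time rather
// than on every heartbeat.
public class StaticLabelProvider {
    public static Set<String> labelsFor(String instanceLifecycle) {
        if ("spot".equalsIgnoreCase(instanceLifecycle)) {
            return Collections.singleton("SPOT");
        }
        return Collections.singleton("ON_DEMAND");
    }

    public static void main(String[] args) {
        // In a real provider, the argument would be fetched from EC2 metadata.
        System.out.println(labelsFor("spot"));       // [SPOT]
        System.out.println(labelsFor("on-demand"));  // [ON_DEMAND]
    }
}
```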
Allow admin specify labels from each NM (Distributed configuration) --- Key: YARN-2495 URL: https://issues.apache.org/jira/browse/YARN-2495 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Naganarasimha G R Attachments: YARN-2495.20141023-1.patch, YARN-2495.20141024-1.patch, YARN-2495.20141030-1.patch, YARN-2495.20141031-1.patch, YARN-2495.20141119-1.patch, YARN-2495.20141126-1.patch, YARN-2495.20141204-1.patch, YARN-2495.20141208-1.patch, YARN-2495.20150305-1.patch, YARN-2495.20150309-1.patch, YARN-2495.20150318-1.patch, YARN-2495.20150320-1.patch, YARN-2495.20150321-1.patch, YARN-2495.20150324-1.patch, YARN-2495_20141022.1.patch Target of this JIRA is to allow admin specify labels in each NM, this covers - User can set labels in each NM (by setting yarn-site.xml (YARN-2923) or using script suggested by [~aw] (YARN-2729) ) - NM will send labels to RM via ResourceTracker API - RM will set labels in NodeLabelManager when NM register/update labels -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3388) userlimit isn't playing well with DRF calculator
[ https://issues.apache.org/jira/browse/YARN-3388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nathan Roberts updated YARN-3388: - Attachment: YARN-3388-v0.patch Initial patch for comments on approach. Seems to work well in basic testing on 2.6. I don't know how this interacts with label support + userlimit which I think is still lacking in some cases anyway. Hoping [~leftnoteasy] and others can comment. userlimit isn't playing well with DRF calculator Key: YARN-3388 URL: https://issues.apache.org/jira/browse/YARN-3388 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 2.6.0 Reporter: Nathan Roberts Assignee: Nathan Roberts Attachments: YARN-3388-v0.patch When there are multiple active users in a queue, it should be possible for those users to make use of capacity up-to max_capacity (or close). The resources should be fairly distributed among the active users in the queue. This works pretty well when there is a single resource being scheduled. However, when there are multiple resources the situation gets more complex and the current algorithm tends to get stuck at Capacity. Example illustrated in subsequent comment. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
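For background on the interaction described above, a minimal sketch of how a Dominant Resource Fairness calculator derives a user's dominant share (illustrative arithmetic only, not the CapacityScheduler's DominantResourceCalculator): the dominant share is the largest per-resource utilization fraction, and the user-limit computation compares these shares across active users.

```java
// Minimal DRF dominant-share computation: for each resource type, take the
// user's usage as a fraction of the cluster total, and keep the maximum.
public class DominantShare {
    public static double dominantShare(double[] used, double[] clusterTotal) {
        double max = 0.0;
        for (int i = 0; i < used.length; i++) {
            max = Math.max(max, used[i] / clusterTotal[i]);
        }
        return max;
    }

    public static void main(String[] args) {
        // A user holding 8 GB of 100 GB memory and 4 of 50 vcores has
        // per-resource shares 0.08 and 0.08, so the dominant share is 0.08.
        System.out.println(dominantShare(new double[]{8, 4},
                                         new double[]{100, 50}));
        // A memory-heavy user: 30 GB of 100 GB but only 2 of 50 vcores is
        // dominated by memory (0.30 vs 0.04).
        System.out.println(dominantShare(new double[]{30, 2},
                                         new double[]{100, 50}));
    }
}
```

When users have different dominant resources, comparing only one resource (or only capacity) can leave the queue unable to grow toward max_capacity, which is the symptom this issue reports.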
[jira] [Commented] (YARN-3401) [Data Model] users should not be able to create a generic TimelineEntity and associate arbitrary type
[ https://issues.apache.org/jira/browse/YARN-3401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382575#comment-14382575 ] Sangjin Lee commented on YARN-3401: --- Thanks for reminding me of that discussion. Yes, we definitely discussed that, and we said that only YARN daemons are allowed to post system entities. If any non-YARN daemons (e.g. AMs, clients, tasks, etc.) try to post YARN system entities they would be rejected. That said, they can still refer to a YARN system entity. For example, if you're an MR AM then you might refer to the container id to post metrics for the container in which your tasks are running. So we need to be precise exactly what is disallowed. bq. if so if we add a check @ Timelineclient will it impact NM from posting container metrics entities ? NM is a YARN daemon, so it should be able to post container metrics and entities with no issues. [Data Model] users should not be able to create a generic TimelineEntity and associate arbitrary type - Key: YARN-3401 URL: https://issues.apache.org/jira/browse/YARN-3401 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Naganarasimha G R IIUC it is possible for users to create a generic TimelineEntity and set an arbitrary entity type. For example, for a YARN app, the right entity API is ApplicationEntity. However, today nothing stops users from instantiating a base TimelineEntity class and set the application type on it. This presents a problem in handling these YARN system entities in the storage layer for example. We need to ensure that the API allows only the right type of the class to be created for a given entity type. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
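The policy sketched in the comment above (only YARN daemons may post system entities, while anyone may refer to them) could look like the following guard. Names and the reserved-type list are illustrative, not the actual TimelineClient code:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical guard for posting timeline entities: a non-daemon caller is
// rejected only when the entity's type collides with a reserved YARN system
// type. Referring to a system entity's id (e.g. a container id) is unaffected,
// since that never goes through this check.
public class SystemTypeGuard {
    static final Set<String> RESERVED = new HashSet<>(Arrays.asList(
        "YARN_CLUSTER", "YARN_APPLICATION",
        "YARN_APPLICATION_ATTEMPT", "YARN_CONTAINER")); // illustrative subset

    public static boolean shouldReject(String entityType, boolean isYarnDaemon) {
        return RESERVED.contains(entityType) && !isYarnDaemon;
    }

    public static void main(String[] args) {
        System.out.println(shouldReject("YARN_CONTAINER", false)); // true: AM/client blocked
        System.out.println(shouldReject("YARN_CONTAINER", true));  // false: NM is a daemon
        System.out.println(shouldReject("MAPREDUCE_TASK", false)); // false: framework type ok
    }
}
```

Under such a check, the NM posting container metrics is unaffected, which answers the question quoted in the comment.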
[jira] [Commented] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)
[ https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382506#comment-14382506 ] Allen Wittenauer commented on YARN-2495: That's effectively what the executable interface is for Allow admin specify labels from each NM (Distributed configuration) --- Key: YARN-2495 URL: https://issues.apache.org/jira/browse/YARN-2495 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Naganarasimha G R Attachments: YARN-2495.20141023-1.patch, YARN-2495.20141024-1.patch, YARN-2495.20141030-1.patch, YARN-2495.20141031-1.patch, YARN-2495.20141119-1.patch, YARN-2495.20141126-1.patch, YARN-2495.20141204-1.patch, YARN-2495.20141208-1.patch, YARN-2495.20150305-1.patch, YARN-2495.20150309-1.patch, YARN-2495.20150318-1.patch, YARN-2495.20150320-1.patch, YARN-2495.20150321-1.patch, YARN-2495.20150324-1.patch, YARN-2495_20141022.1.patch Target of this JIRA is to allow admin specify labels in each NM, this covers - User can set labels in each NM (by setting yarn-site.xml (YARN-2923) or using script suggested by [~aw] (YARN-2729) ) - NM will send labels to RM via ResourceTracker API - RM will set labels in NodeLabelManager when NM register/update labels -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)
[ https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382555#comment-14382555 ] Jian Fang commented on YARN-2495: - Great, thanks. Will try them. Allow admin specify labels from each NM (Distributed configuration) --- Key: YARN-2495 URL: https://issues.apache.org/jira/browse/YARN-2495 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Naganarasimha G R Attachments: YARN-2495.20141023-1.patch, YARN-2495.20141024-1.patch, YARN-2495.20141030-1.patch, YARN-2495.20141031-1.patch, YARN-2495.20141119-1.patch, YARN-2495.20141126-1.patch, YARN-2495.20141204-1.patch, YARN-2495.20141208-1.patch, YARN-2495.20150305-1.patch, YARN-2495.20150309-1.patch, YARN-2495.20150318-1.patch, YARN-2495.20150320-1.patch, YARN-2495.20150321-1.patch, YARN-2495.20150324-1.patch, YARN-2495_20141022.1.patch Target of this JIRA is to allow admin specify labels in each NM, this covers - User can set labels in each NM (by setting yarn-site.xml (YARN-2923) or using script suggested by [~aw] (YARN-2729) ) - NM will send labels to RM via ResourceTracker API - RM will set labels in NodeLabelManager when NM register/update labels -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)
[ https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382389#comment-14382389 ] Wangda Tan commented on YARN-2495: -- Hi [~john.jian.fang], Thanks for your comments. I'm not sure I completely understood what you said. Did you mean there are two different types of labels on NMs: some labels that are not changed in the NM's lifetime, and some labels that could be modified while the NM is running? (I think the decommission case you provided is better resolved by graceful NM decommission instead of node labels.) Having a centralized node label list is mostly for resource planning; you can take a look at the conversations on YARN-3214 for more details about resource planning. Regardless of the centralized node label list on the RM side, I think the current implementation in the attached patch should work for you. Even though labels could be modified via heartbeat, you can simply not change them in your own script; if there are no changes to the NM's labels, no duplicated data will be sent to the RM side.
Wangda Allow admin specify labels from each NM (Distributed configuration) --- Key: YARN-2495 URL: https://issues.apache.org/jira/browse/YARN-2495 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Naganarasimha G R Attachments: YARN-2495.20141023-1.patch, YARN-2495.20141024-1.patch, YARN-2495.20141030-1.patch, YARN-2495.20141031-1.patch, YARN-2495.20141119-1.patch, YARN-2495.20141126-1.patch, YARN-2495.20141204-1.patch, YARN-2495.20141208-1.patch, YARN-2495.20150305-1.patch, YARN-2495.20150309-1.patch, YARN-2495.20150318-1.patch, YARN-2495.20150320-1.patch, YARN-2495.20150321-1.patch, YARN-2495.20150324-1.patch, YARN-2495_20141022.1.patch Target of this JIRA is to allow admin specify labels in each NM, this covers - User can set labels in each NM (by setting yarn-site.xml (YARN-2923) or using script suggested by [~aw] (YARN-2729) ) - NM will send labels to RM via ResourceTracker API - RM will set labels in NodeLabelManager when NM register/update labels -- This message was sent by Atlassian JIRA (v6.3.4#6332)
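The "no duplicated data" behavior described in the comment above amounts to the NM remembering the last label set it reported and sending labels only when they differ. A self-contained sketch of that logic (illustrative only, not the actual NodeStatusUpdater code):

```java
import java.util.HashSet;
import java.util.Set;

// Tracks the last label set reported to the RM; a heartbeat carries labels
// only when they changed since the previous report.
public class LabelChangeTracker {
    private Set<String> lastReported = null;

    // Returns the labels to include in this heartbeat, or null if unchanged.
    public Set<String> labelsToSend(Set<String> current) {
        if (current.equals(lastReported)) {
            return null; // unchanged: omit labels from this heartbeat
        }
        lastReported = new HashSet<>(current); // defensive copy
        return lastReported;
    }
}
```

So a script that returns the same static labels every time incurs no extra heartbeat payload after the first report, which is the point Wangda makes above.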
[jira] [Commented] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)
[ https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382388#comment-14382388 ] Naganarasimha G R commented on YARN-2495: - Hi [~john.jian.fang], Well, this JIRA is followed by YARN-2729, wherein labels obtained from the script are passed as part of the heartbeat, which makes the distributed label configuration dynamic. Also, as part of this JIRA we have tried to ensure that labels are sent only when there is a change; static labels are not re-sent on each heartbeat. And for your case, if the cluster controller process wants to label a node for a graceful shrink, it can be done in 2 ways: * Use the REST API and change the label of the node to some unique label which is not visible to other users * After YARN-2729, you could have a script with appropriate logic to update the RM with some unique label when the node wants to shrink itself gracefully. Hope I have addressed your scenario Allow admin specify labels from each NM (Distributed configuration) --- Key: YARN-2495 URL: https://issues.apache.org/jira/browse/YARN-2495 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Naganarasimha G R Attachments: YARN-2495.20141023-1.patch, YARN-2495.20141024-1.patch, YARN-2495.20141030-1.patch, YARN-2495.20141031-1.patch, YARN-2495.20141119-1.patch, YARN-2495.20141126-1.patch, YARN-2495.20141204-1.patch, YARN-2495.20141208-1.patch, YARN-2495.20150305-1.patch, YARN-2495.20150309-1.patch, YARN-2495.20150318-1.patch, YARN-2495.20150320-1.patch, YARN-2495.20150321-1.patch, YARN-2495.20150324-1.patch, YARN-2495_20141022.1.patch Target of this JIRA is to allow admin specify labels in each NM, this covers - User can set labels in each NM (by setting yarn-site.xml (YARN-2923) or using script suggested by [~aw] (YARN-2729) ) - NM will send labels to RM via ResourceTracker API - RM will set labels in NodeLabelManager when NM register/update labels --
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2740) ResourceManager side should properly handle node label modifications when distributed node label configuration enabled
[ https://issues.apache.org/jira/browse/YARN-2740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-2740: Description: According to YARN-2495, when distributed node label configuration is enabled: - RMAdmin / REST API should reject change labels on node operations. - CommonNodeLabelsManager shouldn't persist labels on nodes when NM do heartbeat. was: According to YARN-2495, when distributed node label configuration is enabled: - RMAdmin / REST API should reject change labels on node operations. - RMNodeLabelsManager shouldn't persistent labels on nodes when NM do heartbeat. ResourceManager side should properly handle node label modifications when distributed node label configuration enabled -- Key: YARN-2740 URL: https://issues.apache.org/jira/browse/YARN-2740 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Naganarasimha G R Attachments: YARN-2740-20141024-1.patch, YARN-2740.20150320-1.patch According to YARN-2495, when distributed node label configuration is enabled: - RMAdmin / REST API should reject change labels on node operations. - CommonNodeLabelsManager shouldn't persist labels on nodes when NM do heartbeat. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
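The first bullet of this issue's description (RMAdmin / REST API should reject label changes when distributed configuration is enabled) reduces to a guard of roughly the following shape. This is a sketch under that assumption, not the actual RM admin-service code:

```java
import java.io.IOException;

// Hypothetical RM-side guard: when node labels come from the NMs themselves
// (distributed configuration), centrally-initiated replace-labels-on-node
// operations are refused so the two sources cannot conflict.
public class DistributedConfigGuard {
    private final boolean distributedNodeLabelConfig;

    public DistributedConfigGuard(boolean distributed) {
        this.distributedNodeLabelConfig = distributed;
    }

    public void checkReplaceLabelsAllowed() throws IOException {
        if (distributedNodeLabelConfig) {
            throw new IOException(
                "Replacing labels on nodes via RMAdmin/REST is not allowed "
                + "when distributed node label configuration is enabled");
        }
    }
}
```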
[jira] [Updated] (YARN-2740) ResourceManager side should properly handle node label modifications when distributed node label configuration enabled
[ https://issues.apache.org/jira/browse/YARN-2740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-2740: Attachment: YARN-2740.20150327-1.patch Hi [~wangda], Have rebased the patch and updated the patch to handle the second scenario {{CommonNodeLabelsManager shouldn't persist labels on nodes when NM do heartbeat.}} ResourceManager side should properly handle node label modifications when distributed node label configuration enabled -- Key: YARN-2740 URL: https://issues.apache.org/jira/browse/YARN-2740 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Naganarasimha G R Attachments: YARN-2740-20141024-1.patch, YARN-2740.20150320-1.patch, YARN-2740.20150327-1.patch According to YARN-2495, when distributed node label configuration is enabled: - RMAdmin / REST API should reject change labels on node operations. - CommonNodeLabelsManager shouldn't persist labels on nodes when NM do heartbeat. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2618) Avoid over-allocation of disk resources
[ https://issues.apache.org/jira/browse/YARN-2618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382181#comment-14382181 ] Hadoop QA commented on YARN-2618: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12707506/YARN-2618-6.patch against trunk revision 2228456. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 20 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. 
The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.client.TestResourceTrackerOnHA org.apache.hadoop.yarn.client.api.impl.TestYarnClient org.apache.hadoop.yarn.client.TestResourceManagerAdministrationProtocolPBClientImpl org.apache.hadoop.yarn.client.api.impl.TestAMRMClient org.apache.hadoop.yarn.client.TestGetGroups org.apache.hadoop.yarn.client.api.impl.TestNMClient org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA org.apache.hadoop.yarn.client.TestApplicationMasterServiceProtocolOnHA org.apache.hadoop.yarn.client.TestRMFailover org.apache.hadoop.yarn.server.resourcemanager.TestRMHA org.apache.hadoop.yarn.server.resourcemanager.TestRMAdminService org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestApplicationLimits org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestLeafQueue org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebappAuthentication org.apache.hadoop.yarn.server.resourcemanager.TestMoveApplication org.apache.hadoop.yarn.server.resourcemanager.TestRM org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler org.apache.hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStore Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7116//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7116//console This message is automatically generated. 
Avoid over-allocation of disk resources --- Key: YARN-2618 URL: https://issues.apache.org/jira/browse/YARN-2618 Project: Hadoop YARN Issue Type: Sub-task Reporter: Wei Yan Assignee: Wei Yan Attachments: YARN-2618-1.patch, YARN-2618-2.patch, YARN-2618-3.patch, YARN-2618-4.patch, YARN-2618-5.patch, YARN-2618-6.patch Subtask of YARN-2139. This should include - Add API support for introducing disk I/O as the 3rd type resource. - NM should report this information to the RM - RM should consider this to avoid over-allocation -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3378) a load test client that can replay a volume of history files
[ https://issues.apache.org/jira/browse/YARN-3378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382435#comment-14382435 ] Sangjin Lee commented on YARN-3378: --- cc [~jeagles], [~lichangleo] I'm working on this based on what you have on YARN-2556, with the major differences being - write it against the v.2 API (obviously) - add an ability to replay things like a bunch of history files to generate more realistic and non-trivial entities and data We'll also look into benchmarks more appropriate for the v.2 work, as Li mentioned. We need a little bit of discussion on how this will proceed in parallel with YARN-2556. I'm taking the latest patch on YARN-2556 as the basis. Should we go ahead and commit the work done in YARN-2556 first? Thoughts? a load test client that can replay a volume of history files Key: YARN-3378 URL: https://issues.apache.org/jira/browse/YARN-3378 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Sangjin Lee It might be good to create a load test client that can replay a large volume of history files into the timeline service. One can envision running such a load test client as a mapreduce job to generate a fair amount of load. It would be useful to spot check correctness and, more importantly, observe performance characteristics. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
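The replay idea above boils down to iterating over a directory of history files, pushing each through a writer, and timing the run for throughput. A minimal sketch with a stand-in writer interface (the real client would parse each file and post entities via the v.2 API):

```java
import java.io.File;

// Skeleton of a history-file replay loop for load testing. EntityWriter is a
// hypothetical stand-in for the component that parses one history file and
// posts its entities to the timeline service.
public class HistoryReplaySketch {
    interface EntityWriter { void write(File historyFile); }

    // Replays every file in dir and returns the elapsed nanoseconds, from
    // which a files-per-second throughput figure can be derived.
    public static long replay(File dir, EntityWriter writer) {
        long start = System.nanoTime();
        File[] files = dir.listFiles();
        if (files != null) {
            for (File f : files) {
                writer.write(f); // parse + post entities for one history file
            }
        }
        return System.nanoTime() - start;
    }
}
```

Run as a MapReduce job, each mapper would execute this loop over its own shard of history files to generate load in parallel.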
[jira] [Created] (YARN-3401) [Data Model] users should not be able to create a generic TimelineEntity and associate arbitrary type
Sangjin Lee created YARN-3401: - Summary: [Data Model] users should not be able to create a generic TimelineEntity and associate arbitrary type Key: YARN-3401 URL: https://issues.apache.org/jira/browse/YARN-3401 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee IIUC it is possible for users to create a generic TimelineEntity and set an arbitrary entity type. For example, for a YARN app, the right entity API is ApplicationEntity. However, today nothing stops users from instantiating a base TimelineEntity class and set the application type on it. This presents a problem in handling these YARN system entities in the storage layer for example. We need to ensure that the API allows only the right type of the class to be created for a given entity type. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3400) [JDK 8] Build Failure due to unreported exceptions in RPCUtil
[ https://issues.apache.org/jira/browse/YARN-3400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382371#comment-14382371 ] Hudson commented on YARN-3400: -- FAILURE: Integrated in Hadoop-trunk-Commit #7441 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7441/]) YARN-3400. [JDK 8] Build Failure due to unreported exceptions in RPCUtil (rkanter) (rkanter: rev 87130bf6b22f538c5c26ad5cef984558a8117798) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ipc/RPCUtil.java [JDK 8] Build Failure due to unreported exceptions in RPCUtil -- Key: YARN-3400 URL: https://issues.apache.org/jira/browse/YARN-3400 Project: Hadoop YARN Issue Type: Bug Reporter: Robert Kanter Assignee: Robert Kanter Fix For: 2.8.0 Attachments: YARN-3400.patch When I try compiling Hadoop with JDK 8 like this {noformat} mvn clean package -Pdist -Dtar -DskipTests -Djavac.version=1.8 {noformat} I get this error: {noformat} [ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) on project hadoop-yarn-common: Compilation failure: Compilation failure: [ERROR] /Users/rkanter/dev/hadoop-common2/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ipc/RPCUtil.java:[101,11] unreported exception java.lang.Throwable; must be caught or declared to be thrown [ERROR] /Users/rkanter/dev/hadoop-common2/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ipc/RPCUtil.java:[104,11] unreported exception java.lang.Throwable; must be caught or declared to be thrown [ERROR] /Users/rkanter/dev/hadoop-common2/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ipc/RPCUtil.java:[107,11] unreported exception java.lang.Throwable; must be caught or declared to be thrown {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3044) [Event producers] Implement RM writing app lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382372#comment-14382372 ] Sangjin Lee commented on YARN-3044: --- YARN-3401 [Event producers] Implement RM writing app lifecycle events to ATS -- Key: YARN-3044 URL: https://issues.apache.org/jira/browse/YARN-3044 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Naganarasimha G R Attachments: YARN-3044.20150325-1.patch Per design in YARN-2928, implement RM writing app lifecycle events to ATS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests
[ https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382404#comment-14382404 ] Wangda Tan commented on YARN-796: - [~john.jian.fang], The patch attached to this JIRA is stale; instead you should merge the patches under YARN-2492. For more usage info, you can take a look at http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.2.0/YARN_RM_v22/node_labels/index.html#Item1.1. Specifically to your question, we now support 4 ways to specify labels for applications (CapacityScheduler only for now): 1) Specify default-node-label-expression on each queue; all containers under the queue will be assigned to the specified label 2) Specify ApplicationSubmissionContext.appLabelExpression; all containers under the app will be assigned to the specified label 3) Specify ApplicationSubmissionContext.amContainerLabelExpression; the AM container will be assigned to the specified label 4) Specify ResourceRequest.nodeLabelExpression; individual containers will be assigned to the specified label. Let me know if you have more questions.
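As a concrete illustration of option 1 above, the queue-level default follows the yarn.scheduler.capacity.&lt;queue-path&gt; property convention in capacity-scheduler.xml. The queue name "batch" and label "ON_DEMAND" here are made-up examples:

```xml
<property>
  <name>yarn.scheduler.capacity.root.batch.default-node-label-expression</name>
  <value>ON_DEMAND</value>
  <description>
    Containers for applications submitted to root.batch are placed on nodes
    carrying the ON_DEMAND label unless the request specifies otherwise.
  </description>
</property>
```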
Allow for (admin) labels on nodes and resource-requests --- Key: YARN-796 URL: https://issues.apache.org/jira/browse/YARN-796 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.4.1 Reporter: Arun C Murthy Assignee: Wangda Tan Attachments: LabelBasedScheduling.pdf, Node-labels-Requirements-Design-doc-V1.pdf, Node-labels-Requirements-Design-doc-V2.pdf, Non-exclusive-Node-Partition-Design.pdf, YARN-796-Diagram.pdf, YARN-796.node-label.consolidate.1.patch, YARN-796.node-label.consolidate.10.patch, YARN-796.node-label.consolidate.11.patch, YARN-796.node-label.consolidate.12.patch, YARN-796.node-label.consolidate.13.patch, YARN-796.node-label.consolidate.14.patch, YARN-796.node-label.consolidate.2.patch, YARN-796.node-label.consolidate.3.patch, YARN-796.node-label.consolidate.4.patch, YARN-796.node-label.consolidate.5.patch, YARN-796.node-label.consolidate.6.patch, YARN-796.node-label.consolidate.7.patch, YARN-796.node-label.consolidate.8.patch, YARN-796.node-label.demo.patch.1, YARN-796.patch, YARN-796.patch4 It will be useful for admins to specify labels for nodes. Examples of labels are OS, processor architecture etc. We should expose these labels and allow applications to specify labels on resource-requests. Obviously we need to support admin operations on adding/removing node labels. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3401) [Data Model] users should not be able to create a generic TimelineEntity and associate arbitrary type
[ https://issues.apache.org/jira/browse/YARN-3401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382412#comment-14382412 ] Naganarasimha G R commented on YARN-3401: - Hi [~sjlee0], IIRC, as part of the doc or some JIRA discussion, we agreed that only the RM/NM should be able to send the YARN system entities and that other clients should not, right? Do we need to completely block it? If so, and we add a check in TimelineClient, will it prevent the NM from posting container metrics entities? [Data Model] users should not be able to create a generic TimelineEntity and associate arbitrary type - Key: YARN-3401 URL: https://issues.apache.org/jira/browse/YARN-3401 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Naganarasimha G R IIUC it is possible for users to create a generic TimelineEntity and set an arbitrary entity type. For example, for a YARN app, the right entity API is ApplicationEntity. However, today nothing stops users from instantiating a base TimelineEntity class and setting the application type on it. This presents a problem in handling these YARN system entities in the storage layer, for example. We need to ensure that the API allows only the right type of class to be created for a given entity type. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests
[ https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382486#comment-14382486 ] Jian Fang commented on YARN-796: Thanks. It seems ApplicationSubmissionContext.amContainerLabelExpression is the one I am looking for; I will try it and see if it works. Any plans for the fair scheduler? We need that as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3044) [Event producers] Implement RM writing app lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382362#comment-14382362 ] Sangjin Lee commented on YARN-3044: --- I'll file a separate JIRA for this. [Event producers] Implement RM writing app lifecycle events to ATS -- Key: YARN-3044 URL: https://issues.apache.org/jira/browse/YARN-3044 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Naganarasimha G R Attachments: YARN-3044.20150325-1.patch Per design in YARN-2928, implement RM writing app lifecycle events to ATS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3044) [Event producers] Implement RM writing app lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382361#comment-14382361 ] Sangjin Lee commented on YARN-3044: --- That's a fair point. As a rule, we need to prevent users of the TimelineEntity API from setting arbitrary types. The only way of creating a YARN app timeline entity, for example, should be through instantiating ApplicationEntity. We may need to make some of the methods that make this possible non-public, etc., although it remains to be seen how much of that is doable, given that JSON serialization needs to be able to handle them. If we have that, IMO the type-based casting should be acceptable (it should reject the entity if the type says one thing and the class is not the right one). Thoughts? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
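A rough sketch of the type/class consistency check discussed above (class and type names are illustrative, not the actual YARN API): a generic entity claiming a YARN system type is rejected unless it is an instance of the dedicated subclass.

```python
# Illustrative sketch only; modeled on the discussion, not the real YARN classes.
class TimelineEntity:
    def __init__(self, entity_type):
        self.type = entity_type

class ApplicationEntity(TimelineEntity):
    TYPE = "YARN_APPLICATION"
    def __init__(self):
        super().__init__(ApplicationEntity.TYPE)

def validate(entity):
    # type-based check: the application type is only valid on ApplicationEntity
    if entity.type == ApplicationEntity.TYPE and not isinstance(entity, ApplicationEntity):
        raise TypeError("type %s requires ApplicationEntity" % entity.type)
    return entity

validate(ApplicationEntity())  # accepted
try:
    validate(TimelineEntity("YARN_APPLICATION"))  # generic entity is rejected
    rejected = False
except TypeError:
    rejected = True
```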
[jira] [Commented] (YARN-3401) [Data Model] users should not be able to create a generic TimelineEntity and associate arbitrary type
[ https://issues.apache.org/jira/browse/YARN-3401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382406#comment-14382406 ] Junping Du commented on YARN-3401: -- We also need to ensure compatibility between old-version applications and the new-version timeline service. Typically this won't be an issue, but I am putting it here as a reminder. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)
[ https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382496#comment-14382496 ] Jian Fang commented on YARN-2495: - On each EC2 instance, metadata about that instance, such as its market type (i.e., spot or on-demand), CPUs, memory, etc., is available when the instance starts up. All of this information is injected into yarn-site.xml by our instance controller and will not change afterwards. Different instances in an EMR cluster could have different static labels, since one EMR Hadoop cluster consists of multiple instance groups, i.e., different types of instances. I think it is OK that no duplicate data is sent to the RM if the NM labels do not change. Thanks. Allow admin specify labels from each NM (Distributed configuration) --- Key: YARN-2495 URL: https://issues.apache.org/jira/browse/YARN-2495 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Naganarasimha G R Attachments: YARN-2495.20141023-1.patch, YARN-2495.20141024-1.patch, YARN-2495.20141030-1.patch, YARN-2495.20141031-1.patch, YARN-2495.20141119-1.patch, YARN-2495.20141126-1.patch, YARN-2495.20141204-1.patch, YARN-2495.20141208-1.patch, YARN-2495.20150305-1.patch, YARN-2495.20150309-1.patch, YARN-2495.20150318-1.patch, YARN-2495.20150320-1.patch, YARN-2495.20150321-1.patch, YARN-2495.20150324-1.patch, YARN-2495_20141022.1.patch Target of this JIRA is to allow admins to specify labels on each NM; this covers: - Users can set labels on each NM (by setting yarn-site.xml (YARN-2923) or using the script suggested by [~aw] (YARN-2729)) - The NM will send labels to the RM via the ResourceTracker API - The RM will set labels in the NodeLabelManager when the NM registers/updates labels -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)
[ https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382501#comment-14382501 ] Jian Fang commented on YARN-2495: - BTW, I haven't gone through all the details of YARN-2492 yet. Is it possible to provide a configuration to hook in different label providers on the NM, for example a third-party one? (Sorry if this feature already exists.) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3399) Consider having a Default cluster ID
[ https://issues.apache.org/jira/browse/YARN-3399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14383041#comment-14383041 ] Zhijie Shen commented on YARN-3399: --- Thanks, Vinod! This proposal sounds almost good to me, but I think we need to rethink what the default cluster ID should be. default-$(RM-host-name)-cluster may not work because yarn.resourcemanager.hostname is 0.0.0.0 by default, so different RMs may still use the same cluster ID. Even if we use the IP address to look up the host name, we are likely to end up with the same localhost. Consider having a Default cluster ID Key: YARN-3399 URL: https://issues.apache.org/jira/browse/YARN-3399 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Zhijie Shen Assignee: Brahma Reddy Battula In YARN-3040, the timeline service will set the default cluster ID if users don't provide one. RM HA's current behavior is a bit different when users don't provide a cluster ID: an IllegalArgumentException is thrown instead. Let's continue the discussion here on whether RM HA needs a default cluster ID, and what the proper default cluster ID would be. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
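A self-contained sketch of the collision described above. The config key is real; {{derive_cluster_id}} is a hypothetical helper that just mirrors the proposed default-$(RM-host-name)-cluster scheme.

```python
# yarn.resourcemanager.hostname defaults to 0.0.0.0, which is not a usable
# discriminator: every RM left at the default derives the same cluster ID.
DEFAULT_RM_HOSTNAME = "0.0.0.0"

def derive_cluster_id(conf):
    # hypothetical helper, for illustration only
    rm_host = conf.get("yarn.resourcemanager.hostname", DEFAULT_RM_HOSTNAME)
    return "default-%s-cluster" % rm_host

# Two RMs in two unrelated clusters, both with default config, collide.
id_a = derive_cluster_id({})
id_b = derive_cluster_id({})
assert id_a == id_b == "default-0.0.0.0-cluster"
```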
[jira] [Commented] (YARN-3331) NodeManager should use directory other than tmp for extracting and loading leveldbjni
[ https://issues.apache.org/jira/browse/YARN-3331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14383078#comment-14383078 ] Zhijie Shen commented on YARN-3331: --- bq. I am not sure which value in core-site would fix this after going through the core-default documentation. I'm afraid we can't set it in a config file, because the config file is read by the daemon, but we need to start the daemon with this opt. And IMHO, {{-Dlibrary.leveldbjni.path}} alone cannot fix the problem. If the temporary native lib is redirected to another dir, we also need to add that dir to {{JAVA_LIBRARY_PATH}}. Otherwise, we may still end up with the native lib not being found. NodeManager should use directory other than tmp for extracting and loading leveldbjni - Key: YARN-3331 URL: https://issues.apache.org/jira/browse/YARN-3331 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-3331.001.patch, YARN-3331.002.patch /tmp can be required to be noexec in many environments. This causes a problem when the nodemanager tries to load the leveldbjni library, which gets unpacked into and executed from /tmp. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
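A minimal yarn-env.sh sketch of what the comment describes ({{-Dlibrary.leveldbjni.path}} comes from the discussion above; the target directory is illustrative):

```shell
# yarn-env.sh fragment (hypothetical path): extract leveldbjni somewhere
# executable instead of a noexec /tmp, and keep the JVM's native search
# path pointed at the same directory so the lib is still found.
export YARN_NODEMANAGER_OPTS="$YARN_NODEMANAGER_OPTS -Dlibrary.leveldbjni.path=/var/lib/hadoop-yarn/native"
export JAVA_LIBRARY_PATH="/var/lib/hadoop-yarn/native:$JAVA_LIBRARY_PATH"
```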
[jira] [Commented] (YARN-3323) Task UI, sort by name doesn't work
[ https://issues.apache.org/jira/browse/YARN-3323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14381899#comment-14381899 ] Akira AJISAKA commented on YARN-3323: - Hi, [~brahmareddy], looks like the version of {{jquery.dataTables.min.js.gz}} included in v2 patch is still 1.9.4. Would you include the latest version? Task UI, sort by name doesn't work -- Key: YARN-3323 URL: https://issues.apache.org/jira/browse/YARN-3323 Project: Hadoop YARN Issue Type: Bug Components: webapp Affects Versions: 2.5.1 Reporter: Thomas Graves Assignee: Brahma Reddy Battula Attachments: YARN-3323-002.patch, YARN-3323.patch If you go to the MapReduce ApplicationMaster or HistoryServer UI and open the list of tasks, then try to sort by the task name/id, it does nothing. Note that if you go to the task attempts, that seem to sort fine. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3040) [Data Model] Make putEntities operation be aware of the app's context
[ https://issues.apache.org/jira/browse/YARN-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-3040: - Attachment: YARN-3040.6.patch [Data Model] Make putEntities operation be aware of the app's context - Key: YARN-3040 URL: https://issues.apache.org/jira/browse/YARN-3040 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Zhijie Shen Attachments: YARN-3040.1.patch, YARN-3040.2.patch, YARN-3040.3.patch, YARN-3040.4.patch, YARN-3040.5.patch, YARN-3040.6.patch Per design in YARN-2928, implement client-side API for handling *flows*. Frameworks should be able to define and pass in all attributes of flows and flow runs to YARN, and they should be passed into ATS writers. YARN tags were discussed as a way to handle this piece of information. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3040) [Data Model] Make putEntities operation be aware of the app's context
[ https://issues.apache.org/jira/browse/YARN-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14381908#comment-14381908 ] Junping Du commented on YARN-3040: -- Sounds like there is a build failure with the v5 patch: RMTimelineCollector (just added in YARN-3034) needs to override the abstract method getTimelineEntityContext() in TimelineCollector. Given there is YARN-3390 to track this issue separately, I think we can simply add a stub method (e.g., return null) to RMTimelineCollector, as the v6 patch shows. [~zjshen], can you confirm this? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3304) ResourceCalculatorProcessTree#getCpuUsagePercent default return value is inconsistent with other getters
[ https://issues.apache.org/jira/browse/YARN-3304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14381913#comment-14381913 ] Hadoop QA commented on YARN-3304: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12707398/YARN-3304-v3.patch against trunk revision b4b4fe9. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7115//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7115//console This message is automatically generated. 
ResourceCalculatorProcessTree#getCpuUsagePercent default return value is inconsistent with other getters Key: YARN-3304 URL: https://issues.apache.org/jira/browse/YARN-3304 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Junping Du Assignee: Karthik Kambatla Priority: Blocker Attachments: YARN-3304-v2.patch, YARN-3304-v3.patch, YARN-3304.patch Per discussions in YARN-3296, getCpuUsagePercent() will return -1 for unavailable case while other resource metrics are return 0 in the same case which sounds inconsistent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3402) Security support for new timeline service.
Junping Du created YARN-3402: Summary: Security support for new timeline service. Key: YARN-3402 URL: https://issues.apache.org/jira/browse/YARN-3402 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Junping Du Assignee: Junping Du We should support YARN security for new TimelineService. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp
[ https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382799#comment-14382799 ] Yongjun Zhang commented on YARN-3021: - Hi [~jianhe], would you please take a look at the latest patch? thanks a lot. YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp --- Key: YARN-3021 URL: https://issues.apache.org/jira/browse/YARN-3021 Project: Hadoop YARN Issue Type: Bug Components: security Affects Versions: 2.3.0 Reporter: Harsh J Assignee: Yongjun Zhang Attachments: YARN-3021.001.patch, YARN-3021.002.patch, YARN-3021.003.patch, YARN-3021.004.patch, YARN-3021.005.patch, YARN-3021.006.patch, YARN-3021.patch Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN clusters. Now if one logs in with a COMMON credential, and runs a job on A's YARN that needs to access B's HDFS (such as a DistCp), the operation fails in the RM, as it attempts a renewDelegationToken(…) synchronously during application submission (to validate the managed token before it adds it to a scheduler for automatic renewal). The call obviously fails cause B realm will not trust A's credentials (here, the RM's principal is the renewer). In the 1.x JobTracker the same call is present, but it is done asynchronously and once the renewal attempt failed we simply ceased to schedule any further attempts of renewals, rather than fail the job immediately. We should change the logic such that we attempt the renewal but go easy on the failure and skip the scheduling alone, rather than bubble back an error to the client, failing the app submission. This way the old behaviour is retained. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)
[ https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382542#comment-14382542 ] Wangda Tan commented on YARN-2495: -- bq. is it possible to provide a configuration to hook in different label providers on NM, for example, a third party one? (Sorry if this feature already exists). Yes. You can see in this patch that how the LabelProvider is created is left open, and we have two JIRAs to make it configurable: - YARN-2729 for script-based - YARN-2923 for config-based This should be pluggable, so new providers can be added in the future. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
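As a rough illustration only, a yarn-site.xml fragment of the shape such a pluggable provider configuration might take (these property keys are hypothetical; the actual keys were still being defined under YARN-2729/YARN-2923 at the time):

```xml
<!-- Hypothetical keys, illustrating the pluggable-provider idea -->
<property>
  <name>yarn.nodemanager.node-labels.provider</name>
  <!-- e.g. "config", "script", or a custom provider class name -->
  <value>config</value>
</property>
<property>
  <name>yarn.nodemanager.node-labels.provider.configured-node-labels</name>
  <!-- static label injected at instance startup, e.g. by an instance controller -->
  <value>SPOT</value>
</property>
```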
[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests
[ https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382552#comment-14382552 ] Wangda Tan commented on YARN-796: - Fair scheduler efforts are tracked by YARN-2497; you can check the plans in that JIRA. Thanks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3304) ResourceCalculatorProcessTree#getCpuUsagePercent default return value is inconsistent with other getters
[ https://issues.apache.org/jira/browse/YARN-3304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382759#comment-14382759 ] Junping Du commented on YARN-3304: -- Hi [~kasha] and [~adhoot], the v3 patch should be a complete and clean solution for this blocker. Can you help review and comment? Thanks! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3401) [Data Model] users should not be able to create a generic TimelineEntity and associate arbitrary type
[ https://issues.apache.org/jira/browse/YARN-3401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382794#comment-14382794 ] Junping Du commented on YARN-3401: -- [~sjlee0] and [~Naganarasimha], I think this falls under preventing malicious behavior. I would suggest deferring it until we discuss support for YARN security in the TimelineService, which shouldn't happen very soon. I just filed YARN-3402 to track the security issue for the new timeline service. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3402) Security support for new timeline service.
[ https://issues.apache.org/jira/browse/YARN-3402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-3402: - Description: We should support YARN security for the new TimelineService. Basically, there should be a security token exchange between the AM, NMs, and app-collectors, to prevent anyone who knows the service address of an app-collector from posting faked/unwanted information. was: We should support YARN security for new TimelineService. Security support for new timeline service. -- Key: YARN-3402 URL: https://issues.apache.org/jira/browse/YARN-3402 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Junping Du Assignee: Junping Du -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3047) [Data Serving] Set up ATS reader with basic request serving structure and lifecycle
[ https://issues.apache.org/jira/browse/YARN-3047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382690#comment-14382690 ] Hadoop QA commented on YARN-3047: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12707592/YARN-3047.005.patch against trunk revision 61df1b2. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:red}-1 javac{color}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7117//console This message is automatically generated. [Data Serving] Set up ATS reader with basic request serving structure and lifecycle --- Key: YARN-3047 URL: https://issues.apache.org/jira/browse/YARN-3047 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Varun Saxena Attachments: Timeline_Reader(draft).pdf, YARN-3047.001.patch, YARN-3047.003.patch, YARN-3047.005.patch, YARN-3047.02.patch, YARN-3047.04.patch Per design in YARN-2938, set up the ATS reader as a service and implement the basic structure as a service. It includes lifecycle management, request serving, and so on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3047) [Data Serving] Set up ATS reader with basic request serving structure and lifecycle
[ https://issues.apache.org/jira/browse/YARN-3047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-3047: --- Attachment: YARN-3047.005.patch Uploaded a new patch. Verified that the patch applies with {{ patch -p0 }}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3044) [Event producers] Implement RM writing app lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382387#comment-14382387 ] Junping Du commented on YARN-3044: -- Thanks, guys, for the good discussion above, especially on the topic of posting app lifecycle events from the NM or the RM. Can I propose that we do it both ways during the development stage? I fully understand [~sjlee0]'s concern that the RM may not be able to handle tens of thousands of containers in a large cluster. However, we can disable the RM-side posting by default in production environments. We can use different entity types, e.g., NM_CONTAINER_EVENT and RM_CONTAINER_EVENT, for container events posted from the NM and the RM, so we can fully understand how the two views differ (start time, end time, etc.). This would not only benefit the development cycle but also troubleshooting in production, as the apples-to-apples comparison may provide some hints to users. Given that doing both doesn't sound like too much work, I think it is worth doing. Thoughts? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3401) [Data Model] users should not be able to create a generic TimelineEntity and associate arbitrary type
[ https://issues.apache.org/jira/browse/YARN-3401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R reassigned YARN-3401: --- Assignee: Naganarasimha G R [Data Model] users should not be able to create a generic TimelineEntity and associate arbitrary type - Key: YARN-3401 URL: https://issues.apache.org/jira/browse/YARN-3401 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Naganarasimha G R IIUC it is possible for users to create a generic TimelineEntity and set an arbitrary entity type. For example, for a YARN app, the right entity API is ApplicationEntity. However, today nothing stops users from instantiating the base TimelineEntity class and setting the application type on it. This presents a problem in handling these YARN system entities, for example in the storage layer. We need to ensure that the API allows only the right class to be created for a given entity type. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
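The kind of restriction being asked for could be sketched as a factory that owns the mapping from reserved entity types to their dedicated subclasses. This is an illustrative sketch only; the class and method names here are assumptions, not the actual YARN data-model API:

```java
import java.util.Set;

// Sketch: the factory, not the caller, decides which concrete class backs a
// reserved entity type, so a generic entity can never carry a reserved type.
public class EntityTypes {
    public static class TimelineEntity {
        private final String type;
        protected TimelineEntity(String type) { this.type = type; }
        public String getType() { return type; }
    }

    // Dedicated subclass for YARN applications (hypothetical name).
    public static class ApplicationEntity extends TimelineEntity {
        public ApplicationEntity() { super("YARN_APPLICATION"); }
    }

    private static final Set<String> RESERVED = Set.of("YARN_APPLICATION");

    // The only public way to create a generic entity: reserved types are rejected.
    public static TimelineEntity createGeneric(String type) {
        if (RESERVED.contains(type)) {
            throw new IllegalArgumentException(
                "reserved entity type, use the dedicated subclass: " + type);
        }
        return new TimelineEntity(type) {};
    }
}
```

With a protected constructor plus a guarded factory, user code can still define arbitrary non-reserved types, but the storage layer can rely on every YARN system entity being an instance of its dedicated class.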
[jira] [Updated] (YARN-3403) Nodemanager dies after a small typo in mapred-site.xml is induced
[ https://issues.apache.org/jira/browse/YARN-3403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nikhil Mulley updated YARN-3403: Priority: Critical (was: Major) Nodemanager dies after a small typo in mapred-site.xml is induced - Key: YARN-3403 URL: https://issues.apache.org/jira/browse/YARN-3403 Project: Hadoop YARN Issue Type: Bug Reporter: Nikhil Mulley Priority: Critical Hi, We have noticed that a small typo in an XML config file (mapred-site.xml) can cause the nodemanager to go down completely, without anyone stopping/restarting it externally. I find it a little weird that editing the config files on the filesystem can cause the running slave daemon, the yarn nodemanager, to shut down. In this case, an ending tag '/' was missing in a property, and that caused the nodemanager to go down in a cluster. Why would the nodemanager reload the configs while it is running? Aren't they picked up when it is started? Even if it is automated to pick up new configs dynamically, I think an xmllint/config check should run before the nodemanager is asked to reload/restart. --- java.lang.RuntimeException: org.xml.sax.SAXParseException; systemId: file:/etc/hadoop/conf/mapred-site.xml; lineNumber: 228; columnNumber: 3; The element type "value" must be terminated by the matching end-tag "</value>". at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2348) --- Please shed light on this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3403) Nodemanager dies after a small typo in mapred-site.xml is induced
Nikhil Mulley created YARN-3403: --- Summary: Nodemanager dies after a small typo in mapred-site.xml is induced Key: YARN-3403 URL: https://issues.apache.org/jira/browse/YARN-3403 Project: Hadoop YARN Issue Type: Bug Reporter: Nikhil Mulley Hi, We have noticed that a small typo in an XML config file (mapred-site.xml) can cause the nodemanager to go down completely, without anyone stopping/restarting it externally. I find it a little weird that editing the config files on the filesystem can cause the running slave daemon, the yarn nodemanager, to shut down. In this case, an ending tag '/' was missing in a property, and that caused the nodemanager to go down in a cluster. Why would the nodemanager reload the configs while it is running? Aren't they picked up when it is started? Even if it is automated to pick up new configs dynamically, I think an xmllint/config check should run before the nodemanager is asked to reload/restart. --- java.lang.RuntimeException: org.xml.sax.SAXParseException; systemId: file:/etc/hadoop/conf/mapred-site.xml; lineNumber: 228; columnNumber: 3; The element type "value" must be terminated by the matching end-tag "</value>". at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2348) --- Please shed light on this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
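The pre-check the reporter asks for is essentially an XML well-formedness test: parse the file with a non-validating SAX parser and only hand it to the running daemon if parsing succeeds. A minimal stdlib-only sketch (not the actual Configuration code path):

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class ConfigPreCheck {
    /** Returns true iff the XML text is well-formed. */
    public static boolean isWellFormed(String xml) {
        try {
            // Non-validating parse; a missing or mismatched end tag (like the
            // one in the stack trace above) raises a SAXParseException.
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (Exception e) {
            return false;
        }
    }
}
```

A reload path could call such a check first and keep the previous in-memory configuration when the edited file fails it, rather than throwing a RuntimeException in the daemon.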
[jira] [Updated] (YARN-3402) Security support for new timeline service.
[ https://issues.apache.org/jira/browse/YARN-3402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-3402: - Description: We should support YARN security for new TimelineService. Basically, there should be security token exchange between AM, NMs and app-collectors to prevent anyone who knows the service address of app-collector can post faked/unwanted information. Also, there should be tokens exchange between app-collector/RMTimelineCollector and backend storage (HBase, Phoenix, etc.) that enabling security. was: We should support YARN security for new TimelineService. Basically, there should be security token exchange between AM, NMs and app-collectors to prevent anyone who knows the service address of app-collector can post faked/unwanted information. Security support for new timeline service. -- Key: YARN-3402 URL: https://issues.apache.org/jira/browse/YARN-3402 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Junping Du Assignee: Junping Du We should support YARN security for new TimelineService. Basically, there should be security token exchange between AM, NMs and app-collectors to prevent anyone who knows the service address of app-collector can post faked/unwanted information. Also, there should be tokens exchange between app-collector/RMTimelineCollector and backend storage (HBase, Phoenix, etc.) that enabling security. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Unable to run Hadoop on Windows 8.1 64bit
As per Brahma, I followed the procedure he mentioned to build Hadoop on a Windows 8.1 64-bit system and was successful, but I am unable to run Hadoop. https://issues.apache.org/jira/browse/HADOOP-11752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel Followed the procedure below for building Hadoop and succeeded: http://zutai.blogspot.in/2014/06/build-install-and-run-hadoop-24-240-on.html?showComment=1422091525887#c2264594416650430988 *Runtime error while running Hadoop on a Windows 8.1 64-bit system:* When I try to do hdfs namenode -format, I get the error below: *C:\Users\..\hadoop> hdfs namenode -format* 'hdfs' is not recognized as an internal or external command, operable program or batch file. *C:\Users\..\hadoop> start-dfs* 'start-dfs' is not recognized as an internal or external command, operable program or batch file. *C:\Users\..\hadoop\hadoop-dist\target\hadoop-3.0.0-SNAPSHOT\sbin> hdfs namenode -format* 'hdfs' is not recognized as an internal or external command, operable program or batch file. *C:\Users\..\hadoop\hadoop-dist\target\hadoop-3.0.0-SNAPSHOT\sbin> start-dfs* *The system cannot find the file hadoop.* Can you please let me know how to format HDFS, start DFS and YARN, and run Hadoop on a Windows 8.1 64-bit system. -- Thanks Regards, Sravan CPChem 281-757-6777 (C) | kum...@cpchem.com kum...@cpchemt.com
[jira] [Resolved] (YARN-3040) [Data Model] Make putEntities operation be aware of the app's context
[ https://issues.apache.org/jira/browse/YARN-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du resolved YARN-3040. -- Resolution: Fixed Fix Version/s: YARN-2928 Hadoop Flags: Reviewed [Data Model] Make putEntities operation be aware of the app's context - Key: YARN-3040 URL: https://issues.apache.org/jira/browse/YARN-3040 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Zhijie Shen Fix For: YARN-2928 Attachments: YARN-3040.1.patch, YARN-3040.2.patch, YARN-3040.3.patch, YARN-3040.4.patch, YARN-3040.5.patch, YARN-3040.6.patch Per design in YARN-2928, implement client-side API for handling *flows*. Frameworks should be able to define and pass in all attributes of flows and flow runs to YARN, and they should be passed into ATS writers. YARN tags were discussed as a way to handle this piece of information. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3399) Consider having a Default cluster ID
[ https://issues.apache.org/jira/browse/YARN-3399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-3399: -- Summary: Consider having a Default cluster ID (was: Default cluster ID for RM HA) Editing title to be appropriate. Others commented on YARN-3040. So I'll try to summarize the discussion from YARN-1029 and YARN-3040. - We should have a generic {{yarn.cluster-id}} and deprecate the current RM-only configuration - We need to have a reasonable default cluster-id -- This is needed for the Timeline service functionality - we want to gather insights per cluster -- Forcing admins to set the ID explicitly is one more hurdle w.r.t configuration -- For single node non-HA clusters, forcing the dev/admin to set it is unnecessary. - But there are concerns too -- Default cluster-id can potentially cause hard-to-debug issues in HA mode. - Other constraints while picking a default cluster ID -- Restarting RM on the same node shouldn't change the cluster-id So, I propose that we set the default cluster-ID to be something like default-$(RM-host-name)-cluster. This way - by default, single node clusters are good across RM restarts, unless you are running active/standby RMs on the same machine (dev environments) - HA RMs have to be set up explicitly to be part of the same cluster - thereby avoiding debuggability issues. - For real life use, in order to facilitate RM migrations, administrators will set their own cluster-id. Consider having a Default cluster ID Key: YARN-3399 URL: https://issues.apache.org/jira/browse/YARN-3399 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Zhijie Shen Assignee: Brahma Reddy Battula In YARN-3040, the timeline service will set the default cluster ID if users don't provide one. RM HA's current behavior is a bit different when users don't provide a cluster ID: an IllegalArgumentException is thrown instead. 
Let's continue the discussion if RM HA needs the default cluster ID or not here, and what's the proper default cluster ID. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
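The proposed fallback above can be sketched in a few lines. Names are illustrative (the configured/derived split is an assumption drawn from the comment, not code in the RM):

```java
// Sketch: use an explicitly configured yarn.cluster-id when present, otherwise
// derive a default from the RM host name so the id is stable across restarts
// of the RM on the same machine.
public class ClusterIdDefault {
    public static String clusterId(String configured, String rmHostName) {
        if (configured != null && !configured.trim().isEmpty()) {
            return configured.trim();
        }
        return "default-" + rmHostName + "-cluster";
    }
}
```

This captures the trade-off in the summary: single-node clusters keep a stable id with zero configuration, while HA setups must set the id explicitly because two RMs on different hosts would otherwise derive different defaults.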
[jira] [Commented] (YARN-3047) [Data Serving] Set up ATS reader with basic request serving structure and lifecycle
[ https://issues.apache.org/jira/browse/YARN-3047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382866#comment-14382866 ] Li Lu commented on YARN-3047: - Hi [~varun_saxena], thanks for the doc! I have two general questions about your proposed plan: # I'm a little bit confused about whether the Timeline Reader will be a single daemon (in the initial phase). In the reader overview section there are multiple threads in the reader; are those threads managed in YARN-3047? Specifically, what is the concrete plan for Phase 1 of the reader's architecture: a single daemon with multiple threads, or a single daemon with a single thread? If it's the former, you may want to update YARN-3047's patch, while if it's the latter, you may want to confirm this and update the figure afterwards (not the top priority for now). # On the storage layer we're prioritizing timeline entities and metrics; it would be great if there were some API support at the reader level for metrics. Given the current progress on the storage layer, I'm not sure we can finish V1 storage support by the time you finish reader phase 1. We may need some coordination on this. [Data Serving] Set up ATS reader with basic request serving structure and lifecycle --- Key: YARN-3047 URL: https://issues.apache.org/jira/browse/YARN-3047 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Varun Saxena Attachments: Timeline_Reader(draft).pdf, YARN-3047.001.patch, YARN-3047.003.patch, YARN-3047.005.patch, YARN-3047.02.patch, YARN-3047.04.patch Per design in YARN-2928, set up the ATS reader as a service and implement the basic structure. It includes lifecycle management, request serving, and so on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2893) AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream
[ https://issues.apache.org/jira/browse/YARN-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-2893: Attachment: YARN-2893.002.patch AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream -- Key: YARN-2893 URL: https://issues.apache.org/jira/browse/YARN-2893 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Gera Shegalov Assignee: zhihai xu Attachments: YARN-2893.000.patch, YARN-2893.001.patch, YARN-2893.002.patch MapReduce jobs on our clusters experience sporadic failures due to corrupt tokens in the AM launch context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2893) AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream
[ https://issues.apache.org/jira/browse/YARN-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14383151#comment-14383151 ] zhihai xu commented on YARN-2893: - [~adhoot], thanks for the review. I added a test case for the AMLauncher changes in the new patch YARN-2893.002.patch. The root cause of this bug is at the job client, which submitted a bad token in the ApplicationSubmissionContext. The change to RMAppManager#submitApplication is to surface this error earlier, so the user who submits the application knows the real cause of the issue. bq. The changes for RMAppManager#submitApplication seems to no longer return RMAppRejectedEvent for any exception in getDelegationTokenRenewer().addApplicationAsync. Is that deliberate? I checked the code for DelegationTokenRenewer#addApplicationAsync and didn't find any exception that can be generated from addApplicationAsync itself. addApplicationAsync launches a thread to run handleDTRenewerAppSubmitEvent, and any exception from handleDTRenewerAppSubmitEvent will result in an RMAppRejectedEvent:
{code}
private void handleDTRenewerAppSubmitEvent(
    DelegationTokenRenewerAppSubmitEvent event) {
  try {
    // Setup tokens for renewal
    DelegationTokenRenewer.this.handleAppSubmitEvent(event);
    rmContext.getDispatcher().getEventHandler()
        .handle(new RMAppEvent(event.getApplicationId(),
            RMAppEventType.START));
  } catch (Throwable t) {
    LOG.warn(
        "Unable to add the application to the delegation token renewer.", t);
    // Sending APP_REJECTED is fine, since we assume that the
    // RMApp is in NEW state and thus we haven't yet informed the
    // Scheduler about the existence of the application
    rmContext.getDispatcher().getEventHandler().handle(
        new RMAppRejectedEvent(event.getApplicationId(), t.getMessage()));
  }
}
{code}
This is why I only check for an exception from parseCredentials. Also, the original code only expected an exception from parseCredentials, based on the exception message. 
{code} LOG.warn("Unable to parse credentials.", e); {code} AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream -- Key: YARN-2893 URL: https://issues.apache.org/jira/browse/YARN-2893 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Gera Shegalov Assignee: zhihai xu Attachments: YARN-2893.000.patch, YARN-2893.001.patch, YARN-2893.002.patch MapReduce jobs on our clusters experience sporadic failures due to corrupt tokens in the AM launch context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
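The fail-fast idea being discussed can be illustrated with stdlib-only Java. This is NOT the Hadoop Credentials API; it is a stand-in that shows why a truncated (corrupt) token blob surfaces as EOFException, and why parsing at submission time reports the problem to the submitter instead of failing later in AMLauncher:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;

public class TokenBlobCheck {
    /** Serializes one length-prefixed record, as a stand-in for a token blob. */
    public static byte[] write(byte[] payload) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bos);
            out.writeInt(payload.length);
            out.write(payload);
            out.flush();
            return bos.toByteArray();
        } catch (IOException e) {
            throw new AssertionError(e); // cannot happen for in-memory streams
        }
    }

    /** Returns true iff the blob parses completely; truncation means corrupt. */
    public static boolean parses(byte[] blob) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(blob))) {
            byte[] buf = new byte[in.readInt()];
            in.readFully(buf); // throws EOFException on a truncated blob
            return true;
        } catch (IOException e) { // includes EOFException
            return false;
        }
    }
}
```

Running such a check when the application is submitted turns a sporadic AM-launch failure into an immediate, attributable rejection at the client.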
[jira] [Commented] (YARN-2893) AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream
[ https://issues.apache.org/jira/browse/YARN-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14383157#comment-14383157 ] zhihai xu commented on YARN-2893: - By the way, the newly added test case in TestApplicationMasterLauncher will fail without the AMLauncher changes. The following is a sample failure message without the AMLauncher changes:
{code}
-------------------------------------------------------
 T E S T S
-------------------------------------------------------
Running org.apache.hadoop.yarn.server.resourcemanager.TestApplicationMasterLauncher
Tests run: 3, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 12.838 sec - in org.apache.hadoop.yarn.server.resourcemanager.TestApplicationMasterLauncher
testSetupTokens(org.apache.hadoop.yarn.server.resourcemanager.TestApplicationMasterLauncher)  Time elapsed: 2.101 sec  FAILURE!
java.lang.AssertionError: EOFException should not happen.
	at org.junit.Assert.fail(Assert.java:88)
	at org.apache.hadoop.yarn.server.resourcemanager.TestApplicationMasterLauncher.testSetupTokens(TestApplicationMasterLauncher.java:278)
{code}
AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream -- Key: YARN-2893 URL: https://issues.apache.org/jira/browse/YARN-2893 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Gera Shegalov Assignee: zhihai xu Attachments: YARN-2893.000.patch, YARN-2893.001.patch, YARN-2893.002.patch MapReduce jobs on our clusters experience sporadic failures due to corrupt tokens in the AM launch context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3395) [Fair Scheduler] Handle the user name correctly when user name is used as default queue name.
[ https://issues.apache.org/jira/browse/YARN-3395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14381436#comment-14381436 ] Hadoop QA commented on YARN-3395: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12707428/YARN-3395.000.patch against trunk revision 44809b8. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestRMHA org.apache.hadoop.yarn.server.resourcemanager.TestRMAdminService org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebappAuthentication org.apache.hadoop.yarn.server.resourcemanager.TestMoveApplication org.apache.hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStore Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7114//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7114//console This message is automatically generated. [Fair Scheduler] Handle the user name correctly when user name is used as default queue name. 
- Key: YARN-3395 URL: https://issues.apache.org/jira/browse/YARN-3395 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-3395.000.patch Handle the user name correctly when user name is used as default queue name in fair scheduler. It will be better to remove the trailing and leading whitespace of the user name when we use user name as default queue name, otherwise it will be rejected by InvalidQueueNameException from QueueManager. I think it is reasonable to make this change, because we already did special handling for '.' in user name. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3365) Add support for using the 'tc' tool via container-executor
[ https://issues.apache.org/jira/browse/YARN-3365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14381441#comment-14381441 ] Sidharta Seethana commented on YARN-3365: - That should read : {{container-executor --tc-read-state tmp-file-with-tc-commands.txt}} Add support for using the 'tc' tool via container-executor -- Key: YARN-3365 URL: https://issues.apache.org/jira/browse/YARN-3365 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: Sidharta Seethana Assignee: Sidharta Seethana Attachments: YARN-3365.001.patch, YARN-3365.002.patch, YARN-3365.003.patch We need the following functionality : 1) modify network interface traffic shaping rules - to be able to attach a qdisc, create child classes etc 2) read existing rules in place 3) read stats for the various classes Using tc requires elevated privileges - hence this functionality is to be made available via container-executor. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3044) [Event producers] Implement RM writing app lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14381453#comment-14381453 ] Naganarasimha G R commented on YARN-3044: - Thanks [~vinodkv], [~vrushalic], [~sjlee0] and [~zjshen] for reviewing and providing your viewpoints: 1 {{source of life-cycle events of container}} is a debatable topic; to summarize the pros and cons of doing it in the NM: Pros * Even though the load is not as high as publishing container metrics, life cycle events might put considerable load on a large cluster, as explained by [~sjlee0]. So I feel it is better to distribute this load. * If the start and end times of life cycle events are logged from the NM, it will be easier to analyze the flow of a container, as that is the actual time when it started. * IMO it would be good to have all the metrics and events raised from the NM itself, as there might be a race condition if container entities are raised from the RM and metrics and a few other life cycle events from the NM, e.g. when the RM is slow to dispatch the events and the NM is faster at doing it (HBase as storage will be able to handle it well, but I am not sure about the other storages we are planning for). Cons * The start and end times of life cycle events might not match what is displayed by the RM (web UI etc.). * In terms of scheduling, the start and end times of life cycle events might not be as accurate as they would have been from the RM. Please correct me on these and add any I have missed. 2 ??But the life-cycle events of container should definitely originate at the RM; NMs don't even know many of them.?? I am not much aware of this; can you please elaborate on what might be missed? 3 ??Why would that be the case? Can the RM timeline collector not use specific subclasses of TimelineEntity?? 
Well, it is not a limitation of the RM timeline collector that I am trying to point out, but that the writer interface is like {{TimelineWriter.write(TimelineEntities)}}. The writer would not be aware of whether the client is writing an ApplicationEntity or an AppAttemptEntity. IIUC it will just try to write the fields of the TimelineEntity to the storage. If it is just storing the entity as a JSON object directly, that might not be an issue, but that will not be the case in HBase column storage, right? 4 ??My suggestion is that we start with reimplementing what we provided in YTS v1, and add more timeline data on demand later?? True, to start with this would be sufficient, but in future I would like to capture all the events, as currently, to analyze/debug issues with a container, we usually search the NM and RM logs for the container string to find what state the application/container is in. Your opinion? [Event producers] Implement RM writing app lifecycle events to ATS -- Key: YARN-3044 URL: https://issues.apache.org/jira/browse/YARN-3044 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Naganarasimha G R Attachments: YARN-3044.20150325-1.patch Per design in YARN-2928, implement RM writing app lifecycle events to ATS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2618) Avoid over-allocation of disk resources
[ https://issues.apache.org/jira/browse/YARN-2618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Yan updated YARN-2618: -- Attachment: YARN-2618-6.patch Rebase the patch. Avoid over-allocation of disk resources --- Key: YARN-2618 URL: https://issues.apache.org/jira/browse/YARN-2618 Project: Hadoop YARN Issue Type: Sub-task Reporter: Wei Yan Assignee: Wei Yan Attachments: YARN-2618-1.patch, YARN-2618-2.patch, YARN-2618-3.patch, YARN-2618-4.patch, YARN-2618-5.patch, YARN-2618-6.patch Subtask of YARN-2139. This should include - Add API support for introducing disk I/O as the 3rd type resource. - NM should report this information to the RM - RM should consider this to avoid over-allocation -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2213) Change proxy-user cookie log in AmIpFilter to DEBUG
[ https://issues.apache.org/jira/browse/YARN-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14381965#comment-14381965 ] Hudson commented on YARN-2213: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #135 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/135/]) YARN-2213. Change proxy-user cookie log in AmIpFilter to DEBUG. (xgong: rev e556198e71df6be3a83e5598265cb702fc7a668b) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/main/java/org/apache/hadoop/yarn/server/webproxy/amfilter/AmIpFilter.java * hadoop-yarn-project/CHANGES.txt Change proxy-user cookie log in AmIpFilter to DEBUG --- Key: YARN-2213 URL: https://issues.apache.org/jira/browse/YARN-2213 Project: Hadoop YARN Issue Type: Task Reporter: Ted Yu Assignee: Varun Saxena Priority: Minor Fix For: 2.7.0 Attachments: YARN-2213.001.patch, YARN-2213.02.patch I saw a lot of the following lines in AppMaster log: {code} 14/06/24 17:12:36 WARN web.SliderAmIpFilter: Could not find proxy-user cookie, so user will not be set 14/06/24 17:12:39 WARN web.SliderAmIpFilter: Could not find proxy-user cookie, so user will not be set 14/06/24 17:12:39 WARN web.SliderAmIpFilter: Could not find proxy-user cookie, so user will not be set {code} For long running app, this would consume considerable log space. Log level should be changed to DEBUG. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3397) yarn rmadmin should skip -failover
[ https://issues.apache.org/jira/browse/YARN-3397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14381963#comment-14381963 ] Hudson commented on YARN-3397: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #135 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/135/]) YARN-3397. yarn rmadmin should skip -failover. (J.Andreina via kasha) (kasha: rev c906a1de7280dabd9d9d8b6aeaa060898e6d17b6) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestRMAdminCLI.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/RMAdminCLI.java yarn rmadmin should skip -failover -- Key: YARN-3397 URL: https://issues.apache.org/jira/browse/YARN-3397 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.7.0 Reporter: J.Andreina Assignee: J.Andreina Priority: Minor Fix For: 2.8.0 Attachments: YARN-3397.1.patch Failover should be filtered out from HAAdmin to be in sync with doc. Since -failover is not supported operation in doc it is not been mentioned, cli usage is misguiding (can be in sync with doc) . -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2213) Change proxy-user cookie log in AmIpFilter to DEBUG
[ https://issues.apache.org/jira/browse/YARN-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14381956#comment-14381956 ] Hudson commented on YARN-2213: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2076 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2076/]) YARN-2213. Change proxy-user cookie log in AmIpFilter to DEBUG. (xgong: rev e556198e71df6be3a83e5598265cb702fc7a668b) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/main/java/org/apache/hadoop/yarn/server/webproxy/amfilter/AmIpFilter.java * hadoop-yarn-project/CHANGES.txt Change proxy-user cookie log in AmIpFilter to DEBUG --- Key: YARN-2213 URL: https://issues.apache.org/jira/browse/YARN-2213 Project: Hadoop YARN Issue Type: Task Reporter: Ted Yu Assignee: Varun Saxena Priority: Minor Fix For: 2.7.0 Attachments: YARN-2213.001.patch, YARN-2213.02.patch I saw a lot of the following lines in AppMaster log: {code} 14/06/24 17:12:36 WARN web.SliderAmIpFilter: Could not find proxy-user cookie, so user will not be set 14/06/24 17:12:39 WARN web.SliderAmIpFilter: Could not find proxy-user cookie, so user will not be set 14/06/24 17:12:39 WARN web.SliderAmIpFilter: Could not find proxy-user cookie, so user will not be set {code} For long running app, this would consume considerable log space. Log level should be changed to DEBUG. -- This message was sent by Atlassian JIRA (v6.3.4#6332)