[jira] [Commented] (YARN-3375) NodeHealthScriptRunner.shouldRun() check is performing 3 times for starting NodeHealthScriptRunner

2015-05-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527593#comment-14527593
 ] 

Hudson commented on YARN-3375:
--

FAILURE: Integrated in Hadoop-trunk-Commit #7728 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7728/])
YARN-3375. NodeHealthScriptRunner.shouldRun() check is performing 3 times for 
starting NodeHealthScriptRunner (Devaraj K via wangda) (wangda: rev 
71f4de220c74bf2c90630bd0442979d92380d304)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeHealthCheckerService.java
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/NodeHealthScriptRunner.java


> NodeHealthScriptRunner.shouldRun() check is performing 3 times for starting 
> NodeHealthScriptRunner
> --
>
> Key: YARN-3375
> URL: https://issues.apache.org/jira/browse/YARN-3375
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Devaraj K
>Assignee: Devaraj K
>Priority: Minor
> Fix For: 2.8.0
>
> Attachments: YARN-3375.patch
>
>
> 1. The NodeHealthScriptRunner.shouldRun() check happens 3 times when starting 
> the NodeHealthScriptRunner.
> {code:title=NodeManager.java|borderStyle=solid}
> if(!NodeHealthScriptRunner.shouldRun(nodeHealthScript)) {
>   LOG.info("Abey khali");
>   return null;
> }
> {code}
> {code:title=NodeHealthCheckerService.java|borderStyle=solid}
> if (NodeHealthScriptRunner.shouldRun(
> conf.get(YarnConfiguration.NM_HEALTH_CHECK_SCRIPT_PATH))) {
>   addService(nodeHealthScriptRunner);
> }
> {code}
> {code:title=NodeHealthScriptRunner.java|borderStyle=solid}
> if (!shouldRun(nodeHealthScript)) {
>   LOG.info("Not starting node health monitor");
>   return;
> }
> {code}
> 2. If the node health script is not configured, or the configured health script 
> does not have execute permission, the NM logs the message below.
> {code:xml}
> 2015-03-19 19:55:45,713 INFO 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager: Abey khali
> {code}
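
For illustration, a minimal sketch of how the three checks could collapse into a 
single guard. This is not the committed patch; the NodeHealthScriptRunner 
constructor arguments shown are assumptions.

{code:java}
// Do the shouldRun() check exactly once, when the runner is created, instead of
// repeating it in NodeManager, NodeHealthCheckerService and the runner itself.
String nodeHealthScript =
    conf.get(YarnConfiguration.NM_HEALTH_CHECK_SCRIPT_PATH);
if (!NodeHealthScriptRunner.shouldRun(nodeHealthScript)) {
  LOG.info("Node health script is not configured or not executable, "
      + "not starting the node health script runner");
  nodeHealthScriptRunner = null;
} else {
  // Constructor arguments are illustrative.
  nodeHealthScriptRunner = new NodeHealthScriptRunner(nodeHealthScript, conf);
}
{code}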





[jira] [Commented] (YARN-2929) Adding separator ApplicationConstants.FILE_PATH_SEPARATOR for better Windows support

2015-05-04 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527594#comment-14527594
 ] 

Jian He commented on YARN-2929:
---

[~ozawa], [~cnauroth]'s comments make sense to me. There's nothing wrong with 
adding this support, but I'd prefer not to add one more replacement rule at this 
point if we have alternative solutions that achieve the same results. 

> Adding separator ApplicationConstants.FILE_PATH_SEPARATOR for better Windows 
> support
> 
>
> Key: YARN-2929
> URL: https://issues.apache.org/jira/browse/YARN-2929
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Tsuyoshi Ozawa
>Assignee: Tsuyoshi Ozawa
> Attachments: YARN-2929.001.patch
>
>
> Some frameworks like Spark are working to run jobs on Windows (SPARK-1825). 
> For better multi-platform support, we should introduce 
> ApplicationConstants.FILE_PATH_SEPARATOR to make file paths 
> platform-independent.
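
For context, a hedged sketch of how the proposed constant might be used when 
composing container launch commands. FILE_PATH_SEPARATOR does not exist yet; the 
idea is assumed to mirror the existing ApplicationConstants.CLASS_PATH_SEPARATOR, 
where the client writes a marker that the NodeManager expands for the node's 
platform. Surrounding names are illustrative.

{code:java}
// Illustrative only: compose a container log path without hard-coding "/" or "\".
String stdoutPath = ApplicationConstants.LOG_DIR_EXPANSION_VAR
    + ApplicationConstants.FILE_PATH_SEPARATOR   // proposed constant, not in the API yet
    + "stdout";
List<String> commands = Collections.singletonList(
    "mycommand 1>" + stdoutPath + " 2>&1");
amContainer.setCommands(commands);               // amContainer: a ContainerLaunchContext
{code}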





[jira] [Commented] (YARN-2492) (Clone of YARN-796) Allow for (admin) labels on nodes and resource-requests

2015-05-04 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527599#comment-14527599
 ] 

Wangda Tan commented on YARN-2492:
--

[~djp], I didn't close YARN-796 since there are lots of discussions and watchers 
on that JIRA; it is itself a sub-JIRA, so I created YARN-2492. The targeted use 
cases of YARN-796 are still under discussion (YARN-3409), so I suggest keeping 
both of them before we add constraints.

> (Clone of YARN-796) Allow for (admin) labels on nodes and resource-requests 
> 
>
> Key: YARN-2492
> URL: https://issues.apache.org/jira/browse/YARN-2492
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: api, client, resourcemanager
>Reporter: Wangda Tan
>
> Since YARN-796 is a sub JIRA of YARN-397, this JIRA is used to create and 
> track sub tasks and attach split patches for YARN-796.
> *Let's still keep over-all discussions on YARN-796.*





[jira] [Assigned] (YARN-3573) MiniMRYarnCluster constructor that starts the timeline server using a boolean should be marked deprecated

2015-05-04 Thread Brahma Reddy Battula (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula reassigned YARN-3573:
--

Assignee: Brahma Reddy Battula

> MiniMRYarnCluster constructor that starts the timeline server using a boolean 
> should be marked deprecated
> -
>
> Key: YARN-3573
> URL: https://issues.apache.org/jira/browse/YARN-3573
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Affects Versions: 2.6.0
>Reporter: Mit Desai
>Assignee: Brahma Reddy Battula
>
> {code}MiniMRYarnCluster(String testName, int noOfNMs, boolean enableAHS){code}
> starts the timeline server using *boolean enableAHS*. It is better to have 
> the timelineserver started based on the config value.
> We should mark this constructor as deprecated to avoid its future use.
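
A rough sketch of the requested change (not the actual patch); the super(...) 
argument lists and the exact constructor set are assumptions, not the real 
MiniMRYarnCluster/MiniYARNCluster signatures.

{code:java}
@Deprecated  // callers should rely on yarn.timeline-service.enabled instead
public MiniMRYarnCluster(String testName, int noOfNMs, boolean enableAHS) {
  super(testName, 1, noOfNMs, 4, 4, enableAHS);
}

public MiniMRYarnCluster(String testName, int noOfNMs) {
  // Whether the timeline server starts is decided from the Configuration
  // passed to init() (YarnConfiguration.TIMELINE_SERVICE_ENABLED), not from
  // a constructor flag.
  super(testName, 1, noOfNMs, 4, 4);
}
{code}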





[jira] [Commented] (YARN-1621) Add CLI to list rows of

2015-05-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527601#comment-14527601
 ] 

Hadoop QA commented on YARN-1621:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  14m 45s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 3 new or modified test files. |
| {color:green}+1{color} | javac |   7m 35s | There were no new javac warning 
messages. |
| {color:red}-1{color} | javadoc |   9m 36s | The applied patch generated  3  
additional warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   2m 59s | There were no new checkstyle 
issues. |
| {color:blue}0{color} | shellcheck |   2m 59s | Shellcheck was not available. |
| {color:green}+1{color} | whitespace |   0m 29s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 37s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 32s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   5m 28s | The patch does not introduce 
any new Findbugs (version 2.0.3) warnings. |
| {color:green}+1{color} | mapreduce tests | 106m 18s | Tests passed in 
hadoop-mapreduce-client-jobclient. |
| {color:green}+1{color} | yarn tests |   0m 27s | Tests passed in 
hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests |   6m 49s | Tests passed in 
hadoop-yarn-client. |
| {color:green}+1{color} | yarn tests |   2m  2s | Tests passed in 
hadoop-yarn-common. |
| {color:red}-1{color} | yarn tests |  62m 37s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | | 221m 42s | |
\\
\\
|| Reason || Tests ||
| Timed out tests | 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestNodeLabelContainerAllocation
 |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12730251/YARN-1621.6.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle shellcheck |
| git revision | trunk / 3fe79e1 |
| javadoc | 
https://builds.apache.org/job/PreCommit-YARN-Build/7689/artifact/patchprocess/diffJavadocWarnings.txt
 |
| hadoop-mapreduce-client-jobclient test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/7689/artifact/patchprocess/testrun_hadoop-mapreduce-client-jobclient.txt
 |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/7689/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| hadoop-yarn-client test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/7689/artifact/patchprocess/testrun_hadoop-yarn-client.txt
 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/7689/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/7689/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/7689/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/7689/console |


This message was automatically generated.

> Add CLI to list rows of  state of container>
> --
>
> Key: YARN-1621
> URL: https://issues.apache.org/jira/browse/YARN-1621
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.2.0
>Reporter: Tassapol Athiapinya
>Assignee: Bartosz Ɓugowski
> Attachments: YARN-1621.1.patch, YARN-1621.2.patch, YARN-1621.3.patch, 
> YARN-1621.4.patch, YARN-1621.5.patch, YARN-1621.6.patch
>
>
> As more applications are moved to YARN, we need generic CLI to list rows of 
> . Today 
> if a YARN application running in a container hangs, there is no way to find 
> out more info because a user does not know where each attempt is running.
> For each running application, it is useful to differentiate between 
> running/succeeded/failed/killed containers.
>  
> {code:title=proposed yarn cli}
> $ yarn application -list-containers -applicationId  [-containerState 
> ]
> where containerState is optional filter to list container in given state only.
>  can be running/succeeded/killed/failed/all.
> A user can specify more than one container state at once e.g. KILLED,FAILED.
> 
> {code}

[jira] [Updated] (YARN-3573) MiniMRYarnCluster constructor that starts the timeline server using a boolean should be marked deprecated

2015-05-04 Thread Brahma Reddy Battula (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated YARN-3573:
---
Attachment: YARN-3573.patch

> MiniMRYarnCluster constructor that starts the timeline server using a boolean 
> should be marked deprecated
> -
>
> Key: YARN-3573
> URL: https://issues.apache.org/jira/browse/YARN-3573
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Affects Versions: 2.6.0
>Reporter: Mit Desai
>Assignee: Brahma Reddy Battula
> Attachments: YARN-3573.patch
>
>
> {code}MiniMRYarnCluster(String testName, int noOfNMs, boolean enableAHS){code}
> starts the timeline server using *boolean enableAHS*. It is better to have 
> the timelineserver started based on the config value.
> We should mark this constructor as deprecated to avoid its future use.





[jira] [Commented] (YARN-3018) Unify the default value for yarn.scheduler.capacity.node-locality-delay in code and default xml file

2015-05-04 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527612#comment-14527612
 ] 

Jian He commented on YARN-3018:
---

Hi [~nijel], 
the code below in CapacitySchedulerConfiguration actually uses 0 instead. How about 
changing the default to 0 and simplifying the code below to {{return 
getInt(NODE_LOCALITY_DELAY, DEFAULT_NODE_LOCALITY_DELAY);}}?
{code}
  public int getNodeLocalityDelay() {
int delay = getInt(NODE_LOCALITY_DELAY, DEFAULT_NODE_LOCALITY_DELAY);
return (delay == DEFAULT_NODE_LOCALITY_DELAY) ? 0 : delay;
  }
{code}
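
A sketch of the suggested simplification, assuming DEFAULT_NODE_LOCALITY_DELAY 
itself becomes the effective default so the constant and the shipped 
capacity-scheduler.xml no longer disagree:

{code:java}
// Suggested shape (sketch): make the constant the real default and return the
// configured value directly, with no special-casing of -1.
public static final int DEFAULT_NODE_LOCALITY_DELAY = 0;

public int getNodeLocalityDelay() {
  return getInt(NODE_LOCALITY_DELAY, DEFAULT_NODE_LOCALITY_DELAY);
}
{code}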

> Unify the default value for yarn.scheduler.capacity.node-locality-delay in 
> code and default xml file
> 
>
> Key: YARN-3018
> URL: https://issues.apache.org/jira/browse/YARN-3018
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Reporter: nijel
>Assignee: nijel
>Priority: Trivial
> Attachments: YARN-3018-1.patch, YARN-3018-2.patch, YARN-3018-3.patch
>
>
> For the configuration item "yarn.scheduler.capacity.node-locality-delay", the 
> default value given in code is "-1":
> public static final int DEFAULT_NODE_LOCALITY_DELAY = -1;
> In the default capacity-scheduler.xml file in the resource manager config 
> directory it is 40.
> Can this be unified to avoid confusion when the user creates the file without 
> this configuration? If the user expects the values in the file to be the 
> defaults, they will be wrong.





[jira] [Commented] (YARN-3562) unit tests failures and issues found from findbug from earlier ATS checkins

2015-05-04 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527614#comment-14527614
 ] 

Naganarasimha G R commented on YARN-3562:
-

Thanks [~sjlee0]. Yes, lately I have been seeing some strange Jenkins output; 
thanks for testing locally. But there might be some other unrelated test case 
failures, since we are modifying the MiniYARNCluster, so I am not sure how to 
proceed in that case. 
Also, how do you kick off Jenkins? Delete and re-upload the patch?

> unit tests failures and issues found from findbug from earlier ATS checkins
> ---
>
> Key: YARN-3562
> URL: https://issues.apache.org/jira/browse/YARN-3562
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Naganarasimha G R
>Priority: Minor
> Attachments: YARN-3562-YARN-2928.001.patch
>
>
> *Issues reported from MAPREDUCE-6337* :
> A bunch of MR unit tests are failing on our branch whenever the mini YARN 
> cluster needs to bring up multiple node managers.
> For example, see 
> https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5472/testReport/org.apache.hadoop.mapred/TestClusterMapReduceTestCase/testMapReduceRestarting/
> It is because the NMCollectorService is using a fixed port for the RPC (8048).
> *Issues reported from YARN-3044* :
> Test case failures and tools(FB & CS) issues found :
> # find bugs issue : Comparison of String objects using == or != in 
> ResourceTrackerService.updateAppCollectorsMap
> # find bugs issue : Boxing/unboxing to parse a primitive 
> RMTimelineCollectorManager.postPut. Called method Long.longValue()
> Should call Long.parseLong(String) instead.
> # find bugs issue : DM_DEFAULT_ENCODING Called method new 
> java.io.FileWriter(String, boolean) At 
> FileSystemTimelineWriterImpl.java:\[line 86\]
> # hadoop.yarn.server.resourcemanager.TestAppManager, 
> hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions, 
> hadoop.yarn.server.resourcemanager.TestClientRMService & 
> hadoop.yarn.server.resourcemanager.logaggregationstatus.TestRMAppLogAggregationStatus,
>  refer https://builds.apache.org/job/PreCommit-YARN-Build/7534/testReport/
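
For reference, illustrative sketches of the usual fixes for the three findbugs 
items above; the variable names are made up and not taken from the patch.

{code:java}
// 1. Compare String contents with equals(), not == / !=.
boolean sameCollector = newCollectorAddr.equals(currentCollectorAddr);

// 2. Parse the primitive directly rather than boxing and unboxing a Long.
long timestamp = Long.parseLong(timestampString);

// 3. DM_DEFAULT_ENCODING: write with an explicit charset instead of FileWriter.
try (Writer out = new OutputStreamWriter(
    new FileOutputStream(outputFileName, true), StandardCharsets.UTF_8)) {
  out.write(entityJson);
}
{code}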





[jira] [Commented] (YARN-3574) RM hangs on stopping MetricsSinkAdapter when transitioning to standby

2015-05-04 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527616#comment-14527616
 ] 

Brahma Reddy Battula commented on YARN-3574:


[~jianhe] I would like to work on this. I am not able to reproduce it; can 
you please share the scenario?

> RM hangs on stopping MetricsSinkAdapter when transitioning to standby
> -
>
> Key: YARN-3574
> URL: https://issues.apache.org/jira/browse/YARN-3574
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>
> We've seen a situation that one RM hangs on stopping the MetricsSinkAdapter
> {code}
> "main-EventThread" daemon prio=10 tid=0x7f9b24031000 nid=0x2d18 in 
> Object.wait() [0x7f9afe7eb000]
>java.lang.Thread.State: WAITING (on object monitor)
> at java.lang.Object.wait(Native Method)
> - waiting on <0xc058dcf8> (a 
> org.apache.hadoop.metrics2.impl.MetricsSinkAdapter$1)
> at java.lang.Thread.join(Thread.java:1281)
> - locked <0xc058dcf8> (a 
> org.apache.hadoop.metrics2.impl.MetricsSinkAdapter$1)
> at java.lang.Thread.join(Thread.java:1355)
> at 
> org.apache.hadoop.metrics2.impl.MetricsSinkAdapter.stop(MetricsSinkAdapter.java:202)
> at 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl.stopSinks(MetricsSystemImpl.java:472)
> - locked <0xc04cc1a0> (a 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl)
> at 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl.stop(MetricsSystemImpl.java:213)
> - locked <0xc04cc1a0> (a 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl)
> at 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl.shutdown(MetricsSystemImpl.java:592)
> - locked <0xc04cc1a0> (a 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl)
> at 
> org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.shutdownInstance(DefaultMetricsSystem.java:72)
> at 
> org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.shutdown(DefaultMetricsSystem.java:68)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStop(ResourceManager.java:605)
> at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
> - locked <0xc0503568> (a java.lang.Object)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.stopActiveServices(ResourceManager.java:1024)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToStandby(ResourceManager.java:1076)
> - locked <0xc03fe3b8> (a 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToStandby(AdminService.java:322)
> - locked <0xc0502b10> (a 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeStandby(EmbeddedElectorService.java:135)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeStandby(ActiveStandbyElector.java:911)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:428)
> - locked <0xc0718940> (a 
> org.apache.hadoop.ha.ActiveStandbyElector)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:605)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> {code}
>  looks like the {{sinkThread.interrupt();}} in MetricsSinkAdapter#stop 
> doesn't really interrupt the thread, which causes it to hang at join.
> This appears only once.
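
A hedged sketch of one way to make the stop path defensive. The real 
MetricsSinkAdapter field and method names may differ; this is only an 
illustration of bounding the join so a sink stuck in blocking I/O (see the 
"timeline" thread in the second trace) cannot hang stop() forever.

{code:java}
void stop() {
  stopping = true;
  sinkThread.interrupt();        // has no effect if the sink is blocked in socket I/O
  try {
    sinkThread.join(30 * 1000L); // bounded wait instead of an unbounded join()
    if (sinkThread.isAlive()) {
      LOG.warn("Sink thread did not stop within 30s; abandoning it");
    }
  } catch (InterruptedException e) {
    Thread.currentThread().interrupt();
    LOG.warn("Interrupted while stopping the sink thread", e);
  }
}
{code}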





[jira] [Assigned] (YARN-3574) RM hangs on stopping MetricsSinkAdapter when transitioning to standby

2015-05-04 Thread Brahma Reddy Battula (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula reassigned YARN-3574:
--

Assignee: Brahma Reddy Battula

> RM hangs on stopping MetricsSinkAdapter when transitioning to standby
> -
>
> Key: YARN-3574
> URL: https://issues.apache.org/jira/browse/YARN-3574
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Brahma Reddy Battula
>
> We've seen a situation that one RM hangs on stopping the MetricsSinkAdapter
> {code}
> "main-EventThread" daemon prio=10 tid=0x7f9b24031000 nid=0x2d18 in 
> Object.wait() [0x7f9afe7eb000]
>java.lang.Thread.State: WAITING (on object monitor)
> at java.lang.Object.wait(Native Method)
> - waiting on <0xc058dcf8> (a 
> org.apache.hadoop.metrics2.impl.MetricsSinkAdapter$1)
> at java.lang.Thread.join(Thread.java:1281)
> - locked <0xc058dcf8> (a 
> org.apache.hadoop.metrics2.impl.MetricsSinkAdapter$1)
> at java.lang.Thread.join(Thread.java:1355)
> at 
> org.apache.hadoop.metrics2.impl.MetricsSinkAdapter.stop(MetricsSinkAdapter.java:202)
> at 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl.stopSinks(MetricsSystemImpl.java:472)
> - locked <0xc04cc1a0> (a 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl)
> at 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl.stop(MetricsSystemImpl.java:213)
> - locked <0xc04cc1a0> (a 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl)
> at 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl.shutdown(MetricsSystemImpl.java:592)
> - locked <0xc04cc1a0> (a 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl)
> at 
> org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.shutdownInstance(DefaultMetricsSystem.java:72)
> at 
> org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.shutdown(DefaultMetricsSystem.java:68)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStop(ResourceManager.java:605)
> at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
> - locked <0xc0503568> (a java.lang.Object)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.stopActiveServices(ResourceManager.java:1024)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToStandby(ResourceManager.java:1076)
> - locked <0xc03fe3b8> (a 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToStandby(AdminService.java:322)
> - locked <0xc0502b10> (a 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeStandby(EmbeddedElectorService.java:135)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeStandby(ActiveStandbyElector.java:911)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:428)
> - locked <0xc0718940> (a 
> org.apache.hadoop.ha.ActiveStandbyElector)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:605)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> {code}
>  looks like the {{sinkThread.interrupt();}} in MetricsSinkAdapter#stop 
> doesn't really interrupt the thread, which causes it to hang at join.
> This appears only once.





[jira] [Commented] (YARN-3573) MiniMRYarnCluster constructor that starts the timeline server using a boolean should be marked deprecated

2015-05-04 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527619#comment-14527619
 ] 

Brahma Reddy Battula commented on YARN-3573:


[~mitdesai] Thanks for reporting. I have attached the patch; kindly review.

> MiniMRYarnCluster constructor that starts the timeline server using a boolean 
> should be marked deprecated
> -
>
> Key: YARN-3573
> URL: https://issues.apache.org/jira/browse/YARN-3573
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Affects Versions: 2.6.0
>Reporter: Mit Desai
>Assignee: Brahma Reddy Battula
> Attachments: YARN-3573.patch
>
>
> {code}MiniMRYarnCluster(String testName, int noOfNMs, boolean enableAHS){code}
> starts the timeline server using *boolean enableAHS*. It is better to have 
> the timelineserver started based on the config value.
> We should mark this constructor as deprecated to avoid its future use.





[jira] [Commented] (YARN-3562) unit tests failures and issues found from findbug from earlier ATS checkins

2015-05-04 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527626#comment-14527626
 ] 

Zhijie Shen commented on YARN-3562:
---

I see a lot of NoSuchMethodError here too. I think the Jenkins build on the 
branch has some bug.

> unit tests failures and issues found from findbug from earlier ATS checkins
> ---
>
> Key: YARN-3562
> URL: https://issues.apache.org/jira/browse/YARN-3562
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Naganarasimha G R
>Priority: Minor
> Attachments: YARN-3562-YARN-2928.001.patch
>
>
> *Issues reported from MAPREDUCE-6337* :
> A bunch of MR unit tests are failing on our branch whenever the mini YARN 
> cluster needs to bring up multiple node managers.
> For example, see 
> https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5472/testReport/org.apache.hadoop.mapred/TestClusterMapReduceTestCase/testMapReduceRestarting/
> It is because the NMCollectorService is using a fixed port for the RPC (8048).
> *Issues reported from YARN-3044* :
> Test case failures and tools(FB & CS) issues found :
> # find bugs issue : Comparison of String objects using == or != in 
> ResourceTrackerService.updateAppCollectorsMap
> # find bugs issue : Boxing/unboxing to parse a primitive 
> RMTimelineCollectorManager.postPut. Called method Long.longValue()
> Should call Long.parseLong(String) instead.
> # find bugs issue : DM_DEFAULT_ENCODING Called method new 
> java.io.FileWriter(String, boolean) At 
> FileSystemTimelineWriterImpl.java:\[line 86\]
> # hadoop.yarn.server.resourcemanager.TestAppManager, 
> hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions, 
> hadoop.yarn.server.resourcemanager.TestClientRMService & 
> hadoop.yarn.server.resourcemanager.logaggregationstatus.TestRMAppLogAggregationStatus,
>  refer https://builds.apache.org/job/PreCommit-YARN-Build/7534/testReport/





[jira] [Updated] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend

2015-05-04 Thread Li Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-3134:

Attachment: YARN-3134-YARN-2928.003.patch

Updated my patch according to the latest comments. I've rebased the patch to 
the latest YARN-2928 branch, with YARN-3551 in. In this version we're no longer 
swallowing exceptions. I have not made the change on the Phoenix connection 
string since, according to our previous discussion, we're planning to address 
this after we've decided which implementation to pursue in the future. 

A special note to [~zjshen]: I'm not sure my current way of accessing the 
"singleData" section of a TimelineMetric is correct (since the field no longer 
exists). It would be great if you could take a look at it. Thanks! 

> [Storage implementation] Exploiting the option of using Phoenix to access 
> HBase backend
> ---
>
> Key: YARN-3134
> URL: https://issues.apache.org/jira/browse/YARN-3134
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Li Lu
> Attachments: SettingupPhoenixstorageforatimelinev2end-to-endtest.pdf, 
> YARN-3134-040915_poc.patch, YARN-3134-041015_poc.patch, 
> YARN-3134-041415_poc.patch, YARN-3134-042115.patch, YARN-3134-042715.patch, 
> YARN-3134-YARN-2928.001.patch, YARN-3134-YARN-2928.002.patch, 
> YARN-3134-YARN-2928.003.patch, YARN-3134DataSchema.pdf
>
>
> Quote the introduction on Phoenix web page:
> {code}
> Apache Phoenix is a relational database layer over HBase delivered as a 
> client-embedded JDBC driver targeting low latency queries over HBase data. 
> Apache Phoenix takes your SQL query, compiles it into a series of HBase 
> scans, and orchestrates the running of those scans to produce regular JDBC 
> result sets. The table metadata is stored in an HBase table and versioned, 
> such that snapshot queries over prior versions will automatically use the 
> correct schema. Direct use of the HBase API, along with coprocessors and 
> custom filters, results in performance on the order of milliseconds for small 
> queries, or seconds for tens of millions of rows.
> {code}
> It may simplify how our implementation reads/writes data from/to HBase, and 
> make it easy to build indexes and compose complex queries.





[jira] [Commented] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend

2015-05-04 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527632#comment-14527632
 ] 

Li Lu commented on YARN-3134:
-

And one more thing: I'm closing all PreparedStatements implicitly in the 
try-with-resources statements. This statement will not swallow any exceptions 
(since there's no "catch" after it) but will guarantee the resource is released 
after the block's execution, even if there are exceptions. 
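
A minimal example of the pattern described above; the SQL and variable names are 
illustrative, not taken from the patch.

{code:java}
// The try-with-resources statement closes the PreparedStatement when the block
// exits, normally or exceptionally, and propagates any exception since there
// is no catch clause here.
try (PreparedStatement stmt = conn.prepareStatement(
    "UPSERT INTO metric_table (entity_id, metric_name, metric_value) VALUES (?, ?, ?)")) {
  stmt.setString(1, entityId);
  stmt.setString(2, metricName);
  stmt.setDouble(3, metricValue);
  stmt.executeUpdate();
}
{code}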

> [Storage implementation] Exploiting the option of using Phoenix to access 
> HBase backend
> ---
>
> Key: YARN-3134
> URL: https://issues.apache.org/jira/browse/YARN-3134
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Li Lu
> Attachments: SettingupPhoenixstorageforatimelinev2end-to-endtest.pdf, 
> YARN-3134-040915_poc.patch, YARN-3134-041015_poc.patch, 
> YARN-3134-041415_poc.patch, YARN-3134-042115.patch, YARN-3134-042715.patch, 
> YARN-3134-YARN-2928.001.patch, YARN-3134-YARN-2928.002.patch, 
> YARN-3134-YARN-2928.003.patch, YARN-3134DataSchema.pdf
>
>
> Quote the introduction on Phoenix web page:
> {code}
> Apache Phoenix is a relational database layer over HBase delivered as a 
> client-embedded JDBC driver targeting low latency queries over HBase data. 
> Apache Phoenix takes your SQL query, compiles it into a series of HBase 
> scans, and orchestrates the running of those scans to produce regular JDBC 
> result sets. The table metadata is stored in an HBase table and versioned, 
> such that snapshot queries over prior versions will automatically use the 
> correct schema. Direct use of the HBase API, along with coprocessors and 
> custom filters, results in performance on the order of milliseconds for small 
> queries, or seconds for tens of millions of rows.
> {code}
> It may simplify how our implementation reads/writes data from/to HBase, and 
> make it easy to build indexes and compose complex queries.





[jira] [Commented] (YARN-3514) Active directory usernames like domain\login cause YARN failures

2015-05-04 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527639#comment-14527639
 ] 

Wangda Tan commented on YARN-3514:
--

[~cnauroth], I think this causes other problems in the latest YARN as well, for 
example:

If a user has a mixed-case name, for example "De", and we have a rule "/L" on 
the Kerberos side to lower-case all names, then when the NM does log 
aggregation it will fail because the user name doesn't match (it is "de" in 
UserGroupInformation but "De" in the OS).

{code}
java.io.IOException: Owner 'De' for path 
/hadoop/yarn2/log/application_1428608050835_0013/container_1428608050835_0013_01_06/stderr 
did not match expected owner 'de'
at org.apache.hadoop.io.SecureIOUtils.checkStat(SecureIOUtils.java:285)
at 
org.apache.hadoop.io.SecureIOUtils.forceSecureOpenForRead(SecureIOUtils.java:219)
at 
org.apache.hadoop.io.SecureIOUtils.openForRead(SecureIOUtils.java:204)
at 
org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogValue.secureOpenFile(AggregatedLogFormat.java:275)
at 
org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogValue.write(AggregatedLogFormat.java:227)
at 
org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogWriter.append(AggregatedLogFormat.java:448)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl$ContainerLogAggregator.doContainer
LogAggregation(AppLogAggregatorImpl.java:534)
at 
...
{code}

One possible solution is to ignore case when comparing user names, but that will 
be problematic if users "De" and "de" exist at the same time. Any thoughts, 
[~cnauroth]?
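
To make the trade-off concrete, a sketch of the "ignore case" variant of the 
ownership check. This is illustrative only (the real check is 
SecureIOUtils.checkStat() in the trace above, with a different signature) and, as 
noted, would be unsafe if both "De" and "de" can exist on the same node.

{code:java}
private static void checkOwner(File file, String actualOwner, String expectedOwner)
    throws IOException {
  // Case-insensitive comparison tolerates the Kerberos /L rule, but it cannot
  // distinguish two OS users whose names differ only by case.
  if (!actualOwner.equalsIgnoreCase(expectedOwner)) {
    throw new IOException("Owner '" + actualOwner + "' for path " + file
        + " did not match expected owner '" + expectedOwner + "'");
  }
}
{code}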

> Active directory usernames like domain\login cause YARN failures
> 
>
> Key: YARN-3514
> URL: https://issues.apache.org/jira/browse/YARN-3514
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.2.0
> Environment: CentOS6
>Reporter: john lilley
>Assignee: Chris Nauroth
>Priority: Minor
> Attachments: YARN-3514.001.patch, YARN-3514.002.patch
>
>
> We have a 2.2.0 (Cloudera 5.3) cluster running on CentOS6 that is 
> Kerberos-enabled and uses an external AD domain controller for the KDC.  We 
> are able to authenticate, browse HDFS, etc.  However, YARN fails during 
> localization because it seems to get confused by the presence of a \ 
> character in the local user name.
> Our AD authentication on the nodes goes through sssd and is configured to 
> map AD users onto the form domain\username.  For example, our test user has a 
> Kerberos principal of hadoopu...@domain.com and that maps onto a CentOS user 
> "domain\hadoopuser".  We have no problem validating that user with PAM, 
> logging in as that user, su-ing to that user, etc.
> However, when we attempt to run a YARN application master, the localization 
> step fails when setting up the local cache directory for the AM.  The error 
> that comes out of the RM logs:
> 2015-04-17 12:47:09 INFO net.redpoint.yarnapp.Client[0]: monitorApplication: 
> ApplicationReport: appId=1, state=FAILED, progress=0.0, finalStatus=FAILED, 
> diagnostics='Application application_1429295486450_0001 failed 1 times due to 
> AM Container for appattempt_1429295486450_0001_01 exited with  exitCode: 
> -1000 due to: Application application_1429295486450_0001 initialization 
> failed (exitCode=255) with output: main : command provided 0
> main : user is DOMAIN\hadoopuser
> main : requested yarn user is domain\hadoopuser
> org.apache.hadoop.util.DiskChecker$DiskErrorException: Cannot create 
> directory: 
> /data/yarn/nm/usercache/domain%5Chadoopuser/appcache/application_1429295486450_0001/filecache/10
> at 
> org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:105)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.download(ContainerLocalizer.java:199)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.localizeFiles(ContainerLocalizer.java:241)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:169)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.main(ContainerLocalizer.java:347)
> .Failing this attempt.. Failing the application.'
> However, when we look on the node launching the AM, we see this:
> [root@rpb-cdh-kerb-2 ~]# cd /data/yarn/nm/usercache
> [root@rpb-cdh-kerb-2 usercache]# ls -l
> drwxr-s--- 4 DOMAIN\hadoopuser yarn 4096 Apr 17 12:10 domain\hadoopuser
> There appears to be different treatment of the \ character in diff

[jira] [Commented] (YARN-3343) TestCapacitySchedulerNodeLabelUpdate.testNodeUpdate sometime fails in trunk

2015-05-04 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527640#comment-14527640
 ] 

Jian He commented on YARN-3343:
---

[~rohithsharma], is this still reproducible? It seems not to be on my side. 

> TestCapacitySchedulerNodeLabelUpdate.testNodeUpdate sometime fails in trunk
> ---
>
> Key: YARN-3343
> URL: https://issues.apache.org/jira/browse/YARN-3343
> Project: Hadoop YARN
>  Issue Type: Test
>Reporter: Xuan Gong
>Assignee: Rohith
>Priority: Minor
> Attachments: 0001-YARN-3343.patch
>
>
> Error Message
> test timed out after 3 milliseconds
> Stacktrace
> java.lang.Exception: test timed out after 3 milliseconds
>   at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
>   at java.net.InetAddress$1.lookupAllHostAddr(InetAddress.java:901)
>   at 
> java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1293)
>   at java.net.InetAddress.getAllByName0(InetAddress.java:1246)
>   at java.net.InetAddress.getAllByName(InetAddress.java:1162)
>   at java.net.InetAddress.getAllByName(InetAddress.java:1098)
>   at java.net.InetAddress.getByName(InetAddress.java:1048)
>   at org.apache.hadoop.net.NetUtils.normalizeHostName(NetUtils.java:563)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.NodesListManager.isValidNode(NodesListManager.java:147)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService.nodeHeartbeat(ResourceTrackerService.java:367)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.MockNM.nodeHeartbeat(MockNM.java:178)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.MockNM.nodeHeartbeat(MockNM.java:136)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.MockRM.waitForState(MockRM.java:206)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerNodeLabelUpdate.testNodeUpdate(TestCapacitySchedulerNodeLabelUpdate.java:157)





[jira] [Commented] (YARN-3523) Cleanup ResourceManagerAdministrationProtocol interface audience

2015-05-04 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527649#comment-14527649
 ] 

Naganarasimha G R commented on YARN-3523:
-

In that case, better to remove @Stable and not add @Unstable. Thoughts?

> Cleanup ResourceManagerAdministrationProtocol interface audience
> 
>
> Key: YARN-3523
> URL: https://issues.apache.org/jira/browse/YARN-3523
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client, resourcemanager
>Reporter: Wangda Tan
>Assignee: Naganarasimha G R
>  Labels: newbie
> Attachments: YARN-3523.20150422-1.patch, YARN-3523.20150504-1.patch
>
>
> I noticed ResourceManagerAdministrationProtocol has @Private audience for the 
> class and @Public audience for methods. It doesn't make sense to me. We 
> should make class audience and methods audience consistent.
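
For illustration, a sketch of what a consistent annotation scheme could look like 
on the interface. Whether to keep @Stable, drop it, or use @Unstable is exactly 
the open question in the comments above, so the stability annotation here is only 
a placeholder, and the single method shown is abbreviated.

{code:java}
@InterfaceAudience.Private
@InterfaceStability.Unstable   // placeholder; see the discussion above
public interface ResourceManagerAdministrationProtocol {

  // Methods inherit the class-level audience instead of declaring @Public.
  RefreshQueuesResponse refreshQueues(RefreshQueuesRequest request)
      throws YarnException, IOException;
}
{code}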





[jira] [Commented] (YARN-2918) RM startup fails if accessible-node-labels are configured to queue without cluster labels

2015-05-04 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527671#comment-14527671
 ] 

Wangda Tan commented on YARN-2918:
--

[~rohithsharma], can I take this JIRA? I want to address the checking and 
remove directlyAccessNodeLabelStore as well.

Thanks,

> RM startup fails if accessible-node-labels are configured to queue without 
> cluster labels
> ---
>
> Key: YARN-2918
> URL: https://issues.apache.org/jira/browse/YARN-2918
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Rohith
>Assignee: Rohith
>
> I configured accessible-node-labels on a queue, but RM startup fails with the 
> exception below. The current steps to configure node labels are to first add them 
> via rmadmin and then configure them for queues. But it would be good if cluster 
> and queue node labels were configured consistently. 
> {noformat}
> 2014-12-03 20:11:50,126 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting 
> ResourceManager
> org.apache.hadoop.service.ServiceStateException: java.io.IOException: 
> NodeLabelManager doesn't include label = x, please check.
>   at 
> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:172)
>   at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:556)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:982)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:249)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1203)
> Caused by: java.io.IOException: NodeLabelManager doesn't include label = x, 
> please check.
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.checkIfLabelInClusterNodeLabels(SchedulerUtils.java:287)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.(AbstractCSQueue.java:109)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.(LeafQueue.java:120)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.parseQueue(CapacityScheduler.java:567)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.parseQueue(CapacityScheduler.java:587)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initializeQueues(CapacityScheduler.java:462)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initScheduler(CapacityScheduler.java:294)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.serviceInit(CapacityScheduler.java:324)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> {noformat}





[jira] [Commented] (YARN-3521) Support return structured NodeLabel objects in REST API when call getClusterNodeLabels

2015-05-04 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527697#comment-14527697
 ] 

Wangda Tan commented on YARN-3521:
--

[~sunilg], makes sense to me.

bq. IMO I also feel that NodeLabelManager apis can use Object rather than 
Strings. Admin interface can take this conversion logic.
Sorry, I didn't get this: currently addToCluserNodeLabels already takes an 
object instead of a String, and you're using it in your patch.

> Support return structured NodeLabel objects in REST API when call 
> getClusterNodeLabels
> --
>
> Key: YARN-3521
> URL: https://issues.apache.org/jira/browse/YARN-3521
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Reporter: Wangda Tan
>Assignee: Sunil G
> Attachments: 0001-YARN-3521.patch, 0002-YARN-3521.patch, 
> 0003-YARN-3521.patch, 0004-YARN-3521.patch
>
>
> In YARN-3413, the yarn cluster CLI returns NodeLabel instead of String; we should 
> make the same change on the REST API side to keep them consistent.





[jira] [Updated] (YARN-3574) RM hangs on stopping MetricsSinkAdapter when transitioning to standby

2015-05-04 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-3574:
--
Description: 
We've seen a situation that one RM hangs on stopping the MetricsSinkAdapter

{code}
"main-EventThread" daemon prio=10 tid=0x7f9b24031000 nid=0x2d18 in 
Object.wait() [0x7f9afe7eb000]
   java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0xc058dcf8> (a 
org.apache.hadoop.metrics2.impl.MetricsSinkAdapter$1)
at java.lang.Thread.join(Thread.java:1281)
- locked <0xc058dcf8> (a 
org.apache.hadoop.metrics2.impl.MetricsSinkAdapter$1)
at java.lang.Thread.join(Thread.java:1355)
at 
org.apache.hadoop.metrics2.impl.MetricsSinkAdapter.stop(MetricsSinkAdapter.java:202)
at 
org.apache.hadoop.metrics2.impl.MetricsSystemImpl.stopSinks(MetricsSystemImpl.java:472)
- locked <0xc04cc1a0> (a 
org.apache.hadoop.metrics2.impl.MetricsSystemImpl)
at 
org.apache.hadoop.metrics2.impl.MetricsSystemImpl.stop(MetricsSystemImpl.java:213)
- locked <0xc04cc1a0> (a 
org.apache.hadoop.metrics2.impl.MetricsSystemImpl)
at 
org.apache.hadoop.metrics2.impl.MetricsSystemImpl.shutdown(MetricsSystemImpl.java:592)
- locked <0xc04cc1a0> (a 
org.apache.hadoop.metrics2.impl.MetricsSystemImpl)
at 
org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.shutdownInstance(DefaultMetricsSystem.java:72)
at 
org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.shutdown(DefaultMetricsSystem.java:68)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStop(ResourceManager.java:605)
at 
org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
- locked <0xc0503568> (a java.lang.Object)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.stopActiveServices(ResourceManager.java:1024)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToStandby(ResourceManager.java:1076)
- locked <0xc03fe3b8> (a 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager)
at 
org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToStandby(AdminService.java:322)
- locked <0xc0502b10> (a 
org.apache.hadoop.yarn.server.resourcemanager.AdminService)
at 
org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeStandby(EmbeddedElectorService.java:135)
at 
org.apache.hadoop.ha.ActiveStandbyElector.becomeStandby(ActiveStandbyElector.java:911)
at 
org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:428)
- locked <0xc0718940> (a 
org.apache.hadoop.ha.ActiveStandbyElector)
at 
org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:605)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
{code}
{code}
"timeline" daemon prio=10 tid=0x7f9b34d55000 nid=0x1d93 runnable 
[0x7f9b0cbbf000]
   java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:152)
at java.net.SocketInputStream.read(SocketInputStream.java:122)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
- locked <0xc0f522c8> (a java.io.BufferedInputStream)
at 
org.apache.commons.httpclient.HttpParser.readRawLine(HttpParser.java:78)
at 
org.apache.commons.httpclient.HttpParser.readLine(HttpParser.java:106)
at 
org.apache.commons.httpclient.HttpConnection.readLine(HttpConnection.java:1116)
at 
org.apache.commons.httpclient.HttpMethodBase.readStatusLine(HttpMethodBase.java:1973)
at 
org.apache.commons.httpclient.HttpMethodBase.readResponse(HttpMethodBase.java:1735)
at 
org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:1098)
at 
org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:398)
at 
org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
at 
org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
at 
org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
at 
org.apache.hadoop.metrics2.sink.timeline.AbstractTimelineMetricsSink.emitMetrics(AbstractTimelineMetricsSink.java:66)
at 
org.apache.hadoop.metrics2.sink.timeline.HadoopTimelineMetricsSink.putMetrics(HadoopTimelineMetricsSink.java:203)
at 
org.apache.hadoop.metrics2.impl.MetricsSinkAdapter.consume(MetricsSinkAdapter.java:175)
at 
org.apache.hadoop.metrics2.impl.MetricsSin

[jira] [Commented] (YARN-3574) RM hangs on stopping MetricsSinkAdapter when transitioning to standby

2015-05-04 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527698#comment-14527698
 ] 

Jian He commented on YARN-3574:
---

[~brahmareddy], I'm also not able to repro it. I wonder if any other folks 
have seen this issue before.
We found this while doing Ambari integration testing. I added one more stack 
trace for the blocking thread in the description. 



> RM hangs on stopping MetricsSinkAdapter when transitioning to standby
> -
>
> Key: YARN-3574
> URL: https://issues.apache.org/jira/browse/YARN-3574
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Brahma Reddy Battula
>
> We've seen a situation that one RM hangs on stopping the MetricsSinkAdapter
> {code}
> "main-EventThread" daemon prio=10 tid=0x7f9b24031000 nid=0x2d18 in 
> Object.wait() [0x7f9afe7eb000]
>java.lang.Thread.State: WAITING (on object monitor)
> at java.lang.Object.wait(Native Method)
> - waiting on <0xc058dcf8> (a 
> org.apache.hadoop.metrics2.impl.MetricsSinkAdapter$1)
> at java.lang.Thread.join(Thread.java:1281)
> - locked <0xc058dcf8> (a 
> org.apache.hadoop.metrics2.impl.MetricsSinkAdapter$1)
> at java.lang.Thread.join(Thread.java:1355)
> at 
> org.apache.hadoop.metrics2.impl.MetricsSinkAdapter.stop(MetricsSinkAdapter.java:202)
> at 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl.stopSinks(MetricsSystemImpl.java:472)
> - locked <0xc04cc1a0> (a 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl)
> at 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl.stop(MetricsSystemImpl.java:213)
> - locked <0xc04cc1a0> (a 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl)
> at 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl.shutdown(MetricsSystemImpl.java:592)
> - locked <0xc04cc1a0> (a 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl)
> at 
> org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.shutdownInstance(DefaultMetricsSystem.java:72)
> at 
> org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.shutdown(DefaultMetricsSystem.java:68)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStop(ResourceManager.java:605)
> at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
> - locked <0xc0503568> (a java.lang.Object)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.stopActiveServices(ResourceManager.java:1024)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToStandby(ResourceManager.java:1076)
> - locked <0xc03fe3b8> (a 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToStandby(AdminService.java:322)
> - locked <0xc0502b10> (a 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeStandby(EmbeddedElectorService.java:135)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeStandby(ActiveStandbyElector.java:911)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:428)
> - locked <0xc0718940> (a 
> org.apache.hadoop.ha.ActiveStandbyElector)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:605)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> {code}
> {code}
> "timeline" daemon prio=10 tid=0x7f9b34d55000 nid=0x1d93 runnable 
> [0x7f9b0cbbf000]
>java.lang.Thread.State: RUNNABLE
> at java.net.SocketInputStream.socketRead0(Native Method)
> at java.net.SocketInputStream.read(SocketInputStream.java:152)
> at java.net.SocketInputStream.read(SocketInputStream.java:122)
> at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
> at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
> - locked <0xc0f522c8> (a java.io.BufferedInputStream)
> at 
> org.apache.commons.httpclient.HttpParser.readRawLine(HttpParser.java:78)
> at 
> org.apache.commons.httpclient.HttpParser.readLine(HttpParser.java:106)
> at 
> org.apache.commons.httpclient.HttpConnection.readLine(HttpConnection.java:1116)
> at 
> org.apache.commons.httpclient.HttpMethodBase.readStatusLine(HttpMethodBase.java:1973)
> at 
> org.apache.commons.httpclient.HttpMethodBase.readResponse(HttpMethodBase.java:1735)
> at 
> org.apache.commons.httpclient.HttpMethodBase.exec

[jira] [Commented] (YARN-3557) Support Intel Trusted Execution Technology(TXT) in YARN scheduler

2015-05-04 Thread Dian Fu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527699#comment-14527699
 ] 

Dian Fu commented on YARN-3557:
---

Hi [~leftnoteasy],
Thanks a lot for your comments. What about supporting both distributed and 
centralized configuration? Any thoughts on the solution I mentioned in the 
comment above?

> Support Intel Trusted Execution Technology(TXT) in YARN scheduler
> -
>
> Key: YARN-3557
> URL: https://issues.apache.org/jira/browse/YARN-3557
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Dian Fu
> Attachments: Support TXT in YARN high level design doc.pdf
>
>
> Intel TXT defines platform-level enhancements that provide the building 
> blocks for creating trusted platforms. A TXT aware YARN scheduler can 
> schedule security sensitive jobs on TXT enabled nodes only. YARN-2492 
> provides the capacity to restrict YARN applications to run only on cluster 
> nodes that have a specified node label. This is a good mechanism that can be 
> utilized for a TXT-aware YARN scheduler.





[jira] [Commented] (YARN-3514) Active directory usernames like domain\login cause YARN failures

2015-05-04 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527702#comment-14527702
 ] 

Vinod Kumar Vavilapalli commented on YARN-3514:
---

I also doubt that this (the fix in the patch) is the only place where 
domain\login-style user names will fail in YARN.

> Active directory usernames like domain\login cause YARN failures
> 
>
> Key: YARN-3514
> URL: https://issues.apache.org/jira/browse/YARN-3514
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.2.0
> Environment: CentOS6
>Reporter: john lilley
>Assignee: Chris Nauroth
>Priority: Minor
> Attachments: YARN-3514.001.patch, YARN-3514.002.patch
>
>
> We have a 2.2.0 (Cloudera 5.3) cluster running on CentOS6 that is 
> Kerberos-enabled and uses an external AD domain controller for the KDC.  We 
> are able to authenticate, browse HDFS, etc.  However, YARN fails during 
> localization because it seems to get confused by the presence of a \ 
> character in the local user name.
> Our AD authentication on the nodes goes through sssd and is configured to 
> map AD users onto the form domain\username.  For example, our test user has a 
> Kerberos principal of hadoopu...@domain.com and that maps onto a CentOS user 
> "domain\hadoopuser".  We have no problem validating that user with PAM, 
> logging in as that user, su-ing to that user, etc.
> However, when we attempt to run a YARN application master, the localization 
> step fails when setting up the local cache directory for the AM.  The error 
> that comes out of the RM logs:
> 2015-04-17 12:47:09 INFO net.redpoint.yarnapp.Client[0]: monitorApplication: 
> ApplicationReport: appId=1, state=FAILED, progress=0.0, finalStatus=FAILED, 
> diagnostics='Application application_1429295486450_0001 failed 1 times due to 
> AM Container for appattempt_1429295486450_0001_01 exited with  exitCode: 
> -1000 due to: Application application_1429295486450_0001 initialization 
> failed (exitCode=255) with output: main : command provided 0
> main : user is DOMAIN\hadoopuser
> main : requested yarn user is domain\hadoopuser
> org.apache.hadoop.util.DiskChecker$DiskErrorException: Cannot create 
> directory: 
> /data/yarn/nm/usercache/domain%5Chadoopuser/appcache/application_1429295486450_0001/filecache/10
> at 
> org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:105)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.download(ContainerLocalizer.java:199)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.localizeFiles(ContainerLocalizer.java:241)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:169)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.main(ContainerLocalizer.java:347)
> .Failing this attempt.. Failing the application.'
> However, when we look on the node launching the AM, we see this:
> [root@rpb-cdh-kerb-2 ~]# cd /data/yarn/nm/usercache
> [root@rpb-cdh-kerb-2 usercache]# ls -l
> drwxr-s--- 4 DOMAIN\hadoopuser yarn 4096 Apr 17 12:10 domain\hadoopuser
> There appears to be different treatment of the \ character in different 
> places.  Something creates the directory as "domain\hadoopuser" but something 
> else later attempts to use it as "domain%5Chadoopuser".  I’m not sure where 
> or why the URL escapement converts the \ to %5C or why this is not consistent.
> I should also mention, for the sake of completeness, our auth_to_local rule 
> is set up to map u...@domain.com to domain\user:
> RULE:[1:$1@$0](^.*@DOMAIN\.COM$)s/^(.*)@DOMAIN\.COM$/domain\\$1/g



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3523) Cleanup ResourceManagerAdministrationProtocol interface audience

2015-05-04 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527705#comment-14527705
 ] 

Vinod Kumar Vavilapalli commented on YARN-3523:
---

Makes sense.

> Cleanup ResourceManagerAdministrationProtocol interface audience
> 
>
> Key: YARN-3523
> URL: https://issues.apache.org/jira/browse/YARN-3523
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client, resourcemanager
>Reporter: Wangda Tan
>Assignee: Naganarasimha G R
>  Labels: newbie
> Attachments: YARN-3523.20150422-1.patch, YARN-3523.20150504-1.patch
>
>
> I noticed ResourceManagerAdministrationProtocol has @Private audience for the 
> class and @Public audience for methods. It doesn't make sense to me. We 
> should make class audience and methods audience consistent.
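
For illustration only (not the actual YARN-3523 patch): a hedged sketch of what 
a consistent audience annotation could look like, using Hadoop's 
InterfaceAudience annotations on a hypothetical protocol interface.

{code:title=ExampleAdminProtocol.java|borderStyle=solid}
import org.apache.hadoop.classification.InterfaceAudience;

// Class-level audience; the methods carry no conflicting audience annotation,
// so class and method audiences stay consistent.
@InterfaceAudience.Private
public interface ExampleAdminProtocol {

  // Inherits the class-level @Private audience.
  void refreshExampleState() throws Exception;
}
{code}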



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3518) default rm/am expire interval should not less than default resourcemanager connect wait time

2015-05-04 Thread sandflee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527706#comment-14527706
 ] 

sandflee commented on YARN-3518:


Agree, we should set the NM, AM, and client timeouts separately.

> default rm/am expire interval should not less than default resourcemanager 
> connect wait time
> 
>
> Key: YARN-3518
> URL: https://issues.apache.org/jira/browse/YARN-3518
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager, resourcemanager
>Reporter: sandflee
>Assignee: sandflee
>  Labels: configuration, newbie
> Attachments: YARN-3518.001.patch
>
>
> Take the AM for example: if the AM can't connect to the RM, then after the AM 
> expiry interval (600s) the RM relaunches the AM, and there will be two AMs at 
> the same time until the resourcemanager connect max wait time (900s) has passed.
> DEFAULT_RESOURCEMANAGER_CONNECT_MAX_WAIT_MS =  15 * 60 * 1000;
> DEFAULT_RM_AM_EXPIRY_INTERVAL_MS = 600000;
> DEFAULT_RM_NM_EXPIRY_INTERVAL_MS = 600000;
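
A minimal sketch (not from the attached patch) of overriding these intervals so 
that the AM/NM expiry is not shorter than the RM connect max wait time. The 
constant names come from the defaults quoted above; the values here are only 
illustrative.

{code:title=ExpiryConfigSketch.java|borderStyle=solid}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class ExpiryConfigSketch {
  public static Configuration buildConf() {
    Configuration conf = new YarnConfiguration();
    // Keep the expiry intervals at or above the connect max wait, so an AM is
    // not declared expired (and relaunched) while it may still be legitimately
    // retrying its connection to the RM.
    conf.setLong(YarnConfiguration.RESOURCEMANAGER_CONNECT_MAX_WAIT_MS,
        15 * 60 * 1000L);
    conf.setLong(YarnConfiguration.RM_AM_EXPIRY_INTERVAL_MS, 15 * 60 * 1000L);
    conf.setLong(YarnConfiguration.RM_NM_EXPIRY_INTERVAL_MS, 15 * 60 * 1000L);
    return conf;
  }
}
{code}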



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3069) Document missing properties in yarn-default.xml

2015-05-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527732#comment-14527732
 ] 

Hadoop QA commented on YARN-3069:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  14m 46s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 4 new or modified test files. |
| {color:green}+1{color} | javac |   7m 38s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 42s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 24s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   4m 45s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  3s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 39s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   7m 10s | The patch does not introduce 
any new Findbugs (version 2.0.3) warnings. |
| {color:green}+1{color} | common tests |  23m 32s | Tests passed in 
hadoop-common. |
| {color:green}+1{color} | mapreduce tests |   9m 42s | Tests passed in 
hadoop-mapreduce-client-app. |
| {color:green}+1{color} | yarn tests |   1m 59s | Tests passed in 
hadoop-yarn-common. |
| {color:red}-1{color} | hdfs tests | 164m 48s | Tests failed in hadoop-hdfs. |
| | | 246m 47s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.hdfs.TestFileCreation |
|   | hadoop.hdfs.TestHDFSFileSystemContract |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12730267/YARN-3069.006.patch |
| Optional Tests | javac unit findbugs checkstyle javadoc |
| git revision | trunk / bf70c5a |
| hadoop-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/7691/artifact/patchprocess/testrun_hadoop-common.txt
 |
| hadoop-mapreduce-client-app test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/7691/artifact/patchprocess/testrun_hadoop-mapreduce-client-app.txt
 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/7691/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| hadoop-hdfs test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/7691/artifact/patchprocess/testrun_hadoop-hdfs.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/7691/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/7691/console |


This message was automatically generated.

> Document missing properties in yarn-default.xml
> ---
>
> Key: YARN-3069
> URL: https://issues.apache.org/jira/browse/YARN-3069
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation
>Reporter: Ray Chiang
>Assignee: Ray Chiang
>  Labels: supportability
> Attachments: YARN-3069.001.patch, YARN-3069.002.patch, 
> YARN-3069.003.patch, YARN-3069.004.patch, YARN-3069.005.patch, 
> YARN-3069.006.patch
>
>
> The following properties are currently not defined in yarn-default.xml.  
> These properties should either be
>   A) documented in yarn-default.xml OR
>   B)  listed as an exception (with comments, e.g. for internal use) in the 
> TestYarnConfigurationFields unit test
> Any comments for any of the properties below are welcome.
>   org.apache.hadoop.yarn.server.sharedcachemanager.RemoteAppChecker
>   org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore
>   security.applicationhistory.protocol.acl
>   yarn.app.container.log.backups
>   yarn.app.container.log.dir
>   yarn.app.container.log.filesize
>   yarn.client.app-submission.poll-interval
>   yarn.client.application-client-protocol.poll-timeout-ms
>   yarn.is.minicluster
>   yarn.log.server.url
>   yarn.minicluster.control-resource-monitoring
>   yarn.minicluster.fixed.ports
>   yarn.minicluster.use-rpc
>   yarn.node-labels.fs-store.retry-policy-spec
>   yarn.node-labels.fs-store.root-dir
>   yarn.node-labels.manager-class
>   yarn.nodemanager.container-executor.os.sched.priority.adjustment
>   yarn.nodemanager.container-monitor.process-tree.class
>   yarn.nodemanager.disk-health-checker.enable
>   yarn.nodemanager.docker-container-executor.image-name
>   yarn.nodemanager.linux-container-

[jira] [Commented] (YARN-3343) TestCapacitySchedulerNodeLabelUpdate.testNodeUpdate sometime fails in trunk

2015-05-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527743#comment-14527743
 ] 

Hadoop QA commented on YARN-3343:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | patch |   0m  0s | The patch command could not apply 
the patch during dryrun. |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12704716/0001-YARN-3343.patch |
| Optional Tests | javac unit findbugs checkstyle |
| git revision | trunk / 551615f |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/7697/console |


This message was automatically generated.

> TestCapacitySchedulerNodeLabelUpdate.testNodeUpdate sometime fails in trunk
> ---
>
> Key: YARN-3343
> URL: https://issues.apache.org/jira/browse/YARN-3343
> Project: Hadoop YARN
>  Issue Type: Test
>Reporter: Xuan Gong
>Assignee: Rohith
>Priority: Minor
> Attachments: 0001-YARN-3343.patch
>
>
> Error Message
> test timed out after 30000 milliseconds
> Stacktrace
> java.lang.Exception: test timed out after 30000 milliseconds
>   at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
>   at java.net.InetAddress$1.lookupAllHostAddr(InetAddress.java:901)
>   at 
> java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1293)
>   at java.net.InetAddress.getAllByName0(InetAddress.java:1246)
>   at java.net.InetAddress.getAllByName(InetAddress.java:1162)
>   at java.net.InetAddress.getAllByName(InetAddress.java:1098)
>   at java.net.InetAddress.getByName(InetAddress.java:1048)
>   at org.apache.hadoop.net.NetUtils.normalizeHostName(NetUtils.java:563)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.NodesListManager.isValidNode(NodesListManager.java:147)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService.nodeHeartbeat(ResourceTrackerService.java:367)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.MockNM.nodeHeartbeat(MockNM.java:178)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.MockNM.nodeHeartbeat(MockNM.java:136)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.MockRM.waitForState(MockRM.java:206)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerNodeLabelUpdate.testNodeUpdate(TestCapacitySchedulerNodeLabelUpdate.java:157)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3375) NodeHealthScriptRunner.shouldRun() check is performing 3 times for starting NodeHealthScriptRunner

2015-05-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527756#comment-14527756
 ] 

Hudson commented on YARN-3375:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #7729 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7729/])


> NodeHealthScriptRunner.shouldRun() check is performing 3 times for starting 
> NodeHealthScriptRunner
> --
>
> Key: YARN-3375
> URL: https://issues.apache.org/jira/browse/YARN-3375
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Devaraj K
>Assignee: Devaraj K
>Priority: Minor
> Fix For: 2.8.0
>
> Attachments: YARN-3375.patch
>
>
> 1. NodeHealthScriptRunner.shouldRun() check is happening 3 times for starting 
> the NodeHealthScriptRunner.
> {code:title=NodeManager.java|borderStyle=solid}
> if(!NodeHealthScriptRunner.shouldRun(nodeHealthScript)) {
>   LOG.info("Abey khali");
>   return null;
> }
> {code}
> {code:title=NodeHealthCheckerService.java|borderStyle=solid}
> if (NodeHealthScriptRunner.shouldRun(
> conf.get(YarnConfiguration.NM_HEALTH_CHECK_SCRIPT_PATH))) {
>   addService(nodeHealthScriptRunner);
> }
> {code}
> {code:title=NodeHealthScriptRunner.java|borderStyle=solid}
> if (!shouldRun(nodeHealthScript)) {
>   LOG.info("Not starting node health monitor");
>   return;
> }
> {code}
> 2. If we don't configure a node health script, or the configured health script 
> doesn't have execute permission, the NM logs the below message.
> {code:xml}
> 2015-03-19 19:55:45,713 INFO 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager: Abey khali
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2725) Adding test cases of retrying requests about ZKRMStateStore

2015-05-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527758#comment-14527758
 ] 

Hudson commented on YARN-2725:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #7729 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7729/])


> Adding test cases of retrying requests about ZKRMStateStore
> ---
>
> Key: YARN-2725
> URL: https://issues.apache.org/jira/browse/YARN-2725
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Tsuyoshi Ozawa
>Assignee: Tsuyoshi Ozawa
> Fix For: 2.8.0
>
> Attachments: YARN-2725.1.patch, YARN-2725.1.patch
>
>
> YARN-2721 found a race condition in the ZK-specific retry semantics. We should 
> add tests covering the case of retrying requests to ZK.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3562) unit tests failures and issues found from findbug from earlier ATS checkins

2015-05-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527752#comment-14527752
 ] 

Hadoop QA commented on YARN-3562:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  14m 53s | Pre-patch YARN-2928 compilation 
is healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 5 new or modified test files. |
| {color:green}+1{color} | javac |   7m 36s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 41s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 40s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 34s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   2m 25s | The patch does not introduce 
any new Findbugs (version 2.0.3) warnings. |
| {color:red}-1{color} | yarn tests |  53m 18s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| {color:green}+1{color} | yarn tests |   2m 34s | Tests passed in 
hadoop-yarn-server-tests. |
| {color:green}+1{color} | yarn tests |   0m 21s | Tests passed in 
hadoop-yarn-server-timelineservice. |
| | |  94m  1s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.yarn.server.resourcemanager.TestClientRMService |
|   | hadoop.yarn.server.resourcemanager.metrics.TestSystemMetricsPublisher |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12730037/YARN-3562-YARN-2928.001.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | YARN-2928 / 557a395 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/7694/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| hadoop-yarn-server-tests test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/7694/artifact/patchprocess/testrun_hadoop-yarn-server-tests.txt
 |
| hadoop-yarn-server-timelineservice test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/7694/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/7694/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf902.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/7694/console |


This message was automatically generated.

> unit tests failures and issues found from findbug from earlier ATS checkins
> ---
>
> Key: YARN-3562
> URL: https://issues.apache.org/jira/browse/YARN-3562
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Naganarasimha G R
>Priority: Minor
> Attachments: YARN-3562-YARN-2928.001.patch
>
>
> *Issues reported from MAPREDUCE-6337* :
> A bunch of MR unit tests are failing on our branch whenever the mini YARN 
> cluster needs to bring up multiple node managers.
> For example, see 
> https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5472/testReport/org.apache.hadoop.mapred/TestClusterMapReduceTestCase/testMapReduceRestarting/
> It is because the NMCollectorService is using a fixed port for the RPC (8048).
> *Issues reported from YARN-3044* :
> Test case failures and tools(FB & CS) issues found :
> # find bugs issue : Comparison of String objects using == or != in 
> ResourceTrackerService.updateAppCollectorsMap
> # find bugs issue : Boxing/unboxing to parse a primitive 
> RMTimelineCollectorManager.postPut. Called method Long.longValue()
> Should call Long.parseLong(String) instead.
> # find bugs issue : DM_DEFAULT_ENCODING Called method new 
> java.io.FileWriter(String, boolean) At 
> FileSystemTimelineWriterImpl.java:\[line 86\]
> # hadoop.yarn.server.resourcemanager.TestAppManager, 
> hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions, 
> hadoop.yarn.server.resourcemanager.TestClientRMService & 
> hadoop.yarn.server.resourcemanager.logaggregationstatus.TestRMAppLogAggregationStatus,
>  refer https://builds.apache.org/job/PreCommit-YARN-Build/7534/testReport/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-2618) Avoid over-allocation of disk resources

2015-05-04 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527792#comment-14527792
 ] 

Karthik Kambatla commented on YARN-2618:


[~vinodkv] - we were thinking of working on a branch and merge back to trunk in 
phases. Do you think this alone can directly go to trunk? 

Related - it would be nice if the scheduler parts of YARN-2140 were also worked 
on in a branch. Also, I'm looking forward to hearing thoughts on the 
branch-development thread on yarn-dev@.


> Avoid over-allocation of disk resources
> ---
>
> Key: YARN-2618
> URL: https://issues.apache.org/jira/browse/YARN-2618
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wei Yan
>Assignee: Wei Yan
> Attachments: YARN-2618-1.patch, YARN-2618-2.patch, YARN-2618-3.patch, 
> YARN-2618-4.patch, YARN-2618-5.patch, YARN-2618-6.patch, YARN-2618-7.patch
>
>
> Subtask of YARN-2139. 
> This should include
> - Add API support for introducing disk I/O as the 3rd type resource.
> - NM should report this information to the RM
> - RM should consider this to avoid over-allocation



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3562) unit tests failures and issues found from findbug from earlier ATS checkins

2015-05-04 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527793#comment-14527793
 ] 

Sangjin Lee commented on YARN-3562:
---

OK, the latest run seems more sane. You might want to take a look at the latest 
unit test failures. At least some of the failures seem relevant to our branch.

Being an Apache committer, I have a login to the Apache build server, and can 
kick off a build directly on it.

> unit tests failures and issues found from findbug from earlier ATS checkins
> ---
>
> Key: YARN-3562
> URL: https://issues.apache.org/jira/browse/YARN-3562
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Naganarasimha G R
>Priority: Minor
> Attachments: YARN-3562-YARN-2928.001.patch
>
>
> *Issues reported from MAPREDUCE-6337* :
> A bunch of MR unit tests are failing on our branch whenever the mini YARN 
> cluster needs to bring up multiple node managers.
> For example, see 
> https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5472/testReport/org.apache.hadoop.mapred/TestClusterMapReduceTestCase/testMapReduceRestarting/
> It is because the NMCollectorService is using a fixed port for the RPC (8048).
> *Issues reported from YARN-3044* :
> Test case failures and tools(FB & CS) issues found :
> # find bugs issue : Comparison of String objects using == or != in 
> ResourceTrackerService.updateAppCollectorsMap
> # find bugs issue : Boxing/unboxing to parse a primitive 
> RMTimelineCollectorManager.postPut. Called method Long.longValue()
> Should call Long.parseLong(String) instead.
> # find bugs issue : DM_DEFAULT_ENCODING Called method new 
> java.io.FileWriter(String, boolean) At 
> FileSystemTimelineWriterImpl.java:\[line 86\]
> # hadoop.yarn.server.resourcemanager.TestAppManager, 
> hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions, 
> hadoop.yarn.server.resourcemanager.TestClientRMService & 
> hadoop.yarn.server.resourcemanager.logaggregationstatus.TestRMAppLogAggregationStatus,
>  refer https://builds.apache.org/job/PreCommit-YARN-Build/7534/testReport/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3491) PublicLocalizer#addResource is too slow.

2015-05-04 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-3491:

Attachment: YARN-3491.004.patch

> PublicLocalizer#addResource is too slow.
> 
>
> Key: YARN-3491
> URL: https://issues.apache.org/jira/browse/YARN-3491
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.7.0
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Attachments: YARN-3491.000.patch, YARN-3491.001.patch, 
> YARN-3491.002.patch, YARN-3491.003.patch, YARN-3491.004.patch
>
>
> Based on the profiling, the bottleneck in PublicLocalizer#addResource is 
> getInitializedLocalDirs, which calls checkLocalDir.
> checkLocalDir is very slow and takes about 10+ ms.
> The total delay will be approximately (number of local dirs) * 10+ ms.
> This delay is added for each public resource localization.
> Because PublicLocalizer#addResource is slow, the thread pool can't be fully 
> utilized. Instead of doing public resource localization in 
> parallel (multithreading), public resource localization is serialized most of 
> the time.
> Also, PublicLocalizer#addResource runs in the Dispatcher thread, 
> so the Dispatcher thread will be blocked by PublicLocalizer#addResource for a 
> long time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3491) PublicLocalizer#addResource is too slow.

2015-05-04 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527802#comment-14527802
 ] 

zhihai xu commented on YARN-3491:
-

Thanks [~wilfreds] for the review. I uploaded a new patch, YARN-3491.004.patch, 
which addresses all your comments.
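
Purely as an illustration of the direction described in this issue (not the 
actual YARN-3491 patch): a sketch of keeping the dispatcher thread free by 
pushing the slow per-resource directory checks onto a thread pool. The class 
and method names below are hypothetical placeholders.

{code:title=AsyncAddResourceSketch.java|borderStyle=solid}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class AsyncAddResourceSketch {
  private final ExecutorService pool = Executors.newFixedThreadPool(4);

  // Called from the dispatcher thread: only enqueue work, never block on the
  // slow local-dir checks here.
  public void addResource(final String resourceKey) {
    pool.submit(new Runnable() {
      @Override
      public void run() {
        // The slow part (checkLocalDir-style initialization, ~10+ ms per
        // local dir) runs here, in parallel across resources.
        initializeLocalDirs();
        download(resourceKey);
      }
    });
  }

  private void initializeLocalDirs() { /* placeholder for the dir checks */ }

  private void download(String resourceKey) { /* placeholder for the fetch */ }
}
{code}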

> PublicLocalizer#addResource is too slow.
> 
>
> Key: YARN-3491
> URL: https://issues.apache.org/jira/browse/YARN-3491
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.7.0
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Attachments: YARN-3491.000.patch, YARN-3491.001.patch, 
> YARN-3491.002.patch, YARN-3491.003.patch, YARN-3491.004.patch
>
>
> Based on the profiling, the bottleneck in PublicLocalizer#addResource is 
> getInitializedLocalDirs, which calls checkLocalDir.
> checkLocalDir is very slow and takes about 10+ ms.
> The total delay will be approximately (number of local dirs) * 10+ ms.
> This delay is added for each public resource localization.
> Because PublicLocalizer#addResource is slow, the thread pool can't be fully 
> utilized. Instead of doing public resource localization in 
> parallel (multithreading), public resource localization is serialized most of 
> the time.
> Also, PublicLocalizer#addResource runs in the Dispatcher thread, 
> so the Dispatcher thread will be blocked by PublicLocalizer#addResource for a 
> long time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1612) FairScheduler: Enable delay scheduling by default

2015-05-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527820#comment-14527820
 ] 

Hadoop QA commented on YARN-1612:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  14m 33s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 31s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 35s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 50s | There were no new checkstyle 
issues. |
| {color:red}-1{color} | whitespace |   0m  1s | The patch has 1  line(s) that 
end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 33s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 14s | The patch does not introduce 
any new Findbugs (version 2.0.3) warnings. |
| {color:green}+1{color} | yarn tests |  52m 36s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| | |  88m 52s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12730302/YARN-1612-004.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 551615f |
| whitespace | 
https://builds.apache.org/job/PreCommit-YARN-Build/7695/artifact/patchprocess/whitespace.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/7695/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/7695/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/7695/console |


This message was automatically generated.

> FairScheduler: Enable delay scheduling by default
> -
>
> Key: YARN-1612
> URL: https://issues.apache.org/jira/browse/YARN-1612
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Reporter: Sandy Ryza
>Assignee: Chen He
> Attachments: YARN-1612-003.patch, YARN-1612-004.patch, 
> YARN-1612-v2.patch, YARN-1612.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3547) FairScheduler: Apps that have no resource demand should not participate scheduling

2015-05-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527827#comment-14527827
 ] 

Hadoop QA commented on YARN-3547:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  14m 39s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:green}+1{color} | javac |   7m 35s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 36s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   0m 49s | The applied patch generated  1 
new checkstyle issues (total was 9, now 10). |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 34s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 15s | The patch does not introduce 
any new Findbugs (version 2.0.3) warnings. |
| {color:red}-1{color} | yarn tests |  52m 58s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | |  89m 29s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | 
hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12730098/YARN-3547.003.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 551615f |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/7696/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/7696/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/7696/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf901.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/7696/console |


This message was automatically generated.

> FairScheduler: Apps that have no resource demand should not participate 
> scheduling
> --
>
> Key: YARN-3547
> URL: https://issues.apache.org/jira/browse/YARN-3547
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Reporter: Xianyin Xin
>Assignee: Xianyin Xin
> Attachments: YARN-3547.001.patch, YARN-3547.002.patch, 
> YARN-3547.003.patch
>
>
> At present, all of the 'running' apps participate in the scheduling process; 
> however, most of them may have no resource demand on a production cluster, 
> since an app's status is 'running' rather than 'waiting for resources' for 
> most of its lifetime. It is not wise to sort all of the 'running' apps and 
> try to fulfill them, especially on a large-scale cluster which has a heavy 
> scheduling load.
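
To make the idea concrete, here is a hedged, self-contained sketch (not the 
attached patch) of filtering out apps with no pending demand before sorting, so 
the scheduler only orders the apps that actually want resources. The App type 
and its pendingDemand field are hypothetical stand-ins.

{code:title=DemandFilterSketch.java|borderStyle=solid}
import java.util.ArrayList;
import java.util.List;

public class DemandFilterSketch {

  static class App {
    final String id;
    final int pendingDemand; // unsatisfied resource requests, illustrative

    App(String id, int pendingDemand) {
      this.id = id;
      this.pendingDemand = pendingDemand;
    }
  }

  // Returns only the apps worth scheduling; the (much smaller) result is what
  // gets sorted by the scheduling policy and offered the node.
  static List<App> schedulableApps(List<App> runningApps) {
    List<App> candidates = new ArrayList<App>();
    for (App app : runningApps) {
      if (app.pendingDemand > 0) {
        candidates.add(app);
      }
    }
    return candidates;
  }
}
{code}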



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend

2015-05-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527835#comment-14527835
 ] 

Hadoop QA commented on YARN-3134:
-

\\
\\
| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  14m 51s | Pre-patch YARN-2928 compilation 
is healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 2 new or modified test files. |
| {color:green}+1{color} | install |   1m 33s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 37s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   0m 34s | The patch does not introduce 
any new Findbugs (version 2.0.3) warnings. |
| {color:green}+1{color} | yarn tests |   0m 23s | Tests passed in 
hadoop-yarn-server-timelineservice. |
| | |  26m  0s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12730332/YARN-3134-YARN-2928.003.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | YARN-2928 / 557a395 |
| hadoop-yarn-server-timelineservice test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/7698/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/7698/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/7698/console |


This message was automatically generated.

> [Storage implementation] Exploiting the option of using Phoenix to access 
> HBase backend
> ---
>
> Key: YARN-3134
> URL: https://issues.apache.org/jira/browse/YARN-3134
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Li Lu
> Attachments: SettingupPhoenixstorageforatimelinev2end-to-endtest.pdf, 
> YARN-3134-040915_poc.patch, YARN-3134-041015_poc.patch, 
> YARN-3134-041415_poc.patch, YARN-3134-042115.patch, YARN-3134-042715.patch, 
> YARN-3134-YARN-2928.001.patch, YARN-3134-YARN-2928.002.patch, 
> YARN-3134-YARN-2928.003.patch, YARN-3134DataSchema.pdf
>
>
> Quote the introduction on Phoenix web page:
> {code}
> Apache Phoenix is a relational database layer over HBase delivered as a 
> client-embedded JDBC driver targeting low latency queries over HBase data. 
> Apache Phoenix takes your SQL query, compiles it into a series of HBase 
> scans, and orchestrates the running of those scans to produce regular JDBC 
> result sets. The table metadata is stored in an HBase table and versioned, 
> such that snapshot queries over prior versions will automatically use the 
> correct schema. Direct use of the HBase API, along with coprocessors and 
> custom filters, results in performance on the order of milliseconds for small 
> queries, or seconds for tens of millions of rows.
> {code}
> It may simplify our implementation's reads and writes to/from HBase, and make 
> it easy to build indexes and compose complex queries.
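
For readers unfamiliar with Phoenix, a minimal hedged sketch of the 
client-embedded JDBC usage described above. The ZooKeeper quorum host, table 
name, and columns are hypothetical placeholders, not the schema proposed in the 
attached design doc.

{code:title=PhoenixQuerySketch.java|borderStyle=solid}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class PhoenixQuerySketch {
  public static void main(String[] args) throws Exception {
    // "zk-host" stands in for the HBase ZooKeeper quorum.
    Connection conn = DriverManager.getConnection("jdbc:phoenix:zk-host");
    try {
      PreparedStatement ps = conn.prepareStatement(
          "SELECT entity_id, created_time FROM timeline_entity WHERE cluster = ?");
      ps.setString(1, "test-cluster");
      // Phoenix compiles the SQL into HBase scans and returns a JDBC ResultSet.
      ResultSet rs = ps.executeQuery();
      while (rs.next()) {
        System.out.println(rs.getString(1) + " " + rs.getLong(2));
      }
    } finally {
      conn.close();
    }
  }
}
{code}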



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3480) Make AM max attempts stored in RMAppImpl and RMStateStore to be configurable

2015-05-04 Thread Jun Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527847#comment-14527847
 ] 

Jun Gong commented on YARN-3480:


[~jianhe], sorry for not specifying our scenario: RM HA is enabled, we use ZK to 
store apps' info, and most apps running in the cluster are long-running (service) 
apps. yarn.resourcemanager.am.max-attempts is set to 1 because we have not 
patched YARN-611 and we want apps to retry more times.  There are 10K apps with 
1~1 attempts stored in ZK. It will take about 6 minutes to recover those apps 
when an RM HA failover happens.

{quote}
1. How often do you see an app failed with a large number of attempts? If it's 
limited to a few apps. I wouldn't worry so much.
2. How slower it is in reality in your case? we've done some benchmark, 
recovering 10k apps(with 1 attempt) on ZK is pretty fast, within 20 seconds or 
so.
{quote}
Please see above. I think it will be OK for map-reduce jobs. But it might not 
be OK for service apps which have been running several months.

{quote}
3. Limiting the attempts to be recorded means we are losing history. it's a 
trade off.
{quote}
Yes, I agree.
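
For context, a hedged sketch of the existing knobs mentioned in this discussion 
(it does not implement the store-side limit this JIRA proposes). The values are 
illustrative only.

{code:title=AttemptConfigSketch.java|borderStyle=solid}
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.util.Records;

public class AttemptConfigSketch {
  public static ApplicationSubmissionContext newContext() {
    ApplicationSubmissionContext ctx =
        Records.newRecord(ApplicationSubmissionContext.class);
    // Allow many retries for a long-running service app (illustrative value;
    // the effective limit is also capped by yarn.resourcemanager.am.max-attempts).
    ctx.setMaxAppAttempts(100);
    // Only count failures within the last 10 minutes (YARN-611), so old
    // attempts do not keep counting against the limit.
    ctx.setAttemptFailuresValidityInterval(10 * 60 * 1000L);
    return ctx;
  }
}
{code}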

> Make AM max attempts stored in RMAppImpl and RMStateStore to be configurable
> 
>
> Key: YARN-3480
> URL: https://issues.apache.org/jira/browse/YARN-3480
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Jun Gong
>Assignee: Jun Gong
> Attachments: YARN-3480.01.patch, YARN-3480.02.patch, 
> YARN-3480.03.patch
>
>
> When RM HA is enabled and running containers are kept across attempts, apps 
> are more likely to finish successfully with more retries (attempts), so it 
> will be better to set 'yarn.resourcemanager.am.max-attempts' larger. However, 
> that makes the RMStateStore (FileSystem/HDFS/ZK) store more attempts, and 
> makes the RM recovery process much slower. It might be better to make the 
> number of attempts stored in the RMStateStore configurable.
> BTW: When 'attemptFailuresValidityInterval' (introduced in YARN-611) is set to 
> a small value, the number of retried attempts might be very large, so we need 
> to delete some of the attempts stored in RMAppImpl and the RMStateStore.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2921) MockRM#waitForState methods can be too slow and flaky

2015-05-04 Thread Tsuyoshi Ozawa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527855#comment-14527855
 ] 

Tsuyoshi Ozawa commented on YARN-2921:
--

[~leftnoteasy] thank you for pinging me. Yes, it looks related. Let me 
survey

> MockRM#waitForState methods can be too slow and flaky
> -
>
> Key: YARN-2921
> URL: https://issues.apache.org/jira/browse/YARN-2921
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: test
>Affects Versions: 2.6.0
>Reporter: Karthik Kambatla
>Assignee: Tsuyoshi Ozawa
> Attachments: YARN-2921.001.patch, YARN-2921.002.patch, 
> YARN-2921.003.patch, YARN-2921.004.patch
>
>
> MockRM#waitForState methods currently sleep for too long (2 seconds and 1 
> second). This leads to slow tests and sometimes failures if the 
> App/AppAttempt moves to another state. 
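
A hedged illustration of the general technique (not the attached patch): poll 
the state frequently with a short sleep and an overall deadline, instead of a 
few long fixed sleeps.

{code:title=WaitUtilSketch.java|borderStyle=solid}
public final class WaitUtilSketch {

  public interface Condition {
    boolean holds();
  }

  // Returns true as soon as the condition holds, checking every pollMs
  // (e.g. 50-100 ms) up to timeoutMs, instead of sleeping 1-2 s per check.
  public static boolean waitFor(Condition c, long timeoutMs, long pollMs)
      throws InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (System.currentTimeMillis() < deadline) {
      if (c.holds()) {
        return true;
      }
      Thread.sleep(pollMs);
    }
    return c.holds();
  }

  private WaitUtilSketch() {
  }
}
{code}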



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3491) PublicLocalizer#addResource is too slow.

2015-05-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527898#comment-14527898
 ] 

Hadoop QA commented on YARN-3491:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  14m 43s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 2 new or modified test files. |
| {color:green}+1{color} | javac |   7m 33s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 36s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   0m 36s | The applied patch generated  3 
new checkstyle issues (total was 177, now 178). |
| {color:red}-1{color} | whitespace |   0m  1s | The patch has 1  line(s) that 
end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 33s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m  2s | The patch does not introduce 
any new Findbugs (version 2.0.3) warnings. |
| {color:green}+1{color} | yarn tests |   5m 57s | Tests passed in 
hadoop-yarn-server-nodemanager. |
| | |  42m  5s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12730351/YARN-3491.004.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 338e88a |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/7700/artifact/patchprocess/diffcheckstylehadoop-yarn-server-nodemanager.txt
 |
| whitespace | 
https://builds.apache.org/job/PreCommit-YARN-Build/7700/artifact/patchprocess/whitespace.txt
 |
| hadoop-yarn-server-nodemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/7700/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/7700/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/7700/console |


This message was automatically generated.

> PublicLocalizer#addResource is too slow.
> 
>
> Key: YARN-3491
> URL: https://issues.apache.org/jira/browse/YARN-3491
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.7.0
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Attachments: YARN-3491.000.patch, YARN-3491.001.patch, 
> YARN-3491.002.patch, YARN-3491.003.patch, YARN-3491.004.patch
>
>
> Based on the profiling, the bottleneck in PublicLocalizer#addResource is 
> getInitializedLocalDirs, which calls checkLocalDir.
> checkLocalDir is very slow and takes about 10+ ms.
> The total delay will be approximately (number of local dirs) * 10+ ms.
> This delay is added for each public resource localization.
> Because PublicLocalizer#addResource is slow, the thread pool can't be fully 
> utilized. Instead of doing public resource localization in 
> parallel (multithreading), public resource localization is serialized most of 
> the time.
> Also, PublicLocalizer#addResource runs in the Dispatcher thread, 
> so the Dispatcher thread will be blocked by PublicLocalizer#addResource for a 
> long time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3521) Support return structured NodeLabel objects in REST API when call getClusterNodeLabels

2015-05-04 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527904#comment-14527904
 ] 

Sunil G commented on YARN-3521:
---

[~leftnoteasy] Yes, it's not a valid point. replaceLabelsOnNode and 
removeFromClusterNodeLabels don't need the NodeLabel object; the name is enough. 
Please discard my earlier comment.

> Support return structured NodeLabel objects in REST API when call 
> getClusterNodeLabels
> --
>
> Key: YARN-3521
> URL: https://issues.apache.org/jira/browse/YARN-3521
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Reporter: Wangda Tan
>Assignee: Sunil G
> Attachments: 0001-YARN-3521.patch, 0002-YARN-3521.patch, 
> 0003-YARN-3521.patch, 0004-YARN-3521.patch
>
>
> In YARN-3413, the yarn cluster CLI returns NodeLabel instead of String; we 
> should make the same change on the REST API side to keep them consistent.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3573) MiniMRYarnCluster constructor that starts the timeline server using a boolean should be marked depricated

2015-05-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527906#comment-14527906
 ] 

Hadoop QA commented on YARN-3573:
-

\\
\\
| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |   5m  9s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 28s | There were no new javac warning 
messages. |
| {color:green}+1{color} | release audit |   0m 19s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 32s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 34s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 31s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   0m 40s | The patch does not introduce 
any new Findbugs (version 2.0.3) warnings. |
| {color:green}+1{color} | mapreduce tests | 106m 29s | Tests passed in 
hadoop-mapreduce-client-jobclient. |
| | | 122m 45s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12730327/YARN-3573.patch |
| Optional Tests | javac unit findbugs checkstyle |
| git revision | trunk / 551615f |
| hadoop-mapreduce-client-jobclient test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/7699/artifact/patchprocess/testrun_hadoop-mapreduce-client-jobclient.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/7699/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/7699/console |


This message was automatically generated.

> MiniMRYarnCluster constructor that starts the timeline server using a boolean 
> should be marked depricated
> -
>
> Key: YARN-3573
> URL: https://issues.apache.org/jira/browse/YARN-3573
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Affects Versions: 2.6.0
>Reporter: Mit Desai
>Assignee: Brahma Reddy Battula
> Attachments: YARN-3573.patch
>
>
> {code}MiniMRYarnCluster(String testName, int noOfNMs, boolean enableAHS){code}
> starts the timeline server using *boolean enableAHS*. It is better to have 
> the timeline server started based on the config value.
> We should mark this constructor as deprecated to avoid its future use.
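
A hedged sketch of the proposed direction, on a hypothetical class (it is not 
the MiniMRYarnCluster code itself): deprecate the boolean-based constructor and 
point callers at a config-driven one.

{code:title=MiniClusterExampleSketch.java|borderStyle=solid}
public class MiniClusterExampleSketch {

  /**
   * @deprecated use {@link #MiniClusterExampleSketch(String, int)} and enable
   * the timeline server through the configuration instead.
   */
  @Deprecated
  public MiniClusterExampleSketch(String testName, int noOfNMs, boolean enableAHS) {
    // old behavior: timeline server started based on the boolean flag
  }

  public MiniClusterExampleSketch(String testName, int noOfNMs) {
    // preferred: timeline server started (or not) based on the config value
  }
}
{code}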



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3514) Active directory usernames like domain\login cause YARN failures

2015-05-04 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527907#comment-14527907
 ] 

Chris Nauroth commented on YARN-3514:
-

Looking at the original description, I see upper-case "DOMAIN" is getting 
translated to lower-case "domain" in this environment.  It's likely that this 
environment would get an ownership mismatch error even after getting past the 
current bug.

{code}
drwxr-s--- 4 DOMAIN\hadoopuser yarn 4096 Apr 17 12:10 domain\hadoopuser
{code}

Nice catch, Wangda.

Is it necessary to translate to lower-case, or can the domain portion of the 
name be left in upper-case to match the OS level?

bq. One possible solution is ignoring cases while compare user name, but that 
will be problematic when user "De"/"de" existed at the same time.

I've seen a few mentions online that Active Directory is not case-sensitive but 
is case-preserving.  That means it will preserve the case you used in 
usernames, but the case doesn't matter for comparisons.  I've also seen 
references that DNS has similar behavior with regards to case.

I can't find a definitive statement though that this is guaranteed behavior.  
I'd feel safer making this kind of change if we had a definitive reference.

> Active directory usernames like domain\login cause YARN failures
> 
>
> Key: YARN-3514
> URL: https://issues.apache.org/jira/browse/YARN-3514
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.2.0
> Environment: CentOS6
>Reporter: john lilley
>Assignee: Chris Nauroth
>Priority: Minor
> Attachments: YARN-3514.001.patch, YARN-3514.002.patch
>
>
> We have a 2.2.0 (Cloudera 5.3) cluster running on CentOS6 that is 
> Kerberos-enabled and uses an external AD domain controller for the KDC.  We 
> are able to authenticate, browse HDFS, etc.  However, YARN fails during 
> localization because it seems to get confused by the presence of a \ 
> character in the local user name.
> Our AD authentication on the nodes goes through sssd, which is configured to 
> map AD users onto the form domain\username.  For example, our test user has a 
> Kerberos principal of hadoopu...@domain.com and that maps onto a CentOS user 
> "domain\hadoopuser".  We have no problem validating that user with PAM, 
> logging in as that user, su-ing to that user, etc.
> However, when we attempt to run a YARN application master, the localization 
> step fails when setting up the local cache directory for the AM.  The error 
> that comes out of the RM logs:
> 2015-04-17 12:47:09 INFO net.redpoint.yarnapp.Client[0]: monitorApplication: 
> ApplicationReport: appId=1, state=FAILED, progress=0.0, finalStatus=FAILED, 
> diagnostics='Application application_1429295486450_0001 failed 1 times due to 
> AM Container for appattempt_1429295486450_0001_01 exited with  exitCode: 
> -1000 due to: Application application_1429295486450_0001 initialization 
> failed (exitCode=255) with output: main : command provided 0
> main : user is DOMAIN\hadoopuser
> main : requested yarn user is domain\hadoopuser
> org.apache.hadoop.util.DiskChecker$DiskErrorException: Cannot create 
> directory: 
> /data/yarn/nm/usercache/domain%5Chadoopuser/appcache/application_1429295486450_0001/filecache/10
> at 
> org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:105)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.download(ContainerLocalizer.java:199)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.localizeFiles(ContainerLocalizer.java:241)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:169)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.main(ContainerLocalizer.java:347)
> .Failing this attempt.. Failing the application.'
> However, when we look on the node launching the AM, we see this:
> [root@rpb-cdh-kerb-2 ~]# cd /data/yarn/nm/usercache
> [root@rpb-cdh-kerb-2 usercache]# ls -l
> drwxr-s--- 4 DOMAIN\hadoopuser yarn 4096 Apr 17 12:10 domain\hadoopuser
> There appears to be different treatment of the \ character in different 
> places.  Something creates the directory as "domain\hadoopuser" but something 
> else later attempts to use it as "domain%5Chadoopuser".  I’m not sure where 
> or why the URL escapement converts the \ to %5C or why this is not consistent.
> I should also mention, for the sake of completeness, our auth_to_local rule 
> is set up to map u...@domain.com to domain\user:
> RULE:[1:$1@$0](^.*@DOMAIN\.COM$)s/^(.*)@DOMAIN\.COM$/domain\\$1/g

[jira] [Commented] (YARN-3560) Not able to navigate to the cluster from tracking url (proxy) generated after submission of job

2015-05-04 Thread Mohammad Shahid Khan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527917#comment-14527917
 ] 

Mohammad Shahid Khan commented on YARN-3560:


The issue is happening due to wrong hyperlink URL formation.
The system always forms the URL with the default port, even when 
yarn.resourcemanager.webapp.address is configured with a different port number.
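
As a hedged illustration (the host and port below are made up), links should be 
built from the configured yarn.resourcemanager.webapp.address rather than from 
a hard-coded default port:

{code:title=WebappAddressSketch.java|borderStyle=solid}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class WebappAddressSketch {
  public static void main(String[] args) {
    Configuration conf = new YarnConfiguration();
    conf.set("yarn.resourcemanager.webapp.address", "rm-host.example.com:8089");

    // Read the configured address back instead of assuming the default port.
    String webappAddress = conf.get("yarn.resourcemanager.webapp.address");
    System.out.println("http://" + webappAddress + "/cluster");
  }
}
{code}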

> Not able to navigate to the cluster from tracking url (proxy) generated after 
> submission of job
> ---
>
> Key: YARN-3560
> URL: https://issues.apache.org/jira/browse/YARN-3560
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Anushri
>Priority: Minor
>
> A standalone web proxy server is enabled in the cluster.
> When a job is submitted, the generated URL contains the proxy.
> When we track this URL and, on the web page, try to navigate to the cluster 
> links [About, Applications, or Scheduler], we get redirected to some default 
> port instead of the actual RM web port configured.
> As such, it throws "webpage not available".



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3343) TestCapacitySchedulerNodeLabelUpdate.testNodeUpdate sometime fails in trunk

2015-05-04 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527929#comment-14527929
 ] 

Rohith commented on YARN-3343:
--

[~jianhe] I was able to reproduce it. When I debugged this issue, I found that 
the 30 sec timeout was too aggressive for the test to complete. On average, the 
test case took around 35-45 sec. I changed the timeout to 60 sec.
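
For reference, a minimal hedged sketch of what such a timeout bump looks like in 
a JUnit 4 test (the class and method here are placeholders, not the actual 
patch):

{code:title=TimeoutBumpSketch.java|borderStyle=solid}
import org.junit.Test;

public class TimeoutBumpSketch {

  // 60-second timeout instead of the earlier 30-second one.
  @Test(timeout = 60000)
  public void testNodeUpdate() throws Exception {
    // test body elided
  }
}
{code}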

> TestCapacitySchedulerNodeLabelUpdate.testNodeUpdate sometime fails in trunk
> ---
>
> Key: YARN-3343
> URL: https://issues.apache.org/jira/browse/YARN-3343
> Project: Hadoop YARN
>  Issue Type: Test
>Reporter: Xuan Gong
>Assignee: Rohith
>Priority: Minor
> Attachments: 0001-YARN-3343.patch
>
>
> Error Message
> test timed out after 30000 milliseconds
> Stacktrace
> java.lang.Exception: test timed out after 30000 milliseconds
>   at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
>   at java.net.InetAddress$1.lookupAllHostAddr(InetAddress.java:901)
>   at 
> java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1293)
>   at java.net.InetAddress.getAllByName0(InetAddress.java:1246)
>   at java.net.InetAddress.getAllByName(InetAddress.java:1162)
>   at java.net.InetAddress.getAllByName(InetAddress.java:1098)
>   at java.net.InetAddress.getByName(InetAddress.java:1048)
>   at org.apache.hadoop.net.NetUtils.normalizeHostName(NetUtils.java:563)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.NodesListManager.isValidNode(NodesListManager.java:147)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService.nodeHeartbeat(ResourceTrackerService.java:367)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.MockNM.nodeHeartbeat(MockNM.java:178)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.MockNM.nodeHeartbeat(MockNM.java:136)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.MockRM.waitForState(MockRM.java:206)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerNodeLabelUpdate.testNodeUpdate(TestCapacitySchedulerNodeLabelUpdate.java:157)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3491) PublicLocalizer#addResource is too slow.

2015-05-04 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-3491:

Attachment: (was: YARN-3491.004.patch)

> PublicLocalizer#addResource is too slow.
> 
>
> Key: YARN-3491
> URL: https://issues.apache.org/jira/browse/YARN-3491
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.7.0
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Attachments: YARN-3491.000.patch, YARN-3491.001.patch, 
> YARN-3491.002.patch, YARN-3491.003.patch
>
>
> Based on the profiling, the bottleneck in PublicLocalizer#addResource is 
> getInitializedLocalDirs, which calls checkLocalDir.
> checkLocalDir is very slow, taking about 10+ ms.
> The total delay will be approximately (number of local dirs) * 10+ ms.
> This delay is added for each public resource localization.
> Because PublicLocalizer#addResource is slow, the thread pool can't be fully 
> utilized; instead of doing public resource localization in 
> parallel (multithreading), public resource localization is serialized most of 
> the time.
> Also, PublicLocalizer#addResource runs in the Dispatcher thread, 
> so the Dispatcher thread will be blocked by PublicLocalizer#addResource for 
> a long time.
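
As an illustration of the bottleneck described above (not the actual patch), one 
way to keep the dispatcher thread responsive is to push the per-directory 
initialization into the localization thread pool; the class and method names 
below are hypothetical:

{code}
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: keep only cheap bookkeeping on the dispatcher thread and
// run the slow per-directory checks inside the localization thread pool, so
// public resource localizations can actually proceed in parallel.
class PublicLocalizerSketch {
  private final ExecutorService threadPool = Executors.newFixedThreadPool(4);

  void addResource(final String resourceKey, final List<String> localDirs) {
    threadPool.submit(new Runnable() {
      @Override
      public void run() {
        // The expensive part (10+ ms per local dir) now runs off the dispatcher.
        for (String dir : localDirs) {
          checkLocalDir(dir);
        }
        download(resourceKey);
      }
    });
  }

  private void checkLocalDir(String dir) { /* permission/ownership checks */ }

  private void download(String resourceKey) { /* fetch the public resource */ }
}
{code}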



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3491) PublicLocalizer#addResource is too slow.

2015-05-04 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-3491:

Attachment: YARN-3491.004.patch

> PublicLocalizer#addResource is too slow.
> 
>
> Key: YARN-3491
> URL: https://issues.apache.org/jira/browse/YARN-3491
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.7.0
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Attachments: YARN-3491.000.patch, YARN-3491.001.patch, 
> YARN-3491.002.patch, YARN-3491.003.patch, YARN-3491.004.patch
>
>
> Based on the profiling, the bottleneck in PublicLocalizer#addResource is 
> getInitializedLocalDirs, which calls checkLocalDir.
> checkLocalDir is very slow, taking about 10+ ms.
> The total delay will be approximately (number of local dirs) * 10+ ms.
> This delay is added for each public resource localization.
> Because PublicLocalizer#addResource is slow, the thread pool can't be fully 
> utilized; instead of doing public resource localization in 
> parallel (multithreading), public resource localization is serialized most of 
> the time.
> Also, PublicLocalizer#addResource runs in the Dispatcher thread, 
> so the Dispatcher thread will be blocked by PublicLocalizer#addResource for 
> a long time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3552) RM Web UI shows -1 running containers for completed apps

2015-05-04 Thread Rohith (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith updated YARN-3552:
-
Attachment: (was: 0001-YARN-3552.patch)

> RM Web UI shows -1 running containers for completed apps
> 
>
> Key: YARN-3552
> URL: https://issues.apache.org/jira/browse/YARN-3552
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: webapp
>Reporter: Rohith
>Assignee: Rohith
>Priority: Trivial
>  Labels: newbie
> Attachments: 0001-YARN-3552.patch, 0001-YARN-3552.patch, yarn-3352.PNG
>
>
> In RMServerUtils, the default values are negative numbers, which results in 
> the RM web UI also displaying negative numbers. 
> {code}
>   public static final ApplicationResourceUsageReport
> DUMMY_APPLICATION_RESOURCE_USAGE_REPORT =
>   BuilderUtils.newApplicationResourceUsageReport(-1, -1,
>   Resources.createResource(-1, -1), Resources.createResource(-1, -1),
>   Resources.createResource(-1, -1), 0, 0);
> {code}
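
For completed apps the dummy report above yields -1 for running containers. A 
minimal sketch of how the web UI could guard against rendering these placeholder 
values (the class and method names here are hypothetical, not the committed fix):

{code}
public final class UsageReportRendering {
  private UsageReportRendering() {
  }

  // Treat the -1 placeholders from the dummy usage report as "N/A" instead of
  // printing a negative container count on the web UI.
  public static String renderCount(int count) {
    return count < 0 ? "N/A" : String.valueOf(count);
  }
}
{code}

For example, renderCount(-1) would display "N/A" while renderCount(3) displays "3".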



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3448) Add Rolling Time To Lives Level DB Plugin Capabilities

2015-05-04 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated YARN-3448:
--
Attachment: YARN-3448.15.patch

Ported YARN-3530 to RollingLevelDBTimelineStore as part of patch 15

> Add Rolling Time To Lives Level DB Plugin Capabilities
> --
>
> Key: YARN-3448
> URL: https://issues.apache.org/jira/browse/YARN-3448
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
> Attachments: YARN-3448.1.patch, YARN-3448.10.patch, 
> YARN-3448.12.patch, YARN-3448.13.patch, YARN-3448.14.patch, 
> YARN-3448.15.patch, YARN-3448.2.patch, YARN-3448.3.patch, YARN-3448.4.patch, 
> YARN-3448.5.patch, YARN-3448.7.patch, YARN-3448.8.patch, YARN-3448.9.patch
>
>
> For large applications, the majority of the time in LeveldbTimelineStore is 
> spent deleting old entities one record at a time. An exclusive write lock is 
> held during the entire deletion phase, which in practice can be hours. If we 
> are to relax some of the consistency constraints, other performance-enhancing 
> techniques can be employed to maximize the throughput and minimize locking 
> time.
> Split the 5 sections of the leveldb database (domain, owner, start time, 
> entity, index) into 5 separate databases. This allows each database to 
> maximize read cache effectiveness based on the unique usage patterns of 
> each database. With 5 separate databases each lookup is much faster. This can 
> also help with I/O by having the entity and index databases on separate disks.
> Rolling DBs for entity and index DBs. 99.9% of the data is in these two 
> sections, at roughly a 4:1 ratio (index to entity), at least for Tez. We can 
> replace DB record removal with file system removal if we create a rolling set 
> of databases that age out and can be efficiently removed. To do this we must 
> place a constraint to always place an entity's events into its correct rolling 
> DB instance based on start time. This allows us to stitch the data back 
> together while reading and to do artificial paging.
> Relax the synchronous writes constraint. If we are willing to accept losing 
> some records that were not flushed by the operating system during a crash, we 
> can use async writes, which can be much faster.
> Prefer sequential writes. Sequential writes can be several times faster than 
> random writes. Spend some small effort arranging the writes in such a way 
> that they trend towards sequential write performance over random write 
> performance.
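
To make the "correct rolling DB instance based on start time" constraint 
concrete, here is a small sketch of time-based bucketing; the names and the 
one-hour roll period are assumptions, not values from the patch:

{code}
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: derive the rolling DB bucket for an entity from its start
// time so that all of its events land in the same database instance, and whole
// databases can simply be deleted from the file system once they age out.
final class RollingDbBuckets {
  private static final long ROLL_PERIOD_MS = TimeUnit.HOURS.toMillis(1); // assumed

  private RollingDbBuckets() {
  }

  static long bucketFor(long entityStartTimeMs) {
    // Every start time within the same roll period maps to one bucket id.
    return entityStartTimeMs / ROLL_PERIOD_MS;
  }

  static String dbNameFor(long entityStartTimeMs) {
    return "entity-db-" + bucketFor(entityStartTimeMs);
  }
}
{code}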



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3552) RM Web UI shows -1 running containers for completed apps

2015-05-04 Thread Rohith (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith updated YARN-3552:
-
Attachment: 0002-YARN-3552.patch

Updated the patch fixing indentation.

> RM Web UI shows -1 running containers for completed apps
> 
>
> Key: YARN-3552
> URL: https://issues.apache.org/jira/browse/YARN-3552
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: webapp
>Reporter: Rohith
>Assignee: Rohith
>Priority: Trivial
>  Labels: newbie
> Attachments: 0001-YARN-3552.patch, 0001-YARN-3552.patch, 
> 0002-YARN-3552.patch, yarn-3352.PNG
>
>
> In RMServerUtils, the default values are negative numbers, which results in 
> the RM web UI also displaying negative numbers. 
> {code}
>   public static final ApplicationResourceUsageReport
> DUMMY_APPLICATION_RESOURCE_USAGE_REPORT =
>   BuilderUtils.newApplicationResourceUsageReport(-1, -1,
>   Resources.createResource(-1, -1), Resources.createResource(-1, -1),
>   Resources.createResource(-1, -1), 0, 0);
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3523) Cleanup ResourceManagerAdministrationProtocol interface audience

2015-05-04 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-3523:

Attachment: YARN-3523.20150505-1.patch

Updated the patch with removal of \@Stable.

> Cleanup ResourceManagerAdministrationProtocol interface audience
> 
>
> Key: YARN-3523
> URL: https://issues.apache.org/jira/browse/YARN-3523
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client, resourcemanager
>Reporter: Wangda Tan
>Assignee: Naganarasimha G R
>  Labels: newbie
> Attachments: YARN-3523.20150422-1.patch, YARN-3523.20150504-1.patch, 
> YARN-3523.20150505-1.patch
>
>
> I noticed ResourceManagerAdministrationProtocol has @Private audience for the 
> class and @Public audience for methods. It doesn't make sense to me. We 
> should make class audience and methods audience consistent.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2442) ResourceManager JMX UI does not give HA State

2015-05-04 Thread Rohith (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith updated YARN-2442:
-
Priority: Major  (was: Trivial)

> ResourceManager JMX UI does not give HA State
> -
>
> Key: YARN-2442
> URL: https://issues.apache.org/jira/browse/YARN-2442
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.5.0
>Reporter: Nishan Shetty
>Assignee: Rohith
>
> ResourceManager JMX UI can show the haState (INITIALIZING, ACTIVE, STANDBY, 
> STOPPED)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3491) PublicLocalizer#addResource is too slow.

2015-05-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14528005#comment-14528005
 ] 

Hadoop QA commented on YARN-3491:
-

\\
\\
| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  14m 40s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 2 new or modified test files. |
| {color:green}+1{color} | javac |   7m 32s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 37s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 36s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  1s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 32s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m  3s | The patch does not introduce 
any new Findbugs (version 2.0.3) warnings. |
| {color:green}+1{color} | yarn tests |   5m 51s | Tests passed in 
hadoop-yarn-server-nodemanager. |
| | |  41m 52s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12730368/YARN-3491.004.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 338e88a |
| hadoop-yarn-server-nodemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/7701/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/7701/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf902.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/7701/console |


This message was automatically generated.

> PublicLocalizer#addResource is too slow.
> 
>
> Key: YARN-3491
> URL: https://issues.apache.org/jira/browse/YARN-3491
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.7.0
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Attachments: YARN-3491.000.patch, YARN-3491.001.patch, 
> YARN-3491.002.patch, YARN-3491.003.patch, YARN-3491.004.patch
>
>
> Based on the profiling, the bottleneck in PublicLocalizer#addResource is 
> getInitializedLocalDirs, which calls checkLocalDir.
> checkLocalDir is very slow, taking about 10+ ms.
> The total delay will be approximately (number of local dirs) * 10+ ms.
> This delay is added for each public resource localization.
> Because PublicLocalizer#addResource is slow, the thread pool can't be fully 
> utilized; instead of doing public resource localization in 
> parallel (multithreading), public resource localization is serialized most of 
> the time.
> Also, PublicLocalizer#addResource runs in the Dispatcher thread, 
> so the Dispatcher thread will be blocked by PublicLocalizer#addResource for 
> a long time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3523) Cleanup ResourceManagerAdministrationProtocol interface audience

2015-05-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14528034#comment-14528034
 ] 

Hadoop QA commented on YARN-3523:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  15m 12s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:green}+1{color} | javac |   7m 53s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m  0s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   1m  6s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 35s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 32s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 28s | The patch does not introduce 
any new Findbugs (version 2.0.3) warnings. |
| {color:green}+1{color} | yarn tests |   0m 26s | Tests passed in 
hadoop-yarn-api. |
| | |  38m 38s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12730378/YARN-3523.20150505-1.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 318081c |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/7703/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/7703/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/7703/console |


This message was automatically generated.

> Cleanup ResourceManagerAdministrationProtocol interface audience
> 
>
> Key: YARN-3523
> URL: https://issues.apache.org/jira/browse/YARN-3523
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client, resourcemanager
>Reporter: Wangda Tan
>Assignee: Naganarasimha G R
>  Labels: newbie
> Attachments: YARN-3523.20150422-1.patch, YARN-3523.20150504-1.patch, 
> YARN-3523.20150505-1.patch
>
>
> I noticed ResourceManagerAdministrationProtocol has @Private audience for the 
> class and @Public audience for methods. It doesn't make sense to me. We 
> should make class audience and methods audience consistent.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1971) WindowsLocalWrapperScriptBuilder does not check for errors in generated script

2015-05-04 Thread Remus Rusanu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14526344#comment-14526344
 ] 

Remus Rusanu commented on YARN-1971:


The problem is that there is no error check in the generated script. For 
comparison, ContainerLaunch.WindowsShellScriptBuilder checks each line in the 
generated script by automatically adding this line after each command:
{code}
@if %errorlevel% neq 0 exit /b %errorlevel%
{code}

I'm not advocating checking for various error conditions before launching the 
script, I'm saying the generated script itself should have error checking and 
handling.
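
A sketch of what that could look like for the wrapper script builder (the helper 
class below is hypothetical, not the committed change); with it, the two 
pout.println calls quoted in the description below would go through writeCommand 
so a failed echo or move aborts the script:

{code}
import java.io.PrintStream;

// Hypothetical helper: emit an error check after every command in the generated
// Windows wrapper script, mirroring ContainerLaunch.WindowsShellScriptBuilder.
final class CheckedWindowsScriptWriter {
  private final PrintStream pout;

  CheckedWindowsScriptWriter(PrintStream pout) {
    this.pout = pout;
  }

  void writeCommand(String command) {
    pout.println(command);
    // Abort the script with the failing command's exit code.
    pout.println("@if %errorlevel% neq 0 exit /b %errorlevel%");
  }
}
{code}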

> WindowsLocalWrapperScriptBuilder does not check for errors in generated script
> --
>
> Key: YARN-1971
> URL: https://issues.apache.org/jira/browse/YARN-1971
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
>Priority: Minor
>
> Similar to YARN-1865. The 
> DefaultContainerExecutor.WindowsLocalWrapperScriptBuilder builds a shell 
> script that contains commands that may potentially fail:
> {code}
> pout.println("@echo " + containerIdStr + " > " + normalizedPidFile +".tmp");
> pout.println("@move /Y " + normalizedPidFile + ".tmp " + normalizedPidFile); 
> {code}
> These can fail due to access permissions, disk out of space, bad hardware, 
> cosmic rays, etc. There should be proper error checking to ease 
> troubleshooting.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2775) There is no close method in NMWebServices#getLogs()

2015-05-04 Thread Tsuyoshi Ozawa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14526366#comment-14526366
 ] 

Tsuyoshi Ozawa commented on YARN-2775:
--

[~skrho], thank you for taking this issue. I agree that we need to close 
files after creating a FileInputStream. How about using a try-with-resources 
statement, since we now only support JDK 7 or later? 
http://docs.oracle.com/javase/7/docs/technotes/guides/language/try-with-resources.html

{code}
try (final FileInputStream fis = ContainerLogsUtils.openLogFileForRead(
    containerIdStr, logFile, nmContext)) {
  // read from fis and stream the log contents to the response
} catch (IOException e) {
  // handle the error, e.g. return an error response
}
{code}

> There is no close method in NMWebServices#getLogs()
> ---
>
> Key: YARN-2775
> URL: https://issues.apache.org/jira/browse/YARN-2775
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: skrho
>Priority: Minor
> Attachments: YARN-2775_001.patch
>
>
> If the getLogs method is called, FileInputStream objects accumulate in 
> memory
> because the FileInputStream object is never closed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3557) Support Intel Trusted Execution Technology(TXT) in YARN scheduler

2015-05-04 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14526399#comment-14526399
 ] 

Sunil G commented on YARN-3557:
---

bq.Currently for centralized node label configuration, it only supports admin 
configure node label through CLI.

Apart from CLI and REST, do you mean exposing this configuration to a specific 
user (I assume this user will have some security approval in the cluster) so 
that this user can make the config change via REST or APIs?

> Support Intel Trusted Execution Technology(TXT) in YARN scheduler
> -
>
> Key: YARN-3557
> URL: https://issues.apache.org/jira/browse/YARN-3557
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Dian Fu
> Attachments: Support TXT in YARN high level design doc.pdf
>
>
> Intel TXT defines platform-level enhancements that provide the building 
> blocks for creating trusted platforms. A TXT aware YARN scheduler can 
> schedule security sensitive jobs on TXT enabled nodes only. YARN-2492 
> provides the capability to restrict YARN applications to run only on cluster 
> nodes that have a specified node label. This is a good mechanism that can be 
> utilized for a TXT-aware YARN scheduler.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3557) Support Intel Trusted Execution Technology(TXT) in YARN scheduler

2015-05-04 Thread Dian Fu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14526416#comment-14526416
 ] 

Dian Fu commented on YARN-3557:
---

Hi [~sunilg],
Thanks for your comments. 
{quote}Apart from CLI and REST, do you mean exposing this configuration to a 
specific user (I assume this user will have some security approval in the 
cluster) so that this user can make the config change via REST or APIs?{quote}
Exposing this configuration to a specific user can be one option, but this 
will require users to start a job which updates the labels periodically, which 
is complicated for users. If we can provide a method similar to YARN-2495 on 
the RM side, the user will just need to provide a script (which takes the node 
hostname/IP as input and outputs the node labels).

> Support Intel Trusted Execution Technology(TXT) in YARN scheduler
> -
>
> Key: YARN-3557
> URL: https://issues.apache.org/jira/browse/YARN-3557
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Dian Fu
> Attachments: Support TXT in YARN high level design doc.pdf
>
>
> Intel TXT defines platform-level enhancements that provide the building 
> blocks for creating trusted platforms. A TXT aware YARN scheduler can 
> schedule security sensitive jobs on TXT enabled nodes only. YARN-2492 
> provides the capability to restrict YARN applications to run only on cluster 
> nodes that have a specified node label. This is a good mechanism that can be 
> utilized for a TXT-aware YARN scheduler.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2305) When a container is in reserved state then total cluster memory is displayed wrongly.

2015-05-04 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14526423#comment-14526423
 ] 

Sunil G commented on YARN-2305:
---

Yes, this can be closed. I have checked, and it was not occurring. Still, I 
will perform a few more tests, and if it persists, I will reopen.

Thank you [~leftnoteasy]

> When a container is in reserved state then total cluster memory is displayed 
> wrongly.
> -
>
> Key: YARN-2305
> URL: https://issues.apache.org/jira/browse/YARN-2305
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.4.1
>Reporter: J.Andreina
>Assignee: Sunil G
> Attachments: Capture.jpg
>
>
> ENV Details:
> =  
>  3 queues  :  a(50%),b(25%),c(25%) ---> All max utilization is set to 
> 100
>  2 Node cluster with total memory as 16GB
> TestSteps:
> =
>   Execute following 3 jobs with different memory configurations for 
> Map , reducer and AM task
>   ./yarn jar wordcount-sleep.jar -Dmapreduce.job.queuename=a 
> -Dwordcount.map.sleep.time=2000 -Dmapreduce.map.memory.mb=2048 
> -Dyarn.app.mapreduce.am.resource.mb=1024 -Dmapreduce.reduce.memory.mb=2048 
> /dir8 /preempt_85 (application_1405414066690_0023)
>  ./yarn jar wordcount-sleep.jar -Dmapreduce.job.queuename=b 
> -Dwordcount.map.sleep.time=2000 -Dmapreduce.map.memory.mb=2048 
> -Dyarn.app.mapreduce.am.resource.mb=2048 -Dmapreduce.reduce.memory.mb=2048 
> /dir2 /preempt_86 (application_1405414066690_0025)
>  
>  ./yarn jar wordcount-sleep.jar -Dmapreduce.job.queuename=c 
> -Dwordcount.map.sleep.time=2000 -Dmapreduce.map.memory.mb=1024 
> -Dyarn.app.mapreduce.am.resource.mb=1024 -Dmapreduce.reduce.memory.mb=1024 
> /dir2 /preempt_62
> Issue
> =
>   when 2GB of memory is in the reserved state, total memory is shown as 
> 15GB and used as 15GB (while total memory is 16GB)
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3018) Unify the default value for yarn.scheduler.capacity.node-locality-delay in code and default xml file

2015-05-04 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel updated YARN-3018:

Attachment: YARN-3018-2.patch

Updated the patch to remove the whitespace

> Unify the default value for yarn.scheduler.capacity.node-locality-delay in 
> code and default xml file
> 
>
> Key: YARN-3018
> URL: https://issues.apache.org/jira/browse/YARN-3018
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Reporter: nijel
>Assignee: nijel
>Priority: Trivial
> Attachments: YARN-3018-1.patch, YARN-3018-2.patch
>
>
> For the configuration item "yarn.scheduler.capacity.node-locality-delay" the 
> default value given in code is "-1"
> public static final int DEFAULT_NODE_LOCALITY_DELAY = -1;
> In the default capacity-scheduler.xml file in the resource manager config 
> directory it is 40.
> Can it be unified to avoid confusion when the user creates the file without 
> this configuration? If they expect the values in the file to be the default 
> values, they will be wrong.
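
Whichever value is chosen, the point of the fix is that the code default and the 
shipped capacity-scheduler.xml agree. A hedged sketch of the code side, assuming 
40 is the value picked (which may not match the final decision):

{code}
// Hypothetical sketch: make the hard-coded default match the value shipped in the
// default capacity-scheduler.xml, so omitting the property behaves the same as
// copying the default file. The choice of 40 here is an assumption.
public static final String NODE_LOCALITY_DELAY =
    "yarn.scheduler.capacity.node-locality-delay";
public static final int DEFAULT_NODE_LOCALITY_DELAY = 40;

// Callers then read it with the unified default:
// int delay = conf.getInt(NODE_LOCALITY_DELAY, DEFAULT_NODE_LOCALITY_DELAY);
{code}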



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3018) Unify the default value for yarn.scheduler.capacity.node-locality-delay in code and default xml file

2015-05-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14526435#comment-14526435
 ] 

Hadoop QA commented on YARN-3018:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | patch |   0m  0s | The patch command could not apply 
the patch during dryrun. |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12730139/YARN-3018-2.patch |
| Optional Tests |  |
| git revision | trunk / bb9ddef |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/7684/console |


This message was automatically generated.

> Unify the default value for yarn.scheduler.capacity.node-locality-delay in 
> code and default xml file
> 
>
> Key: YARN-3018
> URL: https://issues.apache.org/jira/browse/YARN-3018
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Reporter: nijel
>Assignee: nijel
>Priority: Trivial
> Attachments: YARN-3018-1.patch, YARN-3018-2.patch
>
>
> For the configuration item "yarn.scheduler.capacity.node-locality-delay" the 
> default value given in code is "-1"
> public static final int DEFAULT_NODE_LOCALITY_DELAY = -1;
> In the default capacity-scheduler.xml file in the resource manager config 
> directory it is 40.
> Can it be unified to avoid confusion when the user creates the file without 
> this configuration? If they expect the values in the file to be the default 
> values, they will be wrong.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2293) Scoring for NMs to identify a better candidate to launch AMs

2015-05-04 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14526441#comment-14526441
 ] 

Sunil G commented on YARN-2293:
---

Hi [~zjshen]
This work has moved to YARN-2005; I will share a basic prototype there soon.
This can be marked as a duplicate of YARN-2005.

> Scoring for NMs to identify a better candidate to launch AMs
> 
>
> Key: YARN-2293
> URL: https://issues.apache.org/jira/browse/YARN-2293
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, resourcemanager
>Reporter: Sunil G
>Assignee: Sunil G
>
> The container exit status from the NM gives an indication of the reasons for 
> its failure. Sometimes, it may be because of container launching problems in 
> the NM. In a heterogeneous cluster, some machines with weak hardware may cause 
> more failures. It would be better not to launch AMs there as often. Also, I 
> would like to clarify that container failures caused by a buggy job should not 
> result in a decreased score. 
> As mentioned earlier, if a scoring mechanism based on exit status is added for 
> NMs in the RM, then NMs with better scores can be preferred for launching AMs. 
> Thoughts?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-2293) Scoring for NMs to identify a better candidate to launch AMs

2015-05-04 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G resolved YARN-2293.
---
Resolution: Duplicate

> Scoring for NMs to identify a better candidate to launch AMs
> 
>
> Key: YARN-2293
> URL: https://issues.apache.org/jira/browse/YARN-2293
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, resourcemanager
>Reporter: Sunil G
>Assignee: Sunil G
>
> The container exit status from the NM gives an indication of the reasons for 
> its failure. Sometimes, it may be because of container launching problems in 
> the NM. In a heterogeneous cluster, some machines with weak hardware may cause 
> more failures. It would be better not to launch AMs there as often. Also, I 
> would like to clarify that container failures caused by a buggy job should not 
> result in a decreased score. 
> As mentioned earlier, if a scoring mechanism based on exit status is added for 
> NMs in the RM, then NMs with better scores can be preferred for launching AMs. 
> Thoughts?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2256) Too many nodemanager and resourcemanager audit logs are generated

2015-05-04 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14526445#comment-14526445
 ] 

Varun Saxena commented on YARN-2256:


[~zjshen], just to brief you on the issue: in our setup we were getting too 
many audit logs related to container events. We also found some other 
unnecessary logs (not required for debugging) appearing frequently, and had 
raised another JIRA for that. So we internally took up the task of cleaning up 
these logs, which also made a slight improvement in the throughput of the 
running process (2.4.0).

To resolve the problem, one option was to remove these logs completely. Instead, 
we decided to support different log levels for audit logs, so that if some 
customer requires these logs, we can enable them by merely changing the log4j 
properties.
The scope of these two JIRAs is indeed interrelated, but I segregated them 
because I wasn't sure whether the community would accept support for different 
log levels. We can decide if we need either one of these.

> Too many nodemanager and resourcemanager audit logs are generated
> -
>
> Key: YARN-2256
> URL: https://issues.apache.org/jira/browse/YARN-2256
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager, resourcemanager
>Affects Versions: 2.4.0
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Attachments: YARN-2256.patch
>
>
> Following audit logs are generated too many times(due to the possibility of a 
> large number of containers) :
> 1. In NM - Audit logs corresponding to Starting, Stopping and finishing of a 
> container
> 2. In RM - Audit logs corresponding to AM allocating a container and AM 
> releasing a container
> We can have different log levels even for NM and RM audit logs and move these 
> successful container related logs to DEBUG.
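
To make the proposal concrete, a minimal sketch of logging a successful container 
event at DEBUG so it can be re-enabled purely through log4j configuration (the 
logger name and message format are assumptions, not the actual NMAuditLogger):

{code}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

// Hypothetical sketch: high-frequency, successful container events are logged at
// DEBUG, so they only appear when the audit logger's level is lowered in
// log4j.properties; failures would stay at INFO/WARN.
final class ContainerAuditLogging {
  private static final Log AUDIT = LogFactory.getLog("ContainerAudit"); // assumed name

  static void logContainerStart(String user, String containerId) {
    if (AUDIT.isDebugEnabled()) {
      AUDIT.debug("USER=" + user + " OPERATION=Start Container"
          + " CONTAINERID=" + containerId);
    }
  }
}
{code}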



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2267) Auxiliary Service support in RM

2015-05-04 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14526446#comment-14526446
 ] 

Sunil G commented on YARN-2267:
---

It would be a good feature if we could plug in a few resource monitoring 
services into the RM, such as the one mentioned in *Scenario 1* above.

Could you please share the design thoughts for the same? The main question is 
how this can be done in a controlled way. By this I mean that introducing a 
plugin should not conflict with the existing behavior of the schedulers, etc. 

> Auxiliary Service support in RM
> ---
>
> Key: YARN-2267
> URL: https://issues.apache.org/jira/browse/YARN-2267
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: resourcemanager
>Reporter: Naganarasimha G R
>Assignee: Rohith
>
> Currently the RM does not have a provision to run any auxiliary services. For 
> health/monitoring in the RM, it's better to have a plugin mechanism in the RM 
> itself, similar to the NM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3148) allow CORS related headers to passthrough in WebAppProxyServlet

2015-05-04 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14526447#comment-14526447
 ] 

Varun Saxena commented on YARN-3148:


Thanks [~gtCarrera] for looking at this. Will update the patch ASAP.

> allow CORS related headers to passthrough in WebAppProxyServlet
> ---
>
> Key: YARN-3148
> URL: https://issues.apache.org/jira/browse/YARN-3148
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Prakash Ramachandran
>Assignee: Varun Saxena
> Attachments: YARN-3148.001.patch
>
>
> Currently the WebAppProxyServlet filters the request headers as defined by 
> passThroughHeaders. The Tez UI is building a webapp which uses the REST API to 
> fetch data from the AM via the RM tracking URL. 
> For this purpose it would be nice to have additional headers allowed, 
> especially the ones related to CORS. A few of them that would help are: 
> * Origin
> * Access-Control-Request-Method
> * Access-Control-Request-Headers
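
For illustration, a hedged sketch of extending a passthrough header set with the 
CORS-related headers listed above; the field name and lookup below are 
assumptions, not the actual WebAppProxyServlet code:

{code}
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: a passthrough set extended with the CORS request headers.
// Header names are compared case-insensitively by lower-casing the lookup key.
final class PassThroughHeaders {
  private static final Set<String> HEADERS = new HashSet<String>(Arrays.asList(
      "accept",                          // stand-in for an existing entry
      "origin",                          // CORS
      "access-control-request-method",   // CORS preflight
      "access-control-request-headers"));// CORS preflight

  static boolean shouldForward(String headerName) {
    return HEADERS.contains(headerName.toLowerCase());
  }
}
{code}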



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2902) Killing a container that is localizing can orphan resources in the DOWNLOADING state

2015-05-04 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14526452#comment-14526452
 ] 

Varun Saxena commented on YARN-2902:


[~leftnoteasy], sorry for the delay. I was on long leave and came back today. 
We are pretty clear on how to handle it for private resources (as per the 
comment you highlighted) but hadn't updated the patch, as I need to simulate 
and investigate further for public resources. I will check it and update ASAP.


> Killing a container that is localizing can orphan resources in the 
> DOWNLOADING state
> 
>
> Key: YARN-2902
> URL: https://issues.apache.org/jira/browse/YARN-2902
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.5.0
>Reporter: Jason Lowe
>Assignee: Varun Saxena
> Attachments: YARN-2902.002.patch, YARN-2902.patch
>
>
> If a container is in the process of localizing when it is stopped/killed then 
> resources are left in the DOWNLOADING state.  If no other container comes 
> along and requests these resources they linger around with no reference 
> counts but aren't cleaned up during normal cache cleanup scans since it will 
> never delete resources in the DOWNLOADING state even if their reference count 
> is zero.
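
One possible direction, sketched here only for illustration (it is not 
necessarily the approach taken in the attached patches), is to let the cache 
cleanup scan treat a long-idle DOWNLOADING resource with zero references as 
orphaned:

{code}
// Hypothetical sketch: during a cache cleanup scan, a resource stuck in
// DOWNLOADING with zero references for longer than a grace period is treated as
// orphaned and scheduled for deletion. The names and grace period are assumptions.
final class OrphanedDownloadCheck {
  private static final long GRACE_PERIOD_MS = 10 * 60 * 1000L; // assumed 10 minutes

  static boolean isOrphaned(String state, int refCount,
      long lastProgressTimeMs, long nowMs) {
    return "DOWNLOADING".equals(state)
        && refCount == 0
        && (nowMs - lastProgressTimeMs) > GRACE_PERIOD_MS;
  }
}
{code}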



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3018) Unify the default value for yarn.scheduler.capacity.node-locality-delay in code and default xml file

2015-05-04 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel updated YARN-3018:

Attachment: YARN-3018-3.patch

Re-triggering the CI; the previous patch was wrongly generated.
Sorry for the noise.

> Unify the default value for yarn.scheduler.capacity.node-locality-delay in 
> code and default xml file
> 
>
> Key: YARN-3018
> URL: https://issues.apache.org/jira/browse/YARN-3018
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Reporter: nijel
>Assignee: nijel
>Priority: Trivial
> Attachments: YARN-3018-1.patch, YARN-3018-2.patch, YARN-3018-3.patch
>
>
> For the configuration item "yarn.scheduler.capacity.node-locality-delay" the 
> default value given in code is "-1"
> public static final int DEFAULT_NODE_LOCALITY_DELAY = -1;
> In the default capacity-scheduler.xml file in the resource manager config 
> directory it is 40.
> Can it be unified to avoid confusion when the user creates the file without 
> this configuration? If they expect the values in the file to be the default 
> values, they will be wrong.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3018) Unify the default value for yarn.scheduler.capacity.node-locality-delay in code and default xml file

2015-05-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14526466#comment-14526466
 ] 

Hadoop QA commented on YARN-3018:
-

\\
\\
| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |   0m  0s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | release audit |   0m 15s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| | |   0m 18s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12730144/YARN-3018-3.patch |
| Optional Tests |  |
| git revision | trunk / bb9ddef |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/7685/console |


This message was automatically generated.

> Unify the default value for yarn.scheduler.capacity.node-locality-delay in 
> code and default xml file
> 
>
> Key: YARN-3018
> URL: https://issues.apache.org/jira/browse/YARN-3018
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Reporter: nijel
>Assignee: nijel
>Priority: Trivial
> Attachments: YARN-3018-1.patch, YARN-3018-2.patch, YARN-3018-3.patch
>
>
> For the configuration item "yarn.scheduler.capacity.node-locality-delay" the 
> default value given in code is "-1"
> public static final int DEFAULT_NODE_LOCALITY_DELAY = -1;
> In the default capacity-scheduler.xml file in the resource manager config 
> directory it is 40.
> Can it be unified to avoid confusion when the user creates the file without 
> this configuration? If they expect the values in the file to be the default 
> values, they will be wrong.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-1662) Capacity Scheduler reservation issue cause Job Hang

2015-05-04 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G resolved YARN-1662.
---
Resolution: Invalid

> Capacity Scheduler reservation issue cause Job Hang
> ---
>
> Key: YARN-1662
> URL: https://issues.apache.org/jira/browse/YARN-1662
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, scheduler
>Affects Versions: 2.2.0
> Environment: Suse 11 SP1 + Linux
>Reporter: Sunil G
>
> There are 2 node managers in my cluster.
> NM1 with 8GB
> NM2 with 8GB
> I am submitting a Job with below details:
> AM with 2GB
> Map needs 5GB
> Reducer needs 3GB
> slowstart is enabled with 0.5
> 10maps and 50reducers are assigned.
> 5maps are completed. Now few reducers got scheduled.
> Now NM1 has 2GB AM and 3Gb Reducer_1[Used 5GB]
> NM2 has 3Gb Reducer_2  [Used 3GB]
> A Map has now reserved(5GB) in NM1 which has only 3Gb free.
> It hangs forever.
> Potential issue is, reservation is now blocked in NM1 for a Map which needs 
> 5GB.
> But the Reducer_1 hangs by waiting for few map ouputs.
> Reducer side preemption also not happened as few headroom is still available.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1662) Capacity Scheduler reservation issue cause Job Hang

2015-05-04 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14526473#comment-14526473
 ] 

Sunil G commented on YARN-1662:
---

Yes [~jianhe]
we can close this issue. After YARN-1769, we have a better reservation too.

I checked this and its not happening now.

> Capacity Scheduler reservation issue cause Job Hang
> ---
>
> Key: YARN-1662
> URL: https://issues.apache.org/jira/browse/YARN-1662
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, scheduler
>Affects Versions: 2.2.0
> Environment: Suse 11 SP1 + Linux
>Reporter: Sunil G
>
> There are 2 node managers in my cluster.
> NM1 with 8GB
> NM2 with 8GB
> I am submitting a Job with below details:
> AM with 2GB
> Map needs 5GB
> Reducer needs 3GB
> slowstart is enabled with 0.5
> 10maps and 50reducers are assigned.
> 5maps are completed. Now few reducers got scheduled.
> Now NM1 has 2GB AM and 3Gb Reducer_1[Used 5GB]
> NM2 has 3Gb Reducer_2  [Used 3GB]
> A Map has now reserved(5GB) in NM1 which has only 3Gb free.
> It hangs forever.
> Potential issue is, reservation is now blocked in NM1 for a Map which needs 
> 5GB.
> But the Reducer_1 hangs by waiting for few map ouputs.
> Reducer side preemption also not happened as few headroom is still available.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3097) Logging of resource recovery on NM restart has redundancies

2015-05-04 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14526577#comment-14526577
 ] 

Eric Payne commented on YARN-3097:
--

{quote}
-1  The patch doesn't appear to include any new or modified tests. Please 
justify why no new tests are needed for this patch. Also please list what 
manual steps were performed to verify this patch.
{quote}

Since the only change in this patch is to change an info log message to a debug 
log message, no tests were included.
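
The change amounts to something like the following sketch (variable and class 
names are placeholders; the real message lives in ResourceLocalizationService):

{code}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

// Sketch of the INFO-to-DEBUG demotion: the recovery message is emitted only when
// debug logging is enabled, since the LocalizedResource INIT->LOCALIZED transition
// already logs the same remote and local paths.
final class RecoveryLoggingSketch {
  private static final Log LOG = LogFactory.getLog(RecoveryLoggingSketch.class);

  static void logRecoveredResource(String remotePath, String localPath) {
    if (LOG.isDebugEnabled()) {
      LOG.debug("Recovering " + remotePath + " at " + localPath);
    }
  }
}
{code}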

> Logging of resource recovery on NM restart has redundancies
> ---
>
> Key: YARN-3097
> URL: https://issues.apache.org/jira/browse/YARN-3097
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.5.0
>Reporter: Jason Lowe
>Assignee: Eric Payne
>Priority: Minor
>  Labels: newbie
> Attachments: YARN-3097.001.patch
>
>
> ResourceLocalizationService logs that it is recovering a resource with the 
> remote and local paths, but then very shortly afterwards the 
> LocalizedResource emits an INIT->LOCALIZED transition that also logs the same 
> remote and local paths.  The recovery message should be a debug message, 
> since it's not conveying any useful information that isn't already covered by 
> the resource state transition log.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3562) unit tests failures and issues found from findbug from earlier ATS checkins

2015-05-04 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14526578#comment-14526578
 ] 

Naganarasimha G R commented on YARN-3562:
-

There seems to be some issue with Jenkins: compilation is passing, yet the test 
logs are showing compilation issues!

> unit tests failures and issues found from findbug from earlier ATS checkins
> ---
>
> Key: YARN-3562
> URL: https://issues.apache.org/jira/browse/YARN-3562
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Naganarasimha G R
>Priority: Minor
> Attachments: YARN-3562-YARN-2928.001.patch
>
>
> *Issues reported from MAPREDUCE-6337* :
> A bunch of MR unit tests are failing on our branch whenever the mini YARN 
> cluster needs to bring up multiple node managers.
> For example, see 
> https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5472/testReport/org.apache.hadoop.mapred/TestClusterMapReduceTestCase/testMapReduceRestarting/
> It is because the NMCollectorService is using a fixed port for the RPC (8048).
> *Issues reported from YARN-3044* :
> Test case failures and tools(FB & CS) issues found :
> # find bugs issue : Comparison of String objects using == or != in 
> ResourceTrackerService.updateAppCollectorsMap
> # find bugs issue : Boxing/unboxing to parse a primitive 
> RMTimelineCollectorManager.postPut. Called method Long.longValue()
> Should call Long.parseLong(String) instead.
> # find bugs issue : DM_DEFAULT_ENCODING Called method new 
> java.io.FileWriter(String, boolean) At 
> FileSystemTimelineWriterImpl.java:\[line 86\]
> # hadoop.yarn.server.resourcemanager.TestAppManager, 
> hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions, 
> hadoop.yarn.server.resourcemanager.TestClientRMService & 
> hadoop.yarn.server.resourcemanager.logaggregationstatus.TestRMAppLogAggregationStatus,
>  refer https://builds.apache.org/job/PreCommit-YARN-Build/7534/testReport/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3474) Add a way to let NM wait RM to come back, not kill running containers

2015-05-04 Thread Jun Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14526588#comment-14526588
 ] 

Jun Gong commented on YARN-3474:


[~vinodkv] Thank you for the explanation. Closing it now.

> Add a way to let NM wait RM to come back, not kill running containers
> -
>
> Key: YARN-3474
> URL: https://issues.apache.org/jira/browse/YARN-3474
> Project: Hadoop YARN
>  Issue Type: New Feature
>Affects Versions: 2.6.0
>Reporter: Jun Gong
>Assignee: Jun Gong
> Attachments: YARN-3474.01.patch
>
>
> When RM HA is enabled and the active RM shuts down, the standby RM will become 
> active and recover apps and attempts; apps will not be affected. 
> However, there may be cases or bugs that cause both RMs to fail to start 
> normally (e.g. [YARN-2340|https://issues.apache.org/jira/browse/YARN-2340]; 
> the RM could not connect to ZK). The NM will kill the containers running on it 
> when it cannot heartbeat with the RM for some time (the max retry time is 15 
> mins by default), and then all apps will be killed. 
> In a production cluster, we might come across the above cases, and fixing 
> these bugs might take more than 15 mins. In order to keep apps from being 
> affected and killed by the NM, the YARN admin could set a flag (the flag is a 
> znode '/wait-rm-to-come-back/cluster-id' in our solution) to tell the NM to 
> wait for the RM to come back and not kill running containers. After fixing the 
> bugs and the RM starting normally, clear the flag.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-3474) Add a way to let NM wait RM to come back, not kill running containers

2015-05-04 Thread Jun Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Gong resolved YARN-3474.

Resolution: Invalid

> Add a way to let NM wait RM to come back, not kill running containers
> -
>
> Key: YARN-3474
> URL: https://issues.apache.org/jira/browse/YARN-3474
> Project: Hadoop YARN
>  Issue Type: New Feature
>Affects Versions: 2.6.0
>Reporter: Jun Gong
>Assignee: Jun Gong
> Attachments: YARN-3474.01.patch
>
>
> When RM HA is enabled and the active RM shuts down, the standby RM will become 
> active and recover apps and attempts; apps will not be affected. 
> However, there may be cases or bugs that cause both RMs to fail to start 
> normally (e.g. [YARN-2340|https://issues.apache.org/jira/browse/YARN-2340]; 
> the RM could not connect to ZK). The NM will kill the containers running on it 
> when it cannot heartbeat with the RM for some time (the max retry time is 15 
> mins by default), and then all apps will be killed. 
> In a production cluster, we might come across the above cases, and fixing 
> these bugs might take more than 15 mins. In order to keep apps from being 
> affected and killed by the NM, the YARN admin could set a flag (the flag is a 
> znode '/wait-rm-to-come-back/cluster-id' in our solution) to tell the NM to 
> wait for the RM to come back and not kill running containers. After fixing the 
> bugs and the RM starting normally, clear the flag.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2729) Support script based NodeLabelsProvider Interface in Distributed Node Label Configuration Setup

2015-05-04 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14526623#comment-14526623
 ] 

Naganarasimha G R commented on YARN-2729:
-

Thanks for the review comments [~vinodkv],  

bq.SCRIPT_NODE_LABELS_PROVIDER and CONFIG_NODE_LABELS_PROVIDER are not needed, 
delete them, you have separate constants for their prefixes
Actually these are not prefixes. As per [~Wangda]'s 
[comment|https://issues.apache.org/jira/browse/YARN-2729?focusedCommentId=14393545&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14393545]
 we had decided to have whitelisting for the provider: {{The option will be: 
yarn.node-labels.nm.provider = "config/script/other-class-name".}} These are 
the modifications for it.

bq. DISABLE_NODE_LABELS_PROVIDER_FETCH_TIMER doesn't need to be in 
YarnConfiguration
Well, as per one of Wangda's comments, the possible values or default values of 
configurations had to be kept in YarnConfiguration, hence I placed it here. If 
required, as per your comment, I can move it to AbstractNodeLabelsProvider.

bq. LOG is not used anywhere
Are logs expected when the labels are set in {{setNodeLabels}}? I can add them 
here, but anyway there are already logs in NodeStatusUpdaterImpl on successful 
and unsuccessful attempts.

bq. BTW, assuming YARN-3565 goes in first, you will have to make some changes 
here.
bq. I think the format expected from the command should be more structured. 
Specifically as we expect more per-label attributes in line with YARN-3565.
Well, I was thinking about this while working on YARN-3565, but didn't modify 
the NodeLabelsProvider, as currently the labels (currently partitions) which 
need to be sent from the NM have to belong to the RM's cluster NodeLabel set. So 
exclusiveness need not be sent from the NM to the RM, as we currently support 
specifying exclusiveness only while adding cluster node labels. So IMHO, if 
there is a plan to make this interface public and stable, it would be better to 
do these changes now; if not, it would be better done after the requirement for 
constraint labels, so that there is more clarity on the structure. 
[~wangda] and you can share your opinion on this; based on it I will do the 
modifications.

bq. Not caused by your patch but worth fixing here. NodeStatusUpdaterImpl 
shouldn't worry about invalid label-set, previous-valid-labels and label 
validation. You should move all that functionality into NodeLabelsProvider.
Well, as per the class responsibility, I understand that NodeStatusUpdaterImpl 
is not supposed to have it, but as it might be expected to be public we had to 
ensure that:
* for every heartbeat, labels are sent across only if modified
* basic validations are done before sending the modified labels

These need to be done irrespective of the label provider (system or user's), 
hence I kept it in NodeStatusUpdaterImpl; but if it is required to be moved out, 
then we need to bring in some intermediate manager (/helper/delegator) class 
between NodeStatusUpdaterImpl and NodeLabelsProvider.
Those changes were also from my previous patch, so no hard feelings in taking 
care of it if required :).

bq. Can you add the documentation for setting this up too too?
Well, I was planning to raise a JIRA for updating the documentation on top of 
NodeLabels, but the documentation for it is not yet completed. If required, I 
can just add a PDF here.

> Support script based NodeLabelsProvider Interface in Distributed Node Label 
> Configuration Setup
> ---
>
> Key: YARN-2729
> URL: https://issues.apache.org/jira/browse/YARN-2729
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
> Attachments: YARN-2729.20141023-1.patch, YARN-2729.20141024-1.patch, 
> YARN-2729.20141031-1.patch, YARN-2729.20141120-1.patch, 
> YARN-2729.20141210-1.patch, YARN-2729.20150309-1.patch, 
> YARN-2729.20150322-1.patch, YARN-2729.20150401-1.patch, 
> YARN-2729.20150402-1.patch, YARN-2729.20150404-1.patch
>
>
> Support script based NodeLabelsProvider Interface in Distributed Node Label 
> Configuration Setup . 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3565) NodeHeartbeatRequest/RegisterNodeManagerRequest should use NodeLabel object instead of String

2015-05-04 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14526629#comment-14526629
 ] 

Naganarasimha G R commented on YARN-3565:
-

Thanks for the review comments [~vinodkv]
I agree with most of your suggestions, but had a few queries overall:
* Can there be changes again when labels as constraints are introduced? I am 
not sure exclusivity will have any significance with constraints, if we plan 
to make use of the NodeLabel class for constraints too.
* Will the CLI also require changes for adding and removing cluster node labels 
and mapping nodes to labels?
* If RMNodeLabelsManager.replaceLabelsOnNode() needs to be modified, then I 
think we need to make YARN-3521 dependent on this JIRA, right? 


> NodeHeartbeatRequest/RegisterNodeManagerRequest should use NodeLabel object 
> instead of String
> -
>
> Key: YARN-3565
> URL: https://issues.apache.org/jira/browse/YARN-3565
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Reporter: Wangda Tan
>Assignee: Naganarasimha G R
>Priority: Blocker
> Attachments: YARN-3565-20150502-1.patch
>
>
> Now NM HB/Register uses Set, it will be hard to add new fields if we 
> want to support specifying NodeLabel type such as exclusivity/constraints, 
> etc. We need to make sure rolling upgrade works.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3523) Cleanup ResourceManagerAdministrationProtocol interface audience

2015-05-04 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-3523:

Attachment: YARN-3523.20150504-1.patch

I have checked the 2.7.0 API docs, and neither this class nor its package has 
been captured. Hence, I have modified the visibility of the methods to private 
in this updated patch.

> Cleanup ResourceManagerAdministrationProtocol interface audience
> 
>
> Key: YARN-3523
> URL: https://issues.apache.org/jira/browse/YARN-3523
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client, resourcemanager
>Reporter: Wangda Tan
>Assignee: Naganarasimha G R
>  Labels: newbie
> Attachments: YARN-3523.20150422-1.patch, YARN-3523.20150504-1.patch
>
>
> I noticed ResourceManagerAdministrationProtocol has @Private audience for the 
> class and @Public audience for methods. It doesn't make sense to me. We 
> should make class audience and methods audience consistent.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2618) Avoid over-allocation of disk resources

2015-05-04 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14526683#comment-14526683
 ] 

Junping Du commented on YARN-2618:
--

Kick off test again manually.

> Avoid over-allocation of disk resources
> ---
>
> Key: YARN-2618
> URL: https://issues.apache.org/jira/browse/YARN-2618
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wei Yan
>Assignee: Wei Yan
> Attachments: YARN-2618-1.patch, YARN-2618-2.patch, YARN-2618-3.patch, 
> YARN-2618-4.patch, YARN-2618-5.patch, YARN-2618-6.patch, YARN-2618-7.patch
>
>
> Subtask of YARN-2139. 
> This should include
> - Add API support for introducing disk I/O as the 3rd type resource.
> - NM should report this information to the RM
> - RM should consider this to avoid over-allocation



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2622) RM should put the application related timeline data into a secured domain

2015-05-04 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-2622:
-
Target Version/s:   (was: 2.6.0)

> RM should put the application related timeline data into a secured domain
> -
>
> Key: YARN-2622
> URL: https://issues.apache.org/jira/browse/YARN-2622
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: 2.6.0
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
>
> After YARN-2446, SystemMetricsPublisher doesn't specify any domain, and the 
> application related timeline data is put into the default domain. It is not 
> secured. We should let the RM choose a secured domain in which to put the 
> system metrics.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2618) Avoid over-allocation of disk resources

2015-05-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14526700#comment-14526700
 ] 

Hadoop QA commented on YARN-2618:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | patch |   0m  0s | The patch command could not apply 
the patch during dryrun. |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12723515/YARN-2618-7.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / bb9ddef |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/7687/console |


This message was automatically generated.

> Avoid over-allocation of disk resources
> ---
>
> Key: YARN-2618
> URL: https://issues.apache.org/jira/browse/YARN-2618
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wei Yan
>Assignee: Wei Yan
> Attachments: YARN-2618-1.patch, YARN-2618-2.patch, YARN-2618-3.patch, 
> YARN-2618-4.patch, YARN-2618-5.patch, YARN-2618-6.patch, YARN-2618-7.patch
>
>
> Subtask of YARN-2139. 
> This should include
> - Add API support for introducing disk I/O as the 3rd type resource.
> - NM should report this information to the RM
> - RM should consider this to avoid over-allocation



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3523) Cleanup ResourceManagerAdministrationProtocol interface audience

2015-05-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14526710#comment-14526710
 ] 

Hadoop QA commented on YARN-3523:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  14m 55s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:green}+1{color} | javac |   7m 42s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 52s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   1m  2s | The applied patch generated  1 
new checkstyle issues (total was 17, now 18). |
| {color:red}-1{color} | whitespace |   0m  0s | The patch has 6  line(s) that 
end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 36s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 32s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 24s | The patch does not introduce 
any new Findbugs (version 2.0.3) warnings. |
| {color:green}+1{color} | yarn tests |   0m 27s | Tests passed in 
hadoop-yarn-api. |
| | |  38m  5s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12730182/YARN-3523.20150504-1.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / bb9ddef |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/7686/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt
 |
| whitespace | 
https://builds.apache.org/job/PreCommit-YARN-Build/7686/artifact/patchprocess/whitespace.txt
 |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/7686/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/7686/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/7686/console |


This message was automatically generated.

> Cleanup ResourceManagerAdministrationProtocol interface audience
> 
>
> Key: YARN-3523
> URL: https://issues.apache.org/jira/browse/YARN-3523
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client, resourcemanager
>Reporter: Wangda Tan
>Assignee: Naganarasimha G R
>  Labels: newbie
> Attachments: YARN-3523.20150422-1.patch, YARN-3523.20150504-1.patch
>
>
> I noticed ResourceManagerAdministrationProtocol has @Private audience for the 
> class and @Public audience for methods. It doesn't make sense to me. We 
> should make class audience and methods audience consistent.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3521) Support return structured NodeLabel objects in REST API when call getClusterNodeLabels

2015-05-04 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14526747#comment-14526747
 ] 

Sunil G commented on YARN-3521:
---

1.
bq. Should be exclusivity.
Yes, I have updated it accordingly.

2.
bq. Did we ever call these APIs stable?
No. I have changed it to a NodeLabelsInfo object and added a new getter that can 
supply the list/set of label names as strings.

3.
bq. Why are we not dropping the name-only records?
I have removed *NodeLabelsName* and use *NodeLabelsInfo* instead, with a new 
getter that returns the label names as Strings (a sketch of this shape is shown 
below). NodeToLabelsName is renamed to NodeToLabelsInfo, and internally it also 
uses NodeLabelInfo.
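
The following is a rough, hypothetical sketch of the shape described above; the 
class and method names are illustrative only and are not taken from the attached 
patch. It pairs a structured record per label with a convenience getter that 
still exposes plain label names for callers that only need Strings.

{code}
// Illustrative only -- not the actual YARN-3521 patch.
import java.util.ArrayList;
import java.util.List;

public class NodeLabelsInfo {

  public static class NodeLabelInfo {
    private final String name;
    private final boolean exclusivity;

    public NodeLabelInfo(String name, boolean exclusivity) {
      this.name = name;
      this.exclusivity = exclusivity;
    }

    public String getName() { return name; }
    public boolean getExclusivity() { return exclusivity; }
  }

  private final List<NodeLabelInfo> nodeLabelsInfo = new ArrayList<NodeLabelInfo>();

  public void add(NodeLabelInfo info) { nodeLabelsInfo.add(info); }

  // Structured view for the REST response.
  public List<NodeLabelInfo> getNodeLabelsInfo() { return nodeLabelsInfo; }

  // Convenience getter: callers that only need the plain label names keep working.
  public List<String> getNodeLabelsName() {
    List<String> names = new ArrayList<String>();
    for (NodeLabelInfo info : nodeLabelsInfo) {
      names.add(info.getName());
    }
    return names;
  }
}
{code}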

> Support return structured NodeLabel objects in REST API when call 
> getClusterNodeLabels
> --
>
> Key: YARN-3521
> URL: https://issues.apache.org/jira/browse/YARN-3521
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Reporter: Wangda Tan
>Assignee: Sunil G
> Attachments: 0001-YARN-3521.patch, 0002-YARN-3521.patch, 
> 0003-YARN-3521.patch
>
>
> In YARN-3413, the yarn cluster CLI returns NodeLabel instead of String; we 
> should make the same change on the REST API side to make them consistent.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3521) Support return structured NodeLabel objects in REST API when call getClusterNodeLabels

2015-05-04 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-3521:
--
Attachment: 0004-YARN-3521.patch

[~vinodkv] and [~leftnoteasy],
please share your thoughts on this updated patch.

IMO the NodeLabelManager APIs could also use objects rather than Strings; the 
admin interface can take care of this conversion logic.

> Support return structured NodeLabel objects in REST API when call 
> getClusterNodeLabels
> --
>
> Key: YARN-3521
> URL: https://issues.apache.org/jira/browse/YARN-3521
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Reporter: Wangda Tan
>Assignee: Sunil G
> Attachments: 0001-YARN-3521.patch, 0002-YARN-3521.patch, 
> 0003-YARN-3521.patch, 0004-YARN-3521.patch
>
>
> In YARN-3413, the yarn cluster CLI returns NodeLabel instead of String; we 
> should make the same change on the REST API side to make them consistent.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3097) Logging of resource recovery on NM restart has redundancies

2015-05-04 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14526755#comment-14526755
 ] 

Jason Lowe commented on YARN-3097:
--

+1, committing this.

> Logging of resource recovery on NM restart has redundancies
> ---
>
> Key: YARN-3097
> URL: https://issues.apache.org/jira/browse/YARN-3097
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.5.0
>Reporter: Jason Lowe
>Assignee: Eric Payne
>Priority: Minor
>  Labels: newbie
> Attachments: YARN-3097.001.patch
>
>
> ResourceLocalizationService logs that it is recovering a resource with the 
> remote and local paths, but then very shortly afterwards the 
> LocalizedResource emits an INIT->LOCALIZED transition that also logs the same 
> remote and local paths.  The recovery message should be a debug message, 
> since it's not conveying any useful information that isn't already covered by 
> the resource state transition log.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3388) Allocation in LeafQueue could get stuck because DRF calculator isn't well supported when computing user-limit

2015-05-04 Thread Nathan Roberts (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14526763#comment-14526763
 ] 

Nathan Roberts commented on YARN-3388:
--

Yes, I have a patch which I think is close. I need to merge it with the latest 
trunk; then I'll post it for review.

> Allocation in LeafQueue could get stuck because DRF calculator isn't well 
> supported when computing user-limit
> -
>
> Key: YARN-3388
> URL: https://issues.apache.org/jira/browse/YARN-3388
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 2.6.0
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
> Attachments: YARN-3388-v0.patch, YARN-3388-v1.patch
>
>
> When there are multiple active users in a queue, it should be possible for 
> those users to make use of capacity up-to max_capacity (or close). The 
> resources should be fairly distributed among the active users in the queue. 
> This works pretty well when there is a single resource being scheduled.   
> However, when there are multiple resources the situation gets more complex 
> and the current algorithm tends to get stuck at Capacity. 
> Example illustrated in subsequent comment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3097) Logging of resource recovery on NM restart has redundancies

2015-05-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14526764#comment-14526764
 ] 

Hudson commented on YARN-3097:
--

FAILURE: Integrated in Hadoop-trunk-Commit #7723 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7723/])
YARN-3097. Logging of resource recovery on NM restart has redundancies. 
Contributed by Eric Payne (jlowe: rev 8f65c793f2930bfd16885a2ab188a9970b754974)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java
* hadoop-yarn-project/CHANGES.txt


> Logging of resource recovery on NM restart has redundancies
> ---
>
> Key: YARN-3097
> URL: https://issues.apache.org/jira/browse/YARN-3097
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.5.0
>Reporter: Jason Lowe
>Assignee: Eric Payne
>Priority: Minor
>  Labels: newbie
> Fix For: 2.8.0
>
> Attachments: YARN-3097.001.patch
>
>
> ResourceLocalizationService logs that it is recovering a resource with the 
> remote and local paths, but then very shortly afterwards the 
> LocalizedResource emits an INIT->LOCALIZED transition that also logs the same 
> remote and local paths.  The recovery message should be a debug message, 
> since it's not conveying any useful information that isn't already covered by 
> the resource state transition log.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3554) Default value for maximum nodemanager connect wait time is too high

2015-05-04 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14526775#comment-14526775
 ] 

Jason Lowe commented on YARN-3554:
--

YARN-3518 is a separate concern with different ramifications.  We should 
discuss it there and not mix these two.

bq. set this to a bigger value maybe based on network partition considerations 
not only for nm restart.
What value do you propose?  As pointed out earlier, anything over 10 minutes is 
pointless since the container allocation expires in that time.  Is it common 
for network partitions to take longer than 3 minutes but less than 10 minutes?  
If so, we should tune the value for that.  If not, then making the value larger 
just slows recovery time.

bq. 3 mins seems dangerous. If the RM fails over and the recovery takes several 
mins, the NM may kill all containers; in a production env, that's not expected.

This JIRA is configuring the amount of time NM clients (i.e.: primarily 
ApplicationMasters and the RM when launching ApplicationMasters) will try to 
connect to a particular NM before failing.  I'm missing how RM failover leads 
to a mass killing of containers due to this proposed change.  This is not a 
property used by the NM, so the NM is not going to start killing all containers 
differently based on an updated value for it.  The only case where the RM will 
use this property is when connecting to NMs to launch AM containers, and it 
will only do so for NMs that have recently heartbeated.  Could you explain how 
this leads to all containers getting killed on a particular node?
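
For concreteness, here is a minimal sketch of how a client process (for example 
an AM) would see or override this value in its own configuration; the 180000 
value below is only the proposed 3-minute default from this discussion, not a 
recommendation, and the only property assumed is the one quoted in this JIRA.

{code}
// Illustrative only -- shows where the client-side property takes effect.
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class NmConnectWaitExample {
  public static void main(String[] args) {
    YarnConfiguration conf = new YarnConfiguration();

    // How long NM clients (AMs, and the RM when launching AMs) keep retrying a
    // connection to a NodeManager before giving up.
    conf.setLong("yarn.client.nodemanager-connect.max-wait-ms", 180000L);

    System.out.println("NM connect max wait (ms) = "
        + conf.getLong("yarn.client.nodemanager-connect.max-wait-ms", -1L));
  }
}
{code}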

> Default value for maximum nodemanager connect wait time is too high
> ---
>
> Key: YARN-3554
> URL: https://issues.apache.org/jira/browse/YARN-3554
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Naganarasimha G R
>  Labels: newbie
> Attachments: YARN-3554-20150429-2.patch, YARN-3554.20150429-1.patch
>
>
> The default value for yarn.client.nodemanager-connect.max-wait-ms is 900000 
> msec or 15 minutes, which is way too high.  The default container expiry time
> from the RM and the default task timeout in MapReduce are both only 10 
> minutes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2618) Avoid over-allocation of disk resources

2015-05-04 Thread Wei Yan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14526791#comment-14526791
 ] 

Wei Yan commented on YARN-2618:
---

Thanks, [~djp], I'll rebase the patch.

> Avoid over-allocation of disk resources
> ---
>
> Key: YARN-2618
> URL: https://issues.apache.org/jira/browse/YARN-2618
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wei Yan
>Assignee: Wei Yan
> Attachments: YARN-2618-1.patch, YARN-2618-2.patch, YARN-2618-3.patch, 
> YARN-2618-4.patch, YARN-2618-5.patch, YARN-2618-6.patch, YARN-2618-7.patch
>
>
> Subtask of YARN-2139. 
> This should include
> - Add API support for introducing disk I/O as the 3rd type resource.
> - NM should report this information to the RM
> - RM should consider this to avoid over-allocation



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3554) Default value for maximum nodemanager connect wait time is too high

2015-05-04 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14526934#comment-14526934
 ] 

Naganarasimha G R commented on YARN-3554:
-

Hi [~jlowe],
my earlier query about the ideal time and [~sandflee]'s comment relate to 
"yarn.resourcemanager.connect.max-wait.ms", and as [~gtCarrera] mentioned, it is 
just for discussion purposes.

> Default value for maximum nodemanager connect wait time is too high
> ---
>
> Key: YARN-3554
> URL: https://issues.apache.org/jira/browse/YARN-3554
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Naganarasimha G R
>  Labels: newbie
> Attachments: YARN-3554-20150429-2.patch, YARN-3554.20150429-1.patch
>
>
> The default value for yarn.client.nodemanager-connect.max-wait-ms is 900000 
> msec or 15 minutes, which is way too high.  The default container expiry time
> from the RM and the default task timeout in MapReduce are both only 10 
> minutes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3491) PublicLocalizer#addResource is too slow.

2015-05-04 Thread Wilfred Spiegelenburg (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14526957#comment-14526957
 ] 

Wilfred Spiegelenburg commented on YARN-3491:
-

Can we clean up getInitializedLogDirs() and getInitializedLocalDirs() now that 
we're changing them?
Neither of the methods needs to return anything, since we do not use the return 
values. Renaming the methods would also make their purpose clearer, for example:
getInitializedLogDirs()  -->  initializeLogDirs()
getInitializedLocalDirs()  -->  initializeLocalDirs()
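
A minimal, self-contained sketch of the suggested shape (the real 
ResourceLocalizationService internals are elided; the class and directory names 
here are made up for illustration): void methods whose names state the side 
effect, instead of getters whose return values nobody reads.

{code}
// Illustrative only -- not the actual YARN-3491 patch.
import java.io.File;
import java.util.Arrays;
import java.util.List;

public class DirInitExample {
  private final List<String> logDirs = Arrays.asList("/tmp/nm-log-1");
  private final List<String> localDirs = Arrays.asList("/tmp/nm-local-1");

  // was: List<String> getInitializedLogDirs()
  private void initializeLogDirs() {
    for (String dir : logDirs) {
      new File(dir).mkdirs();   // initialization is the only purpose
    }
  }

  // was: List<String> getInitializedLocalDirs()
  private void initializeLocalDirs() {
    for (String dir : localDirs) {
      new File(dir).mkdirs();
    }
  }

  public static void main(String[] args) {
    DirInitExample example = new DirInitExample();
    example.initializeLogDirs();
    example.initializeLocalDirs();
  }
}
{code}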


> PublicLocalizer#addResource is too slow.
> 
>
> Key: YARN-3491
> URL: https://issues.apache.org/jira/browse/YARN-3491
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.7.0
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Attachments: YARN-3491.000.patch, YARN-3491.001.patch, 
> YARN-3491.002.patch, YARN-3491.003.patch
>
>
> Based on the profiling, the bottleneck in PublicLocalizer#addResource is 
> getInitializedLocalDirs. getInitializedLocalDirs calls checkLocalDir, and 
> checkLocalDir is very slow, taking about 10+ ms.
> The total delay will be approximately number of local dirs * 10+ ms.
> This delay is added for each public resource localization.
> Because PublicLocalizer#addResource is slow, the thread pool can't be fully 
> utilized. Instead of doing public resource localization in parallel 
> (multithreading), public resource localization is serialized most of the time.
> Also, PublicLocalizer#addResource runs in the Dispatcher thread, so the 
> Dispatcher thread will be blocked by PublicLocalizer#addResource for a long 
> time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1564) add some basic workflow YARN services

2015-05-04 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14526963#comment-14526963
 ] 

Zhijie Shen commented on YARN-1564:
---

YARN-2928 is going to support the flow as a first-class citizen. It will be 
great if we can coordinate on this between app management and monitoring.

> add some basic workflow YARN services
> -
>
> Key: YARN-1564
> URL: https://issues.apache.org/jira/browse/YARN-1564
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: api
>Affects Versions: 2.4.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
> Attachments: YARN-1564-001.patch
>
>   Original Estimate: 24h
>  Time Spent: 48h
>  Remaining Estimate: 0h
>
> I've been using some alternative composite services to help build workflows 
> of process execution in a YARN AM.
> They and their tests could be moved into YARN for use by others - this would 
> make it easier to build aggregate services in an AM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3554) Default value for maximum nodemanager connect wait time is too high

2015-05-04 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14526983#comment-14526983
 ] 

Jason Lowe commented on YARN-3554:
--

Ah, thanks [~Naganarasimha], sorry I missed that.  We can continue discussing 
the proper RM connect wait time over at YARN-3518, as obviously I cannot keep 
them straight here. ;-)

Are there still objections to lowering it from 15 mins to 3 mins?  I'm +1 for 
the second patch, but I'll wait a few days before committing to give time for 
alternate proposals.

> Default value for maximum nodemanager connect wait time is too high
> ---
>
> Key: YARN-3554
> URL: https://issues.apache.org/jira/browse/YARN-3554
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Naganarasimha G R
>  Labels: newbie
> Attachments: YARN-3554-20150429-2.patch, YARN-3554.20150429-1.patch
>
>
> The default value for yarn.client.nodemanager-connect.max-wait-ms is 900000 
> msec or 15 minutes, which is way too high.  The default container expiry time
> from the RM and the default task timeout in MapReduce are both only 10 
> minutes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3422) relatedentities always return empty list when primary filter is set

2015-05-04 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527012#comment-14527012
 ] 

Zhijie Shen commented on YARN-3422:
---

[~billie.rina...@gmail.com], thanks for explaining the rationale. In that case, 
the attached patch is probably not the right fix.

bq. In retrospect, the directional nature of the related entity relationship 
seems to make things more confusing. Perhaps it would be better if relatedness 
were bidirectional.

I think directional may be okay, but the confusing part is that we store A <- B 
while we query B -> A, yet we always say "related entities". In fact, we need to 
differentiate the two: when storing A, B resides in the A entity as an 
isRelatedTo entity, and when querying B, A is shown as the relatesTo entity. Of 
course, we could also query A, and B should then be shown as the isRelatedTo 
entity, which is not supported here. This problem will be resolved in ATS v2.

Moreover, there is also a limitation in the way we store the primary filter. The 
index table holds a copy of the whole entity (only the information that comes 
with the current put), with the primary filter attached as a prefix of the key. 
That makes it expensive to define a primary filter for an entity, and it 
probably results in different snapshots of the entity under different primary 
filters. In this example, B doesn't have primary filter C, and even if we later 
add C for B, we will still not be able to get the related entity A when querying 
B via primary filter C. That's one reason why I suggest using a reverse index in 
YARN-3448.

However, for the current LeveldbTimelineStore, I'm not sure we have a quick way 
to resolve the problem. Thoughts?
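
Below is a small sketch of the directionality being described, using the ATS v1 
TimelineEntity API; the entity types, ids, and primary filter name are 
placeholders. The point it illustrates: the relationship is written on A, but it 
is surfaced when B is queried, while the primary filter lives only on A.

{code}
// Illustrative only -- entity and filter names are placeholders.
import org.apache.hadoop.yarn.api.records.timeline.TimelineEntity;

public class RelatedEntityExample {
  public static void main(String[] args) {
    TimelineEntity a = new TimelineEntity();
    a.setEntityType("TYPE_A");
    a.setEntityId("A");
    a.setStartTime(System.currentTimeMillis());

    // Stored on A: B is recorded as A's related entity (the A <- B direction).
    a.addRelatedEntity("TYPE_B", "B");

    // Primary filter C is attached to A only; B never carries it, which is why
    // querying B via primary filter C cannot surface A today.
    a.addPrimaryFilter("C", "some-value");

    // timelineClient.putEntities(a);
    // A later GET for entity B lists A among B's related entities, but a GET
    // for B restricted to primary filter C returns nothing.
  }
}
{code}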

> relatedentities always return empty list when primary filter is set
> ---
>
> Key: YARN-3422
> URL: https://issues.apache.org/jira/browse/YARN-3422
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Reporter: Chang Li
>Assignee: Chang Li
> Attachments: YARN-3422.1.patch
>
>
> When you curl for ats entities with a primary filter, the relatedentities 
> fields always return empty list



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2618) Avoid over-allocation of disk resources

2015-05-04 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527020#comment-14527020
 ] 

Vinod Kumar Vavilapalli commented on YARN-2618:
---

Haven't looked at this so far. Thanks for re-kicking it, Junping! Taking a quick 
look now...

> Avoid over-allocation of disk resources
> ---
>
> Key: YARN-2618
> URL: https://issues.apache.org/jira/browse/YARN-2618
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wei Yan
>Assignee: Wei Yan
> Attachments: YARN-2618-1.patch, YARN-2618-2.patch, YARN-2618-3.patch, 
> YARN-2618-4.patch, YARN-2618-5.patch, YARN-2618-6.patch, YARN-2618-7.patch
>
>
> Subtask of YARN-2139. 
> This should include
> - Add API support for introducing disk I/O as the 3rd type resource.
> - NM should report this information to the RM
> - RM should consider this to avoid over-allocation



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3573) MiniMRYarnCluster constructor that starts the timeline server using a boolean should be marked deprecated

2015-05-04 Thread Mit Desai (JIRA)
Mit Desai created YARN-3573:
---

 Summary: MiniMRYarnCluster constructor that starts the timeline 
server using a boolean should be marked deprecated
 Key: YARN-3573
 URL: https://issues.apache.org/jira/browse/YARN-3573
 Project: Hadoop YARN
  Issue Type: Bug
  Components: timelineserver
Affects Versions: 2.6.0
Reporter: Mit Desai


{code}MiniMRYarnCluster(String testName, int noOfNMs, boolean enableAHS){code}
starts the timeline server using *boolean enableAHS*. It would be better to have 
the timeline server started based on the config value.
We should mark this constructor as deprecated to avoid its future use.
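
A brief sketch of what a test could look like after the proposed change, 
assuming the mini cluster honors the standard yarn.timeline-service.enabled 
property instead of the boolean argument (that behavior is exactly what this 
JIRA proposes, so treat this as illustrative, not current behavior):

{code}
// Illustrative only -- assumes config-driven timeline server startup.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.v2.MiniMRYarnCluster;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class MiniClusterExample {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Drive the timeline server from configuration rather than the deprecated
    // boolean constructor argument.
    conf.setBoolean(YarnConfiguration.TIMELINE_SERVICE_ENABLED, true);

    MiniMRYarnCluster cluster = new MiniMRYarnCluster("example-test", 1);
    cluster.init(conf);
    cluster.start();
    // ... run the test against the mini cluster ...
    cluster.stop();
  }
}
{code}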



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2618) Avoid over-allocation of disk resources

2015-05-04 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527035#comment-14527035
 ] 

Vinod Kumar Vavilapalli commented on YARN-2618:
---

Okay, quickly scanned. Seems like you are having other related discussions at 
the umbrella ticket and other JIRAs. So please go ahead.

Is this only for trunk or branch-2 also?

> Avoid over-allocation of disk resources
> ---
>
> Key: YARN-2618
> URL: https://issues.apache.org/jira/browse/YARN-2618
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wei Yan
>Assignee: Wei Yan
> Attachments: YARN-2618-1.patch, YARN-2618-2.patch, YARN-2618-3.patch, 
> YARN-2618-4.patch, YARN-2618-5.patch, YARN-2618-6.patch, YARN-2618-7.patch
>
>
> Subtask of YARN-2139. 
> This should include
> - Add API support for introducing disk I/O as the 3rd type resource.
> - NM should report this information to the RM
> - RM should consider this to avoid over-allocation



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-3573) MiniMRYarnCluster constructor that starts the timeline server using a boolean should be marked deprecated

2015-05-04 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai reassigned YARN-3573:
---

Assignee: Mit Desai

> MiniMRYarnCluster constructor that starts the timeline server using a boolean 
> should be marked deprecated
> -
>
> Key: YARN-3573
> URL: https://issues.apache.org/jira/browse/YARN-3573
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Affects Versions: 2.6.0
>Reporter: Mit Desai
>Assignee: Mit Desai
>
> {code}MiniMRYarnCluster(String testName, int noOfNMs, boolean enableAHS){code}
> starts the timeline server using *boolean enableAHS*. It would be better to 
> have the timeline server started based on the config value.
> We should mark this constructor as deprecated to avoid its future use.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1612) FairScheduler: Enable delay scheduling by default

2015-05-04 Thread Chen He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen He updated YARN-1612:
--
Attachment: YARN-1612-003.patch

Patch updated.

> FairScheduler: Enable delay scheduling by default
> -
>
> Key: YARN-1612
> URL: https://issues.apache.org/jira/browse/YARN-1612
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Reporter: Sandy Ryza
>Assignee: Chen He
> Attachments: YARN-1612-003.patch, YARN-1612-v2.patch, YARN-1612.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1612) FairScheduler: Enable delay scheduling by default

2015-05-04 Thread Chen He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen He updated YARN-1612:
--
Attachment: (was: YARN-1612-003.patch)

> FairScheduler: Enable delay scheduling by default
> -
>
> Key: YARN-1612
> URL: https://issues.apache.org/jira/browse/YARN-1612
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Reporter: Sandy Ryza
>Assignee: Chen He
> Attachments: YARN-1612-003.patch, YARN-1612-v2.patch, YARN-1612.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

