[jira] [Commented] (YARN-4084) Yarn should allow to skip hadoop-yarn-server-tests project from build..

2015-08-26 Thread Ved Prakash Pandey (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715306#comment-14715306
 ] 

Ved Prakash Pandey commented on YARN-4084:
--

Thanks for the reply, Allen. 

Actually, I am using -DskipTests in addition to -Dmaven.test.skip=true. 
The problem comes when I use -Dmaven.test.skip=true, which prevents the test 
code from being compiled. For the sake of this comment: the maven.test.skip 
option skips both test code compilation and test case execution, whereas 
skipTests skips only execution (not compilation).

It may sound questionable to compile the source code without compiling the test 
sources, and for the open source community this may never be a scenario. But I 
ran into one requirement where, in my Continuous Integration environment, I have 
to produce a complete build as fast as possible and every minute counts. In such 
a case, disabling the test code compilation saves close to 3 to 4 minutes.
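
For reference, here is a minimal sketch of the two invocations being compared, with the flag semantics described above (timings are only indicative):

{code}
# Compiles main and test sources, skips only test execution
mvn clean install -DskipTests

# Skips test compilation and test execution; with the current poms this
# breaks the hadoop-yarn-dist assembly, which expects the tests-classifier
# artifact from hadoop-yarn-server-tests
mvn clean install -Dmaven.test.skip=true
{code}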


 Yarn should allow to skip hadoop-yarn-server-tests project from build..
 ---

 Key: YARN-4084
 URL: https://issues.apache.org/jira/browse/YARN-4084
 Project: Hadoop YARN
  Issue Type: Bug
  Components: build
Affects Versions: 2.7.1
Reporter: Ved Prakash Pandey
Priority: Minor
 Attachments: YARN-4084.patch


 For fast compilation one can try to skip the test code compilation by using 
 {{-Dmaven.test.skip=true}}. But yarn-project fails to compile when this 
 option is used because it depends on the hadoop-yarn-server-tests project. 
 Below is the exception:
 {noformat}
 [ERROR] Assembly: hadoop-yarn-dist is not configured correctly: Cannot find 
 attachment with classifier: tests in module project: 
 org.apache.hadoop:hadoop-yarn-server-tests:jar:2.7.0. Please exclude this 
 module from the module-set.
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2962) ZKRMStateStore: Limit the number of znodes under a znode

2015-08-26 Thread Ben Podgursky (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715342#comment-14715342
 ] 

Ben Podgursky commented on YARN-2962:
-

Hi,

We're looking at switching to an HA RM and I'm a bit concerned about this 
ticket, since we have a very active RM.  A couple of questions for those who 
encountered the bug:

- how many applications did you have in the RM store before this became a 
problem?
- was switching the ZK max message size via -Djute.maxbuffer=bytes a viable 
workaround?

Also, is there a sense of how close this ticket is to being merged?  Thanks,

Ben

 ZKRMStateStore: Limit the number of znodes under a znode
 

 Key: YARN-2962
 URL: https://issues.apache.org/jira/browse/YARN-2962
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Karthik Kambatla
Assignee: Varun Saxena
Priority: Critical
 Attachments: YARN-2962.01.patch, YARN-2962.2.patch, YARN-2962.3.patch


 We ran into this issue where we were hitting the default ZK server message 
 size configs, primarily because the message had too many znodes, even though 
 individually they were all small.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4086) Allow Aggregated Log readers to handle HAR files

2015-08-26 Thread Robert Kanter (JIRA)
Robert Kanter created YARN-4086:
---

 Summary: Allow Aggregated Log readers to handle HAR files
 Key: YARN-4086
 URL: https://issues.apache.org/jira/browse/YARN-4086
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.8.0
Reporter: Robert Kanter
Assignee: Robert Kanter


This is for the YARN changes for MAPREDUCE-6415.  It allows the yarn CLI and 
web UIs to read aggregated logs from HAR files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3717) Improve RM node labels web UI

2015-08-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715341#comment-14715341
 ] 

Hadoop QA commented on YARN-3717:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  25m 15s | Pre-patch trunk has 7 extant 
Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 7 new or modified test files. |
| {color:green}+1{color} | javac |   8m  3s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m  6s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 24s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | site |   3m  1s | Site still builds. |
| {color:red}-1{color} | checkstyle |   2m 43s | The applied patch generated  3 
new checkstyle issues (total was 16, now 18). |
| {color:green}+1{color} | whitespace |   0m 12s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 31s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:red}-1{color} | findbugs |   6m 21s | Post-patch findbugs 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 compilation is broken. |
| {color:green}+1{color} | findbugs |   6m 21s | The patch does not introduce 
any new Findbugs (version ) warnings. |
| {color:red}-1{color} | yarn tests |   0m 17s | Tests failed in 
hadoop-yarn-api. |
| {color:red}-1{color} | yarn tests |   0m 12s | Tests failed in 
hadoop-yarn-client. |
| {color:red}-1{color} | yarn tests |   0m 19s | Tests failed in 
hadoop-yarn-common. |
| {color:red}-1{color} | yarn tests |   0m 13s | Tests failed in 
hadoop-yarn-server-applicationhistoryservice. |
| {color:red}-1{color} | yarn tests |   0m 13s | Tests failed in 
hadoop-yarn-server-common. |
| {color:red}-1{color} | yarn tests |   0m 18s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | |  60m 41s | |
\\
\\
|| Reason || Tests ||
| Failed build | hadoop-yarn-api |
|   | hadoop-yarn-client |
|   | hadoop-yarn-common |
|   | hadoop-yarn-server-applicationhistoryservice |
|   | hadoop-yarn-server-common |
|   | hadoop-yarn-server-resourcemanager |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12752534/YARN-3717.20150826-1.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle site |
| git revision | trunk / a4d9acc |
| Pre-patch Findbugs warnings | 
https://builds.apache.org/job/PreCommit-YARN-Build/8919/artifact/patchprocess/trunkFindbugsWarningshadoop-yarn-server-common.html
 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/8919/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt
 |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8919/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| hadoop-yarn-client test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8919/artifact/patchprocess/testrun_hadoop-yarn-client.txt
 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8919/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| hadoop-yarn-server-applicationhistoryservice test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8919/artifact/patchprocess/testrun_hadoop-yarn-server-applicationhistoryservice.txt
 |
| hadoop-yarn-server-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8919/artifact/patchprocess/testrun_hadoop-yarn-server-common.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8919/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8919/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8919/console |


This message was automatically generated.

 Improve RM node labels web UI
 -

 Key: YARN-3717
 URL: https://issues.apache.org/jira/browse/YARN-3717
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Naganarasimha G R
Assignee: Naganarasimha G R
 Attachments: 3717_cluster_test_snapshots.zip, RMLogsForHungJob.log, 
 YARN-3717.20150822-1.patch, YARN-3717.20150824-1.patch, 
 YARN-3717.20150825-1.patch, YARN-3717.20150826-1.patch


 1 Add the 

[jira] [Comment Edited] (YARN-4084) Yarn should allow to skip hadoop-yarn-server-tests project from build..

2015-08-26 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715429#comment-14715429
 ] 

Allen Wittenauer edited comment on YARN-4084 at 8/26/15 8:10 PM:
-

So then mvn compile is actually what you want (I think) :)


was (Author: aw):
So then mvn compile is actually what you want

 Yarn should allow to skip hadoop-yarn-server-tests project from build..
 ---

 Key: YARN-4084
 URL: https://issues.apache.org/jira/browse/YARN-4084
 Project: Hadoop YARN
  Issue Type: Bug
  Components: build
Affects Versions: 2.7.1
Reporter: Ved Prakash Pandey
Priority: Minor
 Attachments: YARN-4084.patch


 For fast compilation one can try to skip the test code compilation by using 
 {{-Dmaven.test.skip=true}}. But yarn-project fails to compile when this 
 option is used because it depends on the hadoop-yarn-server-tests project. 
 Below is the exception:
 {noformat}
 [ERROR] Assembly: hadoop-yarn-dist is not configured correctly: Cannot find 
 attachment with classifier: tests in module project: 
 org.apache.hadoop:hadoop-yarn-server-tests:jar:2.7.0. Please exclude this 
 module from the module-set.
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4084) Yarn should allow to skip hadoop-yarn-server-tests project from build..

2015-08-26 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715429#comment-14715429
 ] 

Allen Wittenauer commented on YARN-4084:


So then mvn compile is actually what you want
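
As a rough illustration of the suggestion (this is standard Maven lifecycle behavior, not something specific to this JIRA): {{mvn compile}} stops at the compile phase, so test sources are never compiled and the package-phase dist assembly never runs.

{code}
# Compile only the main sources across the reactor; the test-compile phase
# and the package-phase hadoop-yarn-dist assembly are never reached
mvn clean compile
{code}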

 Yarn should allow to skip hadoop-yarn-server-tests project from build..
 ---

 Key: YARN-4084
 URL: https://issues.apache.org/jira/browse/YARN-4084
 Project: Hadoop YARN
  Issue Type: Bug
  Components: build
Affects Versions: 2.7.1
Reporter: Ved Prakash Pandey
Priority: Minor
 Attachments: YARN-4084.patch


 For fast compilation one can try to skip the test code compilation by using 
 {{-Dmaven.test.skip=true}}. But yarn-project fails to compile when this 
 option is used because it depends on the hadoop-yarn-server-tests project. 
 Below is the exception:
 {noformat}
 [ERROR] Assembly: hadoop-yarn-dist is not configured correctly: Cannot find 
 attachment with classifier: tests in module project: 
 org.apache.hadoop:hadoop-yarn-server-tests:jar:2.7.0. Please exclude this 
 module from the module-set.
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4084) Yarn should allow to skip hadoop-yarn-server-tests project from build..

2015-08-26 Thread Ved Prakash Pandey (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715338#comment-14715338
 ] 

Ved Prakash Pandey commented on YARN-4084:
--

I realize that my patch forces the use of the 
{{-Penable-yarn-server-test-module}} option to get normal builds. That is my 
mistake. Instead, I will provide a patch tomorrow that adds a switch like 
{{-Pdisable-yarn-server-test-module}}, with which the hadoop-yarn-server-tests 
project can be skipped from the build. 

Please let me know if that sounds okay.
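
A sketch of the intended usage, assuming the proposed profile name above (the profile does not exist yet, so this is only illustrative):

{code}
# Hypothetical invocation once the proposed profile is added: the build
# behaves normally by default and skips hadoop-yarn-server-tests only
# when the profile is activated
mvn clean install -DskipTests -Pdisable-yarn-server-test-module
{code}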



 Yarn should allow to skip hadoop-yarn-server-tests project from build..
 ---

 Key: YARN-4084
 URL: https://issues.apache.org/jira/browse/YARN-4084
 Project: Hadoop YARN
  Issue Type: Bug
  Components: build
Affects Versions: 2.7.1
Reporter: Ved Prakash Pandey
Priority: Minor
 Attachments: YARN-4084.patch


 For fast compilation one can try to skip the test code compilation by using 
 {{-Dmaven.test.skip=true}}. But yarn-project fails to compile when this 
 option is used because it depends on the hadoop-yarn-server-tests project. 
 Below is the exception:
 {noformat}
 [ERROR] Assembly: hadoop-yarn-dist is not configured correctly: Cannot find 
 attachment with classifier: tests in module project: 
 org.apache.hadoop:hadoop-yarn-server-tests:jar:2.7.0. Please exclude this 
 module from the module-set.
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4082) Container shouldn't be killed when node's label updated.

2015-08-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715393#comment-14715393
 ] 

Hadoop QA commented on YARN-4082:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  16m  1s | Findbugs (version ) appears to 
be broken on trunk. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   9m 12s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  11m  8s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 33s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 32s | There were no new checkstyle 
issues. |
| {color:red}-1{color} | whitespace |   0m  6s | The patch has 23  line(s) that 
end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 51s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 30s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | yarn tests |  53m 31s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | |  95m  1s | |
\\
\\
|| Reason || Tests ||
| Timed out tests | 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.TestAbstractYarnScheduler
 |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12752539/YARN-4082.2.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / a4d9acc |
| whitespace | 
https://builds.apache.org/job/PreCommit-YARN-Build/8920/artifact/patchprocess/whitespace.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8920/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8920/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8920/console |


This message was automatically generated.

 Container shouldn't be killed when node's label updated.
 

 Key: YARN-4082
 URL: https://issues.apache.org/jira/browse/YARN-4082
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacity scheduler
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: YARN-4082.1.patch, YARN-4082.2.patch


 From YARN-2920, containers will be killed if the partition of a node is changed. 
 Instead of killing containers, we should update resource-usage-by-partition 
 properly when a node's partition is updated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2884) Proxying all AM-RM communications

2015-08-26 Thread Kishore Chaliparambil (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kishore Chaliparambil updated YARN-2884:

Attachment: YARN-2884-V11.patch

Removed the ApplicationConstants.java file from the patch because it is not 
required.

 Proxying all AM-RM communications
 -

 Key: YARN-2884
 URL: https://issues.apache.org/jira/browse/YARN-2884
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager, resourcemanager
Reporter: Carlo Curino
Assignee: Kishore Chaliparambil
 Attachments: YARN-2884-V1.patch, YARN-2884-V10.patch, 
 YARN-2884-V11.patch, YARN-2884-V2.patch, YARN-2884-V3.patch, 
 YARN-2884-V4.patch, YARN-2884-V5.patch, YARN-2884-V6.patch, 
 YARN-2884-V7.patch, YARN-2884-V8.patch, YARN-2884-V9.patch


 We introduce the notion of an RMProxy, running on each node (or once per 
 rack). Upon start, the AM is forced (via tokens and configuration) to direct 
 all its requests to a new service running on the NM that provides a proxy to 
 the central RM. 
 This gives us a place to:
 1) perform distributed scheduling decisions
 2) throttle misbehaving AMs
 3) mask the access to a federation of RMs



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()

2015-08-26 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14712619#comment-14712619
 ] 

Varun Saxena commented on YARN-3893:


I do not have any concern about exiting the JVM. If fail-fast is true (the default 
behavior), the JVM will exit anyway.

I was wondering whether it would be semantically appropriate to make the JVM exit in 
some cases if somebody has explicitly changed the fail-fast config to false. 
Logs can also fill up if yarn-site.xml is wrong on both RMs.

I am not sure about the webapp part, though. Does it require the client RM service 
to be initialized? AFAIK, if the RM is standby it will hit the webapp filter and 
redirect to the other RM (which may be active). I haven't tested the UI after 
applying the previous patches, so maybe Bibin can tell. If there are issues with 
the webapp, we will have to exit the JVM if the transition to standby fails, 
because there may be no other way out.
I will discuss this further with you offline.

 Both RM in active state when Admin#transitionToActive failure from refeshAll()
 --

 Key: YARN-3893
 URL: https://issues.apache.org/jira/browse/YARN-3893
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.7.1
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Critical
 Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 
 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, 
 yarn-site.xml


 Cases that can cause this.
 # Capacity scheduler xml is wrongly configured during switch
 # Refresh ACL failure due to configuration
 # Refresh User group failure due to configuration
 Both RMs will continuously try to become active
 {code}
 dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm1
 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm2
 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 {code}
 # Both Web UI active
 # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3635) Get-queue-mapping should be a common interface of YarnScheduler

2015-08-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715222#comment-14715222
 ] 

Hadoop QA commented on YARN-3635:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | patch |   0m  0s | The patch command could not apply 
the patch during dryrun. |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12745158/YARN-3635.6.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / a4d9acc |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8918/console |


This message was automatically generated.

 Get-queue-mapping should be a common interface of YarnScheduler
 ---

 Key: YARN-3635
 URL: https://issues.apache.org/jira/browse/YARN-3635
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: scheduler
Reporter: Wangda Tan
Assignee: Tan, Wangda
 Attachments: YARN-3635.1.patch, YARN-3635.2.patch, YARN-3635.3.patch, 
 YARN-3635.4.patch, YARN-3635.5.patch, YARN-3635.6.patch


 Currently, both the fair and capacity schedulers support queue mapping, which lets 
 the scheduler change the queue of an application after it is submitted to the 
 scheduler.
 One issue with doing this in a specific scheduler is: if the queue after mapping 
 has a different maximum_allocation/default-node-label-expression from the 
 original queue, {{validateAndCreateResourceRequest}} in RMAppManager checks 
 the wrong queue.
 I propose making queue mapping a common interface of the scheduler, and having 
 RMAppManager set the queue after mapping before doing validations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3816) [Aggregation] App-level Aggregation for YARN system metrics

2015-08-26 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14714416#comment-14714416
 ] 

Junping Du commented on YARN-3816:
--

Thanks [~varun_saxena] for review and comments!
bq. If we use same scheme for long or double, we may end up with 4 ORs' for a 
single metric. Maybe we can use cell tags for aggregation.
That's a good point! When I was doing the PoC patch a few weeks ago, YARN-4053 
had not been brought up for discussion, so I thought it was a little overkill to 
use a cell tag for specifying the only boolean value. Now it seems to be a good 
way, but I would prefer to defer this decision to YARN-4053, since there are other 
higher-priority comments to address here, so we can move faster. What do you think?

bq. Maybe in TimelineCollector#aggregateMetrics, we should do aggregation only 
if the flag is enabled.
That's true. That's part of the reason why the aggregation flag was added to the 
metric. Will add the check in the next patch.

bq. In TimelineCollector#appendAggregatedMetricsToEntities any reason we are 
creating separate TimelineEntity objects for each metric ? Maybe create a 
single entity containing a set of metrics.
Nice catch.

bq. 3 new maps have been introduced in TimelineCollector and these are used as 
base to calculate aggregated value. What if the daemon crashes?
For the RM, it could persist the maps to the RMStateStore. For the NM, that may not 
be enough, as the NM could be lost as well. We need a mechanism so that if a 
TimelineCollector is relaunched somewhere else, it will read the raw metrics and 
recover the maps before it starts working. This will be part of failover JIRAs 
like YARN-3115, YARN-3359, etc.

bq. In TimelineMetricCalculator some functions have duplicate if conditions for 
long.
Fixed.

bq. In TimelineMetricCalculator#sum, to avoid negative values due to overflow, 
we can change conditions like below...
As with the comments above, the overflow case will be handled in the next patch.

bq. In TimelineMetric#aggregateTo, maybe use getValues instead of getValuesJAXB?
I would prefer to use a TreeMap because it sorts keys (timestamps) on access; 
the aggregateTo() algorithm assumes that metrics are sorted by timestamp.

bq. Also I was wondering if TimelineMetric#aggregateTo should be moved to some 
util class. TimelineMetric is part of object model and exposed to client. And 
IIUC aggregateTo wont be called by client.
As Li mentions below, it is a bit tricky to have a utility class for any of the 
classes in the API, because it would mislead users into using it, which is not our 
intention, at least for now. aggregateTo is not as straightforward and generically 
useful as the methods in TimelineMetricCalculator, so let's hold off on exposing it 
as a utility class for now. Making it static sounds good, though.

bq. What is EntityColumnPrefix#AGGREGATED_METRICS meant for?
It is something developed at the PoC stage a few weeks ago, and it should be 
removed after we move to the ApplicationTable.

 [Aggregation] App-level Aggregation for YARN system metrics
 ---

 Key: YARN-3816
 URL: https://issues.apache.org/jira/browse/YARN-3816
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Junping Du
Assignee: Junping Du
 Attachments: Application Level Aggregation of Timeline Data.pdf, 
 YARN-3816-YARN-2928-v1.patch, YARN-3816-poc-v1.patch, YARN-3816-poc-v2.patch


 We need application-level aggregation of Timeline data:
 - To present end users with aggregated state for each application, including: 
 resource (CPU, Memory) consumption across all containers, number of 
 containers launched/completed/failed, etc. We need this for apps while they 
 are running as well as when they are done.
 - Also, framework-specific metrics, e.g. HDFS_BYTES_READ, should be 
 aggregated to show details of state at the framework level.
 - Aggregation at other levels (Flow/User/Queue) can be more efficient when based 
 on application-level aggregations rather than raw entity-level data, as far 
 fewer rows need to be scanned (after filtering out non-aggregated entities such 
 as events, configurations, etc.).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4082) Container shouldn't be killed when node's label updated.

2015-08-26 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-4082:
-
Attachment: YARN-4082.2.patch

Attached .2 patch, fixed findbugs warnings.

 Container shouldn't be killed when node's label updated.
 

 Key: YARN-4082
 URL: https://issues.apache.org/jira/browse/YARN-4082
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacity scheduler
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: YARN-4082.1.patch, YARN-4082.2.patch


 From YARN-2920, containers will be killed if the partition of a node is changed. 
 Instead of killing containers, we should update resource-usage-by-partition 
 properly when a node's partition is updated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3717) Improve RM node labels web UI

2015-08-26 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-3717:

Attachment: YARN-3717.20150826-1.patch

Fixing the reported test case failure; I ran findbugs locally and didn't find any 
issues induced by this code.

 Improve RM node labels web UI
 -

 Key: YARN-3717
 URL: https://issues.apache.org/jira/browse/YARN-3717
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Naganarasimha G R
Assignee: Naganarasimha G R
 Attachments: 3717_cluster_test_snapshots.zip, RMLogsForHungJob.log, 
 YARN-3717.20150822-1.patch, YARN-3717.20150824-1.patch, 
 YARN-3717.20150825-1.patch, YARN-3717.20150826-1.patch


 1 Add the default-node-label expression for each queue in the scheduler page.
 2 In the Application/Appattempt page, show the app's configured node label 
 expression for the AM and Job



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4085) Generate file with container resource limits in the container work dir

2015-08-26 Thread Varun Vasudev (JIRA)
Varun Vasudev created YARN-4085:
---

 Summary: Generate file with container resource limits in the 
container work dir
 Key: YARN-4085
 URL: https://issues.apache.org/jira/browse/YARN-4085
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Reporter: Varun Vasudev
Assignee: Varun Vasudev
Priority: Minor


Currently, a container doesn't know what resource limits are being imposed on 
it. It would be helpful if the NM generated a simple file in the container work 
dir with the resource limits specified.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs

2015-08-26 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715208#comment-14715208
 ] 

Vrushali C commented on YARN-4074:
--

My take is that we can make things as generic as possible, but we should have 
separate APIs for flows and flow runs. 

I had put up an initial proposal for flow-based queries in ATS when we started 
off on this, at 
https://issues.apache.org/jira/secure/attachment/12695071/Flow%20based%20queries.docx

I believe that for the two queries you have listed above, [~sjlee0], there would be 
two REST APIs:

1) Get All Flows
Path: /listFlows/cluster/
Returns: paginated list of apps with aggregated stats (to populate the flows 
list tab on the UI)
Sample URL:
http://timelineservice.example.com/ws/v2/listFlows/clusterid?limit=2&startTime=20140510&endTime=20140601
This would be a UI-related aggregation query.

2) Get a specific Flow's runs
Path: /flow/cluster/user/flowName/[version]
Returns: list of flows
Sample URL: 
http://timelineservice.example.com/ws/v2/flow/clusterid/userName/someFlowName_identifying_a_flow?limit=2&startTime=1390939248000&endTime=139361764800
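
Purely to illustrate the proposal, the two endpoints might be exercised as below; the paths and parameters are hypothetical and taken from the sample URLs above, nothing here is implemented yet:

{code}
# 1) list flows for a cluster, with a limit and a time window
curl "http://timelineservice.example.com/ws/v2/listFlows/clusterid?limit=2&startTime=20140510&endTime=20140601"

# 2) list the runs of one specific flow for a user
curl "http://timelineservice.example.com/ws/v2/flow/clusterid/userName/someFlowName_identifying_a_flow?limit=2&startTime=1390939248000&endTime=139361764800"
{code}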




 [timeline reader] implement support for querying for flows and flow runs
 

 Key: YARN-4074
 URL: https://issues.apache.org/jira/browse/YARN-4074
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: YARN-2928
Reporter: Sangjin Lee
Assignee: Sangjin Lee

 Implement support for querying for flows and flow runs.
 We should be able to query for the most recent N flows, etc.
 This includes changes to the {{TimelineReader}} API if necessary, as well as 
 implementation of the API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4053) Change the way metric values are stored in HBase Storage

2015-08-26 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715192#comment-14715192
 ] 

Vrushali C commented on YARN-4053:
--


The way I see this, it comes down to a basic question of whether we really 
*need* floating point precision in metric values. For instance, cost is a 
metric that could have a decimal value upon calculation. But in my opinion, 
a cost of 5 dollars versus 5.347891 dollars versus 5.78913 dollars is 
not that different, whereas a cost of 6.x dollars is different from 5.x. I believe 
it does not matter THAT much whether the cost is 5.347891 or 5.79813. These are 
Hadoop applications; the run time is rarely going to be exactly consistent 
for exactly the same code, so metrics will usually fluctuate slightly 
between different runs of the exact same job. 

Storage and querying of longs is straightforward and clean, with no ambiguity in 
serialization. 

Contrast that with storing various numerical data types in metrics:
- all the complexity of storing column prefixes that can tell us which type 
is stored, so that serialization to/from HBase can be done correctly.
- the filtering in HBase becomes much more complicated with all these 
different datatypes.





 Change the way metric values are stored in HBase Storage
 

 Key: YARN-4053
 URL: https://issues.apache.org/jira/browse/YARN-4053
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: YARN-2928
Reporter: Varun Saxena
Assignee: Varun Saxena
 Attachments: YARN-4053-YARN-2928.01.patch


 Currently, the HBase implementation uses GenericObjectMapper to convert and store 
 values in the backend HBase storage. This converts everything into a string 
 representation (ASCII/UTF-8 encoded byte array).
 While this is fine in most cases, it does not quite serve our use case for 
 metrics. 
 So we need to decide how we are going to encode and decode metric values and 
 store them in HBase.
  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4076) FairScheduler does not allow AM to choose which containers to preempt

2015-08-26 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-4076:
---
Component/s: fairscheduler

 FairScheduler does not allow AM to choose which containers to preempt
 -

 Key: YARN-4076
 URL: https://issues.apache.org/jira/browse/YARN-4076
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot

 Capacity scheduler allows for AM to choose which containers will be 
 preempted. See comment about corresponding work pending for FairScheduler 
 https://issues.apache.org/jira/browse/YARN-568?focusedCommentId=13649126&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13649126



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4077) FairScheduler Reservation should wait for most relaxed scheduling delay permitted before issuing reservation

2015-08-26 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-4077:
---
Component/s: fairscheduler

 FairScheduler Reservation should wait for most relaxed scheduling delay 
 permitted before issuing reservation
 

 Key: YARN-4077
 URL: https://issues.apache.org/jira/browse/YARN-4077
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Reporter: Anubhav Dhoot

 Today if an allocation has a node local request that allows for relaxation, 
 we do not wait for the relaxation delay before issuing the reservation. This 
 can be too aggressive. Instead we should allow the scheduling delays of 
 relaxation to expire before we choose to allow reserving a node for the 
 container. This allows for the request to be satisfied on a different node 
 instead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3528) Tests with 12345 as hard-coded port break jenkins

2015-08-26 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715681#comment-14715681
 ] 

Varun Saxena commented on YARN-3528:


Will have a look.

 Tests with 12345 as hard-coded port break jenkins
 -

 Key: YARN-3528
 URL: https://issues.apache.org/jira/browse/YARN-3528
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0
 Environment: ASF Jenkins
Reporter: Steve Loughran
Assignee: Brahma Reddy Battula
Priority: Blocker
  Labels: test
 Attachments: YARN-3528-002.patch, YARN-3528-003.patch, 
 YARN-3528-004.patch, YARN-3528-005.patch, YARN-3528-006.patch, YARN-3528.patch


 A lot of the YARN tests have hard-coded the port 12345 for their services to 
 come up on.
 This makes it impossible to have scheduled or precommit tests run 
 consistently on the ASF jenkins hosts. Instead the tests fail regularly and 
 appear to get ignored completely.
 A quick grep of 12345 shows up many places in the test suite where this 
 practise has developed.
 * All {{BaseContainerManagerTest}} subclasses
 * {{TestNodeManagerShutdown}}
 * {{TestContainerManager}}
 + others
 This needs to be addressed through portscanning and dynamic port allocation. 
 Please can someone do this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2962) ZKRMStateStore: Limit the number of znodes under a znode

2015-08-26 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715519#comment-14715519
 ] 

Varun Saxena commented on YARN-2962:


bq. how many applications did you have in the RM store before this became a 
problem
Will have to check. I think it was more than 1 apps in our case. Will let 
you know.

bq. switching the zk max messages size via -Djute.maxbuffer=bytes a viable 
workaround?
Yes, that works. We can also set a lower config value for the number of completed 
apps to be stored in the state store; even 0 can be set.

bq.  Also, is there a sense of how close this ticket is to being merged? 
The patches currently here have to be rebased because of recent changes. I had 
put this on the back burner as it will go into trunk and not branch-2. If it 
needs to be handled earlier, I will focus on it. I plan to take this up in 
the coming month anyway.
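
For anyone following along, a rough sketch of the two workarounds mentioned above; the property names are standard ZooKeeper/YARN settings, and the exact values are only examples:

{code}
# Raise the ZooKeeper buffer limit (default is roughly 1 MB); the same
# jute.maxbuffer value must also be set on the ZooKeeper servers
export YARN_RESOURCEMANAGER_OPTS="$YARN_RESOURCEMANAGER_OPTS -Djute.maxbuffer=4194304"

# And/or store fewer completed applications in the RM state store via
# yarn-site.xml (0 is allowed):
#   yarn.resourcemanager.state-store.max-completed-applications = 0
{code}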


 ZKRMStateStore: Limit the number of znodes under a znode
 

 Key: YARN-2962
 URL: https://issues.apache.org/jira/browse/YARN-2962
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Karthik Kambatla
Assignee: Varun Saxena
Priority: Critical
 Attachments: YARN-2962.01.patch, YARN-2962.2.patch, YARN-2962.3.patch


 We ran into this issue where we were hitting the default ZK server message 
 size configs, primarily because the message had too many znodes, even though 
 individually they were all small.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3528) Tests with 12345 as hard-coded port break jenkins

2015-08-26 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715641#comment-14715641
 ] 

Robert Kanter commented on YARN-3528:
-

+1 LGTM.  
Any other comments [~varun_saxena]?

 Tests with 12345 as hard-coded port break jenkins
 -

 Key: YARN-3528
 URL: https://issues.apache.org/jira/browse/YARN-3528
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0
 Environment: ASF Jenkins
Reporter: Steve Loughran
Assignee: Brahma Reddy Battula
Priority: Blocker
  Labels: test
 Attachments: YARN-3528-002.patch, YARN-3528-003.patch, 
 YARN-3528-004.patch, YARN-3528-005.patch, YARN-3528-006.patch, YARN-3528.patch


 A lot of the YARN tests have hard-coded the port 12345 for their services to 
 come up on.
 This makes it impossible to have scheduled or precommit tests run 
 consistently on the ASF jenkins hosts. Instead the tests fail regularly and 
 appear to get ignored completely.
 A quick grep of 12345 shows up many places in the test suite where this 
 practise has developed.
 * All {{BaseContainerManagerTest}} subclasses
 * {{TestNodeManagerShutdown}}
 * {{TestContainerManager}}
 + others
 This needs to be addressed through portscanning and dynamic port allocation. 
 Please can someone do this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4086) Allow Aggregated Log readers to handle HAR files

2015-08-26 Thread Robert Kanter (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Kanter updated YARN-4086:

Attachment: YARN-4086.001.patch

The YARN-4086.001.patch allows the yarn CLI and web UIs to read aggregated logs 
from HAR files.  It's mostly the same as the prelim patch in MAPREDUCE-6415, 
with some minor changes and unit tests.  The patches for this and 
MAPREDUCE-6415 can be applied independently.

*Important:* For the unit tests, I had to include some HAR files, which are 
basically folders with a few files in them.  One of the files is a binary file, 
which makes generating and applying the patch tricky.  I got it to work by 
generating it with {{git diff --binary > FILE}} and applying it with {{git apply 
FILE}}.  The regular {{patch}} command won't work, and it has to be {{-p1}} 
and not {{-p0}}.  I'm not sure if Jenkins will be able to handle this.
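
A minimal sketch of the workflow described above, using the attachment name from this JIRA as the file name:

{code}
# Generate the patch, including the binary HAR test files
git diff --binary > YARN-4086.001.patch

# Apply it; the regular patch(1) command cannot handle the binary hunks,
# and the paths are -p1 style (a/ and b/ prefixes)
git apply -p1 YARN-4086.001.patch
{code}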

 Allow Aggregated Log readers to handle HAR files
 

 Key: YARN-4086
 URL: https://issues.apache.org/jira/browse/YARN-4086
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.8.0
Reporter: Robert Kanter
Assignee: Robert Kanter
 Attachments: YARN-4086.001.patch


 This is for the YARN changes for MAPREDUCE-6415.  It allows the yarn CLI and 
 web UIs to read aggregated logs from HAR files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4086) Allow Aggregated Log readers to handle HAR files

2015-08-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715545#comment-14715545
 ] 

Hadoop QA commented on YARN-4086:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  20m 27s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 6 new or modified test files. |
| {color:green}+1{color} | javac |   7m 46s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 53s | There were no new javadoc 
warning messages. |
| {color:red}-1{color} | release audit |   0m 20s | The applied patch generated 
4 release audit warnings. |
| {color:red}-1{color} | checkstyle |   1m 25s | The applied patch generated  3 
new checkstyle issues (total was 23, now 26). |
| {color:green}+1{color} | whitespace |   0m  1s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 31s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 35s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   2m 25s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | yarn tests |   6m 57s | Tests failed in 
hadoop-yarn-client. |
| {color:green}+1{color} | yarn tests |   1m 59s | Tests passed in 
hadoop-yarn-common. |
| | |  53m 23s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.yarn.client.cli.TestLogsCLI |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12752560/YARN-4086.001.patch |
| Optional Tests | javac unit findbugs checkstyle javadoc |
| git revision | trunk / a4d9acc |
| Release Audit | 
https://builds.apache.org/job/PreCommit-YARN-Build/8921/artifact/patchprocess/patchReleaseAuditProblems.txt
 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/8921/artifact/patchprocess/diffcheckstylehadoop-yarn-common.txt
 |
| hadoop-yarn-client test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8921/artifact/patchprocess/testrun_hadoop-yarn-client.txt
 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8921/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8921/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf908.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8921/console |


This message was automatically generated.

 Allow Aggregated Log readers to handle HAR files
 

 Key: YARN-4086
 URL: https://issues.apache.org/jira/browse/YARN-4086
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.8.0
Reporter: Robert Kanter
Assignee: Robert Kanter
 Attachments: YARN-4086.001.patch


 This is for the YARN changes for MAPREDUCE-6415.  It allows the yarn CLI and 
 web UIs to read aggregated logs from HAR files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2962) ZKRMStateStore: Limit the number of znodes under a znode

2015-08-26 Thread Ben Podgursky (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14715591#comment-14715591
 ] 

Ben Podgursky commented on YARN-2962:
-

Got it.  Thanks for the details.  It sounds like we'll have some workarounds 
available if we do run into trouble, which is hopefully good enough for now.

 ZKRMStateStore: Limit the number of znodes under a znode
 

 Key: YARN-2962
 URL: https://issues.apache.org/jira/browse/YARN-2962
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Karthik Kambatla
Assignee: Varun Saxena
Priority: Critical
 Attachments: YARN-2962.01.patch, YARN-2962.2.patch, YARN-2962.3.patch


 We ran into this issue where we were hitting the default ZK server message 
 size configs, primarily because the message had too many znodes, even though 
 individually they were all small.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()

2015-08-26 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14712637#comment-14712637
 ] 

Varun Saxena commented on YARN-3893:


In fact, in my opinion we can crash the RM in all cases if the config is wrong, 
because until the config is corrected, the RM with the wrong config cannot become 
active (and hence will be unusable). In that case, the fail-fast config won't even 
be required. So should we change the behavior to keep the RM in standby (but up) if 
fail-fast is set to false? Anyway, we can discuss this in more detail face to face. 

 Both RM in active state when Admin#transitionToActive failure from refeshAll()
 --

 Key: YARN-3893
 URL: https://issues.apache.org/jira/browse/YARN-3893
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.7.1
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Critical
 Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 
 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, 
 yarn-site.xml


 Cases that can cause this.
 # Capacity scheduler xml is wrongly configured during switch
 # Refresh ACL failure due to configuration
 # Refresh User group failure due to configuration
 Both RMs will continuously try to become active
 {code}
 dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm1
 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm2
 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 {code}
 # Both Web UI active
 # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2884) Proxying all AM-RM communications

2015-08-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14712672#comment-14712672
 ] 

Hadoop QA commented on YARN-2884:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  21m 18s | Pre-patch trunk has 7 extant 
Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 6 new or modified test files. |
| {color:green}+1{color} | javac |   7m 55s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m  2s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   2m 31s | The applied patch generated  1 
new checkstyle issues (total was 211, now 211). |
| {color:green}+1{color} | whitespace |   0m  2s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 28s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   6m 52s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   0m 23s | Tests passed in 
hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests |   1m 59s | Tests passed in 
hadoop-yarn-common. |
| {color:green}+1{color} | yarn tests |   0m 24s | Tests passed in 
hadoop-yarn-server-common. |
| {color:green}+1{color} | yarn tests |   7m 44s | Tests passed in 
hadoop-yarn-server-nodemanager. |
| {color:green}+1{color} | yarn tests |  53m 29s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| | | 116m 18s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12752399/YARN-2884-V11.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / a4d9acc |
| Pre-patch Findbugs warnings | 
https://builds.apache.org/job/PreCommit-YARN-Build/8913/artifact/patchprocess/trunkFindbugsWarningshadoop-yarn-server-common.html
 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/8913/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt
 |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8913/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8913/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| hadoop-yarn-server-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8913/artifact/patchprocess/testrun_hadoop-yarn-server-common.txt
 |
| hadoop-yarn-server-nodemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8913/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8913/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8913/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8913/console |


This message was automatically generated.

 Proxying all AM-RM communications
 -

 Key: YARN-2884
 URL: https://issues.apache.org/jira/browse/YARN-2884
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager, resourcemanager
Reporter: Carlo Curino
Assignee: Kishore Chaliparambil
 Attachments: YARN-2884-V1.patch, YARN-2884-V10.patch, 
 YARN-2884-V11.patch, YARN-2884-V2.patch, YARN-2884-V3.patch, 
 YARN-2884-V4.patch, YARN-2884-V5.patch, YARN-2884-V6.patch, 
 YARN-2884-V7.patch, YARN-2884-V8.patch, YARN-2884-V9.patch


 We introduce the notion of an RMProxy, running on each node (or once per 
 rack). Upon start, the AM is forced (via tokens and configuration) to direct 
 all its requests to a new service running on the NM that provides a proxy to 
 the central RM. 
 This gives us a place to:
 1) perform distributed scheduling decisions
 2) throttle misbehaving AMs
 3) mask the access to a federation of RMs



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()

2015-08-26 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-3893:
---
Attachment: 0006-YARN-3893.patch

So JVM exit is the conclusion after the discussion.
Attaching a patch based on the same.

 Both RM in active state when Admin#transitionToActive failure from refeshAll()
 --

 Key: YARN-3893
 URL: https://issues.apache.org/jira/browse/YARN-3893
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.7.1
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Critical
 Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 
 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, 
 0006-YARN-3893.patch, yarn-site.xml


 Cases that can cause this.
 # Capacity scheduler xml is wrongly configured during switch
 # Refresh ACL failure due to configuration
 # Refresh User group failure due to configuration
 Both RMs will continuously try to become active
 {code}
 dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm1
 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm2
 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 {code}
 # Both Web UI active
 # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()

2015-08-26 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14712896#comment-14712896
 ] 

Varun Saxena commented on YARN-3893:


The latest patch, 0008-YARN-3893.patch, LGTM.
+1 pending Jenkins.

 Both RM in active state when Admin#transitionToActive failure from refeshAll()
 --

 Key: YARN-3893
 URL: https://issues.apache.org/jira/browse/YARN-3893
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.7.1
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Critical
 Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 
 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, 
 0006-YARN-3893.patch, 0007-YARN-3893.patch, 0008-YARN-3893.patch, 
 yarn-site.xml


 Cases that can cause this.
 # Capacity scheduler xml is wrongly configured during switch
 # Refresh ACL failure due to configuration
 # Refresh User group failure due to configuration
 Both RMs will continuously try to become active
 {code}
 dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm1
 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm2
 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 {code}
 # Both Web UI active
 # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()

2015-08-26 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14712745#comment-14712745
 ] 

Sunil G commented on YARN-3893:
---

As I see it, a JVM exit is reasonable, as Rohith proposed earlier. These failures 
are mostly caused by a wrong scheduler configuration, so there is no need to 
switch to standby or rely on fail-fast. If we exit the JVM directly, the shutdown 
is clean and the logs contain enough information to analyze the reason for the 
configuration failure. 
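As a rough, self-contained sketch of that direct-exit idea (this is not the 
attached patch; the refresh call and the message below are placeholders), the 
flow could look something like the following, using Hadoop's {{ExitUtil}}:

{code}
import org.apache.hadoop.util.ExitUtil;

// Illustration only: terminate the RM process when the configuration refresh
// performed during transitionToActive() fails, instead of lingering half-active.
public class RefreshFailureSketch {

  // Placeholder standing in for AdminService#refreshAll(); not the real method.
  private static void refreshAll() throws Exception {
    throw new Exception("capacity-scheduler.xml is misconfigured");
  }

  public static void main(String[] args) {
    try {
      refreshAll();
    } catch (Exception e) {
      // ExitUtil.terminate() logs the reason and exits the JVM, leaving enough
      // information in the RM log to analyze the bad configuration.
      ExitUtil.terminate(-1, "Failed to refresh configuration on transitionToActive: " + e);
    }
  }
}
{code}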

 Both RM in active state when Admin#transitionToActive failure from refeshAll()
 --

 Key: YARN-3893
 URL: https://issues.apache.org/jira/browse/YARN-3893
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.7.1
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Critical
 Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 
 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, 
 yarn-site.xml


 Cases that can cause this.
 # Capacity scheduler xml is wrongly configured during switch
 # Refresh ACL failure due to configuration
 # Refresh User group failure due to configuration
 Continuously both RM will try to be active
 {code}
 dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm1
 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm2
 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 {code}
 # Both Web UI active
 # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()

2015-08-26 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-3893:
---
Attachment: 0007-YARN-3893.patch

Missed one comment: the {{isRMActive}} check is not required. Attaching the patch again.

 Both RM in active state when Admin#transitionToActive failure from refeshAll()
 --

 Key: YARN-3893
 URL: https://issues.apache.org/jira/browse/YARN-3893
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.7.1
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Critical
 Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 
 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, 
 0006-YARN-3893.patch, 0007-YARN-3893.patch, yarn-site.xml


 Cases that can cause this.
 # Capacity scheduler xml is wrongly configured during switch
 # Refresh ACL failure due to configuration
 # Refresh User group failure due to configuration
 Continuously both RM will try to be active
 {code}
 dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm1
 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm2
 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 {code}
 # Both Web UI active
 # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()

2015-08-26 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-3893:
---
Attachment: 0008-YARN-3893.patch

 Both RM in active state when Admin#transitionToActive failure from refeshAll()
 --

 Key: YARN-3893
 URL: https://issues.apache.org/jira/browse/YARN-3893
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.7.1
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Critical
 Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 
 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, 
 0006-YARN-3893.patch, 0007-YARN-3893.patch, 0008-YARN-3893.patch, 
 yarn-site.xml


 Cases that can cause this.
 # Capacity scheduler xml is wrongly configured during switch
 # Refresh ACL failure due to configuration
 # Refresh User group failure due to configuration
 Continuously both RM will try to be active
 {code}
 dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm1
 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm2
 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 {code}
 # Both Web UI active
 # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()

2015-08-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14712942#comment-14712942
 ] 

Hadoop QA commented on YARN-3893:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  16m 15s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 41s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 59s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 50s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 27s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 32s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | yarn tests |  52m  9s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | |  90m 50s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | 
hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12752428/0006-YARN-3893.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / a4d9acc |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8914/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8914/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8914/console |


This message was automatically generated.

 Both RM in active state when Admin#transitionToActive failure from refeshAll()
 --

 Key: YARN-3893
 URL: https://issues.apache.org/jira/browse/YARN-3893
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.7.1
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Critical
 Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 
 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, 
 0006-YARN-3893.patch, 0007-YARN-3893.patch, 0008-YARN-3893.patch, 
 yarn-site.xml


 Cases that can cause this.
 # Capacity scheduler xml is wrongly configured during switch
 # Refresh ACL failure due to configuration
 # Refresh User group failure due to configuration
 Continuously both RM will try to be active
 {code}
 dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm1
 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm2
 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 {code}
 # Both Web UI active
 # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3528) Tests with 12345 as hard-coded port break jenkins

2015-08-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14712975#comment-14712975
 ] 

Hadoop QA commented on YARN-3528:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |   8m 27s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 9 new or modified test files. |
| {color:green}+1{color} | javac |   7m 50s | There were no new javac warning 
messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   1m 48s | There were no new checkstyle 
issues. |
| {color:red}-1{color} | whitespace |   0m  1s | The patch has 4  line(s) that 
end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 24s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   3m  3s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | common tests |  22m 48s | Tests passed in 
hadoop-common. |
| {color:red}-1{color} | yarn tests |   7m 27s | Tests failed in 
hadoop-yarn-server-nodemanager. |
| | |  53m 46s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | 
hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService
 |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12752441/YARN-3528-006.patch |
| Optional Tests | javac unit findbugs checkstyle |
| git revision | trunk / a4d9acc |
| whitespace | 
https://builds.apache.org/job/PreCommit-YARN-Build/8917/artifact/patchprocess/whitespace.txt
 |
| hadoop-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8917/artifact/patchprocess/testrun_hadoop-common.txt
 |
| hadoop-yarn-server-nodemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8917/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8917/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf908.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8917/console |


This message was automatically generated.

 Tests with 12345 as hard-coded port break jenkins
 -

 Key: YARN-3528
 URL: https://issues.apache.org/jira/browse/YARN-3528
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0
 Environment: ASF Jenkins
Reporter: Steve Loughran
Assignee: Brahma Reddy Battula
Priority: Blocker
  Labels: test
 Attachments: YARN-3528-002.patch, YARN-3528-003.patch, 
 YARN-3528-004.patch, YARN-3528-005.patch, YARN-3528-006.patch, YARN-3528.patch


 A lot of the YARN tests have hard-coded the port 12345 for their services to 
 come up on.
 This makes it impossible to have scheduled or precommit tests to run 
 consistently on the ASF jenkins hosts. Instead the tests fail regularly and 
 appear to get ignored completely.
 A quick grep of 12345 shows up many places in the test suite where this 
 practise has developed.
 * All {{BaseContainerManagerTest}} subclasses
 * {{TestNodeManagerShutdown}}
 * {{TestContainerManager}}
 + others
 This needs to be addressed through portscanning and dynamic port allocation. 
 Please can someone do this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3528) Tests with 12345 as hard-coded port break jenkins

2015-08-26 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14712984#comment-14712984
 ] 

Brahma Reddy Battula commented on YARN-3528:


The test case failures are unrelated: {{TestResourceLocalizationService}} is 
failing while cleaning up directories.

{noformat}
Tests run: 13, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 6.205 sec  
FAILURE! - in 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService
testPublicResourceInitializesLocalDir(org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService)
  Time elapsed: 0.275 sec   ERROR!
java.lang.IllegalArgumentException: 
target/org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService/3/filecache/10
 does not exist
at org.apache.commons.io.FileUtils.cleanDirectory(FileUtils.java:1637)
at org.apache.commons.io.FileUtils.deleteDirectory(FileUtils.java:1535)
at org.apache.commons.io.FileUtils.forceDelete(FileUtils.java:2270)
{noformat}

 Tests with 12345 as hard-coded port break jenkins
 -

 Key: YARN-3528
 URL: https://issues.apache.org/jira/browse/YARN-3528
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0
 Environment: ASF Jenkins
Reporter: Steve Loughran
Assignee: Brahma Reddy Battula
Priority: Blocker
  Labels: test
 Attachments: YARN-3528-002.patch, YARN-3528-003.patch, 
 YARN-3528-004.patch, YARN-3528-005.patch, YARN-3528-006.patch, YARN-3528.patch


 A lot of the YARN tests have hard-coded the port 12345 for their services to 
 come up on.
 This makes it impossible to have scheduled or precommit tests to run 
 consistently on the ASF jenkins hosts. Instead the tests fail regularly and 
 appear to get ignored completely.
 A quick grep of 12345 shows up many places in the test suite where this 
 practise has developed.
 * All {{BaseContainerManagerTest}} subclasses
 * {{TestNodeManagerShutdown}}
 * {{TestContainerManager}}
 + others
 This needs to be addressed through portscanning and dynamic port allocation. 
 Please can someone do this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()

2015-08-26 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14713001#comment-14713001
 ] 

Bibin A Chundatt commented on YARN-3893:


The test failures are not related to this patch. I have looked into the failed 
test cases:

{{hadoop.yarn.server.resourcemanager.security.TestClientToAMTokens}} - fails due 
to a bind exception
{{hadoop.yarn.server.resourcemanager.scheduler.fair.TestAllocationFileLoaderService}}
 - verified locally; it passes
{{hadoop.yarn.server.resourcemanager.TestClientRMService}} - ran locally in 
Eclipse; it passes

 Both RM in active state when Admin#transitionToActive failure from refeshAll()
 --

 Key: YARN-3893
 URL: https://issues.apache.org/jira/browse/YARN-3893
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.7.1
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Critical
 Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 
 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, 
 0006-YARN-3893.patch, 0007-YARN-3893.patch, 0008-YARN-3893.patch, 
 yarn-site.xml


 Cases that can cause this.
 # Capacity scheduler xml is wrongly configured during switch
 # Refresh ACL failure due to configuration
 # Refresh User group failure due to configuration
 Continuously both RM will try to be active
 {code}
 dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm1
 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm2
 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 {code}
 # Both Web UI active
 # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3528) Tests with 12345 as hard-coded port break jenkins

2015-08-26 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14712931#comment-14712931
 ] 

Brahma Reddy Battula commented on YARN-3528:


[~rkanter] Sorry for the delay, and thanks for pinging. I have attached the 
patch; kindly review.

 Tests with 12345 as hard-coded port break jenkins
 -

 Key: YARN-3528
 URL: https://issues.apache.org/jira/browse/YARN-3528
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0
 Environment: ASF Jenkins
Reporter: Steve Loughran
Assignee: Brahma Reddy Battula
Priority: Blocker
  Labels: test
 Attachments: YARN-3528-002.patch, YARN-3528-003.patch, 
 YARN-3528-004.patch, YARN-3528-005.patch, YARN-3528-006.patch, YARN-3528.patch


 A lot of the YARN tests have hard-coded the port 12345 for their services to 
 come up on.
 This makes it impossible to have scheduled or precommit tests to run 
 consistently on the ASF jenkins hosts. Instead the tests fail regularly and 
 appear to get ignored completely.
 A quick grep of 12345 shows up many places in the test suite where this 
 practise has developed.
 * All {{BaseContainerManagerTest}} subclasses
 * {{TestNodeManagerShutdown}}
 * {{TestContainerManager}}
 + others
 This needs to be addressed through portscanning and dynamic port allocation. 
 Please can someone do this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()

2015-08-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14712974#comment-14712974
 ] 

Hadoop QA commented on YARN-3893:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  16m 32s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 46s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 48s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 51s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 29s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 29s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | yarn tests |  53m 19s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | |  92m 14s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | 
hadoop.yarn.server.resourcemanager.security.TestClientToAMTokens |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.fair.TestAllocationFileLoaderService
 |
|   | hadoop.yarn.server.resourcemanager.TestClientRMService |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12752434/0007-YARN-3893.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / a4d9acc |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8915/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8915/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8915/console |


This message was automatically generated.

 Both RM in active state when Admin#transitionToActive failure from refeshAll()
 --

 Key: YARN-3893
 URL: https://issues.apache.org/jira/browse/YARN-3893
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.7.1
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Critical
 Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 
 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, 
 0006-YARN-3893.patch, 0007-YARN-3893.patch, 0008-YARN-3893.patch, 
 yarn-site.xml


 Cases that can cause this.
 # Capacity scheduler xml is wrongly configured during switch
 # Refresh ACL failure due to configuration
 # Refresh User group failure due to configuration
 Continuously both RM will try to be active
 {code}
 dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm1
 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm2
 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 {code}
 # Both Web UI active
 # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()

2015-08-26 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14713002#comment-14713002
 ] 

Bibin A Chundatt commented on YARN-3893:


The above comments are for 
https://builds.apache.org/job/PreCommit-YARN-Build/8915/testReport/

 Both RM in active state when Admin#transitionToActive failure from refeshAll()
 --

 Key: YARN-3893
 URL: https://issues.apache.org/jira/browse/YARN-3893
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.7.1
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Critical
 Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 
 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, 
 0006-YARN-3893.patch, 0007-YARN-3893.patch, 0008-YARN-3893.patch, 
 yarn-site.xml


 Cases that can cause this.
 # Capacity scheduler xml is wrongly configured during switch
 # Refresh ACL failure due to configuration
 # Refresh User group failure due to configuration
 Continuously both RM will try to be active
 {code}
 dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm1
 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm2
 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 {code}
 # Both Web UI active
 # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3528) Tests with 12345 as hard-coded port break jenkins

2015-08-26 Thread Brahma Reddy Battula (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated YARN-3528:
---
Attachment: YARN-3528-006.patch

 Tests with 12345 as hard-coded port break jenkins
 -

 Key: YARN-3528
 URL: https://issues.apache.org/jira/browse/YARN-3528
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0
 Environment: ASF Jenkins
Reporter: Steve Loughran
Assignee: Brahma Reddy Battula
Priority: Blocker
  Labels: test
 Attachments: YARN-3528-002.patch, YARN-3528-003.patch, 
 YARN-3528-004.patch, YARN-3528-005.patch, YARN-3528-006.patch, YARN-3528.patch


 A lot of the YARN tests have hard-coded the port 12345 for their services to 
 come up on.
 This makes it impossible to have scheduled or precommit tests to run 
 consistently on the ASF jenkins hosts. Instead the tests fail regularly and 
 appear to get ignored completely.
 A quick grep of 12345 shows up many places in the test suite where this 
 practise has developed.
 * All {{BaseContainerManagerTest}} subclasses
 * {{TestNodeManagerShutdown}}
 * {{TestContainerManager}}
 + others
 This needs to be addressed through portscanning and dynamic port allocation. 
 Please can someone do this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()

2015-08-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14712999#comment-14712999
 ] 

Hadoop QA commented on YARN-3893:
-

\\
\\
| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  16m 43s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 55s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m  8s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 51s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 31s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 31s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |  53m 39s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| | |  93m 18s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12752437/0008-YARN-3893.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / a4d9acc |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8916/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8916/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf908.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8916/console |


This message was automatically generated.

 Both RM in active state when Admin#transitionToActive failure from refeshAll()
 --

 Key: YARN-3893
 URL: https://issues.apache.org/jira/browse/YARN-3893
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.7.1
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Critical
 Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 
 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, 
 0006-YARN-3893.patch, 0007-YARN-3893.patch, 0008-YARN-3893.patch, 
 yarn-site.xml


 Cases that can cause this.
 # Capacity scheduler xml is wrongly configured during switch
 # Refresh ACL failure due to configuration
 # Refresh User group failure due to configuration
 Continuously both RM will try to be active
 {code}
 dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm1
 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm2
 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 {code}
 # Both Web UI active
 # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4084) Yarn should allow to skip hadoop-yarn-server-tests project from build..

2015-08-26 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14713541#comment-14713541
 ] 

Allen Wittenauer commented on YARN-4084:


Use -PskipTests in addition to -Dmaven.test.skip=true

 Yarn should allow to skip hadoop-yarn-server-tests project from build..
 ---

 Key: YARN-4084
 URL: https://issues.apache.org/jira/browse/YARN-4084
 Project: Hadoop YARN
  Issue Type: Bug
  Components: build
Affects Versions: 2.7.1
Reporter: Ved Prakash Pandey
Priority: Minor
 Attachments: YARN-4084.patch


 For fast compilation one can try to skip the test code compilation by using 
 {{-Dmaven.test.skip=true}}. But yarn-project fails to compile when this 
 option is used. This is because, it depends on hadoop-yarn-server-tests 
 project. 
 Below is the exception :
 {noformat}
 [ERROR] Assembly: hadoop-yarn-dist is not configured correctly: Cannot find 
 attachment with classifier: tests in module project: 
 org.apache.hadoop:hadoop-yarn-server-tests:jar:2.7.0. Please exclude this 
 module from the module-set.
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (YARN-4084) Yarn should allow to skip hadoop-yarn-server-tests project from build..

2015-08-26 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14713541#comment-14713541
 ] 

Allen Wittenauer edited comment on YARN-4084 at 8/26/15 2:36 PM:
-

Use -DskipTests in addition to -Dmaven.test.skip=true


was (Author: aw):
Use -PskipTests in addition to -Dmaven.test.skip=true

 Yarn should allow to skip hadoop-yarn-server-tests project from build..
 ---

 Key: YARN-4084
 URL: https://issues.apache.org/jira/browse/YARN-4084
 Project: Hadoop YARN
  Issue Type: Bug
  Components: build
Affects Versions: 2.7.1
Reporter: Ved Prakash Pandey
Priority: Minor
 Attachments: YARN-4084.patch


 For fast compilation one can try to skip the test code compilation by using 
 {{-Dmaven.test.skip=true}}. But yarn-project fails to compile when this 
 option is used. This is because, it depends on hadoop-yarn-server-tests 
 project. 
 Below is the exception :
 {noformat}
 [ERROR] Assembly: hadoop-yarn-dist is not configured correctly: Cannot find 
 attachment with classifier: tests in module project: 
 org.apache.hadoop:hadoop-yarn-server-tests:jar:2.7.0. Please exclude this 
 module from the module-set.
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3798) ZKRMStateStore shouldn't create new session without occurrance of SESSIONEXPIED

2015-08-26 Thread Tsuyoshi Ozawa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14712700#comment-14712700
 ] 

Tsuyoshi Ozawa commented on YARN-3798:
--

[~vinodkv] [~zxu] could you check the latest patches?

 ZKRMStateStore shouldn't create new session without occurrance of 
 SESSIONEXPIED
 ---

 Key: YARN-3798
 URL: https://issues.apache.org/jira/browse/YARN-3798
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.7.0
 Environment: Suse 11 Sp3
Reporter: Bibin A Chundatt
Assignee: Varun Saxena
Priority: Blocker
  Labels: 2.6.1-candidate
 Attachments: RM.log, YARN-3798-2.7.002.patch, 
 YARN-3798-branch-2.6.01.patch, YARN-3798-branch-2.7.002.patch, 
 YARN-3798-branch-2.7.003.patch, YARN-3798-branch-2.7.004.patch, 
 YARN-3798-branch-2.7.005.patch, YARN-3798-branch-2.7.006.patch, 
 YARN-3798-branch-2.7.patch


 RM going down with NoNode exception during create of znode for appattempt
 *Please find the exception logs*
 {code}
 2015-06-09 10:09:44,732 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
 ZKRMStateStore Session connected
 2015-06-09 10:09:44,732 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
 ZKRMStateStore Session restored
 2015-06-09 10:09:44,886 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
 Exception while executing a ZK operation.
 org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode
   at org.apache.zookeeper.KeeperException.create(KeeperException.java:115)
   at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405)
   at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1310)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:926)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:923)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1101)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1122)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:923)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:937)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:970)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:671)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:275)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:260)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:837)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:900)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:895)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:175)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108)
   at java.lang.Thread.run(Thread.java:745)
 2015-06-09 10:09:44,887 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Maxed 
 out ZK retries. Giving up!
 2015-06-09 10:09:44,887 ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error 
 updating appAttempt: appattempt_1433764310492_7152_01
 org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode
   at org.apache.zookeeper.KeeperException.create(KeeperException.java:115)
   at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405)
   at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1310)
   at 
 

[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()

2015-08-26 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14712767#comment-14712767
 ] 

Varun Saxena commented on YARN-3893:


Yes, I agree. We can exit the JVM directly; there is no need to use fail-fast.

 Both RM in active state when Admin#transitionToActive failure from refeshAll()
 --

 Key: YARN-3893
 URL: https://issues.apache.org/jira/browse/YARN-3893
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.7.1
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Critical
 Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 
 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, 
 yarn-site.xml


 Cases that can cause this.
 # Capacity scheduler xml is wrongly configured during switch
 # Refresh ACL failure due to configuration
 # Refresh User group failure due to configuration
 Continuously both RM will try to be active
 {code}
 dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm1
 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm2
 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 {code}
 # Both Web UI active
 # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4029) Update LogAggregationStatus to store on finish

2015-08-26 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14713076#comment-14713076
 ] 

Bibin A Chundatt commented on YARN-4029:


Hi [~xgong]

Could you please review the attached patch?
Also, can we add this JIRA as a subtask of YARN-431?

 Update LogAggregationStatus to store on finish
 --

 Key: YARN-4029
 URL: https://issues.apache.org/jira/browse/YARN-4029
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
 Attachments: 0001-YARN-4029.patch, Image.jpg


 Currently the log aggregation status is not getting updated in the store. When 
 the RM is restarted, it will show NOT_START.
 Steps to reproduce:
 1. Submit a MapReduce application
 2. Wait for completion
 3. Once the application is completed, switch the RM
 The *Log Aggregation Status* changes from SUCCESS to NOT_START



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs

2015-08-26 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14715765#comment-14715765
 ] 

Sangjin Lee commented on YARN-4074:
---

I am about 90% done with the POC patch for this. I'm shooting for some time 
tomorrow to be able to post the patch.

In the meantime, in order to enable [~varun_saxena] and others to make 
progress, the following is the proposal that I'm implementing. Please *do* let 
me know if you have any questions or issues with the proposal so we can adjust 
accordingly.

(REST API)
In order to support the POC UI, we will implement 2 new queries:
# given the cluster, return the N most recent flows from the flow activity table
# given the cluster, user, flow id, and flow run id, return the flow run (with 
metrics) from the flow run table

At the REST level, they can be represented as follows for example:
# /listFlows/clusterId?limit=100
# /flow/clusterId/userId/flowName/flowRun
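
As a purely illustrative, self-contained sketch of how a client could assemble 
these two queries (the host, base path, and parameter values below are 
assumptions, not the final API):

{code}
// Illustration only: build the two proposed REST query URLs as plain strings.
public class FlowQueryUrls {
  // Assumed reader endpoint; the real base path may differ.
  private static final String BASE = "http://timeline-reader.example.com:8188/ws/v2/timeline";

  // Query 1: the N most recent flows for a cluster (flow activity table).
  static String listFlows(String clusterId, int limit) {
    return String.format("%s/listFlows/%s?limit=%d", BASE, clusterId, limit);
  }

  // Query 2: a single flow run, with metrics (flow run table).
  static String flowRun(String clusterId, String userId, String flowName, long flowRunId) {
    return String.format("%s/flow/%s/%s/%s/%d", BASE, clusterId, userId, flowName, flowRunId);
  }

  public static void main(String[] args) {
    System.out.println(listFlows("test-cluster", 100));
    System.out.println(flowRun("test-cluster", "testuser", "test-flow", 1440633600000L));
  }
}
{code}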

(UI)
With these URLs, the UI can invoke the first URL to render the landing page 
with the table. The REST output contains the flow activity records along with 
all the flow runs that were active during the day.

If the user drills down into a single flow, the client side can issue the second 
query against each flow run of that flow to fetch the metrics at the flow-run 
level.

If the user further drills down into a single flow run, it can use an existing 
query to retrieve all applications for that flow run and get the application 
entities.

(reader interface)
Currently I am *not* planning to add new flow-specific methods to the 
{{TimelineReader}} interface. Instead, you can use the existing 
{{getEntities()}} and {{getEntity()}} methods to perform the above new queries:
# {{getEntities()}} with cluster specified and entity type = YARN_FLOW_ACTIVITY 
(a new timeline entity type)
# {{getEntity()}} with cluster, user, flow id, flow run id specified and entity 
type = YARN_FLOW


 [timeline reader] implement support for querying for flows and flow runs
 

 Key: YARN-4074
 URL: https://issues.apache.org/jira/browse/YARN-4074
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: YARN-2928
Reporter: Sangjin Lee
Assignee: Sangjin Lee

 Implement support for querying for flows and flow runs.
 We should be able to query for the most recent N flows, etc.
 This includes changes to the {{TimelineReader}} API if necessary, as well as 
 implementation of the API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4087) Set YARN_FAIL_FAST to be false by default

2015-08-26 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-4087:
--
Summary: Set YARN_FAIL_FAST to be false by default  (was: Set RM_FAIL_FAST 
to be false by default)

 Set YARN_FAIL_FAST to be false by default
 -

 Key: YARN-4087
 URL: https://issues.apache.org/jira/browse/YARN-4087
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He
Assignee: Jian He

 Increasingly, I feel setting this property to be false makes more sense 
 especially in production environment, 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3816) [Aggregation] App-level Aggregation for YARN system metrics

2015-08-26 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14715809#comment-14715809
 ] 

Li Lu commented on YARN-3816:
-

Hi [~djp], I briefly looked at the patch and have one quick question: in the 
application table, we no longer store the type of the incoming entities, IIUC. 
All entities read from the application table get their type assigned in 
HBaseReader, as in:
{code}
String entityType = isApplication ?
  TimelineEntityType.YARN_APPLICATION.toString() :
  EntityColumn.TYPE.readResult(result).toString();
{code} 
In this case, maybe we are losing the YARN_APPLICATION_AGGREGATION type and can 
no longer differentiate such entities? Or is there another way to recognize 
whether an entity comes from the application itself or from aggregation? (Am I 
missing anything?)

 [Aggregation] App-level Aggregation for YARN system metrics
 ---

 Key: YARN-3816
 URL: https://issues.apache.org/jira/browse/YARN-3816
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Junping Du
Assignee: Junping Du
 Attachments: Application Level Aggregation of Timeline Data.pdf, 
 YARN-3816-YARN-2928-v1.patch, YARN-3816-poc-v1.patch, YARN-3816-poc-v2.patch


 We need application level aggregation of Timeline data:
 - To present end users with aggregated state for each application, including 
 resource (CPU, memory) consumption across all containers, the number of 
 containers launched/completed/failed, etc. We need this for apps both while 
 they are running and when they are done.
 - Also, framework-specific metrics, e.g. HDFS_BYTES_READ, should be aggregated 
 to show framework-level state in detail.
 - Aggregation at other levels (Flow/User/Queue) can be more efficient when based 
 on application-level aggregations rather than raw entity-level data, since far 
 fewer rows need to be scanned (after filtering out non-aggregated entities such 
 as events, configurations, etc.).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4087) Set YARN_FAIL_FAST to be false by default

2015-08-26 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14715829#comment-14715829
 ] 

Karthik Kambatla commented on YARN-4087:


+1, if fail-fast hasn't been in any prior release and we are not drastically 
altering the behavior.

In any case, it would be nice to add a release note for this new behavior in 2.8.0. 

 Set YARN_FAIL_FAST to be false by default
 -

 Key: YARN-4087
 URL: https://issues.apache.org/jira/browse/YARN-4087
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He
Assignee: Jian He
 Attachments: YARN-4087.1.patch


 Increasingly, I feel setting this property to be false makes more sense 
 especially in production environment, 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3528) Tests with 12345 as hard-coded port break jenkins

2015-08-26 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14715754#comment-14715754
 ] 

Varun Saxena commented on YARN-3528:


# In {{TestNodeStatusUpdater#createNMConfig}}, a change has been missed; I still 
see the hardcoded port (a small sketch of the dynamic-port approach follows 
after this list):
{code}
 conf.set(YarnConfiguration.NM_LOCALIZER_ADDRESS,
     localhostAddress + ":12346");
{code}
# In {{TestContainer}}, the port is only used for creating the container token, 
so there is no need to call {{ServerSocketUtil#getPort}}.
# Nit: in {{TestNodeManagerShutdown#startContainer}}, the commented-out line 
below can be removed.
{code}
 //final int port = ServerSocketUtil.getPort(49156, 10);
{code}
# As you will be changing other things, maybe change the following as well. In 
{{TestNodeManagerShutdown}} I do not see any need for the added try-catch block; 
we have only replaced 12345 with the passed-in port.
{code}
-InetSocketAddress containerManagerBindAddress =
-    NetUtils.createSocketAddrForHost("127.0.0.1", 12345);
+InetSocketAddress containerManagerBindAddress = null;
+try {
+  containerManagerBindAddress =
+      NetUtils.createSocketAddrForHost("127.0.0.1", port);
+} catch (Exception e) {
+  throw new RuntimeException("Fail To Get the Port");
+}
{code}

Other things look fine.
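
For context, a minimal, self-contained sketch of the dynamic-port idea above, 
assuming {{ServerSocketUtil}} from the hadoop-common test jar is on the test 
classpath (illustration only, not part of the patch):

{code}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.net.ServerSocketUtil;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class DynamicPortSketch {
  public static void main(String[] args) throws IOException {
    // Try the preferred port first; if it is busy, retry with other ports
    // (up to 10 attempts) and return the first free one.
    int port = ServerSocketUtil.getPort(49156, 10);

    // Use the dynamically chosen port instead of a hardcoded 12345/12346.
    Configuration conf = new Configuration(false);
    conf.set(YarnConfiguration.NM_LOCALIZER_ADDRESS, "localhost:" + port);
    System.out.println(conf.get(YarnConfiguration.NM_LOCALIZER_ADDRESS));
  }
}
{code}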

 Tests with 12345 as hard-coded port break jenkins
 -

 Key: YARN-3528
 URL: https://issues.apache.org/jira/browse/YARN-3528
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0
 Environment: ASF Jenkins
Reporter: Steve Loughran
Assignee: Brahma Reddy Battula
Priority: Blocker
  Labels: test
 Attachments: YARN-3528-002.patch, YARN-3528-003.patch, 
 YARN-3528-004.patch, YARN-3528-005.patch, YARN-3528-006.patch, YARN-3528.patch


 A lot of the YARN tests have hard-coded the port 12345 for their services to 
 come up on.
 This makes it impossible to have scheduled or precommit tests to run 
 consistently on the ASF jenkins hosts. Instead the tests fail regularly and 
 appear to get ignored completely.
 A quick grep of 12345 shows up many places in the test suite where this 
 practise has developed.
 * All {{BaseContainerManagerTest}} subclasses
 * {{TestNodeManagerShutdown}}
 * {{TestContainerManager}}
 + others
 This needs to be addressed through portscanning and dynamic port allocation. 
 Please can someone do this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4087) Set RM_FAIL_FAST to be false by default

2015-08-26 Thread Jian He (JIRA)
Jian He created YARN-4087:
-

 Summary: Set RM_FAIL_FAST to be false by default
 Key: YARN-4087
 URL: https://issues.apache.org/jira/browse/YARN-4087
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He
Assignee: Jian He


Increasingly, I feel setting this property to be false makes more sense 
especially in production environment, 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4087) Set YARN_FAIL_FAST to be false by default

2015-08-26 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-4087:
--
Attachment: YARN-4087.1.patch

A simple patch that flips the config default.


 Set YARN_FAIL_FAST to be false by default
 -

 Key: YARN-4087
 URL: https://issues.apache.org/jira/browse/YARN-4087
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He
Assignee: Jian He
 Attachments: YARN-4087.1.patch


 Increasingly, I feel setting this property to be false makes more sense 
 especially in production environment, 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4087) Set YARN_FAIL_FAST to be false by default

2015-08-26 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14715872#comment-14715872
 ] 

Bibin A Chundatt commented on YARN-4087:


So by default in yarn-default.xml:

yarn.resourcemanager.fail-fast=true
yarn.fail-fast=false

In YarnConfiguration:

{code}
  public static boolean shouldRMFailFast(Configuration conf) {
    return conf.getBoolean(YarnConfiguration.RM_FAIL_FAST,
        conf.getBoolean(YarnConfiguration.YARN_FAIL_FAST,
            YarnConfiguration.DEFAULT_YARN_FAIL_FAST));
  }
{code}

Isn't there some mismatch?

No plans to change YarnConfiguration.RM_FAIL_FAST.
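
For illustration, a small self-contained sketch (with assumed property values, 
not the shipped defaults) of how the lookup order in {{shouldRMFailFast}} 
resolves when the two keys disagree:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

// Illustration only: yarn.resourcemanager.fail-fast, when set, wins over yarn.fail-fast.
public class FailFastResolution {
  public static void main(String[] args) {
    // Start from an empty Configuration (no *-default.xml) so the fallback order is visible.
    Configuration conf = new Configuration(false);

    // Case 1: only the YARN-wide key is set; the RM falls back to it.
    conf.setBoolean(YarnConfiguration.YARN_FAIL_FAST, false);
    System.out.println(YarnConfiguration.shouldRMFailFast(conf)); // false

    // Case 2: the RM-specific key is also set; it takes precedence.
    conf.setBoolean(YarnConfiguration.RM_FAIL_FAST, true);
    System.out.println(YarnConfiguration.shouldRMFailFast(conf)); // true
  }
}
{code}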



 Set YARN_FAIL_FAST to be false by default
 -

 Key: YARN-4087
 URL: https://issues.apache.org/jira/browse/YARN-4087
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He
Assignee: Jian He
 Attachments: YARN-4087.1.patch


 Increasingly, I feel setting this property to be false makes more sense 
 especially in production environment, 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4048) Linux kernel panic under strict CPU limits

2015-08-26 Thread Craig Condit (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14715561#comment-14715561
 ] 

Craig Condit commented on YARN-4048:


Just my two cents: using cgroups on CentOS/RHEL 6.x is asking for trouble. We 
have experienced similar crashes with anything that uses cgroups, not just YARN 
(for example, Docker).

Cgroups are widely regarded as unstable in Linux kernels older than roughly 3.10.


 Linux kernel panic under strict CPU limits
 --

 Key: YARN-4048
 URL: https://issues.apache.org/jira/browse/YARN-4048
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.7.1
Reporter: Chengbing Liu
Priority: Critical
 Attachments: panic.png


 With YARN-2440 and YARN-2531, we have seen some kernel panics happening under 
 heavy pressure. Even with YARN-2809, it still panics.
 We are using CentOS 6.5, hadoop 2.5.0-cdh5.2.0 with the above patches. I 
 guess the latest version also has the same issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3920) FairScheduler Reserving a node for a container should be configurable to allow it used only for large containers

2015-08-26 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated YARN-3920:

Attachment: YARN-3920.004.patch

Attaching a patch based on the multiple-of-increment approach.

 FairScheduler Reserving a node for a container should be configurable to 
 allow it used only for large containers
 

 Key: YARN-3920
 URL: https://issues.apache.org/jira/browse/YARN-3920
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Attachments: YARN-3920.004.patch, yARN-3920.001.patch, 
 yARN-3920.002.patch, yARN-3920.003.patch


 Reserving a node for a container was designed for preventing large containers 
 from starvation from small requests that keep getting into a node. Today we 
 let this be used even for a small container request. This has a huge impact 
 on scheduling since we block other scheduling requests until that reservation 
 is fulfilled. We should make this configurable so its impact can be minimized 
 by limiting it for large container requests as originally intended. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs

2015-08-26 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated YARN-4074:
--
Attachment: YARN-4074-YARN-2928.POC.001.patch

Posting a v.1 POC patch. This implements the first query (the flow activity 
query). I'll follow it up with another one tomorrow that implements the second 
one too.

This is to get the design choices and correctness reviewed first. It does the 
following:
- includes the flow activity query as part of getEntities()
- creates a data container for the flow activity table, called FlowActivityEntity

It probably needs a fair amount of refactoring to make the reader code more 
manageable. Also, I need to add unit tests. They will come later.

 [timeline reader] implement support for querying for flows and flow runs
 

 Key: YARN-4074
 URL: https://issues.apache.org/jira/browse/YARN-4074
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: YARN-2928
Reporter: Sangjin Lee
Assignee: Sangjin Lee
 Attachments: YARN-4074-YARN-2928.POC.001.patch


 Implement support for querying for flows and flow runs.
 We should be able to query for the most recent N flows, etc.
 This includes changes to the {{TimelineReader}} API if necessary, as well as 
 implementation of the API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3920) FairScheduler Reserving a node for a container should be configurable to allow it used only for large containers

2015-08-26 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated YARN-3920:

Attachment: YARN-3920.004.patch

Updated configuration in FairSchedulerConfiguration

 FairScheduler Reserving a node for a container should be configurable to 
 allow it used only for large containers
 

 Key: YARN-3920
 URL: https://issues.apache.org/jira/browse/YARN-3920
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Attachments: YARN-3920.004.patch, YARN-3920.004.patch, 
 yARN-3920.001.patch, yARN-3920.002.patch, yARN-3920.003.patch


 Reserving a node for a container was designed to prevent large containers 
 from being starved by small requests that keep landing on a node. Today we 
 allow this even for small container requests, which has a huge impact on 
 scheduling since we block other scheduling requests until that reservation 
 is fulfilled. We should make this configurable so its impact can be minimized 
 by limiting it to large container requests, as originally intended. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3920) FairScheduler Reserving a node for a container should be configurable to allow it used only for large containers

2015-08-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14716119#comment-14716119
 ] 

Hadoop QA commented on YARN-3920:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  16m 51s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 2 new or modified test files. |
| {color:red}-1{color} | javac |   3m 39s | The patch appears to cause the 
build to fail. |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12752657/YARN-3920.004.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 4cbbfa2 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8923/console |


This message was automatically generated.

 FairScheduler Reserving a node for a container should be configurable to 
 allow it used only for large containers
 

 Key: YARN-3920
 URL: https://issues.apache.org/jira/browse/YARN-3920
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Attachments: YARN-3920.004.patch, YARN-3920.004.patch, 
 yARN-3920.001.patch, yARN-3920.002.patch, yARN-3920.003.patch


 Reserving a node for a container was designed to prevent large containers 
 from being starved by small requests that keep landing on a node. Today we 
 allow this even for small container requests, which has a huge impact on 
 scheduling since we block other scheduling requests until that reservation 
 is fulfilled. We should make this configurable so its impact can be minimized 
 by limiting it to large container requests, as originally intended. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2745) Extend YARN to support multi-resource packing of tasks

2015-08-26 Thread Srikanth Kandula (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14716019#comment-14716019
 ] 

Srikanth Kandula commented on YARN-2745:


Just a brief update on this JIRA... 

1) [~chris.douglas] pushed through collection of network and disk usage to 
Hadoop Common. See HADOOP-12210. 

2) In YARN-3534 and YARN-3980, [~elgoiri] and [~kasha] collect CPU and memory 
info of containers, push that information from the NM to the RM, and make it 
available to the scheduler.

3) Packing requires the scheduler to look past the first schedulable task 
discovered by the capacity scheduler loop. Based on the feedback above, we have 
decoupled the needed architectural change from the actual packing policy. See 
YARN-4056, called bundling; many different packing policies are allowed in the 
bundle (a rough sketch follows below).

4) These changes are complementary and orthogonal to YARN-1011. That JIRA 
recommends, rightly, adapting RM allocation based on dynamic resource usage of 
the allocated containers. This JIRA is more about packing containers; it 
currently does so based on expected resource usage as indicated in the ask. 
Indeed, packing based on dynamic usage information would be strictly better and 
is left for future work.
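
To make point 3 above concrete, here is a toy sketch (Java 17, not the YARN-4056 
code) of bundling: collect several schedulable candidate nodes and let a 
pluggable packing policy, here a simple alignment (dot-product) score, pick 
among them. All names are invented for illustration.

{code:java}
import java.util.Comparator;
import java.util.List;

// Toy illustration of "bundling": gather several schedulable candidate nodes
// and let a pluggable packing policy choose among them. Not the YARN-4056 code.
public class BundlingSketch {

  record NodeFree(String host, double cpuFree, double memFree) {}
  record Ask(double cpu, double mem) {}

  /** A packing policy maps (ask, candidate node) to a score; higher is better. */
  interface PackingPolicy {
    double score(Ask ask, NodeFree node);
  }

  /** Example policy: dot product of the ask with the node's free resources. */
  static final PackingPolicy ALIGNMENT =
      (ask, n) -> ask.cpu() * n.cpuFree() + ask.mem() * n.memFree();

  static NodeFree pickBest(Ask ask, List<NodeFree> bundle, PackingPolicy policy) {
    return bundle.stream()
        .filter(n -> n.cpuFree() >= ask.cpu() && n.memFree() >= ask.mem())  // schedulable only
        .max(Comparator.comparingDouble(n -> policy.score(ask, n)))
        .orElse(null);
  }

  public static void main(String[] args) {
    List<NodeFree> bundle = List.of(
        new NodeFree("n1", 2, 8), new NodeFree("n2", 8, 2), new NodeFree("n3", 4, 4));
    System.out.println(pickBest(new Ask(1, 4), bundle, ALIGNMENT));  // memory-heavy ask prefers n1
  }
}
{code}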

 Extend YARN to support multi-resource packing of tasks
 --

 Key: YARN-2745
 URL: https://issues.apache.org/jira/browse/YARN-2745
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager, resourcemanager, scheduler
Reporter: Robert Grandl
Assignee: Robert Grandl
 Attachments: sigcomm_14_tetris_talk.pptx, tetris_design_doc.docx, 
 tetris_paper.pdf


 In this umbrella JIRA we propose an extension to existing scheduling 
 techniques, which accounts for all resources used by a task (CPU, memory, 
 disk, network) and it is able to achieve three competing objectives: 
 fairness, improve cluster utilization and reduces average job completion time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2745) Extend YARN to support multi-resource packing of tasks

2015-08-26 Thread Srikanth Kandula (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14716023#comment-14716023
 ] 

Srikanth Kandula commented on YARN-2745:


[~aw] Done by [~chris.douglas]!

 Extend YARN to support multi-resource packing of tasks
 --

 Key: YARN-2745
 URL: https://issues.apache.org/jira/browse/YARN-2745
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager, resourcemanager, scheduler
Reporter: Robert Grandl
Assignee: Robert Grandl
 Attachments: sigcomm_14_tetris_talk.pptx, tetris_design_doc.docx, 
 tetris_paper.pdf


 In this umbrella JIRA we propose an extension to existing scheduling 
 techniques, which accounts for all resources used by a task (CPU, memory, 
 disk, network) and it is able to achieve three competing objectives: 
 fairness, improve cluster utilization and reduces average job completion time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2745) Extend YARN to support multi-resource packing of tasks

2015-08-26 Thread Srikanth Kandula (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14716021#comment-14716021
 ] 

Srikanth Kandula commented on YARN-2745:


[~vinodkv] Thanks for the related JIRA. The efforts are complementary. Indeed, 
adapting assignment based on dynamic usage would be a good thing to have. 
This JIRA is more about packing based on anticipated usages as indicated by the 
ask. Dynamic packing would be even better.


 Extend YARN to support multi-resource packing of tasks
 --

 Key: YARN-2745
 URL: https://issues.apache.org/jira/browse/YARN-2745
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager, resourcemanager, scheduler
Reporter: Robert Grandl
Assignee: Robert Grandl
 Attachments: sigcomm_14_tetris_talk.pptx, tetris_design_doc.docx, 
 tetris_paper.pdf


 In this umbrella JIRA we propose an extension to existing scheduling 
 techniques, which accounts for all resources used by a task (CPU, memory, 
 disk, network) and it is able to achieve three competing objectives: 
 fairness, improve cluster utilization and reduces average job completion time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1012) Report NM aggregated container resource utilization in heartbeat

2015-08-26 Thread Srikanth Kandula (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14716029#comment-14716029
 ] 

Srikanth Kandula commented on YARN-1012:


[~elgoiri], [~kasha] Could you comment on whether this should go into Hadoop 
Common? Also, it may be worthwhile to extend this to account for network 
and disk usage of the containers... See HADOOP-12210.

 Report NM aggregated container resource utilization in heartbeat
 

 Key: YARN-1012
 URL: https://issues.apache.org/jira/browse/YARN-1012
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.7.0
Reporter: Arun C Murthy
Assignee: Inigo Goiri
 Fix For: 2.8.0

 Attachments: YARN-1012-1.patch, YARN-1012-10.patch, 
 YARN-1012-11.patch, YARN-1012-2.patch, YARN-1012-3.patch, YARN-1012-4.patch, 
 YARN-1012-5.patch, YARN-1012-6.patch, YARN-1012-7.patch, YARN-1012-8.patch, 
 YARN-1012-9.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4087) Set YARN_FAIL_FAST to be false by default

2015-08-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14715892#comment-14715892
 ] 

Hadoop QA commented on YARN-4087:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  18m 27s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:green}+1{color} | javac |   8m  2s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m  9s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   1m 58s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 31s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   3m 10s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   0m 23s | Tests passed in 
hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests |   1m 59s | Tests passed in 
hadoop-yarn-common. |
| | |  46m 40s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12752615/YARN-4087.1.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / f44b599 |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8922/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8922/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8922/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8922/console |


This message was automatically generated.

 Set YARN_FAIL_FAST to be false by default
 -

 Key: YARN-4087
 URL: https://issues.apache.org/jira/browse/YARN-4087
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He
Assignee: Jian He
 Attachments: YARN-4087.1.patch


 Increasingly, I feel setting this property to false makes more sense, 
 especially in production environments. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1011) [Umbrella] RM should dynamically schedule containers based on utilization of currently allocated containers

2015-08-26 Thread Srikanth Kandula (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14716034#comment-14716034
 ] 

Srikanth Kandula commented on YARN-1011:


This is a great idea. Is there an ETA for this? Could you comment on whether it 
is being deprioritized for some reason?

 [Umbrella] RM should dynamically schedule containers based on utilization of 
 currently allocated containers
 ---

 Key: YARN-1011
 URL: https://issues.apache.org/jira/browse/YARN-1011
 Project: Hadoop YARN
  Issue Type: New Feature
Reporter: Arun C Murthy

 Currently the RM allocates containers and assumes the allocated resources are 
 utilized.
 The RM can, and should, get to a point where it measures the utilization of 
 allocated containers and, if appropriate, allocates more (speculative?) containers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4088) RM should be able to process heartbeats from NM asynchronously

2015-08-26 Thread Srikanth Kandula (JIRA)
Srikanth Kandula created YARN-4088:
--

 Summary: RM should be able to process heartbeats from NM 
asynchronously
 Key: YARN-4088
 URL: https://issues.apache.org/jira/browse/YARN-4088
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager, scheduler
Reporter: Srikanth Kandula


Today, the RM sequentially processes one heartbeat after another. 

Imagine a 3,000-server cluster with each server heartbeating every 3s. This 
gives the RM 1ms on average to process each NM heartbeat. That is tough.

It is true that several underlying data structures will be touched during 
heartbeat processing, so it is non-trivial to parallelize the NM heartbeat. 
Yet, it is quite doable...

Parallelizing the NM heartbeat would substantially improve the scalability of 
the RM, allowing it to either 
a) run larger clusters or 
b) support faster heartbeats or dynamic scaling of heartbeats
c) take more asks from each application, or 
d) use cleverer / more expensive algorithms such as node labels or better 
packing, or ...

Indeed, the RM's scalability limit has been cited as the motivating reason for a 
variety of efforts that would become less necessary if this were solved. Ditto 
for slow heartbeats. See the Sparrow and Mercury papers, for example.

Can we take a shot at this?
If not, could we discuss why.
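
One common way to parallelize heartbeat handling while keeping per-node ordering 
(heartbeats from the same NM must not be reordered) is to shard nodes across a 
pool of single-threaded executors. The sketch below is purely illustrative and 
ignores the locking the real RM would still need around shared scheduler state; 
all names are invented.

{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Illustrative sketch only: shard heartbeat processing across several
// single-threaded executors, keyed by node, so heartbeats from the same NM
// stay in order while different NMs are processed in parallel.
public class AsyncHeartbeatSketch {
  private final ExecutorService[] shards;

  public AsyncHeartbeatSketch(int parallelism) {
    shards = new ExecutorService[parallelism];
    for (int i = 0; i < parallelism; i++) {
      shards[i] = Executors.newSingleThreadExecutor();
    }
  }

  public void onHeartbeat(String nodeId, Runnable processHeartbeat) {
    int shard = Math.floorMod(nodeId.hashCode(), shards.length);
    shards[shard].execute(processHeartbeat);  // same node -> same shard -> in-order
  }

  public void shutdown() {
    for (ExecutorService s : shards) {
      s.shutdown();
    }
  }

  public static void main(String[] args) {
    AsyncHeartbeatSketch sketch = new AsyncHeartbeatSketch(4);
    for (int i = 0; i < 8; i++) {
      String node = "nm-" + (i % 3);
      sketch.onHeartbeat(node, () -> System.out.println("processed heartbeat from " + node));
    }
    sketch.shutdown();
  }
}
{code}

As a rough usage note: with 3,000 nodes at a 3s interval (about 1,000 heartbeats 
per second in total), even 4 such shards drop the per-shard arrival rate to 
roughly one heartbeat every 4ms, which is the kind of headroom this proposal is 
after.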




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4081) Add support for multiple resource types in the Resource class

2015-08-26 Thread Srikanth Kandula (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14715991#comment-14715991
 ] 

Srikanth Kandula commented on YARN-4081:


Extending to multiple resources is great, but why use a Map? Is there a rough 
idea of how many different resources one may want to encode? It seems overkill 
to incur so much additional overhead if, say, all that is needed is a handful 
of additional resources. Ditto for encapsulating strings in URIs and the 
ResourceInformation wrapper over doubles. It would perhaps have been okay if 
this data structure were less often used, but if I understand correctly, 
Resource objects are created/destroyed at least once per ask/assignment and 
often many more times...
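
To illustrate the overhead concern being raised (this is not a statement about 
the actual patch), compare a Map-backed resource vector with a flat primitive 
array indexed by a fixed ordering of resource names; the array form avoids 
hashing and boxing on hot paths where Resource objects are created per 
ask/assignment. Names and values below are invented.

{code:java}
import java.util.HashMap;
import java.util.Map;

// Illustration of the trade-off under discussion, not the proposed API:
// a Map<String, Double> resource vector vs. a primitive array indexed by a
// fixed ordering of resource names.
public class ResourceVectorSketch {

  // Map-backed: flexible, but each read/write hashes a key and boxes a double.
  static Map<String, Double> mapBacked() {
    Map<String, Double> r = new HashMap<>();
    r.put("memory-mb", 4096.0);
    r.put("vcores", 2.0);
    return r;
  }

  // Array-backed: resource names are resolved to indices once, up front.
  static final String[] RESOURCE_NAMES = {"memory-mb", "vcores"};

  static double[] arrayBacked() {
    return new double[] {4096.0, 2.0};  // index 0 = memory-mb, index 1 = vcores
  }

  public static void main(String[] args) {
    System.out.println(mapBacked().get("vcores"));                    // hash lookup + unboxing
    System.out.println(RESOURCE_NAMES[1] + " = " + arrayBacked()[1]); // direct indexed read
  }
}
{code}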

 Add support for multiple resource types in the Resource class
 -

 Key: YARN-4081
 URL: https://issues.apache.org/jira/browse/YARN-4081
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: YARN-4081-YARN-3926.001.patch


 For adding support for multiple resource types, we need to add support for 
 this in the Resource class.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3250) Support admin cli interface in for Application Priority

2015-08-26 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14715998#comment-14715998
 ] 

Rohith Sharma K S commented on YARN-3250:
-

Thanks, Sunil G, for reviewing the patch. The test case failures are unrelated to 
this patch!

 Support admin cli interface in for Application Priority
 ---

 Key: YARN-3250
 URL: https://issues.apache.org/jira/browse/YARN-3250
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Sunil G
Assignee: Rohith Sharma K S
 Attachments: 0001-YARN-3250-V1.patch, 0002-YARN-3250.patch, 
 0003-YARN-3250.patch


 The current Application Priority Manager supports configuration only via file. 
 To support runtime configuration through the admin CLI and REST, a common 
 management interface has to be added which can be shared with NodeLabelsManager. 
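
A hypothetical shape for such a shared management interface, with all names 
invented for illustration and no claim about the actual patch, might be:

{code:java}
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a "common management interface" that both an
// application-priority manager and a node-labels manager could implement, so
// that admin CLI / REST updates share one code path. All names are invented.
public class RuntimeConfigSketch {

  interface RuntimeConfigurable {
    /** Apply a runtime configuration update, e.g. from the admin CLI or REST. */
    void updateConfiguration(Map<String, String> newProperties);

    /** Return the currently effective configuration for inspection. */
    Map<String, String> getEffectiveConfiguration();
  }

  /** Toy in-memory implementation standing in for a priority/label manager. */
  static class InMemoryManager implements RuntimeConfigurable {
    private final Map<String, String> effective = new HashMap<>();

    @Override
    public void updateConfiguration(Map<String, String> newProperties) {
      effective.putAll(newProperties);
    }

    @Override
    public Map<String, String> getEffectiveConfiguration() {
      return new HashMap<>(effective);
    }
  }

  public static void main(String[] args) {
    RuntimeConfigurable mgr = new InMemoryManager();
    mgr.updateConfiguration(Map.of("cluster-max-priority", "10"));
    System.out.println(mgr.getEffectiveConfiguration());
  }
}
{code}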



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3534) Collect memory/cpu usage on the node

2015-08-26 Thread Srikanth Kandula (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14716031#comment-14716031
 ] 

Srikanth Kandula commented on YARN-3534:


[~elgoiri], [~kasha], could you comment on extending this to also take in 
network and disk usage information?

 Collect memory/cpu usage on the node
 

 Key: YARN-3534
 URL: https://issues.apache.org/jira/browse/YARN-3534
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager, resourcemanager
Affects Versions: 2.7.0
Reporter: Inigo Goiri
Assignee: Inigo Goiri
 Fix For: 2.8.0

 Attachments: YARN-3534-1.patch, YARN-3534-10.patch, 
 YARN-3534-11.patch, YARN-3534-12.patch, YARN-3534-14.patch, 
 YARN-3534-15.patch, YARN-3534-16.patch, YARN-3534-16.patch, 
 YARN-3534-17.patch, YARN-3534-17.patch, YARN-3534-18.patch, 
 YARN-3534-2.patch, YARN-3534-3.patch, YARN-3534-3.patch, YARN-3534-4.patch, 
 YARN-3534-5.patch, YARN-3534-6.patch, YARN-3534-7.patch, YARN-3534-8.patch, 
 YARN-3534-9.patch

   Original Estimate: 336h
  Remaining Estimate: 336h

 YARN should be aware of the resource utilization of the nodes when scheduling 
 containers. To that end, this task will implement the collection of memory/CPU 
 usage on the node.
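
For intuition about what collecting node-level usage involves on Linux, here is 
a toy, Linux-only sketch that samples memory from /proc/meminfo. The actual 
patch presumably builds on YARN's existing resource-monitoring utilities rather 
than ad-hoc parsing like this; names here are invented.

{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

// Toy, Linux-only sketch of node-level memory sampling via /proc/meminfo,
// just to give intuition for what "collect memory/cpu usage" means.
public class NodeMemorySketch {

  static long readKb(List<String> meminfo, String key) {
    for (String line : meminfo) {
      if (line.startsWith(key + ":")) {
        // Format: "MemTotal:       16316412 kB"
        String[] parts = line.trim().split("\\s+");
        return Long.parseLong(parts[1]);
      }
    }
    return -1;
  }

  public static void main(String[] args) throws IOException {
    List<String> meminfo = Files.readAllLines(Paths.get("/proc/meminfo"));
    long totalKb = readKb(meminfo, "MemTotal");
    long availKb = readKb(meminfo, "MemAvailable");  // needs a reasonably recent kernel
    if (availKb < 0) {
      availKb = readKb(meminfo, "MemFree");          // crude fallback on older kernels
    }
    System.out.printf("node memory used: %d MB of %d MB%n",
        (totalKb - availKb) / 1024, totalKb / 1024);
  }
}
{code}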



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3980) Plumb resource-utilization info in node heartbeat through to the scheduler

2015-08-26 Thread Srikanth Kandula (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14716032#comment-14716032
 ] 

Srikanth Kandula commented on YARN-3980:


+1 this would be very useful to have... Will enable even better packing.

 Plumb resource-utilization info in node heartbeat through to the scheduler
 --

 Key: YARN-3980
 URL: https://issues.apache.org/jira/browse/YARN-3980
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, scheduler
Affects Versions: 2.7.1
Reporter: Karthik Kambatla
Assignee: Inigo Goiri
 Attachments: YARN-3980-v0.patch, YARN-3980-v1.patch, 
 YARN-3980-v2.patch


 YARN-1012 and YARN-3534 collect resource utilization information for all 
 containers and the node, respectively, and send it to the RM on node heartbeat. 
 We should plumb it through to the scheduler so the scheduler can make use of 
 it. 
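
As a purely illustrative example of what a scheduler could do once measured 
utilization is plumbed through (not the design proposed here), it could skip 
nodes whose observed utilization already exceeds a threshold, even if they 
still have unallocated capacity (Java 17 sketch, names invented):

{code:java}
// Purely illustrative: once the scheduler can see per-node measured
// utilization, one simple use is to avoid placing new containers on nodes
// that are already running "hot", even if they have unallocated capacity.
public class UtilizationAwareSketch {

  record NodeUtil(String host, double cpuUsedFraction, long memUsedMb, long memTotalMb) {}

  static boolean eligible(NodeUtil node, double maxCpuFraction, double maxMemFraction) {
    double memFraction = (double) node.memUsedMb() / node.memTotalMb();
    return node.cpuUsedFraction() <= maxCpuFraction && memFraction <= maxMemFraction;
  }

  public static void main(String[] args) {
    NodeUtil hot = new NodeUtil("n1", 0.95, 30000, 32000);
    NodeUtil cool = new NodeUtil("n2", 0.20, 8000, 32000);
    System.out.println(eligible(hot, 0.8, 0.9));   // false: already saturated
    System.out.println(eligible(cool, 0.8, 0.9));  // true: room to place a container
  }
}
{code}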



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1011) [Umbrella] RM should dynamically schedule containers based on utilization of currently allocated containers

2015-08-26 Thread Srikanth Kandula (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14716062#comment-14716062
 ] 

Srikanth Kandula commented on YARN-1011:


+1


 [Umbrella] RM should dynamically schedule containers based on utilization of 
 currently allocated containers
 ---

 Key: YARN-1011
 URL: https://issues.apache.org/jira/browse/YARN-1011
 Project: Hadoop YARN
  Issue Type: New Feature
Reporter: Arun C Murthy

 Currently the RM allocates containers and assumes the allocated resources are 
 utilized.
 The RM can, and should, get to a point where it measures the utilization of 
 allocated containers and, if appropriate, allocates more (speculative?) containers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)