[jira] [Updated] (YARN-3319) Implement a Fair SchedulerOrderingPolicy

2015-04-02 Thread Craig Welch (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Welch updated YARN-3319:
--
Attachment: YARN-3319.39.patch

 Implement a Fair SchedulerOrderingPolicy
 

 Key: YARN-3319
 URL: https://issues.apache.org/jira/browse/YARN-3319
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: scheduler
Reporter: Craig Welch
Assignee: Craig Welch
 Attachments: YARN-3319.13.patch, YARN-3319.14.patch, 
 YARN-3319.17.patch, YARN-3319.35.patch, YARN-3319.39.patch


 Implement a Fair Comparator for the Scheduler Comparator Ordering Policy 
 which prefers to allocate to SchedulerProcesses with the least current usage, 
 very similar to the FairScheduler's FairSharePolicy.
 The policy will offer allocations to applications in a queue in order of 
 least resources used, and preempt applications in reverse order (from most 
 resources used). This will include conditional support for sizeBasedWeight-style 
 adjustment.
 An implementation of a Scheduler Comparator for use with the Scheduler 
 Comparator Ordering Policy will be built with the below comparisons for 
 ordering applications for container assignment (ascending) and for preemption 
 (descending):
 Current resource usage - less usage is lesser
 Submission time - earlier is lesser
 Optionally, based on a configuration flag enabling sizeBasedWeight 
 (default false), an adjustment to boost larger applications (to offset the 
 natural preference for smaller applications) will divide the resource usage 
 value, based on demand, by the following:
 Math.log1p(app memory demand) / Math.log(2);
 In cases where the above is indeterminate (two applications are equal after 
 this comparison), behavior falls back to a comparison of the application 
 names, which is lexically FIFO for that comparison (first submitted is lesser).
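 As a rough illustration of the comparison described above (not the actual 
 patch), a minimal comparator could look like the sketch below; the 
 SchedulableApp type, its getters, and the constructor flag are assumptions made 
 only for this example:
 {code}
 import java.util.Comparator;

 // Hypothetical stand-in for whatever the real policy compares.
 interface SchedulableApp {
   String getName();
   long getSubmissionTime();
   long getCurrentMemoryUsage();
   long getMemoryDemand();
 }

 class FairComparatorSketch implements Comparator<SchedulableApp> {
   private final boolean sizeBasedWeight;

   FairComparatorSketch(boolean sizeBasedWeight) {
     this.sizeBasedWeight = sizeBasedWeight;
   }

   @Override
   public int compare(SchedulableApp a, SchedulableApp b) {
     int byUsage = Double.compare(weightedUsage(a), weightedUsage(b));
     if (byUsage != 0) {
       return byUsage;                           // less usage is "lesser"
     }
     int bySubmission = Long.compare(a.getSubmissionTime(), b.getSubmissionTime());
     if (bySubmission != 0) {
       return bySubmission;                      // earlier submission is "lesser"
     }
     return a.getName().compareTo(b.getName());  // lexical (FIFO-like) fallback
   }

   private double weightedUsage(SchedulableApp app) {
     double usage = app.getCurrentMemoryUsage();
     if (sizeBasedWeight) {
       // Boost larger applications by dividing usage by log2(1 + memory demand).
       usage /= Math.log1p(app.getMemoryDemand()) / Math.log(2);
     }
     return usage;
   }
 }
 {code}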



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3318) Create Initial OrderingPolicy Framework, integrate with CapacityScheduler LeafQueue supporting present behavior

2015-04-02 Thread Craig Welch (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Welch updated YARN-3318:
--
Attachment: YARN-3318.39.patch

 Create Initial OrderingPolicy Framework, integrate with CapacityScheduler 
 LeafQueue supporting present behavior
 ---

 Key: YARN-3318
 URL: https://issues.apache.org/jira/browse/YARN-3318
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: scheduler
Reporter: Craig Welch
Assignee: Craig Welch
 Attachments: YARN-3318.13.patch, YARN-3318.14.patch, 
 YARN-3318.17.patch, YARN-3318.34.patch, YARN-3318.35.patch, 
 YARN-3318.36.patch, YARN-3318.39.patch


 Create the initial framework required for using OrderingPolicies with 
 SchedulerApplicationAttempts and integrate with the CapacityScheduler. This 
 will include an implementation which is compatible with current FIFO behavior.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3318) Create Initial OrderingPolicy Framework, integrate with CapacityScheduler LeafQueue supporting present behavior

2015-04-02 Thread Craig Welch (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14392211#comment-14392211
 ] 

Craig Welch commented on YARN-3318:
---

[~leftnoteasy]  

- SchedulerProcessEvents replaced with containerAllocated and containerReleased
- Serial and SerialEpoch replaced with compareInputOrderTo(), which is option 2 
for addressing it, the one we settled on offline
- Added addSchedulerProcess/removeSchedulerProcess/addAllSchedulerProcesses
- Changed configuration so that 
yarn.scheduler.capacity.root.default.ordering-policy=fair
will set up the fair configuration, fifo will set up fifo, fair+fifo will 
set up compound fair + fifo, etc.  It is possible to set up a custom ordering 
policy class using a different configuration, but the base one will handle the 
friendly setup (a sketch of the friendly form follows below).
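As a rough illustration only (the property name is the one given above and is 
still subject to change as this feature evolves), the friendly setup would look 
something like this in capacity-scheduler.xml:

{code}
<!-- Sketch only: key name taken from the comment above; treat as unstable. -->
<property>
  <name>yarn.scheduler.capacity.root.default.ordering-policy</name>
  <value>fair</value>
</property>
{code}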

[~vinodkv]
bq. It is not entirely clear how the ordering and limits work together - as one 
policy with multiple facets or multiple policy types
This should be modeled as different types of policies, so that they can each 
focus on their particular purpose and avoid a repetition of the intermingling 
which has made it difficult to mix, match, and share capabilities.  Having 
multiple policy types is essential to make it easy to combine them as needed.
bq. let's split the patch that exposes this to the client side / web UI and in 
the API records into its own JIRA...premature to support this as a publicly 
supportable configuration...
The goal is to make this available quickly but iteratively, keeping the changes 
small but making them available for use and feedback.  Clearly we can mark 
things unstable, communicate that they are not fully mature/subject to 
change/should be used gently, but we will need to make it possible to activate 
the feature and use it in order to get that use and feedback.  We should 
grow it organically, gradually, and iteratively; think of it as a facet of the 
policy framework hooked up and available, with more to follow.
bq. ...SchedulableEntity better... 
Well, I'd actually talked [~leftnoteasy] into SchedulerProcess :-)  So, we can 
chew on this a bit more and see where we go.
bq. You add/remove applications to/from LeafQueue's policy but addition/removal 
of containers is an event...
This has been refactored along the lines of [~leftnoteasy]'s suggestion; it 
should now be consistent.
bq. The notion of a comparator doesn't make sense to an admin. It is simply a 
policy...
I have modeled the policy configuration differently, so the comparator is out of 
sight (see above).
bq.  Depending on how ordering and limits come together, they may become 
properties of a policy
I expect them to be distinct: this is specifically an ordering-policy; limits 
will be other types of limit-policy(ies).

patch with these changes to follow in a few...

 Create Initial OrderingPolicy Framework, integrate with CapacityScheduler 
 LeafQueue supporting present behavior
 ---

 Key: YARN-3318
 URL: https://issues.apache.org/jira/browse/YARN-3318
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: scheduler
Reporter: Craig Welch
Assignee: Craig Welch
 Attachments: YARN-3318.13.patch, YARN-3318.14.patch, 
 YARN-3318.17.patch, YARN-3318.34.patch, YARN-3318.35.patch, YARN-3318.36.patch


 Create the initial framework required for using OrderingPolicies with 
 SchedulerApplicationAttempts and integrate with the CapacityScheduler. This 
 will include an implementation which is compatible with current FIFO behavior.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3318) Create Initial OrderingPolicy Framework, integrate with CapacityScheduler LeafQueue supporting present behavior

2015-04-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14392345#comment-14392345
 ] 

Hadoop QA commented on YARN-3318:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12708923/YARN-3318.39.patch
  against trunk revision 867d5d2.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 9 new 
or modified test files.

  {color:red}-1 javac{color}.  The applied patch generated 1149 javac 
compiler warnings (more than the trunk's current 1148 warnings).

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  org.apache.hadoop.yarn.client.api.impl.TestAMRMClient

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7198//testReport/
Javac warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/7198//artifact/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7198//console

This message is automatically generated.

 Create Initial OrderingPolicy Framework, integrate with CapacityScheduler 
 LeafQueue supporting present behavior
 ---

 Key: YARN-3318
 URL: https://issues.apache.org/jira/browse/YARN-3318
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: scheduler
Reporter: Craig Welch
Assignee: Craig Welch
 Attachments: YARN-3318.13.patch, YARN-3318.14.patch, 
 YARN-3318.17.patch, YARN-3318.34.patch, YARN-3318.35.patch, 
 YARN-3318.36.patch, YARN-3318.39.patch


 Create the initial framework required for using OrderingPolicies with 
 SchedulerApplicationAttempts and integrate with the CapacityScheduler. This 
 will include an implementation which is compatible with current FIFO behavior.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3334) [Event Producers] NM TimelineClient life cycle handling and container metrics posting to new timeline service.

2015-04-02 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-3334:
--
Attachment: YARN-3334.7.patch

The last patch looks good to me, but I undid some unnecessary changes in 
TimelineClientImpl (which seem to have been added for debugging). Will hold the 
patch for a while before committing, in case other folks want to take a look.

 [Event Producers] NM TimelineClient life cycle handling and container metrics 
 posting to new timeline service.
 --

 Key: YARN-3334
 URL: https://issues.apache.org/jira/browse/YARN-3334
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: YARN-2928
Reporter: Junping Du
Assignee: Junping Du
 Attachments: YARN-3334-demo.patch, YARN-3334-v1.patch, 
 YARN-3334-v2.patch, YARN-3334-v3.patch, YARN-3334-v4.patch, 
 YARN-3334-v5.patch, YARN-3334-v6.patch, YARN-3334.7.patch


 After YARN-3039, we have a service discovery mechanism to pass the app-collector 
 service address among collectors, NMs, and the RM. In this JIRA, we will handle 
 service address setting for TimelineClients in the NodeManager, and put container 
 metrics into the backend storage.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2942) Aggregated Log Files should be combined

2015-04-02 Thread Robert Kanter (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Kanter updated YARN-2942:

Attachment: ConcatableAggregatedLogsProposal_v4.pdf

I've just uploaded ConcatableAggregatedLogsProposal_v4.pdf, with an updated 
design that uses a slightly modified version of the CombinedAggregatedLogFormat 
I already wrote (now ConcatableAggregatedLogFormat) and uses HDFS concat 
to combine the files.
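For reference, a minimal sketch of the HDFS concat call the design would rely on 
(the paths below are hypothetical; the real layout is described in the proposal PDF):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public class ConcatSketch {
  public static void main(String[] args) throws Exception {
    // concat() is HDFS-specific, so this sketch assumes the default FS is HDFS.
    FileSystem fs = FileSystem.get(new Configuration());
    DistributedFileSystem dfs = (DistributedFileSystem) fs;

    // Hypothetical per-application aggregated log paths.
    Path target = new Path("/tmp/logs/user/logs/application_1_0001/combined.log");
    Path[] perNodeLogs = {
        new Path("/tmp/logs/user/logs/application_1_0001/node1.log"),
        new Path("/tmp/logs/user/logs/application_1_0001/node2.log")
    };

    // Appends the per-node files onto the target and removes the source files.
    dfs.concat(target, perNodeLogs);
  }
}
{code}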

[~zjshen], [~kasha], and [~vinodkv], can you take a look at it?

 Aggregated Log Files should be combined
 ---

 Key: YARN-2942
 URL: https://issues.apache.org/jira/browse/YARN-2942
 Project: Hadoop YARN
  Issue Type: New Feature
Affects Versions: 2.6.0
Reporter: Robert Kanter
Assignee: Robert Kanter
 Attachments: CombinedAggregatedLogsProposal_v3.pdf, 
 CompactedAggregatedLogsProposal_v1.pdf, 
 CompactedAggregatedLogsProposal_v2.pdf, 
 ConcatableAggregatedLogsProposal_v4.pdf, YARN-2942-preliminary.001.patch, 
 YARN-2942-preliminary.002.patch, YARN-2942.001.patch, YARN-2942.002.patch, 
 YARN-2942.003.patch


 Turning on log aggregation allows users to easily store container logs in 
 HDFS and subsequently view them in the YARN web UIs from a central place.  
 Currently, there is a separate log file for each Node Manager.  This can be a 
 problem for HDFS if you have a cluster with many nodes as you’ll slowly start 
 accumulating many (possibly small) files per YARN application.  The current 
 “solution” for this problem is to configure YARN (actually the JHS) to 
 automatically delete these files after some amount of time.  
 We should improve this by compacting the per-node aggregated log files into 
 one log file per application.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3439) RM fails to renew token when Oozie launcher leaves before sub-job finishes

2015-04-02 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14393433#comment-14393433
 ] 

Rohini Palaniswamy commented on YARN-3439:
--

bq. Essentially the idea is to reference count the tokens and only attempt to 
cancel them when the token is no longer referenced. 
That would be a good idea. I think this is the third time we have had delegation 
token renewal broken for Oozie in the Hadoop 2.x line.

 RM fails to renew token when Oozie launcher leaves before sub-job finishes
 --

 Key: YARN-3439
 URL: https://issues.apache.org/jira/browse/YARN-3439
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.7.0
Reporter: Jason Lowe
Assignee: Daryn Sharp
Priority: Blocker
 Attachments: YARN-3439.001.patch


 When the Oozie launcher runs a standard MapReduce job (not Pig) it doesn't 
 linger waiting for the sub-job to finish.  At that point the RM stops 
 renewing delegation tokens for the launcher job which wreaks havoc on the 
 sub-job if the sub-job runs long enough for the tokens to expire.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2901) Add errors and warning stats to RM, NM web UI

2015-04-02 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14393494#comment-14393494
 ] 

Wangda Tan commented on YARN-2901:
--

+1 for the patch. Will commit it today if no opposite opinions.

 Add errors and warning stats to RM, NM web UI
 -

 Key: YARN-2901
 URL: https://issues.apache.org/jira/browse/YARN-2901
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: nodemanager, resourcemanager
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: Exception collapsed.png, Exception expanded.jpg, Screen 
 Shot 2015-03-19 at 7.40.02 PM.png, apache-yarn-2901.0.patch, 
 apache-yarn-2901.1.patch, apache-yarn-2901.2.patch, apache-yarn-2901.3.patch, 
 apache-yarn-2901.4.patch, apache-yarn-2901.5.patch


 It would be really useful to have statistics on the number of errors and 
 warnings in the RM and NM web UI. I'm thinking about -
 1. The number of errors and warnings in the past 5 min/1 hour/12 hours/day
 2. The top 'n'(20?) most common exceptions in the past 5 min/1 hour/12 
 hours/day
 By errors and warnings I'm referring to the log level.
 I suspect we can probably achieve this by writing a custom appender?(I'm open 
 to suggestions on alternate mechanisms for implementing this).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3388) Allocation in LeafQueue could get stuck because DRF calculator isn't well supported when computing user-limit

2015-04-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14393493#comment-14393493
 ] 

Hadoop QA commented on YARN-3388:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12709050/YARN-3388-v1.patch
  against trunk revision eccb7d4.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification
  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler
  
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps
  org.apache.hadoop.yarn.server.resourcemanager.TestRM

  The following test timeouts occurred in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7201//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7201//console

This message is automatically generated.

 Allocation in LeafQueue could get stuck because DRF calculator isn't well 
 supported when computing user-limit
 -

 Key: YARN-3388
 URL: https://issues.apache.org/jira/browse/YARN-3388
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 2.6.0
Reporter: Nathan Roberts
Assignee: Nathan Roberts
 Attachments: YARN-3388-v0.patch, YARN-3388-v1.patch


 When there are multiple active users in a queue, it should be possible for 
 those users to make use of capacity up-to max_capacity (or close). The 
 resources should be fairly distributed among the active users in the queue. 
 This works pretty well when there is a single resource being scheduled.   
 However, when there are multiple resources the situation gets more complex 
 and the current algorithm tends to get stuck at Capacity. 
 Example illustrated in subsequent comment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2729) Support script based NodeLabelsProvider Interface in Distributed Node Label Configuration Setup

2015-04-02 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14393545#comment-14393545
 ] 

Wangda Tan commented on YARN-2729:
--

Some comments:
*1) Configuration:*
Instead of distributed_node_labels_prefix, do you think it is better to name it 
yarn.node-labels.nm.provider? The distributed.node-labels-provider name 
doesn't clearly indicate that it runs on the NM side.

I don't want to expose a class name in the config unless it is necessary; right 
now we have two options, one script-based and the other config-based. We can 
treat the two as a white-list, and if a given value is not in the white-list, try 
to load a class from the name. So the option would be: yarn.node-labels.nm.provider = 
config/script/other-class-name.

Revisiting the interval, I think it's better to make it a provider configuration 
instead of a script-provider-only configuration, since config/script will both 
share it (I remember I had some back-and-forth opinions here).
If you agree with the above, the names could be: 
yarn.node-labels.nm.provider-fetch-interval-ms (and provider-fetch-timeout-ms)

And the script-related options could be (sketched below):
yarn.node-labels.nm.provider.script.path/opts
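To make the naming proposal concrete, the keys above might end up in 
yarn-site.xml roughly like this (every key and value below is only a proposal 
from this comment, not committed configuration):

{code}
<property>
  <name>yarn.node-labels.nm.provider</name>
  <value>script</value>
</property>
<property>
  <name>yarn.node-labels.nm.provider-fetch-interval-ms</name>
  <value>60000</value>
</property>
<property>
  <name>yarn.node-labels.nm.provider-fetch-timeout-ms</name>
  <value>120000</value>
</property>
<property>
  <name>yarn.node-labels.nm.provider.script.path</name>
  <value>/usr/local/bin/node-labels.sh</value>
</property>
{code}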

*2) Implementation of ScriptBasedNodeLabelsProvider*
I feel like ScriptBased and ConfigBased can share some implementation: they 
will both initialize a timer task, get the interval and run, check for timeouts 
(meaningless for config-based), etc.
Can you make an abstract class and have ScriptBased inherit from it?

DISABLE_TIMER_CONFIG should be a part of YarnConfiguration; all configuration 
defaults should be a part of YarnConfiguration.

canRun - rename to something like verifyConfiguredScript, and directly throw an 
exception when something is wrong (so that the admin can know what really 
happened, such as file not found, no execute permission, etc.), and it should be 
private and non-static.

checkAndThrowLabelName should be called in NodeStatusUpdaterImpl.

Labels need to be trim()'d when checkAndThrowLabelName(...) is called.




 Support script based NodeLabelsProvider Interface in Distributed Node Label 
 Configuration Setup
 ---

 Key: YARN-2729
 URL: https://issues.apache.org/jira/browse/YARN-2729
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Reporter: Naganarasimha G R
Assignee: Naganarasimha G R
 Fix For: 2.8.0

 Attachments: YARN-2729.20141023-1.patch, YARN-2729.20141024-1.patch, 
 YARN-2729.20141031-1.patch, YARN-2729.20141120-1.patch, 
 YARN-2729.20141210-1.patch, YARN-2729.20150309-1.patch, 
 YARN-2729.20150322-1.patch, YARN-2729.20150401-1.patch, 
 YARN-2729.20150402-1.patch


 Support script based NodeLabelsProvider Interface in Distributed Node Label 
 Configuration Setup . 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3439) RM fails to renew token when Oozie launcher leaves before sub-job finishes

2015-04-02 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated YARN-3439:
-
Attachment: YARN-3439.001.patch

Daryn is out, so I'm posting a prototype patch he developed to get some early 
feedback.  Note that this patch can't go in as-is, as it's a work-in-progress 
that hacks out the automatic HDFS delegation token logic that was added as part 
of YARN-2704.

Essentially the idea is to reference count the tokens and only attempt to 
cancel them when the token is no longer referenced.  Since the launcher job 
won't complete until it has successfully submitted the sub-job(s), the token 
will remain referenced throughout the lifespan of the workflow even if the 
launcher job exits early.
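A minimal sketch of that reference-counting idea (the class and method names 
below are generic placeholders for illustration, not Daryn's actual patch):

{code}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Count how many running applications reference a token and only report it as
// cancellable once the last reference goes away.
class TokenRefCounterSketch<T> {
  private final ConcurrentHashMap<T, AtomicInteger> refCounts =
      new ConcurrentHashMap<>();

  /** Called when an application that carries the token is submitted. */
  public void addReference(T token) {
    refCounts.computeIfAbsent(token, t -> new AtomicInteger(0)).incrementAndGet();
  }

  /** Called when an application finishes; returns true if cancellation may be attempted. */
  public boolean removeReference(T token) {
    AtomicInteger count = refCounts.get(token);
    if (count == null) {
      return false;
    }
    if (count.decrementAndGet() <= 0) {
      refCounts.remove(token);
      return true;  // no more referencing apps
    }
    return false;
  }
}
{code}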

 RM fails to renew token when Oozie launcher leaves before sub-job finishes
 --

 Key: YARN-3439
 URL: https://issues.apache.org/jira/browse/YARN-3439
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.7.0
Reporter: Jason Lowe
Assignee: Daryn Sharp
Priority: Blocker
 Attachments: YARN-3439.001.patch


 When the Oozie launcher runs a standard MapReduce job (not Pig) it doesn't 
 linger waiting for the sub-job to finish.  At that point the RM stops 
 renewing delegation tokens for the launcher job which wreaks havoc on the 
 sub-job if the sub-job runs long enough for the tokens to expire.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3440) ResourceUsage should be copy-on-write

2015-04-02 Thread Li Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-3440:

Description: 
In {{org.apache.hadoop.yarn.server.resourcemanager.scheduler.ResourceUsage}}, 
even though the class is thread-safe, the Resource returned by its getters could 
still be updated by another thread.

All Resource objects in ResourceUsage should be copy-on-write: a reader will 
always get an unchanging Resource, and changes applied to a Resource acquired by 
the caller will not affect the original Resource.

  was:
In {{org.apache.hadoop.yarn.server.resourcemanager.scheduler.ResourceUsage
}}, even if it is thread-safe, but Resource returned by getters could be 
updated by another thread.

All Resource objects in ResourceUsage should be copy-on-write, reader will 
always get a non-changed Resource. And changes apply on Resource acquired by 
caller will not affect original Resource.


 ResourceUsage should be copy-on-write
 -

 Key: YARN-3440
 URL: https://issues.apache.org/jira/browse/YARN-3440
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, scheduler, yarn
Reporter: Wangda Tan
Assignee: Li Lu

 In {{org.apache.hadoop.yarn.server.resourcemanager.scheduler.ResourceUsage}}, 
 even though the class is thread-safe, the Resource returned by its getters could 
 still be updated by another thread.
 All Resource objects in ResourceUsage should be copy-on-write: a reader will 
 always get an unchanging Resource, and changes applied to a Resource acquired by 
 the caller will not affect the original Resource.
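 A minimal sketch of the copy-on-write getter idea (the class, field, and method 
 names below are illustrative, not the actual ResourceUsage code):
 {code}
 import org.apache.hadoop.yarn.api.records.Resource;
 import org.apache.hadoop.yarn.util.resource.Resources;

 // Hand out copies so callers never observe (or cause) concurrent mutation of
 // the internal Resource.
 class CopyOnWriteUsageSketch {
   private Resource used = Resources.createResource(0, 0);

   public synchronized Resource getUsed() {
     return Resources.clone(used);  // caller's copy is detached from internal state
   }

   public synchronized void incUsed(Resource delta) {
     // Replace rather than mutate, so previously returned copies stay stable.
     used = Resources.add(used, delta);
   }
 }
 {code}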



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2890) MiniMRYarnCluster should turn on timeline service if configured to do so

2015-04-02 Thread Mit Desai (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14393408#comment-14393408
 ] 

Mit Desai commented on YARN-2890:
-

[~hitesh], did you have any comments on the patch?

 MiniMRYarnCluster should turn on timeline service if configured to do so
 

 Key: YARN-2890
 URL: https://issues.apache.org/jira/browse/YARN-2890
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Mit Desai
Assignee: Mit Desai
 Attachments: YARN-2890.1.patch, YARN-2890.2.patch, YARN-2890.3.patch, 
 YARN-2890.patch, YARN-2890.patch, YARN-2890.patch, YARN-2890.patch, 
 YARN-2890.patch


 Currently the MiniMRYarnCluster does not consider the configuration value for 
 enabling timeline service before starting. The MiniYarnCluster should only 
 start the timeline service if it is configured to do so.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3415) Non-AM containers can be counted towards amResourceUsage of a Fair Scheduler queue

2015-04-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14393412#comment-14393412
 ] 

Hudson commented on YARN-3415:
--

FAILURE: Integrated in Hadoop-trunk-Commit #7497 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7497/])
YARN-3415. Non-AM containers can be counted towards amResourceUsage of a 
fairscheduler queue (Zhihai Xu via Sandy Ryza) (sandy: rev 
6a6a59db7f1bfda47c3c14fb49676a7b22d2eb06)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java
* hadoop-yarn-project/CHANGES.txt


 Non-AM containers can be counted towards amResourceUsage of a Fair Scheduler 
 queue
 --

 Key: YARN-3415
 URL: https://issues.apache.org/jira/browse/YARN-3415
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.6.0
Reporter: Rohit Agarwal
Assignee: zhihai xu
Priority: Critical
 Fix For: 2.8.0

 Attachments: YARN-3415.000.patch, YARN-3415.001.patch, 
 YARN-3415.002.patch


 We encountered this problem while running a spark cluster. The 
 amResourceUsage for a queue became artificially high and then the cluster got 
 deadlocked because the maxAMShare constraint kicked in and no new AM got 
 admitted to the cluster.
 I have described the problem in detail here: 
 https://github.com/apache/spark/pull/5233#issuecomment-87160289
 In summary - the condition for adding the container's memory towards 
 amResourceUsage is fragile. It depends on the number of live containers 
 belonging to the app. We saw that the spark AM went down without explicitly 
 releasing its requested containers and then one of those containers' memory 
 was counted towards amResource.
 cc - [~sandyr]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3437) convert load test driver to timeline service v.2

2015-04-02 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated YARN-3437:
--
Attachment: YARN-3437.001.patch

Patch v.1 posted.

This is basically a modification of the YARN-2556 patch (and clean-up of issues 
etc.) to work against the timeline service v.2.

Since the new distributed timeline service collectors are tied to applications, 
I chose the approach of instantiating the base timeline collector within the 
mapper task, rather than going through the timeline client. Making it go 
through the timeline client has a number of challenges (see YARN-3378). But 
this should still be effective as a way to exercise the bulk of the write 
path for performance and scalability.

You can try this out by doing, for example:

{code}
hadoop jar 
share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.0.0-SNAPSHOT-tests.jar
 timelineperformance -m 10 -t 1000
{code}

You'll get output like:

{noformat}
TRANSACTION RATE (per mapper): 5027.652086 ops/s
IO RATE (per mapper): 5027.652086 KB/s
TRANSACTION RATE (total): 50276.520865 ops/s
IO RATE (total): 50276.520865 KB/s
{noformat}

It is still using pretty simple entities to write to the storage. I'll work on 
adding handling of job history files later in a different JIRA.

I would greatly appreciate your review. Thanks!

 convert load test driver to timeline service v.2
 

 Key: YARN-3437
 URL: https://issues.apache.org/jira/browse/YARN-3437
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Sangjin Lee
 Attachments: YARN-3437.001.patch


 This subtask covers the work for converting the proposed patch for the load 
 test driver (YARN-2556) to work with the timeline service v.2.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3437) convert load test driver to timeline service v.2

2015-04-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14393554#comment-14393554
 ] 

Hadoop QA commented on YARN-3437:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12709078/YARN-3437.001.patch
  against trunk revision 6a6a59d.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7206//console

This message is automatically generated.

 convert load test driver to timeline service v.2
 

 Key: YARN-3437
 URL: https://issues.apache.org/jira/browse/YARN-3437
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Sangjin Lee
 Attachments: YARN-3437.001.patch


 This subtask covers the work for converting the proposed patch for the load 
 test driver (YARN-2556) to work with the timeline service v.2.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3388) Allocation in LeafQueue could get stuck because DRF calculator isn't well supported when computing user-limit

2015-04-02 Thread Nathan Roberts (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nathan Roberts updated YARN-3388:
-
Attachment: YARN-3388-v1.patch

Hi [~leftnoteasy]. Uploaded a new version of the patch that addresses the 
inefficiency and adds unit tests.

I think label support is better left for a separate JIRA, once labels are fully 
working with user limits.

 Allocation in LeafQueue could get stuck because DRF calculator isn't well 
 supported when computing user-limit
 -

 Key: YARN-3388
 URL: https://issues.apache.org/jira/browse/YARN-3388
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 2.6.0
Reporter: Nathan Roberts
Assignee: Nathan Roberts
 Attachments: YARN-3388-v0.patch, YARN-3388-v1.patch


 When there are multiple active users in a queue, it should be possible for 
 those users to make use of capacity up-to max_capacity (or close). The 
 resources should be fairly distributed among the active users in the queue. 
 This works pretty well when there is a single resource being scheduled.   
 However, when there are multiple resources the situation gets more complex 
 and the current algorithm tends to get stuck at Capacity. 
 Example illustrated in subsequent comment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2729) Support script based NodeLabelsProvider Interface in Distributed Node Label Configuration Setup

2015-04-02 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14393451#comment-14393451
 ] 

Wangda Tan commented on YARN-2729:
--

Apparently Jenkins ran the wrong tests; re-kicked Jenkins.

 Support script based NodeLabelsProvider Interface in Distributed Node Label 
 Configuration Setup
 ---

 Key: YARN-2729
 URL: https://issues.apache.org/jira/browse/YARN-2729
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Reporter: Naganarasimha G R
Assignee: Naganarasimha G R
 Fix For: 2.8.0

 Attachments: YARN-2729.20141023-1.patch, YARN-2729.20141024-1.patch, 
 YARN-2729.20141031-1.patch, YARN-2729.20141120-1.patch, 
 YARN-2729.20141210-1.patch, YARN-2729.20150309-1.patch, 
 YARN-2729.20150322-1.patch, YARN-2729.20150401-1.patch, 
 YARN-2729.20150402-1.patch


 Support script based NodeLabelsProvider Interface in Distributed Node Label 
 Configuration Setup . 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2942) Aggregated Log Files should be combined

2015-04-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14393450#comment-14393450
 ] 

Hadoop QA commented on YARN-2942:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12709065/ConcatableAggregatedLogsProposal_v4.pdf
  against trunk revision 6a6a59d.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7204//console

This message is automatically generated.

 Aggregated Log Files should be combined
 ---

 Key: YARN-2942
 URL: https://issues.apache.org/jira/browse/YARN-2942
 Project: Hadoop YARN
  Issue Type: New Feature
Affects Versions: 2.6.0
Reporter: Robert Kanter
Assignee: Robert Kanter
 Attachments: CombinedAggregatedLogsProposal_v3.pdf, 
 CompactedAggregatedLogsProposal_v1.pdf, 
 CompactedAggregatedLogsProposal_v2.pdf, 
 ConcatableAggregatedLogsProposal_v4.pdf, YARN-2942-preliminary.001.patch, 
 YARN-2942-preliminary.002.patch, YARN-2942.001.patch, YARN-2942.002.patch, 
 YARN-2942.003.patch


 Turning on log aggregation allows users to easily store container logs in 
 HDFS and subsequently view them in the YARN web UIs from a central place.  
 Currently, there is a separate log file for each Node Manager.  This can be a 
 problem for HDFS if you have a cluster with many nodes as you’ll slowly start 
 accumulating many (possibly small) files per YARN application.  The current 
 “solution” for this problem is to configure YARN (actually the JHS) to 
 automatically delete these files after some amount of time.  
 We should improve this by compacting the per-node aggregated log files into 
 one log file per application.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2666) TestFairScheduler.testContinuousScheduling fails Intermittently

2015-04-02 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-2666:

Attachment: (was: YARN-2666.000.patch)

 TestFairScheduler.testContinuousScheduling fails Intermittently
 ---

 Key: YARN-2666
 URL: https://issues.apache.org/jira/browse/YARN-2666
 Project: Hadoop YARN
  Issue Type: Test
  Components: scheduler
Reporter: Tsuyoshi Ozawa
Assignee: zhihai xu

 The test fails on trunk.
 {code}
 Tests run: 79, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 8.698 sec 
  <<< FAILURE! - in 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler
 testContinuousScheduling(org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler)
   Time elapsed: 0.582 sec  <<< FAILURE!
 java.lang.AssertionError: expected:<2> but was:<1>
   at org.junit.Assert.fail(Assert.java:88)
   at org.junit.Assert.failNotEquals(Assert.java:743)
   at org.junit.Assert.assertEquals(Assert.java:118)
   at org.junit.Assert.assertEquals(Assert.java:555)
   at org.junit.Assert.assertEquals(Assert.java:542)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler.testContinuousScheduling(TestFairScheduler.java:3372)
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3439) RM fails to renew token when Oozie launcher leaves before sub-job finishes

2015-04-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14393410#comment-14393410
 ] 

Hadoop QA commented on YARN-3439:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12709044/YARN-3439.001.patch
  against trunk revision eccb7d4.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 2 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7200//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/7200//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7200//console

This message is automatically generated.

 RM fails to renew token when Oozie launcher leaves before sub-job finishes
 --

 Key: YARN-3439
 URL: https://issues.apache.org/jira/browse/YARN-3439
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.7.0
Reporter: Jason Lowe
Assignee: Daryn Sharp
Priority: Blocker
 Attachments: YARN-3439.001.patch


 When the Oozie launcher runs a standard MapReduce job (not Pig) it doesn't 
 linger waiting for the sub-job to finish.  At that point the RM stops 
 renewing delegation tokens for the launcher job which wreaks havoc on the 
 sub-job if the sub-job runs long enough for the tokens to expire.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2729) Support script based NodeLabelsProvider Interface in Distributed Node Label Configuration Setup

2015-04-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14393556#comment-14393556
 ] 

Hadoop QA commented on YARN-2729:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12708788/YARN-2729.20150402-1.patch
  against trunk revision 6a6a59d.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7205//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7205//console

This message is automatically generated.

 Support script based NodeLabelsProvider Interface in Distributed Node Label 
 Configuration Setup
 ---

 Key: YARN-2729
 URL: https://issues.apache.org/jira/browse/YARN-2729
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Reporter: Naganarasimha G R
Assignee: Naganarasimha G R
 Fix For: 2.8.0

 Attachments: YARN-2729.20141023-1.patch, YARN-2729.20141024-1.patch, 
 YARN-2729.20141031-1.patch, YARN-2729.20141120-1.patch, 
 YARN-2729.20141210-1.patch, YARN-2729.20150309-1.patch, 
 YARN-2729.20150322-1.patch, YARN-2729.20150401-1.patch, 
 YARN-2729.20150402-1.patch


 Support script based NodeLabelsProvider Interface in Distributed Node Label 
 Configuration Setup . 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3318) Create Initial OrderingPolicy Framework, integrate with CapacityScheduler LeafQueue supporting present behavior

2015-04-02 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14393388#comment-14393388
 ] 

Vinod Kumar Vavilapalli commented on YARN-3318:
---

bq. I think it should be fine to include the policy interface definitions as well 
as CapacityScheduler changes together with this patch (only for 
FifoOrderingPolicy), it's good to see how interfaces and policies work in CS, 
is it easy or not, etc.
We can still do this with patches on two JIRAs - one for the framework, one for 
CS, one for FS, etc. The Fifo one can be here for demonstration, no problem with 
that. Why is it so hard to focus on one thing in one JIRA?

 Create Initial OrderingPolicy Framework, integrate with CapacityScheduler 
 LeafQueue supporting present behavior
 ---

 Key: YARN-3318
 URL: https://issues.apache.org/jira/browse/YARN-3318
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: scheduler
Reporter: Craig Welch
Assignee: Craig Welch
 Attachments: YARN-3318.13.patch, YARN-3318.14.patch, 
 YARN-3318.17.patch, YARN-3318.34.patch, YARN-3318.35.patch, 
 YARN-3318.36.patch, YARN-3318.39.patch


 Create the initial framework required for using OrderingPolicies with 
 SchedulerApplicationAttempts and integrate with the CapacityScheduler. This 
 will include an implementation which is compatible with current FIFO behavior.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3437) convert load test driver to timeline service v.2

2015-04-02 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14393530#comment-14393530
 ] 

Sangjin Lee commented on YARN-3437:
---

Added a few folks for review.

 convert load test driver to timeline service v.2
 

 Key: YARN-3437
 URL: https://issues.apache.org/jira/browse/YARN-3437
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Sangjin Lee
 Attachments: YARN-3437.001.patch


 This subtask covers the work for converting the proposed patch for the load 
 test driver (YARN-2556) to work with the timeline service v.2.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3415) Non-AM containers can be counted towards amResourceUsage of a Fair Scheduler queue

2015-04-02 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza updated YARN-3415:
-
Summary: Non-AM containers can be counted towards amResourceUsage of a Fair 
Scheduler queue  (was: Non-AM containers can be counted towards amResourceUsage 
of a fairscheduler queue)

 Non-AM containers can be counted towards amResourceUsage of a Fair Scheduler 
 queue
 --

 Key: YARN-3415
 URL: https://issues.apache.org/jira/browse/YARN-3415
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.6.0
Reporter: Rohit Agarwal
Assignee: zhihai xu
Priority: Critical
 Attachments: YARN-3415.000.patch, YARN-3415.001.patch, 
 YARN-3415.002.patch


 We encountered this problem while running a spark cluster. The 
 amResourceUsage for a queue became artificially high and then the cluster got 
 deadlocked because the maxAMShare constraint kicked in and no new AM got 
 admitted to the cluster.
 I have described the problem in detail here: 
 https://github.com/apache/spark/pull/5233#issuecomment-87160289
 In summary - the condition for adding the container's memory towards 
 amResourceUsage is fragile. It depends on the number of live containers 
 belonging to the app. We saw that the spark AM went down without explicitly 
 releasing its requested containers and then one of those containers' memory 
 was counted towards amResource.
 cc - [~sandyr]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3318) Create Initial OrderingPolicy Framework, integrate with CapacityScheduler LeafQueue supporting present behavior

2015-04-02 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14393374#comment-14393374
 ] 

Wangda Tan commented on YARN-3318:
--

[~cwelch],
I took a look at your latest patch as well as [~vinodkv]'s suggestions, 
comments:

*1. I prefer what Vinod suggested: split SchedulerProcess into 
QueueSchedulable and AppSchedulable to avoid notes in the FairScheduler 
Schedulable interface like:*
{code}
/** Start time for jobs in FIFO queues; meaningless for QueueSchedulables.*/
{code}
They can both inherit {{Schedulable}}. With this patch, we can limit ourselves to 
the AppSchedulable and Schedulable definitions.
Also, regarding the schedulable comparator, not all Schedulables fit all 
comparators; it's meaningless to do FIFO scheduling at the parent queue level.
I think:
{code}
Schedulable contains ResourceUsage (class), and name
In addition, AppSchedulable contains compareSubmissionOrderTo(AppSchedulable) 
and Priority
{code}
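A hypothetical Java rendering of that suggestion (the members are taken from the 
lines above; nothing here is committed code):

{code}
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.ResourceUsage;

// Sketch only, mirroring the split suggested above.
interface Schedulable {
  String getName();
  ResourceUsage getResourceUsage();
}

interface AppSchedulable extends Schedulable {
  int compareSubmissionOrderTo(AppSchedulable other);
  Priority getPriority();
}
{code}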

*2. About the inheritance relationships between interfaces/classes: they're not 
very clear to me right now, and I spent some time working out what they're doing. 
My suggestion is:*
{code}
FairOrderingPolicy/FifoOrderingPolicy --> OrderingPolicy
  (implements)
FairOrderingPolicy and FifoOrderingPolicy could inherit from 
AbstractOrderingPolicy with common implementations

FairOrderingPolicy/FifoOrderingPolicy --> 
FairSchedulableComparator/FifoSchedulableComparator
(uses)
There's no need to invent a SchedulerComparator interface; using the existing 
Java Comparator interface should be simple and sufficient.
{code}

*3. Regarding the relationship between OrderingPolicy and comparator:*
I understand the purpose of SchedulerComparator is to reduce unnecessary 
re-sorting of Schedulables being added/modified in the OrderingPolicy, but 
actually we can:
1) Do this in the OrderingPolicy itself. For example, with my above suggestion, 
FifoOrderingPolicy will simply ignore container-changed notifications.
2) A Comparator doesn't know about global info; only the OrderingPolicy knows how 
a combination of Comparators acts, and I don't want containerAllocate/Release 
coupled into the Comparator interface.
And we don't need a separate CompoundComparator; this can be put in 
AbstractOrderingPolicy.

*4. Regarding configuration (CapacitySchedulerConfiguration):*
I think we don't need ORDERING_POLICY_CLASS; two options for a very similar 
purpose can confuse users. I suggest leaving only ordering-policy, and its values 
can be:
fifo, fair regardless of the internal comparator implementation. And in 
the future we can add priority-fifo, priority-fair (note the - in the name 
doesn't mean AND only; it could be a collaboration of the two instead of a simple 
combination).
If the user specifies a name not in the white-list of short names we provide, we 
will try to load a class with that name.

*5. Regarding the longer-term plan, LimitPolicy:*
This part doesn't seem well discussed yet, and to limit the scope of this JIRA I 
think its implementation and definition should happen in a separate ticket.
For the longer-term plan, considering YARN-2986 as well, we may configure a queue 
like the following:
{code}
<queue name="a">
  <queues>
    <queue name="a1">
      <policy-properties>
        <ordering-policy>fair</ordering-policy>
        <limit-policy>
          <user-limit-policy>
            <enabled>true</enabled>
            <user-limit-percentage>50</user-limit-percentage>
          </user-limit-policy>
          <queue-capacity-policy>
            <capacity>..</capacity>
            <max-capacity>..</max-capacity>
          </queue-capacity-policy>
        </limit-policy>
      </policy-properties>
    </queue>
  </queues>
</queue>
{code}
The changes of this patch in CapacitySchedulerConfiguration seem reasonable; as 
Craig mentioned, simply marking them unstable or experimental should be 
enough. The longer-term plan is to define and stabilize YARN-2986 to make a 
truly unified scheduler.

*6. Regarding the scope of this JIRA*
I think it should be fine to include the policy interface definitions as well as 
the CapacityScheduler changes together with this patch (only for 
FifoOrderingPolicy); it's good to see how the interfaces and policies work in CS, 
whether it is easy or not, etc.
The following I suggest moving to a separate ticket:
1) UI (Web and CLI)
2) REST
3) PB-related changes
As the patch keeps changing, you won't have to maintain the above changes 
together with it.

Please feel free to let me know your thoughts.

 Create Initial OrderingPolicy Framework, integrate with CapacityScheduler 
 LeafQueue supporting present behavior
 ---

 Key: YARN-3318
 URL: https://issues.apache.org/jira/browse/YARN-3318
 Project: Hadoop YARN
  Issue Type: 

[jira] [Commented] (YARN-3365) Add support for using the 'tc' tool via container-executor

2015-04-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14393496#comment-14393496
 ] 

Hadoop QA commented on YARN-3365:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12707355/YARN-3365.003.patch
  against trunk revision 6a6a59d.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7203//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7203//console

This message is automatically generated.

 Add support for using the 'tc' tool via container-executor
 --

 Key: YARN-3365
 URL: https://issues.apache.org/jira/browse/YARN-3365
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Reporter: Sidharta Seethana
Assignee: Sidharta Seethana
 Attachments: YARN-3365.001.patch, YARN-3365.002.patch, 
 YARN-3365.003.patch


 We need the following functionality:
 1) modify network interface traffic shaping rules - to be able to attach a 
 qdisc, create child classes, etc.
 2) read existing rules in place
 3) read stats for the various classes
 Using tc requires elevated privileges - hence this functionality is to be 
 made available via container-executor.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3391) Clearly define flow ID/ flow run / flow version in API and storage

2015-04-02 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14393264#comment-14393264
 ] 

Zhijie Shen commented on YARN-3391:
---

[~vrushalic], it sounds good to me to set aside the disagreement on the flow 
name default and move on. As far as I can tell, with the current context info 
data flow, it's quite simple to change the default value if we figure out a 
better one later. In addition, the previous debate is also related to how we show 
flows on the web UI by default. I think we can go back and revisit the defaults 
once we reach the web UI work, when we should have a better idea about it.

 Clearly define flow ID/ flow run / flow version in API and storage
 --

 Key: YARN-3391
 URL: https://issues.apache.org/jira/browse/YARN-3391
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-3391.1.patch


 To continue the discussion in YARN-3040, let's figure out the best way to 
 describe the flow.
 Some key issues that we need to conclude on:
 - How do we include the flow version in the context so that it gets passed 
 into the collector and to the storage eventually?
 - Flow run id should be a number as opposed to a generic string?
 - Default behavior for the flow run id if it is missing (i.e. client did not 
 set it)
 - How do we handle flow attributes in case of nested levels of flows?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3415) Non-AM containers can be counted towards amResourceUsage of a fairscheduler queue

2015-04-02 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14393351#comment-14393351
 ] 

zhihai xu commented on YARN-3415:
-

[~sandyr], thanks for the review. The latest patch YARN-3415.002.patch is 
rebased on the latest code base and it passed the Jenkins test. Let me know 
whether you have more comments on the patch.

 Non-AM containers can be counted towards amResourceUsage of a fairscheduler 
 queue
 -

 Key: YARN-3415
 URL: https://issues.apache.org/jira/browse/YARN-3415
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.6.0
Reporter: Rohit Agarwal
Assignee: zhihai xu
Priority: Critical
 Attachments: YARN-3415.000.patch, YARN-3415.001.patch, 
 YARN-3415.002.patch


 We encountered this problem while running a spark cluster. The 
 amResourceUsage for a queue became artificially high and then the cluster got 
 deadlocked because the maxAMShare constraint kicked in and no new AM got 
 admitted to the cluster.
 I have described the problem in detail here: 
 https://github.com/apache/spark/pull/5233#issuecomment-87160289
 In summary - the condition for adding the container's memory towards 
 amResourceUsage is fragile. It depends on the number of live containers 
 belonging to the app. We saw that the spark AM went down without explicitly 
 releasing its requested containers, and then one of those containers' memory 
 was counted towards amResourceUsage.
 cc - [~sandyr]
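
To make the fragility concrete, here is a small, purely hypothetical sketch (not the actual YARN-3415 patch) contrasting the live-container-count condition with a check tied directly to the attempt's AM container id:

{code}
import java.util.Objects;

// Hypothetical illustration only: why the current condition is fragile, and
// what a more direct check could look like.
public class AmShareAccounting {

    // Fragile: "if the app has no live containers yet, this allocation must be
    // the AM". If the AM exits without releasing its outstanding requests, a
    // later non-AM container can satisfy this and be counted towards
    // amResourceUsage.
    static boolean fragileCheck(int liveContainers) {
        return liveContainers == 0;
    }

    // More direct: compare the allocated container id against the id recorded
    // for the attempt's AM container (both represented as plain strings here).
    static boolean directCheck(String allocatedContainerId, String amContainerId) {
        return Objects.equals(allocatedContainerId, amContainerId);
    }
}
{code}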



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3318) Create Initial OrderingPolicy Framework, integrate with CapacityScheduler LeafQueue supporting present behavior

2015-04-02 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14393347#comment-14393347
 ] 

Vinod Kumar Vavilapalli commented on YARN-3318:
---

bq. I think it is useful to split off CS changes into their own JIRA. We can 
strictly focus on the policy framework here.
You missed this, let's please do this.

bq. well, I'd actually talked Wangda Tan into SchedulerProcess. So, we can chew 
on this a bit more and see where we go
SchedulerProcess is definitely misleading. It seems to point to a process that 
is doing scheduling. What you need is a Schedulable / SchedulableEntity / 
Consumer etc. You could also say SchedulableProcess, but Process is way too 
overloaded.

bq.  The goal is to make this available quickly but iteratively, keeping the 
changes small but making them available for use and feedback. (..) We should 
grow it organically, gradually, iteratively, think of it as a facet of the 
policy framework hooked up and available but with more to follow
I agree with this, but we are not yet in a position to support the APIs, CLI, 
and config names in a stable manner. They may or may not change depending on 
how parent-queue policies and limit policies evolve. For that reason alone, I 
am saying that (1) don't make the configurations public yet, or put a warning 
saying that they are unstable, and (2) don't expose them in the CLI or REST 
APIs yet. It's okay to put them in the web UI; web UI scraping is not a 
contract.

bq. You add/remove applications to/from LeafQueue's policy but 
addition/removal of containers is an event...
bq. This has been factored differently along the lines of Wangda Tan's 
suggestion, it should now be consistent
It's a bit better now, although we are hard-coding Containers. We can revisit 
this later.

Other comments
 - SchedulerApplicationAttempt.getDemand() should be private.
 - SchedulerProcess
-- updateCaches() -> updateState() / updateSchedulingState(), as that is 
what it is doing?
-- getCachedConsumption() / getCachedDemand(): simply getCurrent*()?
 - SchedulerComparator
  -- We aren't comparing Schedulers. Given the current name, it should have 
been SchedulerProcessComparator, but SchedulerProcess itself should be renamed 
as mentioned before.
  -- What is the need for reorderOnContainerAllocate() / 
reorderOnContainerRelease()?
 - Move all the comparator-related classes into their own package.
 - SchedulerComparatorPolicy
  -- This is really a ComparatorBasedOrderingPolicy. Do we really foresee a 
non-comparator-based ordering policy? We are unnecessarily adding two 
abstractions - policies and comparators.
  -- Use className.getName() instead of hardcoded strings like 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.policy.FifoComparator 
(a rough sketch of both points follows below).
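
Purely as an illustration of the last two bullets, here is a minimal Java sketch of a single comparator-based ordering policy configured via Class.getName(); the class and method names are made up and are not from the attached patch:

{code}
import java.util.Comparator;
import java.util.TreeSet;

// Illustrative sketch only; names are not taken from the patch. It shows
// (a) one comparator-based ordering-policy abstraction instead of two, and
// (b) deriving the configured comparator name from the class rather than a
//     hardcoded fully-qualified string.
public class ComparatorBasedOrderingPolicy<E> {

    private final TreeSet<E> ordered;

    public ComparatorBasedOrderingPolicy(Comparator<E> comparator) {
        this.ordered = new TreeSet<>(comparator);
    }

    public void addEntity(E entity)    { ordered.add(entity); }
    public void removeEntity(E entity) { ordered.remove(entity); }
    public Iterable<E> getAssignmentOrder() { return ordered; }

    public static void main(String[] args) {
        // Point (b): spell the configured comparator from the class, e.g.
        //   conf.set(COMPARATOR_KEY, FifoComparator.class.getName());
        // rather than hardcoding the fully-qualified name as a string literal.
        ComparatorBasedOrderingPolicy<String> policy =
            new ComparatorBasedOrderingPolicy<>(String.CASE_INSENSITIVE_ORDER);
        policy.addEntity("app-2");
        policy.addEntity("app-1");
        System.out.println(policy.getAssignmentOrder()); // prints [app-1, app-2]
    }
}
{code}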

 Create Initial OrderingPolicy Framework, integrate with CapacityScheduler 
 LeafQueue supporting present behavior
 ---

 Key: YARN-3318
 URL: https://issues.apache.org/jira/browse/YARN-3318
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: scheduler
Reporter: Craig Welch
Assignee: Craig Welch
 Attachments: YARN-3318.13.patch, YARN-3318.14.patch, 
 YARN-3318.17.patch, YARN-3318.34.patch, YARN-3318.35.patch, 
 YARN-3318.36.patch, YARN-3318.39.patch


 Create the initial framework required for using OrderingPolicies with 
 SchedulerApplicaitonAttempts and integrate with the CapacityScheduler.   This 
 will include an implementation which is compatible with current FIFO behavior.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3442) Consider abstracting out user, app limits etc into some sort of a LimitPolicy

2015-04-02 Thread Vinod Kumar Vavilapalli (JIRA)
Vinod Kumar Vavilapalli created YARN-3442:
-

 Summary: Consider abstracting out user, app limits etc into some 
sort of a LimitPolicy
 Key: YARN-3442
 URL: https://issues.apache.org/jira/browse/YARN-3442
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: scheduler
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli


Similar to the policies being added in YARN-3318 and YARN-3441 for leaf and 
parent queues, we should consider extracting an abstraction for limits too.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3318) Create Initial OrderingPolicy Framework, integrate with CapacityScheduler LeafQueue supporting present behavior

2015-04-02 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14393440#comment-14393440
 ] 

Vinod Kumar Vavilapalli commented on YARN-3318:
---

Filed YARN-3441 and YARN-3442 for parent queues and for limits.


 Create Initial OrderingPolicy Framework, integrate with CapacityScheduler 
 LeafQueue supporting present behavior
 ---

 Key: YARN-3318
 URL: https://issues.apache.org/jira/browse/YARN-3318
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: scheduler
Reporter: Craig Welch
Assignee: Craig Welch
 Attachments: YARN-3318.13.patch, YARN-3318.14.patch, 
 YARN-3318.17.patch, YARN-3318.34.patch, YARN-3318.35.patch, 
 YARN-3318.36.patch, YARN-3318.39.patch


 Create the initial framework required for using OrderingPolicies with 
 SchedulerApplicaitonAttempts and integrate with the CapacityScheduler.   This 
 will include an implementation which is compatible with current FIFO behavior.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3334) [Event Producers] NM TimelineClient life cycle handling and container metrics posting to new timeline service.

2015-04-02 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14393441#comment-14393441
 ] 

Junping Du commented on YARN-3334:
--

Thanks [~zjshen] for the review and comments!
bq. but I undo some unnecessary change in TimelineClientImpl (which seems 
to be added for code debugging).
I think that is a necessary change. The previous message does not convey much 
info, especially since it returns the same message for both no response and a 
failed response. Also, the error code should be logged even when debug is not 
on, because this is a serious failure and should be reported in production 
environments. Thoughts?

 [Event Producers] NM TimelineClient life cycle handling and container metrics 
 posting to new timeline service.
 --

 Key: YARN-3334
 URL: https://issues.apache.org/jira/browse/YARN-3334
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: YARN-2928
Reporter: Junping Du
Assignee: Junping Du
 Attachments: YARN-3334-demo.patch, YARN-3334-v1.patch, 
 YARN-3334-v2.patch, YARN-3334-v3.patch, YARN-3334-v4.patch, 
 YARN-3334-v5.patch, YARN-3334-v6.patch, YARN-3334.7.patch


 After YARN-3039, we have a service discovery mechanism to pass the app-collector 
 service address among collectors, NMs and RM. In this JIRA, we will handle 
 service address setting for TimelineClients in NodeManager, and put container 
 metrics to the backend storage.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3415) Non-AM containers can be counted towards amResourceUsage of a Fair Scheduler queue

2015-04-02 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14393476#comment-14393476
 ] 

zhihai xu commented on YARN-3415:
-

Thanks [~ragarwal] for valuable feedback and filing this issue. Thanks 
[~sandyr]  for valuable feedback and committing the patch! Greatly appreciated.

 Non-AM containers can be counted towards amResourceUsage of a Fair Scheduler 
 queue
 --

 Key: YARN-3415
 URL: https://issues.apache.org/jira/browse/YARN-3415
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.6.0
Reporter: Rohit Agarwal
Assignee: zhihai xu
Priority: Critical
 Fix For: 2.8.0

 Attachments: YARN-3415.000.patch, YARN-3415.001.patch, 
 YARN-3415.002.patch


 We encountered this problem while running a spark cluster. The 
 amResourceUsage for a queue became artificially high and then the cluster got 
 deadlocked because the maxAMShare constraint kicked in and no new AM got 
 admitted to the cluster.
 I have described the problem in detail here: 
 https://github.com/apache/spark/pull/5233#issuecomment-87160289
 In summary - the condition for adding the container's memory towards 
 amResourceUsage is fragile. It depends on the number of live containers 
 belonging to the app. We saw that the spark AM went down without explicitly 
 releasing its requested containers, and then one of those containers' memory 
 was counted towards amResourceUsage.
 cc - [~sandyr]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2666) TestFairScheduler.testContinuousScheduling fails Intermittently

2015-04-02 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-2666:

Attachment: YARN-2666.000.patch

 TestFairScheduler.testContinuousScheduling fails Intermittently
 ---

 Key: YARN-2666
 URL: https://issues.apache.org/jira/browse/YARN-2666
 Project: Hadoop YARN
  Issue Type: Test
  Components: scheduler
Reporter: Tsuyoshi Ozawa
Assignee: zhihai xu
 Attachments: YARN-2666.000.patch


 The test fails on trunk.
 {code}
 Tests run: 79, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 8.698 sec 
  FAILURE! - in 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler
 testContinuousScheduling(org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler)
   Time elapsed: 0.582 sec   FAILURE!
 java.lang.AssertionError: expected:2 but was:1
   at org.junit.Assert.fail(Assert.java:88)
   at org.junit.Assert.failNotEquals(Assert.java:743)
   at org.junit.Assert.assertEquals(Assert.java:118)
   at org.junit.Assert.assertEquals(Assert.java:555)
   at org.junit.Assert.assertEquals(Assert.java:542)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler.testContinuousScheduling(TestFairScheduler.java:3372)
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3410) YARN admin should be able to remove individual application records from RMStateStore

2015-04-02 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14393563#comment-14393563
 ] 

Wangda Tan commented on YARN-3410:
--

Thanks for your comment, [~rohithsharma].

But what's the use case for using rmadmin to remove a state while the RM is 
running? The command is just a way to recover when an app has entered an 
unexpected state and the RM cannot get started. Unless there's a use case for 
doing that, I suggest scoping this to an RM startup option like YARN-2131.

 YARN admin should be able to remove individual application records from 
 RMStateStore
 

 Key: YARN-3410
 URL: https://issues.apache.org/jira/browse/YARN-3410
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager, yarn
Reporter: Wangda Tan
Assignee: Rohith
Priority: Critical

 When the RM state store enters an unexpected state (one example is YARN-2340, 
 where an attempt is not in a final state but the app has already completed), 
 the RM can never come up unless the RMStateStore is formatted.
 I think we should support removing individual application records from the 
 RMStateStore, to unblock the RM admin to choose between waiting for a fix and 
 formatting the state store.
 In addition, the RM should be able to report all fatal errors (which will 
 shut down the RM) when doing app recovery; this can save the admin some time 
 in removing apps in a bad state.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2942) Aggregated Log Files should be combined

2015-04-02 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14393700#comment-14393700
 ] 

Karthik Kambatla commented on YARN-2942:


(Canceled the patch to stop Jenkins from evaluating the design doc :) ) 

[~rkanter] - thanks for updating the design doc. A couple of comments:
# If there is an NM X actively concatenating its logs and NM Y can't acquire 
the lock, what happens? 
## Does it do a blocking-wait? If yes, this should likely be in a separate 
thread.
## I would like for it to be non-blocking. How about a LogConcatenationService 
in the NM? This service is brought up if you enable log concatenation. It 
would periodically go through all of its past aggregated logs and concatenate 
those that it can acquire a lock for (see the rough sketch after this 
comment). Delayed concatenation should be okay because we are doing this 
primarily to handle the problem HDFS has with small files. Also, this way, we 
don't have to do anything different for NM restart. Looking forward, this 
concat service could potentially take input on how busy HDFS is. 
# I didn't completely understand the point about a config to specify the 
format. Are you suggesting we have two different on/off configs - one to turn 
on concatenation and one to specify the format the JHS should be reading? I 
think we need just one config, clearly stating that turning this on for an NM 
(writer) requires that the JHS (reader) already has it enabled. In the case of 
rolling upgrades, this translates to requiring a JHS upgrade prior to the NM 
upgrade.
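
A rough, non-authoritative sketch of the non-blocking service idea from point 1 above; the AggregatedLogStore interface and its tryLock()/concatenate() methods are hypothetical stand-ins, not part of the design doc:

{code}
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Sketch of the non-blocking idea only; AggregatedLogStore is a hypothetical
// stand-in for whatever HDFS-backed layout and locking the design settles on.
public class LogConcatenationService {

    interface AggregatedLogStore {
        List<String> pendingApplications();  // apps with un-concatenated per-NM logs
        boolean tryLock(String appId);       // non-blocking lock attempt
        void concatenate(String appId);      // combine per-NM files into one
        void unlock(String appId);
    }

    private final AggregatedLogStore store;
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor();

    LogConcatenationService(AggregatedLogStore store) {
        this.store = store;
    }

    void start(long periodMinutes) {
        // Periodically sweep past aggregated logs; skip apps whose lock another
        // NM currently holds, and pick them up again on a later sweep.
        scheduler.scheduleWithFixedDelay(() -> {
            for (String appId : store.pendingApplications()) {
                if (store.tryLock(appId)) {
                    try {
                        store.concatenate(appId);
                    } finally {
                        store.unlock(appId);
                    }
                }
            }
        }, periodMinutes, periodMinutes, TimeUnit.MINUTES);
    }

    void stop() {
        scheduler.shutdownNow();
    }
}
{code}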

 Aggregated Log Files should be combined
 ---

 Key: YARN-2942
 URL: https://issues.apache.org/jira/browse/YARN-2942
 Project: Hadoop YARN
  Issue Type: New Feature
Affects Versions: 2.6.0
Reporter: Robert Kanter
Assignee: Robert Kanter
 Attachments: CombinedAggregatedLogsProposal_v3.pdf, 
 CompactedAggregatedLogsProposal_v1.pdf, 
 CompactedAggregatedLogsProposal_v2.pdf, 
 ConcatableAggregatedLogsProposal_v4.pdf, YARN-2942-preliminary.001.patch, 
 YARN-2942-preliminary.002.patch, YARN-2942.001.patch, YARN-2942.002.patch, 
 YARN-2942.003.patch


 Turning on log aggregation allows users to easily store container logs in 
 HDFS and subsequently view them in the YARN web UIs from a central place.  
 Currently, there is a separate log file for each Node Manager.  This can be a 
 problem for HDFS if you have a cluster with many nodes as you’ll slowly start 
 accumulating many (possibly small) files per YARN application.  The current 
 “solution” for this problem is to configure YARN (actually the JHS) to 
 automatically delete these files after some amount of time.  
 We should improve this by compacting the per-node aggregated log files into 
 one log file per application.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3366) Outbound network bandwidth : classify/shape traffic originating from YARN containers

2015-04-02 Thread Sidharta Seethana (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sidharta Seethana updated YARN-3366:

Attachment: YARN-3366.001.patch

Attaching a patch with an implementation of traffic classification/shaping for 
traffic originating from YARN containers. This patch depends on changes/patches 
from https://issues.apache.org/jira/browse/YARN-3365 and  
https://issues.apache.org/jira/browse/YARN-3443

 Outbound network bandwidth : classify/shape traffic originating from YARN 
 containers
 

 Key: YARN-3366
 URL: https://issues.apache.org/jira/browse/YARN-3366
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Sidharta Seethana
Assignee: Sidharta Seethana
 Attachments: YARN-3366.001.patch


 In order to be able to isolate based on/enforce outbound traffic bandwidth 
 limits, we need  a mechanism to classify/shape network traffic in the 
 nodemanager. For more information on the design, please see the attached 
 design document in the parent JIRA.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3435) AM container to be allocated Appattempt AM container shown as null

2015-04-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14393975#comment-14393975
 ] 

Hadoop QA commented on YARN-3435:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12709003/YARN-3435.001.patch
  against trunk revision bad070f.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7208//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7208//console

This message is automatically generated.

 AM container to be allocated Appattempt AM container shown as null
 --

 Key: YARN-3435
 URL: https://issues.apache.org/jira/browse/YARN-3435
 Project: Hadoop YARN
  Issue Type: Bug
 Environment: 1RM,1DN
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Trivial
 Attachments: Screenshot.png, YARN-3435.001.patch


 Submit yarn application
 Open http://rm:8088/cluster/appattempt/appattempt_1427984982805_0003_01 
 Before the AM container is allocated 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3334) [Event Producers] NM TimelineClient life cycle handling and container metrics posting to new timeline service.

2015-04-02 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14393631#comment-14393631
 ] 

Zhijie Shen commented on YARN-3334:
---

If so, I suggest combining the two messages and recording an error-level log 
(the first message is actually useless if we always report the second one).
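
As a small illustration of what the combined, error-level message could look like (commons-logging style; the method and message wording are illustrative only):

{code}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

// Illustrative only: one error-level log that carries both the summary and the
// status code, instead of a debug-only message plus a separate generic one.
public class PutEntitiesLogging {
    private static final Log LOG = LogFactory.getLog(PutEntitiesLogging.class);

    static void onPutFailure(int statusCode, String serverDiagnostics) {
        LOG.error("Failed to put timeline entities: HTTP status " + statusCode
            + (serverDiagnostics == null ? " (no response body)"
                                         : ", " + serverDiagnostics));
    }
}
{code}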

 [Event Producers] NM TimelineClient life cycle handling and container metrics 
 posting to new timeline service.
 --

 Key: YARN-3334
 URL: https://issues.apache.org/jira/browse/YARN-3334
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: YARN-2928
Reporter: Junping Du
Assignee: Junping Du
 Attachments: YARN-3334-demo.patch, YARN-3334-v1.patch, 
 YARN-3334-v2.patch, YARN-3334-v3.patch, YARN-3334-v4.patch, 
 YARN-3334-v5.patch, YARN-3334-v6.patch, YARN-3334.7.patch


 After YARN-3039, we have service discovery mechanism to pass app-collector 
 service address among collectors, NMs and RM. In this JIRA, we will handle 
 service address setting for TimelineClients in NodeManager, and put container 
 metrics to the backend storage.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3434) Interaction between reservations and userlimit can result in significant ULF violation

2015-04-02 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14393711#comment-14393711
 ] 

Wangda Tan commented on YARN-3434:
--

[~tgraves],
I feel like this issue and several related issues are solved by YARN-3243 
already. Could you please check if this problem is already solved?

Thanks,

 Interaction between reservations and userlimit can result in significant ULF 
 violation
 --

 Key: YARN-3434
 URL: https://issues.apache.org/jira/browse/YARN-3434
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 2.6.0
Reporter: Thomas Graves
Assignee: Thomas Graves

 ULF was set to 1.0
 User was able to consume 1.4X queue capacity.
 It looks like when this application launched, it reserved about 1000 
 containers, each 8G each, within about 5 seconds. I think this allowed the 
 logic in assignToUser() to allow the userlimit to be surpassed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-685) Capacity Scheduler is not distributing the reducers tasks across the cluster

2015-04-02 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan resolved YARN-685.
-
Resolution: Invalid

According to the test results from [~raviprak], the CS fairly distributes 
reducers to NMs in the cluster. Resolving this as invalid; please reopen it if 
you still think this is a problem.


 Capacity Scheduler is not distributing the reducers tasks across the cluster
 

 Key: YARN-685
 URL: https://issues.apache.org/jira/browse/YARN-685
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 2.0.4-alpha
Reporter: Devaraj K

 If we have reducers whose total memory required to complete is less than the 
 total cluster memory, the scheduler is not assigning the reducers to all the 
 nodes uniformly (approximately uniformly), even though at that time there are 
 no other jobs or job tasks running in the cluster.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3051) [Storage abstraction] Create backing storage read interface for ATS readers

2015-04-02 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14393750#comment-14393750
 ] 

Sangjin Lee commented on YARN-3051:
---

bq. To plot graphs based on timeseries data, we may need to provide a time 
window for metrics too. This would be useful in the case of the getEntity() 
API. So do we specify this time window separately for each metric to be 
retrieved, or the same one for all metrics?

My sense is that it should be fine to use the same time window for all metrics. 
[~gtCarrera9]? [~zjshen]?

bq. Queries based on relations, i.e. queries such as "get all containers for 
an app". We can return the relatesto field while querying for an app, and then 
the client can use this result to fetch detailed info about the related 
entities. Is that fine? Or do we have to handle it as part of a single query?

For now, let's assume 2 queries from the client side. My thinking was that this 
is an optimization: if the storage can return two levels of entities 
efficiently, we could potentially exploit it. But maybe that's just a 
nice-to-have at the moment.

bq. Some understanding on how flow id, flow run id will be stored is required.

Li just posted the schema design in YARN-3134. That should be helpful.
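
On the time-window point above, a sketch only of the "same window for all metrics" option; the interface and parameter names below are hypothetical, not the read API under discussion:

{code}
import java.util.Map;
import java.util.Set;

// Hypothetical reader signature only: a single [metricsTimeBegin, metricsTimeEnd)
// window applied uniformly to every requested metric of the entity.
public interface TimelineEntityReader {

    Map<String, Object> getEntity(String clusterId,
                                  String entityType,
                                  String entityId,
                                  Set<String> metricsToRetrieve,
                                  long metricsTimeBegin,
                                  long metricsTimeEnd);
}
{code}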

 [Storage abstraction] Create backing storage read interface for ATS readers
 ---

 Key: YARN-3051
 URL: https://issues.apache.org/jira/browse/YARN-3051
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Varun Saxena
 Attachments: YARN-3051_temp.patch


 Per design in YARN-2928, create backing storage read interface that can be 
 implemented by multiple backing storage implementations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3365) Add support for using the 'tc' tool via container-executor

2015-04-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14393773#comment-14393773
 ] 

Hudson commented on YARN-3365:
--

FAILURE: Integrated in Hadoop-trunk-Commit #7500 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7500/])
YARN-3365. Enhanced NodeManager to support using the 'tc' tool via 
container-executor for outbound network traffic control. Contributed by 
Sidharta Seethana. (vinodkv: rev b21c72777ae664b08fd1a93b4f88fa43f2478d94)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.h
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/main.c
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestLinuxContainerExecutor.java


 Add support for using the 'tc' tool via container-executor
 --

 Key: YARN-3365
 URL: https://issues.apache.org/jira/browse/YARN-3365
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Reporter: Sidharta Seethana
Assignee: Sidharta Seethana
 Fix For: 2.8.0

 Attachments: YARN-3365.001.patch, YARN-3365.002.patch, 
 YARN-3365.003.patch


 We need the following functionality :
 1) modify network interface traffic shaping rules - to be able to attach a 
 qdisc, create child classes etc
 2) read existing rules in place 
 3) read stats for the various classes 
 Using tc requires elevated privileges - hence this functionality is to be 
 made available via container-executor. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3444) Fixed typo (capability)

2015-04-02 Thread Gabor Liptak (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14393861#comment-14393861
 ] 

Gabor Liptak commented on YARN-3444:


Pull request at https://github.com/apache/hadoop/pull/15

 Fixed typo (capability)
 ---

 Key: YARN-3444
 URL: https://issues.apache.org/jira/browse/YARN-3444
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: applications/distributed-shell
Reporter: Gabor Liptak
Priority: Minor

 Fixed typo (capability)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3390) RMTimelineCollector should have the context info of each app

2015-04-02 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14393615#comment-14393615
 ] 

Sangjin Lee commented on YARN-3390:
---

I think we need to either pass in the context per call or keep a map of app id 
to context. I would favor the latter approach because it'd be easier from the 
perspective of callers of putEntities().
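
A minimal sketch of the second option, assuming a placeholder AppContext type for whatever per-application info (user, flow, queue, ...) the collector actually needs:

{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Minimal sketch of the "map of app id to context" option; AppContext is a
// placeholder, not an existing YARN type.
public class PerAppContextRegistry {

    public static final class AppContext {
        public final String user;
        public final String flowName;
        public AppContext(String user, String flowName) {
            this.user = user;
            this.flowName = flowName;
        }
    }

    private final Map<String, AppContext> contexts = new ConcurrentHashMap<>();

    public void register(String appId, AppContext context) {
        contexts.put(appId, context);
    }

    public void unregister(String appId) {
        contexts.remove(appId);
    }

    // Callers of putEntities() only pass the appId; the context is looked up here.
    public AppContext lookup(String appId) {
        return contexts.get(appId);
    }
}
{code}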

 RMTimelineCollector should have the context info of each app
 

 Key: YARN-3390
 URL: https://issues.apache.org/jira/browse/YARN-3390
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen

 RMTimelineCollector should have the context info of each app whose entity  
 has been put



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend

2015-04-02 Thread Li Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-3134:

Attachment: YARN-3134DataSchema.pdf

After some community discussion we're finalizing the Phoenix data schema design 
for the very first phase. In this phase we focus on storing basic entities and 
their metrics, configs, and events. The attached document is a summary of our 
discussion results. Comments are more than welcome. 

 [Storage implementation] Exploiting the option of using Phoenix to access 
 HBase backend
 ---

 Key: YARN-3134
 URL: https://issues.apache.org/jira/browse/YARN-3134
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-3134DataSchema.pdf


 Quote the introduction on Phoenix web page:
 {code}
 Apache Phoenix is a relational database layer over HBase delivered as a 
 client-embedded JDBC driver targeting low latency queries over HBase data. 
 Apache Phoenix takes your SQL query, compiles it into a series of HBase 
 scans, and orchestrates the running of those scans to produce regular JDBC 
 result sets. The table metadata is stored in an HBase table and versioned, 
 such that snapshot queries over prior versions will automatically use the 
 correct schema. Direct use of the HBase API, along with coprocessors and 
 custom filters, results in performance on the order of milliseconds for small 
 queries, or seconds for tens of millions of rows.
 {code}
 It may simplify how our implementation reads/writes data from/to HBase, and 
 make it easy to build indexes and compose complex queries.
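
For context, a minimal illustration of the client-embedded JDBC usage pattern described above, assuming the Phoenix client driver is on the classpath; the ZooKeeper quorum, table, and columns are placeholders, not the schema proposed in the attached document:

{code}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

// Illustrative only: Phoenix is accessed through plain JDBC. The connection
// string points at a placeholder ZooKeeper quorum; the table and columns are
// made up for this example.
public class PhoenixExample {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost:2181")) {
            conn.createStatement().execute(
                "CREATE TABLE IF NOT EXISTS demo_entity "
                + "(entity_id VARCHAR PRIMARY KEY, created_time BIGINT)");

            try (PreparedStatement ps = conn.prepareStatement(
                     "UPSERT INTO demo_entity VALUES (?, ?)")) {
                ps.setString(1, "entity-1");
                ps.setLong(2, System.currentTimeMillis());
                ps.executeUpdate();
            }
            conn.commit(); // Phoenix connections default to auto-commit off

            try (ResultSet rs = conn.createStatement().executeQuery(
                     "SELECT entity_id, created_time FROM demo_entity")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1) + " " + rs.getLong(2));
                }
            }
        }
    }
}
{code}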



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2666) TestFairScheduler.testContinuousScheduling fails Intermittently

2015-04-02 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14393743#comment-14393743
 ] 

zhihai xu commented on YARN-2666:
-

Hi [~ozawa], I rebased the patch YARN-2666.000.patch on the latest code base 
and it passed the Jenkins test. 
Do you have time to review/commit the patch? Many thanks.

 TestFairScheduler.testContinuousScheduling fails Intermittently
 ---

 Key: YARN-2666
 URL: https://issues.apache.org/jira/browse/YARN-2666
 Project: Hadoop YARN
  Issue Type: Test
  Components: scheduler
Reporter: Tsuyoshi Ozawa
Assignee: zhihai xu
 Attachments: YARN-2666.000.patch


 The test fails on trunk.
 {code}
 Tests run: 79, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 8.698 sec 
  FAILURE! - in 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler
 testContinuousScheduling(org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler)
   Time elapsed: 0.582 sec   FAILURE!
 java.lang.AssertionError: expected:2 but was:1
   at org.junit.Assert.fail(Assert.java:88)
   at org.junit.Assert.failNotEquals(Assert.java:743)
   at org.junit.Assert.assertEquals(Assert.java:118)
   at org.junit.Assert.assertEquals(Assert.java:555)
   at org.junit.Assert.assertEquals(Assert.java:542)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler.testContinuousScheduling(TestFairScheduler.java:3372)
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2901) Add errors and warning metrics page to RM, NM web UI

2015-04-02 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-2901:
-
Summary: Add errors and warning metrics page to RM, NM web UI  (was: Add 
errors and warning stats to RM, NM web UI)

 Add errors and warning metrics page to RM, NM web UI
 

 Key: YARN-2901
 URL: https://issues.apache.org/jira/browse/YARN-2901
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: nodemanager, resourcemanager
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: Exception collapsed.png, Exception expanded.jpg, Screen 
 Shot 2015-03-19 at 7.40.02 PM.png, apache-yarn-2901.0.patch, 
 apache-yarn-2901.1.patch, apache-yarn-2901.2.patch, apache-yarn-2901.3.patch, 
 apache-yarn-2901.4.patch, apache-yarn-2901.5.patch


 It would be really useful to have statistics on the number of errors and 
 warnings in the RM and NM web UI. I'm thinking about -
 1. The number of errors and warnings in the past 5 min/1 hour/12 hours/day
 2. The top 'n'(20?) most common exceptions in the past 5 min/1 hour/12 
 hours/day
 By errors and warnings I'm referring to the log level.
 I suspect we can probably achieve this by writing a custom appender? (I'm open 
 to suggestions on alternate mechanisms for implementing this).
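
A rough sketch of the custom-appender idea using the log4j 1.2 AppenderSkeleton API; it only counts ERROR/WARN events, whereas the real feature would also need time bucketing and tracking of the most common exceptions:

{code}
import java.util.concurrent.atomic.AtomicLong;
import org.apache.log4j.AppenderSkeleton;
import org.apache.log4j.Level;
import org.apache.log4j.spi.LoggingEvent;

// Sketch only: count ERROR/WARN events so a web page could report them.
public class ErrorWarningCountingAppender extends AppenderSkeleton {

    private static final AtomicLong errors = new AtomicLong();
    private static final AtomicLong warnings = new AtomicLong();

    @Override
    protected void append(LoggingEvent event) {
        if (event.getLevel().isGreaterOrEqual(Level.ERROR)) {
            errors.incrementAndGet();
        } else if (event.getLevel().equals(Level.WARN)) {
            warnings.incrementAndGet();
        }
    }

    public static long getErrorCount()   { return errors.get(); }
    public static long getWarningCount() { return warnings.get(); }

    @Override
    public void close() { }

    @Override
    public boolean requiresLayout() { return false; }
}
{code}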



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2901) Add errors and warning metrics page to RM, NM web UI

2015-04-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14393810#comment-14393810
 ] 

Hudson commented on YARN-2901:
--

FAILURE: Integrated in Hadoop-trunk-Commit #7501 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7501/])
YARN-2901. Add errors and warning metrics page to RM, NM web UI. (Varun Vasudev 
via wangda) (wangda: rev bad070fe15a642cc6f3a165612fbd272187e03cb)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/NavBlock.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/NMController.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/ErrorsAndWarningsBlock.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/WebServer.java
* hadoop-common-project/hadoop-common/src/main/conf/log4j.properties
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/NMErrorsAndWarningsPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RmController.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/Log4jWarningErrorMetricsAppender.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMErrorsAndWarningsPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebApp.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/NavBlock.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestLog4jWarningErrorMetricsAppender.java


 Add errors and warning metrics page to RM, NM web UI
 

 Key: YARN-2901
 URL: https://issues.apache.org/jira/browse/YARN-2901
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: nodemanager, resourcemanager
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Fix For: 2.8.0

 Attachments: Exception collapsed.png, Exception expanded.jpg, Screen 
 Shot 2015-03-19 at 7.40.02 PM.png, apache-yarn-2901.0.patch, 
 apache-yarn-2901.1.patch, apache-yarn-2901.2.patch, apache-yarn-2901.3.patch, 
 apache-yarn-2901.4.patch, apache-yarn-2901.5.patch


 It would be really useful to have statistics on the number of errors and 
 warnings in the RM and NM web UI. I'm thinking about -
 1. The number of errors and warnings in the past 5 min/1 hour/12 hours/day
 2. The top 'n'(20?) most common exceptions in the past 5 min/1 hour/12 
 hours/day
 By errors and warnings I'm referring to the log level.
 I suspect we can probably achieve this by writing a custom appender? (I'm open 
 to suggestions on alternate mechanisms for implementing this).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3443) Create a 'ResourceHandler' subsystem to ease addition of support for new resource types on the NM

2015-04-02 Thread Sidharta Seethana (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sidharta Seethana updated YARN-3443:

Attachment: YARN-3443.001.patch

Attaching a patch that 1) separates out the CGroup implementation into a 
reusable class, 2) creates a 'PrivilegedContainerExecutor' that wraps the 
container-executor binary and can be used for operations that require elevated 
privileges, and 3) creates a simple ResourceHandler interface that can be used 
to plug in support for new resource types.
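
As a purely hypothetical sketch of what a "simple ResourceHandler interface" could look like (the method names and types below are placeholders, not the interface in YARN-3443.001.patch):

{code}
import java.util.List;

// Hypothetical shape only. The idea: each resource type (network, disk, ...)
// plugs in a handler that hooks the container lifecycle.
public interface ResourceHandler {

    /** One-time setup, e.g. locating or mounting the relevant cgroup hierarchy. */
    void bootstrap();

    /** Operations (possibly privileged) to run before a container is launched. */
    List<String> preStart(String containerId);

    /** Cleanup to run after the container has finished. */
    List<String> postComplete(String containerId);

    /** Tear down anything created in bootstrap(). */
    void teardown();
}
{code}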

 Create a 'ResourceHandler' subsystem to ease addition of support for new 
 resource types on the NM
 -

 Key: YARN-3443
 URL: https://issues.apache.org/jira/browse/YARN-3443
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Reporter: Sidharta Seethana
Assignee: Sidharta Seethana
 Attachments: YARN-3443.001.patch


 The current cgroups implementation is closely tied to supporting CPU as a 
 resource. We need to separate out CGroups support as well as provide a simple 
 ResourceHandler subsystem that will enable us to add support for new resource 
 types on the NM - e.g. Network, Disk etc. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3391) Clearly define flow ID/ flow run / flow version in API and storage

2015-04-02 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14393645#comment-14393645
 ] 

Sangjin Lee commented on YARN-3391:
---

I am fine with tabling this discussion and revisiting it later in the interest 
of making progress.

I just wanted to add my 2 cents that this is something we already see and 
experience with hRaven so it's not theoretical. That's the context from our 
side. The way I see it is that apps that do not have the flow name are 
basically a degenerate case of a single-app flow. This is unrelated to the 
app-to-flow aggregation. It has to do with the flowRun-to-flow aggregation. And 
it's something we want the users to do when they can set the flow name. FWIW...

 Clearly define flow ID/ flow run / flow version in API and storage
 --

 Key: YARN-3391
 URL: https://issues.apache.org/jira/browse/YARN-3391
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-3391.1.patch


 To continue the discussion in YARN-3040, let's figure out the best way to 
 describe the flow.
 Some key issues that we need to conclude on:
 - How do we include the flow version in the context so that it gets passed 
 into the collector and to the storage eventually?
 - Flow run id should be a number as opposed to a generic string?
 - Default behavior for the flow run id if it is missing (i.e. client did not 
 set it)
 - How do we handle flow attributes in case of nested levels of flows?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3365) Add support for using the 'tc' tool via container-executor

2015-04-02 Thread Sidharta Seethana (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14393789#comment-14393789
 ] 

Sidharta Seethana commented on YARN-3365:
-

Actually, never mind - it seems like the banned user list wasn't affected.

-Sid

 Add support for using the 'tc' tool via container-executor
 --

 Key: YARN-3365
 URL: https://issues.apache.org/jira/browse/YARN-3365
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Reporter: Sidharta Seethana
Assignee: Sidharta Seethana
 Fix For: 2.8.0

 Attachments: YARN-3365.001.patch, YARN-3365.002.patch, 
 YARN-3365.003.patch


 We need the following functionality :
 1) modify network interface traffic shaping rules - to be able to attach a 
 qdisc, create child classes etc
 2) read existing rules in place 
 3) read stats for the various classes 
 Using tc requires elevated privileges - hence this functionality is to be 
 made available via container-executor. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3443) Create a 'ResourceHandler' subsystem to ease addition of support for new resource types on the NM

2015-04-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14394003#comment-14394003
 ] 

Hadoop QA commented on YARN-3443:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12709150/YARN-3443.001.patch
  against trunk revision bad070f.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

  {color:red}-1 javac{color}.  The applied patch generated 1150 javac 
compiler warnings (more than the trunk's current 1148 warnings).

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7210//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/7210//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-nodemanager.html
Javac warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/7210//artifact/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7210//console

This message is automatically generated.

 Create a 'ResourceHandler' subsystem to ease addition of support for new 
 resource types on the NM
 -

 Key: YARN-3443
 URL: https://issues.apache.org/jira/browse/YARN-3443
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Reporter: Sidharta Seethana
Assignee: Sidharta Seethana
 Attachments: YARN-3443.001.patch


 The current cgroups implementation is closely tied to supporting CPU as a 
 resource. We need to separate out CGroups support as well as provide a simple 
 ResourceHandler subsystem that will enable us to add support for new resource 
 types on the NM - e.g. Network, Disk etc. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2666) TestFairScheduler.testContinuousScheduling fails Intermittently

2015-04-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14393725#comment-14393725
 ] 

Hadoop QA commented on YARN-2666:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12709083/YARN-2666.000.patch
  against trunk revision 6a6a59d.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7207//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7207//console

This message is automatically generated.

 TestFairScheduler.testContinuousScheduling fails Intermittently
 ---

 Key: YARN-2666
 URL: https://issues.apache.org/jira/browse/YARN-2666
 Project: Hadoop YARN
  Issue Type: Test
  Components: scheduler
Reporter: Tsuyoshi Ozawa
Assignee: zhihai xu
 Attachments: YARN-2666.000.patch


 The test fails on trunk.
 {code}
 Tests run: 79, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 8.698 sec 
  FAILURE! - in 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler
 testContinuousScheduling(org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler)
   Time elapsed: 0.582 sec   FAILURE!
 java.lang.AssertionError: expected:2 but was:1
   at org.junit.Assert.fail(Assert.java:88)
   at org.junit.Assert.failNotEquals(Assert.java:743)
   at org.junit.Assert.assertEquals(Assert.java:118)
   at org.junit.Assert.assertEquals(Assert.java:555)
   at org.junit.Assert.assertEquals(Assert.java:542)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler.testContinuousScheduling(TestFairScheduler.java:3372)
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3334) [Event Producers] NM TimelineClient life cycle handling and container metrics posting to new timeline service.

2015-04-02 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14393723#comment-14393723
 ] 

Sangjin Lee commented on YARN-3334:
---

I took a quick look at the latest patch, and it looks good for the most part.

However, I do worry about the size of the map produced in the response in 
ResourceTrackerService. It can potentially be quite large every time, and it 
has a potential impact on many things since it is part of the NM heartbeat 
handling. It's OK for now, but we should try to address it sooner rather than 
later.

 [Event Producers] NM TimelineClient life cycle handling and container metrics 
 posting to new timeline service.
 --

 Key: YARN-3334
 URL: https://issues.apache.org/jira/browse/YARN-3334
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: YARN-2928
Reporter: Junping Du
Assignee: Junping Du
 Attachments: YARN-3334-demo.patch, YARN-3334-v1.patch, 
 YARN-3334-v2.patch, YARN-3334-v3.patch, YARN-3334-v4.patch, 
 YARN-3334-v5.patch, YARN-3334-v6.patch, YARN-3334.7.patch


 After YARN-3039, we have a service discovery mechanism to pass the app-collector 
 service address among collectors, NMs and RM. In this JIRA, we will handle 
 service address setting for TimelineClients in NodeManager, and put container 
 metrics to the backend storage.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3443) Create a 'ResourceHandler' subsystem to ease addition of support for new resource types on the NM

2015-04-02 Thread Sidharta Seethana (JIRA)
Sidharta Seethana created YARN-3443:
---

 Summary: Create a 'ResourceHandler' subsystem to ease addition of 
support for new resource types on the NM
 Key: YARN-3443
 URL: https://issues.apache.org/jira/browse/YARN-3443
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Reporter: Sidharta Seethana
Assignee: Sidharta Seethana


The current cgroups implementation is closely tied to supporting CPU as a 
resource. We need to separate out CGroups support as well as provide a simple 
ResourceHandler subsystem that will enable us to add support for new resource 
types on the NM - e.g. Network, Disk etc. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3444) Fixed typo (capability)

2015-04-02 Thread Gabor Liptak (JIRA)
Gabor Liptak created YARN-3444:
--

 Summary: Fixed typo (capability)
 Key: YARN-3444
 URL: https://issues.apache.org/jira/browse/YARN-3444
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: applications/distributed-shell
Reporter: Gabor Liptak
Priority: Minor


Fixed typo (capability)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2666) TestFairScheduler.testContinuousScheduling fails Intermittently

2015-04-02 Thread Tsuyoshi Ozawa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14393976#comment-14393976
 ] 

Tsuyoshi Ozawa commented on YARN-2666:
--

OK, I'll check it.

 TestFairScheduler.testContinuousScheduling fails Intermittently
 ---

 Key: YARN-2666
 URL: https://issues.apache.org/jira/browse/YARN-2666
 Project: Hadoop YARN
  Issue Type: Test
  Components: scheduler
Reporter: Tsuyoshi Ozawa
Assignee: zhihai xu
 Attachments: YARN-2666.000.patch


 The test fails on trunk.
 {code}
 Tests run: 79, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 8.698 sec 
  FAILURE! - in 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler
 testContinuousScheduling(org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler)
   Time elapsed: 0.582 sec   FAILURE!
 java.lang.AssertionError: expected:2 but was:1
   at org.junit.Assert.fail(Assert.java:88)
   at org.junit.Assert.failNotEquals(Assert.java:743)
   at org.junit.Assert.assertEquals(Assert.java:118)
   at org.junit.Assert.assertEquals(Assert.java:555)
   at org.junit.Assert.assertEquals(Assert.java:542)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler.testContinuousScheduling(TestFairScheduler.java:3372)
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3365) Add support for using the 'tc' tool via container-executor

2015-04-02 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-3365:
--
Fix Version/s: 2.8.0

 Add support for using the 'tc' tool via container-executor
 --

 Key: YARN-3365
 URL: https://issues.apache.org/jira/browse/YARN-3365
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Reporter: Sidharta Seethana
Assignee: Sidharta Seethana
 Fix For: 2.8.0

 Attachments: YARN-3365.001.patch, YARN-3365.002.patch, 
 YARN-3365.003.patch


 We need the following functionality :
 1) modify network interface traffic shaping rules - to be able to attach a 
 qdisc, create child classes etc
 2) read existing rules in place 
 3) read stats for the various classes 
 Using tc requires elevated privileges - hence this functionality is to be 
 made available via container-executor. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3365) Add support for using the 'tc' tool via container-executor

2015-04-02 Thread Sidharta Seethana (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14393775#comment-14393775
 ] 

Sidharta Seethana commented on YARN-3365:
-

Thanks, Vinod! We'll need a small patch to undo the banned users change in 
branch-2.

 Add support for using the 'tc' tool via container-executor
 --

 Key: YARN-3365
 URL: https://issues.apache.org/jira/browse/YARN-3365
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Reporter: Sidharta Seethana
Assignee: Sidharta Seethana
 Fix For: 2.8.0

 Attachments: YARN-3365.001.patch, YARN-3365.002.patch, 
 YARN-3365.003.patch


 We need the following functionality :
 1) modify network interface traffic shaping rules - to be able to attach a 
 qdisc, create child classes etc
 2) read existing rules in place 
 3) read stats for the various classes 
 Using tc requires elevated privileges - hence this functionality is to be 
 made available via container-executor. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3366) Outbound network bandwidth : classify/shape traffic originating from YARN containers

2015-04-02 Thread Sidharta Seethana (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14393971#comment-14393971
 ] 

Sidharta Seethana commented on YARN-3366:
-

Since this patch requires uncommitted changes from 
https://issues.apache.org/jira/browse/YARN-3443, I am not submitting this patch 
to a pre-commit build for the time being.

 Outbound network bandwidth : classify/shape traffic originating from YARN 
 containers
 

 Key: YARN-3366
 URL: https://issues.apache.org/jira/browse/YARN-3366
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Sidharta Seethana
Assignee: Sidharta Seethana
 Attachments: YARN-3366.001.patch


 In order to be able to isolate based on/enforce outbound traffic bandwidth 
 limits, we need  a mechanism to classify/shape network traffic in the 
 nodemanager. For more information on the design, please see the attached 
 design document in the parent JIRA.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3390) RMTimelineCollector should have the context info of each app

2015-04-02 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14393643#comment-14393643
 ] 

Zhijie Shen commented on YARN-3390:
---

bq. I would favor the latter approach 

+1

 RMTimelineCollector should have the context info of each app
 

 Key: YARN-3390
 URL: https://issues.apache.org/jira/browse/YARN-3390
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen

 RMTimelineCollector should have the context info of each app whose entity  
 has been put



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2942) Aggregated Log Files should be combined

2015-04-02 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14393717#comment-14393717
 ] 

Robert Kanter commented on YARN-2942:
-

Yes, it does a blocking wait.  I think this will end up being in a separate 
thread anyway because it's being done after uploading the logs to HDFS.  
However, I think making it a separate service is a good idea.  As you said, 
this handles NM restart, and allows us to later add more flexibility.

If you upgrade the NM before the JHS, it's not the end of the world.  New logs 
wouldn't be found by the JHS, but that only hurts users trying to view those 
logs through the JHS.  Once the JHS is updated, they would be viewable.  In any 
case, having the two configs is probably more confusing than it needs to be for 
the user, and we'd have to take care of the case where the new format is 
disabled but concatenation is enabled (which is invalid).  I think we should 
just make this one config: either the new format and concatenation are both 
enabled, or neither is.

I'll post an updated doc shortly.

 Aggregated Log Files should be combined
 ---

 Key: YARN-2942
 URL: https://issues.apache.org/jira/browse/YARN-2942
 Project: Hadoop YARN
  Issue Type: New Feature
Affects Versions: 2.6.0
Reporter: Robert Kanter
Assignee: Robert Kanter
 Attachments: CombinedAggregatedLogsProposal_v3.pdf, 
 CompactedAggregatedLogsProposal_v1.pdf, 
 CompactedAggregatedLogsProposal_v2.pdf, 
 ConcatableAggregatedLogsProposal_v4.pdf, YARN-2942-preliminary.001.patch, 
 YARN-2942-preliminary.002.patch, YARN-2942.001.patch, YARN-2942.002.patch, 
 YARN-2942.003.patch


 Turning on log aggregation allows users to easily store container logs in 
 HDFS and subsequently view them in the YARN web UIs from a central place.  
 Currently, there is a separate log file for each Node Manager.  This can be a 
 problem for HDFS if you have a cluster with many nodes as you’ll slowly start 
 accumulating many (possibly small) files per YARN application.  The current 
 “solution” for this problem is to configure YARN (actually the JHS) to 
 automatically delete these files after some amount of time.  
 We should improve this by compacting the per-node aggregated log files into 
 one log file per application.
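
As one rough sketch of the combining step, assuming the per-node files for an 
application sit in a single directory and satisfy HDFS concat's preconditions 
(matching block sizes, full intermediate blocks); the paths are hypothetical 
and this is not necessarily the approach in the attached proposals.
{code}
// Sketch only: merge per-node aggregated log files into one file per
// application with DistributedFileSystem.concat().
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public class CombineAppLogsSketch {
  public static void combine(DistributedFileSystem dfs, Path appLogDir)
      throws Exception {
    FileStatus[] nodeFiles = dfs.listStatus(appLogDir);
    if (nodeFiles.length < 2) {
      return; // nothing to combine
    }
    Path target = nodeFiles[0].getPath();
    Path[] sources = new Path[nodeFiles.length - 1];
    for (int i = 1; i < nodeFiles.length; i++) {
      sources[i - 1] = nodeFiles[i].getPath();
    }
    // concat() moves the source blocks onto the end of the target and deletes
    // the sources; no data is copied, so many small files become one.
    dfs.concat(target, sources);
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path appLogDir = new Path("/app-logs/user/logs/application_1234567890123_0001");
    DistributedFileSystem dfs =
        (DistributedFileSystem) appLogDir.getFileSystem(conf);
    combine(dfs, appLogDir);
  }
}
{code}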



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2942) Aggregated Log Files should be combined

2015-04-02 Thread Robert Kanter (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Kanter updated YARN-2942:

Attachment: ConcatableAggregatedLogsProposal_v5.pdf

I've uploaded a v5 doc which addresses those changes.  I also clarified a few 
other things in there.

 Aggregated Log Files should be combined
 ---

 Key: YARN-2942
 URL: https://issues.apache.org/jira/browse/YARN-2942
 Project: Hadoop YARN
  Issue Type: New Feature
Affects Versions: 2.6.0
Reporter: Robert Kanter
Assignee: Robert Kanter
 Attachments: CombinedAggregatedLogsProposal_v3.pdf, 
 CompactedAggregatedLogsProposal_v1.pdf, 
 CompactedAggregatedLogsProposal_v2.pdf, 
 ConcatableAggregatedLogsProposal_v4.pdf, 
 ConcatableAggregatedLogsProposal_v5.pdf, YARN-2942-preliminary.001.patch, 
 YARN-2942-preliminary.002.patch, YARN-2942.001.patch, YARN-2942.002.patch, 
 YARN-2942.003.patch


 Turning on log aggregation allows users to easily store container logs in 
 HDFS and subsequently view them in the YARN web UIs from a central place.  
 Currently, there is a separate log file for each Node Manager.  This can be a 
 problem for HDFS if you have a cluster with many nodes as you’ll slowly start 
 accumulating many (possibly small) files per YARN application.  The current 
 “solution” for this problem is to configure YARN (actually the JHS) to 
 automatically delete these files after some amount of time.  
 We should improve this by compacting the per-node aggregated log files into 
 one log file per application.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3436) Doc WebServicesIntro.html Example Rest API url wrong

2015-04-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14393935#comment-14393935
 ] 

Hadoop QA commented on YARN-3436:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12709010/YARN-3436.001.patch
  against trunk revision bad070f.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7209//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7209//console

This message is automatically generated.

 Doc WebServicesIntro.html Example Rest API url wrong
 

 Key: YARN-3436
 URL: https://issues.apache.org/jira/browse/YARN-3436
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Minor
 Attachments: YARN-3436.001.patch


 /docs/current/hadoop-yarn/hadoop-yarn-site/WebServicesIntro.html
 {quote}
 Response Examples
 JSON response with single resource
 HTTP Request: GET 
 http://rmhost.domain:8088/ws/v1/cluster/{color:red}app{color}/application_1324057493980_0001
 Response Status Line: HTTP/1.1 200 OK
 {quote}
 Url should be ws/v1/cluster/{color:red}apps{color} .
 2 examples on same page are wrong



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2424) LCE should support non-cgroups, non-secure mode

2015-04-02 Thread Sidharta Seethana (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14393938#comment-14393938
 ] 

Sidharta Seethana commented on YARN-2424:
-

It looks like different versions of the patch to fix this were committed to 
branch-2 and trunk? The corresponding changes to LinuxContainerExecutor.java 
look different. 

 LCE should support non-cgroups, non-secure mode
 ---

 Key: YARN-2424
 URL: https://issues.apache.org/jira/browse/YARN-2424
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.3.0, 2.4.0, 2.5.0, 2.4.1
Reporter: Allen Wittenauer
Assignee: Allen Wittenauer
Priority: Blocker
 Fix For: 2.6.0

 Attachments: Y2424-1.patch, YARN-2424.patch


 After YARN-1253, LCE no longer works for non-secure, non-cgroup scenarios.  
 This is a fairly serious regression, as turning on LCE prior to turning on 
 full-blown security is a fairly standard procedure.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3390) RMTimelineCollector should have the context info of each app

2015-04-02 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14394059#comment-14394059
 ] 

Naganarasimha G R commented on YARN-3390:
-

Thanks for the feedback [~zjshen]  [~sjlee0],
bq. either pass in the context per call or have a map of app id to context. I 
would favor the latter approach because it'd be easier on the perspective of 
callers of putEntities().
I too agree it would be easier from the perspective of callers of 
putEntities(), but if we favor a map of {{app id to context}}: 
* the implicit assumption would be that {{putEntities(TimelineEntities)}} is 
always called for a single appId (i.e. all entities will have the same context)
* TimelineEntities as such do not carry the appId explicitly, so I am planning 
to modify {{TimelineCollector.getTimelineEntityContext()}} to 
{{TimelineCollector.getTimelineEntityContext(TimelineEntity.Identifier id)}}, 
and subclasses of TimelineCollector can take care of mapping the Id to the 
Context (via the appId) if required.
* the code of {{putEntities(TimelineEntities)}} would look something like 
{code}
// Use the first entity's identifier to look up the collector context for the
// whole batch (all entities are assumed to belong to the same app).
Iterator<TimelineEntity> iterator = entities.getEntities().iterator();
TimelineEntity next = iterator.hasNext() ? iterator.next() : null;
if (next != null) {
  TimelineCollectorContext context =
      getTimelineEntityContext(next.getIdentifier());
  return writer.write(context.getClusterId(), context.getUserId(),
      context.getFlowId(), context.getFlowRunId(), context.getAppId(),
      entities);
}
{code}

If that's OK, shall I work on it?



 RMTimelineCollector should have the context info of each app
 

 Key: YARN-3390
 URL: https://issues.apache.org/jira/browse/YARN-3390
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen

 RMTimelineCollector should have the context info of each app whose entity  
 has been put



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3334) [Event Producers] NM TimelineClient life cycle handling and container metrics posting to new timeline service.

2015-04-02 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-3334:
-
Attachment: YARN-3334-v8.patch

Uploaded the v8 patch to address minor comments on logging in TimelineClientImpl.

 [Event Producers] NM TimelineClient life cycle handling and container metrics 
 posting to new timeline service.
 --

 Key: YARN-3334
 URL: https://issues.apache.org/jira/browse/YARN-3334
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: YARN-2928
Reporter: Junping Du
Assignee: Junping Du
 Attachments: YARN-3334-demo.patch, YARN-3334-v1.patch, 
 YARN-3334-v2.patch, YARN-3334-v3.patch, YARN-3334-v4.patch, 
 YARN-3334-v5.patch, YARN-3334-v6.patch, YARN-3334-v8.patch, YARN-3334.7.patch


 After YARN-3039, we have service discovery mechanism to pass app-collector 
 service address among collectors, NMs and RM. In this JIRA, we will handle 
 service address setting for TimelineClients in NodeManager, and put container 
 metrics to the backend storage.
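
A minimal sketch of the NM-side flow being described, under the assumption 
that the YARN-2928 branch TimelineClient exposes per-app creation, 
setTimelineServiceAddress() and a v2 putEntities(); the class and method names 
here are placeholders based on that discussion, not code from the patch.
{code}
// Hypothetical sketch only: publish a container entity to the per-app
// collector once its address has been discovered via the NM-RM heartbeat.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ContainerId;
import org.apache.hadoop.yarn.api.records.timelineservice.TimelineEntity;
import org.apache.hadoop.yarn.client.api.TimelineClient;

public class NMContainerMetricsSketch {
  public static void publish(Configuration conf, ApplicationId appId,
      ContainerId containerId, String collectorAddr) throws Exception {
    TimelineClient client = TimelineClient.createTimelineClient(appId);
    client.init(conf);
    client.start();
    try {
      // Address comes from the collector info carried in the heartbeat (YARN-3039).
      client.setTimelineServiceAddress(collectorAddr);

      TimelineEntity entity = new TimelineEntity();
      entity.setType("YARN_CONTAINER");
      entity.setId(containerId.toString());
      client.putEntities(entity);
    } finally {
      client.stop();
    }
  }
}
{code}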



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3445) NM notify RM on running Apps in NM-RM heartbeat

2015-04-02 Thread Junping Du (JIRA)
Junping Du created YARN-3445:


 Summary: NM notify RM on running Apps in NM-RM heartbeat
 Key: YARN-3445
 URL: https://issues.apache.org/jira/browse/YARN-3445
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager, resourcemanager
Affects Versions: 2.7.0
Reporter: Junping Du
Assignee: Junping Du


Per discussion in YARN-3334, we need to filter out unnecessary collector info 
from the RM in the heartbeat response. Our proposal is to add an additional 
field for running apps to the NM heartbeat request, so the RM only sends back 
collectors for locally running apps. This is also needed in YARN-914 (graceful 
decommission): if an NM in the decommissioning stage has no running apps, it 
can be decommissioned immediately. 
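
A rough sketch of the RM-side filtering this enables; the surrounding 
NodeHeartbeatRequest/Response wiring is not shown and the names here are 
placeholders, the point being that only collectors for apps the NM reported as 
running go back in the response.
{code}
// Sketch only: given the apps an NM reports as running and the RM's map of
// registered per-app collectors, keep only the entries relevant to that node.
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.apache.hadoop.yarn.api.records.ApplicationId;

public class CollectorFilterSketch {
  public static Map<ApplicationId, String> collectorsForNode(
      List<ApplicationId> runningAppsOnNode,
      Map<ApplicationId, String> registeredCollectors) {
    Map<ApplicationId, String> result = new HashMap<ApplicationId, String>();
    for (ApplicationId appId : runningAppsOnNode) {
      String addr = registeredCollectors.get(appId);
      if (addr != null) {
        result.put(appId, addr); // only collectors for apps running on this NM
      }
    }
    return result;
  }
}
{code}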



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3334) [Event Producers] NM TimelineClient life cycle handling and container metrics posting to new timeline service.

2015-04-02 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14394052#comment-14394052
 ] 

Junping Du commented on YARN-3334:
--

Thanks [~zjshen] and [~sjlee0] for comments!
bq. If so, I suggest combining the two messages together, and record an 
error-level log (the first message is actually useless, if we always report the 
second one).
That sounds OK. Will update a quick fix.

bq. However, I do worry about the size of the map produced in the response in 
ResourceTrackerService. It can be potentially quite large every time and has a 
potential impact on many things as it is part of the NM heartbeat handling. 
It's OK for now, but we should try to address it sooner than later.
Just filed YARN-3445 to track this issue. This is also needed for graceful 
decommission (YARN-914) - a decommissioning node can be terminated earlier by 
the RM if it has no running apps.

 [Event Producers] NM TimelineClient life cycle handling and container metrics 
 posting to new timeline service.
 --

 Key: YARN-3334
 URL: https://issues.apache.org/jira/browse/YARN-3334
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: YARN-2928
Reporter: Junping Du
Assignee: Junping Du
 Attachments: YARN-3334-demo.patch, YARN-3334-v1.patch, 
 YARN-3334-v2.patch, YARN-3334-v3.patch, YARN-3334-v4.patch, 
 YARN-3334-v5.patch, YARN-3334-v6.patch, YARN-3334.7.patch


 After YARN-3039, we have service discovery mechanism to pass app-collector 
 service address among collectors, NMs and RM. In this JIRA, we will handle 
 service address setting for TimelineClients in NodeManager, and put container 
 metrics to the backend storage.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3334) [Event Producers] NM TimelineClient life cycle handling and container metrics posting to new timeline service.

2015-04-02 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-3334:
-
Attachment: YARN-3334-v6.patch

Incorporated [~zjshen]'s comments in the v6 patch. Rebased it to the latest 
YARN-2928 branch and verified the e2e test passes. [~zjshen], can you take a 
look again? Thanks!

 [Event Producers] NM TimelineClient life cycle handling and container metrics 
 posting to new timeline service.
 --

 Key: YARN-3334
 URL: https://issues.apache.org/jira/browse/YARN-3334
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: YARN-2928
Reporter: Junping Du
Assignee: Junping Du
 Attachments: YARN-3334-demo.patch, YARN-3334-v1.patch, 
 YARN-3334-v2.patch, YARN-3334-v3.patch, YARN-3334-v4.patch, 
 YARN-3334-v5.patch, YARN-3334-v6.patch


 After YARN-3039, we have service discovery mechanism to pass app-collector 
 service address among collectors, NMs and RM. In this JIRA, we will handle 
 service address setting for TimelineClients in NodeManager, and put container 
 metrics to the backend storage.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3293) Track and display capacity scheduler health metrics in web UI

2015-04-02 Thread Craig Welch (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14393194#comment-14393194
 ] 

Craig Welch commented on YARN-3293:
---

Hey [~vvasudev], it seems that the patch doesn't apply cleanly, can you update 
to latest trunk?

 Track and display capacity scheduler health metrics in web UI
 -

 Key: YARN-3293
 URL: https://issues.apache.org/jira/browse/YARN-3293
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacityscheduler
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: Screen Shot 2015-03-30 at 4.30.14 PM.png, 
 apache-yarn-3293.0.patch, apache-yarn-3293.1.patch, apache-yarn-3293.2.patch


 It would be good to display metrics that let users know about the health of 
 the capacity scheduler in the web UI. Today it is hard to get an idea if the 
 capacity scheduler is functioning correctly. Metrics such as the time for the 
 last allocation, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3425) NPE from RMNodeLabelsManager.serviceStop when NodeLabelsManager.serviceInit failed

2015-04-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14392872#comment-14392872
 ] 

Hudson commented on YARN-3425:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #151 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/151/])
YARN-3425. NPE from RMNodeLabelsManager.serviceStop when 
NodeLabelsManager.serviceInit failed. (Bibin A Chundatt via wangda) (wangda: 
rev 492239424a3ace9868b6154f44a0f18fa5318235)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/CommonNodeLabelsManager.java
* hadoop-yarn-project/CHANGES.txt


 NPE from RMNodeLabelsManager.serviceStop when NodeLabelsManager.serviceInit 
 failed
 --

 Key: YARN-3425
 URL: https://issues.apache.org/jira/browse/YARN-3425
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
 Environment: 1 RM, 1 NM, 1 NN, 1 DN
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Minor
 Fix For: 2.8.0

 Attachments: YARN-3425.001.patch


 Configure yarn.node-labels.enabled to true 
 and yarn.node-labels.fs-store.root-dir /node-labels
 Start resource manager without starting DN/NM
 {quote}
 2015-03-31 16:44:13,782 WARN org.apache.hadoop.service.AbstractService: When 
 stopping the service 
 org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager : 
 java.lang.NullPointerException
 java.lang.NullPointerException
   at 
 org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.stopDispatcher(CommonNodeLabelsManager.java:261)
   at 
 org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.serviceStop(CommonNodeLabelsManager.java:267)
   at 
 org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
   at 
 org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
   at 
 org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
   at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:171)
   at 
 org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:556)
   at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:984)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:251)
   at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1207)
 {quote}
 {code}
  protected void stopDispatcher() {
    AsyncDispatcher asyncDispatcher = (AsyncDispatcher) dispatcher;
    asyncDispatcher.stop();
  }
 {code}
 Null check missing during stop
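
A minimal sketch of the guard the description calls for (the attached patch 
may differ):
{code}
// dispatcher can still be null if serviceInit() failed before it was created,
// so guard the stop instead of assuming the dispatcher exists.
protected void stopDispatcher() {
  AsyncDispatcher asyncDispatcher = (AsyncDispatcher) dispatcher;
  if (asyncDispatcher != null) {
    asyncDispatcher.stop();
  }
}
{code}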



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3435) AM container to be allocated Appattempt AM container shown as null

2015-04-02 Thread Bibin A Chundatt (JIRA)
Bibin A Chundatt created YARN-3435:
--

 Summary: AM container to be allocated Appattempt AM container 
shown as null
 Key: YARN-3435
 URL: https://issues.apache.org/jira/browse/YARN-3435
 Project: Hadoop YARN
  Issue Type: Bug
 Environment: 1RM,1DN
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Trivial


Submit yarn application
Open http://rm:8088/cluster/appattempt/appattempt_1427984982805_0003_01 
Before the AM container is allocated 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3430) RMAppAttempt headroom data is missing in RM Web UI

2015-04-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14392866#comment-14392866
 ] 

Hudson commented on YARN-3430:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #151 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/151/])
YARN-3430. Made headroom data available on app attempt page of RM WebUI. 
Contributed by Xuan Gong. (zjshen: rev 8366a36ad356e6318b8ce6c5c96e201149f811bd)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMAppAttemptBlock.java


 RMAppAttempt headroom data is missing in RM Web UI
 --

 Key: YARN-3430
 URL: https://issues.apache.org/jira/browse/YARN-3430
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager, webapp, yarn
Reporter: Xuan Gong
Assignee: Xuan Gong
Priority: Blocker
 Fix For: 2.7.0

 Attachments: YARN-3430.1.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3248) Display count of nodes blacklisted by apps in the web UI

2015-04-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14392789#comment-14392789
 ] 

Hudson commented on YARN-3248:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #142 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/142/])
YARN-3248. Correct fix version from branch-2.7 to branch-2.8 in the change log. 
(xgong: rev 2e79f1c2125517586c165a84e99d3c4d38ca0938)
* hadoop-yarn-project/CHANGES.txt


 Display count of nodes blacklisted by apps in the web UI
 

 Key: YARN-3248
 URL: https://issues.apache.org/jira/browse/YARN-3248
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacityscheduler, resourcemanager
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Fix For: 2.8.0

 Attachments: All applications.png, App page.png, Screenshot.jpg, 
 apache-yarn-3248.0.patch, apache-yarn-3248.1.patch, apache-yarn-3248.2.patch, 
 apache-yarn-3248.3.patch, apache-yarn-3248.4.patch


 It would be really useful when debugging app performance and failure issues 
 to get a count of the nodes blacklisted by individual apps displayed in the 
 web UI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3425) NPE from RMNodeLabelsManager.serviceStop when NodeLabelsManager.serviceInit failed

2015-04-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14392791#comment-14392791
 ] 

Hudson commented on YARN-3425:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #142 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/142/])
YARN-3425. NPE from RMNodeLabelsManager.serviceStop when 
NodeLabelsManager.serviceInit failed. (Bibin A Chundatt via wangda) (wangda: 
rev 492239424a3ace9868b6154f44a0f18fa5318235)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/CommonNodeLabelsManager.java
* hadoop-yarn-project/CHANGES.txt


 NPE from RMNodeLabelsManager.serviceStop when NodeLabelsManager.serviceInit 
 failed
 --

 Key: YARN-3425
 URL: https://issues.apache.org/jira/browse/YARN-3425
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
 Environment: 1 RM, 1 NM, 1 NN, 1 DN
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Minor
 Fix For: 2.8.0

 Attachments: YARN-3425.001.patch


 Configure yarn.node-labels.enabled to true 
 and yarn.node-labels.fs-store.root-dir /node-labels
 Start resource manager without starting DN/NM
 {quote}
 2015-03-31 16:44:13,782 WARN org.apache.hadoop.service.AbstractService: When 
 stopping the service 
 org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager : 
 java.lang.NullPointerException
 java.lang.NullPointerException
   at 
 org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.stopDispatcher(CommonNodeLabelsManager.java:261)
   at 
 org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.serviceStop(CommonNodeLabelsManager.java:267)
   at 
 org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
   at 
 org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
   at 
 org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
   at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:171)
   at 
 org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:556)
   at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:984)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:251)
   at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1207)
 {quote}
 {code}
  protected void stopDispatcher() {
    AsyncDispatcher asyncDispatcher = (AsyncDispatcher) dispatcher;
    asyncDispatcher.stop();
  }
 {code}
 Null check missing during stop



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3430) RMAppAttempt headroom data is missing in RM Web UI

2015-04-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14392784#comment-14392784
 ] 

Hudson commented on YARN-3430:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #142 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/142/])
YARN-3430. Made headroom data available on app attempt page of RM WebUI. 
Contributed by Xuan Gong. (zjshen: rev 8366a36ad356e6318b8ce6c5c96e201149f811bd)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMAppAttemptBlock.java


 RMAppAttempt headroom data is missing in RM Web UI
 --

 Key: YARN-3430
 URL: https://issues.apache.org/jira/browse/YARN-3430
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager, webapp, yarn
Reporter: Xuan Gong
Assignee: Xuan Gong
Priority: Blocker
 Fix For: 2.7.0

 Attachments: YARN-3430.1.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3433) Jersey tests failing with Port in Use -again

2015-04-02 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14392497#comment-14392497
 ] 

Steve Loughran commented on YARN-3433:
--

{code}
com.sun.jersey.test.framework.spi.container.TestContainerException: 
java.net.BindException: Address already in use
at sun.nio.ch.Net.bind0(Native Method)
at sun.nio.ch.Net.bind(Net.java:444)
at sun.nio.ch.Net.bind(Net.java:436)
at 
sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
at 
org.glassfish.grizzly.nio.transport.TCPNIOTransport.bind(TCPNIOTransport.java:413)
at 
org.glassfish.grizzly.nio.transport.TCPNIOTransport.bind(TCPNIOTransport.java:384)
at 
org.glassfish.grizzly.nio.transport.TCPNIOTransport.bind(TCPNIOTransport.java:375)
at 
org.glassfish.grizzly.http.server.NetworkListener.start(NetworkListener.java:549)
at 
org.glassfish.grizzly.http.server.HttpServer.start(HttpServer.java:255)
at 
com.sun.jersey.api.container.grizzly2.GrizzlyServerFactory.createHttpServer(GrizzlyServerFactory.java:326)
at 
com.sun.jersey.api.container.grizzly2.GrizzlyServerFactory.createHttpServer(GrizzlyServerFactory.java:343)
at 
com.sun.jersey.test.framework.spi.container.grizzly2.web.GrizzlyWebTestContainerFactory$GrizzlyWebTestContainer.instantiateGrizzlyWebServer(GrizzlyWebTestContainerFactory.java:219)
at 
com.sun.jersey.test.framework.spi.container.grizzly2.web.GrizzlyWebTestContainerFactory$GrizzlyWebTestContainer.init(GrizzlyWebTestContainerFactory.java:129)
at 
com.sun.jersey.test.framework.spi.container.grizzly2.web.GrizzlyWebTestContainerFactory$GrizzlyWebTestContainer.init(GrizzlyWebTestContainerFactory.java:86)
at 
com.sun.jersey.test.framework.spi.container.grizzly2.web.GrizzlyWebTestContainerFactory.create(GrizzlyWebTestContainerFactory.java:79)
at 
com.sun.jersey.test.framework.JerseyTest.getContainer(JerseyTest.java:342)
at com.sun.jersey.test.framework.JerseyTest.init(JerseyTest.java:217)
at 
org.apache.hadoop.yarn.webapp.JerseyTestBase.init(JerseyTestBase.java:27)
at 
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps.init(TestRMWebServicesApps.java:111)
{code}

 Jersey tests failing with Port in Use -again
 

 Key: YARN-3433
 URL: https://issues.apache.org/jira/browse/YARN-3433
 Project: Hadoop YARN
  Issue Type: Bug
  Components: test
Affects Versions: 3.0.0
 Environment: ASF Jenkins
Reporter: Steve Loughran

 ASF Jenkins jersey tests failing with port in use exceptions.
 The YARN-2912 patch tried to fix it, but it defaults to port 9998 and doesn't 
 scan for a spare port, so it is too brittle on a busy server



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3425) NPE from RMNodeLabelsManager.serviceStop when NodeLabelsManager.serviceInit failed

2015-04-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14392543#comment-14392543
 ] 

Hudson commented on YARN-3425:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #885 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/885/])
YARN-3425. NPE from RMNodeLabelsManager.serviceStop when 
NodeLabelsManager.serviceInit failed. (Bibin A Chundatt via wangda) (wangda: 
rev 492239424a3ace9868b6154f44a0f18fa5318235)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/CommonNodeLabelsManager.java


 NPE from RMNodeLabelsManager.serviceStop when NodeLabelsManager.serviceInit 
 failed
 --

 Key: YARN-3425
 URL: https://issues.apache.org/jira/browse/YARN-3425
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
 Environment: 1 RM, 1 NM, 1 NN, 1 DN
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Minor
 Fix For: 2.8.0

 Attachments: YARN-3425.001.patch


 Configure yarn.node-labels.enabled to true 
 and yarn.node-labels.fs-store.root-dir /node-labels
 Start resource manager without starting DN/NM
 {quote}
 2015-03-31 16:44:13,782 WARN org.apache.hadoop.service.AbstractService: When 
 stopping the service 
 org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager : 
 java.lang.NullPointerException
 java.lang.NullPointerException
   at 
 org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.stopDispatcher(CommonNodeLabelsManager.java:261)
   at 
 org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.serviceStop(CommonNodeLabelsManager.java:267)
   at 
 org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
   at 
 org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
   at 
 org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
   at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:171)
   at 
 org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:556)
   at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:984)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:251)
   at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1207)
 {quote}
 {code}
  protected void stopDispatcher() {
    AsyncDispatcher asyncDispatcher = (AsyncDispatcher) dispatcher;
    asyncDispatcher.stop();
  }
 {code}
 Null check missing during stop



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3433) Jersey tests failing with Port in Use -again

2015-04-02 Thread Steve Loughran (JIRA)
Steve Loughran created YARN-3433:


 Summary: Jersey tests failing with Port in Use -again
 Key: YARN-3433
 URL: https://issues.apache.org/jira/browse/YARN-3433
 Project: Hadoop YARN
  Issue Type: Bug
  Components: test
Affects Versions: 3.0.0
 Environment: ASF Jenkins
Reporter: Steve Loughran


ASF Jenkins jersey tests failing with port in use exceptions.

The YARN-2912 patch tried to fix it, but it defaults to port 9998 and doesn't 
scan for a spare port, so it is too brittle on a busy server
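
One possible direction, sketched only: ask the OS for an ephemeral free port 
instead of defaulting to 9998, then feed that port to the Jersey test 
container (the wiring into JerseyTestBase depends on the test framework 
version and is not shown here).
{code}
// Sketch only: bind to port 0 so the OS picks a currently unused port. Note
// the small race window between closing this socket and the test server
// binding, so callers may still want to retry on BindException.
import java.io.IOException;
import java.net.ServerSocket;

public final class FreePort {
  private FreePort() {
  }

  public static int find() throws IOException {
    ServerSocket socket = new ServerSocket(0);
    try {
      socket.setReuseAddress(true);
      return socket.getLocalPort();
    } finally {
      socket.close();
    }
  }
}
{code}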



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-3433) Jersey tests failing with Port in Use -again

2015-04-02 Thread Brahma Reddy Battula (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula reassigned YARN-3433:
--

Assignee: Brahma Reddy Battula

 Jersey tests failing with Port in Use -again
 

 Key: YARN-3433
 URL: https://issues.apache.org/jira/browse/YARN-3433
 Project: Hadoop YARN
  Issue Type: Bug
  Components: test
Affects Versions: 3.0.0
 Environment: ASF Jenkins
Reporter: Steve Loughran
Assignee: Brahma Reddy Battula

 ASF Jenkins jersey tests failing with port in use exceptions.
 The YARN-2912 patch tried to fix it, but it defaults to port 9998 and doesn't 
 scan for a spare port, so it is too brittle on a busy server



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3248) Display count of nodes blacklisted by apps in the web UI

2015-04-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14392656#comment-14392656
 ] 

Hudson commented on YARN-3248:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2083 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2083/])
YARN-3248. Correct fix version from branch-2.7 to branch-2.8 in the change log. 
(xgong: rev 2e79f1c2125517586c165a84e99d3c4d38ca0938)
* hadoop-yarn-project/CHANGES.txt


 Display count of nodes blacklisted by apps in the web UI
 

 Key: YARN-3248
 URL: https://issues.apache.org/jira/browse/YARN-3248
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacityscheduler, resourcemanager
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Fix For: 2.8.0

 Attachments: All applications.png, App page.png, Screenshot.jpg, 
 apache-yarn-3248.0.patch, apache-yarn-3248.1.patch, apache-yarn-3248.2.patch, 
 apache-yarn-3248.3.patch, apache-yarn-3248.4.patch


 It would be really useful when debugging app performance and failure issues 
 to get a count of the nodes blacklisted by individual apps displayed in the 
 web UI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3425) NPE from RMNodeLabelsManager.serviceStop when NodeLabelsManager.serviceInit failed

2015-04-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14392658#comment-14392658
 ] 

Hudson commented on YARN-3425:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2083 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2083/])
YARN-3425. NPE from RMNodeLabelsManager.serviceStop when 
NodeLabelsManager.serviceInit failed. (Bibin A Chundatt via wangda) (wangda: 
rev 492239424a3ace9868b6154f44a0f18fa5318235)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/CommonNodeLabelsManager.java
* hadoop-yarn-project/CHANGES.txt


 NPE from RMNodeLabelsManager.serviceStop when NodeLabelsManager.serviceInit 
 failed
 --

 Key: YARN-3425
 URL: https://issues.apache.org/jira/browse/YARN-3425
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
 Environment: 1 RM, 1 NM, 1 NN, 1 DN
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Minor
 Fix For: 2.8.0

 Attachments: YARN-3425.001.patch


 Configure yarn.node-labels.enabled to true 
 and yarn.node-labels.fs-store.root-dir /node-labels
 Start resource manager without starting DN/NM
 {quote}
 2015-03-31 16:44:13,782 WARN org.apache.hadoop.service.AbstractService: When 
 stopping the service 
 org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager : 
 java.lang.NullPointerException
 java.lang.NullPointerException
   at 
 org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.stopDispatcher(CommonNodeLabelsManager.java:261)
   at 
 org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.serviceStop(CommonNodeLabelsManager.java:267)
   at 
 org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
   at 
 org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
   at 
 org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
   at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:171)
   at 
 org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:556)
   at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:984)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:251)
   at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1207)
 {quote}
 {code}
  protected void stopDispatcher() {
    AsyncDispatcher asyncDispatcher = (AsyncDispatcher) dispatcher;
    asyncDispatcher.stop();
  }
 {code}
 Null check missing during stop



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3434) Interaction between reservations and userlimit can result in significant ULF violation

2015-04-02 Thread Thomas Graves (JIRA)
Thomas Graves created YARN-3434:
---

 Summary: Interaction between reservations and userlimit can result 
in significant ULF violation
 Key: YARN-3434
 URL: https://issues.apache.org/jira/browse/YARN-3434
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 2.6.0
Reporter: Thomas Graves
Assignee: Thomas Graves


ULF was set to 1.0
User was able to consume 1.4X queue capacity.
It looks like when this application launched, it reserved about 1000 
containers of 8G each within about 5 seconds. I think this allowed the 
logic in assignToUser() to let the userlimit be surpassed.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3434) Interaction between reservations and userlimit can result in significant ULF violation

2015-04-02 Thread Thomas Graves (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14392751#comment-14392751
 ] 

Thomas Graves commented on YARN-3434:
-

The issue here is that if we allow the user to continue past the user limit 
checks in assignContainers because they have reservations, then when it gets 
down into the assignContainer routine and it is allowed to get a container and 
the node has space, we don't double-check the user limit in this case.  We 
recheck in all other cases, but this one is missed.  
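
Roughly, the missing re-check would amount to something like the following 
before handing out the container; the helper below is illustrative, not the 
actual LeafQueue code.
{code}
// Illustrative only -- not the actual LeafQueue code. Idea: re-check the user
// limit immediately before assigning a new (non-reserved) container, since the
// earlier check may have been bypassed for an app holding reservations.
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

public class UserLimitRecheckSketch {
  /** True if adding 'required' to what the user already uses stays within the limit. */
  public static boolean withinUserLimit(Resource userUsed, Resource required,
      Resource userLimit) {
    return Resources.fitsIn(Resources.add(userUsed, required), userLimit);
  }
}
{code}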

 Interaction between reservations and userlimit can result in significant ULF 
 violation
 --

 Key: YARN-3434
 URL: https://issues.apache.org/jira/browse/YARN-3434
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 2.6.0
Reporter: Thomas Graves
Assignee: Thomas Graves

 ULF was set to 1.0
 User was able to consume 1.4X queue capacity.
 It looks like when this application launched, it reserved about 1000 
 containers of 8G each within about 5 seconds. I think this allowed the 
 logic in assignToUser() to let the userlimit be surpassed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3248) Display count of nodes blacklisted by apps in the web UI

2015-04-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14392541#comment-14392541
 ] 

Hudson commented on YARN-3248:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #885 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/885/])
YARN-3248. Display count of nodes blacklisted by apps in the web UI. (xgong: 
rev 4728bdfa15809db4b8b235faa286c65de4a48cf6)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMAppBlock.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppsBlock.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppBlock.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppAttemptBlock.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/AppsBlockWithMetrics.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesApps.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMAppAttemptBlock.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/CapacitySchedulerPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMAppsBlock.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/AppAttemptInfo.java
YARN-3248. Correct fix version from branch-2.7 to branch-2.8 in the change log. 
(xgong: rev 2e79f1c2125517586c165a84e99d3c4d38ca0938)
* hadoop-yarn-project/CHANGES.txt


 Display count of nodes blacklisted by apps in the web UI
 

 Key: YARN-3248
 URL: https://issues.apache.org/jira/browse/YARN-3248
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacityscheduler, resourcemanager
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Fix For: 2.8.0

 Attachments: All applications.png, App page.png, Screenshot.jpg, 
 apache-yarn-3248.0.patch, apache-yarn-3248.1.patch, apache-yarn-3248.2.patch, 
 apache-yarn-3248.3.patch, apache-yarn-3248.4.patch


 It would be really useful when debugging app performance and failure issues 
 to get a count of the nodes blacklisted by individual apps displayed in the 
 web UI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3430) RMAppAttempt headroom data is missing in RM Web UI

2015-04-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14392536#comment-14392536
 ] 

Hudson commented on YARN-3430:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #885 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/885/])
YARN-3430. Made headroom data available on app attempt page of RM WebUI. 
Contributed by Xuan Gong. (zjshen: rev 8366a36ad356e6318b8ce6c5c96e201149f811bd)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMAppAttemptBlock.java
* hadoop-yarn-project/CHANGES.txt


 RMAppAttempt headroom data is missing in RM Web UI
 --

 Key: YARN-3430
 URL: https://issues.apache.org/jira/browse/YARN-3430
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager, webapp, yarn
Reporter: Xuan Gong
Assignee: Xuan Gong
Priority: Blocker
 Fix For: 2.7.0

 Attachments: YARN-3430.1.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3248) Display count of nodes blacklisted by apps in the web UI

2015-04-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14392595#comment-14392595
 ] 

Hudson commented on YARN-3248:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #151 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/151/])
YARN-3248. Display count of nodes blacklisted by apps in the web UI. (xgong: 
rev 4728bdfa15809db4b8b235faa286c65de4a48cf6)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/AppAttemptInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMAppAttemptBlock.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMAppsBlock.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppBlock.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppsBlock.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/CapacitySchedulerPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMAppBlock.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/AppsBlockWithMetrics.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesApps.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppAttemptBlock.java
YARN-3248. Correct fix version from branch-2.7 to branch-2.8 in the change log. 
(xgong: rev 2e79f1c2125517586c165a84e99d3c4d38ca0938)
* hadoop-yarn-project/CHANGES.txt


 Display count of nodes blacklisted by apps in the web UI
 

 Key: YARN-3248
 URL: https://issues.apache.org/jira/browse/YARN-3248
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacityscheduler, resourcemanager
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Fix For: 2.8.0

 Attachments: All applications.png, App page.png, Screenshot.jpg, 
 apache-yarn-3248.0.patch, apache-yarn-3248.1.patch, apache-yarn-3248.2.patch, 
 apache-yarn-3248.3.patch, apache-yarn-3248.4.patch


 It would be really useful when debugging app performance and failure issues 
 to get a count of the nodes blacklisted by individual apps displayed in the 
 web UI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3430) RMAppAttempt headroom data is missing in RM Web UI

2015-04-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14392590#comment-14392590
 ] 

Hudson commented on YARN-3430:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #151 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/151/])
YARN-3430. Made headroom data available on app attempt page of RM WebUI. 
Contributed by Xuan Gong. (zjshen: rev 8366a36ad356e6318b8ce6c5c96e201149f811bd)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMAppAttemptBlock.java
* hadoop-yarn-project/CHANGES.txt


 RMAppAttempt headroom data is missing in RM Web UI
 --

 Key: YARN-3430
 URL: https://issues.apache.org/jira/browse/YARN-3430
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager, webapp, yarn
Reporter: Xuan Gong
Assignee: Xuan Gong
Priority: Blocker
 Fix For: 2.7.0

 Attachments: YARN-3430.1.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3425) NPE from RMNodeLabelsManager.serviceStop when NodeLabelsManager.serviceInit failed

2015-04-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14392597#comment-14392597
 ] 

Hudson commented on YARN-3425:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #151 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/151/])
YARN-3425. NPE from RMNodeLabelsManager.serviceStop when 
NodeLabelsManager.serviceInit failed. (Bibin A Chundatt via wangda) (wangda: 
rev 492239424a3ace9868b6154f44a0f18fa5318235)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/CommonNodeLabelsManager.java


 NPE from RMNodeLabelsManager.serviceStop when NodeLabelsManager.serviceInit 
 failed
 --

 Key: YARN-3425
 URL: https://issues.apache.org/jira/browse/YARN-3425
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
 Environment: 1 RM, 1 NM, 1 NN, 1 DN
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Minor
 Fix For: 2.8.0

 Attachments: YARN-3425.001.patch


 Configure yarn.node-labels.enabled to true 
 and yarn.node-labels.fs-store.root-dir /node-labels
 Start resource manager without starting DN/NM
 {quote}
 2015-03-31 16:44:13,782 WARN org.apache.hadoop.service.AbstractService: When 
 stopping the service 
 org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager : 
 java.lang.NullPointerException
 java.lang.NullPointerException
   at 
 org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.stopDispatcher(CommonNodeLabelsManager.java:261)
   at 
 org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.serviceStop(CommonNodeLabelsManager.java:267)
   at 
 org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
   at 
 org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
   at 
 org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
   at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:171)
   at 
 org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:556)
   at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:984)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:251)
   at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1207)
 {quote}
 {code}
  protected void stopDispatcher() {
    AsyncDispatcher asyncDispatcher = (AsyncDispatcher) dispatcher;
    asyncDispatcher.stop();
  }
 {code}
 Null check missing during stop



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3430) RMAppAttempt headroom data is missing in RM Web UI

2015-04-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14392651#comment-14392651
 ] 

Hudson commented on YARN-3430:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2083 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2083/])
YARN-3430. Made headroom data available on app attempt page of RM WebUI. 
Contributed by Xuan Gong. (zjshen: rev 8366a36ad356e6318b8ce6c5c96e201149f811bd)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMAppAttemptBlock.java


 RMAppAttempt headroom data is missing in RM Web UI
 --

 Key: YARN-3430
 URL: https://issues.apache.org/jira/browse/YARN-3430
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager, webapp, yarn
Reporter: Xuan Gong
Assignee: Xuan Gong
Priority: Blocker
 Fix For: 2.7.0

 Attachments: YARN-3430.1.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3432) Cluster metrics have wrong Total Memory when there is reserved memory on CS

2015-04-02 Thread Thomas Graves (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14392687#comment-14392687
 ] 

Thomas Graves commented on YARN-3432:
-

That will fix it for the capacity scheduler; we need to see whether it breaks 
the FairScheduler, though.



 Cluster metrics have wrong Total Memory when there is reserved memory on CS
 ---

 Key: YARN-3432
 URL: https://issues.apache.org/jira/browse/YARN-3432
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler, resourcemanager
Affects Versions: 2.6.0
Reporter: Thomas Graves
Assignee: Brahma Reddy Battula

 I noticed that when reservations happen while using the Capacity Scheduler, 
 the UI and web services report the wrong total memory.
 For example, I have 300GB of total memory in my cluster.  I allocate 50GB 
 and reserve 10GB.  The cluster metrics then report the total memory as 290GB.
 This was broken by https://issues.apache.org/jira/browse/YARN-656, so perhaps 
 there is a difference between the fair scheduler and the capacity scheduler.
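
Spelling out the arithmetic from the description, assuming the reported total 
is derived as available + allocated (which drops the reserved portion):
{code}
// Numbers from the description above; the derivation of the reported total is
// an assumption about how the metric is computed, not code from YARN.
long totalGB = 300, allocatedGB = 50, reservedGB = 10;
long availableGB = totalGB - allocatedGB - reservedGB;         // 240
long reportedTotalGB = availableGB + allocatedGB;              // 290 -- what the UI shows
long expectedTotalGB = availableGB + allocatedGB + reservedGB; // 300 -- actual cluster size
{code}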



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3435) AM container to be allocated Appattempt AM container shown as null

2015-04-02 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-3435:
---
Attachment: Screenshot.png

Attaching a screenshot of the bug.

 AM container to be allocated Appattempt AM container shown as null
 --

 Key: YARN-3435
 URL: https://issues.apache.org/jira/browse/YARN-3435
 Project: Hadoop YARN
  Issue Type: Bug
 Environment: 1RM,1DN
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Trivial
 Attachments: Screenshot.png


 Submit yarn application
 Open http://rm:8088/cluster/appattempt/appattempt_1427984982805_0003_01 
 Before the AM container is allocated 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3334) [Event Producers] NM TimelineClient life cycle handling and container metrics posting to new timeline service.

2015-04-02 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14393015#comment-14393015
 ] 

Zhijie Shen commented on YARN-3334:
---

Junping, did you have a chance to look at items 3 and 4 of my last patch 
comment? One more nit: newTimelineServiceEnabled(config) -> 
systemMetricsPublisherEnabled?

 [Event Producers] NM TimelineClient life cycle handling and container metrics 
 posting to new timeline service.
 --

 Key: YARN-3334
 URL: https://issues.apache.org/jira/browse/YARN-3334
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: YARN-2928
Reporter: Junping Du
Assignee: Junping Du
 Attachments: YARN-3334-demo.patch, YARN-3334-v1.patch, 
 YARN-3334-v2.patch, YARN-3334-v3.patch, YARN-3334-v4.patch, YARN-3334-v5.patch


 After YARN-3039, we have service discovery mechanism to pass app-collector 
 service address among collectors, NMs and RM. In this JIRA, we will handle 
 service address setting for TimelineClients in NodeManager, and put container 
 metrics to the backend storage.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3436) Doc WebServicesIntro.html Example Rest API url wrong

2015-04-02 Thread Bibin A Chundatt (JIRA)
Bibin A Chundatt created YARN-3436:
--

 Summary: Doc WebServicesIntro.html Example Rest API url wrong
 Key: YARN-3436
 URL: https://issues.apache.org/jira/browse/YARN-3436
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Minor


/docs/current/hadoop-yarn/hadoop-yarn-site/WebServicesIntro.html

{quote}

Response Examples
JSON response with single resource

HTTP Request: GET 
http://rmhost.domain:8088/ws/v1/cluster/{color:red}app{color}/application_1324057493980_0001

Response Status Line: HTTP/1.1 200 OK

{quote}

Url should be ws/v1/cluster/{color:red}apps{color} .
2 examples on same page are wrong



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3437) convert load test driver to timeline service v.2

2015-04-02 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-3437:
-

 Summary: convert load test driver to timeline service v.2
 Key: YARN-3437
 URL: https://issues.apache.org/jira/browse/YARN-3437
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Sangjin Lee


This subtask covers the work for converting the proposed patch for the load 
test driver (YARN-2556) to work with the timeline service v.2.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3435) AM container to be allocated Appattempt AM container shown as null

2015-04-02 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-3435:
---
Attachment: YARN-3435.001.patch

 AM container to be allocated Appattempt AM container shown as null
 --

 Key: YARN-3435
 URL: https://issues.apache.org/jira/browse/YARN-3435
 Project: Hadoop YARN
  Issue Type: Bug
 Environment: 1RM,1DN
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Trivial
 Attachments: Screenshot.png, YARN-3435.001.patch


 Submit yarn application
 Open http://rm:8088/cluster/appattempt/appattempt_1427984982805_0003_01 
 Before the AM container is allocated 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3438) add a mode to replay MR job history files to the timeline service

2015-04-02 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-3438:
-

 Summary: add a mode to replay MR job history files to the timeline 
service
 Key: YARN-3438
 URL: https://issues.apache.org/jira/browse/YARN-3438
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Sangjin Lee


The subtask covers the work on top of YARN-3437 to add a mode to replay MR job 
history files to the timeline service storage.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

