[jira] [Commented] (YARN-4547) LeafQueue#getApplications() is read-only interface, but it provides reference to caller

2016-02-16 Thread Devaraj K (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15150015#comment-15150015
 ] 

Devaraj K commented on YARN-4547:
-

Shouldn't it be resolved as Duplicate instead of Done?

> LeafQueue#getApplications() is read-only interface, but it provides reference 
> to caller
> ---
>
> Key: YARN-4547
> URL: https://issues.apache.org/jira/browse/YARN-4547
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
>
> The API below is a read-only interface, but it returns a reference to the 
> caller, which lets the caller modify the orderingPolicy entities. If a 
> reference to the ordering policy is required, the caller can use 
> {{LeafQueue#getOrderingPolicy()#getSchedulableEntities()}}.
> The returned object should be a clone of 
> orderingPolicy.getSchedulableEntities(); one possible shape of the fix is 
> sketched after the code below.
> {code}
>   /**
>    * Obtain (read-only) collection of active applications.
>    */
>   public Collection<FiCaSchedulerApp> getApplications() {
>     return orderingPolicy.getSchedulableEntities();
>   }
> {code}
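
For illustration, here is a minimal sketch of the kind of change the description asks for, returning an unmodifiable view instead of the internal collection (assuming {{java.util.Collections}} is imported and the element type is {{FiCaSchedulerApp}}; this is only a sketch, not the change that was eventually folded into YARN-4617):

{code}
  /**
   * Obtain (read-only) collection of active applications.
   */
  public Collection<FiCaSchedulerApp> getApplications() {
    // Wrap (or clone) the ordering policy's entities so callers cannot
    // mutate the scheduler's internal state through the returned collection.
    return Collections.unmodifiableCollection(
        orderingPolicy.getSchedulableEntities());
  }
{code}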



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-4547) LeafQueue#getApplications() is read-only interface, but it provides reference to caller

2016-02-16 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S resolved YARN-4547.
-
Resolution: Done

This change was handled along with YARN-4617. Closing it as Done.

> LeafQueue#getApplications() is read-only interface, but it provides reference 
> to caller
> ---
>
> Key: YARN-4547
> URL: https://issues.apache.org/jira/browse/YARN-4547
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
>
> The API below is a read-only interface, but it returns a reference to the 
> caller, which lets the caller modify the orderingPolicy entities. If a 
> reference to the ordering policy is required, the caller can use 
> {{LeafQueue#getOrderingPolicy()#getSchedulableEntities()}}.
> The returned object should be a clone of 
> orderingPolicy.getSchedulableEntities().
> {code}
>   /**
>    * Obtain (read-only) collection of active applications.
>    */
>   public Collection<FiCaSchedulerApp> getApplications() {
>     return orderingPolicy.getSchedulableEntities();
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3223) Resource update during NM graceful decommission

2016-02-16 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15149811#comment-15149811
 ] 

Junping Du commented on YARN-3223:
--

Sorry for the late reply; I have been on vacation recently. Will take a look at it soon.

> Resource update during NM graceful decommission
> ---
>
> Key: YARN-3223
> URL: https://issues.apache.org/jira/browse/YARN-3223
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: graceful, nodemanager, resourcemanager
>Affects Versions: 2.7.1
>Reporter: Junping Du
>Assignee: Brook Zhou
> Attachments: YARN-3223-v0.patch, YARN-3223-v1.patch, 
> YARN-3223-v2.patch, YARN-3223-v3.patch, YARN-3223-v4.patch, YARN-3223-v5.patch
>
>
> During NM graceful decommission, we should handle resource updates properly, 
> including: making RMNode keep track of the old resource for possible rollback, 
> keeping the available resource at 0, and updating the used resource when 
> containers finish.
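
A rough sketch of the bookkeeping described above (the field and variable names are illustrative, not the actual patch):

{code}
// On entering DECOMMISSIONING: remember the old total capability for a
// possible rollback and stop advertising schedulable space.
this.originalTotalCapability = Resources.clone(totalCapability);
this.availableResource = Resource.newInstance(0, 0);

// On each container completion while DECOMMISSIONING: shrink the used
// resource by what the finished container held.
Resources.subtractFrom(usedResource, finishedContainerResource);
{code}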



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3998) Add retry-times to let NM re-launch container when it fails to run

2016-02-16 Thread Jun Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15149654#comment-15149654
 ] 

Jun Gong commented on YARN-3998:


Sorry for the late reply; I was on holiday.

Thanks [~vinodkv] and [~vvasudev] for suggestion and review!

Some additional thoughts besides [~vvasudev]'s opinion: 
{quote}
Unification with AM restart policies
{quote}
I agree with [~vvasudev]. The AM restart policies retry across different 
nodes, while this feature retries on the local node. When the RM launches an 
AM, it could specify a local retry policy for it.  

{quote}
Treat relaunch in a first-class manner
{quote}
Glad to see it treated in a first-class manner; I will update the patch.

{quote}
The following isn’t fool-proof and won’t work for all apps, can we just persist 
and read the selected log-dir from the state-store?
ContainerLaunch.handleContainerExitWithFailure() needs to handled differently 
during container-relaunches.
The same can be done for the work-dir.
All of these are related. If we store the log dir and work dir in the state 
store, we can address all 3 of these. 
{quote}
Yes, it would be better to store the log dir and work dir if we aim to make it 
more accurate. I was trying to make minimal changes for this feature.

{quote}
In fact, if we end up changing the work-dir during relaunch due to a bad-dir, 
that may result in a breakage for the app. Apps may be reading from / writing 
into the work-dir and changing it during relaunch may invalidate application's 
assumptions. Should we just fail the container completely and let the AM deal 
with it?
{quote}
My thought is that if a user specifies a retry policy on a container, the user 
should make sure the container can deal with this situation.

{quote}
Instead of removing a line and setting the limit to 10*1000, take the last 'n' 
characters in the string where 'n' is a config setting.
{quote}
Removing the last n characters might make the diagnostics inconsistent: suppose 
the diagnostics is “The exception is XXX” and XXX is n characters long, then 
the diagnostics becomes “The exception is”. There is a similar problem with 
removing the first or last n lines. How about removing previous attempts' error 
information and just keeping the latest attempt's information? 
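
A minimal sketch of that idea (the delimiter and helper name are hypothetical, purely to show the shape):

{code}
// Hypothetical helper: keep only the diagnostics produced by the most
// recent launch attempt, dropping text from earlier attempts.
private static String latestAttemptDiagnostics(String diagnostics,
    String attemptDelimiter) {
  int lastAttempt = diagnostics.lastIndexOf(attemptDelimiter);
  return lastAttempt < 0
      ? diagnostics
      : diagnostics.substring(lastAttempt + attemptDelimiter.length());
}
{code}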

Glad to see more discussion about the feature.

> Add retry-times to let NM re-launch container when it fails to run
> --
>
> Key: YARN-3998
> URL: https://issues.apache.org/jira/browse/YARN-3998
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Jun Gong
>Assignee: Jun Gong
> Attachments: YARN-3998.01.patch, YARN-3998.02.patch, 
> YARN-3998.03.patch, YARN-3998.04.patch, YARN-3998.05.patch, YARN-3998.06.patch
>
>
> I'd like to add a field (retry-times) in ContainerLaunchContext. When the AM 
> launches containers, it could specify the value. The NM will then re-launch 
> the container up to 'retry-times' times when it fails to run (e.g. the exit 
> code is not 0). 
> This saves a lot of time: it avoids container localization, the RM does not 
> need to re-schedule the container, and local files in the container's working 
> directory are left for re-use (if the container has downloaded some big 
> files, it does not need to re-download them when running again). 
> We find this useful in systems like Storm.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4697) NM aggregation thread pool is not bound by limits

2016-02-16 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15149568#comment-15149568
 ] 

Haibo Chen commented on YARN-4697:
--

The default value is 100, as defined in YarnConfiguration.
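
For reference, a bounded pool could be created along these lines (a sketch only; the configuration constant name is an assumption, and it presumes {{java.util.concurrent.Executors}} and Guava's {{ThreadFactoryBuilder}} are available):

{code}
// Read the pool size from configuration (assumed key/constant name),
// falling back to the 100-thread default mentioned above.
int poolSize = conf.getInt(
    YarnConfiguration.NM_LOG_AGG_THREAD_POOL_SIZE,  // assumed constant
    100);
// Fixed-size pool instead of an unbounded cached pool.
this.threadPool = Executors.newFixedThreadPool(poolSize,
    new ThreadFactoryBuilder()
        .setNameFormat("LogAggregationService #%d")
        .build());
{code}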

> NM aggregation thread pool is not bound by limits
> -
>
> Key: YARN-4697
> URL: https://issues.apache.org/jira/browse/YARN-4697
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: yarn4697.001.patch
>
>
> In LogAggregationService.java we create a thread pool to upload logs from 
> the NodeManager to HDFS if log aggregation is turned on. This is a cached 
> thread pool, which based on the javadoc is an unlimited pool of threads.
> In the case where we have had a problem with log aggregation, this could cause 
> a problem on restart. The number of threads created at that point could be 
> huge and will put a large load on the NameNode and in the worst case could 
> even bring it down due to file descriptor issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4697) NM aggregation thread pool is not bound by limits

2016-02-16 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated YARN-4697:
-
Attachment: yarn4697.001.patch

> NM aggregation thread pool is not bound by limits
> -
>
> Key: YARN-4697
> URL: https://issues.apache.org/jira/browse/YARN-4697
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: yarn4697.001.patch
>
>
> In LogAggregationService.java we create a thread pool to upload logs from 
> the NodeManager to HDFS if log aggregation is turned on. This is a cached 
> thread pool, which based on the javadoc is an unlimited pool of threads.
> In the case where we have had a problem with log aggregation, this could cause 
> a problem on restart. The number of threads created at that point could be 
> huge and will put a large load on the NameNode and in the worst case could 
> even bring it down due to file descriptor issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4696) EntityGroupFSTimelineStore to work in the absence of an RM

2016-02-16 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15149562#comment-15149562
 ] 

Li Lu commented on YARN-4696:
-

Thanks for the work [~ste...@apache.org]! My main question is: what is the 
assumed use case for the "non-RM" mode of the reader, other than unit tests? If 
it's only for unit tests, is there any way we can clearly restrict this? 
Because IIUC, if detached from the RM, all app states will be unknown and 
eventually completed. However, the status is not accurate because it's only a 
timeout from unknownActiveMillis. 

For unit tests, is it possible to have a mock RM do the same job? If it is 
too much trouble then having this looks fine, but we need to clearly 
restrict the use case. 

nits:
- Line 46, EntityGroupFSTimelineStore: I think we'd prefer to avoid wildcard 
imports? 
- There is a findbugs warning about an inconsistent synchronization condition 
for LevelDBCacheTimelineStore, where we may want to synchronize on the 
constructor? This is an unrelated failure, so feel free to skip it. However, if 
you happen to have time, a quick fix would also be helpful. 

[~xgong], please double-check the logic on the writer side. Exception handling 
looks fine, but I would like to double-check the logic on the flush. 

> EntityGroupFSTimelineStore to work in the absence of an RM
> --
>
> Key: YARN-4696
> URL: https://issues.apache.org/jira/browse/YARN-4696
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: 2.8.0
>Reporter: Steve Loughran
> Attachments: YARN-4696-001.patch
>
>
> {{EntityGroupFSTimelineStore}} now depends on an RM being up and running, with 
> the configuration pointing to it. This is a new change, and it impacts testing, 
> where you have historically been able to test without an RM running.
> The sole purpose of the probe is to automatically determine if an app is 
> running; it falls back to "unknown" if not. If the RM connection were 
> optional, the "unknown" codepath could be called directly, relying on the age 
> of the file as a metric of completion.
> Options:
> # add a flag to disable RM connect
> # skip automatically if the RM is not defined/set to 0.0.0.0
> # disable retries on yarn client IPC; if it fails, tag the app as unknown.
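
A minimal sketch of option 1, with a hypothetical configuration key (not the actual patch):

{code}
// Hypothetical flag; when false, skip the RM probe entirely and fall back
// to treating the app state as unknown (i.e. rely on file age).
boolean queryRm = conf.getBoolean(
    "yarn.timeline-service.entity-group-fs-store.query-rm-for-app-state",  // assumed key
    true);
if (!queryRm) {
  return AppState.UNKNOWN;
}
return getAppStateFromRM(appId);  // assumed name for the existing RM lookup
{code}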



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4697) NM aggregation thread pool is not bound by limits

2016-02-16 Thread Haibo Chen (JIRA)
Haibo Chen created YARN-4697:


 Summary: NM aggregation thread pool is not bound by limits
 Key: YARN-4697
 URL: https://issues.apache.org/jira/browse/YARN-4697
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Reporter: Haibo Chen
Assignee: Haibo Chen


In LogAggregationService.java we create a thread pool to upload logs from 
the NodeManager to HDFS if log aggregation is turned on. This is a cached 
thread pool, which based on the javadoc is an unlimited pool of threads.
In the case where we have had a problem with log aggregation, this could cause a 
problem on restart. The number of threads created at that point could be huge 
and will put a large load on the NameNode and in the worst case could even bring 
it down due to file descriptor issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1492) truly shared cache for jars (jobjar/libjar)

2016-02-16 Thread Chris Trezzo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Trezzo updated YARN-1492:
---
Target Version/s: 2.9.0  (was: 2.8.0)

> truly shared cache for jars (jobjar/libjar)
> ---
>
> Key: YARN-1492
> URL: https://issues.apache.org/jira/browse/YARN-1492
> Project: Hadoop YARN
>  Issue Type: New Feature
>Affects Versions: 2.0.4-alpha
>Reporter: Sangjin Lee
>Assignee: Chris Trezzo
>Priority: Critical
> Attachments: YARN-1492-all-trunk-v1.patch, 
> YARN-1492-all-trunk-v2.patch, YARN-1492-all-trunk-v3.patch, 
> YARN-1492-all-trunk-v4.patch, YARN-1492-all-trunk-v5.patch, 
> shared_cache_design.pdf, shared_cache_design_v2.pdf, 
> shared_cache_design_v3.pdf, shared_cache_design_v4.pdf, 
> shared_cache_design_v5.pdf, shared_cache_design_v6.pdf
>
>
> Currently there is the distributed cache that enables you to cache jars and 
> files so that attempts from the same job can reuse them. However, sharing is 
> limited with the distributed cache because it is normally on a per-job basis. 
> On a large cluster, sometimes copying of jobjars and libjars becomes so 
> prevalent that it consumes a large portion of the network bandwidth, not to 
> speak of defeating the purpose of "bringing compute to where data is". This 
> is wasteful because in most cases code doesn't change much across many jobs.
> I'd like to propose and discuss feasibility of introducing a truly shared 
> cache so that multiple jobs from multiple users can share and cache jars. 
> This JIRA is to open the discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3637) Handle localization sym-linking correctly at the YARN level

2016-02-16 Thread Chris Trezzo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Trezzo updated YARN-3637:
---
Target Version/s: 2.9.0  (was: 2.8.0)

> Handle localization sym-linking correctly at the YARN level
> ---
>
> Key: YARN-3637
> URL: https://issues.apache.org/jira/browse/YARN-3637
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
>Priority: Blocker
>
> The shared cache needs to handle resource sym-linking at the YARN layer. 
> Currently, we let the application layer (i.e. mapreduce) handle this, but it 
> is probably better for all applications if it is handled transparently.
> Here is the scenario:
> Imagine two separate jars (with unique checksums) that have the same name 
> job.jar.
> They are stored in the shared cache as two separate resources:
> checksum1/job.jar
> checksum2/job.jar
> A new application tries to use both of these resources, but internally refers 
> to them as different names:
> foo.jar maps to checksum1
> bar.jar maps to checksum2
> When the shared cache returns the paths to the resources, both resources are 
> named the same (i.e. job.jar). Because of this, when the resources are 
> localized, one of them clobbers the other. This is because both symlinks in 
> the container_id directory have the same name (i.e. job.jar) even though they 
> point to two separate resource directories.
> Originally we tackled this in the MapReduce client by using the fragment 
> portion of the resource url. This, however, seems like something that should 
> be solved at the YARN layer.
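
For context, this is roughly what the MapReduce-client workaround with URI fragments looks like today (paths are illustrative; the point of this JIRA is that YARN should handle the disambiguation itself):

{code}
// Two cache entries whose files share the name job.jar, disambiguated by
// URI fragments so the localized symlinks become foo.jar and bar.jar
// instead of clobbering each other.
job.addCacheFile(new URI("hdfs://nn/sharedcache/checksum1/job.jar#foo.jar"));
job.addCacheFile(new URI("hdfs://nn/sharedcache/checksum2/job.jar#bar.jar"));
{code}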



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3637) Handle localization sym-linking correctly at the YARN level

2016-02-16 Thread Chris Trezzo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15149512#comment-15149512
 ] 

Chris Trezzo commented on YARN-3637:


[~vinodkv] I don't think I will have time to work on this before the 2.8 
release. I can move this to 2.9 for now.

> Handle localization sym-linking correctly at the YARN level
> ---
>
> Key: YARN-3637
> URL: https://issues.apache.org/jira/browse/YARN-3637
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
>Priority: Blocker
>
> The shared cache needs to handle resource sym-linking at the YARN layer. 
> Currently, we let the application layer (i.e. mapreduce) handle this, but it 
> is probably better for all applications if it is handled transparently.
> Here is the scenario:
> Imagine two separate jars (with unique checksums) that have the same name 
> job.jar.
> They are stored in the shared cache as two separate resources:
> checksum1/job.jar
> checksum2/job.jar
> A new application tries to use both of these resources, but internally refers 
> to them as different names:
> foo.jar maps to checksum1
> bar.jar maps to checksum2
> When the shared cache returns the paths to the resources, both resources are 
> named the same (i.e. job.jar). Because of this, when the resources are 
> localized, one of them clobbers the other. This is because both symlinks in 
> the container_id directory have the same name (i.e. job.jar) even though they 
> point to two separate resource directories.
> Originally we tackled this in the MapReduce client by using the fragment 
> portion of the resource url. This, however, seems like something that should 
> be solved at the YARN layer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4690) Skip object allocation in FSAppAttempt#getResourceUsage when possible

2016-02-16 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15149463#comment-15149463
 ] 

Sangjin Lee commented on YARN-4690:
---

It looks good to me for the most part. I have only one minor comment.

For the case where {{getPreemptedResources()}} is non-zero, we might want to 
cache that as a small optimization. In other words,
{code}
Resource preemptedResources = getPreemptedResources();
return preemptedResources.equals(Resources.none()) ?
    getCurrentConsumption() :
    Resources.subtract(getCurrentConsumption(), preemptedResources);
{code}

> Skip object allocation in FSAppAttempt#getResourceUsage when possible
> -
>
> Key: YARN-4690
> URL: https://issues.apache.org/jira/browse/YARN-4690
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Ming Ma
>Assignee: Ming Ma
> Attachments: YARN-4690.patch
>
>
> YARN-2768 addresses an important bottleneck. Here is another similar instance 
> where object allocation in Resources#subtract will slow down the fair 
> scheduler's event processing thread.
> {noformat}
> org.apache.hadoop.yarn.factories.impl.pb.RecordFactoryPBImpl.newRecordInstance(RecordFactoryPBImpl.java)
> org.apache.hadoop.yarn.util.Records.newRecord(Records.java)
> 
> org.apache.hadoop.yarn.util.resource.Resources.createResource(Resources.java)
> org.apache.hadoop.yarn.util.resource.Resources.clone(Resources.java)
> org.apache.hadoop.yarn.util.resource.Resources.subtract(Resources.java)
> 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.getResourceUsage(FSAppAttempt.java)
> 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.getResourceUsage(FSLeafQueue.java)
> 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.FairSharePolicy$FairShareComparator.compare(FairSharePolicy.java)
> 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.FairSharePolicy$FairShareComparator.compare(FairSharePolicy.java)
> java.util.TimSort.binarySort(TimSort.java)
> java.util.TimSort.sort(TimSort.java)
> java.util.TimSort.sort(TimSort.java)
> java.util.Arrays.sort(Arrays.java)
> java.util.Collections.sort(Collections.java)
> 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.assignContainer(FSParentQueue.java)
> 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.attemptScheduling(FairScheduler.java)
> 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.nodeUpdate(FairScheduler.java)
> 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java)
> 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java)
> 
> org.apache.hadoop.yarn.sls.scheduler.ResourceSchedulerWrapper.handle(ResourceSchedulerWrapper.java)
> 
> org.apache.hadoop.yarn.sls.scheduler.ResourceSchedulerWrapper.handle(ResourceSchedulerWrapper.java)
> {noformat}
> One way to fix it is to return {{getCurrentConsumption()}} if there is no 
> preemption, which is the normal case. This means the {{getResourceUsage}} 
> method will return a reference to {{FSAppAttempt}}'s internal resource object. 
> But that should be ok as {{getResourceUsage}} doesn't expect the caller to 
> modify the object.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4117) End to end unit test with mini YARN cluster for AMRMProxy Service

2016-02-16 Thread Giovanni Matteo Fumarola (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15149431#comment-15149431
 ] 

Giovanni Matteo Fumarola commented on YARN-4117:


I just ran all the tests that failed or timed out, and they completed fine on my 
machine.

My only concern is about the checkstyle.  

> End to end unit test with mini YARN cluster for AMRMProxy Service
> -
>
> Key: YARN-4117
> URL: https://issues.apache.org/jira/browse/YARN-4117
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, resourcemanager
>Reporter: Kishore Chaliparambil
>Assignee: Giovanni Matteo Fumarola
> Attachments: YARN-4117.v0.patch
>
>
> YARN-2884 introduces a proxy between AM and RM. This JIRA proposes an end to 
> end unit test using mini YARN cluster to the AMRMProxy service. This test 
> will validate register, allocate and finish application and token renewal.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4484) Available Resource calculation for a queue is not correct when used with labels

2016-02-16 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15149414#comment-15149414
 ] 

Wangda Tan commented on YARN-4484:
--

Thanks [~sunilg]/[~bibinchundatt],

bq. In new patch we can also take care of condition where the label mapping is 
not available in queue.
Sorry, I may not be getting this; could you elaborate?

A few comments on the patch:
- Instead of using componentwiseMax alone, {{available-headroom-per-partition = 
partition-used >= partition-configured ? 0 : componentwiseMax(partition-configured 
- partition-used, 0)}}. This is because when partition-used >= 
partition-configured, you cannot allocate anything, so the available resource 
should be considered to be 0 (see the sketch below).
- It's better to add a test.
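
Expressed with the {{Resources}} helpers, the suggestion is roughly (a sketch, not the patch):

{code}
// Per-partition headroom: zero once the partition is at or over its
// configured capacity, otherwise the component-wise positive remainder.
Resource available = Resources.fitsIn(configured, used)
    ? Resources.none()
    : Resources.componentwiseMax(
        Resources.subtract(configured, used), Resources.none());
{code}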

> Available Resource calculation for a queue is not correct when used with 
> labels
> ---
>
> Key: YARN-4484
> URL: https://issues.apache.org/jira/browse/YARN-4484
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 2.7.1
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: 0001-YARN-4484.patch
>
>
> To calculate the available resource for a queue, we have to get the total 
> resource allocated for all labels in the queue and compare it to its usage. 
> Also address the comments given in 
> [YARN-4304-comments|https://issues.apache.org/jira/browse/YARN-4304?focusedCommentId=15064874=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15064874
>  ] given by [~leftnoteasy] for same.
> ClusterMetrics related issues will also get handled once we fix this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4412) Create ClusterMonitor to compute ordered list of preferred NMs for OPPORTUNITIC containers

2016-02-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15149383#comment-15149383
 ] 

Hadoop QA commented on YARN-4412:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 14s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 11 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 14s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
53s {color} | {color:green} yarn-2877 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 11s 
{color} | {color:green} yarn-2877 passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 39s 
{color} | {color:green} yarn-2877 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
9s {color} | {color:green} yarn-2877 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 48s 
{color} | {color:green} yarn-2877 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 
21s {color} | {color:green} yarn-2877 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 
20s {color} | {color:green} yarn-2877 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 20s 
{color} | {color:green} yarn-2877 passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 56s 
{color} | {color:green} yarn-2877 passed with JDK v1.7.0_95 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 16s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 
22s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 19s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 6m 19s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 6m 19s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 58s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 6m 58s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 6m 58s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 9s 
{color} | {color:red} root: patch generated 83 new + 441 unchanged - 12 fixed = 
524 total (was 453) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 49s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 
22s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 0s 
{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 24s 
{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 57s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
generated 3 new + 0 unchanged - 0 fixed = 3 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 19s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 51s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 23s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_72. 
{color} |
| 

[jira] [Commented] (YARN-3945) maxApplicationsPerUser is wrongly calculated

2016-02-16 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15149365#comment-15149365
 ] 

Wangda Tan commented on YARN-3945:
--

[~Naganarasimha],
bq. so based on this there will be no elasticity even though the resources are 
free in some other queue, is this expected ?
I think so.

bq. Are we trying to avoid elasticity because we try to avoid preempting AM's 
even when preemption is enabled?
That's one purpose; the other is that, when preemption is disabled, we will not 
suffer from too many AMs being launched when the queue's available resource 
increases and then comes back down.

To move this task forward, I would suggest:
# Resolve the bug that maxApplicationsPerUser should be capped by 
maxApplicationsPerQueue (a sketch follows at the end of this comment)
# Computation of the user AM limit should be symmetric to the computation of 
user-limit, and the user AM limit should be capped by the queue's AM limit
# Avoid flexibility in computing the queue's and users' AM-limits (do not 
consider the queue max cap). This needs more discussion.

My understanding is that #1 and #2 are in the scope of this JIRA, and #3 could 
be done separately.

Agree?
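
For point 1, the cap could be as simple as the following (a sketch only, not the actual patch):

{code}
// Never allow the per-user limit to exceed the queue-level maximum.
maxApplicationsPerUser = Math.min(
    maxApplications,
    (int) (maxApplications * (userLimit / 100.0f) * userLimitFactor));
{code}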


> maxApplicationsPerUser is wrongly calculated
> 
>
> Key: YARN-3945
> URL: https://issues.apache.org/jira/browse/YARN-3945
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 2.7.1
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
> Attachments: YARN-3945.20150728-1.patch, YARN-3945.20150729-1.patch, 
> YARN-3945.V1.003.patch
>
>
> maxApplicationsPerUser is currently calculated based on the formula
> {{maxApplicationsPerUser = (int)(maxApplications * (userLimit / 100.0f) * 
> userLimitFactor)}}, but the description of userlimit is 
> {quote}
> Each queue enforces a limit on the percentage of resources allocated to a 
> user at any given time, if there is demand for resources. The user limit can 
> vary between a minimum and maximum value.{color:red} The former (the 
> minimum value) is set to this property value {color} and the latter (the 
> maximum value) depends on the number of users who have submitted 
> applications. For e.g., suppose the value of this property is 25. If two 
> users have submitted applications to a queue, no single user can use more 
> than 50% of the queue resources. If a third user submits an application, no 
> single user can use more than 33% of the queue resources. With 4 or more 
> users, no user can use more than 25% of the queues resources. A value of 100 
> implies no user limits are imposed. The default is 100. The value is specified 
> as an integer.
> {quote}
> A configuration related to the minimum limit should not be used in a formula 
> to calculate the maximum number of applications for a user.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4117) End to end unit test with mini YARN cluster for AMRMProxy Service

2016-02-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15149342#comment-15149342
 ] 

Hadoop QA commented on YARN-4117:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 13s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
42s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 54s 
{color} | {color:green} trunk passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 7s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
33s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 13s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
40s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
58s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 51s 
{color} | {color:green} trunk passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 58s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 12s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
0s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 55s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 55s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 6s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 6s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 31s 
{color} | {color:red} hadoop-yarn-project/hadoop-yarn: patch generated 3 new + 
55 unchanged - 0 fixed = 58 total (was 55) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 9s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
35s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git 
apply --whitespace=fix. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
22s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 43s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 51s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 8m 48s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with 
JDK v1.8.0_72. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 6m 45s {color} 
| {color:red} hadoop-yarn-server-tests in the patch failed with JDK v1.8.0_72. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 66m 5s {color} 
| {color:red} hadoop-yarn-client in the patch failed with JDK v1.8.0_72. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 15s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with 
JDK v1.7.0_95. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 7m 6s {color} | 
{color:red} hadoop-yarn-server-tests in the patch failed with JDK v1.7.0_95. 
{color} |
| 

[jira] [Commented] (YARN-3637) Handle localization sym-linking correctly at the YARN level

2016-02-16 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15149193#comment-15149193
 ] 

Vinod Kumar Vavilapalli commented on YARN-3637:
---

[~ctrezzo], are you planning to work on this any time soon? We need to make 
a decision about this (and shared-cache security support in general) for 2.8.0 
as soon as possible. Tx. 

> Handle localization sym-linking correctly at the YARN level
> ---
>
> Key: YARN-3637
> URL: https://issues.apache.org/jira/browse/YARN-3637
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
>Priority: Blocker
>
> The shared cache needs to handle resource sym-linking at the YARN layer. 
> Currently, we let the application layer (i.e. mapreduce) handle this, but it 
> is probably better for all applications if it is handled transparently.
> Here is the scenario:
> Imagine two separate jars (with unique checksums) that have the same name 
> job.jar.
> They are stored in the shared cache as two separate resources:
> checksum1/job.jar
> checksum2/job.jar
> A new application tries to use both of these resources, but internally refers 
> to them as different names:
> foo.jar maps to checksum1
> bar.jar maps to checksum2
> When the shared cache returns the paths to the resources, both resources are 
> named the same (i.e. job.jar). Because of this, when the resources are 
> localized, one of them clobbers the other. This is because both symlinks in 
> the container_id directory have the same name (i.e. job.jar) even though they 
> point to two separate resource directories.
> Originally we tackled this in the MapReduce client by using the fragment 
> portion of the resource url. This, however, seems like something that should 
> be solved at the YARN layer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4676) Automatic and Asynchronous Decommissioning Nodes Status Tracking

2016-02-16 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-4676:
--
Fix Version/s: (was: 2.8.0)

Removing fix-version as the patch isn't committed yet. This actually belongs to 
the "Target Version" field, but that is unavailable as of now.

> Automatic and Asynchronous Decommissioning Nodes Status Tracking
> 
>
> Key: YARN-4676
> URL: https://issues.apache.org/jira/browse/YARN-4676
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: Daniel Zhi
>Assignee: Daniel Zhi
>  Labels: features
> Attachments: GracefulDecommissionYarnNode.pdf, HADOOP-4676.004.patch, 
> HADOOP-4676.005.patch
>
>
> DecommissioningNodeWatcher, inside ResourceTrackingService, tracks 
> DECOMMISSIONING nodes' status automatically and asynchronously after the 
> client/admin makes the graceful decommission request. It tracks the 
> DECOMMISSIONING nodes' status to decide when, after all running containers on 
> the node have completed, the node will be transitioned into the DECOMMISSIONED 
> state. NodesListManager detects and handles include and exclude list changes 
> to kick off decommission or recommission as necessary.
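
A rough sketch of the watcher's final transition check (the helper name is an assumption, not the actual patch):

{code}
// Once a DECOMMISSIONING node has no live containers left, fire the final
// DECOMMISSION event so the RMNode state machine completes the transition.
if (rmNode.getState() == NodeState.DECOMMISSIONING
    && getNumLiveContainers(rmNode.getNodeID()) == 0) {  // assumed helper
  rmContext.getDispatcher().getEventHandler().handle(
      new RMNodeEvent(rmNode.getNodeID(), RMNodeEventType.DECOMMISSION));
}
{code}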



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1547) Prevent DoS of ApplicationMasterProtocol by putting in limits

2016-02-16 Thread Giovanni Matteo Fumarola (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15149085#comment-15149085
 ] 

Giovanni Matteo Fumarola commented on YARN-1547:


Thanks for raising this, [~vinodkv]. I was wondering if I might take this up, if 
you are not actively working on it. 
[~subru], [~kishorch] and I were brainstorming it and came up with a first 
approach. I will share a small design document with you to get feedback on 
it.


> Prevent DoS of ApplicationMasterProtocol by putting in limits
> -
>
> Key: YARN-1547
> URL: https://issues.apache.org/jira/browse/YARN-1547
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Vinod Kumar Vavilapalli
>
> Points of DoS in ApplicationMasterProtocol
>  - Host and trackingURL in RegisterApplicationMasterRequest
>  - Diagnostics, final trackingURL in FinishApplicationMasterRequest
>  - Unlimited number of resourceAsks, containersToBeReleased and 
> resourceBlacklistRequest in AllocateRequest
> -- Unbounded number of priorities and/or resourceRequests in each ask.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4695) EntityGroupFSTimelineStore to not log errors during shutdown

2016-02-16 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15149041#comment-15149041
 ] 

Li Lu commented on YARN-4695:
-

Sure. I'll take a look. Thanks for the work! 

> EntityGroupFSTimelineStore to not log errors during shutdown
> 
>
> Key: YARN-4695
> URL: https://issues.apache.org/jira/browse/YARN-4695
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: 2.8.0
>Reporter: Steve Loughran
>Assignee: Li Lu
>
> # The {{EntityGroupFSTimelineStore}} threads log exceptions that get raised 
> during their execution.
> # the service stops by interrupting all its workers
> # as a result, the workers all log exceptions at error *even during a managed 
> shutdown*
> # this creates distracting noise in logs



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4117) End to end unit test with mini YARN cluster for AMRMProxy Service

2016-02-16 Thread Giovanni Matteo Fumarola (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15149006#comment-15149006
 ] 

Giovanni Matteo Fumarola commented on YARN-4117:


Hi [~jianhe], can you please review this patch?
I am asking you because you reviewed and pushed the AMRMProxy in YARN-2884.

> End to end unit test with mini YARN cluster for AMRMProxy Service
> -
>
> Key: YARN-4117
> URL: https://issues.apache.org/jira/browse/YARN-4117
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, resourcemanager
>Reporter: Kishore Chaliparambil
>Assignee: Giovanni Matteo Fumarola
> Attachments: YARN-4117.v0.patch
>
>
> YARN-2884 introduces a proxy between AM and RM. This JIRA proposes an end to 
> end unit test using mini YARN cluster to the AMRMProxy service. This test 
> will validate register, allocate and finish application and token renewal.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4117) End to end unit test with mini YARN cluster for AMRMProxy Service

2016-02-16 Thread Giovanni Matteo Fumarola (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Giovanni Matteo Fumarola updated YARN-4117:
---
Attachment: YARN-4117.v0.patch

> End to end unit test with mini YARN cluster for AMRMProxy Service
> -
>
> Key: YARN-4117
> URL: https://issues.apache.org/jira/browse/YARN-4117
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, resourcemanager
>Reporter: Kishore Chaliparambil
>Assignee: Giovanni Matteo Fumarola
> Attachments: YARN-4117.v0.patch
>
>
> YARN-2884 introduces a proxy between AM and RM. This JIRA proposes an end to 
> end unit test using mini YARN cluster to the AMRMProxy service. This test 
> will validate register, allocate and finish application and token renewal.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4696) EntityGroupFSTimelineStore to work in the absence of an RM

2016-02-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15148995#comment-15148995
 ] 

Hadoop QA commented on YARN-4696:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 13s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 16s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
29s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 44s 
{color} | {color:green} trunk passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 3s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
36s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 18s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
39s {color} | {color:green} trunk passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 26s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timeline-pluginstorage
 in trunk has 1 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 14s 
{color} | {color:green} trunk passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 41s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 11s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
3s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 40s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 40s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 1s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 1s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 33s 
{color} | {color:red} hadoop-yarn-project/hadoop-yarn: patch generated 3 new + 
214 unchanged - 0 fixed = 217 total (was 214) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 12s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
32s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s 
{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 
16s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 14s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 32s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 22s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_72. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 4s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.8.0_72. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 55s 
{color} | {color:green} hadoop-yarn-server-timeline-pluginstorage in the patch 
passed with JDK v1.8.0_72. {color} |

[jira] [Updated] (YARN-4412) Create ClusterMonitor to compute ordered list of preferred NMs for OPPORTUNITIC containers

2016-02-16 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated YARN-4412:
--
Attachment: YARN-4412-yarn-2877.v4.patch

Updating the patch to fix test cases.

With regard to the findbugs warnings:
# The {{System.exit(-1)}} in the {{EventDispatcher}} is intentional.
# The synchronization inconsistencies in the Protocol Buffer classes will 
remain, to maintain consistency with all the other PBImpl classes. Also, I do 
not expect multiple threads to modify the same object in any case.

> Create ClusterMonitor to compute ordered list of preferred NMs for 
> OPPORTUNITIC containers
> --
>
> Key: YARN-4412
> URL: https://issues.apache.org/jira/browse/YARN-4412
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Arun Suresh
>Assignee: Arun Suresh
> Attachments: YARN-4412-yarn-2877.v1.patch, 
> YARN-4412-yarn-2877.v2.patch, YARN-4412-yarn-2877.v3.patch, 
> YARN-4412-yarn-2877.v4.patch
>
>
> Introduce a Cluster Monitor that aggregates load information from individual 
> Node Managers and computes an ordered list of preferred Node managers to be 
> used as target Nodes for OPPORTUNISTIC container allocations. 
> This list can be pushed out to the Node Manager (specifically the AMRMProxy 
> running on the Node) via the Allocate Response. This will be used to make 
> local Scheduling decisions
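
As a rough illustration of the ordered-list idea (the {{NodeLoad}} type and its field are hypothetical, purely for this sketch):

{code}
// Hypothetical per-node load record used only in this sketch.
class NodeLoad {
  final NodeId nodeId;
  final int queuedOpportunisticContainers;
  NodeLoad(NodeId nodeId, int queued) {
    this.nodeId = nodeId;
    this.queuedOpportunisticContainers = queued;
  }
}

// Least-loaded nodes first: the head of the list is the preferred target
// for the next OPPORTUNISTIC allocation.
List<NodeId> preferredNodes(List<NodeLoad> loads) {
  Collections.sort(loads, new Comparator<NodeLoad>() {
    @Override
    public int compare(NodeLoad a, NodeLoad b) {
      return Integer.compare(a.queuedOpportunisticContainers,
          b.queuedOpportunisticContainers);
    }
  });
  List<NodeId> ordered = new ArrayList<>();
  for (NodeLoad l : loads) {
    ordered.add(l.nodeId);
  }
  return ordered;
}
{code}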



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4695) EntityGroupFSTimelineStore to not log errors during shutdown

2016-02-16 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15148961#comment-15148961
 ] 

Steve Loughran commented on YARN-4695:
--

[~gtCarrera9], I've fixed this in YARN-4696; if you could review that patch I'd 
be grateful.

> EntityGroupFSTimelineStore to not log errors during shutdown
> 
>
> Key: YARN-4695
> URL: https://issues.apache.org/jira/browse/YARN-4695
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: 2.8.0
>Reporter: Steve Loughran
>Assignee: Li Lu
>
> # The {{EntityGroupFSTimelineStore}} threads log exceptions that get raised 
> during their execution.
> # the service stops by interrupting all its workers
> # as a result, the workers all log exceptions at error *even during a managed 
> shutdown*
> # this creates distracting noise in logs



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-4695) EntityGroupFSTimelineStore to not log errors during shutdown

2016-02-16 Thread Li Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu reassigned YARN-4695:
---

Assignee: Li Lu

> EntityGroupFSTimelineStore to not log errors during shutdown
> 
>
> Key: YARN-4695
> URL: https://issues.apache.org/jira/browse/YARN-4695
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: 2.8.0
>Reporter: Steve Loughran
>Assignee: Li Lu
>
> # The {{EntityGroupFSTimelineStore}} threads log exceptions that get raised 
> during their execution.
> # the service stops by interrupting all its workers
> # as a result, the workers all log exceptions at error *even during a managed 
> shutdown*
> # this creates distracting noise in logs



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4696) EntityGroupFSTimelineStore to work in the absence of an RM

2016-02-16 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated YARN-4696:
-
Attachment: YARN-4696-001.patch

Patch -001; the things I had to do to get my (external, Spark integration) test 
closer to working.

These are a combination of things that are absolutely needed (disabling the RM, 
flushing on close()), generally better (exception handling), and needed to 
debug what's going on (all the improved logging).

# RM integration can be disabled; the timeline store then only uses modified 
times as a liveness test. This includes checks for null around uses of 
yarnClient.
# I took the opportunity to clean up service shutdown in the process.
# YARN-4695 recommendations: all worker threads unwrap exceptions and, if they 
are interrupted exceptions, skip the stack trace.
# Better logging @ debug (including the number of scanned apps).
# {{TimelineWriter}} doesn't rewrap IOEs in IOEs, and wraps interrupted 
exceptions into {{InterruptedIOException}} (a sketch of this pattern follows 
below).
# {{FileSystemTimelineWriter.close()}} does a {{flush()}}, which stops any last 
events getting lost.

There are tests, but not here. Look in 
https://github.com/steveloughran/spark-timeline-integration
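
The interrupted-exception handling in point 5 boils down to the usual pattern, sketched here (not the patch itself; assumes {{java.io}} and {{java.util.concurrent.Callable}} imports):

{code}
// Don't rewrap IOExceptions, and surface interruption as an
// InterruptedIOException instead of a generic wrapper.
static void writeWithInterruptHandling(Callable<Void> write) throws IOException {
  try {
    write.call();
  } catch (IOException e) {
    throw e;  // already an IOException: pass it through unwrapped
  } catch (InterruptedException e) {
    Thread.currentThread().interrupt();
    throw (InterruptedIOException)
        new InterruptedIOException("interrupted while writing timeline data")
            .initCause(e);
  } catch (Exception e) {
    throw new IOException(e);
  }
}
{code}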

> EntityGroupFSTimelineStore to work in the absence of an RM
> --
>
> Key: YARN-4696
> URL: https://issues.apache.org/jira/browse/YARN-4696
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: 2.8.0
>Reporter: Steve Loughran
> Attachments: YARN-4696-001.patch
>
>
> {{EntityGroupFSTimelineStore}} now depends on an RM being up and running, with 
> the configuration pointing to it. This is a new change, and it impacts testing, 
> where you have historically been able to test without an RM running.
> The sole purpose of the probe is to automatically determine if an app is 
> running; it falls back to "unknown" if not. If the RM connection were 
> optional, the "unknown" codepath could be called directly, relying on the age 
> of the file as a metric of completion.
> Options:
> # add a flag to disable RM connect
> # skip automatically if the RM is not defined/set to 0.0.0.0
> # disable retries on yarn client IPC; if it fails, tag the app as unknown.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4648) Move preemption related tests from TestFairScheduler to TestFairSchedulerPreemption

2016-02-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15148917#comment-15148917
 ] 

Hadoop QA commented on YARN-4648:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 14s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
42s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s 
{color} | {color:green} trunk passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
18s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 35s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 5s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s 
{color} | {color:green} trunk passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
29s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 25s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
15s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 31s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
13s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 70m 19s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_72. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 71m 41s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
17s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 158m 17s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_72 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
| JDK v1.7.0_95 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12788055/YARN-4648.02.patch |
| JIRA Issue | YARN-4648 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |

[jira] [Updated] (YARN-4678) Cluster used capacity is > 100 when container reserved

2016-02-16 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-4678:
--
Attachment: 0001-YARN-4678.patch

Hi [~brahmareddy],
Sharing an initial version of the patch based on the above explanation.

> Cluster used capacity is > 100 when container reserved 
> ---
>
> Key: YARN-4678
> URL: https://issues.apache.org/jira/browse/YARN-4678
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
> Attachments: 0001-YARN-4678.patch
>
>
>  *Scenario:* 
> * Start cluster with Three NM's each having 8GB (cluster memory:24GB).
> * Configure queues with elasticity and userlimitfactor=10.
> * disable pre-emption.
> * run two job with different priority in different queue at the same time
> ** yarn jar hadoop-mapreduce-examples-2.7.2.jar pi -Dyarn.app.priority=LOW 
> -Dmapreduce.job.queuename=QueueA -Dmapreduce.map.memory.mb=4096 
> -Dyarn.app.mapreduce.am.resource.mb=1536 
> -Dmapreduce.job.reduce.slowstart.completedmaps=1.0 10 1
> ** yarn jar hadoop-mapreduce-examples-2.7.2.jar pi -Dyarn.app.priority=HIGH 
> -Dmapreduce.job.queuename=QueueB -Dmapreduce.map.memory.mb=4096 
> -Dyarn.app.mapreduce.am.resource.mb=1536 3 1
> * observe the cluster capacity which was used in RM web UI



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4696) EntityGroupFSTimelineStore to work in the absence of an RM

2016-02-16 Thread Steve Loughran (JIRA)
Steve Loughran created YARN-4696:


 Summary: EntityGroupFSTimelineStore to work in the absence of an RM
 Key: YARN-4696
 URL: https://issues.apache.org/jira/browse/YARN-4696
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: 2.8.0
Reporter: Steve Loughran


{{EntityGroupFSTimelineStore}} now depends on an RM being up and running; the 
configuration pointing to it. This is a new change, and impacts testing where 
you have historically been able to test without an RM running.

The sole purpose of the probe is to automatically determine if an app is 
running; it falls back to "unknown" if not. If the RM connection was optional, 
the "unknown" codepath could be called directly, relying on age of file as a 
metric of completion

Options

# add a flag to disable RM connect
# skip automatically if RM not defined/set to 0.0.0.0
# disable retries on yarn client IPC; if it fails, tag app as unknown.
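
A minimal sketch of option 2 above, assuming the check would sit wherever the 
store creates its YarnClient; the class and helper name are made up:
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

// Sketch of option 2 only; rmIsConfigured is a hypothetical helper, not
// existing code. If it returns false, the store would skip the YarnClient
// entirely and treat apps as "unknown", falling back to file age.
public final class RmProbe {
  private RmProbe() {
  }

  public static boolean rmIsConfigured(Configuration conf) {
    String rmAddress = conf.getTrimmed(YarnConfiguration.RM_ADDRESS, "");
    return !rmAddress.isEmpty() && !rmAddress.startsWith("0.0.0.0");
  }
}
{code}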



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4648) Move preemption related tests from TestFairScheduler to TestFairSchedulerPreemption

2016-02-16 Thread Kai Sasaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kai Sasaki updated YARN-4648:
-
Attachment: YARN-4648.02.patch

> Move preemption related tests from TestFairScheduler to 
> TestFairSchedulerPreemption
> ---
>
> Key: YARN-4648
> URL: https://issues.apache.org/jira/browse/YARN-4648
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.8.0
>Reporter: Karthik Kambatla
>Assignee: Kai Sasaki
>  Labels: newbie++
> Attachments: YARN-4648.01.patch, YARN-4648.02.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4695) EntityGroupFSTimelineStore to not log errors during shutdown

2016-02-16 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15148644#comment-15148644
 ] 

Steve Loughran commented on YARN-4695:
--

The hostname of www.bbc.co.uk/0.0.0.0:8032 is just 127.0.0.1 with the usual set 
of domains disabled. And as usual, the IPC client doesn't detect "0.0.0.0" as an 
invalid IP address. Even if the IPC client did detect it (it should), ATS can 
and should fail fast.

> EntityGroupFSTimelineStore to not log errors during shutdown
> 
>
> Key: YARN-4695
> URL: https://issues.apache.org/jira/browse/YARN-4695
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: 2.8.0
>Reporter: Steve Loughran
>
> # The {{EntityGroupFSTimelineStore}} threads log exceptions that get raised 
> during their execution.
> # the service stops by interrupting all its workers
> # as a result, the workers all log exceptions at error *even during a managed 
> shutdown*
> # this creates distracting noise in logs



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4695) EntityGroupFSTimelineStore to not log errors during shutdown

2016-02-16 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15148643#comment-15148643
 ] 

Steve Loughran commented on YARN-4695:
--

Fuller stack. What's happening is that the RM has been shut down (to be precise: 
there isn't an RM running, not even a little one). The log parser is asking for 
the latest list of running apps and timing out; this puts IPC into retry, 
which stops the shutdown from working.
{code}
016-02-16 14:13:28,126 [IPC Server Responder] INFO  ipc.Server 
(Server.java:run(959)) - Stopping IPC Server Responder
2016-02-16 14:13:28,126 [ScalaTest-main-running-TimelineListenerSuite] INFO  
timeline.EntityGroupFSTimelineStore 
(EntityGroupFSTimelineStore.java:serviceStop(275)) - Stopping 
EntityGroupFSTimelineStore
2016-02-16 14:13:28,127 [ScalaTest-main-running-TimelineListenerSuite] INFO  
timeline.EntityGroupFSTimelineStore 
(EntityGroupFSTimelineStore.java:serviceStop(279)) - Waiting for executor to 
terminate
2016-02-16 14:13:28,966 [EntityLogPluginWorker #1] INFO  ipc.Client 
(Client.java:handleConnectionFailure(897)) - Retrying connect to server: 
www.bbc.co.uk/0.0.0.0:8032. Already tried 0 time(s); retry policy is 
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-02-16 14:13:29,970 [EntityLogPluginWorker #1] INFO  ipc.Client 
(Client.java:handleConnectionFailure(897)) - Retrying connect to server: 
www.bbc.co.uk/0.0.0.0:8032. Already tried 1 time(s); retry policy is 
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-02-16 14:13:30,975 [EntityLogPluginWorker #1] INFO  ipc.Client 
(Client.java:handleConnectionFailure(897)) - Retrying connect to server: 
www.bbc.co.uk/0.0.0.0:8032. Already tried 2 time(s); retry policy is 
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-02-16 14:13:31,980 [EntityLogPluginWorker #1] INFO  ipc.Client 
(Client.java:handleConnectionFailure(897)) - Retrying connect to server: 
www.bbc.co.uk/0.0.0.0:8032. Already tried 3 time(s); retry policy is 
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-02-16 14:13:32,986 [EntityLogPluginWorker #1] INFO  ipc.Client 
(Client.java:handleConnectionFailure(897)) - Retrying connect to server: 
www.bbc.co.uk/0.0.0.0:8032. Already tried 4 time(s); retry policy is 
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-02-16 14:13:33,995 [EntityLogPluginWorker #1] INFO  ipc.Client 
(Client.java:handleConnectionFailure(897)) - Retrying connect to server: 
www.bbc.co.uk/0.0.0.0:8032. Already tried 5 time(s); retry policy is 
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-02-16 14:13:35,000 [EntityLogPluginWorker #1] INFO  ipc.Client 
(Client.java:handleConnectionFailure(897)) - Retrying connect to server: 
www.bbc.co.uk/0.0.0.0:8032. Already tried 6 time(s); retry policy is 
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-02-16 14:13:36,005 [EntityLogPluginWorker #1] INFO  ipc.Client 
(Client.java:handleConnectionFailure(897)) - Retrying connect to server: 
www.bbc.co.uk/0.0.0.0:8032. Already tried 7 time(s); retry policy is 
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-02-16 14:13:37,010 [EntityLogPluginWorker #1] INFO  ipc.Client 
(Client.java:handleConnectionFailure(897)) - Retrying connect to server: 
www.bbc.co.uk/0.0.0.0:8032. Already tried 8 time(s); retry policy is 
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-02-16 14:13:38,015 [EntityLogPluginWorker #1] INFO  ipc.Client 
(Client.java:handleConnectionFailure(897)) - Retrying connect to server: 
www.bbc.co.uk/0.0.0.0:8032. Already tried 9 time(s); retry policy is 
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-02-16 14:13:38,133 [ScalaTest-main-running-TimelineListenerSuite] WARN  
timeline.EntityGroupFSTimelineStore 
(EntityGroupFSTimelineStore.java:serviceStop(284)) - Executor did not terminate
2016-02-16 14:13:38,133 [ScalaTest-main-running-TimelineListenerSuite] INFO  
timeline.LeveldbTimelineStore (LeveldbTimelineStore.java:serviceStop(250)) - 
Waiting for deletion thread to complete its current action
2016-02-16 14:13:38,133 [Thread-9] INFO  timeline.LeveldbTimelineStore 
(LeveldbTimelineStore.java:run(296)) - Deletion thread received interrupt, 
exiting
2016-02-16 14:13:38,134 [EntityLogPluginWorker #1] ERROR 
timeline.EntityGroupFSTimelineStore (EntityGroupFSTimelineStore.java:run(693)) 
- Error processing logs for application__
java.lang.reflect.UndeclaredThrowableException
at com.sun.proxy.$Proxy25.getApplicationReport(Unknown Source)
at 
org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplicationReport(YarnClientImpl.java:448)
at 

[jira] [Commented] (YARN-4695) EntityGroupFSTimelineStore to not log errors during shutdown

2016-02-16 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15148624#comment-15148624
 ] 

Steve Loughran commented on YARN-4695:
--

Stack trace. Note how the interrupted exception happened during RPC and has been 
wrapped:
{code}
2016-02-16 13:57:21,171 [ScalaTest-main-running-TimelineListenerSuite] INFO  
timeline.LeveldbTimelineStore (LeveldbTimelineStore.java:serviceStop(250)) - 
Waiting for deletion thread to complete its current action
2016-02-16 13:57:21,171 [Thread-10] INFO  timeline.LeveldbTimelineStore 
(LeveldbTimelineStore.java:run(296)) - Deletion thread received interrupt, 
exiting
2016-02-16 13:57:21,172 [EntityLogPluginWorker #1] ERROR 
timeline.EntityGroupFSTimelineStore (EntityGroupFSTimelineStore.java:run(693)) 
- Error processing logs for application__
java.lang.reflect.UndeclaredThrowableException
at com.sun.proxy.$Proxy25.getApplicationReport(Unknown Source)
at 
org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplicationReport(YarnClientImpl.java:448)
at 
org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.getAppState(EntityGroupFSTimelineStore.java:464)
at 
org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.access$700(EntityGroupFSTimelineStore.java:79)
at 
org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore$AppLogs.parseSummaryLogs(EntityGroupFSTimelineStore.java:529)
at 
org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore$AppLogs.parseSummaryLogs(EntityGroupFSTimelineStore.java:519)
at 
org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore$ActiveLogParser.run(EntityGroupFSTimelineStore.java:686)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.InterruptedException: sleep interrupted
at java.lang.Thread.sleep(Native Method)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:159)
... 14 more
{code}
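
The interrupted sleep at the bottom of this stack is the retry layer waiting to 
reconnect to the (absent) RM. For tests that hit this, one possible mitigation, 
separate from anything proposed on this JIRA, is to bound the client-side 
retries so the probe fails fast; the property names below are standard 
Hadoop/YARN settings, but whether they cover this exact code path is an 
assumption:
{code}
import org.apache.hadoop.yarn.conf.YarnConfiguration;

// Sketch only: shrink the retry layers seen above so a missing RM makes the
// liveness probe fail fast in tests instead of sleeping through shutdown.
public final class FastFailRmConf {
  private FastFailRmConf() {
  }

  public static YarnConfiguration create() {
    YarnConfiguration conf = new YarnConfiguration();
    // RM proxy layer: stop retrying the RM connection after ~5 seconds.
    conf.setLong(YarnConfiguration.RESOURCEMANAGER_CONNECT_MAX_WAIT_MS, 5000L);
    conf.setLong(
        YarnConfiguration.RESOURCEMANAGER_CONNECT_RETRY_INTERVAL_MS, 1000L);
    // IPC layer: fewer connection attempts per call (the log above shows 10).
    conf.setInt("ipc.client.connect.max.retries", 1);
    return conf;
  }
}
{code}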

> EntityGroupFSTimelineStore to not log errors during shutdown
> 
>
> Key: YARN-4695
> URL: https://issues.apache.org/jira/browse/YARN-4695
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: 2.8.0
>Reporter: Steve Loughran
>
> # The {{EntityGroupFSTimelineStore}} threads log exceptions that get raised 
> during their execution.
> # the service stops by interrupting all its workers
> # as a result, the workers all log exceptions at error *even during a managed 
> shutdown*
> # this creates distracting noise in logs



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4695) EntityGroupFSTimelineStore to not log errors during shutdown

2016-02-16 Thread Steve Loughran (JIRA)
Steve Loughran created YARN-4695:


 Summary: EntityGroupFSTimelineStore to not log errors during 
shutdown
 Key: YARN-4695
 URL: https://issues.apache.org/jira/browse/YARN-4695
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: 2.8.0
Reporter: Steve Loughran


# The {{EntityGroupFSTimelineStore}} threads log exceptions that get raised 
during their execution.
# the service stops by interrupting all its workers
# as a result, the workers all log exceptions at error *even during a managed 
shutdown*
# this creates distracting noise in logs




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4648) Move preemption related tests from TestFairScheduler to TestFairSchedulerPreemption

2016-02-16 Thread Kai Sasaki (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15148619#comment-15148619
 ] 

Kai Sasaki commented on YARN-4648:
--

[~ozawa] It's reasonable. I'll update. Thanks!

> Move preemption related tests from TestFairScheduler to 
> TestFairSchedulerPreemption
> ---
>
> Key: YARN-4648
> URL: https://issues.apache.org/jira/browse/YARN-4648
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.8.0
>Reporter: Karthik Kambatla
>Assignee: Kai Sasaki
>  Labels: newbie++
> Attachments: YARN-4648.01.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4412) Create ClusterMonitor to compute ordered list of preferred NMs for OPPORTUNITIC containers

2016-02-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15148532#comment-15148532
 ] 

Hadoop QA commented on YARN-4412:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 15s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 6 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 14s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
40s {color} | {color:green} yarn-2877 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 5m 41s 
{color} | {color:green} yarn-2877 passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 34s 
{color} | {color:green} yarn-2877 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
9s {color} | {color:green} yarn-2877 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 46s 
{color} | {color:green} yarn-2877 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 
21s {color} | {color:green} yarn-2877 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 
21s {color} | {color:green} yarn-2877 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 12s 
{color} | {color:green} yarn-2877 passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 47s 
{color} | {color:green} yarn-2877 passed with JDK v1.7.0_95 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 15s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 
19s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 5m 34s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 5m 34s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 5m 34s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 34s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 6m 34s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 6m 34s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 10s 
{color} | {color:red} root: patch generated 83 new + 442 unchanged - 11 fixed = 
525 total (was 453) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 46s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 
21s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 22s 
{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 57s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
generated 3 new + 0 unchanged - 0 fixed = 3 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 12s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 56s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 23s {color} 
| {color:red} hadoop-yarn-api in the patch failed with JDK v1.8.0_72. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 52s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.8.0_72. 

[jira] [Commented] (YARN-2225) Turn the virtual memory check to be off by default

2016-02-16 Thread Akira AJISAKA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15148449#comment-15148449
 ] 

Akira AJISAKA commented on YARN-2225:
-

bq. Why not making vmem-pmem ratio larger to address the problem?
Yes, we can avoid the error by making the vmem-pmem ratio larger, so the 
problems are that the default value of the ratio is too small for Java 8 and 
that the virtual memory check is enabled by default. Can we make the default 
ratio larger, or disable the vmem check by default?
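
For context, a sketch of the two knobs under discussion, set programmatically 
here only for illustration (in a real cluster they would go in yarn-site.xml); 
the ratio value below is an arbitrary example:
{code}
import org.apache.hadoop.yarn.conf.YarnConfiguration;

// Sketch of the two settings under discussion; values are examples only.
public final class VmemCheckExample {
  private VmemCheckExample() {
  }

  public static YarnConfiguration create() {
    YarnConfiguration conf = new YarnConfiguration();
    // Option A: raise the ratio (default is 2.1) so Java 8 tasks fit.
    conf.setFloat(YarnConfiguration.NM_VMEM_PMEM_RATIO, 4.0f);
    // Option B: turn the virtual memory check off entirely.
    conf.setBoolean(YarnConfiguration.NM_VMEM_CHECK_ENABLED, false);
    return conf;
  }
}
{code}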

> Turn the virtual memory check to be off by default
> --
>
> Key: YARN-2225
> URL: https://issues.apache.org/jira/browse/YARN-2225
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Attachments: YARN-2225.patch
>
>
> The virtual memory check may not be the best way to isolate applications. 
> Virtual memory is not the constrained resource. It would be better if we 
> limit the swapping of the task using swapiness instead. This patch will turn 
> this DEFAULT_NM_VMEM_CHECK_ENABLED off by default and let users turn it on if 
> they need to.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4692) [Umbrella] Simplified and first-class support for services in YARN

2016-02-16 Thread Marco Rabozzi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15148405#comment-15148405
 ] 

Marco Rabozzi commented on YARN-4692:
-

Thanks [~vinodkv] for starting the discussion and [~asuresh] for the detailed 
comments on long running container scheduling. Overall, the document gives a 
very detailed overview of the current state for long running services support 
in YARN.

With respect to long-running service upgrades, the proposal for allocation 
reuse (3.2.3) is very interesting since it reduces the time needed for 
container upgrades. However, if the AM container is the one that needs to be 
upgraded, the RM should be aware of the process; otherwise, in case of 
subsequent AM or NM failures, the AM might be restarted with the old bits. I 
think a possible solution would be to revise the design proposed in YARN-4470 
to take allocation reuse into account. We could decouple the request to update 
the submission context within the RM from the actual updated *startContainer* 
request for the same AM allocation.


> [Umbrella] Simplified and first-class support for services in YARN
> --
>
> Key: YARN-4692
> URL: https://issues.apache.org/jira/browse/YARN-4692
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
> Attachments: 
> YARN-First-Class-And-Simplified-Support-For-Services-v0.pdf
>
>
> YARN-896 focused on getting the ball rolling on the support for services 
> (long running applications) on YARN.
> I’d like propose the next stage of this effort: _Simplified and first-class 
> support for services in YARN_.
> The chief rationale for filing a separate new JIRA is threefold:
>  - Do a fresh survey of all the things that are already implemented in the 
> project
>  - Weave a comprehensive story around what we further need and attempt to 
> rally the community around a concrete end-goal, and
>  - Additionally focus on functionality that YARN-896 and friends left for 
> higher layers to take care of and see how much of that is better integrated 
> into the YARN platform itself.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2225) Turn the virtual memory check to be off by default

2016-02-16 Thread Tsuyoshi Ozawa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15148329#comment-15148329
 ] 

Tsuyoshi Ozawa commented on YARN-2225:
--

Why not making vmem-pmem ratio larger to address the problem?

> Turn the virtual memory check to be off by default
> --
>
> Key: YARN-2225
> URL: https://issues.apache.org/jira/browse/YARN-2225
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Attachments: YARN-2225.patch
>
>
> The virtual memory check may not be the best way to isolate applications. 
> Virtual memory is not the constrained resource. It would be better if we 
> limit the swapping of the task using swapiness instead. This patch will turn 
> this DEFAULT_NM_VMEM_CHECK_ENABLED off by default and let users turn it on if 
> they need to.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2225) Turn the virtual memory check to be off by default

2016-02-16 Thread Akira AJISAKA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15148306#comment-15148306
 ] 

Akira AJISAKA commented on YARN-2225:
-

I'm thinking the default value should be changed in branch-2 as well. We have 
made incompatible changes in branch-2 before when the change was really needed; 
there are 14 incompatible changes in 2.6.0/2.7.0.
https://issues.apache.org/jira/issues/?jql=project%20in%20(YARN%2C%20HADOOP%2C%20HDFS%2C%20MAPREDUCE)%20AND%20status%20in%20(Resolved%2C%20Closed)%20AND%20fixVersion%20in%20(2.6.0%2C%202.7.0)%20AND%20%22Hadoop%20Flags%22%20%3D%20%22Incompatible%20change%22

I am fed up with hearing Java 8 users' complaint: "The default value does not 
work."

> Turn the virtual memory check to be off by default
> --
>
> Key: YARN-2225
> URL: https://issues.apache.org/jira/browse/YARN-2225
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Attachments: YARN-2225.patch
>
>
> The virtual memory check may not be the best way to isolate applications. 
> Virtual memory is not the constrained resource. It would be better if we 
> limit the swapping of the task using swapiness instead. This patch will turn 
> this DEFAULT_NM_VMEM_CHECK_ENABLED off by default and let users turn it on if 
> they need to.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (YARN-2225) Turn the virtual memory check to be off by default

2016-02-16 Thread Akira AJISAKA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira AJISAKA reopened YARN-2225:
-

I'd like to reopen this.
The issue was previously closed because this change is incompatible, but the 
change can go into trunk.

> Turn the virtual memory check to be off by default
> --
>
> Key: YARN-2225
> URL: https://issues.apache.org/jira/browse/YARN-2225
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Attachments: YARN-2225.patch
>
>
> The virtual memory check may not be the best way to isolate applications. 
> Virtual memory is not the constrained resource. It would be better if we 
> limit the swapping of the task using swapiness instead. This patch will turn 
> this DEFAULT_NM_VMEM_CHECK_ENABLED off by default and let users turn it on if 
> they need to.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4412) Create ClusterMonitor to compute ordered list of preferred NMs for OPPORTUNITIC containers

2016-02-16 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated YARN-4412:
--
Attachment: YARN-4412-yarn-2877.v3.patch

Uploading a patch rebased onto the latest YARN-2877 branch, which also 
addresses [~curino]'s review comments.

> Create ClusterMonitor to compute ordered list of preferred NMs for 
> OPPORTUNITIC containers
> --
>
> Key: YARN-4412
> URL: https://issues.apache.org/jira/browse/YARN-4412
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Arun Suresh
>Assignee: Arun Suresh
> Attachments: YARN-4412-yarn-2877.v1.patch, 
> YARN-4412-yarn-2877.v2.patch, YARN-4412-yarn-2877.v3.patch
>
>
> Introduce a Cluster Monitor that aggregates load information from individual 
> Node Managers and computes an ordered list of preferred Node managers to be 
> used as target Nodes for OPPORTUNISTIC container allocations. 
> This list can be pushed out to the Node Manager (specifically the AMRMProxy 
> running on the Node) via the Allocate Response. This will be used to make 
> local Scheduling decisions



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4412) Create ClusterMonitor to compute ordered list of preferred NMs for OPPORTUNITIC containers

2016-02-16 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15148261#comment-15148261
 ] 

Arun Suresh commented on YARN-4412:
---

Many thanks for the detailed review [~curino].

# I totally agree with your point about explicitly authorizing AMs to send and 
receive cluster information via the extended protocol: YARN-4631 has been 
raised to track this.
# With regard to generalizing {{QueuedContainersStatus}} into a 
{{ClusterStatus}}: please note that this is actually metadata sent from the NM 
to the RM, so *ClusterStatus* might not apply here. But I agree, we can 
probably add more cluster information to the {{DistributedSchedulingProtocol}}, 
which we introduced in YARN-2885. Also, the node heartbeat already contains 
both container and aggregate node resource utilization information. 
{{QueuedContainersStatus}} is just another utilization metric, used by the 
{{ClusterMonitor}} running on the RM and by the DistributedScheduling framework 
to gauge the relative load on a node based on the state of its queue 
(maintained by the {{ContainersMonitor}}, which queues OPPORTUNISTIC container 
requests); a toy sketch of such a load-based ordering follows below.
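
As a toy illustration of ordering nodes by queue-based load (this is not the 
actual {{TopKNodeSelector}}; the class name and the per-node queued-container 
counts are made up for the example):
{code}
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Toy example only: order nodes so the ones with the shortest container
// queue come first. This is NOT the actual TopKNodeSelector.
public final class ToyNodeOrdering {
  private ToyNodeOrdering() {
  }

  public static List<String> leastLoadedFirst(
      final Map<String, Integer> queuedPerNode) {
    List<String> nodes = new ArrayList<String>(queuedPerNode.keySet());
    Collections.sort(nodes, new Comparator<String>() {
      @Override
      public int compare(String a, String b) {
        // Fewer queued OPPORTUNISTIC containers means a more preferred node.
        return queuedPerNode.get(a).compareTo(queuedPerNode.get(b));
      }
    });
    return nodes;
  }
}
{code}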

bq.  ..documentation on the various classes would help. e.g., you introduce a 
DistributedSchedulingService, ..
Agreed, I have added some class level docs to some of the new classes 
introduced here.

bq. ... if you are factoring out all the "guts" of SchedulerEventDispatcher, 
can't we simply move the class out? ..
Agreed.. 

bq. Can you clarify what happens in DistributedSchedulingService.getServer() 
?...
Fixed the comment to explain this.

bq. ..assumes resources will have only cpu/mem...Is there any better way to 
load this info from configuration? It would be nice to have a 
config.getResource("blah"), which takes care of this...
Good point. Unfortunately, the Configuration object does not currently support 
{{getResource()}}. Once the generalized resource model lands, I will circle 
back to this; a purely illustrative sketch of what such a helper could look 
like is below.
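
Purely as an illustration of what a {{config.getResource()}}-style helper could 
look like (the key suffixes and the class are hypothetical; this is not an 
existing Configuration API):
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.Resource;

// Hypothetical helper only -- Configuration has no getResource() today.
// The "<prefix>.memory-mb" / "<prefix>.vcores" key suffixes are made up.
public final class ResourceConfigHelper {
  private ResourceConfigHelper() {
  }

  public static Resource getResource(Configuration conf, String prefix,
      int defaultMemoryMb, int defaultVcores) {
    int memoryMb = conf.getInt(prefix + ".memory-mb", defaultMemoryMb);
    int vcores = conf.getInt(prefix + ".vcores", defaultVcores);
    return Resource.newInstance(memoryMb, vcores);
  }
}
{code}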

bq. I see tests for TopKNodeSelector, but for nothing else. Is this enough?
Definitely not, but we have to wait for the actual changes to the 
{{ContainerManager}} and {{ContainersMonitor}} classes, handled in YARN-2883, 
to test this end-to-end. In the meantime, I will add tests to verify that the 
extra fields in the protobuf are handled correctly.


> Create ClusterMonitor to compute ordered list of preferred NMs for 
> OPPORTUNITIC containers
> --
>
> Key: YARN-4412
> URL: https://issues.apache.org/jira/browse/YARN-4412
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Arun Suresh
>Assignee: Arun Suresh
> Attachments: YARN-4412-yarn-2877.v1.patch, 
> YARN-4412-yarn-2877.v2.patch
>
>
> Introduce a Cluster Monitor that aggregates load information from individual 
> Node Managers and computes an ordered list of preferred Node managers to be 
> used as target Nodes for OPPORTUNISTIC container allocations. 
> This list can be pushed out to the Node Manager (specifically the AMRMProxy 
> running on the Node) via the Allocate Response. This will be used to make 
> local Scheduling decisions



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)