[jira] [Commented] (YARN-4547) LeafQueue#getApplications() is read-only interface, but it provides reference to caller
[ https://issues.apache.org/jira/browse/YARN-4547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15150015#comment-15150015 ] Devaraj K commented on YARN-4547: - Shouldn't it be Duplicate instead of Done? > LeafQueue#getApplications() is read-only interface, but it provides reference > to caller > --- > > Key: YARN-4547 > URL: https://issues.apache.org/jira/browse/YARN-4547 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler > Reporter: Rohith Sharma K S > Assignee: Rohith Sharma K S > > The API below is a read-only interface, but it returns a reference to the caller. > This allows the caller to modify the orderingPolicy entities. If a reference to the > ordering policy is required, the caller can use > {{LeafQueue#getOrderingPolicy()#getSchedulableEntities()}}. > The returned object should be a clone of > orderingPolicy.getSchedulableEntities(). > {code} > /** > * Obtain (read-only) collection of active applications. > */ > public Collection getApplications() { > return orderingPolicy.getSchedulableEntities(); > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-4547) LeafQueue#getApplications() is read-only interface, but it provides reference to caller
[ https://issues.apache.org/jira/browse/YARN-4547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith Sharma K S resolved YARN-4547. - Resolution: Done This change was handled along with YARN-4617; closing it as Done. > LeafQueue#getApplications() is read-only interface, but it provides reference > to caller > --- > > Key: YARN-4547 > URL: https://issues.apache.org/jira/browse/YARN-4547 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler > Reporter: Rohith Sharma K S > Assignee: Rohith Sharma K S > > The API below is a read-only interface, but it returns a reference to the caller. > This allows the caller to modify the orderingPolicy entities. If a reference to the > ordering policy is required, the caller can use > {{LeafQueue#getOrderingPolicy()#getSchedulableEntities()}}. > The returned object should be a clone of > orderingPolicy.getSchedulableEntities(). > {code} > /** > * Obtain (read-only) collection of active applications. > */ > public Collection getApplications() { > return orderingPolicy.getSchedulableEntities(); > } > {code}
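The fix proposed in the quoted issue — returning a copy instead of the live collection — can be sketched as follows. This is a minimal illustration, not the actual YARN-4617 change; `OrderingPolicy`, `LeafQueueSketch`, and the raw element type are simplified stand-ins for the real YARN classes.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;

// Stand-in for the real OrderingPolicy: holds the scheduler's live entity list.
class OrderingPolicy<E> {
    private final Collection<E> entities = new ArrayList<>();
    Collection<E> getSchedulableEntities() { return entities; }
    void addSchedulableEntity(E e) { entities.add(e); }
}

class LeafQueueSketch<E> {
    private final OrderingPolicy<E> orderingPolicy = new OrderingPolicy<>();

    void add(E app) { orderingPolicy.addSchedulableEntity(app); }

    /** Obtain a (truly) read-only collection of active applications. */
    public Collection<E> getApplications() {
        // Copy and wrap so callers cannot mutate the scheduler's internal state.
        return Collections.unmodifiableCollection(
            new ArrayList<>(orderingPolicy.getSchedulableEntities()));
    }
}
```

Callers that genuinely need the live view can still go through `getOrderingPolicy().getSchedulableEntities()`, as the issue description notes.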
[jira] [Commented] (YARN-3223) Resource update during NM graceful decommission
[ https://issues.apache.org/jira/browse/YARN-3223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15149811#comment-15149811 ] Junping Du commented on YARN-3223: -- Sorry for the late reply; I have been on vacation recently. Will take a look at it soon. > Resource update during NM graceful decommission > --- > > Key: YARN-3223 > URL: https://issues.apache.org/jira/browse/YARN-3223 > Project: Hadoop YARN > Issue Type: Sub-task > Components: graceful, nodemanager, resourcemanager > Affects Versions: 2.7.1 > Reporter: Junping Du > Assignee: Brook Zhou > Attachments: YARN-3223-v0.patch, YARN-3223-v1.patch, > YARN-3223-v2.patch, YARN-3223-v3.patch, YARN-3223-v4.patch, YARN-3223-v5.patch > > > During NM graceful decommission, we should handle resource update properly, > including: making RMNode keep track of the old resource for possible rollback, keeping > the available resource at 0, and updating the used resource as > containers finish.
[jira] [Commented] (YARN-3998) Add retry-times to let NM re-launch container when it fails to run
[ https://issues.apache.org/jira/browse/YARN-3998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15149654#comment-15149654 ] Jun Gong commented on YARN-3998: Sorry for the late reply, I was on holiday. Thanks [~vinodkv] and [~vvasudev] for the suggestions and review! Some additional thoughts besides [~vvasudev]'s opinion: {quote} Unification with AM restart policies {quote} I agree with [~vvasudev]. The AM restart policy retries across different nodes, while this feature retries on the local node. When the RM launches an AM, it could specify a local retry policy for it. {quote} Treat relaunch in a first-class manner {quote} Glad to see it treated in a first-class manner; I will update the patch. {quote} The following isn’t fool-proof and won’t work for all apps, can we just persist and read the selected log-dir from the state-store? ContainerLaunch.handleContainerExitWithFailure() needs to be handled differently during container-relaunches. The same can be done for the work-dir. All of these are related. If we store the log dir and work dir in the state store, we can address all 3 of these. {quote} Yes, it will be better to store the log dir and work dir if we aim to make it more accurate. I was trying to make minimal changes for this feature. {quote} In fact, if we end up changing the work-dir during relaunch due to a bad-dir, that may result in a breakage for the app. Apps may be reading from / writing into the work-dir and changing it during relaunch may invalidate the application's assumptions. Should we just fail the container completely and let the AM deal with it? {quote} My thought is that if the user specifies a retry policy on a container, the user should make sure that the container can deal with this situation. {quote} Instead of removing a line and setting the limit to 10*1000, take the last 'n' characters in the string where 'n' is a config setting. 
{quote} It might make the diagnostics inconsistent to remove the last n characters: suppose the diagnostics is “The exception is XXX” and there are n characters in XXX; the diagnostics becomes “The exception is”. There is a similar problem with removing the first or last n lines. How about removing previous attempts' error information and just keeping the latest attempt's information? Glad to see more discussion about the feature. > Add retry-times to let NM re-launch container when it fails to run > -- > > Key: YARN-3998 > URL: https://issues.apache.org/jira/browse/YARN-3998 > Project: Hadoop YARN > Issue Type: New Feature > Reporter: Jun Gong > Assignee: Jun Gong > Attachments: YARN-3998.01.patch, YARN-3998.02.patch, > YARN-3998.03.patch, YARN-3998.04.patch, YARN-3998.05.patch, YARN-3998.06.patch > > > I'd like to add a field (retry-times) in ContainerLaunchContext. When the AM > launches containers, it could specify the value. Then the NM will re-launch the > container 'retry-times' times when it fails to run (e.g. exit code is not 0). > It will save a lot of time: it avoids container localization, the RM does not > need to re-schedule the container, and local files in the container's working > directory will be left for re-use. (If the container has downloaded some big > files, it does not need to re-download them when running again.) > We find it is useful in systems like Storm.
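The two diagnostics-trimming options debated above can be sketched side by side. The method names and the attempt-separator marker are hypothetical illustrations, not NodeManager code:

```java
// Sketch of the two options discussed in the comment thread.
class DiagnosticsTrim {
    /** Reviewer suggestion: keep only the last n characters of the diagnostics. */
    static String keepLastChars(String diagnostics, int n) {
        return diagnostics.length() <= n
            ? diagnostics
            : diagnostics.substring(diagnostics.length() - n);
    }

    /** Jun Gong's alternative: drop earlier attempts' output and keep only the
     *  latest attempt's block, assuming each retry appends a known separator. */
    static String keepLatestAttempt(String diagnostics, String attemptSeparator) {
        int last = diagnostics.lastIndexOf(attemptSeparator);
        return last < 0 ? diagnostics
                        : diagnostics.substring(last + attemptSeparator.length());
    }
}
```

The character-based cut can split a message mid-sentence (the inconsistency Jun Gong describes), while the per-attempt cut always yields a self-contained block but depends on a reliable separator between attempts.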
[jira] [Commented] (YARN-4697) NM aggregation thread pool is not bound by limits
[ https://issues.apache.org/jira/browse/YARN-4697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15149568#comment-15149568 ] Haibo Chen commented on YARN-4697: -- The default value is 100, as defined in YarnConfiguration. > NM aggregation thread pool is not bound by limits > - > > Key: YARN-4697 > URL: https://issues.apache.org/jira/browse/YARN-4697 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager > Reporter: Haibo Chen > Assignee: Haibo Chen > Attachments: yarn4697.001.patch > > > In LogAggregationService.java we create a thread pool to upload logs from > the nodemanager to HDFS if log aggregation is turned on. This is a cached > thread pool which, based on the javadoc, is an unlimited pool of threads. > In the case that we have had a problem with log aggregation, this could cause > a problem on restart. The number of threads created at that point could be > huge and will put a large load on the NameNode, and in the worst case could even > bring it down due to file descriptor issues.
[jira] [Updated] (YARN-4697) NM aggregation thread pool is not bound by limits
[ https://issues.apache.org/jira/browse/YARN-4697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen updated YARN-4697: - Attachment: yarn4697.001.patch > NM aggregation thread pool is not bound by limits > - > > Key: YARN-4697 > URL: https://issues.apache.org/jira/browse/YARN-4697 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager > Reporter: Haibo Chen > Assignee: Haibo Chen > Attachments: yarn4697.001.patch > > > In LogAggregationService.java we create a thread pool to upload logs from > the nodemanager to HDFS if log aggregation is turned on. This is a cached > thread pool which, based on the javadoc, is an unlimited pool of threads. > In the case that we have had a problem with log aggregation, this could cause > a problem on restart. The number of threads created at that point could be > huge and will put a large load on the NameNode, and in the worst case could even > bring it down due to file descriptor issues.
[jira] [Commented] (YARN-4696) EntityGroupFSTimelineStore to work in the absence of an RM
[ https://issues.apache.org/jira/browse/YARN-4696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15149562#comment-15149562 ] Li Lu commented on YARN-4696: - Thanks for the work [~ste...@apache.org]! My main question is: what is the assumed use case for the "non-RM" mode of the reader, other than unit tests? If it's only for unit tests, are there any ways we can clearly restrict this? Because IIUC, if detached from the RM, all app states will be unknown and eventually completed. However, the status is not accurate because it's only a timeout from unknownActiveMillis. For unit tests, is it possible to have a mock RM do the same job? If that is too much trouble then having this looks fine, but we need to clearly restrict the use case. nits: - Line 46, EntityGroupFSTimelineStore, I think we'd incline to avoid import \*s? - There is a findbugs warning about an inconsistent synchronization condition for LevelDBCacheTimelineStore, where we may want to synchronize in the constructor? This is an unrelated failure, so feel free to skip it. However, if you happen to have time, a quick fix would also be helpful. [~xgong], please double check the logic on the writer side. Exception handling looks fine, but I would like to double check the logic on the flush. > EntityGroupFSTimelineStore to work in the absence of an RM > -- > > Key: YARN-4696 > URL: https://issues.apache.org/jira/browse/YARN-4696 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver > Affects Versions: 2.8.0 > Reporter: Steve Loughran > Attachments: YARN-4696-001.patch > > > {{EntityGroupFSTimelineStore}} now depends on an RM being up and running, with the > configuration pointing to it. This is a new change, and impacts testing, where > you have historically been able to test without an RM running. > The sole purpose of the probe is to automatically determine if an app is > running; it falls back to "unknown" if not. If the RM connection were > optional, the "unknown" codepath could be called directly, relying on the age of the > file as a metric of completion. > Options > # add a flag to disable RM connect > # skip automatically if RM not defined/set to 0.0.0.0 > # disable retries on yarn client IPC; if it fails, tag app as unknown.
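Options 1 and 2 from the list above amount to guarding the RM liveness probe with a couple of checks. A minimal sketch, with invented method and parameter names (the real change would read these from the YARN configuration):

```java
// Sketch of options #1 and #2: decide whether to probe the RM at all.
class RmProbeGuard {
    static boolean shouldProbeRm(boolean probeEnabled, String rmAddress) {
        // Option #1: an explicit flag to disable the RM connect entirely.
        if (!probeEnabled) return false;
        // Option #2: skip automatically when the RM address is unset or is
        // the 0.0.0.0 placeholder, falling back to the "unknown" codepath.
        if (rmAddress == null || rmAddress.isEmpty()) return false;
        return !rmAddress.startsWith("0.0.0.0");
    }
}
```

When the guard returns false, app state would be derived from file age alone, as the issue description suggests.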
[jira] [Created] (YARN-4697) NM aggregation thread pool is not bound by limits
Haibo Chen created YARN-4697: Summary: NM aggregation thread pool is not bound by limits Key: YARN-4697 URL: https://issues.apache.org/jira/browse/YARN-4697 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Reporter: Haibo Chen Assignee: Haibo Chen In LogAggregationService.java we create a thread pool to upload logs from the nodemanager to HDFS if log aggregation is turned on. This is a cached thread pool which, based on the javadoc, is an unlimited pool of threads. In the case that we have had a problem with log aggregation, this could cause a problem on restart. The number of threads created at that point could be huge and will put a large load on the NameNode, and in the worst case could even bring it down due to file descriptor issues.
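The unbounded-versus-bounded distinction at the heart of this issue can be illustrated with the two `Executors` factory methods. The limit of 100 mirrors the default mentioned in the comment thread; this is a generic sketch, not the actual patch:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Contrast of the two pool types discussed in the issue description.
class LogAggregationPools {
    static ExecutorService unbounded() {
        // A cached pool creates a new thread whenever no idle one exists, so
        // a burst of pending app-log uploads after an NM restart can spawn an
        // arbitrary number of concurrent HDFS writers.
        return Executors.newCachedThreadPool();
    }

    static ExecutorService bounded(int maxThreads) {
        // A fixed pool caps concurrency; excess upload tasks wait in the
        // pool's internal queue instead of each getting its own thread.
        return Executors.newFixedThreadPool(maxThreads);
    }
}
```

Bounding the pool keeps the number of simultaneous NameNode connections (and file descriptors) proportional to the configured limit rather than to the backlog size.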
[jira] [Updated] (YARN-1492) truly shared cache for jars (jobjar/libjar)
[ https://issues.apache.org/jira/browse/YARN-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Trezzo updated YARN-1492: --- Target Version/s: 2.9.0 (was: 2.8.0) > truly shared cache for jars (jobjar/libjar) > --- > > Key: YARN-1492 > URL: https://issues.apache.org/jira/browse/YARN-1492 > Project: Hadoop YARN > Issue Type: New Feature > Affects Versions: 2.0.4-alpha > Reporter: Sangjin Lee > Assignee: Chris Trezzo > Priority: Critical > Attachments: YARN-1492-all-trunk-v1.patch, > YARN-1492-all-trunk-v2.patch, YARN-1492-all-trunk-v3.patch, > YARN-1492-all-trunk-v4.patch, YARN-1492-all-trunk-v5.patch, > shared_cache_design.pdf, shared_cache_design_v2.pdf, > shared_cache_design_v3.pdf, shared_cache_design_v4.pdf, > shared_cache_design_v5.pdf, shared_cache_design_v6.pdf > > > Currently there is the distributed cache that enables you to cache jars and > files so that attempts from the same job can reuse them. However, sharing is > limited with the distributed cache because it is normally on a per-job basis. > On a large cluster, sometimes copying of jobjars and libjars becomes so > prevalent that it consumes a large portion of the network bandwidth, not to > speak of defeating the purpose of "bringing compute to where data is". This > is wasteful because in most cases code doesn't change much across many jobs. > I'd like to propose and discuss feasibility of introducing a truly shared > cache so that multiple jobs from multiple users can share and cache jars. > This JIRA is to open the discussion.
[jira] [Updated] (YARN-3637) Handle localization sym-linking correctly at the YARN level
[ https://issues.apache.org/jira/browse/YARN-3637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Trezzo updated YARN-3637: --- Target Version/s: 2.9.0 (was: 2.8.0) > Handle localization sym-linking correctly at the YARN level > --- > > Key: YARN-3637 > URL: https://issues.apache.org/jira/browse/YARN-3637 > Project: Hadoop YARN > Issue Type: Sub-task > Reporter: Chris Trezzo > Assignee: Chris Trezzo > Priority: Blocker > > The shared cache needs to handle resource sym-linking at the YARN layer. > Currently, we let the application layer (i.e. mapreduce) handle this, but it > is probably better for all applications if it is handled transparently. > Here is the scenario: > Imagine two separate jars (with unique checksums) that have the same name > job.jar. > They are stored in the shared cache as two separate resources: > checksum1/job.jar > checksum2/job.jar > A new application tries to use both of these resources, but internally refers > to them as different names: > foo.jar maps to checksum1 > bar.jar maps to checksum2 > When the shared cache returns the path to the resources, both resources are > named the same (i.e. job.jar). Because of this, when the resources are > localized one of them clobbers the other. This is because both symlinks in > the container_id directory are the same name (i.e. job.jar) even though they > point to two separate resource directories. > Originally we tackled this in the MapReduce client by using the fragment > portion of the resource url. This, however, seems like something that should > be solved at the YARN layer.
[jira] [Commented] (YARN-3637) Handle localization sym-linking correctly at the YARN level
[ https://issues.apache.org/jira/browse/YARN-3637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15149512#comment-15149512 ] Chris Trezzo commented on YARN-3637: [~vinodkv] I don't think I will have time to work on this before the 2.8 release. I can move this to 2.9 for now. > Handle localization sym-linking correctly at the YARN level > --- > > Key: YARN-3637 > URL: https://issues.apache.org/jira/browse/YARN-3637 > Project: Hadoop YARN > Issue Type: Sub-task > Reporter: Chris Trezzo > Assignee: Chris Trezzo > Priority: Blocker > > The shared cache needs to handle resource sym-linking at the YARN layer. > Currently, we let the application layer (i.e. mapreduce) handle this, but it > is probably better for all applications if it is handled transparently. > Here is the scenario: > Imagine two separate jars (with unique checksums) that have the same name > job.jar. > They are stored in the shared cache as two separate resources: > checksum1/job.jar > checksum2/job.jar > A new application tries to use both of these resources, but internally refers > to them as different names: > foo.jar maps to checksum1 > bar.jar maps to checksum2 > When the shared cache returns the path to the resources, both resources are > named the same (i.e. job.jar). Because of this, when the resources are > localized one of them clobbers the other. This is because both symlinks in > the container_id directory are the same name (i.e. job.jar) even though they > point to two separate resource directories. > Originally we tackled this in the MapReduce client by using the fragment > portion of the resource url. This, however, seems like something that should > be solved at the YARN layer.
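The clobbering scenario in the quoted description can be demonstrated with a toy mapping. This is a hypothetical sketch (not YARN localization code): deriving the symlink name from the file name collapses the two resources, while using the application-requested names (e.g. the URL fragment) keeps them distinct.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of the symlink-name collision from the issue description.
class LocalizationLinks {
    /** Naive scheme: link name = file name. checksum1/job.jar and
     *  checksum2/job.jar collide, and the second put clobbers the first. */
    static Map<String, String> linkByFileName(String... resourcePaths) {
        Map<String, String> links = new LinkedHashMap<>();
        for (String p : resourcePaths) {
            String name = p.substring(p.lastIndexOf('/') + 1);
            links.put(name, p); // duplicate names overwrite each other
        }
        return links;
    }

    /** Fragment-style scheme: the app supplies a unique link name per resource
     *  (foo.jar -> checksum1/job.jar, bar.jar -> checksum2/job.jar). */
    static Map<String, String> linkByRequestedName(String[][] nameAndPath) {
        Map<String, String> links = new LinkedHashMap<>();
        for (String[] np : nameAndPath) links.put(np[0], np[1]);
        return links;
    }
}
```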
[jira] [Commented] (YARN-4690) Skip object allocation in FSAppAttempt#getResourceUsage when possible
[ https://issues.apache.org/jira/browse/YARN-4690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15149463#comment-15149463 ] Sangjin Lee commented on YARN-4690: --- It looks good to me for the most part. I have only one minor comment. For the case where {{getPreemptedResources()}} is non-zero, we might want to cache that as a small optimization. In other words, {code} Resource preemptedResources = getPreemptedResources(); return preemptedResources.equals(Resources.none()) ? getCurrentConsumption() : Resources.subtract(getCurrentConsumption(), preemptedResources); {code} > Skip object allocation in FSAppAttempt#getResourceUsage when possible > - > > Key: YARN-4690 > URL: https://issues.apache.org/jira/browse/YARN-4690 > Project: Hadoop YARN > Issue Type: Improvement > Reporter: Ming Ma > Assignee: Ming Ma > Attachments: YARN-4690.patch > > > YARN-2768 addresses an important bottleneck. Here is another similar instance > where object allocation in Resources#subtract will slow down the fair > scheduler's event processing thread. 
> {noformat} > org.apache.hadoop.yarn.factories.impl.pb.RecordFactoryPBImpl.newRecordInstance(RecordFactoryPBImpl.java) > org.apache.hadoop.yarn.util.Records.newRecord(Records.java) > > org.apache.hadoop.yarn.util.resource.Resources.createResource(Resources.java) > org.apache.hadoop.yarn.util.resource.Resources.clone(Resources.java) > org.apache.hadoop.yarn.util.resource.Resources.subtract(Resources.java) > > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.getResourceUsage(FSAppAttempt.java) > > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.getResourceUsage(FSLeafQueue.java) > > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.FairSharePolicy$FairShareComparator.compare(FairSharePolicy.java) > > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.FairSharePolicy$FairShareComparator.compare(FairSharePolicy.java) > java.util.TimSort.binarySort(TimSort.java) > java.util.TimSort.sort(TimSort.java) > java.util.TimSort.sort(TimSort.java) > java.util.Arrays.sort(Arrays.java) > java.util.Collections.sort(Collections.java) > > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.assignContainer(FSParentQueue.java) > > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.attemptScheduling(FairScheduler.java) > > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.nodeUpdate(FairScheduler.java) > > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java) > > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java) > > org.apache.hadoop.yarn.sls.scheduler.ResourceSchedulerWrapper.handle(ResourceSchedulerWrapper.java) > > org.apache.hadoop.yarn.sls.scheduler.ResourceSchedulerWrapper.handle(ResourceSchedulerWrapper.java) > {noformat} > One way to fix it is to return {{getCurrentConsumption()}} if there is no > preemption which is the 
normal case. This means the {{getResourceUsage}} method > will return a reference to {{FSAppAttempt}}'s internal resource object. But > that should be ok, as {{getResourceUsage}} doesn't expect the caller to modify > the object.
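The proposed fast path, including Sangjin Lee's suggestion to cache the `getPreemptedResources()` lookup, can be sketched on a simplified stand-in for the YARN `Resource` class (this is not the actual patch):

```java
// Simplified model of the allocation-avoiding pattern from the comment above.
class FSAppAttemptSketch {
    static final class Resource {
        final int memory, vcores;
        Resource(int memory, int vcores) { this.memory = memory; this.vcores = vcores; }
        boolean isNone() { return memory == 0 && vcores == 0; }
        Resource subtract(Resource r) { return new Resource(memory - r.memory, vcores - r.vcores); }
    }

    private final Resource currentConsumption;
    private final Resource preempted;

    FSAppAttemptSketch(Resource current, Resource preempted) {
        this.currentConsumption = current;
        this.preempted = preempted;
    }

    Resource getResourceUsage() {
        // Cache the lookup once, and in the common no-preemption case return
        // the internal object directly instead of allocating a clone per call.
        Resource p = preempted;
        return p.isNone() ? currentConsumption : currentConsumption.subtract(p);
    }
}
```

The fast path returns the very same object on every call, which is why the FairSharePolicy comparator (invoked once per sort comparison on the event-processing thread) stops generating garbage in the common case.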
[jira] [Commented] (YARN-4117) End to end unit test with mini YARN cluster for AMRMProxy Service
[ https://issues.apache.org/jira/browse/YARN-4117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15149431#comment-15149431 ] Giovanni Matteo Fumarola commented on YARN-4117: I just ran all the tests that failed or timed out, and they completed fine on my machine. My only concern is about the checkstyle. > End to end unit test with mini YARN cluster for AMRMProxy Service > - > > Key: YARN-4117 > URL: https://issues.apache.org/jira/browse/YARN-4117 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager, resourcemanager > Reporter: Kishore Chaliparambil > Assignee: Giovanni Matteo Fumarola > Attachments: YARN-4117.v0.patch > > > YARN-2884 introduces a proxy between AM and RM. This JIRA proposes an end to > end unit test using a mini YARN cluster for the AMRMProxy service. This test > will validate register, allocate and finish application, and token renewal.
[jira] [Commented] (YARN-4484) Available Resource calculation for a queue is not correct when used with labels
[ https://issues.apache.org/jira/browse/YARN-4484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15149414#comment-15149414 ] Wangda Tan commented on YARN-4484: -- Thanks [~sunilg]/[~bibinchundatt], bq. In new patch we can also take care of condition where the label mapping is not available in queue. Sorry, I may not get this, could you elaborate? A few comments for the patch: - Instead of using componentwiseMax directly, use {{available-headroom-per-partition = partition-used >= partition-configured ? 0 : componentwiseMax(partition-configured - partition-used, 0)}}. This is because when partition-used >= partition-configured, you cannot allocate anything, so the available resource should be considered 0. - It's better to add a test. > Available Resource calculation for a queue is not correct when used with > labels > --- > > Key: YARN-4484 > URL: https://issues.apache.org/jira/browse/YARN-4484 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler > Affects Versions: 2.7.1 > Reporter: Sunil G > Assignee: Sunil G > Attachments: 0001-YARN-4484.patch > > > To calculate the available resource for a queue, we have to get the total > resource allocated for all labels in the queue and compare it to its usage. > Also address the comments given in > [YARN-4304-comments|https://issues.apache.org/jira/browse/YARN-4304?focusedCommentId=15064874&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15064874 > ] given by [~leftnoteasy] for the same. > ClusterMetrics related issues will also get handled once we fix this.
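The per-partition headroom rule suggested above reduces to a component-wise `max(configured - used, 0)` on the resource vector: any dimension that is fully used (or over-used) contributes zero headroom instead of going negative. A minimal sketch on a plain `long[]` resource vector (not CapacityScheduler code):

```java
// Sketch of componentwiseMax(partition-configured - partition-used, 0)
// over a simplified resource vector such as {memoryMB, vcores}.
class PartitionHeadroom {
    static long[] availableHeadroom(long[] configured, long[] used) {
        long[] avail = new long[configured.length];
        for (int i = 0; i < configured.length; i++) {
            // Clamp at zero: an exhausted or over-used component must not
            // report negative headroom for the partition.
            avail[i] = Math.max(configured[i] - used[i], 0);
        }
        return avail;
    }
}
```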
[jira] [Commented] (YARN-4412) Create ClusterMonitor to compute ordered list of preferred NMs for OPPORTUNISTIC containers
[ https://issues.apache.org/jira/browse/YARN-4412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15149383#comment-15149383 ] Hadoop QA commented on YARN-4412: -
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 14s {color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 11 new or modified test files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 14s {color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 53s {color} | {color:green} yarn-2877 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 11s {color} | {color:green} yarn-2877 passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 39s {color} | {color:green} yarn-2877 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 9s {color} | {color:green} yarn-2877 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 48s {color} | {color:green} yarn-2877 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 21s {color} | {color:green} yarn-2877 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 20s {color} | {color:green} yarn-2877 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 20s {color} | {color:green} yarn-2877 passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 56s {color} | {color:green} yarn-2877 passed with JDK v1.7.0_95 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 16s {color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 22s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 19s {color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 6m 19s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 6m 19s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 58s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 6m 58s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 6m 58s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 9s {color} | {color:red} root: patch generated 83 new + 441 unchanged - 12 fixed = 524 total (was 453) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 49s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 22s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 0s {color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 24s {color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 57s {color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common generated 3 new + 0 unchanged - 0 fixed = 3 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 19s {color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 51s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 23s {color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_72. {color} |
|
[jira] [Commented] (YARN-3945) maxApplicationsPerUser is wrongly calculated
[ https://issues.apache.org/jira/browse/YARN-3945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15149365#comment-15149365 ] Wangda Tan commented on YARN-3945: -- [~Naganarasimha], bq. so based on this there will be no elasticity even though the resources are free in some other queue, is this expected? I think so. bq. Are we trying to avoid elasticity because we try to avoid preempting AMs even when preemption is enabled? That's one purpose; the other purpose is that, when preemption is disabled, we will not suffer from too many AMs being launched when the queue's available resource increases and then comes back down. To move this task forward, I would suggest: # Resolve the bug that maxApplicationsPerUser should be capped by maxApplicationsPerQueue # Make the computation of the user AM limit symmetric to the computation of user-limit, and cap the user AM limit by the queue's AM limit # Avoid flexibility in computing the queue's and user's AM-limit (do not consider queue max cap). This needs more discussion. My understanding is that #1 and #2 are in the scope of this JIRA, and #3 could be done separately. Agree? > maxApplicationsPerUser is wrongly calculated > > > Key: YARN-3945 > URL: https://issues.apache.org/jira/browse/YARN-3945 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler > Affects Versions: 2.7.1 > Reporter: Naganarasimha G R > Assignee: Naganarasimha G R > Attachments: YARN-3945.20150728-1.patch, YARN-3945.20150729-1.patch, > YARN-3945.V1.003.patch > > > maxApplicationsPerUser is currently calculated based on the formula > {{maxApplicationsPerUser = (int)(maxApplications * (userLimit / 100.0f) * > userLimitFactor)}}, but the description of userlimit is > {quote} > Each queue enforces a limit on the percentage of resources allocated to a > user at any given time, if there is demand for resources. The user limit can > vary between a minimum and maximum value. {color:red}The former (the > minimum value) is set to this property value{color} and the latter (the > maximum value) depends on the number of users who have submitted > applications. For e.g., suppose the value of this property is 25. If two > users have submitted applications to a queue, no single user can use more > than 50% of the queue resources. If a third user submits an application, no > single user can use more than 33% of the queue resources. With 4 or more > users, no user can use more than 25% of the queue's resources. A value of 100 > implies no user limits are imposed. The default is 100. Value is specified as > an integer. > {quote} > A configuration related to the minimum limit should not be used in a formula > to calculate the maximum applications for a user.
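The formula under discussion, plus the cap proposed in point #1 of the comment, can be worked through with concrete numbers. The values below are illustrative only; this is not CapacityScheduler code:

```java
// The current maxApplicationsPerUser formula and the proposed cap.
class MaxAppsPerUser {
    static int current(int maxApplications, float userLimit, float userLimitFactor) {
        // e.g. maxApplications=10000, userLimit=25, userLimitFactor=1 -> 2500
        return (int) (maxApplications * (userLimit / 100.0f) * userLimitFactor);
    }

    static int capped(int maxApplications, float userLimit, float userLimitFactor) {
        // With a large userLimitFactor the raw formula can exceed the queue's
        // own max-applications, so cap it at the queue limit as suggested.
        return Math.min(current(maxApplications, userLimit, userLimitFactor),
                        maxApplications);
    }
}
```

The second case shows the bug being fixed: a `userLimitFactor` of 100 would otherwise let a single user's application limit exceed the entire queue's limit.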
[jira] [Commented] (YARN-4117) End to end unit test with mini YARN cluster for AMRMProxy Service
[ https://issues.apache.org/jira/browse/YARN-4117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15149342#comment-15149342 ] Hadoop QA commented on YARN-4117: -
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 13s {color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 2 new or modified test files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s {color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 42s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 54s {color} | {color:green} trunk passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 7s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 33s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 13s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 40s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 58s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 51s {color} | {color:green} trunk passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 58s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 12s {color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 0s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 55s {color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 55s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 6s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 6s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 31s {color} | {color:red} hadoop-yarn-project/hadoop-yarn: patch generated 3 new + 55 unchanged - 0 fixed = 58 total (was 55) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 9s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 35s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s {color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 22s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 43s {color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 51s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 8m 48s {color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with JDK v1.8.0_72. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 6m 45s {color} | {color:red} hadoop-yarn-server-tests in the patch failed with JDK v1.8.0_72. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 66m 5s {color} | {color:red} hadoop-yarn-client in the patch failed with JDK v1.8.0_72. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 15s {color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with JDK v1.7.0_95. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 7m 6s {color} | {color:red} hadoop-yarn-server-tests in the patch failed with JDK v1.7.0_95. {color} |
|
[jira] [Commented] (YARN-3637) Handle localization sym-linking correctly at the YARN level
[ https://issues.apache.org/jira/browse/YARN-3637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15149193#comment-15149193 ] Vinod Kumar Vavilapalli commented on YARN-3637: --- [~ctrezzo], are you planning to work on this any time soon? We need to make a decision about this (and shared-cache security support in general) for 2.8.0 as soon as possible. Tx. > Handle localization sym-linking correctly at the YARN level > --- > > Key: YARN-3637 > URL: https://issues.apache.org/jira/browse/YARN-3637 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Chris Trezzo >Assignee: Chris Trezzo >Priority: Blocker > > The shared cache needs to handle resource sym-linking at the YARN layer. > Currently, we let the application layer (i.e. mapreduce) handle this, but it > is probably better for all applications if it is handled transparently. > Here is the scenario: > Imagine two separate jars (with unique checksums) that have the same name > job.jar. > They are stored in the shared cache as two separate resources: > checksum1/job.jar > checksum2/job.jar > A new application tries to use both of these resources, but internally refers > to them by different names: > foo.jar maps to checksum1 > bar.jar maps to checksum2 > When the shared cache returns the path to the resources, both resources are > named the same (i.e. job.jar). Because of this, when the resources are > localized one of them clobbers the other. This is because both symlinks in > the container_id directory have the same name (i.e. job.jar) even though they > point to two separate resource directories. > Originally we tackled this in the MapReduce client by using the fragment > portion of the resource url. This, however, seems like something that should > be solved at the YARN layer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
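The clobbering scenario above can be sketched in a few lines. This is a hedged illustration, not the actual localization code: link names are derived from the cached path's basename (so both resources collapse to {{job.jar}}), while a caller-supplied fragment, the MapReduce-client workaround mentioned in the description, keeps them distinct. All method names here are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class SymlinkClobberSketch {
    // Derive the container-local link name the way basename-based localization would.
    static String linkName(String cachePath) {
        return cachePath.substring(cachePath.lastIndexOf('/') + 1);
    }

    // With a URL-fragment override, the caller-supplied name wins over the basename.
    static String linkName(String cachePath, String fragment) {
        return fragment != null ? fragment : linkName(cachePath);
    }

    public static void main(String[] args) {
        Map<String, String> links = new HashMap<>();
        // Both cached resources collapse to the same link name ("job.jar")...
        links.put(linkName("checksum1/job.jar"), "checksum1/job.jar");
        links.put(linkName("checksum2/job.jar"), "checksum2/job.jar");
        System.out.println(links.size()); // 1 -> the second put clobbers the first

        links.clear();
        // ...while fragments keep the two resources distinct.
        links.put(linkName("checksum1/job.jar", "foo.jar"), "checksum1/job.jar");
        links.put(linkName("checksum2/job.jar", "bar.jar"), "checksum2/job.jar");
        System.out.println(links.size()); // 2 -> no clobbering
    }
}
```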
[jira] [Updated] (YARN-4676) Automatic and Asynchronous Decommissioning Nodes Status Tracking
[ https://issues.apache.org/jira/browse/YARN-4676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-4676: -- Fix Version/s: (was: 2.8.0) Removing fix-version as the patch isn't committed yet. This actually belongs to the "Target Version" field, but that is unavailable as of now. > Automatic and Asynchronous Decommissioning Nodes Status Tracking > > > Key: YARN-4676 > URL: https://issues.apache.org/jira/browse/YARN-4676 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.8.0 >Reporter: Daniel Zhi >Assignee: Daniel Zhi > Labels: features > Attachments: GracefulDecommissionYarnNode.pdf, HADOOP-4676.004.patch, > HADOOP-4676.005.patch > > > DecommissioningNodeWatcher inside ResourceTrackingService tracks > DECOMMISSIONING nodes' status automatically and asynchronously after the > client/admin makes the graceful decommission request. It tracks each > DECOMMISSIONING node's status to decide when, after all running containers on > the node have completed, it will be transitioned into the DECOMMISSIONED > state. > NodesListManager detects and handles include and exclude list changes to > trigger decommission or recommission as necessary. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1547) Prevent DoS of ApplicationMasterProtocol by putting in limits
[ https://issues.apache.org/jira/browse/YARN-1547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15149085#comment-15149085 ] Giovanni Matteo Fumarola commented on YARN-1547: Thanks for raising this [~vinodkv]. I was wondering if I might take this up, if you are not actively working on it. [~subru], [~kishorch], and I brainstormed this and came up with a first approach. I will share a small design document with you to receive feedback on it. > Prevent DoS of ApplicationMasterProtocol by putting in limits > - > > Key: YARN-1547 > URL: https://issues.apache.org/jira/browse/YARN-1547 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Vinod Kumar Vavilapalli > > Points of DoS in ApplicationMasterProtocol > - Host and trackingURL in RegisterApplicationMasterRequest > - Diagnostics, final trackingURL in FinishApplicationMasterRequest > - Unlimited number of resourceAsks, containersToBeReleased and > resourceBlacklistRequest in AllocateRequest > -- Unbounded number of priorities and/or resourceRequests in each ask. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
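A minimal sketch of the kind of limit enforcement this issue asks for: cap the unbounded AllocateRequest collections listed above before the RM processes them. All class names, field names, and thresholds here are illustrative assumptions, not the actual YARN API or configuration.

```java
// Hypothetical request-size validator; none of these names exist in YARN.
public class AllocateRequestLimitsSketch {
    static final int MAX_RESOURCE_ASKS = 1000;     // hypothetical cap
    static final int MAX_RELEASES = 1000;          // hypothetical cap
    static final int MAX_BLACKLIST_ENTRIES = 1000; // hypothetical cap

    // Reject oversized requests instead of letting them exhaust RM memory.
    static boolean withinLimits(int asks, int releases, int blacklisted) {
        return asks <= MAX_RESOURCE_ASKS
                && releases <= MAX_RELEASES
                && blacklisted <= MAX_BLACKLIST_ENTRIES;
    }

    public static void main(String[] args) {
        System.out.println(withinLimits(10, 5, 0));        // a normal request
        System.out.println(withinLimits(1_000_000, 0, 0)); // a DoS-sized ask list
    }
}
```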
[jira] [Commented] (YARN-4695) EntityGroupFSTimelineStore to not log errors during shutdown
[ https://issues.apache.org/jira/browse/YARN-4695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15149041#comment-15149041 ] Li Lu commented on YARN-4695: - Sure. I'll take a look. Thanks for the work! > EntityGroupFSTimelineStore to not log errors during shutdown > > > Key: YARN-4695 > URL: https://issues.apache.org/jira/browse/YARN-4695 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: 2.8.0 >Reporter: Steve Loughran >Assignee: Li Lu > > # The {{EntityGroupFSTimelineStore}} threads log exceptions that get raised > during their execution. > # the service stops by interrupting all its workers > # as a result, the workers all log exceptions at error *even during a managed > shutdown* > # this creates distracting noise in logs -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4117) End to end unit test with mini YARN cluster for AMRMProxy Service
[ https://issues.apache.org/jira/browse/YARN-4117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15149006#comment-15149006 ] Giovanni Matteo Fumarola commented on YARN-4117: Hi [~jianhe] Can you please review this patch? I am asking you because you reviewed and pushed the AMRMProxy on YARN-2884. > End to end unit test with mini YARN cluster for AMRMProxy Service > - > > Key: YARN-4117 > URL: https://issues.apache.org/jira/browse/YARN-4117 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager, resourcemanager >Reporter: Kishore Chaliparambil >Assignee: Giovanni Matteo Fumarola > Attachments: YARN-4117.v0.patch > > > YARN-2884 introduces a proxy between AM and RM. This JIRA proposes an end to > end unit test using mini YARN cluster to the AMRMProxy service. This test > will validate register, allocate and finish application and token renewal. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4117) End to end unit test with mini YARN cluster for AMRMProxy Service
[ https://issues.apache.org/jira/browse/YARN-4117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Giovanni Matteo Fumarola updated YARN-4117: --- Attachment: YARN-4117.v0.patch > End to end unit test with mini YARN cluster for AMRMProxy Service > - > > Key: YARN-4117 > URL: https://issues.apache.org/jira/browse/YARN-4117 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager, resourcemanager >Reporter: Kishore Chaliparambil >Assignee: Giovanni Matteo Fumarola > Attachments: YARN-4117.v0.patch > > > YARN-2884 introduces a proxy between AM and RM. This JIRA proposes an end to > end unit test using mini YARN cluster to the AMRMProxy service. This test > will validate register, allocate and finish application and token renewal. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4696) EntityGroupFSTimelineStore to work in the absence of an RM
[ https://issues.apache.org/jira/browse/YARN-4696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15148995#comment-15148995 ] Hadoop QA commented on YARN-4696: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 13s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s {color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 16s {color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 29s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 44s {color} | {color:green} trunk passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 3s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 36s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 18s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 39s {color} | {color:green} trunk passed {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 26s {color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timeline-pluginstorage in trunk has 1 extant Findbugs warnings. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 14s {color} | {color:green} trunk passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 41s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 11s {color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 3s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 40s {color} | {color:green} the patch passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 40s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 1s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 1s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 33s {color} | {color:red} hadoop-yarn-project/hadoop-yarn: patch generated 3 new + 214 unchanged - 0 fixed = 217 total (was 214) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 12s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 32s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s {color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 16s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 14s {color} | {color:green} the patch passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 32s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 22s {color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_72. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 4s {color} | {color:green} hadoop-yarn-common in the patch passed with JDK v1.8.0_72. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 55s {color} | {color:green} hadoop-yarn-server-timeline-pluginstorage in the patch passed with JDK v1.8.0_72. {color} |
[jira] [Updated] (YARN-4412) Create ClusterMonitor to compute ordered list of preferred NMs for OPPORTUNISTIC containers
[ https://issues.apache.org/jira/browse/YARN-4412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun Suresh updated YARN-4412: -- Attachment: YARN-4412-yarn-2877.v4.patch Updating patch to fix testcases. With regard to the findbugs warnings: # The {{System.exit(-1)}} in the {{EventDispatcher}} is intentional # The synchronization inconsistencies in the Protocol Buffer classes will remain, to maintain consistency with all the other PBImpl classes. Also, I do not expect multiple threads to be modifying the same object in any case. > Create ClusterMonitor to compute ordered list of preferred NMs for > OPPORTUNISTIC containers > -- > > Key: YARN-4412 > URL: https://issues.apache.org/jira/browse/YARN-4412 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Arun Suresh >Assignee: Arun Suresh > Attachments: YARN-4412-yarn-2877.v1.patch, > YARN-4412-yarn-2877.v2.patch, YARN-4412-yarn-2877.v3.patch, > YARN-4412-yarn-2877.v4.patch > > > Introduce a Cluster Monitor that aggregates load information from individual > Node Managers and computes an ordered list of preferred Node Managers to be > used as target nodes for OPPORTUNISTIC container allocations. > This list can be pushed out to the Node Manager (specifically the AMRMProxy > running on the Node) via the Allocate Response. This will be used to make > local scheduling decisions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
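The core idea in the description above, aggregate a per-NM load metric and emit an ordered list of least-loaded nodes for OPPORTUNISTIC allocation, can be sketched as follows. This is a hedged illustration only: the class name, and the choice of queued-container count as the load metric, are assumptions, not the patch's actual implementation.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class LeastLoadedNodesSketch {
    // Order NMs by an assumed load metric (queued containers) and keep the top k.
    public static List<String> preferredNodes(Map<String, Integer> queueLengths, int k) {
        return queueLengths.entrySet().stream()
                .sorted(Map.Entry.comparingByValue()) // least-loaded first
                .limit(k)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, Integer> load = new LinkedHashMap<>();
        load.put("nm1", 7); // hypothetical node names and queue lengths
        load.put("nm2", 2);
        load.put("nm3", 5);
        System.out.println(preferredNodes(load, 2)); // [nm2, nm3]
    }
}
```

In the design described above, a list like this would be pushed to the AMRMProxy via the Allocate Response so local scheduling decisions favor the least-loaded nodes.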
[jira] [Commented] (YARN-4695) EntityGroupFSTimelineStore to not log errors during shutdown
[ https://issues.apache.org/jira/browse/YARN-4695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15148961#comment-15148961 ] Steve Loughran commented on YARN-4695: -- [~gtCarrera9] -I've fixed this in YARN-4696; if you could review that patch I'd be grateful > EntityGroupFSTimelineStore to not log errors during shutdown > > > Key: YARN-4695 > URL: https://issues.apache.org/jira/browse/YARN-4695 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: 2.8.0 >Reporter: Steve Loughran >Assignee: Li Lu > > # The {{EntityGroupFSTimelineStore}} threads log exceptions that get raised > during their execution. > # the service stops by interrupting all its workers > # as a result, the workers all log exceptions at error *even during a managed > shutdown* > # this creates distracting noise in logs -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-4695) EntityGroupFSTimelineStore to not log errors during shutdown
[ https://issues.apache.org/jira/browse/YARN-4695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu reassigned YARN-4695: --- Assignee: Li Lu > EntityGroupFSTimelineStore to not log errors during shutdown > > > Key: YARN-4695 > URL: https://issues.apache.org/jira/browse/YARN-4695 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: 2.8.0 >Reporter: Steve Loughran >Assignee: Li Lu > > # The {{EntityGroupFSTimelineStore}} threads log exceptions that get raised > during their execution. > # the service stops by interrupting all its workers > # as a result, the workers all log exceptions at error *even during a managed > shutdown* > # this creates distracting noise in logs -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4696) EntityGroupFSTimelineStore to work in the absence of an RM
[ https://issues.apache.org/jira/browse/YARN-4696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated YARN-4696: - Attachment: YARN-4696-001.patch Patch -001; things I had to do to get my (external, Spark integration) test closer to working. These are a combination of things that are absolutely needed (disabling RM, flushing on close()), generally better (exception handling), and needed to debug what's going on (all the improved logging). # RM integration can be disabled; the timeline store then only uses modified times as a liveness test. This includes checks for null around uses of yarnClient. # I took the opportunity to clean up service shutdown in the process. # YARN-4695 recommendations: all worker threads unwrap exceptions and, for interrupted exceptions, skip the stack trace. # Better logging @ debug (including # of scanned apps) # {{TimelineWriter}} doesn't rewrap IOEs in IOEs; it wraps interrupted exceptions into {{InterruptedIOException}} # {{FileSystemTimelineWriter.close()}} does a {{flush()}}, which stops any last events getting lost. There are tests, but not here. Look in https://github.com/steveloughran/spark-timeline-integration > EntityGroupFSTimelineStore to work in the absence of an RM > -- > > Key: YARN-4696 > URL: https://issues.apache.org/jira/browse/YARN-4696 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: 2.8.0 >Reporter: Steve Loughran > Attachments: YARN-4696-001.patch > > > {{EntityGroupFSTimelineStore}} now depends on an RM being up and running; the > configuration pointing to it. This is a new change, and impacts testing where > you have historically been able to test without an RM running. > The sole purpose of the probe is to automatically determine if an app is > running; it falls back to "unknown" if not. 
If the RM connection was > optional, the "unknown" codepath could be called directly, relying on age of > file as a metric of completion > Options > # add a flag to disable RM connect > # skip automatically if RM not defined/set to 0.0.0.0 > # disable retries on yarn client IPC; if it fails, tag app as unknown. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
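Option 1 above, a flag to disable the RM connect, reduces to a small decision rule: with the flag off, skip the application-report probe entirely and classify the app from the age of its files alone. The sketch below is a hedged illustration; the flag, the staleness threshold, and all names are hypothetical, not the actual YARN configuration keys or store code.

```java
public class AppLivenessSketch {
    enum AppState { RUNNING, UNKNOWN }

    static final long STALE_MILLIS = 60_000; // hypothetical staleness cutoff

    // With RM connect enabled, trust the RM's report; without it, fall back
    // to file modification time as the sole liveness signal.
    static AppState probe(boolean rmConnectEnabled, boolean rmSaysRunning,
                          long fileModifiedTime, long now) {
        if (rmConnectEnabled) {
            return rmSaysRunning ? AppState.RUNNING : AppState.UNKNOWN;
        }
        return (now - fileModifiedTime) < STALE_MILLIS ? AppState.RUNNING
                                                       : AppState.UNKNOWN;
    }

    public static void main(String[] args) {
        long now = 1_000_000L;
        // Recently touched log, RM disabled -> assumed running.
        System.out.println(probe(false, false, now - 10_000, now));
        // Old log, RM disabled -> "unknown", the codepath the issue asks for.
        System.out.println(probe(false, false, now - 120_000, now));
    }
}
```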
[jira] [Commented] (YARN-4648) Move preemption related tests from TestFairScheduler to TestFairSchedulerPreemption
[ https://issues.apache.org/jira/browse/YARN-4648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15148917#comment-15148917 ] Hadoop QA commented on YARN-4648: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 14s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 2 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 42s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s {color} | {color:green} trunk passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 18s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 35s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 14s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 5s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s {color} | {color:green} trunk passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | 
{color:green} mvninstall {color} | {color:green} 0m 29s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 23s {color} | {color:green} the patch passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 23s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 25s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 15s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 31s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 13s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s {color} | {color:green} the patch passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 70m 19s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_72. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 71m 41s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_95. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 17s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 158m 17s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_72 Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens | | | hadoop.yarn.server.resourcemanager.TestAMAuthorization | | JDK v1.7.0_95 Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens | | | hadoop.yarn.server.resourcemanager.TestAMAuthorization | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:0ca8df7 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12788055/YARN-4648.02.patch | | JIRA Issue | YARN-4648 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | |
[jira] [Updated] (YARN-4678) Cluster used capacity is > 100 when container reserved
[ https://issues.apache.org/jira/browse/YARN-4678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-4678: -- Attachment: 0001-YARN-4678.patch Hi [~brahmareddy] Sharing an initial version of patch based on above explanation. > Cluster used capacity is > 100 when container reserved > --- > > Key: YARN-4678 > URL: https://issues.apache.org/jira/browse/YARN-4678 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Brahma Reddy Battula >Assignee: Brahma Reddy Battula > Attachments: 0001-YARN-4678.patch > > > *Scenario:* > * Start cluster with Three NM's each having 8GB (cluster memory:24GB). > * Configure queues with elasticity and userlimitfactor=10. > * disable pre-emption. > * run two job with different priority in different queue at the same time > ** yarn jar hadoop-mapreduce-examples-2.7.2.jar pi -Dyarn.app.priority=LOW > -Dmapreduce.job.queuename=QueueA -Dmapreduce.map.memory.mb=4096 > -Dyarn.app.mapreduce.am.resource.mb=1536 > -Dmapreduce.job.reduce.slowstart.completedmaps=1.0 10 1 > ** yarn jar hadoop-mapreduce-examples-2.7.2.jar pi -Dyarn.app.priority=HIGH > -Dmapreduce.job.queuename=QueueB -Dmapreduce.map.memory.mb=4096 > -Dyarn.app.mapreduce.am.resource.mb=1536 3 1 > * observe the cluster capacity which was used in RM web UI -- This message was sent by Atlassian JIRA (v6.3.4#6332)
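The symptom in the scenario above can be reproduced arithmetically: if reserved memory is counted toward used capacity, the reported percentage can exceed 100. The sketch below is illustrative only; the formula and numbers (a 24GB cluster from the scenario, plus an assumed 4GB outstanding reservation) are not the actual CapacityScheduler accounting code.

```java
public class UsedCapacitySketch {
    // Naive accounting that counts reserved resources as used.
    static double usedCapacityPercent(long allocatedMB, long reservedMB, long totalMB) {
        return 100.0 * (allocatedMB + reservedMB) / totalMB;
    }

    public static void main(String[] args) {
        // 24GB cluster fully allocated, with a 4GB reservation outstanding:
        // the reported figure goes above 100%.
        System.out.println(usedCapacityPercent(24576, 4096, 24576));
    }
}
```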
[jira] [Created] (YARN-4696) EntityGroupFSTimelineStore to work in the absence of an RM
Steve Loughran created YARN-4696: Summary: EntityGroupFSTimelineStore to work in the absence of an RM Key: YARN-4696 URL: https://issues.apache.org/jira/browse/YARN-4696 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: 2.8.0 Reporter: Steve Loughran {{EntityGroupFSTimelineStore}} now depends on an RM being up and running; the configuration pointing to it. This is a new change, and impacts testing where you have historically been able to test without an RM running. The sole purpose of the probe is to automatically determine if an app is running; it falls back to "unknown" if not. If the RM connection was optional, the "unknown" codepath could be called directly, relying on age of file as a metric of completion Options # add a flag to disable RM connect # skip automatically if RM not defined/set to 0.0.0.0 # disable retries on yarn client IPC; if it fails, tag app as unknown. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4648) Move preemption related tests from TestFairScheduler to TestFairSchedulerPreemption
[ https://issues.apache.org/jira/browse/YARN-4648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kai Sasaki updated YARN-4648: - Attachment: YARN-4648.02.patch > Move preemption related tests from TestFairScheduler to > TestFairSchedulerPreemption > --- > > Key: YARN-4648 > URL: https://issues.apache.org/jira/browse/YARN-4648 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.8.0 >Reporter: Karthik Kambatla >Assignee: Kai Sasaki > Labels: newbie++ > Attachments: YARN-4648.01.patch, YARN-4648.02.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4695) EntityGroupFSTimelineStore to not log errors during shutdown
[ https://issues.apache.org/jira/browse/YARN-4695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15148644#comment-15148644 ] Steve Loughran commented on YARN-4695: -- ...the hostname www.bbc.co.uk/0.0.0.0:8032 is just 127.0.0.1 with the usual set of domains disabled. And as usual, the IPC client doesn't detect "0.0.0.0" as an invalid IP address. Even if the IPC client did (it should), ATS can check and fail fast. > EntityGroupFSTimelineStore to not log errors during shutdown > > > Key: YARN-4695 > URL: https://issues.apache.org/jira/browse/YARN-4695 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: 2.8.0 >Reporter: Steve Loughran > > # The {{EntityGroupFSTimelineStore}} threads log exceptions that get raised > during their execution. > # the service stops by interrupting all its workers > # as a result, the workers all log exceptions at error *even during a managed > shutdown* > # this creates distracting noise in logs -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4695) EntityGroupFSTimelineStore to not log errors during shutdown
[ https://issues.apache.org/jira/browse/YARN-4695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15148643#comment-15148643 ] Steve Loughran commented on YARN-4695: -- Fuller stack. What's happening is that the RM has been shut down (to be precise: there isn't an RM running, not even a little one). The log parser is asking for the latest list of running apps and timing out; this puts IPC into retry, which prevents the shutdown from completing.
{code}
2016-02-16 14:13:28,126 [IPC Server Responder] INFO ipc.Server (Server.java:run(959)) - Stopping IPC Server Responder
2016-02-16 14:13:28,126 [ScalaTest-main-running-TimelineListenerSuite] INFO timeline.EntityGroupFSTimelineStore (EntityGroupFSTimelineStore.java:serviceStop(275)) - Stopping EntityGroupFSTimelineStore
2016-02-16 14:13:28,127 [ScalaTest-main-running-TimelineListenerSuite] INFO timeline.EntityGroupFSTimelineStore (EntityGroupFSTimelineStore.java:serviceStop(279)) - Waiting for executor to terminate
2016-02-16 14:13:28,966 [EntityLogPluginWorker #1] INFO ipc.Client (Client.java:handleConnectionFailure(897)) - Retrying connect to server: www.bbc.co.uk/0.0.0.0:8032. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-02-16 14:13:29,970 [EntityLogPluginWorker #1] INFO ipc.Client (Client.java:handleConnectionFailure(897)) - Retrying connect to server: www.bbc.co.uk/0.0.0.0:8032. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-02-16 14:13:30,975 [EntityLogPluginWorker #1] INFO ipc.Client (Client.java:handleConnectionFailure(897)) - Retrying connect to server: www.bbc.co.uk/0.0.0.0:8032. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-02-16 14:13:31,980 [EntityLogPluginWorker #1] INFO ipc.Client (Client.java:handleConnectionFailure(897)) - Retrying connect to server: www.bbc.co.uk/0.0.0.0:8032. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-02-16 14:13:32,986 [EntityLogPluginWorker #1] INFO ipc.Client (Client.java:handleConnectionFailure(897)) - Retrying connect to server: www.bbc.co.uk/0.0.0.0:8032. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-02-16 14:13:33,995 [EntityLogPluginWorker #1] INFO ipc.Client (Client.java:handleConnectionFailure(897)) - Retrying connect to server: www.bbc.co.uk/0.0.0.0:8032. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-02-16 14:13:35,000 [EntityLogPluginWorker #1] INFO ipc.Client (Client.java:handleConnectionFailure(897)) - Retrying connect to server: www.bbc.co.uk/0.0.0.0:8032. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-02-16 14:13:36,005 [EntityLogPluginWorker #1] INFO ipc.Client (Client.java:handleConnectionFailure(897)) - Retrying connect to server: www.bbc.co.uk/0.0.0.0:8032. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-02-16 14:13:37,010 [EntityLogPluginWorker #1] INFO ipc.Client (Client.java:handleConnectionFailure(897)) - Retrying connect to server: www.bbc.co.uk/0.0.0.0:8032. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-02-16 14:13:38,015 [EntityLogPluginWorker #1] INFO ipc.Client (Client.java:handleConnectionFailure(897)) - Retrying connect to server: www.bbc.co.uk/0.0.0.0:8032. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-02-16 14:13:38,133 [ScalaTest-main-running-TimelineListenerSuite] WARN timeline.EntityGroupFSTimelineStore (EntityGroupFSTimelineStore.java:serviceStop(284)) - Executor did not terminate
2016-02-16 14:13:38,133 [ScalaTest-main-running-TimelineListenerSuite] INFO timeline.LeveldbTimelineStore (LeveldbTimelineStore.java:serviceStop(250)) - Waiting for deletion thread to complete its current action
2016-02-16 14:13:38,133 [Thread-9] INFO timeline.LeveldbTimelineStore (LeveldbTimelineStore.java:run(296)) - Deletion thread received interrupt, exiting
2016-02-16 14:13:38,134 [EntityLogPluginWorker #1] ERROR timeline.EntityGroupFSTimelineStore (EntityGroupFSTimelineStore.java:run(693)) - Error processing logs for application__
java.lang.reflect.UndeclaredThrowableException
    at com.sun.proxy.$Proxy25.getApplicationReport(Unknown Source)
    at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplicationReport(YarnClientImpl.java:448)
    at
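The serviceStop() sequence visible in the log above (wait for the executor, then warn that it "did not terminate") follows the standard two-phase executor shutdown pattern. A minimal sketch of that pattern, with illustrative names and grace period rather than the actual EntityGroupFSTimelineStore code:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/**
 * Sketch: two-phase shutdown of a log-parser worker pool, as hinted at by
 * the serviceStop() log lines above. ParserPool and the 10s grace period
 * are illustrative, not the actual YARN implementation.
 */
class ParserPool {
    private final ExecutorService executor = Executors.newScheduledThreadPool(1);

    /** Returns true if the executor terminated within the grace period. */
    boolean stop() {
        executor.shutdown();  // stop accepting new work
        try {
            if (!executor.awaitTermination(10, TimeUnit.SECONDS)) {
                executor.shutdownNow();  // interrupt stuck workers
                return false;            // caller logs "Executor did not terminate"
            }
            return true;
        } catch (InterruptedException e) {
            executor.shutdownNow();
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

With a worker blocked inside a ten-retry IPC loop, the grace period expires and the pool falls back to shutdownNow(), which is exactly the interrupt that the workers then log at ERROR.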
[jira] [Commented] (YARN-4695) EntityGroupFSTimelineStore to not log errors during shutdown
[ https://issues.apache.org/jira/browse/YARN-4695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15148624#comment-15148624 ] Steve Loughran commented on YARN-4695: -- Stack. Note how the interrupted exception happened during RPC and has been wrapped
{code}
2016-02-16 13:57:21,171 [ScalaTest-main-running-TimelineListenerSuite] INFO timeline.LeveldbTimelineStore (LeveldbTimelineStore.java:serviceStop(250)) - Waiting for deletion thread to complete its current action
2016-02-16 13:57:21,171 [Thread-10] INFO timeline.LeveldbTimelineStore (LeveldbTimelineStore.java:run(296)) - Deletion thread received interrupt, exiting
2016-02-16 13:57:21,172 [EntityLogPluginWorker #1] ERROR timeline.EntityGroupFSTimelineStore (EntityGroupFSTimelineStore.java:run(693)) - Error processing logs for application__
java.lang.reflect.UndeclaredThrowableException
    at com.sun.proxy.$Proxy25.getApplicationReport(Unknown Source)
    at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplicationReport(YarnClientImpl.java:448)
    at org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.getAppState(EntityGroupFSTimelineStore.java:464)
    at org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.access$700(EntityGroupFSTimelineStore.java:79)
    at org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore$AppLogs.parseSummaryLogs(EntityGroupFSTimelineStore.java:529)
    at org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore$AppLogs.parseSummaryLogs(EntityGroupFSTimelineStore.java:519)
    at org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore$ActiveLogParser.run(EntityGroupFSTimelineStore.java:686)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.InterruptedException: sleep interrupted
    at java.lang.Thread.sleep(Native Method)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:159)
    ... 14 more
{code}
> EntityGroupFSTimelineStore to not log errors during shutdown > > > Key: YARN-4695 > URL: https://issues.apache.org/jira/browse/YARN-4695 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: 2.8.0 >Reporter: Steve Loughran > > # The {{EntityGroupFSTimelineStore}} threads log exceptions that get raised > during their execution. > # the service stops by interrupting all its workers > # as a result, the workers all log exceptions at error *even during a managed > shutdown* > # this creates distracting noise in logs -- This message was sent by Atlassian JIRA (v6.3.4#6332)
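Because the retry handler surfaces the interrupt as an UndeclaredThrowableException (as in the stack above), a worker that wants to stay quiet during a managed shutdown has to walk the cause chain to find the InterruptedException. A sketch of that check; the helper name isShutdownInterrupt is illustrative, not an actual YARN method:

```java
import java.lang.reflect.UndeclaredThrowableException;

/**
 * Sketch: detect an interrupt hidden inside the reflective proxy wrapper.
 * RetryInvocationHandler wraps the InterruptedException raised from
 * Thread.sleep() in an UndeclaredThrowableException, so a plain
 * instanceof check on the top-level exception misses it.
 */
class InterruptDetector {
    static boolean isShutdownInterrupt(Throwable t) {
        // Walk the cause chain looking for an InterruptedException.
        while (t != null) {
            if (t instanceof InterruptedException) {
                return true;
            }
            t = t.getCause();
        }
        return false;
    }
}
```

A worker could use this to log such failures at DEBUG instead of ERROR when the service is stopping.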
[jira] [Created] (YARN-4695) EntityGroupFSTimelineStore to not log errors during shutdown
Steve Loughran created YARN-4695: Summary: EntityGroupFSTimelineStore to not log errors during shutdown Key: YARN-4695 URL: https://issues.apache.org/jira/browse/YARN-4695 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: 2.8.0 Reporter: Steve Loughran # The {{EntityGroupFSTimelineStore}} threads log exceptions that get raised during their execution. # the service stops by interrupting all its workers # as a result, the workers all log exceptions at error *even during a managed shutdown* # this creates distracting noise in logs -- This message was sent by Atlassian JIRA (v6.3.4#6332)
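The fix the issue asks for amounts to making workers shutdown-aware before they pick a log level. A minimal sketch under assumed names (LogWorker and its stopped flag are illustrative, not the actual EntityGroupFSTimelineStore code):

```java
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * Sketch: downgrade worker error logs once a managed shutdown has begun.
 * During shutdown the service interrupts its workers, so exceptions raised
 * then are expected and should not be logged at ERROR.
 */
class LogWorker {
    private final AtomicBoolean stopped = new AtomicBoolean(false);

    /** Called from serviceStop() before workers are interrupted. */
    void stop() {
        stopped.set(true);
    }

    /** Returns the level an exception from the worker should be logged at. */
    String levelFor(Exception e) {
        // Expected during a managed shutdown: keep the logs quiet.
        if (stopped.get() || e instanceof InterruptedException) {
            return "DEBUG";
        }
        return "ERROR";
    }
}
```

The same exception is then logged at ERROR during normal operation but at DEBUG once stop() has been called, removing the distracting noise.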
[jira] [Commented] (YARN-4648) Move preemption related tests from TestFairScheduler to TestFairSchedulerPreemption
[ https://issues.apache.org/jira/browse/YARN-4648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15148619#comment-15148619 ] Kai Sasaki commented on YARN-4648: -- [~ozawa] It's reasonable. I'll update. Thanks! > Move preemption related tests from TestFairScheduler to > TestFairSchedulerPreemption > --- > > Key: YARN-4648 > URL: https://issues.apache.org/jira/browse/YARN-4648 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.8.0 >Reporter: Karthik Kambatla >Assignee: Kai Sasaki > Labels: newbie++ > Attachments: YARN-4648.01.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4412) Create ClusterMonitor to compute ordered list of preferred NMs for OPPORTUNITIC containers
[ https://issues.apache.org/jira/browse/YARN-4412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15148532#comment-15148532 ] Hadoop QA commented on YARN-4412: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 15s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 6 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 14s {color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 40s {color} | {color:green} yarn-2877 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 5m 41s {color} | {color:green} yarn-2877 passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 34s {color} | {color:green} yarn-2877 passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 9s {color} | {color:green} yarn-2877 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 46s {color} | {color:green} yarn-2877 passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 21s {color} | {color:green} yarn-2877 passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 21s {color} | {color:green} yarn-2877 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 12s {color} | {color:green} yarn-2877 passed with JDK v1.8.0_72 {color} | | 
{color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 47s {color} | {color:green} yarn-2877 passed with JDK v1.7.0_95 {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 15s {color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 19s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 5m 34s {color} | {color:green} the patch passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 5m 34s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 5m 34s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 34s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 6m 34s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 6m 34s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 10s {color} | {color:red} root: patch generated 83 new + 442 unchanged - 11 fixed = 525 total (was 453) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 46s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 21s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. 
{color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 22s {color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 57s {color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common generated 3 new + 0 unchanged - 0 fixed = 3 total (was 0) {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 12s {color} | {color:green} the patch passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 56s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 23s {color} | {color:red} hadoop-yarn-api in the patch failed with JDK v1.8.0_72. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 52s {color} | {color:green} hadoop-yarn-common in the patch passed with JDK v1.8.0_72.
[jira] [Commented] (YARN-2225) Turn the virtual memory check to be off by default
[ https://issues.apache.org/jira/browse/YARN-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15148449#comment-15148449 ] Akira AJISAKA commented on YARN-2225: - bq. Why not making vmem-pmem ratio larger to address the problem? Yes, we can avoid the error by making the vmem-pmem ratio larger, so the problems are that the default value of the ratio is too small for Java 8 and that the virtual memory check is enabled by default. Can we make the ratio larger by default, or disable the vmem check by default? > Turn the virtual memory check to be off by default > -- > > Key: YARN-2225 > URL: https://issues.apache.org/jira/browse/YARN-2225 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Attachments: YARN-2225.patch > > > The virtual memory check may not be the best way to isolate applications. > Virtual memory is not the constrained resource. It would be better if we > limit the swapping of the task using swapiness instead. This patch will turn > this DEFAULT_NM_VMEM_CHECK_ENABLED off by default and let users turn it on if > they need to. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
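For reference, both knobs under discussion are NodeManager settings in yarn-site.xml. A sketch of the two workarounds being weighed (the values shown are illustrative, not proposed new defaults):

```xml
<!-- Illustrative yarn-site.xml fragment: either disable the vmem check... -->
<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>false</value>
</property>
<!-- ...or keep the check and raise the vmem-pmem ratio above its 2.1 default. -->
<property>
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <value>4.0</value>
</property>
```

Either setting avoids the "running beyond virtual memory limits" container kills that Java 8's larger virtual address space reservations trigger with the stock defaults.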
[jira] [Commented] (YARN-4692) [Umbrella] Simplified and first-class support for services in YARN
[ https://issues.apache.org/jira/browse/YARN-4692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15148405#comment-15148405 ] Marco Rabozzi commented on YARN-4692: - Thanks [~vinodkv] for starting the discussion and [~asuresh] for the detailed comments on long running container scheduling. Overall, the document gives a very detailed overview of the current state of long-running services support in YARN. With respect to long running service upgrades, the proposal for allocation reuse (3.2.3) is very interesting since it makes it possible to reduce the time needed for container upgrades. However, if the AM container is the one that needs to be upgraded, the RM should be aware of the process; otherwise, in case of subsequent AM or NM failures the AM might be restarted with old bits. I think that a possible solution would be to revise the design proposed in YARN-4470 to take into account allocation reuse. We could decouple the request to update the submission context within the RM from the actual updated *startContainer* request for the same AM allocation. > [Umbrella] Simplified and first-class support for services in YARN > -- > > Key: YARN-4692 > URL: https://issues.apache.org/jira/browse/YARN-4692 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Vinod Kumar Vavilapalli >Assignee: Vinod Kumar Vavilapalli > Attachments: > YARN-First-Class-And-Simplified-Support-For-Services-v0.pdf > > > YARN-896 focused on getting the ball rolling on the support for services > (long running applications) on YARN. > I’d like propose the next stage of this effort: _Simplified and first-class > support for services in YARN_. 
> The chief rationale for filing a separate new JIRA is threefold: > - Do a fresh survey of all the things that are already implemented in the > project > - Weave a comprehensive story around what we further need and attempt to > rally the community around a concrete end-goal, and > - Additionally focus on functionality that YARN-896 and friends left for > higher layers to take care of and see how much of that is better integrated > into the YARN platform itself. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2225) Turn the virtual memory check to be off by default
[ https://issues.apache.org/jira/browse/YARN-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15148329#comment-15148329 ] Tsuyoshi Ozawa commented on YARN-2225: -- Why not making vmem-pmem ratio larger to address the problem? > Turn the virtual memory check to be off by default > -- > > Key: YARN-2225 > URL: https://issues.apache.org/jira/browse/YARN-2225 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Attachments: YARN-2225.patch > > > The virtual memory check may not be the best way to isolate applications. > Virtual memory is not the constrained resource. It would be better if we > limit the swapping of the task using swapiness instead. This patch will turn > this DEFAULT_NM_VMEM_CHECK_ENABLED off by default and let users turn it on if > they need to. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2225) Turn the virtual memory check to be off by default
[ https://issues.apache.org/jira/browse/YARN-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15148306#comment-15148306 ] Akira AJISAKA commented on YARN-2225: - I'm thinking the default value should be changed in branch-2 as well. We have made incompatible changes in branch-2 before when the change was really needed. There are 14 incompatible changes in 2.6.0/2.7.0. https://issues.apache.org/jira/issues/?jql=project%20in%20(YARN%2C%20HADOOP%2C%20HDFS%2C%20MAPREDUCE)%20AND%20status%20in%20(Resolved%2C%20Closed)%20AND%20fixVersion%20in%20(2.6.0%2C%202.7.0)%20AND%20%22Hadoop%20Flags%22%20%3D%20%22Incompatible%20change%22 I'm fed up with hearing Java 8 users' complaints: "The default value does not work." > Turn the virtual memory check to be off by default > -- > > Key: YARN-2225 > URL: https://issues.apache.org/jira/browse/YARN-2225 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Attachments: YARN-2225.patch > > > The virtual memory check may not be the best way to isolate applications. > Virtual memory is not the constrained resource. It would be better if we > limit the swapping of the task using swapiness instead. This patch will turn > this DEFAULT_NM_VMEM_CHECK_ENABLED off by default and let users turn it on if > they need to. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Reopened] (YARN-2225) Turn the virtual memory check to be off by default
[ https://issues.apache.org/jira/browse/YARN-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA reopened YARN-2225: - I'd like to reopen this. The issue was previously closed because this change is incompatible, but the change can go into trunk. > Turn the virtual memory check to be off by default > -- > > Key: YARN-2225 > URL: https://issues.apache.org/jira/browse/YARN-2225 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Attachments: YARN-2225.patch > > > The virtual memory check may not be the best way to isolate applications. > Virtual memory is not the constrained resource. It would be better if we > limit the swapping of the task using swapiness instead. This patch will turn > this DEFAULT_NM_VMEM_CHECK_ENABLED off by default and let users turn it on if > they need to. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4412) Create ClusterMonitor to compute ordered list of preferred NMs for OPPORTUNITIC containers
[ https://issues.apache.org/jira/browse/YARN-4412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun Suresh updated YARN-4412: -- Attachment: YARN-4412-yarn-2877.v3.patch Uploading patch to rebase with the latest YARN-2877 as well as address [~curino]'s review comments > Create ClusterMonitor to compute ordered list of preferred NMs for > OPPORTUNITIC containers > -- > > Key: YARN-4412 > URL: https://issues.apache.org/jira/browse/YARN-4412 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Arun Suresh >Assignee: Arun Suresh > Attachments: YARN-4412-yarn-2877.v1.patch, > YARN-4412-yarn-2877.v2.patch, YARN-4412-yarn-2877.v3.patch > > > Introduce a Cluster Monitor that aggregates load information from individual > Node Managers and computes an ordered list of preferred Node managers to be > used as target Nodes for OPPORTUNISTIC container allocations. > This list can be pushed out to the Node Manager (specifically the AMRMProxy > running on the Node) via the Allocate Response. This will be used to make > local Scheduling decisions -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4412) Create ClusterMonitor to compute ordered list of preferred NMs for OPPORTUNITIC containers
[ https://issues.apache.org/jira/browse/YARN-4412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15148261#comment-15148261 ] Arun Suresh commented on YARN-4412: --- Many thanks for the detailed review [~curino]. # I totally agree with your point about explicitly authorizing AMs to allow them to send and receive cluster information via the extended protocol: YARN-4631 has been raised to track this. # With regard to generalizing {{QueuedContainersStatus}} into a {{ClusterStatus}}, please note that this is actually metadata sent from the NM to the RM, therefore *ClusterStatus* might not apply here. But I agree, we probably can add more cluster information to the {{DistributedSchedulingProtocol}}, which we introduced in YARN-2885. Also, the node heartbeat already contains both Container as well as aggregate Node resource utilization information. {{QueuedContainersStatus}} is just another utilization metric used by the {{ClusterMonitor}} running on the RM and used by the DistributedScheduling framework to gauge the relative load on a Node based on the state of the queue (maintained by the {{ContainersMonitor}}, which queues OPPORTUNISTIC container requests). bq. ..documentation on the various classes would help. e.g., you introduce a DistributedSchedulingService, .. Agreed, I have added some class-level docs to some of the new classes introduced here. bq. ... if you are factoring out all the "guts" of SchedulerEventDispatcher, can't we simply move the class out? .. Agreed.. bq. Can you clarify what happens in DistributedSchedulingService.getServer() ?... Fixed the comment to explain this. bq. ..assumes resources will have only cpu/mem...Is there any better way to load this info from configuration? It would be nice to have a config.getResource("blah"), which takes care of this... Good point.. unfortunately, the Configuration object does not currently support {{getResource()}}.. Once the generalized resource model lands, we will circle back to this. bq. 
I see tests for TopKNodeSelector, but for nothing else. Is this enough? Definitely not, but we have to wait for the actual changes in the {{ContainerManager}} and {{ContainersMonitor}} classes, handled in YARN-2883, to test this end-to-end. In the meantime, I will add tests to verify that extra fields in the protobuf are handled correctly. > Create ClusterMonitor to compute ordered list of preferred NMs for > OPPORTUNITIC containers > -- > > Key: YARN-4412 > URL: https://issues.apache.org/jira/browse/YARN-4412 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Arun Suresh >Assignee: Arun Suresh > Attachments: YARN-4412-yarn-2877.v1.patch, > YARN-4412-yarn-2877.v2.patch > > > Introduce a Cluster Monitor that aggregates load information from individual > Node Managers and computes an ordered list of preferred Node managers to be > used as target Nodes for OPPORTUNISTIC container allocations. > This list can be pushed out to the Node Manager (specifically the AMRMProxy > running on the Node) via the Allocate Response. This will be used to make > local Scheduling decisions -- This message was sent by Atlassian JIRA (v6.3.4#6332)
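The core idea behind the ClusterMonitor discussed above can be sketched as ranking nodes by a load metric (here, queued-container count) and returning the top K candidates. Class name and signature below are illustrative, not the actual YARN-4412 TopKNodeSelector code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Sketch: compute an ordered list of preferred NMs for OPPORTUNISTIC
 * containers by ranking nodes on their queued-container count, with the
 * least-loaded nodes first. Illustrative stand-in for YARN-4412's selector.
 */
class TopKNodeSelector {
    static List<String> selectTopK(Map<String, Integer> queueLengthByNode, int k) {
        List<Map.Entry<String, Integer>> entries =
            new ArrayList<>(queueLengthByNode.entrySet());
        // Prefer nodes with the shortest OPPORTUNISTIC-container queues.
        entries.sort(Map.Entry.comparingByValue());
        List<String> topK = new ArrayList<>();
        for (int i = 0; i < Math.min(k, entries.size()); i++) {
            topK.add(entries.get(i).getKey());
        }
        return topK;
    }
}
```

The resulting list is what would be pushed out to the AMRMProxy on each node via the allocate response, so that distributed schedulers steer OPPORTUNISTIC containers toward lightly loaded nodes.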