[jira] [Commented] (YARN-9594) Unknown event arrived at ContainerScheduler: EventType: RECOVERY_COMPLETED
[ https://issues.apache.org/jira/browse/YARN-9594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16860583#comment-16860583 ] lujie commented on YARN-9594: - ping-> > Unknown event arrived at ContainerScheduler: EventType: RECOVERY_COMPLETED > -- > > Key: YARN-9594 > URL: https://issues.apache.org/jira/browse/YARN-9594 > Project: Hadoop YARN > Issue Type: Bug >Reporter: lujie >Assignee: lujie >Priority: Major > Attachments: YARN-9594_1.patch > > > It seems that we miss a break in switch-case > {code:java} > case RECOVERY_COMPLETED: > startPendingContainers(maxOppQueueLength <= 0); > metrics.setQueuedContainers(queuedOpportunisticContainers.size(), > queuedGuaranteedContainers.size()); > //break;missed > default: > LOG.error("Unknown event arrived at ContainerScheduler: " > + event.toString()); > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
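For illustration, here is a minimal sketch of the fix described above: the same switch case with the missing break restored, so RECOVERY_COMPLETED no longer falls through into the default error branch. The surrounding fields (maxOppQueueLength, metrics, the two queue maps, LOG) are assumed from the quoted snippet; this is a sketch, not the attached YARN-9594_1.patch itself.
{code:java}
// Sketch only: without the break, RECOVERY_COMPLETED falls through and is
// also logged as an unknown event by the default branch below.
case RECOVERY_COMPLETED:
  startPendingContainers(maxOppQueueLength <= 0);
  metrics.setQueuedContainers(queuedOpportunisticContainers.size(),
      queuedGuaranteedContainers.size());
  break;
default:
  LOG.error("Unknown event arrived at ContainerScheduler: "
      + event.toString());
{code}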
[jira] [Commented] (YARN-9612) Support using ip to register NodeID
[ https://issues.apache.org/jira/browse/YARN-9612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16860561#comment-16860561 ] zhoukang commented on YARN-9612: Thanks [~tangzhankun]. I think using the service name will make maintenance more difficult. > Support using ip to register NodeID > --- > > Key: YARN-9612 > URL: https://issues.apache.org/jira/browse/YARN-9612 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: zhoukang >Priority: Major > > In an environment like k8s, we should support using the ip when registering the NodeID with > the RM, since the hostname will be the podName, which cannot be resolved by the DNS of > k8s -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9598) Make reservation work well when multi-node enabled
[ https://issues.apache.org/jira/browse/YARN-9598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16860540#comment-16860540 ] Tao Yang commented on YARN-9598: Thanks [~cheersyang] for the response. {quote} How can we make sure a big container request does not get starved in such a case? Maybe a way to improve this is to swap reserved containers on NMs {quote} I think an improved preemption policy should take on this responsibility. Considering that there is still dispute about disabling re-reservation when multi-node placement is enabled, perhaps we can assume that the harm of re-reservation can be ignored for now and that the problem can be solved by an improved node-sorting policy? I will remove the referenced changes from the patch if there are no objections. > Make reservation work well when multi-node enabled > -- > > Key: YARN-9598 > URL: https://issues.apache.org/jira/browse/YARN-9598 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Major > Attachments: YARN-9598.001.patch, image-2019-06-10-11-37-43-283.png, > image-2019-06-10-11-37-44-975.png > > > This issue is to solve problems about reservation when multi-node enabled: > # As discussed in YARN-9576, re-reservation proposal may always be generated > on the same node and break the scheduling for this app and later apps. I > think re-reservation is unnecessary and we can replace it with > LOCALITY_SKIPPED to let scheduler have a chance to look up following candidates > for this app when multi-node enabled. > # Scheduler iterates all nodes and tries to allocate for reserved container in > LeafQueue#allocateFromReservedContainer. Here there are two problems: > ** The node of reserved container should be taken as candidates instead of > all nodes when calling FiCaSchedulerApp#assignContainers, otherwise later > scheduler may generate a reservation-fulfilled proposal on another node, > which will always be rejected in FiCaScheduler#commonCheckContainerAllocation. > ** Assignment returned by FiCaSchedulerApp#assignContainers could never be > null even if it's just skipped, it will break the normal scheduling process > for this leaf queue because of the if clause in LeafQueue#assignContainers: > "if (null != assignment) \{ return assignment;}" > # Nodes which have been reserved should be skipped when iterating candidates > in RegularContainerAllocator#allocate, otherwise scheduler may generate > allocation or reservation proposal on these nodes which will always be > rejected in FiCaScheduler#commonCheckContainerAllocation. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9608) DecommissioningNodesWatcher should get lists of running applications on node from RMNode.
[ https://issues.apache.org/jira/browse/YARN-9608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16860499#comment-16860499 ] Abhishek Modi commented on YARN-9608: - [~wjlei] with this jira we are also tracking applications whose containers ran on the node before node was put into Decommissioning state. Previously also node life-cycle was dependent on application run time but it was considering only those applications whose containers were running on the node when node was moved to decommissioning state. > DecommissioningNodesWatcher should get lists of running applications on node > from RMNode. > - > > Key: YARN-9608 > URL: https://issues.apache.org/jira/browse/YARN-9608 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Abhishek Modi >Assignee: Abhishek Modi >Priority: Major > Attachments: YARN-9608.001.patch > > > At present, DecommissioningNodesWatcher tracks list of running applications > and triggers decommission of nodes when all the applications that ran on the > node completes. This Jira proposes to solve following problem: > # DecommissioningNodesWatcher skips tracking application containers on a > particular node before the node is in DECOMMISSIONING state. It only tracks > containers once the node is in DECOMMISSIONING state. This can lead to > shuffle data loss of apps whose containers ran on this node before it was > moved to decommissioning state. > # It is keeping track of running apps. We can leverage this directly from > RMNode. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-9598) Make reservation work well when multi-node enabled
[ https://issues.apache.org/jira/browse/YARN-9598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16859740#comment-16859740 ] Juanjuan Tian edited comment on YARN-9598 at 6/11/19 1:58 AM: --- Hi Tao, {noformat} disable re-reservation can only make the scheduler skip reserving the same container repeatedly and try to allocate on other nodes, it won't affect normal scheduling for this app and later apps. Thoughts?{noformat} For example, there are 10 nodes (h1, h2, ... h9, h10) in the cluster, each with 8G memory, and two queues A and B, each configured with 50% capacity. First, 10 jobs (each requesting 6G of resource) are submitted to queue A, and each of the 10 nodes will have a container allocated. Afterwards, another job JobB, which requests 3G of resource, is submitted to queue B, and one container of 3G size will be reserved on node h1. If we disable re-reservation in this case, even though the scheduler can look up other nodes, since shouldAllocOrReserveNewContainer is false there are still no other reservations, and JobB will still get stuck. was (Author: jutia): Hi Tao, {noformat} disable re-reservation can only make the scheduler skip reserving the same container repeatedly and try to allocate on other nodes, it won't affect normal scheduling for this app and later apps. Thoughts?{noformat} for example, there are 10 nodes(h1,h2,...h9,h10), each has 8G memory in cluster, and two queues A,B, each is configured with 50% capacity. firstly there are 10 jobs (each requests 6G respurce) is submited to queue A, and each node of the 10 nodes will have a contianer allocated. Afterwards, another job JobB which requests 3G resource is submited to queue B, and there will be one container with 3G size reserved on node h1, if we disable re-reservation, in this case, even scheduler can look up other nodes, since the shouldAllocOrReserveNewContainer is false, there is still on other reservations, and JobB will still get stuck. > Make reservation work well when multi-node enabled > -- > > Key: YARN-9598 > URL: https://issues.apache.org/jira/browse/YARN-9598 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Major > Attachments: YARN-9598.001.patch, image-2019-06-10-11-37-43-283.png, > image-2019-06-10-11-37-44-975.png > > > This issue is to solve problems about reservation when multi-node enabled: > # As discussed in YARN-9576, re-reservation proposal may always be generated > on the same node and break the scheduling for this app and later apps. I > think re-reservation is unnecessary and we can replace it with > LOCALITY_SKIPPED to let scheduler have a chance to look up following candidates > for this app when multi-node enabled. > # Scheduler iterates all nodes and tries to allocate for reserved container in > LeafQueue#allocateFromReservedContainer. Here there are two problems: > ** The node of reserved container should be taken as candidates instead of > all nodes when calling FiCaSchedulerApp#assignContainers, otherwise later > scheduler may generate a reservation-fulfilled proposal on another node, > which will always be rejected in FiCaScheduler#commonCheckContainerAllocation. 
> ** Assignment returned by FiCaSchedulerApp#assignContainers could never be > null even if it's just skipped, it will break the normal scheduling process > for this leaf queue because of the if clause in LeafQueue#assignContainers: > "if (null != assignment) \{ return assignment;}" > # Nodes which have been reserved should be skipped when iterating candidates > in RegularContainerAllocator#allocate, otherwise scheduler may generate > allocation or reservation proposal on these nodes which will always be > rejected in FiCaScheduler#commonCheckContainerAllocation. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-9608) DecommissioningNodesWatcher should get lists of running applications on node from RMNode.
[ https://issues.apache.org/jira/browse/YARN-9608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16860470#comment-16860470 ] jialei weng edited comment on YARN-9608 at 6/11/19 1:58 AM: {color:#33}This solution provides an idea to extend life-cycle of node local data to the whole application running time. A small question here, if the application is long running job, the node decommission time will also take longer? And rely on the time-out? [~abmodi] Please correct me if I misunderstand.{color} was (Author: wjlei): {color:#33}This solution provides an idea to extend life-cycle of {color:#33}node local data to the whole application running time. A small question here, if the application is long running job, the node decommission time will also take longer? And rely on the time-out? Please correct me if I misunderstand.{color}{color} > DecommissioningNodesWatcher should get lists of running applications on node > from RMNode. > - > > Key: YARN-9608 > URL: https://issues.apache.org/jira/browse/YARN-9608 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Abhishek Modi >Assignee: Abhishek Modi >Priority: Major > Attachments: YARN-9608.001.patch > > > At present, DecommissioningNodesWatcher tracks list of running applications > and triggers decommission of nodes when all the applications that ran on the > node completes. This Jira proposes to solve following problem: > # DecommissioningNodesWatcher skips tracking application containers on a > particular node before the node is in DECOMMISSIONING state. It only tracks > containers once the node is in DECOMMISSIONING state. This can lead to > shuffle data loss of apps whose containers ran on this node before it was > moved to decommissioning state. > # It is keeping track of running apps. We can leverage this directly from > RMNode. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9608) DecommissioningNodesWatcher should get lists of running applications on node from RMNode.
[ https://issues.apache.org/jira/browse/YARN-9608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16860470#comment-16860470 ] jialei weng commented on YARN-9608: --- This solution provides an idea to extend the life-cycle of node-local data to the whole application running time. A small question here: if the application is a long-running job, will the node decommission time also take longer, and rely on the time-out? Please correct me if I misunderstand. > DecommissioningNodesWatcher should get lists of running applications on node > from RMNode. > - > > Key: YARN-9608 > URL: https://issues.apache.org/jira/browse/YARN-9608 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Abhishek Modi >Assignee: Abhishek Modi >Priority: Major > Attachments: YARN-9608.001.patch > > > At present, DecommissioningNodesWatcher tracks list of running applications > and triggers decommission of nodes when all the applications that ran on the > node completes. This Jira proposes to solve following problem: > # DecommissioningNodesWatcher skips tracking application containers on a > particular node before the node is in DECOMMISSIONING state. It only tracks > containers once the node is in DECOMMISSIONING state. This can lead to > shuffle data loss of apps whose containers ran on this node before it was > moved to decommissioning state. > # It is keeping track of running apps. We can leverage this directly from > RMNode. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9616) Shared Cache Manager Failed To Upload Unpacked Resources
[ https://issues.apache.org/jira/browse/YARN-9616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16860417#comment-16860417 ] zhenzhao wang commented on YARN-9616: - I had seen this issue in 2.9 and 2.6. More check is needed to identify the problem in the latest version. > Shared Cache Manager Failed To Upload Unpacked Resources > > > Key: YARN-9616 > URL: https://issues.apache.org/jira/browse/YARN-9616 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.8.3, 2.9.2, 2.8.5 >Reporter: zhenzhao wang >Assignee: zhenzhao wang >Priority: Major > > Yarn will unpack archives files and some other files based on the file type > and configuration. E.g. > If I started an MR job with -archive one.zip, then the one.zip will be > unpacked while download. Let's say there're file1 && file2 inside one.zip. > Then the files kept on local disk will be like > /disk3/yarn/local/filecache/352/one.zip/file1 > and/disk3/yarn/local/filecache/352/one.zip/file2. So the shared cache > uploader couldn't upload one.zip to shared cache as it was removed during > localization. The following errors will be thrown. > {code:java} > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.sharedcache.SharedCacheUploader: > Exception while uploading the file dict.zip > java.io.FileNotFoundException: File > /disk3/yarn/local/filecache/352/one.zip/one.zip does not exist > at > org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:631) > at > org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:857) > at > org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:621) > at > org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:442) > at > org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.(ChecksumFileSystem.java:146) > at > org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:347) > at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:926) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.sharedcache.SharedCacheUploader.computeChecksum(SharedCacheUploader.java:257) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.sharedcache.SharedCacheUploader.call(SharedCacheUploader.java:128) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.sharedcache.SharedCacheUploader.call(SharedCacheUploader.java:55) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9616) Shared Cache Manager Failed To Upload Unpacked Resources
[ https://issues.apache.org/jira/browse/YARN-9616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhenzhao wang updated YARN-9616: Affects Version/s: 2.8.3 2.9.2 > Shared Cache Manager Failed To Upload Unpacked Resources > > > Key: YARN-9616 > URL: https://issues.apache.org/jira/browse/YARN-9616 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.8.3, 2.9.2, 2.8.5 >Reporter: zhenzhao wang >Assignee: zhenzhao wang >Priority: Major > > Yarn will unpack archives files and some other files based on the file type > and configuration. E.g. > If I started an MR job with -archive one.zip, then the one.zip will be > unpacked while download. Let's say there're file1 && file2 inside one.zip. > Then the files kept on local disk will be like > /disk3/yarn/local/filecache/352/one.zip/file1 > and/disk3/yarn/local/filecache/352/one.zip/file2. So the shared cache > uploader couldn't upload one.zip to shared cache as it was removed during > localization. The following errors will be thrown. > {code:java} > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.sharedcache.SharedCacheUploader: > Exception while uploading the file dict.zip > java.io.FileNotFoundException: File > /disk3/yarn/local/filecache/352/one.zip/one.zip does not exist > at > org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:631) > at > org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:857) > at > org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:621) > at > org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:442) > at > org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.(ChecksumFileSystem.java:146) > at > org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:347) > at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:926) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.sharedcache.SharedCacheUploader.computeChecksum(SharedCacheUploader.java:257) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.sharedcache.SharedCacheUploader.call(SharedCacheUploader.java:128) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.sharedcache.SharedCacheUploader.call(SharedCacheUploader.java:55) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9616) Shared Cache Manager Failed To Upload Unpacked Resources
[ https://issues.apache.org/jira/browse/YARN-9616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhenzhao wang updated YARN-9616: Affects Version/s: 2.8.5 > Shared Cache Manager Failed To Upload Unpacked Resources > > > Key: YARN-9616 > URL: https://issues.apache.org/jira/browse/YARN-9616 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.8.5 >Reporter: zhenzhao wang >Assignee: zhenzhao wang >Priority: Major > > Yarn will unpack archives files and some other files based on the file type > and configuration. E.g. > If I started an MR job with -archive one.zip, then the one.zip will be > unpacked while download. Let's say there're file1 && file2 inside one.zip. > Then the files kept on local disk will be like > /disk3/yarn/local/filecache/352/one.zip/file1 > and/disk3/yarn/local/filecache/352/one.zip/file2. So the shared cache > uploader couldn't upload one.zip to shared cache as it was removed during > localization. The following errors will be thrown. > {code:java} > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.sharedcache.SharedCacheUploader: > Exception while uploading the file dict.zip > java.io.FileNotFoundException: File > /disk3/yarn/local/filecache/352/one.zip/one.zip does not exist > at > org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:631) > at > org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:857) > at > org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:621) > at > org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:442) > at > org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.(ChecksumFileSystem.java:146) > at > org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:347) > at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:926) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.sharedcache.SharedCacheUploader.computeChecksum(SharedCacheUploader.java:257) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.sharedcache.SharedCacheUploader.call(SharedCacheUploader.java:128) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.sharedcache.SharedCacheUploader.call(SharedCacheUploader.java:55) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-9616) Shared Cache Manager Failed To Upload Unpacked Resources
zhenzhao wang created YARN-9616: --- Summary: Shared Cache Manager Failed To Upload Unpacked Resources Key: YARN-9616 URL: https://issues.apache.org/jira/browse/YARN-9616 Project: Hadoop YARN Issue Type: Bug Reporter: zhenzhao wang Assignee: zhenzhao wang Yarn will unpack archives files and some other files based on the file type and configuration. E.g. If I started an MR job with -archive one.zip, then the one.zip will be unpacked while download. Let's say there're file1 && file2 inside one.zip. Then the files kept on local disk will be like /disk3/yarn/local/filecache/352/one.zip/file1 and/disk3/yarn/local/filecache/352/one.zip/file2. So the shared cache uploader couldn't upload one.zip to shared cache as it was removed during localization. The following errors will be thrown. {code:java} org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.sharedcache.SharedCacheUploader: Exception while uploading the file dict.zip java.io.FileNotFoundException: File /disk3/yarn/local/filecache/352/one.zip/one.zip does not exist at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:631) at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:857) at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:621) at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:442) at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.(ChecksumFileSystem.java:146) at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:347) at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:926) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.sharedcache.SharedCacheUploader.computeChecksum(SharedCacheUploader.java:257) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.sharedcache.SharedCacheUploader.call(SharedCacheUploader.java:128) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.sharedcache.SharedCacheUploader.call(SharedCacheUploader.java:55) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
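To make the failure mode above concrete, here is a small, self-contained sketch (illustration only — the class and method names are invented and this is not the eventual fix) of how an uploader could detect that an archive was unpacked during localization and skip the shared-cache upload instead of failing with FileNotFoundException:
{code:java}
import java.io.File;

/** Illustration only: names are hypothetical, not part of SharedCacheUploader. */
public final class UnpackedResourceCheck {

  private UnpackedResourceCheck() {
  }

  /**
   * Returns true if the localized resource still exists as the original file.
   * For an unpacked archive the localized path is a directory such as
   * .../filecache/352/one.zip/ containing file1, file2, ... and the original
   * one.zip is gone, so an upload attempt would fail.
   */
  public static boolean uploadableToSharedCache(File localizedPath) {
    if (localizedPath.isFile()) {
      return true; // plain file, the original resource is intact
    }
    // Unpacked archives keep the archive name as the directory name; the
    // uploader looks for <dir>/<dir-name>, e.g. one.zip/one.zip.
    File original = new File(localizedPath, localizedPath.getName());
    return original.isFile();
  }

  public static void main(String[] args) {
    if (args.length > 0) {
      System.out.println(uploadableToSharedCache(new File(args[0])));
    }
  }
}
{code}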
[jira] [Commented] (YARN-9593) Updating scheduler conf with comma in config value fails
[ https://issues.apache.org/jira/browse/YARN-9593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16860396#comment-16860396 ] Anthony Hsu commented on YARN-9593: --- Thanks, [~jhung]. I don't have bandwidth at the moment, but glad we're agreed on the approach. I think this would be a good starter task. > Updating scheduler conf with comma in config value fails > > > Key: YARN-9593 > URL: https://issues.apache.org/jira/browse/YARN-9593 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.9.0, 3.0.0, 3.2.0, 3.1.2 >Reporter: Anthony Hsu >Priority: Major > > For example: > {code:java} > $ yarn schedulerconf -update "root.gridops:acl_administer_queue=user1,user2 > group1,group2" > Specify configuration key value as confKey=confVal.{code} > This fails because there is a comma in the config value and the SchedConfCLI > splits on comma first, expecting each split to a k=v pair. > {noformat} > void globalUpdates(String args, SchedConfUpdateInfo updateInfo) { > if (args == null) { > return; > } > HashMap globalUpdates = new HashMap<>(); > for (String globalUpdate : args.split(",")) { > putKeyValuePair(globalUpdates, globalUpdate); > } > updateInfo.setGlobalParams(globalUpdates); > }{noformat} > Cc: [~jhung] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
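As an illustration of one possible direction (not necessarily the approach that will ultimately be agreed on in the JIRA, and the class/method names below are hypothetical), the plain split-on-comma step could be replaced by a parser that treats a comma as a separator only when it starts a new confKey=confVal pair, so commas inside ACL-style values survive; keys that themselves contain commas would still not be supported by this sketch:
{code:java}
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustration only; class and method names are hypothetical. */
public final class SchedConfArgParser {

  private SchedConfArgParser() {
  }

  /**
   * Splits "k1=v1,k2=v2 with,commas" into pairs. A comma is a separator only
   * when the text after it looks like another key=... pair, so values such as
   * "user1,user2 group1,group2" are kept intact.
   */
  public static Map<String, String> parseGlobalUpdates(String args) {
    Map<String, String> updates = new LinkedHashMap<>();
    if (args == null) {
      return updates;
    }
    for (String kv : args.split(",(?=[^,=]+=)")) {
      int idx = kv.indexOf('=');
      if (idx <= 0) {
        throw new IllegalArgumentException(
            "Specify configuration key value as confKey=confVal.");
      }
      updates.put(kv.substring(0, idx), kv.substring(idx + 1));
    }
    return updates;
  }

  public static void main(String[] args) {
    // Prints {root.gridops:acl_administer_queue=user1,user2 group1,group2}
    System.out.println(parseGlobalUpdates(
        "root.gridops:acl_administer_queue=user1,user2 group1,group2"));
  }
}
{code}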
[jira] [Commented] (YARN-9569) Auto-created leaf queues do not honor cluster-wide min/max memory/vcores
[ https://issues.apache.org/jira/browse/YARN-9569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16860372#comment-16860372 ] Hudson commented on YARN-9569: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #16714 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/16714/]) YARN-9569. Auto-created leaf queues do not honor cluster-wide min/max (sumasai: rev 9191e08f0ad4ebc2a3b776c4cc71d0fc5c053beb) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacitySchedulerAutoCreatedQueueBase.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacitySchedulerAutoQueueCreation.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AbstractManagedParentQueue.java > Auto-created leaf queues do not honor cluster-wide min/max memory/vcores > > > Key: YARN-9569 > URL: https://issues.apache.org/jira/browse/YARN-9569 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacity scheduler >Affects Versions: 3.2.0 >Reporter: Craig Condit >Assignee: Craig Condit >Priority: Major > Attachments: YARN-9569.001.patch, YARN-9569.002.patch > > > Auto-created leaf queues do not honor cluster-wide settings for maximum > CPU/vcores allocation. > To reproduce: > # Set auto-create-child-queue.enabled=true for a parent queue. > # Set leaf-queue-template.maximum-allocation-mb=16384. > # Set yarn.resource-types.memory-mb.maximum-allocation=16384 in > resource-types.xml > # Launch a YARN app with a container requesting 16 GB RAM. > > This scenario should work, but instead you get an error similar to this: > {code:java} > java.lang.IllegalArgumentException: Queue maximum allocation cannot be larger > than the cluster setting for queue root.auto.test max allocation per queue: > cluster setting: {code} > > This seems to be caused by this code in > ManagedParentQueue.getLeafQueueConfigs: > {code:java} > CapacitySchedulerConfiguration leafQueueConfigTemplate = new > CapacitySchedulerConfiguration(new Configuration(false), false);{code} > > This initializes a new leaf queue configuration that does not read > resource-types.xml (or any other config). Later, this > CapacitySchedulerConfiguration instance calls > ResourceUtils.fetchMaximumAllocationFromConfig() from its > getMaximumAllocationPerQueue() method and passes itself as the configuration > to use. Since the resource types are not present, ResourceUtils falls back to > compiled-in defaults of 8GB RAM, 4 cores. > > I was able to work around this with a custom AutoCreatedQueueManagementPolicy > implementation which does something like this in init() and reinitialize(): > {code:java} > for (Map.Entry entry : this.scheduler.getConfiguration()) { > if (entry.getKey().startsWith("yarn.resource-types")) { > parentQueue.getLeafQueueTemplate().getLeafQueueConfigs() > .set(entry.getKey(), entry.getValue()); > } > } > {code} > However, this is obviously a very hacky way to solve the problem. > I can submit a proper patch if someone can provide some direction as to the > best way to proceed. 
> -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9569) Auto-created leaf queues do not honor cluster-wide min/max memory/vcores
[ https://issues.apache.org/jira/browse/YARN-9569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16860349#comment-16860349 ] Suma Shivaprasad commented on YARN-9569: Committed to trunk. Thanks [~ccondit] > Auto-created leaf queues do not honor cluster-wide min/max memory/vcores > > > Key: YARN-9569 > URL: https://issues.apache.org/jira/browse/YARN-9569 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacity scheduler >Affects Versions: 3.2.0 >Reporter: Craig Condit >Assignee: Craig Condit >Priority: Major > Attachments: YARN-9569.001.patch, YARN-9569.002.patch > > > Auto-created leaf queues do not honor cluster-wide settings for maximum > CPU/vcores allocation. > To reproduce: > # Set auto-create-child-queue.enabled=true for a parent queue. > # Set leaf-queue-template.maximum-allocation-mb=16384. > # Set yarn.resource-types.memory-mb.maximum-allocation=16384 in > resource-types.xml > # Launch a YARN app with a container requesting 16 GB RAM. > > This scenario should work, but instead you get an error similar to this: > {code:java} > java.lang.IllegalArgumentException: Queue maximum allocation cannot be larger > than the cluster setting for queue root.auto.test max allocation per queue: > cluster setting: {code} > > This seems to be caused by this code in > ManagedParentQueue.getLeafQueueConfigs: > {code:java} > CapacitySchedulerConfiguration leafQueueConfigTemplate = new > CapacitySchedulerConfiguration(new Configuration(false), false);{code} > > This initializes a new leaf queue configuration that does not read > resource-types.xml (or any other config). Later, this > CapacitySchedulerConfiguration instance calls > ResourceUtils.fetchMaximumAllocationFromConfig() from its > getMaximumAllocationPerQueue() method and passes itself as the configuration > to use. Since the resource types are not present, ResourceUtils falls back to > compiled-in defaults of 8GB RAM, 4 cores. > > I was able to work around this with a custom AutoCreatedQueueManagementPolicy > implementation which does something like this in init() and reinitialize(): > {code:java} > for (Map.Entry entry : this.scheduler.getConfiguration()) { > if (entry.getKey().startsWith("yarn.resource-types")) { > parentQueue.getLeafQueueTemplate().getLeafQueueConfigs() > .set(entry.getKey(), entry.getValue()); > } > } > {code} > However, this is obviously a very hacky way to solve the problem. > I can submit a proper patch if someone can provide some direction as to the > best way to proceed. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9613) Avoid remote lookups for RegistryDNS domain
[ https://issues.apache.org/jira/browse/YARN-9613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16860316#comment-16860316 ] Eric Yang commented on YARN-9613: - [~billie.rinaldi] If RegistryDNS is designated as DNS authoritative server for a domain, then RegistryDNS doesn't need to perform forward lookup for the records in the Hadoop domain. I think we can introduce hadoop.registry.dns.soa.lookup=false. If this option is set to true, RegistryDNS will perform upstream lookup for queries within hadoop.registry.dns.domain-name. > Avoid remote lookups for RegistryDNS domain > --- > > Key: YARN-9613 > URL: https://issues.apache.org/jira/browse/YARN-9613 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.1.2 >Reporter: Billie Rinaldi >Priority: Major > > A typical setup for RegistryDNS is for an upstream DNS server to forward DNS > queries matching the hadoop.registry.dns.domain-name to RegistryDNS. If the > RegistryDNS lookup gets a non-zero DNS RCODE, RegistryDNS performs a remote > lookup in upstream DNS servers. For bad queries, this can result in a loop > when the upstream DNS server forwards the query back to RegistryDNS. > To solve this problem, we should avoid performing remote lookups for queries > within hadoop.registry.dns.domain-name, which are expected to be handled by > RegistryDNS. We may also want to evaluate whether we should add a > configuration property that allows the user to disable remote lookups > entirely for RegistryDNS, for installations where RegistryDNS is set up as > the last DNS server in a chain of DNS servers. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
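To sketch the idea behind the proposed switch (purely illustrative — the hadoop.registry.dns.soa.lookup property is only a proposal above, and the class below is not actual RegistryDNS code), the decision about forwarding a query upstream could be guarded by a domain check like this:
{code:java}
/** Illustration only: not actual RegistryDNS code; example domain is made up. */
public final class RemoteLookupGuard {

  private RemoteLookupGuard() {
  }

  /**
   * Decides whether a query may be forwarded to upstream resolvers. Queries
   * inside the registry domain (hadoop.registry.dns.domain-name) are only
   * forwarded when the proposed soa-lookup switch is enabled, since
   * forwarding them can loop back to RegistryDNS.
   */
  public static boolean shouldLookupRemotely(String queryName,
      String registryDomain, boolean soaLookupEnabled) {
    String name = canonical(queryName);
    String domain = canonical(registryDomain);
    boolean inRegistryDomain =
        name.equals(domain) || name.endsWith("." + domain);
    return inRegistryDomain ? soaLookupEnabled : true;
  }

  private static String canonical(String dnsName) {
    String lower = dnsName.toLowerCase();
    // Strip a trailing dot so "app.example.com." and "app.example.com" match.
    return lower.endsWith(".") ? lower.substring(0, lower.length() - 1) : lower;
  }

  public static void main(String[] args) {
    System.out.println(shouldLookupRemotely(
        "badname.ycluster.example.com.", "ycluster.example.com", false)); // false
    System.out.println(shouldLookupRemotely(
        "www.apache.org.", "ycluster.example.com", false)); // true
  }
}
{code}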
[jira] [Commented] (YARN-9471) Cleanup in TestLogAggregationIndexFileController
[ https://issues.apache.org/jira/browse/YARN-9471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16860269#comment-16860269 ] Hudson commented on YARN-9471: -- FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #16711 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/16711/]) YARN-9471. Cleanup in TestLogAggregationIndexFileController. Contributed (weichiu: rev e94e6435842c5b9dc0f5fe681e0829d33dd5b24e) * (delete) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/logaggregation/filecontroller/ifile/TestLogAggregationIndexFileController.java * (add) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/logaggregation/filecontroller/ifile/TestLogAggregationIndexedFileController.java > Cleanup in TestLogAggregationIndexFileController > > > Key: YARN-9471 > URL: https://issues.apache.org/jira/browse/YARN-9471 > Project: Hadoop YARN > Issue Type: Improvement > Components: log-aggregation, yarn >Affects Versions: 3.2.0 >Reporter: Adam Antal >Assignee: Adam Antal >Priority: Major > Attachments: YARN-9471.001.patch, YARN-9471.002.patch > > > {{TestLogAggregationIndexFileController}} class can be cleaned up a bit: > - bad javadoc link > - should be renamed to TestLogAggregationIndex *ed* FileController > - some private class members can be removed > - static fields from Assert can be imported > - {{StringBuilder}} can be removed from {{logMessage}} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-9615) Add dispatcher metrics to RM
Jonathan Hung created YARN-9615: --- Summary: Add dispatcher metrics to RM Key: YARN-9615 URL: https://issues.apache.org/jira/browse/YARN-9615 Project: Hadoop YARN Issue Type: Task Reporter: Jonathan Hung Assignee: Jonathan Hung It'd be good to have counts/processing times for each event type in RM async dispatcher and scheduler async dispatcher. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9593) Updating scheduler conf with comma in config value fails
[ https://issues.apache.org/jira/browse/YARN-9593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16860209#comment-16860209 ] Jonathan Hung commented on YARN-9593: - Yeah [~erwaman], that seems reasonable. Are you interested in taking this up? > Updating scheduler conf with comma in config value fails > > > Key: YARN-9593 > URL: https://issues.apache.org/jira/browse/YARN-9593 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.9.0, 3.0.0, 3.2.0, 3.1.2 >Reporter: Anthony Hsu >Priority: Major > > For example: > {code:java} > $ yarn schedulerconf -update "root.gridops:acl_administer_queue=user1,user2 > group1,group2" > Specify configuration key value as confKey=confVal.{code} > This fails because there is a comma in the config value and the SchedConfCLI > splits on comma first, expecting each split to a k=v pair. > {noformat} > void globalUpdates(String args, SchedConfUpdateInfo updateInfo) { > if (args == null) { > return; > } > HashMap globalUpdates = new HashMap<>(); > for (String globalUpdate : args.split(",")) { > putKeyValuePair(globalUpdates, globalUpdate); > } > updateInfo.setGlobalParams(globalUpdates); > }{noformat} > Cc: [~jhung] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-9614) Support configurable container hostname formats for YARN services
Billie Rinaldi created YARN-9614: Summary: Support configurable container hostname formats for YARN services Key: YARN-9614 URL: https://issues.apache.org/jira/browse/YARN-9614 Project: Hadoop YARN Issue Type: Improvement Reporter: Billie Rinaldi The hostname format used by YARN services is currently instance.service.user.domain. We could allow this hostname format to be configurable (with some restrictions). -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-9613) Avoid remote lookups for RegistryDNS domain
Billie Rinaldi created YARN-9613: Summary: Avoid remote lookups for RegistryDNS domain Key: YARN-9613 URL: https://issues.apache.org/jira/browse/YARN-9613 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.1.2 Reporter: Billie Rinaldi A typical setup for RegistryDNS is for an upstream DNS server to forward DNS queries matching the hadoop.registry.dns.domain-name to RegistryDNS. If the RegistryDNS lookup gets a non-zero DNS RCODE, RegistryDNS performs a remote lookup in upstream DNS servers. For bad queries, this can result in a loop when the upstream DNS server forwards the query back to RegistryDNS. To solve this problem, we should avoid performing remote lookups for queries within hadoop.registry.dns.domain-name, which are expected to be handled by RegistryDNS. We may also want to evaluate whether we should add a configuration property that allows the user to disable remote lookups entirely for RegistryDNS, for installations where RegistryDNS is set up as the last DNS server in a chain of DNS servers. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9598) Make reservation work well when multi-node enabled
[ https://issues.apache.org/jira/browse/YARN-9598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16860157#comment-16860157 ] Weiwei Yang commented on YARN-9598: --- Thanks for bringing this up and for the discussions. It looks like the discussion has diverged somewhat; let's make sure we understand the problem we want to resolve here. If I understand correctly, [~jutia] was observing the issue that re-reservations are made on a single node because the policy always returns the same order. Actually, this is not the only issue; this policy may also create hot-spot nodes when multiple threads place allocations on the same ordered nodes. I think we need to improve the policy; one possible solution, as I previously commented, is to shuffle nodes per score range. BTW, [~jutia], are you already using this policy in your cluster? The issue [~Tao Yang] raised is also valid: re-reservations made by a lot of small asks on lots of nodes (when the cluster is busy) will cause big requests to starve. This issue should be reproducible with SLS. I took a quick look at the patch [~Tao Yang] uploaded, but I also have concerns about disabling re-reservation. How can we make sure a big container request does not get starved in such a case? Maybe a way to improve this is to swap reserved containers on NMs, e.g. if a container is already reserved somewhere else, then we can swap this spot with another, bigger container that has no reservation yet. Just a random thought. > Make reservation work well when multi-node enabled > -- > > Key: YARN-9598 > URL: https://issues.apache.org/jira/browse/YARN-9598 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Major > Attachments: YARN-9598.001.patch, image-2019-06-10-11-37-43-283.png, > image-2019-06-10-11-37-44-975.png > > > This issue is to solve problems about reservation when multi-node enabled: > # As discussed in YARN-9576, re-reservation proposal may always be generated > on the same node and break the scheduling for this app and later apps. I > think re-reservation is unnecessary and we can replace it with > LOCALITY_SKIPPED to let scheduler have a chance to look up following candidates > for this app when multi-node enabled. > # Scheduler iterates all nodes and tries to allocate for reserved container in > LeafQueue#allocateFromReservedContainer. Here there are two problems: > ** The node of reserved container should be taken as candidates instead of > all nodes when calling FiCaSchedulerApp#assignContainers, otherwise later > scheduler may generate a reservation-fulfilled proposal on another node, > which will always be rejected in FiCaScheduler#commonCheckContainerAllocation. > ** Assignment returned by FiCaSchedulerApp#assignContainers could never be > null even if it's just skipped, it will break the normal scheduling process > for this leaf queue because of the if clause in LeafQueue#assignContainers: > "if (null != assignment) \{ return assignment;}" > # Nodes which have been reserved should be skipped when iterating candidates > in RegularContainerAllocator#allocate, otherwise scheduler may generate > allocation or reservation proposal on these nodes which will always be > rejected in FiCaScheduler#commonCheckContainerAllocation. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9605) Add ZkConfiguredFailoverProxyProvider for RM HA
[ https://issues.apache.org/jira/browse/YARN-9605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16859968#comment-16859968 ] Hadoop QA commented on YARN-9605: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 25s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 27s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 59s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 15m 54s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 58s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 36s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 16m 31s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 40s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 58s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 26s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 32s{color} | {color:red} hadoop-yarn-common in the patch failed. {color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red} 4m 37s{color} | {color:red} root in the patch failed. {color} | | {color:red}-1{color} | {color:red} cc {color} | {color:red} 4m 37s{color} | {color:red} root in the patch failed. {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 4m 37s{color} | {color:red} root in the patch failed. {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 1m 58s{color} | {color:orange} root: The patch generated 22 new + 22 unchanged - 0 fixed = 44 total (was 22) {color} | | {color:red}-1{color} | {color:red} mvnsite {color} | {color:red} 0m 35s{color} | {color:red} hadoop-yarn-common in the patch failed. {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 1s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:red}-1{color} | {color:red} shadedclient {color} | {color:red} 2m 18s{color} | {color:red} patch has errors when building and testing our client artifacts. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 24s{color} | {color:red} hadoop-yarn-common in the patch failed. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 34s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 8m 8s{color} | {color:green} hadoop-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 47s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 36s{color} | {color:red} hadoop-yarn-common in the patch failed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 83m 0s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 33s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}179m 6s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:bdbca0e | | JIRA Issue | YARN-9605 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12971310/YARN-9605.001.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs
[jira] [Commented] (YARN-9471) Cleanup in TestLogAggregationIndexFileController
[ https://issues.apache.org/jira/browse/YARN-9471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16859965#comment-16859965 ] Szilard Nemeth commented on YARN-9471: -- Hi [~adam.antal]! Thanks for this patch! I really like these kind of test-cleanup refactors. +1 (non-binding) for the latest patch! > Cleanup in TestLogAggregationIndexFileController > > > Key: YARN-9471 > URL: https://issues.apache.org/jira/browse/YARN-9471 > Project: Hadoop YARN > Issue Type: Improvement > Components: log-aggregation, yarn >Affects Versions: 3.2.0 >Reporter: Adam Antal >Assignee: Adam Antal >Priority: Major > Attachments: YARN-9471.001.patch, YARN-9471.002.patch > > > {{TestLogAggregationIndexFileController}} class can be cleaned up a bit: > - bad javadoc link > - should be renamed to TestLogAggregationIndex *ed* FileController > - some private class members can be removed > - static fields from Assert can be imported > - {{StringBuilder}} can be removed from {{logMessage}} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8499) ATS v2 Generic TimelineStorageMonitor
[ https://issues.apache.org/jira/browse/YARN-8499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16859957#comment-16859957 ] Szilard Nemeth commented on YARN-8499: -- Hi [~Prabhu Joseph]! Checked your changes with 012 patch. +1 (non-binding) for the latest patch! Thanks! > ATS v2 Generic TimelineStorageMonitor > - > > Key: YARN-8499 > URL: https://issues.apache.org/jira/browse/YARN-8499 > Project: Hadoop YARN > Issue Type: Sub-task > Components: ATSv2 >Reporter: Sunil Govindan >Assignee: Prabhu Joseph >Priority: Major > Labels: atsv2 > Attachments: YARN-8499-001.patch, YARN-8499-002.patch, > YARN-8499-003.patch, YARN-8499-004.patch, YARN-8499-005.patch, > YARN-8499-006.patch, YARN-8499-007.patch, YARN-8499-008.patch, > YARN-8499-009.patch, YARN-8499-010.patch, YARN-8499-011.patch, > YARN-8499-012.patch > > > Post YARN-8302, Hbase connection issues are handled in ATSv2. However this > could be made general by introducing an api in storage interface and > implementing in each of the storage as per the store semantics. > > cc [~rohithsharma] [~vinodkv] [~vrushalic] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9537) Add configuration to disable AM preemption
[ https://issues.apache.org/jira/browse/YARN-9537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16859934#comment-16859934 ] Hadoop QA commented on YARN-9537: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 17s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 16s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 43s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 30s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 46s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 21s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 12s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 42s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 42s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 27s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 1s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 18s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 83m 41s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 28s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}133m 53s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerPreemption | | | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFSAppAttempt | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:bdbca0e | | JIRA Issue | YARN-9537 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12971304/YARN-9537.001.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 01e6f9598133 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / fcfe7a3 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_212 | | findbugs | v3.1.0-RC1 | | unit | https://builds.apache.org/job/PreCommit-YARN-Build/24253/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | | Test Results |
[jira] [Updated] (YARN-9611) ApplicationHistoryServer related testcases failing
[ https://issues.apache.org/jira/browse/YARN-9611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated YARN-9611: Component/s: test > ApplicationHistoryServer related testcases failing > -- > > Key: YARN-9611 > URL: https://issues.apache.org/jira/browse/YARN-9611 > Project: Hadoop YARN > Issue Type: Bug > Components: test, timelineserver >Affects Versions: 3.3.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > Attachments: MAPREDUCE-7217-001.patch, YARN-9611-001.patch > > > *TestMRTimelineEventHandling.testMRTimelineEventHandling fails.* > {code:java} > ERROR] > testMRTimelineEventHandling(org.apache.hadoop.mapred.TestMRTimelineEventHandling) > Time elapsed: 46.337 s <<< FAILURE! > org.junit.ComparisonFailure: expected:<[AM_STAR]TED> but was:<[JOB_SUBMIT]TED> > at org.junit.Assert.assertEquals(Assert.java:115) > at org.junit.Assert.assertEquals(Assert.java:144) > at > org.apache.hadoop.mapred.TestMRTimelineEventHandling.testMRTimelineEventHandling(TestMRTimelineEventHandling.java:147) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) > at org.junit.runners.ParentRunner.run(ParentRunner.java:363) > at > org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) > at > org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) > at > org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384) > at > org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345) > at > org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126) > at > org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418) > {code} > *TestJobHistoryEventHandler.testTimelineEventHandling* > {code} > [ERROR] > testTimelineEventHandling(org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler) > Time elapsed: 5.858 s <<< FAILURE! 
> java.lang.AssertionError: expected:<1> but was:<0> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:834) > at org.junit.Assert.assertEquals(Assert.java:645) > at org.junit.Assert.assertEquals(Assert.java:631) > at > org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler.testTimelineEventHandling(TestJobHistoryEventHandler.java:597) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298) > at >
[jira] [Commented] (YARN-9611) ApplicationHistoryServer related testcases failing
[ https://issues.apache.org/jira/browse/YARN-9611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16859929#comment-16859929 ] Prabhu Joseph commented on YARN-9611: - [~eyang] Can you review this Jira when you get time. This fixes the failing testcases related to ApplicationHistoryServer after HADOOP-16314. > ApplicationHistoryServer related testcases failing > -- > > Key: YARN-9611 > URL: https://issues.apache.org/jira/browse/YARN-9611 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineserver >Affects Versions: 3.3.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > Attachments: MAPREDUCE-7217-001.patch, YARN-9611-001.patch > > > *TestMRTimelineEventHandling.testMRTimelineEventHandling fails.* > {code:java} > ERROR] > testMRTimelineEventHandling(org.apache.hadoop.mapred.TestMRTimelineEventHandling) > Time elapsed: 46.337 s <<< FAILURE! > org.junit.ComparisonFailure: expected:<[AM_STAR]TED> but was:<[JOB_SUBMIT]TED> > at org.junit.Assert.assertEquals(Assert.java:115) > at org.junit.Assert.assertEquals(Assert.java:144) > at > org.apache.hadoop.mapred.TestMRTimelineEventHandling.testMRTimelineEventHandling(TestMRTimelineEventHandling.java:147) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) > at org.junit.runners.ParentRunner.run(ParentRunner.java:363) > at > org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) > at > org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) > at > org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384) > at > org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345) > at > org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126) > at > org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418) > {code} > *TestJobHistoryEventHandler.testTimelineEventHandling* > {code} > [ERROR] > testTimelineEventHandling(org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler) > Time elapsed: 5.858 s <<< FAILURE! 
> java.lang.AssertionError: expected:<1> but was:<0> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:834) > at org.junit.Assert.assertEquals(Assert.java:645) > at org.junit.Assert.assertEquals(Assert.java:631) > at > org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler.testTimelineEventHandling(TestJobHistoryEventHandler.java:597) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at >
[jira] [Commented] (YARN-9611) ApplicationHistoryServer related testcases failing
[ https://issues.apache.org/jira/browse/YARN-9611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16859897#comment-16859897 ] Hadoop QA commented on YARN-9611: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 18s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 27s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 23s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 15s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 26s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 50s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 0s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 20s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 18s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 18s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 11s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 20s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 21s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 0s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 13s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 5s{color} | {color:green} hadoop-yarn-server-tests in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 27s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 53m 26s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:bdbca0e | | JIRA Issue | YARN-9611 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12971312/YARN-9611-001.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 835db4efc63b 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / fcfe7a3 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_212 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/24256/testReport/ | | Max. process+thread count | 615 (vs. ulimit of 1) | | modules |
[jira] [Commented] (YARN-9611) ApplicationHistoryServer related testcases failing
[ https://issues.apache.org/jira/browse/YARN-9611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16859890#comment-16859890 ] Hadoop QA commented on YARN-9611: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 21s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 24s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 24s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 16s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 25s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 35s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 0s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 24s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 22s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 23s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 10s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests: The patch generated 1 new + 28 unchanged - 0 fixed = 29 total (was 28) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 20s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 56s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 0s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 11s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 59s{color} | {color:green} hadoop-yarn-server-tests in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 21s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 45m 53s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:bdbca0e | | JIRA Issue | YARN-9611 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12971296/MAPREDUCE-7217-001.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 29b114b1b166 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / fcfe7a3 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_212 | | checkstyle |
[jira] [Commented] (YARN-9612) Support using ip to register NodeID
[ https://issues.apache.org/jira/browse/YARN-9612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16859883#comment-16859883 ] Zhankun Tang commented on YARN-9612: [~cane], thanks for mentioning this. Per my understanding, could the RM pod's service name in k8s be used to register the NM? > Support using ip to register NodeID > --- > > Key: YARN-9612 > URL: https://issues.apache.org/jira/browse/YARN-9612 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: zhoukang >Priority: Major > > In environments like k8s, we should support using the IP when registering the NodeID > with the RM, since the hostname will be the pod name, which cannot be resolved by the > DNS of k8s. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-9612) Support using ip to register NodeID
zhoukang created YARN-9612: -- Summary: Support using ip to register NodeID Key: YARN-9612 URL: https://issues.apache.org/jira/browse/YARN-9612 Project: Hadoop YARN Issue Type: Improvement Reporter: zhoukang In environments like k8s, we should support using the IP when registering the NodeID with the RM, since the hostname will be the pod name, which cannot be resolved by the DNS of k8s. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
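The sketch below is only meant to make the proposal concrete; it is not taken from any YARN-9612 patch or discussion. It assumes a hypothetical NodeManager flag, yarn.nodemanager.register-with-ip (invented here, not an existing YARN property), that would make the NM build its NodeID from the local IP instead of the pod-name hostname:

{code:java}
import java.net.InetAddress;
import java.net.UnknownHostException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.NodeId;

public final class NodeIdResolver {

  // Hypothetical key used only for this sketch; it is not an existing YARN property.
  public static final String REGISTER_WITH_IP = "yarn.nodemanager.register-with-ip";

  private NodeIdResolver() {
  }

  /**
   * Builds the NodeId used when registering with the RM. In a k8s pod the
   * hostname is the pod name, which cluster DNS may not resolve, so when the
   * hypothetical flag is set the pod IP is used instead.
   */
  public static NodeId buildNodeId(Configuration conf, int rpcPort)
      throws UnknownHostException {
    InetAddress local = InetAddress.getLocalHost();
    String host = conf.getBoolean(REGISTER_WITH_IP, false)
        ? local.getHostAddress()   // e.g. 10.244.1.17
        : local.getHostName();     // e.g. the pod name in k8s
    return NodeId.newInstance(host, rpcPort);
  }
}
{code}

Whether to register with the pod IP or, as suggested in the comments, with a stable service name remains a deployment trade-off; the sketch only shows where such a switch could live.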
[jira] [Updated] (YARN-9611) ApplicationHistoryServer related testcases failing
[ https://issues.apache.org/jira/browse/YARN-9611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated YARN-9611: Attachment: YARN-9611-001.patch > ApplicationHistoryServer related testcases failing > -- > > Key: YARN-9611 > URL: https://issues.apache.org/jira/browse/YARN-9611 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineserver >Affects Versions: 3.3.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > Attachments: MAPREDUCE-7217-001.patch, YARN-9611-001.patch > > > *TestMRTimelineEventHandling.testMRTimelineEventHandling fails.* > {code:java} > ERROR] > testMRTimelineEventHandling(org.apache.hadoop.mapred.TestMRTimelineEventHandling) > Time elapsed: 46.337 s <<< FAILURE! > org.junit.ComparisonFailure: expected:<[AM_STAR]TED> but was:<[JOB_SUBMIT]TED> > at org.junit.Assert.assertEquals(Assert.java:115) > at org.junit.Assert.assertEquals(Assert.java:144) > at > org.apache.hadoop.mapred.TestMRTimelineEventHandling.testMRTimelineEventHandling(TestMRTimelineEventHandling.java:147) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) > at org.junit.runners.ParentRunner.run(ParentRunner.java:363) > at > org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) > at > org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) > at > org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384) > at > org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345) > at > org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126) > at > org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418) > {code} > *TestJobHistoryEventHandler.testTimelineEventHandling* > {code} > [ERROR] > testTimelineEventHandling(org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler) > Time elapsed: 5.858 s <<< FAILURE! 
> java.lang.AssertionError: expected:<1> but was:<0> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:834) > at org.junit.Assert.assertEquals(Assert.java:645) > at org.junit.Assert.assertEquals(Assert.java:631) > at > org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler.testTimelineEventHandling(TestJobHistoryEventHandler.java:597) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298) > at >
[jira] [Updated] (YARN-9611) ApplicationHistoryServer related testcases failing
[ https://issues.apache.org/jira/browse/YARN-9611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated YARN-9611: Component/s: timelineserver > ApplicationHistoryServer related testcases failing > -- > > Key: YARN-9611 > URL: https://issues.apache.org/jira/browse/YARN-9611 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineserver >Affects Versions: 3.3.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > Attachments: MAPREDUCE-7217-001.patch > > > *TestMRTimelineEventHandling.testMRTimelineEventHandling fails.* > {code:java} > ERROR] > testMRTimelineEventHandling(org.apache.hadoop.mapred.TestMRTimelineEventHandling) > Time elapsed: 46.337 s <<< FAILURE! > org.junit.ComparisonFailure: expected:<[AM_STAR]TED> but was:<[JOB_SUBMIT]TED> > at org.junit.Assert.assertEquals(Assert.java:115) > at org.junit.Assert.assertEquals(Assert.java:144) > at > org.apache.hadoop.mapred.TestMRTimelineEventHandling.testMRTimelineEventHandling(TestMRTimelineEventHandling.java:147) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) > at org.junit.runners.ParentRunner.run(ParentRunner.java:363) > at > org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) > at > org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) > at > org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384) > at > org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345) > at > org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126) > at > org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418) > {code} > *TestJobHistoryEventHandler.testTimelineEventHandling* > {code} > [ERROR] > testTimelineEventHandling(org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler) > Time elapsed: 5.858 s <<< FAILURE! 
> java.lang.AssertionError: expected:<1> but was:<0> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:834) > at org.junit.Assert.assertEquals(Assert.java:645) > at org.junit.Assert.assertEquals(Assert.java:631) > at > org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler.testTimelineEventHandling(TestJobHistoryEventHandler.java:597) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298) > at >
[jira] [Updated] (YARN-9611) ApplicationHistoryServer related testcases failing
[ https://issues.apache.org/jira/browse/YARN-9611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated YARN-9611: Summary: ApplicationHistoryServer related testcases failing (was: TestMRTimelineEventHandling.testMRTimelineEventHandling fails) > ApplicationHistoryServer related testcases failing > -- > > Key: YARN-9611 > URL: https://issues.apache.org/jira/browse/YARN-9611 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.3.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > Attachments: MAPREDUCE-7217-001.patch > > > *TestMRTimelineEventHandling.testMRTimelineEventHandling fails.* > {code:java} > ERROR] > testMRTimelineEventHandling(org.apache.hadoop.mapred.TestMRTimelineEventHandling) > Time elapsed: 46.337 s <<< FAILURE! > org.junit.ComparisonFailure: expected:<[AM_STAR]TED> but was:<[JOB_SUBMIT]TED> > at org.junit.Assert.assertEquals(Assert.java:115) > at org.junit.Assert.assertEquals(Assert.java:144) > at > org.apache.hadoop.mapred.TestMRTimelineEventHandling.testMRTimelineEventHandling(TestMRTimelineEventHandling.java:147) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) > at org.junit.runners.ParentRunner.run(ParentRunner.java:363) > at > org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) > at > org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) > at > org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384) > at > org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345) > at > org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126) > at > org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418) > {code} > *TestJobHistoryEventHandler.testTimelineEventHandling* > {code} > [ERROR] > testTimelineEventHandling(org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler) > Time elapsed: 5.858 s <<< FAILURE! 
> java.lang.AssertionError: expected:<1> but was:<0> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:834) > at org.junit.Assert.assertEquals(Assert.java:645) > at org.junit.Assert.assertEquals(Assert.java:631) > at > org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler.testTimelineEventHandling(TestJobHistoryEventHandler.java:597) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at >
[jira] [Moved] (YARN-9611) TestMRTimelineEventHandling.testMRTimelineEventHandling fails
[ https://issues.apache.org/jira/browse/YARN-9611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph moved MAPREDUCE-7217 to YARN-9611: Affects Version/s: (was: 3.3.0) 3.3.0 Key: YARN-9611 (was: MAPREDUCE-7217) Project: Hadoop YARN (was: Hadoop Map/Reduce) > TestMRTimelineEventHandling.testMRTimelineEventHandling fails > - > > Key: YARN-9611 > URL: https://issues.apache.org/jira/browse/YARN-9611 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.3.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > Attachments: MAPREDUCE-7217-001.patch > > > *TestMRTimelineEventHandling.testMRTimelineEventHandling fails.* > {code:java} > ERROR] > testMRTimelineEventHandling(org.apache.hadoop.mapred.TestMRTimelineEventHandling) > Time elapsed: 46.337 s <<< FAILURE! > org.junit.ComparisonFailure: expected:<[AM_STAR]TED> but was:<[JOB_SUBMIT]TED> > at org.junit.Assert.assertEquals(Assert.java:115) > at org.junit.Assert.assertEquals(Assert.java:144) > at > org.apache.hadoop.mapred.TestMRTimelineEventHandling.testMRTimelineEventHandling(TestMRTimelineEventHandling.java:147) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) > at org.junit.runners.ParentRunner.run(ParentRunner.java:363) > at > org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) > at > org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) > at > org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384) > at > org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345) > at > org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126) > at > org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418) > {code} > *TestJobHistoryEventHandler.testTimelineEventHandling* > {code} > [ERROR] > testTimelineEventHandling(org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler) > Time elapsed: 5.858 s <<< FAILURE! 
> java.lang.AssertionError: expected:<1> but was:<0> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:834) > at org.junit.Assert.assertEquals(Assert.java:645) > at org.junit.Assert.assertEquals(Assert.java:631) > at > org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler.testTimelineEventHandling(TestJobHistoryEventHandler.java:597) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at >
[jira] [Updated] (YARN-9537) Add configuration to disable AM preemption
[ https://issues.apache.org/jira/browse/YARN-9537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhoukang updated YARN-9537: --- Attachment: (was: YARN-9537.001.patch) > Add configuration to disable AM preemption > -- > > Key: YARN-9537 > URL: https://issues.apache.org/jira/browse/YARN-9537 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Reporter: zhoukang >Priority: Major > > In this issue, I will add a configuration to support disabling AM preemption. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9537) Add configuration to disable AM preemption
[ https://issues.apache.org/jira/browse/YARN-9537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhoukang updated YARN-9537: --- Attachment: YARN-9537.001.patch > Add configuration to disable AM preemption > -- > > Key: YARN-9537 > URL: https://issues.apache.org/jira/browse/YARN-9537 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Reporter: zhoukang >Priority: Major > Attachments: YARN-9537.001.patch > > > In this issue, I will add a configuration to support disabling AM preemption. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
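YARN-9537 only states the intent; the actual property name and wiring live in the attached patch, which is not reproduced here. As a rough sketch of what a FairScheduler-side guard could look like (the key yarn.scheduler.fair.am-preemption.enabled is invented for illustration and may differ from the patch):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainer;

/** Sketch of a guard the FairScheduler preemption path could consult. */
public class AmPreemptionGuard {

  // Hypothetical key, invented for this sketch; the real name is defined by the patch.
  public static final String AM_PREEMPTION_ENABLED =
      "yarn.scheduler.fair.am-preemption.enabled";

  private final boolean amPreemptionEnabled;

  public AmPreemptionGuard(Configuration conf) {
    // Default true keeps today's behavior: AM containers remain preemptable.
    this.amPreemptionEnabled = conf.getBoolean(AM_PREEMPTION_ENABLED, true);
  }

  /** Returns false when the container runs an AM and AM preemption is disabled. */
  public boolean isPreemptable(RMContainer container) {
    return amPreemptionEnabled || !container.isAMContainer();
  }
}
{code}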
[jira] [Comment Edited] (YARN-9598) Make reservation work well when multi-node enabled
[ https://issues.apache.org/jira/browse/YARN-9598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16859810#comment-16859810 ] Tao Yang edited comment on YARN-9598 at 6/10/19 8:14 AM: - As I commented [above|https://issues.apache.org/jira/browse/YARN-9598?focusedCommentId=16859709=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16859709], re-reservation is harmful in multi-nodes scenarios, it can make a low-priority app get much more resources than needs which won't be released util all the needs satisfied, it's inefficient for the cluster utilization and can block requirements from high-priority apps. I think we should have a further discuss about this, a simple way is to add a configuration to control enable/disable which can be decided by users themselves, and a node-sorting policy which can put nodes with reserved containers in the back of sorting nodes is needed if re-reservation enabled. Thoughts? cc: [~cheersyang] was (Author: tao yang): As I commented [above|https://issues.apache.org/jira/browse/YARN-9598?focusedCommentId=16859709=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16859709], re-reservation is harmful in multi-nodes scenarios, it can make a low-priority app get much more resources than needs which won't be released util all the needs satisfied, it's inefficient for the cluster utilization and can block requirements from high-priority apps. I think we should have a further discuss about this, a simple way is to add a configuration to control enable/disable which can be decided by users themselves, and a node-sorting policy which can put nodes with reserved containers in the back of sorting nodes if re-reservation enabled. Thoughts? cc: [~cheersyang] > Make reservation work well when multi-node enabled > -- > > Key: YARN-9598 > URL: https://issues.apache.org/jira/browse/YARN-9598 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Major > Attachments: YARN-9598.001.patch, image-2019-06-10-11-37-43-283.png, > image-2019-06-10-11-37-44-975.png > > > This issue is to solve problems about reservation when multi-node enabled: > # As discussed in YARN-9576, re-reservation proposal may be always generated > on the same node and break the scheduling for this app and later apps. I > think re-reservation in unnecessary and we can replace it with > LOCALITY_SKIPPED to let scheduler have a chance to look up follow candidates > for this app when multi-node enabled. > # Scheduler iterates all nodes and try to allocate for reserved container in > LeafQueue#allocateFromReservedContainer. Here there are two problems: > ** The node of reserved container should be taken as candidates instead of > all nodes when calling FiCaSchedulerApp#assignContainers, otherwise later > scheduler may generate a reservation-fulfilled proposal on another node, > which will always be rejected in FiCaScheduler#commonCheckContainerAllocation. 
> ** Assignment returned by FiCaSchedulerApp#assignContainers could never be > null even if it's just skipped, it will break the normal scheduling process > for this leaf queue because of the if clause in LeafQueue#assignContainers: > "if (null != assignment) \{ return assignment;}" > # Nodes which have been reserved should be skipped when iterating candidates > in RegularContainerAllocator#allocate, otherwise scheduler may generate > allocation or reservation proposal on these node which will always be > rejected in FiCaScheduler#commonCheckContainerAllocation. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9598) Make reservation work well when multi-node enabled
[ https://issues.apache.org/jira/browse/YARN-9598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16859810#comment-16859810 ] Tao Yang commented on YARN-9598: As I commented [above|https://issues.apache.org/jira/browse/YARN-9598?focusedCommentId=16859709=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16859709], re-reservation is harmful in multi-node scenarios: it can make a low-priority app get far more resources than it needs, which won't be released until all of its needs are satisfied; that is inefficient for cluster utilization and can block requirements from high-priority apps. I think we should have a further discussion about this. A simple way is to add a configuration so that users can decide whether to enable or disable it themselves, together with a node-sorting policy that puts nodes with reserved containers at the back of the sorted node list when re-reservation is enabled. Thoughts? cc: [~cheersyang] > Make reservation work well when multi-node enabled > -- > > Key: YARN-9598 > URL: https://issues.apache.org/jira/browse/YARN-9598 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Major > Attachments: YARN-9598.001.patch, image-2019-06-10-11-37-43-283.png, > image-2019-06-10-11-37-44-975.png > > > This issue is to solve problems about reservation when multi-node enabled: > # As discussed in YARN-9576, re-reservation proposal may be always generated > on the same node and break the scheduling for this app and later apps. I > think re-reservation in unnecessary and we can replace it with > LOCALITY_SKIPPED to let scheduler have a chance to look up follow candidates > for this app when multi-node enabled. > # Scheduler iterates all nodes and try to allocate for reserved container in > LeafQueue#allocateFromReservedContainer. Here there are two problems: > ** The node of reserved container should be taken as candidates instead of > all nodes when calling FiCaSchedulerApp#assignContainers, otherwise later > scheduler may generate a reservation-fulfilled proposal on another node, > which will always be rejected in FiCaScheduler#commonCheckContainerAllocation. > ** Assignment returned by FiCaSchedulerApp#assignContainers could never be > null even if it's just skipped, it will break the normal scheduling process > for this leaf queue because of the if clause in LeafQueue#assignContainers: > "if (null != assignment) \{ return assignment;}" > # Nodes which have been reserved should be skipped when iterating candidates > in RegularContainerAllocator#allocate, otherwise scheduler may generate > allocation or reservation proposal on these node which will always be > rejected in FiCaScheduler#commonCheckContainerAllocation. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
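To make the node-sorting idea in the comment above concrete, here is a rough, hypothetical sketch (not taken from any attached patch) of a comparator that orders nodes holding a reserved container after nodes that do not, so that new proposals are tried on unreserved nodes first:

{code:java}
import java.util.Comparator;

import org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode;

/** Hypothetical sketch: sort nodes with reserved containers to the back. */
public class ReservedLastNodeComparator implements Comparator<SchedulerNode> {

  @Override
  public int compare(SchedulerNode a, SchedulerNode b) {
    boolean aReserved = a.getReservedContainer() != null;
    boolean bReserved = b.getReservedContainer() != null;
    if (aReserved != bReserved) {
      // Nodes without a reservation come first.
      return aReserved ? 1 : -1;
    }
    // Otherwise prefer the node with more unallocated memory.
    return Long.compare(
        b.getUnallocatedResource().getMemorySize(),
        a.getUnallocatedResource().getMemorySize());
  }
}
{code}

An actual multi-node policy would plug such an ordering into the configured candidates selector rather than a bare Comparator, but the ordering rule itself is the point under discussion.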
[jira] [Commented] (YARN-9598) Make reservation work well when multi-node enabled
[ https://issues.apache.org/jira/browse/YARN-9598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16859799#comment-16859799 ] Juanjuan Tian commented on YARN-9598: -- "inter-queue preemption can't happen because of resource fragmentation while the cluster still has 20GB of available memory, right?" I think the answer is yes. I agree that "it's not re-reservation's business but can be worked around by it". Re-reservation can result in many reservations on many nodes and then finally trigger preemption; it's a workaround for preemption not being smart enough. So I think we should reconsider the re-reservation logic in this patch. > Make reservation work well when multi-node enabled > -- > > Key: YARN-9598 > URL: https://issues.apache.org/jira/browse/YARN-9598 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Major > Attachments: YARN-9598.001.patch, image-2019-06-10-11-37-43-283.png, > image-2019-06-10-11-37-44-975.png > > > This issue is to solve problems about reservation when multi-node enabled: > # As discussed in YARN-9576, re-reservation proposal may be always generated > on the same node and break the scheduling for this app and later apps. I > think re-reservation in unnecessary and we can replace it with > LOCALITY_SKIPPED to let scheduler have a chance to look up follow candidates > for this app when multi-node enabled. > # Scheduler iterates all nodes and try to allocate for reserved container in > LeafQueue#allocateFromReservedContainer. Here there are two problems: > ** The node of reserved container should be taken as candidates instead of > all nodes when calling FiCaSchedulerApp#assignContainers, otherwise later > scheduler may generate a reservation-fulfilled proposal on another node, > which will always be rejected in FiCaScheduler#commonCheckContainerAllocation. > ** Assignment returned by FiCaSchedulerApp#assignContainers could never be > null even if it's just skipped, it will break the normal scheduling process > for this leaf queue because of the if clause in LeafQueue#assignContainers: > "if (null != assignment) \{ return assignment;}" > # Nodes which have been reserved should be skipped when iterating candidates > in RegularContainerAllocator#allocate, otherwise scheduler may generate > allocation or reservation proposal on these node which will always be > rejected in FiCaScheduler#commonCheckContainerAllocation. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-9598) Make reservation work well when multi-node enabled
[ https://issues.apache.org/jira/browse/YARN-9598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16859790#comment-16859790 ] Tao Yang edited comment on YARN-9598 at 6/10/19 7:38 AM: - It's weird to hear that preemption should depends on excess reservations. I think inter-queue preemption can't happened because of resource fragmentation while cluster resource still have 20GB available memory, right? That's indeed a problem in current preemption logic of community. If it is, I think it's not re-reservation's business but can be worked around by it, and re-reservation may hardly help for this in a large cluster. was (Author: tao yang): It's weird to hear that preemption should depends on excess reservations. I think inter-queue preemption can't happened because of resource fragmentation while cluster resource still have 20GB available memory, right? That's indeed a problem in current preemption logic of community. If it is, I think it's no re-reservation's business but can be worked around by it, and re-reservation may hardly help for this in a large cluster. > Make reservation work well when multi-node enabled > -- > > Key: YARN-9598 > URL: https://issues.apache.org/jira/browse/YARN-9598 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Major > Attachments: YARN-9598.001.patch, image-2019-06-10-11-37-43-283.png, > image-2019-06-10-11-37-44-975.png > > > This issue is to solve problems about reservation when multi-node enabled: > # As discussed in YARN-9576, re-reservation proposal may be always generated > on the same node and break the scheduling for this app and later apps. I > think re-reservation in unnecessary and we can replace it with > LOCALITY_SKIPPED to let scheduler have a chance to look up follow candidates > for this app when multi-node enabled. > # Scheduler iterates all nodes and try to allocate for reserved container in > LeafQueue#allocateFromReservedContainer. Here there are two problems: > ** The node of reserved container should be taken as candidates instead of > all nodes when calling FiCaSchedulerApp#assignContainers, otherwise later > scheduler may generate a reservation-fulfilled proposal on another node, > which will always be rejected in FiCaScheduler#commonCheckContainerAllocation. > ** Assignment returned by FiCaSchedulerApp#assignContainers could never be > null even if it's just skipped, it will break the normal scheduling process > for this leaf queue because of the if clause in LeafQueue#assignContainers: > "if (null != assignment) \{ return assignment;}" > # Nodes which have been reserved should be skipped when iterating candidates > in RegularContainerAllocator#allocate, otherwise scheduler may generate > allocation or reservation proposal on these node which will always be > rejected in FiCaScheduler#commonCheckContainerAllocation. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9598) Make reservation work well when multi-node enabled
[ https://issues.apache.org/jira/browse/YARN-9598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16859790#comment-16859790 ] Tao Yang commented on YARN-9598: It's weird to hear that preemption should depend on excess reservations. I think inter-queue preemption can't happen because of resource fragmentation while the cluster still has 20GB of available memory, right? That's indeed a problem in the community's current preemption logic. If so, I think it's not re-reservation's business, though it can be worked around by it, and re-reservation may hardly help with this in a large cluster. > Make reservation work well when multi-node enabled > -- > > Key: YARN-9598 > URL: https://issues.apache.org/jira/browse/YARN-9598 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Major > Attachments: YARN-9598.001.patch, image-2019-06-10-11-37-43-283.png, > image-2019-06-10-11-37-44-975.png > > > This issue is to solve problems about reservation when multi-node enabled: > # As discussed in YARN-9576, re-reservation proposal may be always generated > on the same node and break the scheduling for this app and later apps. I > think re-reservation in unnecessary and we can replace it with > LOCALITY_SKIPPED to let scheduler have a chance to look up follow candidates > for this app when multi-node enabled. > # Scheduler iterates all nodes and try to allocate for reserved container in > LeafQueue#allocateFromReservedContainer. Here there are two problems: > ** The node of reserved container should be taken as candidates instead of > all nodes when calling FiCaSchedulerApp#assignContainers, otherwise later > scheduler may generate a reservation-fulfilled proposal on another node, > which will always be rejected in FiCaScheduler#commonCheckContainerAllocation. > ** Assignment returned by FiCaSchedulerApp#assignContainers could never be > null even if it's just skipped, it will break the normal scheduling process > for this leaf queue because of the if clause in LeafQueue#assignContainers: > "if (null != assignment) \{ return assignment;}" > # Nodes which have been reserved should be skipped when iterating candidates > in RegularContainerAllocator#allocate, otherwise scheduler may generate > allocation or reservation proposal on these node which will always be > rejected in FiCaScheduler#commonCheckContainerAllocation. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9598) Make reservation work well when multi-node enabled
[ https://issues.apache.org/jira/browse/YARN-9598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16859776#comment-16859776 ] Juanjuan Tian commented on YARN-9598: -- Hi [~Tao Yang], just like you said, there will always be only one reserved container when re-reservation is disabled, and thus even when inter-queue preemption is enabled in the cluster, preemption will not happen. But if we can reserve several containers, preemption can be triggered (when yarn.resourcemanager.monitor.capacity.preemption.additional_res_balance_based_on_reserved_containers is set to true). > Make reservation work well when multi-node enabled > -- > > Key: YARN-9598 > URL: https://issues.apache.org/jira/browse/YARN-9598 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Major > Attachments: YARN-9598.001.patch, image-2019-06-10-11-37-43-283.png, > image-2019-06-10-11-37-44-975.png > > > This issue is to solve problems about reservation when multi-node enabled: > # As discussed in YARN-9576, re-reservation proposal may be always generated > on the same node and break the scheduling for this app and later apps. I > think re-reservation in unnecessary and we can replace it with > LOCALITY_SKIPPED to let scheduler have a chance to look up follow candidates > for this app when multi-node enabled. > # Scheduler iterates all nodes and try to allocate for reserved container in > LeafQueue#allocateFromReservedContainer. Here there are two problems: > ** The node of reserved container should be taken as candidates instead of > all nodes when calling FiCaSchedulerApp#assignContainers, otherwise later > scheduler may generate a reservation-fulfilled proposal on another node, > which will always be rejected in FiCaScheduler#commonCheckContainerAllocation. > ** Assignment returned by FiCaSchedulerApp#assignContainers could never be > null even if it's just skipped, it will break the normal scheduling process > for this leaf queue because of the if clause in LeafQueue#assignContainers: > "if (null != assignment) \{ return assignment;}" > # Nodes which have been reserved should be skipped when iterating candidates > in RegularContainerAllocator#allocate, otherwise scheduler may generate > allocation or reservation proposal on these node which will always be > rejected in FiCaScheduler#commonCheckContainerAllocation. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
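For readers unfamiliar with the property referenced above, a minimal sketch of how it would typically be enabled together with the scheduling monitor follows; the key string is taken from the comment above and should be verified against your Hadoop release:

{code:java}
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class EnableReservationBasedPreemption {

  public static void main(String[] args) {
    YarnConfiguration conf = new YarnConfiguration();
    // Run the scheduling monitor that drives the preemption policy.
    conf.setBoolean(YarnConfiguration.RM_SCHEDULER_ENABLE_MONITORS, true);
    // Let the preemption policy also balance resources based on reserved containers.
    String key = "yarn.resourcemanager.monitor.capacity.preemption."
        + "additional_res_balance_based_on_reserved_containers";
    conf.setBoolean(key, true);
    System.out.println(key + " = " + conf.getBoolean(key, false));
  }
}
{code}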
[jira] [Commented] (YARN-9598) Make reservation work well when multi-node enabled
[ https://issues.apache.org/jira/browse/YARN-9598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16859760#comment-16859760 ] Tao Yang commented on YARN-9598: Hi, [~jutia]. In your example, queue A has been allocated 60GB, leaving only 2GB on every node, so when queue B needs a 3GB container the scheduler may reserve one container on one node. That sounds unrelated to whether re-reservation is enabled; I think it's about resource fragmentation, and a simple way to solve the problem is inter-queue preemption. If inter-queue preemption is disabled in your cluster, there may be several reserved containers after many rounds of the scheduling process when re-reservation is enabled, while there will always be only one reserved container when re-reservation is disabled; that's the main difference between them. In either case there will be an allocation, and the reserved container will be unreserved or fulfilled, once some node has enough resource (for example, when a container allocated on it has just finished). Please correct me if I was wrong. > Make reservation work well when multi-node enabled > -- > > Key: YARN-9598 > URL: https://issues.apache.org/jira/browse/YARN-9598 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Major > Attachments: YARN-9598.001.patch, image-2019-06-10-11-37-43-283.png, > image-2019-06-10-11-37-44-975.png > > > This issue is to solve problems about reservation when multi-node enabled: > # As discussed in YARN-9576, re-reservation proposal may be always generated > on the same node and break the scheduling for this app and later apps. I > think re-reservation in unnecessary and we can replace it with > LOCALITY_SKIPPED to let scheduler have a chance to look up follow candidates > for this app when multi-node enabled. > # Scheduler iterates all nodes and try to allocate for reserved container in > LeafQueue#allocateFromReservedContainer. Here there are two problems: > ** The node of reserved container should be taken as candidates instead of > all nodes when calling FiCaSchedulerApp#assignContainers, otherwise later > scheduler may generate a reservation-fulfilled proposal on another node, > which will always be rejected in FiCaScheduler#commonCheckContainerAllocation. > ** Assignment returned by FiCaSchedulerApp#assignContainers could never be > null even if it's just skipped, it will break the normal scheduling process > for this leaf queue because of the if clause in LeafQueue#assignContainers: > "if (null != assignment) \{ return assignment;}" > # Nodes which have been reserved should be skipped when iterating candidates > in RegularContainerAllocator#allocate, otherwise scheduler may generate > allocation or reservation proposal on these node which will always be > rejected in FiCaScheduler#commonCheckContainerAllocation. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
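Editorial note: a small sketch (not part of YARN-9598.001.patch) that spells out the fragmentation arithmetic being debated in the comments above, using the node counts and container sizes from the 10-node example: every node has 2GB free, 20GB is free cluster-wide, yet no single node can host queue B's 3GB request.
{code:java}
// Sketch of the resource-fragmentation scenario from the discussion:
// 10 nodes x 8GB, queue A holds a 6GB container on each node, queue B
// asks for one 3GB container.
public class FragmentationSketch {
  public static void main(String[] args) {
    final int nodes = 10;
    final int nodeCapacityGB = 8;
    final int queueAContainerGB = 6;
    final int queueBRequestGB = 3;

    int freePerNodeGB = nodeCapacityGB - queueAContainerGB; // 2GB per node
    int freeClusterGB = freePerNodeGB * nodes;              // 20GB in total
    boolean anySingleNodeFits = freePerNodeGB >= queueBRequestGB; // false

    System.out.println("Free per node: " + freePerNodeGB + "GB");
    System.out.println("Free cluster-wide: " + freeClusterGB + "GB");
    System.out.println("Any single node fits the 3GB request? " + anySingleNodeFits);
    // The request can only proceed via a reservation that is later fulfilled
    // when a node frees resources, or via (inter-queue) preemption, which is
    // exactly the trade-off the comments above are discussing.
  }
}
{code}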
[jira] [Comment Edited] (YARN-9598) Make reservation work well when multi-node enabled
[ https://issues.apache.org/jira/browse/YARN-9598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16859689#comment-16859689 ] Juanjuan Tian edited comment on YARN-9598 at 6/10/19 6:45 AM: --- Hi Tao, # As discussed in YARN-9576, re-reservation proposal may be always generated on the same node and break the scheduling for this app and later apps. I think re-reservation is unnecessary and we can replace it with LOCALITY_SKIPPED to let scheduler have a chance to look up follow candidates for this app when multi-node enabled. For this, if re-reservation is disabled, shouldAllocOrReserveNewContainer may return false in most cases, and thus even though the scheduler has a chance to look up other candidates, it may not assign containers. 2. After this patch, since the Assignment returned by FiCaSchedulerApp#assignContainers can never be null even if it's just skipped, then even if only one of the candidates has a container reserved on it, allocateFromReservedContainer will still never return null, and it still breaks the normal scheduling process. So I'm wondering if we can just handle this case like single-node, and change the logic in CapacityScheduler#allocateContainersOnMultiNodes as below (the lines marked in red in the original comment are the proposed changes): !image-2019-06-10-11-37-44-975.png!
{code:java}
/*
 * New behavior, allocate containers considering multiple nodes
 */
private CSAssignment allocateContainersOnMultiNodes(
    FiCaSchedulerNode schedulerNode) {
  // Backward compatible way to make sure previous behavior which allocation
  // driven by node heartbeat works.
  if (getNode(schedulerNode.getNodeID()) != schedulerNode) {
    LOG.error("Trying to schedule on a removed node, please double check.");
    return null;
  }

  // Assign new containers...
  // 1. Check for reserved applications
  // 2. Schedule if there are no reservations
  RMContainer reservedRMContainer = schedulerNode.getReservedContainer();
  if (reservedRMContainer != null) {
    allocateFromReservedContainer(schedulerNode, false, reservedRMContainer);
  }

  // Do not schedule if there are any reservations to fulfill on the node
  if (schedulerNode.getReservedContainer() != null) {
    if (LOG.isDebugEnabled()) {
      LOG.debug("Skipping scheduling since node " + schedulerNode.getNodeID()
          + " is reserved by application " + schedulerNode
          .getReservedContainer().getContainerId().getApplicationAttemptId());
    }
    return null;
  }

  PlacementSet ps = getCandidateNodeSet(schedulerNode);
  // When this time look at multiple nodes, try schedule if the
  // partition has any available resource or killable resource
  if (getRootQueue().getQueueCapacities().getUsedCapacity(
      ps.getPartition()) >= 1.0f
      && preemptionManager.getKillableResource(
          CapacitySchedulerConfiguration.ROOT, ps.getPartition())
          == Resources.none()) {
{code}
[jira] [Comment Edited] (YARN-9598) Make reservation work well when multi-node enabled
[ https://issues.apache.org/jira/browse/YARN-9598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16859740#comment-16859740 ] Juanjuan Tian edited comment on YARN-9598 at 6/10/19 6:36 AM: --- Hi Tao, {noformat} disable re-reservation can only make the scheduler skip reserving the same container repeatedly and try to allocate on other nodes, it won't affect normal scheduling for this app and later apps. Thoughts?{noformat} For example, there are 10 nodes (h1, h2, ... h9, h10), each with 8GB of memory, and two queues A and B, each configured with 50% capacity. First, 10 jobs (each requesting 6GB of resource) are submitted to queue A, and each of the 10 nodes gets one container allocated. Afterwards, another job JobB which requests 3GB of resource is submitted to queue B, and one container of 3GB size is reserved on node h1. If we disable re-reservation in this case, even though the scheduler can look up other nodes, since shouldAllocOrReserveNewContainer is false there will be no other reservations, and JobB will still get stuck. > Make reservation work well when multi-node enabled > -- > > Key: YARN-9598 > URL: https://issues.apache.org/jira/browse/YARN-9598 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Major > Attachments: YARN-9598.001.patch, image-2019-06-10-11-37-43-283.png, > image-2019-06-10-11-37-44-975.png > > > This issue is to solve problems about reservation when multi-node enabled: > # As discussed in YARN-9576, re-reservation proposal may be always generated > on the same node and break the scheduling for this app and later apps. I > think re-reservation in unnecessary and we can replace it with > LOCALITY_SKIPPED to let scheduler have a chance to look up follow candidates > for this app when multi-node enabled. > # Scheduler iterates all nodes and try to allocate for reserved container in > LeafQueue#allocateFromReservedContainer. Here there are two problems: > ** The node of reserved container should be taken as candidates instead of > all nodes when calling FiCaSchedulerApp#assignContainers, otherwise later > scheduler may generate a reservation-fulfilled proposal on another node, > which will always be rejected in FiCaScheduler#commonCheckContainerAllocation. > ** Assignment returned by FiCaSchedulerApp#assignContainers could never be > null even if it's just skipped, it will break the normal scheduling process > for this leaf queue because of the if clause in LeafQueue#assignContainers: > "if (null != assignment) \{ return assignment;}" > # Nodes which have been reserved should be skipped when iterating candidates > in RegularContainerAllocator#allocate, otherwise scheduler may generate > allocation or reservation proposal on these node which will always be > rejected in FiCaScheduler#commonCheckContainerAllocation. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9598) Make reservation work well when multi-node enabled
[ https://issues.apache.org/jira/browse/YARN-9598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16859740#comment-16859740 ] Juanjuan Tian commented on YARN-9598: -- Hi Tao, {noformat} disable re-reservation can only make the scheduler skip reserving the same container repeatedly and try to allocate on other nodes, it won't affect normal scheduling for this app and later apps. Thoughts? {noformat} For example, there are 10 nodes (h1, h2, ... h9, h10), each with 8GB of memory, and two queues A and B, each configured with 50% capacity. First, 10 jobs (each requesting 6GB of resource) are submitted to queue A, and each of the 10 nodes gets one container allocated. Afterwards, another job JobB which requests 3GB of resource is submitted to queue B, and one container of 3GB size is reserved on node h1. > Make reservation work well when multi-node enabled > -- > > Key: YARN-9598 > URL: https://issues.apache.org/jira/browse/YARN-9598 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Major > Attachments: YARN-9598.001.patch, image-2019-06-10-11-37-43-283.png, > image-2019-06-10-11-37-44-975.png > > > This issue is to solve problems about reservation when multi-node enabled: > # As discussed in YARN-9576, re-reservation proposal may be always generated > on the same node and break the scheduling for this app and later apps. I > think re-reservation in unnecessary and we can replace it with > LOCALITY_SKIPPED to let scheduler have a chance to look up follow candidates > for this app when multi-node enabled. > # Scheduler iterates all nodes and try to allocate for reserved container in > LeafQueue#allocateFromReservedContainer. Here there are two problems: > ** The node of reserved container should be taken as candidates instead of > all nodes when calling FiCaSchedulerApp#assignContainers, otherwise later > scheduler may generate a reservation-fulfilled proposal on another node, > which will always be rejected in FiCaScheduler#commonCheckContainerAllocation. > ** Assignment returned by FiCaSchedulerApp#assignContainers could never be > null even if it's just skipped, it will break the normal scheduling process > for this leaf queue because of the if clause in LeafQueue#assignContainers: > "if (null != assignment) \{ return assignment;}" > # Nodes which have been reserved should be skipped when iterating candidates > in RegularContainerAllocator#allocate, otherwise scheduler may generate > allocation or reservation proposal on these node which will always be > rejected in FiCaScheduler#commonCheckContainerAllocation. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9608) DecommissioningNodesWatcher should get lists of running applications on node from RMNode.
[ https://issues.apache.org/jira/browse/YARN-9608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16859738#comment-16859738 ] Abhishek Modi commented on YARN-9608: - [~subru] [~elgoiri] [~giovanni.fumarola] could you please review it. Thanks. > DecommissioningNodesWatcher should get lists of running applications on node > from RMNode. > - > > Key: YARN-9608 > URL: https://issues.apache.org/jira/browse/YARN-9608 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Abhishek Modi >Assignee: Abhishek Modi >Priority: Major > Attachments: YARN-9608.001.patch > > > At present, DecommissioningNodesWatcher tracks list of running applications > and triggers decommission of nodes when all the applications that ran on the > node completes. This Jira proposes to solve following problem: > # DecommissioningNodesWatcher skips tracking application containers on a > particular node before the node is in DECOMMISSIONING state. It only tracks > containers once the node is in DECOMMISSIONING state. This can lead to > shuffle data loss of apps whose containers ran on this node before it was > moved to decommissioning state. > # It is keeping track of running apps. We can leverage this directly from > RMNode. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
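Editorial note: a short sketch of the idea in point 2 of the YARN-9608 description, reusing the running-application list that RMNode already tracks instead of keeping a separate one in DecommissioningNodesWatcher. This is not the attached YARN-9608.001.patch, and it assumes RMNode exposes getRunningApps().
{code:java}
import java.util.List;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNode;

// Sketch only: decide decommission readiness from the RMNode's own view of
// running applications rather than from a separately maintained set.
public class DecommissionReadinessSketch {
  /** A node can be decommissioned once no applications are running on it. */
  public static boolean readyToDecommission(RMNode node) {
    List<ApplicationId> runningApps = node.getRunningApps();
    return runningApps == null || runningApps.isEmpty();
  }
}
{code}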
[jira] [Created] (YARN-9610) HeartbeatCallBack int FederationInterceptor clear AMRMToken in response from UAM should before add to aysncResponseSink
Morty Zhong created YARN-9610: - Summary: HeartbeatCallBack int FederationInterceptor clear AMRMToken in response from UAM should before add to aysncResponseSink Key: YARN-9610 URL: https://issues.apache.org/jira/browse/YARN-9610 Project: Hadoop YARN Issue Type: Bug Components: amrmproxy, federation Affects Versions: 3.1.2 Environment: In federation, `allocate` is async. The response from each RM is cached in `asyncResponseSink`, and the final allocate response is merged from the allocate responses of all RMs. The merge will throw an exception when the AMRMToken in a UAM response is not null. But setting the AMRMToken of the UAM response to null is not done within the scope of the lock, so there is a chance that the merge sees a UAM response whose AMRMToken is not yet null. So we should clear the token before adding the response to asyncResponseSink:
{code:java}
synchronized (asyncResponseSink) {
  List responses = null;
  if (asyncResponseSink.containsKey(subClusterId)) {
    responses = asyncResponseSink.get(subClusterId);
  } else {
    responses = new ArrayList<>();
    asyncResponseSink.put(subClusterId, responses);
  }
  responses.add(response);
  // Notify main thread about the response arrival
  asyncResponseSink.notifyAll();
}
...
if (this.isUAM && response.getAMRMToken() != null) {
  Token newToken = ConverterUtils
      .convertFromYarn(response.getAMRMToken(), (Text) null);
  // Do not further propagate the new amrmToken for UAM
  response.setAMRMToken(null);
  ...
{code}
Reporter: Morty Zhong -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
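Editorial note: a minimal sketch of the ordering the reporter suggests, stripping the AMRMToken from a UAM response before it is published to the shared asyncResponseSink so the merging thread can never observe a non-null token. Class, method, and field names here are simplified stand-ins, not the real FederationInterceptor code, and the bookkeeping that converts and stores the new token for the UAM is omitted.
{code:java}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;

// Sketch only: clear the UAM token first, then publish the response.
public class UamTokenClearingSketch {
  private final Map<String, List<AllocateResponse>> asyncResponseSink =
      new HashMap<>();
  private final boolean isUAM = true;

  public void onHeartbeatResponse(String subClusterId, AllocateResponse response) {
    // 1. Clear the token before the response becomes visible to the merge
    //    thread (saving/propagating the new token to the UAM is omitted here).
    if (isUAM && response.getAMRMToken() != null) {
      // Do not further propagate the new amrmToken for UAM.
      response.setAMRMToken(null);
    }
    // 2. Only then add the response to the shared sink and wake up the waiter.
    synchronized (asyncResponseSink) {
      asyncResponseSink
          .computeIfAbsent(subClusterId, k -> new ArrayList<>())
          .add(response);
      asyncResponseSink.notifyAll();
    }
  }
}
{code}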