[jira] [Commented] (YARN-4597) Add SCHEDULE to NM container lifecycle
[ https://issues.apache.org/jira/browse/YARN-4597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15656417#comment-15656417 ] Arun Suresh commented on YARN-4597: ---
Appreciate the review, [~kkaranasos].
1. bq. The Container has two new methods (sendLaunchEvent and sendKillEvent), which are public and are not following..
sendKillEvent is used by the Scheduler (which is in another package) to kill a container. Since this patch introduces an external entity that launches and kills a container, viz. the Scheduler, I feel it is apt to keep both as public methods. I prefer it to 'dispatcher.getEventHandler().handle..' (a sketch contrasting the two call-site styles follows this message).
2. The Container needs to be added to the {{nodeUpdateQueue}} if the container is to be moved from the ACQUIRED to the RUNNING state (this is a state transition all containers should go through). Regarding the {{launchedContainers}}, let's have both Opportunistic and Guaranteed containers flow through a common code path, and introduce specific behaviors in subsequent patches as and when required.
3. bq. In the OpportunisticContainerAllocatorAMService we are now calling the SchedulerNode::allocate, and then we do not update the used resources but we do update some other counters, which leads to inconsistencies.
Hmmm... I do see that numContainers is not decremented correctly on release. Thanks... but it looks like it would more likely just impact reporting / UI, nothing functional (will update the patch). Can you point out which other counters, specifically? Like I mentioned on the previous patch, let's run all containers through as much of the common code path as possible before we add new counters, etc.
4. bq. Maybe as part of a different JIRA, we should at some point extend the container.metrics in the ContainerImpl to keep track of the scheduled/queued containers.
Yup.. +1 to that.
The rest of your comments make sense... will update the patch.
bq. let's stress-test the code in a cluster before committing to make sure everything is good
It has been tested on a 3-node cluster with MR Pi jobs (using opportunistic containers), and I didn't hit any major issues. We can always open follow-up JIRAs for specific performance-related issues as and when we find them. Besides, stress-testing is not really a precondition to committing a patch.
> Add SCHEDULE to NM container lifecycle > -- > > Key: YARN-4597 > URL: https://issues.apache.org/jira/browse/YARN-4597 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager >Reporter: Chris Douglas >Assignee: Arun Suresh > Labels: oct16-hard > Attachments: YARN-4597.001.patch, YARN-4597.002.patch, > YARN-4597.003.patch, YARN-4597.004.patch, YARN-4597.005.patch, > YARN-4597.006.patch, YARN-4597.007.patch, YARN-4597.008.patch, > YARN-4597.009.patch > > > Currently, the NM immediately launches containers after resource > localization. Several features could be more cleanly implemented if the NM > included a separate stage for reserving resources. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
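To make point 1 above concrete, here is roughly what the two call-site styles being compared look like. This is a sketch for illustration only: the exit-status constant and the diagnostic strings are assumptions, not the patch's actual code.
{code}
// Style A: the public helper the patch adds on Container. The intent of the
// kill is visible at the call site.
container.sendKillEvent(ContainerExitStatus.PREEMPTED,
    "Opportunistic container preempted by the ContainerScheduler");

// Style B: raw event dispatch through the NM dispatcher, which works but
// buries the intent in event-construction boilerplate.
dispatcher.getEventHandler().handle(
    new ContainerKillEvent(container.getContainerId(),
        ContainerExitStatus.PREEMPTED,
        "Opportunistic container preempted by the ContainerScheduler"));
{code}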
[jira] [Updated] (YARN-5792) adopt the id prefix for YARN, MR, and DS entities
[ https://issues.apache.org/jira/browse/YARN-5792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-5792: --- Attachment: YARN-5792-YARN-5355.03.patch > adopt the id prefix for YARN, MR, and DS entities > - > > Key: YARN-5792 > URL: https://issues.apache.org/jira/browse/YARN-5792 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-5355 >Reporter: Sangjin Lee >Assignee: Varun Saxena > Attachments: YARN-5792-YARN-5355.01.patch, > YARN-5792-YARN-5355.02.patch, YARN-5792-YARN-5355.03.patch > > > We introduced the entity id prefix to support flexible entity sorting > (YARN-5715). We should adopt the id prefix for YARN entities, MR entities, > and DS entities to take advantage of the id prefix. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5765) LinuxContainerExecutor creates appcache and its subdirectories with wrong group owner.
[ https://issues.apache.org/jira/browse/YARN-5765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15656333#comment-15656333 ] Naganarasimha G R commented on YARN-5765: -
Thanks [~haibochen] & [~miklos.szeg...@cloudera.com] for the insightful comments.
There are two other places, apart from launch_container_as_user, where mkdirs is used:
{code}
main -> RUN_AS_USER_INITIALIZE_CONTAINER -> initialize_app -> mkdirs -> create_validate_dir
main -> MOUNT_CGROUPS               -> mount_cgroup   -> mkdirs -> create_validate_dir
{code}
IIUC, only setting the umask before change_effective_user would not be ideal, as it would be required in the other places too. What I want to understand is: what impact would it have if we always do it? Given that we never run the container-executor.c binary as the root user (refer to set_user -> check_user), would it be sufficient to reset the umask after mkdir?
bq. This means that by removing chmod this change does not apply to cases anymore, when the default ACL is too restrictive. Could this be an issue, or do we rely on the admin to set the default ACL correctly?
Good query... something to be thought about! I am not sure we will be able to handle it. One more question: if we reset the umask after mkdir, will the container logs that are created be accessible to the NM, given the restrictive rights? Would it be ideal to set a default ACL for the folders created and reset the umask, so that files created by the user under these directories have the right permissions?
> LinuxContainerExecutor creates appcache and its subdirectories with wrong > group owner. > -- > > Key: YARN-5765 > URL: https://issues.apache.org/jira/browse/YARN-5765 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.8.0, 3.0.0-alpha1 >Reporter: Haibo Chen >Assignee: Naganarasimha G R >Priority: Blocker > Attachments: YARN-5765.001.patch > > > LinuxContainerExecutor creates usercache/\{userId\}/appcache/\{appId\} with > the wrong group owner, causing log aggregation and the ShuffleHandler to fail because > the node manager process does not have permission to read the files under the > directory. > This can be easily reproduced by enabling LCE and submitting an MR example job > as a user that does not belong to the same group that the NM process belongs to. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5600) Add a parameter to ContainerLaunchContext to emulate yarn.nodemanager.delete.debug-delay-sec on a per-application basis
[ https://issues.apache.org/jira/browse/YARN-5600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15656248#comment-15656248 ] Hadoop QA commented on YARN-5600: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 19s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 6 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 52s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 49s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 20s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 56s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 14s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 44s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 1s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 55s{color} | {color:green} trunk passed {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 46s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 4m 47s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 4m 47s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 54s{color} | {color:green} hadoop-yarn-project/hadoop-yarn: The patch generated 0 new + 538 unchanged - 21 fixed = 538 total (was 559) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 11s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 44s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 15s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 53s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 34s{color} | {color:red} hadoop-yarn-api in the patch failed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 15m 44s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 39s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 57m 11s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.conf.TestYarnConfigurationFields | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:e809691 | | JIRA Issue | YARN-5600 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12838500/YARN-5600.008.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 99fb6d6b2e4f 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 8848a8a | | Default Java | 1.8.0_101 | | findbugs | v3.0.0 | | unit | https://builds.apache.org/job/PreCommit-YARN-Build/13868/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-api.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/13868/testReport/ | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
[jira] [Updated] (YARN-5600) Add a parameter to ContainerLaunchContext to emulate yarn.nodemanager.delete.debug-delay-sec on a per-application basis
[ https://issues.apache.org/jira/browse/YARN-5600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Miklos Szegedi updated YARN-5600: - Attachment: YARN-5600.008.patch Updated patch based on the previous request. > Add a parameter to ContainerLaunchContext to emulate > yarn.nodemanager.delete.debug-delay-sec on a per-application basis > --- > > Key: YARN-5600 > URL: https://issues.apache.org/jira/browse/YARN-5600 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 3.0.0-alpha1 >Reporter: Daniel Templeton >Assignee: Miklos Szegedi > Labels: oct16-medium > Attachments: YARN-5600.000.patch, YARN-5600.001.patch, > YARN-5600.002.patch, YARN-5600.003.patch, YARN-5600.004.patch, > YARN-5600.005.patch, YARN-5600.006.patch, YARN-5600.007.patch, > YARN-5600.008.patch > > > To make debugging application launch failures simpler, I'd like to add a > parameter to the CLC to allow an application owner to request delayed > deletion of the application's launch artifacts. > This JIRA solves largely the same problem as YARN-5599, but for cases where > ATS is not in use, e.g. branch-2. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5814) Add druid as storage backend in YARN Timeline Service
[ https://issues.apache.org/jira/browse/YARN-5814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bingxue Qiu updated YARN-5814: -- Attachment: Add-Druid-in-YARN-Timeline-Service.pdf > Add druid as storage backend in YARN Timeline Service > -- > > Key: YARN-5814 > URL: https://issues.apache.org/jira/browse/YARN-5814 > Project: Hadoop YARN > Issue Type: New Feature > Components: ATSv2 >Affects Versions: 3.0.0-alpha2 >Reporter: Bingxue Qiu > Attachments: Add-Druid-in-YARN-Timeline-Service.pdf > > > h3. Introduction > I propose to add Druid as a storage backend in the YARN Timeline Service. > We run more than 6000 applications and generate 450 million metrics daily in > Alibaba clusters with thousands of nodes. We need to collect and store > meta/events/metrics data, analyze utilization reports across various dimensions > online, and display the trends of allocated/used resources for the cluster by > joining and aggregating data. This helps us manage and optimize the cluster by > tracking resource utilization. > To achieve our goal, we have switched to Druid as the storage instead of > HBase, and have achieved sub-second OLAP performance in our production > environment for a few months. > h3. Analysis > Currently, the YARN Timeline Service only supports aggregating metrics a) at the flow > level, by the FlowRunCoprocessor, and b) at the application level, by the > AppLevelTimelineCollector; offline (time-based periodic) aggregation for > flows/users/queues for reporting and analysis is planned but not yet > implemented. The YARN Timeline Service uses Apache HBase as the primary > storage backend, and HBase is not a good fit for OLAP. > For arbitrary exploration of data, such as online analysis of utilization > reports across various dimensions (Queue, Flow, Users, Application, CPU, Memory) by > joining and aggregating data, Druid's custom column format enables ad-hoc > queries without pre-computation. The format also enables fast scans on > columns, which is important for good aggregation performance. > To support online analysis of utilization reports across various dimensions, > display the trends of allocated/used resources for the cluster, and allow > arbitrary exploration of data, we propose to add Druid storage and implement a > DruidWriter/DruidReader in the YARN Timeline Service. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
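As a rough sketch of where such a backend would plug in, a Druid-backed writer might look like the following. The writer contract shown here is simplified and hypothetical (the real ATSv2 TimelineWriter service interface has more methods and different signatures), and DruidIngestClient/DruidRow are assumed helper types, not real classes:
{code}
// Hypothetical sketch only; all Druid-side types are assumptions.
public class DruidTimelineWriter {
  private final DruidIngestClient client; // assumed wrapper over Druid ingestion

  public DruidTimelineWriter(DruidIngestClient client) {
    this.client = client;
  }

  public void write(String clusterId, String appId, TimelineEntities entities)
      throws IOException {
    for (TimelineEntity entity : entities.getEntities()) {
      // Flatten each metric into timestamped rows; Druid's column format can
      // then serve ad-hoc rollups across dimensions (queue, user, app, ...).
      for (TimelineMetric metric : entity.getMetrics()) {
        for (Map.Entry<Long, Number> point : metric.getValues().entrySet()) {
          client.send(new DruidRow(point.getKey(), clusterId, appId,
              entity.getType(), metric.getId(), point.getValue()));
        }
      }
    }
  }
}
{code}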
[jira] [Updated] (YARN-5814) Add druid as storage backend in YARN Timeline Service
[ https://issues.apache.org/jira/browse/YARN-5814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bingxue Qiu updated YARN-5814: -- Attachment: (was: Add-Druid-in-YARN-Timeline-Service.pdf) > Add druid as storage backend in YARN Timeline Service > -- > > Key: YARN-5814 > URL: https://issues.apache.org/jira/browse/YARN-5814 > Project: Hadoop YARN > Issue Type: New Feature > Components: ATSv2 >Affects Versions: 3.0.0-alpha2 >Reporter: Bingxue Qiu > > h3. Introduction > I propose to add Druid as a storage backend in the YARN Timeline Service. > We run more than 6000 applications and generate 450 million metrics daily in > Alibaba clusters with thousands of nodes. We need to collect and store > meta/events/metrics data, analyze utilization reports across various dimensions > online, and display the trends of allocated/used resources for the cluster by > joining and aggregating data. This helps us manage and optimize the cluster by > tracking resource utilization. > To achieve our goal, we have switched to Druid as the storage instead of > HBase, and have achieved sub-second OLAP performance in our production > environment for a few months. > h3. Analysis > Currently, the YARN Timeline Service only supports aggregating metrics a) at the flow > level, by the FlowRunCoprocessor, and b) at the application level, by the > AppLevelTimelineCollector; offline (time-based periodic) aggregation for > flows/users/queues for reporting and analysis is planned but not yet > implemented. The YARN Timeline Service uses Apache HBase as the primary > storage backend, and HBase is not a good fit for OLAP. > For arbitrary exploration of data, such as online analysis of utilization > reports across various dimensions (Queue, Flow, Users, Application, CPU, Memory) by > joining and aggregating data, Druid's custom column format enables ad-hoc > queries without pre-computation. The format also enables fast scans on > columns, which is important for good aggregation performance. > To support online analysis of utilization reports across various dimensions, > display the trends of allocated/used resources for the cluster, and allow > arbitrary exploration of data, we propose to add Druid storage and implement a > DruidWriter/DruidReader in the YARN Timeline Service. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5814) Add druid as storage backend in YARN Timeline Service
[ https://issues.apache.org/jira/browse/YARN-5814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bingxue Qiu updated YARN-5814: -- Attachment: Add-Druid-in-YARN-Timeline-Service.pdf > Add druid as storage backend in YARN Timeline Service > -- > > Key: YARN-5814 > URL: https://issues.apache.org/jira/browse/YARN-5814 > Project: Hadoop YARN > Issue Type: New Feature > Components: ATSv2 >Affects Versions: 3.0.0-alpha2 >Reporter: Bingxue Qiu > Attachments: Add-Druid-in-YARN-Timeline-Service.pdf > > > h3. Introduction > I propose to add Druid as a storage backend in the YARN Timeline Service. > We run more than 6000 applications and generate 450 million metrics daily in > Alibaba clusters with thousands of nodes. We need to collect and store > meta/events/metrics data, analyze utilization reports across various dimensions > online, and display the trends of allocated/used resources for the cluster by > joining and aggregating data. This helps us manage and optimize the cluster by > tracking resource utilization. > To achieve our goal, we have switched to Druid as the storage instead of > HBase, and have achieved sub-second OLAP performance in our production > environment for a few months. > h3. Analysis > Currently, the YARN Timeline Service only supports aggregating metrics a) at the flow > level, by the FlowRunCoprocessor, and b) at the application level, by the > AppLevelTimelineCollector; offline (time-based periodic) aggregation for > flows/users/queues for reporting and analysis is planned but not yet > implemented. The YARN Timeline Service uses Apache HBase as the primary > storage backend, and HBase is not a good fit for OLAP. > For arbitrary exploration of data, such as online analysis of utilization > reports across various dimensions (Queue, Flow, Users, Application, CPU, Memory) by > joining and aggregating data, Druid's custom column format enables ad-hoc > queries without pre-computation. The format also enables fast scans on > columns, which is important for good aggregation performance. > To support online analysis of utilization reports across various dimensions, > display the trends of allocated/used resources for the cluster, and allow > arbitrary exploration of data, we propose to add Druid storage and implement a > DruidWriter/DruidReader in the YARN Timeline Service. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-4597) Add SCHEDULE to NM container lifecycle
[ https://issues.apache.org/jira/browse/YARN-4597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15655975#comment-15655975 ] Konstantinos Karanasos commented on YARN-4597: --
Thanks for working on this, [~asuresh]! I am sending some first comments. I have not yet looked at the {{ContainerScheduler}} -- I will do that tomorrow.
- The {{Container}} has two new methods ({{sendLaunchEvent}} and {{sendKillEvent}}), which are public and do not follow the design of the rest of the code, which keeps such methods private and calls them through transitions in the {{ContainerImpl}}. Let's try to use the existing design if possible.
- In {{RMNodeImpl}}:
-- Instead of using the {{launchedContainers}} for both the launched and the queued containers, we might want to split it in two: one for the launched and one for the queued containers.
-- I think we should not add opportunistic containers to the {{launchedContainers}}. If we do, they will be added to the {{newlyLaunchedContainers}}, then to the {{nodeUpdateQueue}}, and, if I am not wrong, they will be propagated to the schedulers for the guaranteed containers, which will create problems. I have to look at it a bit more, but my hunch is that we should avoid doing it. Even if it does not affect the resource accounting, I don't see any advantage to adding them.
- In the {{OpportunisticContainerAllocatorAMService}} we are now calling {{SchedulerNode::allocate}}, and then we do not update the used resources, but we do update some other counters, which leads to inconsistencies. For example, when releasing a container, I think at the moment we are not calling the release of the {{SchedulerNode}}, which means that the container count will become inconsistent.
-- Instead, I suggest adding some counters for opportunistic containers at the {{SchedulerNode}}, both for the number of containers and for the resources used (a sketch follows after this message). In this case, we need to make sure that those resources are released too.
- Maybe as part of a different JIRA, we should at some point extend the {{container.metrics}} in the {{ContainerImpl}} to keep track of the scheduled/queued containers.
h6. Nits:
- There seem to be two redundant parameters in {{YarnConfiguration}} at the moment: {{NM_CONTAINER_QUEUING_MIN_QUEUE_LENGTH}} and {{NM_OPPORTUNISTIC_CONTAINERS_MAX_QUEUE_LENGTH}}. If I am not missing something, we should keep one of the two.
- {{yarn-default.xml}}: numbed -> number (in a comment)
- {{TestNodeManagerResync}}: I think it is better to use one of the existing methods for waiting to get to the RUNNING state.
- In {{Container}}/{{ContainerImpl}} and all the associated classes, I would suggest renaming {{isMarkedToKill}} to {{isMarkedForKilling}}. I know it is minor, but it is more self-explanatory.
I will send more comments once I check the {{ContainerScheduler}}. Also, let's stress-test the code in a cluster before committing to make sure everything is good. I can help with that.
> Add SCHEDULE to NM container lifecycle > -- > > Key: YARN-4597 > URL: https://issues.apache.org/jira/browse/YARN-4597 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager >Reporter: Chris Douglas >Assignee: Arun Suresh > Labels: oct16-hard > Attachments: YARN-4597.001.patch, YARN-4597.002.patch, > YARN-4597.003.patch, YARN-4597.004.patch, YARN-4597.005.patch, > YARN-4597.006.patch, YARN-4597.007.patch, YARN-4597.008.patch, > YARN-4597.009.patch > > > Currently, the NM immediately launches containers after resource > localization. 
Several features could be more cleanly implemented if the NM > included a separate stage for reserving resources. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
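Regarding the suggested counters at the {{SchedulerNode}}, a minimal sketch of the symmetric accounting the review asks for might look as follows (field and method names are illustrative assumptions, not part of any patch):
{code}
// Inside SchedulerNode (sketch): track opportunistic containers separately
// from the guaranteed ones.
private int numOpportunisticContainers = 0;
private final Resource opportunisticResourceUsed = Resource.newInstance(0, 0);

public synchronized void allocateOpportunistic(Resource resource) {
  numOpportunisticContainers++;
  Resources.addTo(opportunisticResourceUsed, resource);
}

// The review's key point: every allocate needs a matching release, otherwise
// counts such as numContainers drift and reporting becomes inconsistent.
public synchronized void releaseOpportunistic(Resource resource) {
  numOpportunisticContainers--;
  Resources.subtractFrom(opportunisticResourceUsed, resource);
}
{code}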
[jira] [Commented] (YARN-5761) Separate QueueManager from Scheduler
[ https://issues.apache.org/jira/browse/YARN-5761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15655868#comment-15655868 ] Jonathan Hung commented on YARN-5761: -
This looks good; a few comments:
# We can probably put the {{static QueueHook}} class in CapacitySchedulerQueueManager.
# Can we set the YarnAuthorizationProvider in the CSQueueManager instead of in CS? At least for now, this is a queue-level component.
# Can we move the queue label stuff to CSQueueManager as well? I.e., the {{labelManager.reinitializeQueueLabels(getQueueToLabels())}} calls in CS#initializeQueues and CS#reinitializeQueues we can move to the respective methods in CSQueueManager. We can then pass the labelManager to CSQueueManager in the constructor, and have getQueueToLabels be a method in CSQueueManager.
# There are calls to {{CS#getQueue}} in some places inside CS, and some calls to queueManager.getQueue; perhaps we should make them all the same (we can use CS#getQueue, since it is a wrapper around queueManager.getQueue -- see the sketch after this message).
# Two other methods, {{getDefaultPriorityForQueue}} and {{getAndCheckLeafQueue}}, we could maybe consider moving to CSQueueManager. I'm not so sure about getDefaultPriorityForQueue: since CSQueueManager does queue configuration, does it make sense to have queue attribute accessors such as getDefaultPriorityForQueue in CSQueueManager too?
> Separate QueueManager from Scheduler > > > Key: YARN-5761 > URL: https://issues.apache.org/jira/browse/YARN-5761 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Reporter: Xuan Gong >Assignee: Xuan Gong > Labels: oct16-medium > Attachments: YARN-5761.1.patch, YARN-5761.1.rebase.patch, > YARN-5761.2.patch, YARN-5761.3.patch > > > Currently, the scheduler code does both queue management and scheduling work. > We'd better separate the queue manager out of the scheduler logic. In that case, > it would be much easier and safer to extend. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
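On point 4, the wrapper in question is essentially a one-line delegation, so standardizing on {{CS#getQueue}} costs nothing. A sketch (the actual method body in the patch may differ):
{code}
// In CapacityScheduler: CS#getQueue stays a thin wrapper, so all callers
// resolve queues through the single CapacitySchedulerQueueManager path.
public CSQueue getQueue(String queueName) {
  if (queueName == null) {
    return null;
  }
  return this.queueManager.getQueue(queueName);
}
{code}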
[jira] [Commented] (YARN-4752) [Umbrella] FairScheduler: Improve preemption
[ https://issues.apache.org/jira/browse/YARN-4752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15655861#comment-15655861 ] Hadoop QA commented on YARN-4752: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 18s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 7 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 51s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 44s{color} | {color:green} trunk passed {color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red} 5m 19s{color} | {color:red} hadoop-yarn in trunk failed. {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 52s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 27s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 49s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 13s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 7s{color} | {color:green} trunk passed {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 0s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red} 3m 53s{color} | {color:red} hadoop-yarn in the patch failed. {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 3m 53s{color} | {color:red} hadoop-yarn in the patch failed. {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 49s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch generated 11 new + 183 unchanged - 137 fixed = 194 total (was 320) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 25s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 47s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 28s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 36s{color} | {color:green} hadoop-yarn-common in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s{color} | {color:green} hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager generated 0 new + 916 unchanged - 9 fixed = 916 total (was 925) {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 27s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 37m 18s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 37s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 80m 10s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:e809691 | | JIRA Issue | YARN-4752 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12838473/yarn-4752.2.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 4d71dc1e74cc 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 93eeb13 | | Default Java | 1.8.0_101 | | compile | https://builds.apache.org/job/PreCommit-YARN-Build/13866/artifact/patchprocess/branch-compile-hadoop-yarn-project_hadoop-yarn.txt | | findbugs | v3.0.0 | |
[jira] [Commented] (YARN-5600) Add a parameter to ContainerLaunchContext to emulate yarn.nodemanager.delete.debug-delay-sec on a per-application basis
[ https://issues.apache.org/jira/browse/YARN-5600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15655828#comment-15655828 ] Miklos Szegedi commented on YARN-5600: --
Thank you for reviewing the patch, [~wangda], and also thank you for looking into the build issue! I think it is a good idea to have the value in an environment variable. Since this is a debug feature, I would like the change to be as small and simple as possible. When talking about environment variables, are you referring to this pattern?
{code}
String containerImageName = container.getLaunchContext().getEnvironment()
    .get(YarnConfiguration.NM_DOCKER_CONTAINER_EXECUTOR_IMAGE_NAME);
{code}
I will write a new patch that includes your suggestions, including a yarn.nodemanager.delete.max-debug-delay-sec value to limit the maximum wait.
> Add a parameter to ContainerLaunchContext to emulate > yarn.nodemanager.delete.debug-delay-sec on a per-application basis > --- > > Key: YARN-5600 > URL: https://issues.apache.org/jira/browse/YARN-5600 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 3.0.0-alpha1 >Reporter: Daniel Templeton >Assignee: Miklos Szegedi > Labels: oct16-medium > Attachments: YARN-5600.000.patch, YARN-5600.001.patch, > YARN-5600.002.patch, YARN-5600.003.patch, YARN-5600.004.patch, > YARN-5600.005.patch, YARN-5600.006.patch, YARN-5600.007.patch > > > To make debugging application launch failures simpler, I'd like to add a > parameter to the CLC to allow an application owner to request delayed > deletion of the application's launch artifacts. > This JIRA solves largely the same problem as YARN-5599, but for cases where > ATS is not in use, e.g. branch-2. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
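Putting the two suggestions together, the NM-side read might look roughly like the following sketch. The environment variable name is a hypothetical placeholder; only the yarn.nodemanager.delete.max-debug-delay-sec key is the one proposed above:
{code}
// Sketch: read a per-application debug delay from the launch-context
// environment and clamp it to a cluster-wide maximum from yarn-site.xml.
Map<String, String> env = container.getLaunchContext().getEnvironment();
String requested = env.get("YARN_CONTAINER_DEBUG_DELAY_SEC"); // assumed name
int delay = requested == null ? 0 : Integer.parseInt(requested);
int maxDelay = conf.getInt(
    "yarn.nodemanager.delete.max-debug-delay-sec", Integer.MAX_VALUE);
int effectiveDelay = Math.min(delay, maxDelay);
// ...then schedule deletion of the launch artifacts after effectiveDelay
// seconds via the NM DeletionService.
{code}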
[jira] [Commented] (YARN-5819) Verify fairshare and minshare preemption
[ https://issues.apache.org/jira/browse/YARN-5819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15655814#comment-15655814 ] Hadoop QA commented on YARN-5819: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 24s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 3 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 40s{color} | {color:green} YARN-4752 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 37s{color} | {color:green} YARN-4752 passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 24s{color} | {color:green} YARN-4752 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 41s{color} | {color:green} YARN-4752 passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 23s{color} | {color:green} YARN-4752 passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 5s{color} | {color:green} YARN-4752 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s{color} | {color:green} YARN-4752 passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 33s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 32s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 32s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 19s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 1 new + 30 unchanged - 0 fixed = 31 total (was 30) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 37s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 14s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 8s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 40m 50s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 16s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 57m 52s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.resourcemanager.metrics.TestSystemMetricsPublisher | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:e809691 | | JIRA Issue | YARN-5819 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12838471/yarn-5819.YARN-4752.5.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 4d6e115b41bc 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | YARN-4752 / 2140674 | | Default Java | 1.8.0_101 | | findbugs | v3.0.0 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/13865/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | | unit | https://builds.apache.org/job/PreCommit-YARN-Build/13865/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/13865/testReport/ | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager | | Console output |
[jira] [Commented] (YARN-5634) Simplify initialization/use of RouterPolicy via a RouterPolicyFacade
[ https://issues.apache.org/jira/browse/YARN-5634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15655793#comment-15655793 ] Hadoop QA commented on YARN-5634: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 15s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 4 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 3m 2s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 24s{color} | {color:green} YARN-2915 passed {color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red} 5m 20s{color} | {color:red} hadoop-yarn in YARN-2915 failed. {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 50s{color} | {color:green} YARN-2915 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 10s{color} | {color:green} YARN-2915 passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 44s{color} | {color:green} YARN-2915 passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 17s{color} | {color:green} YARN-2915 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 51s{color} | {color:green} YARN-2915 passed {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 11s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 52s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red} 4m 26s{color} | {color:red} hadoop-yarn in the patch failed. {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 4m 26s{color} | {color:red} hadoop-yarn in the patch failed. {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 49s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch generated 2 new + 227 unchanged - 0 fixed = 229 total (was 227) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 11s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 40s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 45s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 50s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 38s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 1m 10s{color} | {color:red} hadoop-yarn-server-common in the patch failed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 46s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 45m 18s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.federation.policies.router.TestWeightedRandomRouterPolicy | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:e809691 | | JIRA Issue | YARN-5634 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12838472/YARN-5634-YARN-2915.03.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 8586c13d77b0 3.13.0-93-generic #140-Ubuntu SMP Mon Jul 18 21:21:05 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | YARN-2915 / c3a5672 | | Default Java | 1.8.0_111 | | compile | https://builds.apache.org/job/PreCommit-YARN-Build/13867/artifact/patchprocess/branch-compile-hadoop-yarn-project_hadoop-yarn.txt | | findbugs | v3.0.0 | | compile | https://builds.apache.org/job/PreCommit-YARN-Build/13867/artifact/patchprocess/patch-compile-hadoop-yarn-project_hadoop-yarn.txt | |
[jira] [Commented] (YARN-5600) Add a parameter to ContainerLaunchContext to emulate yarn.nodemanager.delete.debug-delay-sec on a per-application basis
[ https://issues.apache.org/jira/browse/YARN-5600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15655703#comment-15655703 ] Tan, Wangda commented on YARN-5600: ---
This is a very useful feature without any doubt; thanks [~miklos.szeg...@cloudera.com] for working on this JIRA, and thanks to [~Naganarasimha] / [~templedf] for reviewing the patch. Apologies for my very late review; I only looked at the API of the patch:
Have you considered the other approach, where we turn on debug-delay-sec by passing a pre-defined environment variable? The biggest benefit is that we don't need to update most applications to use this feature; for example, MR and Spark already support specifying environment variables. Making changes to all major applications to use this feature sounds like a big task. As an example, LinuxDockerContainerExecutor uses this approach, specifying configurations by passing environment variables.
In addition, it would be better to have a global max-debug-delay-sec in yarn-site (which could be MAX_INT by default): considering disk space and security, we may not want an application to occupy disk space beyond some specified time. + [~djp]
> Add a parameter to ContainerLaunchContext to emulate > yarn.nodemanager.delete.debug-delay-sec on a per-application basis > --- > > Key: YARN-5600 > URL: https://issues.apache.org/jira/browse/YARN-5600 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 3.0.0-alpha1 >Reporter: Daniel Templeton >Assignee: Miklos Szegedi > Labels: oct16-medium > Attachments: YARN-5600.000.patch, YARN-5600.001.patch, > YARN-5600.002.patch, YARN-5600.003.patch, YARN-5600.004.patch, > YARN-5600.005.patch, YARN-5600.006.patch, YARN-5600.007.patch > > > To make debugging application launch failures simpler, I'd like to add a > parameter to the CLC to allow an application owner to request delayed > deletion of the application's launch artifacts. > This JIRA solves largely the same problem as YARN-5599, but for cases where > ATS is not in use, e.g. branch-2. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5634) Simplify initialization/use of RouterPolicy via a RouterPolicyFacade
[ https://issues.apache.org/jira/browse/YARN-5634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino updated YARN-5634: --- Attachment: YARN-5634-YARN-2915.03.patch > Simplify initialization/use of RouterPolicy via a RouterPolicyFacade > - > > Key: YARN-5634 > URL: https://issues.apache.org/jira/browse/YARN-5634 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Affects Versions: YARN-2915 >Reporter: Carlo Curino >Assignee: Carlo Curino > Labels: oct16-medium > Attachments: YARN-5634-YARN-2915.01.patch, > YARN-5634-YARN-2915.02.patch, YARN-5634-YARN-2915.03.patch > > > The current set of policies requires some machinery to (re)initialize based on > changes in the SubClusterPolicyConfiguration. This JIRA tracks the effort to > hide much of that behind a simple RouterPolicyFacade, making the lifecycle and > usage of the policies easier for consumers. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-4752) [Umbrella] FairScheduler: Improve preemption
[ https://issues.apache.org/jira/browse/YARN-4752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-4752: --- Attachment: yarn-4752.2.patch > [Umbrella] FairScheduler: Improve preemption > > > Key: YARN-4752 > URL: https://issues.apache.org/jira/browse/YARN-4752 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.8.0 >Reporter: Karthik Kambatla > Attachments: YARN-4752.FairSchedulerPreemptionOverhaul.pdf, > yarn-4752-1.patch, yarn-4752.2.patch > > > A number of issues have been reported with respect to preemption in > FairScheduler along the lines of: > # FairScheduler preempts resources from nodes even if the resultant free > resources cannot fit the incoming request. > # Preemption doesn't preempt from sibling queues > # Preemption doesn't preempt from sibling apps under the same queue that is > over its fairshare > # ... > Filing this umbrella JIRA to group all the issues together and think of a > comprehensive solution. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5819) Verify fairshare and minshare preemption
[ https://issues.apache.org/jira/browse/YARN-5819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-5819: --- Attachment: yarn-5819.YARN-4752.5.patch Rebased YARN-4752 on trunk. Updating this patch to catch up with the rebase. > Verify fairshare and minshare preemption > > > Key: YARN-5819 > URL: https://issues.apache.org/jira/browse/YARN-5819 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Affects Versions: 2.9.0 >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla > Attachments: yarn-5819.YARN-4752.1.patch, > yarn-5819.YARN-4752.2.patch, yarn-5819.YARN-4752.3.patch, > yarn-5819.YARN-4752.4.patch, yarn-5819.YARN-4752.5.patch > > > JIRA to track the unit test(s) verifying both fairshare and minshare > preemption. The tests should verify: > # preemption within a single leaf queue > # preemption between sibling leaf queues > # preemption between non-sibling leaf queues > # {{allowPreemption = false}} should disallow preemption from a queue -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-3053) [Security] Review and implement security in ATS v.2
[ https://issues.apache.org/jira/browse/YARN-3053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15655625#comment-15655625 ] Sangjin Lee commented on YARN-3053: --- Thanks [~varun_saxena] for putting together the proposal! It's a great start. Sorry it took me a while to get to this. I have a couple of quick questions (maybe more to follow): - How do other NMs (the ones running the containers) authenticate? I don’t think they can perform real authentication. Then how would they get the delegation token for the app? To solve this, would we be able to allow YARN daemons to access and look up the DTs from the RM? - How would each option handle the case of AM failures (and the subsequent relaunching of app attempts and/or the timeline collector on another node)? It wasn’t very clear to me… > [Security] Review and implement security in ATS v.2 > --- > > Key: YARN-3053 > URL: https://issues.apache.org/jira/browse/YARN-3053 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Varun Saxena > Labels: YARN-5355 > Attachments: ATSv2Authentication(draft).pdf > > > Per the design in YARN-2928, we want to evaluate and review the system for > security, and ensure proper security in the system. > This includes proper authentication, token management, access control, and > any other relevant security aspects. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5600) Add a parameter to ContainerLaunchContext to emulate yarn.nodemanager.delete.debug-delay-sec on a per-application basis
[ https://issues.apache.org/jira/browse/YARN-5600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15655581#comment-15655581 ] Hadoop QA commented on YARN-5600: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 19s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 6 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 9s{color} | {color:green} trunk passed {color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red} 5m 22s{color} | {color:red} hadoop-yarn in trunk failed. {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 54s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 56s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 7s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 17s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 31s{color} | {color:green} trunk passed {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 11s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 17s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red} 4m 8s{color} | {color:red} hadoop-yarn in the patch failed. {color} | | {color:red}-1{color} | {color:red} cc {color} | {color:red} 4m 8s{color} | {color:red} hadoop-yarn in the patch failed. {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 4m 8s{color} | {color:red} hadoop-yarn in the patch failed. {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 54s{color} | {color:green} hadoop-yarn-project/hadoop-yarn: The patch generated 0 new + 577 unchanged - 7 fixed = 577 total (was 584) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 52s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 4s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 32s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 28s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 34s{color} | {color:green} hadoop-yarn-api in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 30s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 15m 55s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 45s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 64m 33s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:e809691 | | JIRA Issue | YARN-5600 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12838452/YARN-5600.007.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle cc | | uname | Linux 0805289b3944 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 93eeb13 | | Default Java | 1.8.0_101 | | compile | https://builds.apache.org/job/PreCommit-YARN-Build/13864/artifact/patchprocess/branch-compile-hadoop-yarn-project_hadoop-yarn.txt | | findbugs | v3.0.0 | | compile |
[jira] [Commented] (YARN-5792) adopt the id prefix for YARN, MR, and DS entities
[ https://issues.apache.org/jira/browse/YARN-5792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1561#comment-1561 ] Sangjin Lee commented on YARN-5792: --- At a high level, I'd like to discuss options. I see that you're mostly using the (inverted) start time as the id prefix. Would it be better to simply use the id instead whenever possible? One big benefit of using the id is that it is very portable. When creating entities and updating them, the id is almost always available. All we require is uniqueness within the app and the entity type, and it seems to me that the id is a superior alternative to the start time. What do you think? > adopt the id prefix for YARN, MR, and DS entities > - > > Key: YARN-5792 > URL: https://issues.apache.org/jira/browse/YARN-5792 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-5355 >Reporter: Sangjin Lee >Assignee: Varun Saxena > Attachments: YARN-5792-YARN-5355.01.patch, > YARN-5792-YARN-5355.02.patch > > > We introduced the entity id prefix to support flexible entity sorting > (YARN-5715). We should adopt the id prefix for YARN entities, MR entities, > and DS entities to take advantage of the id prefix. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
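A minimal sketch (hypothetical helper, not part of any patch here) contrasting the two prefix choices discussed in the comment above: an inverted start time makes an ascending scan return the newest entities first, while a numeric entity id can be used as-is whenever one is available.
{code}
public final class IdPrefixSketch {
  // Later start time => smaller prefix => sorts first in an ascending scan.
  static long invertedStartTime(long startTimeMs) {
    return Long.MAX_VALUE - startTimeMs;
  }

  // Portable alternative: reuse the entity's own numeric id as the prefix.
  static long fromNumericId(long entityId) {
    return entityId;
  }
}
{code}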
[jira] [Commented] (YARN-5792) adopt the id prefix for YARN, MR, and DS entities
[ https://issues.apache.org/jira/browse/YARN-5792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15655540#comment-15655540 ] Sangjin Lee commented on YARN-5792: --- Patch v.2 fails compilation: {noformat} [INFO] - [INFO] - [ERROR] COMPILATION ERROR : [INFO] - [ERROR] /Users/sjlee/git/hadoop-ats/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMLeveldbStateStoreService.java:[246,12] startTime has private access in org.apache.hadoop.yarn.server.nodemanager.recovery.NMStateStoreService.RecoveredContainerState [INFO] 1 error {noformat} Could you please take a look? Thanks! > adopt the id prefix for YARN, MR, and DS entities > - > > Key: YARN-5792 > URL: https://issues.apache.org/jira/browse/YARN-5792 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-5355 >Reporter: Sangjin Lee >Assignee: Varun Saxena > Attachments: YARN-5792-YARN-5355.01.patch, > YARN-5792-YARN-5355.02.patch > > > We introduced the entity id prefix to support flexible entity sorting > (YARN-5715). We should adopt the id prefix for YARN entities, MR entities, > and DS entities to take advantage of the id prefix. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
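A minimal sketch of the kind of fix the compilation error above suggests, assuming an accessor is acceptable; the names mirror the compiler message, but the code is illustrative, not the actual patch.
{code}
public abstract class RecoveredContainerState {
  private long startTime;        // private: direct access from
                                 // NMLeveldbStateStoreService fails to compile

  public long getStartTime() {   // assumed accessor that resolves the error
    return startTime;
  }
}
{code}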
[jira] [Updated] (YARN-5044) Add peak memory usage counter for each task
[ https://issues.apache.org/jira/browse/YARN-5044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yufei Gu updated YARN-5044: --- Assignee: (was: Yufei Gu) > Add peak memory usage counter for each task > --- > > Key: YARN-5044 > URL: https://issues.apache.org/jira/browse/YARN-5044 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn >Reporter: Yufei Gu > > Each task has counters PHYSICAL_MEMORY_BYTES and VIRTUAL_MEMORY_BYTES, which > are snapshots of memory usage of that task. They are not sufficient for users > to understand peak memory usage by that task, e.g. in order to diagnose task > failures, tune job parameters or change application design. This new feature > will add two more counters for each task: PHYSICAL_MEMORY_BYTES_MAX and > VIRTUAL_MEMORY_BYTES_MAX. > This JIRA covers the same feature as MAPREDUCE-4710. I filed this new YARN > JIRA since MAPREDUCE-4710 is a pretty old one from the MR 1.x era; it more or > less assumes a branch-1 architecture and should be closed at this point. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-4715) Add support to read resource types from a config file
[ https://issues.apache.org/jira/browse/YARN-4715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-4715: -- Fix Version/s: YARN-3926 Setting the missed fix-version to the branch-name. > Add support to read resource types from a config file > - > > Key: YARN-4715 > URL: https://issues.apache.org/jira/browse/YARN-4715 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Fix For: YARN-3926 > > Attachments: YARN-4715-YARN-3926.001.patch, > YARN-4715-YARN-3926.002.patch, YARN-4715-YARN-3926.003.patch, > YARN-4715-YARN-3926.004.patch, YARN-4715-YARN-3926.005.patch > > > This ticket is to add support to allow the RM to read the resource types to > be used for scheduling from a config file. I'll file follow up tickets to add > similar support in the NM as well as to handle the RM-NM handshake protocol > issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5864) Capacity Scheduler preemption for fragmented cluster
[ https://issues.apache.org/jira/browse/YARN-5864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15655514#comment-15655514 ] Tan, Wangda commented on YARN-5864: --- Thanks [~curino] for sharing these insightful suggestions. The problem you mentioned is totally true: we have put lots of effort into adding features for various resource constraints (such as limits, node partitions, priority, etc.) but paid less attention to making the semantics easier and more consistent. I also agree that we need to spend some time thinking about what semantics the YARN scheduler should have. For example, the minimum guarantee of CS is that a queue should get at least its configured capacity, but a picky app could leave an under-utilized queue waiting forever for resources. And as you mentioned above, a non-preemptable queue can invalidate configured capacity as well. However, I would argue that the scheduler cannot run perfectly without violating some of the constraints. It is not just a set of formulas we define and let a solver optimize; it involves lots of human emotions and preferences. For example, a user may not understand, or be glad to accept, why a picky request cannot be allocated even though the queue/cluster has available capacity. And it may not be acceptable on a production cluster that a long-running service for realtime queries cannot be launched because we don't want to kill some less-important batch jobs. My point is, if we can define these rules in the docs and users can tell what happened from the UI/logs, we can add them. To improve this, I think your suggestion (1) will be more helpful and achievable in the short term; we can definitely remove some parameters. For example, the existing user-limit definition is not good enough, and user-limit-factor can prevent a queue from ever fully utilizing its capacity. And we can better define these semantics in the docs and UI. (2) looks beautiful, but it may not solve the root problem directly: the first priority is to make our users happy to accept the outcome rather than to solve it beautifully in mathematics. For example, for the problem I put in the description of this JIRA, I don't think (2) can get the allocation without harming other applications. And from an implementation perspective, I'm not sure how a solver-based solution can handle both fast allocation (we want to allocate within milliseconds for interactive queries) and good placement (such as gang scheduling with other constraints like anti-affinity). It seems to me that option (2) would sacrifice low latency for better placement quality. bq. This opens up many abuses, one that comes to mind ... Actually, this feature will only be used in a pretty controlled environment: important long-running services run in a separate queue, and the admin/user agrees that they can preempt other batch jobs to get new containers. ACLs will be set to keep normal users from running inside these queues; all apps running in the queue should be trusted apps such as YARN native services (Slider), Spark, etc. And we can also make sure these apps try their best to respect other apps. Please advise if you think we can improve the semantics of this feature. 
Thanks, > Capacity Scheduler preemption for fragmented cluster > - > > Key: YARN-5864 > URL: https://issues.apache.org/jira/browse/YARN-5864 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-5864.poc-0.patch > > > YARN-4390 added preemption for reserved containers. However, we found one case > where a large container cannot be allocated even though all queues are under > their limits. > For example, we have: > {code} > Two queues, a and b, capacity 50:50 > Two nodes: n1 and n2, each of them has 50 resource > Now queue-a uses 10 on n1 and 10 on n2 > queue-b asks for one single container with resource=45. > {code} > The container could be reserved on any of the hosts, but no preemption will > happen because all queues are under their limits. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-4829) Add support for binary units
[ https://issues.apache.org/jira/browse/YARN-4829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-4829: -- Fix Version/s: YARN-3926 Setting the missed fix-version to the branch-name. > Add support for binary units > > > Key: YARN-4829 > URL: https://issues.apache.org/jira/browse/YARN-4829 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Fix For: YARN-3926 > > Attachments: YARN-4829-YARN-3926.001.patch, > YARN-4829-YARN-3926.002.patch, YARN-4829-YARN-3926.003.patch, > YARN-4829-YARN-3926.004.patch > > > The units conversion util should have support for binary units. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-4830) Add support for resource types in the nodemanager
[ https://issues.apache.org/jira/browse/YARN-4830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-4830: -- Fix Version/s: YARN-3926 Setting the missed fix-version to the branch-name. > Add support for resource types in the nodemanager > - > > Key: YARN-4830 > URL: https://issues.apache.org/jira/browse/YARN-4830 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Fix For: YARN-3926 > > Attachments: YARN-4830-YARN-3926.001.patch, > YARN-4830-YARN-3926.002.patch, YARN-4830-YARN-3926.003.patch, > YARN-4830-YARN-3926.004.patch > > > The RM has support for multiple resource types. The same should be added for > the NMs. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-4218) Metric for resource*time that was preempted
[ https://issues.apache.org/jira/browse/YARN-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15655459#comment-15655459 ] Hudson commented on YARN-4218: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #10815 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/10815/]) YARN-4218. Metric for resource*time that was preempted. Contributed by (epayne: rev 93eeb13164707d0e3556c2bf737bd2ee09a335c6) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ApplicationResourceUsageReportPBImpl.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/RMContainerImpl.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TimelineServiceV1Publisher.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppMetrics.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMAppBlock.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestAppPage.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestAppManager.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationResourceUsageReport.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerOnTimelineStore.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStore.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TestSystemMetricsPublisherForV2.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/impl/pb/ApplicationAttemptStateDataPBImpl.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMServerUtils.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/MockAsm.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/AppInfo.java * (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryManagerOnTimelineStore.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/proto/yarn_server_resourcemanager_recovery.proto * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TimelineServiceV2Publisher.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/ApplicationCLI.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TestSystemMetricsPublisher.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesApps.java * (edit)
[jira] [Updated] (YARN-5600) Add a parameter to ContainerLaunchContext to emulate yarn.nodemanager.delete.debug-delay-sec on a per-application basis
[ https://issues.apache.org/jira/browse/YARN-5600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Miklos Szegedi updated YARN-5600: - Attachment: YARN-5600.007.patch Resubmitting the patch. > Add a parameter to ContainerLaunchContext to emulate > yarn.nodemanager.delete.debug-delay-sec on a per-application basis > --- > > Key: YARN-5600 > URL: https://issues.apache.org/jira/browse/YARN-5600 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 3.0.0-alpha1 >Reporter: Daniel Templeton >Assignee: Miklos Szegedi > Labels: oct16-medium > Attachments: YARN-5600.000.patch, YARN-5600.001.patch, > YARN-5600.002.patch, YARN-5600.003.patch, YARN-5600.004.patch, > YARN-5600.005.patch, YARN-5600.006.patch, YARN-5600.007.patch > > > To make debugging application launch failures simpler, I'd like to add a > parameter to the CLC to allow an application owner to request delayed > deletion of the application's launch artifacts. > This JIRA solves largely the same problem as YARN-5599, but for cases where > ATS is not in use, e.g. branch-2. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5765) LinuxContainerExecutor creates appcache and its subdirectories with wrong group owner.
[ https://issues.apache.org/jira/browse/YARN-5765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15655375#comment-15655375 ] Miklos Szegedi commented on YARN-5765: -- Thank you, [~Naganarasimha], for the patch and [~haibochen] for the review. If I understand it correctly, this is the flow of calls: {code} launch_container_as_user fork create_local_dirs create_log_dirs mkdir change_effective_user create_container_directories mkdirs create_validate_dir {code} I think we cannot change the umask before change_effective_user(), and changing it in mkdirs() or create_validate_dir() may add side effects for other callers of mkdirs() in the future, as [~haibochen] mentioned. What I would do is set the umask at the beginning of create_container_directories, right at the comment below: {code} // create dirs as 0750 umask(0027); {code} I would also reset it to the previous value before the function returns (see the sketch below). Just a side note: this is what the Linux man page says about mkdir(): "in the absence of a default ACL, the mode of the created directory is (mode & ~umask & 0777)". This means that by removing the chmod, this change no longer applies when the default ACL is too restrictive. Could this be an issue, or do we rely on the admin to set the default ACL correctly? > LinuxContainerExecutor creates appcache and its subdirectories with wrong > group owner. > -- > > Key: YARN-5765 > URL: https://issues.apache.org/jira/browse/YARN-5765 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.8.0, 3.0.0-alpha1 >Reporter: Haibo Chen >Assignee: Naganarasimha G R >Priority: Blocker > Attachments: YARN-5765.001.patch > > > LinuxContainerExecutor creates usercache/\{userId\}/appcache/\{appId\} with > wrong group owner, causing Log aggregation and ShuffleHandler to fail because > the node manager process does not have permission to read the files under the > directory. > This can be easily reproduced by enabling LCE and submitting a MR example job > as a user that does not belong to the same group that the NM process belongs to. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
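A minimal C sketch of the umask save-and-restore pattern suggested above, with a hypothetical function body (the real create_container_directories in container-executor.c does much more); only the umask handling is the point here.
{code}
#include <sys/stat.h>
#include <sys/types.h>

int create_container_directories_sketch(void) {
  mode_t old_mask = umask(0027);  /* create dirs as 0750 from here on */
  int rc = 0;
  /* ... the mkdirs()/create_validate_dir() calls would go here ... */
  umask(old_mask);                /* restore the caller's umask before returning */
  return rc;
}
{code}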
[jira] [Commented] (YARN-5834) TestNodeStatusUpdater.testNMRMConnectionConf compares nodemanager wait time to the incorrect value
[ https://issues.apache.org/jira/browse/YARN-5834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15655347#comment-15655347 ] Hudson commented on YARN-5834: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #10814 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/10814/]) YARN-5834. TestNodeStatusUpdater.testNMRMConnectionConf compares (kasha: rev 3a98419532687e4362ffc26abbc1264232820db7) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java > TestNodeStatusUpdater.testNMRMConnectionConf compares nodemanager wait time > to the incorrect value > -- > > Key: YARN-5834 > URL: https://issues.apache.org/jira/browse/YARN-5834 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Miklos Szegedi >Assignee: Chang Li >Priority: Trivial > Fix For: 2.9.0, 3.0.0-alpha2 > > Attachments: YARN-5834-branch-2.001.patch > > > The function is TestNodeStatusUpdater#testNMRMConnectionConf() > I believe the connectionWaitMs references below were meant to be > nmRmConnectionWaitMs. > {code} > conf.setLong(YarnConfiguration.NM_RESOURCEMANAGER_CONNECT_MAX_WAIT_MS, > nmRmConnectionWaitMs); > conf.setLong(YarnConfiguration.RESOURCEMANAGER_CONNECT_MAX_WAIT_MS, > connectionWaitMs); > ... > long t = System.currentTimeMillis(); > long duration = t - waitStartTime; > boolean waitTimeValid = (duration >= nmRmConnectionWaitMs) && > (duration < (*connectionWaitMs* + delta)); > if(!waitTimeValid) { > // throw exception if NM doesn't retry long enough > throw new Exception("NM should have tried re-connecting to RM during > " + > "period of at least " + *connectionWaitMs* + " ms, but " + > "stopped retrying within " + (*connectionWaitMs* + delta) + > " ms: " + e, e); > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5834) TestNodeStatusUpdater.testNMRMConnectionConf compares nodemanager wait time to the incorrect value
[ https://issues.apache.org/jira/browse/YARN-5834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-5834: --- Fix Version/s: 3.0.0-alpha2 > TestNodeStatusUpdater.testNMRMConnectionConf compares nodemanager wait time > to the incorrect value > -- > > Key: YARN-5834 > URL: https://issues.apache.org/jira/browse/YARN-5834 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Miklos Szegedi >Assignee: Chang Li >Priority: Trivial > Fix For: 2.9.0, 3.0.0-alpha2 > > Attachments: YARN-5834-branch-2.001.patch > > > The function is TestNodeStatusUpdater#testNMRMConnectionConf() > I believe the connectionWaitMs references below were meant to be > nmRmConnectionWaitMs. > {code} > conf.setLong(YarnConfiguration.NM_RESOURCEMANAGER_CONNECT_MAX_WAIT_MS, > nmRmConnectionWaitMs); > conf.setLong(YarnConfiguration.RESOURCEMANAGER_CONNECT_MAX_WAIT_MS, > connectionWaitMs); > ... > long t = System.currentTimeMillis(); > long duration = t - waitStartTime; > boolean waitTimeValid = (duration >= nmRmConnectionWaitMs) && > (duration < (*connectionWaitMs* + delta)); > if(!waitTimeValid) { > // throw exception if NM doesn't retry long enough > throw new Exception("NM should have tried re-connecting to RM during > " + > "period of at least " + *connectionWaitMs* + " ms, but " + > "stopped retrying within " + (*connectionWaitMs* + delta) + > " ms: " + e, e); > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5825) ProportionalPreemptionalPolicy could use readLock over LeafQueue instead of synchronized block
[ https://issues.apache.org/jira/browse/YARN-5825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15655310#comment-15655310 ] Jian He commented on YARN-5825: --- Looks good overall. It looks like this newly added method is unused and can be removed: {code} public ReentrantReadWriteLock.WriteLock getWriteLock() { return writeLock; } {code} > ProportionalPreemptionalPolicy could use readLock over LeafQueue instead of > synchronized block > -- > > Key: YARN-5825 > URL: https://issues.apache.org/jira/browse/YARN-5825 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Reporter: Sunil G >Assignee: Sunil G > Attachments: YARN-5825.0001.patch > > > Currently in PCPP, {{synchronized (curQueue)}} is used in various places. > Such instances could be replaced with a read lock. Thank you [~jianhe] for > pointing out the same as comment > [here|https://issues.apache.org/jira/browse/YARN-2009?focusedCommentId=15626578=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15626578] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
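A minimal sketch, with hypothetical class names, of the read-lock idea this JIRA proposes: readers take the queue's read lock instead of synchronizing on the queue object, so multiple policy reads can proceed concurrently while writers still get exclusion.
{code}
import java.util.concurrent.locks.ReentrantReadWriteLock;

class QueueSketch {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

  ReentrantReadWriteLock.ReadLock getReadLock() {
    return lock.readLock();
  }
}

class PreemptionPolicySketch {
  void inspect(QueueSketch curQueue) {
    curQueue.getReadLock().lock();   // replaces synchronized (curQueue)
    try {
      // read-only inspection of queue state goes here
    } finally {
      curQueue.getReadLock().unlock();
    }
  }
}
{code}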
[jira] [Commented] (YARN-5834) TestNodeStatusUpdater.testNMRMConnectionConf compares nodemanager wait time to the incorrect value
[ https://issues.apache.org/jira/browse/YARN-5834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15655304#comment-15655304 ] Karthik Kambatla commented on YARN-5834: +1. Checking this in.. > TestNodeStatusUpdater.testNMRMConnectionConf compares nodemanager wait time > to the incorrect value > -- > > Key: YARN-5834 > URL: https://issues.apache.org/jira/browse/YARN-5834 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Miklos Szegedi >Assignee: Chang Li >Priority: Trivial > Attachments: YARN-5834-branch-2.001.patch > > > The function is TestNodeStatusUpdater#testNMRMConnectionConf() > I believe the connectionWaitMs references below were meant to be > nmRmConnectionWaitMs. > {code} > conf.setLong(YarnConfiguration.NM_RESOURCEMANAGER_CONNECT_MAX_WAIT_MS, > nmRmConnectionWaitMs); > conf.setLong(YarnConfiguration.RESOURCEMANAGER_CONNECT_MAX_WAIT_MS, > connectionWaitMs); > ... > long t = System.currentTimeMillis(); > long duration = t - waitStartTime; > boolean waitTimeValid = (duration >= nmRmConnectionWaitMs) && > (duration < (*connectionWaitMs* + delta)); > if(!waitTimeValid) { > // throw exception if NM doesn't retry long enough > throw new Exception("NM should have tried re-connecting to RM during > " + > "period of at least " + *connectionWaitMs* + " ms, but " + > "stopped retrying within " + (*connectionWaitMs* + delta) + > " ms: " + e, e); > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5834) TestNodeStatusUpdater.testNMRMConnectionConf compares nodemanager wait time to the incorrect value
[ https://issues.apache.org/jira/browse/YARN-5834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-5834: --- Priority: Trivial (was: Minor) > TestNodeStatusUpdater.testNMRMConnectionConf compares nodemanager wait time > to the incorrect value > -- > > Key: YARN-5834 > URL: https://issues.apache.org/jira/browse/YARN-5834 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Miklos Szegedi >Assignee: Chang Li >Priority: Trivial > Attachments: YARN-5834-branch-2.001.patch > > > The function is TestNodeStatusUpdater#testNMRMConnectionConf() > I believe the connectionWaitMs references below were meant to be > nmRmConnectionWaitMs. > {code} > conf.setLong(YarnConfiguration.NM_RESOURCEMANAGER_CONNECT_MAX_WAIT_MS, > nmRmConnectionWaitMs); > conf.setLong(YarnConfiguration.RESOURCEMANAGER_CONNECT_MAX_WAIT_MS, > connectionWaitMs); > ... > long t = System.currentTimeMillis(); > long duration = t - waitStartTime; > boolean waitTimeValid = (duration >= nmRmConnectionWaitMs) && > (duration < (*connectionWaitMs* + delta)); > if(!waitTimeValid) { > // throw exception if NM doesn't retry long enough > throw new Exception("NM should have tried re-connecting to RM during > " + > "period of at least " + *connectionWaitMs* + " ms, but " + > "stopped retrying within " + (*connectionWaitMs* + delta) + > " ms: " + e, e); > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-4218) Metric for resource*time that was preempted
[ https://issues.apache.org/jira/browse/YARN-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15655289#comment-15655289 ] Eric Payne commented on YARN-4218: -- +1 Thanks [~lichangleo] for the patches and the work done on this JIRA. I will commit this. > Metric for resource*time that was preempted > --- > > Key: YARN-4218 > URL: https://issues.apache.org/jira/browse/YARN-4218 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Chang Li >Assignee: Chang Li > Attachments: YARN-4218-branch-2.003.patch, YARN-4218.006.patch, > YARN-4218.2.patch, YARN-4218.2.patch, YARN-4218.2.patch, YARN-4218.2.patch, > YARN-4218.3.patch, YARN-4218.4.patch, YARN-4218.5.patch, > YARN-4218.branch-2.2.patch, YARN-4218.branch-2.patch, YARN-4218.patch, > YARN-4218.trunk.2.patch, YARN-4218.trunk.3.patch, YARN-4218.trunk.patch, > YARN-4218.wip.patch, screenshot-1.png, screenshot-2.png, screenshot-3.png > > > After YARN-415 we have the ability to track the resource*time footprint of a > job, and preemption metrics show how many containers were preempted on a job. > However, we don't have a metric showing the resource*time footprint cost of > preemption. In other words, we know how many containers were preempted but we > don't have a good measure of how much work was lost as a result of preemption. > We should add this metric so we can analyze how much work preemption is > costing on a grid and better track which jobs were heavily impacted by it. A > job that has 100 containers preempted that only lasted a minute each and were > very small is going to be less impacted than a job that only lost a single > container but that container was huge and had been running for 3 days. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
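A minimal sketch, with hypothetical field and method names, of the bookkeeping the metric described above implies: on preemption, charge the container's resource multiplied by how long it had been running, so a large long-lived container costs far more than many small short-lived ones.
{code}
void onContainerPreempted(Resource resource, long creationTimeMs) {
  long usedSeconds = (System.currentTimeMillis() - creationTimeMs) / 1000;
  // accumulate resource*time lost to preemption (hypothetical counters)
  preemptedMemorySeconds += resource.getMemorySize() * usedSeconds;
  preemptedVcoreSeconds  += resource.getVirtualCores() * usedSeconds;
}
{code}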
[jira] [Commented] (YARN-5819) Verify fairshare and minshare preemption
[ https://issues.apache.org/jira/browse/YARN-5819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15655248#comment-15655248 ] Karthik Kambatla commented on YARN-5819: Since updating {{Resource}} is not atomic, it seemed safer to do reads/writes/updates protected by a lock. > Verify fairshare and minshare preemption > > > Key: YARN-5819 > URL: https://issues.apache.org/jira/browse/YARN-5819 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Affects Versions: 2.9.0 >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla > Attachments: yarn-5819.YARN-4752.1.patch, > yarn-5819.YARN-4752.2.patch, yarn-5819.YARN-4752.3.patch, > yarn-5819.YARN-4752.4.patch > > > JIRA to track the unit test(s) verifying both fairshare and minshare > preemption. The tests should verify: > # preemption within a single leaf queue > # preemption between sibling leaf queues > # preemption between non-sibling leaf queues > # {{allowPreemption = false}} should disallow preemption from a queue -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
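A minimal sketch of the reasoning above, with hypothetical names: Resources.addTo mutates memory and vcores as two separate writes, so writers and readers must share a lock to avoid one side observing a torn update.
{code}
private final Object preemptionLock = new Object();
private final Resource preemptedResources = Resource.newInstance(0, 0);

void addPreempted(Resource delta) {
  synchronized (preemptionLock) {
    // two non-atomic field writes (memory, then vcores) happen here
    Resources.addTo(preemptedResources, delta);
  }
}
{code}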
[jira] [Updated] (YARN-5792) adopt the id prefix for YARN, MR, and DS entities
[ https://issues.apache.org/jira/browse/YARN-5792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-5792: --- Attachment: YARN-5792-YARN-5355.02.patch Fixed checkstyle and javadoc issues > adopt the id prefix for YARN, MR, and DS entities > - > > Key: YARN-5792 > URL: https://issues.apache.org/jira/browse/YARN-5792 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-5355 >Reporter: Sangjin Lee >Assignee: Varun Saxena > Attachments: YARN-5792-YARN-5355.01.patch, > YARN-5792-YARN-5355.02.patch > > > We introduced the entity id prefix to support flexible entity sorting > (YARN-5715). We should adopt the id prefix for YARN entities, MR entities, > and DS entities to take advantage of the id prefix. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5819) Verify fairshare and minshare preemption
[ https://issues.apache.org/jira/browse/YARN-5819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15655170#comment-15655170 ] Daniel Templeton commented on YARN-5819: Do you need to synchronize {{getPreemptedResources()}}? Doesn't look like it helps to me. Maybe synchronize and return a copy? > Verify fairshare and minshare preemption > > > Key: YARN-5819 > URL: https://issues.apache.org/jira/browse/YARN-5819 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Affects Versions: 2.9.0 >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla > Attachments: yarn-5819.YARN-4752.1.patch, > yarn-5819.YARN-4752.2.patch, yarn-5819.YARN-4752.3.patch, > yarn-5819.YARN-4752.4.patch > > > JIRA to track the unit test(s) verifying both fairshare and minshare > preemption. The tests should verify: > # preemption within a single leaf queue > # preemption between sibling leaf queues > # preemption between non-sibling leaf queues > # {{allowPreemption = false}} should disallow preemption from a queue -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
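A minimal sketch of the "synchronize and return a copy" suggestion above (hypothetical field name): the caller gets a snapshot that stays consistent even while another thread keeps mutating the live Resource under the same lock.
{code}
synchronized Resource getPreemptedResources() {
  return Resources.clone(preemptedResources);  // defensive copy for the caller
}
{code}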
[jira] [Commented] (YARN-4206) Add life time value in Application report and web UI
[ https://issues.apache.org/jira/browse/YARN-4206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15655155#comment-15655155 ] Jian He commented on YARN-4206: --- The remaining time can be inferred from the absolute timeout value? If so, I don't think we need an additional API in ApplicationClientProtocol to get that. > Add life time value in Application report and web UI > > > Key: YARN-4206 > URL: https://issues.apache.org/jira/browse/YARN-4206 > Project: Hadoop YARN > Issue Type: Sub-task > Components: scheduler >Reporter: nijel >Assignee: nijel > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
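A one-line sketch of the inference suggested above, with hypothetical variable names: given an absolute timeout (epoch millis) in the application report, a client can derive the remaining lifetime locally, with no extra protocol API.
{code}
long remainingMs = Math.max(0, timeoutEpochMs - System.currentTimeMillis());
{code}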
[jira] [Updated] (YARN-5792) adopt the id prefix for YARN, MR, and DS entities
[ https://issues.apache.org/jira/browse/YARN-5792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-5792: --- Attachment: (was: YARN-5792-YARN-5355.02.patch) > adopt the id prefix for YARN, MR, and DS entities > - > > Key: YARN-5792 > URL: https://issues.apache.org/jira/browse/YARN-5792 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-5355 >Reporter: Sangjin Lee >Assignee: Varun Saxena > Attachments: YARN-5792-YARN-5355.01.patch > > > We introduced the entity id prefix to support flexible entity sorting > (YARN-5715). We should adopt the id prefix for YARN entities, MR entities, > and DS entities to take advantage of the id prefix. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5792) adopt the id prefix for YARN, MR, and DS entities
[ https://issues.apache.org/jira/browse/YARN-5792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-5792: --- Attachment: YARN-5792-YARN-5355.02.patch > adopt the id prefix for YARN, MR, and DS entities > - > > Key: YARN-5792 > URL: https://issues.apache.org/jira/browse/YARN-5792 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-5355 >Reporter: Sangjin Lee >Assignee: Varun Saxena > Attachments: YARN-5792-YARN-5355.01.patch, > YARN-5792-YARN-5355.02.patch > > > We introduced the entity id prefix to support flexible entity sorting > (YARN-5715). We should adopt the id prefix for YARN entities, MR entities, > and DS entities to take advantage of the id prefix. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5792) adopt the id prefix for YARN, MR, and DS entities
[ https://issues.apache.org/jira/browse/YARN-5792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15655096#comment-15655096 ] Varun Saxena commented on YARN-5792: Tests pass locally. Let's see what comes up when the build is invoked again. > adopt the id prefix for YARN, MR, and DS entities > - > > Key: YARN-5792 > URL: https://issues.apache.org/jira/browse/YARN-5792 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-5355 >Reporter: Sangjin Lee >Assignee: Varun Saxena > Attachments: YARN-5792-YARN-5355.01.patch > > > We introduced the entity id prefix to support flexible entity sorting > (YARN-5715). We should adopt the id prefix for YARN entities, MR entities, > and DS entities to take advantage of the id prefix. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5792) adopt the id prefix for YARN, MR, and DS entities
[ https://issues.apache.org/jira/browse/YARN-5792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15655013#comment-15655013 ] Hadoop QA commented on YARN-5792: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 22s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 10 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 2m 45s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 36s{color} | {color:green} YARN-5355 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 14s{color} | {color:green} YARN-5355 passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 44s{color} | {color:green} YARN-5355 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 58s{color} | {color:green} YARN-5355 passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 35s{color} | {color:green} YARN-5355 passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 5s{color} | {color:green} YARN-5355 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 55s{color} | {color:green} YARN-5355 passed {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 18s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 41s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 16s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 16s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 1m 55s{color} | {color:orange} root: The patch generated 35 new + 1286 unchanged - 21 fixed = 1321 total (was 1307) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 22s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 53s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 7s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 26s{color} | {color:red} hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-core generated 9 new + 2496 unchanged - 0 fixed = 2505 total (was 2496) {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 15m 44s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 40m 53s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 14m 34s{color} | {color:red} hadoop-yarn-applications-distributedshell in the patch failed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 4m 34s{color} | {color:green} hadoop-mapreduce-client-core in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 8m 56s{color} | {color:green} hadoop-mapreduce-client-app in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red}115m 38s{color} | {color:red} hadoop-mapreduce-client-jobclient in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 38s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}282m 24s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.applications.distributedshell.TestDistributedShell | | | hadoop.mapred.TestMRTimelineEventHandling | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:9560f25 | | JIRA Issue | YARN-5792 | | JIRA Patch URL |
[jira] [Commented] (YARN-5600) Add a parameter to ContainerLaunchContext to emulate yarn.nodemanager.delete.debug-delay-sec on a per-application basis
[ https://issues.apache.org/jira/browse/YARN-5600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15654874#comment-15654874 ] Miklos Szegedi commented on YARN-5600: -- The build error seems to be unrelated to the change: Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.3.1:exec (npm install) on project hadoop-yarn-ui > Add a parameter to ContainerLaunchContext to emulate > yarn.nodemanager.delete.debug-delay-sec on a per-application basis > --- > > Key: YARN-5600 > URL: https://issues.apache.org/jira/browse/YARN-5600 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 3.0.0-alpha1 >Reporter: Daniel Templeton >Assignee: Miklos Szegedi > Labels: oct16-medium > Attachments: YARN-5600.000.patch, YARN-5600.001.patch, > YARN-5600.002.patch, YARN-5600.003.patch, YARN-5600.004.patch, > YARN-5600.005.patch, YARN-5600.006.patch > > > To make debugging application launch failures simpler, I'd like to add a > parameter to the CLC to allow an application owner to request delayed > deletion of the application's launch artifacts. > This JIRA solves largely the same problem as YARN-5599, but for cases where > ATS is not in use, e.g. branch-2. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-3955) Support for priority ACLs in CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-3955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15654866#comment-15654866 ] Hadoop QA commented on YARN-3955: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 18s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 9s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 38s{color} | {color:green} trunk passed {color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red} 6m 0s{color} | {color:red} hadoop-yarn in trunk failed. {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 52s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 56s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 9s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 49s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 18s{color} | {color:green} trunk passed {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 11s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red} 4m 44s{color} | {color:red} hadoop-yarn in the patch failed. {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 4m 44s{color} | {color:red} hadoop-yarn in the patch failed. {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 52s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch generated 26 new + 365 unchanged - 1 fixed = 391 total (was 366) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 36s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 49s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 2s{color} | {color:green} The patch has no ill-formed XML file. 
{color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 35s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager generated 2 new + 0 unchanged - 0 fixed = 2 total (was 0) {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 15s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 56s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 39m 18s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 38s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 87m 17s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager | | | org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.PriorityACLConfiguration.createACLStringPerPriority(HashMap, Map) invokes inefficient new String() constructor At PriorityACLConfiguration.java:String() constructor At PriorityACLConfiguration.java:[line 137] | | | Call to StringBuilder.equals(String) in org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.PriorityACLConfiguration.createACLStringForPriority(Map, Priority, String, PriorityACLConfiguration$PriorityACLConfig) At PriorityACLConfiguration.java:Priority, String, PriorityACLConfiguration$PriorityACLConfig) At PriorityACLConfiguration.java:[line 223] | | Failed junit
[jira] [Commented] (YARN-5600) Add a parameter to ContainerLaunchContext to emulate yarn.nodemanager.delete.debug-delay-sec on a per-application basis
[ https://issues.apache.org/jira/browse/YARN-5600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15654842#comment-15654842 ] Hadoop QA commented on YARN-5600: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 15s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 6 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 53s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 4s{color} | {color:green} trunk passed {color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red} 5m 17s{color} | {color:red} hadoop-yarn in trunk failed. {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 59s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 57s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 3s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 22s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 29s{color} | {color:green} trunk passed {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 12s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 19s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red} 3m 42s{color} | {color:red} hadoop-yarn in the patch failed. {color} | | {color:red}-1{color} | {color:red} cc {color} | {color:red} 3m 42s{color} | {color:red} hadoop-yarn in the patch failed. {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 3m 42s{color} | {color:red} hadoop-yarn in the patch failed. {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 53s{color} | {color:green} hadoop-yarn-project/hadoop-yarn: The patch generated 0 new + 577 unchanged - 7 fixed = 577 total (was 584) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 47s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 1s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 23s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 26s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 34s{color} | {color:green} hadoop-yarn-api in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 24s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 14m 45s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 38s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 64m 21s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:e809691 | | JIRA Issue | YARN-5600 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12838405/YARN-5600.006.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle cc | | uname | Linux 7c38b041e57b 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 89354f0 | | Default Java | 1.8.0_111 | | compile | https://builds.apache.org/job/PreCommit-YARN-Build/13862/artifact/patchprocess/branch-compile-hadoop-yarn-project_hadoop-yarn.txt | | findbugs | v3.0.0 | | compile |
[jira] [Commented] (YARN-5634) Simplify initialization/use of RouterPolicy via a RouterPolicyFacade
[ https://issues.apache.org/jira/browse/YARN-5634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15654839#comment-15654839 ] Carlo Curino commented on YARN-5634: [~subru] thanks for the prompt feedback. I addressed most of your points, and discuss the rest below. {{YarnConfiguration}} * I think having initialization params for the policy will be useful (while the current choice of default does not strictly need params, I don't like hardcoding a null or empty buffer there, as a change of default should be limited to changes in YarnConfiguration). {{RouterPolicyFacade}} * I am explicitly avoiding Charset.defaultCharset(), as this might depend on OS/VM configuration, and since the serialize/deserialize will happen on separate machines, I want to avoid misaligned defaults which could mean we need to redeploy code to fix bugs on a live cluster (for example if we go from a UniformRandomRouterPolicy to a WeightedRandomRouterPolicy and we realize the VMs have different defaultCharset). A small sketch follows below. * I don't think the *if* rewrite proposed matches the semantics we need. We need to initialize both in case a queue has not been cached before, or if the cached copy is different. Am I missing something? {{TestFederationPolicyFacade}} * I am not sure I follow what you are proposing (as it is minor, I will ask you offline). > Simplify initialization/use of RouterPolicy via a RouterPolicyFacade > - > > Key: YARN-5634 > URL: https://issues.apache.org/jira/browse/YARN-5634 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Affects Versions: YARN-2915 >Reporter: Carlo Curino >Assignee: Carlo Curino > Labels: oct16-medium > Attachments: YARN-5634-YARN-2915.01.patch, > YARN-5634-YARN-2915.02.patch > > > The current set of policies requires some machinery to (re)initialize based on > changes in the SubClusterPolicyConfiguration. This JIRA tracks the effort to > hide much of that behind a simple RouterPolicyFacade, making lifecycle and > usage of the policies easier for consumers. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
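A minimal sketch of the charset reasoning above, with hypothetical method names: pinning UTF-8 explicitly keeps serialize and deserialize consistent across JVMs whose Charset.defaultCharset() may disagree.
{code}
import java.nio.charset.StandardCharsets;

class PolicySerdeSketch {
  byte[] serialize(String policyParams) {
    return policyParams.getBytes(StandardCharsets.UTF_8);
  }

  String deserialize(byte[] bytes) {
    return new String(bytes, StandardCharsets.UTF_8);
  }
}
{code}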
[jira] [Comment Edited] (YARN-5694) ZKRMStateStore should always start its verification thread to prevent accidental state store corruption
[ https://issues.apache.org/jira/browse/YARN-5694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15654697#comment-15654697 ] Jian He edited comment on YARN-5694 at 11/10/16 6:09 PM: - bq. If we agree that it's bad to have two RMs accidentally sharing the same state store, If it's in non-HA mode, there is currently no protection in the ZKStore preventing two RMs from sharing the same store; all the ACL-setting code is only used in HA mode. Essentially, with the current patch, I doubt it will get a NoAuthException in the verifyThread without the user changing the ACLs manually, so the handling code in this patch will not be triggered with the default settings. Maybe I'm wrong; you may try on a real cluster.. Also, I think setting ACLs for the RM is not a required step for deploying a non-HA cluster; forcing this to be set is a behavior change.. bq. why would you not want to catch the issue as early as possible? My point is: first, will this code work, as mentioned above? Second, if there's no difference in terms of functionality, why do I need to start a thread pinging ZK continuously every few seconds? Of course, I might be missing something; you may clarify more... Also, is the use-case mainly about two clusters sharing the same zk-store with the same path? IMHO, this is not a primary use-case to solve; if the user mis-configured it, it's the user's fault. There are many other places that can go wrong, e.g. if two clusters configure the same path for anything on HDFS. If the use-case is about two RMs sharing the same zk-path in the same cluster in non-HA mode: I think in non-HA mode the invalid RM will not take workload in the first place; clients and NMs will not switch to that RM if HA is not configured properly. was (Author: jianhe): bq. If we agree that it's bad to have two RMs accidentally sharing the same state store, If it's in non-HA mode, there is currently no protection in the ZKStore preventing two RMs from sharing the same store; all the ACL-setting code is only used in HA mode. Essentially, with the current patch, I doubt it will get a NoAuthException in the verifyThread without the user changing the ACLs manually, so the handling code in this patch will not be triggered with the default settings. Maybe I'm wrong; you may try on a real cluster.. bq. why would you not want to catch the issue as early as possible? My point is: first, will this code work, as mentioned above? Second, if there's no difference in terms of functionality, why do I need to start a thread pinging ZK continuously every few seconds? Of course, I might be missing something; you may clarify more... Also, is the use-case mainly about two clusters sharing the same zk-store with the same path? IMHO, this is not a primary use-case to solve; if the user mis-configured it, it's the user's fault. There are many other places that can go wrong, e.g. if two clusters configure the same path for anything on HDFS. If the use-case is about two RMs sharing the same zk-path in the same cluster in non-HA mode: I think in non-HA mode the invalid RM will not take workload in the first place; clients and NMs will not switch to that RM if HA is not configured properly.
> ZKRMStateStore should always start its verification thread to prevent > accidental state store corruption > --- > > Key: YARN-5694 > URL: https://issues.apache.org/jira/browse/YARN-5694 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 3.0.0-alpha1 >Reporter: Daniel Templeton >Assignee: Daniel Templeton >Priority: Critical > Labels: oct16-medium > Attachments: YARN-5694.001.patch, YARN-5694.002.patch, > YARN-5694.003.patch, YARN-5694.004.patch, YARN-5694.004.patch, > YARN-5694.005.patch, YARN-5694.006.patch, YARN-5694.007.patch, > YARN-5694.branch-2.7.001.patch, YARN-5694.branch-2.7.002.patch > > > There are two cases. In branch-2.7, the > {{ZKRMStateStore.VerifyActiveStatusThread}} is always started, even when > using embedded or Curator failover. In branch-2.8, the > {{ZKRMStateStore.VerifyActiveStatusThread}} is only started when HA is > disabled, which makes no sense. Based on the JIRA that introduced that > change (YARN-4559), I believe the intent was to start it only when embedded > failover is disabled. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5694) ZKRMStateStore should always start its verification thread to prevent accidental state store corruption
[ https://issues.apache.org/jira/browse/YARN-5694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15654697#comment-15654697 ] Jian He commented on YARN-5694: --- bq. If we agree that it's bad to have two RMs accidentally sharing the same state store, If it's in non-HA mode, there is currently no protection in the ZKStore preventing two RMs from sharing the same store; all the ACL-setting code is only used in HA mode. Essentially, with the current patch, I doubt it will get a NoAuthException in the verifyThread without the user changing the ACLs manually, so the handling code in this patch will not be triggered with the default settings. Maybe I'm wrong; you may try on a real cluster.. bq. why would you not want to catch the issue as early as possible? My point is: first, will this code work, as mentioned above? Second, if there's no difference in terms of functionality, why do I need to start a thread pinging ZK continuously every few seconds? Of course, I might be missing something; you may clarify more... Also, is the use-case mainly about two clusters sharing the same zk-store with the same path? IMHO, this is not a primary use-case to solve; if the user mis-configured it, it's the user's fault. There are many other places that can go wrong, e.g. if two clusters configure the same path for anything on HDFS. If the use-case is about two RMs sharing the same zk-path in the same cluster in non-HA mode: I think in non-HA mode the invalid RM will not take workload in the first place; clients and NMs will not switch to that RM if HA is not configured properly. > ZKRMStateStore should always start its verification thread to prevent > accidental state store corruption > --- > > Key: YARN-5694 > URL: https://issues.apache.org/jira/browse/YARN-5694 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 3.0.0-alpha1 >Reporter: Daniel Templeton >Assignee: Daniel Templeton >Priority: Critical > Labels: oct16-medium > Attachments: YARN-5694.001.patch, YARN-5694.002.patch, > YARN-5694.003.patch, YARN-5694.004.patch, YARN-5694.004.patch, > YARN-5694.005.patch, YARN-5694.006.patch, YARN-5694.007.patch, > YARN-5694.branch-2.7.001.patch, YARN-5694.branch-2.7.002.patch > > > There are two cases. In branch-2.7, the > {{ZKRMStateStore.VerifyActiveStatusThread}} is always started, even when > using embedded or Curator failover. In branch-2.8, the > {{ZKRMStateStore.VerifyActiveStatusThread}} is only started when HA is > disabled, which makes no sense. Based on the JIRA that introduced that > change (YARN-4559), I believe the intent was to start it only when embedded > failover is disabled. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
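The verification thread under discussion boils down to the following pattern. This is a minimal sketch using the raw ZooKeeper client; the zkStoreRoot path and the transitionToStandby handler are illustrative stand-ins rather than RM internals, and the real ZKRMStateStore has its own retry and fencing machinery. A daemon thread periodically re-reads the store's root znode; a NoAuthException, which fencing produces once another RM claims exclusive ACLs, triggers a transition out of the active state.
{code}
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;

// Sketch of a verify-active-status loop; names are illustrative.
class VerifyActiveStatusThread extends Thread {
  private final ZooKeeper zk;
  private final String zkStoreRoot;
  private volatile boolean running = true;

  VerifyActiveStatusThread(ZooKeeper zk, String zkStoreRoot) {
    this.zk = zk;
    this.zkStoreRoot = zkStoreRoot;
    setDaemon(true);
  }

  @Override
  public void run() {
    while (running) {
      try {
        // Any read suffices: it fails with NoAuthException once another
        // RM has fenced the store by replacing the znode ACLs.
        zk.getData(zkStoreRoot, false, null);
        Thread.sleep(1000L);
      } catch (KeeperException.NoAuthException e) {
        transitionToStandby(); // react as early as possible
        return;
      } catch (KeeperException | InterruptedException e) {
        // connection loss etc. is left to the store's retry logic
      }
    }
  }

  private void transitionToStandby() { /* hypothetical handler */ }
}
{code}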
[jira] [Commented] (YARN-5834) TestNodeStatusUpdater.testNMRMConnectionConf compares nodemanager wait time to the incorrect value
[ https://issues.apache.org/jira/browse/YARN-5834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15654682#comment-15654682 ] Miklos Szegedi commented on YARN-5834: -- +1 non-binding. Thank you, [~lichangleo]! The change looks good to me. > TestNodeStatusUpdater.testNMRMConnectionConf compares nodemanager wait time > to the incorrect value > -- > > Key: YARN-5834 > URL: https://issues.apache.org/jira/browse/YARN-5834 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Miklos Szegedi >Assignee: Chang Li >Priority: Minor > Attachments: YARN-5834-branch-2.001.patch > > > The function is TestNodeStatusUpdater#testNMRMConnectionConf() > I believe the connectionWaitMs references below were meant to be > nmRmConnectionWaitMs. > {code} > conf.setLong(YarnConfiguration.NM_RESOURCEMANAGER_CONNECT_MAX_WAIT_MS, > nmRmConnectionWaitMs); > conf.setLong(YarnConfiguration.RESOURCEMANAGER_CONNECT_MAX_WAIT_MS, > connectionWaitMs); > ... > long t = System.currentTimeMillis(); > long duration = t - waitStartTime; > boolean waitTimeValid = (duration >= nmRmConnectionWaitMs) && > (duration < (*connectionWaitMs* + delta)); > if(!waitTimeValid) { > // throw exception if NM doesn't retry long enough > throw new Exception("NM should have tried re-connecting to RM during > " + > "period of at least " + *connectionWaitMs* + " ms, but " + > "stopped retrying within " + (*connectionWaitMs* + delta) + > " ms: " + e, e); > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
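For reference, the proposed fix presumably amounts to swapping the constant in both the validity check and the exception message; a sketch of the corrected fragment, using the same variables as the quoted test code (this is the reporter's reading, not a reviewed patch):
{code}
long t = System.currentTimeMillis();
long duration = t - waitStartTime;
// Compare against the NM->RM connection wait, which is what this test
// actually configures, not the generic RM connection wait.
boolean waitTimeValid = (duration >= nmRmConnectionWaitMs)
    && (duration < (nmRmConnectionWaitMs + delta));
if (!waitTimeValid) {
  // throw exception if NM doesn't retry long enough
  throw new Exception("NM should have tried re-connecting to RM during " +
      "period of at least " + nmRmConnectionWaitMs + " ms, but " +
      "stopped retrying within " + (nmRmConnectionWaitMs + delta) +
      " ms: " + e, e);
}
{code}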
[jira] [Updated] (YARN-5600) Add a parameter to ContainerLaunchContext to emulate yarn.nodemanager.delete.debug-delay-sec on a per-application basis
[ https://issues.apache.org/jira/browse/YARN-5600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Miklos Szegedi updated YARN-5600: - Attachment: YARN-5600.006.patch Fixing checkstyle issue > Add a parameter to ContainerLaunchContext to emulate > yarn.nodemanager.delete.debug-delay-sec on a per-application basis > --- > > Key: YARN-5600 > URL: https://issues.apache.org/jira/browse/YARN-5600 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 3.0.0-alpha1 >Reporter: Daniel Templeton >Assignee: Miklos Szegedi > Labels: oct16-medium > Attachments: YARN-5600.000.patch, YARN-5600.001.patch, > YARN-5600.002.patch, YARN-5600.003.patch, YARN-5600.004.patch, > YARN-5600.005.patch, YARN-5600.006.patch > > > To make debugging application launch failures simpler, I'd like to add a > parameter to the CLC to allow an application owner to request delayed > deletion of the application's launch artifacts. > This JIRA solves largely the same problem as YARN-5599, but for cases where > ATS is not in use, e.g. branch-2. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5865) Retrospect updateApplicationPriority api to handle state store exception in align with YARN-5611
[ https://issues.apache.org/jira/browse/YARN-5865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-5865: -- Issue Type: Sub-task (was: Bug) Parent: YARN-1963 > Retrospect updateApplicationPriority api to handle state store exception in > align with YARN-5611 > > > Key: YARN-5865 > URL: https://issues.apache.org/jira/browse/YARN-5865 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Sunil G >Assignee: Sunil G > Attachments: YARN-5865.0001.patch > > > Post YARN-5611, revisit dynamic update of application priority logic with > respect to state store error handling. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-3955) Support for priority ACLs in CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-3955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-3955: -- Attachment: YARN-3955.0001.patch Thanks [~jianhe] for the comments. bq.I think readLock is not needed, the field itself is not changing {{priorityACLs}} could be changed during reinitialize. Do we need to consider this point? Also, we may add REST-based support to add ACLs at runtime. If reinitialize is fine, I could remove the lock and add it back when the REST work happens. Thoughts? > Support for priority ACLs in CapacityScheduler > -- > > Key: YARN-3955 > URL: https://issues.apache.org/jira/browse/YARN-3955 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Affects Versions: 2.7.1 >Reporter: Sunil G >Assignee: Sunil G > Attachments: ApplicationPriority-ACL.pdf, > ApplicationPriority-ACLs-v2.pdf, YARN-3955.0001.patch, YARN-3955.v0.patch, > YARN-3955.v1.patch, YARN-3955.wip1.patch > > > Support will be added for User-level access permission to use different > application-priorities. This is to avoid situations where all users try > running max priority in the cluster and thus degrading the value of > priorities. > Access Control Lists can be set per priority level within each queue. Below > is an example configuration that can be added in the capacity scheduler > configuration > file for each Queue level. > yarn.scheduler.capacity.root...acl=user1,user2 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
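The locking trade-off being discussed follows the usual pattern for a field that is read on the hot path and swapped on reinitialize. Below is a minimal generic sketch, assuming illustrative names (PriorityAclStore and its methods are not actual CapacityScheduler code):
{code}
import java.util.Collections;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class PriorityAclStore {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private Map<String, String> priorityACLs = Collections.emptyMap();

  // Hot path: many concurrent readers during submission-time ACL checks.
  String getAcl(String priority) {
    lock.readLock().lock();
    try {
      return priorityACLs.get(priority);
    } finally {
      lock.readLock().unlock();
    }
  }

  // Cold path: reinitialize (or a future REST update) swaps the map.
  void reinitialize(Map<String, String> newAcls) {
    lock.writeLock().lock();
    try {
      priorityACLs = newAcls;
    } finally {
      lock.writeLock().unlock();
    }
  }
}
{code}
If reinitialize only ever swaps in a fresh immutable map, a volatile reference would arguably suffice; the read lock buys consistency if the ACL structure is ever mutated in place or read in multiple steps.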
[jira] [Commented] (YARN-5825) ProportionalPreemptionalPolicy could use readLock over LeafQueue instead of synchronized block
[ https://issues.apache.org/jira/browse/YARN-5825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15654471#comment-15654471 ] Hadoop QA commented on YARN-5825: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 20s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 55s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 39s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 25s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 50s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 19s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 8s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 40s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 38s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 22s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 3 new + 114 unchanged - 0 fixed = 117 total (was 114) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 18s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 23s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 42m 20s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 16s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 60m 20s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicyIntraQueue | | | hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy | | | hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicyForReservedContainers | | | hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicyForNodePartitions | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:e809691 | | JIRA Issue | YARN-5825 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12838367/YARN-5825.0001.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 4537a22cc49e 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / ca68f9c | | Default Java | 1.8.0_101 | | findbugs | v3.0.0 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/13859/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | | unit |
[jira] [Commented] (YARN-5867) DirectoryCollection#checkDirs can cause incorrect permission of nmlocal dir
[ https://issues.apache.org/jira/browse/YARN-5867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15654405#comment-15654405 ] Jason Lowe commented on YARN-5867: -- I'm curious how the top-level local directory was deleted in the first place. It sounds like an incorrect setup, like tmpwatch or something was coming along and blowing away NM directories. Arbitrary removal of NM directories while it is running is going to cause container failures at a minimum. I'm somewhat torn on this. Part of me thinks it would be best to treat this case like a bad disk, since something _clearly_ is wrong when top-level directories go missing out of the blue. Either admins setup something wrong on the cluster or the filesystem is having difficulty persisting data. Both are bad. Someone should really look into it, otherwise if we keep silently trying to fix it up after the fact then we just move the issue to debugging mysteriously failing containers. However I can see the benefits of not forcing an admin to intervene, as it can hobble along automatically (with degraded performance due to reruns of mysteriously crashing containers). If we do go with solution 1, we need to log an error when we detect it. > DirectoryCollection#checkDirs can cause incorrect permission of nmlocal dir > --- > > Key: YARN-5867 > URL: https://issues.apache.org/jira/browse/YARN-5867 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt > > Steps to reproduce > === > # Set umask to 077 for user > # Start nodemanager with nmlocal dir configured > nmlocal dir permission is *755* > {{LocalDirsHandlerService#serviceInit}} > {code} > FsPermission perm = new FsPermission((short)0755); > boolean createSucceeded = localDirs.createNonExistentDirs(localFs, perm); > createSucceeded &= logDirs.createNonExistentDirs(localFs, perm); > {code} > # After startup delete the nmlocal dir and wait for {{MonitoringTimerTask}} > to run (simulation using delete) > # Now check the permission of {{nmlocal dir}} will be *700* > *Root Cause* > {{DirectoryCollection#testDirs}} checks as following > {code} > // create a random dir to make sure fs isn't in read-only mode > verifyDirUsingMkdir(testDir); > {code} > which cause a new Random directory to be create in {{localdir}} using > {{DiskChecker.checkDir(dir)}} -> {{!mkdirsWithExistsCheck(dir)}} causing the > nmlocal dir to be created with wrong permission. *700* > Few application fail to container launch due to permission denied. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
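The root cause reduces to the fact that a plain java.io.File.mkdirs(), which DiskChecker's mkdirsWithExistsCheck ends up calling, applies the process umask, so a 077 umask yields 700. A small local-filesystem sketch of that behavior and of re-applying the intended permission explicitly (an illustration of the permission semantics, not the proposed fix; the demo path is made up):
{code}
import java.io.File;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

public class UmaskDemo {
  public static void main(String[] args) throws Exception {
    File dir = new File("/tmp/nm-local-dir-demo");

    // What DiskChecker effectively does: a plain mkdirs(), which takes
    // the process umask. With umask 077 this yields drwx------ (700).
    dir.mkdirs();

    // Re-applying the intended permission explicitly, as the NM does at
    // startup via createNonExistentDirs(), restores 755 regardless of umask.
    FileSystem localFs = FileSystem.getLocal(new Configuration());
    localFs.setPermission(new Path(dir.getAbsolutePath()),
        new FsPermission((short) 0755));
  }
}
{code}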
[jira] [Commented] (YARN-5651) Changes to NMStateStore to persist reinitialization and rollback state
[ https://issues.apache.org/jira/browse/YARN-5651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15654347#comment-15654347 ] Arun Suresh commented on YARN-5651: --- [~jianhe], Wondering what the right approach for this is. Currently, in the normal container startup flow, if the NM recovery happens *after* the container start request comes in (the RecoveredContainerStatus == REQUESTED) but *before* the container is launched (at which point RecoveredContainerStatus == LAUNCHED), the container is just reported back as killed. If the Container has been launched and the container is active, then the ContainerImpl's internal state is regenerated using the StartContainerRequest. I was thinking, similarly, if a re-initialization request (re-init / restart or rollback) arrives for a container, we just mark in the stateStore as RecoveredContainerStatus == RE_INITIALIZING. If the NM restarts and recovers before the container has finished re-initializing, then we just report the container as killed. If the Container has completed the relaunch, I proposed we: # we can replace the ContainerImpl's internal state (launchContext, ResourceSet etc.). We already do this now. # we also replace the stored StartContainerRequest object, stored in the db, with a new StartContainerRequest which we create from the ContainerImpl's internal state. This way, there is no real need to actually store the ReInitializeContainerRequest object anywhere. Thoughts ? > Changes to NMStateStore to persist reinitialization and rollback state > -- > > Key: YARN-5651 > URL: https://issues.apache.org/jira/browse/YARN-5651 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Arun Suresh >Assignee: Arun Suresh > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
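As a reading aid, the proposal above reduces to roughly the following recovery decision. This is a pseudocode-level sketch: RecoveredContainerStatus values REQUESTED and LAUNCHED exist today, RE_INITIALIZING is the proposed addition, and the types and method names here are illustrative rather than actual NM code.
{code}
// Self-contained sketch; the real NM recovery types differ.
enum RecoveredContainerStatus { REQUESTED, LAUNCHED, RE_INITIALIZING }

class RecoverySketch {
  void recover(RecoveredContainerStatus status, Object storedStartRequest) {
    switch (status) {
      case REQUESTED:        // start request stored, never launched
      case RE_INITIALIZING:  // re-init requested, relaunch not finished
        // Report the container back as killed, as is done for REQUESTED today.
        killContainer();
        break;
      case LAUNCHED:
        // Rebuild ContainerImpl from the stored StartContainerRequest; after
        // a completed re-init that stored request was rewritten from the new
        // internal state, so no ReInitializeContainerRequest is persisted.
        relaunchFromStartRequest(storedStartRequest);
        break;
    }
  }

  private void killContainer() { /* illustrative */ }
  private void relaunchFromStartRequest(Object req) { /* illustrative */ }
}
{code}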
[jira] [Commented] (YARN-5792) adopt the id prefix for YARN, MR, and DS entities
[ https://issues.apache.org/jira/browse/YARN-5792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15654321#comment-15654321 ] Varun Saxena commented on YARN-5792: The patch does the following. # Uses inverse of container start time to publish id prefix for container entities. As container start time is not stored in the NM state store, support was added to store it as well. # Uses inverse of task start time to publish id prefix for task entities. # Uses inverse of task attempt start time to publish id prefix for task attempt entities. # Uses inverse of DS container start time to publish id prefix for distributed shell container entities. # Uses inverse of attempt start time to publish id prefix for app attempt entities. We can potentially use inverse of the attempt id bit of ApplicationAttemptId as well here. Also, app registered time can be used. # Uses inverse of DS attempt start time to publish id prefix for DS attempt entities. We can potentially use inverse of the attempt id bit of ApplicationAttemptId as well here. # Uses inverse of the id bit of job id to publish id prefix for job entities. We can potentially use job start time here. For the last three points, we should reach a consensus on what to use. > adopt the id prefix for YARN, MR, and DS entities > - > > Key: YARN-5792 > URL: https://issues.apache.org/jira/browse/YARN-5792 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-5355 >Reporter: Sangjin Lee >Assignee: Varun Saxena > Attachments: YARN-5792-YARN-5355.01.patch > > > We introduced the entity id prefix to support flexible entity sorting > (YARN-5715). We should adopt the id prefix for YARN entities, MR entities, > and DS entities to take advantage of the id prefix. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
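The "inverse of start time" trick exploits the ascending id-prefix ordering introduced by YARN-5715: subtracting the timestamp from Long.MAX_VALUE makes newer entities sort first. A minimal sketch, assuming the setIdPrefix accessor added by YARN-5715 (the entity values here are made up):
{code}
import org.apache.hadoop.yarn.api.records.timelineservice.TimelineEntity;

public class IdPrefixSketch {
  public static void main(String[] args) {
    long containerStartTime = System.currentTimeMillis();

    TimelineEntity entity = new TimelineEntity();
    entity.setType("YARN_CONTAINER");
    entity.setId("container_e01_1478000000000_0001_01_000002");

    // Inverting the start time makes rows for newer containers sort ahead
    // of older ones under the ascending id-prefix ordering.
    entity.setIdPrefix(Long.MAX_VALUE - containerStartTime);
  }
}
{code}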
[jira] [Commented] (YARN-5867) DirectoryCollection#checkDirs can cause incorrect permission of nmlocal dir
[ https://issues.apache.org/jira/browse/YARN-5867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15654312#comment-15654312 ] Bibin A Chundatt commented on YARN-5867: cc/ [~jlowe] and [~vvasudev] . Could you please share your thoughts too? > DirectoryCollection#checkDirs can cause incorrect permission of nmlocal dir > --- > > Key: YARN-5867 > URL: https://issues.apache.org/jira/browse/YARN-5867 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt > > Steps to reproduce > === > # Set umask to 077 for user > # Start nodemanager with nmlocal dir configured > nmlocal dir permission is *755* > {{LocalDirsHandlerService#serviceInit}} > {code} > FsPermission perm = new FsPermission((short)0755); > boolean createSucceeded = localDirs.createNonExistentDirs(localFs, perm); > createSucceeded &= logDirs.createNonExistentDirs(localFs, perm); > {code} > # After startup delete the nmlocal dir and wait for {{MonitoringTimerTask}} > to run (simulation using delete) > # Now check the permission of {{nmlocal dir}} will be *700* > *Root Cause* > {{DirectoryCollection#testDirs}} checks as following > {code} > // create a random dir to make sure fs isn't in read-only mode > verifyDirUsingMkdir(testDir); > {code} > which cause a new Random directory to be create in {{localdir}} using > {{DiskChecker.checkDir(dir)}} -> {{!mkdirsWithExistsCheck(dir)}} causing the > nmlocal dir to be created with wrong permission. *700* > Few application fail to container launch due to permission denied. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5792) adopt the id prefix for YARN, MR, and DS entities
[ https://issues.apache.org/jira/browse/YARN-5792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-5792: --- Attachment: YARN-5792-YARN-5355.01.patch > adopt the id prefix for YARN, MR, and DS entities > - > > Key: YARN-5792 > URL: https://issues.apache.org/jira/browse/YARN-5792 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-5355 >Reporter: Sangjin Lee >Assignee: Varun Saxena > Attachments: YARN-5792-YARN-5355.01.patch > > > We introduced the entity id prefix to support flexible entity sorting > (YARN-5715). We should adopt the id prefix for YARN entities, MR entities, > and DS entities to take advantage of the id prefix. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5694) ZKRMStateStore should always start its verification thread to prevent accidental state store corruption
[ https://issues.apache.org/jira/browse/YARN-5694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15654259#comment-15654259 ] Daniel Templeton commented on YARN-5694: Yes, but if the RM isn't in HA mode, the fencing is quietly ignored, which is also something I should address in the next version of this patch. The reason to have the thread always run is so that we react earlier. If we agree that it's bad to have two RMs accidentally sharing the same state store, why would you not want to catch the issue as early as possible? > ZKRMStateStore should always start its verification thread to prevent > accidental state store corruption > --- > > Key: YARN-5694 > URL: https://issues.apache.org/jira/browse/YARN-5694 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 3.0.0-alpha1 >Reporter: Daniel Templeton >Assignee: Daniel Templeton >Priority: Critical > Labels: oct16-medium > Attachments: YARN-5694.001.patch, YARN-5694.002.patch, > YARN-5694.003.patch, YARN-5694.004.patch, YARN-5694.004.patch, > YARN-5694.005.patch, YARN-5694.006.patch, YARN-5694.007.patch, > YARN-5694.branch-2.7.001.patch, YARN-5694.branch-2.7.002.patch > > > There are two cases. In branch-2.7, the > {{ZKRMStateStore.VerifyActiveStatusThread}} is always started, even when > using embedded or Curator failover. In branch-2.8, the > {{ZKRMStateStore.VerifyActiveStatusThread}} is only started when HA is > disabled, which makes no sense. Based on the JIRA that introduced that > change (YARN-4559), I believe the intent was to start it only when embedded > failover is disabled. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-5867) DirectoryCollection#checkDirs can cause incorrect permission of nmlocal dir
[ https://issues.apache.org/jira/browse/YARN-5867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15653928#comment-15653928 ] Bibin A Chundatt edited comment on YARN-5867 at 11/10/16 2:28 PM: -- [~naganarasimha...@apache.org] Not related to appcache.. This JIRA is for the configured nmlocal dir's permission. {quote} User with which NM is run ? {quote} The NM is started as a user whose umask is 077. was (Author: bibinchundatt): [~naganarasimha...@apache.org] Not related to appcache.. This is for the root directory, i.e. the configured nmlocal dir. {quote} User with which NM is run ? {quote} The NM is started as a user whose umask is 077. > DirectoryCollection#checkDirs can cause incorrect permission of nmlocal dir > --- > > Key: YARN-5867 > URL: https://issues.apache.org/jira/browse/YARN-5867 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt > > Steps to reproduce > === > # Set umask to 077 for user > # Start nodemanager with nmlocal dir configured > nmlocal dir permission is *755* > {{LocalDirsHandlerService#serviceInit}} > {code} > FsPermission perm = new FsPermission((short)0755); > boolean createSucceeded = localDirs.createNonExistentDirs(localFs, perm); > createSucceeded &= logDirs.createNonExistentDirs(localFs, perm); > {code} > # After startup delete the nmlocal dir and wait for {{MonitoringTimerTask}} > to run (simulation using delete) > # Now check the permission of {{nmlocal dir}} will be *700* > *Root Cause* > {{DirectoryCollection#testDirs}} checks as following > {code} > // create a random dir to make sure fs isn't in read-only mode > verifyDirUsingMkdir(testDir); > {code} > which cause a new Random directory to be create in {{localdir}} using > {{DiskChecker.checkDir(dir)}} -> {{!mkdirsWithExistsCheck(dir)}} causing the > nmlocal dir to be created with wrong permission. *700* > Few application fail to container launch due to permission denied. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-5825) ProportionalPreemptionalPolicy could use readLock over LeafQueue instead of synchronized block
[ https://issues.apache.org/jira/browse/YARN-5825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15654156#comment-15654156 ] Sunil G edited comment on YARN-5825 at 11/10/16 2:19 PM: - Attaching an initial version of the patch. In PCPP#cloneQueues we use the abstract CSQueue object, so I had to add {{getReadLock}} as an API in the CSQueue interface to do the locking. I agree this is not the cleanest way; however, the alternative is not clean either. We would need to add the code below in PCPP if we want to remove {{getReadLock}} from the CSQueue interface {code} private ReentrantReadWriteLock.ReadLock getQueueReadLock(CSQueue curQueue) { if (curQueue instanceof ParentQueue) { return ((ParentQueue) curQueue).getReadLock(); } else if (curQueue instanceof LeafQueue) { return ((LeafQueue) curQueue).getReadLock(); } return null; } {code} [~leftnoteasy] [~jianhe] thoughts? was (Author: sunilg): Attaching an initial version of the patch. In PCPP#cloneQueues we use the abstract CSQueue object, so I had to add {{getReadLock}} as an API in the CSQueue interface to do the locking. I agree this is not the cleanest way; however, the alternative is not clean either. We would need to add the code below in PCPP if we want to remove {{getReadLock}} from the CSQueue interface {code} private ReentrantReadWriteLock.ReadLock getQueueReadLock(CSQueue curQueue) { if (curQueue instanceof ParentQueue) { return ((ParentQueue) curQueue).getReadLock(); } else if (curQueue instanceof LeafQueue) { return ((LeafQueue) curQueue).getReadLock(); } return null; } {code} [~leftnoteasy] thoughts? > ProportionalPreemptionalPolicy could use readLock over LeafQueue instead of > synchronized block > -- > > Key: YARN-5825 > URL: https://issues.apache.org/jira/browse/YARN-5825 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Reporter: Sunil G >Assignee: Sunil G > Attachments: YARN-5825.0001.patch > > > Currently in PCPP, {{synchronized (curQueue)}} is used in various places. > Such instances could be replaced with a read lock. Thank you [~jianhe] for > pointing out the same as comment > [here|https://issues.apache.org/jira/browse/YARN-2009?focusedCommentId=15626578=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15626578] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5825) ProportionalPreemptionalPolicy could use readLock over LeafQueue instead of synchronized block
[ https://issues.apache.org/jira/browse/YARN-5825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-5825: -- Attachment: YARN-5825.0001.patch Attaching an initial version of the patch. In PCPP#cloneQueues we use the abstract CSQueue object, so I had to add {{getReadLock}} as an API in the CSQueue interface to do the locking. I agree this is not the cleanest way; however, the alternative is not clean either. We would need to add the code below in PCPP if we want to remove {{getReadLock}} from the CSQueue interface {code} private ReentrantReadWriteLock.ReadLock getQueueReadLock(CSQueue curQueue) { if (curQueue instanceof ParentQueue) { return ((ParentQueue) curQueue).getReadLock(); } else if (curQueue instanceof LeafQueue) { return ((LeafQueue) curQueue).getReadLock(); } return null; } {code} [~leftnoteasy] thoughts? > ProportionalPreemptionalPolicy could use readLock over LeafQueue instead of > synchronized block > -- > > Key: YARN-5825 > URL: https://issues.apache.org/jira/browse/YARN-5825 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Reporter: Sunil G >Assignee: Sunil G > Attachments: YARN-5825.0001.patch > > > Currently in PCPP, {{synchronized (curQueue)}} is used in various places. > Such instances could be replaced with a read lock. Thank you [~jianhe] for > pointing out the same as comment > [here|https://issues.apache.org/jira/browse/YARN-2009?focusedCommentId=15626578=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15626578] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
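For context, the consumer side in PCPP would then look roughly like this; a sketch of how cloneQueues might hold the lock while snapshotting queue state, not the actual patch:
{code}
ReentrantReadWriteLock.ReadLock readLock = curQueue.getReadLock();
readLock.lock();
try {
  // Snapshot the queue state that preemption needs; the read lock keeps
  // reinitialize/allocation writers out without serializing other readers
  // the way synchronized (curQueue) did.
  float absUsed = curQueue.getQueueCapacities().getAbsoluteUsedCapacity();
  // ... clone children, partition capacities, etc.
} finally {
  readLock.unlock();
}
{code}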
[jira] [Updated] (YARN-5792) adopt the id prefix for YARN, MR, and DS entities
[ https://issues.apache.org/jira/browse/YARN-5792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-5792: --- Attachment: (was: YARN-5792-YARN-5355.01.patch) > adopt the id prefix for YARN, MR, and DS entities > - > > Key: YARN-5792 > URL: https://issues.apache.org/jira/browse/YARN-5792 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-5355 >Reporter: Sangjin Lee >Assignee: Varun Saxena > > We introduced the entity id prefix to support flexible entity sorting > (YARN-5715). We should adopt the id prefix for YARN entities, MR entities, > and DS entities to take advantage of the id prefix. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5792) adopt the id prefix for YARN, MR, and DS entities
[ https://issues.apache.org/jira/browse/YARN-5792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-5792: --- Attachment: YARN-5792-YARN-5355.01.patch > adopt the id prefix for YARN, MR, and DS entities > - > > Key: YARN-5792 > URL: https://issues.apache.org/jira/browse/YARN-5792 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-5355 >Reporter: Sangjin Lee >Assignee: Varun Saxena > Attachments: YARN-5792-YARN-5355.01.patch > > > We introduced the entity id prefix to support flexible entity sorting > (YARN-5715). We should adopt the id prefix for YARN entities, MR entities, > and DS entities to take advantage of the id prefix. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5865) Retrospect updateApplicationPriority api to handle state store exception in align with YARN-5611
[ https://issues.apache.org/jira/browse/YARN-5865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15654119#comment-15654119 ] Hadoop QA commented on YARN-5865: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 19s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 46s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 32s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 23s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 37s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 16s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 0s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 32s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 30s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 30s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 19s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 36s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 15s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 3s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 41m 5s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 19s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 56m 28s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:e809691 | | JIRA Issue | YARN-5865 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12838358/YARN-5865.0001.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 38af88531a3c 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / ca68f9c | | Default Java | 1.8.0_101 | | findbugs | v3.0.0 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/13858/testReport/ | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/13858/console | | Powered by | Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Retrospect updateApplicationPriority api to handle state store exception in > align with YARN-5611 > > > Key: YARN-5865 > URL: https://issues.apache.org/jira/browse/YARN-5865 > Project: Hadoop YARN > Issue Type: Bug >Reporter:
[jira] [Updated] (YARN-5865) Retrospect updateApplicationPriority api to handle state store exception in align with YARN-5611
[ https://issues.apache.org/jira/browse/YARN-5865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-5865: -- Attachment: YARN-5865.0001.patch Updating an initial version of the patch. cc/[~rohithsharma] and [~jianhe] > Retrospect updateApplicationPriority api to handle state store exception in > align with YARN-5611 > > > Key: YARN-5865 > URL: https://issues.apache.org/jira/browse/YARN-5865 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Sunil G >Assignee: Sunil G > Attachments: YARN-5865.0001.patch > > > Post YARN-5611, revisit dynamic update of application priority logic with > respect to state store error handling. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5867) DirectoryCollection#checkDirs can cause incorrect permission of nmlocal dir
[ https://issues.apache.org/jira/browse/YARN-5867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15653928#comment-15653928 ] Bibin A Chundatt commented on YARN-5867: [~naganarasimha...@apache.org] Not related to appcache.. This is for the root directory, i.e. the configured nmlocal dir. {quote} User with which NM is run ? {quote} The NM is started as a user whose umask is 077. > DirectoryCollection#checkDirs can cause incorrect permission of nmlocal dir > --- > > Key: YARN-5867 > URL: https://issues.apache.org/jira/browse/YARN-5867 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt > > Steps to reproduce > === > # Set umask to 077 for user > # Start nodemanager with nmlocal dir configured > nmlocal dir permission is *755* > {{LocalDirsHandlerService#serviceInit}} > {code} > FsPermission perm = new FsPermission((short)0755); > boolean createSucceeded = localDirs.createNonExistentDirs(localFs, perm); > createSucceeded &= logDirs.createNonExistentDirs(localFs, perm); > {code} > # After startup delete the nmlocal dir and wait for {{MonitoringTimerTask}} > to run (simulation using delete) > # Now check the permission of {{nmlocal dir}} will be *700* > *Root Cause* > {{DirectoryCollection#testDirs}} checks as following > {code} > // create a random dir to make sure fs isn't in read-only mode > verifyDirUsingMkdir(testDir); > {code} > which cause a new Random directory to be create in {{localdir}} using > {{DiskChecker.checkDir(dir)}} -> {{!mkdirsWithExistsCheck(dir)}} causing the > nmlocal dir to be created with wrong permission. *700* > Few application fail to container launch due to permission denied. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5867) DirectoryCollection#checkDirs can cause incorrect permission of nmlocal dir
[ https://issues.apache.org/jira/browse/YARN-5867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-5867: --- Description: Steps to reproduce === # Set umask to 077 for user # Start nodemanager with nmlocal dir configured nmlocal dir permission is *755* {{LocalDirsHandlerService#serviceInit}} {code} FsPermission perm = new FsPermission((short)0755); boolean createSucceeded = localDirs.createNonExistentDirs(localFs, perm); createSucceeded &= logDirs.createNonExistentDirs(localFs, perm); {code} # After startup delete the nmlocal dir and wait for {{MonitoringTimerTask}} to run (simulation using delete) # Now check the permission of {{nmlocal dir}} will be *700* *Root Cause* {{DirectoryCollection#testDirs}} checks as following {code} // create a random dir to make sure fs isn't in read-only mode verifyDirUsingMkdir(testDir); {code} which cause a new Random directory to be create in {{localdir}} using {{DiskChecker.checkDir(dir)}} -> {{!mkdirsWithExistsCheck(dir)}} causing the nmlocal dir to be created with wrong permission. *700* Few application fail to container launch due to permission denied. was: Steps to reproduce === # Set umask to 077 for user # Start nodemanager with nmlocal dir configured nmlocal dir permission is *755* {{LocalDirsHandlerService#serviceInit}} {code} FsPermission perm = new FsPermission((short)0755); boolean createSucceeded = localDirs.createNonExistentDirs(localFs, perm); createSucceeded &= logDirs.createNonExistentDirs(localFs, perm); {code} # After startup delete the nmlocal dir and wait for {{MonitoringTimerTask}} to run (simulation using delete) # Now check the permission of {{nmlocal dir}} will be *750* *Root Cause* {{DirectoryCollection#testDirs}} checks as following {code} // create a random dir to make sure fs isn't in read-only mode verifyDirUsingMkdir(testDir); {code} which cause a new Random directory to be create in {{localdir}} using {{DiskChecker.checkDir(dir)}} -> {{!mkdirsWithExistsCheck(dir)}} causing the nmlocal dir to be created with wrong permission. *750* Few application fail to container launch due to permission denied. > DirectoryCollection#checkDirs can cause incorrect permission of nmlocal dir > --- > > Key: YARN-5867 > URL: https://issues.apache.org/jira/browse/YARN-5867 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt > > Steps to reproduce > === > # Set umask to 077 for user > # Start nodemanager with nmlocal dir configured > nmlocal dir permission is *755* > {{LocalDirsHandlerService#serviceInit}} > {code} > FsPermission perm = new FsPermission((short)0755); > boolean createSucceeded = localDirs.createNonExistentDirs(localFs, perm); > createSucceeded &= logDirs.createNonExistentDirs(localFs, perm); > {code} > # After startup delete the nmlocal dir and wait for {{MonitoringTimerTask}} > to run (simulation using delete) > # Now check the permission of {{nmlocal dir}} will be *700* > *Root Cause* > {{DirectoryCollection#testDirs}} checks as following > {code} > // create a random dir to make sure fs isn't in read-only mode > verifyDirUsingMkdir(testDir); > {code} > which cause a new Random directory to be create in {{localdir}} using > {{DiskChecker.checkDir(dir)}} -> {{!mkdirsWithExistsCheck(dir)}} causing the > nmlocal dir to be created with wrong permission. *700* > Few application fail to container launch due to permission denied. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5545) App submit failure on queue with label when default queue partition capacity is zero
[ https://issues.apache.org/jira/browse/YARN-5545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15653916#comment-15653916 ] Sunil G commented on YARN-5545: --- +1 > App submit failure on queue with label when default queue partition capacity > is zero > > > Key: YARN-5545 > URL: https://issues.apache.org/jira/browse/YARN-5545 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt > Labels: oct16-medium > Attachments: YARN-5545.0001.patch, YARN-5545.0002.patch, > YARN-5545.0003.patch, YARN-5545.0005.patch, YARN-5545.0006.patch, > YARN-5545.0007.patch, YARN-5545.0008.patch, YARN-5545.004.patch, > capacity-scheduler.xml > > > Configure capacity scheduler > yarn.scheduler.capacity.root.default.capacity=0 > yarn.scheduler.capacity.root.queue1.accessible-node-labels.labelx.capacity=50 > yarn.scheduler.capacity.root.default.accessible-node-labels.labelx.capacity=50 > Submit application as below > ./yarn jar > ../share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.0.0-alpha2-SNAPSHOT-tests.jar > sleep -Dmapreduce.job.node-label-expression=labelx > -Dmapreduce.job.queuename=default -m 1 -r 1 -mt 1000 -rt 1 > {noformat} > 2016-08-21 18:21:31,375 INFO mapreduce.JobSubmitter: Cleaning up the staging > area /tmp/hadoop-yarn/staging/root/.staging/job_1471670113386_0001 > java.io.IOException: org.apache.hadoop.yarn.exceptions.YarnException: Failed > to submit application_1471670113386_0001 to YARN : > org.apache.hadoop.security.AccessControlException: Queue root.default already > has 0 applications, cannot accept submission of application: > application_1471670113386_0001 > at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:316) > at > org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:255) > at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1344) > at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1341) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1790) > at org.apache.hadoop.mapreduce.Job.submit(Job.java:1341) > at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1362) > at org.apache.hadoop.mapreduce.SleepJob.run(SleepJob.java:273) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76) > at org.apache.hadoop.mapreduce.SleepJob.main(SleepJob.java:194) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:497) > at > org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71) > at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144) > at > org.apache.hadoop.test.MapredTestDriver.run(MapredTestDriver.java:136) > at > org.apache.hadoop.test.MapredTestDriver.main(MapredTestDriver.java:144) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:497) > at org.apache.hadoop.util.RunJar.run(RunJar.java:239) > at org.apache.hadoop.util.RunJar.main(RunJar.java:153) > Caused by: 
org.apache.hadoop.yarn.exceptions.YarnException: Failed to submit > application_1471670113386_0001 to YARN : > org.apache.hadoop.security.AccessControlException: Queue root.default already > has 0 applications, cannot accept submission of application: > application_1471670113386_0001 > at > org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.submitApplication(YarnClientImpl.java:286) > at > org.apache.hadoop.mapred.ResourceMgrDelegate.submitApplication(ResourceMgrDelegate.java:296) > at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:301) > ... 25 more > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
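The confusing "already has 0 applications" message falls out of how a leaf queue derives its application limit from capacity when yarn.scheduler.capacity.<queue>.maximum-applications is not set explicitly: with a default-partition capacity of zero, the product below is zero and every submission is rejected. A simplified sketch of that derivation (the constants are illustrative defaults, not the exact CapacityScheduler code):
{code}
public class QueueLimitSketch {
  public static void main(String[] args) {
    // yarn.scheduler.capacity.maximum-applications (cluster-wide default)
    int maxSystemApps = 10000;
    // root.default.capacity = 0 for the default partition
    float absoluteCapacity = 0.0f;

    // When maximum-applications is not set per queue, the leaf queue limit
    // is derived from capacity, so a zero-capacity queue gets a limit of 0.
    int maxApplications = (int) (maxSystemApps * absoluteCapacity);

    int numApplications = 0; // queue is empty, yet submission is rejected
    if (numApplications >= maxApplications) {
      System.out.println("Queue root.default already has " + numApplications
          + " applications, cannot accept submission of application");
    }
  }
}
{code}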
[jira] [Commented] (YARN-5867) DirectoryCollection#checkDirs can cause incorrect permission of nmlocal dir
[ https://issues.apache.org/jira/browse/YARN-5867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15653904#comment-15653904 ] Naganarasimha G R commented on YARN-5867: - I think it's kind of related to YARN-5765 and YARN-5287, with restricted rights for the user. Not sure which user you are referring to here: the user with which the NM is run? > DirectoryCollection#checkDirs can cause incorrect permission of nmlocal dir > --- > > Key: YARN-5867 > URL: https://issues.apache.org/jira/browse/YARN-5867 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt > > Steps to reproduce > === > # Set umask to 077 for user > # Start nodemanager with nmlocal dir configured > nmlocal dir permission is *755* > {{LocalDirsHandlerService#serviceInit}} > {code} > FsPermission perm = new FsPermission((short)0755); > boolean createSucceeded = localDirs.createNonExistentDirs(localFs, perm); > createSucceeded &= logDirs.createNonExistentDirs(localFs, perm); > {code} > # After startup delete the nmlocal dir and wait for {{MonitoringTimerTask}} > to run (simulation using delete) > # Now check the permission of {{nmlocal dir}} will be *750* > *Root Cause* > {{DirectoryCollection#testDirs}} checks as following > {code} > // create a random dir to make sure fs isn't in read-only mode > verifyDirUsingMkdir(testDir); > {code} > which cause a new Random directory to be create in {{localdir}} using > {{DiskChecker.checkDir(dir)}} -> {{!mkdirsWithExistsCheck(dir)}} causing the > nmlocal dir to be created with wrong permission. *750* > Few application fail to container launch due to permission denied. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5867) DirectoryCollection#checkDirs can cause incorrect permission of nmlocal dir
[ https://issues.apache.org/jira/browse/YARN-5867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-5867: --- Description: Steps to reproduce === # Set umask to 077 for user # Start nodemanager with nmlocal dir configured nmlocal dir permission is *755* {{LocalDirsHandlerService#serviceInit}} {code} FsPermission perm = new FsPermission((short)0755); boolean createSucceeded = localDirs.createNonExistentDirs(localFs, perm); createSucceeded &= logDirs.createNonExistentDirs(localFs, perm); {code} # After startup delete the nmlocal dir and wait for {{MonitoringTimerTask}} to run (simulation using delete) # Now check the permission of {{nmlocal dir}} will be *750* *Root Cause* {{DirectoryCollection#testDirs}} checks as following {code} // create a random dir to make sure fs isn't in read-only mode verifyDirUsingMkdir(testDir); {code} which cause a new Random directory to be create in {{localdir}} using {{DiskChecker.checkDir(dir)}} -> {{!mkdirsWithExistsCheck(dir)}} causing the nmlocal dir to be created with wrong permission. *750* Few application fail to container launch due to permission denied. was: Steps to reproduce === # Set umask to 027 for user # Start nodemanager with nmlocal dir configured nmlocal dir permission is *755* {{LocalDirsHandlerService#serviceInit}} {code} FsPermission perm = new FsPermission((short)0755); boolean createSucceeded = localDirs.createNonExistentDirs(localFs, perm); createSucceeded &= logDirs.createNonExistentDirs(localFs, perm); {code} # After startup delete the nmlocal dir and wait for {{MonitoringTimerTask}} to run (simulation using delete) # Now check the permission of {{nmlocal dir}} will be *750* *Root Cause* {{DirectoryCollection#testDirs}} checks as following {code} // create a random dir to make sure fs isn't in read-only mode verifyDirUsingMkdir(testDir); {code} which cause a new Random directory to be create in {{localdir}} using {{DiskChecker.checkDir(dir)}} -> {{!mkdirsWithExistsCheck(dir)}} causing the nmlocal dir to be created with wrong permission. *750* Few application fail to container launch due to permission denied. > DirectoryCollection#checkDirs can cause incorrect permission of nmlocal dir > --- > > Key: YARN-5867 > URL: https://issues.apache.org/jira/browse/YARN-5867 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt > > Steps to reproduce > === > # Set umask to 077 for user > # Start nodemanager with nmlocal dir configured > nmlocal dir permission is *755* > {{LocalDirsHandlerService#serviceInit}} > {code} > FsPermission perm = new FsPermission((short)0755); > boolean createSucceeded = localDirs.createNonExistentDirs(localFs, perm); > createSucceeded &= logDirs.createNonExistentDirs(localFs, perm); > {code} > # After startup delete the nmlocal dir and wait for {{MonitoringTimerTask}} > to run (simulation using delete) > # Now check the permission of {{nmlocal dir}} will be *750* > *Root Cause* > {{DirectoryCollection#testDirs}} checks as following > {code} > // create a random dir to make sure fs isn't in read-only mode > verifyDirUsingMkdir(testDir); > {code} > which cause a new Random directory to be create in {{localdir}} using > {{DiskChecker.checkDir(dir)}} -> {{!mkdirsWithExistsCheck(dir)}} causing the > nmlocal dir to be created with wrong permission. *750* > Few application fail to container launch due to permission denied. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5545) App submit failure on queue with label when default queue partition capacity is zero
[ https://issues.apache.org/jira/browse/YARN-5545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15653872#comment-15653872 ] Naganarasimha G R commented on YARN-5545: - Thanks [~bibinchundatt]. +1, the latest patch looks good to me; if there are no further comments, I will commit it later today. > App submit failure on queue with label when default queue partition capacity > is zero > > > Key: YARN-5545 > URL: https://issues.apache.org/jira/browse/YARN-5545 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt > Labels: oct16-medium > Attachments: YARN-5545.0001.patch, YARN-5545.0002.patch, > YARN-5545.0003.patch, YARN-5545.0005.patch, YARN-5545.0006.patch, > YARN-5545.0007.patch, YARN-5545.0008.patch, YARN-5545.004.patch, > capacity-scheduler.xml > > > Configure the capacity scheduler > yarn.scheduler.capacity.root.default.capacity=0 > yarn.scheduler.capacity.root.queue1.accessible-node-labels.labelx.capacity=50 > yarn.scheduler.capacity.root.default.accessible-node-labels.labelx.capacity=50 > Submit an application as below > ./yarn jar > ../share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.0.0-alpha2-SNAPSHOT-tests.jar > sleep -Dmapreduce.job.node-label-expression=labelx > -Dmapreduce.job.queuename=default -m 1 -r 1 -mt 1000 -rt 1 > {noformat} > 2016-08-21 18:21:31,375 INFO mapreduce.JobSubmitter: Cleaning up the staging > area /tmp/hadoop-yarn/staging/root/.staging/job_1471670113386_0001 > java.io.IOException: org.apache.hadoop.yarn.exceptions.YarnException: Failed > to submit application_1471670113386_0001 to YARN : > org.apache.hadoop.security.AccessControlException: Queue root.default already > has 0 applications, cannot accept submission of application: > application_1471670113386_0001 > at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:316) > at > org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:255) > at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1344) > at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1341) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1790) > at org.apache.hadoop.mapreduce.Job.submit(Job.java:1341) > at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1362) > at org.apache.hadoop.mapreduce.SleepJob.run(SleepJob.java:273) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76) > at org.apache.hadoop.mapreduce.SleepJob.main(SleepJob.java:194) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:497) > at > org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71) > at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144) > at > org.apache.hadoop.test.MapredTestDriver.run(MapredTestDriver.java:136) > at > org.apache.hadoop.test.MapredTestDriver.main(MapredTestDriver.java:144) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:497) > at 
org.apache.hadoop.util.RunJar.run(RunJar.java:239) > at org.apache.hadoop.util.RunJar.main(RunJar.java:153) > Caused by: org.apache.hadoop.yarn.exceptions.YarnException: Failed to submit > application_1471670113386_0001 to YARN : > org.apache.hadoop.security.AccessControlException: Queue root.default already > has 0 applications, cannot accept submission of application: > application_1471670113386_0001 > at > org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.submitApplication(YarnClientImpl.java:286) > at > org.apache.hadoop.mapred.ResourceMgrDelegate.submitApplication(ResourceMgrDelegate.java:296) > at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:301) > ... 25 more > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
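For reference, a minimal sketch of why a zero default-partition capacity can reject every submission with the error above; the scaling rule and the {{maxSystemApps}} value are assumptions for illustration, not the exact CapacityScheduler internals.

{code}
public class QueueMaxAppsSketch {
  public static void main(String[] args) {
    int maxSystemApps = 10000;             // yarn.scheduler.capacity.maximum-applications (assumed default)
    float defaultPartitionCapacity = 0.0f; // root.default.capacity = 0

    // If the per-queue application limit is derived from the queue's
    // default-partition capacity (ignoring its labelx capacity of 50),
    // it comes out to zero...
    int maxApplications = (int) (maxSystemApps * defaultPartitionCapacity);
    System.out.println("max applications for root.default = " + maxApplications);

    // ...and every submission then fails the admission check with
    // "Queue root.default already has 0 applications, cannot accept
    // submission of application: ...".
  }
}
{code}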
[jira] [Commented] (YARN-5867) DirectoryCollection#checkDirs can cause incorrect permission of nmlocal dir
[ https://issues.apache.org/jira/browse/YARN-5867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15653868#comment-15653868 ] Bibin A Chundatt commented on YARN-5867: *Solution* # Before {{testDirs()}} runs, check for and create all local dirs with *0755* permission # Create the random test dir only if the local dir exists, so that a missing local dir is considered bad. In my opinion we should use *Solution 1*, since it makes the NM auto-recoverable. Thoughts? > DirectoryCollection#checkDirs can cause incorrect permission of nmlocal dir > --- > > Key: YARN-5867 > URL: https://issues.apache.org/jira/browse/YARN-5867 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt > > Steps to reproduce > === > # Set umask to 027 for user > # Start the nodemanager with the nmlocal dir configured; its permission is *755* > {{LocalDirsHandlerService#serviceInit}} > {code} > FsPermission perm = new FsPermission((short)0755); > boolean createSucceeded = localDirs.createNonExistentDirs(localFs, perm); > createSucceeded &= logDirs.createNonExistentDirs(localFs, perm); > {code} > # After startup, delete the nmlocal dir and wait for {{MonitoringTimerTask}} > to run (the delete simulates the dir being lost) > # Now the permission of the {{nmlocal dir}} will be *750* > *Root Cause* > {{DirectoryCollection#testDirs}} checks as follows > {code} > // create a random dir to make sure fs isn't in read-only mode > verifyDirUsingMkdir(testDir); > {code} > which causes a new random directory to be created in {{localdir}} via > {{DiskChecker.checkDir(dir)}} -> {{!mkdirsWithExistsCheck(dir)}}, recreating the > nmlocal dir with the wrong permission, *750*. > A few applications then fail at container launch due to permission denied. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
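A minimal sketch of Solution 1 in plain JDK terms (POSIX only; the helper name and path handling are illustrative, not the NM code): recreate a missing local dir and then set *0755* explicitly, so the process umask cannot leak into the permission.

{code}
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.PosixFilePermissions;

public class LocalDirRecreateSketch {
  // Check for and recreate a local dir before the disk check runs.
  static void ensureLocalDir(String dir) throws Exception {
    Path p = Paths.get(dir);
    if (!Files.exists(p)) {
      Files.createDirectories(p);      // created mode is umask-dependent...
      Files.setPosixFilePermissions(p, // ...so set 0755 explicitly afterwards
          PosixFilePermissions.fromString("rwxr-xr-x"));
    }
  }
}
{code}

Setting the permission explicitly after creation is what makes the NM auto-recoverable regardless of the umask it was started with.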
[jira] [Created] (YARN-5867) DirectoryCollection#checkDirs can cause incorrect permission of nmlocal dir
Bibin A Chundatt created YARN-5867: -- Summary: DirectoryCollection#checkDirs can cause incorrect permission of nmlocal dir Key: YARN-5867 URL: https://issues.apache.org/jira/browse/YARN-5867 Project: Hadoop YARN Issue Type: Bug Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Steps to reproduce === # Set umask to 027 for user # Start the nodemanager with the nmlocal dir configured; its permission is *755* {{LocalDirsHandlerService#serviceInit}} {code} FsPermission perm = new FsPermission((short)0755); boolean createSucceeded = localDirs.createNonExistentDirs(localFs, perm); createSucceeded &= logDirs.createNonExistentDirs(localFs, perm); {code} # After startup, delete the nmlocal dir and wait for {{MonitoringTimerTask}} to run (the delete simulates the dir being lost) # Now the permission of the {{nmlocal dir}} will be *750* *Root Cause* {{DirectoryCollection#testDirs}} checks as follows {code} // create a random dir to make sure fs isn't in read-only mode verifyDirUsingMkdir(testDir); {code} which causes a new random directory to be created in {{localdir}} via {{DiskChecker.checkDir(dir)}} -> {{!mkdirsWithExistsCheck(dir)}}, recreating the nmlocal dir with the wrong permission, *750*. A few applications then fail at container launch due to permission denied. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
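The root cause is reproducible with a few lines of plain Java: {{File#mkdirs()}}, which {{mkdirsWithExistsCheck}} ultimately relies on, creates directories with mode 0777 masked by the process umask. A self-contained demonstration, assuming a POSIX system and a shell where {{umask 027}} is in effect; the path is illustrative.

{code}
import java.io.File;
import java.nio.file.Files;
import java.nio.file.attribute.PosixFilePermissions;

public class UmaskDemo {
  public static void main(String[] args) throws Exception {
    File dir = new File("/tmp/nm-local-dir-demo");
    dir.mkdirs(); // mode = 0777 & ~umask, so 750 under umask 027
    // Prints rwxr-x---, the same wrong permission the NM local dir
    // ends up with after checkDirs recreates it.
    System.out.println(PosixFilePermissions.toString(
        Files.getPosixFilePermissions(dir.toPath())));
  }
}
{code}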
[jira] [Commented] (YARN-4218) Metric for resource*time that was preempted
[ https://issues.apache.org/jira/browse/YARN-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15653752#comment-15653752 ] Hadoop QA commented on YARN-4218: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 22s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 10 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 35s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 34s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 50s{color} | {color:green} branch-2 passed with JDK v1.8.0_101 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 12s{color} | {color:green} branch-2 passed with JDK v1.7.0_111 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 51s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 53s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 29s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 42s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 56s{color} | {color:green} branch-2 passed with JDK v1.8.0_101 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 8s{color} | {color:green} branch-2 passed with JDK v1.7.0_111 {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 28s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 1s{color} | {color:green} the patch passed with JDK v1.8.0_101 {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 2m 1s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 1s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 16s{color} | {color:green} the patch passed with JDK v1.7.0_111 {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 2m 16s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 16s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 53s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch generated 10 new + 922 unchanged - 10 fixed = 932 total (was 932) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 49s{color} | {color:green} the patch 
passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 24s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 6m 53s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 20s{color} | {color:red} hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.8.0_101 with JDK v1.8.0_101 generated 1 new + 924 unchanged - 0 fixed = 925 total (was 924) {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 3s{color} | {color:green} the patch passed with JDK v1.7.0_111 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 27s{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.7.0_111. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 27s{color} | {color:green} hadoop-yarn-common in the patch passed with JDK v1.7.0_111. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 29s{color} | {color:green}
[jira] [Commented] (YARN-4218) Metric for resource*time that was preempted
[ https://issues.apache.org/jira/browse/YARN-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15653631#comment-15653631 ] Hadoop QA commented on YARN-4218: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 20s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 11 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 2s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 29s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 0s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 44s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 2m 19s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 43s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 50s{color} | {color:green} trunk passed {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 24s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 5m 10s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 5m 10s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 5m 10s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 1m 1s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch generated 10 new + 915 unchanged - 10 fixed = 925 total (was 925) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 40s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 2m 14s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 6m 10s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 31s{color} | {color:red} hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager generated 1 new + 924 unchanged - 0 fixed = 925 total (was 924) {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 35s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 27s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 36s{color} | {color:green} hadoop-yarn-server-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 22s{color} | {color:green} hadoop-yarn-server-applicationhistoryservice in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 40m 39s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 16m 12s{color} | {color:green} hadoop-yarn-client in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 37s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}128m 8s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:e809691 | | JIRA Issue | YARN-4218 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12838311/YARN-4218.006.patch | |
[jira] [Commented] (YARN-5834) TestNodeStatusUpdater.testNMRMConnectionConf compares nodemanager wait time to the incorrect value
[ https://issues.apache.org/jira/browse/YARN-5834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15653530#comment-15653530 ] Hadoop QA commented on YARN-5834: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 17s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 52s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s{color} | {color:green} branch-2 passed with JDK v1.8.0_101 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s{color} | {color:green} branch-2 passed with JDK v1.7.0_111 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 21s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 31s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 15s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 54s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 18s{color} | {color:green} branch-2 passed with JDK v1.8.0_101 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s{color} | {color:green} branch-2 passed with JDK v1.7.0_111 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 25s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s{color} | {color:green} the patch passed with JDK v1.8.0_101 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 26s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s{color} | {color:green} the patch passed with JDK v1.7.0_111 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 28s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 17s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 29s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 11s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 3s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 15s{color} | {color:green} the patch passed with JDK v1.8.0_101 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s{color} | {color:green} the patch passed with JDK v1.7.0_111 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 15m 36s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with JDK v1.7.0_111. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 17s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 47m 9s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:b59b8b7 | | JIRA Issue | YARN-5834 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12838319/YARN-5834-branch-2.001.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 87018bab2daf 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | branch-2 / f7b2542 | | Default Java | 1.7.0_111 | | Multi-JDK versions | /usr/lib/jvm/java-8-oracle:1.8.0_101 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_111 | | findbugs | v3.0.0 | | JDK v1.7.0_111 Test
[jira] [Commented] (YARN-5545) App submit failure on queue with label when default queue partition capacity is zero
[ https://issues.apache.org/jira/browse/YARN-5545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15653440#comment-15653440 ] Hadoop QA commented on YARN-5545: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 18s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 46s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 32s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 24s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 39s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 17s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 0s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 41s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 38s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 24s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 42s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 17s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 22s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 42m 39s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 16s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 59m 2s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:e809691 | | JIRA Issue | YARN-5545 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12838310/YARN-5545.0008.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 3e8ae235e1f0 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / c8bc7a8 | | Default Java | 1.8.0_101 | | findbugs | v3.0.0 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/13854/testReport/ | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/13854/console | | Powered by | Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > App submit failure on queue with label when default queue partition capacity > is zero > > > Key: YARN-5545 > URL: https://issues.apache.org/jira/browse/YARN-5545 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt > Labels: oct16-medium >
[jira] [Commented] (YARN-5453) FairScheduler#update may skip update demand resource of child queue/app if current demand reached maxResource
[ https://issues.apache.org/jira/browse/YARN-5453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15653433#comment-15653433 ] sandflee commented on YARN-5453: Thanks [~kasha] for the review and commit! > FairScheduler#update may skip update demand resource of child queue/app if > current demand reached maxResource > - > > Key: YARN-5453 > URL: https://issues.apache.org/jira/browse/YARN-5453 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Reporter: sandflee >Assignee: sandflee > Labels: oct16-easy > Fix For: 2.9.0, 3.0.0-alpha2 > > Attachments: YARN-5453.01.patch, YARN-5453.02.patch, > YARN-5453.03.patch, YARN-5453.04.patch, YARN-5453.05.patch > > > {code} > demand = Resources.createResource(0); > for (FSQueue childQueue : childQueues) { > childQueue.updateDemand(); > Resource toAdd = childQueue.getDemand(); > demand = Resources.add(demand, toAdd); > demand = Resources.componentwiseMin(demand, maxRes); > if (Resources.equals(demand, maxRes)) { > break; > } > } > {code} > if one single queue's demand resource exceeds maxRes, the other queues' demand > resources will not be updated. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
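One way to repair the quoted loop, sketched against the same FairScheduler types: keep {{updateDemand()}} running for every child so no child's cached demand goes stale, and only stop aggregating once the cap is reached. This illustrates the direction of the fix, not necessarily the committed patch.

{code}
demand = Resources.createResource(0);
for (FSQueue childQueue : childQueues) {
  // Always refresh the child's demand, even after the parent is capped.
  childQueue.updateDemand();
  if (!Resources.equals(demand, maxRes)) {
    demand = Resources.componentwiseMin(
        Resources.add(demand, childQueue.getDemand()), maxRes);
  }
}
{code}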
[jira] [Commented] (YARN-5834) TestNodeStatusUpdater.testNMRMConnectionConf compares nodemanager wait time to the incorrect value
[ https://issues.apache.org/jira/browse/YARN-5834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15653391#comment-15653391 ] Chang Li commented on YARN-5834: Thanks for reporting. Yes, it's meant to be nmRmConnectionWaitMs. Provided a branch-2 patch since this test does not exist in trunk. > TestNodeStatusUpdater.testNMRMConnectionConf compares nodemanager wait time > to the incorrect value > -- > > Key: YARN-5834 > URL: https://issues.apache.org/jira/browse/YARN-5834 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Miklos Szegedi >Assignee: Chang Li >Priority: Minor > Attachments: YARN-5834-branch-2.001.patch > > > The function is TestNodeStatusUpdater#testNMRMConnectionConf() > I believe the connectionWaitMs references below were meant to be > nmRmConnectionWaitMs. > {code} > conf.setLong(YarnConfiguration.NM_RESOURCEMANAGER_CONNECT_MAX_WAIT_MS, > nmRmConnectionWaitMs); > conf.setLong(YarnConfiguration.RESOURCEMANAGER_CONNECT_MAX_WAIT_MS, > connectionWaitMs); > ... > long t = System.currentTimeMillis(); > long duration = t - waitStartTime; > boolean waitTimeValid = (duration >= nmRmConnectionWaitMs) && > (duration < (*connectionWaitMs* + delta)); > if(!waitTimeValid) { > // throw exception if NM doesn't retry long enough > throw new Exception("NM should have tried re-connecting to RM during > " + > "period of at least " + *connectionWaitMs* + " ms, but " + > "stopped retrying within " + (*connectionWaitMs* + delta) + > " ms: " + e, e); > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
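For concreteness, the check as it should read once the {{connectionWaitMs}} references are swapped for {{nmRmConnectionWaitMs}}, per the comment above:

{code}
long duration = System.currentTimeMillis() - waitStartTime;
boolean waitTimeValid = (duration >= nmRmConnectionWaitMs)
    && (duration < (nmRmConnectionWaitMs + delta));
if (!waitTimeValid) {
  // throw exception if NM doesn't retry long enough
  throw new Exception("NM should have tried re-connecting to RM during "
      + "period of at least " + nmRmConnectionWaitMs + " ms, but "
      + "stopped retrying within " + (nmRmConnectionWaitMs + delta)
      + " ms: " + e, e);
}
{code}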
[jira] [Updated] (YARN-5834) TestNodeStatusUpdater.testNMRMConnectionConf compares nodemanager wait time to the incorrect value
[ https://issues.apache.org/jira/browse/YARN-5834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chang Li updated YARN-5834: --- Attachment: YARN-5834-branch-2.001.patch > TestNodeStatusUpdater.testNMRMConnectionConf compares nodemanager wait time > to the incorrect value > -- > > Key: YARN-5834 > URL: https://issues.apache.org/jira/browse/YARN-5834 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Miklos Szegedi >Assignee: Chang Li >Priority: Minor > Attachments: YARN-5834-branch-2.001.patch > > > The function is TestNodeStatusUpdater#testNMRMConnectionConf() > I believe the connectionWaitMs references below were meant to be > nmRmConnectionWaitMs. > {code} > conf.setLong(YarnConfiguration.NM_RESOURCEMANAGER_CONNECT_MAX_WAIT_MS, > nmRmConnectionWaitMs); > conf.setLong(YarnConfiguration.RESOURCEMANAGER_CONNECT_MAX_WAIT_MS, > connectionWaitMs); > ... > long t = System.currentTimeMillis(); > long duration = t - waitStartTime; > boolean waitTimeValid = (duration >= nmRmConnectionWaitMs) && > (duration < (*connectionWaitMs* + delta)); > if(!waitTimeValid) { > // throw exception if NM doesn't retry long enough > throw new Exception("NM should have tried re-connecting to RM during > " + > "period of at least " + *connectionWaitMs* + " ms, but " + > "stopped retrying within " + (*connectionWaitMs* + delta) + > " ms: " + e, e); > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-5834) TestNodeStatusUpdater.testNMRMConnectionConf compares nodemanager wait time to the incorrect value
[ https://issues.apache.org/jira/browse/YARN-5834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chang Li reassigned YARN-5834: -- Assignee: Chang Li > TestNodeStatusUpdater.testNMRMConnectionConf compares nodemanager wait time > to the incorrect value > -- > > Key: YARN-5834 > URL: https://issues.apache.org/jira/browse/YARN-5834 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Miklos Szegedi >Assignee: Chang Li >Priority: Minor > Attachments: YARN-5834-branch-2.001.patch > > > The function is TestNodeStatusUpdater#testNMRMConnectionConf() > I believe the connectionWaitMs references below were meant to be > nmRmConnectionWaitMs. > {code} > conf.setLong(YarnConfiguration.NM_RESOURCEMANAGER_CONNECT_MAX_WAIT_MS, > nmRmConnectionWaitMs); > conf.setLong(YarnConfiguration.RESOURCEMANAGER_CONNECT_MAX_WAIT_MS, > connectionWaitMs); > ... > long t = System.currentTimeMillis(); > long duration = t - waitStartTime; > boolean waitTimeValid = (duration >= nmRmConnectionWaitMs) && > (duration < (*connectionWaitMs* + delta)); > if(!waitTimeValid) { > // throw exception if NM doesn't retry long enough > throw new Exception("NM should have tried re-connecting to RM during > " + > "period of at least " + *connectionWaitMs* + " ms, but " + > "stopped retrying within " + (*connectionWaitMs* + delta) + > " ms: " + e, e); > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5453) FairScheduler#update may skip update demand resource of child queue/app if current demand reached maxResource
[ https://issues.apache.org/jira/browse/YARN-5453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15653367#comment-15653367 ] Hudson commented on YARN-5453: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #10811 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/10811/]) YARN-5453. FairScheduler#update may skip update demand resource of child (kasha: rev 86ac1ad9fd65c7dd12278372b369de38dc4616db) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSParentQueue.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java > FairScheduler#update may skip update demand resource of child queue/app if > current demand reached maxResource > - > > Key: YARN-5453 > URL: https://issues.apache.org/jira/browse/YARN-5453 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Reporter: sandflee >Assignee: sandflee > Labels: oct16-easy > Fix For: 2.9.0, 3.0.0-alpha2 > > Attachments: YARN-5453.01.patch, YARN-5453.02.patch, > YARN-5453.03.patch, YARN-5453.04.patch, YARN-5453.05.patch > > > {code} > demand = Resources.createResource(0); > for (FSQueue childQueue : childQueues) { > childQueue.updateDemand(); > Resource toAdd = childQueue.getDemand(); > demand = Resources.add(demand, toAdd); > demand = Resources.componentwiseMin(demand, maxRes); > if (Resources.equals(demand, maxRes)) { > break; > } > } > {code} > if one single queue's demand resource exceeds maxRes, the other queues' demand > resources will not be updated. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org