[jira] [Updated] (YARN-9264) [Umbrella] Follow-up on IntelOpenCL FPGA plugin
[ https://issues.apache.org/jira/browse/YARN-9264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated YARN-9264: --- Issue Type: Improvement (was: Bug) > [Umbrella] Follow-up on IntelOpenCL FPGA plugin > --- > > Key: YARN-9264 > URL: https://issues.apache.org/jira/browse/YARN-9264 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 3.1.0 >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > Fix For: 3.3.0 > > > The Intel FPGA resource type support was released in Hadoop 3.1.0. > Right now the plugin implementation has some deficiencies that need to be > fixed. This JIRA lists all problems that need to be resolved. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10068) TimelineV2Client may leak file descriptors creating ClientResponse objects.
[ https://issues.apache.org/jira/browse/YARN-10068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010409#comment-17010409 ] Hudson commented on YARN-10068: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17826 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17826/]) YARN-10068. Fix TimelineV2Client leaking File Descriptors. (pjoseph: rev 571795cd180d3077e8ba189b3b70e81f0d1a7044) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/api/impl/TimelineV2ClientImpl.java > TimelineV2Client may leak file descriptors creating ClientResponse objects. > --- > > Key: YARN-10068 > URL: https://issues.apache.org/jira/browse/YARN-10068 > Project: Hadoop YARN > Issue Type: Sub-task > Components: ATSv2 > Affects Versions: 3.0.0 > Environment: HDP version 3.1.4, Ambari version 2.7.4.0 > Reporter: Anand Srinivasan > Assignee: Anand Srinivasan > Priority: Critical > Fix For: 3.3.0 > > Attachments: YARN-10068.001.patch, YARN-10068.002.patch, YARN-10068.003.patch, image-2020-01-02-14-58-12-773.png > > > Hi team, > A code walkthrough between v1 and v2 of the TimelineClient API revealed that the v2 TimelineV2ClientImpl#putObjects does not close ClientResponse objects when the Timeline Server returns a success status. The ClientResponse is closed only on an erroneous response from the server, via ClientResponse#getEntity. > We also noticed that TimelineClient (v1) closes the ClientResponse object in TimelineWriter#putEntities by calling ClientResponse#getEntity under both success and error conditions from the server, thereby avoiding this file descriptor leak. > The customer's original symptom was that the NodeManager went down with a 'too many open files' condition, with many CLOSED_WAIT sockets observed between the timeline client (in the NM) and the timeline server hosts. > Could you please help resolve this issue? Thanks.
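The leak pattern described in this issue can be sketched in a few lines. This is a minimal, self-contained illustration of the before/after behavior, not Hadoop's actual TimelineV2ClientImpl code: `MockResponse` is a hypothetical stand-in for Jersey's ClientResponse, and the two `putObjects` variants are illustrative names.

```java
// Hypothetical sketch of the YARN-10068 leak: a response whose entity is
// never consumed (and which is never closed) keeps its connection open.
import java.util.concurrent.atomic.AtomicInteger;

public class ResponseLeakDemo {
  // Counts responses whose underlying connection has not been released.
  public static final AtomicInteger OPEN = new AtomicInteger();

  public static class MockResponse {
    private boolean released;
    public MockResponse() { OPEN.incrementAndGet(); }
    public int getStatus() { return 200; }
    // Like ClientResponse#getEntity, consuming the entity releases the connection.
    public String getEntity() { release(); return "ok"; }
    public void close() { release(); }
    private void release() {
      if (!released) { released = true; OPEN.decrementAndGet(); }
    }
  }

  // Leaky pattern: the entity is consumed only on the error path, so a
  // success response keeps its file descriptor open.
  public static void putObjectsLeaky(MockResponse resp) {
    if (resp.getStatus() != 200) {
      resp.getEntity(); // released only on error
    }
  }

  // Fixed pattern: release the response in a finally block, on both paths.
  public static void putObjectsFixed(MockResponse resp) {
    try {
      if (resp.getStatus() != 200) {
        throw new RuntimeException("server error");
      }
    } finally {
      resp.close();
    }
  }

  public static void main(String[] args) {
    putObjectsLeaky(new MockResponse());
    System.out.println("leaked descriptors after leaky call: " + OPEN.get()); // 1
    int before = OPEN.get();
    putObjectsFixed(new MockResponse());
    System.out.println("leaked by fixed call: " + (OPEN.get() - before)); // 0
  }
}
```

Under load, each leaked response pins a socket (hence the CLOSED_WAIT pile-up observed on the NodeManager) until the process hits its open-file limit.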
[jira] [Commented] (YARN-10068) TimelineV2Client may leak file descriptors creating ClientResponse objects.
[ https://issues.apache.org/jira/browse/YARN-10068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010404#comment-17010404 ] Anand Srinivasan commented on YARN-10068: - Hi Prabhu Joseph, Thanks for the review feedback and commit to the trunk. Kind regards.
[jira] [Commented] (YARN-10068) TimelineV2Client may leak file descriptors creating ClientResponse objects.
[ https://issues.apache.org/jira/browse/YARN-10068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010399#comment-17010399 ] Prabhu Joseph commented on YARN-10068: -- Thanks [~anand.srinivasan] for the patch and [~adam.antal] for the review. Have committed the [^YARN-10068.003.patch] to trunk.
[jira] [Updated] (YARN-10068) TimelineV2Client may leak file descriptors creating ClientResponse objects.
[ https://issues.apache.org/jira/browse/YARN-10068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated YARN-10068: - Fix Version/s: 3.3.0
[jira] [Comment Edited] (YARN-9698) [Umbrella] Tools to help migration from Fair Scheduler to Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-9698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010315#comment-17010315 ] Brahma Reddy Battula edited comment on YARN-9698 at 1/8/20 4:53 AM: [~cheersyang] I am planning for the 3.3.0 release (will share the plan on the mailing list). This feature is marked for the 3.3.0 release; can you update the plan? was (Author: brahmareddy): [~cheersyang] I am planning for 3.3.0 Release,This feature is marked to 3.3.0 release, can you guys update the plan..? > [Umbrella] Tools to help migration from Fair Scheduler to Capacity Scheduler > > > Key: YARN-9698 > URL: https://issues.apache.org/jira/browse/YARN-9698 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacity scheduler > Reporter: Weiwei Yang > Priority: Major > Labels: fs2cs > Attachments: FS-CS Migration.pdf > > > Some users want to migrate from Fair Scheduler to Capacity Scheduler. This JIRA is created as an umbrella to track all related efforts for the migration; the scope contains: > * Bug fixes > * Missing features > * Migration tools that help generate CS configs from FS configs, validate configs, etc. > * Documentation > This is part of the CS component; the purpose is to make the migration process smooth.
[jira] [Comment Edited] (YARN-5542) Scheduling of opportunistic containers
[ https://issues.apache.org/jira/browse/YARN-5542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010313#comment-17010313 ] Brahma Reddy Battula edited comment on YARN-5542 at 1/8/20 4:53 AM: [~kkaranasos], Planning to release 3.3.0 (will share the plan on the mailing list). Most of the subtasks are finished and merged to 3.3.0; can we close this umbrella? was (Author: brahmareddy): [~kkaranasos], Planning to release 3.3.0.. Most of the subtasks are finished and merged to 3.3.0, can we close this umbrella..? > Scheduling of opportunistic containers > -- > > Key: YARN-5542 > URL: https://issues.apache.org/jira/browse/YARN-5542 > Project: Hadoop YARN > Issue Type: New Feature > Reporter: Konstantinos Karanasos > Priority: Major > > This JIRA groups all efforts related to the scheduling of opportunistic containers. > It includes the scheduling of opportunistic containers through the central RM (YARN-5220), through distributed scheduling (YARN-2877), as well as the scheduling of containers based on actual node utilization (YARN-1011) and container promotion/demotion (YARN-5085).
[jira] [Comment Edited] (YARN-9414) Application Catalog for YARN applications
[ https://issues.apache.org/jira/browse/YARN-9414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010338#comment-17010338 ] Brahma Reddy Battula edited comment on YARN-9414 at 1/8/20 4:52 AM: [~eyang], I am planning for the 3.3.0 release (will share the plan on the mailing list). Most of these JIRAs are merged to 3.3.0. Can this feature GA without the remaining JIRAs? Can you please update? was (Author: brahmareddy): [~eyang], I am planning for 3.3.0 release, Most of the these jira's are merged to 3.3.0. can this feature GA without remaning jira's..? can you please update..? > Application Catalog for YARN applications > - > > Key: YARN-9414 > URL: https://issues.apache.org/jira/browse/YARN-9414 > Project: Hadoop YARN > Issue Type: New Feature > Reporter: Eric Yang > Assignee: Eric Yang > Priority: Major > Attachments: YARN Appstore.pdf, YARN-Application-Catalog.pdf > > > YARN native services provide a web services API to improve usability of application deployment on Hadoop using collections of Docker images. It would be nice to have an application catalog system which provides an editorial and search interface for YARN applications. This improves the usability of YARN for managing the life cycle of applications.
[jira] [Commented] (YARN-8851) [Umbrella] A pluggable device plugin framework to ease vendor plugin development
[ https://issues.apache.org/jira/browse/YARN-8851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010344#comment-17010344 ] Brahma Reddy Battula commented on YARN-8851: [~tangzhankun], can we close this JIRA and move the pending JIRAs out? I am planning for the 3.3.0 release and going to mention this feature. > [Umbrella] A pluggable device plugin framework to ease vendor plugin > development > > > Key: YARN-8851 > URL: https://issues.apache.org/jira/browse/YARN-8851 > Project: Hadoop YARN > Issue Type: New Feature > Components: yarn > Reporter: Zhankun Tang > Assignee: Zhankun Tang > Priority: Major > Attachments: YARN-8851-WIP2-trunk.001.patch, YARN-8851-WIP3-trunk.001.patch, YARN-8851-WIP4-trunk.001.patch, YARN-8851-WIP5-trunk.001.patch, YARN-8851-WIP6-trunk.001.patch, YARN-8851-WIP7-trunk.001.patch, YARN-8851-WIP8-trunk.001.patch, YARN-8851-WIP9-trunk.001.patch, YARN-8851-trunk.001.patch, YARN-8851-trunk.002.patch, [YARN-8851] YARN_New_Device_Plugin_Framework_Design_Proposal-3.pdf, [YARN-8851] YARN_New_Device_Plugin_Framework_Design_Proposal-4.pdf, [YARN-8851] YARN_New_Device_Plugin_Framework_Design_Proposal.pdf > > > At present, we support GPU/FPGA devices in YARN through a native, coupled way. But it's difficult for a vendor to implement such a device plugin because the developer needs deep knowledge of YARN internals, and this puts a burden on the community to maintain both YARN core and vendor-specific code. > Here we propose a new device plugin framework to ease vendor device plugin development and provide a more flexible way to integrate with the YARN NM.
[jira] [Commented] (YARN-9050) [Umbrella] Usability improvements for scheduler activities
[ https://issues.apache.org/jira/browse/YARN-9050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010339#comment-17010339 ] Tao Yang commented on YARN-9050: Glad to hear that the 3.3.0 release is on the way, and thanks for reminding me. The remaining issues are almost ready and only need some reviews; they can be done before this release, thanks. > [Umbrella] Usability improvements for scheduler activities > -- > > Key: YARN-9050 > URL: https://issues.apache.org/jira/browse/YARN-9050 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacityscheduler > Reporter: Tao Yang > Assignee: Tao Yang > Priority: Major > Fix For: 3.3.0 > > Attachments: image-2018-11-23-16-46-38-138.png > > > We have made some usability improvements for scheduler activities based on YARN 3.1 in our cluster, as follows: > 1. Not available for multi-threaded asynchronous scheduling. App and node activities may be confused when multiple scheduling threads record activities of different allocation processes in the same variables, like appsAllocation and recordingNodesAllocation in ActivitiesManager. I think these variables should be thread-local to keep activities clear among multiple threads. > 2. Incomplete activities for the multi-node lookup mechanism, since ActivitiesLogger skips recording through {{if (node == null || activitiesManager == null)}} when node is null, which represents an allocation for multiple nodes. We need to support recording activities for the multi-node lookup mechanism. > 3. Current app activities cannot meet the requirements of diagnostics. For example, we can know that a node doesn't match a request but it is hard to know why, especially when using placement constraints, where it's difficult to make a detailed diagnosis manually. So I propose to improve the diagnoses of activities: add a diagnosis for placement-constraint checks, update the insufficient-resource diagnosis with detailed info (like 'insufficient resource names:[memory-mb]'), and so on. > 4. Add more useful fields to app activities. In some scenarios we need to distinguish different requests but can't locate requests based on app activities info; some other fields, such as allocation tags, can help to filter what we want. We have added containerPriority, allocationRequestId and allocationTags fields in AppAllocation. > 5. Filter app activities by key fields. Sometimes the results of app activities are massive and it's hard to find what we want. We have supported filtering by allocation tags to meet requirements from some apps; moreover, we can take container-priority and allocation-request-id as candidates if necessary. > 6. Aggregate app activities by diagnoses. For a single allocation process, activities can still be massive in a large cluster. We frequently want to know why a request can't be allocated in the cluster, and it's hard to check every node manually, so aggregating app activities by diagnoses is necessary. We have added a groupingType parameter to the app-activities REST API for this, supporting grouping by diagnostics. > I think we can have a discussion about these points; useful improvements which are accepted will be added to the patch. Thanks. > The running design doc is attached [here|https://docs.google.com/document/d/1pwf-n3BCLW76bGrmNPM4T6pQ3vC4dVMcN2Ud1hq1t2M/edit#heading=h.2jnaobmmfne5].
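Point 6 above (grouping nodes that produced the same diagnostic string instead of listing every node) can be sketched as follows. This is an illustrative example only; the class and method names are hypothetical and are not the actual ActivitiesManager API.

```java
// Hypothetical sketch of diagnosis aggregation: collapse per-node
// diagnostics into diagnosis -> [nodes], so a 5000-node cluster reports
// a handful of grouped reasons instead of 5000 entries.
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class DiagnosisAggregator {
  // nodeId -> diagnostic emitted for a single allocation attempt
  public static Map<String, List<String>> groupByDiagnosis(Map<String, String> perNode) {
    return perNode.entrySet().stream()
        .collect(Collectors.groupingBy(
            Map.Entry::getValue,
            TreeMap::new, // stable, sorted ordering for display
            Collectors.mapping(Map.Entry::getKey, Collectors.toList())));
  }

  public static void main(String[] args) {
    Map<String, String> perNode = new LinkedHashMap<>();
    perNode.put("node1", "insufficient resource names:[memory-mb]");
    perNode.put("node2", "insufficient resource names:[memory-mb]");
    perNode.put("node3", "placement constraint not satisfied");
    // Two grouped diagnoses instead of three per-node entries.
    groupByDiagnosis(perNode).forEach((diag, nodes) ->
        System.out.println(diag + " -> " + nodes));
  }
}
```

The same shape of output is what a grouped app-activities REST response would convey: one entry per distinct diagnosis, each carrying the list of affected nodes.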
[jira] [Commented] (YARN-9414) Application Catalog for YARN applications
[ https://issues.apache.org/jira/browse/YARN-9414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010338#comment-17010338 ] Brahma Reddy Battula commented on YARN-9414: [~eyang], I am planning for the 3.3.0 release. Most of these JIRAs are merged to 3.3.0. Can this feature GA without the remaining JIRAs? Can you please update?
[jira] [Commented] (YARN-8283) [Umbrella] MaWo - A Master Worker framework on top of YARN Services
[ https://issues.apache.org/jira/browse/YARN-8283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010333#comment-17010333 ] Brahma Reddy Battula commented on YARN-8283: [~yeshavora], I am planning for the 3.3.0 release. Two of these JIRAs are merged to 3.3.0. Are you planning the remaining JIRAs also? Will this be GA without them? > [Umbrella] MaWo - A Master Worker framework on top of YARN Services > --- > > Key: YARN-8283 > URL: https://issues.apache.org/jira/browse/YARN-8283 > Project: Hadoop YARN > Issue Type: New Feature > Reporter: Yesha Vora > Assignee: Yesha Vora > Priority: Major > Attachments: [Design Doc] [YARN-8283] MaWo - A Master Worker framework on top of YARN Services.pdf > > > There is a need for an application/framework to handle master-worker scenarios. There are existing frameworks on YARN which can be used to run a job in a distributed manner, such as MapReduce, Tez, Spark, etc. But master-worker use-cases are usually force-fed into one of these existing frameworks, which have been designed primarily around data parallelism instead of generic master-worker computations. > In this JIRA, we'd like to contribute MaWo - a YARN Service based framework that achieves this goal. The overall goal is to create an app that can take an input job specification with tasks and their durations, and have a Master dish the tasks off to a predetermined set of Workers. The components will be responsible for making sure that the tasks and the overall job finish within specific time durations. > We have been using a version of the MaWo framework for running Hadoop's unit tests in parallel on an existing YARN cluster. What typically takes 10 hours to run all of the Hadoop project's unit tests can finish in under 20 minutes on a MaWo app of about 50 containers! > YARN-3307 was an original attempt at this, but through a first-class YARN app. > In this JIRA, we instead use YARN Service for orchestration so that our code can focus on the core master-worker paradigm.
[jira] [Commented] (YARN-8472) YARN Container Phase 2
[ https://issues.apache.org/jira/browse/YARN-8472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010332#comment-17010332 ] Brahma Reddy Battula commented on YARN-8472: [~eyang], Most of the JIRAs are fixed. Can we close this umbrella? As some of these JIRAs are only in 3.3.0, I am planning to include this in the 3.3.0 release plan. > YARN Container Phase 2 > -- > > Key: YARN-8472 > URL: https://issues.apache.org/jira/browse/YARN-8472 > Project: Hadoop YARN > Issue Type: Improvement > Reporter: Eric Yang > Assignee: Eric Yang > Priority: Major > > In YARN-3611, we implemented basic Docker container support for YARN. This story is the next phase, to improve container usability. > Several areas for improvement are: > # Software defined network support > # Interactive shell to container > # User management sss/nscd integration > # Runc/containerd support > # Metrics/Logs integration with Timeline service v2 > # Docker container profiles > # Docker cgroup management
[jira] [Commented] (YARN-9050) [Umbrella] Usability improvements for scheduler activities
[ https://issues.apache.org/jira/browse/YARN-9050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010326#comment-17010326 ] Brahma Reddy Battula commented on YARN-9050: [~Tao Yang] Looks like only two JIRAs are pending for this feature. I am planning for the 3.3.0 release and will put this feature in the list; can you update the plan for this?
[jira] [Commented] (YARN-9698) [Umbrella] Tools to help migration from Fair Scheduler to Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-9698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010315#comment-17010315 ] Brahma Reddy Battula commented on YARN-9698: [~cheersyang] I am planning for the 3.3.0 release; this feature is marked for the 3.3.0 release. Can you update the plan?
[jira] [Commented] (YARN-5542) Scheduling of opportunistic containers
[ https://issues.apache.org/jira/browse/YARN-5542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010313#comment-17010313 ] Brahma Reddy Battula commented on YARN-5542: [~kkaranasos], Planning to release 3.3.0. Most of the subtasks are finished and merged to 3.3.0; can we close this umbrella?
[jira] [Commented] (YARN-9014) runC container runtime
[ https://issues.apache.org/jira/browse/YARN-9014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010311#comment-17010311 ] Brahma Reddy Battula commented on YARN-9014: [~ebadger], Planning to release 3.3.0. Looks like some JIRAs are pending on this feature. As some JIRAs are already in 3.3.0, do you still plan to work on the rest? > runC container runtime > -- > > Key: YARN-9014 > URL: https://issues.apache.org/jira/browse/YARN-9014 > Project: Hadoop YARN > Issue Type: New Feature > Reporter: Jason Darrell Lowe > Assignee: Eric Badger > Priority: Major > Labels: Docker > Attachments: OciSquashfsRuntime.v001.pdf, RuncContainerRuntime.v002.pdf > > > This JIRA tracks a YARN container runtime that supports running containers in images built by Docker, but the runtime does not use Docker directly, and Docker does not have to be installed on the nodes. The runtime leverages the [OCI runtime standard|https://github.com/opencontainers/runtime-spec] to launch containers, so an OCI-compliant runtime like {{runc}} is required. {{runc}} has the benefit of not requiring a daemon like {{dockerd}} to be running in order to launch/control containers. > The layers comprising the Docker image are uploaded to HDFS as [squashfs|http://tldp.org/HOWTO/SquashFS-HOWTO/whatis.html] images, enabling the runtime to efficiently download and execute directly on the compressed layers. This saves image unpack time and space on the local disk. The image layers, like other entries in the YARN distributed cache, can be spread across the YARN local disks, increasing the available space for storing container images on each node. > A design document will be posted shortly.
[jira] [Commented] (YARN-1011) [Umbrella] Schedule containers based on utilization of currently allocated containers
[ https://issues.apache.org/jira/browse/YARN-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010301#comment-17010301 ] Brahma Reddy Battula commented on YARN-1011: [~haibochen], it looks like most of the JIRAs are closed. Is there any plan to merge this to trunk? I am planning the 3.3.0 release, so please let me know. > [Umbrella] Schedule containers based on utilization of currently allocated > containers > - > > Key: YARN-1011 > URL: https://issues.apache.org/jira/browse/YARN-1011 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Arun Murthy >Assignee: Karthik Kambatla >Priority: Major > Attachments: patch-for-yarn-1011.patch, yarn-1011-design-v0.pdf, > yarn-1011-design-v1.pdf, yarn-1011-design-v2.pdf, yarn-1011-design-v3.pdf > > > Currently RM allocates containers and assumes resources allocated are > utilized. > RM can, and should, get to a point where it measures utilization of allocated > containers and, if appropriate, allocate more (speculative?) containers.
[jira] [Commented] (YARN-10063) Usage output of container-executor binary needs to include --http/--https argument
[ https://issues.apache.org/jira/browse/YARN-10063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010264#comment-17010264 ] Hadoop QA commented on YARN-10063: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 1m 0s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 1s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 2s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 36s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 35m 23s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 33s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 53s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 0m 53s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 53s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 30s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 37s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 21m 22s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 25s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 74m 22s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.5 Server=19.03.5 Image:yetus/hadoop:c44943d1fc3 | | JIRA Issue | YARN-10063 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12990148/YARN-10063.002.patch | | Optional Tests | dupname asflicense compile cc mvnsite javac unit | | uname | Linux 81905a5ae989 4.15.0-66-generic #75-Ubuntu SMP Tue Oct 1 05:24:09 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / a7fccc1 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_232 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/25346/testReport/ | | Max. process+thread count | 309 (vs. ulimit of 5500) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/25346/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated. > Usage output of container-executor binary needs to include --http/--https > argument > -- > > Key: YARN-10063 > URL: https://issues.apache.org/jira/browse/YARN-10063 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Siddharth Ahuja >Assignee: Siddharth Ahuja >Priority: Minor > Attachments: YARN-10063.001.patch, YARN-10063.002.patch > > > YARN-8448/YARN-6586 seems to have introduced a new option - "\--http" > (default) and "\--https" that is possible to be passed in to the >
[jira] [Comment Edited] (YARN-10063) Usage output of container-executor binary needs to include --http/--https argument
[ https://issues.apache.org/jira/browse/YARN-10063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010225#comment-17010225 ] Siddharth Ahuja edited comment on YARN-10063 at 1/8/20 1:04 AM: Thanks [~pbacsko]. I have made the changes and uploaded the patch again that incorporates the changes discussed just above. Usage output of both commands - LAUNCH_CONTAINER (1) and LAUNCH_DOCKER_CONTAINER (4) has been updated and it looks as per below: {code} [root@ bin]# pwd /var/lib/yarn-ce/bin [root@ bin]# ll total 800 ---Sr-s--- 1 root yarn 728960 Jan 7 16:37 container-executor ---Sr-s--- 1 root yarn 87168 Nov 8 08:04 container-executor.orig [root@ bin]# ./container-executor [root@sid-63-1 bin]# ./container-executor Usage: container-executor --checksetup container-executor --mount-cgroups ... [DISABLED] container-executor --tc-modify-state [DISABLED] container-executor --tc-read-state [DISABLED] container-executor --tc-read-stats [DISABLED] container-executor --exec-container [DISABLED] container-executor --run-docker [DISABLED] container-executor --remove-docker-container [hierarchy] [DISABLED] container-executor --inspect-docker-container [DISABLED] container-executor --run-runc-container [DISABLED] container-executor --reap-runc-layer-mounts container-executor where command and command-args: initialize container: 0 appid tokens nm-local-dirs nm-log-dirs cmd app... launch container: 1 appid containerid workdir container-script tokens --http | --https keystorepath truststorepath pidfile nm-local-dirs nm-log-dirs resources [DISABLED] launch docker container: 4 appid containerid workdir container-script tokens --http | --https keystorepath truststorepath pidfile nm-local-dirs nm-log-dirs docker-command-file resources signal container: 2 container-pid signal delete as user: 3 relative-path list as user: 5 relative-path [DISABLED] sync yarn sysfs:6 app-id nm-local-dirs {code} Thanks in advance again for your check [~pbacsko]. 
was (Author: sahuja): Thanks [~pbacsko]. I have made the changes and uploaded the patch again that incorporates the changes discussed just above. Usage output of both commands - LAUNCH_CONTAINER (1) and LAUNCH_DOCKER_CONTAINER (4) has been updated and it looks as per below: {code} [root@ bin]# pwd /var/lib/yarn-ce/bin [root@ bin]# ll total 800 ---Sr-s--- 1 root yarn 728960 Jan 7 16:37 container-executor ---Sr-s--- 1 root yarn 87168 Nov 8 08:04 container-executor.orig [root@ bin]# ./container-executor [root@sid-63-1 bin]# ./container-executor Usage: container-executor --checksetup container-executor --mount-cgroups ... [DISABLED] container-executor --tc-modify-state [DISABLED] container-executor --tc-read-state [DISABLED] container-executor --tc-read-stats [DISABLED] container-executor --exec-container [DISABLED] container-executor --run-docker [DISABLED] container-executor --remove-docker-container [hierarchy] [DISABLED] container-executor --inspect-docker-container [DISABLED] container-executor --run-runc-container [DISABLED] container-executor --reap-runc-layer-mounts container-executor where command and command-args: initialize container: 0 appid tokens nm-local-dirs nm-log-dirs cmd app... 
launch container: 1 appid containerid workdir container-script tokens --http | --https keystorepath truststorepath pidfile nm-local-dirs nm-log-dirs resources [DISABLED] launch docker container: 4 appid containerid workdir container-script tokens --http | --https keystorepath truststorepath pidfile nm-local-dirs nm-log-dirs docker-command-file resources signal container: 2 container-pid signal delete as user: 3 relative-path list as user: 5 relative-path [DISABLED] sync yarn sysfs:6 app-id nm-local-dirs {code} > Usage output of container-executor binary needs to include --http/--https > argument > -- > > Key: YARN-10063 > URL: https://issues.apache.org/jira/browse/YARN-10063 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Siddharth Ahuja >Assignee: Siddharth Ahuja >Priority: Minor > Attachments: YARN-10063.001.patch, YARN-10063.002.patch > > > YARN-8448/YARN-6586 seems to have introduced a new option - "\--http" > (default) and "\--https" that is possible to be passed in to the > container-executor binary, see : > https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/main.c#L564 > and >
[jira] [Comment Edited] (YARN-10063) Usage output of container-executor binary needs to include --http/--https argument
[ https://issues.apache.org/jira/browse/YARN-10063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010225#comment-17010225 ] Siddharth Ahuja edited comment on YARN-10063 at 1/8/20 1:02 AM: Thanks [~pbacsko]. I have made the changes and uploaded the patch again that incorporates the changes discussed just above. Usage output of both commands - LAUNCH_CONTAINER (1) and LAUNCH_DOCKER_CONTAINER (4) has been updated and it looks as per below: {code} [root@ bin]# pwd /var/lib/yarn-ce/bin [root@ bin]# ll total 800 ---Sr-s--- 1 root yarn 728960 Jan 7 16:37 container-executor ---Sr-s--- 1 root yarn 87168 Nov 8 08:04 container-executor.orig [root@ bin]# ./container-executor [root@sid-63-1 bin]# ./container-executor Usage: container-executor --checksetup container-executor --mount-cgroups ... [DISABLED] container-executor --tc-modify-state [DISABLED] container-executor --tc-read-state [DISABLED] container-executor --tc-read-stats [DISABLED] container-executor --exec-container [DISABLED] container-executor --run-docker [DISABLED] container-executor --remove-docker-container [hierarchy] [DISABLED] container-executor --inspect-docker-container [DISABLED] container-executor --run-runc-container [DISABLED] container-executor --reap-runc-layer-mounts container-executor where command and command-args: initialize container: 0 appid tokens nm-local-dirs nm-log-dirs cmd app... launch container: 1 appid containerid workdir container-script tokens --http | --https keystorepath truststorepath pidfile nm-local-dirs nm-log-dirs resources [DISABLED] launch docker container: 4 appid containerid workdir container-script tokens --http | --https keystorepath truststorepath pidfile nm-local-dirs nm-log-dirs docker-command-file resources signal container: 2 container-pid signal delete as user: 3 relative-path list as user: 5 relative-path [DISABLED] sync yarn sysfs:6 app-id nm-local-dirs {code} was (Author: sahuja): Thanks [~pbacsko]. 
I have made the changes and uploaded the patch again that incorporates the changes discussed just above. Usage output of both commands - LAUNCH_CONTAINER (1) and LAUNCH_DOCKER_CONTAINER (4) has been updated to include the http flag or https flag with details. The usage output from the container-executor binary now looks as per below: {code} [root@ bin]# pwd /var/lib/yarn-ce/bin [root@ bin]# ll total 800 ---Sr-s--- 1 root yarn 728960 Jan 7 16:37 container-executor ---Sr-s--- 1 root yarn 87168 Nov 8 08:04 container-executor.orig [root@ bin]# ./container-executor [root@sid-63-1 bin]# ./container-executor Usage: container-executor --checksetup container-executor --mount-cgroups ... [DISABLED] container-executor --tc-modify-state [DISABLED] container-executor --tc-read-state [DISABLED] container-executor --tc-read-stats [DISABLED] container-executor --exec-container [DISABLED] container-executor --run-docker [DISABLED] container-executor --remove-docker-container [hierarchy] [DISABLED] container-executor --inspect-docker-container [DISABLED] container-executor --run-runc-container [DISABLED] container-executor --reap-runc-layer-mounts container-executor where command and command-args: initialize container: 0 appid tokens nm-local-dirs nm-log-dirs cmd app...
launch container: 1 appid containerid workdir container-script tokens --http | --https keystorepath truststorepath pidfile nm-local-dirs nm-log-dirs resources [DISABLED] launch docker container: 4 appid containerid workdir container-script tokens --http | --https keystorepath truststorepath pidfile nm-local-dirs nm-log-dirs docker-command-file resources signal container: 2 container-pid signal delete as user: 3 relative-path list as user: 5 relative-path [DISABLED] sync yarn sysfs:6 app-id nm-local-dirs {code} > Usage output of container-executor binary needs to include --http/--https > argument > -- > > Key: YARN-10063 > URL: https://issues.apache.org/jira/browse/YARN-10063 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Siddharth Ahuja >Assignee: Siddharth Ahuja >Priority: Minor > Attachments: YARN-10063.001.patch, YARN-10063.002.patch > > > YARN-8448/YARN-6586 seems to have introduced a new option - "\--http" > (default) and "\--https" that is possible to be passed in to the > container-executor binary, see : >
[jira] [Commented] (YARN-10063) Usage output of container-executor binary needs to include --http/--https argument
[ https://issues.apache.org/jira/browse/YARN-10063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010225#comment-17010225 ] Siddharth Ahuja commented on YARN-10063: Thanks [~pbacsko]. I have made the changes and uploaded the patch again that incorporates the changes discussed just above. Usage output of both commands - LAUNCH_CONTAINER (1) and LAUNCH_DOCKER_CONTAINER (4) has been updated to include the http flag or https flag with details. The usage output from the container-executor binary now looks as per below: {code} [root@ bin]# pwd /var/lib/yarn-ce/bin [root@ bin]# ll total 800 ---Sr-s--- 1 root yarn 728960 Jan 7 16:37 container-executor ---Sr-s--- 1 root yarn 87168 Nov 8 08:04 container-executor.orig [root@ bin]# ./container-executor [root@sid-63-1 bin]# ./container-executor Usage: container-executor --checksetup container-executor --mount-cgroups ... [DISABLED] container-executor --tc-modify-state [DISABLED] container-executor --tc-read-state [DISABLED] container-executor --tc-read-stats [DISABLED] container-executor --exec-container [DISABLED] container-executor --run-docker [DISABLED] container-executor --remove-docker-container [hierarchy] [DISABLED] container-executor --inspect-docker-container [DISABLED] container-executor --run-runc-container [DISABLED] container-executor --reap-runc-layer-mounts container-executor where command and command-args: initialize container: 0 appid tokens nm-local-dirs nm-log-dirs cmd app...
launch container: 1 appid containerid workdir container-script tokens --http | --https keystorepath truststorepath pidfile nm-local-dirs nm-log-dirs resources [DISABLED] launch docker container: 4 appid containerid workdir container-script tokens --http | --https keystorepath truststorepath pidfile nm-local-dirs nm-log-dirs docker-command-file resources signal container: 2 container-pid signal delete as user: 3 relative-path list as user: 5 relative-path [DISABLED] sync yarn sysfs:6 app-id nm-local-dirs {code} > Usage output of container-executor binary needs to include --http/--https > argument > -- > > Key: YARN-10063 > URL: https://issues.apache.org/jira/browse/YARN-10063 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Siddharth Ahuja >Assignee: Siddharth Ahuja >Priority: Minor > Attachments: YARN-10063.001.patch, YARN-10063.002.patch > > > YARN-8448/YARN-6586 seems to have introduced a new option - "\--http" > (default) and "\--https" that is possible to be passed in to the > container-executor binary, see : > https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/main.c#L564 > and > https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/main.c#L521 > however, the usage output seems to have missed this: > https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/main.c#L74 > Raising this jira to improve this. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
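To make the updated usage line above concrete, here is a small sketch (illustrative Python, not part of Hadoop; function and argument names are assumptions taken from the usage text) of how a caller might assemble the argv for the "launch container" command (1), assuming --https is followed by the keystore and truststore paths while --http takes no extra arguments:

```python
def launch_container_argv(appid, containerid, workdir, script, tokens,
                          pidfile, local_dirs, log_dirs, resources,
                          https=False, keystore=None, truststore=None):
    """Build an argv per the usage line shown in the comment above.

    Hypothetical helper: command code 1 selects "launch container".
    """
    argv = ["container-executor", "1", appid, containerid, workdir,
            script, tokens]
    if https:
        # Assumption: --https carries the keystore/truststore paths.
        argv += ["--https", keystore, truststore]
    else:
        # Assumption: --http (the default) needs no key material.
        argv += ["--http"]
    argv += [pidfile, local_dirs, log_dirs, resources]
    return argv
```

The point of the sketch is simply that the flag sits between the tokens file and the pidfile, which is what the patched usage output now documents.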
[jira] [Updated] (YARN-10063) Usage output of container-executor binary needs to include --http/--https argument
[ https://issues.apache.org/jira/browse/YARN-10063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Ahuja updated YARN-10063: --- Attachment: YARN-10063.002.patch > Usage output of container-executor binary needs to include --http/--https > argument > -- > > Key: YARN-10063 > URL: https://issues.apache.org/jira/browse/YARN-10063 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Siddharth Ahuja >Assignee: Siddharth Ahuja >Priority: Minor > Attachments: YARN-10063.001.patch, YARN-10063.002.patch > > > YARN-8448/YARN-6586 seems to have introduced a new option - "\--http" > (default) and "\--https" that is possible to be passed in to the > container-executor binary, see : > https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/main.c#L564 > and > https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/main.c#L521 > however, the usage output seems to have missed this: > https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/main.c#L74 > Raising this jira to improve this. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8672) TestContainerManager#testLocalingResourceWhileContainerRunning occasionally times out
[ https://issues.apache.org/jira/browse/YARN-8672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010191#comment-17010191 ] Hadoop QA commented on YARN-8672: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 42s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 5 new or modified test files. {color} | || || || || {color:brown} branch-3.2 Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 1s{color} | {color:green} branch-3.2 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 59s{color} | {color:green} branch-3.2 passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 29s{color} | {color:green} branch-3.2 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 33s{color} | {color:green} branch-3.2 passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 32s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 49s{color} | {color:green} branch-3.2 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s{color} | {color:green} branch-3.2 passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 34s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 51s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 51s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 24s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: The patch generated 1 new + 428 unchanged - 1 fixed = 429 total (was 429) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 29s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 14s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 57s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 19m 22s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 26s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 74m 21s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.5 Server=19.03.5 Image:yetus/hadoop:0f25cbbb251 | | JIRA Issue | YARN-8672 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12990139/YARN-8672-branch-3.2.001.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 23c84af7ea68 4.15.0-66-generic #75-Ubuntu SMP Tue Oct 1 05:24:09 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | branch-3.2 / 250cd9f | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_232 | | findbugs | v3.1.0-RC1 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/25345/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/25345/testReport/ | | Max. process+thread count | 308 (vs. ulimit of 5500) | | modules | C:
[jira] [Commented] (YARN-9879) Allow multiple leaf queues with the same name in CS
[ https://issues.apache.org/jira/browse/YARN-9879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010190#comment-17010190 ] Wilfred Spiegelenburg commented on YARN-9879: - Thank you [~leftnoteasy] for the comments. {quote}And once application is submitted to CS, internal to CS, we should make sure we use queue path instead of queue name at all other places. Otherwise we will complicate other logics. {quote} I agree that is what I had in mind too. Make it as simple as possible inside the scheduler and that is to use just the full path internally. For the configuration change: I do not think it is a problem and we can just accept the change. To be fair to the administrator we should show a message when the configuration is loaded or changed and the leaf queues are not unique (any more). However that is probably as far as we need to go. {quote}Instead of using scheduler.getQueue, we may need to consider to add a method like getAppSubmissionQueue() to get a queue based on path or name, and after that, we will put normalized queue_path back to submission context of application to make sure in the future inside scheduler we all refer to queue path. {quote} The FS already does something like this already because it uses a placement rule in all cases. We should leverage a similar mechanism in the CS. We pass the queue from the submission into the queue placement which handles the full path or not. In both cases it just passes back the queue object which will be using the full path. If the queue is not found or the queue name is not unique it fails as per normal. The returned queue info is updated in the app and submission context. Far simpler than putting the burden on the core scheduler. It is all hidden in the placement of the app into the queue using the placement engine. I did not mention queue mapping in my design. Queue mapping itself I thought did not need to change. 
We already calculate the parent queue in the rules if I am correct so the only change would be the return value. We do all internal handling for queues with the full queue path so it is a logical change. Using the placement rule for the qualified or not qualified mapping does require some changes in that area. I might have forgotten to mention other bits and pieces like the cli or flow on effects on the UI but that needs to be assessed when we have a design we agree on. There will be more jiras needed to fix separate parts when the change is made to the core. > Allow multiple leaf queues with the same name in CS > --- > > Key: YARN-9879 > URL: https://issues.apache.org/jira/browse/YARN-9879 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Gergely Pollak >Assignee: Gergely Pollak >Priority: Major > Attachments: DesignDoc_v1.pdf > > > Currently the leaf queue's name must be unique regardless of its position in > the queue hierarchy. > Design doc and first proposal is being made, I'll attach it as soon as it's > done.
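The getAppSubmissionQueue() behaviour discussed in the comment above can be sketched in a few lines (illustrative Python, not the actual CapacityScheduler API; all names here are assumptions): resolve a submitted queue reference that may be either a bare leaf name or a fully qualified path, and fail when a short name is ambiguous.

```python
class QueueNotFoundError(Exception):
    pass

class AmbiguousQueueError(Exception):
    pass

def resolve_submission_queue(ref, leaf_queue_paths):
    """Map a submitted queue reference to a unique full queue path.

    leaf_queue_paths: iterable of full paths like "root.users.alice".
    """
    paths = list(leaf_queue_paths)
    if ref in paths:
        return ref  # already a fully qualified path
    # Otherwise treat ref as a short leaf name; it must match exactly one leaf.
    matches = [p for p in paths if p.rsplit(".", 1)[-1] == ref]
    if not matches:
        raise QueueNotFoundError(ref)
    if len(matches) > 1:
        raise AmbiguousQueueError(f"{ref!r} matches {matches}")
    return matches[0]
```

The normalized full path returned here would then be written back into the app submission context, so everything inside the scheduler only ever deals in full paths.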
[jira] [Commented] (YARN-10072) TestCSAllocateCustomResource failures
[ https://issues.apache.org/jira/browse/YARN-10072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010147#comment-17010147 ] Eric Payne commented on YARN-10072: --- +1. I'll commit tomorrow and port all branches back to branch-2.10. > TestCSAllocateCustomResource failures > - > > Key: YARN-10072 > URL: https://issues.apache.org/jira/browse/YARN-10072 > Project: Hadoop YARN > Issue Type: Test > Components: yarn >Affects Versions: 2.10.0 >Reporter: Jim Brennan >Assignee: Jim Brennan >Priority: Major > Labels: YARN > Attachments: YARN-10072.001.patch, YARN-10072.002.patch > > > This test is failing for us consistently in our internal 2.10 based branch.
[jira] [Commented] (YARN-8672) TestContainerManager#testLocalingResourceWhileContainerRunning occasionally times out
[ https://issues.apache.org/jira/browse/YARN-8672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010143#comment-17010143 ] Jim Brennan commented on YARN-8672: --- [~ebadger] I have uploaded a patch for branch-3.2. > TestContainerManager#testLocalingResourceWhileContainerRunning occasionally > times out > - > > Key: YARN-8672 > URL: https://issues.apache.org/jira/browse/YARN-8672 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.10.0, 3.2.0 >Reporter: Jason Darrell Lowe >Assignee: Chandni Singh >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-8672-branch-2.10.001.patch, > YARN-8672-branch-2.10.002.patch, YARN-8672-branch-2.10.003.patch, > YARN-8672-branch-3.2.001.patch, YARN-8672.001.patch, YARN-8672.002.patch, > YARN-8672.003.patch, YARN-8672.004.patch, YARN-8672.005.patch, > YARN-8672.006.patch, YARN-8672.007.patch, YARN-8672.008.patch > > > Precommit builds have been failing in > TestContainerManager#testLocalingResourceWhileContainerRunning. I have been > able to reproduce the problem without any patch applied if I run the test > enough times. It looks like something is removing container tokens from the > nmPrivate area just as a new localizer starts. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8672) TestContainerManager#testLocalingResourceWhileContainerRunning occasionally times out
[ https://issues.apache.org/jira/browse/YARN-8672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Brennan updated YARN-8672: -- Attachment: YARN-8672-branch-3.2.001.patch > TestContainerManager#testLocalingResourceWhileContainerRunning occasionally > times out > - > > Key: YARN-8672 > URL: https://issues.apache.org/jira/browse/YARN-8672 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.10.0, 3.2.0 >Reporter: Jason Darrell Lowe >Assignee: Chandni Singh >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-8672-branch-2.10.001.patch, > YARN-8672-branch-2.10.002.patch, YARN-8672-branch-2.10.003.patch, > YARN-8672-branch-3.2.001.patch, YARN-8672.001.patch, YARN-8672.002.patch, > YARN-8672.003.patch, YARN-8672.004.patch, YARN-8672.005.patch, > YARN-8672.006.patch, YARN-8672.007.patch, YARN-8672.008.patch > > > Precommit builds have been failing in > TestContainerManager#testLocalingResourceWhileContainerRunning. I have been > able to reproduce the problem without any patch applied if I run the test > enough times. It looks like something is removing container tokens from the > nmPrivate area just as a new localizer starts. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-10073) Intraqueue preemption doesn't work across partitions
Paul Jones created YARN-10073: - Summary: Intraqueue preemption doesn't work across partitions Key: YARN-10073 URL: https://issues.apache.org/jira/browse/YARN-10073 Project: Hadoop YARN Issue Type: Bug Components: capacity scheduler, capacityscheduler, scheduler preemption Affects Versions: 2.8.5 Reporter: Paul Jones Cluster: 1 node with label "A" yarn.scheduler.capacity.root.accessible-node-labels=* yarn.resourcemanager.monitor.capacity.preemption.intra-queue-preemption.enabled=true yarn.scheduler.capacity.root.default.minimum-user-limit-percent=50 User 1: submits job Y, requiring 10x the cluster resources, to queue default using label "" User 2: (after job Y starts) submits job Z to queue default using label "" What we see: Job Z doesn't start until job Y releases resources. This happens because the pending requests for jobs Y and Z are in partition "". However, queue default is using resources in partition "A". Pending requests in partition "" don't trigger intra-queue preemption in partition "A". -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
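[Editor's note: as a sketch only, the reproduction settings listed in the YARN-10073 report above would normally be split across yarn-site.xml and capacity-scheduler.xml roughly as follows. The property names are taken from the report; the `yarn.resourcemanager.scheduler.monitor.enable` entry is an assumed prerequisite for any CapacityScheduler preemption and is not part of the report itself.]

```xml
<!-- yarn-site.xml -->
<!-- Assumed prerequisite (not listed in the report): the scheduler
     monitor must be on for preemption policies to run at all. -->
<property>
  <name>yarn.resourcemanager.scheduler.monitor.enable</name>
  <value>true</value>
</property>
<!-- From the report: enable intra-queue preemption. -->
<property>
  <name>yarn.resourcemanager.monitor.capacity.preemption.intra-queue-preemption.enabled</name>
  <value>true</value>
</property>

<!-- capacity-scheduler.xml -->
<!-- From the report: root queue may access all node labels ("A" included). -->
<property>
  <name>yarn.scheduler.capacity.root.accessible-node-labels</name>
  <value>*</value>
</property>
<!-- From the report: two users in queue default should each be
     guaranteed at least 50% of the queue's resources. -->
<property>
  <name>yarn.scheduler.capacity.root.default.minimum-user-limit-percent</name>
  <value>50</value>
</property>
```

With this layout, both jobs are submitted with the empty ("") label expression while the only node carries label "A", which is the partition mismatch the report describes.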
[jira] [Commented] (YARN-10004) Javadoc of YarnConfigurationStore#initialize is not straightforward
[ https://issues.apache.org/jira/browse/YARN-10004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010136#comment-17010136 ] Eric Payne commented on YARN-10004: --- [~snemeth], can you please be more specific about what is wrong with the JavaDoc of YarnConfigurationStore#initialize? > Javadoc of YarnConfigurationStore#initialize is not straightforward > --- > > Key: YARN-10004 > URL: https://issues.apache.org/jira/browse/YARN-10004 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Szilard Nemeth >Assignee: Siddharth Ahuja >Priority: Minor > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8672) TestContainerManager#testLocalingResourceWhileContainerRunning occasionally times out
[ https://issues.apache.org/jira/browse/YARN-8672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010120#comment-17010120 ] Hadoop QA commented on YARN-8672: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 18m 9s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 5 new or modified test files. {color} | || || || || {color:brown} branch-2.10 Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 13m 38s{color} | {color:green} branch-2.10 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 52s{color} | {color:green} branch-2.10 passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 51s{color} | {color:green} branch-2.10 passed with JDK v1.8.0_232 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 28s{color} | {color:green} branch-2.10 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 38s{color} | {color:green} branch-2.10 passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 0s{color} | {color:green} branch-2.10 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s{color} | {color:green} branch-2.10 passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s{color} | {color:green} branch-2.10 passed with JDK v1.8.0_232 {color} | || || || 
|| {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 32s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 51s{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 51s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 51s{color} | {color:green} the patch passed with JDK v1.8.0_232 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 51s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 24s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: The patch generated 1 new + 533 unchanged - 2 fixed = 534 total (was 535) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 34s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 12s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s{color} | {color:green} the patch passed with JDK v1.8.0_232 {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 15m 35s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 46s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 59m 34s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.5 Server=19.03.5 Image:yetus/hadoop:a969cad0a12 | | JIRA Issue | YARN-8672 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12990132/YARN-8672-branch-2.10.003.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux c67b8478c954 4.15.0-66-generic #75-Ubuntu SMP Tue Oct 1 05:24:09 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | branch-2.10 / 82bc477 | | maven | version: Apache Maven 3.3.9 | |
[jira] [Commented] (YARN-8672) TestContainerManager#testLocalingResourceWhileContainerRunning occasionally times out
[ https://issues.apache.org/jira/browse/YARN-8672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010117#comment-17010117 ] Jim Brennan commented on YARN-8672: --- Thanks [~ebadger] I will put up a patch for branch-3.2. > TestContainerManager#testLocalingResourceWhileContainerRunning occasionally > times out > - > > Key: YARN-8672 > URL: https://issues.apache.org/jira/browse/YARN-8672 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.10.0, 3.2.0 >Reporter: Jason Darrell Lowe >Assignee: Chandni Singh >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-8672-branch-2.10.001.patch, > YARN-8672-branch-2.10.002.patch, YARN-8672-branch-2.10.003.patch, > YARN-8672.001.patch, YARN-8672.002.patch, YARN-8672.003.patch, > YARN-8672.004.patch, YARN-8672.005.patch, YARN-8672.006.patch, > YARN-8672.007.patch, YARN-8672.008.patch > > > Precommit builds have been failing in > TestContainerManager#testLocalingResourceWhileContainerRunning. I have been > able to reproduce the problem without any patch applied if I run the test > enough times. It looks like something is removing container tokens from the > nmPrivate area just as a new localizer starts. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8672) TestContainerManager#testLocalingResourceWhileContainerRunning occasionally times out
[ https://issues.apache.org/jira/browse/YARN-8672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010115#comment-17010115 ] Eric Badger commented on YARN-8672: --- Thanks for the patch [~Jim_Brennan]! +1 on the branch-2.10 patch. Before I commit it, could you put up a patch for branch-3.2? The trunk patch doesn't apply cleanly and there are enough differences that I'm not comfortable fixing all of them without a patch running against hadoopQA. > TestContainerManager#testLocalingResourceWhileContainerRunning occasionally > times out > - > > Key: YARN-8672 > URL: https://issues.apache.org/jira/browse/YARN-8672 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.10.0, 3.2.0 >Reporter: Jason Darrell Lowe >Assignee: Chandni Singh >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-8672-branch-2.10.001.patch, > YARN-8672-branch-2.10.002.patch, YARN-8672-branch-2.10.003.patch, > YARN-8672.001.patch, YARN-8672.002.patch, YARN-8672.003.patch, > YARN-8672.004.patch, YARN-8672.005.patch, YARN-8672.006.patch, > YARN-8672.007.patch, YARN-8672.008.patch > > > Precommit builds have been failing in > TestContainerManager#testLocalingResourceWhileContainerRunning. I have been > able to reproduce the problem without any patch applied if I run the test > enough times. It looks like something is removing container tokens from the > nmPrivate area just as a new localizer starts. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10072) TestCSAllocateCustomResource failures
[ https://issues.apache.org/jira/browse/YARN-10072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010112#comment-17010112 ] Jim Brennan commented on YARN-10072: The unit test failure in TestCapacityScheduler is unrelated to this change: {noformat} [ERROR] TestCapacityScheduler.testResourceOverCommit:1467 Too long: 2412ms {noformat} > TestCSAllocateCustomResource failures > - > > Key: YARN-10072 > URL: https://issues.apache.org/jira/browse/YARN-10072 > Project: Hadoop YARN > Issue Type: Test > Components: yarn >Affects Versions: 2.10.0 >Reporter: Jim Brennan >Assignee: Jim Brennan >Priority: Major > Labels: YARN > Attachments: YARN-10072.001.patch, YARN-10072.002.patch > > > This test is failing for us consistently in our internal 2.10 based branch. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10072) TestCSAllocateCustomResource failures
[ https://issues.apache.org/jira/browse/YARN-10072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010111#comment-17010111 ] Hadoop QA commented on YARN-10072: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 43s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 9s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 41s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 33s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 44s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 16s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 19s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 52s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 42s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 42s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 31s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 41s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 4s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 21s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 86m 15s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 26s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}143m 6s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.5 Server=19.03.5 Image:yetus/hadoop:c44943d1fc3 | | JIRA Issue | YARN-10072 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12990123/YARN-10072.002.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux aba3cfcb2ada 4.15.0-66-generic #75-Ubuntu SMP Tue Oct 1 05:24:09 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / d1f5976 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_232 | | findbugs | v3.1.0-RC1 | | unit | https://builds.apache.org/job/PreCommit-YARN-Build/25342/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/25342/testReport/ | | Max. process+thread count | 834 (vs. ulimit of 5500) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U:
[jira] [Commented] (YARN-7387) org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestIncreaseAllocationExpirer fails intermittently
[ https://issues.apache.org/jira/browse/YARN-7387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010086#comment-17010086 ] Jim Brennan commented on YARN-7387: --- [~ebadger] or [~epayne] can you please review this one? > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestIncreaseAllocationExpirer > fails intermittently > --- > > Key: YARN-7387 > URL: https://issues.apache.org/jira/browse/YARN-7387 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Miklos Szegedi >Assignee: Jim Brennan >Priority: Major > Attachments: YARN-7387.001.patch > > > {code} > Tests run: 4, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 52.481 sec > <<< FAILURE! - in > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestIncreaseAllocationExpirer > testDecreaseAfterIncreaseWithAllocationExpiration(org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestIncreaseAllocationExpirer) > Time elapsed: 13.292 sec <<< FAILURE! > java.lang.AssertionError: expected:<3072> but was:<4096> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at org.junit.Assert.assertEquals(Assert.java:542) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestIncreaseAllocationExpirer.testDecreaseAfterIncreaseWithAllocationExpiration(TestIncreaseAllocationExpirer.java:459) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8672) TestContainerManager#testLocalingResourceWhileContainerRunning occasionally times out
[ https://issues.apache.org/jira/browse/YARN-8672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010084#comment-17010084 ] Jim Brennan commented on YARN-8672: --- patch 003 fixes 2 of the three checkstyle issues. The last one is: {noformat} ./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ContainerLocalizer.java:409: public static void buildMainArgs(List command,:22: More than 7 parameters (found 8). {noformat} This matches the trunk version. > TestContainerManager#testLocalingResourceWhileContainerRunning occasionally > times out > - > > Key: YARN-8672 > URL: https://issues.apache.org/jira/browse/YARN-8672 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.10.0, 3.2.0 >Reporter: Jason Darrell Lowe >Assignee: Chandni Singh >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-8672-branch-2.10.001.patch, > YARN-8672-branch-2.10.002.patch, YARN-8672-branch-2.10.003.patch, > YARN-8672.001.patch, YARN-8672.002.patch, YARN-8672.003.patch, > YARN-8672.004.patch, YARN-8672.005.patch, YARN-8672.006.patch, > YARN-8672.007.patch, YARN-8672.008.patch > > > Precommit builds have been failing in > TestContainerManager#testLocalingResourceWhileContainerRunning. I have been > able to reproduce the problem without any patch applied if I run the test > enough times. It looks like something is removing container tokens from the > nmPrivate area just as a new localizer starts. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8672) TestContainerManager#testLocalingResourceWhileContainerRunning occasionally times out
[ https://issues.apache.org/jira/browse/YARN-8672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Brennan updated YARN-8672: -- Attachment: YARN-8672-branch-2.10.003.patch > TestContainerManager#testLocalingResourceWhileContainerRunning occasionally > times out > - > > Key: YARN-8672 > URL: https://issues.apache.org/jira/browse/YARN-8672 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.10.0, 3.2.0 >Reporter: Jason Darrell Lowe >Assignee: Chandni Singh >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-8672-branch-2.10.001.patch, > YARN-8672-branch-2.10.002.patch, YARN-8672-branch-2.10.003.patch, > YARN-8672.001.patch, YARN-8672.002.patch, YARN-8672.003.patch, > YARN-8672.004.patch, YARN-8672.005.patch, YARN-8672.006.patch, > YARN-8672.007.patch, YARN-8672.008.patch > > > Precommit builds have been failing in > TestContainerManager#testLocalingResourceWhileContainerRunning. I have been > able to reproduce the problem without any patch applied if I run the test > enough times. It looks like something is removing container tokens from the > nmPrivate area just as a new localizer starts. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8672) TestContainerManager#testLocalingResourceWhileContainerRunning occasionally times out
[ https://issues.apache.org/jira/browse/YARN-8672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010064#comment-17010064 ] Hadoop QA commented on YARN-8672: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 45s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 5 new or modified test files. {color} | || || || || {color:brown} branch-2.10 Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 14m 1s{color} | {color:green} branch-2.10 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 50s{color} | {color:green} branch-2.10 passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 49s{color} | {color:green} branch-2.10 passed with JDK v1.8.0_232 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 28s{color} | {color:green} branch-2.10 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 35s{color} | {color:green} branch-2.10 passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 56s{color} | {color:green} branch-2.10 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s{color} | {color:green} branch-2.10 passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s{color} | {color:green} branch-2.10 passed with JDK v1.8.0_232 {color} | || || || 
|| {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 29s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 47s{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 47s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 43s{color} | {color:green} the patch passed with JDK v1.8.0_232 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 43s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 23s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: The patch generated 3 new + 534 unchanged - 2 fixed = 537 total (was 536) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 29s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 56s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 18s{color} | {color:green} the patch passed with JDK v1.8.0_232 {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 14m 55s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 22s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 40m 30s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.5 Server=19.03.5 Image:yetus/hadoop:a969cad0a12 | | JIRA Issue | YARN-8672 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12990125/YARN-8672-branch-2.10.002.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 6f74a7f96f20 4.15.0-66-generic #75-Ubuntu SMP Tue Oct 1 05:24:09 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | branch-2.10 / 82bc477 | | maven | version: Apache Maven 3.3.9 | |
[jira] [Updated] (YARN-8672) TestContainerManager#testLocalingResourceWhileContainerRunning occasionally times out
[ https://issues.apache.org/jira/browse/YARN-8672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Brennan updated YARN-8672: -- Attachment: YARN-8672-branch-2.10.002.patch > TestContainerManager#testLocalingResourceWhileContainerRunning occasionally > times out > - > > Key: YARN-8672 > URL: https://issues.apache.org/jira/browse/YARN-8672 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.10.0, 3.2.0 >Reporter: Jason Darrell Lowe >Assignee: Chandni Singh >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-8672-branch-2.10.001.patch, > YARN-8672-branch-2.10.002.patch, YARN-8672.001.patch, YARN-8672.002.patch, > YARN-8672.003.patch, YARN-8672.004.patch, YARN-8672.005.patch, > YARN-8672.006.patch, YARN-8672.007.patch, YARN-8672.008.patch > > > Precommit builds have been failing in > TestContainerManager#testLocalingResourceWhileContainerRunning. I have been > able to reproduce the problem without any patch applied if I run the test > enough times. It looks like something is removing container tokens from the > nmPrivate area just as a new localizer starts. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8672) TestContainerManager#testLocalingResourceWhileContainerRunning occasionally times out
[ https://issues.apache.org/jira/browse/YARN-8672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010008#comment-17010008 ] Jim Brennan commented on YARN-8672: --- Looks like I missed a change to DockerContainerExecutor. I will fix. > TestContainerManager#testLocalingResourceWhileContainerRunning occasionally > times out > - > > Key: YARN-8672 > URL: https://issues.apache.org/jira/browse/YARN-8672 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.10.0, 3.2.0 >Reporter: Jason Darrell Lowe >Assignee: Chandni Singh >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-8672-branch-2.10.001.patch, YARN-8672.001.patch, > YARN-8672.002.patch, YARN-8672.003.patch, YARN-8672.004.patch, > YARN-8672.005.patch, YARN-8672.006.patch, YARN-8672.007.patch, > YARN-8672.008.patch > > > Precommit builds have been failing in > TestContainerManager#testLocalingResourceWhileContainerRunning. I have been > able to reproduce the problem without any patch applied if I run the test > enough times. It looks like something is removing container tokens from the > nmPrivate area just as a new localizer starts. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10072) TestCSAllocateCustomResource failures
[ https://issues.apache.org/jira/browse/YARN-10072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Brennan updated YARN-10072: --- Attachment: YARN-10072.002.patch > TestCSAllocateCustomResource failures > - > > Key: YARN-10072 > URL: https://issues.apache.org/jira/browse/YARN-10072 > Project: Hadoop YARN > Issue Type: Test > Components: yarn >Affects Versions: 2.10.0 >Reporter: Jim Brennan >Assignee: Jim Brennan >Priority: Major > Labels: YARN > Attachments: YARN-10072.001.patch, YARN-10072.002.patch > > > This test is failing for us consistently in our internal 2.10 based branch. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10072) TestCSAllocateCustomResource failures
[ https://issues.apache.org/jira/browse/YARN-10072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17009990#comment-17009990 ] Jim Brennan commented on YARN-10072: Thanks [~epayne]! I will put up a new patch to fix those. > TestCSAllocateCustomResource failures > - > > Key: YARN-10072 > URL: https://issues.apache.org/jira/browse/YARN-10072 > Project: Hadoop YARN > Issue Type: Test > Components: yarn >Affects Versions: 2.10.0 >Reporter: Jim Brennan >Assignee: Jim Brennan >Priority: Major > Labels: YARN > Attachments: YARN-10072.001.patch > > > This test is failing for us consistently in our internal 2.10 based branch. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8672) TestContainerManager#testLocalingResourceWhileContainerRunning occasionally times out
[ https://issues.apache.org/jira/browse/YARN-8672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17009985#comment-17009985 ] Hadoop QA commented on YARN-8672: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 16m 8s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 5 new or modified test files. {color} | || || || || {color:brown} branch-2.10 Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 13m 20s{color} | {color:green} branch-2.10 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 51s{color} | {color:green} branch-2.10 passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 45s{color} | {color:green} branch-2.10 passed with JDK v1.8.0_232 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 27s{color} | {color:green} branch-2.10 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 34s{color} | {color:green} branch-2.10 passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 51s{color} | {color:green} branch-2.10 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s{color} | {color:green} branch-2.10 passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s{color} | {color:green} branch-2.10 passed with JDK v1.8.0_232 {color} | || || || || 
{color:brown} Patch Compile Tests {color} || | {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 19s{color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 23s{color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed with JDK v1.7.0_95. {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 0m 23s{color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed with JDK v1.7.0_95. {color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 19s{color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed with JDK v1.8.0_232. {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 0m 19s{color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed with JDK v1.8.0_232. {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 22s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: The patch generated 1 new + 480 unchanged - 0 fixed = 481 total (was 480) {color} | | {color:red}-1{color} | {color:red} mvnsite {color} | {color:red} 0m 19s{color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 16s{color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s{color} | {color:green} the patch passed with JDK v1.8.0_232 {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 20s{color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 22s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 38m 38s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.5 Server=19.03.5 Image:yetus/hadoop:a969cad0a12 | | JIRA Issue | YARN-8672 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12990120/YARN-8672-branch-2.10.001.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 9a779395d800 4.15.0-66-generic #75-Ubuntu SMP Tue Oct 1 05:24:09 UTC
[jira] [Commented] (YARN-9879) Allow multiple leaf queues with the same name in CS
[ https://issues.apache.org/jira/browse/YARN-9879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17009975#comment-17009975 ] Wangda Tan commented on YARN-9879: -- [~pbacsko], thanks for working on the design. In general, I agree with what [~wilfreds] mentioned: we should try to avoid changing RPC protocols; instead, we should just change the internal logic to make sure multiple queues can be handled. To me there are two major parts: 1) The logic inside CS that allows multiple queue names. Either solution mentioned in the comment: https://issues.apache.org/jira/browse/YARN-9879?focusedCommentId=17009845=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17009845 should be fine. I expect the lookup by queue name (not queue path) to be needed only when an application is submitted. Once the application is submitted to CS, internally we should make sure we use the queue path instead of the queue name in all other places; otherwise we will complicate other logic. 2) At submission time, the scheduler is going to accept/reject the app based on the uniqueness of the queue name or path specified. The core part that needs to be changed is inside RMAppManager: {code:java} if (!isRecovery && YarnConfiguration.isAclEnabled(conf)) { if (scheduler instanceof CapacityScheduler) { String queueName = submissionContext.getQueue(); String appName = submissionContext.getApplicationName(); CSQueue csqueue = ((CapacityScheduler) scheduler).getQueue(queueName);{code} Instead of using scheduler.getQueue, we may need to consider adding a method like getAppSubmissionQueue() that gets a queue based on path or name; after that, we put the normalized queue path back into the application's submission context to make sure that in the future, inside the scheduler, we always refer to the queue path. For the comment from [~wilfreds]: {quote}The important part is applying a new configuration. If the configuration adds a leaf queue that is not unique the configuration update currently is rejected. 
With this change we would allow that config to become active. This *could* break existing applications when they try to submit to the leaf queue that is no longer unique. {quote} I personally think it is not a big deal if the application rejection reasons from the RM clearly guide users to use the fully qualified queue path when duplicated queue names exist. It is like a team with only one Peter: we can use the first name alone; otherwise we add the last name to avoid confusion. It isn't counter-intuitive to me. Also, we need to handle queue mapping by queue path instead of queue name as well; I didn't see it in the design doc, or maybe I missed it. > Allow multiple leaf queues with the same name in CS > --- > > Key: YARN-9879 > URL: https://issues.apache.org/jira/browse/YARN-9879 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Gergely Pollak >Assignee: Gergely Pollak >Priority: Major > Attachments: DesignDoc_v1.pdf > > > Currently the leaf queue's name must be unique regardless of its position in > the queue hierarchy. > The design doc and first proposal are being made; I'll attach them as soon as they're > done.
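The getAppSubmissionQueue() idea above (accept a full path directly, fall back to a leaf name only when it is unique, and reject ambiguous short names with a message pointing users to the full path) could be sketched like this. All class and method names here are hypothetical illustrations, not the actual CapacityScheduler API:

```java
// Hypothetical sketch: resolve a submission-time queue reference to a
// normalized full path. QueueResolver and getAppSubmissionQueue are
// illustrative names only.
import java.util.HashMap;
import java.util.Map;

public class QueueResolver {
    // full queue path -> queue (represented here by its path string)
    private final Map<String, String> byPath = new HashMap<>();
    // short leaf name -> number of leaf queues carrying that name
    private final Map<String, Integer> leafNameCount = new HashMap<>();

    public void addLeafQueue(String fullPath) {
        byPath.put(fullPath, fullPath);
        String shortName = fullPath.substring(fullPath.lastIndexOf('.') + 1);
        leafNameCount.merge(shortName, 1, Integer::sum);
    }

    public String getAppSubmissionQueue(String ref) {
        if (byPath.containsKey(ref)) {
            return ref; // already a full, unambiguous path
        }
        Integer count = leafNameCount.get(ref);
        if (count == null) {
            throw new IllegalArgumentException("unknown queue: " + ref);
        }
        if (count > 1) {
            // reject with a message guiding the user to the full path
            throw new IllegalArgumentException(
                "ambiguous queue name '" + ref + "'; use the full queue path");
        }
        for (String path : byPath.keySet()) {
            if (path.endsWith("." + ref)) {
                return path; // unique short name resolved to its full path
            }
        }
        throw new IllegalStateException("inconsistent queue maps");
    }
}
```

The normalized path returned here would then be written back into the application's submission context, so everything downstream refers to the path only.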
[jira] [Commented] (YARN-8672) TestContainerManager#testLocalingResourceWhileContainerRunning occasionally times out
[ https://issues.apache.org/jira/browse/YARN-8672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17009944#comment-17009944 ] Eric Yang commented on YARN-8672: - [~Jim_Brennan] No objection on backport. [~ebadger] Could you shepherd the process if the precommit build passes? Thanks > TestContainerManager#testLocalingResourceWhileContainerRunning occasionally > times out > - > > Key: YARN-8672 > URL: https://issues.apache.org/jira/browse/YARN-8672 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.10.0, 3.2.0 >Reporter: Jason Darrell Lowe >Assignee: Chandni Singh >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-8672-branch-2.10.001.patch, YARN-8672.001.patch, > YARN-8672.002.patch, YARN-8672.003.patch, YARN-8672.004.patch, > YARN-8672.005.patch, YARN-8672.006.patch, YARN-8672.007.patch, > YARN-8672.008.patch > > > Precommit builds have been failing in > TestContainerManager#testLocalingResourceWhileContainerRunning. I have been > able to reproduce the problem without any patch applied if I run the test > enough times. It looks like something is removing container tokens from the > nmPrivate area just as a new localizer starts.
[jira] [Commented] (YARN-10072) TestCSAllocateCustomResource failures
[ https://issues.apache.org/jira/browse/YARN-10072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17009938#comment-17009938 ] Eric Payne commented on YARN-10072: --- [~Jim_Brennan], the new CheckStyle warnings are due to the now-unused imports in TestCSAllocateCustomResource. Is there any reason not to remove those? > TestCSAllocateCustomResource failures > - > > Key: YARN-10072 > URL: https://issues.apache.org/jira/browse/YARN-10072 > Project: Hadoop YARN > Issue Type: Test > Components: yarn >Affects Versions: 2.10.0 >Reporter: Jim Brennan >Assignee: Jim Brennan >Priority: Major > Labels: YARN > Attachments: YARN-10072.001.patch > > > This test is failing for us consistently in our internal 2.10 based branch.
[jira] [Commented] (YARN-10072) TestCSAllocateCustomResource failures
[ https://issues.apache.org/jira/browse/YARN-10072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17009936#comment-17009936 ] Hadoop QA commented on YARN-10072: -- | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 34s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 20s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 41s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 34s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 44s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 33s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 33s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 35s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 50s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 40s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 40s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 31s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 6 new + 0 unchanged - 0 fixed = 6 total (was 0) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 41s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 47s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 17s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 85m 29s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 26s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}143m 34s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.5 Server=19.03.5 Image:yetus/hadoop:c44943d1fc3 | | JIRA Issue | YARN-10072 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12990108/YARN-10072.001.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 0fd08927ccca 4.15.0-66-generic #75-Ubuntu SMP Tue Oct 1 05:24:09 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / bc366d4 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_232 | | findbugs | v3.1.0-RC1 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/25340/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/25340/testReport/ | | Max. process+thread count | 818 (vs. ulimit of 5500) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
[jira] [Reopened] (YARN-8672) TestContainerManager#testLocalingResourceWhileContainerRunning occasionally times out
[ https://issues.apache.org/jira/browse/YARN-8672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Brennan reopened YARN-8672: --- [~csingh], [~eyang] we are seeing these failures in branch-2.10. Any objection to pulling these changes back to branch-2.10? I will provide a patch. > TestContainerManager#testLocalingResourceWhileContainerRunning occasionally > times out > - > > Key: YARN-8672 > URL: https://issues.apache.org/jira/browse/YARN-8672 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 3.2.0 >Reporter: Jason Darrell Lowe >Assignee: Chandni Singh >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-8672.001.patch, YARN-8672.002.patch, > YARN-8672.003.patch, YARN-8672.004.patch, YARN-8672.005.patch, > YARN-8672.006.patch, YARN-8672.007.patch, YARN-8672.008.patch > > > Precommit builds have been failing in > TestContainerManager#testLocalingResourceWhileContainerRunning. I have been > able to reproduce the problem without any patch applied if I run the test > enough times. It looks like something is removing container tokens from the > nmPrivate area just as a new localizer starts.
[jira] [Commented] (YARN-10068) TimelineV2Client may leak file descriptors creating ClientResponse objects.
[ https://issues.apache.org/jira/browse/YARN-10068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17009895#comment-17009895 ] Anand Srinivasan commented on YARN-10068: - Hi Adam Antal, Thanks for the review and feedback. I very much appreciate it. Kind regards. > TimelineV2Client may leak file descriptors creating ClientResponse objects. > --- > > Key: YARN-10068 > URL: https://issues.apache.org/jira/browse/YARN-10068 > Project: Hadoop YARN > Issue Type: Sub-task > Components: ATSv2 >Affects Versions: 3.0.0 > Environment: HDP VERSION3.1.4 > AMBARI VERSION2.7.4.0 >Reporter: Anand Srinivasan >Assignee: Anand Srinivasan >Priority: Critical > Attachments: YARN-10068.001.patch, YARN-10068.002.patch, > YARN-10068.003.patch, image-2020-01-02-14-58-12-773.png > > > Hi team, > Code-walkthrough between v1 and v2 of TimelineClient API revealed that v2 API > TimelineV2ClientImpl#putObjects doesn't close ClientResponse objects under > success status returned from Timeline Server. ClientResponse is closed only > under erroneous response from the server using ClientResponse#getEntity. > We also noticed that TimelineClient (v1) closes the ClientResponse object in > TimelineWriter#putEntities by calling ClientResponse#getEntity in both > success and error conditions from the server thereby avoiding this file > descriptor leak. > Customer's original issue and the symptom was that the NodeManager went down > because of 'too many files open' condition where there were lots of > CLOSED_WAIT sockets observed between the timeline client (from NM) and the > timeline server hosts. > Could you please help resolve this issue ? Thanks. >
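The leak pattern described in this issue can be illustrated with a plain Closeable stand-in for Jersey's ClientResponse (a minimal sketch, not the actual TimelineV2ClientImpl code): the buggy shape releases the underlying connection only on error, while the fixed shape releases it on every path.

```java
// Minimal illustration of the descriptor leak described above.
// FakeResponse stands in for a response object that holds a socket.
import java.io.Closeable;

class FakeResponse implements Closeable {
    static int open = 0;           // number of unreleased "connections"
    FakeResponse() { open++; }
    int getStatus() { return 200; }
    @Override public void close() { open--; }
}

public class ResponseHandling {
    // Buggy shape: the response is only closed on the error path, so a
    // successful call leaks one descriptor per request.
    static void putObjectsLeaky() {
        FakeResponse resp = new FakeResponse();
        if (resp.getStatus() != 200) {
            resp.close();
        }
        // success path returns without closing -> leak
    }

    // Fixed shape: close in finally so both paths release the connection.
    static void putObjectsFixed() {
        FakeResponse resp = new FakeResponse();
        try {
            if (resp.getStatus() != 200) {
                throw new RuntimeException("server error");
            }
        } finally {
            resp.close();
        }
    }
}
```

With the fixed shape, repeated calls leave no CLOSED_WAIT sockets behind, which is the symptom the customer observed on the NodeManager.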
[jira] [Commented] (YARN-10067) Add dry-run feature to FS-CS converter tool
[ https://issues.apache.org/jira/browse/YARN-10067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17009892#comment-17009892 ] Szilard Nemeth commented on YARN-10067: --- Hi [~pbacsko]! Thanks for your patch. I have some comments: 1. In FSConfigToCSConfigArgumentHandler#parseAndConvert: Please extract the code from {code:java} dryRun = cliParser.hasOption(CliOption.DRY_RUN.shortSwitch); {code} until {code:java} converter.convert(params); {code} to a separate method, for better readability. 2. In FSConfigToCSConfigArgumentHandler#parseAndConvert: The exception handling logic is quite verbose as of now. You could extract the code block that occurs 3 times in the exception handling: {code:java} if (dryRun) { dryRunResultHolder.addDryRunError(msg); } else { logAndStdErr(e, msg); return -1; } {code} 3. I think FSConfigToCSConfigArgumentHandler#printDryRunResults is a good method, in terms of contents. I would rather move the whole printing logic into DryRunResultHolder instead, so that printing its own results is the responsibility of that class. 4. Nit: FSConfigToCSConfigConverter#dryRun: You can omit "= false" from the declaration, since in Java, boolean fields are initialized to false by default. 5. I can see in many places that the boolean dryRun and the DryRunResultHolder are passed in tandem. For example, in FSConfigToCSConfigConverter, in FSConfigToCSConfigRuleHandler and in FSConfigToCSConfigArgumentHandler. Can you create a class to hold these two together? For example, I can imagine something named like "RuntimeParameters", where you could hide details like the dry run, as well as any other future runtime options. 
Methods like FSConfigToCSConfigRuleHandler#handle and FSQueueConverter#convertQueueHierarchy could simply pass (delegate) the exceptionMessage to an instance of this RuntimeParameters class, and the instance could decide what to do with the error message: either throw an UnsupportedPropertyException or record it as a dry-run error. This way, the dry-run feature is better abstracted, in my opinion. 6. Why don't you use FSConfigToCSConfigConverterParams#isDryRun anywhere? Is this intentional? 7. In TestFSQueueConverter, you have very similar code calls to create the FSQueueConverter objects. I would suggest extracting a method that creates a builder object with those common calls, e.g. {code:java} FSQueueConverterBuilder.create() .withRuleHandler(ruleHandler) .withCapacitySchedulerConfig(csConfig) .withPreemptionEnabled(false) .withSizeBasedWeight(false) .withAutoCreateChildQueues(true) .withClusterResource(CLUSTER_RESOURCE) .withQueueMaxAMShareDefault(0.16f) .withQueueMaxAppsDefault(15) .withDryRun(false) {code} and then tweak the builder to meet the test case's needs. This way, you can have a default builder object with a few additional calls on it to prepare the converter object. > Add dry-run feature to FS-CS converter tool > --- > > Key: YARN-10067 > URL: https://issues.apache.org/jira/browse/YARN-10067 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > Attachments: YARN-10067-001.patch, YARN-10067-002.patch, > YARN-10067-003.patch > > > Add a "-d" / "--dry-run" switch to the tool. The purpose of this would be to > inform the user whether a conversion is possible and, if it is, whether there are any > warnings.
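A minimal shape for the suggested builder might look like the following; the field and method names mirror the review comment above and are hypothetical, not the actual converter API:

```java
// Hypothetical sketch of the suggested test builder; method names follow
// the review comment, with only a subset of the settings shown.
public class FSQueueConverterBuilder {
    boolean preemptionEnabled;
    boolean sizeBasedWeight;
    boolean autoCreateChildQueues;
    float queueMaxAMShareDefault;
    int queueMaxAppsDefault;
    boolean dryRun;

    public static FSQueueConverterBuilder create() {
        return new FSQueueConverterBuilder();
    }
    public FSQueueConverterBuilder withPreemptionEnabled(boolean v) {
        preemptionEnabled = v; return this;
    }
    public FSQueueConverterBuilder withSizeBasedWeight(boolean v) {
        sizeBasedWeight = v; return this;
    }
    public FSQueueConverterBuilder withAutoCreateChildQueues(boolean v) {
        autoCreateChildQueues = v; return this;
    }
    public FSQueueConverterBuilder withQueueMaxAMShareDefault(float v) {
        queueMaxAMShareDefault = v; return this;
    }
    public FSQueueConverterBuilder withQueueMaxAppsDefault(int v) {
        queueMaxAppsDefault = v; return this;
    }
    public FSQueueConverterBuilder withDryRun(boolean v) {
        dryRun = v; return this;
    }
}
```

Each test case could then start from a default builder and override only the settings it cares about, which removes the repeated construction calls.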
[jira] [Commented] (YARN-10068) TimelineV2Client may leak file descriptors creating ClientResponse objects.
[ https://issues.apache.org/jira/browse/YARN-10068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17009883#comment-17009883 ] Adam Antal commented on YARN-10068: --- Thanks for the patch [~anand.srinivasan]. On a second look, I agree with the resolution regarding my comments - thanks! +1 (non-binding) from me. > TimelineV2Client may leak file descriptors creating ClientResponse objects. > --- > > Key: YARN-10068 > URL: https://issues.apache.org/jira/browse/YARN-10068 > Project: Hadoop YARN > Issue Type: Sub-task > Components: ATSv2 >Affects Versions: 3.0.0 > Environment: HDP VERSION3.1.4 > AMBARI VERSION2.7.4.0 >Reporter: Anand Srinivasan >Assignee: Anand Srinivasan >Priority: Critical > Attachments: YARN-10068.001.patch, YARN-10068.002.patch, > YARN-10068.003.patch, image-2020-01-02-14-58-12-773.png > > > Hi team, > Code-walkthrough between v1 and v2 of TimelineClient API revealed that v2 API > TimelineV2ClientImpl#putObjects doesn't close ClientResponse objects under > success status returned from Timeline Server. ClientResponse is closed only > under erroneous response from the server using ClientResponse#getEntity. > We also noticed that TimelineClient (v1) closes the ClientResponse object in > TimelineWriter#putEntities by calling ClientResponse#getEntity in both > success and error conditions from the server thereby avoiding this file > descriptor leak. > Customer's original issue and the symptom was that the NodeManager went down > because of 'too many files open' condition where there were lots of > CLOSED_WAIT sockets observed between the timeline client (from NM) and the > timeline server hosts. > Could you please help resolve this issue ? Thanks. >
[jira] [Commented] (YARN-10072) TestCSAllocateCustomResource failures
[ https://issues.apache.org/jira/browse/YARN-10072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17009882#comment-17009882 ] Eric Payne commented on YARN-10072: --- The changes look good to me. I'll wait for the pre-commit build results and evaluate more. > TestCSAllocateCustomResource failures > - > > Key: YARN-10072 > URL: https://issues.apache.org/jira/browse/YARN-10072 > Project: Hadoop YARN > Issue Type: Test > Components: yarn >Affects Versions: 2.10.0 >Reporter: Jim Brennan >Assignee: Jim Brennan >Priority: Major > Labels: YARN > Attachments: YARN-10072.001.patch > > > This test is failing for us consistently in our internal 2.10 based branch.
[jira] [Commented] (YARN-10072) TestCSAllocateCustomResource failures
[ https://issues.apache.org/jira/browse/YARN-10072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17009868#comment-17009868 ] Jim Brennan commented on YARN-10072: [~epayne], I've put up patch 001 for this. Can you please review? > TestCSAllocateCustomResource failures > - > > Key: YARN-10072 > URL: https://issues.apache.org/jira/browse/YARN-10072 > Project: Hadoop YARN > Issue Type: Test > Components: yarn >Affects Versions: 2.10.0 >Reporter: Jim Brennan >Assignee: Jim Brennan >Priority: Major > Labels: YARN > Attachments: YARN-10072.001.patch > > > This test is failing for us consistently in our internal 2.10 based branch.
[jira] [Updated] (YARN-10027) Add ability for ATS (log servlet) to read logs of running apps
[ https://issues.apache.org/jira/browse/YARN-10027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam Antal updated YARN-10027: -- Description: Currently neither version of the AHS is able to read logs of running apps (local logs of NodeManager). YARN log CLI is integrated with NodeManager to extract local logs as well (see YARN-5224 for reference), the same should be done for ATS. Some context: The local log files are read by the server in {{NMWebServices#getContainerLogFile}}. This is accessed by the YARN logs CLI through REST using the /containers/\{containerid}/logs/\(unknown) endpoint in {{LogsCLI#getResponeFromNMWebService}}. If, in YARN-10026, we can pull the common code pieces out of those services, we can implement this in the common log servlet. was: Currently neither version of the AHS is able to read logs of running apps (local logs of NodeManager). YARN log CLI is integrated with NodeManager to extract local logs as well (see YARN-5224 for reference), the same should be done for ATS. Some context: The local log files are read by the server in {{NMWebServices#getContainerLogFile}}. This is accessed by the YARN logs CLI through REST using the /containers/{containerid}/logs/(unknown) endpoint in {{LogsCLI#getResponeFromNMWebService}}. If YARN-10026 we can pull the common code pieces out of those services, we can implement this in the common log servlet. > Add ability for ATS (log servlet) to read logs of running apps > -- > > Key: YARN-10027 > URL: https://issues.apache.org/jira/browse/YARN-10027 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Adam Antal >Assignee: Adam Antal >Priority: Minor > > Currently neither version of the AHS is able to read logs of running apps > (local logs of NodeManager). YARN log CLI is integrated with NodeManager to > extract local logs as well (see YARN-5224 for reference), the same should be > done for ATS. 
> Some context: > The local log files are read by the server in > {{NMWebServices#getContainerLogFile}}. This is accessed by the YARN logs CLI > through REST using the /containers/\{containerid}/logs/\(unknown) endpoint > in {{LogsCLI#getResponeFromNMWebService}}. > If, in YARN-10026, we can pull the common code pieces out of those services, we > can implement this in the common log servlet.
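For illustration, the NM web-services URL the description refers to could be assembled along these lines. Note the "/ws/v1/node" prefix and the trailing log-file segment are assumptions here, since the original text elides the last path component:

```java
// Illustrative sketch of building the NM container-log URL. The path
// layout is an assumption for this example, not a verified API contract.
public class NMLogUrl {
    public static String containerLogUrl(String nmHttpAddress,
            String containerId, String fileName) {
        return String.format("http://%s/ws/v1/node/containers/%s/logs/%s",
            nmHttpAddress, containerId, fileName);
    }
}
```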
[jira] [Commented] (YARN-10026) Pull out common code pieces from ATS v1.5 and v2
[ https://issues.apache.org/jira/browse/YARN-10026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17009858#comment-17009858 ] Adam Antal commented on YARN-10026: --- Jenkins passed on branch-3.2. [~snemeth], could you please commit this to that branch as well? > Pull out common code pieces from ATS v1.5 and v2 > > > Key: YARN-10026 > URL: https://issues.apache.org/jira/browse/YARN-10026 > Project: Hadoop YARN > Issue Type: Sub-task > Components: ATSv2, yarn >Affects Versions: 3.2.1 >Reporter: Adam Antal >Assignee: Adam Antal >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-10026.001.patch, YARN-10026.002.patch, > YARN-10026.003.patch, YARN-10026.branch-3.2.001.patch > > > ATSv1.5 and ATSv2 have lots of common code that can be pulled to an abstract > service / package. The logic is the same, and the code is _almost_ the same. > As far as I see, the only ATS-specific thing is that AppInfo is constructed > from an ApplicationReport, whose information is extracted from the > TimelineReader client. > Later the appInfo object's user and appState fields are used, but I see no > other dependency on the timeline part.
[jira] [Commented] (YARN-9879) Allow multiple leaf queues with the same name in CS
[ https://issues.apache.org/jira/browse/YARN-9879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17009845#comment-17009845 ] Peter Bacsko commented on YARN-9879: Alternatively, we can have {{Map<String, CSQueue> fullToCSQueue}} and {{Map<String, CSQueue> leafToCSQueue}}, so we can avoid the double lookup (not that it's really that expensive). Also it's probably better to have a {{Map<String, Integer>}} to check whether a leaf is unique. When we add/remove a queue, we increase/decrease a counter, so upon removal, we know whether it has become unique or not. > Allow multiple leaf queues with the same name in CS > --- > > Key: YARN-9879 > URL: https://issues.apache.org/jira/browse/YARN-9879 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Gergely Pollak >Assignee: Gergely Pollak >Priority: Major > Attachments: DesignDoc_v1.pdf > > > Currently the leaf queue's name must be unique regardless of its position in > the queue hierarchy. > The design doc and first proposal are being made; I'll attach them as soon as they're > done.
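The counter idea from the comment above could be sketched as follows (a standalone illustration, not CapacityScheduler code): track how many leaf queues share each short name, so uniqueness is an O(1) check and is recomputed correctly on removal.

```java
// Sketch of the leaf-name counter: increment on add, decrement on remove,
// and drop the entry entirely once the count reaches zero.
import java.util.HashMap;
import java.util.Map;

public class LeafNameCounter {
    private final Map<String, Integer> counts = new HashMap<>();

    public void addLeaf(String name) {
        counts.merge(name, 1, Integer::sum);
    }

    public void removeLeaf(String name) {
        // decrement; remove the mapping when the last occurrence goes away
        counts.computeIfPresent(name, (k, v) -> v > 1 ? v - 1 : null);
    }

    public boolean isUnique(String name) {
        return counts.getOrDefault(name, 0) == 1;
    }
}
```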
[jira] [Commented] (YARN-10043) FairOrderingPolicy Improvements
[ https://issues.apache.org/jira/browse/YARN-10043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17009840#comment-17009840 ] Manikandan R commented on YARN-10043: - [~wilfreds] Any suggestions? > FairOrderingPolicy Improvements > --- > > Key: YARN-10043 > URL: https://issues.apache.org/jira/browse/YARN-10043 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Manikandan R >Assignee: Manikandan R >Priority: Major > > FairOrderingPolicy can be improved by using some of the (relevant) > approaches implemented in the FairSharePolicy of FS. This improvement is > significant in the FS-to-CS migration context.
[jira] [Commented] (YARN-9866) u:user2:%primary_group is not working as expected
[ https://issues.apache.org/jira/browse/YARN-9866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17009838#comment-17009838 ] Manikandan R commented on YARN-9866: [~snemeth] The patch is ready for commit. Can you take a quick look? > u:user2:%primary_group is not working as expected > - > > Key: YARN-9866 > URL: https://issues.apache.org/jira/browse/YARN-9866 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Manikandan R >Assignee: Manikandan R >Priority: Major > Attachments: YARN-9866.001.patch, YARN-9866.002.patch, > YARN-9866.003.patch, YARN-9866.004.patch, YARN-9866.005.patch > > > Please refer to #1 in > https://issues.apache.org/jira/browse/YARN-9841?focusedCommentId=16937024=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16937024 > for more details
[jira] [Commented] (YARN-9768) RM Renew Delegation token thread should timeout and retry
[ https://issues.apache.org/jira/browse/YARN-9768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17009835#comment-17009835 ] Manikandan R commented on YARN-9768: [~bibinchundatt] This has been hanging for quite some time. Can we please get closure on this? > RM Renew Delegation token thread should timeout and retry > - > > Key: YARN-9768 > URL: https://issues.apache.org/jira/browse/YARN-9768 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: CR Hota >Assignee: Manikandan R >Priority: Major > Attachments: YARN-9768.001.patch, YARN-9768.002.patch, > YARN-9768.003.patch, YARN-9768.004.patch, YARN-9768.005.patch, > YARN-9768.006.patch, YARN-9768.007.patch, YARN-9768.008.patch > > > The delegation token renewer thread in the RM (DelegationTokenRenewer.java) renews > the HDFS tokens it receives to check their validity and expiration time. > This call is made to an underlying HDFS NN or Router node (which has the same > APIs as the HDFS NN). If one of the nodes is bad and the renew call gets stuck, the > thread remains stuck indefinitely. The thread should ideally time out the > renewToken call and retry from the client's side. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
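The timeout-and-retry behaviour requested above can be sketched with a plain `ExecutorService` that bounds the wait on the (possibly stuck) renew call. This is a hypothetical illustration of the idea, not the `DelegationTokenRenewer` API; the names `TimedRenewer` and `callWithTimeout` are invented for the sketch.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Run the renew call on a worker thread and bound the wait with Future.get,
// retrying a fixed number of times before giving up.
class TimedRenewer {
    static <T> T callWithTimeout(Callable<T> call, long timeoutMs, int retries)
            throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            for (int attempt = 0; ; attempt++) {
                Future<T> f = pool.submit(call);
                try {
                    return f.get(timeoutMs, TimeUnit.MILLISECONDS);
                } catch (TimeoutException e) {
                    f.cancel(true);      // interrupt the stuck attempt
                    if (attempt >= retries) {
                        throw e;         // give up after the configured retries
                    }
                }
            }
        } finally {
            pool.shutdownNow();
        }
    }
}
```

A real fix would also have to decide what happens to the token after repeated failures (e.g. rescheduling a later renew) instead of simply propagating the exception.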
[jira] [Commented] (YARN-9868) Validate %primary_group queue in CS queue manager
[ https://issues.apache.org/jira/browse/YARN-9868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17009829#comment-17009829 ] Manikandan R commented on YARN-9868: [~snemeth] Can you take it forward? > Validate %primary_group queue in CS queue manager > - > > Key: YARN-9868 > URL: https://issues.apache.org/jira/browse/YARN-9868 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Manikandan R >Assignee: Manikandan R >Priority: Major > Attachments: YARN-9868-003.patch, YARN-9868-003.patch, > YARN-9868-004.patch, YARN-9868.001.patch, YARN-9868.002.patch, > YARN-9868.005.patch > > > As part of %secondary_group mapping, we ensure the output of %secondary_group while > processing the queue mapping is available using CSQueueManager. Similarly, we > will need to do the same for %primary_group. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-10072) TestCSAllocateCustomResource failures
[ https://issues.apache.org/jira/browse/YARN-10072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Brennan reassigned YARN-10072: -- Attachment: YARN-10072.001.patch Assignee: Jim Brennan Labels: YARN (was: ) > TestCSAllocateCustomResource failures > - > > Key: YARN-10072 > URL: https://issues.apache.org/jira/browse/YARN-10072 > Project: Hadoop YARN > Issue Type: Test > Components: yarn >Affects Versions: 2.10.0 >Reporter: Jim Brennan >Assignee: Jim Brennan >Priority: Major > Labels: YARN > Attachments: YARN-10072.001.patch > > > This test is failing for us consistently in our internal 2.10 based branch. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10072) TestCSAllocateCustomResource failures
[ https://issues.apache.org/jira/browse/YARN-10072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17009806#comment-17009806 ] Jim Brennan commented on YARN-10072: I resolved this internally by changing TestCSAllocateCustomResource#testCapacitySchedulerInitWithCustomResourceType to use MockRM like the other test in TestCSAllocateCustomResource. It was using a lot of mocking/spying to try to isolate CapacityScheduler, but in so doing I think it introduced some inconsistency in the initialization process - I was getting inconsistent results depending on where I set breakpoints while debugging. By using MockRM, the CapacityScheduler initialization should more closely match what happens in production. And it removed the inconsistency with breakpoints. I will put up a patch for trunk shortly. > TestCSAllocateCustomResource failures > - > > Key: YARN-10072 > URL: https://issues.apache.org/jira/browse/YARN-10072 > Project: Hadoop YARN > Issue Type: Test > Components: yarn >Affects Versions: 2.10.0 >Reporter: Jim Brennan >Priority: Major > > This test is failing for us consistently in our internal 2.10 based branch. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10072) TestCSAllocateCustomResource failures
[ https://issues.apache.org/jira/browse/YARN-10072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17009800#comment-17009800 ] Jim Brennan commented on YARN-10072: Here is a sample failure: {noformat} --- Test set: org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCSAllocateCustomResource --- Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.291 s <<< FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCSAllocateCustomResource testCapacitySchedulerInitWithCustomResourceType(org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCSAllocateCustomResource) Time elapsed: 0.569 s <<< FAILURE! java.lang.AssertionError: Values should be different. Actual: 0 at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failEquals(Assert.java:185) at org.junit.Assert.assertNotEquals(Assert.java:161) at org.junit.Assert.assertNotEquals(Assert.java:198) at org.junit.Assert.assertNotEquals(Assert.java:209) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCSAllocateCustomResource.testCapacitySchedulerInitWithCustomResourceType(TestCSAllocateCustomResource.java:184) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) at 
org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229) at org.junit.runners.ParentRunner.run(ParentRunner.java:309) at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:379) at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:340) at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:125) at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:413) {noformat} > TestCSAllocateCustomResource failures > - > > Key: YARN-10072 > URL: https://issues.apache.org/jira/browse/YARN-10072 > Project: Hadoop YARN > Issue Type: Test > Components: yarn >Affects Versions: 2.10.0 >Reporter: Jim Brennan >Priority: Major > > This test is failing for us consistently in our internal 2.10 based branch. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-10072) TestCSAllocateCustomResource failures
Jim Brennan created YARN-10072: -- Summary: TestCSAllocateCustomResource failures Key: YARN-10072 URL: https://issues.apache.org/jira/browse/YARN-10072 Project: Hadoop YARN Issue Type: Test Components: yarn Affects Versions: 2.10.0 Reporter: Jim Brennan This test is failing for us consistently in our internal 2.10 based branch. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-9879) Allow multiple leaf queues with the same name in CS
[ https://issues.apache.org/jira/browse/YARN-9879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17009797#comment-17009797 ] Peter Bacsko edited comment on YARN-9879 at 1/7/20 2:55 PM: [~wilfreds] based on your suggestion, here's what I came up with: We can still maintain the HashMap with queueName->CSQueue, however we'd use two levels: 1. Leaf queue -> full path 2. Full path -> CSQueue object We additionally need an extra map which tells whether a leaf queue is unique. So after some thinking, this is the semi-pseudocode that could possibly do the job: {noformat} Map<String, CSQueue> fullPathQueues; Map<String, String> leafToFullPath; Map<String, Boolean> leafUnique; public CSQueue getQueue(String queueName) { if (fullPathName(queueName)) { return fullPathQueues.get(queueName); } else { if (leafUnique.get(queueName)) { String fullName = leafToFullPath.get(queueName); return fullPathQueues.get(fullName); } else { throw new YarnException(queueName + " is not unique"); } } } {noformat} Obviously methods like {{addQueue()}}, {{removeQueue()}} should be updated too. was (Author: pbacsko): [~wilfreds] based on your suggestion, here's what I came up with: We can still maintain the HashMap with queueName->CSQueue, however we'd use two levels: 1. Leaf queue -> full path 2. Full path -> CSQueue object We additionally need an extra map which tells whether a leaf queue is unique. So after some thinking, this is the semi-pseudocode that could possibly do the job: {noformat} Map<String, CSQueue> fullPathQueues; Map<String, String> leafToFullPath; Map<String, Boolean> leafUnique; public CSQueue getQueue(String queueName) { if (fullPathName(queueName)) { return queues.get(queueName); } else { if (leafUnique.get(queueName)) { String fullName = leafToFullPath.get(queueName); return queues.get(fullName); } else { throw new YarnException(queueName + " is not unique"); } } } {noformat} Obviously methods like {{addQueue()}}, {{removeQueue()}} should be updated too. 
> Allow multiple leaf queues with the same name in CS > --- > > Key: YARN-9879 > URL: https://issues.apache.org/jira/browse/YARN-9879 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Gergely Pollak >Assignee: Gergely Pollak >Priority: Major > Attachments: DesignDoc_v1.pdf > > > Currently the leaf queue's name must be unique regardless of its position in > the queue hierarchy. > Design doc and first proposal is being made, I'll attach it as soon as it's > done. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9879) Allow multiple leaf queues with the same name in CS
[ https://issues.apache.org/jira/browse/YARN-9879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17009797#comment-17009797 ] Peter Bacsko commented on YARN-9879: [~wilfreds] based on your suggestion, here's what I came up with: We can still maintain the HashMap with queueName->CSQueue, however we'd use two levels: 1. Leaf queue -> full path 2. Full path -> CSQueue object We additionally need an extra map which tells whether a leaf queue is unique. So after some thinking, this is the semi-pseudocode that could possibly do the job: {noformat} Map<String, CSQueue> fullPathQueues; Map<String, String> leafToFullPath; Map<String, Boolean> leafUnique; public CSQueue getQueue(String queueName) { if (fullPathName(queueName)) { return queues.get(queueName); } else { if (leafUnique.get(queueName)) { String fullName = leafToFullPath.get(queueName); return queues.get(fullName); } else { throw new YarnException(queueName + " is not unique"); } } } {noformat} Obviously methods like {{addQueue()}}, {{removeQueue()}} should be updated too. > Allow multiple leaf queues with the same name in CS > --- > > Key: YARN-9879 > URL: https://issues.apache.org/jira/browse/YARN-9879 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Gergely Pollak >Assignee: Gergely Pollak >Priority: Major > Attachments: DesignDoc_v1.pdf > > > Currently the leaf queue's name must be unique regardless of its position in > the queue hierarchy. > Design doc and first proposal is being made, I'll attach it as soon as it's > done. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-9879) Allow multiple leaf queues with the same name in CS
[ https://issues.apache.org/jira/browse/YARN-9879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17009797#comment-17009797 ] Peter Bacsko edited comment on YARN-9879 at 1/7/20 2:54 PM: [~wilfreds] based on your suggestion, here's what I came up with: We can still maintain the HashMap with queueName->CSQueue, however we'd use two levels: 1. Leaf queue -> full path 2. Full path -> CSQueue object We additionally need an extra map which tells whether a leaf queue is unique. So after some thinking, this is the semi-pseudocode that could possibly do the job: {noformat} Map<String, CSQueue> fullPathQueues; Map<String, String> leafToFullPath; Map<String, Boolean> leafUnique; public CSQueue getQueue(String queueName) { if (fullPathName(queueName)) { return queues.get(queueName); } else { if (leafUnique.get(queueName)) { String fullName = leafToFullPath.get(queueName); return queues.get(fullName); } else { throw new YarnException(queueName + " is not unique"); } } } {noformat} Obviously methods like {{addQueue()}}, {{removeQueue()}} should be updated too. was (Author: pbacsko): [~wilfreds] based on your suggestion, here's what I came up with: We can still maintain the HashMap with queueName->CSQueue, however we'd use two levels: 1. Leaf queue -> full path 2. Full path -> CSQueue object We additionally need an extra map which tells whether a leaf queue is unique. So after some thinking, this is the semi-pseudocode that could possibly do the job: {noformat} Map<String, CSQueue> fullPathQueues; Map<String, String> leafToFullPath; Map<String, Boolean> leafUnique; public CSQueue getQueue(String queueName) { if (fullPathName(queueName)) { return queues.get(queueName); } else { if (leafUnique.get(queueName)) { String fullName = leafToFullPath.get(queueName); return queues.get(fullName); } else { throw new YarnException(queueName + " is not unique"); } } } {noformat} Obviously methods like {{addQueue()}}, {{removeQueue()}} should be updated too. 
> Allow multiple leaf queues with the same name in CS > --- > > Key: YARN-9879 > URL: https://issues.apache.org/jira/browse/YARN-9879 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Gergely Pollak >Assignee: Gergely Pollak >Priority: Major > Attachments: DesignDoc_v1.pdf > > > Currently the leaf queue's name must be unique regardless of its position in > the queue hierarchy. > Design doc and first proposal is being made, I'll attach it as soon as it's > done. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
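The semi-pseudocode in the comments above can be fleshed out into a compilable sketch. This is a hypothetical illustration only: `CSQueue` is stubbed to a minimal placeholder, `QueueLookup` and `addQueue`'s uniqueness bookkeeping are invented for the sketch (the "extra map" from the comment), and `IllegalStateException` stands in for the pseudocode's `YarnException`.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal stand-in for the real CSQueue -- only the full path is kept.
class CSQueue {
    final String path;
    CSQueue(String path) { this.path = path; }
}

class QueueLookup {
    private final Map<String, CSQueue> fullPathQueues = new HashMap<>();
    private final Map<String, String> leafToFullPath = new HashMap<>();
    private final Map<String, Boolean> leafUnique = new HashMap<>();

    // Register a queue under its full path and track leaf-name ambiguity.
    void addQueue(String fullPath) {
        fullPathQueues.put(fullPath, new CSQueue(fullPath));
        String leaf = fullPath.substring(fullPath.lastIndexOf('.') + 1);
        // A leaf name stays unique only while it maps to a single full path.
        leafUnique.put(leaf, !leafToFullPath.containsKey(leaf)
                || leafToFullPath.get(leaf).equals(fullPath));
        leafToFullPath.put(leaf, fullPath);
    }

    CSQueue getQueue(String queueName) {
        // A dot marks a full-path reference here; the real code would use a
        // proper fullPathName() check.
        if (queueName.contains(".")) {
            return fullPathQueues.get(queueName);
        }
        if (Boolean.TRUE.equals(leafUnique.get(queueName))) {
            return fullPathQueues.get(leafToFullPath.get(queueName));
        }
        throw new IllegalStateException(queueName + " is not unique");
    }
}
```

As the comment notes, `removeQueue()` would need the mirror-image bookkeeping so that a leaf name can become unique again when a duplicate is removed.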
[jira] [Commented] (YARN-7387) org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestIncreaseAllocationExpirer fails intermittently
[ https://issues.apache.org/jira/browse/YARN-7387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17009784#comment-17009784 ] Jim Brennan commented on YARN-7387: --- Thanks [~snemeth]! Do you want to review the patch? cc: [~epayne] > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestIncreaseAllocationExpirer > fails intermittently > --- > > Key: YARN-7387 > URL: https://issues.apache.org/jira/browse/YARN-7387 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Miklos Szegedi >Assignee: Jim Brennan >Priority: Major > Attachments: YARN-7387.001.patch > > > {code} > Tests run: 4, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 52.481 sec > <<< FAILURE! - in > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestIncreaseAllocationExpirer > testDecreaseAfterIncreaseWithAllocationExpiration(org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestIncreaseAllocationExpirer) > Time elapsed: 13.292 sec <<< FAILURE! > java.lang.AssertionError: expected:<3072> but was:<4096> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at org.junit.Assert.assertEquals(Assert.java:542) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestIncreaseAllocationExpirer.testDecreaseAfterIncreaseWithAllocationExpiration(TestIncreaseAllocationExpirer.java:459) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-7387) org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestIncreaseAllocationExpirer fails intermittently
[ https://issues.apache.org/jira/browse/YARN-7387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Brennan reassigned YARN-7387: - Assignee: Jim Brennan (was: Szilard Nemeth) > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestIncreaseAllocationExpirer > fails intermittently > --- > > Key: YARN-7387 > URL: https://issues.apache.org/jira/browse/YARN-7387 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Miklos Szegedi >Assignee: Jim Brennan >Priority: Major > Attachments: YARN-7387.001.patch > > > {code} > Tests run: 4, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 52.481 sec > <<< FAILURE! - in > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestIncreaseAllocationExpirer > testDecreaseAfterIncreaseWithAllocationExpiration(org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestIncreaseAllocationExpirer) > Time elapsed: 13.292 sec <<< FAILURE! > java.lang.AssertionError: expected:<3072> but was:<4096> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at org.junit.Assert.assertEquals(Assert.java:542) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestIncreaseAllocationExpirer.testDecreaseAfterIncreaseWithAllocationExpiration(TestIncreaseAllocationExpirer.java:459) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9525) IFile format is not working against s3a remote folder
[ https://issues.apache.org/jira/browse/YARN-9525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17009673#comment-17009673 ] Hadoop QA commented on YARN-9525: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 40s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 24s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 26s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 38s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 10s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 37s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 53s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 36s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 36s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 22s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 39s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 31s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 30s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 43s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 38s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 26s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 62m 52s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.5 Server=19.03.5 Image:yetus/hadoop:c44943d1fc3 | | JIRA Issue | YARN-9525 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12990088/YARN-9525.006.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 73efc528e676 4.15.0-66-generic #75-Ubuntu SMP Tue Oct 1 05:24:09 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 2bbf73f | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_232 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/25339/testReport/ | | Max. process+thread count | 363 (vs. ulimit of 5500) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/25339/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated. > IFile format is
[jira] [Commented] (YARN-9525) IFile format is not working against s3a remote folder
[ https://issues.apache.org/jira/browse/YARN-9525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17009637#comment-17009637 ] Adam Antal commented on YARN-9525: -- Reuploaded patch v6 as its latest Jenkins result was a while ago. > IFile format is not working against s3a remote folder > - > > Key: YARN-9525 > URL: https://issues.apache.org/jira/browse/YARN-9525 > Project: Hadoop YARN > Issue Type: Sub-task > Components: log-aggregation >Affects Versions: 3.1.2 >Reporter: Adam Antal >Assignee: Adam Antal >Priority: Major > Attachments: IFile-S3A-POC01.patch, YARN-9525-001.patch, > YARN-9525.002.patch, YARN-9525.003.patch, YARN-9525.004.patch, > YARN-9525.005.patch, YARN-9525.006.patch, YARN-9525.006.patch > > > Using the IndexedFileFormat {{yarn.nodemanager.remote-app-log-dir}} > configured to an s3a URI throws the following exception during log > aggregation: > {noformat} > Cannot create writer for app application_1556199768861_0001. Skip log upload > this time. > java.io.IOException: java.io.FileNotFoundException: No such file or > directory: > s3a://adamantal-log-test/logs/systest/ifile/application_1556199768861_0001/adamantal-3.gce.cloudera.com_8041 > at > org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController.initializeWriter(LogAggregationIndexedFileController.java:247) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.uploadLogsForContainers(AppLogAggregatorImpl.java:306) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.doAppLogAggregation(AppLogAggregatorImpl.java:464) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.run(AppLogAggregatorImpl.java:420) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService$1.run(LogAggregationService.java:276) > at > 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.io.FileNotFoundException: No such file or directory: > s3a://adamantal-log-test/logs/systest/ifile/application_1556199768861_0001/adamantal-3.gce.cloudera.com_8041 > at > org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:2488) > at > org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:2382) > at > org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:2321) > at > org.apache.hadoop.fs.DelegateToFileSystem.getFileStatus(DelegateToFileSystem.java:128) > at org.apache.hadoop.fs.FileContext$15.next(FileContext.java:1244) > at org.apache.hadoop.fs.FileContext$15.next(FileContext.java:1240) > at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90) > at org.apache.hadoop.fs.FileContext.getFileStatus(FileContext.java:1246) > at > org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController$1.run(LogAggregationIndexedFileController.java:228) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) > at > org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController.initializeWriter(LogAggregationIndexedFileController.java:195) > ... 
7 more > {noformat} > This stack trace points to > {{LogAggregationIndexedFileController$initializeWriter}} where we do the > following steps (in a non-rolling log aggregation setup): > - create an FSDataOutputStream > - write out a UUID > - flush > - immediately after that, call GetFileStatus to get the length of the log > file (the bytes we just wrote out), and that's where the failure happens: > the file is not there yet due to eventual consistency. > Maybe we can get rid of that, so we can use the IFile format against an s3a target. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
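The failing sequence described above (write, flush, then a GetFileStatus round trip just to learn the length of the bytes we wrote) can be avoided by tracking the write position locally, much like `FSDataOutputStream`'s own `getPos()`. Below is a hypothetical, stand-alone sketch of that idea using plain `java.io` types; `CountingOutputStream` is an invented name and this is not the actual YARN patch.

```java
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Count bytes as they are written so the writer never has to ask the
// (eventually consistent) remote store for the file length.
class CountingOutputStream extends FilterOutputStream {
    private long pos = 0;

    CountingOutputStream(OutputStream out) {
        super(out);
    }

    @Override
    public void write(int b) throws IOException {
        out.write(b);
        pos++;
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        out.write(b, off, len);  // bypass FilterOutputStream's byte-by-byte copy
        pos += len;
    }

    // Length of everything written so far -- no GetFileStatus round trip.
    long getPos() {
        return pos;
    }
}
```

With this pattern, the index offsets can be computed from `getPos()` right after the flush, regardless of whether the object is visible to a listing yet.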
[jira] [Updated] (YARN-9525) IFile format is not working against s3a remote folder
[ https://issues.apache.org/jira/browse/YARN-9525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam Antal updated YARN-9525: - Attachment: YARN-9525.006.patch > IFile format is not working against s3a remote folder > - > > Key: YARN-9525 > URL: https://issues.apache.org/jira/browse/YARN-9525 > Project: Hadoop YARN > Issue Type: Sub-task > Components: log-aggregation >Affects Versions: 3.1.2 >Reporter: Adam Antal >Assignee: Adam Antal >Priority: Major > Attachments: IFile-S3A-POC01.patch, YARN-9525-001.patch, > YARN-9525.002.patch, YARN-9525.003.patch, YARN-9525.004.patch, > YARN-9525.005.patch, YARN-9525.006.patch, YARN-9525.006.patch > > > Using the IndexedFileFormat {{yarn.nodemanager.remote-app-log-dir}} > configured to an s3a URI throws the following exception during log > aggregation: > {noformat} > Cannot create writer for app application_1556199768861_0001. Skip log upload > this time. > java.io.IOException: java.io.FileNotFoundException: No such file or > directory: > s3a://adamantal-log-test/logs/systest/ifile/application_1556199768861_0001/adamantal-3.gce.cloudera.com_8041 > at > org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController.initializeWriter(LogAggregationIndexedFileController.java:247) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.uploadLogsForContainers(AppLogAggregatorImpl.java:306) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.doAppLogAggregation(AppLogAggregatorImpl.java:464) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.run(AppLogAggregatorImpl.java:420) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService$1.run(LogAggregationService.java:276) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.io.FileNotFoundException: No such file or directory: > s3a://adamantal-log-test/logs/systest/ifile/application_1556199768861_0001/adamantal-3.gce.cloudera.com_8041 > at > org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:2488) > at > org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:2382) > at > org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:2321) > at > org.apache.hadoop.fs.DelegateToFileSystem.getFileStatus(DelegateToFileSystem.java:128) > at org.apache.hadoop.fs.FileContext$15.next(FileContext.java:1244) > at org.apache.hadoop.fs.FileContext$15.next(FileContext.java:1240) > at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90) > at org.apache.hadoop.fs.FileContext.getFileStatus(FileContext.java:1246) > at > org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController$1.run(LogAggregationIndexedFileController.java:228) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) > at > org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController.initializeWriter(LogAggregationIndexedFileController.java:195) > ... 7 more > {noformat} > This stack trace points to > {{LogAggregationIndexedFileController$initializeWriter}} where we do the > following steps (in a non-rolling log aggregation setup): > - create an FSDataOutputStream > - write out a UUID > - flush > - immediately after that, call GetFileStatus to get the length of the log > file (the bytes we just wrote out), and that's where the failure happens: > the file is not there yet due to eventual consistency. 
> Maybe we can get rid of that, so we can use the IFile format against an s3a target.
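One way to "get rid of that" getFileStatus call is to track the number of bytes written on the client side instead of asking the (eventually consistent) store for the file length right after a flush. The sketch below is illustrative only — CountingOutputStream is a hypothetical stand-in, not the actual Hadoop implementation in LogAggregationIndexedFileController:

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

/**
 * Sketch: track the stream length locally so no getFileStatus() round trip
 * is needed after the flush. Names here are illustrative, not YARN's.
 */
public class CountingOutputStream extends FilterOutputStream {
    private long bytesWritten = 0;

    public CountingOutputStream(OutputStream out) {
        super(out);
    }

    @Override
    public void write(int b) throws IOException {
        out.write(b);
        bytesWritten++;
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        out.write(b, off, len);
        bytesWritten += len;
    }

    /** Current length of the stream, known without touching the store. */
    public long getBytesWritten() {
        return bytesWritten;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        CountingOutputStream counting = new CountingOutputStream(sink);
        // Stand-in for the UUID the writer emits before the length check.
        byte[] uuid = "0123456789abcdef".getBytes();
        counting.write(uuid, 0, uuid.length);
        counting.flush();
        // The offset is already known locally; s3a's eventual consistency
        // never comes into play.
        System.out.println(counting.getBytesWritten()); // prints 16
    }
}
```

The write offset the controller needs is exactly the number of bytes it has pushed into the stream, so counting locally sidesteps the FileNotFoundException entirely.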
[jira] [Commented] (YARN-10071) Sync Mockito version with other modules
[ https://issues.apache.org/jira/browse/YARN-10071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17009632#comment-17009632 ] Akira Ajisaka commented on YARN-10071: -- The Mockito 1.x API is no longer used in the MaWo module, so removing the dependency from the pom.xml files seems fine. > Sync Mockito version with other modules > --- > > Key: YARN-10071 > URL: https://issues.apache.org/jira/browse/YARN-10071 > Project: Hadoop YARN > Issue Type: Sub-task > Components: build, test >Reporter: Akira Ajisaka >Priority: Major > > YARN-8551 introduced a Mockito 1.x dependency; update it to match the version used by other modules.
[jira] [Updated] (YARN-8374) Upgrade objenesis to 2.6
[ https://issues.apache.org/jira/browse/YARN-8374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira Ajisaka updated YARN-8374: Component/s: build Summary: Upgrade objenesis to 2.6 (was: Upgrade objenesis dependency) > Upgrade objenesis to 2.6 > > > Key: YARN-8374 > URL: https://issues.apache.org/jira/browse/YARN-8374 > Project: Hadoop YARN > Issue Type: Improvement > Components: build, timelineservice >Reporter: Jason Darrell Lowe >Assignee: Akira Ajisaka >Priority: Major > > After HADOOP-14918 is committed we should be able to remove the explicit > objenesis dependency and objenesis exclusion from the fst dependency to pick > up the version fst wants naturally.
[jira] [Assigned] (YARN-8374) Upgrade objenesis dependency
[ https://issues.apache.org/jira/browse/YARN-8374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira Ajisaka reassigned YARN-8374: --- Assignee: Akira Ajisaka > Upgrade objenesis dependency > > > Key: YARN-8374 > URL: https://issues.apache.org/jira/browse/YARN-8374 > Project: Hadoop YARN > Issue Type: Improvement > Components: timelineservice >Reporter: Jason Darrell Lowe >Assignee: Akira Ajisaka >Priority: Major > > After HADOOP-14918 is committed we should be able to remove the explicit > objenesis dependency and objenesis exclusion from the fst dependency to pick > up the version fst wants naturally.
[jira] [Created] (YARN-10071) Sync Mockito version with other modules
Akira Ajisaka created YARN-10071: Summary: Sync Mockito version with other modules Key: YARN-10071 URL: https://issues.apache.org/jira/browse/YARN-10071 Project: Hadoop YARN Issue Type: Sub-task Components: build, test Reporter: Akira Ajisaka YARN-8551 introduced a Mockito 1.x dependency; update it to match the version used by other modules.
[jira] [Created] (YARN-10070) NPE if no rule is defined and application-tag-based-placement is enabled
Kinga Marton created YARN-10070: --- Summary: NPE if no rule is defined and application-tag-based-placement is enabled Key: YARN-10070 URL: https://issues.apache.org/jira/browse/YARN-10070 Project: Hadoop YARN Issue Type: Bug Reporter: Kinga Marton Assignee: Kinga Marton If there is no rule defined for a user, an NPE is thrown by the following line. {code:java} String queue = placementManager .placeApplication(context, usernameUsedForPlacement).getQueue();{code}
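The NPE arises because the placement result is dereferenced unconditionally. A minimal null-guarded sketch of the call from the description — PlacementManager and ApplicationPlacementContext below are simplified stand-ins, not the real YARN classes, and the default-queue fallback is one assumed fix, not necessarily the one the patch will take:

```java
/**
 * Sketch of a null-safe version of the placement call. All class names
 * are minimal stand-ins for illustration only.
 */
public class PlacementSketch {
    /** Stand-in for the result of a placement rule. */
    static class ApplicationPlacementContext {
        private final String queue;
        ApplicationPlacementContext(String queue) { this.queue = queue; }
        String getQueue() { return queue; }
    }

    /** Stand-in placement manager: returns null when no rule matches. */
    static class PlacementManager {
        ApplicationPlacementContext placeApplication(String context, String user) {
            return null; // no rule defined for this user
        }
    }

    static final String DEFAULT_QUEUE = "default";

    /** Falls back to a default queue instead of dereferencing null. */
    static String placeOrDefault(PlacementManager pm, String context, String user) {
        ApplicationPlacementContext placement = pm.placeApplication(context, user);
        return placement != null ? placement.getQueue() : DEFAULT_QUEUE;
    }

    public static void main(String[] args) {
        PlacementManager pm = new PlacementManager();
        // The unguarded original would throw an NPE here; the guarded
        // version returns the fallback queue instead.
        System.out.println(placeOrDefault(pm, "context", "alice")); // prints "default"
    }
}
```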
[jira] [Commented] (YARN-7913) Improve error handling when application recovery fails with exception
[ https://issues.apache.org/jira/browse/YARN-7913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17009551#comment-17009551 ] Wilfred Spiegelenburg commented on YARN-7913: - [~snemeth] and [~sunilg] can you have a look at the change please? > Improve error handling when application recovery fails with exception > - > > Key: YARN-7913 > URL: https://issues.apache.org/jira/browse/YARN-7913 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 3.0.0 >Reporter: Gergo Repas >Assignee: Wilfred Spiegelenburg >Priority: Major > Attachments: YARN-7913.000.poc.patch, YARN-7913.001.patch, > YARN-7913.002.patch, YARN-7913.003.patch > > > There are edge cases when the application recovery fails with an exception. > Example failure scenario: > * setup: a queue is a leaf queue in the primary RM's config and the same > queue is a parent queue in the secondary RM's config. > * When failover happens with this setup, the recovery will fail for > applications on this queue, and an APP_REJECTED event will be dispatched to > the async dispatcher. On the same thread (that handles the recovery), a > NullPointerException is thrown when we try to recover the applicationAttempt > (https://github.com/apache/hadoop/blob/55066cc53dc22b68f9ca55a0029741d6c846be0a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java#L494). > I don't see a good way to avoid the NPE in this scenario, because when the > NPE occurs the APP_REJECTED event has not been processed yet, and we don't know > that the application recovery failed. > Currently the first exception will abort the recovery, and if there are X > applications, there will be ~X passive -> active RM transition attempts - the > passive -> active RM transition will only succeed when the last APP_REJECTED > event is processed on the async dispatcher thread. 
> _The point of this ticket is to improve the error handling and reduce the > number of passive -> active RM transition attempts (solving the above > described failure scenario isn't in scope)._
[jira] [Commented] (YARN-6212) NodeManager metrics returning wrong negative values
[ https://issues.apache.org/jira/browse/YARN-6212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17009544#comment-17009544 ] Max Xie commented on YARN-6212: In my cluster, the NodeManager metrics return negative values too. > NodeManager metrics returning wrong negative values > --- > > Key: YARN-6212 > URL: https://issues.apache.org/jira/browse/YARN-6212 > Project: Hadoop YARN > Issue Type: Bug > Components: metrics >Affects Versions: 2.7.3 >Reporter: Abhishek Shivanna >Priority: Major > > It looks like the metrics returned by the NodeManager have negative values > for metrics that should never be negative. Here is the output from the NM endpoint > {noformat} > /jmx?qry=Hadoop:service=NodeManager,name=NodeManagerMetrics > {noformat} > {noformat} > { > "beans" : [ { > "name" : "Hadoop:service=NodeManager,name=NodeManagerMetrics", > "modelerType" : "NodeManagerMetrics", > "tag.Context" : "yarn", > "tag.Hostname" : "", > "ContainersLaunched" : 707, > "ContainersCompleted" : 9, > "ContainersFailed" : 124, > "ContainersKilled" : 579, > "ContainersIniting" : 0, > "ContainersRunning" : 19, > "AllocatedGB" : -26, > "AllocatedContainers" : -5, > "AvailableGB" : 252, > "AllocatedVCores" : -5, > "AvailableVCores" : 101, > "ContainerLaunchDurationNumOps" : 718, > "ContainerLaunchDurationAvgTime" : 18.0 > } ] > } > {noformat} > Is there any circumstance under which the value for AllocatedGB, > AllocatedContainers and AllocatedVCores can go below 0?
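Negative values in an "allocated" gauge typically mean a resource release was counted more than once, so the decrement overshoots the earlier increments. The real NodeManagerMetrics is built on Hadoop's metrics2 library; the plain-Java sketch below only illustrates one possible mitigation (clamping the gauge at zero) and is an assumption, not the actual fix — the root cause of the double decrement would still need to be found:

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Sketch of a gauge that cannot go negative. Illustrative only; not the
 * Hadoop metrics2 implementation used by NodeManagerMetrics.
 */
public class GuardedGauge {
    private final AtomicLong value = new AtomicLong();

    /** Record an allocation, e.g. of containers or GB. */
    public void incr(long delta) {
        value.addAndGet(delta);
    }

    /** Record a release, clamping at zero so a double release cannot drive the gauge negative. */
    public void decr(long delta) {
        value.updateAndGet(v -> Math.max(0, v - delta));
    }

    public long get() {
        return value.get();
    }

    public static void main(String[] args) {
        GuardedGauge allocatedContainers = new GuardedGauge();
        allocatedContainers.incr(5);   // 5 containers allocated
        allocatedContainers.decr(5);   // all released
        allocatedContainers.decr(5);   // a duplicate release event
        // Without the clamp this would read -5, like the JMX output above.
        System.out.println(allocatedContainers.get()); // prints 0
    }
}
```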
[jira] [Commented] (YARN-9624) Use switch case for ProtoUtils#convertFromProtoFormat containerState
[ https://issues.apache.org/jira/browse/YARN-9624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17009536#comment-17009536 ] Bibin Chundatt commented on YARN-9624: -- [~BilwaST] Could you update the patch? > Use switch case for ProtoUtils#convertFromProtoFormat containerState > > > Key: YARN-9624 > URL: https://issues.apache.org/jira/browse/YARN-9624 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Bibin Chundatt >Assignee: Bilwa S T >Priority: Major > Labels: performance > Attachments: YARN-9624.001.patch, YARN-9624.002.patch > > > On a large cluster with 100K+ containers, evaluating > {{ContainerState.valueOf(e.name().replace(CONTAINER_STATE_PREFIX, ""))}} on every heartbeat will > be too costly. Update it to use a switch case.
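The proposed change swaps per-call string manipulation for a direct enum switch. The sketch below uses small stand-in enums (the real YARN ContainerStateProto and ContainerState have more values), so it shows the shape of the change rather than the actual patch:

```java
public class ContainerStateConverter {
    // Stand-ins for the proto enum and the public API enum; the real
    // YARN enums contain additional states.
    enum ContainerStateProto { C_NEW, C_RUNNING, C_COMPLETE }
    enum ContainerState { NEW, RUNNING, COMPLETE }

    private static final String CONTAINER_STATE_PREFIX = "C_";

    /** Current approach: builds a new String and does a name lookup on every call. */
    static ContainerState convertViaString(ContainerStateProto e) {
        return ContainerState.valueOf(e.name().replace(CONTAINER_STATE_PREFIX, ""));
    }

    /** Proposed approach: a switch with no per-call string allocation. */
    static ContainerState convertViaSwitch(ContainerStateProto e) {
        switch (e) {
            case C_NEW:      return ContainerState.NEW;
            case C_RUNNING:  return ContainerState.RUNNING;
            case C_COMPLETE: return ContainerState.COMPLETE;
            default:
                throw new IllegalArgumentException("Unknown container state: " + e);
        }
    }

    public static void main(String[] args) {
        // Both conversions must agree for every proto value.
        for (ContainerStateProto p : ContainerStateProto.values()) {
            assert convertViaString(p) == convertViaSwitch(p);
        }
        System.out.println(convertViaSwitch(ContainerStateProto.C_RUNNING)); // prints RUNNING
    }
}
```

On a hot path invoked for 100K+ containers per heartbeat, eliminating the String allocation and hash-based valueOf lookup is a small but repeated saving.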
[jira] [Commented] (YARN-7387) org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestIncreaseAllocationExpirer fails intermittently
[ https://issues.apache.org/jira/browse/YARN-7387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17009478#comment-17009478 ] Szilard Nemeth commented on YARN-7387: -- Hi [~Jim_Brennan]! Thanks. I have never worked on this jira; it is just assigned to me because I had planned to work on it. Feel free to reassign the jira to yourself. > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestIncreaseAllocationExpirer > fails intermittently > --- > > Key: YARN-7387 > URL: https://issues.apache.org/jira/browse/YARN-7387 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Miklos Szegedi >Assignee: Szilard Nemeth >Priority: Major > Attachments: YARN-7387.001.patch > > > {code} > Tests run: 4, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 52.481 sec > <<< FAILURE! - in > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestIncreaseAllocationExpirer > testDecreaseAfterIncreaseWithAllocationExpiration(org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestIncreaseAllocationExpirer) > Time elapsed: 13.292 sec <<< FAILURE! > java.lang.AssertionError: expected:<3072> but was:<4096> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at org.junit.Assert.assertEquals(Assert.java:542) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestIncreaseAllocationExpirer.testDecreaseAfterIncreaseWithAllocationExpiration(TestIncreaseAllocationExpirer.java:459) > {code}