[jira] [Commented] (YARN-9076) ContainerShellWebSocket Render Process Output
[ https://issues.apache.org/jira/browse/YARN-9076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16781743#comment-16781743 ] BELUGA BEHR commented on YARN-9076: --- [~giovanni.fumarola] Hello. Thank you for your review. I see what you mean. I'm not sure what the purpose of this 4K buffer is... maybe to protect from a bad client that sends an unlimited stream of data? I don't really know. However, I have provided a new patch that again caps the read at max 4K. Also, using {{available()}} is almost never the correct thing to do. Its return value can be very funky. Every {{InputStream}} implementation does it differently. > ContainerShellWebSocket Render Process Output > - > > Key: YARN-9076 > URL: https://issues.apache.org/jira/browse/YARN-9076 > Project: Hadoop YARN > Issue Type: Improvement > Components: webapp >Affects Versions: 3.3.0 >Reporter: BELUGA BEHR >Assignee: BELUGA BEHR >Priority: Minor > Attachments: YARN-9076.1.patch, YARN-9076.2.patch > > > {code:java|title=ContainerShellWebSocket.java} > // Render process output > int no = pair.in.available(); > pair.in.read(buffer, 0, Math.min(no, buffer.length)); > String formatted = new String(buffer, Charset.forName("UTF-8")) > .replaceAll("\n", "\r\n"); > session.getRemote().sendString(formatted); > } > {code} > This code strikes me as a bit odd. First off, it is using {{available()}} > which is known to be unreliable and inaccurate (e.g., for sockets). > Second, it will only read a max of 4000 characters and that's it. Anything > else is truncated. > Change this code to read the entire data stream. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
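For illustration only (this is not the attached patch), a minimal sketch of the read-until-EOF pattern being discussed: loop until {{read()}} returns -1 instead of sizing the read with {{available()}}. The class and method names are hypothetical, it targets a finite stream (a live shell stream would not want to block until EOF), and it glosses over multi-byte UTF-8 sequences split across buffer boundaries.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class ReadFully {
  // Loop until read() returns -1 (end of stream) instead of sizing the read
  // with available(); a bounded scratch buffer still caps each chunk at 4K.
  // Simplification: assumes chunk boundaries never split a multi-byte
  // UTF-8 sequence.
  static String drain(InputStream in) {
    byte[] buffer = new byte[4096];
    StringBuilder out = new StringBuilder();
    try {
      int n;
      while ((n = in.read(buffer, 0, buffer.length)) != -1) {
        out.append(new String(buffer, 0, n, StandardCharsets.UTF_8));
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    // Same newline translation the original snippet performs.
    return out.toString().replaceAll("\n", "\r\n");
  }

  public static void main(String[] args) {
    InputStream in = new ByteArrayInputStream(
        "line1\nline2\n".getBytes(StandardCharsets.UTF_8));
    System.out.println(drain(in)); // full stream, line endings converted to \r\n
  }
}
```

The point of the sketch is that the amount read is driven by end-of-stream, not by whatever {{available()}} happens to report at the moment of the call.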
[jira] [Updated] (YARN-9076) ContainerShellWebSocket Render Process Output
[ https://issues.apache.org/jira/browse/YARN-9076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR updated YARN-9076: -- Attachment: YARN-9076.3.patch
[jira] [Updated] (YARN-9076) ContainerShellWebSocket Render Process Output
[ https://issues.apache.org/jira/browse/YARN-9076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR updated YARN-9076: -- Attachment: YARN-9076.2.patch
[jira] [Updated] (YARN-8146) Remove LinkedList From resourcemanager.reservation.planning Package
[ https://issues.apache.org/jira/browse/YARN-8146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR updated YARN-8146: -- Attachment: YARN-8146.2.patch > Remove LinkedList From resourcemanager.reservation.planning Package > --- > > Key: YARN-8146 > URL: https://issues.apache.org/jira/browse/YARN-8146 > Project: Hadoop YARN > Issue Type: Improvement > Components: reservation system >Affects Versions: 3.0.1 >Reporter: BELUGA BEHR >Assignee: BELUGA BEHR >Priority: Trivial > Attachments: YARN-8146.1.patch, YARN-8146.2.patch > > > Remove {{LinkedList}} instances in favor of {{ArrayList}}. {{ArrayList}} is > generally more memory efficient, causes less memory fragmentation, and, > thanks to memory locality, is faster to iterate.
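As a quick illustration of why this swap is safe (a hypothetical example, not code from the patch): both classes implement {{java.util.List}}, so only the constructor call changes and all observable behavior is preserved.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

public class ListSwap {
  // Before: one node allocation per element, pointer chasing on iteration.
  static List<Integer> before() {
    List<Integer> l = new LinkedList<>();
    for (int i = 0; i < 5; i++) l.add(i);
    return l;
  }

  // After: one backing array, cache-friendly iteration; same List contract.
  static List<Integer> after() {
    List<Integer> l = new ArrayList<>();
    for (int i = 0; i < 5; i++) l.add(i);
    return l;
  }

  public static void main(String[] args) {
    // List.equals compares elements in order, so the two are interchangeable.
    System.out.println(before().equals(after()));
  }
}
```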
[jira] [Updated] (YARN-9260) Re-Launch ApplicationMasters That Fail With OOM Using Larger Container
[ https://issues.apache.org/jira/browse/YARN-9260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR updated YARN-9260: -- Environment: (was: If an ApplicationMaster fails with an OOM, or is killed by YARN for using more memory than is allowed by the container it's launched in, when re-trying the AM, increase its container size.)
[jira] [Commented] (YARN-9260) Re-Launch ApplicationMasters That Fail With OOM Using Larger Container
[ https://issues.apache.org/jira/browse/YARN-9260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16757787#comment-16757787 ] BELUGA BEHR commented on YARN-9260: --- Thanks for the input [~eepayne]. I could see a new configuration called {{growth-factor}} which specifies how much to grow the container each time it is re-tried. This would be a percentage; a {{growth-factor}} of 1.0f (100%) would therefore preserve the current behavior.
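A hypothetical sketch of how such a {{growth-factor}} could be applied; the method name, signature, and formula are illustrative only and are not part of any patch.

```java
public class AmRetrySizing {
  // Hypothetical sketch of the proposed growth-factor setting: scale the AM
  // container request on each OOM retry, capped at the cluster's maximum
  // allocation. A growthFactor of 1.0f (100%) preserves the current behavior.
  static int nextContainerMb(int baseMb, float growthFactor, int attempt, int maxMb) {
    double scaled = baseMb * Math.pow(growthFactor, attempt);
    return (int) Math.min(scaled, maxMb);
  }

  public static void main(String[] args) {
    // With a growth-factor of 1.5f: 1024 -> 1536 -> 2304 across retries.
    System.out.println(nextContainerMb(1024, 1.5f, 2, 8192));
  }
}
```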
[jira] [Updated] (YARN-9260) Re-Launch ApplicationMasters That Fail With OOM Using Larger Container
[ https://issues.apache.org/jira/browse/YARN-9260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR updated YARN-9260: -- Description: If an ApplicationMaster fails with an OOM, or is killed by YARN for using more memory than is allowed by the container it's launched in, when re-trying the AM, increase its container size.
[jira] [Created] (YARN-9260) Re-Launch ApplicationMasters That Fail With OOM Using Larger Container
BELUGA BEHR created YARN-9260: - Summary: Re-Launch ApplicationMasters That Fail With OOM Using Larger Container Key: YARN-9260 URL: https://issues.apache.org/jira/browse/YARN-9260 Project: Hadoop YARN Issue Type: Improvement Environment: If an ApplicationMaster fails with an OOM, or is killed by YARN for using more memory than is allowed by the container it's launched in, when re-trying the AM, increase its container size. Reporter: BELUGA BEHR
[jira] [Comment Edited] (YARN-9259) Assign ApplicationMaster (AM) Heap Memory Based on Container Size
[ https://issues.apache.org/jira/browse/YARN-9259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16757518#comment-16757518 ] BELUGA BEHR edited comment on YARN-9259 at 1/31/19 5:14 PM: This goes hand-in-hand with another proposed idea: [MAPREDUCE-7180]. If the AM fails with a memory error, it should be relaunched with a larger container. However, the larger container may not be enough on its own; it may need a larger JVM heap size too (hence this ticket). This is a cruder (and simpler) way of determining the correct container/heap size than [MAPREDUCE-5892]: it simply retries with progressively larger container sizes instead of having to come up with a heuristic to reason about the amount of memory required based on the number of splits and other complicating factors. MR ApplicationMasters are not all that dynamic with regard to memory usage, though; [MAPREDUCE-207] first needs to be addressed. was (Author: belugabehr): This goes hand-in-hand with another proposed Idea: [MAPREDUCE-7180]. If the AM fails with a memory error, it should be relaunched with a larger container (and therefore a larger JVM heap size).
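A hypothetical illustration of the rule being requested (mirroring the task-side behavior added by MAPREDUCE-5785): derive the JVM heap from the container size via a configurable ratio. The class name, method, and default ratio here are assumptions for the sketch, not the actual implementation.

```java
public class AmHeapSizing {
  // Hypothetical helper: compute -Xmx (in MB) as a ratio of the YARN
  // container size, the way MAPREDUCE-5785 does for Mappers/Reducers.
  // A heapRatio of 0.8f corresponds to the 80% default mentioned above.
  static int heapMbFor(int containerMb, float heapRatio) {
    return Math.max(1, (int) (containerMb * heapRatio));
  }

  public static void main(String[] args) {
    // If the AM container grows, the heap argument grows with it automatically.
    System.out.println("-Xmx" + heapMbFor(2048, 0.8f) + "m");
  }
}
```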
[jira] [Commented] (YARN-9260) Re-Launch ApplicationMasters That Fail With OOM Using Larger Container
[ https://issues.apache.org/jira/browse/YARN-9260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16757528#comment-16757528 ] BELUGA BEHR commented on YARN-9260: --- When re-launching the container, the AM JVM Heap memory needs to be increased too: [YARN-9259]
[jira] [Commented] (YARN-9259) Assign ApplicationMaster (AM) Heap Memory Based on Container Size
[ https://issues.apache.org/jira/browse/YARN-9259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16757518#comment-16757518 ] BELUGA BEHR commented on YARN-9259: --- This goes hand-in-hand with another proposed idea: [MAPREDUCE-7180]. If the AM fails with a memory error, it should be relaunched with a larger container (and therefore a larger JVM heap size).
[jira] [Updated] (YARN-9259) Assign ApplicationMaster (AM) Heap Memory Based on Container Size
[ https://issues.apache.org/jira/browse/YARN-9259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR updated YARN-9259: -- Summary: Assign ApplicationMaster (AM) Heap Memory Based on Container Size (was: Assign ApplicationMaster (AM) Memory Based on Container Size)
[jira] [Created] (YARN-9259) Assign ApplicationMaster (AM) Memory Based on Container Size
BELUGA BEHR created YARN-9259: - Summary: Assign ApplicationMaster (AM) Memory Based on Container Size Key: YARN-9259 URL: https://issues.apache.org/jira/browse/YARN-9259 Project: Hadoop YARN Issue Type: Improvement Reporter: BELUGA BEHR [YARN-7936] introduced a sane default value for the ApplicationMaster (AM) Java heap size. However, [MAPREDUCE-5785] added a feature that sets the Java heap size of the Mapper/Reducer to be 80% (configurable, of course) of the YARN container size. Please add similar logic for the MR ApplicationMaster. If the size of the AM container is increased, the JVM heap size should automatically be increased too.
[jira] [Commented] (YARN-9189) Clarify FairScheduler submission logging
[ https://issues.apache.org/jira/browse/YARN-9189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16739522#comment-16739522 ] BELUGA BEHR commented on YARN-9189: --- I was having the same problem as Patrick, I think. I'm not sure how my local git repo got so completely out of sync. I blew it away and started from scratch, so this latest patch should be good. Sorry for the testing spam. > Clarify FairScheduler submission logging > > > Key: YARN-9189 > URL: https://issues.apache.org/jira/browse/YARN-9189 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler >Affects Versions: 3.2.0 >Reporter: Patrick Bayne >Priority: Minor > Attachments: YARN-9189.1.patch, YARN-9189.2.patch, YARN-9189.3.patch, > YARN-9189.4.patch > > > Logging was ambiguous for the FairScheduler. It was unclear if the "total > number applications" was referring to the global total or the queue's total. > Fixed wording/spelling of output logging.
[jira] [Updated] (YARN-9189) Clarify FairScheduler submission logging
[ https://issues.apache.org/jira/browse/YARN-9189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR updated YARN-9189: -- Attachment: YARN-9189.4.patch
[jira] [Comment Edited] (YARN-9189) Clarify FairScheduler submission logging
[ https://issues.apache.org/jira/browse/YARN-9189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16738967#comment-16738967 ] BELUGA BEHR edited comment on YARN-9189 at 1/10/19 3:35 PM: I think the original work was done on the wrong branch. was (Author: belugabehr): I think the original work was done on the wrong branch. I also updated logging to use slf4j.
[jira] [Updated] (YARN-9189) Clarify FairScheduler submission logging
[ https://issues.apache.org/jira/browse/YARN-9189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR updated YARN-9189: -- Attachment: YARN-9189.3.patch
[jira] [Updated] (YARN-9189) Clarify FairScheduler submission logging
[ https://issues.apache.org/jira/browse/YARN-9189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR updated YARN-9189: -- Attachment: YARN-9189.2.patch
[jira] [Commented] (YARN-8789) Add BoundedQueue to AsyncDispatcher
[ https://issues.apache.org/jira/browse/YARN-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16707273#comment-16707273 ] BELUGA BEHR commented on YARN-8789: --- [~pbacsko] The {{offer()}} method is wrapped in a 'while' clause so it will continue to attempt to put the event in the queue for as long as it takes. They are not lost. [~wilfreds] The customer is using a version of CDH from before [MAPREDUCE-5124] was introduced. This queue change is also intended to throttle. If the queue is full, the producers will wait (their threads will block). If they wait a long time, I imagine that events coming from remote clients like a Mapper or Reducer will simply time out and fail. The tasks will have to be re-tried, but it is better, in my mind, to have to restart a subset of tasks than to kill the AM with an OOM and never complete. > Add BoundedQueue to AsyncDispatcher > --- > > Key: YARN-8789 > URL: https://issues.apache.org/jira/browse/YARN-8789 > Project: Hadoop YARN > Issue Type: Improvement > Components: applications >Affects Versions: 3.2.0 >Reporter: BELUGA BEHR >Assignee: BELUGA BEHR >Priority: Major > Attachments: YARN-8789.1.patch, YARN-8789.10.patch, > YARN-8789.12.patch, YARN-8789.14.patch, YARN-8789.2.patch, YARN-8789.3.patch, > YARN-8789.4.patch, YARN-8789.5.patch, YARN-8789.6.patch, YARN-8789.7.patch, > YARN-8789.7.patch, YARN-8789.8.patch, YARN-8789.9.patch > > > I recently came across a scenario where an MR ApplicationMaster was failing > with an OOM exception. It had many thousands of Mappers and thousands of > Reducers. It was noted in the logging that the event-queue of > {{AsyncDispatcher}} had a very large number of items in it and was seemingly > never decreasing. > I started looking at the code and thought it could use some clean up, > simplification, and the ability to specify a bounded queue so that any > incoming events are throttled until they can be processed. This will protect > the ApplicationMaster from a flood of events. > Logging Message: > Size of event-queue is xxx
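The throttling idea described in the comment can be sketched as follows. This is a hypothetical stand-in, not the attached patch: a bounded queue whose producers retry {{offer()}} in a 'while' clause when the queue is full, so events are delayed rather than dropped or accumulated without bound until the AM OOMs. The class and method names are invented for illustration, and a real implementation would block or back off instead of spinning.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class BoundedDispatchSketch {
  // A fixed-capacity queue: when it is full, producers are throttled
  // instead of the queue growing until the process runs out of memory.
  private final BlockingQueue<String> eventQueue;

  public BoundedDispatchSketch(int capacity) {
    this.eventQueue = new ArrayBlockingQueue<>(capacity);
  }

  public void dispatch(String event) {
    // offer() wrapped in a 'while' clause: keep trying for as long as it
    // takes, so events are delayed but never lost.
    while (!eventQueue.offer(event)) {
      Thread.onSpinWait(); // a real dispatcher would block or back off here
    }
  }

  public String poll() {
    return eventQueue.poll(); // next event in FIFO order, or null when empty
  }

  public static void main(String[] args) {
    BoundedDispatchSketch d = new BoundedDispatchSketch(2);
    d.dispatch("event-1");
    d.dispatch("event-2"); // queue at capacity; a third dispatch would spin
    System.out.println(d.poll()); // events come out in FIFO order
  }
}
```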
[jira] [Comment Edited] (YARN-8789) Add BoundedQueue to AsyncDispatcher
[ https://issues.apache.org/jira/browse/YARN-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16707273#comment-16707273 ] BELUGA BEHR edited comment on YARN-8789 at 12/3/18 2:31 PM: [~pbacsko] The {{offer()}} method is wrapped in a 'while' clause so it will continue to attempt to put the event in the queue for as long as it takes. They are not lost. [~wilfreds] The customer is using a version of CDH from before [MAPREDUCE-5124] was introduced. This queue change is also intended to throttle clients and protect the AM. If the queue is full, the producers will wait (their threads will block). If they wait a long time, I imagine that events coming from remote clients like a Mapper or Reducer will simply time out and fail. The tasks will have to be re-tried, but it is better, in my mind, to have to restart a subset of tasks than to kill the AM with an OOM and never complete. was (Author: belugabehr): [~pbacsko] The {{offer()}} method is wrapped in a 'while' clause so it will continue to attempt to put the event in the queue for as long as it takes. They are not lost. [~wilfreds] Customer is using a version of CDH from before [MAPREDUCE-5124] was introduced. This queue change is also intended to throttle. If the queue is full, the producers will wait (their threads will block). If they wait a long time, I imagine that the events coming from a remote clients like a Mapper or Reducer will simply timeout and fail. The tasks will have to be re-tried, but it is better, in my mind, to have to restart a subset of tasks than to kill the AM with an OOM and never complete.
[jira] [Created] (YARN-9076) ContainerShellWebSocket Render Process Output
BELUGA BEHR created YARN-9076: - Summary: ContainerShellWebSocket Render Process Output Key: YARN-9076 URL: https://issues.apache.org/jira/browse/YARN-9076 Project: Hadoop YARN Issue Type: Improvement Components: webapp Affects Versions: 3.3.0 Reporter: BELUGA BEHR Attachments: YARN-9076.1.patch {code:java|title=ContainerShellWebSocket.java} // Render process output int no = pair.in.available(); pair.in.read(buffer, 0, Math.min(no, buffer.length)); String formatted = new String(buffer, Charset.forName("UTF-8")) .replaceAll("\n", "\r\n"); session.getRemote().sendString(formatted); } {code} This code strikes me as a bit odd. First off, it is using {{available()}} which is known to be unreliable and inaccurate (e.g., for sockets). Second, it will only read a max of 4000 characters and that's it. Anything else is truncated. Change this code to read the entire data stream.
[jira] [Assigned] (YARN-9076) ContainerShellWebSocket Render Process Output
[ https://issues.apache.org/jira/browse/YARN-9076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR reassigned YARN-9076: - Assignee: BELUGA BEHR
[jira] [Updated] (YARN-9076) ContainerShellWebSocket Render Process Output
[ https://issues.apache.org/jira/browse/YARN-9076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR updated YARN-9076: -- Attachment: YARN-9076.1.patch
[jira] [Updated] (YARN-8928) TestRMAdminService is failing
[ https://issues.apache.org/jira/browse/YARN-8928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR updated YARN-8928: -- Attachment: YARN-8928.3.patch > TestRMAdminService is failing > - > > Key: YARN-8928 > URL: https://issues.apache.org/jira/browse/YARN-8928 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.3.0 >Reporter: Jason Lowe >Assignee: BELUGA BEHR >Priority: Major > Attachments: YARN-8928.1.patch, YARN-8928.2.patch, YARN-8928.3.patch > > > After HADOOP-15836 TestRMAdminService has started failing consistently. > Sample stacktraces to follow.
[jira] [Commented] (YARN-8928) TestRMAdminService is failing
[ https://issues.apache.org/jira/browse/YARN-8928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16659637#comment-16659637 ] BELUGA BEHR commented on YARN-8928: --- Hey [~elgoiri], thanks for the review. I'm sorry for the headache, especially since you've been very supportive of my efforts to scrub the code base. I'm working on HADOOP-12640; I actually did some work against this class a long time ago which I can now complete. I can use that ticket as an opportunity to update the JavaDoc to include information about ordering. I agree that it should be in there. I don't want to do too much in this ticket though; I just want the project to pass all unit tests ASAP. In regard to {{buildAclString}}: implementing it with a {{TreeSet}} seems a bit overkill. The test does not have to implement things the same way as the actual code (which may change at some point in the future); it just needs to put things in order. However, I can move the comments to the new test method.
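To illustrate the ordering point being debated (a hypothetical stand-in, not the real {{buildAclString}}): routing names through a sorted collection makes the expected string deterministic regardless of which user happens to run the test.

```java
import java.util.Arrays;
import java.util.TreeSet;

public class AclOrdering {
  // Hypothetical stand-in for buildAclString: join users and groups in
  // alphabetical order so the expected value does not depend on the name
  // of the user running the test.
  static String buildAclString(String[] users, String[] groups) {
    return String.join(",", new TreeSet<>(Arrays.asList(users)))
        + " " + String.join(",", new TreeSet<>(Arrays.asList(groups)));
  }

  public static void main(String[] args) {
    // Insertion order differs from output order; the output is stable.
    System.out.println(buildAclString(new String[] {"zoe", "abe"},
                                      new String[] {"ops"}));
  }
}
```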
[jira] [Updated] (YARN-8928) TestRMAdminService is failing
[ https://issues.apache.org/jira/browse/YARN-8928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR updated YARN-8928: -- Attachment: YARN-8928.2.patch > TestRMAdminService is failing > - > > Key: YARN-8928 > URL: https://issues.apache.org/jira/browse/YARN-8928 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.3.0 >Reporter: Jason Lowe >Assignee: BELUGA BEHR >Priority: Major > Attachments: YARN-8928.1.patch, YARN-8928.2.patch > > > After HADOOP-15836 TestRMAdminService has started failing consistently. > Sample stacktraces to follow. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8928) TestRMAdminService is failing
[ https://issues.apache.org/jira/browse/YARN-8928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16659138#comment-16659138 ] BELUGA BEHR commented on YARN-8928: --- One of the challenges here is that the tests use the name of the user running the test; since the list was made alphabetical, the user name can affect the test outcome. I've included a patch to address this. > TestRMAdminService is failing > - > > Key: YARN-8928 > URL: https://issues.apache.org/jira/browse/YARN-8928 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.3.0 >Reporter: Jason Lowe >Assignee: BELUGA BEHR >Priority: Major > Attachments: YARN-8928.1.patch > > > After HADOOP-15836 TestRMAdminService has started failing consistently. > Sample stacktraces to follow. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8928) TestRMAdminService is failing
[ https://issues.apache.org/jira/browse/YARN-8928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR updated YARN-8928: -- Attachment: YARN-8928.1.patch > TestRMAdminService is failing > - > > Key: YARN-8928 > URL: https://issues.apache.org/jira/browse/YARN-8928 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.3.0 >Reporter: Jason Lowe >Assignee: BELUGA BEHR >Priority: Major > Attachments: YARN-8928.1.patch > > > After HADOOP-15836 TestRMAdminService has started failing consistently. > Sample stacktraces to follow. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-8928) TestRMAdminService is failing
[ https://issues.apache.org/jira/browse/YARN-8928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR reassigned YARN-8928: - Assignee: BELUGA BEHR > TestRMAdminService is failing > - > > Key: YARN-8928 > URL: https://issues.apache.org/jira/browse/YARN-8928 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.3.0 >Reporter: Jason Lowe >Assignee: BELUGA BEHR >Priority: Major > > After HADOOP-15836 TestRMAdminService has started failing consistently. > Sample stacktraces to follow. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-8894) Improve InMemoryPlan.java toString
[ https://issues.apache.org/jira/browse/YARN-8894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR reassigned YARN-8894: - Assignee: BELUGA BEHR > Improve InMemoryPlan.java toString > -- > > Key: YARN-8894 > URL: https://issues.apache.org/jira/browse/YARN-8894 > Project: Hadoop YARN > Issue Type: Improvement > Components: reservation system >Affects Versions: 3.2.0 >Reporter: BELUGA BEHR >Assignee: BELUGA BEHR >Priority: Minor > Attachments: YARN-8894.1.patch > > > * Replace {{StringBuffer}} with {{StringBuilder}} > * Add spaces between fields for readability -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8894) Improve InMemoryPlan.java toString
[ https://issues.apache.org/jira/browse/YARN-8894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR updated YARN-8894: -- Attachment: YARN-8894.1.patch > Improve InMemoryPlan.java toString > -- > > Key: YARN-8894 > URL: https://issues.apache.org/jira/browse/YARN-8894 > Project: Hadoop YARN > Issue Type: Improvement > Components: reservation system >Affects Versions: 3.2.0 >Reporter: BELUGA BEHR >Assignee: BELUGA BEHR >Priority: Minor > Attachments: YARN-8894.1.patch > > > * Replace {{StringBuffer}} with {{StringBuilder}} > * Add spaces between fields for readability -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-8894) Improve InMemoryPlan.java toString
BELUGA BEHR created YARN-8894: - Summary: Improve InMemoryPlan.java toString Key: YARN-8894 URL: https://issues.apache.org/jira/browse/YARN-8894 Project: Hadoop YARN Issue Type: Improvement Components: reservation system Affects Versions: 3.2.0 Reporter: BELUGA BEHR * Replace {{StringBuffer}} with {{StringBuilder}} * Add spaces between fields for readability -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8878) Remove StringBuffer from ManagedParentQueue.java
[ https://issues.apache.org/jira/browse/YARN-8878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR updated YARN-8878: -- Attachment: YARN-8878.1.patch > Remove StringBuffer from ManagedParentQueue.java > > > Key: YARN-8878 > URL: https://issues.apache.org/jira/browse/YARN-8878 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 3.2.0 >Reporter: BELUGA BEHR >Assignee: BELUGA BEHR >Priority: Trivial > Attachments: YARN-8878.1.patch > > > Remove all {{StringBuffer}} references from the class {{ManagedParentQueue}}. > {{StringBuffer}} is synchronized and should instead be replaced with the > non-synchronized {{StringBuilder}}. However, in this case, just use {{SLF4J}} > parameterized logging. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
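The change described in the issue is mechanical: {{StringBuffer}} synchronizes every append, {{StringBuilder}} is the drop-in unsynchronized equivalent, and with SLF4J the string often need not be built by hand at all. A rough sketch — the method and its arguments are illustrative, not from the actual patch, and the SLF4J call is shown as a comment so the snippet stays self-contained:

```java
public class BuilderSketch {
  // Illustrative: build a message with StringBuilder, the unsynchronized
  // drop-in replacement for StringBuffer; single-threaded string building
  // never needs StringBuffer's locking.
  static String describeQueue(String name, int capacity) {
    StringBuilder sb = new StringBuilder();
    sb.append("queue=").append(name).append(" capacity=").append(capacity);
    return sb.toString();
    // With SLF4J parameterized logging the manual build disappears entirely,
    // and the formatting cost is only paid when the level is enabled:
    // LOG.debug("queue={} capacity={}", name, capacity);
  }

  public static void main(String[] args) {
    System.out.println(describeQueue("root.default", 100));
  }
}
```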
[jira] [Updated] (YARN-8878) Remove StringBuffer from ManagedParentQueue.java
[ https://issues.apache.org/jira/browse/YARN-8878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR updated YARN-8878: -- Attachment: (was: YARN-8878.1.patch) > Remove StringBuffer from ManagedParentQueue.java > > > Key: YARN-8878 > URL: https://issues.apache.org/jira/browse/YARN-8878 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 3.2.0 >Reporter: BELUGA BEHR >Assignee: BELUGA BEHR >Priority: Trivial > > Remove all {{StringBuffer}} references from the class {{ManagedParentQueue}}. > {{StringBuffer}} is synchronized and should instead be replaced with the > non-synchronized {{StringBuilder}}. However, in this case, just use {{SLF4J}} > parameterized logging. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8878) Remove StringBuffer from ManagedParentQueue.java
[ https://issues.apache.org/jira/browse/YARN-8878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR updated YARN-8878: -- Attachment: YARN-8878.1.patch > Remove StringBuffer from ManagedParentQueue.java > > > Key: YARN-8878 > URL: https://issues.apache.org/jira/browse/YARN-8878 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 3.2.0 >Reporter: BELUGA BEHR >Assignee: BELUGA BEHR >Priority: Trivial > Attachments: YARN-8878.1.patch > > > Remove all {{StringBuffer}} references from the class {{ManagedParentQueue}}. > {{StringBuffer}} is synchronized and should instead be replaced with the > non-synchronized {{StringBuilder}}. However, in this case, just use {{SLF4J}} > parameterized logging. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-8878) Remove StringBuffer from ManagedParentQueue.java
[ https://issues.apache.org/jira/browse/YARN-8878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR reassigned YARN-8878: - Assignee: BELUGA BEHR > Remove StringBuffer from ManagedParentQueue.java > > > Key: YARN-8878 > URL: https://issues.apache.org/jira/browse/YARN-8878 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 3.2.0 >Reporter: BELUGA BEHR >Assignee: BELUGA BEHR >Priority: Trivial > Attachments: YARN-8878.1.patch > > > Remove all {{StringBuffer}} references from the class {{ManagedParentQueue}}. > {{StringBuffer}} is synchronized and should instead be replaced with the > non-synchronized {{StringBuilder}}. However, in this case, just use {{SLF4J}} > parameterized logging. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-8878) Remove StringBuffer from ManagedParentQueue.java
BELUGA BEHR created YARN-8878: - Summary: Remove StringBuffer from ManagedParentQueue.java Key: YARN-8878 URL: https://issues.apache.org/jira/browse/YARN-8878 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 3.2.0 Reporter: BELUGA BEHR Attachments: YARN-8878.1.patch Remove all {{StringBuffer}} references from the class {{ManagedParentQueue}}. {{StringBuffer}} is synchronized and should instead be replaced with the non-synchronized {{StringBuilder}}. However, in this case, just use {{SLF4J}} parameterized logging. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-8146) Remove LinkedList From resourcemanager.reservation.planning Package
[ https://issues.apache.org/jira/browse/YARN-8146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR reassigned YARN-8146: - Assignee: BELUGA BEHR > Remove LinkedList From resourcemanager.reservation.planning Package > --- > > Key: YARN-8146 > URL: https://issues.apache.org/jira/browse/YARN-8146 > Project: Hadoop YARN > Issue Type: Improvement > Components: reservation system >Affects Versions: 3.0.1 >Reporter: BELUGA BEHR >Assignee: BELUGA BEHR >Priority: Trivial > Attachments: YARN-8146.1.patch > > > Remove {{LinkedList}} instances in favor of {{ArrayList}}. {{ArrayList}} is > generally more memory-efficient, causes less memory fragmentation, and, with > memory locality, is faster to iterate. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
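The substitution described in YARN-8146 is usually a one-line change; when the final element count is known, presizing the {{ArrayList}} also avoids incremental array growth. A minimal illustration (the variable names are made up):

```java
import java.util.ArrayList;
import java.util.List;

public class ListSketch {
  public static void main(String[] args) {
    // LinkedList allocates a node object per element; ArrayList stores its
    // elements in one backing array, which is more compact and, thanks to
    // memory locality, faster to iterate.
    List<Integer> reservationIds = new ArrayList<>(3); // presized when known
    reservationIds.add(1);
    reservationIds.add(2);
    reservationIds.add(3);
    System.out.println(reservationIds.size());
  }
}
```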
[jira] [Commented] (YARN-8837) TestNMProxy.testNMProxyRPCRetry Improvement
[ https://issues.apache.org/jira/browse/YARN-8837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16633511#comment-16633511 ] BELUGA BEHR commented on YARN-8837: --- {{testNMProxyRetry()}} failed right on cue. New assertion error message: {code:java} Expected: an instance of java.net.SocketException but: http://wiki.apache.org/hadoop/UnknownHost> is a java.net.UnknownHostException at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20) at org.junit.Assert.assertThat(Assert.java:865) at org.junit.Assert.assertThat(Assert.java:832) at org.apache.hadoop.yarn.server.nodemanager.containermanager.TestNMProxy.testNMProxyRPCRetry(TestNMProxy.java:173) {code} Please accept this patch for inclusion in the project. > TestNMProxy.testNMProxyRPCRetry Improvement > --- > > Key: YARN-8837 > URL: https://issues.apache.org/jira/browse/YARN-8837 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn >Affects Versions: 3.2.0 >Reporter: BELUGA BEHR >Assignee: BELUGA BEHR >Priority: Trivial > Attachments: YARN-8789.1.patch > > > The unit test > {{org.apache.hadoop.yarn.server.nodemanager.containermanager.TestNMProxy.testNMProxyRetry()}} > has had some issues in the past. You can search JIRA for it, but one example > is [YARN-5104]. I recently had some issues with it myself and found the > following change helpful in troubleshooting. > {code:java|title=Current Implementation} > } catch (IOException e) { > // socket exception should be thrown immediately, without RPC retries. > Assert.assertTrue(e instanceof java.net.SocketException); > } > {code} > The issue here is that the test is true/false. The testing framework does > not give me any feedback regarding the type of exception that was thrown; it > just says "assertion failed." -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
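The improvement above — a failure message that names the actual exception type instead of a bare "assertion failed" — comes from Hamcrest's {{assertThat(e, instanceOf(...))}} in the patch itself; the same idea can be approximated in plain Java with a small helper. The helper below is hypothetical, shown only to make the difference concrete:

```java
import java.net.SocketException;
import java.net.UnknownHostException;

public class AssertSketch {
  // Hypothetical helper, similar in spirit to Hamcrest's instanceOf matcher:
  // on failure, report which class was actually seen instead of just failing.
  static void assertInstanceOf(Class<?> expected, Object actual) {
    if (!expected.isInstance(actual)) {
      throw new AssertionError("Expected: an instance of " + expected.getName()
          + " but was: " + actual.getClass().getName());
    }
  }

  public static void main(String[] args) {
    try {
      // Simulate the failing test: an UnknownHostException where the test
      // expected a SocketException.
      assertInstanceOf(SocketException.class, new UnknownHostException("host"));
    } catch (AssertionError e) {
      // The failure now says which exception was actually thrown.
      System.out.println(e.getMessage());
    }
  }
}
```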
[jira] [Updated] (YARN-8837) TestNMProxy.testNMProxyRPCRetry Improvement
[ https://issues.apache.org/jira/browse/YARN-8837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR updated YARN-8837: -- Attachment: YARN-8789.1.patch > TestNMProxy.testNMProxyRPCRetry Improvement > --- > > Key: YARN-8837 > URL: https://issues.apache.org/jira/browse/YARN-8837 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn >Affects Versions: 3.2.0 >Reporter: BELUGA BEHR >Assignee: BELUGA BEHR >Priority: Trivial > Attachments: YARN-8789.1.patch > > > The unit test > {{org.apache.hadoop.yarn.server.nodemanager.containermanager.TestNMProxy.testNMProxyRetry()}} > has had some issues in the past. You can search JIRA for it, but one example > is [YARN-5104]. I recently had some issues with it myself and found the > following change helpful in troubleshooting. > {code:java|title=Current Implementation} > } catch (IOException e) { > // socket exception should be thrown immediately, without RPC retries. > Assert.assertTrue(e instanceof java.net.SocketException); > } > {code} > The issue here is that the test is true/false. The testing framework does > not give me any feedback regarding the type of exception that was thrown; it > just says "assertion failed." -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-8837) TestNMProxy.testNMProxyRPCRetry Improvement
[ https://issues.apache.org/jira/browse/YARN-8837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR reassigned YARN-8837: - Assignee: BELUGA BEHR > TestNMProxy.testNMProxyRPCRetry Improvement > --- > > Key: YARN-8837 > URL: https://issues.apache.org/jira/browse/YARN-8837 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn >Affects Versions: 3.2.0 >Reporter: BELUGA BEHR >Assignee: BELUGA BEHR >Priority: Trivial > > The unit test > {{org.apache.hadoop.yarn.server.nodemanager.containermanager.TestNMProxy.testNMProxyRetry()}} > has had some issues in the past. You can search JIRA for it, but one example > is [YARN-5104]. I recently had some issues with it myself and found the > following change helpful in troubleshooting. > {code:java|title=Current Implementation} > } catch (IOException e) { > // socket exception should be thrown immediately, without RPC retries. > Assert.assertTrue(e instanceof java.net.SocketException); > } > {code} > The issue here is that the test is true/false. The testing framework does > not give me any feedback regarding the type of exception that was thrown; it > just says "assertion failed." -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-8837) TestNMProxy.testNMProxyRPCRetry Improvement
BELUGA BEHR created YARN-8837: - Summary: TestNMProxy.testNMProxyRPCRetry Improvement Key: YARN-8837 URL: https://issues.apache.org/jira/browse/YARN-8837 Project: Hadoop YARN Issue Type: Improvement Components: yarn Affects Versions: 3.2.0 Reporter: BELUGA BEHR The unit test {{org.apache.hadoop.yarn.server.nodemanager.containermanager.TestNMProxy.testNMProxyRetry()}} has had some issues in the past. You can search JIRA for it, but one example is [YARN-5104]. I recently had some issues with it myself and found the following change helpful in troubleshooting. {code:java|title=Current Implementation} } catch (IOException e) { // socket exception should be thrown immediately, without RPC retries. Assert.assertTrue(e instanceof java.net.SocketException); } {code} The issue here is that the test is true/false. The testing framework does not give me any feedback regarding the type of exception that was thrown; it just says "assertion failed." -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8789) Add BoundedQueue to AsyncDispatcher
[ https://issues.apache.org/jira/browse/YARN-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16633375#comment-16633375 ] BELUGA BEHR commented on YARN-8789: --- The one failed unit test passes locally. This test seems to have problems regularly; see YARN-5104. Please consider the latest patch [^YARN-8789.14.patch] for inclusion into the project. Thanks! {code:java} [INFO] --- [INFO] T E S T S [INFO] --- [INFO] Running org.apache.hadoop.yarn.server.nodemanager.containermanager.TestNMProxy [INFO] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 7.683 s - in org.apache.hadoop.yarn.server.nodemanager.containermanager.TestNMProxy [INFO] [INFO] Results: [INFO] [INFO] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0{code} > Add BoundedQueue to AsyncDispatcher > --- > > Key: YARN-8789 > URL: https://issues.apache.org/jira/browse/YARN-8789 > Project: Hadoop YARN > Issue Type: Improvement > Components: applications >Affects Versions: 3.2.0 >Reporter: BELUGA BEHR >Assignee: BELUGA BEHR >Priority: Major > Attachments: YARN-8789.1.patch, YARN-8789.10.patch, > YARN-8789.12.patch, YARN-8789.14.patch, YARN-8789.2.patch, YARN-8789.3.patch, > YARN-8789.4.patch, YARN-8789.5.patch, YARN-8789.6.patch, YARN-8789.7.patch, > YARN-8789.7.patch, YARN-8789.8.patch, YARN-8789.9.patch > > > I recently came across a scenario where an MR ApplicationMaster was failing > with an OOM exception. It had many thousands of Mappers and thousands of > Reducers. It was noted in the logging that the event-queue of > {{AsyncDispatcher}} had a very large number of items in it and was seemingly > never decreasing. > I started looking at the code and thought it could use some clean up, > simplification, and the ability to specify a bounded queue so that any > incoming events are throttled until they can be processed. This will protect > the ApplicationMaster from a flood of events. 
> Logging Message: > Size of event-queue is xxx -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
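The bounded-queue idea in YARN-8789 can be sketched with the standard {{LinkedBlockingQueue}}: giving the queue a capacity makes producers block (or fail fast) once it fills, throttling event submission until the dispatcher thread catches up. This is only an illustration of the approach, not the patch itself:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class BoundedDispatchSketch {
  // Capacity-bounded event queue: once 'capacity' events are waiting,
  // put() blocks and offer() returns false, so producers cannot grow the
  // queue without limit and exhaust the ApplicationMaster's heap.
  static BlockingQueue<String> newBoundedQueue(int capacity) {
    return new LinkedBlockingQueue<>(capacity);
  }

  public static void main(String[] args) throws InterruptedException {
    BlockingQueue<String> eventQueue = newBoundedQueue(2);
    eventQueue.put("event-1");
    eventQueue.put("event-2");
    // Queue is full: offer() fails fast instead of blocking.
    boolean accepted = eventQueue.offer("event-3");
    System.out.println("accepted=" + accepted + " size=" + eventQueue.size());
  }
}
```

An unbounded {{LinkedBlockingQueue}} (the no-argument constructor) behaves like the original dispatcher; only the capacity argument changes the back-pressure behavior.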
[jira] [Updated] (YARN-8789) Add BoundedQueue to AsyncDispatcher
[ https://issues.apache.org/jira/browse/YARN-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR updated YARN-8789: -- Attachment: YARN-8789.14.patch > Add BoundedQueue to AsyncDispatcher > --- > > Key: YARN-8789 > URL: https://issues.apache.org/jira/browse/YARN-8789 > Project: Hadoop YARN > Issue Type: Improvement > Components: applications >Affects Versions: 3.2.0 >Reporter: BELUGA BEHR >Assignee: BELUGA BEHR >Priority: Major > Attachments: YARN-8789.1.patch, YARN-8789.10.patch, > YARN-8789.12.patch, YARN-8789.14.patch, YARN-8789.2.patch, YARN-8789.3.patch, > YARN-8789.4.patch, YARN-8789.5.patch, YARN-8789.6.patch, YARN-8789.7.patch, > YARN-8789.7.patch, YARN-8789.8.patch, YARN-8789.9.patch > > > I recently came across a scenario where an MR ApplicationMaster was failing > with an OOM exception. It had many thousands of Mappers and thousands of > Reducers. It was noted in the logging that the event-queue of > {{AsyncDispatcher}} had a very large number of items in it and was seemingly > never decreasing. > I started looking at the code and thought it could use some clean up, > simplification, and the ability to specify a bounded queue so that any > incoming events are throttled until they can be processed. This will protect > the ApplicationMaster from a flood of events. > Logging Message: > Size of event-queue is xxx -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8789) Add BoundedQueue to AsyncDispatcher
[ https://issues.apache.org/jira/browse/YARN-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR updated YARN-8789: -- Attachment: YARN-8789.12.patch > Add BoundedQueue to AsyncDispatcher > --- > > Key: YARN-8789 > URL: https://issues.apache.org/jira/browse/YARN-8789 > Project: Hadoop YARN > Issue Type: Improvement > Components: applications >Affects Versions: 3.2.0 >Reporter: BELUGA BEHR >Assignee: BELUGA BEHR >Priority: Major > Attachments: YARN-8789.1.patch, YARN-8789.10.patch, > YARN-8789.12.patch, YARN-8789.2.patch, YARN-8789.3.patch, YARN-8789.4.patch, > YARN-8789.5.patch, YARN-8789.6.patch, YARN-8789.7.patch, YARN-8789.7.patch, > YARN-8789.8.patch, YARN-8789.9.patch > > > I recently came across a scenario where an MR ApplicationMaster was failing > with an OOM exception. It had many thousands of Mappers and thousands of > Reducers. It was noted in the logging that the event-queue of > {{AsyncDispatcher}} had a very large number of items in it and was seemingly > never decreasing. > I started looking at the code and thought it could use some clean up, > simplification, and the ability to specify a bounded queue so that any > incoming events are throttled until they can be processed. This will protect > the ApplicationMaster from a flood of events. > Logging Message: > Size of event-queue is xxx -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-8832) Review of RMCommunicator Class
[ https://issues.apache.org/jira/browse/YARN-8832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR reassigned YARN-8832: - Assignee: BELUGA BEHR > Review of RMCommunicator Class > -- > > Key: YARN-8832 > URL: https://issues.apache.org/jira/browse/YARN-8832 > Project: Hadoop YARN > Issue Type: Improvement > Components: applications >Affects Versions: 3.2.0 >Reporter: BELUGA BEHR >Assignee: BELUGA BEHR >Priority: Minor > Attachments: YARN-88321.patch > > > Various improvements to the {{RMCommunicator}} class. > > * Use SLF4J parameterized logging > * Use switch statement instead of {{if}}-{{else}} statements > * Remove anti-pattern of "log and throw" (just throw) > * Use a flag to stop the thread instead of an interrupt (it may be interrupting > the heartbeat code and not the thread loop) > * The main thread loops on the heartbeat callback queue until > the queue is empty. It's technically possible that other threads could > constantly put new callbacks into the queue and therefore the main thread > never progresses past the callbacks. Put a cap on the number of callbacks > that will be processed in any iteration. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8832) Review of RMCommunicator Class
[ https://issues.apache.org/jira/browse/YARN-8832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR updated YARN-8832: -- Attachment: YARN-88321.patch > Review of RMCommunicator Class > -- > > Key: YARN-8832 > URL: https://issues.apache.org/jira/browse/YARN-8832 > Project: Hadoop YARN > Issue Type: Improvement > Components: applications >Affects Versions: 3.2.0 >Reporter: BELUGA BEHR >Assignee: BELUGA BEHR >Priority: Minor > Attachments: YARN-88321.patch > > > Various improvements to the {{RMCommunicator}} class. > > * Use SLF4J parameterized logging > * Use switch statement instead of {{if}}-{{else}} statements > * Remove anti-pattern of "log and throw" (just throw) > * Use a flag to stop the thread instead of an interrupt (it may be interrupting > the heartbeat code and not the thread loop) > * The main thread loops on the heartbeat callback queue until > the queue is empty. It's technically possible that other threads could > constantly put new callbacks into the queue and therefore the main thread > never progresses past the callbacks. Put a cap on the number of callbacks > that will be processed in any iteration. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-8832) Review of RMCommunicator Class
BELUGA BEHR created YARN-8832: - Summary: Review of RMCommunicator Class Key: YARN-8832 URL: https://issues.apache.org/jira/browse/YARN-8832 Project: Hadoop YARN Issue Type: Improvement Components: applications Affects Versions: 3.2.0 Reporter: BELUGA BEHR Various improvements to the {{RMCommunicator}} class. * Use SLF4J parameterized logging * Use switch statement instead of {{if}}-{{else}} statements * Remove anti-pattern of "log and throw" (just throw) * Use a flag to stop the thread instead of an interrupt (it may be interrupting the heartbeat code and not the thread loop) * The main thread loops on the heartbeat callback queue until the queue is empty. It's technically possible that other threads could constantly put new callbacks into the queue and therefore the main thread never progresses past the callbacks. Put a cap on the number of callbacks that will be processed in any iteration. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
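The last bullet in YARN-8832 — capping how many queued callbacks a single loop iteration may drain — can be sketched with a bounded drain. The names below are illustrative, not from the actual RMCommunicator patch:

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class DrainCapSketch {
  // Drain at most 'cap' callbacks per call, so a queue that other threads
  // keep refilling can never starve the rest of the loop body (heartbeat,
  // shutdown checks, etc.).
  static int drainWithCap(Queue<Runnable> callbacks, int cap) {
    int processed = 0;
    Runnable cb;
    while (processed < cap && (cb = callbacks.poll()) != null) {
      cb.run();
      processed++;
    }
    return processed;
  }

  public static void main(String[] args) {
    Queue<Runnable> q = new ArrayDeque<>();
    for (int i = 0; i < 10; i++) {
      q.add(() -> { }); // no-op callbacks for illustration
    }
    System.out.println(drainWithCap(q, 4) + " processed, " + q.size() + " left");
  }
}
```

The leftover callbacks simply wait for the next iteration, which is the fairness property the bullet asks for.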
[jira] [Created] (YARN-8831) Review of LocalContainerAllocator
BELUGA BEHR created YARN-8831: - Summary: Review of LocalContainerAllocator Key: YARN-8831 URL: https://issues.apache.org/jira/browse/YARN-8831 Project: Hadoop YARN Issue Type: Improvement Components: applications Affects Versions: 3.2.0 Reporter: BELUGA BEHR Attachments: YARN-8831.1.patch Some trivial cleanup of class {{LocalContainerAllocator}} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8831) Review of LocalContainerAllocator
[ https://issues.apache.org/jira/browse/YARN-8831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR updated YARN-8831: -- Priority: Trivial (was: Minor) > Review of LocalContainerAllocator > - > > Key: YARN-8831 > URL: https://issues.apache.org/jira/browse/YARN-8831 > Project: Hadoop YARN > Issue Type: Improvement > Components: applications >Affects Versions: 3.2.0 >Reporter: BELUGA BEHR >Priority: Trivial > Attachments: YARN-8831.1.patch > > > Some trivial cleanup of class {{LocalContainerAllocator}} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-8831) Review of LocalContainerAllocator
[ https://issues.apache.org/jira/browse/YARN-8831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR reassigned YARN-8831: - Assignee: BELUGA BEHR > Review of LocalContainerAllocator > - > > Key: YARN-8831 > URL: https://issues.apache.org/jira/browse/YARN-8831 > Project: Hadoop YARN > Issue Type: Improvement > Components: applications >Affects Versions: 3.2.0 >Reporter: BELUGA BEHR >Assignee: BELUGA BEHR >Priority: Trivial > Attachments: YARN-8831.1.patch > > > Some trivial cleanup of class {{LocalContainerAllocator}} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8831) Review of LocalContainerAllocator
[ https://issues.apache.org/jira/browse/YARN-8831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR updated YARN-8831: -- Attachment: YARN-8831.1.patch > Review of LocalContainerAllocator > - > > Key: YARN-8831 > URL: https://issues.apache.org/jira/browse/YARN-8831 > Project: Hadoop YARN > Issue Type: Improvement > Components: applications >Affects Versions: 3.2.0 >Reporter: BELUGA BEHR >Priority: Trivial > Attachments: YARN-8831.1.patch > > > Some trivial cleanup of class {{LocalContainerAllocator}} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8789) Add BoundedQueue to AsyncDispatcher
[ https://issues.apache.org/jira/browse/YARN-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR updated YARN-8789: -- Attachment: YARN-8789.10.patch > Add BoundedQueue to AsyncDispatcher > --- > > Key: YARN-8789 > URL: https://issues.apache.org/jira/browse/YARN-8789 > Project: Hadoop YARN > Issue Type: Improvement > Components: applications >Affects Versions: 3.2.0 >Reporter: BELUGA BEHR >Assignee: BELUGA BEHR >Priority: Major > Attachments: YARN-8789.1.patch, YARN-8789.10.patch, > YARN-8789.2.patch, YARN-8789.3.patch, YARN-8789.4.patch, YARN-8789.5.patch, > YARN-8789.6.patch, YARN-8789.7.patch, YARN-8789.7.patch, YARN-8789.8.patch, > YARN-8789.9.patch > > > I recently came across a scenario where an MR ApplicationMaster was failing > with an OOM exception. It had many thousands of Mappers and thousands of > Reducers. It was noted in the logging that the event-queue of > {{AsyncDispatcher}} had a very large number of items in it and was seemingly > never decreasing. > I started looking at the code and thought it could use some clean up, > simplification, and the ability to specify a bounded queue so that any > incoming events are throttled until they can be processed. This will protect > the ApplicationMaster from a flood of events. > Logging Message: > Size of event-queue is xxx -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8789) Add BoundedQueue to AsyncDispatcher
[ https://issues.apache.org/jira/browse/YARN-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR updated YARN-8789: -- Attachment: YARN-8789.9.patch > Add BoundedQueue to AsyncDispatcher > --- > > Key: YARN-8789 > URL: https://issues.apache.org/jira/browse/YARN-8789 > Project: Hadoop YARN > Issue Type: Improvement > Components: applications >Affects Versions: 3.2.0 >Reporter: BELUGA BEHR >Assignee: BELUGA BEHR >Priority: Major > Attachments: YARN-8789.1.patch, YARN-8789.2.patch, YARN-8789.3.patch, > YARN-8789.4.patch, YARN-8789.5.patch, YARN-8789.6.patch, YARN-8789.7.patch, > YARN-8789.7.patch, YARN-8789.8.patch, YARN-8789.9.patch > > > I recently came across a scenario where an MR ApplicationMaster was failing > with an OOM exception. It had many thousands of Mappers and thousands of > Reducers. It was noted that in the logging that the event-queue of > {{AsyncDispatcher}} had a very large number of item in it and was seemingly > never decreasing. > I started looking at the code and thought it could use some clean up, > simplification, and the ability to specify a bounded queue so that any > incoming events are throttled until they can be processed. This will protect > the ApplicationMaster from a flood of events. > Logging Message: > Size of event-queue is xxx -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8789) Add BoundedQueue to AsyncDispatcher
[ https://issues.apache.org/jira/browse/YARN-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR updated YARN-8789: -- Attachment: YARN-8789.8.patch > Add BoundedQueue to AsyncDispatcher > --- > > Key: YARN-8789 > URL: https://issues.apache.org/jira/browse/YARN-8789 > Project: Hadoop YARN > Issue Type: Improvement > Components: applications >Affects Versions: 3.2.0 >Reporter: BELUGA BEHR >Assignee: BELUGA BEHR >Priority: Major > Attachments: YARN-8789.1.patch, YARN-8789.2.patch, YARN-8789.3.patch, > YARN-8789.4.patch, YARN-8789.5.patch, YARN-8789.6.patch, YARN-8789.7.patch, > YARN-8789.7.patch, YARN-8789.8.patch > > > I recently came across a scenario where an MR ApplicationMaster was failing > with an OOM exception. It had many thousands of Mappers and thousands of > Reducers. It was noted that in the logging that the event-queue of > {{AsyncDispatcher}} had a very large number of item in it and was seemingly > never decreasing. > I started looking at the code and thought it could use some clean up, > simplification, and the ability to specify a bounded queue so that any > incoming events are throttled until they can be processed. This will protect > the ApplicationMaster from a flood of events. > Logging Message: > Size of event-queue is xxx -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8789) Add BoundedQueue to AsyncDispatcher
[ https://issues.apache.org/jira/browse/YARN-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR updated YARN-8789: -- Attachment: YARN-8789.7.patch > Add BoundedQueue to AsyncDispatcher > --- > > Key: YARN-8789 > URL: https://issues.apache.org/jira/browse/YARN-8789 > Project: Hadoop YARN > Issue Type: Improvement > Components: applications >Affects Versions: 3.2.0 >Reporter: BELUGA BEHR >Assignee: BELUGA BEHR >Priority: Major > Attachments: YARN-8789.1.patch, YARN-8789.2.patch, YARN-8789.3.patch, > YARN-8789.4.patch, YARN-8789.5.patch, YARN-8789.6.patch, YARN-8789.7.patch, > YARN-8789.7.patch > > > I recently came across a scenario where an MR ApplicationMaster was failing > with an OOM exception. It had many thousands of Mappers and thousands of > Reducers. It was noted that in the logging that the event-queue of > {{AsyncDispatcher}} had a very large number of item in it and was seemingly > never decreasing. > I started looking at the code and thought it could use some clean up, > simplification, and the ability to specify a bounded queue so that any > incoming events are throttled until they can be processed. This will protect > the ApplicationMaster from a flood of events. > Logging Message: > Size of event-queue is xxx -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8789) Add BoundedQueue to AsyncDispatcher
[ https://issues.apache.org/jira/browse/YARN-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR updated YARN-8789: -- Attachment: YARN-8789.7.patch > Add BoundedQueue to AsyncDispatcher > --- > > Key: YARN-8789 > URL: https://issues.apache.org/jira/browse/YARN-8789 > Project: Hadoop YARN > Issue Type: Improvement > Components: applications >Affects Versions: 3.2.0 >Reporter: BELUGA BEHR >Assignee: BELUGA BEHR >Priority: Major > Attachments: YARN-8789.1.patch, YARN-8789.2.patch, YARN-8789.3.patch, > YARN-8789.4.patch, YARN-8789.5.patch, YARN-8789.6.patch, YARN-8789.7.patch > > > I recently came across a scenario where an MR ApplicationMaster was failing > with an OOM exception. It had many thousands of Mappers and thousands of > Reducers. It was noted that in the logging that the event-queue of > {{AsyncDispatcher}} had a very large number of item in it and was seemingly > never decreasing. > I started looking at the code and thought it could use some clean up, > simplification, and the ability to specify a bounded queue so that any > incoming events are throttled until they can be processed. This will protect > the ApplicationMaster from a flood of events. > Logging Message: > Size of event-queue is xxx -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8816) YARN Unit Tests Fail with Ubuntu VM
[ https://issues.apache.org/jira/browse/YARN-8816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR updated YARN-8816: -- Description: {code} Linux apache-dev 4.15.0-34-generic #37~16.04.1-Ubuntu SMP Tue Aug 28 10:44:06 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux {code} {code} [ERROR] Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 3.926 s <<< FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.TestRMStoreCommands [ERROR] testRemoveApplicationFromStateStoreCmdForZK(org.apache.hadoop.yarn.server.resourcemanager.TestRMStoreCommands) Time elapsed: 2.668 s <<< ERROR! java.lang.ExceptionInInitializerError at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:316) at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:304) at org.apache.hadoop.security.UserGroupInformation.doSubjectLogin(UserGroupInformation.java:1828) at org.apache.hadoop.security.UserGroupInformation.createLoginUser(UserGroupInformation.java:710) at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:660) at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:571) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:286) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.serviceInit(MockRM.java:1381) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.(MockRM.java:164) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.(MockRM.java:143) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.(MockRM.java:139) at org.apache.hadoop.yarn.server.resourcemanager.TestRMStoreCommands.testRemoveApplicationFromStateStoreCmdForZK(TestRMStoreCommands.java:79) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229) at org.junit.runners.ParentRunner.run(ParentRunner.java:309) at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:379) at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:340) at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:125) at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:413) Caused by: java.lang.NullPointerException at 
org.apache.hadoop.security.SecurityUtil$QualifiedHostResolver.(SecurityUtil.java:593) at org.apache.hadoop.security.SecurityUtil.setTokenServiceUseIp(SecurityUtil.java:129) at org.apache.hadoop.security.SecurityUtil.setConfigurationInternal(SecurityUtil.java:102) at org.apache.hadoop.security.SecurityUtil.(SecurityUtil.java:88) ... 38 more {code} was: {code} [ERROR] Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 3.926 s <<< FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.TestRMStoreCommands [ERROR] testRemoveApplicationFromStateStoreCmdForZK(org.apache.hadoop.yarn.server.resourcemanager.TestRMStoreCommands) Time elapsed: 2.668 s <<< ERROR!
[jira] [Created] (YARN-8816) YARN Unit Tests Fail with Ubuntu VM
BELUGA BEHR created YARN-8816: - Summary: YARN Unit Tests Fail with Ubuntu VM Key: YARN-8816 URL: https://issues.apache.org/jira/browse/YARN-8816 Project: Hadoop YARN Issue Type: Improvement Components: yarn Affects Versions: 3.2.0 Reporter: BELUGA BEHR {code} [ERROR] Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 3.926 s <<< FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.TestRMStoreCommands [ERROR] testRemoveApplicationFromStateStoreCmdForZK(org.apache.hadoop.yarn.server.resourcemanager.TestRMStoreCommands) Time elapsed: 2.668 s <<< ERROR! java.lang.ExceptionInInitializerError at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:316) at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:304) at org.apache.hadoop.security.UserGroupInformation.doSubjectLogin(UserGroupInformation.java:1828) at org.apache.hadoop.security.UserGroupInformation.createLoginUser(UserGroupInformation.java:710) at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:660) at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:571) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:286) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.serviceInit(MockRM.java:1381) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.(MockRM.java:164) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.(MockRM.java:143) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.(MockRM.java:139) at org.apache.hadoop.yarn.server.resourcemanager.TestRMStoreCommands.testRemoveApplicationFromStateStoreCmdForZK(TestRMStoreCommands.java:79) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229) at org.junit.runners.ParentRunner.run(ParentRunner.java:309) at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:379) at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:340) at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:125) at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:413) Caused by: java.lang.NullPointerException at org.apache.hadoop.security.SecurityUtil$QualifiedHostResolver.(SecurityUtil.java:593) at 
org.apache.hadoop.security.SecurityUtil.setTokenServiceUseIp(SecurityUtil.java:129) at org.apache.hadoop.security.SecurityUtil.setConfigurationInternal(SecurityUtil.java:102) at org.apache.hadoop.security.SecurityUtil.(SecurityUtil.java:88) ... 38 more {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8789) Add BoundedQueue to AsyncDispatcher
[ https://issues.apache.org/jira/browse/YARN-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR updated YARN-8789: -- Attachment: YARN-8789.6.patch > Add BoundedQueue to AsyncDispatcher > --- > > Key: YARN-8789 > URL: https://issues.apache.org/jira/browse/YARN-8789 > Project: Hadoop YARN > Issue Type: Improvement > Components: applications >Affects Versions: 3.2.0 >Reporter: BELUGA BEHR >Assignee: BELUGA BEHR >Priority: Major > Attachments: YARN-8789.1.patch, YARN-8789.2.patch, YARN-8789.3.patch, > YARN-8789.4.patch, YARN-8789.5.patch, YARN-8789.6.patch > > > I recently came across a scenario where an MR ApplicationMaster was failing > with an OOM exception. It had many thousands of Mappers and thousands of > Reducers. It was noted that in the logging that the event-queue of > {{AsyncDispatcher}} had a very large number of item in it and was seemingly > never decreasing. > I started looking at the code and thought it could use some clean up, > simplification, and the ability to specify a bounded queue so that any > incoming events are throttled until they can be processed. This will protect > the ApplicationMaster from a flood of events. > Logging Message: > Size of event-queue is xxx -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8789) Add BoundedQueue to AsyncDispatcher
[ https://issues.apache.org/jira/browse/YARN-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR updated YARN-8789: -- Attachment: YARN-8789.5.patch > Add BoundedQueue to AsyncDispatcher > --- > > Key: YARN-8789 > URL: https://issues.apache.org/jira/browse/YARN-8789 > Project: Hadoop YARN > Issue Type: Improvement > Components: applications >Affects Versions: 3.2.0 >Reporter: BELUGA BEHR >Assignee: BELUGA BEHR >Priority: Major > Attachments: YARN-8789.1.patch, YARN-8789.2.patch, YARN-8789.3.patch, > YARN-8789.4.patch, YARN-8789.5.patch > > > I recently came across a scenario where an MR ApplicationMaster was failing > with an OOM exception. It had many thousands of Mappers and thousands of > Reducers. It was noted that in the logging that the event-queue of > {{AsyncDispatcher}} had a very large number of item in it and was seemingly > never decreasing. > I started looking at the code and thought it could use some clean up, > simplification, and the ability to specify a bounded queue so that any > incoming events are throttled until they can be processed. This will protect > the ApplicationMaster from a flood of events. > Logging Message: > Size of event-queue is xxx -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8789) Add BoundedQueue to AsyncDispatcher
[ https://issues.apache.org/jira/browse/YARN-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR updated YARN-8789: -- Attachment: YARN-8789.4.patch > Add BoundedQueue to AsyncDispatcher > --- > > Key: YARN-8789 > URL: https://issues.apache.org/jira/browse/YARN-8789 > Project: Hadoop YARN > Issue Type: Improvement > Components: applications >Affects Versions: 3.2.0 >Reporter: BELUGA BEHR >Assignee: BELUGA BEHR >Priority: Major > Attachments: YARN-8789.1.patch, YARN-8789.2.patch, YARN-8789.3.patch, > YARN-8789.4.patch > > > I recently came across a scenario where an MR ApplicationMaster was failing > with an OOM exception. It had many thousands of Mappers and thousands of > Reducers. It was noted that in the logging that the event-queue of > {{AsyncDispatcher}} had a very large number of item in it and was seemingly > never decreasing. > I started looking at the code and thought it could use some clean up, > simplification, and the ability to specify a bounded queue so that any > incoming events are throttled until they can be processed. This will protect > the ApplicationMaster from a flood of events. > Logging Message: > Size of event-queue is xxx -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8789) Add BoundedQueue to AsyncDispatcher
[ https://issues.apache.org/jira/browse/YARN-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR updated YARN-8789: -- Description: I recently came across a scenario where an MR ApplicationMaster was failing with an OOM exception. It had many thousands of Mappers and thousands of Reducers. It was noted that in the logging that the event-queue of {{AsyncDispatcher}} had a very large number of item in it and was seemingly never decreasing. I started looking at the code and thought it could use some clean up, simplification, and the ability to specify a bounded queue so that any incoming events are throttled until they can be processed. This will protect the ApplicationMaster from a flood of events. Logging Message: Size of event-queue is xxx was: I recently came across a scenario where an MR ApplicationMaster was failing with an OOM exception. It had many thousands of Mappers and thousands of Reducers. It was noted that in the logging that the event-queue of {{AsyncDispatcher}} had a very large number of item in it and was seemingly never decreasing. I started looking at the code and thought it could use some clean up, simplification, and the ability to specify a bounded queue so that any incoming events are throttled until they can be processed. This will protect the ApplicationMaster from a flood of events. > Add BoundedQueue to AsyncDispatcher > --- > > Key: YARN-8789 > URL: https://issues.apache.org/jira/browse/YARN-8789 > Project: Hadoop YARN > Issue Type: Improvement > Components: applications >Affects Versions: 3.2.0 >Reporter: BELUGA BEHR >Assignee: BELUGA BEHR >Priority: Major > Attachments: YARN-8789.1.patch, YARN-8789.2.patch, YARN-8789.3.patch > > > I recently came across a scenario where an MR ApplicationMaster was failing > with an OOM exception. It had many thousands of Mappers and thousands of > Reducers. 
It was noted that in the logging that the event-queue of > {{AsyncDispatcher}} had a very large number of item in it and was seemingly > never decreasing. > I started looking at the code and thought it could use some clean up, > simplification, and the ability to specify a bounded queue so that any > incoming events are throttled until they can be processed. This will protect > the ApplicationMaster from a flood of events. > Logging Message: > Size of event-queue is xxx -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8789) Add BoundedQueue to AsyncDispatcher
[ https://issues.apache.org/jira/browse/YARN-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR updated YARN-8789: -- Attachment: YARN-8789.3.patch > Add BoundedQueue to AsyncDispatcher > --- > > Key: YARN-8789 > URL: https://issues.apache.org/jira/browse/YARN-8789 > Project: Hadoop YARN > Issue Type: Improvement > Components: applications >Affects Versions: 3.2.0 >Reporter: BELUGA BEHR >Assignee: BELUGA BEHR >Priority: Major > Attachments: YARN-8789.1.patch, YARN-8789.2.patch, YARN-8789.3.patch > > > I recently came across a scenario where an MR ApplicationMaster was failing > with an OOM exception. It had many thousands of Mappers and thousands of > Reducers. It was noted that in the logging that the event-queue of > {{AsyncDispatcher}} had a very large number of item in it and was seemingly > never decreasing. > I started looking at the code and thought it could use some clean up, > simplification, and the ability to specify a bounded queue so that any > incoming events are throttled until they can be processed. This will protect > the ApplicationMaster from a flood of events. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8789) Add BoundedQueue to AsyncDispatcher
[ https://issues.apache.org/jira/browse/YARN-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR updated YARN-8789: -- Attachment: YARN-8789.2.patch > Add BoundedQueue to AsyncDispatcher > --- > > Key: YARN-8789 > URL: https://issues.apache.org/jira/browse/YARN-8789 > Project: Hadoop YARN > Issue Type: Improvement > Components: applications >Affects Versions: 3.2.0 >Reporter: BELUGA BEHR >Assignee: BELUGA BEHR >Priority: Major > Attachments: YARN-8789.1.patch, YARN-8789.2.patch > > > I recently came across a scenario where an MR ApplicationMaster was failing > with an OOM exception. It had many thousands of Mappers and thousands of > Reducers. It was noted that in the logging that the event-queue of > {{AsyncDispatcher}} had a very large number of item in it and was seemingly > never decreasing. > I started looking at the code and thought it could use some clean up, > simplification, and the ability to specify a bounded queue so that any > incoming events are throttled until they can be processed. This will protect > the ApplicationMaster from a flood of events. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8789) Add BoundedQueue to AsyncDispatcher
[ https://issues.apache.org/jira/browse/YARN-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR updated YARN-8789: -- Attachment: (was: YARN-8789.2.patch) > Add BoundedQueue to AsyncDispatcher > --- > > Key: YARN-8789 > URL: https://issues.apache.org/jira/browse/YARN-8789 > Project: Hadoop YARN > Issue Type: Improvement > Components: applications >Affects Versions: 3.2.0 >Reporter: BELUGA BEHR >Assignee: BELUGA BEHR >Priority: Major > Attachments: YARN-8789.1.patch, YARN-8789.2.patch > > > I recently came across a scenario where an MR ApplicationMaster was failing > with an OOM exception. It had many thousands of Mappers and thousands of > Reducers. It was noted that in the logging that the event-queue of > {{AsyncDispatcher}} had a very large number of item in it and was seemingly > never decreasing. > I started looking at the code and thought it could use some clean up, > simplification, and the ability to specify a bounded queue so that any > incoming events are throttled until they can be processed. This will protect > the ApplicationMaster from a flood of events. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8789) Add BoundedQueue to AsyncDispatcher
[ https://issues.apache.org/jira/browse/YARN-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR updated YARN-8789: -- Attachment: YARN-8789.2.patch > Add BoundedQueue to AsyncDispatcher > --- > > Key: YARN-8789 > URL: https://issues.apache.org/jira/browse/YARN-8789 > Project: Hadoop YARN > Issue Type: Improvement > Components: applications >Affects Versions: 3.2.0 >Reporter: BELUGA BEHR >Assignee: BELUGA BEHR >Priority: Major > Attachments: YARN-8789.1.patch, YARN-8789.2.patch > > > I recently came across a scenario where an MR ApplicationMaster was failing > with an OOM exception. It had many thousands of Mappers and thousands of > Reducers. It was noted that in the logging that the event-queue of > {{AsyncDispatcher}} had a very large number of item in it and was seemingly > never decreasing. > I started looking at the code and thought it could use some clean up, > simplification, and the ability to specify a bounded queue so that any > incoming events are throttled until they can be processed. This will protect > the ApplicationMaster from a flood of events. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8789) Add BoundedQueue to AsyncDispatcher
[ https://issues.apache.org/jira/browse/YARN-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR updated YARN-8789: -- Attachment: (was: YARN-8789.2.patch) > Add BoundedQueue to AsyncDispatcher > --- > > Key: YARN-8789 > URL: https://issues.apache.org/jira/browse/YARN-8789 > Project: Hadoop YARN > Issue Type: Improvement > Components: applications >Affects Versions: 3.2.0 >Reporter: BELUGA BEHR >Assignee: BELUGA BEHR >Priority: Major > Attachments: YARN-8789.1.patch, YARN-8789.2.patch > > > I recently came across a scenario where an MR ApplicationMaster was failing > with an OOM exception. It had many thousands of Mappers and thousands of > Reducers. It was noted that in the logging that the event-queue of > {{AsyncDispatcher}} had a very large number of item in it and was seemingly > never decreasing. > I started looking at the code and thought it could use some clean up, > simplification, and the ability to specify a bounded queue so that any > incoming events are throttled until they can be processed. This will protect > the ApplicationMaster from a flood of events. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8789) Add BoundedQueue to AsyncDispatcher
[ https://issues.apache.org/jira/browse/YARN-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR updated YARN-8789: -- Attachment: YARN-8789.2.patch > Add BoundedQueue to AsyncDispatcher > --- > > Key: YARN-8789 > URL: https://issues.apache.org/jira/browse/YARN-8789 > Project: Hadoop YARN > Issue Type: Improvement > Components: applications >Affects Versions: 3.2.0 >Reporter: BELUGA BEHR >Assignee: BELUGA BEHR >Priority: Major > Attachments: YARN-8789.1.patch, YARN-8789.2.patch > > > I recently came across a scenario where an MR ApplicationMaster was failing > with an OOM exception. It had many thousands of Mappers and thousands of > Reducers. It was noted that in the logging that the event-queue of > {{AsyncDispatcher}} had a very large number of item in it and was seemingly > never decreasing. > I started looking at the code and thought it could use some clean up, > simplification, and the ability to specify a bounded queue so that any > incoming events are throttled until they can be processed. This will protect > the ApplicationMaster from a flood of events. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8789) Add BoundedQueue to AsyncDispatcher
[ https://issues.apache.org/jira/browse/YARN-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR updated YARN-8789: -- Attachment: (was: YARN-8789.2.patch) > Add BoundedQueue to AsyncDispatcher > --- > > Key: YARN-8789 > URL: https://issues.apache.org/jira/browse/YARN-8789 > Project: Hadoop YARN > Issue Type: Improvement > Components: applications >Affects Versions: 3.2.0 >Reporter: BELUGA BEHR >Assignee: BELUGA BEHR >Priority: Major > Attachments: YARN-8789.1.patch > > > I recently came across a scenario where an MR ApplicationMaster was failing > with an OOM exception. It had many thousands of Mappers and thousands of > Reducers. It was noted that in the logging that the event-queue of > {{AsyncDispatcher}} had a very large number of item in it and was seemingly > never decreasing. > I started looking at the code and thought it could use some clean up, > simplification, and the ability to specify a bounded queue so that any > incoming events are throttled until they can be processed. This will protect > the ApplicationMaster from a flood of events. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8789) Add BoundedQueue to AsyncDispatcher
[ https://issues.apache.org/jira/browse/YARN-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR updated YARN-8789: -- Attachment: YARN-8789.2.patch > Add BoundedQueue to AsyncDispatcher > --- > > Key: YARN-8789 > URL: https://issues.apache.org/jira/browse/YARN-8789 > Project: Hadoop YARN > Issue Type: Improvement > Components: applications >Affects Versions: 3.2.0 >Reporter: BELUGA BEHR >Assignee: BELUGA BEHR >Priority: Major > Attachments: YARN-8789.1.patch, YARN-8789.2.patch > > > I recently came across a scenario where an MR ApplicationMaster was failing > with an OOM exception. It had many thousands of Mappers and thousands of > Reducers. It was noted that in the logging that the event-queue of > {{AsyncDispatcher}} had a very large number of item in it and was seemingly > never decreasing. > I started looking at the code and thought it could use some clean up, > simplification, and the ability to specify a bounded queue so that any > incoming events are throttled until they can be processed. This will protect > the ApplicationMaster from a flood of events. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8789) Add BoundedQueue to AsyncDispatcher
[ https://issues.apache.org/jira/browse/YARN-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR updated YARN-8789: -- Attachment: YARN-8789.1.patch > Add BoundedQueue to AsyncDispatcher > --- > > Key: YARN-8789 > URL: https://issues.apache.org/jira/browse/YARN-8789 > Project: Hadoop YARN > Issue Type: Improvement > Components: applications >Affects Versions: 3.2.0 >Reporter: BELUGA BEHR >Assignee: BELUGA BEHR >Priority: Major > Attachments: YARN-8789.1.patch > > > I recently came across a scenario where an MR ApplicationMaster was failing > with an OOM exception. It had many thousands of Mappers and thousands of > Reducers. It was noted in the logging that the event-queue of > {{AsyncDispatcher}} had a very large number of items in it and was seemingly > never decreasing. > I started looking at the code and thought it could use some clean up, > simplification, and the ability to specify a bounded queue so that any > incoming events are throttled until they can be processed. This will protect > the ApplicationMaster from a flood of events. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-8789) Add BoundedQueue to AsyncDispatcher
[ https://issues.apache.org/jira/browse/YARN-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR reassigned YARN-8789: - Assignee: BELUGA BEHR > Add BoundedQueue to AsyncDispatcher > --- > > Key: YARN-8789 > URL: https://issues.apache.org/jira/browse/YARN-8789 > Project: Hadoop YARN > Issue Type: Improvement > Components: applications >Affects Versions: 3.2.0 >Reporter: BELUGA BEHR >Assignee: BELUGA BEHR >Priority: Major > > I recently came across a scenario where an MR ApplicationMaster was failing > with an OOM exception. It had many thousands of Mappers and thousands of > Reducers. It was noted in the logging that the event-queue of > {{AsyncDispatcher}} had a very large number of items in it and was seemingly > never decreasing. > I started looking at the code and thought it could use some clean up, > simplification, and the ability to specify a bounded queue so that any > incoming events are throttled until they can be processed. This will protect > the ApplicationMaster from a flood of events. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-8789) Add BoundedQueue to AsyncDispatcher
BELUGA BEHR created YARN-8789: - Summary: Add BoundedQueue to AsyncDispatcher Key: YARN-8789 URL: https://issues.apache.org/jira/browse/YARN-8789 Project: Hadoop YARN Issue Type: Improvement Components: applications Affects Versions: 3.2.0 Reporter: BELUGA BEHR I recently came across a scenario where an MR ApplicationMaster was failing with an OOM exception. It had many thousands of Mappers and thousands of Reducers. It was noted in the logging that the event-queue of {{AsyncDispatcher}} had a very large number of items in it and was seemingly never decreasing. I started looking at the code and thought it could use some clean up, simplification, and the ability to specify a bounded queue so that any incoming events are throttled until they can be processed. This will protect the ApplicationMaster from a flood of events. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
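The throttling idea described above can be sketched with a capacity-bounded {{java.util.concurrent.LinkedBlockingQueue}}. This is a minimal illustration of the back-pressure mechanism, not the actual AsyncDispatcher code or the YARN-8789 patch; the class and method names are made up for the example.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/**
 * Minimal sketch of how a bounded queue throttles event producers:
 * put() blocks when the queue is full, so a flood of events applies
 * back-pressure to callers instead of growing the heap until OOM.
 */
public class BoundedEventQueueSketch {
  private final BlockingQueue<String> eventQueue;

  public BoundedEventQueueSketch(int capacity) {
    // Capacity-bounded; an unbounded queue would grow without limit.
    this.eventQueue = new LinkedBlockingQueue<>(capacity);
  }

  /** Blocks the calling thread while the queue is full (throttling). */
  public void dispatch(String event) throws InterruptedException {
    eventQueue.put(event);
  }

  /** Non-blocking variant: returns false instead of waiting when full. */
  public boolean tryDispatch(String event) {
    return eventQueue.offer(event);
  }

  /** Drains one event, or returns null if none is pending. */
  public String poll() {
    return eventQueue.poll();
  }
}
```

With an unbounded queue the {{dispatch}} call always succeeds immediately and memory absorbs the backlog; with the bounded variant, producers slow down to the consumer's pace.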
[jira] [Commented] (YARN-8169) Review RackResolver.java
[ https://issues.apache.org/jira/browse/YARN-8169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16442506#comment-16442506 ] BELUGA BEHR commented on YARN-8169: --- [~ajisakaa] Checkstyle corrected :) > Review RackResolver.java > > > Key: YARN-8169 > URL: https://issues.apache.org/jira/browse/YARN-8169 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn >Affects Versions: 3.0.1 >Reporter: BELUGA BEHR >Assignee: BELUGA BEHR >Priority: Trivial > Attachments: YARN-8169.1.patch, YARN.8169.2.patch > > > # Use SLF4J > # Fix some checkstyle warnings > # Minor clean up -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8169) Review RackResolver.java
[ https://issues.apache.org/jira/browse/YARN-8169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR updated YARN-8169: -- Attachment: YARN.8169.2.patch > Review RackResolver.java > > > Key: YARN-8169 > URL: https://issues.apache.org/jira/browse/YARN-8169 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn >Affects Versions: 3.0.1 >Reporter: BELUGA BEHR >Assignee: BELUGA BEHR >Priority: Trivial > Attachments: YARN-8169.1.patch, YARN.8169.2.patch > > > # Use SLF4J > # Fix some checkstyle warnings > # Minor clean up -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8169) Review RackResolver.java
[ https://issues.apache.org/jira/browse/YARN-8169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16442403#comment-16442403 ] BELUGA BEHR commented on YARN-8169: --- [~leftnoteasy] Parameterized logging is best for slf4j: # Avoids double-checking the 'debug enabled' flag # Faster than run-time String concatenation # Less code clutter # Produces a smaller product binary (saves memory and execution cache) https://www.slf4j.org/faq.html#logging_performance > Review RackResolver.java > > > Key: YARN-8169 > URL: https://issues.apache.org/jira/browse/YARN-8169 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn >Affects Versions: 3.0.1 >Reporter: BELUGA BEHR >Assignee: BELUGA BEHR >Priority: Trivial > Attachments: YARN-8169.1.patch > > > # Use SLF4J > # Fix some checkstyle warnings > # Minor clean up -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
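The "avoids double-checking the flag" point above can be demonstrated with a tiny self-contained mock. This is not the real slf4j API; it is a hypothetical logger written only to show why passing arguments beats eager String concatenation: the message is assembled only when the level is actually enabled.

```java
/**
 * Self-contained mock (not slf4j) illustrating lazy message formatting:
 * debug(template, arg) pays the formatting cost only when the debug
 * level is enabled, whereas debug("x" + arg) always concatenates.
 */
public class LazyLoggerSketch {
  private final boolean debugEnabled;
  int formatCalls = 0; // exposed so the saving is observable below

  public LazyLoggerSketch(boolean debugEnabled) {
    this.debugEnabled = debugEnabled;
  }

  /** slf4j-style call site: arguments are passed, not concatenated. */
  public void debug(String template, Object arg) {
    if (debugEnabled) {
      format(template, arg); // formatting happens only here
    }
  }

  private String format(String template, Object arg) {
    formatCalls++;
    return template.replace("{}", String.valueOf(arg));
  }
}
```

When the level is disabled, the parameterized call does no String work at all; a concatenated argument would have been built before the call even entered the logger.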
[jira] [Created] (YARN-8170) Caching Node Rack Location
BELUGA BEHR created YARN-8170: - Summary: Caching Node Rack Location Key: YARN-8170 URL: https://issues.apache.org/jira/browse/YARN-8170 Project: Hadoop YARN Issue Type: New Feature Components: applications, nodemanager Affects Versions: 3.0.1 Reporter: BELUGA BEHR When the MapReduce ApplicationMaster is trying to assign Mappers to Nodes, it loops over all of the queued Mappers and looks up the ideal rack location of each Mapper. Under the covers, the rack awareness script is being called, once per Mapper. The results do get cached, but only for as long as the ApplicationMaster exists. That means that the script gets called N times each time a new ApplicationMaster is launched. If the rack awareness script is complex or requires an external lookup, this can be a slow process and can even DDoS the external lookup source. There are at least a couple of ways to tackle this... # Add a DNSToSwitchMapping implementation that caches in an external cache (e.g., memcached) instead of memory so that all ApplicationMasters can share the same cache and would rarely call the rack awareness script. # Like the shuffle service, add a new NodeManager auxiliary service which exposes a rack lookup API so that the NodeManagers are responsible for caching the rack locations. This would also require a DNSToSwitchMapping implementation that interacts with this new service. # Other? {code:java} String host = allocated.getNodeId().getHost(); String rack = RackResolver.resolve(host).getNetworkLocation(); {code} [https://github.com/apache/hadoop/blob/453d48bdfbb67ed3e66c33c4aef239c3d7bdd3bc/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMContainerAllocator.java#L1435-L1464] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
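The caching idea in option 1 above reduces to "invoke the expensive resolver at most once per host". A hypothetical sketch follows; the external cache (memcached) is stood in for by an in-memory map so the example is runnable, and the class, field, and resolver names are invented for illustration, not taken from Hadoop.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/**
 * Hypothetical sketch of a caching rack resolver: computeIfAbsent
 * guarantees the injected resolver (standing in for the rack awareness
 * script) runs at most once per host; later lookups hit the cache.
 */
public class CachingRackResolverSketch {
  private final Map<String, String> cache = new ConcurrentHashMap<>();
  private final Function<String, String> resolver;
  int scriptInvocations = 0; // exposed so the saving is observable

  public CachingRackResolverSketch(Function<String, String> resolver) {
    this.resolver = resolver;
  }

  public String resolve(String host) {
    return cache.computeIfAbsent(host, h -> {
      scriptInvocations++; // the costly lookup happens here only
      return resolver.apply(h);
    });
  }
}
```

Backing the map with an external store (option 1) would let every ApplicationMaster share one cache; moving the whole class behind a NodeManager auxiliary service is option 2.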
[jira] [Updated] (YARN-8169) Review RackResolver.java
[ https://issues.apache.org/jira/browse/YARN-8169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR updated YARN-8169: -- Attachment: YARN-8169.1.patch > Review RackResolver.java > > > Key: YARN-8169 > URL: https://issues.apache.org/jira/browse/YARN-8169 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn >Affects Versions: 3.0.1 >Reporter: BELUGA BEHR >Priority: Trivial > Attachments: YARN-8169.1.patch > > > # Use SLF4J > # Fix some checkstyle warnings > # Minor clean up -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-8169) Review RackResolver.java
BELUGA BEHR created YARN-8169: - Summary: Review RackResolver.java Key: YARN-8169 URL: https://issues.apache.org/jira/browse/YARN-8169 Project: Hadoop YARN Issue Type: Improvement Components: yarn Affects Versions: 3.0.1 Reporter: BELUGA BEHR Attachments: YARN-8169.1.patch # Use SLF4J # Fix some checkstyle warnings # Minor clean up -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7962) Race Condition When Stopping DelegationTokenRenewer
[ https://issues.apache.org/jira/browse/YARN-7962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16441037#comment-16441037 ] BELUGA BEHR commented on YARN-7962: --- [~billie.rinaldi] [~wilfreds] Any additional concerns? Please commit. > Race Condition When Stopping DelegationTokenRenewer > --- > > Key: YARN-7962 > URL: https://issues.apache.org/jira/browse/YARN-7962 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 3.0.0 >Reporter: BELUGA BEHR >Priority: Minor > Attachments: YARN-7962.1.patch, YARN-7962.2.patch, YARN-7962.3.patch, > YARN-7962.4.patch > > > [https://github.com/apache/hadoop/blob/69fa81679f59378fd19a2c65db8019393d7c05a2/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java] > {code:java} > private ThreadPoolExecutor renewerService; > private void processDelegationTokenRenewerEvent( > DelegationTokenRenewerEvent evt) { > serviceStateLock.readLock().lock(); > try { > if (isServiceStarted) { > renewerService.execute(new DelegationTokenRenewerRunnable(evt)); > } else { > pendingEventQueue.add(evt); > } > } finally { > serviceStateLock.readLock().unlock(); > } > } > @Override > protected void serviceStop() { > if (renewalTimer != null) { > renewalTimer.cancel(); > } > appTokens.clear(); > allTokens.clear(); > this.renewerService.shutdown(); > {code} > {code:java} > 2018-02-21 11:18:16,253 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: > Error in dispatcher thread > java.util.concurrent.RejectedExecutionException: Task > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable@39bddaf2 > rejected from java.util.concurrent.ThreadPoolExecutor@5f71637b[Terminated, > pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 15487] > at > 
java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2048) > at > java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821) > at > java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1372) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.processDelegationTokenRenewerEvent(DelegationTokenRenewer.java:196) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.applicationFinished(DelegationTokenRenewer.java:734) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.finishApplication(RMAppManager.java:199) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.handle(RMAppManager.java:424) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.handle(RMAppManager.java:65) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:177) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109) > at java.lang.Thread.run(Thread.java:745) > {code} > What I think is going on here is that the {{serviceStop}} method is not > setting the {{isServiceStarted}} flag to 'false'. > Please update so that the {{serviceStop}} method grabs the > {{serviceStateLock}} and sets {{isServiceStarted}} to _false_, before > shutting down the {{renewerService}} thread pool, to avoid this condition. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
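The fix the issue description asks for — take the {{serviceStateLock}} write lock and flip {{isServiceStarted}} before shutting the pool down — can be sketched as follows. This is a simplified stand-in, not the actual DelegationTokenRenewer code: the event type and helper names are invented, but the locking shape matches the description above.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/**
 * Sketch of the described fix: serviceStop() flips isServiceStarted
 * under the write lock before shutting the executor down, so concurrent
 * callers of process() fall through to the pending queue instead of
 * submitting to a terminated pool (RejectedExecutionException).
 */
public class RenewerShutdownSketch {
  private final ReentrantReadWriteLock serviceStateLock = new ReentrantReadWriteLock();
  private final Queue<Runnable> pendingEventQueue = new ConcurrentLinkedQueue<>();
  private final ExecutorService renewerService = Executors.newSingleThreadExecutor();
  private boolean isServiceStarted = true; // guarded by serviceStateLock

  public void process(Runnable evt) {
    serviceStateLock.readLock().lock();
    try {
      if (isServiceStarted) {
        renewerService.execute(evt);
      } else {
        pendingEventQueue.add(evt); // never submitted to a stopped pool
      }
    } finally {
      serviceStateLock.readLock().unlock();
    }
  }

  public void serviceStop() {
    serviceStateLock.writeLock().lock();
    try {
      isServiceStarted = false; // flip the flag BEFORE the shutdown
    } finally {
      serviceStateLock.writeLock().unlock();
    }
    renewerService.shutdown();
  }

  public int pendingCount() {
    return pendingEventQueue.size();
  }
}
```

The write lock in {{serviceStop}} waits for in-flight {{process}} calls (which hold the read lock) to drain, so no thread can observe the started flag as true after the pool begins shutting down.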
[jira] [Updated] (YARN-8146) Remove LinkedList From resourcemanager.reservation.planning Package
[ https://issues.apache.org/jira/browse/YARN-8146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR updated YARN-8146: -- Priority: Trivial (was: Minor) > Remove LinkedList From resourcemanager.reservation.planning Package > --- > > Key: YARN-8146 > URL: https://issues.apache.org/jira/browse/YARN-8146 > Project: Hadoop YARN > Issue Type: Improvement > Components: reservation system >Affects Versions: 3.0.1 >Reporter: BELUGA BEHR >Priority: Trivial > Attachments: YARN-8146.1.patch > > > Remove {{LinkedList}} instances in favor of {{ArrayList}}. {{ArrayList}} is > generally more memory efficient, causes less memory fragmentation, and, with > memory locality, is faster to iterate. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8146) Remove LinkedList From resourcemanager.reservation.planning Package
[ https://issues.apache.org/jira/browse/YARN-8146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR updated YARN-8146: -- Attachment: YARN-8146.1.patch > Remove LinkedList From resourcemanager.reservation.planning Package > --- > > Key: YARN-8146 > URL: https://issues.apache.org/jira/browse/YARN-8146 > Project: Hadoop YARN > Issue Type: Improvement > Components: reservation system >Affects Versions: 3.0.1 >Reporter: BELUGA BEHR >Priority: Minor > Attachments: YARN-8146.1.patch > > > Remove {{LinkedList}} instances in favor of {{ArrayList}}. {{ArrayList}} is > generally more memory efficient, causes less memory fragmentation, and, with > memory locality, is faster to iterate. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-8146) Remove LinkedList From resourcemanager.reservation.planning Package
BELUGA BEHR created YARN-8146: - Summary: Remove LinkedList From resourcemanager.reservation.planning Package Key: YARN-8146 URL: https://issues.apache.org/jira/browse/YARN-8146 Project: Hadoop YARN Issue Type: Improvement Components: reservation system Affects Versions: 3.0.1 Reporter: BELUGA BEHR Remove {{LinkedList}} instances in favor of {{ArrayList}}. {{ArrayList}} is generally more memory efficient, causes less memory fragmentation, and, with memory locality, is faster to iterate. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
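The swap proposed above is usually mechanical when code is written against the {{List}} interface: only the constructor call changes. A hypothetical before/after sketch (the method name is invented for illustration):

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Illustration of the LinkedList -> ArrayList swap: callers coded
 * against the List interface are unaffected; only the concrete type
 * (and, optionally, a capacity hint) changes.
 */
public class ListSwapSketch {
  public static List<Integer> firstN(int n) {
    // was: List<Integer> out = new LinkedList<>();
    List<Integer> out = new ArrayList<>(n); // capacity hint avoids resizes
    for (int i = 0; i < n; i++) {
      out.add(i);
    }
    return out;
  }
}
```

ArrayList stores elements in one backing array (good cache locality, no per-element node objects), which is where the memory and iteration advantages come from.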
[jira] [Commented] (YARN-7962) Race Condition When Stopping DelegationTokenRenewer
[ https://issues.apache.org/jira/browse/YARN-7962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16425525#comment-16425525 ] BELUGA BEHR commented on YARN-7962: --- [~wilfreds] {{try...finally}} is a best practice. bq. It is recommended practice to always immediately follow a call to lock with a try block, most typically in a before/after construction https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/locks/ReentrantLock.html Please consider patch version 3 for inclusion into the project. > Race Condition When Stopping DelegationTokenRenewer > --- > > Key: YARN-7962 > URL: https://issues.apache.org/jira/browse/YARN-7962 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 3.0.0 >Reporter: BELUGA BEHR >Priority: Minor > Attachments: YARN-7962.1.patch, YARN-7962.2.patch, YARN-7962.3.patch, > YARN-7962.4.patch > > > [https://github.com/apache/hadoop/blob/69fa81679f59378fd19a2c65db8019393d7c05a2/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java] > {code:java} > private ThreadPoolExecutor renewerService; > private void processDelegationTokenRenewerEvent( > DelegationTokenRenewerEvent evt) { > serviceStateLock.readLock().lock(); > try { > if (isServiceStarted) { > renewerService.execute(new DelegationTokenRenewerRunnable(evt)); > } else { > pendingEventQueue.add(evt); > } > } finally { > serviceStateLock.readLock().unlock(); > } > } > @Override > protected void serviceStop() { > if (renewalTimer != null) { > renewalTimer.cancel(); > } > appTokens.clear(); > allTokens.clear(); > this.renewerService.shutdown(); > {code} > {code:java} > 2018-02-21 11:18:16,253 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: > Error in dispatcher thread > java.util.concurrent.RejectedExecutionException: Task > 
org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable@39bddaf2 > rejected from java.util.concurrent.ThreadPoolExecutor@5f71637b[Terminated, > pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 15487] > at > java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2048) > at > java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821) > at > java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1372) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.processDelegationTokenRenewerEvent(DelegationTokenRenewer.java:196) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.applicationFinished(DelegationTokenRenewer.java:734) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.finishApplication(RMAppManager.java:199) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.handle(RMAppManager.java:424) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.handle(RMAppManager.java:65) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:177) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109) > at java.lang.Thread.run(Thread.java:745) > {code} > What I think is going on here is that the {{serviceStop}} method is not > setting the {{isServiceStarted}} flag to 'false'. > Please update so that the {{serviceStop}} method grabs the > {{serviceStateLock}} and sets {{isServiceStarted}} to _false_, before > shutting down the {{renewerService}} thread pool, to avoid this condition. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7962) Race Condition When Stopping DelegationTokenRenewer
[ https://issues.apache.org/jira/browse/YARN-7962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR updated YARN-7962: -- Attachment: YARN-7962.3.patch > Race Condition When Stopping DelegationTokenRenewer > --- > > Key: YARN-7962 > URL: https://issues.apache.org/jira/browse/YARN-7962 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 3.0.0 >Reporter: BELUGA BEHR >Priority: Minor > Attachments: YARN-7962.1.patch, YARN-7962.2.patch, YARN-7962.3.patch, > YARN-7962.4.patch > > > [https://github.com/apache/hadoop/blob/69fa81679f59378fd19a2c65db8019393d7c05a2/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java] > {code:java} > private ThreadPoolExecutor renewerService; > private void processDelegationTokenRenewerEvent( > DelegationTokenRenewerEvent evt) { > serviceStateLock.readLock().lock(); > try { > if (isServiceStarted) { > renewerService.execute(new DelegationTokenRenewerRunnable(evt)); > } else { > pendingEventQueue.add(evt); > } > } finally { > serviceStateLock.readLock().unlock(); > } > } > @Override > protected void serviceStop() { > if (renewalTimer != null) { > renewalTimer.cancel(); > } > appTokens.clear(); > allTokens.clear(); > this.renewerService.shutdown(); > {code} > {code:java} > 2018-02-21 11:18:16,253 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: > Error in dispatcher thread > java.util.concurrent.RejectedExecutionException: Task > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable@39bddaf2 > rejected from java.util.concurrent.ThreadPoolExecutor@5f71637b[Terminated, > pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 15487] > at > java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2048) > at > 
java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821) > at > java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1372) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.processDelegationTokenRenewerEvent(DelegationTokenRenewer.java:196) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.applicationFinished(DelegationTokenRenewer.java:734) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.finishApplication(RMAppManager.java:199) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.handle(RMAppManager.java:424) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.handle(RMAppManager.java:65) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:177) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109) > at java.lang.Thread.run(Thread.java:745) > {code} > What I think is going on here is that the {{serviceStop}} method is not > setting the {{isServiceStarted}} flag to 'false'. > Please update so that the {{serviceStop}} method grabs the > {{serviceStateLock}} and sets {{isServiceStarted}} to _false_, before > shutting down the {{renewerService}} thread pool, to avoid this condition. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7962) Race Condition When Stopping DelegationTokenRenewer
[ https://issues.apache.org/jira/browse/YARN-7962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR updated YARN-7962: -- Attachment: (was: YARN-7962.3.patch) > Race Condition When Stopping DelegationTokenRenewer > --- > > Key: YARN-7962 > URL: https://issues.apache.org/jira/browse/YARN-7962 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 3.0.0 >Reporter: BELUGA BEHR >Priority: Minor > Attachments: YARN-7962.1.patch, YARN-7962.2.patch, YARN-7962.3.patch, > YARN-7962.4.patch > > > [https://github.com/apache/hadoop/blob/69fa81679f59378fd19a2c65db8019393d7c05a2/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java] > {code:java} > private ThreadPoolExecutor renewerService; > private void processDelegationTokenRenewerEvent( > DelegationTokenRenewerEvent evt) { > serviceStateLock.readLock().lock(); > try { > if (isServiceStarted) { > renewerService.execute(new DelegationTokenRenewerRunnable(evt)); > } else { > pendingEventQueue.add(evt); > } > } finally { > serviceStateLock.readLock().unlock(); > } > } > @Override > protected void serviceStop() { > if (renewalTimer != null) { > renewalTimer.cancel(); > } > appTokens.clear(); > allTokens.clear(); > this.renewerService.shutdown(); > {code} > {code:java} > 2018-02-21 11:18:16,253 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: > Error in dispatcher thread > java.util.concurrent.RejectedExecutionException: Task > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable@39bddaf2 > rejected from java.util.concurrent.ThreadPoolExecutor@5f71637b[Terminated, > pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 15487] > at > java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2048) > at > 
java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821) > at > java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1372) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.processDelegationTokenRenewerEvent(DelegationTokenRenewer.java:196) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.applicationFinished(DelegationTokenRenewer.java:734) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.finishApplication(RMAppManager.java:199) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.handle(RMAppManager.java:424) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.handle(RMAppManager.java:65) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:177) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109) > at java.lang.Thread.run(Thread.java:745) > {code} > What I think is going on here is that the {{serviceStop}} method is not > setting the {{isServiceStarted}} flag to 'false'. > Please update so that the {{serviceStop}} method grabs the > {{serviceStateLock}} and sets {{isServiceStarted}} to _false_, before > shutting down the {{renewerService}} thread pool, to avoid this condition. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7962) Race Condition When Stopping DelegationTokenRenewer
[ https://issues.apache.org/jira/browse/YARN-7962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR updated YARN-7962: -- Attachment: YARN-7962.4.patch > Race Condition When Stopping DelegationTokenRenewer > --- > > Key: YARN-7962 > URL: https://issues.apache.org/jira/browse/YARN-7962 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 3.0.0 >Reporter: BELUGA BEHR >Priority: Minor > Attachments: YARN-7962.1.patch, YARN-7962.2.patch, YARN-7962.3.patch, > YARN-7962.4.patch > > > [https://github.com/apache/hadoop/blob/69fa81679f59378fd19a2c65db8019393d7c05a2/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java] > {code:java} > private ThreadPoolExecutor renewerService; > private void processDelegationTokenRenewerEvent( > DelegationTokenRenewerEvent evt) { > serviceStateLock.readLock().lock(); > try { > if (isServiceStarted) { > renewerService.execute(new DelegationTokenRenewerRunnable(evt)); > } else { > pendingEventQueue.add(evt); > } > } finally { > serviceStateLock.readLock().unlock(); > } > } > @Override > protected void serviceStop() { > if (renewalTimer != null) { > renewalTimer.cancel(); > } > appTokens.clear(); > allTokens.clear(); > this.renewerService.shutdown(); > {code} > {code:java} > 2018-02-21 11:18:16,253 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: > Error in dispatcher thread > java.util.concurrent.RejectedExecutionException: Task > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable@39bddaf2 > rejected from java.util.concurrent.ThreadPoolExecutor@5f71637b[Terminated, > pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 15487] > at > java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2048) > at > 
java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821) > at > java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1372) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.processDelegationTokenRenewerEvent(DelegationTokenRenewer.java:196) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.applicationFinished(DelegationTokenRenewer.java:734) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.finishApplication(RMAppManager.java:199) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.handle(RMAppManager.java:424) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.handle(RMAppManager.java:65) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:177) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109) > at java.lang.Thread.run(Thread.java:745) > {code} > What I think is going on here is that the {{serviceStop}} method is not > setting the {{isServiceStarted}} flag to 'false'. > Please update so that the {{serviceStop}} method grabs the > {{serviceStateLock}} and sets {{isServiceStarted}} to _false_, before > shutting down the {{renewerService}} thread pool, to avoid this condition. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7962) Race Condition When Stopping DelegationTokenRenewer
[ https://issues.apache.org/jira/browse/YARN-7962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16419830#comment-16419830 ] BELUGA BEHR commented on YARN-7962: --- [~billie.rinaldi] Thank you for the assist! I have been on vacation now for a little bit and have been unable to work on this. > Race Condition When Stopping DelegationTokenRenewer > --- > > Key: YARN-7962 > URL: https://issues.apache.org/jira/browse/YARN-7962 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 3.0.0 >Reporter: BELUGA BEHR >Priority: Minor > Attachments: YARN-7962.1.patch, YARN-7962.2.patch, YARN-7962.3.patch > > > [https://github.com/apache/hadoop/blob/69fa81679f59378fd19a2c65db8019393d7c05a2/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java] > {code:java} > private ThreadPoolExecutor renewerService; > private void processDelegationTokenRenewerEvent( > DelegationTokenRenewerEvent evt) { > serviceStateLock.readLock().lock(); > try { > if (isServiceStarted) { > renewerService.execute(new DelegationTokenRenewerRunnable(evt)); > } else { > pendingEventQueue.add(evt); > } > } finally { > serviceStateLock.readLock().unlock(); > } > } > @Override > protected void serviceStop() { > if (renewalTimer != null) { > renewalTimer.cancel(); > } > appTokens.clear(); > allTokens.clear(); > this.renewerService.shutdown(); > {code} > {code:java} > 2018-02-21 11:18:16,253 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: > Error in dispatcher thread > java.util.concurrent.RejectedExecutionException: Task > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable@39bddaf2 > rejected from java.util.concurrent.ThreadPoolExecutor@5f71637b[Terminated, > pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 15487] > at > 
java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2048) > at > java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821) > at > java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1372) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.processDelegationTokenRenewerEvent(DelegationTokenRenewer.java:196) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.applicationFinished(DelegationTokenRenewer.java:734) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.finishApplication(RMAppManager.java:199) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.handle(RMAppManager.java:424) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.handle(RMAppManager.java:65) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:177) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109) > at java.lang.Thread.run(Thread.java:745) > {code} > What I think is going on here is that the {{serviceStop}} method is not > setting the {{isServiceStarted}} flag to 'false'. > Please update so that the {{serviceStop}} method grabs the > {{serviceStateLock}} and sets {{isServiceStarted}} to _false_, before > shutting down the {{renewerService}} thread pool, to avoid this condition. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
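The ordering proposed in the description can be sketched as a small, self-contained stand-in (illustrative only, not the actual {{DelegationTokenRenewer}}; field and method names follow the snippet above, and {{serviceStateLock}} is assumed to be a {{ReentrantReadWriteLock}}, as the {{readLock()}} usage implies):

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Sketch of the proposed fix: serviceStop() flips isServiceStarted to false
// under the write lock *before* shutting down the pool, so a concurrent
// processEvent() routes events to the pending queue instead of submitting
// them to a terminated executor.
public class RenewerSketch {
  final ThreadPoolExecutor renewerService = new ThreadPoolExecutor(
      1, 1, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
  final ReentrantReadWriteLock serviceStateLock =
      new ReentrantReadWriteLock();
  final Queue<Runnable> pendingEventQueue = new ConcurrentLinkedQueue<>();
  volatile boolean isServiceStarted = true;

  void processEvent(Runnable evt) {
    serviceStateLock.readLock().lock();
    try {
      if (isServiceStarted) {
        renewerService.execute(evt);
      } else {
        pendingEventQueue.add(evt);   // never reaches the dead executor
      }
    } finally {
      serviceStateLock.readLock().unlock();
    }
  }

  void serviceStop() {
    // The fix: take the write lock and clear the flag before shutdown(),
    // closing the window in which execute() could be called on a pool
    // that has already terminated.
    serviceStateLock.writeLock().lock();
    try {
      isServiceStarted = false;
    } finally {
      serviceStateLock.writeLock().unlock();
    }
    renewerService.shutdown();
  }

  public static void main(String[] args) {
    RenewerSketch s = new RenewerSketch();
    s.serviceStop();
    s.processEvent(() -> { });        // would previously have risked rejection
    System.out.println("queued=" + s.pendingEventQueue.size());
  }
}
```

With this ordering, an event arriving during shutdown lands in {{pendingEventQueue}} rather than triggering a {{RejectedExecutionException}}.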
[jira] [Updated] (YARN-7962) Race Condition When Stopping DelegationTokenRenewer
[ https://issues.apache.org/jira/browse/YARN-7962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR updated YARN-7962: -- Attachment: YARN-7962.2.patch
[jira] [Commented] (YARN-7962) Race Condition When Stopping DelegationTokenRenewer
[ https://issues.apache.org/jira/browse/YARN-7962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16402242#comment-16402242 ] BELUGA BEHR commented on YARN-7962: --- [~wilfreds] Can you please share your thoughts on how to unit test a race condition of this sort? How would we introduce pauses into the locked code? Also, there technically is no need to lock during initialization; it is just a safety and good-practice measure. There is almost no overhead, since we initialize only once (or maybe a couple of times), so it does not hurt to be safe.
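One way to exercise such an interleaving deterministically is to gate the submitting thread on latches so it pauses exactly inside the race window. The following is a self-contained sketch (not the actual YARN test; all names are hypothetical) that reproduces the submit-vs-shutdown race against a plain {{ThreadPoolExecutor}}:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Sketch: CountDownLatches force the interleaving "submitter is about to
// call execute(); the pool is shut down; the submitter then calls execute()",
// which is the YARN-7962 failure mode.
public class RaceSketch {

  static String run() throws Exception {
    ThreadPoolExecutor pool = new ThreadPoolExecutor(
        1, 1, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
    CountDownLatch aboutToSubmit = new CountDownLatch(1);
    CountDownLatch stopDone = new CountDownLatch(1);
    String[] outcome = new String[1];

    Thread submitter = new Thread(() -> {
      aboutToSubmit.countDown();       // signal: inside the race window
      try {
        stopDone.await();              // simulate being preempted here
        pool.execute(() -> { });       // races with shutdown()
        outcome[0] = "accepted";
      } catch (RejectedExecutionException e) {
        outcome[0] = "rejected";       // the failure seen in the stack trace
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    submitter.start();

    aboutToSubmit.await();
    pool.shutdown();                   // stop the pool mid-submit
    stopDone.countDown();
    submitter.join();
    return outcome[0];
  }

  public static void main(String[] args) throws Exception {
    System.out.println(run());
  }
}
```

With the default {{AbortPolicy}}, the post-shutdown {{execute()}} call is rejected every time, so the test fails reliably before the fix and (with the pending-queue routing in place) would pass after it.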
[jira] [Commented] (YARN-7962) Race Condition When Stopping DelegationTokenRenewer
[ https://issues.apache.org/jira/browse/YARN-7962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16377552#comment-16377552 ] BELUGA BEHR commented on YARN-7962: --- Unit test failures appear to be unrelated.
[jira] [Commented] (YARN-7962) Race Condition When Stopping DelegationTokenRenewer
[ https://issues.apache.org/jira/browse/YARN-7962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16374999#comment-16374999 ] BELUGA BEHR commented on YARN-7962: --- This patch also tightens things up a little with respect to blocking (it makes start and stop symmetrical).
[jira] [Updated] (YARN-7962) Race Condition When Stopping DelegationTokenRenewer
[ https://issues.apache.org/jira/browse/YARN-7962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR updated YARN-7962: -- Attachment: YARN-7962.1.patch
[jira] [Commented] (YARN-7962) Race Condition When Stopping DelegationTokenRenewer
[ https://issues.apache.org/jira/browse/YARN-7962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16373587#comment-16373587 ] BELUGA BEHR commented on YARN-7962: --- {quote}pool size = 0, active threads = 0, queued tasks = 0{quote} My guess is that the pool size is 0 because the pool had already been shut down.
[jira] [Created] (YARN-7962) Race Condition When Stopping DelegationTokenRenewer
BELUGA BEHR created YARN-7962: - Summary: Race Condition When Stopping DelegationTokenRenewer Key: YARN-7962 URL: https://issues.apache.org/jira/browse/YARN-7962 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 3.0.0 Reporter: BELUGA BEHR
[jira] [Commented] (YARN-7688) Miscellaneous Improvements To ProcfsBasedProcessTree
[ https://issues.apache.org/jira/browse/YARN-7688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16308841#comment-16308841 ] BELUGA BEHR commented on YARN-7688: --- [~miklos.szeg...@cloudera.com] Kindly consider this patch to the project. :) > Miscellaneous Improvements To ProcfsBasedProcessTree > > > Key: YARN-7688 > URL: https://issues.apache.org/jira/browse/YARN-7688 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 3.0.0 >Reporter: BELUGA BEHR >Priority: Minor > Attachments: YARN-7688.1.patch, YARN-7688.2.patch, YARN-7688.3.patch, > YARN-7688.4.patch > > > * Use ArrayDeque for performance instead of LinkedList > * Use more Apache Commons routines to replace existing implementations > * Remove superfluous code guards around DEBUG statements > * Remove superfluous annotations in the tests > * Other small improvements -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
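The first bullet above can be illustrated with a tiny, hypothetical example (not the actual {{ProcfsBasedProcessTree}} code): {{ArrayDeque}} is a drop-in {{Deque}} replacement for {{LinkedList}} that avoids per-node allocation and is generally faster for adds and removes at either end, with the one caveat that it does not permit null elements.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative only: ArrayDeque used where LinkedList would otherwise be,
// both as a FIFO queue and as a stack.
public class DequeDemo {
  public static void main(String[] args) {
    Deque<Integer> pids = new ArrayDeque<>();
    pids.add(100);                    // enqueue at the tail
    pids.add(200);
    pids.push(1);                     // also usable as a stack (head)
    System.out.println(pids.pop());   // removes the head: 1
    System.out.println(pids.poll());  // removes the new head: 100
  }
}
```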
[jira] [Updated] (YARN-7687) ContainerLogAppender Improvements
[ https://issues.apache.org/jira/browse/YARN-7687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR updated YARN-7687: -- Attachment: YARN-7687.3.patch > ContainerLogAppender Improvements > - > > Key: YARN-7687 > URL: https://issues.apache.org/jira/browse/YARN-7687 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 3.0.0 >Reporter: BELUGA BEHR >Priority: Trivial > Attachments: YARN-7687.1.patch, YARN-7687.2.patch, YARN-7687.3.patch > > > * Use Array-backed collection instead of LinkedList > * Ignore calls to {{close()}} after the initial call > * Clear the queue after {{close}} is called to let garbage collection do its > magic on the items inside of it > * Fix int-to-long conversion issue (overflow) > * Remove superfluous white space -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
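Several of the bullets above can be sketched together in one small, hypothetical class (illustrative only, not the real {{ContainerLogAppender}}): an idempotent {{close()}} that clears its buffer, and a {{long}}-typed size so that a kilobyte limit multiplied by 1024 cannot overflow an {{int}}.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the listed improvements: array-backed buffer,
// repeated close() calls ignored, buffer cleared on close so its contents
// become garbage-collectable, and long arithmetic for the byte cap.
public class BoundedLogBuffer {
  private final long maxBytes;      // long, not int: a 4 GB cap configured in
                                    // KB would overflow 32-bit arithmetic
  private final List<String> events = new ArrayList<>();
  private boolean closed = false;

  public BoundedLogBuffer(long maxKiloBytes) {
    this.maxBytes = maxKiloBytes * 1024L;   // 1024L forces long math
  }

  public synchronized void append(String event) {
    if (!closed) {
      events.add(event);
    }
  }

  public synchronized void close() {
    if (closed) {
      return;             // ignore calls after the initial close()
    }
    closed = true;
    events.clear();       // let garbage collection reclaim the items
  }

  public long maxBytes() {
    return maxBytes;
  }

  public static void main(String[] args) {
    BoundedLogBuffer buf = new BoundedLogBuffer(4L * 1024 * 1024); // 4 GB cap
    buf.close();
    buf.close();          // second call is a no-op
    System.out.println(buf.maxBytes());
  }
}
```

With {{int}} arithmetic, 4194304 KB * 1024 would wrap to 0; the {{long}} version yields the intended 4294967296 bytes.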
[jira] [Updated] (YARN-7687) ContainerLogAppender Improvements
[ https://issues.apache.org/jira/browse/YARN-7687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR updated YARN-7687: -- Attachment: YARN-7687.2.patch
[jira] [Updated] (YARN-7687) ContainerLogAppender Improvements
[ https://issues.apache.org/jira/browse/YARN-7687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR updated YARN-7687: -- Attachment: (was: YARN-7687.2.patch)