[jira] [Commented] (YARN-6013) ApplicationMasterProtocolPBClientImpl.allocate fails with EOFException when RPC privacy is enabled

2017-02-02 Thread Steven Rand (JIRA)
[ https://issues.apache.org/jira/browse/YARN-6013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15850106#comment-15850106 ] Steven Rand commented on YARN-6013: --- [~djp], I'm wondering whether you have any opinions here, since

[jira] [Updated] (YARN-6013) ApplicationMasterProtocolPBClientImpl.allocate fails with EOFException when RPC privacy is enabled

2017-02-06 Thread Steven Rand (JIRA)
[ https://issues.apache.org/jira/browse/YARN-6013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Rand updated YARN-6013: -- Attachment: YARN-6013-branch-2.8.0.002.patch Attaching a better patch file now that I've gotten

[jira] [Updated] (YARN-6013) ApplicationMasterProtocolPBClientImpl.allocate fails with EOFException when RPC privacy is enabled

2017-02-06 Thread Steven Rand (JIRA)
[ https://issues.apache.org/jira/browse/YARN-6013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Rand updated YARN-6013: -- Attachment: (was: YARN-6013-branch-2.8.0.001.patch) > ApplicationMasterProtocolPBClientImpl.allocate

[jira] [Commented] (YARN-6013) ApplicationMasterProtocolPBClientImpl.allocate fails with EOFException when RPC privacy is enabled

2017-01-24 Thread Steven Rand (JIRA)
[ https://issues.apache.org/jira/browse/YARN-6013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15836662#comment-15836662 ] Steven Rand commented on YARN-6013: --- This is also an issue on 2.8.0-RC1. [~jianhe], do you think there's

[jira] [Commented] (YARN-6013) ApplicationMasterProtocolPBClientImpl.allocate fails with EOFException when RPC privacy is enabled

2017-01-24 Thread Steven Rand (JIRA)
[ https://issues.apache.org/jira/browse/YARN-6013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15836703#comment-15836703 ] Steven Rand commented on YARN-6013: --- As an additional data point, setting {{hadoop.rpc.protection}} to

[jira] [Issue Comment Deleted] (YARN-6013) ApplicationMasterProtocolPBClientImpl.allocate fails with EOFException when RPC privacy is enabled

2017-01-26 Thread Steven Rand (JIRA)
[ https://issues.apache.org/jira/browse/YARN-6013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Rand updated YARN-6013: -- Comment: was deleted (was: A bit more information: I've isolated the problem to the private class

[jira] [Commented] (YARN-6013) ApplicationMasterProtocolPBClientImpl.allocate fails with EOFException when RPC privacy is enabled

2017-01-26 Thread Steven Rand (JIRA)
[ https://issues.apache.org/jira/browse/YARN-6013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15840119#comment-15840119 ] Steven Rand commented on YARN-6013: --- I deleted my above comment because it was inaccurate. Looked

[jira] [Commented] (YARN-6013) ApplicationMasterProtocolPBClientImpl.allocate fails with EOFException when RPC privacy is enabled

2017-01-26 Thread Steven Rand (JIRA)
[ https://issues.apache.org/jira/browse/YARN-6013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15840286#comment-15840286 ] Steven Rand commented on YARN-6013: --- Also the reason why other RPC calls are working is evidently that

[jira] [Commented] (YARN-6013) ApplicationMasterProtocolPBClientImpl.allocate fails with EOFException when RPC privacy is enabled

2017-01-26 Thread Steven Rand (JIRA)
[ https://issues.apache.org/jira/browse/YARN-6013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15840244#comment-15840244 ] Steven Rand commented on YARN-6013: --- The problem is that {{Client$IpcStreams#readResponse}} is trying to

[jira] [Commented] (YARN-6013) ApplicationMasterProtocolPBClientImpl.allocate fails with EOFException when RPC privacy is enabled

2017-01-25 Thread Steven Rand (JIRA)
[ https://issues.apache.org/jira/browse/YARN-6013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15838345#comment-15838345 ] Steven Rand commented on YARN-6013: --- A bit more information: I've isolated the problem to the private

[jira] [Commented] (YARN-5753) fix NPE in AMRMClientImpl.getMatchingRequests()

2016-11-15 Thread Steven Rand (JIRA)
[ https://issues.apache.org/jira/browse/YARN-5753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15667438#comment-15667438 ] Steven Rand commented on YARN-5753: --- I'm seeing the same error in a spark-shell with Spark built against

[jira] [Commented] (YARN-6013) ApplicationMasterProtocolPBClientImpl.allocate fails with EOFException when RPC privacy is enabled

2017-01-13 Thread Steven Rand (JIRA)
[ https://issues.apache.org/jira/browse/YARN-6013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15822290#comment-15822290 ] Steven Rand commented on YARN-6013: --- Relevant part of AM container log at DEBUG level: {code} 2017-01-13

[jira] [Commented] (YARN-6013) ApplicationMasterProtocolPBClientImpl.allocate fails with EOFException when RPC privacy is enabled

2017-01-13 Thread Steven Rand (JIRA)
[ https://issues.apache.org/jira/browse/YARN-6013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15822052#comment-15822052 ] Steven Rand commented on YARN-6013: --- This issue also reproduces on latest branch-2.8.0 (most recent

[jira] [Updated] (YARN-6013) ApplicationMasterProtocolPBClientImpl.allocate fails with EOFException when RPC privacy is enabled

2016-12-20 Thread Steven Rand (JIRA)
[ https://issues.apache.org/jira/browse/YARN-6013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Rand updated YARN-6013: -- Attachment: yarn-rm-log.txt [~jianhe], I've attached the log from the Resource Manager while the distcp

[jira] [Commented] (YARN-2985) YARN should support to delete the aggregated logs for Non-MapReduce applications

2016-12-25 Thread Steven Rand (JIRA)
[ https://issues.apache.org/jira/browse/YARN-2985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15777023#comment-15777023 ] Steven Rand commented on YARN-2985: --- I would like to submit a patch for this issue, but I don't seem to

[jira] [Created] (YARN-6013) ApplicationMasterProtocolPBClientImpl.allocate fails with EOFException when RPC privacy is enabled

2016-12-19 Thread Steven Rand (JIRA)
Steven Rand created YARN-6013: - Summary: ApplicationMasterProtocolPBClientImpl.allocate fails with EOFException when RPC privacy is enabled Key: YARN-6013 URL: https://issues.apache.org/jira/browse/YARN-6013

[jira] [Updated] (YARN-2985) YARN should support to delete the aggregated logs for Non-MapReduce applications

2017-03-30 Thread Steven Rand (JIRA)
[ https://issues.apache.org/jira/browse/YARN-2985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Rand updated YARN-2985: -- Attachment: YARN-2985-branch-2-001.patch Attaching a patch for branch-2. I've tested this experimentally

[jira] [Commented] (YARN-2985) YARN should support to delete the aggregated logs for Non-MapReduce applications

2017-04-13 Thread Steven Rand (JIRA)
[ https://issues.apache.org/jira/browse/YARN-2985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15968609#comment-15968609 ] Steven Rand commented on YARN-2985: --- [~jlowe], thanks for the thoughtful response. Based on that

[jira] [Commented] (YARN-6308) Fix TestAMRMClient compilation errors

2017-03-08 Thread Steven Rand (JIRA)
[ https://issues.apache.org/jira/browse/YARN-6308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15902269#comment-15902269 ] Steven Rand commented on YARN-6308: --- Attached a new patch to HADOOP-14062, though I think this issue

[jira] [Resolved] (YARN-6120) add retention of aggregated logs to Timeline Server

2017-03-08 Thread Steven Rand (JIRA)
[ https://issues.apache.org/jira/browse/YARN-6120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Rand resolved YARN-6120. --- Resolution: Duplicate I now have the ability to submit a patch for YARN-2985, so this duplicate JIRA

[jira] [Assigned] (YARN-2985) YARN should support to delete the aggregated logs for Non-MapReduce applications

2017-03-08 Thread Steven Rand (JIRA)
[ https://issues.apache.org/jira/browse/YARN-2985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Rand reassigned YARN-2985: - Assignee: Steven Rand > YARN should support to delete the aggregated logs for Non-MapReduce >

[jira] [Updated] (YARN-6956) preemption may only consider resource requests for one node

2017-08-13 Thread Steven Rand (JIRA)
[ https://issues.apache.org/jira/browse/YARN-6956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Rand updated YARN-6956: -- Attachment: YARN-6956.001.patch > preemption may only consider resource requests for one node >

[jira] [Comment Edited] (YARN-6956) preemption may only consider resource requests for one node

2017-08-13 Thread Steven Rand (JIRA)
[ https://issues.apache.org/jira/browse/YARN-6956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16125041#comment-16125041 ] Steven Rand edited comment on YARN-6956 at 8/13/17 8:30 PM: Thanks for the

[jira] [Commented] (YARN-6956) preemption may only consider resource requests for one node

2017-08-13 Thread Steven Rand (JIRA)
[ https://issues.apache.org/jira/browse/YARN-6956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16125041#comment-16125041 ] Steven Rand commented on YARN-6956: --- Thanks for the clarifications. All three of those suggestions make

[jira] [Assigned] (YARN-6956) preemption may only consider resource requests for one node

2017-08-13 Thread Steven Rand (JIRA)
[ https://issues.apache.org/jira/browse/YARN-6956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Rand reassigned YARN-6956: - Assignee: Steven Rand > preemption may only consider resource requests for one node >

[jira] [Created] (YARN-6956) preemption may only consider resource requests for one node

2017-08-05 Thread Steven Rand (JIRA)
Steven Rand created YARN-6956: - Summary: preemption may only consider resource requests for one node Key: YARN-6956 URL: https://issues.apache.org/jira/browse/YARN-6956 Project: Hadoop YARN

[jira] [Updated] (YARN-6956) preemption may only consider resource requests for one node

2017-08-05 Thread Steven Rand (JIRA)
[ https://issues.apache.org/jira/browse/YARN-6956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Rand updated YARN-6956: -- Description: I'm observing the following series of events on a CDH 5.11.0 cluster, which seem to be

[jira] [Updated] (YARN-6956) preemption may only consider resource requests for one node

2017-08-05 Thread Steven Rand (JIRA)
[ https://issues.apache.org/jira/browse/YARN-6956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Rand updated YARN-6956: -- Description: I'm observing the following series of events on a CDH 5.11.0 cluster, which seem to be

[jira] [Commented] (YARN-6956) preemption may only consider resource requests for one node

2017-08-08 Thread Steven Rand (JIRA)
[ https://issues.apache.org/jira/browse/YARN-6956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16119135#comment-16119135 ] Steven Rand commented on YARN-6956: --- Hi [~kasha], thanks for the suggestions. I would definitely like to

[jira] [Commented] (YARN-6960) definition of active queue allows idle long-running apps to distort fair shares

2017-08-08 Thread Steven Rand (JIRA)
[ https://issues.apache.org/jira/browse/YARN-6960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16118146#comment-16118146 ] Steven Rand commented on YARN-6960: --- Yep, that concern is definitely valid. I wrote a patch that

[jira] [Comment Edited] (YARN-6956) preemption may only consider resource requests for one node

2017-08-06 Thread Steven Rand (JIRA)
[ https://issues.apache.org/jira/browse/YARN-6956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16115847#comment-16115847 ] Steven Rand edited comment on YARN-6956 at 8/6/17 4:35 PM: --- Hi

[jira] [Commented] (YARN-6956) preemption may only consider resource requests for one node

2017-08-06 Thread Steven Rand (JIRA)
[ https://issues.apache.org/jira/browse/YARN-6956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16115847#comment-16115847 ] Steven Rand commented on YARN-6956: --- Hi [~dan...@cloudera.com], thanks for the quick reply and

[jira] [Created] (YARN-6960) definition of active queue allows idle long-running apps to distort fair shares

2017-08-07 Thread Steven Rand (JIRA)
Steven Rand created YARN-6960: - Summary: definition of active queue allows idle long-running apps to distort fair shares Key: YARN-6960 URL: https://issues.apache.org/jira/browse/YARN-6960 Project:

[jira] [Updated] (YARN-6960) definition of active queue allows idle long-running apps to distort fair shares

2017-08-20 Thread Steven Rand (JIRA)
[ https://issues.apache.org/jira/browse/YARN-6960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Rand updated YARN-6960: -- Attachment: YARN-6960.002.patch Attaching a slightly modified patch that sets the fair share of an

[jira] [Commented] (YARN-6960) definition of active queue allows idle long-running apps to distort fair shares

2017-08-22 Thread Steven Rand (JIRA)
[ https://issues.apache.org/jira/browse/YARN-6960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16137446#comment-16137446 ] Steven Rand commented on YARN-6960: --- Thanks, Daniel. Having thought about this some more, I don't think

[jira] [Assigned] (YARN-7290) canContainerBePreempted can return true when it shouldn't

2017-10-04 Thread Steven Rand (JIRA)
[ https://issues.apache.org/jira/browse/YARN-7290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Rand reassigned YARN-7290: - Assignee: Steven Rand > canContainerBePreempted can return true when it shouldn't >

[jira] [Created] (YARN-7290) canContainerBePreempted can return true when it shouldn't

2017-10-04 Thread Steven Rand (JIRA)
Steven Rand created YARN-7290: - Summary: canContainerBePreempted can return true when it shouldn't Key: YARN-7290 URL: https://issues.apache.org/jira/browse/YARN-7290 Project: Hadoop YARN Issue

[jira] [Updated] (YARN-7290) canContainerBePreempted can return true when it shouldn't

2017-10-04 Thread Steven Rand (JIRA)
[ https://issues.apache.org/jira/browse/YARN-7290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Rand updated YARN-7290: -- Attachment: YARN-7290.001.patch Added a patch which I _think_ fixes both issues. All tests in

[jira] [Updated] (YARN-7290) canContainerBePreempted can return true when it shouldn't

2017-10-04 Thread Steven Rand (JIRA)
[ https://issues.apache.org/jira/browse/YARN-7290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Rand updated YARN-7290: -- Attachment: YARN-7290.002.patch Adding a new patch to make checkstyles happy. The tests in

[jira] [Commented] (YARN-7290) canContainerBePreempted can return true when it shouldn't

2017-10-04 Thread Steven Rand (JIRA)
[ https://issues.apache.org/jira/browse/YARN-7290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16192324#comment-16192324 ] Steven Rand commented on YARN-7290: --- An additional problem is that we call

[jira] [Updated] (YARN-7290) canContainerBePreempted can return true when it shouldn't

2017-10-04 Thread Steven Rand (JIRA)
[ https://issues.apache.org/jira/browse/YARN-7290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Rand updated YARN-7290: -- Attachment: YARN-7290-failing-test.patch > canContainerBePreempted can return true when it shouldn't >

[jira] [Commented] (YARN-6956) preemption may only consider resource requests for one node

2017-09-05 Thread Steven Rand (JIRA)
[ https://issues.apache.org/jira/browse/YARN-6956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16154671#comment-16154671 ] Steven Rand commented on YARN-6956: --- Friendly ping [~kasha] and/or [~templedf]. I'll fix the checkstyle

[jira] [Commented] (YARN-5742) Serve aggregated logs of historical apps from timeline service

2017-09-12 Thread Steven Rand (JIRA)
[ https://issues.apache.org/jira/browse/YARN-5742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16162609#comment-16162609 ] Steven Rand commented on YARN-5742: --- Would it also be reasonable for the Timeline Service to enforce

[jira] [Commented] (YARN-4227) FairScheduler: RM quits processing expired container from a removed node

2017-08-21 Thread Steven Rand (JIRA)
[ https://issues.apache.org/jira/browse/YARN-4227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16135938#comment-16135938 ] Steven Rand commented on YARN-4227: --- I'm seeing a similar issue on what's roughly branch-2 (CDH 5.11.0),

[jira] [Updated] (YARN-6960) definition of active queue allows idle long-running apps to distort fair shares

2017-08-20 Thread Steven Rand (JIRA)
[ https://issues.apache.org/jira/browse/YARN-6960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Rand updated YARN-6960: -- Attachment: YARN-6960.001.patch > definition of active queue allows idle long-running apps to distort

[jira] [Commented] (YARN-6960) definition of active queue allows idle long-running apps to distort fair shares

2017-08-20 Thread Steven Rand (JIRA)
[ https://issues.apache.org/jira/browse/YARN-6960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16134416#comment-16134416 ] Steven Rand commented on YARN-6960: --- [~dan...@cloudera.com], I've uploaded a patch proposing a new

[jira] [Commented] (YARN-4227) FairScheduler: RM quits processing expired container from a removed node

2017-09-05 Thread Steven Rand (JIRA)
[ https://issues.apache.org/jira/browse/YARN-4227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16153423#comment-16153423 ] Steven Rand commented on YARN-4227: --- [~wilfreds], I can rebase the patch if you like. It seems to be

[jira] [Commented] (YARN-7290) canContainerBePreempted can return true when it shouldn't

2017-10-18 Thread Steven Rand (JIRA)
[ https://issues.apache.org/jira/browse/YARN-7290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16210418#comment-16210418 ] Steven Rand commented on YARN-7290: --- Thanks [~templedf]. For what it's worth, I was able to repro this on

[jira] [Commented] (YARN-4227) FairScheduler: RM quits processing expired container from a removed node

2017-10-23 Thread Steven Rand (JIRA)
[ https://issues.apache.org/jira/browse/YARN-4227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16216148#comment-16216148 ] Steven Rand commented on YARN-4227: --- Sorry, I was mistaken when I said the patch attached to this JIRA

[jira] [Commented] (YARN-4227) FairScheduler: RM quits processing expired container from a removed node

2017-10-23 Thread Steven Rand (JIRA)
[ https://issues.apache.org/jira/browse/YARN-4227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16216201#comment-16216201 ] Steven Rand commented on YARN-4227: --- Maybe we could have ClusterNodeTracker#getNode check to see if

[jira] [Updated] (YARN-7290) canContainerBePreempted can return true when it shouldn't

2017-11-22 Thread Steven Rand (JIRA)
[ https://issues.apache.org/jira/browse/YARN-7290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Rand updated YARN-7290: -- Attachment: YARN-7290.005.patch Thanks, [~yufeigu]. Attaching a new patch which removes the list of

[jira] [Commented] (YARN-7655) avoid AM preemption caused by RRs for specific nodes or racks

2017-12-13 Thread Steven Rand (JIRA)
[ https://issues.apache.org/jira/browse/YARN-7655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16290073#comment-16290073 ] Steven Rand commented on YARN-7655: --- One issue I'm having with the test in the patch is that preemption

[jira] [Created] (YARN-7655) avoid AM preemption caused by RRs for specific nodes or racks

2017-12-13 Thread Steven Rand (JIRA)
Steven Rand created YARN-7655: - Summary: avoid AM preemption caused by RRs for specific nodes or racks Key: YARN-7655 URL: https://issues.apache.org/jira/browse/YARN-7655 Project: Hadoop YARN

[jira] [Updated] (YARN-7655) avoid AM preemption caused by RRs for specific nodes or racks

2017-12-13 Thread Steven Rand (JIRA)
[ https://issues.apache.org/jira/browse/YARN-7655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Rand updated YARN-7655: -- Attachment: YARN-7655-001.patch > avoid AM preemption caused by RRs for specific nodes or racks >

[jira] [Updated] (YARN-7290) canContainerBePreempted can return true when it shouldn't

2017-11-19 Thread Steven Rand (JIRA)
[ https://issues.apache.org/jira/browse/YARN-7290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Rand updated YARN-7290: -- Attachment: YARN-7290.003.patch Uploaded a new patch to try to make the test a bit nicer. [~templedf],

[jira] [Updated] (YARN-7290) canContainerBePreempted can return true when it shouldn't

2017-11-21 Thread Steven Rand (JIRA)
[ https://issues.apache.org/jira/browse/YARN-7290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Rand updated YARN-7290: -- Attachment: YARN-7290.004.patch Thanks for reviewing, [~yufeigu]! I've attached a new patch which

[jira] [Comment Edited] (YARN-7391) Consider square root instead of natural log for size-based weight

2017-10-29 Thread Steven Rand (JIRA)
[ https://issues.apache.org/jira/browse/YARN-7391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16223996#comment-16223996 ] Steven Rand edited comment on YARN-7391 at 10/29/17 1:27 PM: - [~templedf] and

[jira] [Commented] (YARN-7391) Consider square root instead of natural log for size-based weight

2017-10-29 Thread Steven Rand (JIRA)
[ https://issues.apache.org/jira/browse/YARN-7391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16223996#comment-16223996 ] Steven Rand commented on YARN-7391: --- [~templedf] and [~yufeigu], thanks for commenting. Apologies for not

[jira] [Updated] (YARN-7391) Consider square root instead of natural log for size-based weight

2017-10-29 Thread Steven Rand (JIRA)
[ https://issues.apache.org/jira/browse/YARN-7391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Rand updated YARN-7391: -- Attachment: YARN-7391-001.patch I know this is still under discussion, but attached a patch just to make

[jira] [Created] (YARN-7391) Consider square root instead of natural log for size-based weight

2017-10-25 Thread Steven Rand (JIRA)
Steven Rand created YARN-7391: - Summary: Consider square root instead of natural log for size-based weight Key: YARN-7391 URL: https://issues.apache.org/jira/browse/YARN-7391 Project: Hadoop YARN

[jira] [Commented] (YARN-7655) avoid AM preemption caused by RRs for specific nodes or racks

2018-01-14 Thread Steven Rand (JIRA)
[ https://issues.apache.org/jira/browse/YARN-7655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16325928#comment-16325928 ] Steven Rand commented on YARN-7655: --- Thanks [~yufeigu] for taking a look. The cluster sizes and nodes

[jira] [Comment Edited] (YARN-7655) avoid AM preemption caused by RRs for specific nodes or racks

2018-01-14 Thread Steven Rand (JIRA)
[ https://issues.apache.org/jira/browse/YARN-7655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16325928#comment-16325928 ] Steven Rand edited comment on YARN-7655 at 1/15/18 6:49 AM: Thanks [~yufeigu]

[jira] [Commented] (YARN-7655) avoid AM preemption caused by RRs for specific nodes or racks

2018-01-26 Thread Steven Rand (JIRA)
[ https://issues.apache.org/jira/browse/YARN-7655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16341964#comment-16341964 ] Steven Rand commented on YARN-7655: --- I'm not sure whether many AMs wind up on a limited number of NMs.

[jira] [Commented] (YARN-7655) avoid AM preemption caused by RRs for specific nodes or racks

2018-01-29 Thread Steven Rand (JIRA)
[ https://issues.apache.org/jira/browse/YARN-7655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16344335#comment-16344335 ] Steven Rand commented on YARN-7655: --- Sounds good, thanks! > avoid AM preemption caused by RRs for

[jira] [Updated] (YARN-7655) avoid AM preemption caused by RRs for specific nodes or racks

2018-02-05 Thread Steven Rand (JIRA)
[ https://issues.apache.org/jira/browse/YARN-7655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Rand updated YARN-7655: -- Attachment: YARN-7655-003.patch > avoid AM preemption caused by RRs for specific nodes or racks >

[jira] [Commented] (YARN-7655) avoid AM preemption caused by RRs for specific nodes or racks

2018-02-05 Thread Steven Rand (JIRA)
[ https://issues.apache.org/jira/browse/YARN-7655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16353279#comment-16353279 ] Steven Rand commented on YARN-7655: --- The concern I have with all three RRs being the same size is that we

[jira] [Commented] (YARN-7903) Method getStarvedResourceRequests() only consider the first encountered resource

2018-02-07 Thread Steven Rand (JIRA)
[ https://issues.apache.org/jira/browse/YARN-7903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16356327#comment-16356327 ] Steven Rand commented on YARN-7903: --- Agreed that it seems weird/wrong to ignore locality when considering

[jira] [Updated] (YARN-7655) avoid AM preemption caused by RRs for specific nodes or racks

2018-02-07 Thread Steven Rand (JIRA)
[ https://issues.apache.org/jira/browse/YARN-7655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Rand updated YARN-7655: -- Attachment: YARN-7655-004.patch > avoid AM preemption caused by RRs for specific nodes or racks >

[jira] [Commented] (YARN-7655) avoid AM preemption caused by RRs for specific nodes or racks

2018-02-07 Thread Steven Rand (JIRA)
[ https://issues.apache.org/jira/browse/YARN-7655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16356308#comment-16356308 ] Steven Rand commented on YARN-7655: --- Sounds good, I revised the patch to mention YARN-7903 in a comment

[jira] [Commented] (YARN-7903) Method getStarvedResourceRequests() only consider the first encountered resource

2018-02-08 Thread Steven Rand (JIRA)
[ https://issues.apache.org/jira/browse/YARN-7903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357923#comment-16357923 ] Steven Rand commented on YARN-7903: --- Agreed that having a concept of delay scheduling for preemption is a

[jira] [Created] (YARN-7910) Fix TODO in TestFairSchedulerPreemption#testRelaxLocalityToNotPreemptAM

2018-02-08 Thread Steven Rand (JIRA)
Steven Rand created YARN-7910: - Summary: Fix TODO in TestFairSchedulerPreemption#testRelaxLocalityToNotPreemptAM Key: YARN-7910 URL: https://issues.apache.org/jira/browse/YARN-7910 Project: Hadoop YARN

[jira] [Commented] (YARN-7655) Avoid AM preemption caused by RRs for specific nodes or racks

2018-02-08 Thread Steven Rand (JIRA)
[ https://issues.apache.org/jira/browse/YARN-7655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357913#comment-16357913 ] Steven Rand commented on YARN-7655: --- Thanks [~yufeigu]. I filed YARN-7910 for the {{TODO}} in the unit

[jira] [Commented] (YARN-7655) avoid AM preemption caused by RRs for specific nodes or racks

2018-02-04 Thread Steven Rand (JIRA)
[ https://issues.apache.org/jira/browse/YARN-7655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16351704#comment-16351704 ] Steven Rand commented on YARN-7655: --- Thanks [~yufeigu], new patch is attached. Unfortunately I'm still

[jira] [Updated] (YARN-7655) avoid AM preemption caused by RRs for specific nodes or racks

2018-02-04 Thread Steven Rand (JIRA)
[ https://issues.apache.org/jira/browse/YARN-7655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Rand updated YARN-7655: -- Attachment: YARN-7655-002.patch > avoid AM preemption caused by RRs for specific nodes or racks >

[jira] [Created] (YARN-7911) Method identifyContainersToPreempt uses ResourceRequest#getRelaxLocality incorrectly

2018-02-08 Thread Steven Rand (JIRA)
Steven Rand created YARN-7911: - Summary: Method identifyContainersToPreempt uses ResourceRequest#getRelaxLocality incorrectly Key: YARN-7911 URL: https://issues.apache.org/jira/browse/YARN-7911 Project:

[jira] [Commented] (YARN-7655) avoid AM preemption caused by RRs for specific nodes or racks

2018-01-03 Thread Steven Rand (JIRA)
[ https://issues.apache.org/jira/browse/YARN-7655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16310553#comment-16310553 ] Steven Rand commented on YARN-7655: --- Tagging [~yufeigu] and [~templedf] for thoughts. I can work through

[jira] [Created] (YARN-8903) when NM becomes unhealthy due to local disk usage, have option to kill application using most space instead of releasing all containers on node

2018-10-17 Thread Steven Rand (JIRA)
Steven Rand created YARN-8903: - Summary: when NM becomes unhealthy due to local disk usage, have option to kill application using most space instead of releasing all containers on node Key: YARN-8903 URL:

[jira] [Commented] (YARN-9041) Optimize FSPreemptionThread#identifyContainersToPreempt method

2018-11-20 Thread Steven Rand (JIRA)
[ https://issues.apache.org/jira/browse/YARN-9041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16694329#comment-16694329 ] Steven Rand commented on YARN-9041: --- I'm not sure that this is correct. I think that it can lead to

[jira] [Commented] (YARN-9066) Deprecate Fair Scheduler min share

2018-11-27 Thread Steven Rand (JIRA)
[ https://issues.apache.org/jira/browse/YARN-9066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16701348#comment-16701348 ] Steven Rand commented on YARN-9066: --- +1 -- I agree with the attached doc that since a schedulable's fair

[jira] [Commented] (YARN-9041) Optimize FSPreemptionThread#identifyContainersToPreempt method

2018-11-26 Thread Steven Rand (JIRA)
[ https://issues.apache.org/jira/browse/YARN-9041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16699843#comment-16699843 ] Steven Rand commented on YARN-9041: --- Yes, the v2 patch resolves my concern -- thanks [~jiwq] for fixing

[jira] [Commented] (YARN-9041) Optimize FSPreemptionThread#identifyContainersToPreempt method

2018-11-28 Thread Steven Rand (JIRA)
[ https://issues.apache.org/jira/browse/YARN-9041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16702538#comment-16702538 ] Steven Rand commented on YARN-9041: --- bq. If we not allowed relax locality, it will executes three

[jira] [Commented] (YARN-9277) Add more restrictions In FairScheduler Preemption

2019-02-12 Thread Steven Rand (JIRA)
[ https://issues.apache.org/jira/browse/YARN-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16766783#comment-16766783 ] Steven Rand commented on YARN-9277: --- {code} +// We should not preempt container which has been

[jira] [Comment Edited] (YARN-4946) RM should not consider an application as COMPLETED when log aggregation is not in a terminal state

2019-08-02 Thread Steven Rand (JIRA)
[ https://issues.apache.org/jira/browse/YARN-4946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16898599#comment-16898599 ] Steven Rand edited comment on YARN-4946 at 8/2/19 6:17 AM: --- I noticed after

[jira] [Commented] (YARN-4946) RM should not consider an application as COMPLETED when log aggregation is not in a terminal state

2019-08-02 Thread Steven Rand (JIRA)
[ https://issues.apache.org/jira/browse/YARN-4946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16898599#comment-16898599 ] Steven Rand commented on YARN-4946: --- I noticed after upgrading a cluster to 3.2.0 that RM recovery now

[jira] [Commented] (YARN-4946) RM should not consider an application as COMPLETED when log aggregation is not in a terminal state

2019-08-06 Thread Steven Rand (JIRA)
[ https://issues.apache.org/jira/browse/YARN-4946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16900669#comment-16900669 ] Steven Rand commented on YARN-4946: --- I reverted this patch in our fork, and now RM recovery time is back

[jira] [Created] (YARN-9850) document or revert change in which DefaultContainerExecutor no longer propagates NM env to containers

2019-09-21 Thread Steven Rand (Jira)
Steven Rand created YARN-9850: - Summary: document or revert change in which DefaultContainerExecutor no longer propagates NM env to containers Key: YARN-9850 URL: https://issues.apache.org/jira/browse/YARN-9850

[jira] [Commented] (YARN-9552) FairScheduler: NODE_UPDATE can cause NoSuchElementException

2019-09-19 Thread Steven Rand (Jira)
[ https://issues.apache.org/jira/browse/YARN-9552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16934050#comment-16934050 ] Steven Rand commented on YARN-9552: --- This seems like an important fix since it prevents the RM from

[jira] [Created] (YARN-9848) revert YARN-4946

2019-09-19 Thread Steven Rand (Jira)
Steven Rand created YARN-9848: - Summary: revert YARN-4946 Key: YARN-9848 URL: https://issues.apache.org/jira/browse/YARN-9848 Project: Hadoop YARN Issue Type: Bug Components:

[jira] [Commented] (YARN-4946) RM should not consider an application as COMPLETED when log aggregation is not in a terminal state

2019-09-20 Thread Steven Rand (Jira)
[ https://issues.apache.org/jira/browse/YARN-4946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16934091#comment-16934091 ] Steven Rand commented on YARN-4946: --- I created YARN-9848 for reverting. > RM should not consider an

[jira] [Updated] (YARN-9848) revert YARN-4946

2019-09-20 Thread Steven Rand (Jira)
[ https://issues.apache.org/jira/browse/YARN-9848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Rand updated YARN-9848: -- Attachment: YARN-9848-01.patch > revert YARN-4946 > > > Key: YARN-9848

[jira] [Commented] (YARN-9848) revert YARN-4946

2019-09-20 Thread Steven Rand (Jira)
[ https://issues.apache.org/jira/browse/YARN-9848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16934095#comment-16934095 ] Steven Rand commented on YARN-9848: --- Attached a patch which reverts YARN-4946 on trunk. The revert

[jira] [Commented] (YARN-9552) FairScheduler: NODE_UPDATE can cause NoSuchElementException

2019-09-20 Thread Steven Rand (Jira)
[ https://issues.apache.org/jira/browse/YARN-9552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16934105#comment-16934105 ] Steven Rand commented on YARN-9552: --- Thanks! > FairScheduler: NODE_UPDATE can cause

[jira] [Commented] (YARN-8990) Fix fair scheduler race condition in app submit and queue cleanup

2019-11-04 Thread Steven Rand (Jira)
[ https://issues.apache.org/jira/browse/YARN-8990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967099#comment-16967099 ] Steven Rand commented on YARN-8990: --- Hi all, Unfortunately, this patch never made its way into the

[jira] [Comment Edited] (YARN-8990) Fix fair scheduler race condition in app submit and queue cleanup

2019-11-04 Thread Steven Rand (Jira)
[ https://issues.apache.org/jira/browse/YARN-8990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967099#comment-16967099 ] Steven Rand edited comment on YARN-8990 at 11/4/19 11:43 PM: - Hi all,

[jira] [Commented] (YARN-8470) Fair scheduler exception with SLS

2019-10-08 Thread Steven Rand (Jira)
[ https://issues.apache.org/jira/browse/YARN-8470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16947163#comment-16947163 ] Steven Rand commented on YARN-8470: --- Hi [~snemeth], [~szegedim], Friendly ping on this ticket. We've

[jira] [Commented] (YARN-8990) Fix fair scheduler race condition in app submit and queue cleanup

2020-01-27 Thread Steven Rand (Jira)
[ https://issues.apache.org/jira/browse/YARN-8990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17024799#comment-17024799 ] Steven Rand commented on YARN-8990: --- How would people feel about cherrypicking this and YARN-8992 to

[jira] [Commented] (YARN-4946) RM should not consider an application as COMPLETED when log aggregation is not in a terminal state

2020-01-08 Thread Steven Rand (Jira)
[ https://issues.apache.org/jira/browse/YARN-4946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17011446#comment-17011446 ] Steven Rand commented on YARN-4946: --- Any update on what we want to do here? It seems like we're starting

[jira] [Commented] (YARN-9848) revert YARN-4946

2020-04-20 Thread Steven Rand (Jira)
[ https://issues.apache.org/jira/browse/YARN-9848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17087990#comment-17087990 ] Steven Rand commented on YARN-9848: --- Thanks all. I also have a patch for branch-3.2 so that we can

[jira] [Updated] (YARN-10244) backport YARN-9848 to branch-3.2

2020-04-26 Thread Steven Rand (Jira)
[ https://issues.apache.org/jira/browse/YARN-10244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Rand updated YARN-10244: --- Attachment: YARN-10244-branch-3.2.001.patch > backport YARN-9848 to branch-3.2 >

[jira] [Commented] (YARN-9848) revert YARN-4946

2020-04-26 Thread Steven Rand (Jira)
[ https://issues.apache.org/jira/browse/YARN-9848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17092756#comment-17092756 ] Steven Rand commented on YARN-9848: --- I created YARN-10244 for branch-3.2. For resolving this issue, I'm
