[jira] [Commented] (YARN-5382) RM does not audit log kill request for active applications

2016-08-09 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-5382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15413725#comment-15413725 ] Jason Lowe commented on YARN-5382: -- +1 for the latest patch. Committing this. > RM does not audit log

[jira] [Commented] (YARN-5479) FairScheduler: Scheduling performance improvement

2016-08-09 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-5479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15413653#comment-15413653 ] Jason Lowe commented on YARN-5479: -- bq. While doing so does not seemly cause any problem in production

[jira] [Updated] (YARN-5483) Optimize RMAppAttempt#pullJustFinishedContainers

2016-08-09 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-5483: - Affects Version/s: 2.6.0 Target Version/s: 2.6.5, 2.7.4 bq. I don't think YARN-5262 breaks this jira.

[jira] [Commented] (YARN-5382) RM does not audit log kill request for active applications

2016-08-08 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-5382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15412593#comment-15412593 ] Jason Lowe commented on YARN-5382: -- The auditLogKillEvent method is now being called from the

[jira] [Commented] (YARN-5479) FairScheduler: Scheduling performance improvement

2016-08-08 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-5479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15412177#comment-15412177 ] Jason Lowe commented on YARN-5479: -- Agree the proposals are interesting. I'd love to get the overhead of

[jira] [Updated] (YARN-5482) ContainerMetric Lead to memory leaks

2016-08-08 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-5482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-5482: - Assignee: tangshangwen (was: Jason Lowe) This is really more of a duplicate of YARN-5341 since this was

[jira] [Assigned] (YARN-5482) ContainerMetric Lead to memory leaks

2016-08-08 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-5482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe reassigned YARN-5482: Assignee: Jason Lowe (was: tangshangwen) > ContainerMetric Lead to memory leaks >

[jira] [Commented] (YARN-5483) Optimize RMAppAttempt#pullJustFinishedContainers

2016-08-08 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15411939#comment-15411939 ] Jason Lowe commented on YARN-5483: -- Thanks for the report and patch, [~sandflee]! +1, patch looks good to

[jira] [Updated] (YARN-4573) TestRMAppTransitions.testAppRunningKill and testAppKilledKilled fail on trunk

2016-08-05 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-4573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-4573: - Fix Version/s: (was: 2.9.0) 2.7.4 2.6.5 2.8.0

[jira] [Resolved] (YARN-5469) Increase timeout of TestAmFilter.testFilter

2016-08-03 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-5469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe resolved YARN-5469. -- Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 2.7.4 2.8.0

[jira] [Commented] (YARN-5469) Increase timeout of TestAmFilter.testFilter

2016-08-03 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-5469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15406512#comment-15406512 ] Jason Lowe commented on YARN-5469: -- +1 lgtm. One second test timeouts are clearly too low. A JVM or I/O

[jira] [Updated] (YARN-4717) TestResourceLocalizationService.testPublicResourceInitializesLocalDir fails Intermittently due to IllegalArgumentException from cleanup

2016-08-03 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-4717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-4717: - Fix Version/s: (was: 2.9.0) 2.7.4 2.8.0 Thanks, [~templedf]! I

[jira] [Commented] (YARN-5462) TestNodeStatusUpdater.testNodeStatusUpdaterRetryAndNMShutdown fails intermittently

2016-08-03 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-5462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15406441#comment-15406441 ] Jason Lowe commented on YARN-5462: -- +1 lgtm. Committing this. >

[jira] [Commented] (YARN-4280) CapacityScheduler reservations may not prevent indefinite postponement on a busy cluster

2016-08-03 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-4280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15406411#comment-15406411 ] Jason Lowe commented on YARN-4280: -- +1 for the branch-2.8 patch as well. Committing this. >

[jira] [Commented] (YARN-5451) Container localizers that hang are not cleaned up

2016-08-03 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-5451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15406362#comment-15406362 ] Jason Lowe commented on YARN-5451: -- Note that we can get localizers to stop today (e.g.: when the

[jira] [Commented] (YARN-5451) Container localizers that hang are not cleaned up

2016-08-03 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-5451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15406001#comment-15406001 ] Jason Lowe commented on YARN-5451: -- No, rather it's more an issue that there's no concept of heartbeat

[jira] [Commented] (YARN-5382) RM does not audit log kill request for active applications

2016-08-02 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-5382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15404694#comment-15404694 ] Jason Lowe commented on YARN-5382: -- If we keep the kill success logging in both a transition and in

[jira] [Commented] (YARN-5382) RM does not audit log kill request for active applications

2016-08-02 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-5382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15404090#comment-15404090 ] Jason Lowe commented on YARN-5382: -- bq. Does user expect audit logging both before killing and after

[jira] [Commented] (YARN-5382) RM does not audit log kill request for active applications

2016-08-01 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-5382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15402675#comment-15402675 ] Jason Lowe commented on YARN-5382: -- Thanks for updating the patch! Nit: RMAppKillByClientLogEvent should

[jira] [Created] (YARN-5451) Container localizers that hang are not cleaned up

2016-07-29 Thread Jason Lowe (JIRA)
Jason Lowe created YARN-5451: Summary: Container localizers that hang are not cleaned up Key: YARN-5451 URL: https://issues.apache.org/jira/browse/YARN-5451 Project: Hadoop YARN Issue Type: Bug

[jira] [Commented] (YARN-4280) CapacityScheduler reservations may not prevent indefinite postponement on a busy cluster

2016-07-29 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-4280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15399958#comment-15399958 ] Jason Lowe commented on YARN-4280: -- +1 for the latest patch. I'll commit this sometime next week unless

[jira] [Commented] (YARN-4280) CapacityScheduler reservations may not prevent indefinite postponement on a busy cluster

2016-07-28 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-4280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15398328#comment-15398328 ] Jason Lowe commented on YARN-4280: -- Thanks for updating the patch! The parent queue code is now

[jira] [Commented] (YARN-5416) TestRMRestart#testRMRestartWaitForPreviousAMToFinish failed intermittently due to not wait SchedulerApplicationAttempt to be stopped

2016-07-27 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15396249#comment-15396249 ] Jason Lowe commented on YARN-5416: -- bq. I think we can close this as dup of that. What do you think? I

[jira] [Commented] (YARN-4280) CapacityScheduler reservations may not prevent indefinite postponement on a busy cluster

2016-07-27 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-4280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15396200#comment-15396200 ] Jason Lowe commented on YARN-4280: -- Thanks for updating the patch, Kuhu! The copy constructor for

[jira] [Commented] (YARN-5438) TimelineClientImpl leaking FileSystem Instances causing Long running services like HiverServer2 daemon going OOM

2016-07-27 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15396195#comment-15396195 ] Jason Lowe commented on YARN-5438: -- Ah, thanks Rohith. My bad, I missed that it was creating the

[jira] [Commented] (YARN-5438) TimelineClientImpl leaking FileSystem Instances causing Long running services like HiverServer2 daemon going OOM

2016-07-27 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15396044#comment-15396044 ] Jason Lowe commented on YARN-5438: -- Thanks for the patch, Rohith! This probably works for the HiveServer2

[jira] [Commented] (YARN-5382) RM does not audit log kill request for active applications

2016-07-26 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-5382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15394699#comment-15394699 ] Jason Lowe commented on YARN-5382: -- Note that if the trunk patch applies as-is to branch-2 (as I suspect

[jira] [Commented] (YARN-5382) RM does not audit log kill request for active applications

2016-07-26 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-5382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15394678#comment-15394678 ] Jason Lowe commented on YARN-5382: -- Thanks for the update, Vrushali! I should have said this earlier:

[jira] [Commented] (YARN-5423) Yarn applications are failing when job submitter use is re-created in OS.

2016-07-22 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-5423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15389613#comment-15389613 ] Jason Lowe commented on YARN-5423: -- This sounds like a bug in the user re-creation procedure rather than

[jira] [Commented] (YARN-5092) TestRMDelegationTokens fails intermittently

2016-07-22 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-5092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15389615#comment-15389615 ] Jason Lowe commented on YARN-5092: -- All unrelated. The findbugs issue is extant, as already reported.

[jira] [Updated] (YARN-3707) RM Web UI queue filter doesn't work

2016-07-21 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-3707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-3707: - Fix Version/s: 2.7.4 Thanks, Wangda! I committed this to branch-2.7 as well. > RM Web UI queue filter

[jira] [Resolved] (YARN-5417) Clicking queue on CapacityScheduler web page computes wrong app search filter

2016-07-21 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-5417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe resolved YARN-5417. -- Resolution: Duplicate > Clicking queue on CapacityScheduler web page computes wrong app search filter >

[jira] [Commented] (YARN-5417) Clicking queue on CapacityScheduler web page computes wrong app search filter

2016-07-21 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-5417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15388145#comment-15388145 ] Jason Lowe commented on YARN-5417: -- Duplicate of YARN-3707. > Clicking queue on CapacityScheduler web

[jira] [Created] (YARN-5417) Clicking queue on CapacityScheduler web page computes wrong app search filter

2016-07-21 Thread Jason Lowe (JIRA)
Jason Lowe created YARN-5417: Summary: Clicking queue on CapacityScheduler web page computes wrong app search filter Key: YARN-5417 URL: https://issues.apache.org/jira/browse/YARN-5417 Project: Hadoop

[jira] [Commented] (YARN-5416) TestRMRestart#testRMRestartWaitForPreviousAMToFinish failed intermittently due to not wait SchedulerApplicationAttempt to be stopped

2016-07-21 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15388082#comment-15388082 ] Jason Lowe commented on YARN-5416: -- This looks like an exact dup of YARN-1468 which you also filed. Are

[jira] [Updated] (YARN-5092) TestRMDelegationTokens fails intermittently

2016-07-21 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-5092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-5092: - Attachment: YARN-5092-branch-2.7.003.patch Thanks for the review and commit, Rohith! Attaching a patch

[jira] [Updated] (YARN-5057) resourcemanager.security.TestDelegationTokenRenewer fails in trunk

2016-07-20 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-5057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-5057: - Attachment: YARN-5057.001.patch There's a race in the test. MockRM.finishAMAndVerifyAppState only waits

[jira] [Commented] (YARN-5092) TestRMDelegationTokens fails intermittently

2016-07-20 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-5092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15385929#comment-15385929 ] Jason Lowe commented on YARN-5092: -- The TestDelegationTokenRenewer failure from the earlier precommit run

[jira] [Updated] (YARN-5092) TestRMDelegationTokens fails intermittently

2016-07-20 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-5092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-5092: - Attachment: YARN-5092.003.patch Fixing the checkstyle reported issue for the unused import. >

[jira] [Updated] (YARN-5092) TestRMDelegationTokens fails intermittently

2016-07-19 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-5092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-5092: - Attachment: YARN-5092.002.patch Thanks for the review, Rohith! Good catch on the rm1.stop() suggestion.

[jira] [Commented] (YARN-5382) RM does not audit log kill request for active applications

2016-07-19 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-5382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15384722#comment-15384722 ] Jason Lowe commented on YARN-5382: -- bq. Will update the patch to include auditing of killing of active

[jira] [Commented] (YARN-5401) yarn application kill does not let mapreduce jobs show up in jobhistory

2016-07-19 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-5401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15384529#comment-15384529 ] Jason Lowe commented on YARN-5401: -- Yes, if an application framework provides a kill command then that

[jira] [Commented] (YARN-5382) RM does not audit log kill request for active applications

2016-07-19 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-5382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15384373#comment-15384373 ] Jason Lowe commented on YARN-5382: -- I like the general idea, but I'm not sure a literal move of the audit

[jira] [Commented] (YARN-5401) yarn application kill does not let mapreduce jobs show up in jobhistory

2016-07-19 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-5401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15384330#comment-15384330 ] Jason Lowe commented on YARN-5401: -- This is effectively a duplicate of YARN-2261. MapReduce history

[jira] [Commented] (YARN-5382) RM does not audit log kill request for active applications

2016-07-18 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-5382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15382769#comment-15382769 ] Jason Lowe commented on YARN-5382: -- Ah, I see. The client is continuing to issue the kill request until

[jira] [Commented] (YARN-5382) RM does not audit log kill request for active applications

2016-07-15 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-5382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15379561#comment-15379561 ] Jason Lowe commented on YARN-5382: -- Thanks for the patch, [~vrushalic]! I don't think we should do a

[jira] [Created] (YARN-5382) RM does not audit log kill request for active applications

2016-07-14 Thread Jason Lowe (JIRA)
Jason Lowe created YARN-5382: Summary: RM does not audit log kill request for active applications Key: YARN-5382 URL: https://issues.apache.org/jira/browse/YARN-5382 Project: Hadoop YARN Issue

[jira] [Commented] (YARN-5370) Setting yarn.nodemanager.delete.debug-delay-sec to high number crashes NM because of OOM

2016-07-13 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-5370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375089#comment-15375089 ] Jason Lowe commented on YARN-5370: -- It's expected behavior in the sense that the debug delay setting

[jira] [Commented] (YARN-5317) testAMRestartNotLostContainerCompleteMsg may fail

2016-07-12 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373598#comment-15373598 ] Jason Lowe commented on YARN-5317: -- +1 lgtm. Filed YARN-5362 for the unrelated TestRMRestart failure.

[jira] [Created] (YARN-5362) TestRMRestart#testFinishedAppRemovalAfterRMRestart can fail

2016-07-12 Thread Jason Lowe (JIRA)
Jason Lowe created YARN-5362: Summary: TestRMRestart#testFinishedAppRemovalAfterRMRestart can fail Key: YARN-5362 URL: https://issues.apache.org/jira/browse/YARN-5362 Project: Hadoop YARN Issue

[jira] [Updated] (YARN-4393) TestResourceLocalizationService#testFailedDirsResourceRelease fails intermittently

2016-07-12 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-4393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-4393: - Fix Version/s: (was: 2.9.0) 2.7.4 2.6.5 2.8.0

[jira] [Commented] (YARN-5317) testAMRestartNotLostContainerCompleteMsg may fail

2016-07-12 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15372975#comment-15372975 ] Jason Lowe commented on YARN-5317: -- Thanks for the patch, [~sandflee]! Also I noticed the same code

[jira] [Commented] (YARN-5353) ResourceManager can leak delegation tokens when they are shared across apps

2016-07-12 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-5353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15372953#comment-15372953 ] Jason Lowe commented on YARN-5353: -- Test failures are unrelated and pass for me locally with the patch

[jira] [Updated] (YARN-5353) ResourceManager can leak delegation tokens when they are shared across apps

2016-07-11 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-5353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-5353: - Attachment: YARN-5353.001.patch Seems to me that we need to make sure that the appTokens map always has

[jira] [Created] (YARN-5353) ResourceManager can leak delegation tokens when they are shared across apps

2016-07-11 Thread Jason Lowe (JIRA)
Jason Lowe created YARN-5353: Summary: ResourceManager can leak delegation tokens when they are shared across apps Key: YARN-5353 URL: https://issues.apache.org/jira/browse/YARN-5353 Project: Hadoop YARN

[jira] [Updated] (YARN-5341) DefaultMetricsSystem leaks the source name when a source unregisters

2016-07-08 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-5341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-5341: - Summary: DefaultMetricsSystem leaks the source name when a source unregisters (was: Nodemanager leaks

[jira] [Commented] (YARN-5341) Nodemanager leaks ContainerResource names in UniqueNames for each container

2016-07-08 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-5341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15368472#comment-15368472 ] Jason Lowe commented on YARN-5341: -- Looks like this may be a bug in the metrics system rather than

[jira] [Commented] (YARN-5296) NMs going OutOfMemory because ContainerMetrics leak in ContainerMonitorImpl

2016-07-08 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-5296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15368453#comment-15368453 ] Jason Lowe commented on YARN-5296: -- I just ran across a ContainerMetrics leak that I thought could be the

[jira] [Created] (YARN-5341) Nodemanager leaks ContainerResource names in UniqueNames for each container

2016-07-08 Thread Jason Lowe (JIRA)
Jason Lowe created YARN-5341: Summary: Nodemanager leaks ContainerResource names in UniqueNames for each container Key: YARN-5341 URL: https://issues.apache.org/jira/browse/YARN-5341 Project: Hadoop YARN

[jira] [Commented] (YARN-5215) Scheduling containers based on external load in the servers

2016-07-06 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-5215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15365070#comment-15365070 ] Jason Lowe commented on YARN-5215: -- Maybe I'm missing something, but any of the proposed approaches has

[jira] [Commented] (YARN-5292) Support for PAUSED container state

2016-06-24 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-5292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15348359#comment-15348359 ] Jason Lowe commented on YARN-5292: -- I assume we need more than just a PAUSED state, correct? Seems like

[jira] [Resolved] (YARN-5290) ResourceManager can place more containers on a node than the node size allows

2016-06-23 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-5290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe resolved YARN-5290. -- Resolution: Duplicate > ResourceManager can place more containers on a node than the node size allows >

[jira] [Updated] (YARN-4148) When killing app, RM releases app's resource before they are released by NM

2016-06-23 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-4148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-4148: - Attachment: free_in_scheduler_but_not_node_prototype-branch-2.7.patch Sorry for joining the discussion

[jira] [Commented] (YARN-5290) ResourceManager can place more containers on a node than the node size allows

2016-06-23 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-5290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15346569#comment-15346569 ] Jason Lowe commented on YARN-5290: -- Thanks for the pointer, Jun! This is a duplicate of YARN-4148. I'll

[jira] [Commented] (YARN-4280) CapacityScheduler reservations may not prevent indefinite postponement on a busy cluster

2016-06-22 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-4280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15345266#comment-15345266 ] Jason Lowe commented on YARN-4280: -- bq. IIRC the headroom is a combination of the user limits and the

[jira] [Commented] (YARN-4280) CapacityScheduler reservations may not prevent indefinite postponement on a busy cluster

2016-06-22 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-4280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15345203#comment-15345203 ] Jason Lowe commented on YARN-4280: -- Thanks for updating the patch, Kuhu! I'm confused by the

[jira] [Commented] (YARN-4862) Handle duplicate completed containers in RMNodeImpl

2016-06-22 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15345069#comment-15345069 ] Jason Lowe commented on YARN-4862: -- bq. I also do not see this as a performance bottleneck, as we are

[jira] [Commented] (YARN-5290) ResourceManager can place more containers on a node than the node size allows

2016-06-22 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-5290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15345064#comment-15345064 ] Jason Lowe commented on YARN-5290: -- We could have the RM wait until it receives hard confirmation from the

[jira] [Created] (YARN-5290) ResourceManager can place more containers on a node than the node size allows

2016-06-22 Thread Jason Lowe (JIRA)
Jason Lowe created YARN-5290: Summary: ResourceManager can place more containers on a node than the node size allows Key: YARN-5290 URL: https://issues.apache.org/jira/browse/YARN-5290 Project: Hadoop

[jira] [Commented] (YARN-4280) CapacityScheduler reservations may not prevent indefinite postponement on a busy cluster

2016-06-21 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-4280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15342666#comment-15342666 ] Jason Lowe commented on YARN-4280: -- I haven't had a chance to look at the patch yet, but I'm not thrilled

[jira] [Updated] (YARN-4280) CapacityScheduler reservations may not prevent indefinite postponement on a busy cluster

2016-06-21 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-4280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-4280: - Assignee: Kuhu Shukla (was: Jason Lowe) > CapacityScheduler reservations may not prevent indefinite

[jira] [Assigned] (YARN-4280) CapacityScheduler reservations may not prevent indefinite postponement on a busy cluster

2016-06-21 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-4280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe reassigned YARN-4280: Assignee: Jason Lowe (was: Kuhu Shukla) > CapacityScheduler reservations may not prevent

[jira] [Commented] (YARN-4862) Handle duplicate completed containers in RMNodeImpl

2016-06-21 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15342367#comment-15342367 ] Jason Lowe commented on YARN-4862: -- What I was thinking is a similar idea to YARN-5197. We can track the

[jira] [Commented] (YARN-4862) Handle duplicate completed containers in RMNodeImpl

2016-06-21 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15342008#comment-15342008 ] Jason Lowe commented on YARN-4862: -- Agree that the RM needs to inform the NM to stop tracking a container

[jira] [Commented] (YARN-5197) RM leaks containers if running container disappears from node update

2016-06-21 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-5197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15341894#comment-15341894 ] Jason Lowe commented on YARN-5197: -- bq. is this possible that container info disappear from node update?

[jira] [Updated] (YARN-5197) RM leaks containers if running container disappears from node update

2016-06-20 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-5197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-5197: - Attachment: YARN-5197-branch-2.7.003.patch YARN-5197-branch-2.8.003.patch Thanks for the

[jira] [Commented] (YARN-5261) Lease/Reclaim Extension to Yarn

2016-06-16 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-5261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15334930#comment-15334930 ] Jason Lowe commented on YARN-5261: -- This is very similar work to YARN-5215 and YARN-1011 / YARN-5202. It

[jira] [Commented] (YARN-5092) TestRMDelegationTokens fails intermittently

2016-06-10 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-5092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15324962#comment-15324962 ] Jason Lowe commented on YARN-5092: -- The whitespace complaint is not related. It's complaining about some

[jira] [Updated] (YARN-5092) TestRMDelegationTokens fails intermittently

2016-06-10 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-5092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-5092: - Attachment: YARN-5092.001.patch There are two unrelated problems here. The class cast exception is caused

[jira] [Updated] (YARN-5197) RM leaks containers if running container disappears from node update

2016-06-10 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-5197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-5197: - Attachment: YARN-5197.003.patch Thanks for the review, Rohith! I updated the patch to add the GUARANTEED

[jira] [Commented] (YARN-314) Schedulers should allow resource requests of different sizes at the same priority and location

2016-06-09 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15323354#comment-15323354 ] Jason Lowe commented on YARN-314: - bq. Do we have applications that need this capability? Tez can sometimes

[jira] [Commented] (YARN-5202) Dynamic Overcommit of Node Resources - POC

2016-06-09 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-5202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15322717#comment-15322717 ] Jason Lowe commented on YARN-5202: -- We're fine with some of the work being moved into that effort. We're

[jira] [Commented] (YARN-5215) Scheduling containers based on external load in the servers

2016-06-09 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-5215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15322685#comment-15322685 ] Jason Lowe commented on YARN-5215: -- bq. All in all, I see a strong connection with over-commit, but this

[jira] [Commented] (YARN-5215) Scheduling containers based on external load in the servers

2016-06-08 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-5215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15321648#comment-15321648 ] Jason Lowe commented on YARN-5215: -- Ah, so the headline was a bit misleading. Most people saw that and

[jira] [Commented] (YARN-5215) Scheduling containers based on load in the servers

2016-06-08 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-5215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15321318#comment-15321318 ] Jason Lowe commented on YARN-5215: -- Besides YARN-1011 this is also very similar to the dynamic overcommit

[jira] [Updated] (YARN-4288) NodeManager restart should keep retrying to register to RM while connection exception happens during RM failed over.

2016-06-08 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-4288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-4288: - Fix Version/s: 2.7.3 Thanks, Junping! We've seen AMRMClientImpl die with connection reset by peer

[jira] [Updated] (YARN-5206) RegistrySecurity includes id:pass in exception text if considered invalid

2016-06-08 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-5206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-5206: - Affects Version/s: 2.6.4 Hadoop Flags: Reviewed > RegistrySecurity includes id:pass in exception

[jira] [Commented] (YARN-5206) RegistrySecurity includes id:pass in exception text if considered invalid

2016-06-07 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-5206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15318507#comment-15318507 ] Jason Lowe commented on YARN-5206: -- +1 lgtm. Will commit this later today if no objections. >

[jira] [Updated] (YARN-5197) RM leaks containers if running container disappears from node update

2016-06-06 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-5197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-5197: - Attachment: YARN-5197.002.patch Updated the patch for the checkstyle issue. The test failures are tracked

[jira] [Updated] (YARN-5197) RM leaks containers if running container disappears from node update

2016-06-02 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-5197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-5197: - Attachment: YARN-5197.001.patch RMNodeImpl checks the list of running containers on the node against

[jira] [Created] (YARN-5197) RM leaks containers if running container disappears from node update

2016-06-02 Thread Jason Lowe (JIRA)
Jason Lowe created YARN-5197: Summary: RM leaks containers if running container disappears from node update Key: YARN-5197 URL: https://issues.apache.org/jira/browse/YARN-5197 Project: Hadoop YARN

[jira] [Commented] (YARN-5193) For long running services, aggregate logs when a container completes instead of when the app completes

2016-06-02 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-5193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15313075#comment-15313075 ] Jason Lowe commented on YARN-5193: -- I don't think long-running necessarily means low container churn,

[jira] [Commented] (YARN-4953) Delete completed container log folder when rolling log aggregation is enabled

2016-06-02 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-4953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15312317#comment-15312317 ] Jason Lowe commented on YARN-4953: -- Sorry for missing this earlier. As I mentioned on YARN-5193, log

[jira] [Commented] (YARN-5193) For long running services, aggregate logs when a container completes instead of when the app completes

2016-06-02 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-5193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15312315#comment-15312315 ] Jason Lowe commented on YARN-5193: -- Main thing to watch out for here is additional load to the namenode.

[jira] [Created] (YARN-5154) DelayedProcessKiller can kill the wrong process if pid is recycled

2016-05-25 Thread Jason Lowe (JIRA)
Jason Lowe created YARN-5154: Summary: DelayedProcessKiller can kill the wrong process if pid is recycled Key: YARN-5154 URL: https://issues.apache.org/jira/browse/YARN-5154 Project: Hadoop YARN

[jira] [Updated] (YARN-4459) container-executor should only kill process groups

2016-05-25 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-4459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-4459: - Hadoop Flags: Reviewed Summary: container-executor should only kill process groups (was:

[jira] [Updated] (YARN-4459) container-executor might kill process wrongly

2016-05-23 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-4459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-4459: - Attachment: YARN-4459.03.patch Updated the patch to fix the unit test failure. Now that we're only

[jira] [Commented] (YARN-4459) container-executor might kill process wrongly

2016-05-23 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-4459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15296729#comment-15296729 ] Jason Lowe commented on YARN-4459: -- Sorry to arrive to this late. I agree that we should be killing the

[jira] [Commented] (YARN-5120) Metric for RM async dispatcher queue size

2016-05-23 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-5120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15296700#comment-15296700 ] Jason Lowe commented on YARN-5120: -- Note there are multiple important dispatchers in the ResourceManager

[jira] [Updated] (YARN-5055) max apps per user can be larger than max per queue

2016-05-23 Thread Jason Lowe (JIRA)
[ https://issues.apache.org/jira/browse/YARN-5055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-5055: - Hadoop Flags: Reviewed Summary: max apps per user can be larger than max per queue (was: max per
