[GitHub] [spark] holdenk commented on pull request #28370: [SPARK-20732][CORE] Decommission cache blocks to other executors when an executor is decommissioned

2020-05-19 Thread GitBox
holdenk commented on pull request #28370: URL: https://github.com/apache/spark/pull/28370#issuecomment-630938158 Sounds good, happy to help coordinate with any reviews needed. Would like us to be able to start using this in 3.1 :)

[GitHub] [spark] holdenk commented on pull request #28370: [SPARK-20732][CORE] Decommission cache blocks to other executors when an executor is decommissioned

2020-05-18 Thread GitBox
holdenk commented on pull request #28370: URL: https://github.com/apache/spark/pull/28370#issuecomment-630365923 Merged to master (target 3.1). Let me know if you're interested in doing any of the follow-ups, @ngone51 / @prakharjain09; otherwise I'll get that started after the shuffle

[GitHub] [spark] holdenk commented on pull request #28370: [SPARK-20732][CORE] Decommission cache blocks to other executors when an executor is decommissioned

2020-05-18 Thread GitBox
holdenk commented on pull request #28370: URL: https://github.com/apache/spark/pull/28370#issuecomment-630363255 LGTM. Thanks for working on this, @prakharjain09. I know how frustrating debugging test-only issues that only show up in Jenkins can be. Thanks for taking the time to

[GitHub] [spark] holdenk commented on pull request #28370: [SPARK-20732][CORE] Decommission cache blocks to other executors when an executor is decommissioned

2020-05-15 Thread GitBox
holdenk commented on pull request #28370: URL: https://github.com/apache/spark/pull/28370#issuecomment-629389491 The hive test failure is probably a flaky test. Jenkins retest this please This is an automated message from

[GitHub] [spark] holdenk commented on pull request #28370: [SPARK-20732][CORE] Decommission cache blocks to other executors when an executor is decommissioned

2020-05-14 Thread GitBox
holdenk commented on pull request #28370: URL: https://github.com/apache/spark/pull/28370#issuecomment-628933224 Seeing weird network issues on Jenkins. Jenkins retest this please.

[GitHub] [spark] holdenk commented on pull request #28370: [SPARK-20732][CORE] Decommission cache blocks to other executors when an executor is decommissioned

2020-05-14 Thread GitBox
holdenk commented on pull request #28370: URL: https://github.com/apache/spark/pull/28370#issuecomment-628815289 From the dev@ list post it seems like the maven fetch issue is known and being worked on. You might want to cherry-pick the set -x I've got in though as well so we can make

[GitHub] [spark] holdenk commented on pull request #28370: [SPARK-20732][CORE] Decommission cache blocks to other executors when an executor is decommissioned

2020-05-14 Thread GitBox
holdenk commented on pull request #28370: URL: https://github.com/apache/spark/pull/28370#issuecomment-628792196 Could you add the code to avoid the infinite retry loop on error, and also check whether the thread is interrupted, in case something else swallows the thread interruption exception in the
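
The request above (bound the retries on error, and check the interrupt flag explicitly) can be illustrated with a minimal Java sketch. This is not the code from the pull request: `BoundedRetry`, `Step`, `runWithRetries`, `maxFailures`, and `sleepMillis` are hypothetical names chosen for illustration, and the loop only shows the general pattern under those assumptions.

```java
public class BoundedRetry {
    /** Hypothetical unit of work, e.g. one pass of moving cached blocks elsewhere. */
    @FunctionalInterface
    public interface Step {
        void run() throws Exception;
    }

    /**
     * Runs the step until it succeeds, the failure budget is used up,
     * or the current thread is interrupted. Returns true only on success.
     */
    public static boolean runWithRetries(Step step, int maxFailures, long sleepMillis) {
        int failures = 0;
        while (failures < maxFailures) {
            // Explicit flag check: covers the case where the interrupt flag is set
            // but no InterruptedException reached this loop (for example, the
            // interrupt arrived outside a blocking call, or inner code caught the
            // exception and only restored the flag).
            if (Thread.currentThread().isInterrupted()) {
                return false;
            }
            try {
                step.run();
                return true;
            } catch (InterruptedException e) {
                // Restore the flag so callers can still observe the interrupt.
                Thread.currentThread().interrupt();
                return false;
            } catch (Exception e) {
                // Count the failure so repeated errors cannot loop forever.
                failures++;
                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Fails twice, then succeeds on the third call.
        boolean ok = runWithRetries(() -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("transient failure");
            }
        }, 3, 10L);
        System.out.println("succeeded=" + ok + " after " + calls[0] + " calls");
    }
}
```

Note that the explicit check cannot help if inner code swallows the exception without restoring the flag; in that case the failure budget is what stops the loop.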

[GitHub] [spark] holdenk commented on pull request #28370: [SPARK-20732][CORE] Decommission cache blocks to other executors when an executor is decommissioned

2020-05-08 Thread GitBox
holdenk commented on pull request #28370: URL: https://github.com/apache/spark/pull/28370#issuecomment-625921906 Ok running just those three tests alone in SBT on my local host doesn't show any failures on this branch. I'm not seeing the memory error we had earlier that would have

[GitHub] [spark] holdenk commented on pull request #28370: [SPARK-20732][CORE] Decommission cache blocks to other executors when an executor is decommissioned

2020-05-08 Thread GitBox
holdenk commented on pull request #28370: URL: https://github.com/apache/spark/pull/28370#issuecomment-625665354 Jenkins retest this please. I feel like I’ve seen these particular tests fail before; let’s dig into what’s causing them to fail.

[GitHub] [spark] holdenk commented on pull request #28370: [SPARK-20732][CORE] Decommission cache blocks to other executors when an executor is decommissioned

2020-05-08 Thread GitBox
holdenk commented on pull request #28370: URL: https://github.com/apache/spark/pull/28370#issuecomment-625664892 Right so just send SIGPWR to the worker. Are you saying in standalone mode you have one worker with multiple executors and you want to decommission a specific executor?

[GitHub] [spark] holdenk commented on pull request #28370: [SPARK-20732][CORE] Decommission cache blocks to other executors when an executor is decommissioned

2020-05-07 Thread GitBox
holdenk commented on pull request #28370: URL: https://github.com/apache/spark/pull/28370#issuecomment-625514065 Jenkins retest this please.

[GitHub] [spark] holdenk commented on pull request #28370: [SPARK-20732][CORE] Decommission cache blocks to other executors when an executor is decommissioned

2020-05-07 Thread GitBox
holdenk commented on pull request #28370: URL: https://github.com/apache/spark/pull/28370#issuecomment-625457201 Jenkins, retest this please.

[GitHub] [spark] holdenk commented on pull request #28370: [SPARK-20732][CORE] Decommission cache blocks to other executors when an executor is decommissioned

2020-05-07 Thread GitBox
holdenk commented on pull request #28370: URL: https://github.com/apache/spark/pull/28370#issuecomment-625457036 So SIGPWR doesn't require immediate shutdown, and it's trappable.

[GitHub] [spark] holdenk commented on pull request #28370: [SPARK-20732][CORE] Decommission cache blocks to other executors when an executor is decommissioned

2020-05-05 Thread GitBox
holdenk commented on pull request #28370: URL: https://github.com/apache/spark/pull/28370#issuecomment-624346604 Jenkins retest this please. Jenkins add to whitelist. Also, @prakharjain09, you can always just push another commit to re-trigger the tests (you might be able to ask Jenkins

[GitHub] [spark] holdenk commented on pull request #28370: [SPARK-20732][CORE] Decommission cache blocks to other executors when an executor is decommissioned

2020-05-04 Thread GitBox
holdenk commented on pull request #28370: URL: https://github.com/apache/spark/pull/28370#issuecomment-623734294 Ok, so I think the CLI test is flaky (looking at it, it's awaiting a future and timing out, so that's not surprising). If you can re-enable your tests, @prakharjain09, I think the new

[GitHub] [spark] holdenk commented on pull request #28370: [SPARK-20732][CORE] Decommission cache blocks to other executors when an executor is decommissioned

2020-05-04 Thread GitBox
holdenk commented on pull request #28370: URL: https://github.com/apache/spark/pull/28370#issuecomment-623593170 I think those three are not related. Jenkins retest this please.

[GitHub] [spark] holdenk commented on pull request #28370: [SPARK-20732][CORE] Decommission cache blocks to other executors when an executor is decommissioned

2020-05-01 Thread GitBox
holdenk commented on pull request #28370: URL: https://github.com/apache/spark/pull/28370#issuecomment-622475169 Jenkins retest this please.

[GitHub] [spark] holdenk commented on pull request #28370: [SPARK-20732][CORE] Decommission cache blocks to other executors when an executor is decommissioned

2020-04-30 Thread GitBox
holdenk commented on pull request #28370: URL: https://github.com/apache/spark/pull/28370#issuecomment-622203530 So I tried it out locally, and if instead of disabling the entire stop (we want to keep the interrupt call), we call interrupt & inside of the catch block with interrupt set

[GitHub] [spark] holdenk commented on pull request #28370: [SPARK-20732][CORE] Decommission cache blocks to other executors when an executor is decommissioned

2020-04-30 Thread GitBox
holdenk commented on pull request #28370: URL: https://github.com/apache/spark/pull/28370#issuecomment-622197638 Oh yeah to be clear the line numbers inside of block manager are different because I was playing with some debugging, but the rest of it should be fairly direct.

[GitHub] [spark] holdenk commented on pull request #28370: [SPARK-20732][CORE] Decommission cache blocks to other executors when an executor is decommissioned

2020-04-30 Thread GitBox
holdenk commented on pull request #28370: URL: https://github.com/apache/spark/pull/28370#issuecomment-62219 Looks like the tests are passing but we're still seeing the executor hang. I did a jstack dump on a local run and I got: > 2020-04-30 17:44:40 > Full thread dump

[GitHub] [spark] holdenk commented on pull request #28370: [SPARK-20732][CORE] Decommission cache blocks to other executors when an executor is decommissioned

2020-04-30 Thread GitBox
holdenk commented on pull request #28370: URL: https://github.com/apache/spark/pull/28370#issuecomment-622111857 Ok, so it isn't (currently) timing out, but the OOM is a bit worrying. Jenkins retest this please.

[GitHub] [spark] holdenk commented on pull request #28370: [SPARK-20732][CORE] Decommission cache blocks to other executors when an executor is decommissioned

2020-04-30 Thread GitBox
holdenk commented on pull request #28370: URL: https://github.com/apache/spark/pull/28370#issuecomment-621945031 @ScrapCodes: In the future (and I've filed a JIRA for this), for non-voluntary scale downs we can try and prioritize blocks, but I think this is a solid first step :)

[GitHub] [spark] holdenk commented on pull request #28370: [SPARK-20732][CORE] Decommission cache blocks to other executors when an executor is decommissioned

2020-04-29 Thread GitBox
holdenk commented on pull request #28370: URL: https://github.com/apache/spark/pull/28370#issuecomment-621507928 @scrapcodes It depends on your cluster, but it could be anywhere from 1 second to several hours. Generally, though, I'd expect most situations to be in the minutes time frame.

[GitHub] [spark] holdenk commented on pull request #28370: [SPARK-20732][CORE] Decommission cache blocks to other executors when an executor is decommissioned

2020-04-27 Thread GitBox
holdenk commented on pull request #28370: URL: https://github.com/apache/spark/pull/28370#issuecomment-620077765 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and