[GitHub] [spark] wypoon commented on pull request #28848: [SPARK-32003][CORE] When external shuffle service is used, unregister outputs for executor on fetch failure after executor is lost

2020-07-21 Thread GitBox
wypoon commented on pull request #28848: URL: https://github.com/apache/spark/pull/28848#issuecomment-662246115 I created https://github.com/apache/spark/pull/29182 for the backport to branch-2.4.

[GitHub] [spark] wypoon commented on pull request #28848: [SPARK-32003][CORE] When external shuffle service is used, unregister outputs for executor on fetch failure after executor is lost

2020-07-21 Thread GitBox
wypoon commented on pull request #28848: URL: https://github.com/apache/spark/pull/28848#issuecomment-662003702 @tgravescs @squito I made some small tweaks to the scaladoc comments and also had to rebase since you approved the change. It has been open for some time now to any additional

[GitHub] [spark] wypoon commented on pull request #28848: [SPARK-32003][CORE] When external shuffle service is used, unregister outputs for executor on fetch failure after executor is lost

2020-07-15 Thread GitBox
wypoon commented on pull request #28848: URL: https://github.com/apache/spark/pull/28848#issuecomment-658933529 PySpark packaging tests fail with ``` Writing pyspark-3.1.0.dev0/setup.cfg creating dist Creating tar archive removing 'pyspark-3.1.0.dev0' (and everything under

[GitHub] [spark] wypoon commented on pull request #28848: [SPARK-32003][CORE] When external shuffle service is used, unregister outputs for executor on fetch failure after executor is lost

2020-07-07 Thread GitBox
wypoon commented on pull request #28848: URL: https://github.com/apache/spark/pull/28848#issuecomment-655000416 Thanks @dongjoon-hyun for the move to disable the doc generation in Jenkins.

[GitHub] [spark] wypoon commented on pull request #28848: [SPARK-32003][CORE] When external shuffle service is used, unregister outputs for executor on fetch failure after executor is lost

2020-06-29 Thread GitBox
wypoon commented on pull request #28848: URL: https://github.com/apache/spark/pull/28848#issuecomment-651247829 @tgravescs @squito are you satisfied with the logic of the change? I have renamed the epochs and improved the comments.

[GitHub] [spark] wypoon commented on pull request #28848: [SPARK-32003][CORE] When external shuffle service is used, unregister outputs for executor on fetch failure after executor is lost

2020-06-25 Thread GitBox
wypoon commented on pull request #28848: URL: https://github.com/apache/spark/pull/28848#issuecomment-649968228 In the latest update, there are three changes: 1. `failedEpoch` and `fileLostEpoch` are renamed, and the comments explaining what they are have been expanded, largely based on

[GitHub] [spark] wypoon commented on pull request #28848: [SPARK-32003][CORE] When external shuffle service is used, unregister outputs for executor on fetch failure after executor is lost

2020-06-25 Thread GitBox
wypoon commented on pull request #28848: URL: https://github.com/apache/spark/pull/28848#issuecomment-649874411 > @wypoon if you have not started extending the test with the multiple fetch failures case you can use this if you agree with it: >

[GitHub] [spark] wypoon commented on pull request #28848: [SPARK-32003][CORE] When external shuffle service is used, unregister outputs for executor on fetch failure after executor is lost

2020-06-23 Thread GitBox
wypoon commented on pull request #28848: URL: https://github.com/apache/spark/pull/28848#issuecomment-648344459 @tgravescs a suggestion to improve the title of the PR is also welcome. It is hard to do justice in one simple sentence. I see how you would fail to grasp what the change is for

[GitHub] [spark] wypoon commented on pull request #28848: [SPARK-32003][CORE] When external shuffle service is used, unregister outputs for executor on fetch failure after executor is lost

2020-06-23 Thread GitBox
wypoon commented on pull request #28848: URL: https://github.com/apache/spark/pull/28848#issuecomment-648339593 > sorry if I wasn't clear. I think this approach of having the fileLostEpoch is better so we avoid the locking in MapOutputTracker. Personally I wouldn't mind fileLost being

[GitHub] [spark] wypoon commented on pull request #28848: [SPARK-32003][CORE] When external shuffle service is used, unregister outputs for executor on fetch failure after executor is lost

2020-06-23 Thread GitBox
wypoon commented on pull request #28848: URL: https://github.com/apache/spark/pull/28848#issuecomment-648293945 @tgravescs as you point out, it is ok to call `mapOutputTracker.removeOutputsOnHost` or `mapOutputTracker.removeOutputsOnExecutor` multiple times with the same host/execId.

[GitHub] [spark] wypoon commented on pull request #28848: [SPARK-32003][CORE] When external shuffle service is used, unregister outputs for executor on fetch failure after executor is lost

2020-06-23 Thread GitBox
wypoon commented on pull request #28848: URL: https://github.com/apache/spark/pull/28848#issuecomment-648290364 @tgravescs thanks for reviewing. Our customer was not using `spark.files.fetchFailure.unRegisterOutputOnHost`. In case of `FetchFailure`, in

[GitHub] [spark] wypoon commented on pull request #28848: [SPARK-32003][CORE] When external shuffle service is used, unregister outputs for executor on fetch failure after executor is lost

2020-06-22 Thread GitBox
wypoon commented on pull request #28848: URL: https://github.com/apache/spark/pull/28848#issuecomment-647906200 retest this please