gczsjdy commented on issue #24462: [SPARK-26268][CORE] Do not resubmit tasks
when executors are lost
URL: https://github.com/apache/spark/pull/24462#issuecomment-520799822
Thank you @squito
gczsjdy commented on issue #24462: [SPARK-26268][CORE] Do not resubmit tasks
when executors are lost
URL: https://github.com/apache/spark/pull/24462#issuecomment-519802295
@squito Yeah, it saves us a lot: in a TPC-DS 1 TB benchmark, 30% of queries get a
1.1x+ performance boost, and 13% get 1.2x+
gczsjdy commented on issue #24462: [SPARK-26268][CORE] Do not resubmit tasks
when executors are lost
URL: https://github.com/apache/spark/pull/24462#issuecomment-519354047
@squito Index and data files are both stored on DFS; the difference is that
data files are directly read from DFS,
gczsjdy commented on issue #24462: [SPARK-26268][CORE] Do not resubmit tasks
when executors are lost
URL: https://github.com/apache/spark/pull/24462#issuecomment-514078853
@squito I ran into a case that cannot be handled without this PR:
- On map side, all shuffle files are
gczsjdy commented on issue #24462: [SPARK-26268][CORE] Do not resubmit tasks
when executors are lost
URL: https://github.com/apache/spark/pull/24462#issuecomment-510766013
@yifeih Thank you, I understand now. But can your approach (making `MapStatus`
able to contain an empty location in order
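The "empty location" idea above can be sketched as follows. This is a minimal, hypothetical model, not Spark's actual `MapStatus` (which always carries a `BlockManagerId`): the `Option` location and the `invalidatedByExecutorLoss` helper are illustrative names only.

```scala
// Sketch, assuming a simplified MapStatus whose executor location is
// optional. A None location models map output that lives only on remote
// storage (DFS) rather than on any executor.
case class MapStatus(location: Option[String], mapId: Long)

// A map output with no executor location survives executor loss, so the
// scheduler would have no reason to resubmit its map task.
def invalidatedByExecutorLoss(status: MapStatus, lostExecutorId: String): Boolean =
  status.location.contains(lostExecutorId)
```

Under this model, only outputs whose location matches the lost executor would trigger resubmission; DFS-backed outputs never would.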
gczsjdy commented on issue #24462: [SPARK-26268][CORE] Do not resubmit tasks
when executors are lost
URL: https://github.com/apache/spark/pull/24462#issuecomment-509537346
@yifeih It's very interesting, but I didn't 100% get it. The new `ShuffleIO`
API will not influence the existing
gczsjdy commented on issue #24462: [SPARK-26268][CORE] Do not resubmit tasks
when executors are lost
URL: https://github.com/apache/spark/pull/24462#issuecomment-509251934
@squito I agree with you, but still want to make sure I understand it right:
The function
gczsjdy commented on issue #24462: [SPARK-26268][CORE] Do not resubmit tasks
when executors are lost
URL: https://github.com/apache/spark/pull/24462#issuecomment-487843910
@bsidhom I agree that it would be ideal if there were a field in
`ShuffleManager` indicating 'whether it can serve
gczsjdy commented on issue #24462: [SPARK-26268][CORE] Do not resubmit tasks
when executors are lost
URL: https://github.com/apache/spark/pull/24462#issuecomment-487484432
@liupc Thanks for the explanation.
gczsjdy commented on issue #24462: [SPARK-26268][CORE] Do not resubmit tasks
when executors are lost
URL: https://github.com/apache/spark/pull/24462#issuecomment-487336835
@liupc With remote shuffle, if certain executors are lost, we can still
fetch the shuffle data from remote
gczsjdy commented on issue #24462: [SPARK-26268][CORE] Do not resubmit tasks
when executors are lost
URL: https://github.com/apache/spark/pull/24462#issuecomment-486900975
@liupc Since the shuffle manager is pluggable in Spark, this 'resubmit switch'
in the scheduler should also be configurable.
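One way to picture that switch is a scheduler-side check keyed off the configured shuffle manager. Everything here is hypothetical: the `org.example.RemoteShuffleManager` class name and the idea of an allow-list are illustrative, not actual Spark configuration or API.

```scala
// Hypothetical DFS-backed shuffle managers whose map output survives
// executor loss (illustrative name only).
val externalShuffleManagers: Set[String] =
  Set("org.example.RemoteShuffleManager")

// Decide whether executor loss should trigger resubmission of completed
// map tasks, based on the configured shuffle manager.
def resubmitOnExecutorLoss(conf: Map[String, String]): Boolean = {
  val manager = conf.getOrElse(
    "spark.shuffle.manager",
    "org.apache.spark.shuffle.sort.SortShuffleManager")
  // Default managers keep map output on executors, so resubmission is
  // still needed; DFS-backed managers could opt out.
  !externalShuffleManagers.contains(manager)
}
```

With the default sort-based manager this returns `true` (today's behavior), while a remote shuffle manager would opt out of resubmission.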