Github user jiangxb1987 commented on the issue:
https://github.com/apache/spark/pull/21698
Thank you for your comments @mridulm !
We focus on resolving the RDD.repartition() correctness issue here in this
PR, because it is most commonly used, and that we can still address the
RDD.zip* issue using the similar approach. At first I was worried that the
changes may be huge and trying to address the correctness issue for multiple
operations may make it difficult to backport the PR. But now it turns out that
the PR didn't change that much code, so maybe I can consider include the
RDD.zip* fix in this PR too.
Since you are also deeply involved in the related discussion on the
correctness issue caused by non-deterministic input for shuffle, you may also
agree that there is actually no easy way to both guarantee correctness and
don't cause serious performance drop-off. I have to insist that correctness
always goes beyond performance concerns, and that we shall not expect users to
always remember they may hit a known correctness bug in case of some use
patterns.
As for the proposed solution, there are actually two ways to follow: Either
you insert a local sort before a shuffle repartition (that's how we fixed the
DataFrame.repartition()), or you always retry the whole stage with repartition
on FetchFailure. The problem with the local-sort solution is that, it can't fix
all the problems for RDD (since the data type of an RDD can be not sortable,
and it's hard to construct a sorting for a generic type), also it can make the
time consumption of repartition() increases by 3X ~ 5X. By applying the
approach proposed in this PR, the performance shall keep the same in case no
FetchFailure happens, and it shall works well for DataFrames as well as for
RDDs.
I have to admit that if you have a big query running on a huge cluster, and
the tasks can easily hit FetchFailure issues, then you may see the job takes
more time to finish (or even fall due to reach max consequence stage failure
limit). But again, your big query may be producing wrong result without a
patch, and I have to say that is even more unacceptable :( . As for the
cascading cost, you are right, it makes things worse, and I don't have good
advice for that issue.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]