Production results of push-based shuffle after rolling out to 100% of Spark workloads at LinkedIn

mshen Thu, 15 Apr 2021 13:50:18 -0700

Hi,

We previously raised the SPIP for push-based shuffle in SPARK-30602
<https://issues.apache.org/jira/browse/SPARK-30602>.
Thanks to the reviews from the community, a significant portion of the code
has already been merged.


In the meantime, we have continued to improve the solution at LinkedIn and
scale it to cover 100% of our offline Spark workloads, a milestone we
reached last month.
We have observed a significant improvement in shuffle efficiency as well as
job runtime across the clusters, and the results are shared in the following
blog post:
https://www.linkedin.com/pulse/bringing-next-gen-shuffle-architecture-data-linkedin-scale-min-shen/

We would like to get feedback from the community on the content covered in
the blog post.
In addition, since the release timeline for Spark 3.2 has been postponed
until September, we believe it would be reasonable to include push-based
shuffle in the Spark 3.2 release itself, given that this feature has already
been validated in production at scale.
We also want to bring attention to the patches under SPARK-30602 that are
currently in review or pending review, so we can get more eyes on the
remaining patches.



-----
Min Shen
Sr. Staff Software Engineer
LinkedIn
