Hi,

We previously raised the SPIP for push-based shuffle in SPARK-30602 <https://issues.apache.org/jira/browse/SPARK-30602>. Thanks to the reviews from the community, a significant portion of the code has already been merged.
In the meantime, we have continued to improve the solution at LinkedIn so that it scales to cover 100% of our offline Spark workloads, and we reached that milestone last month. We have observed a significant improvement in shuffle efficiency as well as in job runtime across our clusters. The results are shared in the following blog post:

https://www.linkedin.com/pulse/bringing-next-gen-shuffle-architecture-data-linkedin-scale-min-shen/

We would like to get feedback from the community on the content covered in the blog post.

In addition, since the release timeline for Spark 3.2 has been postponed until September, we believe it would be reasonable to include push-based shuffle in the Spark 3.2 release itself, given that this feature has already been validated in production at scale.

We also want to bring attention to the patches currently under or pending review under SPARK-30602, so that we can get more eyes on the remaining ones.

-----
Min Shen
Sr. Staff Software Engineer
LinkedIn