[ https://issues.apache.org/jira/browse/SPARK-43987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mridul Muralidharan reassigned SPARK-43987:
-------------------------------------------

    Assignee: SHU WANG

> Separate finalizeShuffleMerge Processing to Dedicated Thread Pools
> ------------------------------------------------------------------
>
>                 Key: SPARK-43987
>                 URL: https://issues.apache.org/jira/browse/SPARK-43987
>             Project: Spark
>          Issue Type: Improvement
>          Components: Shuffle
>    Affects Versions: 3.2.0, 3.4.0
>            Reporter: SHU WANG
>            Assignee: SHU WANG
>            Priority: Critical
>
> In our production environment, _finalizeShuffleMerge_ processing takes
> longer (p90 is around 20s) than other RPC requests. This is because
> _finalizeShuffleMerge_ invokes IO operations such as truncate and file
> open/close.
> More importantly, processing _finalizeShuffleMerge_ can block other
> critical lightweight messages such as authentication, which can cause
> authentication timeouts as well as fetch failures. These timeouts and
> fetch failures affect the stability of Spark job execution.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
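
The idea in the issue title can be illustrated with a minimal Java sketch. It is not Spark's implementation: the class name MergeRpcDispatcher, the message-type strings, and the pool sizes are all hypothetical, and the slow IO is simulated with a latch. The sketch only shows the general technique of routing the slow finalizeShuffleMerge work to its own executor so that a lightweight message such as authentication is not queued behind it.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical dispatcher; names and pool sizes are assumptions for illustration only.
final class MergeRpcDispatcher implements AutoCloseable {
    static final String FINALIZE = "finalizeShuffleMerge";

    // Stands in for the shared RPC handler threads (a single thread makes blocking obvious).
    private final ExecutorService rpcHandlers = Executors.newFixedThreadPool(1);
    // Dedicated pool reserved for the IO-heavy finalize work.
    private final ExecutorService finalizePool = Executors.newFixedThreadPool(2);

    // Route finalize requests to the dedicated pool; everything else stays on the shared pool.
    Future<String> handle(String messageType, Callable<String> work) {
        ExecutorService target = FINALIZE.equals(messageType) ? finalizePool : rpcHandlers;
        return target.submit(work);
    }

    @Override
    public void close() {
        rpcHandlers.shutdownNow();
        finalizePool.shutdownNow();
    }

    // Runs a slow finalize and a lightweight authentication concurrently.
    static String demo() {
        try (MergeRpcDispatcher dispatcher = new MergeRpcDispatcher()) {
            CountDownLatch finalizeStarted = new CountDownLatch(1);
            CountDownLatch releaseFinalize = new CountDownLatch(1);

            Future<String> finalizeResult = dispatcher.handle(FINALIZE, () -> {
                finalizeStarted.countDown();
                releaseFinalize.await(); // simulated slow truncate / file open / close
                return "finalized";
            });
            finalizeStarted.await();

            // The authentication completes even though finalize is still in progress.
            String auth = dispatcher.handle("authenticate", () -> "authenticated")
                    .get(1, TimeUnit.SECONDS);
            boolean finalizeStillRunning = !finalizeResult.isDone();

            releaseFinalize.countDown();
            return auth + "," + finalizeResult.get(1, TimeUnit.SECONDS) + "," + finalizeStillRunning;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Calling MergeRpcDispatcher.demo() returns once both tasks have completed. If the finalize task were submitted to the single-threaded shared pool instead, the authentication call would wait behind it and hit the one-second timeout, which mirrors the authentication timeouts described in the issue.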