[ https://issues.apache.org/jira/browse/SPARK-42776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17700286#comment-17700286 ]
Timothy Miller commented on SPARK-42776: ---------------------------------------- A little more detail about the sequence events that cause this bug: * org.apache.spark.sql.execution.RemoveRedundantProjects is applied * that causes BroadcastHashJoinExec to get created * org.apache.spark.sql.execution.exchange.EnsureRequirements is applied * BroadcastHashJoinExec.requiredChildDistribution gets called, creating the hashmap object that gets broadcast * a few more rules are applied, followed by org.apache.spark.sql.execution.ApplyColumnarRulesAndInsertTransitions I can't find a way to inject extra rules into or between RemoveRedundantProjects or EnsureRequirements, so there doesn't seem to be a workaround either. > BroadcastHashJoinExec.requiredChildDistribution called before columnar > replacement rules > ---------------------------------------------------------------------------------------- > > Key: SPARK-42776 > URL: https://issues.apache.org/jira/browse/SPARK-42776 > Project: Spark > Issue Type: Bug > Components: Optimizer > Affects Versions: 3.3.1 > Environment: I'm prototyping on a Mac, but that's not really relevant. > Reporter: Timothy Miller > Priority: Major > > I am trying to replace BroadcastHashJoinExec with a columnar equivalent. > However, I noticed that BroadcastHashJoinExec.requiredChildDistribution gets > called BEFORE the columnar replacement rules. As a result, the object that > gets broadcast is the plain old hashmap created from row data. By the time > the columnar replacement rules are applied, it's too late to get Spark to > broadcast any other kind of object. -- This message was sent by Atlassian Jira (v8.20.10#820010) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org