Andy Grove created SPARK-35881:
----------------------------------

             Summary: [SQL] AQE does not support columnar execution for the final query stage
                 Key: SPARK-35881
                 URL: https://issues.apache.org/jira/browse/SPARK-35881
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.2.0
            Reporter: Andy Grove

In AdaptiveSparkPlanExec, a query is broken down into stages, and these stages are executed until the entire query has been executed. These stages can be row-based or columnar. However, the final stage, produced by the private getFinalPhysicalPlan method, is always assumed to be row-based. The only way to execute the final stage is by calling the various doExecute methods on AdaptiveSparkPlanExec. The supportsColumnar method also always returns false, which is another limitation.

AQE is special because we don't know whether the final stage will be columnar until the child stages have been executed and the final stage has been re-planned and re-optimized, so we can't easily change the behavior of supportsColumnar. For the same reason, we can't just implement doExecuteColumnar: we don't know whether the final stage will be columnar until after we start executing the query.

In the RAPIDS Accelerator for Apache Spark, we currently call the private getFinalPhysicalPlan method using reflection and then invoke that plan, bypassing the doExecute methods on AdaptiveSparkPlanExec.

I propose that we make getFinalPhysicalPlan public, and part of the developer API, so that columnar plugins can call this method, determine whether the final stage is columnar, and execute it appropriately. This would not affect any existing Spark functionality.
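
The dispatch that a columnar plugin would perform once the final plan is accessible can be sketched roughly as follows. This is a self-contained toy model in Scala, not Spark code: `ToyPlan`, `RowPlan`, `ColumnarPlan`, `ToyAdaptivePlan`, `Output`, and `runFinalStage` are hypothetical names introduced for illustration. Only `supportsColumnar`, `execute`, `executeColumnar`, and `getFinalPhysicalPlan` echo names from Spark, and their real signatures differ (the real execute methods return RDDs).

```scala
// Toy model only: these types stand in for Spark's SparkPlan and
// AdaptiveSparkPlanExec. They are NOT the real Spark classes.
object FinalStageSketch {
  // Result of running a plan: either rows or column batches.
  sealed trait Output
  final case class Rows(rows: Seq[Seq[Any]]) extends Output
  final case class Batches(columns: Seq[Seq[Any]]) extends Output

  // Hypothetical stand-in for a physical plan node.
  trait ToyPlan {
    def supportsColumnar: Boolean
    def execute(): Rows
    def executeColumnar(): Batches
  }

  // A row-based final stage: columnar execution is not available.
  final class RowPlan(data: Seq[Seq[Any]]) extends ToyPlan {
    val supportsColumnar = false
    def execute(): Rows = Rows(data)
    def executeColumnar(): Batches =
      throw new UnsupportedOperationException("row-based plan")
  }

  // A columnar final stage: data is held column by column.
  final class ColumnarPlan(columns: Seq[Seq[Any]]) extends ToyPlan {
    val supportsColumnar = true
    def execute(): Rows = Rows(columns.transpose)
    def executeColumnar(): Batches = Batches(columns)
  }

  // Hypothetical stand-in for the adaptive plan. The final plan is only
  // known after re-planning, modelled here by a function evaluated lazily.
  final class ToyAdaptivePlan(replan: () => ToyPlan) {
    lazy val getFinalPhysicalPlan: ToyPlan = replan()
  }

  // What a plugin could do if getFinalPhysicalPlan were public:
  // obtain the final plan first, then choose the execution path.
  def runFinalStage(aqe: ToyAdaptivePlan): Output = {
    val finalPlan = aqe.getFinalPhysicalPlan
    if (finalPlan.supportsColumnar) finalPlan.executeColumnar()
    else finalPlan.execute()
  }
}
```

The point of the sketch is the ordering: the columnar check happens on the final plan after it has been produced, rather than on the adaptive node up front, which is why a fixed `supportsColumnar` answer on the adaptive node cannot express it.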