[jira] [Updated] (SPARK-35881) [SQL] AQE does not support columnar execution for the final query stage
[ https://issues.apache.org/jira/browse/SPARK-35881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Graves updated SPARK-35881:
--
    Fix Version/s: 3.2.0

> [SQL] AQE does not support columnar execution for the final query stage
> ---
>
> Key: SPARK-35881
> URL: https://issues.apache.org/jira/browse/SPARK-35881
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.0.3, 3.1.2, 3.2.0
> Reporter: Andy Grove
> Assignee: Andy Grove
> Priority: Major
> Fix For: 3.2.0, 3.3.0
>
> In AdaptiveSparkPlanExec, a query is broken down into stages, and these stages are executed until the entire query has been executed. These stages can be row-based or columnar. However, the final stage, produced by the private getFinalPhysicalPlan method, is always assumed to be row-based. The only way to execute the final stage is by calling the various doExecute methods on AdaptiveSparkPlanExec, and doExecuteColumnar is not implemented. The supportsColumnar method also always returns false.
>
> In the RAPIDS Accelerator for Apache Spark, we currently call the private getFinalPhysicalPlan method using reflection and then determine if that plan is columnar or not, and then call the appropriate doExecute method, bypassing the doExecute methods on AdaptiveSparkPlanExec. We would like a supported mechanism for executing a columnar AQE plan so that we do not need to use reflection.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
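The reflection workaround described above can be modeled outside Spark. What follows is a minimal, self-contained Java sketch, not RAPIDS or Spark code: `Plan`, `RowStage`, `ColumnarStage`, and `AdaptivePlan` are hypothetical stand-ins for SparkPlan and AdaptiveSparkPlanExec, and `executeRows`/`executeColumnar` stand in for the row and columnar execution paths. It only illustrates the pattern: look up a private method by name with `java.lang.reflect`, call it, check `supportsColumnar()` on the returned plan, and dispatch to the matching execution path.

```java
import java.lang.reflect.Method;
import java.util.List;

// Hypothetical model of the workaround; none of these types are Spark classes.
public class ReflectionWorkaroundSketch {

    /** Minimal physical-plan abstraction with a row path and a columnar path. */
    interface Plan {
        boolean supportsColumnar();
        List<String> executeRows();      // stands in for the row-based execute path
        List<String> executeColumnar();  // stands in for the columnar execute path
    }

    static final class RowStage implements Plan {
        public boolean supportsColumnar() { return false; }
        public List<String> executeRows() { return List.of("row-0", "row-1"); }
        public List<String> executeColumnar() { throw new UnsupportedOperationException("row-based stage"); }
    }

    static final class ColumnarStage implements Plan {
        public boolean supportsColumnar() { return true; }
        // In this model a columnar-only stage has no row path.
        public List<String> executeRows() { throw new UnsupportedOperationException("columnar stage"); }
        public List<String> executeColumnar() { return List.of("batch-0"); }
    }

    /** Models AdaptiveSparkPlanExec: the final plan is reachable only through a private method. */
    static final class AdaptivePlan implements Plan {
        private final boolean finalStageIsColumnar;

        AdaptivePlan(boolean finalStageIsColumnar) { this.finalStageIsColumnar = finalStageIsColumnar; }

        // Private, like getFinalPhysicalPlan in the issue description.
        private Plan getFinalPhysicalPlan() {
            return finalStageIsColumnar ? new ColumnarStage() : new RowStage();
        }

        // As described in the issue: supportsColumnar always returns false...
        public boolean supportsColumnar() { return false; }
        // ...the final stage is assumed to be row-based...
        public List<String> executeRows() { return getFinalPhysicalPlan().executeRows(); }
        // ...and there is no columnar execution path.
        public List<String> executeColumnar() { throw new UnsupportedOperationException("not implemented"); }
    }

    /** Plugin-side workaround: reach the private final plan reflectively, then dispatch on its format. */
    static List<String> runFinalStage(AdaptivePlan aqe) {
        try {
            Method m = AdaptivePlan.class.getDeclaredMethod("getFinalPhysicalPlan");
            m.setAccessible(true); // bypasses 'private'; a rename fails only at run time
            Plan finalPlan = (Plan) m.invoke(aqe);
            return finalPlan.supportsColumnar() ? finalPlan.executeColumnar() : finalPlan.executeRows();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("could not reach the final physical plan", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(runFinalStage(new AdaptivePlan(true)));   // [batch-0]
        System.out.println(runFinalStage(new AdaptivePlan(false)));  // [row-0, row-1]
    }
}
```

Because the private method is resolved by name at run time, a rename in a later release shows up as a runtime failure rather than a compile error, which is the kind of fragility a supported API would remove.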
[jira] [Updated] (SPARK-35881) [SQL] AQE does not support columnar execution for the final query stage
[ https://issues.apache.org/jira/browse/SPARK-35881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-35881:
--
    Fix Version/s: (was: 3.2.0)

> [SQL] AQE does not support columnar execution for the final query stage
> ---
>
> Key: SPARK-35881
> URL: https://issues.apache.org/jira/browse/SPARK-35881
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.0.3, 3.1.2, 3.2.0
> Reporter: Andy Grove
> Assignee: Andy Grove
> Priority: Major
> Fix For: 3.3.0
>
> In AdaptiveSparkPlanExec, a query is broken down into stages, and these stages are executed until the entire query has been executed. These stages can be row-based or columnar. However, the final stage, produced by the private getFinalPhysicalPlan method, is always assumed to be row-based. The only way to execute the final stage is by calling the various doExecute methods on AdaptiveSparkPlanExec, and doExecuteColumnar is not implemented. The supportsColumnar method also always returns false.
>
> In the RAPIDS Accelerator for Apache Spark, we currently call the private getFinalPhysicalPlan method using reflection and then determine if that plan is columnar or not, and then call the appropriate doExecute method, bypassing the doExecute methods on AdaptiveSparkPlanExec. We would like a supported mechanism for executing a columnar AQE plan so that we do not need to use reflection.
[jira] [Updated] (SPARK-35881) [SQL] AQE does not support columnar execution for the final query stage
[ https://issues.apache.org/jira/browse/SPARK-35881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andy Grove updated SPARK-35881:
---
    Affects Version/s: 3.0.3, 3.1.2

> [SQL] AQE does not support columnar execution for the final query stage
> ---
>
> Key: SPARK-35881
> URL: https://issues.apache.org/jira/browse/SPARK-35881
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.0.3, 3.1.2, 3.2.0
> Reporter: Andy Grove
> Priority: Major
>
> In AdaptiveSparkPlanExec, a query is broken down into stages, and these stages are executed until the entire query has been executed. These stages can be row-based or columnar. However, the final stage, produced by the private getFinalPhysicalPlan method, is always assumed to be row-based. The only way to execute the final stage is by calling the various doExecute methods on AdaptiveSparkPlanExec, and doExecuteColumnar is not implemented. The supportsColumnar method also always returns false.
>
> In the RAPIDS Accelerator for Apache Spark, we currently call the private getFinalPhysicalPlan method using reflection and then determine if that plan is columnar or not, and then call the appropriate doExecute method, bypassing the doExecute methods on AdaptiveSparkPlanExec. We would like a supported mechanism for executing a columnar AQE plan so that we do not need to use reflection.
[jira] [Updated] (SPARK-35881) [SQL] AQE does not support columnar execution for the final query stage
[ https://issues.apache.org/jira/browse/SPARK-35881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andy Grove updated SPARK-35881:
---
    Description:
In AdaptiveSparkPlanExec, a query is broken down into stages, and these stages are executed until the entire query has been executed. These stages can be row-based or columnar. However, the final stage, produced by the private getFinalPhysicalPlan method, is always assumed to be row-based. The only way to execute the final stage is by calling the various doExecute methods on AdaptiveSparkPlanExec, and doExecuteColumnar is not implemented. The supportsColumnar method also always returns false.

In the RAPIDS Accelerator for Apache Spark, we currently call the private getFinalPhysicalPlan method using reflection and then determine if that plan is columnar or not, and then call the appropriate doExecute method, bypassing the doExecute methods on AdaptiveSparkPlanExec. We would like a supported mechanism for executing a columnar AQE plan so that we do not need to use reflection.

  was:
In AdaptiveSparkPlanExec, a query is broken down into stages, and these stages are executed until the entire query has been executed. These stages can be row-based or columnar. However, the final stage, produced by the private getFinalPhysicalPlan method, is always assumed to be row-based. The only way to execute the final stage is by calling the various doExecute methods on AdaptiveSparkPlanExec. The supportsColumnar method also always returns false, which is another limitation.

However, AQE is special because we don't know if the final stage will be columnar or not until the child stages have been executed and the final stage has been re-planned and re-optimized, so we can't easily change the behavior of supportsColumnar. We can't just implement doExecuteColumnar because we don't know whether the final stage will be columnar or not until after we start executing the query.

In the RAPIDS Accelerator for Apache Spark, we currently call the private getFinalPhysicalPlan method using reflection and then determine if that plan is columnar or not, and then call the appropriate doExecute method, bypassing the doExecute methods on AdaptiveSparkPlanExec.

I propose that we make getFinalPhysicalPlan public and part of the developer API, so that columnar plugins can call this method, determine if the final stage is columnar or not, and execute it appropriately. This would not affect any existing Spark functionality.

We also need a mechanism for invoking finalPlanUpdate after the query has been executed.

> [SQL] AQE does not support columnar execution for the final query stage
> ---
>
> Key: SPARK-35881
> URL: https://issues.apache.org/jira/browse/SPARK-35881
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.2.0
> Reporter: Andy Grove
> Priority: Major
>
> In AdaptiveSparkPlanExec, a query is broken down into stages, and these stages are executed until the entire query has been executed. These stages can be row-based or columnar. However, the final stage, produced by the private getFinalPhysicalPlan method, is always assumed to be row-based. The only way to execute the final stage is by calling the various doExecute methods on AdaptiveSparkPlanExec, and doExecuteColumnar is not implemented. The supportsColumnar method also always returns false.
>
> In the RAPIDS Accelerator for Apache Spark, we currently call the private getFinalPhysicalPlan method using reflection and then determine if that plan is columnar or not, and then call the appropriate doExecute method, bypassing the doExecute methods on AdaptiveSparkPlanExec. We would like a supported mechanism for executing a columnar AQE plan so that we do not need to use reflection.
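The earlier version of the description argues that AQE cannot report a fixed answer from supportsColumnar because the final stage is only known after the child stages run and the plan is re-optimized, and it proposes exposing the final plan to plugins plus a way to invoke finalPlanUpdate afterwards. The following hypothetical Java sketch models that idea. `finalPhysicalPlan()` is an assumed public accessor, the row-count threshold that picks columnar versus row-based output is a made-up rule for illustration only, and `finalPlanUpdate()` is a placeholder for the post-execution step; none of these names are Spark APIs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the proposed developer API; none of these types are Spark classes.
public class ProposedFinalPlanApiSketch {

    /** Minimal physical-plan abstraction with a row path and a columnar path. */
    interface Plan {
        boolean supportsColumnar();
        List<String> execute();          // row-based path
        List<String> executeColumnar();  // columnar path
    }

    /** A final query stage whose format is fixed once it has been planned. */
    static final class Stage implements Plan {
        private final boolean columnar;

        Stage(boolean columnar) { this.columnar = columnar; }

        public boolean supportsColumnar() { return columnar; }

        public List<String> execute() { return List.of("row-0", "row-1"); }

        public List<String> executeColumnar() {
            if (!columnar) {
                throw new UnsupportedOperationException("row-based stage");
            }
            return List.of("batch-0");
        }
    }

    /** Models an adaptive plan whose final stage is known only after child stages run. */
    static final class AdaptivePlan {
        private final long childStageRows;  // a runtime statistic in this model
        private final List<String> events = new ArrayList<>();
        private Plan finalPlan;             // null until re-planned

        AdaptivePlan(long childStageRows) { this.childStageRows = childStageRows; }

        /** Assumed public accessor: runs child stages, re-plans, returns the final stage. */
        public synchronized Plan finalPhysicalPlan() {
            if (finalPlan == null) {
                events.add("child stages executed");
                // Made-up re-planning rule: the format depends on a statistic that exists
                // only after the child stages have run, so it cannot be reported earlier.
                finalPlan = new Stage(childStageRows > 1_000);
                events.add("final stage re-planned");
            }
            return finalPlan;
        }

        /** Placeholder for the post-execution finalPlanUpdate step. */
        public synchronized void finalPlanUpdate() { events.add("final plan update"); }

        synchronized List<String> events() { return List.copyOf(events); }
    }

    /** What a columnar plugin could do with such an API, with no reflection involved. */
    static List<String> run(AdaptivePlan aqe) {
        Plan plan = aqe.finalPhysicalPlan();
        List<String> out = plan.supportsColumnar() ? plan.executeColumnar() : plan.execute();
        aqe.finalPlanUpdate();  // invoked once the final stage has executed
        return out;
    }

    public static void main(String[] args) {
        AdaptivePlan big = new AdaptivePlan(5_000);
        System.out.println(run(big));                   // [batch-0]
        System.out.println(big.events());               // [child stages executed, final stage re-planned, final plan update]
        System.out.println(run(new AdaptivePlan(10)));  // [row-0, row-1]
    }
}
```

In this model the columnar/row decision happens inside `finalPhysicalPlan()`, after the simulated child stages, which mirrors the point that the answer does not exist before execution starts; the plugin then needs an explicit hook to finish the AQE bookkeeping.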
[jira] [Updated] (SPARK-35881) [SQL] AQE does not support columnar execution for the final query stage
[ https://issues.apache.org/jira/browse/SPARK-35881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andy Grove updated SPARK-35881:
---
    Description:
In AdaptiveSparkPlanExec, a query is broken down into stages, and these stages are executed until the entire query has been executed. These stages can be row-based or columnar. However, the final stage, produced by the private getFinalPhysicalPlan method, is always assumed to be row-based. The only way to execute the final stage is by calling the various doExecute methods on AdaptiveSparkPlanExec. The supportsColumnar method also always returns false, which is another limitation.

However, AQE is special because we don't know if the final stage will be columnar or not until the child stages have been executed and the final stage has been re-planned and re-optimized, so we can't easily change the behavior of supportsColumnar. We can't just implement doExecuteColumnar because we don't know whether the final stage will be columnar or not until after we start executing the query.

In the RAPIDS Accelerator for Apache Spark, we currently call the private getFinalPhysicalPlan method using reflection and then determine if that plan is columnar or not, and then call the appropriate doExecute method, bypassing the doExecute methods on AdaptiveSparkPlanExec.

I propose that we make getFinalPhysicalPlan public and part of the developer API, so that columnar plugins can call this method, determine if the final stage is columnar or not, and execute it appropriately. This would not affect any existing Spark functionality.

We also need a mechanism for invoking finalPlanUpdate after the query has been executed.

  was:
In AdaptiveSparkPlanExec, a query is broken down into stages, and these stages are executed until the entire query has been executed. These stages can be row-based or columnar. However, the final stage, produced by the private getFinalPhysicalPlan method, is always assumed to be row-based. The only way to execute the final stage is by calling the various doExecute methods on AdaptiveSparkPlanExec. The supportsColumnar method also always returns false, which is another limitation.

However, AQE is special because we don't know if the final stage will be columnar or not until the child stages have been executed and the final stage has been re-planned and re-optimized, so we can't easily change the behavior of supportsColumnar. We can't just implement doExecuteColumnar because we don't know whether the final stage will be columnar or not until after we start executing the query.

In the RAPIDS Accelerator for Apache Spark, we currently call the private getFinalPhysicalPlan method using reflection and then determine if that plan is columnar or not, and then call the appropriate doExecute method, bypassing the doExecute methods on AdaptiveSparkPlanExec.

I propose that we make getFinalPhysicalPlan public and part of the developer API, so that columnar plugins can call this method, determine if the final stage is columnar or not, and execute it appropriately. This would not affect any existing Spark functionality.

> [SQL] AQE does not support columnar execution for the final query stage
> ---
>
> Key: SPARK-35881
> URL: https://issues.apache.org/jira/browse/SPARK-35881
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.2.0
> Reporter: Andy Grove
> Priority: Major
>
> In AdaptiveSparkPlanExec, a query is broken down into stages, and these stages are executed until the entire query has been executed. These stages can be row-based or columnar. However, the final stage, produced by the private getFinalPhysicalPlan method, is always assumed to be row-based. The only way to execute the final stage is by calling the various doExecute methods on AdaptiveSparkPlanExec. The supportsColumnar method also always returns false, which is another limitation.
>
> However, AQE is special because we don't know if the final stage will be columnar or not until the child stages have been executed and the final stage has been re-planned and re-optimized, so we can't easily change the behavior of supportsColumnar. We can't just implement doExecuteColumnar because we don't know whether the final stage will be columnar or not until after we start executing the query.
>
> In the RAPIDS Accelerator for Apache Spark, we currently call the private getFinalPhysicalPlan method using reflection and then determine if that plan is columnar or not, and then call the appropriate doExecute method, bypassing the doExecute methods on AdaptiveSparkPlanExec.
>
> I propose that we make getFinalPhysicalPlan public and part of the developer API, so that columnar plugins
[jira] [Updated] (SPARK-35881) [SQL] AQE does not support columnar execution for the final query stage
[ https://issues.apache.org/jira/browse/SPARK-35881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andy Grove updated SPARK-35881:
---
    Description:
In AdaptiveSparkPlanExec, a query is broken down into stages, and these stages are executed until the entire query has been executed. These stages can be row-based or columnar. However, the final stage, produced by the private getFinalPhysicalPlan method, is always assumed to be row-based. The only way to execute the final stage is by calling the various doExecute methods on AdaptiveSparkPlanExec. The supportsColumnar method also always returns false, which is another limitation.

However, AQE is special because we don't know if the final stage will be columnar or not until the child stages have been executed and the final stage has been re-planned and re-optimized, so we can't easily change the behavior of supportsColumnar. We can't just implement doExecuteColumnar because we don't know whether the final stage will be columnar or not until after we start executing the query.

In the RAPIDS Accelerator for Apache Spark, we currently call the private getFinalPhysicalPlan method using reflection and then determine if that plan is columnar or not, and then call the appropriate doExecute method, bypassing the doExecute methods on AdaptiveSparkPlanExec.

I propose that we make getFinalPhysicalPlan public and part of the developer API, so that columnar plugins can call this method, determine if the final stage is columnar or not, and execute it appropriately. This would not affect any existing Spark functionality.

  was:
In AdaptiveSparkPlanExec, a query is broken down into stages, and these stages are executed until the entire query has been executed. These stages can be row-based or columnar. However, the final stage, produced by the private getFinalPhysicalPlan method, is always assumed to be row-based. The only way to execute the final stage is by calling the various doExecute methods on AdaptiveSparkPlanExec. The supportsColumnar method also always returns false, which is another limitation.

However, AQE is special because we don't know if the final stage will be columnar or not until the child stages have been executed and the final stage has been re-planned and re-optimized, so we can't easily change the behavior of supportsColumnar. We can't just implement doExecuteColumnar because we don't know whether the final stage will be columnar or not until after we start executing the query.

In the RAPIDS Accelerator for Apache Spark, we currently call the private getFinalPhysicalPlan method using reflection and then invoke that plan, bypassing the doExecute methods on AdaptiveSparkPlanExec.

I propose that we make getFinalPhysicalPlan public and part of the developer API, so that columnar plugins can call this method, determine if the final stage is columnar or not, and execute it appropriately. This would not affect any existing Spark functionality.

> [SQL] AQE does not support columnar execution for the final query stage
> ---
>
> Key: SPARK-35881
> URL: https://issues.apache.org/jira/browse/SPARK-35881
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.2.0
> Reporter: Andy Grove
> Priority: Major
>
> In AdaptiveSparkPlanExec, a query is broken down into stages, and these stages are executed until the entire query has been executed. These stages can be row-based or columnar. However, the final stage, produced by the private getFinalPhysicalPlan method, is always assumed to be row-based. The only way to execute the final stage is by calling the various doExecute methods on AdaptiveSparkPlanExec. The supportsColumnar method also always returns false, which is another limitation.
>
> However, AQE is special because we don't know if the final stage will be columnar or not until the child stages have been executed and the final stage has been re-planned and re-optimized, so we can't easily change the behavior of supportsColumnar. We can't just implement doExecuteColumnar because we don't know whether the final stage will be columnar or not until after we start executing the query.
>
> In the RAPIDS Accelerator for Apache Spark, we currently call the private getFinalPhysicalPlan method using reflection and then determine if that plan is columnar or not, and then call the appropriate doExecute method, bypassing the doExecute methods on AdaptiveSparkPlanExec.
>
> I propose that we make getFinalPhysicalPlan public and part of the developer API, so that columnar plugins can call this method, determine if the final stage is columnar or not, and execute it appropriately. This would not affect any existing Spark functionality.