[jira] [Updated] (SPARK-35881) [SQL] AQE does not support columnar execution for the final query stage

2021-07-30 Thread Thomas Graves (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves updated SPARK-35881:
--
Fix Version/s: 3.2.0

> [SQL] AQE does not support columnar execution for the final query stage
> ---
>
> Key: SPARK-35881
> URL: https://issues.apache.org/jira/browse/SPARK-35881
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.0.3, 3.1.2, 3.2.0
> Reporter: Andy Grove
> Assignee: Andy Grove
> Priority: Major
> Fix For: 3.2.0, 3.3.0
>
>
> In AdaptiveSparkPlanExec, a query is broken down into stages and these stages 
> are executed until the entire query has been executed. These stages can be 
> row-based or columnar. However, the final stage, produced by the private 
> getFinalPhysicalPlan method, is always assumed to be row-based. The only way
> to execute the final stage is by calling the various doExecute methods on 
> AdaptiveSparkPlanExec, and doExecuteColumnar is not implemented. The 
> supportsColumnar method also always returns false.
> In the RAPIDS Accelerator for Apache Spark, we currently call the private 
> getFinalPhysicalPlan method using reflection and then determine if that plan 
> is columnar or not, and then call the appropriate doExecute method, bypassing 
> the doExecute methods on AdaptiveSparkPlanExec. We would like a supported 
> mechanism for executing a columnar AQE plan so that we do not need to use 
> reflection.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-35881) [SQL] AQE does not support columnar execution for the final query stage

2021-07-30 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-35881:
--
Fix Version/s: (was: 3.2.0)

> [SQL] AQE does not support columnar execution for the final query stage
> ---
>
> Key: SPARK-35881
> URL: https://issues.apache.org/jira/browse/SPARK-35881
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.0.3, 3.1.2, 3.2.0
> Reporter: Andy Grove
> Assignee: Andy Grove
> Priority: Major
> Fix For: 3.3.0
>
>
> In AdaptiveSparkPlanExec, a query is broken down into stages and these stages 
> are executed until the entire query has been executed. These stages can be 
> row-based or columnar. However, the final stage, produced by the private 
> getFinalPhysicalPlan method, is always assumed to be row-based. The only way
> to execute the final stage is by calling the various doExecute methods on 
> AdaptiveSparkPlanExec, and doExecuteColumnar is not implemented. The 
> supportsColumnar method also always returns false.
> In the RAPIDS Accelerator for Apache Spark, we currently call the private 
> getFinalPhysicalPlan method using reflection and then determine if that plan 
> is columnar or not, and then call the appropriate doExecute method, bypassing 
> the doExecute methods on AdaptiveSparkPlanExec. We would like a supported 
> mechanism for executing a columnar AQE plan so that we do not need to use 
> reflection.



[jira] [Updated] (SPARK-35881) [SQL] AQE does not support columnar execution for the final query stage

2021-06-29 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove updated SPARK-35881:
---
Affects Version/s: 3.0.3, 3.1.2

> [SQL] AQE does not support columnar execution for the final query stage
> ---
>
> Key: SPARK-35881
> URL: https://issues.apache.org/jira/browse/SPARK-35881
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.0.3, 3.1.2, 3.2.0
> Reporter: Andy Grove
> Priority: Major
>
> In AdaptiveSparkPlanExec, a query is broken down into stages and these stages 
> are executed until the entire query has been executed. These stages can be 
> row-based or columnar. However, the final stage, produced by the private 
> getFinalPhysicalPlan method, is always assumed to be row-based. The only way
> to execute the final stage is by calling the various doExecute methods on 
> AdaptiveSparkPlanExec, and doExecuteColumnar is not implemented. The 
> supportsColumnar method also always returns false.
> In the RAPIDS Accelerator for Apache Spark, we currently call the private 
> getFinalPhysicalPlan method using reflection and then determine if that plan 
> is columnar or not, and then call the appropriate doExecute method, bypassing 
> the doExecute methods on AdaptiveSparkPlanExec. We would like a supported 
> mechanism for executing a columnar AQE plan so that we do not need to use 
> reflection.



[jira] [Updated] (SPARK-35881) [SQL] AQE does not support columnar execution for the final query stage

2021-06-28 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove updated SPARK-35881:
---
Description: 
In AdaptiveSparkPlanExec, a query is broken down into stages and these stages 
are executed until the entire query has been executed. These stages can be 
row-based or columnar. However, the final stage, produced by the private 
getFinalPhysicalPlan method, is always assumed to be row-based. The only way to
execute the final stage is by calling the various doExecute methods on 
AdaptiveSparkPlanExec, and doExecuteColumnar is not implemented. The 
supportsColumnar method also always returns false.

In the RAPIDS Accelerator for Apache Spark, we currently call the private 
getFinalPhysicalPlan method using reflection and then determine if that plan is 
columnar or not, and then call the appropriate doExecute method, bypassing the 
doExecute methods on AdaptiveSparkPlanExec. We would like a supported mechanism 
for executing a columnar AQE plan so that we do not need to use reflection.
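
The reflection workaround described above can be sketched roughly as follows. This is a minimal, self-contained Java illustration, not the RAPIDS Accelerator code and not Spark's classes: Plan, FinalStage, AdaptivePlan and ReflectionWorkaround are hypothetical stand-ins, and only the method names getFinalPhysicalPlan, supportsColumnar, doExecute and doExecuteColumnar are taken from the description.

```java
import java.lang.reflect.Method;

// Hypothetical stand-in for a physical plan node; this is not Spark's SparkPlan.
interface Plan {
    boolean supportsColumnar();
    String doExecute();          // stands in for row-based execution
    String doExecuteColumnar();  // stands in for columnar execution
}

// Hypothetical final stage whose format is fixed when it is created.
final class FinalStage implements Plan {
    private final boolean columnar;
    FinalStage(boolean columnar) { this.columnar = columnar; }
    public boolean supportsColumnar() { return columnar; }
    public String doExecute() { return "rows"; }
    public String doExecuteColumnar() { return "batches"; }
}

// Hypothetical stand-in for AdaptiveSparkPlanExec: it always reports
// row-based execution and hides the final plan behind a private method.
final class AdaptivePlan implements Plan {
    private final boolean finalStageIsColumnar;
    AdaptivePlan(boolean finalStageIsColumnar) {
        this.finalStageIsColumnar = finalStageIsColumnar;
    }
    public boolean supportsColumnar() { return false; }
    public String doExecute() { return getFinalPhysicalPlan().doExecute(); }
    public String doExecuteColumnar() {
        throw new UnsupportedOperationException("doExecuteColumnar is not implemented");
    }
    private Plan getFinalPhysicalPlan() { return new FinalStage(finalStageIsColumnar); }
}

final class ReflectionWorkaround {
    // Looks up the private method by name, then dispatches on the format of
    // the plan it returns, bypassing the doExecute methods on AdaptivePlan.
    static String execute(AdaptivePlan plan) {
        try {
            Method m = AdaptivePlan.class.getDeclaredMethod("getFinalPhysicalPlan");
            m.setAccessible(true);
            Plan finalPlan = (Plan) m.invoke(plan);
            return finalPlan.supportsColumnar()
                ? finalPlan.doExecuteColumnar()
                : finalPlan.doExecute();
        } catch (ReflectiveOperationException e) {
            // A rename or signature change of the private method ends up here.
            throw new IllegalStateException("reflection on private method failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(execute(new AdaptivePlan(true)));   // prints "batches"
        System.out.println(execute(new AdaptivePlan(false)));  // prints "rows"
    }
}
```

The lookup by method name is the fragile part: it compiles regardless of what the target class contains and only fails at run time, which is why a supported mechanism is preferable.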


  was:
In AdaptiveSparkPlanExec, a query is broken down into stages and these stages 
are executed until the entire query has been executed. These stages can be 
row-based or columnar. However, the final stage, produced by the private 
getFinalPhysicalPlan method, is always assumed to be row-based. The only way to
execute the final stage is by calling the various doExecute methods on
AdaptiveSparkPlanExec. The supportsColumnar method also always returns false,
which is another limitation. However, AQE is special because we don't know if
the final stage will be columnar or not until the child stages have been
executed and the final stage has been re-planned and re-optimized, so we can't
easily change the behavior of supportsColumnar. We can't just implement
doExecuteColumnar because we don't know whether the final stage will be
columnar or not until after we start executing the query.

In the RAPIDS Accelerator for Apache Spark, we currently call the private
getFinalPhysicalPlan method using reflection and then determine if that plan is
columnar or not, and then call the appropriate doExecute method, bypassing
the doExecute methods on AdaptiveSparkPlanExec.

I propose that we make getFinalPhysicalPlan public, and part of the developer 
API, so that columnar plugins can call this method and determine if the final 
stage is columnar or not, and execute it appropriately. This would not affect 
any existing Spark functionality. We also need a mechanism for invoking 
finalPlanUpdate after the query has been executed.


> [SQL] AQE does not support columnar execution for the final query stage
> ---
>
> Key: SPARK-35881
> URL: https://issues.apache.org/jira/browse/SPARK-35881
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.2.0
> Reporter: Andy Grove
> Priority: Major
>
> In AdaptiveSparkPlanExec, a query is broken down into stages and these stages 
> are executed until the entire query has been executed. These stages can be 
> row-based or columnar. However, the final stage, produced by the private 
> getFinalPhysicalPlan method, is always assumed to be row-based. The only way
> to execute the final stage is by calling the various doExecute methods on 
> AdaptiveSparkPlanExec, and doExecuteColumnar is not implemented. The 
> supportsColumnar method also always returns false.
> In the RAPIDS Accelerator for Apache Spark, we currently call the private 
> getFinalPhysicalPlan method using reflection and then determine if that plan 
> is columnar or not, and then call the appropriate doExecute method, bypassing 
> the doExecute methods on AdaptiveSparkPlanExec. We would like a supported 
> mechanism for executing a columnar AQE plan so that we do not need to use 
> reflection.



[jira] [Updated] (SPARK-35881) [SQL] AQE does not support columnar execution for the final query stage

2021-06-24 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove updated SPARK-35881:
---
Description: 
In AdaptiveSparkPlanExec, a query is broken down into stages and these stages 
are executed until the entire query has been executed. These stages can be 
row-based or columnar. However, the final stage, produced by the private 
getFinalPhysicalPlan method, is always assumed to be row-based. The only way to
execute the final stage is by calling the various doExecute methods on
AdaptiveSparkPlanExec. The supportsColumnar method also always returns false,
which is another limitation. However, AQE is special because we don't know if
the final stage will be columnar or not until the child stages have been
executed and the final stage has been re-planned and re-optimized, so we can't
easily change the behavior of supportsColumnar. We can't just implement
doExecuteColumnar because we don't know whether the final stage will be
columnar or not until after we start executing the query.

In the RAPIDS Accelerator for Apache Spark, we currently call the private
getFinalPhysicalPlan method using reflection and then determine if that plan is
columnar or not, and then call the appropriate doExecute method, bypassing
the doExecute methods on AdaptiveSparkPlanExec.

I propose that we make getFinalPhysicalPlan public, and part of the developer 
API, so that columnar plugins can call this method and determine if the final 
stage is columnar or not, and execute it appropriately. This would not affect 
any existing Spark functionality. We also need a mechanism for invoking 
finalPlanUpdate after the query has been executed.
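
To make the proposal concrete, here is a minimal Java sketch of how a columnar plugin might use a public getFinalPhysicalPlan together with a finalPlanUpdate hook. Everything in it is an assumption made for illustration: Stage, RowStage, ColumnarStage, AdaptiveQuery and PluginDriver are hypothetical stand-ins, not Spark classes, and the boolean passed to AdaptiveQuery merely stands in for whatever the re-optimizer decides once the child stages have run.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an executable query stage.
interface Stage {
    boolean supportsColumnar();
    String execute();
}

final class RowStage implements Stage {
    public boolean supportsColumnar() { return false; }
    public String execute() { return "rows"; }
}

final class ColumnarStage implements Stage {
    public boolean supportsColumnar() { return true; }
    public String execute() { return "batches"; }
}

// Hypothetical stand-in for the adaptive plan node. The format of the final
// stage is unknown until the child stages have executed and the final stage
// has been re-planned.
final class AdaptiveQuery {
    private final boolean columnarAfterReplan; // assumed outcome of re-planning
    private final List<String> events = new ArrayList<>();
    private Stage finalPlan; // null until the child stages have run

    AdaptiveQuery(boolean columnarAfterReplan) {
        this.columnarAfterReplan = columnarAfterReplan;
    }

    boolean isFinalPlanKnown() { return finalPlan != null; }

    // Public counterpart of the private method named in the description.
    Stage getFinalPhysicalPlan() {
        if (finalPlan == null) {
            events.add("child stages executed");
            finalPlan = columnarAfterReplan ? new ColumnarStage() : new RowStage();
            events.add("final stage re-planned");
        }
        return finalPlan;
    }

    // Hook the caller invokes once the final stage has been executed.
    void finalPlanUpdate() { events.add("final plan update"); }

    List<String> events() { return events; }
}

final class PluginDriver {
    // The plugin asks for the final plan, inspects its format, runs it,
    // and then notifies the adaptive node.
    static String run(AdaptiveQuery query) {
        Stage stage = query.getFinalPhysicalPlan();
        String label = stage.supportsColumnar() ? "columnar" : "row-based";
        String result = stage.execute();
        query.finalPlanUpdate();
        return label + ":" + result;
    }

    public static void main(String[] args) {
        System.out.println(run(new AdaptiveQuery(true)));   // prints "columnar:batches"
        System.out.println(run(new AdaptiveQuery(false)));  // prints "row-based:rows"
    }
}
```

In this sketch isFinalPlanKnown returns false until getFinalPhysicalPlan has been called, which mirrors why supportsColumnar cannot be answered reliably before execution starts.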


  was:
In AdaptiveSparkPlanExec, a query is broken down into stages and these stages 
are executed until the entire query has been executed. These stages can be 
row-based or columnar. However, the final stage, produced by the private 
getFinalPhysicalPlan method, is always assumed to be row-based. The only way to
execute the final stage is by calling the various doExecute methods on
AdaptiveSparkPlanExec. The supportsColumnar method also always returns false,
which is another limitation. However, AQE is special because we don't know if
the final stage will be columnar or not until the child stages have been
executed and the final stage has been re-planned and re-optimized, so we can't
easily change the behavior of supportsColumnar. We can't just implement
doExecuteColumnar because we don't know whether the final stage will be
columnar or not until after we start executing the query.

In the RAPIDS Accelerator for Apache Spark, we currently call the private
getFinalPhysicalPlan method using reflection and then determine if that plan is
columnar or not, and then call the appropriate doExecute method, bypassing
the doExecute methods on AdaptiveSparkPlanExec.

I propose that we make getFinalPhysicalPlan public, and part of the developer 
API, so that columnar plugins can call this method and determine if the final 
stage is columnar or not, and execute it appropriately. This would not affect 
any existing Spark functionality.


> [SQL] AQE does not support columnar execution for the final query stage
> ---
>
> Key: SPARK-35881
> URL: https://issues.apache.org/jira/browse/SPARK-35881
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.2.0
> Reporter: Andy Grove
> Priority: Major
>
> In AdaptiveSparkPlanExec, a query is broken down into stages and these stages 
> are executed until the entire query has been executed. These stages can be 
> row-based or columnar. However, the final stage, produced by the private 
> getFinalPhysicalPlan method, is always assumed to be row-based. The only way
> to execute the final stage is by calling the various doExecute methods on
> AdaptiveSparkPlanExec. The supportsColumnar method also always returns false,
> which is another limitation. However, AQE is special because we don't know if
> the final stage will be columnar or not until the child stages have been
> executed and the final stage has been re-planned and re-optimized, so we
> can't easily change the behavior of supportsColumnar. We can't just implement
> doExecuteColumnar because we don't know whether the final stage will be
> columnar or not until after we start executing the query.
> In the RAPIDS Accelerator for Apache Spark, we currently call the private
> getFinalPhysicalPlan method using reflection and then determine if that plan
> is columnar or not, and then call the appropriate doExecute method,
> bypassing the doExecute methods on AdaptiveSparkPlanExec.
> I propose that we make getFinalPhysicalPlan public, and part of the developer 
> API, so that columnar plugins 

[jira] [Updated] (SPARK-35881) [SQL] AQE does not support columnar execution for the final query stage

2021-06-24 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove updated SPARK-35881:
---
Description: 
In AdaptiveSparkPlanExec, a query is broken down into stages and these stages 
are executed until the entire query has been executed. These stages can be 
row-based or columnar. However, the final stage, produced by the private 
getFinalPhysicalPlan method, is always assumed to be row-based. The only way to
execute the final stage is by calling the various doExecute methods on
AdaptiveSparkPlanExec. The supportsColumnar method also always returns false,
which is another limitation. However, AQE is special because we don't know if
the final stage will be columnar or not until the child stages have been
executed and the final stage has been re-planned and re-optimized, so we can't
easily change the behavior of supportsColumnar. We can't just implement
doExecuteColumnar because we don't know whether the final stage will be
columnar or not until after we start executing the query.

In the RAPIDS Accelerator for Apache Spark, we currently call the private
getFinalPhysicalPlan method using reflection and then determine if that plan is
columnar or not, and then call the appropriate doExecute method, bypassing
the doExecute methods on AdaptiveSparkPlanExec.

I propose that we make getFinalPhysicalPlan public, and part of the developer 
API, so that columnar plugins can call this method and determine if the final 
stage is columnar or not, and execute it appropriately. This would not affect 
any existing Spark functionality.


  was:
In AdaptiveSparkPlanExec, a query is broken down into stages and these stages 
are executed until the entire query has been executed. These stages can be 
row-based or columnar. However, the final stage, produced by the private 
getFinalPhysicalPlan method, is always assumed to be row-based. The only way to
execute the final stage is by calling the various doExecute methods on
AdaptiveSparkPlanExec. The supportsColumnar method also always returns false,
which is another limitation. However, AQE is special because we don't know if
the final stage will be columnar or not until the child stages have been
executed and the final stage has been re-planned and re-optimized, so we can't
easily change the behavior of supportsColumnar. We can't just implement
doExecuteColumnar because we don't know whether the final stage will be
columnar or not until after we start executing the query.

In the RAPIDS Accelerator for Apache Spark, we currently call the private 
getFinalPhysicalPlan method using reflection and then invoke that plan, 
bypassing the doExecute methods on AdaptiveSparkPlanExec.

I propose that we make getFinalPhysicalPlan public, and part of the developer 
API, so that columnar plugins can call this method and determine if the final 
stage is columnar or not, and execute it appropriately. This would not affect 
any existing Spark functionality.


> [SQL] AQE does not support columnar execution for the final query stage
> ---
>
> Key: SPARK-35881
> URL: https://issues.apache.org/jira/browse/SPARK-35881
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.2.0
> Reporter: Andy Grove
> Priority: Major
>
> In AdaptiveSparkPlanExec, a query is broken down into stages and these stages 
> are executed until the entire query has been executed. These stages can be 
> row-based or columnar. However, the final stage, produced by the private 
> getFinalPhysicalPlan method, is always assumed to be row-based. The only way
> to execute the final stage is by calling the various doExecute methods on
> AdaptiveSparkPlanExec. The supportsColumnar method also always returns false,
> which is another limitation. However, AQE is special because we don't know if
> the final stage will be columnar or not until the child stages have been
> executed and the final stage has been re-planned and re-optimized, so we
> can't easily change the behavior of supportsColumnar. We can't just implement
> doExecuteColumnar because we don't know whether the final stage will be
> columnar or not until after we start executing the query.
> In the RAPIDS Accelerator for Apache Spark, we currently call the private
> getFinalPhysicalPlan method using reflection and then determine if that plan
> is columnar or not, and then call the appropriate doExecute method,
> bypassing the doExecute methods on AdaptiveSparkPlanExec.
> I propose that we make getFinalPhysicalPlan public, and part of the developer 
> API, so that columnar plugins can call this method and determine if the final 
> stage is columnar or not, and execute it appropriately. This would not affect 
> any existing Spark functionality.
>