beliefer opened a new pull request #35932:
URL: https://github.com/apache/spark/pull/35932


   ### What changes were proposed in this pull request?
   Currently, Spark DS V2 aggregate push-down doesn't support a Project with aliases.
   
   See 
https://github.com/apache/spark/blob/c91c2e9afec0d5d5bbbd2e155057fe409c5bb928/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala#L96
   
   This PR makes aggregate push-down work correctly when the Project contains aliases.
   
   **The first example:**
   the original plan is shown below:
   ```
   Aggregate [DEPT#0], [DEPT#0, sum(mySalary#8) AS total#14]
   +- Project [DEPT#0, SALARY#2 AS mySalary#8]
      +- ScanBuilderHolder [DEPT#0, NAME#1, SALARY#2, BONUS#3], 
RelationV2[DEPT#0, NAME#1, SALARY#2, BONUS#3] test.employee, 
JDBCScanBuilder(org.apache.spark.sql.test.TestSparkSession@77978658,StructType(StructField(DEPT,IntegerType,true),StructField(NAME,StringType,true),StructField(SALARY,DecimalType(20,2),true),StructField(BONUS,DoubleType,true)),org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions@5f8da82)
   ```
   If the aggregate can be completely pushed down, the plan becomes:
   ```
   Project [DEPT#0, SUM(SALARY)#18 AS sum(SALARY#2)#13 AS total#14]
   +- RelationV2[DEPT#0, SUM(SALARY)#18] test.employee
   ```
   If the aggregate can only be partially pushed down, the plan becomes:
   ```
   Aggregate [DEPT#0], [DEPT#0, sum(cast(SUM(SALARY)#18 as decimal(20,2))) AS 
total#14]
   +- RelationV2[DEPT#0, SUM(SALARY)#18] test.employee
   ```
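   For reference, a query of roughly this shape would produce the first plan above. This is a hypothetical sketch: the table registration and column names are assumed from the plan dump, and a live `SparkSession` over the JDBC source is required.
   
   ```scala
   // Hypothetical sketch (assumes a SparkSession `spark` with test.employee
   // registered as a JDBC table named "employee").
   import org.apache.spark.sql.functions.sum
   import spark.implicits._
   
   val df = spark.table("employee")
     .select($"DEPT", $"SALARY".as("mySalary"))   // Project with an alias
     .groupBy($"DEPT")
     .agg(sum($"mySalary").as("total"))           // Aggregate over the aliased column
   ```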
   
   
   **The second example:**
   the original plan is shown below:
   ```
   Aggregate [myDept#33], [myDept#33, sum(mySalary#34) AS total#40]
   +- Project [DEPT#25 AS myDept#33, SALARY#27 AS mySalary#34]
      +- ScanBuilderHolder [DEPT#25, NAME#26, SALARY#27, BONUS#28], 
RelationV2[DEPT#25, NAME#26, SALARY#27, BONUS#28] test.employee, 
JDBCScanBuilder(org.apache.spark.sql.test.TestSparkSession@25c4f621,StructType(StructField(DEPT,IntegerType,true),StructField(NAME,StringType,true),StructField(SALARY,DecimalType(20,2),true),StructField(BONUS,DoubleType,true)),org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions@345d641e)
   ```
   If the aggregate can be completely pushed down, the plan becomes:
   ```
   Project [DEPT#25 AS myDept#33, SUM(SALARY)#44 AS sum(SALARY#27)#39 AS 
total#40]
   +- RelationV2[DEPT#25, SUM(SALARY)#44] test.employee
   ```
   If the aggregate can only be partially pushed down, the plan becomes:
   ```
   Aggregate [DEPT#25], [DEPT#25 AS myDept#33, sum(cast(SUM(SALARY)#56 as 
decimal(20,2))) AS total#52]
   +- RelationV2[DEPT#25, SUM(SALARY)#56] test.employee
   ```
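   Similarly, the second plan corresponds to a query where both the grouping column and the aggregated column are aliased in the Project. Again a hypothetical sketch, with names assumed from the plan dump:
   
   ```scala
   // Hypothetical sketch: both the grouping column and the aggregated column
   // are aliased before the aggregation.
   import org.apache.spark.sql.functions.sum
   import spark.implicits._
   
   val df2 = spark.table("employee")
     .select($"DEPT".as("myDept"), $"SALARY".as("mySalary"))  // aliases on both columns
     .groupBy($"myDept")
     .agg(sum($"mySalary").as("total"))
   ```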
   
   
   
   ### Why are the changes needed?
   Projects with aliases are common in real queries, so supporting them makes DS V2 aggregate push-down applicable to many more plans.
   
   
   ### Does this PR introduce _any_ user-facing change?
   Yes.
   DS V2 aggregate push-down now supports a Project with aliases.
   
   
   ### How was this patch tested?
   New tests.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


