cloud-fan commented on a change in pull request #35130:
URL: https://github.com/apache/spark/pull/35130#discussion_r782227932



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala
##########
@@ -220,10 +222,16 @@ abstract class JdbcDialect extends Serializable with Logging{
         Some(s"SUM($distinct$column)")
       case _: CountStar =>
         Some("COUNT(*)")
-      case f: GeneralAggregateFunc if f.name() == "AVG" =>
-        assert(f.inputs().length == 1)
-        val distinct = if (f.isDistinct) "DISTINCT " else ""
-        Some(s"AVG($distinct${f.inputs().head})")
+      case avg: Avg =>
+        if (avg.column.fieldNames.length != 1) return None
+        val distinct = if (avg.isDistinct) "DISTINCT " else ""
+        val column = quoteIdentifier(avg.column.fieldNames.head)
+        if (supportCompletePushDown) {
+          Some(s"AVG($distinct$column)")
+        } else {
+          // For simplify code, we not reuse exists `SUM` or `COUNT`.
+          Some(s"SUM($distinct$column), COUNT($distinct$column)")

Review comment:
   Since partial agg pushdown requires Spark to do the final agg, Spark must be
fully aware of how AVG is translated, so that the final agg matches the output
of the data source scan with the partial agg pushed down.
   
   I think the process should be:
   1. Spark translates the catalyst Aggregate operator to a DS V2 `Aggregation`.
   2. Spark calls `supportCompletePushDown` to check whether it can completely push down the agg.
   3. The JDBC source returns false from `supportCompletePushDown` if AVG is present.
   4. Spark gives up on complete agg push down and tries partial agg push down.
   5. Spark splits AVG into 2 functions, SUM and COUNT, and pushes the `Aggregation` to the JDBC source.
   6. Spark constructs the final agg and calculates AVG as SUM / COUNT.
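
   To make steps 3, 5 and 6 concrete, here is a minimal sketch in plain Scala. It does not use Spark's real classes: `AggFunc`, `Sum`, `Count`, `Avg`, `splitAvg` and `finalAvg` are hypothetical stand-ins made up for illustration, and `supportCompletePushDown` only borrows the name from the steps above (its signature here is an assumption).

```scala
// Hypothetical, simplified stand-ins for DS V2 aggregate functions.
// These are NOT Spark's real classes; they only illustrate the flow.
sealed trait AggFunc
final case class Sum(column: String) extends AggFunc
final case class Count(column: String) extends AggFunc
final case class Avg(column: String) extends AggFunc

object PartialAvgPushDown {
  // Step 3 (assumed signature): the source declines complete push down
  // whenever AVG is present.
  def supportCompletePushDown(aggs: Seq[AggFunc]): Boolean =
    !aggs.exists(_.isInstanceOf[Avg])

  // Step 5: the engine, not the source, rewrites AVG(c) into SUM(c) and
  // COUNT(c), so it knows exactly which columns the scan will return.
  def splitAvg(aggs: Seq[AggFunc]): Seq[AggFunc] = aggs.flatMap {
    case Avg(c) => Seq(Sum(c), Count(c))
    case other  => Seq(other)
  }

  // Step 6: the final agg merges per-partition (sum, count) pairs and
  // computes AVG as SUM / COUNT. Returns None when no rows were counted.
  def finalAvg(partials: Seq[(Double, Long)]): Option[Double] = {
    val totalSum = partials.map(_._1).sum
    val totalCount = partials.map(_._2).sum
    if (totalCount == 0L) None else Some(totalSum / totalCount)
  }
}
```

   For example, with two partitions returning (sum = 10.0, count = 2) and (sum = 20.0, count = 3), `finalAvg` yields 30.0 / 5 = 6.0, whereas averaging the per-partition averages (5.0 and about 6.67) would give about 5.83. The final agg has to be built from the pushed SUM and COUNT, which is presumably why the split belongs in Spark and not inside the dialect.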




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


