[GitHub] [spark] rf972 commented on a change in pull request #29695: [SPARK-22390][SPARK-32833][SQL] [WIP]JDBC V2 Datasource aggregate push down

GitBox Mon, 16 Nov 2020 13:46:23 -0800


rf972 commented on a change in pull request #29695:
URL: https://github.com/apache/spark/pull/29695#discussion_r524623692




##########
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala
##########
@@ -73,33 +77,25 @@ object V2ScanRelationPushDown extends Rule[LogicalPlan] {
               postScanFilters)
             Aggregate(groupingExpressions, resultExpressions, plan)
           } else {
-            val resultAttributes = resultExpressions.map(_.toAttribute)
-              .map ( e => e match { case a: AttributeReference => a })
-            var index = 0
             val aggOutputBuilder = ArrayBuilder.make[AttributeReference]
-            for (a <- resultAttributes) {
-              val newName = if (a.name.contains("FILTER")) {
-                a.name.substring(0, a.name.indexOf("FILTER") - 1)
-              } else if (a.name.contains("DISTINCT")) {
-                a.name.replace("DISTINCT ", "")
-              } else {
-                a.name
-              }
-
-              aggOutputBuilder +=
-                a.copy(name = newName,
-                  dataType = aggregates(index).dataType)(exprId = 
NamedExpression.newExprId,
-                  qualifier = a.qualifier)
-              index += 1
+            for (a <- aggregates) {
+                aggOutputBuilder += AttributeReference(toPrettySQL(a), 
a.dataType)()
             }
             val aggOutput = aggOutputBuilder.result
 
-            var newOutput = aggOutput
-            for (col <- output) {
-              if (!aggOutput.exists(_.name.contains(col.name))) {
-                newOutput = col +: newOutput
+            val newOutputBuilder = ArrayBuilder.make[AttributeReference]
+            for (col1 <- output) {
+              var found = false
+              for (col2 <- aggOutput) {
+                  if (contains(col2.name, col1.name)) {

Review comment:
       Thanks very much for the fix !  We have evaluated it and it is working 
great for us !  
   We do have one minor nit. 
   We noticed that very short column names like col = 'i' can end up with an 
exception, because the compare can mistakenly match portions of the aggregate 
expression for short column names.
   For example, the column name col= 'i' can match with col2 = "min(k)".
   The following fix seems to solve the issue for us:
   if (col2.name.toLowerCase(Locale.ROOT).contains("(" + 
col1.name.toLowerCase(Locale.ROOT))) {
   The single "(" came about since we saw cases where col2 looked like: 
"sum(CAST(I AS BIGINT))#15" or in other cases col2 was something like this: 
"min(i)"
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [spark] rf972 commented on a change in pull request #29695: [SPARK-22390][SPARK-32833][SQL] [WIP]JDBC V2 Datasource aggregate push down

Reply via email to