rf972 commented on a change in pull request #29695:
URL: https://github.com/apache/spark/pull/29695#discussion_r524623692
##########
File path:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala
##########
@@ -73,33 +77,25 @@ object V2ScanRelationPushDown extends Rule[LogicalPlan] {
postScanFilters)
Aggregate(groupingExpressions, resultExpressions, plan)
} else {
- val resultAttributes = resultExpressions.map(_.toAttribute)
- .map ( e => e match { case a: AttributeReference => a })
- var index = 0
val aggOutputBuilder = ArrayBuilder.make[AttributeReference]
- for (a <- resultAttributes) {
- val newName = if (a.name.contains("FILTER")) {
- a.name.substring(0, a.name.indexOf("FILTER") - 1)
- } else if (a.name.contains("DISTINCT")) {
- a.name.replace("DISTINCT ", "")
- } else {
- a.name
- }
-
- aggOutputBuilder +=
- a.copy(name = newName,
- dataType = aggregates(index).dataType)(exprId = NamedExpression.newExprId,
- qualifier = a.qualifier)
- index += 1
+ for (a <- aggregates) {
+ aggOutputBuilder += AttributeReference(toPrettySQL(a), a.dataType)()
}
val aggOutput = aggOutputBuilder.result
- var newOutput = aggOutput
- for (col <- output) {
- if (!aggOutput.exists(_.name.contains(col.name))) {
- newOutput = col +: newOutput
+ val newOutputBuilder = ArrayBuilder.make[AttributeReference]
+ for (col1 <- output) {
+ var found = false
+ for (col2 <- aggOutput) {
+ if (contains(col2.name, col1.name)) {
Review comment:
Thanks very much for the fix! We have evaluated it and it is working
great for us.
We do have one minor nit.
We noticed that very short column names, such as col1 = 'i', can end up with an
exception, because the substring comparison can mistakenly match other portions
of the aggregate expression, such as the function name.
For example, the column name col1 = 'i' matches col2 = "min(k)", because
"min" itself contains an "i".
The following change seems to solve the issue for us:
if (col2.name.toLowerCase(Locale.ROOT).contains("(" + col1.name.toLowerCase(Locale.ROOT))) {
We prepend a single "(" because we saw cases where col2 looked like
"sum(CAST(I AS BIGINT))#15", and other cases where col2 was simply
"min(i)".
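
To make the false match concrete, here is a minimal, self-contained sketch of the two comparisons. The object name `ColumnMatchSketch` and the helpers `naiveMatch` and `parenMatch` are hypothetical names used only for illustration; `naiveMatch` is an assumed stand-in for the `contains(col2.name, col1.name)` helper in the diff, whose actual implementation is not shown here.

```scala
import java.util.Locale

object ColumnMatchSketch {
  // Assumed stand-in for the helper in the diff: a case-insensitive
  // substring test of the column name against the aggregate's name.
  def naiveMatch(aggName: String, colName: String): Boolean =
    aggName.toLowerCase(Locale.ROOT).contains(colName.toLowerCase(Locale.ROOT))

  // The comparison proposed in the comment: require an opening
  // parenthesis immediately before the column name.
  def parenMatch(aggName: String, colName: String): Boolean =
    aggName.toLowerCase(Locale.ROOT)
      .contains("(" + colName.toLowerCase(Locale.ROOT))

  def main(args: Array[String]): Unit = {
    val aggNames = Seq("min(k)", "min(i)", "sum(CAST(I AS BIGINT))#15")
    for (agg <- aggNames) {
      println(s"$agg: naive=${naiveMatch(agg, "i")}, paren=${parenMatch(agg, "i")}")
    }
  }
}
```

With the column name "i", the naive check reports a match for "min(k)" because of the "i" in "min", while the parenthesized check matches only "min(i)" and "sum(CAST(I AS BIGINT))#15". The parenthesized check is still a prefix match, so a column "i" would also match an aggregate such as "min(id)".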