Re: [PR] [SPARK-51820][SQL] Move `UnresolvedOrdinal` construction before analysis to avoid issue with group by ordinal [spark]

2025-04-18 Thread via GitHub


dtenedor commented on code in PR #50606:
URL: https://github.com/apache/spark/pull/50606#discussion_r2047820851


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala:
##
@@ -6532,12 +6547,12 @@ class AstBuilder extends DataTypeAstBuilder
 case n: NamedExpression =>
   newGroupingExpressions += n
   newAggregateExpressions += n
-// If the grouping expression is an integer literal, create [[UnresolvedOrdinal]] and
-// [[UnresolvedPipeAggregateOrdinal]] expressions to represent it in the final grouping
-// and aggregate expressions, respectively. This will let the
+// If the grouping expression is an [[UnresolvedOrdinal]], replace the ordinal value and
+// create [[UnresolvedPipeAggregateOrdinal]] expressions to represent it in the final
+// grouping and aggregate expressions, respectively. This will let the
 // [[ResolveOrdinalInOrderByAndGroupBy]] rule detect the ordinal in the aggregate list
 // and replace it with the corresponding attribute from the child operator.
-case Literal(v: Int, IntegerType) if conf.groupByOrdinal =>
+case UnresolvedOrdinal(v: Int) =>
   newGroupingExpressions += UnresolvedOrdinal(newAggregateExpressions.length + 1)

Review Comment:
   Note that for pipe SQL syntax, GROUP BY ordinals work differently. In this case, the ordinals refer to the one-based indexes of the attributes returned from the child operator, not to the grouping expressions.
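   
   A minimal illustration of that difference (a hypothetical example, not from the PR; it assumes a SparkSession `spark`, a table `t(col1, col2)`, and the SQL pipe syntax with `|> AGGREGATE`):
   
   ```scala
   // Regular SQL: with spark.sql.groupByOrdinal enabled, GROUP BY 1 refers to the
   // first item in the SELECT list, so this groups by col2.
   spark.sql("SELECT col2, count(*) FROM t GROUP BY 1")
   
   // Pipe SQL: the ordinal refers to the child operator's output attributes, so
   // GROUP BY 1 groups by col1 (the first column produced by `FROM t`), which is
   // why the parser emits UnresolvedPipeAggregateOrdinal for this case.
   spark.sql("FROM t |> AGGREGATE count(*) GROUP BY 1")
   ```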






Re: [PR] [SPARK-51820][SQL] Move `UnresolvedOrdinal` construction before analysis to avoid issue with group by ordinal [spark]

2025-04-18 Thread via GitHub


cloud-fan closed pull request #50606: [SPARK-51820][SQL] Move `UnresolvedOrdinal` construction before analysis to avoid issue with group by ordinal
URL: https://github.com/apache/spark/pull/50606





Re: [PR] [SPARK-51820][SQL] Move `UnresolvedOrdinal` construction before analysis to avoid issue with group by ordinal [spark]

2025-04-17 Thread via GitHub


cloud-fan commented on PR #50606:
URL: https://github.com/apache/spark/pull/50606#issuecomment-2814257625

   thanks, merging to master!





Re: [PR] [SPARK-51820][SQL] Move `UnresolvedOrdinal` construction before analysis to avoid issue with group by ordinal [spark]

2025-04-17 Thread via GitHub


mihailotim-db commented on code in PR #50606:
URL: https://github.com/apache/spark/pull/50606#discussion_r2048387700


##
sql/core/src/main/scala/org/apache/spark/sql/classic/Dataset.scala:
##
@@ -929,7 +929,16 @@ class Dataset[T] private[sql](
   /** @inheritdoc */
   @scala.annotation.varargs
   def groupBy(cols: Column*): RelationalGroupedDataset = {
-RelationalGroupedDataset(toDF(), cols.map(_.expr), RelationalGroupedDataset.GroupByType)
+val groupingExpressionsWithReplacedOrdinals = cols.map { col => col.expr match {

Review Comment:
   Done!






Re: [PR] [SPARK-51820][SQL] Move `UnresolvedOrdinal` construction before analysis to avoid issue with group by ordinal [spark]

2025-04-17 Thread via GitHub


mihailotim-db commented on code in PR #50606:
URL: https://github.com/apache/spark/pull/50606#discussion_r2048385309


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala:
##
@@ -1825,24 +1825,32 @@ class AstBuilder extends DataTypeAstBuilder
   }
   visitNamedExpression(n)
 }.toSeq
+  val groupByExpressionsWithReplacedOrdinals =
+replaceOrdinalsInGroupingExpressions(groupByExpressions)
   if (ctx.GROUPING != null) {
 // GROUP BY ... GROUPING SETS (...)
 // `groupByExpressions` can be non-empty for Hive compatibility. It may add extra grouping
 // expressions that do not exist in GROUPING SETS (...), and the value is always null.
 // For example, `SELECT a, b, c FROM ... GROUP BY a, b, c GROUPING SETS (a, b)`, the output
 // of column `c` is always null.
 val groupingSets =
-  ctx.groupingSet.asScala.map(_.expression.asScala.map(e => expression(e)).toSeq)
-Aggregate(Seq(GroupingSets(groupingSets.toSeq, groupByExpressions)),
+  ctx.groupingSet.asScala.map(_.expression.asScala.map(e => {

Review Comment:
   Done! I moved it to a separate method with a scaladoc instead.
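   
   For readers of the thread, a rough sketch of what a helper with that name could look like (reconstructed from the literal-matching pattern shown elsewhere in this diff; the actual method and scaladoc in the PR may differ):
   
   ```scala
   /**
    * Replaces top-level integer literals in GROUP BY expressions with
    * [[UnresolvedOrdinal]] so that ordinals are injected before analysis.
    */
   private def replaceOrdinalsInGroupingExpressions(
       groupingExpressions: Seq[Expression]): Seq[Expression] = {
     groupingExpressions.map {
       // Only rewrite when GROUP BY ordinal resolution is enabled.
       case Literal(ordinal: Int, IntegerType) if conf.groupByOrdinal =>
         UnresolvedOrdinal(ordinal)
       case other => other
     }
   }
   ```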






Re: [PR] [SPARK-51820][SQL] Move `UnresolvedOrdinal` construction before analysis to avoid issue with group by ordinal [spark]

2025-04-17 Thread via GitHub


mihailotim-db commented on code in PR #50606:
URL: https://github.com/apache/spark/pull/50606#discussion_r2048521406


##
sql/core/src/test/scala/org/apache/spark/sql/analysis/resolver/AggregateResolverSuite.scala:
##
@@ -44,12 +44,6 @@ class AggregateResolverSuite extends QueryTest with SharedSparkSession {
 resolverRunner.resolve(query)
   }
 
-  test("Valid group by ordinal") {

Review Comment:
   Yep, that's better. Done!






Re: [PR] [SPARK-51820][SQL] Move `UnresolvedOrdinal` construction before analysis to avoid issue with group by ordinal [spark]

2025-04-17 Thread via GitHub


mihailotim-db commented on code in PR #50606:
URL: https://github.com/apache/spark/pull/50606#discussion_r2048513651


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/dsl/package.scala:
##
@@ -446,11 +447,16 @@ package object dsl {
   def sortBy(sortExprs: SortOrder*): LogicalPlan = Sort(sortExprs, false, logicalPlan)
 
   def groupBy(groupingExprs: Expression*)(aggregateExprs: Expression*): LogicalPlan = {
+// Replace top-level integer literals with ordinals, if `groupByOrdinal` is enabled.
+val groupingExprsWithReplacedOrdinals = groupingExprs.map {

Review Comment:
   Fixed!



##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala:
##
@@ -1825,24 +1825,25 @@ class AstBuilder extends DataTypeAstBuilder
   }
   visitNamedExpression(n)
 }.toSeq
+  val groupByExpressionsWithReplacedOrdinals =
+replaceOrdinalsInGroupingExpressions(groupByExpressions)

Review Comment:
   Sounds good!






Re: [PR] [SPARK-51820][SQL] Move `UnresolvedOrdinal` construction before analysis to avoid issue with group by ordinal [spark]

2025-04-17 Thread via GitHub


mihailotim-db commented on code in PR #50606:
URL: https://github.com/apache/spark/pull/50606#discussion_r2048515428


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala:
##
@@ -1979,19 +1978,7 @@ class Analyzer(override val catalogManager: CatalogManager) extends RuleExecutor
   throw QueryCompilationErrors.groupByPositionRefersToAggregateFunctionError(
 index, ordinalExpr)
 } else {

Review Comment:
   Nice!






Re: [PR] [SPARK-51820][SQL] Move `UnresolvedOrdinal` construction before analysis to avoid issue with group by ordinal [spark]

2025-04-17 Thread via GitHub


mihailotim-db commented on PR #50606:
URL: https://github.com/apache/spark/pull/50606#issuecomment-2812189399

   > Just one thing we need to check - if the view is persisted with `ORDER_BY_ORDINAL` conf ON, what happens if we read this view with `ORDER_BY_ORDINAL` conf `OFF`? This might be an issue, since we moved the conf check to the parser.
   > 
   > The view must keep its confs.
   
   Confirmed the correct behavior in the shell. With the conf off, the query errors out with `MISSING_AGGREGATION`, as expected:
   
![image](https://github.com/user-attachments/assets/46b34017-7ad3-4bce-95a5-ecf0335516e6)
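   
   For reference, the failing case can be reproduced with a query of this shape (my own sketch, assuming a table `t(a INT, b INT)`; the exact view setup shown in the screenshot is not reproduced here):
   
   ```scala
   // With GROUP BY ordinal disabled, `GROUP BY 1` groups by the constant 1, so the
   // non-aggregated column `a` in the select list triggers MISSING_AGGREGATION.
   spark.conf.set("spark.sql.groupByOrdinal", "false")
   spark.sql("SELECT a, count(b) FROM t GROUP BY 1").show()
   ```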
   





Re: [PR] [SPARK-51820][SQL] Move `UnresolvedOrdinal` construction before analysis to avoid issue with group by ordinal [spark]

2025-04-17 Thread via GitHub


mihailotim-db commented on code in PR #50606:
URL: https://github.com/apache/spark/pull/50606#discussion_r2048372228


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/dsl/package.scala:
##
@@ -446,11 +447,15 @@ package object dsl {
   def sortBy(sortExprs: SortOrder*): LogicalPlan = Sort(sortExprs, false, logicalPlan)
 
   def groupBy(groupingExprs: Expression*)(aggregateExprs: Expression*): LogicalPlan = {
+val groupingExprsWithReplacedOrdinals = groupingExprs.map {

Review Comment:
   Done!






Re: [PR] [SPARK-51820][SQL] Move `UnresolvedOrdinal` construction before analysis to avoid issue with group by ordinal [spark]

2025-04-17 Thread via GitHub


mihailotim-db commented on code in PR #50606:
URL: https://github.com/apache/spark/pull/50606#discussion_r2048461271


##
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/GroupByOrdinalsRepeatedAnalysisSuite.scala:
##
@@ -17,63 +17,42 @@
 
 package org.apache.spark.sql.catalyst.analysis
 
-import org.apache.spark.sql.catalyst.analysis.TestRelations.{testRelation, testRelation2}
+import org.apache.spark.sql.catalyst.analysis.TestRelations.testRelation
 import org.apache.spark.sql.catalyst.dsl.expressions._
 import org.apache.spark.sql.catalyst.dsl.plans._
 import org.apache.spark.sql.catalyst.expressions.{GenericInternalRow, Literal}
 import org.apache.spark.sql.catalyst.plans.logical.LocalRelation
-import org.apache.spark.sql.internal.SQLConf
 
-class SubstituteUnresolvedOrdinalsSuite extends AnalysisTest {
-  private lazy val a = testRelation2.output(0)
-  private lazy val b = testRelation2.output(1)
+class GroupByOrdinalsRepeatedAnalysisSuite extends AnalysisTest {
 
   test("unresolved ordinal should not be unresolved") {
 // Expression OrderByOrdinal is unresolved.
 assert(!UnresolvedOrdinal(0).resolved)
   }
 
-  test("order by ordinal") {
-// Tests order by ordinal, apply single rule.
-val plan = testRelation2.orderBy(Literal(1).asc, Literal(2).asc)
+  test("SPARK-45920: group by ordinal repeated analysis") {
+val plan = testRelation.groupBy(Literal(1))(Literal(100).as("a")).analyze
 comparePlans(
-  SubstituteUnresolvedOrdinals.apply(plan),
-  testRelation2.orderBy(UnresolvedOrdinal(1).asc, UnresolvedOrdinal(2).asc))
-
-// Tests order by ordinal, do full analysis
-checkAnalysis(plan, testRelation2.orderBy(a.asc, b.asc))
+  plan,
+  testRelation.groupBy(Literal(1))(Literal(100).as("a")).analyze
+)
 
-// order by ordinal can be turned off by config
-withSQLConf(SQLConf.ORDER_BY_ORDINAL.key -> "false") {

Review Comment:
   We are removing the `SubstituteUnresolvedOrdinals` object, and we also have golden file tests for these cases, so I think it would be redundant to rewrite them again.






Re: [PR] [SPARK-51820][SQL] Move `UnresolvedOrdinal` construction before analysis to avoid issue with group by ordinal [spark]

2025-04-17 Thread via GitHub


vladimirg-db commented on code in PR #50606:
URL: https://github.com/apache/spark/pull/50606#discussion_r2048438196


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/dsl/package.scala:
##
@@ -446,11 +447,16 @@ package object dsl {
   def sortBy(sortExprs: SortOrder*): LogicalPlan = Sort(sortExprs, false, logicalPlan)
 
   def groupBy(groupingExprs: Expression*)(aggregateExprs: Expression*): LogicalPlan = {
+// Replace top-level integer literals with ordinals, if `groupByOrdinal` is enabled.
+val groupingExprsWithReplacedOrdinals = groupingExprs.map {

Review Comment:
   The ordinals are not replaced; they are "injected":
   
   ```suggestion
   val groupingExprsWithOrdinals = groupingExprs.map {
   ```



##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala:
##
@@ -1979,19 +1978,7 @@ class Analyzer(override val catalogManager: CatalogManager) extends RuleExecutor
   throw QueryCompilationErrors.groupByPositionRefersToAggregateFunctionError(
 index, ordinalExpr)
 } else {

Review Comment:
   You can drop this `else`, since there's a `throw` above.
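   
   A generic illustration of the suggested shape (placeholder names, not the actual `Analyzer` code): once the error branch throws, the remaining logic can sit after the `if` without an `else`.
   
   ```scala
   def pickByOrdinal(index: Int, values: Seq[Int]): Int = {
     if (index < 1 || index > values.length) {
       throw new IllegalArgumentException(s"ordinal $index is out of range")
     }
     // No `else` needed: reaching this point means the check above passed.
     values(index - 1)
   }
   ```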



##
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/GroupByOrdinalsRepeatedAnalysisSuite.scala:
##
@@ -17,63 +17,42 @@
 
 package org.apache.spark.sql.catalyst.analysis
 
-import org.apache.spark.sql.catalyst.analysis.TestRelations.{testRelation, testRelation2}
+import org.apache.spark.sql.catalyst.analysis.TestRelations.testRelation
 import org.apache.spark.sql.catalyst.dsl.expressions._
 import org.apache.spark.sql.catalyst.dsl.plans._
 import org.apache.spark.sql.catalyst.expressions.{GenericInternalRow, Literal}
 import org.apache.spark.sql.catalyst.plans.logical.LocalRelation
-import org.apache.spark.sql.internal.SQLConf
 
-class SubstituteUnresolvedOrdinalsSuite extends AnalysisTest {
-  private lazy val a = testRelation2.output(0)
-  private lazy val b = testRelation2.output(1)
+class GroupByOrdinalsRepeatedAnalysisSuite extends AnalysisTest {
 
   test("unresolved ordinal should not be unresolved") {
 // Expression OrderByOrdinal is unresolved.
 assert(!UnresolvedOrdinal(0).resolved)
   }
 
-  test("order by ordinal") {
-// Tests order by ordinal, apply single rule.
-val plan = testRelation2.orderBy(Literal(1).asc, Literal(2).asc)
+  test("SPARK-45920: group by ordinal repeated analysis") {
+val plan = testRelation.groupBy(Literal(1))(Literal(100).as("a")).analyze
 comparePlans(
-  SubstituteUnresolvedOrdinals.apply(plan),
-  testRelation2.orderBy(UnresolvedOrdinal(1).asc, UnresolvedOrdinal(2).asc))
-
-// Tests order by ordinal, do full analysis
-checkAnalysis(plan, testRelation2.orderBy(a.asc, b.asc))
+  plan,
+  testRelation.groupBy(Literal(1))(Literal(100).as("a")).analyze
+)
 
-// order by ordinal can be turned off by config
-withSQLConf(SQLConf.ORDER_BY_ORDINAL.key -> "false") {

Review Comment:
   Why do we remove this piece of test?



##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/SubstituteUnresolvedOrdinals.scala:
##
@@ -1,64 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one or more
- * contributor license agreements.  See the NOTICE file distributed with
- * this work for additional information regarding copyright ownership.
- * The ASF licenses this file to You under the Apache License, Version 2.0
- * (the "License"); you may not use this file except in compliance with
- * the License.  You may obtain a copy of the License at
- *
- *http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-package org.apache.spark.sql.catalyst.analysis
-
-import org.apache.spark.sql.catalyst.expressions.{BaseGroupingSets, Expression, Literal, SortOrder}
-import org.apache.spark.sql.catalyst.plans.logical.{Aggregate, LogicalPlan, Sort}
-import org.apache.spark.sql.catalyst.rules.Rule
-import org.apache.spark.sql.catalyst.trees.CurrentOrigin.withOrigin
-import org.apache.spark.sql.catalyst.trees.TreePattern._
-import org.apache.spark.sql.types.IntegerType
-
-/**
- * Replaces ordinal in 'order by' or 'group by' with UnresolvedOrdinal expression.
- */
-object SubstituteUnresolvedOrdinals extends Rule[LogicalPlan] {

Review Comment:
   Nice!



##
sql/core/src/test/scala/org/apache/spark/sql/analysis/resolver/AggregateResolverSuite.scala:
##
@@ -44,12 +44,6 @@ class AggregateResolverSuite extends QueryTest with SharedSparkSession {
 resolverRunner.resolve(query)
   }
 
-  test("Valid group by ordinal") {

Rev

Re: [PR] [SPARK-51820][SQL] Move `UnresolvedOrdinal` construction before analysis to avoid issue with group by ordinal [spark]

2025-04-17 Thread via GitHub


vladimirg-db commented on PR #50606:
URL: https://github.com/apache/spark/pull/50606#issuecomment-2812125611

   Just one thing we need to check - if the view is persisted with `ORDER_BY_ORDINAL` conf ON, what happens if we read this view with `ORDER_BY_ORDINAL` conf `OFF`? This might be an issue, since we moved the conf check to the parser.
   
   The view must keep its confs.





Re: [PR] [SPARK-51820][SQL] Move `UnresolvedOrdinal` construction before analysis to avoid issue with group by ordinal [spark]

2025-04-17 Thread via GitHub


mihailotim-db commented on code in PR #50606:
URL: https://github.com/apache/spark/pull/50606#discussion_r2048385581


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala:
##
@@ -6558,6 +6573,31 @@ class AstBuilder extends DataTypeAstBuilder
 }
   }
 
+  private def visitSortItemAndReplaceOrdinals(sortItemContext: SortItemContext) = {

Review Comment:
   Done!
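   
   For context, roughly what a method with this name might do (an assumed sketch based on the surrounding diff, not the merged code; the real version may differ, e.g. in how it preserves the parse origin):
   
   ```scala
   // Parse the sort item as usual, then inject UnresolvedOrdinal for a top-level
   // integer literal when ORDER BY ordinal resolution is enabled.
   private def visitSortItemAndReplaceOrdinals(sortItemContext: SortItemContext): SortOrder = {
     val sortOrder = visitSortItem(sortItemContext)
     sortOrder.child match {
       case Literal(ordinal: Int, IntegerType) if conf.orderByOrdinal =>
         sortOrder.copy(child = UnresolvedOrdinal(ordinal))
       case _ => sortOrder
     }
   }
   ```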






Re: [PR] [SPARK-51820][SQL] Move `UnresolvedOrdinal` construction before analysis to avoid issue with group by ordinal [spark]

2025-04-17 Thread via GitHub


mihailotim-db commented on PR #50606:
URL: https://github.com/apache/spark/pull/50606#issuecomment-2812045197

   > shall we also handle Spark Connect queries in `SparkConnectPlanner`?
   
   Sure, done!





Re: [PR] [SPARK-51820][SQL] Move `UnresolvedOrdinal` construction before analysis to avoid issue with group by ordinal [spark]

2025-04-17 Thread via GitHub


mihailotim-db commented on code in PR #50606:
URL: https://github.com/apache/spark/pull/50606#discussion_r2048368533


##
sql/core/src/test/scala/org/apache/spark/sql/analysis/resolver/AggregateResolverSuite.scala:
##
@@ -44,12 +44,6 @@ class AggregateResolverSuite extends QueryTest with SharedSparkSession {
 resolverRunner.resolve(query)

Review Comment:
   Done!






Re: [PR] [SPARK-51820][SQL] Move `UnresolvedOrdinal` construction before analysis to avoid issue with group by ordinal [spark]

2025-04-17 Thread via GitHub


mihailotim-db commented on code in PR #50606:
URL: https://github.com/apache/spark/pull/50606#discussion_r2048369191


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala:
##
@@ -6532,12 +6547,12 @@ class AstBuilder extends DataTypeAstBuilder
 case n: NamedExpression =>
   newGroupingExpressions += n
   newAggregateExpressions += n
-// If the grouping expression is an integer literal, create [[UnresolvedOrdinal]] and
-// [[UnresolvedPipeAggregateOrdinal]] expressions to represent it in the final grouping
-// and aggregate expressions, respectively. This will let the
+// If the grouping expression is an [[UnresolvedOrdinal]], replace the ordinal value and
+// create [[UnresolvedPipeAggregateOrdinal]] expressions to represent it in the final
+// grouping and aggregate expressions, respectively. This will let the
 // [[ResolveOrdinalInOrderByAndGroupBy]] rule detect the ordinal in the aggregate list
 // and replace it with the corresponding attribute from the child operator.
-case Literal(v: Int, IntegerType) if conf.groupByOrdinal =>
+case UnresolvedOrdinal(v: Int) =>
   newGroupingExpressions += UnresolvedOrdinal(newAggregateExpressions.length + 1)

Review Comment:
   Thanks for the clarification!






Re: [PR] [SPARK-51820][SQL] Move `UnresolvedOrdinal` construction before analysis to avoid issue with group by ordinal [spark]

2025-04-16 Thread via GitHub


cloud-fan commented on PR #50606:
URL: https://github.com/apache/spark/pull/50606#issuecomment-2811844154

   shall we also handle Spark Connect queries in `SparkConnectPlanner`?





Re: [PR] [SPARK-51820][SQL] Move `UnresolvedOrdinal` construction before analysis to avoid issue with group by ordinal [spark]

2025-04-16 Thread via GitHub


cloud-fan commented on code in PR #50606:
URL: https://github.com/apache/spark/pull/50606#discussion_r2048273150


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/dsl/package.scala:
##
@@ -446,11 +447,15 @@ package object dsl {
   def sortBy(sortExprs: SortOrder*): LogicalPlan = Sort(sortExprs, false, logicalPlan)
 
   def groupBy(groupingExprs: Expression*)(aggregateExprs: Expression*): LogicalPlan = {
+val groupingExprsWithReplacedOrdinals = groupingExprs.map {

Review Comment:
   +1






Re: [PR] [SPARK-51820][SQL] Move `UnresolvedOrdinal` construction before analysis to avoid issue with group by ordinal [spark]

2025-04-16 Thread via GitHub


dtenedor commented on code in PR #50606:
URL: https://github.com/apache/spark/pull/50606#discussion_r2047817277


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/dsl/package.scala:
##
@@ -446,11 +447,15 @@ package object dsl {
   def sortBy(sortExprs: SortOrder*): LogicalPlan = Sort(sortExprs, false, logicalPlan)
 
   def groupBy(groupingExprs: Expression*)(aggregateExprs: Expression*): LogicalPlan = {
+val groupingExprsWithReplacedOrdinals = groupingExprs.map {

Review Comment:
   can you please add a comment here saying what this part is doing?



##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala:
##
@@ -1825,24 +1825,32 @@ class AstBuilder extends DataTypeAstBuilder
   }
   visitNamedExpression(n)
 }.toSeq
+  val groupByExpressionsWithReplacedOrdinals =
+replaceOrdinalsInGroupingExpressions(groupByExpressions)
   if (ctx.GROUPING != null) {
 // GROUP BY ... GROUPING SETS (...)
 // `groupByExpressions` can be non-empty for Hive compatibility. It may add extra grouping
 // expressions that do not exist in GROUPING SETS (...), and the value is always null.
 // For example, `SELECT a, b, c FROM ... GROUP BY a, b, c GROUPING SETS (a, b)`, the output
 // of column `c` is always null.
 val groupingSets =
-  ctx.groupingSet.asScala.map(_.expression.asScala.map(e => expression(e)).toSeq)
-Aggregate(Seq(GroupingSets(groupingSets.toSeq, groupByExpressions)),
+  ctx.groupingSet.asScala.map(_.expression.asScala.map(e => {

Review Comment:
   can you please add a comment here saying what this part is doing?



##
sql/core/src/test/scala/org/apache/spark/sql/analysis/resolver/AggregateResolverSuite.scala:
##
@@ -44,12 +44,6 @@ class AggregateResolverSuite extends QueryTest with SharedSparkSession {
 resolverRunner.resolve(query)

Review Comment:
   Can you copy these test contents to the Jira so we don't forget?



##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala:
##
@@ -6558,6 +6573,31 @@ class AstBuilder extends DataTypeAstBuilder
 }
   }
 
+  private def visitSortItemAndReplaceOrdinals(sortItemContext: SortItemContext) = {

Review Comment:
   can you please add a comment here saying what these new methods are doing?



##
sql/core/src/main/scala/org/apache/spark/sql/classic/Dataset.scala:
##
@@ -929,7 +929,16 @@ class Dataset[T] private[sql](
   /** @inheritdoc */
   @scala.annotation.varargs
   def groupBy(cols: Column*): RelationalGroupedDataset = {
-RelationalGroupedDataset(toDF(), cols.map(_.expr), RelationalGroupedDataset.GroupByType)
+val groupingExpressionsWithReplacedOrdinals = cols.map { col => col.expr match {

Review Comment:
   can you please add a comment here saying what this part is doing?
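   
   Roughly, the part in question appears to do the following (an assumed sketch based on the diff above, not the verbatim PR code):
   
   ```scala
   // For Dataset.groupBy, inject UnresolvedOrdinal for top-level integer literal
   // columns when GROUP BY ordinal resolution is enabled, then group as before.
   val groupingExpressionsWithReplacedOrdinals = cols.map { col =>
     col.expr match {
       case Literal(ordinal: Int, IntegerType)
           if sparkSession.sessionState.conf.groupByOrdinal =>
         UnresolvedOrdinal(ordinal)
       case other => other
     }
   }
   RelationalGroupedDataset(
     toDF(), groupingExpressionsWithReplacedOrdinals, RelationalGroupedDataset.GroupByType)
   ```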


