[GitHub] [spark] Hisoka-X commented on a diff in pull request #41347: [SPARK-43838][SQL] Fix subquery on single table with having clause can't be optimized

via GitHub Thu, 13 Jul 2023 03:42:55 -0700


Hisoka-X commented on code in PR #41347:
URL: https://github.com/apache/spark/pull/41347#discussion_r1262375339



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/subquery.scala:
##########
@@ -271,7 +271,10 @@ case class ScalarSubquery(
     mayHaveCountBug: Option[Boolean] = None)
   extends SubqueryExpression(plan, outerAttrs, exprId, joinCond, hint) with 
Unevaluable {
   override def dataType: DataType = {
-    assert(plan.schema.fields.nonEmpty, "Scalar subquery should have only one 
column")
+    if (!plan.schema.fields.nonEmpty) {

Review Comment:
   Yes. Usually this error will be thrown by `checkAnalysis`, but we may call 
datatype in `DeduplicateRelations ` to cause this exception to be thrown. This 
change ensures that the thrown exception is consistent.
   ```log
   Caused by: sbt.ForkMain$ForkError: java.lang.AssertionError: assertion 
failed: Scalar subquery should have only one column
        at scala.Predef$.assert(Predef.scala:223)
        at 
org.apache.spark.sql.catalyst.expressions.ScalarSubquery.dataType(subquery.scala:274)
        at 
org.apache.spark.sql.catalyst.expressions.Alias.toAttribute(namedExpressions.scala:194)
        at 
org.apache.spark.sql.catalyst.analysis.DeduplicateRelations$$anonfun$findAliases$1.applyOrElse(DeduplicateRelations.scala:530)
        at 
org.apache.spark.sql.catalyst.analysis.DeduplicateRelations$$anonfun$findAliases$1.applyOrElse(DeduplicateRelations.scala:530)
        at 
scala.PartialFunction.$anonfun$runWith$1$adapted(PartialFunction.scala:145)
        at 
scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
        at 
scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
        at scala.collection.TraversableLike.collect(TraversableLike.scala:407)
        at scala.collection.TraversableLike.collect$(TraversableLike.scala:405)
        at scala.collection.AbstractTraversable.collect(Traversable.scala:108)
        at 
org.apache.spark.sql.catalyst.analysis.DeduplicateRelations$.findAliases(DeduplicateRelations.scala:530)
        at 
org.apache.spark.sql.catalyst.analysis.DeduplicateRelations$.org$apache$spark$sql$catalyst$analysis$DeduplicateRelations$$renewDuplicatedRelations(DeduplicateRelations.scala:120)
        at 
org.apache.spark.sql.catalyst.analysis.DeduplicateRelations$.apply(DeduplicateRelations.scala:40)
        at 
org.apache.spark.sql.catalyst.analysis.DeduplicateRelations$.apply(DeduplicateRelations.scala:38)
   
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [spark] Hisoka-X commented on a diff in pull request #41347: [SPARK-43838][SQL] Fix subquery on single table with having clause can't be optimized

Reply via email to