[GitHub] [spark] c21 commented on a change in pull request #31892: [SPARK-34796][SQL] Initialize counter variable for LIMIT code-gen in doProduce()

GitBox Thu, 18 Mar 2021 23:06:34 -0700


c21 commented on a change in pull request #31892:
URL: https://github.com/apache/spark/pull/31892#discussion_r597426996




##########
File path: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
##########
@@ -4097,6 +4097,25 @@ class SQLQuerySuite extends QueryTest with SharedSparkSession with AdaptiveSpark
       checkAnswer(df2, Seq(Row(2, 1, 1), Row(4, 2, 2)))
     }
   }
+
+  test("SPARK-34796: Avoid code-gen compilation error for LIMIT query") {
+    withTable("left_table", "empty_right_table", "output_table") {
+      spark.range(5).toDF("k").write.saveAsTable("left_table")
+      spark.range(0).toDF("k").write.saveAsTable("empty_right_table")
+
+      withSQLConf(SQLConf.ADAPTIVE_EXECUTION_ENABLED.key -> "false") {
+        spark.sql("CREATE TABLE output_table (k INT) USING parquet")
+        spark.sql(
+          """
+            |INSERT INTO TABLE output_table

Review comment:
       @cloud-fan - yes I think so.
   
   For the query:
   
   ```
   SELECT t1.k FROM left_table t1
   JOIN empty_right_table t2
   ON t1.k = t2.k
   LIMIT 3
   ```
   
   The physical plan is:
   
   
   ```
   CollectLimit 3
   +- *(2) Project [k#228L]
      +- *(2) BroadcastHashJoin [k#228L], [k#229L], Inner, BuildRight, false
         :- *(2) Filter isnotnull(k#228L)
         :  +- *(2) ColumnarToRow
         :     +- FileScan parquet default.left_table[k#228L] Batched: true, DataFilters: [isnotnull(k#228L)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/Users/chengsu/spark/sql/core/spark-warehouse/org.apache.spark.sq..., PartitionFilters: [], PushedFilters: [IsNotNull(k)], ReadSchema: struct<k:bigint>
         +- BroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint, false]),false), [id=#148]
            +- *(1) Filter isnotnull(k#229L)
               +- *(1) ColumnarToRow
                  +- FileScan parquet default.empty_right_table[k#229L] Batched: true, DataFilters: [isnotnull(k#229L)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/Users/chengsu/spark/sql/core/spark-warehouse/org.apache.spark.sq..., PartitionFilters: [], PushedFilters: [IsNotNull(k)], ReadSchema: struct<k:bigint>
   ```
   
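   With this query the error no longer reproduces, because the plan has no code-gen `BaseLimitExec` after `BroadcastHashJoin`: the limit is planned as `CollectLimit`, which carries no `*(n)` marker and so sits outside the whole-stage code-gen stages.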
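
   To make the plan reading above easier to check, here is a minimal sketch that scans the plan text for operators carrying the `*(n)` whole-stage code-gen marker. It is plain Scala string handling, not a Spark API. The object name `PlanCodegenCheck`, its helpers, and the "name ends with `Limit`" heuristic are assumptions made for illustration only, and the sample plan is the one above with the `FileScan` details shortened.

   ```scala
   // Hypothetical helper (not part of Spark): inspects the text of a physical plan.
   object PlanCodegenCheck {
     // Matches an operator line such as "+- *(2) Project [k#228L]" or "CollectLimit 3".
     // Group 1: optional whole-stage code-gen stage id; group 2: operator name.
     private val Operator = """^[\s:+\-]*(?:\*\((\d+)\)\s+)?([A-Za-z]+)\b.*$""".r

     /** Returns (operator name, code-gen stage id if present) for each operator line. */
     def operators(plan: String): Seq[(String, Option[Int])] =
       plan.linesIterator.toSeq.collect {
         case Operator(stage, name) => (name, Option(stage).map(_.toInt))
       }

     /** Limit-like operators inside a code-gen stage (assumed heuristic: name ends with "Limit"). */
     def codegenLimits(plan: String): Seq[String] =
       operators(plan).collect {
         case (name, Some(_)) if name.endsWith("Limit") => name
       }

     def main(args: Array[String]): Unit = {
       // Plan copied from the comment above, with FileScan details shortened.
       val plan =
         """CollectLimit 3
           |+- *(2) Project [k#228L]
           |   +- *(2) BroadcastHashJoin [k#228L], [k#229L], Inner, BuildRight, false
           |      :- *(2) Filter isnotnull(k#228L)
           |      :  +- *(2) ColumnarToRow
           |      :     +- FileScan parquet default.left_table[k#228L]
           |      +- BroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint, false]),false), [id=#148]
           |         +- *(1) Filter isnotnull(k#229L)
           |            +- *(1) ColumnarToRow
           |               +- FileScan parquet default.empty_right_table[k#229L]""".stripMargin

       // Print each operator with its code-gen stage, if any.
       operators(plan).foreach { case (name, stage) =>
         println(s"$name -> ${stage.map(id => s"code-gen stage $id").getOrElse("no code-gen")}")
       }
       // Limit operators that would take part in whole-stage code-gen.
       println(codegenLimits(plan))
     }
   }
   ```

   For the plan above, `codegenLimits` returns an empty sequence, since `CollectLimit` is the only limit operator and it has no `*(n)` prefix. A plan line such as `*(1) LocalLimit 3` would be reported instead.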




