cloud-fan commented on code in PR #56249:
URL: https://github.com/apache/spark/pull/56249#discussion_r3338010193


##########
sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala:
##########
@@ -1993,6 +1994,40 @@ class DataFrameFunctionsSuite extends SharedSparkSession {
     )
   }
 
+  test("array_join with nullable nullReplacement under whole-stage codegen") {
+    // With a nullable nullReplacement column and an upstream IsNotNull
+    // filter that tightens the array (and delimiter) to non-nullable, whole-stage codegen used to
+    // build the joined string but leave ev.isNull = true, discarding every row as NULL. The result
+    // must match interpreted eval(). The source is materialized via a cached temp view (an
+    // InMemoryRelation), so the plan is not folded to interpreted eval by ConvertToLocalRelation.
+    withTempView("array_join_codegen") {
+      Seq(
+        (Seq[String]("a", null, "b"), ",", "NR"),
+        (Seq[String]("a", null, "b"), ",", null),
+        (Seq[String]("x", "y"), "-", "NR")
+      ).toDF("arr", "delim_col", "repl_col").createOrReplaceTempView("array_join_codegen")
+      spark.catalog.cacheTable("array_join_codegen")
+
+      val query =
+        "SELECT array_join(arr, ',', repl_col) FROM array_join_codegen " +

Review Comment:
   The delimiter passed to `array_join` is the literal `','`, which is inherently non-nullable, and `delim_col` is never referenced by the `array_join` call. The `AND delim_col IS NOT NULL` predicate is therefore dead: it tightens nothing that `array_join` consumes, and it filters no rows, since all three rows have a non-null `delim_col`. For the same reason, the comment above's "an upstream IsNotNull filter that tightens the array (and delimiter) to non-nullable" is inaccurate about the delimiter: nothing tightens a delimiter column here.
   
   Non-blocking: the test still hits the buggy `else` branch and fails pre-fix. But it's a more faithful repro (and the comment/filter become meaningful) if `delim_col` is the delimiter:
   ```suggestion
           "SELECT array_join(arr, delim_col, repl_col) FROM array_join_codegen " +
   ```
   If you take this, the third row's expected result becomes `Row("x-y")` (its `delim_col` is `-`), so update the `checkAnswer` rows accordingly. Alternatively, just drop the dead `AND delim_col IS NOT NULL` and the "(and delimiter)" parenthetical.
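
   For reference, a minimal plain-Scala sketch of what each variant of the query should return for the three rows, with no Spark dependency. `ArrayJoinModel` and `arrayJoin` are hypothetical names, not Spark APIs. The sketch assumes that interpreted `array_join` returns NULL when the array, the delimiter, or a supplied `nullReplacement` is NULL; that is my reading of the expression's `eval()`, so treat the second row's NULL as an assumption to confirm against the suite.

   ```scala
   // Plain-Scala model of three-argument array_join(arr, delim, repl).
   // Assumption: a NULL array, delimiter, or nullReplacement yields NULL (None here).
   object ArrayJoinModel {
     def arrayJoin(
         arr: Option[Seq[String]],
         delim: Option[String],
         repl: Option[String]): Option[String] = {
       for {
         a <- arr
         d <- delim
         r <- repl
       } yield a.map(e => if (e == null) r else e).mkString(d)
     }

     // The three rows from the test: (arr, delim_col, repl_col).
     val rows: Seq[(Seq[String], String, String)] = Seq(
       (Seq[String]("a", null, "b"), ",", "NR"),
       (Seq[String]("a", null, "b"), ",", null),
       (Seq[String]("x", "y"), "-", "NR"))

     // Current query: the delimiter is the literal ','; delim_col is ignored.
     val withLiteral: Seq[Option[String]] =
       rows.map { case (a, _, r) => arrayJoin(Some(a), Some(","), Option(r)) }

     // Suggested query: the delimiter is read from delim_col.
     val withColumn: Seq[Option[String]] =
       rows.map { case (a, d, r) => arrayJoin(Some(a), Option(d), Option(r)) }
   }
   ```

   Under that assumption only the third row differs between the two variants (`x,y` with the literal, `x-y` with `delim_col`), which is the single `checkAnswer` row the suggestion would change.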



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

