Github user kiszk commented on a diff in the pull request:
https://github.com/apache/spark/pull/21621#discussion_r197633691
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala ---
@@ -556,6 +556,17 @@ class DataFrameFunctionsSuite extends QueryTest with SharedSQLContext {
     checkAnswer(df8.selectExpr("arrays_zip(v1, v2)"), expectedValue8)
   }
+  test("SPARK-24633: arrays_zip splits input processing correctly") {
+    Seq("true", "false").foreach { wholestageCodegenEnabled =>
+      withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> wholestageCodegenEnabled) {
+        val df = spark.range(1)
+        val exprs = (0 to 5).map(x => array($"id" + lit(x)))
--- End diff --
Both with and without this PR, I see the split methods in the generated code
when running the original test case (i.e. `(0 to 5).map...`).
```
private ArrayData ArraysZip_0(InternalRow i) {
  ArrayData[] arrVals_0 = new ArrayData[6];
  int biggestCardinality_0 = 0;
  ArrayData value_0 = null;

  biggestCardinality_0 = getValuesAndCardinalities_0_0(i, arrVals_0, biggestCardinality_0);
  biggestCardinality_0 = getValuesAndCardinalities_0_1(i, arrVals_0, biggestCardinality_0);
  biggestCardinality_0 = getValuesAndCardinalities_0_2(i, arrVals_0, biggestCardinality_0);
  boolean isNull_0 = biggestCardinality_0 == -1;
...
```