AngersZhuuuu opened a new pull request #33810: URL: https://github.com/apache/spark/pull/33810
#### What changes were proposed in this pull request?

This PR fixes an issue where the field names of structs generated by the `arrays_zip` function could be unexpectedly rewritten by the analyzer/optimizer.
Here is an example.

```
val df = sc.parallelize(Seq((Array(1, 2), Array(3, 4)))).toDF("a1", "b1").selectExpr("arrays_zip(a1, b1) as zipped")
df.printSchema
root
 |-- zipped: array (nullable = true)
 |    |-- element: struct (containsNull = false)
 |    |    |-- a1: integer (nullable = true)    // OK. a1 is expected name
 |    |    |-- b1: integer (nullable = true)    // OK. b1 is expected name

df.explain
== Physical Plan ==
*(1) Project [arrays_zip(_1#3, _2#4) AS zipped#12]    // Not OK. field names are re-written as _1 and _2 respectively

df.write.parquet("/tmp/test.parquet")
val df2 = spark.read.parquet("/tmp/test.parquet")
df2.printSchema
root
 |-- zipped: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- _1: integer (nullable = true)    // Not OK. a1 is expected but got _1
 |    |    |-- _2: integer (nullable = true)    // Not OK. b1 is expected but got _2
```

This issue happens when aliases are eliminated by `AliasHelper.replaceAliasButKeepName` or `AliasHelper.trimNonTopLevelAliases`, which are called via the analyzer/optimizer:

- `spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala`, line 883 in b89cd8d: `upper.map(replaceAliasButKeepName(_, aliases))`
- `spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala`, line 3759 in b89cd8d: `val cleanedProjectList = projectList.map(trimNonTopLevelAliases)`

I looked for other functions that could be affected by this issue, but have found only `arrays_zip` so far.

To fix this issue, this PR changes the definition of `ArraysZip` to retain the field names, so that they are not rewritten by the analyzer/optimizer.

### Why are the changes needed?

This is apparently a bug.

### Does this PR introduce any user-facing change?

No. After this change, the field names are no longer rewritten, which is the behavior users should expect.
#### How was this patch tested?

New tests.
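As an illustration only, the round trip from the example above could be turned into a standalone check along the following lines. This is a minimal sketch, not the test code added by this PR: the object name `ArraysZipFieldNamesCheck`, the helper `zippedFieldNames`, the local `SparkSession` settings, and the temporary directory are all assumptions made for this example.

```scala
import java.nio.file.Files

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{ArrayType, StructType}

// Hypothetical standalone check; names and setup are illustrative and are
// not taken from the tests in this PR.
object ArraysZipFieldNamesCheck {

  // Writes the zipped column to Parquet, reads it back, and returns the
  // field names of the struct elements as they appear in the reloaded schema.
  def zippedFieldNames(spark: SparkSession): Seq[String] = {
    import spark.implicits._

    // Same input as the example in the description: the tuple columns start
    // out as _1 and _2 and are renamed to a1 and b1 by toDF.
    val df = spark.sparkContext
      .parallelize(Seq((Array(1, 2), Array(3, 4))))
      .toDF("a1", "b1")
      .selectExpr("arrays_zip(a1, b1) as zipped")

    // Fresh temporary location (an assumption) so repeated runs do not collide.
    val path = Files.createTempDirectory("arrays_zip_check")
      .resolve("test.parquet")
      .toString
    df.write.parquet(path)

    spark.read.parquet(path).schema("zipped").dataType match {
      case ArrayType(struct: StructType, _) => struct.fieldNames.toSeq
      case other => sys.error(s"Unexpected type for zipped: $other")
    }
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("arrays-zip-field-names")
      .getOrCreate()
    try {
      val names = zippedFieldNames(spark)
      // Per the description above, the names should survive as a1 and b1
      // once the fix is in place, rather than reverting to _1 and _2.
      assert(names == Seq("a1", "b1"), s"unexpected field names: $names")
    } finally {
      spark.stop()
    }
  }
}
```

Reading the schema back from Parquet, rather than from `df.printSchema`, matters here because the description shows the in-memory schema looking correct while the written files carry the rewritten names.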