HeartSaVioR commented on a change in pull request #26173: [SPARK-29503][SQL]
Remove conversion CreateNamedStruct to CreateNamedStructUnsafe
URL: https://github.com/apache/spark/pull/26173#discussion_r339968513
##########
File path:
sql/core/src/test/scala/org/apache/spark/sql/DataFrameComplexTypeSuite.scala
##########
@@ -64,6 +68,24 @@ class DataFrameComplexTypeSuite extends QueryTest with
SharedSparkSession {
val ds100_5 = Seq(S100_5()).toDS()
ds100_5.rdd.count
}
+
+ test("SPARK-29503 nest unsafe struct inside safe array") {
+ withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> "false") {
+ val df = spark.sparkContext.parallelize(Seq(Seq(1, 2, 3))).toDF("items")
+
+ // items: Seq[Int] => items.map { item => Seq(Struct(item)) }
+ val result = df.select(
+ new Column(MapObjects(
+ (item: Expression) => array(struct(new Column(item))).expr,
Review comment:
I haven't spent more time trying another approach, as this seems to be a clean
and simple reproducer. I'm not convinced it is an invalid reproducer just
because it pulls in the catalyst package. Catalyst could analyze a query and
inject it where necessary.
I understand you'd like to revisit #25745, but that was WIP and it didn't come
with any performance numbers. I'd rather choose "safeness" over "speed", and we
haven't even established that there is a notable difference between the two.
This was the only case where MapObjects could contain an unsafe struct. By
allowing it, safe and unsafe structs can get mixed up, which leads to corner
cases.