EnricoMi commented on code in PR #38356:
URL: https://github.com/apache/spark/pull/38356#discussion_r1021635684
########## sql/core/src/test/scala/org/apache/spark/sql/sources/PartitionedWriteSuite.scala: ##########
@@ -220,6 +220,23 @@ class PartitionedWriteSuite extends QueryTest with SharedSparkSession {
       }
     }
   }
+
+  test("SPARK-40885: V1 write uses the sort with partitionBy operator") {
+    withTempPath { f =>
+      Seq((20, 30, "partition"), (15, 20, "partition"),
+        (30, 70, "partition"), (18, 40, "partition"))
+        .toDF("id", "sort_col", "p")
+        .repartition(1)
+        .sortWithinPartitions("p", "sort_col")
+        .write
+        .partitionBy("p")

Review Comment:
   Adding `empty2null` throws null values and empty values into the same partition, and the user has no way to make Spark treat them as distinct values. But changing this smells like a breaking change, unless some config allows bringing back the current behaviour.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

---------------------------------------------------------------------
For additional commands, e-mail: reviews-h...@spark.apache.org
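The collapse the review comment describes can be sketched without Spark. This is a minimal illustration, assuming `empty2null` behaves as its name suggests in Spark's V1 write path: empty strings in a string partition column are rewritten to null before partition directories are computed, so `p = ""` and `p = null` become indistinguishable (both land in the default null partition, e.g. `__HIVE_DEFAULT_PARTITION__`):

```scala
// Hypothetical stand-in for Spark's empty2null rewrite (assumption:
// it maps empty partition-column strings to null, leaving other
// values untouched).
def empty2null(s: String): String =
  if (s != null && s.isEmpty) null else s

// Both an empty string and a real null map to null, so rows with
// p = "" and p = null end up in the same partition directory, with
// no way for the user to keep them distinct.
val partitionValues = Seq("", null, "a").map(empty2null).distinct
// partitionValues == Seq(null, "a") — "" and null have collapsed
```

This is why the comment calls the change breaking: once written, a reader cannot recover whether a row's partition value was originally empty or null, unless a config restores the old behaviour.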