EnricoMi commented on PR #44458:
URL: https://github.com/apache/spark/pull/44458#issuecomment-1867988850
This is not about optimal ordering (I presume you refer to partitions being ordered by the partition columns, which is optimal because only one file writer needs to be open at a time), but about additional ordering (some extra order that the writer task itself does not require). Sorted partitions are very useful when downstream systems that consume the written data can rely on an order beyond the partition keys, so users do care about the in-partition order.
I am happy as long as `df.repartition("id").sortWithinPartitions("id",
"time").write.partitionBy("id")` keeps being supported.
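As a minimal, hypothetical sketch (plain Python, not Spark) of the property that snippet asks to preserve: rows are grouped by the partition key `id` and sorted by (`id`, `time`) within each partition, mirroring what `repartition("id").sortWithinPartitions("id", "time")` guarantees before `write.partitionBy("id")` runs. The record layout and helper name are illustrative assumptions, not Spark internals.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical records: (id, time, value) tuples standing in for DataFrame rows.
records = [
    ("a", 3, "x"), ("b", 1, "y"), ("a", 1, "z"), ("b", 2, "w"), ("a", 2, "v"),
]

def partition_and_sort(rows):
    """Mimic repartition("id") + sortWithinPartitions("id", "time"):
    group rows by the partition key, then sort each partition by (id, time)."""
    by_id = sorted(rows, key=itemgetter(0))  # bring each id's rows together
    return {
        pid: sorted(group, key=itemgetter(0, 1))  # in-partition order: (id, time)
        for pid, group in groupby(by_id, key=itemgetter(0))
    }

partitions = partition_and_sort(records)
# A downstream reader of any one partition can now rely on ascending time order.
for pid, rows in partitions.items():
    times = [t for _, t, _ in rows]
    assert times == sorted(times)
```

The point of the sketch is the invariant checked at the end: a consumer reading the files written for a single partition value sees rows already ordered by `time`, which is exactly the behavior the comment asks to keep supported.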
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]