Re: [PR] [SPARK-46485][SQL] V1Write should not add Sort when not needed [spark]

via GitHub Fri, 22 Dec 2023 08:20:40 -0800


EnricoMi commented on PR #44458:
URL: https://github.com/apache/spark/pull/44458#issuecomment-1867863923


   What do you think about making the user-desired row order within partitions 
explicit by opening `.write.sortBy` to `.write.partitionBy`? Right now, 
`.write.sortBy` is exclusively used by bucketing (`.write.bucketBy`).
   
   Instead of
   
       df.sortWithinPartitions("id", "time").write.partitionBy("id")
   
   users could explicitly sort within the partitions:
   
       df.write.partitionBy("id").sortBy("id", "time")
   
   Then that desire is explicitly available to the writer and does not need to 
be derived from the plan.
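
To make the difference concrete, here is a minimal plain-Python sketch of the two ways of asking for an order. It is a toy model, not Spark code: `write_partitioned` and its `partition_by` / `sort_by` parameters are hypothetical names invented for illustration, and the model simply assumes the writer preserves input order when no sort is requested.

```python
from collections import defaultdict
from operator import itemgetter


def write_partitioned(rows, partition_by, sort_by=()):
    """Toy model of a partitioned write (hypothetical, not the Spark API).

    Groups rows (dicts) by the partition column. If sort_by is given, the
    rows of each output partition are sorted by those columns; otherwise
    the input order is preserved within each partition.
    """
    files = defaultdict(list)
    for row in rows:
        files[row[partition_by]].append(row)
    if sort_by:
        key = itemgetter(*sort_by)
        for part_rows in files.values():
            part_rows.sort(key=key)
    return dict(files)


rows = [
    {"id": 2, "time": 5},
    {"id": 1, "time": 9},
    {"id": 2, "time": 1},
    {"id": 1, "time": 3},
]

# Explicit request: the writer is told the desired order directly.
explicit = write_partitioned(rows, partition_by="id", sort_by=("id", "time"))

# Implicit request: pre-sort the input and rely on the writer keeping it.
presorted = sorted(rows, key=itemgetter("id", "time"))
implicit = write_partitioned(presorted, partition_by="id")
```

Both calls produce the same per-partition order here, but only the first one hands the ordering to the writer as an argument. In the second, the writer would have to infer the intent from how its input was produced, which is the derivation from the plan that the proposal aims to avoid.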


