PartitionBy and SortWithinPartitions

Nikhil Goyal Fri, 03 Jun 2022 08:14:31 -0700

Hi folks,

We are trying to run the following:
`
df.coalesce(1000).sortWithinPartitions("col1").write.mode('overwrite').partitionBy("col2").parquet(...)
`


I can see that coalesce(1000) is applied to every sub-partition. What I
want to know is whether sortWithinPartitions("col1") takes effect before or
after partitionBy. In other words, does Spark first partition by col2 and
then sort by col1, or does it sort first and then partition?
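
For anyone who wants to reproduce this, here is a minimal sketch of how the ordering could be checked empirically on a small DataFrame. It assumes a local SparkSession and a scratch output directory; the helper names (is_sorted, check_sort_survives_partition_by) and the toy data are hypothetical and only for illustration. It writes with the same chain of calls as above, then reads each output file back and reports whether col1 is still in order. A result on toy data is only an observation, not a guarantee of what Spark does in general.

```python
import glob
import os
import tempfile


def is_sorted(values):
    """Return True if values are in non-decreasing order."""
    return all(a <= b for a, b in zip(values, values[1:]))


def check_sort_survives_partition_by(sort_cols):
    """Write a toy DataFrame with partitionBy("col2") and report, per output
    file, whether col1 is in non-decreasing order."""
    # Imported here so is_sorted() stays usable without a Spark installation.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .master("local[2]")
             .appName("sort-check")
             .getOrCreate())
    # Hypothetical scratch location; replace with any writable path.
    out_dir = os.path.join(tempfile.mkdtemp(), "out")

    # col1 is descending in input order so a lost sort is easy to spot.
    rows = [(100 - i, i % 3) for i in range(100)]
    df = spark.createDataFrame(rows, ["col1", "col2"])

    (df.coalesce(2)
       .sortWithinPartitions(*sort_cols)
       .write.mode("overwrite")
       .partitionBy("col2")
       .parquet(out_dir))

    results = {}
    pattern = os.path.join(out_dir, "col2=*", "*.parquet")
    for path in sorted(glob.glob(pattern)):
        # Read one file at a time so rows come back in file order.
        col1_values = [r.col1 for r in spark.read.parquet(path).collect()]
        results[os.path.relpath(path, out_dir)] = is_sorted(col1_values)
    return results


if __name__ == "__main__":
    print(check_sort_survives_partition_by(["col1"]))
```

Passing ["col2", "col1"] instead of ["col1"] would let the two variants be compared side by side, under the same assumptions.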

Thanks
Nikhil
