Hello,
I have my data stored in Parquet file format. My data is already partitioned by
dates and key. Now I want the data in each file to be sorted by a new Code column.
date1
    -> key1
            -> paqfile1
            -> paqfile2
    -> key2
            -> paqfile1
            -> paqfile2

date2
    -> key1
            -> paqfile1
            -> paqfile2
    -> key2
            -> paqfile1
            -> paqfile2

df.sort("code").write.mode(SaveMode.Append).format("parquet").save("/apps/spark/logs")

I am doing something like this, assuming that my current partitioning will still
be respected and that the data in each Parquet file will be sorted by code. Can you
please let me know if that will be the case?

Can I still expect the same partitioning, or do I have to partition again?
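
If I do have to partition again, I assume it would look roughly like the sketch below. This is untested and only my guess at the approach: the column names "date" and "key" and the helper name writeSortedByCode are placeholders I made up for illustration, and I am assuming the directories follow the date=.../key=... layout that partitionBy produces.

```scala
import org.apache.spark.sql.{DataFrame, SaveMode}
import org.apache.spark.sql.functions.col

// Hypothetical helper; "date" and "key" are assumed partition column names.
def writeSortedByCode(df: DataFrame, path: String): Unit =
  df.repartition(col("date"), col("key"))        // co-locate rows of each (date, key) pair in one task
    .sortWithinPartitions("date", "key", "code") // order rows inside each task; no global sort
    .write
    .partitionBy("date", "key")                  // write into date=.../key=... subdirectories
    .mode(SaveMode.Append)
    .parquet(path)

// Intended usage: writeSortedByCode(df, "/apps/spark/logs")
```

My understanding is that a plain df.sort("code") does a global sort and, without partitionBy on the writer, writes all files directly under the target path, but I may be wrong about that.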
Regards,
Shiv
