Hi all,

If a dataframe is repartitioned in memory by the (date, id) columns, and I then apply multiple window functions whose partition-by clauses use the same (date, id) columns, I believe we can avoid shuffling/sorting again. Can someone confirm this?
However, what happens when the dataframe was repartitioned by (date, id), but a window function that follows the repartition needs a partition-by clause with (date, id, col3, col4)? Would Spark reshuffle the data, or would it know to reuse the data already partitioned/shuffled by (date, id), since date and id are the common partition keys?

Regards,
Rishi Shah