Any suggestions?

On Wed, May 22, 2019 at 6:32 AM Rishi Shah <rishishah.s...@gmail.com> wrote:
> Hi All,
>
> If a dataframe is repartitioned in memory by the (date, id) columns, and I
> then use multiple window functions whose partition by clause uses the same
> (date, id) columns, I believe we can avoid another shuffle/sort. Can someone
> confirm this?
>
> However, what happens when the dataframe was repartitioned by the (date, id)
> columns, but the window function that follows needs a partition by clause
> with (date, id, col3, col4) columns? Would Spark reshuffle the data, or
> would it know to reuse the data already partitioned/shuffled by date/id
> (as date & id are the common partition keys)?
>
> --
> Regards,
>
> Rishi Shah
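
To make the second case concrete, here is a minimal pure-Python sketch of the co-location argument. It is not Spark code: `partition_of`, `is_clustered`, `NUM_PARTITIONS`, and the sample rows are hypothetical names and data made up for illustration, and Python's built-in `hash` stands in for Spark's hash partitioner. The idea is that rows which agree on (date, id, col3, col4) must also agree on (date, id), so hashing on (date, id) already puts every window group in a single partition. Whether a given Spark version actually skips the extra shuffle is something I would still confirm by calling `explain()` on the real DataFrame and looking for an additional Exchange step; a sort within each partition may still be needed either way.

```python
from collections import defaultdict

NUM_PARTITIONS = 8  # assumed partition count, chosen only for illustration


def partition_of(row, keys, num_partitions=NUM_PARTITIONS):
    """Toy stand-in for hash partitioning: bucket a row by the hash of its key columns."""
    return hash(tuple(row[k] for k in keys)) % num_partitions


def is_clustered(rows, repartition_keys, window_keys):
    """Return True if every window group (same window_keys values) sits in one partition."""
    partitions_per_group = defaultdict(set)
    for row in rows:
        group = tuple(row[k] for k in window_keys)
        partitions_per_group[group].add(partition_of(row, repartition_keys))
    return all(len(parts) == 1 for parts in partitions_per_group.values())


# Hypothetical sample data using the column names from the question.
rows = [
    {"date": d, "id": i, "col3": c3, "col4": c4}
    for d in ("2019-05-20", "2019-05-21", "2019-05-22")
    for i in range(5)
    for c3 in ("a", "b")
    for c4 in (0, 1)
]

# Case 1: window keys equal the repartition keys.
same_keys = is_clustered(rows, ["date", "id"], ["date", "id"])

# Case 2: window keys are a superset of the repartition keys.
superset_keys = is_clustered(rows, ["date", "id"], ["date", "id", "col3", "col4"])

print(same_keys, superset_keys)
```

Both checks hold regardless of the hash function, because the partition is computed only from (date, id). The reverse direction (repartitioning by the four columns and then windowing by only date and id) carries no such guarantee.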