Re: [pyspark 2.3+] repartition followed by window function

Shraddha Shah Wed, 22 May 2019 20:54:39 -0700

Any suggestions?

On Wed, May 22, 2019 at 6:32 AM Rishi Shah <rishishah.s...@gmail.com> wrote:


> Hi All,
>
> If a DataFrame is repartitioned in memory by the (date, id) columns, and I
> then apply multiple window functions whose partitionBy clause uses the same
> (date, id) columns, I believe Spark can avoid another shuffle/sort. Can
> someone confirm this?
>
> However, what happens when the DataFrame was repartitioned on (date, id),
> but the window function that follows needs a partitionBy clause on
> (date, id, col3, col4)? Would Spark reshuffle the data, or would it know to
> reuse the existing partitioning by date/id (since date and id are common to
> both sets of partition keys)?
>
> --
> Regards,
>
> Rishi Shah
>

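The reasoning behind the second question can also be checked without Spark. This is a plain-Python simulation, not Spark code: crc32 stands in for Spark's actual (Murmur3) hash, and NUM_PARTITIONS and the rows are made up. If rows are hash-partitioned on (date, id), two rows in the same (date, id, col3, col4) group necessarily share the same (date, id) value and therefore land in the same partition, so the existing layout already keeps each finer window group together. My understanding is that Spark's planner applies this subset rule when deciding whether a window needs another Exchange, but explain() on your version is the real check.

```python
import zlib
from collections import defaultdict
from itertools import product

# Column positions in each toy row: (date, id, col3, col4).
DATE, ID, COL3, COL4 = 0, 1, 2, 3
# Hypothetical partition count; Spark would use the repartition() argument
# or spark.sql.shuffle.partitions.
NUM_PARTITIONS = 4

# Hypothetical rows covering every combination of a few key values.
rows = list(product(["2019-05-21", "2019-05-22"], [1, 2, 3], ["a", "b"], ["x", "y"]))


def partition_of(row, keys, num_partitions=NUM_PARTITIONS):
    """Hash the selected key columns to a partition number.

    crc32 stands in for Spark's hash; any deterministic hash gives the
    same co-location argument.
    """
    key = tuple(row[k] for k in keys)
    return zlib.crc32(repr(key).encode("utf-8")) % num_partitions


def partitions_per_group(rows, partition_keys, window_keys):
    """For each window group, collect the partitions that hold its rows."""
    groups = defaultdict(set)
    for row in rows:
        group = tuple(row[k] for k in window_keys)
        groups[group].add(partition_of(row, partition_keys))
    return groups


def is_co_located(rows, partition_keys, window_keys):
    """True if every window group lives entirely inside one partition."""
    groups = partitions_per_group(rows, partition_keys, window_keys)
    return all(len(parts) == 1 for parts in groups.values())


print(is_co_located(rows, (DATE, ID), (DATE, ID)))              # True
print(is_co_located(rows, (DATE, ID), (DATE, ID, COL3, COL4)))  # True
```

Swapping the arguments, e.g. is_co_located(rows, (DATE, ID, COL3, COL4), (DATE, ID)), will typically return False: partitioning on the longer key list can split a (date, id) group across partitions, which is why a window over fewer keys than the repartition needs a fresh shuffle.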