Re: [pyspark] Use output of one aggregated function for another aggregated function within the same groupby

Georg Heiler Wed, 24 Apr 2019 21:00:56 -0700

Use analytical window functions to rank the result, and then filter to the
desired rank.


Rishi Shah <rishishah.s...@gmail.com> wrote on Thu, 25 Apr 2019 at 05:07:

> Hi All,
>
> [PySpark 2.3, python 2.7]
>
> I would like to achieve something like this. Could you please suggest the
> best way to implement it (and perhaps highlight the pros & cons of the
> approach in terms of performance)?
>
> df = df.groupby('grp_col').agg(max(date).alias('max_date'),
>     count(when(col('file_date') == col('max_date'), 1)))
>
> Please note that 'max_date' is the result of the aggregate function max
> inside the group by agg. I can definitely use multiple groupbys to achieve
> this, but is there a better way, perhaps one that performs better?
>
> Appreciate your help!
>
>
> --
> Regards,
>
> Rishi Shah
>
