Use analytical window functions to rank the results and then filter to the desired rank.
Rishi Shah <rishishah.s...@gmail.com> wrote on Thu, 25 Apr 2019 at 05:07:
> Hi All,
>
> [PySpark 2.3, python 2.7]
>
> I would like to achieve something like this, could you please suggest best
> way to implement (perhaps highlight pros & cons of the approach in terms of
> performance)?
>
> df = df.groupby('grp_col').agg(max(date).alias('max_date'), count(when col('file_date') == col('max_date')))
>
> Please note 'max_date' is a result of aggregate function max inside the
> group by agg. I can definitely use multiple groupbys to achieve this but is
> there a better way? better performance wise may be?
>
> Appreciate your help!
>
>
> --
> Regards,
>
> Rishi Shah