Oh okay, that makes sense. The trick is to take the max of a Tuple2 so you carry the other column along.
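A minimal sketch of the Tuple2-max trick in plain Scala (no Spark; the sample rows and object name are hypothetical): put the value being maximised first and the payload second, and lexicographic tuple ordering does the rest.

```scala
object ArgmaxTuple {
  // (transaction_date, customername) pairs for one mobileno group
  val rows = Seq((20161101, "alice"), (20161103, "bob"), (20161102, "carol"))

  // Tuples compare lexicographically, so max picks the latest date
  // and carries the matching name along with it.
  val latest: (Int, String) = rows.max   // (20161103, "bob")
}
```

In Spark the same idea applies per group: aggregate with max over a struct whose first field is the timestamp, then pull the carried columns back out.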
It is still unclear to me why we should memorize all these tricks (or add lots of extra little functions) when this can be expressed elegantly in a reduce operation with a simple one-line lambda function. The same applies to these Window functions: I had to read it three times to understand what it all means. Maybe it makes sense for someone who has been forced to use such limited tools in SQL for many years, but that's not necessarily what we should aim for. Why can I not just have the sortBy and then an Iterator[X] => Iterator[Y] to express what I want to do? All these functions (rank etc.) can be trivially expressed in this, plus I can add other operations if needed, instead of being locked into this Window framework.

On Nov 3, 2016 4:10 PM, "Michael Armbrust" <[email protected]> wrote:

You are looking to perform an *argmax*, which you can do with a single aggregation. Here is an example <https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/1023043053387187/3170497669323442/2840265927289860/latest.html>.
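The reduce-with-a-one-line-lambda formulation the writer prefers can be sketched on plain Scala collections (no Spark; the Row shape and sample data are hypothetical), along with rank expressed as the Iterator[X] => Iterator[Y] transform he describes:

```scala
object ReduceArgmax {
  case class Row(mobileno: String, ts: Int, customer: String)

  val rows = Seq(
    Row("111", 1, "alice"), Row("111", 3, "bob"),
    Row("222", 2, "carol"), Row("222", 1, "dave"))

  // Argmax per group as a reduce with a one-line lambda:
  // keep whichever row has the larger timestamp.
  val latestPerGroup: Map[String, Row] =
    rows.groupBy(_.mobileno)
        .map { case (k, vs) => k -> vs.reduce((a, b) => if (a.ts >= b.ts) a else b) }

  // rank over an already-sorted group, written as Iterator[X] => Iterator[Y]:
  // zipWithIndex gives a 0-based position, so shift to 1-based rank.
  def withRank[A](it: Iterator[A]): Iterator[(A, Int)] =
    it.zipWithIndex.map { case (a, i) => (a, i + 1) }
}
```

On a Spark Dataset the analogous shape would be groupByKey followed by reduceGroups (or mapGroups for the iterator form), at the cost of losing the Catalyst optimizations that the declarative Window API keeps.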
On Thu, Nov 3, 2016 at 4:53 AM, Rabin Banerjee <[email protected]> wrote:
> Hi All,
>
> I want to do a dataframe operation to find the rows having the latest
> timestamp in each group, using the below operation:
>
> df.orderBy(desc("transaction_date")).groupBy("mobileno")
>   .agg(first("customername").as("customername"), first("service_type").as("service_type"), first("cust_addr").as("cust_abbr"))
>   .select("customername", "service_type", "mobileno", "cust_addr")
>
> *Spark Version :: 1.6.x*
>
> My question is: *"Will Spark guarantee the order while doing the groupBy, if
> the DF was ordered using orderBy previously, in Spark 1.6.x?"*
>
> *I referred to a blog here:
> https://bzhangusc.wordpress.com/2015/05/28/groupby-on-dataframe-is-not-the-groupby-on-rdd/*
>
> *which claims it will work except in Spark 1.5.1 and 1.5.2.*
>
> *Could you elaborate a bit on how Spark handles this internally? Also, is it
> more efficient than using a Window function?*
>
> *Thanks in advance,*
>
> *Rabin Banerjee*
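What the Window alternative computes — row_number() over a partition ordered by descending timestamp, keeping row number 1 — can be modelled on plain Scala collections (no Spark; the Row shape and sample data are hypothetical). Unlike orderBy-then-groupBy-first, where the ordering guarantee is the whole point of the question, the Window form makes the per-group ordering explicit:

```scala
object WindowModel {
  case class Row(mobileno: String, ts: Int, customer: String)

  val rows = Seq(
    Row("111", 1, "alice"), Row("111", 3, "bob"),
    Row("222", 2, "carol"), Row("222", 1, "dave"))

  // Model of: row_number().over(Window.partitionBy("mobileno")
  //                                   .orderBy(desc("transaction_date"))) === 1
  // Sort each partition descending by timestamp, keep the head row.
  val latest: List[Row] =
    rows.groupBy(_.mobileno)
        .map { case (_, vs) => vs.sortBy(-_.ts).head }
        .toList
        .sortBy(_.mobileno)
}
```

The ordering here lives inside the per-group sort, so it cannot be disturbed by a shuffle between the sort and the aggregation, which is exactly the hazard the question is asking about.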
