Hi Pedro,

I couldn't think of a way to do this with an aggregate, but it is possible with a window function, partitioned by user and ordered by time:
// Assuming "df" holds your DataFrame ...
import org.apache.spark.sql.functions._
import org.apache.spark.sql.expressions.Window

val wSpec = Window.partitionBy("user").orderBy("time")

df.select($"user", $"time", rank().over(wSpec).as("rank"))
  .where($"rank" === 1)

// Note: rank() keeps ties, so a user with two events at the same
// minimum time yields two rows; use row_number() instead of rank()
// if you want exactly one row per user.

Xinh

On Fri, Jul 8, 2016 at 12:57 PM, Pedro Rodriguez <ski.rodrig...@gmail.com> wrote:
> Is there a way, on a GroupedData (from groupBy on a DataFrame), to have an
> aggregate that returns column A based on the min of column B? For example, I
> have a list of sites visited by a given user, and I would like to find the
> event with the minimum time (the first event).
>
> Thanks,
> --
> Pedro Rodriguez
> PhD Student in Distributed Machine Learning | CU Boulder
> UC Berkeley AMPLab Alumni
>
> ski.rodrig...@gmail.com | pedrorodriguez.io | 909-353-4423
> Github: github.com/EntilZha | LinkedIn:
> https://www.linkedin.com/in/pedrorodriguezscience
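P.S. For intuition, here is a plain-Python sketch (an illustration, not Spark code) of what the window query computes: for each user, keep only the event with the minimum time. The sample events below are hypothetical.

```python
# Hypothetical sample events as (user, time) tuples
events = [("alice", 3), ("bob", 1), ("alice", 1), ("bob", 2)]

# Keep each user's minimum-time event, mirroring
# rank().over(Window.partitionBy("user").orderBy("time")) === 1
earliest = {}
for user, time in events:
    if user not in earliest or time < earliest[user]:
        earliest[user] = time

print(sorted(earliest.items()))  # [('alice', 1), ('bob', 1)]
```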