I'm still unclear on whether orderBy/groupBy + aggregates is a viable approach,
or when one can rely on the last or first aggregate functions, but a
working alternative is to use window functions with row_number and a filter,
roughly like this:
import spark.implicits._
// the (a, b) sort order from the query below, reversed
val reverseOrdering = Seq($"a".desc, $"b".desc)
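Fleshed out, it looks something like the following (a rough sketch; the
intermediate names w, rn, lastRows, and totals are just illustrative, and the
columns are the ones from the query below): rank each group's rows in reverse
(a, b) order, keep the top-ranked row as the "last" one, and join it to the
per-group sum.

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, row_number, sum}

// Rank rows within each group in reverse (a, b) order, so the "last" row
// of the original ordering gets row_number 1.
val w = Window.partitionBy("group_id").orderBy(reverseOrdering: _*)

val lastRows = df
  .withColumn("rn", row_number().over(w))
  .filter(col("rn") === 1)
  .select(col("group_id"), col("row_id").as("last_row_id"))

// Compute the per-group sum separately, then join the two results
// to get one row per group_id.
val totals = df.groupBy("group_id").agg(sum("col_to_sum").as("total"))
val result = totals.join(lastRows, Seq("group_id"))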
Hi,
I'm struggling a little with some unintuitive behavior in the Scala API
(Spark 2.0.2).
I wrote something like
df.orderBy("a", "b")
  .groupBy("group_id")
  .agg(sum("col_to_sum").as("total"),
       last("row_id").as("last_row_id"))
and expected a result with a unique group_id column, a