I have a use case similar to the one in this question:
http://stackoverflow.com/questions/33878370/spark-dataframe-select-the-first-row-of-each-group
and I'm trying to understand the solution titled "ordering over structs".
My questions, with the code I'm asking about copied after them:
1) Is a struct in Spark like a struct in C++?
2) What is an alias in this context?
3) How does this code even work?
4) Is this faster than using a join or a window function in Spark SQL?
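// needs the SQL implicits in scope for the $"..." column syntax
import org.apache.spark.sql.functions.{max, struct}

val dfTop = df.select($"Hour", struct($"TotalValue", $"Category").alias("vs"))
  .groupBy($"Hour")
  .agg(max("vs").alias("vs"))
  .select($"Hour", $"vs.Category", $"vs.TotalValue")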
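To make question 3 concrete, here is my guess at what the query does, written with plain Scala collections instead of Spark. It assumes that a struct is compared field by field, in order, the way a Scala tuple is; that is my assumption, not something I have confirmed. The object name, the helper topPerHour and the sample rows are made up for illustration.

```scala
object StructOrderingSketch {
  // Hypothetical sample rows: (hour, category, totalValue)
  val rows: Seq[(Int, String, Int)] = Seq(
    (0, "cat26", 30),
    (0, "cat13", 22),
    (1, "cat67", 28),
    (1, "cat4", 26)
  )

  // For each hour, keep the row whose (totalValue, category) pair is largest.
  def topPerHour(rows: Seq[(Int, String, Int)]): Seq[(Int, String, Int)] =
    rows
      .groupBy(_._1) // plays the role of groupBy($"Hour")
      .map { case (hour, group) =>
        // (v, c) stands in for struct($"TotalValue", $"Category").
        // Tuples compare on the first field, then on the second to break ties.
        val (totalValue, category) = group.map { case (_, c, v) => (v, c) }.max
        (hour, category, totalValue)
      }
      .toSeq
      .sortBy(_._1) // sorted only to make the printed order stable

  def main(args: Array[String]): Unit =
    topPerHour(rows).foreach(println)
}
```

If that reading is right, putting TotalValue first in the struct is what makes max pick the highest total, and Category only matters when two totals are equal.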
Thank you,
imran