Ah, that gives me an idea.
    val window = Window.partitionBy()
    val getRand = udf((cnt: Int) => )
    df
      .withColumn("cnt", count().over(window))
      .withColumn("rnd", getRand($"cnt"))
      .where($"rnd" === $"cnt")
Not sure how performant this would be, but writing a UDF is much simpler
than a UDAF.
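One caveat with the UDF route: a UDF is evaluated once per row, so each row in a group would draw its own random number rather than sharing one per group. Here is a minimal sketch of an alternative, assuming the goal is one random row per group. It swaps the UDF for the built-in rand(): order each window randomly and keep row number 1. The column names (grp, value) and the helper pickRandomPerGroup are made up for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, rand, row_number}

object RandomRowPerGroup {
  // Keeps one randomly chosen row for each distinct value of groupCol.
  // groupCol is a hypothetical parameter name, not something from the thread.
  def pickRandomPerGroup(df: DataFrame, groupCol: String): DataFrame = {
    // Shuffle the rows inside each group by ordering on a random key.
    val w = Window.partitionBy(col(groupCol)).orderBy(rand())
    df.withColumn("rn", row_number().over(w))
      .where(col("rn") === 1) // the first row after the shuffle
      .drop("rn")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("random-row-per-group")
      .getOrCreate()
    import spark.implicits._

    // Illustrative data only.
    val df = Seq(("a", 1), ("a", 2), ("a", 3), ("b", 1), ("b", 2)).toDF("grp", "value")
    pickRandomPerGroup(df, "grp").show()

    spark.stop()
  }
}
```

This needs no count column, and it works for groups of any size because every non-empty group has a row number 1.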
On Tue,
You can use rank with a window function. Rank=1 is the same as calling first().
Not sure how you would randomly pick records though, if there is no Nth
record. In your example, what happens if the data has only 2 rows?
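To make the window-function suggestion concrete, here is a rough sketch. It uses row_number() rather than rank(), since rank() gives tied rows the same value and can skip N entirely. Groups with fewer than N rows simply produce no output row. The column names (grp, ts) and the helper nthPerGroup are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, row_number}

object NthPerGroup {
  // Returns the n-th row (1-based) of each group, ordered by orderCol.
  // Groups with fewer than n rows contribute nothing to the result.
  def nthPerGroup(df: DataFrame, groupCol: String, orderCol: String, n: Int): DataFrame = {
    val w = Window.partitionBy(col(groupCol)).orderBy(col(orderCol))
    df.withColumn("rn", row_number().over(w))
      .where(col("rn") === n)
      .drop("rn")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("nth-per-group")
      .getOrCreate()
    import spark.implicits._

    // Illustrative data: group "a" has 3 rows, group "b" has only 2.
    val df = Seq(("a", 1), ("a", 2), ("a", 3), ("b", 1), ("b", 2)).toDF("grp", "ts")

    // With n = 3, only group "a" has a third row to return.
    nthPerGroup(df, "grp", "ts", 3).show()

    spark.stop()
  }
}
```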
On 27 Jul 2016 00:57, "Alex Nastetsky" wrote:
Spark SQL has a "first" function that returns the first item in a group. Is
there a similar function, perhaps in a third party lib, that allows you to
return an arbitrary (e.g. 3rd) item from the group? Was thinking of writing
a UDAF for it, but didn't want to reinvent the wheel. My end goal is to