Re: spark sql aggregate function "Nth"
Ah, that gives me an idea.

    val window = Window.partitionBy()
    val getRand = udf((cnt:Int) => )

    df
      .withColumn("cnt", count().over(window))
      .withColumn("rnd", getRand($"cnt"))
      .where($"rnd" === $"cnt")

Not sure how performant this would be, but writing a UDF is much simpler than a UDAF.

On Tue, Jul 26, 2016 at 11:48 AM, ayan guha wrote:
> You can use rank with a window function. Rank=1 is the same as calling first().
>
> Not sure how you would randomly pick records though, if there is no Nth
> record. In your example, what happens if the data has only 2 rows?
> On 27 Jul 2016 00:57, "Alex Nastetsky" wrote:
>
>> Spark SQL has a "first" function that returns the first item in a group.
>> Is there a similar function, perhaps in a third-party lib, that allows you
>> to return an arbitrary (e.g. 3rd) item from the group? Was thinking of
>> writing a UDAF for it, but didn't want to reinvent the wheel. My end goal is
>> to be able to select a random item from the group, using a random number
>> generator.
>>
>> Thanks.
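For the random-pick goal, one alternative that avoids a UDF altogether is to shuffle each group with rand() and keep the first row. This is only a minimal sketch, assuming Spark 2.x or later; the object name, the function name, the helper column "rn", and the sample columns "grp" and "value" are all hypothetical and not from the thread.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, rand, row_number}

object RandomPerGroup {
  // Keeps one randomly chosen row per group.
  // Assumption: the input has no existing column named "rn".
  def randomPerGroup(df: DataFrame, groupCol: String): DataFrame = {
    // Ordering each partition by rand() puts its rows in random order,
    // so the row numbered 1 is a random member of the group.
    val shuffled = Window.partitionBy(groupCol).orderBy(rand())
    df.withColumn("rn", row_number().over(shuffled))
      .where(col("rn") === 1)
      .drop("rn")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("random-per-group")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data: two groups of different sizes.
    val df = Seq(("a", 10), ("a", 20), ("a", 30), ("b", 1), ("b", 2))
      .toDF("grp", "value")

    randomPerGroup(df, "grp").show()
    spark.stop()
  }
}
```

Because every group gets a row numbered 1, this returns exactly one row per group regardless of group size. It does sort each partition, so it may not be cheaper than the count-plus-UDF idea on large groups.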
Re: spark sql aggregate function "Nth"
You can use rank with a window function. Rank=1 is the same as calling first().

Not sure how you would randomly pick records though, if there is no Nth record. In your example, what happens if the data has only 2 rows?

On 27 Jul 2016 00:57, "Alex Nastetsky" wrote:
> Spark SQL has a "first" function that returns the first item in a group.
> Is there a similar function, perhaps in a third-party lib, that allows you
> to return an arbitrary (e.g. 3rd) item from the group? Was thinking of
> writing a UDAF for it, but didn't want to reinvent the wheel. My end goal is
> to be able to select a random item from the group, using a random number
> generator.
>
> Thanks.
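The window-function suggestion could be sketched roughly as follows. This uses row_number() rather than rank(), since rank() assigns the same value to tied rows and could return more than one row per group. It assumes Spark 2.x or later; the object name, the function name, the helper column "rn", and the sample columns "grp" and "value" are hypothetical and not from the thread.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, row_number}

object NthPerGroup {
  // Returns the n-th row (1-based) of each group, ordered by orderCol.
  // Assumption: the input has no existing column named "rn".
  def nthPerGroup(df: DataFrame, groupCol: String, orderCol: String, n: Int): DataFrame = {
    val byGroup = Window.partitionBy(groupCol).orderBy(orderCol)
    df.withColumn("rn", row_number().over(byGroup))
      .where(col("rn") === n)
      .drop("rn")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("nth-per-group")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data: group "a" has three rows, group "b" only two.
    val df = Seq(("a", 10), ("a", 20), ("a", 30), ("b", 1), ("b", 2))
      .toDF("grp", "value")

    // Group "b" has no third row, so it contributes nothing for n = 3.
    nthPerGroup(df, "grp", "value", 3).show()
    spark.stop()
  }
}
```

With this approach, a group with fewer than n rows simply drops out of the result, which is one possible answer to the "only 2 rows" case.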
spark sql aggregate function "Nth"
Spark SQL has a "first" function that returns the first item in a group. Is there a similar function, perhaps in a third-party lib, that allows you to return an arbitrary (e.g. 3rd) item from the group? Was thinking of writing a UDAF for it, but didn't want to reinvent the wheel. My end goal is to be able to select a random item from the group, using a random number generator.

Thanks.