Re: spark sql aggregate function "Nth"

2016-07-26 Thread Alex Nastetsky
Ah, that gives me an idea.

val window = Window.partitionBy()
val getRand = udf((cnt:Int) =>  )

df
.withColumn("cnt", count().over(window))
.withColumn("rnd", getRand($"cnt"))
.where($"rnd" === $"cnt")

Not sure how performant this would be, but writing a UDF is much simpler
than a UDAF.
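
For the archive, here is a minimal sketch of one way to pick a random row per group without a UDAF. It does not use the UDF from the snippet above; it swaps in the built-in rand() as the window ordering and keeps row number 1. The names pickRandom, "grp", "value" and "rn" are made up for illustration, and I have not measured how it performs on large groups.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, rand, row_number}

object RandomPerGroup {
  // Keep one randomly chosen row for each distinct value of groupCol.
  // groupCol and the temporary "rn" column are hypothetical names.
  def pickRandom(df: DataFrame, groupCol: String): DataFrame = {
    // Ordering each partition by rand() shuffles the rows within the group.
    val shuffled = Window.partitionBy(col(groupCol)).orderBy(rand())
    df.withColumn("rn", row_number().over(shuffled))
      .where(col("rn") === 1) // the first row of a shuffled group is a random row
      .drop("rn")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("random-per-group")
      .getOrCreate()
    import spark.implicits._

    // Toy data: group "a" has three rows, group "b" has two.
    val df = Seq(("a", 1), ("a", 2), ("a", 3), ("b", 10), ("b", 20))
      .toDF("grp", "value")

    pickRandom(df, "grp").show() // one row per group; which one varies per run
    spark.stop()
  }
}
```

Because row_number() is always between 1 and the group size, a group with only 2 rows still yields exactly one row.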

On Tue, Jul 26, 2016 at 11:48 AM, ayan guha wrote:

> You can use rank with a window function. Rank = 1 is the same as calling
> first().
>
> Not sure how you would randomly pick records though, if there is no Nth
> record. In your example, what happens if the data has only 2 rows?
> On 27 Jul 2016 00:57, "Alex Nastetsky" wrote:
>
>> Spark SQL has a "first" function that returns the first item in a group.
>> Is there a similar function, perhaps in a third-party lib, that lets you
>> return an arbitrary (e.g. 3rd) item from the group? I was thinking of
>> writing a UDAF for it, but didn't want to reinvent the wheel. My end goal
>> is to select a random item from the group using a random number generator.
>>
>> Thanks.
>>
>


Re: spark sql aggregate function "Nth"

2016-07-26 Thread ayan guha
You can use rank with a window function. Rank = 1 is the same as calling
first().

Not sure how you would randomly pick records though, if there is no Nth
record. In your example, what happens if the data has only 2 rows?
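
A rough sketch of the window-function idea, assuming a DataFrame with a grouping column and a column that defines the order within each group. It uses row_number() rather than rank(), since rank() can give tied rows the same value and then return more than one row. The names nth, "grp", "value" and "rn" are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, row_number}

object NthPerGroup {
  // Return the n-th row (1-based) of each group, ordered by orderCol.
  // Groups with fewer than n rows contribute nothing to the result.
  def nth(df: DataFrame, groupCol: String, orderCol: String, n: Int): DataFrame = {
    val ordered = Window.partitionBy(col(groupCol)).orderBy(col(orderCol))
    df.withColumn("rn", row_number().over(ordered))
      .where(col("rn") === n)
      .drop("rn")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("nth-per-group")
      .getOrCreate()
    import spark.implicits._

    // Toy data: group "a" has three rows, group "b" has two.
    val df = Seq(("a", 1), ("a", 2), ("a", 3), ("b", 10), ("b", 20))
      .toDF("grp", "value")

    nth(df, "grp", "value", 1).show() // same idea as first(), given this ordering
    nth(df, "grp", "value", 3).show() // group "b" has no 3rd row and drops out
    spark.stop()
  }
}
```

With n = 3, a group of only 2 rows simply disappears from the output, so the caller has to decide whether that is acceptable or whether a fallback is needed.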
On 27 Jul 2016 00:57, "Alex Nastetsky" wrote:

> Spark SQL has a "first" function that returns the first item in a group.
> Is there a similar function, perhaps in a third-party lib, that lets you
> return an arbitrary (e.g. 3rd) item from the group? I was thinking of
> writing a UDAF for it, but didn't want to reinvent the wheel. My end goal
> is to select a random item from the group using a random number generator.
>
> Thanks.
>


spark sql aggregate function "Nth"

2016-07-26 Thread Alex Nastetsky
Spark SQL has a "first" function that returns the first item in a group. Is
there a similar function, perhaps in a third-party lib, that lets you return
an arbitrary (e.g. 3rd) item from the group? I was thinking of writing a UDAF
for it, but didn't want to reinvent the wheel. My end goal is to select a
random item from the group using a random number generator.

Thanks.