Like this (note that sqlContext needs to be a HiveContext instance):

case class KeyValue(key: Int, value: String)
val df = sc.parallelize(1 to 50).map(i => KeyValue(i, i.toString)).toDF()
df.registerTempTable("table")
sqlContext.sql("select percentile(key, 0.5) from table").show()
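As a side note, the exact percentile UDAF buffers the values in memory, so on large tables Hive's percentile_approx may be the safer choice. A minimal sketch, assuming the same "table" registered above and a Spark shell where sqlContext is a HiveContext:

// approximate median via Hive's percentile_approx UDAF;
// cast to double since the UDAF expects a double column
sqlContext.sql("select percentile_approx(cast(key as double), 0.5) from table").show()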
On Tue, Jun 2, 2015 at 8:07 AM, Olivier Girardot <o.girar...@lateral-thoughts.com> wrote:
> Hi everyone,
> Is there any way to compute a median on a column using Spark's DataFrame?
> I know you can use stats on an RDD, but I'd rather stay within a DataFrame.
> Hive seems to imply that using ntile one can compute percentiles,
> quartiles and therefore a median.
> Does anyone have experience with this?
>
> Regards,
>
> Olivier.