Like this (note that sqlContext needs to be a HiveContext instance):

case class KeyValue(key: Int, value: String)
val df = sc.parallelize(1 to 50).map(i => KeyValue(i, i.toString)).toDF()
df.registerTempTable("table")
sqlContext.sql("select percentile(key, 0.5) from table").show()
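As a side note, the exact percentile UDAF buffers the values in memory, so on large tables Hive's percentile_approx may be the safer choice. A minimal sketch, assuming the same "table" registered above and a Spark shell where sqlContext is a HiveContext:

// approximate median via Hive's percentile_approx UDAF;
// cast to double since the UDAF expects a double column
sqlContext.sql("select percentile_approx(cast(key as double), 0.5) from table").show()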
On Tue, Jun 2, 2015 at 8:07 AM, Olivier Girardot <o.girar...@lateral-thoughts.com> wrote:
> Hi everyone,
> Is there any way to compute a median on a column using Spark's DataFrame?
> I know you can use stats on an RDD, but I'd rather stay within a DataFrame.
> Hive seems to imply that using ntile one can compute percentiles,
> quartiles and therefore a median.
> Does anyone have experience with this?
>
> Regards,
>
> Olivier.