Hi,

I just noticed that the Percentile UDAF PR was merged into trunk and decided to test it, so I pulled in today's trunk and ran the percentile queries. They work marvelously. Thanks a lot for bringing this into Spark SQL.

However, the Hive percentile UDAF also supports an array mode, in which you pass the list of percentiles you want and it returns an array of double values, one for each requested percentile. That query fails with the error below. A query that requests the percentiles individually, such as percentile(turnaroundtime,0.25),percentile(turnaroundtime,0.5),percentile(turnaroundtime,0.75), works, so this issue is not a high priority for us, since we have that workaround.

Thanks,
Anand Mohan

0: jdbc:hive2://dev-uuppala.sfohi.philips.com> select name, percentile(turnaroundtime,array(0,0.25,0.5,0.75,1)) from exam group by name;
Error: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 25.0 failed 4 times, most recent failure: Lost task 1.3 in stage 25.0 (TID 305, Dev-uuppala.sfohi.philips.com): java.lang.ClassCastException: scala.collection.mutable.ArrayBuffer cannot be cast to [Ljava.lang.Object;
    org.apache.hadoop.hive.serde2.objectinspector.StandardListObjectInspector.getListLength(StandardListObjectInspector.java:83)
    org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$ListConverter.convert(ObjectInspectorConverters.java:259)
    org.apache.hadoop.hive.ql.udf.generic.GenericUDFUtils$ConversionHelper.convertIfNecessary(GenericUDFUtils.java:349)
    org.apache.hadoop.hive.ql.udf.generic.GenericUDAFBridge$GenericUDAFBridgeEvaluator.iterate(GenericUDAFBridge.java:170)
    org.apache.spark.sql.hive.HiveUdafFunction.update(hiveUdfs.scala:342)
    org.apache.spark.sql.execution.Aggregate$$anonfun$execute$1$$anonfun$7.apply(Aggregate.scala:167)
    org.apache.spark.sql.execution.Aggregate$$anonfun$execute$1$$anonfun$7.apply(Aggregate.scala:151)
    org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:599)
    org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:599)
    org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
    org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
    org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
    org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
    org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
    org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
    org.apache.spark.scheduler.Task.run(Task.scala:56)
    org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:181)
    java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    java.lang.Thread.run(Thread.java:745)
Driver stacktrace: (state=,code=0)

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-Percentile-UDAF-tp16092.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.