Spark SQL Percentile UDAF

Anand Mohan Thu, 09 Oct 2014 18:48:54 -0700

Hi,

I just noticed that the Percentile UDAF PR was merged into trunk and decided
to test it.
So I pulled in today's trunk and tested the percentile queries.
They work marvelously. Thanks a lot for bringing this into Spark SQL.


However, the Hive percentile UDAF also supports an array mode, in which you
give a list of the percentiles you want and it returns an array of double
values, one for each requested percentile.
The array-mode query below fails with the error shown. A query with the
individual percentiles, like
percentile(turnaroundtime,0.25),percentile(turnaroundtime,0.5),percentile(turnaroundtime,0.75)
does work, though (so this issue is not high priority for us, since we have
this workaround).
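To make the difference between the two forms concrete, here is a minimal, illustrative Python sketch of what the array mode is expected to return. It is not Hive's code: `percentile` and `percentile_array` are hypothetical names, and computing the exact percentile by linear interpolation at position (n - 1) * p is an assumption made for this sketch. The point is only that the array form returns one double per requested percentile, i.e. the same values as the individual-percentile workaround.

```python
import math
from typing import List, Sequence


def percentile(values: Sequence[int], p: float) -> float:
    """Single-percentile form: one p in [0, 1] -> one double.

    ASSUMPTION: linear interpolation between the two closest ranks at
    position (n - 1) * p; illustrative only, not taken from Hive's source.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if not values:
        # This sketch simply rejects an empty group.
        raise ValueError("values must be non-empty")
    ordered = sorted(values)
    position = (len(ordered) - 1) * p
    lower = math.floor(position)
    higher = math.ceil(position)
    if lower == higher:
        return float(ordered[lower])
    # Weight each neighbouring value by its distance from the fractional position.
    return (higher - position) * ordered[lower] + (position - lower) * ordered[higher]


def percentile_array(values: Sequence[int], ps: Sequence[float]) -> List[float]:
    """Array form (hypothetical helper): one double per requested percentile."""
    return [percentile(values, p) for p in ps]


if __name__ == "__main__":
    # Hypothetical turnaround times for a single `name` group.
    times = [10, 20, 30, 40]
    print(percentile_array(times, [0, 0.25, 0.5, 0.75, 1]))
```

Run as a script, this prints `[10.0, 17.5, 25.0, 32.5, 40.0]`. Each element matches the corresponding single `percentile(times, p)` call, which is why issuing one `percentile(turnaroundtime, p)` per value is a usable workaround.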

Thanks,
Anand Mohan

0: jdbc:hive2://dev-uuppala.sfohi.philips.com> select name, percentile(turnaroundtime,array(0,0.25,0.5,0.75,1)) from exam group by name;

Error: org.apache.spark.SparkException: Job aborted due to stage failure:
Task 1 in stage 25.0 failed 4 times, most recent failure: Lost task 1.3 in
stage 25.0 (TID 305, Dev-uuppala.sfohi.philips.com):
java.lang.ClassCastException: scala.collection.mutable.ArrayBuffer cannot
be cast to [Ljava.lang.Object;
        org.apache.hadoop.hive.serde2.objectinspector.StandardListObjectInspector.getListLength(StandardListObjectInspector.java:83)
        org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$ListConverter.convert(ObjectInspectorConverters.java:259)
        org.apache.hadoop.hive.ql.udf.generic.GenericUDFUtils$ConversionHelper.convertIfNecessary(GenericUDFUtils.java:349)
        org.apache.hadoop.hive.ql.udf.generic.GenericUDAFBridge$GenericUDAFBridgeEvaluator.iterate(GenericUDAFBridge.java:170)
        org.apache.spark.sql.hive.HiveUdafFunction.update(hiveUdfs.scala:342)
        org.apache.spark.sql.execution.Aggregate$$anonfun$execute$1$$anonfun$7.apply(Aggregate.scala:167)
        org.apache.spark.sql.execution.Aggregate$$anonfun$execute$1$$anonfun$7.apply(Aggregate.scala:151)
        org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:599)
        org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:599)
        org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
        org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
        org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
        org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
        org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
        org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
        org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
        org.apache.spark.scheduler.Task.run(Task.scala:56)
        org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:181)
        java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        java.lang.Thread.run(Thread.java:745)
Driver stacktrace: (state=,code=0)

--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-Percentile-UDAF-tp16092.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.