Re: Spark SQL Percentile UDAF

Anand Mohan Tumuluri Thu, 09 Oct 2014 19:35:49 -0700

Filed https://issues.apache.org/jira/browse/SPARK-3891


Thanks,
Anand Mohan

On Thu, Oct 9, 2014 at 7:13 PM, Michael Armbrust <mich...@databricks.com>
wrote:

> Please file a JIRA: https://issues.apache.org/jira/browse/SPARK/
>
> On Thu, Oct 9, 2014 at 6:48 PM, Anand Mohan <chinn...@gmail.com> wrote:
>
>> Hi,
>>
>> I just noticed that the Percentile UDAF PR was merged into trunk and decided
>> to test it.
>> So I pulled in today's trunk and ran the percentile queries.
>> They work marvelously. Thanks a lot for bringing this into Spark SQL.
>>
>> However, the Hive percentile UDAF also supports an array mode, in which you
>> pass the list of percentiles that you want and it returns an array of
>> double values, one for each requested percentile.
>> That query fails with the error below, whereas a query that requests the
>> percentiles individually, such as
>> percentile(turnaroundtime,0.25),percentile(turnaroundtime,0.5),percentile(turnaroundtime,0.75)
>> works. (So this issue is not a high priority for us, since we have this
>> workaround.)
>>
>> Thanks,
>> Anand Mohan
>>
>> 0: jdbc:hive2://dev-uuppala.sfohi.philips.com> select name,
>> percentile(turnaroundtime,array(0,0.25,0.5,0.75,1)) from exam group by name;
>>
>> Error: org.apache.spark.SparkException: Job aborted due to stage failure:
>> Task 1 in stage 25.0 failed 4 times, most recent failure: Lost task 1.3 in
>> stage 25.0 (TID 305, Dev-uuppala.sfohi.philips.com):
>> java.lang.ClassCastException: scala.collection.mutable.ArrayBuffer cannot
>> be cast to [Ljava.lang.Object;
>>         org.apache.hadoop.hive.serde2.objectinspector.StandardListObjectInspector.getListLength(StandardListObjectInspector.java:83)
>>         org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$ListConverter.convert(ObjectInspectorConverters.java:259)
>>         org.apache.hadoop.hive.ql.udf.generic.GenericUDFUtils$ConversionHelper.convertIfNecessary(GenericUDFUtils.java:349)
>>         org.apache.hadoop.hive.ql.udf.generic.GenericUDAFBridge$GenericUDAFBridgeEvaluator.iterate(GenericUDAFBridge.java:170)
>>         org.apache.spark.sql.hive.HiveUdafFunction.update(hiveUdfs.scala:342)
>>         org.apache.spark.sql.execution.Aggregate$$anonfun$execute$1$$anonfun$7.apply(Aggregate.scala:167)
>>         org.apache.spark.sql.execution.Aggregate$$anonfun$execute$1$$anonfun$7.apply(Aggregate.scala:151)
>>         org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:599)
>>         org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:599)
>>         org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>>         org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>>         org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>>         org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
>>         org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>>         org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>>         org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
>>         org.apache.spark.scheduler.Task.run(Task.scala:56)
>>         org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:181)
>>         java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>         java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>         java.lang.Thread.run(Thread.java:745)
>> Driver stacktrace: (state=,code=0)
>>
>
>
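
The workaround described in the quoted message, requesting each percentile with its own percentile(...) call instead of passing array(...), can be generated mechanically when the list of percentiles is long. Below is a minimal Python sketch, assuming the table and column names from the failing query above. The helper names expand_percentiles and build_query are hypothetical and not part of Spark or Hive, and the code only builds the SQL text; it does not run it.

```python
def expand_percentiles(column, percentiles):
    """Return one percentile(column, p) call per requested percentile.

    Hypothetical helper: it only formats SQL text and knows nothing
    about Spark or Hive.
    """
    return ", ".join(f"percentile({column}, {p})" for p in percentiles)


def build_query(table, group_col, column, percentiles):
    """Build a GROUP BY query that avoids the array form of percentile()."""
    calls = expand_percentiles(column, percentiles)
    return f"select {group_col}, {calls} from {table} group by {group_col}"


# Names taken from the failing query in the quoted message.
query = build_query("exam", "name", "turnaroundtime", [0, 0.25, 0.5, 0.75, 1])
print(query)
```

The printed statement can then be submitted through the same JDBC session as the original query. It returns one column per percentile rather than a single array column, so downstream code has to read the values by position.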
