Re: callUdf("percentile_approx",col("mycol"),lit(0.25)) does not compile spark 1.5.1 source but it does work in spark 1.5.1 bin

Umesh Kacha Mon, 02 Nov 2015 23:02:36 -0800

Hi Ted, I checked, and hive-exec-1.2.1.spark.jar contains the following required
classes, but the code still doesn't compile. I don't understand why. Is this jar
getting overridden in scope?


org/apache/hadoop/hive/ql/udf/generic/GenericUDAFPercentileApprox$GenericUDAFMultiplePercentileApproxEvaluator.class

Please guide.
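
One way to check which jar actually supplies a class on a given classpath is to ask the JVM where it loaded the class from. The following is only a minimal sketch: the class name WhichJar and the helper locate are made up for illustration, and it inspects the runtime classpath only, so it does not by itself explain a compile error.

```java
import java.net.URL;
import java.security.CodeSource;

// Hypothetical helper (not part of Spark or Hive): reports where a class was loaded from.
class WhichJar {
    /** Returns the jar or directory a class came from, or null if the JVM reports none. */
    public static URL locate(String className) throws ClassNotFoundException {
        // Load without initializing the class; only its origin is of interest.
        Class<?> cls = Class.forName(className, false, WhichJar.class.getClassLoader());
        CodeSource src = cls.getProtectionDomain().getCodeSource();
        // Classes from the bootstrap loader (e.g. java.lang.String) have no code source.
        return src == null ? null : src.getLocation();
    }

    public static void main(String[] args) {
        // Default to the Hive class discussed in this thread.
        String name = args.length > 0
            ? args[0]
            : "org.apache.hadoop.hive.ql.udf.generic.GenericUDAFPercentileApprox";
        try {
            URL where = locate(name);
            System.out.println(name + " -> "
                + (where == null ? "(no code source reported)" : where.toString()));
        } catch (ClassNotFoundException e) {
            System.out.println(name + " is not on the classpath");
        }
    }
}
```

Running it with the same classpath as the application (for example the one Maven resolves for the project) shows whether the class resolves at all, and from which jar.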

On Mon, Oct 19, 2015 at 4:30 PM, Umesh Kacha <umesh.ka...@gmail.com> wrote:

> Hi Ted, thanks very much for your help, I really appreciate it. I tried the
> Maven dependencies you mentioned, but callUdf still does not compile. Please
> find a snapshot of my IntelliJ editor. Sorry, you may have to zoom in on the
> pictures, as I can't share the code. Thanks again.
> On Oct 19, 2015 8:32 AM, "Ted Yu" <yuzhih...@gmail.com> wrote:
>
>> Umesh:
>>
>> $ jar tvf
>> /home/hbase/.m2/repository/org/spark-project/hive/hive-exec/1.2.1.spark/hive-exec-1.2.1.spark.jar
>> | grep GenericUDAFPercentile
>>   2143 Fri Jul 31 23:51:48 PDT 2015
>> org/apache/hadoop/hive/ql/udf/generic/GenericUDAFPercentileApprox$1.class
>>   4602 Fri Jul 31 23:51:48 PDT 2015
>> org/apache/hadoop/hive/ql/udf/generic/GenericUDAFPercentileApprox$GenericUDAFMultiplePercentileApproxEvaluator.class
>>
>> As long as the following dependency is in your pom.xml:
>> [INFO] +- org.spark-project.hive:hive-exec:jar:1.2.1.spark:compile
>>
>> You should be able to invoke percentile_approx
>>
>> Cheers
>>
>> On Sun, Oct 18, 2015 at 8:58 AM, Umesh Kacha <umesh.ka...@gmail.com>
>> wrote:
>>
>>> Thanks much, Ted. So when do we get to use this Spark UDF in Java code
>>> through Maven dependencies? You said SPARK-10671 was not pushed as
>>> part of 1.5.1, so it should be released in 1.6.0 as mentioned in the JIRA,
>>> right?
>>>
>>> On Sun, Oct 18, 2015 at 9:20 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>
>>>> The udf is defined in GenericUDAFPercentileApprox of hive.
>>>>
>>>> When spark-shell runs, it has access to the above class, which is packaged in
>>>> assembly/target/scala-2.10/spark-assembly-1.6.0-SNAPSHOT-hadoop2.7.0.jar:
>>>>
>>>>   2143 Fri Oct 16 15:02:26 PDT 2015
>>>> org/apache/hadoop/hive/ql/udf/generic/GenericUDAFPercentileApprox$1.class
>>>>   4602 Fri Oct 16 15:02:26 PDT 2015
>>>> org/apache/hadoop/hive/ql/udf/generic/GenericUDAFPercentileApprox$GenericUDAFMultiplePercentileApproxEvaluator.class
>>>>   1697 Fri Oct 16 15:02:26 PDT 2015
>>>> org/apache/hadoop/hive/ql/udf/generic/GenericUDAFPercentileApprox$GenericUDAFPercentileApproxEvaluator$PercentileAggBuf.class
>>>>   6570 Fri Oct 16 15:02:26 PDT 2015
>>>> org/apache/hadoop/hive/ql/udf/generic/GenericUDAFPercentileApprox$GenericUDAFPercentileApproxEvaluator.class
>>>>   4334 Fri Oct 16 15:02:26 PDT 2015
>>>> org/apache/hadoop/hive/ql/udf/generic/GenericUDAFPercentileApprox$GenericUDAFSinglePercentileApproxEvaluator.class
>>>>   6293 Fri Oct 16 15:02:26 PDT 2015
>>>> org/apache/hadoop/hive/ql/udf/generic/GenericUDAFPercentileApprox.class
>>>>
>>>> That is the cause of the different behavior.
>>>>
>>>> FYI
>>>>
>>>> On Sun, Oct 18, 2015 at 12:10 AM, unk1102 <umesh.ka...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi, I am starting a new thread to follow up on the old one. It looks like the
>>>>> code needed to compile callUdf("percentile_approx",col("mycol"),lit(0.25))
>>>>> is not merged in the Spark 1.5.1 source, but I don't understand why this
>>>>> function call works in the Spark 1.5.1 spark-shell/bin. Please guide.
>>>>>
>>>>> ---------- Forwarded message ----------
>>>>> From: "Ted Yu" <yuzhih...@gmail.com>
>>>>> Date: Oct 14, 2015 3:26 AM
>>>>> Subject: Re: How to calculate percentile of a column of DataFrame?
>>>>> To: "Umesh Kacha" <umesh.ka...@gmail.com>
>>>>> Cc: "Michael Armbrust" <mich...@databricks.com>,
>>>>> "<saif.a.ell...@wellsfargo.com>" <saif.a.ell...@wellsfargo.com>,
>>>>> "user" <user@spark.apache.org>
>>>>>
>>>>> I modified DataFrameSuite, in master branch, to call percentile_approx
>>>>> instead of simpleUDF :
>>>>>
>>>>> - deprecated callUdf in SQLContext
>>>>> - callUDF in SQLContext *** FAILED ***
>>>>>   org.apache.spark.sql.AnalysisException: undefined function percentile_approx;
>>>>>   at org.apache.spark.sql.catalyst.analysis.SimpleFunctionRegistry$$anonfun$2.apply(FunctionRegistry.scala:64)
>>>>>   at org.apache.spark.sql.catalyst.analysis.SimpleFunctionRegistry$$anonfun$2.apply(FunctionRegistry.scala:64)
>>>>>   at scala.Option.getOrElse(Option.scala:120)
>>>>>   at org.apache.spark.sql.catalyst.analysis.SimpleFunctionRegistry.lookupFunction(FunctionRegistry.scala:63)
>>>>>   at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$10$$anonfun$applyOrElse$5$$anonfun$applyOrElse$24.apply(Analyzer.scala:506)
>>>>>   at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$10$$anonfun$applyOrElse$5$$anonfun$applyOrElse$24.apply(Analyzer.scala:506)
>>>>>   at org.apache.spark.sql.catalyst.analysis.package$.withPosition(package.scala:48)
>>>>>   at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$10$$anonfun$applyOrElse$5.applyOrElse(Analyzer.scala:505)
>>>>>   at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$10$$anonfun$applyOrElse$5.applyOrElse(Analyzer.scala:502)
>>>>>   at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:227)
>>>>>
>>>>> SPARK-10671 is included.
>>>>> For 1.5.1, I guess the absence of SPARK-10671 means that SparkSQL treats
>>>>> percentile_approx as a normal UDF.
>>>>>
>>>>> Experts can correct me, if there is any misunderstanding.
>>>>>
>>>>> Cheers
>>>>>
>>>>
>>>
>>
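
The `jar tvf ... | grep GenericUDAFPercentile` check quoted above can also be done from Java, which may be handy inside a build or test. This is a rough sketch under stated assumptions: JarGrep and find are illustrative names, not an existing tool, and the jar path is whatever your local Maven repository uses.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical helper: lists jar entries whose names contain a given substring.
class JarGrep {
    public static List<String> find(String jarPath, String needle) throws IOException {
        List<String> hits = new ArrayList<>();
        // try-with-resources closes the jar even if reading fails.
        try (JarFile jar = new JarFile(jarPath)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                String name = entries.nextElement().getName();
                if (name.contains(needle)) {
                    hits.add(name);
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java JarGrep <path-to-jar> <substring>");
            return;
        }
        for (String name : find(args[0], args[1])) {
            System.out.println(name);
        }
    }
}
```

For example, `java JarGrep <path to hive-exec-1.2.1.spark.jar> GenericUDAFPercentile` should print the same class entries as the listing quoted earlier in the thread, without the size and date columns.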
