Thanks, but I have tried everything. To confirm, I am pasting the code below. If you can compile the following in Java with Spark 1.5.2, great; otherwise nothing here will help, as I have been stuck on this for the last few days.
public class PercentileHiveApproxTestMain {
    public static void main(String[] args) {
        SparkConf sparkConf = new SparkConf()
                .setAppName("PercentileHiveApproxTestMain")
                .setMaster("local[*]");
        SparkContext sc = new SparkContext(sparkConf);
        SQLContext sqlContext = new SQLContext(sc);
        // load two-column data from CSV and create a DataFrame with columns C1 (int), C0 (string)
        DataFrame df = sqlContext.read().format("com.databricks.spark.csv").load("/tmp/df.csv");
        df.select(callUdf("percentile_approx", col("C1"), lit(0.25))).show(); // does not compile
    }
}

On Mon, Dec 28, 2015 at 9:56 PM, Hamel Kothari <hamelkoth...@gmail.com> wrote:

> If you scroll further down in the documentation, you will see that callUDF
> does have a version which takes (String, Column...) as arguments: callUDF
> <https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/functions.html#callUDF(java.lang.String,%20org.apache.spark.sql.Column...)>
> (java.lang.String udfName, Column
> <https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/Column.html>
> ... cols)
>
> Unfortunately the link I posted above doesn't seem to work because of the
> punctuation in the URL, but it is there. If you use "callUdf" from Java with
> a String first argument, which is what you seem to be doing, it expects a
> Seq<Column> because of the way it is defined in Scala. That is also a
> deprecated method anyway.
>
> The reason you're getting the exception is not that you're calling the
> wrong method; it's that the percentile_approx UDF is never registered.
> If you're passing in a UDF by name, you must register it with your SQL
> context as follows (example taken from the documentation of the method
> referenced above):
>
> import org.apache.spark.sql._
>
> val df = Seq(("id1", 1), ("id2", 4), ("id3", 5)).toDF("id", "value")
> val sqlContext = df.sqlContext
> sqlContext.udf.register("simpleUDF", (v: Int) => v * v)
> df.select($"id", callUDF("simpleUDF", $"value"))
>
> On Mon, Dec 28, 2015 at 11:08 AM Umesh Kacha <umesh.ka...@gmail.com> wrote:
>
>> Hi, thanks, but you have understood the question incorrectly. First of
>> all, I am passing the UDF name as a String, and if you look at the callUDF
>> arguments, it does not take a String as the first argument; if I use
>> callUDF, it throws an exception saying the percentile_approx function is
>> not found. The other thing I mentioned is that the same call works in the
>> Spark Scala console, so it is not a problem of calling it in an unexpected
>> way. I hope the question is clear now.
>>
>> On Mon, Dec 28, 2015 at 9:21 PM, Hamel Kothari <hamelkoth...@gmail.com> wrote:
>>
>>> Also, if I'm reading correctly, it looks like you're calling "callUdf"
>>> when what you probably want is "callUDF" (notice the subtle capitalization
>>> difference). Docs:
>>> https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/functions.html#callUDF(java.lang.String,%20org.apache.spark.sql.Column...)
>>>
>>> On Mon, Dec 28, 2015 at 10:48 AM Hamel Kothari <hamelkoth...@gmail.com> wrote:
>>>
>>>> Would you mind sharing more of your code? I can't really see the code
>>>> that well in the attached screenshot, but it appears that "Lit" is
>>>> capitalized. I'm not sure what that method refers to, but the definition
>>>> in functions.scala is lowercase.
>>>>
>>>> Even if that's not it, some more code would be helpful for solving this.
>>>> Also, since it's a compilation error, sharing the compilation error
>>>> itself would be very useful.
>>>> -Hamel
>>>>
>>>> On Mon, Dec 28, 2015 at 10:26 AM unk1102 <umesh.ka...@gmail.com> wrote:
>>>>
>>>>> <http://apache-spark-user-list.1001560.n3.nabble.com/file/n25821/Screen_Shot_2015-12-28_at_8.jpg>
>>>>>
>>>>> Hi, I am trying to invoke a Hive UDF using
>>>>> dataframe.select(callUdf("percentile_approx", col("C1"), lit(0.25))),
>>>>> but it does not compile, although the same call works in the Spark
>>>>> Scala console, and I don't understand why. I am using the Spark 1.5.2
>>>>> Maven source in my Java code. I have also explicitly added the Maven
>>>>> dependency hive-exec-1.2.1.spark.jar, where percentile_approx is
>>>>> located, but the code still does not compile. Please check the
>>>>> attached code image. Please guide. Thanks in advance.
>>>>>
>>>>> --
>>>>> View this message in context:
>>>>> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-DataFrame-callUdf-does-not-compile-tp25821.html
>>>>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>>>>> For additional commands, e-mail: user-h...@spark.apache.org
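[Editor's note] Pulling the replies together: in Spark 1.5.x the Java-friendly overload is callUDF(String, Column...), while the lowercase callUdf takes a Seq<Column> and is deprecated, which is why the original call does not compile from Java; and percentile_approx is a Hive built-in, so it only resolves through a HiveContext's function registry. A minimal sketch along those lines follows — untested against 1.5.2; the CSV path and column name are taken from the question, and it assumes spark-hive and spark-csv are on the classpath:

```java
import org.apache.spark.SparkConf;
import org.apache.spark.SparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.hive.HiveContext;

import static org.apache.spark.sql.functions.callUDF;
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.lit;

public class PercentileHiveApproxTestMain {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("PercentileHiveApproxTestMain")
                .setMaster("local[*]");
        SparkContext sc = new SparkContext(conf);

        // Use HiveContext rather than plain SQLContext: it carries Hive's
        // function registry, which is where percentile_approx is defined.
        HiveContext sqlContext = new HiveContext(sc);

        DataFrame df = sqlContext.read()
                .format("com.databricks.spark.csv")
                .load("/tmp/df.csv");

        // callUDF (capital UDF) exposes the (String, Column...) varargs
        // overload that is callable from Java; the analyzer resolves the
        // name against the HiveContext's registry at query time.
        df.select(callUDF("percentile_approx", col("C1"), lit(0.25))).show();
    }
}
```

The same varargs overload works for any function the context can resolve by name, including UDFs registered via sqlContext.udf().register(...), as in the Scala example quoted above.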