Thanks, but I have tried everything. To confirm, I am pasting the code below. If you can compile the following in Java with Spark 1.5.2, great; otherwise nothing here will help, as I have been stuck on this for the last few days.
public class PercentileHiveApproxTestMain {
    public static void main(String[] args) {
        SparkConf sparkConf = new SparkConf()
                .setAppName("PercentileHiveApproxTestMain")
                .setMaster("local[*]");
        SparkContext sc = new SparkContext(sparkConf);
        SQLContext sqlContext = new SQLContext(sc);
        // load two-column data from CSV and create a DataFrame with columns C1 (int), C0 (string)
        DataFrame df = sqlContext.read().format("com.databricks.spark.csv").load("/tmp/df.csv");
        df.select(callUdf("percentile_approx", col("C1"), lit(0.25))).show(); // does not compile
    }
}

On Mon, Dec 28, 2015 at 9:56 PM, Hamel Kothari <hamelkoth...@gmail.com> wrote:

> If you scroll further down in the documentation, you will see that callUDF
> does have a version which takes (String, Column...) as arguments: callUDF
> <https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/functions.html#callUDF(java.lang.String,%20org.apache.spark.sql.Column...)>
> (java.lang.String udfName, Column
> <https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/Column.html>
> ... cols)
>
> Unfortunately the link I posted above doesn't seem to work because of the
> punctuation in the URL, but it is there. If you use "callUdf" from Java with
> a String first argument, which is what you seem to be doing, it expects a
> Seq<Column> because of the way it is defined in Scala. That is also a
> deprecated method anyway.
>
> The reason you're getting the exception is not that you're calling the
> wrong method; it's that the percentile_approx UDF is never registered.
> If you're passing in a UDF by name, you must register it with your SQL
> context as follows (example taken from the documentation of the method
> referenced above):
>
> import org.apache.spark.sql._
>
> val df = Seq(("id1", 1), ("id2", 4), ("id3", 5)).toDF("id", "value")
> val sqlContext = df.sqlContext
> sqlContext.udf.register("simpleUDF", (v: Int) => v * v)
> df.select($"id", callUDF("simpleUDF", $"value"))
>
> On Mon, Dec 28, 2015 at 11:08 AM Umesh Kacha <umesh.ka...@gmail.com> wrote:
>
>> Hi, thanks, but you have understood the question incorrectly. First of
>> all, I am passing the UDF name as a String, and if you look at the callUDF
>> arguments, it does not take a String as the first argument; if I use
>> callUDF, it throws an exception saying the percentile_approx function is
>> not found. The other thing I mentioned is that the same call works in the
>> Spark Scala console, so it is not a problem of calling it in an unexpected
>> way. I hope the question is clear now.
>>
>> On Mon, Dec 28, 2015 at 9:21 PM, Hamel Kothari <hamelkoth...@gmail.com> wrote:
>>
>>> Also, if I'm reading correctly, it looks like you're calling "callUdf"
>>> when what you probably want is "callUDF" (notice the subtle capitalization
>>> difference). Docs:
>>> https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/functions.html#callUDF(java.lang.String,%20org.apache.spark.sql.Column...)
>>>
>>> On Mon, Dec 28, 2015 at 10:48 AM Hamel Kothari <hamelkoth...@gmail.com> wrote:
>>>
>>>> Would you mind sharing more of your code? I can't really see the code
>>>> that well in the attached screenshot, but it appears that "Lit" is
>>>> capitalized. I'm not sure what that method refers to, but the definition
>>>> in functions.scala is lowercase.
>>>>
>>>> Even if that's not it, some more code would be helpful for solving this.
>>>> Also, since it's a compilation error, sharing the compilation error
>>>> itself would be very useful.
>>>> -Hamel
>>>>
>>>> On Mon, Dec 28, 2015 at 10:26 AM unk1102 <umesh.ka...@gmail.com> wrote:
>>>>
>>>>> <http://apache-spark-user-list.1001560.n3.nabble.com/file/n25821/Screen_Shot_2015-12-28_at_8.jpg>
>>>>>
>>>>> Hi, I am trying to invoke a Hive UDF using
>>>>> dataframe.select(callUdf("percentile_approx", col("C1"), lit(0.25))),
>>>>> but it does not compile, although the same call works in the Spark
>>>>> Scala console, and I don't understand why. I am using the Spark 1.5.2
>>>>> Maven source in my Java code. I have also explicitly added the Maven
>>>>> dependency hive-exec-1.2.1.spark.jar, where percentile_approx is
>>>>> located, but the code still does not compile. Please check the
>>>>> attached code image. Please guide. Thanks in advance.
>>>>>
>>>>> --
>>>>> View this message in context:
>>>>> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-DataFrame-callUdf-does-not-compile-tp25821.html
>>>>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>>>>> For additional commands, e-mail: user-h...@spark.apache.org
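[Editor's note] Pulling the replies together: in Spark 1.5.x the Java-friendly overload is callUDF(String, Column...), while the lowercase callUdf takes a Seq<Column> and is deprecated, which is why the original call does not compile from Java; and percentile_approx is a Hive built-in, so it only resolves through a HiveContext's function registry. A minimal sketch along those lines follows — untested against 1.5.2; the CSV path and column name are taken from the question, and it assumes spark-hive and spark-csv are on the classpath:

```java
import org.apache.spark.SparkConf;
import org.apache.spark.SparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.hive.HiveContext;

import static org.apache.spark.sql.functions.callUDF;
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.lit;

public class PercentileHiveApproxTestMain {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("PercentileHiveApproxTestMain")
                .setMaster("local[*]");
        SparkContext sc = new SparkContext(conf);

        // Use HiveContext rather than plain SQLContext: it carries Hive's
        // function registry, which is where percentile_approx is defined.
        HiveContext sqlContext = new HiveContext(sc);

        DataFrame df = sqlContext.read()
                .format("com.databricks.spark.csv")
                .load("/tmp/df.csv");

        // callUDF (capital UDF) exposes the (String, Column...) varargs
        // overload that is callable from Java; the analyzer resolves the
        // name against the HiveContext's registry at query time.
        df.select(callUDF("percentile_approx", col("C1"), lit(0.25))).show();
    }
}
```

The same varargs overload works for any function the context can resolve by name, including UDFs registered via sqlContext.udf().register(...), as in the Scala example quoted above.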