Hi Ted, thanks very much for the detailed answer; I appreciate your efforts. Do we need to register Hive UDFs?
sqlContext.udf.register("percentile_approx"); // is this valid?

I am calling the Hive UDF percentile_approx in the following manner, which gives a compilation error:

df.select("col1").groupBy("col1").agg(callUdf("percentile_approx", col("col1"), lit(0.25))); // compile error, because callUdf() takes only a String and a Column as arguments

Please guide. Thanks much.

On Mon, Oct 12, 2015 at 11:44 PM, Ted Yu <yuzhih...@gmail.com> wrote:

> Using spark-shell, I did the following exercise (master branch):
>
> SQL context available as sqlContext.
>
> scala> val df = Seq(("id1", 1), ("id2", 4), ("id3", 5)).toDF("id", "value")
> df: org.apache.spark.sql.DataFrame = [id: string, value: int]
>
> scala> sqlContext.udf.register("simpleUDF", (v: Int, cnst: Int) => v * v + cnst)
> res0: org.apache.spark.sql.UserDefinedFunction = UserDefinedFunction(<function2>,IntegerType,List())
>
> scala> df.select($"id", callUDF("simpleUDF", $"value", lit(25))).show()
> +---+--------------------+
> | id|'simpleUDF(value,25)|
> +---+--------------------+
> |id1|                  26|
> |id2|                  41|
> |id3|                  50|
> +---+--------------------+
>
> Which Spark release are you using?
>
> Can you pastebin the full stack trace where you got the error?
>
> Cheers
>
> On Fri, Oct 9, 2015 at 1:09 PM, Umesh Kacha <umesh.ka...@gmail.com> wrote:
>
>> I have a doubt, Michael. I tried to use callUDF in the following code and it does not work:
>>
>> sourceFrame.agg(callUdf("percentile_approx", col("myCol"), lit(0.25)))
>>
>> The above code does not compile because callUdf() takes only two arguments: the function name as a String and a Column. Please guide.
>>
>> On Sat, Oct 10, 2015 at 1:29 AM, Umesh Kacha <umesh.ka...@gmail.com> wrote:
>>
>>> Thanks much, Michael, let me try.
>>>
>>> On Sat, Oct 10, 2015 at 1:20 AM, Michael Armbrust <mich...@databricks.com> wrote:
>>>
>>>> This is confusing because I made a typo...
>>>>
>>>> callUDF("percentile_approx", col("mycol"), lit(0.25))
>>>>
>>>> The first argument is the name of the UDF; all other arguments need to be columns that are passed in as arguments. lit is just saying to make a literal column that always has the value 0.25.
>>>>
>>>> On Fri, Oct 9, 2015 at 12:16 PM, <saif.a.ell...@wellsfargo.com> wrote:
>>>>
>>>>> Yes, but I mean, this is rather curious. How does def lit(literal: Any) become a percentile function lit(25)?
>>>>>
>>>>> Thanks for the clarification
>>>>>
>>>>> Saif
>>>>>
>>>>> *From:* Umesh Kacha [mailto:umesh.ka...@gmail.com]
>>>>> *Sent:* Friday, October 09, 2015 4:10 PM
>>>>> *To:* Ellafi, Saif A.
>>>>> *Cc:* Michael Armbrust; user
>>>>> *Subject:* Re: How to calculate percentile of a column of DataFrame?
>>>>>
>>>>> I found it in the 1.3 documentation; lit says something else, nothing about percentiles:
>>>>>
>>>>> public static Column lit(Object literal)
>>>>> <https://spark.apache.org/docs/1.3.1/api/java/org/apache/spark/sql/Column.html>
>>>>>
>>>>> Creates a Column of literal value.
>>>>>
>>>>> The passed in object is returned directly if it is already a Column. If the object is a Scala Symbol, it is converted into a Column also. Otherwise, a new Column is created to represent the literal value.
>>>>>
>>>>> On Sat, Oct 10, 2015 at 12:39 AM, <saif.a.ell...@wellsfargo.com> wrote:
>>>>>
>>>>> Where can we find other available functions such as lit()? I can't find lit in the API.
>>>>>
>>>>> Thanks
>>>>>
>>>>> *From:* Michael Armbrust [mailto:mich...@databricks.com]
>>>>> *Sent:* Friday, October 09, 2015 4:04 PM
>>>>> *To:* unk1102
>>>>> *Cc:* user
>>>>> *Subject:* Re: How to calculate percentile of a column of DataFrame?
>>>>>
>>>>> You can use callUDF(col("mycol"), lit(0.25)) to call hive UDFs from dataframes.
>>>>>
>>>>> On Fri, Oct 9, 2015 at 12:01 PM, unk1102 <umesh.ka...@gmail.com> wrote:
>>>>>
>>>>> Hi, how do I calculate the percentile of a column in a DataFrame? I can't find any percentile_approx function among Spark's aggregation functions. For example, in Hive we have percentile_approx and we can use it in the following way:
>>>>>
>>>>> hiveContext.sql("select percentile_approx(mycol, 0.25) from myTable");
>>>>>
>>>>> I can see the ntile function, but I am not sure how it would give the same results as the above query. Please guide.
>>>>>
>>>>> --
>>>>> View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-calculate-percentile-of-a-column-of-DataFrame-tp25000.html
>>>>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>>>>> For additional commands, e-mail: user-h...@spark.apache.org
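[Editor's note] The thread converges on `callUDF("percentile_approx", col("mycol"), lit(0.25))` as the way to invoke the Hive UDF from the DataFrame API. For readers unsure what that call actually computes, here is a minimal plain-Scala sketch of an exact p-quantile (nearest-rank method) on an in-memory collection; `percentile_approx` returns an approximation of this over distributed data. The object and method names (`QuantileSketch`, `quantile`) are mine for illustration, not a Spark API:

```scala
object QuantileSketch {
  // Exact p-quantile via the nearest-rank method: the smallest value v
  // such that at least p * n of the values are <= v.
  def quantile(values: Seq[Double], p: Double): Double = {
    require(values.nonEmpty, "values must be non-empty")
    require(p >= 0.0 && p <= 1.0, "p must be in [0, 1]")
    val sorted = values.sorted
    // Rank is 1-based; clamp to at least 1 so p = 0.0 yields the minimum.
    val rank = math.ceil(p * sorted.length).toInt.max(1)
    sorted(rank - 1)
  }

  def main(args: Array[String]): Unit = {
    val data = Seq(1.0, 4.0, 5.0, 7.0)
    println(quantile(data, 0.25)) // prints 1.0 (smallest value covering 25% of the data)
    println(quantile(data, 0.5))  // prints 4.0
  }
}
```

In Spark itself, the per-group equivalent discussed above would look like `df.groupBy("col1").agg(callUDF("percentile_approx", col("col1"), lit(0.25)))` on a HiveContext-backed DataFrame; as of Spark 1.5, `callUDF` accepts a function name plus varargs columns, and built-in Hive UDFs such as percentile_approx need no `sqlContext.udf.register` call when a HiveContext is in use.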