I've made some progress on this issue and I think it's a bug... Apparently, when trying to use registered UDFs on tables that come from Hive, it throws the above exception (*ClassNotFoundException: org.apache.zeppelin.spark.ZeppelinContext*). When I create a new table and register the UDF, it works as expected. See below for full details and an example.
Can someone tell me if this is the expected behavior or a bug? BTW, I don't mind working on the bug if you can give me a pointer to the right places. BTW2: registering the SAME DataFrame as a temp table does not solve the problem; only creating a new table out of a new DataFrame does (see below).

*Detailed example*
1. I have a table in Hive called '*hive_table*' with a string field called *'name'* and an int field called *'sid'*.
2. I registered a UDF:
def getStr(str: String) = str + "_str"
hc.udf.register("getStr", getStr _)
3. Running the following in Zeppelin:
%sql select getStr(name), * from hive_table
yields the exception *ClassNotFoundException: org.apache.zeppelin.spark.ZeppelinContext*.
4. Creating a new table, as follows (note: collect() rather than collectAsList(), so that map is available on the result):
case class SidName(sid: Int, name: String)
val sidNameList = hc.sql("select sid, name from hive_table limit 10").collect().map(row => SidName(row.getInt(0), row.getString(1)))
val sidNameDF = hc.createDataFrame(sidNameList)
sidNameDF.registerTempTable("tmp_sid_name")
5. Querying the new table in the same fashion:
%sql select getStr(name), * from tmp_sid_name
This time I get the expected results!

On Mon, Jun 29, 2015 at 5:16 PM, Ophir Cohen <oph...@gmail.com> wrote:

> BTW
> The same query, on the same cluster but in the Spark shell, returns the
> expected results.
>
> On Mon, Jun 29, 2015 at 3:24 PM, Ophir Cohen <oph...@gmail.com> wrote:
>
>> It looks like the Zeppelin jar is not distributed to the Spark nodes,
>> though I can't understand why it is needed for the UDF.
>>
>> On Mon, Jun 29, 2015 at 3:23 PM, Ophir Cohen <oph...@gmail.com> wrote:
>>
>>> Thanks for the response,
>>> I'm not sure what you mean; that is exactly what I tried, and it failed.
>>> As I wrote above, 'hc' is just a different name for sqlc (which is
>>> a different name for z.sqlContext).
>>>
>>> I get the same results.
>>>
>>>
>>> On Mon, Jun 29, 2015 at 2:12 PM, Mina Lee <mina...@nflabs.com> wrote:
>>>
>>>> Hi Ophir,
>>>>
>>>> Can you try the below?
>>>>
>>>> def getNum(): Int = {
>>>>   100
>>>> }
>>>> sqlc.udf.register("getNum", getNum _)
>>>> sqlc.sql("select getNum() from filteredNc limit 1").show
>>>>
>>>> FYI, sqlContext (== sqlc) is created internally by Zeppelin,
>>>> and it uses a HiveContext as the sqlContext by default
>>>> (unless you changed useHiveContext to "false" in the interpreter
>>>> menu).
>>>>
>>>> Hope it helps.
>>>>
>>>> On Mon, Jun 29, 2015 at 7:55 PM, Ophir Cohen <oph...@gmail.com> wrote:
>>>>
>>>>> Guys?
>>>>> Somebody?
>>>>> Can it be that Zeppelin does not support UDFs?
>>>>>
>>>>> On Sun, Jun 28, 2015 at 11:53 AM, Ophir Cohen <oph...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Guys,
>>>>>> One more problem I have encountered using Zeppelin,
>>>>>> with Spark 1.3.1 on YARN, Hadoop 2.4.
>>>>>>
>>>>>> I'm trying to create and use a UDF (hc == z.sqlContext == HiveContext):
>>>>>> 1. Create and register the UDF:
>>>>>> def getNum(): Int = {
>>>>>>   100
>>>>>> }
>>>>>> hc.udf.register("getNum", getNum _)
>>>>>>
>>>>>> 2. Try to use it on an existing table:
>>>>>> %sql select getNum() from filteredNc limit 1
>>>>>>
>>>>>> Or:
>>>>>> 3. Try using hc directly:
>>>>>> hc.sql("select getNum() from filteredNc limit 1").collect
>>>>>>
>>>>>> Both of them fail with
>>>>>> *"java.lang.ClassNotFoundException:
>>>>>> org.apache.zeppelin.spark.ZeppelinContext"*
>>>>>> (see below for the full exception).
>>>>>>
>>>>>> My questions are:
>>>>>> 1. Can it be that ZeppelinContext is not available on the Spark nodes?
>>>>>> 2. Why does it need ZeppelinContext anyway? Why is it relevant?
>>>>>>
>>>>>> The exception:
>>>>>> WARN [2015-06-28 08:43:53,850] ({task-result-getter-0}
>>>>>> Logging.scala[logWarning]:71) - Lost task 0.2 in stage 23.0 (TID 1626,
>>>>>> ip-10-216-204-246.ec2.internal): java.lang.NoClassDefFoundError:
>>>>>> Lorg/apache/zeppelin/spark/ZeppelinContext;
>>>>>>     at java.lang.Class.getDeclaredFields0(Native Method)
>>>>>>     at java.lang.Class.privateGetDeclaredFields(Class.java:2499)
>>>>>>     at java.lang.Class.getDeclaredField(Class.java:1951)
>>>>>>     at java.io.ObjectStreamClass.getDeclaredSUID(ObjectStreamClass.java:1659)
>>>>>>
>>>>>> <Many more ObjectStreamClass lines of the exception>
>>>>>>
>>>>>> Caused by: java.lang.ClassNotFoundException:
>>>>>> org.apache.zeppelin.spark.ZeppelinContext
>>>>>>     at org.apache.spark.repl.ExecutorClassLoader.findClass(ExecutorClassLoader.scala:69)
>>>>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>>>>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>>>>>>     ... 103 more
>>>>>> Caused by: java.lang.ClassNotFoundException:
>>>>>> org.apache.zeppelin.spark.ZeppelinContext
>>>>>>     at java.lang.ClassLoader.findClass(ClassLoader.java:531)
>>>>>>     at org.apache.spark.util.ParentClassLoader.findClass(ParentClassLoader.scala:26)
>>>>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>>>>>>     at org.apache.spark.util.ParentClassLoader.loadClass(ParentClassLoader.scala:34)
>>>>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>>>>>>     at org.apache.spark.util.ParentClassLoader.loadClass(ParentClassLoader.scala:30)
>>>>>>     at org.apache.spark.repl.ExecutorClassLoader.findClass(ExecutorClassLoader.scala:64)
>>>>>>     ... 105 more
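[Editor's note] The trace above shows the executors failing inside ExecutorClassLoader while deserializing the UDF, which suggests the closure shipped to the workers references ZeppelinContext, a class that exists only on the driver. A commonly suggested workaround for this kind of REPL serialization problem, sketched here under the *assumption* that the UDF's closure is capturing the notebook paragraph's scope, is to define the UDF body inside a standalone serializable object. The object name `StrUdfs` is hypothetical; `hc` and `hive_table` are the names from the example above:

```scala
// Sketch only: keep the UDF body inside a plain object so the closure
// shipped to executors does not capture the interpreter's paragraph
// scope (which, in Zeppelin, can reference ZeppelinContext).
object StrUdfs extends Serializable {
  def getStr(str: String): String = str + "_str"
}

// Register the function reference taken from the object rather than a
// top-level `def` typed directly into the notebook paragraph.
hc.udf.register("getStr", StrUdfs.getStr _)

// The failing query from the example may then run without needing
// ZeppelinContext on the executors.
hc.sql("select getStr(name) from hive_table limit 10").show()
```

Whether this avoids the exception depends on how Zeppelin's Spark interpreter wraps paragraph code, so treat it as an experiment rather than a confirmed fix.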