Thank you Moon. Here is the link: https://issues.apache.org/jira/browse/ZEPPELIN-150
Please let me know how I can help further.

On Thu, Jul 2, 2015 at 2:35 AM, moon soo Lee <m...@apache.org> wrote:

> Really appreciate you sharing the problem.
> Very interesting. Do you mind filing an issue on JIRA?
>
> Best,
> moon
>
> On Tue, Jun 30, 2015 at 4:32 AM Ophir Cohen <oph...@gmail.com> wrote:
>
>> BTW, this isn't working either:
>>
>> val sidNameDF = hc.sql("select sid, name from hive_table limit 10")
>> val sidNameDF2 = hc.createDataFrame(sidNameDF.rdd, sidNameDF.schema)
>> sidNameDF2.registerTempTable("tmp_sid_name2")
>>
>> On Tue, Jun 30, 2015 at 1:45 PM, Ophir Cohen <oph...@gmail.com> wrote:
>>
>>> I've made some progress on this issue and I think it's a bug...
>>>
>>> Apparently, trying to use registered UDFs on tables that come from
>>> Hive returns the above exception (ClassNotFoundException:
>>> org.apache.zeppelin.spark.ZeppelinContext).
>>> When I create a new table and register it, UDFs work as expected.
>>> See below for the full details and an example.
>>>
>>> Can someone tell me whether this is the expected behavior or a bug?
>>> BTW
>>> I don't mind working on that bug - if you can give me a pointer to
>>> the right places.
>>>
>>> BTW2
>>> Registering the SAME DataFrame as a temp table does not solve the
>>> problem - only creating a new table out of a new DataFrame does
>>> (see below).
>>>
>>> Detailed example
>>> 1. I have a table in Hive called 'hive_table' with a string field
>>> called 'name' and an int field called 'sid'.
>>>
>>> 2. I registered a UDF:
>>> def getStr(str: String) = str + "_str"
>>> hc.udf.register("getStr", getStr _)
>>>
>>> 3. Running the following in Zeppelin:
>>> %sql select getStr(name), * from hive_table
>>> yields the exception:
>>> ClassNotFoundException: org.apache.zeppelin.spark.ZeppelinContext
>>>
>>> 4. Creating a new table, as follows:
>>> case class SidName(sid: Int, name: String)
>>> val sidNameList = hc.sql("select sid, name from hive_table limit 10")
>>>   .collect().map(row => new SidName(row.getInt(0), row.getString(1))).toSeq
>>> val sidNameDF = hc.createDataFrame(sidNameList)
>>> sidNameDF.registerTempTable("tmp_sid_name")
>>>
>>> 5. Querying the new table in the same fashion:
>>> %sql select getStr(name), * from tmp_sid_name
>>>
>>> This time I get the expected results!
>>>
>>> On Mon, Jun 29, 2015 at 5:16 PM, Ophir Cohen <oph...@gmail.com> wrote:
>>>
>>>> BTW
>>>> The same query, on the same cluster but in the Spark shell, returns
>>>> the expected results.
>>>>
>>>> On Mon, Jun 29, 2015 at 3:24 PM, Ophir Cohen <oph...@gmail.com> wrote:
>>>>
>>>>> It looks like the Zeppelin jar is not distributed to the Spark
>>>>> nodes, though I can't understand why it is needed for the UDF.
>>>>>
>>>>> On Mon, Jun 29, 2015 at 3:23 PM, Ophir Cohen <oph...@gmail.com> wrote:
>>>>>
>>>>>> Thanks for the response,
>>>>>> I'm not sure what you mean; that is exactly what I tried, and it
>>>>>> failed.
>>>>>> As I wrote above, 'hc' is just a different name for sqlc (which is
>>>>>> a different name for z.sqlContext).
>>>>>>
>>>>>> I get the same results.
>>>>>>
>>>>>> On Mon, Jun 29, 2015 at 2:12 PM, Mina Lee <mina...@nflabs.com> wrote:
>>>>>>
>>>>>>> Hi Ophir,
>>>>>>>
>>>>>>> Can you try the below?
>>>>>>>
>>>>>>> def getNum(): Int = {
>>>>>>>   100
>>>>>>> }
>>>>>>> sqlc.udf.register("getNum", getNum _)
>>>>>>> sqlc.sql("select getNum() from filteredNc limit 1").show
>>>>>>>
>>>>>>> FYI, sqlContext (== sqlc) is created internally by Zeppelin and
>>>>>>> uses a HiveContext as the sqlContext by default (if you did not
>>>>>>> change useHiveContext to "false" in the interpreter menu).
>>>>>>>
>>>>>>> Hope it helps.
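A quick way to confirm Mina's point from a notebook paragraph (a minimal sketch, assuming the default interpreter settings and the hc alias used earlier in this thread):

    // sqlc is the context Zeppelin injects into the notebook; with the
    // default useHiveContext=true it should actually be a HiveContext.
    println(sqlc.getClass.getName)  // expect org.apache.spark.sql.hive.HiveContext
    println(sqlc eq hc)             // true if hc was defined as z.sqlContext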
>>>>>>> On Mon, Jun 29, 2015 at 7:55 PM, Ophir Cohen <oph...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Guys?
>>>>>>>> Somebody?
>>>>>>>> Can it be that Zeppelin does not support UDFs?
>>>>>>>>
>>>>>>>> On Sun, Jun 28, 2015 at 11:53 AM, Ophir Cohen <oph...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi guys,
>>>>>>>>> One more problem I have encountered using Zeppelin.
>>>>>>>>> Using Spark 1.3.1 on YARN, Hadoop 2.4.
>>>>>>>>>
>>>>>>>>> I'm trying to create and use a UDF (hc == z.sqlContext ==
>>>>>>>>> HiveContext):
>>>>>>>>> 1. Create and register the UDF:
>>>>>>>>> def getNum(): Int = {
>>>>>>>>>   100
>>>>>>>>> }
>>>>>>>>> hc.udf.register("getNum", getNum _)
>>>>>>>>>
>>>>>>>>> 2. Try to use it on an existing table:
>>>>>>>>> %sql select getNum() from filteredNc limit 1
>>>>>>>>>
>>>>>>>>> Or:
>>>>>>>>> 3. Try using hc directly:
>>>>>>>>> hc.sql("select getNum() from filteredNc limit 1").collect
>>>>>>>>>
>>>>>>>>> Both of them fail with
>>>>>>>>> "java.lang.ClassNotFoundException:
>>>>>>>>> org.apache.zeppelin.spark.ZeppelinContext"
>>>>>>>>> (see below for the full exception).
>>>>>>>>>
>>>>>>>>> And my questions are:
>>>>>>>>> 1. Can it be that ZeppelinContext is not available on the Spark
>>>>>>>>> nodes?
>>>>>>>>> 2. Why does it need ZeppelinContext anyway? Why is it relevant?
>>>>>>>>>
>>>>>>>>> The exception:
>>>>>>>>> WARN [2015-06-28 08:43:53,850] ({task-result-getter-0}
>>>>>>>>> Logging.scala[logWarning]:71) - Lost task 0.2 in stage 23.0 (TID 1626,
>>>>>>>>> ip-10-216-204-246.ec2.internal): java.lang.NoClassDefFoundError:
>>>>>>>>> Lorg/apache/zeppelin/spark/ZeppelinContext;
>>>>>>>>> at java.lang.Class.getDeclaredFields0(Native Method)
>>>>>>>>> at java.lang.Class.privateGetDeclaredFields(Class.java:2499)
>>>>>>>>> at java.lang.Class.getDeclaredField(Class.java:1951)
>>>>>>>>> at java.io.ObjectStreamClass.getDeclaredSUID(ObjectStreamClass.java:1659)
>>>>>>>>>
>>>>>>>>> <Many more ObjectStreamClass lines of the exception>
>>>>>>>>>
>>>>>>>>> Caused by: java.lang.ClassNotFoundException:
>>>>>>>>> org.apache.zeppelin.spark.ZeppelinContext
>>>>>>>>> at org.apache.spark.repl.ExecutorClassLoader.findClass(ExecutorClassLoader.scala:69)
>>>>>>>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>>>>>>>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>>>>>>>>> ... 103 more
>>>>>>>>> Caused by: java.lang.ClassNotFoundException:
>>>>>>>>> org.apache.zeppelin.spark.ZeppelinContext
>>>>>>>>> at java.lang.ClassLoader.findClass(ClassLoader.java:531)
>>>>>>>>> at org.apache.spark.util.ParentClassLoader.findClass(ParentClassLoader.scala:26)
>>>>>>>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>>>>>>>>> at org.apache.spark.util.ParentClassLoader.loadClass(ParentClassLoader.scala:34)
>>>>>>>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>>>>>>>>> at org.apache.spark.util.ParentClassLoader.loadClass(ParentClassLoader.scala:30)
>>>>>>>>> at org.apache.spark.repl.ExecutorClassLoader.findClass(ExecutorClassLoader.scala:64)
>>>>>>>>> ... 105 more
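For anyone who lands on this thread: here is the workaround from the detailed example above, consolidated into a single notebook paragraph. It is a minimal sketch under the thread's setup (Spark 1.3.1, a Hive table hive_table with columns sid: Int and name: String, and hc bound to Zeppelin's HiveContext):

    // Register a simple UDF on the HiveContext.
    def getStr(str: String): String = str + "_str"
    hc.udf.register("getStr", getStr _)

    // Querying the Hive-backed table directly fails on the executors with
    // ClassNotFoundException: org.apache.zeppelin.spark.ZeppelinContext:
    //   hc.sql("select getStr(name) from hive_table limit 10").show()

    // Workaround: collect the rows to the driver, rebuild the DataFrame
    // from plain case-class instances, and register that as a temp table.
    case class SidName(sid: Int, name: String)
    val sidNames = hc.sql("select sid, name from hive_table limit 10")
      .collect()
      .map(row => SidName(row.getInt(0), row.getString(1)))
      .toSeq
    val sidNameDF = hc.createDataFrame(sidNames)
    sidNameDF.registerTempTable("tmp_sid_name")

    // The same UDF now works against the rebuilt table.
    hc.sql("select getStr(name) from tmp_sid_name").show()

Note that collect() pulls every matched row to the driver, so this is only practical for small result sets; the underlying issue is tracked in ZEPPELIN-150.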