Does this happen in local mode as well, or just on an external cluster?

Regarding the repro:

    %sql select getNum() from filteredNc limit 1

I guess filteredNc is some table you have? When I tried it on my local machine I got:

    no such table filteredNc; line 1 pos 21

Eran
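(Editor's note: to make Eran's local-repro attempt self-contained, the table has to exist first. The following is a sketch of two Zeppelin paragraphs, assuming the sqlc context Zeppelin provides; the Nc case class and its rows are invented for illustration and are not from the thread.)

```scala
// Paragraph 1 (%spark): register a tiny temp table named filteredNc,
// otherwise the %sql paragraph fails with "no such table filteredNc".
case class Nc(id: Int)
val filteredNc = sqlc.createDataFrame(Seq(Nc(1), Nc(2), Nc(3)))
filteredNc.registerTempTable("filteredNc")

def getNum(): Int = 100
sqlc.udf.register("getNum", getNum _)

// Paragraph 2 (%sql):
// select getNum() from filteredNc limit 1
```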
On Thu, Jul 2, 2015 at 12:44 PM Ophir Cohen <oph...@gmail.com> wrote:

> Thank you Moon.
> Here is the link:
> https://issues.apache.org/jira/browse/ZEPPELIN-150
>
> Please let me know how I can help further.
>
> On Thu, Jul 2, 2015 at 2:35 AM, moon soo Lee <m...@apache.org> wrote:
>
>> Really appreciate you sharing the problem.
>> Very interesting. Do you mind filing an issue on JIRA?
>>
>> Best,
>> moon
>>
>> On Tue, Jun 30, 2015 at 4:32 AM Ophir Cohen <oph...@gmail.com> wrote:
>>
>>> BTW, this isn't working either:
>>>
>>> val sidNameDF = hc.sql("select sid, name from hive_table limit 10")
>>> val sidNameDF2 = hc.createDataFrame(sidNameDF.rdd, sidNameDF.schema)
>>> sidNameDF2.registerTempTable("tmp_sid_name2")
>>>
>>> On Tue, Jun 30, 2015 at 1:45 PM, Ophir Cohen <oph...@gmail.com> wrote:
>>>
>>>> I've made some progress on this issue and I think it's a bug...
>>>>
>>>> Apparently, when trying to use registered UDFs on tables that come
>>>> from Hive, it throws the above exception (ClassNotFoundException:
>>>> org.apache.zeppelin.spark.ZeppelinContext).
>>>> When I create a new table and register it, UDFs work as expected.
>>>> See below for the full details and an example.
>>>>
>>>> Can someone tell me whether this is the expected behavior or a bug?
>>>> BTW, I don't mind working on that bug - if you can give me a pointer
>>>> to the right places.
>>>>
>>>> BTW2: registering the SAME DataFrame as a temp table does not solve
>>>> the problem - only creating a new table out of a new DataFrame (see
>>>> below).
>>>>
>>>> Detailed example
>>>>
>>>> 1. I have a table in Hive called 'hive_table' with a string field
>>>> called 'name' and an int field called 'sid'.
>>>>
>>>> 2. I registered a UDF:
>>>>
>>>> def getStr(str: String) = str + "_str"
>>>> hc.udf.register("getStr", getStr _)
>>>>
>>>> 3. Running the following in Zeppelin:
>>>>
>>>> %sql select getStr(name), * from hive_table
>>>>
>>>> yields the exception:
>>>> ClassNotFoundException: org.apache.zeppelin.spark.ZeppelinContext
>>>>
>>>> 4. Creating a new table, as follows:
>>>>
>>>> case class SidName(sid: Int, name: String)
>>>> val sidNameList = hc.sql("select sid, name from hive_table limit 10")
>>>>   .collect().map(row => SidName(row.getInt(0), row.getString(1)))
>>>> val sidNameDF = hc.createDataFrame(sidNameList)
>>>> sidNameDF.registerTempTable("tmp_sid_name")
>>>>
>>>> 5. Querying the new table in the same fashion:
>>>>
>>>> %sql select getStr(name), * from tmp_sid_name
>>>>
>>>> This time I get the expected results!
>>>>
>>>> On Mon, Jun 29, 2015 at 5:16 PM, Ophir Cohen <oph...@gmail.com> wrote:
>>>>
>>>>> BTW, the same query, on the same cluster but in the Spark shell,
>>>>> returns the expected results.
>>>>>
>>>>> On Mon, Jun 29, 2015 at 3:24 PM, Ophir Cohen <oph...@gmail.com> wrote:
>>>>>
>>>>>> It looks like the Zeppelin jar is not distributed to the Spark
>>>>>> nodes, though I can't understand why it is needed for the UDF.
>>>>>>
>>>>>> On Mon, Jun 29, 2015 at 3:23 PM, Ophir Cohen <oph...@gmail.com> wrote:
>>>>>>
>>>>>>> Thanks for the response,
>>>>>>> I'm not sure what you mean; that is exactly what I tried, and it
>>>>>>> failed. As I wrote above, 'hc' is just a different name for sqlc
>>>>>>> (which in turn is a different name for z.sqlContext).
>>>>>>>
>>>>>>> I get the same results.
>>>>>>>
>>>>>>> On Mon, Jun 29, 2015 at 2:12 PM, Mina Lee <mina...@nflabs.com> wrote:
>>>>>>>
>>>>>>>> Hi Ophir,
>>>>>>>>
>>>>>>>> Can you try the below?
>>>>>>>>
>>>>>>>> def getNum(): Int = {
>>>>>>>>   100
>>>>>>>> }
>>>>>>>> sqlc.udf.register("getNum", getNum _)
>>>>>>>> sqlc.sql("select getNum() from filteredNc limit 1").show
>>>>>>>>
>>>>>>>> FYI, sqlContext (== sqlc) is internally created by Zeppelin,
>>>>>>>> which uses a HiveContext as the sqlContext by default.
>>>>>>>> (That is, if you did not change useHiveContext to "false" in the
>>>>>>>> interpreter menu.)
>>>>>>>>
>>>>>>>> Hope it helps.
>>>>>>>>
>>>>>>>> On Mon, Jun 29, 2015 at 7:55 PM, Ophir Cohen <oph...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Guys? Somebody?
>>>>>>>>> Can it be that Zeppelin does not support UDFs?
>>>>>>>>>
>>>>>>>>> On Sun, Jun 28, 2015 at 11:53 AM, Ophir Cohen <oph...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Guys,
>>>>>>>>>> One more problem I have encountered using Zeppelin.
>>>>>>>>>> Using Spark 1.3.1 on YARN, Hadoop 2.4.
>>>>>>>>>>
>>>>>>>>>> I'm trying to create and use a UDF (hc == z.sqlContext ==
>>>>>>>>>> HiveContext):
>>>>>>>>>>
>>>>>>>>>> 1. Create and register the UDF:
>>>>>>>>>>
>>>>>>>>>> def getNum(): Int = {
>>>>>>>>>>   100
>>>>>>>>>> }
>>>>>>>>>> hc.udf.register("getNum", getNum _)
>>>>>>>>>>
>>>>>>>>>> 2. Try to use it on an existing table:
>>>>>>>>>>
>>>>>>>>>> %sql select getNum() from filteredNc limit 1
>>>>>>>>>>
>>>>>>>>>> Or:
>>>>>>>>>>
>>>>>>>>>> 3. Try using hc directly:
>>>>>>>>>>
>>>>>>>>>> hc.sql("select getNum() from filteredNc limit 1").collect
>>>>>>>>>>
>>>>>>>>>> Both of them fail with
>>>>>>>>>> "java.lang.ClassNotFoundException:
>>>>>>>>>> org.apache.zeppelin.spark.ZeppelinContext"
>>>>>>>>>> (see below for the full exception).
>>>>>>>>>>
>>>>>>>>>> My questions are:
>>>>>>>>>> 1. Can it be that ZeppelinContext is not available on the Spark
>>>>>>>>>> nodes?
>>>>>>>>>> 2. Why does it need ZeppelinContext anyway? Why is it relevant?
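(Editor's note: on question 2, a plausible explanation - my reading, not confirmed against the Zeppelin source - is that, as in the Scala REPL, each interpreter paragraph is compiled into a wrapper class whose fields include Zeppelin's injected values such as z, the ZeppelinContext. Eta-expanding a method with `getNum _` produces a function that captures the whole wrapper, so deserializing the UDF on an executor requires the wrapper's field types, hence the Zeppelin classes. The real failure is a ClassNotFoundError on deserialization; the pure-Scala sketch below shows the same capture effect via serialization, with all names invented.)

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Stand-in for a class only the driver's classloader knows, like ZeppelinContext.
class DriverOnlyContext // deliberately not Serializable

// Stand-in for the per-paragraph wrapper the REPL compiles user code into.
class ReplWrapper extends Serializable {
  val z = new DriverOnlyContext // injected field, dragged along with the wrapper
  def getNum(): Int = 100
  // Eta expansion captures `this` (the whole wrapper), not just the method body.
  val lifted: () => Int = getNum _
}

def trySerialize(obj: AnyRef): String =
  try {
    new ObjectOutputStream(new ByteArrayOutputStream()).writeObject(obj)
    "ok"
  } catch { case _: NotSerializableException => "not serializable" }

// The lifted method drags in the wrapper and its DriverOnlyContext field.
println(trySerialize(new ReplWrapper().lifted)) // not serializable
// A function that captures nothing ships fine.
println(trySerialize(() => 100)) // ok
```

This would also explain why Ophir's workaround in step 4 helps only indirectly: the UDF still captures the wrapper either way, but the failing path is whatever forces executors to load the captured classes.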
>>>>>>>>>>
>>>>>>>>>> The exception:
>>>>>>>>>>
>>>>>>>>>> WARN [2015-06-28 08:43:53,850] ({task-result-getter-0} Logging.scala[logWarning]:71) - Lost task 0.2 in stage 23.0 (TID 1626, ip-10-216-204-246.ec2.internal): java.lang.NoClassDefFoundError: Lorg/apache/zeppelin/spark/ZeppelinContext;
>>>>>>>>>>   at java.lang.Class.getDeclaredFields0(Native Method)
>>>>>>>>>>   at java.lang.Class.privateGetDeclaredFields(Class.java:2499)
>>>>>>>>>>   at java.lang.Class.getDeclaredField(Class.java:1951)
>>>>>>>>>>   at java.io.ObjectStreamClass.getDeclaredSUID(ObjectStreamClass.java:1659)
>>>>>>>>>>
>>>>>>>>>> <many more ObjectStreamClass frames>
>>>>>>>>>>
>>>>>>>>>> Caused by: java.lang.ClassNotFoundException: org.apache.zeppelin.spark.ZeppelinContext
>>>>>>>>>>   at org.apache.spark.repl.ExecutorClassLoader.findClass(ExecutorClassLoader.scala:69)
>>>>>>>>>>   at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>>>>>>>>>>   at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>>>>>>>>>>   ... 103 more
>>>>>>>>>> Caused by: java.lang.ClassNotFoundException: org.apache.zeppelin.spark.ZeppelinContext
>>>>>>>>>>   at java.lang.ClassLoader.findClass(ClassLoader.java:531)
>>>>>>>>>>   at org.apache.spark.util.ParentClassLoader.findClass(ParentClassLoader.scala:26)
>>>>>>>>>>   at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>>>>>>>>>>   at org.apache.spark.util.ParentClassLoader.loadClass(ParentClassLoader.scala:34)
>>>>>>>>>>   at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>>>>>>>>>>   at org.apache.spark.util.ParentClassLoader.loadClass(ParentClassLoader.scala:30)
>>>>>>>>>>   at org.apache.spark.repl.ExecutorClassLoader.findClass(ExecutorClassLoader.scala:64)
>>>>>>>>>>   ... 105 more
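(Editor's note: the stack trace shows ExecutorClassLoader failing to resolve ZeppelinContext on a YARN executor. If the immediate cause is simply that the Zeppelin Spark interpreter jar never reaches the executors, one untested workaround sketch is to put it on the executor classpath via Spark's standard spark.jars property; the path and jar name below are hypothetical - use whatever jar your Zeppelin build actually ships.)

```properties
# In spark-defaults.conf (or the equivalent Zeppelin interpreter property):
# spark.jars distributes the listed jars to the driver and all executors.
spark.jars  /opt/zeppelin/interpreter/spark/zeppelin-spark-0.5.0-incubating.jar
```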