Will do so soon. 10x

On Thu, Jul 2, 2015 at 2:39 PM, IT CTO <goi....@gmail.com> wrote:
> I think you should add these notes to the JIRA issue, as it is not clear
> from the note itself. (Sorry that this is not helping to solve the problem
> itself :-))
>
> On Thu, Jul 2, 2015 at 2:06 PM Ophir Cohen <oph...@gmail.com> wrote:
>
>> It does not happen in local mode.
>> Actually, whenever it runs in the same process it works great.
>> It looks like the Zeppelin jar somehow does not get distributed to the
>> nodes.
>> Still, it is strange, as registering the UDF and the UDF itself do not
>> need ZeppelinContext (at least not explicitly).
>>
>> And yes, filteredNc is a local table, I just use it so I can call the
>> UDF. You can try that on any table.
>>
>> On Thu, Jul 2, 2015 at 1:23 PM, IT CTO <goi....@gmail.com> wrote:
>>
>>> Does this happen in local mode as well, or just on an external cluster?
>>> With regard to the repro - %sql select getNum() from filteredNc limit 1 -
>>> I guess filteredNc is some table you have? Because when I tried it on my
>>> local machine I got:
>>> no such table filteredNc; line 1 pos 21
>>> Eran
>>>
>>> On Thu, Jul 2, 2015 at 12:44 PM Ophir Cohen <oph...@gmail.com> wrote:
>>>
>>>> Thank you Moon.
>>>> Here is the link:
>>>> https://issues.apache.org/jira/browse/ZEPPELIN-150
>>>>
>>>> Please let me know how I can help further.
>>>>
>>>> On Thu, Jul 2, 2015 at 2:35 AM, moon soo Lee <m...@apache.org> wrote:
>>>>
>>>>> Really appreciate you sharing the problem.
>>>>> Very interesting. Do you mind filing an issue on JIRA?
>>>>>
>>>>> Best,
>>>>> moon
>>>>>
>>>>> On Tue, Jun 30, 2015 at 4:32 AM Ophir Cohen <oph...@gmail.com> wrote:
>>>>>
>>>>>> BTW, this isn't working either:
>>>>>>
>>>>>> val sidNameDF = hc.sql("select sid, name from hive_table limit 10")
>>>>>> val sidNameDF2 = hc.createDataFrame(sidNameDF.rdd, sidNameDF.schema)
>>>>>> sidNameDF2.registerTempTable("tmp_sid_name2")
>>>>>>
>>>>>> On Tue, Jun 30, 2015 at 1:45 PM, Ophir Cohen <oph...@gmail.com> wrote:
>>>>>>
>>>>>>> I've made some progress on this issue and I think it's a bug...
>>>>>>>
>>>>>>> Apparently, trying to use registered UDFs on tables that come from
>>>>>>> Hive returns the above exception (ClassNotFoundException:
>>>>>>> org.apache.zeppelin.spark.ZeppelinContext).
>>>>>>> When I create a new table and register it, UDFs work as expected.
>>>>>>> See below for the full details and an example.
>>>>>>>
>>>>>>> Can someone tell me whether this is the expected behavior or a bug?
>>>>>>> BTW
>>>>>>> I don't mind working on that bug - if you can give me a pointer to
>>>>>>> the right places.
>>>>>>>
>>>>>>> BTW2
>>>>>>> Registering the SAME DataFrame as a temp table does not solve the
>>>>>>> problem - only creating a new table out of a new DataFrame does
>>>>>>> (see below).
>>>>>>>
>>>>>>> Detailed example
>>>>>>> 1. I have a table in Hive called 'hive_table' with a string field
>>>>>>> called 'name' and an int field called 'sid'.
>>>>>>>
>>>>>>> 2. I registered a UDF:
>>>>>>> def getStr(str: String) = str + "_str"
>>>>>>> hc.udf.register("getStr", getStr _)
>>>>>>>
>>>>>>> 3. Running the following in Zeppelin:
>>>>>>> %sql select getStr(name), * from hive_table
>>>>>>> yields the exception:
>>>>>>> ClassNotFoundException: org.apache.zeppelin.spark.ZeppelinContext
>>>>>>>
>>>>>>> 4.
>>>>>>> Creating a new table, as follows:
>>>>>>> case class SidName(sid: Int, name: String)
>>>>>>> val sidNameList = hc.sql("select sid, name from hive_table limit 10")
>>>>>>>   .collectAsList().map(row => new SidName(row.getInt(0), row.getString(1)))
>>>>>>> val sidNameDF = hc.createDataFrame(sidNameList)
>>>>>>> sidNameDF.registerTempTable("tmp_sid_name")
>>>>>>>
>>>>>>> 5. Querying the new table in the same fashion:
>>>>>>> %sql select getStr(name), * from tmp_sid_name
>>>>>>>
>>>>>>> This time I get the expected results!
>>>>>>>
>>>>>>> On Mon, Jun 29, 2015 at 5:16 PM, Ophir Cohen <oph...@gmail.com> wrote:
>>>>>>>
>>>>>>>> BTW
>>>>>>>> The same query, on the same cluster but in the Spark shell, returns
>>>>>>>> the expected results.
>>>>>>>>
>>>>>>>> On Mon, Jun 29, 2015 at 3:24 PM, Ophir Cohen <oph...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> It looks like the Zeppelin jar does not get distributed to the
>>>>>>>>> Spark nodes, though I can't understand why it is needed for the UDF.
>>>>>>>>>
>>>>>>>>> On Mon, Jun 29, 2015 at 3:23 PM, Ophir Cohen <oph...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Thanks for the response,
>>>>>>>>>> I'm not sure what you mean, that is exactly what I tried, and it
>>>>>>>>>> failed.
>>>>>>>>>> As I wrote above, 'hc' is just a different name for sqlc (which is
>>>>>>>>>> a different name for z.sqlContext).
>>>>>>>>>>
>>>>>>>>>> I get the same results.
>>>>>>>>>>
>>>>>>>>>> On Mon, Jun 29, 2015 at 2:12 PM, Mina Lee <mina...@nflabs.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Ophir,
>>>>>>>>>>>
>>>>>>>>>>> Can you try the below?
>>>>>>>>>>>
>>>>>>>>>>> def getNum(): Int = {
>>>>>>>>>>>   100
>>>>>>>>>>> }
>>>>>>>>>>> sqlc.udf.register("getNum", getNum _)
>>>>>>>>>>> sqlc.sql("select getNum() from filteredNc limit 1").show
>>>>>>>>>>>
>>>>>>>>>>> FYI, sqlContext (== sqlc) is created internally by Zeppelin,
>>>>>>>>>>> which uses a HiveContext as the sqlContext by default.
>>>>>>>>>>> (That is, unless you changed useHiveContext to "false" in the
>>>>>>>>>>> interpreter menu.)
>>>>>>>>>>>
>>>>>>>>>>> Hope it helps.
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Jun 29, 2015 at 7:55 PM, Ophir Cohen <oph...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Guys?
>>>>>>>>>>>> Somebody?
>>>>>>>>>>>> Can it be that Zeppelin does not support UDFs?
>>>>>>>>>>>>
>>>>>>>>>>>> On Sun, Jun 28, 2015 at 11:53 AM, Ophir Cohen <oph...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Guys,
>>>>>>>>>>>>> One more problem I have encountered using Zeppelin.
>>>>>>>>>>>>> I'm using Spark 1.3.1 on YARN, Hadoop 2.4.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I'm trying to create and use a UDF (hc == z.sqlContext ==
>>>>>>>>>>>>> HiveContext):
>>>>>>>>>>>>> 1. Create and register the UDF:
>>>>>>>>>>>>> def getNum(): Int = {
>>>>>>>>>>>>>   100
>>>>>>>>>>>>> }
>>>>>>>>>>>>>
>>>>>>>>>>>>> hc.udf.register("getNum", getNum _)
>>>>>>>>>>>>> 2. Then I try to use it on an existing table:
>>>>>>>>>>>>> %sql select getNum() from filteredNc limit 1
>>>>>>>>>>>>>
>>>>>>>>>>>>> Or:
>>>>>>>>>>>>> 3. Trying it with hc directly:
>>>>>>>>>>>>> hc.sql("select getNum() from filteredNc limit 1").collect
>>>>>>>>>>>>>
>>>>>>>>>>>>> Both of them fail with
>>>>>>>>>>>>> "java.lang.ClassNotFoundException:
>>>>>>>>>>>>> org.apache.zeppelin.spark.ZeppelinContext"
>>>>>>>>>>>>> (see below for the full exception).
>>>>>>>>>>>>>
>>>>>>>>>>>>> And my questions are:
>>>>>>>>>>>>> 1. Can it be that ZeppelinContext is not available on the
>>>>>>>>>>>>> Spark nodes?
>>>>>>>>>>>>> 2. Why does it need ZeppelinContext anyway? Why is it relevant?
>>>>>>>>>>>>>
>>>>>>>>>>>>> The exception:
>>>>>>>>>>>>> WARN [2015-06-28 08:43:53,850] ({task-result-getter-0}
>>>>>>>>>>>>> Logging.scala[logWarning]:71) - Lost task 0.2 in stage 23.0
>>>>>>>>>>>>> (TID 1626, ip-10-216-204-246.ec2.internal):
>>>>>>>>>>>>> java.lang.NoClassDefFoundError:
>>>>>>>>>>>>> Lorg/apache/zeppelin/spark/ZeppelinContext;
>>>>>>>>>>>>>   at java.lang.Class.getDeclaredFields0(Native Method)
>>>>>>>>>>>>>   at java.lang.Class.privateGetDeclaredFields(Class.java:2499)
>>>>>>>>>>>>>   at java.lang.Class.getDeclaredField(Class.java:1951)
>>>>>>>>>>>>>   at java.io.ObjectStreamClass.getDeclaredSUID(ObjectStreamClass.java:1659)
>>>>>>>>>>>>>
>>>>>>>>>>>>> <Many more ObjectStreamClass lines of the exception>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Caused by: java.lang.ClassNotFoundException:
>>>>>>>>>>>>> org.apache.zeppelin.spark.ZeppelinContext
>>>>>>>>>>>>>   at org.apache.spark.repl.ExecutorClassLoader.findClass(ExecutorClassLoader.scala:69)
>>>>>>>>>>>>>   at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>>>>>>>>>>>>>   at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>>>>>>>>>>>>>   ... 103 more
>>>>>>>>>>>>> Caused by: java.lang.ClassNotFoundException:
>>>>>>>>>>>>> org.apache.zeppelin.spark.ZeppelinContext
>>>>>>>>>>>>>   at java.lang.ClassLoader.findClass(ClassLoader.java:531)
>>>>>>>>>>>>>   at org.apache.spark.util.ParentClassLoader.findClass(ParentClassLoader.scala:26)
>>>>>>>>>>>>>   at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>>>>>>>>>>>>>   at org.apache.spark.util.ParentClassLoader.loadClass(ParentClassLoader.scala:34)
>>>>>>>>>>>>>   at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>>>>>>>>>>>>>   at org.apache.spark.util.ParentClassLoader.loadClass(ParentClassLoader.scala:30)
>>>>>>>>>>>>>   at org.apache.spark.repl.ExecutorClassLoader.findClass(ExecutorClassLoader.scala:64)
>>>>>>>>>>>>>   ... 105 more
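The question raised in the thread - why executors need ZeppelinContext for a UDF that never mentions it - comes down to closure serialization. A plausible explanation, not confirmed anywhere in this thread, is that a function defined in a Zeppelin paragraph is compiled into a REPL wrapper object that also references ZeppelinContext, so serializing the UDF drags the whole wrapper along, and the executors cannot load that class. The self-contained Scala sketch below (FakeZeppelinContext and CaptureDemo are made-up names, standing in for the real classes) shows the general mechanism: a closure that captures a non-serializable object fails the same Java serialization that Spark performs before shipping a UDF to executors.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Stand-in for ZeppelinContext: a class that is NOT java.io.Serializable.
class FakeZeppelinContext { val hundred = 100 }

object CaptureDemo {
  // Returns true if the value survives Java serialization -- the same check
  // Spark effectively performs when shipping a UDF closure to executors.
  def serializable(v: AnyRef): Boolean =
    try {
      new ObjectOutputStream(new ByteArrayOutputStream()).writeObject(v)
      true
    } catch {
      case _: NotSerializableException => false
    }

  def main(args: Array[String]): Unit = {
    val z = new FakeZeppelinContext

    // Silently captures `z` -- like a UDF whose enclosing REPL wrapper
    // holds ZeppelinContext.
    val capturing: () => Int = () => z.hundred

    // A self-contained closure with no outer references.
    val clean: () => Int = () => 100

    println(s"capturing serializes: ${serializable(capturing)}") // false
    println(s"clean serializes:     ${serializable(clean)}")     // true
  }
}
```

If this explanation holds, it would also account for the Spark shell behaving differently: its wrapper objects hold nothing comparable to ZeppelinContext.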
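If closure capture is indeed the culprit, one workaround worth trying is to define the UDF body inside a standalone serializable object, so that registering it captures nothing from the interpreter scope. This is a hedged sketch, not verified against Zeppelin: `Udfs` is a hypothetical name, and `hc` is the HiveContext from the thread.

```scala
// Hypothetical workaround sketch -- untested. Keeping the function in its
// own serializable object should keep the REPL wrapper (and with it
// ZeppelinContext) out of the serialized closure.
object Udfs extends Serializable {
  def getNum(): Int = 100
}

hc.udf.register("getNum", Udfs.getNum _)
// %sql select getNum() from filteredNc limit 1
```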