Thank you Moon. Here is the link: https://issues.apache.org/jira/browse/ZEPPELIN-150
Please let me know how I can help further.

On Thu, Jul 2, 2015 at 2:35 AM, moon soo Lee <m...@apache.org> wrote:

> Really appreciate you sharing the problem.
> Very interesting. Do you mind filing an issue on JIRA?
>
> Best,
> moon
>
> On Tue, Jun 30, 2015 at 4:32 AM Ophir Cohen <oph...@gmail.com> wrote:
>
>> BTW, this isn't working either:
>>
>> val sidNameDF = hc.sql("select sid, name from hive_table limit 10")
>> val sidNameDF2 = hc.createDataFrame(sidNameDF.rdd, sidNameDF.schema)
>> sidNameDF2.registerTempTable("tmp_sid_name2")
>>
>> On Tue, Jun 30, 2015 at 1:45 PM, Ophir Cohen <oph...@gmail.com> wrote:
>>
>>> I've made some progress on this issue and I think it's a bug...
>>>
>>> Apparently, trying to use registered UDFs on tables that come from
>>> Hive returns the above exception (ClassNotFoundException:
>>> org.apache.zeppelin.spark.ZeppelinContext).
>>> When I create a new table and register it, UDFs work as expected.
>>> See below for the full details and an example.
>>>
>>> Can someone tell me whether this is the expected behavior or a bug?
>>> BTW
>>> I don't mind working on that bug - if you can give me a pointer to
>>> the right places.
>>>
>>> BTW2
>>> Registering the SAME DataFrame as a temp table does not solve the
>>> problem - only creating a new table out of a new DataFrame does
>>> (see below).
>>>
>>> Detailed example
>>> 1. I have a table in Hive called 'hive_table' with a string field
>>> called 'name' and an int field called 'sid'.
>>>
>>> 2. I registered a UDF:
>>> def getStr(str: String) = str + "_str"
>>> hc.udf.register("getStr", getStr _)
>>>
>>> 3. Running the following in Zeppelin:
>>> %sql select getStr(name), * from hive_table
>>> yields the exception:
>>> ClassNotFoundException: org.apache.zeppelin.spark.ZeppelinContext
>>>
>>> 4. Creating a new table, as follows:
>>> case class SidName(sid: Int, name: String)
>>> val sidNameList = hc.sql("select sid, name from hive_table limit 10")
>>>   .collect().map(row => new SidName(row.getInt(0), row.getString(1))).toSeq
>>> val sidNameDF = hc.createDataFrame(sidNameList)
>>> sidNameDF.registerTempTable("tmp_sid_name")
>>>
>>> 5. Querying the new table in the same fashion:
>>> %sql select getStr(name), * from tmp_sid_name
>>>
>>> This time I get the expected results!
>>>
>>> On Mon, Jun 29, 2015 at 5:16 PM, Ophir Cohen <oph...@gmail.com> wrote:
>>>
>>>> BTW
>>>> The same query, on the same cluster but in the Spark shell, returns
>>>> the expected results.
>>>>
>>>> On Mon, Jun 29, 2015 at 3:24 PM, Ophir Cohen <oph...@gmail.com> wrote:
>>>>
>>>>> It looks like the Zeppelin jar is not distributed to the Spark
>>>>> nodes, though I can't understand why it is needed for the UDF.
>>>>>
>>>>> On Mon, Jun 29, 2015 at 3:23 PM, Ophir Cohen <oph...@gmail.com> wrote:
>>>>>
>>>>>> Thanks for the response,
>>>>>> I'm not sure what you mean; that is exactly what I tried, and it
>>>>>> failed.
>>>>>> As I wrote above, 'hc' is just a different name for sqlc (which is
>>>>>> a different name for z.sqlContext).
>>>>>>
>>>>>> I get the same results.
>>>>>>
>>>>>> On Mon, Jun 29, 2015 at 2:12 PM, Mina Lee <mina...@nflabs.com> wrote:
>>>>>>
>>>>>>> Hi Ophir,
>>>>>>>
>>>>>>> Can you try the below?
>>>>>>>
>>>>>>> def getNum(): Int = {
>>>>>>>   100
>>>>>>> }
>>>>>>> sqlc.udf.register("getNum", getNum _)
>>>>>>> sqlc.sql("select getNum() from filteredNc limit 1").show
>>>>>>>
>>>>>>> FYI, sqlContext (== sqlc) is created internally by Zeppelin and
>>>>>>> uses a HiveContext as the sqlContext by default (if you did not
>>>>>>> change useHiveContext to "false" in the interpreter menu).
>>>>>>>
>>>>>>> Hope it helps.
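A quick way to confirm Mina's point from a notebook paragraph (a minimal sketch, assuming the default interpreter settings and the hc alias used earlier in this thread):

    // sqlc is the context Zeppelin injects into the notebook; with the
    // default useHiveContext=true it should actually be a HiveContext.
    println(sqlc.getClass.getName)  // expect org.apache.spark.sql.hive.HiveContext
    println(sqlc eq hc)             // true if hc was defined as z.sqlContext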
>>>>>>> On Mon, Jun 29, 2015 at 7:55 PM, Ophir Cohen <oph...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Guys?
>>>>>>>> Somebody?
>>>>>>>> Can it be that Zeppelin does not support UDFs?
>>>>>>>>
>>>>>>>> On Sun, Jun 28, 2015 at 11:53 AM, Ophir Cohen <oph...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi guys,
>>>>>>>>> One more problem I have encountered using Zeppelin.
>>>>>>>>> Using Spark 1.3.1 on YARN, Hadoop 2.4.
>>>>>>>>>
>>>>>>>>> I'm trying to create and use a UDF (hc == z.sqlContext ==
>>>>>>>>> HiveContext):
>>>>>>>>> 1. Create and register the UDF:
>>>>>>>>> def getNum(): Int = {
>>>>>>>>>   100
>>>>>>>>> }
>>>>>>>>> hc.udf.register("getNum", getNum _)
>>>>>>>>>
>>>>>>>>> 2. Try to use it on an existing table:
>>>>>>>>> %sql select getNum() from filteredNc limit 1
>>>>>>>>>
>>>>>>>>> Or:
>>>>>>>>> 3. Try using hc directly:
>>>>>>>>> hc.sql("select getNum() from filteredNc limit 1").collect
>>>>>>>>>
>>>>>>>>> Both of them fail with
>>>>>>>>> "java.lang.ClassNotFoundException:
>>>>>>>>> org.apache.zeppelin.spark.ZeppelinContext"
>>>>>>>>> (see below for the full exception).
>>>>>>>>>
>>>>>>>>> And my questions are:
>>>>>>>>> 1. Can it be that ZeppelinContext is not available on the Spark
>>>>>>>>> nodes?
>>>>>>>>> 2. Why does it need ZeppelinContext anyway? Why is it relevant?
>>>>>>>>>
>>>>>>>>> The exception:
>>>>>>>>> WARN [2015-06-28 08:43:53,850] ({task-result-getter-0}
>>>>>>>>> Logging.scala[logWarning]:71) - Lost task 0.2 in stage 23.0 (TID 1626,
>>>>>>>>> ip-10-216-204-246.ec2.internal): java.lang.NoClassDefFoundError:
>>>>>>>>> Lorg/apache/zeppelin/spark/ZeppelinContext;
>>>>>>>>> at java.lang.Class.getDeclaredFields0(Native Method)
>>>>>>>>> at java.lang.Class.privateGetDeclaredFields(Class.java:2499)
>>>>>>>>> at java.lang.Class.getDeclaredField(Class.java:1951)
>>>>>>>>> at java.io.ObjectStreamClass.getDeclaredSUID(ObjectStreamClass.java:1659)
>>>>>>>>>
>>>>>>>>> <Many more ObjectStreamClass lines of the exception>
>>>>>>>>>
>>>>>>>>> Caused by: java.lang.ClassNotFoundException:
>>>>>>>>> org.apache.zeppelin.spark.ZeppelinContext
>>>>>>>>> at org.apache.spark.repl.ExecutorClassLoader.findClass(ExecutorClassLoader.scala:69)
>>>>>>>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>>>>>>>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>>>>>>>>> ... 103 more
>>>>>>>>> Caused by: java.lang.ClassNotFoundException:
>>>>>>>>> org.apache.zeppelin.spark.ZeppelinContext
>>>>>>>>> at java.lang.ClassLoader.findClass(ClassLoader.java:531)
>>>>>>>>> at org.apache.spark.util.ParentClassLoader.findClass(ParentClassLoader.scala:26)
>>>>>>>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>>>>>>>>> at org.apache.spark.util.ParentClassLoader.loadClass(ParentClassLoader.scala:34)
>>>>>>>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>>>>>>>>> at org.apache.spark.util.ParentClassLoader.loadClass(ParentClassLoader.scala:30)
>>>>>>>>> at org.apache.spark.repl.ExecutorClassLoader.findClass(ExecutorClassLoader.scala:64)
>>>>>>>>> ... 105 more
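For anyone who lands on this thread: here is the workaround from the detailed example above, consolidated into a single notebook paragraph. It is a minimal sketch under the thread's setup (Spark 1.3.1, a Hive table hive_table with columns sid: Int and name: String, and hc bound to Zeppelin's HiveContext):

    // Register a simple UDF on the HiveContext.
    def getStr(str: String): String = str + "_str"
    hc.udf.register("getStr", getStr _)

    // Querying the Hive-backed table directly fails on the executors with
    // ClassNotFoundException: org.apache.zeppelin.spark.ZeppelinContext:
    //   hc.sql("select getStr(name) from hive_table limit 10").show()

    // Workaround: collect the rows to the driver, rebuild the DataFrame
    // from plain case-class instances, and register that as a temp table.
    case class SidName(sid: Int, name: String)
    val sidNames = hc.sql("select sid, name from hive_table limit 10")
      .collect()
      .map(row => SidName(row.getInt(0), row.getString(1)))
      .toSeq
    val sidNameDF = hc.createDataFrame(sidNames)
    sidNameDF.registerTempTable("tmp_sid_name")

    // The same UDF now works against the rebuilt table.
    hc.sql("select getStr(name) from tmp_sid_name").show()

Note that collect() pulls every matched row to the driver, so this is only practical for small result sets; the underlying issue is tracked in ZEPPELIN-150.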