Will do so soon. 10x

On Thu, Jul 2, 2015 at 2:39 PM, IT CTO <goi....@gmail.com> wrote:
> I think you should add these notes to the JIRA issue, as it is not clear
> from the note itself. (Sorry that this is not helping to solve the problem
> itself :-))
>
> On Thu, Jul 2, 2015 at 2:06 PM Ophir Cohen <oph...@gmail.com> wrote:
>
>> It does not happen in local mode.
>> Actually, whenever it runs in the same process it works great.
>> It looks like the Zeppelin jar somehow does not get distributed to the
>> nodes.
>> Still, it is strange, as registering the UDF and the UDF itself do not
>> need ZeppelinContext (at least not explicitly).
>>
>> And yes, filteredNc is a local table, I just use it so I can call the
>> UDF. You can try that on any table.
>>
>> On Thu, Jul 2, 2015 at 1:23 PM, IT CTO <goi....@gmail.com> wrote:
>>
>>> Does this happen in local mode as well, or just on an external cluster?
>>> With regard to the repro - %sql select getNum() from filteredNc limit 1 -
>>> I guess filteredNc is some table you have? Because when I tried it on my
>>> local machine I got:
>>> no such table filteredNc; line 1 pos 21
>>> Eran
>>>
>>> On Thu, Jul 2, 2015 at 12:44 PM Ophir Cohen <oph...@gmail.com> wrote:
>>>
>>>> Thank you Moon.
>>>> Here is the link:
>>>> https://issues.apache.org/jira/browse/ZEPPELIN-150
>>>>
>>>> Please let me know how I can help further.
>>>>
>>>> On Thu, Jul 2, 2015 at 2:35 AM, moon soo Lee <m...@apache.org> wrote:
>>>>
>>>>> Really appreciate you sharing the problem.
>>>>> Very interesting. Do you mind filing an issue on JIRA?
>>>>>
>>>>> Best,
>>>>> moon
>>>>>
>>>>> On Tue, Jun 30, 2015 at 4:32 AM Ophir Cohen <oph...@gmail.com> wrote:
>>>>>
>>>>>> BTW, this isn't working either:
>>>>>>
>>>>>> val sidNameDF = hc.sql("select sid, name from hive_table limit 10")
>>>>>> val sidNameDF2 = hc.createDataFrame(sidNameDF.rdd, sidNameDF.schema)
>>>>>> sidNameDF2.registerTempTable("tmp_sid_name2")
>>>>>>
>>>>>> On Tue, Jun 30, 2015 at 1:45 PM, Ophir Cohen <oph...@gmail.com> wrote:
>>>>>>
>>>>>>> I've made some progress on this issue and I think it's a bug...
>>>>>>>
>>>>>>> Apparently, trying to use registered UDFs on tables that come from
>>>>>>> Hive returns the above exception (ClassNotFoundException:
>>>>>>> org.apache.zeppelin.spark.ZeppelinContext).
>>>>>>> When I create a new table and register it, UDFs work as expected.
>>>>>>> See below for the full details and an example.
>>>>>>>
>>>>>>> Can someone tell me whether this is the expected behavior or a bug?
>>>>>>> BTW
>>>>>>> I don't mind working on that bug - if you can give me a pointer to
>>>>>>> the right places.
>>>>>>>
>>>>>>> BTW2
>>>>>>> Registering the SAME DataFrame as a temp table does not solve the
>>>>>>> problem - only creating a new table out of a new DataFrame does
>>>>>>> (see below).
>>>>>>>
>>>>>>> Detailed example
>>>>>>> 1. I have a table in Hive called 'hive_table' with a string field
>>>>>>> called 'name' and an int field called 'sid'.
>>>>>>>
>>>>>>> 2. I registered a UDF:
>>>>>>> def getStr(str: String) = str + "_str"
>>>>>>> hc.udf.register("getStr", getStr _)
>>>>>>>
>>>>>>> 3. Running the following in Zeppelin:
>>>>>>> %sql select getStr(name), * from hive_table
>>>>>>> yields the exception:
>>>>>>> ClassNotFoundException: org.apache.zeppelin.spark.ZeppelinContext
>>>>>>>
>>>>>>> 4.
>>>>>>> Creating a new table, as follows:
>>>>>>> case class SidName(sid: Int, name: String)
>>>>>>> val sidNameList = hc.sql("select sid, name from hive_table limit 10")
>>>>>>>   .collectAsList().map(row => new SidName(row.getInt(0), row.getString(1)))
>>>>>>> val sidNameDF = hc.createDataFrame(sidNameList)
>>>>>>> sidNameDF.registerTempTable("tmp_sid_name")
>>>>>>>
>>>>>>> 5. Querying the new table in the same fashion:
>>>>>>> %sql select getStr(name), * from tmp_sid_name
>>>>>>>
>>>>>>> This time I get the expected results!
>>>>>>>
>>>>>>> On Mon, Jun 29, 2015 at 5:16 PM, Ophir Cohen <oph...@gmail.com> wrote:
>>>>>>>
>>>>>>>> BTW
>>>>>>>> The same query, on the same cluster but in the Spark shell, returns
>>>>>>>> the expected results.
>>>>>>>>
>>>>>>>> On Mon, Jun 29, 2015 at 3:24 PM, Ophir Cohen <oph...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> It looks like the Zeppelin jar does not get distributed to the
>>>>>>>>> Spark nodes, though I can't understand why it is needed for the UDF.
>>>>>>>>>
>>>>>>>>> On Mon, Jun 29, 2015 at 3:23 PM, Ophir Cohen <oph...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Thanks for the response,
>>>>>>>>>> I'm not sure what you mean, that is exactly what I tried, and it
>>>>>>>>>> failed.
>>>>>>>>>> As I wrote above, 'hc' is just a different name for sqlc (which is
>>>>>>>>>> a different name for z.sqlContext).
>>>>>>>>>>
>>>>>>>>>> I get the same results.
>>>>>>>>>>
>>>>>>>>>> On Mon, Jun 29, 2015 at 2:12 PM, Mina Lee <mina...@nflabs.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Ophir,
>>>>>>>>>>>
>>>>>>>>>>> Can you try the below?
>>>>>>>>>>>
>>>>>>>>>>> def getNum(): Int = {
>>>>>>>>>>>   100
>>>>>>>>>>> }
>>>>>>>>>>> sqlc.udf.register("getNum", getNum _)
>>>>>>>>>>> sqlc.sql("select getNum() from filteredNc limit 1").show
>>>>>>>>>>>
>>>>>>>>>>> FYI, sqlContext (== sqlc) is created internally by Zeppelin,
>>>>>>>>>>> which uses a HiveContext as the sqlContext by default.
>>>>>>>>>>> (That is, unless you changed useHiveContext to "false" in the
>>>>>>>>>>> interpreter menu.)
>>>>>>>>>>>
>>>>>>>>>>> Hope it helps.
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Jun 29, 2015 at 7:55 PM, Ophir Cohen <oph...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Guys?
>>>>>>>>>>>> Somebody?
>>>>>>>>>>>> Can it be that Zeppelin does not support UDFs?
>>>>>>>>>>>>
>>>>>>>>>>>> On Sun, Jun 28, 2015 at 11:53 AM, Ophir Cohen <oph...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Guys,
>>>>>>>>>>>>> One more problem I have encountered using Zeppelin.
>>>>>>>>>>>>> I'm using Spark 1.3.1 on YARN, Hadoop 2.4.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I'm trying to create and use a UDF (hc == z.sqlContext ==
>>>>>>>>>>>>> HiveContext):
>>>>>>>>>>>>> 1. Create and register the UDF:
>>>>>>>>>>>>> def getNum(): Int = {
>>>>>>>>>>>>>   100
>>>>>>>>>>>>> }
>>>>>>>>>>>>>
>>>>>>>>>>>>> hc.udf.register("getNum", getNum _)
>>>>>>>>>>>>> 2. Then I try to use it on an existing table:
>>>>>>>>>>>>> %sql select getNum() from filteredNc limit 1
>>>>>>>>>>>>>
>>>>>>>>>>>>> Or:
>>>>>>>>>>>>> 3. Trying it with hc directly:
>>>>>>>>>>>>> hc.sql("select getNum() from filteredNc limit 1").collect
>>>>>>>>>>>>>
>>>>>>>>>>>>> Both of them fail with
>>>>>>>>>>>>> "java.lang.ClassNotFoundException:
>>>>>>>>>>>>> org.apache.zeppelin.spark.ZeppelinContext"
>>>>>>>>>>>>> (see below for the full exception).
>>>>>>>>>>>>>
>>>>>>>>>>>>> And my questions are:
>>>>>>>>>>>>> 1. Can it be that ZeppelinContext is not available on the
>>>>>>>>>>>>> Spark nodes?
>>>>>>>>>>>>> 2. Why does it need ZeppelinContext anyway? Why is it relevant?
>>>>>>>>>>>>>
>>>>>>>>>>>>> The exception:
>>>>>>>>>>>>> WARN [2015-06-28 08:43:53,850] ({task-result-getter-0}
>>>>>>>>>>>>> Logging.scala[logWarning]:71) - Lost task 0.2 in stage 23.0
>>>>>>>>>>>>> (TID 1626, ip-10-216-204-246.ec2.internal):
>>>>>>>>>>>>> java.lang.NoClassDefFoundError:
>>>>>>>>>>>>> Lorg/apache/zeppelin/spark/ZeppelinContext;
>>>>>>>>>>>>>   at java.lang.Class.getDeclaredFields0(Native Method)
>>>>>>>>>>>>>   at java.lang.Class.privateGetDeclaredFields(Class.java:2499)
>>>>>>>>>>>>>   at java.lang.Class.getDeclaredField(Class.java:1951)
>>>>>>>>>>>>>   at java.io.ObjectStreamClass.getDeclaredSUID(ObjectStreamClass.java:1659)
>>>>>>>>>>>>>
>>>>>>>>>>>>> <Many more ObjectStreamClass lines of the exception>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Caused by: java.lang.ClassNotFoundException:
>>>>>>>>>>>>> org.apache.zeppelin.spark.ZeppelinContext
>>>>>>>>>>>>>   at org.apache.spark.repl.ExecutorClassLoader.findClass(ExecutorClassLoader.scala:69)
>>>>>>>>>>>>>   at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>>>>>>>>>>>>>   at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>>>>>>>>>>>>>   ... 103 more
>>>>>>>>>>>>> Caused by: java.lang.ClassNotFoundException:
>>>>>>>>>>>>> org.apache.zeppelin.spark.ZeppelinContext
>>>>>>>>>>>>>   at java.lang.ClassLoader.findClass(ClassLoader.java:531)
>>>>>>>>>>>>>   at org.apache.spark.util.ParentClassLoader.findClass(ParentClassLoader.scala:26)
>>>>>>>>>>>>>   at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>>>>>>>>>>>>>   at org.apache.spark.util.ParentClassLoader.loadClass(ParentClassLoader.scala:34)
>>>>>>>>>>>>>   at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>>>>>>>>>>>>>   at org.apache.spark.util.ParentClassLoader.loadClass(ParentClassLoader.scala:30)
>>>>>>>>>>>>>   at org.apache.spark.repl.ExecutorClassLoader.findClass(ExecutorClassLoader.scala:64)
>>>>>>>>>>>>>   ... 105 more
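The question raised in the thread - why executors need ZeppelinContext for a UDF that never mentions it - comes down to closure serialization. A plausible explanation, not confirmed anywhere in this thread, is that a function defined in a Zeppelin paragraph is compiled into a REPL wrapper object that also references ZeppelinContext, so serializing the UDF drags the whole wrapper along, and the executors cannot load that class. The self-contained Scala sketch below (FakeZeppelinContext and CaptureDemo are made-up names, standing in for the real classes) shows the general mechanism: a closure that captures a non-serializable object fails the same Java serialization that Spark performs before shipping a UDF to executors.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Stand-in for ZeppelinContext: a class that is NOT java.io.Serializable.
class FakeZeppelinContext { val hundred = 100 }

object CaptureDemo {
  // Returns true if the value survives Java serialization -- the same check
  // Spark effectively performs when shipping a UDF closure to executors.
  def serializable(v: AnyRef): Boolean =
    try {
      new ObjectOutputStream(new ByteArrayOutputStream()).writeObject(v)
      true
    } catch {
      case _: NotSerializableException => false
    }

  def main(args: Array[String]): Unit = {
    val z = new FakeZeppelinContext

    // Silently captures `z` -- like a UDF whose enclosing REPL wrapper
    // holds ZeppelinContext.
    val capturing: () => Int = () => z.hundred

    // A self-contained closure with no outer references.
    val clean: () => Int = () => 100

    println(s"capturing serializes: ${serializable(capturing)}") // false
    println(s"clean serializes:     ${serializable(clean)}")     // true
  }
}
```

If this explanation holds, it would also account for the Spark shell behaving differently: its wrapper objects hold nothing comparable to ZeppelinContext.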
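If closure capture is indeed the culprit, one workaround worth trying is to define the UDF body inside a standalone serializable object, so that registering it captures nothing from the interpreter scope. This is a hedged sketch, not verified against Zeppelin: `Udfs` is a hypothetical name, and `hc` is the HiveContext from the thread.

```scala
// Hypothetical workaround sketch -- untested. Keeping the function in its
// own serializable object should keep the REPL wrapper (and with it
// ZeppelinContext) out of the serialized closure.
object Udfs extends Serializable {
  def getNum(): Int = 100
}

hc.udf.register("getNum", Udfs.getNum _)
// %sql select getNum() from filteredNc limit 1
```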