Hi,

I have tried this example after applying https://github.com/apache/incubator-zeppelin/pull/270.
But it is not working, for several reasons:

1/ The Zeppelin context is not found (!):

    val s = z.input("Foo")
    <console>:21: error: not found: value z

2/ If I include my jar, the classpath is not communicated to the slaves, so the code only works locally (it used to work on the cluster before this change). I guess there is something wrong with the way I set the classpath (which is probably also linked to 1/). I have added this line in zeppelin-env.sh to use one of my jars:

    export ZEPPELIN_JAVA_OPTS="-Dspark.driver.host=`hostname` -Dspark.mesos.coarse=true -Dspark.executor.memory=20g -Dspark.cores.max=80 -Dspark.jars=${SOME_JAR} -cp ${SOME_CLASSPATH_FOR_THE_JAR}"

How can one add an extra-classpath jar with this new version? Could you add ZEPPELIN_EXTRA_JAR and ZEPPELIN_EXTRA_CLASSPATH variables to zeppelin-env.sh so that users can easily add their own code?

Best,

David

On Thu, Sep 3, 2015 at 9:39 AM, David Salinas <david.salinas....@gmail.com> wrote:

> Hi Moon,
>
> Thanks for your reactivity. I will notify you of the result as soon as I can.
>
> Best,
>
> David
>
> On Wed, Sep 2, 2015 at 6:51 AM, moon soo Lee <m...@apache.org> wrote:
>
>> Hi,
>>
>> I just pushed a patch for ZEPPELIN-262 at
>> https://github.com/apache/incubator-zeppelin/pull/270.
>> It'll take some time to be reviewed and merged into master.
>> Before that, you can try the branch of the pull request.
>>
>> I believe it'll solve your problem, but let me know if you still have
>> a problem after this patch.
>>
>> Thanks,
>> moon
>>
>> On Tue, Sep 1, 2015 at 2:46 PM moon soo Lee <m...@apache.org> wrote:
>>
>>> Hi,
>>>
>>> I'm testing the patch for ZEPPELIN-262 with some environments that I have.
>>> I think I can create a pull request tonight.
>>>
>>> Thanks,
>>> moon
>>>
>>> On Tue, Sep 1, 2015 at 1:34 PM Steven Kirtzic <steven.kirtzic.f...@statefarm.com> wrote:
>>>
>>>> Hi Moon,
>>>>
>>>> When are you guys targeting the release for ZEPPELIN-262?
>>>> Thanks,
>>>>
>>>> -Steven
>>>>
>>>> *From:* moon soo Lee [mailto:m...@apache.org]
>>>> *Sent:* Tuesday, September 01, 2015 12:38 AM
>>>> *To:* users@zeppelin.incubator.apache.org
>>>> *Subject:* Re: Closure issue with spark 1.4.1
>>>>
>>>> Hi David, Jerry,
>>>>
>>>> There is a series of efforts to improve the Spark integration:
>>>>
>>>> Work with a provided version of Spark
>>>> https://issues.apache.org/jira/browse/ZEPPELIN-160
>>>>
>>>> Self-diagnostics of configuration
>>>> https://issues.apache.org/jira/browse/ZEPPELIN-256
>>>>
>>>> Use spark-submit to run the Spark interpreter process
>>>> https://issues.apache.org/jira/browse/ZEPPELIN-262
>>>>
>>>> I have seen many people on the mailing list struggle with configuring Spark in Zeppelin in various environments.
>>>> ZEPPELIN-262 will virtually solve all the problems around configuration with Spark.
>>>>
>>>> Thanks for sharing your problems and feedback. That helps Zeppelin make progress.
>>>>
>>>> Best,
>>>>
>>>> moon
>>>>
>>>> On Mon, Aug 31, 2015 at 9:17 PM Jerry Lam <chiling...@gmail.com> wrote:
>>>>
>>>> Hi David,
>>>>
>>>> We gave up on Zeppelin because of the lack of support. It seems that Zeppelin has a lot of fancy features but lacks depth. Only time will tell if Zeppelin can overcome those limitations.
>>>>
>>>> Good luck,
>>>>
>>>> Jerry
>>>>
>>>> On Mon, Aug 31, 2015 at 8:17 AM, David Salinas <david.salinas....@gmail.com> wrote:
>>>>
>>>> Hi all,
>>>>
>>>> Has anyone been able to reproduce the error with the last code snippet I gave? It fails 100% of the time on the cluster for me.
>>>> This serialization issue involving ZeppelinContext also comes up in many other cases in my setting where it should not, since the same code works fine in the spark shell.
>>>> Best regards,
>>>>
>>>> David
>>>>
>>>> On Mon, Aug 24, 2015 at 9:07 PM, Jerry Lam <chiling...@gmail.com> wrote:
>>>>
>>>> Hi Zeppelin developers,
>>>>
>>>> This issue sounds very serious. Is it specific to David's use case here?
>>>>
>>>> Best Regards,
>>>>
>>>> Jerry
>>>>
>>>> On Mon, Aug 24, 2015 at 1:28 PM, David Salinas <david.salinas....@gmail.com> wrote:
>>>>
>>>> I have looked at the SparkInterpreter.java code and this is indeed the issue. Whenever an instruction uses z.input("..."), no Spark transformation can work, because z is shipped to the slaves, where Zeppelin is not installed, as shown by the example I sent.
>>>> A workaround could be to interpret the variables separately (by defining a map of variables before interpreting).
>>>>
>>>> Best,
>>>>
>>>> David
>>>>
>>>> On Mon, Aug 24, 2015 at 6:45 PM, David Salinas <david.salinas....@gmail.com> wrote:
>>>>
>>>> Hi Moon,
>>>>
>>>> I found another way to reproduce the problem:
>>>>
>>>> // cell 1: does not work
>>>> val file = "hdfs://someclusterfile.json"
>>>> val s = z.input("Foo").toString
>>>> val textFile = sc.textFile(file)
>>>> textFile.filter(_.contains(s)).count
>>>> // org.apache.spark.SparkException: Job aborted due to stage failure:
>>>> // Task 41 in stage 5.0 failed 4 times, most recent failure: Lost task 41.3 in
>>>> // stage 5.0 (TID 2735, XXX.com): java.lang.NoClassDefFoundError:
>>>> // Lorg/apache/zeppelin/spark/ZeppelinContext;
>>>>
>>>> // cell 2: works
>>>> val file = "hdfs://someclusterfile.json"
>>>> val s = "Y"
>>>> val textFile = sc.textFile(file)
>>>> textFile.filter(_.contains(s)).count
>>>> // res19: Long = 109
>>>>
>>>> This kind of issue also happens often when using variables from other cells, and when taking closures for transformations.
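[The cell 1 vs cell 2 contrast above is classic closure capture: a lambda that reads z through the enclosing interpreter object forces the whole object, ZeppelinContext included, into the serialized task. A minimal sketch outside Spark, using plain Java serialization; Context and Wrapper are invented stand-ins for ZeppelinContext and the interpreter's per-cell wrapper object:]

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Stand-in for ZeppelinContext: exists on the driver, not Serializable.
class Context { def input(name: String): String = "Y" }

// Stand-in for the object the interpreter wraps each cell in; it holds z.
class Wrapper(z: Context) extends Serializable {
  // Bad: the lambda reads z via `this`, so serializing it serializes the
  // whole Wrapper, including the non-serializable Context.
  def filterBad: String => Boolean = line => line.contains(z.input("Foo"))

  // Good: copy the value into a local val first; the lambda then
  // captures only the String, not `this`.
  def filterGood: String => Boolean = {
    val s = z.input("Foo")
    line => line.contains(s)
  }
}

// Returns true if the object survives a round through Java serialization.
def serializes(o: AnyRef): Boolean =
  try { new ObjectOutputStream(new ByteArrayOutputStream).writeObject(o); true }
  catch { case _: NotSerializableException => false }
```

[This is why cell 2, which only captures a plain String, works: copying z.input(...) into a local val before the transformation keeps z out of the closure.]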
>>>> Maybe you are reading variables inside the transformation with something like z.get("s"), which causes z to be sent to the slaves because one of its members is used (although I also sometimes have this issue without using anything from other cells).
>>>>
>>>> Best,
>>>>
>>>> David
>>>>
>>>> On Mon, Aug 24, 2015 at 10:34 AM, David Salinas <david.salinas....@gmail.com> wrote:
>>>>
>>>> Sorry, I forgot to mention my environment:
>>>>
>>>> mesos 0.17, spark 1.4.1, scala 2.10.4, java 1.8
>>>>
>>>> On Mon, Aug 24, 2015 at 10:32 AM, David Salinas <david.salinas....@gmail.com> wrote:
>>>>
>>>> Hi Moon,
>>>>
>>>> Today I cannot reproduce the bug with an elementary example either, but it is still impacting all my notebooks. The weird thing is that when calling a transformation with map, the Zeppelin context is taken into the closure, which gives these java.lang.NoClassDefFoundError: Lorg/apache/zeppelin/spark/ZeppelinContext errors (the spark shell runs the same commands without any problem). I will try to find another example that is more persistent (it is weird that this example was failing yesterday). Do you have any idea what could cause the Zeppelin context to be included in the closure?
>>>>
>>>> Best,
>>>>
>>>> David
>>>>
>>>> On Fri, Aug 21, 2015 at 6:29 PM, moon soo Lee <m...@apache.org> wrote:
>>>>
>>>> I have tested your code and cannot reproduce the problem.
>>>>
>>>> Could you share your environment? How did you configure Zeppelin with Spark?
>>>>
>>>> Thanks,
>>>> moon
>>>>
>>>> On Fri, Aug 21, 2015 at 2:25 AM David Salinas <david.salinas....@gmail.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> I have a problem when using Spark closures. This error did not appear with spark 1.2.1.
>>>> I have included a reproducible example that fails when the closure is taken (Zeppelin has been built from the head of master with this command: mvn install -DskipTests -Pspark-1.4 -Dspark.version=1.4.1 -Dhadoop.version=2.2.0 -Dprotobuf.version=2.5.0). Has anyone ever encountered this problem? All my previous notebooks are broken by this :(
>>>>
>>>> ------------------------------
>>>> val textFile = sc.textFile("hdfs://somefile.txt")
>>>>
>>>> val f = (s: String) => s + s
>>>> textFile.map(f).count
>>>> // works fine
>>>> // res145: Long = 407
>>>>
>>>> def f(s: String) = {
>>>>   s + s
>>>> }
>>>> textFile.map(f).count
>>>>
>>>> // fails ->
>>>>
>>>> org.apache.spark.SparkException: Job aborted due to stage failure: Task 566 in stage 87.0 failed 4 times, most recent failure: Lost task 566.3 in stage 87.0 (TID 43396, XXX.com): java.lang.NoClassDefFoundError: Lorg/apache/zeppelin/spark/ZeppelinContext;
>>>>     at java.lang.Class.getDeclaredFields0(Native Method)
>>>>     at java.lang.Class.privateGetDeclaredFields(Class.java:2583)
>>>>     at java.lang.Class.getDeclaredField(Class.java:2068)
>>>>     ...
>>>>     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
>>>>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
>>>>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
>>>>     at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
>>>>     ...
>>>>
>>>> Best,
>>>>
>>>> David
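[The val f vs def f asymmetry in the last example fits the same capture story: passing a def to map eta-expands to a closure over the enclosing object, so any field of that object rides along, while a val function literal that touches no fields captures nothing. A sketch using invented names; Heavy and ReplLine stand in for ZeppelinContext and the object the interpreter wraps each cell in:]

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

class Heavy // stand-in for ZeppelinContext: not Serializable

// Stand-in for the per-cell wrapper object.
class ReplLine extends Serializable {
  val z = new Heavy // never used by f, but present as a field

  // `def`: map(fDef) eta-expands to `s => this.fDef(s)`, a closure
  // over `this` -- and therefore, transitively, over z.
  def fDef(s: String): String = s + s

  // `val` function literal: uses no fields, captures nothing.
  val fVal: String => String = s => s + s
}

// Returns true if the object survives a round through Java serialization.
def serializes(o: AnyRef): Boolean =
  try { new ObjectOutputStream(new ByteArrayOutputStream).writeObject(o); true }
  catch { case _: NotSerializableException => false }

val line = new ReplLine
// What map(f) effectively receives when f is a def:
val etaExpanded: String => String = line.fDef
```

[Under this reading, the def variant fails on the executors not because f itself needs ZeppelinContext, but because the eta-expanded closure carries the whole cell object, whose class references a type the slaves don't have on their classpath.]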