Hi Moon,

Thanks for your quick response. I will notify you of the result as soon as I
can.

Best,

David

On Wed, Sep 2, 2015 at 6:51 AM, moon soo Lee <m...@apache.org> wrote:

> Hi,
>
> I just pushed a patch for ZEPPELIN-262 at
> https://github.com/apache/incubator-zeppelin/pull/270.
> It'll take some time to be reviewed and merged into master.
> Until then, you can try the branch from the pull request.
>
> I believe it'll solve your problem, but let me know if you still have
> problems after this patch.
>
> Thanks,
> moon
>
> On Tue, Sep 1, 2015 at 2:46 PM moon soo Lee <m...@apache.org> wrote:
>
>> Hi,
>>
>> I'm testing the patch for ZEPPELIN-262 in some environments that I have.
>> I think I can create a pull request tonight.
>>
>> Thanks,
>> moon
>>
>>
>> On Tue, Sep 1, 2015 at 1:34 PM Steven Kirtzic <
>> steven.kirtzic.f...@statefarm.com> wrote:
>>
>>> Hi Moon,
>>>
>>>
>>>
>>> When are you guys targeting the release of the fix for ZEPPELIN-262? Thanks,
>>>
>>>
>>>
>>> -Steven
>>>
>>>
>>>
>>> *From:* moon soo Lee [mailto:m...@apache.org]
>>> *Sent:* Tuesday, September 01, 2015 12:38 AM
>>> *To:* users@zeppelin.incubator.apache.org
>>> *Subject:* Re: Closure issue with spark 1.4.1
>>>
>>>
>>>
>>> Hi David, Jerry,
>>>
>>>
>>>
>>> There's a series of ongoing efforts to improve Spark integration:
>>>
>>>
>>>
>>> Work with provided version of Spark
>>>
>>> https://issues.apache.org/jira/browse/ZEPPELIN-160
>>>
>>>
>>>
>>> Self diagnostics of configuration
>>>
>>> https://issues.apache.org/jira/browse/ZEPPELIN-256
>>>
>>>
>>>
>>> Use spark-submit to run spark interpreter process
>>>
>>> https://issues.apache.org/jira/browse/ZEPPELIN-262
>>>
>>>
>>>
>>> On the mailing list, I've seen many people struggle with configuring
>>> Spark in Zeppelin across various environments.
>>>
>>> ZEPPELIN-262 will solve virtually all of the configuration problems with
>>> Spark.
>>>
>>>
>>>
>>> Thanks for sharing your problems and feedback. That helps Zeppelin make
>>> progress.
>>>
>>>
>>>
>>> Best,
>>>
>>> moon
>>>
>>>
>>>
>>> On Mon, Aug 31, 2015 at 9:17 PM Jerry Lam <chiling...@gmail.com> wrote:
>>>
>>> Hi David,
>>>
>>>
>>>
>>> We gave up on Zeppelin because of the lack of support. It seems that
>>> Zeppelin has a lot of fancy features but lacks depth. Only time will tell
>>> whether Zeppelin can overcome those limitations.
>>>
>>>
>>>
>>> Good luck,
>>>
>>>
>>>
>>> Jerry
>>>
>>>
>>>
>>> On Mon, Aug 31, 2015 at 8:17 AM, David Salinas <
>>> david.salinas....@gmail.com> wrote:
>>>
>>> Hi all,
>>>
>>> Has anyone been able to reproduce the error with the last code snippet
>>> I gave? It fails 100% of the time on the cluster for me.
>>> This serialization issue involving ZeppelinContext also shows up in many
>>> other cases in my setup where it shouldn't, since the same code works fine
>>> in the Spark shell.
>>>
>>> Best regards,
>>>
>>> David
>>>
>>>
>>>
>>> On Mon, Aug 24, 2015 at 9:07 PM, Jerry Lam <chiling...@gmail.com> wrote:
>>>
>>> Hi Zeppelin developers,
>>>
>>>
>>>
>>> This issue sounds very serious. Is this specific to David's use case
>>> here?
>>>
>>>
>>>
>>> Best Regards,
>>>
>>>
>>>
>>> Jerry
>>>
>>>
>>>
>>> On Mon, Aug 24, 2015 at 1:28 PM, David Salinas <
>>> david.salinas....@gmail.com> wrote:
>>>
>>> I have looked at the SparkInterpreter.java code, and this is indeed the
>>> issue. Whenever an instruction uses z.input("...") , no Spark
>>> transformation can work, because z gets shipped to the slaves, where
>>> Zeppelin is not installed, as shown by the example I sent.
>>>
>>> A workaround could be to interpret the variables separately (by defining
>>> a map of variables before interpreting).
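The capture problem described above can be illustrated without Spark. This is a hypothetical sketch (LineWrapper is a stand-in for the REPL line object that holds z, not actual Zeppelin code): a closure that reads a field drags the whole wrapper along, while copying the value into a true local first keeps the closure serializable.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Stand-in for the interpreter's line wrapper: it holds a non-serializable
// field (like ZeppelinContext) alongside user values.
class LineWrapper {
  val z = new Object                       // not Serializable, like z in Zeppelin
  val s: String = "Foo"                    // e.g. the result of z.input("Foo").toString

  // Reads the field `s` directly, so the lambda captures `this`,
  // dragging the whole wrapper (including z) into the closure.
  def badClosure: String => Boolean = line => line.contains(s)

  // Workaround: copy the value into a local first; the lambda then
  // captures only the String.
  def goodClosure: String => Boolean = {
    val local = s
    line => line.contains(local)
  }
}

object CaptureDemo {
  // True if obj survives Java serialization, as Spark requires of closures.
  def serializable(obj: AnyRef): Boolean =
    try { new ObjectOutputStream(new ByteArrayOutputStream).writeObject(obj); true }
    catch { case _: NotSerializableException => false }

  def main(args: Array[String]): Unit = {
    val w = new LineWrapper
    println(serializable(w.badClosure))    // false: captured wrapper is not serializable
    println(serializable(w.goodClosure))   // true: captures only a String
  }
}
```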
>>>
>>> Best,
>>>
>>> David
>>>
>>>
>>>
>>>
>>>
>>> On Mon, Aug 24, 2015 at 6:45 PM, David Salinas <
>>> david.salinas....@gmail.com> wrote:
>>>
>>> Hi Moon,
>>>
>>> I found another way to reproduce the problem:
>>>
>>> //cell 1 does not work
>>>
>>> val file = "hdfs://someclusterfile.json"
>>> val s = z.input("Foo").toString
>>> val textFile = sc.textFile(file)
>>> textFile.filter(_.contains(s)).count
>>> //org.apache.spark.SparkException: Job aborted due to stage failure:
>>> Task 41 in stage 5.0 failed 4 times, most recent failure: Lost task 41.3 in
>>> stage 5.0 (TID 2735,XXX.com ): java.lang.NoClassDefFoundError:
>>> Lorg/apache/zeppelin/spark/ZeppelinContext;
>>>
>>> // cell 2 works
>>>
>>> val file = "hdfs://someclusterfile.json"
>>> val s = "Y"
>>> val textFile = sc.textFile(file)
>>> textFile.filter(_.contains(s)).count
>>> //res19: Long = 109
>>>
>>> This kind of issue also happens often when using variables from other
>>> cells, and when taking a closure for a transformation. Maybe you are
>>> reading variables inside the transformation with something like
>>> z.get("s"), which causes z to be sent to the slaves because one of its
>>> members is used (although I sometimes also have this issue without using
>>> anything from other cells).
>>>
>>> Best,
>>>
>>> David
>>>
>>>
>>>
>>>
>>>
>>> On Mon, Aug 24, 2015 at 10:34 AM, David Salinas <
>>> david.salinas....@gmail.com> wrote:
>>>
>>> Sorry, I forgot to mention my environment:
>>>
>>> Mesos 0.17, Spark 1.4.1, Scala 2.10.4, Java 1.8
>>>
>>>
>>>
>>> On Mon, Aug 24, 2015 at 10:32 AM, David Salinas <
>>> david.salinas....@gmail.com> wrote:
>>>
>>> Hi Moon,
>>>
>>>
>>> Today I cannot reproduce the bug with an elementary example either, but it
>>> is still impacting all my notebooks. The weird thing is that when calling a
>>> transformation with map, the Zeppelin context is taken into the closure,
>>> which gives these java.lang.NoClassDefFoundError:
>>> Lorg/apache/zeppelin/spark/ZeppelinContext errors (the Spark shell runs the
>>> same command without any problem). I will try to find another example that
>>> fails more consistently (it is weird that this example was failing
>>> yesterday). Do you have any idea what could cause the Zeppelin context to
>>> be included in the closure?
>>>
>>> Best,
>>>
>>> David
>>>
>>>
>>>
>>>
>>>
>>> On Fri, Aug 21, 2015 at 6:29 PM, moon soo Lee <m...@apache.org> wrote:
>>>
>>> I have tested your code and cannot reproduce the problem.
>>>
>>>
>>>
>>> Could you share your environment? How did you configure Zeppelin with
>>> Spark?
>>>
>>>
>>>
>>> Thanks,
>>>
>>> moon
>>>
>>>
>>>
>>> On Fri, Aug 21, 2015 at 2:25 AM David Salinas <
>>> david.salinas....@gmail.com> wrote:
>>>
>>> Hi,
>>>
>>> I have a problem when using Spark closures. This error did not appear
>>> with Spark 1.2.1.
>>>
>>> I have included a reproducible example that fails when taking the
>>> closure (Zeppelin was built from the head of master with this command: mvn
>>> install -DskipTests -Pspark-1.4 -Dspark.version=1.4.1
>>> -Dhadoop.version=2.2.0 -Dprotobuf.version=2.5.0). Has anyone ever
>>> encountered this problem? All my previous notebooks are broken by this :(
>>>
>>> ------------------------------
>>> val textFile = sc.textFile("hdfs://somefile.txt")
>>>
>>> val f = (s: String) => s+s
>>> textFile.map(f).count
>>> //works fine
>>> //res145: Long = 407
>>>
>>>
>>> def f(s:String) = {
>>>     s+s
>>> }
>>> textFile.map(f).count
>>>
>>> //fails ->
>>>
>>> org.apache.spark.SparkException: Job aborted due to stage failure: Task
>>> 566 in stage 87.0 failed 4 times, most recent failure: Lost task 566.3 in
>>> stage 87.0 (TID 43396, XXX.com): java.lang.NoClassDefFoundError:
>>> Lorg/apache/zeppelin/spark/ZeppelinContext; at
>>> java.lang.Class.getDeclaredFields0(Native Method) at
>>> java.lang.Class.privateGetDeclaredFields(Class.java:2583) at
>>> java.lang.Class.getDeclaredField(Class.java:2068) ...
>>> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924) at
>>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
>>> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351) at
>>> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000) at
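The difference between the two cells above can be sketched without Spark. A function literal that uses no outer state captures nothing, while passing a def to map eta-expands it into `s => this.f(s)`, a closure over the enclosing instance (in the REPL, the line object that also holds z). A hypothetical stand-alone illustration, where Outer plays the role of that line object:

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

class Outer {                               // stands in for the REPL line object
  val fVal: String => String = s => s + s   // function literal: captures nothing

  def fDef(s: String): String = s + s       // method: belongs to the instance

  // Eta-expanding the method, as map(fDef) does, yields `s => this.fDef(s)`,
  // a closure over `this` and everything the instance holds.
  def etaExpanded: String => String = fDef _
}

object EtaDemo {
  // True if obj survives Java serialization, as Spark requires of closures.
  def serializable(obj: AnyRef): Boolean =
    try { new ObjectOutputStream(new ByteArrayOutputStream).writeObject(obj); true }
    catch { case _: NotSerializableException => false }

  def main(args: Array[String]): Unit = {
    val o = new Outer
    println(serializable(o.fVal))           // true: self-contained lambda
    println(serializable(o.etaExpanded))    // false: drags `this` along
  }
}
```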
>>>
>>> Best,
>>>
>>> David
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>