Re: Monitoring a Notebook in Spark UI

2020-07-21 Thread Jeff Zhang
Hi Stephane, I mean running Spark SQL jobs concurrently via %spark.sql just by setting zeppelin.spark.concurrentSQL to true. See the details here: http://zeppelin.apache.org/docs/0.9.0-preview1/interpreter/spark.html#sparksql On Wed, Jul 22, 2020 at 12:21 AM, wrote: > Hi Jeff, > > > > * You can also run
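As a sketch of the setting Jeff describes: `zeppelin.spark.concurrentSQL` is a property of the spark interpreter group, edited on the Interpreter settings page (shown here in comment form; the file-level fragment is illustrative, since the property is normally changed through the UI):

```shell
# Zeppelin UI: Interpreter -> spark -> edit -> set the property:
#
#   zeppelin.spark.concurrentSQL = true
#
# With this enabled, multiple %spark.sql paragraphs can execute
# concurrently inside the same Spark application instead of queuing.
```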

RE: Monitoring a Notebook in Spark UI

2020-07-21 Thread stephane.davy
Hi Jeff, You wrote: "You can also run multiple spark sql jobs concurrently in one spark app". Can you please elaborate on this? What I see (with Zeppelin 0.8) is that with a shared interpreter, each job is run one after another. When going to one interpreter per user, many users can run a job at the same

Re: Monitoring a Notebook in Spark UI

2020-07-21 Thread Jeff Zhang
Regarding how many Spark apps, it depends on the interpreter binding mode; you can refer to this document: http://zeppelin.apache.org/docs/0.9.0-preview1/usage/interpreter/interpreter_binding_mode.html Internally, each Spark app runs a Scala shell to execute Scala code and a Python shell to execute

Monitoring a Notebook in Spark UI

2020-07-21 Thread Joshua Conlin
Hello, I'm looking for documentation to better understand PySpark/Scala notebook execution in Spark. I typically see application runtimes that can be very long; is there always a Spark "application" running for a notebook or Zeppelin session? Those that are not actually being run in Zeppelin

Re: [VOTE] Release Apache Zeppelin 0.9.0-preview2 (RC1)

2020-07-21 Thread Alex Ott
I didn't compile it myself, I just use the binaries that Jeff created for preview2. My point is that it worked out of the box in preview1 and previous versions, and should continue to be the same; otherwise it's a very breaking change that requires people to know about it... On Tue, Jul 21, 2020 at

Re: [VOTE] Release Apache Zeppelin 0.9.0-preview2 (RC1)

2020-07-21 Thread Jeff Zhang
That's right. In that PR, I excluded Hadoop jars from the Zeppelin distribution so that we can support both Hadoop 2 and Hadoop 3 (users can set USE_HADOOP=true in zeppelin-env.sh, so that Zeppelin runs the command `hadoop classpath` and puts all the Hadoop jars on Zeppelin's classpath). But for this issue,
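A sketch of what Jeff describes, as it would appear in conf/zeppelin-env.sh (this assumes a Hadoop client is already installed on the machine, so that `hadoop classpath` resolves):

```shell
# conf/zeppelin-env.sh
# With Hadoop jars excluded from the Zeppelin distribution, tell Zeppelin
# to pick them up from the locally installed Hadoop client instead:
export USE_HADOOP=true
# Zeppelin then runs `hadoop classpath` and appends the result to its own
# classpath, so the same Zeppelin binary works against Hadoop 2 or Hadoop 3.
```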

Re: [VOTE] Release Apache Zeppelin 0.9.0-preview2 (RC1)

2020-07-21 Thread Philipp Dallig
Hi Alex, It seems that Hadoop classes are missing. Did you include Hadoop jars with "-P include-hadoop"? I think it's related to https://github.com/apache/zeppelin/commit/6fa79a9fc743f2b4321ac9e8713b3380bb4d64c9#diff-600376dffeb79835ede4a0b285078036. Philipp On 21.07.20 at 11:28, wrote:
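For completeness, a sketch of a source build using the Maven profile Philipp mentions (the `-DskipTests` flag is illustrative; only the profile name comes from the thread):

```shell
# Build Zeppelin from source with Hadoop jars bundled into the distribution,
# so no external Hadoop client is needed at runtime:
mvn clean package -DskipTests -P include-hadoop
```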

Re: [VOTE] Release Apache Zeppelin 0.9.0-preview2 (RC1)

2020-07-21 Thread Alex Ott
Hi Jeff, I've found another issue in both rc1 & rc2: if you don't specify SPARK_HOME, then the default Spark interpreter fails with the following error when I execute the code for reading from Cassandra: %spark import org.apache.spark.sql.cassandra._ val data =
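The configuration Alex refers to would look like this in conf/zeppelin-env.sh; the path below is a placeholder, not something stated in the thread:

```shell
# conf/zeppelin-env.sh
# Point Zeppelin at an external Spark installation so the %spark
# interpreter uses it instead of the embedded default:
export SPARK_HOME=/opt/spark   # placeholder path; adjust to your install
```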