Re: Error about PySpark

2017-02-02 Thread Jianfeng (Jeff) Zhang

Please try to install numpy
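
For example, a minimal sketch assuming a pip-based Python: numpy must be present for the Python used by the driver and by every worker node (the interpreter path below is only an example):

pip install numpy

# if Zeppelin should use a specific Python, set it in conf/zeppelin-env.sh:
export PYSPARK_PYTHON=/usr/bin/python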


Best Regard,
Jeff Zhang


From: mingda li
Reply-To: "users@zeppelin.apache.org"
Date: Friday, February 3, 2017 at 6:03 AM
To: "users@zeppelin.apache.org"
Subject: Re: Error about PySpark

And I tried ./bin/pyspark to run the same program, which uses the mllib package; it works fine in Spark.

So do I need to set something for Zeppelin, like PYSPARK_PYTHON or PYTHONPATH?

Bests,
Mingda

On Thu, Feb 2, 2017 at 12:07 PM, mingda li wrote:
Thanks. But after I changed the Zeppelin environment as follows:

export JAVA_HOME=/home/clash/asterixdb/jdk1.8.0_101

export ZEPPELIN_PORT=19037

export SPARK_HOME=/home/clash/sparks/spark-1.6.1-bin-hadoop12

each time I try to use mllib in Zeppelin, I still hit the following problem:

Py4JJavaError: An error occurred while calling 
z:org.apache.spark.api.python.PythonRDD.runJob.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in 
stage 5.0 failed 4 times, most recent failure: Lost task 0.3 in stage 5.0 (TID 
13, SCAI05.CS.UCLA.EDU): 
org.apache.spark.api.python.PythonException: Traceback (most recent call last):
File 
"/home/clash/sparks/spark-1.6.1-bin-hadoop12/python/lib/pyspark.zip/pyspark/worker.py",
 line 98, in main
command = pickleSer._read_with_length(infile)
File 
"/home/clash/sparks/spark-1.6.1-bin-hadoop12/python/lib/pyspark.zip/pyspark/serializers.py",
 line 164, in _read_with_length
return self.loads(obj)
File 
"/home/clash/sparks/spark-1.6.1-bin-hadoop12/python/lib/pyspark.zip/pyspark/serializers.py",
 line 422, in loads
return pickle.loads(obj)
File 
"/home/clash/sparks/spark-1.6.1-bin-hadoop12/python/lib/pyspark.zip/pyspark/mllib/__init__.py",
 line 25, in <module>
ImportError: No module named numpy
at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:166)
at org.apache.spark.api.python.PythonRunner$$anon$1.<init>(PythonRDD.scala:207)
at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:125)
at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:70)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Driver stacktrace:
at 
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418)
at 
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1418)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
at scala.Option.foreach(Option.scala:236)
at 
org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:799)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1640)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:620)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1832)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1845)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1858)
at org.apache.spark.api.python.PythonRDD$.runJob(PythonRDD.scala:393)
at org.apache.spark.api.python.PythonRDD.runJob(PythonRDD.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
at 

Re: Connection refused when trying to run livy.spark interpreter in kerberized HDP

2017-01-28 Thread Jianfeng (Jeff) Zhang

>>> But when I enable user impersonation on the livy.spark interpreter

Do you mean you enabled the user impersonation option in the livy interpreter
setting in Zeppelin? You don't need to do that, as impersonation is done by
Livy itself; the option there is a little confusing and should not be displayed.


Best Regard,
Jeff Zhang


From: Michał Kabocik
Reply-To: "users@zeppelin.apache.org"
Date: Friday, January 27, 2017 at 10:59 PM
To: "users@zeppelin.apache.org"
Subject: Connection refused when trying to run livy.spark interpreter in kerberized HDP

But when I enable user impersonation on the livy.spark interpreter


Re: 0.6.1 and spark 2.0.0

2016-09-16 Thread Jianfeng (Jeff) Zhang

This is a known issue. There are two solutions:

1. Use the Spark 2.0 that ships with HDP 2.5.
2. Disable the timeline service in yarn-site.xml (see the sketch below).
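
For option 2, a minimal sketch of the yarn-site.xml change (yarn.timeline-service.enabled is the standard YARN property; verify it against your Hadoop version):

<property>
  <name>yarn.timeline-service.enabled</name>
  <value>false</value>
</property>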


Best Regard,
Jeff Zhang


From: Herman Yu
Reply-To: "users@zeppelin.apache.org"
Date: Friday, September 16, 2016 at 11:32 AM
To: "users@zeppelin.apache.org"
Subject: 0.6.1 and spark 2.0.0

With a binary build of Zeppelin 0.6.1 and Spark 2.0.0 on an HDP 2.4 sandbox, I
am getting the following error from the Spark interpreter (the Spark conf and
Zeppelin conf folders were copied from HDP 2.4 and modified accordingly).

I googled it and it seems related to the YARN timeline server (Jersey 1 vs.
Jersey 2). Has anyone encountered the same issue? Is there a solution/workaround
to make the Spark 2.0 interpreter work?

Thanks
Herman.


DEBUG [2016-09-15 22:48:08,563] ({pool-2-thread-2} 
DataTransferSaslUtil.java[getSaslPropertiesResolver]:183) - 
DataTransferProtocol not using SaslPropertiesResolver, no QOP found in 
configuration for dfs.data.transfer.protection
DEBUG [2016-09-15 22:48:08,566] ({pool-2-thread-2} 
AbstractService.java[enterState]:452) - Service: 
org.apache.hadoop.yarn.client.api.impl.YarnClientImpl entered state INITED
ERROR [2016-09-15 22:48:08,590] ({pool-2-thread-2} Utils.java[invokeMethod]:40) 
-
java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.zeppelin.spark.Utils.invokeMethod(Utils.java:38)
at org.apache.zeppelin.spark.Utils.invokeMethod(Utils.java:33)
at 
org.apache.zeppelin.spark.SparkInterpreter.createSparkSession(SparkInterpreter.java:343)
at 
org.apache.zeppelin.spark.SparkInterpreter.getSparkSession(SparkInterpreter.java:216)
at 
org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:741)
at 
org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:69)
at 
org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:93)
at 
org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:341)
at org.apache.zeppelin.scheduler.Job.run(Job.java:176)
at 
org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:139)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NoClassDefFoundError: 
com/sun/jersey/api/client/config/ClientConfig
at 
org.apache.hadoop.yarn.client.api.TimelineClient.createTimelineClient(TimelineClient.java:55)
at 
org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.createTimelineClient(YarnClientImpl.java:181)
at 
org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.serviceInit(YarnClientImpl.java:168)
at 
org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at 
org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:150)
at 
org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:56)
at 
org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:149)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:500)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2256)
at 
org.apache.spark.sql.SparkSession$Builder$$anonfun$8.apply(SparkSession.scala:831)
at 
org.apache.spark.sql.SparkSession$Builder$$anonfun$8.apply(SparkSession.scala:823)
at scala.Option.getOrElse(Option.scala:121)
at 
org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:823)
... 21 more
Caused by: java.lang.ClassNotFoundException: 
com.sun.jersey.api.client.config.ClientConfig
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at 

Re: No active SparkContext black hole

2016-10-07 Thread Jianfeng (Jeff) Zhang

Could you paste the log ?


Best Regard,
Jeff Zhang


From: Mark Libucha
Reply-To: "users@zeppelin.apache.org"
Date: Friday, October 7, 2016 at 12:11 AM
To: "users@zeppelin.apache.org"
Subject: Re: No active SparkContext black hole

Actually, it's stuck in the Running state. Trying to cancel it causes the "No
active SparkContext" message to appear in the log. Seems like a bug.

On Thu, Oct 6, 2016 at 9:06 AM, Mark Libucha wrote:
Hello again,

On "longer" running jobs (I'm using yarn-client mode), I sometimes get RPC 
timeouts. Seems like Zeppelin is losing connectivity with the Spark cluster. I 
can deal with that.

But my notebook has sections stuck in the "Cancel" state, and I can't get them 
out. When I re-click on cancel, I see "No active SparkContext" in the log. But 
I can't reload a new instance of the notebook, or kill the one that's stuck, 
without restarting all of zeppelin.

Suggestions?

Thanks,

Mark



Re: Having issues with Hive RuntimeException when running a Zeppelin notebook application

2016-11-16 Thread Jianfeng (Jeff) Zhang
>>> at /home/asif/zeppelin-0.6.2-bin-all/metastore_db has an incompatible 
>>> format with the current version of the software. The database was created 
>>> by or upgraded by version 10.11.

Try deleting this folder and running it again.
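
A sketch of the steps (stop Zeppelin first so the Derby lock is released; the path comes from the quoted error):

bin/zeppelin-daemon.sh stop
rm -rf /home/asif/zeppelin-0.6.2-bin-all/metastore_db
bin/zeppelin-daemon.sh start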


Best Regard,
Jeff Zhang


From: moon soo Lee
Reply-To: "users@zeppelin.apache.org"
Date: Thursday, November 17, 2016 at 6:47 AM
To: "users@zeppelin.apache.org", Muhammad Rezaul Karim
Subject: Re: Having issues with Hive RuntimeException when running a Zeppelin notebook application

Hi,

It's strange,
Do you have SPARK_HOME or HADOOP_CONF_DIR defined in conf/zeppelin-env.sh?

You can stop Zeppelin, delete /home/asif/zeppelin-0.6.2-bin-all/metastore_db, 
start Zeppelin and try again.

Thanks,
moon

On Tue, Nov 15, 2016 at 4:05 PM Muhammad Rezaul Karim wrote:

Hi,
I am a new user of Apache Zeppelin, and I am running a simple notebook app on
Zeppelin (version 0.6.2-bin-all) using Scala on Spark.

My source code is as follows:

val bankText = sc.textFile("/home/rezkar/zeppelin-0.6.2-bin-all/bin/bank-full.csv")
case class Bank(age: String, job: String, marital: String, education: String, balance: String)
val bank = bankText.map(s => s.split(";")).map(
  s => Bank(s(0),
    s(1).replaceAll("\"", ""),
    s(2).replaceAll("\"", ""),
    s(3).replaceAll("\"", ""),
    s(5)))
bank.collect()
val mydf = bank.toDF()
mydf.registerTempTable("mydf")

Up to bank.collect() it works pretty well. However, I'm getting the following
error on Ubuntu 14.04 when trying to execute the last two lines of my code:

java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate 
org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
at 
org.apache.spark.sql.hive.client.HiveClientImpl.<init>(HiveClientImpl.scala:171)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at org.apache.spark.sql.SQLImplicits.rddToDatasetHolder(SQLImplicits.scala:163)
... 46 elided
Caused by: java.lang.RuntimeException: Unable to instantiate 
org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
at 
org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1523)
at 
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:86)
... 70 more
Caused by: java.lang.reflect.InvocationTargetException: 
javax.jdo.JDOFatalDataStoreException: Unable to open a test connection to the 
given database. JDBC url = jdbc:derby:;databaseName=metastore_db;create=true, 
username = APP. Terminating connection pool (set lazyInit to true if you expect 
to start your database after your app). Original Exception: --
at 
org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1521)

Caused by: java.sql.SQLException: Failed to start database 'metastore_db' with 
class loader 
org.apache.spark.sql.hive.client.IsolatedClientLoader$$anon$1@48f2b61e, see the 
next exception for details.
at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown 
Source)
at 
org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(Unknown
 Source)
... 136 more
Caused by: java.sql.SQLException: Database at 
/home/asif/zeppelin-0.6.2-bin-all/metastore_db has an incompatible format with 
the current version of the software. The database was created by or upgraded by 
version 10.11.
at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown 
Source)
at 
org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(Unknown
 Source)
at org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unknown 
Source)
at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown Source)
... 133 more
Caused by: ERROR XSLAN: Database at 
/home/asif/zeppelin-0.6.2-bin-all/metastore_db has an incompatible format with 
the current version of the software. The database was created by or upgraded by 
version 10.11.
at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
--
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at 
org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1521)
... 76 more
Caused by: javax.jdo.JDOFatalDataStoreException: Unable to open a 

Re: Unable to print("Hello World") using %python :(

2016-12-10 Thread Jianfeng (Jeff) Zhang

This exception is thrown from the Spark side. I suspect you have set both of them.
Please grep for them in SPARK_CONF_DIR:

grep -nr SPARK_CLASSPATH $SPARK_CONF_DIR
grep -nr spark.driver.extraClassPath $SPARK_CONF_DIR
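
If the grep finds both, a sketch of the fix is to drop SPARK_CLASSPATH and keep only the extraClassPath form (file names assume a standard Spark layout; the jar path is hypothetical):

# remove or comment out in $SPARK_CONF_DIR/spark-env.sh:
#   export SPARK_CLASSPATH=...

# keep the equivalent entry in $SPARK_CONF_DIR/spark-defaults.conf:
spark.driver.extraClassPath /path/to/extra/jars/*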

Best Regard,
Jeff Zhang


From: Russell Jurney
Reply-To: "users@zeppelin.apache.org"
Date: Saturday, December 10, 2016 at 12:02 PM
To: "users@zeppelin.apache.org"
Subject: Re: Unable to print("Hello World") using %python :(

Use only the former.


Re: Zeppelin becomes too slow after adding artifacts

2016-12-06 Thread Jianfeng (Jeff) Zhang

Maybe the Spark interpreter is downloading the dependency. Could you rerun the
paragraph again?

Best Regard,
Jeff Zhang





On 12/7/16, 4:38 AM, "Nabajyoti Dash" wrote:

>Hi,
>I am using zeppelin to visualize my hbase data.
>I had a sample zeppelin notebook, which was working fine.
>But after adding the "org.apache.hbase:hbase-client:1.2.3" dependency, my
>notebook gets stuck on a simple Spark SQL query.
>Please suggest..
>
>
>
>--
>View this message in context:
>http://apache-zeppelin-users-incubating-mailing-list.75479.x6.nabble.com/Zeppeline-becomes-too-slow-after-adding-artifacts-tp4689.html
>Sent from the Apache Zeppelin Users (incubating) mailing list mailing
>list archive at Nabble.com.
>



Re: Limit on multiple concurrent interpreters / isolated notebooks?

2016-12-14 Thread Jianfeng (Jeff) Zhang

Yes, this would be a critical performance issue for the multi-user case,
because currently Zeppelin only supports yarn-client mode, which means the
driver JVM runs on the same host as the Zeppelin server. So the number of
concurrent users depends on the memory you configure for the driver and how
much memory the machine has.

But for the long term, we should support yarn-cluster mode. Here's a ticket
and a wiki page for this:

https://issues.apache.org/jira/browse/ZEPPELIN-1377

https://cwiki.apache.org/confluence/display/ZEPPELIN/Cluster+Manager+Proposal





Best Regard,
Jeff Zhang





On 12/14/16, 11:27 PM, "blaubaer" wrote:

>Hi
> 
>We are running Zeppelin (0.5) on our YARN-managed cluster. To allow for
>multiple concurrent users without sharing the Spark context, we simply
>set up one interpreter for every user. This works pretty well; however, at
>some point we seem to hit a limit on how many concurrent (Spark)
>interpreters the Zeppelin (daemon) service can handle. Now with the "new"
>(0.6) feature of isolated notebooks, this topic should pop up with other
>users as well.
>
>So, I was wondering: what are your experiences with multiple concurrent
>interpreters? What are the determining factors for how many concurrent
>interpreters can run (besides the cluster resources needed to actually
>start the multiple interpreters)? Any experiences with that? For us it
>seems that 2-3 is OK and 4-5 gets critical, but that also depends on the
>load of the jobs, it seems.
> 
>Thx.
>
>
>
>--
>View this message in context:
>http://apache-zeppelin-users-incubating-mailing-list.75479.x6.nabble.com/Limit-on-multiple-concurrent-interpreters-isolated-notebooks-tp4732.html
>Sent from the Apache Zeppelin Users (incubating) mailing list mailing
>list archive at Nabble.com.
>



Re: Unable to print("Hello World") using %python :(

2016-12-11 Thread Jianfeng (Jeff) Zhang
Can you run spark-shell correctly?
If yes, then I suspect you may have set them on the Zeppelin side; grep for
them in ZEPPELIN_CONF_DIR.


Best Regard,
Jeff Zhang


From: Russell Jurney <russell.jur...@gmail.com>
Reply-To: "users@zeppelin.apache.org" <users@zeppelin.apache.org>
Date: Monday, December 12, 2016 at 8:27 AM
To: "users@zeppelin.apache.org" <users@zeppelin.apache.org>
Subject: Re: Unable to print("Hello World") using %python :(

Thanks, I've already gone through and checked my spark config. SPARK_CLASSPATH 
isn't set, and I spark-submit all the time.

On Sat, Dec 10, 2016 at 5:06 PM Jianfeng (Jeff) Zhang
<jzh...@hortonworks.com> wrote:

This exception is thrown from spark side. I suspect you set them.
Please grep them in SPARK_CONF_DIR

grep -nr SPARK_CLASSPATH SPARK_CONF_DIR
grep -nr spark.driver.extraClassPath SPARK_CONF_DIR

Best Regard,
Jeff Zhang

From: Russell Jurney <russell.jur...@gmail.com>
Reply-To: "users@zeppelin.apache.org" <users@zeppelin.apache.org>
Date: Saturday, December 10, 2016 at 12:02 PM
To: "users@zeppelin.apache.org" <users@zeppelin.apache.org>
Subject: Re: Unable to print("Hello World") using %python :(

Use only the former.

Re: Job failed: Implementing class

2016-12-07 Thread Jianfeng (Jeff) Zhang

Do you use the Zeppelin binary distribution, or did you build it yourself?


Best Regard,
Jeff Zhang


From: Facundo Bianco
Reply-To: "users@zeppelin.apache.org"
Date: Thursday, December 8, 2016 at 7:33 AM
To: "users@zeppelin.apache.org"
Subject: Job failed: Implementing class

Hi there,

On HDP 2.4 I've installed Zeppelin 0.6.1 with Spark interpreter built with 
Scala 2.10. (Spark version is 1.6.1.)

All interpreters work well but the Spark interpreter fails. The error in log 
message is:

> ERROR [2016-12-07 15:57:40,512] ({pool-2-thread-2} Job.java[run]:189) - Job 
> failed
> java.lang.IncompatibleClassChangeError: Implementing class

(The full error stack trace is here:
https://gist.github.com/vando/50bd0dbb970d0c2bd2fe13a6344109b8.)

In the zeppelin-env.sh file, the environment variables are:

> export MASTER=yarn-client
> export HADOOP_CONF_DIR="/etc/hadoop/conf"
> export ZEPPELIN_JAVA_OPTS="-Dhdp.version=2.4.2.0-258 
> -Dspark.yarn.queue=default"
> export SPARK_HOME="/usr/hdp/current/spark-client"
> export 
> PYTHONPATH="${SPARK_HOME}/python:${SPARK_HOME}/python/lib/py4j-0.8.2.1-src.zip"
> export SPARK_YARN_USER_ENV="PYTHONPATH=${PYTHONPATH}"

Do you have any idea on how to correct this error?

Thanks in advance.

Best,


Re: Zeppelin out of memory issue - (GC overhead limit exceeded)

2017-03-26 Thread Jianfeng (Jeff) Zhang

This is a bug in Zeppelin: spark.driver.memory won't take effect because for
now it isn't passed to Spark through the --conf parameter. See
https://issues.apache.org/jira/browse/ZEPPELIN-1263
The workaround is to specify SPARK_DRIVER_MEMORY on the interpreter setting page.
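
For example, on the Spark interpreter setting page add a property like this (the 4g value is only an illustration):

SPARK_DRIVER_MEMORY    4g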



Best Regard,
Jeff Zhang


From: RUSHIKESH RAUT
Reply-To: "users@zeppelin.apache.org"
Date: Sunday, March 26, 2017 at 5:03 PM
To: "users@zeppelin.apache.org"
Subject: Re: Zeppelin out of memory issue - (GC overhead limit exceeded)

ZEPPELIN_INTP_JAVA_OPTS


Re: Zeppelin out of memory issue - (GC overhead limit exceeded)

2017-03-26 Thread Jianfeng (Jeff) Zhang

I verified it on the master branch, and it works for me. Set it on the
interpreter setting page as shown below.


[inline screenshot: SPARK_DRIVER_MEMORY set on the interpreter setting page]


Best Regard,
Jeff Zhang


From: RUSHIKESH RAUT <rushikeshraut...@gmail.com>
Reply-To: "users@zeppelin.apache.org" <users@zeppelin.apache.org>
Date: Sunday, March 26, 2017 at 8:02 PM
To: "users@zeppelin.apache.org" <users@zeppelin.apache.org>
Subject: Re: Zeppelin out of memory issue - (GC overhead limit exceeded)

Thanks Jianfeng,

But I am still not able to solve the issue. I have set it to 4g but still no
luck. Can you please explain to me how I can set the SPARK_DRIVER_MEMORY
property?
Also, I have read that the GC overhead limit exceeded error occurs when the
heap memory is insufficient, so how can I increase the heap memory? Please
correct me if I am wrong, as I am still trying to learn these things.
Regards,
Rushikesh Raut

On Sun, Mar 26, 2017 at 4:25 PM, Jianfeng (Jeff) Zhang
<jzh...@hortonworks.com> wrote:

This is a bug in Zeppelin: spark.driver.memory won't take effect because for
now it isn't passed to Spark through the --conf parameter. See
https://issues.apache.org/jira/browse/ZEPPELIN-1263
The workaround is to specify SPARK_DRIVER_MEMORY on the interpreter setting page.



Best Regard,
Jeff Zhang


From: RUSHIKESH RAUT <rushikeshraut...@gmail.com>
Reply-To: "users@zeppelin.apache.org" <users@zeppelin.apache.org>
Date: Sunday, March 26, 2017 at 5:03 PM
To: "users@zeppelin.apache.org" <users@zeppelin.apache.org>
Subject: Re: Zeppelin out of memory issue - (GC overhead limit exceeded)

ZEPPELIN_INTP_JAVA_OPTS



Re: Setting Zeppelin to work with multiple Hadoop clusters when running Spark.

2017-03-26 Thread Jianfeng (Jeff) Zhang

What do you mean by non-reliable? If you want to read/write two Hadoop clusters
in one program, I am afraid this is the only way. It is impossible to specify
multiple HADOOP_CONF_DIRs under one JVM classpath; only one default
configuration will be used.


Best Regard,
Jeff Zhang


From: Serega Sheypak <serega.shey...@gmail.com>
Reply-To: "users@zeppelin.apache.org" <users@zeppelin.apache.org>
Date: Sunday, March 26, 2017 at 7:47 PM
To: "users@zeppelin.apache.org" <users@zeppelin.apache.org>
Subject: Re: Setting Zeppelin to work with multiple Hadoop clusters when 
running Spark.

I know it, thanks, but it's not a reliable solution.

2017-03-26 5:23 GMT+02:00 Jianfeng (Jeff) Zhang <jzh...@hortonworks.com>:

You can try to specify the namenode address in the HDFS file path, e.g.

spark.read.csv("hdfs://localhost:9009/file")
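
Extending that idea to two clusters in one program (a sketch; the namenode hosts and ports are hypothetical, taken from each cluster's fs.defaultFS):

// read from cluster A and write to cluster B via fully qualified HDFS URIs
val dfA = spark.read.csv("hdfs://nn-cluster-a:8020/data/input.csv")
dfA.write.csv("hdfs://nn-cluster-b:8020/data/output")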

Best Regard,
Jeff Zhang


From: Serega Sheypak <serega.shey...@gmail.com>
Reply-To: "users@zeppelin.apache.org" <users@zeppelin.apache.org>
Date: Sunday, March 26, 2017 at 2:47 AM
To: "users@zeppelin.apache.org" <users@zeppelin.apache.org>
Subject: Setting Zeppelin to work with multiple Hadoop clusters when running 
Spark.

Hi, I have three Hadoop clusters. Each cluster has its own NN HA configured,
plus YARN.
I want to allow a user to read from any cluster and write to any cluster. The
user should also be able to choose where to run the Spark job.
What is the right way to configure this in Zeppelin?




Re: Should zeppelin.pyspark.python be used on the worker nodes ?

2017-03-20 Thread Jianfeng (Jeff) Zhang

It is dynamic; you can set environment variables on the interpreter setting page.
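
For example, on the Spark interpreter setting page you can add a property such as (the conda path is hypothetical):

PYSPARK_PYTHON    /shared/conda/envs/my-env/bin/python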


Best Regard,
Jeff Zhang


From: Ruslan Dautkhanov
Reply-To: "users@zeppelin.apache.org"
Date: Tuesday, March 21, 2017 at 3:27 AM
To: users
Subject: Re: Should zeppelin.pyspark.python be used on the worker nodes ?

You're right - it will not be dynamic.

You may want to check
https://issues.apache.org/jira/browse/ZEPPELIN-2195
https://github.com/apache/zeppelin/pull/2079
it seems it is fixed in the current snapshot of Zeppelin (committed 3 weeks ago).






--
Ruslan Dautkhanov

On Mon, Mar 20, 2017 at 1:21 PM, William Markito Oliveira wrote:
Thanks for the quick response Ruslan.

But given that it's an environment variable, I can't quickly change that value
and point to a different Python environment without restarting the Zeppelin
process, can I? I mean, is there a way to set the value of PYSPARK_PYTHON from
the interpreter configuration screen?

Thanks,


On Mon, Mar 20, 2017 at 2:15 PM, Ruslan Dautkhanov wrote:
You can set PYSPARK_PYTHON environment variable for that.

Not sure about zeppelin.pyspark.python; I think it does not work.
See the comments in https://issues.apache.org/jira/browse/ZEPPELIN-1265


Eventually, I think we can remove zeppelin.pyspark.python and use only
PYSPARK_PYTHON instead, to avoid confusion.


--
Ruslan Dautkhanov

On Mon, Mar 20, 2017 at 12:59 PM, William Markito Oliveira wrote:
I'm trying to use zeppelin.pyspark.python as the variable to set the Python
that the Spark worker nodes should use for my job, but it doesn't seem to be
working.

Am I missing something, or does this variable not do that?

My goal is to change that variable to point to different conda environments.
These environments are available on all worker nodes, since they are in a
shared location, and ideally all nodes would then have access to the same
libraries and dependencies.

Thanks,

~/William




--
~/William



Re: Roadmap for 0.8.0

2017-03-20 Thread Jianfeng (Jeff) Zhang

Strongly +1 for adding system tests for the different interpreter modes and
for focusing on bug fixing rather than new features. I have heard some users
complain about bugs in Zeppelin major releases. A stabilized release is very
necessary for the community.




Best Regard,
Jeff Zhang


From: moon soo Lee
Reply-To: "users@zeppelin.apache.org"
Date: Tuesday, March 21, 2017 at 4:10 AM
To: "users@zeppelin.apache.org", dev
Subject: Re: Roadmap for 0.8.0

Great to see discussion for 0.8.0.
List of features for 0.8.0 looks really good.

Interpreter factory refactoring
The interpreter layer supports various behaviors depending on the combination
of PerNote/PerUser and Shared/Scoped/Isolated. We'll need strong test cases for
each combination as a first step.
Otherwise, any pull request can silently break one of these behaviors at any
time, whether we refactor or not, and fixing and testing this behavior is very
hard.
Once we have complete test cases, they not only guarantee the behavior but also
make refactoring much easier.


0.8.0 release
I'd like to suggest improvements on how we release a new version.

In the past, 0.6.0 and 0.7.0 were released with some critical problems. (It
took 3 months to stabilize 0.6, and we have been working on stabilizing 0.7.0
for 2 months.)

I think the same thing will happen again with 0.8.0, since we're going to make
lots of changes and add many new features.
After we release 0.8.0, while we are 'stabilizing' the new release, users who
try it may get a wrong impression of its quality. That is very bad, and we
already repeated this mistake in 0.6.0 and 0.7.0.

So from the 0.8.0 release on, I'd suggest we improve the way we release new
versions to give users proper expectations. I think there are several ways of
doing it.

1. Release 0.8.0-preview officially and then release 0.8.0.
2. Release 0.8.0 with 'beta' or 'unstable' label. And keep 0.7.x as a 'stable' 
release in the download page. Once 0.8.x release becomes stable enough make 
0.8.x release as a 'stable' and move 0.7.x to 'old' releases.


After 0.8.0,
Since the Zeppelin project started, it has gone through some major milestones:

- the project got its first users and first contributor
- the project went into the Apache Incubator
- the project became a TLP.

And I think it's time to think about hitting another major milestone.

Considering the features we already have, the features we're planning for 0.8,
and the wide adoption of Zeppelin in the industry, I think it's time to focus
on making the project more mature and making a 1.0 release, which I think is a
big milestone for the project.

After the 0.8.0 release, I suggest we focus more on bug fixes, stability
improvements, and optimizing the user experience than on adding new features.
Then with subsequent minor releases, 0.8.1, 0.8.2, ..., the moment we feel
confident about the quality, we release it as 1.0.0 instead of 0.8.x.

Once we have 1.0.0 released, I think we can make larger, experimental changes
aggressively on a 2.0.0 branch, while we keep maintaining the 1.0.x branch.


Thanks,
moon

On Mon, Mar 20, 2017 at 8:55 AM Felix Cheung wrote:
There are several pending visualization improvements/PRs that it would be very
good to get in as well.



From: Jongyoul Lee
Sent: Sunday, March 19, 2017 9:03:24 PM
To: dev; users@zeppelin.apache.org
Subject: Roadmap for 0.8.0

Hi dev & users,

Recently, the community has submitted many new features for Apache Zeppelin. I
think these are very positive signals for improving Apache Zeppelin and its
community. But from another angle, we should focus on what the next release
includes. I think we need to summarize and prioritize them. Here is what I know:

* Cluster management
* Admin feature
* Replace some context to separate users
* Helium online

Feel free to chime in if you want to add more things. I think we also need to
choose which features will be included in 0.8.0.

Regards,
Jongyoul Lee

--
이종열, Jongyoul Lee, 李宗烈
http://madeng.net


Re: sqlContext not avilable as hiveContext in notebook

2017-04-04 Thread Jianfeng (Jeff) Zhang
I can get the hiveContext correctly on Zeppelin master. Could you try 0.7.1,
which was released recently?
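
A quick way to check which context Zeppelin handed you (a sketch; assumes zeppelin.spark.useHiveContext is set to true):

%pyspark
# should print pyspark.sql.context.HiveContext rather than SQLContext
print(type(sqlContext))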



Best Regard,
Jeff Zhang


From: Meethu Mathew
Reply-To: "users@zeppelin.apache.org"
Date: Tuesday, April 4, 2017 at 8:45 PM
To: "users@zeppelin.apache.org"
Subject: sqlContext not avilable as hiveContext in notebook

Hi,

I am running Zeppelin 0.7.0. The sqlContext already created in the Zeppelin
notebook returns a SQLContext (not a HiveContext), even though my Spark is
built with Hive support.

"zeppelin.spark.useHiveContext" in the spark properties is set to true.

As mentioned in https://issues.apache.org/jira/browse/ZEPPELIN-1728, I tried

  hc = HiveContext.getOrCreate(sc)

but it is still returning a SQLContext.

My pyspark shell and Jupyter notebook return a HiveContext without my doing
anything.

How do I get a HiveContext in the Zeppelin notebook?

Regards,
Meethu Mathew



Re: java.lang.NullpointerException

2017-04-23 Thread Jianfeng (Jeff) Zhang
The message is clear: you are setting both spark.driver.extraJavaOptions
and SPARK_JAVA_OPTS.
Please check spark-defaults.conf and the interpreter setting.
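
A sketch of how to track down where they are set (assuming a standard SPARK_HOME layout):

grep -n SPARK_JAVA_OPTS  $SPARK_HOME/conf/spark-env.sh
grep -n extraJavaOptions $SPARK_HOME/conf/spark-defaults.conf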



Best Regard,
Jeff Zhang


From: kant kodali
Reply-To: "users@zeppelin.apache.org"
Date: Sunday, April 23, 2017 at 1:58 PM
To: "users@zeppelin.apache.org"
Subject: Re: java.lang.NullpointerException

FYI: I am using Spark Standalone mode

On Sat, Apr 22, 2017 at 10:57 PM, kant kodali wrote:

Hi All,

I get the below stack trace when I am using Zeppelin. If I don't use Zeppelin,
all my client jobs run fine. I am using Spark 2.1.0.

I am not sure why Zeppelin is unable to create a SparkContext, yet it then
says it created a Spark session, which doesn't seem to make a lot of sense.
Any idea?

Thanks!


Caused by: org.apache.spark.SparkException: Found both 
spark.driver.extraJavaOptions and SPARK_JAVA_OPTS. Use only the former.

at 
org.apache.spark.SparkConf$$anonfun$validateSettings$6$$anonfun$apply$5.apply(SparkConf.scala:521)

at 
org.apache.spark.SparkConf$$anonfun$validateSettings$6$$anonfun$apply$5.apply(SparkConf.scala:519)

at scala.collection.immutable.List.foreach(List.scala:381)

at 
org.apache.spark.SparkConf$$anonfun$validateSettings$6.apply(SparkConf.scala:519)

at 
org.apache.spark.SparkConf$$anonfun$validateSettings$6.apply(SparkConf.scala:505)

at scala.Option.foreach(Option.scala:257)

at org.apache.spark.SparkConf.validateSettings(SparkConf.scala:505)

at org.apache.spark.SparkContext.<init>(SparkContext.scala:365)

at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2258)

at 
org.apache.spark.sql.SparkSession$Builder$$anonfun$8.apply(SparkSession.scala:831)

at 
org.apache.spark.sql.SparkSession$Builder$$anonfun$8.apply(SparkSession.scala:823)

at scala.Option.getOrElse(Option.scala:121)

at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:823)

... 20 more

 INFO [2017-04-23 05:17:08,467] ({pool-2-thread-2} 
SparkInterpreter.java[createSparkSession]:372) - Created Spark session

ERROR [2017-04-23 05:17:08,467] ({pool-2-thread-2} Job.java[run]:181) - Job 
failed

java.lang.NullPointerException

at org.apache.zeppelin.spark.Utils.invokeMethod(Utils.java:38)

at org.apache.zeppelin.spark.Utils.invokeMethod(Utils.java:33)

at 
org.apache.zeppelin.spark.SparkInterpreter.createSparkContext_2(SparkInterpreter.java:391)

at 
org.apache.zeppelin.spark.SparkInterpreter.createSparkContext(SparkInterpreter.java:380)

at 
org.apache.zeppelin.spark.SparkInterpreter.getSparkContext(SparkInterpreter.java:146)

at org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:828)

at 
org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:70)

at 
org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:483)

at org.apache.zeppelin.scheduler.Job.run(Job.java:175)

at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:139)

at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)

at java.util.concurrent.FutureTask.run(FutureTask.java:266)

at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)

at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)

at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)

at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

at java.lang.Thread.run(Thread.java:745)

 INFO [2017-04-23 05:17:08,475] ({pool-2-thread-2} 
SchedulerFactory.java[jobFinished]:137) - Job remoteInterpretJob_1492924623730 
finished by scheduler org.apache.zeppelin.spark.SparkInterpreter1233713905



Re: Release on 0.7.1 and 0.7.2

2017-03-14 Thread Jianfeng (Jeff) Zhang

+1

Best Regard,
Jeff Zhang


From: Jun Kim
Reply-To: "users@zeppelin.apache.org"
Date: Tuesday, March 14, 2017 at 11:38 AM
To: "users@zeppelin.apache.org"
Subject: Re: Release on 0.7.1 and 0.7.2

Cool! I look forward to it!

On Tue, Mar 14, 2017 at 12:31 PM, moon soo Lee wrote:
Sounds like a plan!


On Mon, Mar 13, 2017 at 8:22 PM Xiaohui Liu wrote:
This is the right action. In fact, the 0.7.0 release binary did not work for my
team. We started to use a 0.7.1 snapshot almost immediately after the 0.7.0
release.

I guess many of us are taking the same route.

But for new zeppelin users, starting with 0.7.0 will give them the wrong first 
impression.


On Tue, 14 Mar 2017 at 10:28 AM, Jongyoul Lee wrote:
Hi dev and users,

After we released 0.7.0, users and devs reported a lot of critical bugs. For
that reason, the community, including me, started to prepare a new minor
release with an umbrella issue[1]. Thanks to contributors' efforts, we have
resolved some of the issues and have reviewed almost all of the unresolved
ones. I want to talk about the new minor release at this point. Generally, we
resolve all issues reported as bugs before we release, but some issues are very
critical and cause serious problems when using Apache Zeppelin. So I think,
this time, it's better to release 0.7.1 as soon as we can and prepare another
minor release with the rest of the unresolved issues.

I'd like to start the process this Friday, and if some issues are not merged by
then, I hope they will be included in 0.7.2.

Feel free to talk to me if you have a better plan to improve users' experiences.

Regards,
Jongyoul Lee

[1] https://issues.apache.org/jira/browse/ZEPPELIN-2134


--
이종열, Jongyoul Lee, 李宗烈
http://madeng.net
--
Taejun Kim

Data Mining Lab.
School of Electrical and Computer Engineering
University of Seoul


Re: java.lang.ClassNotFoundException: $anonfun$1

2017-03-07 Thread Jianfeng (Jeff) Zhang
>>>  It appears that  during execution time on the yarn hosts, the native CDH 
>>> spark1.5 jars are loaded before the new spark2 jars.  I've tried using 
>>> spark.yarn.archive to specify the spark2 jars in hdfs as well as using 
>>> other spark options, none of which seems to make a difference.

Where do you see that the "spark1.5 jars are loaded before the new spark2 jars"?

Best Regard,
Jeff Zhang


From: Rob Anderson
Reply-To: "users@zeppelin.apache.org"
Date: Wednesday, March 8, 2017 at 2:29 AM
To: "users@zeppelin.apache.org"
Subject: Re: java.lang.ClassNotFoundException: $anonfun$1

Thanks. I can reach out to Cloudera, although the same commands seem to work
via spark-shell (see below). So the issue seems unique to Zeppelin.


Spark context available as 'sc' (master = yarn, app id = 
application_1472496315722_481416).

Spark session available as 'spark'.

Welcome to

      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.0.0.cloudera1
      /_/



Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_60)

Type in expressions to have them evaluated.

Type :help for more information.


scala> val taxonomy = sc.textFile("/user/user1/data/")

taxonomy: org.apache.spark.rdd.RDD[String] = /user/user1/data/
MapPartitionsRDD[1] at textFile at <console>:24


scala> .map(l => l.split("\t"))

res0: org.apache.spark.rdd.RDD[Array[String]] = MapPartitionsRDD[2] at map at
<console>:27


scala> taxonomy.first

res1: String = 43 B 459Sheets & Pillow 45 Sheets1 Sheets

On Mon, Mar 6, 2017 at 6:48 PM, moon soo Lee wrote:
Hi Rob,

Thanks for sharing the problem.
fyi, https://issues.apache.org/jira/browse/ZEPPELIN-1735 is tracking the 
problem.

If we can get help from cloudera forum, that would be great.

Thanks,
moon

On Tue, Mar 7, 2017 at 10:08 AM Jeff Zhang wrote:

It seems CDH specific issue, you might be better to ask cloudera forum.


Rob Anderson wrote on Tuesday, March 7, 2017 at 9:02 AM:
Hey Everyone,

We're running Zeppelin 0.7.0. We've just cut over to Spark 2, using Scala 2.11
via the CDH parcel (SPARK2-2.0.0.cloudera1-1.cdh5.7.0.p0.113931).

Running a simple job throws "Caused by: java.lang.ClassNotFoundException:
$anonfun$1". It appears that at execution time on the YARN hosts, the native
CDH Spark 1.5 jars are loaded before the new Spark 2 jars. I've tried using
spark.yarn.archive to specify the Spark 2 jars in HDFS, as well as other Spark
options, none of which seems to make a difference.


Any suggestions you can offer is appreciated.

Thanks,

Rob




%spark
val taxonomy = sc.textFile("/user/user1/data/")
 .map(l => l.split("\t"))

%spark
taxonomy.first


org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in 
stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 
7, data08.hadoop.prod.ostk.com, executor 
2): java.lang.ClassNotFoundException: $anonfun$1
at 
org.apache.spark.repl.ExecutorClassLoader.findClass(ExecutorClassLoader.scala:82)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at 
org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:67)
at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1613)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at 

Re: Zeppelin postgres interpreter throws NPE

2017-08-03 Thread Jianfeng (Jeff) Zhang
Agreed. You can create a ticket for a better exception message; the error
should also be displayed in the frontend.




Best Regard,
Jeff Zhang


From: Richard Xin
Reply-To: "users@zeppelin.apache.org"
Date: Friday, August 4, 2017 at 4:40 AM
To: "users@zeppelin.apache.org"
Subject: Re: Zeppelin postgres interpreter throws NPE

Oh, I just figured it out myself.
It seems that the postgres.database property I added in the interpreter config
is not recognized by Zeppelin; I solved it by adding the dbname to the URL.

Anyway, the exception would be better handled by emitting a friendlier error
message.


On Thursday, August 3, 2017, 1:35:04 PM PDT, Richard Xin wrote:


on AWS EMR
1. Tested remote postgres connection using psql, works OK
2. added the postgres interpreter via the Interpreter UI
3. create new note:
%postgresql   (or %psql)
select * from test.batch_report;
got NPE

4. Here is what's in the log:

WARN [2017-08-03 20:22:46,859] ({qtp459296537-18} 
LoginRestApi.java[postLogin]:119) - 
{"status":"OK","message":"","body":{"principal":"richard.xin","ticket":"d86ab3d9-3976-43a4-a274-df50ee3b627a","roles":"[]"}}
INFO [2017-08-03 20:22:50,370] ({qtp459296537-21} 
NotebookServer.java[sendNote]:705) - New operation from 172.17.197.49 : 57990 : 
richard.xin : GET_NOTE : 2CNXQE5CR
INFO [2017-08-03 20:22:50,656] ({qtp459296537-18} 
InterpreterFactory.java[createInterpretersForNote]:188) - Create interpreter 
instance spark for note 2CNXQE5CR
INFO [2017-08-03 20:22:50,661] ({qtp459296537-18} 
InterpreterFactory.java[createInterpretersForNote]:221) - Interpreter 
org.apache.zeppelin.spark.SparkInterpreter 2080711349 created
INFO [2017-08-03 20:22:50,662] ({qtp459296537-18} 
InterpreterFactory.java[createInterpretersForNote]:221) - Interpreter 
org.apache.zeppelin.spark.PySparkInterpreter 506689376 created
INFO [2017-08-03 20:22:50,662] ({qtp459296537-18} 
InterpreterFactory.java[createInterpretersForNote]:221) - Interpreter 
org.apache.zeppelin.spark.SparkSqlInterpreter 1462545045 created
INFO [2017-08-03 20:22:50,776] ({qtp459296537-22} 
InterpreterFactory.java[createInterpretersForNote]:188) - Create interpreter 
instance postgresql for note 2CNXQE5CR
INFO [2017-08-03 20:22:50,776] ({qtp459296537-22} 
InterpreterFactory.java[createInterpretersForNote]:221) - Interpreter 
org.apache.zeppelin.postgresql.PostgreSqlInterpreter 340437485 created
INFO [2017-08-03 20:22:53,610] ({pool-2-thread-2} 
SchedulerFactory.java[jobStarted]:131) - Job paragraph_1501783535283_1713771734 
started by scheduler 
org.apache.zeppelin.interpreter.remote.RemoteInterpretershared_session392982422
INFO [2017-08-03 20:22:53,611] ({pool-2-thread-2} Paragraph.java[jobRun]:362) - 
run paragraph 20170803-180535_552293631 using psql 
org.apache.zeppelin.interpreter.LazyOpenInterpreter@144aa9ed
INFO [2017-08-03 20:22:53,620] ({pool-2-thread-2} 
RemoteInterpreterManagedProcess.java[start]:126) - Run interpreter process 
[/usr/lib/zeppelin/bin/interpreter.sh, -d, 
/usr/lib/zeppelin/interpreter/postgresql, -p, 42610, -l, 
/usr/lib/zeppelin/local-repo/2CPQEJPGC]
INFO [2017-08-03 20:22:54,190] ({pool-2-thread-2} 
RemoteInterpreter.java[init]:221) - Create remote interpreter 
org.apache.zeppelin.postgresql.PostgreSqlInterpreter
INFO [2017-08-03 20:22:54,370] ({pool-2-thread-2} 
RemoteInterpreter.java[pushAngularObjectRegistryToRemote]:551) - Push local 
angular object registry from ZeppelinServer to remote interpreter group 
2CPQEJPGC:shared_process
WARN [2017-08-03 20:22:54,606] ({pool-2-thread-2} 
NotebookServer.java[afterStatusChange]:2058) - Job 20170803-180535_552293631 is 
finished, status: ERROR, exception: null, result: %text 
java.lang.NullPointerException
at 
org.apache.zeppelin.postgresql.PostgreSqlInterpreter.executeSql(PostgreSqlInterpreter.java:202)
at 
org.apache.zeppelin.postgresql.PostgreSqlInterpreter.interpret(PostgreSqlInterpreter.java:289)
at 
org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:97)
at 
org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:498)
at org.apache.zeppelin.scheduler.Job.run(Job.java:175)
at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:139)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at 

Re: can Zeppelin runParagraph from different notebooks?

2017-06-26 Thread Jianfeng (Jeff) Zhang

z.run is an async call. It just sends a message asking Zeppelin to run the
note, instead of waiting until it has finished.
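
So a paragraph like the one below returns immediately after submitting the run request (a sketch reusing the note and paragraph IDs from this thread):

%spark
// returns as soon as the request is sent; it does not block
z.run("2CN3UDXMZ", "20170612-231131_191205958")
// variables defined by the target paragraph are not yet visible here;
// read them in a later paragraph, after the run has finished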




Best Regard,
Jeff Zhang


From: Richard Xin <richardxin...@yahoo.com>
Reply-To: "users@zeppelin.apache.org" <users@zeppelin.apache.org>
Date: Tuesday, June 27, 2017 at 7:26 AM
To: "users@zeppelin.apache.org" <users@zeppelin.apache.org>,
"m...@apache.org" <m...@apache.org>
Subject: Re: can Zeppelin runParagraph from different notebooks?

Thanks Jeff,
It worked in some cases; however, it seems to have the following issues.
For example, in Note A I have the following paragraph:
val message = "Hello Zeppelin"
println("Note 1 = " + message)


Test Case 1===

in Note B:
Paragraph 1:
%spark
z.run("2CN3UDXMZ", "20170612-231131_191205958")
// works but it does not print out message "Note 1 = " ...

Paragraph 2:
println("P2: message == " + message)
// Works OK

Test Case 2===
in Note B:
Paragraph 1:
%spark
z.run("2CN3UDXMZ", "20170612-231131_191205958")
println("Note2: " + message)
// <console>:316: error: not found: value message
// newly defined variables cannot be used in the same paragraph as z.run?


My question is: is the behavior I described expected, or did I miss something
obvious?

Thanks,
RichardX



On Monday, June 26, 2017, 3:23:52 PM PDT, Jianfeng (Jeff) Zhang
<jzh...@hortonworks.com> wrote:



z.run


Best Regard,
Jeff Zhang


From: Richard Xin <richardxin...@yahoo.com>
Reply-To: "users@zeppelin.apache.org" <users@zeppelin.apache.org>
Date: Tuesday, June 27, 2017 at 6:04 AM
To: "users@zeppelin.apache.org" <users@zeppelin.apache.org>,
"m...@apache.org" <m...@apache.org>
Subject: Re: can Zeppelin runParagraph from different notebooks?

%spark
run("2CN3UDXMZ", "20170609-233658_1498522009");  (which is spark script written 
in scala)

<console>:288: error: not found: value run
run("2CN3UDXMZ", "20170609-233658_1498522009");
^

Zeppelin Version 0.7.1
Did I miss anything?
Thanks,



On Tuesday, June 13, 2017, 8:48:54 PM PDT, Jianfeng (Jeff) Zhang
<jzh...@hortonworks.com> wrote:



Please use 0.7.x


Best Regard,
Jeff Zhang


From: Richard Xin <richardxin...@yahoo.com>
Reply-To: "users@zeppelin.apache.org" <users@zeppelin.apache.org>
Date: Wednesday, June 14, 2017 at 8:48 AM
To: "m...@apache.org" <m...@apache.org>,
"users@zeppelin.apache.org" <users@zeppelin.apache.org>
Subject: Re: can Zeppelin runParagraph from different notebooks?

It doesn't work; which version are you using?


On Tuesday, June 13, 2017, 5:41:06 PM PDT, moon soo Lee
<m...@apache.org> wrote:


In spark interpreter, you can try

%spark
run(NOTE_ID, PARAGRAPH_ID)

Hope this helps.

Thanks,
moon

On Mon, Jun 12, 2017 at 9:52 AM Richard Xin
<richardxin...@yahoo.com> wrote:
Angular (frontend API):
http://zeppelin.apache.org/docs/0.6.0/displaysystem/front-end-angular.html#run-paragraph

This method doesn't seem to support running paragraphs from different
notebooks. Did I miss anything?


Re: InvalidClassException using Zeppelin (master) and spark-2.1 on a standalone spark cluster

2017-06-27 Thread Jianfeng (Jeff) Zhang

It is fixed in https://issues.apache.org/jira/browse/ZEPPELIN-1977


Best Regard,
Jeff Zhang





On 6/27/17, 12:46 PM, "David Howell" wrote:

>Hi, 
>I know this issue is resolved for reading from json, and tested for that
>use
>case, but I'm seeing the exact same error message when writing to json.
>
>java.io.InvalidClassException:
>org.apache.commons.lang3.time.FastDateParser;
>local class incompatible: stream classdesc serialVersionUID = 2, local
>class
>serialVersionUID = 3
>
>Easy to reproduce on AWS (also happens writing to HDFS)
>
>val dfyo = List((1,"hi"),(2,"there"),(3,"yo"))
>.toDF()
>.write
>.json(f"s3n://ReplaceWithBucketName/test/")
>
>
>I know writing to json is not a common use case, so not an urgent issue.
>
>I see the git commit looks like it is mostly replacing the import of
>-import org.apache.commons.lang3.StringUtils;
>+import org.apache.commons.lang.StringUtils;
>
>But the issue is with org.apache.commons.lang3.time.FastDateParser
>
>I had a quick search in the codebase and couldn't find any imports of that
>class directly or indirectly so I'm not really sure what would fix it.
>
>
>
>
>--
>View this message in context:
>http://apache-zeppelin-users-incubating-mailing-list.75479.x6.nabble.com/InvalidClassException-using-Zeppelin-master-and-spark-2-1-on-a-standalone-spark-cluster-tp4900p5854.html
>Sent from the Apache Zeppelin Users (incubating) mailing list mailing
>list archive at Nabble.com.
>



Re: unable to restart zeppelin on AWS EMR

2017-06-04 Thread Jianfeng (Jeff) Zhang

Please check the zeppelin log
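
A sketch of where to look (/usr/lib/zeppelin is the usual EMR install location, but verify the path on your cluster):

tail -n 200 /usr/lib/zeppelin/logs/zeppelin-*.log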


Best Regard,
Jeff Zhang


From: shyla deshpande
Reply-To: "users@zeppelin.apache.org"
Date: Sunday, June 4, 2017 at 11:28 AM
To: "users@zeppelin.apache.org"
Subject: unable to restart zeppelin on AWS EMR

I changed some configuration and want to restart Zeppelin on AWS EMR, but I am
unable to. My local Zeppelin works fine.

I have tried:
1. zeppelin-daemon.sh restart   outputs [ OK ] but has no effect.
2. sudo stop zeppelin           outputs stop: Unknown instance:
3. sudo start zeppelin          outputs start: Job failed to start

Zeppelin is running and I am able to log in as an anonymous user, but I am
unable to restart it. I appreciate your input.

Thanks


Re: Unable to run Zeppelin Spark on YARN

2017-05-04 Thread Jianfeng (Jeff) Zhang

Could you try setting yarn-client on the interpreter setting page?
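
That is, on the Spark interpreter setting page set the master property (a sketch; master is the standard property name for the Spark interpreter):

master    yarn-client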


Best Regard,
Jeff Zhang


From: Yeshwanth Jagini
Reply-To: "users@zeppelin.apache.org"
Date: Friday, May 5, 2017 at 3:13 AM
To: "users@zeppelin.apache.org"
Subject: Unable to run Zeppelin Spark on YARN

Hi, we are running Cloudera CDH 5.9.1.

While setting up Zeppelin, I followed the documentation on the website and
specified the following options:

export ZEPPELIN_JAVA_OPTS="-Dhadoop.version=2.6.0-cdh5.9.1"
export SPARK_HOME="/opt/cloudera/parcels/CDH-5.9.1-1.cdh5.9.1.p0.4/lib/spark"
export SPARK_SUBMIT_OPTIONS="--master yarn --deploy-mode client"
export SPARK_APP_NAME=Zeppelin
export HADOOP_CONF_DIR=/etc/hadoop/conf
export ZEPPELIN_IMPERSONATE_CMD='sudo -H -u ${ZEPPELIN_IMPERSONATE_USER} bash -c'

When running a Spark notebook, spark-submit runs in local mode and I cannot see
the application in the YARN ResourceManager.
Is there any other configuration I am missing?


Thanks,
Yeshwanth Jagini


Re: InvalidClassException using Zeppelin (master) and spark-2.1 on a standalone spark cluster

2017-06-27 Thread Jianfeng (Jeff) Zhang

It works for me on the master branch.

Maybe the issue is due to ZEPPELIN-2375.

Could you try 0.7.2?



Best Regard,
Jeff Zhang





On 6/27/17, 2:43 PM, "David Howell" <david.how...@zipmoney.com.au> wrote:

>Hi Jeff,
>The ticket says it is fixed from Zeppelin 0.7.0.
>
>I am running Zeppelin 0.7.1, and yes, it is fixed for reading from json,
>but it still throws an error for writing to json.
>
>See my repro example.
>
>
>-----Original Message-----
>From: Jianfeng (Jeff) Zhang [mailto:jzh...@hortonworks.com]
>Sent: Tuesday, 27 June 2017 4:40 PM
>To: users@zeppelin.apache.org; us...@zeppelin.incubator.apache.org
>Subject: Re: InvalidClassException using Zeppelin (master) and spark-2.1
>on a standalone spark cluster
>
>
>It is fixed in https://issues.apache.org/jira/browse/ZEPPELIN-1977
>
>
>Best Regard,
>Jeff Zhang
>
>
>
>
>
>On 6/27/17, 12:46 PM, "David Howell" <david.how...@zipmoney.com.au> wrote:
>
>>Hi,
>>I know this issue is resolved for reading from json, and tested for
>>that use case, but I'm seeing the exact same error message when writing
>>to json.
>>
>>java.io.InvalidClassException:
>>org.apache.commons.lang3.time.FastDateParser;
>>local class incompatible: stream classdesc serialVersionUID = 2, local
>>class serialVersionUID = 3
>>
>>Easy to reproduce on AWS (also happens writing to HDFS)
>>
>>val dfyo = List((1,"hi"),(2,"there"),(3,"yo"))
>>.toDF()
>>.write
>>.json(f"s3n://ReplaceWithBucketName/test/")
>>
>>
>>I know writing to json is not a common use case, so not an urgent issue.
>>
>>I see the git commit looks like it is mostly replacing the import of
>>-import org.apache.commons.lang3.StringUtils;
>>+import org.apache.commons.lang.StringUtils;
>>
>>But the issue is with org.apache.commons.lang3.time.FastDateParser
>>
>>I had a quick search in the codebase and couldn't find any imports of
>>that class directly or indirectly so I'm not really sure what would fix
>>it.
>>
>>
>>
>>
>>--
>>View this message in context:
>>http://apache-zeppelin-users-incubating-mailing-list.75479.x6.nabble.com/InvalidClassException-using-Zeppelin-master-and-spark-2-1-on-a-standalone-spark-cluster-tp4900p5854.html
>>Sent from the Apache Zeppelin Users (incubating) mailing list archive at Nabble.com.
>>
>



Re: Zeppelin Port Configuration

2017-09-13 Thread Jianfeng (Jeff) Zhang

What do you see in logs ?


Best Regard,
Jeff Zhang


From: Carlos Andres Zambrano Barrera
Reply-To: "users@zeppelin.apache.org"
Date: Thursday, September 14, 2017 at 12:07 AM
To: "users@zeppelin.apache.org"
Subject: Zeppelin Port Configuration

Hi,

I need to have Zeppelin on port 80, and I changed it in:
zeppelin-site.xml
and zeppelin-env.sh.
I restarted the services, but the interpreter does not work.

Could anyone help me, please?
--
Ing. Carlos Andrés Zambrano Barrera
Cel: 3123825834
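
One common pitfall with this setup: ports below 1024 require root privileges,
so Zeppelin will fail to bind to port 80 when it runs as a regular user, and
the failure shows up in the server log. A minimal workaround sketch that keeps
Zeppelin on its configured port and redirects port 80 to it (assumes iptables
and the default zeppelin.server.port of 8080; substitute your own port):

# Leave zeppelin.server.port at 8080 in zeppelin-site.xml, then redirect:
sudo iptables -t nat -A PREROUTING -p tcp --dport 80 -j REDIRECT --to-port 8080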








Re: shell interpreter variables

2017-09-22 Thread Jianfeng (Jeff) Zhang

This is due to the implementation of the shell interpreter. Each paragraph
launches a shell process, which means each paragraph runs in its own shell
session.




Best Regard,
Jeff Zhang


From: Mohit Jaggi
Reply-To: "users@zeppelin.apache.org"
Date: Saturday, September 23, 2017 at 2:54 AM
To: "users@zeppelin.apache.org"
Subject: shell interpreter variables

Hi All,
I am using the shell interpreter and noticed that although I can see the scope 
is global and shared in the configuration, any shell variable set in one para 
is not visible in another.

e.g.

para1 --
export x=1   # also tried x=1, without export
echo $x      # prints 1

para2 --
echo $x      # prints nothing

What am I doing wrong?

Cheers,
Mohit.
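
Since each %sh paragraph runs in its own shell process, exported variables do
not survive from one paragraph to the next. A minimal workaround sketch that
persists state to a file which later paragraphs source (the file path is an
arbitrary choice):

%sh
# para1: write the variable to a file instead of relying on the environment
echo 'export x=1' > /tmp/zeppelin_shared_env
. /tmp/zeppelin_shared_env
echo $x   # prints 1

%sh
# para2: re-load the shared state at the start of the paragraph
. /tmp/zeppelin_shared_env
echo $x   # now prints 1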


Re: Zeppelin Port Configuration

2017-09-13 Thread Jianfeng (Jeff) Zhang

That doesn't work fine :)

Please check the log for details.


Best Regard,
Jeff Zhang


From: Carlos Andres Zambrano Barrera <cza...@gmail.com>
Reply-To: "users@zeppelin.apache.org" <users@zeppelin.apache.org>
Date: Thursday, September 14, 2017 at 3:50 AM
To: "users@zeppelin.apache.org" <users@zeppelin.apache.org>
Subject: Re: Zeppelin Port Configuration

I just changed it and it works fine!

But now when I try to connect, it shows "websocket disconnected" and I cannot
see my notebooks.

I tried different web browsers from different IPs and without a proxy, but we
still cannot open my notebooks.




2017-09-13 14:45 GMT-05:00 Jianfeng (Jeff) Zhang <jzh...@hortonworks.com>:

What do you see in logs ?


Best Regard,
Jeff Zhang


From: Carlos Andres Zambrano Barrera <cza...@gmail.com>
Reply-To: "users@zeppelin.apache.org" <users@zeppelin.apache.org>
Date: Thursday, September 14, 2017 at 12:07 AM
To: "users@zeppelin.apache.org" <users@zeppelin.apache.org>
Subject: Zeppelin Port Configuration

Hi,

I need to have Zeppelin on port 80, and I changed it in:
zeppelin-site.xml
and zeppelin-env.sh.
I restarted the services, but the interpreter does not work.

Could anyone help me, please?
--
Ing. Carlos Andrés Zambrano Barrera
Cel: 3123825834









--
Ing. Carlos Andrés Zambrano Barrera
Cel: 3123825834





Re: Implementing run all paragraphs sequentially

2017-10-06 Thread Jianfeng (Jeff) Zhang

Since almost everyone agrees on running serially by default, we could
implement that first. Regarding the parallel mode, we could leave it for the
future, although personally I would prefer to define a DAG for the note.


Best Regard,
Jeff Zhang


From: Michael Segel
Reply-To: "users@zeppelin.apache.org"
Date: Friday, October 6, 2017 at 10:08 PM
To: "users@zeppelin.apache.org"
Subject: Re: Implementing run all paragraphs sequentially

Guys…

1) You’re posting this to the user list… Isn’t this a dev question?

2) +1 on the run serial… but doesn’t that already exist with the “run all 
paragraphs” button already?

3) -1 on a ‘run all in parallel’ button.  (Its like putting lipstick on a pig.)

Are you really going to run all of the paragraphs in parallel?  You’re not 
going to have a paragraph that is used to set things up? Import external 
libraries?  Define classes/functions for future paragraphs to use?

IMHO I would much rather see a DAG where each paragraph can set its
dependency… (this isn’t quite the right term; I’m trying to think back to how
it was described in NeXTStep Objective-C code.)
Then you could set your parallel button to run in parallel but if your 
paragraph is dependent on another, its blocked from executing until its 
predecessor completes.

But that’s just my $0.02

On Oct 6, 2017, at 2:25 AM, Polyakov Valeriy wrote:

Thank you all for sharing the problem. Naman Mishra had started the
implementation of serial run in [1], so I propose we come back to the
discussion of the next step (both Parallel and Serial run buttons) after [1]
is resolved.

[1] https://issues.apache.org/jira/browse/ZEPPELIN-2368


Valeriy Polyakov

From: Jeff Zhang [mailto:zjf...@gmail.com]
Sent: Friday, October 06, 2017 10:14 AM
To: users@zeppelin.apache.org
Subject: Re: Implementing run all paragraphs sequentially


+1 for serial run by default.  Let's leave others in future.

On Friday, October 6, 2017 at 7:48 AM, Mohit Jaggi wrote:
+1 for serial run by default.

Sent from my iPhone

On Oct 5, 2017, at 3:36 PM, moon soo Lee wrote:
I'd like us to also consider simplicity of use.

We can have two different modes, or two different run buttons for Serial or 
Parallel run. This gives flexibility of choosing two different scheduler as a 
benefit, but to make user understand difference between two run button, there 
must be really good UI treatment.

I see there is high user demand for running a notebook sequentially, and I
think there are 3 action items in this discussion thread.

1. Change Parallel -> Serial the current run all button behavior
2. Provide both Parallel and Serial run buttons with really good UI treatment.
3. Provides DAG

I think 1) does not stop 2) and 3) in the future. 2) also does not stop 3) in 
the future.

So, why don't we try 1) first and keep discuss and polish idea about 2) and 3)?


Thanks,
moon

On Mon, Oct 2, 2017 at 10:22 AM, Michael Segel wrote:
Whoa!
Seems I walked in to something.

Herval,

What do you suggest?  A simple switch that runs everything in serial, or 
everything in parallel?
That would be a very bad idea.

I gave you an example of a class of solutions where you don’t want that 
behavior.
E.g Unit testing where you have one setup and then run several unit tests in 
parallel.

If that’s not enough for you… how about if you want to test producer/consumer 
problems?

Or if you want to define classes in one paragraph but then call on them in 
later paragraphs. If everything runs in parallel from the start of time 0, you 
can’t do this.


So, if you want to do it right the first time… you need to establish a way to 
control the dependency of paragraphs. This isn’t rocket science.
And frankly not that complex.

BTW, this is the user list not the dev list…

Just saying…  ;-)


On Oct 2, 2017, at 11:24 AM, Herval Freire wrote:

 "nice to have" isn't a very strong requirement. I strongly uggest you really, 
really think about this before you start pounding an overengineered solution to 
a non-issue :-)

h

On Mon, Oct 2, 2017 at 9:12 AM, Michael Segel wrote:
Yes…
You have a bunch of unit tests you can run in parallel where you only need one
constructor and one cleanup.

I would strongly suggest that you really, really think about this long and hard
before you start to pound code.
It's going to be harder to back out and fix than if you take the time to think
through the problem and not make a dumb mistake.

On Oct 2, 2017, at 

Re: Is any limitation of maximum interpreter processes?

2017-10-02 Thread Jianfeng (Jeff) Zhang

Which interpreter is pending? It is possible that the Spark interpreter is
pending due to YARN resource capacity if you run it in yarn-client mode.

If it is pending, you can check the log first.



Best Regard,
Jeff Zhang


From: Belousov Maksim Eduardovich
Reply-To: "users@zeppelin.apache.org"
Date: Monday, October 2, 2017 at 9:26 PM
To: "users@zeppelin.apache.org"
Subject: Is any limitation of maximum interpreter processes?

Hello, users!

Our analysts run notes with these interpreters: markdown, one or two jdbc, and
pyspark. The interpreters are instantiated per user in isolated processes and
per note in isolated processes.

The analysts complain that sometimes paragraphs aren't processed and stay in
the 'Pending' status.
We noticed that this happens when the number of started interpreter processes
is about 90-100.
If an admin restarts one of the popular interpreters (killing some interpreter
processes), the paragraphs become 'Running'.

We can't see any load on the Zeppelin server while paragraphs are pending; RAM
is sufficient and iowait is ~0.
Also, we can't find any parameter for the maximum number of interpreter
processes.

Has any of you faced the same problem? How can this problem be solved?


Thanks,

Maksim Belousov
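
There is no documented hard cap of ~90-100 interpreter processes; the ceiling
is usually memory or the OS per-user limits on processes and open files. A
quick sketch for checking both (RemoteInterpreterServer is the main class of
each interpreter process, as the stack traces elsewhere in this archive show;
the account name zeppelin is an assumption, use whichever user runs the
server):

# Count running Zeppelin interpreter processes
ps aux | grep [R]emoteInterpreterServer | wc -l

# Check the process and file-descriptor limits for the user running Zeppelin
sudo -u zeppelin bash -c 'ulimit -u; ulimit -n'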




Re: ZeppelinContext run method runs a paragraph as anonymous user

2017-09-13 Thread Jianfeng (Jeff) Zhang

Could you create a ticket for it first? I suspect it is because we don't pass
the user name in InterpreterContextRunner, but it needs more investigation.




Best Regard,
Jeff Zhang


From: Deenar Toraskar
Reply-To: "users@zeppelin.apache.org"
Date: Wednesday, September 13, 2017 at 3:04 PM
To: "users@zeppelin.apache.org"
Subject: Re: ZeppelinContext run method runs a paragraph as anonymous user

Hi

I work with Luis. We have tried both options for triggering cells in the same
notebook: using z.run(paraIndex) as well as the paragraph id,
z.run("20170620-085926_474506193").

We are happy to patch Zeppelin. We would be grateful if you could point us in
the right direction on how to propagate the credentials.

Regards
Deenar

On 12 September 2017 at 20:30, Luis Angel Vicente Sanchez wrote:
I also tried with z.run(paragraphSeqNumber)

--
  Luis Angel Vicente Sanchez
  zeppelin-us...@bigcente.ch



On Tue, 12 Sep 2017, at 09:22, Luis Angel Vicente Sanchez wrote:
That's quite simple actually. Zeppelin exposes the ZeppelinContext as the 
variable z... you just need to do this:

z.run(paragraphId)

--
  Luis Angel Vicente Sanchez
  zeppelin-us...@bigcente.ch



On Tue, 12 Sep 2017, at 03:33, Park Hoon wrote:
Hi, could you share the paragraph to show how did you run a paragraph in a 
different paragraph?

> if we want to run a paragraph from another paragraph in the same notebook (to 
> refresh it),

Regard,

On Mon, Sep 11, 2017 at 11:24 PM, Luis Angel Vicente Sanchez wrote:
Some extra info:

println(s"AUTHENTICATION INFO ::
${z.getInterpreterContext.getAuthenticationInfo.getUser}
${z.getInterpreterContext.getAuthenticationInfo.getTicket}")

That line inside a Spark notebook prints both the user name and the
ticket that the user gets after a successful login... so the interpreter
knows who the user is. Can that info be used to run a paragraph?


--
  Luis Angel Vicente Sanchez
  zeppelin-us...@bigcente.ch
On Mon, 11 Sep 2017, at 12:16, Luis Angel Vicente Sanchez wrote:
> And we are running the notebook using spark local, and using a whirl
> JdbcRealm to authenticate users is there anything we can do to make the
> spark interpreter impersonate the front-end user?
>
> --
>   Luis Angel Vicente Sanchez
>   zeppelin-us...@bigcente.ch
>
> On Mon, 11 Sep 2017, at 11:14, Luis Angel Vicente Sanchez wrote:
> > We are using Zeppelin 0.7.1/
> >
> >
> > --
> >   Luis Angel Vicente Sanchez
> >   zeppelin-us...@bigcente.ch
> >
> > On Mon, 11 Sep 2017, at 11:12, Luis Angel Vicente Sanchez wrote:
> > > Hi,
> > >
> > > We have enabled notebook permissions in our Zeppelin installation and
> > > now we are facing the problem that if we want to run a paragraph from
> > > another paragraph in the same notebook (to refresh it), the user that is
> > > running that paragraph is the anonymous user and not the front-end user
> > > and, therefore, we get a "ForbiddenException" because of that.
> > >
> > > Is there a way to run a paragraph as the front-end user?
> > >
> > >
> > > Kind regards,
> > >
> > > Luis Angel Vicente Sanchez
> > > zeppelin-us...@bigcente.ch




Re: Zeppelin Interpreter Page Not Showing

2017-09-27 Thread Jianfeng (Jeff) Zhang

Which version of Zeppelin do you use? And could you check the Zeppelin server
log?


Best Regard,
Jeff Zhang


From: "Tan, Jialiang" >
Reply-To: "users@zeppelin.apache.org" 
>
Date: Thursday, September 28, 2017 at 2:24 AM
To: "users@zeppelin.apache.org" 
>
Subject: Zeppelin Interpreter Page Not Showing

There are cases where, after using Zeppelin for a while, its interpreter page
does not show. I was also not able to run any paragraphs anymore (nothing
happens when clicking the run button, not even a message). It seems that
something is broken, and I have to restart my Zeppelin instance when this
happens. After checking the network calls from the interpreter page, it is
/api/interpreter/setting that is not responding. What may cause the issue? Is
there anything I can do to prevent this from happening?
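
When this happens again, it can help to probe the hanging endpoint directly
and capture a thread dump of the server before restarting, so the blocked
thread shows up in the report. A minimal sketch (the host/port and the
ZeppelinServer main class are the defaults; adjust to your install):

# Probe the endpoint the UI is waiting on
curl -v http://localhost:8080/api/interpreter/setting

# If it hangs, dump the Zeppelin server's threads to see what is blocked
jstack $(pgrep -f ZeppelinServer) > zeppelin-threads.txt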



Re: How to execute spark-submit on Note

2017-10-03 Thread Jianfeng (Jeff) Zhang

I am surprised that you would use %spark-submit; there is no documentation
about %spark-submit. If you want to use spark-submit in Zeppelin, you can use
%sh.


Best Regard,
Jeff Zhang


From: 小野圭二
Reply-To: "users@zeppelin.apache.org"
Date: Tuesday, October 3, 2017 at 12:49 PM
To: "users@zeppelin.apache.org"
Subject: How to execute spark-submit on Note

Hi all,

I searched this topic in the archive of the ml, but still could not find a
clear solution, so I am trying to post it again (maybe).

I am using ver 0.8.0 and have installed Spark 2.2 at another path, just for
checking my test program.
Then I wrote a quite simple sample Python program to check how to do this.

1. The code works fine in a note in Zeppelin.
2. The same code, with initialization code for the SparkContext added, works
fine on Spark using 'spark-submit'.
3. I tried to execute "2" from a note in Zeppelin with the following script
(yes, the "spark" interpreter is bound to the note). Then, in the note:
%spark-submit 
  -> interpreter not found error
4. I have set 'SPARK_SUBMIT_OPTIONS' in zeppelin-env.sh as in the doc, e.g.
export SPARK_SUBMIT_OPTIONS='--packages com.databricks:spark-csv_2.10:1.2.0'
5. Then, running
%spark-submit 
  -> interpreter not found error (same as "3")

How can I use spark-submit from a note?
Thanks for any advice.

-Keiji
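
Following Jeff's suggestion, the submission can be done from a %sh paragraph;
there is no %spark-submit interpreter. A minimal sketch, with placeholder
paths for the external Spark 2.2 install and the test script (both are
assumptions to be replaced):

%sh
# Use the external Spark installation, not Zeppelin's embedded one
export SPARK_HOME=/path/to/spark-2.2
$SPARK_HOME/bin/spark-submit --master local[2] /path/to/my_test.py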


Re: Error in combining data from Tajo and MariaDB with Spark and Zeppelin

2017-08-31 Thread Jianfeng (Jeff) Zhang

Have you tried that in spark-shell?

Best Regard,
Jeff Zhang


From: Cinyoung Hur
Reply-To: "users@zeppelin.apache.org"
Date: Friday, September 1, 2017 at 10:43 AM
To: "users@zeppelin.apache.org"
Subject: Error in combining data from Tajo and MariaDB with Spark and Zeppelin

Hi,

I tried to combine two tables, one from Tajo and the other from MariaDB.
My Spark interpreter has a dependency on "org.apache.tajo:tajo-jdbc:0.11.0".

But the Tajo table doesn't show anything.
The following are the Spark code and the result.

val componentDF = sqlc.load("jdbc", Map(
  "url" -> "jdbc:tajo://tajo-master-ip:26002/analysis",
  "driver" -> "org.apache.tajo.jdbc.TajoDriver",
  "dbtable" -> "component_usage_2015"
))
componentDF.registerTempTable("components")
val allComponents = sqlContext.sql("select * from components")
allComponents.show(5)

warning: there was one deprecation warning; re-run with -deprecation for details
componentDF: org.apache.spark.sql.DataFrame = 
[analysis.component_usage_2015.gnl_nm_cd: string, 
analysis.component_usage_2015.qty: double ... 1 more field]
warning: there was one deprecation warning; re-run with -deprecation for details
allComponents: org.apache.spark.sql.DataFrame = 
[analysis.component_usage_2015.gnl_nm_cd: string, 
analysis.component_usage_2015.qty: double ... 1 more field]
+---------------------------------------+---------------------------------+---------------------------------+
|analysis.component_usage_2015.gnl_nm_cd|analysis.component_usage_2015.qty|analysis.component_usage_2015.amt|
+---------------------------------------+---------------------------------+---------------------------------+
+---------------------------------------+---------------------------------+---------------------------------+





Re: Trying to 0.7.3 running with Spark

2017-10-07 Thread Jianfeng (Jeff) Zhang

Could you check the log again? There should be another exception above the
exception you pasted. Most likely the SparkContext failed to be created.



Best Regard,
Jeff Zhang


From: Terry Healy
Reply-To: "users@zeppelin.apache.org"
Date: Friday, October 6, 2017 at 10:35 PM
To: "users@zeppelin.apache.org"
Subject: Trying to 0.7.3 running with Spark

Using Zeppelin 0.7.3, Spark 2.1.0-mapr-1703 / Scala 2.11.8

I had previously run the demo and successfully set up the MongoDB and JDBC
interpreters for Impala under v0.7.2. Since I upgraded to 0.7.3, everything
broke. I am down to a complete re-install (several, in fact) and get a
response like the one below for most everything I try. (Focusing just on
%spark for now.) I apparently have something very basic wrong, but I'll be
damned if I can find it. The same example works fine in spark-shell.

Any suggestions for a new guy are very much appreciated.

I found [ZEPPELIN-2475] and [ZEPPELIN-1560], which seem to be the same or
similar, but I did not understand what to change where.

This is from "Zeppelin Tutorial/Basic Features (Spark)".

java.lang.NullPointerException
at org.apache.zeppelin.spark.Utils.invokeMethod(Utils.java:38)
at org.apache.zeppelin.spark.Utils.invokeMethod(Utils.java:33)
at 
org.apache.zeppelin.spark.SparkInterpreter.createSparkContext_2(SparkInterpreter.java:398)
at 
org.apache.zeppelin.spark.SparkInterpreter.createSparkContext(SparkInterpreter.java:387)
at 
org.apache.zeppelin.spark.SparkInterpreter.getSparkContext(SparkInterpreter.java:146)
at org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:843)
at 
org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:70)
at 
org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:491)
at org.apache.zeppelin.scheduler.Job.run(Job.java:175)
at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:139)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
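
As Jeff says, the NullPointerException in Utils.invokeMethod is usually only a
symptom; the root cause (often a failed SparkContext creation) is logged just
above it in the Spark interpreter log. A quick sketch for finding it (the log
directory and file name follow the default $ZEPPELIN_HOME/logs layout, which
is an assumption about this install):

# Show the lines leading up to the first NPE in the Spark interpreter log
grep -B 30 "NullPointerException" $ZEPPELIN_HOME/logs/zeppelin-interpreter-spark-*.log | head -40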


Re: Sequential processing disabled?

2017-11-30 Thread Jianfeng (Jeff) Zhang
It is per interpreter. You can create an impala interpreter based on the jdbc
template and configure its zeppelin.jdbc.concurrent.use to false.


Best Regard,
Jeff Zhang


From: "Geiss, Chris" <chris.ge...@citi.com<mailto:chris.ge...@citi.com>>
Reply-To: "users@zeppelin.apache.org<mailto:users@zeppelin.apache.org>" 
<users@zeppelin.apache.org<mailto:users@zeppelin.apache.org>>
Date: Thursday, November 30, 2017 at 10:10 PM
To: "users@zeppelin.apache.org<mailto:users@zeppelin.apache.org>" 
<users@zeppelin.apache.org<mailto:users@zeppelin.apache.org>>
Subject: RE: Sequential processing disabled?

Jeff,

OK makes sense. Is this a global setting? Can it be set per user, or per 
Notebook?

Chris


From: Jianfeng (Jeff) Zhang [mailto:jzh...@hortonworks.com]
Sent: Thursday, November 30, 2017 9:08 AM
To: users@zeppelin.apache.org
Subject: Re: Sequential processing disabled?


Right, when zeppelin.jdbc.concurrent.use is false, paragraphs will run as FIFO.


Best Regard,
Jeff Zhang


From: "Geiss, Chris" <chris.ge...@citi.com<mailto:chris.ge...@citi.com>>
Reply-To: "users@zeppelin.apache.org<mailto:users@zeppelin.apache.org>" 
<users@zeppelin.apache.org<mailto:users@zeppelin.apache.org>>
Date: Thursday, November 30, 2017 at 9:58 PM
To: "users@zeppelin.apache.org<mailto:users@zeppelin.apache.org>" 
<users@zeppelin.apache.org<mailto:users@zeppelin.apache.org>>
Subject: RE: Sequential processing disabled?

Hi Jeff,

Thank you for the reply. So that will result in Impala paragraphs running 
sequentially? Is that because Impala goes through JDBC?

Chris


From: Jianfeng (Jeff) Zhang [mailto:jzh...@hortonworks.com]
Sent: Wednesday, November 29, 2017 7:03 PM
To: users@zeppelin.apache.org
Subject: Re: Sequential processing disabled?


You can configure zeppelin.jdbc.concurrent.use as false to make the jdbc
interpreter run sequentially.

Best Regard,
Jeff Zhang


From: "Geiss, Chris" <chris.ge...@citi.com<mailto:chris.ge...@citi.com>>
Reply-To: "users@zeppelin.apache.org<mailto:users@zeppelin.apache.org>" 
<users@zeppelin.apache.org<mailto:users@zeppelin.apache.org>>
Date: Thursday, November 30, 2017 at 6:22 AM
To: "'users@zeppelin.apache.org<mailto:'users@zeppelin.apache.org>'" 
<users@zeppelin.apache.org<mailto:users@zeppelin.apache.org>>
Subject: Sequential processing disabled?

I am using Zeppelin 0.8.0-SNAPSHOT with Impala. If I understand correctly, 
paragraphs that use the same interpreter should process sequentially. But some 
of the paragraphs seem to process in parallel. Am I wrong that they should be 
sequentially executed, or is there something different about Impala? Is there 
some configuration option I can check?

Chris
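
For scripted setups, the same interpreter can also be created through
Zeppelin's REST API instead of the UI. A hedged sketch against POST
/api/interpreter/setting (the payload shape changed between Zeppelin versions,
and the Impala driver class and URL below are placeholders, so check the REST
API docs for your version before using this):

curl -X POST http://localhost:8080/api/interpreter/setting \
  -H 'Content-Type: application/json' \
  -d '{
        "name": "impala",
        "group": "jdbc",
        "properties": {
          "default.driver": "com.cloudera.impala.jdbc41.Driver",
          "default.url": "jdbc:impala://impala-host:21050",
          "zeppelin.jdbc.concurrent.use": "false"
        }
      }'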



Re: Sequential processing disabled?

2017-11-29 Thread Jianfeng (Jeff) Zhang

You can configure zeppelin.jdbc.concurrent.use as false to make the jdbc
interpreter run sequentially.

Best Regard,
Jeff Zhang


From: "Geiss, Chris" >
Reply-To: "users@zeppelin.apache.org" 
>
Date: Thursday, November 30, 2017 at 6:22 AM
To: "'users@zeppelin.apache.org'" 
>
Subject: Sequential processing disabled?

I am using Zeppelin 0.8.0-SNAPSHOT with Impala. If I understand correctly, 
paragraphs that use the same interpreter should process sequentially. But some 
of the paragraphs seem to process in parallel. Am I wrong that they should be 
sequentially executed, or is there something different about Impala? Is there 
some configuration option I can check?

Chris



Re: Sequential processing disabled?

2017-11-30 Thread Jianfeng (Jeff) Zhang

Right, when zeppelin.jdbc.concurrent.use is false, paragraphs will run as FIFO


Best Regard,
Jeff Zhang


From: "Geiss, Chris" <chris.ge...@citi.com<mailto:chris.ge...@citi.com>>
Reply-To: "users@zeppelin.apache.org<mailto:users@zeppelin.apache.org>" 
<users@zeppelin.apache.org<mailto:users@zeppelin.apache.org>>
Date: Thursday, November 30, 2017 at 9:58 PM
To: "users@zeppelin.apache.org<mailto:users@zeppelin.apache.org>" 
<users@zeppelin.apache.org<mailto:users@zeppelin.apache.org>>
Subject: RE: Sequential processing disabled?

Hi Jeff,

Thank you for the reply. So that will result in Impala paragraphs running 
sequentially? Is that because Impala goes through JDBC?

Chris


From: Jianfeng (Jeff) Zhang [mailto:jzh...@hortonworks.com]
Sent: Wednesday, November 29, 2017 7:03 PM
To: users@zeppelin.apache.org
Subject: Re: Sequential processing disabled?


You can configure zeppelin.jdbc.concurrent.use as false to make jdbc 
interpreter run sequentially

Best Regard,
Jeff Zhang


From: "Geiss, Chris" <chris.ge...@citi.com<mailto:chris.ge...@citi.com>>
Reply-To: "users@zeppelin.apache.org<mailto:users@zeppelin.apache.org>" 
<users@zeppelin.apache.org<mailto:users@zeppelin.apache.org>>
Date: Thursday, November 30, 2017 at 6:22 AM
To: "'users@zeppelin.apache.org<mailto:'users@zeppelin.apache.org>'" 
<users@zeppelin.apache.org<mailto:users@zeppelin.apache.org>>
Subject: Sequential processing disabled?

I am using Zeppelin 0.8.0-SNAPSHOT with Impala. If I understand correctly, 
paragraphs that use the same interpreter should process sequentially. But some 
of the paragraphs seem to process in parallel. Am I wrong that they should be 
sequentially executed, or is there something different about Impala? Is there 
some configuration option I can check?

Chris



Re: Livy Manager - Web UI for Managing Apache Livy Sessions

2017-12-07 Thread Jianfeng (Jeff) Zhang

Great work, @Keiji. Livy also provides a web UI for managing Livy sessions,
and you are welcome to contribute your work to Livy.

http://livy.incubator.apache.org/
https://github.com/apache/incubator-livy


Best Regard,
Jeff Zhang


From: Keiji Yoshida
Reply-To: "users@zeppelin.apache.org"
Date: Friday, December 8, 2017 at 1:08 AM
To: "users@zeppelin.apache.org"
Subject: Livy Manager - Web UI for Managing Apache Livy Sessions

Hi,

I just released a Web UI for managing Apache Livy sessions for non-developer
users:

https://github.com/kjmrknsn/livy-manager

This Web UI enables non-developer Livy users to monitor and kill their Livy
sessions.

I have been managing Apache Zeppelin with the Livy interpreter, and I have
found it difficult for non-developer users to monitor and kill their Livy
sessions / Spark applications; as a result, a heavy, long-running Spark
application can remain for a long time when a non-developer user submits such
an application accidentally. That is what led me to create this application.

Additionally, this application supports LDAP authentication and authorization,
which works well with Apache Zeppelin + LDAP authentication + the Livy
interpreter.

I would be glad if you would try this application.

Regards,

Keiji Yoshida
https://github.com/kjmrknsn


Re: Sequential processing disabled?

2017-12-01 Thread Jianfeng (Jeff) Zhang

Right, actually you can create as many impala interpreters as you want, and
each one can have different settings.


Best Regard,
Jeff Zhang


From: "Geiss, Chris" <chris.ge...@citi.com<mailto:chris.ge...@citi.com>>
Reply-To: "users@zeppelin.apache.org<mailto:users@zeppelin.apache.org>" 
<users@zeppelin.apache.org<mailto:users@zeppelin.apache.org>>
Date: Friday, December 1, 2017 at 9:33 PM
To: "users@zeppelin.apache.org<mailto:users@zeppelin.apache.org>" 
<users@zeppelin.apache.org<mailto:users@zeppelin.apache.org>>
Subject: RE: Sequential processing disabled?

Hi Jeff,

Does that mean we can have two Impala interpreters configured? One with 
sequential processing and one with parallel processing?

Chris


From: Jianfeng (Jeff) Zhang [mailto:jzh...@hortonworks.com]
Sent: Thursday, November 30, 2017 7:08 PM
To: users@zeppelin.apache.org
Subject: Re: Sequential processing disabled?

It is per interpreter. You can create an impala interpreter based on the jdbc
template and configure its zeppelin.jdbc.concurrent.use to false.




Best Regard,
Jeff Zhang


From: "Geiss, Chris" <chris.ge...@citi.com<mailto:chris.ge...@citi.com>>
Reply-To: "users@zeppelin.apache.org<mailto:users@zeppelin.apache.org>" 
<users@zeppelin.apache.org<mailto:users@zeppelin.apache.org>>
Date: Thursday, November 30, 2017 at 10:10 PM
To: "users@zeppelin.apache.org<mailto:users@zeppelin.apache.org>" 
<users@zeppelin.apache.org<mailto:users@zeppelin.apache.org>>
Subject: RE: Sequential processing disabled?

Jeff,

OK makes sense. Is this a global setting? Can it be set per user, or per 
Notebook?

Chris


From: Jianfeng (Jeff) Zhang [mailto:jzh...@hortonworks.com]
Sent: Thursday, November 30, 2017 9:08 AM
To: users@zeppelin.apache.org
Subject: Re: Sequential processing disabled?


Right, when zeppelin.jdbc.concurrent.use is false, paragraphs will run as FIFO.


Best Regard,
Jeff Zhang


From: "Geiss, Chris" <chris.ge...@citi.com<mailto:chris.ge...@citi.com>>
Reply-To: "users@zeppelin.apache.org<mailto:users@zeppelin.apache.org>" 
<users@zeppelin.apache.org<mailto:users@zeppelin.apache.org>>
Date: Thursday, November 30, 2017 at 9:58 PM
To: "users@zeppelin.apache.org<mailto:users@zeppelin.apache.org>" 
<users@zeppelin.apache.org<mailto:users@zeppelin.apache.org>>
Subject: RE: Sequential processing disabled?

Hi Jeff,

Thank you for the reply. So that will result in Impala paragraphs running 
sequentially? Is that because Impala goes through JDBC?

Chris


From: Jianfeng (Jeff) Zhang [mailto:jzh...@hortonworks.com]
Sent: Wednesday, November 29, 2017 7:03 PM
To: users@zeppelin.apache.org
Subject: Re: Sequential processing disabled?


You can configure zeppelin.jdbc.concurrent.use as false to make the jdbc
interpreter run sequentially.

Best Regard,
Jeff Zhang


From: "Geiss, Chris" <chris.ge...@citi.com<mailto:chris.ge...@citi.com>>
Reply-To: "users@zeppelin.apache.org<mailto:users@zeppelin.apache.org>" 
<users@zeppelin.apache.org<mailto:users@zeppelin.apache.org>>
Date: Thursday, November 30, 2017 at 6:22 AM
To: "'users@zeppelin.apache.org<mailto:'users@zeppelin.apache.org>'" 
<users@zeppelin.apache.org<mailto:users@zeppelin.apache.org>>
Subject: Sequential processing disabled?

I am using Zeppelin 0.8.0-SNAPSHOT with Impala. If I understand correctly, 
paragraphs that use the same interpreter should process sequentially. But some 
of the paragraphs seem to process in parallel. Am I wrong that they should be 
sequentially executed, or is there something different about Impala? Is there 
some configuration option I can check?

Chris



Re: All PySpark jobs are canceled when one user cancel his PySpark paragraph (job)

2018-06-12 Thread Jianfeng (Jeff) Zhang

Which version do you use?


Best Regard,
Jeff Zhang


From: Jhon Anderson Cardenas Diaz <jhonderson2...@gmail.com>
Reply-To: "users@zeppelin.apache.org" <users@zeppelin.apache.org>
Date: Friday, June 8, 2018 at 11:08 PM
To: "users@zeppelin.apache.org" <users@zeppelin.apache.org>, "d...@zeppelin.apache.org" <d...@zeppelin.apache.org>
Subject: All PySpark jobs are canceled when one user cancel his PySpark 
paragraph (job)

Dear community,

Currently we are having problems with multiple users running paragraphs
associated with PySpark jobs.

The problem is that if a user aborts/cancels his PySpark paragraph (job), the
active PySpark jobs of the other users are canceled too.

Going into detail, I've seen that when you cancel a user's job, this method is
invoked (which is fine):

sc.cancelJobGroup("zeppelin-[notebook-id]-[paragraph-id]")

But somehow, unknown to me, this method is also invoked:

sc.cancelAllJobs()

I can tell because of the stack trace that appears in the other users' jobs:

Py4JJavaError: An error occurred while calling o885.count.
: org.apache.spark.SparkException: Job 461 cancelled as part of cancellation of 
all jobs
at 
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1435)
at 
org.apache.spark.scheduler.DAGScheduler.handleJobCancellation(DAGScheduler.scala:1375)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$doCancelAllJobs$1.apply$mcVI$sp(DAGScheduler.scala:721)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$doCancelAllJobs$1.apply(DAGScheduler.scala:721)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$doCancelAllJobs$1.apply(DAGScheduler.scala:721)
at scala.collection.mutable.HashSet.foreach(HashSet.scala:78)
at 
org.apache.spark.scheduler.DAGScheduler.doCancelAllJobs(DAGScheduler.scala:721)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1628)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1605)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1594)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:628)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1925)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1938)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1951)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1965)
at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:936)
at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
at org.apache.spark.rdd.RDD.collect(RDD.scala:935)
at org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:275)
at 
org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$execute$1$1.apply(Dataset.scala:2386)
at 
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
at org.apache.spark.sql.Dataset.withNewExecutionId(Dataset.scala:2788)
at 
org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$execute$1(Dataset.scala:2385)
at 
org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collect(Dataset.scala:2392)
at org.apache.spark.sql.Dataset$$anonfun$count$1.apply(Dataset.scala:2420)
at org.apache.spark.sql.Dataset$$anonfun$count$1.apply(Dataset.scala:2419)
at org.apache.spark.sql.Dataset.withCallback(Dataset.scala:2801)
at org.apache.spark.sql.Dataset.count(Dataset.scala:2419)
at sun.reflect.GeneratedMethodAccessor120.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:280)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:214)
at java.lang.Thread.run(Thread.java:748)

(<class 'py4j.protocol.Py4JJavaError'>, Py4JJavaError('An error occurred while
calling o885.count.\n', JavaObject id=o886), <traceback object>)

Any idea of why this could be happening?

(I have 0.8.0 version from September 2017)

Thank you!


Re: Zeppelin 0.8

2018-06-07 Thread Jianfeng (Jeff) Zhang

I am doing the release. The latest RC4 was canceled; I will start RC5 in the
next few days.


Best Regard,
Jeff Zhang


From: Benjamin Kim <bbuil...@gmail.com>
Reply-To: "users@zeppelin.apache.org" <users@zeppelin.apache.org>
Date: Thursday, June 7, 2018 at 10:52 PM
To: "users@zeppelin.apache.org" <users@zeppelin.apache.org>
Subject: Re: Zeppelin 0.8

Can anyone tell me what the status is for 0.8 release?

On May 2, 2018, at 4:43 PM, Jeff Zhang <zjf...@gmail.com> wrote:


Yes, 0.8 will support spark 2.3

On Thursday, May 3, 2018 at 1:59 AM, Benjamin Kim <bbuil...@gmail.com> wrote:
Will Zeppelin 0.8 have Spark 2.3 support?

On Apr 30, 2018, at 1:27 AM, Rotem Herzberg <rotem.herzb...@gigaspaces.com> wrote:

Thanks

On Mon, Apr 30, 2018 at 11:16 AM, Jeff Zhang <zjf...@gmail.com> wrote:

I am preparing the RC for 0.8


On Monday, April 30, 2018 at 3:57 PM, Rotem Herzberg <rotem.herzb...@gigaspaces.com> wrote:
Hi,

What is the release date for Zeppelin 0.8? (support for spark 2.3)

Thanks,

--
Rotem Herzberg
SW Engineer | GigaSpaces Technologies
rotem.herzb...@gigaspaces.com | M +972547718880




--
Rotem Herzberg
SW Engineer | GigaSpaces Technologies
rotem.herzb...@gigaspaces.com | M +972547718880





Re: Spark Error replayPlot(x): could not open file

2018-04-18 Thread Jianfeng (Jeff) Zhang

Do you mind to try zeppelin 0.8 branch ? You need to build it from source




Best Regard,
Jeff Zhang


From: "Joe W. Byers" >
Reply-To: "users@zeppelin.apache.org" 
>
Date: Wednesday, April 18, 2018 at 9:49 PM
To: "users@zeppelin.apache.org" 
>
Subject: Re: Spark Error replayPlot(x): could not open file



On 2018/04/18 11:16:40, "Joe W. Byers" wrote:
> All,
>
> I am getting this error on all R plots in all the tutorial examples
> using Zeppelin 0.7.3, and in simple example %r scripts.
>
> Error in replayPlot(x): could not open file 'figure/unnamed-chunk-1-1.png'
>
> I think this is something to do with file permissions on temporary
> files/directories. There are one or two posts I found searching that are
> under RStudio that offer a system call, but it does not work with
> Zeppelin/Spark.
> Sys.umask(mode="") is placed at the beginning of the script.
>
> This is what I have tried:
> %r
> Sys.umask(mode="")
> a = rnorm(1000)
> plot(a)
>
> Thanks
> Joe
>
> --
> *Joe W. Byers*

A follow-up.

I am running Zeppelin as a service on a Fedora 25 server controlled by systemd
(systemctl ...). I created a user and group, zeppelin, with /sbin/nologin and
home /dir1/zeppelin. I set up shiro.ini for my LDAP authentication. This issue
did not occur when I was testing, starting the server with the command-line
script or daemon as the root user and logging in as anonymous on the Zeppelin
URL.

GoogleVis charts do work; only R-specific plots (plot, ggplot2) have this
issue.

Thanks
Joe
--
Joe W. Byers
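
The path in the error, 'figure/unnamed-chunk-1-1.png', is relative: knitr
tries to create a figure/ directory under the interpreter's current working
directory, and when Zeppelin runs as a systemd service under a no-login user,
that directory is often not writable. A minimal sketch for checking and fixing
this (the zeppelin user and /dir1/zeppelin home are taken from the message
above; the systemd unit details are assumptions):

# Check whether the zeppelin user can create the figure directory
sudo -u zeppelin bash -c 'cd /dir1/zeppelin && mkdir -p figure && touch figure/.probe && echo writable'

# If not, give the service a writable working directory, e.g. in the unit:
#   [Service]
#   WorkingDirectory=/dir1/zeppelin
sudo systemctl daemon-reload && sudo systemctl restart zeppelin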


Re: [ANNOUNCE] Apache Zeppelin 0.8.0 released

2018-06-28 Thread Jianfeng (Jeff) Zhang
Hi Patrick,

Which link is broken ? I can access all the links.

Best Regard,
Jeff Zhang


From: Patrick Maroney <pmaro...@wapacklabs.com>
Reply-To: <users@zeppelin.apache.org>
Date: Friday, June 29, 2018 at 4:59 AM
To: <users@zeppelin.apache.org>
Cc: dev <d...@zeppelin.apache.org>
Subject: Re: [ANNOUNCE] Apache Zeppelin 0.8.0 released

Great work Team/Community!

Links on the main download page are broken:

http://zeppelin.apache.org/download.html

...at least the ones I need ;-)

Patrick Maroney
Principal Engineer - Data Science & Analytics
Wapack Labs LLC


Public Key: http://pgp.mit.edu/pks/lookup?op=get=0x7C810C9769BD29AF

On Jun 27, 2018, at 11:21 PM, Prabhjyot Singh <prabhjyotsi...@gmail.com> wrote:

Awesome! congratulations team.



On Thu, 28 Jun 2018 at 8:39 AM, Taejun Kim <i2r@gmail.com> wrote:
Awesome! Thanks for your great work :)

On Thursday, June 28, 2018 at 12:07, Jeff Zhang <zjf...@apache.org> wrote:
The Apache Zeppelin community is pleased to announce the availability of
the 0.8.0 release.

Zeppelin is a collaborative data analytics and visualization tool for
distributed, general-purpose data processing systems such as Apache Spark,
Apache Flink, etc.

This is another major release after the last minor release, 0.7.3.
The community put significant effort into improving Apache Zeppelin since
the last release: 122 contributors fixed a total of 602 issues. Lots of
new features are introduced, such as inline configuration, the IPython
interpreter, yarn-cluster mode support, an interpreter lifecycle manager,
and more.

We encourage you to download the latest release from
http://zeppelin.apache.org/download.html

The release note is available at
http://zeppelin.apache.org/releases/zeppelin-release-0.8.0.html

We welcome your help and feedback. For more information on the project and
how to get involved, visit our website at http://zeppelin.apache.org/

Thank you all users and contributors who have helped to improve Apache
Zeppelin.

Regards,
The Apache Zeppelin community
--
Taejun Kim

Data Mining Lab.
School of Electrical and Computer Engineering
University of Seoul