Re: Equivalent of Jupyter %run

2020-01-11 Thread Dave Boyd
Jeff:
   Thanks.

I tried the following:

%ipython
%run /Dave_Folder/ElasticUtils

I get the following error:

java.util.concurrent.RejectedExecutionException: Task io.grpc.internal.SerializingExecutor@7068c569 rejected from java.util.concurrent.ThreadPoolExecutor@789e11b[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 45]
    at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2063)
    at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:830)
    at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1379)
    at io.grpc.internal.SerializingExecutor.schedule(SerializingExecutor.java:93)
    at io.grpc.internal.SerializingExecutor.execute(SerializingExecutor.java:86)
    at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.closed(ClientCallImpl.java:588)
    at io.grpc.internal.FailingClientStream.start(FailingClientStream.java:54)
    at io.grpc.internal.ClientCallImpl.start(ClientCallImpl.java:273)
    at io.grpc.internal.CensusTracingModule$TracingClientInterceptor$1.start(CensusTracingModule.java:398)
    at io.grpc.internal.CensusStatsModule$StatsClientInterceptor$1.start(CensusStatsModule.java:673)
    at io.grpc.stub.ClientCalls.startCall(ClientCalls.java:308)
    at io.grpc.stub.ClientCalls.asyncUnaryRequestCall(ClientCalls.java:280)
    at io.grpc.stub.ClientCalls.asyncUnaryRequestCall(ClientCalls.java:265)
    at io.grpc.stub.ClientCalls.asyncServerStreamingCall(ClientCalls.java:73)
    at org.apache.zeppelin.python.proto.IPythonGrpc$IPythonStub.execute(IPythonGrpc.java:240)
    at org.apache.zeppelin.python.IPythonClient.stream_execute(IPythonClient.java:89)
    at org.apache.zeppelin.python.IPythonInterpreter.interpret(IPythonInterpreter.java:350)
    at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:103)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:632)
    at org.apache.zeppelin.scheduler.Job.run(Job.java:188)
    at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:140)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

I am running 0.8.2 in a Docker container.


On 1/10/2020 6:31 PM, Jeff Zhang wrote:
You can do it via the IPython interpreter, which supports all of the Jupyter magics:

http://zeppelin.apache.org/docs/0.8.2/interpreter/python.html#ipython-support


Partridge, Lucas (GE Aviation) <lucas.partri...@ge.com> wrote on Friday, January 10, 2020 at 5:13 PM:
I've hardly used Jupyter so can't comment on an equivalent for %run.

But for Zeppelin you can put your Python files on the local file system of your
Spark driver node, or more commonly in HDFS, and then use sc.addPyFile() [1] to
make each file available in the SparkContext.  Then you can import your Python
packages as normal.  The slightly annoying thing is that if you change your
code you'll need to restart your Spark application to pick up the changes, as
there's no reliable way to reimport the updated modules in a running
application.  But you could put your importing of common files in a shared
notebook so everyone can run it easily.
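For reference, here is a minimal sketch of that flow in a %pyspark paragraph.
The HDFS path, module name, and helper function below are placeholders for
illustration, not taken from this thread:

%pyspark
# Distribute a shared module to the driver and every executor (hypothetical path).
sc.addPyFile("hdfs:///shared/my_utils.py")

# After addPyFile() the module is on the import path, so a normal import works
# in this and any later paragraph of the same Spark application.
import my_utils
print(my_utils.some_helper())   # some_helper is a made-up function name

Note that addPyFile() ships the file once per application, which is exactly the
restart limitation described above: editing my_utils.py later will not refresh
what the running interpreter has already imported.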

Once you're happy with your code and it's fairly stable then you can package it
with a setup.py and install the packages on all the nodes of your cluster like
any other Python package. Then you can skip the sc.addPyFile() step.
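As a rough illustration of that stage, a minimal setup.py could look like the
following; the distribution name and version are invented for the example:

# setup.py -- hypothetical packaging of the shared helper modules
from setuptools import setup, find_packages

setup(
    name="team-shared-utils",      # placeholder distribution name
    version="0.1.0",
    packages=find_packages(),      # picks up e.g. team_shared_utils/*.py
    install_requires=[],           # list real dependencies here as needed
)

Installing the resulting package on every node (for example with pip) puts it on
the interpreter's normal import path, which is what lets you drop the
sc.addPyFile() step.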

Databricks has a great facility for allowing users to upload their own Python
packages/libraries. It would be great if Zeppelin provided this feature as well
(although maybe it does now, as I'm on an older version...).

Lucas.

[1] 
https://spark.apache.org/docs/latest/api/python/pyspark.html?highlight=addpyfile#pyspark.SparkContext.addPyFile

-----Original Message-----
From: Dave Boyd <db...@incadencecorp.com>
Sent: 09 January 2020 17:44
To: users@zeppelin.apache.org
Subject: EXT: Equivalent of Jupyter %run

I have googled this but don't see a solution.

We are working on a project where we want to have some common Python functions
shared between notes.

In Jupyter we would just do a %run.  Is there an equivalent in Zeppelin?
Is there a way to store files as .py files that Zeppelin can find so imports
work?

Looking to see how folks may have solved this need.

--
= <db...@incadencecorp.com>
David W. Boyd, VP, Data So

Equivalent of Jupyter %run

2020-01-09 Thread Dave Boyd
I have googled this but don't see a solution.

We are working on a project where we want to have some common Python functions
shared between notes.

In Jupyter we would just do a %run.  Is there an equivalent in Zeppelin?
Is there a way to store files as .py files that Zeppelin can find so imports
work?

Looking to see how folks may have solved this need.

-- 
= <db...@incadencecorp.com>
David W. Boyd
VP,  Data Solutions
10432 Balls Ford, Suite 240
Manassas, VA 20109
office:   +1-703-552-2862
cell: +1-703-402-7908
== http://www.incadencecorp.com/ 
ISO/IEC JTC1 SC42/WG2, editor ISO/IEC 20546, ISO/IEC 20547-1
Chair ANSI/INCITS TG Big Data
Co-chair NIST Big Data Public Working Group Reference Architecture
First Robotic Mentor - FRC, FTC - www.iliterobotics.org
Board Member- USSTEM Foundation - www.usstem.org

The information contained in this message may be privileged
and/or confidential and protected from disclosure.
If the reader of this message is not the intended recipient
or an employee or agent responsible for delivering this message
to the intended recipient, you are hereby notified that any
dissemination, distribution or copying of this communication
is strictly prohibited.  If you have received this communication
in error, please notify the sender immediately by replying to
this message and deleting the material from any computer.



Re: Adding jars to Spark 2.4.0 under yarn cluster mode in Zeppelin 0.8.1

2019-04-07 Thread Dave Boyd
From the connection refused message I wonder if it is an SSL error.  I note
that none of the SSL information (truststore, keystore, etc.) is set.
I would think the YARN cluster requires some form of authentication.

On 4/7/19 9:27 AM, Jeff Zhang wrote:
It looks like the interpreter process cannot connect to the Zeppelin server
process. I guess it is due to some network issue; can you check whether the
node in the YARN cluster can connect to the Zeppelin server host?
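(As a quick sanity check of that, assuming Python is available on the YARN node,
something like the sketch below could be run there; the host and port are the
intpEventServerAddress values from the log further down, 172.17.0.1:45128.)

import socket

# Try to reach the Zeppelin server's interpreter event port from this node.
host, port = "172.17.0.1", 45128
s = socket.create_connection((host, port), timeout=5)
print("connected to", s.getpeername())
s.close()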

Y. Ethan Guo <guoyi...@uber.com> wrote on Sunday, April 7, 2019 at 3:31 PM:
Hi Jeff,

Given this PR is merged, I'm trying to see if I can run yarn cluster mode from a
master build.  I built Zeppelin master from this commit:

commit 3655c12b875884410224eca5d6155287d51916ac
Author: Jongyoul Lee <jongy...@gmail.com>
Date:   Mon Apr 1 15:37:57 2019 +0900
[MINOR] Refactor CronJob class (#3335)

While I can successfully run the Spark interpreter in yarn client mode, I'm
having trouble making the yarn cluster mode work.  Specifically, while the
interpreter job was accepted in yarn, the job failed after 1-2 minutes because
of this exception (see below).  Do you have any idea why this is happening?

DEBUG [2019-04-07 06:57:00,314] ({main} Logging.scala[logDebug]:58) - Created SSL options for fs: SSLOptions{enabled=false, keyStore=None, keyStorePassword=None, trustStore=None, trustStorePassword=None, protocol=None, enabledAlgorithms=Set()}
 INFO [2019-04-07 06:57:00,323] ({main} Logging.scala[logInfo]:54) - Starting the user application in a separate Thread
 INFO [2019-04-07 06:57:00,350] ({main} Logging.scala[logInfo]:54) - Waiting for spark context initialization...
 INFO [2019-04-07 06:57:00,403] ({Driver} RemoteInterpreterServer.java[<init>]:148) - Starting remote interpreter server on port 0, intpEventServerAddress: 172.17.0.1:45128
ERROR [2019-04-07 06:57:00,408] ({Driver} Logging.scala[logError]:91) - User class threw exception: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused (Connection refused)
org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused (Connection refused)
    at org.apache.thrift.transport.TSocket.open(TSocket.java:226)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer.<init>(RemoteInterpreterServer.java:154)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer.<init>(RemoteInterpreterServer.java:139)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer.main(RemoteInterpreterServer.java:285)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:635)
Caused by: java.net.ConnectException: Connection refused (Connection refused)
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:589)
    at org.apache.thrift.transport.TSocket.open(TSocket.java:221)
    ... 8 more

Thanks,
- Ethan

On Wed, Feb 27, 2019 at 4:24 PM Jeff Zhang <zjf...@gmail.com> wrote:
Here's the PR
https://github.com/apache/zeppelin/pull/3308

Y. Ethan Guo <guoyi...@uber.com> wrote on Thursday, February 28, 2019 at 2:50 AM:
Hi All,

I'm trying to use the new feature of yarn cluster mode to run Spark 2.4.0 jobs
on Zeppelin 0.8.1. I've set the SPARK_HOME, SPARK_SUBMIT_OPTIONS, and
HADOOP_CONF_DIR env variables in zeppelin-env.sh so that the Spark interpreter
can be started in the cluster. I used `--jars` in SPARK_SUBMIT_OPTIONS to add
local jars. However, when I tried to import a class from the jars in a Spark
paragraph, the interpreter complained that it cannot find the package and class
("<console>:23: error: object ... is not a member of package ..."). Looks like
the jars are not properly imported.

I followed the instructions here to add the jars, but it seems that it's not
working in cluster mode.  And this issue seems to be related to this bug:
https://jira.apache.org/jira/browse/ZEPPELIN-3986.  Is there any update on
fixing it? What is the right way to add local jars in yarn cluster mode? Any
help and updates are much appreciated.


Here's the SPARK_SUBMIT_OPTIONS I used (packages and jars paths omitted):

export SPARK_SUBMIT_OPTIONS="--driver-memory 12G --packages ... --jars ... 
--repositories 

Re: Spark issue moving from local to yarn-client

2019-03-14 Thread Dave Boyd
Ok, this had more information:

> INFO [2019-03-15 02:00:46,364] ({pool-2-thread-3} Logging.scala[logInfo]:54) - Logging events to hdfs:///var/log/spark/applicationHistory/application_1551287663522_0145
> ERROR [2019-03-15 02:00:46,366] ({SparkListenerBus} Logging.scala[logError]:91) - uncaught error in thread SparkListenerBus, stopping SparkContext
> java.lang.NoSuchMethodError: org.json4s.Formats.emptyValueStrategy()Lorg/json4s/prefs/EmptyValueStrategy;
>     at org.json4s.jackson.JsonMethods$class.render(JsonMethods.scala:32)
>     at org.json4s.jackson.JsonMethods$.render(JsonMethods.scala:50)
>     at org.apache.spark.scheduler.EventLoggingListener$$anonfun$logEvent$1.apply(EventLoggingListener.scala:136)
>     at org.apache.spark.scheduler.EventLoggingListener$$anonfun$logEvent$1.apply(EventLoggingListener.scala:136)
>     at scala.Option.foreach(Option.scala:257)
>     at org.apache.spark.scheduler.EventLoggingListener.logEvent(EventLoggingListener.scala:136)
>     at org.apache.spark.scheduler.EventLoggingListener.onBlockManagerAdded(EventLoggingListener.scala:168)
>     at org.apache.spark.scheduler.SparkListenerBus$class.doPostEvent(SparkListenerBus.scala:49)
>     at org.apache.spark.scheduler.LiveListenerBus.doPostEvent(LiveListenerBus.scala:36)
>     at org.apache.spark.scheduler.LiveListenerBus.doPostEvent(LiveListenerBus.scala:36)
>     at org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:63)
>     at org.apache.spark.scheduler.LiveListenerBus.postToAll(LiveListenerBus.scala:36)
>     at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(LiveListenerBus.scala:94)
>     at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(LiveListenerBus.scala:79)
>     at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(LiveListenerBus.scala:79)
>     at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
>     at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(LiveListenerBus.scala:78)
>     at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1245)
>     at org.apache.spark.scheduler.LiveListenerBus$$anon$1.run(LiveListenerBus.scala:77)
> ERROR [2019-03-15 02:00:46,367] ({SparkListenerBus} Logging.scala[logError]:91) - throw uncaught fatal error in thread SparkListenerBus
> java.lang.NoSuchMethodError: org.json4s.Formats.emptyValueStrategy()Lorg/json4s/prefs/EmptyValueStrategy;
>     at org.json4s.jackson.JsonMethods$class.render(JsonMethods.scala:32)
>     at org.json4s.jackson.JsonMethods$.render(JsonMethods.scala:50)
>     at org.apache.spark.scheduler.EventLoggingListener$$anonfun$logEvent$1.apply(EventLoggingListener.scala:136)
>     at org.apache.spark.scheduler.EventLoggingListener$$anonfun$logEvent$1.apply(EventLoggingListener.scala:136)
>     at scala.Option.foreach(Option.scala:257)
>     at org.apache.spark.scheduler.EventLoggingListener.logEvent(EventLoggingListener.scala:136)
>     at org.apache.spark.scheduler.EventLoggingListener.onBlockManagerAdded(EventLoggingListener.scala:168)
>     at org.apache.spark.scheduler.SparkListenerBus$class.doPostEvent(SparkListenerBus.scala:49)
>     at org.apache.spark.scheduler.LiveListenerBus.doPostEvent(LiveListenerBus.scala:36)
>     at org.apache.spark.scheduler.LiveListenerBus.doPostEvent(LiveListenerBus.scala:36)
>     at org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:63)
>     at org.apache.spark.scheduler.LiveListenerBus.postToAll(LiveListenerBus.scala:36)
>     at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(LiveListenerBus.scala:94)
>     at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(LiveListenerBus.scala:79)
>     at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(LiveListenerBus.scala:79)
>     at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
>     at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(LiveListenerBus.scala:78)
>     at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1245)
>     at org.apache.spark.scheduler.LiveListenerBus$$anon$1.run(LiveListenerBus.scala:77)
>  INFO [2019-03-15 02:00:46,368] ({pool-2-thread-3} Logging.scala[logInfo]:54) - SchedulerBackend is ready for scheduling beginning after waiting maxRegisteredResourcesWaitingTime: 3(ms)
>  INFO [2019-03-15 02:00:46,375] ({stop-spark-context} AbstractConnector.java[doStop]:306) - Stopped ServerConnector@718326a2{HTTP/1.1}{0.0.0.0:55600}
>  INFO [2019-03-15 02:00:46,376] ({stop-spark-context} ContextHandler.java[doStop]:865) - Stopped

Re: Spark issue moving from local to yarn-client

2019-03-14 Thread Dave Boyd
org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:616)
org.apache.zeppelin.scheduler.Job.run(Job.java:188)
org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:140)
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
java.util.concurrent.FutureTask.run(FutureTask.java:266)
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

The currently active SparkContext was created at:

(No active SparkContext.)

  at org.apache.spark.SparkContext.assertNotStopped(SparkContext.scala:100)
  at org.apache.spark.SparkContext$$anonfun$parallelize$1.apply(SparkContext.scala:716)
  at org.apache.spark.SparkContext$$anonfun$parallelize$1.apply(SparkContext.scala:715)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
  at org.apache.spark.SparkContext.withScope(SparkContext.scala:701)
  at org.apache.spark.SparkContext.parallelize(SparkContext.scala:715)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132)
  at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113)
  at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:87)
  at org.apache.spark.sql.Dataset.<init>(Dataset.scala:185)
  at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:64)
  at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$withPlan(Dataset.scala:2822)
  at org.apache.spark.sql.Dataset.createOrReplaceTempView(Dataset.scala:2605)
  ... 47 elided

 INFO [2019-03-15 01:07:27,118] ({pool-2-thread-43} VFSNotebookRepo.java[save]:196) - Saving note:2E6X2CDWW
 INFO [2019-03-15 01:07:27,124] ({pool-2-thread-43} SchedulerFactory.java[jobFinished]:120) - Job 20190222-204451_856915056 finished by scheduler org.apache.zeppelin.interpreter.remote.RemoteInterpreter-spark:shared_process-shared_session

On 3/14/19 9:02 PM, Jeff Zhang wrote:
Hi Dave,

Could you paste the full stack trace? You can find it in the Spark interpreter
log file, which is located in ZEPPELIN_HOME/logs.

Xun Liu <neliu...@163.com> wrote on Friday, March 15, 2019 at 8:21 AM:
Hi

You can first execute a simple statement in Spark, through Spark SQL, to see if
it runs normally on YARN.
If Spark SQL runs without problems, then check for Zeppelin and Spark-on-YARN
issues.
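For example, a throwaway paragraph along these lines (the data and view name are
just placeholders) would confirm that the SparkContext and Spark SQL are healthy
on YARN:

%pyspark
# Build a tiny in-memory DataFrame and query it through Spark SQL.
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "label"])
df.createOrReplaceTempView("yarn_smoke_test")   # placeholder view name
spark.sql("SELECT COUNT(*) AS n FROM yarn_smoke_test").show()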

Also, which version are you using: zeppelin-0.7.4 or zeppelin-0.8.2? Is it a
branch that you maintain yourself?

On March 15, 2019, at 6:31 AM, Dave Boyd <db...@incadencecorp.com> wrote:


All:

   I have some code that worked fine in Zeppelin 0.7.4 but I am having issues
in 0.8.2 when going from a Spark master of local to yarn-client.  Yarn client
worked in 0.7.4.

When my master is set to local[*] it runs just fine.  However, as soon as I
switch to yarn-client I get the "Cannot call methods on a stopped SparkContext"
error.  In looking at my yarn logs everything creates fine and the job
finishes without an error.  The executors start just fine from what I get out
of the yarn logs.

Any suggestions on where to look?  This happens with any note that tries to run
Spark.

If I try this very simple code:

// Spark Version
spark.version

I get this error:

java.lang.IllegalStateException: Spark context stopped while waiting for backend
    at org.apache.spark.scheduler.TaskSchedulerImpl.waitBackendReady(TaskSchedulerImpl.scala:614)
    at org.apache.spark.scheduler.TaskSchedulerImpl.postStartHook(TaskSchedulerImpl.scala:169)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:567)
    at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2313)
    at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:868)
    at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:860)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:860)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorI

Spark issue moving from local to yarn-client

2019-03-14 Thread Dave Boyd
All:

   I have some code that worked fine in Zeppelin 0.7.4 but I am having issues
in 0.8.2 when going from a Spark master of local to yarn-client.  Yarn client
worked in 0.7.4.

When my master is set to local[*] it runs just fine.  However, as soon as I
switch to yarn-client I get the "Cannot call methods on a stopped SparkContext"
error.  In looking at my yarn logs everything creates fine and the job
finishes without an error.  The executors start just fine from what I get out
of the yarn logs.

Any suggestions on where to look?  This happens with any note that tries to run
Spark.

If I try this very simple code:

// Spark Version
spark.version

I get this error:

java.lang.IllegalStateException: Spark context stopped while waiting for backend
    at org.apache.spark.scheduler.TaskSchedulerImpl.waitBackendReady(TaskSchedulerImpl.scala:614)
    at org.apache.spark.scheduler.TaskSchedulerImpl.postStartHook(TaskSchedulerImpl.scala:169)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:567)
    at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2313)
    at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:868)
    at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:860)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:860)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.zeppelin.spark.BaseSparkScalaInterpreter.spark2CreateContext(BaseSparkScalaInterpreter.scala:259)
    at org.apache.zeppelin.spark.BaseSparkScalaInterpreter.createSparkContext(BaseSparkScalaInterpreter.scala:178)
    at org.apache.zeppelin.spark.SparkScala211Interpreter.open(SparkScala211Interpreter.scala:89)
    at org.apache.zeppelin.spark.NewSparkInterpreter.open(NewSparkInterpreter.java:102)
    at org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:62)
    at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:69)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:616)
    at org.apache.zeppelin.scheduler.Job.run(Job.java:188)
    at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:140)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

What am I missing?

--
= <db...@incadencecorp.com>
David W. Boyd
VP,  Data Solutions
10432 Balls Ford, Suite 240
Manassas, VA 20109
office:   +1-703-552-2862
cell: +1-703-402-7908
== http://www.incadencecorp.com/ 
ISO/IEC JTC1 WG9, editor ISO/IEC 20547 Big Data Reference Architecture
Chair ANSI/INCITS TC Big Data
Co-chair NIST Big Data Public Working Group Reference Architecture
First Robotic Mentor - FRC, FTC - www.iliterobotics.org
Board Member- USSTEM Foundation - www.usstem.org

The information contained in this message may be privileged
and/or confidential and protected from disclosure.
If the reader of this message is not the intended recipient
or an employee or agent responsible for delivering this message
to the intended recipient, you are hereby notified that any
dissemination, distribution or copying of this communication
is strictly prohibited.  If you have received this communication
in error, please notify the sender immediately by replying to
this message and deleting the material from any computer.