I basically used this Cloudera blog post as my starting point:

http://blog.cloudera.com/blog/2014/08/how-to-use-ipython-notebook-with-apache-spark/

and I kick off iPython Notebooks like this:

# Set up the Spark environment

import os
import sys

spark_home = '/opt/mapr/spark/spark-1.2.0-bin-mapr4'

pyspark_submit_args = ('--master mesos://hadoopmapr3:5050 '
                       '--driver-memory 1G --executor-memory 4096M')


# Set the OS ENV for Spark

os.environ['SPARK_HOME'] = spark_home

os.environ['PYSPARK_SUBMIT_ARGS'] = pyspark_submit_args


# Append Spark Paths

sys.path.insert(0, os.path.join(spark_home, 'python'))

sys.path.insert(0, os.path.join(spark_home, 'python/lib/py4j-0.8.2.1-src.zip'))


# Import Hive and Spark modules

from pyspark.sql import SQLContext, Row, HiveContext

import pyhs2



# For getting intel from sources

import urllib2

# Not sure if this goes here or not... we'll find out.

execfile(os.path.join(spark_home, 'python/pyspark/shell.py'))
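
One tweak I've been trying that is not from the Cloudera post (so treat it as an untested sketch): pinning a distinct UI port per kernel via spark.ui.port, so each driver's UI is easy to find instead of everything piling up around 4040. Spark will auto-increment from 4040 on its own if the port is taken, but then I never know which UI belongs to which kernel.

# Sketch only (not from the blog post): give each kernel's driver its own UI
# port by appending --conf spark.ui.port=<port> to PYSPARK_SUBMIT_ARGS before
# shell.py runs. The random port choice is just a crude way to avoid collisions.
import os
import random

ui_port = random.randint(4041, 4099)

os.environ['PYSPARK_SUBMIT_ARGS'] = ('--master mesos://hadoopmapr3:5050 '
                                     '--driver-memory 1G --executor-memory 4096M '
                                     '--conf spark.ui.port={0}'.format(ui_port))

print("This kernel's driver UI should come up on port {0}".format(ui_port))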

On Sun, Feb 15, 2015 at 10:16 AM, Niels Schenk <[email protected]> wrote:
> Hi John,
>
> I'm not really sure how you're submitting your Spark app, but the code
> below lets me start separate contexts as long as I give each one a different
> app name. You can set coarse mode to either true or false, depending on
> whether you're fine with a slight delay on each command you execute on the
> context.
>
> I'm not sure where you would find the UI in this case; I'm not using
> the UI myself.
>
> cheers,
>
> Niels
>
>
> conf=SparkConf()
> conf.set("spark.executor.uri", 
> "https://somehost/spark-1.2.1-bin-hadoop2.4.tgz";)
> conf.set("spark.executor.memory", "10g")
> conf.set("spark.cores.max", "4")
> conf.setAppName("PythonPi")
> conf.setMaster("mesos://zk://somehost:2181/mesos")
> conf.set("spark.mesos.coarse", "true")
>
> sc = SparkContext(conf=conf)
>
>> On 15 Feb 2015, at 02:06, John Omernik <[email protected]> wrote:
>>
>> Hello all, I am running Spark on Mesos and I think I am in love, but I
>> have some questions. I am running the Python shell via iPython
>> Notebooks (Jupyter) and it works great, but I am trying to figure out
>> how things are actually submitted. For example, when I submit the
>> Spark app from the iPython notebook server, I open a new kernel and I
>> see a new spark-submit (similar to the below) for each kernel. But how
>> is that actually working on the cluster? I can connect to the Spark UI
>> on port 4040, but shouldn't there be a different one for each driver?
>> Is that causing conflicts? After a while things seem to run slowly; is
>> this due to some weird conflict? Should I be specifying unique ports
>> for each driver? Is the driver shared between users? What about
>> between kernels for the same user?
>> Curious if anyone has any insight.
>>
>> Thanks!
>>
>>
>> java org.apache.spark.deploy.SparkSubmitDriverBootstrapper --master
>> mesos://hadoopmapr3:5050 --driver-memory 1G --executor-memory 4096M
>> pyspark-shell
>
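
Niels, thanks, that helps. For the archives, here is roughly how I would adapt your snippet per notebook kernel, with a unique app name for each context. This is an untested sketch on my end; the executor URI and ZooKeeper host are placeholders copied from your example.

# Untested sketch adapting Niels's snippet: one SparkContext per notebook
# kernel, each with its own app name so the contexts stay distinct.
import getpass
import os

from pyspark import SparkConf, SparkContext

conf = SparkConf()
conf.set("spark.executor.uri", "https://somehost/spark-1.2.1-bin-hadoop2.4.tgz")
conf.set("spark.executor.memory", "4g")
conf.set("spark.cores.max", "4")
conf.set("spark.mesos.coarse", "true")  # "false" trades a slight per-command delay for finer-grained sharing
conf.setMaster("mesos://zk://somehost:2181/mesos")

# Unique app name per user and kernel, e.g. "notebook-john-12345"
conf.setAppName("notebook-{0}-{1}".format(getpass.getuser(), os.getpid()))

sc = SparkContext(conf=conf)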
