Basically I used this Cloudera blog as my basis: http://blog.cloudera.com/blog/2014/08/how-to-use-ipython-notebook-with-apache-spark/
and I kick off iPython Notebooks like this:

# Setup Spark Environment
import os
import sys

spark_home = '/opt/mapr/spark/spark-1.2.0-bin-mapr4'
pyspark_submit_args = '--master mesos://hadoopmapr3:5050 --driver-memory 1G --executor-memory 4096M'

# Set the OS ENV for Spark
os.environ['SPARK_HOME'] = spark_home
os.environ['PYSPARK_SUBMIT_ARGS'] = pyspark_submit_args

# Append Spark Paths
sys.path.insert(0, os.path.join(spark_home, 'python'))
sys.path.insert(0, os.path.join(spark_home, 'python/lib/py4j-0.8.2.1-src.zip'))

# Import Hive and Spark modules
from pyspark.sql import SQLContext, Row, HiveContext
import pyhs2

# For getting intel from sources
import urllib2

# Not sure if this goes here or not... we'll find out.
execfile(os.path.join(spark_home, 'python/pyspark/shell.py'))


On Sun, Feb 15, 2015 at 10:16 AM, Niels Schenk <[email protected]> wrote:
> Hi John,
>
> I'm not really sure what you're doing to submit your Spark app, but the
> code below lets me load different contexts if I give them different names.
> You can set coarse mode to either true or false, depending on whether
> you're fine with a slight delay in the commands you execute on the context.
>
> I'm not sure where you would find the UI in this case; I'm not using the UI.
>
> cheers,
>
> Niels
>
>
> from pyspark import SparkConf, SparkContext
>
> conf = SparkConf()
> conf.set("spark.executor.uri",
>          "https://somehost/spark-1.2.1-bin-hadoop2.4.tgz")
> conf.set("spark.executor.memory", "10g")
> conf.set("spark.cores.max", "4")
> conf.setAppName("PythonPi")
> conf.setMaster("mesos://zk://somehost:2181/mesos")
> conf.set("spark.mesos.coarse", "true")
>
> sc = SparkContext(conf=conf)
>
>> On 15 Feb 2015, at 02:06, John Omernik <[email protected]> wrote:
>>
>> Hello all, I am running Spark on Mesos and I think I am in love, but I
>> have some questions. I am running the Python shell via iPython
>> Notebooks (Jupyter) and it works great, but I am trying to figure out
>> how things are actually submitted. For example, when I submit the
>> Spark app from the iPython notebook server, I am opening a new kernel
>> and I see a new spark-submit (similar to the one below) for each
>> kernel. But how is that actually working on the cluster? I can
>> connect to the Spark driver UI on port 4040, but shouldn't there be a
>> different one for each driver? Is that causing conflicts? After a
>> while things seem to run slowly; is this due to some weird conflicts?
>> Should I be specifying unique ports for each server? Is the driver
>> shared between users? What about between kernels for the same user?
>> Curious if anyone has any insight.
>>
>> Thanks!
>>
>>
>> java org.apache.spark.deploy.SparkSubmitDriverBootstrapper --master
>> mesos://hadoopmapr3:5050 --driver-memory 1G --executor-memory 4096M
>> pyspark-shell
>
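On the UI question: each SparkContext starts its own driver web UI, which tries port 4040 first and falls back to 4041, 4042, and so on if that port is already taken, so several notebook kernels on one host end up on neighbouring ports rather than sharing one UI. If you want the ports to be predictable, you can skip the execfile(shell.py) step and build the context yourself, roughly as Niels does. A minimal sketch, assuming some per-kernel identifier is available; the kernel_id value and the port scheme below are my own assumptions, not anything from this thread:

# Minimal sketch: one SparkContext per notebook kernel, each with its own
# app name and UI port, so the drivers do not all contend for 4040.
from pyspark import SparkConf, SparkContext

kernel_id = 3  # hypothetical per-kernel (or per-user) identifier

conf = SparkConf()
conf.setMaster("mesos://hadoopmapr3:5050")
conf.setAppName("ipython-kernel-%d" % kernel_id)   # shows up in the Mesos and Spark UIs
conf.set("spark.ui.port", str(4040 + kernel_id))   # predictable driver UI per kernel
conf.set("spark.executor.memory", "4g")

sc = SparkContext(conf=conf)

Each kernel launches its own spark-submit (as John observed below) and therefore its own driver, so neither the driver nor the SparkContext is shared between users or between kernels of the same user.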

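If you would rather keep the execfile(shell.py) setup from the top of the thread, the same port setting can probably be pushed through PYSPARK_SUBMIT_ARGS with spark-submit's --conf flag; note that shell.py sets its own app name ("PySparkShell"), so renaming per kernel would still need the SparkConf route above. Again just a sketch, with kernel_id as a made-up placeholder:

import os

kernel_id = 3  # hypothetical; anything unique per kernel or user will do
submit_args = '--master mesos://hadoopmapr3:5050 --driver-memory 1G --executor-memory 4096M'
submit_args += ' --conf spark.ui.port=%d' % (4040 + kernel_id)  # give this kernel its own UI port
os.environ['PYSPARK_SUBMIT_ARGS'] = submit_args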
