Re: Does IPython notebook work with Spark? Trivial example does not work. Re: bug with IPython notebook?

2014-10-10 Thread jay vyas
PySpark definitely works for me in IPython notebook.  A good way to debug is
to set the master to "local" on your Python SparkContext and see if that
works.  Then, from there, modify it to point to the real Spark master.
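For example, a minimal sketch of that debugging step (the app name is a
placeholder, and the master URL in the comment is an assumption about your
cluster):

from pyspark import SparkConf, SparkContext

# start with everything running in-process on the driver
conf = SparkConf().setAppName("notebook-test").setMaster("local")
sc = SparkContext(conf=conf)

# once this works, point it at the real cluster instead, e.g.
# conf.setMaster("spark://<master-host>:7077")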

Also, I added a hack where I did a sys.path.insert of the path to pyspark in
my Python notebook to get it working properly.
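Roughly, the hack looks like this (a sketch; the install locations below are
assumptions based on the /root/spark layout on the Spark EC2 AMI, and the
py4j zip name varies with the Spark build):

import sys

# make the pyspark modules importable from a plain IPython kernel
sys.path.insert(0, "/root/spark/python")
sys.path.insert(0, "/root/spark/python/lib/py4j-0.8.2.1-src.zip")  # version may differ

from pyspark import SparkContext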

You can try out these instructions if you want; I recently put them together
based on some other material online, plus a few minor modifications:

http://jayunit100.blogspot.com/2014/07/ipython-on-spark.html


On Thu, Oct 9, 2014 at 2:50 PM, Andy Davidson a...@santacruzintegration.com wrote:

 I wonder if I am starting IPython notebook incorrectly. The example in my
 original email does not work; it looks like stdout is not configured
 correctly. If I submit it as a plain Python .py file, it works correctly.

 Any idea what the problem is?


 Thanks

 Andy


 From: Andrew Davidson a...@santacruzintegration.com
 Date: Tuesday, October 7, 2014 at 4:23 PM
 To: user@spark.apache.org
 Subject: bug with IPython notebook?

 Hi

 I think I found a bug in the IPython notebook integration. I am not sure
 how to report it.

 I am running spark-1.1.0-bin-hadoop2.4 on an AWS EC2 cluster. I start the
 cluster using the launch script provided by Spark.

 I start IPython notebook on my cluster master as follows and use an ssh
 tunnel to open the notebook in a browser running on my local computer.

 [ec2-user@ip-172-31-20-107 ~]$ IPYTHON_OPTS="notebook --pylab inline --no-browser --port=7000" /root/spark/bin/pyspark
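 (For reference, the tunnel can be set up along these lines; the key file
 and hostname are placeholders, not my actual values:

 ssh -i my-key.pem -L 7000:localhost:7000 ec2-user@<master-public-dns>

 after which the notebook is reachable at http://localhost:7000 locally.)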

 Below is the code my notebook executes:


 Bug list:

    1. Why do I need to create a SparkContext? If I run pyspark
    interactively, the context is created automatically for me.
    2. The print statement causes the output to be displayed in the
    terminal where I started pyspark, not in the notebook's output (see
    the sketch after the code below).

 Any comments or suggestions would be greatly appreciated.

 Thanks

 Andy


 import sys
 from operator import add

 from pyspark import SparkContext

 # only stand-alone jobs should create a SparkContext
 sc = SparkContext(appName="pyStreamingSparkRDDPipe")

 data = [1, 2, 3, 4, 5]
 rdd = sc.parallelize(data)

 def echo(data):
     # output winds up in the shell console on my cluster
     # (i.e. the machine I launched pyspark from)
     print "python received: %s" % (data)

 rdd.foreach(echo)
 print "we are done"
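 A minimal sketch of how to get the output into the notebook cell itself
 (foreach() runs echo on the executors, which is why its stdout lands in
 the console where pyspark was launched rather than in the notebook):

 for x in rdd.collect():  # bring the data back to the driver first
     print "python received: %s" % x

 For a large RDD you would take() a small sample instead of collecting
 everything.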





-- 
jay vyas


bug with IPython notebook?

2014-10-07 Thread Andy Davidson
Hi

I think I found a bug in the IPython notebook integration. I am not sure how
to report it.

I am running spark-1.1.0-bin-hadoop2.4 on an AWS EC2 cluster. I start the
cluster using the launch script provided by Spark.

I start IPython notebook on my cluster master as follows and use an ssh
tunnel to open the notebook in a browser running on my local computer.

[ec2-user@ip-172-31-20-107 ~]$ IPYTHON_OPTS="notebook --pylab inline --no-browser --port=7000" /root/spark/bin/pyspark


Below is the code my notebook executes:


Bug list:
1. Why do I need to create a SparkContext? If I run pyspark interactively,
the context is created automatically for me.
2. The print statement causes the output to be displayed in the terminal
where I started pyspark, not in the notebook's output.
Any comments or suggestions would be greatly appreciated.

Thanks

Andy


import sys
from operator import add

from pyspark import SparkContext

# only stand-alone jobs should create a SparkContext
sc = SparkContext(appName="pyStreamingSparkRDDPipe")

data = [1, 2, 3, 4, 5]
rdd = sc.parallelize(data)

def echo(data):
    # output winds up in the shell console on my cluster
    # (i.e. the machine I launched pyspark from)
    print "python received: %s" % (data)

rdd.foreach(echo)
print "we are done"