Hi Ashish,

Your 00-pyspark-setup file looks very different from mine (and from the one
described in the blog post). Questions:

1) Do you have SPARK_HOME set in your environment? If you don't, your
version of the code ends up setting it to None. You should point it at the
root of your Spark installation. In my case I have spark-1.3.1 installed
under $HOME/Software, and the block under "# Configure the environment"
(the yellow highlight in the code below) reflects that.
2) Is there a python2 or python subdirectory under the root of your Spark
installation? In my case it's "python", not "python2". This directory
contains the Python bindings for Spark, so the block under "# Add the
PySpark/py4j to the Python Path" (the green highlight in the code below)
adds it to Python's sys.path so that things like pyspark.SparkContext are
accessible in your Python environment. See also my note after the code
about where py4j lives in the pre-built distributions.

import os
import sys

# Configure the environment
if 'SPARK_HOME' not in os.environ:
    os.environ['SPARK_HOME'] = "/Users/palsujit/Software/spark-1.3.1"

# Create a variable for our root path
SPARK_HOME = os.environ['SPARK_HOME']

# Add the PySpark/py4j to the Python Path
sys.path.insert(0, os.path.join(SPARK_HOME, "python", "build"))
sys.path.insert(0, os.path.join(SPARK_HOME, "python"))
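
One more thing to watch out for (this is an educated guess since I can't see
your distribution, so treat it as a sketch rather than gospel): if there is
no "build" directory under python/, the pre-built Spark downloads ship the
py4j bindings as a zip under python/lib instead, and that zip needs to go on
sys.path too. Something like this, appended to the script above, should
cover both layouts:

import glob

# Fail early with a clear message if SPARK_HOME points at a bad location
# (e.g. the bin\ subdirectory instead of the installation root).
if not os.path.isdir(SPARK_HOME):
    raise EnvironmentError("SPARK_HOME does not exist: %s" % SPARK_HOME)

# Pre-built distributions keep py4j as a zip under python/lib; the exact
# file name depends on the bundled py4j version, hence the glob.
py4j_glob = os.path.join(SPARK_HOME, "python", "lib", "py4j-*.zip")
for py4j_zip in glob.glob(py4j_glob):
    sys.path.insert(0, py4j_zip)

With that in place, "from pyspark import SparkContext" should work from a
plain Python shell as well as from the notebook.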

Hope this fixes things for you.

-sujit


On Wed, Jul 8, 2015 at 9:52 PM, Ashish Dutt <ashish.du...@gmail.com> wrote:

> Hi Sujit,
> Thanks for your response.
>
> So I opened a new notebook using the command "ipython notebook --profile
> spark" and tried the sequence of commands, but I am getting errors. Attached
> is a screenshot of the same.
> I am also attaching the 00-pyspark-setup.py for your reference. It looks
> like I have written something wrong here, but I cannot figure out what it
> is.
>
> Thank you for your help
>
>
> Sincerely,
> Ashish Dutt
>
> On Thu, Jul 9, 2015 at 11:53 AM, Sujit Pal <sujitatgt...@gmail.com> wrote:
>
>> Hi Ashish,
>>
>> >> Nice post.
>> Agreed, kudos to the author of the post, Benjamin Bengfort of District
>> Data Labs.
>>
>> >> Following your post, I get this problem;
>> Again, not my post.
>>
>> I did try setting up IPython with the Spark profile for the edX Intro to
>> Spark course (because I didn't want to use the Vagrant container), and it
>> worked flawlessly with the instructions provided (on OS X). I haven't used
>> the IPython/PySpark environment for much beyond very basic tasks since
>> then, though, because my employer has a Databricks license that we were
>> already using for other work, so we ended up doing the labs on Databricks.
>>
>> Looking at your screenshot, though, I don't see why you think it's picking
>> up the default profile. One simple way to check whether things are
>> working is to open a new notebook and try this sequence of commands:
>>
>> from pyspark import SparkContext
>> sc = SparkContext("local", "pyspark")
>> sc
>>
>> You should see something like this after a little while:
>> <pyspark.context.SparkContext at 0x1093c9b10>
>>
>> While the context is being instantiated, you should also see lots of log
>> lines scroll by on the terminal where you started the "ipython notebook
>> --profile spark" command - these log lines are from Spark.
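>>
>> As an extra sanity check (my own addition, not something from the blog
>> post), you can run a tiny job against the new context and make sure it
>> actually executes:
>>
>> # count the numbers 0..99 on the local context created above
>> sc.parallelize(range(100)).count()
>>
>> If that returns 100, the notebook is talking to Spark correctly.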
>>
>> Hope this helps,
>> Sujit
>>
>>
>> On Wed, Jul 8, 2015 at 6:04 PM, Ashish Dutt <ashish.du...@gmail.com>
>> wrote:
>>
>>> Hi Sujit,
>>> Nice post. Exactly what I had been looking for.
>>> I am a relative beginner with Spark and real-time data processing.
>>> We have a server running CDH 5.4 with 4 nodes; the Spark version on the
>>> server is 1.3.0.
>>> On my laptop I have Spark 1.3.0 too, running in a Windows 7 environment.
>>> As per point 5 of your post, I am able to invoke pyspark locally in
>>> standalone mode.
>>>
>>> Following your post, I get this problem;
>>>
>>> 1. In the section "Using IPython notebook with Spark" I cannot understand
>>> why it is picking up the default profile and not the pyspark profile. I am
>>> sure it is because of the path variables. Attached is the screenshot. Can
>>> you suggest how to solve this?
>>>
>>> Currently, the path variables on my laptop are:
>>> SPARK_HOME="C:\SPARK-1.3.0\BIN", JAVA_HOME="C:\PROGRAM
>>> FILES\JAVA\JDK1.7.0_79", HADOOP_HOME="D:\WINUTILS", M2_HOME="D:\MAVEN\BIN",
>>> MAVEN_HOME="D:\MAVEN\BIN", PYTHON_HOME="C:\PYTHON27\", SBT_HOME="C:\SBT\"
>>>
>>>
>>> Sincerely,
>>> Ashish Dutt
>>> PhD Candidate
>>> Department of Information Systems
>>> University of Malaya, Lembah Pantai,
>>> 50603 Kuala Lumpur, Malaysia
>>>
>>> On Thu, Jul 9, 2015 at 4:56 AM, Sujit Pal <sujitatgt...@gmail.com>
>>> wrote:
>>>
>>>> You are welcome, Davies. Just to clarify, I didn't write the post (not
>>>> sure if my earlier message gave that impression; apologies if so), although
>>>> I agree it's great :-).
>>>>
>>>> -sujit
>>>>
>>>>
>>>> On Wed, Jul 8, 2015 at 10:36 AM, Davies Liu <dav...@databricks.com>
>>>> wrote:
>>>>
>>>>> Great post, thanks for sharing with us!
>>>>>
>>>>>
>>>>> On Wed, Jul 8, 2015 at 9:59 AM, Sujit Pal <sujitatgt...@gmail.com>
>>>>> wrote:
>>>>> > Hi Julian,
>>>>> >
>>>>> > I recently built a Python+Spark application to do search relevance
>>>>> > analytics. I use spark-submit to submit PySpark jobs to a Spark
>>>>> > cluster on EC2 (so I don't use the PySpark shell, hopefully that's
>>>>> > what you are looking for). Can't share the code, but the basic
>>>>> > approach is covered in this blog post - scroll down to the section
>>>>> > "Writing a Spark Application".
>>>>> >
>>>>> >
>>>>> > https://districtdatalabs.silvrback.com/getting-started-with-spark-in-python
>>>>> >
>>>>> > Hope this helps,
>>>>> >
>>>>> > -sujit
>>>>> >
>>>>> >
>>>>> > On Wed, Jul 8, 2015 at 7:46 AM, Julian <julian+sp...@magnetic.com>
>>>>> wrote:
>>>>> >>
>>>>> >> Hey.
>>>>> >>
>>>>> >> Is there a resource that has written up what the necessary steps
>>>>> >> are for running PySpark without using the PySpark shell?
>>>>> >>
>>>>> >> I can reverse engineer (by following the tracebacks and reading the
>>>>> >> shell source) what the relevant Java imports needed are, but I would
>>>>> >> assume someone has attempted this before and just published something
>>>>> >> I can either follow or install? If not, I have something that pretty
>>>>> >> much works and can publish it, but I'm not a heavy Spark user, so
>>>>> >> there may be some things I've left out that I haven't hit because of
>>>>> >> how little of pyspark I'm playing with.
>>>>> >>
>>>>> >> Thanks,
>>>>> >> Julian
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >
>>>>>
>>>>
>>>>
>>>
>>
>
