[ https://issues.apache.org/jira/browse/SPARK-11874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ranjana Rajendran updated SPARK-11874:
--------------------------------------
    Description: 
I have access only to the workbench of a cluster. All the nodes have only 
Python 2.6. I want to use PySpark with an IPython notebook running Python 2.7.

I created a Python 2.7 virtual environment as follows:

conda create -n py27 python=2.7 anaconda
source activate py27

I installed all the required modules in py27.

Then I created a zip of the py27 virtual environment and uploaded it to HDFS:

zip -r py27.zip py27
hadoop fs -put py27.zip
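
As a sanity check on my side (a quick local sketch, not part of the setup): since zip -r py27.zip py27 stores entries under a top-level py27/ prefix, I wonder whether, after YARN unpacks the archive under the alias py27, the interpreter ends up at ./py27/py27/bin/python rather than ./py27/bin/python. This lists where bin/python actually sits in the archive:

import zipfile

# List every archive entry ending in bin/python; the printed paths show the
# layout the executors will see under the unpacked alias directory.
with zipfile.ZipFile("py27.zip") as zf:
    print([n for n in zf.namelist() if n.endswith("bin/python")])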

Then I set the environment variables:

export PYSPARK_DRIVER_PYTHON=ipython
export PYSPARK_DRIVER_PYTHON_OPTS=notebook
export PYSPARK_PYTHON=./py27/bin/python
export PYTHONPATH=/opt/spark/python/lib/py4j-0.8.2.1-src.zip:/opt/spark/python/

I launched pyspark as follows:

/opt/spark/bin/pyspark --verbose --name iPythondemo \
  --conf spark.yarn.executor.memoryOverhead=2048 \
  --conf spark.eventLog.dir=${spark_event_log_dir}$USER/ \
  --master yarn --deploy-mode client \
  --archives hdfs:///user/alti_ranjana/py27.zip#py27 \
  --executor-memory 8G --executor-cores 2 \
  --queue default --num-executors 48 $spark_opts_extra
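
Once the notebook is up, the kind of job I run to check which interpreter the executors actually use is a small sketch like this (sc is the SparkContext the shell creates; this is also the kind of job that triggers the failure below):

import sys

# Collect the Python version string from a few executor tasks; with the
# py27 environment picked up correctly this should report 2.7.x.
print(sc.parallelize(range(4), 4).map(lambda _: sys.version).collect())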

When I try to run a job in client mode, i.e. making use of the executors 
running on all the nodes, I get an error stating that the file 
./py27/bin/python does not exist.
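
To see whether the archive is localized at all, I use a diagnostic sketch like the following (run with PYSPARK_PYTHON temporarily left at the system Python 2.6, since the 2.7 interpreter is the thing that is missing). It lists the YARN container's working directory on one executor:

import os

# List the container working directory on one executor; if the archive was
# localized, a py27 entry (a link to the unpacked archive) should appear.
print(sc.parallelize([0], 1).map(lambda _: sorted(os.listdir("."))).collect())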

I also tried launching pyspark with the additional argument --file py27.zip#py27, and then I get this error:

Exception in thread "main" java.lang.IllegalArgumentException: pyspark does not 
support any application options.

(I wonder whether the option should be --files, plural; since --file is not a 
spark-submit option, it may be getting passed through as an application 
argument, which would explain this message.)

Am I doing this the right way? Is there something wrong in my approach, or is 
this a known issue? Does PySpark support DistributedCache-style distribution 
of zip files?
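
For reference, my understanding is that --archives on YARN corresponds to the spark.yarn.dist.archives configuration, so an equivalent setup from Python would look roughly like this (a sketch, not something I have verified):

from pyspark import SparkConf, SparkContext

# Rough equivalent of the shell flags above; spark.yarn.dist.archives is the
# config behind --archives, and the #py27 fragment names the unpacked dir.
conf = (SparkConf()
        .setMaster("yarn-client")
        .setAppName("iPythondemo")
        .set("spark.yarn.dist.archives",
             "hdfs:///user/alti_ranjana/py27.zip#py27"))
sc = SparkContext(conf=conf)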





> DistributedCache for PySpark
> ----------------------------
>
>                 Key: SPARK-11874
>                 URL: https://issues.apache.org/jira/browse/SPARK-11874
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 1.4.1
>            Reporter: Ranjana Rajendran
>


