Hi, please paste the exception you get with Spark vs. Jupyter. You might also want to sign up for this; it gives you Jupyter and Spark, and presumably spark-csv is already part of it?
https://community.cloud.databricks.com/login.html

hth
marco

On Sat, Sep 3, 2016 at 8:10 PM, Arif,Mubaraka <arif.mubar...@heb.com> wrote:

> On the on-premise Cloudera Hadoop 5.7.2 I have installed the Anaconda
> package and am trying to set up a Jupyter notebook to work with Spark 1.6.
>
> I have run into problems when trying to use the package
> com.databricks:spark-csv_2.10:1.4.0 for reading and inferring the
> schema of a CSV file using Python Spark.
>
> I have installed the jar file spark-csv_2.10-1.4.0.jar in
> /var/opt/teradata/cloudera/parcels/CDH-5.7.2-1.cdh5.7.2.p0.18/jar and the
> configurations are set as:
>
> export PYSPARK_DRIVER_PYTHON=/var/opt/teradata/cloudera/parcels/Anaconda-4.0.0/bin/jupyter
> export PYSPARK_DRIVER_PYTHON_OPTS="notebook --NotebookApp.open_browser=False --NotebookApp.ip='*' --NotebookApp.port=8083"
> export PYSPARK_PYTHON=/var/opt/teradata/cloudera/parcels/Anaconda-4.0.0/bin/python
>
> When I run pyspark from the command line with the packages option, like:
>
> $ pyspark --packages com.databricks:spark-csv_2.10:1.4.0
>
> it throws an error and fails to recognize the added dependency.
>
> Any ideas on how to resolve this error are much appreciated.
>
> Also, please share any experience you have had installing and running
> Jupyter notebook with Anaconda and Spark.
>
> thanks,
> Muby
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
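One workaround worth trying (a sketch only, not verified against this exact CDH 5.7.2 + Anaconda setup): when PYSPARK_DRIVER_PYTHON is pointed at Jupyter, a `--packages` flag given on the `pyspark` command line is sometimes not picked up by the launched driver. Passing the same flag through the PYSPARK_SUBMIT_ARGS environment variable, with the trailing `pyspark-shell` token that the PySpark launcher expects, often resolves this. The paths below are copied from the original message; the CSV path in the comment is hypothetical.

```shell
# Keep the Jupyter driver settings from the original configuration.
export PYSPARK_DRIVER_PYTHON=/var/opt/teradata/cloudera/parcels/Anaconda-4.0.0/bin/jupyter
export PYSPARK_DRIVER_PYTHON_OPTS="notebook --NotebookApp.open_browser=False --NotebookApp.ip='*' --NotebookApp.port=8083"
export PYSPARK_PYTHON=/var/opt/teradata/cloudera/parcels/Anaconda-4.0.0/bin/python

# Pass the spark-csv dependency via PYSPARK_SUBMIT_ARGS instead of a
# command-line flag; the final "pyspark-shell" token is required.
export PYSPARK_SUBMIT_ARGS="--packages com.databricks:spark-csv_2.10:1.4.0 pyspark-shell"

# Launch; Jupyter starts on port 8083 with a SparkContext `sc` available.
pyspark
```

Inside the notebook, the package can then be exercised with the spark-csv data source API for Spark 1.6, e.g. `sqlContext.read.format("com.databricks.spark.csv").option("header", "true").option("inferSchema", "true").load("/path/to/file.csv")`, where the file path is whatever CSV you are testing with.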