Download the right version of this jar from http://mvnrepository.com/artifact/com.databricks/spark-csv_2.10 (or _2.11, matching the Scala version of your Spark build) and append it to SPARK_CLASSPATH.
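Alternatively, if you'd rather not manage the jar by hand, spark-csv can be pulled in as a package when Spark starts. A minimal sketch for running inside PyCharm — the version numbers (Scala 2.10, spark-csv 1.3.0) are assumptions you should match to your own Spark build, and PYSPARK_SUBMIT_ARGS must be set before the SparkContext is created:

```python
import os

# Ask Spark to fetch spark-csv (and its dependencies) at startup.
# Set this BEFORE creating the SparkContext, or it has no effect.
# The coordinates below (_2.10, 1.3.0) are assumptions -- pick the
# artifact matching your Spark/Scala build. "pyspark-shell" must
# remain at the end of the string.
os.environ["PYSPARK_SUBMIT_ARGS"] = (
    "--packages com.databricks:spark-csv_2.10:1.3.0 pyspark-shell"
)
```

From a terminal the equivalent is `pyspark --packages com.databricks:spark-csv_2.10:1.3.0`. Note also that the script below creates `sc` but never instantiates `sqlContext`; you still need `sqlContext = SQLContext(sc)` before calling `sqlContext.read`.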
2016-02-18 11:05 GMT+01:00 Devesh Raj Singh <raj.deves...@gmail.com>:
> Hi,
>
> I want to read a CSV file in pyspark.
>
> I am running pyspark on PyCharm and am trying to load a csv:
>
> import os
> import sys
>
> os.environ['SPARK_HOME'] = "/Users/devesh/Downloads/spark-1.5.1-bin-hadoop2.6"
> sys.path.append("/Users/devesh/Downloads/spark-1.5.1-bin-hadoop2.6/python/")
>
> # Now we are ready to import Spark Modules
> try:
>     from pyspark import SparkContext
>     from pyspark import SparkConf
>     from pyspark.mllib.fpm import FPGrowth
>     print("Successfully imported all Spark Modules")
> except ImportError as e:
>     print("Error importing Spark Modules", e)
>     sys.exit(1)
>
> sc = SparkContext('local')
>
> from pyspark.sql import HiveContext, SQLContext
> from pyspark.sql import SQLContext
>
> df = sqlContext.read.format('com.databricks.spark.csv').options(header='true', inferschema='true').load('/Users/devesh/work/iris/iris.csv')
>
> I am getting the following error:
>
> Py4JJavaError: An error occurred while calling o88.load.
> : java.lang.ClassNotFoundException: Failed to load class for data source: com.databricks.spark.csv.
>     at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.lookupDataSource(ResolvedDataSource.scala:67)
>
> --
> Warm regards,
> Devesh.