Hi Joseph,

This is a known issue, but not a bug.
This issue does not occur in an interactive SparkR session; it occurs only when you execute an R file. The reason is that when you execute an R file, the R backend launches before the R interpreter, so the packages specified with 'sparkPackages' never get a chance to be processed. For now, if you want to execute an R file with additional Spark packages, please use the "--packages" command-line option.

> On Jun 17, 2016, at 10:46, Joseph <wxy81...@sina.com> wrote:
>
> Hi all,
>
> I find an issue in sparkR, maybe it's a bug:
>
> When I read csv file, it's normal to use the following way:
> ${SPARK_HOME}/bin/spark-submit --packages com.databricks:spark-csv_2.11:1.4.0 example.R
>
> But using the following way will give an error:
> sc <- sparkR.init(sparkPackages="com.databricks:spark-csv_2.11:1.4.0")
>
> 16/06/17 09:54:12 ERROR RBackendHandler: loadDF on org.apache.spark.sql.api.r.SQLUtils failed
> Error in invokeJava(isStatic = TRUE, className, methodName, ...) :
> java.lang.ClassNotFoundException: Failed to find data source: csv. Please find packages at http://spark-packages.org
> at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.lookupDataSource(ResolvedDataSource.scala:77)
>
> It is obvious that the sparkR.init() does not load the specified package!
> -----------------------------------------------------------------------------------------------------------------------------------------
>
> Appendix:
> The complete code for example.R:
>
> if (nchar(Sys.getenv("SPARK_HOME")) < 1) {
>   Sys.setenv(SPARK_HOME = "/home/hadoop/spark-1.6.1-bin-hadoop2.6")
> }
>
> library(SparkR, lib.loc = c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib")))
>
> sc <- sparkR.init(master = "local[2]", sparkEnvir = list(spark.driver.memory="1g"), sparkPackages="com.databricks:spark-csv_2.11:1.4.0")
>
> sqlContext <- sparkRSQL.init(sc)
> people <- read.df(sqlContext, "file:/home/hadoop/spark-1.6.1-bin-hadoop2.6/data/mllib/sample_tree_data.csv", "csv")
> registerTempTable(people, "people")
> teenagers <- sql(sqlContext, "SELECT * FROM people")
> head(teenagers)
>
> Joseph
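
P.S. To make the "--packages" workaround concrete, here is a minimal shell sketch. The package coordinates, the script name, and the SPARK_HOME fallback are copied from your message; `submit_cmd` is a hypothetical helper of my own that only prints the command line, so you can check it before launching anything.

```shell
#!/bin/sh
# Sketch: submit an R file with extra Spark packages given on the command line.
# The fallback path and package coordinates come from the quoted message.
SPARK_HOME="${SPARK_HOME:-/home/hadoop/spark-1.6.1-bin-hadoop2.6}"
PACKAGES="com.databricks:spark-csv_2.11:1.4.0"
SCRIPT="example.R"

# submit_cmd (hypothetical helper): print the spark-submit command line
# for the given package coordinates ($1) and R script ($2).
submit_cmd() {
  printf '%s --packages %s %s\n' "${SPARK_HOME}/bin/spark-submit" "$1" "$2"
}

# Dry run: show the command instead of executing it.
submit_cmd "${PACKAGES}" "${SCRIPT}"
```

Running the printed line by hand launches the job. Since the packages are then resolved before the R interpreter starts, the 'sparkPackages' argument in example.R should no longer be needed for this case.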