Hi,
I am trying to create DataFrames dynamically by reading the subdirectories under a
parent directory.
My code looks like this:
> import org.apache.spark._
> import org.apache.spark.sql._
>
> val hadoopConf = new org.apache.hadoop.conf.Configuration()
> val hdfsConn = org.apache.hadoop.fs.FileSystem.get(
>   new java.net.URI("hdfs://xxx.xx.xx.xxx:8020"), hadoopConf)
>
> hdfsConn.listStatus(new org.apache.hadoop.fs.Path("/TestDivya/Spark/ParentDir/")).foreach { fileStatus =>
>   val filePathName = fileStatus.getPath().toString()
>   val fileName = fileStatus.getPath().getName().toLowerCase()
>   var df = "df" + fileName
>   df = sqlContext.read.format("com.databricks.spark.csv")
>     .option("header", "true")
>     .option("inferSchema", "true")
>     .load(filePathName)
> }
I am getting the error below:
> <console>:35: error: type mismatch;
>  found   : org.apache.spark.sql.DataFrame
>  required: String
>        df = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").option("inferSchema", "true").load(filePathName)
Am I missing something?
I would really appreciate any help.
Thanks,
Divya