Hi,

I am trying to dynamically create DataFrames by reading the subdirectories under a parent directory.

My code looks like this:

> import org.apache.spark._
> import org.apache.spark.sql._
>
> val hadoopConf = new org.apache.hadoop.conf.Configuration()
> val hdfsConn = org.apache.hadoop.fs.FileSystem.get(
>   new java.net.URI("hdfs://xxx.xx.xx.xxx:8020"), hadoopConf)
>
> hdfsConn.listStatus(new org.apache.hadoop.fs.Path("/TestDivya/Spark/ParentDir/")).foreach { fileStatus =>
>   val filePathName = fileStatus.getPath().toString()
>   val fileName = fileStatus.getPath().getName().toLowerCase()
>   var df = "df" + fileName
>   df = sqlContext.read.format("com.databricks.spark.csv")
>     .option("header", "true")
>     .option("inferSchema", "true")
>     .load(filePathName)
> }


I am getting the error below:

> <console>:35: error: type mismatch;
>  found   : org.apache.spark.sql.DataFrame
>  required: String
>          df = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").option("inferSchema", "true").load(filePathName)


Am I missing something?

I would really appreciate the help.


Thanks,
Divya
