If I understood correctly, you can have many sub-directories under hdfs:///TestDirectory, and you need to attach a schema to all the part files in each sub-directory.
1) Assuming you know the sub-directory names: list all sub-directories inside hdfs:///TestDirectory using the Hadoop FileSystem API from Scala, then for each sub-directory in the list read its part files and attach the schema that corresponds to that sub-directory (see the sketch after the quoted mail below).

2) If you don't know the sub-directory names: you need to store the schema somewhere inside each sub-directory and read it during the iteration.

On Fri, Feb 19, 2016 at 3:44 PM, Divya Gehlot <divya.htco...@gmail.com> wrote:
> Hi,
> I have a use case where I have one parent directory.
>
> The file structure looks like:
> hdfs:///TestDirectory/spark1/part files (created by some Spark job)
> hdfs:///TestDirectory/spark2/part files (created by some Spark job)
>
> spark1 and spark2 have different schemas.
>
> spark1 part files schema:
> carname model year
>
> spark2 part files schema:
> carowner city carcost
>
> As these spark1 and spark2 directories get created dynamically,
> there can also be a spark3 directory with yet another schema.
>
> My requirement is to read the parent directory, list its sub-directories,
> and create a DataFrame for each sub-directory.
>
> I am not able to work out how to list the sub-directories under the
> parent directory and dynamically create the DataFrames.
>
> Thanks,
> Divya