Hi @Umesh,

Your understanding of my requirement is partially correct. Here are the steps I am trying to follow (I am not sure how feasible this is; I am a newbie to Spark and Scala):

1. List all the sub-directories under the parent directory hdfs:///TestDirectory/ as a list, for example:

   val listSubdirs = List(subdir1, subdir2, ..., subdirN)

2. Iterate through this list and read each sub-directory with the spark-csv package, using a custom schema, into its own DataFrame named "df" + subdir:

   for (subdir <- listSubdirs) {
     // the idea: a DataFrame called "df" + subdir for each sub-directory
     val df = sqlContext.read.format("com.databricks.spark.csv").schema(customSchema).load(subdir)
   }

This gives as many DataFrames as there are sub-directories.

Now I am stuck at the first step itself: how do I list the directories and put them in a list?

Hope you understand my issue now.

Thanks,
Divya

On Feb 19, 2016 6:54 PM, "UMESH CHAUDHARY" <umesh9...@gmail.com> wrote:
> If I understood correctly, you can have many sub-dirs under
> *hdfs:///TestDirectory* and you need to attach a schema to all part files in a sub-dir.
>
> 1) I am assuming that you know the sub-dir names:
>
> For that, you need to list all sub-dirs inside *hdfs:///TestDirectory*
> using Scala and iterate over them; for each sub-dir in the list,
> read the part files, then identify and attach the schema belonging to that
> sub-directory.
>
> 2) If you don't know the sub-directory names:
> You need to store the schema somewhere inside that sub-directory and read
> it during the iteration.
>
> On Fri, Feb 19, 2016 at 3:44 PM, Divya Gehlot <divya.htco...@gmail.com>
> wrote:
>
>> Hi,
>> I have a use case where I have one parent directory.
>>
>> The file structure looks like:
>> hdfs:///TestDirectory/spark1/part files (created by some Spark job)
>> hdfs:///TestDirectory/spark2/part files (created by some Spark job)
>>
>> spark1 and spark2 have different schemas.
>>
>> The spark1 part files schema:
>> carname model year
>>
>> The spark2 part files schema:
>> carowner city carcost
>>
>> As these spark1 and spark2 directories get created dynamically, there
>> can also be a spark3 directory with a different schema.
>>
>> My requirement is to read the parent directory, list the sub-directories,
>> and create a DataFrame for each sub-directory.
>>
>> I am not able to figure out how to list the sub-directories under the parent
>> directory and dynamically create DataFrames.
>>
>> Thanks,
>> Divya
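The first step Divya asks about (listing the sub-directories and building one DataFrame per sub-directory, which is Umesh's option 1) can be sketched as below. This is a minimal sketch, assuming Spark 2.x or later, where CSV reading is built into `SparkSession` (on Spark 1.x with the spark-csv package, `sqlContext.read.format("com.databricks.spark.csv")` plays the same role), part files without a header row, and a hypothetical `schemas` map built from the column names in this thread; the column types and the object name `ReadSubdirs` are assumptions. The sub-directories are listed with the Hadoop `FileSystem.listStatus` call.

```scala
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object ReadSubdirs {

  // Hypothetical schemas keyed by sub-directory name; the column names come from
  // the thread, the column types are assumptions.
  val schemas: Map[String, StructType] = Map(
    "spark1" -> StructType(Seq(
      StructField("carname", StringType),
      StructField("model", StringType),
      StructField("year", IntegerType))),
    "spark2" -> StructType(Seq(
      StructField("carowner", StringType),
      StructField("city", StringType),
      StructField("carcost", IntegerType))))

  // Step 1: list the immediate sub-directories of `parent` with the Hadoop FileSystem API.
  def listSubdirs(spark: SparkSession, parent: String): Seq[Path] = {
    val parentPath = new Path(parent)
    val fs: FileSystem = parentPath.getFileSystem(spark.sparkContext.hadoopConfiguration)
    fs.listStatus(parentPath).filter(_.isDirectory).map(_.getPath).toSeq
  }

  // Step 2: read each sub-directory that has a known schema into its own DataFrame,
  // keyed by the sub-directory name. Unknown sub-directories (e.g. a new spark3) are skipped.
  def readAll(spark: SparkSession, parent: String): Map[String, DataFrame] =
    listSubdirs(spark, parent)
      .filter(dir => schemas.contains(dir.getName))
      .map(dir => dir.getName -> spark.read.schema(schemas(dir.getName)).csv(dir.toString))
      .toMap
}
```

Because Scala cannot create variable names such as `dfspark1` at runtime, the sketch returns a `Map` keyed by sub-directory name instead, so `ReadSubdirs.readAll(spark, "hdfs:///TestDirectory")("spark1")` would give the DataFrame for spark1.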
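For Umesh's option 2, where a new sub-directory such as spark3 appears with a schema the reader cannot know in advance, one possible convention is to have the producing job store its schema next to the part files. The sketch below assumes a hypothetical file named `_schema.json` in each sub-directory containing the JSON form of a `StructType` (as produced by `df.schema.json`), which is parsed back with `DataType.fromJson`; the file name and the object name `ReadSubdirsWithStoredSchema` are illustrative only, and Spark 2.x or later is assumed as before.

```scala
import scala.io.Source
import org.apache.hadoop.fs.Path
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types.{DataType, StructType}

object ReadSubdirsWithStoredSchema {

  // Hypothetical convention: each sub-directory keeps its schema as JSON in this file.
  val SchemaFileName = "_schema.json"

  // Read every sub-directory that carries a schema file into a DataFrame keyed by its name.
  def readAll(spark: SparkSession, parent: String): Map[String, DataFrame] = {
    val parentPath = new Path(parent)
    val fs = parentPath.getFileSystem(spark.sparkContext.hadoopConfiguration)
    fs.listStatus(parentPath)
      .filter(_.isDirectory)
      .map(_.getPath)
      .filter(dir => fs.exists(new Path(dir, SchemaFileName))) // skip dirs without a schema
      .map { dir =>
        // Load the stored JSON schema and turn it back into a StructType.
        val in = fs.open(new Path(dir, SchemaFileName))
        val json = try Source.fromInputStream(in, "UTF-8").mkString finally in.close()
        val schema = DataType.fromJson(json).asInstanceOf[StructType]
        dir.getName -> spark.read.schema(schema).csv(dir.toString)
      }
      .toMap
  }
}
```

The leading underscore in the file name is deliberate: Spark's file listing skips names starting with `_` or `.` (the same reason `_SUCCESS` files are ignored), so the schema file is not read as data alongside the part files.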