Hi,
@Umesh: Your understanding is partially correct as per my requirement.
Below is the idea I am trying to implement and the steps I am trying to follow.
(I am not sure how feasible it is; I am a newbie to Spark and Scala.)
1. List all the sub-directories under the parent directory
   hdfs:///TestDirectory/
   as a list, for example:
   val listsubdirs = List("subdir1", "subdir2", ..., "subdirN")
2. Iterate through this list:
   for (subdir <- listsubdirs) {
     // intent: one DataFrame per sub-directory, named "df" + subdir
     // df = read the sub-directory with the spark-csv package and a custom schema
   }
3. This should give as many DataFrames as there are sub-directories.
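
A minimal sketch of the loop described above, not a definitive implementation. Scala cannot create variables whose names are built at run time (the "df" + subdir idea), so the sketch keeps the DataFrames in a Map keyed by sub-directory name instead. Assumptions: the object name SubdirFrames, the helpers stringSchema and readAll, and the schemas lookup are hypothetical; every column is read as a string because the thread gives only column names; the part files are header-less CSV read through the spark-csv package; and parent is passed without a trailing slash.

```scala
import org.apache.spark.sql.{DataFrame, SQLContext}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

object SubdirFrames {
  // Assumption: all columns are strings; the thread does not state column types.
  def stringSchema(columns: String*): StructType =
    StructType(columns.map(c => StructField(c, StringType, nullable = true)))

  // Hypothetical lookup from sub-directory name to its schema
  // (directory and column names taken from the original post).
  val schemas: Map[String, StructType] = Map(
    "spark1" -> stringSchema("carname", "model", "year"),
    "spark2" -> stringSchema("carowner", "city", "carcost")
  )

  // Reads every sub-directory that has a known schema into a DataFrame,
  // keyed by the sub-directory name. Sub-directories without a schema are skipped.
  def readAll(sqlContext: SQLContext,
              parent: String,
              subdirs: Seq[String]): Map[String, DataFrame] =
    subdirs.flatMap { subdir =>
      schemas.get(subdir).map { schema =>
        val df = sqlContext.read
          .format("com.databricks.spark.csv") // spark-csv package
          .option("header", "false")          // assumption: part files have no header row
          .schema(schema)
          .load(s"$parent/$subdir")
        subdir -> df
      }
    }.toMap
}
```

With this, something like SubdirFrames.readAll(sqlContext, "hdfs:///TestDirectory", listsubdirs)("spark1") would give the DataFrame for the spark1 sub-directory. A new directory such as spark3 would need its own entry in schemas before it is picked up.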

Right now I am stuck on the first step itself.
How do I list the sub-directories and put them in a list?
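
One possible way to do this first step is through the Hadoop FileSystem API, which is already on the classpath of a Spark application. The sketch below is only that, a sketch: the object name ListSubdirs and the method listSubdirs are hypothetical, and it returns only the immediate sub-directories of the parent (it does not recurse).

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object ListSubdirs {
  // Returns the names of the immediate sub-directories of `parent`.
  // Plain files directly under `parent` are filtered out.
  def listSubdirs(fs: FileSystem, parent: Path): List[String] =
    fs.listStatus(parent)
      .filter(_.isDirectory)
      .map(_.getPath.getName)
      .toList
}
```

In spark-shell, where sc is the SparkContext, the FileSystem can be obtained from the path itself, for example: val parent = new Path("hdfs:///TestDirectory"); val listsubdirs = ListSubdirs.listSubdirs(parent.getFileSystem(sc.hadoopConfiguration), parent). For the layout in the original post this should contain spark1 and spark2.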

I hope my issue is clear now.
Thanks,
Divya
On Feb 19, 2016 6:54 PM, "UMESH CHAUDHARY" <umesh9...@gmail.com> wrote:

> If I understood correctly, you can have many sub-dirs under
> hdfs:///TestDirectory and you need to attach a schema to all part files
> in a sub-dir.
>
> 1) I am assuming that you know the sub-dirs names :
>
>     For that, you need to list all sub-dirs inside hdfs:///TestDirectory
>     using Scala and iterate over them:
>     for each sub-dir in the list, read the part files, then identify and
>     attach the schema for that sub-directory.
>
> 2) If you don't know the sub-directory names:
>     You need to store schema somewhere inside that sub-directory and read
> it in iteration.
>
> On Fri, Feb 19, 2016 at 3:44 PM, Divya Gehlot <divya.htco...@gmail.com>
> wrote:
>
>> Hi,
>> I have a use case where I have one parent directory.
>>
>> The file structure looks like:
>> hdfs:///TestDirectory/spark1/ part files (created by some Spark job)
>> hdfs:///TestDirectory/spark2/ part files (created by some Spark job)
>>
>> spark1 and spark2 have different schemas,
>>
>> like the spark1 part files schema:
>> carname model year
>>
>> spark2 part files schema:
>> carowner city carcost
>>
>>
>> As the spark1 and spark2 directories get created dynamically, there
>> can also be a spark3 directory with a different schema.
>>
>> My requirement is to read the parent directory, list the sub-directories,
>> and create a DataFrame for each sub-directory.
>>
>> I am not able to work out how to list the sub-directories under the parent
>> directory and dynamically create the DataFrames.
>>
>> Thanks,
>> Divya
