If I understood correctly, you can have many sub-dirs under
hdfs:///TestDirectory and you need to attach a schema to all part files
in each sub-dir.

1) I am assuming that you know the sub-dir names:

    For that, you need to list all sub-dirs inside hdfs:///TestDirectory
using Scala, then iterate over them: for each sub-dir in the list, read
its part files and attach the schema that corresponds to that
sub-directory (see the first sketch below).
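
A minimal sketch of case 1, assuming the part files are JSON and that
sc / sqlContext come from the spark-shell; the schema map is
illustrative, using the columns from your mail:

import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.sql.types._

// Parent directory that holds one sub-dir per schema.
val parent = new Path("hdfs:///TestDirectory")
val fs = FileSystem.get(sc.hadoopConfiguration)

// Illustrative map from sub-dir name to its known schema.
val schemas = Map(
  "spark1" -> StructType(Seq(
    StructField("carname", StringType),
    StructField("model", StringType),
    StructField("year", IntegerType))),
  "spark2" -> StructType(Seq(
    StructField("carowner", StringType),
    StructField("city", StringType),
    StructField("carcost", DoubleType))))

// List only the directories directly under the parent, then build
// one DataFrame per sub-dir using that sub-dir's schema.
val dataFrames = fs.listStatus(parent)
  .filter(_.isDirectory)
  .map(_.getPath)
  .flatMap { dir =>
    schemas.get(dir.getName).map { schema =>
      dir.getName -> sqlContext.read.schema(schema).json(dir.toString)
    }
  }.toMap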

2) If you don't know the sub-directory names:
    You need to store the schema somewhere inside each sub-directory and
read it during the iteration (see the second sketch below).
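
A minimal sketch of case 2, assuming each job also writes its schema
(df.schema.json) to a file named _schema.json inside its sub-dir; the
file name is my own choice, and a leading "_" means Spark skips it when
reading the data files:

import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.sql.types.{DataType, StructType}
import scala.io.Source

val parent = new Path("hdfs:///TestDirectory")
val fs = FileSystem.get(sc.hadoopConfiguration)

val dataFrames = fs.listStatus(parent)
  .filter(_.isDirectory)
  .map(_.getPath)
  .map { dir =>
    // Read the schema stored alongside the part files and
    // deserialize it back into a StructType.
    val in = fs.open(new Path(dir, "_schema.json"))
    val schemaJson = try Source.fromInputStream(in).mkString finally in.close()
    val schema = DataType.fromJson(schemaJson).asInstanceOf[StructType]
    dir.getName -> sqlContext.read.schema(schema).json(dir.toString)
  }.toMap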

On Fri, Feb 19, 2016 at 3:44 PM, Divya Gehlot <divya.htco...@gmail.com>
wrote:

> Hi,
> I have a use case ,where I have one parent directory
>
> File structure looks like
> hdfs:///TestDirectory/spark1/part files (created by some spark job)
> hdfs:///TestDirectory/spark2/part files (created by some spark job)
>
> spark1 and spark2 have different schemas,
>
> e.g. the spark1 part files schema:
> carname model year
>
> and the spark2 part files schema:
> carowner city carcost
>
>
> As the spark1 and spark2 directories get created dynamically, there can
> also be a spark3 directory with a different schema.
>
> My requirement is to read the parent directory, list its sub-directories,
> and create a dataframe for each sub-directory.
>
> I am not able to figure out how I can list the sub-directories under the
> parent directory and dynamically create the dataframes.
>
> Thanks,
> Divya
>
