I have a large number of directories under a common root:

    batch-1/file1.txt
    batch-1/file2.txt
    batch-1/file3.txt
    ...
    batch-2/file1.txt
    batch-2/file2.txt
    batch-2/file3.txt
    ...
    batch-N/file1.txt
    batch-N/file2.txt
    batch-N/file3.txt
    ...
I would like to read them into an RDD like this:

    {
      "batch-1" : [ content1, content2, content3, ... ]
      "batch-2" : [ content1, content2, content3, ... ]
      ...
      "batch-N" : [ content1, content2, content3, ... ]
    }

Thank you,
Oleg


On 1 June 2014 17:00, Nicholas Chammas <nicholas.cham...@gmail.com> wrote:

> Could you provide an example of what you mean?
>
> I know it's possible to create an RDD from a path with wildcards, like in
> the subject.
>
> For example, sc.textFile('s3n://bucket/2014-??-??/*.gz'). You can also
> provide a comma-delimited list of paths.
>
> Nick
>
> On Sunday, 1 June 2014, Oleg Proudnikov <oleg.proudni...@gmail.com> wrote:
>
>> Hi All,
>>
>> Is it possible to create an RDD from a directory tree of the following
>> form?
>>
>> RDD[(PATH, Seq[TEXT])]
>>
>> Thank you,
>> Oleg
>>
>> --
>> Kind regards,
>> Oleg
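One possible way to get the { "batch-N" : [contents...] } shape is sketched below in PySpark, to match the sc.textFile('...') style used in the thread. It is not taken from the thread itself. It relies on SparkContext.wholeTextFiles, which returns one (full_path, whole_file_content) pair per file and accepts the same wildcard paths as textFile. The helper names batch_of and load_batches are hypothetical, and the "root/*/*" glob assumes the files sit exactly one directory level below the common root.

```python
import posixpath


def batch_of(path):
    """Return the name of the directory that directly contains `path`.

    Hypothetical helper: 'hdfs://nn/root/batch-7/file3.txt' -> 'batch-7'.
    URIs use '/' separators, so posixpath is used rather than os.path.
    """
    return posixpath.basename(posixpath.dirname(path))


def load_batches(sc, root):
    """Build an RDD of (batch_dir_name, [file contents]) from root/*/*.

    Hypothetical helper. `sc` is an existing pyspark.SparkContext and
    `root` is the common root directory (local, HDFS, S3, ...).
    Assumes every file lives exactly one level below `root`.
    """
    # One record per file: (full path of the file, entire file as a string).
    files = sc.wholeTextFiles(root.rstrip("/") + "/*/*")
    return (files
            # Re-key each file by the directory it lives in.
            .map(lambda kv: (batch_of(kv[0]), kv[1]))
            # Collect all contents for the same directory together.
            .groupByKey()
            # groupByKey yields iterables; turn them into plain lists.
            .mapValues(list))
```

Usage would look like `batches = load_batches(sc, "hdfs:///data/root")`, after which `batches.collectAsMap()` gives a dict keyed by "batch-1" ... "batch-N" (only sensible for small data). Two caveats: wholeTextFiles reads each file into memory as a single string, so it suits many small files rather than a few huge ones, and the order of contents within each list after groupByKey is not guaranteed. To key by the full directory path, as in RDD[(PATH, Seq[TEXT])], replace batch_of with posixpath.dirname.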