I have a large number of directories under a common root:

batch-1/file1.txt
batch-1/file2.txt
batch-1/file3.txt
...
batch-2/file1.txt
batch-2/file2.txt
batch-2/file3.txt
...
batch-N/file1.txt
batch-N/file2.txt
batch-N/file3.txt
...

I would like to read them into an RDD keyed by directory name, with the contents of every file in that directory as the value:

{
"batch-1" : [content1, content2, content3, ...],
"batch-2" : [content1, content2, content3, ...],
...
"batch-N" : [content1, content2, content3, ...]
}

Thank you,
Oleg
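A minimal PySpark sketch of the grouping described above. It assumes that `SparkContext.wholeTextFiles` is available, which returns one (full path, entire file content) pair per file. It also assumes that all batch directories sit directly under a single root and match `batch-*`. The helper names `batch_key`, `sorted_contents` and `load_batches` are hypothetical, and so is the root path used below.

```python
import posixpath


def batch_key(path):
    """Name of the directory that directly contains `path`.

    'hdfs://host/root/batch-1/file1.txt' -> 'batch-1'
    """
    return posixpath.basename(posixpath.dirname(path))


def sorted_contents(path_content_pairs):
    """Order one batch's (path, content) pairs by path and keep only the contents."""
    return [content for _, content in sorted(path_content_pairs)]


def load_batches(sc, root):
    """RDD of (batch name, [file contents]) built from root/batch-*/* (assumed layout).

    sc.wholeTextFiles yields one (full path, whole file content) pair per file.
    """
    pairs = sc.wholeTextFiles(root.rstrip("/") + "/batch-*/*")
    return (pairs
            .map(lambda pc: (batch_key(pc[0]), pc))  # key each file by its parent directory
            .groupByKey()                             # gather all files of one batch
            .mapValues(sorted_contents))              # fixed order by path, contents only
```

Usage might look like `load_batches(sc, "hdfs://namenode/data").collectAsMap()`, which would give a dict shaped like the one above. Two points to keep in mind. First, `groupByKey` puts an entire batch into a single record, so a very large batch has to fit in one executor's memory. Second, sorting by path is lexicographic, which means `file10.txt` comes before `file2.txt`.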



On 1 June 2014 17:00, Nicholas Chammas <nicholas.cham...@gmail.com> wrote:

> Could you provide an example of what you mean?
>
> I know it's possible to create an RDD from a path with wildcards, like in
> the subject.
>
> For example, sc.textFile('s3n://bucket/2014-??-??/*.gz'). You can also
> provide a comma-delimited list of paths.
>
> Nick
>
> On Sunday, 1 June 2014, Oleg Proudnikov <oleg.proudni...@gmail.com> wrote:
>
>> Hi All,
>>
>> Is it possible to create an RDD of the following form from a directory
>> tree?
>>
>> RDD[(PATH, Seq[TEXT])]
>>
>> Thank you,
>> Oleg
>>
>>
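Nick mentions a comma-delimited list of paths. The sketch below builds one glob per batch directory and joins them with commas so the result can go to `sc.textFile`. The helpers `comma_paths` and `read_lines` and the `*.txt` pattern are hypothetical. Note that `sc.textFile` returns individual lines, so this approach on its own does not record which directory each line came from.

```python
def comma_paths(root, batch_names):
    """Join one glob per batch directory into a single comma-delimited path string."""
    base = root.rstrip("/")
    return ",".join(f"{base}/{name}/*.txt" for name in batch_names)


def read_lines(sc, root, batch_names):
    # sc.textFile accepts the comma-delimited string; the result is an RDD of lines
    # with no record of the file or directory each line came from.
    return sc.textFile(comma_paths(root, batch_names))
```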


-- 
Kind regards,

Oleg
