Perfect. 

BTW just so I know where to look next time, was that in some docs?

On Apr 28, 2014, at 7:04 PM, Nicholas Chammas <nicholas.cham...@gmail.com> wrote:

Yep, as I just found out, you can also provide sc.textFile() with a 
comma-delimited string of all the files you want to load.

For example:

sc.textFile('/path/to/file1,/path/to/file2')

So once you have your list of files, join their paths with commas like that 
and pass the single string to textFile().
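
For the directory-walk case in the original question, a minimal sketch might look like the following. It assumes the files live on a local filesystem that os.walk can see (files on HDFS would need a different way of listing paths), and the helper names matching_paths and to_textfile_arg are made up for illustration; they are not part of Spark. Because textFile() treats commas as separators, the sketch rejects any path that itself contains a comma.

```python
import os
import re


def matching_paths(root, pattern):
    """Walk `root` recursively and return the sorted paths of files
    whose file name matches the regular expression `pattern`."""
    regex = re.compile(pattern)
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if regex.search(name):
                found.append(os.path.join(dirpath, name))
    return sorted(found)


def to_textfile_arg(paths):
    """Join paths into the single comma-delimited string described above.

    A path containing a comma would be read as two separate paths,
    so refuse it instead of building a misleading string.
    """
    for path in paths:
        if "," in path:
            raise ValueError("path contains a comma: %r" % path)
    return ",".join(paths)


# With an existing SparkContext `sc` (not created in this sketch):
# rdd = sc.textFile(to_textfile_arg(matching_paths('/path/to/root', r'\.log$')))
```

The result is one RDD covering every matched file, so no explicit union of per-file RDDs is needed.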

Nick



On Mon, Apr 28, 2014 at 7:23 PM, Pat Ferrel <pat.fer...@gmail.com> wrote:
sc.textFile(URI) supports reading multiple files in parallel but only with a 
wildcard. I need to walk a dir tree, match a regex to create a list of files, 
then I’d like to read them into a single RDD in parallel. I understand these 
could go into separate RDDs then a union RDD can be created. Is there a way to 
create a single RDD from a URI list?

