I could be wrong, but I think you can use a wildcard:
df = spark.read.format('csv').load('/path/to/file*.csv.gz')
Thank You,
Irving Duran
On Fri, May 4, 2018 at 4:38 AM Shuporno Choudhury <
shuporno.choudh...@gmail.com> wrote:
> Hi,
>
> I want to read multiple files in parallel into 1 dataframe
Hi,
I want to read multiple files in parallel into one DataFrame. But the files
have random names and don't conform to any pattern (so I can't use a
wildcard). Also, the files can be in different directories.
If I provide the file names in a list to the DataFrame reader, it reads
them sequentially.
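For what it's worth, `DataFrameReader.load` also accepts a Python list of paths, so you can glob each directory yourself and pass the combined list; Spark then partitions the read across the files rather than going strictly one by one. Below is a minimal sketch: the directory and file names are made up stand-ins for your "random" names, and the `spark.read` call at the end is left commented out since it needs a running SparkSession.

```python
import csv
import gzip
import os
import tempfile
from glob import glob

# Create two sample gzipped CSVs with arbitrary names in different
# directories (stand-ins for the randomly named files in the question).
root = tempfile.mkdtemp()
paths_written = []
for sub, name in [("dir1", "abc123.csv.gz"), ("dir2", "xyz987.csv.gz")]:
    d = os.path.join(root, sub)
    os.makedirs(d, exist_ok=True)
    p = os.path.join(d, name)
    with gzip.open(p, "wt", newline="") as f:
        csv.writer(f).writerows([["id", "val"], ["1", "a"]])
    paths_written.append(p)

# Gather the exact files you want, however irregular their names:
paths = sorted(glob(os.path.join(root, "*", "*.csv.gz")))

# DataFrameReader.load accepts a list of paths; the resulting DataFrame
# is partitioned across the files, so executors read them in parallel:
# df = spark.read.format("csv").option("header", "true").load(paths)
```

The list can mix files from any number of directories, so there's no need for the names to share a pattern.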