Hi All,
Just following up on the message below to see if anyone has any suggestions.
I appreciate your help in advance.
Thanks,
Rishi
On Mon, Jun 1, 2020 at 9:33 AM Rishi Shah wrote:
Hi All,
I use the following to read a set of parquet file paths when files are
scattered across many many partitions.
paths = ['p1', 'p2', ..., 'pn']
df = spark.read.parquet(*paths)
The above method feels like it is reading those files sequentially rather
than parallelizing the read operation. Is there a better way to do this?
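For what it's worth, a sketch of how the path list could be built programmatically rather than hand-written, assuming a hypothetical date-partitioned layout like `/data/events/date=YYYY-MM-DD` (the base directory, partition column, and helper name are all illustrative, not from the original message). The commented Spark lines at the end show the two read styles this question is about; note that Spark distributes the actual column reads across executors in both cases, and only the file listing happens on the driver:

```python
from datetime import date, timedelta

# Hypothetical helper: build partition paths for a date range instead of
# writing out a long literal list. Layout is an assumption for illustration.
def partition_paths(base, start, end):
    days = (end - start).days + 1
    return [f"{base}/date={start + timedelta(d)}" for d in range(days)]

paths = partition_paths("/data/events", date(2020, 5, 1), date(2020, 5, 3))
# paths is a list of three strings, one per day in the range

# Two ways to read them in Spark (sketch only, requires a SparkSession):
#   df = spark.read.parquet(*paths)                      # explicit path list
#   df = (spark.read.option("basePath", "/data/events")  # partition discovery
#              .parquet("/data/events"))
```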