
Re: [PySpark 2.3+] Reading parquet entire path vs a set of file paths

2020-06-03 Thread Rishi Shah

Hi All,

Just following up on the message below to see if anyone has any suggestions. I appreciate your help in advance.

Thanks,
Rishi

On Mon, Jun 1, 2020 at 9:33 AM Rishi Shah wrote:
> Hi All,
>
> I use the following to read a set of parquet file paths when files are
> scattered across many many

[PySpark 2.3+] Reading parquet entire path vs a set of file paths

2020-06-01 Thread Rishi Shah

Hi All,

I use the following to read a set of parquet file paths when files are scattered across many, many partitions:

    paths = ['p1', 'p2', ... 'p1']
    df = spark.read.parquet(*paths)

The above method seems to read those files sequentially rather than parallelizing the read operation. Is
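The two approaches named in the subject line (a set of file paths vs. the entire path) can be sketched in PySpark as below. This is a minimal sketch, assuming the data uses Hive-style partition directories (`<base>/<column>=<value>`); the helper names, base path, and column name are hypothetical and not taken from the thread. `DataFrameReader.parquet(*paths)`, the `basePath` read option, and `Column.isin` are standard PySpark APIs.

```python
from typing import Iterable, List


def partition_paths(base_path: str, column: str, values: Iterable) -> List[str]:
    """Hypothetical helper: build Hive-style partition directories.

    Assumes the dataset is laid out as <base_path>/<column>=<value>/.
    """
    base = base_path.rstrip("/")
    return [f"{base}/{column}={value}" for value in values]


def read_selected_paths(spark, base_path: str, column: str, values: Iterable):
    """Option 1: pass an explicit set of paths, as in the original question.

    Setting the 'basePath' option tells Spark's partition discovery where the
    table root is, so `column` stays in the schema; without it, each listed
    directory is treated as its own root and the partition column is not added.
    """
    paths = partition_paths(base_path, column, values)
    return spark.read.option("basePath", base_path).parquet(*paths)


def read_entire_path(spark, base_path: str, column: str, values: Iterable):
    """Option 2: read the whole directory and filter on the partition column,
    which should let Spark prune the partition directories that do not match."""
    from pyspark.sql.functions import col  # imported lazily; requires PySpark

    return spark.read.parquet(base_path).where(col(column).isin(list(values)))
```

With an existing `SparkSession` named `spark`, either `read_selected_paths(spark, "/data/events", "dt", days)` or `read_entire_path(spark, "/data/events", "dt", days)` returns a DataFrame over the same partitions; `/data/events`, `dt`, and `days` are placeholders. Comparing the two plans with `df.explain()` is one way to check how each read is actually executed.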