What version of pyarrow are you using? What's your OS? Is the file on a local disk or S3? How many row groups are in your file?
A difference of that much is not expected. However, the two do use different infrastructure under the hood. Do you also get the faster performance with pq.read_table(use_legacy_dataset=True)?

On Wed, Feb 23, 2022, 7:07 PM Shawn Zeng <[email protected]> wrote:

> Hi all, I found that for the same parquet file,
> pq.ParquetFile(file_name).read() takes 6s while
> pq.read_table(file_name) takes 17s. How do those two APIs differ? I
> thought they used the same internals, but it seems not. The parquet file
> is 865MB, snappy compression and dictionary encoding enabled. All other
> settings are default, writing with pyarrow.
