There is, but it's not exposed in Python yet. See the "batch_size"
parameter of ArrowReaderProperties

https://github.com/apache/arrow/blob/master/cpp/src/parquet/properties.h#L565

and the GetRecordBatchReader method on parquet::arrow::FileReader.
There's some related work happening in the C++ Datasets project.

I'd like to see batch-based reading refined and better documented in
both C++ and Python. This would be a nice project for a volunteer to
take on.

- Wes

On Sun, Dec 8, 2019 at 9:00 PM Zhuo Jia Dai <[email protected]> wrote:
>
> For example, pandas's read_csv has a chunksize argument which allows
> read_csv to return an iterator over the CSV file so we can read it in
> chunks.
>
> The Parquet format stores the data in chunks, but there isn't a documented
> way to read it in chunks like read_csv.
>
> Is there a way to read Parquet files in chunks?
>
> --
> ZJ
>
> [email protected]
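
For anyone picking this up, here is a minimal pure-Python sketch of what a
batch-size setting on a chunked reader does conceptually. It is not the
Arrow or Parquet API: iter_batches and the list-of-lists stand-in for the
chunks stored in a file are hypothetical names used only for illustration.
The sketch consumes one stored chunk at a time and yields batches of at most
batch_size rows, much as read_csv's chunked iterator does. Letting a batch
span two stored chunks is a choice made here for simplicity, and a real
reader may behave differently.

```python
from typing import Iterable, Iterator, List, Sequence


def iter_batches(row_groups: Iterable[Sequence], batch_size: int) -> Iterator[List]:
    """Yield lists of at most batch_size rows drawn from successive chunks.

    Hypothetical helper: row_groups stands in for the chunks stored in a
    file; a real reader would decode them from disk one at a time.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    pending: List = []
    for group in row_groups:  # only one stored chunk is consumed at a time
        for row in group:
            pending.append(row)
            if len(pending) == batch_size:
                yield pending
                pending = []
    if pending:  # final, possibly short, batch
        yield pending


# Toy data: three stored chunks of uneven size (an assumption for the demo).
row_groups = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
batches = list(iter_batches(row_groups, batch_size=4))
print(batches)
```

Because iter_batches is a generator, a caller that loops over it directly
only ever holds one batch in memory, which is the point of chunked reading.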
