I'm not familiar with the "parquet block size". However, you can use row groups to accomplish this task: you could write a single 10 GB file with 5 row groups, and when reading, the Arrow readers allow you to specify which row groups you would like to read.
On Wed, Jun 7, 2023 at 6:13 AM Sanskar Modi <[email protected]> wrote:
> Hi everyone,
>
> We have a use case where we're writing a parquet file to a remote server
> and we want to read this parquet file using arrow. But we want multiple
> hosts to read splits of the parquet file based on parquet block size.
>
> Ex: If the parquet file size is 10 GB, we want 5 hosts to read a 2 GB
> split of the parquet file. This is possible if we read via native
> ParquetReader but from documentation, it is not clear if arrow readers
> support this. Can someone help with this?
>
> Regards,
> Sanskar Modi
