I'm not familiar with a "parquet block size".  However, you can use row
groups to accomplish this task.  You could write a single 10 GB file with 5
row groups.  Then, when reading, the Arrow readers let you specify
which row groups you would like to read, so each host can read only its
assigned split.

On Wed, Jun 7, 2023 at 6:13 AM Sanskar Modi <[email protected]> wrote:

> Hi everyone,
>
> We have a use case where we're writing a parquet file to a remote server
> and we want to read this parquet file using arrow. But we want multiple
> hosts to read splits of the parquet file based on parquet block size.
>
> Ex: If the parquet file size is 10 GB, we want 5 hosts to read a 2 GB
> split of the parquet file. This is possible if we read via native
> ParquetReader but from documentation, it is not clear if arrow readers
> support this. Can someone help with this?
>
> Regards,
> Sanskar Modi
>

Reply via email to