What size are the row groups in your parquet files?  How many columns and
rows in the files?
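
For scale, here is a quick back-of-the-envelope check of the figures in your message. It assumes "6GB" means 6 GiB and "about 10 minutes" means 600 seconds, because the message gives only rounded figures:

```java
public class ParquetScale {
    static final double MIB = 1024.0 * 1024.0;

    // Average file size in MiB.
    public static double avgFileMiB(long totalBytes, int fileCount) {
        return totalBytes / (double) fileCount / MIB;
    }

    // Aggregate read throughput in MiB per second.
    public static double throughputMiBPerSec(long totalBytes, double elapsedSeconds) {
        return totalBytes / elapsedSeconds / MIB;
    }

    // Average wall-clock seconds each thread spends per file,
    // assuming the files are spread evenly across the threads.
    public static double secondsPerFilePerThread(int fileCount, int threads, double elapsedSeconds) {
        return elapsedSeconds * threads / fileCount;
    }

    public static void main(String[] args) {
        long totalBytes = 6L * 1024 * 1024 * 1024; // "6GB", assumed to be 6 GiB
        int fileCount = 4096;
        int threads = 100;
        double elapsedSeconds = 10 * 60;           // "about 10 minutes"

        System.out.printf("average file size:     %.2f MiB%n", avgFileMiB(totalBytes, fileCount));
        System.out.printf("aggregate throughput:  %.2f MiB/s%n", throughputMiBPerSec(totalBytes, elapsedSeconds));
        System.out.printf("time per file/thread:  %.1f s%n", secondsPerFilePerThread(fileCount, threads, elapsedSeconds));
    }
}
```

This works out to an average file size of 1.50 MiB, an aggregate rate of about 10.24 MiB/s, and roughly 14.6 s of wall-clock time per file on each thread. Files that small may hold only small row groups, so the row-group layout is worth checking.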

On Sat, Jul 1, 2023, 6:08 PM Paulo Motta <[email protected]> wrote:

> Hi,
>
> I'm trying to read 4096 parquet files with a total size of 6GB using this
> cookbook:
> https://arrow.apache.org/cookbook/java/dataset.html#query-parquet-file
>
> I'm using 100 threads, each thread processing one file at a time, on a 72-core
> machine with a 32GB heap. The files are pre-loaded in memory.
>
> However, it's taking about 10 minutes to process these 4096 files with a
> total size of only 6GB, and the process seems to be CPU-bound.
>
> Is this expected read performance for parquet files or am I
> doing something wrong? Any help or tips would be appreciated.
>
> Thanks,
>
> Paulo
>
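
The threading arrangement described above can be sketched with `java.util.concurrent`. It uses a fixed pool of 100 threads, one task per file, and files already held in memory. `readFile` is a hypothetical stand-in for the per-file Arrow Dataset scan from the cookbook. Here it only sums the bytes so that the sketch runs on its own. It is not the actual Arrow code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelFileReader {
    // Hypothetical stand-in for the per-file Arrow Dataset scan; it only
    // sums the bytes of an in-memory file so the sketch is self-contained.
    static long readFile(byte[] file) {
        long sum = 0;
        for (byte b : file) {
            sum += b;
        }
        return sum;
    }

    // Submits one task per file to a fixed-size pool and combines the results.
    public static long readAll(List<byte[]> files, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> results = new ArrayList<>();
            for (byte[] file : files) {
                Callable<Long> task = () -> readFile(file);
                results.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> f : results) {
                total += f.get(); // blocks until that file has been processed
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for results", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a file read failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Synthetic "pre-loaded" files: 4096 of them, each three bytes long.
        List<byte[]> files = new ArrayList<>();
        for (int i = 0; i < 4096; i++) {
            files.add(new byte[] {1, 2, 3});
        }
        System.out.println(readAll(files, 100)); // prints 24576
    }
}
```

With 100 threads on 72 cores, more tasks than cores may be runnable at once. This matters if the per-file work is CPU-bound, as the message suggests.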
