What size are the row groups in your parquet files? How many columns and rows are in the files?
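
For scale, here is a rough back-of-the-envelope sketch using only the figures from your message (4096 files, 6GB total, about 10 minutes). The class and method names are made up for illustration, and I'm assuming "GB" means GiB. It only does the arithmetic; it does not measure anything.

```java
public class ReadThroughput {
    // Figures quoted in the original message; GB is assumed to mean GiB.
    static final long TOTAL_BYTES = 6L * 1024 * 1024 * 1024;
    static final int FILE_COUNT = 4096;
    static final int ELAPSED_SECONDS = 10 * 60;

    private static final double MIB = 1024.0 * 1024.0;

    // Average size of one file in MiB.
    static double averageFileSizeMiB() {
        return TOTAL_BYTES / (double) FILE_COUNT / MIB;
    }

    // Aggregate read rate across all threads in MiB/s.
    static double throughputMiBPerSecond() {
        return TOTAL_BYTES / (double) ELAPSED_SECONDS / MIB;
    }

    // Files completed per second across all threads.
    static double filesPerSecond() {
        return FILE_COUNT / (double) ELAPSED_SECONDS;
    }

    public static void main(String[] args) {
        System.out.printf("average file size: %.2f MiB%n", averageFileSizeMiB());
        System.out.printf("aggregate throughput: %.2f MiB/s%n", throughputMiBPerSecond());
        System.out.printf("files per second: %.2f%n", filesPerSecond());
    }
}
```

That works out to roughly 1.5 MiB per file, about 10 MiB/s in aggregate, and about 7 files per second. With files that small, how each one is laid out (row groups, column count) may matter more than raw bandwidth, hence the questions above.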

On Sat, Jul 1, 2023, 6:08 PM Paulo Motta <[email protected]> wrote:

> Hi,
>
> I'm trying to read 4096 parquet files with a total size of 6GB using this
> cookbook:
> https://arrow.apache.org/cookbook/java/dataset.html#query-parquet-file
>
> I'm using 100 threads, each thread processing one file at a time on a 72
> core machine with 32GB heap. The files are pre-loaded in memory.
>
> However it's taking about 10 minutes to process these 4096 files with a
> total size of only 6GB and the process seems to be cpu-bound.
>
> Is this expected read performance for parquet files or am I
> doing something wrong? Any help or tips would be appreciated.
>
> Thanks,
>
> Paulo
