Hi,

I'm new to the list so apologies up front if this is the wrong place to post this (glad to take input).

I converted a large set of CSV files to Parquet files using Drill. I tried this both with Snappy compression and uncompressed.

Subsequent queries such as 'select count(*) from dfs.`mydir` where `somecolumn` > 47;' always do 16k reads. According to Flight Recorder, these seem to come from the Page Header in the Parquet files.

Does anyone know a way to increase the 16k reads? I'm thinking about writing my own Parquet files, but thought I'd first ask whether there is a config option for this. Also, would writing my own Parquet files with bigger sizes in the Page Header help?

Thanks in advance,
Mark
