Hi,

  We are using Drill 1.4 with hundreds of thousands of Parquet files. We are
exploring ways to speed up our queries.

Is it possible to have the .drill.parquet.metadata files include min-max
values for numerical columns?
And could a query then take advantage of that min-max metadata to determine
which folders/files to drill into when filtering on these numerical
columns?

Example:
working with a schema that contains an INT column called QTY

data\.drill.parquet.metadata      -> COLUMN QTY min: 1000 max: 5000

data\0\1000 parquet files...
data\0\.drill.parquet.metadata    -> COLUMN QTY min: 1000 max: 2000

data\1\1000 parquet files...
data\1\.drill.parquet.metadata    -> COLUMN QTY min: 3000 max: 5000


Doing a select where QTY < 999 would hit no files at all.
Doing a select where QTY > 1 and QTY < 1500 would hit files in data\0\ only.
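
To make the idea concrete, here is a rough Python sketch of the pruning
logic I have in mind. It is not Drill code and does not reflect how the
metadata files are actually laid out; ColumnStats, may_match and
folders_to_scan are made-up names, and the min/max values are simply the
ones from the example above.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ColumnStats:
    """Hypothetical per-folder min/max for the QTY column."""
    min: int
    max: int


def may_match(stats: ColumnStats,
              lower: Optional[int] = None,
              upper: Optional[int] = None) -> bool:
    """True if some value v in [stats.min, stats.max] could satisfy
    lower < v < upper (both bounds exclusive, None means unbounded)."""
    if upper is not None and stats.min >= upper:
        return False  # every value in the folder is too large
    if lower is not None and stats.max <= lower:
        return False  # every value in the folder is too small
    return True


def folders_to_scan(folders: Dict[str, ColumnStats],
                    lower: Optional[int] = None,
                    upper: Optional[int] = None) -> List[str]:
    """Keep only the folders whose min-max range overlaps the filter."""
    return [path for path, stats in folders.items()
            if may_match(stats, lower, upper)]


# Min/max values taken from the example layout above.
folders = {
    "data\\0": ColumnStats(min=1000, max=2000),
    "data\\1": ColumnStats(min=3000, max=5000),
}

print(folders_to_scan(folders, upper=999))            # QTY < 999
print(folders_to_scan(folders, lower=1, upper=1500))  # QTY > 1 and QTY < 1500
```

The same check could be applied first at the top-level folder and then
per file inside each folder that survives, so whole subtrees are skipped
early.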


Thanks

François
