Querying Parquet: Filtering on a sorted column

Dan Wild Fri, 01 Jul 2016 12:49:15 -0700

Hi,

I'm attempting to query a directory of Parquet files that are partitioned
on column A (int) and sorted on column B (also int).  When I run a query
such as SELECT * FROM mydirectory WHERE A = 123 AND B = 456, I can see that
the physical query plan uses the criteria on A to choose the correct
Parquet file, but it performs a ParquetGroupScan on ALL rows in that
file despite the criteria on the sorted column B.


Based on my understanding of Parquet, Drill should be using the page and/or
column metadata to avoid scanning the entire file when filtering on a
sorted column.  However, filtering on column B is no faster than filtering
on any non-sorted column.

Is there something I can do to make Drill take advantage of the fact that
my file is sorted?

Thanks,
Dan
