This is not currently supported. The "parquet filter pushdown" feature should be able to achieve this, but it's still under development.
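As a rough illustration of the idea behind filter pushdown (a minimal Python sketch, not Drill code): a reader can compare the filter value against per-row-group min/max statistics and skip any row group whose range cannot contain it. The `RowGroupStats` class, the `row_groups_to_scan` function, and the sample ranges are all hypothetical and exist only for this example.

```python
from dataclasses import dataclass


@dataclass
class RowGroupStats:
    """Hypothetical stand-in for the min/max statistics of column B in one row group."""
    index: int
    min_b: int
    max_b: int


def row_groups_to_scan(stats, target):
    """Return the indices of row groups whose [min_b, max_b] range could contain target."""
    return [s.index for s in stats if s.min_b <= target <= s.max_b]


# Because the file is sorted on B, the ranges of consecutive row groups do not overlap.
stats = [
    RowGroupStats(index=0, min_b=1, max_b=200),
    RowGroupStats(index=1, min_b=201, max_b=400),
    RowGroupStats(index=2, min_b=401, max_b=600),
]

# For WHERE B = 456, only the last row group needs to be read.
print(row_groups_to_scan(stats, 456))
```

On sorted data the ranges are disjoint, so an equality filter matches at most one row group. On unsorted data the ranges tend to overlap and far fewer row groups can be skipped.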
- Rahul

On Fri, Jul 1, 2016 at 12:10 PM, Dan Wild <dwild...@gmail.com> wrote:

> Hi,
>
> I'm attempting to query a directory of parquet files that are partitioned
> on column A (int) and sorted on column B (also int). When I run a query
> such as SELECT * FROM mydirectory WHERE A = 123 AND B = 456, I can see that
> the physical query plan is using the criteria on A to choose the correct
> parquet file, but it is performing a ParquetGroupScan on ALL rows in that
> file despite the criteria on the sorted column B.
>
> Based on my understanding of parquet, Drill should be using the page and/or
> column metadata to avoid scanning the entire file when filtering on a
> sorted column. However, there is no performance benefit when filtering on
> column B compared to any other non-sorted column.
>
> Is there something I can do to make Drill take advantage of the fact that
> my file is sorted?
>
> Thanks,
> Dan