Hi, I would like to verify that my understanding of parquet filter pushdown in Drill (https://drill.apache.org/docs/parquet-filter-pushdown/) is correct.
Is it correct that Drill does not support predicate push-down for string fields when dictionary-based string encoding is enabled? (It looks like Presto can do this.)

We save a lot of space using dictionary encoding (not enabled by default in Drill 1.10). If my understanding of how it works is correct, the segment dictionary could be used to determine whether a value is present in a segment, or whether that segment can be pruned/skipped, when filtering on columns that are compressed/encoded using a dictionary. I may be misunderstanding how this works, and perhaps the dictionary is created for the file as a whole rather than for individual sections, but I do know that min/max values would not be a good way to determine whether a segment needs to be scanned.

I was hoping we could partition on field(s) with lower cardinality to get typical partition pruning, and then sort the contents of individual fields by session/customer ID (these include alphanumeric characters here), so that each segment would contain only a relatively low number of unique values. That would facilitate "segment pruning" when looking for data belonging to individual sessions/customers.

Best regards,
-Stefán Baxter
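
P.S. To make the idea concrete, here is a rough sketch of what I mean by "segment pruning". It is plain Python and does not use any Drill or Parquet API: the segments are just hypothetical lists of made-up IDs, and the two helper functions are names I invented for illustration. It only shows why I expect a dictionary check to skip more segments than a min/max check for this kind of data.

```python
# Hypothetical illustration only: each "segment" is a plain list of string IDs.
# Nothing here is Drill or Parquet API; the IDs and helper names are made up.

def min_max_may_contain(values, target):
    """Min/max statistics can only rule out targets outside the [min, max] range."""
    return min(values) <= target <= max(values)

def dictionary_may_contain(values, target):
    """A dictionary lists the distinct values, so absence can be ruled out exactly."""
    return target in set(values)

segments = [
    ["a1f9", "a1f9", "c77b", "zz01"],  # wide range, few distinct IDs
    ["b200", "b200", "b201", "b201"],  # narrow range
    ["a000", "m555", "m555", "z999"],  # the only segment holding the target
]

target = "m555"

# Indexes of the segments that would still have to be scanned.
by_min_max = [i for i, seg in enumerate(segments) if min_max_may_contain(seg, target)]
by_dictionary = [i for i, seg in enumerate(segments) if dictionary_may_contain(seg, target)]

print("min/max keeps segments:", by_min_max)
print("dictionary keeps segments:", by_dictionary)
```

In this toy data the min/max check still has to scan the first segment, because the target falls inside its range even though it is not present, while the dictionary check keeps only the segment that actually contains it.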
