I'm running a Drill query with a WHERE clause on a non-partitioned column via the Hive storage plugin. The query inspects all partitions (kind of expected), but when I run the same query in Hive I can see the predicate passed down into the query plan. This particular query is much faster in Hive than in Drill. BTW, these are Parquet files.
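For comparison, the Drill side of the plan can be pulled with Drill's EXPLAIN PLAN FOR. This is only a sketch: the plugin name `hive` and the exact SELECT list are assumptions for illustration, and only `my_table`, `my_column`, and the value 123 come from the plan I'm quoting.

```sql
-- Hypothetical query shape; `hive` is an assumed storage plugin name.
-- Shows the Drill plan so the scan and filter can be compared with Hive's.
EXPLAIN PLAN FOR
SELECT my_column
FROM hive.my_table
WHERE my_column = 123;
```

If the filter only shows up as a separate Filter operator above the scan, and not inside the scan itself, that would suggest the predicate is not being pushed down to the reader.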
Hive:

  Stage-0
    Fetch Operator
      limit:-1
      Select Operator [SEL_2]
        outputColumnNames:["_col0"]
        Filter Operator [FIL_4]
          predicate:(my_column = 123) (type: boolean)
          TableScan [TS_0]
            alias:my_table

Any idea why this is? My guess is that Hive stores Hive-specific info in the Parquet file, since the file was created through Hive, although it seems the Drill Hive plugin should honor this too. I'm not sure, but I'm willing to look through the code if someone can point me in the right direction. Thanks!
