Sonny,

If the underlying data in the Hive table is in Parquet format, there are 3 ways to query it from Drill:
1. Using the Hive plugin: this does not support filter pushdown for any format (ORC, Parquet, text, etc.).

2. Directly querying the folder in MapR-FS/HDFS that contains the Parquet files using the DFS plugin: with DRILL-1950, Drill can now push the filter down into the Parquet files. To take advantage of this, the underlying Parquet files need to have the relevant stats. This feature will only be available with the 1.9.0 release. (A rough example query is at the end of this mail.)

3. Using Drill's native Parquet reader in conjunction with the Hive plugin (see store.hive.optimize_scan_with_native_readers): this lets Drill fetch all the metadata about the Hive table from the metastore and then use its own Parquet reader to actually read the files. This approach currently does not support Parquet filter pushdown, but that might be added in the next release after 1.9.0. (There is an example of this at the end of this mail too.)

- Rahul

On Sun, Nov 13, 2016 at 11:06 AM, Sonny Heer <[email protected]> wrote:

> I'm running a Drill query with a WHERE clause on a non-partitioned column
> via the Hive storage plugin. This query inspects all partitions (kind of
> expected), but when I run the same query in Hive I can see the predicate
> pushed down in the query plan. This particular query is much faster in
> Hive vs Drill. BTW these are Parquet files.
>
> Hive:
>
>   Stage-0
>     Fetch Operator
>       limit:-1
>       Select Operator [SEL_2]
>         outputColumnNames:["_col0"]
>         Filter Operator [FIL_4]
>           predicate:(my_column = 123) (type: boolean)
>           TableScan [TS_0]
>             alias:my_table
>
> Any idea why this is? My guess is Hive is storing Hive-specific info in
> the Parquet file since it was created through Hive, although it seems the
> Drill-Hive plugin should honor this too. Not sure, but willing to look
> through the code if someone can point me in the right direction. Thanks!
>
> --
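For reference, here is roughly what options 2 and 3 look like in practice. The path below is made up (point it at whatever directory actually holds the table's Parquet files), and my_table / my_column are just the names from your plan, so adjust everything to your environment:

    -- Option 2: query the Parquet directory directly through the dfs plugin
    -- (filter pushdown into the scan needs Drill 1.9.0 and Parquet files with stats)
    SELECT my_column
    FROM dfs.`/user/hive/warehouse/my_table`
    WHERE my_column = 123;

    -- Option 3: keep using the Hive plugin, but let Drill read the Parquet
    -- files with its own native reader
    ALTER SESSION SET `store.hive.optimize_scan_with_native_readers` = true;
    SELECT my_column
    FROM hive.my_table
    WHERE my_column = 123;

In either case you can run EXPLAIN PLAN FOR <query> and check whether the filter shows up inside the Parquet scan or only as a separate Filter operator above it.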
