Rahul,

Thanks for the details. Are there any plans to support filter pushdown for #1? Also, do you know whether running analyze stats through Hive on a parquet file would give it enough info to do the pushdown?
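To check my understanding of why the stats matter for #2, here is a rough conceptual sketch of how per-row-group min/max statistics let a reader skip data for a predicate like my_column = 123. The class and function names are hypothetical and this is not Drill's actual implementation, just the general idea as I understand it:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RowGroupStats:
    """Hypothetical stand-in for the min/max stats a Parquet footer can carry."""
    name: str
    min_value: int
    max_value: int
    has_stats: bool = True


def row_groups_to_scan(groups: List[RowGroupStats], target: int) -> List[str]:
    """Return the row groups that could contain rows where column == target."""
    keep = []
    for g in groups:
        # Without stats nothing can be ruled out, so the group must be read.
        if not g.has_stats or g.min_value <= target <= g.max_value:
            keep.append(g.name)
    return keep


groups = [
    RowGroupStats("rg0", 1, 100),
    RowGroupStats("rg1", 101, 200),
    RowGroupStats("rg2", 0, 0, has_stats=False),  # written without stats
]
selected = row_groups_to_scan(groups, 123)
print(selected)
```

If that picture is right, files written without stats would always be scanned in full, which is why I'm asking whether the Hive-side stats end up anywhere the reader can use them.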
Thanks again.

On Mon, Nov 14, 2016 at 9:50 AM, rahul challapalli <[email protected]> wrote:

> Sonny,
>
> If the underlying data in the hive table is in parquet format, there are 3
> ways to query from drill :
>
> 1. Using the hive plugin : This does not support filter pushdown for any
> formats (ORC, Parquet, Text...etc)
> 2. Directly Querying the folder in maprfs/hdfs which contains the parquet
> files using DFS plugin: With DRILL-1950, we can now do a filter pushdown
> into the parquet files. In order to take advantage of this feature, the
> underlying parquet files should have the relevant stats. This feature will
> only be available with the 1.9.0 release
> 3. Using the drill's native parquet reader in conjunction with the hive
> plugin (See store.hive.optimize_scan_with_native_readers) : This allows
> drill to fetch all the metadata about the hive table from the metastore and
> then drill uses its own parquet reader for actually reading the files. This
> approach currently does not support parquet filter pushdown but this might
> be added in the next release after 1.9.0.
>
> - Rahul
>
> On Sun, Nov 13, 2016 at 11:06 AM, Sonny Heer <[email protected]> wrote:
>
> > I'm running a drill query with a where clause on a non-partitioned column
> > via hive storage plugin. This query inspects all partitions (kind of
> > expected), but when i run the same query in Hive I can see a predicate
> > passed down to the query plan. This particular query is much faster in
> > Hive vs Drill. BTW these are parquet files.
> >
> > Hive:
> >
> > Stage-0
> > Fetch Operator
> > limit:-1
> > Select Operator [SEL_2]
> > outputColumnNames:["_col0"]
> > Filter Operator [FIL_4]
> > predicate:(my_column = 123) (type: boolean)
> > TableScan [TS_0]
> > alias:my_table
> >
> > Any idea on why this is? My guess is Hive is storing hive specific info in
> > the parquet file since it was created through Hive. Although it seems
> > drill-hive plugin should honor this to. Not sure, but willing to look
> > through code if someone can point me in the right direction. Thanks!
> >
> > --
> >
>

--
Pushpinder S. Heer
Senior Software Engineer
m: 360-434-4354 h: 509-884-2574
