I didn't get any response or updates on this jira ticket ( DRILL-4665). Does anyone looking into this? On 11 May 2016 03:31, "Aman Sinha" <amansi...@apache.org> wrote:
> The Drill test team was able to repro this and is now filed as: > https://issues.apache.org/jira/browse/DRILL-4665 > > On Tue, May 10, 2016 at 8:16 AM, Aman Sinha <amansi...@apache.org> wrote: > > > This is supposed to work, especially since LIKE predicate is not even on > > the partitioning column (it should work either way). I did a quick test > > with file system tables and it works for LIKE conditions. Not sure yet > > about Hive tables. Could you pls file a JIRA and we'll follow up. > > Thanks. > > > > -Aman > > > > On Tue, May 10, 2016 at 1:09 AM, Shankar Mane < > shankar.m...@games24x7.com> > > wrote: > > > >> Problem: > >> > >> 1. In drill, we are using hive partition table. But explain plan (same > >> query) for like and = operator differs and used all partitions in case > of > >> like operator. > >> 2. If you see below drill explain plans: Like operator uses *all* > >> partitions where > >> = operator uses *only* partition filtered by log_date condition. > >> > >> FYI- We are storing our logs in hive partition table (parquet, > >> gz-compressed). Each partition is having ~15 GB data. Below is the > >> describe > >> statement output from hive: > >> > >> > >> /**************************************************************** Hive > >> > >> > ************************************************************************************/ > >> hive> desc hive_kafkalogs_daily ; > >> OK > >> col_name data_type comment > >> sessionid string > >> ajaxurl string > >> > >> log_date string > >> > >> # Partition Information > >> # col_name data_type comment > >> > >> log_date string > >> > >> > >> > >> > >> /***************************************************************** Drill > >> Plan (query with LIKE) > >> > >> > ***********************************************************************************/ > >> > >> explain plan for select sessionid, servertime, ajaxUrl from > >> hive.hive_kafkalogs_daily where log_date = '2016-05-09' and ajaxUrl like > >> '%utm_source%' limit 1 ; > >> > >> +------+------+ > >> | text | json | > >> +------+------+ > >> | 00-00 Screen > >> 00-01 Project(sessionid=[$0], servertime=[$1], ajaxUrl=[$2]) > >> 00-02 SelectionVectorRemover > >> 00-03 Limit(fetch=[1]) > >> 00-04 UnionExchange > >> 01-01 SelectionVectorRemover > >> 01-02 Limit(fetch=[1]) > >> 01-03 Project(sessionid=[$0], servertime=[$1], > >> ajaxUrl=[$2]) > >> 01-04 SelectionVectorRemover > >> 01-05 Filter(condition=[AND(=($3, '2016-05-09'), > >> LIKE($2, '%utm_source%'))]) > >> 01-06 Scan(groupscan=[HiveScan > >> [table=Table(dbName:default, tableName:hive_kafkalogs_daily), > >> columns=[`sessionid`, `servertime`, `ajaxurl`, `log_date`], > >> numPartitions=29, partitions= [Partition(values:[2016-04-11]), > >> Partition(values:[2016-04-12]), Partition(values:[2016-04-13]), > >> Partition(values:[2016-04-14]), Partition(values:[2016-04-15]), > >> Partition(values:[2016-04-16]), Partition(values:[2016-04-17]), > >> Partition(values:[2016-04-18]), Partition(values:[2016-04-19]), > >> Partition(values:[2016-04-20]), Partition(values:[2016-04-21]), > >> Partition(values:[2016-04-22]), Partition(values:[2016-04-23]), > >> Partition(values:[2016-04-24]), Partition(values:[2016-04-25]), > >> Partition(values:[2016-04-26]), Partition(values:[2016-04-27]), > >> Partition(values:[2016-04-28]), Partition(values:[2016-04-29]), > >> Partition(values:[2016-04-30]), Partition(values:[2016-05-01]), > >> Partition(values:[2016-05-02]), Partition(values:[2016-05-03]), > >> Partition(values:[2016-05-04]), Partition(values:[2016-05-05]), > >> Partition(values:[2016-05-06]), Partition(values:[2016-05-07]), > >> Partition(values:[2016-05-08]), Partition(values:[2016-05-09])], > >> > >> > inputDirectories=[hdfs://namenode:9000/usr/hive/warehouse/hive_kafkalogs_daily_20160411, > >> hdfs://namenode:9000/usr/hive/warehouse/hive_kafkalogs_daily_20160412, > >> hdfs://namenode:9000/usr/hive/warehouse/hive_kafkalogs_daily_20160413, > >> hdfs://namenode:9000/usr/hive/warehouse/hive_kafkalogs_daily_20160414, > >> hdfs://namenode:9000/usr/hive/warehouse/hive_kafkalogs_daily_20160415, > >> hdfs://namenode:9000/usr/hive/warehouse/hive_kafkalogs_daily_20160416, > >> hdfs://namenode:9000/usr/hive/warehouse/hive_kafkalogs_daily_20160417, > >> hdfs://namenode:9000/usr/hive/warehouse/hive_kafkalogs_daily_20160418, > >> hdfs://namenode:9000/usr/hive/warehouse/hive_kafkalogs_daily_20160419, > >> hdfs://namenode:9000/usr/hive/warehouse/hive_kafkalogs_daily_20160420, > >> hdfs://namenode:9000/usr/hive/warehouse/hive_kafkalogs_daily_20160421, > >> hdfs://namenode:9000/usr/hive/warehouse/hive_kafkalogs_daily_20160422, > >> hdfs://namenode:9000/usr/hive/warehouse/hive_kafkalogs_daily_20160423, > >> hdfs://namenode:9000/usr/hive/warehouse/hive_kafkalogs_daily_20160424, > >> hdfs://namenode:9000/usr/hive/warehouse/hive_kafkalogs_daily_20160425, > >> hdfs://namenode:9000/usr/hive/warehouse/hive_kafkalogs_daily_20160426, > >> hdfs://namenode:9000/usr/hive/warehouse/hive_kafkalogs_daily_20160427, > >> hdfs://namenode:9000/usr/hive/warehouse/hive_kafkalogs_daily_20160428, > >> hdfs://namenode:9000/usr/hive/warehouse/hive_kafkalogs_daily_20160429, > >> hdfs://namenode:9000/usr/hive/warehouse/hive_kafkalogs_daily_20160430, > >> hdfs://namenode:9000/usr/hive/warehouse/hive_kafkalogs_daily_20160501, > >> hdfs://namenode:9000/usr/hive/warehouse/hive_kafkalogs_daily_20160502, > >> hdfs://namenode:9000/usr/hive/warehouse/hive_kafkalogs_daily_20160503, > >> hdfs://namenode:9000/usr/hive/warehouse/hive_kafkalogs_daily_20160504, > >> hdfs://namenode:9000/usr/hive/warehouse/hive_kafkalogs_daily_20160505, > >> hdfs://namenode:9000/usr/hive/warehouse/hive_kafkalogs_daily_20160506, > >> hdfs://namenode:9000/usr/hive/warehouse/hive_kafkalogs_daily_20160507, > >> hdfs://namenode:9000/usr/hive/warehouse/hive_kafkalogs_daily_20160508, > >> > hdfs://namenode:9000/usr/hive/warehouse/hive_kafkalogs_daily_20160509]]]) > >> | { > >> "head" : { > >> "version" : 1, > >> "generator" : { > >> "type" : "ExplainHandler", > >> "info" : "" > >> }, > >> "type" : "APACHE_DRILL_PHYSICAL", > >> "options" : [ ], > >> "queue" : 0, > >> "resultMode" : "EXEC" > >> }, > >> "graph" : [ { > >> "pop" : "hive-scan", > >> "@id" : 65542, > >> "userName" : "hadoop", > >> "hive-table" : { > >> "table" : { > >> "tableName" : "hive_kafkalogs_daily", > >> "dbName" : "default", > >> "owner" : "hadoop", > >> "createTime" : 1461952920, > >> "lastAccessTime" : 0, > >> "retention" : 0, > >> "sd" : { > >> "cols" : [ { > >> "name" : "sessionid", > >> "type" : "string", > >> "comment" : null > >> }, { > >> "name" : "servertime", > >> "type" : "string", > >> "comment" : null > >> }, { > >> "name" : "ajaxurl", > >> "type" : "string", > >> "comment" : null > >> } ], > >> "location" : > >> "hdfs://namenode:9000/usr/hive/warehouse/hive_kafkalogs_daily", > >> "inputFormat" : > >> "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat", > >> "outputFormat" : > >> "org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat", > >> "compressed" : false, > >> "numBuckets" : -1, > >> "serDeInfo" : { > >> "name" : null, > >> "serializationLib" : > >> "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe", > >> "parameters" : { > >> "serialization.format" : "1" > >> } > >> }, > >> "sortCols" : [ ], > >> "parameters" : { } > >> }, > >> "partitionKeys" : [ { > >> "name" : "log_date", > >> "type" : "string", > >> "comment" : null > >> } ], > >> "parameters" : { > >> "EXTERNAL" : "TRUE", > >> "transient_lastDdlTime" : "1461952920" > >> }, > >> "viewOriginalText" : null, > >> "viewExpandedText" : null, > >> "tableType" : "EXTERNAL_TABLE" > >> }, > >> "partitions" : [ { > >> "values" : [ "2016-04-11" ], > >> "tableName" : "hive_kafkalogs_daily", > >> "dbName" : "default", > >> "createTime" : 1461952941, > >> "lastAccessTime" : 0, > >> "sd" : { > >> "cols" : [ { > >> "name" : "sessionid", > >> "type" : "string", > >> "comment" : null > >> | > >> +------+------+ > >> 1 row selected (0.859 seconds) > >> > >> > >> > >> > >> > >> > >> /***************************************************************** Drill > >> Plan (query without LIKE) > >> > >> > ***********************************************************************************/ > >> > >> explain plan for select sessionid, servertime, ajaxUrl from > >> hive.hive_kafkalogs_daily where log_date = '2016-05-09' and ajaxUrl = > >> 'utm_source' limit 1 ; > >> > >> +------+------+ > >> | text | json | > >> +------+------+ > >> | 00-00 Screen > >> 00-01 Project(sessionid=[$0], servertime=[$1], ajaxUrl=[$2]) > >> 00-02 SelectionVectorRemover > >> 00-03 Limit(fetch=[1]) > >> 00-04 UnionExchange > >> 01-01 SelectionVectorRemover > >> 01-02 Limit(fetch=[1]) > >> 01-03 Project(sessionid=[$0], servertime=[$1], > >> ajaxUrl=[$2]) > >> 01-04 SelectionVectorRemover > >> 01-05 Filter(condition=[AND(=($3, '2016-05-09'), > >> =($2, > >> 'utm_source'))]) > >> 01-06 Scan(groupscan=[HiveScan > >> [table=Table(dbName:default, tableName:hive_kafkalogs_daily), > >> columns=[`sessionid`, `servertime`, `ajaxurl`, `log_date`], > >> numPartitions=1, partitions= [Partition(values:[2016-05-09])], > >> > >> > inputDirectories=[hdfs://namenode:9000/usr/hive/warehouse/hive_kafkalogs_daily_20160509]]]) > >> | { > >> "head" : { > >> "version" : 1, > >> "generator" : { > >> "type" : "ExplainHandler", > >> "info" : "" > >> }, > >> "type" : "APACHE_DRILL_PHYSICAL", > >> "options" : [ ], > >> "queue" : 0, > >> "resultMode" : "EXEC" > >> }, > >> "graph" : [ { > >> "pop" : "hive-scan", > >> "@id" : 65542, > >> "userName" : "hadoop", > >> "hive-table" : { > >> "table" : { > >> "tableName" : "hive_kafkalogs_daily", > >> "dbName" : "default", > >> "owner" : "hadoop", > >> "createTime" : 1461952920, > >> "lastAccessTime" : 0, > >> "retention" : 0, > >> "sd" : { > >> "cols" : [ { > >> "name" : "sessionid", > >> "type" : "string", > >> "comment" : null > >> }, { > >> "name" : "servertime", > >> "type" : "string", > >> "comment" : null > >> }, { > >> "name" : "ajaxurl", > >> "type" : "string", > >> "comment" : null > >> } ], > >> "location" : > >> "hdfs://namenode:9000/usr/hive/warehouse/hive_kafkalogs_daily", > >> "inputFormat" : > >> "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat", > >> "outputFormat" : > >> "org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat", > >> "compressed" : false, > >> "numBuckets" : -1, > >> "serDeInfo" : { > >> "name" : null, > >> "serializationLib" : > >> "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe", > >> "parameters" : { > >> "serialization.format" : "1" > >> } > >> }, > >> "sortCols" : [ ], > >> "parameters" : { } > >> }, > >> "partitionKeys" : [ { > >> "name" : "log_date", > >> "type" : "string", > >> "comment" : null > >> } ], > >> "parameters" : { > >> "EXTERNAL" : "TRUE", > >> "transient_lastDdlTime" : "1461952920" > >> }, > >> "viewOriginalText" : null, > >> "viewExpandedText" : null, > >> "tableType" : "EXTERNAL_TABLE" > >> }, > >> "partitions" : [ { > >> "values" : [ "2016-05-09" ], > >> "tableName" : "hive_kafkalogs_daily", > >> "dbName" : "default", > >> "createTime" : 1462848405, > >> "lastAccessTime" : 0, > >> "sd" : { > >> "cols" : [ { > >> "name" : "sessionid", > >> "type" : "string", > >> "comment" : null > >> }, { > >> "name" : "servertime", > >> "type" : "string", > >> "comment" : null > >> }, { > >> "name" : "ajaxurl", > >> "type" : "string", > >> "comment" : null > >> | > >> +------+------+ > >> 1 row selected (3.394 seconds) > >> > > > > >