anybody run up against this one?  hive 2.1.0 + using a  "not in" on a list
+ the column is a partition key participant.


* using not

query:

explain
SELECT count(*)
FROM bi.fact_email_funnel
WHERE
    event_date_key = 20160824
    AND etl_source_database *not* in ('foo')


output frag:

Map Operator Tree:
          TableScan
            alias: fact_email_funnel
            Statistics: Num rows: 33723530 Data size: 12154626048 Basic
stats: COMPLETE Column stats: NONE
            Filter Operator
              *predicate: false (type: boolean)  #### this kills any hope
of the query returning anything.*
              Statistics: Num rows: 1 Data size: 360 Basic stats: COMPLETE
Column stats: NONE
              Group By Operator
                aggregations: count()




* not using not

query:

  explain
  SELECT count(*)
  FROM bi.fact_email_funnel
  WHERE
    event_date_key = 20160824
    AND etl_source_database in ('foo')




output frag:
 Map Operator Tree:
          TableScan
            alias: fact_email_funnel
            Statistics: Num rows: 33744659 Data size: 14562180096 Basic
stats: COMPLETE Column stats: NONE
            Select Operator        *###doesn't even mention a filter*
              Statistics: Num rows: 33744659 Data size: 14562180096 Basic
stats: COMPLETE Column stats: NONE
              Group By Operator
                aggregations: count()



* show create table output:

CREATE TABLE `bi.fact_email_funnel`(
  `user_id` string,
  `email` string,
  `ipaddress` string,
  `email_type_key` string,
  `email_type_id` string,
  `email_type_name` string,
  `tracking_code` string,
  `action_type` string,
  `email_guid` string,
  `mailing_id` string,
  `message_id` string,
  `delivered_status` string,
  `delivered_error_code` string,
  `click_link_id` string,
  `click_link` string,
  `click_property_id` string,
  `platform` string,
  `etl_source_id_column` string,
  `etl_source_id` string,
  `event_datetime` timestamp,
  `event_time_key` bigint,
  `id_job` bigint,
  `load_date` timestamp,
  `version` string,
  `event_datetime_utc` timestamp,
  `num_leads` int COMMENT 'number of leads',
  `lead_product_type` string COMMENT 'product lead type <single family,
rental community>',
  `property_impression`
array<struct<propertyid:string,propertystatus:string,city:string,statecode:string,order:string>>
COMMENT 'sent property attributes',
  `search_impression` map<string,string> COMMENT 'search attributes')
PARTITIONED BY (
  `event_date_key` bigint,
  `etl_source_database` string,
  `etl_source_table` string)

Reply via email to