[ 
https://issues.apache.org/jira/browse/HIVE-10888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15106370#comment-15106370
 ] 

Charles Pritchard commented on HIVE-10888:
------------------------------------------

I'm seeing a similar issue, in Hive 0.14. I have a two-level partition -- 
partitioned by (date string, bucket string) and it seems that most queries do 
not include the default partition (for bucket) when run. While I can run  
create temp table as select *, and get a fully functioning table, I can not 
simply run select * where, and get useable results from the default partition, 
when I have a where query.

This may be a regression introduced in HIVE-4878. I'll check through some 
support channels to see what I can find. 

> Hive Dynamic Partition + Default Partition makes Null Values Not querable
> -------------------------------------------------------------------------
>
>                 Key: HIVE-10888
>                 URL: https://issues.apache.org/jira/browse/HIVE-10888
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive, Query Processor
>            Reporter: Goden Yao
>
> This is reported by Pivotal.io (Noa Horn)
> And HAWQ latest version should have this fixed in our queries.
> === Expected Behavior ===
> When dynamic partition enabled and mode = nonstrict, the null value in the 
> default partition should still be returned when user specify that in 
> "...WHERE.... is Null".
> === Problem statment ===
> *Enable dynamic partitions*
> {code}
> hive.exec.dynamic.partition = true
> hive.exec.dynamic.partition.mode = nonstrict
> #Get default partition name:
> hive.exec.default.partition.name
> Default Value: _HIVE_DEFAULT_PARTITION_
> {code}
> Hive creates a default partition if the partition key value doesn’t conform 
> to the field type. For example, if the partition key is NULL.
> *Hive Example*
> Add the following parameters to hive-site.xml
> {code}
>       <property>
>               <name>hive.exec.dynamic.partition</name>
>               <value>true</value>
>       </property>
>       <property>
>               <name>hive.exec.dynamic.partition.mode</name>
>               <value>true</value>
>       </property>
> {code}
> Create data:
> vi /tmp/base_data.txt
> 1,1.0,1900-01-01
> 2,2.2,1994-04-14
> 3,3.3,2011-03-31
> 4,4.5,bla
> 5,5.0,2013-12-06
> Create hive table and load the data to it. This table is used to load data to 
> the partition table.
> {code}
> hive>
> CREATE TABLE base (order_id bigint, order_amount float, date date) ROW FORMAT 
> DELIMITED FIELDS TERMINATED BY ',';
> LOAD DATA LOCAL INPATH '/tmp/base_data.txt' INTO TABLE base;
> SELECT * FROM base;
> OK
> 1    1.0    1900-01-01
> 2    2.2    1994-04-14
> 3    3.3    2011-03-31
> 4    4.5    NULL
> 5    5.0    2013-12-06
> {code}
> Note that one of the rows has NULL in its date field.
> Create hive partition table and load data from base table to it. The data 
> will be dynamically partitioned
> {code}
> CREATE TABLE sales (order_id bigint, order_amount float) PARTITIONED BY (date 
> date);
> INSERT INTO TABLE sales PARTITION (date) SELECT * FROM base;
> SELECT * FROM sales;
> OK
> 1    1.0    1900-01-01
> 2    2.2    1994-04-14
> 3    3.3    2011-03-31
> 5    5.0    2013-12-06
> 4    4.5    NULL
> {code}
> Check that the table has different partitions
> {code}
> hdfs dfs -ls /hive/warehouse/sales
> Found 5 items
> drwxr-xr-x   - nhorn supergroup       0 2015-04-30 15:03 
> /hive/warehouse/sales/date=1900-01-01
> drwxr-xr-x   - nhorn supergroup       0 2015-04-30 15:03 
> /hive/warehouse/sales/date=1994-04-14
> drwxr-xr-x   - nhorn supergroup       0 2015-04-30 15:03 
> /hive/warehouse/sales/date=2011-03-31
> drwxr-xr-x   - nhorn supergroup       0 2015-04-30 15:03 
> /hive/warehouse/sales/date=2013-12-06
> drwxr-xr-x   - nhorn supergroup       0 2015-04-30 15:03 
> /hive/warehouse/sales/date=__HIVE_DEFAULT_PARTITION__
> {code}
> Hive queries with default partition
> Queries without a filter or with a filter on a different field returns the 
> default partition data:
> {code}
> hive> select * from sales;
> OK
> 1    1.0    1900-01-01
> 2    2.2    1994-04-14
> 3    3.3    2011-03-31
> 5    5.0    2013-12-06
> 4    4.5    NULL
> Time taken: 0.578 seconds, Fetched: 5 row(s)
> {code}
> Queries with a filter on the partition field omit the default partition data:
> {code}
> hive> select * from sales where date <> '2013-12-06';
> OK
> 1    1.0    1900-01-01
> 2    2.2    1994-04-14
> 3    3.3    2011-03-31
> Time taken: 0.19 seconds, Fetched: 3 row(s)
> hive> select * from sales where date is null;          
> OK
> Time taken: 0.035 seconds
> hive> select * from sales where date is not null;
> OK
> 1    1.0    1900-01-01
> 2    2.2    1994-04-14
> 3    3.3    2011-03-31
> 5    5.0    2013-12-06
> Time taken: 0.042 seconds, Fetched: 4 row(s)
> hive> select * from sales where date='__HIVE_DEFAULT_PARTITION__';
> OK
> Time taken: 0.056 seconds
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to