[jira] [Resolved] (HIVE-27728) Changed behavior for alter table rename partition from legacy tables

2023-10-06 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng resolved HIVE-27728.

Fix Version/s: 4.0.0
   Resolution: Fixed

Thank you [~ngangam] and [~hemanth619] for the review!

> Changed behavior for alter table rename partition from legacy tables
> 
>
> Key: HIVE-27728
> URL: https://issues.apache.org/jira/browse/HIVE-27728
> Project: Hive
>  Issue Type: Bug
>Reporter: Naveen Gangam
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> set hive.create.as.external.legacy=true;
> CREATE TABLE default.metadata_test1(
> emp_number int,
> emp_name string,
> city string)
> PARTITIONED BY(state string)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY ‘,’;
> INSERT INTO default.metadata_test1 PARTITION(state=‘A’) VALUES (11, ‘ABC’, 
> ‘AA’);
> INSERT INTO default.metadata_test1 PARTITION(state=‘B’) VALUES (12, ‘XYZ’, 
> ‘BX’);
> INSERT INTO default.metadata_test1 PARTITION(state=‘B’) VALUES (13, ‘UVW’, 
> ‘BU’);
> from hdfs
> $ hdfs dfs -ls /warehouse/tablespace/external/hive/metadata_test1
> Found 2 items
> drwxr-xr-x - hive hive 0 2023-08-11 14:24 
> /warehouse/tablespace/external/hive/metadata_test1/state=A
> drwxr-xr-x - hive hive 0 2023-08-11 14:25 
> /warehouse/tablespace/external/hive/metadata_test1/state=B
> Now when we alter the partition.
> ALTER TABLE default.metadata_test1 PARTITION (state=‘A’) RENAME TO PARTITION 
> (state=‘C’);
> select * from default.metadata_test1;
> ++--+--+---+
> | metadata_test1.emp_number | metadata_test1.emp_name | metadata_test1.city | 
> metadata_test1.state |
> ++--+--+---+
> | 12 | XYZ | BX | B |
> | 13 | UVW | BU | B |
> | 11 | ABC | AA | C |
> ++--+--+---+
> show partitions default.metadata_test1;
> ++
> | partition |
> ++
> | state=B |
> | state=C |
> ++
> But from HDFS
> hdfs dfs -ls /warehouse/tablespace/external/hive/metadata_test1
> Found 2 items
> drwxr-xr-x - hive hive 0 2023-08-11 14:24 
> /warehouse/tablespace/external/hive/metadata_test1/state=A
> drwxr-xr-x - hive hive 0 2023-08-11 14:25 
> /warehouse/tablespace/external/hive/metadata_test1/state=B
> Hive only relocates/renames the partition locations to match the new 
> partition value if it is a MANAGED table. For external tables, we assume that 
> ETL pipelines are writing to the old location and thus will not rename the 
> dir.
> But for legacy tables, (create table) that would have created non-acid 
> managed tables, we convert this tables to EXTERNAL_TABLE but with auto-purge 
> enabled. For such tables, we should take the liberty to managed these 
> locations as well to match the legacy MANAGED table behavior.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-27584) Backport HIVE-21407 to branch-3

2023-10-06 Thread Pravin Sinha (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pravin Sinha resolved HIVE-27584.
-
Resolution: Fixed

Committed to branch-3. Thanks for the review [~chinnalalam]  !!!

> Backport HIVE-21407 to branch-3
> ---
>
> Key: HIVE-27584
> URL: https://issues.apache.org/jira/browse/HIVE-27584
> Project: Hive
>  Issue Type: Task
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
>
> HIVE-21407: Parquet predicate pushdown is not working correctly for char 
> column types (Marta Kuczora reviewed by Peter Vary)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-27399) Add lateral view support for CBO

2023-10-06 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis resolved HIVE-27399.

Fix Version/s: 4.0.0
   Resolution: Fixed

Fixed in 
https://github.com/apache/hive/commit/6d75f92142c7bf2c4e531242c4b106b4c0d71221. 
Thanks for the PR [~scarlin] awesome work! 

Also many thanks to [~kkasa] for the review.

> Add lateral view support for CBO
> 
>
> Key: HIVE-27399
> URL: https://issues.apache.org/jira/browse/HIVE-27399
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2
>Reporter: Steve Carlin
>Assignee: Steve Carlin
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> This subtask consists of the implementation of lateral views for CBO.  The 
> first task was to refactor the code but make sure all the "q" tests work. 
> This Jira will make sure CBO works for lateral views and add appropriate 
> tests. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-27734) Add Icenerg's storage-partitioned join capabilities to Hive's [sorted-]bucket-map-join

2023-10-06 Thread okumin (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17772576#comment-17772576
 ] 

okumin commented on HIVE-27734:
---

I'm taking a look since I have experience implementing Bucket Map Join for 
non-native tables.

> Add Icenerg's storage-partitioned join capabilities to Hive's 
> [sorted-]bucket-map-join
> --
>
> Key: HIVE-27734
> URL: https://issues.apache.org/jira/browse/HIVE-27734
> Project: Hive
>  Issue Type: Improvement
>  Components: Iceberg integration
>Affects Versions: 4.0.0-alpha-2
>Reporter: Janos Kovacs
>Priority: Major
>
> Iceberg's 'data bucketing' is implemented through its rich (function based) 
> partitioning feature which helps to optimize join operations - called storage 
> partitioned joins. 
> doc: 
> [https://docs.google.com/document/d/1foTkDSM91VxKgkEcBMsuAvEjNybjja-uHk-r3vtXWFE/edit#heading=h.82w8qxfl2uwl]
> spark impl.: https://issues.apache.org/jira/browse/SPARK-37375
> This feature is not yet leveraged in Hive into its bucket-map-join 
> optimization, neither alone nor with Iceberg's SortOrder to 
> sorted-bucket-map-join.
> Customers migrating from Hive table format to Iceberg format with storage 
> optimized schema will experience performance degradation on large tables 
> where Iceberg's gain on no-listing performance improvement is significantly 
> smaller than the actual join performance over bucket-join or even 
> sorted-bucket-join.
>  
> {noformat}
> SET hive.query.results.cache.enabled=false;
> SET hive.fetch.task.conversion = none;
> SET hive.optimize.bucketmapjoin=true;
> SET hive.convert.join.bucket.mapjoin.tez=true;
> SET hive.auto.convert.join.noconditionaltask.size=1000;
> --if you are working with external table, you need this for bmj:
> SET hive.disable.unsafe.external.table.operations=false;
> -- HIVE BUCKET-MAP-JOIN
> DROP TABLE IF EXISTS default.hivebmjt1 PURGE;
> DROP TABLE IF EXISTS default.hivebmjt2 PURGE;
> CREATE TABLE default.hivebmjt1 (id int, txt string) CLUSTERED BY (id) INTO 8 
> BUCKETS;
> CREATE TABLE default.hivebmjt2 (id int, txt string);
> INSERT INTO default.hivebmjt1 VALUES 
> (1,'1'),(2,'2'),(3,'3'),(4,'4'),(5,'5'),(6,'6'),(7,'7'),(8,'8');
> INSERT INTO default.hivebmjt2 VALUES (1,'1'),(2,'2'),(3,'3'),(4,'4');
> EXPLAIN
> SELECT * FROM default.hivebmjt1 f INNER  JOIN default.hivebmjt2 d ON f.id 
> = d.id;
> EXPLAIN
> SELECT * FROM default.hivebmjt1 f LEFT OUTER JOIN default.hivebmjt2 d ON f.id 
> = d.id;
> -- Both are optimized into BMJ
> -- ICEBERG BUCKET-MAP-JOIN via Iceberg's storage-partitioned join
> DROP TABLE IF EXISTS default.icespbmjt1 PURGE;
> DROP TABLE IF EXISTS default.icespbmjt2 PURGE;
> CREATE TABLE default.icespbmjt1 (txt string) PARTITIONED BY (id int) STORED 
> BY ICEBERG ;
> CREATE TABLE default.icespbmjt2 (txt string) PARTITIONED BY (id int) STORED 
> BY ICEBERG ;
> INSERT INTO default.icespbmjt1 VALUES ('1',1),('2',2),('3',3),('4',4);
> INSERT INTO default.icespbmjt2 VALUES ('1',1),('2',2),('3',3),('4',4);
> EXPLAIN
> SELECT * FROM default.icespbmjt1 f INNER  JOIN default.icespbmjt2 d ON 
> f.id = d.id;
> EXPLAIN
> SELECT * FROM default.icespbmjt1 f LEFT OUTER JOIN default.icespbmjt2 d ON 
> f.id = d.id;
> -- Only Map-Join optimised
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-27776) Iceberg: Upgrade iceberg version to 1.4.0

2023-10-06 Thread zhangbutao (Jira)
zhangbutao created HIVE-27776:
-

 Summary: Iceberg: Upgrade iceberg version to 1.4.0
 Key: HIVE-27776
 URL: https://issues.apache.org/jira/browse/HIVE-27776
 Project: Hive
  Issue Type: Improvement
  Components: Iceberg integration
Reporter: zhangbutao


[https://mvnrepository.com/artifact/org.apache.iceberg/iceberg-core/1.4.0]

Iceberg1.4.0 has been released out, and we need upgrade iceberg depdency from 
1.3.1 to 1.4.0. Meantime, we should port some Hive catalog changes from Iceberg 
repo to Hive repo.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-27775) DirectSQL and JDO results are different when fetching partitions by timestamp in DST shift

2023-10-06 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17772556#comment-17772556
 ] 

Stamatis Zampetakis commented on HIVE-27775:


We have discovered this bug while writting tests for HIVE-27760.

> DirectSQL and JDO results are different when fetching partitions by timestamp 
> in DST shift
> --
>
> Key: HIVE-27775
> URL: https://issues.apache.org/jira/browse/HIVE-27775
> Project: Hive
>  Issue Type: Bug
>  Components: Standalone Metastore
>Affects Versions: 4.0.0-beta-1
>Reporter: Stamatis Zampetakis
>Priority: Major
>
> DirectSQL and JDO results are different when fetching partitions by timestamp 
> in DST shift.
> {code:sql}
> --! qt:timezone:Europe/Paris
> CREATE EXTERNAL TABLE payments (card string) PARTITIONED BY(txn_datetime 
> TIMESTAMP) STORED AS ORC;
> INSERT into payments VALUES('---', '2023-03-26 02:30:00');
> SELECT * FROM payments WHERE txn_datetime = '2023-03-26 02:30:00';
> {code}
> The '2023-03-26 02:30:00' is a timestamp that in Europe/Paris timezone falls 
> exactly in the middle of the DST shift. In this particular timezone this date 
> time never really exists since we are jumping directly from 02:00:00 to 
> 03:00:00. However, the TIMESTAMP data type in Hive is timezone agnostic 
> (https://cwiki.apache.org/confluence/display/Hive/Different+TIMESTAMP+types) 
> so it is a perfectly valid timestamp that can be inserted in a table and we 
> must be able to recover it back.
> For the SELECT query above, partition pruning kicks in and calls the 
> ObjectStore#getPartitionsByExpr method in order to fetch the respective 
> partitions matching the timestamp from HMS.
> The tests however reveal that DirectSQL and JDO paths are not returning the 
> same results leading to an exception when VerifyingObjectStore is used. 
> According to the error below DirectSQL is able to recover one partition from 
> HMS (expected) while JDO/ORM returns empty (not expected).
> {noformat}
> 2023-10-06T03:51:19,406 ERROR [80252df4-3fdc-4971-badf-ad67ce8567c7 main] 
> metastore.VerifyingObjectStore: Lists are not the same size: SQL 1, ORM 0
> 2023-10-06T03:51:19,409 ERROR [80252df4-3fdc-4971-badf-ad67ce8567c7 main] 
> metastore.RetryingHMSHandler: MetaException(message:Lists are not the same 
> size: SQL 1, ORM 0)
>   at 
> org.apache.hadoop.hive.metastore.VerifyingObjectStore.verifyLists(VerifyingObjectStore.java:148)
>   at 
> org.apache.hadoop.hive.metastore.VerifyingObjectStore.getPartitionsByExpr(VerifyingObjectStore.java:88)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:97)
>   at com.sun.proxy.$Proxy57.getPartitionsByExpr(Unknown Source)
>   at 
> org.apache.hadoop.hive.metastore.HMSHandler.get_partitions_spec_by_expr(HMSHandler.java:7330)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:98)
>   at 
> org.apache.hadoop.hive.metastore.AbstractHMSHandlerProxy.invoke(AbstractHMSHandlerProxy.java:82)
>   at com.sun.proxy.$Proxy59.get_partitions_spec_by_expr(Unknown Source)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getPartitionsSpecByExprInternal(HiveMetaStoreClient.java:2472)
>   at 
> org.apache.hadoop.hive.ql.metadata.HiveMetaStoreClientWithLocalCache.getPartitionsSpecByExprInternal(HiveMetaStoreClientWithLocalCache.java:396)
>   at 
> org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.getPartitionsSpecByExprInternal(SessionHiveMetaStoreClient.java:2279)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.listPartitionsSpecByExpr(HiveMetaStoreClient.java:2484)
>   at 
> org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.listPartitionsSpecByExpr(SessionHiveMetaStoreClient.java:1346)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.r

[jira] [Assigned] (HIVE-27775) DirectSQL and JDO results are different when fetching partitions by timestamp in DST shift

2023-10-06 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis reassigned HIVE-27775:
--

Assignee: (was: Stamatis Zampetakis)

> DirectSQL and JDO results are different when fetching partitions by timestamp 
> in DST shift
> --
>
> Key: HIVE-27775
> URL: https://issues.apache.org/jira/browse/HIVE-27775
> Project: Hive
>  Issue Type: Bug
>  Components: Standalone Metastore
>Affects Versions: 4.0.0-beta-1
>Reporter: Stamatis Zampetakis
>Priority: Major
>
> DirectSQL and JDO results are different when fetching partitions by timestamp 
> in DST shift.
> {code:sql}
> --! qt:timezone:Europe/Paris
> CREATE EXTERNAL TABLE payments (card string) PARTITIONED BY(txn_datetime 
> TIMESTAMP) STORED AS ORC;
> INSERT into payments VALUES('---', '2023-03-26 02:30:00');
> SELECT * FROM payments WHERE txn_datetime = '2023-03-26 02:30:00';
> {code}
> The '2023-03-26 02:30:00' is a timestamp that in Europe/Paris timezone falls 
> exactly in the middle of the DST shift. In this particular timezone this date 
> time never really exists since we are jumping directly from 02:00:00 to 
> 03:00:00. However, the TIMESTAMP data type in Hive is timezone agnostic 
> (https://cwiki.apache.org/confluence/display/Hive/Different+TIMESTAMP+types) 
> so it is a perfectly valid timestamp that can be inserted in a table and we 
> must be able to recover it back.
> For the SELECT query above, partition pruning kicks in and calls the 
> ObjectStore#getPartitionsByExpr method in order to fetch the respective 
> partitions matching the timestamp from HMS.
> The tests however reveal that DirectSQL and JDO paths are not returning the 
> same results leading to an exception when VerifyingObjectStore is used. 
> According to the error below DirectSQL is able to recover one partition from 
> HMS (expected) while JDO/ORM returns empty (not expected).
> {noformat}
> 2023-10-06T03:51:19,406 ERROR [80252df4-3fdc-4971-badf-ad67ce8567c7 main] 
> metastore.VerifyingObjectStore: Lists are not the same size: SQL 1, ORM 0
> 2023-10-06T03:51:19,409 ERROR [80252df4-3fdc-4971-badf-ad67ce8567c7 main] 
> metastore.RetryingHMSHandler: MetaException(message:Lists are not the same 
> size: SQL 1, ORM 0)
>   at 
> org.apache.hadoop.hive.metastore.VerifyingObjectStore.verifyLists(VerifyingObjectStore.java:148)
>   at 
> org.apache.hadoop.hive.metastore.VerifyingObjectStore.getPartitionsByExpr(VerifyingObjectStore.java:88)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:97)
>   at com.sun.proxy.$Proxy57.getPartitionsByExpr(Unknown Source)
>   at 
> org.apache.hadoop.hive.metastore.HMSHandler.get_partitions_spec_by_expr(HMSHandler.java:7330)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:98)
>   at 
> org.apache.hadoop.hive.metastore.AbstractHMSHandlerProxy.invoke(AbstractHMSHandlerProxy.java:82)
>   at com.sun.proxy.$Proxy59.get_partitions_spec_by_expr(Unknown Source)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getPartitionsSpecByExprInternal(HiveMetaStoreClient.java:2472)
>   at 
> org.apache.hadoop.hive.ql.metadata.HiveMetaStoreClientWithLocalCache.getPartitionsSpecByExprInternal(HiveMetaStoreClientWithLocalCache.java:396)
>   at 
> org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.getPartitionsSpecByExprInternal(SessionHiveMetaStoreClient.java:2279)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.listPartitionsSpecByExpr(HiveMetaStoreClient.java:2484)
>   at 
> org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.listPartitionsSpecByExpr(SessionHiveMetaStoreClient.java:1346)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.h

[jira] [Created] (HIVE-27775) DirectSQL and JDO results are different when fetching partitions by timestamp in DST shift

2023-10-06 Thread Stamatis Zampetakis (Jira)
Stamatis Zampetakis created HIVE-27775:
--

 Summary: DirectSQL and JDO results are different when fetching 
partitions by timestamp in DST shift
 Key: HIVE-27775
 URL: https://issues.apache.org/jira/browse/HIVE-27775
 Project: Hive
  Issue Type: Bug
  Components: Standalone Metastore
Affects Versions: 4.0.0-beta-1
Reporter: Stamatis Zampetakis
Assignee: Stamatis Zampetakis


DirectSQL and JDO results are different when fetching partitions by timestamp 
in DST shift.

{code:sql}
--! qt:timezone:Europe/Paris
CREATE EXTERNAL TABLE payments (card string) PARTITIONED BY(txn_datetime 
TIMESTAMP) STORED AS ORC;
INSERT into payments VALUES('---', '2023-03-26 02:30:00');
SELECT * FROM payments WHERE txn_datetime = '2023-03-26 02:30:00';
{code}

The '2023-03-26 02:30:00' is a timestamp that in Europe/Paris timezone falls 
exactly in the middle of the DST shift. In this particular timezone this date 
time never really exists since we are jumping directly from 02:00:00 to 
03:00:00. However, the TIMESTAMP data type in Hive is timezone agnostic 
(https://cwiki.apache.org/confluence/display/Hive/Different+TIMESTAMP+types) so 
it is a perfectly valid timestamp that can be inserted in a table and we must 
be able to recover it back.

For the SELECT query above, partition pruning kicks in and calls the 
ObjectStore#getPartitionsByExpr method in order to fetch the respective 
partitions matching the timestamp from HMS.

The tests however reveal that DirectSQL and JDO paths are not returning the 
same results leading to an exception when VerifyingObjectStore is used. 
According to the error below DirectSQL is able to recover one partition from 
HMS (expected) while JDO/ORM returns empty (not expected).

{noformat}
2023-10-06T03:51:19,406 ERROR [80252df4-3fdc-4971-badf-ad67ce8567c7 main] 
metastore.VerifyingObjectStore: Lists are not the same size: SQL 1, ORM 0
2023-10-06T03:51:19,409 ERROR [80252df4-3fdc-4971-badf-ad67ce8567c7 main] 
metastore.RetryingHMSHandler: MetaException(message:Lists are not the same 
size: SQL 1, ORM 0)
at 
org.apache.hadoop.hive.metastore.VerifyingObjectStore.verifyLists(VerifyingObjectStore.java:148)
at 
org.apache.hadoop.hive.metastore.VerifyingObjectStore.getPartitionsByExpr(VerifyingObjectStore.java:88)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:97)
at com.sun.proxy.$Proxy57.getPartitionsByExpr(Unknown Source)
at 
org.apache.hadoop.hive.metastore.HMSHandler.get_partitions_spec_by_expr(HMSHandler.java:7330)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:98)
at 
org.apache.hadoop.hive.metastore.AbstractHMSHandlerProxy.invoke(AbstractHMSHandlerProxy.java:82)
at com.sun.proxy.$Proxy59.get_partitions_spec_by_expr(Unknown Source)
at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getPartitionsSpecByExprInternal(HiveMetaStoreClient.java:2472)
at 
org.apache.hadoop.hive.ql.metadata.HiveMetaStoreClientWithLocalCache.getPartitionsSpecByExprInternal(HiveMetaStoreClientWithLocalCache.java:396)
at 
org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.getPartitionsSpecByExprInternal(SessionHiveMetaStoreClient.java:2279)
at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.listPartitionsSpecByExpr(HiveMetaStoreClient.java:2484)
at 
org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.listPartitionsSpecByExpr(SessionHiveMetaStoreClient.java:1346)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:213)
at com.sun.proxy.$Proxy60.listPartitionsSpecByExpr(Unknown Source)
at 
org.apache.hadoop.hive.ql.metadata.Hive.getPartitionsByExpr(Hive.java:4507)
at 
org.apache.hadoop.hive.ql.optimizer.ppr.PartitionPruner.getPartitionsFromServer(Part

[jira] [Updated] (HIVE-27774) Clean up properties

2023-10-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-27774:
--
Labels: pull-request-available  (was: )

> Clean up properties
> ---
>
> Key: HIVE-27774
> URL: https://issues.apache.org/jira/browse/HIVE-27774
> Project: Hive
>  Issue Type: Task
>Reporter: Zsolt Miskolczi
>Priority: Major
>  Labels: pull-request-available
>
> Clean up after removing PowerMock
>  
> Remove powermock.version
> For mockito-inline, create a version property and use it. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-27774) Clean up properties

2023-10-06 Thread Zsolt Miskolczi (Jira)
Zsolt Miskolczi created HIVE-27774:
--

 Summary: Clean up properties
 Key: HIVE-27774
 URL: https://issues.apache.org/jira/browse/HIVE-27774
 Project: Hive
  Issue Type: Task
Reporter: Zsolt Miskolczi


Clean up after removing PowerMock

 

Remove powermock.version

For mockito-inline, create a version property and use it. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27760) WHERE condition on DATE type partitioning column leads to wrong results

2023-10-06 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis updated HIVE-27760:
---
Summary: WHERE condition on DATE type partitioning column leads to wrong 
results   (was: Filter on date type partitioning column producing 0 results)

> WHERE condition on DATE type partitioning column leads to wrong results 
> 
>
> Key: HIVE-27760
> URL: https://issues.apache.org/jira/browse/HIVE-27760
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Dayakar M
>Assignee: Dayakar M
>Priority: Major
>  Labels: pull-request-available
>
> Filter on date type partitioning columns producing 0 results.
> {*}Reproduction steps{*}:
> 1. test.q
> {noformat}
> CREATE EXTERNAL TABLE test(a string,b String) PARTITIONED BY(PartitionDate 
> DATE) STORED AS ORC;
> INSERT into test(PartitionDate, a,b) 
> VALUES('2023-01-01','2023-01-01','2023-01-01');
> INSERT into test(PartitionDate, a,b) 
> VALUES('2023-01-02','2023-01-02','2023-01-02');
> select count(*) from test where PartitionDate = '2023-01-01';{noformat}
> 2. Command to execute (pass different timezone than server)
> {noformat}
> mvn test -Dtest=TestMiniTezCliDriver -Dqfile=test.q 
> -Dtest.output.overwrite=true -Duser.timezone=Asia/Hong_Kong{noformat}
>  
> *RootCause:* As a part of HIVE-27373 issue fix to parse the string to 
> java.sql.Date object, java.text.SimpleDateFormat is replaced with 
> java.time.format.DateTimeFormatter using java.time.LocalDate which represents 
> a Date without TimeZone.  Here this input is passed [here 
> |https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java#L1370]
>  which uses SimpleDateFormat(parsing dates in a locale-sensitive manner) and 
> java.sql.Date. Here user timezone is passed different so actual value is 
> getting changed to a different value (for example 2023-01-01 is changed to 
> 2022-12-31) which is not matching with any partition so nothing gets returned.
> *Solution:*
> 1. In MetaStoreDirectSql.java, we should use 
> java.time.format.DateTimeFormatter with java.time.LocalDate so that it will 
> return proper date string.
>         Or
> 2. Revert the code changes(only formatter and LocalDate) done for HIVE-27373 
> and use SimpleDateFormat and java.sql.Date.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-27760) WHERE condition on DATE type partitioning column leads to wrong results

2023-10-06 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis resolved HIVE-27760.

Fix Version/s: 4.0.0
   Resolution: Fixed

Fixed in 
https://github.com/apache/hive/commit/1e919dd98169212ab10d267de0dc25bbbdf96b1e. 
Thanks for the PR [~Dayakar]!

> WHERE condition on DATE type partitioning column leads to wrong results 
> 
>
> Key: HIVE-27760
> URL: https://issues.apache.org/jira/browse/HIVE-27760
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Dayakar M
>Assignee: Dayakar M
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Filter on date type partitioning columns producing 0 results.
> {*}Reproduction steps{*}:
> 1. test.q
> {noformat}
> CREATE EXTERNAL TABLE test(a string,b String) PARTITIONED BY(PartitionDate 
> DATE) STORED AS ORC;
> INSERT into test(PartitionDate, a,b) 
> VALUES('2023-01-01','2023-01-01','2023-01-01');
> INSERT into test(PartitionDate, a,b) 
> VALUES('2023-01-02','2023-01-02','2023-01-02');
> select count(*) from test where PartitionDate = '2023-01-01';{noformat}
> 2. Command to execute (pass different timezone than server)
> {noformat}
> mvn test -Dtest=TestMiniTezCliDriver -Dqfile=test.q 
> -Dtest.output.overwrite=true -Duser.timezone=Asia/Hong_Kong{noformat}
>  
> *RootCause:* As a part of HIVE-27373 issue fix to parse the string to 
> java.sql.Date object, java.text.SimpleDateFormat is replaced with 
> java.time.format.DateTimeFormatter using java.time.LocalDate which represents 
> a Date without TimeZone.  Here this input is passed [here 
> |https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java#L1370]
>  which uses SimpleDateFormat(parsing dates in a locale-sensitive manner) and 
> java.sql.Date. Here user timezone is passed different so actual value is 
> getting changed to a different value (for example 2023-01-01 is changed to 
> 2022-12-31) which is not matching with any partition so nothing gets returned.
> *Solution:*
> 1. In MetaStoreDirectSql.java, we should use 
> java.time.format.DateTimeFormatter with java.time.LocalDate so that it will 
> return proper date string.
>         Or
> 2. Revert the code changes(only formatter and LocalDate) done for HIVE-27373 
> and use SimpleDateFormat and java.sql.Date.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-27084) Iceberg: Stats are not populated correctly during query compilation

2023-10-06 Thread okumin (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17772464#comment-17772464
 ] 

okumin commented on HIVE-27084:
---

[~rajesh.balamohan] It could not be a problem of Iceberg and it could be an 
expected behavior.

First, the first query scans two columns, `ss_sold_date_sk` and 
`ss_wholesale_cost`. The second query scans a single column, `ss_sold_date_sk`. 
I guess `ss_wholesale_cost` is not a partition column. So, the first query has 
to scan more data than the second one.

{code:sql}
select count(*) from store_sales where ss_sold_date_sk=2450822 and 
ss_wholesale_cost > 0.0;
select count(*) from store_sales where ss_sold_date_sk=2450822
{code}

Then, I guess `ss_sold_date_sk` is int or bigint and `ss_wholesale_cost` is 
decimal. On my machine, the derived size of bigint is 8 while that of decimal 
is 112. It can potentially make a big difference.

> Iceberg: Stats are not populated correctly during query compilation
> ---
>
> Key: HIVE-27084
> URL: https://issues.apache.org/jira/browse/HIVE-27084
> Project: Hive
>  Issue Type: Improvement
>  Components: Iceberg integration
>Reporter: Rajesh Balamohan
>Priority: Major
>  Labels: performance
>
> - Table stats are not properly used/computed during query compilation phase.
>  - Here is an example. Check out the query with the filter which give more 
> data than the regular query
> This is just an example, real world queries can have bad query plans due to 
> this
> {{10470974584 with filter, vs 303658262936 without filter}}
> {noformat}
> explain select count(*) from store_sales where ss_sold_date_sk=2450822 and 
> ss_wholesale_cost > 0.0
> Explain
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
> Tez
>   DagId: hive_20230216065808_80d68e3f-3a6b-422b-9265-50bc707ae3c6:48
>   Edges:
> Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE)
>   DagName: hive_20230216065808_80d68e3f-3a6b-422b-9265-50bc707ae3c6:48
>   Vertices:
> Map 1
> Map Operator Tree:
> TableScan
>   alias: store_sales
>   filterExpr: ((ss_sold_date_sk = 2450822) and 
> (ss_wholesale_cost > 0)) (type: boolean)
>   Statistics: Num rows: 2755519629 Data size: 303658262936 
> Basic stats: COMPLETE Column stats: NONE
>   Filter Operator
> predicate: ((ss_sold_date_sk = 2450822) and 
> (ss_wholesale_cost > 0)) (type: boolean)
> Statistics: Num rows: 5 Data size: 550 Basic stats: 
> COMPLETE Column stats: NONE
> Select Operator
>   Statistics: Num rows: 5 Data size: 550 Basic stats: 
> COMPLETE Column stats: NONE
>   Group By Operator
> aggregations: count()
> minReductionHashAggr: 0.99
> mode: hash
> outputColumnNames: _col0
> Statistics: Num rows: 1 Data size: 124 Basic stats: 
> COMPLETE Column stats: NONE
> Reduce Output Operator
>   null sort order:
>   sort order:
>   Statistics: Num rows: 1 Data size: 124 Basic stats: 
> COMPLETE Column stats: NONE
>   value expressions: _col0 (type: bigint)
> Execution mode: vectorized, llap
> LLAP IO: all inputs (cache only)
> Reducer 2
> Execution mode: vectorized, llap
> Reduce Operator Tree:
>   Group By Operator
> aggregations: count(VALUE._col0)
> mode: mergepartial
> outputColumnNames: _col0
> Statistics: Num rows: 1 Data size: 124 Basic stats: COMPLETE 
> Column stats: NONE
> File Output Operator
>   compressed: false
>   Statistics: Num rows: 1 Data size: 124 Basic stats: 
> COMPLETE Column stats: NONE
>   table:
>   input format: 
> org.apache.hadoop.mapred.SequenceFileInputFormat
>   output format: 
> org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
>   serde: 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>   Stage: Stage-0
> Fetch Operator
>   limit: -1
>   Processor Tree:
> ListSink
> 58 rows selected (0.73 seconds)
> explain select count(*) from store_sales where ss_sold_date_sk=2450822
> INFO  : Starting task [Stage-3:EXPLAIN] in serial mode
> INFO  : Completed executing 
> command(queryId=hive_20230216065813_e51482a2-1c9a-41a7-b1b3-9aec2fba9ba7); 
> Time ta