[jira] [Resolved] (HIVE-27728) Changed behavior for alter table rename partition from legacy tables
[ https://issues.apache.org/jira/browse/HIVE-27728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihua Deng resolved HIVE-27728. Fix Version/s: 4.0.0 Resolution: Fixed Thank you [~ngangam] and [~hemanth619] for the review!
> Changed behavior for alter table rename partition from legacy tables
>
> Key: HIVE-27728
> URL: https://issues.apache.org/jira/browse/HIVE-27728
> Project: Hive
> Issue Type: Bug
> Reporter: Naveen Gangam
> Assignee: Zhihua Deng
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> set hive.create.as.external.legacy=true;
> CREATE TABLE default.metadata_test1(
>   emp_number int,
>   emp_name string,
>   city string)
> PARTITIONED BY(state string)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY ',';
> INSERT INTO default.metadata_test1 PARTITION(state='A') VALUES (11, 'ABC', 'AA');
> INSERT INTO default.metadata_test1 PARTITION(state='B') VALUES (12, 'XYZ', 'BX');
> INSERT INTO default.metadata_test1 PARTITION(state='B') VALUES (13, 'UVW', 'BU');
> From HDFS:
> $ hdfs dfs -ls /warehouse/tablespace/external/hive/metadata_test1
> Found 2 items
> drwxr-xr-x - hive hive 0 2023-08-11 14:24 /warehouse/tablespace/external/hive/metadata_test1/state=A
> drwxr-xr-x - hive hive 0 2023-08-11 14:25 /warehouse/tablespace/external/hive/metadata_test1/state=B
> Now when we alter the partition:
> ALTER TABLE default.metadata_test1 PARTITION (state='A') RENAME TO PARTITION (state='C');
> select * from default.metadata_test1;
> +----------------------------+--------------------------+----------------------+-----------------------+
> | metadata_test1.emp_number  | metadata_test1.emp_name  | metadata_test1.city  | metadata_test1.state  |
> +----------------------------+--------------------------+----------------------+-----------------------+
> | 12                         | XYZ                      | BX                   | B                     |
> | 13                         | UVW                      | BU                   | B                     |
> | 11                         | ABC                      | AA                   | C                     |
> +----------------------------+--------------------------+----------------------+-----------------------+
> show partitions default.metadata_test1;
> +------------+
> | partition  |
> +------------+
> | state=B    |
> | state=C    |
> +------------+
> But from HDFS:
> hdfs dfs -ls /warehouse/tablespace/external/hive/metadata_test1
> Found 2 items
> drwxr-xr-x - hive hive 0 2023-08-11 14:24 /warehouse/tablespace/external/hive/metadata_test1/state=A
> drwxr-xr-x - hive hive 0 2023-08-11 14:25 /warehouse/tablespace/external/hive/metadata_test1/state=B
> Hive only relocates/renames the partition locations to match the new partition value if it is a MANAGED table. For external tables, we assume that ETL pipelines are writing to the old location and thus will not rename the dir.
> But legacy CREATE TABLE statements, which would have created non-ACID managed tables, now create EXTERNAL tables with auto-purge enabled. For such tables, we should manage these locations as well, to match the legacy MANAGED table behavior. -- This message was sent by Atlassian Jira (v8.20.10#820010)
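The rename rule proposed in HIVE-27728 above can be sketched as a small predicate. This is an illustrative sketch only, assuming a hypothetical helper name (`should_rename_partition_dir`); it is not Hive's actual implementation:

```python
def should_rename_partition_dir(table_type: str, auto_purge: bool) -> bool:
    """Decide whether ALTER ... RENAME PARTITION should also move the HDFS dir.

    Managed tables always move the directory. Plain external tables do not
    (ETL pipelines may still write to the old location). Legacy-converted
    tables (EXTERNAL with auto-purge enabled) should keep the old managed
    behavior and move the directory too.
    """
    if table_type == "MANAGED_TABLE":
        return True
    return table_type == "EXTERNAL_TABLE" and auto_purge

# Plain external table: the directory is left alone.
print(should_rename_partition_dir("EXTERNAL_TABLE", False))  # False
# Legacy-converted external table (auto-purge): the directory is renamed.
print(should_rename_partition_dir("EXTERNAL_TABLE", True))   # True
```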
[jira] [Resolved] (HIVE-27584) Backport HIVE-21407 to branch-3
[ https://issues.apache.org/jira/browse/HIVE-27584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pravin Sinha resolved HIVE-27584. - Resolution: Fixed Committed to branch-3. Thanks for the review [~chinnalalam] !!! > Backport HIVE-21407 to branch-3 > --- > > Key: HIVE-27584 > URL: https://issues.apache.org/jira/browse/HIVE-27584 > Project: Hive > Issue Type: Task >Reporter: Pravin Sinha >Assignee: Pravin Sinha >Priority: Major > Labels: pull-request-available > > HIVE-21407: Parquet predicate pushdown is not working correctly for char > column types (Marta Kuczora reviewed by Peter Vary) -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HIVE-27399) Add lateral view support for CBO
[ https://issues.apache.org/jira/browse/HIVE-27399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stamatis Zampetakis resolved HIVE-27399. Fix Version/s: 4.0.0 Resolution: Fixed Fixed in https://github.com/apache/hive/commit/6d75f92142c7bf2c4e531242c4b106b4c0d71221. Thanks for the PR [~scarlin] awesome work! Also many thanks to [~kkasa] for the review. > Add lateral view support for CBO > > > Key: HIVE-27399 > URL: https://issues.apache.org/jira/browse/HIVE-27399 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2 >Reporter: Steve Carlin >Assignee: Steve Carlin >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > This subtask consists of the implementation of lateral views for CBO. The > first task was to refactor the code but make sure all the "q" tests work. > This Jira will make sure CBO works for lateral views and add appropriate > tests. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-27734) Add Iceberg's storage-partitioned join capabilities to Hive's [sorted-]bucket-map-join
[ https://issues.apache.org/jira/browse/HIVE-27734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17772576#comment-17772576 ] okumin commented on HIVE-27734: --- I'm taking a look since I have experience implementing Bucket Map Join for non-native tables. > Add Iceberg's storage-partitioned join capabilities to Hive's > [sorted-]bucket-map-join > -- > > Key: HIVE-27734 > URL: https://issues.apache.org/jira/browse/HIVE-27734 > Project: Hive > Issue Type: Improvement > Components: Iceberg integration > Affects Versions: 4.0.0-alpha-2 > Reporter: Janos Kovacs > Priority: Major > > Iceberg's 'data bucketing' is implemented through its rich (function-based) > partitioning feature, which helps to optimize join operations - called storage > partitioned joins. > doc: > [https://docs.google.com/document/d/1foTkDSM91VxKgkEcBMsuAvEjNybjja-uHk-r3vtXWFE/edit#heading=h.82w8qxfl2uwl] > spark impl.: https://issues.apache.org/jira/browse/SPARK-37375 > This feature is not yet leveraged by Hive's bucket-map-join > optimization, neither alone nor combined with Iceberg's SortOrder for > sorted-bucket-map-join. > Customers migrating from the Hive table format to the Iceberg format with a > storage-optimized schema will experience performance degradation on large tables, > where Iceberg's no-listing performance improvement is significantly smaller than > the join performance gained from bucket-map-join or even sorted-bucket-map-join. 
> > {noformat} > SET hive.query.results.cache.enabled=false; > SET hive.fetch.task.conversion = none; > SET hive.optimize.bucketmapjoin=true; > SET hive.convert.join.bucket.mapjoin.tez=true; > SET hive.auto.convert.join.noconditionaltask.size=1000; > --if you are working with external table, you need this for bmj: > SET hive.disable.unsafe.external.table.operations=false; > -- HIVE BUCKET-MAP-JOIN > DROP TABLE IF EXISTS default.hivebmjt1 PURGE; > DROP TABLE IF EXISTS default.hivebmjt2 PURGE; > CREATE TABLE default.hivebmjt1 (id int, txt string) CLUSTERED BY (id) INTO 8 > BUCKETS; > CREATE TABLE default.hivebmjt2 (id int, txt string); > INSERT INTO default.hivebmjt1 VALUES > (1,'1'),(2,'2'),(3,'3'),(4,'4'),(5,'5'),(6,'6'),(7,'7'),(8,'8'); > INSERT INTO default.hivebmjt2 VALUES (1,'1'),(2,'2'),(3,'3'),(4,'4'); > EXPLAIN > SELECT * FROM default.hivebmjt1 f INNER JOIN default.hivebmjt2 d ON f.id > = d.id; > EXPLAIN > SELECT * FROM default.hivebmjt1 f LEFT OUTER JOIN default.hivebmjt2 d ON f.id > = d.id; > -- Both are optimized into BMJ > -- ICEBERG BUCKET-MAP-JOIN via Iceberg's storage-partitioned join > DROP TABLE IF EXISTS default.icespbmjt1 PURGE; > DROP TABLE IF EXISTS default.icespbmjt2 PURGE; > CREATE TABLE default.icespbmjt1 (txt string) PARTITIONED BY (id int) STORED > BY ICEBERG ; > CREATE TABLE default.icespbmjt2 (txt string) PARTITIONED BY (id int) STORED > BY ICEBERG ; > INSERT INTO default.icespbmjt1 VALUES ('1',1),('2',2),('3',3),('4',4); > INSERT INTO default.icespbmjt2 VALUES ('1',1),('2',2),('3',3),('4',4); > EXPLAIN > SELECT * FROM default.icespbmjt1 f INNER JOIN default.icespbmjt2 d ON > f.id = d.id; > EXPLAIN > SELECT * FROM default.icespbmjt1 f LEFT OUTER JOIN default.icespbmjt2 d ON > f.id = d.id; > -- Only Map-Join optimised > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27776) Iceberg: Upgrade iceberg version to 1.4.0
zhangbutao created HIVE-27776: - Summary: Iceberg: Upgrade iceberg version to 1.4.0 Key: HIVE-27776 URL: https://issues.apache.org/jira/browse/HIVE-27776 Project: Hive Issue Type: Improvement Components: Iceberg integration Reporter: zhangbutao [https://mvnrepository.com/artifact/org.apache.iceberg/iceberg-core/1.4.0] Iceberg 1.4.0 has been released, and we need to upgrade the Iceberg dependency from 1.3.1 to 1.4.0. Meanwhile, we should port some Hive catalog changes from the Iceberg repo to the Hive repo. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-27775) DirectSQL and JDO results are different when fetching partitions by timestamp in DST shift
[ https://issues.apache.org/jira/browse/HIVE-27775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17772556#comment-17772556 ] Stamatis Zampetakis commented on HIVE-27775: We have discovered this bug while writing tests for HIVE-27760. > DirectSQL and JDO results are different when fetching partitions by timestamp > in DST shift > -- > > Key: HIVE-27775 > URL: https://issues.apache.org/jira/browse/HIVE-27775 > Project: Hive > Issue Type: Bug > Components: Standalone Metastore > Affects Versions: 4.0.0-beta-1 > Reporter: Stamatis Zampetakis > Priority: Major > > DirectSQL and JDO results are different when fetching partitions by timestamp > in DST shift. > {code:sql} > --! qt:timezone:Europe/Paris > CREATE EXTERNAL TABLE payments (card string) PARTITIONED BY(txn_datetime > TIMESTAMP) STORED AS ORC; > INSERT into payments VALUES('---', '2023-03-26 02:30:00'); > SELECT * FROM payments WHERE txn_datetime = '2023-03-26 02:30:00'; > {code} > The '2023-03-26 02:30:00' is a timestamp that in the Europe/Paris timezone falls > exactly in the middle of the DST shift. In this particular timezone this date > time never really exists, since clocks jump directly from 02:00:00 to > 03:00:00. However, the TIMESTAMP data type in Hive is timezone agnostic > (https://cwiki.apache.org/confluence/display/Hive/Different+TIMESTAMP+types), > so it is a perfectly valid timestamp that can be inserted in a table, and we > must be able to recover it back. > For the SELECT query above, partition pruning kicks in and calls the > ObjectStore#getPartitionsByExpr method in order to fetch the respective > partitions matching the timestamp from HMS. > The tests however reveal that the DirectSQL and JDO paths are not returning the > same results, leading to an exception when VerifyingObjectStore is used. > According to the error below, DirectSQL is able to recover one partition from > HMS (expected) while JDO/ORM returns empty (not expected). 
> {noformat} > 2023-10-06T03:51:19,406 ERROR [80252df4-3fdc-4971-badf-ad67ce8567c7 main] > metastore.VerifyingObjectStore: Lists are not the same size: SQL 1, ORM 0 > 2023-10-06T03:51:19,409 ERROR [80252df4-3fdc-4971-badf-ad67ce8567c7 main] > metastore.RetryingHMSHandler: MetaException(message:Lists are not the same > size: SQL 1, ORM 0) > at > org.apache.hadoop.hive.metastore.VerifyingObjectStore.verifyLists(VerifyingObjectStore.java:148) > at > org.apache.hadoop.hive.metastore.VerifyingObjectStore.getPartitionsByExpr(VerifyingObjectStore.java:88) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:97) > at com.sun.proxy.$Proxy57.getPartitionsByExpr(Unknown Source) > at > org.apache.hadoop.hive.metastore.HMSHandler.get_partitions_spec_by_expr(HMSHandler.java:7330) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:98) > at > org.apache.hadoop.hive.metastore.AbstractHMSHandlerProxy.invoke(AbstractHMSHandlerProxy.java:82) > at com.sun.proxy.$Proxy59.get_partitions_spec_by_expr(Unknown Source) > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getPartitionsSpecByExprInternal(HiveMetaStoreClient.java:2472) > at > org.apache.hadoop.hive.ql.metadata.HiveMetaStoreClientWithLocalCache.getPartitionsSpecByExprInternal(HiveMetaStoreClientWithLocalCache.java:396) > at > 
org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.getPartitionsSpecByExprInternal(SessionHiveMetaStoreClient.java:2279) > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient.listPartitionsSpecByExpr(HiveMetaStoreClient.java:2484) > at > org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.listPartitionsSpecByExpr(SessionHiveMetaStoreClient.java:1346) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.r
[jira] [Assigned] (HIVE-27775) DirectSQL and JDO results are different when fetching partitions by timestamp in DST shift
[ https://issues.apache.org/jira/browse/HIVE-27775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stamatis Zampetakis reassigned HIVE-27775: -- Assignee: (was: Stamatis Zampetakis) > DirectSQL and JDO results are different when fetching partitions by timestamp > in DST shift > -- > > Key: HIVE-27775 > URL: https://issues.apache.org/jira/browse/HIVE-27775 > Project: Hive > Issue Type: Bug > Components: Standalone Metastore >Affects Versions: 4.0.0-beta-1 >Reporter: Stamatis Zampetakis >Priority: Major > > DirectSQL and JDO results are different when fetching partitions by timestamp > in DST shift. > {code:sql} > --! qt:timezone:Europe/Paris > CREATE EXTERNAL TABLE payments (card string) PARTITIONED BY(txn_datetime > TIMESTAMP) STORED AS ORC; > INSERT into payments VALUES('---', '2023-03-26 02:30:00'); > SELECT * FROM payments WHERE txn_datetime = '2023-03-26 02:30:00'; > {code} > The '2023-03-26 02:30:00' is a timestamp that in Europe/Paris timezone falls > exactly in the middle of the DST shift. In this particular timezone this date > time never really exists since we are jumping directly from 02:00:00 to > 03:00:00. However, the TIMESTAMP data type in Hive is timezone agnostic > (https://cwiki.apache.org/confluence/display/Hive/Different+TIMESTAMP+types) > so it is a perfectly valid timestamp that can be inserted in a table and we > must be able to recover it back. > For the SELECT query above, partition pruning kicks in and calls the > ObjectStore#getPartitionsByExpr method in order to fetch the respective > partitions matching the timestamp from HMS. > The tests however reveal that DirectSQL and JDO paths are not returning the > same results leading to an exception when VerifyingObjectStore is used. > According to the error below DirectSQL is able to recover one partition from > HMS (expected) while JDO/ORM returns empty (not expected). 
> {noformat} > 2023-10-06T03:51:19,406 ERROR [80252df4-3fdc-4971-badf-ad67ce8567c7 main] > metastore.VerifyingObjectStore: Lists are not the same size: SQL 1, ORM 0 > 2023-10-06T03:51:19,409 ERROR [80252df4-3fdc-4971-badf-ad67ce8567c7 main] > metastore.RetryingHMSHandler: MetaException(message:Lists are not the same > size: SQL 1, ORM 0) > at > org.apache.hadoop.hive.metastore.VerifyingObjectStore.verifyLists(VerifyingObjectStore.java:148) > at > org.apache.hadoop.hive.metastore.VerifyingObjectStore.getPartitionsByExpr(VerifyingObjectStore.java:88) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:97) > at com.sun.proxy.$Proxy57.getPartitionsByExpr(Unknown Source) > at > org.apache.hadoop.hive.metastore.HMSHandler.get_partitions_spec_by_expr(HMSHandler.java:7330) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:98) > at > org.apache.hadoop.hive.metastore.AbstractHMSHandlerProxy.invoke(AbstractHMSHandlerProxy.java:82) > at com.sun.proxy.$Proxy59.get_partitions_spec_by_expr(Unknown Source) > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getPartitionsSpecByExprInternal(HiveMetaStoreClient.java:2472) > at > org.apache.hadoop.hive.ql.metadata.HiveMetaStoreClientWithLocalCache.getPartitionsSpecByExprInternal(HiveMetaStoreClientWithLocalCache.java:396) > at > 
org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.getPartitionsSpecByExprInternal(SessionHiveMetaStoreClient.java:2279) > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient.listPartitionsSpecByExpr(HiveMetaStoreClient.java:2484) > at > org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.listPartitionsSpecByExpr(SessionHiveMetaStoreClient.java:1346) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.hadoop.h
[jira] [Created] (HIVE-27775) DirectSQL and JDO results are different when fetching partitions by timestamp in DST shift
Stamatis Zampetakis created HIVE-27775: -- Summary: DirectSQL and JDO results are different when fetching partitions by timestamp in DST shift Key: HIVE-27775 URL: https://issues.apache.org/jira/browse/HIVE-27775 Project: Hive Issue Type: Bug Components: Standalone Metastore Affects Versions: 4.0.0-beta-1 Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis DirectSQL and JDO results are different when fetching partitions by timestamp in DST shift. {code:sql} --! qt:timezone:Europe/Paris CREATE EXTERNAL TABLE payments (card string) PARTITIONED BY(txn_datetime TIMESTAMP) STORED AS ORC; INSERT into payments VALUES('---', '2023-03-26 02:30:00'); SELECT * FROM payments WHERE txn_datetime = '2023-03-26 02:30:00'; {code} The '2023-03-26 02:30:00' is a timestamp that in Europe/Paris timezone falls exactly in the middle of the DST shift. In this particular timezone this date time never really exists since we are jumping directly from 02:00:00 to 03:00:00. However, the TIMESTAMP data type in Hive is timezone agnostic (https://cwiki.apache.org/confluence/display/Hive/Different+TIMESTAMP+types) so it is a perfectly valid timestamp that can be inserted in a table and we must be able to recover it back. For the SELECT query above, partition pruning kicks in and calls the ObjectStore#getPartitionsByExpr method in order to fetch the respective partitions matching the timestamp from HMS. The tests however reveal that DirectSQL and JDO paths are not returning the same results leading to an exception when VerifyingObjectStore is used. According to the error below DirectSQL is able to recover one partition from HMS (expected) while JDO/ORM returns empty (not expected). 
{noformat} 2023-10-06T03:51:19,406 ERROR [80252df4-3fdc-4971-badf-ad67ce8567c7 main] metastore.VerifyingObjectStore: Lists are not the same size: SQL 1, ORM 0 2023-10-06T03:51:19,409 ERROR [80252df4-3fdc-4971-badf-ad67ce8567c7 main] metastore.RetryingHMSHandler: MetaException(message:Lists are not the same size: SQL 1, ORM 0) at org.apache.hadoop.hive.metastore.VerifyingObjectStore.verifyLists(VerifyingObjectStore.java:148) at org.apache.hadoop.hive.metastore.VerifyingObjectStore.getPartitionsByExpr(VerifyingObjectStore.java:88) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:97) at com.sun.proxy.$Proxy57.getPartitionsByExpr(Unknown Source) at org.apache.hadoop.hive.metastore.HMSHandler.get_partitions_spec_by_expr(HMSHandler.java:7330) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:98) at org.apache.hadoop.hive.metastore.AbstractHMSHandlerProxy.invoke(AbstractHMSHandlerProxy.java:82) at com.sun.proxy.$Proxy59.get_partitions_spec_by_expr(Unknown Source) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getPartitionsSpecByExprInternal(HiveMetaStoreClient.java:2472) at org.apache.hadoop.hive.ql.metadata.HiveMetaStoreClientWithLocalCache.getPartitionsSpecByExprInternal(HiveMetaStoreClientWithLocalCache.java:396) at 
org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.getPartitionsSpecByExprInternal(SessionHiveMetaStoreClient.java:2279) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.listPartitionsSpecByExpr(HiveMetaStoreClient.java:2484) at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.listPartitionsSpecByExpr(SessionHiveMetaStoreClient.java:1346) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:213) at com.sun.proxy.$Proxy60.listPartitionsSpecByExpr(Unknown Source) at org.apache.hadoop.hive.ql.metadata.Hive.getPartitionsByExpr(Hive.java:4507) at org.apache.hadoop.hive.ql.optimizer.ppr.PartitionPruner.getPartitionsFromServer(Part
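The "nonexistent local time" described in HIVE-27775 can be demonstrated outside Hive. Here is a small sketch using Python's standard zoneinfo module, as an analogy only (the actual bug involves the HMS DirectSQL and JDO code paths, not this code):

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

paris = ZoneInfo("Europe/Paris")

# 2023-03-26 02:30 falls inside the spring-forward gap in Europe/Paris:
# local clocks jump from 02:00:00 directly to 03:00:00.
local = datetime(2023, 3, 26, 2, 30, tzinfo=paris)

# Round-tripping through UTC shifts the wall-clock time to 03:30,
# showing that 02:30 never actually occurs on a Paris clock that day.
round_trip = local.astimezone(timezone.utc).astimezone(paris)
print(round_trip)  # 2023-03-26 03:30:00+02:00
```

Hive's TIMESTAMP type is timezone agnostic, so it must store and return 02:30 verbatim; any code path that routes the value through a zoned instant (as above) risks silently shifting it.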
[jira] [Updated] (HIVE-27774) Clean up properties
[ https://issues.apache.org/jira/browse/HIVE-27774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-27774: -- Labels: pull-request-available (was: ) > Clean up properties > --- > > Key: HIVE-27774 > URL: https://issues.apache.org/jira/browse/HIVE-27774 > Project: Hive > Issue Type: Task >Reporter: Zsolt Miskolczi >Priority: Major > Labels: pull-request-available > > Clean up after removing PowerMock > > Remove powermock.version > For mockito-inline, create a version property and use it. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27774) Clean up properties
Zsolt Miskolczi created HIVE-27774: -- Summary: Clean up properties Key: HIVE-27774 URL: https://issues.apache.org/jira/browse/HIVE-27774 Project: Hive Issue Type: Task Reporter: Zsolt Miskolczi Clean up after removing PowerMock Remove powermock.version For mockito-inline, create a version property and use it. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-27760) WHERE condition on DATE type partitioning column leads to wrong results
[ https://issues.apache.org/jira/browse/HIVE-27760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stamatis Zampetakis updated HIVE-27760: --- Summary: WHERE condition on DATE type partitioning column leads to wrong results (was: Filter on date type partitioning column producing 0 results) > WHERE condition on DATE type partitioning column leads to wrong results > > > Key: HIVE-27760 > URL: https://issues.apache.org/jira/browse/HIVE-27760 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Reporter: Dayakar M >Assignee: Dayakar M >Priority: Major > Labels: pull-request-available > > Filter on date type partitioning columns producing 0 results. > {*}Reproduction steps{*}: > 1. test.q > {noformat} > CREATE EXTERNAL TABLE test(a string,b String) PARTITIONED BY(PartitionDate > DATE) STORED AS ORC; > INSERT into test(PartitionDate, a,b) > VALUES('2023-01-01','2023-01-01','2023-01-01'); > INSERT into test(PartitionDate, a,b) > VALUES('2023-01-02','2023-01-02','2023-01-02'); > select count(*) from test where PartitionDate = '2023-01-01';{noformat} > 2. Command to execute (pass different timezone than server) > {noformat} > mvn test -Dtest=TestMiniTezCliDriver -Dqfile=test.q > -Dtest.output.overwrite=true -Duser.timezone=Asia/Hong_Kong{noformat} > > *RootCause:* As a part of HIVE-27373 issue fix to parse the string to > java.sql.Date object, java.text.SimpleDateFormat is replaced with > java.time.format.DateTimeFormatter using java.time.LocalDate which represents > a Date without TimeZone. Here this input is passed [here > |https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java#L1370] > which uses SimpleDateFormat(parsing dates in a locale-sensitive manner) and > java.sql.Date. 
Here the user timezone is different, so the actual value gets changed (for example 2023-01-01 becomes 2022-12-31), which does not match any partition, so nothing gets returned. > *Solution:* > 1. In MetaStoreDirectSql.java, we should use java.time.format.DateTimeFormatter with java.time.LocalDate so that it returns the proper date string. > Or > 2. Revert the code changes (only formatter and LocalDate) done for HIVE-27373 and use SimpleDateFormat and java.sql.Date. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HIVE-27760) WHERE condition on DATE type partitioning column leads to wrong results
[ https://issues.apache.org/jira/browse/HIVE-27760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stamatis Zampetakis resolved HIVE-27760. Fix Version/s: 4.0.0 Resolution: Fixed Fixed in https://github.com/apache/hive/commit/1e919dd98169212ab10d267de0dc25bbbdf96b1e. Thanks for the PR [~Dayakar]! > WHERE condition on DATE type partitioning column leads to wrong results > > > Key: HIVE-27760 > URL: https://issues.apache.org/jira/browse/HIVE-27760 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Reporter: Dayakar M >Assignee: Dayakar M >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Filter on date type partitioning columns producing 0 results. > {*}Reproduction steps{*}: > 1. test.q > {noformat} > CREATE EXTERNAL TABLE test(a string,b String) PARTITIONED BY(PartitionDate > DATE) STORED AS ORC; > INSERT into test(PartitionDate, a,b) > VALUES('2023-01-01','2023-01-01','2023-01-01'); > INSERT into test(PartitionDate, a,b) > VALUES('2023-01-02','2023-01-02','2023-01-02'); > select count(*) from test where PartitionDate = '2023-01-01';{noformat} > 2. Command to execute (pass different timezone than server) > {noformat} > mvn test -Dtest=TestMiniTezCliDriver -Dqfile=test.q > -Dtest.output.overwrite=true -Duser.timezone=Asia/Hong_Kong{noformat} > > *RootCause:* As a part of HIVE-27373 issue fix to parse the string to > java.sql.Date object, java.text.SimpleDateFormat is replaced with > java.time.format.DateTimeFormatter using java.time.LocalDate which represents > a Date without TimeZone. Here this input is passed [here > |https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java#L1370] > which uses SimpleDateFormat(parsing dates in a locale-sensitive manner) and > java.sql.Date. 
Here the user timezone is different, so the actual value gets changed (for example 2023-01-01 becomes 2022-12-31), which does not match any partition, so nothing gets returned. > *Solution:* > 1. In MetaStoreDirectSql.java, we should use java.time.format.DateTimeFormatter with java.time.LocalDate so that it returns the proper date string. > Or > 2. Revert the code changes (only formatter and LocalDate) done for HIVE-27373 and use SimpleDateFormat and java.sql.Date. -- This message was sent by Atlassian Jira (v8.20.10#820010)
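The root cause in HIVE-27760 is the difference between timezone-sensitive date parsing (SimpleDateFormat + java.sql.Date, which go through an instant) and timezone-agnostic parsing (DateTimeFormatter + LocalDate). The same class of shift can be reproduced with this Python analogy; the timezones chosen here are illustrative, not the ones from the Jira:

```python
from datetime import date, datetime, timezone
from zoneinfo import ZoneInfo

# Timezone-sensitive path (analogous to SimpleDateFormat + java.sql.Date):
# '2023-01-01' becomes an instant (midnight UTC here), and reading that
# instant back in a different user timezone shifts the calendar date.
instant = datetime(2023, 1, 1, tzinfo=timezone.utc)
shifted = datetime.fromtimestamp(instant.timestamp(),
                                 ZoneInfo("America/New_York")).date()
print(shifted)  # 2022-12-31 -- no longer matches the 2023-01-01 partition

# Timezone-agnostic path (analogous to DateTimeFormatter + LocalDate):
# the string is parsed as a plain calendar date, unaffected by any timezone.
agnostic = date.fromisoformat("2023-01-01")
print(agnostic)  # 2023-01-01
```

This is why solution 1 above (staying on DateTimeFormatter/LocalDate end to end) avoids the shift: a plain calendar date never passes through an instant that a user timezone can move.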
[jira] [Commented] (HIVE-27084) Iceberg: Stats are not populated correctly during query compilation
[ https://issues.apache.org/jira/browse/HIVE-27084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17772464#comment-17772464 ] okumin commented on HIVE-27084: --- [~rajesh.balamohan] This may not be a problem with Iceberg; it could be expected behavior. First, the first query scans two columns, `ss_sold_date_sk` and `ss_wholesale_cost`, while the second query scans a single column, `ss_sold_date_sk`. I guess `ss_wholesale_cost` is not a partition column, so the first query has to scan more data than the second one. {code:sql} select count(*) from store_sales where ss_sold_date_sk=2450822 and ss_wholesale_cost > 0.0; select count(*) from store_sales where ss_sold_date_sk=2450822 {code} Then, I guess `ss_sold_date_sk` is int or bigint and `ss_wholesale_cost` is decimal. On my machine, the derived size of bigint is 8 while that of decimal is 112. It can potentially make a big difference. > Iceberg: Stats are not populated correctly during query compilation > --- > > Key: HIVE-27084 > URL: https://issues.apache.org/jira/browse/HIVE-27084 > Project: Hive > Issue Type: Improvement > Components: Iceberg integration > Reporter: Rajesh Balamohan > Priority: Major > Labels: performance > > - Table stats are not properly used/computed during the query compilation phase. > - Here is an example. 
Check out the query with the filter which give more > data than the regular query > This is just an example, real world queries can have bad query plans due to > this > {{10470974584 with filter, vs 303658262936 without filter}} > {noformat} > explain select count(*) from store_sales where ss_sold_date_sk=2450822 and > ss_wholesale_cost > 0.0 > Explain > STAGE DEPENDENCIES: > Stage-1 is a root stage > Stage-0 depends on stages: Stage-1 > STAGE PLANS: > Stage: Stage-1 > Tez > DagId: hive_20230216065808_80d68e3f-3a6b-422b-9265-50bc707ae3c6:48 > Edges: > Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE) > DagName: hive_20230216065808_80d68e3f-3a6b-422b-9265-50bc707ae3c6:48 > Vertices: > Map 1 > Map Operator Tree: > TableScan > alias: store_sales > filterExpr: ((ss_sold_date_sk = 2450822) and > (ss_wholesale_cost > 0)) (type: boolean) > Statistics: Num rows: 2755519629 Data size: 303658262936 > Basic stats: COMPLETE Column stats: NONE > Filter Operator > predicate: ((ss_sold_date_sk = 2450822) and > (ss_wholesale_cost > 0)) (type: boolean) > Statistics: Num rows: 5 Data size: 550 Basic stats: > COMPLETE Column stats: NONE > Select Operator > Statistics: Num rows: 5 Data size: 550 Basic stats: > COMPLETE Column stats: NONE > Group By Operator > aggregations: count() > minReductionHashAggr: 0.99 > mode: hash > outputColumnNames: _col0 > Statistics: Num rows: 1 Data size: 124 Basic stats: > COMPLETE Column stats: NONE > Reduce Output Operator > null sort order: > sort order: > Statistics: Num rows: 1 Data size: 124 Basic stats: > COMPLETE Column stats: NONE > value expressions: _col0 (type: bigint) > Execution mode: vectorized, llap > LLAP IO: all inputs (cache only) > Reducer 2 > Execution mode: vectorized, llap > Reduce Operator Tree: > Group By Operator > aggregations: count(VALUE._col0) > mode: mergepartial > outputColumnNames: _col0 > Statistics: Num rows: 1 Data size: 124 Basic stats: COMPLETE > Column stats: NONE > File Output Operator > compressed: false > Statistics: Num 
rows: 1 Data size: 124 Basic stats: > COMPLETE Column stats: NONE > table: > input format: > org.apache.hadoop.mapred.SequenceFileInputFormat > output format: > org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat > serde: > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe > Stage: Stage-0 > Fetch Operator > limit: -1 > Processor Tree: > ListSink > 58 rows selected (0.73 seconds) > explain select count(*) from store_sales where ss_sold_date_sk=2450822 > INFO : Starting task [Stage-3:EXPLAIN] in serial mode > INFO : Completed executing > command(queryId=hive_20230216065813_e51482a2-1c9a-41a7-b1b3-9aec2fba9ba7); > Time ta