[jira] [Commented] (HIVE-28014) to_unix_timestamp udf produces inconsistent results in different jdk versions

2024-09-23 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17883788#comment-17883788
 ] 

Stamatis Zampetakis commented on HIVE-28014:


I suppose that now that HIVE-28337 is fixed, the first failure reported here in 
TestMetaStoreUtils should no longer appear.

> to_unix_timestamp udf produces inconsistent results in different jdk versions
> -
>
> Key: HIVE-28014
> URL: https://issues.apache.org/jira/browse/HIVE-28014
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 4.0.0-beta-1
>Reporter: Wechar
>Assignee: Wechar
>Priority: Major
>
> In HIVE-27999 we updated the CI docker image, which upgrades jdk8 from 
> {*}1.8.0_262-b19{*} to *1.8.0_392-b08*. This upgrade caused 3 timestamp-related 
> tests to fail:
> *1. Testing / split-02 / PostProcess / 
> testTimestampToString[zoneId=Europe/Paris, timestamp=2417-03-26T02:08:43] – 
> org.apache.hadoop.hive.metastore.utils.TestMetaStoreUtils*
> {code:bash}
> Error
> expected:<2417-03-26 0[2]:08:43> but was:<2417-03-26 0[3]:08:43>
> Stacktrace
> org.junit.ComparisonFailure: expected:<2417-03-26 0[2]:08:43> but 
> was:<2417-03-26 0[3]:08:43>
>   at 
> org.apache.hadoop.hive.metastore.utils.TestMetaStoreUtils.testTimestampToString(TestMetaStoreUtils.java:85)
> {code}
> *2. Testing / split-01 / PostProcess / testCliDriver[udf5] – 
> org.apache.hadoop.hive.cli.split24.TestMiniLlapLocalCliDriver*
> {code:bash}
> Error
> Client Execution succeeded but contained differences (error code = 1) after 
> executing udf5.q 
> 263c263
> < 1400-11-08 07:35:34
> ---
> > 1400-11-08 07:35:24
> 272c272
> < 1800-11-08 07:35:34
> ---
> > 1800-11-08 07:35:24
> 434c434
> < 1399-12-31 23:35:34
> ---
> > 1399-12-31 23:35:24
> 443c443
> < 1799-12-31 23:35:34
> ---
> > 1799-12-31 23:35:24
> 452c452
> < 1899-12-31 23:35:34
> ---
> > 1899-12-31 23:35:24
> {code}
> *3. Testing / split-19 / PostProcess / testStringArg2 – 
> org.apache.hadoop.hive.ql.udf.generic.TestGenericUDFToUnixTimestamp*
> {code:bash}
> Stacktrace
> org.junit.ComparisonFailure: expected:<-17984790[40]0> but 
> was:<-17984790[39]0>
>   at org.junit.Assert.assertEquals(Assert.java:117)
>   at org.junit.Assert.assertEquals(Assert.java:146)
>   at 
> org.apache.hadoop.hive.ql.udf.generic.TestGenericUDFToUnixTimestamp.runAndVerify(TestGenericUDFToUnixTimestamp.java:70)
>   at 
> org.apache.hadoop.hive.ql.udf.generic.TestGenericUDFToUnixTimestamp.testStringArg2(TestGenericUDFToUnixTimestamp.java:167)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> {code}
> It may be a JDK bug that was fixed in the newer release, because we can get the 
> same result from Spark:
> {code:sql}
> spark-sql> select to_unix_timestamp(to_timestamp("1400-02-01 00:00:00 ICT", 
> "yyyy-MM-dd HH:mm:ss z"), "US/Pacific");
> -17984790390
> {code}
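> 
> For reference, a minimal sketch of the suspected mechanism (plain java.time, 
> assuming only that the JDK's bundled tzdata changed between the two builds; 
> this is not the UDF's code path): for dates before standard time was adopted, 
> US/Pacific resolves to local mean time, whose second-precision offset is 
> defined by tzdata, so a tzdata update can shift the computed epoch second.
> {code:java}
> import java.time.LocalDateTime;
> import java.time.ZoneId;
> 
> public class TzdataDrift {
>   public static void main(String[] args) {
>     // The LMT offset applied to a 1400-era wall-clock time comes from the
>     // JDK's bundled tzdata; a tzdata update that nudges it by a few seconds
>     // moves the epoch second, e.g. the 10-second drift reported above.
>     LocalDateTime ldt = LocalDateTime.of(1400, 2, 1, 0, 0, 0);
>     long epochSecond = ldt.atZone(ZoneId.of("US/Pacific")).toEpochSecond();
>     System.out.println(epochSecond); // value depends on the JDK's tzdata
>   }
> }
> {code}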





[jira] [Resolved] (HIVE-28337) Process timestamps at UTC timezone instead of local timezone in MetaStoreUtils

2024-09-23 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis resolved HIVE-28337.

Fix Version/s: 4.1.0
   Resolution: Fixed

Fixed in 
[https://github.com/apache/hive/commit/e31811bb7c6670ab1f725adde3aa2b012ca64415]

Thanks for the PR [~kiranvelumuri] and for the review [~wechar] !

> Process timestamps at UTC timezone instead of local timezone in MetaStoreUtils
> --
>
> Key: HIVE-28337
> URL: https://issues.apache.org/jira/browse/HIVE-28337
> Project: Hive
>  Issue Type: Bug
>Reporter: Kiran Velumuri
>Assignee: Kiran Velumuri
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.1.0
>
> Attachments: image-2024-06-18-12-42-05-646.png, 
> image-2024-06-18-12-42-31-472.png
>
>
> Currently in MetaStoreUtils, the conversion to/from timestamp and string 
> makes use of LocalDateTime in the local time zone while processing 
> timestamps. This causes issues with representing certain timestamps, as 
> described below. Instead, it is proposed to use java.time.Instant, which 
> represents a point on the time-line, while dealing with timestamps; this 
> overcomes the issue with representing such timestamps. Accordingly, the test 
> class for MetaStoreUtils (TestMetaStoreUtils) has also been modified to 
> account for these changes.
> +Failing scenario:+
> Timestamps in time zones which observe daylight saving time, during which the 
> clock is set forward (typically 2:00 AM - 3:00 AM).
> Example: 2417-03-26T02:08:43 in Europe/Paris is invalid and gets converted to 
> 2417-03-26T03:08:43 by the Timestamp.valueOf() method, when instead we want 
> to represent the original timestamp without conversion.
> This happens because the timestamp is represented as a LocalDateTime in 
> TestMetaStoreUtils, which is independent of the timestamp's time zone. When 
> this LocalDateTime is combined with the time zone, it yields an invalid 
> timestamp.
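> 
> A minimal illustration of the gap behavior and the proposed UTC-based 
> handling (plain java.time; nothing Hive-specific is assumed):
> {code:java}
> import java.time.Instant;
> import java.time.LocalDateTime;
> import java.time.ZoneId;
> import java.time.ZoneOffset;
> 
> public class DstGapDemo {
>   public static void main(String[] args) {
>     LocalDateTime ldt = LocalDateTime.of(2417, 3, 26, 2, 8, 43);
> 
>     // Resolving against the local zone lands in the spring-forward gap,
>     // so the wall-clock value silently shifts by one hour:
>     System.out.println(ldt.atZone(ZoneId.of("Europe/Paris")).toLocalDateTime());
>     // -> 2417-03-26T03:08:43
> 
>     // Converting via an Instant pinned to UTC (no DST, hence no gaps)
>     // round-trips the original wall-clock value unchanged:
>     Instant instant = ldt.toInstant(ZoneOffset.UTC);
>     System.out.println(LocalDateTime.ofInstant(instant, ZoneOffset.UTC));
>     // -> 2417-03-26T02:08:43
>   }
> }
> {code}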





[jira] [Updated] (HIVE-28450) Follow the array size of JVM in Hive transferable objects

2024-09-23 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-28450:
---
Fix Version/s: (was: 4.0.1)

> Follow the array size of JVM in Hive transferable objects
> -
>
> Key: HIVE-28450
> URL: https://issues.apache.org/jira/browse/HIVE-28450
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 3.1.3, 4.0.0
>Reporter: Sercan Tekin
>Priority: Major
> Fix For: 4.1.0
>
>
> We are experiencing an issue with a partitioned table in Hive. When querying 
> the table via the Hive CLI, the data retrieval works as expected without any 
> errors. However, when attempting to query the same table through Spark, we 
> encounter the following error in the HMS logs:
> {code:java}
> 2024-01-30 23:03:59,052 main DEBUG 
> org.apache.logging.log4j.core.util.SystemClock does not support precise 
> timestamps.
> Exception in thread "pool-7-thread-4" java.lang.OutOfMemoryError: Requested 
> array size exceeds VM limit
>   at java.util.Arrays.copyOf(Arrays.java:3236)
>   at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:118)
>   at 
> java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
>   at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:153)
>   at 
> org.apache.thrift.transport.TSaslTransport.write(TSaslTransport.java:473)
>   at 
> org.apache.thrift.transport.TSaslServerTransport.write(TSaslServerTransport.java:42)
>   at 
> org.apache.thrift.protocol.TBinaryProtocol.writeString(TBinaryProtocol.java:227)
>   at 
> org.apache.hadoop.hive.metastore.api.FieldSchema$FieldSchemaStandardScheme.write(FieldSchema.java:517)
>   at 
> org.apache.hadoop.hive.metastore.api.FieldSchema$FieldSchemaStandardScheme.write(FieldSchema.java:456)
>   at 
> org.apache.hadoop.hive.metastore.api.FieldSchema.write(FieldSchema.java:394)
>   at 
> org.apache.hadoop.hive.metastore.api.StorageDescriptor$StorageDescriptorStandardScheme.write(StorageDescriptor.java:1423)
>   at 
> org.apache.hadoop.hive.metastore.api.StorageDescriptor$StorageDescriptorStandardScheme.write(StorageDescriptor.java:1250)
>   at 
> org.apache.hadoop.hive.metastore.api.StorageDescriptor.write(StorageDescriptor.java:1116)
>   at 
> org.apache.hadoop.hive.metastore.api.Partition$PartitionStandardScheme.write(Partition.java:1033)
>   at 
> org.apache.hadoop.hive.metastore.api.Partition$PartitionStandardScheme.write(Partition.java:890)
>   at 
> org.apache.hadoop.hive.metastore.api.Partition.write(Partition.java:786)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partitions_result$get_partitions_resultStandardScheme.write(ThriftHiveMetastore.java)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partitions_result$get_partitions_resultStandardScheme.write(ThriftHiveMetastore.java)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partitions_result.write(ThriftHiveMetastore.java)
>   at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:58)
>   at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:38)
>   at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:603)
>   at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:600)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1669)
>   at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:600)
>   at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:313)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:750)
> Exception in thread "pool-7-thread-6" java.lang.OutOfMemoryError: Requested 
> array size exceeds VM limit
> Exception in thread "pool-7-thread-9" java.lang.OutOfMemoryError: Requested 
> array size exceeds VM limit
> {code}
> This error appears to be related to the JVM’s conservative approach to array 
> size a

[jira] [Commented] (HIVE-28450) Follow the array size of JVM in Hive transferable objects

2024-09-23 Thread Zhihua Deng (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17883776#comment-17883776
 ] 

Zhihua Deng commented on HIVE-28450:


Deferring this Jira to 4.1.0.

Not sure how to change the VM limit. The workaround makes sense to me, and I 
think it's better to batch-retrieve the partitions in this case: even if we 
have a fix on the server side, the client might see the same problem when 
receiving the bytes.
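
A rough client-side sketch of that batching idea (it assumes an 
IMetaStoreClient handle and a caller-chosen batch size; a sketch, not a 
committed fix):
{code:java}
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hive.metastore.IMetaStoreClient;
import org.apache.hadoop.hive.metastore.api.Partition;

public class PartitionBatches {
  // Fetch partition names first (cheap), then hydrate them in chunks so no
  // single Thrift response materializes every partition in one byte array.
  static List<Partition> fetchInBatches(IMetaStoreClient client, String db,
      String table, int batchSize) throws Exception {
    List<String> names = client.listPartitionNames(db, table, (short) -1);
    List<Partition> result = new ArrayList<>(names.size());
    for (int i = 0; i < names.size(); i += batchSize) {
      List<String> chunk = names.subList(i, Math.min(i + batchSize, names.size()));
      result.addAll(client.getPartitionsByNames(db, table, chunk));
    }
    return result;
  }
}
{code}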

> Follow the array size of JVM in Hive transferable objects
> -
>
> Key: HIVE-28450
> URL: https://issues.apache.org/jira/browse/HIVE-28450
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 3.1.3, 4.0.0
>Reporter: Sercan Tekin
>Priority: Major
> Fix For: 4.1.0, 4.0.1
>
>
> We are experiencing an issue with a partitioned table in Hive. When querying 
> the table via the Hive CLI, the data retrieval works as expected without any 
> errors. However, when attempting to query the same table through Spark, we 
> encounter the following error in the HMS logs:
> {code:java}
> 2024-01-30 23:03:59,052 main DEBUG 
> org.apache.logging.log4j.core.util.SystemClock does not support precise 
> timestamps.
> Exception in thread "pool-7-thread-4" java.lang.OutOfMemoryError: Requested 
> array size exceeds VM limit
>   at java.util.Arrays.copyOf(Arrays.java:3236)
>   at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:118)
>   at 
> java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
>   at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:153)
>   at 
> org.apache.thrift.transport.TSaslTransport.write(TSaslTransport.java:473)
>   at 
> org.apache.thrift.transport.TSaslServerTransport.write(TSaslServerTransport.java:42)
>   at 
> org.apache.thrift.protocol.TBinaryProtocol.writeString(TBinaryProtocol.java:227)
>   at 
> org.apache.hadoop.hive.metastore.api.FieldSchema$FieldSchemaStandardScheme.write(FieldSchema.java:517)
>   at 
> org.apache.hadoop.hive.metastore.api.FieldSchema$FieldSchemaStandardScheme.write(FieldSchema.java:456)
>   at 
> org.apache.hadoop.hive.metastore.api.FieldSchema.write(FieldSchema.java:394)
>   at 
> org.apache.hadoop.hive.metastore.api.StorageDescriptor$StorageDescriptorStandardScheme.write(StorageDescriptor.java:1423)
>   at 
> org.apache.hadoop.hive.metastore.api.StorageDescriptor$StorageDescriptorStandardScheme.write(StorageDescriptor.java:1250)
>   at 
> org.apache.hadoop.hive.metastore.api.StorageDescriptor.write(StorageDescriptor.java:1116)
>   at 
> org.apache.hadoop.hive.metastore.api.Partition$PartitionStandardScheme.write(Partition.java:1033)
>   at 
> org.apache.hadoop.hive.metastore.api.Partition$PartitionStandardScheme.write(Partition.java:890)
>   at 
> org.apache.hadoop.hive.metastore.api.Partition.write(Partition.java:786)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partitions_result$get_partitions_resultStandardScheme.write(ThriftHiveMetastore.java)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partitions_result$get_partitions_resultStandardScheme.write(ThriftHiveMetastore.java)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partitions_result.write(ThriftHiveMetastore.java)
>   at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:58)
>   at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:38)
>   at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:603)
>   at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:600)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1669)
>   at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:600)
>   at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:313)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:750)
> Exception in thread "poo

[jira] [Updated] (HIVE-28267) Support merge task functionality for Iceberg delete files

2024-09-23 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-28267:
--
Fix Version/s: 4.1.0
   (was: 4.0.1)

> Support merge task functionality for Iceberg delete files
> -
>
> Key: HIVE-28267
> URL: https://issues.apache.org/jira/browse/HIVE-28267
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sourabh Badhya
>Assignee: Sourabh Badhya
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>
> Support merge task functionality for Iceberg delete files.





[jira] [Updated] (HIVE-28341) Iceberg: Change Major QB Full Table Compaction to compact partitions in parallel

2024-09-23 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-28341:
--
Fix Version/s: 4.1.0
   (was: 4.0.1)

> Iceberg: Change Major QB Full Table Compaction to compact partitions in 
> parallel
> 
>
> Key: HIVE-28341
> URL: https://issues.apache.org/jira/browse/HIVE-28341
> Project: Hive
>  Issue Type: Task
>  Components: Hive, Iceberg integration
>Reporter: Dmitriy Fingerman
>Assignee: Dmitriy Fingerman
>Priority: Major
>  Labels: hive, iceberg, pull-request-available
> Fix For: 4.1.0
>
>
> Currently, major compaction compacts a whole table in one step. If a table is 
> partitioned and has a lot of data, this operation can take a lot of time and 
> risks write conflicts at the commit stage. This can be improved to work 
> partition by partition. Also, for each partition it will create one snapshot 
> instead of the 2 snapshots (truncate+IOW) created now when compacting the 
> whole table in one step.





[jira] [Updated] (HIVE-28532) Map Join Reuse cache allows to share hashtables for Outer join and Inner join

2024-09-23 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-28532:
--
Description: 
The map join reuse cache allows sharing hashtables between an outer join and an 
inner join. But we cannot reuse a hash table between a non-outer join and an 
outer join, because an outer join cannot accept any hash table kind other than 
HASHMAP, whereas other kinds like HASHSET and HASH_MULTISET exist.
{code}
Caused by: java.lang.ClassCastException: class 
org.apache.hadoop.hive.ql.exec.vector.mapjoin.fast.VectorMapJoinFastStringHashMultiSetContainer
 cannot be cast to class 
org.apache.hadoop.hive.ql.exec.vector.mapjoin.hashtable.VectorMapJoinHashMap
{code}

  was:The map join reuse cache allows sharing hashtables between an outer join 
and an inner join. But we cannot reuse a hash table between a non-outer join 
and an outer join, because an outer join cannot accept any hash table kind 
other than HASHMAP, whereas other kinds like HASHSET and HASH_MULTISET exist.


> Map Join Reuse cache allows to share hashtables for Outer join and Inner join
> -
>
> Key: HIVE-28532
> URL: https://issues.apache.org/jira/browse/HIVE-28532
> Project: Hive
>  Issue Type: Bug
>  Security Level: Public(Viewable by anyone) 
>  Components: Logical Optimizer
>Affects Versions: 4.0.0
>Reporter: Ramesh Kumar Thangarajan
>Assignee: Ramesh Kumar Thangarajan
>Priority: Major
>
> The map join reuse cache allows sharing hashtables between an outer join and 
> an inner join. But we cannot reuse a hash table between a non-outer join and 
> an outer join, because an outer join cannot accept any hash table kind other 
> than HASHMAP, whereas other kinds like HASHSET and HASH_MULTISET exist.
> {code}
> Caused by: java.lang.ClassCastException: class 
> org.apache.hadoop.hive.ql.exec.vector.mapjoin.fast.VectorMapJoinFastStringHashMultiSetContainer
>  cannot be cast to class 
> org.apache.hadoop.hive.ql.exec.vector.mapjoin.hashtable.VectorMapJoinHashMap
> {code}





[jira] [Updated] (HIVE-28028) Remove duplicated proto reader/writer classes introduced in HIVE-19288

2024-09-23 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-28028:
---
Labels: hive-4.0.1-must pull-request-available  (was: 
pull-request-available)

> Remove duplicated proto reader/writer classes introduced in HIVE-19288
> --
>
> Key: HIVE-28028
> URL: https://issues.apache.org/jira/browse/HIVE-28028
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: Raghav Aggarwal
>Priority: Major
>  Labels: hive-4.0.1-must, pull-request-available
> Fix For: 4.1.0
>
> Attachments: HIVE-28028.patch
>
>
> Introduced in commit:
> https://github.com/apache/hive/commit/8349dbde55f479167e43cfd1f089e131d4271e5b
> As discussed in a related PR (https://github.com/apache/hive/pull/5033), we 
> should:
> 1. check whether anything has been contributed to those classes compared to 
> what we currently have on tez/master
> 2. remove them





[jira] [Updated] (HIVE-28026) Reading proto data more than 2GB from multiple splits fails

2024-09-23 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-28026:
---
Fix Version/s: Not Applicable
   (was: 4.0.1)

> Reading proto data more than 2GB from multiple splits fails
> ---
>
> Key: HIVE-28026
> URL: https://issues.apache.org/jira/browse/HIVE-28026
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0-beta-1
> Environment:   
>Reporter: Raghav Aggarwal
>Assignee: Raghav Aggarwal
>Priority: Major
>  Labels: pull-request-available
> Fix For: Not Applicable
>
>
> {*}Query{*}: select * from __
> {*}Explanation{*}:
> On running the above-mentioned query on a Hive proto table, multiple Tez 
> containers are spawned to process the data. In a container, if there are 
> multiple HDFS splits and the combined size of the decompressed data is more 
> than 2GB, the query fails with the following error:
> {code:java}
> "While parsing a protocol message, the input ended unexpectedly in the middle 
> of a field.  This could mean either that the input has been truncated or that 
> an embedded message misreported its own length." {code}
>  
> This happens because of 
> _[CodedInputStream|https://github.com/protocolbuffers/protobuf/blob/54489e95e01882407f356f83c9074415e561db00/java/core/src/main/java/com/google/protobuf/CodedInputStream.java#L2712C7-L2712C16]_,
> i.e. _byteLimit += totalBytesRetired + pos;_
> _byteLimit_ hits an integer overflow because _totalBytesRetired_ retains the 
> count of all bytes read, since the CodedInputStream is initialized once per 
> container: 
> [https://github.com/apache/hive/blob/564d7e54d2360488611da39d0e5f027a2d574fc1/ql/src/java/org/apache/tez/dag/history/logging/proto/ProtoMessageWritable.java#L96].
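> 
> The overflow in miniature (illustrative numbers, not values taken from an 
> actual job):
> {code:java}
> public class LimitOverflow {
>   public static void main(String[] args) {
>     // Once the long-lived stream has retired ~2GB across earlier splits,
>     // pushing a limit for the next message overflows int.
>     int totalBytesRetired = 2_000_000_000; // bytes consumed so far in this container
>     int pos = 100_000_000;                 // position within the current buffer
>     int byteLimit = 150_000_000;           // size of the next message
>     byteLimit += totalBytesRetired + pos;  // wraps negative: integer overflow
>     System.out.println(byteLimit);         // a negative limit reads as premature EOF
>   }
> }
> {code}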
>  
> This is different from the issue reproduced in 
> [https://github.com/zabetak/protobuf-large-message], where a single proto data 
> file is larger than 2GB; in my case, multiple files together add up to more 
> than 2GB.
> CC [~zabetak] 
> *Limitation:*
> This fix will still not resolve the issue mentioned in 
> [https://github.com/protocolbuffers/protobuf/issues/11729].
> Here is the command used:
>  
> {code:java}
> beeline  -u 
> 'jdbc:hive2://hostnames/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2;thrift.client.max.message.size=2147483647'
>  --showHeader=false --outputformat=tsv2 -e "select * from 
> raaggarw.proto_hive_query_data where executionmode='MR' and otherinfo['CONF'] 
> != 'NULL'" >> ./output {code}
>  





[jira] [Updated] (HIVE-28325) Lack of "owner" in HivePrivilegeObject causes Ranger slowness at compilation time

2024-09-23 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-28325:
---
Labels: hive-4.0.1-must pull-request-available  (was: 
pull-request-available)

> Lack of "owner" in HivePrivilegeObject causes Ranger slowness at compilation 
> time
> -
>
> Key: HIVE-28325
> URL: https://issues.apache.org/jira/browse/HIVE-28325
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Steve Carlin
>Assignee: Steve Carlin
>Priority: Major
>  Labels: hive-4.0.1-must, pull-request-available
> Fix For: 4.0.1
>
>
> There is a HivePrivilegeObject created in the SemanticAnalyzer that is used 
> for Ranger calls. 
> Ranger uses the owner as a filter when searching for objects. When the owner 
> is not passed in, Ranger calls get slowed down noticeably, causing a slowdown 
> in compilation time.
>  
> This is related to HIVE-27285





[jira] [Updated] (HIVE-28532) Map Join Reuse cache allows to share hashtables for Outer join and Inner join

2024-09-23 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-28532:
--
Component/s: Logical Optimizer

> Map Join Reuse cache allows to share hashtables for Outer join and Inner join
> -
>
> Key: HIVE-28532
> URL: https://issues.apache.org/jira/browse/HIVE-28532
> Project: Hive
>  Issue Type: Bug
>  Security Level: Public(Viewable by anyone) 
>  Components: Logical Optimizer
>Affects Versions: 4.0.0
>Reporter: Ramesh Kumar Thangarajan
>Assignee: Ramesh Kumar Thangarajan
>Priority: Major
>
> The map join reuse cache allows sharing hashtables between an outer join and 
> an inner join. But we cannot reuse a hash table between a non-outer join and 
> an outer join, because an outer join cannot accept any hash table kind other 
> than HASHMAP, whereas other kinds like HASHSET and HASH_MULTISET exist.





[jira] [Updated] (HIVE-28532) Map Join Reuse cache allows to share hashtables for Outer join and Inner join

2024-09-23 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-28532:
--
Affects Version/s: 4.0.0

> Map Join Reuse cache allows to share hashtables for Outer join and Inner join
> -
>
> Key: HIVE-28532
> URL: https://issues.apache.org/jira/browse/HIVE-28532
> Project: Hive
>  Issue Type: Bug
>  Security Level: Public(Viewable by anyone) 
>Affects Versions: 4.0.0
>Reporter: Ramesh Kumar Thangarajan
>Assignee: Ramesh Kumar Thangarajan
>Priority: Major
>
> The map join reuse cache allows sharing hashtables between an outer join and 
> an inner join. But we cannot reuse a hash table between a non-outer join and 
> an outer join, because an outer join cannot accept any hash table kind other 
> than HASHMAP, whereas other kinds like HASHSET and HASH_MULTISET exist.





[jira] [Commented] (HIVE-28532) Map Join Reuse cache allows to share hashtables for Outer join and Inner join

2024-09-23 Thread Denys Kuzmenko (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17883754#comment-17883754
 ] 

Denys Kuzmenko commented on HIVE-28532:
---

It's not limited to outer joins only; there are multiple other combinations, 
like ANTI_JOIN/LEFT_SEMI_JOIN (HASHSET) + INNER_JOIN (HASH_MULTISET).
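
One hypothetical shape of a fix (identifiers below are illustrative, not 
Hive's actual classes): make reuse conditional on the exact hash table kind 
each join flavor requires, not just on outer vs. inner.
{code:java}
public class HashTableReuse {
  // Kinds mirror the containers in the stack trace: outer joins consume a
  // hash map, semi/anti joins a hash set, inner joins may use a multi-set.
  enum HashTableKind { HASH_MAP, HASH_SET, HASH_MULTISET }

  // Reuse is only safe when the cached table is exactly the kind the
  // consuming join expects; any mismatch leads to the ClassCastException.
  static boolean canReuseCached(HashTableKind cached, HashTableKind required) {
    return cached == required;
  }
}
{code}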

> Map Join Reuse cache allows to share hashtables for Outer join and Inner join
> -
>
> Key: HIVE-28532
> URL: https://issues.apache.org/jira/browse/HIVE-28532
> Project: Hive
>  Issue Type: Bug
>  Security Level: Public(Viewable by anyone) 
>Reporter: Ramesh Kumar Thangarajan
>Assignee: Ramesh Kumar Thangarajan
>Priority: Major
>
> The map join reuse cache allows sharing hashtables between an outer join and 
> an inner join. But we cannot reuse a hash table between a non-outer join and 
> an outer join, because an outer join cannot accept any hash table kind other 
> than HASHMAP, whereas other kinds like HASHSET and HASH_MULTISET exist.





[jira] [Updated] (HIVE-28530) Fetched result from another query

2024-09-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28530:
--
Labels: pull-request-available  (was: )

> Fetched result from another query
> -
>
> Key: HIVE-28530
> URL: https://issues.apache.org/jira/browse/HIVE-28530
> Project: Hive
>  Issue Type: Bug
>  Security Level: Public(Viewable by anyone) 
>  Components: HiveServer2
>Affects Versions: 3.0.0
>Reporter: Xiaomin Zhang
>Priority: Major
>  Labels: pull-request-available
>
> When running Hive load tests, we observed that Beeline can fetch a wrong query 
> result, namely one belonging to another query running at the same time. We 
> ruled out a load-balancing issue, because it happened on a single HiveServer2. 
> And we found this issue only happens when *hive.query.results.cache.enabled is 
> false.*
> All test queries are in the same format as below: 
> {code:java}
> select concat('total record (test_$PID)=',count(*)) as count_record from t1t
> {code}
> We randomized the query by replacing $PID with the Beeline PID, and the test 
> driver ran 10 Beeline clients concurrently. The table t1t is static and has a 
> few rows, so the test driver can check whether the query result equals: 
> total record (test_recon_mock_$PID)=2
> When the query result cache is disabled, we can see that a random query gets 
> a wrong result, and this can always be reproduced. For example, the two 
> queries below were running in parallel:
> {code:java}
> queryId=hive_20240701103742_ff1adb2d-e9eb-448d-990e-00ab371e9db6): select 
> concat('total record (test_21535)=',count(*)) as count_record from t1t
> queryId=hive_20240701103742_9bdfff92-89e1-4bcd-88ea-bf73ba5fd93d): select 
> concat('total record (test_21566)=',count(*)) as count_record from t1t
> {code}
> The second query is supposed to return:
> *total record (test_21566)=2*
> But Beeline actually got:
> *total record (test_21535)=2*
> There is no error in the HS2 log.





[jira] [Resolved] (HIVE-28483) CAST string to date should return null when format is invalid

2024-09-23 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis resolved HIVE-28483.

Fix Version/s: 4.1.0
   Resolution: Fixed

Fixed in 
[https://github.com/apache/hive/commit/e87b5bb8f4bded30b17e46ee573151488c78d178].

Thanks for the PR [~zratkai] !

The new behavior was also discussed in the mailing lists: 
https://lists.apache.org/thread/blo8ozrhmh1jq9c0oz8bhm39lpb95bbv

> CAST string to date should return null when format is invalid
> -
>
> Key: HIVE-28483
> URL: https://issues.apache.org/jira/browse/HIVE-28483
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltán Rátkai
>Assignee: Zoltán Rátkai
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>
> Date conversion gives a wrong result. For example:
> select to_date('03-08-2024');
> Result:
> +-------------+
> |     _c0     |
> +-------------+
> | 0003-08-20  |
> +-------------+
> or:
> select to_date(last_day(add_months(last_day('03-08-2024'), -1))) ;
> Result:
> +-------------+
> |     _c0     |
> +-------------+
> | 0003-07-31  |
> +-------------+
> Here is my comparison with other database systems:
> --
> PostgreSQL
> --
> SELECT TO_DATE('03-08-2024','YYYYMMDD');
> invalid value "03-0" for "YYYY" DETAIL: Field requires 4 characters, but only 
> 2 could be parsed. HINT: If your source string is not fixed-width, try using 
> the "FM" modifier. 
> SELECT TO_DATE('03-08-2024','DD-MM-YYYY');
> to_date
> Sat, 03 Aug 2024 00:00:00 GMT
> SELECT CAST('03-08-2024' AS date);
> date
> Fri, 08 Mar 2024 00:00:00 GMT
> SELECT CAST('2024-08-03' AS date);
> date
> Sat, 03 Aug 2024 00:00:00 GMT
> SELECT CAST('2024-08-03 T' AS date);
> invalid input syntax for type date: "2024-08-03 T" LINE 1: SELECT 
> CAST('2024-08-03 T' AS date) ^ 
> SELECT CAST('2024-08-03T' AS date);
> invalid input syntax for type date: "2024-08-03T" LINE 1: SELECT 
> CAST('2024-08-03T' AS date) ^ 
> SELECT CAST('2024-08-03T12:00:00' AS date);
> date
> Sat, 03 Aug 2024 00:00:00 GMT
> SELECT CAST('2024-08-0312:00:00' AS date);
> date/time field value out of range: "2024-08-0312:00:00" LINE 1: SELECT 
> CAST('2024-08-0312:00:00' AS date) ^ HINT: Perhaps you need a different 
> "datestyle" setting. 
> --
> -ORACLE---
> --
> select CAST('2024-08-03 12:00:00' AS date) from dual;
> Output:
> select CAST('2024-08-03 12:00:00' AS date) from dual
>             *
> ERROR at line 1:
> ORA-01861: literal does not match format string
> -
> select CAST('2024-08-03' AS date) from dual;
> Output:
> select CAST('2024-08-03' AS date) from dual
>             *
> ERROR at line 1:
> ORA-01861: literal does not match format string
> -
> SELECT TO_DATE('08/03/2024', 'MM/DD/YYYY') FROM DUAL;
> Output:
> TO_DATE('
> -
> 03-AUG-24
> -
> SELECT TO_DATE('2024-08-03', 'YYYY-MM-DD') FROM DUAL;
> Output:
> TO_DATE('
> -
> 03-AUG-24
> -
> select CAST('03-08-2024' AS date) from dual;
> Output:
> select CAST('03-08-2024' AS date) from dual
>             *
> ERROR at line 1:
> ORA-01843: An invalid month was specified.
> -
> select CAST('2024-08-0312:00:00' AS date) from dual;
> Output:
> select CAST('2024-08-0312:00:00' AS date) from dual
>             *
> ERROR at line 1:
> ORA-01861: literal does not match format string
> -
> select CAST('10-AUG-24' AS date) from dual;
> Output:
> CAST('10-

[jira] [Updated] (HIVE-28483) CAST string to date should return null when format is invalid

2024-09-23 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis updated HIVE-28483:
---
Summary: CAST string to date should return null when format is invalid  
(was: String date cast giving wrong result)

> CAST string to date should return null when format is invalid
> -
>
> Key: HIVE-28483
> URL: https://issues.apache.org/jira/browse/HIVE-28483
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltán Rátkai
>Assignee: Zoltán Rátkai
>Priority: Minor
>  Labels: pull-request-available
>
> Date conversion gives a wrong result. For example:
> select to_date('03-08-2024');
> Result:
> +-------------+
> |     _c0     |
> +-------------+
> | 0003-08-20  |
> +-------------+
> or:
> select to_date(last_day(add_months(last_day('03-08-2024'), -1))) ;
> Result:
> +-------------+
> |     _c0     |
> +-------------+
> | 0003-07-31  |
> +-------------+
> Here is my comparison with other database systems:
> --
> PostgreSQL
> --
> SELECT TO_DATE('03-08-2024','YYYYMMDD');
> invalid value "03-0" for "YYYY" DETAIL: Field requires 4 characters, but only 
> 2 could be parsed. HINT: If your source string is not fixed-width, try using 
> the "FM" modifier. 
> SELECT TO_DATE('03-08-2024','DD-MM-YYYY');
> to_date
> Sat, 03 Aug 2024 00:00:00 GMT
> SELECT CAST('03-08-2024' AS date);
> date
> Fri, 08 Mar 2024 00:00:00 GMT
> SELECT CAST('2024-08-03' AS date);
> date
> Sat, 03 Aug 2024 00:00:00 GMT
> SELECT CAST('2024-08-03 T' AS date);
> invalid input syntax for type date: "2024-08-03 T" LINE 1: SELECT 
> CAST('2024-08-03 T' AS date) ^ 
> SELECT CAST('2024-08-03T' AS date);
> invalid input syntax for type date: "2024-08-03T" LINE 1: SELECT 
> CAST('2024-08-03T' AS date) ^ 
> SELECT CAST('2024-08-03T12:00:00' AS date);
> date
> Sat, 03 Aug 2024 00:00:00 GMT
> SELECT CAST('2024-08-0312:00:00' AS date);
> date/time field value out of range: "2024-08-0312:00:00" LINE 1: SELECT 
> CAST('2024-08-0312:00:00' AS date) ^ HINT: Perhaps you need a different 
> "datestyle" setting. 
> --
> -ORACLE---
> --
> select CAST('2024-08-03 12:00:00' AS date) from dual;
> Output:
> select CAST('2024-08-03 12:00:00' AS date) from dual
>             *
> ERROR at line 1:
> ORA-01861: literal does not match format string
> -
> select CAST('2024-08-03' AS date) from dual;
> Output:
> select CAST('2024-08-03' AS date) from dual
>             *
> ERROR at line 1:
> ORA-01861: literal does not match format string
> -
> SELECT TO_DATE('08/03/2024', 'MM/DD/YYYY') FROM DUAL;
> Output:
> TO_DATE('
> -
> 03-AUG-24
> -
> SELECT TO_DATE('2024-08-03', 'YYYY-MM-DD') FROM DUAL;
> Output:
> TO_DATE('
> -
> 03-AUG-24
> -
> select CAST('03-08-2024' AS date) from dual;
> Output:
> select CAST('03-08-2024' AS date) from dual
>             *
> ERROR at line 1:
> ORA-01843: An invalid month was specified.
> -
> select CAST('2024-08-0312:00:00' AS date) from dual;
> Output:
> select CAST('2024-08-0312:00:00' AS date) from dual
>             *
> ERROR at line 1:
> ORA-01861: literal does not match format string
> -
> select CAST('10-AUG-24' AS date) from dual;
> Output:
> CAST('10-
> -
> 10-AUG-24
> -
> select CAST('10-AUG-2024' AS date) from dual;
> Output:
> CAST('10-
> -
> 10-AUG-24
> -

[jira] [Commented] (HIVE-28529) HiveMetaStore#getConf blocked when meet high load

2024-09-23 Thread Xiaoqiao He (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17883724#comment-17883724
 ] 

Xiaoqiao He commented on HIVE-28529:


{code:java}
In Hadoop maybe we can explore having a method for cloning without the lock, 
but if someone concurrently modifies the config passed while creating the 
HMSHandler (ideally not, but if yes), that would backfire
{code}
+1. Originally I planned to update the Hadoop Configuration implementation, but 
I also think that would not be a safe way, so I am reaching out here, looking 
forward to a smoother solution. Thanks all.

> HiveMetaStore#getConf blocked when meet high load
> -
>
> Key: HIVE-28529
> URL: https://issues.apache.org/jira/browse/HIVE-28529
> Project: Hive
>  Issue Type: Improvement
>  Security Level: Public(Viewable by anyone) 
>  Components: Metastore
>Reporter: Xiaoqiao He
>Priority: Major
>
> Thousands of threads are blocked for a long time when the metastore is under 
> high load, as the following stacks show.
> a. There are 1836 threads (as stack 1 shows) waiting for lock 
> #0x7f8bf9477180, which is held by the thread in stack 2.
> {code:java}
> # grep "0x7f8bf9477180" metastore.stack | wc -l
> 1836
> {code}
> b. There are 105 threads (as stack 2 shows) waiting for lock 
> #0x7f8bf805f660, which is held by the thread in stack 3. 
> {code:java}
> # grep "0x7f8bf805f660" metastore.stack | wc -l
> 105
> {code}
> c. Stack 3 shows that initializing the configuration is a time-costly 
> operation; it holds the object monitor (#hiveConf in the last code snippet) 
> on a key path for the metastore and impacts performance.
> So, IMO, we need to remove this lock contention to improve performance. FYI.
> NOTE: I have deployed an early version, but the newest one includes this 
> issue too.
> {code:java}
> "pool-12-thread-1482355" #126195588 prio=5 os_prio=0 tid=0x7f86d507b800 
> nid=0x21570 waiting for monitor entry [0x7f875849b000]
>java.lang.Thread.State: BLOCKED (on object monitor)
> at 
> org.apache.hadoop.hive.metastore.MetaStoreInit.initConnectionUrlHook(MetaStoreInit.java:95)
> - waiting to lock <0x7f8bf9477180> (a java.lang.Class for 
> org.apache.hadoop.hive.metastore.MetaStoreInit)
> at 
> org.apache.hadoop.hive.metastore.MetaStoreInit.updateConnectionURL(MetaStoreInit.java:62)
> at 
> org.apache.hadoop.hive.metastore.RawStoreProxy.init(RawStoreProxy.java:87)
> at 
> org.apache.hadoop.hive.metastore.RawStoreProxy.(RawStoreProxy.java:55)
> at 
> org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:66)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStore(HiveMetaStore.java:817)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:795)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_database_core(HiveMetaStore.java:1308)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_database(HiveMetaStore.java:1240)
> at sun.reflect.GeneratedMethodAccessor59.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:276)
> at com.sun.proxy.$Proxy23.get_database(Unknown Source)
> at sun.reflect.GeneratedMethodAccessor59.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$MetricHMSProxy.invoke(HiveMetaStore.java:8241)
> at com.sun.proxy.$Proxy23.get_database(Unknown Source)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_database.getResult(ThriftHiveMetastore.java:11142)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_database.getResult(ThriftHiveMetastore.java:11126)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
> at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:754)
>

[jira] [Comment Edited] (HIVE-28529) HiveMetaStore#getConf blocked when meet high load

2024-09-23 Thread Xiaoqiao He (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17883724#comment-17883724
 ] 

Xiaoqiao He edited comment on HIVE-28529 at 9/23/24 7:21 AM:
-

{quote}In Hadoop maybe we can explore having a method for cloning without the 
lock, but if someone concurrently modifies the config passed while creating 
the HMSHandler (ideally not, but if yes), that would backfire
{quote}

+1. Originally I planned to update the Hadoop Configuration implementation, but 
I also think that would not be a safe way, so I am reaching out here, looking 
forward to a smoother solution. Thanks all.


was (Author: hexiaoqiao):
{code:java}
In Hadoop maybe we can explore having a method for cloning without the lock, 
but if someone concurrently modifies the config passed while creating the 
HMSHandler (ideally not, but if yes), that would backfire
{code}
+1. Originally I planned to update the Hadoop Configuration implementation, but 
I also think that would not be a safe way, so I am reaching out here, looking 
forward to a smoother solution. Thanks all.

> HiveMetaStore#getConf blocked when meet high load
> -
>
> Key: HIVE-28529
> URL: https://issues.apache.org/jira/browse/HIVE-28529
> Project: Hive
>  Issue Type: Improvement
>  Security Level: Public(Viewable by anyone) 
>  Components: Metastore
>Reporter: Xiaoqiao He
>Priority: Major
>
> Thousands of threads are blocked for a long time when the metastore is under 
> high load, as the following stacks show.
> a. There are 1836 threads (as stack 1 shows) waiting for lock 
> #0x7f8bf9477180, which is held by the thread in stack 2.
> {code:java}
> # grep "0x7f8bf9477180" metastore.stack | wc -l
> 1836
> {code}
> b. There are 105 threads (as stack 2 shows) waiting for lock 
> #0x7f8bf805f660, which is held by the thread in stack 3. 
> {code:java}
> # grep "0x7f8bf805f660" metastore.stack | wc -l
> 105
> {code}
> c. Stack 3 shows that initializing the configuration is a time-costly 
> operation; it holds the object monitor (#hiveConf in the last code snippet) 
> on a key path for the metastore and impacts performance.
> So, IMO, we need to remove this lock contention to improve performance. FYI.
> NOTE: I have deployed an early version, but the newest one includes this 
> issue too.
> {code:java}
> "pool-12-thread-1482355" #126195588 prio=5 os_prio=0 tid=0x7f86d507b800 
> nid=0x21570 waiting for monitor entry [0x7f875849b000]
>java.lang.Thread.State: BLOCKED (on object monitor)
> at 
> org.apache.hadoop.hive.metastore.MetaStoreInit.initConnectionUrlHook(MetaStoreInit.java:95)
> - waiting to lock <0x7f8bf9477180> (a java.lang.Class for 
> org.apache.hadoop.hive.metastore.MetaStoreInit)
> at 
> org.apache.hadoop.hive.metastore.MetaStoreInit.updateConnectionURL(MetaStoreInit.java:62)
> at 
> org.apache.hadoop.hive.metastore.RawStoreProxy.init(RawStoreProxy.java:87)
> at 
> org.apache.hadoop.hive.metastore.RawStoreProxy.(RawStoreProxy.java:55)
> at 
> org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:66)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStore(HiveMetaStore.java:817)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:795)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_database_core(HiveMetaStore.java:1308)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_database(HiveMetaStore.java:1240)
> at sun.reflect.GeneratedMethodAccessor59.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:276)
> at com.sun.proxy.$Proxy23.get_database(Unknown Source)
> at sun.reflect.GeneratedMethodAccessor59.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$MetricHMSProxy.invoke(HiveMetaStore.java:8241)
> at com.sun.proxy.$Proxy23.get_database(Unknown Source)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_database.getResult(Thr

[jira] [Updated] (HIVE-28215) Signalling CONDITION HANDLER is not working in HPLSQL.

2024-09-22 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-28215:
---
Labels: hive-4.0.1-must pull-request-available  (was: hive-4.0.0-must 
pull-request-available)

> Signalling CONDITION HANDLER is not working in HPLSQL.
> --
>
> Key: HIVE-28215
> URL: https://issues.apache.org/jira/browse/HIVE-28215
> Project: Hive
>  Issue Type: Bug
>  Components: hpl/sql
>Reporter: Dayakar M
>Assignee: Dayakar M
>Priority: Major
>  Labels: hive-4.0.1-must, pull-request-available
> Fix For: 4.1.0
>
>
> Signalling CONDITION HANDLER is not working in HPLSQL.
> Refer [http://www.hplsql.org/declare-condition] and 
> [http://www.hplsql.org/declare-handler] for more details about this feature.
>  
> +Steps to Reproduce:+
> {noformat}
> jdbc:hive2://ccycloud-1.nightly-71x-oq.roo> DECLARE cnt INT DEFAULT 0; 
> . . . . . . . . . . . . . . . . . . . . . . .> DECLARE wrong_cnt_condition 
> CONDITION;
> . . . . . . . . . . . . . . . . . . . . . . .> 
> . . . . . . . . . . . . . . . . . . . . . . .> DECLARE EXIT HANDLER FOR 
> wrong_cnt_condition
> . . . . . . . . . . . . . . . . . . . . . . .>   PRINT 'Wrong number of 
> rows'; 
> . . . . . . . . . . . . . . . . . . . . . . .> 
> . . . . . . . . . . . . . . . . . . . . . . .> EXECUTE IMMEDIATE 'SELECT 
> COUNT(*) FROM sys.tbls' INTO cnt;
> . . . . . . . . . . . . . . . . . . . . . . .> 
> . . . . . . . . . . . . . . . . . . . . . . .> IF cnt <> 0 THEN
> . . . . . . . . . . . . . . . . . . . . . . .>   SIGNAL wrong_cnt_condition;
> . . . . . . . . . . . . . . . . . . . . . . .> END IF;
> . . . . . . . . . . . . . . . . . . . . . . .> /
> INFO  : Compiling 
> command(queryId=hive_20240424171747_7f22fef6-70d5-483a-af67-7a6b9f17ac8b): 
> SELECT COUNT(*) FROM sys.tbls
> INFO  : Semantic Analysis Completed (retrial = false)
> INFO  : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:_c0, 
> type:bigint, comment:null)], properties:null)
> INFO  : Completed compiling 
> command(queryId=hive_20240424171747_7f22fef6-70d5-483a-af67-7a6b9f17ac8b); 
> Time taken: 0.995 seconds 
> INFO  : Completed executing 
> command(queryId=hive_20240424171747_7f22fef6-70d5-483a-af67-7a6b9f17ac8b); 
> Time taken: 8.479 seconds
> INFO  : OK
> ERROR : wrong_cnt_condition
> No rows affected (9.559 seconds)
> 0: jdbc:hive2://localhost>{noformat}
>  
> Here, when the _SIGNAL wrong_cnt_condition;_ statement is executed, it should 
> invoke the corresponding continue/exit handlers and execute the statements 
> present in the handler block. But currently that is not happening.





[jira] [Commented] (HIVE-28529) HiveMetaStore#getConf blocked when meet high load

2024-09-22 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17883707#comment-17883707
 ] 

Ayush Saxena commented on HIVE-28529:
-

I see...

That lock is within the Hadoop code & I am not sure if there is any easy way to 
get rid of it in Hive code. We have that {{threadLocalConf}}, so it should be a 
one-time operation per thread, but if we have so many threads being created at 
the same time, this might choke for a while?

In Hadoop maybe we can explore having a method for cloning without the lock, 
but if someone concurrently modifies the config passed while creating the 
HMSHandler (ideally not, but if yes), that would backfire

[~dkuzmenko]/[~abstractdog]/[~dengzh]/[~zabetak] anything that comes into your 
mind, that we can do to speed up things in this context?
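
For illustration, the one-time-per-thread cloning looks roughly like this (a 
sketch built on Hadoop's Configuration copy constructor; the class and field 
names are placeholders, not Hive's actual code):
{code:java}
import org.apache.hadoop.conf.Configuration;

class PerThreadConf {
  // Stand-in for the conf passed when creating the HMSHandler.
  static final Configuration BASE_CONF = new Configuration();

  // Each worker thread pays the synchronized Configuration copy once, on
  // first use, instead of contending on the shared object per request.
  static final ThreadLocal<Configuration> THREAD_CONF =
      ThreadLocal.withInitial(() -> new Configuration(BASE_CONF));

  static Configuration get() {
    return THREAD_CONF.get(); // lock-free after the first call per thread
  }
}
{code}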

> HiveMetaStore#getConf blocked when meet high load
> -
>
> Key: HIVE-28529
> URL: https://issues.apache.org/jira/browse/HIVE-28529
> Project: Hive
>  Issue Type: Improvement
>  Security Level: Public(Viewable by anyone) 
>  Components: Metastore
>Reporter: Xiaoqiao He
>Priority: Major
>
> Thousands of threads are blocked for a long time when the metastore is under 
> high load, as the following stacks show.
> a. There are 1836 threads (as stack 1 shows) waiting for lock 
> #0x7f8bf9477180, which is held by the thread in stack 2.
> {code:java}
> # grep "0x7f8bf9477180" metastore.stack | wc -l
> 1836
> {code}
> b. There are 105 threads (as stack 2 shows) waiting for lock 
> #0x7f8bf805f660, which is held by the thread in stack 3. 
> {code:java}
> # grep "0x7f8bf805f660" metastore.stack | wc -l
> 105
> {code}
> c. Stack 3 shows that initializing the configuration is a time-costly 
> operation; it holds the object monitor (#hiveConf in the last code snippet) 
> on a key path for the metastore and impacts performance.
> So, IMO, we need to remove this lock contention to improve performance. FYI.
> NOTE: I have deployed an early version, but the newest one includes this 
> issue too.
> {code:java}
> "pool-12-thread-1482355" #126195588 prio=5 os_prio=0 tid=0x7f86d507b800 
> nid=0x21570 waiting for monitor entry [0x7f875849b000]
>java.lang.Thread.State: BLOCKED (on object monitor)
> at 
> org.apache.hadoop.hive.metastore.MetaStoreInit.initConnectionUrlHook(MetaStoreInit.java:95)
> - waiting to lock <0x7f8bf9477180> (a java.lang.Class for 
> org.apache.hadoop.hive.metastore.MetaStoreInit)
> at 
> org.apache.hadoop.hive.metastore.MetaStoreInit.updateConnectionURL(MetaStoreInit.java:62)
> at 
> org.apache.hadoop.hive.metastore.RawStoreProxy.init(RawStoreProxy.java:87)
> at 
> org.apache.hadoop.hive.metastore.RawStoreProxy.(RawStoreProxy.java:55)
> at 
> org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:66)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStore(HiveMetaStore.java:817)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:795)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_database_core(HiveMetaStore.java:1308)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_database(HiveMetaStore.java:1240)
> at sun.reflect.GeneratedMethodAccessor59.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:276)
> at com.sun.proxy.$Proxy23.get_database(Unknown Source)
> at sun.reflect.GeneratedMethodAccessor59.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$MetricHMSProxy.invoke(HiveMetaStore.java:8241)
> at com.sun.proxy.$Proxy23.get_database(Unknown Source)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_database.getResult(ThriftHiveMetastore.java:11142)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_database.getResult(ThriftHiveMetastore.java:11126)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> at org.apache.thrift.TBaseProcess

[jira] [Updated] (HIVE-28300) ALTER TABLE CONCATENATE on a List Bucketing Table fails when using Tez.

2024-09-22 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-28300:
---
Labels: hive-4.0.1-merged hive-4.0.1-must pull-request-available  (was: 
hive-4.0.1-must pull-request-available)

> ALTER TABLE CONCATENATE on a List Bucketing Table fails when using Tez.
> ---
>
> Key: HIVE-28300
> URL: https://issues.apache.org/jira/browse/HIVE-28300
> Project: Hive
>  Issue Type: Bug
>Reporter: Seonggon Namgung
>Assignee: Seonggon Namgung
>Priority: Major
>  Labels: hive-4.0.1-merged, hive-4.0.1-must, 
> pull-request-available
> Fix For: 4.1.0
>
>
> Running list_bucket_dml_8.q using TestMiniLlapLocalCliDriver fails with the 
> following error message:
> {code:java}
> org.apache.hadoop.hive.ql.exec.tez.TezRuntimeException: Vertex failed, 
> vertexName=File Merge, vertexId=vertex_1717492217780_0001_4_00, 
> diagnostics=[Task failed, taskId=task_1717492217780_0001_4_00_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Node: ### : Error while 
> running task ( failure ) : 
> attempt_1717492217780_0001_4_00_00_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.io.IOException: Multiple partitions for one merge mapper: 
> file:/data2/hive-lngsg/itests/qtest/target/localfs/warehouse/list_bucketing_dynamic_part_n2/ds=2008-04-08/hr=b1/HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME/HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME
>  NOT EQUAL TO 
> file:/data2/hive-lngsg/itests/qtest/target/localfs/warehouse/list_bucketing_dynamic_part_n2/ds=2008-04-08/hr=b1/key=484/value=val_484
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:348)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MergeFileTezProcessor.run(MergeFileTezProcessor.java:42)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:381)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:82)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:69)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:69)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:39)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at 
> com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:111)
>   at 
> com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:58)
>   at 
> com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:75)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: 
> Multiple partitions for one merge mapper: 
> file:/data2/hive-lngsg/itests/qtest/target/localfs/warehouse/list_bucketing_dynamic_part_n2/ds=2008-04-08/hr=b1/HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME/HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME
>  NOT EQUAL TO 
> file:/data2/hive-lngsg/itests/qtest/target/localfs/warehouse/list_bucketing_dynamic_part_n2/ds=2008-04-08/hr=b1/key=484/value=val_484
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MergeFileRecordProcessor.processRow(MergeFileRecordProcessor.java:220)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MergeFileRecordProcessor.run(MergeFileRecordProcessor.java:153)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:293)
>   ... 16 more
> {code}
> This is a Hive-on-Tez problem which happens when Hive handles the ALTER TABLE 
> CONCATENATE command on a list bucketing table.





[jira] [Updated] (HIVE-28353) Iceberg: Reading *Files Metadata table files if the column is of TIMESTAMP type

2024-09-22 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-28353:
---
Labels: hive-4.0.1-merged hive-4.0.1-must  (was: hive-4.0.1-must)

> Iceberg: Reading *Files Metadata table files if the column is of TIMESTAMP 
> type
> ---
>
> Key: HIVE-28353
> URL: https://issues.apache.org/jira/browse/HIVE-28353
> Project: Hive
>  Issue Type: Bug
>  Components: Iceberg integration
>Affects Versions: 4.0.0
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: hive-4.0.1-merged, hive-4.0.1-must
> Fix For: 4.1.0
>
>
> If the main table has a column of type TIMESTAMP, reading the *FILES Metadata 
> table fails
> {noformat}
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.ClassCastException: java.time.OffsetDateTime cannot be cast to 
> java.time.LocalDateTime
> at 
> org.apache.hadoop.hive.ql.exec.ListSinkOperator.process(ListSinkOperator.java:98)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:888)
> at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:94)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:888)
> at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:173)
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:545)
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:537)
> at 
> org.apache.hadoop.hive.ql.exec.FetchTask.executeInner(FetchTask.java:194)
> ... 55 more
> {noformat}
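A sketch of the failure mode and a possible coercion, assuming (per the trace) 
that Iceberg hands back java.time.OffsetDateTime for timestamptz values while 
the reader expects java.time.LocalDateTime; an illustration only, not the 
committed fix:

{code:java}
import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;

public class MetadataTimestampCoercion {
  // Accept either representation instead of casting blindly.
  static LocalDateTime asLocalDateTime(Object value) {
    if (value instanceof OffsetDateTime) {
      // Normalize to UTC first, then drop the offset explicitly.
      return ((OffsetDateTime) value)
          .withOffsetSameInstant(ZoneOffset.UTC)
          .toLocalDateTime();
    }
    return (LocalDateTime) value; // already offset-free
  }
}
{code}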



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28439) Iceberg: Bucket partition transform with DECIMAL can throw NPE

2024-09-22 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-28439:
---
Labels: hive-4.0.1-merged hive-4.0.1-must pull-request-available  (was: 
hive-4.0.1-must pull-request-available)

> Iceberg: Bucket partition transform with DECIMAL can throw NPE
> --
>
> Key: HIVE-28439
> URL: https://issues.apache.org/jira/browse/HIVE-28439
> Project: Hive
>  Issue Type: Bug
>  Components: Iceberg integration
>Affects Versions: 4.0.0
>Reporter: Shohei Okumiya
>Assignee: Shohei Okumiya
>Priority: Major
>  Labels: hive-4.0.1-merged, hive-4.0.1-must, 
> pull-request-available
> Fix For: 4.1.0
>
>
> Hive can fail when we bucket records by decimal columns.
> {code:java}
> CREATE TABLE test (c_decimal DECIMAL(38, 0)) PARTITIONED BY SPEC (bucket(8, 
> c_decimal)) STORED BY ICEBERG;
> INSERT INTO test VALUES (CAST('5000441610525' AS DECIMAL(38, 
> 0))); {code}
> Stacktrace
> {code:java}
> ERROR : Vertex failed, vertexName=Map 1, 
> vertexId=vertex_1722775255811_0004_1_00, diagnostics=[Task failed, 
> taskId=task_1722775255811_0004_1_00_00, diagnostics=[TaskAttempt 0 
> failed, info=[Error: Node: 
> yarn-nodemanager-2.yarn-nodemanager.zookage.svc.cluster.local/10.1.5.93 : 
> Error while running task ( failure ) : 
> attempt_1722775255811_0004_1_00_00_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing writable
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:348)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:276)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:381)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:82)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:69)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:69)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:39)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at 
> com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:131)
>   at 
> com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:75)
>   at 
> com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:82)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing writable
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:110)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:83)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:414)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:293)
>   ... 16 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing writable
>   at 
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:569)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:101)
>   ... 19 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.process(ReduceSinkOperator.java:384)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:888)
>   at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:94)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Ope

[jira] [Commented] (HIVE-28529) HiveMetaStore#getConf blocked when meet high load

2024-09-22 Thread Xiaoqiao He (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17883680#comment-17883680
 ] 

Xiaoqiao He commented on HIVE-28529:


[~ayushtkn] Thanks for your quick response.

In my internal version, we have actually backported HIVE-20740. The main issue 
is in the code snippet below.

org.apache.hadoop.hive.metastore.HiveMetaStore.HMSHandler#getConf
{code:java}
@Override
public Configuration getConf() {
  Configuration conf = threadLocalConf.get();
  if (conf == null) {
    conf = new Configuration(hiveConf);
    threadLocalConf.set(conf);
  }
  return conf;
}
{code}

When the conf is initialized via 
org.apache.hadoop.conf.Configuration#Configuration(org.apache.hadoop.conf.Configuration),
 the constructor locks on #hiveConf, and other threads can then be blocked for 
a long time, because creating a Configuration object is a costly operation.
I am not sure whether there is another way to avoid holding the #hiveConf lock 
here.
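One possible direction, as a minimal sketch: assuming per-thread confs only 
need hiveConf's key/value pairs (not its resource list), the snapshot below 
pays the synchronization cost on hiveConf once instead of once per thread. The 
class and method names are illustrative:

{code:java}
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.conf.Configuration;

final class ConfSnapshot {
  private final Map<String, String> props;

  ConfSnapshot(Configuration base) {
    Map<String, String> copy = new HashMap<>();
    // A single iteration touches the base conf once, up front,
    // instead of in every call to new Configuration(hiveConf).
    for (Map.Entry<String, String> e : base) {
      copy.put(e.getKey(), e.getValue());
    }
    this.props = Collections.unmodifiableMap(copy);
  }

  Configuration newThreadLocalConf() {
    Configuration conf = new Configuration(false); // skip default resources
    props.forEach(conf::set);
    return conf;
  }
}
{code}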

> HiveMetaStore#getConf blocked when meet high load
> -
>
> Key: HIVE-28529
> URL: https://issues.apache.org/jira/browse/HIVE-28529
> Project: Hive
>  Issue Type: Improvement
>  Security Level: Public(Viewable by anyone) 
>  Components: Metastore
>Reporter: Xiaoqiao He
>Priority: Major
>
> Thousands of threads are blocked for a long time when the metastore is under 
> high load, as the following stacks show.
> a. 1836 threads (as stack 1 shows) are waiting on Lock 
> #0x7f8bf9477180, which is held by stack 2.
> {code:java}
> # grep "0x7f8bf9477180" metastore.stack | wc -l
> 1836
> {code}
> b. 105 threads (as stack 2 shows) are waiting on Lock 
> #0x7f8bf805f660, which is held by stack 3. 
> {code:java}
> # grep "0x7f8bf805f660" metastore.stack | wc -l
> 105
> {code}
> c. stack 3 shows that initializing the configuration is a costly operation 
> that holds the object monitor (#hiveConf, as in the last code snippet) on a 
> key path for the metastore and impacts performance.
> So, IMO, we need to remove this lock contention to improve the 
> performance. FYI.
> NOTE: I have deployed an early version, but the newest one includes this 
> issue too.
> {code:java}
> "pool-12-thread-1482355" #126195588 prio=5 os_prio=0 tid=0x7f86d507b800 
> nid=0x21570 waiting for monitor entry [0x7f875849b000]
>java.lang.Thread.State: BLOCKED (on object monitor)
> at 
> org.apache.hadoop.hive.metastore.MetaStoreInit.initConnectionUrlHook(MetaStoreInit.java:95)
> - waiting to lock <0x7f8bf9477180> (a java.lang.Class for 
> org.apache.hadoop.hive.metastore.MetaStoreInit)
> at 
> org.apache.hadoop.hive.metastore.MetaStoreInit.updateConnectionURL(MetaStoreInit.java:62)
> at 
> org.apache.hadoop.hive.metastore.RawStoreProxy.init(RawStoreProxy.java:87)
> at 
> org.apache.hadoop.hive.metastore.RawStoreProxy.(RawStoreProxy.java:55)
> at 
> org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:66)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStore(HiveMetaStore.java:817)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:795)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_database_core(HiveMetaStore.java:1308)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_database(HiveMetaStore.java:1240)
> at sun.reflect.GeneratedMethodAccessor59.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:276)
> at com.sun.proxy.$Proxy23.get_database(Unknown Source)
> at sun.reflect.GeneratedMethodAccessor59.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$MetricHMSProxy.invoke(HiveMetaStore.java:8241)
> at com.sun.proxy.$Proxy23.get_database(Unknown Source)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_database.getResult(ThriftHiveMetastore.java:11142)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_database.getResult(

[jira] [Comment Edited] (HIVE-28529) HiveMetaStore#getConf blocked when meet high load

2024-09-22 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17883678#comment-17883678
 ] 

Ayush Saxena edited comment on HIVE-28529 at 9/23/24 4:42 AM:
--

I remember seeing something similar long back, where things got really bad on 
concurrent calls and the problem was due to a lock on the conf object, but 
things worked once we got HIVE-20740 in.

I am checking the old trace that I had:
{noformat}
org.apache.hadoop.hive.metastore.ObjectStore.initializeHelper(ObjectStore.java:392)
at 
org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:322)
at 
org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:283)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:76)
at 
org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136)
at 
org.apache.hadoop.hive.metastore.RawStoreProxy.(RawStoreProxy.java:60)
at 
org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:69)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStoreForConf(HiveMetaStore.java:628)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMSForConf(HiveMetaStore.java:601)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:595)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_all_functions(HiveMetaStore.java:6117){noformat}
It goes down a similar track and matches one of your traces, though not the 
same beyond a point, maybe due to different versions. Could you check whether 
you have HIVE-20740 in your version? If yes, then this is something new that 
needs exploring.


was (Author: ayushtkn):
I remember seeing something similar long back, where things got really bad on 
concurrent calls and the problem was due to a lock on the conf object, but 
things worked once we got HIVE-20740 in.

I am checking the old trace that I had:
{noformat}
org.apache.hadoop.hive.metastore.ObjectStore.initializeHelper(ObjectStore.java:392)
at 
org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:322)
at 
org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:283)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:76)
at 
org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136)
at 
org.apache.hadoop.hive.metastore.RawStoreProxy.(RawStoreProxy.java:60)
at 
org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:69)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStoreForConf(HiveMetaStore.java:628)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMSForConf(HiveMetaStore.java:601)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:595)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_all_functions(HiveMetaStore.java:6117){noformat}
It goes down a similar track and matches one of your traces, though not the 
same beyond a point, maybe due to different versions. Could you check whether 
you have HIVE-20740 in your version? If not, then this is something new that 
needs exploring.

> HiveMetaStore#getConf blocked when meet high load
> -
>
> Key: HIVE-28529
> URL: https://issues.apache.org/jira/browse/HIVE-28529
> Project: Hive
>  Issue Type: Improvement
>  Security Level: Public(Viewable by anyone) 
>  Components: Metastore
>Reporter: Xiaoqiao He
>Priority: Major
>
> Thousands of threads are blocked for a long time when the metastore is under 
> high load, as the following stacks show.
> a. 1836 threads (as stack 1 shows) are waiting on Lock 
> #0x7f8bf9477180, which is held by stack 2.
> {code:java}
> # grep "0x7f8bf9477180" metastore.stack | wc -l
> 1836
> {code}
> b. 105 threads (as stack 2 shows) are waiting on Lock 
> #0x7f8bf805f660, which is held by stack 3. 
> {code:java}
> # grep "0x7f8bf805f660" metastore.stack | wc -l
> 105
> {code}
> c. stack 3 shows that initializing the configuration is a costly operation 
> that holds the object monitor (#hiveConf, as in the last code snippet) on a 
> key path for the metastore and impacts performance.
> So, IMO, we need to remove this lock contention to improve the 
> performance. FYI.
> NOTE: I have deployed an early version, but the newest one includes this 
> issue too.
> {code:java}
> "pool-12-thread-1482355" #126195588 prio=5 os_prio=0 tid=0x7f86d507b800 
> nid=0x21570 waiting for monitor entry [0x7f875849b000]
>java.lang.Thr

[jira] [Commented] (HIVE-28529) HiveMetaStore#getConf blocked when meet high load

2024-09-22 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17883678#comment-17883678
 ] 

Ayush Saxena commented on HIVE-28529:
-

I remember seeing something similar long back, where things got really bad on 
concurrent calls and the problem was due to a lock on the conf object, but 
things worked once we got HIVE-20740 in.

I am checking the old trace that I had:
{noformat}
org.apache.hadoop.hive.metastore.ObjectStore.initializeHelper(ObjectStore.java:392)
at 
org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:322)
at 
org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:283)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:76)
at 
org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136)
at 
org.apache.hadoop.hive.metastore.RawStoreProxy.(RawStoreProxy.java:60)
at 
org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:69)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStoreForConf(HiveMetaStore.java:628)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMSForConf(HiveMetaStore.java:601)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:595)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_all_functions(HiveMetaStore.java:6117){noformat}
It goes down a similar track and matches one of your traces, though not the 
same beyond a point, maybe due to different versions. Could you check whether 
you have HIVE-20740 in your version? If not, then this is something new that 
needs exploring.

> HiveMetaStore#getConf blocked when meet high load
> -
>
> Key: HIVE-28529
> URL: https://issues.apache.org/jira/browse/HIVE-28529
> Project: Hive
>  Issue Type: Improvement
>  Security Level: Public(Viewable by anyone) 
>  Components: Metastore
>Reporter: Xiaoqiao He
>Priority: Major
>
> Thousands of threads are blocked for a long time when the metastore is under 
> high load, as the following stacks show.
> a. 1836 threads (as stack 1 shows) are waiting on Lock 
> #0x7f8bf9477180, which is held by stack 2.
> {code:java}
> # grep "0x7f8bf9477180" metastore.stack | wc -l
> 1836
> {code}
> b. 105 threads (as stack 2 shows) are waiting on Lock 
> #0x7f8bf805f660, which is held by stack 3. 
> {code:java}
> # grep "0x7f8bf805f660" metastore.stack | wc -l
> 105
> {code}
> c. stack 3 shows that initializing the configuration is a costly operation 
> that holds the object monitor (#hiveConf, as in the last code snippet) on a 
> key path for the metastore and impacts performance.
> So, IMO, we need to remove this lock contention to improve the 
> performance. FYI.
> NOTE: I have deployed an early version, but the newest one includes this 
> issue too.
> {code:java}
> "pool-12-thread-1482355" #126195588 prio=5 os_prio=0 tid=0x7f86d507b800 
> nid=0x21570 waiting for monitor entry [0x7f875849b000]
>java.lang.Thread.State: BLOCKED (on object monitor)
> at 
> org.apache.hadoop.hive.metastore.MetaStoreInit.initConnectionUrlHook(MetaStoreInit.java:95)
> - waiting to lock <0x7f8bf9477180> (a java.lang.Class for 
> org.apache.hadoop.hive.metastore.MetaStoreInit)
> at 
> org.apache.hadoop.hive.metastore.MetaStoreInit.updateConnectionURL(MetaStoreInit.java:62)
> at 
> org.apache.hadoop.hive.metastore.RawStoreProxy.init(RawStoreProxy.java:87)
> at 
> org.apache.hadoop.hive.metastore.RawStoreProxy.(RawStoreProxy.java:55)
> at 
> org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:66)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStore(HiveMetaStore.java:817)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:795)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_database_core(HiveMetaStore.java:1308)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_database(HiveMetaStore.java:1240)
> at sun.reflect.GeneratedMethodAccessor59.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:276)
> at com.sun.proxy.$Proxy23.get_database(Unknown Source)
>

[jira] [Updated] (HIVE-28119) Iceberg: Allow insert clause with a column list in Merge query not_matched condition

2024-09-22 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-28119:
---
Labels: hive-4.0.1-must pull-request-available  (was: 
pull-request-available)

> Iceberg: Allow insert clause with a column list in Merge query not_matched 
> condition
> 
>
> Key: HIVE-28119
> URL: https://issues.apache.org/jira/browse/HIVE-28119
> Project: Hive
>  Issue Type: Improvement
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Major
>  Labels: hive-4.0.1-must, pull-request-available
> Fix For: 4.1.0
>
>
> {code}
> merge into target_ice as t using source src ON t.a = src.a
> when not matched then insert (a, c) values (src.a, src.c);
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-28529) HiveMetaStore#getConf blocked when meet high load

2024-09-22 Thread Xiaoqiao He (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17883676#comment-17883676
 ] 

Xiaoqiao He commented on HIVE-28529:


cc [~ayushtkn] Any thought here? Thanks.

> HiveMetaStore#getConf blocked when meet high load
> -
>
> Key: HIVE-28529
> URL: https://issues.apache.org/jira/browse/HIVE-28529
> Project: Hive
>  Issue Type: Improvement
>  Security Level: Public(Viewable by anyone) 
>  Components: Metastore
>Reporter: Xiaoqiao He
>Priority: Major
>
> Thousands of threads are blocked for a long time when the metastore is under 
> high load, as the following stacks show.
> a. 1836 threads (as stack 1 shows) are waiting on Lock 
> #0x7f8bf9477180, which is held by stack 2.
> {code:java}
> # grep "0x7f8bf9477180" metastore.stack | wc -l
> 1836
> {code}
> b. 105 threads (as stack 2 shows) are waiting on Lock 
> #0x7f8bf805f660, which is held by stack 3. 
> {code:java}
> # grep "0x7f8bf805f660" metastore.stack | wc -l
> 105
> {code}
> c. stack 3 shows that initializing the configuration is a costly operation 
> that holds the object monitor (#hiveConf, as in the last code snippet) on a 
> key path for the metastore and impacts performance.
> So, IMO, we need to remove this lock contention to improve the 
> performance. FYI.
> NOTE: I have deployed an early version, but the newest one includes this 
> issue too.
> {code:java}
> "pool-12-thread-1482355" #126195588 prio=5 os_prio=0 tid=0x7f86d507b800 
> nid=0x21570 waiting for monitor entry [0x7f875849b000]
>java.lang.Thread.State: BLOCKED (on object monitor)
> at 
> org.apache.hadoop.hive.metastore.MetaStoreInit.initConnectionUrlHook(MetaStoreInit.java:95)
> - waiting to lock <0x7f8bf9477180> (a java.lang.Class for 
> org.apache.hadoop.hive.metastore.MetaStoreInit)
> at 
> org.apache.hadoop.hive.metastore.MetaStoreInit.updateConnectionURL(MetaStoreInit.java:62)
> at 
> org.apache.hadoop.hive.metastore.RawStoreProxy.init(RawStoreProxy.java:87)
> at 
> org.apache.hadoop.hive.metastore.RawStoreProxy.(RawStoreProxy.java:55)
> at 
> org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:66)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStore(HiveMetaStore.java:817)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:795)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_database_core(HiveMetaStore.java:1308)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_database(HiveMetaStore.java:1240)
> at sun.reflect.GeneratedMethodAccessor59.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:276)
> at com.sun.proxy.$Proxy23.get_database(Unknown Source)
> at sun.reflect.GeneratedMethodAccessor59.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$MetricHMSProxy.invoke(HiveMetaStore.java:8241)
> at com.sun.proxy.$Proxy23.get_database(Unknown Source)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_database.getResult(ThriftHiveMetastore.java:11142)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_database.getResult(ThriftHiveMetastore.java:11126)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
> at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:754)
> at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:749)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1717)
> at 
> org.apache.hadoop.hive.thrift.Hadoo

[jira] [Updated] (HIVE-28530) Fetched result from another query

2024-09-22 Thread Xiaomin Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaomin Zhang updated HIVE-28530:
-
Description: 
When running Hive load tests, we observed that Beeline can fetch the wrong 
query result, one belonging to another query running at the same time.  We 
ruled out a load-balancing issue because it happened on a single HiveServer2.  
And we found this issue only happens when *hive.query.results.cache.enabled is 
false.*

All test queries are in the same format as below: 

{code:java}
select concat('total record (test_$PID)=',count(*)) as count_record from t1t
{code}

We randomized the query by replacing $PID with the Beeline PID, and the test 
driver ran 10 Beeline clients concurrently.  The table t1t is static and has a 
few rows, so the test driver can check whether the query result equals: total 
record (test_$PID)=2

When the query result cache is disabled, we can see a query randomly get a 
wrong result, and this can always be reproduced.  For example, the two queries 
below were running in parallel:

{code:java}
queryId=hive_20240701103742_ff1adb2d-e9eb-448d-990e-00ab371e9db6): select 
concat('total record (test_21535)=',count(*)) as count_record from t1t

queryId=hive_20240701103742_9bdfff92-89e1-4bcd-88ea-bf73ba5fd93d): select 
concat('total record (test_21566)=',count(*)) as count_record from t1t
{code}

The second query is supposed to get this result:
*total record (test_21566)=2*

But Beeline actually got this result:
*total record (test_21535)=2*

There is no error in the HS2 log.

  was:
When running Hive load tests, we observed that Beeline can fetch the wrong 
query result, one belonging to another query running at the same time.  We 
ruled out a load-balancing issue because it happened on a single HiveServer2.  
And we found this issue only happens when *hive.query.results.cache.enabled is 
false.*

All test queries are in the same format as below: 

{code:java}
select concat('total record (test_recon_mock_$PID)=',count(*)) as count_record 
from t1t
{code}

We randomized the query by replacing $PID with the Beeline PID, and the test 
driver ran 10 Beeline clients concurrently.  The table t1t is static and has a 
few rows, so the test driver can check whether the query result equals: total 
record (test_recon_mock_$PID)=2

When the query result cache is disabled, we can see a query randomly get a 
wrong result, and this can always be reproduced.  For example, the two queries 
below were running in parallel:

{code:java}
queryId=hive_20240701103742_ff1adb2d-e9eb-448d-990e-00ab371e9db6): select 
concat('total record (test_recon_mock_21535)=',count(*)) as count_record from 
t1t

queryId=hive_20240701103742_9bdfff92-89e1-4bcd-88ea-bf73ba5fd93d): select 
concat('total record (test_recon_mock_21566)=',count(*)) as count_record from 
t1t
{code}

The second query is supposed to get this result:
*total record (test_recon_mock_21566)=2*

But Beeline actually got this result:
*total record (test_recon_mock_21535)=2*

There is no error in the HS2 log.


> Fetched result from another query
> -
>
>     Key: HIVE-28530
> URL: https://issues.apache.org/jira/browse/HIVE-28530
> Project: Hive
>  Issue Type: Bug
>  Security Level: Public(Viewable by anyone) 
>  Components: HiveServer2
>Affects Versions: 3.0.0
>Reporter: Xiaomin Zhang
>Priority: Major
>
> When running Hive load tests, we observed that Beeline can fetch the wrong 
> query result, one belonging to another query running at the same time.  We 
> ruled out a load-balancing issue because it happened on a single HiveServer2. 
> And we found this issue only happens when *hive.query.results.cache.enabled 
> is false.*
> All test queries are in the same format as below: 
> {code:java}
> select concat('total record (test_$PID)=',count(*)) as count_record from t1t
> {code}
> We randomized the query by replacing $PID with the Beeline PID, and the test 
> driver ran 10 Beeline clients concurrently.  The table t1t is static and has 
> a few rows, so the test driver can check whether the query result equals: 
> total record (test_$PID)=2
> When the query result cache is disabled, we can see a query randomly get a 
> wrong result, and this can always be reproduced.  For example, the two 
> queries below were running in parallel:
> {code:java}
> queryId=hive_20240701103742_ff1adb2d-e9eb-448d-990e-00ab371e9db6): select 
> concat('total record (test_21535)=',count(*)) as count_record from t1t
> queryId=hive_20240701103742_9bdfff92-89e1-4bcd-88ea-bf73ba5fd93d): select 
> concat('total record (test_21566)=',count(*)) as count_record from t1t
> {code}
> The second query is supposed to get this result:
> *total record (test_21566)=2*
> But Beeline actually got this result:
> *total record (test_21535)=2*
> There is no error in the HS2 log.
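To make the check concrete, a hypothetical sketch of what the test driver 
verifies per client; the JDBC URL and class name are assumptions, while the 
query and table come from the description above:

{code:java}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ResultMixupCheck {
  // Each concurrent client tags its query with its own id and verifies
  // that the fetched row carries that same tag.
  static void checkOwnResult(String jdbcUrl, long pid) throws Exception {
    String expected = "total record (test_" + pid + ")=2";
    String sql = "select concat('total record (test_" + pid + ")=', count(*))"
        + " as count_record from t1t";
    try (Connection conn = DriverManager.getConnection(jdbcUrl);
         Statement stmt = conn.createStatement();
         ResultSet rs = stmt.executeQuery(sql)) {
      rs.next();
      String actual = rs.getString(1);
      if (!expected.equals(actual)) {
        // The cross-query mixup described above surfaces here.
        throw new AssertionError("expected " + expected + " but got " + actual);
      }
    }
  }
}
{code}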



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28532) Map Join Reuse cache allows to share hashtables for Outer join and Inner join

2024-09-22 Thread Ramesh Kumar Thangarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramesh Kumar Thangarajan updated HIVE-28532:

Description: The Map Join reuse cache allows hash tables to be shared between 
an outer join and an inner join. But a hash table cannot be reused between a 
non-outer join and an outer join, because an outer join can only accept a hash 
table of kind HASHMAP, whereas there are other kinds like HASHSET and 
HASH_MULTISET.  (was: Map Join 
Reuse cache allows to share hashtables for Outer join and Inner join. But we 
cannot reuse for hash table for a non-outer join to outer join. Because outer 
join cannot accept the hash table kind other than HASHMAP, whereas there are 
other types like HASHSET and HASH_MULTISET.)

> Map Join Reuse cache allows to share hashtables for Outer join and Inner join
> -
>
> Key: HIVE-28532
> URL: https://issues.apache.org/jira/browse/HIVE-28532
> Project: Hive
>  Issue Type: Bug
>  Security Level: Public(Viewable by anyone) 
>Reporter: Ramesh Kumar Thangarajan
>Assignee: Ramesh Kumar Thangarajan
>Priority: Major
>
> The Map Join reuse cache allows hash tables to be shared between an outer 
> join and an inner join. But a hash table cannot be reused between a non-outer 
> join and an outer join, because an outer join can only accept a hash table of 
> kind HASHMAP, whereas there are other kinds like HASHSET and HASH_MULTISET.
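A minimal sketch of the reuse guard the description implies; the enum and 
method names are illustrative, not the actual patch:

{code:java}
public class MapJoinReuseGuard {
  enum HashTableKind { HASHMAP, HASHSET, HASH_MULTISET }

  // Outer joins must probe a full HASHMAP: the set-shaped variants drop
  // the value side, so they cannot serve an outer join correctly.
  static boolean canReuse(HashTableKind cachedKind, boolean consumerIsOuterJoin) {
    return !consumerIsOuterJoin || cachedKind == HashTableKind.HASHMAP;
  }
}
{code}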



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28355) Fix intermittent failure of TestHplSqlViaBeeLine#testUNIX_TIMESTAMPHplSQLFunction

2024-09-22 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-28355:
---
Labels: hive-4.0.1-must pull-request-available  (was: 
pull-request-available)

> Fix intermittent failure of 
> TestHplSqlViaBeeLine#testUNIX_TIMESTAMPHplSQLFunction
> -
>
> Key: HIVE-28355
> URL: https://issues.apache.org/jira/browse/HIVE-28355
> Project: Hive
>  Issue Type: Bug
>Reporter: Kiran Velumuri
>Assignee: Kiran Velumuri
>Priority: Major
>  Labels: hive-4.0.1-must, pull-request-available
> Fix For: 4.1.0
>
> Attachments: image-2024-06-27-12-27-47-772.png
>
>
> The test TestHplSqlViaBeeLine#testUNIX_TIMESTAMPHplSQLFunction compares 
> UNIX_TIMESTAMP() with System.currentTimeMillis() using a regex.
> Instead of seconds (divide by 1000), it compares in 10s of seconds (divide by 
> 10000) to account for the difference of a few seconds that might come up 
> between the execution of System.currentTimeMillis()/10000 and 
> UNIX_TIMESTAMP().
> However, it fails when System.currentTimeMillis() (in seconds) differs from 
> UNIX_TIMESTAMP() in the 10s digit and/or 100s digit and/or 1000s digit, and 
> so on.
>  
> Examples:
> The current comparison is highlighted in bold.
> 1. Difference in 1s digit - success
> System.currentTimeMillis() in seconds - {*}171946770{*}5
> UNIX_TIMESTAMP() - {*}171946770{*}6
>  
> 2. Difference in 10s digit - fail
> System.currentTimeMillis() in seconds - {*}171946770{*}9
> UNIX_TIMESTAMP() - {*}171946771{*}0
>  
> 3. Difference in 100s digit - fail
> System.currentTimeMillis() in seconds - {*}171946779{*}9
> UNIX_TIMESTAMP() - {*}171946780{*}0
>  
> 4. Difference in 1000s digit - fail
> System.currentTimeMillis() in seconds - {*}171946799{*}9
> UNIX_TIMESTAMP() - {*}171946800{*}0
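A sketch of a comparison that does not depend on digit prefixes, assuming the 
test can read back the value UNIX_TIMESTAMP() returned (unixTimestampFromQuery 
is a stand-in for that value):

{code:java}
import org.junit.Assert;

public class TimestampToleranceCheck {
  static void assertCloseEnough(long unixTimestampFromQuery) {
    long jvmSeconds = System.currentTimeMillis() / 1000L;
    long toleranceSeconds = 10L;
    // An absolute-difference check survives carries across the 10s,
    // 100s, and 1000s digits, which the prefix comparison does not.
    Assert.assertTrue(
        "UNIX_TIMESTAMP() drifted more than " + toleranceSeconds + "s",
        Math.abs(jvmSeconds - unixTimestampFromQuery) <= toleranceSeconds);
  }
}
{code}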



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28253) Unable to set the value for hplsql.onerror in hplsql mode.

2024-09-22 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-28253:
---
Labels: hive-4.0.1-must pull-request-available  (was: 
pull-request-available)

> Unable to set the value for hplsql.onerror in hplsql mode.
> --
>
> Key: HIVE-28253
> URL: https://issues.apache.org/jira/browse/HIVE-28253
> Project: Hive
>  Issue Type: Bug
>  Components: hpl/sql
>Reporter: Dayakar M
>Assignee: Dayakar M
>Priority: Major
>  Labels: hive-4.0.1-must, pull-request-available
> Fix For: 4.1.0
>
>
> Unable to set the value for hplsql.onerror in hplsql mode.
>  
> Steps to reproduce:
> {noformat}
> 0: jdbc:hive2://localhost> set hplsql.onerror='stop';
> . . . . . . . . . . . . . . . . . . . . . . .> /
> ERROR : Syntax error at line 1:18 no viable alternative at input 
> 'hplsql.onerror='
> ERROR : Ln:1 identifier 'SET' must be declared.
> No rows affected (0.534 seconds)
> 0: jdbc:hive2://localhost> {noformat}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28214) HPLSQL not using the hive variables passed through beeline using --hivevar option

2024-09-22 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-28214:
---
Labels: hive-4.0.1-must pull-request-available  (was: 
pull-request-available)

> HPLSQL not using the hive variables passed through beeline using --hivevar 
> option
> -
>
> Key: HIVE-28214
> URL: https://issues.apache.org/jira/browse/HIVE-28214
> Project: Hive
>  Issue Type: Bug
>  Components: hpl/sql
>Reporter: Dayakar M
>Assignee: Dayakar M
>Priority: Major
>  Labels: hive-4.0.1-must, pull-request-available
> Fix For: 4.1.0
>
>
> HPLSQL not using the hive variables passed through beeline using --hivevar 
> option.
> Steps to reproduce:
> {noformat}
> beeline -u 
> 'jdbc:hive2://localhost:10000/default;user=hive;password=hive;mode=hplsql' 
> --hivevar hivedb=sys --hivevar hivetbl=tbls{noformat}
> {noformat}
> 0: jdbc:hive2://localhost> DECLARE hivedb_tbl string;
>  . . . . . . . . . . . . . . . . . . . . . . .> SELECT hivedb || '.' || 
> hivetbl into hivedb_tbl;
>  . . . . . . . . . . . . . . . . . . . . . . .> PRINT hivedb_tbl;
>  . . . . . . . . . . . . . . . . . . . . . . .> /
> INFO  : Compiling 
> command(queryId=hive_20240424145826_617acb79-0b27-46eb-aa05-1332703c94fb): 
> SELECT CONCAT(hivedb, '.', hivetbl) 
> ERROR : FAILED: SemanticException [Error 10004]: Line 1:14 Invalid table 
> alias or column reference 'hivedb': (possible column names are: ) 
> org.apache.hadoop.hive.ql.parse.SemanticException: Line 1:14 Invalid table 
> alias or column reference 'hivedb': (possible column names are: )
>  
> INFO  : Completed compiling 
> command(queryId=hive_20240424145826_617acb79-0b27-46eb-aa05-1332703c94fb); 
> Time taken: 3.976 seconds 
> ERROR : Unhandled exception in HPL/SQL 
> No rows affected (4.901 seconds)
> 0: jdbc:hive2://localhost>
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28215) Signalling CONDITION HANDLER is not working in HPLSQL.

2024-09-22 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-28215:
---
Labels: hive-4.0.0-must pull-request-available  (was: 
pull-request-available)

> Signalling CONDITION HANDLER is not working in HPLSQL.
> --
>
> Key: HIVE-28215
> URL: https://issues.apache.org/jira/browse/HIVE-28215
> Project: Hive
>  Issue Type: Bug
>  Components: hpl/sql
>Reporter: Dayakar M
>Assignee: Dayakar M
>Priority: Major
>  Labels: hive-4.0.0-must, pull-request-available
> Fix For: 4.1.0
>
>
> Signalling CONDITION HANDLER is not working in HPLSQL.
> Refer [http://www.hplsql.org/declare-condition] and 
> [http://www.hplsql.org/declare-handler] for more details about this feature.
>  
> +Steps to Reproduce:+
> {noformat}
> jdbc:hive2://ccycloud-1.nightly-71x-oq.roo> DECLARE cnt INT DEFAULT 0; 
> . . . . . . . . . . . . . . . . . . . . . . .> DECLARE wrong_cnt_condition 
> CONDITION;
> . . . . . . . . . . . . . . . . . . . . . . .> 
> . . . . . . . . . . . . . . . . . . . . . . .> DECLARE EXIT HANDLER FOR 
> wrong_cnt_condition
> . . . . . . . . . . . . . . . . . . . . . . .>   PRINT 'Wrong number of 
> rows'; 
> . . . . . . . . . . . . . . . . . . . . . . .> 
> . . . . . . . . . . . . . . . . . . . . . . .> EXECUTE IMMEDIATE 'SELECT 
> COUNT(*) FROM sys.tbls' INTO cnt;
> . . . . . . . . . . . . . . . . . . . . . . .> 
> . . . . . . . . . . . . . . . . . . . . . . .> IF cnt <> 0 THEN
> . . . . . . . . . . . . . . . . . . . . . . .>   SIGNAL wrong_cnt_condition;
> . . . . . . . . . . . . . . . . . . . . . . .> END IF;
> . . . . . . . . . . . . . . . . . . . . . . .> /
> INFO  : Compiling 
> command(queryId=hive_20240424171747_7f22fef6-70d5-483a-af67-7a6b9f17ac8b): 
> SELECT COUNT(*) FROM sys.tbls
> INFO  : Semantic Analysis Completed (retrial = false)
> INFO  : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:_c0, 
> type:bigint, comment:null)], properties:null)
> INFO  : Completed compiling 
> command(queryId=hive_20240424171747_7f22fef6-70d5-483a-af67-7a6b9f17ac8b); 
> Time taken: 0.995 seconds 
> INFO  : Completed executing 
> command(queryId=hive_20240424171747_7f22fef6-70d5-483a-af67-7a6b9f17ac8b); 
> Time taken: 8.479 seconds
> INFO  : OK
> ERROR : wrong_cnt_condition
> No rows affected (9.559 seconds)
> 0: jdbc:hive2://localhost>{noformat}
>  
> Here, when the _SIGNAL wrong_cnt_condition;_ statement is executed, it should 
> invoke the corresponding continue/exit handlers and execute the statements 
> present in the handler block. But currently this is not happening.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28247) Execute immediate 'select count(*) from tbl' throwing ClassCastException in hplsql mode.

2024-09-22 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-28247:
---
Labels: hive-4.0.1-must pull-request-available  (was: 
pull-request-available)

> Execute immediate 'select count(*) from tbl' throwing ClassCastException in 
> hplsql mode.
> 
>
> Key: HIVE-28247
> URL: https://issues.apache.org/jira/browse/HIVE-28247
> Project: Hive
>  Issue Type: Bug
>  Components: hpl/sql
>Reporter: Dayakar M
>Assignee: Dayakar M
>Priority: Major
>  Labels: hive-4.0.1-must, pull-request-available
> Fix For: 4.1.0
>
>
> Execute immediate 'select count(*) from tbl' throwing ClassCastException in 
> hplsql mode.
>  
> Steps to reproduce:
> {noformat}
> execute immediate 'SELECT count(*) from result';{noformat}
> StackTrace in HS2 logs:
> {noformat}
> 2024-05-06T08:45:42,730 ERROR [HiveServer2-Background-Pool: Thread-850] 
> hplsql.HplSqlOperation: Error running hive query
> org.apache.hive.service.cli.HiveSQLException: Error running HPL/SQL operation
>at 
> org.apache.hive.service.cli.operation.hplsql.HplSqlOperation.interpret(HplSqlOperation.java:111)
>  ~[hive-service-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
>at 
> org.apache.hive.service.cli.operation.hplsql.HplSqlOperation.access$500(HplSqlOperation.java:54)
>  ~[hive-service-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
>at 
> org.apache.hive.service.cli.operation.hplsql.HplSqlOperation$BackgroundWork.lambda$run$0(HplSqlOperation.java:207)
>  ~[hive-service-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
>at java.security.AccessController.doPrivileged(Native Method) 
> ~[?:1.8.0_292]
>at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_292]
>at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
>  ~[hadoop-common-3.3.6.jar:?]
>at 
> org.apache.hive.service.cli.operation.hplsql.HplSqlOperation$BackgroundWork.run(HplSqlOperation.java:219)
>  ~[hive-service-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
>at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> ~[?:1.8.0_292]
>at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_292]
>at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  ~[?:1.8.0_292]
>at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  ~[?:1.8.0_292]
>at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_292]
> Caused by: java.lang.ClassCastException: class java.lang.Long cannot be 
> cast to class java.lang.String
>at 
> org.apache.hive.service.cli.operation.hplsql.HplSqlQueryExecutor$OperationRowResult.get(HplSqlQueryExecutor.java:147)
>  ~[hive-service-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
>at org.apache.hive.hplsql.executor.QueryResult.column(QueryResult.java:49) 
> ~[hive-hplsql-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
>at org.apache.hive.hplsql.Stmt.exec(Stmt.java:1095) 
> ~[hive-hplsql-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
>at org.apache.hive.hplsql.Exec.visitExec_stmt(Exec.java:2061) 
> ~[hive-hplsql-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
>at org.apache.hive.hplsql.Exec.visitExec_stmt(Exec.java:96) 
> ~[hive-hplsql-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
>at 
> org.apache.hive.hplsql.HplsqlParser$Exec_stmtContext.accept(HplsqlParser.java:10369)
>  ~[hive-hplsql-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
>at 
> org.antlr.v4.runtime.tree.AbstractParseTreeVisitor.visitChildren(AbstractParseTreeVisitor.java:46)
>  ~[antlr4-runtime-4.9.3.jar:4.9.3]
>at org.apache.hive.hplsql.Exec.visitStmt(Exec.java:1103) 
> ~[hive-hplsql-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
>at org.apache.hive.hplsql.Exec.visitStmt(Exec.java:96) 
> ~[hive-hplsql-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
>at 
> org.apache.hive.hplsql.HplsqlParser$StmtContext.accept(HplsqlParser.java:1054)
>  ~[hive-hplsql-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
>at 
> org.antlr.v4.runtime.tree.AbstractParseTreeVisitor.visitChildren(AbstractParseTreeVisitor.java:46)
>  ~[antlr4-runtime-4.9.3.jar:4.9.3]
>at 
> org.apache.hive.hplsql.HplsqlBaseVisitor.visitBlock(HplsqlBaseVisitor.java:27)
>  ~[hive-hplsql-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
>at 
> org.apache.hive.hplsql.HplsqlParser$BlockContext.accept(HplsqlParser.java:473)
>  ~[hive-hplsql-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
>at 
> org.antlr.v4.runtime.tree.AbstractParseTreeVisitor.visitChildren(AbstractParseTreeVisitor.java:46)
>

[jira] [Created] (HIVE-28532) Map Join Reuse cache allows to share hashtables for Outer join and Inner join

2024-09-22 Thread Ramesh Kumar Thangarajan (Jira)
Ramesh Kumar Thangarajan created HIVE-28532:
---

 Summary: Map Join Reuse cache allows to share hashtables for Outer 
join and Inner join
 Key: HIVE-28532
 URL: https://issues.apache.org/jira/browse/HIVE-28532
 Project: Hive
  Issue Type: Bug
  Security Level: Public (Viewable by anyone)
Reporter: Ramesh Kumar Thangarajan
Assignee: Ramesh Kumar Thangarajan


The Map Join reuse cache allows hash tables to be shared between an outer join 
and an inner join. But a hash table cannot be reused between a non-outer join 
and an outer join, because an outer join can only accept a hash table of kind 
HASHMAP, whereas there are other kinds like HASHSET and HASH_MULTISET.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28455) Missing dependencies due to upgrade of maven-shade-plugin

2024-09-22 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-28455:
---
Labels:   (was: hive-4.0.1-must)

> Missing dependencies due to upgrade of maven-shade-plugin
> -
>
> Key: HIVE-28455
> URL: https://issues.apache.org/jira/browse/HIVE-28455
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 4.0.0, 4.0.0-beta-1, 4.1.0
>Reporter: Kokila N
>Assignee: Kokila N
>Priority: Major
>
> For hive-jdbc, we create two jars: {{hive-jdbc}} and 
> {{hive-jdbc-standalone}} (the shaded/uber jar).
> *Reason for the change in the pom:*
> Due to the changes in the Maven code after version 3.2.4, when we create the 
> shaded jar ({{hive-jdbc-standalone}}), a {{dependency-reduced-pom.xml}} is 
> generated and dependencies that have been included in the uber JAR are 
> removed from the {{<dependencies>}} section of the generated POM to avoid 
> duplication. This {{dependency-reduced-pom.xml}} is why the dependencies are 
> removed from the pom, as it is common to both {{hive-jdbc}} and 
> {{hive-jdbc-standalone}}. So, currently for hive-jdbc, its transitive 
> dependencies are not propagated.
> The same applies to the hive-beeline and hive-exec modules as well.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28337) Process timestamps at UTC timezone instead of local timezone in MetaStoreUtils

2024-09-21 Thread Kiran Velumuri (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kiran Velumuri updated HIVE-28337:
--
Summary: Process timestamps at UTC timezone instead of local timezone in 
MetaStoreUtils  (was: TestMetaStoreUtils fails for invalid timestamps)

> Process timestamps at UTC timezone instead of local timezone in MetaStoreUtils
> --
>
> Key: HIVE-28337
> URL: https://issues.apache.org/jira/browse/HIVE-28337
> Project: Hive
>  Issue Type: Bug
>Reporter: Kiran Velumuri
>Assignee: Kiran Velumuri
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2024-06-18-12-42-05-646.png, 
> image-2024-06-18-12-42-31-472.png
>
>
> Currently in MetaStoreUtils, the conversion to/from timestamp and string 
> makes use of LocalDateTime in the local time zone while processing 
> timestamps. This causes issues with representing timestamps *as mentioned 
> below*. Instead, while dealing with timestamps it is proposed to use 
> java.time.Instant to represent a point on the time-line, which would overcome 
> the issue with representing such timestamps. Accordingly, the test class for 
> MetaStoreUtils (TestMetaStoreUtils) has also been modified to account for 
> these changes.
> +Failing scenario:+
> Timestamps in time zones that observe daylight saving time, during the window 
> when the clock is set forward (typically 2:00 AM - 3:00 AM)
> Example: 2417-03-26T02:08:43 in Europe/Paris is invalid, and would get 
> converted to 2417-03-26T03:08:43 by the Timestamp.valueOf() method, when 
> instead we want to represent the original timestamp without conversion.
> This happens because the timestamp is represented as a LocalDateTime in 
> TestMetaStoreUtils, which is independent of the timestamp's time zone. This 
> LocalDateTime, when combined with the time zone, leads to an invalid 
> timestamp.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28337) TestMetaStoreUtils fails for invalid timestamps

2024-09-21 Thread Kiran Velumuri (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kiran Velumuri updated HIVE-28337:
--
Description: 
Currently in MetaStoreUtils, the conversion to/from timestamp and string makes 
use of LocalDateTime in the local time zone while processing timestamps. This 
causes issues with representing timestamps *as mentioned below*. Instead, while 
dealing with timestamps it is proposed to use java.time.Instant to represent a 
point on the time-line, which would overcome the issue with representing such 
timestamps. Accordingly, the test class for MetaStoreUtils (TestMetaStoreUtils) 
has also been modified to account for these changes.



+Failing scenario:+
Timestamps in time zones that observe daylight saving time, during the window 
when the clock is set forward (typically 2:00 AM - 3:00 AM)

Example: 2417-03-26T02:08:43 in Europe/Paris is invalid, and would get 
converted to 2417-03-26T03:08:43 by the Timestamp.valueOf() method, when 
instead we want to represent the original timestamp without conversion.

This happens because the timestamp is represented as a LocalDateTime in 
TestMetaStoreUtils, which is independent of the timestamp's time zone. This 
LocalDateTime, when combined with the time zone, leads to an invalid 
timestamp.

  was:
The tests 
org.apache.hadoop.hive.metastore.utils.TestMetaStoreUtils#testTimestampToString 
and #testDateToString fail for invalid timestamps in the following cases:

1. Timestamps in time zones that observe daylight saving time, during the 
window when the clock is set forward (typically 2:00 AM - 3:00 AM)

Example: 2417-03-26T02:08:43 in Europe/Paris is invalid, and would get 
converted to 2417-03-26T03:08:43 by the Timestamp.valueOf() method

This happens because the timestamp is represented as a LocalDateTime in 
TestMetaStoreUtils, which is independent of the timestamp's time zone. This 
LocalDateTime, when combined with the time zone, leads to an invalid 
timestamp.

 

2. Timestamps with year '0000'

Example: 0000-01-07T22:44:36 is invalid and would get converted to 
0001-01-07T22:44:36 by the Timestamp.valueOf() method

Year '0000' is invalid and should not be included while generating the test 
cases.


> TestMetaStoreUtils fails for invalid timestamps
> ---
>
> Key: HIVE-28337
> URL: https://issues.apache.org/jira/browse/HIVE-28337
> Project: Hive
>  Issue Type: Bug
>Reporter: Kiran Velumuri
>Assignee: Kiran Velumuri
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2024-06-18-12-42-05-646.png, 
> image-2024-06-18-12-42-31-472.png
>
>
> Currently in MetaStoreUtils, the conversion to/from timestamp and string 
> makes use of LocalDateTime in the local time zone while processing 
> timestamps. This causes issues with representing timestamps *as mentioned 
> below*. Instead, while dealing with timestamps it is proposed to use 
> java.time.Instant to represent a point on the time-line, which would overcome 
> the issue with representing such timestamps. Accordingly, the test class for 
> MetaStoreUtils (TestMetaStoreUtils) has also been modified to account for 
> these changes.
> +Failing scenario:+
> Timestamps in time zones that observe daylight saving time, during the window 
> when the clock is set forward (typically 2:00 AM - 3:00 AM)
> Example: 2417-03-26T02:08:43 in Europe/Paris is invalid, and would get 
> converted to 2417-03-26T03:08:43 by the Timestamp.valueOf() method, when 
> instead we want to represent the original timestamp without conversion.
> This happens because the timestamp is represented as a LocalDateTime in 
> TestMetaStoreUtils, which is independent of the timestamp's time zone. This 
> LocalDateTime, when combined with the time zone, leads to an invalid 
> timestamp.
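A minimal sketch of the UTC-anchored conversion the summary describes; an 
illustration of the approach, not the committed patch:

{code:java}
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class UtcTimestampStrings {
  private static final DateTimeFormatter FMT =
      DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss").withZone(ZoneOffset.UTC);

  // Formatting an Instant at UTC is total: no local DST gap, such as the
  // Europe/Paris 02:00-03:00 jump, can remap the wall-clock value.
  static String timestampToString(Instant ts) {
    return FMT.format(ts);
  }

  static Instant stringToTimestamp(String s) {
    return Instant.from(FMT.parse(s));
  }
}
{code}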



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27743) Semantic Search In Hive

2024-09-21 Thread Sreenath (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sreenath updated HIVE-27743:

Description: 
_Semantic search is the tech powering *vector databases,* and we can have the 
same power in Hive._
Semantic search is a way for computers to understand the meaning behind words 
and phrases when you're searching for something. Instead of just looking for 
exact matches of keywords, it tries to figure out what you're really asking and 
provides results that are more relevant and meaningful to your question. It's 
like having a search engine that can understand what you mean, not just what 
you say, making it easier to find the information you're looking for. This 
ticket is to have Semantic search in Hive as UDFs.

The proposal is to implement functions for on-the-fly calculation of the 
similarity distance between two values. Once we have them, we could easily do 
semantic search as part of a WHERE clause.
 * E.g. (using a cosine similarity function): “WHERE cos_sim(region, 'europe') > 
0.9“. And it could return records with regions like Scandinavia, Nordic, Baltic 
etc…
 * We could have functions that accept values as text or as vector embeddings.

*On the implementation side, we can have a set of new UDFs and configuration 
properties:*

*UDFs:*
 # *embed(sentences[, prompt, embedding_type, normalize_embeddings])*
 # *cos_sim(a, b)*
 # *dot_score(a, b)*
 # *euclidean_sim(a, b)*
 # *manhattan_sim(a, b)*

Additionally, we can have a *llm(text)* function to use the power of an LLM.

*Configuration properties:*
 # hive.embedding.model - Path to a pre-trained SentenceTransformer model
 # hive.embedding.batch_size - The batch size used for the computation
 # hive.embedding.precision - The precision to use for the embeddings. Can be 
“float32”, “int8”, “uint8”, “binary”, or “ubinary”
 # hive.embedding.default_prompt - Prompt prefix that must be used by default
 # hive.embedding.cache_folder - Path to a local folder to store models

  was:
_Semantic search is the tech powering *vector databases,* and we can have the 
same power in Hive._
Semantic search is a way for computers to understand the meaning behind words 
and phrases when you're searching for something. Instead of just looking for 
exact matches of keywords, it tries to figure out what you're really asking and 
provides results that are more relevant and meaningful to your question. It's 
like having a search engine that can understand what you mean, not just what 
you say, making it easier to find the information you're looking for. This 
ticket is to have Semantic search in Hive as UDFs.

The proposal is to implement functions for on-the-fly calculation of the 
similarity distance between two values. Once we have them, we could easily do 
semantic search as part of a WHERE clause.
 * E.g. (using a cosine similarity function): “WHERE cos_sim(region, 'europe') > 
0.9“. And it could return records with regions like Scandinavia, Nordic, Baltic 
etc…
 * We could have functions that accept values as text or as vector embeddings.

*On the implementation side, we can have a set of new UDFs and configuration 
properties:*

*UDFs:*
 # embed(sentences[, prompt, embedding_type, normalize_embeddings])
 # cos_sim(a, b)
 # dot_score(a, b)
 # euclidean_sim(a, b)
 # manhattan_sim(a, b)

*Configuration properties:*
 # hive.embedding.model - Path to a pre-trained SentenceTransformer model
 # hive.embedding.batch_size - The batch size used for the computation
 # hive.embedding.precision - The precision to use for the embeddings. Can be 
“float32”, “int8”, “uint8”, “binary”, or “ubinary”
 # hive.embedding.default_prompt - Prompt prefix that must be used by default
 # hive.embedding.cache_folder - Path to a local folder to store models


> Semantic Search In Hive
> ---
>
> Key: HIVE-27743
> URL: https://issues.apache.org/jira/browse/HIVE-27743
> Project: Hive
>  Issue Type: Wish
> Environment: *  
>Reporter: Sreenath
>Assignee: Sreenath
>Priority: Major
>
> _Semantic search is the tech powering *vector databases,* and we can have the 
> same power in Hive._
> Semantic search is a way for computers to understand the meaning behind words 
> and phrases when you're searching for something. Instead of just looking for 
> exact matches of keywords, it tries to figure out what you're really asking 
> and provides results that are more relevant and meaningful to your question. 
> It's like having a search engine that can understand what you mean, not just 
> what you say, making it easier to find the information you're looking for. 
> This ticket is to have Semantic search in Hive as UDFs.
> The proposal is to implement functions fo
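For illustration, the arithmetic core of the proposed cos_sim over float-vector 
embeddings; a sketch only, with the Hive GenericUDF/ObjectInspector plumbing 
omitted and the class name assumed:

{code:java}
public class CosineSimilarity {
  // cos_sim(a, b) = dot(a, b) / (|a| * |b|)
  static double cosSim(float[] a, float[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException("embedding dimensions differ");
    }
    double dot = 0.0, normA = 0.0, normB = 0.0;
    for (int i = 0; i < a.length; i++) {
      dot += (double) a[i] * b[i];
      normA += (double) a[i] * a[i];
      normB += (double) b[i] * b[i];
    }
    return dot / (Math.sqrt(normA) * Math.sqrt(normB));
  }
}
{code}

Wired into a UDF, this is what would let a predicate like the WHERE 
cos_sim(...) > 0.9 example above run on the fly.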

[jira] [Assigned] (HIVE-28265) Improve the error message for hive.query.timeout.seconds

2024-09-20 Thread Shohei Okumiya (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shohei Okumiya reassigned HIVE-28265:
-

Assignee: (was: Shohei Okumiya)

> Improve the error message for hive.query.timeout.seconds
> 
>
> Key: HIVE-28265
> URL: https://issues.apache.org/jira/browse/HIVE-28265
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 4.0.0
>Reporter: Shohei Okumiya
>Priority: Major
>
> `hive.query.timeout.seconds` seems to be working correctly, but it always 
> says it timed out in 0 seconds.
> {code:java}
> 0: jdbc:hive2://hive-hiveserver2:1/defaul> set 
> hive.query.timeout.seconds=1s;
> No rows affected (0.111 seconds)
> 0: jdbc:hive2://hive-hiveserver2:1/defaul> select count(*) from test;
> ...
> Error: Query timed out after 0 seconds (state=,code=0){code}
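> One plausible shape of the bug, purely as an illustrative sketch (this is not 
> the actual HiveServer2 code): the timer is armed from the parsed config 
> value, but the field used to build the error message is never populated, so 
> it keeps its default of 0.
> {code:java}
> // Illustrative sketch only -- not the actual HiveServer2 implementation.
> public class TimeoutMessageSketch {
>   private int queryTimeoutSeconds; // stays 0 unless explicitly assigned
>
>   void configure(String configuredTimeout) {
>     long millis = parseToMillis(configuredTimeout); // "1s" -> 1000
>     armTimer(millis);
>     // Bug pattern: queryTimeoutSeconds is never updated here.
>   }
>
>   String timeoutMessage() {
>     return "Query timed out after " + queryTimeoutSeconds + " seconds";
>   }
>
>   private long parseToMillis(String s) {
>     return s.endsWith("s")
>         ? Long.parseLong(s.substring(0, s.length() - 1)) * 1000L
>         : Long.parseLong(s);
>   }
>
>   private void armTimer(long millis) { /* schedule query cancellation */ }
> }
> {code}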



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28282) Merging into iceberg table fails with copy on write when values clause has a function call

2024-09-20 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-28282:
--
Parent: (was: HIVE-26630)
Issue Type: Bug  (was: Sub-task)

> Merging into iceberg table fails with copy on write when values clause has a 
> function call
> --
>
> Key: HIVE-28282
> URL: https://issues.apache.org/jira/browse/HIVE-28282
> Project: Hive
>  Issue Type: Bug
>  Components: Iceberg integration, Query Planning
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: hive-4.0.1-must, pull-request-available
> Fix For: 4.1.0
>
>
> {code}
> create external table target_ice(a int, b string, c int) stored by iceberg 
> tblproperties ('format-version'='2', 'write.merge.mode'='copy-on-write');
> create table source(a int, b string, c int);
> explain
> merge into target_ice as t using source src ON t.a = src.a
> when matched and t.a > 100 THEN DELETE
> when not matched then insert (a, b) values (src.a, concat(src.b, '-merge new 
> 2'));
> {code}
> {code}
>  org.apache.hadoop.hive.ql.parse.SemanticException: Encountered parse error 
> while parsing rewritten merge/update or delete query
>   at 
> org.apache.hadoop.hive.ql.parse.ParseUtils.parseRewrittenQuery(ParseUtils.java:721)
>   at 
> org.apache.hadoop.hive.ql.parse.rewrite.CopyOnWriteMergeRewriter.rewrite(CopyOnWriteMergeRewriter.java:84)
>   at 
> org.apache.hadoop.hive.ql.parse.rewrite.CopyOnWriteMergeRewriter.rewrite(CopyOnWriteMergeRewriter.java:48)
>   at 
> org.apache.hadoop.hive.ql.parse.RewriteSemanticAnalyzer.rewriteAndAnalyze(RewriteSemanticAnalyzer.java:93)
>   at 
> org.apache.hadoop.hive.ql.parse.MergeSemanticAnalyzer.analyze(MergeSemanticAnalyzer.java:201)
>   at 
> org.apache.hadoop.hive.ql.parse.RewriteSemanticAnalyzer.analyze(RewriteSemanticAnalyzer.java:84)
>   at 
> org.apache.hadoop.hive.ql.parse.RewriteSemanticAnalyzer.analyzeInternal(RewriteSemanticAnalyzer.java:72)
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
>   at 
> org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:180)
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
>   at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:224)
>   at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:107)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:519)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:471)
>   at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:436)
>   at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:430)
>   at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:121)
>   at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:229)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:257)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:201)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:127)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:425)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:356)
>   at 
> org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:732)
>   at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:702)
>   at 
> org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:115)
>   at 
> org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:157)
>   at 
> org.apache.hadoop.hive.cli.TestIcebergLlapLocalCliDriver.testCliDriver(TestIcebergLlapLocalCliDriver.java:60)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeEx

[jira] [Updated] (HIVE-28165) HiveSplitGenerator: send splits through filesystem instead of RPC in case of big payload

2024-09-20 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-28165:
--
Labels: pull-request-available  (was: hive-4.0.1-must 
pull-request-available)

> HiveSplitGenerator: send splits through filesystem instead of RPC in case of 
> big payload
> 
>
> Key: HIVE-28165
> URL: https://issues.apache.org/jira/browse/HIVE-28165
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>
> While investigating Hive Iceberg issues, it turned out that in the presence 
> of delete files the serialized payload can be huge, on the order of 1-4MB per 
> split, which can lead to extreme memory pressure in the Tez AM and gets worse 
> as the number of splits grows.
> Optimizing the payload is always the best option, but it is not 
> straightforward; instead, Hive and Tez should together handle such situations 
> without running into OOMs like the one below:
> {code}
> ERROR : FAILED: Execution Error, return code 2 from 
> org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, 
> vertexId=vertex_1711290808080__4_00, diagnostics=[Vertex 
> vertex_1711290808080__4_00 [Map 1] killed/failed due 
> to:ROOT_INPUT_INIT_FAILURE, Vertex Input: web_sales_1 initializer failed, 
> vertex=vertex_1711290808080__4_00 [Map 1], java.lang.OutOfMemoryError: 
> Java heap space
>   at 
> com.google.protobuf.ByteString$CodedBuilder.(ByteString.java:907)
>   at 
> com.google.protobuf.ByteString$CodedBuilder.(ByteString.java:902)
>   at com.google.protobuf.ByteString.newCodedBuilder(ByteString.java:898)
>   at 
> com.google.protobuf.AbstractMessageLite.toByteString(AbstractMessageLite.java:49)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.createEventList(HiveSplitGenerator.java:378)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:337)
>   at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager.lambda$runInitializer$3(RootInputInitializerManager.java:199)
>   at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$$Lambda$319/0x000840942440.run(Unknown
>  Source)
>   at java.base/java.security.AccessController.doPrivileged(Native Method)
>   at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
>   at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager.runInitializer(RootInputInitializerManager.java:192)
>   at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager.runInitializerAndProcessResult(RootInputInitializerManager.java:173)
>   at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager.lambda$createAndStartInitializing$2(RootInputInitializerManager.java:167)
>   at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$$Lambda$318/0x000840942040.run(Unknown
>  Source)
>   at 
> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
>   at 
> com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
>   at 
> com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69)
>   at 
> com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
>   at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>   at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>   at java.base/java.lang.Thread.run(Thread.java:829)
> {code}
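> A simplified sketch of the proposed idea, under assumed names and thresholds 
> (the real Hive/Tez plumbing differs): spill an oversized payload to the 
> filesystem once and ship only its location over RPC.
> {code:java}
> import java.io.IOException;
> import java.nio.file.Files;
> import java.nio.file.Path;
>
> public class SplitPayloadSketch {
>   static final int RPC_PAYLOAD_LIMIT = 1 << 20; // 1 MB, assumed threshold
>
>   /** Returns the payload inline, or a pointer to a spilled file when too big. */
>   static byte[] toEventPayload(byte[] serializedSplits, Path scratchDir)
>       throws IOException {
>     if (serializedSplits.length <= RPC_PAYLOAD_LIMIT) {
>       return serializedSplits;              // small enough: send over RPC
>     }
>     Path spill = Files.createTempFile(scratchDir, "splits", ".bin");
>     Files.write(spill, serializedSplits);   // write the payload once
>     return ("file://" + spill).getBytes();  // ship only the location
>   }
> }
> {code}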



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28165) HiveSplitGenerator: send splits through filesystem instead of RPC in case of big payload

2024-09-20 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-28165:
--
Labels: hive-4.0.1-must pull-request-available  (was: 
pull-request-available)

> HiveSplitGenerator: send splits through filesystem instead of RPC in case of 
> big payload
> 
>
> Key: HIVE-28165
> URL: https://issues.apache.org/jira/browse/HIVE-28165
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: hive-4.0.1-must, pull-request-available
> Fix For: 4.1.0
>
>
> While investigating Hive Iceberg issues, it turned out that in the presence 
> of delete files the serialized payload can be huge, on the order of 1-4MB per 
> split, which can lead to extreme memory pressure in the Tez AM and gets worse 
> as the number of splits grows.
> Optimizing the payload is always the best option, but it is not 
> straightforward; instead, Hive and Tez should together handle such situations 
> without running into OOMs like the one below:
> {code}
> ERROR : FAILED: Execution Error, return code 2 from 
> org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, 
> vertexId=vertex_1711290808080__4_00, diagnostics=[Vertex 
> vertex_1711290808080__4_00 [Map 1] killed/failed due 
> to:ROOT_INPUT_INIT_FAILURE, Vertex Input: web_sales_1 initializer failed, 
> vertex=vertex_1711290808080__4_00 [Map 1], java.lang.OutOfMemoryError: 
> Java heap space
>   at 
> com.google.protobuf.ByteString$CodedBuilder.(ByteString.java:907)
>   at 
> com.google.protobuf.ByteString$CodedBuilder.(ByteString.java:902)
>   at com.google.protobuf.ByteString.newCodedBuilder(ByteString.java:898)
>   at 
> com.google.protobuf.AbstractMessageLite.toByteString(AbstractMessageLite.java:49)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.createEventList(HiveSplitGenerator.java:378)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:337)
>   at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager.lambda$runInitializer$3(RootInputInitializerManager.java:199)
>   at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$$Lambda$319/0x000840942440.run(Unknown
>  Source)
>   at java.base/java.security.AccessController.doPrivileged(Native Method)
>   at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
>   at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager.runInitializer(RootInputInitializerManager.java:192)
>   at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager.runInitializerAndProcessResult(RootInputInitializerManager.java:173)
>   at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager.lambda$createAndStartInitializing$2(RootInputInitializerManager.java:167)
>   at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$$Lambda$318/0x000840942040.run(Unknown
>  Source)
>   at 
> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
>   at 
> com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
>   at 
> com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69)
>   at 
> com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
>   at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>   at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>   at java.base/java.lang.Thread.run(Thread.java:829)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28282) Merging into iceberg table fails with copy on write when values clause has a function call

2024-09-20 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-28282:
--
Labels: hive-4.0.1-must pull-request-available  (was: 
pull-request-available)

> Merging into iceberg table fails with copy on write when values clause has a 
> function call
> --
>
> Key: HIVE-28282
> URL: https://issues.apache.org/jira/browse/HIVE-28282
> Project: Hive
>  Issue Type: Sub-task
>  Components: Iceberg integration, Query Planning
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: hive-4.0.1-must, pull-request-available
> Fix For: 4.1.0
>
>
> {code}
> create external table target_ice(a int, b string, c int) stored by iceberg 
> tblproperties ('format-version'='2', 'write.merge.mode'='copy-on-write');
> create table source(a int, b string, c int);
> explain
> merge into target_ice as t using source src ON t.a = src.a
> when matched and t.a > 100 THEN DELETE
> when not matched then insert (a, b) values (src.a, concat(src.b, '-merge new 
> 2'));
> {code}
> {code}
>  org.apache.hadoop.hive.ql.parse.SemanticException: Encountered parse error 
> while parsing rewritten merge/update or delete query
>   at 
> org.apache.hadoop.hive.ql.parse.ParseUtils.parseRewrittenQuery(ParseUtils.java:721)
>   at 
> org.apache.hadoop.hive.ql.parse.rewrite.CopyOnWriteMergeRewriter.rewrite(CopyOnWriteMergeRewriter.java:84)
>   at 
> org.apache.hadoop.hive.ql.parse.rewrite.CopyOnWriteMergeRewriter.rewrite(CopyOnWriteMergeRewriter.java:48)
>   at 
> org.apache.hadoop.hive.ql.parse.RewriteSemanticAnalyzer.rewriteAndAnalyze(RewriteSemanticAnalyzer.java:93)
>   at 
> org.apache.hadoop.hive.ql.parse.MergeSemanticAnalyzer.analyze(MergeSemanticAnalyzer.java:201)
>   at 
> org.apache.hadoop.hive.ql.parse.RewriteSemanticAnalyzer.analyze(RewriteSemanticAnalyzer.java:84)
>   at 
> org.apache.hadoop.hive.ql.parse.RewriteSemanticAnalyzer.analyzeInternal(RewriteSemanticAnalyzer.java:72)
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
>   at 
> org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:180)
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
>   at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:224)
>   at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:107)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:519)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:471)
>   at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:436)
>   at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:430)
>   at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:121)
>   at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:229)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:257)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:201)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:127)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:425)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:356)
>   at 
> org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:732)
>   at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:702)
>   at 
> org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:115)
>   at 
> org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:157)
>   at 
> org.apache.hadoop.hive.cli.TestIcebergLlapLocalCliDriver.testCliDriver(TestIcebergLlapLocalCliDriver.java:60)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.Framewor

[jira] [Updated] (HIVE-28310) Disable hive.optimize.join.disjunctive.transitive.predicates.pushdown by default

2024-09-20 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-28310:
---
Labels: hive-4.0.1-merged hive-4.0.1-must pull-request-available  (was: 
hive-4.0.1-must pull-request-available)

> Disable hive.optimize.join.disjunctive.transitive.predicates.pushdown by 
> default
> 
>
> Key: HIVE-28310
> URL: https://issues.apache.org/jira/browse/HIVE-28310
> Project: Hive
>  Issue Type: Task
>  Components: CBO
>Affects Versions: 4.0.0
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: hive-4.0.1-merged, hive-4.0.1-must, 
> pull-request-available
> Fix For: 4.1.0
>
>
> HIVE-25758 introduced 
> hive.optimize.join.disjunctive.transitive.predicates.pushdown to 
> conditionally limit some features of the HiveJoinPushTransitivePredicatesRule 
> which are rather unsafe and can lead to HiveServer2 crashes (OOM, hangs, 
> etc.).
> The property was initially set to true to retain the old behavior and prevent 
> performance changes for the queries that work fine as is. However, when the 
> property is true, there are various known cases/queries that can bring down 
> HS2 completely. When this happens, debugging, finding the root cause, and 
> turning off the property may require lots of effort from developers and users.
> In this ticket, we propose to disable the property by default and thus limit 
> the optimizations performed by the rule (at least until a complete solution 
> is found for the known problematic cases).
> This change favors HS2 stability at the expense of slight performance 
> degradation in certain queries.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28399) Improve the fetch size in HiveConnection

2024-09-20 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-28399:
---
Labels: hive-4.0.1-merged hive-4.0.1-must pull-request-available  (was: 
hive-4.0.1-must pull-request-available)

> Improve the fetch size in HiveConnection
> 
>
> Key: HIVE-28399
> URL: https://issues.apache.org/jira/browse/HIVE-28399
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: hive-4.0.1-merged, hive-4.0.1-must, 
> pull-request-available
> Fix For: 4.1.0
>
>
> If the 4.x Hive JDBC client connects to an older HS2 or another Thrift 
> implementation, it might throw the IllegalStateException: 
> [https://github.com/apache/hive/blob/master/jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java#L1253-L1258],
>  as the remote side might not have set the property 
> hive.server2.thrift.resultset.default.fetch.size in the response to the 
> OpenSession request. It is also confusing what the connection's real fetch 
> size is: we have both initFetchSize and defaultFetchSize in HiveConnection, 
> and HiveStatement checks initFetchSize, defaultFetchSize, and 
> HIVE_SERVER2_THRIFT_RESULTSET_DEFAULT_FETCH_SIZE.defaultIntVal to obtain the 
> real fetch size. We can consolidate them into one value in HiveConnection, so 
> that every statement created from the connection uses this new fetch size.
>  
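> A minimal sketch of the consolidation, with assumed method and constant names 
> (not the actual jdbc driver code): resolve one effective fetch size in the 
> connection so statements no longer re-derive it.
> {code:java}
> public class FetchSizeSketch {
>   static final int CLIENT_DEFAULT_FETCH_SIZE = 1000; // assumed fallback
>
>   static int effectiveFetchSize(Integer initFetchSize, Integer serverDefault) {
>     if (initFetchSize != null && initFetchSize > 0) {
>       return initFetchSize;           // explicit client setting wins
>     }
>     if (serverDefault != null && serverDefault > 0) {
>       return serverDefault;           // value from the OpenSession response
>     }
>     return CLIENT_DEFAULT_FETCH_SIZE; // older servers may not send one
>   }
> }
> {code}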



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28515) Iceberg: Concurrent queries fail during commit with ValidationException

2024-09-20 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-28515:
---
Labels: hive-4.0.1-merged hive-4.0.1-must pull-request-available  (was: 
hive-4.0.1-must pull-request-available)

> Iceberg: Concurrent queries fail during commit with ValidationException
> ---
>
> Key: HIVE-28515
> URL: https://issues.apache.org/jira/browse/HIVE-28515
> Project: Hive
>  Issue Type: Bug
>  Components: Iceberg integration
>Affects Versions: 4.0.0
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: hive-4.0.1-merged, hive-4.0.1-must, 
> pull-request-available
> Fix For: 4.1.0
>
>
> {noformat}
> Caused by: org.apache.iceberg.exceptions.ValidationException: Cannot commit, 
> missing data files: 
> [file:/Users/ayushsaxena/code/hive/iceberg/iceberg-handler/target/tmp/hive7073916777566968859/external/customers/data/0-0-data-ayushsaxena_20240909232021_99fd025f-1e27-4541-ab3e-77c6f9905eb7-job_17259492220180_0001-6-1.parquet]
>         at 
> org.apache.iceberg.MergingSnapshotProducer.validateDataFilesExist(MergingSnapshotProducer.java:751)
>         at org.apache.iceberg.BaseRowDelta.validate(BaseRowDelta.java:116)
>         at 
> org.apache.iceberg.SnapshotProducer.apply(SnapshotProducer.java:233)
>         at 
> org.apache.iceberg.SnapshotProducer.lambda$commit$2(SnapshotProducer.java:384)
>         at 
> org.apache.iceberg.util.Tasks$Builder.runTaskWithRetry(Tasks.java:413)
>         at 
> org.apache.iceberg.util.Tasks$Builder.runSingleThreaded(Tasks.java:219)
>         at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:203)
>         at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:196)
>         at 
> org.apache.iceberg.SnapshotProducer.commit(SnapshotProducer.java:382)
>         at 
> org.apache.iceberg.mr.hive.HiveIcebergOutputCommitter.commitWrite(HiveIcebergOutputCommitter.java:580)
>         at 
> org.apache.iceberg.mr.hive.HiveIcebergOutputCommitter.commitTable(HiveIcebergOutputCommitter.java:494)
>         at 
> org.apache.iceberg.mr.hive.HiveIcebergOutputCommitter.lambda$commitJobs$4(HiveIcebergOutputCommitter.java:291){noformat}
> Queries fail with {{ValidationException}} during commit even with a retry 
> strategy configured for {{write_conflict}}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28327) Missing null-check in TruncDateFromTimestamp

2024-09-20 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-28327:
---
Labels: hive-4.0.1-merged hive-4.0.1-must pull-request-available  (was: 
hive-4.0.1-must pull-request-available)

> Missing null-check in TruncDateFromTimestamp
> 
>
> Key: HIVE-28327
> URL: https://issues.apache.org/jira/browse/HIVE-28327
> Project: Hive
>  Issue Type: Bug
>Reporter: Seonggon Namgung
>Assignee: Seonggon Namgung
>Priority: Major
>  Labels: hive-4.0.1-merged, hive-4.0.1-must, 
> pull-request-available
> Fix For: 4.1.0
>
>
> The vectorized implementation of the UDF trunc() does not perform a null 
> check when VectorizedRowBatch.selectedInUse is true. This causes a 
> NullPointerException when running vector_udf_trunc.q using 
> TestMiniLlapLocalCliDriver.
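> A sketch of the missing pattern, with assumed field and method names (the 
> actual vectorized classes differ): when selectedInUse is true, the null flags 
> must still be consulted for each selected index.
> {code:java}
> public class VectorTruncSketch {
>   static void truncBatch(boolean selectedInUse, int[] selected, int size,
>                          boolean[] isNull, long[] in, long[] out,
>                          boolean[] outIsNull) {
>     if (selectedInUse) {
>       for (int j = 0; j < size; j++) {
>         int i = selected[j];
>         if (isNull[i]) {            // the check the bug skipped
>           outIsNull[i] = true;
>           continue;
>         }
>         out[i] = truncToDay(in[i]);
>         outIsNull[i] = false;
>       }
>     } else {
>       for (int i = 0; i < size; i++) {
>         if (isNull[i]) {
>           outIsNull[i] = true;
>           continue;
>         }
>         out[i] = truncToDay(in[i]);
>         outIsNull[i] = false;
>       }
>     }
>   }
>
>   static long truncToDay(long epochMillis) {
>     return epochMillis - Math.floorMod(epochMillis, 86_400_000L);
>   }
> }
> {code}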



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28347) Make a UDAF 'collect_set' work with complex types, even when map-side aggregation is disabled.

2024-09-20 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-28347:
---
Labels: hive-4.0.1-merged hive-4.0.1-must pull-request-available  (was: 
hive-4.0.1-must pull-request-available)

> Make a UDAF 'collect_set' work with complex types, even when map-side 
> aggregation is disabled.
> --
>
> Key: HIVE-28347
> URL: https://issues.apache.org/jira/browse/HIVE-28347
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.4.0, 3.1.3, 4.0.0
>Reporter: Jeongdae Kim
>Assignee: Jeongdae Kim
>Priority: Minor
>  Labels: hive-4.0.1-merged, hive-4.0.1-must, 
> pull-request-available
> Fix For: 4.1.0
>
>
> collect_set() (and collect_list()) doesn't work with complex types when 
> map-side aggregation is disabled.
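> A hypothetical repro shape over JDBC (table and column names assumed): turn 
> off map-side aggregation, then aggregate a complex-typed column.
> {code:java}
> import java.sql.Connection;
> import java.sql.DriverManager;
> import java.sql.ResultSet;
> import java.sql.Statement;
>
> public class CollectSetRepro {
>   public static void main(String[] args) throws Exception {
>     try (Connection c =
>              DriverManager.getConnection("jdbc:hive2://localhost:10000/default");
>          Statement stmt = c.createStatement()) {
>       stmt.execute("SET hive.map.aggr=false"); // disable map-side aggregation
>       try (ResultSet rs = stmt.executeQuery(
>           "SELECT id, collect_set(named_struct('a', a)) FROM t GROUP BY id")) {
>         while (rs.next()) {
>           System.out.println(rs.getString(2));
>         }
>       }
>     }
>   }
> }
> {code}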



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28484) SharedWorkOptimizer leaves residual unused operator tree that send DPP events to unknown operators

2024-09-20 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-28484:
---
Labels: hive-4.0.1-merged hive-4.0.1-must pull-request-available  (was: 
hive-4.0.1-must pull-request-available)

> SharedWorkOptimizer leaves residual unused operator tree that send DPP events 
> to unknown operators
> --
>
> Key: HIVE-28484
> URL: https://issues.apache.org/jira/browse/HIVE-28484
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, Physical Optimizer
>Reporter: Ramesh Kumar Thangarajan
>Assignee: Ramesh Kumar Thangarajan
>Priority: Major
>  Labels: hive-4.0.1-merged, hive-4.0.1-must, 
> pull-request-available
> Fix For: 4.1.0
>
>
> Please see below the series of events:
>  
> {code:java}
> 2024-08-27 15:59:47,141 DEBUG 
> org.apache.hadoop.hive.ql.optimizer.SharedWorkOptimizer: 
> [51bcc513-a0e8-4b90-9108-bed2005f7f8c HiveServer2-Handler-Pool: Thread-190]: 
> Before SharedWorkOptimizer:
> TS[0]-SEL[2]-MAPJOIN[189]-MAPJOIN[194]-SEL[91]-FIL[92]-SEL[93]-LIM[94]-FS[95]
> TS[3]-FIL[123]-SEL[5]-RS[30]-MAPJOIN[185]-MAPJOIN[188]-SEL[38]-GBY[40]-RS[41]-GBY[42]-SEL[43]-RS[86]-MAPJOIN[189]
> TS[6]-FIL[124]-SEL[8]-MAPJOIN[185]
> TS[9]-FIL[126]-SEL[11]-MAPJOIN[187]-SEL[29]-GBY[34]-RS[36]-MAPJOIN[188]
> TS[12]-FIL[128]-SEL[14]-MAPJOIN[186]-GBY[22]-RS[23]-GBY[24]-SEL[25]-RS[27]-MAPJOIN[187]
>                                                                    
> -SEL[147]-GBY[148]-EVENT[149]
> TS[15]-FIL[129]-SEL[17]-RS[19]-MAPJOIN[186]
>                        -SEL[153]-GBY[154]-EVENT[155]
> TS[44]-FIL[130]-SEL[46]-RS[71]-MAPJOIN[190]-MAPJOIN[193]-SEL[79]-GBY[81]-RS[82]-GBY[83]-RS[89]-MAPJOIN[194]
> TS[47]-FIL[131]-SEL[49]-MAPJOIN[190]
> TS[50]-FIL[133]-SEL[52]-MAPJOIN[192]-SEL[70]-GBY[75]-RS[77]-MAPJOIN[193]
> TS[53]-FIL[135]-SEL[55]-MAPJOIN[191]-GBY[63]-RS[64]-GBY[65]-SEL[66]-RS[68]-MAPJOIN[192]
>                                                                    
> -SEL[171]-GBY[172]-EVENT[173]
> TS[56]-FIL[136]-SEL[58]-RS[60]-MAPJOIN[191]
>                        -SEL[177]-GBY[178]-EVENT[179]2024-08-27 15:59:47,141 
> DEBUG org.apache.hadoop.hive.ql.optimizer.SharedWorkOptimizer: 
> [51bcc513-a0e8-4b90-9108-bed2005f7f8c HiveServer2-Handler-Pool: 
> Thread-190]:DPP information stored in the cache: {TS[9]=[EVENT[149]], 
> TS[12]=[EVENT[155]], TS[53]=[EVENT[179]], TS[50]=[EVENT[173]]}2024-08-27 
> 15:59:47,142 DEBUG org.apache.hadoop.hive.ql.optimizer.SharedWorkOptimizer: 
> [51bcc513-a0e8-4b90-9108-bed2005f7f8c HiveServer2-Handler-Pool: Thread-190]: 
> Merging subtree starting at TS[50] into subtree starting at TS[9]
> 2024-08-27 15:59:47,142 DEBUG 
> org.apache.hadoop.hive.ql.optimizer.SharedWorkOptimizer: 
> [51bcc513-a0e8-4b90-9108-bed2005f7f8c HiveServer2-Handler-Pool: Thread-190]: 
> Input operator removed: MAPJOIN[191]
> 2024-08-27 15:59:47,142 DEBUG 
> org.apache.hadoop.hive.ql.optimizer.SharedWorkOptimizer: 
> [51bcc513-a0e8-4b90-9108-bed2005f7f8c HiveServer2-Handler-Pool: Thread-190]: 
> Input operator removed: RS[68]
> 2024-08-27 15:59:47,142 DEBUG 
> org.apache.hadoop.hive.ql.optimizer.SharedWorkOptimizer: 
> [51bcc513-a0e8-4b90-9108-bed2005f7f8c HiveServer2-Handler-Pool: Thread-190]: 
> Input operator removed: GBY[65]
> 2024-08-27 15:59:47,142 DEBUG 
> org.apache.hadoop.hive.ql.optimizer.SharedWorkOptimizer: 
> [51bcc513-a0e8-4b90-9108-bed2005f7f8c HiveServer2-Handler-Pool: Thread-190]: 
> Input operator removed: RS[64]
> 2024-08-27 15:59:47,142 DEBUG 
> org.apache.hadoop.hive.ql.optimizer.SharedWorkOptimizer: 
> [51bcc513-a0e8-4b90-9108-bed2005f7f8c HiveServer2-Handler-Pool: Thread-190]: 
> Input operator removed: SEL[66]
> 2024-08-27 15:59:47,142 DEBUG 
> org.apache.hadoop.hive.ql.optimizer.SharedWorkOptimizer: 
> [51bcc513-a0e8-4b90-9108-bed2005f7f8c HiveServer2-Handler-Pool: Thread-190]: 
> Input operator removed: SEL[55]
> 2024-08-27 15:59:47,142 DEBUG 
> org.apache.hadoop.hive.ql.optimizer.SharedWorkOptimizer: 
> [51bcc513-a0e8-4b90-9108-bed2005f7f8c HiveServer2-Handler-Pool: Thread-190]: 
> Input operator removed: GBY[63]
> 2024-08-27 15:59:47,142 DEBUG 
> org.apache.hadoop.hive.ql.optimizer.SharedWorkOptimizer: 
> [51bcc513-a0e8-4b90-9108-bed2005f7f8c HiveServer2-Handler-Pool: Thread-190]: 
> Input operator removed: RS[60]
> 2024-08-27 15:59:47,142 DEBUG 
> org.apache.hadoop.hive.ql.optimizer.SharedWorkOptimizer: 
> [51bcc513-a0e8-4b90-9108-bed2005f7f8c HiveServer2-Handler-Pool: Thread-190]: 
> Input operator removed

[jira] [Updated] (HIVE-28349) SHOW TABLES with invalid connector, giving 0 results, instead of failing

2024-09-20 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-28349:
---
Labels: hive-4.0.1-merged hive-4.0.1-must pull-request-available  (was: 
hive-4.0.1-must pull-request-available)

> SHOW TABLES with invalid connector, giving 0 results, instead of failing
> 
>
> Key: HIVE-28349
> URL: https://issues.apache.org/jira/browse/HIVE-28349
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive, Standalone Metastore
>Reporter: Sai Hemanth Gantasala
>Assignee: Sai Hemanth Gantasala
>Priority: Major
>  Labels: hive-4.0.1-merged, hive-4.0.1-must, 
> pull-request-available
> Fix For: 4.1.0
>
>
> SHOW TABLES with an invalid connector gives 0 results instead of failing.
> Steps to repro:
> {code:java}
> drop connector postgres_connector;
> create connector postgres_connector type 'postgres' url 
> 'jdbc:postgresql://1.1.1.1:31462' with DCPROPERTIES 
> ("hive.sql.dbcp.username"="root", "hive.sql.dbcp.password"="cloudera");
> drop database pg_hive_testing;
> create remote database pg_hive_testing using postgres_connector with 
> DBPROPERTIES ("connector.remoteDbName"="postgres");
> show tables in pg_hive_testing; {code}
> The last query gives 0 rows (not a failure).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28167) Full table deletion fails when converting to truncate for Iceberg and ACID tables

2024-09-20 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-28167:
---
Labels: hive-4.0.1-merged hive-4.0.1-must pull-request-available  (was: 
hive-4.0.1-must pull-request-available)

> Full table deletion fails when converting to truncate for Iceberg and ACID 
> tables
> -
>
> Key: HIVE-28167
> URL: https://issues.apache.org/jira/browse/HIVE-28167
> Project: Hive
>  Issue Type: Bug
>Reporter: Sourabh Badhya
>Assignee: Sourabh Badhya
>Priority: Major
>  Labels: hive-4.0.1-merged, hive-4.0.1-must, 
> pull-request-available
> Fix For: 4.1.0
>
>
> A simple repro - 
> {code:java}
> create table ice01 (id int, key int) stored by iceberg stored as orc 
> tblproperties ('format-version'='2', 'write.delete.mode'='copy-on-write');
> insert into ice01 values (1,1),(2,1),(3,1),(4,1);
> insert into ice01 values (1,2),(2,2),(3,2),(4,2);
> insert into ice01 values (1,3),(2,3),(3,3),(4,3);
> insert into ice01 values (1,4),(2,4),(3,4),(4,4);
> insert into ice01 values (1,5),(2,5),(3,5),(4,5);
> explain analyze delete from ice01;
> delete from ice01;
> select count(*) from ice01;
> select * from ice01;
> describe formatted ice01; {code}
> The solution is to convert full table deletion to a truncate operation on the 
> table.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28456) ObjectStore updatePartitionColumnStatisticsInBatch can cause connection starvation

2024-09-20 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-28456:
---
Labels: hive-4.0.1-merged hive-4.0.1-must pull-request-available  (was: 
hive-4.0.1-must pull-request-available)

> ObjectStore updatePartitionColumnStatisticsInBatch can cause connection 
> starvation 
> ---
>
> Key: HIVE-28456
> URL: https://issues.apache.org/jira/browse/HIVE-28456
> Project: Hive
>  Issue Type: Bug
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: hive-4.0.1-merged, hive-4.0.1-must, 
> pull-request-available
> Fix For: 4.1.0
>
>
> Since HIVE-26419, we have a secondary connection pool of size 2 for schema 
> generation and value generation operations. However, based on the DataNucleus 
> documentation on datanucleus.ConnectionFactory2, link:
> [https://www.datanucleus.org/products/accessplatform_5_0/jdo/persistence.html]
> the secondary pool also serves nontransactional connections, which makes 
> ObjectStore updatePartitionColumnStatisticsInBatch request its connection 
> from this pool, as it doesn't open a transaction explicitly. If inserting or 
> updating the column statistics is slow, the pool quickly becomes unavailable 
> (it reaches its maximum size), and the ObjectStore can then see "Connection 
> is not available, request timed out" under such a situation.
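> A toy illustration of the starvation (plain Java, not the actual pool): with 
> only 2 connections and slow statistics writes, later requests time out while 
> waiting.
> {code:java}
> import java.util.concurrent.ExecutorService;
> import java.util.concurrent.Executors;
> import java.util.concurrent.Semaphore;
> import java.util.concurrent.TimeUnit;
>
> public class PoolStarvationSketch {
>   public static void main(String[] args) throws Exception {
>     Semaphore pool = new Semaphore(2);                 // secondary pool size = 2
>     ExecutorService writers = Executors.newFixedThreadPool(8);
>     for (int i = 0; i < 8; i++) {
>       writers.submit(() -> {
>         try {
>           if (!pool.tryAcquire(1, TimeUnit.SECONDS)) { // request timeout
>             System.out.println("Connection is not available, request timed out");
>             return;
>           }
>           try {
>             Thread.sleep(5_000);                       // slow stats insert/update
>           } finally {
>             pool.release();
>           }
>         } catch (InterruptedException e) {
>           Thread.currentThread().interrupt();
>         }
>       });
>     }
>     writers.shutdown();
>     writers.awaitTermination(1, TimeUnit.MINUTES);
>   }
> }
> {code}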



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28249) Parquet legacy timezone conversion converts march 1st to 29th feb and fails with not a leap year exception

2024-09-20 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-28249:
---
Labels: hive-4.0.1-merged hive-4.0.1-must pull-request-available  (was: 
hive-4.0.1-must pull-request-available)

> Parquet legacy timezone conversion converts march 1st to 29th feb and fails 
> with not a leap year exception
> --
>
> Key: HIVE-28249
> URL: https://issues.apache.org/jira/browse/HIVE-28249
> Project: Hive
>  Issue Type: Task
>Reporter: Simhadri Govindappa
>Assignee: Simhadri Govindappa
>Priority: Major
>  Labels: hive-4.0.1-merged, hive-4.0.1-must, 
> pull-request-available
> Fix For: 4.1.0
>
>
> When handling legacy timestamp conversions in Parquet, 'February 29' of year 
> '200' is an edge case.
> This is because, according to [https://www.lanl.gov/Caesar/node202.html], the 
> Julian day for 200 CE/02/29 in the Julian calendar differs from the Julian 
> day in the Gregorian calendar:
> ||Date (BC/AD)||Date (CE)||Julian Day (Julian Calendar)||Julian Day (Gregorian Calendar)||
> |200 AD/02/28|200 CE/02/28|1794166|1794167|
> |200 AD/02/29|200 CE/02/29|1794167|1794168|
> |200 AD/03/01|200 CE/03/01|1794168|1794168|
> |300 AD/02/28|300 CE/02/28|1830691|1830691|
> |300 AD/02/29|300 CE/02/29|1830692|1830692|
> |300 AD/03/01|300 CE/03/01|1830693|1830692|
>  
>  * Because of this:
> {noformat}
> int julianDay = nt.getJulianDay(); {noformat}
> returns Julian day 1794167 
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/timestamp/NanoTimeUtils.java#L92]
>  * Later :
> {noformat}
> Timestamp result = Timestamp.valueOf(formatter.format(date)); {noformat}
> _{{formatter.format(date)}}_ returns 29-02-200, as it seems to use the Julian 
> calendar, but _{{Timestamp.valueOf(29-02-200)}}_ seems to use the Gregorian 
> calendar and fails with a "not a leap year" exception for 29th Feb 200:
> [https://github.com/apache/hive/blob/master/common/src/java/org/apache/hadoop/hive/common/type/TimestampTZUtil.java#L196]
> Since Hive stores timestamps in UTC, when converting 200 CE/03/01 between 
> timezones Hive runs into this exception and fails with "not a leap year" for 
> 29th Feb 200, even if the actual record inserted was 200 CE/03/01 in the 
> Asia/Singapore timezone.
>  
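> The mismatch is easy to demonstrate with the JDK alone: a 
> java.util.GregorianCalendar forced into pure Julian mode accepts 200-02-29, 
> while java.time's proleptic Gregorian calendar rejects it (a minimal sketch, 
> unrelated to the Parquet code path itself).
> {code:java}
> import java.time.DateTimeException;
> import java.time.LocalDate;
> import java.util.Calendar;
> import java.util.Date;
> import java.util.GregorianCalendar;
>
> public class LeapYearMismatch {
>   public static void main(String[] args) {
>     GregorianCalendar julian = new GregorianCalendar();
>     julian.setGregorianChange(new Date(Long.MAX_VALUE)); // Julian everywhere
>     julian.clear();
>     julian.set(200, Calendar.FEBRUARY, 29);              // valid Julian date
>     System.out.println(julian.getTime());                // resolves fine
>
>     try {
>       LocalDate.of(200, 2, 29);                          // proleptic Gregorian
>     } catch (DateTimeException e) {
>       System.out.println(e.getMessage()); // ... '200' is not a leap year
>     }
>   }
> }
> {code}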
> Full stack trace:
> {noformat}
> java.lang.RuntimeException: java.io.IOException: 
> org.apache.parquet.io.ParquetDecodingException: Can not read value at 0 in 
> block -1 in file 
> file:/Users/simhadri.govindappa/Documents/apache/hive/itests/qtest/target/localfs/warehouse/test_sgt/sgt000
>     at 
> org.apache.hadoop.hive.ql.exec.FetchTask.executeInner(FetchTask.java:210)
>     at org.apache.hadoop.hive.ql.exec.FetchTask.execute(FetchTask.java:95)
>     at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:212)
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:154)
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:149)
>     at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:185)
>     at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:230)
>     at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:257)
>     at org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:201)
>     at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:127)
>     at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:425)
>     at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:356)
>     at 
> org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:732)
>     at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:702)
>     at 
> org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:116)
>     at 
> org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:157)
>     at 
> org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver(TestMiniLlapLocalCliDriver.java:62)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMet

[jira] [Updated] (HIVE-28326) Enabling hive.stageid.rearrange causes NullPointerException

2024-09-20 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-28326:
---
Labels: hive-4.0.1-merged hive-4.0.1-must pull-request-available  (was: 
hive-4.0.1-must pull-request-available)

> Enabling hive.stageid.rearrange causes NullPointerException
> ---
>
> Key: HIVE-28326
> URL: https://issues.apache.org/jira/browse/HIVE-28326
> Project: Hive
>  Issue Type: Bug
>Reporter: Seonggon Namgung
>Assignee: Seonggon Namgung
>Priority: Major
>  Labels: hive-4.0.1-merged, hive-4.0.1-must, 
> pull-request-available
> Fix For: 4.1.0
>
>
> Setting hive.stageid.rearrange to anything other than 'none' causes a 
> NullPointerException.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-23964) SemanticException in query 30 while generating logical plan

2024-09-20 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-23964:
---
Labels: hive-4.0.1-merged hive-4.0.1-must pull-request-available  (was: 
hive-4.0.1-must pull-request-available)

> SemanticException in query 30 while generating logical plan
> ---
>
> Key: HIVE-23964
> URL: https://issues.apache.org/jira/browse/HIVE-23964
> Project: Hive
>  Issue Type: Bug
>Reporter: Stamatis Zampetakis
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: hive-4.0.1-merged, hive-4.0.1-must, 
> pull-request-available
> Fix For: 4.1.0
>
> Attachments: cbo_query30_stacktrace.txt
>
>
> Invalid table alias or column reference 'c_last_review_date' is thrown when 
> running TPC-DS query 30 (cbo_query30.q, query30.q) on the metastore with the 
> partitioned TPC-DS 30TB dataset. 
> The respective stacktrace is attached to this case.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28087) Iceberg: Timestamp partition columns with transforms are not correctly sorted during insert

2024-09-20 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-28087:
---
Labels: hive-4.0.1-merged hive-4.0.1-must pull-request-available  (was: 
hive-4.0.1-must pull-request-available)

> Iceberg: Timestamp partition columns with transforms are not correctly sorted 
> during insert
> ---
>
> Key: HIVE-28087
> URL: https://issues.apache.org/jira/browse/HIVE-28087
> Project: Hive
>  Issue Type: Task
>Reporter: Simhadri Govindappa
>Assignee: Sourabh Badhya
>Priority: Major
>  Labels: hive-4.0.1-merged, hive-4.0.1-must, 
> pull-request-available
> Fix For: 4.1.0
>
> Attachments: query-hive-377.csv
>
>
> Insert into a partitioned table fails with the following error if the data 
> is not clustered.
> *Using the cluster by clause, it succeeds:* 
> {noformat}
> 0: jdbc:hive2://localhost:10001/> insert into table partition_transform_4 
> select t, ts from t1 cluster by ts;
> --
> VERTICES  MODESTATUS  TOTAL  COMPLETED  RUNNING  PENDING  
> FAILED  KILLED
> --
> Map 1 .. container SUCCEEDED  1  100  
>  0   0
> Reducer 2 .. container SUCCEEDED  1  100  
>  0   0
> --
> VERTICES: 02/02  [==>>] 100%  ELAPSED TIME: 9.47 s
> --
> INFO  : Starting task [Stage-2:DEPENDENCY_COLLECTION] in serial mode
> INFO  : Starting task [Stage-0:MOVE] in serial mode
> INFO  : Completed executing 
> command(queryId=root_20240222123244_0c448b32-4fd9-420d-be31-e39e2972af82); 
> Time taken: 10.534 seconds
> 100 rows affected (10.696 seconds){noformat}
>  
> *Without the cluster by clause, it fails:* 
> {noformat}
> 0: jdbc:hive2://localhost:10001/> insert into table partition_transform_4 
> select t, ts from t1;
> --
> VERTICES  MODESTATUS  TOTAL  COMPLETED  RUNNING  PENDING  
> FAILED  KILLED
> --
> Map 1 .. container SUCCEEDED  1  100  
>  0   0
> Reducer 2container   RUNNING  1  010  
>  2   0
> --
> VERTICES: 01/02  [=>>-] 50%   ELAPSED TIME: 9.53 s
> --
> Caused by: java.lang.IllegalStateException: Incoming records violate the 
> writer assumption that records are clustered by spec and by partition within 
> each spec. Either cluster the incoming records or switch to fanout writers.
> Encountered records that belong to already closed files:
> partition 'ts_month=2027-03' in spec [
>   1000: ts_month: month(2)
> ]
>   at org.apache.iceberg.io.ClusteredWriter.write(ClusteredWriter.java:96)
>   at 
> org.apache.iceberg.io.ClusteredDataWriter.write(ClusteredDataWriter.java:31)
>   at 
> org.apache.iceberg.mr.hive.writer.HiveIcebergRecordWriter.write(HiveIcebergRecordWriter.java:53)
>   at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:1181)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorFileSinkOperator.process(VectorFileSinkOperator.java:111)
>   at 
> org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:919)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:158)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.processVectorGroup(ReduceRecordSource.java:502)
>   ... 20 more{noformat}
>  
>  
> A simple repro, using the attached csv file: 
> [^query-hive-377.csv]
> {noformat}
> create database t3;
> use t3;
> create table vector1k(
>         t int,
>         si int,
>         i int,
>         b bigint,
>         f float,
>         d double,
>         dc decimal(38,18),

[jira] [Updated] (HIVE-28042) DigestMD5 token expired or does not exist error while opening a new connection to HMS

2024-09-20 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-28042:
---
Labels: hive-4.0.1-merged hive-4.0.1-must pull-request-available  (was: 
hive-4.0.1-must pull-request-available)

> DigestMD5 token expired or does not exist error while opening a new 
> connection to HMS
> -
>
> Key: HIVE-28042
> URL: https://issues.apache.org/jira/browse/HIVE-28042
> Project: Hive
>  Issue Type: Bug
>Reporter: Vikram Ahuja
>Assignee: Vikram Ahuja
>Priority: Major
>  Labels: hive-4.0.1-merged, hive-4.0.1-must, 
> pull-request-available
> Fix For: 4.1.0
>
> Attachments: HIVE-28042 - DigestMD5 error during opening connection 
> to HMS.pdf
>
>
> Hello,
> In our deployment, we are facing the following exception in the HMS logs 
> when an HMS connection is opened from HS2 in cases where a session has been 
> open for a long time, leading to query failures:
> {code:java}
> 2024-01-24T02:11:21,324 ERROR [TThreadPoolServer WorkerProcess-760394]: 
> transport.TSaslTransport (TSaslTransport.java:open) - SASL negotiation 
> failurejavax.security.sasl.SaslException: DIGEST-MD5: IO error acquiring 
> password    
> at 
> com.sun.security.sasl.digest.DigestMD5Server.validateClientResponse(DigestMD5Server.java)
>     
> at 
> com.sun.security.sasl.digest.DigestMD5Server.evaluateResponse(DigestMD5Server.java)
>     
> at 
> org.apache.thrift.transport.TSaslTransport$SaslParticipant.evaluateChallengeOrResponse(TSaslTransport.java)
>     at 
> org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java)    
> at 
> org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java)
>     
> at 
> org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java)
>     
> at 
> org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingTransportFactory$1.run(HadoopThriftAuthBridge.java)
>     
> at 
> org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingTransportFactory$1.run(HadoopThriftAuthBridge.java)
>     
> at java.security.AccessController.doPrivileged(Native Method)    
> at javax.security.auth.Subject.doAs(Subject.javA)    
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java)
>     
> at 
> org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingTransportFactory.getTransport(HadoopThriftAuthBridge.java)
>     
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java)
>     
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java) 
>    
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java)   
>  
> at java.lang.Thread.run(Thread.java)Caused by: 
> org.apache.hadoop.security.token.SecretManager$InvalidToken: token expired or 
> does not exist: HIVE_DELEGATION_TOKEN owner=***, renewer=***, 
> realUser=*, issueDate=1705973286139, maxDate=1706578086139, 
> sequenceNumber=3294063, masterKeyId=7601    
> at 
> org.apache.hadoop.hive.metastore.security.TokenStoreDelegationTokenSecretManager.retrievePassword(TokenStoreDelegationTokenSecretManager.java)
>     
> at 
> org.apache.hadoop.hive.metastore.security.TokenStoreDelegationTokenSecretManager.retrievePassword(TokenStoreDelegationTokenSecretManager.java)
>     
> at 
> org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$SaslDigestCallbackHandler.getPassword(HadoopThriftAuthBridge.java)
>     
> at 
> org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$SaslDigestCallbackHandler.handle(HadoopThriftAuthBridge.java)
>     
> at 
> com.sun.security.sasl.digest.DigestMD5Server.validateClientResponse(DigestMD5Server.java)
>     ... 15 more {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28360) Upgrade jersey to version 1.19.4,

2024-09-20 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-28360:
---
Labels: hive-4.0.1-merged hive-4.0.1-must pull-request-available  (was: 
hive-4.0.1-must pull-request-available)

> Upgrade jersey to version 1.19.4,
> -
>
> Key: HIVE-28360
> URL: https://issues.apache.org/jira/browse/HIVE-28360
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.3
>Reporter: lvyankui
>Assignee: lvyankui
>Priority: Major
>  Labels: hive-4.0.1-merged, hive-4.0.1-must, 
> pull-request-available
> Fix For: 4.1.0, 4.0.1
>
> Attachments: HIVE-28360.patch
>
>
> Hive version: 3.1.3
> Hadoop version: 3.3.6
> After upgrading to Hadoop 3.3.6, the Hive WebHCat server fails to start 
> because of inconsistent versions of the Jersey JAR package. Hive HCat lacks 
> the jersey-server-1.19 jar.
>  
> After upgrading to Hadoop 3.3.5+, Hadoop updates Jersey to version 1.19.4, 
> which is inconsistent with the Jersey version in the Hive WebHCat server; as 
> a result, the startup fails. To resolve this manually, one can download the 
> package and place it in /usr/lib/hive-hcatalog/share/webhcat/svr/lib/.
> Therefore, when packaging Hive, we need to pin the version of Jersey in the 
> Hive POM file to match the version of Jersey in Hadoop, to avoid version 
> conflicts.
>  
> Here is the error log 
> INFO  | 18 Jul 2024 14:37:13,237 | org.eclipse.jetty.server.Server | 
> jetty-9.4.53.v20231009; built: 2023-10-09T12:29:09.265Z; git: 
> 27bde00a0b95a1d5bbee0eae7984f891d2d0f8c9; jvm 1.8.0_412-b08
> WARN  | 18 Jul 2024 14:37:13,326 | 
> org.eclipse.jetty.server.handler.ContextHandler.ROOT | unavailable
> com.sun.jersey.api.container.ContainerException: No WebApplication provider 
> is present
>         at 
> com.sun.jersey.spi.container.WebApplicationFactory.createWebApplication(WebApplicationFactory.java:69)
>  ~[jersey-server-1.19.4.jar:1.19.4]
>         at 
> com.sun.jersey.spi.container.servlet.ServletContainer.create(ServletContainer.java:412)
>  ~[jersey-servlet-1.19.jar:1.19]
>         at 
> com.sun.jersey.spi.container.servlet.ServletContainer$InternalWebComponent.create(ServletContainer.java:327)
>  ~[jersey-servlet-1.19.jar:1.19]
>         at 
> com.sun.jersey.spi.container.servlet.WebComponent.load(WebComponent.java:603) 
> ~[jersey-servlet-1.19.jar:1.19]
>         at 
> com.sun.jersey.spi.container.servlet.WebComponent.init(WebComponent.java:207) 
> ~[jersey-servlet-1.19.jar:1.19]
>         at 
> com.sun.jersey.spi.container.servlet.ServletContainer.init(ServletContainer.java:394)
>  ~[jersey-servlet-1.19.jar:1.19]
>         at 
> com.sun.jersey.spi.container.servlet.ServletContainer.init(ServletContainer.java:577)
>  ~[jersey-servlet-1.19.jar:1.19]
>         at javax.servlet.GenericServlet.init(GenericServlet.java:244) 
> ~[javax.servlet-api-3.1.0.jar:3.1.0]
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28428) Map hash aggregation performance degradation

2024-09-20 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-28428:
---
Labels: hive-4.0.1-merged hive-4.0.1-must performance 
pull-request-available  (was: hive-4.0.1-must performance 
pull-request-available)

>  Map hash aggregation performance degradation
> -
>
> Key: HIVE-28428
> URL: https://issues.apache.org/jira/browse/HIVE-28428
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive, Operators, Query Processor
>Reporter: Ryu Kobayashi
>Assignee: Ryu Kobayashi
>Priority: Major
>  Labels: hive-4.0.1-merged, hive-4.0.1-must, performance, 
> pull-request-available
> Fix For: 4.1.0
>
> Attachments: 2024-08-02 14.35.46.png, 
> image-2024-08-02-14-37-01-824.png, image-2024-08-02-14-38-45-459.png
>
>
> The following ticket enabled map hash aggregation, but performance is worse 
> than when it is disabled:
> https://issues.apache.org/jira/browse/HIVE-23356
> I found a few reasons for this. If there are a large number of keys, the 
> following log is output in large volume, affecting performance. This can 
> also cause an OOM.
> {code:java}
> 2024-08-02 05:21:53,675 [INFO] [TezChild] |exec.GroupByOperator|: Hash Tbl 
> flush: #hash table = 171000
> 2024-08-02 05:21:53,713 [INFO] [TezChild] |exec.GroupByOperator|: Hash Table 
> flushed: new size = 153900
> {code}
> By fixing this, we can improve performance as follows.
> Before:
> !image-2024-08-02-14-37-01-824.png!
> After:
> !2024-08-02 14.35.46.png!
> Also, the flush size is currently fixed; performance can be further improved 
> by adapting it to the data:
> !image-2024-08-02-14-38-45-459.png!
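> A minimal sketch of one mitigation, assuming slf4j on the classpath (class 
> and method names assumed): demote the per-flush message to DEBUG and sample 
> it, so heavy flushing no longer pays the logging cost.
> {code:java}
> import java.util.concurrent.atomic.AtomicLong;
> import org.slf4j.Logger;
> import org.slf4j.LoggerFactory;
>
> public class FlushLogSketch {
>   private static final Logger LOG = LoggerFactory.getLogger(FlushLogSketch.class);
>   private final AtomicLong flushes = new AtomicLong();
>
>   void onFlush(int hashTableSize, int newSize) {
>     long n = flushes.incrementAndGet();
>     if (LOG.isDebugEnabled() && n % 1000 == 1) { // sample 1 in 1000 flushes
>       LOG.debug("Hash Tbl flush #{}: #hash table = {}, new size = {}",
>           n, hashTableSize, newSize);
>     }
>   }
> }
> {code}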



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28264) OOM/slow compilation when query contains SELECT clauses with nested expressions

2024-09-20 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-28264:
---
Labels: hive-4.0.1-merged hive-4.0.1-must pull-request-available  (was: 
hive-4.0.1-must pull-request-available)

> OOM/slow compilation when query contains SELECT clauses with nested 
> expressions
> ---
>
> Key: HIVE-28264
> URL: https://issues.apache.org/jira/browse/HIVE-28264
> Project: Hive
>  Issue Type: Bug
>  Components: CBO, HiveServer2
>Affects Versions: 4.0.0
>Reporter: Stamatis Zampetakis
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: hive-4.0.1-merged, hive-4.0.1-must, 
> pull-request-available
> Fix For: 4.1.0
>
>
> {code:sql}
> CREATE TABLE t0 (`title` string);
> SELECT x10 from
> (SELECT concat_ws('L10',x9, x9, x9, x9) as x10 from
> (SELECT concat_ws('L9',x8, x8, x8, x8) as x9 from
> (SELECT concat_ws('L8',x7, x7, x7, x7) as x8 from
> (SELECT concat_ws('L7',x6, x6, x6, x6) as x7 from
> (SELECT concat_ws('L6',x5, x5, x5, x5) as x6 from
> (SELECT concat_ws('L5',x4, x4, x4, x4) as x5 from
> (SELECT concat_ws('L4',x3, x3, x3, x3) as x4 from
> (SELECT concat_ws('L3',x2, x2, x2, x2) as x3 
> from
> (SELECT concat_ws('L2',x1, x1, x1, x1) as 
> x2 from
> (SELECT concat_ws('L1',x0, x0, x0, 
> x0) as x1 from
> (SELECT concat_ws('L0',title, 
> title, title, title) as x0 from t0) t1) t2) t3) t4) t5) t6) t7) t8) t9) t10) t
> WHERE x10 = 'Something';
> {code}
> The query above fails with OOM when run with the TestMiniLlapLocalCliDriver 
> and the default max heap size configuration effective for tests (-Xmx2048m).
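> The growth is easy to quantify: every level inlines the previous expression 
> four times, so the textual digest that Calcite builds grows roughly as 
> 4^depth. A back-of-the-envelope computation:
> {code:java}
> public class DigestGrowth {
>   public static void main(String[] args) {
>     long copies = 4;                  // x0 = concat_ws of 4 title references
>     for (int level = 0; level <= 10; level++) {
>       System.out.printf("x%d: ~%d copies of `title` in the digest%n",
>           level, copies);
>       copies *= 4;                    // each level inlines the previous 4 times
>     }
>     // x10 ends up with 4^11 = 4,194,304 copies of the base column.
>   }
> }
> {code}
> The OOM stack trace: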
> {noformat}
> java.lang.OutOfMemoryError: Java heap space
>   at java.util.Arrays.copyOf(Arrays.java:3332)
>   at 
> java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:124)
>   at 
> java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:448)
>   at java.lang.StringBuilder.append(StringBuilder.java:136)
>   at org.apache.calcite.rex.RexCall.computeDigest(RexCall.java:152)
>   at org.apache.calcite.rex.RexCall.toString(RexCall.java:165)
>   at org.apache.calcite.rex.RexCall.appendOperands(RexCall.java:105)
>   at org.apache.calcite.rex.RexCall.computeDigest(RexCall.java:151)
>   at org.apache.calcite.rex.RexCall.toString(RexCall.java:165)
>   at java.lang.String.valueOf(String.java:2994)
>   at java.lang.StringBuilder.append(StringBuilder.java:131)
>   at 
> org.apache.calcite.rel.externalize.RelWriterImpl.explain_(RelWriterImpl.java:90)
>   at 
> org.apache.calcite.rel.externalize.RelWriterImpl.done(RelWriterImpl.java:144)
>   at 
> org.apache.calcite.rel.AbstractRelNode.explain(AbstractRelNode.java:246)
>   at 
> org.apache.calcite.rel.externalize.RelWriterImpl.explainInputs(RelWriterImpl.java:122)
>   at 
> org.apache.calcite.rel.externalize.RelWriterImpl.explain_(RelWriterImpl.java:116)
>   at 
> org.apache.calcite.rel.externalize.RelWriterImpl.done(RelWriterImpl.java:144)
>   at 
> org.apache.calcite.rel.AbstractRelNode.explain(AbstractRelNode.java:246)
>   at org.apache.calcite.plan.RelOptUtil.toString(RelOptUtil.java:2308)
>   at org.apache.calcite.plan.RelOptUtil.toString(RelOptUtil.java:2292)
>   at 
> org.apache.hadoop.hive.ql.optimizer.calcite.RuleEventLogger.ruleProductionSucceeded(RuleEventLogger.java:73)
>   at 
> org.apache.calcite.plan.MulticastRelOptListener.ruleProductionSucceeded(MulticastRelOptListener.java:68)
>   at 
> org.apache.calcite.plan.AbstractRelOptPlanner.notifyTransformation(AbstractRelOptPlanner.java:370)
>   at 
> org.apache.calcite.plan.hep.HepPlanner.applyTransformationResults(HepPlanner.java:702)
>   at org.apache.calcite.plan.hep.HepPlanner.applyRule(HepPlanner.java:545)
>   at 
> org.apache.calcite.plan.hep.HepPlanner.applyRules(HepPlanner.java:407)
>   at 
> org.apache.calcite.plan.hep.HepPlanner.executeInstruction(HepPlanner.java:271)
>   at 
> org.apache.calcite.plan.hep.HepInstruction$RuleCollection.execute(HepInstruction.java:74)
>   at 
> org.apache.calcite.pl

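As a back-of-the-envelope illustration of why the digest computation above 
runs out of memory: each concat_ws level embeds four copies of its child 
expression, so the string built by RexCall.computeDigest grows roughly 
fourfold per nesting level. The arithmetic below is a sketch, not Calcite 
code.

{code:java}
public class DigestGrowth {
  public static void main(String[] args) {
    long chars = "concat_ws('L0', title, title, title, title)".length();
    for (int level = 1; level <= 10; level++) {
      chars = 4 * chars + 24;  // four child digests plus surrounding text
      System.out.printf("level %2d: digest ~ %,d chars%n", level, chars);
    }
    // By level 10 the digest reaches tens of millions of characters, which
    // lines up with the OOM inside StringBuilder.append in the stack trace.
  }
}
{code}
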
[jira] [Updated] (HIVE-28434) Upgrade to tez 0.10.4

2024-09-20 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-28434:
---
Labels: hive-4.0.1-merged hive-4.0.1-must pull-request-available  (was: 
hive-4.0.1-must pull-request-available)

> Upgrade to tez 0.10.4
> -
>
> Key: HIVE-28434
> URL: https://issues.apache.org/jira/browse/HIVE-28434
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: hive-4.0.1-merged, hive-4.0.1-must, 
> pull-request-available
> Fix For: 4.1.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28480) Disable SMB on partition hash generator mismatch across join branches in previous RS

2024-09-20 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-28480:
---
Labels: hive-4.0.1-merged hive-4.0.1-must pull-request-available  (was: 
hive-4.0.1-must pull-request-available)

> Disable SMB on partition hash generator mismatch across join branches in 
> previous RS
> 
>
> Key: HIVE-28480
> URL: https://issues.apache.org/jira/browse/HIVE-28480
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Reporter: Himanshu Mishra
>Assignee: Himanshu Mishra
>Priority: Critical
>  Labels: hive-4.0.1-merged, hive-4.0.1-must, 
> pull-request-available
> Fix For: 4.1.0, 4.0.1
>
>
> As SMB replaces the last RS op of each joining branch and the JOIN op with 
> MERGEJOIN, we need to ensure that the RS ops preceding them, in both 
> branches, partition using the same hash generator.
> The hash code generator differs based on ReducerTraits.UNIFORM, i.e. 
> [ReduceSinkOperator#computeMurmurHash() or 
> ReduceSinkOperator#computeHashCode()|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java#L340-L344],
>  leading to different hash codes for the same value.
> We should skip the SMB join in such cases.
> h3. Replication:
> Consider the following query, where the join would get converted to SMB. Auto 
> reducer parallelism is enabled, which ensures more than 1 reducer task.
>  
> {code:java}
> CREATE TABLE t_asj_18 (k STRING, v INT);
> INSERT INTO t_asj_18 values ('a', 10), ('a', 10);
> set hive.auto.convert.join=false;
> set hive.tez.auto.reducer.parallelism=true;
> EXPLAIN SELECT * FROM (
> SELECT k, COUNT(DISTINCT v), SUM(v)
> FROM t_asj_18 GROUP BY k
> ) a LEFT JOIN (
> SELECT k, COUNT(v)
> FROM t_asj_18 GROUP BY k
> ) b ON a.k = b.k; {code}
>  
>  
> Expected result is:
>  
> {code:java}
> a   1   20  a   2 {code}
> but on master branch, it results in
>  
>  
> {code:java}
> a   1   20  NULLNULL {code}
>  
>  
> Here for COUNT(DISTINCT), the RS key is k, v while the partition key is still 
> k. In such a scenario the [reducer trait UNIFORM is not 
> set|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/SetReducerParallelism.java#L99-L104].
>  The hash code for "a" from the 2nd subquery is generated using murmurHash 
> (270516725) while the 1st is generated using bucketHash (1086686554), 
> resulting in rows with key "a" reaching different reducer tasks.
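
A toy sketch of the mismatch (the two hash functions below are stand-ins, not 
Hive's actual computeMurmurHash()/computeHashCode() implementations): when the 
two branches route rows with different functions, the same key can land on 
different reducer tasks, and the downstream merge join never sees the matching 
pair.

{code:java}
import java.util.function.ToIntFunction;

public class HashMismatch {
  // Stand-ins for the two generators; the real Hive implementations differ.
  static final ToIntFunction<String> BUCKET_HASH = String::hashCode;
  static final ToIntFunction<String> MURMUR_HASH = s -> {
    int h = 0x9747b28c;
    for (char c : s.toCharArray()) {
      h = (h ^ c) * 0x5bd1e995;
    }
    return h;
  };

  public static void main(String[] args) {
    int reducers = 2;
    String key = "a";
    int routeA = Math.floorMod(BUCKET_HASH.applyAsInt(key), reducers);
    int routeB = Math.floorMod(MURMUR_HASH.applyAsInt(key), reducers);
    // Whenever routeA != routeB, the two join branches deliver the same
    // key to different reducer tasks, and the join emits NULLs instead
    // of the matching row.
    System.out.printf("bucket route=%d, murmur route=%d%n", routeA, routeB);
  }
}
{code}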



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27357) Map-side SMB Join returns incorrect result when its 2 tables have different bucket sizes

2024-09-20 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-27357:
---
Labels: hive-4.0.1-merged hive-4.0.1-must pull-request-available  (was: 
hive-4.0.1-must pull-request-available)

> Map-side SMB Join returns incorrect result when its 2 tables have different 
> bucket sizes
> --
>
> Key: HIVE-27357
> URL: https://issues.apache.org/jira/browse/HIVE-27357
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.3, 4.0.0-alpha-2
>Reporter: Seonggon Namgung
>Assignee: Seonggon Namgung
>Priority: Major
>  Labels: hive-4.0.1-merged, hive-4.0.1-must, 
> pull-request-available
> Fix For: 4.1.0
>
>
> The following query returns {(1, 1), (2, 2), (7, 7)} instead of {(1, 1), 
> (2, 2), (7, 7), (6, 6), (14, 14), (11, 11)}.
>  
>  
> {code:java}
> set hive.strict.checks.bucketing=true;
> set hive.auto.convert.join=true;
> set hive.auto.convert.sortmerge.join=true;
> set hive.optimize.bucketmapjoin = true;
> set hive.optimize.bucketmapjoin.sortedmerge = true;
> set hive.auto.convert.join.noconditionaltask.size=1;
> set hive.optimize.dynamic.partition.hashjoin=false;
> DROP TABLE IF EXISTS bucket2;
> CREATE TABLE bucket2(key string, value string) CLUSTERED BY (key) SORTED BY 
> (key) INTO 2 BUCKETS;
> DROP TABLE IF EXISTS bucket3;
> CREATE TABLE bucket3(key string, value string) CLUSTERED BY (key) SORTED BY 
> (key) INTO 3 BUCKETS;
> INSERT INTO TABLE bucket2 VALUES (1, 1), (2, 2), (7, 7), (6, 6), (14, 14), 
> (11, 11);
> INSERT INTO TABLE bucket3 VALUES (1, 1), (2, 2), (7, 7), (6, 6), (14, 14), 
> (11, 11);
> SELECT * FROM bucket2 JOIN bucket3 on bucket2.key = bucket3.key; {code}
>  
>  
> It is known that sort-merge join is used when two tables have the same number 
> of buckets, but I could not find such a restriction in the source code. Also, 
> current Hive uses map-side SMB join for the above query, which joins a 
> 2-bucket table with a 3-bucket table. So I'm planning to fix this issue 
> without switching to another join algorithm.
> Originally, we found this issue by running auto_sortmerge_join_12.q with 
> hive.strict.checks.bucketing=true.
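
A minimal sketch of why pairing bucket files one-to-one goes wrong when the 
bucket counts differ (toy modulo bucketing for illustration; Hive actually 
hashes the key first):

{code:java}
public class BucketMismatch {
  public static void main(String[] args) {
    int[] keys = {1, 2, 7, 6, 14, 11};
    // Toy bucketing: bucket = key % numBuckets. With 2 vs 3 buckets the
    // same key generally lands in differently numbered bucket files.
    for (int k : keys) {
      System.out.printf("key=%d -> bucket2 file %d, bucket3 file %d%n",
          k, k % 2, k % 3);
    }
    // A map-side SMB join that merges bucket file i of one table with
    // bucket file i of the other only matches keys whose bucket indexes
    // happen to coincide, so part of the expected output silently vanishes.
  }
}
{code}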



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28266) Iceberg: select count(*) from data_files metadata tables gives wrong result

2024-09-20 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-28266:
---
Labels: hive-4.0.1-merged hive-4.0.1-must pull-request-available  (was: 
hive-4.0.1-must pull-request-available)

> Iceberg: select count(*) from data_files metadata tables gives wrong result
> ---
>
> Key: HIVE-28266
> URL: https://issues.apache.org/jira/browse/HIVE-28266
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Dmitriy Fingerman
>Assignee: Dmitriy Fingerman
>Priority: Major
>  Labels: hive-4.0.1-merged, hive-4.0.1-must, 
> pull-request-available
> Fix For: 4.1.0
>
>
> In Hive Iceberg, every table has a corresponding metadata table 
> "*.data_files" that contains info about the files holding the table's data.
> select count(*) from a data_files metadata table returns the number of rows in 
> the data table instead of the number of data files from the metadata table.
>  
> {code:java}
> CREATE TABLE x (name VARCHAR(50), age TINYINT, num_clicks BIGINT) stored by 
> iceberg stored as orc TBLPROPERTIES 
> ('external.table.purge'='true','format-version'='2');
> insert into x values 
> ('amy', 35, 123412344),
> ('adxfvy', 36, 123412534),
> ('amsdfyy', 37, 123417234),
> ('asafmy', 38, 123412534);
> insert into x values 
> ('amerqwy', 39, 123441234),
> ('amyxzcv', 40, 123341234),
> ('erweramy', 45, 122341234);
> Select * from default.x.data_files;
> -- Returns 2 records in the output
> Select count(*) from default.x.data_files;
> -- Returns 7 instead of 2
> {code}
>  
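
One plausible explanation, stated as an assumption rather than the confirmed 
root cause: the count(*) is short-circuited from the basic stats of the 
backing data table (7 rows) instead of being computed over the metadata table 
(2 files). A minimal sketch of that shape:

{code:java}
public class StatsShortCircuit {
  record TableStats(long rowCount, boolean accurate) {}

  // Hypothetical shortcut: answer count(*) from stats when they are marked
  // accurate, otherwise fall back to actually scanning the input.
  static long answerCount(TableStats stats, long scannedRows) {
    return stats.accurate() ? stats.rowCount() : scannedRows;
  }

  public static void main(String[] args) {
    // The stats belong to the data table (7 rows); a real scan of
    // data_files would have produced 2.
    System.out.println(answerCount(new TableStats(7, true), 2));  // prints 7
  }
}
{code}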



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28487) Outdated MetastoreSchemaTool class reference in schemaTool.sh

2024-09-20 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-28487:
---
Labels: hive-4.0.1-merged hive-4.0.1-must pull-request-available  (was: 
hive-4.0.1-must pull-request-available)

> Outdated MetastoreSchemaTool class reference in schemaTool.sh
> -
>
> Key: HIVE-28487
> URL: https://issues.apache.org/jira/browse/HIVE-28487
> Project: Hive
>  Issue Type: Bug
>  Components: Standalone Metastore
>Affects Versions: 4.0.0
>Reporter: Sebastian Bernauer
>Assignee: Sebastian Bernauer
>Priority: Minor
>  Labels: hive-4.0.1-merged, hive-4.0.1-must, 
> pull-request-available
> Fix For: 4.1.0
>
>
> In HIVE-21298 {{MetastoreSchemaTool}} was moved from 
> {{org.apache.hadoop.hive.metastore.tools.MetastoreSchemaTool}} to 
> {{{}org.apache.hadoop.hive.metastore.tools.schematool.MetastoreSchemaTool{}}},
>  but it seems like {{schemaTool.sh}} was not updated.
>  
> This results in the following error being raised when invoking the shell 
> script:
> {code:java}
> /stackable/apache-hive-metastore-4.0.0-bin $ bin/base --service schemaTool
> Exception in thread "main" java.lang.ClassNotFoundException: 
> org.apache.hadoop.hive.metastore.tools.MetastoreSchemaTool
> at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:476)
> at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:594)
> at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:527)
> at java.base/java.lang.Class.forName0(Native Method)
> at java.base/java.lang.Class.forName(Class.java:398)
> at org.apache.hadoop.util.RunJar.run(RunJar.java:321)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:241){code}
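
The mismatch is easy to demonstrate with a plain class lookup; assuming the 
standalone metastore jars are on the classpath, the old pre-HIVE-21298 name 
fails while the relocated name resolves:

{code:java}
public class SchemaToolLookup {
  public static void main(String[] args) {
    String[] candidates = {
        "org.apache.hadoop.hive.metastore.tools.MetastoreSchemaTool",           // old
        "org.apache.hadoop.hive.metastore.tools.schematool.MetastoreSchemaTool" // new
    };
    for (String cls : candidates) {
      try {
        Class.forName(cls);
        System.out.println("found:   " + cls);
      } catch (ClassNotFoundException e) {
        System.out.println("missing: " + cls);
      }
    }
  }
}
{code}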



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28431) Fix RexLiteral to ExprNode conversion if the literal is an empty string

2024-09-20 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-28431:
---
Labels: hive-4.0.1-merged hive-4.0.1-must pull-request-available  (was: 
hive-4.0.1-must pull-request-available)

> Fix RexLiteral to ExprNode conversion if the literal is an empty string
> ---
>
> Key: HIVE-28431
> URL: https://issues.apache.org/jira/browse/HIVE-28431
> Project: Hive
>  Issue Type: Bug
>Reporter: Ramesh Kumar Thangarajan
>Assignee: Ramesh Kumar Thangarajan
>Priority: Major
>  Labels: hive-4.0.1-merged, hive-4.0.1-must, 
> pull-request-available
> Fix For: 4.1.0
>
>
> Currently, conversion from RexLiteral to ExprNode fails if the literal is an 
> empty string. This was introduced by 
> https://issues.apache.org/jira/browse/HIVE-23892 and causes CBO to fail.
> The RexLiteral node will not be null, but the value within the RexLiteral can 
> still be empty.
>  
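
A minimal sketch of the guard problem (hypothetical conversion code, not the 
actual Hive implementation): a null check alone lets the non-null RexLiteral 
through, and the empty value must then be treated as a legal string literal 
rather than an error.

{code:java}
public class EmptyLiteral {
  // Hypothetical converter: only a null literal is invalid; an empty
  // string is a perfectly legal value and must remain convertible.
  static String convert(String literalValue) {
    if (literalValue == null) {
      throw new IllegalStateException("null literal");
    }
    return "'" + literalValue + "'";
  }

  public static void main(String[] args) {
    System.out.println(convert(""));  // prints '' instead of failing
  }
}
{code}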



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28482) Iceberg: CTAS, CTLT query failure while fetching URI for authorization

2024-09-20 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-28482:
---
Labels: hive-4.0.1-merged hive-4.0.1-must pull-request-available  (was: 
hive-4.0.1-must pull-request-available)

> Iceberg: CTAS, CTLT query failure while fetching URI for authorization
> --
>
> Key: HIVE-28482
> URL: https://issues.apache.org/jira/browse/HIVE-28482
> Project: Hive
>  Issue Type: Bug
>Reporter: Sourabh Badhya
>Assignee: Sourabh Badhya
>Priority: Major
>  Labels: hive-4.0.1-merged, hive-4.0.1-must, 
> pull-request-available
> Fix For: 4.1.0
>
>
> When we perform CTAS query with the following configs set to true - 
> {code:java}
> set hive.security.authorization.enabled=true;
> set hive.security.authorization.tables.on.storagehandlers=true;
> create table ctas_source stored by iceberg stored as orc as select * from 
> src;{code}
> The following error trace is seen - 
> {code:java}
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Exception 
> occurred while getting the URI from storage handler: null
>         at 
> org.apache.hadoop.hive.ql.security.authorization.command.CommandAuthorizerV2.addHivePrivObject(CommandAuthorizerV2.java:213)
>         at 
> org.apache.hadoop.hive.ql.security.authorization.command.CommandAuthorizerV2.getHivePrivObjects(CommandAuthorizerV2.java:152)
>         at 
> org.apache.hadoop.hive.ql.security.authorization.command.CommandAuthorizerV2.doAuthorization(CommandAuthorizerV2.java:77)
>         at 
> org.apache.hadoop.hive.ql.security.authorization.command.CommandAuthorizer.doAuthorization(CommandAuthorizer.java:58)
> {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28315) Missing classes while using hive jdbc standalone jar

2024-09-20 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-28315:
---
Labels: hive-4.0.1-merged hive-4.0.1-must pull-request-available  (was: 
hive-4.0.1-must pull-request-available)

> Missing classes while using hive jdbc standalone jar
> 
>
> Key: HIVE-28315
> URL: https://issues.apache.org/jira/browse/HIVE-28315
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: hive-4.0.1-merged, hive-4.0.1-must, 
> pull-request-available
> Fix For: 4.1.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28436) Incorrect syntax in Hive schema file for table MIN_HISTORY_LEVEL

2024-09-20 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-28436:
---
Labels: hive-4.0.1-merged hive-4.0.1-must pull-request-available  (was: 
hive-4.0.1-must pull-request-available)

> Incorrect syntax in Hive schema file for table MIN_HISTORY_LEVEL
> 
>
> Key: HIVE-28436
> URL: https://issues.apache.org/jira/browse/HIVE-28436
> Project: Hive
>  Issue Type: Bug
>Reporter: Indhumathi Muthumurugesh
>Assignee: Indhumathi Muthumurugesh
>Priority: Major
>  Labels: hive-4.0.1-merged, hive-4.0.1-must, 
> pull-request-available
> Fix For: 4.1.0
>
>
> Steps to Reproduce:
> Install the latest Hive 4.1.0 version.
> Run the query below against the SYS DB:
> select * from sys.MIN_HISTORY_LEVEL;
> Exception:
> ERROR : Failed with exception java.io.IOException:java.io.IOException: 
> org.apache.hive.storage.jdbc.exception.HiveJdbcDatabaseAccessException: 
> Caught exception while trying to execute query: You have an error in your SQL 
> syntax; check the manual that corresponds to your MySQL server version for 
> the right syntax to use near 'FROM "MIN_HISTORY_LEVEL"' at line 4
> java.io.IOException: java.io.IOException: 
> org.apache.hive.storage.jdbc.exception.HiveJdbcDatabaseAccessException: 
> Caught exception while trying to execute query: You have an error in your SQL 
> syntax; check the manual that corresponds to your MySQL server version for 
> the right syntax to use near 'FROM "MIN_HISTORY_LEVEL"' at line 4
>  at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:628)
>  at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:535)
>  at org.apache.hadoop.hive.ql.exec.FetchTask.executeInner(FetchTask.java:194)
>  at org.apache.hadoop.hive.ql.exec.FetchTask.execute(FetchTask.java:95)
>  at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:201)
>  at org.apache.hadoop.hive.ql.Driver.run(Driver.java:142)
>  at org.apache.hadoop.hive.ql.Driver.run(Driver.java:137)
>  at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:185)
>  at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:236)
>  at 
> org.apache.hive.service.cli.operation.SQLOperation.access$500(SQLOperation.java:90)
>  at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:336)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1953)
>  at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:356)
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:750)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28350) Drop remote database succeeds but fails while deleting data under

2024-09-20 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-28350:
---
Labels: hive-4.0.1-merged hive-4.0.1-must pull-request-available  (was: 
hive-4.0.1-must pull-request-available)

> Drop remote database succeeds but fails while deleting data under
> -
>
> Key: HIVE-28350
> URL: https://issues.apache.org/jira/browse/HIVE-28350
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive, Standalone Metastore
>Reporter: Sai Hemanth Gantasala
>Assignee: Sai Hemanth Gantasala
>Priority: Major
>  Labels: hive-4.0.1-merged, hive-4.0.1-must, 
> pull-request-available
> Fix For: 4.1.0
>
>
> The drop remote database operation succeeds but fails towards the end while 
> clearing data under the database's location, because the database object 
> fetched via JDO does not have its 'locationUri' field set.
> {code:java}
> > drop database pg_hive_tests;
> INFO  : Compiling 
> command(queryId=hive_20240625161645_bbe11908-8d1c-46d7-9a02-1ef2091e1b86): 
> drop database pg_hive_tests
> INFO  : Semantic Analysis Completed (retrial = false)
> INFO  : Created Hive schema: Schema(fieldSchemas:null, properties:null)
> INFO  : Completed compiling 
> command(queryId=hive_20240625161645_bbe11908-8d1c-46d7-9a02-1ef2091e1b86); 
> Time taken: 0.115 seconds
> INFO  : Executing 
> command(queryId=hive_20240625161645_bbe11908-8d1c-46d7-9a02-1ef2091e1b86): 
> drop database pg_hive_tests
> INFO  : Starting task [Stage-0:DDL] in serial mode
> ERROR : Failed
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> MetaException(message:java.lang.IllegalArgumentException: Can not create a 
> Path from a null string)
>     at org.apache.hadoop.hive.ql.metadata.Hive.dropDatabase(Hive.java:716) 
> ~[hive-exec-3.1.3000.7.2.18.0-641.jar:3.1.3000.7.2.18.0-641]
>     at 
> org.apache.hadoop.hive.ql.ddl.database.drop.DropDatabaseOperation.execute(DropDatabaseOperation.java:51)
>  ~[hive-exec-3.1.3000.7.2.18.0-641.jar:3.1.3000.7.2.18.0-641]
>     at org.apache.hadoop.hive.ql.ddl.DDLTask.execute(DDLTask.java:84) 
> ~[hive-exec-3.1.3000.7.2.18.0-641.jar:3.1.3000.7.2.18.0-641]
>     at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213) 
> ~[hive-exec-3.1.3000.7.2.18.0-641.jar:3.1.3000.7.2.18.0-641]
>     at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) 
> ~[hive-exec-3.1.3000.7.2.18.0-641.jar:3.1.3000.7.2.18.0-641]
>     at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:356) 
> ~[hive-exec-3.1.3000.7.2.18.0-641.jar:3.1.3000.7.2.18.0-641]
>     at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:329) 
> ~[hive-exec-3.1.3000.7.2.18.0-641.jar:3.1.3000.7.2.18.0-641]
>     at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246) 
> ~[hive-exec-3.1.3000.7.2.18.0-641.jar:3.1.3000.7.2.18.0-641]
>     at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:107) 
> ~[hive-exec-3.1.3000.7.2.18.0-641.jar:3.1.3000.7.2.18.0-641]
>     at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:813) 
> ~[hive-exec-3.1.3000.7.2.18.0-641.jar:3.1.3000.7.2.18.0-641]
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:550) 
> ~[hive-exec-3.1.3000.7.2.18.0-641.jar:3.1.3000.7.2.18.0-641]
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:544) 
> ~[hive-exec-3.1.3000.7.2.18.0-641.jar:3.1.3000.7.2.18.0-641]
>     at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:190) 
> ~[hive-exec-3.1.3000.7.2.18.0-641.jar:3.1.3000.7.2.18.0-641]
>     at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:235)
>  ~[hive-service-3.1.3000.7.2.18.0-641.jar:3.1.3000.7.2.18.0-641]
>     at 
> org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:92)
>  ~[hive-service-3.1.3000.7.2.18.0-641.jar:3.1.3000.7.2.18.0-641]
>     at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:340)
>  ~[hive-service-3.1.3000.7.2.18.0-641.jar:3.1.3000.7.2.18.0-641]
>     at java.security.AccessController.doPrivileged(Native Method) 
> ~[?:1.8.0_232]
>     at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_232]
>     at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
>  ~[hadoop-common-3.1.1.7.2.18.0-641.jar:?]
>     at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:360)
>  ~[hive-service-3.1.3000.7.2.18.0-641.jar:3.1.3000.7.2.

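The failure is reproducible in isolation; assuming the JDO fetch leaves 
locationUri null for a remote database, constructing a Path from it throws 
exactly the exception in the log above:

{code:java}
import org.apache.hadoop.fs.Path;

public class NullLocation {
  public static void main(String[] args) {
    String locationUri = null;  // not populated by the JDO fetch (assumption)
    // Throws IllegalArgumentException: Can not create a Path from a null string
    Path location = new Path(locationUri);
    System.out.println(location);
  }
}
{code}
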
[jira] [Updated] (HIVE-28207) NullPointerException is thrown when checking column uniqueness

2024-09-20 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-28207:
---
Labels: hive-4.0.1-merged hive-4.0.1-must pull-request-available  (was: 
hive-4.0.1-must pull-request-available)

> NullPointerException is thrown when checking column uniqueness
> --
>
> Key: HIVE-28207
> URL: https://issues.apache.org/jira/browse/HIVE-28207
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 4.0.0
>Reporter: Shohei Okumiya
>Assignee: Shohei Okumiya
>Priority: Major
>  Labels: hive-4.0.1-merged, hive-4.0.1-must, 
> pull-request-available
> Fix For: 4.1.0
>
>
> In some cases, we skip checking null. For example, the last statement in the 
> following set of queries fails with NPE.
> {code:java}
> CREATE TABLE `store_sales` (`ss_item_sk` bigint);
> CREATE TABLE `household_demographics` (`hd_demo_sk` bigint);
> CREATE TABLE `item` (`i_item_sk` bigint);
> ALTER TABLE `store_sales` ADD CONSTRAINT `pk_ss` PRIMARY KEY (`ss_item_sk`) 
> DISABLE NOVALIDATE RELY;
> ALTER TABLE `item` ADD CONSTRAINT `pk_i` PRIMARY KEY (`i_item_sk`) DISABLE 
> NOVALIDATE RELY;
> ALTER TABLE `store_sales` ADD CONSTRAINT `ss_i` FOREIGN KEY (`ss_item_sk`) 
> REFERENCES `item`(`i_item_sk`) DISABLE NOVALIDATE RELY;
> EXPLAIN
> SELECT i_item_sk
> FROM store_sales, household_demographics, item
> WHERE ss_item_sk = i_item_sk{code}
> The NPE happens with HiveJoinConstraintsRule in the above case.
> {code:java}
> org.apache.hive.service.cli.HiveSQLException: Error while compiling 
> statement: FAILED: NullPointerException null
>      at 
> org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:376)
>  ~[hive-service-4.0.0.jar:4.0.0]
>      at 
> org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:214)
>  ~[hive-service-4.0.0.jar:4.0.0]
>      at 
> org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:270)
>  ~[hive-service-4.0.0.jar:4.0.0]
>      at 
> org.apache.hive.service.cli.operation.Operation.run(Operation.java:286) 
> ~[hive-service-4.0.0.jar:4.0.0]
>      at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:557)
>  ~[hive-service-4.0.0.jar:4.0.0]
>      at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:542)
>  ~[hive-service-4.0.0.jar:4.0.0]
>      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> ~[?:1.8.0_275]
>      at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> ~[?:1.8.0_275]
>      at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[?:1.8.0_275]
>      at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_275]
>      at 
> org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78)
>  ~[hive-service-4.0.0.jar:4.0.0]
>      at 
> org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36)
>  ~[hive-service-4.0.0.jar:4.0.0]
>      at 
> org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63)
>  ~[hive-service-4.0.0.jar:4.0.0]
>      at java.security.AccessController.doPrivileged(Native Method) 
> ~[?:1.8.0_275]
>      at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_275]
>      at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
>  ~[hadoop-common-3.3.6.jar:?]
>      at 
> org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59)
>  ~[hive-service-4.0.0.jar:4.0.0]
>      at com.sun.proxy.$Proxy42.executeStatementAsync(Unknown Source) ~[?:?]
>      at 
> org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:316)
>  ~[hive-service-4.0.0.jar:4.0.0]
>      at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:652)
>  ~[hive-service-4.0.0.jar:4.0.0]
>      at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1670)
>  ~[hive-exec-4.0.0.jar:4.0.0]
>      at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1650)
>  ~[hive-exec-4.0.0.jar:4.0.0]
>      at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38) 
> ~[hive-exec-4.0.0.jar:4.0.0]
>      at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.jav

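A plausible shape of the bug, offered as an assumption: Calcite's 
RelMetadataQuery#areColumnsUnique returns a nullable Boolean (null meaning 
"unknown"), and unboxing it directly throws the NPE, whereas a null-safe 
comparison avoids it. A toy stand-in:

{code:java}
public class UniquenessCheck {
  // Stand-in for RelMetadataQuery#areColumnsUnique, which can return null
  // when uniqueness cannot be determined.
  static Boolean areColumnsUnique() {
    return null;
  }

  public static void main(String[] args) {
    Boolean unique = areColumnsUnique();
    boolean safe = Boolean.TRUE.equals(unique);  // null-safe: null acts as false
    System.out.println("unique: " + safe);
    if (unique) {  // unboxing a null Boolean throws NullPointerException
      System.out.println("never reached");
    }
  }
}
{code}
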
[jira] [Updated] (HIVE-28372) No need to update partitions stats when renaming table

2024-09-19 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-28372:
---
Labels: pull-request-available  (was: hive-4.0.1-must 
pull-request-available)

> No need to update partitions stats when renaming table
> --
>
> Key: HIVE-28372
> URL: https://issues.apache.org/jira/browse/HIVE-28372
> Project: Hive
>  Issue Type: Improvement
>Reporter: Butao Zhang
>Assignee: Butao Zhang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>
> After HIVE-27725, we no longer need to update partition stats when renaming a 
> table, as the table name and db name are not part of the partition stats.
> This change can speed up renaming a partitioned table when many partition 
> stats are stored in HMS.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-28360) Upgrade jersey to version 1.19.4,

2024-09-19 Thread Zhihua Deng (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17883142#comment-17883142
 ] 

Zhihua Deng commented on HIVE-28360:


Fix has been pushed to master. Thank you for the contribution, [~lvyankui]!

> Upgrade jersey to version 1.19.4,
> -
>
> Key: HIVE-28360
> URL: https://issues.apache.org/jira/browse/HIVE-28360
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.3
>Reporter: lvyankui
>Assignee: lvyankui
>Priority: Major
>  Labels: hive-4.0.1-must, pull-request-available
> Fix For: 4.1.0, 4.0.1
>
> Attachments: HIVE-28360.patch
>
>
> Hive version: 3.1.3
> Hadoop version: 3.3.6
> After upgrading to Hadoop 3.3.6, the Hive WebHCat server fails to start 
> because of inconsistent versions of the Jersey JAR package: Hive HCat lacks 
> the jersey-server-1.19 jar.
>  
> Since Hadoop 3.3.5+, Hadoop ships jersey version 1.19.4, which is 
> inconsistent with the jersey version in the Hive WebHCat server; as a result, 
> the startup fails. To work around this, one must manually download the jar 
> and place it in /usr/lib/hive-hcatalog/share/webhcat/svr/lib/.
> Therefore, when packaging Hive, we need to pin the version of Jersey in the 
> Hive POM file to match the version of Jersey in Hadoop to avoid version 
> conflicts.
>  
> Here is the error log 
> INFO  | 18 Jul 2024 14:37:13,237 | org.eclipse.jetty.server.Server | 
> jetty-9.4.53.v20231009; built: 2023-10-09T12:29:09.265Z; git: 
> 27bde00a0b95a1d5bbee0eae7984f891d2d0f8c9; jvm 1.8.0_412-b08
> WARN  | 18 Jul 2024 14:37:13,326 | 
> org.eclipse.jetty.server.handler.ContextHandler.ROOT | unavailable
> com.sun.jersey.api.container.ContainerException: No WebApplication provider 
> is present
>         at 
> com.sun.jersey.spi.container.WebApplicationFactory.createWebApplication(WebApplicationFactory.java:69)
>  ~[jersey-server-1.19.4.jar:1.19.4]
>         at 
> com.sun.jersey.spi.container.servlet.ServletContainer.create(ServletContainer.java:412)
>  ~[jersey-servlet-1.19.jar:1.19]
>         at 
> com.sun.jersey.spi.container.servlet.ServletContainer$InternalWebComponent.create(ServletContainer.java:327)
>  ~[jersey-servlet-1.19.jar:1.19]
>         at 
> com.sun.jersey.spi.container.servlet.WebComponent.load(WebComponent.java:603) 
> ~[jersey-servlet-1.19.jar:1.19]
>         at 
> com.sun.jersey.spi.container.servlet.WebComponent.init(WebComponent.java:207) 
> ~[jersey-servlet-1.19.jar:1.19]
>         at 
> com.sun.jersey.spi.container.servlet.ServletContainer.init(ServletContainer.java:394)
>  ~[jersey-servlet-1.19.jar:1.19]
>         at 
> com.sun.jersey.spi.container.servlet.ServletContainer.init(ServletContainer.java:577)
>  ~[jersey-servlet-1.19.jar:1.19]
>         at javax.servlet.GenericServlet.init(GenericServlet.java:244) 
> ~[javax.servlet-api-3.1.0.jar:3.1.0]
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-28360) Upgrade jersey to version 1.19.4,

2024-09-19 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng resolved HIVE-28360.

Fix Version/s: 4.1.0
   4.0.1
   Resolution: Fixed

> Upgrade jersey to version 1.19.4,
> -
>
> Key: HIVE-28360
> URL: https://issues.apache.org/jira/browse/HIVE-28360
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.3
>Reporter: lvyankui
>Assignee: lvyankui
>Priority: Major
>  Labels: hive-4.0.1-must, pull-request-available
> Fix For: 4.1.0, 4.0.1
>
> Attachments: HIVE-28360.patch
>
>
> Hive version: 3.1.3
> Hadoop version: 3.3.6
> After upgrading to Hadoop 3.3.6, the Hive WebHCat server fails to start 
> because of inconsistent versions of the Jersey JAR package: Hive HCat lacks 
> the jersey-server-1.19 jar.
>  
> Since Hadoop 3.3.5+, Hadoop ships jersey version 1.19.4, which is 
> inconsistent with the jersey version in the Hive WebHCat server; as a result, 
> the startup fails. To work around this, one must manually download the jar 
> and place it in /usr/lib/hive-hcatalog/share/webhcat/svr/lib/.
> Therefore, when packaging Hive, we need to pin the version of Jersey in the 
> Hive POM file to match the version of Jersey in Hadoop to avoid version 
> conflicts.
>  
> Here is the error log 
> INFO  | 18 Jul 2024 14:37:13,237 | org.eclipse.jetty.server.Server | 
> jetty-9.4.53.v20231009; built: 2023-10-09T12:29:09.265Z; git: 
> 27bde00a0b95a1d5bbee0eae7984f891d2d0f8c9; jvm 1.8.0_412-b08
> WARN  | 18 Jul 2024 14:37:13,326 | 
> org.eclipse.jetty.server.handler.ContextHandler.ROOT | unavailable
> com.sun.jersey.api.container.ContainerException: No WebApplication provider 
> is present
>         at 
> com.sun.jersey.spi.container.WebApplicationFactory.createWebApplication(WebApplicationFactory.java:69)
>  ~[jersey-server-1.19.4.jar:1.19.4]
>         at 
> com.sun.jersey.spi.container.servlet.ServletContainer.create(ServletContainer.java:412)
>  ~[jersey-servlet-1.19.jar:1.19]
>         at 
> com.sun.jersey.spi.container.servlet.ServletContainer$InternalWebComponent.create(ServletContainer.java:327)
>  ~[jersey-servlet-1.19.jar:1.19]
>         at 
> com.sun.jersey.spi.container.servlet.WebComponent.load(WebComponent.java:603) 
> ~[jersey-servlet-1.19.jar:1.19]
>         at 
> com.sun.jersey.spi.container.servlet.WebComponent.init(WebComponent.java:207) 
> ~[jersey-servlet-1.19.jar:1.19]
>         at 
> com.sun.jersey.spi.container.servlet.ServletContainer.init(ServletContainer.java:394)
>  ~[jersey-servlet-1.19.jar:1.19]
>         at 
> com.sun.jersey.spi.container.servlet.ServletContainer.init(ServletContainer.java:577)
>  ~[jersey-servlet-1.19.jar:1.19]
>         at javax.servlet.GenericServlet.init(GenericServlet.java:244) 
> ~[javax.servlet-api-3.1.0.jar:3.1.0]
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28399) Improve the fetch size in HiveConnection

2024-09-19 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-28399:
---
Labels: hive-4.0.1-must pull-request-available  (was: 
pull-request-available)

> Improve the fetch size in HiveConnection
> 
>
> Key: HIVE-28399
> URL: https://issues.apache.org/jira/browse/HIVE-28399
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: hive-4.0.1-must, pull-request-available
> Fix For: 4.1.0
>
>
> If the 4.x Hive JDBC client connects to an older HS2 or another thrift 
> implementation, it might throw an IllegalStateException: 
> [https://github.com/apache/hive/blob/master/jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java#L1253-L1258],
>  as the remote might not have set the property 
> hive.server2.thrift.resultset.default.fetch.size in the response to the 
> OpenSession request. It also makes it confusing what the real fetch size of 
> the connection is: we have both initFetchSize and defaultFetchSize in 
> HiveConnection, and HiveStatement checks initFetchSize, defaultFetchSize and 
> HIVE_SERVER2_THRIFT_RESULTSET_DEFAULT_FETCH_SIZE.defaultIntVal to obtain the 
> real fetch size. We can make them one in HiveConnection, so every statement 
> created from the connection uses this new fetch size.
>  
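
A minimal sketch of one way to fold the three sources into a single 
per-connection value (an assumption about the approach, not the committed 
fix; the parameter names are hypothetical):

{code:java}
public class FetchSizeResolution {
  static int resolveFetchSize(Integer clientRequested, Integer serverReported,
                              int builtInDefault) {
    if (clientRequested != null && clientRequested > 0) {
      return clientRequested;       // explicit client setting wins
    }
    if (serverReported != null && serverReported > 0) {
      return serverReported;        // value HS2 sent back in OpenSession
    }
    return builtInDefault;          // graceful fallback for older endpoints
  }

  public static void main(String[] args) {
    // Older HS2: no server-reported value, no client override.
    System.out.println(resolveFetchSize(null, null, 1000));  // 1000
  }
}
{code}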



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HIVE-28512) CREATE TABLE x LIKE retain whitelisted table properties

2024-09-19 Thread Sai Hemanth Gantasala (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sai Hemanth Gantasala reassigned HIVE-28512:


Assignee: Jintong Jiang  (was: Sai Hemanth Gantasala)

> CREATE TABLE x LIKE retain whitelisted table properties
> ---
>
> Key: HIVE-28512
> URL: https://issues.apache.org/jira/browse/HIVE-28512
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Sai Hemanth Gantasala
>Assignee: Jintong Jiang
>Priority: Major
>  Labels: pull-request-available
>
> It would be good to retain the properties in 
> HiveConf.ConfVars.DDL_CTL_PARAMETERS_WHITELIST for a CTLT query. This is 
> particularly useful for avro-based tables, as the schema can evolve over time 
> and the avro schema is referenced in the avro.schema.url table property.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28512) CREATE TABLE x LIKE retain whitelisted table properties

2024-09-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28512:
--
Labels: pull-request-available  (was: )

> CREATE TABLE x LIKE retain whitelisted table properties
> ---
>
> Key: HIVE-28512
> URL: https://issues.apache.org/jira/browse/HIVE-28512
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Sai Hemanth Gantasala
>Assignee: Sai Hemanth Gantasala
>Priority: Major
>  Labels: pull-request-available
>
> It would be good to retain the properties in 
> HiveConf.ConfVars.DDL_CTL_PARAMETERS_WHITELIST for a CTLT query. This is 
> particularly useful for avro-based tables, as the schema can evolve over time 
> and the avro schema is referenced in the avro.schema.url table property.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-28531) Iceberg metadata table query failing with ClassCastException

2024-09-19 Thread Chiran Ravani (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chiran Ravani resolved HIVE-28531.
--
Fix Version/s: Not Applicable
   Resolution: Duplicate

> Iceberg metadata table query failing with ClassCastException
> 
>
> Key: HIVE-28531
> URL: https://issues.apache.org/jira/browse/HIVE-28531
> Project: Hive
>  Issue Type: Bug
>  Security Level: Public(Viewable by anyone) 
>Reporter: Chiran Ravani
>Priority: Major
> Fix For: Not Applicable
>
>
> When an Iceberg table has a timestamp column, the metadata table query fails 
> with the error below.
> {code:java}
> Error while compiling statement: FAILED: Execution Error, return code 2 from 
> org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, 
> vertexId=vertex_172676552_0001_6_00, diagnostics=[Task failed, 
> taskId=task_172676552_0001_6_00_00, diagnostics=[TaskAttempt 0 
> failed, info=[Error: Error while running task ( failure ) : 
> attempt_172676552_0001_6_00_00_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row
>     at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:351)
>     at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:280)
>     at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
>     at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:86)
>     at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:72)
>     at java.base/java.security.AccessController.doPrivileged(Native Method)
>     at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
>     at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
>     at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:72)
>     at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:42)
>     at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>     at 
> org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:118)
>     at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
>     at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>     at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>     at java.base/java.lang.Thread.run(Thread.java:829) Caused by: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row
>     at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:101)
>     at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:76)
>     at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:437)
>     at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:297)
>  ... 15 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row
>     at 
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:580)
>     at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:92)
>  ... 18 more Caused by: java.lang.ClassCastException: class 
> java.time.OffsetDateTime cannot be cast to class java.time.LocalDateTime 
> (java.time.OffsetDateTime and java.time.LocalDateTime are in module java.base 
> of loader 'bootstrap')
>     at 
> org.apache.iceberg.mr.hive.serde.objectinspector.IcebergTimestampObjectInspectorHive3.getPrimitiveJavaObject(IcebergTimestampObjectInspectorHive3.java:58)
>     at 
> org.apache.iceberg.mr.hive.serde.objectinspector.IcebergTimestampObjectInspectorHive3.getPrimitiveWritableObject(IcebergTimestampObjectInspectorHive3.java:64)
>     at 
> org.apache.hadoop.hive.serde2.lazy.LazyUtils.writePrimitiveUTF8(LazyUtils.java:308)
>     at 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:292)
>     at 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:352)
>     at 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:352)
>     at 
> org.ap

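The core of the failure can be reproduced without Iceberg at all; the cast 
below mirrors what the trace shows inside 
IcebergTimestampObjectInspectorHive3 (the surrounding plumbing is omitted and 
inferred from the report):

{code:java}
import java.time.LocalDateTime;
import java.time.OffsetDateTime;

public class TimestampCast {
  public static void main(String[] args) {
    // The metadata table hands back an OffsetDateTime, while the inspector
    // casts to LocalDateTime -- the exact ClassCastException in the trace.
    Object value = OffsetDateTime.now();
    LocalDateTime ldt = (LocalDateTime) value;  // throws ClassCastException
    System.out.println(ldt);
  }
}
{code}
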
[jira] [Commented] (HIVE-28531) Iceberg metadata table query failing with ClassCastException

2024-09-19 Thread Chiran Ravani (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17883098#comment-17883098
 ] 

Chiran Ravani commented on HIVE-28531:
--

[~ayushtkn]  Thank you, this does seem to be a duplicate of HIVE-28353

> Iceberg metadata table query failing with ClassCastException
> 
>
> Key: HIVE-28531
> URL: https://issues.apache.org/jira/browse/HIVE-28531
> Project: Hive
>  Issue Type: Bug
>  Security Level: Public(Viewable by anyone) 
>Reporter: Chiran Ravani
>Priority: Major
>
> When an Iceberg table has a timestamp column, the metadata table query fails 
> with the error below.
> {code:java}
> Error while compiling statement: FAILED: Execution Error, return code 2 from 
> org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, 
> vertexId=vertex_172676552_0001_6_00, diagnostics=[Task failed, 
> taskId=task_172676552_0001_6_00_00, diagnostics=[TaskAttempt 0 
> failed, info=[Error: Error while running task ( failure ) : 
> attempt_172676552_0001_6_00_00_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row
>     at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:351)
>     at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:280)
>     at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
>     at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:86)
>     at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:72)
>     at java.base/java.security.AccessController.doPrivileged(Native Method)
>     at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
>     at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
>     at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:72)
>     at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:42)
>     at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>     at 
> org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:118)
>     at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
>     at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>     at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>     at java.base/java.lang.Thread.run(Thread.java:829) Caused by: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row
>     at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:101)
>     at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:76)
>     at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:437)
>     at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:297)
>  ... 15 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row
>     at 
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:580)
>     at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:92)
>  ... 18 more Caused by: java.lang.ClassCastException: class 
> java.time.OffsetDateTime cannot be cast to class java.time.LocalDateTime 
> (java.time.OffsetDateTime and java.time.LocalDateTime are in module java.base 
> of loader 'bootstrap')
>     at 
> org.apache.iceberg.mr.hive.serde.objectinspector.IcebergTimestampObjectInspectorHive3.getPrimitiveJavaObject(IcebergTimestampObjectInspectorHive3.java:58)
>     at 
> org.apache.iceberg.mr.hive.serde.objectinspector.IcebergTimestampObjectInspectorHive3.getPrimitiveWritableObject(IcebergTimestampObjectInspectorHive3.java:64)
>     at 
> org.apache.hadoop.hive.serde2.lazy.LazyUtils.writePrimitiveUTF8(LazyUtils.java:308)
>     at 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:292)
>     at 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:352)
>     at 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:352)
>     at 
> org.apac

[jira] [Updated] (HIVE-28531) Iceberg metadata table query failing with ClassCastException

2024-09-19 Thread Chiran Ravani (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chiran Ravani updated HIVE-28531:
-
Description: 
When an Iceberg table has a timestamp column, the metadata table query fails 
with the error below.
{code:java}
Error while compiling statement: FAILED: Execution Error, return code 2 from 
org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, 
vertexId=vertex_172676552_0001_6_00, diagnostics=[Task failed, 
taskId=task_172676552_0001_6_00_00, diagnostics=[TaskAttempt 0 failed, 
info=[Error: Error while running task ( failure ) : 
attempt_172676552_0001_6_00_00_0:java.lang.RuntimeException: 
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
Hive Runtime Error while processing row
    at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:351)
    at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:280)
    at 
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
    at 
org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:86)
    at 
org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:72)
    at java.base/java.security.AccessController.doPrivileged(Native Method)
    at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
    at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
    at 
org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:72)
    at 
org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:42)
    at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
    at 
org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:118)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829) Caused by: 
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
Hive Runtime Error while processing row
    at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:101)
    at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:76)
    at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:437)
    at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:297)
 ... 15 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive 
Runtime Error while processing row
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:580)
    at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:92)
 ... 18 more Caused by: java.lang.ClassCastException: class 
java.time.OffsetDateTime cannot be cast to class java.time.LocalDateTime 
(java.time.OffsetDateTime and java.time.LocalDateTime are in module java.base 
of loader 'bootstrap')
    at 
org.apache.iceberg.mr.hive.serde.objectinspector.IcebergTimestampObjectInspectorHive3.getPrimitiveJavaObject(IcebergTimestampObjectInspectorHive3.java:58)
    at 
org.apache.iceberg.mr.hive.serde.objectinspector.IcebergTimestampObjectInspectorHive3.getPrimitiveWritableObject(IcebergTimestampObjectInspectorHive3.java:64)
    at 
org.apache.hadoop.hive.serde2.lazy.LazyUtils.writePrimitiveUTF8(LazyUtils.java:308)
    at 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:292)
    at 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:352)
    at 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:352)
    at 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serializeField(LazySimpleSerDe.java:247)
    at 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.doSerialize(LazySimpleSerDe.java:231)
    at 
org.apache.hadoop.hive.serde2.AbstractEncodingAwareSerDe.serialize(AbstractEncodingAwareSerDe.java:52)
    at 
org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:1148)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:937)
    at 
org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:937)
    at 
org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:174)
    at 
org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:154)
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:559) 
{code}
 

Steps to reproduce

[jira] [Commented] (HIVE-28531) Iceberg metadata table query failing with ClassCastException

2024-09-19 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17883095#comment-17883095
 ] 

Ayush Saxena commented on HIVE-28531:
-

Dupes HIVE-28353?

> Iceberg metadata table query failing with ClassCastException
> 
>
> Key: HIVE-28531
> URL: https://issues.apache.org/jira/browse/HIVE-28531
> Project: Hive
>  Issue Type: Bug
>  Security Level: Public(Viewable by anyone) 
>Reporter: Chiran Ravani
>Priority: Major
>
> When an Iceberg table has a timestamp column, the metadata table query fails 
> with the error below.
> {code:java}
> Error while compiling statement: FAILED: Execution Error, return code 2 from 
> org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, 
> vertexId=vertex_172676552_0001_6_00, diagnostics=[Task failed, 
> taskId=task_172676552_0001_6_00_00, diagnostics=[TaskAttempt 0 
> failed, info=[Error: Error while running task ( failure ) : 
> attempt_172676552_0001_6_00_00_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row
>     at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:351)
>     at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:280)
>     at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
>     at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:86)
>     at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:72)
>     at java.base/java.security.AccessController.doPrivileged(Native Method)
>     at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
>     at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
>     at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:72)
>     at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:42)
>     at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>     at 
> org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:118)
>     at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
>     at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>     at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>     at java.base/java.lang.Thread.run(Thread.java:829) Caused by: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row
>     at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:101)
>     at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:76)
>     at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:437)
>     at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:297)
>  ... 15 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row
>     at 
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:580)
>     at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:92)
>  ... 18 more Caused by: java.lang.ClassCastException: class 
> java.time.OffsetDateTime cannot be cast to class java.time.LocalDateTime 
> (java.time.OffsetDateTime and java.time.LocalDateTime are in module java.base 
> of loader 'bootstrap')
>     at 
> org.apache.iceberg.mr.hive.serde.objectinspector.IcebergTimestampObjectInspectorHive3.getPrimitiveJavaObject(IcebergTimestampObjectInspectorHive3.java:58)
>     at 
> org.apache.iceberg.mr.hive.serde.objectinspector.IcebergTimestampObjectInspectorHive3.getPrimitiveWritableObject(IcebergTimestampObjectInspectorHive3.java:64)
>     at 
> org.apache.hadoop.hive.serde2.lazy.LazyUtils.writePrimitiveUTF8(LazyUtils.java:308)
>     at 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:292)
>     at 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:352)
>     at 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:352)
>     at 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.se

[jira] [Updated] (HIVE-28531) Iceberg metadata table query failing with ClassCastException

2024-09-19 Thread Chiran Ravani (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chiran Ravani updated HIVE-28531:
-
Description: 
When an Iceberg table has a timestamp column, querying a metadata table fails with the 
error below.
{code:java}
Error while compiling statement: FAILED: Execution Error, return code 2 from 
org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, 
vertexId=vertex_172676552_0001_6_00, diagnostics=[Task failed, 
taskId=task_172676552_0001_6_00_00, diagnostics=[TaskAttempt 0 failed, 
info=[Error: Error while running task ( failure ) : 
attempt_172676552_0001_6_00_00_0:java.lang.RuntimeException: 
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
Hive Runtime Error while processing row
    at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:351)
    at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:280)
    at 
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
    at 
org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:86)
    at 
org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:72)
    at java.base/java.security.AccessController.doPrivileged(Native Method)
    at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
    at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
    at 
org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:72)
    at 
org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:42)
    at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
    at 
org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:118)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829) Caused by: 
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
Hive Runtime Error while processing row
    at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:101)
    at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:76)
    at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:437)
    at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:297)
 ... 15 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive 
Runtime Error while processing row
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:580)
    at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:92)
 ... 18 more Caused by: java.lang.ClassCastException: class 
java.time.OffsetDateTime cannot be cast to class java.time.LocalDateTime 
(java.time.OffsetDateTime and java.time.LocalDateTime are in module java.base 
of loader 'bootstrap')
    at 
org.apache.iceberg.mr.hive.serde.objectinspector.IcebergTimestampObjectInspectorHive3.getPrimitiveJavaObject(IcebergTimestampObjectInspectorHive3.java:58)
    at 
org.apache.iceberg.mr.hive.serde.objectinspector.IcebergTimestampObjectInspectorHive3.getPrimitiveWritableObject(IcebergTimestampObjectInspectorHive3.java:64)
    at 
org.apache.hadoop.hive.serde2.lazy.LazyUtils.writePrimitiveUTF8(LazyUtils.java:308)
    at 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:292)
    at 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:352)
    at 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:352)
    at 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serializeField(LazySimpleSerDe.java:247)
    at 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.doSerialize(LazySimpleSerDe.java:231)
    at 
org.apache.hadoop.hive.serde2.AbstractEncodingAwareSerDe.serialize(AbstractEncodingAwareSerDe.java:52)
    at 
org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:1148)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:937)
    at 
org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:937)
    at 
org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:174)
    at 
org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:154)
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:559) 
{code}
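
The root cause visible in the trace is an inspector that assumes every Iceberg 
timestamp value is a java.time.LocalDateTime, while a timestamp-with-timezone 
column yields java.time.OffsetDateTime. A minimal, self-contained sketch of a 
defensive conversion (hypothetical helper, not the actual 
IcebergTimestampObjectInspectorHive3 fix):

{code:java}
import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;

public final class TimestampCoercion {

  // Normalize either representation to a LocalDateTime (at UTC) instead of
  // blindly casting, which is what throws the ClassCastException above.
  static LocalDateTime toLocalDateTime(Object value) {
    if (value instanceof LocalDateTime) {
      return (LocalDateTime) value;
    }
    if (value instanceof OffsetDateTime) {
      return ((OffsetDateTime) value)
          .withOffsetSameInstant(ZoneOffset.UTC)
          .toLocalDateTime();
    }
    throw new IllegalArgumentException("Unsupported type: " + value.getClass());
  }

  public static void main(String[] args) {
    System.out.println(toLocalDateTime(OffsetDateTime.parse("2024-09-19T10:15:30+02:00")));
    System.out.println(toLocalDateTime(LocalDateTime.parse("2024-09-19T10:15:30")));
  }
}
{code}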
 

Steps to reproduce

[jira] [Created] (HIVE-28531) Iceberg metadata table query failing with ClassCastException

2024-09-19 Thread Chiran Ravani (Jira)
Chiran Ravani created HIVE-28531:


 Summary: Iceberg metadata table query failing with 
ClassCastException
 Key: HIVE-28531
 URL: https://issues.apache.org/jira/browse/HIVE-28531
 Project: Hive
  Issue Type: Bug
  Security Level: Public (Viewable by anyone)
Reporter: Chiran Ravani


When an Iceberg table has a timestamp column, querying a metadata table fails with the 
error below.
{code:java}
Error while compiling statement: FAILED: Execution Error, return code 2 from 
org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, 
vertexId=vertex_172676552_0001_6_00, diagnostics=[Task failed, 
taskId=task_172676552_0001_6_00_00, diagnostics=[TaskAttempt 0 failed, 
info=[Error: Error while running task ( failure ) : 
attempt_172676552_0001_6_00_00_0:java.lang.RuntimeException: 
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
Hive Runtime Error while processing row
    at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:351)
    at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:280)
    at 
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
    at 
org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:86)
    at 
org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:72)
    at java.base/java.security.AccessController.doPrivileged(Native Method)
    at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
    at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
    at 
org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:72)
    at 
org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:42)
    at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
    at 
org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:118)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829) Caused by: 
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
Hive Runtime Error while processing row
    at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:101)
    at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:76)
    at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:437)
    at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:297)
 ... 15 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive 
Runtime Error while processing row
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:580)
    at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:92)
 ... 18 more Caused by: java.lang.ClassCastException: class 
java.time.OffsetDateTime cannot be cast to class java.time.LocalDateTime 
(java.time.OffsetDateTime and java.time.LocalDateTime are in module java.base 
of loader 'bootstrap')
    at 
org.apache.iceberg.mr.hive.serde.objectinspector.IcebergTimestampObjectInspectorHive3.getPrimitiveJavaObject(IcebergTimestampObjectInspectorHive3.java:58)
    at 
org.apache.iceberg.mr.hive.serde.objectinspector.IcebergTimestampObjectInspectorHive3.getPrimitiveWritableObject(IcebergTimestampObjectInspectorHive3.java:64)
    at 
org.apache.hadoop.hive.serde2.lazy.LazyUtils.writePrimitiveUTF8(LazyUtils.java:308)
    at 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:292)
    at 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:352)
    at 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:352)
    at 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serializeField(LazySimpleSerDe.java:247)
    at 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.doSerialize(LazySimpleSerDe.java:231)
    at 
org.apache.hadoop.hive.serde2.AbstractEncodingAwareSerDe.serialize(AbstractEncodingAwareSerDe.java:52)
    at 
org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:1148)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:937)
    at 
org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:937)
    at 
org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:1

[jira] [Commented] (HIVE-28530) Fetched result from another query

2024-09-19 Thread Xiaomin Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17883028#comment-17883028
 ] 

Xiaomin Zhang commented on HIVE-28530:
--

The issue seems related to the jira below:

https://issues.apache.org/jira/browse/HIVE-21279

In that jira, a new HiveSequenceFileInputFormat was introduced with a volatile 
field fileStatuses, which is referenced twice by the getNextSplits() call in 
FetchOperator, but only when the query result cache is disabled. Unfortunately 
this field access is not thread-safe, because the HiveSequenceFileInputFormat 
object itself is a shared object. Because of this there are various failing 
scenarios, such as (see the sketch after this list):

1) One thread sets fileStatuses to null, another thread overrides it with its 
result files ==> the first query returns a result belonging to another query

2) One thread sets fileStatuses to its result files, then another thread 
overrides it with null ==> the query returns an empty result
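
The race boils down to per-query state being parked on a shared object. A 
minimal, self-contained sketch of the broken pattern and an assumed fix 
(hypothetical class and method names, not the actual Hive code):

{code:java}
import java.util.ArrayList;
import java.util.List;

// Broken pattern: per-query state stored on a shared instance.
class SharedInputFormat {
  volatile List<String> fileStatuses; // overwritten by every concurrent query

  List<String> listStatus(List<String> myResultFiles) {
    fileStatuses = myResultFiles; // thread A publishes its files...
    // ...thread B may overwrite (or null out) fileStatuses right here...
    return fileStatuses;          // ...so A can return B's files, or null
  }
}

// Assumed fix: thread the state through the call, never through the instance.
class PerCallInputFormat {
  List<String> listStatus(List<String> myResultFiles) {
    return new ArrayList<>(myResultFiles); // no shared mutable field
  }
}
{code}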

> Fetched result from another query
> -
>
> Key: HIVE-28530
> URL: https://issues.apache.org/jira/browse/HIVE-28530
> Project: Hive
>  Issue Type: Bug
>  Security Level: Public(Viewable by anyone) 
>  Components: HiveServer2
>Affects Versions: 3.0.0
>Reporter: Xiaomin Zhang
>Priority: Major
>
> When running Hive load tests, we observed that Beeline can fetch a wrong query 
> result belonging to another query running at the same time.  We ruled out a 
> load-balancing issue, because it happened on a single HiveServer2.  And we 
> found this issue only happens when *hive.query.results.cache.enabled is false.*
> All test queries are in the same format as below: 
> {code:java}
> select concat('total record (test_recon_mock_$PID)=',count(*)) as 
> count_record from t1t
> {code}
> We randomized the query by replacing $PID with the Beeline PID, and the test 
> driver ran 10 Beeline clients concurrently.  The table t1t is static and has a 
> few rows, so the test driver can check whether the query result equals: 
> total record (test_recon_mock_$PID)=2
> When the query result cache is disabled, queries randomly get a wrong 
> result, and this can always be reproduced.  For example, the two queries below 
> were running in parallel:
> {code:java}
> queryId=hive_20240701103742_ff1adb2d-e9eb-448d-990e-00ab371e9db6): select 
> concat('total record (test_recon_mock_21535)=',count(*)) as count_record from 
> t1t
> queryId=hive_20240701103742_9bdfff92-89e1-4bcd-88ea-bf73ba5fd93d): select 
> concat('total record (test_recon_mock_21566)=',count(*)) as count_record from 
> t1t
> {code}
> While the second query is supposed to return the result below:
> *total record (test_recon_mock_21566)=2*
> But Beeline actually returned:
> *total record (test_recon_mock_21535)=2*
> There is no error in the HS2 log.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-28530) Fetched result from another query

2024-09-19 Thread Xiaomin Zhang (Jira)
Xiaomin Zhang created HIVE-28530:


 Summary: Fetched result from another query
 Key: HIVE-28530
 URL: https://issues.apache.org/jira/browse/HIVE-28530
 Project: Hive
  Issue Type: Bug
  Security Level: Public (Viewable by anyone)
  Components: HiveServer2
Affects Versions: 3.0.0
Reporter: Xiaomin Zhang


When running Hive load tests, we observed that Beeline can fetch a wrong query 
result belonging to another query running at the same time.  We ruled out a 
load-balancing issue, because it happened on a single HiveServer2.  And we found 
this issue only happens when *hive.query.results.cache.enabled is false.*

All test queries are in the same format as below: 

{code:java}
select concat('total record (test_recon_mock_$PID)=',count(*)) as count_record 
from t1t
{code}

We randomized the query by replacing $PID with the Beeline PID, and the test 
driver ran 10 Beeline clients concurrently.  The table t1t is static and has a 
few rows, so the test driver can check whether the query result equals: total 
record (test_recon_mock_$PID)=2

When the query result cache is disabled, queries randomly get a wrong result, 
and this can always be reproduced.  For example, the two queries below were 
running in parallel:

{code:java}
queryId=hive_20240701103742_ff1adb2d-e9eb-448d-990e-00ab371e9db6): select 
concat('total record (test_recon_mock_21535)=',count(*)) as count_record from 
t1t

queryId=hive_20240701103742_9bdfff92-89e1-4bcd-88ea-bf73ba5fd93d): select 
concat('total record (test_recon_mock_21566)=',count(*)) as count_record from 
t1t
{code}

While the second query is supposed to return the result below:
*total record (test_recon_mock_21566)=2*

But Beeline actually returned:
*total record (test_recon_mock_21535)=2*

There is no error in the HS2 log.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28529) HiveMetaStore#getConf blocked when meet high load

2024-09-19 Thread Xiaoqiao He (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoqiao He updated HIVE-28529:
---
Description: 
Thousands of threads are blocked for a long time when the metastore is under 
high load, as the following stacks show.
a. There are 1836 threads (as stack 1 shows) waiting for lock 
#0x7f8bf9477180, which is held by the thread in stack 2.

{code:java}
# grep "0x7f8bf9477180" metastore.stack | wc -l
1836
{code}

b. There are 105 threads (as stack 2 shows) waiting for lock 
#0x7f8bf805f660, which is held by the thread in stack 3. 
{code:java}
# grep "0x7f8bf805f660" metastore.stack | wc -l
105
{code}

c. Stack 3 shows that initializing the configuration is a time-costly 
operation; it is performed while holding the object monitor (on #hiveConf, as 
in the last code snippet) on a key path of the metastore, which hurts 
performance.

So, IMO, this lock contention needs to be removed to improve performance. FYI.

NOTE: I have deployed an early version, but the newest one includes this issue 
too.

{code:java}
"pool-12-thread-1482355" #126195588 prio=5 os_prio=0 tid=0x7f86d507b800 
nid=0x21570 waiting for monitor entry [0x7f875849b000]
   java.lang.Thread.State: BLOCKED (on object monitor)
at 
org.apache.hadoop.hive.metastore.MetaStoreInit.initConnectionUrlHook(MetaStoreInit.java:95)
- waiting to lock <0x7f8bf9477180> (a java.lang.Class for 
org.apache.hadoop.hive.metastore.MetaStoreInit)
at 
org.apache.hadoop.hive.metastore.MetaStoreInit.updateConnectionURL(MetaStoreInit.java:62)
at 
org.apache.hadoop.hive.metastore.RawStoreProxy.init(RawStoreProxy.java:87)
at 
org.apache.hadoop.hive.metastore.RawStoreProxy.(RawStoreProxy.java:55)
at 
org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:66)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStore(HiveMetaStore.java:817)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:795)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_database_core(HiveMetaStore.java:1308)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_database(HiveMetaStore.java:1240)
at sun.reflect.GeneratedMethodAccessor59.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:276)
at com.sun.proxy.$Proxy23.get_database(Unknown Source)
at sun.reflect.GeneratedMethodAccessor59.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$MetricHMSProxy.invoke(HiveMetaStore.java:8241)
at com.sun.proxy.$Proxy23.get_database(Unknown Source)
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_database.getResult(ThriftHiveMetastore.java:11142)
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_database.getResult(ThriftHiveMetastore.java:11126)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
at 
org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:754)
at 
org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:749)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1717)
at 
org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:749)
at 
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
{code}

{code:java}
"pool-12-thread-1482175" #126193367 prio=5 os_prio=0 tid=0x7f87567d3000 
nid=0x20565 waiting for monitor entry [0x7f8698ccd000]
   java.lang.Thread.State: BLOCKED (on object monitor)
at 
org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2415)
- waiting to lock <0x7f8bf805f660> (a 
org.apache.hadoop.hive.conf.HiveConf)
at org.apache.hadoop.conf.C

[jira] [Created] (HIVE-28529) HiveMetaStore#getConf blocked when meet high load

2024-09-19 Thread Xiaoqiao He (Jira)
Xiaoqiao He created HIVE-28529:
--

 Summary: HiveMetaStore#getConf blocked when meet high load
 Key: HIVE-28529
 URL: https://issues.apache.org/jira/browse/HIVE-28529
 Project: Hive
  Issue Type: Improvement
  Security Level: Public (Viewable by anyone)
  Components: Metastore
Reporter: Xiaoqiao He


Thousands of threads are blocked for a long time when the metastore is under 
high load, as the following stacks show.
a. There are 1836 threads (as stack 1 shows) waiting for lock 
#0x7f8bf9477180, which is held by the thread in stack 2.

{code:java}
# grep "0x7f8bf9477180" metastore.stack | wc -l
1836
{code}

b. There are 105 threads (as stack 2 shows) waiting for lock 
#0x7f8bf805f660, which is held by the thread in stack 3. 
{code:java}
# grep "0x7f8bf805f660" metastore.stack | wc -l
105
{code}

c. Stack 3 shows that initializing the configuration is a time-costly 
operation; it is performed while holding the object monitor (on #hiveConf, as 
in the last code snippet) on a key path of the metastore, which hurts 
performance.

So, IMO, this lock contention needs to be removed to improve performance. FYI.

NOTE: I have deployed an early version, but the newest one includes this issue 
too.

{code:java}
"pool-12-thread-1482355" #126195588 prio=5 os_prio=0 tid=0x7f86d507b800 
nid=0x21570 waiting for monitor entry [0x7f875849b000]
   java.lang.Thread.State: BLOCKED (on object monitor)
at 
org.apache.hadoop.hive.metastore.MetaStoreInit.initConnectionUrlHook(MetaStoreInit.java:95)
- waiting to lock <0x7f8bf9477180> (a java.lang.Class for 
org.apache.hadoop.hive.metastore.MetaStoreInit)
at 
org.apache.hadoop.hive.metastore.MetaStoreInit.updateConnectionURL(MetaStoreInit.java:62)
at 
org.apache.hadoop.hive.metastore.RawStoreProxy.init(RawStoreProxy.java:87)
at 
org.apache.hadoop.hive.metastore.RawStoreProxy.(RawStoreProxy.java:55)
at 
org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:66)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStore(HiveMetaStore.java:817)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:795)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_database_core(HiveMetaStore.java:1308)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_database(HiveMetaStore.java:1240)
at sun.reflect.GeneratedMethodAccessor59.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:276)
at com.sun.proxy.$Proxy23.get_database(Unknown Source)
at sun.reflect.GeneratedMethodAccessor59.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$MetricHMSProxy.invoke(HiveMetaStore.java:8241)
at com.sun.proxy.$Proxy23.get_database(Unknown Source)
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_database.getResult(ThriftHiveMetastore.java:11142)
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_database.getResult(ThriftHiveMetastore.java:11126)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
at 
org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:754)
at 
org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:749)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1717)
at 
org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:749)
at 
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
{code}

{code:java}
"pool-12-thread-1482175" #126193367 prio=5 os_prio=0 tid=0x7f87567d3000 
nid=0x20565 waiting for monitor entry [0x7f8698ccd000]
   java.lang.Thread.State: BLOCKED (on object monitor)
at 
org.apache.hadoop.conf.Configuration.getProps(Conf

[jira] [Updated] (HIVE-28529) HiveMetaStore#getConf blocked when meet high load

2024-09-19 Thread Xiaoqiao He (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoqiao He updated HIVE-28529:
---
Description: 
Thousands of threads are blocked for a long time when the metastore is under 
high load, as the following stacks show.
a. There are 1836 threads (as stack 1 shows) waiting for lock 
#0x7f8bf9477180, which is held by the thread in stack 2.

{code:java}
# grep "0x7f8bf9477180" metastore.stack | wc -l
1836
{code}

b. There are 105 threads (as stack 2 shows) waiting for lock 
#0x7f8bf805f660, which is held by the thread in stack 3. 
{code:java}
# grep "0x7f8bf805f660" metastore.stack | wc -l
105
{code}

c. Stack 3 shows that initializing the configuration is a time-costly 
operation; it is performed while holding the object monitor (on #hiveConf, as 
in the last code snippet) on a key path of the metastore, which hurts 
performance.

So, IMO, this lock contention needs to be removed to improve performance. FYI.

NOTE: I have deployed an early version, but the newest one includes this issue 
too.

{code:java}
"pool-12-thread-1482355" #126195588 prio=5 os_prio=0 tid=0x7f86d507b800 
nid=0x21570 waiting for monitor entry [0x7f875849b000]
   java.lang.Thread.State: BLOCKED (on object monitor)
at 
org.apache.hadoop.hive.metastore.MetaStoreInit.initConnectionUrlHook(MetaStoreInit.java:95)
- waiting to lock <0x7f8bf9477180> (a java.lang.Class for 
org.apache.hadoop.hive.metastore.MetaStoreInit)
at 
org.apache.hadoop.hive.metastore.MetaStoreInit.updateConnectionURL(MetaStoreInit.java:62)
at 
org.apache.hadoop.hive.metastore.RawStoreProxy.init(RawStoreProxy.java:87)
at 
org.apache.hadoop.hive.metastore.RawStoreProxy.(RawStoreProxy.java:55)
at 
org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:66)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStore(HiveMetaStore.java:817)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:795)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_database_core(HiveMetaStore.java:1308)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_database(HiveMetaStore.java:1240)
at sun.reflect.GeneratedMethodAccessor59.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:276)
at com.sun.proxy.$Proxy23.get_database(Unknown Source)
at sun.reflect.GeneratedMethodAccessor59.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$MetricHMSProxy.invoke(HiveMetaStore.java:8241)
at com.sun.proxy.$Proxy23.get_database(Unknown Source)
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_database.getResult(ThriftHiveMetastore.java:11142)
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_database.getResult(ThriftHiveMetastore.java:11126)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
at 
org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:754)
at 
org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:749)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1717)
at 
org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:749)
at 
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
{code}

{code:java}
"pool-12-thread-1482175" #126193367 prio=5 os_prio=0 tid=0x7f87567d3000 
nid=0x20565 waiting for monitor entry [0x7f8698ccd000]
   java.lang.Thread.State: BLOCKED (on object monitor)
at 
org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2415)
- waiting to lock <0x7f8bf805f660> (a 
org.apache.hadoop.hive.conf.HiveConf)
at org.apache.hadoop.conf.C

[jira] [Commented] (HIVE-28483) String date cast giving wrong result

2024-09-19 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882943#comment-17882943
 ] 

Stamatis Zampetakis commented on HIVE-28483:


For the behavior of the CAST function across Hive versions, I added some more 
detailed tests in HIVE-27586.

> String date cast giving wrong result
> 
>
> Key: HIVE-28483
> URL: https://issues.apache.org/jira/browse/HIVE-28483
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltán Rátkai
>Assignee: Zoltán Rátkai
>Priority: Minor
>  Labels: pull-request-available
>
> Date conversion gives wrong results. For example:
> select to_date('03-08-2024');
> Result:
> +-------------+
> |     _c0     |
> +-------------+
> | 0003-08-20  |
> +-------------+
> or:
> select to_date(last_day(add_months(last_day('03-08-2024'), -1))) ;
> Result:
> +-------------+
> |     _c0     |
> +-------------+
> | 0003-07-31  |
> +-------------+
> Here is my comparison with other database systems:
> --
> PostgreSQL
> --
> SELECT TO_DATE('03-08-2024','YYYYMMDD');
> invalid value "03-0" for "YYYY" DETAIL: Field requires 4 characters, but only 
> 2 could be parsed. HINT: If your source string is not fixed-width, try using 
> the "FM" modifier. 
> SELECT TO_DATE('03-08-2024','DD-MM-YYYY');
> to_date
> Sat, 03 Aug 2024 00:00:00 GMT
> SELECT CAST('03-08-2024' AS date);
> date
> Fri, 08 Mar 2024 00:00:00 GMT
> SELECT CAST('2024-08-03' AS date);
> date
> Sat, 03 Aug 2024 00:00:00 GMT
> SELECT CAST('2024-08-03 T' AS date);
> invalid input syntax for type date: "2024-08-03 T" LINE 1: SELECT 
> CAST('2024-08-03 T' AS date) ^ 
> SELECT CAST('2024-08-03T' AS date);
> invalid input syntax for type date: "2024-08-03T" LINE 1: SELECT 
> CAST('2024-08-03T' AS date) ^ 
> SELECT CAST('2024-08-03T12:00:00' AS date);
> date
> Sat, 03 Aug 2024 00:00:00 GMT
> SELECT CAST('2024-08-0312:00:00' AS date);
> date/time field value out of range: "2024-08-0312:00:00" LINE 1: SELECT 
> CAST('2024-08-0312:00:00' AS date) ^ HINT: Perhaps you need a different 
> "datestyle" setting. 
> --
> -ORACLE---
> --
> select CAST('2024-08-03 12:00:00' AS date) from dual;
> Output:
> select CAST('2024-08-03 12:00:00' AS date) from dual
>             *
> ERROR at line 1:
> ORA-01861: literal does not match format string
> -
> select CAST('2024-08-03' AS date) from dual;
> Output:
> select CAST('2024-08-03' AS date) from dual
>             *
> ERROR at line 1:
> ORA-01861: literal does not match format string
> -
> SELECT TO_DATE('08/03/2024', 'MM/DD/YYYY') FROM DUAL;
> Output:
> TO_DATE('
> -
> 03-AUG-24
> -
> SELECT TO_DATE('2024-08-03', 'YYYY-MM-DD') FROM DUAL;
> Output:
> TO_DATE('
> -
> 03-AUG-24
> -
> select CAST('03-08-2024' AS date) from dual;
> Output:
> select CAST('03-08-2024' AS date) from dual
>             *
> ERROR at line 1:
> ORA-01843: An invalid month was specified.
> -
> select CAST('2024-08-0312:00:00' AS date) from dual;
> Output:
> select CAST('2024-08-0312:00:00' AS date) from dual
>             *
> ERROR at line 1:
> ORA-01861: literal does not match format string
> -
> select CAST('10-AUG-24' AS date) from dual;
> Output:
> CAST('10-
> -
> 10-AUG-24
> -
> select CAST('10-AUG-2024' AS date) from dual;
> Output:
> CAST('10-
> -
> 10-AUG-24
> -

[jira] [Commented] (HIVE-27586) Parse dates from strings ignoring trailing (potentialy) invalid chars

2024-09-19 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882942#comment-17882942
 ] 

Stamatis Zampetakis commented on HIVE-27586:


In light of HIVE-28483, I performed a series of tests to document the behavior 
of parsing dates from strings across some major Hive versions. Date parsing 
appears in various places and may differ slightly across SQL functions, so in 
the tests that follow I only examined the results of CAST(V AS DATE), which is 
probably the most popular way of performing string-to-date conversions. For 
various SQL functions, the behavior of the vectorized and non-vectorized 
implementations is not aligned, so the tests include both variants.

 !cast_string_date_hive_versions.svg! 

The tests were performed using the script in  [^cast_as_date.q] file and were 
run using the following command.

{noformat}
mvn test -Dtest=TestCliDriver -Dqfile=cast_as_date.q -Phadoop-2  
-Dtest.output.overwrite
{noformat}

Note that the hadoop-2 profile is necessary for building older versions of Hive.

> Parse dates from strings ignoring trailing (potentialy) invalid chars
> -
>
> Key: HIVE-27586
> URL: https://issues.apache.org/jira/browse/HIVE-27586
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 4.0.0-beta-1
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: backwards-compatibility, pull-request-available
> Fix For: 4.0.0
>
> Attachments: cast_as_date.q, cast_string_date_hive_versions.pdf, 
> cast_string_date_hive_versions.png, cast_string_date_hive_versions.svg
>
>
> The goal of this ticket is to extract and return a valid date from a string 
> value when there is a valid date prefix in the string.
> The following table contains a few illustrative examples highlighting what 
> happens now and what will happen after the proposed changes to ignore 
> trailing characters. HIVE-20007 introduced some behavior changes around this 
> area so the table also displays what was the Hive behavior before that change.
> ||ID||String value||Before HIVE-20007||Current behavior||Ignore trailing 
> chars||
> |1|2023-08-03_16:02:00|2023-08-03|null|2023-08-03|
> |2|2023-08-03-16:02:00|2023-08-03|null|2023-08-03|
> |3|2023-08-0316:02:00|2024-06-11|null|2023-08-03|
> |4|03-08-2023|0009-02-12|null|0003-08-20|
> |5|2023-08-03 GARBAGE|2023-08-03|2023-08-03|2023-08-03|
> |6|2023-08-03TGARBAGE|2023-08-03|2023-08-03|2023-08-03|
> |7|2023-08-03_GARBAGE|2023-08-03|null|2023-08-03|
> This change partially (see examples 3 and 4) reverts the behavior changes 
> introduced by HIVE-20007 and at the same time makes the current handling of 
> trailing invalid chars more uniform. 
> This change will have an impact on various Hive SQL functions and operators 
> (+/-) that accept dates from string values. A partial list of affected 
> functions is outlined below:
> * CAST (V AS DATE)
> * CAST (V AS TIMESTAMP)
> * TO_DATE
> * DATE_ADD
> * DATE_DIFF
> * WEEKOFYEAR
> * DAYOFWEEK
> * TRUNC
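
To make the "valid date prefix" semantics above concrete, here is a minimal 
standalone sketch using java.time with a ParsePosition so that trailing 
characters are ignored; it is illustrative only, not Hive's actual 
implementation:

{code:java}
import java.text.ParsePosition;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

final class DatePrefixParser {

  private static final DateTimeFormatter ISO = DateTimeFormatter.ofPattern("yyyy-MM-dd");

  // Returns the date encoded by a valid "yyyy-MM-dd" prefix, or null.
  static LocalDate parsePrefix(String s) {
    ParsePosition pos = new ParsePosition(0);
    try {
      // parse(CharSequence, ParsePosition) stops at the first non-matching
      // character instead of rejecting the whole string.
      return LocalDate.from(ISO.parse(s, pos));
    } catch (RuntimeException e) {
      return null; // no valid date prefix
    }
  }

  public static void main(String[] args) {
    System.out.println(parsePrefix("2023-08-03_16:02:00")); // 2023-08-03
    System.out.println(parsePrefix("2023-08-03 GARBAGE"));  // 2023-08-03
    System.out.println(parsePrefix("GARBAGE"));             // null
  }
}
{code}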



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27586) Parse dates from strings ignoring trailing (potentialy) invalid chars

2024-09-19 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis updated HIVE-27586:
---
Attachment: cast_string_date_hive_versions.svg

> Parse dates from strings ignoring trailing (potentialy) invalid chars
> -
>
> Key: HIVE-27586
> URL: https://issues.apache.org/jira/browse/HIVE-27586
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 4.0.0-beta-1
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: backwards-compatibility, pull-request-available
> Fix For: 4.0.0
>
> Attachments: cast_string_date_hive_versions.pdf, 
> cast_string_date_hive_versions.svg
>
>
> The goal of this ticket is to extract and return a valid date from a string 
> value when there is a valid date prefix in the string.
> The following table contains a few illustrative examples highlighting what 
> happens now and what will happen after the proposed changes to ignore 
> trailing characters. HIVE-20007 introduced some behavior changes around this 
> area so the table also displays what was the Hive behavior before that change.
> ||ID||String value||Before HIVE-20007||Current behavior||Ignore trailing 
> chars||
> |1|2023-08-03_16:02:00|2023-08-03|null|2023-08-03|
> |2|2023-08-03-16:02:00|2023-08-03|null|2023-08-03|
> |3|2023-08-0316:02:00|2024-06-11|null|2023-08-03|
> |4|03-08-2023|0009-02-12|null|0003-08-20|
> |5|2023-08-03 GARBAGE|2023-08-03|2023-08-03|2023-08-03|
> |6|2023-08-03TGARBAGE|2023-08-03|2023-08-03|2023-08-03|
> |7|2023-08-03_GARBAGE|2023-08-03|null|2023-08-03|
> This change partially (see examples 3 and 4) reverts the behavior changes 
> introduced by HIVE-20007 and at the same time makes the current handling of 
> trailing invalid chars more uniform. 
> This change will have an impact on various Hive SQL functions and operators 
> (+/-) that accept dates from string values. A partial list of affected 
> functions is outlined below:
> * CAST (V AS DATE)
> * CAST (V AS TIMESTAMP)
> * TO_DATE
> * DATE_ADD
> * DATE_DIFF
> * WEEKOFYEAR
> * DAYOFWEEK
> * TRUNC



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27586) Parse dates from strings ignoring trailing (potentialy) invalid chars

2024-09-19 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis updated HIVE-27586:
---
Attachment: cast_as_date.q

> Parse dates from strings ignoring trailing (potentialy) invalid chars
> -
>
> Key: HIVE-27586
> URL: https://issues.apache.org/jira/browse/HIVE-27586
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 4.0.0-beta-1
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: backwards-compatibility, pull-request-available
> Fix For: 4.0.0
>
> Attachments: cast_as_date.q, cast_string_date_hive_versions.pdf, 
> cast_string_date_hive_versions.png, cast_string_date_hive_versions.svg
>
>
> The goal of this ticket is to extract and return a valid date from a string 
> value when there is a valid date prefix in the string.
> The following table contains a few illustrative examples highlighting what 
> happens now and what will happen after the proposed changes to ignore 
> trailing characters. HIVE-20007 introduced some behavior changes around this 
> area so the table also displays what was the Hive behavior before that change.
> ||ID||String value||Before HIVE-20007||Current behavior||Ignore trailing 
> chars||
> |1|2023-08-03_16:02:00|2023-08-03|null|2023-08-03|
> |2|2023-08-03-16:02:00|2023-08-03|null|2023-08-03|
> |3|2023-08-0316:02:00|2024-06-11|null|2023-08-03|
> |4|03-08-2023|0009-02-12|null|0003-08-20|
> |5|2023-08-03 GARBAGE|2023-08-03|2023-08-03|2023-08-03|
> |6|2023-08-03TGARBAGE|2023-08-03|2023-08-03|2023-08-03|
> |7|2023-08-03_GARBAGE|2023-08-03|null|2023-08-03|
> This change partially (see examples 3 and 4) reverts the behavior changes 
> introduced by HIVE-20007 and at the same time makes the current handling of 
> trailing invalid chars more uniform. 
> This change will have an impact on various Hive SQL functions and operators 
> (+/-) that accept dates from string values. A partial list of affected 
> functions is outlined below:
> * CAST (V AS DATE)
> * CAST (V AS TIMESTAMP)
> * TO_DATE
> * DATE_ADD
> * DATE_DIFF
> * WEEKOFYEAR
> * DAYOFWEEK
> * TRUNC



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27586) Parse dates from strings ignoring trailing (potentialy) invalid chars

2024-09-19 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis updated HIVE-27586:
---
Attachment: cast_string_date_hive_versions.png

> Parse dates from strings ignoring trailing (potentialy) invalid chars
> -
>
> Key: HIVE-27586
> URL: https://issues.apache.org/jira/browse/HIVE-27586
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 4.0.0-beta-1
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: backwards-compatibility, pull-request-available
> Fix For: 4.0.0
>
> Attachments: cast_string_date_hive_versions.pdf, 
> cast_string_date_hive_versions.png, cast_string_date_hive_versions.svg
>
>
> The goal of this ticket is to extract and return a valid date from a string 
> value when there is a valid date prefix in the string.
> The following table contains a few illustrative examples highlighting what 
> happens now and what will happen after the proposed changes to ignore 
> trailing characters. HIVE-20007 introduced some behavior changes around this 
> area so the table also displays what was the Hive behavior before that change.
> ||ID||String value||Before HIVE-20007||Current behavior||Ignore trailing 
> chars||
> |1|2023-08-03_16:02:00|2023-08-03|null|2023-08-03|
> |2|2023-08-03-16:02:00|2023-08-03|null|2023-08-03|
> |3|2023-08-0316:02:00|2024-06-11|null|2023-08-03|
> |4|03-08-2023|0009-02-12|null|0003-08-20|
> |5|2023-08-03 GARBAGE|2023-08-03|2023-08-03|2023-08-03|
> |6|2023-08-03TGARBAGE|2023-08-03|2023-08-03|2023-08-03|
> |7|2023-08-03_GARBAGE|2023-08-03|null|2023-08-03|
> This change partially (see examples 3 and 4) reverts the behavior changes 
> introduced by HIVE-20007 and at the same time makes the current handling of 
> trailing invalid chars more uniform. 
> This change will have an impact on various Hive SQL functions and operators 
> (+/-) that accept dates from string values. A partial list of affected 
> functions is outlined below:
> * CAST (V AS DATE)
> * CAST (V AS TIMESTAMP)
> * TO_DATE
> * DATE_ADD
> * DATE_DIFF
> * WEEKOFYEAR
> * DAYOFWEEK
> * TRUNC



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27586) Parse dates from strings ignoring trailing (potentialy) invalid chars

2024-09-19 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis updated HIVE-27586:
---
Attachment: cast_string_date_hive_versions.pdf

> Parse dates from strings ignoring trailing (potentialy) invalid chars
> -
>
> Key: HIVE-27586
> URL: https://issues.apache.org/jira/browse/HIVE-27586
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 4.0.0-beta-1
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: backwards-compatibility, pull-request-available
> Fix For: 4.0.0
>
> Attachments: cast_string_date_hive_versions.pdf
>
>
> The goal of this ticket is to extract and return a valid date from a string 
> value when there is a valid date prefix in the string.
> The following table contains a few illustrative examples highlighting what 
> happens now and what will happen after the proposed changes to ignore 
> trailing characters. HIVE-20007 introduced some behavior changes around this 
> area so the table also displays what was the Hive behavior before that change.
> ||ID||String value||Before HIVE-20007||Current behavior||Ignore trailing 
> chars||
> |1|2023-08-03_16:02:00|2023-08-03|null|2023-08-03|
> |2|2023-08-03-16:02:00|2023-08-03|null|2023-08-03|
> |3|2023-08-0316:02:00|2024-06-11|null|2023-08-03|
> |4|03-08-2023|0009-02-12|null|0003-08-20|
> |5|2023-08-03 GARBAGE|2023-08-03|2023-08-03|2023-08-03|
> |6|2023-08-03TGARBAGE|2023-08-03|2023-08-03|2023-08-03|
> |7|2023-08-03_GARBAGE|2023-08-03|null|2023-08-03|
> This change partially (see examples 3 and 4) reverts the behavior changes 
> introduced by HIVE-20007 and at the same time makes the current handling of 
> trailing invalid chars more uniform. 
> This change will have an impact on various Hive SQL functions and operators 
> (+/-) that accept dates from string values. A partial list of affected 
> functions is outlined below:
> * CAST (V AS DATE)
> * CAST (V AS TIMESTAMP)
> * TO_DATE
> * DATE_ADD
> * DATE_DIFF
> * WEEKOFYEAR
> * DAYOFWEEK
> * TRUNC



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-28528) Support Bitmap function

2024-09-18 Thread yongzhi.shao (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882871#comment-17882871
 ] 

yongzhi.shao commented on HIVE-28528:
-

[~zhangbutao] & [~dkuzmenko] & [~ayushsaxena]

Hello, what do you think?

> Support Bitmap function
> ---
>
> Key: HIVE-28528
> URL: https://issues.apache.org/jira/browse/HIVE-28528
> Project: Hive
>  Issue Type: Improvement
>  Security Level: Public(Viewable by anyone) 
>  Components: UDF
>Affects Versions: 4.0.0
>Reporter: yongzhi.shao
>Priority: Major
>
> Since we have introduced the roaringbitmap dependency in the hive-ql module, 
> can we take this opportunity to introduce bitmap-related UDFs? They can be 
> used to quickly compute intersections, unions, and differences, 
> de-duplication (distinct-count) statistics, and other computational needs.
> If so, I can do this.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28528) Support Bitmap function

2024-09-18 Thread yongzhi.shao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yongzhi.shao updated HIVE-28528:

Priority: Minor  (was: Major)

> Support Bitmap function
> ---
>
> Key: HIVE-28528
> URL: https://issues.apache.org/jira/browse/HIVE-28528
> Project: Hive
>  Issue Type: Improvement
>  Security Level: Public(Viewable by anyone) 
>  Components: UDF
>Affects Versions: 4.0.0
>Reporter: yongzhi.shao
>Priority: Minor
>
> Since we have introduced the roaringbitmap dependency in the hive-ql module, 
> can we take this opportunity to introduce bitmap-related UDFs? They can be 
> used to quickly compute intersections, unions, and differences, 
> de-duplication (distinct-count) statistics, and other computational needs.
> If so, I can do this.
>  
> DEMO:
> {code:java}
> CREATE TABLE IF NOT EXISTS `hive_bitmap_table`
> (
>   k      int,
>   uuid   bigint,
>   bitmap binary
> )
> STORED AS ORC;
> --demo
> select count(distinct uuid) from hive_bitmap_table;
> select bitmap_count(to_bitmap(uuid)) from hive_bitmap_table;
> insert into table hive_bitmap_table select 2 as k, 2 as uuid, to_bitmap(2) as 
> bitmap;{code}
>  
>  
>  
> |UDF|desc                           |demo               |result|
> |to_bitmap|Convert number (int or bigint) to bitmap|to_bitmap(num)|bitmap 
> (binary)|
> |bitmap_union|Multiple bitmaps merged into one bitmap 
> (concatenation)|bitmap_union(bitmap)|bitmap|
> |bitmap_count|Calculate the number of elements stored in the 
> bitmap|bitmap_count(bitmap)|long|
> |bitmap_and|Calculate the intersection of two 
> bitmaps|bitmap_and(bitmap1,bitmap2)|bitmap|
> |bitmap_or|Calculate the concatenation of two 
> bitmaps|bitmap_or(bitmap1,bitmap2)|bitmap|
> |bitmap_xor|Calculate the difference between two 
> bitmaps|bitmap_xor(bitmap1,bitmap2)|bitmap|
> |bitmap_from_array|Converting an array to a 
> bitmap|bitmap_from_array(array)|bitmap|
> |bitmap_to_array|Convert bitmap to 
> array|bitmap_to_array(bitmap)|array|
> |bitmap_contains|Determine if a bitmap contains all the elements of another 
> bitmap.|bitmap_contains(bitmap1,bitmap2)|boolean|
> |bitmap_contains|Determine if a bitmap contains an 
> element|bitmap_contains(bitmap,num)|boolean|
>  
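
To make the proposed semantics concrete, here is a minimal sketch written 
directly against RoaringBitmap (the dependency already present in hive-ql). It 
uses the 32-bit variant for brevity; the bigint column in the demo would need 
Roaring64NavigableMap. Names mirror the table above but are illustrative, not 
actual UDF implementations:

{code:java}
import org.roaringbitmap.RoaringBitmap;

final class BitmapUdfSemantics {

  // to_bitmap(num): a bitmap holding a single value.
  static RoaringBitmap toBitmap(int value) {
    return RoaringBitmap.bitmapOf(value);
  }

  // bitmap_count(bitmap): number of distinct elements stored.
  static long bitmapCount(RoaringBitmap b) {
    return b.getLongCardinality();
  }

  // bitmap_and(bitmap1, bitmap2): set intersection.
  static RoaringBitmap bitmapAnd(RoaringBitmap a, RoaringBitmap b) {
    return RoaringBitmap.and(a, b);
  }

  public static void main(String[] args) {
    RoaringBitmap users = RoaringBitmap.bitmapOf(1, 2, 3, 5);
    RoaringBitmap active = RoaringBitmap.bitmapOf(2, 3, 8);
    // Distinct-count of the intersection: the fast path behind
    // count(distinct ...) style de-duplication statistics.
    System.out.println(bitmapCount(bitmapAnd(users, active))); // prints 2
  }
}
{code}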



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28528) Support Bitmap function

2024-09-18 Thread yongzhi.shao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yongzhi.shao updated HIVE-28528:

Description: 
Since we have introduced the roaringbitmap dependency in the hive-ql module, 
can we take this opportunity to introduce bitmap-related UDFs? They can be used 
to quickly compute intersections, unions, and differences, de-duplication 
(distinct-count) statistics, and other computational needs.

If so, I can do this.

 

DEMO:
{code:java}

CREATE TABLE IF NOT EXISTS `hive_bitmap_table`
(
  k      int,
  uuid   bigint,
  bitmap binary
)
STORED AS ORC;

--demo
select count(distinct uuid) from hive_bitmap_table;
select bitmap_count(to_bitmap(uuid)) from hive_bitmap_table;
insert into table hive_bitmap_table select 2 as k, 2 as uuid, to_bitmap(2) as 
bitmap;{code}
 

 

 
|UDF|desc                           |demo               |result|
|to_bitmap|Convert number (int or bigint) to bitmap|to_bitmap(num)|bitmap 
(binary)|
|bitmap_union|Multiple bitmaps merged into one bitmap 
(concatenation)|bitmap_union(bitmap)|bitmap|
|bitmap_count|Calculate the number of elements stored in the 
bitmap|bitmap_count(bitmap)|long|
|bitmap_and|Calculate the intersection of two 
bitmaps|bitmap_and(bitmap1,bitmap2)|bitmap|
|bitmap_or|Calculate the concatenation of two 
bitmaps|bitmap_or(bitmap1,bitmap2)|bitmap|
|bitmap_xor|Calculate the difference between two 
bitmaps|bitmap_xor(bitmap1,bitmap2)|bitmap|
|bitmap_from_array|Converting an array to a 
bitmap|bitmap_from_array(array)|bitmap|
|bitmap_to_array|Convert bitmap to array|bitmap_to_array(bitmap)|array|
|bitmap_contains|Determine if a bitmap contains all the elements of another 
bitmap.|bitmap_contains(bitmap1,bitmap2)|boolean|
|bitmap_contains|Determine if a bitmap contains an 
element|bitmap_contains(bitmap,num)|boolean|

 

  was:
Since we have introduced the roaringbitmap dependency in the hive-ql module, 
can we take this opportunity to introduce bitmap-related UDFs? They can be used 
to quickly compute intersections, unions, and differences, de-duplication 
(distinct-count) statistics, and other computational needs.

If so, I can do this.

 

 

 

 
|UDF|desc                           |demo               |result|
|to_bitmap|Convert number (int or bigint) to bitmap|to_bitmap(num)|bitmap|
|bitmap_union|Multiple bitmaps merged into one bitmap 
(concatenation)|bitmap_union(bitmap)|bitmap|
|bitmap_count|Calculate the number of elements stored in the 
bitmap|bitmap_count(bitmap)|long|
|bitmap_and|Calculate the intersection of two 
bitmaps|bitmap_and(bitmap1,bitmap2)|bitmap|
|bitmap_or|Calculate the concatenation of two 
bitmaps|bitmap_or(bitmap1,bitmap2)|bitmap|
|bitmap_xor|Calculate the difference between two 
bitmaps|bitmap_xor(bitmap1,bitmap2)|bitmap|
|bitmap_from_array|Converting an array to a 
bitmap|bitmap_from_array(array)|bitmap|
|bitmap_to_array|Convert bitmap to array|bitmap_to_array(bitmap)|array|
|bitmap_contains|Determine if a bitmap contains all the elements of another 
bitmap.|bitmap_contains(bitmap1,bitmap2)|boolean|
|bitmap_contains|Determine if a bitmap contains an 
element|bitmap_contains(bitmap,num)|boolean|

 


> Support Bitmap function
> ---
>
> Key: HIVE-28528
> URL: https://issues.apache.org/jira/browse/HIVE-28528
> Project: Hive
>  Issue Type: Improvement
>  Security Level: Public(Viewable by anyone) 
>  Components: UDF
>Affects Versions: 4.0.0
>Reporter: yongzhi.shao
>Priority: Major
>
> Since we have introduced the roaringbitmap dependency in the hive-ql module, 
> can we take this opportunity to introduce bitmap-related UDFs? They can be 
> used to quickly compute intersections, unions, and differences, 
> de-duplication (distinct-count) statistics, and other computational needs.
> If so, I can do this.
>  
> DEMO:
> {code:java}
> CREATE TABLE IF NOT EXISTS `hive_bitmap_table`
> (
>   k      int,
>   uuid   bigint,
>   bitmap binary
> )
> STORED AS ORC;
> --demo
> select count(distinct uuid) from hive_bitmap_table;
> select bitmap_count(to_bitmap(uuid)) from hive_bitmap_table;
> insert into table hive_bitmap_table select 2 as k, 2 as uuid, to_bitmap(2) as 
> bitmap;{code}
>  
>  
>  
> |UDF|desc                           |demo               |result|
> |to_bitmap|Convert number (int or bigint) to bitmap|to_bitmap(num)|bitmap 
> (binary)|
> |bitmap_union|Multiple bitmaps merged into one bitmap 
> (concatenation)|bitmap_union(bitmap)|bitmap|
> |bitmap_count|Calculate the number of elements stored in the 
> bitmap|bitmap_count(bitmap)|long|
> |bitmap_and|Calculate the intersection of two 
> bitmaps|bitmap_and(bitmap1,bitmap2)|bitmap|
> |bitmap_or|Calculate the concatenation of two 
> bitmaps|bitmap_or(bitmap1,bitmap2)|bitmap|
> |bitmap

[jira] [Updated] (HIVE-28528) Support Bitmap function

2024-09-18 Thread yongzhi.shao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yongzhi.shao updated HIVE-28528:

Description: 
Since we have introduced the roaringbitmap dependency in the hive-ql module, 
can we take this opportunity to introduce bitmap-related UDFs? They can be used 
to quickly compute intersections, unions, and differences, de-duplication 
(distinct-count) statistics, and other computational needs.

If so, I can do this.

 

 

 

 
|UDF|desc                           |demo               |result|
|to_bitmap|Convert number (int or bigint) to bitmap|to_bitmap(num)|bitmap|
|bitmap_union|Multiple bitmaps merged into one bitmap 
(concatenation)|bitmap_union(bitmap)|bitmap|
|bitmap_count|Calculate the number of elements stored in the 
bitmap|bitmap_count(bitmap)|long|
|bitmap_and|Calculate the intersection of two 
bitmaps|bitmap_and(bitmap1,bitmap2)|bitmap|
|bitmap_or|Calculate the concatenation of two 
bitmaps|bitmap_or(bitmap1,bitmap2)|bitmap|
|bitmap_xor|Calculate the difference between two 
bitmaps|bitmap_xor(bitmap1,bitmap2)|bitmap|
|bitmap_from_array|Converting an array to a 
bitmap|bitmap_from_array(array)|bitmap|
|bitmap_to_array|Convert bitmap to array|bitmap_to_array(bitmap)|array|
|bitmap_contains|Determine if a bitmap contains all the elements of another 
bitmap.|bitmap_contains(bitmap1,bitmap2)|boolean|
|bitmap_contains|Determine if a bitmap contains an 
element|bitmap_contains(bitmap,num)|boolean|

 

  was:
Since we have introduced the roaringbitmap dependency in the hive-ql module, 
can we take this opportunity to introduce bitmap-related UDFs? They can be used 
to quickly compute intersections, unions, and differences, de-duplication 
(distinct-count) statistics, and other computational needs.

If so, I can do this.

 

 

 

 

| UDF | desc                           | demo               | result|
|:-:|:---:|::|:-:|
| to_bitmap | Convert number (int or bigint) to bitmap | to_bitmap(num) | 
bitmap |
| bitmap_union | Multiple bitmaps merged into one bitmap (concatenation) | 
bitmap_union(bitmap) | bitmap |
| bitmap_count | Calculate the number of elements stored in the bitmap | 
bitmap_count(bitmap) | long |
| bitmap_and | Calculate the intersection of two bitmaps | 
bitmap_and(bitmap1,bitmap2) | bitmap |
| bitmap_or | Calculate the concatenation of two bitmaps | 
bitmap_or(bitmap1,bitmap2) | bitmap |
| bitmap_xor | Calculate the difference between two bitmaps | 
bitmap_xor(bitmap1,bitmap2) | bitmap |
| bitmap_from_array | Converting an array to a bitmap | 
bitmap_from_array(array) | bitmap |
| bitmap_to_array | Convert bitmap to array | bitmap_to_array(bitmap) | 
array |
| bitmap_contains | Determine if a bitmap contains all the elements of another 
bitmap. | bitmap_contains(bitmap1,bitmap2) | boolean |
| bitmap_contains | Determine if a bitmap contains an element | 
bitmap_contains(bitmap,num) | boolean |
 


> Support Bitmap function
> ---
>
> Key: HIVE-28528
> URL: https://issues.apache.org/jira/browse/HIVE-28528
> Project: Hive
>  Issue Type: Improvement
>  Security Level: Public(Viewable by anyone) 
>  Components: UDF
>Affects Versions: 4.0.0
>Reporter: yongzhi.shao
>Priority: Major
>
> Since we have introduced the roaringbitmap dependency in the hive-ql module, 
> can we take this opportunity to introduce bitmap-related UDFs? They can be 
> used to quickly compute intersections, unions, and differences, 
> de-duplication (distinct-count) statistics, and other computational needs.
> If so, I can do this.
>  
>  
>  
>  
> |UDF|desc                           |demo               |result|
> |to_bitmap|Convert number (int or bigint) to bitmap|to_bitmap(num)|bitmap|
> |bitmap_union|Multiple bitmaps merged into one bitmap 
> (concatenation)|bitmap_union(bitmap)|bitmap|
> |bitmap_count|Calculate the number of elements stored in the 
> bitmap|bitmap_count(bitmap)|long|
> |bitmap_and|Calculate the intersection of two 
> bitmaps|bitmap_and(bitmap1,bitmap2)|bitmap|
> |bitmap_or|Calculate the concatenation of two 
> bitmaps|bitmap_or(bitmap1,bitmap2)|bitmap|
> |bitmap_xor|Calculate the difference between two 
> bitmaps|bitmap_xor(bitmap1,bitmap2)|bitmap|
> |bitmap_from_array|Converting an array to a 
> bitmap|bitmap_from_array(array)|bitmap|
> |bitmap_to_array|Convert bitmap to 
> array|bitmap_to_array(bitmap)|array|
> |bitmap_contains|Determine if a bitmap contains all the elements of another 
> bitmap.|bitmap_contains(bitmap1,bitmap2)|boolean|
> |bitmap_contains|Determine if a bitmap contains an 
> element|bitmap_contains(bitmap,num)|boolean|
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

