[jira] [Created] (HIVE-27807) Backport of HIVE-20629, HIVE-20705, HIVE-20734

2023-10-17 Thread Aman Raj (Jira)
Aman Raj created HIVE-27807:
---

 Summary: Backport of HIVE-20629, HIVE-20705, HIVE-20734
 Key: HIVE-27807
 URL: https://issues.apache.org/jira/browse/HIVE-27807
 Project: Hive
  Issue Type: Sub-task
Affects Versions: 3.2.0
Reporter: Aman Raj
Assignee: Aman Raj








[jira] [Updated] (HIVE-25351) stddev(), stddev_pop() with CBO enable returning null

2023-10-17 Thread Dayakar M (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dayakar M updated HIVE-25351:
-
Summary: stddev(), stddev_pop() with CBO enable returning null  (was: 
stddev(), sstddev_pop() with CBO enable returning null)

> stddev(), stddev_pop() with CBO enable returning null
> -
>
> Key: HIVE-25351
> URL: https://issues.apache.org/jira/browse/HIVE-25351
> Project: Hive
>  Issue Type: Bug
>Reporter: Ashish Sharma
>Assignee: Dayakar M
>Priority: Blocker
>
> *script used to repro*
> create table cbo_test (key string, v1 double, v2 decimal(30,2), v3 
> decimal(30,2));
> insert into cbo_test values ("00140006375905", 10230.72, 
> 10230.72, 10230.69), ("00140006375905", 10230.72, 10230.72, 
> 10230.69), ("00140006375905", 10230.72, 10230.72, 10230.69), 
> ("00140006375905", 10230.72, 10230.72, 10230.69), 
> ("00140006375905", 10230.72, 10230.72, 10230.69), 
> ("00140006375905", 10230.72, 10230.72, 10230.69);
> select stddev(v1), stddev(v2), stddev(v3) from cbo_test;
> *Enable CBO*
> ++
> |  Explain   |
> ++
> | Plan optimized by CBO. |
> ||
> | Vertex dependency in root stage|
> | Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE)|
> ||
> | Stage-0|
> |   Fetch Operator   |
> | limit:-1   |
> | Stage-1|
> |   Reducer 2 vectorized |
> |   File Output Operator [FS_13] |
> | Select Operator [SEL_12] (rows=1 width=24) |
> |   Output:["_col0","_col1","_col2"] |
> |   Group By Operator [GBY_11] (rows=1 width=72) |
> | 
> Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"],aggregations:["sum(VALUE._col0)","sum(VALUE._col1)","count(VALUE._col2)","sum(VALUE._col3)","sum(VALUE._col4)","count(VALUE._col5)","sum(VALUE._col6)","sum(VALUE._col7)","count(VALUE._col8)"]
>  |
> |   <-Map 1 [CUSTOM_SIMPLE_EDGE] vectorized  |
> | PARTITION_ONLY_SHUFFLE [RS_10] |
> |   Group By Operator [GBY_9] (rows=1 width=72) |
> | 
> Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"],aggregations:["sum(_col3)","sum(_col0)","count(_col0)","sum(_col5)","sum(_col4)","count(_col1)","sum(_col7)","sum(_col6)","count(_col2)"]
>  |
> | Select Operator [SEL_8] (rows=6 width=232) |
> |   
> Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7"] |
> |   TableScan [TS_0] (rows=6 width=232) |
> | default@cbo_test,cbo_test, ACID 
> table,Tbl:COMPLETE,Col:COMPLETE,Output:["v1","v2","v3"] |
> ||
> ++
> *Query Result* 
> _c0   _c1 _c2
> 0.0   NaN NaN
> *Disable CBO*
> ++
> |  Explain   |
> ++
> | Vertex dependency in root stage|
> | Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE)|
> ||
> | Stage-0|
> |   Fetch Operator   |
> | limit:-1   |
> | Stage-1|
> |   Reducer 2 vectorized |
> |   File Output Operator [FS_11] |
> | Group By Operator [GBY_10] (rows=1 width=24) |
> |   
> Output:["_col0","_col1","_col2"],aggregations:["stddev(VALUE._col0)","stddev(VALUE._col1)","stddev(VALUE._col2)"]
>  |
> | <-Map 1 [CUSTOM_SIMPLE_EDGE] vectorized|
> |   PARTITION_ONLY_SHUFFLE [RS_9]|
> | Group By Operator [GBY_8] (rows=1 width=240) |
> |   
> Output:["_col0","_col1","_col2"],aggregations:["stddev(v1)","stddev(v2)","stddev(v3)"]
>  |
> |   Select Operator [SEL_7] (rows=6 width=232) |
> | Output:["v1","v2","v3"]|
> | TableScan [TS_0] (rows=6 width=232) |
> |   default@cbo_test,cbo_test, ACID 
> 
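
For context on the two plans quoted above: with CBO enabled, stddev() is
decomposed into plain sum, sum-of-squares and count aggregations (visible in
the GBY_9/GBY_11 operators), while the non-CBO plan keeps the stddev
aggregations intact. A hedged sketch, in LaTeX, of the shortcut formula that
such a decomposition implies, assuming the population form:

{code}
\sigma = \sqrt{\frac{\sum_i x_i^2 - \left(\sum_i x_i\right)^2 / n}{n}}
{code}

When all input values are nearly identical, as in the repro data, the
subtraction cancels almost completely, and a small negative rounding residue
turns the square root into NaN; that is one plausible reading of the 0.0/NaN
result above, which the non-CBO plan avoids by keeping stddev() as a single
aggregation.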

[jira] [Commented] (HIVE-25351) stddev(), sstddev_pop() with CBO enable returning null

2023-10-17 Thread Dayakar M (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17776470#comment-17776470
 ] 

Dayakar M commented on HIVE-25351:
--

There has been no activity for the last 2 years, so I have assigned myself to work on this.

> stddev(), sstddev_pop() with CBO enable returning null
> --
>
> Key: HIVE-25351
> URL: https://issues.apache.org/jira/browse/HIVE-25351
> Project: Hive
>  Issue Type: Bug
>Reporter: Ashish Sharma
>Assignee: Dayakar M
>Priority: Blocker
>
> *script used to repro*
> create table cbo_test (key string, v1 double, v2 decimal(30,2), v3 
> decimal(30,2));
> insert into cbo_test values ("00140006375905", 10230.72, 
> 10230.72, 10230.69), ("00140006375905", 10230.72, 10230.72, 
> 10230.69), ("00140006375905", 10230.72, 10230.72, 10230.69), 
> ("00140006375905", 10230.72, 10230.72, 10230.69), 
> ("00140006375905", 10230.72, 10230.72, 10230.69), 
> ("00140006375905", 10230.72, 10230.72, 10230.69);
> select stddev(v1), stddev(v2), stddev(v3) from cbo_test;
> *Enable CBO*
> ++
> |  Explain   |
> ++
> | Plan optimized by CBO. |
> ||
> | Vertex dependency in root stage|
> | Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE)|
> ||
> | Stage-0|
> |   Fetch Operator   |
> | limit:-1   |
> | Stage-1|
> |   Reducer 2 vectorized |
> |   File Output Operator [FS_13] |
> | Select Operator [SEL_12] (rows=1 width=24) |
> |   Output:["_col0","_col1","_col2"] |
> |   Group By Operator [GBY_11] (rows=1 width=72) |
> | 
> Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"],aggregations:["sum(VALUE._col0)","sum(VALUE._col1)","count(VALUE._col2)","sum(VALUE._col3)","sum(VALUE._col4)","count(VALUE._col5)","sum(VALUE._col6)","sum(VALUE._col7)","count(VALUE._col8)"]
>  |
> |   <-Map 1 [CUSTOM_SIMPLE_EDGE] vectorized  |
> | PARTITION_ONLY_SHUFFLE [RS_10] |
> |   Group By Operator [GBY_9] (rows=1 width=72) |
> | 
> Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"],aggregations:["sum(_col3)","sum(_col0)","count(_col0)","sum(_col5)","sum(_col4)","count(_col1)","sum(_col7)","sum(_col6)","count(_col2)"]
>  |
> | Select Operator [SEL_8] (rows=6 width=232) |
> |   
> Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7"] |
> |   TableScan [TS_0] (rows=6 width=232) |
> | default@cbo_test,cbo_test, ACID 
> table,Tbl:COMPLETE,Col:COMPLETE,Output:["v1","v2","v3"] |
> ||
> ++
> *Query Result* 
> _c0   _c1 _c2
> 0.0   NaN NaN
> *Disable CBO*
> ++
> |  Explain   |
> ++
> | Vertex dependency in root stage|
> | Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE)|
> ||
> | Stage-0|
> |   Fetch Operator   |
> | limit:-1   |
> | Stage-1|
> |   Reducer 2 vectorized |
> |   File Output Operator [FS_11] |
> | Group By Operator [GBY_10] (rows=1 width=24) |
> |   
> Output:["_col0","_col1","_col2"],aggregations:["stddev(VALUE._col0)","stddev(VALUE._col1)","stddev(VALUE._col2)"]
>  |
> | <-Map 1 [CUSTOM_SIMPLE_EDGE] vectorized|
> |   PARTITION_ONLY_SHUFFLE [RS_9]|
> | Group By Operator [GBY_8] (rows=1 width=240) |
> |   
> Output:["_col0","_col1","_col2"],aggregations:["stddev(v1)","stddev(v2)","stddev(v3)"]
>  |
> |   Select Operator [SEL_7] (rows=6 width=232) |
> | Output:["v1","v2","v3"]|
> | TableScan [TS_0] (rows=6 width=232) |
> |   default@cbo_test,cbo_test, ACID 
> 

[jira] [Assigned] (HIVE-25351) stddev(), sstddev_pop() with CBO enable returning null

2023-10-17 Thread Dayakar M (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dayakar M reassigned HIVE-25351:


Assignee: Dayakar M  (was: Pritha Dawn)

> stddev(), sstddev_pop() with CBO enable returning null
> --
>
> Key: HIVE-25351
> URL: https://issues.apache.org/jira/browse/HIVE-25351
> Project: Hive
>  Issue Type: Bug
>Reporter: Ashish Sharma
>Assignee: Dayakar M
>Priority: Blocker
>
> *script used to repro*
> create table cbo_test (key string, v1 double, v2 decimal(30,2), v3 
> decimal(30,2));
> insert into cbo_test values ("00140006375905", 10230.72, 
> 10230.72, 10230.69), ("00140006375905", 10230.72, 10230.72, 
> 10230.69), ("00140006375905", 10230.72, 10230.72, 10230.69), 
> ("00140006375905", 10230.72, 10230.72, 10230.69), 
> ("00140006375905", 10230.72, 10230.72, 10230.69), 
> ("00140006375905", 10230.72, 10230.72, 10230.69);
> select stddev(v1), stddev(v2), stddev(v3) from cbo_test;
> *Enable CBO*
> ++
> |  Explain   |
> ++
> | Plan optimized by CBO. |
> ||
> | Vertex dependency in root stage|
> | Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE)|
> ||
> | Stage-0|
> |   Fetch Operator   |
> | limit:-1   |
> | Stage-1|
> |   Reducer 2 vectorized |
> |   File Output Operator [FS_13] |
> | Select Operator [SEL_12] (rows=1 width=24) |
> |   Output:["_col0","_col1","_col2"] |
> |   Group By Operator [GBY_11] (rows=1 width=72) |
> | 
> Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"],aggregations:["sum(VALUE._col0)","sum(VALUE._col1)","count(VALUE._col2)","sum(VALUE._col3)","sum(VALUE._col4)","count(VALUE._col5)","sum(VALUE._col6)","sum(VALUE._col7)","count(VALUE._col8)"]
>  |
> |   <-Map 1 [CUSTOM_SIMPLE_EDGE] vectorized  |
> | PARTITION_ONLY_SHUFFLE [RS_10] |
> |   Group By Operator [GBY_9] (rows=1 width=72) |
> | 
> Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"],aggregations:["sum(_col3)","sum(_col0)","count(_col0)","sum(_col5)","sum(_col4)","count(_col1)","sum(_col7)","sum(_col6)","count(_col2)"]
>  |
> | Select Operator [SEL_8] (rows=6 width=232) |
> |   
> Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7"] |
> |   TableScan [TS_0] (rows=6 width=232) |
> | default@cbo_test,cbo_test, ACID 
> table,Tbl:COMPLETE,Col:COMPLETE,Output:["v1","v2","v3"] |
> ||
> ++
> *Query Result* 
> _c0   _c1 _c2
> 0.0   NaN NaN
> *Disable CBO*
> ++
> |  Explain   |
> ++
> | Vertex dependency in root stage|
> | Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE)|
> ||
> | Stage-0|
> |   Fetch Operator   |
> | limit:-1   |
> | Stage-1|
> |   Reducer 2 vectorized |
> |   File Output Operator [FS_11] |
> | Group By Operator [GBY_10] (rows=1 width=24) |
> |   
> Output:["_col0","_col1","_col2"],aggregations:["stddev(VALUE._col0)","stddev(VALUE._col1)","stddev(VALUE._col2)"]
>  |
> | <-Map 1 [CUSTOM_SIMPLE_EDGE] vectorized|
> |   PARTITION_ONLY_SHUFFLE [RS_9]|
> | Group By Operator [GBY_8] (rows=1 width=240) |
> |   
> Output:["_col0","_col1","_col2"],aggregations:["stddev(v1)","stddev(v2)","stddev(v3)"]
>  |
> |   Select Operator [SEL_7] (rows=6 width=232) |
> | Output:["v1","v2","v3"]|
> | TableScan [TS_0] (rows=6 width=232) |
> |   default@cbo_test,cbo_test, ACID 
> table,Tbl:COMPLETE,Col:COMPLETE,Output:["v1","v2","v3"] |
> ||
> 

[jira] [Created] (HIVE-27806) Backport of HIVE-20536, HIVE-20632, HIVE-20511, HIVE-20560, HIVE-20631, HIVE-20637, HIVE-20609, HIVE-20439

2023-10-17 Thread Aman Raj (Jira)
Aman Raj created HIVE-27806:
---

 Summary: Backport of HIVE-20536, HIVE-20632, HIVE-20511, 
HIVE-20560, HIVE-20631, HIVE-20637, HIVE-20609, HIVE-20439
 Key: HIVE-27806
 URL: https://issues.apache.org/jira/browse/HIVE-27806
 Project: Hive
  Issue Type: Sub-task
Affects Versions: 3.2.0
Reporter: Aman Raj
Assignee: Aman Raj








[jira] [Commented] (HIVE-27786) Iceberg: Eliminate engine.hive.enabled table property

2023-10-17 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17776462#comment-17776462
 ] 

Ayush Saxena commented on HIVE-27786:
-

Committed to master.

Thanx [~dkuzmenko] for the review!!!

> Iceberg: Eliminate engine.hive.enabled table property
> -
>
> Key: HIVE-27786
> URL: https://issues.apache.org/jira/browse/HIVE-27786
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
>
> Hive Iceberg tables persist the *engine.hive.enabled* property; attempt to 
> eliminate it and make sure things work without it





[jira] [Resolved] (HIVE-27786) Iceberg: Eliminate engine.hive.enabled table property

2023-10-17 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena resolved HIVE-27786.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

> Iceberg: Eliminate engine.hive.enabled table property
> -
>
> Key: HIVE-27786
> URL: https://issues.apache.org/jira/browse/HIVE-27786
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Hive Iceberg tables persist the *engine.hive.enabled* property; attempt to 
> eliminate it and make sure things work without it





[jira] [Resolved] (HIVE-27676) Reuse the add_partitions logic for add_partition in ObjectStore

2023-10-17 Thread Sai Hemanth Gantasala (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sai Hemanth Gantasala resolved HIVE-27676.
--
Resolution: Fixed

[~wechar] - Patch merged to the master branch. Thanks for the contribution.

> Reuse the add_partitions logic for add_partition in ObjectStore
> ---
>
> Key: HIVE-27676
> URL: https://issues.apache.org/jira/browse/HIVE-27676
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Affects Versions: 4.0.0-beta-1
>Reporter: Wechar
>Assignee: Wechar
>Priority: Major
>  Labels: pull-request-available
>
> HIVE-26035 implements direct SQL for {{add_partitions}} to improve 
> performance. We can also reuse this logic for {{add_partition}}, with the 
> following benefits (see the sketch below):
> * Gets the performance improvement of direct SQL
> * Cleaner code, less duplication.
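
A minimal illustrative sketch of the reuse idea, with hypothetical names rather
than the actual ObjectStore signatures: route the single-partition call through
the batched path so it inherits the direct-SQL implementation from HIVE-26035.

{code:java}
import java.util.Collections;
import java.util.List;

// Hypothetical sketch only -- class and method names are illustrative,
// not the real ObjectStore API.
class PartitionStoreSketch {

  // Batched path: imagine the direct-SQL bulk insert added by HIVE-26035 here.
  boolean addPartitions(List<String> partNames) {
    return !partNames.isEmpty();
  }

  // The single-partition path becomes a thin wrapper over the batched path,
  // so it gets the direct-SQL speedup and removes duplicated code.
  boolean addPartition(String partName) {
    return addPartitions(Collections.singletonList(partName));
  }
}
{code}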





[jira] [Updated] (HIVE-27804) Implement batching in getPartition calls which returns partition list along with auth info

2023-10-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-27804:
--
Labels: pull-request-available  (was: )

> Implement batching in getPartition calls which returns partition list along 
> with auth info
> --
>
> Key: HIVE-27804
> URL: https://issues.apache.org/jira/browse/HIVE-27804
> Project: Hive
>  Issue Type: Bug
>Reporter: Vikram Ahuja
>Assignee: Vikram Ahuja
>Priority: Major
>  Labels: pull-request-available
>
> The Hive.getPartitions() methods return the partition list along with auth 
> info in one HMS call. When made on wide tables (> 2000 columns) with a very 
> large number of partitions (100,000+), these calls can cause memory-related 
> issues while the data is being transferred from HMS to HS2 over Thrift. These 
> APIs can be optimised by using the PartitionIterable implementation, where the 
> partition list is fetched in batches of a smaller size rather than in one huge call.
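
A minimal illustrative sketch of the batching idea, with hypothetical names
rather than Hive's actual PartitionIterable API: fetch partition names once,
then pull the partition objects (with auth info) in fixed-size batches so no
single HMS response has to carry 100,000+ wide partitions.

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

// Hypothetical sketch only -- not the real PartitionIterable implementation.
class BatchedPartitionFetcher<P> {

  // fetchRange(start, end) stands in for one smaller HMS call returning the
  // partitions [start, end) of the name list, including their auth info.
  List<P> fetchAll(List<String> partNames, int batchSize,
                   BiFunction<Integer, Integer, List<P>> fetchRange) {
    List<P> result = new ArrayList<>(partNames.size());
    for (int start = 0; start < partNames.size(); start += batchSize) {
      int end = Math.min(start + batchSize, partNames.size());
      result.addAll(fetchRange.apply(start, end)); // one bounded round-trip per batch
    }
    return result;
  }
}
{code}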





[jira] [Created] (HIVE-27805) Hive server2 connections limits bug

2023-10-17 Thread Xiwei Wang (Jira)
Xiwei Wang created HIVE-27805:
-

 Summary: Hive server2 connections limits bug
 Key: HIVE-27805
 URL: https://issues.apache.org/jira/browse/HIVE-27805
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 3.1.3, 3.1.2, 3.1.1, 3.0.0, 3.1.0
 Environment: image: apache/hive:3.1.3

jdbc driver: hive-jdbc:2.1.0
Reporter: Xiwei Wang


When I use JDBC and specify a non-existent database to connect to a HiveServer2 
that is configured with hive.server2.limit.connections.per.user=10, a session 
initialization error occurs (org.apache.hive.service.cli.HiveSQLException: 
Failed to open new session: Database not_exists_db does not exist); and once the 
number of failed attempts exceeds the configured maximum, even a normal 
connection reports an error (org.apache.hive.service.cli.HiveSQLException: 
Connection limit per user reached (user: aeolus limit: 10)).
 
I found that inside the method 
org.apache.hive.service.cli.session.SessionManager#createSession, if session 
initialization fails, the connection count that was increased via 
incrementConnections is never released; once the number of failures exceeds the 
per-user maximum configured by hive.server2.limit.connections.per.user, 
HiveServer2 will not accept any further connections from that user because of 
the limit.
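
A minimal illustrative sketch of the leak and the obvious fix, under the
assumption that the counter is incremented before session initialization (all
names below are hypothetical, not the real SessionManager code): a failed
initialization must decrement the per-user counter again, otherwise failed
attempts permanently consume the hive.server2.limit.connections.per.user budget.

{code:java}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch only -- not the real SessionManager implementation.
class ConnectionLimitSketch {
  private final ConcurrentHashMap<String, AtomicInteger> perUser = new ConcurrentHashMap<>();
  private final int limitPerUser = 10; // stands in for hive.server2.limit.connections.per.user

  void openSession(String user, Runnable initSession) {
    AtomicInteger count = perUser.computeIfAbsent(user, u -> new AtomicInteger());
    if (count.incrementAndGet() > limitPerUser) {
      count.decrementAndGet();
      throw new IllegalStateException("Connection limit per user reached (user: " + user + ")");
    }
    boolean opened = false;
    try {
      initSession.run(); // may throw, e.g. "Failed to open new session"
      opened = true;
    } finally {
      if (!opened) {
        count.decrementAndGet(); // release the slot so failed attempts do not leak it
      }
    }
  }
}
{code}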
 
{code:java}
2023-10-17T12:14:54,313  WARN [HiveServer2-Handler-Pool: Thread-3329] 
thrift.ThriftCLIService: Error opening session:
org.apache.hive.service.cli.HiveSQLException: Failed to open new session: 
Database not_exists_db does not exist
    at 
org.apache.hive.service.cli.session.SessionManager.createSession(SessionManager.java:434)
 ~[hive-service-3.1.3.jar:3.1.3]
    at 
org.apache.hive.service.cli.session.SessionManager.openSession(SessionManager.java:373)
 ~[hive-service-3.1.3.jar:3.1.3]
    at org.apache.hive.service.cli.CLIService.openSession(CLIService.java:187) 
~[hive-service-3.1.3.jar:3.1.3]
    at 
org.apache.hive.service.cli.thrift.ThriftCLIService.getSessionHandle(ThriftCLIService.java:475)
 ~[hive-service-3.1.3.jar:3.1.3]
    at 
org.apache.hive.service.cli.thrift.ThriftCLIService.OpenSession(ThriftCLIService.java:322)
 ~[hive-service-3.1.3.jar:3.1.3]
    at 
org.apache.hive.service.rpc.thrift.TCLIService$Processor$OpenSession.getResult(TCLIService.java:1497)
 ~[hive-exec-3.1.3.jar:3.1.3]
    at 
org.apache.hive.service.rpc.thrift.TCLIService$Processor$OpenSession.getResult(TCLIService.java:1482)
 ~[hive-exec-3.1.3.jar:3.1.3]
    at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) 
~[hive-exec-3.1.3.jar:3.1.3]
    at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) 
~[hive-exec-3.1.3.jar:3.1.3]
    at 
org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
 ~[hive-service-3.1.3.jar:3.1.3]
    at 
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
 ~[hive-exec-3.1.3.jar:3.1.3]
    at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
~[?:1.8.0_342]
    at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
~[?:1.8.0_342]
    at java.lang.Thread.run(Thread.java:750) [?:1.8.0_342]
Caused by: org.apache.hive.service.cli.HiveSQLException: Database dw_aeolus 
does not exist
    at 
org.apache.hive.service.cli.session.HiveSessionImpl.configureSession(HiveSessionImpl.java:294)
 ~[hive-service-3.1.3.jar:3.1.3]
    at 
org.apache.hive.service.cli.session.HiveSessionImpl.open(HiveSessionImpl.java:199)
 ~[hive-service-3.1.3.jar:3.1.3]
    at 
org.apache.hive.service.cli.session.SessionManager.createSession(SessionManager.java:425)
 ~[hive-service-3.1.3.jar:3.1.3]
    ... 13 more
2023-10-17T12:14:54,972  INFO [HiveServer2-Handler-Pool: Thread-3330] 
thrift.ThriftCLIService: Client protocol version: HIVE_CLI_SERVICE_PROTOCOL_V8
2023-10-17T12:14:54,973 ERROR [HiveServer2-Handler-Pool: Thread-3330] 
service.CompositeService: Connection limit per user reached (user: aeolus 
limit: 10)
2023-10-17T12:14:54,973  WARN [HiveServer2-Handler-Pool: Thread-3330] 
thrift.ThriftCLIService: Error opening session:
org.apache.hive.service.cli.HiveSQLException: Connection limit per user reached 
(user: aeolus limit: 10)
    at 
org.apache.hive.service.cli.session.SessionManager.incrementConnections(SessionManager.java:476)
 ~[hive-service-3.1.3.jar:3.1.3]
    at 
org.apache.hive.service.cli.session.SessionManager.createSession(SessionManager.java:383)
 ~[hive-service-3.1.3.jar:3.1.3]
    at 
org.apache.hive.service.cli.session.SessionManager.openSession(SessionManager.java:373)
 ~[hive-service-3.1.3.jar:3.1.3]
    at org.apache.hive.service.cli.CLIService.openSession(CLIService.java:187) 
~[hive-service-3.1.3.jar:3.1.3]
    at 
org.apache.hive.service.cli.thrift.ThriftCLIService.getSessionHandle(ThriftCLIService.java:475)

[jira] [Updated] (HIVE-27804) Implement batching in getPartition calls which returns partition list along with auth info

2023-10-17 Thread Vikram Ahuja (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Ahuja updated HIVE-27804:

Summary: Implement batching in getPartition calls which returns partition 
list along with auth info  (was: Implement batching in getPartition calls which 
returns auth info)

> Implement batching in getPartition calls which returns partition list along 
> with auth info
> --
>
> Key: HIVE-27804
> URL: https://issues.apache.org/jira/browse/HIVE-27804
> Project: Hive
>  Issue Type: Bug
>Reporter: Vikram Ahuja
>Assignee: Vikram Ahuja
>Priority: Major
>
> The Hive.getPartitions() methods return the partition list along with auth 
> info in one HMS call. When made on wide tables (> 2000 columns) with a very 
> large number of partitions (100,000+), these calls can cause memory-related 
> issues while the data is being transferred from HMS to HS2 over Thrift. These 
> APIs can be optimised by using the PartitionIterable implementation, where the 
> partition list is fetched in batches of a smaller size rather than in one huge call.





[jira] [Updated] (HIVE-27804) Implement batching in getPartition calls which returns auth info

2023-10-17 Thread Vikram Ahuja (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Ahuja updated HIVE-27804:

Description: The Hive.getPartitions() methods return the partition list along 
with auth info in one HMS call. When made on wide tables (> 2000 columns) with a 
very large number of partitions (100,000+), these calls can cause memory-related 
issues while the data is being transferred from HMS to HS2 over Thrift. These 
APIs can be optimised by using the PartitionIterable implementation, where the 
partition list is fetched in batches of a smaller size rather than in one huge 
call.  (was: Hive.getPartitions() methods returns partition list along with 
auth info in one . These calls when made on table with )

> Implement batching in getPartition calls which returns auth info
> 
>
> Key: HIVE-27804
> URL: https://issues.apache.org/jira/browse/HIVE-27804
> Project: Hive
>  Issue Type: Bug
>Reporter: Vikram Ahuja
>Assignee: Vikram Ahuja
>Priority: Major
>
> The Hive.getPartitions() methods return the partition list along with auth 
> info in one HMS call. When made on wide tables (> 2000 columns) with a very 
> large number of partitions (100,000+), these calls can cause memory-related 
> issues while the data is being transferred from HMS to HS2 over Thrift. These 
> APIs can be optimised by using the PartitionIterable implementation, where the 
> partition list is fetched in batches of a smaller size rather than in one huge call.





[jira] [Updated] (HIVE-27804) Implement batching in getPartition calls which returns auth info

2023-10-17 Thread Vikram Ahuja (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Ahuja updated HIVE-27804:

Summary: Implement batching in getPartition calls which returns auth info  
(was: Implement batching in getPartition calls which returns auth info as well)

> Implement batching in getPartition calls which returns auth info
> 
>
> Key: HIVE-27804
> URL: https://issues.apache.org/jira/browse/HIVE-27804
> Project: Hive
>  Issue Type: Bug
>Reporter: Vikram Ahuja
>Assignee: Vikram Ahuja
>Priority: Major
>
> Hive.getPartitions() methods returns partition list along with auth info in 
> one . These calls when made on table with 





[jira] [Updated] (HIVE-27804) Implement batching in getPartition calls which returns auth info as well

2023-10-17 Thread Vikram Ahuja (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Ahuja updated HIVE-27804:

Description: Hive.getPartitions() methods returns partition list along with 
auth info in one . These calls when made on table with 

> Implement batching in getPartition calls which returns auth info as well
> 
>
> Key: HIVE-27804
> URL: https://issues.apache.org/jira/browse/HIVE-27804
> Project: Hive
>  Issue Type: Bug
>Reporter: Vikram Ahuja
>Assignee: Vikram Ahuja
>Priority: Major
>
> Hive.getPartitions() methods returns partition list along with auth info in 
> one . These calls when made on table with 





[jira] [Created] (HIVE-27804) Implement batching in getPartition calls which returns auth info as well

2023-10-17 Thread Vikram Ahuja (Jira)
Vikram Ahuja created HIVE-27804:
---

 Summary: Implement batching in getPartition calls which returns 
auth info as well
 Key: HIVE-27804
 URL: https://issues.apache.org/jira/browse/HIVE-27804
 Project: Hive
  Issue Type: Bug
Reporter: Vikram Ahuja
Assignee: Vikram Ahuja








[jira] [Updated] (HIVE-27797) Transactions that got timed out are not getting logged as ABORTED transactions in NOTIFICATION_LOG

2023-10-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-27797:
--
Labels: pull-request-available  (was: )

> Transactions that got timed out are not getting logged as ABORTED 
> transactions in NOTIFICATION_LOG
> --
>
> Key: HIVE-27797
> URL: https://issues.apache.org/jira/browse/HIVE-27797
> Project: Hive
>  Issue Type: Bug
>  Components: repl, Transactions
>Reporter: Taraka Rama Rao Lethavadla
>Assignee: Taraka Rama Rao Lethavadla
>Priority: Major
>  Labels: pull-request-available
>
> +Scenario:+
> Let's say there are 100 transactions opened. These 100 will be logged in the 
> notification_log and, when replicated, they will get created in the target 
> cluster.
> Now 50 out of these 100 transactions get aborted due to timeout and are 
> removed from HMS. In this step, we do not log those transactions into the 
> notification_log.
> So the next time we replicate, these 50 aborted transactions will not be 
> replicated.
> As a result, in the target cluster the transactions that got created earlier 
> will only get removed after the number of days configured in 
> {code:java}
> hive.repl.txn.timeout (11 days default){code}
> We already have logic to log aborted transactions when they are aborted for 
> some other reason, but not for those that time out.
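
A minimal illustrative sketch of the gap described above, with hypothetical
names rather than the real metastore transaction or notification classes: the
timeout reaper should emit the same abort notification that the explicit-abort
path already emits.

{code:java}
import java.util.List;

// Hypothetical sketch only -- not the real HMS transaction handling code.
class TxnTimeoutReaperSketch {

  interface NotificationLog {
    void addAbortTxnEvent(long txnId); // stands in for writing to NOTIFICATION_LOG
  }

  void abortTimedOutTxns(List<Long> timedOutTxnIds, NotificationLog log) {
    for (long txnId : timedOutTxnIds) {
      // ... abort the transaction in the metastore (elided) ...
      log.addAbortTxnEvent(txnId); // the step this ticket reports as missing for timeouts
    }
  }
}
{code}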





[jira] [Updated] (HIVE-27797) Transactions that got timed out are not getting logged as ABORTED transactions in NOTIFICATION_LOG

2023-10-17 Thread Taraka Rama Rao Lethavadla (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Taraka Rama Rao Lethavadla updated HIVE-27797:
--
Description: 
+Scenario:+

Let's say there are 100 transactions opened. These 100 will be logged in the 
notification_log and, when replicated, they will get created in the target 
cluster.

Now 50 out of these 100 transactions get aborted due to timeout and are removed 
from HMS. In this step, we do not log those transactions into the 
notification_log.

So the next time we replicate, these 50 aborted transactions will not be 
replicated.

As a result, in the target cluster the transactions that got created earlier 
will only get removed after the number of days configured in 
{code:java}
hive.repl.txn.timeout (11 days default){code}
We already have logic to log aborted transactions when they are aborted for 
some other reason, but not for those that time out.

  was:
+Scenario:+

Let's there are 100 transactions opened. These 100 will be logged in 
notification_log and when replicated, they will get created in target cluster. 

Now 50 out of these 100 transactions got aborted due to timeout and got removed 
from HMS. In this step, we are not logging those transactions in to 
notification_log. 

So next time when we do replication, these 50 aborted transactions will not be 
replicated.

As a result in the target cluster the transactions that got created earlier 
will stay forever without getting cleaned

Actually, we have the logic to log aborted transactions if they got aborted for 
some other reason but not for those getting timed out.


> Transactions that got timed out are not getting logged as ABORTED 
> transactions in NOTIFICATION_LOG
> --
>
> Key: HIVE-27797
> URL: https://issues.apache.org/jira/browse/HIVE-27797
> Project: Hive
>  Issue Type: Bug
>  Components: repl, Transactions
>Reporter: Taraka Rama Rao Lethavadla
>Assignee: Taraka Rama Rao Lethavadla
>Priority: Major
>
> +Scenario:+
> Let's say there are 100 transactions opened. These 100 will be logged in the 
> notification_log and, when replicated, they will get created in the target 
> cluster.
> Now 50 out of these 100 transactions get aborted due to timeout and are 
> removed from HMS. In this step, we do not log those transactions into the 
> notification_log.
> So the next time we replicate, these 50 aborted transactions will not be 
> replicated.
> As a result, in the target cluster the transactions that got created earlier 
> will only get removed after the number of days configured in 
> {code:java}
> hive.repl.txn.timeout (11 days default){code}
> We already have logic to log aborted transactions when they are aborted for 
> some other reason, but not for those that time out.





[jira] [Commented] (HIVE-27788) Exception in Sort Merge join with Group By + PTF Operator

2023-10-17 Thread Krisztian Kasa (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17776189#comment-17776189
 ] 

Krisztian Kasa commented on HIVE-27788:
---

Another repro with 3 records and an inner join:
{code}
set hive.optimize.semijoin.conversion = false;

CREATE TABLE tbl1_n5(key int, value string) CLUSTERED BY (key) SORTED BY (key) 
INTO 2 BUCKETS;

insert into tbl1_n5(key, value)
values
(0, 'val_0'),
(2, 'val_2'),
(9, 'val_9');

explain
SELECT t1.key from
(SELECT  key , row_number() over(partition by key order by value desc) as rk 
from tbl1_n5) t1
join
( SELECT key,count(distinct value) as cp_count from tbl1_n5 group by key) t2
on t1.key = t2.key where rk = 1;
{code}
{code}
POSTHOOK: query: explain
SELECT t1.key from
(SELECT  key , row_number() over(partition by key order by value desc) as rk 
from tbl1_n5) t1
join
( SELECT key,count(distinct value) as cp_count from tbl1_n5 group by key) t2
on t1.key = t2.key where rk = 1
POSTHOOK: type: QUERY
POSTHOOK: Input: default@tbl1_n5
 A masked pattern was here 
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
Tez
 A masked pattern was here 
  Edges:
Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 3 (SIMPLE_EDGE)
 A masked pattern was here 
  Vertices:
Map 1 
Map Operator Tree:
TableScan
  alias: tbl1_n5
  filterExpr: key is not null (type: boolean)
  Statistics: Num rows: 3 Data size: 279 Basic stats: COMPLETE 
Column stats: COMPLETE
  Filter Operator
predicate: key is not null (type: boolean)
Statistics: Num rows: 3 Data size: 279 Basic stats: 
COMPLETE Column stats: COMPLETE
Reduce Output Operator
  key expressions: key (type: int), value (type: string)
  null sort order: aa
  sort order: +-
  Map-reduce partition columns: key (type: int)
  Statistics: Num rows: 3 Data size: 279 Basic stats: 
COMPLETE Column stats: COMPLETE
Execution mode: vectorized, llap
LLAP IO: all inputs
Map 3 
Map Operator Tree:
TableScan
  alias: tbl1_n5
  filterExpr: key is not null (type: boolean)
  Statistics: Num rows: 3 Data size: 279 Basic stats: COMPLETE 
Column stats: COMPLETE
  Filter Operator
predicate: key is not null (type: boolean)
Statistics: Num rows: 3 Data size: 279 Basic stats: 
COMPLETE Column stats: COMPLETE
Group By Operator
  keys: key (type: int), value (type: string)
  minReductionHashAggr: 0.4
  mode: hash
  outputColumnNames: _col0, _col1
  Statistics: Num rows: 3 Data size: 279 Basic stats: 
COMPLETE Column stats: COMPLETE
  Reduce Output Operator
key expressions: _col0 (type: int), _col1 (type: string)
null sort order: zz
sort order: ++
Map-reduce partition columns: _col0 (type: int)
Statistics: Num rows: 3 Data size: 279 Basic stats: 
COMPLETE Column stats: COMPLETE
Execution mode: vectorized, llap
LLAP IO: all inputs
Reducer 2 
Reduce Operator Tree:
  Group By Operator
keys: KEY._col0 (type: int), KEY._col1 (type: string)
mode: mergepartial
outputColumnNames: _col0, _col1
Statistics: Num rows: 3 Data size: 279 Basic stats: COMPLETE 
Column stats: COMPLETE
Select Operator
  expressions: _col0 (type: int)
  outputColumnNames: _col0
  Statistics: Num rows: 3 Data size: 279 Basic stats: COMPLETE 
Column stats: COMPLETE
  Group By Operator
keys: _col0 (type: int)
mode: complete
outputColumnNames: _col0
Statistics: Num rows: 3 Data size: 12 Basic stats: COMPLETE 
Column stats: COMPLETE
Dummy Store
Execution mode: llap
Reduce Operator Tree:
  Select Operator
expressions: KEY.reducesinkkey0 (type: int), KEY.reducesinkkey1 
(type: string)
outputColumnNames: _col0, _col1
Statistics: Num rows: 3 Data size: 279 Basic stats: COMPLETE 
Column stats: COMPLETE
PTF Operator
  Function definitions:
  Input definition
input alias: ptf_0
output shape: 

[jira] [Created] (HIVE-27803) Bump org.apache.avro:avro from 1.11.1 to 1.11.3

2023-10-17 Thread Ayush Saxena (Jira)
Ayush Saxena created HIVE-27803:
---

 Summary: Bump org.apache.avro:avro from 1.11.1 to 1.11.3
 Key: HIVE-27803
 URL: https://issues.apache.org/jira/browse/HIVE-27803
 Project: Hive
  Issue Type: Improvement
Reporter: Ayush Saxena


PR from *[dependabot|https://github.com/apps/dependabot]*

https://github.com/apache/hive/pull/4764





[jira] [Commented] (HIVE-25351) stddev(), sstddev_pop() with CBO enable returning null

2023-10-17 Thread Dayakar M (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17776105#comment-17776105
 ] 

Dayakar M commented on HIVE-25351:
--

[~Pritha] Are you working on this issue? If not, please let me know and I will 
work on it. Thanks

> stddev(), sstddev_pop() with CBO enable returning null
> --
>
> Key: HIVE-25351
> URL: https://issues.apache.org/jira/browse/HIVE-25351
> Project: Hive
>  Issue Type: Bug
>Reporter: Ashish Sharma
>Assignee: Pritha Dawn
>Priority: Blocker
>
> *script used to repro*
> create table cbo_test (key string, v1 double, v2 decimal(30,2), v3 
> decimal(30,2));
> insert into cbo_test values ("00140006375905", 10230.72, 
> 10230.72, 10230.69), ("00140006375905", 10230.72, 10230.72, 
> 10230.69), ("00140006375905", 10230.72, 10230.72, 10230.69), 
> ("00140006375905", 10230.72, 10230.72, 10230.69), 
> ("00140006375905", 10230.72, 10230.72, 10230.69), 
> ("00140006375905", 10230.72, 10230.72, 10230.69);
> select stddev(v1), stddev(v2), stddev(v3) from cbo_test;
> *Enable CBO*
> ++
> |  Explain   |
> ++
> | Plan optimized by CBO. |
> ||
> | Vertex dependency in root stage|
> | Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE)|
> ||
> | Stage-0|
> |   Fetch Operator   |
> | limit:-1   |
> | Stage-1|
> |   Reducer 2 vectorized |
> |   File Output Operator [FS_13] |
> | Select Operator [SEL_12] (rows=1 width=24) |
> |   Output:["_col0","_col1","_col2"] |
> |   Group By Operator [GBY_11] (rows=1 width=72) |
> | 
> Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"],aggregations:["sum(VALUE._col0)","sum(VALUE._col1)","count(VALUE._col2)","sum(VALUE._col3)","sum(VALUE._col4)","count(VALUE._col5)","sum(VALUE._col6)","sum(VALUE._col7)","count(VALUE._col8)"]
>  |
> |   <-Map 1 [CUSTOM_SIMPLE_EDGE] vectorized  |
> | PARTITION_ONLY_SHUFFLE [RS_10] |
> |   Group By Operator [GBY_9] (rows=1 width=72) |
> | 
> Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"],aggregations:["sum(_col3)","sum(_col0)","count(_col0)","sum(_col5)","sum(_col4)","count(_col1)","sum(_col7)","sum(_col6)","count(_col2)"]
>  |
> | Select Operator [SEL_8] (rows=6 width=232) |
> |   
> Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7"] |
> |   TableScan [TS_0] (rows=6 width=232) |
> | default@cbo_test,cbo_test, ACID 
> table,Tbl:COMPLETE,Col:COMPLETE,Output:["v1","v2","v3"] |
> ||
> ++
> *Query Result* 
> _c0   _c1 _c2
> 0.0   NaN NaN
> *Disable CBO*
> ++
> |  Explain   |
> ++
> | Vertex dependency in root stage|
> | Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE)|
> ||
> | Stage-0|
> |   Fetch Operator   |
> | limit:-1   |
> | Stage-1|
> |   Reducer 2 vectorized |
> |   File Output Operator [FS_11] |
> | Group By Operator [GBY_10] (rows=1 width=24) |
> |   
> Output:["_col0","_col1","_col2"],aggregations:["stddev(VALUE._col0)","stddev(VALUE._col1)","stddev(VALUE._col2)"]
>  |
> | <-Map 1 [CUSTOM_SIMPLE_EDGE] vectorized|
> |   PARTITION_ONLY_SHUFFLE [RS_9]|
> | Group By Operator [GBY_8] (rows=1 width=240) |
> |   
> Output:["_col0","_col1","_col2"],aggregations:["stddev(v1)","stddev(v2)","stddev(v3)"]
>  |
> |   Select Operator [SEL_7] (rows=6 width=232) |
> | Output:["v1","v2","v3"]|
> | TableScan [TS_0] (rows=6 width=232) |
> |   default@cbo_test,cbo_test, ACID 
> 

[jira] [Commented] (HIVE-27746) Hive Metastore should send single AlterPartitionEvent with list of partitions

2023-10-17 Thread Quanlong Huang (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17776049#comment-17776049
 ] 

Quanlong Huang commented on HIVE-27746:
---

Another benefit of this is that partition events on different tables won't 
interleave, so event consumers like Impala can handle such batch events more 
efficiently (IMPALA-12463).

It'd be nice if we could do the same for partition-level insert events.

> Hive Metastore should send single AlterPartitionEvent with list of partitions
> -
>
> Key: HIVE-27746
> URL: https://issues.apache.org/jira/browse/HIVE-27746
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Reporter: Naveen Gangam
>Assignee: Zhihua Deng
>Priority: Major
>
> In HIVE-3938, work was done to send a single AddPartitionEvent for APIs that 
> add partitions in bulk. Similarly, we have alter_partitions APIs that alter 
> partitions in bulk via a single HMS call. For such events, we should also 
> send a single AlterPartitionEvent with a list of partitions in it.
> This would be far more efficient than having to send and process them 
> individually.
> This fix will be incompatible with older clients that expect a single 
> partition.
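
A minimal illustrative sketch of the batched event shape proposed here, with
hypothetical types rather than the real listener/event classes, mirroring what
HIVE-3938 did for AddPartitionEvent:

{code:java}
import java.util.List;

// Hypothetical sketch only -- not the real metastore event API.
class AlterPartitionsEventSketch {
  final String dbName;
  final String tableName;
  final List<String> partitionNames; // every partition altered in the single HMS call

  AlterPartitionsEventSketch(String dbName, String tableName, List<String> partitionNames) {
    this.dbName = dbName;
    this.tableName = tableName;
    this.partitionNames = partitionNames;
  }
}
{code}

One event instance per bulk alter_partitions call would replace the 
per-partition events that older clients expect, which is the compatibility note 
in the description above.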





[jira] [Work started] (HIVE-27791) Eliminate totalSize check from test

2023-10-17 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-27791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-27791 started by Zoltán Rátkai.

> Eliminate totalSize check from test
> ---
>
> Key: HIVE-27791
> URL: https://issues.apache.org/jira/browse/HIVE-27791
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltán Rátkai
>Assignee: Zoltán Rátkai
>Priority: Major
>
> As discussed in this ticket, totalSize checks need to be eliminated from 
> tests:
> https://github.com/apache/hive/pull/4690


