[jira] [Created] (HIVE-27807) Backport of HIVE-20629, HIVE-20705, HIVE-20734
Aman Raj created HIVE-27807: --- Summary: Backport of HIVE-20629, HIVE-20705, HIVE-20734 Key: HIVE-27807 URL: https://issues.apache.org/jira/browse/HIVE-27807 Project: Hive Issue Type: Sub-task Affects Versions: 3.2.0 Reporter: Aman Raj Assignee: Aman Raj -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-25351) stddev(), stddev_pop() with CBO enable returning null
[ https://issues.apache.org/jira/browse/HIVE-25351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dayakar M updated HIVE-25351: - Summary: stddev(), stddev_pop() with CBO enable returning null (was: stddev(), sstddev_pop() with CBO enable returning null) > stddev(), stddev_pop() with CBO enable returning null > - > > Key: HIVE-25351 > URL: https://issues.apache.org/jira/browse/HIVE-25351 > Project: Hive > Issue Type: Bug >Reporter: Ashish Sharma >Assignee: Dayakar M >Priority: Blocker > > *script used to repro* > create table cbo_test (key string, v1 double, v2 decimal(30,2), v3 > decimal(30,2)); > insert into cbo_test values ("00140006375905", 10230.72, > 10230.72, 10230.69), ("00140006375905", 10230.72, 10230.72, > 10230.69), ("00140006375905", 10230.72, 10230.72, 10230.69), > ("00140006375905", 10230.72, 10230.72, 10230.69), > ("00140006375905", 10230.72, 10230.72, 10230.69), > ("00140006375905", 10230.72, 10230.72, 10230.69); > select stddev(v1), stddev(v2), stddev(v3) from cbo_test; > *Enable CBO* > ++ > | Explain | > ++ > | Plan optimized by CBO. 
| > || > | Vertex dependency in root stage| > | Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE)| > || > | Stage-0| > | Fetch Operator | > | limit:-1 | > | Stage-1| > | Reducer 2 vectorized | > | File Output Operator [FS_13] | > | Select Operator [SEL_12] (rows=1 width=24) | > | Output:["_col0","_col1","_col2"] | > | Group By Operator [GBY_11] (rows=1 width=72) | > | > Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"],aggregations:["sum(VALUE._col0)","sum(VALUE._col1)","count(VALUE._col2)","sum(VALUE._col3)","sum(VALUE._col4)","count(VALUE._col5)","sum(VALUE._col6)","sum(VALUE._col7)","count(VALUE._col8)"] > | > | <-Map 1 [CUSTOM_SIMPLE_EDGE] vectorized | > | PARTITION_ONLY_SHUFFLE [RS_10] | > | Group By Operator [GBY_9] (rows=1 width=72) | > | > Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"],aggregations:["sum(_col3)","sum(_col0)","count(_col0)","sum(_col5)","sum(_col4)","count(_col1)","sum(_col7)","sum(_col6)","count(_col2)"] > | > | Select Operator [SEL_8] (rows=6 width=232) | > | > Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7"] | > | TableScan [TS_0] (rows=6 width=232) | > | default@cbo_test,cbo_test, ACID > table,Tbl:COMPLETE,Col:COMPLETE,Output:["v1","v2","v3"] | > || > ++ > *Query Result* > _c0 _c1 _c2 > 0.0 NaN NaN > *Disable CBO* > ++ > | Explain | > ++ > | Vertex dependency in root stage| > | Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE)| > || > | Stage-0| > | Fetch Operator | > | limit:-1 | > | Stage-1| > | Reducer 2 vectorized | > | File Output Operator [FS_11] | > | Group By Operator [GBY_10] (rows=1 width=24) | > | > Output:["_col0","_col1","_col2"],aggregations:["stddev(VALUE._col0)","stddev(VALUE._col1)","stddev(VALUE._col2)"] > | > | <-Map 1 [CUSTOM_SIMPLE_EDGE] vectorized| > | PARTITION_ONLY_SHUFFLE [RS_9]| > | Group By Operator [GBY_8] (rows=1 width=240) | > | > Output:["_col0","_col1","_col2"],aggregations:["stddev(v1)","stddev(v2)","stddev(v3)"] > | > | Select Operator 
[SEL_7] (rows=6 width=232) | > | Output:["v1","v2","v3"]| > | TableScan [TS_0] (rows=6 width=232) | > | default@cbo_test,cbo_test, ACID >
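One plausible way to see why the CBO-rewritten plan above (which aggregates sum/sum-of-squares/count instead of calling stddev directly) can return NaN for a column of identical values is catastrophic cancellation in the one-pass variance formula. The sketch below is illustrative Python only; it assumes the rewritten plan effectively computes var = sum(x^2)/n - (sum(x)/n)^2 and does not claim to reproduce Hive's exact arithmetic:

```python
import math
import statistics

data = [10230.72] * 6  # same constant repeated, so the true stddev is exactly 0

# One-pass "textbook" formula: var = sum(x^2)/n - (sum(x)/n)^2.
# With large, nearly-equal values the two terms almost cancel, and
# floating-point rounding can leave a tiny negative result, whose
# square root is NaN.
n = len(data)
naive_var = sum(x * x for x in data) / n - (sum(data) / n) ** 2

# The cancellation error is tiny relative to the ~1e8 magnitudes involved...
assert abs(naive_var) < 1e-4

# ...but if it lands below zero, taking the square root yields NaN,
# matching the NaN columns in the reported query result.
naive_std = math.sqrt(naive_var) if naive_var >= 0 else float("nan")

# A numerically careful computation (statistics.pstdev uses exact
# rational arithmetic internally) returns the correct answer:
assert statistics.pstdev(data) == 0.0
```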
[jira] [Commented] (HIVE-25351) stddev(), sstddev_pop() with CBO enable returning null
[ https://issues.apache.org/jira/browse/HIVE-25351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17776470#comment-17776470 ] Dayakar M commented on HIVE-25351: -- There is no activity for the last 2 years so assigned myself to work on this. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HIVE-25351) stddev(), sstddev_pop() with CBO enable returning null
[ https://issues.apache.org/jira/browse/HIVE-25351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dayakar M reassigned HIVE-25351: Assignee: Dayakar M (was: Pritha Dawn) -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27806) Backport of HIVE-20536, HIVE-20632, HIVE-20511, HIVE-20560, HIVE-20631, HIVE-20637, HIVE-20609, HIVE-20439
Aman Raj created HIVE-27806: --- Summary: Backport of HIVE-20536, HIVE-20632, HIVE-20511, HIVE-20560, HIVE-20631, HIVE-20637, HIVE-20609, HIVE-20439 Key: HIVE-27806 URL: https://issues.apache.org/jira/browse/HIVE-27806 Project: Hive Issue Type: Sub-task Affects Versions: 3.2.0 Reporter: Aman Raj Assignee: Aman Raj -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-27786) Iceberg: Eliminate engine.hive.enabled table property
[ https://issues.apache.org/jira/browse/HIVE-27786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17776462#comment-17776462 ] Ayush Saxena commented on HIVE-27786: - Committed to master. Thanx [~dkuzmenko] for the review!!! > Iceberg: Eliminate engine.hive.enabled table property > - > > Key: HIVE-27786 > URL: https://issues.apache.org/jira/browse/HIVE-27786 > Project: Hive > Issue Type: Improvement >Reporter: Ayush Saxena >Assignee: Ayush Saxena >Priority: Major > Labels: pull-request-available > > Hive Iceberg tables persist the *engine.hive.enabled* table property; attempt to > eliminate it & make sure things work without it -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HIVE-27786) Iceberg: Eliminate engine.hive.enabled table property
[ https://issues.apache.org/jira/browse/HIVE-27786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ayush Saxena resolved HIVE-27786. - Fix Version/s: 4.0.0 Resolution: Fixed > Iceberg: Eliminate engine.hive.enabled table property > - > > Key: HIVE-27786 > URL: https://issues.apache.org/jira/browse/HIVE-27786 > Project: Hive > Issue Type: Improvement >Reporter: Ayush Saxena >Assignee: Ayush Saxena >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Hive Iceberg tables persist the *engine.hive.enabled* table property; attempt to > eliminate it & make sure things work without it -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HIVE-27676) Reuse the add_partitions logic for add_partition in ObjectStore
[ https://issues.apache.org/jira/browse/HIVE-27676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sai Hemanth Gantasala resolved HIVE-27676. -- Resolution: Fixed [~wechar] - Patch merged to the master branch. Thanks for the contribution. > Reuse the add_partitions logic for add_partition in ObjectStore > --- > > Key: HIVE-27676 > URL: https://issues.apache.org/jira/browse/HIVE-27676 > Project: Hive > Issue Type: Improvement > Components: Hive >Affects Versions: 4.0.0-beta-1 >Reporter: Wechar >Assignee: Wechar >Priority: Major > Labels: pull-request-available > > HIVE-26035 implements direct SQL for {{add_partitions}} to improve > performance; we can also reuse this logic for {{add_partition}} with the > following benefits: > * Get the performance improvement from direct SQL > * Cleaner code, reducing duplicate code. -- This message was sent by Atlassian Jira (v8.20.10#820010)
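The reuse described above can be sketched as follows. This is an illustrative Python model, not Hive's actual ObjectStore code; the class and method bodies here are hypothetical stand-ins for the direct-SQL batch path:

```python
class PartitionStore:
    """Illustrative store where the single-partition path reuses the batch path."""

    def __init__(self):
        self._partitions = {}

    def add_partitions(self, parts):
        # Batch path: validate everything first, then insert in one pass.
        # (In Hive, this is where the HIVE-26035 direct-SQL batch insert lives.)
        for name, values in parts:
            if name in self._partitions:
                raise ValueError(f"partition {name} already exists")
        for name, values in parts:
            self._partitions[name] = values
        return len(parts)

    def add_partition(self, name, values):
        # Single-partition path delegates to the batch path with a
        # one-element list, so both share validation and the optimized
        # insert logic instead of duplicating it.
        return self.add_partitions([(name, values)])
```

With this shape, any performance or correctness fix to the batch path automatically applies to the single-partition call.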
[jira] [Updated] (HIVE-27804) Implement batching in getPartition calls which returns partition list along with auth info
[ https://issues.apache.org/jira/browse/HIVE-27804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-27804: -- Labels: pull-request-available (was: ) > Implement batching in getPartition calls which returns partition list along > with auth info > -- > > Key: HIVE-27804 > URL: https://issues.apache.org/jira/browse/HIVE-27804 > Project: Hive > Issue Type: Bug >Reporter: Vikram Ahuja >Assignee: Vikram Ahuja >Priority: Major > Labels: pull-request-available > > The Hive.getPartitions() methods return the partition list along with auth info in > one HMS call. When made on wide tables (> 2000 columns) with a very large > number of partitions (100,000+), these calls can cause memory-related issues while the > data is being transferred from HMS to HS2 over Thrift. These APIs can > be optimised by using the PartitionIterable implementation, where the partition > list is fetched in batches of a smaller size rather than in one huge call. -- This message was sent by Atlassian Jira (v8.20.10#820010)
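The batched-fetch idea behind PartitionIterable can be sketched like this. This is a minimal illustration, not Hive's actual implementation; `fetch_batch` is a hypothetical stand-in for the HMS call that returns partition objects (with auth info) for a set of partition names:

```python
def iter_partitions(fetch_batch, partition_names, batch_size=1000):
    """Yield partitions in fixed-size batches instead of one huge RPC.

    fetch_batch(names) stands in for the metastore call that returns
    the partition objects for the given partition names.
    """
    for start in range(0, len(partition_names), batch_size):
        batch = partition_names[start:start + batch_size]
        # One bounded call per batch keeps peak memory on both sides
        # proportional to batch_size instead of the total partition count.
        for part in fetch_batch(batch):
            yield part
```

The caller consumes the iterator lazily, so only one batch of wide-table partition objects is materialized at a time.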
[jira] [Created] (HIVE-27805) Hive server2 connections limits bug
Xiwei Wang created HIVE-27805: - Summary: Hive server2 connections limits bug Key: HIVE-27805 URL: https://issues.apache.org/jira/browse/HIVE-27805 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 3.1.3, 3.1.2, 3.1.1, 3.0.0, 3.1.0 Environment: image: apache/hive:3.1.3 jdbc driver: hive-jdbc:2.1.0 Reporter: Xiwei Wang When I use JDBC to connect to a HiveServer2 configured with hive.server2.limit.connections.per.user=10 and specify a non-existent database, session initialization fails (org.apache.hive.service.cli.HiveSQLException: Failed to open new session: Database not_exists_db does not exist); once the number of such failed attempts exceeds the configured limit, even a normal connection is rejected (org.apache.hive.service.cli.HiveSQLException: Connection limit per user reached (user: aeolus limit: 10)). I found that inside org.apache.hive.service.cli.session.SessionManager#createSession, if session initialization fails, the connection count raised by incrementConnections is never released; after the number of failures exceeds the configured per-user maximum (hive.server2.limit.connections.per.user), HiveServer2 will not accept any further connections from that user due to the limit.
{code:java} 2023-10-17T12:14:54,313 WARN [HiveServer2-Handler-Pool: Thread-3329] thrift.ThriftCLIService: Error opening session: org.apache.hive.service.cli.HiveSQLException: Failed to open new session: Database not_exists_db does not exist at org.apache.hive.service.cli.session.SessionManager.createSession(SessionManager.java:434) ~[hive-service-3.1.3.jar:3.1.3] at org.apache.hive.service.cli.session.SessionManager.openSession(SessionManager.java:373) ~[hive-service-3.1.3.jar:3.1.3] at org.apache.hive.service.cli.CLIService.openSession(CLIService.java:187) ~[hive-service-3.1.3.jar:3.1.3] at org.apache.hive.service.cli.thrift.ThriftCLIService.getSessionHandle(ThriftCLIService.java:475) ~[hive-service-3.1.3.jar:3.1.3] at org.apache.hive.service.cli.thrift.ThriftCLIService.OpenSession(ThriftCLIService.java:322) ~[hive-service-3.1.3.jar:3.1.3] at org.apache.hive.service.rpc.thrift.TCLIService$Processor$OpenSession.getResult(TCLIService.java:1497) ~[hive-exec-3.1.3.jar:3.1.3] at org.apache.hive.service.rpc.thrift.TCLIService$Processor$OpenSession.getResult(TCLIService.java:1482) ~[hive-exec-3.1.3.jar:3.1.3] at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) ~[hive-exec-3.1.3.jar:3.1.3] at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) ~[hive-exec-3.1.3.jar:3.1.3] at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56) ~[hive-service-3.1.3.jar:3.1.3] at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286) ~[hive-exec-3.1.3.jar:3.1.3] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_342] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_342] at java.lang.Thread.run(Thread.java:750) [?:1.8.0_342] Caused by: org.apache.hive.service.cli.HiveSQLException: Database dw_aeolus does not exist at 
org.apache.hive.service.cli.session.HiveSessionImpl.configureSession(HiveSessionImpl.java:294) ~[hive-service-3.1.3.jar:3.1.3] at org.apache.hive.service.cli.session.HiveSessionImpl.open(HiveSessionImpl.java:199) ~[hive-service-3.1.3.jar:3.1.3] at org.apache.hive.service.cli.session.SessionManager.createSession(SessionManager.java:425) ~[hive-service-3.1.3.jar:3.1.3] ... 13 more 2023-10-17T12:14:54,972 INFO [HiveServer2-Handler-Pool: Thread-3330] thrift.ThriftCLIService: Client protocol version: HIVE_CLI_SERVICE_PROTOCOL_V8 2023-10-17T12:14:54,973 ERROR [HiveServer2-Handler-Pool: Thread-3330] service.CompositeService: Connection limit per user reached (user: aeolus limit: 10) 2023-10-17T12:14:54,973 WARN [HiveServer2-Handler-Pool: Thread-3330] thrift.ThriftCLIService: Error opening session: org.apache.hive.service.cli.HiveSQLException: Connection limit per user reached (user: aeolus limit: 10) at org.apache.hive.service.cli.session.SessionManager.incrementConnections(SessionManager.java:476) ~[hive-service-3.1.3.jar:3.1.3] at org.apache.hive.service.cli.session.SessionManager.createSession(SessionManager.java:383) ~[hive-service-3.1.3.jar:3.1.3] at org.apache.hive.service.cli.session.SessionManager.openSession(SessionManager.java:373) ~[hive-service-3.1.3.jar:3.1.3] at org.apache.hive.service.cli.CLIService.openSession(CLIService.java:187) ~[hive-service-3.1.3.jar:3.1.3] at org.apache.hive.service.cli.thrift.ThriftCLIService.getSessionHandle(ThriftCLIService.java:475)
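The leak pattern in the stack traces above, and the usual shape of the fix, can be sketched as follows. This is a simplified Python illustration, not the actual SessionManager code; the class and function names are made up:

```python
class ConnectionLimiter:
    """Per-user connection counter with a hard limit."""

    def __init__(self, limit):
        self.limit = limit
        self.count = 0

    def increment(self):
        if self.count >= self.limit:
            raise RuntimeError("Connection limit per user reached")
        self.count += 1

    def decrement(self):
        self.count -= 1

def create_session(limiter, init):
    limiter.increment()
    try:
        return init()
    except Exception:
        # The fix: release the slot when session initialization fails.
        # Without this decrement, each failed open (e.g. a non-existent
        # database) permanently consumes a slot, which is the reported
        # behavior: eventually every open hits the per-user limit.
        limiter.decrement()
        raise
```

With the decrement in place, repeated failed opens leave the counter at zero and a subsequent valid connection still succeeds.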
[jira] [Updated] (HIVE-27804) Implement batching in getPartition calls which returns partition list along with auth info
[ https://issues.apache.org/jira/browse/HIVE-27804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikram Ahuja updated HIVE-27804: Summary: Implement batching in getPartition calls which returns partition list along with auth info (was: Implement batching in getPartition calls which returns auth info) > Implement batching in getPartition calls which returns partition list along > with auth info > -- > > Key: HIVE-27804 > URL: https://issues.apache.org/jira/browse/HIVE-27804 > Project: Hive > Issue Type: Bug >Reporter: Vikram Ahuja >Assignee: Vikram Ahuja >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-27804) Implement batching in getPartition calls which returns auth info
[ https://issues.apache.org/jira/browse/HIVE-27804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikram Ahuja updated HIVE-27804: Description: The Hive.getPartitions() methods return the partition list along with auth info in one HMS call. When made on wide tables (> 2000 columns) with a very large number of partitions (100,000+), these calls can cause memory-related issues while the data is being transferred from HMS to HS2 over Thrift. These APIs can be optimised by using the PartitionIterable implementation, where the partition list is fetched in batches of a smaller size rather than in one huge call. (was: Hive.getPartitions() methods returns partition list along with auth info in one . These calls when made on table with ) > Implement batching in getPartition calls which returns auth info > > > Key: HIVE-27804 > URL: https://issues.apache.org/jira/browse/HIVE-27804 > Project: Hive > Issue Type: Bug >Reporter: Vikram Ahuja >Assignee: Vikram Ahuja >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-27804) Implement batching in getPartition calls which returns auth info
[ https://issues.apache.org/jira/browse/HIVE-27804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikram Ahuja updated HIVE-27804: Summary: Implement batching in getPartition calls which returns auth info (was: Implement batching in getPartition calls which returns auth info as well) > Implement batching in getPartition calls which returns auth info > > > Key: HIVE-27804 > URL: https://issues.apache.org/jira/browse/HIVE-27804 > Project: Hive > Issue Type: Bug >Reporter: Vikram Ahuja >Assignee: Vikram Ahuja >Priority: Major > > Hive.getPartitions() methods returns partition list along with auth info in > one . These calls when made on table with -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-27804) Implement batching in getPartition calls which returns auth info as well
[ https://issues.apache.org/jira/browse/HIVE-27804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikram Ahuja updated HIVE-27804: Description: Hive.getPartitions() methods returns partition list along with auth info in one . These calls when made on table with > Implement batching in getPartition calls which returns auth info as well > > > Key: HIVE-27804 > URL: https://issues.apache.org/jira/browse/HIVE-27804 > Project: Hive > Issue Type: Bug >Reporter: Vikram Ahuja >Assignee: Vikram Ahuja >Priority: Major > > Hive.getPartitions() methods returns partition list along with auth info in > one . These calls when made on table with -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27804) Implement batching in getPartition calls which returns auth info as well
Vikram Ahuja created HIVE-27804: --- Summary: Implement batching in getPartition calls which returns auth info as well Key: HIVE-27804 URL: https://issues.apache.org/jira/browse/HIVE-27804 Project: Hive Issue Type: Bug Reporter: Vikram Ahuja Assignee: Vikram Ahuja -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-27797) Transactions that got timed out are not getting logged as ABORTED transactions in NOTIFICATION_LOG
[ https://issues.apache.org/jira/browse/HIVE-27797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-27797: -- Labels: pull-request-available (was: ) > Transactions that got timed out are not getting logged as ABORTED > transactions in NOTIFICATION_LOG > -- > > Key: HIVE-27797 > URL: https://issues.apache.org/jira/browse/HIVE-27797 > Project: Hive > Issue Type: Bug > Components: repl, Transactions >Reporter: Taraka Rama Rao Lethavadla >Assignee: Taraka Rama Rao Lethavadla >Priority: Major > Labels: pull-request-available > > +Scenario:+ > Let's say there are 100 transactions opened. These 100 will be logged in > notification_log and, when replicated, they will get created in the target > cluster. > Now 50 out of these 100 transactions get aborted due to timeout and are > removed from HMS. In this step, we do not log those transactions into > notification_log. > So the next time we replicate, these 50 aborted transactions will not > be replicated. > As a result, in the target cluster the transactions that got created earlier > will only get removed after the number of days configured in > {code:java} > hive.repl.txn.timeout (11 days default){code} > We do have logic to log aborted transactions when they got aborted for > some other reason, but not for those that time out. -- This message was sent by Atlassian Jira (v8.20.10#820010)
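The divergence described in the scenario can be sketched as a tiny simulation. This is illustrative Python only; the event names and `replicate` function are made up and do not reflect Hive's actual replication code:

```python
def replicate(source_events, target_open_txns):
    """Apply a notification-log event stream to the target cluster's open-txn set."""
    for event, txn_id in source_events:
        if event == "OPEN":
            target_open_txns.add(txn_id)
        elif event == "ABORT":
            target_open_txns.discard(txn_id)
    return target_open_txns

# 100 transactions open and are logged; 50 later time out on the source.
events = [("OPEN", i) for i in range(100)]

# Bug: the timeout-abort path writes no ABORT events, so the target
# keeps all 100 transactions open until the hive.repl.txn.timeout
# fallback (11 days by default) removes them.
assert len(replicate(list(events), set())) == 100

# With the proposed fix, timed-out transactions also log ABORT events,
# and the target converges to the 50 transactions still open.
events_fixed = events + [("ABORT", i) for i in range(50)]
assert len(replicate(events_fixed, set())) == 50
```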
[jira] [Updated] (HIVE-27797) Transactions that got timed out are not getting logged as ABORTED transactions in NOTIFICATION_LOG
[ https://issues.apache.org/jira/browse/HIVE-27797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Taraka Rama Rao Lethavadla updated HIVE-27797: -- Description: +Scenario:+ Let's say there are 100 transactions opened. These 100 will be logged in notification_log and, when replicated, they will get created in the target cluster. Now 50 out of these 100 transactions get aborted due to timeout and are removed from HMS. In this step, we do not log those transactions into notification_log. So the next time we replicate, these 50 aborted transactions will not be replicated. As a result, in the target cluster the transactions that got created earlier will only get removed after the number of days configured in {code:java} hive.repl.txn.timeout (11 days default){code} We do have logic to log aborted transactions when they got aborted for some other reason, but not for those that time out. was: +Scenario:+ Let's there are 100 transactions opened. These 100 will be logged in notification_log and when replicated, they will get created in target cluster. Now 50 out of these 100 transactions got aborted due to timeout and got removed from HMS. In this step, we are not logging those transactions in to notification_log. So next time when we do replication, these 50 aborted transactions will not be replicated. As a result in the target cluster the transactions that got created earlier will stay forever without getting cleaned Actually, we have the logic to log aborted transactions if they got aborted for some other reason but not for those getting timed out.
> Transactions that got timed out are not getting logged as ABORTED > transactions in NOTIFICATION_LOG > -- > > Key: HIVE-27797 > URL: https://issues.apache.org/jira/browse/HIVE-27797 > Project: Hive > Issue Type: Bug > Components: repl, Transactions >Reporter: Taraka Rama Rao Lethavadla >Assignee: Taraka Rama Rao Lethavadla >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-27788) Exception in Sort Merge join with Group By + PTF Operator
[ https://issues.apache.org/jira/browse/HIVE-27788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17776189#comment-17776189 ] Krisztian Kasa commented on HIVE-27788: --- Another repro with 3 records and inner join {code} set hive.optimize.semijoin.conversion = false; CREATE TABLE tbl1_n5(key int, value string) CLUSTERED BY (key) SORTED BY (key) INTO 2 BUCKETS; insert into tbl1_n5(key, value) values (0, 'val_0'), (2, 'val_2'), (9, 'val_9'); explain SELECT t1.key from (SELECT key , row_number() over(partition by key order by value desc) as rk from tbl1_n5) t1 join ( SELECT key,count(distinct value) as cp_count from tbl1_n5 group by key) t2 on t1.key = t2.key where rk = 1; {code} {code} POSTHOOK: query: explain SELECT t1.key from (SELECT key , row_number() over(partition by key order by value desc) as rk from tbl1_n5) t1 join ( SELECT key,count(distinct value) as cp_count from tbl1_n5 group by key) t2 on t1.key = t2.key where rk = 1 POSTHOOK: type: QUERY POSTHOOK: Input: default@tbl1_n5 A masked pattern was here STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 depends on stages: Stage-1 STAGE PLANS: Stage: Stage-1 Tez A masked pattern was here Edges: Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 3 (SIMPLE_EDGE) A masked pattern was here Vertices: Map 1 Map Operator Tree: TableScan alias: tbl1_n5 filterExpr: key is not null (type: boolean) Statistics: Num rows: 3 Data size: 279 Basic stats: COMPLETE Column stats: COMPLETE Filter Operator predicate: key is not null (type: boolean) Statistics: Num rows: 3 Data size: 279 Basic stats: COMPLETE Column stats: COMPLETE Reduce Output Operator key expressions: key (type: int), value (type: string) null sort order: aa sort order: +- Map-reduce partition columns: key (type: int) Statistics: Num rows: 3 Data size: 279 Basic stats: COMPLETE Column stats: COMPLETE Execution mode: vectorized, llap LLAP IO: all inputs Map 3 Map Operator Tree: TableScan alias: tbl1_n5 filterExpr: key is not null (type: boolean) 
Statistics: Num rows: 3 Data size: 279 Basic stats: COMPLETE Column stats: COMPLETE Filter Operator predicate: key is not null (type: boolean) Statistics: Num rows: 3 Data size: 279 Basic stats: COMPLETE Column stats: COMPLETE Group By Operator keys: key (type: int), value (type: string) minReductionHashAggr: 0.4 mode: hash outputColumnNames: _col0, _col1 Statistics: Num rows: 3 Data size: 279 Basic stats: COMPLETE Column stats: COMPLETE Reduce Output Operator key expressions: _col0 (type: int), _col1 (type: string) null sort order: zz sort order: ++ Map-reduce partition columns: _col0 (type: int) Statistics: Num rows: 3 Data size: 279 Basic stats: COMPLETE Column stats: COMPLETE Execution mode: vectorized, llap LLAP IO: all inputs Reducer 2 Reduce Operator Tree: Group By Operator keys: KEY._col0 (type: int), KEY._col1 (type: string) mode: mergepartial outputColumnNames: _col0, _col1 Statistics: Num rows: 3 Data size: 279 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: _col0 (type: int) outputColumnNames: _col0 Statistics: Num rows: 3 Data size: 279 Basic stats: COMPLETE Column stats: COMPLETE Group By Operator keys: _col0 (type: int) mode: complete outputColumnNames: _col0 Statistics: Num rows: 3 Data size: 12 Basic stats: COMPLETE Column stats: COMPLETE Dummy Store Execution mode: llap Reduce Operator Tree: Select Operator expressions: KEY.reducesinkkey0 (type: int), KEY.reducesinkkey1 (type: string) outputColumnNames: _col0, _col1 Statistics: Num rows: 3 Data size: 279 Basic stats: COMPLETE Column stats: COMPLETE PTF Operator Function definitions: Input definition input alias: ptf_0 output shape:
[jira] [Created] (HIVE-27803) Bump org.apache.avro:avro from 1.11.1 to 1.11.3
Ayush Saxena created HIVE-27803: --- Summary: Bump org.apache.avro:avro from 1.11.1 to 1.11.3 Key: HIVE-27803 URL: https://issues.apache.org/jira/browse/HIVE-27803 Project: Hive Issue Type: Improvement Reporter: Ayush Saxena PR from *[dependabot|https://github.com/apps/dependabot]* https://github.com/apache/hive/pull/4764 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-25351) stddev(), sstddev_pop() with CBO enable returning null
[ https://issues.apache.org/jira/browse/HIVE-25351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17776105#comment-17776105 ] Dayakar M commented on HIVE-25351: -- [~Pritha] Are you working on this issue? if not please let me know, I will work on this. Thanks > stddev(), sstddev_pop() with CBO enable returning null > -- > > Key: HIVE-25351 > URL: https://issues.apache.org/jira/browse/HIVE-25351 > Project: Hive > Issue Type: Bug >Reporter: Ashish Sharma >Assignee: Pritha Dawn >Priority: Blocker > > *script used to repro* > create table cbo_test (key string, v1 double, v2 decimal(30,2), v3 > decimal(30,2)); > insert into cbo_test values ("00140006375905", 10230.72, > 10230.72, 10230.69), ("00140006375905", 10230.72, 10230.72, > 10230.69), ("00140006375905", 10230.72, 10230.72, 10230.69), > ("00140006375905", 10230.72, 10230.72, 10230.69), > ("00140006375905", 10230.72, 10230.72, 10230.69), > ("00140006375905", 10230.72, 10230.72, 10230.69); > select stddev(v1), stddev(v2), stddev(v3) from cbo_test; > *Enable CBO* > ++ > | Explain | > ++ > | Plan optimized by CBO. 
| > || > | Vertex dependency in root stage| > | Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE)| > || > | Stage-0| > | Fetch Operator | > | limit:-1 | > | Stage-1| > | Reducer 2 vectorized | > | File Output Operator [FS_13] | > | Select Operator [SEL_12] (rows=1 width=24) | > | Output:["_col0","_col1","_col2"] | > | Group By Operator [GBY_11] (rows=1 width=72) | > | > Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"],aggregations:["sum(VALUE._col0)","sum(VALUE._col1)","count(VALUE._col2)","sum(VALUE._col3)","sum(VALUE._col4)","count(VALUE._col5)","sum(VALUE._col6)","sum(VALUE._col7)","count(VALUE._col8)"] > | > | <-Map 1 [CUSTOM_SIMPLE_EDGE] vectorized | > | PARTITION_ONLY_SHUFFLE [RS_10] | > | Group By Operator [GBY_9] (rows=1 width=72) | > | > Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"],aggregations:["sum(_col3)","sum(_col0)","count(_col0)","sum(_col5)","sum(_col4)","count(_col1)","sum(_col7)","sum(_col6)","count(_col2)"] > | > | Select Operator [SEL_8] (rows=6 width=232) | > | > Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7"] | > | TableScan [TS_0] (rows=6 width=232) | > | default@cbo_test,cbo_test, ACID > table,Tbl:COMPLETE,Col:COMPLETE,Output:["v1","v2","v3"] | > || > ++ > *Query Result* > _c0 _c1 _c2 > 0.0 NaN NaN > *Disable CBO* > ++ > | Explain | > ++ > | Vertex dependency in root stage| > | Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE)| > || > | Stage-0| > | Fetch Operator | > | limit:-1 | > | Stage-1| > | Reducer 2 vectorized | > | File Output Operator [FS_11] | > | Group By Operator [GBY_10] (rows=1 width=24) | > | > Output:["_col0","_col1","_col2"],aggregations:["stddev(VALUE._col0)","stddev(VALUE._col1)","stddev(VALUE._col2)"] > | > | <-Map 1 [CUSTOM_SIMPLE_EDGE] vectorized| > | PARTITION_ONLY_SHUFFLE [RS_9]| > | Group By Operator [GBY_8] (rows=1 width=240) | > | > Output:["_col0","_col1","_col2"],aggregations:["stddev(v1)","stddev(v2)","stddev(v3)"] > | > | Select Operator 
[SEL_7] (rows=6 width=232) | > | Output:["v1","v2","v3"]| > | TableScan [TS_0] (rows=6 width=232) | > | default@cbo_test,cbo_test, ACID >
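The NaN under CBO lines up with the rewrite visible in the plan above: stddev is decomposed into sum, sum-of-squares, and count aggregations. A minimal Python sketch (not Hive code; a numeric illustration of why that single-pass decomposition is fragile when the true deviation is 0):

```python
import math

# The six identical values from the repro; the true stddev is exactly 0.
xs = [10230.72] * 6
n = len(xs)

# Two-pass formula, as the plain stddev UDAF effectively computes it:
mean = sum(xs) / n
var_two_pass = sum((x - mean) ** 2 for x in xs) / n

# Single-pass decomposition matching the CBO plan's sum(x), sum(x^2),
# count aggregations: var = sum(x^2)/n - (sum(x)/n)^2.
var_one_pass = sum(x * x for x in xs) / n - (sum(xs) / n) ** 2

# Subtracting two nearly equal large numbers can push var_one_pass
# slightly below zero; in Hive's double arithmetic sqrt(negative) is NaN,
# matching the NaN in the CBO query result. (Python's math.sqrt would
# raise instead, so the negative case is guarded here.)
stddev_one_pass = math.sqrt(var_one_pass) if var_one_pass >= 0 else float("nan")
```

Whether the cancellation lands exactly at zero, slightly above, or slightly below depends on the value and the accumulation order, which is why the double column and the decimal columns can behave differently in the same query.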
[jira] [Commented] (HIVE-27746) Hive Metastore should send single AlterPartitionEvent with list of partitions
[ https://issues.apache.org/jira/browse/HIVE-27746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17776049#comment-17776049 ] Quanlong Huang commented on HIVE-27746: --- Another benefit of this is that partition events on different tables won't interleave, so event consumers like Impala can handle such batch events more efficiently (IMPALA-12463). It'd be nice if we could do the same for partition-level insert events. > Hive Metastore should send single AlterPartitionEvent with list of partitions > - > > Key: HIVE-27746 > URL: https://issues.apache.org/jira/browse/HIVE-27746 > Project: Hive > Issue Type: Improvement > Components: Standalone Metastore >Reporter: Naveen Gangam >Assignee: Zhihua Deng >Priority: Major > > In HIVE-3938, work was done to send a single AddPartitionEvent for APIs that > add partitions in bulk. Similarly, we have alter_partitions APIs that alter > partitions in bulk via a single HMS call. For such events, we should also > send a single AlterPartitionEvent with a list of partitions in it. > This would be far more efficient than having to send and process them > individually. > This fix will be incompatible with older clients that expect a single > partition.
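The batching argument can be sketched abstractly (plain Python with hypothetical names, not the real HMS notification classes): a bulk alter of N partitions emits one event carrying the list instead of N interleavable notifications.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical stand-in for an HMS notification event; the name and
# fields are illustrative, not the actual metastore API.
@dataclass
class PartitionEvent:
    table: str
    partitions: List[str]  # partition specs altered in this event

def per_partition_events(table: str, parts: List[str]) -> List[PartitionEvent]:
    # Pre-batching behaviour: one event per altered partition, so a bulk
    # alter_partitions call on N partitions emits N notifications that
    # can interleave with events from other tables in the event log.
    return [PartitionEvent(table, [p]) for p in parts]

def batched_event(table: str, parts: List[str]) -> PartitionEvent:
    # Proposed behaviour: a single event carrying the whole list, so a
    # consumer (e.g. Impala) processes the batch for one table in one shot.
    return PartitionEvent(table, list(parts))

# Hypothetical partition specs for illustration.
parts = ["ds=2023-10-01", "ds=2023-10-02", "ds=2023-10-03"]
```

The compatibility caveat in the issue follows directly: a consumer written to read exactly one partition per event cannot parse the batched form.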
[jira] [Work started] (HIVE-27791) Eliminate totalSize check from test
[ https://issues.apache.org/jira/browse/HIVE-27791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-27791 started by Zoltán Rátkai. > Eliminate totalSize check from test > --- > > Key: HIVE-27791 > URL: https://issues.apache.org/jira/browse/HIVE-27791 > Project: Hive > Issue Type: Improvement >Reporter: Zoltán Rátkai >Assignee: Zoltán Rátkai >Priority: Major > > As discussed in this ticket, totalSize checks need to be eliminated from > tests: > https://github.com/apache/hive/pull/4690