[jira] [Created] (HIVE-25154) Disable StatsUpdaterThread and PartitionManagementTask for db that is being failed over.
Haymant Mangla created HIVE-25154:
----------------------------------
Summary: Disable StatsUpdaterThread and PartitionManagementTask for db that is being failed over.
Key: HIVE-25154
URL: https://issues.apache.org/jira/browse/HIVE-25154
Project: Hive
Issue Type: Improvement
Reporter: Haymant Mangla
Assignee: Haymant Mangla

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25153) JDBC connection randomly fails if both ha and zookeeper discovery namespaces are within znode /hiveserver2
Pranay created HIVE-25153:
--------------------------
Summary: JDBC connection randomly fails if both ha and zookeeper discovery namespaces are within znode /hiveserver2
Key: HIVE-25153
URL: https://issues.apache.org/jira/browse/HIVE-25153
Project: Hive
Issue Type: Bug
Components: HiveServer2
Affects Versions: 3.1.2
Environment: Hive 3.1.2
Reporter: Pranay

Hello,

JDBC connections randomly fail to connect to HiveServer2. Here are my observations.

ZooKeeper znodes:
{code:java}
[zk: redacted106.visa.com:2181(CONNECTED) 5] ls /hiveserver2
[leader, serverUri=redacted206.visa.com:1;version=3.1.2-1-SNAPSHOT;sequence=07]
{code}
ZooKeeperHiveClientHelper.java picks a server at random from the children of the /hiveserver2 znode. When it picks leader as the host, the connection fails.

Failed JDBC connection:
{code:java}
21/05/21 20:14:17 [main-SendThread(redacted106.visa.com:2181)]: DEBUG zookeeper.ClientCnxn: Reading reply sessionid:0x102e58cc0bd0019, packet:: clientPath:null serverPath:null finished:false header:: 1,12 replyHeader:: 1,42949673107,0 request:: '/hiveserver2,F response:: v{'leader,'serverUri=redacted206.visa.com:1;version=3.1.2-1-SNAPSHOT;sequence=05},s{21474836584,21474836584,1621616800534,1621616800534,0,10,0,0,13,2,42949673008}
21/05/21 20:14:17 [main-SendThread(redacted106.visa.com:2181)]: DEBUG zookeeper.ClientCnxn: Reading reply sessionid:0x102e58cc0bd0019, packet:: clientPath:null serverPath:null finished:false header:: 2,4 replyHeader:: 2,42949673107,0 request:: '/hiveserver2/leader,F response:: ,s{21474836589,21474836589,1621616800587,1621616800587,0,5,0,0,0,1,42949673013}
2
21/05/21 20:14:17 [main-SendThread(redacted106.visa.com:2181)]: DEBUG zookeeper.ClientCnxn: An exception was thrown while closing send thread for session 0x102e58cc0bd0019 : Unable to read additional data from server sessionid 0x102e58cc0bd0019, likely server has closed socket
Error: org.apache.hive.jdbc.ZooKeeperHiveClientException: Unable to read HiveServer2 configs from ZooKeeper (state=,code=0)
{code}
Successful JDBC connection:
{code:java}
21/05/21 20:14:01 [main-SendThread(redacted106.visa.com:2181)]: DEBUG zookeeper.ClientCnxn: Reading reply sessionid:0x102e58cc0bd0018, packet:: clientPath:null serverPath:null finished:false header:: 1,12 replyHeader:: 1,42949673103,0 request:: '/hiveserver2,F response:: v{'leader,'serverUri=redacted206.visa.com:1;version=3.1.2-1-SNAPSHOT;sequence=05},s{21474836584,21474836584,1621616800534,1621616800534,0,10,0,0,13,2,42949673008}
21/05/21 20:14:01 [main-SendThread(redacted106.visa.com:2181)]: DEBUG zookeeper.ClientCnxn: Reading reply sessionid:0x102e58cc0bd0018, packet:: clientPath:null serverPath:null finished:false header:: 2,4 replyHeader:: 2,42949673103,0 request:: '/hiveserver2/serverUri=redacted206.visa.com:1;version=3.1.2-1-SNAPSHOT;sequence=05,F response:: #686976652e736572766572322e696e7374616e63652e7572693d736c373364706968636e30303230362e766973612e636f6d3a31303030303b686976652e736572766572322e61757468656e7469636174696f6e3d4b45524245524f533b686976652e736572766572322e7472616e73706f72742e6d6f64653d62696e6172793b686976652e736572766572322e7468726966742e7361736c2e716f703d617574683b686976652e736572766572322e7468726966742e62696e642e686f73743d736c373364706968636e30303230362e766973612e636f6d3b686976652e736572766572322e7468726966742e706f72743d31303030303b686976652e736572766572322e7573652e53534c3d66616c73653b686976652e736572766572322e61757468656e7469636174696f6e2e6b65726265726f732e7072696e636970616c3d686976652f5f484f535440434f52504445562e564953412e434f4d,s{42949673008,42949673008,1621626552349,1621626552349,0,0,0,72872936683143172,350,0,42949673008}
21/05/21 20:14:01 [main]: DEBUG jdbc.Utils: Resolved authority: redacted206.visa.com:1
{code}
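The failure mode in this report can be illustrated with a minimal sketch of a client-side guard: keep only the child znodes that look like server registrations before choosing one at random. This is not the actual ZooKeeperHiveClientHelper code. The class and method names are hypothetical, and the `serverUri=` prefix is an assumption taken from the znode listing in the report.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

/** Hypothetical helper, not part of Hive: filters znode children before random selection. */
public class ServerNodeFilter {
    // Assumption: server registrations are named "serverUri=...", as in the report's listing.
    private static final String SERVER_PREFIX = "serverUri=";

    /** Returns only the children that look like HiveServer2 registrations. */
    public static List<String> serverNodes(List<String> children) {
        List<String> result = new ArrayList<>();
        for (String child : children) {
            if (child.startsWith(SERVER_PREFIX)) {
                result.add(child);
            }
        }
        return result;
    }

    /** Picks a random registration; throws if none is present. */
    public static String pickRandom(List<String> children, Random random) {
        List<String> candidates = serverNodes(children);
        if (candidates.isEmpty()) {
            throw new IllegalStateException("No HiveServer2 registrations found");
        }
        return candidates.get(random.nextInt(candidates.size()));
    }

    public static void main(String[] args) {
        // Children of /hiveserver2 as listed in the report.
        List<String> children = Arrays.asList(
            "leader",
            "serverUri=redacted206.visa.com:1;version=3.1.2-1-SNAPSHOT;sequence=07");
        // "leader" is never selected because it lacks the serverUri= prefix.
        System.out.println(pickRandom(children, new Random()));
    }
}
```

With the filter in place, the `leader` node used by HA leader election is never treated as a host, so the random choice can only land on a real registration.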
[jira] [Created] (HIVE-25152) Remove Superfluous Logging Code
David Mollitor created HIVE-25152:
----------------------------------
Summary: Remove Superfluous Logging Code
Key: HIVE-25152
URL: https://issues.apache.org/jira/browse/HIVE-25152
Project: Hive
Issue Type: Improvement
Reporter: David Mollitor
Assignee: David Mollitor

Much of the logging code can be removed to reduce the amount of code in the project (and perhaps yield some small performance gains).
[jira] [Created] (HIVE-25151) Remove Unused Interner from HiveMetastoreChecker
David Mollitor created HIVE-25151:
----------------------------------
Summary: Remove Unused Interner from HiveMetastoreChecker
Key: HIVE-25151
URL: https://issues.apache.org/jira/browse/HIVE-25151
Project: Hive
Issue Type: Improvement
Reporter: David Mollitor
Assignee: David Mollitor
[jira] [Created] (HIVE-25150) Tab characters are not removed before decimal conversion similar to space character which is fixed as part of HIVE-24378
Taraka Rama Rao Lethavadla created HIVE-25150:
----------------------------------------------
Summary: Tab characters are not removed before decimal conversion similar to space character which is fixed as part of HIVE-24378
Key: HIVE-25150
URL: https://issues.apache.org/jira/browse/HIVE-25150
Project: Hive
Issue Type: Bug
Components: Hive
Affects Versions: 4.0.0
Reporter: Taraka Rama Rao Lethavadla

Test case: column values with space and tab characters

bash-4.2$ cat data/files/test_dec_space.csv
1,0
2, 1
3, 2

{noformat}
create external table test_dec_space (id int, value decimal) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' location '/tmp/test_dec_space';
{noformat}
The output of select * from test_dec_space would be
{noformat}
1 0
2 1
3 NULL
{noformat}
The behaviour in MySQL when there are tab and space characters in decimal values:

bash-4.2$ cat /tmp/insert.csv
"1","aa",11.88
"2","bb", 99.88
"4","dd", 209.88

MariaDB [test]> load data local infile '/tmp/insert.csv' into table t2 fields terminated by ',' ENCLOSED BY '"' LINES TERMINATED BY '\n';
Query OK, 3 rows affected, 3 warnings (0.00 sec)
Records: 3 Deleted: 0 Skipped: 0 Warnings: 3

MariaDB [test]> select * from t2;
+----+------+-------+
| id | name | score |
+----+------+-------+
|  1 | aa   |    12 |
|  2 | bb   |   100 |
|  4 | dd   |   210 |
+----+------+-------+
3 rows in set (0.00 sec)

So Hive should not return NULL for these values.
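The requested behaviour can be sketched as follows: strip leading and trailing whitespace, including tabs, before converting a text field to a decimal. This is an illustration only, not Hive's actual decimal parsing path. The class name `DecimalFieldParser` is hypothetical, and `java.math.BigDecimal` stands in for Hive's own decimal type.

```java
import java.math.BigDecimal;

/** Hypothetical helper, not part of Hive: whitespace-tolerant decimal parsing. */
public class DecimalFieldParser {

    /** Returns the parsed decimal, or null when the field is not a number after trimming. */
    public static BigDecimal parse(String field) {
        if (field == null) {
            return null;
        }
        // String.trim() drops leading/trailing characters <= U+0020, which covers space and tab.
        String trimmed = field.trim();
        try {
            return new BigDecimal(trimmed);
        } catch (NumberFormatException e) {
            // Mirrors the SQL convention of yielding NULL for unparsable input.
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("0"));    // plain value
        System.out.println(parse(" 1"));   // leading space
        System.out.println(parse("\t2"));  // leading tab
    }
}
```

Treating tab the same way as space keeps both padded rows of the test file parsable instead of turning the tab-padded one into NULL.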
[jira] [Created] (HIVE-25149) Support parallel load for Optimized HT implementations
Panagiotis Garefalakis created HIVE-25149:
------------------------------------------
Summary: Support parallel load for Optimized HT implementations
Key: HIVE-25149
URL: https://issues.apache.org/jira/browse/HIVE-25149
Project: Hive
Issue Type: Sub-task
Reporter: Panagiotis Garefalakis
[jira] [Created] (HIVE-25148) Support parallel load for Fast HT implementation
Panagiotis Garefalakis created HIVE-25148:
------------------------------------------
Summary: Support parallel load for Fast HT implementation
Key: HIVE-25148
URL: https://issues.apache.org/jira/browse/HIVE-25148
Project: Hive
Issue Type: Sub-task
Reporter: Panagiotis Garefalakis
[jira] [Created] (HIVE-25147) Limit offset query in CTAS will cause data loss
gaozhan ding created HIVE-25147:
--------------------------------
Summary: Limit offset query in CTAS will cause data loss
Key: HIVE-25147
URL: https://issues.apache.org/jira/browse/HIVE-25147
Project: Hive
Issue Type: Bug
Components: Parser
Affects Versions: 3.1.0
Reporter: gaozhan ding

Queries like:
{code:java}
create table ... as select ... from ... limit offset
or:
insert overwrite table ... select ... from ... limit offset
{code}
will cause data loss.

Steps to reproduce:
{code:java}
create table test_limit_offset (id int);
insert into test_limit_offset values(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12),(13),(14),(15),(16);
drop table if exists test_limit_offset2;
create table test_limit_offset2 as select * from test_limit_offset limit 5 offset 2;
{code}
Query test_limit_offset2:
{code:java}
+------------------------+
| test_limit_offset2.id  |
+------------------------+
| 5                      |
| 6                      |
| 7                      |
+------------------------+
{code}
Expected 5 rows but got 3.

We can see the problem in the execution plan:
{code:java}
+----------------------------------------------------+
| Explain |
+----------------------------------------------------+
| Plan optimized by CBO. |
| |
| Vertex dependency in root stage |
| Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE) |
| |
| Stage-3 |
| Stats Work{} |
| Stage-9 |
| Create Table Operator: |
| name:dgz.test_limit_offset2 |
| Stage-2 |
| Dependency Collection{} |
| Stage-5(CONDITIONAL) |
| Move Operator |
| Stage-8(CONDITIONAL CHILD TASKS: Stage-5, Stage-4, Stage-6) |
| Conditional Operator |
| Stage-1 |
| Reducer 2 |
| File Output Operator [FS_6] |
| table:{"name:":"dgz.test_limit_offset2"} |
| Limit [LIM_5] (rows=5 width=1) | //reduce side full limit offset
| Number of rows:5,Offset of rows:2 |
| Select Operator [SEL_4] (rows=5 width=1) |
| Output:["_col0"] |
| <-Map 1 [CUSTOM_SIMPLE_EDGE] |
| PARTITION_ONLY_SHUFFLE [RS_3] |
| Limit [LIM_2] (rows=5 width=1) | //map side full limit offset
| Number of rows:5,Offset of rows:2 |
| Select Operator [SEL_1] (rows=13 width=1) |
| Output:["_col0"] |
| TableScan [TS_0] (rows=13 width=1) |
| dgz@test_limit_offset,test_limit_offset,Tbl:COMPLETE,Col:NONE,Output:["id"] |
| Stage-4(CONDITIONAL) |
| File Merge |
| Please refer to the previous Stage-8(CONDITIONAL CHILD TASKS: Stage-5, Stage-4, Stage-6) |
| Stage-7 |
| Move Operator |
| Stage-6(CONDITIONAL) |
| File Merge |
| Please refer to the previous Stage-8(CONDITIONAL CHILD TASKS: Stage-5, Stage-4, Stage-6) |
| Stage-0 |
| Move Operator |
| Please refer to the previous Stage-5(CONDITIONAL) |
| Please refer to the previous Stage-4(CONDITIONAL) |
| Please refer to the previous Stage-7 |
| |
+----------------------------------------------------+
{code}
The plan generates a Limit operator, with the full limit and offset, on both the map side and the reduce side.
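The arithmetic behind the three-row result can be reproduced with a small simulation, assuming a single mapper that emits rows in insertion order (an assumption made only for illustration). The class `LimitOffsetDemo` and its method are hypothetical and model the two Limit operators as plain list slicing, not Hive operators. The last two lines show one conventional pushdown shape (map side keeps limit + offset rows with no offset), offered as a sketch rather than as the fix adopted for this ticket.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical simulation, not Hive code: applying LIMIT/OFFSET once versus twice. */
public class LimitOffsetDemo {

    /** Skips `offset` rows, then keeps at most `limit` rows. */
    public static List<Integer> limitOffset(List<Integer> rows, int limit, int offset) {
        int from = Math.min(offset, rows.size());
        int to = Math.min(from + limit, rows.size());
        return new ArrayList<>(rows.subList(from, to));
    }

    public static void main(String[] args) {
        List<Integer> table = new ArrayList<>();
        for (int i = 1; i <= 16; i++) {
            table.add(i);
        }

        // Plan from the report: the full limit and offset are applied on both sides.
        List<Integer> mapSide = limitOffset(table, 5, 2);      // [3, 4, 5, 6, 7]
        List<Integer> reduceSide = limitOffset(mapSide, 5, 2); // [5, 6, 7]
        System.out.println("offset applied twice: " + reduceSide);

        // Sketch of a safe pushdown: map keeps limit + offset rows, reduce applies the offset.
        List<Integer> mapRelaxed = limitOffset(table, 5 + 2, 0);
        System.out.println("offset applied once:  " + limitOffset(mapRelaxed, 5, 2));
    }
}
```

Under these assumptions the first line prints `[5, 6, 7]`, matching the rows in the report, and the second prints `[3, 4, 5, 6, 7]`, the five rows the query was expected to return.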
[jira] [Created] (HIVE-25146) JMH tests for Multi HT and parallel load
Panagiotis Garefalakis created HIVE-25146:
------------------------------------------
Summary: JMH tests for Multi HT and parallel load
Key: HIVE-25146
URL: https://issues.apache.org/jira/browse/HIVE-25146
Project: Hive
Issue Type: Sub-task
Reporter: Panagiotis Garefalakis

As the title suggests, add some benchmarks for the parallel HT construction feature.
[jira] [Created] (HIVE-25145) Improve Multi-HashTable EstimatedMemorySize
Panagiotis Garefalakis created HIVE-25145:
------------------------------------------
Summary: Improve Multi-HashTable EstimatedMemorySize
Key: HIVE-25145
URL: https://issues.apache.org/jira/browse/HIVE-25145
Project: Hive
Issue Type: Sub-task
Reporter: Panagiotis Garefalakis

When a Multi HashTable is used for parallel HT loading, we calculate the estimatedMemorySize as the sum of all HTs. However, each of those HTs already adds some constants to its memory estimate, e.g., a 16KB constant for keyBinarySortableDeserializeRead.

This ticket aims to improve the memory estimation for Multi HT.
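The over-counting described in this ticket can be shown with a little arithmetic. The sketch below compares a plain sum of per-table estimates with a sum that counts the constant only once. Whether the constant really should be counted once across all tables is an assumption made here for illustration, since the ticket only says the estimate should be improved. The class `MultiHtMemoryEstimate` and its methods are hypothetical; only the 16KB figure comes from the ticket.

```java
/** Hypothetical arithmetic sketch, not Hive code: summing per-table memory estimates. */
public class MultiHtMemoryEstimate {
    // 16KB constant mentioned in the ticket for keyBinarySortableDeserializeRead.
    static final long CONSTANT_BYTES = 16L * 1024;

    /** Plain sum: every table's constant is included. */
    public static long naiveSum(long[] perTableEstimates) {
        long total = 0;
        for (long estimate : perTableEstimates) {
            total += estimate;
        }
        return total;
    }

    /** Assumption for illustration: the constant is needed once, so remove the extra copies. */
    public static long countConstantOnce(long[] perTableEstimates) {
        int n = perTableEstimates.length;
        if (n == 0) {
            return 0;
        }
        return naiveSum(perTableEstimates) - (n - 1) * CONSTANT_BYTES;
    }

    public static void main(String[] args) {
        // Two tables, each estimate already including the 16KB constant.
        long[] estimates = {100_000 + CONSTANT_BYTES, 200_000 + CONSTANT_BYTES};
        System.out.println("naive sum:          " + naiveSum(estimates));
        System.out.println("constant counted 1x: " + countConstantOnce(estimates));
    }
}
```

With n tables the plain sum carries n copies of the constant, so the gap between the two figures grows linearly with the number of hash tables used for parallel loading.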