[jira] [Created] (HIVE-25154) Disable StatsUpdaterThread and PartitionManagementTask for db that is being failoved over.

2021-05-21 Thread Haymant Mangla (Jira)
Haymant Mangla created HIVE-25154:
-

 Summary: Disable StatsUpdaterThread and PartitionManagementTask for db that is being failoved over.
 Key: HIVE-25154
 URL: https://issues.apache.org/jira/browse/HIVE-25154
 Project: Hive
  Issue Type: Improvement
Reporter: Haymant Mangla
Assignee: Haymant Mangla






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25153) JDBC connection randomly fails if both ha and zookeeper discovery namespaces are within znode /hiveserver2

2021-05-21 Thread Pranay (Jira)
Pranay created HIVE-25153:
-

 Summary: JDBC connection randomly fails if both ha and zookeeper discovery namespaces are within znode /hiveserver2
 Key: HIVE-25153
 URL: https://issues.apache.org/jira/browse/HIVE-25153
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 3.1.2
 Environment: Hive 3.1.2
Reporter: Pranay


Hello, 

The JDBC connection randomly fails to connect to HiveServer2. Here are my observations:

ZooKeeper znodes:
{code:java}
[zk: redacted106.visa.com:2181(CONNECTED) 5] ls /hiveserver2
[leader, serverUri=redacted206.visa.com:1;version=3.1.2-1-SNAPSHOT;sequence=07]{code}
 

ZooKeeperHiveClientHelper.java reads the list of children of the /hiveserver2 znode and picks one of them at random. When it picks the leader znode as the host, the connection fails.
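One possible client-side mitigation, sketched here only as an illustration and not taken from the Hive code base: before choosing a random child of the discovery namespace, keep only the children whose names start with serverUri= so that the leader znode can never be selected. The class and method names (ServerZnodeFilter, filterServerNodes, pickRandomServer) are hypothetical, and the prefix check assumes instance znodes are always named like the one listed above.

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class ServerZnodeFilter {
  // Assumption: HiveServer2 instances register children named "serverUri=<host:port>;version=...;sequence=...".
  public static final String SERVER_URI_PREFIX = "serverUri=";

  // Keeps only children that look like HiveServer2 instance znodes (drops e.g. "leader").
  public static List<String> filterServerNodes(List<String> children) {
    List<String> servers = new ArrayList<>();
    for (String child : children) {
      if (child.startsWith(SERVER_URI_PREFIX)) {
        servers.add(child);
      }
    }
    return servers;
  }

  // Picks a random instance znode, failing clearly if none is registered.
  public static String pickRandomServer(List<String> children, Random random) {
    List<String> servers = filterServerNodes(children);
    if (servers.isEmpty()) {
      throw new IllegalStateException("No HiveServer2 instance znodes found");
    }
    return servers.get(random.nextInt(servers.size()));
  }

  public static void main(String[] args) {
    // Children as listed under /hiveserver2 in this report.
    List<String> children = List.of(
        "leader",
        "serverUri=redacted206.visa.com:1;version=3.1.2-1-SNAPSHOT;sequence=07");
    System.out.println(pickRandomServer(children, new Random()));
  }
}
{code}

With the two children listed above, the only possible choice is the serverUri znode. Keeping the HA namespace and the discovery namespace in separate znodes, as the summary implies, would avoid the mix-up without any client change.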


Failed JDBC connection:

{code:java}

21/05/21 20:14:17 [main-SendThread(redacted106.visa.com:2181)]: DEBUG 
zookeeper.ClientCnxn: Reading reply sessionid:0x102e58cc0bd0019, packet:: 
clientPath:null serverPath:null finished:false header:: 1,12 replyHeader:: 
1,42949673107,0 request:: '/hiveserver2,F response:: 
v{'leader,'serverUri=redacted206.visa.com:1;version=3.1.2-1-SNAPSHOT;sequence=05},s{21474836584,21474836584,1621616800534,1621616800534,0,10,0,0,13,2,42949673008}
21/05/21 20:14:17 [main-SendThread(redacted106.visa.com:2181)]: DEBUG 
zookeeper.ClientCnxn: Reading reply sessionid:0x102e58cc0bd0019, packet:: 
clientPath:null serverPath:null finished:false header:: 2,4 replyHeader:: 
2,42949673107,0 request:: '/hiveserver2/leader,F response:: 
,s{21474836589,21474836589,1621616800587,1621616800587,0,5,0,0,0,1,42949673013}
2
21/05/21 20:14:17 [main-SendThread(redacted106.visa.com:2181)]: DEBUG 
zookeeper.ClientCnxn: An exception was thrown while closing send thread for 
session 0x102e58cc0bd0019 : Unable to read additional data from server 
sessionid 0x102e58cc0bd0019, likely server has closed socket
Error: org.apache.hive.jdbc.ZooKeeperHiveClientException: Unable to read 
HiveServer2 configs from ZooKeeper (state=,code=0){code}
 

 

Successful JDBC connection:




 
{code:java}

21/05/21 20:14:01 [main-SendThread(redacted106.visa.com:2181)]: DEBUG 
zookeeper.ClientCnxn: Reading reply sessionid:0x102e58cc0bd0018, packet:: 
clientPath:null serverPath:null finished:false header:: 1,12 replyHeader:: 
1,42949673103,0 request:: '/hiveserver2,F response:: 
v{'leader,'serverUri=redacted206.visa.com:1;version=3.1.2-1-SNAPSHOT;sequence=05},s{21474836584,21474836584,1621616800534,1621616800534,0,10,0,0,13,2,42949673008}
21/05/21 20:14:01 [main-SendThread(redacted106.visa.com:2181)]: DEBUG 
zookeeper.ClientCnxn: Reading reply sessionid:0x102e58cc0bd0018, packet:: 
clientPath:null serverPath:null finished:false header:: 2,4 replyHeader:: 
2,42949673103,0 request:: 
'/hiveserver2/serverUri=redacted206.visa.com:1;version=3.1.2-1-SNAPSHOT;sequence=05,F
 response:: 
#686976652e736572766572322e696e7374616e63652e7572693d736c373364706968636e30303230362e766973612e636f6d3a31303030303b686976652e736572766572322e61757468656e7469636174696f6e3d4b45524245524f533b686976652e736572766572322e7472616e73706f72742e6d6f64653d62696e6172793b686976652e736572766572322e7468726966742e7361736c2e716f703d617574683b686976652e736572766572322e7468726966742e62696e642e686f73743d736c373364706968636e30303230362e766973612e636f6d3b686976652e736572766572322e7468726966742e706f72743d31303030303b686976652e736572766572322e7573652e53534c3d66616c73653b686976652e736572766572322e61757468656e7469636174696f6e2e6b65726265726f732e7072696e636970616c3d686976652f5f484f535440434f52504445562e564953412e434f4d,s{42949673008,42949673008,1621626552349,1621626552349,0,0,0,72872936683143172,350,0,42949673008}
21/05/21 20:14:01 [main]: DEBUG jdbc.Utils: Resolved authority: 
redacted206.visa.com:1{code}
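The two logs differ in what the client reads from the chosen child znode. In the failing run, /hiveserver2/leader returns no data (its stat shows a data length of 0). In the successful run, the serverUri znode returns a hex-encoded string of key=value pairs separated by ";" (for example hive.server2.authentication=KERBEROS). A minimal sketch of parsing data in that format, assuming exactly that layout; ZnodeConfigParser and its methods are hypothetical and this is not the actual ZooKeeperHiveClientHelper code. Empty data yields no configs, which is consistent with the "Unable to read HiveServer2 configs from ZooKeeper" error.

{code:java}
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class ZnodeConfigParser {
  // Parses znode data of the form "key1=value1;key2=value2;..." into an ordered map.
  public static Map<String, String> parseConfigs(byte[] data) {
    Map<String, String> configs = new LinkedHashMap<>();
    if (data == null || data.length == 0) {
      return configs; // e.g. /hiveserver2/leader, whose data length is 0 in the failing log
    }
    String text = new String(data, StandardCharsets.UTF_8);
    for (String pair : text.split(";")) {
      int eq = pair.indexOf('=');
      if (eq > 0) {
        configs.put(pair.substring(0, eq), pair.substring(eq + 1));
      }
    }
    return configs;
  }

  // Hypothetical check that mirrors the reported error message when a znode carries no configs.
  public static Map<String, String> requireConfigs(String path, byte[] data) {
    Map<String, String> configs = parseConfigs(data);
    if (configs.isEmpty()) {
      throw new IllegalStateException("Unable to read HiveServer2 configs from ZooKeeper: " + path);
    }
    return configs;
  }

  public static void main(String[] args) {
    // Hypothetical sample; host and port are placeholders, not values from this report.
    byte[] sample = "hive.server2.instance.uri=host.example.com:10000;hive.server2.authentication=KERBEROS"
        .getBytes(StandardCharsets.UTF_8);
    System.out.println(requireConfigs("/hiveserver2/serverUri=host.example.com:10000", sample));
  }
}
{code}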
 

 





[jira] [Created] (HIVE-25152) Remove Superfluous Logging Code

2021-05-21 Thread David Mollitor (Jira)
David Mollitor created HIVE-25152:
-

 Summary: Remove Superfluous Logging Code
 Key: HIVE-25152
 URL: https://issues.apache.org/jira/browse/HIVE-25152
 Project: Hive
  Issue Type: Improvement
Reporter: David Mollitor
Assignee: David Mollitor


A lot of logging code can be removed to reduce the amount of code in the project (and perhaps yield some small performance gains).





[jira] [Created] (HIVE-25151) Remove Unused Interner from HiveMetastoreChecker

2021-05-21 Thread David Mollitor (Jira)
David Mollitor created HIVE-25151:
-

 Summary: Remove Unused Interner from HiveMetastoreChecker
 Key: HIVE-25151
 URL: https://issues.apache.org/jira/browse/HIVE-25151
 Project: Hive
  Issue Type: Improvement
Reporter: David Mollitor
Assignee: David Mollitor








[jira] [Created] (HIVE-25150) Tab characters are not removed before decimal conversion similar to space character which is fixed as part of HIVE-24378

2021-05-21 Thread Taraka Rama Rao Lethavadla (Jira)
Taraka Rama Rao Lethavadla created HIVE-25150:
-

 Summary: Tab characters are not removed before decimal conversion similar to space character which is fixed as part of HIVE-24378
 Key: HIVE-25150
 URL: https://issues.apache.org/jira/browse/HIVE-25150
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 4.0.0
Reporter: Taraka Rama Rao Lethavadla


Test case: column values containing space and tab characters:
{noformat}
bash-4.2$ cat data/files/test_dec_space.csv
1,0
2, 1
3,  2
{noformat}

{noformat}
create external table test_dec_space (id int, value decimal) ROW FORMAT 
DELIMITED
 FIELDS TERMINATED BY ',' location '/tmp/test_dec_space';
{noformat}
 

The output of select * from test_dec_space is:
{noformat}
1   0
2   1
3   NULL{noformat}
The behaviour in MySQL (MariaDB) when decimal values contain tab and space characters:
{noformat}
bash-4.2$ cat /tmp/insert.csv
"1","aa",11.88
"2","bb", 99.88
"4","dd",   209.88

MariaDB [test]> load data local infile '/tmp/insert.csv' into table t2 fields terminated by ',' ENCLOSED BY '"' LINES TERMINATED BY '\n';
Query OK, 3 rows affected, 3 warnings (0.00 sec)
Records: 3  Deleted: 0  Skipped: 0  Warnings: 3

MariaDB [test]> select * from t2;
+------+------+-------+
| id   | name | score |
+------+------+-------+
|    1 | aa   |    12 |
|    2 | bb   |   100 |
|    4 | dd   |   210 |
+------+------+-------+
3 rows in set (0.00 sec)
{noformat}
So Hive should not return NULL here.
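The requested behaviour amounts to treating tab characters like spaces when trimming a field before decimal conversion. A minimal sketch of that idea in plain Java, assuming the field arrives as a String; DecimalFieldParser and its methods are hypothetical names, this is not Hive's decimal parsing code, and it does not apply the column's precision and scale.

{code:java}
import java.math.BigDecimal;

public class DecimalFieldParser {
  // Hypothetical rule: both ' ' and '\t' count as padding around a numeric field.
  private static boolean isPadding(char c) {
    return c == ' ' || c == '\t';
  }

  // Removes leading and trailing spaces and tabs.
  public static String stripSpacesAndTabs(String field) {
    int start = 0;
    int end = field.length();
    while (start < end && isPadding(field.charAt(start))) {
      start++;
    }
    while (end > start && isPadding(field.charAt(end - 1))) {
      end--;
    }
    return field.substring(start, end);
  }

  // Returns null only when the stripped text is still not a valid decimal.
  public static BigDecimal parseDecimalOrNull(String field) {
    if (field == null) {
      return null;
    }
    try {
      return new BigDecimal(stripSpacesAndTabs(field));
    } catch (NumberFormatException e) {
      return null;
    }
  }

  public static void main(String[] args) {
    System.out.println(parseDecimalOrNull("0"));   // 0
    System.out.println(parseDecimalOrNull(" 1"));  // 1
    System.out.println(parseDecimalOrNull("\t2")); // 2
    System.out.println(parseDecimalOrNull("abc")); // null
  }
}
{code}

Without the tab handling, new BigDecimal("\t2") throws NumberFormatException, which is the kind of failure that would surface as NULL in the third row above.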





[jira] [Created] (HIVE-25149) Support parallel load for Optimized HT implementations

2021-05-21 Thread Panagiotis Garefalakis (Jira)
Panagiotis Garefalakis created HIVE-25149:
-

 Summary: Support parallel load for Optimized HT implementations
 Key: HIVE-25149
 URL: https://issues.apache.org/jira/browse/HIVE-25149
 Project: Hive
  Issue Type: Sub-task
Reporter: Panagiotis Garefalakis








[jira] [Created] (HIVE-25148) Support parallel load for Fast HT implementation

2021-05-21 Thread Panagiotis Garefalakis (Jira)
Panagiotis Garefalakis created HIVE-25148:
-

 Summary: Support parallel load for Fast HT implementation
 Key: HIVE-25148
 URL: https://issues.apache.org/jira/browse/HIVE-25148
 Project: Hive
  Issue Type: Sub-task
Reporter: Panagiotis Garefalakis








[jira] [Created] (HIVE-25147) Limit offset query in CTAS will cause data loss

2021-05-21 Thread gaozhan ding (Jira)
gaozhan ding created HIVE-25147:
---

 Summary: Limit offset query in CTAS will cause data loss 
 Key: HIVE-25147
 URL: https://issues.apache.org/jira/browse/HIVE-25147
 Project: Hive
  Issue Type: Bug
  Components: Parser
Affects Versions: 3.1.0
Reporter: gaozhan ding


Queries like:

 
{code:java}
create table ... as select ... from ... limit  offset 
or:
insert overwrite table ... select ... from ... limit offset 
{code}
will cause data loss.

 

Steps to reproduce:
{code:java}
create table test_limit_offset (id int);
insert into test_limit_offset 
values(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12),(13),(14),(15),(16);
drop table if exists test_limit_offset2;
create table test_limit_offset2 as select * from test_limit_offset limit 5 
offset 2;
{code}
Querying test_limit_offset2 returns:
{code:java}
+------------------------+
| test_limit_offset2.id  |
+------------------------+
| 5                      |
| 6                      |
| 7                      |
+------------------------+

{code}
Expected 5 rows, but got only 3.

The problem can be seen in the execution plan:
{code:java}
++
|  Explain   |
++
| Plan optimized by CBO. |
||
| Vertex dependency in root stage|
| Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE)|
||
| Stage-3|
|   Stats Work{} |
| Stage-9|
|   Create Table Operator:   |
| name:dgz.test_limit_offset2|
| Stage-2|
|   Dependency Collection{}  |
| Stage-5(CONDITIONAL)   |
|   Move Operator|
| Stage-8(CONDITIONAL CHILD TASKS: Stage-5, Stage-4, Stage-6) |
|   Conditional Operator |
| Stage-1|
|   Reducer 2|
|   File Output Operator [FS_6]  |
| table:{"name:":"dgz.test_limit_offset2"} |
| Limit [LIM_5] (rows=5 width=1) |   //reduce side full limit offset
|   Number of rows:5,Offset of rows:2 |
|   Select Operator [SEL_4] (rows=5 width=1) |
| Output:["_col0"]   |
|   <-Map 1 [CUSTOM_SIMPLE_EDGE] |
| PARTITION_ONLY_SHUFFLE [RS_3] |
|   Limit [LIM_2] (rows=5 width=1) |  //map side full limit offset
| Number of rows:5,Offset of rows:2 |
| Select Operator [SEL_1] (rows=13 width=1) |
|   Output:["_col0"] |
|   TableScan [TS_0] (rows=13 width=1) |
| dgz@test_limit_offset,test_limit_offset,Tbl:COMPLETE,Col:NONE,Output:["id"] |
| Stage-4(CONDITIONAL)   |
|   File Merge   |
|  Please refer to the previous Stage-8(CONDITIONAL CHILD TASKS: Stage-5, Stage-4, Stage-6) |
| Stage-7|
|   Move Operator|
| Stage-6(CONDITIONAL)   |
|   File Merge   |
|  Please refer to the previous Stage-8(CONDITIONAL CHILD TASKS: Stage-5, Stage-4, Stage-6) |
| Stage-0|
|   Move Operator|
|  Please refer to the previous Stage-5(CONDITIONAL) |
|  Please refer to the previous Stage-4(CONDITIONAL) |
|  Please refer to the previous Stage-7  |
||
++

{code}
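It generates a Limit operator with the same row count and offset (Number of rows:5, Offset of rows:2) on both the map side (LIM_2) and the reduce side (LIM_5), so the offset is applied twice.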
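The row loss can be reproduced arithmetically. A minimal sketch, assuming a single ordered input stream of the 16 rows from the reproduction (without ORDER BY, Hive does not guarantee which rows a LIMIT returns): applying LIMIT 5 OFFSET 2 once gives 3 to 7, and applying it again to that result gives 5, 6, 7, which matches the output above. The last method shows one possible way to split the work (map side keeps the first offset + limit rows with no offset, reduce side applies the offset once); it is an assumption for illustration, not the fix committed for this issue. LimitOffsetDemo and its methods are hypothetical.

{code:java}
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class LimitOffsetDemo {
  // Applies "LIMIT limit OFFSET offset" to rows that arrive in a fixed order.
  public static List<Integer> limitOffset(List<Integer> rows, int limit, int offset) {
    return rows.stream().skip(offset).limit(limit).collect(Collectors.toList());
  }

  // Plan shape from the report: the same LIMIT/OFFSET runs map side (LIM_2) and reduce side (LIM_5).
  public static List<Integer> offsetAppliedTwice(List<Integer> rows, int limit, int offset) {
    List<Integer> mapSide = limitOffset(rows, limit, offset);
    return limitOffset(mapSide, limit, offset);
  }

  // Hypothetical split: map side keeps the first offset + limit rows, reduce side skips offset once.
  public static List<Integer> offsetAppliedOnce(List<Integer> rows, int limit, int offset) {
    List<Integer> mapSide = limitOffset(rows, limit + offset, 0);
    return limitOffset(mapSide, limit, offset);
  }

  public static void main(String[] args) {
    List<Integer> rows = IntStream.rangeClosed(1, 16).boxed().collect(Collectors.toList());
    System.out.println(offsetAppliedTwice(rows, 5, 2)); // [5, 6, 7]
    System.out.println(offsetAppliedOnce(rows, 5, 2));  // [3, 4, 5, 6, 7]
  }
}
{code}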

 





[jira] [Created] (HIVE-25146) JMH tests for Multi HT and parallel load

2021-05-21 Thread Panagiotis Garefalakis (Jira)
Panagiotis Garefalakis created HIVE-25146:
-

 Summary: JMH tests for Multi HT and parallel load
 Key: HIVE-25146
 URL: https://issues.apache.org/jira/browse/HIVE-25146
 Project: Hive
  Issue Type: Sub-task
Reporter: Panagiotis Garefalakis


As the title suggests, add some JMH benchmarks for the parallel HT construction feature.





[jira] [Created] (HIVE-25145) Improve Multi-HashTable EstimatedMemorySize

2021-05-21 Thread Panagiotis Garefalakis (Jira)
Panagiotis Garefalakis created HIVE-25145:
-

 Summary: Improve Multi-HashTable EstimatedMemorySize
 Key: HIVE-25145
 URL: https://issues.apache.org/jira/browse/HIVE-25145
 Project: Hive
  Issue Type: Sub-task
Reporter: Panagiotis Garefalakis


When Multi HashTable is used for parallel HT loading, we calculate the estimatedMemorySize as the sum of the estimates of all HTs.
However, each of those HTs already adds some constants to its own memory estimate, e.g., 16KB of constant memory for keyBinarySortableDeserializeRead, so these constants are counted once per HT in the sum.

This ticket aims to improve the memory estimation for Multi HT
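A minimal sketch of the double counting, assuming each per-HT estimate is a variable part plus one fixed constant (the example above is 16KB for keyBinarySortableDeserializeRead), and assuming the improvement counts that constant only once for the whole Multi HT. Summing N per-HT estimates counts the constant N times. MultiHtMemoryEstimate and its methods are hypothetical, not Hive classes.

{code:java}
public class MultiHtMemoryEstimate {
  // Assumed fixed overhead per hash table, e.g. 16KB for keyBinarySortableDeserializeRead.
  public static final long CONSTANT_OVERHEAD_BYTES = 16L * 1024;

  // Current behaviour as described: each HT estimate includes the constant, and the estimates are summed.
  public static long summedEstimate(long[] variableBytesPerTable) {
    long total = 0;
    for (long variableBytes : variableBytesPerTable) {
      total += variableBytes + CONSTANT_OVERHEAD_BYTES;
    }
    return total;
  }

  // Assumed improvement: count the constant overhead once for the whole Multi HT.
  public static long sharedConstantEstimate(long[] variableBytesPerTable) {
    long total = CONSTANT_OVERHEAD_BYTES;
    for (long variableBytes : variableBytesPerTable) {
      total += variableBytes;
    }
    return total;
  }

  public static void main(String[] args) {
    long[] tables = {1L << 20, 1L << 20, 1L << 20, 1L << 20}; // four hypothetical 1MB tables
    System.out.println(summedEstimate(tables));         // 4259840
    System.out.println(sharedConstantEstimate(tables)); // 4210688
  }
}
{code}

For four tables of 1MB each, the summed estimate is 4,259,840 bytes and the shared-constant estimate is 4,210,688 bytes; the difference is 3 x 16KB = 49,152 bytes.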


