[jira] [Updated] (HIVE-10931) Wrong columns selected on multiple joins

2015-06-12 Thread Furcy Pin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Furcy Pin updated HIVE-10931:
-
Description: 
The following set of queries:

{code:sql}
DROP TABLE IF EXISTS test1 ;
DROP TABLE IF EXISTS test2 ;
DROP TABLE IF EXISTS test3 ;

CREATE TABLE test1 (col1 INT, col2 STRING, col3 STRING, col4 STRING, col5 STRING, col6 STRING) ;
INSERT INTO TABLE test1 VALUES (1,NULL,NULL,NULL,NULL,"A") ;

CREATE TABLE test2 (col1 INT, col2 STRING, col3 STRING, col4 STRING, col5 STRING, col6 STRING) ;
INSERT INTO TABLE test2 VALUES (1,NULL,NULL,NULL,NULL,"X") ;

CREATE TABLE test3 (col1 STRING) ;
INSERT INTO TABLE test3 VALUES ("A") ;

SELECT
  T2.val
FROM test1 T1
LEFT JOIN (SELECT col1, col2, col3, col4, col5, COALESCE(col6,"") as val FROM test2) T2
ON T2.col1 = T1.col1
LEFT JOIN test3 T3
ON T3.col1 = T1.col6
;
{code}

will return this:

{noformat}
+---------+
| t2.val  |
+---------+
| A       |
+---------+
{noformat}

Obviously, this result is wrong: table `test2` contains an "X" and no "A".

This is the most minimal example we found of this issue; in particular, the query works correctly with fewer than 6 columns in the tables. For instance:

{code:sql}
SELECT
  T2.val
FROM test1 T1
LEFT JOIN (SELECT col1, col2, col3, col4, COALESCE(col6,"") as val FROM test2) T2
ON T2.col1 = T1.col1
LEFT JOIN test3 T3
ON T3.col1 = T1.col6
;
{code}

(the same query as before, but with `col5` removed from the select)
will return:

{noformat}
+---------+
| t2.val  |
+---------+
| X       |
+---------+
{noformat}

Removing the `COALESCE` also removes the bug...
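For clarity, here is a minimal Python sketch (not Hive code; the table contents are hard-coded from the repro script above, and the helper names are mine) of what the two LEFT JOINs should logically produce:

```python
# Rows of the three tables, hard-coded from the repro script
# (only the columns that matter for the joins are kept).
test1 = [{"col1": 1, "col6": "A"}]
test2 = [{"col1": 1, "col6": "X"}]
test3 = [{"col1": "A"}]

def coalesce(*args):
    # First non-None argument, mimicking SQL COALESCE.
    return next((a for a in args if a is not None), None)

# Derived table T2: SELECT ..., COALESCE(col6, "") as val FROM test2
t2 = [{"col1": r["col1"], "val": coalesce(r["col6"], "")} for r in test2]

rows = []
for r1 in test1:
    # LEFT JOIN T2 ON T2.col1 = T1.col1
    m2 = [r2 for r2 in t2 if r2["col1"] == r1["col1"]] or [{"val": None}]
    # LEFT JOIN test3 T3 ON T3.col1 = T1.col6
    m3 = [r3 for r3 in test3 if r3["col1"] == r1["col6"]] or [None]
    for r2 in m2:
        for _r3 in m3:
            rows.append(r2["val"])

print(rows)  # ['X'] -- the value coming from test2, which Hive 1.1.0 gets wrong
```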




  was:
The following set of queries:

{code:sql}
DROP TABLE IF EXISTS test1 ;
DROP TABLE IF EXISTS test2 ;
DROP TABLE IF EXISTS test3 ;

CREATE TABLE test1 (col1 INT, col2 STRING, col3 STRING, col4 STRING, col5 STRING, col6 STRING) ;
INSERT INTO TABLE test1 VALUES (1,NULL,NULL,NULL,NULL,"A") ;

CREATE TABLE test2 (col1 INT, col2 STRING, col3 STRING, col4 STRING, col5 STRING, col6 STRING) ;
INSERT INTO TABLE test2 VALUES (1,NULL,NULL,NULL,NULL,"X") ;

CREATE TABLE test3 (col1 STRING) ;
INSERT INTO TABLE test3 VALUES ("A") ;

SELECT
  T2.val
FROM test1 T1
LEFT JOIN (SELECT col1, col2, col3, col4, col5, COALESCE(col6,"") as val FROM test2) T2
ON T2.col1 = T1.col1
LEFT JOIN test3 T3
ON T3.col1 = T1.col6
;
{code}

will return this:

{noformat}
+---------+
| t2.val  |
+---------+
| A       |
+---------+
{noformat}

Obviously, this result is wrong: table `test2` contains an "X" and no "A".

This is the most minimal example we found of this issue; in particular, the query works correctly with fewer than 6 columns in the tables. For instance:

{code:sql}
SELECT
  T2.val
FROM test1 T1
LEFT JOIN (SELECT col1, col2, col3, col4, COALESCE(col6,"") as val FROM test2) T2
ON T2.col1 = T1.col1
LEFT JOIN test3 T3
ON T3.col1 = T1.col6
;
{code}

(the same query as before, but with `col5` removed from the select)
will return:

{noformat}
+---------+
| t2.val  |
+---------+
| X       |
+---------+
{noformat}

Removing the `COALESCE` also removes the bug...





 Wrong columns selected on multiple joins
 

 Key: HIVE-10931
 URL: https://issues.apache.org/jira/browse/HIVE-10931
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.1.0
 Environment: Cloudera cdh5.4.2
Reporter: Furcy Pin

 The following set of queries:
 {code:sql}
 DROP TABLE IF EXISTS test1 ;
 DROP TABLE IF EXISTS test2 ;
 DROP TABLE IF EXISTS test3 ;
 CREATE TABLE test1 (col1 INT, col2 STRING, col3 STRING, col4 STRING, col5 STRING, col6 STRING) ;
 INSERT INTO TABLE test1 VALUES (1,NULL,NULL,NULL,NULL,"A") ;
 CREATE TABLE test2 (col1 INT, col2 STRING, col3 STRING, col4 STRING, col5 STRING, col6 STRING) ;
 INSERT INTO TABLE test2 VALUES (1,NULL,NULL,NULL,NULL,"X") ;
 CREATE TABLE test3 (col1 STRING) ;
 INSERT INTO TABLE test3 VALUES ("A") ;
 SELECT
   T2.val
 FROM test1 T1
 LEFT JOIN (SELECT col1, col2, col3, col4, col5, COALESCE(col6,"") as val FROM test2) T2
 ON T2.col1 = T1.col1
 LEFT JOIN test3 T3
 ON T3.col1 = T1.col6
 ;
 {code}
 will return this:
 {noformat}
 +---------+
 | t2.val  |
 +---------+
 | A       |
 +---------+
 {noformat}
 Obviously, this result is wrong: table `test2` contains an "X" and no "A".
 This is the most minimal example we found of this issue; in particular, the query works correctly with fewer than 6 columns in the tables. For instance:
 {code:sql}
 SELECT
   T2.val
 FROM test1 T1
 LEFT JOIN (SELECT col1, col2, col3, col4, COALESCE(col6,"") as val FROM test2) T2
 ON T2.col1 = T1.col1
 LEFT JOIN test3 T3
 ON T3.col1 = T1.col6
 ;
 {code}
 (the same query as before, but with `col5` removed from the select)
 will return:
 {noformat}
 +---------+
 | t2.val  |
 +---------+
 | X       |
 +---------+
 {noformat}
 Removing the `COALESCE` also removes the bug...



--

[jira] [Updated] (HIVE-10931) Wrong columns selected on multiple joins

2015-06-12 Thread Furcy Pin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Furcy Pin updated HIVE-10931:
-
Description: 
The following set of queries:

{code:sql}
DROP TABLE IF EXISTS test1 ;
DROP TABLE IF EXISTS test2 ;
DROP TABLE IF EXISTS test3 ;

CREATE TABLE test1 (col1 INT, col2 STRING, col3 STRING, col4 STRING, col5 STRING, col6 STRING) ;
INSERT INTO TABLE test1 VALUES (1,NULL,NULL,NULL,NULL,"A") ;

CREATE TABLE test2 (col1 INT, col2 STRING, col3 STRING, col4 STRING, col5 STRING, col6 STRING) ;
INSERT INTO TABLE test2 VALUES (1,NULL,NULL,NULL,NULL,"X") ;

CREATE TABLE test3 (col1 STRING) ;
INSERT INTO TABLE test3 VALUES ("A") ;

SELECT
  T2.val
FROM test1 T1
LEFT JOIN (SELECT col1, col2, col3, col4, col5, COALESCE(col6,"") as val FROM test2) T2
ON T2.col1 = T1.col1
LEFT JOIN test3 T3
ON T3.col1 = T1.col6
;
{code}

will return this:

{noformat}
+---------+
| t2.val  |
+---------+
| A       |
+---------+
{noformat}

Obviously, this result is wrong: table `test2` contains an "X" and no "A".

This is the most minimal example we found of this issue; in particular, the query works correctly with fewer than 6 columns in the tables. For instance:

{code:sql}
SELECT
  T2.val
FROM test1 T1
LEFT JOIN (SELECT col1, col2, col3, col4, COALESCE(col6,"") as val FROM test2) T2
ON T2.col1 = T1.col1
LEFT JOIN test3 T3
ON T3.col1 = T1.col6
;
{code}

(the same query as before, but with `col5` removed from the select)
will return:

{noformat}
+---------+
| t2.val  |
+---------+
| X       |
+---------+
{noformat}

Removing the `COALESCE` also removes the bug...




  was:
The following set of queries:

```
DROP TABLE IF EXISTS test1 ;
DROP TABLE IF EXISTS test2 ;
DROP TABLE IF EXISTS test3 ;

CREATE TABLE test1 (col1 INT, col2 STRING, col3 STRING, col4 STRING, col5 STRING, col6 STRING) ;
INSERT INTO TABLE test1 VALUES (1,NULL,NULL,NULL,NULL,"A") ;

CREATE TABLE test2 (col1 INT, col2 STRING, col3 STRING, col4 STRING, col5 STRING, col6 STRING) ;
INSERT INTO TABLE test2 VALUES (1,NULL,NULL,NULL,NULL,"X") ;

CREATE TABLE test3 (col1 STRING) ;
INSERT INTO TABLE test3 VALUES ("A") ;

SELECT
  T2.val
FROM test1 T1
LEFT JOIN (SELECT col1, col2, col3, col4, col5, COALESCE(col6,"") as val FROM test2) T2
ON T2.col1 = T1.col1
LEFT JOIN test3 T3
ON T3.col1 = T1.col6
;
```

will return this:

```
+---------+
| t2.val  |
+---------+
| A       |
+---------+
```

Obviously, this result is wrong: table `test2` contains an "X" and no "A".

This is the most minimal example we found of this issue; in particular, the query works correctly with fewer than 6 columns in the tables. For instance:

```
SELECT
  T2.val
FROM test1 T1
LEFT JOIN (SELECT col1, col2, col3, col4, COALESCE(col6,"") as val FROM test2) T2
ON T2.col1 = T1.col1
LEFT JOIN test3 T3
ON T3.col1 = T1.col6
;
```

(the same query as before, but with `col5` removed from the select)
will return:

```
+---------+
| t2.val  |
+---------+
| X       |
+---------+
```

Removing the `COALESCE` also removes the bug...





 Wrong columns selected on multiple joins
 

 Key: HIVE-10931
 URL: https://issues.apache.org/jira/browse/HIVE-10931
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.1.0
 Environment: Cloudera cdh5.4.2
Reporter: Furcy Pin

 The following set of queries:
 {code:sql}
 DROP TABLE IF EXISTS test1 ;
 DROP TABLE IF EXISTS test2 ;
 DROP TABLE IF EXISTS test3 ;
 CREATE TABLE test1 (col1 INT, col2 STRING, col3 STRING, col4 STRING, col5 STRING, col6 STRING) ;
 INSERT INTO TABLE test1 VALUES (1,NULL,NULL,NULL,NULL,"A") ;
 CREATE TABLE test2 (col1 INT, col2 STRING, col3 STRING, col4 STRING, col5 STRING, col6 STRING) ;
 INSERT INTO TABLE test2 VALUES (1,NULL,NULL,NULL,NULL,"X") ;
 CREATE TABLE test3 (col1 STRING) ;
 INSERT INTO TABLE test3 VALUES ("A") ;
 SELECT
   T2.val
 FROM test1 T1
 LEFT JOIN (SELECT col1, col2, col3, col4, col5, COALESCE(col6,"") as val FROM test2) T2
 ON T2.col1 = T1.col1
 LEFT JOIN test3 T3
 ON T3.col1 = T1.col6
 ;
 {code}
 will return this:
 {noformat}
 +---------+
 | t2.val  |
 +---------+
 | A       |
 +---------+
 {noformat}
 Obviously, this result is wrong: table `test2` contains an "X" and no "A".
 This is the most minimal example we found of this issue; in particular, the query works correctly with fewer than 6 columns in the tables. For instance:
 {code:sql}
 SELECT
   T2.val
 FROM test1 T1
 LEFT JOIN (SELECT col1, col2, col3, col4, COALESCE(col6,"") as val FROM test2) T2
 ON T2.col1 = T1.col1
 LEFT JOIN test3 T3
 ON T3.col1 = T1.col6
 ;
 {code}
 (the same query as before, but with `col5` removed from the select)
 will return:
 {noformat}
 +---------+
 | t2.val  |
 +---------+
 | X       |
 +---------+
 {noformat}
 Removing the `COALESCE` also removes the bug...



--
This message was sent by Atlassian JIRA

[jira] [Commented] (HIVE-11192) Wrong results for query with WHERE ... NOT IN when table has null values

2015-07-07 Thread Furcy Pin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14616963#comment-14616963
 ] 

Furcy Pin commented on HIVE-11192:
--

My apologies: this was not a bug, but the expected (confusing, IMHO) behavior of SQL.

SELECT 1 IN (1,2,3,NULL) ; 
 true

SELECT 1 IN (2,3) ;
 false

SELECT 1 IN (2,3,NULL) ;
 NULL

SELECT 1 NOT IN (1,2,3,NULL) ; 
 false

SELECT 1 NOT IN (2,3,NULL) ;
 NULL

SELECT 1 NOT IN (2,3) ;
 true
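This behavior follows from SQL's three-valued logic, which can be sketched in Python (None standing in for NULL; the helper names are mine, this is not Hive code):

```python
def sql_in(x, values):
    # SQL three-valued IN: a definite match wins; otherwise a NULL in the
    # list makes the answer unknown (None), not False.
    if any(v == x for v in values if v is not None):
        return True
    if any(v is None for v in values):
        return None
    return False

def sql_not_in(x, values):
    # NOT of unknown is still unknown.
    r = sql_in(x, values)
    return None if r is None else not r

print(sql_in(1, [1, 2, 3, None]))   # True
print(sql_in(1, [2, 3, None]))      # None (SQL NULL)
print(sql_not_in(1, [2, 3, None]))  # None (SQL NULL)
print(sql_not_in(1, [2, 3]))        # True
```

Since a WHERE clause keeps only rows for which the predicate is true, a NULL result behaves like false, which is why NOT IN against a subquery containing a NULL filters out every row.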


 Wrong results for query with WHERE ... NOT IN when table has null values
 

 Key: HIVE-11192
 URL: https://issues.apache.org/jira/browse/HIVE-11192
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.1.0, 1.2.1
 Environment: Hive on MR
Reporter: Furcy Pin

 I tested this on a cdh5.4.2 cluster and locally on the release-1.2.1 branch:
 ```sql
 DROP TABLE IF EXISTS test1 ;
 DROP TABLE IF EXISTS test2 ;
 CREATE TABLE test1 (col1 STRING) ;
 INSERT INTO TABLE test1 VALUES (1), (2), (3), (4) ;
 CREATE TABLE test2 (col1 STRING) ;
 INSERT INTO TABLE test2 VALUES (1), (4), (NULL) ;
 SELECT 
 COUNT(1)
 FROM test1 T1
 WHERE T1.col1 NOT IN (SELECT col1 FROM test2)
 ;
 SELECT 
 COUNT(1)
 FROM test1 T1
 WHERE T1.col1 NOT IN (SELECT col1 FROM test2 WHERE col1 IS NOT NULL)
 ;
 ```
 The first query returns 0 and the second returns 2.
 Obviously, the expected answer is always 2.



--


[jira] [Updated] (HIVE-12258) read/write into same partitioned table + concurrency = deadlock

2015-10-24 Thread Furcy Pin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Furcy Pin updated HIVE-12258:
-
Description: 
When hive.support.concurrency is enabled, if you launch a query that reads data from one partition and writes data into another partition of the same table, it creates a deadlock.
The worst part is that once the deadlock is active, you can't query the table until it times out.

* How to reproduce :

```
CREATE TABLE test_table (id INT) 
PARTITIONED BY (part STRING)
;

INSERT INTO TABLE test_table PARTITION (part="test")
VALUES (1), (2), (3), (4) 
;

INSERT OVERWRITE TABLE test_table PARTITION (part="test2")
SELECT id FROM test_table WHERE part="test1";
```

Nothing happens, and when doing a SHOW LOCKS in another terminal we get :

```
SHOW LOCKS ;
+---------+-----------+-------------+-------------+-------------+--------------+-----------------+-----------------+--------------+
| lockid  | database  | table       | partition   | lock_state  | lock_type    | transaction_id  | last_heartbeat  | acquired_at  |
+---------+-----------+-------------+-------------+-------------+--------------+-----------------+-----------------+--------------+
| 3765    | default   | test_table  | NULL        | WAITING     | SHARED_READ  | NULL            | 1440603633148   | NULL         |
| 3765    | default   | test_table  | part=test2  | WAITING     | EXCLUSIVE    | NULL            | 1440603633148   | NULL         |
+---------+-----------+-------------+-------------+-------------+--------------+-----------------+-----------------+--------------+
```

This was tested on Hive 1.1.0-cdh5.4.2, but I believe the bug is still present in 1.2.1.
I could not reproduce it easily locally, because it requires a pseudo-distributed setup with ZooKeeper to have concurrency enabled.

From looking at the code, I believe the problem comes from the EmbeddedLockManager method `public List<HiveLock> lock(List<HiveLockObj> objs, int numRetriesForLock, long sleepTime)`, which keeps trying to acquire two incompatible locks and ends up failing after hive.lock.numretries * hive.lock.sleep.between.retries, which by default is 100 * 60s = 100 minutes.
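The retry arithmetic above can be sketched as follows (Python, illustrative only: the constant names come from the config properties mentioned in the ticket, and the loop is not Hive's actual EmbeddedLockManager code):

```python
HIVE_LOCK_NUMRETRIES = 100     # default hive.lock.numretries
SLEEP_BETWEEN_RETRIES_S = 60   # default hive.lock.sleep.between.retries (60s)

def try_lock_with_retries(acquire, num_retries, sleep_s):
    # Keep retrying a lock acquisition; returns (success, seconds waited).
    waited = 0
    for _ in range(num_retries):
        if acquire():
            return True, waited
        waited += sleep_s  # in Hive this is a real sleep between retries
    return False, waited

# The query deadlocks against itself (SHARED_READ vs EXCLUSIVE on the same
# table), so every attempt fails and it only gives up after ~100 minutes:
ok, waited = try_lock_with_retries(lambda: False,
                                   HIVE_LOCK_NUMRETRIES,
                                   SLEEP_BETWEEN_RETRIES_S)
print(ok, waited // 60)  # False 100
```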


  was:
When hive.support.concurrency is enabled, if you launch a query that reads data from one partition and writes data into another partition of the same table, it creates a deadlock.
The worst part is that once the deadlock is active, you can't query the table until it times out.

## How to reproduce :

```sql
CREATE TABLE test_table (id INT) 
PARTITIONED BY (part STRING)
;

INSERT INTO TABLE test_table PARTITION (part="test")
VALUES (1), (2), (3), (4) 
;

INSERT OVERWRITE TABLE test_table PARTITION (part="test2")
SELECT id FROM test_table WHERE part="test1";
```

Nothing happens, and when doing a SHOW LOCKS in another terminal we get :

```
SHOW LOCKS ;
+---------+-----------+-------------+-------------+-------------+--------------+-----------------+-----------------+--------------+
| lockid  | database  | table       | partition   | lock_state  | lock_type    | transaction_id  | last_heartbeat  | acquired_at  |
+---------+-----------+-------------+-------------+-------------+--------------+-----------------+-----------------+--------------+
| 3765    | default   | test_table  | NULL        | WAITING     | SHARED_READ  | NULL            | 1440603633148   | NULL         |
| 3765    | default   | test_table  | part=test2  | WAITING     | EXCLUSIVE    | NULL            | 1440603633148   | NULL         |
+---------+-----------+-------------+-------------+-------------+--------------+-----------------+-----------------+--------------+
```

This was tested on Hive 1.1.0-cdh5.4.2, but I believe the bug is still present in 1.2.1.
I could not reproduce it easily locally, because it requires a pseudo-distributed setup with ZooKeeper to have concurrency enabled.

From looking at the code, I believe the problem comes from the EmbeddedLockManager method `public List<HiveLock> lock(List<HiveLockObj> objs, int numRetriesForLock, long sleepTime)`, which keeps trying to acquire two incompatible locks and ends up failing after hive.lock.numretries * hive.lock.sleep.between.retries, which by default is 100 * 60s = 100 minutes.



> read/write into same partitioned table + concurrency = deadlock
> ---
>
> Key: HIVE-12258
> URL: https://issues.apache.org/jira/browse/HIVE-12258
> Project: Hive
>  Issue Type: Bug
>Reporter: Furcy Pin
>
> When hive.support.concurrency is enabled if you launch a query that reads 
> data from a partition and writes data into another partition of the same 
> table,
> it creates a deadlock. 
> The worst part is that once the deadlock is active, you can't query the table 
> until it times out. 
> * How to reproduce :
> ```
> CREATE TABLE test_table (id INT) 
> PARTITIONED BY (part STRING)
> ;
> 

[jira] [Updated] (HIVE-12258) read/write into same partitioned table + concurrency = deadlock

2015-10-24 Thread Furcy Pin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Furcy Pin updated HIVE-12258:
-
Description: 
When hive.support.concurrency is enabled, if you launch a query that reads data from one partition and writes data into another partition of the same table, it creates a deadlock.
The worst part is that once the deadlock is active, you can't query the table until it times out.

* How to reproduce :


CREATE TABLE test_table (id INT) 
PARTITIONED BY (part STRING)
;

INSERT INTO TABLE test_table PARTITION (part="test")
VALUES (1), (2), (3), (4) 
;

INSERT OVERWRITE TABLE test_table PARTITION (part="test2")
SELECT id FROM test_table WHERE part="test1";

Nothing happens, and when doing a SHOW LOCKS in another terminal we get :

+---------+-----------+-------------+-------------+-------------+--------------+-----------------+-----------------+--------------+
| lockid  | database  | table       | partition   | lock_state  | lock_type    | transaction_id  | last_heartbeat  | acquired_at  |
+---------+-----------+-------------+-------------+-------------+--------------+-----------------+-----------------+--------------+
| 3765    | default   | test_table  | NULL        | WAITING     | SHARED_READ  | NULL            | 1440603633148   | NULL         |
| 3765    | default   | test_table  | part=test2  | WAITING     | EXCLUSIVE    | NULL            | 1440603633148   | NULL         |
+---------+-----------+-------------+-------------+-------------+--------------+-----------------+-----------------+--------------+

This was tested on Hive 1.1.0-cdh5.4.2, but I believe the bug is still present in 1.2.1.
I could not reproduce it easily locally, because it requires a pseudo-distributed setup with ZooKeeper to have concurrency enabled.

From looking at the code, I believe the problem comes from the EmbeddedLockManager method `public List<HiveLock> lock(List<HiveLockObj> objs, int numRetriesForLock, long sleepTime)`, which keeps trying to acquire two incompatible locks and ends up failing after hive.lock.numretries * hive.lock.sleep.between.retries, which by default is 100 * 60s = 100 minutes.


  was:
When hive.support.concurrency is enabled, if you launch a query that reads data from one partition and writes data into another partition of the same table, it creates a deadlock.
The worst part is that once the deadlock is active, you can't query the table until it times out.

* How to reproduce :

```
CREATE TABLE test_table (id INT) 
PARTITIONED BY (part STRING)
;

INSERT INTO TABLE test_table PARTITION (part="test")
VALUES (1), (2), (3), (4) 
;

INSERT OVERWRITE TABLE test_table PARTITION (part="test2")
SELECT id FROM test_table WHERE part="test1";
```

Nothing happens, and when doing a SHOW LOCKS in another terminal we get :

```
SHOW LOCKS ;
+---------+-----------+-------------+-------------+-------------+--------------+-----------------+-----------------+--------------+
| lockid  | database  | table       | partition   | lock_state  | lock_type    | transaction_id  | last_heartbeat  | acquired_at  |
+---------+-----------+-------------+-------------+-------------+--------------+-----------------+-----------------+--------------+
| 3765    | default   | test_table  | NULL        | WAITING     | SHARED_READ  | NULL            | 1440603633148   | NULL         |
| 3765    | default   | test_table  | part=test2  | WAITING     | EXCLUSIVE    | NULL            | 1440603633148   | NULL         |
+---------+-----------+-------------+-------------+-------------+--------------+-----------------+-----------------+--------------+
```

This was tested on Hive 1.1.0-cdh5.4.2, but I believe the bug is still present in 1.2.1.
I could not reproduce it easily locally, because it requires a pseudo-distributed setup with ZooKeeper to have concurrency enabled.

From looking at the code, I believe the problem comes from the EmbeddedLockManager method `public List<HiveLock> lock(List<HiveLockObj> objs, int numRetriesForLock, long sleepTime)`, which keeps trying to acquire two incompatible locks and ends up failing after hive.lock.numretries * hive.lock.sleep.between.retries, which by default is 100 * 60s = 100 minutes.



> read/write into same partitioned table + concurrency = deadlock
> ---
>
> Key: HIVE-12258
> URL: https://issues.apache.org/jira/browse/HIVE-12258
> Project: Hive
>  Issue Type: Bug
>Reporter: Furcy Pin
>
> When hive.support.concurrency is enabled if you launch a query that reads 
> data from a partition and writes data into another partition of the same 
> table,
> it creates a deadlock. 
> The worst part is that once the deadlock is active, you can't query the table 
> until it times out. 
> * How to reproduce :
> CREATE TABLE test_table (id INT) 
> PARTITIONED BY (part STRING)
> ;
> INSERT INTO TABLE test_table PARTITION 

[jira] [Updated] (HIVE-12258) read/write into same partitioned table + concurrency = deadlock

2015-10-24 Thread Furcy Pin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Furcy Pin updated HIVE-12258:
-
Description: 
When hive.support.concurrency is enabled, if you launch a query that reads data from one partition and writes data into another partition of the same table, it creates a deadlock.
The worst part is that once the deadlock is active, you can't query the table until it times out.

* How to reproduce :


CREATE TABLE test_table (id INT) 
PARTITIONED BY (part STRING)
;

INSERT INTO TABLE test_table PARTITION (part="test")
VALUES (1), (2), (3), (4) 
;

INSERT OVERWRITE TABLE test_table PARTITION (part="test2")
SELECT id FROM test_table WHERE part="test1";

Nothing happens, and when doing a SHOW LOCKS in another terminal we get :

+---------+-----------+-------------+-------------+-------------+--------------+-----------------+-----------------+--------------+
| lockid  | database  | table       | partition   | lock_state  | lock_type    | transaction_id  | last_heartbeat  | acquired_at  |
+---------+-----------+-------------+-------------+-------------+--------------+-----------------+-----------------+--------------+
| 3765    | default   | test_table  | NULL        | WAITING     | SHARED_READ  | NULL            | 1440603633148   | NULL         |
| 3765    | default   | test_table  | part=test2  | WAITING     | EXCLUSIVE    | NULL            | 1440603633148   | NULL         |
+---------+-----------+-------------+-------------+-------------+--------------+-----------------+-----------------+--------------+

This was tested on Hive 1.1.0-cdh5.4.2, but I believe the bug is still present in 1.2.1.
I could not reproduce it easily locally, because it requires a pseudo-distributed setup with ZooKeeper to have concurrency enabled.

From looking at the code, I believe the problem comes from the EmbeddedLockManager method `public List<HiveLock> lock(List<HiveLockObj> objs, int numRetriesForLock, long sleepTime)`, which keeps trying to acquire two incompatible locks and ends up failing after hive.lock.numretries * hive.lock.sleep.between.retries, which by default is 100 * 60s = 100 minutes.


  was:
When hive.support.concurrency is enabled, if you launch a query that reads data from one partition and writes data into another partition of the same table, it creates a deadlock.
The worst part is that once the deadlock is active, you can't query the table until it times out.

* How to reproduce :


CREATE TABLE test_table (id INT) 
PARTITIONED BY (part STRING)
;

INSERT INTO TABLE test_table PARTITION (part="test")
VALUES (1), (2), (3), (4) 
;

INSERT OVERWRITE TABLE test_table PARTITION (part="test2")
SELECT id FROM test_table WHERE part="test1";

Nothing happens, and when doing a SHOW LOCKS in another terminal we get :

+---------+-----------+-------------+-------------+-------------+--------------+-----------------+-----------------+--------------+
| lockid  | database  | table       | partition   | lock_state  | lock_type    | transaction_id  | last_heartbeat  | acquired_at  |
+---------+-----------+-------------+-------------+-------------+--------------+-----------------+-----------------+--------------+
| 3765    | default   | test_table  | NULL        | WAITING     | SHARED_READ  | NULL            | 1440603633148   | NULL         |
| 3765    | default   | test_table  | part=test2  | WAITING     | EXCLUSIVE    | NULL            | 1440603633148   | NULL         |
+---------+-----------+-------------+-------------+-------------+--------------+-----------------+-----------------+--------------+

This was tested on Hive 1.1.0-cdh5.4.2, but I believe the bug is still present in 1.2.1.
I could not reproduce it easily locally, because it requires a pseudo-distributed setup with ZooKeeper to have concurrency enabled.

From looking at the code, I believe the problem comes from the EmbeddedLockManager method `public List<HiveLock> lock(List<HiveLockObj> objs, int numRetriesForLock, long sleepTime)`, which keeps trying to acquire two incompatible locks and ends up failing after hive.lock.numretries * hive.lock.sleep.between.retries, which by default is 100 * 60s = 100 minutes.



> read/write into same partitioned table + concurrency = deadlock
> ---
>
> Key: HIVE-12258
> URL: https://issues.apache.org/jira/browse/HIVE-12258
> Project: Hive
>  Issue Type: Bug
>Reporter: Furcy Pin
>
> When hive.support.concurrency is enabled if you launch a query that reads 
> data from a partition and writes data into another partition of the same 
> table,
> it creates a deadlock. 
> The worst part is that once the deadlock is active, you can't query the table 
> until it times out. 
> * How to reproduce :
> CREATE TABLE test_table (id INT) 
> PARTITIONED BY (part STRING)
> ;
> INSERT INTO TABLE test_table PARTITION (part="test")
> VALUES (1), (2), (3), (4) 
> ;
> INSERT OVERWRITE TABLE test_table PARTITION (part="test2")
> SELECT id FROM test_table WHERE part="test1";
> Nothing happens, and when doing a SHOW LOCKS in another terminal we get :
> | lockid   | database  | table  | partition  | lock_state  | lock_type
> | transaction_id  | last_heartbeat  |  acquired_at   |
> | 3765 | default   | test_table | NULL   | 

[jira] [Updated] (HIVE-12258) read/write into same partitioned table + concurrency = deadlock

2015-10-24 Thread Furcy Pin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Furcy Pin updated HIVE-12258:
-
Description: 
When hive.support.concurrency is enabled, if you launch a query that reads data from one partition and writes data into another partition of the same table, it creates a deadlock.
The worst part is that once the deadlock is active, you can't query the table until it times out.

* How to reproduce :


CREATE TABLE test_table (id INT) 
PARTITIONED BY (part STRING)
;

INSERT INTO TABLE test_table PARTITION (part="test")
VALUES (1), (2), (3), (4) 
;

INSERT OVERWRITE TABLE test_table PARTITION (part="test2")
SELECT id FROM test_table WHERE part="test1";

Nothing happens, and when doing a SHOW LOCKS in another terminal we get :

+---------+-----------+-------------+-------------+-------------+--------------+-----------------+-----------------+--------------+
| lockid  | database  | table       | partition   | lock_state  | lock_type    | transaction_id  | last_heartbeat  | acquired_at  |
+---------+-----------+-------------+-------------+-------------+--------------+-----------------+-----------------+--------------+
| 3765    | default   | test_table  | NULL        | WAITING     | SHARED_READ  | NULL            | 1440603633148   | NULL         |
| 3765    | default   | test_table  | part=test2  | WAITING     | EXCLUSIVE    | NULL            | 1440603633148   | NULL         |
+---------+-----------+-------------+-------------+-------------+--------------+-----------------+-----------------+--------------+

This was tested on Hive 1.1.0-cdh5.4.2, but I believe the bug is still present in 1.2.1.
I could not reproduce it easily locally, because it requires a pseudo-distributed setup with ZooKeeper to have concurrency enabled.

From looking at the code, I believe the problem comes from the EmbeddedLockManager method `public List<HiveLock> lock(List<HiveLockObj> objs, int numRetriesForLock, long sleepTime)`, which keeps trying to acquire two incompatible locks and ends up failing after hive.lock.numretries * hive.lock.sleep.between.retries, which by default is 100 * 60s = 100 minutes.


  was:
When hive.support.concurrency is enabled, if you launch a query that reads data from one partition and writes data into another partition of the same table, it creates a deadlock.
The worst part is that once the deadlock is active, you can't query the table until it times out.

* How to reproduce :


CREATE TABLE test_table (id INT) 
PARTITIONED BY (part STRING)
;

INSERT INTO TABLE test_table PARTITION (part="test")
VALUES (1), (2), (3), (4) 
;

INSERT OVERWRITE TABLE test_table PARTITION (part="test2")
SELECT id FROM test_table WHERE part="test1";

Nothing happens, and when doing a SHOW LOCKS in another terminal we get :

+---------+-----------+-------------+-------------+-------------+--------------+-----------------+-----------------+--------------+
| lockid  | database  | table       | partition   | lock_state  | lock_type    | transaction_id  | last_heartbeat  | acquired_at  |
+---------+-----------+-------------+-------------+-------------+--------------+-----------------+-----------------+--------------+
| 3765    | default   | test_table  | NULL        | WAITING     | SHARED_READ  | NULL            | 1440603633148   | NULL         |
| 3765    | default   | test_table  | part=test2  | WAITING     | EXCLUSIVE    | NULL            | 1440603633148   | NULL         |
+---------+-----------+-------------+-------------+-------------+--------------+-----------------+-----------------+--------------+

This was tested on Hive 1.1.0-cdh5.4.2, but I believe the bug is still present in 1.2.1.
I could not reproduce it easily locally, because it requires a pseudo-distributed setup with ZooKeeper to have concurrency enabled.

From looking at the code, I believe the problem comes from the EmbeddedLockManager method `public List<HiveLock> lock(List<HiveLockObj> objs, int numRetriesForLock, long sleepTime)`, which keeps trying to acquire two incompatible locks and ends up failing after hive.lock.numretries * hive.lock.sleep.between.retries, which by default is 100 * 60s = 100 minutes.



> read/write into same partitioned table + concurrency = deadlock
> ---
>
> Key: HIVE-12258
> URL: https://issues.apache.org/jira/browse/HIVE-12258
> Project: Hive
>  Issue Type: Bug
>Reporter: Furcy Pin
>
> When hive.support.concurrency is enabled if you launch a query that reads 
> data from a partition and writes data into another partition of the same 
> table,
> it creates a deadlock. 
> The worst part is that once the deadlock is active, you can't query the table 
> until it times out. 
> * How to reproduce :
> CREATE TABLE test_table (id INT) 
> PARTITIONED BY (part STRING)
> ;
> INSERT INTO TABLE test_table PARTITION (part="test")
> VALUES (1), (2), (3), (4) 
> ;
> INSERT OVERWRITE TABLE test_table PARTITION (part="test2")
> SELECT id FROM test_table WHERE part="test1";
> Nothing happens, and when doing a SHOW LOCKS in another terminal we get :
> | lockid   | database  | table  | partition  | lock_state  | lock_type
> | transaction_id  | last_heartbeat  |  acquired_at   |
> | 3765 | default   | test_table | NULL   | WAITING | SHARED_READ  
> | NULL| 1440603633148   | NULL   |
> | 3765 | default   | test_table | part=test2 | WAITING | EXCLUSIVE
> | NULL| 1440603633148   | NULL   |
> This was tested on Hive 1.1.0-cdh5.4.2 but I believe the bug is still 
> present in 1.2.1.
> I could not reproduce it easily locally because it requires a 
> pseudo-distributed 

[jira] [Commented] (HIVE-12258) read/write into same partitioned table + concurrency = deadlock

2015-10-25 Thread Furcy Pin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14973263#comment-14973263
 ] 

Furcy Pin commented on HIVE-12258:
--

Besides, even after killing beeline, the locks are still active, with a 
heartbeat (more than 24h later).
And creating a new table in the same schema will hang as well because it tries 
to acquire a lock on the schema...


> read/write into same partitioned table + concurrency = deadlock
> ---
>
> Key: HIVE-12258
> URL: https://issues.apache.org/jira/browse/HIVE-12258
> Project: Hive
>  Issue Type: Bug
>Reporter: Furcy Pin
>
> When hive.support.concurrency is enabled if you launch a query that reads 
> data from a partition and writes data into another partition of the same 
> table,
> it creates a deadlock. 
> The worst part is that once the deadlock is active, you can't query the table 
> until it times out. 
> * How to reproduce :
> CREATE TABLE test_table (id INT) 
> PARTITIONED BY (part STRING)
> ;
> INSERT INTO TABLE test_table PARTITION (part="test")
> VALUES (1), (2), (3), (4) 
> ;
> INSERT OVERWRITE TABLE test_table PARTITION (part="test2")
> SELECT id FROM test_table WHERE part="test1";
> Nothing happens, and when doing a SHOW LOCKS in another terminal we get :
> | lockid   | database  | table  | partition  | lock_state  | lock_type
> | transaction_id  | last_heartbeat  |  acquired_at   |
> | 3765 | default   | test_table | NULL   | WAITING | SHARED_READ  
> | NULL| 1440603633148   | NULL   |
> | 3765 | default   | test_table | part=test2 | WAITING | EXCLUSIVE
> | NULL| 1440603633148   | NULL   |
> This was tested on Hive 1.1.0-cdh5.4.2 but I believe the bug is still 
> present in 1.2.1.
> I could not reproduce it easily locally because it requires a 
> pseudo-distributed setup with zookeeper to have concurrency enabled.
> From looking at the code I believe the problem comes from the 
> EmbeddedLockManager method 
> public List<HiveLock> lock(List<HiveLockObj> objs, int numRetriesForLock, 
> long sleepTime)
> that keeps trying to acquire two incompatible locks, and ends up failing 
> after 
> hive.lock.numretries*hive.lock.sleep.between.retries which by default is 
> 100*60s = 100 minutes.



--


[jira] [Commented] (HIVE-12028) An empty array is of type Array<String> and incompatible with other array types

2015-10-14 Thread Furcy Pin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14956499#comment-14956499
 ] 

Furcy Pin commented on HIVE-12028:
--

Exactly, it's the same as "cast(null as int)", only shorter and probably not ISO-compliant.


> An empty array is of type Array<String> and incompatible with other array 
> types
> ---
>
> Key: HIVE-12028
> URL: https://issues.apache.org/jira/browse/HIVE-12028
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0, 1.2.1
>Reporter: Furcy Pin
>
> How to reproduce:
> ```sql
> SELECT ARRAY(ARRAY(1),ARRAY()) ;
> FAILED: SemanticException [Error 10016]: Line 1:22 Argument type mismatch 
> 'ARRAY': Argument type "array<string>" is different from preceding arguments. 
> Previous type was "array<int>"
> SELECT COALESCE(ARRAY(1),ARRAY()) ;
> FAILED: SemanticException [Error 10016]: Line 1:25 Argument type mismatch 
> 'ARRAY': The expressions after COALESCE should all have the same type: 
> "array<int>" is expected but "array<string>" is found
> ```
> This is especially painful for COALESCE, as we cannot
> remove NULLS after doing a JOIN.
> The same problem holds with maps.
> The only workaround I could think of (except adding my own UDF)
> is quite ugly :
> ```sql
> SELECT ARRAY(ARRAY(1),empty.arr) FROM (SELECT collect_set(id) as arr FROM 
> (SELECT 1 as id) T WHERE id=0) empty ;
> ```



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12028) An empty array is of type Array<String> and incompatible with other array types

2015-10-12 Thread Furcy Pin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14952901#comment-14952901
 ] 

Furcy Pin commented on HIVE-12028:
--

I found a simpler workaround :

SELECT collect_set(INT(NULL)) ;

This returns an empty array of type ARRAY<INT>.

> An empty array is of type Array<String> and incompatible with other array 
> types
> ---
>
> Key: HIVE-12028
> URL: https://issues.apache.org/jira/browse/HIVE-12028
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0, 1.2.1
>Reporter: Furcy Pin
>
> How to reproduce:
> ```sql
> SELECT ARRAY(ARRAY(1),ARRAY()) ;
> FAILED: SemanticException [Error 10016]: Line 1:22 Argument type mismatch 
> 'ARRAY': Argument type "array<string>" is different from preceding arguments. 
> Previous type was "array<int>"
> SELECT COALESCE(ARRAY(1),ARRAY()) ;
> FAILED: SemanticException [Error 10016]: Line 1:25 Argument type mismatch 
> 'ARRAY': The expressions after COALESCE should all have the same type: 
> "array<int>" is expected but "array<string>" is found
> ```
> This is especially painful for COALESCE, as we cannot
> remove NULLS after doing a JOIN.
> The same problem holds with maps.
> The only workaround I could think of (except adding my own UDF)
> is quite ugly :
> ```sql
> SELECT ARRAY(ARRAY(1),empty.arr) FROM (SELECT collect_set(id) as arr FROM 
> (SELECT 1 as id) T WHERE id=0) empty ;
> ```



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10931) Wrong columns selected on multiple joins

2015-07-07 Thread Furcy Pin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14616478#comment-14616478
 ] 

Furcy Pin commented on HIVE-10931:
--

I tried to reproduce on a local environment on the release-1.2.1 branch, and 
the bug has disappeared.
I guess the issue can be closed.

 Wrong columns selected on multiple joins
 

 Key: HIVE-10931
 URL: https://issues.apache.org/jira/browse/HIVE-10931
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.1.0
 Environment: Cloudera cdh5.4.2
Reporter: Furcy Pin

 The following set of queries :
 {code:sql}
 DROP TABLE IF EXISTS test1 ;
 DROP TABLE IF EXISTS test2 ;
 DROP TABLE IF EXISTS test3 ;
 CREATE TABLE test1 (col1 INT, col2 STRING, col3 STRING, coL4 STRING, coL5 
 STRING, col6 STRING) ;
 INSERT INTO TABLE test1 VALUES (1,NULL,NULL,NULL,NULL,'A') ;
 CREATE TABLE test2 (col1 INT, col2 STRING, col3 STRING, coL4 STRING, coL5 
 STRING, col6 STRING) ;
 INSERT INTO TABLE test2 VALUES (1,NULL,NULL,NULL,NULL,'X') ;
 CREATE TABLE test3 (coL1 STRING) ;
 INSERT INTO TABLE test3 VALUES ('A') ;
 SELECT
   T2.val
 FROM test1 T1
 LEFT JOIN (SELECT col1, col2, col3, col4, col5, COALESCE(col6,'') as val 
 FROM test2) T2
 ON T2.col1 = T1.col1
 LEFT JOIN test3 T3  
 ON T3.col1 = T1.col6 
 ;
 {code}
 will return this :
 {noformat}
 +--+--+
 | t2.val   |
 +--+--+
 | A|
 +--+--+
 {noformat}
 Obviously, this result is wrong, as table `test2` contains an 'X' and no 'A'.
 This is the most minimal example we found of this issue; in particular,
 with fewer than 6 columns in the tables the bug disappears, for instance :
 {code:sql}
 SELECT
   T2.val
 FROM test1 T1
 LEFT JOIN (SELECT col1, col2, col3, col4, COALESCE(col6,'') as val FROM 
 test2) T2
 ON T2.col1 = T1.col1
 LEFT JOIN test3 T3  
 ON T3.col1 = T1.col6 
 ;
 {code}
 (same query as before, but `col5` was removed from the select)
 will return :
 {noformat}
 +--+--+
 | t2.val   |
 +--+--+
 | X|
 +--+--+
 {noformat}
 Removing the `COALESCE` also removes the bug...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10931) Wrong columns selected on multiple joins

2015-07-07 Thread Furcy Pin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Furcy Pin updated HIVE-10931:
-
Fix Version/s: 1.2.1

 Wrong columns selected on multiple joins
 

 Key: HIVE-10931
 URL: https://issues.apache.org/jira/browse/HIVE-10931
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.1.0
 Environment: Cloudera cdh5.4.2
Reporter: Furcy Pin
 Fix For: 1.2.1


 The following set of queries :
 {code:sql}
 DROP TABLE IF EXISTS test1 ;
 DROP TABLE IF EXISTS test2 ;
 DROP TABLE IF EXISTS test3 ;
 CREATE TABLE test1 (col1 INT, col2 STRING, col3 STRING, coL4 STRING, coL5 
 STRING, col6 STRING) ;
 INSERT INTO TABLE test1 VALUES (1,NULL,NULL,NULL,NULL,'A') ;
 CREATE TABLE test2 (col1 INT, col2 STRING, col3 STRING, coL4 STRING, coL5 
 STRING, col6 STRING) ;
 INSERT INTO TABLE test2 VALUES (1,NULL,NULL,NULL,NULL,'X') ;
 CREATE TABLE test3 (coL1 STRING) ;
 INSERT INTO TABLE test3 VALUES ('A') ;
 SELECT
   T2.val
 FROM test1 T1
 LEFT JOIN (SELECT col1, col2, col3, col4, col5, COALESCE(col6,'') as val 
 FROM test2) T2
 ON T2.col1 = T1.col1
 LEFT JOIN test3 T3  
 ON T3.col1 = T1.col6 
 ;
 {code}
 will return this :
 {noformat}
 +--+--+
 | t2.val   |
 +--+--+
 | A|
 +--+--+
 {noformat}
 Obviously, this result is wrong, as table `test2` contains an 'X' and no 'A'.
 This is the most minimal example we found of this issue; in particular,
 with fewer than 6 columns in the tables the bug disappears, for instance :
 {code:sql}
 SELECT
   T2.val
 FROM test1 T1
 LEFT JOIN (SELECT col1, col2, col3, col4, COALESCE(col6,'') as val FROM 
 test2) T2
 ON T2.col1 = T1.col1
 LEFT JOIN test3 T3  
 ON T3.col1 = T1.col6 
 ;
 {code}
 (same query as before, but `col5` was removed from the select)
 will return :
 {noformat}
 +--+--+
 | t2.val   |
 +--+--+
 | X|
 +--+--+
 {noformat}
 Removing the `COALESCE` also removes the bug...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-7483) hive insert overwrite table select from self dead lock

2015-08-26 Thread Furcy Pin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14714475#comment-14714475
 ] 

Furcy Pin commented on HIVE-7483:
-

I confirm, we are having the same issue on Hive 1.1.0, and I reproduced it on 
1.2.1

CREATE TABLE test_db.test_table (id INT) 
PARTITIONED BY (part STRING)
STORED AS ORC ;

INSERT INTO TABLE test_db.test_table PARTITION (part='test')
VALUES (1), (2), (3), (4) 
;

SET hive.exec.dynamic.partition.mode=nonstrict ;
SET hive.support.concurrency=true ;

INSERT OVERWRITE TABLE test_db.test_table PARTITION (part)
SELECT * FROM test_db.test_table ;

Nothing happens, and when running SHOW LOCKS in another shell we get:

+--------+----------+------------+------------+------------+--------------+----------------+----------------+-------------+
| lockid | database | table      | partition  | lock_state | lock_type    | transaction_id | last_heartbeat | acquired_at |
+--------+----------+------------+------------+------------+--------------+----------------+----------------+-------------+
| 3765   | test_db  | test_table | NULL       | WAITING    | EXCLUSIVE    | NULL           | 1440603633148  | NULL        |
| 3765   | test_db  | test_table | part=test  | WAITING    | SHARED_READ  | NULL           | 1440603633148  | NULL        |
+--------+----------+------------+------------+------------+--------------+----------------+----------------+-------------+

From looking at the source of 
org.apache.hadoop.hive.ql.lockmgr.EmbeddedLockManager, I would say it is stuck 
in the for loop of the method lock(List<HiveLockObj> objs, int 
numRetriesForLock, long sleepTime), 
where lockPrimitive() keeps failing on the second lock.
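The loop described above can be sketched as follows. This is a minimal illustration of the retry pattern, not the actual EmbeddedLockManager code; all names here are made up for the example:

```python
def lock_all(try_lock, objs, num_retries, sleep_fn):
    """Try to acquire every lock in objs; on any failure, roll back,
    sleep, and retry, giving up after num_retries attempts."""
    for _ in range(num_retries):
        acquired = []
        for obj in objs:
            if try_lock(obj):
                acquired.append(obj)
            else:
                break  # an incompatible lock could not be acquired
        if len(acquired) == len(objs):
            return acquired        # success: all locks held
        acquired.clear()           # release the partial acquisition
        sleep_fn()                 # wait sleep_time before retrying
    return None  # gave up after num_retries * sleep_time of waiting
```

With two permanently incompatible locks, the loop burns through the whole retry budget before failing, which matches the long hang observed here.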







 hive insert overwrite table select from self dead lock
 --

 Key: HIVE-7483
 URL: https://issues.apache.org/jira/browse/HIVE-7483
 Project: Hive
  Issue Type: Bug
  Components: Locking
Affects Versions: 0.13.1
Reporter: Xiaoyu Wang

 CREATE TABLE test(
   id int, 
   msg string)
 PARTITIONED BY ( 
   continent string, 
   country string)
 CLUSTERED BY (id) 
 INTO 10 BUCKETS
 STORED AS ORC;
 alter table test add partition(continent='Asia',country='India');
 in hive-site.xml:
 hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
 hive.support.concurrency=true;
 in hive shell:
 set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
 insert some records into the test table first,
 then execute this SQL:
 insert overwrite table test partition(continent='Asia',country='India') 
 select id,msg from test;
 the log stops at:
 INFO log.PerfLogger: PERFLOG method=acquireReadWriteLocks 
 from=org.apache.hadoop.hive.ql.Driver
 I think it deadlocks when doing an INSERT OVERWRITE on a table from itself.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-6050) Newer versions of JDBC driver does not work with older HiveServer2

2016-12-01 Thread Furcy Pin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15711624#comment-15711624
 ] 

Furcy Pin commented on HIVE-6050:
-

Hi,

I ran into the same problem, and found a workaround for my setup, so I wanted 
to share it.

I have an application that uses hive-cli 1.2.0 (because of Spark) and that 
sends queries to our cluster via hive-jdbc.
Our cluster is still stuck on 1.1.0 (thank you, Cloudera), and I ran into this 
issue when talking from the v1.2.0 JDBC client to the v1.1.0 server.

I managed to make everything work with the following dependency setup (sbt 
syntax):

{code}
libraryDependencies += "org.apache.hive" % "hive-service" % "1.1.0"

libraryDependencies += "org.apache.hive" % "hive-jdbc" % "1.1.0"

libraryDependencies += "org.apache.hive" % "hive-cli" % "1.2.0" 
exclude("org.apache.hive", "hive-service")
{code}

The trick is that the {{client_protocol}} field is not defined in {{hive-jdbc}} 
but in {{hive-service}}, so you must make sure that your client has the same 
version of {{hive-service}} as your server.
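For readers using Maven rather than sbt, a hypothetical equivalent of the setup above (same artifact coordinates; the exclusion mechanism is standard Maven, but I have not verified this exact POM against a build):

```xml
<!-- Pin hive-service and hive-jdbc to the server's version (1.1.0) -->
<dependency>
  <groupId>org.apache.hive</groupId>
  <artifactId>hive-service</artifactId>
  <version>1.1.0</version>
</dependency>
<dependency>
  <groupId>org.apache.hive</groupId>
  <artifactId>hive-jdbc</artifactId>
  <version>1.1.0</version>
</dependency>
<!-- Keep hive-cli at 1.2.0 but exclude its transitive hive-service -->
<dependency>
  <groupId>org.apache.hive</groupId>
  <artifactId>hive-cli</artifactId>
  <version>1.2.0</version>
  <exclusions>
    <exclusion>
      <groupId>org.apache.hive</groupId>
      <artifactId>hive-service</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```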

I haven't run into any other compatibility problems so far, but I can't 
guarantee that nothing will break when mixing two Hive versions together...


> Newer versions of JDBC driver does not work with older HiveServer2
> --
>
> Key: HIVE-6050
> URL: https://issues.apache.org/jira/browse/HIVE-6050
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, JDBC
>Affects Versions: 0.13.0
>Reporter: Szehon Ho
>Priority: Blocker
>
> The HiveServer2 instance has to be upgraded before the JDBC drivers used by 
> applications are upgraded. If the JDBC drivers are updated before HiveServer2 
> is upgraded, the connection will not be functional.
> Connecting from the JDBC driver of Hive 0.13 (TProtocolVersion=v4) to 
> HiveServer2 of Hive 0.10 (TProtocolVersion=v1) will return the following 
> exception:
> {noformat}
> java.sql.SQLException: Could not establish connection to 
> jdbc:hive2://localhost:1/default: Required field 'client_protocol' is 
> unset! Struct:TOpenSessionReq(client_protocol:null)
>   at 
> org.apache.hive.jdbc.HiveConnection.openSession(HiveConnection.java:336)
>   at org.apache.hive.jdbc.HiveConnection.<init>(HiveConnection.java:158)
>   at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105)
>   at java.sql.DriverManager.getConnection(DriverManager.java:571)
>   at java.sql.DriverManager.getConnection(DriverManager.java:187)
>   at 
> org.apache.hive.jdbc.MyTestJdbcDriver2.getConnection(MyTestJdbcDriver2.java:73)
>   at 
> org.apache.hive.jdbc.MyTestJdbcDriver2.init(MyTestJdbcDriver2.java:49)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.createTest(BlockJUnit4ClassRunner.java:187)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner$1.runReflectiveCall(BlockJUnit4ClassRunner.java:236)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.methodBlock(BlockJUnit4ClassRunner.java:233)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:47)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:300)
>   at junit.framework.JUnit4TestAdapter.run(JUnit4TestAdapter.java:39)
>   at 
> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:523)
>   at 
> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:1063)
>   at 
> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:914)
> Caused by: org.apache.thrift.TApplicationException: Required field 
> 'client_protocol' is unset! Struct:TOpenSessionReq(client_protocol:null)
>   at 
> org.apache.thrift.TApplicationException.read(TApplicationException.java:108)
>   at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:71)
>   at 
>