[jira] [Commented] (HIVE-16321) Possible deadlock in metastore with Acid enabled

2017-04-24 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15980819#comment-15980819
 ] 

Lefty Leverenz commented on HIVE-16321:
---

Doc note:  This adds *datanucleus.connectionPool.maxPoolSize* to HiveConf.java 
in releases 2.3.0 and 2.4.0.  (HIVE-16383 already added it to master for 
release 3.0.0.)

It should be documented in the MetaStore section of Configuration Properties:

* [Configuration Properties -- MetaStore | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-MetaStore]

Added a TODOC2.3 label.

> Possible deadlock in metastore with Acid enabled
> 
>
> Key: HIVE-16321
> URL: https://issues.apache.org/jira/browse/HIVE-16321
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.3.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Blocker
>  Labels: TODOC2.3
> Fix For: 2.3.0, 3.0.0, 2.4.0
>
> Attachments: HIVE-16321.01-branch-2.3.patch, 
> HIVE-16321.01.branch-2.patch, HIVE-16321.01.patch, 
> HIVE-16321.02.branch-2.patch, HIVE-16321.02.patch, 
> HIVE-16321.03-branch-2.patch, HIVE-16321.03.patch, 
> HIVE-16321.05-branch-2.3.patch
>
>
> TxnStore.MutexAPI is a mechanism how different Metastore instances can 
> coordinate their operations.  It uses a JDBCConnection to achieve it.
> In some cases this may lead to deadlock.  TxnHandler uses a connection pool 
> of fixed size.  Suppose you have X simultaneous calls to  TxnHandler.lock(), 
> where X is >= size of the pool.  This take all connections form the pool, so 
> when
> {noformat}
> handle = getMutexAPI().acquireLock(MUTEX_KEY.CheckLock.name());
> {noformat} 
> is executed in _TxnHandler.checkLock(Connection dbConn, long extLockId)_ the 
> pool is empty and the system is deadlocked.
> MutexAPI can't use the same connection as the operation it's protecting.  
> (TxnHandler.checkLock(Connection dbConn, long extLockId) is an example).
> We could make MutexAPI use a separate connection pool (size > 'primary' conn 
> pool).
> Or we could make TxnHandler.lock(LockRequest rqst) return immediately after 
> enqueueing the lock with the expectation that the caller will always follow 
> up with a call to checkLock(CheckLockRequest rqst).
> cc [~f1sherox]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16321) Possible deadlock in metastore with Acid enabled

2017-04-22 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15980157#comment-15980157
 ] 

Eugene Koifman commented on HIVE-16321:
---

committed to branch-2.3 
https://github.com/apache/hive/commit/6cdfaaedd7d2b22fe32fa0f6686d3533012bf982

Thanks [~wzheng] for the review

> Possible deadlock in metastore with Acid enabled
> 
>
> Key: HIVE-16321
> URL: https://issues.apache.org/jira/browse/HIVE-16321
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.3.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Blocker
> Attachments: HIVE-16321.01-branch-2.3.patch, 
> HIVE-16321.01.branch-2.patch, HIVE-16321.01.patch, 
> HIVE-16321.02.branch-2.patch, HIVE-16321.02.patch, 
> HIVE-16321.03-branch-2.patch, HIVE-16321.03.patch, 
> HIVE-16321.05-branch-2.3.patch
>
>
> TxnStore.MutexAPI is a mechanism how different Metastore instances can 
> coordinate their operations.  It uses a JDBCConnection to achieve it.
> In some cases this may lead to deadlock.  TxnHandler uses a connection pool 
> of fixed size.  Suppose you have X simultaneous calls to  TxnHandler.lock(), 
> where X is >= size of the pool.  This take all connections form the pool, so 
> when
> {noformat}
> handle = getMutexAPI().acquireLock(MUTEX_KEY.CheckLock.name());
> {noformat} 
> is executed in _TxnHandler.checkLock(Connection dbConn, long extLockId)_ the 
> pool is empty and the system is deadlocked.
> MutexAPI can't use the same connection as the operation it's protecting.  
> (TxnHandler.checkLock(Connection dbConn, long extLockId) is an example).
> We could make MutexAPI use a separate connection pool (size > 'primary' conn 
> pool).
> Or we could make TxnHandler.lock(LockRequest rqst) return immediately after 
> enqueueing the lock with the expectation that the caller will always follow 
> up with a call to checkLock(CheckLockRequest rqst).
> cc [~f1sherox]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16321) Possible deadlock in metastore with Acid enabled

2017-04-22 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15980132#comment-15980132
 ] 

Hive QA commented on HIVE-16321:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12864641/HIVE-16321.05-branch-2.3.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 10570 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[comments] (batchId=35)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=142)
org.apache.hive.hcatalog.api.TestHCatClient.testTransportFailure (batchId=174)
org.apache.hive.hcatalog.pig.TestTextFileHCatStorer.testWriteDecimalX 
(batchId=177)
org.apache.hive.hcatalog.pig.TestTextFileHCatStorer.testWriteTimestamp 
(batchId=177)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4846/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4846/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4846/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12864641 - PreCommit-HIVE-Build

> Possible deadlock in metastore with Acid enabled
> 
>
> Key: HIVE-16321
> URL: https://issues.apache.org/jira/browse/HIVE-16321
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.3.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Blocker
> Attachments: HIVE-16321.01-branch-2.3.patch, 
> HIVE-16321.01.branch-2.patch, HIVE-16321.01.patch, 
> HIVE-16321.02.branch-2.patch, HIVE-16321.02.patch, 
> HIVE-16321.03-branch-2.patch, HIVE-16321.03.patch, 
> HIVE-16321.05-branch-2.3.patch
>
>
> TxnStore.MutexAPI is a mechanism how different Metastore instances can 
> coordinate their operations.  It uses a JDBCConnection to achieve it.
> In some cases this may lead to deadlock.  TxnHandler uses a connection pool 
> of fixed size.  Suppose you have X simultaneous calls to  TxnHandler.lock(), 
> where X is >= size of the pool.  This take all connections form the pool, so 
> when
> {noformat}
> handle = getMutexAPI().acquireLock(MUTEX_KEY.CheckLock.name());
> {noformat} 
> is executed in _TxnHandler.checkLock(Connection dbConn, long extLockId)_ the 
> pool is empty and the system is deadlocked.
> MutexAPI can't use the same connection as the operation it's protecting.  
> (TxnHandler.checkLock(Connection dbConn, long extLockId) is an example).
> We could make MutexAPI use a separate connection pool (size > 'primary' conn 
> pool).
> Or we could make TxnHandler.lock(LockRequest rqst) return immediately after 
> enqueueing the lock with the expectation that the caller will always follow 
> up with a call to checkLock(CheckLockRequest rqst).
> cc [~f1sherox]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16321) Possible deadlock in metastore with Acid enabled

2017-04-22 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15980019#comment-15980019
 ] 

Eugene Koifman commented on HIVE-16321:
---

committed to branch-2 
https://github.com/apache/hive/commit/59faf36952f70d243ba997e54c1349ea2c01e670


> Possible deadlock in metastore with Acid enabled
> 
>
> Key: HIVE-16321
> URL: https://issues.apache.org/jira/browse/HIVE-16321
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.3.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Blocker
> Attachments: HIVE-16321.01-branch-2.3.patch, 
> HIVE-16321.01.branch-2.patch, HIVE-16321.01.patch, 
> HIVE-16321.02.branch-2.patch, HIVE-16321.02.patch, 
> HIVE-16321.03-branch-2.patch, HIVE-16321.03.patch, 
> HIVE-16321.05-branch-2.3.patch
>
>
> TxnStore.MutexAPI is a mechanism how different Metastore instances can 
> coordinate their operations.  It uses a JDBCConnection to achieve it.
> In some cases this may lead to deadlock.  TxnHandler uses a connection pool 
> of fixed size.  Suppose you have X simultaneous calls to  TxnHandler.lock(), 
> where X is >= size of the pool.  This take all connections form the pool, so 
> when
> {noformat}
> handle = getMutexAPI().acquireLock(MUTEX_KEY.CheckLock.name());
> {noformat} 
> is executed in _TxnHandler.checkLock(Connection dbConn, long extLockId)_ the 
> pool is empty and the system is deadlocked.
> MutexAPI can't use the same connection as the operation it's protecting.  
> (TxnHandler.checkLock(Connection dbConn, long extLockId) is an example).
> We could make MutexAPI use a separate connection pool (size > 'primary' conn 
> pool).
> Or we could make TxnHandler.lock(LockRequest rqst) return immediately after 
> enqueueing the lock with the expectation that the caller will always follow 
> up with a call to checkLock(CheckLockRequest rqst).
> cc [~f1sherox]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16321) Possible deadlock in metastore with Acid enabled

2017-04-22 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15979991#comment-15979991
 ] 

Hive QA commented on HIVE-16321:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12864614/HIVE-16321.03-branch-2.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 10571 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[comments] (batchId=35)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[columnstats_part_coltype]
 (batchId=154)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=142)
org.apache.hive.hcatalog.api.TestHCatClient.testTransportFailure (batchId=174)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4843/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4843/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4843/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12864614 - PreCommit-HIVE-Build

> Possible deadlock in metastore with Acid enabled
> 
>
> Key: HIVE-16321
> URL: https://issues.apache.org/jira/browse/HIVE-16321
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.3.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Blocker
> Attachments: HIVE-16321.01.branch-2.patch, HIVE-16321.01.patch, 
> HIVE-16321.02.branch-2.patch, HIVE-16321.02.patch, 
> HIVE-16321.03-branch-2.patch, HIVE-16321.03.patch
>
>
> TxnStore.MutexAPI is a mechanism how different Metastore instances can 
> coordinate their operations.  It uses a JDBCConnection to achieve it.
> In some cases this may lead to deadlock.  TxnHandler uses a connection pool 
> of fixed size.  Suppose you have X simultaneous calls to  TxnHandler.lock(), 
> where X is >= size of the pool.  This take all connections form the pool, so 
> when
> {noformat}
> handle = getMutexAPI().acquireLock(MUTEX_KEY.CheckLock.name());
> {noformat} 
> is executed in _TxnHandler.checkLock(Connection dbConn, long extLockId)_ the 
> pool is empty and the system is deadlocked.
> MutexAPI can't use the same connection as the operation it's protecting.  
> (TxnHandler.checkLock(Connection dbConn, long extLockId) is an example).
> We could make MutexAPI use a separate connection pool (size > 'primary' conn 
> pool).
> Or we could make TxnHandler.lock(LockRequest rqst) return immediately after 
> enqueueing the lock with the expectation that the caller will always follow 
> up with a call to checkLock(CheckLockRequest rqst).
> cc [~f1sherox]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16321) Possible deadlock in metastore with Acid enabled

2017-04-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15979576#comment-15979576
 ] 

Hive QA commented on HIVE-16321:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12864558/HIVE-16321.02.branch-2.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4832/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4832/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4832/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2017-04-21 23:31:56.671
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-4832/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2017-04-21 23:31:56.674
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at 6566065 HIVE-15982 : Support the width_bucket function (Sahil 
Takiar via Ashutosh Chauhan)
+ git clean -f -d
+ git checkout master
Already on 'master'
Your branch is up-to-date with 'origin/master'.
+ git reset --hard origin/master
HEAD is now at 6566065 HIVE-15982 : Support the width_bucket function (Sahil 
Takiar via Ashutosh Chauhan)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2017-04-21 23:31:57.802
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh 
/data/hiveptest/working/scratch/build.patch
error: patch failed: 
common/src/java/org/apache/hadoop/hive/conf/HiveConf.java:737
error: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java: patch does 
not apply
error: patch failed: 
metastore/src/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java:147
error: metastore/src/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java: 
patch does not apply
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12864558 - PreCommit-HIVE-Build

> Possible deadlock in metastore with Acid enabled
> 
>
> Key: HIVE-16321
> URL: https://issues.apache.org/jira/browse/HIVE-16321
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.3.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Blocker
> Attachments: HIVE-16321.01.branch-2.patch, HIVE-16321.01.patch, 
> HIVE-16321.02.branch-2.patch, HIVE-16321.02.patch, HIVE-16321.03.patch
>
>
> TxnStore.MutexAPI is a mechanism how different Metastore instances can 
> coordinate their operations.  It uses a JDBCConnection to achieve it.
> In some cases this may lead to deadlock.  TxnHandler uses a connection pool 
> of fixed size.  Suppose you have X simultaneous calls to  TxnHandler.lock(), 
> where X is >= size of the pool.  This take all connections form the pool, so 
> when
> {noformat}
> handle = getMutexAPI().acquireLock(MUTEX_KEY.CheckLock.name());
> {noformat} 
> is executed in _TxnHandler.checkLock(Connection dbConn, long extLockId)_ the 
> pool is empty and the system is deadlocked.
> MutexAPI can't use the same connection as the operation it's protecting.  
> (TxnHandler.checkLock(Connection dbConn, long extLockId) is an example).
> We could make MutexAPI use a separate connection pool (size > 'primary' conn 
> pool).
> Or we could make TxnHandler.lock(LockRequest rqst) return immediately after 
> enqueueing the lock with the expectation that the caller will always follow 
> up with a call to checkLock(CheckLockRequest rqst).
> cc [~f1sherox]



--
This message was sent 

[jira] [Commented] (HIVE-16321) Possible deadlock in metastore with Acid enabled

2017-04-20 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15977901#comment-15977901
 ] 

Eugene Koifman commented on HIVE-16321:
---

committed to master 
https://github.com/apache/hive/commit/182218b760397e27936c5b9885083cdc774fef90


> Possible deadlock in metastore with Acid enabled
> 
>
> Key: HIVE-16321
> URL: https://issues.apache.org/jira/browse/HIVE-16321
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.3.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Critical
> Attachments: HIVE-16321.01.branch-2.patch, HIVE-16321.01.patch, 
> HIVE-16321.02.patch, HIVE-16321.03.patch
>
>
> TxnStore.MutexAPI is a mechanism how different Metastore instances can 
> coordinate their operations.  It uses a JDBCConnection to achieve it.
> In some cases this may lead to deadlock.  TxnHandler uses a connection pool 
> of fixed size.  Suppose you have X simultaneous calls to  TxnHandler.lock(), 
> where X is >= size of the pool.  This take all connections form the pool, so 
> when
> {noformat}
> handle = getMutexAPI().acquireLock(MUTEX_KEY.CheckLock.name());
> {noformat} 
> is executed in _TxnHandler.checkLock(Connection dbConn, long extLockId)_ the 
> pool is empty and the system is deadlocked.
> MutexAPI can't use the same connection as the operation it's protecting.  
> (TxnHandler.checkLock(Connection dbConn, long extLockId) is an example).
> We could make MutexAPI use a separate connection pool (size > 'primary' conn 
> pool).
> Or we could make TxnHandler.lock(LockRequest rqst) return immediately after 
> enqueueing the lock with the expectation that the caller will always follow 
> up with a call to checkLock(CheckLockRequest rqst).
> cc [~f1sherox]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16321) Possible deadlock in metastore with Acid enabled

2017-04-20 Thread Wei Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15977475#comment-15977475
 ] 

Wei Zheng commented on HIVE-16321:
--

+1

> Possible deadlock in metastore with Acid enabled
> 
>
> Key: HIVE-16321
> URL: https://issues.apache.org/jira/browse/HIVE-16321
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.3.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Critical
> Attachments: HIVE-16321.01.patch, HIVE-16321.02.patch, 
> HIVE-16321.03.patch
>
>
> TxnStore.MutexAPI is a mechanism how different Metastore instances can 
> coordinate their operations.  It uses a JDBCConnection to achieve it.
> In some cases this may lead to deadlock.  TxnHandler uses a connection pool 
> of fixed size.  Suppose you have X simultaneous calls to  TxnHandler.lock(), 
> where X is >= size of the pool.  This take all connections form the pool, so 
> when
> {noformat}
> handle = getMutexAPI().acquireLock(MUTEX_KEY.CheckLock.name());
> {noformat} 
> is executed in _TxnHandler.checkLock(Connection dbConn, long extLockId)_ the 
> pool is empty and the system is deadlocked.
> MutexAPI can't use the same connection as the operation it's protecting.  
> (TxnHandler.checkLock(Connection dbConn, long extLockId) is an example).
> We could make MutexAPI use a separate connection pool (size > 'primary' conn 
> pool).
> Or we could make TxnHandler.lock(LockRequest rqst) return immediately after 
> enqueueing the lock with the expectation that the caller will always follow 
> up with a call to checkLock(CheckLockRequest rqst).
> cc [~f1sherox]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16321) Possible deadlock in metastore with Acid enabled

2017-04-20 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15977208#comment-15977208
 ] 

Eugene Koifman commented on HIVE-16321:
---

patches 2 and 3 are the same but have different set of failures except 
vector_if_expr which is a flaky test

no related failures

> Possible deadlock in metastore with Acid enabled
> 
>
> Key: HIVE-16321
> URL: https://issues.apache.org/jira/browse/HIVE-16321
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.3.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Critical
> Attachments: HIVE-16321.01.patch, HIVE-16321.02.patch, 
> HIVE-16321.03.patch
>
>
> TxnStore.MutexAPI is a mechanism how different Metastore instances can 
> coordinate their operations.  It uses a JDBCConnection to achieve it.
> In some cases this may lead to deadlock.  TxnHandler uses a connection pool 
> of fixed size.  Suppose you have X simultaneous calls to  TxnHandler.lock(), 
> where X is >= size of the pool.  This take all connections form the pool, so 
> when
> {noformat}
> handle = getMutexAPI().acquireLock(MUTEX_KEY.CheckLock.name());
> {noformat} 
> is executed in _TxnHandler.checkLock(Connection dbConn, long extLockId)_ the 
> pool is empty and the system is deadlocked.
> MutexAPI can't use the same connection as the operation it's protecting.  
> (TxnHandler.checkLock(Connection dbConn, long extLockId) is an example).
> We could make MutexAPI use a separate connection pool (size > 'primary' conn 
> pool).
> Or we could make TxnHandler.lock(LockRequest rqst) return immediately after 
> enqueueing the lock with the expectation that the caller will always follow 
> up with a call to checkLock(CheckLockRequest rqst).
> cc [~f1sherox]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16321) Possible deadlock in metastore with Acid enabled

2017-04-20 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15977012#comment-15977012
 ] 

Hive QA commented on HIVE-16321:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12864282/HIVE-16321.04.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4793/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4793/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4793/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Tests exited with: ExecutionException: 
org.apache.hive.ptest.execution.ssh.SSHExecutionException: RSyncResult 
[localFile=/data/hiveptest/logs/PreCommit-HIVE-Build-4793/succeeded/196_TestRemoteHiveMetaStoreIpAddress,
 remoteFile=/home/hiveptest/104.154.80.32-hiveptest-0/logs/, getExitCode()=255, 
getException()=null, getUser()=hiveptest, getHost()=104.154.80.32, 
getInstance()=0]: 'Warning: Permanently added '104.154.80.32' (ECDSA) to the 
list of known hosts.
Permission denied (publickey).
rsync: connection unexpectedly closed (0 bytes received so far) [Receiver]
rsync error: unexplained error (code 255) at io.c(226) [Receiver=3.1.1]
Warning: Permanently added '104.154.80.32' (ECDSA) to the list of known hosts.
Permission denied (publickey).
rsync: connection unexpectedly closed (0 bytes received so far) [Receiver]
rsync error: unexplained error (code 255) at io.c(226) [Receiver=3.1.1]
Warning: Permanently added '104.154.80.32' (ECDSA) to the list of known hosts.
Permission denied (publickey).
rsync: connection unexpectedly closed (0 bytes received so far) [Receiver]
rsync error: unexplained error (code 255) at io.c(226) [Receiver=3.1.1]
Warning: Permanently added '104.154.80.32' (ECDSA) to the list of known hosts.
Permission denied (publickey).
rsync: connection unexpectedly closed (0 bytes received so far) [Receiver]
rsync error: unexplained error (code 255) at io.c(226) [Receiver=3.1.1]
Warning: Permanently added '104.154.80.32' (ECDSA) to the list of known hosts.
Permission denied (publickey).
rsync: connection unexpectedly closed (0 bytes received so far) [Receiver]
rsync error: unexplained error (code 255) at io.c(226) [Receiver=3.1.1]
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12864282 - PreCommit-HIVE-Build

> Possible deadlock in metastore with Acid enabled
> 
>
> Key: HIVE-16321
> URL: https://issues.apache.org/jira/browse/HIVE-16321
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.3.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Critical
> Attachments: HIVE-16321.01.patch, HIVE-16321.02.patch, 
> HIVE-16321.03.patch, HIVE-16321.04.patch
>
>
> TxnStore.MutexAPI is a mechanism how different Metastore instances can 
> coordinate their operations.  It uses a JDBCConnection to achieve it.
> In some cases this may lead to deadlock.  TxnHandler uses a connection pool 
> of fixed size.  Suppose you have X simultaneous calls to  TxnHandler.lock(), 
> where X is >= size of the pool.  This take all connections form the pool, so 
> when
> {noformat}
> handle = getMutexAPI().acquireLock(MUTEX_KEY.CheckLock.name());
> {noformat} 
> is executed in _TxnHandler.checkLock(Connection dbConn, long extLockId)_ the 
> pool is empty and the system is deadlocked.
> MutexAPI can't use the same connection as the operation it's protecting.  
> (TxnHandler.checkLock(Connection dbConn, long extLockId) is an example).
> We could make MutexAPI use a separate connection pool (size > 'primary' conn 
> pool).
> Or we could make TxnHandler.lock(LockRequest rqst) return immediately after 
> enqueueing the lock with the expectation that the caller will always follow 
> up with a call to checkLock(CheckLockRequest rqst).
> cc [~f1sherox]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16321) Possible deadlock in metastore with Acid enabled

2017-04-20 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15976412#comment-15976412
 ] 

Hive QA commented on HIVE-16321:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12864173/HIVE-16321.03.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 10567 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=143)
org.apache.hadoop.hive.cli.TestSparkCliDriver.org.apache.hadoop.hive.cli.TestSparkCliDriver
 (batchId=100)
org.apache.hive.minikdc.TestJdbcWithDBTokenStore.testConnection (batchId=235)
org.apache.hive.minikdc.TestJdbcWithDBTokenStore.testIsValid (batchId=235)
org.apache.hive.minikdc.TestJdbcWithDBTokenStore.testIsValidNeg (batchId=235)
org.apache.hive.minikdc.TestJdbcWithDBTokenStore.testNegativeProxyAuth 
(batchId=235)
org.apache.hive.minikdc.TestJdbcWithDBTokenStore.testNegativeTokenAuth 
(batchId=235)
org.apache.hive.minikdc.TestJdbcWithDBTokenStore.testProxyAuth (batchId=235)
org.apache.hive.minikdc.TestJdbcWithDBTokenStore.testTokenAuth (batchId=235)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4785/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4785/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4785/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 9 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12864173 - PreCommit-HIVE-Build

> Possible deadlock in metastore with Acid enabled
> 
>
> Key: HIVE-16321
> URL: https://issues.apache.org/jira/browse/HIVE-16321
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.3.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Critical
> Attachments: HIVE-16321.01.patch, HIVE-16321.02.patch, 
> HIVE-16321.03.patch
>
>
> TxnStore.MutexAPI is a mechanism how different Metastore instances can 
> coordinate their operations.  It uses a JDBCConnection to achieve it.
> In some cases this may lead to deadlock.  TxnHandler uses a connection pool 
> of fixed size.  Suppose you have X simultaneous calls to  TxnHandler.lock(), 
> where X is >= size of the pool.  This take all connections form the pool, so 
> when
> {noformat}
> handle = getMutexAPI().acquireLock(MUTEX_KEY.CheckLock.name());
> {noformat} 
> is executed in _TxnHandler.checkLock(Connection dbConn, long extLockId)_ the 
> pool is empty and the system is deadlocked.
> MutexAPI can't use the same connection as the operation it's protecting.  
> (TxnHandler.checkLock(Connection dbConn, long extLockId) is an example).
> We could make MutexAPI use a separate connection pool (size > 'primary' conn 
> pool).
> Or we could make TxnHandler.lock(LockRequest rqst) return immediately after 
> enqueueing the lock with the expectation that the caller will always follow 
> up with a call to checkLock(CheckLockRequest rqst).
> cc [~f1sherox]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16321) Possible deadlock in metastore with Acid enabled

2017-04-19 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15975813#comment-15975813
 ] 

Hive QA commented on HIVE-16321:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12864061/HIVE-16321.02.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 10581 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=143)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[orc_merge2]
 (batchId=167)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4770/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4770/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4770/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12864061 - PreCommit-HIVE-Build

> Possible deadlock in metastore with Acid enabled
> 
>
> Key: HIVE-16321
> URL: https://issues.apache.org/jira/browse/HIVE-16321
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.3.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Critical
> Attachments: HIVE-16321.01.patch, HIVE-16321.02.patch
>
>
> TxnStore.MutexAPI is a mechanism how different Metastore instances can 
> coordinate their operations.  It uses a JDBCConnection to achieve it.
> In some cases this may lead to deadlock.  TxnHandler uses a connection pool 
> of fixed size.  Suppose you have X simultaneous calls to  TxnHandler.lock(), 
> where X is >= size of the pool.  This take all connections form the pool, so 
> when
> {noformat}
> handle = getMutexAPI().acquireLock(MUTEX_KEY.CheckLock.name());
> {noformat} 
> is executed in _TxnHandler.checkLock(Connection dbConn, long extLockId)_ the 
> pool is empty and the system is deadlocked.
> MutexAPI can't use the same connection as the operation it's protecting.  
> (TxnHandler.checkLock(Connection dbConn, long extLockId) is an example).
> We could make MutexAPI use a separate connection pool (size > 'primary' conn 
> pool).
> Or we could make TxnHandler.lock(LockRequest rqst) return immediately after 
> enqueueing the lock with the expectation that the caller will always follow 
> up with a call to checkLock(CheckLockRequest rqst).
> cc [~f1sherox]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16321) Possible deadlock in metastore with Acid enabled

2017-04-19 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15975338#comment-15975338
 ] 

Eugene Koifman commented on HIVE-16321:
---

I don't think "if (connPoolMutex == null)"  will add anything.  if creating the 
1st pool throws Thrift should abandon the whole object

> Possible deadlock in metastore with Acid enabled
> 
>
> Key: HIVE-16321
> URL: https://issues.apache.org/jira/browse/HIVE-16321
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.3.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-16321.01.patch, HIVE-16321.02.patch
>
>
> TxnStore.MutexAPI is a mechanism how different Metastore instances can 
> coordinate their operations.  It uses a JDBCConnection to achieve it.
> In some cases this may lead to deadlock.  TxnHandler uses a connection pool 
> of fixed size.  Suppose you have X simultaneous calls to  TxnHandler.lock(), 
> where X is >= size of the pool.  This take all connections form the pool, so 
> when
> {noformat}
> handle = getMutexAPI().acquireLock(MUTEX_KEY.CheckLock.name());
> {noformat} 
> is executed in _TxnHandler.checkLock(Connection dbConn, long extLockId)_ the 
> pool is empty and the system is deadlocked.
> MutexAPI can't use the same connection as the operation it's protecting.  
> (TxnHandler.checkLock(Connection dbConn, long extLockId) is an example).
> We could make MutexAPI use a separate connection pool (size > 'primary' conn 
> pool).
> Or we could make TxnHandler.lock(LockRequest rqst) return immediately after 
> enqueueing the lock with the expectation that the caller will always follow 
> up with a call to checkLock(CheckLockRequest rqst).
> cc [~f1sherox]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16321) Possible deadlock in metastore with Acid enabled

2017-04-13 Thread Wei Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15968426#comment-15968426
 ] 

Wei Zheng commented on HIVE-16321:
--

Typo in setConf(): getCommectionTimeoutMs. Also we may use a constant instead 
of hardcoded number for this timeout. Should we add "if (connPoolMutex == 
null)" before initializing connPoolMutex?

In setupJdbcConnectionPool(): for hikaricp block, there are two 
setMaximumPoolSize. One should be removed.

> Possible deadlock in metastore with Acid enabled
> 
>
> Key: HIVE-16321
> URL: https://issues.apache.org/jira/browse/HIVE-16321
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.3.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-16321.01.patch
>
>
> TxnStore.MutexAPI is a mechanism how different Metastore instances can 
> coordinate their operations.  It uses a JDBCConnection to achieve it.
> In some cases this may lead to deadlock.  TxnHandler uses a connection pool 
> of fixed size.  Suppose you have X simultaneous calls to  TxnHandler.lock(), 
> where X is >= size of the pool.  This take all connections form the pool, so 
> when
> {noformat}
> handle = getMutexAPI().acquireLock(MUTEX_KEY.CheckLock.name());
> {noformat} 
> is executed in _TxnHandler.checkLock(Connection dbConn, long extLockId)_ the 
> pool is empty and the system is deadlocked.
> MutexAPI can't use the same connection as the operation it's protecting.  
> (TxnHandler.checkLock(Connection dbConn, long extLockId) is an example).
> We could make MutexAPI use a separate connection pool (size > 'primary' conn 
> pool).
> Or we could make TxnHandler.lock(LockRequest rqst) return immediately after 
> enqueueing the lock with the expectation that the caller will always follow 
> up with a call to checkLock(CheckLockRequest rqst).
> cc [~f1sherox]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16321) Possible deadlock in metastore with Acid enabled

2017-04-13 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15968315#comment-15968315
 ] 

Eugene Koifman commented on HIVE-16321:
---

wow, a clean run!

[~wzheng] could you review please

> Possible deadlock in metastore with Acid enabled
> 
>
> Key: HIVE-16321
> URL: https://issues.apache.org/jira/browse/HIVE-16321
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.3.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-16321.01.patch
>
>
> TxnStore.MutexAPI is a mechanism how different Metastore instances can 
> coordinate their operations.  It uses a JDBCConnection to achieve it.
> In some cases this may lead to deadlock.  TxnHandler uses a connection pool 
> of fixed size.  Suppose you have X simultaneous calls to  TxnHandler.lock(), 
> where X is >= size of the pool.  This take all connections form the pool, so 
> when
> {noformat}
> handle = getMutexAPI().acquireLock(MUTEX_KEY.CheckLock.name());
> {noformat} 
> is executed in _TxnHandler.checkLock(Connection dbConn, long extLockId)_ the 
> pool is empty and the system is deadlocked.
> MutexAPI can't use the same connection as the operation it's protecting.  
> (TxnHandler.checkLock(Connection dbConn, long extLockId) is an example).
> We could make MutexAPI use a separate connection pool (size > 'primary' conn 
> pool).
> Or we could make TxnHandler.lock(LockRequest rqst) return immediately after 
> enqueueing the lock with the expectation that the caller will always follow 
> up with a call to checkLock(CheckLockRequest rqst).
> cc [~f1sherox]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16321) Possible deadlock in metastore with Acid enabled

2017-04-13 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15968294#comment-15968294
 ] 

Hive QA commented on HIVE-16321:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12863338/HIVE-16321.01.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:green}SUCCESS:{color} +1 due to 10571 tests passed

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4678/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4678/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4678/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12863338 - PreCommit-HIVE-Build

> Possible deadlock in metastore with Acid enabled
> 
>
> Key: HIVE-16321
> URL: https://issues.apache.org/jira/browse/HIVE-16321
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.3.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-16321.01.patch
>
>
> TxnStore.MutexAPI is a mechanism how different Metastore instances can 
> coordinate their operations.  It uses a JDBCConnection to achieve it.
> In some cases this may lead to deadlock.  TxnHandler uses a connection pool 
> of fixed size.  Suppose you have X simultaneous calls to  TxnHandler.lock(), 
> where X is >= size of the pool.  This take all connections form the pool, so 
> when
> {noformat}
> handle = getMutexAPI().acquireLock(MUTEX_KEY.CheckLock.name());
> {noformat} 
> is executed in _TxnHandler.checkLock(Connection dbConn, long extLockId)_ the 
> pool is empty and the system is deadlocked.
> MutexAPI can't use the same connection as the operation it's protecting.  
> (TxnHandler.checkLock(Connection dbConn, long extLockId) is an example).
> We could make MutexAPI use a separate connection pool (size > 'primary' conn 
> pool).
> Or we could make TxnHandler.lock(LockRequest rqst) return immediately after 
> enqueueing the lock with the expectation that the caller will always follow 
> up with a call to checkLock(CheckLockRequest rqst).
> cc [~f1sherox]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)