[jira] [Commented] (HIVE-16321) Possible deadlock in metastore with Acid enabled
[ https://issues.apache.org/jira/browse/HIVE-16321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15980819#comment-15980819 ] Lefty Leverenz commented on HIVE-16321: --- Doc note: This adds *datanucleus.connectionPool.maxPoolSize* to HiveConf.java in releases 2.3.0 and 2.4.0. (HIVE-16383 already added it to master for release 3.0.0.) It should be documented in the MetaStore section of Configuration Properties: * [Configuration Properties -- MetaStore | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-MetaStore] Added a TODOC2.3 label. > Possible deadlock in metastore with Acid enabled > > > Key: HIVE-16321 > URL: https://issues.apache.org/jira/browse/HIVE-16321 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.3.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Blocker > Labels: TODOC2.3 > Fix For: 2.3.0, 3.0.0, 2.4.0 > > Attachments: HIVE-16321.01-branch-2.3.patch, > HIVE-16321.01.branch-2.patch, HIVE-16321.01.patch, > HIVE-16321.02.branch-2.patch, HIVE-16321.02.patch, > HIVE-16321.03-branch-2.patch, HIVE-16321.03.patch, > HIVE-16321.05-branch-2.3.patch > > > TxnStore.MutexAPI is a mechanism how different Metastore instances can > coordinate their operations. It uses a JDBCConnection to achieve it. > In some cases this may lead to deadlock. TxnHandler uses a connection pool > of fixed size. Suppose you have X simultaneous calls to TxnHandler.lock(), > where X is >= size of the pool. This take all connections form the pool, so > when > {noformat} > handle = getMutexAPI().acquireLock(MUTEX_KEY.CheckLock.name()); > {noformat} > is executed in _TxnHandler.checkLock(Connection dbConn, long extLockId)_ the > pool is empty and the system is deadlocked. > MutexAPI can't use the same connection as the operation it's protecting. > (TxnHandler.checkLock(Connection dbConn, long extLockId) is an example). > We could make MutexAPI use a separate connection pool (size > 'primary' conn > pool). > Or we could make TxnHandler.lock(LockRequest rqst) return immediately after > enqueueing the lock with the expectation that the caller will always follow > up with a call to checkLock(CheckLockRequest rqst). > cc [~f1sherox] -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-16321) Possible deadlock in metastore with Acid enabled
[ https://issues.apache.org/jira/browse/HIVE-16321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15980157#comment-15980157 ] Eugene Koifman commented on HIVE-16321: --- committed to branch-2.3 https://github.com/apache/hive/commit/6cdfaaedd7d2b22fe32fa0f6686d3533012bf982 Thanks [~wzheng] for the review > Possible deadlock in metastore with Acid enabled > > > Key: HIVE-16321 > URL: https://issues.apache.org/jira/browse/HIVE-16321 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.3.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Blocker > Attachments: HIVE-16321.01-branch-2.3.patch, > HIVE-16321.01.branch-2.patch, HIVE-16321.01.patch, > HIVE-16321.02.branch-2.patch, HIVE-16321.02.patch, > HIVE-16321.03-branch-2.patch, HIVE-16321.03.patch, > HIVE-16321.05-branch-2.3.patch > > > TxnStore.MutexAPI is a mechanism how different Metastore instances can > coordinate their operations. It uses a JDBCConnection to achieve it. > In some cases this may lead to deadlock. TxnHandler uses a connection pool > of fixed size. Suppose you have X simultaneous calls to TxnHandler.lock(), > where X is >= size of the pool. This take all connections form the pool, so > when > {noformat} > handle = getMutexAPI().acquireLock(MUTEX_KEY.CheckLock.name()); > {noformat} > is executed in _TxnHandler.checkLock(Connection dbConn, long extLockId)_ the > pool is empty and the system is deadlocked. > MutexAPI can't use the same connection as the operation it's protecting. > (TxnHandler.checkLock(Connection dbConn, long extLockId) is an example). > We could make MutexAPI use a separate connection pool (size > 'primary' conn > pool). > Or we could make TxnHandler.lock(LockRequest rqst) return immediately after > enqueueing the lock with the expectation that the caller will always follow > up with a call to checkLock(CheckLockRequest rqst). > cc [~f1sherox] -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-16321) Possible deadlock in metastore with Acid enabled
[ https://issues.apache.org/jira/browse/HIVE-16321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15980132#comment-15980132 ] Hive QA commented on HIVE-16321: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12864641/HIVE-16321.05-branch-2.3.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 10570 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[comments] (batchId=35) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=142) org.apache.hive.hcatalog.api.TestHCatClient.testTransportFailure (batchId=174) org.apache.hive.hcatalog.pig.TestTextFileHCatStorer.testWriteDecimalX (batchId=177) org.apache.hive.hcatalog.pig.TestTextFileHCatStorer.testWriteTimestamp (batchId=177) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4846/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4846/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4846/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 5 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12864641 - PreCommit-HIVE-Build > Possible deadlock in metastore with Acid enabled > > > Key: HIVE-16321 > URL: https://issues.apache.org/jira/browse/HIVE-16321 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.3.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Blocker > Attachments: HIVE-16321.01-branch-2.3.patch, > HIVE-16321.01.branch-2.patch, HIVE-16321.01.patch, > HIVE-16321.02.branch-2.patch, HIVE-16321.02.patch, > HIVE-16321.03-branch-2.patch, HIVE-16321.03.patch, > HIVE-16321.05-branch-2.3.patch > > > TxnStore.MutexAPI is a mechanism how different Metastore instances can > coordinate their operations. It uses a JDBCConnection to achieve it. > In some cases this may lead to deadlock. TxnHandler uses a connection pool > of fixed size. Suppose you have X simultaneous calls to TxnHandler.lock(), > where X is >= size of the pool. This take all connections form the pool, so > when > {noformat} > handle = getMutexAPI().acquireLock(MUTEX_KEY.CheckLock.name()); > {noformat} > is executed in _TxnHandler.checkLock(Connection dbConn, long extLockId)_ the > pool is empty and the system is deadlocked. > MutexAPI can't use the same connection as the operation it's protecting. > (TxnHandler.checkLock(Connection dbConn, long extLockId) is an example). > We could make MutexAPI use a separate connection pool (size > 'primary' conn > pool). > Or we could make TxnHandler.lock(LockRequest rqst) return immediately after > enqueueing the lock with the expectation that the caller will always follow > up with a call to checkLock(CheckLockRequest rqst). > cc [~f1sherox] -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-16321) Possible deadlock in metastore with Acid enabled
[ https://issues.apache.org/jira/browse/HIVE-16321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15980019#comment-15980019 ] Eugene Koifman commented on HIVE-16321: --- committed to branch-2 https://github.com/apache/hive/commit/59faf36952f70d243ba997e54c1349ea2c01e670 > Possible deadlock in metastore with Acid enabled > > > Key: HIVE-16321 > URL: https://issues.apache.org/jira/browse/HIVE-16321 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.3.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Blocker > Attachments: HIVE-16321.01-branch-2.3.patch, > HIVE-16321.01.branch-2.patch, HIVE-16321.01.patch, > HIVE-16321.02.branch-2.patch, HIVE-16321.02.patch, > HIVE-16321.03-branch-2.patch, HIVE-16321.03.patch, > HIVE-16321.05-branch-2.3.patch > > > TxnStore.MutexAPI is a mechanism how different Metastore instances can > coordinate their operations. It uses a JDBCConnection to achieve it. > In some cases this may lead to deadlock. TxnHandler uses a connection pool > of fixed size. Suppose you have X simultaneous calls to TxnHandler.lock(), > where X is >= size of the pool. This take all connections form the pool, so > when > {noformat} > handle = getMutexAPI().acquireLock(MUTEX_KEY.CheckLock.name()); > {noformat} > is executed in _TxnHandler.checkLock(Connection dbConn, long extLockId)_ the > pool is empty and the system is deadlocked. > MutexAPI can't use the same connection as the operation it's protecting. > (TxnHandler.checkLock(Connection dbConn, long extLockId) is an example). > We could make MutexAPI use a separate connection pool (size > 'primary' conn > pool). > Or we could make TxnHandler.lock(LockRequest rqst) return immediately after > enqueueing the lock with the expectation that the caller will always follow > up with a call to checkLock(CheckLockRequest rqst). > cc [~f1sherox] -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-16321) Possible deadlock in metastore with Acid enabled
[ https://issues.apache.org/jira/browse/HIVE-16321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15979991#comment-15979991 ] Hive QA commented on HIVE-16321: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12864614/HIVE-16321.03-branch-2.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 10571 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[comments] (batchId=35) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[columnstats_part_coltype] (batchId=154) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=142) org.apache.hive.hcatalog.api.TestHCatClient.testTransportFailure (batchId=174) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4843/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4843/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4843/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 4 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12864614 - PreCommit-HIVE-Build > Possible deadlock in metastore with Acid enabled > > > Key: HIVE-16321 > URL: https://issues.apache.org/jira/browse/HIVE-16321 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.3.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Blocker > Attachments: HIVE-16321.01.branch-2.patch, HIVE-16321.01.patch, > HIVE-16321.02.branch-2.patch, HIVE-16321.02.patch, > HIVE-16321.03-branch-2.patch, HIVE-16321.03.patch > > > TxnStore.MutexAPI is a mechanism how different Metastore instances can > coordinate their operations. It uses a JDBCConnection to achieve it. > In some cases this may lead to deadlock. TxnHandler uses a connection pool > of fixed size. Suppose you have X simultaneous calls to TxnHandler.lock(), > where X is >= size of the pool. This take all connections form the pool, so > when > {noformat} > handle = getMutexAPI().acquireLock(MUTEX_KEY.CheckLock.name()); > {noformat} > is executed in _TxnHandler.checkLock(Connection dbConn, long extLockId)_ the > pool is empty and the system is deadlocked. > MutexAPI can't use the same connection as the operation it's protecting. > (TxnHandler.checkLock(Connection dbConn, long extLockId) is an example). > We could make MutexAPI use a separate connection pool (size > 'primary' conn > pool). > Or we could make TxnHandler.lock(LockRequest rqst) return immediately after > enqueueing the lock with the expectation that the caller will always follow > up with a call to checkLock(CheckLockRequest rqst). > cc [~f1sherox] -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-16321) Possible deadlock in metastore with Acid enabled
[ https://issues.apache.org/jira/browse/HIVE-16321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15979576#comment-15979576 ] Hive QA commented on HIVE-16321: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12864558/HIVE-16321.02.branch-2.patch {color:red}ERROR:{color} -1 due to build exiting with an error Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4832/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4832/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4832/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Tests exited with: NonZeroExitCodeException Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ date '+%Y-%m-%d %T.%3N' 2017-04-21 23:31:56.671 + [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]] + export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 + JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 + export PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games + PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m ' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m ' + export 'MAVEN_OPTS=-Xmx1g ' + MAVEN_OPTS='-Xmx1g ' + cd /data/hiveptest/working/ + tee /data/hiveptest/logs/PreCommit-HIVE-Build-4832/source-prep.txt + [[ false == \t\r\u\e ]] + mkdir -p maven ivy + [[ git = \s\v\n ]] + [[ git = \g\i\t ]] + [[ -z master ]] + [[ -d apache-github-source-source ]] + [[ ! -d apache-github-source-source/.git ]] + [[ ! -d apache-github-source-source ]] + date '+%Y-%m-%d %T.%3N' 2017-04-21 23:31:56.674 + cd apache-github-source-source + git fetch origin + git reset --hard HEAD HEAD is now at 6566065 HIVE-15982 : Support the width_bucket function (Sahil Takiar via Ashutosh Chauhan) + git clean -f -d + git checkout master Already on 'master' Your branch is up-to-date with 'origin/master'. + git reset --hard origin/master HEAD is now at 6566065 HIVE-15982 : Support the width_bucket function (Sahil Takiar via Ashutosh Chauhan) + git merge --ff-only origin/master Already up-to-date. + date '+%Y-%m-%d %T.%3N' 2017-04-21 23:31:57.802 + patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh + patchFilePath=/data/hiveptest/working/scratch/build.patch + [[ -f /data/hiveptest/working/scratch/build.patch ]] + chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh + /data/hiveptest/working/scratch/smart-apply-patch.sh /data/hiveptest/working/scratch/build.patch error: patch failed: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java:737 error: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java: patch does not apply error: patch failed: metastore/src/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java:147 error: metastore/src/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java: patch does not apply The patch does not appear to apply with p0, p1, or p2 + exit 1 ' {noformat} This message is automatically generated. ATTACHMENT ID: 12864558 - PreCommit-HIVE-Build > Possible deadlock in metastore with Acid enabled > > > Key: HIVE-16321 > URL: https://issues.apache.org/jira/browse/HIVE-16321 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.3.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Blocker > Attachments: HIVE-16321.01.branch-2.patch, HIVE-16321.01.patch, > HIVE-16321.02.branch-2.patch, HIVE-16321.02.patch, HIVE-16321.03.patch > > > TxnStore.MutexAPI is a mechanism how different Metastore instances can > coordinate their operations. It uses a JDBCConnection to achieve it. > In some cases this may lead to deadlock. TxnHandler uses a connection pool > of fixed size. Suppose you have X simultaneous calls to TxnHandler.lock(), > where X is >= size of the pool. This take all connections form the pool, so > when > {noformat} > handle = getMutexAPI().acquireLock(MUTEX_KEY.CheckLock.name()); > {noformat} > is executed in _TxnHandler.checkLock(Connection dbConn, long extLockId)_ the > pool is empty and the system is deadlocked. > MutexAPI can't use the same connection as the operation it's protecting. > (TxnHandler.checkLock(Connection dbConn, long extLockId) is an example). > We could make MutexAPI use a separate connection pool (size > 'primary' conn > pool). > Or we could make TxnHandler.lock(LockRequest rqst) return immediately after > enqueueing the lock with the expectation that the caller will always follow > up with a call to checkLock(CheckLockRequest rqst). > cc [~f1sherox] -- This message was sent
[jira] [Commented] (HIVE-16321) Possible deadlock in metastore with Acid enabled
[ https://issues.apache.org/jira/browse/HIVE-16321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15977901#comment-15977901 ] Eugene Koifman commented on HIVE-16321: --- committed to master https://github.com/apache/hive/commit/182218b760397e27936c5b9885083cdc774fef90 > Possible deadlock in metastore with Acid enabled > > > Key: HIVE-16321 > URL: https://issues.apache.org/jira/browse/HIVE-16321 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.3.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Critical > Attachments: HIVE-16321.01.branch-2.patch, HIVE-16321.01.patch, > HIVE-16321.02.patch, HIVE-16321.03.patch > > > TxnStore.MutexAPI is a mechanism how different Metastore instances can > coordinate their operations. It uses a JDBCConnection to achieve it. > In some cases this may lead to deadlock. TxnHandler uses a connection pool > of fixed size. Suppose you have X simultaneous calls to TxnHandler.lock(), > where X is >= size of the pool. This take all connections form the pool, so > when > {noformat} > handle = getMutexAPI().acquireLock(MUTEX_KEY.CheckLock.name()); > {noformat} > is executed in _TxnHandler.checkLock(Connection dbConn, long extLockId)_ the > pool is empty and the system is deadlocked. > MutexAPI can't use the same connection as the operation it's protecting. > (TxnHandler.checkLock(Connection dbConn, long extLockId) is an example). > We could make MutexAPI use a separate connection pool (size > 'primary' conn > pool). > Or we could make TxnHandler.lock(LockRequest rqst) return immediately after > enqueueing the lock with the expectation that the caller will always follow > up with a call to checkLock(CheckLockRequest rqst). > cc [~f1sherox] -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-16321) Possible deadlock in metastore with Acid enabled
[ https://issues.apache.org/jira/browse/HIVE-16321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15977475#comment-15977475 ] Wei Zheng commented on HIVE-16321: -- +1 > Possible deadlock in metastore with Acid enabled > > > Key: HIVE-16321 > URL: https://issues.apache.org/jira/browse/HIVE-16321 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.3.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Critical > Attachments: HIVE-16321.01.patch, HIVE-16321.02.patch, > HIVE-16321.03.patch > > > TxnStore.MutexAPI is a mechanism how different Metastore instances can > coordinate their operations. It uses a JDBCConnection to achieve it. > In some cases this may lead to deadlock. TxnHandler uses a connection pool > of fixed size. Suppose you have X simultaneous calls to TxnHandler.lock(), > where X is >= size of the pool. This take all connections form the pool, so > when > {noformat} > handle = getMutexAPI().acquireLock(MUTEX_KEY.CheckLock.name()); > {noformat} > is executed in _TxnHandler.checkLock(Connection dbConn, long extLockId)_ the > pool is empty and the system is deadlocked. > MutexAPI can't use the same connection as the operation it's protecting. > (TxnHandler.checkLock(Connection dbConn, long extLockId) is an example). > We could make MutexAPI use a separate connection pool (size > 'primary' conn > pool). > Or we could make TxnHandler.lock(LockRequest rqst) return immediately after > enqueueing the lock with the expectation that the caller will always follow > up with a call to checkLock(CheckLockRequest rqst). > cc [~f1sherox] -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-16321) Possible deadlock in metastore with Acid enabled
[ https://issues.apache.org/jira/browse/HIVE-16321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15977208#comment-15977208 ] Eugene Koifman commented on HIVE-16321: --- patches 2 and 3 are the same but have different set of failures except vector_if_expr which is a flaky test no related failures > Possible deadlock in metastore with Acid enabled > > > Key: HIVE-16321 > URL: https://issues.apache.org/jira/browse/HIVE-16321 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.3.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Critical > Attachments: HIVE-16321.01.patch, HIVE-16321.02.patch, > HIVE-16321.03.patch > > > TxnStore.MutexAPI is a mechanism how different Metastore instances can > coordinate their operations. It uses a JDBCConnection to achieve it. > In some cases this may lead to deadlock. TxnHandler uses a connection pool > of fixed size. Suppose you have X simultaneous calls to TxnHandler.lock(), > where X is >= size of the pool. This take all connections form the pool, so > when > {noformat} > handle = getMutexAPI().acquireLock(MUTEX_KEY.CheckLock.name()); > {noformat} > is executed in _TxnHandler.checkLock(Connection dbConn, long extLockId)_ the > pool is empty and the system is deadlocked. > MutexAPI can't use the same connection as the operation it's protecting. > (TxnHandler.checkLock(Connection dbConn, long extLockId) is an example). > We could make MutexAPI use a separate connection pool (size > 'primary' conn > pool). > Or we could make TxnHandler.lock(LockRequest rqst) return immediately after > enqueueing the lock with the expectation that the caller will always follow > up with a call to checkLock(CheckLockRequest rqst). > cc [~f1sherox] -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-16321) Possible deadlock in metastore with Acid enabled
[ https://issues.apache.org/jira/browse/HIVE-16321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15977012#comment-15977012 ] Hive QA commented on HIVE-16321: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12864282/HIVE-16321.04.patch {color:red}ERROR:{color} -1 due to build exiting with an error Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4793/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4793/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4793/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Tests exited with: ExecutionException: org.apache.hive.ptest.execution.ssh.SSHExecutionException: RSyncResult [localFile=/data/hiveptest/logs/PreCommit-HIVE-Build-4793/succeeded/196_TestRemoteHiveMetaStoreIpAddress, remoteFile=/home/hiveptest/104.154.80.32-hiveptest-0/logs/, getExitCode()=255, getException()=null, getUser()=hiveptest, getHost()=104.154.80.32, getInstance()=0]: 'Warning: Permanently added '104.154.80.32' (ECDSA) to the list of known hosts. Permission denied (publickey). rsync: connection unexpectedly closed (0 bytes received so far) [Receiver] rsync error: unexplained error (code 255) at io.c(226) [Receiver=3.1.1] Warning: Permanently added '104.154.80.32' (ECDSA) to the list of known hosts. Permission denied (publickey). rsync: connection unexpectedly closed (0 bytes received so far) [Receiver] rsync error: unexplained error (code 255) at io.c(226) [Receiver=3.1.1] Warning: Permanently added '104.154.80.32' (ECDSA) to the list of known hosts. Permission denied (publickey). rsync: connection unexpectedly closed (0 bytes received so far) [Receiver] rsync error: unexplained error (code 255) at io.c(226) [Receiver=3.1.1] Warning: Permanently added '104.154.80.32' (ECDSA) to the list of known hosts. Permission denied (publickey). rsync: connection unexpectedly closed (0 bytes received so far) [Receiver] rsync error: unexplained error (code 255) at io.c(226) [Receiver=3.1.1] Warning: Permanently added '104.154.80.32' (ECDSA) to the list of known hosts. Permission denied (publickey). rsync: connection unexpectedly closed (0 bytes received so far) [Receiver] rsync error: unexplained error (code 255) at io.c(226) [Receiver=3.1.1] ' {noformat} This message is automatically generated. ATTACHMENT ID: 12864282 - PreCommit-HIVE-Build > Possible deadlock in metastore with Acid enabled > > > Key: HIVE-16321 > URL: https://issues.apache.org/jira/browse/HIVE-16321 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.3.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Critical > Attachments: HIVE-16321.01.patch, HIVE-16321.02.patch, > HIVE-16321.03.patch, HIVE-16321.04.patch > > > TxnStore.MutexAPI is a mechanism how different Metastore instances can > coordinate their operations. It uses a JDBCConnection to achieve it. > In some cases this may lead to deadlock. TxnHandler uses a connection pool > of fixed size. Suppose you have X simultaneous calls to TxnHandler.lock(), > where X is >= size of the pool. This take all connections form the pool, so > when > {noformat} > handle = getMutexAPI().acquireLock(MUTEX_KEY.CheckLock.name()); > {noformat} > is executed in _TxnHandler.checkLock(Connection dbConn, long extLockId)_ the > pool is empty and the system is deadlocked. > MutexAPI can't use the same connection as the operation it's protecting. > (TxnHandler.checkLock(Connection dbConn, long extLockId) is an example). > We could make MutexAPI use a separate connection pool (size > 'primary' conn > pool). > Or we could make TxnHandler.lock(LockRequest rqst) return immediately after > enqueueing the lock with the expectation that the caller will always follow > up with a call to checkLock(CheckLockRequest rqst). > cc [~f1sherox] -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-16321) Possible deadlock in metastore with Acid enabled
[ https://issues.apache.org/jira/browse/HIVE-16321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15976412#comment-15976412 ] Hive QA commented on HIVE-16321: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12864173/HIVE-16321.03.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 10567 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=143) org.apache.hadoop.hive.cli.TestSparkCliDriver.org.apache.hadoop.hive.cli.TestSparkCliDriver (batchId=100) org.apache.hive.minikdc.TestJdbcWithDBTokenStore.testConnection (batchId=235) org.apache.hive.minikdc.TestJdbcWithDBTokenStore.testIsValid (batchId=235) org.apache.hive.minikdc.TestJdbcWithDBTokenStore.testIsValidNeg (batchId=235) org.apache.hive.minikdc.TestJdbcWithDBTokenStore.testNegativeProxyAuth (batchId=235) org.apache.hive.minikdc.TestJdbcWithDBTokenStore.testNegativeTokenAuth (batchId=235) org.apache.hive.minikdc.TestJdbcWithDBTokenStore.testProxyAuth (batchId=235) org.apache.hive.minikdc.TestJdbcWithDBTokenStore.testTokenAuth (batchId=235) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4785/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4785/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4785/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 9 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12864173 - PreCommit-HIVE-Build > Possible deadlock in metastore with Acid enabled > > > Key: HIVE-16321 > URL: https://issues.apache.org/jira/browse/HIVE-16321 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.3.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Critical > Attachments: HIVE-16321.01.patch, HIVE-16321.02.patch, > HIVE-16321.03.patch > > > TxnStore.MutexAPI is a mechanism how different Metastore instances can > coordinate their operations. It uses a JDBCConnection to achieve it. > In some cases this may lead to deadlock. TxnHandler uses a connection pool > of fixed size. Suppose you have X simultaneous calls to TxnHandler.lock(), > where X is >= size of the pool. This take all connections form the pool, so > when > {noformat} > handle = getMutexAPI().acquireLock(MUTEX_KEY.CheckLock.name()); > {noformat} > is executed in _TxnHandler.checkLock(Connection dbConn, long extLockId)_ the > pool is empty and the system is deadlocked. > MutexAPI can't use the same connection as the operation it's protecting. > (TxnHandler.checkLock(Connection dbConn, long extLockId) is an example). > We could make MutexAPI use a separate connection pool (size > 'primary' conn > pool). > Or we could make TxnHandler.lock(LockRequest rqst) return immediately after > enqueueing the lock with the expectation that the caller will always follow > up with a call to checkLock(CheckLockRequest rqst). > cc [~f1sherox] -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-16321) Possible deadlock in metastore with Acid enabled
[ https://issues.apache.org/jira/browse/HIVE-16321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15975813#comment-15975813 ] Hive QA commented on HIVE-16321: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12864061/HIVE-16321.02.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 10581 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=143) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[orc_merge2] (batchId=167) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4770/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4770/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4770/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12864061 - PreCommit-HIVE-Build > Possible deadlock in metastore with Acid enabled > > > Key: HIVE-16321 > URL: https://issues.apache.org/jira/browse/HIVE-16321 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.3.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Critical > Attachments: HIVE-16321.01.patch, HIVE-16321.02.patch > > > TxnStore.MutexAPI is a mechanism how different Metastore instances can > coordinate their operations. It uses a JDBCConnection to achieve it. > In some cases this may lead to deadlock. TxnHandler uses a connection pool > of fixed size. Suppose you have X simultaneous calls to TxnHandler.lock(), > where X is >= size of the pool. This take all connections form the pool, so > when > {noformat} > handle = getMutexAPI().acquireLock(MUTEX_KEY.CheckLock.name()); > {noformat} > is executed in _TxnHandler.checkLock(Connection dbConn, long extLockId)_ the > pool is empty and the system is deadlocked. > MutexAPI can't use the same connection as the operation it's protecting. > (TxnHandler.checkLock(Connection dbConn, long extLockId) is an example). > We could make MutexAPI use a separate connection pool (size > 'primary' conn > pool). > Or we could make TxnHandler.lock(LockRequest rqst) return immediately after > enqueueing the lock with the expectation that the caller will always follow > up with a call to checkLock(CheckLockRequest rqst). > cc [~f1sherox] -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-16321) Possible deadlock in metastore with Acid enabled
[ https://issues.apache.org/jira/browse/HIVE-16321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15975338#comment-15975338 ] Eugene Koifman commented on HIVE-16321: --- I don't think "if (connPoolMutex == null)" will add anything. if creating the 1st pool throws Thrift should abandon the whole object > Possible deadlock in metastore with Acid enabled > > > Key: HIVE-16321 > URL: https://issues.apache.org/jira/browse/HIVE-16321 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.3.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman > Attachments: HIVE-16321.01.patch, HIVE-16321.02.patch > > > TxnStore.MutexAPI is a mechanism how different Metastore instances can > coordinate their operations. It uses a JDBCConnection to achieve it. > In some cases this may lead to deadlock. TxnHandler uses a connection pool > of fixed size. Suppose you have X simultaneous calls to TxnHandler.lock(), > where X is >= size of the pool. This take all connections form the pool, so > when > {noformat} > handle = getMutexAPI().acquireLock(MUTEX_KEY.CheckLock.name()); > {noformat} > is executed in _TxnHandler.checkLock(Connection dbConn, long extLockId)_ the > pool is empty and the system is deadlocked. > MutexAPI can't use the same connection as the operation it's protecting. > (TxnHandler.checkLock(Connection dbConn, long extLockId) is an example). > We could make MutexAPI use a separate connection pool (size > 'primary' conn > pool). > Or we could make TxnHandler.lock(LockRequest rqst) return immediately after > enqueueing the lock with the expectation that the caller will always follow > up with a call to checkLock(CheckLockRequest rqst). > cc [~f1sherox] -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-16321) Possible deadlock in metastore with Acid enabled
[ https://issues.apache.org/jira/browse/HIVE-16321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15968426#comment-15968426 ] Wei Zheng commented on HIVE-16321: -- Typo in setConf(): getCommectionTimeoutMs. Also we may use a constant instead of hardcoded number for this timeout. Should we add "if (connPoolMutex == null)" before initializing connPoolMutex? In setupJdbcConnectionPool(): for hikaricp block, there are two setMaximumPoolSize. One should be removed. > Possible deadlock in metastore with Acid enabled > > > Key: HIVE-16321 > URL: https://issues.apache.org/jira/browse/HIVE-16321 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.3.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman > Attachments: HIVE-16321.01.patch > > > TxnStore.MutexAPI is a mechanism how different Metastore instances can > coordinate their operations. It uses a JDBCConnection to achieve it. > In some cases this may lead to deadlock. TxnHandler uses a connection pool > of fixed size. Suppose you have X simultaneous calls to TxnHandler.lock(), > where X is >= size of the pool. This take all connections form the pool, so > when > {noformat} > handle = getMutexAPI().acquireLock(MUTEX_KEY.CheckLock.name()); > {noformat} > is executed in _TxnHandler.checkLock(Connection dbConn, long extLockId)_ the > pool is empty and the system is deadlocked. > MutexAPI can't use the same connection as the operation it's protecting. > (TxnHandler.checkLock(Connection dbConn, long extLockId) is an example). > We could make MutexAPI use a separate connection pool (size > 'primary' conn > pool). > Or we could make TxnHandler.lock(LockRequest rqst) return immediately after > enqueueing the lock with the expectation that the caller will always follow > up with a call to checkLock(CheckLockRequest rqst). > cc [~f1sherox] -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-16321) Possible deadlock in metastore with Acid enabled
[ https://issues.apache.org/jira/browse/HIVE-16321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15968315#comment-15968315 ] Eugene Koifman commented on HIVE-16321: --- wow, a clean run! [~wzheng] could you review please > Possible deadlock in metastore with Acid enabled > > > Key: HIVE-16321 > URL: https://issues.apache.org/jira/browse/HIVE-16321 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.3.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman > Attachments: HIVE-16321.01.patch > > > TxnStore.MutexAPI is a mechanism how different Metastore instances can > coordinate their operations. It uses a JDBCConnection to achieve it. > In some cases this may lead to deadlock. TxnHandler uses a connection pool > of fixed size. Suppose you have X simultaneous calls to TxnHandler.lock(), > where X is >= size of the pool. This take all connections form the pool, so > when > {noformat} > handle = getMutexAPI().acquireLock(MUTEX_KEY.CheckLock.name()); > {noformat} > is executed in _TxnHandler.checkLock(Connection dbConn, long extLockId)_ the > pool is empty and the system is deadlocked. > MutexAPI can't use the same connection as the operation it's protecting. > (TxnHandler.checkLock(Connection dbConn, long extLockId) is an example). > We could make MutexAPI use a separate connection pool (size > 'primary' conn > pool). > Or we could make TxnHandler.lock(LockRequest rqst) return immediately after > enqueueing the lock with the expectation that the caller will always follow > up with a call to checkLock(CheckLockRequest rqst). > cc [~f1sherox] -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-16321) Possible deadlock in metastore with Acid enabled
[ https://issues.apache.org/jira/browse/HIVE-16321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15968294#comment-15968294 ] Hive QA commented on HIVE-16321: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12863338/HIVE-16321.01.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:green}SUCCESS:{color} +1 due to 10571 tests passed Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4678/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4678/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4678/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12863338 - PreCommit-HIVE-Build > Possible deadlock in metastore with Acid enabled > > > Key: HIVE-16321 > URL: https://issues.apache.org/jira/browse/HIVE-16321 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.3.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman > Attachments: HIVE-16321.01.patch > > > TxnStore.MutexAPI is a mechanism how different Metastore instances can > coordinate their operations. It uses a JDBCConnection to achieve it. > In some cases this may lead to deadlock. TxnHandler uses a connection pool > of fixed size. Suppose you have X simultaneous calls to TxnHandler.lock(), > where X is >= size of the pool. This take all connections form the pool, so > when > {noformat} > handle = getMutexAPI().acquireLock(MUTEX_KEY.CheckLock.name()); > {noformat} > is executed in _TxnHandler.checkLock(Connection dbConn, long extLockId)_ the > pool is empty and the system is deadlocked. > MutexAPI can't use the same connection as the operation it's protecting. > (TxnHandler.checkLock(Connection dbConn, long extLockId) is an example). > We could make MutexAPI use a separate connection pool (size > 'primary' conn > pool). > Or we could make TxnHandler.lock(LockRequest rqst) return immediately after > enqueueing the lock with the expectation that the caller will always follow > up with a call to checkLock(CheckLockRequest rqst). > cc [~f1sherox] -- This message was sent by Atlassian JIRA (v6.3.15#6346)