What is your hive.txn.timeout set to and does the value as seen by the metastore process agree with what is seen by Storm client?
The good news is that this race condition is no longer possible since https://issues.apache.org/jira/browse/HIVE-13013 Eugene From: Igor Kuzmenko <f1she...@gmail.com<mailto:f1she...@gmail.com>> Reply-To: "user@hive.apache.org<mailto:user@hive.apache.org>" <user@hive.apache.org<mailto:user@hive.apache.org>> Date: Wednesday, August 24, 2016 at 3:32 AM To: "user@hive.apache.org<mailto:user@hive.apache.org>" <user@hive.apache.org<mailto:user@hive.apache.org>> Subject: Re: Hive transaction doesn't release lock. Oh, my copy paste skills failed me. Here's output from HIVE_LOCKS: # HL_LOCK_EXT_ID, HL_LOCK_INT_ID, HL_TXNID, HL_DB, HL_TABLE, HL_PARTITION, HL_LOCK_STATE, HL_LOCK_TYPE, HL_LAST_HEARTBEAT, HL_ACQUIRED_AT, HL_USER, HL_HOST '8496355', '1', '78461824', 'default', 'data_http', 'dt=20160821', 'a', 'r', '0', '1471768489000', 'storm', 'sorm-data03.msk.mts.ru<http://sorm-data03.msk.mts.ru>' TXNS, TXN_COMPONENTS: # TXN_ID, TXN_STATE, TXN_STARTED, TXN_LAST_HEARTBEAT, TXN_USER, TXN_HOST, TC_TXNID, TC_DATABASE, TC_TABLE, TC_PARTITION 78461824, a, 1471762974000, 1471768488000, storm, sorm-data03.msk.mts.ru<http://sorm-data03.msk.mts.ru>, 78461824, default, data_http, dt=201608213 Good news, I've found a log when it all started: Here comes heartbeat: 2016-08-21 11:34:48,934 DEBUG [pool-3-thread-155]: txn.TxnHandler (TxnHandler.java:heartbeatTxn(1882)) - Going to execute update <update TXNS set txn_last_heartbeat = 1471768488000 where txn_id = 78461824 and txn_state = 'o'> Concurrently we are trying to abort transaction 2016-08-21 11:34:48,907 DEBUG [DeadTxnReaper-0]: txn.TxnHandler (TxnHandler.java:abortTxns(1429)) - Going to execute update <delete from HIVE_LOCKS where hl_txnid in (78461824,78461825,78461826,78461827,78461828,78461829)> 2016-08-21 11:34:48,943 DEBUG [DeadTxnReaper-0]: txn.TxnHandler (TxnHandler.java:abortTxns(1446)) - Going to execute update <update TXNS set txn_state = 'a' where txn_state = 'o' and txn_id in (78461824,78461825,78461826,78461827,78461828,78461829) and txn_last_heartbeat < 1471764233000> 2016-08-21 11:34:48,957 DEBUG [pool-3-thread-11]: txn.TxnHandler (TxnHandler.java:abortTxns(1429)) - Going to execute update <delete from HIVE_LOCKS where hl_txnid in (78461824)> 2016-08-21 11:34:48,979 DEBUG [pool-3-thread-11]: txn.TxnHandler (TxnHandler.java:abortTxns(1446)) - Going to execute update <update TXNS set txn_state = 'a' where txn_state = 'o' and txn_id in (78461824)> At this point transaction aborted and thers no lock. But heartbeat thread still working and: 2016-08-21 11:34:49,025 DEBUG [pool-3-thread-155]: txn.TxnHandler (TxnHandler.java:lock(1546)) - Going to execute update <insert into TXN_COMPONENTS (tc_txnid, tc_database, tc_table, tc_partition) values (78461824, 'default', 'data_http', 'dt=20160821')> 2016-08-21 11:34:49,027 DEBUG [pool-3-thread-155]: txn.TxnHandler (TxnHandler.java:lock(1576)) - Going to execute update <insert into HIVE_LOCKS (hl_lock_ext_id, hl_lock_int_id, hl_txnid, hl_db, hl_table, hl_partition, hl_lock_state, hl_lock_type, hl_last_heartbeat, hl_user, hl_host) values (8496355, 1,78461824, 'default', 'data_http', 'dt=20160821', 'w', 'r', 0, 'storm', 'sorm-data03.msk.mts.ru<http://sorm-data03.msk.mts.ru>')> 2016-08-21 11:34:49,029 DEBUG [pool-3-thread-155]: txn.TxnHandler (TxnHandler.java:checkLock(1696)) - lockid:8496355 intLockId:1 txnid:78461824 db:default table:data_http partition:dt=20160821 state:WAITING type:SHARED_READ So I guess, that it's race condition between heartbeat thread and TxnReaper thread. Last heartbeat information in HIVE_LOCKS table differs from TXNS table. Full log here<https://drive.google.com/open?id=0ByB92PAoAkrKSFFaQjBNSnNXZFk>. On Tue, Aug 23, 2016 at 8:20 PM, Eugene Koifman <ekoif...@hortonworks.com<mailto:ekoif...@hortonworks.com>> wrote: your query is “select * from HIVE_LOCKS” but the output is not from HIVE_LOCKS. What entries do you have in HIVE_LOCKS for this txn_id? If all you see is an entry in TXN table in ‘a’ state – that is OK. that just mean that this transaction was aborted. Eugene From: Igor Kuzmenko <f1she...@gmail.com<mailto:f1she...@gmail.com>> Reply-To: "user@hive.apache.org<mailto:user@hive.apache.org>" <user@hive.apache.org<mailto:user@hive.apache.org>> Date: Monday, August 22, 2016 at 8:27 AM To: "user@hive.apache.org<mailto:user@hive.apache.org>" <user@hive.apache.org<mailto:user@hive.apache.org>> Subject: Hive transaction doesn't release lock. Hello, I'm using Apache Hive 1.2.1 and Apache Storm to stream data in hive table. After making some tests I tried to truncate my table, but sql execution doesn't complete because of the lock on table: select * from HIVE_LOCKS; # TXN_ID, TXN_STATE, TXN_STARTED, TXN_LAST_HEARTBEAT, TXN_USER, TXN_HOST, TC_TXNID, TC_DATABASE, TC_TABLE, TC_PARTITION '78461824', 'a', '1471762974000', '1471768488000', 'storm', 'sorm-data03.msk.mts.ru', '78461824', 'default', 'data_http', 'dt=20160821' At the same time: select * from TXNS, TXN_COMPONENTS where 1=1 and TXNS.TXN_ID = TXN_COMPONENTS.TC_TXNID and TXNS.TXN_ID = 78461824 ; # TXN_ID, TXN_STATE, TXN_STARTED, TXN_LAST_HEARTBEAT, TXN_USER, TXN_HOST, TC_TXNID, TC_DATABASE, TC_TABLE, TC_PARTITION 78461824, a, 1471762974000, 1471768488000, storm, sorm-data03.msk.mts.ru, 78461824, default, data_http, dt=20160821 Txn 78461824 is in aborted state, but still keep lock on table 'data_http'. How can I avoid this? What should I do to get rid of that lock?