Oh, my copy paste skills failed me. Here's output from HIVE_LOCKS: # HL_LOCK_EXT_ID, HL_LOCK_INT_ID, HL_TXNID, HL_DB, HL_TABLE, HL_PARTITION, HL_LOCK_STATE, HL_LOCK_TYPE, HL_LAST_HEARTBEAT, HL_ACQUIRED_AT, HL_USER, HL_HOST '8496355', '1', '78461824', 'default', 'data_http', 'dt=20160821', 'a', 'r', '0', '1471768489000', 'storm', 'sorm-data03.msk.mts.ru'
TXNS, TXN_COMPONENTS: # TXN_ID, TXN_STATE, TXN_STARTED, TXN_LAST_HEARTBEAT, TXN_USER, TXN_HOST, TC_TXNID, TC_DATABASE, TC_TABLE, TC_PARTITION 78461824, a, 1471762974000, 1471768488000, storm, sorm-data03.msk.mts.ru, 78461824, default, data_http, dt=201608213 Good news, I've found a log when it all started: Here comes heartbeat: 2016-08-21 11:34:48,934 DEBUG [pool-3-thread-155]: txn.TxnHandler (TxnHandler.java:heartbeatTxn(1882)) - Going to execute update <update TXNS set txn_last_heartbeat = 1471768488000 where txn_id = 78461824 and txn_state = 'o'> Concurrently we are trying to abort transaction 2016-08-21 11:34:48,907 DEBUG [DeadTxnReaper-0]: txn.TxnHandler (TxnHandler.java:abortTxns(1429)) - Going to execute update <delete from HIVE_LOCKS where hl_txnid in (78461824,78461825,78461826,78461827,78461828,78461829)> 2016-08-21 11:34:48,943 DEBUG [DeadTxnReaper-0]: txn.TxnHandler (TxnHandler.java:abortTxns(1446)) - Going to execute update <update TXNS set txn_state = 'a' where txn_state = 'o' and txn_id in (78461824,78461825,78461826,78461827,78461828,78461829) and txn_last_heartbeat < 1471764233000> 2016-08-21 11:34:48,957 DEBUG [pool-3-thread-11]: txn.TxnHandler (TxnHandler.java:abortTxns(1429)) - Going to execute update <delete from HIVE_LOCKS where hl_txnid in (78461824)> 2016-08-21 11:34:48,979 DEBUG [pool-3-thread-11]: txn.TxnHandler (TxnHandler.java:abortTxns(1446)) - Going to execute update <update TXNS set txn_state = 'a' where txn_state = 'o' and txn_id in (78461824)> At this point transaction aborted and thers no lock. But heartbeat thread still working and: 2016-08-21 11:34:49,025 DEBUG [pool-3-thread-155]: txn.TxnHandler (TxnHandler.java:lock(1546)) - Going to execute update <insert into TXN_COMPONENTS (tc_txnid, tc_database, tc_table, tc_partition) values (78461824, 'default', 'data_http', 'dt=20160821')> 2016-08-21 11:34:49,027 DEBUG [pool-3-thread-155]: txn.TxnHandler (TxnHandler.java:lock(1576)) - Going to execute update <insert into HIVE_LOCKS (hl_lock_ext_id, hl_lock_int_id, hl_txnid, hl_db, hl_table, hl_partition, hl_lock_state, hl_lock_type, hl_last_heartbeat, hl_user, hl_host) values (8496355, 1,78461824, 'default', 'data_http', 'dt=20160821', 'w', 'r', 0, 'storm', 'sorm-data03.msk.mts.ru')> 2016-08-21 11:34:49,029 DEBUG [pool-3-thread-155]: txn.TxnHandler (TxnHandler.java:checkLock(1696)) - lockid:8496355 intLockId:1 txnid:78461824 db:default table:data_http partition:dt=20160821 state:WAITING type:SHARED_READ So I guess, that it's race condition between heartbeat thread and TxnReaper thread. Last heartbeat information in HIVE_LOCKS table differs from TXNS table. Full log here <https://drive.google.com/open?id=0ByB92PAoAkrKSFFaQjBNSnNXZFk>. On Tue, Aug 23, 2016 at 8:20 PM, Eugene Koifman <ekoif...@hortonworks.com> wrote: > your query is “select * from HIVE_LOCKS” but the output is not from > HIVE_LOCKS. > What entries do you have in HIVE_LOCKS for this txn_id? > > If all you see is an entry in TXN table in ‘a’ state – that is OK. that > just mean that this transaction was aborted. > > Eugene > > From: Igor Kuzmenko <f1she...@gmail.com> > Reply-To: "user@hive.apache.org" <user@hive.apache.org> > Date: Monday, August 22, 2016 at 8:27 AM > To: "user@hive.apache.org" <user@hive.apache.org> > Subject: Hive transaction doesn't release lock. > > Hello, I'm using Apache Hive 1.2.1 and Apache Storm to stream data in hive > table. > After making some tests I tried to truncate my table, but sql execution > doesn't complete because of the lock on table: > > > select * from HIVE_LOCKS; > > > # TXN_ID, TXN_STATE, TXN_STARTED, TXN_LAST_HEARTBEAT, TXN_USER, TXN_HOST, > TC_TXNID, TC_DATABASE, TC_TABLE, TC_PARTITION > '78461824', 'a', '1471762974000', '1471768488000', 'storm', 'sorm-data03. > msk.mts.ru', '78461824', 'default', 'data_http', 'dt=20160821' > > > > At the same time: > > select * from TXNS, TXN_COMPONENTS > where 1=1 > and TXNS.TXN_ID = TXN_COMPONENTS.TC_TXNID > and TXNS.TXN_ID = 78461824 > ; > > > # TXN_ID, TXN_STATE, TXN_STARTED, TXN_LAST_HEARTBEAT, TXN_USER, TXN_HOST, > TC_TXNID, TC_DATABASE, TC_TABLE, TC_PARTITION > 78461824, a, 1471762974000, 1471768488000, storm, sorm-data03.msk.mts.ru, > 78461824, default, data_http, dt=20160821 > > > Txn 78461824 is in aborted state, but still keep lock on table > 'data_http'. How can I avoid this? What should I do to get rid of that > lock? > >