What is your hive.txn.timeout set to and does the value as seen by the 
metastore process agree with what is seen by Storm client?

The good news is that this race condition is no longer possible since 
https://issues.apache.org/jira/browse/HIVE-13013

Eugene

From: Igor Kuzmenko <f1she...@gmail.com<mailto:f1she...@gmail.com>>
Reply-To: "user@hive.apache.org<mailto:user@hive.apache.org>" 
<user@hive.apache.org<mailto:user@hive.apache.org>>
Date: Wednesday, August 24, 2016 at 3:32 AM
To: "user@hive.apache.org<mailto:user@hive.apache.org>" 
<user@hive.apache.org<mailto:user@hive.apache.org>>
Subject: Re: Hive transaction doesn't release lock.

Oh, my copy paste skills failed me.
Here's output from HIVE_LOCKS:

# HL_LOCK_EXT_ID, HL_LOCK_INT_ID, HL_TXNID, HL_DB, HL_TABLE, HL_PARTITION, 
HL_LOCK_STATE, HL_LOCK_TYPE, HL_LAST_HEARTBEAT, HL_ACQUIRED_AT, HL_USER, HL_HOST
'8496355', '1', '78461824', 'default', 'data_http', 'dt=20160821', 'a', 'r', 
'0', '1471768489000', 'storm', 
'sorm-data03.msk.mts.ru<http://sorm-data03.msk.mts.ru>'

TXNS, TXN_COMPONENTS:
# TXN_ID, TXN_STATE, TXN_STARTED, TXN_LAST_HEARTBEAT, TXN_USER, TXN_HOST, 
TC_TXNID, TC_DATABASE, TC_TABLE, TC_PARTITION
78461824, a, 1471762974000, 1471768488000, storm, 
sorm-data03.msk.mts.ru<http://sorm-data03.msk.mts.ru>, 78461824, default, 
data_http, dt=201608213


Good news, I've found a log when it all started:

Here comes heartbeat:
2016-08-21 11:34:48,934 DEBUG [pool-3-thread-155]: txn.TxnHandler 
(TxnHandler.java:heartbeatTxn(1882)) - Going to execute update <update TXNS set 
txn_last_heartbeat = 1471768488000 where txn_id = 78461824 and txn_state = 'o'>

Concurrently we are trying to abort transaction
2016-08-21 11:34:48,907 DEBUG [DeadTxnReaper-0]: txn.TxnHandler 
(TxnHandler.java:abortTxns(1429)) - Going to execute update <delete from 
HIVE_LOCKS where hl_txnid in 
(78461824,78461825,78461826,78461827,78461828,78461829)>
2016-08-21 11:34:48,943 DEBUG [DeadTxnReaper-0]: txn.TxnHandler 
(TxnHandler.java:abortTxns(1446)) - Going to execute update <update TXNS set 
txn_state = 'a' where txn_state = 'o' and txn_id in 
(78461824,78461825,78461826,78461827,78461828,78461829) and txn_last_heartbeat 
< 1471764233000>
2016-08-21 11:34:48,957 DEBUG [pool-3-thread-11]: txn.TxnHandler 
(TxnHandler.java:abortTxns(1429)) - Going to execute update <delete from 
HIVE_LOCKS where hl_txnid in (78461824)>
2016-08-21 11:34:48,979 DEBUG [pool-3-thread-11]: txn.TxnHandler 
(TxnHandler.java:abortTxns(1446)) - Going to execute update <update TXNS set 
txn_state = 'a' where txn_state = 'o' and txn_id in (78461824)>

At this point transaction aborted and thers no lock.
But heartbeat thread still working and:
2016-08-21 11:34:49,025 DEBUG [pool-3-thread-155]: txn.TxnHandler 
(TxnHandler.java:lock(1546)) - Going to execute update <insert into 
TXN_COMPONENTS (tc_txnid, tc_database, tc_table, tc_partition) values 
(78461824, 'default', 'data_http', 'dt=20160821')>
2016-08-21 11:34:49,027 DEBUG [pool-3-thread-155]: txn.TxnHandler 
(TxnHandler.java:lock(1576)) - Going to execute update <insert into HIVE_LOCKS  
(hl_lock_ext_id, hl_lock_int_id, hl_txnid, hl_db, hl_table, hl_partition, 
hl_lock_state, hl_lock_type, hl_last_heartbeat, hl_user, hl_host) values 
(8496355, 1,78461824, 'default', 'data_http', 'dt=20160821', 'w', 'r', 0, 
'storm', 'sorm-data03.msk.mts.ru<http://sorm-data03.msk.mts.ru>')>
2016-08-21 11:34:49,029 DEBUG [pool-3-thread-155]: txn.TxnHandler 
(TxnHandler.java:checkLock(1696)) -   lockid:8496355 intLockId:1 txnid:78461824 
db:default table:data_http partition:dt=20160821 state:WAITING type:SHARED_READ


So I guess, that it's race condition between heartbeat thread and TxnReaper 
thread. Last heartbeat information in HIVE_LOCKS table differs from TXNS table.
Full log here<https://drive.google.com/open?id=0ByB92PAoAkrKSFFaQjBNSnNXZFk>.

On Tue, Aug 23, 2016 at 8:20 PM, Eugene Koifman 
<ekoif...@hortonworks.com<mailto:ekoif...@hortonworks.com>> wrote:
your query is “select * from HIVE_LOCKS” but the output is not from HIVE_LOCKS.
What entries do you have in HIVE_LOCKS for this txn_id?

If all you see is an entry in TXN table in ‘a’ state – that is OK.  that just 
mean that this transaction was aborted.

Eugene

From: Igor Kuzmenko <f1she...@gmail.com<mailto:f1she...@gmail.com>>
Reply-To: "user@hive.apache.org<mailto:user@hive.apache.org>" 
<user@hive.apache.org<mailto:user@hive.apache.org>>
Date: Monday, August 22, 2016 at 8:27 AM
To: "user@hive.apache.org<mailto:user@hive.apache.org>" 
<user@hive.apache.org<mailto:user@hive.apache.org>>
Subject: Hive transaction doesn't release lock.

Hello, I'm using Apache Hive 1.2.1 and Apache Storm to stream data in hive 
table.
After making some tests I tried to truncate my table, but sql execution doesn't 
complete because of the lock on table:

select * from HIVE_LOCKS;

# TXN_ID, TXN_STATE, TXN_STARTED, TXN_LAST_HEARTBEAT, TXN_USER, TXN_HOST, 
TC_TXNID, TC_DATABASE, TC_TABLE, TC_PARTITION
'78461824', 'a', '1471762974000', '1471768488000', 'storm', 
'sorm-data03.msk.mts.ru', '78461824', 'default', 'data_http', 'dt=20160821'



At the same time:

select * from TXNS, TXN_COMPONENTS
where 1=1
and TXNS.TXN_ID = TXN_COMPONENTS.TC_TXNID
and TXNS.TXN_ID = 78461824
;


# TXN_ID, TXN_STATE, TXN_STARTED, TXN_LAST_HEARTBEAT, TXN_USER, TXN_HOST, 
TC_TXNID, TC_DATABASE, TC_TABLE, TC_PARTITION
78461824, a, 1471762974000, 1471768488000, storm, sorm-data03.msk.mts.ru, 
78461824, default, data_http, dt=20160821


Txn 78461824 is in aborted state, but still keep lock on table 'data_http'. How 
can I avoid this? What should I do to get rid of that lock?


Reply via email to