Jianchao Jia created YARN-8337:
----------------------------------
Summary: Deadlock Federation Router
Key: YARN-8337
URL: https://issues.apache.org/jira/browse/YARN-8337
Project: Hadoop YARN
Issue Type: Bug
Components: federation, router
Reporter: Jianchao Jia
We use MySQL with InnoDB as the state store engine. In the Router log we found
deadlock errors like the one below:
{code:java}
[2018-05-21T15:41:40.383+08:00] [ERROR] [IPC Server handler 25 on 8050] : Unable to insert the newly generated application application_1526295230627_127402
com.mysql.jdbc.exceptions.jdbc4.MySQLTransactionRollbackException: Deadlock found when trying to get lock; try restarting transaction
    at sun.reflect.GeneratedConstructorAccessor107.newInstance(Unknown Source)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at com.mysql.jdbc.Util.handleNewInstance(Util.java:425)
    at com.mysql.jdbc.Util.getInstance(Util.java:408)
    at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:952)
    at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3973)
    at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3909)
    at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2527)
    at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2680)
    at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2484)
    at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1858)
    at com.mysql.jdbc.PreparedStatement.executeUpdateInternal(PreparedStatement.java:2079)
    at com.mysql.jdbc.PreparedStatement.executeUpdateInternal(PreparedStatement.java:2013)
    at com.mysql.jdbc.PreparedStatement.executeLargeUpdate(PreparedStatement.java:5104)
    at com.mysql.jdbc.CallableStatement.executeLargeUpdate(CallableStatement.java:2418)
    at com.mysql.jdbc.CallableStatement.executeUpdate(CallableStatement.java:887)
    at com.zaxxer.hikari.pool.ProxyPreparedStatement.executeUpdate(ProxyPreparedStatement.java:61)
    at com.zaxxer.hikari.pool.HikariProxyCallableStatement.executeUpdate(HikariProxyCallableStatement.java)
    at org.apache.hadoop.yarn.server.federation.store.impl.SQLFederationStateStore.addApplicationHomeSubCluster(SQLFederationStateStore.java:547)
{code}
Running "show engine innodb status;" shows what happened:
{code:java}
2018-05-21 15:41:40 7f4685870700
*** (1) TRANSACTION:
TRANSACTION 241131538, ACTIVE 0 sec inserting, thread declared inside InnoDB
4999
mysql tables in use 2, locked 2
LOCK WAIT 4 lock struct(s), heap size 1184, 2 row lock(s)
MySQL thread id 7602335, OS thread handle 0x7f46858f2700, query id 2919792534
192.168.1.138 federation executing
INSERT INTO applicationsHomeSubCluster
(applicationId,homeSubCluster)
(SELECT applicationId_IN, homeSubCluster_IN
FROM applicationsHomeSubCluster
WHERE applicationId = applicationId_IN
HAVING COUNT(*) = 0 )
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 113 page no 21208 n bits 296 index `PRIMARY` of table
`guldan_federationstatestore`.`applicationshomesubcluster` trx id 241131538
lock_mode X locks gap before rec insert intention waiting
Record lock, heap no 23 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
0: len 30; hex 6170706c69636174696f6e5f313532363239353233303632375f31323734;
asc application_1526295230627_1274; (total 31 bytes);
1: len 6; hex 00000ba5f32d; asc -;;
2: len 7; hex dd000000280110; asc ( ;;
3: len 13; hex 686f70655f636c757374657231; asc hope_cluster1;;
*** (2) TRANSACTION:
TRANSACTION 241131539, ACTIVE 0 sec inserting, thread declared inside InnoDB
4999
mysql tables in use 2, locked 2
4 lock struct(s), heap size 1184, 2 row lock(s)
MySQL thread id 7600638, OS thread handle 0x7f4685870700, query id 2919792535
192.168.1.138 federation executing
INSERT INTO applicationsHomeSubCluster
(applicationId,homeSubCluster)
(SELECT applicationId_IN, homeSubCluster_IN
FROM applicationsHomeSubCluster
WHERE applicationId = applicationId_IN
HAVING COUNT(*) = 0 )
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 113 page no 21208 n bits 296 index `PRIMARY` of table
`guldan_federationstatestore`.`applicationshomesubcluster` trx id 241131539
lock mode S locks gap before rec
Record lock, heap no 23 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
0: len 30; hex 6170706c69636174696f6e5f313532363239353233303632375f31323734;
asc application_1526295230627_1274; (total 31 bytes);
1: len 6; hex 00000ba5f32d; asc -;;
2: len 7; hex dd000000280110; asc ( ;;
3: len 13; hex 686f70655f636c757374657231; asc hope_cluster1;;
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 113 page no 21208 n bits 296 index `PRIMARY` of table
`guldan_federationstatestore`.`applicationshomesubcluster` trx id 241131539
lock_mode X locks gap before rec insert intention waiting
Record lock, heap no 23 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
0: len 30; hex 6170706c69636174696f6e5f313532363239353233303632375f31323734;
asc application_1526295230627_1274; (total 31 bytes);
1: len 6; hex 00000ba5f32d; asc -;;
2: len 7; hex dd000000280110; asc ( ;;
3: len 13; hex 686f70655f636c757374657231; asc hope_cluster1;;
*** WE ROLL BACK TRANSACTION (2)
{code}
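The record dumps above print each field as hex. As a reading aid, the small helper below decodes such a field back to text. The class name InnodbHexField is hypothetical and only for illustration. Decoding field 0 gives the prefix "application_1526295230627_1274" (the dump shows 30 of the 31 bytes) and field 3 gives "hope_cluster1", which matches the "asc" columns and suggests that all three lock entries refer to the gap before the same record.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for reading InnoDB record dumps; not part of Hadoop or MySQL.
class InnodbHexField {

    /** Decodes a hex string from a record dump into ASCII text. */
    static String decode(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("odd-length hex string");
        }
        byte[] bytes = new byte[hex.length() / 2];
        for (int i = 0; i < bytes.length; i++) {
            // Each pair of hex digits is one byte of the stored value.
            bytes[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return new String(bytes, StandardCharsets.US_ASCII);
    }
}
```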
The procedure sp_addApplicationHomeSubCluster creates a gap lock.
Its INSERT INTO ... SELECT statement filters on applicationId in the WHERE
clause; if that applicationId does not yet exist in the table
applicationsHomeSubCluster, the statement takes a gap lock.
If other threads then try to insert new records into the same gap, a deadlock
may happen.
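The interaction can be sketched with a toy model. This is not how InnoDB is implemented; it only encodes two rules that match the output above: shared gap locks on the same gap do not block each other, and an insert intention request waits for gap locks held by other transactions. The class and method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model of the lock interaction described in this issue; illustrative only.
class GapLockToyModel {
    // gap id -> transactions holding a shared gap lock on it
    private final Map<String, Set<Integer>> gapHolders = new HashMap<>();
    // transaction -> transactions it is waiting for
    private final Map<Integer, Set<Integer>> waitsFor = new HashMap<>();

    /** Locking read for a missing key: shared gap locks are always granted. */
    void lockGapShared(int txn, String gap) {
        gapHolders.computeIfAbsent(gap, g -> new HashSet<>()).add(txn);
    }

    /** Insert into the gap: granted only if no other transaction holds a gap lock. */
    boolean requestInsertIntention(int txn, String gap) {
        Set<Integer> blockers =
            new HashSet<>(gapHolders.getOrDefault(gap, Collections.emptySet()));
        blockers.remove(txn); // a transaction's own gap lock does not block it
        if (blockers.isEmpty()) {
            return true;
        }
        waitsFor.computeIfAbsent(txn, t -> new HashSet<>()).addAll(blockers);
        return false;
    }

    /** True if txn can reach itself by following wait-for edges (a cycle). */
    boolean isDeadlocked(int txn) {
        Deque<Integer> pending =
            new ArrayDeque<>(waitsFor.getOrDefault(txn, Collections.emptySet()));
        Set<Integer> seen = new HashSet<>();
        while (!pending.isEmpty()) {
            int current = pending.pop();
            if (current == txn) {
                return true;
            }
            if (seen.add(current)) {
                pending.addAll(waitsFor.getOrDefault(current, Collections.emptySet()));
            }
        }
        return false;
    }
}
```

With two transactions that both call lockGapShared on the same gap and then both call requestInsertIntention, each ends up waiting for the other, which is the cycle InnoDB resolved by rolling back transaction (2).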
To reproduce the situation, we used 3 sessions to insert different
applicationIds: application_1526528662010_001201,
application_1526528662010_001202 and application_1526528662010_001203.
!http://bit.jd.com/zhangmang/JDHadoop-2.7.1/uploads/78912e6d245b8009052fb24e73cbaf54/image.png!
To fix this issue, we should use INSERT IGNORE INTO instead of INSERT INTO ...
SELECT.
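A minimal sketch of the proposed statement, issued here directly over JDBC for illustration. The actual change would presumably go into the stored procedure sp_addApplicationHomeSubCluster; the class and method names below are hypothetical, and the table and column names are taken from the statement in the InnoDB output. Note that INSERT IGNORE also downgrades some other errors to warnings, so this is an assumption-laden illustration rather than a drop-in patch.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Hypothetical illustration of the proposed fix; not the SQLFederationStateStore code.
class InsertIgnoreSketch {
    // A plain single-row insert: no SELECT against the table, so no locking
    // read on the gap precedes the insert.
    static final String INSERT_IGNORE_SQL =
        "INSERT IGNORE INTO applicationsHomeSubCluster "
            + "(applicationId, homeSubCluster) VALUES (?, ?)";

    /** Returns true if a new row was inserted, false if the applicationId already existed. */
    static boolean addIfAbsent(Connection conn, String applicationId, String homeSubCluster)
            throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(INSERT_IGNORE_SQL)) {
            ps.setString(1, applicationId);
            ps.setString(2, homeSubCluster);
            // For a duplicate key, INSERT IGNORE reports zero affected rows.
            return ps.executeUpdate() == 1;
        }
    }
}
```

When addIfAbsent returns false, a caller would still need a follow-up SELECT to learn which homeSubCluster is already stored for that applicationId.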