Jianchao Jia created YARN-8337:
----------------------------------

             Summary: Deadlock Federation Router
                 Key: YARN-8337
                 URL: https://issues.apache.org/jira/browse/YARN-8337
             Project: Hadoop YARN
          Issue Type: Bug
          Components: federation, router
            Reporter: Jianchao Jia


We use MySQL (InnoDB) as the state store engine. In the Router log we found 
deadlock errors like the one below:
{code:java}
[2018-05-21T15:41:40.383+08:00] [ERROR] [IPC Server handler 25 on 8050] : Unable to insert the newly generated application application_1526295230627_127402
com.mysql.jdbc.exceptions.jdbc4.MySQLTransactionRollbackException: Deadlock found when trying to get lock; try restarting transaction
        at sun.reflect.GeneratedConstructorAccessor107.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at com.mysql.jdbc.Util.handleNewInstance(Util.java:425)
        at com.mysql.jdbc.Util.getInstance(Util.java:408)
        at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:952)
        at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3973)
        at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3909)
        at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2527)
        at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2680)
        at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2484)
        at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1858)
        at com.mysql.jdbc.PreparedStatement.executeUpdateInternal(PreparedStatement.java:2079)
        at com.mysql.jdbc.PreparedStatement.executeUpdateInternal(PreparedStatement.java:2013)
        at com.mysql.jdbc.PreparedStatement.executeLargeUpdate(PreparedStatement.java:5104)
        at com.mysql.jdbc.CallableStatement.executeLargeUpdate(CallableStatement.java:2418)
        at com.mysql.jdbc.CallableStatement.executeUpdate(CallableStatement.java:887)
        at com.zaxxer.hikari.pool.ProxyPreparedStatement.executeUpdate(ProxyPreparedStatement.java:61)
        at com.zaxxer.hikari.pool.HikariProxyCallableStatement.executeUpdate(HikariProxyCallableStatement.java)
        at org.apache.hadoop.yarn.server.federation.store.impl.SQLFederationStateStore.addApplicationHomeSubCluster(SQLFederationStateStore.java:547)
{code}
Running the "show engine innodb status;" command shows what happened:
{code:java}
2018-05-21 15:41:40 7f4685870700
*** (1) TRANSACTION:
TRANSACTION 241131538, ACTIVE 0 sec inserting, thread declared inside InnoDB 4999
mysql tables in use 2, locked 2
LOCK WAIT 4 lock struct(s), heap size 1184, 2 row lock(s)
MySQL thread id 7602335, OS thread handle 0x7f46858f2700, query id 2919792534 192.168.1.138 federation executing
INSERT INTO applicationsHomeSubCluster
(applicationId,homeSubCluster)
(SELECT applicationId_IN, homeSubCluster_IN
FROM applicationsHomeSubCluster
WHERE applicationId = applicationId_IN
HAVING COUNT(*) = 0 )
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 113 page no 21208 n bits 296 index `PRIMARY` of table `guldan_federationstatestore`.`applicationshomesubcluster` trx id 241131538 lock_mode X locks gap before rec insert intention waiting
Record lock, heap no 23 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
0: len 30; hex 6170706c69636174696f6e5f313532363239353233303632375f31323734; asc application_1526295230627_1274; (total 31 bytes);
1: len 6; hex 00000ba5f32d; asc -;;
2: len 7; hex dd000000280110; asc ( ;;
3: len 13; hex 686f70655f636c757374657231; asc hope_cluster1;;

*** (2) TRANSACTION:
TRANSACTION 241131539, ACTIVE 0 sec inserting, thread declared inside InnoDB 4999
mysql tables in use 2, locked 2
4 lock struct(s), heap size 1184, 2 row lock(s)
MySQL thread id 7600638, OS thread handle 0x7f4685870700, query id 2919792535 192.168.1.138 federation executing
INSERT INTO applicationsHomeSubCluster
(applicationId,homeSubCluster)
(SELECT applicationId_IN, homeSubCluster_IN
FROM applicationsHomeSubCluster
WHERE applicationId = applicationId_IN
HAVING COUNT(*) = 0 )
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 113 page no 21208 n bits 296 index `PRIMARY` of table `guldan_federationstatestore`.`applicationshomesubcluster` trx id 241131539 lock mode S locks gap before rec
Record lock, heap no 23 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
0: len 30; hex 6170706c69636174696f6e5f313532363239353233303632375f31323734; asc application_1526295230627_1274; (total 31 bytes);
1: len 6; hex 00000ba5f32d; asc -;;
2: len 7; hex dd000000280110; asc ( ;;
3: len 13; hex 686f70655f636c757374657231; asc hope_cluster1;;

*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 113 page no 21208 n bits 296 index `PRIMARY` of table `guldan_federationstatestore`.`applicationshomesubcluster` trx id 241131539 lock_mode X locks gap before rec insert intention waiting
Record lock, heap no 23 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
0: len 30; hex 6170706c69636174696f6e5f313532363239353233303632375f31323734; asc application_1526295230627_1274; (total 31 bytes);
1: len 6; hex 00000ba5f32d; asc -;;
2: len 7; hex dd000000280110; asc ( ;;
3: len 13; hex 686f70655f636c757374657231; asc hope_cluster1;;

*** WE ROLL BACK TRANSACTION (2)
{code}
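The procedure sp_addApplicationHomeSubCluster creates a gap lock.

When the applicationId in the WHERE clause of its INSERT INTO ... SELECT 
statement does not exist in the table applicationsHomeSubCluster, the SELECT 
part takes a shared gap lock on the PRIMARY index.

If other threads try to insert new records into the same gap at that moment, a 
deadlock may happen, because each insert then waits for the gap lock held by 
the other transaction.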
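
The locking pattern described above can be sketched by hand. This is only an illustration, not the statements the Router runs: it splits the two phases of the INSERT INTO ... SELECT into a locking read and a plain insert so that the interleaving is visible. It assumes the default REPEATABLE READ isolation level, that applicationId is the primary key, and that neither id exists yet and no other row lies between them, so both fall into the same index gap. The session labels A and B are hypothetical.

{code:sql}
-- Session A: locking read on a missing key takes a shared gap lock
START TRANSACTION;
SELECT applicationId FROM applicationsHomeSubCluster
WHERE applicationId = 'application_1526528662010_001201'
LOCK IN SHARE MODE;

-- Session B: same gap, also granted (gap locks do not block each other)
START TRANSACTION;
SELECT applicationId FROM applicationsHomeSubCluster
WHERE applicationId = 'application_1526528662010_001202'
LOCK IN SHARE MODE;

-- Session A: blocks, the insert intention lock waits for B's gap lock
INSERT INTO applicationsHomeSubCluster (applicationId, homeSubCluster)
VALUES ('application_1526528662010_001201', 'hope_cluster1');

-- Session B: needs an insert intention lock too, which waits for A's gap lock.
-- InnoDB detects the cycle and rolls back one of the two transactions
-- with error 1213 (deadlock found when trying to get lock).
INSERT INTO applicationsHomeSubCluster (applicationId, homeSubCluster)
VALUES ('application_1526528662010_001202', 'hope_cluster1');
{code}

This matches the lock output above: one transaction holds "lock mode S locks gap before rec" while both wait for "lock_mode X locks gap before rec insert intention".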

 

To reproduce the situation, we used 3 sessions to insert different 
applicationIds: application_1526528662010_001201, 
application_1526528662010_001202 and application_1526528662010_001203.

!http://bit.jd.com/zhangmang/JDHadoop-2.7.1/uploads/78912e6d245b8009052fb24e73cbaf54/image.png!

 

To fix this issue, we should use INSERT IGNORE INTO instead of 
INSERT INTO ... SELECT.
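
A minimal sketch of what the replacement statement inside sp_addApplicationHomeSubCluster might look like. It reuses the column and parameter names from the statement in the deadlock output and assumes applicationId is the primary key, as the PRIMARY index in that output suggests. The rest of the procedure body is not shown.

{code:sql}
-- Before: the SELECT part takes a shared gap lock when the id is missing
-- INSERT INTO applicationsHomeSubCluster
--    (applicationId, homeSubCluster)
--    (SELECT applicationId_IN, homeSubCluster_IN
--     FROM applicationsHomeSubCluster
--     WHERE applicationId = applicationId_IN
--     HAVING COUNT(*) = 0);

-- After: a plain insert with no locking read of the table first;
-- a duplicate applicationId is skipped with a warning instead of an error
INSERT IGNORE INTO applicationsHomeSubCluster
   (applicationId, homeSubCluster)
VALUES (applicationId_IN, homeSubCluster_IN);
{code}

With this form, an existing applicationId leaves the stored row unchanged and the statement reports zero affected rows, so a caller can still tell whether its insert took effect. Note that IGNORE also downgrades some other insert errors to warnings, which may or may not be acceptable here.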

 



