[
https://issues.apache.org/jira/browse/YARN-8337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17727603#comment-17727603
]
ASF GitHub Bot commented on YARN-8337:
--------------------------------------
slfan1989 commented on code in PR #5673:
URL: https://github.com/apache/hadoop/pull/5673#discussion_r1210495371
##########
hadoop-yarn-project/hadoop-yarn/bin/FederationStateStore/MySQL/FederationStateStoreStoredProcs.sql:
##########
@@ -92,12 +92,9 @@ CREATE PROCEDURE sp_addApplicationHomeSubCluster(
IN applicationContext_IN BLOB,
OUT storedHomeSubCluster_OUT varchar(256), OUT rowCount_OUT int)
BEGIN
- INSERT INTO applicationsHomeSubCluster
- (applicationId, homeSubCluster, createTime, applicationContext)
- (SELECT applicationId_IN, homeSubCluster_IN, NOW(), applicationContext_IN
- FROM applicationsHomeSubCluster
- WHERE applicationId = applicationId_IN
- HAVING COUNT(*) = 0 );
+ INSERT IGNORE INTO applicationsHomeSubCluster(
Review Comment:
Thank you very much for your question! I will try to find a better way to
handle this code to make it run more smoothly. In the process of submitting
applications, we have added retry logic so that even if this part of the code
fails, we will not miss application information. I will set up a complete Mysql
test environment and submit a large number of apps in a multi-session situation
to confirm the performance.
> [FederationStateStore - MySql] Improve ApplicationHome Mysql Script.
> --------------------------------------------------------------------
>
> Key: YARN-8337
> URL: https://issues.apache.org/jira/browse/YARN-8337
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: federation, router
> Reporter: Jianchao Jia
> Assignee: Shilun Fan
> Priority: Major
> Labels: pull-request-available
> Attachments: YARN-8337.001.patch, YARN-8337.002.patch
>
>
> We use mysql innodb as the state store engine,in router log we found dead
> lock error like below:
> {code:java}
> [2018-05-21T15:41:40.383+08:00] [ERROR] [IPC Server handler 25 on 8050] :
> Unable to insert the newly generated application
> application_1526295230627_127402
> com.mysql.jdbc.exceptions.jdbc4.MySQLTransactionRollbackException: Deadlock
> found when trying to get lock; try restarting transaction
> at sun.reflect.GeneratedConstructorAccessor107.newInstance(Unknown
> Source)
> at
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
> at com.mysql.jdbc.Util.handleNewInstance(Util.java:425)
> at com.mysql.jdbc.Util.getInstance(Util.java:408)
> at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:952)
> at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3973)
> at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3909)
> at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2527)
> at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2680)
> at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2484)
> at
> com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1858)
> at
> com.mysql.jdbc.PreparedStatement.executeUpdateInternal(PreparedStatement.java:2079)
> at
> com.mysql.jdbc.PreparedStatement.executeUpdateInternal(PreparedStatement.java:2013)
> at
> com.mysql.jdbc.PreparedStatement.executeLargeUpdate(PreparedStatement.java:5104)
> at
> com.mysql.jdbc.CallableStatement.executeLargeUpdate(CallableStatement.java:2418)
> at
> com.mysql.jdbc.CallableStatement.executeUpdate(CallableStatement.java:887)
> at
> com.zaxxer.hikari.pool.ProxyPreparedStatement.executeUpdate(ProxyPreparedStatement.java:61)
> at
> com.zaxxer.hikari.pool.HikariProxyCallableStatement.executeUpdate(HikariProxyCallableStatement.java)
> at
> org.apache.hadoop.yarn.server.federation.store.impl.SQLFederationStateStore.addApplicationHomeSubCluster(SQLFederationStateStore.java:547)
> {code}
> Use "show engine innodb status;" command to find what happens
> {code:java}
> 2018-05-21 15:41:40 7f4685870700
> *** (1) TRANSACTION:
> TRANSACTION 241131538, ACTIVE 0 sec inserting, thread declared inside InnoDB
> 4999
> mysql tables in use 2, locked 2
> LOCK WAIT 4 lock struct(s), heap size 1184, 2 row lock(s)
> MySQL thread id 7602335, OS thread handle 0x7f46858f2700, query id 2919792534
> 192.168.1.138 federation executing
> INSERT INTO applicationsHomeSubCluster
> (applicationId,homeSubCluster)
> (SELECT applicationId_IN, homeSubCluster_IN
> FROM applicationsHomeSubCluster
> WHERE applicationId = applicationId_IN
> HAVING COUNT(*) = 0 )
> *** (1) WAITING FOR THIS LOCK TO BE GRANTED:
> RECORD LOCKS space id 113 page no 21208 n bits 296 index `PRIMARY` of table
> `guldan_federationstatestore`.`applicationshomesubcluster` trx id 241131538
> lock_mode X locks gap before rec insert intention waiting
> Record lock, heap no 23 PHYSICAL RECORD: n_fields 4; compact format; info
> bits 0
> 0: len 30; hex 6170706c69636174696f6e5f313532363239353233303632375f31323734;
> asc application_1526295230627_1274; (total 31 bytes);
> 1: len 6; hex 00000ba5f32d; asc -;;
> 2: len 7; hex dd000000280110; asc ( ;;
> 3: len 13; hex 686f70655f636c757374657231; asc hope_cluster1;;
> *** (2) TRANSACTION:
> TRANSACTION 241131539, ACTIVE 0 sec inserting, thread declared inside InnoDB
> 4999
> mysql tables in use 2, locked 2
> 4 lock struct(s), heap size 1184, 2 row lock(s)
> MySQL thread id 7600638, OS thread handle 0x7f4685870700, query id 2919792535
> 192.168.1.138 federation executing
> INSERT INTO applicationsHomeSubCluster
> (applicationId,homeSubCluster)
> (SELECT applicationId_IN, homeSubCluster_IN
> FROM applicationsHomeSubCluster
> WHERE applicationId = applicationId_IN
> HAVING COUNT(*) = 0 )
> *** (2) HOLDS THE LOCK(S):
> RECORD LOCKS space id 113 page no 21208 n bits 296 index `PRIMARY` of table
> `guldan_federationstatestore`.`applicationshomesubcluster` trx id 241131539
> lock mode S locks gap before rec
> Record lock, heap no 23 PHYSICAL RECORD: n_fields 4; compact format; info
> bits 0
> 0: len 30; hex 6170706c69636174696f6e5f313532363239353233303632375f31323734;
> asc application_1526295230627_1274; (total 31 bytes);
> 1: len 6; hex 00000ba5f32d; asc -;;
> 2: len 7; hex dd000000280110; asc ( ;;
> 3: len 13; hex 686f70655f636c757374657231; asc hope_cluster1;;
> *** (2) WAITING FOR THIS LOCK TO BE GRANTED:
> RECORD LOCKS space id 113 page no 21208 n bits 296 index `PRIMARY` of table
> `guldan_federationstatestore`.`applicationshomesubcluster` trx id 241131539
> lock_mode X locks gap before rec insert intention waiting
> Record lock, heap no 23 PHYSICAL RECORD: n_fields 4; compact format; info
> bits 0
> 0: len 30; hex 6170706c69636174696f6e5f313532363239353233303632375f31323734;
> asc application_1526295230627_1274; (total 31 bytes);
> 1: len 6; hex 00000ba5f32d; asc -;;
> 2: len 7; hex dd000000280110; asc ( ;;
> 3: len 13; hex 686f70655f636c757374657231; asc hope_cluster1;;
> *** WE ROLL BACK TRANSACTION (2)
> {code}
> PROCEDURE sp_addApplicationHomeSubCluster will create gap lock .
> insert into select where clause,if the applicationId does not exists in table
> applicationsHomeSubCluster ,it will create a gap lock。
> At this moment ,if other threads want to insert new records of the
> gap,deaklock may happen.
>
> To reproduce the situation , we use 3 sessions to insert different
> applicationId
> :application_1526528662010_001201,application_1526528662010_001202,application_1526528662010_001203
> !http://bit.jd.com/zhangmang/JDHadoop-2.7.1/uploads/78912e6d245b8009052fb24e73cbaf54/image.png!
>
> To fix this issue,we should use insert ignore into instead of insert into
> select
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]