| Marostegui added subscribers: Banyek, Marostegui. Marostegui added a comment. |
So taking a quick look at the schema differences between rc groups in different DCs, they look the same:
The rc slaves in codfw:
root@neodymium:~# diff -u <(mysql.py -hdb2085:3318 -e "show create table recentchanges\G" wikidatawiki) <(mysql.py -hdb2086:3318 -e "show create table recentchanges\G" wikidatawiki) root@neodymium:~#
The rc slaves in eqiad:
root@neodymium:~# diff -u <(mysql.py -hdb1101:3318 -e "show create table recentchanges\G" wikidatawiki) <(mysql.py -hdb1099:3318 -e "show create table recentchanges\G" wikidatawiki) root@neodymium:~#eqiad vs codfw:
root@neodymium:~# diff -u <(mysql.py -hdb1101:3318 -e "show create table recentchanges\G" wikidatawiki) <(mysql.py -hdb2085:3318 -e "show create table recentchanges\G" wikidatawiki) root@neodymium:~#API slaves eqiad:
root@neodymium:~# diff -u <(mysql.py -hdb1104 -e "show create table recentchanges\G" wikidatawiki) <(mysql.py -hdb1092 -e "show create table recentchanges\G" wikidatawiki) root@neodymium:~#API slaves codfw:
root@neodymium:~# diff -u <(mysql.py -hdb2081 -e "show create table recentchanges\G" wikidatawiki) <(mysql.py -hdb2080 -e "show create table recentchanges\G" wikidatawiki) root@neodymium:~#And as @Reedy points out, there are indexes differences between eqiad and codfw for the api hosts
eqiad vs codfw:root@neodymium:~# diff -u <(mysql.py -hdb2081 -e "show create table recentchanges\G" wikidatawiki) <(mysql.py -hdb1104 -e "show create table recentchanges\G" wikidatawiki) --- /dev/fd/63 2018-09-15 05:31:53.629312288 +0000 +++ /dev/fd/62 2018-09-15 05:31:53.629312288 +0000 @@ -34,10 +34,8 @@ KEY `rc_ip` (`rc_ip`), KEY `rc_ns_usertext` (`rc_namespace`,`rc_user_text`), KEY `rc_user_text` (`rc_user_text`,`rc_timestamp`), - KEY `tmp_2` (`rc_bot`,`rc_timestamp`), - KEY `tmp_3` (`rc_namespace`,`rc_timestamp`), KEY `rc_name_type_patrolled_timestamp` (`rc_namespace`,`rc_type`,`rc_patrolled`,`rc_timestamp`),Those two indexes tmp2 and tmp3 (T202167) aren't present on codfw, maybe they are messing up the query plan?
What is interesting to note is that the query plan for codfw is supposedly to better than the one in eqiad by looking at the number of rows for the API slaves:
[email protected][wikidatawiki]> explain SELECT rc_id,rc_timestamp,rc_namespace,rc_title,rc_cur_id,rc_type,rc_deleted,rc_this_oldid,rc_last_oldid FROM `recentchanges` WHERE (rc_timestamp>='20180914110000') AND rc_namespace IN ('0','120') AND rc_type IN ('0','1','3','6') ORDER BY rc_timestamp ASC,rc_id ASC LIMIT 101 ; +------+-------------+---------------+-------+-------------------------------------------------------------------------------------------------------------+--------------+---------+------+------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+---------------+-------+-------------------------------------------------------------------------------------------------------------+--------------+---------+------+------+-------------+ | 1 | SIMPLE | recentchanges | index | rc_timestamp,rc_ns_usertext,tmp_3,rc_name_type_patrolled_timestamp,rc_ns_actor,rc_namespace_title_timestamp | rc_timestamp | 16 | NULL | 1767 | Using where | +------+-------------+---------------+-------+-------------------------------------------------------------------------------------------------------------+--------------+---------+------+------+-------------+ 1 row in set (0.04 sec)And:
[email protected][wikidatawiki]> explain SELECT rc_id,rc_timestamp,rc_namespace,rc_title,rc_cur_id,rc_type,rc_deleted,rc_this_oldid,rc_last_oldid FROM `recentchanges` WHERE (rc_timestamp>='20180914110000') AND rc_namespace IN ('0','120') AND rc_type IN ('0','1','3','6') ORDER BY rc_timestamp ASC,rc_id ASC LIMIT 101 ; +------+-------------+---------------+-------+-------------------------------------------------------------------------------------------------------+--------------+---------+------+---------+------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+---------------+-------+-------------------------------------------------------------------------------------------------------+--------------+---------+------+---------+------------------------------------+ | 1 | SIMPLE | recentchanges | range | rc_timestamp,rc_ns_usertext,rc_name_type_patrolled_timestamp,rc_ns_actor,rc_namespace_title_timestamp | rc_timestamp | 16 | NULL | 1079546 | Using index condition; Using where | +------+-------------+---------------+-------+-------------------------------------------------------------------------------------------------------+--------------+---------+------+---------+------------------------------------+ 1 row in set (0.00 sec)Even for the same index chosen rc_timestamp. Even if the index is not chosen (or looks like it wouldn't be chosen), those tmp_2, and tmp_3 indexes are messing up with the optimizer.
The query runtimes are:eqiad
101 rows in set (0.01 sec)codfw:
101 rows in set (27.94 sec)But what if we ignore those tmp_2 and tmp_3 indexes in codfw? Even though they are not supposed to be used (as per the explain)?:
Then the query runs as fast as it does in eqiad:[email protected][wikidatawiki]> SELECT rc_id,rc_timestamp,rc_namespace,rc_title,rc_cur_id,rc_type,rc_deleted,rc_this_oldid,rc_last_oldid FROM `recentchanges` IGNORE INDEX(tmp_2,tmp_3) WHERE (rc_timestamp>='20180914110000') AND rc_namespace IN ('0','120') AND rc_type IN ('0','1','3','6') ORDER BY rc_timestamp ASC,rc_id ASC LIMIT 101 ; +-----------+----------------+--------------+-----------+-----------+---------+------------+---------------+---------------+ | rc_id | rc_timestamp | rc_namespace | rc_title | rc_cur_id | rc_type | rc_deleted | rc_this_oldid | rc_last_oldid | +-----------+----------------+--------------+-----------+-----------+---------+------------+---------------+---------------+ <snip> 101 rows in set (0.04 sec)The solution then is to drop those two indexes on recentchanges table, which are not even on tables.sql (or on eqiad).
@Reedy @Krinkle @Smalyshev I assume nothing else in code is forced to use those indexes, as that would have been failing in eqiad since the indexes aren't present there but, maybe it is worth a quick grep in code just in case?. Specially for recentchanges related queries (as those indexes do exist on eqiad rc group slaves).
I could delete those two indexes from one of the API hosts on Monday, check that nothing really breaks and then go ahead and kill it from the other one?
And then we should also check and delete them from eqiad too?. These are the hosts with the indexes:root@neodymium:/home/marostegui# ./section s8 | while read host port ; do echo "$host:$port" ; mysql.py -h$host:$port -e "show create table recentchanges\G" wikidatawiki | egrep "tmp_2|tmp_3" ; done labsdb1011.eqiad.wmnet:3306 labsdb1010.eqiad.wmnet:3306 labsdb1009.eqiad.wmnet:3306 dbstore2001.codfw.wmnet:3318 KEY `tmp_2` (`rc_bot`,`rc_timestamp`), KEY `tmp_3` (`rc_namespace`,`rc_timestamp`), dbstore1002.eqiad.wmnet:3306 KEY `tmp_2` (`rc_bot`,`rc_timestamp`), KEY `tmp_3` (`rc_namespace`,`rc_timestamp`), db2094.codfw.wmnet:3318 db2086.codfw.wmnet:3318 KEY `tmp_2` (`rc_bot`,`rc_timestamp`), KEY `tmp_3` (`rc_namespace`,`rc_timestamp`), db2085.codfw.wmnet:3318 KEY `tmp_2` (`rc_bot`,`rc_timestamp`), KEY `tmp_3` (`rc_namespace`,`rc_timestamp`), db2083.codfw.wmnet:3306 KEY `tmp_2` (`rc_bot`,`rc_timestamp`), KEY `tmp_3` (`rc_namespace`,`rc_timestamp`), db2082.codfw.wmnet:3306 KEY `tmp_2` (`rc_bot`,`rc_timestamp`), KEY `tmp_3` (`rc_namespace`,`rc_timestamp`), db2081.codfw.wmnet:3306 KEY `tmp_2` (`rc_bot`,`rc_timestamp`), KEY `tmp_3` (`rc_namespace`,`rc_timestamp`), db2080.codfw.wmnet:3306 KEY `tmp_2` (`rc_bot`,`rc_timestamp`), KEY `tmp_3` (`rc_namespace`,`rc_timestamp`), db2079.codfw.wmnet:3306 KEY `tmp_2` (`rc_bot`,`rc_timestamp`), KEY `tmp_3` (`rc_namespace`,`rc_timestamp`), db2045.codfw.wmnet:3306 KEY `tmp_2` (`rc_bot`,`rc_timestamp`), KEY `tmp_3` (`rc_namespace`,`rc_timestamp`), db1124.eqiad.wmnet:3318 db1109.eqiad.wmnet:3306 db1104.eqiad.wmnet:3306 db1101.eqiad.wmnet:3318 -----> RC group KEY `tmp_2` (`rc_bot`,`rc_timestamp`), KEY `tmp_3` (`rc_namespace`,`rc_timestamp`), db1099.eqiad.wmnet:3318 -----> RC group KEY `tmp_2` (`rc_bot`,`rc_timestamp`), KEY `tmp_3` (`rc_namespace`,`rc_timestamp`), db1092.eqiad.wmnet:3306 db1087.eqiad.wmnet:3306 db1071.eqiad.wmnet:3306 KEY `tmp_2` (`rc_bot`,`rc_timestamp`), KEY `tmp_3` (`rc_namespace`,`rc_timestamp`),Doing a quick check on one of rc slaves sys schema shows that that index might actually being used (this is not 100% real, but it is a pretty good indication):
[email protected][sys]> select * from schema_unused_indexes where index_name like '%tmp%'; Empty set (0.06 sec)On an API host it is the same:
[email protected][sys]> select * from schema_unused_indexes where index_name like '%tmp%'; Empty set (0.06 sec)
Cc: Marostegui, Banyek, Reedy, gerritbot, Krinkle, Addshore, Yurik, jcrespo, Imarlier, Ladsgroup, Lydia_Pintscher, WMDE-leszek, Aklapper, Gehel, Smalyshev, AndyTan, Gaboe420, Versusxo, Majesticalreaper22, Giuliamocci, Davinaclare77, Adrian1985, Qtn1293, Cpaulf30, Lahi, Gq86, Baloch007, Darkminds3113, Bsandipan, Lordiis, Lucas_Werkmeister_WMDE, GoranSMilovanovic, Adik2382, Th3d3v1ls, Hfbn0, Ramalepe, Liugev6, QZanden, EBjune, merbst, LawExplorer, Vali.matei, Lewizho99, Minhnv-2809, Zppix, Maathavan, Jonas, Xmlizer, Wong128hk, Luke081515, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, faidon, Mbch331, Jay8g, Krenair, fgiunchedi
_______________________________________________ Wikidata-bugs mailing list [email protected] https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
