Anomie added a comment.

wikiadmin@db1101(wikidatawiki)>explain SELECT  rev_id,rev_page,rev_text_id,rev_timestamp,rev_minor_edit,rev_deleted,rev_len,rev_parent_id,rev_sha1,COALESCE( comment_rev_comment.comment_text, rev_comment ) AS `rev_comment_text`,comment_rev_comment.comment_data AS `rev_comment_data`,comment_rev_comment.comment_id AS `rev_comment_cid`,rev_user,rev_user_text,NULL AS `rev_actor`,rev_content_format,rev_content_model,page_namespace,page_title,page_id,page_latest,page_is_redirect,page_len,user_name,page_is_new,(SELECT  GROUP_CONCAT(ct_tag SEPARATOR ',')  FROM `change_tag`    WHERE ct_rev_id=rev_id  ) AS `ts_tags`,ores_damaging_cls.oresc_probability AS `ores_damaging_score`,0.412 AS `ores_damaging_threshold`  FROM `revision` LEFT JOIN `revision_comment_temp` `temp_rev_comment` ON ((temp_rev_comment.revcomment_rev = rev_id)) LEFT JOIN `comment` `comment_rev_comment` ON ((comment_rev_comment.comment_id = temp_rev_comment.revcomment_comment_id)) INNER JOIN `page` ON ((page_id = rev_page)) LEFT JOIN `user` ON ((rev_user != 0) AND (user_id = rev_user)) LEFT JOIN `ores_classification` `ores_damaging_cls` ON (ores_damaging_cls.oresc_model = '9' AND (ores_damaging_cls.oresc_rev=rev_id) AND ores_damaging_cls.oresc_class = '1')   WHERE ((rev_user = 2752938)) AND (rev_parent_id = 0) AND (page_namespace = '0') AND ((rev_deleted & 4) = 0)  ORDER BY rev_timestamp DESC LIMIT 51;
+------+--------------------+---------------------+--------+---------------------------------------------------------------+-----------------------+---------+-----------------------------------------------------+----------+---------------------------------+
| id   | select_type        | table               | type   | possible_keys                                                 | key                   | key_len | ref                                                 | rows     | Extra                           |
+------+--------------------+---------------------+--------+---------------------------------------------------------------+-----------------------+---------+-----------------------------------------------------+----------+---------------------------------+
|    1 | PRIMARY            | page                | ref    | PRIMARY,name_title                                            | name_title            | 4       | const                                               | 23650821 | Using temporary; Using filesort |
|    1 | PRIMARY            | revision            | ref    | user_timestamp,rev_page_id,page_timestamp,page_user_timestamp | page_user_timestamp   | 8       | wikidatawiki.page.page_id,const                     |        1 | Using where                     |
|    1 | PRIMARY            | temp_rev_comment    | ref    | PRIMARY,revcomment_rev                                        | PRIMARY               | 4       | wikidatawiki.revision.rev_id                        |        1 | Using index                     |
|    1 | PRIMARY            | comment_rev_comment | eq_ref | PRIMARY                                                       | PRIMARY               | 8       | wikidatawiki.temp_rev_comment.revcomment_comment_id |        1 | Using where                     |
|    1 | PRIMARY            | user                | const  | PRIMARY                                                       | PRIMARY               | 4       | const                                               |        1 | Using where                     |
|    1 | PRIMARY            | ores_damaging_cls   | eq_ref | oresc_rev_model_class                                         | oresc_rev_model_class | 7       | wikidatawiki.revision.rev_id,const,const            |        1 |                                 |
|    2 | DEPENDENT SUBQUERY | change_tag          | ref    | change_tag_rev_tag                                            | change_tag_rev_tag    | 5       | wikidatawiki.revision.rev_id                        |        1 | Using where; Using index        |
+------+--------------------+---------------------+--------+---------------------------------------------------------------+-----------------------+---------+-----------------------------------------------------+----------+---------------------------------+
7 rows in set (0.00 sec)

It decides it wants to fetch all the pages in namespace 0, then use the page_user_timestamp to get all the revisions for those pages by the user in question. It's estimating 23 million rows it'll have to touch, out of 47 million total in namespace 0.

It could probably do better here, by using the user_timestamp index to fetch the revisions in order and filter them:

wikiadmin@db1101(wikidatawiki)>explain SELECT  rev_id,rev_page,rev_text_id,rev_timestamp,rev_minor_edit,rev_deleted,rev_len,rev_parent_id,rev_sha1,COALESCE( comment_rev_comment.comment_text, rev_comment ) AS `rev_comment_text`,comment_rev_comment.comment_data AS `rev_comment_data`,comment_rev_comment.comment_id AS `rev_comment_cid`,rev_user,rev_user_text,NULL AS `rev_actor`,rev_content_format,rev_content_model,page_namespace,page_title,page_id,page_latest,page_is_redirect,page_len,user_name,page_is_new,(SELECT  GROUP_CONCAT(ct_tag SEPARATOR ',')  FROM `change_tag`    WHERE ct_rev_id=rev_id  ) AS `ts_tags`,ores_damaging_cls.oresc_probability AS `ores_damaging_score`,0.412 AS `ores_damaging_threshold`  FROM `revision` USE INDEX (user_timestamp) LEFT JOIN `revision_comment_temp` `temp_rev_comment` ON ((temp_rev_comment.revcomment_rev = rev_id)) LEFT JOIN `comment` `comment_rev_comment` ON ((comment_rev_comment.comment_id = temp_rev_comment.revcomment_comment_id)) INNER JOIN `page` ON ((page_id = rev_page)) LEFT JOIN `user` ON ((rev_user != 0) AND (user_id = rev_user)) LEFT JOIN `ores_classification` `ores_damaging_cls` ON (ores_damaging_cls.oresc_model = '9' AND (ores_damaging_cls.oresc_rev=rev_id) AND ores_damaging_cls.oresc_class = '1')   WHERE ((rev_user = '2752938')) AND (rev_parent_id = 0) AND (page_namespace = '0') AND ((rev_deleted & 4) = 0)  ORDER BY rev_timestamp DESC LIMIT 51;
+------+--------------------+---------------------+--------+------------------------+-----------------------+---------+-----------------------------------------------------+----------+--------------------------+
| id   | select_type        | table               | type   | possible_keys          | key                   | key_len | ref                                                 | rows     | Extra                    |
+------+--------------------+---------------------+--------+------------------------+-----------------------+---------+-----------------------------------------------------+----------+--------------------------+
|    1 | PRIMARY            | revision            | ref    | user_timestamp         | user_timestamp        | 4       | const                                               | 32698864 | Using where              |
|    1 | PRIMARY            | user                | eq_ref | PRIMARY                | PRIMARY               | 4       | wikidatawiki.revision.rev_user                      |        1 | Using where              |
|    1 | PRIMARY            | temp_rev_comment    | ref    | PRIMARY,revcomment_rev | PRIMARY               | 4       | wikidatawiki.revision.rev_id                        |        1 | Using index              |
|    1 | PRIMARY            | comment_rev_comment | eq_ref | PRIMARY                | PRIMARY               | 8       | wikidatawiki.temp_rev_comment.revcomment_comment_id |        1 | Using where              |
|    1 | PRIMARY            | ores_damaging_cls   | eq_ref | oresc_rev_model_class  | oresc_rev_model_class | 7       | wikidatawiki.revision.rev_id,const,const            |        1 |                          |
|    1 | PRIMARY            | page                | eq_ref | PRIMARY,name_title     | PRIMARY               | 4       | wikidatawiki.revision.rev_page                      |        1 | Using where              |
|    2 | DEPENDENT SUBQUERY | change_tag          | ref    | change_tag_rev_tag     | change_tag_rev_tag    | 5       | wikidatawiki.revision.rev_id                        |        1 | Using where; Using index |
+------+--------------------+---------------------+--------+------------------------+-----------------------+---------+-----------------------------------------------------+----------+--------------------------+
7 rows in set (0.00 sec)

I suppose it's looking at the worst case: if nothing passes the filters then it thinks it'll have to go through 32 million rows. It turns out this bot actually has only 17 million revision total. On the other hand, it could go through many fewer rows since this query can stop early once the LIMIT is reached.

It turns out this only touches 360 rows for this case:

wikiadmin@db1101(wikidatawiki)>SELECT  rev_id,rev_page,rev_text_id,rev_timestamp,rev_minor_edit,rev_deleted,rev_len,rev_parent_id,rev_sha1,COALESCE( comment_rev_comment.comment_text, rev_comment ) AS `rev_comment_text`,comment_rev_comment.comment_data AS `rev_comment_data`,comment_rev_comment.comment_id AS `rev_comment_cid`,rev_user,rev_user_text,NULL AS `rev_actor`,rev_content_format,rev_content_model,page_namespace,page_title,page_id,page_latest,page_is_redirect,page_len,user_name,page_is_new,(SELECT  GROUP_CONCAT(ct_tag SEPARATOR ',')  FROM `change_tag`    WHERE ct_rev_id=rev_id  ) AS `ts_tags`,ores_damaging_cls.oresc_probability AS `ores_damaging_score`,0.412 AS `ores_damaging_threshold`  FROM `revision` USE INDEX (user_timestamp) LEFT JOIN `revision_comment_temp` `temp_rev_comment` ON ((temp_rev_comment.revcomment_rev = rev_id)) LEFT JOIN `comment` `comment_rev_comment` ON ((comment_rev_comment.comment_id = temp_rev_comment.revcomment_comment_id)) INNER JOIN `page` ON ((page_id = rev_page)) LEFT JOIN `user` ON ((rev_user != 0) AND (user_id = rev_user)) LEFT JOIN `ores_classification` `ores_damaging_cls` ON (ores_damaging_cls.oresc_model = '9' AND (ores_damaging_cls.oresc_rev=rev_id) AND ores_damaging_cls.oresc_class = '1')   WHERE ((rev_user = '2752938')) AND (rev_parent_id = 0) AND (page_namespace = '0') AND ((rev_deleted & 4) = 0)  ORDER BY rev_timestamp DESC LIMIT 51;

[... SNIP ...]

wikiadmin@db1101(wikidatawiki)>SHOW STATUS like 'Hand%';
+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| Handler_commit             | 3     |
| Handler_delete             | 0     |
| Handler_discover           | 0     |
| Handler_external_lock      | 0     |
| Handler_icp_attempts       | 0     |
| Handler_icp_match          | 0     |
| Handler_mrr_init           | 0     |
| Handler_mrr_key_refills    | 0     |
| Handler_mrr_rowid_refills  | 0     |
| Handler_prepare            | 0     |
| Handler_read_first         | 0     |
| Handler_read_key           | 207   |
| Handler_read_last          | 0     |
| Handler_read_next          | 101   |
| Handler_read_prev          | 52    |
| Handler_read_retry         | 0     |
| Handler_read_rnd           | 0     |
| Handler_read_rnd_deleted   | 0     |
| Handler_read_rnd_next      | 0     |
| Handler_rollback           | 0     |
| Handler_savepoint          | 0     |
| Handler_savepoint_rollback | 0     |
| Handler_tmp_update         | 0     |
| Handler_tmp_write          | 0     |
| Handler_update             | 0     |
| Handler_write              | 0     |
+----------------------------+-------+
26 rows in set (0.08 sec)

T47619 is not really related. The problematic query there had (rev_user >18370655) rather than an equality, which as noted would make the query there filesort when used with the user_timestamp index.

What probably made the difference is that rMW27c61fb1e94d: Add `actor` table and code to start using it recently removed the index forcing when there's only a single user. We could add that back in easily enough for now, although this may happen again during the actor table migration when the query has to check for either the actor ID or the user ID/name.


TASK DETAIL
https://phabricator.wikimedia.org/T189026

EMAIL PREFERENCES
https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Anomie
Cc: Marostegui, Anomie, Aklapper, jcrespo, Rayssa-, Lahi, Gq86, GoranSMilovanovic, QZanden, LawExplorer, Minhnv-2809, Luke081515, Wikidata-bugs, aude, Mbch331, Jay8g, Krenair
_______________________________________________
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs

Reply via email to