https://bugzilla.wikimedia.org/show_bug.cgi?id=53577

Sean Pringle <[email protected]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|FIXED                       |---

--- Comment #4 from Sean Pringle <[email protected]> ---
Larger WMF wikis are seeing more spikes of LogPager queries, of which an example
follows. These have had some recent tuning, initially by removing forced indexes
that no longer suited the current data set sizes, which has helped (10min down
to ~100s), but we need to do more.

mysql wmf db73 dewiki> explain SELECT /* IndexPager::buildQueryInfo (LogPager)
xxx.xxx.xxx.xxx */
log_id,log_type,log_action,log_timestamp,log_user,log_user_text,log_namespace,log_title,log_comment,log_params,log_deleted,user_id,user_name,user_editcount,ts_tags
FROM `logging` LEFT JOIN `user` ON ((log_user=user_id)) LEFT JOIN `tag_summary`
ON ((ts_log_id=log_id)) WHERE (log_type NOT IN ('suppress','spamblacklist'))
AND log_user = ? AND ((log_deleted & 4) = 0) AND (log_type != 'review') ORDER
BY log_timestamp DESC LIMIT 51\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: logging
         type: index
possible_keys: type_time,user_time,log_user_type_time
          key: times
      key_len: 16
          ref: NULL
         rows: 6705
        Extra: Using where
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: user
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: dewiki.logging.log_user
         rows: 1
        Extra: 
*************************** 3. row ***************************
           id: 1
  select_type: SIMPLE
        table: tag_summary
         type: ref
possible_keys: ts_log_id,tag_summary_log_id
          key: ts_log_id
      key_len: 5
          ref: dewiki.logging.log_id
         rows: 20172                   <-- often much larger
        Extra: Using where
3 rows in set (0.29 sec)

The remaining problems are:

- The queries are fast enough until one hits cold data, causing a slave to
bottleneck on disk. The resulting buffer pool churn has a flow-on effect: other
normally fast queries slow down and show up as false positives in the slow log.
This is especially problematic when a crawler like MSNBOT triggers a spike of
these queries.

- The tag_summary table often has a relatively low cardinality for ts_log_id
(many nulls, few ids).

- Sometimes a slave logs multiple identical copies of a query (always a form
filtered by log_user) at intervals of several seconds; presumably someone
repeatedly refreshing a page that is too slow to respond.

Possibilities:

- Pull out only log_id,log_user first to reduce the impact of the filesort, and
grab the user and tag data in a follow-up batch query. If the query plan
switches to using Index Condition Pushdown ("Using index condition" in
explain), cold data is less of an issue, as fewer table rows are hit and fewer
pages read.

- Limit the size of the dataset to be considered, perhaps by timestamp (the
last X months, say). For heavier users only?

- Improve the method of retrieving ts_tags using something other than
tag_summary. Or remove it entirely.

- Isolate these queries to specific slaves (like watchlist).
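As a rough illustration of the first two possibilities, the split could look
something like the sketch below (same tables and columns as the query above;
the placeholders and the step-2 IN list are illustrative, not an actual patch):

```sql
-- Step 1 (sketch): narrow query over `logging` only, so the sort works on
-- a small row footprint and fewer pages need to be read for cold data.
-- The commented-out log_timestamp predicate shows where a "last X months"
-- cutoff would go if we also limit the dataset by time.
SELECT log_id
FROM `logging`
WHERE log_user = ?
  AND log_type NOT IN ('suppress','spamblacklist')
  AND log_type != 'review'
  AND (log_deleted & 4) = 0
  -- AND log_timestamp > ?   -- optional time cutoff
ORDER BY log_timestamp DESC
LIMIT 51;

-- Step 2 (sketch): fetch full rows plus user and tag data in one batch for
-- just the (at most 51) ids found above, instead of joining over the whole
-- filtered range.
SELECT log_id, log_type, log_action, log_timestamp, log_user, log_user_text,
       log_namespace, log_title, log_comment, log_params, log_deleted,
       user_id, user_name, user_editcount, ts_tags
FROM `logging`
LEFT JOIN `user` ON log_user = user_id
LEFT JOIN `tag_summary` ON ts_log_id = log_id
WHERE log_id IN (/* ids from step 1 */)
ORDER BY log_timestamp DESC;
```

The joins in step 2 then touch at most 51 logging rows regardless of how much
of the user's log history is cold.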
