https://bugzilla.wikimedia.org/show_bug.cgi?id=70558

            Bug ID: 70558
           Summary: categorylinks InnoDB extended indexes behavior differs
                    between enwiki slaves
           Product: MediaWiki
           Version: 1.24-git
          Hardware: All
                OS: All
            Status: NEW
          Severity: normal
          Priority: Unprioritized
         Component: Database
          Assignee: [email protected]
          Reporter: [email protected]
       Web browser: ---
   Mobile Platform: ---

The API runs the following query:

SELECT /* ApiQueryCategoryMembers::run */ 
cl_from,cl_sortkey,cl_type,page_namespace,page_title,cl_timestamp  FROM
page,categorylinks  FORCE INDEX (cl_timestamp) WHERE cl_to =
'Copy_to_Wikimedia_Commons_(bot-assessed)' AND (cl_from=page_id)  ORDER BY
cl_timestamp,cl_from LIMIT 501;

CREATE TABLE categorylinks (
  cl_from int(8) unsigned NOT NULL DEFAULT '0',
  cl_to varbinary(255) NOT NULL DEFAULT '',
  cl_sortkey varbinary(230) NOT NULL DEFAULT '',
  cl_timestamp timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE
CURRENT_TIMESTAMP,
  cl_sortkey_prefix varbinary(255) NOT NULL DEFAULT '',
  cl_collation varbinary(32) NOT NULL DEFAULT '',
  cl_type enum('page','subcat','file') NOT NULL DEFAULT 'page',
  UNIQUE KEY cl_from (cl_from,cl_to),
  KEY cl_timestamp (cl_to,cl_timestamp),
  KEY cl_collation (cl_collation),
  KEY cl_sortkey (cl_to,cl_type,cl_sortkey,cl_from)
) ENGINE=InnoDB DEFAULT CHARSET=binary

The forced cl_timestamp index is expected to utilize InnoDB extended indexes
(where secondary indexes also store the clustered primary key fields) to avoid
a filesort. Enwiki slaves up to MariaDB 5.5.34 do this correctly.

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: categorylinks
         type: ref
possible_keys: cl_timestamp
          key: cl_timestamp
      key_len: 257
          ref: const
         rows: 552330
        Extra: Using index condition; Using where
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: page
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: enwiki.categorylinks.cl_from
         rows: 1
        Extra: 
2 rows in set (0.38 sec)

Slaves running 5.5.36+ revert to using a filesort despite apparently having
identical categorylinks tables. Handler% stats suport the observation.

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: categorylinks
         type: ref
possible_keys: cl_from,cl_timestamp,cl_sortkey
          key: cl_sortkey
      key_len: 257
          ref: const
         rows: 466736
        Extra: Using index condition; Using where; Using filesort
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: page
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: enwiki.categorylinks.cl_from
         rows: 1
        Extra: 
2 rows in set (0.32 sec)

Important to note that the first enwiki 5.5.36 slave, from which others were
cloned, did have data dumped and reloaded rather than being upgraded in place,
so categorylinks was definitely recreated. Possibly something has changed with
how the first UNIQUE index can be chosen as primary key, and the 6-byte rowid
is in use instead?

Simply rebuilding the table has no effect:

ALTER TABLE categorylinks ENGINE=InnoDB;

But making the primary key explicit restores the expected behavior:

ALTER TABLE categorylinks DROP INDEX cl_from, ADD PRIMARY KEY (cl_from, cl_to);

Need to investigate what caused the change in behavior, and whether this
affects any other mediawiki tables.

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are on the CC list for the bug.
_______________________________________________
Wikibugs-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l

Reply via email to