aaron added a comment.
The JOIN for query #2 does not seem to have an index.
wbqev_identifier_properties has:
PRIMARY KEY (identifier_pid, dump_id)
...but nothing like (dump_id). Is the wbqev_identifier_properties table going
to pruned of older dumps or will it just keep growing?
Query #3 would require a few clever index dives to run well. I'm not sure how
smart MariaDB is here. Has anyone tried "EXPLAIN format=json" with a good
dataset (doesn't have to be full size, but a few 100k items). My fear would be
having just the (dump_id) index part used and scanning happening for the other
two IN()s. On the revision table, two IN()s work fine:
EXPLAIN EXTENDED SELECT * FROM revision FORCE INDEX(rev_page_id) WHERE
rev_page IN (5043734,3535,6234) AND rev_id IN (343535,23255,33626);
stdClass Object
(
[id] => 1
[select_type] => SIMPLE
[table] => revision
[type] => range
[possible_keys] => rev_page_id
[key] => rev_page_id
[key_len] => 8
[ref] =>
[rows] => 9
[filtered] => 100.00
[Extra] => Using index condition
)
For wbqev_external_data, the only index with pid has it as the third part of
the index.
CREATE INDEX /*i*/dump_id_external_id_pid ON /*_*/wbqev_external_data
(dump_id, external_id, pid);
...so mariadb will need to handle 3 levels of IN nesting to avoid scanning
extra rows. It may work just fine, though it would be nice to know for sure.
I'm having trouble finding documentation about wbqev_external_data. The PK is
just an opaque rowid field. How many rows might there be for any given
(dump_id, external_id, pid) combination? What decides that number?
TASK DETAIL
https://phabricator.wikimedia.org/T103912
EMAIL PREFERENCES
https://phabricator.wikimedia.org/settings/panel/emailpreferences/
To: aaron
Cc: Lydia_Pintscher, aaron, Wikibase-Quality-External-Validation, Aklapper,
Liuxinyu970226, Andreasburmeister, csteipp, Tamslo, Jonaskeutel, JanZerebecki,
soeren.oldag, dpatrick, Wikidata-bugs, aude, Krenair
_______________________________________________
Wikidata-bugs mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs