aaron added a comment.

The JOIN for query #2 does not seem to have an index. 
wbqev_identifier_properties has:

  PRIMARY KEY (identifier_pid, dump_id)

...but nothing like (dump_id). Is the wbqev_identifier_properties table going 
to pruned of older dumps or will it just keep growing?

Query #3 would require a few clever index dives to run well. I'm not sure how 
smart MariaDB is here. Has anyone tried "EXPLAIN format=json" with a good 
dataset (doesn't have to be full size, but a few 100k items). My fear would be 
having just the (dump_id) index part used and scanning happening for the other 
two IN()s. On the revision table, two IN()s work fine:

  EXPLAIN EXTENDED SELECT * FROM revision FORCE INDEX(rev_page_id) WHERE 
rev_page IN (5043734,3535,6234) AND rev_id IN (343535,23255,33626);
  stdClass Object
  (
      [id] => 1
      [select_type] => SIMPLE
      [table] => revision
      [type] => range
      [possible_keys] => rev_page_id
      [key] => rev_page_id
      [key_len] => 8
      [ref] => 
      [rows] => 9
      [filtered] => 100.00
      [Extra] => Using index condition
  )

For wbqev_external_data, the only index with pid has it as the third part of 
the index.

  CREATE INDEX /*i*/dump_id_external_id_pid ON /*_*/wbqev_external_data 
(dump_id, external_id, pid);

...so mariadb will need to handle 3 levels of IN nesting to avoid scanning 
extra rows. It may work just fine, though it would be nice to know for sure.

I'm having trouble finding documentation about wbqev_external_data. The PK is 
just an opaque rowid field. How many rows might there be for any given 
(dump_id, external_id, pid) combination? What decides that number?


TASK DETAIL
  https://phabricator.wikimedia.org/T103912

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: aaron
Cc: Lydia_Pintscher, aaron, Wikibase-Quality-External-Validation, Aklapper, 
Liuxinyu970226, Andreasburmeister, csteipp, Tamslo, Jonaskeutel, JanZerebecki, 
soeren.oldag, dpatrick, Wikidata-bugs, aude, Krenair



_______________________________________________
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs

Reply via email to