daniel added a comment.

@jcrespo @hoo: ick, 14 million rows? And this isn't optimized away because of the DISTINCT?

The intent of the query is: "for the given set of entity IDs, tell me which ones are in the table (at least once)". So the total result will never be larger than the number of IDs in the query.

Instead of looking up batches, we could check each ID individually:

SELECT eu_entity_id FROM wbc_entity_usage WHERE eu_entity_id = @id LIMIT 1;

The crucial bit here is the LIMIT. Doing this for each ID would perform a lot better than the "bad" case I think. But how does performance compare for the typical case?

Also: Would it help to add a limit to the batched query? Would it be planned/executed differently? I mean like this:

SELECT DISTINCT eu_entity_id  FROM `wbc_entity_usage` 
WHERE eu_entity_id IN ('Q148475', 'Q54919', 'Q423048', 'Q2494649', 'Q13219454', 'Q131454', 'Q36578', 'Q1798125', 'Q2597810', 'Q19938912', 'Q193563', 'Q2553334', 'Q477675', 'Q623578', 'Q19675', 'Q13481868', 'Q384602')
LIMIT 17;

With 17 being the number if IDs in the set. Does this help?


TASK DETAIL
https://phabricator.wikimedia.org/T116404

EMAIL PREFERENCES
https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: daniel
Cc: aaron, aude, daniel, hoo, Aklapper, jcrespo, Vali.matei, Minhnv-2809, Volans, D3r1ck01, Izno, Luke081515, Wikidata-bugs, GWicke, Mbch331, Jay8g, Krenair
_______________________________________________
Wikidata-bugs mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs

Reply via email to