| daniel added a comment. |
@jcrespo @hoo: ick, 14 million rows? And this isn't optimized away because of the DISTINCT?
The intent of the query is: "for the given set of entity IDs, tell me which ones are in the table (at least once)". So the total result will never be larger than the number of IDs in the query.
Instead of looking up batches, we could check each ID individually:
SELECT eu_entity_id FROM wbc_entity_usage WHERE eu_entity_id = @id LIMIT 1;
The crucial bit here is the LIMIT. Doing this for each ID would perform a lot better than the "bad" case I think. But how does performance compare for the typical case?
Also: Would it help to add a limit to the batched query? Would it be planned/executed differently? I mean like this:
SELECT DISTINCT eu_entity_id FROM `wbc_entity_usage` WHERE eu_entity_id IN ('Q148475', 'Q54919', 'Q423048', 'Q2494649', 'Q13219454', 'Q131454', 'Q36578', 'Q1798125', 'Q2597810', 'Q19938912', 'Q193563', 'Q2553334', 'Q477675', 'Q623578', 'Q19675', 'Q13481868', 'Q384602') LIMIT 17;With 17 being the number if IDs in the set. Does this help?
TASK DETAIL
EMAIL PREFERENCES
To: daniel
Cc: aaron, aude, daniel, hoo, Aklapper, jcrespo, Vali.matei, Minhnv-2809, Volans, D3r1ck01, Izno, Luke081515, Wikidata-bugs, GWicke, Mbch331, Jay8g, Krenair
Cc: aaron, aude, daniel, hoo, Aklapper, jcrespo, Vali.matei, Minhnv-2809, Volans, D3r1ck01, Izno, Luke081515, Wikidata-bugs, GWicke, Mbch331, Jay8g, Krenair
_______________________________________________ Wikidata-bugs mailing list [email protected] https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
