| jcrespo added a comment. |
@daniel I think this is a case of premature optimization. It is true that things like:
  foreach ... { 'SELECT' }

are usually considered bad practice, but in trying to minimize round-trip time, we are actually running a far worse query. Your query cannot know that a single result per IN value would be enough; it is implemented using a GROUP BY.
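To make the contrast concrete, here is a small sketch of the two query shapes being discussed. It uses Python's sqlite3 with a toy table standing in for wbc_entity_usage (the table contents are made up, and this is not Wikibase code):

```python
import sqlite3

# Toy stand-in for wbc_entity_usage: many rows can share one eu_entity_id.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE wbc_entity_usage (eu_entity_id TEXT, eu_page_id INTEGER)")
rows = [("Q148475", p) for p in range(1000)] + [("Q54919", p) for p in range(5)]
conn.executemany("INSERT INTO wbc_entity_usage VALUES (?, ?)", rows)
conn.execute("CREATE INDEX eu_entity ON wbc_entity_usage (eu_entity_id)")

ids = ["Q148475", "Q54919", "Q999999"]

# Shape 1: the batched query. To return each matching id once, it must
# deduplicate (GROUP BY), touching every index entry for "hot" ids.
batched = [r[0] for r in conn.execute(
    "SELECT eu_entity_id FROM wbc_entity_usage "
    "WHERE eu_entity_id IN (?, ?, ?) GROUP BY eu_entity_id", ids)]

# Shape 2: one point SELECT per id, with LIMIT 1. Each lookup can stop
# at the first index hit, no matter how many rows share that id.
point = [i for i in ids if conn.execute(
    "SELECT 1 FROM wbc_entity_usage WHERE eu_entity_id = ? LIMIT 1",
    (i,)).fetchone()]

assert sorted(batched) == point == ["Q148475", "Q54919"]
```

Both shapes answer the same question ("which of these ids are used at all?"), but only the point-lookup shape can stop after the first match per id.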
Look at this quick script I wrote:
$ time while read id; do mysql -h db1068 commonswiki -e "SELECT eu_entity_id FROM wbc_entity_usage WHERE eu_entity_id = '$id' LIMIT 1"; done < eu_entity_id
eu_entity_id
Q148475
eu_entity_id
Q54919
eu_entity_id
Q423048
eu_entity_id
Q2494649
eu_entity_id
Q13219454
eu_entity_id
Q131454
eu_entity_id
Q36578
eu_entity_id
Q1798125
eu_entity_id
Q2597810
eu_entity_id
Q19938912
eu_entity_id
Q193563
eu_entity_id
Q2553334
eu_entity_id
Q477675
eu_entity_id
Q623578
eu_entity_id
Q19675
eu_entity_id
Q13481868
eu_entity_id
Q384602

real    0m0.119s
user    0m0.028s
sys     0m0.048s

Despite creating a *new connection* each time, it only takes 0.1 seconds (and I am connecting and disconnecting 17 times!). It will take even less if we are not reconnecting every time, as a normal script would do.
Let's go with the simpler approach (point SELECT queries), and we can later try to optimize the round-trip time if needed. InnoDB is very, very fast at index lookups thanks to its internal (adaptive) hash indexes.
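The recommended pattern, sketched again with Python's sqlite3 (the helper name `used_entities` and the in-memory table are illustrative assumptions, not Wikibase code): open one connection, then issue one point SELECT ... LIMIT 1 per entity id over it.

```python
import sqlite3

def used_entities(conn, entity_ids):
    """Return the subset of entity_ids that have at least one usage row.

    One point SELECT per id over a single, reused connection: each query
    is a single index lookup that stops at the first match (LIMIT 1).
    """
    found = []
    for eid in entity_ids:
        row = conn.execute(
            "SELECT 1 FROM wbc_entity_usage WHERE eu_entity_id = ? LIMIT 1",
            (eid,)).fetchone()
        if row is not None:
            found.append(eid)
    return found

# Tiny demo with made-up data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE wbc_entity_usage (eu_entity_id TEXT)")
conn.executemany("INSERT INTO wbc_entity_usage VALUES (?)",
                 [("Q148475",), ("Q148475",), ("Q54919",)])
conn.execute("CREATE INDEX eu_entity ON wbc_entity_usage (eu_entity_id)")
print(used_entities(conn, ["Q148475", "Q423048", "Q54919"]))
# prints ['Q148475', 'Q54919']
```

The loop does N round trips, but each round trip is a single index probe, which is exactly the trade-off the comment argues is acceptable.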
Of course, that was not a contrived example; this happens many times: we assume only a few records exist per value, when in reality usage quickly explodes to several million rows.
Cc: aaron, aude, daniel, hoo, Aklapper, jcrespo, Vali.matei, Minhnv-2809, Volans, D3r1ck01, Izno, Luke081515, Wikidata-bugs, GWicke, Mbch331, Jay8g, Krenair
_______________________________________________ Wikidata-bugs mailing list [email protected] https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
