hoo added a comment.
In T297347#7648387 <https://phabricator.wikimedia.org/T297347#7648387>, @Lucas_Werkmeister_WMDE wrote:

> The script looks alright to me – I remember reading something about how `ORDER BY RAND()` isn’t an ideal way to shuffle a collection (especially depending on the sorting algorithm), but it’s probably good enough here.

Sadly, AFAIK, MariaDB has no real sampling options (TABLESAMPLE is not implemented), so this is the only thing that came to my mind.

An easy-to-implement alternative that I can think of, and that would work with this many rows, would be to pick random revision IDs in the range (discarding everything that doesn't fit our criteria) until we have collected 10k valid revisions.
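
A rough sketch of that alternative, in case it helps the discussion. This is not the script under review: `sample_valid_revisions`, `is_valid`, and the ID bounds are hypothetical names, and `is_valid` stands in for whatever lookup would check a revision against our criteria (in practice probably a database query). It draws IDs uniformly from the range, never retries an ID, and stops once the target is reached or the range is exhausted.

```python
import random
from typing import Callable, List, Optional


def sample_valid_revisions(
    min_rev_id: int,
    max_rev_id: int,
    is_valid: Callable[[int], bool],
    target: int = 10_000,
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Pick random revision IDs in [min_rev_id, max_rev_id] until
    `target` valid ones are collected (rejection sampling).

    `is_valid` is a placeholder for the real criteria check.
    """
    rng = rng or random.Random()
    range_size = max_rev_id - min_rev_id + 1
    tried = set()   # IDs already drawn, so none is checked twice
    picked = []     # valid revision IDs, in the order they were drawn

    # Stop early if every ID in the range has been tried.
    while len(picked) < target and len(tried) < range_size:
        rev_id = rng.randint(min_rev_id, max_rev_id)  # both ends inclusive
        if rev_id in tried:
            continue
        tried.add(rev_id)
        if is_valid(rev_id):
            picked.append(rev_id)
    return picked
```

Because every ID in the range is equally likely to be drawn and invalid ones are simply discarded, each valid revision has the same chance of ending up in the sample, regardless of gaps in the ID space. The cost depends on how many of the IDs in the range pass the check: the fewer that do, the more lookups are needed per collected revision.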
