hoo added a comment.

  In T297347#7648387 <https://phabricator.wikimedia.org/T297347#7648387>, 
@Lucas_Werkmeister_WMDE wrote:
  
  > The script looks alright to me – I remember reading something about how
  > `ORDER BY RAND()` isn’t an ideal way to shuffle a collection (especially
  > depending on the sorting algorithm), but it’s probably good enough here.
  
  Sadly, AFAIK, MariaDB has no real sampling options (TABLESAMPLE is not 
implemented), so this is the only thing that came to my mind.
  
  An easy-to-implement alternative that I can think of, and that would work 
with this many rows, would be to pick random revision ids in the range 
(discarding everything that doesn't fit our criteria) until we have 
collected 10k valid revisions.
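
  A minimal sketch of that idea in Python, not the actual script: the 
function name `sample_valid_revisions`, its parameters, and the `is_valid` 
callback are hypothetical stand-ins for the id range and for whatever check 
(most likely a database lookup) decides whether a revision fits the criteria.

```python
import random


def sample_valid_revisions(min_id, max_id, is_valid, target=10_000, rng=None):
    """Draw random revision ids in [min_id, max_id] until `target` distinct
    valid ones are collected, or until every id in the range has been tried.

    `is_valid` is a hypothetical callback; in practice it would look the
    revision up and apply the selection criteria.
    """
    rng = rng or random.Random()
    range_size = max_id - min_id + 1
    seen = set()   # every id drawn so far, valid or not
    picked = []    # valid ids, in the order they were drawn

    while len(picked) < target and len(seen) < range_size:
        rev_id = rng.randint(min_id, max_id)  # both bounds inclusive
        if rev_id in seen:
            continue  # already tried this id, draw again
        seen.add(rev_id)
        if is_valid(rev_id):
            picked.append(rev_id)

    return picked
```

  Since every id in the range is equally likely to be drawn and invalid ones 
are simply thrown away, each valid revision has the same chance of ending up 
in the sample. The cost depends on how sparse valid revisions are in the 
range: the fewer there are, the more draws (and lookups) are wasted.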

TASK DETAIL
  https://phabricator.wikimedia.org/T297347

To: hoo
Cc: Ladsgroup, Lucas_Werkmeister_WMDE, Michael, Manuel, Aklapper, 
Lydia_Pintscher, Invadibot, maantietaja, Akuckartz, Nandana, Lahi, Gq86, 
GoranSMilovanovic, QZanden, LawExplorer, _jensen, rosalieper, Scott_WUaS, 
Wikidata-bugs, aude, Mbch331