GoranSMilovanovic added a comment.
@Lydia_Pintscher We forgot to mention this task in our recent 1:1. In the meantime, I have tested a 10% sample of daily queries, and the statistics from the smaller, previously used 1% daily sample turn out to be quite representative. However, if per-user-agent tabulation (e.g. counts, average query response times, and similar) is really all that we need here, then we do not need to sample anything at all: we can let PySpark do the tabulation in the Analytics Cluster and track everything back to some point in the past.

TASK DETAIL
  https://phabricator.wikimedia.org/T248308
