GoranSMilovanovic added a subscriber: Samantha_Alipio_WMDE.
GoranSMilovanovic added a comment.

@WMDE-leszek @darthmon_wmde @Lydia_Pintscher @Addshore @Gehel @Samantha_Alipio_WMDE This could be useful for tomorrow's discussion on repeated queries:

F31802788: queries_Clustered_3000.csv <https://phabricator.wikimedia.org/F31802788>

Columns:

- `uniqueSparqlId`: can be ignored
- `sparql`: a SPARQL query
- `Cluster`: the cluster to which the respective query belongs.

What is this:

- The 3,000 most frequently observed SPARQL queries
- from a sample of approx. 1M queries observed at the WDQS endpoint
- were selected and then grouped into clusters
- by relying on their similarity across the features that describe them;
- the features were previously selected by XGBoost (see above: all the struggle to find a set of features that can predict the query processing time).

How can this be used?

- Similar queries belong to the same cluster (column: `Cluster` in the dataset)
- once again: we are looking at the 3,000 most frequently repeated queries;
- so we get to observe structurally similar queries that were often used -->
- --> ideas on how to approach their optimization
- (e.g. pre-compute a set of results that matches many structurally similar queries --> serve from an API, Elastic, something...)

TASK DETAIL
  https://phabricator.wikimedia.org/T248308

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: GoranSMilovanovic
Cc: Samantha_Alipio_WMDE, MGerlach, JAllemandou, Lucas_Werkmeister_WMDE, Simon_Villeneuve, dcausse, Jakob_WMDE, Gehel, Addshore, Lydia_Pintscher, WMDE-leszek, Aklapper, darthmon_wmde, CBogen, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331
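
A rough sketch of how the attached file might be explored along the lines of the comment above: group the queries by the `Cluster` column and look at the largest clusters first. It assumes the CSV is comma-separated with a header row containing the `sparql` and `Cluster` columns exactly as named in the comment. The function names (`load_queries`, `group_by_cluster`, `largest_clusters`) are hypothetical helpers for illustration, not part of any existing tooling.

```python
import csv
from collections import defaultdict


def load_queries(csv_file):
    """Read all rows from an open CSV file object into a list of dicts.

    Assumption: the file has a header row with `sparql` and `Cluster` columns.
    """
    return list(csv.DictReader(csv_file))


def group_by_cluster(rows):
    """Map each cluster label (kept as a string) to the list of its queries."""
    clusters = defaultdict(list)
    for row in rows:
        clusters[row["Cluster"]].append(row["sparql"])
    return dict(clusters)


def largest_clusters(clusters, top_n=5):
    """Return the top_n (label, queries) pairs, largest cluster first."""
    ranked = sorted(clusters.items(), key=lambda kv: len(kv[1]), reverse=True)
    return ranked[:top_n]


# Possible usage (assumes the file was downloaded to the working directory
# and is UTF-8 encoded):
#
# with open("queries_Clustered_3000.csv", newline="", encoding="utf-8") as f:
#     clusters = group_by_cluster(load_queries(f))
# for label, queries in largest_clusters(clusters):
#     print(label, len(queries))
#     print(queries[0])  # one example query from this cluster
```

Reading a few example queries from each of the largest clusters is one way to spot the structurally similar, frequently repeated queries that might be candidates for pre-computed results.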
