[Wikidata-bugs] [Maniphest] [Changed Subscribers] T248308: Analyse a small sample of the most often used query patterns on WDQS

GoranSMilovanovic Mon, 04 May 2020 18:03:20 -0700

GoranSMilovanovic added a subscriber: Samantha_Alipio_WMDE.
GoranSMilovanovic added a comment.



  @WMDE-leszek @darthmon_wmde @Lydia_Pintscher @Addshore @Gehel @Samantha_Alipio_WMDE
  
  This could be useful for tomorrow's discussion on repeated queries:
  
  F31802788: queries_Clustered_3000.csv <https://phabricator.wikimedia.org/F31802788>
  
  Columns:
  
  - `uniqueSparqlId`: an internal query ID; you can ignore it
  - `sparql`: the SPARQL query
  - `Cluster`: the cluster to which the query belongs.
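A minimal sketch (Python, standard library only) of how the file could be loaded and the queries grouped per cluster. It assumes the file is comma-separated with a header row naming exactly the three columns above; the function name `queries_by_cluster` is hypothetical.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, List


def queries_by_cluster(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Group the `sparql` column by the `Cluster` column.

    Assumes a header row with `uniqueSparqlId`, `sparql` and `Cluster`;
    `uniqueSparqlId` is read but not used. Cluster labels stay strings.
    """
    clusters: Dict[str, List[str]] = defaultdict(list)
    for row in csv.DictReader(lines):
        clusters[row["Cluster"]].append(row["sparql"])
    return dict(clusters)
```

Usage, assuming the file was downloaded locally: open it with `open("queries_Clustered_3000.csv", newline="", encoding="utf-8")`, pass the file object to `queries_by_cluster`, and `sorted(clusters, key=lambda c: len(clusters[c]), reverse=True)` then lists the cluster IDs from largest to smallest.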
  
  What is this:
  
  - The 3,000 most frequently observed SPARQL queries
  - from a sample of approx. 1M queries observed at the WDQS endpoint
  - were selected and then grouped into clusters
  - based on their similarity across the features that describe them;
  - these features were previously selected by XGBoost (see above: the attempt to find a set of features that can predict query processing time).
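The selection step can be sketched roughly as follows. This is only an illustration under assumptions: identical query strings are counted as repeats, and the feature extractor uses a few hypothetical counting features (the features actually selected by XGBoost are not listed in this comment). The clustering algorithm itself is not specified here, so it is not reproduced.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def most_frequent_queries(sample: Iterable[str], n: int = 3000) -> List[Tuple[str, int]]:
    """Count identical query strings in the sample and keep the n most frequent."""
    return Counter(sample).most_common(n)


# Hypothetical structural features, NOT the XGBoost-selected ones.
FEATURE_PATTERNS = {
    "n_optional": r"\bOPTIONAL\b",
    "n_filter": r"\bFILTER\b",
    "n_service": r"\bSERVICE\b",
    "n_variable_mentions": r"\?\w+",
}


def structural_features(sparql: str) -> Dict[str, int]:
    """Count occurrences of each pattern in the query (case-insensitive)."""
    return {
        name: len(re.findall(pattern, sparql, flags=re.IGNORECASE))
        for name, pattern in FEATURE_PATTERNS.items()
    }
```

Each of the top-N queries would then be described by a feature vector like this one, and queries with similar vectors end up in the same cluster.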
  
  How can this be used?
  
  - Similar queries belong to the same cluster (column `Cluster` in the dataset);
  - once again: we are looking at the 3,000 most frequently repeated queries;
  - so we get to observe structurally similar queries that were used often,
  - which gives us ideas on how to approach their optimization
  - (e.g. pre-compute a set of results that matches many structurally similar queries and serve it from an API, Elastic, or something similar).
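One possible way to find structurally similar queries that could share pre-computed results, sketched under a loudly stated assumption: two queries are treated as structurally similar if they differ only in the Wikidata item IDs they mention (e.g. `wd:Q5` vs. `wd:Q146`) and in whitespace. The helper names are hypothetical, and this is not the clustering used to produce the file above.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

# Assumption: only item IDs (wd:Q...) are generalized; properties are kept.
ITEM_ID = re.compile(r"\bwd:Q\d+\b")
WHITESPACE = re.compile(r"\s+")


def query_template(sparql: str) -> str:
    """Replace item IDs with a placeholder and collapse whitespace."""
    template = ITEM_ID.sub("wd:?ITEM", sparql)
    return WHITESPACE.sub(" ", template).strip()


def group_by_template(queries: Iterable[str]) -> Dict[str, List[str]]:
    """Map each template to the concrete queries that instantiate it."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for q in queries:
        groups[query_template(q)].append(q)
    return dict(groups)
```

Property IDs such as `wdt:P31` are deliberately left alone, since changing the property usually changes what the query asks; how far to generalize is a design choice. Each template with many instances would be a candidate for results pre-computed once and served from a cache.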

TASK DETAIL
  https://phabricator.wikimedia.org/T248308

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: GoranSMilovanovic
Cc: Samantha_Alipio_WMDE, MGerlach, JAllemandou, Lucas_Werkmeister_WMDE, 
Simon_Villeneuve, dcausse, Jakob_WMDE, Gehel, Addshore, Lydia_Pintscher, 
WMDE-leszek, Aklapper, darthmon_wmde, CBogen, Nandana, Lahi, Gq86, 
GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, 
Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, 
Manybubbles, Mbch331

_______________________________________________
Wikidata-bugs mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
