[Wikidata-bugs] [Maniphest] T266022: Programmatically categorize WDQS queries by potential alternative solution

CBogen Tue, 20 Oct 2020 07:16:02 -0700

CBogen created this task.
CBogen added projects: Wikidata-Query-Service, Wikidata.
Restricted Application added a subscriber: Aklapper.


TASK DESCRIPTION
  As a WDQS administrator, I want to be able to categorize WDQS queries into 
buckets by their potential alternative solutions so that we can prioritize the 
next steps for scaling WDQS.
  
  Based on the work done in T264194 
<https://phabricator.wikimedia.org/T264194>, we want to programmatically 
categorize (1000?) WDQS queries into the categories defined in the WDQS user 
flow document 
<https://docs.google.com/drawings/d/1TVghcFdaGSyer_9sisWlRF5-4Ic2eDKqEZOz6aVxOds/edit?ts=5f7f54a1>.
We'd then like to note how many "expensive" queries fall into each 
category. We learned in T264194 <https://phabricator.wikimedia.org/T264194> 
that a significant percentage of users are using the query service for 
questions that alternative solutions could easily answer with just a 
little more work. The hope is that some categories will contain more 
"expensive" queries, giving us a clear indicator that we should prioritize the 
alternative solution described in that category.
  
  Some ways we discussed to differentiate the queries programmatically:
  
  - Identify the number of "hops" used in a query.
    - We noticed that many queries only require one "hop", which means they may 
be more efficiently served by another service, such as a property graph instead 
of a triple store.
    - There are also many queries that ask for a specific property-value pair 
and therefore require no "hops". These are likely better served by the new REST 
API.
    - There are also many queries that are simple identifier lookups, which we 
could have a separate service or dedicated space for.
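
  A minimal sketch of how the hop-based differentiation above might look. 
  Everything here is an assumption made for illustration: a "hop" is 
  approximated as a triple pattern whose subject and object are both 
  variables, the bucket names (identifier_lookup, property_value_pair, 
  one_hop, multi_hop) are hypothetical placeholders rather than the categories 
  from the user flow document, and the caller supplies the set of properties 
  to treat as identifiers. The regex-based extraction only handles simple 
  triple patterns joined by "." and ";" (no property paths, numeric literals, 
  or nested groups); a real SPARQL parser would be more robust.

```python
import re

# One term of a triple pattern: variable, prefixed name, IRI, or quoted literal.
TERM = re.compile(r'\?\w+|[A-Za-z][\w-]*:[\w-]+|<[^>\s]+>|"[^"]*"')


def triple_patterns(query):
    """Extract (subject, predicate, object) tuples from the outermost { ... }."""
    body = query[query.index("{") + 1 : query.rindex("}")]
    triples = []
    # Statements end at a "." followed by whitespace or end of text.
    for statement in re.split(r"\.(?=\s|$)", body):
        subject = None
        # ";" repeats the subject of the first pattern in the statement.
        for part in statement.split(";"):
            terms = TERM.findall(part)
            if subject is None and len(terms) == 3:
                subject, predicate, obj = terms
            elif subject is not None and len(terms) == 2:
                predicate, obj = terms
            else:
                continue  # anything this sketch cannot interpret is skipped
            triples.append((subject, predicate, obj))
    return triples


def categorize(query, identifier_properties=frozenset()):
    """Assign a query to a hypothetical bucket based on its hop count."""
    triples = triple_patterns(query)
    if not triples:
        return "unparsed"
    # Assumption: one hop == one variable-to-variable triple pattern.
    hops = sum(1 for s, _, o in triples if s.startswith("?") and o.startswith("?"))
    if hops == 0 and all(
        p in identifier_properties and o.startswith('"') for _, p, o in triples
    ):
        return "identifier_lookup"
    if hops == 0:
        return "property_value_pair"
    if hops == 1:
        return "one_hop"
    return "multi_hop"
```

  For example, categorize('SELECT ?item WHERE { ?item wdt:P31 wd:Q5 . }') 
  falls into the hypothetical property_value_pair bucket, because its only 
  triple pattern has a fixed object and no variable-to-variable link.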
  
  Acceptance criteria:
  
  [ ] Refine the flow and categories defined in the WDQS user flow document 
<https://docs.google.com/drawings/d/1TVghcFdaGSyer_9sisWlRF5-4Ic2eDKqEZOz6aVxOds/edit?ts=5f7f54a1>.
    [ ] Add more precision to each bucket's definition
    [ ] Ensure we have a shared understanding of what goes in each bucket
  [ ] Programmatically categorize a sample of (1000?) queries into the buckets
  [ ] Programmatically determine which buckets contain the most "expensive" 
queries
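
  A rough sketch of the last criterion, assuming each query has already been 
  assigned a bucket and has a recorded run time. The task does not define 
  "expensive", so the run-time threshold used here (threshold_ms) is a 
  hypothetical stand-in, as are the function name and the input shape of 
  (bucket, duration_ms) pairs.

```python
from collections import Counter


def expensive_share_by_bucket(records, threshold_ms=10_000):
    """Rank buckets by how many of their queries exceed a run-time threshold.

    records: iterable of (bucket, duration_ms) pairs.
    threshold_ms: hypothetical cut-off for calling a query "expensive".
    Returns rows of (bucket, expensive_count, total_count, expensive_share),
    sorted with the most expensive-heavy buckets first.
    """
    totals, expensive = Counter(), Counter()
    for bucket, duration_ms in records:
        totals[bucket] += 1
        if duration_ms >= threshold_ms:
            expensive[bucket] += 1
    rows = [
        (bucket, expensive[bucket], totals[bucket], expensive[bucket] / totals[bucket])
        for bucket in totals
    ]
    # Sort by expensive count, then by share, then by name for stable output.
    return sorted(rows, key=lambda row: (-row[1], -row[3], row[0]))
```

  Reporting both the count and the share keeps a small bucket made up almost 
  entirely of expensive queries from being hidden behind a large bucket with a 
  few slow outliers.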

TASK DETAIL
  https://phabricator.wikimedia.org/T266022

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: CBogen
Cc: Aklapper, Gehel, Addshore, JAllemandou, Lydia_Pintscher, CBogen, Akuckartz, 
darthmon_wmde, Nandana, Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, 
GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, 
Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, 
Manybubbles, Mbch331

_______________________________________________
Wikidata-bugs mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
