dcausse created this task.
dcausse added a project: Wikidata-Query-Service.
Restricted Application added a subscriber: Aklapper.

TASK DESCRIPTION
  Propagating the lag of a wdqs host should only be done if this host is 
''pooled'' (actually serving user traffic).
  Determining the ''pooling'' status turned out to be quite challenging in our 
infra, so in T336352 <https://phabricator.wikimedia.org/T336352> we started 
using a metric based on the query rate, hoping that it would be a reasonable 
proxy for determining whether the server is serving users or not.
  
  This has worked well so far, but a recent incident, in which a server was 
depooled after getting stuck for some reason, showed that this query-rate-based 
metric is too fragile:
  We consider a server to be pooled if its query rate is above 1 qps:
  
`rate(org_wikidata_query_rdf_blazegraph_filters_QueryEventSenderFilter_event_sender_filter_StartedQueries{}[10m]) > 1`
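  As a rough illustration of what this check computes, the sketch below models the heuristic in Python: it derives a per-second rate from samples of a monotonically increasing counter over a 10-minute window and compares it with the 1 qps threshold. It is a simplification under stated assumptions. Unlike Prometheus' `rate()`, it does not extrapolate to the window edges and assumes the counter was not reset inside the window. The function names and sample values are hypothetical.

```python
from typing import List, Tuple

# (timestamp in seconds, cumulative StartedQueries counter value)
Sample = Tuple[float, float]

WINDOW_SECONDS = 600        # mirrors the [10m] range in the query
POOLED_QPS_THRESHOLD = 1.0  # mirrors the "> 1" comparison


def simple_rate(samples: List[Sample], now: float) -> float:
    """Per-second increase of a counter over the last WINDOW_SECONDS.

    Simplified model of PromQL rate(): no extrapolation to the window
    boundaries and no handling of counter resets.
    """
    window = [s for s in samples if now - WINDOW_SECONDS <= s[0] <= now]
    if len(window) < 2:
        return 0.0
    (t0, v0), (t1, v1) = window[0], window[-1]
    return (v1 - v0) / (t1 - t0)


def looks_pooled(samples: List[Sample], now: float) -> bool:
    """The query-rate heuristic: 'pooled' if the rate exceeds 1 qps."""
    return simple_rate(samples, now) > POOLED_QPS_THRESHOLD


# Illustrative only: a depooled host still receiving ~1.2 qps of
# non-user (e.g. monitoring) queries, scraped every 60 seconds.
depooled_host = [(t, 1.2 * t) for t in range(0, 601, 60)]
print(looks_pooled(depooled_host, now=600))  # True: a false positive
```

A fixed threshold cannot tell a low but real user load apart from background traffic that never stops, which is why a depooled host can still look pooled.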
  
  Sadly, this did not hold for wdqs1013 when it was depooled: for some reason 
its query rate stayed above 1 qps (though below 1.3). It is possible that this 
metric is polluted by monitoring queries that are unrelated to serving user traffic. 
We should perhaps refine how we generate 
`org_wikidata_query_rdf_blazegraph_filters_QueryEventSenderFilter_event_sender_filter_StartedQueries`
 and make sure we only measure user queries.
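  One possible refinement is to classify each incoming query before incrementing the counter and to count monitoring probes separately. The sketch below assumes, purely hypothetically, that probes can be recognised by a User-Agent prefix; the prefixes, the `QueryRequest` type and the `UserQueryCounter` class are illustrative names, not part of the existing `QueryEventSenderFilter`. A real implementation would need whatever reliable signal actually distinguishes probes from user queries.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class QueryRequest:
    """Hypothetical minimal view of an incoming SPARQL request."""
    user_agent: str
    query: str


# Hypothetical User-Agent prefixes assumed to identify monitoring probes.
MONITORING_AGENT_PREFIXES: FrozenSet[str] = frozenset(
    {"example-monitoring-probe/", "example-healthcheck/"}
)


def is_monitoring(request: QueryRequest) -> bool:
    """True if the request looks like an internal monitoring probe."""
    return any(request.user_agent.startswith(p) for p in MONITORING_AGENT_PREFIXES)


class UserQueryCounter:
    """Counts started queries, keeping monitoring probes out of the user count."""

    def __init__(self) -> None:
        self.started_user_queries = 0
        self.started_monitoring_queries = 0

    def on_query_started(self, request: QueryRequest) -> None:
        if is_monitoring(request):
            # Tracked separately so probes never inflate the user query rate.
            self.started_monitoring_queries += 1
        else:
            self.started_user_queries += 1


counter = UserQueryCounter()
counter.on_query_started(QueryRequest("example-healthcheck/1.0", "ASK {}"))
counter.on_query_started(QueryRequest("Mozilla/5.0", "SELECT * WHERE { ?s ?p ?o } LIMIT 1"))
print(counter.started_user_queries, counter.started_monitoring_queries)  # 1 1
```

Keeping a separate monitoring counter, rather than dropping those events, preserves visibility into probe traffic while the pooling heuristic only looks at the user count.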
  
  AC:
  
  - wdqs lag propagation should no longer include false positives (i.e. counting 
the lag of a server that is actually depooled)
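  The acceptance criterion can be read as a property of the aggregation step: the propagated lag should be computed only over hosts that are really pooled. The Python sketch below contrasts the current query-rate heuristic with an explicit pooling flag. The `Host` fields, the second host name and all numbers are illustrative; taking the maximum lag is an assumption made for the example; and where an authoritative `pooled` value would come from is exactly the open question of this task.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class Host:
    """Hypothetical snapshot of a wdqs host (illustrative fields)."""
    name: str
    query_rate_qps: float  # rate of StartedQueries over 10m
    lag_seconds: float     # update lag of the host
    pooled: bool           # authoritative pooling state (assumed available)


def propagated_lag(hosts: Iterable[Host],
                   is_pooled: Callable[[Host], bool]) -> Optional[float]:
    """Max lag over the hosts selected by is_pooled, or None if none match."""
    lags = [h.lag_seconds for h in hosts if is_pooled(h)]
    return max(lags) if lags else None


def by_query_rate(h: Host) -> bool:
    # Current heuristic: pooled if the query rate is above 1 qps.
    return h.query_rate_qps > 1.0


def by_pooled_flag(h: Host) -> bool:
    # Desired behaviour: rely on the actual pooling state.
    return h.pooled


# Illustrative numbers only: a stuck, depooled host whose query rate
# stays between 1 and 1.3 qps.
hosts = [
    Host("wdqs-example", query_rate_qps=25.0, lag_seconds=30.0, pooled=True),
    Host("wdqs1013", query_rate_qps=1.2, lag_seconds=7200.0, pooled=False),
]
print(propagated_lag(hosts, by_query_rate))   # 7200.0 (false positive)
print(propagated_lag(hosts, by_pooled_flag))  # 30.0
```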

TASK DETAIL
  https://phabricator.wikimedia.org/T360993


To: dcausse
Cc: Aklapper, dcausse, AWesterinen, Namenlos314, Gq86, Lucas_Werkmeister_WMDE, 
EBjune, KimKelting, merbst, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, 
aude, Tobias1984, Manybubbles