dcausse created this task.
dcausse added a project: Wikidata-Query-Service.

  Propagating the lag of a wdqs host should only be done if this host is 
''pooled'' (actually serving user traffic).
  Determining the ''pooling'' status turned out to be quite challenging in our 
infra, so in T336352 <https://phabricator.wikimedia.org/T336352> we started 
using a metric based on the query rate, hoping that it would be a reasonable 
proxy for determining whether the server is serving users.
  This has worked well so far, but a recent incident in which a server was 
depooled after being stuck for some reason showed that this query-rate metric 
is too fragile:
  We consider a server to be pooled if its query rate is above 1 qps:
 > 1`
  Sadly this did not hold for wdqs1013: after it was depooled, its query rate 
for some reason stayed above 1 (though below 1.3). It is possible that this 
metric is polluted by monitoring queries that are unrelated to serving user 
traffic. 
We should perhaps refine how this metric is generated 
and make sure we only measure user queries.
  - wdqs lag propagation should no longer include false positives (counting the 
lag of a server that is actually depooled)
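
  To illustrate the failure mode described above, here is a minimal Python 
sketch. It is not the actual implementation: the Host fields, the host name 
"wdqs-a", all lag values, the monitoring query rate and the use of max() as 
the aggregation are assumptions made up for illustration. Only the 1 qps 
threshold and the observation that wdqs1013 stayed between 1 and 1.3 qps come 
from this task.

```python
from dataclasses import dataclass

# From the task: a host is considered pooled if its query rate is above 1 qps.
POOLED_QPS_THRESHOLD = 1.0


@dataclass
class Host:
    name: str
    lag_seconds: float     # hypothetical update lag of the host
    total_qps: float       # every query seen by the host, monitoring included
    monitoring_qps: float  # hypothetical share coming from monitoring checks


def is_pooled_by_total_rate(host: Host) -> bool:
    """Current heuristic: any query counts, so monitoring traffic can pollute it."""
    return host.total_qps > POOLED_QPS_THRESHOLD


def is_pooled_by_user_rate(host: Host) -> bool:
    """Possible refinement (assumption): only count queries not issued by monitoring."""
    return (host.total_qps - host.monitoring_qps) > POOLED_QPS_THRESHOLD


def propagated_lag(hosts, is_pooled) -> float:
    """Lag to propagate, taken over hosts considered pooled.

    max() is only an illustrative aggregation; the real one may differ.
    """
    return max((h.lag_seconds for h in hosts if is_pooled(h)), default=0.0)


hosts = [
    # Healthy host serving users (made-up numbers).
    Host("wdqs-a", lag_seconds=5.0, total_qps=40.0, monitoring_qps=1.2),
    # Depooled and stuck, yet still above 1 qps because of monitoring queries.
    Host("wdqs1013", lag_seconds=3600.0, total_qps=1.2, monitoring_qps=1.2),
]

lag_with_false_positive = propagated_lag(hosts, is_pooled_by_total_rate)
lag_user_queries_only = propagated_lag(hosts, is_pooled_by_user_rate)
```

  With these made-up numbers the total-rate heuristic still counts the 
depooled host and propagates its lag, while the user-only variant ignores it.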


