dcausse created this task. dcausse added a project: Wikidata-Query-Service. Restricted Application added a subscriber: Aklapper.
TASK DESCRIPTION Propagating the lag of a wdqs host should only be done if this host is ''pooled'' (actually serving user traffic). Determining the ''pooling'' status appeared to be quite challenging in our infra so in T336352 <https://phabricator.wikimedia.org/T336352> we started using a metric based on the query rate hoping that it would be a reasonably proxy for determining if the server is serving users or not. This worked well so far but a recent incident where a server was depooled after being stuck for some reasons showed that this metric based on query rate is too fragile: We consider a server to be pooled if its query rate is above 1 qps: `rate(org_wikidata_query_rdf_blazegraph_filters_QueryEventSenderFilter_event_sender_filter_StartedQueries{}[10m]) > 1` Sadly this was not true on wdqs1013 when it was depooled, for some reasons its query rate was still above 1 (below 1.3). It is possible that this metric is polluted with monitoring queries that do not relate to serving user traffic. We should perhaps refine how we generate `org_wikidata_query_rdf_blazegraph_filters_QueryEventSenderFilter_event_sender_filter_StartedQueries` and make sure we only measure user queries. AC: - wdqs lag propagation should no longer include false positives (count the lag of a server that is actually depooled) TASK DETAIL https://phabricator.wikimedia.org/T360993 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: Aklapper, dcausse, AWesterinen, Namenlos314, Gq86, Lucas_Werkmeister_WMDE, EBjune, KimKelting, merbst, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles
_______________________________________________ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org