Gehel created this task. Gehel added a project: Wikidata-Query-Service. Restricted Application added a subscriber: Aklapper.
TASK DESCRIPTION
As an operator of WDQS, I want to be alerted to journal growth issues so that I can remediate them before the disk is full and the node is rendered completely inoperable.

The issue is related to free allocators in Blazegraph, and we don't have a good way to prevent it at the moment. The best we can do is to raise an alert and recover the journal from another node. The alert should point to the appropriate runbook so that the remediation is clear.

The alert could be based on the overall size of the journal or of the wdqs data directory exceeding a certain size (but this threshold will need to be updated over time to account for organic growth). Alternatively, it could be based on the rate of increase (but that requires state, possibly via Prometheus).

AC:
- an alert is raised when the Blazegraph journal is growing out of control
- the alert points to the relevant runbook for remediation

TASK DETAIL
https://phabricator.wikimedia.org/T284446

To: Gehel
Cc: Gehel, Aklapper, MPhamWMF, CBogen, Namenlos314, Gq86, Lucas_Werkmeister_WMDE, EBjune, merbst, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles
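The two alerting options described in the task (an absolute size threshold and a rate of increase) can be illustrated with a minimal Python sketch. This is not the WDQS implementation: the names `Sample`, `growth_rate`, and `check_journal`, and any threshold values passed to them, are hypothetical, and the samples are assumed to come from some periodic measurement of the journal file size.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Sample:
    """One measurement of the journal size (hypothetical structure)."""
    timestamp: float  # seconds since some reference point
    size_bytes: int   # journal size at that time


def growth_rate(samples: Sequence[Sample]) -> float:
    """Average growth in bytes per second between the first and last sample."""
    if len(samples) < 2:
        raise ValueError("need at least two samples to compute a rate")
    first, last = samples[0], samples[-1]
    elapsed = last.timestamp - first.timestamp
    if elapsed <= 0:
        raise ValueError("samples must be ordered by increasing timestamp")
    return (last.size_bytes - first.size_bytes) / elapsed


def check_journal(
    samples: Sequence[Sample],
    max_size_bytes: int,
    max_rate_bytes_per_s: float,
) -> Optional[str]:
    """Return "size" or "rate" if an alert should fire, otherwise None."""
    if not samples:
        return None
    # Option 1: absolute threshold. It must be raised over time
    # to account for organic growth.
    if samples[-1].size_bytes >= max_size_bytes:
        return "size"
    # Option 2: rate of increase. It needs a history of samples (state).
    if len(samples) >= 2 and growth_rate(samples) >= max_rate_bytes_per_s:
        return "rate"
    return None
```

The size check needs only the latest sample but has to be retuned as the data set grows, while the rate check tolerates organic growth but depends on stored history, which is the trade-off the task description points out.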