Gehel created this task.
Gehel added a project: Wikidata-Query-Service.
Restricted Application added a subscriber: Aklapper.

TASK DESCRIPTION
  As an operator of WDQS I want to be alerted to journal growth issues so that
I can remediate them before the disk is full and the node is rendered
completely inoperable.
  
  The issue is related to free allocators in Blazegraph, and we don't have a
good way to prevent it at the moment. The best we can do is raise an alert
and recover the journal from another node. The alert should point to the
appropriate runbook so that the remediation is clear. The alert could be based
on the overall size of the journal or of the wdqs data directory exceeding a
certain threshold (but this threshold will need to be updated over time to
account for organic growth). Alternatively, it could be based on the rate of
increase (but that requires state, possibly via Prometheus).
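
  To make the two options concrete, here is a minimal sketch in Python of
both checks: an absolute size threshold and a rate of increase computed from
two samples. Everything named here (Sample, growth_rate, check_journal, the
threshold values and the runbook URL) is hypothetical and only illustrates
the logic; a real alert would more likely be expressed as a monitoring rule
than as a script.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sample:
    """One observation of the journal size (hypothetical structure)."""
    timestamp: float  # seconds since the epoch
    size_bytes: int   # journal size at that time


def growth_rate(older: Sample, newer: Sample) -> float:
    """Return the average growth in bytes per second between two samples."""
    elapsed = newer.timestamp - older.timestamp
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    return (newer.size_bytes - older.size_bytes) / elapsed


def check_journal(
    older: Sample,
    newer: Sample,
    max_size_bytes: int,
    max_rate_bytes_per_s: float,
    runbook_url: str,
) -> Optional[str]:
    """Return an alert message, or None if both checks pass.

    The size check needs no state but its threshold must be raised over
    time to follow organic growth. The rate check needs a previous sample
    (state) but is independent of the absolute size.
    """
    if newer.size_bytes > max_size_bytes:
        return (
            f"journal size {newer.size_bytes} B exceeds "
            f"{max_size_bytes} B, see {runbook_url}"
        )
    rate = growth_rate(older, newer)
    if rate > max_rate_bytes_per_s:
        return (
            f"journal growth {rate:.1f} B/s exceeds "
            f"{max_rate_bytes_per_s:.1f} B/s, see {runbook_url}"
        )
    return None
```

  The size check fires even when growth is slow, while the rate check can
fire well before the disk is close to full, so the two are complementary
rather than interchangeable.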
  
  AC:
  
  - an alert is raised when the Blazegraph journal is growing out of control
  - the alert points to the relevant runbook for remediation

TASK DETAIL
  https://phabricator.wikimedia.org/T284446

To: Gehel
Cc: Gehel, Aklapper, MPhamWMF, CBogen, Namenlos314, Gq86, 
Lucas_Werkmeister_WMDE, EBjune, merbst, Jonas, Xmlizer, jkroll, Wikidata-bugs, 
Jdouglas, aude, Tobias1984, Manybubbles