[Wikidata-bugs] [Maniphest] T315455: Investigate what caused our WDQS update lag to drop and remain below SLO on 28/07

2022-08-22 Thread MPhamWMF
MPhamWMF closed this task as "Resolved".
MPhamWMF claimed this task.

TASK DETAIL
  https://phabricator.wikimedia.org/T315455

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: MPhamWMF
Cc: MPhamWMF, Aklapper, Astuthiodit_1, AWesterinen, karapayneWMDE, Invadibot, 
maantietaja, CBogen, ItamarWMDE, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, 
Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, 
Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T315455: Investigate what caused our WDQS update lag to drop and remain below SLO on 28/07

2022-08-22 Thread MPhamWMF
MPhamWMF added a comment.


  Per discussion with David:
  
  > This is due to two events that I think did not have any major user impact:
  >
  > - July 23 at 08:00: wdqs1004 suffered from a deadlock and was
  >   automatically depooled. The machine remained depooled until July 26 at
  >   21:00 and took 1.5 days to catch up. (I can't remember whether it stayed
  >   depooled while it caught up, so some user queries may have returned
  >   out-of-date results during that period.)
  > - August 9 at 15:00: I performed some maintenance operations on the
  >   streaming updater running in k8s@codfw due to T314835. Brian depooled
  >   codfw beforehand, so users were not impacted.
  >
  > The first problem is due to one of the two failure modes we know can
  > affect blazegraph: one is memory pressure, which was mitigated using
  > jvmquake; the other is a deadlock, for which we do not yet have a
  > remediation in place.
  > The second problem is mainly due to the fact that our SLO calculation does
  > not know which servers are pooled (related to
  > https://phabricator.wikimedia.org/T238751) and thus can burn our budget
  > even during planned maintenance operations.
  > Barring any new problems, the update lag SLO should be back to normal in a
  > couple of weeks (due to the 30d time window). I think it's already back to
  > normal if you select 7d for "Period to calculate".
  
  No further action is needed at this time
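
  To illustrate the second problem, here is a minimal sketch of how an SLO
  calculation that ignores pool state can burn budget during planned
  maintenance. This is not the actual WDQS SLO query: the `Sample` type, the
  `slo_compliance` function, the host names, and the lag values are all
  hypothetical, and the 600-second threshold is only assumed from the
  `threshold=600` parameter in the dashboard URL.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One lag measurement for one server (hypothetical data model)."""
    server: str
    lag_seconds: float
    pooled: bool  # True if the server was serving user traffic at the time


def slo_compliance(samples, threshold_seconds=600.0, pooled_only=False):
    """Return the fraction of samples whose lag is within the threshold.

    With pooled_only=True, samples from depooled servers are ignored, so
    planned maintenance on a depooled server does not count against the SLO.
    """
    considered = [s for s in samples if s.pooled or not pooled_only]
    if not considered:
        return 1.0  # assumption: no relevant samples means nothing was violated
    good = sum(1 for s in considered if s.lag_seconds <= threshold_seconds)
    return good / len(considered)


# Hypothetical measurements: host-b is depooled and catching up after maintenance.
samples = [
    Sample("host-a", 30.0, True),
    Sample("host-a", 45.0, True),
    Sample("host-b", 20.0, True),
    Sample("host-b", 7200.0, False),
]

naive = slo_compliance(samples)                         # counts the depooled sample
pool_aware = slo_compliance(samples, pooled_only=True)  # ignores the depooled sample
print(naive, pool_aware)
```

  In this sketch the naive calculation reports a violation that no user could
  have observed, which is the effect described above for the codfw
  maintenance. A pool-aware version would need a reliable record of pool state
  over time, which is what T238751 relates to.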



[Wikidata-bugs] [Maniphest] T315455: Investigate what caused our WDQS update lag to drop and remain below SLO on 28/07

2022-08-17 Thread Maintenance_bot
Maintenance_bot added a project: Wikidata.



[Wikidata-bugs] [Maniphest] T315455: Investigate what caused our WDQS update lag to drop and remain below SLO on 28/07

2022-08-17 Thread MPhamWMF
MPhamWMF created this task.
MPhamWMF added a project: Wikidata-Query-Service.
Restricted Application added a subscriber: Aklapper.

TASK DESCRIPTION
  This ticket is to investigate what caused the WDQS update lag SLO to drop
  at the end of July. We can file another ticket for any resolution actions
  that need to be taken.
  
  The WDQS update lag SLO dropped below our established 95% target around
  28 July:
  https://grafana.wikimedia.org/d/yCBd7Tdnk/wdqs-wcqs-lag-slo?orgId=1=now-90d=now_name=wdqs_threshold=600_period=30d
  
  AC:
  
  - identify reason for drop in WDQS update lag SLO

TASK DETAIL
  https://phabricator.wikimedia.org/T315455

WORKBOARD
  https://phabricator.wikimedia.org/project/board/891/
