[Wikidata-bugs] [Maniphest] T313751: Create WDQS uptime SLO

2023-08-04 Thread Gehel
Gehel closed subtask T324811: Create WDQS Lag SLO dashboard with Grizzly 
 documentation as Resolved.

TASK DETAIL
  https://phabricator.wikimedia.org/T313751

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: RKemper, Gehel
Cc: bking, RLazarus, Gehel, MPhamWMF, Aklapper, Danny_Benjafield_WMDE, 
Astuthiodit_1, AWesterinen, BTullis, karapayneWMDE, Invadibot, maantietaja, 
ItamarWMDE, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, 
Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, 
Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T313751: Create WDQS uptime SLO

2023-05-12 Thread Gehel
Gehel closed this task as "Resolved".



[Wikidata-bugs] [Maniphest] T313751: Create WDQS uptime SLO

2023-05-09 Thread RKemper
RKemper moved this task from In Progress to Needs Reporting on the 
Discovery-Search (Current work) board.
RKemper added a comment.


  With https://gerrit.wikimedia.org/r/c/operations/grafana-grizzly/+/917938, we 
now have the grizzly dashboard where we want it. That was the last blocker for 
closing out this ticket, so this should be all done.

WORKBOARD
  https://phabricator.wikimedia.org/project/board/1227/



[Wikidata-bugs] [Maniphest] T313751: Create WDQS uptime SLO

2023-05-08 Thread Gehel
Gehel reopened subtask T324811: Create WDQS Lag SLO dashboard with Grizzly 
 documentation as Open.



[Wikidata-bugs] [Maniphest] T313751: Create WDQS uptime SLO

2023-05-02 Thread Gehel
Gehel added a project: Data-Platform-SRE.



[Wikidata-bugs] [Maniphest] T313751: Create WDQS uptime SLO

2023-03-16 Thread Gehel
Gehel merged a task: T305951: Create SLI for Blazegraph uptime.
Gehel added a subscriber: bking.



[Wikidata-bugs] [Maniphest] T313751: Create WDQS uptime SLO

2023-03-10 Thread Gehel
Gehel closed subtask T323064: Create WDQS Uptime SLO dashboard in Grizzly as 
Resolved.



[Wikidata-bugs] [Maniphest] T313751: Create WDQS uptime SLO

2023-03-10 Thread Gehel
Gehel closed subtask T324811: Create WDQS Lag SLO dashboard with Grizzly 
 documentation as Resolved.



[Wikidata-bugs] [Maniphest] T313751: Create WDQS uptime SLO

2023-03-10 Thread Gehel
Gehel closed subtask T325324: Evaluate options to soften wdqs paging as 
Resolved.



[Wikidata-bugs] [Maniphest] T313751: Create WDQS uptime SLO

2023-02-28 Thread RKemper
RKemper added a subtask: T325324: Evaluate options to soften wdqs paging.



[Wikidata-bugs] [Maniphest] T313751: Create WDQS uptime SLO

2023-02-28 Thread RKemper
RKemper updated the task description.



[Wikidata-bugs] [Maniphest] T313751: Create WDQS uptime SLO

2022-12-09 Thread Gehel
Gehel closed subtask T323066: Understand meaning of trafficserver wdqs request 
data vs turnilo webrequest data as Resolved.



[Wikidata-bugs] [Maniphest] T313751: Create WDQS uptime SLO

2022-11-23 Thread RKemper
RKemper added a comment.


  In T313751#8388946, @Gehel wrote:
  
  > A few comments on the current dashboard:
  >
  > - a very quick look at Turnilo: the graphs look different enough that I'd 
like to know why there are discrepancies
  > - as discussed, we should define the service as "working" not only when 
returning HTTP/200, but also when requests are throttled (429) or banned (403)
  > - we probably need to dig a bit more into other response codes and the dips 
we see in the graph to understand what they are and if they are problematic 
(and thus refine our definition of a "working" service)
  
  Just following up here: the dashboard was updated to accept any of `200`, 
`403`, or `429` as successful as far as our SLI is concerned. Working on 
updating our SLO documentation accordingly.
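
  As a rough illustration of that definition, the sketch below computes an availability ratio from per-status-code request counts, treating `200`, `403`, and `429` as good. The function name `availability_sli` and the sample counts are made up for the example; the real dashboard query is not shown in this thread.

```python
# Status codes counted as "working" per the comment above.
GOOD_STATUS_CODES = frozenset({200, 403, 429})


def availability_sli(status_counts):
    """Return the fraction of requests with a "good" status, or None if there were no requests."""
    total = sum(status_counts.values())
    if total == 0:
        return None
    good = sum(
        count for code, count in status_counts.items() if code in GOOD_STATUS_CODES
    )
    return good / total


# Hypothetical counts for one time window (not real WDQS data).
sample_counts = {200: 9500, 429: 300, 403: 100, 500: 60, 504: 40}
sli = availability_sli(sample_counts)
print(f"SLI for sample window: {sli:.4f}")
```

  With these invented counts, 9900 of 10000 requests count as good, so the ratio is 0.99.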



[Wikidata-bugs] [Maniphest] T313751: Create WDQS uptime SLO

2022-11-11 Thread Gehel
Gehel added a comment.


  A few comments on the current dashboard:
  
  - a very quick look at Turnilo: the graphs look different enough that I'd 
like to know why there are discrepancies
  - as discussed, we should define the service as "working" not only when 
returning HTTP/200, but also when requests are throttled (429) or banned (403)
  - we probably need to dig a bit more into other response codes and the dips 
we see in the graph to understand what they are and if they are problematic 
(and thus refine our definition of a "working" service)



[Wikidata-bugs] [Maniphest] T313751: Create WDQS uptime SLO

2022-10-13 Thread RKemper
RKemper added a comment.


  With respect to recording nginx request responses:
  
  **Getting direct logs**: One idea is to add `/var/log/nginx/access.log` to 
`RollingFileAppender` in `modules/query_service/templates/logback.xml.erb` 
(https://github.com/wikimedia/puppet/blob/6e3c52f30166f88c5021c11ebd5f6aa48854/modules/query_service/templates/logback.xml.erb#L29)
  
Example log line:
```
(REDACTED IP; EXAMPLE FORMAT xx.xx.x.xx) - - [13/Oct/2022:16:28:14 +] 
"GET /sparql?format=json=REDACTED_QUERY_STRING HTTP/1.1" 200 97 "-" 
"REDACTED_USER_AGENT"
```

Pros: We're ingesting the full log line, not just Prometheus metrics. This 
would make it easier to correlate, say, a spike in 5xx responses with the log 
lines corresponding to the actual requests.

Cons: We don't directly get time-series metrics for this. We'd probably 
want to separately ingest corresponding time-series metrics so we can actually 
see this in Grafana. Kibana can visualize by parsing log lines, 
but this is computationally expensive, so we probably want to directly export 
nginx request metrics to Prometheus.
  
  **Getting metrics**: I'm a bit hazy on the best way to do this. There's 
hopefully a pretty straightforward way. I'll see if o11y has any thoughts on 
this.
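
  To make the log-parsing option concrete, here is a minimal sketch that pulls the status code out of access-log lines shaped like the example above and tallies them. The regular expression, the names `LOG_RE` and `count_statuses`, and the sample lines (IP addresses, timestamps, user agent) are all assumptions for illustration; this is not the exporter setup discussed here.

```python
import re
from collections import Counter

# Assumed pattern for the access-log layout shown in the example line above.
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]*)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
)


def count_statuses(lines):
    """Tally HTTP status codes; lines that do not match the pattern are skipped."""
    counts = Counter()
    for line in lines:
        match = LOG_RE.match(line)
        if match:
            counts[int(match.group("status"))] += 1
    return counts


# Made-up sample lines (documentation-range IPs, placeholder query and agent).
sample_lines = [
    '192.0.2.10 - - [13/Oct/2022:16:28:14 +0000] "GET /sparql?query=EXAMPLE HTTP/1.1" 200 97 "-" "example-agent"',
    '192.0.2.11 - - [13/Oct/2022:16:28:15 +0000] "GET /sparql?query=EXAMPLE HTTP/1.1" 429 0 "-" "example-agent"',
    '192.0.2.12 - - [13/Oct/2022:16:28:16 +0000] "GET /sparql?query=EXAMPLE HTTP/1.1" 504 0 "-" "example-agent"',
    'not an access log line',
]
status_counts = count_statuses(sample_lines)
print(dict(status_counts))
```

  Parsing every line like this is the expensive path mentioned under "Cons", which is part of why exporting counters directly is attractive.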



[Wikidata-bugs] [Maniphest] T313751: Create WDQS uptime SLO

2022-10-13 Thread RKemper
RKemper added a comment.


  The current approach we're trying to work towards is recording the nginx 
response codes for requests. That will give us insight into the number of 
failures we're seeing.
  
  At a high level, these are the various response codes we expect for different 
scenarios:
  
  **User throttled** => **429**  ("Too Many Requests - Please retry in %s 
seconds.")
  
  **User banned** => **403** ("You have been banned until %s, please respect 
throttling and retry-after headers.")
  
  **Successful request** => **2xx**
  
  **Failed request** => **5xx**
  
  - One common failure mode is a specific WDQS host's Blazegraph instance being 
deadlocked. In this case, nginx will never hear back from Blazegraph and will 
issue some sort of 5xx code (not yet sure which exact code).
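
  The mapping above can be written down as a small classifier. This is only a sketch: the function name `classify_response` and the label strings are invented here, and codes outside the four scenarios listed are lumped into "other".

```python
def classify_response(status: int) -> str:
    """Map an HTTP status code to one of the scenarios listed above."""
    if status == 429:
        return "throttled"  # user throttled
    if status == 403:
        return "banned"  # user banned
    if 200 <= status <= 299:
        return "success"  # successful request
    if 500 <= status <= 599:
        return "failure"  # failed request, e.g. a deadlocked Blazegraph
    return "other"  # not covered by the scenarios above


for code in (200, 403, 429, 502, 404):
    print(code, classify_response(code))
```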



[Wikidata-bugs] [Maniphest] T313751: Create WDQS uptime SLO

2022-10-11 Thread Maintenance_bot
Maintenance_bot removed a project: Patch-For-Review.



[Wikidata-bugs] [Maniphest] T313751: Create WDQS uptime SLO

2022-10-11 Thread gerritbot
gerritbot added a comment.


  Change 841518 **merged** by Ryan Kemper:
  
  [operations/puppet@production] Revert "wdqs-test: try installing nginx w 
extras"
  
  https://gerrit.wikimedia.org/r/841518



[Wikidata-bugs] [Maniphest] T313751: Create WDQS uptime SLO

2022-10-11 Thread gerritbot
gerritbot added a comment.


  Change 841518 had a related patch set uploaded (by Ryan Kemper; author: Ryan 
Kemper):
  
  [operations/puppet@production] Revert "wdqs-test: try installing nginx w 
extras"
  
  https://gerrit.wikimedia.org/r/841518



[Wikidata-bugs] [Maniphest] T313751: Create WDQS uptime SLO

2022-10-11 Thread gerritbot
gerritbot added a comment.


  Change 841582 **merged** by Ryan Kemper:
  
  [operations/puppet@production] wdqs-test: try installing nginx w extras
  
  https://gerrit.wikimedia.org/r/841582



[Wikidata-bugs] [Maniphest] T313751: Create WDQS uptime SLO

2022-10-11 Thread gerritbot
gerritbot added a project: Patch-For-Review.



[Wikidata-bugs] [Maniphest] T313751: Create WDQS uptime SLO

2022-10-11 Thread gerritbot
gerritbot added a comment.


  Change 841582 had a related patch set uploaded (by Ryan Kemper; author: Ryan 
Kemper):
  
  [operations/puppet@production] [wip] query_service: try installing nginx w 
extras
  
  https://gerrit.wikimedia.org/r/841582



[Wikidata-bugs] [Maniphest] T313751: Create WDQS uptime SLO

2022-10-04 Thread RKemper
RKemper added a comment.


  With respect to the SLO itself, our goal is an SLO that captures the promise 
we make about service availability: namely, that WDQS is available on a 
**best-effort** basis. In practice, this means that if an issue arises outside 
"business hours", it's acceptable to wait until "business hours" to resolve it. 
For example, in the most extreme case, if the service were to have an outage on 
a Friday night, we wouldn't be paging anyone to work the night or the weekend, 
but come Monday we'd be focusing our efforts on restoring availability as soon 
as possible. This specific scenario - a multi-day full outage - would of course 
be quite rare (on the order of a few times a year at most, but generally much 
less often).
  
  Thus our uptime % goal should reflect the above reality. I think a good 
starting point would be 95% uptime. This means that the service could be down 
for 18.25 days out of a year (5% of 365 days). With that number we could have 
basically one full weekend outage a quarter and be within our threshold.
  
  Note that any % chosen is to some extent arbitrary. For example, if WDQS were 
down during business hours and we weren't doing anything to try to fix it, but 
were still above 95% uptime, we'd be within our SLO but not actually meeting 
our best-effort claim. Conversely, if we were experiencing frequent weekend 
outages but were always getting things operational by the time the normal 
workweek commenced, we could fall below our SLO's threshold while still 
actually meeting our own expectations for the service. But this 95% number 
seems like a reasonable initial target to convey our intent with this service. 
To be clear, in practice, at least based on current performance, I'd expect 
our uptime to be well over that 95% minimum bar, but the point is that the 95% 
threshold lets us be explicit about what kind of error budget we're allowing 
for.
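
  The error-budget arithmetic above can be checked with a few lines. This is an illustrative calculation only: the helper `error_budget_days` is a made-up name, and the 2.5-day length of a "full weekend" outage (Friday night to Monday morning) is an assumption, not a figure from this task.

```python
def error_budget_days(slo_target: float, period_days: float = 365.0) -> float:
    """Return the downtime (in days) allowed by an uptime target over a period."""
    if not 0.0 <= slo_target <= 1.0:
        raise ValueError("slo_target must be a fraction between 0 and 1")
    return (1.0 - slo_target) * period_days


yearly_budget = error_budget_days(0.95)  # 5% of 365 days
quarterly_budget = error_budget_days(0.95, 365.0 / 4)

# Assumption: a full weekend outage lasts roughly Friday night to Monday morning.
WEEKEND_OUTAGE_DAYS = 2.5

print(f"yearly budget:    {yearly_budget:.2f} days")
print(f"quarterly budget: {quarterly_budget:.2f} days")
print(f"one weekend outage per quarter fits: {WEEKEND_OUTAGE_DAYS <= quarterly_budget}")
```

  Under these assumptions the quarterly budget is about 4.56 days, so one weekend-long outage per quarter stays inside it.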



[Wikidata-bugs] [Maniphest] T313751: Create WDQS uptime SLO

2022-10-04 Thread RKemper
RKemper added a comment.


  Gehel and I met with bblack today.
  
  Some highlights:
  
  - Best to use real user traffic if possible, rather than artificial traffic. 
However, this might be difficult for our use case (given that we consider some 
subset of queries invalid/failing).
  
  - If going the artificial-traffic route, it makes more sense to run local 
queries on each host (accounting for pool/depool status, of course) rather than 
running queries at the DC level. This way we have a better separation of 
concerns.



[Wikidata-bugs] [Maniphest] T313751: Create WDQS uptime SLO

2022-09-13 Thread RKemper
RKemper added a comment.


  Intro (some context for traffic team)
  -
  
  Search team is working on creating an SLI to measure uptime of WDQS. We want 
our SLI to map as closely to the actual user experience as possible, so to that 
end we're trying to come up with a way to hit WDQS endpoints externally or 
semi-externally. Ideally the solution would be agnostic to the pool/depool 
state of the underlying hosts (translation: if a host is depooled, the request 
won't ultimately route to it).
  
  The primary idea we have is to send automated requests to each datacenter, 
for example by querying `wdqs.svc.{codfw,eqiad}.wmnet`. However, I/we have some 
knowledge gaps that make it hard to get a clear idea of what exactly that 
solution looks like.
  
  Do you all have any thoughts on the best way to do this?
  
  To that end it'd help to sanity check a few assumptions I'm working off of:
  
  Assumptions
  ---
  
  **(A1)** By hitting `wdqs.svc.{codfw,eqiad}.wmnet`, we're bypassing any 
geoDNS logic since that happens earlier up in the stack.
  
  **(A2)** Requests that hit `wdqs.svc.{codfw,eqiad}.wmnet` round robin to an 
underlying pooled host in the fleet 
(https://config-master.wikimedia.org/pybal/eqiad/wdqs for example)
  
  Beyond the sanity check of those assumptions, I have a few further questions:
  
  Questions
  -
  
  **(Q1)** Is there precedent for this pattern already, ie is there perhaps an 
existing service that uses a similar approach?
  
  **(Q2)** If there isn't already precedent for this, what are your initial 
thoughts on the best way to do this? For example could we just literally send a 
request to `wdqs.svc.{codfw,eqiad}.wmnet` every X minutes from an alerting 
host, or is there a better way?
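
  For discussion, the "just send a request every X minutes" idea in (Q2) could look roughly like the sketch below. Everything beyond the two service hostnames is an assumption: the `https` scheme, the `/sparql` path, the trivial `ASK {}` probe query, and the helper names. The demonstration at the bottom uses a stub instead of real network calls; scheduling is left to whatever runs the probe.

```python
import urllib.error
import urllib.parse
import urllib.request

DATACENTERS = ("codfw", "eqiad")
PROBE_QUERY = "ASK {}"  # assumed trivial query, chosen only for illustration


def probe_url(dc):
    """Build the probe URL for one datacenter (scheme and path are assumptions)."""
    params = urllib.parse.urlencode({"query": PROBE_QUERY, "format": "json"})
    return f"https://wdqs.svc.{dc}.wmnet/sparql?{params}"


def http_status(url, timeout=10.0):
    """Return the HTTP status code for url, or None if no response was received."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # 4xx/5xx responses still carry a status code
    except OSError:
        return None  # connection errors and timeouts


def probe_all(fetch=http_status):
    """Probe every datacenter once and return {datacenter: status or None}."""
    return {dc: fetch(probe_url(dc)) for dc in DATACENTERS}


# Offline demonstration with a stub fetcher and invented results.
fake_statuses = {"codfw": 200, "eqiad": 503}


def stub_fetch(url):
    for dc, status in fake_statuses.items():
        if f".{dc}." in url:
            return status
    return None


results = probe_all(stub_fetch)
print(results)
```

  Passing the fetcher in as a parameter keeps the probe logic testable without touching the network.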



[Wikidata-bugs] [Maniphest] T313751: Create WDQS uptime SLO

2022-08-26 Thread RKemper
RKemper added a comment.


  Some quick pros/cons of two possible approaches to getting the SLI metrics: 
approach #1 is to run a query or set of queries per-dc at a certain frequency, 
approach #2 is just to run a query on each host at a certain frequency
  
  * Approach #1: Hit `wdqs.svc.{codfw,eqiad}.wmnet`
  -
  
  Pros
  
  
  - Routing through pybal so we automatically ignore depooled hosts
  - Covers a broader class of failures than just simply running queries on each 
host
  - Maps a bit better to the actual user experience (ie if 10% of hosts are 
down)
  
  Cons
  
  
  - Adds some complexity in terms of understanding how routing works (ex: do we 
have to worry about geoDNS [ie that we might end up unintentionally always 
routing to the same host] or is that [geoDNS] "higher up" in the stack and 
therefore not relevant?)
  
  * Approach #2: Just run a simple query on each host in the fleet
  
  
  Pros
  
  
  - Easy to reason about
  - Constantly testing each host individually, so we have host-level granularity
  
  Cons
  
  
  - For generating the SLI itself, for each host we need to filter out the time 
ranges in which it is depooled
  - Not as pure a gauge of user experience as hitting 
`wdqs.svc.{codfw,eqiad}.wmnet`
  
  ---
  
  Personally I lean a bit towards #1 because it intuitively seems to measure 
the user experience better, but I do have significant gaps in my understanding 
of our network / request routing stack, so there are perhaps more unknowns.
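
  The extra filtering step that approach #2 needs can be sketched as follows, assuming probe results arrive as `(timestamp, ok)` pairs and depool windows as half-open `(start, end)` intervals. The data shapes, the function name `pooled_availability`, and the sample values are all hypothetical.

```python
def pooled_availability(probes, depooled_windows):
    """Return the success ratio of probes taken while the host was pooled.

    probes: iterable of (timestamp, ok) pairs.
    depooled_windows: iterable of (start, end) pairs; start inclusive, end exclusive.
    Returns None when no probe falls inside a pooled period.
    """
    windows = list(depooled_windows)

    def is_depooled(timestamp):
        return any(start <= timestamp < end for start, end in windows)

    counted = [ok for timestamp, ok in probes if not is_depooled(timestamp)]
    if not counted:
        return None
    return sum(counted) / len(counted)


# Invented data: the host fails at t=2, 3 and 5, but is depooled for t in [2, 4).
host_probes = [(0, True), (1, True), (2, False), (3, False), (4, True), (5, False)]
host_depooled = [(2, 4)]
ratio = pooled_availability(host_probes, host_depooled)
print(f"availability while pooled: {ratio:.2f}")
```

  Only the failure at t=5 counts against the host here, giving 3 good probes out of 4.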

TASK DETAIL
  https://phabricator.wikimedia.org/T313751

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: RKemper
Cc: MPhamWMF, Aklapper, Astuthiodit_1, AWesterinen, karapayneWMDE, Invadibot, 
maantietaja, CBogen, ItamarWMDE, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, 
Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, 
Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] T313751: Create WDQS uptime SLO

2022-08-22 Thread RKemper
RKemper added a comment.


  Aisha has written some Jupyter notebooks to pull together a random selection 
of queries, grouped by time-to-completion and query structure (basically, which 
operators are used).
  
  On the ops side of things we'll need to decide whether we want to just run a 
single simple query on every host or run a whole set of queries. Right now, 
with Aisha's data, we have the option of going either way.

TASK DETAIL
  https://phabricator.wikimedia.org/T313751

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: RKemper
Cc: MPhamWMF, Aklapper, Astuthiodit_1, AWesterinen, karapayneWMDE, Invadibot, 
maantietaja, CBogen, ItamarWMDE, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, 
Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, 
Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] T313751: Create WDQS uptime SLO

2022-08-01 Thread MPhamWMF
MPhamWMF set the point value for this task to "5".

TASK DETAIL
  https://phabricator.wikimedia.org/T313751

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: RKemper, MPhamWMF
Cc: MPhamWMF, Aklapper, Astuthiodit_1, AWesterinen, karapayneWMDE, Invadibot, 
maantietaja, CBogen, ItamarWMDE, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, 
Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, 
Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] T313751: Create WDQS uptime SLO

2022-08-01 Thread MPhamWMF
MPhamWMF moved this task from Incoming to Current work on the 
Wikidata-Query-Service board.
MPhamWMF added a project: Discovery-Search (Current work).

TASK DETAIL
  https://phabricator.wikimedia.org/T313751

WORKBOARD
  https://phabricator.wikimedia.org/project/board/891/

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: RKemper, MPhamWMF
Cc: MPhamWMF, Aklapper, Astuthiodit_1, AWesterinen, karapayneWMDE, Invadibot, 
maantietaja, CBogen, ItamarWMDE, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, 
Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, 
Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] T313751: Create WDQS uptime SLO

2022-07-26 Thread RKemper
RKemper updated the task description.

TASK DETAIL
  https://phabricator.wikimedia.org/T313751

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: RKemper
Cc: MPhamWMF, Aklapper, Astuthiodit_1, AWesterinen, karapayneWMDE, Invadibot, 
maantietaja, CBogen, ItamarWMDE, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, 
Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, 
Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] T313751: Create WDQS uptime SLO

2022-07-25 Thread Maintenance_bot
Maintenance_bot added a project: Wikidata.

TASK DETAIL
  https://phabricator.wikimedia.org/T313751

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: RKemper, Maintenance_bot
Cc: MPhamWMF, Aklapper, Astuthiodit_1, AWesterinen, karapayneWMDE, Invadibot, 
maantietaja, CBogen, ItamarWMDE, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, 
Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, 
Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331


[Wikidata-bugs] [Maniphest] T313751: Create WDQS uptime SLO

2022-07-25 Thread MPhamWMF
MPhamWMF created this task.
MPhamWMF added a project: Wikidata-Query-Service.
Restricted Application added a subscriber: Aklapper.

TASK DESCRIPTION
  As a user and a maintainer of WDQS, I want an expectation of service 
availability so that I know when issues can/should be resolved.
  
  The WDQS uptime SLO will be based on periodically running a set of non-cached, 
representative test queries on the WDQS cluster and comparing the time the 
queries take to run against a baseline expectation. If the test time is over a 
TBD threshold, WDQS will be considered down and in need of maintenance. This 
should approximate actual service availability for users. The tests will be run 
against the entire cluster rather than per host.
  
  Example: test queries run hourly should take no more than 200% of the baseline 
time to run (<200ms if the baseline is 100ms). The goal is to meet this 
threshold 95% of the time.
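
  The arithmetic in the example can be sketched as follows. This is only an 
illustration under stated assumptions: the 2.0 ratio and the 0.95 target are 
the example values above (the real threshold is still TBD), the function names 
are hypothetical, the sample latencies are made up, and treating a failed or 
timed-out run (`None`) as "down" is an assumption not stated in the task.

```python
from typing import Optional, Sequence


def is_up(latency_ms: Optional[float], baseline_ms: float,
          max_ratio: float = 2.0) -> bool:
    """A run counts as 'up' if it finished in under max_ratio x baseline."""
    if latency_ms is None:  # assumption: failed or timed-out run is 'down'
        return False
    return latency_ms < baseline_ms * max_ratio


def uptime_fraction(latencies_ms: Sequence[Optional[float]], baseline_ms: float,
                    max_ratio: float = 2.0) -> float:
    """Fraction of test runs that stayed under the latency threshold."""
    if not latencies_ms:
        raise ValueError("no test runs to evaluate")
    up_runs = sum(is_up(lat, baseline_ms, max_ratio) for lat in latencies_ms)
    return up_runs / len(latencies_ms)


def meets_slo(latencies_ms: Sequence[Optional[float]], baseline_ms: float,
              target: float = 0.95, max_ratio: float = 2.0) -> bool:
    """True if the uptime fraction reaches the SLO target."""
    return uptime_fraction(latencies_ms, baseline_ms, max_ratio) >= target


if __name__ == "__main__":
    # Made-up sample: 20 hourly runs against a 100 ms baseline, one too slow.
    hourly_ms = [90, 110, 150, 250, 95, 100, 180, 120, 105, 130,
                 99, 101, 140, 160, 170, 115, 125, 135, 145, 155]
    print(uptime_fraction(hourly_ms, baseline_ms=100))
    print(meets_slo(hourly_ms, baseline_ms=100))
```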
  
  Sub-tasks:
  
  [ ] Create set of test queries (starting from Andrea's work)
  [ ] Establish baseline for test queries
  [ ] Establish SLO for test queries
  
  AC:
  
  - SLO for WDQS uptime is established
  - SLO for WDQS uptime is viewable on WDQS dashboard

TASK DETAIL
  https://phabricator.wikimedia.org/T313751

WORKBOARD
  https://phabricator.wikimedia.org/project/board/891/

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: RKemper, MPhamWMF
Cc: MPhamWMF, Aklapper, AWesterinen, CBogen, Namenlos314, Gq86, 
Lucas_Werkmeister_WMDE, EBjune, merbst, Jonas, Xmlizer, jkroll, Wikidata-bugs, 
Jdouglas, aude, Tobias1984, Manybubbles