[Wikidata-bugs] [Maniphest] T332314: Configure new WDQS servers in codfw (wdqs20[13-22])

2023-07-07 Thread bking
bking added a comment. Current state: 2019 and 2020 are production-ready. The others need a data transfer and/or scap deploy to be complete. The command below checks the deployment directory size. If the directory size is smaller than 471M, that means `git-fat` isn't working and the host
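The size check described in this comment can be sketched in Python. This is a minimal illustration, not the command from the task: it assumes `du -h`-style output lines (a human-readable size followed by a path), the helper names `parse_du_size` and `needs_deploy` are hypothetical, and the sample line is made up to resemble the output quoted elsewhere in this thread.

```python
# Hypothetical helpers; only the 471M threshold comes from the comment above.
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}
THRESHOLD = "471M"


def parse_du_size(text):
    """Convert a `du -h` style size such as '132M' or '1.2G' to bytes."""
    text = text.strip()
    suffix = text[-1].upper()
    if suffix in UNITS:
        return float(text[:-1]) * UNITS[suffix]
    # Assumption: a bare number with no suffix is treated as bytes.
    return float(text)


def needs_deploy(du_line, threshold=THRESHOLD):
    """Return True when the directory is smaller than the threshold."""
    size_field = du_line.split()[0]
    return parse_du_size(size_field) < parse_du_size(threshold)


# Sample line in the shape of `du -h` output (size, whitespace, path).
sample = "132M\t/srv/deployment/wdqs/wdqs"
print(needs_deploy(sample))
```

A host whose line makes `needs_deploy` return True would be a candidate for another deploy, on the comment's reasoning that an undersized directory means `git-fat` did not fetch its files.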

[Wikidata-bugs] [Maniphest] T332314: Configure new WDQS servers in codfw (wdqs20[13-22])

2023-07-10 Thread bking
bking added a comment. Update: I forgot to target 2013 in my last command. Here is the latest list of hosts that need a data transfer and a deploy: (4) wdqs[2013-2016].codfw.wmnet - OUTPUT of 'du -hcxs /srv/de...6756ebe194261756' - 132M /srv/deployment/wdqs/wdqs

[Wikidata-bugs] [Maniphest] T332314: Configure new WDQS servers in codfw (wdqs20[13-22])

2023-07-17 Thread bking
bking added a comment. Update: `wdqs2016.codfw.wmnet` is the last host that needs to be configured for production. `wdqs2020.codfw.wmnet` has been receiving production traffic for a week now, with no observed issues. We should be able to finish the rest pretty soon and start

[Wikidata-bugs] [Maniphest] T339347: qlever dblp endpoint for wikidata federated query nomination

2023-07-10 Thread bking
bking set the point value for this task to "1". TASK DETAIL https://phabricator.wikimedia.org/T339347 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: bking Cc: bking, Aklapper, WolfgangFahl, Astuthiodit_1, AWesterinen, BTullis, kar

[Wikidata-bugs] [Maniphest] T332314: Configure new WDQS servers in codfw (wdqs20[13-22])

2023-07-07 Thread bking
bking added a comment. Update: wdqs[2017-2021].codfw.wmnet are now production ready: = NODE GROUP = (4) wdqs[2014-2016,2022].codfw.wmnet - OUTPUT of 'du -hcxs /srv/de...6756ebe194261756' - 132M /srv/deployment/wdqs/wdqs-cache/revs

[Wikidata-bugs] [Maniphest] T336574: Review alerting around Wikidata Query Service update pipeline

2023-05-31 Thread bking
bking added a comment. Per today's SRE meeting, the larger SRE org is working on a comprehensive alert review <https://etherpad.wikimedia.org/p/alert-review-may-2023> . We should work with the SREs to help out and use their methods to review our own alerts. TASK DETAIL

[Wikidata-bugs] [Maniphest] T321605: Make WCQS/WDQS data transfer cookbook more reliable

2023-05-31 Thread bking
bking added a comment. I've been working on this a bit more lately. The Transfer.py documentation <https://doc.wikimedia.org/transferpy/master/transferpy/transferpy.html#module-transferpy.Firewall> mentions "remote_execution" but does not mention that it's a required argumen

[Wikidata-bugs] [Maniphest] T336577: Update WDQS Runbook following update lag incident

2023-05-22 Thread bking
bking claimed this task. TASK DETAIL https://phabricator.wikimedia.org/T336577 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: bking Cc: bking, dcausse, Gehel, Aklapper, Astuthiodit_1, AWesterinen, karapayneWMDE, Invadibot, Zabe, MPhamWMF

[Wikidata-bugs] [Maniphest] T336577: Update WDQS Runbook following update lag incident

2023-05-22 Thread bking
bking added subscribers: dcausse, bking. bking added a comment. Updated the Streaming Updater operations docs <https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater#The_consumers_are_backlogged> after today's pairing session with @dcausse . We'll continue to

[Wikidata-bugs] [Maniphest] T336577: Update WDQS Runbook following update lag incident

2023-05-25 Thread bking
bking added a comment. Other action items: - Add link to new WDQS superset dashboard to WDQS runbook page. - Fix dead logstash link on WDQS runbook page <https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook#Timeouts> - Better documentation of throttling be

[Wikidata-bugs] [Maniphest] T336709: Allow federated queries with the BNCF SPARQL endpoint

2023-05-30 Thread bking
bking added a comment. @Epidosis Thanks for your patience. We've added https://digitale.bncf.firenze.sbn.it/openrdf-workbench/repositories/NS/query as an allowed SPARQL endpoint for WDQS. Please test this out and respond here with your results, good or bad. TASK DETAIL https

[Wikidata-bugs] [Maniphest] T335994: Allow federated queries to the UNESCO SPARQL endpoint

2023-05-30 Thread bking
bking added a comment. @Nikki Thanks for your patience. We've added https://vocabularies.unesco.org/sparql as an allowed SPARQL endpoint for WDQS. Please test this out and respond here with your results, good or bad. TASK DETAIL https://phabricator.wikimedia.org/T335994 EMAIL

[Wikidata-bugs] [Maniphest] T337230: Include LiLa Linking Latin SPARQL endpoint in whitelist for federated queries

2023-05-30 Thread bking
bking added a comment. @DL2204 Thanks for your patience. We've added https://lila-erc.eu/sparql/lila_knowledge_base/sparql as an allowed SPARQL endpoint for WDQS. Please test this out and respond here with your results, good or bad. TASK DETAIL https://phabricator.wikimedia.org

[Wikidata-bugs] [Maniphest] T336577: Update WDQS Runbook following update lag incident

2023-05-30 Thread bking
bking added projects: Sustainability, SRE-OnFire. TASK DETAIL https://phabricator.wikimedia.org/T336577 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: bking Cc: bking, dcausse, Gehel, Aklapper, Astuthiodit_1, AWesterinen, karapayneWMDE, Invadibot

[Wikidata-bugs] [Maniphest] T336574: Review alerting around Wikidata Query Service update pipeline

2023-05-30 Thread bking
bking edited projects, added Sustainability, SRE-OnFire; removed Sustainability (Incident Followup). TASK DETAIL https://phabricator.wikimedia.org/T336574 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: bking Cc: bking, Aklapper, Gehel

[Wikidata-bugs] [Maniphest] T336574: Review alerting around Wikidata Query Service update pipeline

2023-05-22 Thread bking
bking added a comment. Revised totals for alerts in the last year after looking at Logstash: `RdfStreamingUpdaterHighConsumerUpdateLag` 373 `RdfStreamingUpdaterFlinkProcessingLatencyIsHigh` 63 `RdfStreamingUpdaterFlinkJobUnstable` 125 The majority of all three alert types fired

[Wikidata-bugs] [Maniphest] T339347: qlever dblp endpoint for wikidata federated query nomination

2023-08-08 Thread bking
bking added a comment. @WolfgangFahl We've whitelisted the endpoints, but the query you linked above <https://w.wiki/6q2i> still does not work. Can you verify that it is working as expected? My teammate mentioned "it's returning application/sparql-results+xml but we on

[Wikidata-bugs] [Maniphest] T336134: wdqs2*** lagged for more than one day

2023-08-11 Thread bking
bking added a subtask: T337801: WDQS: Document procedure for switching between Kubernetes and Yarn Streaming Updater. TASK DETAIL https://phabricator.wikimedia.org/T336134 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: bking Cc: karapayneWMDE

[Wikidata-bugs] [Maniphest] T332314: Configure new WDQS servers in codfw (wdqs20[13-22])

2023-08-07 Thread bking
bking closed subtask T330714: Document SRE steps for deploying a new WDQS (and WCQS) host as Resolved. TASK DETAIL https://phabricator.wikimedia.org/T332314 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: bking Cc: RKemper, Gehel, Aklapper

[Wikidata-bugs] [Maniphest] T332314: Configure new WDQS servers in codfw (wdqs20[13-22])

2023-06-21 Thread bking
bking claimed this task. TASK DETAIL https://phabricator.wikimedia.org/T332314 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: bking Cc: Gehel, Aklapper, Astuthiodit_1, AWesterinen, BTullis, karapayneWMDE, Invadibot, MPhamWMF, maantietaja

[Wikidata-bugs] [Maniphest] T321605: Make WCQS/WDQS data transfer cookbook more reliable

2023-06-22 Thread bking
bking reopened this task as "In Progress". TASK DETAIL https://phabricator.wikimedia.org/T321605 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: bking Cc: Vgutierrez, RKemper, Volans, Aklapper, bking, Astuthiodit_1, AWesterinen, kar

[Wikidata-bugs] [Maniphest] T336134: wdqs2*** lagged for more than one day

2023-05-08 Thread bking
bking claimed this task. TASK DETAIL https://phabricator.wikimedia.org/T336134 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: bking Cc: bking, Aklapper, Bugreporter, Astuthiodit_1, AWesterinen, karapayneWMDE, Invadibot, MPhamWMF, maantietaja

[Wikidata-bugs] [Maniphest] T270614: Automatically depool wdqs servers that are "lagged"

2023-05-15 Thread bking
bking added a comment. Per today's triage meeting, we have a liveness probe in our nginx config; we need to replace it with something smarter. TASK DETAIL https://phabricator.wikimedia.org/T270614 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences

[Wikidata-bugs] [Maniphest] T325602: Decide whether or not to keep wdqs-heavy-queries and wdqs-ssl PyBal pools

2023-05-15 Thread bking
bking removed a project: Discovery-Search (Current work). TASK DETAIL https://phabricator.wikimedia.org/T325602 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: bking Cc: Gehel, Aklapper, RKemper, bking, Astuthiodit_1, AWesterinen, karapayneWMDE

[Wikidata-bugs] [Maniphest] T274270: WDQS servers taking up to 30 minutes to reboot

2023-05-15 Thread bking
bking edited projects, added Discovery-Search; removed Discovery-Search (Current work). TASK DETAIL https://phabricator.wikimedia.org/T274270 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: bking Cc: bking, RKemper, Gehel, Aklapper, Astuthiodit_1

[Wikidata-bugs] [Maniphest] T193473: Add HTTPS support to wdqs-internal service

2023-05-15 Thread bking
bking removed a project: Discovery-Search (Current work). TASK DETAIL https://phabricator.wikimedia.org/T193473 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: bking Cc: bking, Aklapper, Smalyshev, Gehel, Astuthiodit_1, AWesterinen, karapayneWMDE

[Wikidata-bugs] [Maniphest] T336134: wdqs2*** lagged for more than one day

2023-05-11 Thread bking
bking updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T336134 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: bking Cc: bking, Aklapper, Bugreporter, Astuthiodit_1, AWesterinen, karapayneWMDE, Invadibot, MPhamWMF

[Wikidata-bugs] [Maniphest] T336574: Review alerting around Wikidata Query Service update pipeline

2023-05-18 Thread bking
bking added a comment. Quick notes here before I forget. Checking my "alerts" email folder for the past year (not the most reliable source), I have: - 89 alerts with title RdfStreamingUpdaterHighConsumerUpdateLag - 72 alerts with title RdfStreamingUpdaterFlinkProcessingLat

[Wikidata-bugs] [Maniphest] T336134: wdqs2*** lagged for more than one day

2023-08-14 Thread bking
bking closed subtask T337801: WDQS: Document procedure for switching between Kubernetes and Yarn Streaming Updater as Resolved. TASK DETAIL https://phabricator.wikimedia.org/T336134 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: bking Cc

[Wikidata-bugs] [Maniphest] T350784: Identify/complete post-migration tasks after rdf-streaming-updater migrates to flink operator

2023-12-06 Thread bking
bking updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T350784 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: bking Cc: dcausse, JMeybohm, Aklapper, bking, Danny_Benjafield_WMDE, Astuthiodit_1, AWesterinen, BTullis

[Wikidata-bugs] [Maniphest] T352921: Review alerts for miscweb-hosted domains commons-query.wikimedia.org and query.wikidata.org

2023-12-18 Thread bking
bking added a comment. No objections from the Search Platform side. Side note, our new team (Data Platform SRE) is reviewing its alerts and alerting strategy, <https://phabricator.wikimedia.org/T345698> so our config may change in the near future. TASK DETAIL

[Wikidata-bugs] [Maniphest] T352921: Review alerts for miscweb-hosted domains commons-query.wikimedia.org and query.wikidata.org

2023-12-18 Thread bking
bking closed this task as "Resolved". TASK DETAIL https://phabricator.wikimedia.org/T352921 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: bking Cc: Aklapper, Gehel, BTullis, bking, Dzahn, Danny_Benjafield_WMDE, Astuthiodit_1, kar

[Wikidata-bugs] [Maniphest] T347284: Restore service for https://query.wikidata.org/bigdata/ldf

2023-12-20 Thread bking
bking closed subtask T347355: Create alerts for https://query.wikidata.org/bigdata/ldf as Resolved. TASK DETAIL https://phabricator.wikimedia.org/T347284 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: bking Cc: RKemper, bking, MisterSynergy

[Wikidata-bugs] [Maniphest] T350784: Identify/complete post-migration tasks after rdf-streaming-updater migrates to flink operator

2024-01-08 Thread bking
bking edited projects, added Data-Platform-SRE; removed Data-Platform-SRE (2023/24 Q3 Milestone 1). TASK DETAIL https://phabricator.wikimedia.org/T350784 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: bking Cc: dcausse, JMeybohm, Aklapper, bking

[Wikidata-bugs] [Maniphest] T354555: WDQS graph split hosts: Remove throttling/banning mechanisms and improve hadoop access

2024-01-08 Thread bking
bking created this task. bking added projects: Wikidata, Data-Platform-SRE. Restricted Application added a subscriber: Aklapper. TASK DESCRIPTION Per IRC conversation with @dcausse , we have a couple of issues with the graph split hosts: - Querying the test hosts is resulting in bans

[Wikidata-bugs] [Maniphest] T354555: WDQS graph split hosts: Remove throttling/banning mechanisms and investigate external connectivity

2024-01-11 Thread bking
bking moved this task from In Progress to Done on the Data-Platform-SRE (2023/24 Q3 Milestone 1) board. bking closed this task as "Resolved". bking claimed this task. bking added a comment. Merging/applying the above patch added TLS to the test hosts, using the domain "wdqs-t

[Wikidata-bugs] [Maniphest] T350464: Expose SPARQL endpoints with full wikidata data set and with split graph to enable experimentation on federation with a split graph

2024-01-16 Thread bking
bking closed subtask T352878: Troubleshoot recurring systemd unit failures and availability issues for wdqs1022-24 as Resolved. TASK DETAIL https://phabricator.wikimedia.org/T350464 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: bking Cc: Gehel

[Wikidata-bugs] [Maniphest] T354555: WDQS graph split hosts: Remove throttling/banning mechanisms and improve hadoop access

2024-01-09 Thread bking
bking updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T354555 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: bking Cc: Aklapper, RKemper, bking, dcausse, Danny_Benjafield_WMDE, Astuthiodit_1, BTullis, karapayneWMDE

[Wikidata-bugs] [Maniphest] T354555: WDQS graph split hosts: Remove throttling/banning mechanisms and improve hadoop access

2024-01-09 Thread bking
bking updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T354555 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: bking Cc: Aklapper, RKemper, bking, dcausse, Danny_Benjafield_WMDE, Astuthiodit_1, BTullis, karapayneWMDE

[Wikidata-bugs] [Maniphest] T354555: WDQS graph split hosts: Remove throttling/banning mechanisms and investigate external connectivity

2024-01-09 Thread bking
bking renamed this task from "WDQS graph split hosts: Remove throttling/banning mechanisms and improve hadoop access" to "WDQS graph split hosts: Remove throttling/banning mechanisms and investigate external connectivity". TASK DETAIL https://phabricator.wikimedi

[Wikidata-bugs] [Maniphest] T354555: WDQS graph split hosts: Remove throttling/banning mechanisms and improve hadoop access

2024-01-09 Thread bking
bking updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T354555 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: bking Cc: Aklapper, RKemper, bking, dcausse, Danny_Benjafield_WMDE, Astuthiodit_1, BTullis, karapayneWMDE

[Wikidata-bugs] [Maniphest] T354555: WDQS graph split hosts: Remove throttling/banning mechanisms and investigate external connectivity

2024-01-09 Thread bking
bking changed the task status from "Open" to "In Progress". bking moved this task from Backlog to Needs Review on the Data-Platform-SRE (2023/24 Q3 Milestone 1) board. bking added a subscriber: BTullis. bking added a comment. Per pairing session with @BTullis ,

[Wikidata-bugs] [Maniphest] T336577: Update WDQS Runbook following update lag incident

2024-01-09 Thread bking
bking closed this task as "Resolved". bking moved this task from Backlog to Done on the Data-Platform-SRE (2023/24 Q3 Milestone 1) board. bking claimed this task. TASK DETAIL https://phabricator.wikimedia.org/T336577 WORKBOARD https://phabricator.wikimedia.org/project/board/68

[Wikidata-bugs] [Maniphest] T336134: wdqs2*** lagged for more than one day

2024-01-09 Thread bking
bking closed subtask T336577: Update WDQS Runbook following update lag incident as Resolved. TASK DETAIL https://phabricator.wikimedia.org/T336134 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: bking Cc: karapayneWMDE, bking, Aklapper, Bugreporter

[Wikidata-bugs] [Maniphest] T336577: Update WDQS Runbook following update lag incident

2024-01-09 Thread bking
bking added a comment. Based on a quick read of the linked documentation and a small addition, I believe we have satisfied the requirements. Closing... TASK DETAIL https://phabricator.wikimedia.org/T336577 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel

[Wikidata-bugs] [Maniphest] T350784: Identify/complete post-migration tasks after rdf-streaming-updater migrates to flink operator

2024-01-05 Thread bking
bking updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T350784 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: bking Cc: dcausse, JMeybohm, Aklapper, bking, Danny_Benjafield_WMDE, Astuthiodit_1, AWesterinen, BTullis

[Wikidata-bugs] [Maniphest] T350784: Identify/complete post-migration tasks after rdf-streaming-updater migrates to flink operator

2024-01-05 Thread bking
bking claimed this task. TASK DETAIL https://phabricator.wikimedia.org/T350784 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: bking Cc: dcausse, JMeybohm, Aklapper, bking, Danny_Benjafield_WMDE, Astuthiodit_1, AWesterinen, BTullis, karapayneWMDE

[Wikidata-bugs] [Maniphest] T350784: Identify/complete post-migration tasks after rdf-streaming-updater migrates to flink operator

2024-01-05 Thread bking
bking edited projects, added Data-Platform-SRE (2023/24 Q3 Milestone 1); removed Data-Platform-SRE. TASK DETAIL https://phabricator.wikimedia.org/T350784 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: bking Cc: dcausse, JMeybohm, Aklapper, bking

[Wikidata-bugs] [Maniphest] T347355: Create alerts for https://query.wikidata.org/bigdata/ldf

2023-11-28 Thread bking
bking added a comment. I've created another 24-hour silence for this alert, UUID 59b5ca30-1aeb-4d06-b083-7023a373ccb3. TASK DETAIL https://phabricator.wikimedia.org/T347355 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: bking Cc: Gehel, Dzahn

[Wikidata-bugs] [Maniphest] T347504: WDQS graph split: load data from dumps into new hosts

2023-11-27 Thread bking
bking added a comment. Looks like the data reload for lexemes completed. @dcausse , are you able to check the data from the reload and make sure it's usable? Let me know if I can help. TASK DETAIL https://phabricator.wikimedia.org/T347504 EMAIL PREFERENCES https

[Wikidata-bugs] [Maniphest] T347355: Create alerts for https://query.wikidata.org/bigdata/ldf

2023-12-05 Thread bking
bking added a comment. Reverted the last change after we saw some alerts for the following hosts: `1008 1009 1010 1011 2008 2014`. I suspect this has something to do with the tiers, since the ldf_host var was set in public.yaml...will check and respond here. TASK DETAIL https

[Wikidata-bugs] [Maniphest] T326409: Migrate the wdqs streaming updater flink jobs to flink-k8s-operator deployment model

2023-11-29 Thread bking
bking updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T326409 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: bking Cc: Gehel, BTullis, JMeybohm, gmodena, Ottomata, bking, Aklapper, dcausse, Danny_Benjafield_WMDE

[Wikidata-bugs] [Maniphest] T326409: Migrate the wdqs streaming updater flink jobs to flink-k8s-operator deployment model

2023-11-29 Thread bking
bking updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T326409 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: bking Cc: Gehel, BTullis, JMeybohm, gmodena, Ottomata, bking, Aklapper, dcausse, Danny_Benjafield_WMDE

[Wikidata-bugs] [Maniphest] T350784: Identify/complete post-migration tasks after rdf-streaming-updater migrates to flink operator

2023-11-29 Thread bking
bking updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T350784 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: bking Cc: dcausse, JMeybohm, Aklapper, bking, Danny_Benjafield_WMDE, Astuthiodit_1, AWesterinen, BTullis

[Wikidata-bugs] [Maniphest] T350784: Identify/complete post-migration tasks after rdf-streaming-updater migrates to flink operator

2023-11-29 Thread bking
bking updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T350784 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: bking Cc: dcausse, JMeybohm, Aklapper, bking, Danny_Benjafield_WMDE, Astuthiodit_1, AWesterinen, BTullis

[Wikidata-bugs] [Maniphest] T350784: Identify/complete post-migration tasks after rdf-streaming-updater migrates to flink operator

2023-11-29 Thread bking
bking updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T350784 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: bking Cc: dcausse, JMeybohm, Aklapper, bking, Danny_Benjafield_WMDE, Astuthiodit_1, AWesterinen, BTullis

[Wikidata-bugs] [Maniphest] T350784: Identify/complete post-migration tasks after rdf-streaming-updater migrates to flink operator

2023-11-29 Thread bking
bking added a comment. [ ] remove unused secrets from kubernetes.yaml on private puppet TASK DETAIL https://phabricator.wikimedia.org/T350784 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: bking Cc: dcausse, JMeybohm, Aklapper, bking

[Wikidata-bugs] [Maniphest] T347355: Create alerts for https://query.wikidata.org/bigdata/ldf

2023-11-29 Thread bking
bking added a comment. We've silenced the alert for another 24 hours. The network probes Grafana dashboard <https://grafana-rw.wikimedia.org/d/O0nHhdhnz/network-probes-overview?forceLogin=true=1=probes%2Fcustom=All=3> is still showing 0% availability for our ldf probe. TASK DETAIL

[Wikidata-bugs] [Maniphest] T326409: Migrate the wdqs streaming updater flink jobs to flink-k8s-operator deployment model

2023-11-29 Thread bking
bking updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T326409 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: bking Cc: Gehel, BTullis, JMeybohm, gmodena, Ottomata, bking, Aklapper, dcausse, Danny_Benjafield_WMDE

[Wikidata-bugs] [Maniphest] T326409: Migrate the wdqs streaming updater flink jobs to flink-k8s-operator deployment model

2023-11-29 Thread bking
bking moved this task from In Progress to Done on the Data-Platform-SRE board. bking closed this task as "Resolved". TASK DETAIL https://phabricator.wikimedia.org/T326409 WORKBOARD https://phabricator.wikimedia.org/project/board/6524/ EMAIL PREFERENCES https://phabricator.wik

[Wikidata-bugs] [Maniphest] T326409: Migrate the wdqs streaming updater flink jobs to flink-k8s-operator deployment model

2023-11-29 Thread bking
bking added a comment. I'm happy to say the flink operator migration is complete. Commons and wikidata are stable in both CODFW and EQIAD. As such, I'm resolving this ticket. Post-migration cleanup work continues in T350784 <https://phabricator.wikimedia.org/T350784> . TASK DETAIL

[Wikidata-bugs] [Maniphest] T350106: Implement a spark job that converts a RDF triples table into a RDF file format

2023-12-07 Thread bking
bking added a comment. I started a transfer of the gzip files mentioned above to `wdqs1023` from `wdqs1024` (wdqs hosts have 10Gbps Ethernet vs. 1Gbps for the stat machines, so this should be faster). You can set a temporary iptables rule to allow traffic between hosts

[Wikidata-bugs] [Maniphest] T347355: Create alerts for https://query.wikidata.org/bigdata/ldf

2023-11-27 Thread bking
bking added a comment. Looks like the check targets are rendered at `/srv/prometheus/ops/targets/probes-custom_puppet-http.yaml` on the prom hosts after merging the above patch, the target config for LDF endpoint looks like this <https://phabricator.wikimedia.org/P53914#218

[Wikidata-bugs] [Maniphest] T347355: Create alerts for https://query.wikidata.org/bigdata/ldf

2023-11-27 Thread bking
bking added a comment. The probe is getting a 500 error, which is spawning phab tickets for serviceops-collab team (see T352084 <https://phabricator.wikimedia.org/T352084>). As such, I've set a 24-hour suppression in alertmanager (UUID fc02d897-8a64-4ebb-a362-77a765a7f155). Will r

[Wikidata-bugs] [Maniphest] T352921: Review alerts for miscweb-hosted domains commons-query.wikimedia.org and query.wikidata.org

2023-12-06 Thread bking
bking updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T352921 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: bking Cc: Aklapper, Gehel, BTullis, bking, Dzahn, me, Danny_Benjafield_WMDE, Astuthiodit_1, BeautifulBold

[Wikidata-bugs] [Maniphest] T352921: Review alerts for miscweb-hosted domains commons-query.wikimedia.org and query.wikidata.org

2023-12-06 Thread bking
bking created this task. bking added projects: Data-Platform-SRE, Epic, Wikidata. TASK DESCRIPTION Per IRC conversation with @Dzahn , we noticed that there are a couple of Data Platform SRE-owned sites (commons-query.wikimedia.org and query.wikidata.org) hosted at least in part from the miscweb

[Wikidata-bugs] [Maniphest] T347504: WDQS graph split: load data from dumps into new hosts

2023-11-30 Thread bking
bking moved this task from In Progress to Blocked / Waiting on the Data-Platform-SRE board. bking added a comment. Moving to "blocked/waiting" until we have confirmation on the reload data. TASK DETAIL https://phabricator.wikimedia.org/T347504 WORKBOARD https://phabricator.wik

[Wikidata-bugs] [Maniphest] T347355: Create alerts for https://query.wikidata.org/bigdata/ldf

2023-11-30 Thread bking
bking added a comment. After some thought, I think the problem is the blackbox check's association with miscweb. We are actually cutting around miscweb when we access the ldf endpoint, so we should put the blackbox check outside of `modules/profile/manifests/microsites/query_service.pp

[Wikidata-bugs] [Maniphest] T349095: Migrate staging rdf-streaming-updater to flink operator

2023-11-30 Thread bking
bking closed this task as "Resolved". bking moved this task from In Progress to Done on the Data-Platform-SRE board. bking added a comment. Apologies for the confusion. We have already migrated the rdf-streaming-updater to production, so I'm closing this ticket (which is focused

[Wikidata-bugs] [Maniphest] T326409: Migrate the wdqs streaming updater flink jobs to flink-k8s-operator deployment model

2023-11-30 Thread bking
bking closed subtask T349095: Migrate staging rdf-streaming-updater to flink operator as Resolved. TASK DETAIL https://phabricator.wikimedia.org/T326409 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: bking Cc: Gehel, BTullis, JMeybohm, gmodena

[Wikidata-bugs] [Maniphest] T347504: WDQS graph split: load data from dumps into new hosts

2023-11-20 Thread bking
bking added a comment. Update: The wikidata dump finished on wdqs1022 (Wikidata dump loaded in 25 days, 13:32:17.263762). But all 3 hosts are stuck at the moment; I'm not sure what happened, but each of their individual tasks is stuck at 0%. Maybe the dumps server locked up? TASK
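The load time quoted in this comment appears to be in the form Python prints for a `datetime.timedelta`. A minimal sketch, assuming that format, of turning such a string back into a duration so that load times can be compared across hosts; the function name `parse_duration` is hypothetical.

```python
import re
from datetime import timedelta

# Matches strings such as "25 days, 13:32:17.263762" or "0:05:00".
_PATTERN = re.compile(
    r"(?:(?P<days>\d+) days?, )?"
    r"(?P<hours>\d+):(?P<minutes>\d{2}):(?P<seconds>\d{2})"
    r"(?:\.(?P<micro>\d{1,6}))?"
)


def parse_duration(text):
    """Parse a timedelta-style string into a datetime.timedelta."""
    match = _PATTERN.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    # Right-pad the fractional part so ".5" means 500000 microseconds.
    micro = (match.group("micro") or "0").ljust(6, "0")
    return timedelta(
        days=int(match.group("days") or 0),
        hours=int(match.group("hours")),
        minutes=int(match.group("minutes")),
        seconds=int(match.group("seconds")),
        microseconds=int(micro),
    )


loaded = parse_duration("25 days, 13:32:17.263762")
print(f"{loaded.total_seconds() / 86400:.2f} days")
```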

[Wikidata-bugs] [Maniphest] T351662: Test hardware-based performance optimizations for WDQS import

2023-11-20 Thread bking
bking created this task. bking added projects: Wikidata, Wikidata-Query-Service, Epic. TASK DESCRIPTION Per today's Search Platform triage meeting, I'm creating this ticket to specifically test hardware optimizations that could speed up the import process. This is in contrast to T336443

[Wikidata-bugs] [Maniphest] T351662: Test hardware-based performance optimizations for WDQS import

2023-11-20 Thread bking
bking added a comment. We'll test I/O on `wdqs1014` (R440 <https://phabricator.wikimedia.org/diffusion/EPOC/>) and `wdqs1015` (R450 <https://phabricator.wikimedia.org/diffusion/EPRO/>) using `fio`. This Wikitech page <https://wikitech.wikimedia.org/wiki/Kafka/Kafka-main-r

[Wikidata-bugs] [Maniphest] T326409: Migrate the wdqs streaming updater flink jobs to flink-k8s-operator deployment model

2023-11-20 Thread bking
bking updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T326409 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: bking Cc: Gehel, BTullis, JMeybohm, gmodena, Ottomata, bking, Aklapper, dcausse, Danny_Benjafield_WMDE

[Wikidata-bugs] [Maniphest] T349095: Migrate staging rdf-streaming-updater to flink operator

2023-11-13 Thread bking
bking added a comment. Both apps (commons and wikidata) are stable in staging-eqiad now: bking@deploy2002:~/deployment-charts$ kubectl get flinkdeployments.flink.apache.org NAME JOB STATUS LIFECYCLE STATE flink-app-commons RUNNING STABLE flink-app

[Wikidata-bugs] [Maniphest] T347355: Create alerts for https://query.wikidata.org/bigdata/ldf

2023-11-14 Thread bking
bking added a comment. team-sre/probes.yaml <https://github.com/wikimedia/operations-alerts/blob/master/team-sre/probes.yaml> in the alerts repo looks like a good place to start. That file also references `prometheus::blackbox::check::http` , which looks like an easy way to s

[Wikidata-bugs] [Maniphest] T347355: Create alerts for https://query.wikidata.org/bigdata/ldf

2023-11-14 Thread bking
bking claimed this task. bking moved this task from Prioritized Backlog to In Progress on the Data-Platform-SRE board. TASK DETAIL https://phabricator.wikimedia.org/T347355 WORKBOARD https://phabricator.wikimedia.org/project/board/6524/ EMAIL PREFERENCES https://phabricator.wikimedia.org

[Wikidata-bugs] [Maniphest] T347504: WDQS graph split: load data from dumps into new hosts

2023-11-13 Thread bking
bking added a comment. Another progress report: We are 80% (869/1104) done on the leading host (wdqs1022). TASK DETAIL https://phabricator.wikimedia.org/T347504 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: bking Cc: RKemper, dcausse, Aklapper
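The quoted 80% is a rounded figure; the counts themselves work out to slightly less. A small sketch of the arithmetic, with a hypothetical helper name and without assuming what unit the counts refer to:

```python
def progress(done, total):
    """Return (percent complete, units remaining) for a reload."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * done / total, total - done


# Counts taken from the progress report above.
percent, remaining = progress(869, 1104)
print(f"{percent:.1f}% done, {remaining} remaining")  # 78.7% done, 235 remaining
```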

[Wikidata-bugs] [Maniphest] T241128: EPIC: Reduce the time needed to do the initial WDQS import

2023-11-20 Thread bking
bking updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T241128 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: bking Cc: MPhamWMF, Gehel, Addshore, dcausse, Aklapper, me, Danny_Benjafield_WMDE, Astuthiodit_1, AWesterinen

[Wikidata-bugs] [Maniphest] T351662: Test hardware-based performance optimizations for WDQS import

2023-11-20 Thread bking
bking added a comment. Thanks @Addshore, this is a wealth of great info! Your observation > CPU clock speed makes a big difference with the first 10 batches of RDF taking 2 hours less on a c2-standard-8 3.1-3.8 GHz machine instead of an n1-standard-8 2.2-3.7 GHz mach

[Wikidata-bugs] [Maniphest] T358727: Reclaim recently-decommed CP host for WDQS (see T352253)

2024-02-28 Thread bking
bking updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T358727 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: bking Cc: ssingh, RKemper, dr0ptp4kt, wiki_willy, bking, AWesterinen, BTullis, Namenlos314, Gq86

[Wikidata-bugs] [Maniphest] T358727: Reclaim recently-decommed CP host for WDQS (see T352253)

2024-02-28 Thread bking
bking created this task. bking added projects: Data-Platform-SRE, Wikidata-Query-Service, ops-eqiad. TASK DESCRIPTION Hello DC Ops, Per T352253 <https://phabricator.wikimedia.org/T352253> , @dr0ptp4kt requested one of the recently-decommed CP hosts for the WDQS graph split expe

[Wikidata-bugs] [Maniphest] T358727: Reclaim recently-decommed CP host for WDQS (see T352253)

2024-03-01 Thread bking
bking added a comment. @VRiley-WMF or @Jclark-ctr, are there any other lifecycle steps I need to take to get this host back into production as `wdqs1025`? This host was already decommissioned, so I'm not sure what to do to get it back into production. I can't seem to reach `cp1086

[Wikidata-bugs] [Maniphest] T359062: Assess Wikidata dump import hardware

2024-03-05 Thread bking
bking added a subtask: T358727: Reclaim recently-decommed CP host for WDQS (see T352253). TASK DETAIL https://phabricator.wikimedia.org/T359062 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dr0ptp4kt, bking Cc: dr0ptp4kt, Aklapper

[Wikidata-bugs] [Maniphest] T358727: Reclaim recently-decommed CP host for WDQS (see T352253)

2024-03-05 Thread bking
bking added a parent task: T359062: Assess Wikidata dump import hardware. TASK DETAIL https://phabricator.wikimedia.org/T358727 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: VRiley-WMF, bking Cc: cmooney, Jclark-ctr, VRiley-WMF, ssingh, RKemper

[Wikidata-bugs] [Maniphest] T359062: Assess Wikidata dump import hardware

2024-03-06 Thread bking
bking reopened subtask T358727: Reclaim recently-decommed CP host for WDQS (see T352253) as Open. TASK DETAIL https://phabricator.wikimedia.org/T359062 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dr0ptp4kt, bking Cc: dr0ptp4kt, Aklapper

[Wikidata-bugs] [Maniphest] T358727: Reclaim recently-decommed CP host for WDQS (see T352253)

2024-03-06 Thread bking
bking reopened this task as "Open". bking added a comment. @VRiley-WMF `wdqs1025` is failing to reimage. I can't see any disks in the DRAC interface. Are you able to check the disks and see if they're properly seated? TASK DETAIL https://phabricator.wikimedia.org/T358

[Wikidata-bugs] [Maniphest] T359062: Assess Wikidata dump import hardware

2024-03-07 Thread bking
bking added a comment. @dr0ptp4kt `wdqs1025` should be ready for your I/O tests. Let us know how it goes! TASK DETAIL https://phabricator.wikimedia.org/T359062 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dr0ptp4kt, bking Cc: bking, dr0ptp4kt

[Wikidata-bugs] [Maniphest] T358727: Reclaim recently-decommed CP host for WDQS (see T352253)

2024-03-06 Thread bking
bking added a comment. @VRiley-WMF Unfortunately, I'm still getting errors (screenshot) <https://ewr1.vultrobjects.com/work/disk_errors_wdqs1025.png> when I try to boot up the host. Are you able to reseat the cables and disks? TASK DETAIL https://phabricator.wikimedia.org/T

[Wikidata-bugs] [Maniphest] T358727: Reclaim recently-decommed CP host for WDQS (see T352253)

2024-03-06 Thread bking
bking added a comment. @Jclark-ctr Thanks for the tip, I've added a patch and will try the reimage again. TASK DETAIL https://phabricator.wikimedia.org/T358727 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: VRiley-WMF, bking Cc: cmooney, Jclark

[Wikidata-bugs] [Maniphest] T359062: Assess Wikidata dump import hardware

2024-03-19 Thread bking
bking closed subtask T358727: Reclaim recently-decommed CP host for WDQS (see T352253) as Resolved. TASK DETAIL https://phabricator.wikimedia.org/T359062 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dr0ptp4kt, bking Cc: ssingh, bking, dr0ptp4kt

[Wikidata-bugs] [Maniphest] T358727: Reclaim recently-decommed CP host for WDQS (see T352253)

2024-03-19 Thread bking
bking closed this task as "Resolved". bking moved this task from Backlog to Done on the Data-Platform-SRE (2024.03.04 - 2024.03.24) board. bking added a comment. Apologies for not posting this sooner. `wdqs1025` has been ready for use since the above request was merged. Closing th

[Wikidata-bugs] [Maniphest] T359062: Assess Wikidata dump import hardware

2024-03-08 Thread bking
bking added a comment. @ssingh @dr0ptp4kt hold off on testing on your hosts for now... we might be able to get an NVMe into this year's budget, will let you know. @dr0ptp4kt If you want to run I/O tests on the existing hosts, I recommend the approach detailed in this wikitech page

[Wikidata-bugs] [Maniphest] T360993: WDQS lag propagation to wikidata not working as intended

2024-03-26 Thread bking
bking added a comment. Per `sudo cumin A:prometheus 'w'` from a cumin host, there are 8 active prometheus hosts. We also have 3 load balancer pools for each wdqs host <https://config-master.wikimedia.org/pybal/codfw/>: - wdqs - wdqs-ssl - wdqs-heavy-queries Ea

[Wikidata-bugs] [Maniphest] T361106: Restore wdqs1013 with a data transfer

2024-03-27 Thread bking
bking created this task. bking added projects: Wikidata-Query-Service, Discovery-Search (Current work), Wikidata, Patch-For-Review, Data-Platform-SRE. Restricted Application removed a project: Patch-For-Review. TASK DESCRIPTION `wdqs1013` is too far behind and we'll need to perform a data

[Wikidata-bugs] [Maniphest] T361114: Alert Search Platform and/or DPE SRE when Wikidata is lagged

2024-03-27 Thread bking
bking created this task. bking added projects: Wikidata-Query-Service, Discovery-Search (Current work), Wikidata, Patch-For-Review, Data-Platform-SRE. Restricted Application removed a project: Patch-For-Review. TASK DESCRIPTION Per parent ticket, our inaccurate MaxLag calculation caused

[Wikidata-bugs] [Maniphest] T360993: WDQS lag propagation to wikidata not working as intended

2024-03-26 Thread bking
bking added a comment. > It is possible that this metric is polluted with monitoring queries that do not relate to serving user traffic I did a little checking around this. Prometheus blackbox checks are defined here <https://gerrit.wikimedia.org/r/plugins/gitiles/operations/

[Wikidata-bugs] [Maniphest] T327689: Use rsync instead of NFS for wdqs data reload cookbook

2024-04-04 Thread bking
bking added a project: Data-Platform-SRE. TASK DETAIL https://phabricator.wikimedia.org/T327689 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: bking Cc: Gehel, ArielGlenn, bking, Aklapper, Danny_Benjafield_WMDE, S8321414, Astuthiodit_1, AWesterinen

[Wikidata-bugs] [Maniphest] T327689: Use rsync instead of NFS for wdqs data reload cookbook

2024-04-04 Thread bking
bking edited projects, added Data-Platform-SRE (2024.03.25 - 2024.04.14); removed Data-Platform-SRE. TASK DETAIL https://phabricator.wikimedia.org/T327689 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: bking Cc: Gehel, ArielGlenn, bking, Aklapper
