bking added a comment.
Current state: 2019 and 2020 are production-ready. The others need a data
transfer and/or scap deploy to be complete. The command below checks the
deployment directory size. If the directory size is smaller than 471M, that
means `git-fat` isn't working and the host
bking added a comment.
Update: I forgot to target 2013 in my last command. Here is the latest list
of hosts that need a data transfer and a deploy:
(4) wdqs[2013-2016].codfw.wmnet
- OUTPUT of 'du -hcxs /srv/de...6756ebe194261756' -
132M
/srv/deployment/wdqs/wdqs
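The size check described in these comments (a deployment directory smaller than 471M suggests `git-fat` did not pull the large files) could be scripted roughly as below. This is only a sketch: the function names are hypothetical, and it assumes the size column of `du -h` uses binary K/M/G/T suffixes.

```python
import re

# Threshold taken from the comment above: a healthy checkout is at least 471M.
THRESHOLD_BYTES = 471 * 1024**2

# Binary multipliers for the suffixes that `du -h` prints (assumption).
_UNITS = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_du_size(text: str) -> int:
    """Convert a `du -h` style size such as '132M' or '1.2G' to bytes."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)([KMGT]?)\s*", text)
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    number, unit = match.groups()
    return int(float(number) * _UNITS[unit])


def git_fat_looks_broken(du_size: str, threshold: int = THRESHOLD_BYTES) -> bool:
    """Return True when the reported size is below the expected minimum."""
    return parse_du_size(du_size) < threshold


# The hosts listed above report 132M, which falls below the threshold.
print(git_fat_looks_broken("132M"))
```

A host whose directory reports 471M or more would return False here and would not need another data transfer on this criterion alone.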
bking added a comment.
Update: `wdqs2016.codfw.wmnet` is the last host that needs to be configured
for production.
`wdqs2020.codfw.wmnet` has been receiving production traffic for a week now,
with no observed issues.
We should be able to finish the rest pretty soon and start
bking set the point value for this task to "1".
TASK DETAIL
https://phabricator.wikimedia.org/T339347
EMAIL PREFERENCES
https://phabricator.wikimedia.org/settings/panel/emailpreferences/
To: bking
Cc: bking, Aklapper, WolfgangFahl, Astuthiodit_1, AWesterinen, BTullis,
kar
bking added a comment.
Update: wdqs[2017-2021].codfw.wmnet are now production ready:
= NODE GROUP =
(4) wdqs[2014-2016,2022].codfw.wmnet
- OUTPUT of 'du -hcxs /srv/de...6756ebe194261756' -
132M
/srv/deployment/wdqs/wdqs-cache/revs
bking added a comment.
Per today's SRE meeting, the larger SRE org is working on a comprehensive
alert review <https://etherpad.wikimedia.org/p/alert-review-may-2023> . We
should work with the SREs to help out and use their methods to review our own
alerts.
TASK DETAIL
bking added a comment.
I've been working on this a bit more lately. The Transfer.py documentation
<https://doc.wikimedia.org/transferpy/master/transferpy/transferpy.html#module-transferpy.Firewall>
mentions "remote_execution" but does not mention that it's a required argumen
bking claimed this task.
TASK DETAIL
https://phabricator.wikimedia.org/T336577
To: bking
Cc: bking, dcausse, Gehel, Aklapper, Astuthiodit_1, AWesterinen, karapayneWMDE,
Invadibot, Zabe, MPhamWMF
bking added subscribers: dcausse, bking.
bking added a comment.
Updated the Streaming Updater operations docs
<https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater#The_consumers_are_backlogged>
after today's pairing session with @dcausse . We'll continue to
bking added a comment.
Other action items:
- Add link to new WDQS superset dashboard to WDQS runbook page.
- Fix dead logstash link on WDQS runbook page
<https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook#Timeouts>
- Better documentation of throttling be
bking added a comment.
@Epidosis Thanks for your patience. We've added
https://digitale.bncf.firenze.sbn.it/openrdf-workbench/repositories/NS/query as
an allowed SPARQL endpoint for WDQS. Please test this out and respond here with
your results, good or bad.
TASK DETAIL
https
bking added a comment.
@Nikki Thanks for your patience. We've added
https://vocabularies.unesco.org/sparql as an allowed SPARQL endpoint for WDQS.
Please test this out and respond here with your results, good or bad.
TASK DETAIL
https://phabricator.wikimedia.org/T335994
bking added a comment.
@DL2204 Thanks for your patience. We've added
https://lila-erc.eu/sparql/lila_knowledge_base/sparql as an allowed SPARQL
endpoint for WDQS.
Please test this out and respond here with your results, good or bad.
TASK DETAIL
https://phabricator.wikimedia.org
bking added projects: Sustainability, SRE-OnFire.
TASK DETAIL
https://phabricator.wikimedia.org/T336577
To: bking
Cc: bking, dcausse, Gehel, Aklapper, Astuthiodit_1, AWesterinen, karapayneWMDE,
Invadibot
bking edited projects, added Sustainability, SRE-OnFire; removed Sustainability
(Incident Followup).
TASK DETAIL
https://phabricator.wikimedia.org/T336574
To: bking
Cc: bking, Aklapper, Gehel
bking added a comment.
Revised totals for alerts in the last year after looking at Logstash:
`RdfStreamingUpdaterHighConsumerUpdateLag` 373
`RdfStreamingUpdaterFlinkProcessingLatencyIsHigh` 63
`RdfStreamingUpdaterFlinkJobUnstable` 125
The majority of all three alert types fired
bking added a comment.
@WolfgangFahl We've whitelisted the endpoints, but the query you linked above
<https://w.wiki/6q2i> still does not work. Can you verify that it is working as
expected? My teammate mentioned "it's returning application/sparql-results+xml
but we on
bking added a subtask: T337801: WDQS: Document procedure for switching between
Kubernetes and Yarn Streaming Updater.
TASK DETAIL
https://phabricator.wikimedia.org/T336134
To: bking
Cc: karapayneWMDE
bking closed subtask T330714: Document SRE steps for deploying a new WDQS (and
WCQS) host as Resolved.
TASK DETAIL
https://phabricator.wikimedia.org/T332314
To: bking
Cc: RKemper, Gehel, Aklapper
bking claimed this task.
TASK DETAIL
https://phabricator.wikimedia.org/T332314
To: bking
Cc: Gehel, Aklapper, Astuthiodit_1, AWesterinen, BTullis, karapayneWMDE,
Invadibot, MPhamWMF, maantietaja
bking reopened this task as "In Progress".
TASK DETAIL
https://phabricator.wikimedia.org/T321605
To: bking
Cc: Vgutierrez, RKemper, Volans, Aklapper, bking, Astuthiodit_1, AWesterinen,
kar
bking claimed this task.
TASK DETAIL
https://phabricator.wikimedia.org/T336134
To: bking
Cc: bking, Aklapper, Bugreporter, Astuthiodit_1, AWesterinen, karapayneWMDE,
Invadibot, MPhamWMF, maantietaja
bking added a comment.
Per today's triage meeting: we have a liveness probe in our nginx config, and
we need to replace it with something smarter.
TASK DETAIL
https://phabricator.wikimedia.org/T270614
bking removed a project: Discovery-Search (Current work).
TASK DETAIL
https://phabricator.wikimedia.org/T325602
To: bking
Cc: Gehel, Aklapper, RKemper, bking, Astuthiodit_1, AWesterinen, karapayneWMDE
bking edited projects, added Discovery-Search; removed Discovery-Search
(Current work).
TASK DETAIL
https://phabricator.wikimedia.org/T274270
To: bking
Cc: bking, RKemper, Gehel, Aklapper, Astuthiodit_1
bking removed a project: Discovery-Search (Current work).
TASK DETAIL
https://phabricator.wikimedia.org/T193473
To: bking
Cc: bking, Aklapper, Smalyshev, Gehel, Astuthiodit_1, AWesterinen,
karapayneWMDE
bking updated the task description.
TASK DETAIL
https://phabricator.wikimedia.org/T336134
To: bking
Cc: bking, Aklapper, Bugreporter, Astuthiodit_1, AWesterinen, karapayneWMDE,
Invadibot, MPhamWMF
bking added a comment.
Quick notes here before I forget. Checking my "alerts" email folder for the
past year (not the most reliable source), I have:
- 89 alerts with title RdfStreamingUpdaterHighConsumerUpdateLag
- 72 alerts with title RdfStreamingUpdaterFlinkProcessingLat
bking closed subtask T337801: WDQS: Document procedure for switching between
Kubernetes and Yarn Streaming Updater as Resolved.
TASK DETAIL
https://phabricator.wikimedia.org/T336134
To: bking
Cc
bking updated the task description.
TASK DETAIL
https://phabricator.wikimedia.org/T350784
To: bking
Cc: dcausse, JMeybohm, Aklapper, bking, Danny_Benjafield_WMDE, Astuthiodit_1,
AWesterinen, BTullis
bking added a comment.
No objections from the Search Platform side. Side note: our new team (Data
Platform SRE) is reviewing its alerts and alerting strategy
<https://phabricator.wikimedia.org/T345698>, so our config may change in the
near future.
TASK DETAIL
bking closed this task as "Resolved".
TASK DETAIL
https://phabricator.wikimedia.org/T352921
To: bking
Cc: Aklapper, Gehel, BTullis, bking, Dzahn, Danny_Benjafield_WMDE,
Astuthiodit_1, kar
bking closed subtask T347355: Create alerts for
https://query.wikidata.org/bigdata/ldf as Resolved.
TASK DETAIL
https://phabricator.wikimedia.org/T347284
To: bking
Cc: RKemper, bking, MisterSynergy
bking edited projects, added Data-Platform-SRE; removed Data-Platform-SRE
(2023/24 Q3 Milestone 1).
TASK DETAIL
https://phabricator.wikimedia.org/T350784
To: bking
Cc: dcausse, JMeybohm, Aklapper, bking
bking created this task.
bking added projects: Wikidata, Data-Platform-SRE.
Restricted Application added a subscriber: Aklapper.
TASK DESCRIPTION
Per IRC conversation with @dcausse , we have a couple of issues with the
graph split hosts:
- Querying the test hosts is resulting in bans
bking moved this task from In Progress to Done on the Data-Platform-SRE
(2023/24 Q3 Milestone 1) board.
bking closed this task as "Resolved".
bking claimed this task.
bking added a comment.
Merging/applying the above patch added TLS to the test hosts, using the
domain "wdqs-t
bking closed subtask T352878: Troubleshoot recurring systemd unit failures and
availability issues for wdqs1022-24 as Resolved.
TASK DETAIL
https://phabricator.wikimedia.org/T350464
To: bking
Cc: Gehel
bking updated the task description.
TASK DETAIL
https://phabricator.wikimedia.org/T354555
To: bking
Cc: Aklapper, RKemper, bking, dcausse, Danny_Benjafield_WMDE, Astuthiodit_1,
BTullis, karapayneWMDE
bking updated the task description.
TASK DETAIL
https://phabricator.wikimedia.org/T354555
To: bking
Cc: Aklapper, RKemper, bking, dcausse, Danny_Benjafield_WMDE, Astuthiodit_1,
BTullis, karapayneWMDE
bking renamed this task from "WDQS graph split hosts: Remove throttling/banning
mechanisms and improve hadoop access" to "WDQS graph split hosts: Remove
throttling/banning mechanisms and investigate external connectivity".
TASK DETAIL
https://phabricator.wikimedi
bking updated the task description.
TASK DETAIL
https://phabricator.wikimedia.org/T354555
To: bking
Cc: Aklapper, RKemper, bking, dcausse, Danny_Benjafield_WMDE, Astuthiodit_1,
BTullis, karapayneWMDE
bking changed the task status from "Open" to "In Progress".
bking moved this task from Backlog to Needs Review on the Data-Platform-SRE
(2023/24 Q3 Milestone 1) board.
bking added a subscriber: BTullis.
bking added a comment.
Per pairing session with @BTullis ,
bking closed this task as "Resolved".
bking moved this task from Backlog to Done on the Data-Platform-SRE (2023/24 Q3
Milestone 1) board.
bking claimed this task.
TASK DETAIL
https://phabricator.wikimedia.org/T336577
WORKBOARD
https://phabricator.wikimedia.org/project/board/68
bking closed subtask T336577: Update WDQS Runbook following update lag incident
as Resolved.
TASK DETAIL
https://phabricator.wikimedia.org/T336134
To: bking
Cc: karapayneWMDE, bking, Aklapper, Bugreporter
bking added a comment.
Based on a quick read of the linked documentation and a small addition, I
believe we have satisfied the requirements. Closing...
TASK DETAIL
https://phabricator.wikimedia.org/T336577
bking updated the task description.
TASK DETAIL
https://phabricator.wikimedia.org/T350784
To: bking
Cc: dcausse, JMeybohm, Aklapper, bking, Danny_Benjafield_WMDE, Astuthiodit_1,
AWesterinen, BTullis
bking claimed this task.
TASK DETAIL
https://phabricator.wikimedia.org/T350784
To: bking
Cc: dcausse, JMeybohm, Aklapper, bking, Danny_Benjafield_WMDE, Astuthiodit_1,
AWesterinen, BTullis, karapayneWMDE
bking edited projects, added Data-Platform-SRE (2023/24 Q3 Milestone 1);
removed Data-Platform-SRE.
TASK DETAIL
https://phabricator.wikimedia.org/T350784
To: bking
Cc: dcausse, JMeybohm, Aklapper, bking
bking added a comment.
I've created another 24-hour silence for this alert (UUID
59b5ca30-1aeb-4d06-b083-7023a373ccb3).
TASK DETAIL
https://phabricator.wikimedia.org/T347355
To: bking
Cc: Gehel, Dzahn
bking added a comment.
Looks like the data reload for lexemes completed. @dcausse , are you able to
check the data from the reload and make sure it's usable? Let me know if I can
help.
TASK DETAIL
https://phabricator.wikimedia.org/T347504
bking added a comment.
Reverted the last change after we got some alerts for the following hosts: `1008
1009 1010 1011 2008 2014`
I suspect this has something to do with the tiers, since the ldf_host var was
set in public.yaml...will check and respond here.
TASK DETAIL
https
bking updated the task description.
TASK DETAIL
https://phabricator.wikimedia.org/T326409
To: bking
Cc: Gehel, BTullis, JMeybohm, gmodena, Ottomata, bking, Aklapper, dcausse,
Danny_Benjafield_WMDE
bking updated the task description.
TASK DETAIL
https://phabricator.wikimedia.org/T326409
To: bking
Cc: Gehel, BTullis, JMeybohm, gmodena, Ottomata, bking, Aklapper, dcausse,
Danny_Benjafield_WMDE
bking updated the task description.
TASK DETAIL
https://phabricator.wikimedia.org/T350784
To: bking
Cc: dcausse, JMeybohm, Aklapper, bking, Danny_Benjafield_WMDE, Astuthiodit_1,
AWesterinen, BTullis
bking updated the task description.
TASK DETAIL
https://phabricator.wikimedia.org/T350784
To: bking
Cc: dcausse, JMeybohm, Aklapper, bking, Danny_Benjafield_WMDE, Astuthiodit_1,
AWesterinen, BTullis
bking updated the task description.
TASK DETAIL
https://phabricator.wikimedia.org/T350784
To: bking
Cc: dcausse, JMeybohm, Aklapper, bking, Danny_Benjafield_WMDE, Astuthiodit_1,
AWesterinen, BTullis
bking added a comment.
[ ] remove unused secrets from kubernetes.yaml on private puppet
TASK DETAIL
https://phabricator.wikimedia.org/T350784
To: bking
Cc: dcausse, JMeybohm, Aklapper, bking
bking added a comment.
We've silenced the alert for another 24 hours. The network probes Grafana
dashboard
<https://grafana-rw.wikimedia.org/d/O0nHhdhnz/network-probes-overview?forceLogin=true=1=probes%2Fcustom=All=3>
is still showing 0% availability for our LDF probe.
TASK DETAIL
bking updated the task description.
TASK DETAIL
https://phabricator.wikimedia.org/T326409
To: bking
Cc: Gehel, BTullis, JMeybohm, gmodena, Ottomata, bking, Aklapper, dcausse,
Danny_Benjafield_WMDE
bking moved this task from In Progress to Done on the Data-Platform-SRE board.
bking closed this task as "Resolved".
TASK DETAIL
https://phabricator.wikimedia.org/T326409
WORKBOARD
https://phabricator.wikimedia.org/project/board/6524/
bking added a comment.
I'm happy to say the flink operator migration is complete. Commons and
wikidata are stable in both CODFW and EQIAD. As such, I'm resolving this
ticket. Post-migration cleanup work continues in T350784
<https://phabricator.wikimedia.org/T350784> .
TASK DETAIL
bking added a comment.
I started a transfer of the gzip files mentioned above from `wdqs1024` to
`wdqs1023` (wdqs hosts have 10Gbps Ethernet vs. 1Gbps for the stat
machines, so this should be faster).
You can set a temporary iptables rule to allow traffic between hosts
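As a rough sanity check on the 10Gbps vs. 1Gbps comparison above, here is a minimal sketch of the best-case transfer time at each link speed. The 100 GB payload and the `efficiency` parameter are assumptions for illustration only, not figures from this task. Real transfers will be slower because of disk, compression and protocol overhead.

```python
def ideal_transfer_seconds(size_bytes: int, link_gbps: float,
                           efficiency: float = 1.0) -> float:
    """Best-case seconds to move size_bytes over a link of link_gbps.

    efficiency (0 < efficiency <= 1) is a hypothetical knob for overhead.
    """
    if link_gbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_gbps must be > 0 and efficiency in (0, 1]")
    bits = size_bytes * 8
    return bits / (link_gbps * 1e9 * efficiency)


# Hypothetical payload size; the real size of the gzip files is not stated here.
payload_bytes = 100 * 10**9

for label, gbps in (("wdqs host, 10Gbps", 10), ("stat machine, 1Gbps", 1)):
    seconds = ideal_transfer_seconds(payload_bytes, gbps)
    print(f"{label}: about {seconds / 60:.1f} minutes")
```

Under these assumptions the 10Gbps path is ten times faster, which is the whole argument for sourcing the files from another wdqs host.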
bking added a comment.
Looks like the check targets are rendered at
`/srv/prometheus/ops/targets/probes-custom_puppet-http.yaml` on the prom hosts
after merging the above patch, the target config for LDF endpoint looks like
this <https://phabricator.wikimedia.org/P53914#218
bking added a comment.
The probe is getting a 500 error, which is spawning phab tickets for
serviceops-collab team (see T352084 <https://phabricator.wikimedia.org/T352084>
). As such, I've set a 24-hour suppression in alertmanager (UUID
fc02d897-8a64-4ebb-a362-77a765a7f155 ) . Will r
bking updated the task description.
TASK DETAIL
https://phabricator.wikimedia.org/T352921
To: bking
Cc: Aklapper, Gehel, BTullis, bking, Dzahn, me, Danny_Benjafield_WMDE,
Astuthiodit_1, BeautifulBold
bking created this task.
bking added projects: Data-Platform-SRE, Epic, Wikidata.
TASK DESCRIPTION
Per IRC conversation with @Dzahn , we noticed that there are a couple of Data
Platform SRE-owned sites (commons-query.wikimedia.org and query.wikidata.org)
hosted at least in part from the miscweb
bking moved this task from In Progress to Blocked / Waiting on the
Data-Platform-SRE board.
bking added a comment.
Moving to "blocked/waiting" until we have confirmation on the reload data.
TASK DETAIL
https://phabricator.wikimedia.org/T347504
WORKBOARD
https://phabricator.wik
bking added a comment.
After some thought, I think the problem is the blackbox check's association
with miscweb. We are actually cutting around miscweb when we access the ldf
endpoint, so we should put the blackbox check outside of
`modules/profile/manifests/microsites/query_service.pp
bking closed this task as "Resolved".
bking moved this task from In Progress to Done on the Data-Platform-SRE board.
bking added a comment.
Apologies for the confusion. We have already migrated the
rdf-streaming-updater to production, so I'm closing this ticket (which is
focused
bking closed subtask T349095: Migrate staging rdf-streaming-updater to flink
operator as Resolved.
TASK DETAIL
https://phabricator.wikimedia.org/T326409
To: bking
Cc: Gehel, BTullis, JMeybohm, gmodena
bking added a comment.
Update: The wikidata dump finished on wdqs1022 (Wikidata dump loaded in 25
days, 13:32:17.263762).
But all 3 hosts are stuck at the moment; I'm not sure what happened, but each
of their individual tasks is stuck at 0%. Maybe the dumps server locked up?
TASK
bking created this task.
bking added projects: Wikidata, Wikidata-Query-Service, Epic.
TASK DESCRIPTION
Per today's Search Platform triage meeting, I'm creating this ticket to
specifically test hardware optimizations that could speed up the import
process. This is in contrast to T336443
bking added a comment.
We'll test I/O on `wdqs1014` (R440
<https://phabricator.wikimedia.org/diffusion/EPOC/>) and `wdqs1015` (R450
<https://phabricator.wikimedia.org/diffusion/EPRO/>) using `fio`. This Wikitech
page
<https://wikitech.wikimedia.org/wiki/Kafka/Kafka-main-r
bking updated the task description.
TASK DETAIL
https://phabricator.wikimedia.org/T326409
To: bking
Cc: Gehel, BTullis, JMeybohm, gmodena, Ottomata, bking, Aklapper, dcausse,
Danny_Benjafield_WMDE
bking added a comment.
Both apps (commons and wikidata) are stable in staging-eqiad now:
bking@deploy2002:~/deployment-charts$ kubectl get flinkdeployments.flink.apache.org
NAME                JOB STATUS   LIFECYCLE STATE
flink-app-commons   RUNNING      STABLE
flink-app
bking added a comment.
team-sre/probes.yaml
<https://github.com/wikimedia/operations-alerts/blob/master/team-sre/probes.yaml>
in the alerts repo looks like a good place to start.
That file also references `prometheus::blackbox::check::http` , which looks
like an easy way to s
bking claimed this task.
bking moved this task from Prioritized Backlog to In Progress on the
Data-Platform-SRE board.
TASK DETAIL
https://phabricator.wikimedia.org/T347355
WORKBOARD
https://phabricator.wikimedia.org/project/board/6524/
bking added a comment.
Another progress report: We are about 79% (869/1104) done on the leading host
(wdqs1022).
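The progress figure is simply completed chunks over total chunks. A small sketch of that arithmetic, assuming the two numbers in parentheses are completed and total import chunks (the function name is hypothetical):

```python
def percent_done(completed: int, total: int) -> float:
    """Return progress as a percentage of total."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= completed <= total:
        raise ValueError("completed must be between 0 and total")
    return 100.0 * completed / total


# Figures from the progress report above.
print(f"{percent_done(869, 1104):.1f}% done")
```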
TASK DETAIL
https://phabricator.wikimedia.org/T347504
To: bking
Cc: RKemper, dcausse, Aklapper
bking updated the task description.
TASK DETAIL
https://phabricator.wikimedia.org/T241128
To: bking
Cc: MPhamWMF, Gehel, Addshore, dcausse, Aklapper, me, Danny_Benjafield_WMDE,
Astuthiodit_1, AWesterinen
bking updated the task description.
TASK DETAIL
https://phabricator.wikimedia.org/T241128
To: bking
Cc: MPhamWMF, Gehel, Addshore, dcausse, Aklapper, me, Danny_Benjafield_WMDE,
Astuthiodit_1, AWesterinen
bking added a comment.
Thanks @Addshore , this is a wealth of great info!
Your observation
> CPU clock speed makes a big difference with the first 10 batches of RDF
> taking 2 hours less on a c2-standard-8 3.1-3.8 Ghz machine instead of a
> n1-standard-8 2.2-3.7 Ghz mach
bking updated the task description.
TASK DETAIL
https://phabricator.wikimedia.org/T358727
To: bking
Cc: ssingh, RKemper, dr0ptp4kt, wiki_willy, bking, AWesterinen, BTullis,
Namenlos314, Gq86
bking created this task.
bking added projects: Data-Platform-SRE, Wikidata-Query-Service, ops-eqiad.
TASK DESCRIPTION
Hello DC Ops,
Per T352253 <https://phabricator.wikimedia.org/T352253> , @dr0ptp4kt
requested one of the recently-decommed CP hosts for the WDQS graph split
expe
bking added a comment.
@VRiley-WMF or @Jclark-ctr are there any other lifecycle steps I need to take
to get this host back into production as `wdqs1025`? This host was already
decommissioned, so I'm not sure what to do to get it back into production.
I can't seem to reach `cp1086
bking added a subtask: T358727: Reclaim recently-decommed CP host for WDQS (see
T352253).
TASK DETAIL
https://phabricator.wikimedia.org/T359062
To: dr0ptp4kt, bking
Cc: dr0ptp4kt, Aklapper
bking added a parent task: T359062: Assess Wikidata dump import hardware.
TASK DETAIL
https://phabricator.wikimedia.org/T358727
To: VRiley-WMF, bking
Cc: cmooney, Jclark-ctr, VRiley-WMF, ssingh, RKemper
bking reopened subtask T358727: Reclaim recently-decommed CP host for WDQS (see
T352253) as Open.
TASK DETAIL
https://phabricator.wikimedia.org/T359062
To: dr0ptp4kt, bking
Cc: dr0ptp4kt, Aklapper
bking reopened this task as "Open".
bking added a comment.
@VRiley-WMF `wdqs1025` is failing to reimage. I can't see any disks in the
DRAC interface. Are you able to check the disks and see if they're properly
seated?
TASK DETAIL
https://phabricator.wikimedia.org/T358
bking added a comment.
@dr0ptp4kt `wdqs1025` should be ready for your I/O tests. Let us know how it
goes!
TASK DETAIL
https://phabricator.wikimedia.org/T359062
To: dr0ptp4kt, bking
Cc: bking, dr0ptp4kt
bking added a comment.
@VRiley-WMF
Unfortunately, I'm still getting errors (screenshot)
<https://ewr1.vultrobjects.com/work/disk_errors_wdqs1025.png> when I try to
boot up the host. Are you able to reseat the cables and disks?
TASK DETAIL
https://phabricator.wikimedia.org/T
bking added a comment.
@Jclark-ctr Thanks for the tip, I've added a patch and will try the reimage
again.
TASK DETAIL
https://phabricator.wikimedia.org/T358727
To: VRiley-WMF, bking
Cc: cmooney, Jclark
bking closed subtask T358727: Reclaim recently-decommed CP host for WDQS (see
T352253) as Resolved.
TASK DETAIL
https://phabricator.wikimedia.org/T359062
To: dr0ptp4kt, bking
Cc: ssingh, bking, dr0ptp4kt
bking closed this task as "Resolved".
bking moved this task from Backlog to Done on the Data-Platform-SRE (2024.03.04
- 2024.03.24) board.
bking added a comment.
Apologies for not posting this sooner. `wdqs1025` has been ready for use
since the above request was merged. Closing th
bking added a comment.
@ssingh @dr0ptp4kt hold up on the testing on your hosts for now...we
might be able to get an NVMe into this year's budget, will let you know.
@dr0ptp4kt If you want to run i/o tests on the existing hosts, I recommend
the approach detailed in this wikitech page
bking added a comment.
Per `sudo cumin A:prometheus 'w'` from a cumin host, there are 8 active
prometheus hosts.
We also have 3 load balancer pools for each wdqs host
<https://config-master.wikimedia.org/pybal/codfw/>:
- wdqs
- wdqs-ssl
- wdqs-heavy-queries
Ea
bking created this task.
bking added projects: Wikidata-Query-Service, Discovery-Search (Current work),
Wikidata, Patch-For-Review, Data-Platform-SRE.
Restricted Application removed a project: Patch-For-Review.
TASK DESCRIPTION
`wdqs1013` is too far behind and we'll need to perform a data
bking created this task.
bking added projects: Wikidata-Query-Service, Discovery-Search (Current work),
Wikidata, Patch-For-Review, Data-Platform-SRE.
Restricted Application removed a project: Patch-For-Review.
TASK DESCRIPTION
Per parent ticket, our inaccurate MaxLag calculation caused
bking added a comment.
> It is possible that this metric is polluted with monitoring queries that do
> not relate to serving user traffic
I did a little checking around this. Prometheus blackbox checks are defined
here
<https://gerrit.wikimedia.org/r/plugins/gitiles/operations/
bking added a project: Data-Platform-SRE.
TASK DETAIL
https://phabricator.wikimedia.org/T327689
To: bking
Cc: Gehel, ArielGlenn, bking, Aklapper, Danny_Benjafield_WMDE, S8321414,
Astuthiodit_1, AWesterinen
bking edited projects, added Data-Platform-SRE (2024.03.25 - 2024.04.14);
removed Data-Platform-SRE.
TASK DETAIL
https://phabricator.wikimedia.org/T327689
To: bking
Cc: Gehel, ArielGlenn, bking, Aklapper