bking added a comment.
Correction: both MDRAID and LVM servers have this problem. Both services'
systemd unit files have the same "Conflicts=shutdown.target" directive. I still
haven't tried the systemd workaround, but will test it today.
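A small helper can confirm the `Conflicts=` directive on other hosts without reading unit files by hand. This is a minimal sketch, assuming `systemctl show -p Conflicts <unit>` is available and prints a line such as `Conflicts=shutdown.target umount.target`. The script name and any unit names you pass to it are placeholders; the comment above does not name the MDRAID and LVM units.

```python
import subprocess
import sys


def parse_conflicts(show_output: str) -> set[str]:
    """Parse `systemctl show -p Conflicts <unit>` output into a set of unit names."""
    units: set[str] = set()
    for line in show_output.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "Conflicts":
            # systemd lists dependencies space-separated on one line
            units.update(value.split())
    return units


def conflicts_with_shutdown(unit: str) -> bool:
    """Ask systemd whether `unit` has shutdown.target in its Conflicts= list."""
    result = subprocess.run(
        ["systemctl", "show", "-p", "Conflicts", unit],
        capture_output=True,
        text=True,
        check=True,
    )
    return "shutdown.target" in parse_conflicts(result.stdout)


if __name__ == "__main__":
    # Unit names come from the command line; none are hard-coded here.
    for unit_name in sys.argv[1:]:
        status = "conflicts" if conflicts_with_shutdown(unit_name) else "does not conflict"
        print(f"{unit_name} {status} with shutdown.target")
```

Usage (hypothetical file name): `python3 check_conflicts.py <mdraid-unit> <lvm-unit>`. Querying `systemctl show` instead of parsing the unit file directly also picks up drop-in overrides, which matters if the systemd workaround is applied as a drop-in.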
bking added a comment.
Another piece of the puzzle: some wdqs hosts use MDRAID for their /srv
partition, while others use LVM <https://phabricator.wikimedia.org/P23901>. The
working assumption is that only the LVM hosts will take forever to reboot.
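To sort hosts into the MDRAID and LVM groups, one option is to look at the device backing /srv. This is a heuristic sketch, assuming `findmnt` (util-linux) is installed and that MDRAID arrays show up as `/dev/md*` while LVM logical volumes show up as device-mapper nodes. Note that `/dev/mapper/` can also be LUKS or other device-mapper targets, and an LVM volume stacked on top of MDRAID will be reported as LVM.

```python
import subprocess
import sys


def classify_source(source: str) -> str:
    """Heuristically classify the block device backing a mount point."""
    if source.startswith("/dev/md"):
        return "mdraid"
    # LVM logical volumes are exposed as device-mapper nodes
    if source.startswith(("/dev/mapper/", "/dev/dm-")):
        return "lvm"
    return "other"


def mount_backing(mountpoint: str) -> str:
    """Return the classification of the source device mounted at `mountpoint`."""
    result = subprocess.run(
        ["findmnt", "-n", "-o", "SOURCE", mountpoint],
        capture_output=True,
        text=True,
        check=True,
    )
    return classify_source(result.stdout.strip())


if __name__ == "__main__" and len(sys.argv) > 1:
    print(mount_backing(sys.argv[1]))
```

Run as `python3 srv_backing.py /srv` (hypothetical file name) on each host and compare the result against which hosts take a long time to reboot.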
bking added a comment.
Actions tried so far: disabling swap via systemd before rebooting. This worked
on `wdqs2007` but not on `wdqs2002`. Also worth noting: we had previously
rebooted `wdqs2007` within the last 30 minutes, so a minor kernel
update (from 4.19.0-16-amd64 to 4.19.0-20
bking added a comment.
This is still happening. @RKemper found some interesting links that could
explain this behavior:
https://wiki.freedesktop.org/www/Software/systemd/Debugging/#diagnosingshutdownproblems
https://old.reddit.com/r/archlinux/comments/ba3zec
bking added a comment.
Per conversation with dcausse, we could potentially run jstack on a timer and
grep the output for errors as shown above, then alert and/or remediate.
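A timer-driven check along these lines could look like the sketch below. It assumes the signal of interest is the JVM's own deadlock report (HotSpot's `jstack` prints "Found one Java-level deadlock:" when it detects one); the errors "shown above" are not part of this excerpt, so the pattern is a parameter. It also assumes the Blazegraph PID is passed in and that the script runs as the same user as the JVM, which `jstack` requires. Exit codes follow the Nagios/Icinga plugin convention (0 OK, 2 CRITICAL, 3 UNKNOWN) so the result can feed an alert.

```python
import re
import subprocess
import sys

# Assumption: HotSpot's built-in deadlock report is what we grep for.
DEFAULT_PATTERN = re.compile(r"Java-level deadlock")

# Nagios/Icinga-style plugin exit codes
OK, CRITICAL, UNKNOWN = 0, 2, 3


def has_deadlock(jstack_output: str, pattern: re.Pattern = DEFAULT_PATTERN) -> bool:
    """Return True if the thread dump matches the deadlock pattern."""
    return pattern.search(jstack_output) is not None


def dump_threads(pid: int, timeout: float = 60.0) -> str:
    """Run `jstack <pid>` and return its stdout; raises on failure or timeout."""
    result = subprocess.run(
        ["jstack", str(pid)],
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,
    )
    return result.stdout


def main(pid: int) -> int:
    try:
        output = dump_threads(pid)
    except (OSError, subprocess.SubprocessError) as exc:
        print(f"UNKNOWN: jstack failed: {exc}")
        return UNKNOWN
    if has_deadlock(output):
        print(f"CRITICAL: deadlock detected in JVM {pid}")
        return CRITICAL
    print(f"OK: no deadlock detected in JVM {pid}")
    return OK


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(int(sys.argv[1])))
```

A systemd timer (or the existing monitoring agent) could run this every few minutes; remediation, such as restarting Blazegraph on a CRITICAL result, is deliberately left out of the sketch.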
TASK DETAIL
https://phabricator.wikimedia.org/T242453
EMAIL PREFERENCES
https://phabricator.wikimedia.org/settings
bking renamed this task from "Deadlock in blazegraph blocking all queries and
updates" to "Detect and alert and/or remediate Blazegraph deadlocks".
TASK DETAIL
https://phabricator.wikimedia.org/T242453
bking added a comment.
Per messages above, we have completely failed over the wdqs and wdqs-internal
services from eqiad to codfw.
TASK DETAIL
https://phabricator.wikimedia.org/T302494
To: RKemper
bking claimed this task.
TASK DETAIL
https://phabricator.wikimedia.org/T303134
To: bking
Cc: jbond, Aklapper, Astuthiodit_1, karapayneWMDE, Invadibot, MPhamWMF,
maantietaja, CBogen, Akuckartz, Nandana
bking added a comment.
Suggestions:
- Data reload
- Server reimage
- Hardware tests
- Close observation over a limited time
TASK DETAIL
https://phabricator.wikimedia.org/T301953
To: bking
bking claimed this task.
TASK DETAIL
https://phabricator.wikimedia.org/T301953
To: bking
Cc: bking, Aklapper, Zbyszko, Astuthiodit_1, karapayneWMDE, Invadibot,
MPhamWMF, maantietaja, CBogen, Akuckartz
bking added a comment.
Manually installed on wdqs1010
TASK DETAIL
https://phabricator.wikimedia.org/T293862
To: dcausse, bking
Cc: bking, Aklapper, dcausse, Astuthiodit_1, karapayneWMDE, Invadibot
bking added a comment.
Started the data load in a tmux session on cumin1001 at ~`Tue Jan 11 16:53:46
2022`. It is expected to take at least 24 hours.
TASK DETAIL
https://phabricator.wikimedia.org/T296470
bking added a comment.
Related commits are here:
<https://gerrit.wikimedia.org/r/plugins/gitiles/operations/alerts/+log/refs/heads/master/team-search-platform/blazegraph.yaml>
TASK DETAIL
https://phabricator.wikimedia.org/T298525
bking renamed this task from "Tune "BlazegraphFreeAllocatorsDecreasingRapidly""
to "Tune "BlazegraphFreeAllocatorsDecreasingRapidly" alerts".
TASK DETAIL
https://phabricator.wikimedia.org/T298525
bking added a subscriber: dcausse.
bking added a comment.
More context from @dcausse:
The alert is managed by Alertmanager; its code is stored in Gerrit:
<https://gerrit.wikimedia.org/r/plugins/gitiles/operations/alerts/+/refs/heads/master/team-search-platform/blazegraph.yaml>
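The actual rule expression lives in blazegraph.yaml and is not reproduced here. As a purely hypothetical illustration of the kind of condition being tuned, the sketch below fits a least-squares line through (timestamp, free allocators) samples, which is the same per-second slope Prometheus's `deriv()` computes, and fires when the fitted drop per hour exceeds a threshold. The threshold and sample values are invented for illustration only.

```python
def least_squares_slope(samples: list[tuple[float, float]]) -> float:
    """Per-second slope of a least-squares line through (t, value) samples."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    cov = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    var = sum((t - mean_t) ** 2 for t, _ in samples)
    if var == 0:
        raise ValueError("samples must span more than one timestamp")
    return cov / var


def decreasing_rapidly(samples: list[tuple[float, float]], max_drop_per_hour: float) -> bool:
    """Hypothetical condition: the fitted trend loses more than the threshold per hour."""
    return -least_squares_slope(samples) * 3600 > max_drop_per_hour
```

Fitting a trend over a window, rather than comparing two raw points, makes the condition less sensitive to a single noisy scrape; tuning then mostly comes down to the window length and the threshold.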