[Wikidata-bugs] [Maniphest] T269693: mediawiki_job_wikidata-updateQueryServiceLag failing
RLazarus closed this task as "Resolved". RLazarus added a comment. Yep, the alert has cleared. Thanks! TASK DETAIL https://phabricator.wikimedia.org/T269693 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: RKemper, RLazarus Cc: dcausse, Dzahn, RLazarus, RKemper, Aklapper, Ramtin0071, Devnull, lmata, Muchiri124, CBogen, Akuckartz, Legado_Shulgin, Nandana, Namenlos314, Davinaclare77, Qtn1293, Techguru.pc, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, Th3d3v1ls, Hfbn0, QZanden, EBjune, merbst, LawExplorer, Zppix, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, Wong128hk, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, faidon, Mbch331, Rxy, Jay8g, fgiunchedi ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] T269693: mediawiki_job_wikidata-updateQueryServiceLag failing
dcausse assigned this task to RKemper. dcausse moved this task from Ready for Development to Needs Reporting on the Discovery-Search (Current work) board. dcausse added a comment. Seems to be resolved now: dcausse@mwmaint1002:~$ mwscript extensions/Wikidata.org/maintenance/updateQueryServiceLag.php --wiki wikidatawiki --cluster wdqs --prometheus prometheus.svc.eqiad.wmnet --prometheus prometheus.svc.codfw.wmnet Done. TASK DETAIL https://phabricator.wikimedia.org/T269693 WORKBOARD https://phabricator.wikimedia.org/project/board/1227/ EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: RKemper, dcausse Cc: dcausse, Dzahn, RLazarus, RKemper, Aklapper, Ramtin0071, Devnull, lmata, Muchiri124, CBogen, Akuckartz, Legado_Shulgin, Nandana, Namenlos314, Davinaclare77, Qtn1293, Techguru.pc, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, Th3d3v1ls, Hfbn0, QZanden, EBjune, merbst, LawExplorer, Zppix, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, Wong128hk, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, faidon, Mbch331, Rxy, Jay8g, fgiunchedi ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] T269693: mediawiki_job_wikidata-updateQueryServiceLag failing
CBogen set the point value for this task to "3". TASK DETAIL https://phabricator.wikimedia.org/T269693 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: CBogen Cc: Dzahn, RLazarus, RKemper, Aklapper, Ramtin0071, Devnull, lmata, Muchiri124, CBogen, Akuckartz, Legado_Shulgin, Nandana, Namenlos314, Davinaclare77, Qtn1293, Techguru.pc, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, Th3d3v1ls, Hfbn0, QZanden, EBjune, merbst, LawExplorer, Zppix, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, Wong128hk, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, faidon, Mbch331, Rxy, Jay8g, fgiunchedi ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] T269693: mediawiki_job_wikidata-updateQueryServiceLag failing
CBogen moved this task from All WDQS-related tasks to Current work on the Wikidata-Query-Service board. CBogen added a project: Discovery-Search (Current work). TASK DETAIL https://phabricator.wikimedia.org/T269693 WORKBOARD https://phabricator.wikimedia.org/project/board/891/ EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: CBogen Cc: Dzahn, RLazarus, RKemper, Aklapper, Ramtin0071, Devnull, lmata, Muchiri124, CBogen, Akuckartz, Legado_Shulgin, Nandana, Namenlos314, Davinaclare77, Qtn1293, Techguru.pc, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, Th3d3v1ls, Hfbn0, QZanden, EBjune, merbst, LawExplorer, Zppix, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, Wong128hk, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, faidon, Mbch331, Rxy, Jay8g, fgiunchedi ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] T269693: mediawiki_job_wikidata-updateQueryServiceLag failing
RKemper added a comment. Job lives here: https://github.com/wikimedia/mediawiki-extensions-Wikidata.org/blob/60c5f96ebf424b792077bb7c6b533a68702e7aea/maintenance/updateQueryServiceLag.php#L70 I have a patch open here: https://gerrit.wikimedia.org/r/c/operations/puppet/+/646888 that addresses a different manifestation of this same issue with `blazegraph_lastupdated`. Currently it looks like we'll want to switch `blazegraph_lastupdated` from a `Counter` to a `Gauge`, but I need to figure out if changing the metric type will break the behavior in `updateQueryServiceLag.php` or elsewhere. TASK DETAIL https://phabricator.wikimedia.org/T269693 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: RKemper Cc: Dzahn, RLazarus, RKemper, Aklapper, Ramtin0071, Devnull, lmata, Muchiri124, CBogen, Akuckartz, Legado_Shulgin, Nandana, Namenlos314, Davinaclare77, Qtn1293, Techguru.pc, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, Th3d3v1ls, Hfbn0, QZanden, EBjune, merbst, LawExplorer, Zppix, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, Wong128hk, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, faidon, Mbch331, Rxy, Jay8g, fgiunchedi ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] T269693: mediawiki_job_wikidata-updateQueryServiceLag failing
Dzahn added a comment. I ran the command manually as the same user, www-data. The error is simply "Failed to get lag from prometheus". @mwmaint1002:~# sudo -u www-data /usr/local/bin/mwscript extensions/Wikidata.org/maintenance/updateQueryServiceLag.php --wiki wikidatawiki --cluster wdqs --prometheus prometheus.svc.eqiad.wmnet --prometheus prometheus.svc.codfw.wmnet Failed to get lag from prometheus TASK DETAIL https://phabricator.wikimedia.org/T269693 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Dzahn Cc: Dzahn, RLazarus, RKemper, Aklapper, Ramtin0071, Devnull, lmata, Muchiri124, CBogen, Akuckartz, Legado_Shulgin, Nandana, Namenlos314, Davinaclare77, Qtn1293, Techguru.pc, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, Th3d3v1ls, Hfbn0, QZanden, EBjune, merbst, LawExplorer, Zppix, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, Wong128hk, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, faidon, Mbch331, Rxy, Jay8g, fgiunchedi ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] T269693: mediawiki_job_wikidata-updateQueryServiceLag failing
Dzahn added a comment. ExecStart=/usr/local/bin/mw-cli-wrapper /usr/local/bin/mwscript extensions/Wikidata.org/maintenance/updateQueryServiceLag.php --wiki wikidatawiki --cluster wdqs --prometheus prometheus.svc.eqiad.wmnet --prometheus prometheus.svc.codfw.wmnet TASK DETAIL https://phabricator.wikimedia.org/T269693 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Dzahn Cc: Dzahn, RLazarus, RKemper, Aklapper, Ramtin0071, Devnull, lmata, Muchiri124, CBogen, Akuckartz, Legado_Shulgin, Nandana, Namenlos314, Davinaclare77, Qtn1293, Techguru.pc, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, Th3d3v1ls, Hfbn0, QZanden, EBjune, merbst, LawExplorer, Zppix, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, Wong128hk, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, faidon, Mbch331, Rxy, Jay8g, fgiunchedi ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] T269693: mediawiki_job_wikidata-updateQueryServiceLag failing
RKemper added a comment. There's some context in the description of https://phabricator.wikimedia.org/T269204 that mentions that the counter metric `blazegraph_lastupdated` is now `blazegraph_lastupdated_total`, so if the`mediawiki_job_wikidata-updateQueryServiceLag` job has to do with that metric then the re-image is likely the source of the problem TASK DETAIL https://phabricator.wikimedia.org/T269693 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: RKemper Cc: Dzahn, RLazarus, RKemper, Aklapper, Ramtin0071, Devnull, lmata, Muchiri124, CBogen, Akuckartz, Legado_Shulgin, Nandana, Namenlos314, Davinaclare77, Qtn1293, Techguru.pc, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, Th3d3v1ls, Hfbn0, QZanden, EBjune, merbst, LawExplorer, Zppix, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, Wong128hk, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, faidon, Mbch331, Rxy, Jay8g, fgiunchedi ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] T269693: mediawiki_job_wikidata-updateQueryServiceLag failing
Dzahn added a comment. [20:17:19] ryankemper: I guess it makes sense that "job_wikidata-updateQueryServiceLag" could not run during current work [20:19:46] mutante: yeah, I'm not familiar with how the job works specifically but that would make sense [20:20:42] if it's supposed to alert at <50% availability then that might be a bit unexpected because only one node in each `dc x [internal, external]` is being re-imaged at a time [20:21:12] So for codfw for example there's one codfw wdqs-internal host that would be unable to report and one codfw external wdqs host [20:21:48] ryankemper: no, this case is not about availability, it's "one of the mediawiki maintenance 'crons' (that are now systemd timers) failed to run on the maintenance servers [20:26:03] ryankemper: what happens is: "maintenance job tries to update what the current lag is like.. tries to get lag data from prometheus and that fails. now since it's a systemd timer and not a cron it means it's a failed service which then turns into an Icinga alert about "systemd state is bad on a mwmaint server". and nothing clears it.. but this job runs every minute.. so it's more alert than [20:26:09] is appropriate [20:26:35] let me just clear that failed service and wait a minute [20:27:14] !log mwmaint1002 - systemctl reset-failed to clear icinga alert TASK DETAIL https://phabricator.wikimedia.org/T269693 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Dzahn Cc: Dzahn, RLazarus, RKemper, Aklapper, Ramtin0071, Devnull, lmata, Muchiri124, CBogen, Akuckartz, Legado_Shulgin, Nandana, Namenlos314, Davinaclare77, Qtn1293, Techguru.pc, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, Th3d3v1ls, Hfbn0, QZanden, EBjune, merbst, LawExplorer, Zppix, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, Wong128hk, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, faidon, Mbch331, Rxy, Jay8g, fgiunchedi ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] T269693: mediawiki_job_wikidata-updateQueryServiceLag failing
jbond triaged this task as "Medium" priority. TASK DETAIL https://phabricator.wikimedia.org/T269693 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: jbond Cc: RLazarus, RKemper, Aklapper, Devnull, lmata, Muchiri124, CBogen, Akuckartz, Legado_Shulgin, Nandana, Namenlos314, Davinaclare77, Qtn1293, Techguru.pc, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, Th3d3v1ls, Hfbn0, QZanden, EBjune, merbst, LawExplorer, Zppix, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, Wong128hk, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, faidon, Mbch331, Rxy, Jay8g, fgiunchedi ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] T269693: mediawiki_job_wikidata-updateQueryServiceLag failing
RLazarus created this task. RLazarus added projects: Wikidata-Query-Service, Operations. Restricted Application added a subscriber: Aklapper. Restricted Application added a project: Wikidata. TASK DESCRIPTION This systemd timer runs every minute, but the last time it succeeded was Dec 7, 19:41 UTC. Since then it's been failing consistently; journalctl looks like: Dec 08 16:52:00 mwmaint1002 systemd[1]: Started MediaWiki periodic job wikidata-updateQueryServiceLag. Dec 08 16:52:01 mwmaint1002 mediawiki_job_wikidata-updateQueryServiceLag[131972]: Failed to get lag from prometheus Dec 08 16:52:01 mwmaint1002 systemd[1]: mediawiki_job_wikidata-updateQueryServiceLag.service: Main process exited, code=exited, status=1/FAILURE Dec 08 16:52:01 mwmaint1002 systemd[1]: mediawiki_job_wikidata-updateQueryServiceLag.service: Unit entered failed state. Dec 08 16:52:01 mwmaint1002 systemd[1]: mediawiki_job_wikidata-updateQueryServiceLag.service: Failed with result 'exit-code'. SAL shows @RKemper was reimaging WDQS hosts at the time it started failing, not sure if that's related or coincidence. TASK DETAIL https://phabricator.wikimedia.org/T269693 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: RLazarus Cc: RLazarus, RKemper, Aklapper, Devnull, lmata, Muchiri124, CBogen, Akuckartz, Legado_Shulgin, Nandana, Namenlos314, Davinaclare77, Qtn1293, Techguru.pc, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, Th3d3v1ls, Hfbn0, QZanden, EBjune, merbst, LawExplorer, Zppix, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, Wong128hk, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, faidon, Mbch331, Rxy, Jay8g, fgiunchedi ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs