Gehel added a comment.

So it seems that puppet failures on wdqs1001 are announced on IRC (#wikidata), but 'WDQS HTTP Port' failures are not. Comparing the two check definitions in neon:/etc/icinga/puppet_services.cfg, I don't see a significant difference:

define service {
# --PUPPET_NAME-- wdqs1001 WDQS_Internal_HTTP_endpoint
        active_checks_enabled          1
        check_command                  nrpe_check!check_WDQS_Internal_HTTP_endpoint!10
        check_freshness                0
        check_period                   24x7
        contact_groups                 admins,wdqs-admins
        host_name                      wdqs1001
        is_volatile                    0
        max_check_attempts             3
        normal_check_interval          1
        notification_interval          0
        notification_options           c,r,f
        notification_period            24x7
        passive_checks_enabled         1
        retry_check_interval           1
        servicegroups                  wdqs_eqiad
        service_description            WDQS HTTP Port

}
define service {
# --PUPPET_NAME-- wdqs1001 puppet_checkpuppetrun
        active_checks_enabled          1
        check_command                  nrpe_check!check_puppet_checkpuppetrun!10
        check_freshness                0
        check_period                   24x7
        contact_groups                 admins,wdqs-admins
        host_name                      wdqs1001
        is_volatile                    0
        max_check_attempts             3
        normal_check_interval          1
        notification_interval          0
        notification_options           c,r,f
        notification_period            24x7
        passive_checks_enabled         1
        retry_check_interval           1
        servicegroups                  wdqs_eqiad
        service_description            puppet last run

}
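
To make the comparison less dependent on eyeballing, the two definitions can be diffed directive by directive. The following is only a minimal sketch: the helper names parse_services and diff_services are hypothetical, the sample is an abridged copy of the definitions quoted above, and the parser assumes simple "key value" lines with no nested braces.

```python
import re

# Abridged copies of the two definitions quoted above (only a few directives kept).
SAMPLE = """
define service {
# --PUPPET_NAME-- wdqs1001 WDQS_Internal_HTTP_endpoint
        check_command                  nrpe_check!check_WDQS_Internal_HTTP_endpoint!10
        contact_groups                 admins,wdqs-admins
        notification_options           c,r,f
        service_description            WDQS HTTP Port
}
define service {
# --PUPPET_NAME-- wdqs1001 puppet_checkpuppetrun
        check_command                  nrpe_check!check_puppet_checkpuppetrun!10
        contact_groups                 admins,wdqs-admins
        notification_options           c,r,f
        service_description            puppet last run
}
"""


def parse_services(text):
    """Parse `define service { ... }` blocks into a list of directive dicts."""
    services = []
    for body in re.findall(r"define\s+service\s*\{(.*?)\}", text, flags=re.S):
        directives = {}
        for line in body.splitlines():
            line = line.strip()
            # Skip blank lines and comments such as the --PUPPET_NAME-- marker.
            if not line or line.startswith("#"):
                continue
            parts = line.split(None, 1)
            directives[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
        services.append(directives)
    return services


def diff_services(a, b):
    """Return {directive: (value_in_a, value_in_b)} for directives that differ."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


http_check, puppet_check = parse_services(SAMPLE)
for key, (left, right) in diff_services(http_check, puppet_check).items():
    print(f"{key}: {left!r} != {right!r}")
```

On the abridged sample, only check_command and service_description differ, which matches the reading that the notification-related directives are identical.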

Looking at neon:/var/log/icinga/irc-wikidata.log, it seems that notifications were sent for both of those checks:

gehel@neon:/var/log/icinga$ grep -i puppet irc-wikidata.log | tail -n 2
PROBLEM - puppet last run on wdqs1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues  
RECOVERY - puppet last run on wdqs1001 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures  
gehel@neon:/var/log/icinga$ grep -i 'WDQS HTTP Port' irc-wikidata.log | tail -n 2
PROBLEM - WDQS HTTP Port on wdqs1002 is CRITICAL: Connection refused by host  
RECOVERY - WDQS HTTP Port on wdqs1002 is OK: TCP OK - 0.000 second response time on 127.0.0.1 port 80
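
Since the log mixes hosts (in the excerpt above the 'WDQS HTTP Port' lines are for wdqs1002, while the puppet lines are for wdqs1001), it may help to group the notifications by service and host. This is a rough sketch that assumes every line has the "TYPE - service on host is STATE: detail" shape seen in the excerpt; the regular expression and the function name hosts_by_service are illustrative, not part of Icinga.

```python
import re
from collections import defaultdict

# Assumed line shape, based only on the four lines quoted above.
LINE_RE = re.compile(
    r"^(?P<kind>PROBLEM|RECOVERY) - (?P<service>.+?) on (?P<host>\S+) "
    r"is (?P<state>[A-Z]+): (?P<detail>.*)$"
)

LOG_LINES = [
    "PROBLEM - puppet last run on wdqs1001 is CRITICAL: CRITICAL: Catalog fetch fail. "
    "Either compilation failed or puppetmaster has issues",
    "RECOVERY - puppet last run on wdqs1001 is OK: OK: Puppet is currently enabled, "
    "last run 32 seconds ago with 0 failures",
    "PROBLEM - WDQS HTTP Port on wdqs1002 is CRITICAL: Connection refused by host",
    "RECOVERY - WDQS HTTP Port on wdqs1002 is OK: TCP OK - 0.000 second response time "
    "on 127.0.0.1 port 80",
]


def hosts_by_service(lines):
    """Map each service description to the set of hosts it was notified for."""
    result = defaultdict(set)
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # Ignore lines that do not follow the assumed shape.
        result[match.group("service")].add(match.group("host"))
    return dict(result)


for service, hosts in hosts_by_service(LOG_LINES).items():
    print(f"{service}: {', '.join(sorted(hosts))}")
```

Filtering the result for wdqs1001 would show whether a 'WDQS HTTP Port' notification was ever logged for that host specifically.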

TASK DETAIL
https://phabricator.wikimedia.org/T144948
