[opsview-users] Customized graph per service check.
Hello.

The newest Opsview supports graph customization, but I can't save the current customized graph view configuration. I want to save a specific configuration for each service check. I think the feature could be added to the service check configuration menu.

___
Opsview-users mailing list
Opsview-users@lists.opsview.org
http://lists.opsview.org/listinfo/opsview-users
[opsview-users] opsview 3.0.4 problem.
Hello,

I've upgraded to version 3.0.4. I am using Firefox 3.0.10 on Windows XP. When I click the arrow button on the host detail, service detail and graph pages, the popup menu does not appear. With Internet Explorer there is no problem.

It also seems that the "all metrics" menu has disappeared from the arrow button popup menu on the graph view pages. Has it been removed deliberately?

I hope http://lists.opsview.org/lurker/message/20090507.210619.32fc0fd5.en.html will be included in version 3.1.
[opsview-users] Passive check state automatically revert to OK state.
Hello.

We have a centralized syslog server which collects logs from all servers, matches patterns with SEC, and notifies the master monitoring server (passive check) with the nsca_send command.

      (master) <--notify--+
       /    \             |
  (slave)  (slave)   (log server)

The master doesn't do any active checking; all checking is done by the slaves now.

But the state (Warning/Critical) set by the nsca_send command automatically reverts to OK at its own independent hourly intervals. (Watch the lines marked with an asterisk.)

I think these states should remain unchanged until they are manually cleared through the "submit check result" menu.

Why does this happen?

- sample log ---
*[12-06-2009 19:16:12] SERVICE ALERT: hostname;syslog_event;OK;HARD;1;.
 [12-06-2009 18:30:42] SERVICE ALERT: hostname;syslog_event;CRITICAL;HARD;1;security[success] 540 ..

*[12-06-2009 18:15:12] SERVICE ALERT: hostname;syslog_event;OK;HARD;1;.
 [12-06-2009 18:12:28] SERVICE ALERT: hostname;syslog_event;WARNING;HARD;1;security[success] 538 ..

*[12-06-2009 17:14:12] SERVICE ALERT: hostname;syslog_event;OK;HARD;1;.
 [12-06-2009 16:40:13] SERVICE ALERT: hostname;syslog_event;WARNING;HARD;1;security[success] .

*[12-06-2009 16:13:17] SERVICE ALERT: hostname;syslog_event;OK;HARD;1;.
 [12-06-2009 16:10:57] SERVICE ALERT: hostname;syslog_event;WARNING;HARD;1;security[success] .

*[12-06-2009 15:12:17] SERVICE ALERT: hostname;syslog_event;OK;HARD;1;.
 [12-06-2009 15:10:45] SERVICE ALERT: hostname;syslog_event;WARNING;HARD;1;security[success] .
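For readers unfamiliar with passive checks over NSCA: a submitted service result is normally one tab-separated record of host name, service description, return code and plugin output. The following is a minimal Python sketch of building such a record, assuming that standard input format applies to the setup described here. The function name is hypothetical, and the host and service names are simply the ones from the sample log.

```python
# Sketch: build one passive service check result in the tab-separated
# form that NSCA clients read on standard input. This is an assumption
# about the setup in this thread; passive_result() is an illustrative name.
STATES = {"OK": 0, "WARNING": 1, "CRITICAL": 2, "UNKNOWN": 3}


def passive_result(host, service, state, output):
    """Return 'host<TAB>service<TAB>code<TAB>output<NEWLINE>'."""
    if state not in STATES:
        raise ValueError("unknown state: %s" % state)
    # Tabs or newlines inside the plugin output would break the record.
    clean = output.replace("\t", " ").replace("\n", " ")
    return "%s\t%s\t%d\t%s\n" % (host, service, STATES[state], clean)


line = passive_result("hostname", "syslog_event", "CRITICAL",
                      "security[success] 540")
print(repr(line))
```

Nothing in such a record carries an expiry time, so a later OK result has to come from something else submitting or generating it, which is what the question above is about.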
Re: [opsview-users] Passive check state automatically revert to OK state.
I checked this again.

The automatic state changes are repeats of the last manual state change to OK.

Service Ok       [15-06-2009 10:57:44] SERVICE ALERT: hostname;syslog_event;OK;HARD;1;WILL THIS REPEAT?   <--- AUTOMATIC STATE REVERT (this should not happen)
Service Critical [15-06-2009 10:40:48] SERVICE ALERT: hostname;syslog_event;CRITICAL;HARD;1;security[success]

Service Ok       [15-06-2009 09:57:24] SERVICE ALERT: hostname;syslog_event;OK;HARD;1;WILL THIS REPEAT?   <--- MANUAL STATE CHANGE
Service Critical [15-06-2009 09:37:48] SERVICE ALERT: hostname;syslog_event;CRITICAL;HARD;1;security[success]
Re: [opsview-users] Passive check state automatically revert to OK state.
Hello

I checked again, based on my talk with you in the #opsv...@irc.freenode.net channel last night.

Sending the NSCA notifications to all slaves instead of the master still showed the same problem, so I changed the configuration as per Ton Voon's advice. After that, the state revert problem was solved.

2009/6/15 Duncan Ferguson
> I ran a test of this all weekend - basically set up a passive check, set the
> status to not-OK and left it. The state was not reverted.
>
> Can you provide more details of your setup? The log server reports directly
> to the master server? Are these checks therefore assigned to the master or
> one of the slaves? Even though the checks are passive the nsca events should
> be sent to the assigned monitoring server.
>
> Duncs
>
> --
> Duncan Ferguson
> Senior Developer
>
> Opsera Limited | Unit 69 Suttons Business Park
> Reading | Berkshire | RG6 1AZ | UK
>
> Phone: +44 (0) 845 057 7887
> Mobile: +44 (0) 7968 148 748
> Skype: duncan_j_ferguson  Email: duncan.fergu...@opsera.com
> www.opsera.com
>
> Opsera Limited is registered in the UK under Company Number 5396532. Our
> registered office is Gorse View, Horsell Rise, Woking, Surrey, GU21 4RB.
[opsview-users] check_snmp_linkstatus problem
Hello.

I am monitoring a Cisco Catalyst 4500 switch, but the network traffic graphs look weird (see the attached file): they show sparse dips.

I have seen this symptom before when a traffic counter (usually 32-bit) overflowed between check intervals. So I debugged the check_snmp_linkstatus script, but it reads the 64-bit counters correctly.

Where does this problem come from? My guesses are:

1. SNMP returns a 64-bit counter but Perl can't properly handle 64-bit integers. (Does it need something like the bigint module?)
2. The check_snmp_linkstatus script can't properly handle a counter reset.
3. The check_snmp_linkstatus script returns the correct value but Nagios can't properly handle 64-bit integers.
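To make the overflow scenario concrete, here is a small Python sketch of how a wrapped counter can be turned into a correct delta with modular subtraction. It is not the logic of check_snmp_linkstatus, which is not shown in this thread; the function names and sample values are hypothetical. Python integers have arbitrary precision, so guess 1 (integer width in Perl) cannot be reproduced this way.

```python
def counter_delta(previous, current, bits=64):
    """Difference between two samples of a wrapping SNMP counter.

    Modular subtraction gives the right answer across a single wrap.
    It cannot detect a device reboot, where the counter restarts at 0.
    """
    modulus = 1 << bits
    return (current - previous) % modulus


def rate_per_second(previous, current, interval_seconds, bits=64):
    """Average rate between two samples taken interval_seconds apart."""
    return counter_delta(previous, current, bits) / float(interval_seconds)


# A 32-bit counter that wrapped between two checks (made-up values).
print(counter_delta(4294967000, 704, bits=32))
# A naive subtraction of the same samples goes negative, which graphs
# as a dip or gets discarded as invalid.
print(704 - 4294967000)
```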
[opsview-users] NMIS ifAlias problem
Hello

Our Cisco 4500 switch has ifAlias configured.

$ snmpwalk -v 2c -c community serverip .1.3.6.1.2.1.31.1.1.1.18
IF-MIB::ifAlias.2 = STRING: ## AS1 0/1 ##
IF-MIB::ifAlias.3 = STRING: ## AS1 0/2 ##
IF-MIB::ifAlias.4 = STRING: ## AS2 0/1 ##
IF-MIB::ifAlias.5 = STRING: ## AS2 0/2 ##
IF-MIB::ifAlias.6 = STRING: ## LNX ##
.
.
.

But NMIS complains that there is no ifAlias:

This interface will not be collected of the next reason
Node        = xxx.xxx.xxx.xxx
Interface   = GigabitEthernet4/30 (ifDescr)
Type        = ethernetCsmacd (ifType)
Description = (ifAlias)
Reason      = no Description (ifAlias)

Where does NMIS look for ifAlias? Is there another place where ifAlias exists?
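One way to narrow this down is to check whether the specific interface NMIS rejects really has an alias in the walk output. The Python sketch below parses snmpwalk lines of the form shown above into an index-to-alias map and lists the indexes with an empty alias. The function name is hypothetical, and the `ifAlias.7` line in the sample is invented to illustrate an unset alias.

```python
import re

# Matches lines such as: IF-MIB::ifAlias.2 = STRING: ## AS1 0/1 ##
LINE = re.compile(r"^IF-MIB::ifAlias\.(\d+) = STRING:\s?(.*)$")


def parse_ifalias(walk_output):
    """Map ifIndex -> ifAlias text from snmpwalk output."""
    aliases = {}
    for raw in walk_output.splitlines():
        match = LINE.match(raw.strip())
        if match:
            aliases[int(match.group(1))] = match.group(2).strip()
    return aliases


sample = "\n".join([
    "IF-MIB::ifAlias.2 = STRING: ## AS1 0/1 ##",
    "IF-MIB::ifAlias.3 = STRING: ## AS1 0/2 ##",
    "IF-MIB::ifAlias.7 = STRING: ",  # hypothetical port without an alias
])
aliases = parse_ifalias(sample)
empty = sorted(index for index, alias in aliases.items() if not alias)
print(empty)
```

The index that corresponds to GigabitEthernet4/30 would still have to be looked up in ifDescr; that mapping is not shown in the message.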
[opsview-users] More RRDGraph options.
Hello.

I want to configure the lower and upper limits of the vertical axis of graphs.

In Opsview 2.0 this is possible with the rrdopts parameter ( http://docs.opsview.org/doku.php?id=opsview3:faq#performance_graphs_show_an_arbitrary_range ).

But it seems that the feature was removed in Opsview 3.X. The Opsview/Web/Controller/RRDgraph.pm file doesn't support all RRD options and doesn't accept custom RRD options the way the rrdopts parameter did in Opsview 2.X.

I hope that Opsview 3.X will support more RRD options.
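For reference, fixed axis limits correspond to the `--lower-limit`, `--upper-limit` and `--rigid` options of `rrdtool graph`. The Python sketch below only assembles such a command line. It says nothing about how Opsview's RRDgraph.pm builds its graphs, and the file paths, data source name and function name are hypothetical.

```python
def rrd_graph_args(png_path, rrd_path, ds_name, lower=None, upper=None):
    """Build an 'rrdtool graph' argument list with optional y-axis limits."""
    args = ["rrdtool", "graph", png_path]
    if lower is not None:
        args += ["--lower-limit", str(lower)]
    if upper is not None:
        args += ["--upper-limit", str(upper)]
    if lower is not None or upper is not None:
        # Without --rigid rrdtool may still autoscale past the limits.
        args.append("--rigid")
    args += [
        "DEF:v=%s:%s:AVERAGE" % (rrd_path, ds_name),
        "LINE1:v#0000FF:%s" % ds_name,
    ]
    return args


# Hypothetical paths and data source name.
print(" ".join(rrd_graph_args("/tmp/load.png", "/tmp/load.rrd", "load1",
                              lower=0, upper=10)))
```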
[opsview-users] Opsview daemon occasionally dies when applying "Cancel all hostgroup downtime"
Hello

The Opsview daemon occasionally dies when applying "Cancel all hostgroup downtime", and the server status icon in the bottom status bar turns red.

Has anyone experienced this?
[opsview-users] PIDFILE Problem with Catalyst::Engine::HTTP::Prefork 0.50
Hello

http://docs.opsview.org/doku.php?id=opsview3.1:prefork explains how to improve web UI performance.

But the Catalyst::Engine::HTTP::Prefork 0.50 module included in Opsview 3.1 has a bug: it can't properly create the pid file, so the /etc/init.d/opsview-web script doesn't work properly either.

If you want it to work properly, see http://dev.catalystframework.org/svnweb/Catalyst/revision/?rev=10422 and apply the patch. (Catalyst::Engine::HTTP::Prefork 0.51 solves the problem.)

Thank you.
Re: [opsview-users] Opsview daemon occasionally dies when applying "Cancel all hostgroup downtime"
Hello

The problem is reproduced.

Q) Which exact version of Opsview are you running, on which platform?
A) Ubuntu 8.04 Hardy 64-bit, Opsview 3.1

Q) Does the nagios process die or the opsviewd?
A) opsviewd is alive, but the nagios process is dead.

Q) What are in the /var/log/opsview/opsviewd.log and the /usr/local/nagios/var/nagios.log?
A)
[2009/07/28 18:48:02] [opsviewd] [INFO] Processing commands found in /usr/local/nagios/var/slave_commands.cache
[2009/07/28 18:48:03] [sendcmd2slaves] [INFO] Sending commands to slaves
[2009/07/28 18:48:03] [sendcmd2slaves] [INFO] Commands sent to slaves
[2009/07/28 18:48:12] [opsviewd] [INFO] Processing commands found in /usr/local/nagios/var/slave_commands.cache
[2009/07/28 18:48:13] [sendcmd2slaves] [INFO] Sending commands to slaves
[2009/07/28 18:48:13] [sendcmd2slaves] [INFO] Commands sent to slaves

[1248774475] EXTERNAL COMMAND: SCHEDULE_HOST_SVC_DOWNTIME;ns.myhost.net;1248774471;1248781671;1;0;;admin;Host 'ns.myhost.net': test
[1248774475] EXTERNAL COMMAND: SCHEDULE_HOST_DOWNTIME;ns.myhost.net;1248774471;1248781671;1;0;;admin;Host 'ns.myhost.net': test
[1248774475] SERVICE DOWNTIME ALERT: ns.myhost.net;DNS;STARTED; Service has entered a period of scheduled downtime
[1248774475] HOST DOWNTIME ALERT: ns.myhost.net;STARTED; Host has entered a period of scheduled downtime
[1248774488] EXTERNAL COMMAND: DEL_HOSTGROUP_SVC_DOWNTIME;External Servers
[1248774488] Caught SIGSEGV, shutting down...

-> The nagios process is dead.

2009/7/28 Ton Voon
> Hi Kang,
>
> No, this has not been reported.
>
> Does the nagios process die or the opsviewd? What are in the
> /var/log/opsview/opsviewd.log and the /usr/local/nagios/var/nagios.log?
>
> Which exact version of Opsview are you running, on which platform?
>
> Ton
Re: [opsview-users] Opsview daemon occasionally dies when applying "Cancel all hostgroup downtime"
Hello

> Please send the coredump file, the nagios executable and the strace
> output, tarred and gzipped.

Sorry, the core dump file contains too much information about our infrastructure to send. The following is the gdb backtrace.

$ sudo gdb /usr/local/nagios/bin/nagios core.6419
.
.
Core was generated by `/usr/local/nagios/bin/nagios -uxd /usr/local/nagios/etc/nagios.cfg'.
Program terminated with signal 11, Segmentation fault.
[New process 6419]
[New process 6421]
#0 0x0042a96d in cmd_delete_downtime ()
(gdb) bt
#0 0x0042a96d in cmd_delete_downtime ()
#1 0x004277bf in process_external_command2 ()
#2 0x0042741c in process_external_command1 ()
#3 0x00425ce9 in check_for_external_commands ()
#4 0x004319df in event_execution_loop ()
#5 0x00413d31 in main ()
(gdb)

> What does 'file /usr/local/nagios/bin/nagios' and 'uname -a' give?

/usr/local/nagios/bin/nagios: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), for GNU/Linux 2.6.8, dynamically linked (uses shared libs), stripped

Linux HOST 2.6.24-24-server #1 SMP Fri Jul 24 22:44:54 UTC 2009 x86_64 GNU/Linux
[opsview-users] /usr/local/nagios/bin/query_host problem.
Hello

I encountered a weird error when adding switch ports with the query_host menu, but the XML in the error message seemed to have no problem. So I redirected the XML output to a file and checked it.

I found that the file contains a \x00 character in the ifAlias section. The \x00 character in the XML caused the XML parsing error. I think some switches occasionally return a \x00 character when ifAlias is not set.

So I patched the /usr/local/nagios/bin/query_host file by adding

$hash->{ifAlias} =~ s/\x00//g;

at line 311 of the query_host file.
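The effect of the patch can be illustrated outside of Opsview. The Python sketch below strips NUL characters from a string before handing it to an XML parser, since NUL is not a legal XML character. It is only an illustration of the idea behind the Perl one-liner above: the element names and the sample alias are hypothetical and do not reflect the actual query_host output format.

```python
import xml.etree.ElementTree as ET


def strip_nul(text):
    """Remove NUL characters, which XML 1.0 does not allow."""
    return text.replace("\x00", "")


# Hypothetical document with a NUL byte inside the ifAlias element.
raw = "<interface><ifAlias>uplink\x00</ifAlias></interface>"

try:
    ET.fromstring(raw)
    parsed_raw = True
except (ET.ParseError, ValueError):
    # The unsanitized document is rejected by the parser.
    parsed_raw = False

alias = ET.fromstring(strip_nul(raw)).findtext("ifAlias")
print(parsed_raw, alias)
```

The NUL is invisible in most terminals and editors, which would explain why the XML in the error message looked fine.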
Re: [opsview-users] Opsview daemon occasionally dies when applying"Cancel all hostgroup downtime"
Hello

I retried with the new Opsview 3.3.0, but the problem is still unsolved.

It seems the problem occurs when scheduling and then canceling downtime for only *one host* (not a hostgroup).

2009/8/24 unix
> Occasionally for us too, when using "Cancel all host downtime" for one
> host.
> Not so big a problem for us, the cluster service restarts opsview.
> Running opsview 3.1 and Red Hat Enterprise Linux Server release 5.3 64-bit.
>
> /Urban
[opsview-users] Opsview 3.3.0's Object tab in Graph Configuration is not working
Hello

I upgraded to Opsview 3.3.0, but the Object tab in Graph Configuration is not working, so I can't graph data from multiple servers.

Is there any patch available?
[opsview-users] The way for fully I18N and UTF-8-ized opsview.
Hello.

Opsview 3.3.0 started to support i18n, but only for latin1-based languages, even though all Opsview web pages are UTF-8 encoded. Recently I dug into the Opsview source code to support Korean characters (UTF-8 encoding).

To support real i18n:

*1. Opsview should explicitly set its database encoding to UTF-8.*

CREATE DATABASE [DBNAME] DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci;

Most default MySQL installations are latin1-based. (I don't want to modify the MySQL server configuration, and this way it works without depending on the server settings.) If you want to create a UTF-8 encoded database, you should pass the option explicitly when creating the DB.

*2. The DB connect string should contain ";mysql_enable_utf8=1".*

Most default MySQL installations are latin1-based, and some old MySQL versions don't properly handle UTF-8 client connections. So I modified opsview.conf, which overrides the opsview.default file, to ensure the client connection is UTF-8, like "set names utf8":

$dbhost = "localhost;mysql_enable_utf8=1";
..
..

After doing those things, some database columns (the alias column in the hosts table) handle Korean characters (UTF-8 encoding) well. But I wanted more columns to handle UTF-8 encoding, for example the description column in the keywords table.

So I modified the DBIx::Class code in the /usr/local/nagios/lib/Opsview/Schema/Keywords.pm file:

__PACKAGE__->load_components(qw/UTF8Columns Core/);
__PACKAGE__->utf8_columns(qw/description/);

But it didn't work as I expected, so I searched the other code. Finally I found that Opsview has both DBIx::Class and Class::DBI ORM code. (I don't know which code is really used, or where.) So I modified the Class::DBI code ( /usr/local/nagios/lib/Opsview/Keyword.pm ):

__PACKAGE__->utf8_columns( qw/description/ );

After that, it worked well.

*3. All DB columns that are not restricted to alphanumeric characters should be enabled to handle UTF-8 characters.*

< DBIx::Class code >
__PACKAGE__->load_components(qw/UTF8Columns Core/);
__PACKAGE__->utf8_columns(qw/utf8_enabled_column1 utf8_enabled_column2/);
or
__PACKAGE__->load_components(qw/ForceUTF8/);

< Class::DBI code >
__PACKAGE__->utf8_columns(qw/utf8_enabled_column1 utf8_enabled_column2/);

References: http://dev.catalystframework.org/wiki/tutorialsandhowtos/using_unicode
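Why the connection and column encoding matter can be shown without a database. The Python sketch below takes a short Korean string, encodes it as UTF-8, and then misreads those bytes as latin1, which is roughly what happens when UTF-8 data passes through a latin1 connection or column. It is a generic illustration, not Opsview or MySQL code; the sample text and the helper name are made up.

```python
text = u"한글"  # sample Korean text (illustrative)
utf8_bytes = text.encode("utf-8")

# Reading UTF-8 bytes as latin1 never fails; it silently yields one
# wrong character per byte instead of one character per letter.
misread = utf8_bytes.decode("latin-1")

# The damage is reversible only if nothing re-encoded or truncated the
# string in between.
recovered = misread.encode("latin-1").decode("utf-8")


def fits_latin1(value):
    """True if every character of value exists in latin1."""
    try:
        value.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


print(len(text), len(utf8_bytes), len(misread))
print(recovered == text, fits_latin1(text))
```

Declaring the columns and the connection as UTF-8, as in steps 1 to 3, avoids the misreading step altogether.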
[opsview-users] Centralized mail notification on the master server
Hello.

Email notification in an Opsview distributed monitoring structure happens on each slave. But configuring mail on every slave is a chore, and I had the problem that a slave didn't send notification mail for a passive check.

So I tried to make all mail notifications happen on the master server. After learning that all notification by Atom is done by the master, I added the following code at the bottom of the /usr/local/bin/atom-generator file. It works as I expected.

Note: You must enable both "Notify by Email" and "Notify by WebFeed". Delete or disable the mail agent on every slave so that you don't receive duplicate mail.

Thank you.

- cut here --
# Send mail
if ( $ENV{NAGIOS_SERVICEDESC} ) {
    # Service notification
    my $mail_title = "$ENV{NAGIOS_NOTIFICATIONTYPE}: $ENV{NAGIOS_SERVICEDESC} is $ENV{NAGIOS_SERVICESTATE} on $ENV{NAGIOS_HOSTNAME}";
    my $mail_receiver = $ENV{NAGIOS_CONTACTEMAIL};
    my $mail_content = <<"MAIL1";
$ENV{NAGIOS_NOTIFICATIONTYPE}: $ENV{NAGIOS_SERVICEDESC} is $ENV{NAGIOS_SERVICESTATE} on host $ENV{NAGIOS_HOSTNAME}: $ENV{NAGIOS_SERVICEOUTPUT}

Service: $ENV{NAGIOS_SERVICEDESC}
Host: $ENV{NAGIOS_HOSTNAME}
Alias: $ENV{NAGIOS_HOSTALIAS}
Address: $ENV{NAGIOS_HOSTADDRESS}
State: $ENV{NAGIOS_SERVICESTATE}
Comment: $ENV{NAGIOS_SERVICEACKCOMMENT} ($ENV{NAGIOS_SERVICEACKAUTHOR})

Date/Time: $ENV{NAGIOS_LONGDATETIME}

Additional Info:
$ENV{NAGIOS_SERVICEOUTPUT}
MAIL1
    $mail_content =~ s/\\//g;    # strip backslashes from the message body
    # List-form open passes the subject as a single argument, without a shell.
    if ( open my $fh, '|-', '/usr/bin/Mail', '-s', $mail_title, $mail_receiver ) {
        print {$fh} $mail_content;
        close $fh;
    }
}
else {
    # Host notification
    my $mail_title = "$ENV{NAGIOS_NOTIFICATIONTYPE}: $ENV{NAGIOS_HOSTNAME} is $ENV{NAGIOS_HOSTSTATE}";
    my $mail_receiver = $ENV{NAGIOS_CONTACTEMAIL};
    my $mail_content = <<"MAIL2";
$ENV{NAGIOS_NOTIFICATIONTYPE}: $ENV{NAGIOS_HOSTNAME} is $ENV{NAGIOS_HOSTSTATE}: $ENV{NAGIOS_HOSTOUTPUT}

Host: $ENV{NAGIOS_HOSTNAME}
Alias: $ENV{NAGIOS_HOSTALIAS}
Address: $ENV{NAGIOS_HOSTADDRESS}
State: $ENV{NAGIOS_HOSTSTATE}
Comment: $ENV{NAGIOS_HOSTACKCOMMENT} ($ENV{NAGIOS_HOSTACKAUTHOR})

Date/Time: $ENV{NAGIOS_LONGDATETIME}

Info:
$ENV{NAGIOS_HOSTOUTPUT}
MAIL2
    $mail_content =~ s/\\//g;    # strip backslashes from the message body
    if ( open my $fh, '|-', '/usr/bin/Mail', '-s', $mail_title, $mail_receiver ) {
        print {$fh} $mail_content;
        close $fh;
    }
}
- cut here --
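The subject line in a script like this is built from check output and host names, so it is worth passing it to the mailer as one argument instead of splicing it into a shell command string. The Python sketch below shows that idea with the same NAGIOS_* environment variables. It is an illustration only, not part of atom-generator; the function name and the sample values are hypothetical.

```python
def mail_command(env):
    """Return (argv, subject) for a service or host notification.

    env is a mapping such as os.environ holding NAGIOS_* variables.
    """
    kind = env.get("NAGIOS_NOTIFICATIONTYPE", "")
    host = env.get("NAGIOS_HOSTNAME", "")
    if env.get("NAGIOS_SERVICEDESC"):
        subject = "%s: %s is %s on %s" % (
            kind, env["NAGIOS_SERVICEDESC"],
            env.get("NAGIOS_SERVICESTATE", ""), host)
    else:
        subject = "%s: %s is %s" % (
            kind, host, env.get("NAGIOS_HOSTSTATE", ""))
    # One list element per argument: quotes or semicolons in the
    # subject are never interpreted by a shell.
    argv = ["/usr/bin/Mail", "-s", subject,
            env.get("NAGIOS_CONTACTEMAIL", "")]
    return argv, subject


sample_env = {  # made-up values for illustration
    "NAGIOS_NOTIFICATIONTYPE": "PROBLEM",
    "NAGIOS_SERVICEDESC": "DNS",
    "NAGIOS_SERVICESTATE": "CRITICAL",
    "NAGIOS_HOSTNAME": "ns.myhost.net",
    "NAGIOS_CONTACTEMAIL": "admin@example.com",
}
argv, subject = mail_command(sample_env)
print(argv)
```

The resulting list could be handed to `subprocess.Popen(argv, stdin=subprocess.PIPE)` and the message body written to the pipe.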
[opsview-users] Host UNREACHABLE Problem
Hello

In Opsview 3.3.1 I have two hosts whose configurations have the same parents.

When the two hosts go down, their host states are different: one is DOWN, the other is UNREACHABLE.

Why does this happen?

- Generated host.cfg

gadgets01 -> UNREACHABLE
gadgets02 -> DOWN

# gadgets01 host definition
define host {
    host_name gadgets01
    alias gadgets01
    address 10.10.10.19
    hostgroups open
    icon_image linux.png
    icon_image_alt LOGO - Linux Penguin
    vrml_image linux.png
    statusmap_image linux.png
    action_url /info/host/364
    contact_groups hostgroup3_servicegroup19/distprofile,hostgroup3_servicegroup22/distprofile,hostgroup3_servicegroup17/distprofile,hostgroup3_servicegroup18/distprofile,hostgroup3_servicegroup3/distprofile
    parents Public_OPEN,10lan-OPEN
    notifications_enabled 1
    notification_interval 60
    notification_period 24x7
    notification_options u,d,r,f
    use host-global
}

# gadgets02 host definition
define host {
    host_name gadgets02
    alias gadgets02
    address 10.10.10.20
    hostgroups open
    icon_image linux.png
    icon_image_alt LOGO - Linux Penguin
    vrml_image linux.png
    statusmap_image linux.png
    action_url /info/host/365
    contact_groups hostgroup3_servicegroup19/distprofile,hostgroup3_servicegroup22/distprofile,hostgroup3_servicegroup17/distprofile,hostgroup3_servicegroup18/distprofile,hostgroup3_servicegroup3/distprofile
    parents Public_OPEN,10lan-OPEN
    notifications_enabled 1
    notification_interval 60
    notification_period 24x7
    notification_options u,d,r,f
    use host-global
}
Re: [opsview-users] Host UNREACHABLE Problem
Hi

I manually switched the hosts' state back to OK and ran tail and grep on the log for the two hosts.

[1253235965] API LOG: admin;PROCESS_HOST_CHECK_RESULT;gadgets01;0;test|
[1253235972] HOST ALERT: gadgets01;UP;HARD;1;test
[1253235972] HOST NOTIFICATION: admin/distprofile;gadgets01;UP;notify-by-atom;test
[1253235978] SERVICE ALERT: gadgets01;Linux CPU Usage;CRITICAL;SOFT;1;CHECK_NRPE: Socket timeout after 10 seconds.
[1253235978] SERVICE ALERT: gadgets01;Linux Network Usage;CRITICAL;SOFT;1;CHECK_NRPE: Socket timeout after 10 seconds.
[1253236011] API LOG: admin;PROCESS_HOST_CHECK_RESULT;gadgets02;0;test|
[1253236013] HOST ALERT: gadgets02;UP;HARD;1;test
[1253236013] HOST NOTIFICATION: admin/distprofile;gadgets02;UP;notify-by-atom;test
[1253236027] SERVICE ALERT: gadgets02;Linux Disk Usage;CRITICAL;SOFT;1;CHECK_NRPE: Socket timeout after 10 seconds.
[1253236037] SERVICE ALERT: gadgets02;Nagios Agent check;CRITICAL;SOFT;1;CHECK_NRPE: Socket timeout after 10 seconds.
[1253236037] SERVICE ALERT: gadgets01;Linux Load Average;CRITICAL;SOFT;1;CHECK_NRPE: Socket timeout after 10 seconds.
[1253236037] SERVICE ALERT: gadgets01;Nagios Agent check;CRITICAL;SOFT;1;CHECK_NRPE: Socket timeout after 10 seconds.
[1253236079] SERVICE ALERT: gadgets02;Linux Hardware Spec;CRITICAL;SOFT;1;CHECK_NRPE: Socket timeout after 10 seconds.
[1253236081] SERVICE ALERT: gadgets02;Linux TCP Established;CRITICAL;SOFT;1;CHECK_NRPE: Socket timeout after 10 seconds.
[1253236097] HOST ALERT: gadgets01;UNREACHABLE;SOFT;1;CRITICAL - 10.10.10.19: rta nan, lost 100%
[1253236126] SERVICE ALERT: gadgets02;Linux Load Average;CRITICAL;SOFT;1;CHECK_NRPE: Socket timeout after 10 seconds.
[1253236126] SERVICE ALERT: gadgets02;Syslogd Check;CRITICAL;SOFT;1;CHECK_NRPE: Socket timeout after 10 seconds.
[1253236158] HOST NOTIFICATION: admin/distprofile;gadgetsdb01;UNREACHABLE;notify-by-atom;CRITICAL - 10.10.10.21: rta nan, lost 100%
[1253236186] HOST ALERT: gadgets02;DOWN;SOFT;1;CRITICAL - 10.10.10.20: rta nan, lost 100%

There is no difference.

PS. I had modified the hosts.cfg generating section of nagconfgen.pl, changing check_interval from 0 to 5. The original line was:

check_interval 0 ; For the moment, set check_interval to 0 so hosts only checked on demand, like Nagios 2

2009/9/17 Ton Voon
> That sounds strange. Can you provide relevant nagios.log entries around
> this time?
>
> I'm guessing that it could be a very deep nagios host logic problem (I note
> there are two parents for each of these hosts), but I'd need to know the
> recreation steps.
>
> Ton
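The bracketed numbers in these log lines are Unix epoch seconds, which makes the timing hard to read at a glance. The Python sketch below splits a nagios.log line into a UTC timestamp, the entry type and its semicolon-separated fields, using two of the lines quoted above. The function name is hypothetical and the parser only handles the simple `[epoch] TYPE: a;b;c` shape.

```python
from datetime import datetime, timezone


def parse_log_line(line):
    """Split '[epoch] TYPE: a;b;c' into (utc_datetime, type, fields)."""
    stamp, rest = line.split("] ", 1)
    when = datetime.fromtimestamp(int(stamp.lstrip("[")), tz=timezone.utc)
    kind, payload = rest.split(": ", 1)
    return when, kind, payload.split(";")


first = parse_log_line(
    "[1253236097] HOST ALERT: gadgets01;UNREACHABLE;SOFT;1;"
    "CRITICAL - 10.10.10.19: rta nan, lost 100%")
second = parse_log_line(
    "[1253236186] HOST ALERT: gadgets02;DOWN;SOFT;1;"
    "CRITICAL - 10.10.10.20: rta nan, lost 100%")

gap_seconds = int((second[0] - first[0]).total_seconds())
print(first[0].isoformat(), first[2][0], first[2][1])
print(second[0].isoformat(), second[2][0], second[2][1])
print(gap_seconds)
```

The two host alerts were logged at different times, so the state of the parents at each moment is the natural next thing to compare.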
Re: [opsview-users] Host UNREACHABLE Problem
Hi

I tried again. Both parents are OK and there was no state change during the test.

2009/9/18 Ton Voon
> Can you include a grep of the state of the parents?
>
> Ton
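As background for this thread, the documented Nagios rule is that a host whose check fails is DOWN if at least one of its parents is UP, and UNREACHABLE only if all of its parents are DOWN or UNREACHABLE. The Python sketch below is a simplified model of that rule, assuming the parent states are known at the moment of the check. It ignores cached states and check timing, and the function name is hypothetical.

```python
def non_up_state(parent_states):
    """State for a host whose own check failed, given its parents' states.

    Simplified model: a host with no parents, or with at least one UP
    parent, is DOWN; otherwise it is UNREACHABLE.
    """
    if not parent_states:
        return "DOWN"
    if any(state == "UP" for state in parent_states):
        return "DOWN"
    return "UNREACHABLE"


# Both parents UP, as reported for Public_OPEN and 10lan-OPEN.
print(non_up_state(["UP", "UP"]))
# Every parent down or unreachable.
print(non_up_state(["DOWN", "UNREACHABLE"]))
```

Under this model both gadgets01 and gadgets02 should come out as DOWN when both parents are UP, which is why the UNREACHABLE result for gadgets01 looks inconsistent.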
[opsview-users] Performance problem on scheduled downtime for many host(over 1000)
Hi

On the master server, I set scheduled downtime on a host group (checked by a slave) which has about 1000 hosts and over 1 service checks. But the job takes too long to complete. (The nagios daemon sends internal scheduled downtime commands for every service, host and sub host group.)

Apart from that problem, after doing this the response time of all Nagios CGIs (status.cgi, extinfo.cgi and so on) becomes very long and the CGI process hogs 100% CPU.

Could this problem be caused by too many scheduled downtimes?
Re: [opsview-users] Performance problem on scheduled downtime for many host(over 1000)
Hello

I talked about this in the #opsview IRC channel with duncs.

Opsview 3.3.1 uses Nagios 3.0.6 now. I found this entry in the changelog at http://www.nagios.org/development/history/core-3x

3.1.2 - 06/23/2009
- Fix for CPU hogging in service and host check scheduling logic

I heard from duncs that the next version of Opsview will migrate to Nagios 3.2, so this problem should be solved automatically.
Re: [opsview-users] Performance problem on scheduled downtime for many host(over 1000)
Hello

I've tested with the new Opsview version 3.3.2 [Nagios 3.2 core], but it has the same problem. :(
Re: [opsview-users] I've posted screen shots previewing the new user interface in Opsview 3.5.0
Hello

Will the new framework for performance data replace the current RRD-based graphs?

RRD graphs support many customizable options (stacking, multigraphing, etc.). Will the new Flot ( http://code.google.com/p/flot/ )-based graph framework support those features, and will it still use RRD files to obtain graph data, with something like javascriptRRD ( http://sourceforge.net/projects/javascriptrrd/ )?

and it will

2009/10/23 James Peel
> I've posted screen shots previewing the new user interface in Opsview 3.5.0
> here: http://bit.ly/4d1rdI
> We've also moved to a new framework for displaying performance data.
>
> Let us know what you think!
>
> --
> James
Re: [opsview-users] SNMP "Query Host" returns an error
I guess this is the same problem as http://lists.opsview.org/lurker/message/20090819.024639.881c7a50.en.html
I suspect it's caused by an invisible \x00 character.

2009/11/18 Matt White
> Hi Duncs,
>
> Opsview is running on a VM (ESXi 4) the guest OS is Ubuntu 8.04 server
> 32-bit
>
> Windows OS’s are variations of Server 2003 R2 Standard/Enterprise and
> Server 2008 Standard
>
> Not sure if this is related to any teaming of NICs on the Windows servers
> or how the data is collected and returned by the query_host script?
>
> Kind regards,
> Matt
>
> From: opsview-users-boun...@lists.opsview.org [mailto:opsview-users-boun...@lists.opsview.org] On Behalf Of Duncan Ferguson
> Sent: 17 November 2009 14:54
> To: Opsview Users
> Subject: Re: [opsview-users] SNMP "Query Host" returns an error
>
> On 16 Nov 2009, at 17:01, Matt White wrote:
>
> I have just checked again and it appears to be for all windows hosts.
>
> Removed and re-added one of the older servers and got the same error
> message.
>
> What version of Opsview on what OS are you running?
>
> Duncs
>
> --
> Duncan Ferguson
> Senior Developer
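To make the \x00 guess above concrete, here is a minimal Python sketch of how one might check string values for embedded NUL characters and strip them before further processing. It is only an illustration: the function names and the sample dictionary are hypothetical, and nothing here is taken from the actual query_host script.

```python
from typing import Dict, List


def find_nul_fields(fields: Dict[str, str]) -> List[str]:
    """Return the names of fields whose value contains a NUL (\\x00) character."""
    return [name for name, value in fields.items() if "\x00" in value]


def strip_nul(value: str) -> str:
    """Remove every NUL character from a string."""
    return value.replace("\x00", "")


if __name__ == "__main__":
    # Hypothetical sample data; real SNMP responses may look different.
    sample = {"ifDescr.1": "Example Network Adapter\x00", "ifDescr.2": "lo"}
    print(find_nul_fields(sample))
    print({name: strip_nul(value) for name, value in sample.items()})
```

A NUL character does not show up when the value is printed, which would explain why the returned data looks normal on screen while still breaking whatever parses it.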
[opsview-users] opsview 3.5.0 graph
Hello,

The new interface of Opsview 3.5.0 is very nice, but the new JavaScript/AJAX-based graph framework is not fully satisfying. It lacks customizable options compared to the old RRDgraph (upper/lower limit, static cur/min/max legend labels, fast static image rendering, etc.), and the x-axis ticks of the graphs seem to show GMT timestamps rather than local time.

I've found I can still use RRDgraph with the /graphrrd URL instead of /graph, but then the slide menu at the upper right disappears.

Are you planning to continue maintaining both graph frameworks together? I prefer the static-image RRDgraph to the JavaScript/AJAX-based graph, because it is fast and easy to integrate with other systems. What do you think about that?
Re: [opsview-users] opsview 3.5.0 graph
> On Nov 27, 2009, at 1:48 AM, Kang wrote:
>> it's lack of customizable options compare to old rrdgraph( upper/
>> lower limit, static cur/min/max legend label, fast static image
>> rendering, etc.)
>
> What do you mean by upper/lower limit? I think there are some possible
> optimisations with how the graphs choose the top and bottom values for the
> y-axis.

I want to set the lower/upper limits of the y-axis range myself, but I can't find any menu or parameter to do so. RRDgraph has these options.

In opsview-web/lib/Opsview/Web/Controller/RRDgraph.pm:

    #if (defined $full_size_mode){ push @$rrdoptions, "--full-size-mode"; }
    if ( defined $upper_limit )   { push @$rrdoptions, "--upper-limit", $upper_limit }
    if ( defined $lower_limit )   { push @$rrdoptions, "--lower-limit", $lower_limit }
    if ( defined $rigid )         { push @$rrdoptions, "--rigid" }
    if ( defined $alt_autoscale ) { push @$rrdoptions, "--alt-autoscale" }

From http://oss.oetiker.ch/rrdtool/doc/rrdgraph.en.html:

    [-u|--upper-limit value] [-l|--lower-limit value] [-r|--rigid]

    By default the graph will be autoscaling so that it will adjust the
    y-axis to the range of the data. You can change this behavior by
    explicitly setting the limits. The displayed y-axis will then range at
    least from lower-limit to upper-limit. Autoscaling will still permit
    those boundaries to be stretched unless the rigid option is set.
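For readers who do not use Perl, the option handling quoted from RRDgraph.pm can be sketched in Python as below. This is an illustrative translation, not Opsview code: the function name and its parameters are assumptions, and only the four rrdgraph flags that appear in the quoted snippet are used.

```python
from typing import List, Optional


def build_rrd_axis_options(
    upper_limit: Optional[float] = None,
    lower_limit: Optional[float] = None,
    rigid: bool = False,
    alt_autoscale: bool = False,
) -> List[str]:
    """Build the rrdgraph arguments that control the y-axis range."""
    options: List[str] = []
    # Compare against None so that a limit of 0 is still passed through.
    if upper_limit is not None:
        options += ["--upper-limit", str(upper_limit)]
    if lower_limit is not None:
        options += ["--lower-limit", str(lower_limit)]
    # Without --rigid, autoscaling may still stretch the limits.
    if rigid:
        options.append("--rigid")
    if alt_autoscale:
        options.append("--alt-autoscale")
    return options


if __name__ == "__main__":
    print(build_rrd_axis_options(upper_limit=100, lower_limit=0, rigid=True))
```

The returned list is meant to be appended to the rest of an rrdgraph argument list, in the same way the Perl code pushes onto @$rrdoptions.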
Re: [opsview-users] Opsview 3.5.1 released
Hello,

I told Duncan in the #opsview IRC channel that graph autoscaling in version 3.5.1 does not work by default. Please let me know where I can get the patch for this problem.

Thank you in advance.

2009/12/22 Duncan Ferguson
> Opsview 3.5.1 is now available! More information at
> http://opsview.org/opsview_3.5.1
>
> Merry Christmas and a Happy New Year from everyone here at Opsera.
>
> Duncs
Re: [opsview-users] Bug in cleanup of import_runtime
Hello.

https://secure.opsera.com/jira/browse/OPS-950

I'm using Opsview v3.3.2 now, but I don't think we have the problem of the service_saved_state table growing (in our DB its size is 220MB). (I executed the DELETE query you pasted, but the table's data length did not change.) I doubt it is the main cause of the performance degradation.

2010/1/18 Ton Voon
> Hi!
>
> Just wanted to make people aware of a fix for reducing the time to run
> import_runtime. This affects people if they are importing into ODW and have
> a large number of services.
>
> The cleanup section wasn't getting invoked correctly which means that the
> odw.service_saved_state table will continue to grow and it may slow down the
> duration of the hourly imports into ODW.
>
> You can run this SQL command on your ODW database if you find that the
> odw.service_saved_state table is too large:
>
> mysql> DELETE FROM service_saved_state WHERE opsview_instance_id = 1 AND
> start_timev <= UNIX_TIMESTAMP(NOW() - INTERVAL 7 DAY)
>
> The patch is here:
>
> https://secure.opsera.com/wsvn/wsvn/opsview/trunk/opsview-core/bin/import_runtime?op=diff&;
>
> This will be included in a future release.
>
> Ton
Re: [opsview-users] Bug in cleanup of import_runtime
Hello.

Where should I apply the patch in v3.3.2's import_runtime (maybe https://secure.opsera.com/wsvn/wsvn/opsview/trunk/opsview-core/bin/import_runtime?rev=3336&peg=3336)? I looked into the script, but the patched parts (https://secure.opsera.com/wsvn/wsvn/opsview/trunk/opsview-core/bin/import_runtime?op=diff&;) are no different.

And where is the service_saved_state table used? I can't find the table in the ODW DB schema diagram.

2010/1/19 Ton Voon
> On 19 Jan 2010, at 02:33, Kang wrote:
>> https://secure.opsera.com/jira/browse/OPS-950
>>
>> I'm using Opsview v 3.3.2 now.
>> but i think there is no service_saved_state table(in our DB, its size is
>> 220MB) growing problem.
>> ( i executed the DELETE query which you pasted but there is no change in
>> table data length size. )
>> i doubt it is the main cause of performance degradation.
>
> On one customer's large system, this was a problem which we saw during the
> data load, so we've fixed this problem and raised it on the mailing lists.
>
> Your system may have a different issue. We can't look into it without a
> support contract :(
>
> The DELETE will not change the table data length based on the length of the
> data file - you will need to run a mysql optimise to do that.
>
> You can see the dataload timings by running this command:
> http://docs.opsview.org/doku.php?id=opsview-community:odw#how_long_does_a_dataload_take
>
> Ton
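Ton's point that a DELETE does not shrink the data file can be demonstrated with a small self-contained experiment. The sketch below uses SQLite from Python's standard library purely as a stand-in, because it needs no server. It is not MySQL and not the ODW schema, and the table name and row sizes are made up. In MySQL the space-reclaiming step Ton refers to would be OPTIMIZE TABLE; here the SQLite analogue VACUUM plays that role.

```python
import os
import sqlite3
import tempfile


def sizes_after_delete_and_vacuum(row_count: int = 20000):
    """Return the database file size in bytes after load, after DELETE, and after VACUUM."""
    fd, path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    try:
        conn = sqlite3.connect(path)
        # Hypothetical table standing in for a large state table.
        conn.execute("CREATE TABLE saved_state (id INTEGER PRIMARY KEY, payload TEXT)")
        conn.executemany(
            "INSERT INTO saved_state (payload) VALUES (?)",
            (("x" * 200,) for _ in range(row_count)),
        )
        conn.commit()
        after_load = os.path.getsize(path)

        # Deleting rows frees pages inside the file but does not shrink it.
        conn.execute("DELETE FROM saved_state")
        conn.commit()
        after_delete = os.path.getsize(path)

        # Rebuilding the file is what actually returns the space.
        conn.execute("VACUUM")
        conn.close()
        after_vacuum = os.path.getsize(path)
        return after_load, after_delete, after_vacuum
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(sizes_after_delete_and_vacuum())
```

The file is expected to stay about the same size after the DELETE and to become much smaller only after the rebuild, which matches the observation that the table data length did not change after running the DELETE alone.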
Re: [opsview-users] Opsview 3.5.2 released
Hello.

Thank you all for your efforts! However, it seems that the JavaScript-based graphs in Opsview 3.5.2 still do not autoscale by default, and they show odd scientific notation on the axis.