[opsview-users] Customized graph per service check.

2009-05-06 Thread Kang
Hello.

The newest Opsview supports graph customization,
but I can't save the current customized graph view configuration.

I want to save a specific configuration for each service check.
I think the feature could be added to the service check configuration menu.
___
Opsview-users mailing list
Opsview-users@lists.opsview.org
http://lists.opsview.org/listinfo/opsview-users


[opsview-users] opsview 3.0.4 problem.

2009-05-15 Thread Kang
Hello,

I've upgraded to version 3.0.4.
I am using Firefox 3.0.10 on Windows XP. When I click the arrow button on the
host detail, service detail and graph pages, the popup menu does not show up,
but with Internet Explorer there is no problem.

It also seems that the "all metrics" menu has disappeared from the arrow
button popup menu on graph view pages.
Has it been removed deliberately?

I hope
http://lists.opsview.org/lurker/message/20090507.210619.32fc0fd5.en.html will
be included in version 3.1.


[opsview-users] Passive check state automatically revert to OK state.

2009-06-12 Thread Kang
Hello.

We have a centralized syslog server which collects logs from all servers,
matches patterns with SEC and notifies the master monitoring server
(passive check) with the send_nsca command.

         (master) <---notify---+
          /    \               |
     (slave)  (slave)     (log server)


The master doesn't do any active checking; all checking is now done by the
slaves.

But the state (WARNING/CRITICAL) set by send_nsca automatically reverts to
OK at roughly hourly intervals, independent of when the alert arrived.
(See the lines marked with an asterisk.)

I think these states should remain unchanged until they are manually
cleared with the "Submit check result" menu.

Why does this happen?

- sample log ---
*[12-06-2009 19:16:12] SERVICE ALERT: hostname;syslog_event;OK;HARD;1;.
 [12-06-2009 18:30:42] SERVICE ALERT: hostname;syslog_event;CRITICAL;HARD;1;security[success] 540 ..

*[12-06-2009 18:15:12] SERVICE ALERT: hostname;syslog_event;OK;HARD;1;.
 [12-06-2009 18:12:28] SERVICE ALERT: hostname;syslog_event;WARNING;HARD;1;security[success] 538 ..

*[12-06-2009 17:14:12] SERVICE ALERT: hostname;syslog_event;OK;HARD;1;.
 [12-06-2009 16:40:13] SERVICE ALERT: hostname;syslog_event;WARNING;HARD;1;security[success] .

*[12-06-2009 16:13:17] SERVICE ALERT: hostname;syslog_event;OK;HARD;1;.
 [12-06-2009 16:10:57] SERVICE ALERT: hostname;syslog_event;WARNING;HARD;1;security[success] .

*[12-06-2009 15:12:17] SERVICE ALERT: hostname;syslog_event;OK;HARD;1;.
 [12-06-2009 15:10:45] SERVICE ALERT: hostname;syslog_event;WARNING;HARD;1;security[success] .
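For reference, a passive result like the ones above reaches Nagios through the NSCA client, send_nsca. It reads one result per line on standard input, in the form host_name<TAB>service_description<TAB>return_code<TAB>plugin_output. Below is a minimal Perl sketch that builds such a line from a SEC match. nsca_line is a hypothetical helper. The state-to-code mapping is the standard plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```perl
use strict;
use warnings;

# Standard Nagios plugin return codes
my %CODE = ( OK => 0, WARNING => 1, CRITICAL => 2, UNKNOWN => 3 );

# Hypothetical helper: format one passive service result for send_nsca's stdin:
#   host_name<TAB>service_description<TAB>return_code<TAB>plugin_output<NEWLINE>
sub nsca_line {
    my ( $host, $service, $state, $output ) = @_;
    die "unknown state '$state'\n" unless exists $CODE{$state};
    # Tabs or newlines inside the output would break the line format
    $output =~ s/[\t\n]/ /g;
    return join( "\t", $host, $service, $CODE{$state}, $output ) . "\n";
}

print nsca_line( 'hostname', 'syslog_event', 'CRITICAL', 'security[success] 540' );
```

The line would then be piped to something like send_nsca -H <monitoring server> -c <path to send_nsca.cfg>. The binary and config locations vary by installation. The -H target should be the monitoring server the service is assigned to.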


Re: [opsview-users] Passive check state automatically revert to OK state.

2009-06-14 Thread Kang
I checked this again.

The automatic change to OK repeats the last manual state change to OK, with the same output text, about an hour later.


[15-06-2009 10:57:44] SERVICE ALERT: hostname;syslog_event;OK;HARD;1;WILL THIS REPEAT?  <--- AUTOMATIC STATE REVERT (this should not happen)
[15-06-2009 10:40:48] SERVICE ALERT: hostname;syslog_event;CRITICAL;HARD;1;security[success]

[15-06-2009 09:57:24] SERVICE ALERT: hostname;syslog_event;OK;HARD;1;WILL THIS REPEAT?  <--- MANUAL STATE CHANGE
[15-06-2009 09:37:48] SERVICE ALERT: hostname;syslog_event;CRITICAL;HARD;1;security[success]


Re: [opsview-users] Passive check state automatically revert to OK state.

2009-06-16 Thread Kang
Hello

I checked again, based on our discussion in the #opsv...@irc.freenode.net
channel last night.
Sending the NSCA notifications to all slaves instead of the master still has
the same problem.

So I changed the configuration as per Ton Voon's advice.
After that, the state revert problem was solved.



2009/6/15 Duncan Ferguson 

> I ran a test of this all weekend - basically set up a passive check, set the 
> status to not-OK and left it.  The state was not reverted.
>
>
> Can you provide more details of your setup?  The log server reports directly
> to the master server?  Are these checks therefore assigned to the master or
> one of the slaves?  Even though the checks are passive the nsca events should
> be sent to the assigned monitoring server.
>
>   Duncs
>
> --
> Duncan Ferguson
> Senior Developer
>
>
>
> Opsera Limited | Unit 69 Suttons Business Park
> Reading | Berkshire | RG6 1AZ | UK
>
> Phone:   +44 (0) 845 057 7887
> Mobile:  +44 (0) 7968 148 748
> Skype:   duncan_j_ferguson
> Email:   duncan.fergu...@opsera.com
> www.opsera.com
>
> Opsera Limited is registered in the UK under Company Number 5396532. Our
> registered office is Gorse View, Horsell Rise, Woking, Surrey, GU21 4RB.
>
>


[opsview-users] check_snmp_linkstatus problem

2009-07-06 Thread Kang
Hello.

I am monitoring a Cisco Catalyst 4500 switch,
but the network traffic graphs look wrong (see the attached file):
they have occasional sharp dips.
I have seen this symptom before, when a traffic counter (usually 32-bit)
overflowed between check intervals.

So I debugged the check_snmp_linkstatus script, but it reads the 64-bit
counters correctly.

Where does this problem come from?

My guesses are:
1. SNMP returns a 64-bit counter, but Perl can't properly handle 64-bit
integers (does it need something like the bigint module?).
2. The check_snmp_linkstatus script can't properly handle counter resets.
3. check_snmp_linkstatus returns the correct value, but Nagios can't
properly handle 64-bit integers.
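Guesses 1 and 2 concern how two counter readings become a traffic delta. The usual approach is modular subtraction: if the new reading is smaller than the old one, assume the counter wrapped once and add 2^32 (Counter32) or 2^64 (Counter64). The Perl sketch below does this with the core Math::BigInt module. Values near 2^64 then stay exact instead of falling back to floating point. counter_delta is a hypothetical helper, not code from check_snmp_linkstatus.

```perl
use strict;
use warnings;
use Math::BigInt;

# Hypothetical helper (not taken from check_snmp_linkstatus): the increase of
# an SNMP Counter32/Counter64 between two polls. Readings are passed as
# strings and handled as Math::BigInt, so 64-bit values stay exact.
sub counter_delta {
    my ( $current, $previous, $bits ) = @_;
    my $delta = Math::BigInt->new($current) - Math::BigInt->new($previous);
    if ( $delta < 0 ) {
        # Assume the counter wrapped exactly once: add back 2**$bits
        $delta += Math::BigInt->new(2)->bpow($bits);
    }
    return $delta->bstr;
}

print counter_delta( '10', '4294967290', 32 ), "\n";             # 32-bit wrap
print counter_delta( '4', '18446744073709551610', 64 ), "\n";    # 64-bit wrap
```

A device reboot or counter reset also makes the new reading smaller. This sketch would misread that as a wrap and report a huge delta, so real code usually also discards deltas implying a rate above the link speed.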


[opsview-users] NMIS ifAlias problem

2009-07-06 Thread Kang
Hello

Our Cisco 4500 switch has ifAlias configured:

$snmpwalk -v 2c -c community serveip .1.3.6.1.2.1.31.1.1.1.18
IF-MIB::ifAlias.2 = STRING: ## AS1 0/1 ##
IF-MIB::ifAlias.3 = STRING: ## AS1 0/2 ##
IF-MIB::ifAlias.4 = STRING: ## AS2 0/1 ##
IF-MIB::ifAlias.5 = STRING: ## AS2 0/2 ##
IF-MIB::ifAlias.6 = STRING: ## LNX ##
.
.
.


But NMIS complains that there is no ifAlias:

 This interface will not be collected of the next reason

 Node        = xxx.xxx.xxx.xxx
 Interface   = GigabitEthernet4/30 (ifDescr)
 Type        = ethernetCsmacd (ifType)
 Description =  (ifAlias)
 Reason      = no Description (ifAlias)

Where does NMIS look for ifAlias?
Is there another place where ifAlias exists?
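One way to cross-check what NMIS sees is to walk ifDescr (.1.3.6.1.2.1.2.2.1.2) as well as ifAlias and join the two on the ifIndex. An interface such as GigabitEthernet4/30 whose index has an empty or missing alias would match the "no Description (ifAlias)" reason above. Below is a minimal Perl sketch. It assumes net-snmp's `MIB::object.index = STRING: value` output format, as in the walk above. parse_walk and interfaces_without_alias are hypothetical helper names.

```perl
use strict;
use warnings;

# Parse snmpwalk lines such as
#   IF-MIB::ifAlias.2 = STRING: ## AS1 0/1 ##
# into a hashref of ifIndex => value. An empty value is kept as "".
sub parse_walk {
    my ($text) = @_;
    my %by_index;
    for my $line ( split /\n/, $text ) {
        next unless $line =~ /^\S+?\.(\d+) = STRING: ?(.*)$/;
        $by_index{$1} = $2;
    }
    return \%by_index;
}

# Interface names (ifDescr) whose ifAlias is missing or empty,
# in ifIndex order.
sub interfaces_without_alias {
    my ( $descr, $alias ) = @_;    # hashrefs from parse_walk
    return map { $descr->{$_} }
      grep { !defined $alias->{$_} || $alias->{$_} eq '' }
      sort { $a <=> $b } keys %$descr;
}
```

Feed it the output of the ifDescr and ifAlias walks, then look up the interfaces NMIS refuses to collect.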


[opsview-users] More RRDGraph options.

2009-07-09 Thread Kang
Hello.

I want to configure the lower and upper limits of the vertical axis of graphs.
In Opsview 2.0 this was possible with the rrdopts parameter (
http://docs.opsview.org/doku.php?id=opsview3:faq#performance_graphs_show_an_arbitrary_range)
But it seems that this feature was removed in Opsview 3.x.

Opsview/Web/Controller/RRDgraph.pm doesn't support all rrdgraph options, and,
unlike Opsview 2.x, it doesn't accept custom RRD options through a parameter
such as rrdopts.

I hope that Opsview 3.x will support more RRD options.
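For reference, rrdtool graph controls the vertical axis with --lower-limit (-l), --upper-limit (-u) and --rigid (-r). Without --rigid, rrdtool still expands the axis when data falls outside the limits. Below is a hypothetical Perl sketch of how a controller might accept only these options and turn them into rrdtool graph arguments. The parameter names lower_limit, upper_limit and rigid are assumptions, not Opsview's API.

```perl
use strict;
use warnings;

# Hypothetical whitelist: request parameter name => rrdtool graph option
my %ALLOWED = (
    lower_limit => '--lower-limit',
    upper_limit => '--upper-limit',
);

# Build rrdtool graph arguments from request parameters. Only plain numbers
# are accepted, so arbitrary rrdtool options cannot be injected.
sub rrd_limit_args {
    my (%params) = @_;
    my @args;
    for my $name ( sort keys %ALLOWED ) {
        my $value = $params{$name};
        next unless defined $value && $value =~ /^-?\d+(?:\.\d+)?$/;
        push @args, $ALLOWED{$name}, $value;
    }
    # --rigid stops rrdtool from expanding the axis beyond the given limits
    push @args, '--rigid' if @args && $params{rigid};
    return @args;
}

print join( ' ', rrd_limit_args( lower_limit => 0, upper_limit => 100, rigid => 1 ) ), "\n";
```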


[opsview-users] Opsview daemon occasionally dies when applying "Cancel all hostgroup downtime"

2009-07-26 Thread Kang
Hello

The Opsview daemon occasionally dies when applying "Cancel all hostgroup
downtime", and the server status icon in the bottom status bar turns red.


Has anyone experienced this?


[opsview-users] PIDFILE Problem with Catalyst::Engine::HTTP::Prefork 0.50

2009-07-27 Thread Kang
Hello

http://docs.opsview.org/doku.php?id=opsview3.1:prefork describes how to
improve Web UI performance,
but the Catalyst::Engine::HTTP::Prefork 0.50 module included in Opsview 3.1
has a bug, so it can't create the pid file properly and the
/etc/init.d/opsview-web script also doesn't work properly.

If you want it to work properly, see
http://dev.catalystframework.org/svnweb/Catalyst/revision/?rev=10422 and
apply the patch.
(Catalyst::Engine::HTTP::Prefork 0.51 solved the problem.)

Thank you.


Re: [opsview-users] Opsview daemon occasionally dies when applying "Cancel all hostgroup downtime"

2009-07-28 Thread Kang
Hello

I have reproduced the problem.

Q) Which exact version of Opsview are you running, on which platform?
A) Ubuntu 8.04 Hardy 64-bit, Opsview 3.1

Q) Does the nagios process die or the opsviewd?
A) opsviewd is alive, but the nagios process is dead.

Q) What is in /var/log/opsview/opsviewd.log and
/usr/local/nagios/var/nagios.log?

A)

[2009/07/28 18:48:02] [opsviewd] [INFO] Processing commands found in /usr/local/nagios/var/slave_commands.cache
[2009/07/28 18:48:03] [sendcmd2slaves] [INFO] Sending commands to slaves
[2009/07/28 18:48:03] [sendcmd2slaves] [INFO] Commands sent to slaves
[2009/07/28 18:48:12] [opsviewd] [INFO] Processing commands found in /usr/local/nagios/var/slave_commands.cache
[2009/07/28 18:48:13] [sendcmd2slaves] [INFO] Sending commands to slaves
[2009/07/28 18:48:13] [sendcmd2slaves] [INFO] Commands sent to slaves


[1248774475] EXTERNAL COMMAND: SCHEDULE_HOST_SVC_DOWNTIME;ns.myhost.net;1248774471;1248781671;1;0;;admin;Host 'ns.myhost.net': test
[1248774475] EXTERNAL COMMAND: SCHEDULE_HOST_DOWNTIME;ns.myhost.net;1248774471;1248781671;1;0;;admin;Host 'ns.myhost.net': test
[1248774475] SERVICE DOWNTIME ALERT: ns.myhost.net;DNS;STARTED; Service has entered a period of scheduled downtime
[1248774475] HOST DOWNTIME ALERT: ns.myhost.net;STARTED; Host has entered a period of scheduled downtime
[1248774488] EXTERNAL COMMAND: DEL_HOSTGROUP_SVC_DOWNTIME;External Servers
[1248774488] Caught SIGSEGV, shutting down...
 -> nagios process dead.
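To reproduce the crash without the web UI, the same downtime can be submitted as a Nagios external command. Nagios reads lines of the form [timestamp] COMMAND;arguments from its command file. For SCHEDULE_HOST_DOWNTIME the arguments are host_name;start_time;end_time;fixed;trigger_id;duration;author;comment, which matches the entries logged above. Below is a minimal Perl sketch that rebuilds such a line. host_downtime_cmd is a hypothetical helper.

```perl
use strict;
use warnings;

# Hypothetical helper: build a Nagios external command line
#   [time] SCHEDULE_HOST_DOWNTIME;host;start;end;fixed;trigger_id;duration;author;comment
# For fixed downtime (fixed=1) Nagios uses start/end and ignores duration,
# so duration may be left empty, as in the logged command.
sub host_downtime_cmd {
    my (%o) = @_;
    my $now = defined $o{now} ? $o{now} : time;
    my @args = (
        $o{host}, $o{start}, $o{end}, ( $o{fixed} ? 1 : 0 ),
        ( defined $o{trigger_id} ? $o{trigger_id} : 0 ),
        ( defined $o{duration}   ? $o{duration}   : '' ),
        $o{author}, $o{comment},
    );
    return sprintf '[%d] SCHEDULE_HOST_DOWNTIME;%s', $now, join( ';', @args );
}

print host_downtime_cmd(
    host    => 'ns.myhost.net',
    start   => 1248774471,
    end     => 1248781671,    # start + 2 hours
    fixed   => 1,
    author  => 'admin',
    comment => "Host 'ns.myhost.net': test",
), "\n";
```

The resulting line is appended to the file named by command_file in nagios.cfg; its location depends on the installation.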





2009/7/28 Ton Voon 

> Hi Kang,
>
> No, this has not been reported.
>
> Does the nagios process die or the opsviewd? What are in the
> /var/log/opsview/opsviewd.log and the /usr/local/nagios/var/nagios.log?
>
> Which exact version of Opsview are you running, on which platform?
>
> Ton
>


Re: [opsview-users] Opsview daemon occasionally dies when applying "Cancel all hostgroup downtime"

2009-07-29 Thread Kang
Hello


> Please send the coredump file, the nagios executable and the strace
> output, tarred and gzipped.

Sorry, the core dump file contains too much information about our
infrastructure to send.

The following is the gdb backtrace:
$ sudo gdb /usr/local/nagios/bin/nagios core.6419
.
.

Core was generated by `/usr/local/nagios/bin/nagios -uxd
/usr/local/nagios/etc/nagios.cfg'.
Program terminated with signal 11, Segmentation fault.
[New process 6419]
[New process 6421]
#0  0x0042a96d in cmd_delete_downtime ()
(gdb) bt
#0  0x0042a96d in cmd_delete_downtime ()
#1  0x004277bf in process_external_command2 ()
#2  0x0042741c in process_external_command1 ()
#3  0x00425ce9 in check_for_external_commands ()
#4  0x004319df in event_execution_loop ()
#5  0x00413d31 in main ()
(gdb)


> What does 'file /usr/local/nagios/bin/nagios' and 'uname -a' give?

/usr/local/nagios/bin/nagios: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), for GNU/Linux 2.6.8, dynamically linked (uses shared libs), stripped

Linux HOST 2.6.24-24-server #1 SMP Fri Jul 24 22:44:54 UTC 2009 x86_64 GNU/Linux


[opsview-users] /usr/local/nagios/bin/query_host problem.

2009-08-18 Thread Kang
Hello

I encountered a weird error when adding switch ports with the query_host menu,
but the XML in the error message seemed to have no problem.
So I redirected the XML output to a file and checked it.
I found that the file contains a \x00 character in the ifAlias section;
the \x00 character in the XML caused the XML parsing error.
I think some switches occasionally return a \x00 character when ifAlias is
not set.

So I patched the /usr/local/nagios/bin/query_host file, adding

 $hash->{ifAlias} =~ s/\x00//g;

at line 311 of query_host.
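\x00 is one of several control characters that XML 1.0 forbids: below 0x20, only tab, LF and CR are legal. Below is a slightly broader variant of the one-line patch as a Perl sketch. xml_safe is a hypothetical helper name. It only handles these C0 control characters, not every code point XML disallows.

```perl
use strict;
use warnings;

# Hypothetical helper: drop characters XML 1.0 does not allow below 0x20.
# Tab (0x09), LF (0x0A) and CR (0x0D) are kept; NUL and other control bytes
# returned by some SNMP agents are removed. undef becomes an empty string.
sub xml_safe {
    my ($s) = @_;
    return '' unless defined $s;
    $s =~ s/[\x00-\x08\x0B\x0C\x0E-\x1F]//g;
    return $s;
}

my $alias = "## AS1\x00 ##";
print xml_safe($alias), "\n";
```

In query_host this would be applied as $hash->{ifAlias} = xml_safe( $hash->{ifAlias} ); before the XML is generated.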


Re: [opsview-users] Opsview daemon occasionally dies when applying "Cancel all hostgroup downtime"

2009-08-27 Thread Kang
Hello

I retried with the new Opsview 3.3.0,
but the problem is still unsolved.
It seems the problem occurs when scheduling and cancelling downtime for only
*one host* (not a hostgroup).


2009/8/24 unix 

> Occasionally for us too, when using "Cancel all host downtime" for one
> host.
> Not so big a problem for us; the cluster service restarts opsview.
> Running opsview 3.1 and Red Hat Enterprise Linux Server release 5.3 64-bit.
>
> /Urban


[opsview-users] Opsview 3.3.0's Object tab in Graph Configuration is not working

2009-08-27 Thread Kang
Hello

I upgraded to Opsview 3.3.0.
but Object tab in Graph Configuration is not working.
so i can't graph multiple  servers data.

Is there any patch available ?


[opsview-users] The way for fully I18N and UTF-8-ized opsview.

2009-08-29 Thread Kang
Hello.

Opsview 3.3.0 started to support i18n,
but only Latin-1 based languages are supported, even though all Opsview web
pages are UTF-8 encoded.

Recently I dug into the Opsview source code to support Korean
characters (UTF-8 encoding).

To support real i18n:

*1. Opsview should explicitly set its database encoding to UTF-8.*
CREATE DATABASE [DBNAME] DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci;

Most default MySQL installations are latin1-based. (I don't want to modify
the MySQL server configuration, and this works without depending on it.)
If you want a UTF-8 encoded database, you have to specify the character set
explicitly when creating the DB.


*2. The DB connect string should contain ";mysql_enable_utf8=1"*
Most default MySQL installations are latin1-based, and some old MySQL
versions don't properly handle UTF-8 client connections,
so I modified opsview.conf (which overrides the opsview.default file) to make
sure the client connection uses UTF-8, like "set names utf8":

$dbhost = "localhost;mysql_enable_utf8=1";
..
..

After doing those things,
some database columns (the alias column in the hosts table) handle Korean
characters (UTF-8) well,
but I wanted more columns to handle UTF-8
(for example, the description column in the keywords table).

So I modified the DBIx::Class code in
/usr/local/nagios/lib/Opsview/Schema/Keywords.pm:
__PACKAGE__->load_components(qw/UTF8Columns Core/);
__PACKAGE__->utf8_columns(qw/description/);

But it didn't work as I expected, so I searched the other code.
Finally I found that Opsview has both DBIx::Class and Class::DBI ORM code.
(I don't know which code is really used where.)

So I modified the Class::DBI code (/usr/local/nagios/lib/Opsview/Keyword.pm):
__PACKAGE__->utf8_columns( qw/description/ );

After that, it worked well.

*3. All DB columns that are not restricted to alphanumeric characters should
be enabled to handle UTF-8 characters.*

< DBIx::Class code >

__PACKAGE__->load_components(qw/UTF8Columns Core/);
__PACKAGE__->utf8_columns(qw/utf8_enabled_column1 utf8_enabled_column2/);

or
__PACKAGE__->load_components(qw/ForceUTF8/);


< Class::DBI code >

__PACKAGE__->utf8_columns(qw/utf8_enabled_column1 utf8_enabled_column2/);
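Components such as UTF8Columns exist to do two things: decode UTF-8 bytes from the database into Perl character strings on read, and encode them back on write. Below is a minimal Perl sketch with the core Encode module, using the Korean word "한국" as sample data. It shows the byte/character difference and what happens when the decode step is skipped.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Sample data: the Korean word "한국", written as two Unicode code points.
my $text = "\x{D55C}\x{AD6D}";

# Writing: characters are encoded to UTF-8 bytes before they are stored.
my $bytes = encode( 'UTF-8', $text );

# Reading: the bytes are decoded back into characters.
my $back = decode( 'UTF-8', $bytes );

# Skipping the decode and encoding again double-encodes the text,
# one common way Korean text ends up garbled.
my $garbled = encode( 'UTF-8', $bytes );

printf "characters=%d bytes=%d double-encoded=%d round-trip=%s\n",
  length($text), length($bytes), length($garbled),
  ( $back eq $text ? 'ok' : 'broken' );
```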


References:
http://dev.catalystframework.org/wiki/tutorialsandhowtos/using_unicode


[opsview-users] Centralized mail notification on the master server

2009-09-04 Thread Kang
Hello.

Email notification in Opsview's distributed monitoring structure happens on
each slave.
But configuring mail on every slave is a chore, and I had a problem where a
slave didn't send notification mail for passive checks.

So I tried to move all mail notification to the master server.
After learning that all notify-by-atom notifications are done by the master,
I added the following code at the bottom of the /usr/local/bin/atom-generator
file. It works well, as I expected.

Note: You must enable both "Notify by Email" and "Notify by WebFeed".
      Delete or disable the mail agent on every slave so that you don't
      receive duplicate mail.

Thank you.

- cut here --
# Send mail from the master for every notify-by-atom notification.
# send_notification_mail is a name chosen for this snippet; rename it if
# atom-generator already defines a sub with the same name.
sub send_notification_mail {
    my ( $subject, $receiver, $body ) = @_;
    return unless $receiver;
    $body =~ s/\\//g;    # strip backslashes from the plugin output
    # List-form pipe open: subject and address are passed straight to Mail,
    # so no shell quoting is needed and shell metacharacters are not interpreted
    open my $fh, '|-', '/usr/bin/Mail', '-s', $subject, $receiver
      or die "Cannot run /usr/bin/Mail: $!";
    print {$fh} $body;
    close $fh or warn "/usr/bin/Mail exited with status $?\n";
}

if ( $ENV{NAGIOS_SERVICEDESC} ) {
    # Service notification
    my $subject = "$ENV{NAGIOS_NOTIFICATIONTYPE}: $ENV{NAGIOS_SERVICEDESC} "
      . "is $ENV{NAGIOS_SERVICESTATE} on $ENV{NAGIOS_HOSTNAME}";
    my $body = <<"MAIL1";
$ENV{NAGIOS_NOTIFICATIONTYPE}: $ENV{NAGIOS_SERVICEDESC} is $ENV{NAGIOS_SERVICESTATE} on host $ENV{NAGIOS_HOSTNAME}: $ENV{NAGIOS_SERVICEOUTPUT}

Service: $ENV{NAGIOS_SERVICEDESC}
Host: $ENV{NAGIOS_HOSTNAME}
Alias: $ENV{NAGIOS_HOSTALIAS}
Address: $ENV{NAGIOS_HOSTADDRESS}
State: $ENV{NAGIOS_SERVICESTATE}
Comment: $ENV{NAGIOS_SERVICEACKCOMMENT} ($ENV{NAGIOS_SERVICEACKAUTHOR})
Date/Time: $ENV{NAGIOS_LONGDATETIME}

Additional Info:

$ENV{NAGIOS_SERVICEOUTPUT}
MAIL1
    send_notification_mail( $subject, $ENV{NAGIOS_CONTACTEMAIL}, $body );
}
else {
    # Host notification
    my $subject =
      "$ENV{NAGIOS_NOTIFICATIONTYPE}: $ENV{NAGIOS_HOSTNAME} is $ENV{NAGIOS_HOSTSTATE}";
    my $body = <<"MAIL2";
$ENV{NAGIOS_NOTIFICATIONTYPE}: $ENV{NAGIOS_HOSTNAME} is $ENV{NAGIOS_HOSTSTATE}: $ENV{NAGIOS_HOSTOUTPUT}

Host: $ENV{NAGIOS_HOSTNAME}
Alias: $ENV{NAGIOS_HOSTALIAS}
Address: $ENV{NAGIOS_HOSTADDRESS}
State: $ENV{NAGIOS_HOSTSTATE}
Comment: $ENV{NAGIOS_HOSTACKCOMMENT} ($ENV{NAGIOS_HOSTACKAUTHOR})
Date/Time: $ENV{NAGIOS_LONGDATETIME}
Info: $ENV{NAGIOS_HOSTOUTPUT}
MAIL2
    send_notification_mail( $subject, $ENV{NAGIOS_CONTACTEMAIL}, $body );
}

- cut here --


[opsview-users] Host UNREACHABLE Problem

2009-09-17 Thread Kang
Hello

In Opsview 3.3.1,
I have two hosts whose configurations have the same parents.

When the two hosts go down, their host states are different:
one is DOWN, the other is UNREACHABLE.

Why does this happen?

-
Generated hosts.cfg:

gadgets01 -> UNREACHABLE
gadgets02 -> DOWN

# gadgets01 host definition
define host {
    host_name              gadgets01
    alias                  gadgets01
    address                10.10.10.19
    hostgroups             open
    icon_image             linux.png
    icon_image_alt         LOGO - Linux Penguin
    vrml_image             linux.png
    statusmap_image        linux.png
    action_url             /info/host/364
    contact_groups         hostgroup3_servicegroup19/distprofile,hostgroup3_servicegroup22/distprofile,hostgroup3_servicegroup17/distprofile,hostgroup3_servicegroup18/distprofile,hostgroup3_servicegroup3/distprofile
    parents                Public_OPEN,10lan-OPEN
    notifications_enabled  1
    notification_interval  60
    notification_period    24x7
    notification_options   u,d,r,f
    use                    host-global
}

# gadgets02 host definition
define host {
    host_name              gadgets02
    alias                  gadgets02
    address                10.10.10.20
    hostgroups             open
    icon_image             linux.png
    icon_image_alt         LOGO - Linux Penguin
    vrml_image             linux.png
    statusmap_image        linux.png
    action_url             /info/host/365
    contact_groups         hostgroup3_servicegroup19/distprofile,hostgroup3_servicegroup22/distprofile,hostgroup3_servicegroup17/distprofile,hostgroup3_servicegroup18/distprofile,hostgroup3_servicegroup3/distprofile
    parents                Public_OPEN,10lan-OPEN
    notifications_enabled  1
    notification_interval  60
    notification_period    24x7
    notification_options   u,d,r,f
    use                    host-global
}


Re: [opsview-users] Host UNREACHABLE Problem

2009-09-17 Thread Kang
Hi

I manually switched the hosts' state back to OK and grepped the two hosts'
entries from the end of the log:

[1253235965] API LOG: admin;PROCESS_HOST_CHECK_RESULT;gadgets01;0;test|
[1253235972] HOST ALERT: gadgets01;UP;HARD;1;test
[1253235972] HOST NOTIFICATION: admin/distprofile;gadgets01;UP;notify-by-atom;test
[1253235978] SERVICE ALERT: gadgets01;Linux CPU Usage;CRITICAL;SOFT;1;CHECK_NRPE: Socket timeout after 10 seconds.
[1253235978] SERVICE ALERT: gadgets01;Linux Network Usage;CRITICAL;SOFT;1;CHECK_NRPE: Socket timeout after 10 seconds.
[1253236011] API LOG: admin;PROCESS_HOST_CHECK_RESULT;gadgets02;0;test|
[1253236013] HOST ALERT: gadgets02;UP;HARD;1;test
[1253236013] HOST NOTIFICATION: admin/distprofile;gadgets02;UP;notify-by-atom;test
[1253236027] SERVICE ALERT: gadgets02;Linux Disk Usage;CRITICAL;SOFT;1;CHECK_NRPE: Socket timeout after 10 seconds.
[1253236037] SERVICE ALERT: gadgets02;Nagios Agent check;CRITICAL;SOFT;1;CHECK_NRPE: Socket timeout after 10 seconds.
[1253236037] SERVICE ALERT: gadgets01;Linux Load Average;CRITICAL;SOFT;1;CHECK_NRPE: Socket timeout after 10 seconds.
[1253236037] SERVICE ALERT: gadgets01;Nagios Agent check;CRITICAL;SOFT;1;CHECK_NRPE: Socket timeout after 10 seconds.
[1253236079] SERVICE ALERT: gadgets02;Linux Hardware Spec;CRITICAL;SOFT;1;CHECK_NRPE: Socket timeout after 10 seconds.
[1253236081] SERVICE ALERT: gadgets02;Linux TCP Established;CRITICAL;SOFT;1;CHECK_NRPE: Socket timeout after 10 seconds.
[1253236097] HOST ALERT: gadgets01;UNREACHABLE;SOFT;1;CRITICAL - 10.10.10.19: rta nan, lost 100%
[1253236126] SERVICE ALERT: gadgets02;Linux Load Average;CRITICAL;SOFT;1;CHECK_NRPE: Socket timeout after 10 seconds.
[1253236126] SERVICE ALERT: gadgets02;Syslogd Check;CRITICAL;SOFT;1;CHECK_NRPE: Socket timeout after 10 seconds.
[1253236158] HOST NOTIFICATION: admin/distprofile;gadgetsdb01;UNREACHABLE;notify-by-atom;CRITICAL - 10.10.10.21: rta nan, lost 100%
[1253236186] HOST ALERT: gadgets02;DOWN;SOFT;1;CRITICAL - 10.10.10.20: rta nan, lost 100%

There is no difference.

PS. I had modified the hosts.cfg generating section of nagconfgen.pl,
changing check_interval from 0 to 5 in:

check_interval  0   ; For the moment, set check_interval to 0 so hosts only checked on demand, like Nagios 2
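To compare the two hosts with their parents (Public_OPEN and 10lan-OPEN) over the same period, it helps to reduce the nagios.log lines to their fields. Below is a Perl sketch. parse_alert is a hypothetical helper. Its field order follows the HOST ALERT and SERVICE ALERT lines shown above.

```perl
use strict;
use warnings;

# Split a nagios.log alert line into fields:
#   HOST ALERT:    host;state;state_type;attempt;output
#   SERVICE ALERT: host;service;state;state_type;attempt;output
# Returns undef for any other kind of line.
sub parse_alert {
    my ($line) = @_;
    return unless $line =~ /^\[(\d+)\] (HOST|SERVICE) ALERT: (.*)$/;
    my ( $time, $kind, $rest ) = ( $1, $2, $3 );
    # Limit the split so semicolons inside the plugin output are preserved
    my @f = split /;/, $rest, $kind eq 'HOST' ? 5 : 6;
    my %alert = ( time => $time, kind => $kind, host => shift @f );
    $alert{service} = shift @f if $kind eq 'SERVICE';
    @alert{qw(state state_type attempt output)} = @f;
    return \%alert;
}

my $a1 = parse_alert('[1253236097] HOST ALERT: gadgets01;UNREACHABLE;SOFT;1;CRITICAL - 10.10.10.19: rta nan, lost 100%');
print "$a1->{time} $a1->{host} $a1->{state} $a1->{state_type}\n";
```

Lines can be fed from something like grep -E 'gadgets0[12]|Public_OPEN|10lan-OPEN' /usr/local/nagios/var/nagios.log. Printing host, state and state_type in time order then shows what each parent's state was when a child was marked UNREACHABLE.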



2009/9/17 Ton Voon 

>
> That sounds strange. Can you provide relevant nagios.log entries around
> this time?
>
> I'm guessing that it could be a very deep nagios host logic problem (I note
> there are two parents for each of these hosts), but I'd need to know the
> recreation steps.
>
> Ton
>


Re: [opsview-users] Host UNREACHABLE Problem

2009-09-20 Thread Kang
Hi

I tried again.
Both parents are OK,
and there was no state change during the test.


2009/9/18 Ton Voon 

>
> Can you include a grep of the state of the parents?
>
> Ton
>
>


[opsview-users] Performance problem on scheduled downtime for many host(over 1000)

2009-09-22 Thread Kang
Hi

I set a scheduled downtime on the host group (checked by a slave), which has
about 1000 hosts and over 1 service checks, on the master server.
But it takes too long to complete the job (the Nagios daemon sends internal
scheduled downtime commands for every service, host and sub-hostgroup).

Apart from that problem, after doing this, all Nagios CGIs (status.cgi,
extinfo.cgi and so on) respond very slowly and the CGI processes hog 100% CPU.
Could this problem be caused by too many scheduled downtimes?


Re: [opsview-users] Performance problem on scheduled downtime for many host(over 1000)

2009-09-24 Thread Kang
Hello

I talked about this with duncs in the #opsview IRC channel.

Opsview 3.3.1 uses Nagios 3.0.6 now.
I found this in the changelog at http://www.nagios.org/development/history/core-3x:

3.1.2 - 06/23/2009

   - Fix for CPU hogging in service and host check scheduling logic


I heard from duncs that the next version of Opsview will migrate to Nagios 3.2,
so this problem should be solved automatically.



Re: [opsview-users] Performance problem on scheduled downtime for many host(over 1000)

2009-10-12 Thread Kang
Hello

I've tested with the new Opsview version 3.3.2 (Nagios 3.2 core),
but it has the same problem. :(



2009/9/25 Kang 

> Hello
>
> I talked about this in #opsview IRC channel with duncs.
>
> Opsview 3.3.1 uses Nagios ver. 3.06 now.
> I found the changelog at http://www.nagios.org/development/history/core-3x
>
> 3.1.2 - 06/23/2009
>
>- Fix for CPU hogging in service and host check scheduling logic
>
>
> I heard duncs that the next verision of opsview will migrate to nagios 3.2.
> so this problem will automatically be solved.
>
>
> 2009/9/22 Kang 
>
>> Hi
>>
>> I set a scheduled downtime on a host group (checked by a slave) that has
>> about 1000 hosts and over 1 service checks on the master server,
>> but it takes a very long time to complete the job (the Nagios daemon sends
>> internal scheduled downtime commands for every service, host and sub host
>> group).
>>
>> Setting that aside, after doing this the response time of all the Nagios
>> CGIs (status.cgi, extinfo.cgi and so on) becomes very long and
>> the CGI processes hog 100% CPU.
>> Could this problem be caused by too many scheduled downtimes?
>>
>>
>


Re: [opsview-users] I've posted screen shots previewing the new user interface in Opsview 3.5.0

2009-10-26 Thread Kang
Hello

Will the new framework for performance data replace the current RRD-based graphs?
RRDgraph supports many customizable options (stacking, multigraphing, etc.).
Will the new Flot-based (http://code.google.com/p/flot/) graph framework
support those features, and will it still read graph data from RRD files
with something like javascriptRRD (
http://sourceforge.net/projects/javascriptrrd/)?
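Flot plots a time series as an array of [x, y] pairs, with x in milliseconds since the epoch on a time axis, so any RRD-backed source has to be reshaped into that form. A minimal Python sketch, assuming evenly spaced samples where values[i] belongs to first_ts + i * step; the function names and input shape are hypothetical and are not javascriptRRD's API.

```python
import json
import math


def rrd_to_flot(first_ts, step, values):
    """Convert evenly spaced samples into Flot-style [[ms, y], ...] pairs.

    values[i] is assumed to belong to first_ts + i * step (seconds).
    Unknown samples (None or NaN) become None, i.e. JSON null, which
    Flot treats as a gap in the line.
    """
    series = []
    for i, v in enumerate(values):
        ms = (first_ts + i * step) * 1000  # Flot time axes use milliseconds
        if v is None or (isinstance(v, float) and math.isnan(v)):
            v = None
        series.append([ms, v])
    return series


def to_flot_json(first_ts, step, values):
    """Serialize the series as the JSON a browser-side Flot plot would load."""
    return json.dumps(rrd_to_flot(first_ts, step, values))
```

For example, `rrd_to_flot(1256000000, 300, [1.5, None])` yields `[[1256000000000, 1.5], [1256000300000, None]]`.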


and it will

2009/10/23 James Peel 

>
> I've posted screen shots previewing the new user interface in Opsview 3.5.0
> here: http://bit.ly/4d1rdI
> We've also moved to a new framework for displaying performance data.
>
> Let us know what you think!
>
>
> --
> James
>
>
>
>
>
>


Re: [opsview-users] SNMP "Query Host" returns an error

2009-11-17 Thread Kang
I guess it is the same problem as
http://lists.opsview.org/lurker/message/20090819.024639.881c7a50.en.html

I suspect it's caused by an invisible \x00 character.
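If that guess is right, removing NUL bytes from returned string values before using them would avoid the error. A hypothetical Python helper illustrating the idea, assuming values arrive as bytes or str; it is not part of the query_host script.

```python
def clean_snmp_string(raw):
    """Decode an SNMP string value and drop any NUL (\\x00) characters.

    A NUL is invisible when printed, but it can break later parsing or
    string comparisons. Hypothetical helper for illustration only.
    """
    if isinstance(raw, bytes):
        raw = raw.decode("utf-8", errors="replace")
    return raw.replace("\x00", "")
```

Only the NUL characters are removed; ordinary whitespace is left alone so that legitimate values are not altered.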



2009/11/18 Matt White 

>  Hi Duncs,
>
> Opsview is running on a VM (ESXi 4); the guest OS is Ubuntu 8.04 Server
> 32-bit.
>
> The Windows OSes are variations of Server 2003 R2 Standard/Enterprise and
> Server 2008 Standard.
>
> Not sure whether this is related to NIC teaming on the Windows servers,
> or to how the data is collected and returned by the query_host script?
>
> Kind regards,
>
> Matt
>
> From: opsview-users-boun...@lists.opsview.org
> [mailto:opsview-users-boun...@lists.opsview.org] On Behalf Of Duncan Ferguson
> Sent: 17 November 2009 14:54
> To: Opsview Users
> Subject: Re: [opsview-users] SNMP "Query Host" returns an error
>
> On 16 Nov 2009, at 17:01, Matt White wrote:
>
>> I have just checked again and it appears to be for all windows hosts.
>>
>> Removed and re-added one of the older servers and got the same error
>> message.
>
> What version of Opsview on what OS are you running?
>
>   Duncs
>
> --
> Duncan Ferguson
> Senior Developer
>
> Opsera Limited | Unit 69 Suttons Business Park
> Reading | Berkshire | RG6 1AZ | UK
>
> Phone:  +44 (0) 845 057 7887
> Mobile: +44 (0) 7968 148 748
> Skype:  duncan_j_ferguson
> Email:  duncan.fergu...@opsera.com
> www.opsera.com
>
> Opsera Limited is registered in the UK under Company Number 5396532. Our
> registered office is Gorse View, Horsell Rise, Woking, Surrey, GU21 4RB.
>
>
>
>
>


[opsview-users] opsview 3.5.0 graph

2009-11-26 Thread Kang
Hello

The new interface of Opsview 3.5.0 is very nice,
but the new JavaScript/Ajax-based graph framework is not fully satisfying.

It lacks customizable options compared to the old RRDgraph (upper/lower
limit, static cur/min/max legend labels, fast static image rendering, etc.),
and the x-axis tics of the graphs seem to show GMT timestamps rather than
local timestamps.
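Flot versions of this period format time-axis ticks in UTC, so a common workaround is to add the viewer's UTC offset to every x value before plotting. The Python sketch below shows only the arithmetic, with a fixed example offset of +9 hours (an arbitrary choice); it ignores daylight-saving changes, and the helper names are hypothetical.

```python
from datetime import datetime, timezone


def shift_for_utc_display(epoch_ms, utc_offset_seconds):
    """Shift a timestamp so that a UTC-only formatter prints local wall-clock time."""
    return epoch_ms + utc_offset_seconds * 1000


def utc_tick_label(epoch_ms):
    """Label a timestamp the way a UTC-only tick formatter would."""
    return datetime.fromtimestamp(epoch_ms / 1000, tz=timezone.utc).strftime("%H:%M")


EXAMPLE_OFFSET = 9 * 3600   # example offset only (UTC+9)
ts = 1259193600000          # 2009-11-26 00:00:00 UTC

print(utc_tick_label(ts))                                          # 00:00
print(utc_tick_label(shift_for_utc_display(ts, EXAMPLE_OFFSET)))   # 09:00
```

The shifted values are only for display; the underlying data should stay in real epoch time.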

I've found I can still use RRDgraph via the /graphrrd URL instead of /graph, but
then the slide menu in the upper-right corner disappears.
Are you planning to continue maintaining both graph frameworks?

I prefer static-image RRDgraph to JavaScript/Ajax-based graphs,
because it is fast and easy to integrate with other systems.

What do you think about that?


Re: [opsview-users] opsview 3.5.0 graph

2009-12-02 Thread Kang
>
> On Nov 27, 2009, at 1:48 AM, Kang wrote:
>
>> it's lack of customizable options compare to old rrdgraph( upper/
>> lower limit, static cur/min/max legend label, fast static image
>> rendering, etc.)
>
> What do you mean by upper/lower limit? I think there are some possible
> optimisations with how the graphs choose the top and bottom values for the
> y-axis.
>
I want to set the lower/upper limits of the y-axis range myself,
but I can't find any menu option or parameter to do so.

RRDgraph has these options, in opsview-web/lib/Opsview/Web/Controller/RRDgraph.pm
(lines 123-127):

    #if (defined $full_size_mode){ push @$rrdoptions, "--full-size-mode"; }
    if ( defined $upper_limit )   { push @$rrdoptions, "--upper-limit", $upper_limit }
    if ( defined $lower_limit )   { push @$rrdoptions, "--lower-limit", $lower_limit }
    if ( defined $rigid )         { push @$rrdoptions, "--rigid" }
    if ( defined $alt_autoscale ) { push @$rrdoptions, "--alt-autoscale" }


http://oss.oetiker.ch/rrdtool/doc/rrdgraph.en.html

[-u|--upper-limit value] [-l|--lower-limit value] [-r|--rigid]

By default the graph will be autoscaling so that it will adjust the y-axis
to the range of the data. You can change this behavior by explicitly setting
the limits. The displayed y-axis will then range at least from lower-limit to
upper-limit. Autoscaling will still permit those boundaries to be
stretched unless the rigid option is set.
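Those semantics can be modelled in a few lines. This is a simplified Python sketch that ignores rrdtool's extra rounding to "nice" tick values; `y_axis_range` is a hypothetical name, not an rrdtool function.

```python
def y_axis_range(data_min, data_max, lower_limit=None, upper_limit=None,
                 rigid=False):
    """Simplified model of rrdgraph's --lower-limit/--upper-limit/--rigid.

    Without limits the axis follows the data. With limits the axis covers
    at least [lower_limit, upper_limit]; data outside them still stretches
    the axis unless rigid is set.
    """
    lo, hi = data_min, data_max
    if lower_limit is not None:
        lo = lower_limit if rigid else min(lower_limit, data_min)
    if upper_limit is not None:
        hi = upper_limit if rigid else max(upper_limit, data_max)
    return lo, hi
```

For example, data spanning 5..150 with limits 0..100 gives an axis of 0..150, but 0..100 once rigid is set.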


Re: [opsview-users] Opsview 3.5.1 released

2009-12-23 Thread Kang
Hello

I told Duncan in the #opsview IRC channel that graph autoscaling in version
3.5.1 does not work by default.
Please let me know where I can get the patch for this problem.

Thank you in advance.


2009/12/22 Duncan Ferguson 

> Opsview 3.5.1 is now available!  More information at
> http://opsview.org/opsview_3.5.1
>
> Merry Christmas and a Happy New Year from everyone here at Opsera.
>
>  Duncs
>


Re: [opsview-users] Bug in cleanup of import_runtime

2010-01-18 Thread Kang
Hello.

https://secure.opsera.com/jira/browse/OPS-950

I'm using Opsview v3.3.2 now,
but I don't think we have the service_saved_state table growth problem (in
our DB, its size is 220MB).
(I executed the DELETE query you pasted, but the table's data length did
not change.)
I doubt it is the main cause of the performance degradation.


2010/1/18 Ton Voon 

> Hi!
>
> Just wanted to make people aware of a fix for reducing the time to run
> import_runtime. This affects people if they are importing into ODW and have
> a large number of services.
>
> The cleanup section wasn't getting invoked correctly, which means that the
> odw.service_saved_state table will continue to grow, and this may increase
> the duration of the hourly imports into ODW.
>
> You can run this SQL command on your ODW database if you find that the
> odw.service_saved_state table is too large:
>
> mysql>  DELETE FROM service_saved_state WHERE opsview_instance_id = 1 AND
> start_timev <= UNIX_TIMESTAMP(NOW() - INTERVAL 7 DAY)
>
> The patch is here:
>
>
> https://secure.opsera.com/wsvn/wsvn/opsview/trunk/opsview-core/bin/import_runtime?op=diff&;
>
> This will be included in a future release.
>
> Ton
>
>
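The DELETE above keeps roughly the last 7 days of rows for instance 1. A Python sketch of the same cutoff logic against SQLite, assuming a hypothetical minimal service_saved_state table with only the two columns the query references (the real ODW schema has more):

```python
import sqlite3

SECONDS_PER_DAY = 86400


def purge_saved_state(conn, instance_id, keep_days, now):
    """Delete service_saved_state rows older than keep_days for one instance.

    Mirrors the WHERE clause of the MySQL DELETE; `now` is a Unix timestamp.
    Subtracting whole days of seconds ignores DST shifts, so the cutoff may
    differ by an hour from MySQL's NOW() - INTERVAL 7 DAY across a DST change.
    """
    cutoff = now - keep_days * SECONDS_PER_DAY
    cur = conn.execute(
        "DELETE FROM service_saved_state "
        "WHERE opsview_instance_id = ? AND start_timev <= ?",
        (instance_id, cutoff),
    )
    conn.commit()
    return cur.rowcount


# Hypothetical minimal table: only the two columns the query uses.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE service_saved_state "
             "(opsview_instance_id INTEGER, start_timev INTEGER)")
now = 1263800000
conn.executemany(
    "INSERT INTO service_saved_state VALUES (?, ?)",
    [(1, now - 10 * SECONDS_PER_DAY),   # old, instance 1: deleted
     (1, now - 1 * SECONDS_PER_DAY),    # recent: kept
     (2, now - 10 * SECONDS_PER_DAY)],  # old, but another instance: kept
)
deleted = purge_saved_state(conn, 1, 7, now)
remaining = conn.execute("SELECT COUNT(*) FROM service_saved_state").fetchone()[0]
```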


Re: [opsview-users] Bug in cleanup of import_runtime

2010-01-19 Thread Kang
Hello.

Where should I apply the patch in v3.3.2's import_runtime (maybe
https://secure.opsera.com/wsvn/wsvn/opsview/trunk/opsview-core/bin/import_runtime?rev=3336&peg=3336)?
I looked into the script, but the patched parts (
https://secure.opsera.com/wsvn/wsvn/opsview/trunk/opsview-core/bin/import_runtime?op=diff&;)
are no different.

And where is the service_saved_state table used?
I can't find the table in the ODW DB schema diagram.



2010/1/19 Ton Voon 

>
> On 19 Jan 2010, at 02:33, Kang wrote:
>
> https://secure.opsera.com/jira/browse/OPS-950
>
> I'm using Opsview v 3.3.2 now.
> but i think there is no service_saved_state table(in our DB, its size is
> 220MB)  growing problem.
> ( i executed the DELETE query which you pasted but there is no change in
> table data length size. )
> i doubt it is the main cause of performance degradation.
>
>
> On one customer's large system, this was a problem which we saw during the
> data load, so we've fixed this problem and raised it on the mailing lists.
>
> Your system may have a different issue. We can't look into it without a
> support contract :(
>
> The DELETE will not reduce the table's data length (the size of the data
> file); you will need to run a MySQL optimise (OPTIMIZE TABLE) to do that.
>
> You can see the dataload timings by running this command:
> http://docs.opsview.org/doku.php?id=opsview-community:odw#how_long_does_a_dataload_take
>
> Ton
>
>
>
>
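As Ton notes, a DELETE leaves the freed space inside the data file until the table is rebuilt. As a runnable analogue (SQLite, not MySQL), the Python sketch below shows the same effect: the page count is unchanged after a DELETE, and drops only after VACUUM, which plays roughly the role of MySQL's OPTIMIZE TABLE here. The table name and sizes are arbitrary.

```python
import os
import sqlite3
import tempfile


def page_stats(conn):
    """Return (total pages, pages on the freelist) for the main database."""
    pages = conn.execute("PRAGMA page_count").fetchone()[0]
    free = conn.execute("PRAGMA freelist_count").fetchone()[0]
    return pages, free


def reclaim_demo():
    with tempfile.TemporaryDirectory() as tmp:
        conn = sqlite3.connect(os.path.join(tmp, "demo.db"))
        conn.execute("PRAGMA auto_vacuum = NONE")  # freed pages stay in the file
        conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, payload TEXT)")
        conn.executemany("INSERT INTO t (payload) VALUES (?)",
                         [("x" * 200,) for _ in range(5000)])
        conn.commit()
        before = page_stats(conn)

        conn.execute("DELETE FROM t WHERE id <= 4000")
        conn.commit()
        after_delete = page_stats(conn)  # same size: pages only move to the freelist

        conn.execute("VACUUM")           # rebuild the file without the free pages
        after_vacuum = page_stats(conn)
        conn.close()
    return before, after_delete, after_vacuum


before, after_delete, after_vacuum = reclaim_demo()
```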


Re: [opsview-users] Opsview 3.5.2 released

2010-02-03 Thread Kang
Hello.

Thank you all for your efforts!
However, it seems that the JavaScript-based graphs in Opsview 3.5.2 still do
not autoscale, and instead show odd scientific notation by default.
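One way to avoid scientific notation on the y-axis is a tick label formatter that scales values by SI prefixes, similar in spirit to rrdtool's default labels. In Flot this would be a JavaScript function passed as the axis tickFormatter option. The Python sketch below only shows the formatting logic, uses a hypothetical function name, and does not handle values below 1 (milli, micro).

```python
def si_format(value, precision=1):
    """Format a number with an SI prefix (k, M, G, ...) instead of e-notation."""
    prefixes = ["", "k", "M", "G", "T", "P"]
    magnitude = abs(value)
    i = 0
    # Divide by 1000 until the value fits below 1000 or we run out of prefixes
    while magnitude >= 1000 and i < len(prefixes) - 1:
        magnitude /= 1000
        value /= 1000
        i += 1
    return f"{value:.{precision}f}{prefixes[i]}"
```

For example, `si_format(1234567)` gives `"1.2M"` rather than `1.234567e+06`.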