Re: [rsyslog] High availability on rsyslog (cluster)

david Wed, 01 Jun 2011 12:22:50 -0700

On Wed, 1 Jun 2011, Rainer Gerhards wrote:

Hi David, Christian,


I finally have found time to look into the provided links. This looks indeed
very simple from an rsyslog PoV. However, I get the feeling that I myself may
not be the best person to do the majority of the work: developing the actual
OCF scripts seems to benefit a lot from access to a test cluster and
experience with it (!). So I wonder if any of you would be interested in
helping to get this going (with the scripts becoming part of the regular
rsyslog release).


I've got the clusters to work with and would be happy to help

but you really don't need a cluster to test it (more below)

As far as I understand, I would need to implement some facility inside
rsyslog that can be used to check its health by the monitor script. Or would
it even be an alternative for the monitor script to just check if the rsyslog
process to be monitored is in the process list?


it depends how reliable you want the testing to be.

having the monitor check if rsyslog is in the process table is an easy first step (and is all that is done for many applications), but it has a couple of problems

1. if something goes wrong where rsyslog is up, but not processing messages (full buffer, full disk, etc) the cluster software will think that everything is fine.

2. you really want to have rsyslog running all the time, even when you aren't active so that this system can log.

having some way to ask rsyslog if everything is good would be very handy (and possibly provide the start of some interactive debugging tool???). it is not a requirement, but would be better.
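A minimal sketch of the "is it in the process table" check, written in Python only for illustration (real OCF agents are commonly shell scripts). It assumes rsyslog writes its pid to a pid file; the helper names and the pid file location are hypothetical. As noted in problem 1 above, a check like this only shows that the process exists, not that it is processing messages.

```python
import os


def pid_from_file(path):
    """Return the pid stored in a pid file, or None if it cannot be read."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def process_alive(pid):
    """POSIX-only check: signal 0 tests for existence without sending a signal."""
    if pid is None or pid <= 0:
        # pid 0 and negative pids address process groups, so refuse them
        return False
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False          # no such process
    except PermissionError:
        return True           # process exists but belongs to another user
    return True
```

A monitor action could call `process_alive(pid_from_file(path))` with whatever pid file path the installation uses.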


it's all a matter of how far you want to go.


at a minimum it needs to implement start, stop, monitor, meta-data

A. meta-data is a pretty trivial thing (a series of echo statements to output some XML)

B. start and stop get a little more complicated since rsyslog needs to be running even if a box is not active

when rsyslog is active, it means that it's running with the 'active' configuration, when it's inactive, it means that it's running with the 'inactive' configuration

this can be done with no changes to rsyslog, simply by having two different .conf files and having the OCF script stop the current instance of rsyslog and start the correct one. If rsyslog has the ability to switch between two configs internally (which may be possible with the new config support), then this could be a signal of some sort to rsyslog.


C. monitor is where things get interesting

monitor needs to return one of three conditions, active (0), standby (7), or failed (anything else)

if start/stop are done by starting rsyslog with one of two different config files, the script can keep track of which config file it started rsyslog with and return the appropriate value. it can also check if rsyslog is running and if not return an error.

if start/stop are done by rsyslog internally, monitor needs to ask rsyslog which state it is in (or return an error state)
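The two-config-file variant of monitor could be sketched roughly like this, in Python for illustration only. It assumes the start/stop actions record the mode they last started ("active" or "inactive") in a state file; the state file, `read_mode` and `monitor_result` are hypothetical names, and 1 is just one possible "anything else" value for failed.

```python
from typing import Optional

OCF_SUCCESS = 0        # reported as "active" in the description above
OCF_NOT_RUNNING = 7    # reported as "standby"
OCF_ERR_GENERIC = 1    # assumption: one possible "failed" value


def read_mode(state_file: str) -> Optional[str]:
    """Return the mode recorded by the last start/stop, or None if unknown."""
    try:
        with open(state_file) as f:
            return f.read().strip() or None
    except OSError:
        return None


def monitor_result(running: bool, started_mode: Optional[str]) -> int:
    """Combine a liveness check with the recorded mode into a monitor code."""
    if not running:
        # rsyslog should be up in both modes, so "not running" is a failure
        return OCF_ERR_GENERIC
    if started_mode == "active":
        return OCF_SUCCESS
    if started_mode == "inactive":
        return OCF_NOT_RUNNING
    return OCF_ERR_GENERIC   # unknown or missing state
```

Keeping the decision in a pure function makes it easy to check all combinations without a running rsyslog.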

to test this, you don't need a cluster, all you need is to run the script to test the various modes. but you need to do it in the following order


start
start
monitor
stop
stop
monitor

this is to test to make sure that things don't get confused if it gets 'started' or 'stopped' twice
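The sequence above can be exercised without a cluster. The sketch below, in Python for illustration, runs it against an in-memory stand-in; `FakeAgent` is hypothetical and touches no real rsyslog, and its initial 'inactive' state is an assumption. A real test would invoke the actual script with each action and compare exit codes in the same way.

```python
SEQUENCE = ["start", "start", "monitor", "stop", "stop", "monitor"]


class FakeAgent:
    """In-memory stand-in for the resource agent, used only to show the test."""

    def __init__(self):
        self.mode = "inactive"    # assumed initial state

    def start(self):
        self.mode = "active"      # a second start leaves the state unchanged
        return 0

    def stop(self):
        self.mode = "inactive"    # a second stop leaves the state unchanged
        return 0

    def monitor(self):
        return 0 if self.mode == "active" else 7


def run_sequence(agent, actions):
    """Call each action in order and collect (action, return code) pairs."""
    return [(name, getattr(agent, name)()) for name in actions]


if __name__ == "__main__":
    for name, rc in run_sequence(FakeAgent(), SEQUENCE):
        print(name, rc)
```

The point of the doubled calls is that the second start and the second stop must succeed and must not change what monitor reports afterwards.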

the one other wrinkle in all of this is that when rsyslog starts at system boot time, you want it to start in the 'inactive' mode

In my case I have dedicated relay systems, and what I do is have rsyslog running on both boxes in the pair all the time. I then have an IP address that I move from one box to the other to do the failover. this works 99+% of the time, but in the very rare cases where rsyslog dies, this isn't detected.


David Lang

Any comments, advice, collaboration is deeply appreciated.

Rainer
PS: just in case: tomorrow is a public holiday over here, and I may leave for
a long weekend. I still thought I'd get this effort kicked off...

-----Original Message-----
From: [email protected] [mailto:rsyslog-
[email protected]] On Behalf Of [email protected]
Sent: Tuesday, May 24, 2011 8:13 AM
To: rsyslog-users
Subject: Re: [rsyslog] High availability on rsyslog (cluster)

take a look at

http://linux-ha.org/wiki/Resource_Agents

and

http://www.linux-ha.org/doc/dev-guides/ra-dev-guide.html

David Lang

On Tue, 24 May 2011, Rainer Gerhards wrote:

Date: Tue, 24 May 2011 08:09:28 +0200
From: Rainer Gerhards <[email protected]>
Reply-To: rsyslog-users <[email protected]>
To: rsyslog-users <[email protected]>
Subject: Re: [rsyslog] High availability on rsyslog (cluster)

Thx -- sounds interesting and probably not too much work to do...

Rainer

-----Original Message-----
From: [email protected] [mailto:rsyslog-
[email protected]] On Behalf Of [email protected]
Sent: Tuesday, May 24, 2011 8:08 AM
To: rsyslog-users
Subject: Re: [rsyslog] High availability on rsyslog (cluster)

take a look at linux-ha

It's a framework to manage HA (including active/active load sharing,
quorums, etc)

it extends the traditional init.d startup scripts to also include a
'status' call to tell if the service is active or not. the framework
calls this service periodically and if the service fails, it does a
failover.
With the correct configuration (and software), it can do sub-second
failover.

David Lang

On Tue, 24 May 2011, Rainer Gerhards wrote:

David and all,

are you aware of any high availability APIs that would enable rsyslog
to do some kind of automatic failover in a cluster environment? I have
never specifically programmed for that and wonder if there are any options.

Rainer

-----Original Message-----
From: [email protected] [mailto:rsyslog-
[email protected]] On Behalf Of [email protected]
Sent: Tuesday, May 24, 2011 12:30 AM
To: rsyslog-users
Subject: Re: [rsyslog] High availability on rsyslog (cluster)

depending on how active your logging is, you could watch the logs and say
that if you don't receive any logs for 1 min (or whatever time is
appropriate), something is wrong.

you could also generate known UDP logs to yourself and alert if
they don't show up.
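A rough sketch of the "known UDP logs to yourself" idea, in Python for illustration. The function names, the "ha-probe" tag and the way received lines are collected are all assumptions; the only fixed detail is the syslog priority prefix, where 14 is facility user (1) times 8 plus severity informational (6).

```python
import socket
import uuid


def send_probe(host, port):
    """Send one syslog-style UDP datagram with a unique token; return the token."""
    token = uuid.uuid4().hex
    # "ha-probe" is a hypothetical tag chosen for this example
    msg = f"<14>ha-probe: {token}".encode("ascii")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.sendto(msg, (host, port))
    return token


def probe_seen(token, lines):
    """Return True if any of the received log lines contains the token."""
    return any(token in line for line in lines)
```

A checker would call `send_probe`, wait a little, read the recent lines from wherever rsyslog writes them, and alert if `probe_seen` is false.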

David Lang

  On Mon, 23 May 2011, Christian Lete wrote:

Hi,

I have a small question. I need to set up an rsyslog receiver/forwarder
listening on a UDP port, since some clients only support this option. I need
this service to be highly available (I don't want to have two machines with
duplicated information), but since this is UDP, I can't be sure whether the
service is running fine. What I thought of is to check it indirectly, by
having another port listening on TCP and checking the TCP service: if the
service is not running on TCP, I would assume the whole system is down and
would fail over to the other instance of the cluster. That's the only way I
could think of; do you currently have another way?


thank you very much,

Regards,

Christian
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com
