Re: [Linux-HA] FW cluster fails at 4am

2014-01-07 Thread Andrew Beekhof
On 28 Dec 2013, at 3:34 pm, Tracy Reed tr...@ultraviolet.org wrote: Hello all, First, thanks in advance for any help anyone may provide. I've been battling this problem off and on for months and it is driving me mad: Once every week or two my cluster fails. For reasons unknown it seems

Re: [Linux-HA] FW cluster fails at 4am

2014-01-07 Thread Andrew Beekhof
On 7 Jan 2014, at 10:52 am, Tracy Reed tr...@ultraviolet.org wrote: On Sat, Dec 28, 2013 at 12:42:28AM PST, Jefferson Ogata spake thusly: Is it possible that it's a coincidence of log rotation after patching? In certain circumstances I've had library replacement or subsequent prelink

Re: [Linux-HA] FW cluster fails at 4am

2014-01-06 Thread Tracy Reed
On Sat, Dec 28, 2013 at 12:42:28AM PST, Jefferson Ogata spake thusly: Is it possible that it's a coincidence of log rotation after patching? In certain circumstances I've had library replacement or subsequent prelink activity on libraries lead to a crash of some services during log rotation.

Re: [Linux-HA] FW cluster fails at 4am

2013-12-28 Thread Jefferson Ogata
On 2013-12-28 06:13, Tracy Reed wrote: On Fri, Dec 27, 2013 at 08:54:17PM PST, Jefferson Ogata spake thusly: Log rotation tends to run around that time on Red Hat. Check your logrotate configuration. Maybe something is rotating corosync logs and using the wrong signal to start a new log file.
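
One way to act on the "check your logrotate configuration" suggestion is to list every logrotate stanza that touches a corosync log, together with whatever its postrotate script runs. The following is only a rough sketch: the helper names (`stanzas_mentioning`, `postrotate_commands`) and the `SAMPLE` configuration are hypothetical and were not posted in the thread, and the parser assumes plain `path { ... }` stanzas with no braces inside the scripts.

```python
import re

# Hypothetical sample config for illustration only; not taken from the thread.
SAMPLE = """
# global options
weekly
/var/log/cluster/corosync.log {
    missingok
    postrotate
        kill -HUP $(cat /var/run/example.pid)
    endscript
}
/var/log/messages {
    sharedscripts
}
"""

# A stanza is "<paths> { <directives> }". Assumption: no nested braces.
STANZA = re.compile(r"([^{}]+)\{([^{}]*)\}")


def postrotate_commands(body):
    """Collect the non-empty lines between 'postrotate' and 'endscript'."""
    commands, inside = [], False
    for line in body.splitlines():
        stripped = line.strip()
        if stripped == "postrotate":
            inside = True
        elif stripped == "endscript":
            inside = False
        elif inside and stripped:
            commands.append(stripped)
    return commands


def stanzas_mentioning(config_text, keyword):
    """Return (paths, postrotate commands) for stanzas whose paths contain keyword."""
    # Drop comment lines so they are not mistaken for paths or directives.
    text = "\n".join(
        line for line in config_text.splitlines()
        if not line.lstrip().startswith("#")
    )
    found = []
    for match in STANZA.finditer(text):
        # Global directives may precede a stanza; keep only absolute paths.
        paths = [tok for tok in match.group(1).split() if tok.startswith("/")]
        if any(keyword in path for path in paths):
            found.append((paths, postrotate_commands(match.group(2))))
    return found


if __name__ == "__main__":
    for paths, commands in stanzas_mentioning(SAMPLE, "corosync"):
        print(paths, commands)
```

On a real system the text would come from reading `/etc/logrotate.conf` and the files under `/etc/logrotate.d/` rather than from `SAMPLE`. Any signal that shows up in a postrotate script can then be checked against what the daemon actually expects for reopening its log.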

Re: [Linux-HA] FW cluster fails at 4am

2013-12-27 Thread Jefferson Ogata
On 2013-12-28 04:34, Tracy Reed wrote: First, thanks in advance for any help anyone may provide. I've been battling this problem off and on for months and it is driving me mad: Once every week or two my cluster fails. For reasons unknown it seems to initiate a failover and then the shorewall

Re: [Linux-HA] FW cluster fails at 4am

2013-12-27 Thread Tracy Reed
On Fri, Dec 27, 2013 at 08:54:17PM PST, Jefferson Ogata spake thusly: Log rotation tends to run around that time on Red Hat. Check your logrotate configuration. Maybe something is rotating corosync logs and using the wrong signal to start a new log file. That was actually the first thing I