Hi,

On Tue, Mar 09, 2010 at 11:37:02AM -0000, darren.mans...@opengi.co.uk wrote:
> Hi everyone.
> 
> Further to some discussions a couple of weeks ago with regard to OCFS2
> on SLES 11 HAE, I'm looking to finally nail this problem.
> 
> We have a 3-node cluster that has a STONITH shootout every week. This
> morning one node got stuck in a state where it couldn't be fenced due
> to the RSA not being responsive.
> 
> I'm not sure if the problem is due to:
> 
> * Network interruption causing Totem failures.
> * Java (Tomcat) processes falling over.

I suppose those are activequote and activequoteadmin. You should
increase the timeouts: 10 seconds is too short in general, and
probably even more so for Java/Tomcat.
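For example, something along these lines in the crm shell would raise
them. This is only a sketch: the resource agent (ocf:heartbeat:tomcat)
and every timeout value below are assumptions, so check them against
your actual activequote/activequoteadmin definitions and against how
long Tomcat really takes to start and stop on your nodes:

```
# crm configure syntax; all values are illustrative assumptions.
# Default timeout for any operation that doesn't set its own:
op_defaults timeout="60s"

# Per-operation timeouts on the activequote primitive. The agent
# ocf:heartbeat:tomcat is an assumption; keep your existing params.
# The 10s monitor interval is kept; only the timeouts are raised.
primitive activequote ocf:heartbeat:tomcat \
        op monitor interval="10s" timeout="60s" \
        op start interval="0" timeout="180s" \
        op stop interval="0" timeout="180s"
```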

> * DLM falling over.
> * Any of the above in any combination.
> 
> I've attached a hb_report. Could you take a look and see if you can spot anything?

Is there a good reason to ignore quorum? For a three-node cluster
you should remove the no-quorum-policy property or, perhaps because
of OCFS2, set it to freeze.
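Either of these should do it (a sketch using the crm_attribute and
crm shell tools that come with Pacemaker; without the property, the
default policy is stop, while freeze leaves running resources alone
instead of stopping them):

```
# Option 1: delete the property so the default (stop) applies again
crm_attribute -t crm_config -n no-quorum-policy -D

# Option 2: freeze instead of stopping when quorum is lost
crm configure property no-quorum-policy=freeze
```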

Pacemaker is at 1.0.3; perhaps it's time to upgrade, too. There is
a SLE11 HAE update available.

From the logs:

Mar  9 06:28:43 OGG-ACTIVEQUOTE-02 pengine: [5540]: WARN: unpack_rsc_op: Processing failed op activequote:1_monitor_10000 on OGG-ACTIVEQUOTE-03: unknown exec error

Interestingly, there is no corresponding lrmd log entry on 03.

Then there are several operation timeouts, perhaps due to OCFS2
hanging. Two stop operations, for activequote and activequoteadmin,
could not be killed even with -9, so they were probably blocked
waiting for the disk.

Mar  9 06:29:40 OGG-ACTIVEQUOTE-02 openais[5439]: [crm  ] info: pcmk_peer_update: lost: OGG-ACTIVEQUOTE-03 504997642

Do you know why the node vanished? You should try to keep your
networking healthy.

Thanks,

Dejan

> Thanks
> 
> Darren Mansell


> _______________________________________________
> Pacemaker mailing list
> Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker

