That and also we've seen similar issues with Broadcom TG3 drivers. We use
Intel E1000 mostly and thus did not experience the same issue.

As far as the configurable net timeouts go, the patch was added into
mainline on Dec 4th. So it will be available with ocfs2 1.4. We are still
seeing if we have the bandwidth to backport it to 1.2.

http://kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=history;f=fs/ocfs2/cluster/tcp.c;h=ae4ff4a6636b23759522994898a95c148a4401f1;hb=HEAD

commit 828ae6afbef03bfe107a4a8cc38798419d6a2765
Author: Andrew Beekhof <[EMAIL PROTECTED]>
Date:   Mon Dec 4 14:04:55 2006 +0100

   [patch 3/3] OCFS2 Configurable timeouts - Protocol changes

   Modify the OCFS2 handshake to ensure essential timeouts are configured
   identically on all nodes.

   Only allow changes when there are no connected peers

   Improves the logic in o2net_advance_rx() which broke now that
   sizeof(struct o2net_handshake) is greater than sizeof(struct o2net_msg)

   Included is the field for userspace-heartbeat timeout to avoid the need for
   further protocol changes.

   Uses a global spinlock to ensure the decisions to update configfs entries
   are made on the correct value.  The region covered by the spinlock when
   incrementing the counter is much larger as this is the more critical case.

   Small cleanup contributed by Adrian Bunk <[EMAIL PROTECTED]>

   Signed-off-by: Andrew Beekhof <[EMAIL PROTECTED]>
   Signed-off-by: Mark Fasheh <[EMAIL PROTECTED]>

commit b5dd80304da482d77b2320e1a01a189e656b9770
Author: Jeff Mahoney <[EMAIL PROTECTED]>
Date:   Mon Dec 4 14:04:54 2006 +0100

   [patch 2/3] OCFS2 Configurable timeouts

   Allow configuration of OCFS2 timeouts from userspace via configfs

   Signed-off-by: Andrew Beekhof <[EMAIL PROTECTED]>
   Signed-off-by: Mark Fasheh <[EMAIL PROTECTED]>

Andy Phillips wrote:
Hello,

   I've made some progress with the o2net_idle_timer issue. Various
people seem to occasionally report instability and faults where the
following message is generated;

(From Andrew Brunton)
Sep 17 22:06:04 argon2 kernel: (0,0):o2net_idle_timer:1310 connection to
node argon1.crewe.ukfuels.co.uk (num 0) at 10.1.1.110:7777 has been idle
for 10 seconds, shutting it down.

(From Peter Santos)
Nov 21 11:40:36 dbo3 kernel: o2net: connection to node dbo2 (num 1) at
192.168.134.141:7777 has been idle for 10 seconds, shutting it down.

And from me;
Aug  2 19:06:27 fred kernel: o2net: connection to node barney (num 0) at
172.16.6.10:7777 has been idle for 10 seconds, shutting it down.

I've tried unsuccessfully to replicate the issue on my testbed
environment. The problem stems from the o2net layer function
'o2net_idle_timer' firing after no valid packet has been received for
O2NET_IDLE_TIMEOUT_SECS, which is defined to be 10 seconds in
ocfs2-1.2.3/fs/ocfs2/cluster/tcp_internal.h. This then causes the rest
of the code to fall over in a heap once the underlying socket goes.
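
As a rough illustration of the behaviour described above, the idle check boils down to comparing the time since the last valid packet against a fixed limit. This is only a sketch in Python, not the kernel code: the class and method names are hypothetical, and the only value taken from the source is the 10 second O2NET_IDLE_TIMEOUT_SECS.

```python
# Hypothetical model of an idle timer; this is not the o2net implementation.
O2NET_IDLE_TIMEOUT_SECS = 10  # value defined in tcp_internal.h for ocfs2 1.2.3


class IdleTimer:
    """Tracks the time of the last valid packet received from a peer."""

    def __init__(self, now, timeout=O2NET_IDLE_TIMEOUT_SECS):
        self.timeout = timeout
        self.last_packet = now

    def packet_received(self, now):
        # Any valid packet resets the idle clock.
        self.last_packet = now

    def expired(self, now):
        # True once the peer has been silent for the full timeout.
        return now - self.last_packet >= self.timeout


timer = IdleTimer(now=0.0)
timer.packet_received(now=4.0)
print(timer.expired(now=13.0))  # 9 s of silence: connection stays up
print(timer.expired(now=14.0))  # 10 s of silence: connection would be shut down
```

Anything that stalls packet processing for the full timeout (a driver hogging interrupts, for instance) looks the same to this check as a dead peer.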

It turns out that it's very likely not a bug in ocfs2. This code is
doing what it's supposed to do. Others will argue (and have argued) that
the network timeout is too low - see any and all posts by Alexei to this
list. Leaving that aside, along with the idea that the network layer
should make an attempt at reconnecting before killing the entire
machine, I'll focus on the causes we've found here of this problem which
are not spanning tree related.

One common thread is that people hitting this are on EM64T or Opteron
based systems. There are various bugs reported against RedHat Linux (and
probably SuSE as well) for the kernels before RHAS 4.4.
e.g. page 16 of this document - "lost ticks" Message Under Stress With
Non Uniform Memory Access Enabled on AMD Processor-Based Systems
http://support.dell.com/support/edocs/software/osrhel4/en/INT/HJ834A00.pdf

Or Oracle bug 4593892 referenced in;
http://www.oracle.com/technology/tech/linux/validated-configurations/html/vc_dell6850-rhel4-cx500-1_1.html

We were also seeing messages of the form;

Dec 18 10:35:44 gs2dwdb02 kernel: warning: many lost ticks.
Dec 18 10:35:44 gs2dwdb02 kernel: Your time source seems to be instable
or some driver is hogging interupts
(sic)

Our problem seems to have been at least partially down to dodgy AMI
megaraid firmware for the system disks. We were getting messages from
the megaraid driver module on the console, which correlated with dropped
packets as logged by Oracle RAC's cssd.log.

So given the above NUMA and driver/hardware errors, it's likely that
ocfs2 was going for periods as long as 10 seconds without receiving a
packet, and failing accordingly.

Ocfs2 was hit the worst, as it has the finest trigger on lost packets.
The heartbeat failure times for RAC are over 60 seconds. The o2cb
heartbeat threshold is set to 61 for us, which is about 120 seconds
IIRC, which is fine for riding out SAN interruptions and multipathing
failover failures.
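
The step from a threshold of 61 to roughly 120 seconds is simple arithmetic, sketched below. It assumes that the disk heartbeat runs every 2 seconds and that a node is declared dead after (threshold - 1) missed beats. That formula is an assumption chosen to match the figures quoted here, not something taken from the code, and the function name is made up for the example.

```python
# Assumption: one disk heartbeat every 2 seconds, and a node is declared dead
# after (threshold - 1) missed intervals. This matches the 61 -> ~120 s figure.
HEARTBEAT_INTERVAL_SECS = 2


def heartbeat_timeout_secs(threshold):
    """Approximate o2cb disk heartbeat timeout for a given threshold."""
    return (threshold - 1) * HEARTBEAT_INTERVAL_SECS


print(heartbeat_timeout_secs(61))  # 120
```

Under the same assumption, the 10 second network idle timeout is an order of magnitude tighter than the disk heartbeat, which is why it trips first.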

We're planning an upgrade to RHAS 4.4, which apparently has fixed
several of these bugs, and would recommend that others with this problem
check carefully for signs of driver misbehaviour, particularly lost
ticks messages. If you're running a large AMD box with more than a
couple of sockets, then turning NUMA off seems to be a way of making
things more stable, according to some PDFs.
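
One way to run that check is to count the two kinds of message in the kernel log and see whether they turn up together. The Python sketch below does this with patterns copied from the log excerpts quoted in this message. The function name and the third sample line are made up for illustration.

```python
import re

# Patterns taken from the log excerpts quoted in this message.
PATTERNS = {
    "lost_ticks": re.compile(r"many lost ticks"),
    "o2net_idle": re.compile(r"has been idle for \d+ seconds"),
}


def scan(lines):
    """Count how many log lines match each pattern."""
    counts = {name: 0 for name in PATTERNS}
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


sample = [
    "Dec 18 10:35:44 gs2dwdb02 kernel: warning: many lost ticks.",
    "Nov 21 11:40:36 dbo3 kernel: o2net: connection to node dbo2 (num 1) at "
    "192.168.134.141:7777 has been idle for 10 seconds, shutting it down.",
    "Dec 18 10:35:45 gs2dwdb02 kernel: unrelated message",  # invented filler line
]
print(scan(sample))
```

In practice the lines would come from the syslog file on each node rather than a hard-coded list; where that file lives depends on the distribution.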

Sunil, I think that 10 seconds is too low for this timeout. Please
consider making this tunable, in the way that O2CB_HEARTBEAT_THRESHOLD
is tunable in /etc/sysconfig/o2cb. It can kill the box, and it's a bit
counterintuitive to have the documented o2cb_heartbeat_threshold
effectively ignored when it comes to the network heartbeat. Having this
in 1.2.4 would be ideal. Please.

This is the point where Alexei can jump in and tell us all that he told
us so. He has a point about network spanning tree convergence, even
though most sensible designs for heartbeat networks would never allow
that to happen. I hope I've made it clear that this is a somewhat
different problem.

What we're planning to do next, once we've confirmed that our new disk
firmware has eliminated the problem, is to test with numa=off, and
eventually upgrade. We're also looking at trying to simulate a bad
driver blocking interrupts in the kernel for configurable periods to
confirm that this diagnosis is correct.

I hope this somewhat long-winded message is of use to people.

Andy

_______________________________________________
Ocfs2-users mailing list
[email protected]
http://oss.oracle.com/mailman/listinfo/ocfs2-users
