Not iSCSI, but OCFS. If you run OCFS in a 2-node configuration, then when one node crashes, the second cannot resolve the split-brain problem, so it self-fences if it is not the primary node. This leads to many scenarios in which (in a 2-node OCFS cluster) both nodes go down after a single failure (one by itself, the second by self-fencing).
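To make the split-brain arithmetic concrete, here is a minimal sketch of a generic quorum rule, assuming each mounted node holds one vote and that a partition keeps running only with a strict majority, or with exactly half of the votes plus the "primary" node (assumed here to be node 0). This is a simplified model, not OCFS2's actual code, and `should_self_fence` is a hypothetical helper.

```python
from __future__ import annotations


def should_self_fence(visible_ids: set[int], total_nodes: int) -> bool:
    """Hypothetical fencing rule for one partition of the cluster.

    visible_ids: node ids this partition can still reach (including itself).
    A partition keeps running if it holds a strict majority of the votes,
    or exactly half of them *and* the "primary" node, assumed to be node 0.
    """
    votes = len(visible_ids)
    if 2 * votes > total_nodes:
        return False  # strict majority: keep running
    if 2 * votes == total_nodes and 0 in visible_ids:
        return False  # even split: only the half holding the primary survives
    return True       # minority or non-primary half: self-fence


# 2-node cluster: the primary (node 0) crashes and node 1 is left alone.
print(should_self_fence({1}, 2))     # True: node 1 fences too, cluster is gone
# 3-node cluster (2 RAC nodes + an idle node that only mounts the volume):
# node 0 crashes, but nodes 1 and 2 still see each other.
print(should_self_fence({1, 2}, 3))  # False: the cluster survives
```

In the 2-node case the outcome depends entirely on which node dies; with a third voter, any single crash leaves a 2-of-3 majority.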
If you add a third system (doing nothing, just mounting the volume), it will play the role of an arbiter: a node that can still see it knows that it has quorum, so a single node crash never brings down the whole cluster.

Real story. We ran 2 Oracle RAC nodes in the lab. Each had ASM and OCFSv2. At some point, one of our switches restarted because of a short power glitch. This took the interconnect down for about 1 minute and caused some delay in iSCSI disk access. None of the normal file systems noticed it; all resumed working with minor error messages. But the clusters were another story... Both nodes rebooted: one because 'ASM lost quorum' and the second because 'OCFS lost quorum' (and they happened to have different masters).

Additional instability (for both Oracle RAC and OCFSv2) comes from the fact that neither of them supports multiple network heartbeat interfaces (and neither supports a serial heartbeat). This makes them sensitive to almost any network failure. Bonding is not a full solution because it adds complexity and instability of its own. Using loopback interfaces plus OSPF can solve the problem, but in a very strange way (OSPF recovery time is very short, so it can work well).

iSCSI is another story. OCFSv2 has (had? I know of plans to improve it) very primitive decision-making about _what to do_ if it loses its connection to the primary disk storage (for example, it reboots even if it has no outstanding IO commands). So you must use a heartbeat time (a counter, in reality) big enough to allow iSCSI IO to be restored after network reconvergence (a switch reboot, for example, or an STP configuration change: 40 seconds with standard timers). Increasing it also increases OCFSv2 reconvergence time in case of a real node failure, so it must be done very carefully.

CHECKLIST for the cluster. What to test:
- Run the cluster. POWER RESET node1. Verify that node2 survived. When node1 comes back, do the same with node2. Repeat in the other order.
- The same, but shut the node down instead of power-resetting it (power reset means a hard power reset, not the shutdown button).
- Drop the interconnect for 40 seconds, then restore it.
- Reboot the Ethernet switch. Verify that the nodes survived (at least one node).
- Reboot your iSCSI system (if it is a cluster, do a takeover and giveback). Verify that the cluster survived. Do it with and without pending IO on the cluster file system(s).

A good cluster file system should survive any network and infrastructure failure without self-fencing if it has no pending IO. It should also survive with pending IO if the network or second-node failure is shorter than your timeout (the heartbeat critical time, after which a node decides that its peer is dead). You must find a balance between this timeout and the maximum unavailability time of your file system, because the same timeout becomes a service outage if the second node really crashes.

There are a few catastrophic scenarios which should not happen in normal life, but you must be aware of them. One is _system freeze, then unfreeze a few minutes later_. I saw it on some Linux systems because of a broken spinlock somewhere in memory allocation (seen on SLES9 SP3 with badly configured HUGETLB and an Oracle 10.2.0.2 RAC cluster). The only way to prevent such things is to use external fencing (SLES10 can use it with OCFSv2, but I never tested it myself), or maybe the Linux watchdog module (hangcheck-timer).

Another bad scenario is _blinking access_. I had it when we connected 2 iSCSI initiators with the same ID: OCFSv2 could not recognize it, iSCSI provided enough access, and the systems went crazy and damaged the file system within a few minutes. I believe no one needs protection against such errors (it was a primitive human error).
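The heartbeat sizing trade-off described above can be put into numbers. This is a minimal sketch, assuming the relation given in the OCFS2 FAQ for the O2CB stack of that period, where the disk heartbeat timeout is (O2CB_HEARTBEAT_THRESHOLD - 1) * 2 seconds (check it against your version). The safety margin and all helper names are hypothetical. `node_survives` encodes the behaviour the checklist expects from a good cluster file system, not what OCFSv2 actually does (it may reboot even with no outstanding IO).

```python
import math

# Assumption: O2CB writes its disk heartbeat about every 2 seconds, so
# timeout = (O2CB_HEARTBEAT_THRESHOLD - 1) * 2 seconds.
HEARTBEAT_INTERVAL_S = 2


def timeout_seconds(threshold: int) -> int:
    """Disk heartbeat timeout (seconds) for a given heartbeat threshold."""
    return (threshold - 1) * HEARTBEAT_INTERVAL_S


def min_threshold(outage_s: float, margin_s: float = 10.0) -> int:
    """Smallest threshold whose timeout covers outage_s plus a cushion.

    margin_s is a hypothetical safety margin, not an OCFS2 setting.
    """
    return math.ceil((outage_s + margin_s) / HEARTBEAT_INTERVAL_S) + 1


def node_survives(outage_s: float, pending_io: bool, threshold: int) -> bool:
    """Desired behaviour: without pending IO never fence; with pending IO,
    survive only if the outage is shorter than the heartbeat timeout."""
    if not pending_io:
        return True
    return outage_s < timeout_seconds(threshold)


t = min_threshold(40)          # ride out a 40 s STP reconvergence
print(t, timeout_seconds(t))   # 26 50
print(node_survives(40, pending_io=True, threshold=t))  # True
print(node_survives(60, pending_io=True, threshold=t))  # False
```

With these numbers, a threshold of 26 rides out a 40 second reconvergence with a 10 second cushion, but it also means the surviving node may stall for about 50 seconds before declaring a genuinely crashed peer dead, which is exactly the balance described above.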
----- Original Message -----
From: "Joel Becker" <[EMAIL PROTECTED]>
To: "martin sautter" <[EMAIL PROTECTED]>
Cc: "Alexei_Roudnev" <[EMAIL PROTECTED]>; "OCFS2 Users List" <[email protected]>
Sent: Wednesday, January 31, 2007 12:53 PM
Subject: Re: [Ocfs2-users] Also just a comment to the Oracle guys

> On Wed, Jan 31, 2007 at 10:09:31AM +0100, martin sautter wrote:
> > please can anybody explain why iSCSI requires 3 nodes for a stable
> > cluster configuration or which problems I will have with a 2 node
> > OCFS2 cluster against iSCSI based storage.
>
> I think he's claiming you'd have two "server nodes" that mount
> the ocfs2 volume, and one "iSCSI node" that actually hosts the disks and
> runs the iSCSI target.
> You can certainly have a different iSCSI target.
>
> Joel
>
> --
>
> "The opposite of a correct statement is a false statement. The
> opposite of a profound truth may well be another profound truth."
> - Niels Bohr
>
> Joel Becker
> Principal Software Developer
> Oracle
> E-mail: [EMAIL PROTECTED]
> Phone: (650) 506-8127

_______________________________________________
Ocfs2-users mailing list
[email protected]
http://oss.oracle.com/mailman/listinfo/ocfs2-users
