Luis.

Things can be worse, because we can run 3 clusterwares at the same time on the 
same Linux system:

- CRS (oracle RAC)
- O2CB
- Heartbeat2

The problem is that each system makes its own decisions, selects its own 
masters and slaves, and decides _to fence_ or _to suicide_ independently of 
the others.

This leads to a common case: when we have a short SAN service interruption or 
IP network interruption, different components make different decisions and 
fence themselves or each other (btw, in the case of CRS, fencing is a feature 
of CSS, not of CRS itself).

Of these 3 clusterwares, only heartbeat (or heartbeat2) is reliable. Both o2cb 
and CRS use a very primitive heartbeat without redundancy and with bad initial 
parameters, and both make wrong decisions easily.

Fortunately, SuSE 10 has an integrated O2CB + heartbeat2 version (I am not sure 
how stable it is, but stability is only a matter of time), and Oracle CRS (CSS) 
is conservative enough to prevent many unnecessary reboots. But you are right - 
all this mess doesn't increase overall reliability.

OCFS2 is a great thing with great potential (not revealed yet), esp. counting 
on heartbeat2 integration and the datavolume option (and because it is well 
tested with Oracle). But it really requires some improvements to become a 
production-grade thing. Some improvements are cheap and safe (such as multiple 
interfaces for heartbeat - I always wonder what the problem is in implementing 
such a simple and standard thing), others are already in progress (heartbeat2 
integration), and some require careful design and testing (an improved and 
smart fencing policy).
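
To show how little logic redundant heartbeat links need, here is a minimal 
Python sketch. It is not OCFS2 or heartbeat code; the function name, the link 
names and the timeout value are made up for illustration. The peer is declared 
dead only when every link has been silent for longer than the timeout.

```python
def peer_is_alive(last_seen, now, timeout):
    """Return True if at least one heartbeat link heard the peer recently.

    last_seen maps a link name to the timestamp (in seconds) of the last
    heartbeat received on that link. All names here are hypothetical.
    """
    return any(now - stamp <= timeout for stamp in last_seen.values())


# The path behind eth0 went silent at t=100; the other links keep working.
redundant = {"eth0": 100.0, "eth1": 129.0, "/dev/ttyS0": 128.5}
single = {"eth0": 100.0}

print(peer_is_alive(redundant, now=130.0, timeout=10.0))  # True
print(peer_is_alive(single, now=130.0, timeout=10.0))     # False
```

With a single link, the same outage looks like a dead peer and triggers 
fencing; with three links it is just one degraded path.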

PS. I have really seen such independent self-fencing. We have a RAC cluster in 
the lab, running on iSCSI with the same switch for SAN and interconnect 
(generally I use a cross cable for the interconnect, but I used a switch 
connection in this case). Once upon a time we had a UPS glitch and the switch 
rebooted. All nodes in the cluster rebooted - one because 'OCFS fenced itself' 
and the other because 'CSS fenced itself' (though no non-cluster system even 
noticed this reboot). The heartbeat cluster was not affected either (because of 
its redundant heartbeat - eth0, eth1, /dev/ttyS0). So multiple self-fencing is 
a real problem.
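
The lab incident can be modelled roughly in a few lines of Python. This is 
only a toy sketch: the timeout values are invented (they are not the real 
defaults of O2CB, CSS or heartbeat), and the class and function names are 
hypothetical. Each stack decides on its own, looking only at its own heartbeat 
paths and its own timeout.

```python
from dataclasses import dataclass


@dataclass
class Stack:
    name: str
    timeout_s: float  # tolerated heartbeat silence (made-up value)
    links: tuple      # heartbeat paths this stack relies on


def self_fencing_stacks(stacks, failed_links, outage_s):
    """Return the names of the stacks that would fence during the outage."""
    fenced = []
    for stack in stacks:
        # A stack loses its peers only if every one of its paths is down.
        all_paths_down = all(link in failed_links for link in stack.links)
        if all_paths_down and outage_s > stack.timeout_s:
            fenced.append(stack.name)
    return fenced


stacks = [
    Stack("o2cb", 10.0, ("eth0",)),
    Stack("css", 30.0, ("eth0",)),
    Stack("heartbeat", 30.0, ("eth0", "eth1", "/dev/ttyS0")),
]

# The switch behind eth0 reboots; eth1 and the serial line stay up.
print(self_fencing_stacks(stacks, {"eth0"}, outage_s=60.0))  # ['o2cb', 'css']
print(self_fencing_stacks(stacks, {"eth0"}, outage_s=20.0))  # ['o2cb']
```

The second call is the nasty case: with a shorter outage only one stack gives 
up, so the stacks on the same machine disagree about whether it should be 
alive.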
  ----- Original Message ----- 
  From: Luis Freitas 
  To: Sunil Mushran 
  Cc: [email protected] 
  Sent: Monday, April 09, 2007 4:54 PM
  Subject: Re: [Ocfs2-users] Catatonic nodes under SLES10


  Sunil,

       First I want to make clear that I do think that Oracle Cluster File 
System provides great value for Oracle Linux customers, and I do know that one 
has to pay top dollar for equivalent functionality on other platforms, for 
example Veritas Storage Foundation, and others offered by IBM and HP.

      But, the Linux platform is the only one where there are two independent 
clusterwares running (O2CB and CRS). On all the other platforms, as far as I 
know, when there is a second clusterware on the machine, CRS acts as a client 
to it. Use of an uncertified clusterware stack independently and concurrently 
with CRS is not even allowed on other platforms.

       This is kind of funny because both o2cb and crs are Oracle products. 

  Regards,
  Luis Freitas

  Sunil Mushran <[EMAIL PROTECTED]> wrote:
    Fencing is not a fs operation but a cluster operation. The fs is only a
    client of the cluster stack.

    Alexei_Roudnev wrote:
    > It all depends on the usage scenario.
    >
    > Typical usage is, for example:
    >
    > (1) Shared application home. Writes happen once a week during
    > maintenance; the rest of the time files are opened for reading only.
    > A few logfiles can be redirected if required.
    >
    > So, when the server sees a problem, it HAS NOT HAD any pending IO for
    > 3 days - so what is the purpose of a reboot? It knows 100% that NO IO
    > is pending, and the other nodes have no IO pending either.
    >
    > (2) Backup storage for the RAC. The FS is not opened 90% of the time.
    > At night, one node opens it and creates a few files. The other node has
    > no pending IO on this FS. Fencing the passive node (which doesn't run
    > any backup) is not useful because it HAS NOT HAD ANY PENDING IO for a
    > few hours.
    >
    > (3) WEB server. 10 nodes, only 1 makes updates. The same - most nodes
    > have no pending IO.
    >
    > Of course there is always a risk of FS corruption in clusters. Any
    > layer can keep pending IO forever (I saw the Linux kernel keeping it for
    > 10 minutes). The problem is that in such cases software fencing can't
    > help either, because the node is half-dead and can't detect its own
    > status.
    >
    > So, the key point here is not _fence at every sneeze_ but _keep the
    > system without pending writes as long as possible and make clean
    > transitions between active write / active read / passive states_. Then
    > you can avoid self-fencing in 90% of cases (because the server will be
    > in the passive or active read state). I mount the FS but don't cd into
    > it, or just cd but don't read - passive state. I read a file - active
    > read for 1 minute, then flush buffers so that it is in passive mode
    > again. I begin to write - switch the system to write mode. I have not
    > written blocks for 1 minute - flush everything, wait 1 more minute and
    > switch to passive mode.
    >
    >
    >
    >
    > ----- Original Message ----- 
    > From: "Sunil Mushran" 
    > To: "David Miller" 
    > Cc: 
    > Sent: Monday, April 09, 2007 3:18 PM
    > Subject: Re: [Ocfs2-users] Catatonic nodes under SLES10
    >
    >
    >
    >> For io fencing to be graceful, one requires better hardware. Read
    >> expensive. As in, switches where one can choke off all the ios to the
    >> storage from a specific node.
    >>
    >> Read the following for a discussion on force umounts. In short, not
    >> possible as yet.
    >> http://lwn.net/Articles/192632/
    >>
    >> Readonly does not work wrt to io fencing. As in, ro only stops any new
    >> userspace writes but cannot stop pending writes. And writes could be
    >> lodged in any io layer. A reboot is the cheapest way to avoid
    >> corruption. (While a reboot is painful, it is much less painful than a
    >> corrupted fs.)
    >>
    >> With 1.2.5 you should be able to increase the network timeouts and
    >> hopefully avoid the problem.
    >>
    >> David Miller wrote:
    >> 
    >>> Alexei_Roudnev wrote:
    >>> 
    >>>> Did you check the
    >>>>
    >>>> /proc/sys/kernel/panic /proc/sys/kernel/panic_on_oops
    >>>>
    >>>> system variables?
    >>>>
    >>>> 
    >>> No. Maybe I'm missing something here.
    >>>
    >>> Are you saying that a panic/freeze/reboot is the expected/desirable
    >>> behavior? That nothing more graceful could be done, like to just
    >>> dismount the ocfs2 file systems, or force them to a read-only mount or
    >>> something like that? We have to reload the kernel?
    >>>
    >>> Thanks,
    >>>
    >>> --- David
    >>>
    >>> 
    >>>> ----- Original Message ----- From: "David Miller" 
    >>>> To: 
    >>>> Sent: Monday, April 02, 2007 9:01 AM
    >>>> Subject: [Ocfs2-users] Catatonic nodes under SLES10
    >>>>
    >>>> 
    >>> [snip]
    >>>
    >>> 
    >>>> Both servers will be connected to a dual-host external RAID system.
    >>>> I've setup ocfs2 on a couple of test systems and everything appears
    >>>> to work fine.
    >>>>
    >>>> Until, that is, one of the systems loses network connectivity.
    >>>>
    >>>> When the systems can't talk to each other anymore, but the disk
    >>>> heartbeat is still alive, the high numbered node goes catatonic.
    >>>> Under SLES 9 it fenced itself off with a kernel panic; under 10 it
    >>>> simply stops responding to network or console. A power cycling is
    >>>> required to bring it back up.
    >>>>
    >>>> The desired behavior would be for the higher numbered node to lose
    >>>> access to the ocfs2 file system(s). I don't really care whether it
    >>>> would simply timeout ala stale NFS mounts, or immediately error like
    >>>> access to non-existent files.
    >>>>
    >>>>
    >>>> 
    >>> _______________________________________________
    >>> Ocfs2-users mailing list
    >>> [email protected]
    >>> http://oss.oracle.com/mailman/listinfo/ocfs2-users