Hello,

I'm working with OCFS2 on Red Hat Advanced Server 4 Patch 3 and I have had kernel panics too. I use OCFS2 only for RAC archive logs and RMAN backups.

I'm testing one solution that seems to work: in /etc/ocfs2/cluster.conf I have replaced the public IPs with the heartbeat IPs (parameter ip_address), while keeping the node names. Does anyone know this solution, and has anyone tested it under failure conditions?

Regards from Spain,
MARTÍN

-----Original Message-----

Unfortunately, it MAKES THE CLUSTER LESS STABLE. It works
as long as the network and SAN are fine, but it does not behave well in failure situations. Even if we use OCFSv2 for idle file systems (which do nothing 90% of the time), o2cb reboots nodes when it loses the heartbeat or (worse) the network or (even worse) both, instead of trying to recover without a reboot (as I said, the FS is in a consistent state, with no activity at all).

This is not just an OCFSv2 problem: Oracle CSS behaves similarly (but is much more stable in reality), and so does a Linux HA cluster (but it can use different heartbeat connections, so it can be configured to be very reliable).

You are right in saying that _cluster software always has a tendency to fence or kill neighbours to keep internal consistency_. But OCFSv2 is one of the worst examples of such software.

What can be done _relatively easily_:

(1) As we have said many times: redundancy and better
timeout control in the heartbeat. (Of course, long timeouts mean _long recovery_, but that is OK for 90% of installations.) Typical network recovery takes 1 minute, not 10 seconds.

(2) The system should not do bad things IF it is in a consistent state. In many cases, if the system has no outstanding I/O requests, it can recover without a server reboot (or at least try to) even if it has lost heartbeats and suspects that other systems could take control away from it. It is a serious theoretical challenge _how to do it safely_, but it is highly desirable for such systems.

(3) In some configurations, the FS can be treated as _not
so important_. This means it is safer to switch it to read-only and try to recover online rather than panic. A good example: you have a production Oracle database that uses ASM, and you use OCFSv2 for backup storage. It is safer to fail I/O operations on this storage than to reboot the system without reason.

PS. I had 2 network outages in the lab today because of a bad UPS, and in all cases ALL OCFSv2 servers (in 2 different clusters) rebooted. Not one survived a short (30 seconds) loss of the Ethernet connection (including iSCSI). In some cases, one server was rebooted by OCFS and the other by another part of the cluster (HA or RAC), but the result is exactly this: _all_ OCFSv2 nodes panic on a short network/SAN outage, in all cases.

----- Original Message -----
From: "Sunil Mushran" <[EMAIL PROTECTED]>
To: "ocfs2-users" <[email protected]>
Sent: Tuesday, October 03, 2006 1:51 PM
Subject: [Ocfs2-users] Re: FW: Use of OCFS2 file systems.

> I try to avoid responding to such emails because
> I am not sure how much credibility a partisan has in such debates.
> After all, I have been working on OCFS/OCFS2 the last 4/5 years.
>
> Having said that, I have some issues with the statements. While it is true
> that we can improve on the disk/net heartbeat, it is wrong to say that it
> does not work or makes the cluster unstable.
>
> We have OCFS2 running on lots of clusters in Oracle that are testing each
> new revision of the database. While these machines are test boxes, they are
> all running loads designed to break Oracle. I am rarely pinged about them
> hitting an OCFS2 issue.
>
> We also have internal production databases as well as Oracle customers who
> are using OCFS2 with much success.
>
> However, we do have room for improvement and we are working on it.
>
> For the list of ongoing projects, you can peruse the OCFS2 Development
> Wiki at http://oss.oracle.com/osswiki/OCFS2.
>
> If you wish to contribute code, as this is an open source project, feel free
> to ping me or the [email protected] mailing list.
>
> Thanks
> Sunil Mushran
>
> > Hi Sunil,
> >
> > What are your thoughts about this message on the mailing lists?
> >
> > Thanks!
> > Sanjeet
> >
> > ------------------------------------------------------------------------
> >
> > *From:* [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] *On Behalf Of* Alexei_Roudnev
> > *Sent:* Friday, September 29, 2006 11:50 PM
> > *To:* Bill Wells; Sunil Mushran
> > *Cc:* [email protected]
> > *Subject:* Re: [Ocfs2-users] Use of OCFS2 file systems.
> >
> > If you can avoid OCFSv2 on a RAC server, better do it. Any cluster
> > (RAC and OCFS) has its own instability elements. OCFSv2 has a poor
> > heartbeat algorithm, so it tends to self-fence without a real failure,
> > and (in addition) it is relatively new. It works well enough to be used
> > when you really need file sharing (such as for database files, backups,
> > or even archive logs), but the less you use it, the better. Oracle
> > home files do well without sharing.
> >
> > // I don't see problems with OCFSv2 on SLES9 SP3-updated, but I avoid
> > // using it for mission-critical or heavy-duty file systems,
> > // and I still have a failure scenario where the RAC cluster could keep
> > // working but OCFS causes a full-cluster failure.
> > // If you have a network problem, a SAN system restart, a disk I/O error, etc.,
> > // you can end up with a system panic or reboot caused by OCFS,
> > // so the less OCFS you have, the better your system stability.
> >
> _______________________________________________
> Ocfs2-users mailing list
> http://oss.oracle.com/mailman/listinfo/ocfs2-users
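
For readers following the cluster.conf change described in the first message of this thread (pointing ip_address at the heartbeat network while keeping the node names), here is a minimal Python sketch of that edit. It assumes the usual stanza layout of /etc/ocfs2/cluster.conf (a `node:` or `cluster:` header followed by indented `key = value` lines, with stanzas separated by a blank line). The node names `rac1`/`rac2`, all IP addresses, and the helper `use_heartbeat_ips` are invented for illustration and do not come from the thread. It only rewrites text in memory; whether the change actually improves stability is exactly what the thread disputes.

```python
import re

# Sample configuration text; names and addresses are made up for this sketch.
SAMPLE_CONF = (
    "node:\n"
    "\tip_port = 7777\n"
    "\tip_address = 10.0.0.11\n"
    "\tnumber = 0\n"
    "\tname = rac1\n"
    "\tcluster = ocfs2\n"
    "\n"
    "node:\n"
    "\tip_port = 7777\n"
    "\tip_address = 10.0.0.12\n"
    "\tnumber = 1\n"
    "\tname = rac2\n"
    "\tcluster = ocfs2\n"
    "\n"
    "cluster:\n"
    "\tnode_count = 2\n"
    "\tname = ocfs2\n"
)

# Hypothetical mapping: node name -> address on the private heartbeat network.
HEARTBEAT_IPS = {"rac1": "192.168.100.11", "rac2": "192.168.100.12"}

NAME_RE = re.compile(r"^[ \t]*name[ \t]*=[ \t]*(\S+)", re.MULTILINE)
IP_RE = re.compile(r"^([ \t]*ip_address[ \t]*=[ \t]*)\S+", re.MULTILINE)


def use_heartbeat_ips(conf_text, heartbeat_ips):
    """Return conf_text with each known node's ip_address replaced.

    Node names and every other setting are left untouched.
    """
    rewritten = []
    for stanza in conf_text.split("\n\n"):
        name_match = NAME_RE.search(stanza)
        is_node = stanza.lstrip().startswith("node:")
        if is_node and name_match and name_match.group(1) in heartbeat_ips:
            new_ip = heartbeat_ips[name_match.group(1)]
            # A function replacement avoids backreference clashes with digits.
            stanza = IP_RE.sub(lambda m: m.group(1) + new_ip, stanza)
        rewritten.append(stanza)
    return "\n\n".join(rewritten)


if __name__ == "__main__":
    print(use_heartbeat_ips(SAMPLE_CONF, HEARTBEAT_IPS))
```

In this sketch the same edited file would have to be copied to every node, since each node reads its own copy of the configuration; test it on a non-production cluster first.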
