On Tue, 26 Jun 2001, Stephane Faroult wrote:

> But in practice, why would you switch to the standby database, unless
> the primary database is crashed or worse?

- Hardware replace/repair
- Move to a larger host
- O/S upgrade
- File layout revision
- Planned/impending infrastructure outage
- Database problem in which datafiles are corrupted but redologs are not
- Frequent memory faults
- Any chronic but not terminal host-related problem
- Migration to a new I/O subsystem

> You know how it is in a production environment, the database
> crashes. Even if failover is easy, you always have to instruct users
> to connect as scott/tiger@backup instead of scott/tiger@prod - or
> perhaps modify the tnsnames.ora to make it transparent, or perhaps
> play with IP addresses which may mean trouble for a while with
> in-memory routing tables etc.

Well, I guess the presumption is that if someone went to the trouble
of setting up a standby, they would actually have a way to point
people at the standby in the event of a failover.  As you point out,
there are a number of scriptable solutions that are suitable.  The
easiest, and one you mention, is IP address assumption.  This is the
same method that is used by HA cluster solutions like Veritas HA, HP
MC Serviceguard and Compaq TruCluster.  It is easy to script, manage
and execute.  Contrary to your belief, no problems with "in-memory
routing tables" arise, and the change is immediate.  It is a simple
matter of 'ifconfig delete' and 'ifconfig alias.'  These actions take
all necessary steps to notify routers and switches on your subnet that
the MAC address of the IP has changed and that packets should be
routed accordingly.  Using a dedicated IP address just for the
database service is a good idea even if you aren't building a standby
or HA solution.  It comes in handy if you ever decide you want to
rehost the database.
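
To make the IP takeover concrete, here is a minimal sketch of the kind of script I mean. It only builds the two commands and prints them; nothing is executed. The interface name, address and netmask are made-up placeholders, and the exact argument order of 'ifconfig delete' and 'ifconfig alias' differs between platforms, so treat the command shape as an assumption and check your own man page.

```python
def takeover_commands(interface, service_ip, netmask):
    """Build the commands for moving a dedicated service IP between hosts.

    Returns (release, acquire) as argv lists. Nothing is run here.
    The argument order is an assumption; ifconfig syntax varies by platform.
    """
    # Run on the old primary: drop the service address from the interface.
    release = ["ifconfig", interface, "delete", service_ip]
    # Run on the standby: add the service address as an alias.
    acquire = ["ifconfig", interface, "alias", service_ip, "netmask", netmask]
    return release, acquire


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    release, acquire = takeover_commands("en0", "192.0.2.10", "255.255.255.0")
    print("on old primary:", " ".join(release))
    print("on standby:    ", " ".join(acquire))
```

If the two hosts use different interface names, call the function once per host and take the half you need.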

> My point is that, even if the switch can be quasi-immediate, it is
> not so easy, so people will naturally try to make the main machine
> work first, there will be some delay assessing the damage, waiting
> until 2am to ring the VP in his bed to get the authorization to
> switch, etc.

Well, not everyone has to have authorization from a VP to fail over to
a standby.  The endless troubleshooting is a real problem.  At many
sites, they set an upper bound on time spent on diagnostics, and
require a failover (if a failover is appropriate) after some number of
minutes.  The various failover scenarios are scripted and packaged in
advance.  You don't rush around trying to figure it out when the
system is failing.
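
The "upper bound on diagnostics" rule can be sketched roughly as follows. This is an illustration under my own assumptions, not anyone's production script: diagnose and fail_over are hypothetical callables you would supply, and whether a failover is appropriate at all is still a decision made before calling this.

```python
import time


def diagnose_or_fail_over(diagnose, fail_over, limit_seconds, poll_seconds=30,
                          clock=time.monotonic, sleep=time.sleep):
    """Retry diagnose() until it reports success or the time limit expires.

    diagnose  -- hypothetical callable, returns True once the primary is healthy
    fail_over -- hypothetical callable that runs the pre-packaged failover
    Returns "recovered" or "failed_over".
    """
    deadline = clock() + limit_seconds
    while clock() < deadline:
        if diagnose():
            return "recovered"
        sleep(poll_seconds)
    # Time is up: stop troubleshooting and run the scripted failover.
    fail_over()
    return "failed_over"
```

The clock and sleep parameters exist only so the logic can be exercised without waiting in real time.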

> In real life, half-an-hour or an hour is easily passed before
> everybody is back at work on the backup machine, busy trying to
> catch up on the wasted time. Do not forget that since the
> transmission of redo logs is asynchronous (I have heard about
> improvements with 9i) some transactions - committed ones - will have
> been lost, so users will have to check and probably reenter the
> missing transactions. At this point the main machine will probably
> be totally out of order. Wait another 2 or 4 hours to have somebody
> to come if it's a hardware problem, I guess that when everything is
> over everybody will be on their knees and the last thing they will
> have in mind is make the old primary database the new standby -
> assuming of course that all files are intact. And even if the
> ex-standby machine is possibly less powerful, everybody will
> probably wait until a quieter time, say the W/E, to switch back to
> the initial configuration. At which time, in all likelihood, a full
> database copy will have become necessary; I think that the simple
> fact of having reentered a couple of transactions not transmitted
> yet to the standby database would require it. Do I err ?

Basically, the only time you wouldn't do a graceful failover is in the
rare event that you didn't have access to the last few logs the
primary had written.  In that case, you would be forced to activate
the standby database as of the time of the last log you have.  This is
one of the risks you take with a standby database, and the standby
must be presented to others within the company as a redundant solution
that may result in the loss of some large number of transactions,
depending on how often the standby pulls logs.  There must be a
contingency plan in place to handle this eventuality that takes your
application and data into account.
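
A back-of-the-envelope way to put a number on that exposure is sketched below. The model is a crude assumption of mine: the worst case is taken as one full log switch interval (redo not yet archived) plus one full pull interval (archived but not yet fetched). All figures in the example are invented.

```python
def worst_case_exposure(log_switch_minutes, pull_interval_minutes, tx_per_minute):
    """Estimate the worst-case loss window for a forced standby activation.

    Assumed model: window = log switch interval + pull interval.
    Returns (window_minutes, transactions_at_risk).
    """
    window = log_switch_minutes + pull_interval_minutes
    return window, window * tx_per_minute


if __name__ == "__main__":
    # Hypothetical site: logs switch every 15 minutes, the standby pulls
    # every 10 minutes, and the application commits 200 transactions a minute.
    window, at_risk = worst_case_exposure(15, 10, 200)
    print(window, "minutes,", at_risk, "transactions at risk")
```

A figure like this is what the contingency plan should be sized against.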

Synchronous log update on the standby side is available in 9.0, and
on earlier versions through third-party technologies such as
EMC SRDF or Veritas SRVM.  These products can be employed to mirror
the online logs and controlfile, in order to create a no-loss standby.
The problem with this configuration is that it makes the primary
beholden to network latency for log writes.  This can have a
significant impact on performance.
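
To get a feel for the latency cost, here is a rough arithmetic sketch. It assumes a simple additive model in which each synchronous log write costs the local write time plus one network round trip; real products may overlap or batch these, so the numbers are illustrative only.

```python
def serial_log_writes_per_second(local_write_ms, round_trip_ms=0.0):
    """Upper bound on back-to-back log writes under an additive latency model.

    Assumption: each write waits local_write_ms plus round_trip_ms before
    the next one can start. This is per log write, not per transaction.
    """
    return 1000.0 / (local_write_ms + round_trip_ms)


if __name__ == "__main__":
    # Hypothetical figures: 2 ms local log write, 8 ms round trip to the mirror.
    print("local only:", serial_log_writes_per_second(2.0))
    print("mirrored:  ", serial_log_writes_per_second(2.0, 8.0))
```

Because one log write can carry several commits, the effect on transaction throughput depends on how well your workload batches.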

I discuss *all* of these considerations at some length in my HA paper
on my site.

http://www.speakeasy.org/~jwilton/241.pdf

--
Jeremiah Wilton
http://www.speakeasy.net/~jwilton
