Re: Checkpoints

Tim Gorman Thu, 27 Feb 2003 06:16:16 -0800

Zabair,

You are indeed working from facts, but your conclusion that a checkpoint every minute is "clearly putting considerable load on LGWR" is a deductive leap which is probably unwarranted.

I do not have access to an Oracle8 v8.0.6 RDBMS or documentation, but I recall that the CHECKPOINT_PROCESS parameter became a "no-op" (i.e. hard-coded to TRUE) in Oracle8, after being able to be set to TRUE or FALSE in Oracle7. In Oracle8i and Oracle9i, the parameter is no longer present (even as a hidden "underscore" parameter). The upshot is, CKPT is a mandatory background process.

In Oracle7, if CKPT wasn't running, then the LGWR process would be responsible for updating the datafile headers while the DBWR process(es) were responsible for the real work of checkpointing, namely flushing "dirtied" blocks from the Buffer Cache to disk. So, the LGWR's role in Oracle7 was not onerous, but distracted it from its primary responsibilities (i.e. flushing redo information from the Log Buffer to the online redo log files). Especially if there were lots of data files with headers to be updated. Thus, the CKPT process was introduced in Oracle7, enabled and disabled using the above-mentioned parameter.

By Oracle8, CKPT was decided to be a good idea all-round, so the parameter became a "no-op" and CKPT was always enabled.

There are several points to make, but the main ones are:

LGWR is not affected by checkpoint frequency. It is single-mindedly focused on writing redo information to disk. So the supposition that checkpoint frequency affects LGWR is incorrect.
Checkpoint frequency is not a tuning parameter, it is a speed-of-recovery-in-the-event-of-failure parameter. The idea that checkpoints have to be delayed as long as possible in order to improve performance has got the issue by the wrong end of the stick. Checkpointing and flushing data changes from the Buffer Cache to the datafiles has got to occur sometime; data changes cannot just float in the Buffer Cache forever. Either these changes occur more frequently or less frequently, but they will occur. The issue is better approached from the standpoint of "how much instance recovery do I want to upon database restart in the event of failure"? If you have strict mean-time-to-repair (MTTR) requirements, that your environment be brought back into service quickly after failure, then you'll realize that frequent checkpoints are your friend, minimizing instance recovery at restart.
Since the busy-ness and activity of LGWR are not indicative of checkpoint tuning, please look at the DBWR and CKPT processes. Are they too busy? If you do "ps -eaf | sort -n +6 | tail", you'll usually get a sorting of processes by total CPU consumed. There are issues which can throw that display off, such as dates in place of times in columns 5-6, in which case you'll need to alternate between "ps -eaf | sort -n +6 | tail" and "ps -eaf | sort -n +7 | tail"; hopefully you get the idea. To get a more point-in-time idea, use the "top" command or "ps -eaf | sort -n +3 | tail", which will sort on "clock ticks"...
More to the point, examine a BSTAT/ESTAT or STATSPACK report taken during peak periods to see if there are any wait events indicative of the RDBMS waiting on the CKPT or DBWR processes. It would be best to interpret these reports through the YAPP processor on the www.oraperf.com website, but off the top of my head, such wait events would include "write complete waits" (i.e. DBWR locking specific buffer while flushing), "free buffer waits" (i.e. waiting while DBWR clears buffers so new blocks can be added), etc...

The upshot is, you were doing well with fact-based analysis of the situation. It would be a good idea to get an higher-level overview of performance problems during peak loads by taking a BSTAT/ESTAT or STATSPACK report and running it through the YAPP analyzer at www.oraperf.com, and work from the top-downwards to find what issues are consuming the largest proportion of "total response time" in your database. I'll bet you'll see no evidence that checkpointing is an issue of any importance, that there are other, far more important issues to attend to. As Mogens, Anjo, Cary, and others on this list have expressed so succinctly, R=S+W where R="response time", S="service time", and W="wait time". Start from there and then drill-down...

Hope this helps...

-Tim

----- Original Message -----

From: Zabair Ahmed

To: Multiple recipients of list ORACLE-L

Sent: Thursday, February 27, 2003 2:39 AM

Subject: Checkpoints

The following parameters are set in the init.ora:-

NAME                           VALUE
------------------------------ ---------------
log_checkpoint_interval        25600
log_checkpoint_timeout         0
log_checkpoints_to_alert       TRUE

This means that a checkpoint will happen every 12Mb of
redo being filled, os block size is 512. The size of the redo logs are 50Mb, as a solution, I recommend setting the log_checkpoint_interval to at least 50Mb to reduce the occurrences of these checkpoints.

What am noticing in the alert log is that a checkpoint is happening every minute at peak times, this is clearly putting considerable overhead on the lgwr.

Oracle 8.0.6.3.0 on Solaris 5.8

Anyone got any thoughts on the above or experienced checkpoints going mad on 8.0.6.

TIA

With Yahoo! Mail you can get a bigger mailbox -- choose a size that fits your needs

Re: Checkpoints

Reply via email to