|
Zabair,
You are indeed working from facts, but your
conclusion that a checkpoint every minute is "clearly putting considerable load
on LGWR" is a deductive leap which is probably unwarranted.
I do not have access to an Oracle8 v8.0.6 RDBMS or
documentation, but I recall that the CHECKPOINT_PROCESS parameter became a
"no-op" (i.e. hard-coded to TRUE) in Oracle8, after being able to be set to
TRUE or FALSE in Oracle7. In Oracle8i and Oracle9i, the parameter is no
longer present (even as a hidden "underscore" parameter). The upshot
is, CKPT is a mandatory background process.
In Oracle7, if CKPT wasn't running, then the LGWR
process would be responsible for updating the datafile headers while the DBWR
process(es) were responsible for the real work of checkpointing, namely flushing
"dirtied" blocks from the Buffer Cache to disk. So, the LGWR's role in
Oracle7 was not onerous, but distracted it from its primary responsibilities
(i.e. flushing redo information from the Log Buffer to the online redo log
files). Especially if there were lots of data files with headers to be
updated. Thus, the CKPT process was introduced in Oracle7, enabled and
disabled using the above-mentioned parameter.
By Oracle8, CKPT was decided to be a good idea
all-round, so the parameter became a "no-op" and CKPT was always
enabled.
There are several points to make, but the main ones
are:
- LGWR is not affected by checkpoint
frequency. It is single-mindedly focused on writing redo information to
disk. So the supposition that checkpoint frequency affects LGWR is
incorrect.
- Checkpoint frequency is not a tuning parameter, it
is a speed-of-recovery-in-the-event-of-failure parameter. The idea that
checkpoints have to be delayed as long as possible in order to improve
performance has got the issue by the wrong end of the stick.
Checkpointing and flushing data changes from the Buffer Cache to the
datafiles has got to occur sometime; data changes cannot just float in
the Buffer Cache forever. Either these changes occur more frequently or
less frequently, but they will occur. The issue is better approached
from the standpoint of "how much instance recovery do I want to upon database
restart in the event of failure"? If you have strict mean-time-to-repair
(MTTR) requirements, that your environment be brought back into service
quickly after failure, then you'll realize that frequent checkpoints are your
friend, minimizing instance recovery at restart.
- Since the busy-ness and activity of LGWR are not
indicative of checkpoint tuning, please look at the DBWR and CKPT
processes. Are they too busy? If you do "ps -eaf | sort -n +6 |
tail", you'll usually get a sorting of processes by total CPU consumed.
There are issues which can throw that display off, such as dates in place of
times in columns 5-6, in which case you'll need to alternate between "ps -eaf
| sort -n +6 | tail" and "ps -eaf | sort -n +7 | tail"; hopefully you
get the idea. To get a more point-in-time idea, use the "top" command or
"ps -eaf | sort -n +3 | tail", which will sort on "clock ticks"...
- More to the point, examine a BSTAT/ESTAT or
STATSPACK report taken during peak periods to see if there are any wait events
indicative of the RDBMS waiting on the CKPT or DBWR processes. It would
be best to interpret these reports through the YAPP processor on the www.oraperf.com website, but off the top of
my head, such wait events would include "write complete waits" (i.e. DBWR
locking specific buffer while flushing), "free buffer waits" (i.e. waiting
while DBWR clears buffers so new blocks can be added), etc...
The upshot is, you were doing well with fact-based
analysis of the situation. It would be a good idea to get an higher-level
overview of performance problems during peak loads by taking a BSTAT/ESTAT or
STATSPACK report and running it through the YAPP analyzer at www.oraperf.com, and work from the
top-downwards to find what issues are consuming the largest proportion of "total
response time" in your database. I'll bet you'll see no evidence that
checkpointing is an issue of any importance, that there are other, far more
important issues to attend to. As Mogens, Anjo, Cary, and
others on this list have expressed so succinctly, R=S+W where R="response
time", S="service time", and W="wait time". Start from there and then
drill-down...
Hope this helps...
-Tim
----- Original Message -----
Sent: Thursday, February 27, 2003 2:39
AM
Subject: Checkpoints
The following parameters are set in the
init.ora:-
NAME
VALUE ------------------------------
--------------- log_checkpoint_interval
25600 log_checkpoint_timeout
0 log_checkpoints_to_alert
TRUE
This means that a checkpoint will happen
every 12Mb of redo being filled, os block size is 512. The size of the
redo logs are 50Mb, as a solution, I recommend setting the
log_checkpoint_interval to at least 50Mb to reduce the occurrences of these
checkpoints.
What am noticing in the alert log is that a
checkpoint is happening every minute at peak times, this is clearly putting
considerable overhead on the lgwr.
Oracle 8.0.6.3.0 on Solaris 5.8
Anyone got any thoughts on the above or
experienced checkpoints going mad on 8.0.6.
TIA
With Yahoo! Mail you can get a bigger mailbox -- choose a
size that fits your needs
|