Imagine a system in "furious activity" with two processes regularly occurring:

Process One:  Long read (or write). Takes 20ms for the seek, latency, and
                stream-off. Runs over and over.
Process Two:  Single-block read (or write). Typical database row access.
                Optimally, could be sub-millisecond. Happens more or less
                constantly.

Let's say process one starts, and then process two. Assume, for the sake of
this example, that P2's block lies within P1's swath. (But it doesn't have to...)

Now, every time, process two has to wait at LEAST 20ms to complete. In a
system that re-orders commands, it could be a lot faster. And me, looking at
disk service times on P2, I'm left wondering "why does a single disk-block
read keep taking >20ms?"

P1 doesn't need to be "a read" or "a write". It doesn't need to be "furious
activity" (two processes are not furious, even for a single-user desktop).
This is not a "corner case", and while it doesn't take into account
kernel/drive-cache/UBC buffering issues, I think it shines a light on why
command re-ordering might be useful. <shrug>
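To put rough numbers on it, here's a toy Python sketch of the scenario. All
figures are the illustrative assumptions from the example above (nothing
measured), and the function names are mine:

```python
# Toy model of the two-process scenario. P1 issues back-to-back 20ms
# streaming transfers; P2 wants a single sub-millisecond block read.

P1_IO_MS = 20.0   # long streaming read/write: seek + latency + stream-off
P2_IO_MS = 0.5    # single-block database row access, optimally sub-ms

def p2_latency_fifo(arrival_offset_ms):
    # Strict FIFO: P2 arrives partway through P1's 20ms transfer and must
    # wait for the remainder before its own tiny read is serviced.
    remaining = P1_IO_MS - arrival_offset_ms
    return remaining + P2_IO_MS

def p2_latency_reordered():
    # With command re-ordering: P2's block lies within P1's swath, so the
    # drive can pick it up in passing for roughly its own cost.
    return P2_IO_MS

print(f"FIFO, P2 arrives 1ms in: ~{p2_latency_fifo(1.0):.1f} ms")
print(f"Re-ordered:              ~{p2_latency_reordered():.1f} ms")
```

And since P1 re-issues its next 20ms transfer immediately, under FIFO the
queue never drains, so P2 sees this penalty on every access, not just once.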


-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Kevin Brown
Sent: Thursday, April 14, 2005 4:36 AM
Subject: Re: [PERFORM] How to improve db performance with $7K?

Greg Stark wrote:

> I think you're being misled by analyzing the write case.
> Consider the read case. When a user process requests a block and that 
> read makes its way down to the driver level, the driver can't just put 
> it aside and wait until it's convenient. It has to go ahead and issue 
> the read right away.

Well, strictly speaking it doesn't *have* to.  It could delay for a couple of 
milliseconds to see if other requests come in, and then issue the read if none 
do.  If there are already other requests being fulfilled, then it'll schedule 
the request in question just like the rest.
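That delay-and-gather idea can be sketched in a few lines of Python. This is
a hypothetical illustration of the behaviour, not a real kernel interface:

```python
import queue
import time

def gather_requests(q, delay_ms=2.0):
    # Instead of issuing a lone read immediately, wait up to delay_ms for
    # companion requests to arrive, then hand the whole batch to the
    # scheduler. (Illustrative sketch only.)
    batch = []
    deadline = time.monotonic() + delay_ms / 1000.0
    while True:
        timeout = deadline - time.monotonic()
        if timeout <= 0:
            break
        try:
            batch.append(q.get(timeout=timeout))
        except queue.Empty:
            break
    return batch
```

If nothing else shows up within the window, the lone read goes out anyway,
so the worst case is a couple of milliseconds of added latency.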

> In the 10ms or so that it takes to seek to perform that read
> *nothing* gets done. If the driver receives more read or write 
> requests it just has to sit on them and wait. 10ms is a lifetime for a 
> computer. In that time dozens of other processes could have been 
> scheduled and issued reads of their own.

This is true, but now you're talking about a situation where the system goes 
from an essentially idle state to one of furious activity.  In other words, 
it's a corner case that I strongly suspect isn't typical in situations where 
SCSI has historically made a big difference.

Once the first request has been fulfilled, the driver can now schedule the rest 
of the queued-up requests in disk-layout order.
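"Disk-layout order" here can be as simple as an elevator sweep. A minimal
sketch (block numbers and head position invented for illustration):

```python
def elevator_order(requests, head_pos):
    # Serve queued requests in disk-layout order: sweep outward from the
    # current head position, then pick up the rest on the way back.
    # `requests` are block/cylinder numbers; purely illustrative.
    outward = sorted(r for r in requests if r >= head_pos)
    inward = sorted((r for r in requests if r < head_pos), reverse=True)
    return outward + inward

# One sweep out to the highest block, then back down through the rest:
print(elevator_order([95, 180, 34, 119, 11, 123, 62, 64], head_pos=50))
```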

I really don't see how this is any different between a system that has tagged 
queueing to the disks and one that doesn't.  The only difference is where the 
queueing happens.  In the case of SCSI, the queueing happens on the disks (or 
at least on the controller).  In the case of SATA, the queueing happens in the
kernel.

I suppose the tagged queueing setup could begin the head movement and, if 
another request comes in that requests a block on a cylinder between where the 
head currently is and where it's going, go ahead and read the block in 
question.  But is that *really* what happens in a tagged queueing system?  It's 
the only major advantage I can see it having.

> The same thing would happen if you had lots of processes issuing lots 
> of small fsynced writes all over the place. Postgres doesn't really do 
> that though. It sort of does with the WAL logs, but that shouldn't 
> cause a lot of seeking.  Perhaps it would mean that having your WAL 
> share a spindle with other parts of the OS would have a bigger penalty 
> on IDE drives than on SCSI drives though?


But I rather doubt that has to be a huge penalty, if any.  When a process 
issues an fsync (or even a sync), the kernel doesn't *have* to drop everything 
it's doing and get to work on it immediately.  It could easily gather a few 
more requests, bundle them up, and then issue them.  If there's a lot of disk 
activity, it's probably smart to do just that.  All fsync and sync require is 
that the caller block until the data hits the disk (from the point of view of 
the kernel). The specification doesn't require that the kernel act on the calls 
immediately or write only the blocks referred to by the call in question.
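That gather-and-bundle behaviour might look like this. It's a hypothetical
sketch of what a kernel could do (class and method names are mine), not what
any particular kernel does:

```python
import threading

class FsyncBatcher:
    # Callers block on fsync() as the spec requires, but the flusher is
    # free to gather several pending requests and write them out together.
    def __init__(self):
        self._lock = threading.Lock()
        self._pending = []            # (data, done-event) pairs

    def fsync(self, data):
        done = threading.Event()
        with self._lock:
            self._pending.append((data, done))
        done.wait()                   # block until the data "hits disk"

    def flush(self):
        # Run periodically: bundle everything gathered so far into one
        # write, then wake every waiting caller at once.
        with self._lock:
            batch, self._pending = self._pending, []
        for _data, done in batch:     # pretend this is one bundled write
            done.set()
```

Each caller sees the blocking semantics it asked for, while the disk sees
one larger, better-ordered write instead of many scattered ones.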

Kevin Brown                                           [EMAIL PROTECTED]
