Re: [PERFORM] How to improve db performance with $7K?

Jacques Caron Mon, 18 Apr 2005 10:44:35 -0700

Hi,

At 16:59 18/04/2005, Greg Stark wrote:

William Yu <[EMAIL PROTECTED]> writes:

> Using the above prices for a fixed budget for RAID-10, you could get:
>
> SATA 7200 -- 680MB per $1000
> SATA 10K  -- 200MB per $1000
> SCSI 10K  -- 125MB per $1000

What a lot of these analyses miss is that cheaper == faster, because cheaper
means you can buy more spindles for the same price. I'm assuming you picked
equal-sized drives to compare, so 200MB/$1000 for 10K SATA is almost twice as
many spindles as the 125MB/$1000 for 10K SCSI. That means it would have almost
double the bandwidth. And the 7200 RPM case would have more than 5x the bandwidth.

While 10K RPM drives have lower seek times, and SCSI drives have a natural
seek time advantage, under load a RAID array with fewer spindles will start
hitting contention sooner, which results in higher latency. If the controller
works well, the larger SATA arrays above should be able to maintain their
mediocre latency under load much better than the SCSI array with fewer drives
can maintain its low-latency response time, despite its drives' lower average
seek time.

I would definitely agree. More factors in favor of more cheap drives:

- cheaper drives (7200 rpm) have larger platters (3.7" diameter against 2.6" or 3.3"). That means the outer tracks hold more data, and the same amount of data is held on a smaller area, which means fewer tracks, which means reduced seek times. You can roughly estimate the real average seek time as (average seek time over the full disk * size of dataset / capacity of disk). And you actually need to physically seek less often too.

- more disks means less data per disk, which means the data is further concentrated on the outer tracks, which means even lower seek times
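The rule of thumb in the two points above can be sketched as a small function. This is only an illustration under stated assumptions. The dataset is taken to be spread evenly over `n_disks`, so each disk holds `dataset_gb / n_disks`. Rejecting datasets that don't fit is my addition, not part of the original estimate. The 9 ms full-disk seek and 400 GB capacity in the checks are hypothetical numbers.

```python
def effective_seek_ms(full_seek_ms: float, dataset_gb: float,
                      disk_capacity_gb: float, n_disks: int = 1) -> float:
    """Rough average seek time when the data occupies only part of each disk.

    Full-disk average seek scaled by the fraction of each disk the dataset
    occupies; with n_disks, each disk holds dataset_gb / n_disks.
    """
    fraction = dataset_gb / (n_disks * disk_capacity_gb)
    # Assumption (not in the original): a dataset larger than the array
    # cannot be stored at all, so reject it instead of extrapolating.
    if fraction > 1.0:
        raise ValueError("dataset does not fit on the array")
    return full_seek_ms * fraction


# Hypothetical drive: 9 ms full-disk average seek, 400 GB capacity.
print(effective_seek_ms(9.0, 100, 400))             # one disk
print(effective_seek_ms(9.0, 100, 400, n_disks=4))  # spread over four disks
```

Treat the result as optimistic for very small fractions. Even a track-to-track seek takes a nonzero time, and this linear rule ignores that.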

Also, what counts is indeed not so much the time it takes to do one single random seek, but the number of random seeks you can do per second. Hence, more disks means more seeks per second (if requests are evenly distributed among all disks, which a good stripe size should achieve).
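As an illustration of "requests evenly distributed among all disks", the sketch below maps uniformly random byte offsets onto a striped array and counts requests per disk. It assumes a plain RAID-0 style round-robin layout; a RAID-10 array stripes across mirrored pairs instead. The uniform random offsets, the 64 KB stripe size, and the 8 disks are all hypothetical choices.

```python
import random
from collections import Counter


def requests_per_disk(n_requests: int, n_disks: int, dataset_bytes: int,
                      stripe_bytes: int, seed: int = 0) -> Counter:
    """Count how many uniformly random requests land on each disk."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_requests):
        offset = rng.randrange(dataset_bytes)
        # Stripe units are assigned to disks round-robin (RAID-0 style layout).
        disk = (offset // stripe_bytes) % n_disks
        counts[disk] += 1
    return counts


# Hypothetical workload: 100 GB dataset, 64 KB stripes, 8 disks.
counts = requests_per_disk(100_000, 8, 100 * 2**30, 64 * 2**10)
print(sorted(counts.items()))
```

With random access, each disk ends up with close to 1/8 of the requests, so the array can service roughly eight times the seeks of one disk. Workloads with hot spots smaller than a stripe unit would not spread this well.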

Not taking into account TCQ/NCQ or write cache optimizations, the important parameter (random seeks per second) can be approximated as:

N * 1000 / (lat + seek * ds / (N * cap))

Where:
N is the number of disks
lat is the average rotational latency in milliseconds (500/(rpm/60))
seek is the average seek over the full disk in milliseconds
ds is the dataset size
cap is the capacity of each disk (in the same unit as ds)
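The approximation above translates directly into code. This is a minimal sketch of N * 1000 / (lat + seek * ds / (N * cap)), with lat = 500/(rpm/60), which is half a revolution in milliseconds. Like the original formula, it ignores TCQ/NCQ and write caching. The drive figures in the usage line are hypothetical.

```python
def rotational_latency_ms(rpm: float) -> float:
    """Average rotational latency: half a revolution, i.e. 500 / (rpm / 60) ms."""
    return 500.0 / (rpm / 60.0)


def seeks_per_second(n: int, rpm: float, seek_ms: float,
                     ds: float, cap: float) -> float:
    """Approximate random seeks/second for an array of n identical disks.

    Implements N * 1000 / (lat + seek * ds / (N * cap)); ds and cap must
    be in the same unit (e.g. GB).
    """
    lat = rotational_latency_ms(rpm)
    service_ms = lat + seek_ms * ds / (n * cap)  # per-request time on one disk
    return n * 1000.0 / service_ms               # n disks working in parallel


# Hypothetical drive: 7200 rpm, 9 ms full-disk seek, 80 GB each;
# 100 GB dataset spread over 10 of them.
print(round(seeks_per_second(10, 7200, 9.0, 100, 80)))
```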

Using this formula and a variety of disks, counting only the disks themselves (no enclosures, controllers, rack space, power, maintenance...), and trying to maximize the number of seeks/second for a fixed budget (1000 euros) with a dataset size of 100 GB, SATA drives are the clear winners: you can get more than 4000 seeks/second (with 21 x 80GB disks), whereas SCSI cannot even reach the 1400 seeks/second mark (with 8 x 36 GB disks). Results can vary quite a lot based on the dataset size, which illustrates the importance of "staying on the edges" of the disks. I'll try to make the analysis more complete by counting some of the "overhead" (obviously 21 drives have a lot of other implications!), but I believe SATA drives still win in theory.
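The fixed-budget comparison can be sketched as follows. For a single drive model the formula grows monotonically with N, because 1000 N / (lat + k/N) = 1000 N² / (lat N + k). So under these assumptions the best use of the budget is to buy as many drives as it allows. Like the analysis above, the sketch counts only the drives themselves and ignores RAID level, mirroring, and all other overhead. The `Drive` type and both drive specs in the usage lines are hypothetical placeholders, not the figures behind the results quoted above.

```python
import math
from dataclasses import dataclass


@dataclass
class Drive:
    name: str
    price: float    # price per drive, same currency as the budget
    cap_gb: float   # capacity per drive
    rpm: float
    seek_ms: float  # average seek over the full disk


def array_seeks_per_second(drive: Drive, n: int, ds_gb: float) -> float:
    """N * 1000 / (lat + seek * ds / (N * cap)) for n copies of `drive`."""
    lat = 500.0 / (drive.rpm / 60.0)
    return n * 1000.0 / (lat + drive.seek_ms * ds_gb / (n * drive.cap_gb))


def best_for_budget(drive: Drive, budget: float, ds_gb: float):
    """Buy as many drives as the budget allows and score the resulting array."""
    n = math.floor(budget / drive.price)
    if n == 0 or n * drive.cap_gb < ds_gb:
        return n, 0.0  # nothing bought, or the dataset does not fit
    return n, array_seeks_per_second(drive, n, ds_gb)


# Hypothetical drives, for illustration only.
candidates = [
    Drive("hypothetical 7200 rpm 80 GB", price=45.0, cap_gb=80, rpm=7200, seek_ms=9.0),
    Drive("hypothetical 10K 36 GB", price=120.0, cap_gb=36, rpm=10000, seek_ms=5.0),
]
for d in candidates:
    n, sps = best_for_budget(d, budget=1000.0, ds_gb=100.0)
    print(f"{d.name}: {n} drives, ~{sps:.0f} seeks/second")
```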

It would be interesting to actually compare this to real-world (or nearly-real-world) benchmarks to measure the effectiveness of features like TCQ/NCQ etc.

Jacques.
