Hadley Willan wrote:
To answer question 1, if you use software RAID, the chunk size is set in the /etc/raidtab file that is used on initial container creation. 4KB is the standard, and a large chunk size such as 1MB may hurt performance if you're not continuously writing blocks of that size. If you make it too big and you constantly need to write out smaller chunks of information, you will find the disk "always" working, which is an inefficient use of the blocks. There is some free info around about calculating the ideal chunk size; try searching Google for "Calculating chunk size for RAID".
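To make the chunk-size trade-off a bit more concrete, here is a rough sketch of how a simple RAID-0 layout maps an I/O onto member disks. The function name, the 4-disk array and the plain round-robin layout are my own assumptions for illustration, not anything taken from raidtab or the md driver:

```python
def disks_touched(offset, length, chunk_size, n_disks):
    """Return the sorted member-disk indices hit by one I/O.

    Assumes plain round-robin striping (RAID-0 style): chunk k of the
    array lives on disk k % n_disks. All sizes are in bytes.
    """
    if length <= 0:
        return []
    first_chunk = offset // chunk_size
    last_chunk = (offset + length - 1) // chunk_size
    # Once the I/O spans at least n_disks chunks, every disk is involved.
    if last_chunk - first_chunk + 1 >= n_disks:
        return list(range(n_disks))
    return sorted({c % n_disks for c in range(first_chunk, last_chunk + 1)})


KB = 1024
MB = 1024 * KB

# An 8KB write on a hypothetical 4-disk array, with small vs. large chunks.
print(disks_touched(0, 8 * KB, 4 * KB, 4))
print(disks_touched(0, 8 * KB, 1 * MB, 4))
```

With a 4KB chunk the 8KB write is split across two disks, while with a 1MB chunk it stays on one. Which of those is better depends on the workload, which is what the quoted passages below are arguing about.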
"Why does the SAME configuration recommend a one megabyte stripe width? Let’s examine the reasoning behind this choice. Why not use a stripe depth smaller than one megabyte? Smaller stripe depths can improve disk throughput for a single process by spreading a single IO across multiple disks. However IOs that are much smaller than a megabyte can cause seek time to become a large fraction of the total IO time. Therefore, the overall efficiency of the storage system is reduced. In some cases it may be worth trading off some efficiency for the increased throughput that smaller stripe depths provide. In general it is not necessary to do this though. Parallel execution at database level achieves high disk throughput while keeping efficiency high. Also, remember that the degree of parallelism can be dynamically tuned, whereas the stripe depth is very costly to change.
Why not use a stripe depth bigger than one megabyte? One megabyte is large enough that a sequential scan will spend most of its time transferring data instead of positioning the disk head. A bigger stripe depth will improve scan efficiency but only modestly. One megabyte is small enough that a large IO operation will not “hog” a single disk for very long before moving to the next one. Further, one megabyte is small enough that Oracle’s asynchronous readahead operations access multiple disks. One megabyte is also small enough that a single stripe unit will not become a hot-spot. Any access hot-spot that is smaller than a megabyte should fit comfortably in the database buffer cache. Therefore it will not create a hot-spot on disk."
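The seek-versus-transfer argument in that passage can be put into numbers with a back-of-the-envelope model. The 10 ms positioning time and 50 MB/s transfer rate below are figures I picked for illustration, not values from the paper, and the model ignores caching, queuing and everything else:

```python
def transfer_fraction(io_bytes, positioning_ms=10.0, transfer_mb_per_s=50.0):
    """Fraction of one I/O's service time spent actually moving data.

    Simplified model: every I/O pays one fixed positioning cost (seek plus
    rotational latency) and then streams at a constant rate. Both default
    values are illustrative assumptions.
    """
    transfer_ms = io_bytes / (transfer_mb_per_s * 1024 * 1024) * 1000.0
    return transfer_ms / (positioning_ms + transfer_ms)


KB = 1024
MB = 1024 * KB

for label, size in [("4KB", 4 * KB), ("64KB", 64 * KB), ("1MB", 1 * MB), ("4MB", 4 * MB)]:
    print(f"{label:>5}: {transfer_fraction(size):.1%} of the time spent transferring")
```

Under these assumptions a 1MB I/O spends roughly two thirds of its time transferring data, while a 64KB I/O spends most of its time positioning the head. Going from 1MB to 4MB helps further, but by much less than going from 64KB to 1MB did.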
The SAME configuration paper says that, to ensure large IO operations aren't broken up between the DB and the disk, the database file multi-block read size (Oracle has a param called db_file_multiblock_read_count; does Postgres have an equivalent?) should be the same size as the stripe width, and the OS IO limits should be at least this size.
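As a quick sanity check on that sizing rule, the arithmetic looks something like this. The helper names, the 8KB block size and the example OS limits are assumptions on my part; the sketch only divides the stripe width by the block size and compares it with a per-request limit:

```python
def blocks_per_stripe(stripe_bytes, block_bytes):
    """Number of database blocks that exactly fill one stripe unit."""
    if block_bytes <= 0 or stripe_bytes % block_bytes != 0:
        raise ValueError("stripe width must be a whole multiple of the block size")
    return stripe_bytes // block_bytes


def os_limit_ok(stripe_bytes, os_max_io_bytes):
    """True if the OS can issue a whole stripe unit as a single I/O."""
    return os_max_io_bytes >= stripe_bytes


KB = 1024
MB = 1024 * KB

# A 1MB stripe width with an assumed 8KB database block size.
print(blocks_per_stripe(1 * MB, 8 * KB))

# Hypothetical OS per-request limits of 1MB and 128KB.
print(os_limit_ok(1 * MB, 1 * MB), os_limit_ok(1 * MB, 128 * KB))
```

So with 8KB blocks, the multi-block read count would need to be 128 to cover a 1MB stripe, and an OS limit below 1MB would split the read up again regardless of the database setting.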
Also, it says, "Ideally we would like to stripe the log files using the same one megabyte stripe width as the rest of the files. However, the log files are written sequentially, and many storage systems limit the maximum size of a single write operation to one megabyte (or even less). If the maximum write size is limited, then using a one megabyte stripe width for the log files may not work well. In this case, a smaller stripe width such as 64K may work better. Caching RAID controllers are an exception to this. If the storage subsystem can cache write operations in nonvolatile RAM, then a one megabyte stripe width will work well for the log files. In this case, the write operation will be buffered in cache and the next log writes can be issued before the previous write is destaged to disk."
James Thornton
______________________________________________________
Internet Business Consultant, http://jamesthornton.com