Andreas,

On 10/6/05 3:56 AM, "Zeugswetter Andreas DAZ SD" <[EMAIL PROTECTED]> wrote:
> pg relys on the OS readahead (== larger block IO) to do efficient IO.
> Basically the pg scan performance should match a
> dd if=file of=/dev/null bs=8k, unless CPU bound.

Which it is. Postgres will currently do a maximum of about 120 MB/s of
scan rate on modern CPUs, no matter how fast the underlying I/O
subsystem is. Several people on this list have confirmed that by
posting ~100 MB/s numbers on much faster subsystems; others have posted
that Oracle/DB2/etc. run at near-disk rates.

IMO, most of the problem is that there are too many small (tuple-sized,
single-block) operations in the I/O path, and they need to be optimized
out. The first step is recognizing that there is a problem worth
solving, and it seems that many have now recognized the issue of poor
I/O path optimization in Postgres. I think a prioritized short list
might look like this:

- Sort performance - The implementation of a possibly fine algorithm
  yields very poor use of the underlying hardware. It needs to be
  optimized to remove redundant I/O operations, and possibly (with a
  different algorithm?) to make better use of available RAM. This would
  be practice for the next item on the list:

- Scan performance - Someone needs to trace the life of a tuple through
  the executor and implement better buffering at each stage of the
  process. There was a start at this, but it only buffered in one node.
  Yes, I know we have a shared memory buffer, but it acts as a final
  destination; this is about the path through the executor.

- I suspect that after profiling (with system routines included), we
  will find much more tuple-scale work interfering with the optimal
  flow of tuples through the executor. The problem is unlikely to be
  isolated to the COUNT() aggregate; it is probably present in many
  other nodes. Redundant allocation/free of memory on a per-tuple basis
  should be replaced with more persistent buffers on a page or larger
  basis.
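To make the sort point concrete, here is a minimal Python sketch (not
Postgres's tuplesort code; the function name `external_sort` and the
run sizes are my own illustration) of why the amount of RAM given to
run formation matters: with larger in-memory runs, an external sort
produces far fewer runs, so the merge phase touches each run's data
with far fewer separate streams.

```python
import heapq
import random

def external_sort(values, workmem):
    """Toy external sort: form sorted runs of at most `workmem` items,
    then do one k-way merge pass. Returns (sorted list, run count).
    Fewer runs means fewer concurrent input streams at merge time."""
    runs = [sorted(values[i:i + workmem])
            for i in range(0, len(values), workmem)]
    return list(heapq.merge(*runs)), len(runs)

data = [random.randrange(10**6) for _ in range(10_000)]

# Small work memory: 100 runs must be merged at once.
small_sorted, small_runs = external_sort(data, 100)
# 25x the work memory: only 4 runs to merge.
large_sorted, large_runs = external_sort(data, 2500)

assert small_sorted == sorted(data) == large_sorted
assert small_runs == 100 and large_runs == 4
```

The output is identical either way; the difference is purely in how
much merge-time work (and, on disk, how much seek-heavy I/O) the run
count implies.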
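The per-tuple-overhead point can be sketched the same way. This is an
illustrative model, not executor code: the counter functions and the
64-byte row size are hypothetical, but they show how a persistent
page-sized buffer collapses thousands of per-tuple allocate/read
operations into a handful of per-page ones.

```python
PAGE_SIZE = 8192   # one Postgres-style 8 kB block
ROW_SIZE = 64      # assumed fixed row width, for illustration only

def ops_per_tuple(n_rows):
    """Per-tuple style: a fresh buffer allocation and one read-call
    equivalent for every single row."""
    ops = 0
    for _ in range(n_rows):
        buf = bytearray(ROW_SIZE)  # allocated and dropped every tuple
        ops += 1                   # one small operation per tuple
    return ops

def ops_batched(n_rows):
    """Page-at-a-time style: one persistent buffer, reused; one
    operation per page rather than per row."""
    buf = bytearray(PAGE_SIZE)     # allocated once, reused throughout
    rows_per_page = PAGE_SIZE // ROW_SIZE
    ops = 0
    for _ in range(0, n_rows, rows_per_page):
        ops += 1                   # one operation covers 128 rows
    return ops

assert ops_per_tuple(12_800) == 12_800
assert ops_batched(12_800) == 100   # 128x fewer small operations
```

The constant factor per operation is what shows up in profiles; cutting
the operation count by two orders of magnitude is the whole game.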
- Ultimately, after working through the above two issues, we will reach
  a point of diminishing returns where async I/O is needed to remove
  producer/consumer problems in a single-threaded executor. This can be
  implemented using sliced processes, threading, or aio.

I believe that even without async I/O, we should be seeing scan rates
of 200-400 MB/s on modern CPUs for the simple COUNT() aggregation
pattern.

FWIW,

- Luke

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not match
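The producer/consumer idea above can be sketched with a prefetching
thread standing in for async I/O. This is a toy model under my own
assumptions (the names `scan_with_prefetch` and the queue depth of 2
are illustrative): a producer fills buffers while the consumer
processes the previous one, so I/O wait and CPU work overlap instead of
alternating as they do in a strictly single-threaded loop.

```python
import queue
import threading

def scan_with_prefetch(pages, process):
    """Double-buffered scan: a producer thread 'reads' pages ahead
    (stand-in for aio_read or a slave I/O process) while the caller
    processes each page; returns the summed per-page results."""
    q = queue.Queue(maxsize=2)      # two buffers in flight at most
    SENTINEL = object()             # end-of-scan marker

    def producer():
        for page in pages:          # here a real read() would block
            q.put(page)
        q.put(SENTINEL)

    threading.Thread(target=producer, daemon=True).start()

    total = 0
    while True:
        page = q.get()
        if page is SENTINEL:
            break
        total += process(page)      # CPU work overlaps next 'read'
    return total

# Five 4-value 'pages'; a COUNT()/SUM()-style aggregate over the scan.
pages = [list(range(i, i + 4)) for i in range(0, 20, 4)]
assert scan_with_prefetch(pages, sum) == sum(range(20))
```

With real blocking reads in the producer, this shape hides I/O latency
behind computation; with pure in-memory lists, as here, it just
demonstrates the control flow.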