On Mon, 2005-05-30 at 10:59 +0900, ITAGAKI Takahiro wrote:
> Yes, I've tested pgbench and dbt2 and their performances have improved.
> The two results are as follows:
>
> 1. pgbench -s 100 on one Pentium4, 1GB mem, 2 ATA disks, Linux 2.6.8
>    (attached image)
>
>   tps  | wal_sync_method
> -------+-------------------------------------------------------
>  147.0 | open_direct + write multipage (previous patch)
>  147.2 | open_direct (this patch)
>  109.9 | open_sync

I'm surprised this makes as much of a difference as that benchmark
would suggest. I wonder if we're benchmarking the right thing, though:
is opening a file with O_DIRECT sufficient to ensure that a write(2)
does not return until the data has hit disk? (As would be the case with
O_SYNC.) O_DIRECT means the OS will attempt to minimize caching, but
that is not necessarily the same thing: for example, I can imagine an
implementation in which the kernel would submit the appropriate I/O to
the disk when it sees a write(2) on a file opened with O_DIRECT, but
then let the write(2) return before getting confirmation from the disk
that the I/O has succeeded or failed.

From googling, the MySQL documentation for innodb_flush_method notes:

    This option is only relevant on Unix systems. If set to
    fdatasync, InnoDB uses fsync() to flush both the data and log
    files. If set to O_DSYNC, InnoDB uses O_SYNC to open and flush
    the log files, but uses fsync() to flush the datafiles. If
    O_DIRECT is specified (available on some GNU/Linux versions
    starting from MySQL 4.0.14), InnoDB uses O_DIRECT to open the
    datafiles, and uses fsync() to flush both the data and log
    files.

That would suggest O_DIRECT by itself is not sufficient to force a
flush to disk -- if anyone has some more definitive evidence that would
be welcome.

Anyway, if the above is true, we'll need to use O_DIRECT as well as one
of the existing wal_sync_methods.

BTW, from the patch:

+ /* TODO: Aligment depends on OS and filesystem. */
+ #define O_DIRECT_BUFFER_ALIGN 4096

I suppose there's no reasonable way to autodetect this, so we'll need
to expose it as a GUC variable (or perhaps a configure option), which
is a bit unfortunate.

-Neil
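
To make the "O_DIRECT plus an existing sync method" idea concrete, here is a
minimal sketch, not code from the patch. It assumes Linux, a 4096-byte
alignment (the same value as O_DIRECT_BUFFER_ALIGN above), and a hypothetical
helper name, write_block_durably(). It opens the file with O_DIRECT to bypass
the page cache where possible, then calls fdatasync() so that durability does
not depend on what O_DIRECT alone guarantees.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* glibc only exposes O_DIRECT with this */
#endif
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#ifndef O_DIRECT
#define O_DIRECT 0              /* no O_DIRECT here: plain write + fdatasync */
#endif

/* Assumed alignment; the right value depends on OS and filesystem. */
#define DIRECT_ALIGN 4096

/*
 * Hypothetical helper: write up to one aligned block to 'path' with
 * O_DIRECT, then force it to stable storage with fdatasync().
 * Returns 0 on success, -1 on failure.
 */
int
write_block_durably(const char *path, const void *data, size_t len)
{
    int     flags = O_WRONLY | O_CREAT | O_TRUNC;
    void   *buf = NULL;
    ssize_t n;
    int     fd;
    int     rc = -1;

    if (len == 0 || len > DIRECT_ALIGN)
        return -1;

    /* O_DIRECT needs the buffer address and the I/O length aligned. */
    if (posix_memalign(&buf, DIRECT_ALIGN, DIRECT_ALIGN) != 0)
        return -1;
    memset(buf, 0, DIRECT_ALIGN);
    memcpy(buf, data, len);

    fd = open(path, flags | O_DIRECT, 0600);
    if (fd < 0 && errno == EINVAL)
        fd = open(path, flags, 0600);   /* filesystem rejects O_DIRECT */
    if (fd < 0)
        goto done;

    n = write(fd, buf, DIRECT_ALIGN);
    if (n < 0 && errno == EINVAL)
    {
        /* Direct write refused (e.g. alignment): retry buffered. */
        int fl = fcntl(fd, F_GETFL);

        if (fl != -1 && fcntl(fd, F_SETFL, fl & ~O_DIRECT) == 0)
            n = write(fd, buf, DIRECT_ALIGN);
    }

    /* The explicit flush is what actually requests durability. */
    if (n == DIRECT_ALIGN && fdatasync(fd) == 0)
        rc = 0;

    close(fd);
done:
    free(buf);
    return rc;
}
```

A caller would use it as write_block_durably("walfile", record, record_len).
The fallbacks to buffered I/O are only there so the sketch still runs on
filesystems that refuse O_DIRECT; a benchmark comparing sync methods would
want to report that case rather than hide it.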