On Thu, Aug 28, 2003, Tom Lane wrote:
> Sean Chittenden <[EMAIL PROTECTED]> writes:
> > Are there any objections
> > to me increasing the block size for FreeBSD installations to 16K for
> > the upcoming 7.4 release?
>
> I'm a little uncomfortable with introducing a cross-platform variation
> in the standard block size. That would have implications for things
> like whether a table definition that works on FreeBSD could be expected
> to work elsewhere; to say nothing of recommendations for shared_buffer
> settings and suchlike.
>
> Also, there is no infrastructure for adjusting BLCKSZ automatically at
> configure time, and I don't much want to add it.
On recent versions of FreeBSD (and Solaris too, I think), the default UFS block size is 16K, and file fragments are 2K. This works great for many workloads, but it kills pgsql's random write performance unless pgsql uses 16K blocks as well: when pgsql writes an 8K block that is not already cached, the filesystem has to read the full 16K block, modify half of it, and write it back (a read-modify-write cycle). Either the filesystem or the database needs to be changed in order to get decent performance. I have not compared 16K UFS/16K pgsql to 8K UFS/8K pgsql, though, so I can't say which option makes more sense.

There probably isn't anything wrong with the pgsql default, except that it's set in stone. It's entirely feasible for administrators to create 8K/1K UFS filesystems specifically for pgsql, but they need to be aware of the issue. On the other hand, I don't see how it would be a bad thing if pgsql were also able to adapt at runtime. Thus, I've come up with two possible fixes:

(1) Document the problem with having a filesystem block size larger than the database block size. With a simple call to statvfs(2), the postmaster could also warn about this on startup.

(2) Make BLCKSZ a runtime constant, stored as part of the database. Grepping through the source, I didn't see any places using BLCKSZ where efficiency appeared to be so critical that you had to have constant folding. Of course, one could introduce a 'lg2blksz' constant so that divides and multiplies become shifts. This would NOT introduce cross-platform incompatibilities, only efficiency problems in some cases with databases that have been moved across filesystems. The ability to adapt at database creation time is also useful in that it allows the database to be tuned to the characteristics of the particular device on which it resides.[1]

I don't know very much about pgsql, so corrections and feedback regarding these ideas would be appreciated.

[1] Right now, the drive's ratio of seek time to transfer time is mostly hidden by the operating system's clustering and read-ahead.
I tried modifying pgsql to use direct I/O, but it seems that pgsql doesn't do its own clustering or read-ahead, so that was a lose...
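For fix (1), here is a minimal sketch of what a startup warning might look like, assuming the default 8K database block (hard-coded here as DB_BLOCK_SIZE in place of the real BLCKSZ macro). The helper names fs_block_exceeds_db_block() and check_data_dir_block_size() are hypothetical, not existing postmaster functions. It compares the statvfs(2) f_bsize field against the database block size; as far as I know, on FreeBSD f_bsize reports the preferred I/O size, which for UFS is the block size (16K by default) rather than the 2K fragment size, but other platforms may fill that field differently.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/statvfs.h>

/* Assumption: pgsql's default 8K page; in the real tree this is BLCKSZ. */
#define DB_BLOCK_SIZE 8192UL

/*
 * Pure comparison, kept separate so it can be tested without a real
 * filesystem: if the filesystem block is larger than the database block,
 * a single database-block write covers only part of a filesystem block.
 */
static int
fs_block_exceeds_db_block(unsigned long fs_bsize, unsigned long db_bsize)
{
    return fs_bsize > db_bsize;
}

/*
 * Hypothetical startup check: statvfs(2) the data directory and warn on a
 * block-size mismatch. Returns 1 if a warning was printed, 0 if the sizes
 * are compatible, and -1 if statvfs() failed.
 */
static int
check_data_dir_block_size(const char *datadir)
{
    struct statvfs sv;

    assert(datadir != NULL);
    if (statvfs(datadir, &sv) != 0) {
        perror("statvfs");
        return -1;
    }
    /* On FreeBSD, f_bsize is the UFS block size, not the fragment size. */
    if (fs_block_exceeds_db_block((unsigned long) sv.f_bsize, DB_BLOCK_SIZE)) {
        fprintf(stderr,
                "WARNING: filesystem block size (%lu) is larger than the "
                "database block size (%lu); random writes may incur "
                "read-modify-write overhead\n",
                (unsigned long) sv.f_bsize, DB_BLOCK_SIZE);
        return 1;
    }
    return 0;
}
```

On a default 16K/2K UFS filesystem, the check would print the warning. On an 8K/1K filesystem created specifically for pgsql, it would stay quiet.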
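For fix (2), here is a rough sketch of how a runtime block size with a cached 'lg2blksz' could keep the hot paths free of divides and multiplies. The BlockGeometry struct and its helper functions are hypothetical illustrations, not pgsql code. The sketch assumes that the block size is a power of two and that it is read once, for example when the database is opened, rather than fixed at compile time.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-database block geometry, set when the database is opened. */
typedef struct {
    uint32_t blcksz;    /* block size in bytes; must be a power of two */
    unsigned lg2blksz;  /* log2(blcksz), cached so hot paths can shift */
} BlockGeometry;

/* Derive lg2blksz once from the runtime block size. */
static BlockGeometry
block_geometry_init(uint32_t blcksz)
{
    BlockGeometry g = { blcksz, 0 };

    assert(blcksz != 0 && (blcksz & (blcksz - 1)) == 0);
    while ((1u << g.lg2blksz) < blcksz)
        g.lg2blksz++;
    return g;
}

/* Byte offset -> block number: a shift replaces offset / blcksz. */
static uint64_t
offset_to_block(const BlockGeometry *g, uint64_t offset)
{
    return offset >> g->lg2blksz;
}

/* Block number -> starting byte offset: a shift replaces blkno * blcksz. */
static uint64_t
block_to_offset(const BlockGeometry *g, uint64_t blkno)
{
    return blkno << g->lg2blksz;
}

/* Position inside a block: a mask replaces offset % blcksz. */
static uint32_t
offset_in_block(const BlockGeometry *g, uint64_t offset)
{
    return (uint32_t) (offset & (g->blcksz - 1));
}
```

For example, byte offset 81920 falls in block 10 with 8K blocks and in block 5 with 16K blocks. The arithmetic stays the same; only the cached shift count changes.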