On Thu, Nov 10, 2005 at 11:39:34AM -0500, Tom Lane wrote:
> No, Mike is right: for WAL you shouldn't need any journaling. This is
> because we zero out *and fsync* an entire WAL file before we ever
> consider putting live WAL data in it. During live use of a WAL file,
> its metadata is not changing. As long as the filesystem follows
> the minimal rule of syncing metadata about a file when it fsyncs the
> file, all the live WAL files should survive crashes OK.
Yes, with emphasis on the zero out... :-)
> You do need metadata journaling for all non-WAL PG files, since we don't
> fsync them every time we extend them; which means the filesystem could
> lose track of which disk blocks belong to such a file, if it's not
I think there may be theoretical problems with regard to the ordering
of the writes that fsync() performs, for files that are not
pre-allocated. For example, if a new block is allocated, there are two
blocks that need to be updated: the indirect reference block (or the
inode block, if the block references fit into the inode entry), and
the data block itself. If the indirect reference block is written
first, and the system crashes before the data block is written, the
state of the disk is inconsistent. This would be a crash in the middle
of the fsync() operation. Metadata journalling can ensure that the data
block is allocated first, and then all the necessary references
updated, allowing the operation to be rolled back if incomplete,
or committed in full.
Or, that is my understanding, anyways, and this is why I would not use
ext2 for the database, even if it was claimed that fsync() was used.
For WAL, with pre-allocated zero blocks? Sure. Ext2... :-)
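
To make the pre-allocation point concrete, here is a rough sketch in
Python of the pattern Tom describes: zero-fill and fsync the whole file
up front, then only ever overwrite it in place. This is not PostgreSQL's
code. The function names, the 8kB block size and the 16MB segment size
are assumptions for illustration, and the directory fsync is POSIX-only.

```python
import os
import tempfile

BLOCK_SIZE = 8192                # assumed write granularity (illustrative)
SEGMENT_SIZE = 16 * 1024 * 1024  # assumed segment size (illustrative)


def preallocate_segment(path, size=SEGMENT_SIZE):
    """Zero-fill a new file and fsync it before any live data goes in."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    try:
        zeros = bytes(BLOCK_SIZE)
        remaining = size
        while remaining > 0:
            # os.write may write fewer bytes than requested, so loop.
            remaining -= os.write(fd, zeros[:min(BLOCK_SIZE, remaining)])
        # All blocks are allocated now; push data and file metadata to disk.
        os.fsync(fd)
    finally:
        os.close(fd)
    # Also fsync the containing directory so the new entry is durable.
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)


def write_record(path, offset, data):
    """Overwrite bytes in place; refuse anything that would grow the file."""
    fd = os.open(path, os.O_WRONLY)
    try:
        if offset + len(data) > os.fstat(fd).st_size:
            raise ValueError("record would extend the segment")
        view = memoryview(data)
        while len(view) > 0:
            written = os.pwrite(fd, view, offset)
            offset += written
            view = view[written:]
        # No new blocks were allocated, so only file data changed.
        os.fsync(fd)
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        seg = os.path.join(tmp, "segment")
        preallocate_segment(seg, size=4 * BLOCK_SIZE)
        write_record(seg, BLOCK_SIZE, b"live record")
        print(os.path.getsize(seg))
```

Since write_record() never extends the file, the block pointers set up
during pre-allocation are not touched during live use, which is the
property the WAL argument relies on. A file that grows on every write
gets no such guarantee from this sketch.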