On Wed 15-01-14 10:27:26, Heikki Linnakangas wrote:
> On 01/15/2014 06:01 AM, Jim Nasby wrote:
> >For the sake of completeness... it's theoretically silly that Postgres
> >is doing all this stuff with WAL when the filesystem is doing something
> >very similar with its journal. And an SSD drive (and next generation
> >spinning rust) is doing the same thing *again* in its own journal.
> >If all 3 communities (or even just 2 of them!) could agree on the
> >necessary interface, a tremendous amount of this duplicated technology
> >could be eliminated.
> >That said, I rather doubt the Postgres community would go this route,
> >not so much because of the presumably massive changes needed, but more
> >because our community is not a fan of restricting our users to things
> >like "Thou shalt use a journaled FS or risk all thy data!"
> The WAL is also used for continuous archiving and replication, not
> just crash recovery. We could skip full-page-writes, though, if we
> knew that the underlying filesystem/storage is guaranteeing that a
> write() is atomic.
> It might be useful for PostgreSQL to somehow tell the filesystem that
> we're taking care of WAL-logging, so that the filesystem doesn't
> need to.
Well, a journalling fs generally cares about its metadata consistency. We
have much weaker guarantees regarding file data because those guarantees
come at a cost most people don't want to pay.
Filesystems could in theory provide a facility like atomic writes (at least
up to a certain size, say in the MB range), but it's not so easy, and when
there are no strong use cases, fs people are reluctant to make their code
more complex unnecessarily. OTOH, without widespread atomic write support, I
understand application developers have a similar stance. So it's kind of a
chicken-and-egg problem. BTW, ext3/4, for example, already has quite a bit
of the infrastructure in place due to its data=journal mode, so if someone
on the PostgreSQL side wanted to research this, knitting together some
experimental ext4 patches should be doable.
Jan Kara <j...@suse.cz>
SUSE Labs, CR