> Filesystems could in theory provide a facility like atomic writes (at
> least up to a certain size, say in the MB range), but it's not so easy, and
> when there are no strong use cases, fs people are reluctant to make their
> code more complex unnecessarily. OTOH, without widespread atomic write
> support, I understand application developers have a similar stance. So it's
> kind of a chicken-and-egg problem. BTW, ext3/4, for example, has quite a bit
> of the infrastructure in place due to its data=journal mode, so if someone
> on the PostgreSQL side wanted to research this, knitting together some
> experimental ext4 patches should be doable.

Atomic 8kB writes would improve performance for us quite a lot.  Full-page
writes to WAL are very expensive.  I don't remember what percentage of
write-ahead log traffic they account for, but it's not small.
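
For readers unfamiliar with why full-page images go into the WAL at all, here is a minimal sketch of a torn page. It assumes a device that only guarantees atomicity per 512-byte sector and that happens to write sectors in order; both are simplifying assumptions for illustration, and `torn_write` is a hypothetical helper, not anything from PostgreSQL or the kernel.

```python
SECTOR = 512       # assumed atomic unit of the device (illustrative)
PAGE_SIZE = 8192   # 8kB page, as discussed above


def torn_write(old_page, new_page, sectors_done):
    """Return what is on disk if power fails after `sectors_done` sectors.

    Assumes sectors are written front to back, which real devices do not
    promise; the point is only that the result can be a mix of old and new.
    """
    assert len(old_page) == len(new_page) == PAGE_SIZE
    cut = sectors_done * SECTOR
    return new_page[:cut] + old_page[cut:]


old = b"A" * PAGE_SIZE
new = b"B" * PAGE_SIZE
torn = torn_write(old, new, 5)  # crash after 5 of 16 sectors
```

The resulting page is neither the old version nor the new one, so it cannot be repaired from a WAL record that only describes the change. An atomic 8kB write would rule this state out, which is what would make the full-page image unnecessary.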

OK, and do you need atomic writes on a per-IO basis, or is per-file enough?
It basically boils down to this: is all or most of the IO to a file going
to be atomic, or only a smaller fraction?

The write-ahead log wouldn't need it, but data file writes would.  So
we'd need it a lot, but not for absolutely everything.

For any given file, we'd either care about writes being atomic, or we wouldn't.

As Dave notes, unless there is HW support (which is coming with the newest
solid-state drives), ext4/xfs will have to implement this by writing the
data to the filesystem journal and, after the transaction commits,
checkpointing it to its final location. That is exactly what you do with
your WAL, so it's not clear it will be a performance win. But it is easy
enough to code for ext4 that I'm willing to try...
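
The journal-then-checkpoint scheme described above can be sketched in user space as follows. This is only an illustration under stated assumptions, not how ext4 or xfs implement it: the single-record journal file, the CRC32 check, and the names `journal_write`, `checkpoint`, and `atomic_page_write` are all hypothetical.

```python
import os
import struct
import tempfile
import zlib

PAGE_SIZE = 8192               # 8kB page, as discussed in the thread
HEADER = struct.Struct("<QI")  # (offset in data file, CRC32 of the page)


def journal_write(journal_path, offset, page):
    """Step 1: put the whole page in the journal and make it durable."""
    assert len(page) == PAGE_SIZE
    record = HEADER.pack(offset, zlib.crc32(page)) + page
    with open(journal_path, "wb") as j:
        j.write(record)
        j.flush()
        os.fsync(j.fileno())   # plays the role of the transaction commit


def checkpoint(journal_path, data_path):
    """Step 2: copy a committed journal record to its final location.

    Also serves as crash recovery, because replaying a record is idempotent.
    Returns True if a valid record was applied, False otherwise.
    """
    if not os.path.exists(journal_path):
        return False
    with open(journal_path, "rb") as j:
        raw = j.read()
    if len(raw) != HEADER.size + PAGE_SIZE:
        return False           # torn journal record: never committed
    offset, crc = HEADER.unpack(raw[:HEADER.size])
    page = raw[HEADER.size:]
    if zlib.crc32(page) != crc:
        return False           # corrupt record: never committed
    with open(data_path, "r+b") as d:
        d.seek(offset)
        d.write(page)
        d.flush()
        os.fsync(d.fileno())
    os.remove(journal_path)    # if this is lost in a crash, replay is harmless
    return True


def atomic_page_write(journal_path, data_path, offset, page):
    """Write one page so that a crash leaves either the old or the new page."""
    journal_write(journal_path, offset, page)
    checkpoint(journal_path, data_path)
```

After a crash, calling `checkpoint` again either replays a committed record over a possibly torn page or finds nothing valid and leaves the old page alone. Note that every page is written twice, once to the journal and once in place, which is the same cost pattern as WAL plus data file and the reason a performance win is not obvious.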

Yeah, hardware support would be great.

