On 05.05.2016 7:16, Amit Kapila wrote:
On Wed, May 4, 2016 at 8:03 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Amit Kapila <amit.kapil...@gmail.com> wrote:
> > On Wed, May 4, 2016 at 4:02 PM, Alex Ignatov
> > wrote:
> >> On 03.05.2016 2:17, Tom Lane wrote:
> >>> Writing a single sector ought to be atomic too.
> >> pg_control is 8k long (I think that is the length of one page in the
> >> default PG compile settings).
> > The actual data written is always sizeof(ControlFileData), which is
> > less than one sector.
> Yes. We don't care what happens to the rest of the file as long as the
> first sector's worth is updated atomically. See the comments for
> PG_CONTROL_SIZE and the code in ReadControlFile/WriteControlFile.
> We could change to a different PG_CONTROL_SIZE pretty easily, and there's
> certainly room to argue that reducing it to 512 or 1024 would be more
> efficient. I think the motivation for setting it at 8K was basically
> "we're already assuming that 8K writes are efficient, so let's assume
> it here too". But since the file is only written once per checkpoint,
> efficiency is not really a key selling point anyway. If you could make
> an argument that some other size would reduce the risk of failures,
> it would be interesting --- but I suspect any such argument would be
> very dependent on the quirks of a specific file system.
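For readers unfamiliar with the scheme Tom describes, here is a minimal C sketch loosely modeled on it (not PostgreSQL's actual code): the payload sits at the start of a zero-padded buffer the size of the whole file, a checksum covers every byte before the checksum field, and a reader only trusts the first sizeof bytes. DemoControlData, DEMO_CONTROL_SIZE, DEMO_SECTOR_SIZE and the demo_ functions are hypothetical; PostgreSQL uses ControlFileData, PG_CONTROL_SIZE and CRC-32C rather than the FNV-1a stand-in used here.

```c
#define _POSIX_C_SOURCE 200809L   /* for pwrite() and fsync() */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical stand-in for ControlFileData; the real struct differs. */
typedef struct {
    uint64_t system_identifier;
    uint64_t checkpoint_lsn;
    uint32_t state;
    uint32_t crc;                 /* checksum over all preceding bytes */
} DemoControlData;

#define DEMO_CONTROL_SIZE 8192    /* plays the role of PG_CONTROL_SIZE */
#define DEMO_SECTOR_SIZE  512     /* assumed atomic write unit */

/* The whole argument rests on this: the meaningful bytes fit in one sector. */
static_assert(sizeof(DemoControlData) <= DEMO_SECTOR_SIZE,
              "control data must fit in a single sector");

/* FNV-1a, a hypothetical substitute for CRC-32C. */
uint32_t demo_checksum(const unsigned char *p, size_t n) {
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* Stamp the checksum, then place the payload at the start of a zeroed page. */
void demo_build_control_page(DemoControlData *cd, unsigned char *buf) {
    cd->crc = demo_checksum((const unsigned char *) cd,
                            offsetof(DemoControlData, crc));
    memset(buf, 0, DEMO_CONTROL_SIZE);
    memcpy(buf, cd, sizeof(*cd));
}

/* Reader side: ignore the padding, verify only the leading struct. */
int demo_control_page_valid(const unsigned char *buf) {
    DemoControlData cd;
    memcpy(&cd, buf, sizeof(cd));
    return cd.crc == demo_checksum((const unsigned char *) &cd,
                                   offsetof(DemoControlData, crc));
}

/* Write the full padded page at offset 0 and flush it to disk. */
int demo_write_control_file(int fd, DemoControlData *cd) {
    unsigned char buf[DEMO_CONTROL_SIZE];
    demo_build_control_page(cd, buf);
    if (pwrite(fd, buf, sizeof(buf), 0) != (ssize_t) sizeof(buf))
        return -1;
    return fsync(fd);
}
```

Changing DEMO_CONTROL_SIZE to 512 or 1024 only changes how much zero padding is written; the static_assert is what encodes the assumption that the bytes anyone reads fit within a single sector.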
How about using 512 bytes as the write size and performing direct writes,
rather than going through the OS buffer cache, for the control file? Alex,
is the issue reproducible (so that, if we try to solve it in some way, we
have a way to test the fix as well)?
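A minimal sketch of the suggestion, assuming Linux (O_DIRECT is not POSIX) and a device whose logical sector size is 512 bytes; on 4K-native devices such a write fails with EINVAL. O_DIRECT requires the buffer address, length and file offset to be aligned, hence posix_memalign. The names DIRECT_IO_SIZE, alloc_direct_buffer and write_sector_direct are hypothetical. O_DSYNC is added because O_DIRECT by itself bypasses the page cache but does not promise that the data has reached stable storage.

```c
#define _GNU_SOURCE               /* exposes O_DIRECT on Linux/glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define DIRECT_IO_SIZE 512        /* assumed logical sector size */

/* Return a sector-aligned, zero-padded copy of payload, or NULL. */
void *alloc_direct_buffer(const void *payload, size_t len) {
    void *buf = NULL;
    if (len > DIRECT_IO_SIZE)
        return NULL;              /* must fit in one sector */
    if (posix_memalign(&buf, DIRECT_IO_SIZE, DIRECT_IO_SIZE) != 0)
        return NULL;
    memset(buf, 0, DIRECT_IO_SIZE);
    memcpy(buf, payload, len);
    return buf;
}

/* Write exactly one sector at offset 0, bypassing the page cache. */
int write_sector_direct(const char *path, const void *payload, size_t len) {
#ifndef O_DIRECT
    (void) path; (void) payload; (void) len;
    errno = ENOTSUP;              /* platform has no O_DIRECT */
    return -1;
#else
    void *buf = alloc_direct_buffer(payload, len);
    if (buf == NULL)
        return -1;
    int fd = open(path, O_WRONLY | O_CREAT | O_DIRECT | O_DSYNC, 0600);
    if (fd < 0) {
        free(buf);
        return -1;
    }
    ssize_t n = pwrite(fd, buf, DIRECT_IO_SIZE, 0);
    int rc = (n == DIRECT_IO_SIZE) ? 0 : -1;
    if (close(fd) != 0)
        rc = -1;
    free(buf);
    return rc;
#endif
}
```

Whether a single 512-byte direct write is actually atomic still depends on the device and file system; the sketch only removes the page cache from the path.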
> One point worth considering is that on most file systems, rewriting
> a fraction of a page is *less* efficient than rewriting a full page,
> because the kernel first has to read in the old contents to fill
> the disk buffer it's going to partially overwrite with new data.
> This motivates against trying to reduce the write size too much.
Yes, you are very much right, and I observed that recently during my
work on WAL re-writes. However, I think that won't be an issue if
we use direct writes for the control file.
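To make the read-before-write cost concrete, the helper below counts how many pages a buffered write covers only partially. Under a simplified model (page-granular cache, cold cache, old data already on disk), each such page has to be read before the new bytes can be merged in. The function name and the model are assumptions for illustration; with O_DIRECT the kernel does not perform this merge, but the write then has to be sector-aligned.

```c
#include <assert.h>
#include <stddef.h>

/* Number of pages touched by [offset, offset + len) that are not fully
 * overwritten; in this simplified model each needs a read first. */
size_t partial_pages(size_t offset, size_t len, size_t page_size) {
    assert(page_size > 0);
    if (len == 0)
        return 0;
    size_t first = offset / page_size;
    size_t last = (offset + len - 1) / page_size;
    size_t count = 0;
    for (size_t p = first; p <= last; p++) {
        size_t page_start = p * page_size;
        size_t page_end = page_start + page_size;
        /* Clip the write range to this page. */
        size_t w_start = offset > page_start ? offset : page_start;
        size_t w_end = offset + len < page_end ? offset + len : page_end;
        if (w_end - w_start < page_size)
            count++;
    }
    return count;
}
```

With 4 KB pages, partial_pages(0, 512, 4096) is 1 (one read-modify-write), while partial_pages(0, 8192, 4096) is 0, which is the reason not to shrink a buffered write below the page size.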
EnterpriseDB: http://www.enterprisedb.com
No, the issue happened only once. Also, any attempts to reproduce it are not
Sent via pgsql-hackers mailing list (email@example.com)