On Tue, Sep 13, 2016 at 6:58 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Amit Kapila <amit.kapil...@gmail.com> writes:
>> While working on write-ahead-logging of hash indexes, I noticed that
>> this function allocates buckets in batches and the mechanism it uses
>> is that it initialize the last page of batch with zeros and expect
>> that the filesystem will ensure the intervening pages read as zeroes
> Yes. AFAIK that filesystem behavior is required by POSIX.
>> I think to make it WAL enabled, we need to initialize the page header
>> (using PageInit() or equivalent) instead of initializing it with
>> zeroes as some part of our WAL replay machinery expects that the page
>> should not be new as indicated by me in other thread .
> I don't really see why that's a problem. The only way one of the fill
> pages would get to be not-zero is if there is a WAL action later in the
> stream that overwrites it. So how would things become inconsistent?
It could lead to an error/log message indicating "unexpected zero
page". Refer below code in XLogReadBufferExtended():
if (mode == RBM_NORMAL)
/* check that page has been initialized */
Page page = (Page) BufferGetPage(buffer);
log_invalid_page(rnode, forknum, blkno, true);
Now, during replay of XLOG_FPI (WAL record for new page log_newpage),
it won't hit that condition because it is ensured in the caller that
if block has image, than we use RBM_ZERO_AND_CLEANUP_LOCK or
RBM_ZERO_AND_LOCK buffer mode. However, a generic reader of blocks
or new check consistency tool  being developed by Kuntal would lead
to an error/log - " .. contains unexpected zero page at block ..". I
think the check consistency tool might use some checks to avoid it,
but not sure.
>> Offhand, I don't see any problem with just
>> initializing the last page and write the WAL for same with
>> log_newpage(), however if we try to initialize all pages, there could
>> be some performance penalty on split operation.
> "Some" seems like rather an understatement. And it's not just the
> added I/O, it's the fact that you'd need to lock each bucket as you
> went through them to avoid clobbering concurrently-inserted data.
The question of concurrent-data insertion can only come into picture
when we update metapage information(maxbucket, lowmask, highmask, ..).
Without that, I don't think concurrent users would ever be aware of
the new buckets. However, as this work is done with metapage locked,
it is not advisable to do more work here, unless it is required.
> If you weren't talking about such an enormous penalty, I might be okay
> with zeroing the intervening pages explicitly rather than depending on
> the filesystem to do it. But since you are, I think you need a clearer
> explanation of why this is necessary.
I am also not in favor of doing the extra work unless it is required.
The point of this mail is to check if anybody sees that it is
required, because I can't find a reason to initialize all pages.
Sent via pgsql-hackers mailing list (firstname.lastname@example.org)
To make changes to your subscription: