On Thu, Jul 6, 2017 at 3:24 PM, Kyotaro HORIGUCHI <horiguchi.kyot...@lab.ntt.co.jp> wrote: > Hello, > > At Tue, 4 Jul 2017 14:49:26 +0530, Amit Kapila <amit.kapil...@gmail.com> > wrote in <CAA4eK1+SYqCmA7ioTBpJHcO-B-rf8A=n9gr1-rp3rhwecb5...@mail.gmail.com> >> On Tue, Jul 4, 2017 at 1:23 PM, Michael Paquier >> <michael.paqu...@gmail.com> wrote: >> > On Mon, Jul 3, 2017 at 7:40 PM, Amit Kapila <amit.kapil...@gmail.com> >> > wrote: >> > >> > It seems to me that this qualifies as an open item for 10. WAL-logging >> > is new for hash tables. Amit, could you add one? >> > >> >> Added. >> >> > + use_wal = RelationNeedsWAL(rel) || >> > + (rel->rd_rel->relpersistence == RELPERSISTENCE_UNLOGGED && >> > + forkNum == INIT_FORKNUM); >> > In access AMs, empty() routines are always only called for unlogged >> > relations, so you could relax that to check for INIT_FORKNUM only. >> > >> >> Yeah, but _hash_init() is an exposed API, so I am not sure it is a >> good idea to make such an assumption at that level of code. Now, as >> there is no current user which wants to use it any other way, we can >> make such an assumption, but does it serve any purpose? > > Check for INIT_FORKNUM appears both accompanied and not > accompanied by check for RELPER.._UNLOGGED, so I'm not sure which > is the convention here. > > The difference of the two is an init fork of TEMP > relations. However I believe that init fork is by definition a > part of an unlogged relation, it seems to me that it ought not to > be wal-logged if we had it. From this viewpoint, the middle line > makes sense. >
What is the middle line? Are you okay with a current check or do you see any problem with it or do you prefer to write it differently? > > >> > Looking at the patch, I think that you are right to do the logging >> > directly in _hash_init() and not separately in hashbuildempty(). >> > Compared to other relations the init data is more dynamic. >> > >> > + /* >> > + * Force the on-disk state of init forks to always be in sync with >> > the >> > + * state in shared buffers. See XLogReadBufferForRedoExtended. >> > + */ >> > + XLogRecGetBlockTag(record, 0, NULL, &forknum, NULL); >> > + if (forknum == INIT_FORKNUM) >> > + FlushOneBuffer(metabuf); >> > I find those forced syncs a bit surprising. The buffer is already >> > marked as dirty, so even if there is a concurrent checkpoint they >> > would be flushed. >> > >> >> What if there is no concurrent checkpoint activity and the server is >> shutdown? Remember this replay happens on standby and we will just >> try to perform Restartpoint and there is no guarantee that it will >> flush the buffers. If the buffers are not flushed at shutdown, then >> the Init fork will be empty on restart and the hash index will be >> corrupt. > > XLogReadBufferForRedoExtended() automatically force the flush for > a XLOG record on init fork that having FPW imaeges. And > _hash_init seems to have emitted such a XLOG record using > log_newpage. > Sure, but log_newpage is not used for metapage and bitmappage. >> > If recovery comes again here, then they would just >> > be replayed. I am unclear why XLogReadBufferForRedoExtended is not >> > enough to replay a FPW of that. >> > >> >> Where does FPW come into the picture for a replay of metapage or bitmappage? > > Since required FPW seems to be emitted and the flush should be > done automatically, the issuer side (_hash_init) may attach wrong > FPW images if hash_xlog_init_meta_page requires additional > flushes. But I don't see such a flaw there. I think Michael wants > to say such a kind of thing. > I am not able to understand what you want to say, but I think you don't see any problem with the patch as such. > > > By the way the comment of the function ReadBufferWithoutRelcache > has the following sentense. > I don't think we need anything with respect to this patch because ReadBufferWithoutRelcache is used in case of FPW replay of INIT_FORK pages as well. Do you have something else in mind which I am not able to see? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers