On Thu, Jan 19, 2017 at 6:37 PM, Kyotaro HORIGUCHI <horiguchi.kyot...@lab.ntt.co.jp> wrote: > Hello, > > At Wed, 18 Jan 2017 12:34:51 +0900, Michael Paquier > <michael.paqu...@gmail.com> wrote in > <cab7npqqytf2gie7fd-4ojjppvwikjrdqpc24hlngthx01sb...@mail.gmail.com> >> On Tue, Jan 17, 2017 at 7:36 PM, Kyotaro HORIGUCHI >> <horiguchi.kyot...@lab.ntt.co.jp> wrote: >> > I managed to reproduce this. A little tweak as the first patch >> > lets the standby to suicide as soon as walreceiver sees a >> > contrecord at the beginning of a segment. >> >> Good idea. > > Thanks. Fortunately(?), the problematic situation seems to happen > at almost all segment boundary. > >> > I believe that a continuation record cannot be span over three or >> > more *segments* (is it right?), so keeping one spare segment >> > would be enough. The attached second patch does this. >> >> I have to admit that I did not think about this problem much yet (I >> bookmarked this report weeks ago to be honest as something to look >> at), but that does not look right to me. Couldn't a record be spawned >> across even more segments? Take a random string longer than 64MB or >> event longer for example. > > Though I haven't look closer to how a modification is splitted > into WAL records. A tuple cannot be so long. As a simple test, I > observed rechder->xl_tot_len at the end of XLogRecordAssemble > inserting an about 400KB not-so-compressable string into a text > column, but I saw a series of many records with shorter than > several thousand bytes. > >> > Other possible measures might be, >> > >> > - Allowing switching wal source while reading a continuation >> > record. Currently ReadRecord assumes that a continuation record >> > can be read from single source. But this needs refactoring >> > involving xlog.c, xlogreader.c and relatives. >> >> This is scary thinking about back-branches. > > Yes. It would be no longer a bug fix. (Or becomes a quite ugly hack..) > >> > - Delaying recycling a segment until the last partial record on it >> > completes. This seems doable in page-wise (coarse resolution) >> > but would cost additional reading of past xlog files (page >> > header of past pages is required). >> >> Hm, yes. That looks like the least invasive way to go. At least that >> looks more correct than the others. > > The attached patch does that. Usually it reads page headers only > on segment boundaries, but once continuation record found (or > failed to read the next page header, that is, the first record on > the first page in the next segment has not been replicated), it > becomes to happen on every page boundary until non-continuation > page comes.
I'm afraid that many WAL segments would start with a continuation record when there are the workload of short transactions (e.g., by pgbench), and which would make restart_lsn go behind very much. No? The discussion on this thread just makes me think that restart_lsn should indicate the replay location instead of flush location. This seems safer. Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (firstname.lastname@example.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers