On Thu, Jan 19, 2017 at 6:37 PM, Kyotaro HORIGUCHI
<horiguchi.kyot...@lab.ntt.co.jp> wrote:
> Hello,
> At Wed, 18 Jan 2017 12:34:51 +0900, Michael Paquier 
> <michael.paqu...@gmail.com> wrote in 
> <cab7npqqytf2gie7fd-4ojjppvwikjrdqpc24hlngthx01sb...@mail.gmail.com>
>> On Tue, Jan 17, 2017 at 7:36 PM, Kyotaro HORIGUCHI
>> <horiguchi.kyot...@lab.ntt.co.jp> wrote:
>> > I managed to reproduce this. A little tweak as the first patch
>> > lets the standby to suicide as soon as walreceiver sees a
>> > contrecord at the beginning of a segment.
>> Good idea.
> Thanks. Fortunately(?), the problematic situation seems to happen
> at almost all segment boundary.
>> > I believe that a continuation record cannot be span over three or
>> > more *segments* (is it right?), so keeping one spare segment
>> > would be enough. The attached second patch does this.
>> I have to admit that I did not think about this problem much yet (I
>> bookmarked this report weeks ago to be honest as something to look
>> at), but that does not look right to me. Couldn't a record be spawned
>> across even more segments? Take a random string longer than 64MB or
>> event longer for example.
> Though I haven't look closer to how a modification is splitted
> into WAL records. A tuple cannot be so long. As a simple test, I
> observed rechder->xl_tot_len at the end of XLogRecordAssemble
> inserting an about 400KB not-so-compressable string into a text
> column, but I saw a series of many records with shorter than
> several thousand bytes.
>> > Other possible measures might be,
>> >
>> > - Allowing switching wal source while reading a continuation
>> >   record. Currently ReadRecord assumes that a continuation record
>> >   can be read from single source. But this needs refactoring
>> >   involving xlog.c, xlogreader.c and relatives.
>> This is scary thinking about back-branches.
> Yes. It would be no longer a bug fix. (Or becomes a quite ugly hack..)
>> > - Delaying recycling a segment until the last partial record on it
>> >   completes. This seems doable in page-wise (coarse resolution)
>> >   but would cost additional reading of past xlog files (page
>> >   header of past pages is required).
>> Hm, yes. That looks like the least invasive way to go. At least that
>> looks more correct than the others.
> The attached patch does that. Usually it reads page headers only
> on segment boundaries, but once continuation record found (or
> failed to read the next page header, that is, the first record on
> the first page in the next segment has not been replicated), it
> becomes to happen on every page boundary until non-continuation
> page comes.

I'm afraid that many WAL segments would start with a continuation record
when there are the workload of short transactions (e.g., by pgbench), and
which would make restart_lsn go behind very much. No?

The discussion on this thread just makes me think that restart_lsn should
indicate the replay location instead of flush location. This seems safer.


Fujii Masao

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:

Reply via email to