Re: [bug fix] Cascaded standby cannot start after a clean shutdown

2018-06-17 Thread Michael Paquier
On Sun, Jun 17, 2018 at 07:33:01PM -0700, Andres Freund wrote: > On 2018-06-17 22:31:02 -0400, Tom Lane wrote: >> Yeah, for me parallelized check-world only works in >= 9.6. My (vague) >> recollection is that multiple fixes were needed to get to that point, >> so I doubt it's worth trying to fix

Re: [bug fix] Cascaded standby cannot start after a clean shutdown

2018-06-17 Thread Andres Freund
Hi, On 2018-06-17 22:31:02 -0400, Tom Lane wrote: > Michael Paquier writes: > > Trying to run regression tests in parallel in ~9.5 leads to spurious > > failures, which is annoying... I had a patch fixing that but I cannot > > put my finger on the thread where this has been discussed. > >

Re: [bug fix] Cascaded standby cannot start after a clean shutdown

2018-06-17 Thread Tom Lane
Michael Paquier writes: > Trying to run regression tests in parallel in ~9.5 leads to spurious > failures, which is annoying... I had a patch fixing that but I cannot > put my finger on the thread where this has been discussed. Yeah, for me parallelized check-world only works in >= 9.6. My

Re: [bug fix] Cascaded standby cannot start after a clean shutdown

2018-06-17 Thread Michael Paquier
On Wed, Jun 13, 2018 at 09:00:47AM +0900, Michael Paquier wrote: > Note for everybody on this list: I will be out for a couple of days at > the end of this week, and my intention is to finish wrapping this patch > at the beginning of next week, with a back-patch down to 9.5 where > palloc_extended

Re: [bug fix] Cascaded standby cannot start after a clean shutdown

2018-06-12 Thread Michael Paquier
On Tue, Jun 12, 2018 at 04:27:50PM +0900, Michael Paquier wrote: > On Tue, Jun 12, 2018 at 06:30:49AM +, Tsunakawa, Takayuki wrote: >> Thank you so much. This version looks better. >> >> + * this would cause the instance to stop suddendly with a hard failure, >> >> suddendly -> suddenly

Re: [bug fix] Cascaded standby cannot start after a clean shutdown

2018-06-12 Thread Michael Paquier
On Tue, Jun 12, 2018 at 06:30:49AM +, Tsunakawa, Takayuki wrote: > Thank you so much. This version looks better. > > + * this would cause the instance to stop suddendly with a hard failure, > > suddendly -> suddenly Yep. Thanks for the extra lookup. -- Michael signature.asc

RE: [bug fix] Cascaded standby cannot start after a clean shutdown

2018-06-12 Thread Tsunakawa, Takayuki
> From: Michael Paquier [mailto:mich...@paquier.xyz] > As this is one of those small bug fixes for which we can do something, > attached > is an updated patch with a commit description, and tutti-quanti. At the > end, I have moved the size check within allocate_recordbuf(). Even if the > size
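
The idea mentioned here, a size check living inside the allocation routine itself, can be sketched roughly as follows. This is not the committed patch: `RecordBufSketch`, `sketch_allocate_recordbuf` and `SKETCH_MAX_ALLOC_SIZE` are hypothetical names, plain `malloc()` stands in for PostgreSQL's allocator, and the 1 GB limit is an assumption taken from the "higher than 1GB" remark elsewhere in the thread. The point is only that an implausible length is reported as a soft failure rather than a hard one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed upper bound for a single allocation (1 GB - 1). */
#define SKETCH_MAX_ALLOC_SIZE ((size_t) 0x3fffffff)

/* Hypothetical stand-in for the reader state that owns the record buffer. */
typedef struct RecordBufSketch
{
    char   *buf;    /* current buffer, or NULL if none yet */
    size_t  size;   /* allocated size of buf in bytes */
} RecordBufSketch;

/*
 * Make sure the buffer can hold reclength bytes.
 * Returns false (leaving the old buffer untouched) if the length is
 * implausibly large or if the allocation itself fails, so the caller
 * can treat the record as invalid instead of stopping the instance.
 */
static bool
sketch_allocate_recordbuf(RecordBufSketch *state, uint32_t reclength)
{
    size_t  newsize;
    char   *newbuf;

    assert(state != NULL);

    /* Size check done here, before any allocation is attempted. */
    if ((size_t) reclength > SKETCH_MAX_ALLOC_SIZE)
        return false;

    /* Existing buffer is already large enough. */
    if (state->buf != NULL && state->size >= reclength)
        return true;

    newsize = reclength > 0 ? reclength : 1;
    newbuf = malloc(newsize);
    if (newbuf == NULL)
        return false;           /* out of memory: soft failure */

    free(state->buf);
    state->buf = newbuf;
    state->size = newsize;
    return true;
}
```

A caller would check the return value and treat `false` as "this record cannot be read" rather than as a fatal error.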

Re: [bug fix] Cascaded standby cannot start after a clean shutdown

2018-06-11 Thread Michael Paquier
On Sun, Mar 18, 2018 at 08:49:01AM +0900, Michael Paquier wrote: > On Fri, Mar 16, 2018 at 06:02:25AM +, Tsunakawa, Takayuki wrote: >> Ouch, you're right. If memory allocation fails, the startup process >> would emit a LOG message and continue to fetch new WAL records. Then, >> I'm

Re: [bug fix] Cascaded standby cannot start after a clean shutdown

2018-03-17 Thread Michael Paquier
On Fri, Mar 16, 2018 at 06:02:25AM +, Tsunakawa, Takayuki wrote: > Ouch, you're right. If memory allocation fails, the startup process > would emit a LOG message and continue to fetch new WAL records. Then, > I'm completely happy with your patch. Thanks for double-checking, Tsunakawa-san.

RE: [bug fix] Cascaded standby cannot start after a clean shutdown

2018-03-16 Thread Tsunakawa, Takayuki
From: Michael Paquier [mailto:mich...@paquier.xyz] > We use palloc_extended with MCXT_ALLOC_NO_OOM in 9.5~, and malloc() further > down, so once you remove the FATAL error caused by a record whose length > is higher than 1GB, you discard all the hard failures, no? Ouch, you're right. If memory

Re: [bug fix] Cascaded standby cannot start after a clean shutdown

2018-03-15 Thread Michael Paquier
On Fri, Mar 16, 2018 at 05:27:58AM +, Tsunakawa, Takayuki wrote: > Honestly, I'm fine with either patch. I like your simpler and cleaner > one that has no performance impact. But please note that the > allocation attempt could amount to nearly 1 GB. That can fail due to > memory shortage,

RE: [bug fix] Cascaded standby cannot start after a clean shutdown

2018-03-15 Thread Tsunakawa, Takayuki
From: Michael Paquier [mailto:mich...@paquier.xyz] > Even with that, the resulting patch is sort of ugly... So it seems to me > that the conclusion to this thread is that there is no clear winner, and > that the problem is so unlikely to happen that it is not worth the performance > impact to

RE: [bug fix] Cascaded standby cannot start after a clean shutdown

2018-02-26 Thread Tsunakawa, Takayuki
From: Michael Paquier [mailto:mich...@paquier.xyz] > By the way, as long as I have my mind on it. Another strategy would be > to just make the checks in XLogReadRecord() a bit smarter if the whole record > header is not on the page. If we check at least for > AllocSizeIsValid(total_len) then
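
As a rough illustration of the check proposed above, the following sketch rejects a total length that could not be allocated before the rest of the header has been validated. `record_length_is_plausible` and the `min_header_size` parameter are hypothetical; `alloc_size_is_valid` is a local stand-in modelled on PostgreSQL's `AllocSizeIsValid()` macro, with the 1 GB - 1 limit written out explicitly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-in for PostgreSQL's MaxAllocSize (1 GB - 1). */
#define SKETCH_MAX_ALLOC_SIZE ((size_t) 0x3fffffff)

/* Local stand-in for AllocSizeIsValid(). */
static bool
alloc_size_is_valid(size_t size)
{
    return size <= SKETCH_MAX_ALLOC_SIZE;
}

/*
 * Hypothetical helper: decide whether a total record length read from a
 * page can be trusted enough to attempt an allocation.  The length may be
 * garbage when the full record header is not yet available on the page.
 */
static bool
record_length_is_plausible(uint32_t total_len, size_t min_header_size)
{
    assert(min_header_size > 0);

    /* A record can never be shorter than its own header. */
    if ((size_t) total_len < min_header_size)
        return false;

    /* Anything above the allocation limit is treated as garbage. */
    return alloc_size_is_valid((size_t) total_len);
}
```

With a check of this kind, a garbage length is handled like any other invalid record instead of reaching the allocator.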

Re: [bug fix] Cascaded standby cannot start after a clean shutdown

2018-02-26 Thread Michael Paquier
On Mon, Feb 26, 2018 at 05:08:49PM +0900, Michael Paquier wrote: > This was mentioned back in 2001 by the way, but this did not count much > for the case discussed here: > https://www.postgresql.org/message-id/24901.995381770%40sss.pgh.pa.us > The issue here is that the streaming case makes it

Re: [bug fix] Cascaded standby cannot start after a clean shutdown

2018-02-26 Thread Michael Paquier
On Mon, Feb 26, 2018 at 07:25:46AM +, Tsunakawa, Takayuki wrote: > From: Michael Paquier [mailto:mich...@paquier.xyz] >> The WAL receiver approach also has a drawback. If WAL is streamed at full >> speed, then the primary sends data with a maximum of 6 WAL pages. >> When beginning streaming

RE: [bug fix] Cascaded standby cannot start after a clean shutdown

2018-02-25 Thread Tsunakawa, Takayuki
From: Michael Paquier [mailto:mich...@paquier.xyz] > I have been playing more with that this morning, and trying to tweak the > XLOG reader so that the fetched page is zeroed where necessary does not help > much. XLogReaderState->EndRecPtr is updated once the last record is set > so it is possible

Re: [bug fix] Cascaded standby cannot start after a clean shutdown

2018-02-25 Thread Michael Paquier
On Fri, Feb 23, 2018 at 11:02:19PM +0900, Michael Paquier wrote: > Tsunakawa-san has proposed upthread to fix the problem by zero-ing the > page read in the WAL receiver. While I agree that zeroing the page is > the way to go, doing so in the WAL receiver does not take care of > problems with the
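
The page-zeroing idea discussed here can be sketched in isolation as follows. This is only an illustration of the technique, not code from any proposed patch: `zero_page_tail` is a hypothetical helper, and the assumption is that the caller knows how many bytes at the start of the page hold valid WAL.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: clear everything in a page buffer past the bytes
 * known to be valid, so stale contents left over in the buffer are never
 * interpreted as the start of a WAL record.
 */
static void
zero_page_tail(char *page, size_t page_size, size_t valid_bytes)
{
    assert(page != NULL);
    assert(valid_bytes <= page_size);

    memset(page + valid_bytes, 0, page_size - valid_bytes);
}
```

After such a call, a reader that looks past the valid data sees a zero length field rather than leftover bytes.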

Re: [bug fix] Cascaded standby cannot start after a clean shutdown

2018-02-23 Thread Michael Paquier
On Fri, Feb 23, 2018 at 11:26:31AM +0900, Michael Paquier wrote: > Another, evil, idea that I have on top of all those things is to > directly hexedit the WAL segment of the standby just at the limit where > it would receive a record from the primary and insert in it garbage > data which would

Re: [bug fix] Cascaded standby cannot start after a clean shutdown

2018-02-22 Thread Michael Paquier
On Thu, Feb 22, 2018 at 04:55:38PM +0900, Michael Paquier wrote: > I am definitely ready to buy that it can be possible to have garbage > being read for the length field which can cause allocate_recordbuf to fail > as that's the only code path in xlogreader.c which does such an > allocation. Still,

Re: [bug fix] Cascaded standby cannot start after a clean shutdown

2018-02-21 Thread Michael Paquier
On Mon, Feb 19, 2018 at 03:01:15AM +, Tsunakawa, Takayuki wrote: > From: Michael Paquier [mailto:mich...@paquier.xyz] Sorry for my late reply. I was looking at this problem for the last couple of days here and there, still thinking about it. >> It seems to me that the consolidation of the

RE: [bug fix] Cascaded standby cannot start after a clean shutdown

2018-02-18 Thread Tsunakawa, Takayuki
Thank you for reviewing. From: Michael Paquier [mailto:mich...@paquier.xyz] > It seems to me that the consolidation of the page read should happen directly > in xlogreader.c and not even in one of its callbacks so as no garbage data > is presented back to the caller using its own XLogReader. > I

Re: [bug fix] Cascaded standby cannot start after a clean shutdown

2018-02-16 Thread Michael Paquier
On Fri, Feb 16, 2018 at 04:19:00PM +0900, Michael Paquier wrote: > Wait a minute here, when recycled past WAL segments would be filled with > zeros before being used. Please feel free to ignore this part. I pushed the "Send" button without seeing it, and I was thinking under which circumstances

Re: [bug fix] Cascaded standby cannot start after a clean shutdown

2018-02-15 Thread Michael Paquier
On Wed, Feb 14, 2018 at 04:37:05AM +, Tsunakawa, Takayuki wrote: > The PostgreSQL version is 9.5. The cluster consists of a master, a > cascading standby (SB1), and a cascaded standby (SB2). The WAL flows > like this: master -> SB1 -> SB2. > > The user shut down SB2 and tried to restart

[bug fix] Cascaded standby cannot start after a clean shutdown

2018-02-13 Thread Tsunakawa, Takayuki
Hello, Our customer encountered a rare PostgreSQL bug that prevents a cascaded standby from starting up. The attached patch is a fix for it. I hope this will be back-patched. I'll add this to the next CF. PROBLEM: The PostgreSQL version is 9.5. The