Re: [HACKERS] [bug fix] Cascading standby cannot catch up and get stuck emitting the same message repeatedly
From: pgsql-hackers-ow...@postgresql.org > [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Amit Kapila > I think it beginning of segment (aka the first page of the segment), even > the comment indicates the same. > > /* > * Whenever switching to a new WAL segment, we read the first page of > * the file and validate its header, even if that's not where the > * target record is. ... > .. > */ > > However, on again looking at the code, it seems that part of code behaves > similarly for both 9.2 and 9.3. Yes, the code behaves similarly in 9.2 and later. FYI, ValidXLogHeader() is called at two sites. The earlier one checks the first page of a segment when the real target page is different, and the latter one checks any page including the first page of a segment. > ..Because node3 found a WAL > ! * record fragment at the end of segment 10, it expects to find the ! > * remaining fragment at the beginning of WAL segment 11 streamed from ! > * node2. But there was a fragment of a different WAL record, because ! * > node2 overwrote a different WAL record at the end of segment 10 across ! > * to 11. > > How does node3 ensure that the fragment of WAL in segment 11 is different? > Isn't it possible that when node2 overwrites the last record in WAL segment > 10, it writes a record of slightly different contents but which is of the > same size as an original record in WAL segment 10? That case is detected by checking the CRC value in the XLOG record header. Regards Takayuki Tsunakawa -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [bug fix] Cascading standby cannot catch up and get stuck emitting the same message repeatedly
On Tue, Nov 15, 2016 at 7:51 AM, Tsunakawa, Takayuki wrote: > From: pgsql-hackers-ow...@postgresql.org >> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Amit Kapila >> It looks like the code in 9.3 or later version uses the recptr as the target >> segment location >> (targetSegmentPtr) whereas 9.2 uses recptr as beginning of segment (readOff >> = 0;). If above understanding is right then it will set different values >> for latestPagePtr in 9.2 and 9.3 onwards code. >> > > In 9.2, the relevant variable is not recptr but recaddr. recaddr in 9.2 and > recptr in later releases point to the beginning of a page just read, which is > not always the beginning of the segment (targetSegmentPtr). > I think it beginning of segment (aka the first page of the segment), even the comment indicates the same. /* * Whenever switching to a new WAL segment, we read the first page of * the file and validate its header, even if that's not where the * target record is. ... .. */ However, on again looking at the code, it seems that part of code behaves similarly for both 9.2 and 9.3. ..Because node3 found a WAL ! * record fragment at the end of segment 10, it expects to find the ! * remaining fragment at the beginning of WAL segment 11 streamed from ! * node2. But there was a fragment of a different WAL record, because ! * node2 overwrote a different WAL record at the end of segment 10 across ! * to 11. How does node3 ensure that the fragment of WAL in segment 11 is different? Isn't it possible that when node2 overwrites the last record in WAL segment 10, it writes a record of slightly different contents but which is of the same size as an original record in WAL segment 10? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [bug fix] Cascading standby cannot catch up and get stuck emitting the same message repeatedly
From: pgsql-hackers-ow...@postgresql.org > [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Robert Haas > Let me try to be more clear. I will not commit this patch if it is not > properly commented. I doubt that anyone else will, either. > > The fact that those code changes already exist in 9.4+ is not a reason to > back-port them to earlier releases without a proper explanation of why we > are doing it. Very possibly, we should also improve the comments in newer > branches so that future authors don't reintroduce whatever bugs were fixed > by these changes. But whether we do that or not, I am not going to commit > uncommented patches to complex code in order to fix obscure bugs in > 3+-year-old branches. I think that is a non-starter. > OK, although I'm not perfectly sure what to add as a comment, I added an example scenario as a comment because I thought a concrete situation helps to understand the existing two paragraphs. Is this good? Regards Takayuki Tsunakawa cascading_standby_stuck_v2.patch Description: cascading_standby_stuck_v2.patch -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [bug fix] Cascading standby cannot catch up and get stuck emitting the same message repeatedly
From: pgsql-hackers-ow...@postgresql.org > [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Amit Kapila > It looks like the code in 9.3 or later version uses the recptr as the target > segment location > (targetSegmentPtr) whereas 9.2 uses recptr as beginning of segment (readOff > = 0;). If above understanding is right then it will set different values > for latestPagePtr in 9.2 and 9.3 onwards code. > In 9.2, the relevant variable is not recptr but recaddr. recaddr in 9.2 and recptr in later releases point to the beginning of a page just read, which is not always the beginning of the segment (targetSegmentPtr). Regards Takayuki Tsunakawa -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [bug fix] Cascading standby cannot catch up and get stuck emitting the same message repeatedly
On Thu, Nov 10, 2016 at 12:13 AM, Tsunakawa, Takayuki wrote: > From: pgsql-hackers-ow...@postgresql.org >> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Robert Haas >> OK. I agree that's a problem. However, your patch adds zero new comment >> text while removing some existing comments, so I can't easily tell how it >> solves that problem or whether it does so correctly. Even if I were smart >> enough to figure it out, I wouldn't want to rely on the next person also >> being that smart. This is obviously a subtle problem in tricky code, so >> a clear explanation of the fix seems like a very good idea. > > The comment describes what the code is trying to achieve. Actually, I just > imitated the code and comment of later major releases. The only difference > between later releases and my patch (for 9.2) is whether the state is stored > in XLogReaderStruct or as global variables. Below is the comment from 9.6, > where the second paragraph describes what the two nested if conditions mean. > The removed comment lines are what became irrelevant, which is also not > present in later major releases. Let me try to be more clear. I will not commit this patch if it is not properly commented. I doubt that anyone else will, either. The fact that those code changes already exist in 9.4+ is not a reason to back-port them to earlier releases without a proper explanation of why we are doing it. Very possibly, we should also improve the comments in newer branches so that future authors don't reintroduce whatever bugs were fixed by these changes. But whether we do that or not, I am not going to commit uncommented patches to complex code in order to fix obscure bugs in 3+-year-old branches. I think that is a non-starter. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [bug fix] Cascading standby cannot catch up and get stuck emitting the same message repeatedly
On Thu, Nov 10, 2016 at 10:43 AM, Tsunakawa, Takayuki wrote: > From: pgsql-hackers-ow...@postgresql.org >> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Robert Haas >> OK. I agree that's a problem. However, your patch adds zero new comment >> text while removing some existing comments, so I can't easily tell how it >> solves that problem or whether it does so correctly. Even if I were smart >> enough to figure it out, I wouldn't want to rely on the next person also >> being that smart. This is obviously a subtle problem in tricky code, so >> a clear explanation of the fix seems like a very good idea. > > The comment describes what the code is trying to achieve. Actually, I just > imitated the code and comment of later major releases. The only difference > between later releases and my patch (for 9.2) is whether the state is stored > in XLogReaderStruct or as global variables. Below is the comment from 9.6, > where the second paragraph describes what the two nested if conditions mean. > The removed comment lines are what became irrelevant, which is also not > present in later major releases. > > /* > * Since child timelines are always assigned a TLI greater than their > * immediate parent's TLI, we should never see TLI go backwards across > * successive pages of a consistent WAL sequence. > * > * Sometimes we re-read a segment that's already been (partially) > read. So > * we only verify TLIs for pages that are later than the last > remembered > * LSN. > */ > I think the changes which you are referring has been done as part of commit 7fcbf6a405ffc12a4546a25b98592ee6733783fc. There is no mention of such a bug fix in that commit; however, it is quite possible that such a change has fixed the problem you have reported. It is not clear if we can directly copy that change and it seems to me the change copied is also not complete. It looks like the code in 9.3 or later version uses the recptr as the target segment location (targetSegmentPtr) whereas 9.2 uses recptr as beginning of segment (readOff = 0;). If above understanding is right then it will set different values for latestPagePtr in 9.2 and 9.3 onwards code. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [bug fix] Cascading standby cannot catch up and get stuck emitting the same message repeatedly
From: pgsql-hackers-ow...@postgresql.org > [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Robert Haas > OK. I agree that's a problem. However, your patch adds zero new comment > text while removing some existing comments, so I can't easily tell how it > solves that problem or whether it does so correctly. Even if I were smart > enough to figure it out, I wouldn't want to rely on the next person also > being that smart. This is obviously a subtle problem in tricky code, so > a clear explanation of the fix seems like a very good idea. The comment describes what the code is trying to achieve. Actually, I just imitated the code and comment of later major releases. The only difference between later releases and my patch (for 9.2) is whether the state is stored in XLogReaderStruct or as global variables. Below is the comment from 9.6, where the second paragraph describes what the two nested if conditions mean. The removed comment lines are what became irrelevant, which is also not present in later major releases. /* * Since child timelines are always assigned a TLI greater than their * immediate parent's TLI, we should never see TLI go backwards across * successive pages of a consistent WAL sequence. * * Sometimes we re-read a segment that's already been (partially) read. So * we only verify TLIs for pages that are later than the last remembered * LSN. */ Regards Takayuki Tsunakawa -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [bug fix] Cascading standby cannot catch up and get stuck emitting the same message repeatedly
On Thu, Aug 25, 2016 at 10:33 PM, Tsunakawa, Takayuki wrote: > The processing went as follows. > > 1. node1's timeline is 140. It wrote a WAL record at the end of WAL segment > 117. The WAL record didn't fit the last page, so it was split across > segments 117 and 118. > > 2. WAL segment 117 was archived. > > 3. node1 got down, and node2 was promoted. > > 4. As part of the recovery process, node2 retrieves WAL segment 117 from > archive. It found a WAL record fragment at the end of the segment but could > not find the remaining fragment in segment 118, so node2 stops recovery there. > > LOG: restored log file "008C028C0075" from archive > LOG: received promote request > LOG: redo done at 28C/75FFF738 > > 5. node2 becomes the primary, and its timeline becomes 118. node3 is > disconnected by node2 (but later reconnectes to node2). > > LOG: terminating all walsender processes to force cascaded standby(s) to > update timeline and reconnect > > 6. node3 retrieves and applies WAL segment 117 from archive. > > LOG: restored log file "008C028C0075" from archive > > 7. node3 found .history file for time line 141 and renews its timeline to 141. > > 8. Because node3 found a WAL record fragment at the end of segment 117, it > expects to find the remaining fragment at the beginning of WAL segment 118 > streamed from node2. But there was a fragment of a different WAL record, > because node2 overwrote a different WAL record at the end of segment 117 > across to 118. > > LOG: invalid contrecord length 5892 in log file 652, segment 118, offset 0 > > 9. node3 then retrieves segment 117 from archive again to get the WAL record > at the end of segment 117. However, as node3's timeline is already 141, it > complains about the older timeline when it sees the timeline 140 at the > beginning of segment 117. > > LOG: out-of-sequence timeline ID 140 (after 141) in log file 652, segment > 117, offset 0 OK. I agree that's a problem. However, your patch adds zero new comment text while removing some existing comments, so I can't easily tell how it solves that problem or whether it does so correctly. Even if I were smart enough to figure it out, I wouldn't want to rely on the next person also being that smart. This is obviously a subtle problem in tricky code, so a clear explanation of the fix seems like a very good idea. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [bug fix] Cascading standby cannot catch up and get stuck emitting the same message repeatedly
From: pgsql-hackers-ow...@postgresql.org [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Michael Paquier > 9.3 has addressed that by allowing streaming standbys to perform timeline > jumps via the replication protocol. Doesn't this problem enter in this area? IIUC, that new feature enables timeline switch without disconnecting the standby and without WAL archive. I think 9.2 can perform timeline switch with WAL archive. In fact, the customer said they hit the problem only once in many occurrences of the same test. The bug seems to emerge depending on the timing. Regards Takayuki Tsunakawa -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [bug fix] Cascading standby cannot catch up and get stuck emitting the same message repeatedly
On Fri, Aug 26, 2016 at 11:33 AM, Tsunakawa, Takayuki wrote: > Our customer hit a problem of cascading replication, and we found the cause. > They are using the latest PostgreSQL 9.2.18. The bug seems to have been > fixed in 9.4 and higher during the big modification of xlog.c, but it's not > reflected in older releases. > > The attached patch is for 9.2.18. This just borrows the idea from 9.4 and > higher. > > But we haven't been able to reproduce the problem. Could you review the > patch and help to test it? I would very much appreciate it if you could > figure out how to reproduce the problem easily. > [...] > LOG: out-of-sequence timeline ID 140 (after 141) in log file 652, segment > 117, offset 0 9.3 has addressed that by allowing streaming standbys to perform timeline jumps via the replication protocol. Doesn't this problem enter in this area? -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] [bug fix] Cascading standby cannot catch up and get stuck emitting the same message repeatedly
Hello, Our customer hit a problem of cascading replication, and we found the cause. They are using the latest PostgreSQL 9.2.18. The bug seems to have been fixed in 9.4 and higher during the big modification of xlog.c, but it's not reflected in older releases. The attached patch is for 9.2.18. This just borrows the idea from 9.4 and higher. But we haven't been able to reproduce the problem. Could you review the patch and help to test it? I would very much appreciate it if you could figure out how to reproduce the problem easily. PROBLEM The customer's configuration consists of three nodes: node1 is a primary, node2 is a synchronous standby, and node3 is a cascading standby. The primary archives WAL to a shared (network?) storage and the standbys read archived WAL from there with restore_command. recovery_target_timeline is set to 'latest' on the standbys. When node1 dies and node2 is promoted to a primary, node3 cannot catch up node2 forever, emitting the following message repeatedly: LOG: out-of-sequence timeline ID 140 (after 141) in log file 652, segment 117, offset 0 The expected behavior is that node3 catches up node2 and keeps synchronization. CAUSE The processing went as follows. 1. node1's timeline is 140. It wrote a WAL record at the end of WAL segment 117. The WAL record didn't fit the last page, so it was split across segments 117 and 118. 2. WAL segment 117 was archived. 3. node1 got down, and node2 was promoted. 4. As part of the recovery process, node2 retrieves WAL segment 117 from archive. It found a WAL record fragment at the end of the segment but could not find the remaining fragment in segment 118, so node2 stops recovery there. LOG: restored log file "008C028C0075" from archive LOG: received promote request LOG: redo done at 28C/75FFF738 5. node2 becomes the primary, and its timeline becomes 118. node3 is disconnected by node2 (but later reconnectes to node2). LOG: terminating all walsender processes to force cascaded standby(s) to update timeline and reconnect 6. node3 retrieves and applies WAL segment 117 from archive. LOG: restored log file "008C028C0075" from archive 7. node3 found .history file for time line 141 and renews its timeline to 141. 8. Because node3 found a WAL record fragment at the end of segment 117, it expects to find the remaining fragment at the beginning of WAL segment 118 streamed from node2. But there was a fragment of a different WAL record, because node2 overwrote a different WAL record at the end of segment 117 across to 118. LOG: invalid contrecord length 5892 in log file 652, segment 118, offset 0 9. node3 then retrieves segment 117 from archive again to get the WAL record at the end of segment 117. However, as node3's timeline is already 141, it complains about the older timeline when it sees the timeline 140 at the beginning of segment 117. LOG: out-of-sequence timeline ID 140 (after 141) in log file 652, segment 117, offset 0 Regards Takayuki Tsunakawa cascading_standby_stuck.patch Description: cascading_standby_stuck.patch -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers