Re: Race condition in recovery?

2021-06-15 Thread Kyotaro Horiguchi
At Tue, 15 Jun 2021 07:54:49 -0400, Andrew Dunstan wrote in > > On 6/15/21 2:16 AM, Kyotaro Horiguchi wrote: > > At Fri, 11 Jun 2021 10:46:45 -0400, Tom Lane wrote in > >> I think jacana uses msys[2?], so this likely indicates a problem > >> in path sanitization for the archive command.

Re: Race condition in recovery?

2021-06-15 Thread Robert Haas
On Mon, Jun 14, 2021 at 3:47 PM Andrew Dunstan wrote: > So, will you feel happier with this applied? I haven't tested it yet but > I'm confident it will work. I'm not all that unhappy now, but yeah, that looks like an improvement to me. I'm still afraid that I will keep writing tests that blow

Re: Race condition in recovery?

2021-06-15 Thread Andrew Dunstan
On 6/15/21 2:16 AM, Kyotaro Horiguchi wrote: > At Fri, 11 Jun 2021 10:46:45 -0400, Tom Lane wrote in >> I think jacana uses msys[2?], so this likely indicates a problem >> in path sanitization for the archive command. Andrew, any advice? > Thanks for fixing it. > > # I still haven't succeeded in

Re: Race condition in recovery?

2021-06-15 Thread Kyotaro Horiguchi
At Fri, 11 Jun 2021 10:46:45 -0400, Tom Lane wrote in > I think jacana uses msys[2?], so this likely indicates a problem > in path sanitization for the archive command. Andrew, any advice? Thanks for fixing it. # I still haven't succeeded in running TAP tests in an MSYS2 environment. I # cannot

Re: Race condition in recovery?

2021-06-14 Thread Andrew Dunstan
On 6/14/21 3:32 PM, Andrew Dunstan wrote: > On 6/14/21 1:50 PM, Andrew Dunstan wrote: >> On 6/14/21 1:11 PM, Robert Haas wrote: >>> On Mon, Jun 14, 2021 at 12:56 PM Andrew Dunstan wrote: $^X is not at all broken. The explanation here is pretty simple - the argument to perl2host

Re: Race condition in recovery?

2021-06-14 Thread Andrew Dunstan
On 6/14/21 1:50 PM, Andrew Dunstan wrote: > On 6/14/21 1:11 PM, Robert Haas wrote: >> On Mon, Jun 14, 2021 at 12:56 PM Andrew Dunstan wrote: >>> $^X is not at all broken. >>> >>> The explanation here is pretty simple - the argument to perl2host is >>> meant to be a directory. If we're going to

Re: Race condition in recovery?

2021-06-14 Thread Robert Haas
On Mon, Jun 14, 2021 at 1:50 PM Andrew Dunstan wrote: > Here's a snippet: > > sub perl2host > { > my ($subject) = @_; > ... > if (chdir $subject) > > Last time I looked you can't chdir to anything except a directory. OK, but like I said, you can't tell that from
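
The chdir point can be illustrated with a small sketch. It is written in Python rather than the Perl used in TestLib, and `translate_dir` and `translate_path` are hypothetical names, not the actual perl2host code. A helper that resolves a path by changing into it only works for directories, so a plain file would have to be split into its directory and file name first:

```python
import os


def translate_dir(directory):
    """Resolve a directory by changing into it, as the quoted snippet does with chdir."""
    saved = os.getcwd()
    try:
        os.chdir(directory)  # raises OSError if `directory` is not a directory
        return os.getcwd()
    finally:
        os.chdir(saved)      # always return to the starting directory


def translate_path(path):
    """Hypothetical wrapper: translate the parent directory, then re-attach the file name."""
    if os.path.isdir(path):
        return translate_dir(path)
    parent, name = os.path.split(path)
    return os.path.join(translate_dir(parent or "."), name)
```

Passing a plain file straight to `translate_dir` fails, which mirrors the behaviour described in the quoted message; `translate_path` is one possible way to accommodate files, under the assumptions above.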

Re: Race condition in recovery?

2021-06-14 Thread Andrew Dunstan
On 6/14/21 1:11 PM, Robert Haas wrote: > On Mon, Jun 14, 2021 at 12:56 PM Andrew Dunstan wrote: >> $^X is not at all broken. >> >> The explanation here is pretty simple - the argument to perl2host is >> meant to be a directory. If we're going to accommodate plain files then >> we have some more

Re: Race condition in recovery?

2021-06-14 Thread Robert Haas
On Mon, Jun 14, 2021 at 12:56 PM Andrew Dunstan wrote: > $^X is not at all broken. > > The explanation here is pretty simple - the argument to perl2host is > meant to be a directory. If we're going to accommodate plain files then > we have some more work to do in TestLib. This explanation seems

Re: Race condition in recovery?

2021-06-14 Thread Andrew Dunstan
On 6/14/21 11:52 AM, Robert Haas wrote: > On Sat, Jun 12, 2021 at 10:20 AM Tom Lane wrote: >> Andrew Dunstan writes: >>> I have pushed a fix, tested on a replica of fairywren/drongo, >> This bit seems a bit random: >> >> # WAL segment, this is enough to guarantee that the history file was >>

Re: Race condition in recovery?

2021-06-14 Thread Robert Haas
On Sat, Jun 12, 2021 at 10:20 AM Tom Lane wrote: > Andrew Dunstan writes: > > I have pushed a fix, tested on a replica of fairywren/drongo, > > This bit seems a bit random: > > # WAL segment, this is enough to guarantee that the history file was > # archived. > my $archive_wait_query = > -

Re: Race condition in recovery?

2021-06-13 Thread Mikael Kjellström
On 2021-06-10 01:09, Tom Lane wrote: Robert Haas writes: Got it. I have now committed the patch to all branches, after adapting your changes just a little bit. Thanks to you and Kyotaro-san for all the time spent on this. What a slog! conchuela failed its first encounter with this test

Re: Race condition in recovery?

2021-06-12 Thread Andrew Dunstan
On 6/12/21 1:54 PM, Tom Lane wrote: > Andrew Dunstan writes: >> On 6/12/21 1:07 PM, Tom Lane wrote: >>> OK. But it makes me itch a bit that this one wait-for-wal-to-be- >>> processed query looks different from all the other ones. >> I'm happy to bring the other two queries that look like this

Re: Race condition in recovery?

2021-06-12 Thread Tom Lane
Andrew Dunstan writes: > On 6/12/21 1:07 PM, Tom Lane wrote: >> OK. But it makes me itch a bit that this one wait-for-wal-to-be- >> processed query looks different from all the other ones. > I'm happy to bring the other two queries that look like this into line > with this one if you like. I

Re: Race condition in recovery?

2021-06-12 Thread Andrew Dunstan
On 6/12/21 1:07 PM, Tom Lane wrote: > Andrew Dunstan writes: >> On 6/12/21 10:20 AM, Tom Lane wrote: >>> I wonder whether that is a workaround for the poll_query_until bug >>> I proposed to fix at [1]. >> No, it's because I found it annoying and confusing that there was an >> invisible result

Re: Race condition in recovery?

2021-06-12 Thread Tom Lane
Andrew Dunstan writes: > On 6/12/21 10:20 AM, Tom Lane wrote: >> I wonder whether that is a workaround for the poll_query_until bug >> I proposed to fix at [1]. > No, it's because I found it annoying and confusing that there was an > invisible result when last_archived_wal is null. OK. But it
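
Why a NULL last_archived_wal produces an "invisible" result can be sketched outside SQL. In the Python model below, None stands in for SQL NULL; `sql_le` and `coalesce` are illustrative helpers, the WAL file names are made up, and wrapping the comparison in coalesce is an assumption about the general shape of such a fix, not a quotation of the committed query:

```python
def sql_le(a, b):
    """Model SQL's `a <= b`: the result is NULL (None) if either operand is NULL."""
    if a is None or b is None:
        return None
    return a <= b


def coalesce(*values):
    """Model SQL's coalesce(): return the first non-NULL argument, or NULL if none."""
    for v in values:
        if v is not None:
            return v
    return None


wanted = "000000010000000000000002"  # made-up WAL file name for illustration

# Before anything has been archived, last_archived_wal is NULL, so the bare
# comparison is neither true nor false.
print(sql_le(wanted, None))                   # None
# Wrapping it in coalesce(..., false) gives a visible boolean to poll on.
print(coalesce(sql_le(wanted, None), False))  # False
print(coalesce(sql_le(wanted, "000000010000000000000003"), False))  # True
```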

Re: Race condition in recovery?

2021-06-12 Thread Andrew Dunstan
On 6/12/21 10:20 AM, Tom Lane wrote: > Andrew Dunstan writes: >> I have pushed a fix, tested on a replica of fairywren/drongo, > This bit seems a bit random: > > # WAL segment, this is enough to guarantee that the history file was > # archived. > my $archive_wait_query = > - "SELECT

Re: Race condition in recovery?

2021-06-12 Thread Tom Lane
Andrew Dunstan writes: > I have pushed a fix, tested on a replica of fairywren/drongo, This bit seems a bit random: # WAL segment, this is enough to guarantee that the history file was # archived. my $archive_wait_query = - "SELECT '$walfile_to_be_archived' <= last_archived_wal FROM

Re: Race condition in recovery?

2021-06-12 Thread Andrew Dunstan
On 6/12/21 7:31 AM, Andrew Dunstan wrote: > On 6/12/21 3:48 AM, Michael Paquier wrote: >> On Fri, Jun 11, 2021 at 10:46:45AM -0400, Tom Lane wrote: >>> I think jacana uses msys[2?], so this likely indicates a problem >>> in path sanitization for the archive command. Andrew, any advice? >> Err,

Re: Race condition in recovery?

2021-06-12 Thread Andrew Dunstan
On 6/12/21 3:48 AM, Michael Paquier wrote: > On Fri, Jun 11, 2021 at 10:46:45AM -0400, Tom Lane wrote: >> I think jacana uses msys[2?], so this likely indicates a problem >> in path sanitization for the archive command. Andrew, any advice? > Err, something around TestLib::perl2host()? I'm

Re: Race condition in recovery?

2021-06-12 Thread Michael Paquier
On Fri, Jun 11, 2021 at 10:46:45AM -0400, Tom Lane wrote: > I think jacana uses msys[2?], so this likely indicates a problem > in path sanitization for the archive command. Andrew, any advice? Err, something around TestLib::perl2host()? -- Michael

Re: Race condition in recovery?

2021-06-11 Thread Tom Lane
Kyotaro Horiguchi writes: >> ==~_~===-=-===~_~== >> pgsql.build/src/bin/pg_verifybackup/tmp_check/log/003_corruption_primary.log >> ==~_~===-=-===~_~== >> ... >> 2021-06-08 16:17:41.706 CEST [51792:9] 003_corruption.pl LOG: received >> replication command: START_REPLICATION SLOT

Re: Race condition in recovery?

2021-06-11 Thread Tom Lane
Dilip Kumar writes: > On Fri, Jun 11, 2021 at 11:45 AM Kyotaro Horiguchi > wrote: >>> ==~_~===-=-===~_~== >>> pgsql.build/src/test/recovery/tmp_check/log/025_stuck_on_old_timeline_primary.log >>> ==~_~===-=-===~_~== >>> ... >>> The system cannot find the path specified. >>> 2021-06-10

Re: Race condition in recovery?

2021-06-11 Thread Dilip Kumar
On Fri, Jun 11, 2021 at 11:45 AM Kyotaro Horiguchi wrote: > > At Thu, 10 Jun 2021 21:53:18 -0400, Tom Lane wrote in > tgl> Please note that conchuela and jacana are still failing ... > > I forgot jacana's case.. > > It is failing for the issue the first patch should have fixed. > > >

Re: Race condition in recovery?

2021-06-11 Thread Kyotaro Horiguchi
At Thu, 10 Jun 2021 21:53:18 -0400, Tom Lane wrote in tgl> Please note that conchuela and jacana are still failing ... I forgot jacana's case.. It is failing for the issue the first patch should have fixed. > ==~_~===-=-===~_~== >

Re: Race condition in recovery?

2021-06-10 Thread Kyotaro Horiguchi
At Fri, 11 Jun 2021 14:07:45 +0900 (JST), Kyotaro Horiguchi wrote in > At Thu, 10 Jun 2021 21:53:18 -0400, Tom Lane wrote in > > conchuela's failure is evidently not every time, but this test > > definitely postdates the "fix": conchuela failed recovery_check this time, and > >

Re: Race condition in recovery?

2021-06-10 Thread Kyotaro Horiguchi
At Thu, 10 Jun 2021 21:53:18 -0400, Tom Lane wrote in > Kyotaro Horiguchi writes: > > At Thu, 10 Jun 2021 09:56:51 -0400, Robert Haas > > wrote in > >> Thanks for the analysis and the patches. I have committed them. > > > Thanks for committing it. > > Please note that conchuela and jacana

Re: Race condition in recovery?

2021-06-10 Thread Tom Lane
Kyotaro Horiguchi writes: > At Thu, 10 Jun 2021 09:56:51 -0400, Robert Haas wrote > in >> Thanks for the analysis and the patches. I have committed them. > Thanks for committing it. Please note that conchuela and jacana are still failing ... conchuela's failure is evidently not every time,

Re: Race condition in recovery?

2021-06-10 Thread Kyotaro Horiguchi
At Thu, 10 Jun 2021 09:56:51 -0400, Robert Haas wrote in > On Wed, Jun 9, 2021 at 9:12 PM Kyotaro Horiguchi > wrote: > > https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=conchuela&dt=2021-06-09%2021%3A12%3A25&stg=recovery-check > > > > > ==~_~===-=-===~_~== > > >

Re: Race condition in recovery?

2021-06-10 Thread Robert Haas
On Wed, Jun 9, 2021 at 9:12 PM Kyotaro Horiguchi wrote: > https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=conchuela&dt=2021-06-09%2021%3A12%3A25&stg=recovery-check > > > ==~_~===-=-===~_~== > > pgsql.build/src/test/recovery/tmp_check/log/025_stuck_on_old_timeline_cascade.log > >

Re: Race condition in recovery?

2021-06-09 Thread Kyotaro Horiguchi
At Wed, 09 Jun 2021 19:09:54 -0400, Tom Lane wrote in > Robert Haas writes: > > Got it. I have now committed the patch to all branches, after adapting > > your changes just a little bit. > > Thanks to you and Kyotaro-san for all the time spent on this. What a slog! > > conchuela failed its

Re: Race condition in recovery?

2021-06-09 Thread Tom Lane
Robert Haas writes: > Got it. I have now committed the patch to all branches, after adapting > your changes just a little bit. > Thanks to you and Kyotaro-san for all the time spent on this. What a slog! conchuela failed its first encounter with this test case:

Re: Race condition in recovery?

2021-06-09 Thread Robert Haas
On Wed, Jun 9, 2021 at 4:07 AM Dilip Kumar wrote: > Reason for the problem was that the "-Xnone" parameter was not > accepted by "sub backup" in PostgresNode.pm so I created that for > backpatch. With attached patches I am able to make it pass in v12, v11, v10 > (with fix) and fail (without fix).

Re: Race condition in recovery?

2021-06-09 Thread Robert Haas
On Wed, Jun 9, 2021 at 4:07 AM Dilip Kumar wrote: > Reason for the problem was that the "-Xnone" parameter was not > accepted by "sub backup" in PostgresNode.pm so I created that for > backpatch. With attached patches I am able to make it pass in v12, v11, v10 > (with fix) and fail (without fix).

Re: Race condition in recovery?

2021-06-09 Thread Dilip Kumar
On Wed, Jun 9, 2021 at 1:37 PM Dilip Kumar wrote: > > On Wed, Jun 9, 2021 at 12:14 PM Dilip Kumar wrote: > > > > On Wed, Jun 9, 2021 at 2:07 AM Robert Haas wrote: > > 2021-06-09 12:11:08.618 IST [122456] LOG: entering standby mode > > 2021-06-09 12:11:08.622 IST [122456] LOG: restored log

Re: Race condition in recovery?

2021-06-09 Thread Dilip Kumar
On Wed, Jun 9, 2021 at 12:14 PM Dilip Kumar wrote: > > On Wed, Jun 9, 2021 at 2:07 AM Robert Haas wrote: > 2021-06-09 12:11:08.618 IST [122456] LOG: entering standby mode > 2021-06-09 12:11:08.622 IST [122456] LOG: restored log file > "0002.history" from archive > cp: cannot stat >

Re: Race condition in recovery?

2021-06-09 Thread Dilip Kumar
On Wed, Jun 9, 2021 at 2:07 AM Robert Haas wrote: > Then I tried to get things working on 9.6. There's a patch attached to > back-port a couple of PostgresNode.pm methods from 10 to 9.6, and also > a version of the main patch attached with the necessary wal->xlog, > lsn->location renaming.
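
The wal->xlog, lsn->location renaming mentioned above follows the PostgreSQL 10 function renames. A minimal Python sketch of picking the query by server major version; `current_walfile_query` is a hypothetical helper, and the pre-10 names are the documented 9.6 functions rather than text recovered from the truncated diff in this thread:

```python
def current_walfile_query(major_version):
    """Return the query for the current WAL file name for a given server major version."""
    if major_version >= 10:
        return "SELECT pg_walfile_name(pg_current_wal_lsn());"
    # Before v10 the same functions used the xlog/location naming.
    return "SELECT pg_xlogfile_name(pg_current_xlog_location());"
```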

Re: Race condition in recovery?

2021-06-08 Thread Robert Haas
On Tue, Jun 8, 2021 at 12:26 PM Robert Haas wrote: > I think the problem is here: > > Can't locate object method "lsn" via package "PostgresNode" at > t/025_stuck_on_old_timeline.pl line 84. > > When that happens, it bails out, and cleans everything up, doing an > immediate shutdown of all the

Re: Race condition in recovery?

2021-06-08 Thread Robert Haas
On Tue, Jun 8, 2021 at 4:47 AM Dilip Kumar wrote: > I have changed for as per 9.6 but I am seeing some crash (both > with/without fix), I could not figure out the reason, it did not > generate any core dump, although I changed pg_ctl in PostgresNode.pm > to use "-c" so that it can generate core

Re: Race condition in recovery?

2021-06-08 Thread Dilip Kumar
On Tue, Jun 8, 2021 at 11:13 AM Dilip Kumar wrote: > > # Wait until the node exits recovery. > $standby->poll_query_until('postgres', "SELECT pg_is_in_recovery() = 'f';") > or die "Timed out while waiting for promotion"; > > I will try to generate a version for 9.6 based on this idea and see how
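
The wait-for-promotion idiom quoted above is a poll-until-expected loop. A rough Python sketch of that pattern, assuming a caller-supplied `run_query` callable; `poll_until`, its parameters, and the default timings are illustrative and are not PostgresNode's actual implementation:

```python
import time


def poll_until(run_query, expected="t", attempts=180, delay=0.1):
    """Call run_query() repeatedly until it returns `expected` or attempts run out."""
    for _ in range(attempts):
        if run_query() == expected:
            return True
        time.sleep(delay)  # back off briefly before asking again
    return False
```

A caller would treat a False return as a timeout, in the same spirit as the `or die "Timed out while waiting for promotion"` in the quoted test.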

Re: Race condition in recovery?

2021-06-07 Thread Dilip Kumar
On Tue, Jun 8, 2021 at 12:32 AM Robert Haas wrote: > > I tried back-porting my version of this patch to 9.6 to see what would > happen there. One problem is that some of the functions have different > names before v10. So 9.6 needs this: > > -"SELECT pg_walfile_name(pg_current_wal_lsn());");

Re: Race condition in recovery?

2021-06-07 Thread Kyotaro Horiguchi
At Mon, 7 Jun 2021 10:40:27 -0400, Robert Haas wrote in > On Mon, Jun 7, 2021 at 12:57 AM Kyotaro Horiguchi > wrote: > > Unfortunately no. The backslashes in the binary path need to be > > escaped. (taken from PostgresNode.pm:1008) > > > > > (my $perlbin = $^X) =~ s{\\}{\\\\}g if

Re: Race condition in recovery?

2021-06-07 Thread Robert Haas
Hi, I tried back-porting my version of this patch to 9.6 to see what would happen there. One problem is that some of the functions have different names before v10. So 9.6 needs this: -"SELECT pg_walfile_name(pg_current_wal_lsn());"); +"SELECT

Re: Race condition in recovery?

2021-06-07 Thread Robert Haas
On Mon, Jun 7, 2021 at 12:57 AM Kyotaro Horiguchi wrote: > Unfortunately no. The backslashes in the binary path need to be > escaped. (taken from PostgresNode.pm:1008) > > > (my $perlbin = $^X) =~ s{\\}{\\\\}g if ($TestLib::windows_os); > > $node_primary->append_conf( > > 'postgresql.conf',
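
The escaping requirement can be shown with a short Python sketch. `escape_backslashes` is a hypothetical helper and the sample path is made up; it only illustrates doubling each backslash before a Windows path is embedded in a quoted configuration value:

```python
def escape_backslashes(path):
    """Double every backslash so the path survives one level of unescaping."""
    return path.replace("\\", "\\\\")


print(escape_backslashes(r"C:\Perl\bin\perl.exe"))  # C:\\Perl\\bin\\perl.exe
```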

Re: Race condition in recovery?

2021-06-06 Thread Kyotaro Horiguchi
Sorry, some extra words are left alone. At Mon, 07 Jun 2021 13:57:35 +0900 (JST), Kyotaro Horiguchi wrote in > As I said upthread the relationship between receiveTLI and > recoveryTargetTLI is not confirmed yet at the point. - findNewestTimeLine() simply searches for the history file with the

Re: Race condition in recovery?

2021-06-06 Thread Kyotaro Horiguchi
At Fri, 4 Jun 2021 10:56:12 -0400, Robert Haas wrote in > On Fri, Jun 4, 2021 at 5:25 AM Kyotaro Horiguchi > wrote: > > I think that's right. And the test script detects the issue for me > > both on Linux but doesn't work for Windows. > > > >

Re: Race condition in recovery?

2021-06-04 Thread Robert Haas
On Fri, Jun 4, 2021 at 5:25 AM Kyotaro Horiguchi wrote: > I think that's right. And the test script detects the issue for me > both on Linux but doesn't work for Windows. > > '"C:/../Documents/work/postgresql/src/test/recovery/t/cp_history_files"' is > not recognized as an internal command or

Re: Race condition in recovery?

2021-06-04 Thread Robert Haas
On Fri, Jun 4, 2021 at 3:51 AM Dilip Kumar wrote: > I could not reproduce this but I think I got the issue, I think I used > the wrong target LSN in wait_for_catchup, instead of checking the last > "insert LSN" of the standby I was waiting for last "replay LSN" of > standby which was wrong.

Re: Race condition in recovery?

2021-06-04 Thread Kyotaro Horiguchi
At Fri, 4 Jun 2021 13:21:08 +0530, Dilip Kumar wrote in > On Fri, Jun 4, 2021 at 2:03 AM Robert Haas wrote: > > > > On Thu, May 27, 2021 at 2:26 AM Dilip Kumar wrote: > > > Changed as suggested. > > > > I don't think the code as written here is going to work on Windows, > > because your code

Re: Race condition in recovery?

2021-06-04 Thread Dilip Kumar
On Fri, Jun 4, 2021 at 2:03 AM Robert Haas wrote: > > On Thu, May 27, 2021 at 2:26 AM Dilip Kumar wrote: > > Changed as suggested. > > I don't think the code as written here is going to work on Windows, > because your code doesn't duplicate enable_restoring's call to > perl2host or its

Re: Race condition in recovery?

2021-06-03 Thread Robert Haas
On Thu, May 27, 2021 at 2:26 AM Dilip Kumar wrote: > Changed as suggested. I don't think the code as written here is going to work on Windows, because your code doesn't duplicate enable_restoring's call to perl2host or its backslash-escaping logic. It would really be better if we could use

Re: Race condition in recovery?

2021-06-02 Thread Kyotaro Horiguchi
At Tue, 1 Jun 2021 16:45:52 -0400, Robert Haas wrote in > On Fri, May 28, 2021 at 2:05 AM Kyotaro Horiguchi > wrote: > > Mmmm. That looks like meaning that we don't intend to support the > > Dilip's case, and means that we support the use of > >

Re: Race condition in recovery?

2021-06-02 Thread Noah Misch
On Tue, Jun 01, 2021 at 04:45:52PM -0400, Robert Haas wrote: > On Fri, May 28, 2021 at 2:05 AM Kyotaro Horiguchi > wrote: > > Agreed. I am often annoyed by a long-lasting TAP script when I want to > > do one of the test items in it. However, I was not sure which is our > > policy here,

Re: Race condition in recovery?

2021-06-01 Thread Robert Haas
On Fri, May 28, 2021 at 2:05 AM Kyotaro Horiguchi wrote: > Mmmm. That looks like meaning that we don't intend to support the > Dilip's case, and means that we support the use of > archive-command-copies-only-other-than-wal-segments? Actually, I think Dilip's case ought to be supported, but I

Re: Race condition in recovery?

2021-05-31 Thread Kyotaro Horiguchi
Moved to another thread. https://www.postgresql.org/message-id/20210531.165825.921389284096975508.horikyota@gmail.com regards. -- Kyotaro Horiguchi NTT Open Source Software Center

Re: Race condition in recovery?

2021-05-30 Thread Tatsuro Yamada
Hi Horiguchi-san, (Why me?) Because the story was also related to PG-REX, which you are also involved in developing. Perhaps off-list instead of -hackers would have been better, but I emailed -hackers because the same problem could be encountered by PostgreSQL users who do not use PG-REX.

Re: Race condition in recovery?

2021-05-28 Thread Kyotaro Horiguchi
(Sorry for being a bit off-topic) At Fri, 28 May 2021 12:18:35 +0900, Tatsuro Yamada wrote in > Hi Horiguchi-san, (Why me?) > In a project I helped with, I encountered an issue where > the archive command kept failing. I thought this issue was > related to the problem in this thread, so I'm

Re: Race condition in recovery?

2021-05-28 Thread Kyotaro Horiguchi
Thanks! At Thu, 27 May 2021 15:05:44 -0400, Robert Haas wrote in > On Wed, May 26, 2021 at 8:49 PM Kyotaro Horiguchi > wrote: > > So in the mail [1] and [2] I tried to describe what's going on around > > the two issues. Although I haven't had a response to [2], can I > > think that we

Re: Race condition in recovery?

2021-05-27 Thread Tatsuro Yamada
Hi Horiguchi-san, In a project I helped with, I encountered an issue where the archive command kept failing. I thought this issue was related to the problem in this thread, so I'm sharing it here. If I should create a new thread, please let me know. * Problem - The archive_command is failed

Re: Race condition in recovery?

2021-05-27 Thread Robert Haas
On Wed, May 26, 2021 at 8:49 PM Kyotaro Horiguchi wrote: > So in the mail [1] and [2] I tried to describe what's going on around > the two issues. Although I haven't had a response to [2], can I > think that we clarified the intention of ee994272ca? And may I think > that we decided that we

Re: Race condition in recovery?

2021-05-27 Thread Kyotaro Horiguchi
At Thu, 27 May 2021 12:47:30 +0530, Dilip Kumar wrote in > On Thu, May 27, 2021 at 12:09 PM Kyotaro Horiguchi > wrote: > > > > At Thu, 27 May 2021 11:44:47 +0530, Dilip Kumar > > wrote in > > We're writing at the very beginning of the switching segment at the > > promotion time. So it is

Re: Race condition in recovery?

2021-05-27 Thread Dilip Kumar
On Thu, May 27, 2021 at 12:09 PM Kyotaro Horiguchi wrote: > > At Thu, 27 May 2021 11:44:47 +0530, Dilip Kumar wrote > in > > Maybe we can somehow achieve that without a broken archive command, > > but I am not sure how it is enough to just delete WAL from pg_wal? I > > mean my original case

Re: Race condition in recovery?

2021-05-27 Thread Kyotaro Horiguchi
At Thu, 27 May 2021 11:44:47 +0530, Dilip Kumar wrote in > Maybe we can somehow achieve that without a broken archive command, > but I am not sure how it is enough to just delete WAL from pg_wal? I > mean my original case was that > 1. Got the new history file from the archive but did not get

Re: Race condition in recovery?

2021-05-27 Thread Dilip Kumar
On Wed, May 26, 2021 at 9:40 PM Robert Haas wrote: > ...which has a clear race condition. > src/test/recovery/t/023_pitr_prepared_xact.pl has logic to wait for a > WAL file to be archived, so maybe we can steal that logic and use it > here. Yeah, done that, I think we can use exact same logic

Re: Race condition in recovery?

2021-05-27 Thread Dilip Kumar
On Thu, May 27, 2021 at 6:19 AM Kyotaro Horiguchi wrote: > > At Wed, 26 May 2021 22:08:32 +0530, Dilip Kumar wrote > in > > On Wed, 26 May 2021 at 10:06 PM, Robert Haas wrote: > > > > > On Wed, May 26, 2021 at 12:26 PM Dilip Kumar > > > wrote: > > > > I will check if there is any timing

Re: Race condition in recovery?

2021-05-26 Thread Kyotaro Horiguchi
At Wed, 26 May 2021 22:08:32 +0530, Dilip Kumar wrote in > On Wed, 26 May 2021 at 10:06 PM, Robert Haas wrote: > > > On Wed, May 26, 2021 at 12:26 PM Dilip Kumar > > wrote: > > > I will check if there is any timing dependency in the test case. > > > > There is. I explained it in the second

Re: Race condition in recovery?

2021-05-26 Thread Dilip Kumar
On Wed, 26 May 2021 at 10:06 PM, Robert Haas wrote: > On Wed, May 26, 2021 at 12:26 PM Dilip Kumar > wrote: > > I will check if there is any timing dependency in the test case. > > There is. I explained it in the second part of my email, which you may > have failed to notice. Sorry, my bad.

Re: Race condition in recovery?

2021-05-26 Thread Robert Haas
On Wed, May 26, 2021 at 12:26 PM Dilip Kumar wrote: > I will check if there is any timing dependency in the test case. There is. I explained it in the second part of my email, which you may have failed to notice. -- Robert Haas EDB: http://www.enterprisedb.com

Re: Race condition in recovery?

2021-05-26 Thread Dilip Kumar
On Wed, May 26, 2021 at 9:40 PM Robert Haas wrote: > > On Wed, May 26, 2021 at 2:44 AM Dilip Kumar wrote: > > I think we need to create some content on promoted standby and check > > whether the cascade standby is able to get that or not, that will > > guarantee that it is actually following the

Re: Race condition in recovery?

2021-05-26 Thread Robert Haas
On Wed, May 26, 2021 at 2:44 AM Dilip Kumar wrote: > I think we need to create some content on promoted standby and check > whether the cascade standby is able to get that or not, that will > guarantee that it is actually following the promoted standby, I have > added the test for that so that

Re: Race condition in recovery?

2021-05-26 Thread Dilip Kumar
On Tue, May 25, 2021 at 9:16 PM Robert Haas wrote: > use FindBin; > > and then use $FindBin::RealBin to construct a path name to the executable, > e.g. > > $node_primary->append_conf( >'postgresql.conf', qq( > archive_command = '"$FindBin::RealBin/skip_cp" "%p" "$archivedir_primary/%f"'

Re: Race condition in recovery?

2021-05-25 Thread Robert Haas
On Sun, May 23, 2021 at 12:08 PM Dilip Kumar wrote: > I have created a tap test based on Robert's test.sh script. It > reproduces the issue. I am new with perl so this still needs some > cleanup/improvement, but at least it shows the idea. Thanks. I think this is the right idea but just needs

Re: Race condition in recovery?

2021-05-23 Thread Dilip Kumar
On Mon, May 24, 2021 at 10:17 AM Kyotaro Horiguchi wrote: > > At Sun, 23 May 2021 21:37:58 +0530, Dilip Kumar wrote > in > > On Sun, May 23, 2021 at 2:19 PM Dilip Kumar wrote: > > > > > > On Sat, May 22, 2021 at 8:33 PM Robert Haas wrote: > > > > I have created a tap test based on Robert's

Re: Race condition in recovery?

2021-05-23 Thread Kyotaro Horiguchi
At Sun, 23 May 2021 21:37:58 +0530, Dilip Kumar wrote in > On Sun, May 23, 2021 at 2:19 PM Dilip Kumar wrote: > > > > On Sat, May 22, 2021 at 8:33 PM Robert Haas wrote: > > I have created a tap test based on Robert's test.sh script. It > reproduces the issue. I am new with perl so this

Re: Race condition in recovery?

2021-05-23 Thread Kyotaro Horiguchi
At Fri, 21 May 2021 12:52:54 -0400, Robert Haas wrote in > I had trouble following it completely, but I didn't really spot > anything that seemed definitely wrong. However, I don't understand > what it has to do with where we are now. What I want to understand is: > under exactly what

Re: Race condition in recovery?

2021-05-23 Thread Dilip Kumar
On Sun, May 23, 2021 at 2:19 PM Dilip Kumar wrote: > > On Sat, May 22, 2021 at 8:33 PM Robert Haas wrote: I have created a tap test based on Robert's test.sh script. It reproduces the issue. I am new with perl so this still needs some cleanup/improvement, but at least it shows the idea. --

Re: Race condition in recovery?

2021-05-23 Thread Dilip Kumar
On Sat, May 22, 2021 at 8:33 PM Robert Haas wrote: > > For my original case, both standby1 and standby2 are connected to the > > primary. Now, standby1 is promoted and standby2 is shut down. And, > > before restarting, all the local WAL of the standby2 is removed so > > that it can follow the

Re: Race condition in recovery?

2021-05-22 Thread Robert Haas
On Sat, May 22, 2021 at 12:45 AM Dilip Kumar wrote: > No, in my original scenario also the new standby was not old primary, > I had 3 nodes > node1-> primary, node2 -> standby1, node3-> standby2 > node2 promoted as a new primary and node3's local WAL was removed (so > that it has to stream

Re: Race condition in recovery?

2021-05-22 Thread Dilip Kumar
On Sat, May 22, 2021 at 10:15 AM Dilip Kumar wrote: > > On Sat, May 22, 2021 at 1:14 AM Robert Haas wrote: > > > > The attached test script, test.sh seems to reliably reproduce this. > > Put that file and the recalcitrant_cp script, also attached, into an > > I haven't tested this, but I will do

Re: Race condition in recovery?

2021-05-21 Thread Dilip Kumar
On Sat, May 22, 2021 at 1:14 AM Robert Haas wrote: > > On Fri, May 21, 2021 at 12:52 PM Robert Haas wrote: > > I had trouble following it completely, but I didn't really spot > > anything that seemed definitely wrong. However, I don't understand > > what it has to do with where we are now. What

Re: Race condition in recovery?

2021-05-21 Thread Robert Haas
On Fri, May 21, 2021 at 12:52 PM Robert Haas wrote: > I had trouble following it completely, but I didn't really spot > anything that seemed definitely wrong. However, I don't understand > what it has to do with where we are now. What I want to understand is: > under exactly what circumstances

Re: Race condition in recovery?

2021-05-21 Thread Robert Haas
On Thu, May 20, 2021 at 10:21 PM Kyotaro Horiguchi wrote: > > > Conclusion: > > > - I think now we agree on the point that initializing expectedTLEs > > > with the recovery target timeline is the right fix. > > > - We still have some differences of opinion about what was the > > > original

Re: Race condition in recovery?

2021-05-21 Thread Robert Haas
On Fri, May 21, 2021 at 10:39 AM Dilip Kumar wrote: > > so we might have > > the timeline history in RECOVERYHISTORY but that's not the filename > > we're actually going to try to read from inside readTimeLineHistory(). > > In the second case, findNewestTimeLine() will call > >

Re: Race condition in recovery?

2021-05-21 Thread Dilip Kumar
On Fri, May 21, 2021 at 7:51 AM Kyotaro Horiguchi wrote: > > https://www.postgresql.org/message-id/50E43C57.5050101%40vmware.com > > > That leaves one case not covered: If you take a backup with plain > > "pg_basebackup" from a standby, without -X, and the first WAL segment > > contains a

Re: Race condition in recovery?

2021-05-21 Thread Dilip Kumar
On Thu, May 20, 2021 at 11:19 PM Robert Haas wrote: > > On Tue, May 18, 2021 at 1:33 AM Dilip Kumar wrote: > > Yeah, it will be a fake 1-element list. But just to be clear that > > 1-element can only be "ControlFile->checkPointCopy.ThisTimeLineID" and > > nothing else, do you agree to this?

Re: Race condition in recovery?

2021-05-21 Thread Kyotaro Horiguchi
At Fri, 21 May 2021 11:21:05 +0900 (JST), Kyotaro Horiguchi wrote in > At Thu, 20 May 2021 13:49:10 -0400, Robert Haas wrote > in > In the case of (c) recoveryTargetTLI > checkpoint TLI. In this case > we expect that the checkpoint TLI is in the history of > recoveryTargetTLI. Otherwise

Re: Race condition in recovery?

2021-05-20 Thread Kyotaro Horiguchi
At Thu, 20 May 2021 13:49:10 -0400, Robert Haas wrote in > On Tue, May 18, 2021 at 1:33 AM Dilip Kumar wrote: > > Yeah, it will be a fake 1-element list. But just to be clear that > > 1-element can only be "ControlFile->checkPointCopy.ThisTimeLineID" and > > nothing else, do you agree to

Re: Race condition in recovery?

2021-05-20 Thread Robert Haas
On Tue, May 18, 2021 at 1:33 AM Dilip Kumar wrote: > Yeah, it will be a fake 1-element list. But just to be clear that > 1-element can only be "ControlFile->checkPointCopy.ThisTimeLineID" and > nothing else, do you agree to this? Because we initialize > recoveryTargetTLI to this value and we

Re: Race condition in recovery?

2021-05-19 Thread Dilip Kumar
On Tue, May 18, 2021 at 12:22 PM Kyotaro Horiguchi wrote: > And finally I think I could reach the situation the commit wanted to fix. > > I took a basebackup from a standby just before replaying the first > checkpoint of the new timeline (by using debugger), without copying > pg_wal. In this

Re: Race condition in recovery?

2021-05-18 Thread Kyotaro Horiguchi
At Tue, 18 May 2021 15:52:07 +0900 (JST), Kyotaro Horiguchi wrote in > FWIW, you could be get a problematic base backup by the following steps. > > 0. (make sure /tmp/hoge is removed) > 1. apply the attached patch > 2. create a primary then start > 3. create a standby then start > 4. place

Re: Race condition in recovery?

2021-05-18 Thread Kyotaro Horiguchi
At Mon, 17 May 2021 10:46:24 +0530, Dilip Kumar wrote in > On Mon, May 17, 2021 at 10:09 AM Dilip Kumar wrote: > > > > On Mon, May 17, 2021 at 8:50 AM Kyotaro Horiguchi > > wrote: > > > > > > Before the commit expectedTLEs is always initialized with just one > > > entry for the TLI of the

Re: Race condition in recovery?

2021-05-17 Thread Dilip Kumar
On Tue, May 18, 2021 at 1:28 AM Robert Haas wrote: > > Sorry, you're right. It couldn't be uninitialized, but it could be a > fake 1-element list saying there are no ancestors rather than the real > value. So I think the point was to avoid that. Yeah, it will be a fake 1-element list. But just
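
To make the "fake 1-element list" concrete, here is a small Python model. It is not the xlog.c data structure; `tli_in_history` and the sample timelines are illustrative only. A real history for timeline 2 lists its ancestor timeline 1, while a placeholder list holding a single timeline says there are no ancestors:

```python
def tli_in_history(tli, expected_tles):
    """Return True if `tli` appears in the list of expected timeline IDs."""
    return tli in expected_tles


real_history = [2, 1]  # newest first: timeline 2 forked from timeline 1
fake_history = [1]     # placeholder built from the checkpoint's timeline alone

print(tli_in_history(2, real_history))  # True
print(tli_in_history(2, fake_history))  # False: timeline 2 is unknown to the fake list
```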

Re: Race condition in recovery?

2021-05-17 Thread Robert Haas
On Sat, May 15, 2021 at 1:25 AM Dilip Kumar wrote: > > As I understand it, the general issue here was that if > > XLogFileReadAnyTLI() was called before expectedTLEs got set, then > > prior to this commit it would have to fail, because the foreach() loop > > in that function would be iterating

Re: Race condition in recovery?

2021-05-16 Thread Dilip Kumar
On Mon, May 17, 2021 at 10:09 AM Dilip Kumar wrote: > > On Mon, May 17, 2021 at 8:50 AM Kyotaro Horiguchi > wrote: > > > > Before the commit expectedTLEs is always initialized with just one > > entry for the TLI of the last checkpoint record. > > Right > > > (1) If XLogFileReadAnyTLI() found the

Re: Race condition in recovery?

2021-05-16 Thread Dilip Kumar
On Mon, May 17, 2021 at 8:50 AM Kyotaro Horiguchi wrote: > > Before the commit expectedTLEs is always initialized with just one > entry for the TLI of the last checkpoint record. Right > (1) If XLogFileReadAnyTLI() found the segment but no history file > found, that is, using the dummy

Re: Race condition in recovery?

2021-05-16 Thread Kyotaro Horiguchi
At Mon, 17 May 2021 13:01:04 +0900 (JST), Kyotaro Horiguchi wrote in > At Mon, 17 May 2021 12:20:12 +0900 (JST), Kyotaro Horiguchi > wrote in > > Assuming that we keep expectedTLEs synced with recoveryTargetTLI, > > rescanLatestTimeLine updates the list properly so no need to worry > > about

Re: Race condition in recovery?

2021-05-16 Thread Kyotaro Horiguchi
At Mon, 17 May 2021 12:20:12 +0900 (JST), Kyotaro Horiguchi wrote in > Assuming that we keep expectedTLEs synced with recoveryTargetTLI, > rescanLatestTimeLine updates the list properly so no need to worry > about the future. So the issue would be in the past timelines. After > reading the

Re: Race condition in recovery?

2021-05-16 Thread Kyotaro Horiguchi
At Sat, 15 May 2021 10:55:05 +0530, Dilip Kumar wrote in > On Sat, May 15, 2021 at 3:58 AM Robert Haas wrote: > > > > I did notice, but keep in mind that this was more than 8 years ago. > > Even if Heikki is reading this thread, he may not remember why he > > changed 1 line of code one way

Re: Race condition in recovery?

2021-05-14 Thread Dilip Kumar
On Sat, May 15, 2021 at 3:58 AM Robert Haas wrote: > > I did notice, but keep in mind that this was more than 8 years ago. > Even if Heikki is reading this thread, he may not remember why he > changed 1 line of code one way rather than another in 2013. I mean if > he does that's great, but it's

Re: Race condition in recovery?

2021-05-14 Thread Robert Haas
On Fri, May 14, 2021 at 12:59 AM Dilip Kumar wrote: > I am not sure that have you noticed the commit id which changed the > definition of expectedTLEs, Heikki has committed that change so adding > him in the list to know his opinion. I did notice, but keep in mind that this was more than 8 years
