Re: Trap errors from streaming child in pg_basebackup to exit early

2022-02-23 Thread Daniel Gustafsson
> On 22 Feb 2022, at 02:13, Michael Paquier wrote: > The patch set looks fine overall. Thanks for reviewing and testing, I pushed this today after taking another round at it. -- Daniel Gustafsson https://vmware.com/

Re: Trap errors from streaming child in pg_basebackup to exit early

2022-02-21 Thread Michael Paquier
On Mon, Feb 21, 2022 at 03:11:30PM +0100, Daniel Gustafsson wrote: >On 21 Feb 2022, at 03:03, Michael Paquier wrote: >> +is($node->poll_query_until('postgres', >> + "SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE " . >> + "application_name = '010_pg_basebackup.pl' AND wait_event

Re: Trap errors from streaming child in pg_basebackup to exit early

2022-02-21 Thread Daniel Gustafsson
> On 21 Feb 2022, at 03:03, Michael Paquier wrote: > > On Fri, Feb 18, 2022 at 10:00:43PM +0100, Daniel Gustafsson wrote: >> This is good idea, I was going in a different direction earlier with a test >> but >> this is cleaner. The attached 0001 refactors pump_until; 0002 fixes a >> trivial

Re: Trap errors from streaming child in pg_basebackup to exit early

2022-02-20 Thread Michael Paquier
On Fri, Feb 18, 2022 at 10:00:43PM +0100, Daniel Gustafsson wrote: > This is good idea, I was going in a different direction earlier with a test > but > this is cleaner. The attached 0001 refactors pump_until; 0002 fixes a trivial > spelling error found while hacking; and 0003 is the previous

Re: Trap errors from streaming child in pg_basebackup to exit early

2022-02-18 Thread Daniel Gustafsson
> On 16 Feb 2022, at 08:27, Michael Paquier wrote: > > On Wed, Sep 29, 2021 at 01:18:40PM +0200, Daniel Gustafsson wrote: >> So there is one mention of a background WAL receiver already in there, but >> it's >> pretty inconsistent as to what we call it. For now I've changed the >> messaging

Re: Trap errors from streaming child in pg_basebackup to exit early

2022-02-15 Thread Michael Paquier
On Wed, Sep 29, 2021 at 01:18:40PM +0200, Daniel Gustafsson wrote: > So there is one mention of a background WAL receiver already in there, but > it's > pretty inconsistent as to what we call it. For now I've changed the messaging > in this patch to say "background process", leaving making this

Re: Trap errors from streaming child in pg_basebackup to exit early

2021-11-10 Thread Bharath Rupireddy
On Wed, Sep 29, 2021 at 4:48 PM Daniel Gustafsson wrote: > > Other places in the code just refers to the background process as "the > > background process". The term "WAL receiver" is new from this patch. > > While I agree it's in many ways a better term, I think (1) we should > > try to be

Re: Trap errors from streaming child in pg_basebackup to exit early

2021-10-26 Thread Masahiko Sawada
On Wed, Sep 29, 2021 at 8:18 PM Daniel Gustafsson wrote: > > > On 28 Sep 2021, at 15:48, Magnus Hagander wrote: > > > Might be worth noting in one of the comments the difference in > > behaviour if the backend process/thread *crashes* -- that is, on Unix > > it will be detected via the signal

Re: Trap errors from streaming child in pg_basebackup to exit early

2021-09-29 Thread Daniel Gustafsson
> On 28 Sep 2021, at 15:48, Magnus Hagander wrote: > Might be worth noting in one of the comments the difference in > behaviour if the backend process/thread *crashes* -- that is, on Unix > it will be detected via the signal handler and on Windows the whole > process including the main thread

Re: Trap errors from streaming child in pg_basebackup to exit early

2021-09-28 Thread Magnus Hagander
On Fri, Sep 3, 2021 at 11:53 AM Daniel Gustafsson wrote: > > > On 1 Sep 2021, at 12:28, Bharath Rupireddy > > wrote: > > > > On Wed, Sep 1, 2021 at 1:56 PM Daniel Gustafsson wrote: > >> A v2 with the above fixes is attached. > > > > Thanks for the updated patch. Here are some comments: > > > >

Re: Trap errors from streaming child in pg_basebackup to exit early

2021-09-03 Thread Bharath Rupireddy
On Fri, Sep 3, 2021 at 3:23 PM Daniel Gustafsson wrote: > > 4) Instead of just exiting from the main pg_basebackup process when > > the child WAL receiver dies, can't we think of restarting the child > > process, probably with the WAL streaming position where it left off or > > stream from the

Re: Trap errors from streaming child in pg_basebackup to exit early

2021-09-03 Thread Daniel Gustafsson
> On 1 Sep 2021, at 12:28, Bharath Rupireddy > wrote: > > On Wed, Sep 1, 2021 at 1:56 PM Daniel Gustafsson wrote: >> A v2 with the above fixes is attached. > > Thanks for the updated patch. Here are some comments: > > 1) Do we need to set bgchild = -1 before the exit(1); in the code > below

Re: Trap errors from streaming child in pg_basebackup to exit early

2021-09-01 Thread Bharath Rupireddy
On Wed, Sep 1, 2021 at 1:56 PM Daniel Gustafsson wrote: > A v2 with the above fixes is attached. Thanks for the updated patch. Here are some comments: 1) Do we need to set bgchild = -1 before the exit(1); in the code below so that we don't kill(bgchild, SIGTERM); unnecessarily in

Re: Trap errors from streaming child in pg_basebackup to exit early

2021-09-01 Thread Daniel Gustafsson
> On 30 Aug 2021, at 12:31, Bharath Rupireddy > wrote: > Here are some comments on the patch: > 1) Do we need volatile keyword here to read the value of the variables > always from the memory? > +static volatile sig_atomic_t bgchild_exited = false; Yes, fixed. > 2) Do we need #ifndef WIN32

Re: Trap errors from streaming child in pg_basebackup to exit early

2021-08-30 Thread Bharath Rupireddy
On Thu, Aug 26, 2021 at 2:55 PM Daniel Gustafsson wrote: > > When using pg_basebackup with WAL streaming (-X stream), we have observed on a > number of times in production that the streaming child exited prematurely (to > no fault of the code it seems, most likely due to network middleboxes),

Trap errors from streaming child in pg_basebackup to exit early

2021-08-26 Thread Daniel Gustafsson
When using pg_basebackup with WAL streaming (-X stream), we have observed on a number of times in production that the streaming child exited prematurely (to no fault of the code it seems, most likely due to network middleboxes), which cause the backup to fail but only after it has run to