Re: improve pg_receivewal code

2021-09-17 Thread Michael Paquier
On Fri, Sep 17, 2021 at 11:46:33AM +0530, Bharath Rupireddy wrote:
> Thanks. I changed the code that way. PSA v3 patch.

Thanks.  Done.
--
Michael


signature.asc
Description: PGP signature


Re: improve pg_receivewal code

2021-09-17 Thread Bharath Rupireddy
On Thu, Sep 16, 2021 at 9:31 AM Michael Paquier  wrote:
>
> > Here's the v2 with above modifications.
>
> I was looking at this patch, and I agree that checking for the system
> ID and the timeline by setting sysidentifier beforehand looks like an
> improvement.
>
> The extra IDENTIFY_SYSTEM done at the beginning of StreamLog() is not
> a performance bottleneck as we run it only once for each loop.  I
> don't really get the argument of a server replacing another one on the
> same port requiring to rely only on the first system ID fetched before
> starting the loops of StreamLog() calls.  So I would leave main()
> alone, but fill in the system ID from RunIdentifySystem() in
> StreamLog().

Thanks. I changed the code that way. PSA v3 patch.

Regards,
Bharath Rupireddy.


v3-0001-improve-pg_receivewal-code.patch
Description: Binary data


Re: improve pg_receivewal code

2021-09-15 Thread Michael Paquier
> Here's the v2 with above modifications.

I was looking at this patch, and I agree that checking for the system
ID and the timeline by setting sysidentifier beforehand looks like an
improvement.

The extra IDENTIFY_SYSTEM done at the beginning of StreamLog() is not
a performance bottleneck as we run it only once for each loop.  I
don't really get the argument of a server replacing another one on the
same port requiring to rely only on the first system ID fetched before 
starting the loops of StreamLog() calls.  So I would leave main()
alone, but fill in the system ID from RunIdentifySystem() in
StreamLog().
--
Michael


signature.asc
Description: PGP signature


Re: improve pg_receivewal code

2021-09-02 Thread Bharath Rupireddy
On Thu, Sep 2, 2021 at 9:05 PM Ronan Dunklau  wrote:
> > 1) Fetch the server system identifier in the first RunIdentifySystem
> > call and use it to identify(via pg_receivewal's ReceiveXlogStream) any
> > unexpected changes that may happen in the server while pg_receivewal
> > is connected to it. This can be helpful in scenarios when
> > pg_receivewal tries to reconnect to the server (see the code around
> > pg_usleep with RECONNECT_SLEEP_TIME) but something unexpected has
> > happnend in the server that changed the its system identifier. Once
> > the pg_receivewal establishes the conenction to server again, then the
> > ReceiveXlogStream has a code chunk to compare the system identifier
> > that we received in the initial connection.
>
> I'm not sure what kind of problem could be happening here: if you were
> somewhat routed to another server ? Or if we "switched" the cluster listening
> on that port ?

Yeah. Also, pg_control file on the server can get corrupted for
whatever reasons it may be. This sys identifier check is useful in
case if the pg_receivewal connects to the server again and again.
These are things that sound over cautious to me, however it's nothing
wrong to use what ReceiveXlogStream provides. pg_basebackup does make
use of this already.

> > 2) Move the RunIdentifySystem to identify timeline id and start LSN
> > from the server only if the pg_receivewal failed to get them from
> > FindStreamingStart. This way, an extra IDENTIFY_SYSTEM command is
> > avoided.
>
> That makes sense, even if we add another IDENTIFY_SYSTEM to check against the
> one set in the first place.
>
> > 3) Place the "replication connetion shouldn't have any database name
> > associated" error check right after RunIdentifySystem so that we can
> > avoid fetching wal segment size with RetrieveWalSegSize if at all we
> > were to fail with that error. This change is similar to what
> > pg_recvlogical.c does.
>
> Makes sense.
>
> > 4) Move the RetrieveWalSegSize to just before pg_receivewal.c enters
> > main loop to get the wal from the server. This avoids an unnecessary
> > query for pg_receivewal with "--create-slot" or "--drop-slot".
> > 5) Have an assertion after the pg_receivewal done a good amount of
> > work to find start timeline and LSN might be helpful:
> > Assert(stream.timeline != 0 && stream.startpos != InvalidXLogRecPtr);
> >
> > Attaching a patch that does take care of above improvements. Thoughts?
>
> Overall I think it is good.

Thanks for your review.

> However, you have some formatting issues, here it mixes tabs and spaces:
>
> +   /*
> +* No valid data can be found in the existing output
> directory.
> +* Get start LSN position and current timeline ID from
> the server.
> +*/

My bad. I forgot to run "git diff --check" on the v1 patch.

> And here it is not formatted properly:
>
> +static char   *server_sysid = NULL;

Done.

Here's the v2 with above modifications.

Regards,
Bharath Rupireddy.


v2-0001-improve-pg_receivewal-code.patch
Description: Binary data


Re: improve pg_receivewal code

2021-09-02 Thread Ronan Dunklau
Le lundi 30 août 2021, 09:32:05 CEST Bharath Rupireddy a écrit :
> Hi,
> 
> I see there's a scope to do following improvements to pg_receivewal.c:

Thank you Bharath for this patch.

> 
> 1) Fetch the server system identifier in the first RunIdentifySystem
> call and use it to identify(via pg_receivewal's ReceiveXlogStream) any
> unexpected changes that may happen in the server while pg_receivewal
> is connected to it. This can be helpful in scenarios when
> pg_receivewal tries to reconnect to the server (see the code around
> pg_usleep with RECONNECT_SLEEP_TIME) but something unexpected has
> happnend in the server that changed the its system identifier. Once
> the pg_receivewal establishes the conenction to server again, then the
> ReceiveXlogStream has a code chunk to compare the system identifier
> that we received in the initial connection.

I'm not sure what kind of problem could be happening here: if you were 
somewhat routed to another server ? Or if we "switched" the cluster listening 
on that port ? 

> 2) Move the RunIdentifySystem to identify timeline id and start LSN
> from the server only if the pg_receivewal failed to get them from
> FindStreamingStart. This way, an extra IDENTIFY_SYSTEM command is
> avoided.

That makes sense, even if we add another IDENTIFY_SYSTEM to check against the 
one set in the first place.

> 3) Place the "replication connetion shouldn't have any database name
> associated" error check right after RunIdentifySystem so that we can
> avoid fetching wal segment size with RetrieveWalSegSize if at all we
> were to fail with that error. This change is similar to what
> pg_recvlogical.c does.

Makes sense.

> 4) Move the RetrieveWalSegSize to just before pg_receivewal.c enters
> main loop to get the wal from the server. This avoids an unnecessary
> query for pg_receivewal with "--create-slot" or "--drop-slot".
> 5) Have an assertion after the pg_receivewal done a good amount of
> work to find start timeline and LSN might be helpful:
> Assert(stream.timeline != 0 && stream.startpos != InvalidXLogRecPtr);
> 
> Attaching a patch that does take care of above improvements. Thoughts?

Overall I think it is good.

However, you have some formatting issues, here it mixes tabs and spaces:

+   /*
+* No valid data can be found in the existing output 
directory.
+* Get start LSN position and current timeline ID from 
the server.
+*/

And here it is not formatted properly:

+static char   *server_sysid = NULL;



> 
> Regards,
> Bharath Rupireddy.


-- 
Ronan Dunklau






Re: improve pg_receivewal code

2021-08-31 Thread Bharath Rupireddy
On Mon, Aug 30, 2021 at 1:02 PM Bharath Rupireddy
 wrote:
>
> Hi,
>
> I see there's a scope to do following improvements to pg_receivewal.c:
>
> 1) Fetch the server system identifier in the first RunIdentifySystem
> call and use it to identify(via pg_receivewal's ReceiveXlogStream) any
> unexpected changes that may happen in the server while pg_receivewal
> is connected to it. This can be helpful in scenarios when
> pg_receivewal tries to reconnect to the server (see the code around
> pg_usleep with RECONNECT_SLEEP_TIME) but something unexpected has
> happnend in the server that changed the its system identifier. Once
> the pg_receivewal establishes the conenction to server again, then the
> ReceiveXlogStream has a code chunk to compare the system identifier
> that we received in the initial connection.
> 2) Move the RunIdentifySystem to identify timeline id and start LSN
> from the server only if the pg_receivewal failed to get them from
> FindStreamingStart. This way, an extra IDENTIFY_SYSTEM command is
> avoided.
> 3) Place the "replication connetion shouldn't have any database name
> associated" error check right after RunIdentifySystem so that we can
> avoid fetching wal segment size with RetrieveWalSegSize if at all we
> were to fail with that error. This change is similar to what
> pg_recvlogical.c does.
> 4) Move the RetrieveWalSegSize to just before pg_receivewal.c enters
> main loop to get the wal from the server. This avoids an unnecessary
> query for pg_receivewal with "--create-slot" or "--drop-slot".
> 5) Have an assertion after the pg_receivewal done a good amount of
> work to find start timeline and LSN might be helpful:
> Assert(stream.timeline != 0 && stream.startpos != InvalidXLogRecPtr);
>
> Attaching a patch that does take care of above improvements. Thoughts?

Here's the CF entry - https://commitfest.postgresql.org/34/3315/

Regards,
Bharath Rupireddy.




improve pg_receivewal code

2021-08-30 Thread Bharath Rupireddy
Hi,

I see there's a scope to do following improvements to pg_receivewal.c:

1) Fetch the server system identifier in the first RunIdentifySystem
call and use it to identify(via pg_receivewal's ReceiveXlogStream) any
unexpected changes that may happen in the server while pg_receivewal
is connected to it. This can be helpful in scenarios when
pg_receivewal tries to reconnect to the server (see the code around
pg_usleep with RECONNECT_SLEEP_TIME) but something unexpected has
happnend in the server that changed the its system identifier. Once
the pg_receivewal establishes the conenction to server again, then the
ReceiveXlogStream has a code chunk to compare the system identifier
that we received in the initial connection.
2) Move the RunIdentifySystem to identify timeline id and start LSN
from the server only if the pg_receivewal failed to get them from
FindStreamingStart. This way, an extra IDENTIFY_SYSTEM command is
avoided.
3) Place the "replication connetion shouldn't have any database name
associated" error check right after RunIdentifySystem so that we can
avoid fetching wal segment size with RetrieveWalSegSize if at all we
were to fail with that error. This change is similar to what
pg_recvlogical.c does.
4) Move the RetrieveWalSegSize to just before pg_receivewal.c enters
main loop to get the wal from the server. This avoids an unnecessary
query for pg_receivewal with "--create-slot" or "--drop-slot".
5) Have an assertion after the pg_receivewal done a good amount of
work to find start timeline and LSN might be helpful:
Assert(stream.timeline != 0 && stream.startpos != InvalidXLogRecPtr);

Attaching a patch that does take care of above improvements. Thoughts?

Regards,
Bharath Rupireddy.
From 02edb7366a48ebc562d3d0f2ef4865ee704a84d5 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy 
Date: Mon, 30 Aug 2021 07:19:03 +
Subject: [PATCH v1] improve pg_receivewal code

This patch does following improvements to pg_receivewal.c:

1) Fetch the server system identifier in the first RunIdentifySystem
call and use it to identify(via pg_receivewal's ReceiveXlogStream)
any unexpected changes that may happen in the server while pg_receivewal
is connected to it. This can be helpful in scenarios when pg_receivewal
tries to reconnect to the server (see the code around pg_usleep
with RECONNECT_SLEEP_TIME) but something unexpected has happnend
in the server that changed the its system identifier. Once the
pg_receivewal establishes the conenction to server again, then
the ReceiveXlogStream has a code chunk to compare the system identifier
that we received in the initial connection.
2) Move the RunIdentifySystem to identify timeline id and start LSN
from the server only if the pg_receivewal failed to get them from
FindStreamingStart. This way, an extra IDENTIFY_SYSTEM command is avoided.
3) Place the "replication connetion shouldn't have any database name
associated" error check right after RunIdentifySystem so that we can
avoid fetching wal segment size with RetrieveWalSegSize if at all we
were to fail with that error. This change is similar to what pg_recvlogical.c
does.
4) Move the RetrieveWalSegSize to just before pg_receivewal.c
enters main loop to get the wal from the server. This avoids an
unnecessary query for pg_receivewal with "--create-slot" or "--drop-slot".
5) Have an assertion after the pg_receivewal done a good amount of
work to find start timeline and LSN might be helpful:
Assert(stream.timeline != 0 && stream.startpos != InvalidXLogRecPtr);
---
 src/bin/pg_basebackup/pg_receivewal.c | 53 +--
 1 file changed, 25 insertions(+), 28 deletions(-)

diff --git a/src/bin/pg_basebackup/pg_receivewal.c b/src/bin/pg_basebackup/pg_receivewal.c
index 4474273daf..cffdf29d9e 100644
--- a/src/bin/pg_basebackup/pg_receivewal.c
+++ b/src/bin/pg_basebackup/pg_receivewal.c
@@ -46,7 +46,7 @@ static bool do_sync = true;
 static bool synchronous = false;
 static char *replication_slot = NULL;
 static XLogRecPtr endpos = InvalidXLogRecPtr;
-
+static char	   *server_sysid = NULL;
 
 static void usage(void);
 static DIR *get_destination_dir(char *dest_folder);
@@ -369,8 +369,6 @@ FindStreamingStart(uint32 *tli)
 static void
 StreamLog(void)
 {
-	XLogRecPtr	serverpos;
-	TimeLineID	servertli;
 	StreamCtl	stream;
 
 	MemSet(, 0, sizeof(stream));
@@ -394,24 +392,22 @@ StreamLog(void)
 		exit(1);
 	}
 
-	/*
-	 * Identify server, obtaining start LSN position and current timeline ID
-	 * at the same time, necessary if not valid data can be found in the
-	 * existing output directory.
-	 */
-	if (!RunIdentifySystem(conn, NULL, , , NULL))
-		exit(1);
-
 	/*
 	 * Figure out where to start streaming.
 	 */
 	stream.startpos = FindStreamingStart();
 	if (stream.startpos == InvalidXLogRecPtr)
 	{
-		stream.startpos = serverpos;
-		stream.timeline = servertli;
+		/*
+	 	 * No valid data can be found in the existing output dir