Re: Enumize logical replication message actions

2020-11-26 Thread Ashutosh Bapat
On Thu, Nov 26, 2020 at 10:15 AM Amit Kapila 
wrote:

> On Wed, Nov 25, 2020 at 2:52 PM Amit Kapila 
> wrote:
> >
> > On Wed, Nov 25, 2020 at 2:26 PM Peter Smith 
> wrote:
> > >
> > > Hi Hackers.
> > >
> > > Last month there was a commit [1] for replacing logical replication
> > > message type characters with enums of equivalent values.
> > >
> > > I was revisiting this code recently and I think due to oversight that
> > > initial patch was incomplete. IIUC there are several more enum
> > > substitutions which should have been made.
> > >
> > > PSA my patch which adds those missing substitutions.
> > >
> >
> > Good catch. I'll review it in a day or so.
> >
>
> The patch looks good to me and I have pushed it.
>

Thanks Peter and Amit for noticing the missing substitutions and fixing
those.

--
Best Wishes,
Ashutosh


Re: Enumize logical replication message actions

2020-11-25 Thread Amit Kapila
On Wed, Nov 25, 2020 at 2:52 PM Amit Kapila  wrote:
>
> On Wed, Nov 25, 2020 at 2:26 PM Peter Smith  wrote:
> >
> > Hi Hackers.
> >
> > Last month there was a commit [1] for replacing logical replication
> > message type characters with enums of equivalent values.
> >
> > I was revisiting this code recently and I think due to oversight that
> > initial patch was incomplete. IIUC there are several more enum
> > substitutions which should have been made.
> >
> > PSA my patch which adds those missing substitutions.
> >
>
> Good catch. I'll review it in a day or so.
>

The patch looks good to me and I have pushed it.

-- 
With Regards,
Amit Kapila.




Re: Enumize logical replication message actions

2020-11-25 Thread Amit Kapila
On Wed, Nov 25, 2020 at 2:26 PM Peter Smith  wrote:
>
> Hi Hackers.
>
> Last month there was a commit [1] for replacing logical replication
> message type characters with enums of equivalent values.
>
> I was revisiting this code recently and I think due to oversight that
> initial patch was incomplete. IIUC there are several more enum
> substitutions which should have been made.
>
> PSA my patch which adds those missing substitutions.
>

Good catch. I'll review it in a day or so.

-- 
With Regards,
Amit Kapila.




Re: Enumize logical replication message actions

2020-11-25 Thread Peter Smith
Hi Hackers.

Last month there was a commit [1] for replacing logical replication
message type characters with enums of equivalent values.

I was revisiting this code recently and I think due to oversight that
initial patch was incomplete. IIUC there are several more enum
substitutions which should have been made.

PSA my patch which adds those missing substitutions.

---

[1] 
https://github.com/postgres/postgres/commit/644f0d7cc9c2cb270746f2024c706554e0fbec82

Kind Regards,
Peter Smith.
Fujitsu Australia


v1-0001-Enums-for-message-types.patch
Description: Binary data


Re: Enumize logical replication message actions

2020-11-02 Thread Ashutosh Bapat
Thanks Amit.

On Mon, 2 Nov 2020 at 14:15, Amit Kapila  wrote:

> On Fri, Oct 30, 2020 at 5:52 PM Ashutosh Bapat
>  wrote:
> >
> > On Fri, 30 Oct 2020 at 17:37, Amit Kapila 
> wrote:
> >>
> >>
> >>
> >> Other than that the patch looks good to me.
> >>
> >
> > Patch with updated commit message and also the list of reviewers
> >
>
> Thanks, pushed!
>
> --
> With Regards,
> Amit Kapila.
>


-- 
Best Wishes,
Ashutosh


Re: Enumize logical replication message actions

2020-11-02 Thread Amit Kapila
On Fri, Oct 30, 2020 at 5:52 PM Ashutosh Bapat
 wrote:
>
> On Fri, 30 Oct 2020 at 17:37, Amit Kapila  wrote:
>>
>>
>>
>> Other than that the patch looks good to me.
>>
>
> Patch with updated commit message and also the list of reviewers
>

Thanks, pushed!

-- 
With Regards,
Amit Kapila.




Re: Enumize logical replication message actions

2020-10-30 Thread Ashutosh Bapat
On Fri, 30 Oct 2020 at 17:37, Amit Kapila  wrote:

>
> I don't like the word 'Enumize' in commit message. How about changing
> it to something like: (a) Add defines for logical replication protocol
> messages, or (b) Associate names with logical replication protocol
> messages.
>

I have used  "Use Enum for top level logical replication message types" in
the attached patch. But please free to use (a) if you feel so.


>
> + 2. It's easy to locate the code handling a given type.
>
> In the above instead of 'type', shouldn't it be 'message'.
>

Used "message type". But please feel free to use "message" if you think
that's appropriate.


>
> Other than that the patch looks good to me.
>
>
Patch with updated commit message and also the list of reviewers

-- 
Best Wishes,
Ashutosh
From 55957f412c40106288f932d38182cf79f5902ddf Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat 
Date: Fri, 16 Oct 2020 12:31:35 +0530
Subject: [PATCH] Use Enum for top level logical replication message types

Logical replication protocol uses single byte character to identify a
message type in logical repliation protocol. The code uses string
literals for the same. Enumize those so that

1. All the string literals used can be found at a single place. This
makes it easy to add more types without the risk of conflicts.

2. It's easy to locate the code handling a given message type.

3. When used with switch statements, it is easy to identify the missing
cases using -Wswitch.

Ashutosh Bapat, reviewed by Kyotaro Horiguchi, Andres Freund, Peter
Smith and Amit Kapila

Discussion: https://postgr.es/m/caexhw5upzq7l0oad_enyvaiymopgkraojpe+zy5-obdcvt6...@mail.gmail.com
---
 src/backend/replication/logical/proto.c  | 26 +++
 src/backend/replication/logical/worker.c | 87 
 src/include/replication/logicalproto.h   | 27 
 3 files changed, 83 insertions(+), 57 deletions(-)

diff --git a/src/backend/replication/logical/proto.c b/src/backend/replication/logical/proto.c
index eb19142b48..fdb31182d7 100644
--- a/src/backend/replication/logical/proto.c
+++ b/src/backend/replication/logical/proto.c
@@ -44,7 +44,7 @@ static const char *logicalrep_read_namespace(StringInfo in);
 void
 logicalrep_write_begin(StringInfo out, ReorderBufferTXN *txn)
 {
-	pq_sendbyte(out, 'B');		/* BEGIN */
+	pq_sendbyte(out, LOGICAL_REP_MSG_BEGIN);
 
 	/* fixed fields */
 	pq_sendint64(out, txn->final_lsn);
@@ -76,7 +76,7 @@ logicalrep_write_commit(StringInfo out, ReorderBufferTXN *txn,
 {
 	uint8		flags = 0;
 
-	pq_sendbyte(out, 'C');		/* sending COMMIT */
+	pq_sendbyte(out, LOGICAL_REP_MSG_COMMIT);
 
 	/* send the flags field (unused for now) */
 	pq_sendbyte(out, flags);
@@ -112,7 +112,7 @@ void
 logicalrep_write_origin(StringInfo out, const char *origin,
 		XLogRecPtr origin_lsn)
 {
-	pq_sendbyte(out, 'O');		/* ORIGIN */
+	pq_sendbyte(out, LOGICAL_REP_MSG_ORIGIN);
 
 	/* fixed fields */
 	pq_sendint64(out, origin_lsn);
@@ -141,7 +141,7 @@ void
 logicalrep_write_insert(StringInfo out, TransactionId xid, Relation rel,
 		HeapTuple newtuple, bool binary)
 {
-	pq_sendbyte(out, 'I');		/* action INSERT */
+	pq_sendbyte(out, LOGICAL_REP_MSG_INSERT);
 
 	/* transaction ID (if not valid, we're not streaming) */
 	if (TransactionIdIsValid(xid))
@@ -185,7 +185,7 @@ void
 logicalrep_write_update(StringInfo out, TransactionId xid, Relation rel,
 		HeapTuple oldtuple, HeapTuple newtuple, bool binary)
 {
-	pq_sendbyte(out, 'U');		/* action UPDATE */
+	pq_sendbyte(out, LOGICAL_REP_MSG_UPDATE);
 
 	Assert(rel->rd_rel->relreplident == REPLICA_IDENTITY_DEFAULT ||
 		   rel->rd_rel->relreplident == REPLICA_IDENTITY_FULL ||
@@ -263,7 +263,7 @@ logicalrep_write_delete(StringInfo out, TransactionId xid, Relation rel,
 		   rel->rd_rel->relreplident == REPLICA_IDENTITY_FULL ||
 		   rel->rd_rel->relreplident == REPLICA_IDENTITY_INDEX);
 
-	pq_sendbyte(out, 'D');		/* action DELETE */
+	pq_sendbyte(out, LOGICAL_REP_MSG_DELETE);
 
 	/* transaction ID (if not valid, we're not streaming) */
 	if (TransactionIdIsValid(xid))
@@ -317,7 +317,7 @@ logicalrep_write_truncate(StringInfo out,
 	int			i;
 	uint8		flags = 0;
 
-	pq_sendbyte(out, 'T');		/* action TRUNCATE */
+	pq_sendbyte(out, LOGICAL_REP_MSG_TRUNCATE);
 
 	/* transaction ID (if not valid, we're not streaming) */
 	if (TransactionIdIsValid(xid))
@@ -369,7 +369,7 @@ logicalrep_write_rel(StringInfo out, TransactionId xid, Relation rel)
 {
 	char	   *relname;
 
-	pq_sendbyte(out, 'R');		/* sending RELATION */
+	pq_sendbyte(out, LOGICAL_REP_MSG_RELATION);
 
 	/* transaction ID (if not valid, we're not streaming) */
 	if (TransactionIdIsValid(xid))
@@ -425,7 +425,7 @@ logicalrep_write_typ(StringInfo out, TransactionId xid, Oid typoid)
 	HeapTuple	tup;
 	Form_pg_type typtup;
 
-	pq_sendbyte(out, 'Y');		/* sending TYPE */
+	pq_sendbyte(out, LOGICAL_REP_MSG_TYPE);
 
 	/* transaction ID (if not valid, we're not streaming) */
 	if (TransactionIdIsValid(xid))
@@ -755,7 +755,7 @@ void
 

Re: Enumize logical replication message actions

2020-10-30 Thread Ashutosh Bapat
On Fri, 30 Oct 2020 at 14:59, Amit Kapila  wrote:

> On Fri, Oct 30, 2020 at 11:50 AM Amit Kapila 
> wrote:
> >
> > On Fri, Oct 30, 2020 at 10:37 AM Peter Smith 
> wrote:
> > >
> > > IIUC getting rid of the default from the switch can make the missing
> > > enum detection easier because then you can use -Wswitch option to
> > > expose the problem (instead of -Wswitch-enum which may give lots of
> > > false positives as well)
> > >
> >
> > Fair enough. So, it makes sense to move the default out of the switch
> case.
> >
>
> One more thing I was thinking about this patch was whether it has any
> impact w.r.t to Endianness as we are using four-bytes to represent
> one-byte and it seems there is no issue with that because pq_sendbyte
> accepts just one-byte and sends that over the network. So, we could
> see a problem only if we use any enum value which is more than
> one-byte which we are anyway adding a warning message along with the
> definition of enum. So, we are safe here. Does that make sense?
>
>
yes. Endian-ness should be handled by the compiler transparently.

-- 
Best Wishes,
Ashutosh


Re: Enumize logical replication message actions

2020-10-30 Thread Ashutosh Bapat
On Fri, 30 Oct 2020 at 09:16, Amit Kapila  wrote

>
> 1.
> + LOGICAL_REP_MSG_STREAM_ABORT = 'A',
> +} LogicalRepMsgType;
>
> There is no need for a comma after the last message.
>
> Done. Thanks for noticing it.


> 2.
> +/*
> + * Logical message types
> + *
> + * Used by logical replication wire protocol.
> + *
> + * Note: though this is an enum it should fit a single byte and should be
> a
> + * printable character.
> + */
> +typedef enum
> +{
>
> I think we can expand the comments to probably say why we need these
> to fit in a single byte or what problem it can cause if that rule is
> disobeyed. This is to make the next person clear why we are imposing
> such a rule.
>

Done. Please check.


>
> 3.
> +typedef enum
> +{
> ..
> +} LogicalRepMsgType;
>
> There are places in code where we use the enum name
> (LogicalRepMsgType) both in the start and end. See TypeCat,
> CoercionMethod, CoercionCodes, etc. I see places where we use the way
> you have in the code. I would prefer the way we have used at places
> like TypeCat as that makes it easier to read.
>

Not my favourite style since changing the type name requires changing enum
name to keep those consistent. But anyway done.


>
> 4.
>   switch (action)
>   {
> - /* BEGIN */
> - case 'B':
> + case LOGICAL_REP_MSG_BEGIN:
>   apply_handle_begin(s);
> - break;
> - /* COMMIT */
> - case 'C':
> + return;
>
> I think we can simply use 'return apply_handle_begin;' instead of
> adding return in another line. Again, I think we changed this handling
> in apply_dispatch() to improve the case where we can detect at the
> compile time any missing enum but at this stage it is not clear to me
> if that is true.
>

I don't see much value in writing it like "return apply_handle_begin()";
gives an impression that apply_handle_begin() and apply_dispatch() are
returning something which they are not. I would prefer return on separate
line unless there's something more than style improvement.

I have added rationale behind Enum in the commit message as you suggested
in one of the later mails.

PFA patch addressing your comments.
-- 
Best Wishes,
Ashutosh
From f71a600d3fd49756926deec1b593472a9fd8a8cc Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat 
Date: Fri, 16 Oct 2020 12:31:35 +0530
Subject: [PATCH] Enumize top level logical replication actions

Logical replication protocol uses single byte character to identify a
message type in logical repliation protocol. The code uses string
literals for the same. Enumize those so that

1. All the string literals used can be found at a single place. This
makes it easy to add more types without the risk of conflicts.

2. It's easy to locate the code handling a given type.

3. When used with switch statements, it is easy to identify the missing
cases using -Wswitch.

Ashutosh Bapat
---
 src/backend/replication/logical/proto.c  | 26 +++
 src/backend/replication/logical/worker.c | 87 
 src/include/replication/logicalproto.h   | 27 
 3 files changed, 83 insertions(+), 57 deletions(-)

diff --git a/src/backend/replication/logical/proto.c b/src/backend/replication/logical/proto.c
index eb19142b48..fdb31182d7 100644
--- a/src/backend/replication/logical/proto.c
+++ b/src/backend/replication/logical/proto.c
@@ -44,7 +44,7 @@ static const char *logicalrep_read_namespace(StringInfo in);
 void
 logicalrep_write_begin(StringInfo out, ReorderBufferTXN *txn)
 {
-	pq_sendbyte(out, 'B');		/* BEGIN */
+	pq_sendbyte(out, LOGICAL_REP_MSG_BEGIN);
 
 	/* fixed fields */
 	pq_sendint64(out, txn->final_lsn);
@@ -76,7 +76,7 @@ logicalrep_write_commit(StringInfo out, ReorderBufferTXN *txn,
 {
 	uint8		flags = 0;
 
-	pq_sendbyte(out, 'C');		/* sending COMMIT */
+	pq_sendbyte(out, LOGICAL_REP_MSG_COMMIT);
 
 	/* send the flags field (unused for now) */
 	pq_sendbyte(out, flags);
@@ -112,7 +112,7 @@ void
 logicalrep_write_origin(StringInfo out, const char *origin,
 		XLogRecPtr origin_lsn)
 {
-	pq_sendbyte(out, 'O');		/* ORIGIN */
+	pq_sendbyte(out, LOGICAL_REP_MSG_ORIGIN);
 
 	/* fixed fields */
 	pq_sendint64(out, origin_lsn);
@@ -141,7 +141,7 @@ void
 logicalrep_write_insert(StringInfo out, TransactionId xid, Relation rel,
 		HeapTuple newtuple, bool binary)
 {
-	pq_sendbyte(out, 'I');		/* action INSERT */
+	pq_sendbyte(out, LOGICAL_REP_MSG_INSERT);
 
 	/* transaction ID (if not valid, we're not streaming) */
 	if (TransactionIdIsValid(xid))
@@ -185,7 +185,7 @@ void
 logicalrep_write_update(StringInfo out, TransactionId xid, Relation rel,
 		HeapTuple oldtuple, HeapTuple newtuple, bool binary)
 {
-	pq_sendbyte(out, 'U');		/* action UPDATE */
+	pq_sendbyte(out, LOGICAL_REP_MSG_UPDATE);
 
 	Assert(rel->rd_rel->relreplident == REPLICA_IDENTITY_DEFAULT ||
 		   rel->rd_rel->relreplident == REPLICA_IDENTITY_FULL ||
@@ -263,7 +263,7 @@ logicalrep_write_delete(StringInfo out, TransactionId xid, Relation rel,
 		   rel->rd_rel->relreplident == REPLICA_IDENTITY_FULL ||
 		   rel->rd_rel->relreplident == 

Re: Enumize logical replication message actions

2020-10-30 Thread Amit Kapila
On Fri, Oct 30, 2020 at 11:50 AM Amit Kapila  wrote:
>
> On Fri, Oct 30, 2020 at 10:37 AM Peter Smith  wrote:
> >
> > IIUC getting rid of the default from the switch can make the missing
> > enum detection easier because then you can use -Wswitch option to
> > expose the problem (instead of -Wswitch-enum which may give lots of
> > false positives as well)
> >
>
> Fair enough. So, it makes sense to move the default out of the switch case.
>

One more thing I was thinking about this patch was whether it has any
impact w.r.t to Endianness as we are using four-bytes to represent
one-byte and it seems there is no issue with that because pq_sendbyte
accepts just one-byte and sends that over the network. So, we could
see a problem only if we use any enum value which is more than
one-byte which we are anyway adding a warning message along with the
definition of enum. So, we are safe here. Does that make sense?

-- 
With Regards,
Amit Kapila.




Re: Enumize logical replication message actions

2020-10-30 Thread Amit Kapila
On Fri, Oct 30, 2020 at 10:37 AM Peter Smith  wrote:
>
> On Fri, Oct 30, 2020 at 2:46 PM Amit Kapila  wrote:
>
> Hi Amit
>
> > You mentioned in the beginning that you prefer to use Enum instead of
> > define so that switch cases can detect any remaining items but I have
> > tried adding extra enum value at the end and didn't handle that in
> > switch case but I didn't get any compilation warning or error. Do we
> > need something else to detect that at compile time?
>
> See [1] some GCC compiler options that can expose missing cases like those.
>

Thanks, I am able to see the warnings now.

> e.g. use -Wswitch or -Wswitch-enum
> Detection depends if the switch has a default case or not.
>
> > 4.
> >   switch (action)
> >   {
> > - /* BEGIN */
> > - case 'B':
> > + case LOGICAL_REP_MSG_BEGIN:
> >   apply_handle_begin(s);
> > - break;
> > - /* COMMIT */
> > - case 'C':
> > + return;
> >
> > I think we can simply use 'return apply_handle_begin;' instead of
> > adding return in another line. Again, I think we changed this handling
> > in apply_dispatch() to improve the case where we can detect at the
> > compile time any missing enum but at this stage it is not clear to me
> > if that is true.
>
> IIUC getting rid of the default from the switch can make the missing
> enum detection easier because then you can use -Wswitch option to
> expose the problem (instead of -Wswitch-enum which may give lots of
> false positives as well)
>

Fair enough. So, it makes sense to move the default out of the switch case.

Ashutosh, see if we can add in comments (or may be commit message) why
we preferred to use enum for these messages.

-- 
With Regards,
Amit Kapila.




Re: Enumize logical replication message actions

2020-10-29 Thread Peter Smith
On Fri, Oct 30, 2020 at 2:46 PM Amit Kapila  wrote:

Hi Amit

> You mentioned in the beginning that you prefer to use Enum instead of
> define so that switch cases can detect any remaining items but I have
> tried adding extra enum value at the end and didn't handle that in
> switch case but I didn't get any compilation warning or error. Do we
> need something else to detect that at compile time?

See [1] some GCC compiler options that can expose missing cases like those.

e.g. use -Wswitch or -Wswitch-enum
Detection depends if the switch has a default case or not.

> 4.
>   switch (action)
>   {
> - /* BEGIN */
> - case 'B':
> + case LOGICAL_REP_MSG_BEGIN:
>   apply_handle_begin(s);
> - break;
> - /* COMMIT */
> - case 'C':
> + return;
>
> I think we can simply use 'return apply_handle_begin;' instead of
> adding return in another line. Again, I think we changed this handling
> in apply_dispatch() to improve the case where we can detect at the
> compile time any missing enum but at this stage it is not clear to me
> if that is true.

IIUC getting rid of the default from the switch can make the missing
enum detection easier because then you can use -Wswitch option to
expose the problem (instead of -Wswitch-enum which may give lots of
false positives as well)

===

[1]  https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Wswitch

Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Enumize logical replication message actions

2020-10-29 Thread Amit Kapila
On Fri, Oct 23, 2020 at 6:26 PM Ashutosh Bapat
 wrote:
>
>
>
> On Fri, 23 Oct 2020 at 18:23, Amit Kapila  wrote:
>>
>> On Fri, Oct 23, 2020 at 11:50 AM Kyotaro Horiguchi
>>  wrote:
>> >
>> > At Thu, 22 Oct 2020 22:31:41 -0300, Alvaro Herrera 
>> >  wrote in
>> > > On 2020-Oct-22, Ashutosh Bapat wrote:
>> > >
>> > > > On Thu, 22 Oct 2020 at 14:46, Kyotaro Horiguchi 
>> > > > 
>> > > > wrote:
>> > >
>> > > > > pg_send_logicalrep_msg_type() looks somewhat too-much.  If we need
>> > > > > something like that we shouldn't do this refactoring, I think.
>> > > >
>> > > > Enum is an integer, and we want to send byte. The function asserts 
>> > > > that the
>> > > > enum fits a byte. If there's a way to declare byte long enums I would 
>> > > > use
>> > > > that. But I didn't find a way to do that.
>> > >
>> > > I didn't look at the code, but maybe it's sufficient to add a
>> > > StaticAssert?
>> >
>> > That check needs to visit all symbols in a enum and confirm that each
>> > of them is in a certain range.
>> >
>>
>> Can we define something like LOGICAL_REP_MSG_LAST (also add a comment
>> indicating this is a fake message and must be the last one) as the
>> last and just check that?
>>
>
> I don't think that's required once I applied suggestions from Kyotaro and 
> Peter. Please check the latest patch.
> Usually LAST is added to an enum when we need to cap the number of symbols or 
> want to find the number of symbols. None of that is necessary here. Do you 
> see any other use?
>

You mentioned in the beginning that you prefer to use Enum instead of
define so that switch cases can detect any remaining items but I have
tried adding extra enum value at the end and didn't handle that in
switch case but I didn't get any compilation warning or error. Do we
need something else to detect that at compile time?

Some comments assuming we want to use enum either because I am missing
something or due to some other reason we have not discussed yet.

1.
+ LOGICAL_REP_MSG_STREAM_ABORT = 'A',
+} LogicalRepMsgType;

There is no need for a comma after the last message.

2.
+/*
+ * Logical message types
+ *
+ * Used by logical replication wire protocol.
+ *
+ * Note: though this is an enum it should fit a single byte and should be a
+ * printable character.
+ */
+typedef enum
+{

I think we can expand the comments to probably say why we need these
to fit in a single byte or what problem it can cause if that rule is
disobeyed. This is to make the next person clear why we are imposing
such a rule.

3.
+typedef enum
+{
..
+} LogicalRepMsgType;

There are places in code where we use the enum name
(LogicalRepMsgType) both in the start and end. See TypeCat,
CoercionMethod, CoercionCodes, etc. I see places where we use the way
you have in the code. I would prefer the way we have used at places
like TypeCat as that makes it easier to read.

4.
  switch (action)
  {
- /* BEGIN */
- case 'B':
+ case LOGICAL_REP_MSG_BEGIN:
  apply_handle_begin(s);
- break;
- /* COMMIT */
- case 'C':
+ return;

I think we can simply use 'return apply_handle_begin;' instead of
adding return in another line. Again, I think we changed this handling
in apply_dispatch() to improve the case where we can detect at the
compile time any missing enum but at this stage it is not clear to me
if that is true.

-- 
With Regards,
Amit Kapila.




Re: Enumize logical replication message actions

2020-10-23 Thread Ashutosh Bapat
On Fri, 23 Oct 2020 at 18:23, Amit Kapila  wrote:

> On Fri, Oct 23, 2020 at 11:50 AM Kyotaro Horiguchi
>  wrote:
> >
> > At Thu, 22 Oct 2020 22:31:41 -0300, Alvaro Herrera <
> alvhe...@alvh.no-ip.org> wrote in
> > > On 2020-Oct-22, Ashutosh Bapat wrote:
> > >
> > > > On Thu, 22 Oct 2020 at 14:46, Kyotaro Horiguchi <
> horikyota@gmail.com>
> > > > wrote:
> > >
> > > > > pg_send_logicalrep_msg_type() looks somewhat too-much.  If we need
> > > > > something like that we shouldn't do this refactoring, I think.
> > > >
> > > > Enum is an integer, and we want to send byte. The function asserts
> that the
> > > > enum fits a byte. If there's a way to declare byte long enums I
> would use
> > > > that. But I didn't find a way to do that.
> > >
> > > I didn't look at the code, but maybe it's sufficient to add a
> > > StaticAssert?
> >
> > That check needs to visit all symbols in a enum and confirm that each
> > of them is in a certain range.
> >
>
> Can we define something like LOGICAL_REP_MSG_LAST (also add a comment
> indicating this is a fake message and must be the last one) as the
> last and just check that?
>
>
I don't think that's required once I applied suggestions from Kyotaro and
Peter. Please check the latest patch.
Usually LAST is added to an enum when we need to cap the number of symbols
or want to find the number of symbols. None of that is necessary here. Do
you see any other use?

-- 
Best Wishes,
Ashutosh


Re: Enumize logical replication message actions

2020-10-23 Thread Ashutosh Bapat
On Fri, 23 Oct 2020 at 17:02, Kyotaro Horiguchi 
wrote:

> At Fri, 23 Oct 2020 19:53:00 +1100, Peter Smith 
> wrote in
> > On Fri, Oct 23, 2020 at 5:20 PM Kyotaro Horiguchi
> >  wrote:
> > >
> > > At Thu, 22 Oct 2020 22:31:41 -0300, Alvaro Herrera <
> alvhe...@alvh.no-ip.org> wrote in
> > > > On 2020-Oct-22, Ashutosh Bapat wrote:
> > > >
> > > > > On Thu, 22 Oct 2020 at 14:46, Kyotaro Horiguchi <
> horikyota@gmail.com>
> > > > > wrote:
> > > >
> > > > > > pg_send_logicalrep_msg_type() looks somewhat too-much.  If we
> need
> > > > > > something like that we shouldn't do this refactoring, I think.
> > > > >
> > > > > Enum is an integer, and we want to send byte. The function asserts
> that the
> > > > > enum fits a byte. If there's a way to declare byte long enums I
> would use
> > > > > that. But I didn't find a way to do that.
> >
> > The pq_send_logicalrep_msg_type() function seemed a bit overkill to me.
>
> Ah, yes, it is what I meant. I didn't come up with the word "overkill".
>
> > The comment in the LogicalRepMsgType enum will sufficiently ensure
> > nobody is going to accidentally add any bad replication message codes.
> > And it's not like these are going to be changed often.
>
> Agreed.
>
> > Why not simply downcast your enums when calling pq_sendbyte?
> > There are only a few of them.
> >
> > e.g. pq_sendbyte(out, (uint8)LOGICAL_REP_MSG_STREAM_COMMIT);
>
> If you are worried about compiler warning, that explicit cast is not
> required. Even if the symbol is larger than 0xff, the upper bytes are
> silently truncated off.
>
>
I agree with Peter that the prologue of  LogicalRepMsgType is enough.

I also agree with Kyotaro, that explicit cast is unnecessary.

All this together makes the second patch useless. Removed it. Instead used
Kyotaro's idea in previous mail.

PFA updated patch.

-- 
Best Wishes,
Ashutosh
From 76f578416503478bf5e39993eec4bbd0f17d5a17 Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat 
Date: Fri, 16 Oct 2020 12:31:35 +0530
Subject: [PATCH] Enumize top level logical replication actions

Logical replication protocol uses single byte character to identify
different chunks of logical repliation messages. The code uses string
literals for the same. Enumize those so that
1. All the string literals used can be found at a single place. This
makes it easy to add more actions without the risk of conflicts.
2. It's easy to locate the code handling a given action.

Ashutosh Bapat
---
 src/backend/replication/logical/proto.c  | 26 +++
 src/backend/replication/logical/worker.c | 87 
 src/include/replication/logicalproto.h   | 25 +++
 3 files changed, 81 insertions(+), 57 deletions(-)

diff --git a/src/backend/replication/logical/proto.c b/src/backend/replication/logical/proto.c
index eb19142b48..fdb31182d7 100644
--- a/src/backend/replication/logical/proto.c
+++ b/src/backend/replication/logical/proto.c
@@ -44,7 +44,7 @@ static const char *logicalrep_read_namespace(StringInfo in);
 void
 logicalrep_write_begin(StringInfo out, ReorderBufferTXN *txn)
 {
-	pq_sendbyte(out, 'B');		/* BEGIN */
+	pq_sendbyte(out, LOGICAL_REP_MSG_BEGIN);
 
 	/* fixed fields */
 	pq_sendint64(out, txn->final_lsn);
@@ -76,7 +76,7 @@ logicalrep_write_commit(StringInfo out, ReorderBufferTXN *txn,
 {
 	uint8		flags = 0;
 
-	pq_sendbyte(out, 'C');		/* sending COMMIT */
+	pq_sendbyte(out, LOGICAL_REP_MSG_COMMIT);
 
 	/* send the flags field (unused for now) */
 	pq_sendbyte(out, flags);
@@ -112,7 +112,7 @@ void
 logicalrep_write_origin(StringInfo out, const char *origin,
 		XLogRecPtr origin_lsn)
 {
-	pq_sendbyte(out, 'O');		/* ORIGIN */
+	pq_sendbyte(out, LOGICAL_REP_MSG_ORIGIN);
 
 	/* fixed fields */
 	pq_sendint64(out, origin_lsn);
@@ -141,7 +141,7 @@ void
 logicalrep_write_insert(StringInfo out, TransactionId xid, Relation rel,
 		HeapTuple newtuple, bool binary)
 {
-	pq_sendbyte(out, 'I');		/* action INSERT */
+	pq_sendbyte(out, LOGICAL_REP_MSG_INSERT);
 
 	/* transaction ID (if not valid, we're not streaming) */
 	if (TransactionIdIsValid(xid))
@@ -185,7 +185,7 @@ void
 logicalrep_write_update(StringInfo out, TransactionId xid, Relation rel,
 		HeapTuple oldtuple, HeapTuple newtuple, bool binary)
 {
-	pq_sendbyte(out, 'U');		/* action UPDATE */
+	pq_sendbyte(out, LOGICAL_REP_MSG_UPDATE);
 
 	Assert(rel->rd_rel->relreplident == REPLICA_IDENTITY_DEFAULT ||
 		   rel->rd_rel->relreplident == REPLICA_IDENTITY_FULL ||
@@ -263,7 +263,7 @@ logicalrep_write_delete(StringInfo out, TransactionId xid, Relation rel,
 		   rel->rd_rel->relreplident == REPLICA_IDENTITY_FULL ||
 		   rel->rd_rel->relreplident == REPLICA_IDENTITY_INDEX);
 
-	pq_sendbyte(out, 'D');		/* action DELETE */
+	pq_sendbyte(out, LOGICAL_REP_MSG_DELETE);
 
 	/* transaction ID (if not valid, we're not streaming) */
 	if (TransactionIdIsValid(xid))
@@ -317,7 +317,7 @@ logicalrep_write_truncate(StringInfo out,
 	int			i;
 	uint8		flags = 0;
 
-	pq_sendbyte(out, 'T');		/* action TRUNCATE */
+	pq_sendbyte(out, 

Re: Enumize logical replication message actions

2020-10-23 Thread Amit Kapila
On Fri, Oct 23, 2020 at 11:50 AM Kyotaro Horiguchi
 wrote:
>
> At Thu, 22 Oct 2020 22:31:41 -0300, Alvaro Herrera  
> wrote in
> > On 2020-Oct-22, Ashutosh Bapat wrote:
> >
> > > On Thu, 22 Oct 2020 at 14:46, Kyotaro Horiguchi 
> > > wrote:
> >
> > > > pg_send_logicalrep_msg_type() looks somewhat too-much.  If we need
> > > > something like that we shouldn't do this refactoring, I think.
> > >
> > > Enum is an integer, and we want to send byte. The function asserts that 
> > > the
> > > enum fits a byte. If there's a way to declare byte long enums I would use
> > > that. But I didn't find a way to do that.
> >
> > I didn't look at the code, but maybe it's sufficient to add a
> > StaticAssert?
>
> That check needs to visit all symbols in a enum and confirm that each
> of them is in a certain range.
>

Can we define something like LOGICAL_REP_MSG_LAST (also add a comment
indicating this is a fake message and must be the last one) as the
last and just check that?

-- 
With Regards,
Amit Kapila.




Re: Enumize logical replication message actions

2020-10-23 Thread Ashutosh Bapat
On Fri, 23 Oct 2020 at 06:50, Kyotaro Horiguchi 
wrote:

>
> Those two switch()es are apparently redundant. That code is exactly
> equivalent to:
>
> apply_dispatch(s)
> {
>   LogicalRepMsgType msgtype = pq_getmsgtype(s);
>
>   switch (msgtype)
>   {
>  case LOGICAL_REP_MSG_BEGIN:
> apply_handle_begin();
> !   return;
>  ...
>  case LOGICAL_REP_MSG_STREAM_COMMIT:
> apply_handle_begin();
> !   return;
>   }
>
>   ereport(ERROR, (errmsg("invalid logical replication message type"..
> }
>
> which is smaller and fast.
>

Good idea. Implemented in the latest patch posted with the next mail.

-- 
Best Wishes,
Ashutosh


Re: Enumize logical replication message actions

2020-10-23 Thread Kyotaro Horiguchi
At Fri, 23 Oct 2020 19:53:00 +1100, Peter Smith  wrote 
in 
> On Fri, Oct 23, 2020 at 5:20 PM Kyotaro Horiguchi
>  wrote:
> >
> > At Thu, 22 Oct 2020 22:31:41 -0300, Alvaro Herrera 
> >  wrote in
> > > On 2020-Oct-22, Ashutosh Bapat wrote:
> > >
> > > > On Thu, 22 Oct 2020 at 14:46, Kyotaro Horiguchi 
> > > > 
> > > > wrote:
> > >
> > > > > pg_send_logicalrep_msg_type() looks somewhat too-much.  If we need
> > > > > something like that we shouldn't do this refactoring, I think.
> > > >
> > > > Enum is an integer, and we want to send byte. The function asserts that 
> > > > the
> > > > enum fits a byte. If there's a way to declare byte long enums I would 
> > > > use
> > > > that. But I didn't find a way to do that.
> 
> The pq_send_logicalrep_msg_type() function seemed a bit overkill to me.

Ah, yes, it is what I meant. I didn't come up with the word "overkill".

> The comment in the LogicalRepMsgType enum will sufficiently ensure
> nobody is going to accidentally add any bad replication message codes.
> And it's not like these are going to be changed often.

Agreed.

> Why not simply downcast your enums when calling pq_sendbyte?
> There are only a few of them.
> 
> e.g. pq_sendbyte(out, (uint8)LOGICAL_REP_MSG_STREAM_COMMIT);

If you are worried about compiler warning, that explicit cast is not
required. Even if the symbol is larger than 0xff, the upper bytes are
silently truncated off.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Enumize logical replication message actions

2020-10-23 Thread Peter Smith
On Fri, Oct 23, 2020 at 5:20 PM Kyotaro Horiguchi
 wrote:
>
> At Thu, 22 Oct 2020 22:31:41 -0300, Alvaro Herrera  
> wrote in
> > On 2020-Oct-22, Ashutosh Bapat wrote:
> >
> > > On Thu, 22 Oct 2020 at 14:46, Kyotaro Horiguchi 
> > > wrote:
> >
> > > > pg_send_logicalrep_msg_type() looks somewhat too-much.  If we need
> > > > something like that we shouldn't do this refactoring, I think.
> > >
> > > Enum is an integer, and we want to send byte. The function asserts that 
> > > the
> > > enum fits a byte. If there's a way to declare byte long enums I would use
> > > that. But I didn't find a way to do that.

The pq_send_logicalrep_msg_type() function seemed a bit overkill to me.

The comment in the LogicalRepMsgType enum will sufficiently ensure
nobody is going to accidentally add any bad replication message codes.
And it's not like these are going to be changed often.

Why not simply downcast your enums when calling pq_sendbyte?
There are only a few of them.

e.g. pq_sendbyte(out, (uint8)LOGICAL_REP_MSG_STREAM_COMMIT);

Kind Regards.
Peter Smith
Fujitsu Australia.




Re: Enumize logical replication message actions

2020-10-23 Thread Kyotaro Horiguchi
At Thu, 22 Oct 2020 22:31:41 -0300, Alvaro Herrera  
wrote in 
> On 2020-Oct-22, Ashutosh Bapat wrote:
> 
> > On Thu, 22 Oct 2020 at 14:46, Kyotaro Horiguchi 
> > wrote:
> 
> > > pg_send_logicalrep_msg_type() looks somewhat too-much.  If we need
> > > something like that we shouldn't do this refactoring, I think.
> > 
> > Enum is an integer, and we want to send byte. The function asserts that the
> > enum fits a byte. If there's a way to declare byte long enums I would use
> > that. But I didn't find a way to do that.
> 
> I didn't look at the code, but maybe it's sufficient to add a
> StaticAssert?

That check needs to visit all symbols in a enum and confirm that each
of them is in a certain range.

I thought of StaticAssert, but it cannot run a code and I don't know
of a syntax that loops through all symbols in a enumeration so I think
we needs to write a static assertion on every symbol in the
enumeration, which seems to be a kind of stupid.

enum hoge
{
  a = '1',
  b = '2',
  c = '3'
};

StaticAssertDecl((unsigned int)(a | b | c ...) <= 0xff, "too large symbol 
value");

I didn't come up with a way to apply static assertion on each symbol
definition line.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Enumize logical replication message actions

2020-10-22 Thread Alvaro Herrera
On 2020-Oct-22, Ashutosh Bapat wrote:

> On Thu, 22 Oct 2020 at 14:46, Kyotaro Horiguchi 
> wrote:

> > pg_send_logicalrep_msg_type() looks somewhat too-much.  If we need
> > something like that we shouldn't do this refactoring, I think.
> 
> Enum is an integer, and we want to send byte. The function asserts that the
> enum fits a byte. If there's a way to declare byte long enums I would use
> that. But I didn't find a way to do that.

I didn't look at the code, but maybe it's sufficient to add a
StaticAssert?




Re: Enumize logical replication message actions

2020-10-22 Thread Kyotaro Horiguchi
At Fri, 23 Oct 2020 10:08:44 +0900 (JST), Kyotaro Horiguchi 
 wrote in 
> At Thu, 22 Oct 2020 16:37:18 +0530, Ashutosh Bapat 
>  wrote in 
> > On Thu, 22 Oct 2020 at 14:46, Kyotaro Horiguchi 
> > wrote:
> > pg_get_logicalrep_msg_type() seems doing the same check (that the
> > > value is compared aganst every keyword value) with
> > > apply_dispatch(). Why do we need that function separately from
> > > apply_dispatch()?
> > >
> > >
> > The second patch removes the default case you quoted above. I think that's
> > important to detect any unhandled case at compile time rather than at run
> > time. But we need some way to detect whether the values we get from wire
> > are legit. pg_get_logicalrep_msg_type() does that. Further that function
> > can be used at places other than apply_dispatch() if required without each
> > of those places having their own validation.
> 
> Even if that enum contains out-of-range values, that "command" is sent
> having truncated to uint8 and on the receiver side apply_dispatch()
> doesn't identify the command and raises an error.  That is equivalent
> to what pq_send_logicalrep_msg_type() does. (Also equivalent on the
> point that symbols that are not used in regression are not checked.)

Sorry, this is about pg_send_logicalrep_msg_type(), not
pg_get..(). And I forgot to mention pg_get_logicalrep_msg_type().

For the pg_get_logicalrep_msg_type(), It is just a repetion of what
apply_displatch() does in switch().

If I flattened the code, it looks like:

apply_dispatch(s)
{
  LogicalRepMsgType msgtype = pq_getmsgtype(s);
  bool pass = false;

  switch (msgtype)
  {
 case LOGICAL_REP_MSG_BEGIN:
 ...
 case LOGICAL_REP_MSG_STREAM_COMMIT:
   pass = true;
  }
  if (!pass)
 ereport(ERROR, (errmsg("invalid logical replication message type"..

  switch (msgtype)
  {
 case LOGICAL_REP_MSG_BEGIN:
apply_handle_begin();
break;
 ...
 case LOGICAL_REP_MSG_STREAM_COMMIT:
apply_handle_begin();
break;
  }   
} 

Those two switch()es are apparently redundant. That code is exactly
equivalent to:

apply_dispatch(s)
{
  LogicalRepMsgType msgtype = pq_getmsgtype(s);

  switch (msgtype)
  {
 case LOGICAL_REP_MSG_BEGIN:
apply_handle_begin();
!   return;
 ...
 case LOGICAL_REP_MSG_STREAM_COMMIT:
apply_handle_begin();
!   return;
  }

  ereport(ERROR, (errmsg("invalid logical replication message type"..
} 
 
which is smaller and fast.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Enumize logical replication message actions

2020-10-22 Thread Kyotaro Horiguchi
At Thu, 22 Oct 2020 16:37:18 +0530, Ashutosh Bapat 
 wrote in 
> On Thu, 22 Oct 2020 at 14:46, Kyotaro Horiguchi 
> wrote:
> 
> >
> >
> > We shouldn't have the default: in the switch() block in
> > apply_dispatch(). That prevents compilers from checking
> > completeness. The content of the default: should be moved out to after
> > the switch() block.
> >
> > apply_dispatch()
> > {
> > switch (action)
> > {
> >
> > case LOGICAL_REP_MSG_STREAM_COMMIT(s);
> >apply_handle_stream_commit(s);
> >return;
> > }
> >
> > ereport(ERROR, ...);
> > }
> >
> > > 0002 adds wrappers on top of pq_sendbyte() and pq_getmsgbyte() to send
> > and
> > > receive a logical replication message type respectively. These wrappers
> > add
> > > more protection to make sure that the enum definitions fit one byte. This
> > > also removes the default case from apply_dispatch() so that we can detect
> > > any LogicalRepMsgType not handled by that function.
> >
> > pg_send_logicalrep_msg_type() looks somewhat too-much.  If we need
> > something like that we shouldn't do this refactoring, I think.
> >
> 
> Enum is an integer, and we want to send byte. The function asserts that the
> enum fits a byte. If there's a way to declare byte long enums I would use
> that. But I didn't find a way to do that.

That way of defining enums can contain two different symbols with the
same value. If we need to check the values are actually in the range
of char, checking duplicate values has more importance from the
standpoint of likelihood.

AFAICS there're two instances of this kind of enums, CoreceionMethod
and TypeCat. None of them are not checked for width nor duplicates
when they are used.

Even if we need such a kind of check, it souldn't be a wrapper
function that adds costs on non-assertion builds, but a replacing of
pq_sendbyte() done only on USE_ASSERT_CHECKING.

> pg_get_logicalrep_msg_type() seems doing the same check (that the
> > value is compared aganst every keyword value) with
> > apply_dispatch(). Why do we need that function separately from
> > apply_dispatch()?
> >
> >
> The second patch removes the default case you quoted above. I think that's
> important to detect any unhandled case at compile time rather than at run
> time. But we need some way to detect whether the values we get from wire
> are legit. pg_get_logicalrep_msg_type() does that. Further that function
> can be used at places other than apply_dispatch() if required without each
> of those places having their own validation.

Even if that enum contains out-of-range values, that "command" is sent
having truncated to uint8 and on the receiver side apply_dispatch()
doesn't identify the command and raises an error.  That is equivalent
to what pq_send_logicalrep_msg_type() does. (Also equivalent on the
point that symbols that are not used in regression are not checked.)

reagrds.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Enumize logical replication message actions

2020-10-22 Thread Ashutosh Bapat
On Thu, 22 Oct 2020 at 14:46, Kyotaro Horiguchi 
wrote:

>
>
> We shouldn't have the default: in the switch() block in
> apply_dispatch(). That prevents compilers from checking
> completeness. The content of the default: should be moved out to after
> the switch() block.
>
> apply_dispatch()
> {
> switch (action)
> {
>
> case LOGICAL_REP_MSG_STREAM_COMMIT(s);
>apply_handle_stream_commit(s);
>return;
> }
>
> ereport(ERROR, ...);
> }
>
> > 0002 adds wrappers on top of pq_sendbyte() and pq_getmsgbyte() to send
> and
> > receive a logical replication message type respectively. These wrappers
> add
> > more protection to make sure that the enum definitions fit one byte. This
> > also removes the default case from apply_dispatch() so that we can detect
> > any LogicalRepMsgType not handled by that function.
>
> pg_send_logicalrep_msg_type() looks somewhat too-much.  If we need
> something like that we shouldn't do this refactoring, I think.
>

Enum is an integer, and we want to send byte. The function asserts that the
enum fits a byte. If there's a way to declare byte long enums I would use
that. But I didn't find a way to do that.


pg_get_logicalrep_msg_type() seems doing the same check (that the
> value is compared aganst every keyword value) with
> apply_dispatch(). Why do we need that function separately from
> apply_dispatch()?
>
>
The second patch removes the default case you quoted above. I think that's
important to detect any unhandled case at compile time rather than at run
time. But we need some way to detect whether the values we get from wire
are legit. pg_get_logicalrep_msg_type() does that. Further that function
can be used at places other than apply_dispatch() if required without each
of those places having their own validation.

-- 
Best Wishes,
Ashutosh


Re: Enumize logical replication message actions

2020-10-22 Thread Kyotaro Horiguchi
At Thu, 22 Oct 2020 12:13:40 +0530, Ashutosh Bapat 
 wrote in 
> Thanks Andres for your review. Thanks Li, Horiguchi-san and Amit for your
> comments.
> 
> On Tue, 20 Oct 2020 at 04:57, Andres Freund  wrote:
> 
> > Hi,
> >
> > On 2020-10-16 12:55:26 +0530, Ashutosh Bapat wrote:
> > > Here's a patch simplifying that for top level logical replication
> > > messages.
> >
> > I think that's a good plan. One big benefit for me is that it's much
> > easier to search for an enum than for a single letter
> > constant. Including searching for all the places that deal with any sort
> > of logical rep message type.
> 
> 
> >
> > >  void
> > >  logicalrep_write_begin(StringInfo out, ReorderBufferTXN *txn)
> > >  {
> > > - pq_sendbyte(out, 'B');  /* BEGIN */
> > > + pq_sendbyte(out, LOGICAL_REP_MSG_BEGIN);/* BEGIN */
> >
> > I think if we have the LOGICAL_REP_MSG_BEGIN we don't need the /* BEGIN */.
> >
> 
> Yes. Fixed all places.
> 
> I have attached two places - 0001 which is previous 0001 patch with your
> comments addressed.

We shouldn't have the default: in the switch() block in
apply_dispatch(). That prevents compilers from checking
completeness. The content of the default: should be moved out to after
the switch() block.

apply_dispatch()
{
switch (action)
{
   
case LOGICAL_REP_MSG_STREAM_COMMIT(s);
   apply_handle_stream_commit(s);
   return;
}

ereport(ERROR, ...);
}

> 0002 adds wrappers on top of pq_sendbyte() and pq_getmsgbyte() to send and
> receive a logical replication message type respectively. These wrappers add
> more protection to make sure that the enum definitions fit one byte. This
> also removes the default case from apply_dispatch() so that we can detect
> any LogicalRepMsgType not handled by that function.

pg_send_logicalrep_msg_type() looks somewhat too-much.  If we need
something like that we shouldn't do this refactoring, I think.

pg_get_logicalrep_msg_type() seems doing the same check (that the
value is compared aganst every keyword value) with
apply_dispatch(). Why do we need that function separately from
apply_dispatch()?


> These two patches are intended to be committed together as a single commit.
> For now the second one is separate so that it's easy to remove the changes
> if they are not acceptable.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Enumize logical replication message actions

2020-10-22 Thread Ashutosh Bapat
Thanks Andres for your review. Thanks Li, Horiguchi-san and Amit for your
comments.

On Tue, 20 Oct 2020 at 04:57, Andres Freund  wrote:

> Hi,
>
> On 2020-10-16 12:55:26 +0530, Ashutosh Bapat wrote:
> > Here's a patch simplifying that for top level logical replication
> > messages.
>
> I think that's a good plan. One big benefit for me is that it's much
> easier to search for an enum than for a single letter
> constant. Including searching for all the places that deal with any sort
> of logical rep message type.


>
> >  void
> >  logicalrep_write_begin(StringInfo out, ReorderBufferTXN *txn)
> >  {
> > - pq_sendbyte(out, 'B');  /* BEGIN */
> > + pq_sendbyte(out, LOGICAL_REP_MSG_BEGIN);/* BEGIN */
>
> I think if we have the LOGICAL_REP_MSG_BEGIN we don't need the /* BEGIN */.
>

Yes. Fixed all places.

I have attached two places - 0001 which is previous 0001 patch with your
comments addressed.

0002 adds wrappers on top of pq_sendbyte() and pq_getmsgbyte() to send and
receive a logical replication message type respectively. These wrappers add
more protection to make sure that the enum definitions fit one byte. This
also removes the default case from apply_dispatch() so that we can detect
any LogicalRepMsgType not handled by that function.

These two patches are intended to be committed together as a single commit.
For now the second one is separate so that it's easy to remove the changes
if they are not acceptable.

-- 
Best Wishes,
Ashutosh
From 8da06210710033946de1541d6587ecc783fa3649 Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat 
Date: Fri, 16 Oct 2020 12:31:35 +0530
Subject: [PATCH 1/2] Enumize top level logical replication actions

Logical replication protocol uses single byte character to identify
different chunks of logical repliation messages. The code uses string
literals for the same. Enumize those so that
1. All the string literals used can be found at a single place. This
makes it easy to add more actions without the risk of conflicts.
2. It's easy to locate the code handling a given action.

Ashutosh Bapat
---
 src/backend/replication/logical/proto.c  | 26 ++--
 src/backend/replication/logical/worker.c | 54 
 src/include/replication/logicalproto.h   | 25 +++
 3 files changed, 65 insertions(+), 40 deletions(-)

diff --git a/src/backend/replication/logical/proto.c b/src/backend/replication/logical/proto.c
index eb19142b48..fdb31182d7 100644
--- a/src/backend/replication/logical/proto.c
+++ b/src/backend/replication/logical/proto.c
@@ -44,7 +44,7 @@ static const char *logicalrep_read_namespace(StringInfo in);
 void
 logicalrep_write_begin(StringInfo out, ReorderBufferTXN *txn)
 {
-	pq_sendbyte(out, 'B');		/* BEGIN */
+	pq_sendbyte(out, LOGICAL_REP_MSG_BEGIN);
 
 	/* fixed fields */
 	pq_sendint64(out, txn->final_lsn);
@@ -76,7 +76,7 @@ logicalrep_write_commit(StringInfo out, ReorderBufferTXN *txn,
 {
 	uint8		flags = 0;
 
-	pq_sendbyte(out, 'C');		/* sending COMMIT */
+	pq_sendbyte(out, LOGICAL_REP_MSG_COMMIT);
 
 	/* send the flags field (unused for now) */
 	pq_sendbyte(out, flags);
@@ -112,7 +112,7 @@ void
 logicalrep_write_origin(StringInfo out, const char *origin,
 		XLogRecPtr origin_lsn)
 {
-	pq_sendbyte(out, 'O');		/* ORIGIN */
+	pq_sendbyte(out, LOGICAL_REP_MSG_ORIGIN);
 
 	/* fixed fields */
 	pq_sendint64(out, origin_lsn);
@@ -141,7 +141,7 @@ void
 logicalrep_write_insert(StringInfo out, TransactionId xid, Relation rel,
 		HeapTuple newtuple, bool binary)
 {
-	pq_sendbyte(out, 'I');		/* action INSERT */
+	pq_sendbyte(out, LOGICAL_REP_MSG_INSERT);
 
 	/* transaction ID (if not valid, we're not streaming) */
 	if (TransactionIdIsValid(xid))
@@ -185,7 +185,7 @@ void
 logicalrep_write_update(StringInfo out, TransactionId xid, Relation rel,
 		HeapTuple oldtuple, HeapTuple newtuple, bool binary)
 {
-	pq_sendbyte(out, 'U');		/* action UPDATE */
+	pq_sendbyte(out, LOGICAL_REP_MSG_UPDATE);
 
 	Assert(rel->rd_rel->relreplident == REPLICA_IDENTITY_DEFAULT ||
 		   rel->rd_rel->relreplident == REPLICA_IDENTITY_FULL ||
@@ -263,7 +263,7 @@ logicalrep_write_delete(StringInfo out, TransactionId xid, Relation rel,
 		   rel->rd_rel->relreplident == REPLICA_IDENTITY_FULL ||
 		   rel->rd_rel->relreplident == REPLICA_IDENTITY_INDEX);
 
-	pq_sendbyte(out, 'D');		/* action DELETE */
+	pq_sendbyte(out, LOGICAL_REP_MSG_DELETE);
 
 	/* transaction ID (if not valid, we're not streaming) */
 	if (TransactionIdIsValid(xid))
@@ -317,7 +317,7 @@ logicalrep_write_truncate(StringInfo out,
 	int			i;
 	uint8		flags = 0;
 
-	pq_sendbyte(out, 'T');		/* action TRUNCATE */
+	pq_sendbyte(out, LOGICAL_REP_MSG_TRUNCATE);
 
 	/* transaction ID (if not valid, we're not streaming) */
 	if (TransactionIdIsValid(xid))
@@ -369,7 +369,7 @@ logicalrep_write_rel(StringInfo out, TransactionId xid, Relation rel)
 {
 	char	   *relname;
 
-	pq_sendbyte(out, 'R');		/* sending RELATION */
+	pq_sendbyte(out, LOGICAL_REP_MSG_RELATION);
 
 	/* transaction 

Re: Enumize logical replication message actions

2020-10-19 Thread Andres Freund
Hi,

On 2020-10-16 12:55:26 +0530, Ashutosh Bapat wrote:
> Here's a patch simplifying that for top level logical replication
> messages.

I think that's a good plan. One big benefit for me is that it's much
easier to search for an enum than for a single letter
constant. Including searching for all the places that deal with any sort
of logical rep message type.


>  void
>  logicalrep_write_begin(StringInfo out, ReorderBufferTXN *txn)
>  {
> - pq_sendbyte(out, 'B');  /* BEGIN */
> + pq_sendbyte(out, LOGICAL_REP_MSG_BEGIN);/* BEGIN */

I think if we have the LOGICAL_REP_MSG_BEGIN we don't need the /* BEGIN */.


Greetings,

Andres Freund




Re: Enumize logical replication message actions

2020-10-16 Thread Amit Kapila
On Fri, Oct 16, 2020 at 12:55 PM Ashutosh Bapat
 wrote:
>
> Hi All,
>  Logical replication protocol uses single byte character to identify
> different chunks of logical repliation messages. The code uses
> character literals for the same. These literals are used as bare
> constants in code as well. That's true for almost all the code that
> deals with wire protocol. With that it becomes difficult to identify
> the code which deals with a particular message. For example code that
> deals with message type 'B'. In various protocol 'B' has different
> meaning and it gets difficult and time consuming to differentiate one
> usage from other and find all places which deal with one usage. Here's
> a patch simplifying that for top level logical replication messages.
>

+1. I think this will make the code easier to read and understand. I
think it would be good to do this in some other parts as well but
starting with logical replication is a good idea as that area is still
evolving.

-- 
With Regards,
Amit Kapila.




Re: Enumize logical replication message actions

2020-10-16 Thread Ashutosh Bapat
On Fri, 16 Oct 2020 at 14:06, Kyotaro Horiguchi 
wrote:

> At Fri, 16 Oct 2020 08:08:40 +, Li Japin  wrote
> in
> >
> > > On Oct 16, 2020, at 3:25 PM, Ashutosh Bapat <
> ashutosh.bapat@gmail.com> wrote:
>
> >
> > What about ’N’ for new tuples, ‘O’ for old tuple follows, ‘K’ for old
> key follows?
> > Those are also logical replication protocol message, I think.
>
> They are flags stored in a message so they can be seen as different
> from the message type letters.
>

I think we converting those into macros/enums will help but for now I have
tackled only the top level message types.


>
> Anyway if the values are determined after some meaning, I'm not sure
> enumerize them is good thing or not.  In other words, 'U' conveys
> almost same amount of information with LOGICAL_REP_MSG_UPDATE in the
> context of logical replcation protocol.
>
> We have the same code pattern in PostgresMain and perhaps we don't
> going to change them into enums.
>

That's exactly the problem I am trying to solve. Take for example 'B' as I
have mentioned before. That string literal appears in 64 different places
in the master branch. Which of those are the ones related to a "BEGIN"
message in logical replication protocol is not clear, unless I thumb
through each of those. In PostgresMain it's used to indicate a BIND
message. Which of those 64 instances are also using 'B' to mean a bind
message? Using enums or macros makes it clear. Just look
up LOGICAL_REP_MSG_BEGIN. Converting all 'B' to their respective macros
will help but might be problematic for back-patching. So that's arguable.
But doing that in something as new as logical replication will be helpful,
before it gets too late to change that.

Further logical repliation protocol is using the same literal e.g. 'O' to
mean origin in some places and old tuple in some other. While comments
there help, it's not easy to locate all the code that deals with one
meaning or the other. This change will help with that. Another reason as to
why logical replication.
-- 
Best Wishes,
Ashutosh


Re: Enumize logical replication message actions

2020-10-16 Thread Kyotaro Horiguchi
At Fri, 16 Oct 2020 08:08:40 +, Li Japin  wrote in 
> 
> > On Oct 16, 2020, at 3:25 PM, Ashutosh Bapat  
> > wrote:
> > 
> > Hi All,
> > Logical replication protocol uses single byte character to identify
> > different chunks of logical repliation messages. The code uses
> > character literals for the same. These literals are used as bare
> > constants in code as well. That's true for almost all the code that
> > deals with wire protocol. With that it becomes difficult to identify
> > the code which deals with a particular message. For example code that
> > deals with message type 'B'. In various protocol 'B' has different
> > meaning and it gets difficult and time consuming to differentiate one
> > usage from other and find all places which deal with one usage. Here's
> > a patch simplifying that for top level logical replication messages.
> > 
> > I think I have covered the places that need change. But I might have
> > missed something, given that these literals are used at several other
> > places (a problem this patch tries to fix :)).
> > 
> > Initially I had used #define for the same, but Peter E suggested using
> > Enums so that switch cases can detect any remaining items along with
> > stronger type checks.
> > 
> > Pavan offleast suggested to create a wrapper
> > pg_send_logical_rep_message() on top of pg_sendbyte(), similarly for
> > pg_getmsgbyte(). I wanted to see if this change is acceptable. If so,
> > I will change that as well. Comments/suggestions welcome.
> > 
> > -- 
> > Best Wishes,
> > Ashutosh Bapat
> > <0001-Enumize-top-level-logical-replication-actions.patch>
> 
> What about ’N’ for new tuples, ‘O’ for old tuple follows, ‘K’ for old key 
> follows?
> Those are also logical replication protocol message, I think.

They are flags stored in a message so they can be seen as different
from the message type letters.

Anyway if the values are determined after some meaning, I'm not sure
enumerize them is good thing or not.  In other words, 'U' conveys
almost same amount of information with LOGICAL_REP_MSG_UPDATE in the
context of logical replcation protocol.

We have the same code pattern in PostgresMain and perhaps we don't
going to change them into enums.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


Re: Enumize logical replication message actions

2020-10-16 Thread Li Japin

> On Oct 16, 2020, at 3:25 PM, Ashutosh Bapat  
> wrote:
> 
> Hi All,
> Logical replication protocol uses single byte character to identify
> different chunks of logical repliation messages. The code uses
> character literals for the same. These literals are used as bare
> constants in code as well. That's true for almost all the code that
> deals with wire protocol. With that it becomes difficult to identify
> the code which deals with a particular message. For example code that
> deals with message type 'B'. In various protocol 'B' has different
> meaning and it gets difficult and time consuming to differentiate one
> usage from other and find all places which deal with one usage. Here's
> a patch simplifying that for top level logical replication messages.
> 
> I think I have covered the places that need change. But I might have
> missed something, given that these literals are used at several other
> places (a problem this patch tries to fix :)).
> 
> Initially I had used #define for the same, but Peter E suggested using
> Enums so that switch cases can detect any remaining items along with
> stronger type checks.
> 
> Pavan offleast suggested to create a wrapper
> pg_send_logical_rep_message() on top of pg_sendbyte(), similarly for
> pg_getmsgbyte(). I wanted to see if this change is acceptable. If so,
> I will change that as well. Comments/suggestions welcome.
> 
> -- 
> Best Wishes,
> Ashutosh Bapat
> <0001-Enumize-top-level-logical-replication-actions.patch>

What about ’N’ for new tuples, ‘O’ for old tuple follows, ‘K’ for old key 
follows?
Those are also logical replication protocol message, I think.

--
Best regards
Japin Li



Enumize logical replication message actions

2020-10-16 Thread Ashutosh Bapat
Hi All,
 Logical replication protocol uses single byte character to identify
different chunks of logical repliation messages. The code uses
character literals for the same. These literals are used as bare
constants in code as well. That's true for almost all the code that
deals with wire protocol. With that it becomes difficult to identify
the code which deals with a particular message. For example code that
deals with message type 'B'. In various protocol 'B' has different
meaning and it gets difficult and time consuming to differentiate one
usage from other and find all places which deal with one usage. Here's
a patch simplifying that for top level logical replication messages.

I think I have covered the places that need change. But I might have
missed something, given that these literals are used at several other
places (a problem this patch tries to fix :)).

Initially I had used #define for the same, but Peter E suggested using
Enums so that switch cases can detect any remaining items along with
stronger type checks.

Pavan offleast suggested to create a wrapper
pg_send_logical_rep_message() on top of pg_sendbyte(), similarly for
pg_getmsgbyte(). I wanted to see if this change is acceptable. If so,
I will change that as well. Comments/suggestions welcome.

-- 
Best Wishes,
Ashutosh Bapat
From fa2447ff73cc94b27e3641eda1a3f3d26fd72381 Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat 
Date: Fri, 16 Oct 2020 12:31:35 +0530
Subject: [PATCH] Enumize top level logical replication actions

Logical replication protocol uses single byte character to identify
different chunks of logical repliation messages. The code uses string
literals for the same. Enumize those so that
1. All the string literals used can be found at a single place. This
makes it easy to add more actions without the risk of conflicts.
2. It's easy to locate the code handling a given action.

We could create such enums even for the string literals that
differentiate parts of a message, but I have not attempted that in this
commit.

Ashutosh Bapat
---
 src/backend/replication/logical/proto.c  | 26 ++--
 src/backend/replication/logical/worker.c | 52 
 src/include/replication/logicalproto.h   | 19 +
 3 files changed, 58 insertions(+), 39 deletions(-)

diff --git a/src/backend/replication/logical/proto.c b/src/backend/replication/logical/proto.c
index eb19142b48..09bce22d96 100644
--- a/src/backend/replication/logical/proto.c
+++ b/src/backend/replication/logical/proto.c
@@ -44,7 +44,7 @@ static const char *logicalrep_read_namespace(StringInfo in);
 void
 logicalrep_write_begin(StringInfo out, ReorderBufferTXN *txn)
 {
-	pq_sendbyte(out, 'B');		/* BEGIN */
+	pq_sendbyte(out, LOGICAL_REP_MSG_BEGIN);		/* BEGIN */
 
 	/* fixed fields */
 	pq_sendint64(out, txn->final_lsn);
@@ -76,7 +76,7 @@ logicalrep_write_commit(StringInfo out, ReorderBufferTXN *txn,
 {
 	uint8		flags = 0;
 
-	pq_sendbyte(out, 'C');		/* sending COMMIT */
+	pq_sendbyte(out, LOGICAL_REP_MSG_COMMIT);		/* sending COMMIT */
 
 	/* send the flags field (unused for now) */
 	pq_sendbyte(out, flags);
@@ -112,7 +112,7 @@ void
 logicalrep_write_origin(StringInfo out, const char *origin,
 		XLogRecPtr origin_lsn)
 {
-	pq_sendbyte(out, 'O');		/* ORIGIN */
+	pq_sendbyte(out, LOGICAL_REP_MSG_ORIGIN);		/* ORIGIN */
 
 	/* fixed fields */
 	pq_sendint64(out, origin_lsn);
@@ -141,7 +141,7 @@ void
 logicalrep_write_insert(StringInfo out, TransactionId xid, Relation rel,
 		HeapTuple newtuple, bool binary)
 {
-	pq_sendbyte(out, 'I');		/* action INSERT */
+	pq_sendbyte(out, LOGICAL_REP_MSG_INSERT);		/* action INSERT */
 
 	/* transaction ID (if not valid, we're not streaming) */
 	if (TransactionIdIsValid(xid))
@@ -185,7 +185,7 @@ void
 logicalrep_write_update(StringInfo out, TransactionId xid, Relation rel,
 		HeapTuple oldtuple, HeapTuple newtuple, bool binary)
 {
-	pq_sendbyte(out, 'U');		/* action UPDATE */
+	pq_sendbyte(out, LOGICAL_REP_MSG_UPDATE);		/* action UPDATE */
 
 	Assert(rel->rd_rel->relreplident == REPLICA_IDENTITY_DEFAULT ||
 		   rel->rd_rel->relreplident == REPLICA_IDENTITY_FULL ||
@@ -263,7 +263,7 @@ logicalrep_write_delete(StringInfo out, TransactionId xid, Relation rel,
 		   rel->rd_rel->relreplident == REPLICA_IDENTITY_FULL ||
 		   rel->rd_rel->relreplident == REPLICA_IDENTITY_INDEX);
 
-	pq_sendbyte(out, 'D');		/* action DELETE */
+	pq_sendbyte(out, LOGICAL_REP_MSG_DELETE);		/* action DELETE */
 
 	/* transaction ID (if not valid, we're not streaming) */
 	if (TransactionIdIsValid(xid))
@@ -317,7 +317,7 @@ logicalrep_write_truncate(StringInfo out,
 	int			i;
 	uint8		flags = 0;
 
-	pq_sendbyte(out, 'T');		/* action TRUNCATE */
+	pq_sendbyte(out, LOGICAL_REP_MSG_TRUNCATE);		/* action TRUNCATE */
 
 	/* transaction ID (if not valid, we're not streaming) */
 	if (TransactionIdIsValid(xid))
@@ -369,7 +369,7 @@ logicalrep_write_rel(StringInfo out, TransactionId xid, Relation rel)
 {
 	char	   *relname;
 
-	pq_sendbyte(out, 'R');