Re: Logical insert/update/delete WAL records for custom table AMs

2022-07-20 Thread Jeff Davis
On Mon, 2022-03-21 at 17:43 -0700, Andres Freund wrote:
> Currently this doesn't apply: 
> http://cfbot.cputube.org/patch_37_3394.log

Withdrawn for now. With custom WAL resource managers this is less
important to me.

I still think it has value, and I'm willing to work on it if more use
cases come forward.

Regards,
Jeff Davis






Re: Logical insert/update/delete WAL records for custom table AMs

2022-03-31 Thread Jeff Davis
On Thu, 2022-03-31 at 09:05 -0400, Robert Haas wrote:
> 1. An AM that doesn't care about having anything happening during
> recovery, but wants to be able to get logical decoding to do some
> work. Maybe the intention of the AM is that data is available only
> when the server is not in recovery and all data is lost on shutdown,
> or maybe the AM has its own separate durability mechanism.

This is a speculative use case that is not what I would use it for, but
perhaps someone wants to do that with a table AM or maybe an FDW.

> 2. An AM that wants things to happen during recovery, but handles
> that
> separately. For example, maybe it logs all the data changes via
> log_newpage() and then separately emits these log records.

Yes, or Generic WAL. Generic WAL seems like a half-feature without this
Logical WAL patch, because it's hopeless to support logical
decoding/replication of whatever data you're logging with Generic WAL.
That's probably the strongest argument for this patch.

> 3. An AM that wants things to happen during recovery, and expects
> these records to serve that purpose. That would require getting
> control during WAL replay as well as during decoding, and would
> probably only work for an AM whose data is not block-structured (e.g.
> an in-memory btree that is dumped to disk at every checkpoint).

This patch would not work in this case because the records are ignored
during REDO.

Regards,
Jeff Davis







Re: Logical insert/update/delete WAL records for custom table AMs

2022-03-31 Thread Robert Haas
On Wed, Mar 30, 2022 at 2:31 PM Jeff Davis  wrote:
> Attached a new version. The scope expanded, so this is likely to slip
> v15 at this late time. For 15, I'll focus on my extensible rmgr work,
> which can serve similar purposes.
>
> The main purpose of this patch is to be able to emit logical events for
> a table (insert/update/delete/truncate) without actually modifying a
> table or relying on the heap at all. That allows table AMs to support
> logical decoding/replication, and perhaps allows a few other
> interesting use cases (maybe foreign tables?). There are really two
> advantages of this approach over relying on a custom rmgr:
>
>   1. Easier for extension authors
>   2. Doesn't require an extension module to be loaded to start the
> server

I'm not sure that I understand exactly how this is intended to be
used. I can think of three possibilities:

1. An AM that doesn't care about having anything happening during
recovery, but wants to be able to get logical decoding to do some
work. Maybe the intention of the AM is that data is available only
when the server is not in recovery and all data is lost on shutdown,
or maybe the AM has its own separate durability mechanism.

2. An AM that wants things to happen during recovery, but handles that
separately. For example, maybe it logs all the data changes via
log_newpage() and then separately emits these log records.

3. An AM that wants things to happen during recovery, and expects
these records to serve that purpose. That would require getting
control during WAL replay as well as during decoding, and would
probably only work for an AM whose data is not block-structured (e.g.
an in-memory btree that is dumped to disk at every checkpoint).

My guess is that this is intended to meet use cases 1 and 2 but not 3.
Is that correct?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Logical insert/update/delete WAL records for custom table AMs

2022-03-30 Thread Jeff Davis
On Thu, 2022-02-24 at 20:35 +, Simon Riggs wrote:
> The approach is perfectly fine, it just needs to be finished and
> rebased.

Attached a new version. The scope expanded, so this is likely to slip
v15 at this late time. For 15, I'll focus on my extensible rmgr work,
which can serve similar purposes.

The main purpose of this patch is to be able to emit logical events for
a table (insert/update/delete/truncate) without actually modifying a
table or relying on the heap at all. That allows table AMs to support
logical decoding/replication, and perhaps allows a few other
interesting use cases (maybe foreign tables?). There are really two
advantages of this approach over relying on a custom rmgr:

  1. Easier for extension authors
  2. Doesn't require an extension module to be loaded to start the
server

Those are very nice advantages, but they come at the price of
flexibility and performance. I think there's room for both, and we can
discuss the merits individually.

Changes:
  * Support logical messages for INSERT/UPDATE/DELETE/TRUNCATE
(before it only supported INSERT)
  * SQL functions pg_logical_emit_insert/update/delete/truncate
(callable by superuser)
  * Tests (using test_decoding)
  * Use replica identities properly
  * In general a lot of cleanup, but still not quite ready

TODO:
  * Should SQL functions be callable by the table owner? I would lean
toward superuser-only, because logical replication is used for
administrative purposes like upgrades, and I don't think we want table
owners to be able to create inconsistencies.
  * Support for multi-insert
  * Docs for SQL functions, and maybe docs in the section on Generic
WAL
  * Try to get away from reliance on heap tuples specifically

Regards,
Jeff Davis

From f7b85aea60b06eb7019befec38566b5e014bea12 Mon Sep 17 00:00:00 2001
From: Jeff Davis 
Date: Sat, 30 Oct 2021 12:07:35 -0700
Subject: [PATCH] Logical wal.

---
 contrib/test_decoding/expected/messages.out   | 148 +++
 contrib/test_decoding/sql/messages.sql|  58 +++
 src/backend/access/heap/heapam.c  |   4 +-
 src/backend/access/rmgrdesc/Makefile  |   2 +-
 .../{logicalmsgdesc.c => logicaldesc.c}   |  45 +-
 src/backend/access/transam/rmgr.c |   2 +-
 src/backend/replication/logical/Makefile  |   2 +-
 src/backend/replication/logical/decode.c  | 275 ++--
 .../replication/logical/logical_xlog.c| 399 ++
 .../replication/logical/logicalfuncs.c| 165 +++-
 src/backend/replication/logical/message.c |  89 
 src/bin/pg_waldump/.gitignore |   2 +-
 src/bin/pg_waldump/rmgrdesc.c |   2 +-
 src/include/access/heapam.h   |   2 +
 src/include/access/heapam_xlog.h  |   3 +
 src/include/access/rmgrlist.h |   2 +-
 src/include/catalog/pg_proc.dat   |  17 +
 src/include/replication/decode.h  |   2 +-
 src/include/replication/logical_xlog.h| 124 ++
 src/include/replication/message.h |  41 --
 20 files changed, 1211 insertions(+), 173 deletions(-)
 rename src/backend/access/rmgrdesc/{logicalmsgdesc.c => logicaldesc.c} (59%)
 create mode 100644 src/backend/replication/logical/logical_xlog.c
 delete mode 100644 src/backend/replication/logical/message.c
 create mode 100644 src/include/replication/logical_xlog.h
 delete mode 100644 src/include/replication/message.h

diff --git a/contrib/test_decoding/expected/messages.out b/contrib/test_decoding/expected/messages.out
index c75d40190b6..aa284bc37c2 100644
--- a/contrib/test_decoding/expected/messages.out
+++ b/contrib/test_decoding/expected/messages.out
@@ -91,6 +91,154 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'for
 --
 (0 rows)
 
+-- no data in this table, but emit logical INSERT/UPDATE/DELETE for it
+CREATE TABLE dummy(i int, t text, n numeric, primary key(t));
+SELECT pg_logical_emit_insert('dummy', row(1, 'one', 0.1)::dummy) <> '0/0'::pg_lsn;
+ ?column? 
+--
+ t
+(1 row)
+
+SELECT pg_logical_emit_insert('dummy', row(2, 'two', 0.2)::dummy) <> '0/0'::pg_lsn;
+ ?column? 
+--
+ t
+(1 row)
+
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'force-binary', '0', 'skip-empty-xacts', '1', 'include-xids', '0');
+ data  
+---
+ BEGIN
+ table public.dummy: INSERT: i[integer]:1 t[text]:'one' n[numeric]:0.1
+ COMMIT
+ BEGIN
+ table public.dummy: INSERT: i[integer]:2 t[text]:'two' n[numeric]:0.2
+ COMMIT
+(6 rows)
+
+SELECT * FROM dummy;
+ i | t | n 
+---+---+---
+(0 rows)
+
+SELECT pg_logical_emit_delete('dummy', row(12, 'twelve', 0.12)::dummy) <> '0/0'::pg_lsn;
+ ?column? 
+--
+ t
+(1 row)
+
+SELECT pg_logical_emit_update('dummy', row(15, 'fifteen', 0.15)::dummy,
+   row(16, 'sixteen', 0.16)::dummy) <> 

Re: Logical insert/update/delete WAL records for custom table AMs

2022-03-21 Thread Andres Freund
Hi,

On 2022-01-17 01:05:14 -0800, Jeff Davis wrote:
> Great, I attached a rebased version.

Currently this doesn't apply: http://cfbot.cputube.org/patch_37_3394.log

- Andres




Re: Logical insert/update/delete WAL records for custom table AMs

2022-02-24 Thread Simon Riggs
On Mon, 17 Jan 2022 at 09:05, Jeff Davis  wrote:
>
> On Wed, 2022-01-05 at 20:19 +, Simon Riggs wrote:
> > I spoke with Jeff in detail about this patch in NYC Dec 21, and I now
> > think it is worth pursuing. It seems much more likely that this would
> > be acceptable than fully extensible rmgr.
>
> Thank you. I had some conversations with others who felt this approach
> is wasteful, which it is. But if this patch still has potential, I'm
> happy to pursue it along with the extensible rmgr approach.

It's self-contained and generally useful for a range of applications,
so the code would be long-lived.

Calling it wasteful presumes the way you'd use it. If you make logical
log entries you don't need to make physical ones, so its actually the
physical log entries that would be wasteful.

Logical log entries don't need to be decoded, so it's actually more
efficient, plus we could skip index entries altogether.

I would argue that this is the way we should be doing WAL, with
occasional divergence to physical records for performance, rather than
the other way around.

> > So I'm signing up as a reviewer and we'll see if this is good to go.
>
> Great, I attached a rebased version.

The approach is perfectly fine, it just needs to be finished and rebased.

-- 
Simon Riggshttp://www.EnterpriseDB.com/




Re: Logical insert/update/delete WAL records for custom table AMs

2022-01-17 Thread Jeff Davis
On Wed, 2022-01-05 at 20:19 +, Simon Riggs wrote:
> I spoke with Jeff in detail about this patch in NYC Dec 21, and I now
> think it is worth pursuing. It seems much more likely that this would
> be acceptable than fully extensible rmgr.

Thank you. I had some conversations with others who felt this approach
is wasteful, which it is. But if this patch still has potential, I'm
happy to pursue it along with the extensible rmgr approach.

> So I'm signing up as a reviewer and we'll see if this is good to go.

Great, I attached a rebased version.

Regards,
Jeff Davis

diff --git a/src/backend/access/rmgrdesc/Makefile b/src/backend/access/rmgrdesc/Makefile
index f88d72fd862..ed6dff179be 100644
--- a/src/backend/access/rmgrdesc/Makefile
+++ b/src/backend/access/rmgrdesc/Makefile
@@ -18,7 +18,7 @@ OBJS = \
 	gistdesc.o \
 	hashdesc.o \
 	heapdesc.o \
-	logicalmsgdesc.o \
+	logicaldesc.o \
 	mxactdesc.o \
 	nbtdesc.o \
 	relmapdesc.o \
diff --git a/src/backend/access/rmgrdesc/logicalmsgdesc.c b/src/backend/access/rmgrdesc/logicaldesc.c
similarity index 59%
rename from src/backend/access/rmgrdesc/logicalmsgdesc.c
rename to src/backend/access/rmgrdesc/logicaldesc.c
index 099e11a84e7..c2b0434a606 100644
--- a/src/backend/access/rmgrdesc/logicalmsgdesc.c
+++ b/src/backend/access/rmgrdesc/logicaldesc.c
@@ -1,22 +1,22 @@
 /*-
  *
- * logicalmsgdesc.c
- *	  rmgr descriptor routines for replication/logical/message.c
+ * logicaldesc.c
+ *	  rmgr descriptor routines for replication/logical/logical_xlog.c
  *
  * Portions Copyright (c) 2015-2022, PostgreSQL Global Development Group
  *
  *
  * IDENTIFICATION
- *	  src/backend/access/rmgrdesc/logicalmsgdesc.c
+ *	  src/backend/access/rmgrdesc/logicaldesc.c
  *
  *-
  */
 #include "postgres.h"
 
-#include "replication/message.h"
+#include "replication/logical_xlog.h"
 
 void
-logicalmsg_desc(StringInfo buf, XLogReaderState *record)
+logical_desc(StringInfo buf, XLogReaderState *record)
 {
 	char	   *rec = XLogRecGetData(record);
 	uint8		info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
@@ -40,13 +40,42 @@ logicalmsg_desc(StringInfo buf, XLogReaderState *record)
 			sep = " ";
 		}
 	}
+	else if (info == XLOG_LOGICAL_INSERT)
+	{
+
+	}
+	else if (info == XLOG_LOGICAL_UPDATE)
+	{
+
+	}
+	else if (info == XLOG_LOGICAL_DELETE)
+	{
+
+	}
+	else if (info == XLOG_LOGICAL_TRUNCATE)
+	{
+
+	}
 }
 
 const char *
-logicalmsg_identify(uint8 info)
+logical_identify(uint8 info)
 {
-	if ((info & ~XLR_INFO_MASK) == XLOG_LOGICAL_MESSAGE)
-		return "MESSAGE";
+	switch (info)
+	{
+		case XLOG_LOGICAL_MESSAGE:
+			return "MESSAGE";
+		case XLOG_LOGICAL_INSERT:
+			return "INSERT";
+		case XLOG_LOGICAL_UPDATE:
+			return "UPDATE";
+		case XLOG_LOGICAL_DELETE:
+			return "DELETE";
+		case XLOG_LOGICAL_TRUNCATE:
+			return "TRUNCATE";
+		default:
+			return NULL;
+	}
 
 	return NULL;
 }
diff --git a/src/backend/access/transam/rmgr.c b/src/backend/access/transam/rmgr.c
index 58091f6b520..f31f7187ac4 100644
--- a/src/backend/access/transam/rmgr.c
+++ b/src/backend/access/transam/rmgr.c
@@ -24,7 +24,7 @@
 #include "commands/dbcommands_xlog.h"
 #include "commands/sequence.h"
 #include "commands/tablespace.h"
-#include "replication/message.h"
+#include "replication/logical_xlog.h"
 #include "replication/origin.h"
 #include "storage/standby.h"
 #include "utils/relmapper.h"
diff --git a/src/backend/replication/logical/Makefile b/src/backend/replication/logical/Makefile
index c4e2fdeb719..9fa281a1dca 100644
--- a/src/backend/replication/logical/Makefile
+++ b/src/backend/replication/logical/Makefile
@@ -19,7 +19,7 @@ OBJS = \
 	launcher.o \
 	logical.o \
 	logicalfuncs.o \
-	message.o \
+	logical_xlog.o \
 	origin.o \
 	proto.o \
 	relation.o \
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 1d22208c1ad..1abccd7190b 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -37,7 +37,7 @@
 #include "catalog/pg_control.h"
 #include "replication/decode.h"
 #include "replication/logical.h"
-#include "replication/message.h"
+#include "replication/logical_xlog.h"
 #include "replication/origin.h"
 #include "replication/reorderbuffer.h"
 #include "replication/snapbuild.h"
@@ -56,16 +56,19 @@ static void DecodeHeapOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf);
 static void DecodeHeap2Op(LogicalDecodingContext *ctx, XLogRecordBuffer *buf);
 static void DecodeXactOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf);
 static void DecodeStandbyOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf);
-static void DecodeLogicalMsgOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf);
+static void DecodeLogicalOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf);
 
 /* individual record(group)'s handlers */
-static void DecodeInsert(LogicalDecodingContext 

Re: Logical insert/update/delete WAL records for custom table AMs

2022-01-06 Thread Simon Riggs
On Sun, 31 Oct 2021 at 18:10, Jeff Davis  wrote:

> I'm looking for some review on the approach and structure before I
> polish and test it.

Repurposing the logical msg rmgr into a general purpose logical rmgr
seems right.

Structure looks obvious, which is good.

Please pursue this and I will review with you as you go.

-- 
Simon Riggshttp://www.EnterpriseDB.com/




Re: Logical insert/update/delete WAL records for custom table AMs

2022-01-05 Thread Simon Riggs
On Wed, 10 Nov 2021 at 03:17, Amit Kapila  wrote:
>
> On Tue, Nov 9, 2021 at 5:12 AM Jeff Davis  wrote:
> >
> > On Fri, 2021-11-05 at 10:00 +0530, Amit Kapila wrote:
> > > I am not talking about decoding plugin but rather decoding itself,
> > > basically, the work we do in reorderbuffer.c, decode.c, etc. The two
> > > things I remember were tuple format and transaction ids as mentioned
> > > in my previous email.
> >
> > If it's difficult to come up with something that will work for all
> > table AMs, then it suggests that we might want to go towards fully
> > extensible rmgr, and have a decoding method in the rmgr.
> >
> > I started a thread (with a patch) here:
> >
> >
> > https://postgr.es/m/ed1fb2e22d15d3563ae0eb610f7b61bb15999c0a.ca...@j-davis.com
> >
> > > I think we should try to find a solution for
> > > tuple format as the current decoding code relies on it if we want
> > > decoding to deal with another table AMs transparently.
> >
> > Agreed, but it seems like we need basic extensibility first. For now,
> > we'll need to convert to a heap tuple,
> >
>
> Okay, but that might have a cost because we need to convert it before
> WAL logging it, and then we probably also need to convert back to the
> original table AM format during recovery/standby apply.

I spoke with Jeff in detail about this patch in NYC Dec 21, and I now
think it is worth pursuing. It seems much more likely that this would
be acceptable than fully extensible rmgr.

Amit asks a good question: should we be writing a message that seems
to presume the old heap tuple format? My answer is that we clearly
need it to be written in *some* common format, and the current heap
format currently works, so de facto it is the format we should use.
Right now, TAMs have to reformat back into this same format, so it is
the way the APIs work. Put it another way: I don't see any other
format that makes sense to use, now, but that could change in the
future.

So I'm signing up as a reviewer and we'll see if this is good to go.

-- 
Simon Riggshttp://www.EnterpriseDB.com/




Re: Logical insert/update/delete WAL records for custom table AMs

2021-11-09 Thread Amit Kapila
On Tue, Nov 9, 2021 at 5:12 AM Jeff Davis  wrote:
>
> On Fri, 2021-11-05 at 10:00 +0530, Amit Kapila wrote:
> > I am not talking about decoding plugin but rather decoding itself,
> > basically, the work we do in reorderbuffer.c, decode.c, etc. The two
> > things I remember were tuple format and transaction ids as mentioned
> > in my previous email.
>
> If it's difficult to come up with something that will work for all
> table AMs, then it suggests that we might want to go towards fully
> extensible rmgr, and have a decoding method in the rmgr.
>
> I started a thread (with a patch) here:
>
>
> https://postgr.es/m/ed1fb2e22d15d3563ae0eb610f7b61bb15999c0a.ca...@j-davis.com
>
> > I think we should try to find a solution for
> > tuple format as the current decoding code relies on it if we want
> > decoding to deal with another table AMs transparently.
>
> Agreed, but it seems like we need basic extensibility first. For now,
> we'll need to convert to a heap tuple,
>

Okay, but that might have a cost because we need to convert it before
WAL logging it, and then we probably also need to convert back to the
original table AM format during recovery/standby apply.

-- 
With Regards,
Amit Kapila.




Re: Logical insert/update/delete WAL records for custom table AMs

2021-11-08 Thread Jeff Davis
On Fri, 2021-11-05 at 10:00 +0530, Amit Kapila wrote:
> I am not talking about decoding plugin but rather decoding itself,
> basically, the work we do in reorderbuffer.c, decode.c, etc. The two
> things I remember were tuple format and transaction ids as mentioned
> in my previous email. 

If it's difficult to come up with something that will work for all
table AMs, then it suggests that we might want to go towards fully
extensible rmgr, and have a decoding method in the rmgr.

I started a thread (with a patch) here: 


https://postgr.es/m/ed1fb2e22d15d3563ae0eb610f7b61bb15999c0a.ca...@j-davis.com

> I think we should try to find a solution for
> tuple format as the current decoding code relies on it if we want
> decoding to deal with another table AMs transparently.

Agreed, but it seems like we need basic extensibility first. For now,
we'll need to convert to a heap tuple, but later I'd like to support
other formats for the decoding plugin to work with.

Regards,
Jeff Davis






Re: Logical insert/update/delete WAL records for custom table AMs

2021-11-04 Thread Amit Kapila
On Fri, Nov 5, 2021 at 4:53 AM Jeff Davis  wrote:
>
> On Thu, 2021-11-04 at 14:33 +0530, Amit Kapila wrote:
> > Can't different tableAM's have different representations
> > for toast or may be some different concept like speculative
> > insertions?
>
> The decoding plugin should just use the common interfaces to toast, and
> if the table AM supports toast, everything should work fine.
>

I think that is true only if table AM uses same format as heap for
toast. It also seems to be relying heap tuple format.

> The only
> special thing it needs to do is check VARATT_IS_EXTERNAL_ONDISK(),
> because on-disk toast data can't be decoded (which is true for heap,
> too).
>
> I haven't looked as much into speculative insertions, but I don't think
> those are a problem either. Shouldn't they be handled before they make
> it into the change stream that the plugin sees?
>

They will be handled before the plugin sees them but I was talking
about what if the table AM has some other WAL similar to WAL of
speculative insertions that needs special handling.

> > Similarly, I remember that for zheap we didn't had
> > TransactionIds for subtransactions so we need to make some changes in
> > logical decoding to compensate for that. I guess similar stuff could
> > be required for another table AMs. Then a different table AM can have
> > a different tuple format which won't work for current change records
> > unless we convert it to heap tuple format before writing WAL for it.
>
> Columnar certainly has a different format. That makes me wonder whether
> ReorderBufferChange and/or ReorderBufferTupleBuf are too low-level, and
> we should have a higher-level representation of a change that is based
> on slots.
>
> Can you tell me more about the changes you made for zheap? I still
> don't understand why the decoding plugin would have to know what table
> AM the change came from.
>

I am not talking about decoding plugin but rather decoding itself,
basically, the work we do in reorderbuffer.c, decode.c, etc. The two
things I remember were tuple format and transaction ids as mentioned
in my previous email. I think we should try to find a solution for
tuple format as the current decoding code relies on it if we want
decoding to deal with another table AMs transparently.

-- 
With Regards,
Amit Kapila.




Re: Logical insert/update/delete WAL records for custom table AMs

2021-11-04 Thread Jeff Davis
On Thu, 2021-11-04 at 14:33 +0530, Amit Kapila wrote:
> Can't different tableAM's have different representations
> for toast or may be some different concept like speculative
> insertions?

The decoding plugin should just use the common interfaces to toast, and
if the table AM supports toast, everything should work fine. The only
special thing it needs to do is check VARATT_IS_EXTERNAL_ONDISK(),
because on-disk toast data can't be decoded (which is true for heap,
too).

I haven't looked as much into speculative insertions, but I don't think
those are a problem either. Shouldn't they be handled before they make
it into the change stream that the plugin sees?

> Similarly, I remember that for zheap we didn't had
> TransactionIds for subtransactions so we need to make some changes in
> logical decoding to compensate for that. I guess similar stuff could
> be required for another table AMs. Then a different table AM can have
> a different tuple format which won't work for current change records
> unless we convert it to heap tuple format before writing WAL for it.

Columnar certainly has a different format. That makes me wonder whether
ReorderBufferChange and/or ReorderBufferTupleBuf are too low-level, and
we should have a higher-level representation of a change that is based
on slots.

Can you tell me more about the changes you made for zheap? I still
don't understand why the decoding plugin would have to know what table
AM the change came from.

Regards,
Jeff Davis






Re: Logical insert/update/delete WAL records for custom table AMs

2021-11-04 Thread Amit Kapila
On Thu, Nov 4, 2021 at 7:09 AM Jeff Davis  wrote:
>
> On Wed, 2021-11-03 at 11:25 +0530, Amit Kapila wrote:
> > You have modeled DecodeLogicalInsert based on current DecodeInsert
> > and
> > it generates the same change message, so not sure how exactly these
> > new messages will be different from current heap_insert/update/delete
> > messages?
>
> Is there a reason you think the messages should be different for heap
> versus another table AM? Isn't the table AM a physical implementation
> detail?
>

We have special handling for speculative insertions and toast
insertions. Can't different tableAM's have different representations
for toast or may be some different concept like speculative
insertions? Similarly, I remember that for zheap we didn't had
TransactionIds for subtransactions so we need to make some changes in
logical decoding to compensate for that. I guess similar stuff could
be required for another table AMs. Then a different table AM can have
a different tuple format which won't work for current change records
unless we convert it to heap tuple format before writing WAL for it.

-- 
With Regards,
Amit Kapila.




Re: Logical insert/update/delete WAL records for custom table AMs

2021-11-03 Thread Jeff Davis
On Wed, 2021-11-03 at 11:25 +0530, Amit Kapila wrote:
> You have modeled DecodeLogicalInsert based on current DecodeInsert
> and
> it generates the same change message, so not sure how exactly these
> new messages will be different from current heap_insert/update/delete
> messages? 

Is there a reason you think the messages should be different for heap
versus another table AM? Isn't the table AM a physical implementation
detail?

Regards,
Jeff Davis






Re: Logical insert/update/delete WAL records for custom table AMs

2021-11-02 Thread Amit Kapila
On Sun, Oct 31, 2021 at 11:40 PM Jeff Davis  wrote:
>
> Attached is a WIP patch to add new WAL records to represent a logical
> insert, update, or delete. These records do not do anything at REDO
> time, they are only processed during logical decoding/replication.
>
> These are intended to be used by a custom table AM, like my columnar
> compression extension[0], which currently supports physical replication
> but can't support logical decoding/replication because decoding is not
> extensible. Using these new logical records would be redundant, making
> inserts/updates/deletes less efficient, but at least logical decoding
> would work (the lack of which is columnar's biggest weakness).
>
> Alternatively, we could support extensible WAL with extensible
> decoding. I also like this approach, but it takes more work for an AM
> like columnar to get that right -- it needs to keep additional state in
> the walsender to track and assemble the compressed columns stored
> across many blocks. It also requires a lot of care, because mistakes
> can get you into serious trouble.
>
> This proposal, for new logical records without WAL extensibility,
> provides a more shallow ramp to get a table AM working (including
> logical replication/decoding) without the need to invest in the WAL
> design. Later, of course I'd like the option for extensible WAL as well
> (to be more efficient), but right now I'd prefer it just worked
> (inefficiently).
>

You have modeled DecodeLogicalInsert based on current DecodeInsert and
it generates the same change message, so not sure how exactly these
new messages will be different from current heap_insert/update/delete
messages? Also, we have code to deal with different types of messages
at various places during decoding, so if they will be different then
we need to probably deal at those places as well.

-- 
With Regards,
Amit Kapila.




Logical insert/update/delete WAL records for custom table AMs

2021-10-31 Thread Jeff Davis
Attached is a WIP patch to add new WAL records to represent a logical
insert, update, or delete. These records do not do anything at REDO
time, they are only processed during logical decoding/replication.

These are intended to be used by a custom table AM, like my columnar
compression extension[0], which currently supports physical replication
but can't support logical decoding/replication because decoding is not
extensible. Using these new logical records would be redundant, making
inserts/updates/deletes less efficient, but at least logical decoding
would work (the lack of which is columnar's biggest weakness).

Alternatively, we could support extensible WAL with extensible
decoding. I also like this approach, but it takes more work for an AM
like columnar to get that right -- it needs to keep additional state in
the walsender to track and assemble the compressed columns stored
across many blocks. It also requires a lot of care, because mistakes
can get you into serious trouble.

This proposal, for new logical records without WAL extensibility,
provides a more shallow ramp to get a table AM working (including
logical replication/decoding) without the need to invest in the WAL
design. Later, of course I'd like the option for extensible WAL as well
(to be more efficient), but right now I'd prefer it just worked
(inefficiently).

The patch is still very rough, but I tried in simple insert cases in my
columnar[0] extension (which only supports insert, not update/delete).
I'm looking for some review on the approach and structure before I
polish and test it. Note that my main test case is columnar, which
doesn't support update/delete. Also note that the patch is against v14
(for now).

Regards,
Jeff Davis

[0] https://github.com/citusdata/citus/tree/master/src/backend/columnar
diff --git a/src/backend/access/rmgrdesc/Makefile b/src/backend/access/rmgrdesc/Makefile
index f88d72fd862..ed6dff179be 100644
--- a/src/backend/access/rmgrdesc/Makefile
+++ b/src/backend/access/rmgrdesc/Makefile
@@ -18,7 +18,7 @@ OBJS = \
 	gistdesc.o \
 	hashdesc.o \
 	heapdesc.o \
-	logicalmsgdesc.o \
+	logicaldesc.o \
 	mxactdesc.o \
 	nbtdesc.o \
 	relmapdesc.o \
diff --git a/src/backend/access/rmgrdesc/logicalmsgdesc.c b/src/backend/access/rmgrdesc/logicaldesc.c
similarity index 59%
rename from src/backend/access/rmgrdesc/logicalmsgdesc.c
rename to src/backend/access/rmgrdesc/logicaldesc.c
index d64ce2e7eff..079ab5a847e 100644
--- a/src/backend/access/rmgrdesc/logicalmsgdesc.c
+++ b/src/backend/access/rmgrdesc/logicaldesc.c
@@ -1,22 +1,22 @@
 /*-
  *
- * logicalmsgdesc.c
- *	  rmgr descriptor routines for replication/logical/message.c
+ * logicaldesc.c
+ *	  rmgr descriptor routines for replication/logical/logical_xlog.c
  *
  * Portions Copyright (c) 2015-2021, PostgreSQL Global Development Group
  *
  *
  * IDENTIFICATION
- *	  src/backend/access/rmgrdesc/logicalmsgdesc.c
+ *	  src/backend/access/rmgrdesc/logicaldesc.c
  *
  *-
  */
 #include "postgres.h"
 
-#include "replication/message.h"
+#include "replication/logical_xlog.h"
 
 void
-logicalmsg_desc(StringInfo buf, XLogReaderState *record)
+logical_desc(StringInfo buf, XLogReaderState *record)
 {
 	char	   *rec = XLogRecGetData(record);
 	uint8		info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
@@ -40,13 +40,42 @@ logicalmsg_desc(StringInfo buf, XLogReaderState *record)
 			sep = " ";
 		}
 	}
+	else if (info == XLOG_LOGICAL_INSERT)
+	{
+
+	}
+	else if (info == XLOG_LOGICAL_UPDATE)
+	{
+
+	}
+	else if (info == XLOG_LOGICAL_DELETE)
+	{
+
+	}
+	else if (info == XLOG_LOGICAL_TRUNCATE)
+	{
+
+	}
 }
 
 const char *
-logicalmsg_identify(uint8 info)
+logical_identify(uint8 info)
 {
-	if ((info & ~XLR_INFO_MASK) == XLOG_LOGICAL_MESSAGE)
-		return "MESSAGE";
+	switch (info)
+	{
+		case XLOG_LOGICAL_MESSAGE:
+			return "MESSAGE";
+		case XLOG_LOGICAL_INSERT:
+			return "INSERT";
+		case XLOG_LOGICAL_UPDATE:
+			return "UPDATE";
+		case XLOG_LOGICAL_DELETE:
+			return "DELETE";
+		case XLOG_LOGICAL_TRUNCATE:
+			return "TRUNCATE";
+		default:
+			return NULL;
+	}
 
 	return NULL;
 }
diff --git a/src/backend/access/transam/rmgr.c b/src/backend/access/transam/rmgr.c
index 58091f6b520..f31f7187ac4 100644
--- a/src/backend/access/transam/rmgr.c
+++ b/src/backend/access/transam/rmgr.c
@@ -24,7 +24,7 @@
 #include "commands/dbcommands_xlog.h"
 #include "commands/sequence.h"
 #include "commands/tablespace.h"
-#include "replication/message.h"
+#include "replication/logical_xlog.h"
 #include "replication/origin.h"
 #include "storage/standby.h"
 #include "utils/relmapper.h"
diff --git a/src/backend/replication/logical/Makefile b/src/backend/replication/logical/Makefile
index c4e2fdeb719..9fa281a1dca 100644
--- a/src/backend/replication/logical/Makefile
+++ b/src/backend/replication/logical/Makefile
@@ -19,7 +19,7 @@ OBJS = \
 	launcher.o \