Re: Logical replication - initial data synchronization

Bruce Momjian Wed, 16 Oct 2024 18:20:54 -0700

On Sat, May 18, 2024 at 09:02:11PM +0000, PG Doc comments form wrote:
> The following documentation comment has been logged on the website:
> 
> Page: https://www.postgresql.org/docs/16/logical-replication-subscription.html
> Description:
> 
> I'm reading up on Logical Replication and have been reading the pages in
> order.
> 
> The first 2 pages:
> https://www.postgresql.org/docs/current/logical-replication.html and
> https://www.postgresql.org/docs/current/logical-replication-publication.html
> both speak of the requirement to set up a snapshot and explain that
> publication will then send further updates as they happen to subscribers.
> 
> But the 3rd page,
> https://www.postgresql.org/docs/current/logical-replication-subscription.html
> now mentions this: "Additional replication slots may be required for the
> initial data synchronization of pre-existing table data and those will be
> dropped at the end of data synchronization."
> 
> For me, reading the first 2 pages implied that I would have to perform some
> manual command that starts the creation of a snapshot of pre-existing table
> data, and unpack this on the subscriber node somehow.
> 
> The text on the "Subscription" page sounds to me like this is actually
> something the publisher <-> subscriber model of the postgres software can
> manage on its own, as opposed to a snapshot, which feels more like the
> concept of a basebackup.
> 
> Regardless of that being correct or not, my current impression is that the
> description isn't consistent across pages. Maybe the text is obvious for
> people who've performed setup of logical replication before, but I have
> never done this. To me, the description on the first 2 pages seems
> inconsistent with the description I just encountered on the 3rd page. I was
> under the impression there was no such thing as "initial data
> synchronization of pre-existing table data" in terms of postgres doing this
> by itself.
> 
> Am I missing something extremely simple, or can the description of the
> involved operations be made more consistent across documentation pages?


Is the attached patch an improvement?

-- 
  Bruce Momjian  <[email protected]>        https://momjian.us
  EDB                                      https://enterprisedb.com

  When a patient asks the doctor, "Am I going to die?", he means 
  "Am I going to die soon?"

diff --git a/doc/src/sgml/logical-replication.sgml b/doc/src/sgml/logical-replication.sgml
index 98a7ad0c272..cba15fce908 100644
--- a/doc/src/sgml/logical-replication.sgml
+++ b/doc/src/sgml/logical-replication.sgml
@@ -24,9 +24,9 @@
  </para>
 
  <para>
-  Logical replication of a table typically starts with taking a snapshot
+  Internally logical replication of a table starts by taking a snapshot
   of the data on the publisher database and copying that to the subscriber.
-  Once that is done, the changes on the publisher are sent to the subscriber
+  Once complete, the changes on the publisher are sent to the subscriber
   as they occur in real-time.  The subscriber applies the data in the same
   order as the publisher so that transactional consistency is guaranteed for
   publications within a single subscription.  This method of data replication
@@ -165,7 +165,7 @@
    The individual tables can be added and removed dynamically using
    <link linkend="sql-alterpublication"><command>ALTER PUBLICATION</command></link>.  Both the <literal>ADD
    TABLE</literal> and <literal>DROP TABLE</literal> operations are
-   transactional; so the table will start or stop replicating at the correct
+   transactional, so the table will start or stop replicating at the correct
    snapshot once the transaction has committed.
   </para>
  </sect1>
@@ -1953,8 +1953,8 @@ CONTEXT:  processing remote data for replication origin "pg_16395" during "INSER
   <title>Architecture</title>
 
   <para>
-   Logical replication starts by copying a snapshot of the data on the
-   publisher database.  Once that is done, changes on the publisher are sent
+   Internally logical replication starts by copying a snapshot of the data on the
+   publisher database.  Once complete, changes on the publisher are sent
    to the subscriber as they occur in real time.  The subscriber applies data
    in the order in which commits were made on the publisher so that
    transactional consistency is guaranteed for the publications within any
