Re: Replica Identity check of partition table on subscriber

2022-06-19 Thread Amit Kapila
On Fri, Jun 17, 2022 at 11:22 AM shiy.f...@fujitsu.com
 wrote:
>
> On Fri Jun 17, 2022 11:06 AM shiy.f...@fujitsu.com  
> wrote:
> >
> > Attached the new version of patch set. I also moved the partitioned table
> > check
> > in logicalrep_rel_mark_updatable() to check_relation_updatable() as
> > discussed
> > [2].
> >
>
> Attached back-branch patches of the first patch.
>

One minor comment:
+ /*
+ * If it is a partitioned table, we don't check it, we will check its
+ * partition later.
+ */

Can we change the above comment to: "For partitioned tables, we only
need to care if the target partition is updatable (aka has PK or RI
defined for it)."?
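For reference, a minimal sketch of how the revised comment could sit above the
early return (the surrounding code is an assumption here, based on the check
living in check_relation_updatable() with the entry variable named "rel"; it is
not quoted from the patch):

	/*
	 * For partitioned tables, we only need to care if the target partition
	 * is updatable (aka has PK or RI defined for it).
	 */
	if (rel->localrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
		return;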

Apart from this, the patch looks good to me. I'll push it tomorrow unless
there are any more suggestions/comments.

-- 
With Regards,
Amit Kapila.




Re: Non-replayable WAL records through overflows and >MaxAllocSize lengths

2022-06-19 Thread Michael Paquier
On Sat, Jun 11, 2022 at 09:25:48PM +0200, Matthias van de Meent wrote:
> Did you initiate a new cluster or otherwise skip the invalid record
> you generated when running the instance based on master? It seems to
> me you're trying to replay the invalid record (len > MaxAllocSize),
> and this patch does not try to fix that issue. This patch just tries
> to forbid emitting records larger than MaxAllocSize, as per the check
> in XLogRecordAssemble, so that we won't emit unreadable records into
> the WAL anymore.
> 
> Reading unreadable records still won't be possible, but that's also
> not something I'm trying to fix.

As long as you cannot generate such WAL records, that should be fine:
WAL is not reused across upgrades, so this kind of restriction is a
no-brainer on HEAD.  The back-patching argument is off the table
anyway, as some of the routine signatures change to unsigned arguments
because of those safety checks.

+   if (unlikely(num_rdatas >= max_rdatas) ||
+   unlikely(!AllocSizeIsValid((uint64) mainrdata_len + (uint64) len)))
+   XLogErrorDataLimitExceeded();
[...]
+inline void
+XLogErrorDataLimitExceeded()
+{
+   elog(ERROR, "too much WAL data");
+}
The three checks are different, OK..  Note that static is missing.

+   if (unlikely(!AllocSizeIsValid(total_len)))
+   XLogErrorDataLimitExceeded();
Rather than doing a single check at the end of XLogRecordAssemble(),
wouldn't it be better to check each time total_len is added to?
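To illustrate the suggestion, here is a minimal sketch of the incremental
style (the exact placement inside XLogRecordAssemble()'s accumulation code and
the rdt variable are assumptions, not quoted from the patch):

    /* Hypothetical: validate the running total every time it is bumped. */
    total_len += rdt->len;
    if (unlikely(!AllocSizeIsValid(total_len)))
        XLogErrorDataLimitExceeded();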
--
Michael


signature.asc
Description: PGP signature


RE: Handle infinite recursion in logical replication setup

2022-06-19 Thread shiy.f...@fujitsu.com
On Thu, Jun 16, 2022 6:18 PM vignesh C  wrote:
> 
> Thanks for the comments, the attached v21 patch has the changes for the
> same.
> 

Thanks for updating the patch. Here are some comments.

0002 patch
==
1.
+  publisher to only send changes that originated locally. Setting
+  origin to any means that that
+  the publisher sends any changes regardless of their origin. The
+  default is any.

It seems there's a redundant "that" at the end of the second line.

2.
+
+  
+   suborigin text
+  
+  
+   Possible origin values are local or
+   any. The default is any.
+   If local, the subscription will request the publisher
+   to only send changes that originated locally. If any
+   the publisher sends any changes regardless of their origin.
+  
+ .

A comma can be added after "If any".

3.
@@ -4589,6 +4598,8 @@ dumpSubscription(Archive *fout, const SubscriptionInfo *subinfo)
 	if (strcmp(subinfo->subdisableonerr, "t") == 0)
 		appendPQExpBufferStr(query, ", disable_on_error = true");
 
+	appendPQExpBuffer(query, ", origin = %s", subinfo->suborigin);
+
 	if (strcmp(subinfo->subsynccommit, "off") != 0)
 		appendPQExpBuffer(query, ", synchronous_commit = %s",
 						  fmtId(subinfo->subsynccommit));

Do we need to append anything if it's the default value ("any")? I saw that
some other parameters are appended only if they are not the default value.
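Just to illustrate the idea, a sketch of the conditional append, mirroring how
nearby non-default parameters are handled (whether comparing against the
string "any" is the right test is an assumption):

	if (strcmp(subinfo->suborigin, "any") != 0)
		appendPQExpBuffer(query, ", origin = %s", subinfo->suborigin);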

0003 patch
==
1. 
in create_subscription.sgml:
  (You cannot combine setting connect
  to false with
  setting create_slot, enabled,
  or copy_data to true.)

In this description of the "connect" parameter in the CREATE SUBSCRIPTION
document, maybe it would be better to change "copy_data to true" to
"copy_data to true/force".

2.
+	appendStringInfoString(,
+						   "SELECT DISTINCT N.nspname AS schemaname,\n"
+						   "                C.relname AS tablename,\n"
+						   "                PS.srrelid as replicated\n"
+						   "FROM pg_publication P,\n"
+						   "     LATERAL pg_get_publication_tables(P.pubname) GPT\n"
+						   "     LEFT JOIN pg_subscription_rel PS ON (GPT.relid = PS.srrelid),\n"
+						   "     pg_class C JOIN pg_namespace N ON (N.oid = C.relnamespace)\n"
+						   "WHERE C.oid = GPT.relid AND P.pubname IN (");

"PS.srrelid as replicated" can be modified to "PS.srrelid AS replicated".

Besides, I think we can filter out the tables which are not subscribing data
in this SQL statement; then the later processing can be simplified.

Something like:
SELECT DISTINCT N.nspname AS schemaname,
C.relname AS tablename
FROM pg_publication P,
 LATERAL pg_get_publication_tables(P.pubname) GPT
 LEFT JOIN pg_subscription_rel PS ON (GPT.relid = PS.srrelid),
 pg_class C JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE C.oid = GPT.relid AND P.pubname IN ('pa') AND PS.srrelid IS NOT NULL;

0004 patch
==
1. Generic steps for adding a new node to an existing set of nodes

+Step-2: Lock the required tables of the new node in EXCLUSIVE mode until
+the setup is complete. (This lock is necessary to prevent any modifications

+Step-4. Lock the required tables of the existing nodes except the first node
+in EXCLUSIVE mode until the setup is complete. (This lock is necessary to

Should "in EXCLUSIVE mode" be modified to "in EXCLUSIVE
mode"?

2. Generic steps for adding a new node to an existing set of nodes

+data. There is no need to lock the required tables on
+node1 because any data changes made will be synchronized
+while creating the subscription with copy_data = force).

I think it would be better to say "on the first node" here instead of
"node1", because node1 has not been mentioned before in this section.

Regards,
Shi yu




RE: Support logical replication of DDLs

2022-06-19 Thread houzj.f...@fujitsu.com
On Saturday, June 18, 2022 3:38 AM Zheng Li  wrote:
> On Wed, Jun 15, 2022 at 12:00 AM Amit Kapila 
> wrote:
> >
> > On Wed, Jun 15, 2022 at 5:44 AM Zheng Li  wrote:
> > >
> > >
> > > While I agree that the deparser is needed to handle the potential
> > > syntax differences between the pub/sub, I think it's only relevant
> > > for the use cases where only a subset of tables in the database are
> > > replicated. For other use cases where all tables, functions and
> > > other objects need to be replicated, (for example, creating a
> > > logical replica for major version upgrade) there won't be any syntax
> > > difference to handle and the schemas are supposed to match exactly
> > > between the pub/sub. In other words the user seeks to create an
> > > identical replica of the source database and the DDLs should be
> > > replicated as is in this case.
> > >
> >
> > I think even for database-level replication we can't assume that
> > source and target will always have the same data in which case "Create
> > Table As ..", "Alter Table .. " kind of statements can't be replicated
> > as it is because that can lead to different results.
> "Create Table As .." is already handled by setting the skipData flag of the
> statement parsetree before replay:
> 
> /*
> * Force skipping data population to avoid data inconsistency.
> * Data should be replicated from the publisher instead.
> */
> castmt->into->skipData = true;
> 
> "Alter Table .. " that rewrites with volatile expressions can also be handled
> without any syntax change, by enabling the table rewrite replication and
> converting the rewrite inserts to updates. ZJ's patch introduced this 
> solution.
> I've also adopted this approach in my latest patch
> 0012-Support-replication-of-ALTER-TABLE-commands-that-rew.patch
> 
> > The other point
> > is tomorrow we can extend the database level option/syntax to exclude
> > a few objects (something like [1]) as well in which case we again need
> > to filter at the publisher level
> 
> I think for such cases it's not full database replication and we could treat 
> it as
> table level DDL replication, i.e. use the deparser format.

Hi,

Here are some points in my mind about the two approaches discussed here.

1) search_path vs. schema qualification

Again, I still think schema-qualifying the objects in the DDL command will
bring more flexibility and security, as mentioned before[1].

Besides, a schema-qualified DDL is also more appropriate for other use
cases (e.g. table-level replication). The schema can differ between pub/sub,
and it's easy to cause unexpected and undetectable failures if we just log
the search_path.

It makes more sense to me to have the same style of WAL log (schema-qualified)
for both database-level and table-level replication, as it will bring more
flexibility.


> "Create Table As .." is already handled by setting the skipData flag of the
> statement parsetree before replay:

2) About the handling of CREATE TABLE AS:

I think setting the skipData flag on the subscriber is not an appropriate
approach, as it cannot handle the EXECUTE command in CTAS.

CREATE TABLE q5_prep_results AS EXECUTE q5(200, 'DT');

The prepared statement is a temporary object which we don't replicate, so if
you directly execute the original SQL on the subscriber, it will fail even if
you set skipData.
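For example (a hypothetical session; the table and statement names are made
up, only the EXECUTE case from above is real):

-- On the publisher:
PREPARE q5 (int, text) AS SELECT * FROM t WHERE a = $1 AND b = $2;
CREATE TABLE q5_prep_results AS EXECUTE q5(200, 'DT');

-- Replaying the same text on the subscriber fails before skipData even
-- matters, because "q5" only ever existed in the publisher's session:
-- ERROR:  prepared statement "q5" does not exist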

I think it is difficult to make this work, as you would need to handle the
create/drop of this prepared statement. And even if we extended the
subscriber's code to make it work, it doesn't seem like a standard and
elegant approach.


> "Alter Table .. " that rewrites with volatile expressions can also be handled
> without any syntax change, by enabling the table rewrite replication and
> converting the rewrite inserts to updates. ZJ's patch introduced this 
> solution.

3) About the handling of ALTER TABLE rewrite.

The approach I proposed before is based on the event trigger + deparser
approach. We were able to improve it because we don't need to replicate the
rewrite in many cases. For example, we don't need to replicate the rewrite
DML if there is no volatile/mutable function. We should check and filter
these cases at the publisher (e.g. via the deparser) instead of checking
that at the subscriber.
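As a hypothetical example of the case that still needs the rewrite
replicated (table and column names made up):

-- clock_timestamp() is volatile, so the rewritten rows can differ if the
-- DDL is simply re-executed on the subscriber; with a non-volatile default
-- the rewrite DML would not need to be replicated.
ALTER TABLE orders ADD COLUMN created_at timestamptz DEFAULT clock_timestamp();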

Besides, as discussed, we need to give a warning or error for the cases where
the DDL contains a volatile function that would be executed[2]. We should
check this at the publisher as well (via the deparser).


> I think for such cases it's not full database replication and we could treat 
> it as
> table level DDL replication, i.e. use the deparser format.

4) I think the point could be that we should make the WAL log format
extendable so that we can extend it to support more useful features (table
filter/schema maps/DDL filter). If we just WAL-log the original SQL, it seems
difficult to extend it in the future.

[1] 
https://www.postgresql.org/message-id/202204081134.6tcmf5cxl3sz%40alvherre.pgsql
[2] 

Re: Perform streaming logical transactions by background workers and parallel apply

2022-06-19 Thread Amit Kapila
On Fri, Jun 17, 2022 at 12:47 PM wangw.f...@fujitsu.com
 wrote:
>
> On Wed, Jun 15, 2022 at 8:13 PM Amit Kapila  wrote:
> > Few questions/comments on 0001
> > ===
> Thanks for your comments.
>
> > 1.
> > In the commit message, I see: "We also need to allow stream_stop to
> > complete by the apply background worker to avoid deadlocks because
> > T-1's current stream of changes can update rows in conflicting order
> > with T-2's next stream of changes."
> >
> > Thinking about this, won't the T-1 and T-2 deadlock on the publisher
> > node as well if the above statement is true?
> Yes, I think so.
> I think if the table's unique index/constraint are consistent between the
> publisher and the subscriber, the deadlock will occur on the publisher side.
> If they are inconsistent, the deadlock may only occur on the subscriber. But
> since we added a check for these (see patch 0004), it seems okay to not
> handle this at STREAM_STOP.
>
> BTW, I made the following improvements to the code (#a, #c are improved in 
> 0004
> patch, #b, #d and #e are improved in 0001 patch.) :
> a.
> I added some comments in the function apply_handle_stream_stop to explain why
> we do not need to allow stream_stop to complete by the apply background 
> worker.
>

I have improved the comments in this and other related sections of the
patch. See attached.

>
>
> > 3.
> > +
> > +  
> > +   Setting streaming mode to apply could export invalid 
> > LSN
> > +   as finish LSN of failed transaction. Changing the streaming mode and 
> > making
> > +   the same conflict writes the finish LSN of the failed transaction in the
> > +   server log if required.
> > +  
> >
> > How will the user identify that this is an invalid LSN value and she
> > shouldn't use it to SKIP the transaction? Can we change the second
> > sentence to: "User should change the streaming mode to 'on' if they
> > would instead wish to see the finish LSN on error. Users can use
> > finish LSN to SKIP applying the transaction." I think we can give
> > reference to docs where the SKIP feature is explained.
> Improved the sentence as suggested.
>

You haven't answered the first part of the comment: "How will the user
identify that this is an invalid LSN value and she shouldn't use it to
SKIP the transaction?". Have you checked what value it displays? For
example, in one of the cases in apply_error_callback, as shown in the code
below, we don't even display the finish LSN if it is invalid.
else if (XLogRecPtrIsInvalid(errarg->finish_lsn))
    errcontext("processing remote data for replication origin \"%s\" during \"%s\" in transaction %u",
               errarg->origin_name,
               logicalrep_message_type(errarg->command),
               errarg->remote_xid);

-- 
With Regards,
Amit Kapila.


improve_comments_1.patch
Description: Binary data


Re: [PATCH] Completed unaccent dictionary with many missing characters

2022-06-19 Thread Michael Paquier
On Wed, Jun 15, 2022 at 01:01:37PM +0200, Przemysław Sztoch wrote:
> Two fixes (bad comment and fixed Latin-ASCII.xml).

 if codepoint.general_category.startswith('L') and \
-   len(codepoint.combining_ids) > 1:
+   len(codepoint.combining_ids) > 0:
So, this one checks for the case where a codepoint is within the
letter category.  As far as I can see this indeed adds a couple of
characters, with a combination of Greek and Latin letters.  So that
looks fine.

+elif codepoint.general_category.startswith('N') and \
+   len(codepoint.combining_ids) > 0 and \
+   args.noLigaturesExpansion is False and is_ligature(codepoint, table):
+charactersSet.add((codepoint.id,
+                   "".join(chr(combining_codepoint.id)
+                           for combining_codepoint
+                           in get_plain_letters(codepoint, table))))
And this one is for the numerical part of the change.  Do you actually
need to apply is_ligature() here?  I would have thought that this only
applies to letters.
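As a side note, here is a quick way to see the sort of codepoint the numeric
branch matches (a standalone snippet using the standard unicodedata module,
not part of the generation script):

import unicodedata

half = chr(0x00BD)                       # VULGAR FRACTION ONE HALF
print(unicodedata.category(half))        # 'No' -- general category starts with 'N'
print(unicodedata.decomposition(half))   # '<fraction> 0031 2044 0032'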

-assert(False)
+assert False, 'Codepoint U+%0.2X' % codepoint.id
[...]
-assert(is_ligature(codepoint, table))
+assert is_ligature(codepoint, table), 'Codepoint U+%0.2X' % codepoint.id
These two are a good idea for debugging.

-return all(is_letter(table[i], table) for i in codepoint.combining_ids)
+return all(i in table and is_letter(table[i], table) for i in codepoint.combining_ids)
It looks like this makes the code weaker, as we would silently skip
characters that are not part of the table rather than checking for
them all the time?

While recreating unaccent.rules with your patch, I have noticed what
looks like an error.  An extra rule mapping U+210C (black-letter
capital h) to "x" gets added on top of te existing one for "H", but
the correct answer is the existing rule, not the one added by the
patch.
--
Michael


signature.asc
Description: PGP signature


Re: [PoC] Improve dead tuple storage for lazy vacuum

2022-06-19 Thread Masahiko Sawada
Hi,

On Thu, Jun 16, 2022 at 4:30 PM John Naylor
 wrote:
>
> On Thu, Jun 16, 2022 at 11:57 AM Masahiko Sawada  
> wrote:
> > I've attached an updated version patch that changes the configure
> > script. I'm still studying how to support AVX2 on msvc build. Also,
> > added more regression tests.
>
> Thanks for the update, I will take a closer look at the patch in the
> near future, possibly next week.

Thanks!

> For now, though, I'd like to question
> why we even need to use 32-byte registers in the first place. For one,
> the paper referenced has 16-pointer nodes, but none for 32 (next level
> is 48 and uses a different method to find the index of the next
> pointer). Andres' prototype has 32-pointer nodes, but in a quick read
> of his patch a couple weeks ago I don't recall a reason mentioned for
> it.

I might be wrong, but since the AVX2 instruction set was introduced with the
Haswell microarchitecture in 2013 and the referenced paper was published in
the same year, the paper didn't use AVX2. 32-pointer nodes are better from a
memory perspective, as you mentioned. Andres' prototype supports both
16-pointer and 32-pointer nodes (out of 6 node types). This provides better
memory usage, but on the other hand it also brings the overhead of switching
node types. Anyway, which node sizes to support is an important design
decision; it should be made based on experiment results and documented.

> Even if 32-pointer nodes are better from a memory perspective, I
> imagine it should be possible to use two SSE2 registers to find the
> index. It'd be locally slightly more complex, but not much. It might
> not even cost much more in cycles since AVX2 would require indirecting
> through a function pointer. It's much more convenient if we don't need
> a runtime check.

Right.
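For what it's worth, a rough sketch of finding a key among 32 child keys with
two 128-bit SSE2 registers (the node layout, function name, and use of a GCC
builtin are assumptions, not taken from the patch):

#include <emmintrin.h>
#include <stdint.h>

/* Locate 'key' among 32 child keys using two SSE2 compares. */
static int
node32_find_index(const uint8_t keys[32], uint8_t key)
{
    __m128i  needle = _mm_set1_epi8((char) key);
    __m128i  lo = _mm_loadu_si128((const __m128i *) keys);
    __m128i  hi = _mm_loadu_si128((const __m128i *) (keys + 16));
    uint32_t mask;

    mask = (uint32_t) _mm_movemask_epi8(_mm_cmpeq_epi8(needle, lo));
    mask |= (uint32_t) _mm_movemask_epi8(_mm_cmpeq_epi8(needle, hi)) << 16;

    return mask ? __builtin_ctz(mask) : -1;    /* -1 when the key is absent */
}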

> There are also thermal and power disadvantages when
> using AVX2 in some workloads. I'm not sure that's the case here, but
> if it is, we'd better be getting something in return.

Good point.

> One more thing in general: In an earlier version, I noticed that
> Andres used the slab allocator and documented why. The last version of
> your patch that I saw had the same allocator, but not the "why".
> Especially in early stages of review, we want to document design
> decisions so it's more clear for the reader.

Indeed. I'll add comments in the next version of the patch.

Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: Add header support to text format and matching feature

2022-06-19 Thread Michael Paquier
On Thu, Jun 16, 2022 at 09:24:56AM +0200, Peter Eisentraut wrote:
> I don't feel very strongly about this.  It makes sense to stay consistent
> with the existing COPY code.

Yes, my previous argument is based on consistency with the
surroundings.  I am not saying that this could not be made better, it
surely can, but I would recommend tackling that in a separate patch
and applying it to more areas than this specific one.
--
Michael


signature.asc
Description: PGP signature


Re: libpq: Remove redundant null pointer checks before free()

2022-06-19 Thread Tom Lane
Alvaro Herrera  writes:
> For PQclear() specifically, one thing that I thought a few days ago
> would be useful would to have it return (PGresult *) NULL.  Then the
> very common pattern of doing "PQclear(res); res = NULL;" could be
> simplified to "res = PQclear(res);", which is nicely compact and is
> learned instantly.

That's a public API unfortunately, and so some people would demand
a libpq.so major version bump if we changed it.

regards, tom lane




Re: libpq: Remove redundant null pointer checks before free()

2022-06-19 Thread Alvaro Herrera
On 2022-Jun-17, Peter Eisentraut wrote:

> From 355ef1a68be690d9bb8ee0524226abd648733ce0 Mon Sep 17 00:00:00 2001
> From: Peter Eisentraut 
> Date: Fri, 17 Jun 2022 12:09:32 +0200
> Subject: [PATCH v2 3/3] Remove redundant null pointer checks before PQclear
>  and PQconninfofree
> 
> These functions already had the free()-like behavior of handling NULL
> pointers as a no-op.  But it wasn't documented, so add it explicitly
> to the documentation, too.

For PQclear() specifically, one thing that I thought a few days ago
would be useful would be to have it return (PGresult *) NULL.  Then the
very common pattern of doing "PQclear(res); res = NULL;" could be
simplified to "res = PQclear(res);", which is nicely compact and is
learned instantly.

I've not seen this convention used anywhere else though, and I'm not
sure I'd advocate it for other functions where we use similar patterns
such as pfree/pg_free, so perhaps it'd become too much of a special
case.
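For completeness, the same calling pattern can be had today with an
application-side helper, without touching the libpq ABI (the helper name is
hypothetical):

#include <libpq-fe.h>

static inline PGresult *
pqClearAndNull(PGresult *res)
{
    PQclear(res);           /* PQclear() already accepts NULL, per the patch */
    return NULL;
}

/* usage: res = pqClearAndNull(res); */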

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/