Re: Replica Identity check of partition table on subscriber
On Fri, Jun 17, 2022 at 11:22 AM shiy.f...@fujitsu.com wrote:
>
> On Fri Jun 17, 2022 11:06 AM shiy.f...@fujitsu.com wrote:
> >
> > Attached the new version of patch set. I also moved the partitioned
> > table check in logicalrep_rel_mark_updatable() to
> > check_relation_updatable() as discussed [2].
>
> Attached back-branch patches of the first patch.

One minor comment:

+	/*
+	 * If it is a partitioned table, we don't check it, we will check its
+	 * partition later.
+	 */

Can we change the above comment to: "For partitioned tables, we only need
to care if the target partition is updatable (aka has PK or RI defined for
it)."?

Apart from this looks good to me. I'll push this tomorrow unless there are
any more suggestions/comments.

--
With Regards,
Amit Kapila.
Re: Non-replayable WAL records through overflows and >MaxAllocSize lengths
On Sat, Jun 11, 2022 at 09:25:48PM +0200, Matthias van de Meent wrote:
> Did you initiate a new cluster or otherwise skip the invalid record
> you generated when running the instance based on master? It seems to
> me you're trying to replay the invalid record (len > MaxAllocSize),
> and this patch does not try to fix that issue. This patch just tries
> to forbid emitting records larger than MaxAllocSize, as per the check
> in XLogRecordAssemble, so that we won't emit unreadable records into
> the WAL anymore.
>
> Reading unreadable records still won't be possible, but that's also
> not something I'm trying to fix.

As long as you cannot generate such WAL records that should be fine, as
WAL is not reused across upgrades, so this kind of restriction is a
no-brainer on HEAD. The back-patching argument is not on the table
anyway, as some of the routine signatures change with the unsigned
arguments because of those safety checks.

+	if (unlikely(num_rdatas >= max_rdatas) ||
+		unlikely(!AllocSizeIsValid((uint64) mainrdata_len + (uint64) len)))
+		XLogErrorDataLimitExceeded();
[...]
+inline void
+XLogErrorDataLimitExceeded()
+{
+	elog(ERROR, "too much WAL data");
+}

The three checks are different, OK.. Note that "static" is missing.

+	if (unlikely(!AllocSizeIsValid(total_len)))
+		XLogErrorDataLimitExceeded();

Rather than a single check at the end of XLogRecordAssemble(), wouldn't it
be better to check each time total_len is added up?
--
Michael
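For readers following along, here is a minimal standalone sketch (not PostgreSQL source) of the overflow-safe pattern the quoted hunk uses: widen both operands to uint64 *before* adding, so that the sum of two large 32-bit lengths cannot wrap around and slip past the allocation limit. The constant mirrors PostgreSQL's MaxAllocSize (0x3fffffff), but the function names here are illustrative, not the real API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mirrors PostgreSQL's MaxAllocSize; the names below are illustrative. */
#define MAX_ALLOC_SIZE ((uint64_t) 0x3fffffff)

static bool
alloc_size_is_valid(uint64_t size)
{
	return size <= MAX_ALLOC_SIZE;
}

/*
 * Returns true if appending 'len' bytes to 'total' stays within bounds.
 * The casts matter: computed in uint32 arithmetic, (total + len) could
 * wrap around to a small value and falsely pass the check.
 */
static bool
record_size_ok(uint32_t total, uint32_t len)
{
	return alloc_size_is_valid((uint64_t) total + (uint64_t) len);
}
```

With the uint64 widening, a sum that would have wrapped in 32 bits (e.g. 0xffffffff + 2) is correctly rejected rather than wrapping to 1 and passing.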
RE: Handle infinite recursion in logical replication setup
On Thu, Jun 16, 2022 6:18 PM vignesh C wrote:
>
> Thanks for the comments, the attached v21 patch has the changes for the
> same.

Thanks for updating the patch. Here are some comments.

0002 patch
==========
1.
+          publisher to only send changes that originated locally. Setting
+          origin to any means that that
+          the publisher sends any changes regardless of their origin. The
+          default is any.

It seems there's a redundant "that" at the end of the second line.

2.
+
+      suborigin text
+
+      Possible origin values are local or
+      any. The default is any.
+      If local, the subscription will request the publisher
+      to only send changes that originated locally. If any
+      the publisher sends any changes regardless of their origin.
+

A comma can be added after "If any".

3.
@@ -4589,6 +4598,8 @@ dumpSubscription(Archive *fout, const SubscriptionInfo *subinfo)
 	if (strcmp(subinfo->subdisableonerr, "t") == 0)
 		appendPQExpBufferStr(query, ", disable_on_error = true");

+	appendPQExpBuffer(query, ", origin = %s", subinfo->suborigin);
+
 	if (strcmp(subinfo->subsynccommit, "off") != 0)
 		appendPQExpBuffer(query, ", synchronous_commit = %s",
 						  fmtId(subinfo->subsynccommit));

Do we need to append anything if it's the default value ("any")? I saw
that some other parameters are appended only if they are not the default
value.

0003 patch
==========
1.
In create_subscription.sgml:

    (You cannot combine setting connect to false with setting create_slot,
    enabled, or copy_data to true.)

In this description of the "connect" parameter in the CREATE SUBSCRIPTION
document, maybe it would be better to change "copy_data to true" to
"copy_data to true/force".

2.
+	appendStringInfoString(,
+						   "SELECT DISTINCT N.nspname AS schemaname,\n"
+						   "                C.relname AS tablename,\n"
+						   "                PS.srrelid as replicated\n"
+						   "FROM pg_publication P,\n"
+						   "     LATERAL pg_get_publication_tables(P.pubname) GPT\n"
+						   "     LEFT JOIN pg_subscription_rel PS ON (GPT.relid = PS.srrelid),\n"
+						   "     pg_class C JOIN pg_namespace N ON (N.oid = C.relnamespace)\n"
+						   "WHERE C.oid = GPT.relid AND P.pubname IN (");

"PS.srrelid as replicated" can be modified to "PS.srrelid AS replicated".

Besides, I think we can filter out the tables which are not subscribing
data in this SQL statement; then the later processing can be simplified.
Something like:

SELECT DISTINCT N.nspname AS schemaname,
                C.relname AS tablename
FROM pg_publication P,
     LATERAL pg_get_publication_tables(P.pubname) GPT
     LEFT JOIN pg_subscription_rel PS ON (GPT.relid = PS.srrelid),
     pg_class C JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE C.oid = GPT.relid AND P.pubname IN ('pa')
  AND PS.srrelid IS NOT NULL;

0004 patch
==========
1.
Generic steps for adding a new node to an existing set of nodes

+Step-2: Lock the required tables of the new node in EXCLUSIVE mode until
+the setup is complete. (This lock is necessary to prevent any modifications

+Step-4. Lock the required tables of the existing nodes except the first node
+in EXCLUSIVE mode until the setup is complete. (This lock is necessary to

Should "in EXCLUSIVE mode" be modified to "in EXCLUSIVE mode"?

2.
Generic steps for adding a new node to an existing set of nodes

+data. There is no need to lock the required tables on
+node1 because any data changes made will be synchronized
+while creating the subscription with copy_data = force).

I think it would be better to say "on the first node" here, instead of
"node1", because in this section node1 is not mentioned before.

Regards,
Shi yu
RE: Support logical replication of DDLs
On Saturday, June 18, 2022 3:38 AM Zheng Li wrote:
> On Wed, Jun 15, 2022 at 12:00 AM Amit Kapila wrote:
> >
> > On Wed, Jun 15, 2022 at 5:44 AM Zheng Li wrote:
> > >
> > > While I agree that the deparser is needed to handle the potential
> > > syntax differences between the pub/sub, I think it's only relevant
> > > for the use cases where only a subset of tables in the database are
> > > replicated. For other use cases where all tables, functions and
> > > other objects need to be replicated (for example, creating a
> > > logical replica for major version upgrade) there won't be any syntax
> > > difference to handle and the schemas are supposed to match exactly
> > > between the pub/sub. In other words, the user seeks to create an
> > > identical replica of the source database and the DDLs should be
> > > replicated as is in this case.
> >
> > I think even for database-level replication we can't assume that
> > source and target will always have the same data, in which case "Create
> > Table As ..", "Alter Table .. " kind of statements can't be replicated
> > as is because that can lead to different results.
>
> "Create Table As .." is already handled by setting the skipData flag of
> the statement parse tree before replay:
>
> /*
>  * Force skipping data population to avoid data inconsistency.
>  * Data should be replicated from the publisher instead.
>  */
> castmt->into->skipData = true;
>
> "Alter Table .. " that rewrites with volatile expressions can also be
> handled without any syntax change, by enabling the table rewrite
> replication and converting the rewrite inserts to updates. ZJ's patch
> introduced this solution.
> I've also adopted this approach in my latest patch
> 0012-Support-replication-of-ALTER-TABLE-commands-that-rew.patch
>
> > The other point
> > is tomorrow we can extend the database level option/syntax to exclude
> > a few objects (something like [1]) as well, in which case we again need
> > to filter at the publisher level.
>
> I think for such cases it's not full database replication and we could
> treat it as table-level DDL replication, i.e. use the deparser format.

Hi,

Here are some points in my mind about the two approaches discussed here.

1) search_path vs schema qualification

Again, I still think it will bring more flexibility and security to
schema-qualify the objects in the DDL command, as mentioned before [1].
Besides, a schema-qualified DDL is also more appropriate for other use
cases (e.g. table-level replication). As the schema can differ between
pub/sub, it's easy to cause unexpected and undetectable failures if we
just log the search_path. It makes more sense to me to have the same
style of WAL log (schema qualified) for both database-level and
table-level replication, as it brings more flexibility.

2) About the handling of CREATE TABLE AS:

> "Create Table As .." is already handled by setting the skipData flag of
> the statement parse tree before replay:

I think it's not an appropriate approach to set the skipData flag on the
subscriber, as it cannot handle the EXECUTE command in CTAS:

CREATE TABLE q5_prep_results AS EXECUTE q5(200, 'DT');

The prepared statement is a temporary object which we don't replicate. So
if you directly execute the original SQL on the subscriber, it will fail
even if you set skipData. I think it's difficult to make this work, as
you would need to handle the create/drop of this prepared statement. And
even if we extended the subscriber's code to make it work, it doesn't
seem like a standard and elegant approach.

3) About the handling of ALTER TABLE rewrites:

> "Alter Table .. " that rewrites with volatile expressions can also be
> handled without any syntax change, by enabling the table rewrite
> replication and converting the rewrite inserts to updates. ZJ's patch
> introduced this solution.

The approach I proposed before is based on the event trigger + deparser
approach. We were able to improve that approach so that we don't need to
replicate the rewrite in many cases. For example, we don't need to
replicate the rewrite DML if there is no volatile/mutable function. We
should check and filter these cases at the publisher (e.g. via the
deparser) instead of checking them at the subscriber. Besides, as
discussed, we need to give a warning or error for the cases when the DDL
contains a volatile function which would be executed [2]. We should check
this at the publisher as well (via the deparser).

> I think for such cases it's not full database replication and we could
> treat it as table-level DDL replication, i.e. use the deparser format.

4) I think the point could be that we should make the WAL log format
extensible so that we can extend it to support more useful features
(table filter/schema maps/DDL filter). If we just WAL-log the original
SQL, it seems difficult to extend it in the future.

[1] https://www.postgresql.org/message-id/202204081134.6tcmf5cxl3sz%40alvherre.pgsql
[2]
Re: Perform streaming logical transactions by background workers and parallel apply
On Fri, Jun 17, 2022 at 12:47 PM wangw.f...@fujitsu.com wrote:
>
> On Wed, Jun 15, 2022 at 8:13 PM Amit Kapila wrote:
> > Few questions/comments on 0001
> > ==============================
> Thanks for your comments.
>
> > 1.
> > In the commit message, I see: "We also need to allow stream_stop to
> > complete by the apply background worker to avoid deadlocks because
> > T-1's current stream of changes can update rows in conflicting order
> > with T-2's next stream of changes."
> >
> > Thinking about this, won't T-1 and T-2 deadlock on the publisher
> > node as well if the above statement is true?
>
> Yes, I think so.
> I think if the table's unique index/constraint of the publisher and the
> subscriber are consistent, the deadlock will occur on the publisher side.
> If they are inconsistent, a deadlock may only occur on the subscriber.
> But since we added the check for these (see patch 0004), it seems okay
> to not handle this at STREAM_STOP.
>
> BTW, I made the following improvements to the code (#a, #c are improved
> in the 0004 patch, #b, #d and #e are improved in the 0001 patch):
> a.
> I added some comments in the function apply_handle_stream_stop to explain
> why we do not need to allow stream_stop to complete by the apply
> background worker.

I have improved the comments in this and other related sections of the
patch. See attached.

> > 3.
> > +
> > +    Setting streaming mode to apply could export invalid LSN
> > +    as finish LSN of failed transaction. Changing the streaming mode and making
> > +    the same conflict writes the finish LSN of the failed transaction in the
> > +    server log if required.
> > +
> >
> > How will the user identify that this is an invalid LSN value and she
> > shouldn't use it to SKIP the transaction? Can we change the second
> > sentence to: "User should change the streaming mode to 'on' if they
> > would instead wish to see the finish LSN on error. Users can use
> > finish LSN to SKIP applying the transaction." I think we can give
> > reference to docs where the SKIP feature is explained.
> Improved the sentence as suggested.

You haven't answered the first part of the comment: "How will the user
identify that this is an invalid LSN value and she shouldn't use it to
SKIP the transaction?". Have you checked what value it displays? For
example, in one of the cases in apply_error_callback, as shown in the
code below, we don't even display the finish LSN if it is invalid.

else if (XLogRecPtrIsInvalid(errarg->finish_lsn))
	errcontext("processing remote data for replication origin \"%s\" during \"%s\" in transaction %u",
			   errarg->origin_name,
			   logicalrep_message_type(errarg->command),
			   errarg->remote_xid);

--
With Regards,
Amit Kapila.

improve_comments_1.patch
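To illustrate the convention being discussed, here is a minimal standalone sketch (not PostgreSQL source) of the invalid-LSN fallback: an LSN of 0 is treated as invalid, and the error-context message omits the finish LSN entirely rather than printing a bogus value the user might feed to a SKIP command. The typedef and macro mirror PostgreSQL's names, but error_context_message() is an illustrative stand-in for apply_error_callback(), and the message text is simplified.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Mirror PostgreSQL's convention: LSN 0 means "no valid LSN". */
typedef uint64_t XLogRecPtr;

#define InvalidXLogRecPtr		((XLogRecPtr) 0)
#define XLogRecPtrIsInvalid(r)	((r) == InvalidXLogRecPtr)

/*
 * Build an error-context line for a failed remote transaction.  When the
 * finish LSN is invalid, fall back to a message without it, so the user
 * is never shown an LSN that must not be used to skip the transaction.
 */
static const char *
error_context_message(uint32_t xid, XLogRecPtr finish_lsn)
{
	static char buf[128];

	if (XLogRecPtrIsInvalid(finish_lsn))
		snprintf(buf, sizeof(buf),
				 "processing remote data in transaction %u", xid);
	else
		snprintf(buf, sizeof(buf),
				 "processing remote data in transaction %u, finished at %X/%X",
				 xid,
				 (uint32_t) (finish_lsn >> 32), (uint32_t) finish_lsn);
	return buf;
}
```

The %X/%X split is the usual two-part textual form of a 64-bit LSN (high and low 32 bits in hex).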
Re: [PATCH] Completed unaccent dictionary with many missing characters
On Wed, Jun 15, 2022 at 01:01:37PM +0200, Przemysław Sztoch wrote:
> Two fixes (bad comment and fixed Latin-ASCII.xml).

         if codepoint.general_category.startswith('L') and \
-           len(codepoint.combining_ids) > 1:
+           len(codepoint.combining_ids) > 0:

So, this one checks for the case where a codepoint is within the letter
category. As far as I can see this indeed adds a couple of characters,
with a combination of Greek and Latin letters. So that looks fine.

+        elif codepoint.general_category.startswith('N') and \
+            len(codepoint.combining_ids) > 0 and \
+            args.noLigaturesExpansion is False and is_ligature(codepoint, table):
+            charactersSet.add((codepoint.id,
+                               "".join(chr(combining_codepoint.id)
+                                       for combining_codepoint
+                                       in get_plain_letters(codepoint, table

And this one is for the numerical part of the change. Do you actually
need to apply is_ligature() here? I would have thought that this only
applies to letters.

-        assert(False)
+        assert False, 'Codepoint U+%0.2X' % codepoint.id
[...]
-    assert(is_ligature(codepoint, table))
+    assert is_ligature(codepoint, table), 'Codepoint U+%0.2X' % codepoint.id

These two are a good idea for debugging.

-    return all(is_letter(table[i], table) for i in codepoint.combining_ids)
+    return all(i in table and is_letter(table[i], table) for i in codepoint.combining_ids)

It looks like this makes the code weaker, as we would silently skip
characters that are not part of the table rather than checking for them
all the time?

While recreating unaccent.rules with your patch, I have noticed what
looks like an error. An extra rule mapping U+210C (black-letter capital H)
to "x" gets added on top of the existing one for "H", but the correct
answer is the existing rule, not the one added by the patch.
--
Michael
Re: [PoC] Improve dead tuple storage for lazy vacuum
Hi,

On Thu, Jun 16, 2022 at 4:30 PM John Naylor wrote:
>
> On Thu, Jun 16, 2022 at 11:57 AM Masahiko Sawada wrote:
> > I've attached an updated version patch that changes the configure
> > script. I'm still studying how to support AVX2 on msvc build. Also,
> > added more regression tests.
>
> Thanks for the update, I will take a closer look at the patch in the
> near future, possibly next week.

Thanks!

> For now, though, I'd like to question
> why we even need to use 32-byte registers in the first place. For one,
> the paper referenced has 16-pointer nodes, but none for 32 (the next
> level is 48 and uses a different method to find the index of the next
> pointer). Andres' prototype has 32-pointer nodes, but in a quick read
> of his patch a couple weeks ago I don't recall a reason mentioned for
> it.

I might be wrong, but since the AVX2 instruction set was introduced with
the Haswell microarchitecture in 2013 and the referenced paper was
published in the same year, ART didn't use the AVX2 instruction set.
32-pointer nodes are better from a memory perspective, as you mentioned.
Andres' prototype supports both 16-pointer nodes and 32-pointer nodes
(out of 6 node types). This provides better memory usage, but on the
other hand it also brings the overhead of switching between node types.
Anyway, which node sizes to support is an important design decision. It
should be made based on experiment results and documented.

> Even if 32-pointer nodes are better from a memory perspective, I
> imagine it should be possible to use two SSE2 registers to find the
> index. It'd be locally slightly more complex, but not much. It might
> not even cost much more in cycles since AVX2 would require indirecting
> through a function pointer. It's much more convenient if we don't need
> a runtime check.

Right.

> There are also thermal and power disadvantages when
> using AVX2 in some workloads. I'm not sure that's the case here, but
> if it is, we'd better be getting something in return.

Good point.

> One more thing in general: In an earlier version, I noticed that
> Andres used the slab allocator and documented why. The last version of
> your patch that I saw had the same allocator, but not the "why".
> Especially in early stages of review, we want to document design
> decisions so it's more clear for the reader.

Indeed. I'll add comments in the next version of the patch.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
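John's two-SSE2-registers idea can be sketched in a few lines. The following is a standalone illustration (not taken from Andres' prototype): search the 32 sorted key bytes of a node by comparing each 16-byte half against a broadcast of the search key, then merging the two per-byte hit masks. The `#ifdef __SSE2__` guard with a scalar fallback also shows why no runtime CPU check is needed; `node32_search` and its layout are assumptions for illustration, and `__builtin_ctz` assumes GCC/Clang.

```c
#include <stdint.h>

#ifdef __SSE2__
#include <emmintrin.h>
#endif

/*
 * Find 'key' among the first 'count' key bytes of a 32-slot radix tree
 * node.  Returns the slot index, or -1 if not present.  This is a sketch
 * of searching a node32 with two 128-bit SSE2 compares instead of one
 * 256-bit AVX2 compare.
 */
static int
node32_search(const uint8_t keys[32], int count, uint8_t key)
{
#ifdef __SSE2__
	__m128i		needle = _mm_set1_epi8((char) key);

	/* Compare both 16-byte halves against the broadcast key. */
	__m128i		cmp0 = _mm_cmpeq_epi8(needle,
									  _mm_loadu_si128((const __m128i *) keys));
	__m128i		cmp1 = _mm_cmpeq_epi8(needle,
									  _mm_loadu_si128((const __m128i *) (keys + 16)));

	/* Merge the per-byte hit bits into one 32-bit mask. */
	uint32_t	mask = (uint32_t) _mm_movemask_epi8(cmp0) |
		((uint32_t) _mm_movemask_epi8(cmp1) << 16);

	/* Ignore slots beyond the node's current fill count. */
	mask &= (count == 32) ? 0xffffffffu : ((1u << count) - 1);
	return mask ? __builtin_ctz(mask) : -1;
#else
	/* Portable scalar fallback, same result. */
	for (int i = 0; i < count; i++)
		if (keys[i] == key)
			return i;
	return -1;
#endif
}
```

Each 16-byte compare yields a 16-bit movemask, so the merged 32-bit mask has one bit per slot and counting trailing zeros gives the lowest matching index, just as a single AVX2 compare would.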
Re: Add header support to text format and matching feature
On Thu, Jun 16, 2022 at 09:24:56AM +0200, Peter Eisentraut wrote:
> I don't feel very strongly about this. It makes sense to stay consistent
> with the existing COPY code.

Yes, my previous argument is based on consistency with the surroundings.
I am not saying that this could not be made better, it surely can, but I
would recommend tackling that in a separate patch, and applying it to
more areas than this specific one.
--
Michael
Re: libpq: Remove redundant null pointer checks before free()
Alvaro Herrera writes:
> For PQclear() specifically, one thing that I thought a few days ago
> would be useful would be to have it return (PGresult *) NULL. Then the
> very common pattern of doing "PQclear(res); res = NULL;" could be
> simplified to "res = PQclear(res);", which is nicely compact and is
> learned instantly.

That's a public API unfortunately, and so some people would demand a
libpq.so major version bump if we changed it.

			regards, tom lane
Re: libpq: Remove redundant null pointer checks before free()
On 2022-Jun-17, Peter Eisentraut wrote:

> From 355ef1a68be690d9bb8ee0524226abd648733ce0 Mon Sep 17 00:00:00 2001
> From: Peter Eisentraut
> Date: Fri, 17 Jun 2022 12:09:32 +0200
> Subject: [PATCH v2 3/3] Remove redundant null pointer checks before PQclear
>  and PQconninfofree
>
> These functions already had the free()-like behavior of handling NULL
> pointers as a no-op. But it wasn't documented, so add it explicitly
> to the documentation, too.

For PQclear() specifically, one thing that I thought a few days ago would
be useful would be to have it return (PGresult *) NULL. Then the very
common pattern of doing "PQclear(res); res = NULL;" could be simplified
to "res = PQclear(res);", which is nicely compact and is learned
instantly.

I've not seen this convention used anywhere else though, and I'm not sure
I'd advocate it for other functions where we use similar patterns such as
pfree/pg_free, so perhaps it'd become too much of a special case.

--
Álvaro Herrera               PostgreSQL Developer — https://www.EnterpriseDB.com/