Re: Add macros for ReorderBufferTXN toptxn
Hi, On Wed, Mar 15, 2023 at 8:55 AM Peter Smith wrote: > > On Tue, Mar 14, 2023 at 10:43 PM Amit Kapila wrote: > > > > On Tue, Mar 14, 2023 at 12:37 PM Peter Smith wrote: > > > > > > Thanks for the review! > > > > > > On Mon, Mar 13, 2023 at 6:19 PM vignesh C wrote: > > > ... > > > > > > > 4) We check if txn->toptxn is not null twice here both in if condition > > > > and in the assignment, we could retain the assignment operation as > > > > earlier to remove the 2nd check: > > > > - if (txn->toptxn) > > > > - txn = txn->toptxn; > > > > + if (isa_subtxn(txn)) > > > > + txn = get_toptxn(txn); > > > > > > > > We could avoid one check again by: > > > > + if (isa_subtxn(txn)) > > > > + txn = txn->toptxn; > > > > > > > > > > Yeah, that is true, but I chose not to keep the original assignment in > > > this case mainly because then it is consistent with the other changed > > > code --- e.g. Every other direct member assignment/access of the > > > 'toptxn' member now hides behind the macros so I did not want this > > > single place to be the odd one out. TBH, I don't think 1 extra check > > > is of any significance, but it is not a problem to change like you > > > suggested if other people also want it done that way. > > > > > > > Can't we directly use rbtxn_get_toptxn() for this case? I think that > > way code will look neat. I see that it is not exactly matching the > > existing check so you might be worried but I feel the new code will > > achieve the same purpose and will be easy to follow. > > > > OK. Done as suggested. > +1 to the idea. Here are some minor comments: @@ -1667,7 +1658,7 @@ ReorderBufferTruncateTXN(ReorderBuffer *rb, ReorderBufferTXN *txn, bool txn_prep * about the toplevel xact (we send the XID in all messages), but we never * stream XIDs of empty subxacts. 
*/ - if ((!txn_prepared) && ((!txn->toptxn) || (txn->nentries_mem != 0))) + if ((!txn_prepared) && (rbtxn_is_toptxn(txn) || (txn->nentries_mem != 0))) txn->txn_flags |= RBTXN_IS_STREAMED; Probably the following comment of the above lines also needs to be updated? * The toplevel transaction, identified by (toptxn==NULL), is marked as * streamed always, --- +/* Is this a top-level transaction? */ +#define rbtxn_is_toptxn(txn)\ +(\ + (txn)->toptxn == NULL\ +) + +/* Is this a subtransaction? */ +#define rbtxn_is_subtxn(txn)\ +(\ + (txn)->toptxn != NULL\ +) + +/* Get the top-level transaction of this (sub)transaction. */ +#define rbtxn_get_toptxn(txn)\ +(\ + rbtxn_is_subtxn(txn) ? (txn)->toptxn : (txn)\ +) We need a whitespace before backslashes. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Re: Allow logical replication to copy tables in binary format
On Tue, Mar 14, 2023 at 8:50 PM Takamichi Osumi (Fujitsu) wrote: > > On Tuesday, March 14, 2023 8:02 PM Melih Mutlu wrote: > (3) copy_table() > > + /* > +* If the publisher version is earlier than v14, it COPY command > doesn't > +* support the binary option. > +*/ > > This sentence doesn't look correct grammatically. We can replace "it COPY > command" with "subscription" for example. Kindly please fix it. > How about something like: "The binary option for replication is supported since v14."? -- With Regards, Amit Kapila.
Re: Should vacuum process config file reload more often
On Sat, Mar 11, 2023 at 8:11 AM Melanie Plageman wrote: > > Quotes below are combined from two of Sawada-san's emails. > > I've also attached a patch with my suggested current version. > > On Thu, Mar 9, 2023 at 10:27 PM Masahiko Sawada wrote: > > > > On Fri, Mar 10, 2023 at 11:23 AM Melanie Plageman > > wrote: > > > > > > On Tue, Mar 7, 2023 at 12:10 AM Masahiko Sawada > > > wrote: > > > > > > > > On Mon, Mar 6, 2023 at 5:26 AM Melanie Plageman > > > > wrote: > > > > > > > > > > On Thu, Mar 2, 2023 at 6:37 PM Melanie Plageman > > > > > In this version I've removed wi_cost_delay from WorkerInfoData. There > > > > > is > > > > > no synchronization of cost_delay amongst workers, so there is no > > > > > reason > > > > > to keep it in shared memory. > > > > > > > > > > One consequence of not updating VacuumCostDelay from wi_cost_delay is > > > > > that we have to have a way to keep track of whether or not autovacuum > > > > > table options are in use. > > > > > > > > > > This patch does this in a cringeworthy way. I added two global > > > > > variables, one to track whether or not cost delay table options are in > > > > > use and the other to store the value of the table option cost delay. I > > > > > didn't want to use a single variable with a special value to indicate > > > > > that table option cost delay is in use because > > > > > autovacuum_vacuum_cost_delay already has special values that mean > > > > > certain things. My code needs a better solution. > > > > > > > > While it's true that wi_cost_delay doesn't need to be shared, it seems > > > > to make the logic somewhat complex. We need to handle cost_delay in a > > > > different way from other vacuum-related parameters and we need to make > > > > sure av[_use]_table_option_cost_delay are set properly. Removing > > > > wi_cost_delay from WorkerInfoData saves 8 bytes shared memory per > > > > autovacuum worker but it might be worth considering to keep > > > > wi_cost_delay for simplicity. 
> > > > > > Ah, it turns out we can't really remove wi_cost_delay from WorkerInfo > > > anyway because the launcher doesn't know anything about table options > > > and so the workers have to keep an updated wi_cost_delay that the > > > launcher or other autovac workers who are not vacuuming that table can > > > read from when calculating the new limit in autovac_balance_cost(). > > > > IIUC if any of the cost delay parameters has been set individually, > > the autovacuum worker is excluded from the balance algorithm. > > Ah, yes! That's right. So it is not a problem. Then I still think > removing wi_cost_delay from the worker info makes sense. wi_cost_delay > is a double and can't easily be accessed atomically the way > wi_cost_limit can be. > > Keeping the cost delay local to the backends also makes it clear that > cost delay is not something that should be written to by other backends > or that can differ from worker to worker. Without table options in the > picture, the cost delay should be the same for any worker who has > reloaded the config file. Agreed. > > As for the cost limit safe access issue, maybe we can avoid a LWLock > acquisition for reading wi_cost_limit by using an atomic similar to what > you suggested here for "did_rebalance". > > > > I've added in a shared lock for reading from wi_cost_limit in this > > > patch. However, AutoVacuumUpdateLimit() is called unconditionally in > > > vacuum_delay_point(), which is called quite often (per block-ish), so I > > > was trying to think if there is a way we could avoid having to check > > > this shared memory variable on every call to vacuum_delay_point(). > > > Rebalances shouldn't happen very often (done by the launcher when a new > > > worker is launched and by workers between vacuuming tables). Maybe we > > > can read from it less frequently? > > > > Yeah, acquiring the lwlock for every call to vacuum_delay_point() > > seems to be harmful. 
One idea would be to have one sig_atomic_t > > variable in WorkerInfoData and autovac_balance_cost() set it to true > > after rebalancing the worker's cost-limit. The worker can check it > > without locking and update its delay parameters if the flag is true. > > Instead of having the atomic indicate whether or not someone (launcher > or another worker) did a rebalance, it would simply store the current > cost limit. Then the worker can normally access it with a simple read. > > My rationale is that if we used an atomic to indicate whether or not we > did a rebalance ("did_rebalance"), we would have the same cache > coherency guarantees as if we just used the atomic for the cost limit. > If we read from the "did_rebalance" variable and missed someone having > written to it on another core, we still wouldn't get around to checking > the wi_cost_limit variable in shared memory, so it doesn't matter that > we bothered to keep it in shared memory and use a lock to access it. > > I noticed we don't allow wi_cost_limit to ever be less than 0, so we > could store
Re: Sub-millisecond [autovacuum_]vacuum_cost_delay broken
On Wed, Mar 15, 2023 at 7:54 AM Nathan Bossart wrote: > Here is roughly what I had in mind: > > NOTE: Although the delay is specified in microseconds, older Unixen > and > Windows use periodic kernel ticks to wake up, which might increase the > delay time significantly. We've observed delay increases as large as > 20 milliseconds on supported platforms. Sold. And pushed. I couldn't let that 20ms != 1s/100 problem go, despite my claim that I would, and now I see: NetBSD does have 10ms resolution, so everyone can relax, arithmetic still works. It's just that it always or often adds on one extra tick, for some strange reason. So you can measure 20ms, 30ms, ... but never as low as 10ms. *Shrug*. Your description covered that nicely. https://marc.info/?l=netbsd-current-users=144832117108168=2 > > (The word "interrupt" is a bit overloaded, which doesn't help with > > this discussion.) > > Yeah, I think it would be clearer if "interrupt" was disambiguated. OK, I rewrote it to avoid that terminology. On small detail, after reading Tom's 2019 proposal to do this[1]: He mentioned SUSv2's ENOSYS error. I see that SUSv3 (POSIX.1-2001) dropped that. Systems that don't have the "timers" option simply shouldn't define the function, but we already require the "timers" option for clock_gettime(). And more practically, I know that all our target systems have it and it works. Pushed. [1] https://www.postgresql.org/message-id/4902.1552349...@sss.pgh.pa.us
Re: Fix fseek() detection of unseekable files on WIN32
On Tue, Mar 14, 2023 at 01:26:27PM +0100, Juan José Santamaría Flecha wrote: > As highlighted in [1] fseek() might fail to error even when accessing > unseekable streams. > > PFA a patch that checks the file type before the actual fseek(), so only > supported calls are made. + * streams, so harden that funcion with our version. s/funcion/function/. +extern int pgfseek64(FILE *stream, pgoff_t offset, int origin); +extern pgoff_t pgftell64(FILE *stream); +#define fseeko(stream, offset, origin) pgfseek64(stream, offset, origin) +#define ftello(stream) pgftell64(stream) What about naming the internal wrappers _pgfseeko64() and _pgftello64(), located in a new file named win32fseek.c? It may be possible that we would need a similar treatment for fseek(), in the future, though I don't see an issue why this would be needed now. + if (GetFileType((HANDLE) _get_osfhandle(_fileno(stream))) != FILE_TYPE_DISK) + { + errno = ESPIPE; + return -1; + } Shouldn't there be cases where we should return EINVAL for some of the other types, like FILE_TYPE_REMOTE or FILE_TYPE_UNKNOWN? We should return ESPIPE only for FILE_TYPE_PIPE and FILE_TYPE_CHAR, then? -- Michael signature.asc Description: PGP signature
Re: psql \watch 2nd argument: iteration count
+ sleep = strtod(opt, &opt_end); + if (sleep < 0 || *opt_end || errno == ERANGE) Should we set errno to 0 before calling strtod()? -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: recovery modules
I noticed that the new TAP test for basic_archive was failing intermittently for cfbot. It looks like the query for checking that the post-backup WAL is restored sometimes executes before archive recovery is complete (because hot_standby is on). To fix this, I adjusted the test to use poll_query_until instead. There are no other changes in v14. I first tried to set hot_standby to off on the restored node so that the query wouldn't run until archive recovery completed. This seemed like it would work because start() uses "pg_ctl --wait", which has the following note in the docs: Startup is considered complete when the PID file indicates that the server is ready to accept connections. However, that's not what happens when hot_standby is off. In that case, the postmaster.pid file is updated with PM_STATUS_STANDBY once recovery starts, which wait_for_postmaster_start() interprets as "ready." I see this was reported before [0], but that discussion fizzled out. IIUC it was done this way to avoid infinite waits when hot_standby is off and standby mode is enabled. I could be missing something obvious, but that doesn't seem necessary when hot_standby is off and recovery mode is enabled because recovery should end at some point (never mind the halting problem). I'm still digging into this and may spin off a new thread if I can conjure up a proposal. [0] https://postgr.es/m/CAMkU%3D1wrMqPggnEfszE-c3PPLmKgRK17_qr7tmxBECYEbyV-4Q%40mail.gmail.com -- Nathan Bossart Amazon Web Services: https://aws.amazon.com >From ef9aade7270d12104647439a99e3b1822393a318 Mon Sep 17 00:00:00 2001 From: Nathan Bossart Date: Thu, 23 Feb 2023 14:27:48 -0800 Subject: [PATCH v14 1/7] Move extra code out of the Pre/PostRestoreCommand() block. If SIGTERM is received within this block, the startup process will immediately proc_exit() in the signal handler, so it is inadvisable to include any more code than is required in this section.
This change moves the code recently added to this block (see 1b06d7b and 7fed801) to outside of the block. This ensures that only system() is called while proc_exit() might be called in the SIGTERM handler, which is how this code worked from v8.4 to v14. --- src/backend/access/transam/xlogarchive.c | 15 +++ 1 file changed, 11 insertions(+), 4 deletions(-) diff --git a/src/backend/access/transam/xlogarchive.c b/src/backend/access/transam/xlogarchive.c index fcc87ff44f..41684418b6 100644 --- a/src/backend/access/transam/xlogarchive.c +++ b/src/backend/access/transam/xlogarchive.c @@ -159,20 +159,27 @@ RestoreArchivedFile(char *path, const char *xlogfname, (errmsg_internal("executing restore command \"%s\"", xlogRestoreCmd))); + fflush(NULL); + pgstat_report_wait_start(WAIT_EVENT_RESTORE_COMMAND); + /* - * Check signals before restore command and reset afterwards. + * PreRestoreCommand() informs the SIGTERM handler for the startup process + * that it should proc_exit() right away. This is done for the duration of + * the system() call because there isn't a good way to break out while it + * is executing. Since we might call proc_exit() in a signal handler, it + * is best to put any additional logic before or after the + * PreRestoreCommand()/PostRestoreCommand() section. */ PreRestoreCommand(); /* * Copy xlog from archival storage to XLOGDIR */ - fflush(NULL); - pgstat_report_wait_start(WAIT_EVENT_RESTORE_COMMAND); rc = system(xlogRestoreCmd); - pgstat_report_wait_end(); PostRestoreCommand(); + + pgstat_report_wait_end(); pfree(xlogRestoreCmd); if (rc == 0) -- 2.25.1 >From 78fdf06e4b1bbb37c8ae37937728aaeb4b2cf50b Mon Sep 17 00:00:00 2001 From: Nathan Bossart Date: Tue, 14 Feb 2023 09:44:53 -0800 Subject: [PATCH v14 2/7] Don't proc_exit() in startup's SIGTERM handler if forked by system(). Instead, emit a message to STDERR and _exit() in this case. 
This change also adds assertions to proc_exit(), ProcKill(), and AuxiliaryProcKill() to verify that these functions are not called by a process forked by system(), etc. --- src/backend/postmaster/startup.c | 17 - src/backend/storage/ipc/ipc.c| 3 +++ src/backend/storage/lmgr/proc.c | 2 ++ src/backend/utils/error/elog.c | 28 src/include/utils/elog.h | 6 +- 5 files changed, 50 insertions(+), 6 deletions(-) diff --git a/src/backend/postmaster/startup.c b/src/backend/postmaster/startup.c index efc2580536..0e7de26bc2 100644 --- a/src/backend/postmaster/startup.c +++ b/src/backend/postmaster/startup.c @@ -19,6 +19,8 @@ */ #include "postgres.h" +#include + #include "access/xlog.h" #include "access/xlogrecovery.h" #include "access/xlogutils.h" @@ -121,7 +123,20 @@ StartupProcShutdownHandler(SIGNAL_ARGS) int save_errno = errno; if (in_restore_command) - proc_exit(1); + { + /* + * If we are in a child process (e.g., forked by system() in + * RestoreArchivedFile()), we don't want to
Re: psql \watch 2nd argument: iteration count
On Tue, Mar 14, 2023 at 08:20:23PM -0700, Andrey Borodin wrote: > PFA v8. Thanks! Looks OK to me. I've looked as well at resetting query_buffer on failure, which I guess is better this way because this is an accumulation of the previous results, right? -- Michael
Re: pg_dump/pg_restore: Fix stdin/stdout handling of custom format on Win32
On Tue, Mar 14, 2023 at 01:47:09PM +0100, Juan José Santamaría Flecha wrote: > I have just posted a patch to enforce the detection of unseekable streams > in the fseek() calls [1], please feel free to review it. Thanks. I have been able to get around 0001 to fix _pgfstat64() and applied it down to v14 where this code has been introduced. Now to the part about fseek() and ftello().. -- Michael
Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
On Tue, Mar 14, 2023 at 3:18 PM Önder Kalacı wrote: >> Pushed this patch but forgot to add a new testfile. Will do that soon. -- With Regards, Amit Kapila.
Re: psql \watch 2nd argument: iteration count
On Tue, Mar 14, 2023 at 6:25 PM Michael Paquier wrote: > > On Wed, Mar 15, 2023 at 10:19:28AM +0900, Kyotaro Horiguchi wrote: > > I hesitated to propose such a level of simplification, but basically I > > was alsothinking the same thing. +1 > Okay, fine by me to use one single message. I'd rather still keep the > three tests, though, as they check the three conditions upon which the > error would be triggered. PFA v8. Thanks! Best regards, Andrey Borodin. v8-0001-Fix-incorrect-argument-handling-in-psql-watch.patch Description: Binary data v8-0002-Add-iteration-count-argument-to-psql-watch-comman.patch Description: Binary data
Re: [PoC] Improve dead tuple storage for lazy vacuum
On Tue, Mar 14, 2023 at 8:27 PM John Naylor wrote: > > I wrote: > > > > > Since the block-level measurement is likely overestimating quite a bit, > > > > I propose to simply reverse the order of the actions here, effectively > > > > reporting progress for the *last page* and not the current one: First > > > > update progress with the current memory usage, then add tids for this > > > > page. If this allocated a new block, only a small bit of that will be > > > > written to. If this block pushes it over the limit, we will detect that > > > > up at the top of the loop. It's kind of like our earlier attempts at a > > > > "fudge factor", but simpler and less brittle. And, as far as OS pages > > > > we have actually written to, I think it'll effectively respect the > > > > memory limit, at least in the local mem case. And the numbers will make > > > > sense. > > > > > > > > Thoughts? > > > > > > It looks to work but it still doesn't work in a case where a shared > > > tidstore is created with a 64kB memory limit, right? > > > TidStoreMemoryUsage() returns 1MB and TidStoreIsFull() returns true > > > from the beginning. > > > > I have two ideas: > > > > 1. Make it optional to track chunk memory space by a template parameter. It > > might be tiny compared to everything else that vacuum does. That would > > allow other users to avoid that overhead. > > 2. When context block usage exceeds the limit (rare), make the additional > > effort to get the precise usage -- I'm not sure such a top-down facility > > exists, and I'm not feeling well enough today to study this further. > > Since then, Masahiko incorporated #1 into v31, and that's what I'm looking at > now. Unfortunately, If I had spent five minutes reminding myself what the > original objections were to this approach, I could have saved us some effort. > Back in July (!), Andres raised two points: GetMemoryChunkSpace() is slow > [1], and fragmentation [2] (leading to underestimation). 
> > In v31, in the local case at least, the underestimation is actually worse > than tracking chunk space, since it ignores chunk header and alignment. I'm > not sure about the DSA case. This doesn't seem great. Right. > > It shouldn't be a surprise why a simple increment of raw allocation size is > comparable in speed -- GetMemoryChunkSpace() calls the right function through > a pointer, which is slower. If we were willing to underestimate for the sake > of speed, that takes away the reason for making memory tracking optional. > > Further, if the option is not specified, in v31 there is no way to get the > memory use at all, which seems odd. Surely the caller should be able to ask > the context/area, if it wants to. There are precedents that don't provide a way to return memory usage, such as simplehash.h and dshash.c. > > I still like my idea at the top of the page -- at least for vacuum and m_w_m. > It's still not completely clear if it's right but I've got nothing better. It > also ignores the work_mem issue, but I've given up anticipating all future > cases at the moment. > What does it mean by "the precise usage" in your idea? Quoting from the email you referred to, Andres said: --- One thing I was wondering about is trying to choose node types in roughly-power-of-two struct sizes. It's pretty easy to end up with significant fragmentation in the slabs right now when inserting as you go, because some of the smaller node types will be freed but not enough to actually free blocks of memory. If we instead have ~power-of-two sizes we could just use a single slab of the max size, and carve out the smaller node types out of that largest allocation. Btw, that fragmentation is another reason why I think it's better to track memory usage via memory contexts, rather than doing so based on GetMemoryChunkSpace(). --- IIUC he suggested measuring memory usage in block-level in order to count blocks that are not actually freed but some of its chunks are freed. 
That's why we used MemoryContextMemAllocated(). On the other hand, recently you pointed out[1]: --- I think we're trying to solve the wrong problem here. I need to study this more, but it seems that code that needs to stay within a memory limit only needs to track what's been allocated in chunks within a block, since writing there is what invokes a page fault. --- IIUC you suggested measuring memory usage by tracking how much memory chunks are allocated within a block. If your idea at the top of the page follows this method, it still doesn't deal with the point Andres mentioned. > I'll put this item and a couple other things together in a separate email > tomorrow. Thanks! Regards, [1] https://www.postgresql.org/message-id/CAFBsxsEnzivaJ13iCGdDoUMsXJVGOaahuBe_y%3Dq6ow%3DLTzyDvA%40mail.gmail.com -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Re: Combine pg_walinspect till_end_of_wal functions with others
Hi, On 2023-03-15 09:56:10 +0900, Michael Paquier wrote: > On Tue, Mar 14, 2023 at 02:54:40PM -0700, Andres Freund wrote: > > Object description > > --- > > function pg_get_wal_record_info(pg_lsn) > > - function pg_get_wal_records_info(pg_lsn,pg_lsn) > > function pg_get_wal_records_info_till_end_of_wal(pg_lsn) > > - function pg_get_wal_stats(pg_lsn,pg_lsn,boolean) > > + function pg_get_wal_records_info(pg_lsn,pg_lsn) > > function pg_get_wal_stats_till_end_of_wal(pg_lsn,boolean) > > + function pg_get_wal_stats(pg_lsn,pg_lsn,boolean) > > (5 rows) > > > > -- Make sure checkpoints don't interfere with the test. > > > > Looks like it's missing an ORDER BY. > > Interesting. This is "\dx+ pg_walinspect". > listOneExtensionContents() uses pg_describe_object() for that, and > there is already an ORDER BY based on it. I would not have expected > this part to be that much sensitive. Is this using a specific ICU > collation, because this is a side-effect of switching ICU as the > default in initdb? It's using ICU, but not a specific collation. The build I linked to is WIP hackery to add ICU support to windows CI. Here's the initdb output: https://api.cirrus-ci.com/v1/artifact/task/6288336663347200/testrun/build/testrun/pg_walinspect/regress/log/initdb.log The database cluster will be initialized with this locale configuration: provider:icu ICU locale: en_US LC_COLLATE: English_United States.1252 LC_CTYPE:English_United States.1252 LC_MESSAGES: English_United States.1252 LC_MONETARY: English_United States.1252 LC_NUMERIC: English_United States.1252 LC_TIME: English_United States.1252 The default database encoding has accordingly been set to "WIN1252". The default text search configuration will be set to "english". 
For comparison, here's a recent CI run (which also failed on windows, but for unrelated reasons), without ICU: https://api.cirrus-ci.com/v1/artifact/task/6478925920993280/testrun/build/testrun/pg_walinspect/regress/log/initdb.log The database cluster will be initialized with locale "English_United States.1252". The default database encoding has accordingly been set to "WIN1252". The default text search configuration will be set to "english". > As a solution, this could use pg_identify_object(classid, objid, 0) in > the ORDER BY clause to enforce a better ordering of the objects dealt > with as it decomposes the object name and the object type. That > should be enough, I assume, as it looks to be parenthesis vs > underscore that switch the order. Greetings, Andres Freund
Re: Add pg_walinspect function with block info columns
On Tue, Mar 14, 2023 at 5:34 PM Melanie Plageman wrote: > On Tue, Mar 14, 2023 at 6:57 PM Peter Geoghegan wrote: > > Why doesn't it already work like this? Why do we need a separate > > pg_get_wal_block_info() function at all? > > Well, I think if you only care about the WAL record-level information > and not the block-level information, having the WAL record information > denormalized like that with all the block information would be a > nuisance. I generally care about both. When I want to look at things at the pg_get_wal_records_info() level (as opposed to a summary), the block_ref information is *always* of primary importance. I don't want to have to write my own bug-prone parser for block_ref, but why should the only alternative be joining against pg_get_wal_block_info()? The information that I'm interested in is "close at hand" to pg_get_wal_records_info() already. I understand that in the general case there might be quite a few blocks associated with a WAL record. For complicated cases, pg_get_wal_block_info() does make sense. However, the vast majority of individual WAL records (and possibly most WAL record types) are related to one block only. One block that is generally from the relation's main fork. > But, perhaps you are suggesting a parameter to pg_get_wal_records_info() > like "with_block_info" or something, which produces the full > denormalized block + record output? I was thinking of something like that, yes -- though it wouldn't necessarily have to be the *full* denormalized block_ref info, the FPI itself, etc. Just the more useful stuff. It occurs to me that my concern about the information that pg_get_wal_records_info() lacks could be restated as a concern about what pg_get_wal_block_info() lacks: pg_get_wal_block_info() fails to show basic information about the WAL record whose blocks it reports on, even though it could easily show all of the pg_get_wal_records_info() info once per block (barring block_ref). 
So addressing my concern by adjusting pg_get_wal_block_info() might be the best approach. I'd probably be happy with that -- I'd likely just stop using pg_get_wal_records_info() completely under this scheme. Overall, I'm concerned that we may have missed the opportunity to make simple things easier. Again, wanting to see (say) all of the PRUNE records and VACUUM records with an "order by relfilenode, block_number, lsn" seems likely to be a very common requirement to me. It's exactly the kind of thing that you'd expect an SQL interface to make easy. -- Peter Geoghegan
Re: psql \watch 2nd argument: iteration count
On Wed, Mar 15, 2023 at 10:19:28AM +0900, Kyotaro Horiguchi wrote: > I hesitated to propose such a level of simplification, but basically I > was also thinking the same thing. Okay, fine by me to use one single message. I'd rather still keep the three tests, though, as they check the three conditions upon which the error would be triggered. -- Michael
Re: psql \watch 2nd argument: iteration count
At Tue, 14 Mar 2023 12:03:00 -0700, Nathan Bossart wrote in > On Tue, Mar 14, 2023 at 01:58:59PM +0900, Kyotaro Horiguchi wrote: > > + if (*opt_end) > > + pg_log_error("\\watch: incorrect > > interval value '%s'", opt); > > + else if (errno == ERANGE) > > + pg_log_error("\\watch: out-of-range > > interval value '%s'", opt); > > + else > > + pg_log_error("\\watch: interval value > > '%s' less than zero", opt); > > > > I'm not sure if we need error messages for that resolution and I'm a > > bit happier to have fewer messages to translate:p. Merging the cases > > of ERANGE and negative values might be better. And I think we usually > > refer to unparsable input as "invalid". > > > > if (*opt_end) > >pg_log_error("\\watch: invalid interval value '%s'", opt); > > else > >pg_log_error("\\watch: interval value '%s' out of range", opt); > > +1, I don't think it's necessary to complicate these error messages too > much. This code hasn't reported errors for nearly 10 years, and I'm not > aware of any complaints. I still think we could simplify this to "\watch: > invalid delay interval: %s" and call it a day. I hesitated to propose such a level of simplification, but basically I was also thinking the same thing. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Making background psql nicer to use in tap tests
Hi, On 2023-03-14 21:24:32 +0100, Daniel Gustafsson wrote: > > On 31 Jan 2023, at 01:00, Andres Freund wrote: > > > I've hacked some on this. I first tried to just introduce a few helper > > functions in Cluster.pm, but that ended up being awkward. So I bit the > > bullet > > and introduced a new class (in BackgroundPsql.pm), and made > > background_psql() > > and interactive_psql() return an instance of it. > > Thanks for working on this! Thanks for helping it move along :) > > This is just a rough prototype. Several function names don't seem great, it > > need POD documentation, etc. > > It might be rough around the edges but I don't think it's too far off a state > in which in can be committed, given that it's replacing something even > rougher. > With documentation and some polish I think we can iterate on it in the tree. Cool. > > I don't quite like the new interface yet: > > - It's somewhat common to want to know if there was a failure, but also get > > the query result, not sure what the best function signature for that is in > > perl. > > What if query() returns a list with the return value last? The caller will > get > the return value when assigning a single var as the return, and can get both > in > those cases when it's interesting. That would make for reasonably readable > code in most places? >$ret_val = $h->query("SELECT 1;"); >($query_result, $ret_val) = $h->query("SELECT 1;"); I hate perl. > Returning a hash seems like a worse option since it will complicate callsites > which only want to know success/failure. Yea. Perhaps it's worth having a separate function for this? ->query_rc() or such? > > - right now there's a bit too much logic in background_psql() / > > interactive_psql() for my taste > > Not sure what you mean, I don't think they're especially heavy on logic? -EMISSINGWORD on my part. A bit too much duplicated logic. > +A default timeout of $PostgreSQL::Test::Utils::timeout_default is set up, > +which can modified later. 
> This requires a bit of knowledge about the internals which I think we should > hide in this new API. How about providing a function for defining the > timeout? "Defining" in the sense of accessing it? Or passing one in? > Re timeouts: one thing I've done repeatedly is to use short timeouts and reset > them per query, and that turns pretty ugly fast. I hacked up your patch to > provide $h->reset_timer_before_query() which then injects a {timeout}->start > before running each query without the caller having to do it. Not sure if I'm > alone in doing that but if not I think it makes sense to add. I don't quite understand the use case, but I don't mind it as a functionality. Greetings, Andres Freund
Re: Combine pg_walinspect till_end_of_wal functions with others
On Tue, Mar 14, 2023 at 10:35:43AM +0530, Bharath Rupireddy wrote: > My thoughts are simple here - how would one (an end user, not me and > not you) figure out how to get info/stats till the end of WAL? I'm > sure it would be difficult to find that out without looking at the > code or commit history. Search for till end of WAL behaviour with new > version will be more given the 1.0 version has explicit functions to > do that. IMO, there's no harm in being explicit in how to achieve till > end of WAL functionality around in the docs. Okay. I have kept these notes, but tweaked the wording to be a bit cleaner, replacing the term "till" by "until". To my surprise while studying this point, "till" is a term older than "until" in English literature, but it is rarely used. > I get it. I divided the patches to 0001 and 0002 with 0001 focussing > on the change of behaviour around future end LSNs, dropping till end > of WAL functions and tests tweakings related to it. 0002 has all other > tests tidy up things. > > Please find the attached v8 patch set for further review. The tests of 0001 were still too complex IMO. The changes can be much simpler, as they only require moving the till_end_of_wal() calls from pg_walinspect.sql to oldextversions.sql. Nothing more. -- Michael
Re: Option to not use ringbuffer in VACUUM, using it in failsafe mode
Thanks for the review! On Sat, Mar 11, 2023 at 2:16 PM Justin Pryzby wrote: > > On Sat, Mar 11, 2023 at 09:55:33AM -0500, Melanie Plageman wrote: > > Subject: [PATCH v3 2/3] use shared buffers when failsafe active > > > > + /* > > + * Assume the caller who allocated the memory for the > > + * BufferAccessStrategy object will free it. > > + */ > > + vacrel->bstrategy = NULL; > > This comment could use elaboration: > > ** VACUUM normally restricts itself to a small ring buffer; but in > failsafe mode, in order to process tables as quickly as possible, allow > it to leave behind a large number of dirty buffers. Agreed. It definitely needs a comment like this. I will update in the next version along with addressing your other feedback (after sorting out some of the other points in this mail on which I still have questions). > > Subject: [PATCH v3 3/3] add vacuum option to specify ring_size and guc > > > #define INT_ACCESS_ONCE(var) ((int)(*((volatile int *)&(var > > +#define bufsize_limit_to_nbuffers(bufsize) (bufsize * 1024 / BLCKSZ) > > Macros are normally capitalized Yes, there doesn't seem to be a great amount of consistency around this... See pgstat.c read_chunk_s and bufmgr.c BufHdrGetBlock and friends. Though there are probably more capitalized than not. Since it does a bit of math and returns a value, I wanted to convey that it was more like a function. Also, since the name was long, I thought all-caps would be hard to read. However, if you or others feel strongly, I am attached neither to the capitalization nor to the name at all (what do you think of the name?). > It's a good idea to write "(bufsize)", in case someone passes "a+b". Ah yes, this is a good idea -- I always miss at least one set of parentheses when writing a macro. In this case, I didn't think of it since the current caller couldn't pass an expression. 
> > @@ -586,6 +587,45 @@ GetAccessStrategy(BufferAccessStrategyType btype) > > +BufferAccessStrategy > > +GetAccessStrategyWithSize(BufferAccessStrategyType btype, int ring_size) > > Maybe it would make sense for GetAccessStrategy() to call > GetAccessStrategyWithSize(). Or maybe not. You mean instead of having anyone call GetAccessStrategyWithSize()? We would need to change the signature of GetAccessStrategy() then -- and there are quite a few callers. > > > + {"vacuum_buffer_usage_limit", PGC_USERSET, RESOURCES_MEM, > > + gettext_noop("Sets the buffer pool size for > > operations employing a buffer access strategy."), > > The description should mention vacuum, if that's the scope of the GUC's > behavior. Good catch. Will update in next version. > > +#vacuum_buffer_usage_limit = -1 # size of vacuum buffer access strategy > > ring. > > + # -1 to use default, > > + # 0 to disable vacuum buffer access strategy > > and use shared buffers > > + # > 0 to specify size > > If I'm not wrong, there's still no documentation about "ring buffers" or > postgres' "strategy". Which seems important to do for this patch, along > with other documentation. Yes, it is. I have been thinking about where in the docs to add it (the docs about buffer access strategies). Any ideas? > This patch should add support in vacuumdb.c. Oh, I had totally forgotten about vacuumdb. > And maybe a comment about adding support there, since it's annoying > when the release notes one year say "support VACUUM (FOO)" and then > one year later say "support vacuumdb --foo". I'm not sure I totally follow. Do you mean to add a comment to ExecVacuum() saying to add support to vacuumdb when adding a new option to vacuum? In the past, did people forget to add support to vacuumdb for new vacuum options or did they forget to document that they did that or did they forget to include that they did that in the release notes? - Melanie
Re: Combine pg_walinspect till_end_of_wal functions with others
On Tue, Mar 14, 2023 at 02:54:40PM -0700, Andres Freund wrote: > Object description > --- > function pg_get_wal_record_info(pg_lsn) > - function pg_get_wal_records_info(pg_lsn,pg_lsn) > function pg_get_wal_records_info_till_end_of_wal(pg_lsn) > - function pg_get_wal_stats(pg_lsn,pg_lsn,boolean) > + function pg_get_wal_records_info(pg_lsn,pg_lsn) > function pg_get_wal_stats_till_end_of_wal(pg_lsn,boolean) > + function pg_get_wal_stats(pg_lsn,pg_lsn,boolean) > (5 rows) > > -- Make sure checkpoints don't interfere with the test. > > Looks like it's missing an ORDER BY. Interesting. This is "\dx+ pg_walinspect". listOneExtensionContents() uses pg_describe_object() for that, and there is already an ORDER BY based on it. I would not have expected this part to be that sensitive. Is this using a specific ICU collation, perhaps as a side-effect of switching to ICU as the default in initdb? As a solution, this could use pg_identify_object(classid, objid, 0) in the ORDER BY clause to enforce a better ordering of the objects dealt with, as it decomposes the object name and the object type. That should be enough, I assume, as it looks to be parenthesis vs underscore that switches the order. -- Michael
Re: Add pg_walinspect function with block info columns
On Tue, Mar 14, 2023 at 6:57 PM Peter Geoghegan wrote: > > On Tue, Mar 14, 2023 at 3:34 PM Melanie Plageman > wrote: > > After patching master to add in the columns from > > pg_get_wal_records_info() which are not returned by > > pg_get_wal_block_info() (except block_ref column of course), this query: > > > > SELECT COUNT(*) FROM pg_get_wal_block_info(:start_lsn, :end_lsn); > > > > took 467 ms. > > > > Perhaps this difference isn't important, but I found it noticeable. > > This seems weird to me too. It's not so much the performance overhead > that bothers me (though that's not great either). It seems *illogical* > to me. The query you end up writing must do two passes over the WAL > records, but its structure almost suggests that it's necessary to do > two separate passes over distinct "streams". > > Why doesn't it already work like this? Why do we need a separate > pg_get_wal_block_info() function at all? Well, I think if you only care about the WAL record-level information and not the block-level information, having the WAL record information denormalized like that with all the block information would be a nuisance. But, perhaps you are suggesting a parameter to pg_get_wal_records_info() like "with_block_info" or something, which produces the full denormalized block + record output? - Melanie
Re: Option to not use ringbuffer in VACUUM, using it in failsafe mode
Thanks for your interest in this patch! On Mon, Mar 13, 2023 at 8:38 AM Ants Aasma wrote: > > On Sat, 11 Mar 2023 at 16:55, Melanie Plageman > wrote: > > > > > On Tue, Feb 28, 2023 at 3:16 AM Bharath Rupireddy > > > wrote: > > > > > > > On Thu, Jan 12, 2023 at 6:06 AM Andres Freund > > > > wrote: > > > > > > > > > > On 2023-01-11 17:26:19 -0700, David G. Johnston wrote: > > > > > > Should we just add "ring_buffers" to the existing "shared_buffers" > > > > > > and > > > > > > "temp_buffers" settings? > > > > > > > > > > The different types of ring buffers have different sizes, for good > > > > > reasons. So > > > > > I don't see that working well. I also think it'd be more often useful > > > > > to > > > > > control this on a statement basis - if you have a parallel import > > > > > tool that > > > > > starts NCPU COPYs you'd want a smaller buffer than a single threaded > > > > > COPY. Of > > > > > course each session can change the ring buffer settings, but still. > > > > > > > > How about having GUCs for each ring buffer (bulk_read_ring_buffers, > > > > bulk_write_ring_buffers, vacuum_ring_buffers - ah, 3 more new GUCs)? > > > > These options can help especially when statement level controls aren't > > > > easy to add (COPY, CREATE TABLE AS/CTAS, REFRESH MAT VIEW/RMV)? If > > > > needed users can also set them at the system level. For instance, one > > > > can set bulk_write_ring_buffers to other than 16MB or -1 to disable > > > > the ring buffer to use shared_buffers and run a bunch of bulk write > > > > queries. > > > > In attached v3, I've changed the name of the guc from buffer_usage_limit > > to vacuum_buffer_usage_limit, since it is only used for vacuum and > > autovacuum. > > Sorry for arriving late to this thread, but what about sizing the ring > dynamically? From what I gather the primary motivation for larger ring > size is avoiding WAL flushes due to dirty buffer writes. We already > catch that event with StrategyRejectBuffer(). 
So maybe a dynamic > sizing algorithm could be applied to the ringbuffer. Make the buffers > array in strategy capable of holding up to the limit of buffers, but > set ring size conservatively. If we have to flush WAL, double the ring > size (up to the limit). If we loop around the ring without flushing, > decrease the ring size by a small amount to let clock sweep reclaim > them for use by other backends. So, the original motivation of this patch was to allow autovacuum in failsafe mode to abandon use of a buffer access strategy, since, at that point, there is no reason to hold back. The idea was expanded to be an option to explicit vacuum, since users often must initiate an explicit vacuum after a forced shutdown due to transaction ID wraparound. As for routine vacuuming and the other buffer access strategies, I think there is an argument for configurability based on operator knowledge -- perhaps your workload will use the data you are COPYing as soon as the COPY finishes, so you might as well disable a buffer access strategy or use a larger fraction of shared buffers. Also, the ring sizes were selected sixteen years ago and average server memory and data set sizes have changed. StrategyRejectBuffer() will allow bulkreads to, as you say, use more buffers than the original ring size, since it allows them to kick dirty buffers out of the ring and claim new shared buffers. Bulkwrites and vacuums, however, will inevitably dirty buffers and require flushing the buffer (and thus flushing the associated WAL) when reusing them. Bulkwrites and vacuum do not kick dirtied buffers out of the ring, since dirtying buffers is their common case. A dynamic resizing like the one you suggest would likely devolve to vacuum and bulkwrite strategies always using the max size. 
As for decreasing the ring size, buffers are only "added" to the ring lazily and, technically, as it is now, buffers which have been added to the ring can always be reclaimed by the clocksweep (as long as they are not pinned). The buffer access strategy is more of a self-imposed restriction than it is a reservation. Since the ring is small and the buffers are being frequently reused, odds are the usage count will be 1 and we will be the one who set it to 1, but there is no guarantee. If, when attempting to reuse the buffer, its usage count is > 1 (or it is pinned), we also will kick it out of the ring and go look for a replacement buffer. I do think that it is a bit unreasonable to expect users to know how large they would like to make their buffer access strategy ring. What we want is some way of balancing different kinds of workloads and maintenance tasks reasonably. If your database has no activity because it is the middle of the night or it was shut down because of transaction id wraparound, there is no reason why vacuum should limit the number of buffers it uses. I'm sure there are many other such examples. - Melanie
Re: Add macros for ReorderBufferTXN toptxn
On Tue, Mar 14, 2023 at 10:33 PM vignesh C wrote: > > On Tue, 14 Mar 2023 at 12:37, Peter Smith wrote: > > > > Thanks for the review! > > > > On Mon, Mar 13, 2023 at 6:19 PM vignesh C wrote: > > ... > > > Few comments: > > > 1) Can we move the macros along with the other macros present in this > > > file, just above this structure, similar to the macros added for > > > txn_flags: > > > /* Toplevel transaction for this subxact (NULL for top-level). */ > > > +#define isa_toptxn(rbtxn) (rbtxn->toptxn == NULL) > > > +#define isa_subtxn(rbtxn) (rbtxn->toptxn != NULL) > > > +#define get_toptxn(rbtxn) (isa_subtxn(rbtxn) ? rbtxn->toptxn : rbtxn) > > > > > > 2) The macro name can be changed to rbtxn_is_toptxn, rbtxn_is_subtxn > > > and rbtxn_get_toptxn to keep it consistent with others: > > > /* Toplevel transaction for this subxact (NULL for top-level). */ > > > +#define isa_toptxn(rbtxn) (rbtxn->toptxn == NULL) > > > +#define isa_subtxn(rbtxn) (rbtxn->toptxn != NULL) > > > +#define get_toptxn(rbtxn) (isa_subtxn(rbtxn) ? rbtxn->toptxn : rbtxn) > > > > > > 3) We could add separate comments for each of the macros: > > > /* Toplevel transaction for this subxact (NULL for top-level). */ > > > +#define isa_toptxn(rbtxn) (rbtxn->toptxn == NULL) > > > +#define isa_subtxn(rbtxn) (rbtxn->toptxn != NULL) > > > +#define get_toptxn(rbtxn) (isa_subtxn(rbtxn) ? rbtxn->toptxn : rbtxn) > > > > > > > All the above are fixed as suggested. 
> > > > > 4) We check if txn->toptxn is not null twice here both in if condition > > > and in the assignment, we could retain the assignment operation as > > > earlier to remove the 2nd check: > > > - if (txn->toptxn) > > > - txn = txn->toptxn; > > > + if (isa_subtxn(txn)) > > > + txn = get_toptxn(txn); > > > > > > We could avoid one check again by: > > > + if (isa_subtxn(txn)) > > > + txn = txn->toptxn; > > > > > > > Yeah, that is true, but I chose not to keep the original assignment in > > this case mainly because then it is consistent with the other changed > > code --- e.g. Every other direct member assignment/access of the > > 'toptxn' member now hides behind the macros so I did not want this > > single place to be the odd one out. TBH, I don't think 1 extra check > > is of any significance, but it is not a problem to change like you > > suggested if other people also want it done that way. > > The same issue exists here too: > 1) > - if (toptxn != NULL && !rbtxn_has_catalog_changes(toptxn)) > + if (rbtxn_is_subtxn(txn)) > { > - toptxn->txn_flags |= RBTXN_HAS_CATALOG_CHANGES; > - dclist_push_tail(>catchange_txns, > >catchange_node); > + ReorderBufferTXN *toptxn = rbtxn_get_toptxn(txn); > > 2) > - if (change->txn->toptxn) > - topxid = change->txn->toptxn->xid; > + if (rbtxn_is_subtxn(change->txn)) > + topxid = rbtxn_get_toptxn(change->txn)->xid; > > If you plan to fix, bothe the above also should be handled. OK, noted. Anyway, for now, I preferred the 'toptxn' member to be consistently hidden in the code so I don't plan to remove those macros. Also, please see Amit's reply [1] to your suggestion. > > 3) The comment on top of rbtxn_get_toptxn could be kept similar in > both the below places. I know it is not because of your change, but > since it is a very small change probably we could include it along > with this patch: > @@ -717,10 +717,7 @@ ReorderBufferProcessPartialChange(ReorderBuffer > *rb, ReorderBufferTXN *txn, > return; > > /* Get the top transaction. 
*/ > - if (txn->toptxn != NULL) > - toptxn = txn->toptxn; > - else > - toptxn = txn; > + toptxn = rbtxn_get_toptxn(txn); > > /* > * Indicate a partial change for toast inserts. The change will be > @@ -812,10 +809,7 @@ ReorderBufferQueueChange(ReorderBuffer *rb, > TransactionId xid, XLogRecPtr lsn, > ReorderBufferTXN *toptxn; > > /* get the top transaction */ > - if (txn->toptxn != NULL) > - toptxn = txn->toptxn; > - else > - toptxn = txn; > + toptxn = rbtxn_get_toptxn(txn); > IMO the comment ("/* get the top transaction */") was not really saying anything useful that is not already obvious from the macro name ("rbtxn_get_toptxn"). So I've removed it entirely in your 2nd case. This change is consistent with other parts of the patch where the toptxn is just assigned in the declaration. PSA v3. [2] -- [1] Amit reply to your suggestion - https://www.postgresql.org/message-id/CAA4eK1%2BoqfUSC3vpu6bJzgfnSmu_yaeoLS%3DRb3416GuS5iRP1Q%40mail.gmail.com [2] v3 - https://www.postgresql.org/message-id/CAHut%2BPtrD4xU4OPUB64ZK%2BDPDhfKn3zph%3DnDpEWUFFzUvMKo2w%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia
Re: Add macros for ReorderBufferTXN toptxn
On Tue, Mar 14, 2023 at 10:43 PM Amit Kapila wrote: > > On Tue, Mar 14, 2023 at 12:37 PM Peter Smith wrote: > > > > Thanks for the review! > > > > On Mon, Mar 13, 2023 at 6:19 PM vignesh C wrote: > > ... > > > > > 4) We check if txn->toptxn is not null twice here both in if condition > > > and in the assignment, we could retain the assignment operation as > > > earlier to remove the 2nd check: > > > - if (txn->toptxn) > > > - txn = txn->toptxn; > > > + if (isa_subtxn(txn)) > > > + txn = get_toptxn(txn); > > > > > > We could avoid one check again by: > > > + if (isa_subtxn(txn)) > > > + txn = txn->toptxn; > > > > > > > Yeah, that is true, but I chose not to keep the original assignment in > > this case mainly because then it is consistent with the other changed > > code --- e.g. Every other direct member assignment/access of the > > 'toptxn' member now hides behind the macros so I did not want this > > single place to be the odd one out. TBH, I don't think 1 extra check > > is of any significance, but it is not a problem to change like you > > suggested if other people also want it done that way. > > > > Can't we directly use rbtxn_get_toptxn() for this case? I think that > way code will look neat. I see that it is not exactly matching the > existing check so you might be worried but I feel the new code will > achieve the same purpose and will be easy to follow. > OK. Done as suggested. PSA v3. -- Kind Regards, Peter Smith. Fujitsu Australia
Re: [PATCH] Add pretty-printed XML output option
Jim Jones writes: > On 14.03.23 18:40, Tom Lane wrote: >> I poked at this for awhile and ran into a problem that I'm not sure >> how to solve: it misbehaves for input with embedded DOCTYPE. > The issue was the flag XML_SAVE_NO_EMPTY. It was forcing empty elements > to be serialized with start-end tag pairs. Removing it did the trick ... > ... but as a side effect empty start-end tags will be now serialized as > empty elements > postgres=# SELECT xmlserialize(CONTENT '' AS text > INDENT); > xmlserialize > -- > + > + > > (1 row) Huh, interesting. That is a legitimate pretty-fication of the input, I suppose, but some people might think it goes beyond the charter of "indentation". I'm okay with it personally; anyone want to object? regards, tom lane
Re: [PATCH] Add pretty-printed XML output option
On 14.03.23 18:40, Tom Lane wrote: Jim Jones writes: [ v22-0001-Add-pretty-printed-XML-output-option.patch ] I poked at this for awhile and ran into a problem that I'm not sure how to solve: it misbehaves for input with embedded DOCTYPE. regression=# SELECT xmlserialize(DOCUMENT '' as text indent); xmlserialize -- + + (1 row) The issue was the flag XML_SAVE_NO_EMPTY. It was forcing empty elements to be serialized with start-end tag pairs. Removing it did the trick ... postgres=# SELECT xmlserialize(DOCUMENT '' AS text INDENT); xmlserialize -- + + (1 row) ... but as a side effect empty start-end tags will be now serialized as empty elements postgres=# SELECT xmlserialize(CONTENT '' AS text INDENT); xmlserialize -- + + (1 row) It seems to be the standard behavior of other xml indent tools (including Oracle) regression=# SELECT xmlserialize(CONTENT '' as text indent); xmlserialize -- (1 row) The bad result for CONTENT is because xml_parse() decides to parse_as_document, but xmlserialize_indent has no idea that happened and tries to use the content_nodes list anyway. I don't especially care for the laissez faire "maybe we'll set *content_nodes and maybe we won't" API you adopted for xml_parse, which seems to be contributing to the mess. We could pass back more info so that xmlserialize_indent knows what really happened. I added a new (nullable) parameter to the xml_parse function that will return the actual XmlOptionType used to parse the xml data. Now xmlserialize_indent knows how the data was really parsed: postgres=# SELECT xmlserialize(CONTENT '' AS text INDENT); xmlserialize -- + + (1 row) I added test cases for these queries. v23 attached. Thanks! Best, Jim From 98fe15f07da345e046b8d29d5dde27ce191055a2 Mon Sep 17 00:00:00 2001 From: Jim Jones Date: Fri, 10 Mar 2023 13:47:16 +0100 Subject: [PATCH v23] Add pretty-printed XML output option This patch implements the XML/SQL:2011 feature 'X069, XMLSERIALIZE: INDENT.' 
It adds the options INDENT and NO INDENT (default) to the existing xmlserialize function. It uses the indentation feature of xmlSaveToBuffer from libxml2 to indent XML strings - see option XML_SAVE_FORMAT. Although the INDENT feature is designed to work with xml strings of type DOCUMENT, this implementation also allows the usage of CONTENT type strings as long as it contains a well balanced xml. This patch also includes documentation, regression tests and their three possible output files xml.out, xml_1.out and xml_2.out. --- doc/src/sgml/datatype.sgml| 8 +- src/backend/catalog/sql_features.txt | 2 +- src/backend/executor/execExprInterp.c | 9 +- src/backend/parser/gram.y | 14 +- src/backend/parser/parse_expr.c | 1 + src/backend/utils/adt/xml.c | 154 +++-- src/include/nodes/parsenodes.h| 1 + src/include/nodes/primnodes.h | 4 +- src/include/parser/kwlist.h | 1 + src/include/utils/xml.h | 1 + src/test/regress/expected/xml.out | 188 ++ src/test/regress/expected/xml_1.out | 106 +++ src/test/regress/expected/xml_2.out | 188 ++ src/test/regress/sql/xml.sql | 38 ++ 14 files changed, 697 insertions(+), 18 deletions(-) diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml index 467b49b199..53d59662b9 100644 --- a/doc/src/sgml/datatype.sgml +++ b/doc/src/sgml/datatype.sgml @@ -4460,14 +4460,18 @@ xml 'bar' xml, uses the function xmlserialize:xmlserialize -XMLSERIALIZE ( { DOCUMENT | CONTENT } value AS type ) +XMLSERIALIZE ( { DOCUMENT | CONTENT } value AS type [ [NO] INDENT ] ) type can be character, character varying, or text (or an alias for one of those). Again, according to the SQL standard, this is the only way to convert between type xml and character types, but PostgreSQL also allows -you to simply cast the value. +you to simply cast the value. The option INDENT allows to +indent the serialized xml output - the default is NO INDENT. 
+It is designed to indent XML strings of type DOCUMENT, but it can also + be used with CONTENT as long as value + contains a well-formed XML. diff --git a/src/backend/catalog/sql_features.txt b/src/backend/catalog/sql_features.txt index 0fb9ab7533..bb4c135a7f 100644 --- a/src/backend/catalog/sql_features.txt +++ b/src/backend/catalog/sql_features.txt @@ -621,7 +621,7 @@ X061 XMLParse: character string input and DOCUMENT option YES X065 XMLParse: binary string input and CONTENT option NO X066 XMLParse: binary string input and DOCUMENT option NO X068 XMLSerialize: BOM NO -X069 XMLSerialize: INDENT NO +X069 XMLSerialize: INDENT YES X070 XMLSerialize: character string serialization and CONTENT option
Re: Add pg_walinspect function with block info columns
On Tue, Mar 14, 2023 at 3:34 PM Melanie Plageman wrote: > After patching master to add in the columns from > pg_get_wal_records_info() which are not returned by > pg_get_wal_block_info() (except block_ref column of course), this query: > > SELECT COUNT(*) FROM pg_get_wal_block_info(:start_lsn, :end_lsn); > > took 467 ms. > > Perhaps this difference isn't important, but I found it noticeable. This seems weird to me too. It's not so much the performance overhead that bothers me (though that's not great either). It seems *illogical* to me. The query you end up writing must do two passes over the WAL records, but its structure almost suggests that it's necessary to do two separate passes over distinct "streams". Why doesn't it already work like this? Why do we need a separate pg_get_wal_block_info() function at all? -- Peter Geoghegan
Re: Track IO times in pg_stat_io
Hi, On 2023-03-09 11:50:38 -0500, Melanie Plageman wrote: > On Tue, Mar 7, 2023 at 1:39 PM Andres Freund wrote: > > On 2023-03-06 11:30:13 -0500, Melanie Plageman wrote: > > > > As pgstat_bktype_io_stats_valid() is called only in Assert(), I think > > > > that would be a good idea > > > > to also check that if counts are not Zero then times are not Zero. > > > > > > Yes, I think adding some validation around the relationship between > > > counts and timing should help prevent developers from forgetting to call > > > pg_stat_count_io_op() when calling pgstat_count_io_time() (as relevant). > > > > > > However, I think that we cannot check that if IO counts are non-zero > > > that IO times are non-zero, because the user may not have > > > track_io_timing enabled. We can check that if IO times are not zero, IO > > > counts are not zero. I've done this in the attached v3. > > > > And even if track_io_timing is enabled, the timer granularity might be so > > low > > that we *still* get zeroes. > > > > I wonder if we should get rid of pgStatBlockReadTime, pgStatBlockWriteTime, > > And then have pg_stat_reset_shared('io') reset pg_stat_database IO > stats? Yes. > > > @@ -1000,11 +1000,27 @@ ReadBuffer_common(SMgrRelation smgr, char > > > relpersistence, ForkNumber forkNum, > > > > > > if (isExtend) > > > { > > > + instr_time io_start, > > > + io_time; > > > + > > > /* new buffers are zero-filled */ > > > MemSet((char *) bufBlock, 0, BLCKSZ); > > > + > > > + if (track_io_timing) > > > + INSTR_TIME_SET_CURRENT(io_start); > > > + else > > > + INSTR_TIME_SET_ZERO(io_start); > > > + > > > > I wonder if there's an argument for tracking this in the existing IO stats > > as > > well. But I guess we've lived with this for a long time... > > Not sure I want to include that in this patchset. No, probably not. 
> > > typedef struct PgStat_BktypeIO > > > { > > > - PgStat_Counter > > > data[IOOBJECT_NUM_TYPES][IOCONTEXT_NUM_TYPES][IOOP_NUM_TYPES]; > > > + PgStat_Counter > > > counts[IOOBJECT_NUM_TYPES][IOCONTEXT_NUM_TYPES][IOOP_NUM_TYPES]; > > > + instr_time > > > times[IOOBJECT_NUM_TYPES][IOCONTEXT_NUM_TYPES][IOOP_NUM_TYPES]; > > > } PgStat_BktypeIO; > > > > Ah, you're going to hate me. We can't store instr_time on disk. There's > > another patch that gets substantial peformance gains by varying the > > frequency > > at which instr_time keeps track of time based on the CPU frequency... > > What does that have to do with what we can store on disk? The frequency can change. > If so, would it not be enough to do this when reading/writing the stats > file? Theoretically yes. But to me it seems cleaner to do it when flushing to shared stats. See also the overflow issue below. > void > instr_time_deserialize(instr_time *dest, int64 *src, int length) > { > for (size_t i = 0; i < length; i++) > { > INSTR_TIME_SET_ZERO(dest[i]); > dest[i].ticks = src[i]; > } > } That wouldn't be correct, because what ticks means will at some point change between postgres stopping and starting. > > It also just doesn't have enough range to keep track of system wide > > time on a larger system. A single backend won't run for 293 years, but > > with a few thousand backends that's a whole different story. > > > > I think we need to accumulate in instr_time, but convert to floating point > > when flushing stats. > > Hmmm. So, are you saying that we need to read from disk when we query > the view and add that to what is in shared memory? That we only store > the delta since the last restart in the instr_time array? No, I don't think I am suggesting that. What I am trying to suggest is that PendingIOStats should contain instr_time, but that PgStat_IO should contain PgStat_Counter as microseconds, as before. > But, I don't see how that avoids the problem of backend total runtime > being 293 years. 
We would have to reset and write out the delta whenever > we thought the times would overflow. The overflow risk is due to storing nanoseconds (which is what instr_time stores internally on linux now) - which we don't need to do once accumulatated. Right now we store them as microseconds. nanosecond range: ((2**63) - 1)/(10**9*60*60*24*365) -> 292 years microsecond range: ((2**63) - 1)/(10**6*60*60*24*365) -> 292471 years If you assume 5k connections continually doing IO, a range of 292 years would last 21 days at nanosecond resolution. At microsecond resolution it's 58 years. Greetings, Andres Freund
Re: Add pg_walinspect function with block info columns
I'm excited to see that pg_get_wal_block_info() was merged. Thanks for working on this! Apologies for jumping back in here a bit late. I've been playing around with it and wanted to comment on the performance of JOINing to pg_get_wal_records_info(). On Tue, Mar 7, 2023 at 7:08 AM Matthias van de Meent wrote: > > On Tue, 7 Mar 2023 at 01:34, Michael Paquier wrote: > > > > On Mon, Mar 06, 2023 at 04:08:28PM +0100, Matthias van de Meent wrote: > > > On Mon, 6 Mar 2023 at 15:40, Bharath Rupireddy > > >> IMO, pg_get_wal_records_extended_info as proposed doesn't look good to > > >> me as it outputs most of the columns that are already given by > > >> pg_get_wal_records_info.What I think the best way at this point is to > > >> make it return the following: > > >> lsn pg_lsn > > >> block_id int8 > > >> spcOid oid > > >> dbOid oid > > >> relNumber oid > > >> forkNames text > > >> fpi bytea > > >> fpi_info text > > > > I would add the length of the block data (without the hole and > > compressed, as the FPI data should always be presented as > > uncompressed), and the block data if any (without the block data > > length as one can guess it based on the bytea data length). Note that > > a block can have both a FPI and some data assigned to it, as far as I > > recall. > > > > > The basic idea is to create a single entrypoint to all relevant data > > > from DecodedXLogRecord in SQL, not multiple. > > > > While I would agree with this principle on simplicity's ground in > > terms of minimizing the SQL interface and the pg_wal/ lookups, I > > disagree about it on unsability ground, because we can avoid extra SQL > > tweaks with more functions. One recent example I have in mind is > > partitionfuncs.c, which can actually be achieved with a WITH RECURSIVE > > on the catalogs. > > Correct, but in that case the user would build the same query (or at > least with the same complexity) as what we're executing under the > hood, right? 
> > > There are of course various degrees of complexity, > > and perhaps unnest() cannot qualify as one, but having two functions > > returning normalized records (one for the record information, and a > > second for the block information), is a rather good balance between > > usability and interface complexity, in my experience. > > I would agree, if it weren't for the reasons written below. > > > If you have two > > functions, a JOIN is enough to cross-check the block data and the > > record data, > > Joins are expensive on large datasets; and because WAL is one of the > largest datasets in the system, why would we want to force the user to > JOIN them if we can produce the data in one pre-baked data structure > without a need to join? I wanted to experiment to see how much slower it is to do the join between pg_get_wal_block_info() and pg_get_wal_records_info() and profile where the time was spent. I saved the wal lsn before and after a bunch of inserts (generates ~1,000,000 records). On master, I did a join like this: SELECT count(*) FROM pg_get_wal_block_info(:start_lsn, :end_lsn) b JOIN pg_get_wal_records_info(:start_lsn, :end_lsn) w ON w.start_lsn = b.lsn; which took 1191 ms. After patching master to add in the columns from pg_get_wal_records_info() which are not returned by pg_get_wal_block_info() (except block_ref column of course), this query: SELECT COUNT(*) FROM pg_get_wal_block_info(:start_lsn, :end_lsn); took 467 ms. Perhaps this difference isn't important, but I found it noticeable. A large chunk of the time is spent joining the tuplestores. The second largest chunk of time seems to be in GetWalRecordInfo()'s calls to XLogRecGetBlockRefInfo(), which spends quite a bit of time in string construction -- mainly for strings we wouldn't end up needing after joining to the block info function. Surprisingly, the string construction seemed to overshadow the performance impact of doubling the decoding passes over the WAL records. 
So maybe it is worth including more record-level info? On an unrelated note, I had thought that we should have some kind of parameter to pg_get_wal_block_info() to control whether or not we output the FPI to save us from wasting time decompressing the FPIs. AFAIK, we don't have access to projection information from inside the function, so a parameter like "output_fpi" or the like would have to do. I wanted to share what I found in trying this in case someone else had had that thought. TL;DR, it doesn't seem to matter from a performance perspective if we skip decompressing the FPIs in pg_get_wal_block_info(). I hacked "output_fpi" into pg_get_wal_block_info(), enabled pglz wal compression, and generated a boatload of FPIs by dirtying buffers, doing a checkpoint and then updating those pages again right after the checkpoint. With output_fpi = true (same as master), my call to pg_get_wal_block_info() took around 7 seconds and with output_fpi = false, it took around 6 seconds. Not an impressive difference. I noticed that only around
Re: Combine pg_walinspect till_end_of_wal functions with others
Hi, I just rebased a patch over commit 1f282c24e46 Author: Michael Paquier Date: 2023-03-13 13:03:29 +0900 Refactor and improve tests of pg_walinspect and got a test failure: https://cirrus-ci.com/task/5693041982308352 https://api.cirrus-ci.com/v1/artifact/task/5693041982308352/testrun/build/testrun/pg_walinspect/regress/regression.diffs diff -w -U3 C:/cirrus/contrib/pg_walinspect/expected/oldextversions.out C:/cirrus/build/testrun/pg_walinspect/regress/results/oldextversions.out --- C:/cirrus/contrib/pg_walinspect/expected/oldextversions.out 2023-03-14 21:19:01.399716500 + +++ C:/cirrus/build/testrun/pg_walinspect/regress/results/oldextversions.out 2023-03-14 21:26:27.504876700 + @@ -8,10 +8,10 @@ Object description --- function pg_get_wal_record_info(pg_lsn) - function pg_get_wal_records_info(pg_lsn,pg_lsn) function pg_get_wal_records_info_till_end_of_wal(pg_lsn) - function pg_get_wal_stats(pg_lsn,pg_lsn,boolean) + function pg_get_wal_records_info(pg_lsn,pg_lsn) function pg_get_wal_stats_till_end_of_wal(pg_lsn,boolean) + function pg_get_wal_stats(pg_lsn,pg_lsn,boolean) (5 rows) -- Make sure checkpoints don't interfere with the test. Looks like it's missing an ORDER BY. Greetings, Andres Freund
Re: windows CI failing PMSignalState->PMChildFlags[slot] == PM_CHILD_ASSIGNED
Here's a better version with more explicit comments about some details. It passes on CI. Planning to push this soon. From 0ac63e73313fffa19db22f19452c4bb08ea7bf14 Mon Sep 17 00:00:00 2001 From: Thomas Munro Date: Tue, 14 Mar 2023 11:53:26 +1300 Subject: [PATCH v2] Fix waitpid() emulation on Windows. Our waitpid() emulation didn't prevent a PID from being recycled by the OS before the call to waitpid(), and the postmaster could finish up tracking multiple children with the same PID, leading to confusion. Fix, by moving the guts of pgwin32_deadchild_callback() into waitpid(), so that resources are released synchronously. The process and PID continue to exist until we close the process handle, which only happens once we're ready to adjust our book-keeping of running children. This seems to explain a couple of failures on CI. It had never been reported before, despite the code being as old as the Windows port. Perhaps Windows started recycling PIDs more rapidly, or perhaps timing changes due to commit 7389aad6 made it more likely to break. Thanks to Alexander Lakhin for analysis and Andres Freund for tracking down the root cause. Back-patch to all releases. 
Reported-by: Andres Freund Discussion: https://postgr.es/m/20230208012852.bvkn2am4h4iqjogq%40awork3.anarazel.de --- src/backend/postmaster/postmaster.c | 70 - 1 file changed, 40 insertions(+), 30 deletions(-) diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c index 2552327d90..71198b72c8 100644 --- a/src/backend/postmaster/postmaster.c +++ b/src/backend/postmaster/postmaster.c @@ -4811,7 +4811,7 @@ retry: (errmsg_internal("could not register process for wait: error code %lu", GetLastError()))); - /* Don't close pi.hProcess here - the wait thread needs access to it */ + /* Don't close pi.hProcess here - waitpid() needs access to it */ CloseHandle(pi.hThread); @@ -6421,36 +6421,21 @@ ShmemBackendArrayRemove(Backend *bn) static pid_t waitpid(pid_t pid, int *exitstatus, int options) { + win32_deadchild_waitinfo *childinfo; + DWORD exitcode; DWORD dwd; ULONG_PTR key; OVERLAPPED *ovl; - /* - * Check if there are any dead children. If there are, return the pid of - * the first one that died. - */ - if (GetQueuedCompletionStatus(win32ChildQueue, &dwd, &key, &ovl, 0)) + /* Try to consume one win32_deadchild_waitinfo from the queue. */ + if (!GetQueuedCompletionStatus(win32ChildQueue, &dwd, &key, &ovl, 0)) { - *exitstatus = (int) key; - return dwd; + errno = EAGAIN; + return -1; } - return -1; -} - -/* - * Note! Code below executes on a thread pool! All operations must - * be thread safe! Note that elog() and friends must *not* be used. - */ -static void WINAPI -pgwin32_deadchild_callback(PVOID lpParameter, BOOLEAN TimerOrWaitFired) -{ - win32_deadchild_waitinfo *childinfo = (win32_deadchild_waitinfo *) lpParameter; - DWORD exitcode; - - if (TimerOrWaitFired) - return; /* timeout. Should never happen, since we use - * INFINITE as timeout value. 
*/ + childinfo = (win32_deadchild_waitinfo *) key; + pid = childinfo->procId; /* * Remove handle from wait - required even though it's set to wait only @@ -6466,13 +6451,11 @@ pgwin32_deadchild_callback(PVOID lpParameter, BOOLEAN TimerOrWaitFired) write_stderr("could not read exit code for process\n"); exitcode = 255; } - - if (!PostQueuedCompletionStatus(win32ChildQueue, childinfo->procId, (ULONG_PTR) exitcode, NULL)) - write_stderr("could not post child completion status\n"); + *exitstatus = exitcode; /* - * Handle is per-process, so we close it here instead of in the - * originating thread + * Close the process handle. Only after this point can the PID be + * recycled by the kernel. */ CloseHandle(childinfo->procHandle); @@ -6482,7 +6465,34 @@ pgwin32_deadchild_callback(PVOID lpParameter, BOOLEAN TimerOrWaitFired) */ free(childinfo); - /* Queue SIGCHLD signal */ + return pid; +} + +/* + * Note! Code below executes on a thread pool! All operations must + * be thread safe! Note that elog() and friends must *not* be used. + */ +static void WINAPI +pgwin32_deadchild_callback(PVOID lpParameter, BOOLEAN TimerOrWaitFired) +{ + /* Should never happen, since we use INFINITE as timeout value. */ + if (TimerOrWaitFired) + return; + + /* + * Post the win32_deadchild_waitinfo object for waitpid() to deal with. If + * that fails, we leak the object, but we also leak a whole process and + * get into an unrecoverable state, so there's not much point in worrying + * about that. We'd like to panic, but we can't use that infrastructure + * from this thread. + */ + if (!PostQueuedCompletionStatus(win32ChildQueue, + 0, + (ULONG_PTR) lpParameter, + NULL)) + write_stderr("could not post child completion status\n"); + + /* Queue SIGCHLD signal. */ pg_queue_signal(SIGCHLD); } #endif /* WIN32 */ -- 2.39.1
Re: Making background psql nicer to use in tap tests
> On 31 Jan 2023, at 01:00, Andres Freund wrote: > I've hacked some on this. I first tried to just introduce a few helper > functions in Cluster.pm, but that ended up being awkward. So I bit the bullet > and introduced a new class (in BackgroundPsql.pm), and made background_psql() > and interactive_psql() return an instance of it. Thanks for working on this! > This is just a rough prototype. Several function names don't seem great, it > needs POD documentation, etc. It might be rough around the edges but I don't think it's too far off a state in which it can be committed, given that it's replacing something even rougher. With documentation and some polish I think we can iterate on it in the tree. I've played around a lot with it and it seems fairly robust. > I don't quite like the new interface yet: > - It's somewhat common to want to know if there was a failure, but also get > the query result, not sure what the best function signature for that is in > perl. What if query() returns a list with the return value last? The caller will get the return value when assigning a single var as the return, and can get both in those cases when it's interesting. That would make for reasonably readable code in most places? $ret_val = $h->query("SELECT 1;"); ($query_result, $ret_val) = $h->query("SELECT 1;"); Returning a hash seems like a worse option since it will complicate callsites which only want to know success/failure. > - query_until() sounds a bit too much like $node->poll_query_until(). Maybe > query_wait_until() is better? OTOH, the other function has poll in the name, > so maybe it's ok. query_until isn't great but query_wait_until is IMO worse since the function may well be used for tests which aren't using longrunning waits. It's also very useful for things which aren't queries at all, like psql backslash commands. I don't have any better ideas though, so +1 for sticking with query_until. 
> - right now there's a bit too much logic in background_psql() / > interactive_psql() for my taste Not sure what you mean, I don't think they're especially heavy on logic? > Those points aside, I think it already makes the tests a good bit more > readable. My WIP vacuum_defer_cleanup_age patch shrunk by half with it. The test for \password in the SCRAM iteration count patch shrunk to 1/3 of the previous coding. > I think with a bit more polish it's easy enough to use that we could avoid a > good number of those one-off psql's that we do all over. Agreed, and ideally implement tests which were left unwritten due to the old API being clunky. + # feed the query to psql's stdin, follwed by \n (so psql processes the s/follwed/followed/ +A default timeout of $PostgreSQL::Test::Utils::timeout_default is set up, +which can modified later. This requires a bit of knowledge about the internals which I think we should hide in this new API. How about providing a function for defining the timeout? Re timeouts: one thing I've done repeatedly is to use short timeouts and reset them per query, and that turns pretty ugly fast. I hacked up your patch to provide $h->reset_timer_before_query() which then injects a {timeout}->start before running each query without the caller having to do it. Not sure if I'm alone in doing that but if not I think it makes sense to add. -- Daniel Gustafsson
Re: [Proposal] Allow pg_dump to include all child tables with the root table
Gilles Darold writes: > Thanks Stepane, I've changed commit fest status to "Ready for committers". Pushed with some minor editing. regards, tom lane
Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert
On Tue, Mar 14, 2023 at 11:01 AM Gregory Stark (as CFM) wrote: > It does look like a rebase for meson.build would be helpful. I'm not > marking it waiting on author just for meson.build conflicts but it > would be perhaps more likely to be picked up by a committer if it's > showing green in cfbot. Rebased over yesterday's Meson changes in v8. Thanks! --Jacob 1: b07af1c564 ! 1: 84f67249e6 libpq: add sslrootcert=system to use default CAs @@ doc/src/sgml/runtime.sgml: pg_dumpall -p 5432 | psql -d postgres -p 5433 ## meson.build ## -@@ meson.build: if get_option('ssl') == 'openssl' +@@ meson.build: if sslopt in ['auto', 'openssl'] + else + ssl = not_found_dep endif - endforeach - ++ ++if ssl.found() + # Let tests differentiate between vanilla OpenSSL and LibreSSL. + sym = 'LIBRESSL_VERSION_NUMBER' + found = cc.has_header_symbol('openssl/opensslv.h', sym, dependencies: ssl_int) + cdata.set10('HAVE_DECL_' + sym, found, description: +'''Define to 1 if you have the declaration of `@0@', and to 0 if you + don't.'''.format(sym)) -+ - cdata.set('USE_OPENSSL', 1, - description: 'Define to 1 to build with OpenSSL support. (-Dssl=openssl)') - cdata.set('OPENSSL_API_COMPAT', '0x10001000L', ++endif + endif + endif + ## src/include/pg_config.h.in ## @@ 2: 5de1c458b1 = 2: 11b69d0bc0 libpq: force sslmode=verify-full for system CAs From 84f67249e683d79a9a004b3f12507e77be3d9c6f Mon Sep 17 00:00:00 2001 From: Jacob Champion Date: Mon, 24 Oct 2022 15:30:25 -0700 Subject: [PATCH v8 1/2] libpq: add sslrootcert=system to use default CAs Based on a patch by Thomas Habets. Note the workaround in .cirrus.yml for a strange interaction between brew and the openssl@3 formula; hopefully this can be removed in the near future. 
Discussion: https://www.postgresql.org/message-id/flat/CA%2BkHd%2BcJwCUxVb-Gj_0ptr3_KZPwi3%2B67vK6HnLFBK9MzuYrLA%40mail.gmail.com --- .cirrus.yml | 14 +- configure | 289 +- configure.ac | 2 + doc/src/sgml/libpq.sgml | 17 ++ doc/src/sgml/runtime.sgml | 6 +- meson.build | 9 + src/include/pg_config.h.in| 4 + src/interfaces/libpq/fe-secure-openssl.c | 29 +- src/test/ssl/ssl/server-cn-only+server_ca.crt | 38 +++ src/test/ssl/sslfiles.mk | 6 +- src/test/ssl/t/001_ssltests.pl| 29 ++ 11 files changed, 298 insertions(+), 145 deletions(-) create mode 100644 src/test/ssl/ssl/server-cn-only+server_ca.crt diff --git a/.cirrus.yml b/.cirrus.yml index 505c50f328..67b42db559 100644 --- a/.cirrus.yml +++ b/.cirrus.yml @@ -469,12 +469,24 @@ task: make \ meson \ openldap \ - openssl \ + openssl@3 \ python \ tcl-tk \ zstd brew cleanup -s # to reduce cache size + +# brew cleanup removes the empty certs directory in OPENSSLDIR, causing +# OpenSSL to report unexpected errors ("unregistered scheme") during +# verification failures. Put it back for now as a workaround. +# +# https://github.com/orgs/Homebrew/discussions/4030 +# +# Note that $(brew --prefix openssl) will give us the opt/ prefix but not +# the etc/ prefix, so we hardcode the full path here. openssl@3 is pinned +# above to try to minimize the chances of this changing beneath us, but it's +# brittle... +mkdir -p "/opt/homebrew/etc/openssl@3/certs" upload_caches: homebrew ccache_cache: diff --git a/configure b/configure index e35769ea73..df7e937e0c 100755 --- a/configure +++ b/configure @@ -2094,6 +2094,56 @@ $as_echo "$ac_res" >&6; } } # ac_fn_c_check_func +# ac_fn_c_check_decl LINENO SYMBOL VAR INCLUDES +# - +# Tests whether SYMBOL is declared in INCLUDES, setting cache variable VAR +# accordingly. +ac_fn_c_check_decl () +{ + as_lineno=${as_lineno-"$1"} as_lineno_stack=as_lineno_stack=$as_lineno_stack + # Initialize each $ac_[]_AC_LANG_ABBREV[]_decl_warn_flag once. 
+ as_decl_name=`echo $2|sed 's/ *(.*//'` + as_decl_use=`echo $2|sed -e 's/(/((/' -e 's/)/) 0&/' -e 's/,/) 0& (/g'` + { $as_echo "$as_me:${as_lineno-$LINENO}: checking whether $as_decl_name is declared" >&5 +$as_echo_n "checking whether $as_decl_name is declared... " >&6; } +if eval \${$3+:} false; then : + $as_echo_n "(cached) " >&6 +else + ac_save_werror_flag=$ac_c_werror_flag + ac_c_werror_flag="$ac_c_decl_warn_flag$ac_c_werror_flag" + cat confdefs.h - <<_ACEOF >conftest.$ac_ext +/* end confdefs.h. */ +$4 +int +main () +{ +#ifndef $as_decl_name +#ifdef __cplusplus + (void) $as_decl_use; +#else + (void) $as_decl_name; +#endif +#endif + + ; +
Re: [PoC] Let libpq reject unexpected authentication requests
On Mon, Mar 13, 2023 at 10:39 PM Michael Paquier wrote: > 0001 was looking fine enough seen from here, so applied it after > tweaking a few comments. That's enough to cover most of the needs of > this thread. Thank you very much! > 0002 looks pretty simple as well, I think that's worth a look for this > CF. Cool. v17 just rebases the set over HEAD, then, for cfbot. > I am not sure about 0003, to be honest, as I am wondering if > there could be a better solution than tying more the mechanism names > with the expected AUTH_REQ_* values.. Yeah, I'm not particularly excited about the approach I took. It'd be easier if we had a second SASL method to verify the implementation... I'd also proposed just adding an Assert, as a third option, to guide the eventual SASL implementer back to this conversation? --Jacob From e2343c008916fe8dd9ecacfa71c60cc250fa208d Mon Sep 17 00:00:00 2001 From: Jacob Champion Date: Tue, 18 Oct 2022 16:55:36 -0700 Subject: [PATCH v17 2/2] require_auth: decouple SASL and SCRAM Rather than assume that an AUTH_REQ_SASL* code refers to SCRAM-SHA-256, future-proof by separating the single allowlist into a list of allowed authentication request codes and a list of allowed SASL mechanisms. The require_auth check is now separated into two tiers. The AUTH_REQ_SASL* codes are always allowed. If the server sends one, responsibility for the check then falls to pg_SASL_init(), which compares the selected mechanism against the list of allowed mechanisms. (Other SCRAM code is already responsible for rejecting unexpected or out-of-order AUTH_REQ_SASL_* codes, so that's not explicitly handled with this addition.) Since there's only one recognized SASL mechanism, conn->sasl_mechs currently only points at static hardcoded lists. Whenever a second mechanism is added, the list will need to be managed dynamically. 
--- src/interfaces/libpq/fe-auth.c| 34 +++ src/interfaces/libpq/fe-connect.c | 41 +++ src/interfaces/libpq/libpq-int.h | 3 +- src/test/authentication/t/001_password.pl | 14 +--- src/test/ssl/t/002_scram.pl | 6 5 files changed, 86 insertions(+), 12 deletions(-) diff --git a/src/interfaces/libpq/fe-auth.c b/src/interfaces/libpq/fe-auth.c index b2497acdad..4ff49d8207 100644 --- a/src/interfaces/libpq/fe-auth.c +++ b/src/interfaces/libpq/fe-auth.c @@ -522,6 +522,40 @@ pg_SASL_init(PGconn *conn, int payloadlen) goto error; } + /* + * Before going ahead with any SASL exchange, ensure that the user has + * allowed (or, alternatively, has not forbidden) this particular mechanism. + * + * In a hypothetical future where a server responds with multiple SASL + * mechanism families, we would need to instead consult this list up above, + * during mechanism negotiation. We don't live in that world yet. The server + * presents one auth method and we decide whether that's acceptable or not. + */ + if (conn->sasl_mechs) + { + const char **mech; + bool found = false; + + Assert(conn->require_auth); + + for (mech = conn->sasl_mechs; *mech; mech++) + { + if (strcmp(*mech, selected_mechanism) == 0) + { +found = true; +break; + } + } + + if ((conn->sasl_mechs_denied && found) + || (!conn->sasl_mechs_denied && !found)) + { + libpq_append_conn_error(conn, "auth method \"%s\" requirement failed: server requested unacceptable SASL mechanism \"%s\"", + conn->require_auth, selected_mechanism); + goto error; + } + } + if (conn->channel_binding[0] == 'r' && /* require */ strcmp(selected_mechanism, SCRAM_SHA_256_PLUS_NAME) != 0) { diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c index cbadb3f6af..a048793b46 100644 --- a/src/interfaces/libpq/fe-connect.c +++ b/src/interfaces/libpq/fe-connect.c @@ -1258,12 +1258,25 @@ connectOptions2(PGconn *conn) more; bool negated = false; + static const uint32 default_methods = ( + 1 << AUTH_REQ_SASL + | 1 << AUTH_REQ_SASL_CONT 
+ | 1 << AUTH_REQ_SASL_FIN + ); + static const char *no_mechs[] = { NULL }; + /* - * By default, start from an empty set of allowed options and add to + * By default, start from a minimum set of allowed options and add to * it. + * + * NB: The SASL method codes are always "allowed" here. If the server + * requests SASL auth, pg_SASL_init() will enforce adherence to the + * sasl_mechs list, which by default is empty. */ conn->auth_required = true; - conn->allowed_auth_methods = 0; + conn->allowed_auth_methods = default_methods; + conn->sasl_mechs = no_mechs; + conn->sasl_mechs_denied = false; for (first = true, more = true; more; first = false) { @@ -1290,6 +1303,9 @@ connectOptions2(PGconn *conn) */ conn->auth_required = false; conn->allowed_auth_methods = -1; + + /* conn->sasl_mechs is now a list of denied mechanisms. */ + conn->sasl_mechs_denied = true; } else if (!negated)
Re: pg_stat_statements and "IN" conditions
> On Tue, Mar 14, 2023 at 02:14:17PM -0400, Gregory Stark (as CFM) wrote: > So I was seeing that this patch needs a rebase according to cfbot. Yeah, folks are getting up to speed with pgss improvements recently. Thanks for letting me know. > However it looks like the review feedback you're looking for is more > of design questions. What jumbling is best to include in the feature > set and which is best to add in later patches. It sounds like you've > gotten conflicting feedback from initial reviews. > > It does sound like the patch is pretty mature and you're actively > responding to feedback so if you got more authoritative feedback it > might even be committable now. It's already been two years of being > rolled forward so it would be a shame to keep rolling it forward. You got it about right. There is a balance to strike between an implementation that would cover more useful cases but has more dependencies (something like the possibility of having multiple query ids), and a more minimalistic implementation that would actually be acceptable in the way it is now. But I haven't heard back from David about it, so I assume everybody is fine with the minimalistic approach. > Or is there some fatal problem that you're trying to work around and > still haven't found the magic combination that convinces any > committers this is something we want? In which case perhaps we set > this patch returned? I don't get that impression myself though. Nothing like this on my side, although I'm not good at conjuring committing powers of the nature.
Re: psql \watch 2nd argument: iteration count
On Tue, Mar 14, 2023 at 01:58:59PM +0900, Kyotaro Horiguchi wrote: > + if (*opt_end) > + pg_log_error("\\watch: incorrect > interval value '%s'", opt); > + else if (errno == ERANGE) > + pg_log_error("\\watch: out-of-range > interval value '%s'", opt); > + else > + pg_log_error("\\watch: interval value > '%s' less than zero", opt); > > I'm not sure if we need error messages for that resolution and I'm a > bit happier to have fewer messages to translate:p. Merging the cases > of ERANGE and negative values might be better. And I think we usually > refer to unparsable input as "invalid". > > if (*opt_end) > pg_log_error("\\watch: invalid interval value '%s'", opt); > else > pg_log_error("\\watch: interval value '%s' out of range", opt); +1, I don't think it's necessary to complicate these error messages too much. This code hasn't reported errors for nearly 10 years, and I'm not aware of any complaints. I still think we could simplify this to "\watch: invalid delay interval: %s" and call it a day. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Raising the SCRAM iteration count
CFBot is failing with this test failure... I'm not sure if this just represents a timing dependency or a bad test or what? [09:44:49.937] --- stdout --- [09:44:49.937] # executing test in /tmp/cirrus-ci-build/build-32/testrun/authentication/001_password group authentication test 001_password [09:44:49.937] ok 1 - scram_iterations in server side ROLE [09:44:49.937] # test failed [09:44:49.937] --- stderr --- [09:44:49.937] # Tests were run but no plan was declared and done_testing() was not seen. [09:44:49.937] # Looks like your test exited with 2 just after 1. [09:44:49.937] [09:44:49.937] (test program exited with status code 2) It looks like perhaps a Perl issue? # Running: /tmp/cirrus-ci-build/build-32/src/test/regress/pg_regress --config-auth /tmp/cirrus-ci-build/build-32/testrun/authentication/001_password/data/t_001_password_primary_data/pgdata ### Starting node "primary" # Running: pg_ctl -w -D /tmp/cirrus-ci-build/build-32/testrun/authentication/001_password/data/t_001_password_primary_data/pgdata -l /tmp/cirrus-ci-build/build-32/testrun/authentication/001_password/log/001_password_primary.log -o --cluster-name=primary start waiting for server to start done server started # Postmaster PID for node "primary" is 66930 [09:44:07.411](1.875s) ok 1 - scram_iterations in server side ROLE Can't locate IO/Pty.pm in @INC (you may need to install the IO::Pty module) (@INC contains: /tmp/cirrus-ci-build/src/test/perl /tmp/cirrus-ci-build/src/test/authentication /etc/perl /usr/local/lib/i386-linux-gnu/perl/5.32.1 /usr/local/share/perl/5.32.1 /usr/lib/i386-linux-gnu/perl5/5.32 /usr/share/perl5 /usr/lib/i386-linux-gnu/perl/5.32 /usr/share/perl/5.32 /usr/local/lib/site_perl) at /usr/share/perl5/IPC/Run.pm line 1828. Unexpected SCALAR(0x5814b508) in harness() parameter 3 at /tmp/cirrus-ci-build/src/test/perl/PostgreSQL/Test/Cluster.pm line 2112. 
Can't locate IO/Pty.pm in @INC (you may need to install the IO::Pty module) (@INC contains: /tmp/cirrus-ci-build/src/test/perl /tmp/cirrus-ci-build/src/test/authentication /etc/perl /usr/local/lib/i386-linux-gnu/perl/5.32.1 /usr/local/share/perl/5.32.1 /usr/lib/i386-linux-gnu/perl5/5.32 /usr/share/perl5 /usr/lib/i386-linux-gnu/perl/5.32 /usr/share/perl/5.32 /usr/local/lib/site_perl) at /usr/share/perl5/IPC/Run.pm line 1939. # Postmaster PID for node "primary" is 66930 ### Stopping node "primary" using mode immediate # Running: pg_ctl -D /tmp/cirrus-ci-build/build-32/testrun/authentication/001_password/data/t_001_password_primary_data/pgdata -m immediate stop waiting for server to shut down done server stopped # No postmaster PID for node "primary" [09:44:07.521](0.110s) # Tests were run but no plan was declared and done_testing() was not seen. [09:44:07.521](0.000s) # Looks like your test exited with 2 just after 1.
Re: Sub-millisecond [autovacuum_]vacuum_cost_delay broken
On Tue, Mar 14, 2023 at 03:38:45PM +1300, Thomas Munro wrote: > On Tue, Mar 14, 2023 at 12:10 PM Nathan Bossart > wrote: >> > * NOTE: although the delay is specified in microseconds, the effective >> > - * resolution is only 1/HZ, or 10 milliseconds, on most Unixen. Expect >> > - * the requested delay to be rounded up to the next resolution boundary. >> > + * resolution is only 1/HZ on systems that use periodic kernel ticks to >> > wake >> > + * up. This may cause sleeps to be rounded up by 1-20 milliseconds on >> > older >> > + * Unixen and Windows. >> >> nitpick: Could the 1/HZ versus 20 milliseconds discrepancy cause confusion? >> Otherwise, I think this is the right idea. > > Better words welcome; 1-20ms summarises the range I actually measured, > and if reports are correct about Windows' HZ=64 (1/HZ = 15.625ms) then > it neatly covers that too, so I don't feel too bad about not chasing > down the reason for that 10ms/20ms discrepancy; maybe I looked at the > wrong HZ number (which you can change, anyway), I'm not too used to > NetBSD... BTW they have a project plan to fix that > https://wiki.netbsd.org/projects/project/tickless/ Here is roughly what I had in mind: NOTE: Although the delay is specified in microseconds, older Unixen and Windows use periodic kernel ticks to wake up, which might increase the delay time significantly. We've observed delay increases as large as 20 milliseconds on supported platforms. >> > + * CAUTION: if interrupted by a signal, this function will return, but its >> > + * interface doesn't report that. It's not a good idea to use this >> > + * for long sleeps in the backend, because backends are expected to >> > respond to >> > + * interrupts promptly. Better practice for long sleeps is to use >> > WaitLatch() >> > + * with a timeout. >> >> I'm not sure this argument follows. If pg_usleep() returns if interrupted, >> then why are we concerned about delayed responses to interrupts? > > Because you can't rely on it: > > 1. 
Maybe the signal is delivered just before pg_usleep() begins, and > a handler sets some flag we would like to react to. Now pg_usleep() > will not be interrupted. That problem is solved by using latches > instead. > 2. Maybe the signal is one that is no longer handled by a handler at > all; these days, latches use SIGURG, which pops out when you read a > signalfd or kqueue, so pg_usleep() will not wake up. That problem is > solved by using latches instead. > > (The word "interrupt" is a bit overloaded, which doesn't help with > this discussion.) Yeah, I think it would be clearer if "interrupt" was disambiguated. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Sampling-based timing for EXPLAIN ANALYZE
On Tue, 17 Jan 2023 at 14:52, Tomas Vondra wrote: > > On 1/17/23 19:46, Andres Freund wrote: > > > I think a "hybrid" explain mode might be worth thinking about. Use the > > "current" sampling method for the first execution of a node, and for the > > first > > few milliseconds of a query (or perhaps the first few timestamp > > acquisitions). That provides an accurate explain analyze for short queries, > > without a significant slowdown. Then switch to sampling, which provides > > decent > > attribution for a bit longer running queries. > > > > Yeah, this is essentially the sampling I imagined when I first read the > subject of this thread. It samples which node executions to measure (and > then measures those accurately), while these patches sample timestamps. That sounds interesting. Fwiw my first thought would be to implement it a bit differently. Always have a timer running sampling right from the start, but also if there are less than, say, 1000 samples for a node then measure the actual start/finish time. So for any given node once you've hit enough samples to get a decent estimate you stop checking the time. That way any fast or rarely called nodes still have accurate measurements even if they get few samples and any long or frequently called nodes stop getting timestamps and just use timer counts. -- greg
Re: Privileges on PUBLICATION
FYI this looks like it needs a rebase due to a conflict in copy.c and an offset in pgoutput.c. Is there anything specific that still needs review or do you think you've handled all Peter's concerns? In particular, is there "a comprehensive description of what it is trying to do"? :)
Re: Using each rel as both outer and inner for JOIN_ANTI
So what is the status of this patch? It looks like you received some feedback from Emre, Tom, Ronan, and Alvaro but it also looks like you responded to most or all of that. Are you still blocked waiting for feedback? Anything specific you need help with? Or is the patch ready for commit now? In which case it would be good to rebase it since it's currently failing to apply. Well it would be good to rebase regardless but it would be especially important if we want to get it committed :)
Re: pg_stat_statements and "IN" conditions
So I was seeing that this patch needs a rebase according to cfbot. However it looks like the review feedback you're looking for is more of design questions. What jumbling is best to include in the feature set and which is best to add in later patches. It sounds like you've gotten conflicting feedback from initial reviews. It does sound like the patch is pretty mature and you're actively responding to feedback so if you got more authoritative feedback it might even be committable now. It's already been two years of being rolled forward so it would be a shame to keep rolling it forward. Or is there some fatal problem that you're trying to work around and still haven't found the magic combination that convinces any committers this is something we want? In which case perhaps we set this patch returned? I don't get that impression myself though.
Re: [EXTERNAL] Re: Support load balancing in libpq
The pgindent run in b6dfee28f is causing this patch to need a rebase for the cfbot to apply it.
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
On Tue, 14 Mar 2023 at 13:59, Tom Lane wrote: > > "Gregory Stark (as CFM)" writes: > > It looks like this needs a big rebase in fe-auth.c fe-auth-scram.c and > > fe-connect.c. Every hunk is failing which perhaps means the code > > you're patching has been moved or refactored? > > The cfbot is giving up after > v14-0001-libpq-Run-pgindent-after-a9e9a9f32b3.patch fails, > but that's been superseded (at least in part) by b6dfee28f. Ah, same with Jelte Fennema's patch for load balancing in libpq. -- greg
Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert
It does look like a rebase for meson.build would be helpful. I'm not marking it waiting on author just for meson.build conflicts but it would be perhaps more likely to be picked up by a committer if it's showing green in cfbot.
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
"Gregory Stark (as CFM)" writes: > It looks like this needs a big rebase in fe-auth.c, fe-auth-scram.c and > fe-connect.c. Every hunk is failing which perhaps means the code > you're patching has been moved or refactored? The cfbot is giving up after v14-0001-libpq-Run-pgindent-after-a9e9a9f32b3.patch fails, but that's been superseded (at least in part) by b6dfee28f. regards, tom lane
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
It looks like this needs a big rebase in fe-auth.c, fe-auth-scram.c and fe-connect.c. Every hunk is failing, which perhaps means the code you're patching has been moved or refactored?
DROP DATABASE is interruptible
Hi, Unfortunately DROP DATABASE does not hold interrupts across its crucial steps. If you e.g. set a breakpoint on DropDatabaseBuffers() and then do a signal SIGINT, we'll process that interrupt before the transaction commits. A later connect to that database ends with: 2023-03-14 10:22:24.443 PDT [3439153][client backend][3/2:0][[unknown]] PANIC: could not open critical system index 2662 It's not entirely obvious how to fix this. We can't just hold interrupts for the whole transaction - for one, we hang if we do so, because it prevents us from absorbing our own barrier: /* Close all smgr fds in all backends. */ WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_SMGRRELEASE)); ISTM that at the very least dropdb() needs to internally commit *before* dropping buffers - after that point the database is corrupt. Greetings, Andres Freund
Re: [PATCH] Add pretty-printed XML output option
Jim Jones writes: > [ v22-0001-Add-pretty-printed-XML-output-option.patch ] I poked at this for awhile and ran into a problem that I'm not sure how to solve: it misbehaves for input with embedded DOCTYPE. regression=# SELECT xmlserialize(DOCUMENT '' as text indent); xmlserialize -- + + (1 row) regression=# SELECT xmlserialize(CONTENT '' as text indent); xmlserialize -- (1 row) The bad result for CONTENT is because xml_parse() decides to parse_as_document, but xmlserialize_indent has no idea that happened and tries to use the content_nodes list anyway. I don't especially care for the laissez faire "maybe we'll set *content_nodes and maybe we won't" API you adopted for xml_parse, which seems to be contributing to the mess. We could pass back more info so that xmlserialize_indent knows what really happened. However, that won't fix the bogus output for the DOCUMENT case. Are we perhaps passing incorrect flags to xmlSaveToBuffer? regards, tom lane
Re: ICU locale validation / canonicalization
On Tue, 2023-03-14 at 08:08 +0100, Peter Eisentraut wrote: > Another issue that came to mind: Right now, you can, say, develop > SQL > schemas on a newer ICU version, say, your laptop, and then deploy > them > on a server running an older ICU version. If we have a cutoff beyond > which we convert ICU locale IDs to language tags, then this won't > work > anymore for certain combinations. And RHEL/CentOS 7 is still pretty > popular. If we just uloc_canonicalize() in icu_set_collation_attributes() then versions 50-53 can support language tags. Patch attached. One loose end is that we really should support language tags like "und" in those older versions (54 and earlier). Your commit d72900bded avoided the problem, but perhaps we should fix it by looking for "und" and replacing it with "root" while opening, or something. -- Jeff Davis PostgreSQL Contributor Team - AWS From ba1f0794d3be3fe7a9a365f4b6312bef9c215728 Mon Sep 17 00:00:00 2001 From: Jeff Davis Date: Tue, 14 Mar 2023 09:58:29 -0700 Subject: [PATCH v1] Support language tags in older ICU versions (53 and earlier). By calling uloc_canonicalize() before parsing the attributes, the existing locale attribute parsing logic works on language tags as well. Fix a small memory leak, too. 
---
 src/backend/commands/collationcmds.c      |  8 +++---
 src/backend/utils/adt/pg_locale.c         | 26 ---
 .../regress/expected/collate.icu.utf8.out |  8 ++
 src/test/regress/sql/collate.icu.utf8.sql |  4 +++
 4 files changed, 38 insertions(+), 8 deletions(-)

diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c
index 8949684afe..b8f2e7059f 100644
--- a/src/backend/commands/collationcmds.c
+++ b/src/backend/commands/collationcmds.c
@@ -950,7 +950,6 @@ pg_import_system_collations(PG_FUNCTION_ARGS)
 		const char *name;
 		char	   *langtag;
 		char	   *icucomment;
-		const char *iculocstr;
 		Oid			collid;
 
 		if (i == -1)
@@ -959,20 +958,19 @@ pg_import_system_collations(PG_FUNCTION_ARGS)
 		name = uloc_getAvailable(i);
 		langtag = get_icu_language_tag(name);
-		iculocstr = U_ICU_VERSION_MAJOR_NUM >= 54 ? langtag : name;
 
 		/*
 		 * Be paranoid about not allowing any non-ASCII strings into
 		 * pg_collation
 		 */
-		if (!pg_is_ascii(langtag) || !pg_is_ascii(iculocstr))
+		if (!pg_is_ascii(langtag) || !pg_is_ascii(langtag))
 			continue;
 
 		collid = CollationCreate(psprintf("%s-x-icu", langtag),
 								 nspid, GetUserId(),
 								 COLLPROVIDER_ICU, true, -1,
-								 NULL, NULL, iculocstr, NULL,
-								 get_collation_actual_version(COLLPROVIDER_ICU, iculocstr),
+								 NULL, NULL, langtag, NULL,
+								 get_collation_actual_version(COLLPROVIDER_ICU, langtag),
 								 true, true);
 		if (OidIsValid(collid))
 		{
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 1d3d4d86d3..b9c7fbd511 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -2643,9 +2643,28 @@ pg_attribute_unused()
 static void
 icu_set_collation_attributes(UCollator *collator, const char *loc)
 {
-	char	   *str = asc_tolower(loc, strlen(loc));
+	UErrorCode	status;
+	int32_t		len;
+	char	   *icu_locale_id;
+	char	   *lower_str;
+	char	   *str;
 
-	str = strchr(str, '@');
+	/* first, make sure the string is an ICU format locale ID */
+	status = U_ZERO_ERROR;
+	len = uloc_canonicalize(loc, NULL, 0, &status);
+	icu_locale_id = palloc(len + 1);
+
+	status = U_ZERO_ERROR;
+	len = uloc_canonicalize(loc, icu_locale_id, len + 1, &status);
+	if (U_FAILURE(status))
+		ereport(ERROR,
+				(errmsg("canonicalization failed for locale string \"%s\": %s",
+						loc, u_errorName(status))));
+
+	lower_str = asc_tolower(icu_locale_id, strlen(icu_locale_id));
+
+	pfree(icu_locale_id);
+
+	str = strchr(lower_str, '@');
 	if (!str)
 		return;
 	str++;
@@ -2660,7 +2679,6 @@ icu_set_collation_attributes(UCollator *collator, const char *loc)
 		char	   *value;
 		UColAttribute uattr;
 		UColAttributeValue uvalue;
-		UErrorCode	status;
 
 		status = U_ZERO_ERROR;
@@ -2727,6 +2745,8 @@ icu_set_collation_attributes(UCollator *collator, const char *loc)
 						loc, u_errorName(status)));
 		}
 	}
+
+	pfree(lower_str);
 }
 #endif							/* USE_ICU */
diff --git a/src/test/regress/expected/collate.icu.utf8.out b/src/test/regress/expected/collate.icu.utf8.out
index 9a3e12e42d..057c1d1bf6 100644
--- a/src/test/regress/expected/collate.icu.utf8.out
+++ b/src/test/regress/expected/collate.icu.utf8.out
@@ -1304,6 +1304,14 @@ SELECT 'abc' <= 'ABC' COLLATE case_insensitive, 'abc' >= 'ABC' COLLATE case_inse
 t| t
 (1 row)
 
+-- test language tags
+CREATE COLLATION langtag (provider = icu, locale = 'en-u-ks-level1', deterministic = false);
+SELECT 'aBcD' COLLATE langtag = 'AbCd' COLLATE langtag;
+ ?column?
+----------
+ t
+(1 row)
+
 CREATE TABLE test1cs (x text COLLATE case_sensitive);
 CREATE TABLE test2cs (x text COLLATE
Re: Add LZ4 compression in pg_dump
On 3/14/23 12:07, gkokola...@pm.me wrote: > > > --- Original Message --- > On Monday, March 13th, 2023 at 10:47 PM, Tomas Vondra > wrote: > > > >> >>> Change pg_fatal() to an assertion+comment; >> >> >> Yeah, that's reasonable. I'd even ditch the assert/comment, TBH. We >> could add such protections against "impossible" stuff to a zillion other >> places and the confusion likely outweighs the benefits. >> > > A minor note to add is to not ignore the lessons learned from a7885c9bb. > > For example, as the testing framework stands, one can not test that the > contents of the custom format are indeed compressed. One can infer it by > examining the header of the produced dump and searching for the > compression flag. The code responsible for writing the header and the > code responsible for actually dealing with data, is not the same. Also, > the compression library itself will happily read and write uncompressed > data. > > A pg_fatal, assertion, or similar, is the only guard rail against this > kind of error. Without it, the tests will continue passing even after > e.g. a wrong initialization of the API. It was such a case that lead to > a7885c9bb in the first place. I do think that we wish it to be an > "impossible" case. Also it will be an untested case with some history > without such a guard rail. > So is the pg_fatal() a dead code or not? My understanding was it's not really reachable, and the main purpose is to remind people this is not possible. Or am I mistaken/confused? If it's reachable, can we test it? AFAICS we don't, per the coverage reports. If it's just a protection against incorrect API initialization, then an assert is the right solution, I think. With proper comment. But can't we actually verify that *during* the initialization? Also, how come WriteDataToArchiveLZ4() doesn't need this protection too? Or is that due to gzip being the default compression method? 
> Of course I will not object to removing it, if you think that is more > confusing than useful. > Not sure, I have a feeling I don't quite understand in what situation this actually helps. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: ICU 54 and earlier are too dangerous
On Mon, 2023-03-13 at 18:13 -0700, Andres Freund wrote: > What non-error code is returned in the above example? When the collator for locale "asdf" is opened, the status is set to U_USING_DEFAULT_WARNING. That seemed very promising at first, but it's the same thing returned after opening most valid locales, including "en" and "en-US". It seems to only return U_ZERO_ERROR on an exact hit, like "fr-CA" or "root". There's also U_USING_FALLBACK_WARNING, which also seemed promising, but it's returned when opening "fr-FR" or "ja-JP" (falls back to "fr" and "ja" respectively). > Can we query the returned collator and see if it matches what we were > looking > for? I tried a few variations of that in my canonicalization / validation patch, which I called "validation". The closest thing I found is: ucol_getLocaleByType(collator, ULOC_VALID_LOCALE, ) We could strip away the attributes and compare to the result of that, and it mostly works. There are a few complications, like I think we need to preserve the "collation" attribute for things like "de@collation=phonebook". Another thing to consider is that the environment might happen to open the collation you intend at the time the collation is created, but then later of course the environment can change, so we'd have to check every time it's opened. And getting an error when the collation is opened is not great, so it might need to be a WARNING or something, and it starts to get less useful. What would be *really* nice is if there was some kind of way to tell if there was no real match to a known locale, either during open or via some other API. I wasn't able to find one, though. Actually, now that I think about it, we could just search all known locales using either ucol_getAvailable() or uloc_getAvailable(), and see if there's a match. Not very clean, but it should catch most problems. I'll look into whether there's a reasonable way to match or not. > > I'm a bit confused by the dates. 
> https://icu.unicode.org/download/55m1 says > that version was released 2014-12-17, but the linked issue around > root locales > is from 2018: https://unicode-org.atlassian.net/browse/ICU-10823 - I > guess > the issue tracker was migrated at some point or such... The dates are misleading in both git (migrated from SVN circa 2016) and JIRA (migrated circa 2018, see https://unicode-org.atlassian.net/browse/ICU-1 ). It seems 55.1 was released in either 2014 or 2015. Regards, Jeff Davis
Re: Add LZ4 compression in pg_dump
On 3/14/23 16:18, gkokola...@pm.me wrote: > ...> Would you mind me trying to come with a patch to address your points? > That'd be great, thanks. Please keep it split into smaller patches - two might work, with one patch for "cosmetic" changes and the other tweaking the API error-handling stuff. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
RE: Allow logical replication to copy tables in binary format
On Tuesday, March 14, 2023 8:02 PM Melih Mutlu wrote: > Attached v13. Hi, Thanks for sharing v13. Few minor review comments. (1) create_subscription.sgml + column types as opposed to text format. Even when this option is enabled, + only data types having binary send and receive functions will be + transferred in binary. Note that the initial synchronization requires (1-1) I think it's helpful to add a reference for the description about send and receive functions (e.g. to the page of CREATE TYPE). (1-2) Also, it would be better to have a cross reference from there to this doc as one paragraph probably in "Notes". I suggested this, because send and receive functions are described as "optional" there and missing them leads to error in the context of binary table synchronization. (3) copy_table() + /* +* If the publisher version is earlier than v14, it COPY command doesn't +* support the binary option. +*/ This sentence doesn't look correct grammatically. We can replace "it COPY command" with "subscription" for example. Kindly please fix it. Best Regards, Takamichi Osumi
Re: Add LZ4 compression in pg_dump
--- Original Message --- On Monday, March 13th, 2023 at 9:21 PM, Tomas Vondra wrote: > > > > > On 3/11/23 11:50, gkokola...@pm.me wrote: > > > --- Original Message --- > > On Saturday, March 11th, 2023 at 7:00 AM, Alexander Lakhin > > exclus...@gmail.com wrote: > > > > > Hello, > > > 23.02.2023 23:24, Tomas Vondra wrote: > > > Thanks for the patch. > > I did look if there are other places that might have the same issue, and > I think there are - see attached 0002. For example LZ4File_write is > declared to return size_t, but then it also does > > if (LZ4F_isError(status)) > { > fs->errcode = status; > > return -1; > } > > That won't work :-( You are right. It is confusing. > > And these issues may not be restricted to lz4 code - Gzip_write is > declared to return size_t, but it does > > return gzwrite(gzfp, ptr, size); > > and gzwrite returns int. Although, maybe that's correct, because > gzwrite() is "0 on error" so maybe this is fine ... > > However, Gzip_read assigns gzread() to size_t, and that does not seem > great. It probably will still trigger the following pg_fatal() because > it'd be very lucky to match the expected 'size', but it's confusing. Agreed. > > > I wonder whether CompressorState should use int or size_t for the > read_func/write_func callbacks. I guess no option is perfect, i.e. no > data type will work for all compression libraries we might use (lz4 uses > int while lz4f uses size_t, to there's that). > > It's a bit weird the "open" functions return int and the read/write > size_t. Maybe we should stick to int, which is what the old functions > (cfwrite etc.) did. > You are right. These functions are modeled by the open/fread/ fwrite etc, and they have kept the return types of these ones. Their callers do check the return value of read_func and write_func against the requested size of bytes to be transferred. > > But I think the actual problem here is that the API does not clearly > define how errors are communicated. 
I mean, it's nice to return the > value returned by the library function without "mangling" it by > conversion to size_t, but what if the libraries communicate errors in > different way? Some may return "0" while others may return "-1". Agreed. > > I think the right approach is to handle all library errors and not just > let them through. So Gzip_write() needs to check the return value, and > either call pg_fatal() or translate it to an error defined by the API. It makes sense. It will change some of the behaviour of the callers, mostly on what constitutes an error, and what error message is emitted. This is a reasonable change though. > > For example we might say "returns 0 on error" and then translate all > library-specific errors to that. Ok. > While looking at the code I realized a couple function comments don't > say what's returned in case of error, etc. So 0004 adds those. > > 0003 is a couple minor assorted comments/questions: > > - Should we move ZLIB_OUT_SIZE/ZLIB_IN_SIZE to compress_gzip.c? It would make things clearer. > - Why are LZ4 buffer sizes different (ZLIB has both 4kB)? Clearly some comments are needed, if the difference makes sense. > - I wonder if we actually need LZ4F_HEADER_SIZE_MAX? Is it even possible > for LZ4F_compressBound to return value this small (especially for 16kB > input buffer)? > I would recommend to keep it. Earlier versions of LZ4F_HEADER_SIZE_MAX do not have it. Later versions do advise to use it. Would you mind me trying to come with a patch to address your points? Cheers, //Georgios > > > regards > > > -- > Tomas Vondra > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company
Re: Progress report of CREATE INDEX for nested partitioned tables
> On 14 March 2023, at 18:34, Justin Pryzby wrote: > > On Mon, Mar 13, 2023 at 10:42:59AM -0400, Tom Lane wrote: >> Justin Pryzby writes: >>> On Sun, Mar 12, 2023 at 06:25:13PM -0400, Tom Lane wrote: I agree that adding such a field to IndexStmt would be a very bad idea. However, adding another parameter to DefineIndex doesn't seem like a problem. >> >>> It's a problem since this is a bug and it's desirable to backpatch a >>> fix, right ? >> >> I do not think this is important enough to justify a back-patch. > > That's fine with me, but it comes as a surprise, and it might invalidate > earlier discussions, which were working under the constraint of > maintaining a compatible ABI. > >>> Incrementing by 0 sounds terrible, since someone who has intermediate >>> partitioned tables is likely to always see 0% done. >> >> How so? The counter will increase after there's some actual work done, >> ie building an index. If there's no indexes to build then it hardly >> matters, because the command will complete in very little time. > > I misunderstood your idea as suggesting to skip progress reporting for > *every* intermediate partitioned index, and its leaves. But I guess > what you meant is to skip progress reporting when ATTACHing an > intermediate partitioned index. That seems okay, since 1) intermediate > partitioned tables are probably rare, and 2) ATTACH is fast, so the > effect is indistinguishable from querying the progress report a few > moments later. +1 that counting attached partitioned indexes as 0 is fine. > The idea would be for: > 1) TOTAL to show the number of direct and indirect leaf partitions; > 2) update progress while building direct or indirect indexes; > 3) ATTACHing intermediate partitioned tables to increment by 0; > 4) ATTACHing a direct child should continue to increment by 1, > since that common case already works as expected and shouldn't be > changed. > > The only change from the current patch is (3).
(1) still calls > count_leaf_partitions(), but only once. I'd prefer that to rearranging > the progress reporting to set the TOTAL in ProcessUtilitySlow(). > > -- > Justin As for reusing TOTAL calculated outside of DefineIndex, as I can see, ProcessUtilitySlow is not the only call site for DefineIndex (although, I don’t know whether all of them need progress tracking), for instance, there is ALTER TABLE that calls DefineIndex to create index for constraints. So I feel like rearranging progress reporting will result in unnecessary code duplication in those call sites, so passing in an optional parameter seems to be easier here, if we are going to optimize it, after all. Especially if back-patching is a non-issue.
Re: doc: mentioned CREATE+ATTACH PARTITION as an alternative to CREATE TABLE..PARTITION OF
Hi, I've tested the attached patch by Justin and it applied almost cleanly to the master, but there was a tiny typo and make postgres-A4.pdf didn't want to run: Note that creating a partition using PARTITION OF => (note lack of closing literal) => Note that creating a partition using PARTITION OF Attached is version v0002 that contains this fix. @Justin maybe you could set the status to Ready for Committer ( https://commitfest.postgresql.org/42/3790/ ) ? On Thu, Jan 19, 2023 at 10:58 PM Justin Pryzby wrote: > > On Thu, Jan 19, 2023 at 04:47:59PM -0500, Robert Treat wrote: > > I think all of that feedback is useful, I guess the immediate question > > becomes if Justin wants to try to proceed with his patch implementing > > the change, or if adjusting the documentation for the current > > implementation is the right move for now. > > The docs change is desirable in any case, since it should be > backpatched, and any patch to change CREATE..PARTITION OF would be for > v17+ anyway. > > -- > Justin > > v0002-Justin-doc_mention_CREATE+ATTACH_PARTITION.patch Description: Binary data
Re: Progress report of CREATE INDEX for nested partitioned tables
Justin Pryzby writes: > The idea would be for: > 1) TOTAL to show the number of direct and indirect leaf partitions; > 2) update progress while building direct or indirect indexes; > 3) ATTACHing intermediate partitioned tables to increment by 0; > 4) ATTACHing a direct child should continue to increment by 1, > since that common case already works as expected and shouldn't be > changed. OK. > The only change from the current patch is (3). (1) still calls > count_leaf_partitions(), but only once. I'd prefer that to rearranging > the progress reporting to set the TOTAL in ProcessUtilitySlow(). I don't agree with that. find_all_inheritors is fairly expensive and it seems completely silly to do it twice just to avoid adding a parameter to DefineIndex. regards, tom lane
Re: Ignoring BRIN for HOT updates (was: -udpates seems broken)
On Tue, 14 Mar 2023 at 14:49, Tomas Vondra wrote: > > > > On 3/8/23 23:31, Matthias van de Meent wrote: > > On Wed, 22 Feb 2023 at 14:14, Matthias van de Meent > > > > I think that the v4 patch solves all comments up to now; and > > considering that most of this patch was committed but then reverted > > due to an issue in v15, and that said issue is fixed in this patch, > > I'm marking this as ready for committer. > > > > Tomas, would you be up for that? > > > > Thanks for the patch. I started looking at it yesterday, and I think > it's 99% RFC. I think it's correct and I only have some minor comments, > (see the 0002 patch): > > > 1) There were still a couple minor wording issues in the sgml docs. > > 2) bikeshedding: I added a bunch of "()" to various conditions, I think > it makes it clearer. Sure > 3) This seems a bit weird way to write a conditional Assert: > > if (onlySummarized) > Assert(HeapTupleIsHeapOnly(heapTuple)); > > better to do a composed Assert(!(onlySummarized && !...)) or something? I don't like this double negation, as it adds significant parsing complexity to the statement. If I'd have gone with a single Assert() statement, I'd have used the following: Assert((!onlySummarized) || HeapTupleIsHeapOnly(heapTuple)); because in the code section above that the HOT + !onlySummarized case is an early exit. > 4) A couple comments and minor tweaks. > 5) Undoing a couple unnecessary changes (whitespace, ...). > 6) Proper formatting of TU_UpdateIndexes enum. Allright > + * > + * XXX Why do we assign explicit values to the members, instead of just > letting > + * it up to the enum (just like for TM_Result)? This was from the v15 beta window, to reduce the difference between bool and TU_UpdateIndexes. With pg16, that can be dropped. > > 7) Comment in RelationGetIndexAttrBitmap() is misleading, as it still > references hotblockingattrs, even though it may update summarizedattrs > in some cases. 
How about "Since we have covering indexes with non-key columns, we must handle them accurately here. Non-key columns must be added into the hotblocking or summarizing attrs bitmap, since they are in the index, and an update shouldn't miss them." instead for that section? > If you agree with these changes, I'll get it committed. Yes, thanks! Kind regards, Matthias van de Meent
Re: Progress report of CREATE INDEX for nested partitioned tables
On Mon, Mar 13, 2023 at 10:42:59AM -0400, Tom Lane wrote: > Justin Pryzby writes: > > On Sun, Mar 12, 2023 at 06:25:13PM -0400, Tom Lane wrote: > >> I agree that adding such a field to IndexStmt would be a very bad idea. > >> However, adding another parameter to DefineIndex doesn't seem like a > >> problem. > > > It's a problem since this is a bug and it's desirable to backpatch a > > fix, right ? > > I do not think this is important enough to justify a back-patch. That's fine with me, but it comes as a surprise, and it might invalidate earlier discussions, which were working under the constraint of maintaining a compatible ABI. > > Incrementing by 0 sounds terrible, since someone who has intermediate > > partitioned tables is likely to always see 0% done. > > How so? The counter will increase after there's some actual work done, > ie building an index. If there's no indexes to build then it hardly > matters, because the command will complete in very little time. I misunderstood your idea as suggesting to skip progress reporting for *every* intermediate partitioned index, and its leaves. But I guess what you meant is to skip progress reporting when ATTACHing an intermediate partitioned index. That seems okay, since 1) intermediate partitioned tables are probably rare, and 2) ATTACH is fast, so the effect is indistinguishable from querying the progress report a few moments later. The idea would be for: 1) TOTAL to show the number of direct and indirect leaf partitions; 2) update progress while building direct or indirect indexes; 3) ATTACHing intermediate partitioned tables to increment by 0; 4) ATTACHing a direct child should continue to increment by 1, since that common case already works as expected and shouldn't be changed. The only change from the current patch is (3). (1) still calls count_leaf_partitions(), but only once. I'd prefer that to rearranging the progress reporting to set the TOTAL in ProcessUtilitySlow(). -- Justin
Re: lz4 --rm on Ubuntu 18.04 (Add LZ4 compression to pg_dump)
On 3/14/23 11:34, Christoph Berg wrote: > Re: Tomas Vondra >> and I don't think there's a good place to inject the 'rm' so I ended up >> adding a 'cleanup_cmd' right after 'compress_cmd'. But it seems a bit >> strange / hacky. Maybe there's a better way? > > Does the file need to be removed at all? Could we leave it there and > have "make clean" remove it? > I don't think that'd work, because of the automatic "discovery" where we check if a file exists, and if not we try to append .gz and .lz4. So if you leave the .toc, we'd not find the .lz4, making the test useless ... regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: meson: Non-feature feature options
Hi, On Wed, 22 Feb 2023 at 12:14, Peter Eisentraut wrote: > > On 21.02.23 17:32, Nazir Bilal Yavuz wrote: > >>> I like the second approach, with a 'uuid' feature option. As you wrote > >>> earlier, adding an 'auto' choice to a combo option doesn't work fully > >>> like a > >>> real feature option. > >> But we can make it behave exactly like one, by checking the auto_features > >> option. > > Yes, we can set it like `uuidopt = get_option('auto_features')`. > > However, if someone wants to set 'auto_features' to 'disabled' but > > 'uuid' to 'enabled'(to find at least one working uuid library); this > > won't be possible. We can add 'enabled', 'disabled and 'auto' choices > > to 'uuid' combo option to make all behaviours possible but adding > > 'uuid' feature option and 'uuid_library' combo option seems better to > > me. > > I think the uuid side of this is making this way too complicated. I'm > content leaving this as a manual option for now. > > There is much more value in making the ssl option work automatically. > So I would welcome a patch that just makes -Dssl=auto work smoothly, > perhaps using the "trick" that Andres described. I tried to implement what we did for ssl to uuid as well, do you have any comments? 
Regards, Nazir Bilal Yavuz Microsoft From 0e92050c15ca6e9ebeddaafad229eee53fc9e7b0 Mon Sep 17 00:00:00 2001 From: Nazir Bilal Yavuz Date: Tue, 14 Mar 2023 16:38:02 +0300 Subject: [PATCH v1] meson: Make auto the default of the uuid option --- .cirrus.yml | 5 +-- meson.build | 80 ++- meson_options.txt | 4 +- src/makefiles/meson.build | 2 +- 4 files changed, 59 insertions(+), 32 deletions(-) diff --git a/.cirrus.yml b/.cirrus.yml index 505c50f3285..1ccb14c6340 100644 --- a/.cirrus.yml +++ b/.cirrus.yml @@ -181,7 +181,7 @@ task: su postgres <<-EOF meson setup \ --buildtype=debug \ --Dcassert=true -Duuid=bsd -Dtcl_version=tcl86 -Ddtrace=auto \ +-Dcassert=true -Dtcl_version=tcl86 -Ddtrace=auto \ -DPG_TEST_EXTRA="$PG_TEST_EXTRA" \ -Dextra_lib_dirs=/usr/local/lib -Dextra_include_dirs=/usr/local/include/ \ build @@ -243,7 +243,6 @@ LINUX_CONFIGURE_FEATURES: _CONFIGURE_FEATURES >- LINUX_MESON_FEATURES: _MESON_FEATURES >- -Dllvm=enabled - -Duuid=e2fs task: @@ -496,7 +495,7 @@ task: -Dextra_include_dirs=${brewpath}/include \ -Dextra_lib_dirs=${brewpath}/lib \ -Dcassert=true \ - -Duuid=e2fs -Ddtrace=auto \ + -Ddtrace=auto \ -Dsegsize_blocks=6 \ -DPG_TEST_EXTRA="$PG_TEST_EXTRA" \ build diff --git a/meson.build b/meson.build index 2ebdf914c1b..7955ae42d03 100644 --- a/meson.build +++ b/meson.build @@ -1282,34 +1282,61 @@ endif ### uuidopt = get_option('uuid') +uuid_library = 'none' +uuid = not_found_dep +if uuidopt == 'auto' and auto_features.disabled() + uuidopt = 'none' +endif + if uuidopt != 'none' - uuidname = uuidopt.to_upper() - if uuidopt == 'e2fs' -uuid = dependency('uuid', required: true) -uuidfunc = 'uuid_generate' -uuidheader = 'uuid/uuid.h' - elif uuidopt == 'bsd' -# libc should have uuid function -uuid = declare_dependency() -uuidfunc = 'uuid_to_string' -uuidheader = 'uuid.h' - elif uuidopt == 'ossp' -uuid = dependency('ossp-uuid', required: true) -uuidfunc = 'uuid_export' -uuidheader = 'uuid.h' - else -error('huh') - endif - if not cc.has_header_symbol(uuidheader, 
uuidfunc, args: test_c_args, dependencies: uuid) -error('uuid library @0@ missing required function @1@'.format(uuidopt, uuidfunc)) - endif - cdata.set('HAVE_@0@'.format(uuidheader.underscorify().to_upper()), 1) + uuids = { +'e2fs': { + 'uuiddep': 'uuid', + 'uuidfunc': 'uuid_generate', + 'uuidheader': 'uuid/uuid.h', +}, +'bsd': { + 'uuiddep': '', + 'uuidfunc': 'uuid_to_string', + 'uuidheader': 'uuid.h', +}, +'ossp': { + 'uuiddep': 'ossp-uuid', + 'uuidfunc': 'uuid_export', + 'uuidheader': 'uuid.h', +} + } - cdata.set('HAVE_UUID_@0@'.format(uuidname), 1, - description: 'Define to 1 if you have @0@ UUID support.'.format(uuidname)) -else - uuid = not_found_dep + foreach uuidname, uuid_values : uuids +if uuidopt != 'auto' and uuidname != uuidopt + continue +endif +uuid_required = (uuidname == uuidopt) +uuiddep = uuid_values['uuiddep'] +uuidheader = uuid_values['uuidheader'] +uuidfunc = uuid_values['uuidfunc'] + +# libc should have uuid function +uuid = uuiddep != '' ? dependency(uuiddep, required: uuid_required) : \ +declare_dependency() + +if uuid.found() and cc.has_header_symbol(uuidheader, uuidfunc, args: test_c_args, + dependencies: uuid, required: uuid_required) + uuidname_upper =
Re: Ignoring BRIN for HOT updates (was: -udpates seems broken)
On 3/8/23 23:31, Matthias van de Meent wrote: > On Wed, 22 Feb 2023 at 14:14, Matthias van de Meent > wrote: >> >> On Wed, 22 Feb 2023 at 13:15, Tomas Vondra >> wrote: >>> >>> On 2/20/23 19:15, Matthias van de Meent wrote: Thanks. Based on feedback, attached is v2 of the patch, with these significant changes: - We don't store the columns we mention in predicates of summarized indexes in the hotblocking column anymore, they are stored in the summarized columns bitmap instead. This further reduces the chance of failing to apply HOT with summarizing indexes. >>> >>> Interesting idea. I need to think about the correctness, but AFAICS it >>> should work. Do we have any tests covering such cases? >> >> There is a test that checks that an update to the predicated column >> does update the index (on table brin_hot_2). However, the description >> was out of date, so I've updated that in v4. >> - The heaptuple header bit for summarized update in inserted tuples is replaced with passing an out parameter. This simplifies the logic and decreases chances of accidentally storing incorrect data. >>> >>> OK. >>> >>> 0002 proposes a minor RelationGetIndexPredicate() tweak, getting rid of >>> the repeated if/else branches. Feel free to discard, if you think the v2 >>> approach is better. >> >> I agree that this is better, it's included in v4 of the patch, as attached. > > I think that the v4 patch solves all comments up to now; and > considering that most of this patch was committed but then reverted > due to an issue in v15, and that said issue is fixed in this patch, > I'm marking this as ready for committer. > > Tomas, would you be up for that? > Thanks for the patch. I started looking at it yesterday, and I think it's 99% RFC. I think it's correct and I only have some minor comments (see the 0002 patch): 1) There were still a couple of minor wording issues in the sgml docs. 2) bikeshedding: I added a bunch of "()" to various conditions, I think it makes it clearer.
3) This seems a bit of a weird way to write a conditional Assert: if (onlySummarized) Assert(HeapTupleIsHeapOnly(heapTuple)); better to do a composed Assert(!(onlySummarized && !...)) or something? 4) A couple of comments and minor tweaks. 5) Undoing a couple of unnecessary changes (whitespace, ...). 6) Proper formatting of the TU_UpdateIndexes enum. 7) The comment in RelationGetIndexAttrBitmap() is misleading, as it still references hotblockingattrs, even though it may update summarizedattrs in some cases. If you agree with these changes, I'll get it committed. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company From 4459b7b325843d2dfcfad42d645ae5d3b47d784d Mon Sep 17 00:00:00 2001 From: Tomas Vondra Date: Tue, 14 Mar 2023 01:11:38 +0100 Subject: [PATCH v5 1/2] Ignore BRIN indexes when checking for HOT updates When determining whether an index update may be skipped by using HOT, we can ignore attributes indexed by block summarizing indexes without references to individual tuples that need to be cleaned up. This also removes rd_indexattr list, and replaces it with rd_attrsvalid flag. The list was not used anywhere, and a simple flag is sufficient. A new type TU_UpdateIndexes is invented to provide a signal to the executor to determine which indexes to update - no indexes, all indexes, or only the summarizing indexes. One otherwise unused bit in the heap tuple header is (ab)used to signal that the HOT update would still update at least one summarizing index. The bit is cleared immediately Original patch by Josef Simanek, various fixes and improvements by Tomas Vondra and me.
Authors: Josef Simanek, Tomas Vondra, Matthias van de Meent Reviewed-by: Tomas Vondra, Alvaro Herrera --- doc/src/sgml/indexam.sgml | 13 ++ src/backend/access/brin/brin.c| 1 + src/backend/access/gin/ginutil.c | 1 + src/backend/access/gist/gist.c| 1 + src/backend/access/hash/hash.c| 1 + src/backend/access/heap/heapam.c | 48 +++- src/backend/access/heap/heapam_handler.c | 19 ++- src/backend/access/nbtree/nbtree.c| 1 + src/backend/access/spgist/spgutils.c | 1 + src/backend/access/table/tableam.c| 2 +- src/backend/catalog/index.c | 9 +- src/backend/catalog/indexing.c| 35 -- src/backend/commands/copyfrom.c | 5 +- src/backend/commands/indexcmds.c | 10 +- src/backend/executor/execIndexing.c | 37 -- src/backend/executor/execReplication.c| 9 +- src/backend/executor/nodeModifyTable.c| 13 +- src/backend/nodes/makefuncs.c | 7 +- src/backend/utils/cache/relcache.c| 69 +++ src/include/access/amapi.h| 2 + src/include/access/heapam.h |
Re: [PATCH] Add pretty-printed XML output option
On 09.03.23 21:21, Tom Lane wrote: Peter Smith writes: The patch v19 LGTM. Another thing that's mildly irking me is that the current factorization of this code will result in xml_parse'ing the data twice, if you have both DOCUMENT and INDENT specified. We could consider avoiding that if we merged the indentation functionality into xmltotext_with_xmloption, but it's probably premature to do so when we haven't figured out how to get the output right --- we might end up needing two xml_parse calls anyway with different parameters, perhaps. Just a thought: since xmlserialize_indent also calls xml_parse() to build the xmlDocPtr, couldn't we simply bypass xmltotext_with_xmloption() in case of INDENT is specified? Something like this: diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c index 19351fe..ea808dd 100644 --- a/src/backend/executor/execExprInterp.c +++ b/src/backend/executor/execExprInterp.c @@ -3829,6 +3829,7 @@ ExecEvalXmlExpr(ExprState *state, ExprEvalStep *op) { Datum *argvalue = op->d.xmlexpr.argvalue; bool *argnull = op->d.xmlexpr.argnull; + text *result; /* argument type is known to be xml */ Assert(list_length(xexpr->args) == 1); @@ -3837,8 +3838,14 @@ ExecEvalXmlExpr(ExprState *state, ExprEvalStep *op) return; value = argvalue[0]; - *op->resvalue = PointerGetDatum(xmltotext_with_xmloption(DatumGetXmlP(value), - xexpr->xmloption)); + if (xexpr->indent) + result = xmlserialize_indent(DatumGetXmlP(value), + xexpr->xmloption); + else + result = xmltotext_with_xmloption(DatumGetXmlP(value), + xexpr->xmloption); + + *op->resvalue = PointerGetDatum(result); *op->resnull = false; } break;
Re: pg_dump/pg_restore: Fix stdin/stdout handling of custom format on Win32
Please, don't top post. On Tue, Mar 14, 2023 at 12:30 PM Daniel Watzinger < daniel.watzin...@gmail.com> wrote: > I'm sorry I couldn't contribute to the discussion in time. The fix of the > fstat() Win32 port looks good to me. I agree that there's a need for > multiple fseek() ports to address the shortcomings of the MSVC > functionality. > > The documentation even states that "on devices incapable of seeking, the > return value is undefined". A simple wrapper using GetFileType() or the new > fstat(), to filter non-seekable devices before delegation, will probably > suffice. > > > https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/fseek-fseeki64?view=msvc-170 > I have just posted a patch to enforce the detection of unseekable streams in the fseek() calls [1], please feel free to review it. [1] https://www.postgresql.org/message-id/CAC%2BAXB26a4EmxM2suXxPpJaGrqAdxracd7hskLg-zxtPB50h7A%40mail.gmail.com Regards, Juan José Santamaría Flecha
Fix fseek() detection of unseekable files on WIN32
Hello all, As highlighted in [1] fseek() might fail to error even when accessing unseekable streams. PFA a patch that checks the file type before the actual fseek(), so only supported calls are made. [1] https://www.postgresql.org/message-id/flat/b1448cd7-871e-20e3-8398-895e2d1d3...@gmail.com Regards, Juan José Santamaría Flecha 0001-fix-fseek-detection-of-unseekable-files-for-WIN32.patch Description: Binary data
Re: Add macros for ReorderBufferTXN toptxn
On Tue, Mar 14, 2023 at 5:03 PM vignesh C wrote: > > On Tue, 14 Mar 2023 at 12:37, Peter Smith wrote: > > > > The same issue exists here too: > 1) > - if (toptxn != NULL && !rbtxn_has_catalog_changes(toptxn)) > + if (rbtxn_is_subtxn(txn)) > { > - toptxn->txn_flags |= RBTXN_HAS_CATALOG_CHANGES; > - dclist_push_tail(&rb->catchange_txns, > &toptxn->catchange_node); > + ReorderBufferTXN *toptxn = rbtxn_get_toptxn(txn); > > 2) > - if (change->txn->toptxn) > - topxid = change->txn->toptxn->xid; > + if (rbtxn_is_subtxn(change->txn)) > + topxid = rbtxn_get_toptxn(change->txn)->xid; > > If you plan to fix, both the above also should be handled. > I don't know if it would be any better to change the above two as compared to what the proposed patch has. -- With Regards, Amit Kapila.
Re: Add macros for ReorderBufferTXN toptxn
On Tue, Mar 14, 2023 at 12:37 PM Peter Smith wrote: > > Thanks for the review! > > On Mon, Mar 13, 2023 at 6:19 PM vignesh C wrote: > ... > > > 4) We check if txn->toptxn is not null twice here both in if condition > > and in the assignment, we could retain the assignment operation as > > earlier to remove the 2nd check: > > - if (txn->toptxn) > > - txn = txn->toptxn; > > + if (isa_subtxn(txn)) > > + txn = get_toptxn(txn); > > > > We could avoid one check again by: > > + if (isa_subtxn(txn)) > > + txn = txn->toptxn; > > > > Yeah, that is true, but I chose not to keep the original assignment in > this case mainly because then it is consistent with the other changed > code --- e.g. Every other direct member assignment/access of the > 'toptxn' member now hides behind the macros so I did not want this > single place to be the odd one out. TBH, I don't think 1 extra check > is of any significance, but it is not a problem to change like you > suggested if other people also want it done that way. > Can't we directly use rbtxn_get_toptxn() for this case? I think that way code will look neat. I see that it is not exactly matching the existing check so you might be worried but I feel the new code will achieve the same purpose and will be easy to follow. -- With Regards, Amit Kapila.
Re: Add macros for ReorderBufferTXN toptxn
On Tue, 14 Mar 2023 at 12:37, Peter Smith wrote: > > Thanks for the review! > > On Mon, Mar 13, 2023 at 6:19 PM vignesh C wrote: > ... > > Few comments: > > 1) Can we move the macros along with the other macros present in this > > file, just above this structure, similar to the macros added for > > txn_flags: > > /* Toplevel transaction for this subxact (NULL for top-level). */ > > +#define isa_toptxn(rbtxn) (rbtxn->toptxn == NULL) > > +#define isa_subtxn(rbtxn) (rbtxn->toptxn != NULL) > > +#define get_toptxn(rbtxn) (isa_subtxn(rbtxn) ? rbtxn->toptxn : rbtxn) > > > > 2) The macro name can be changed to rbtxn_is_toptxn, rbtxn_is_subtxn > > and rbtxn_get_toptxn to keep it consistent with others: > > /* Toplevel transaction for this subxact (NULL for top-level). */ > > +#define isa_toptxn(rbtxn) (rbtxn->toptxn == NULL) > > +#define isa_subtxn(rbtxn) (rbtxn->toptxn != NULL) > > +#define get_toptxn(rbtxn) (isa_subtxn(rbtxn) ? rbtxn->toptxn : rbtxn) > > > > 3) We could add separate comments for each of the macros: > > /* Toplevel transaction for this subxact (NULL for top-level). */ > > +#define isa_toptxn(rbtxn) (rbtxn->toptxn == NULL) > > +#define isa_subtxn(rbtxn) (rbtxn->toptxn != NULL) > > +#define get_toptxn(rbtxn) (isa_subtxn(rbtxn) ? rbtxn->toptxn : rbtxn) > > > > All the above are fixed as suggested. > > > 4) We check if txn->toptxn is not null twice here both in if condition > > and in the assignment, we could retain the assignment operation as > > earlier to remove the 2nd check: > > - if (txn->toptxn) > > - txn = txn->toptxn; > > + if (isa_subtxn(txn)) > > + txn = get_toptxn(txn); > > > > We could avoid one check again by: > > + if (isa_subtxn(txn)) > > + txn = txn->toptxn; > > > > Yeah, that is true, but I chose not to keep the original assignment in > this case mainly because then it is consistent with the other changed > code --- e.g. 
Every other direct member assignment/access of the > 'toptxn' member now hides behind the macros so I did not want this > single place to be the odd one out. TBH, I don't think 1 extra check > is of any significance, but it is not a problem to change like you > suggested if other people also want it done that way. The same issue exists here too: 1) - if (toptxn != NULL && !rbtxn_has_catalog_changes(toptxn)) + if (rbtxn_is_subtxn(txn)) { - toptxn->txn_flags |= RBTXN_HAS_CATALOG_CHANGES; - dclist_push_tail(&rb->catchange_txns, &toptxn->catchange_node); + ReorderBufferTXN *toptxn = rbtxn_get_toptxn(txn); 2) - if (change->txn->toptxn) - topxid = change->txn->toptxn->xid; + if (rbtxn_is_subtxn(change->txn)) + topxid = rbtxn_get_toptxn(change->txn)->xid; If you plan to fix, both the above also should be handled. 3) The comment on top of rbtxn_get_toptxn could be kept similar in both the below places. I know it is not because of your change, but since it is a very small change probably we could include it along with this patch: @@ -717,10 +717,7 @@ ReorderBufferProcessPartialChange(ReorderBuffer *rb, ReorderBufferTXN *txn, return; /* Get the top transaction. */ - if (txn->toptxn != NULL) - toptxn = txn->toptxn; - else - toptxn = txn; + toptxn = rbtxn_get_toptxn(txn); /* * Indicate a partial change for toast inserts. The change will be @@ -812,10 +809,7 @@ ReorderBufferQueueChange(ReorderBuffer *rb, TransactionId xid, XLogRecPtr lsn, ReorderBufferTXN *toptxn; /* get the top transaction */ - if (txn->toptxn != NULL) - toptxn = txn->toptxn; - else - toptxn = txn; + toptxn = rbtxn_get_toptxn(txn); Regards, Vignesh
Re: pg_dump/pg_restore: Fix stdin/stdout handling of custom format on Win32
I'm sorry I couldn't contribute to the discussion in time. The fix of the fstat() Win32 port looks good to me. I agree that there's a need for multiple fseek() ports to address the shortcomings of the MSVC functionality. The documentation even states that "on devices incapable of seeking, the return value is undefined". A simple wrapper using GetFileType() or the new fstat(), to filter non-seekable devices before delegation, will probably suffice. https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/fseek-fseeki64?view=msvc-170 Regarding test automation and regression testing, there's a programmatic way to simulate how the "pipe operator" of cmd.exe and other shells works using CreateProcess and manual "piping" by means of various WinAPI functionality. This is actually how the bug was discovered in the first place. However, existing tests are probably platform-agnostic. -- Daniel On Tue, Mar 14, 2023 at 12:41 AM Michael Paquier wrote: > On Mon, Mar 13, 2023 at 05:49:41PM +0100, Juan José Santamaría Flecha > wrote: > > WFM, making fseek() behaviour more resilient seems like a good > improvement > > overall. > > I have not looked in details, but my guess would be to add a > win32seek.c similar to win32stat.c with a port of fseek() that's more > resilient to the definitions in POSIX. > > > Should we open a new thread to make that part more visible? > > Yes, perhaps it makes sense to do so to attract the correct audience. > There may be a few things we are missing. > > When it comes to pg_dump, both fixes are required, still it seems to > me that adjusting the fstat() port and the fseek() ports are two > different bugs, as they influence different parts of the code base > when taken individually (aka this fseek() port for WIN32 would need > fstat() to properly detect a pipe, as far as I understand). > > Meanwhile, I'll go apply and backpatch 0001 to fix the first bug at > hand with the fstat() port, if there are no objections. > -- > Michael >
Re: [PoC] Improve dead tuple storage for lazy vacuum
I wrote: > > > Since the block-level measurement is likely overestimating quite a bit, I propose to simply reverse the order of the actions here, effectively reporting progress for the *last page* and not the current one: First update progress with the current memory usage, then add tids for this page. If this allocated a new block, only a small bit of that will be written to. If this block pushes it over the limit, we will detect that up at the top of the loop. It's kind of like our earlier attempts at a "fudge factor", but simpler and less brittle. And, as far as OS pages we have actually written to, I think it'll effectively respect the memory limit, at least in the local mem case. And the numbers will make sense. > > > > > > Thoughts? > > > > It looks to work but it still doesn't work in a case where a shared > > tidstore is created with a 64kB memory limit, right? > > TidStoreMemoryUsage() returns 1MB and TidStoreIsFull() returns true > > from the beginning. > > I have two ideas: > > 1. Make it optional to track chunk memory space by a template parameter. It might be tiny compared to everything else that vacuum does. That would allow other users to avoid that overhead. > 2. When context block usage exceeds the limit (rare), make the additional effort to get the precise usage -- I'm not sure such a top-down facility exists, and I'm not feeling well enough today to study this further. Since then, Masahiko incorporated #1 into v31, and that's what I'm looking at now. Unfortunately, if I had spent five minutes reminding myself what the original objections were to this approach, I could have saved us some effort. Back in July (!), Andres raised two points: GetMemoryChunkSpace() is slow [1], and fragmentation [2] (leading to underestimation). In v31, in the local case at least, the underestimation is actually worse than tracking chunk space, since it ignores chunk header and alignment. I'm not sure about the DSA case. This doesn't seem great.
It shouldn't be a surprise why a simple increment of raw allocation size is comparable in speed -- GetMemoryChunkSpace() calls the right function through a pointer, which is slower. If we were willing to underestimate for the sake of speed, that takes away the reason for making memory tracking optional. Further, if the option is not specified, in v31 there is no way to get the memory use at all, which seems odd. Surely the caller should be able to ask the context/area, if it wants to. I still like my idea at the top of the page -- at least for vacuum and m_w_m. It's still not completely clear if it's right but I've got nothing better. It also ignores the work_mem issue, but I've given up anticipating all future cases at the moment. I'll put this item and a couple other things together in a separate email tomorrow. [1] https://www.postgresql.org/message-id/20220704211822.kfxtzpcdmslzm2dy%40awork3.anarazel.de [2] https://www.postgresql.org/message-id/20220704220038.at2ane5xkymzzssb%40awork3.anarazel.de -- John Naylor EDB: http://www.enterprisedb.com
Re: Add LZ4 compression in pg_dump
--- Original Message --- On Monday, March 13th, 2023 at 10:47 PM, Tomas Vondra wrote: > > > Change pg_fatal() to an assertion+comment; > > > Yeah, that's reasonable. I'd even ditch the assert/comment, TBH. We > could add such protections against "impossible" stuff to a zillion other > places and the confusion likely outweighs the benefits. > A minor note to add is to not ignore the lessons learned from a7885c9bb. For example, as the testing framework stands, one can not test that the contents of the custom format are indeed compressed. One can infer it by examining the header of the produced dump and searching for the compression flag. The code responsible for writing the header and the code responsible for actually dealing with data, is not the same. Also, the compression library itself will happily read and write uncompressed data. A pg_fatal, assertion, or similar, is the only guard rail against this kind of error. Without it, the tests will continue passing even after e.g. a wrong initialization of the API. It was such a case that lead to a7885c9bb in the first place. I do think that we wish it to be an "impossible" case. Also it will be an untested case with some history without such a guard rail. Of course I will not object to removing it, if you think that is more confusing than useful. Cheers, //Georgios > > regards > > -- > Tomas Vondra > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company
Re: Allow logical replication to copy tables in binary format
Hi, Attached v13. Peter Smith wrote on Tue, 14 Mar 2023 at 03:07: > Here are some review comments for patch v12-0001 > Thanks for reviewing. I tried to make the explanations in the docs better according to your comments. What do you think? Amit Kapila wrote on Tue, 14 Mar 2023 at 06:17: > I think it would be better to write the tests for this feature in the > existing test file 014_binary as that would save some time for node > setup/shutdown and also that would be a more appropriate place for > these tests. I removed 032_binary_copy.pl and added those tests into 014_binary.pl. Also added the case with different column order as Peter suggested. Best, -- Melih Mutlu Microsoft v13-0001-Allow-logical-replication-to-copy-table-in-binary.patch Description: Binary data
Re: [Proposal] Allow pg_dump to include all child tables with the root table
On 14/03/2023 at 10:50, stephane tachoires wrote: The following review has been posted through the commitfest application: make installcheck-world: tested, passed Implements feature: tested, passed Spec compliant: tested, passed Documentation: tested, passed V5 is ok (aside from LLVM 14 deprecation warnings, but that's another problem) with meson compile and meson test on Ubuntu 20.04.2. The code fits well and looks standard, --help explains what it does clearly, and the documentation is ok (but as a Frenchman, I'm not an expert in English). Thanks Stéphane, I've changed the commitfest status to "Ready for committers". -- Gilles Darold
Re: lz4 --rm on Ubuntu 18.04 (Add LZ4 compression to pg_dump)
Re: Tomas Vondra > and I don't think there's a good place to inject the 'rm' so I ended up > adding a 'cleanup_cmd' right after 'compress_cmd'. But it seems a bit > strange / hacky. Maybe there's a better way? Does the file need to be removed at all? Could we leave it there and have "make clean" remove it? Christoph
Re: [Proposal] Allow pg_dump to include all child tables with the root table
The following review has been posted through the commitfest application: make installcheck-world: tested, passed Implements feature: tested, passed Spec compliant: tested, passed Documentation: tested, passed V5 is ok (aside from LLVM 14 deprecation warnings, but that's another problem) with meson compile and meson test on Ubuntu 20.04.2. The code fits well and looks standard, --help explains what it does clearly, and the documentation is ok (but as a Frenchman, I'm not an expert in English).
Re: [Proposal] Allow pg_dump to include all child tables with the root table
Hi Gilles, V5 is ok (aside from LLVM 14 deprecation warnings, but that's another problem) with meson compile and meson test on Ubuntu 20.04.2. The code fits well and looks standard, --help explains what it does clearly, and the documentation is ok (but as a Frenchman, I'm not an expert in English). Stéphane On Mon, 13 Mar 2023 at 16:15, Gilles Darold wrote: > On 12/03/2023 at 19:05, Stéphane Tachoires wrote: > > > Hi Gilles, > > On Ubuntu 22.04.2 all deb's updated, I have an error on a test > I'll try to find where and why, but I think you should know first. > > 1/1 postgresql:pg_dump / pg_dump/002_pg_dump ERROR > 24.40s exit status 1 > ―― > ✀ > > ―― > stderr: > # Failed test 'only_dump_measurement: should dump CREATE TABLE > test_compression' > # at /media/hddisk/stephane/postgresql/src/postgresql/src/bin/pg_dump/t/ > 002_pg_dump.pl line 4729. > # Review only_dump_measurement results in > /media/hddisk/stephane/postgresql/build/testrun/pg_dump/002_pg_dump/data/tmp_test_iJxJ > # Looks like you failed 1 test of 10264. > > > Hi Stéphane, > > > Odd, I do not have this error when running make installcheck, and I have the > same OS version as you. > > > +++ tap check in src/bin/pg_dump +++ t/001_basic.pl ok t/002_pg_dump.pl .. ok t/003_pg_dump_with_server.pl .. ok t/010_dump_connstr.pl . ok All tests successful. Files=4, Tests=9531, 11 wallclock secs ( 0.33 usr 0.04 sys + 3.05 cusr 1.22 csys = 4.64 CPU) Result: PASS > > Anyway this test must be fixed and this is done in version v5 of the patch > attached here. > > Thanks for the review. > > -- > Gilles Darold > > -- "Where did the swallows perch before the telephone was invented?" -- Grégoire Lacroix
Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
> > > > Thanks for the updated patch. > Few minor comments: > 1) The extra line break after IsIndexOnlyOnExpression function can be > removed: > removed > > > 2) Generally we don't terminate with "." for single line comments > + > + /* > + * Simple case, we already have a primary key or a replica identity index. > + */ > + idxoid = GetRelationIdentityOrPK(localrel); > + if (OidIsValid(idxoid)) > + return idxoid; > > Well, there are several "." for single line comments even in the same file such as: /* 0 means it's a dropped attribute. See comments atop AttrMap. */ I really don't have any preference on this, but I doubt if I change it, I'll get another review suggesting to conform to the existing style in the same file. So, I'm skipping this suggestion for now, unless you have objections. v51_0001_use_index_on_subs_when_pub_rep_ident_full.patch Description: Binary data
Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
On Tue, 14 Mar 2023 at 14:36, Önder Kalacı wrote: > > > Amit Kapila , 14 Mar 2023 Sal, 11:59 tarihinde şunu > yazdı: >> >> On Tue, Mar 14, 2023 at 12:48 PM Önder Kalacı wrote: >> >> >> >> 2. >> >> +# make sure that the subscriber has the correct data after the update >> >> UPDATE >> >> >> >> "update UPDATE" seems to be a typo. >> >> >> > >> > thanks, fixed >> > >> >> >> >> 3. >> >> +# now, drop the index with the expression, and re-create index on column >> >> lastname >> >> >> >> The comment says "re-create index on column lastname" but it seems we >> >> didn't do >> >> that, should it be modified to something like: >> >> # now, drop the index with the expression, we will use sequential scan >> >> >> >> >> > >> > Thanks, fixed >> > >> > I'll add the changes to v49 in the next e-mail. >> > >> >> It seems you forgot to address these last two comments in the latest version. >> > > Oops, sorry. I think when I get your test changes, I somehow overridden these > changes > on my local. Thanks for the updated patch. Few minor comments: 1) The extra line break after IsIndexOnlyOnExpression function can be removed: + } + + return true; +} + + +/* + * Returns true if the attrmap (which belongs to remoterel) contains the + * leftmost column of the index. + * + * Otherwise returns false. + */ 2) Generally we don't terminate with "." for single line comments + + /* + * Simple case, we already have a primary key or a replica identity index. + */ + idxoid = GetRelationIdentityOrPK(localrel); + if (OidIsValid(idxoid)) + return idxoid; Regards, Vignesh
Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Amit Kapila wrote on Tue, 14 Mar 2023 at 11:59: > On Tue, Mar 14, 2023 at 12:48 PM Önder Kalacı > wrote: > >> > >> 2. > >> +# make sure that the subscriber has the correct data after the update UPDATE > >> > >> "update UPDATE" seems to be a typo. > >> > > > > thanks, fixed > > > >> > >> 3. > >> +# now, drop the index with the expression, and re-create index on > column lastname > >> > >> The comment says "re-create index on column lastname" but it seems we > didn't do > >> that, should it be modified to something like: > >> # now, drop the index with the expression, we will use sequential scan > >> > >> > > > > Thanks, fixed > > > > I'll add the changes to v49 in the next e-mail. > > > > It seems you forgot to address these last two comments in the latest > version. > > Oops, sorry. I think when I got your test changes, I somehow overrode these changes in my local copy. v50_0001_use_index_on_subs_when_pub_rep_ident_full.patch Description: Binary data
Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
On Tue, Mar 14, 2023 at 12:48 PM Önder Kalacı wrote: >> >> 2. >> +# make sure that the subscriber has the correct data after the update UPDATE >> >> "update UPDATE" seems to be a typo. >> > > thanks, fixed > >> >> 3. >> +# now, drop the index with the expression, and re-create index on column >> lastname >> >> The comment says "re-create index on column lastname" but it seems we didn't >> do >> that, should it be modified to something like: >> # now, drop the index with the expression, we will use sequential scan >> >> > > Thanks, fixed > > I'll add the changes to v49 in the next e-mail. > It seems you forgot to address these last two comments in the latest version. -- With Regards, Amit Kapila.
Re: Improve WALRead() to suck data directly from WAL buffers when possible
On Tue, Mar 7, 2023 at 11:14 PM Nathan Bossart wrote: > > On Tue, Mar 07, 2023 at 12:39:13PM +0530, Bharath Rupireddy wrote: > > On Tue, Mar 7, 2023 at 3:30 AM Nathan Bossart > > wrote: > >> Is it possible to memcpy more than a page at a time? > > > > It would complicate things a lot there; the logic to figure out the > > last page bytes that may or may not fit in the whole page gets > > complicated. Also, the logic to verify each page's header gets > > complicated. We might lose out if we memcpy all the pages at once and > > start verifying each page's header in another loop. > > Doesn't the complicated logic you describe already exist to some extent in > the patch? You are copying a page at a time, which involves calculating > various addresses and byte counts. Okay here I am with the v10 patch set attached that avoids multiple memcpy calls which must benefit the callers who want to read more than 1 WAL buffer page (streaming replication WAL sender for instance). > >> +elog(DEBUG1, "read %zu bytes out of %zu bytes from WAL buffers for > >> given LSN %X/%X, Timeline ID %u", > >> + *read_bytes, count, LSN_FORMAT_ARGS(startptr), tli); > >> > >> I definitely don't think we should put an elog() in this code path. > >> Perhaps this should be guarded behind WAL_DEBUG. > > > > Placing it behind WAL_DEBUG doesn't help users/developers. My > > intention was to let users know that the WAL read hit the buffers, > > it'll help them report if any issue occurs and also help developers to > > debug that issue. > > I still think an elog() is mighty expensive for this code path, even when > it doesn't actually produce any messages. And when it does, I think it has > the potential to be incredibly noisy. Well, my motive was to have a way for the user to know WAL buffer hits and misses to report any found issues. However, I have a plan later to add WAL buffer stats (hits/misses). 
I understand that even if someone enables DEBUG1, this message can bloat server log files and make recovery slower, especially on a standby. Hence, I agree to keep these logs behind the WAL_DEBUG macro like others and did so in the attached v10 patch set. Please review the attached v10 patch set further. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com From aa6454d9abb9a70b728dbba7f40279108486a3e4 Mon Sep 17 00:00:00 2001 From: Bharath Rupireddy Date: Tue, 14 Mar 2023 07:30:09 + Subject: [PATCH v10] Improve WALRead() to suck data directly from WAL buffers --- src/backend/access/transam/xlog.c | 171 src/backend/access/transam/xlogreader.c | 42 +- src/include/access/xlog.h | 6 + 3 files changed, 217 insertions(+), 2 deletions(-) diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 543d4d897a..d40b9562e1 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -1639,6 +1639,177 @@ GetXLogBuffer(XLogRecPtr ptr, TimeLineID tli) return cachedPos + ptr % XLOG_BLCKSZ; } +/* + * Read WAL from WAL buffers. + * + * Read 'count' bytes of WAL from WAL buffers into 'buf', starting at location + * 'startptr', on timeline 'tli' and return total read bytes. + * + * Note that this function reads as much as it can from WAL buffers, meaning, + * it may not read all the requested 'count' bytes. Caller must be aware of + * this and deal with it. 
+ */ +Size +XLogReadFromBuffers(XLogReaderState *state PG_USED_FOR_ASSERTS_ONLY, + XLogRecPtr startptr, + TimeLineID tli, + Size count, + char *buf) +{ + XLogRecPtr ptr = startptr; + Size nbytes = count; /* total bytes requested to be read by caller */ + Size ntotal = 0; /* total bytes read */ + Size nbatch = 0; /* bytes to be read in single batch */ + char *batchstart = NULL; /* location to read from for single batch */ + + Assert(!XLogRecPtrIsInvalid(startptr)); + Assert(count > 0); + Assert(startptr <= GetFlushRecPtr(NULL)); + Assert(!RecoveryInProgress()); + Assert(tli == GetWALInsertionTimeLine()); + + /* + * Holding WALBufMappingLock ensures inserters don't overwrite this value + * while we are reading it. We try to acquire it in shared mode so that the + * concurrent WAL readers are also allowed. We try to do as less work as + * possible while holding the lock to not impact concurrent WAL writers + * much. We quickly exit to not cause any contention, if the lock isn't + * immediately available. + */ + if (!LWLockConditionalAcquire(WALBufMappingLock, LW_SHARED)) + return ntotal; + + while (nbytes > 0) + { + XLogRecPtr expectedEndPtr; + XLogRecPtr endptr; + int idx; + char *page; + char*data; + XLogPageHeader phdr; + + idx = XLogRecPtrToBufIdx(ptr); + expectedEndPtr = ptr; + expectedEndPtr += XLOG_BLCKSZ - ptr % XLOG_BLCKSZ; + endptr = XLogCtl->xlblocks[idx]; + + /* Requested WAL isn't available in WAL buffers. */ + if (expectedEndPtr != endptr) + break;
Re: [PATCH] Add CANONICAL option to xmlserialize
v4 attached fixes an encoding issue at the xml_parse call. It now uses GetDatabaseEncoding(). Best, Jim From 3ff8e7bd9a9e43194d834ba6b125841539d5df1c Mon Sep 17 00:00:00 2001 From: Jim Jones Date: Mon, 6 Mar 2023 14:08:54 +0100 Subject: [PATCH v4] Add CANONICAL format to xmlserialize This patch introduces the CANONICAL option to xmlserialize, which serializes xml documents in their canonical form - as described in the W3C Canonical XML Version 1.1 specification. This option can be used with the additional parameter WITH [NO] COMMENTS to keep or remove xml comments from the canonical xml output. This feature is based on the function xmlC14NDocDumpMemory from the C14N module of libxml2. This patch also includes regression tests and documentation. --- doc/src/sgml/datatype.sgml| 41 +- src/backend/executor/execExprInterp.c | 13 +++- src/backend/parser/gram.y | 14 +++- src/backend/parser/parse_expr.c | 1 + src/backend/utils/adt/xml.c | 57 ++ src/include/nodes/parsenodes.h| 1 + src/include/nodes/primnodes.h | 9 +++ src/include/parser/kwlist.h | 1 + src/include/utils/xml.h | 1 + src/test/regress/expected/xml.out | 108 ++ src/test/regress/expected/xml_1.out | 104 + src/test/regress/expected/xml_2.out | 108 ++ src/test/regress/sql/xml.sql | 61 +++ 13 files changed, 514 insertions(+), 5 deletions(-) diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml index 467b49b199..46ec95dbb8 100644 --- a/doc/src/sgml/datatype.sgml +++ b/doc/src/sgml/datatype.sgml @@ -4460,7 +4460,7 @@ xml 'bar' xml, uses the function xmlserialize:xmlserialize -XMLSERIALIZE ( { DOCUMENT | CONTENT } value AS type ) +XMLSERIALIZE ( { DOCUMENT | CONTENT } value AS type [ CANONICAL [ WITH [NO] COMMENTS ] ]) type can be character, character varying, or @@ -4470,6 +4470,45 @@ XMLSERIALIZE ( { DOCUMENT | CONTENT } value AS + +The option CANONICAL converts a given + XML document to its https://www.w3.org/TR/xml-c14n11/#Terminology;>canonical form + based on the 
https://www.w3.org/TR/xml-c14n11/;>W3C Canonical XML 1.1 Specification. + It is basically designed to provide applications the ability to compare xml documents or test if they + have been changed. The optional parameter WITH [NO] COMMENTS removes or keeps XML comments + from the given document. + + + + Example: + + + When a character string value is cast to or from type xml without going through XMLPARSE or diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c index 19351fe34b..ce376a6fcd 100644 --- a/src/backend/executor/execExprInterp.c +++ b/src/backend/executor/execExprInterp.c @@ -3827,18 +3827,27 @@ ExecEvalXmlExpr(ExprState *state, ExprEvalStep *op) case IS_XMLSERIALIZE: { +text *result; Datum *argvalue = op->d.xmlexpr.argvalue; bool *argnull = op->d.xmlexpr.argnull; +XmlSerializeFormat format = op->d.xmlexpr.xexpr->format; /* argument type is known to be xml */ Assert(list_length(xexpr->args) == 1); if (argnull[0]) return; + value = argvalue[0]; -*op->resvalue = PointerGetDatum(xmltotext_with_xmloption(DatumGetXmlP(value), - xexpr->xmloption)); +if (format == XMLCANONICAL || format == XMLCANONICAL_WITH_COMMENTS) + result = xmlserialize_canonical(DatumGetXmlP(value), + format); +else + result = xmltotext_with_xmloption(DatumGetXmlP(value), + xexpr->xmloption); + +*op->resvalue = PointerGetDatum(result); *op->resnull = false; } break; diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y index a0138382a1..af5f3dfdfd 100644 --- a/src/backend/parser/gram.y +++ b/src/backend/parser/gram.y @@ -619,6 +619,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query); %type xmltable_column_option_el %type xml_namespace_list %type xml_namespace_el +%type opt_xml_serialize_format %type func_application func_expr_common_subexpr %type func_expr func_expr_windowless @@ -676,7 +677,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query); BACKWARD BEFORE BEGIN_P BETWEEN 
BIGINT BINARY BIT BOOLEAN_P BOTH BREADTH BY - CACHE CALL CALLED CASCADE CASCADED CASE CAST CATALOG_P CHAIN CHAR_P + CACHE CALL CALLED CANONICAL CASCADE CASCADED CASE CAST CATALOG_P CHAIN CHAR_P CHARACTER CHARACTERISTICS CHECK CHECKPOINT CLASS CLOSE CLUSTER COALESCE COLLATE COLLATION COLUMN COLUMNS COMMENT COMMENTS COMMIT COMMITTED COMPRESSION CONCURRENTLY CONFIGURATION CONFLICT @@ -15532,13 +15533,14 @@ func_expr_common_subexpr: $$ = makeXmlExpr(IS_XMLROOT, NULL, NIL, list_make3($3, $5, $6), @1); } - |
Re: logical decoding and replication of sequences, take 2
I tried a couple of toy examples with various combinations of use styles. Three with "automatic" reading from sequences:

create table test(i serial);
create table test(i int GENERATED BY DEFAULT AS IDENTITY);
create table test(i int default nextval('s1'));

...where s1 has some non-default parameters:

CREATE SEQUENCE s1 START 100 MAXVALUE 100 INCREMENT BY -1;

...and then two with explicit use of s1, one inserting the 'nextval' into a table with no default, and one with no table at all, just selecting from the sequence. The last two seem to work similarly to the first three, so it seems like FOR ALL TABLES adds all sequences as well. Is that expected? The documentation for CREATE PUBLICATION mentions sequence options, but doesn't really say how these options should be used.

Here's the script:

# alter system set wal_level='logical';
# restart
# port is subscriber

echo
echo "PUB:"
psql -c "drop sequence if exists s1;"
psql -c "drop publication if exists pub1;"

echo
echo "SUB:"
psql -p -c "drop sequence if exists s1;"
psql -p -c "drop subscription if exists sub1 ;"

echo
echo "PUB:"
psql -c "CREATE SEQUENCE s1 START 100 MAXVALUE 100 INCREMENT BY -1;"
psql -c "CREATE PUBLICATION pub1 FOR ALL TABLES;"

echo
echo "SUB:"
psql -p -c "CREATE SEQUENCE s1 START 100 MAXVALUE 100 INCREMENT BY -1;"
psql -p -c "CREATE SUBSCRIPTION sub1 CONNECTION 'host=localhost dbname=john application_name=sub1 port=5432' PUBLICATION pub1;"

echo
echo "PUB:"
psql -c "select nextval('s1');"
psql -c "select nextval('s1');"
psql -c "select * from s1;"

sleep 1

echo
echo "SUB:"
psql -p -c "select * from s1;"
psql -p -c "drop subscription sub1 ;"
psql -p -c "select nextval('s1');"
psql -p -c "select * from s1;"

...with the last two queries returning

 nextval
---------
      67
(1 row)

 last_value | log_cnt | is_called
------------+---------+-----------
         67 |      32 | t

So, I interpret that the decrement by 32 got logged here.
Also, running

CREATE PUBLICATION pub2 FOR ALL SEQUENCES WITH (publish = 'insert, update, delete, truncate, sequence');

...reports success, but do non-default values of "publish = ..." have an effect (or should they), or are these just ignored? It seems like these cases shouldn't be treated orthogonally.

--
John Naylor
EDB: http://www.enterprisedb.com
Re: ICU 54 and earlier are too dangerous
On 14.03.23 01:26, Tom Lane wrote:
>> Unless someone has a better idea, I think we need to bump the
>> minimum required ICU version to 55. That would solve the issue in
>> v16 and later, but those using old versions of ICU and old versions
>> of postgres would still be vulnerable to these kinds of typos.
>
> ... that seems like an overreaction. We know from the buildfarm
> that there's still a lot of old ICU out there. Is it really improving
> anybody's life to try to forbid them from using such a version?

If I'm getting the dates right, the 10-year support of RHEL 7 will expire in June 2024. So if we follow past practices, we could drop support for RHEL 7 in PG17. This would allow us to drop support for old libicu, and also old openssl, zlib, maybe more. So if we don't feel like we need to do an emergency change here, there is a path to do this in a principled way in the near future.
Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Hi Amit, all

Amit Kapila wrote the following on Tue, 14 Mar 2023 at 09:50:
> On Mon, Mar 13, 2023 at 7:46 PM Önder Kalacı wrote:
> >
> > Attaching v47.
> >
> I have made the following changes in the attached patch (a) removed
> the function IsIdxSafeToSkipDuplicates() and used the check directly
> in the caller

Should be fine; we can re-introduce this function when I work on the non-pkey/RI unique index improvement as a follow-up to this.

> ; (b) changed a few comments in the patch;

Thanks, looks good.

> (c) the test
> file was inconsistently using ';' while executing statement with
> safe_psql, changed it to remove ';'.
>

Alright, thanks. And as a self-review: when I write regression tests next time, I'll spend a lot more time on style/consistency/comments etc. During this review, the reviewers had to spend many cycles on that area, which is something I should have done better.

Attaching v49 with some minor changes Shi Yu noted earlier.

Thanks,
Onder KALACI

v49-0001-Allow-the-use-of-indexes-other-than-PK-and-REPLI.patch
Description: Binary data
Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Hi Shi Yu,

> in RemoteRelContainsLeftMostColumnOnIdx():
>
> + if (indexInfo->ii_NumIndexAttrs < 1)
> +     return false;
>
> Did you see any cases that the condition is true? I think there is at least one
> column in the index. If so, we can use an Assert().
>

Actually, it was mostly to guard against any edge cases. I thought that, similar to tables, we could have zero-column indexes, but it turns out that is not possible. Also, index_create seems to check for that, so changing it to an assert makes sense:

> /*
>  * check parameters
>  */
> if (indexInfo->ii_NumIndexAttrs < 1)
>     elog(ERROR, "must index at least one column");

> + if (attrmap->maplen <= AttrNumberGetAttrOffset(keycol))
> +     return false;
>
> Similarly, I think `attrmap->maplen` is the number of columns and it is always
> greater than keycol. If you agree, we can check it with an Assert().

At this point, I'm really hesitant to make any assumptions. Logical replication is pretty flexible, and who knows, maybe dropped columns will be treated differently at some point and this assumption changes? I feel more comfortable keeping this as-is. We call this function very infrequently anyway.

> Besides, It
> seems we don't need AttrNumberGetAttrOffset().
>

Why not? We are accessing the AttrNumberGetAttrOffset(keycol) element of the array attnums.

> 2.
> +# make sure that the subscriber has the correct data after the update UPDATE
>
> "update UPDATE" seems to be a typo.
>

Thanks, fixed.

> 3.
> +# now, drop the index with the expression, and re-create index on column lastname
>
> The comment says "re-create index on column lastname" but it seems we didn't do
> that, should it be modified to something like:
> # now, drop the index with the expression, we will use sequential scan
>

Thanks, fixed. I'll add the changes to v49 in the next e-mail.

Thanks,
Onder KALACI
Re: ICU locale validation / canonicalization
On 13.03.23 16:31, Jeff Davis wrote:
>> What we had discussed a while ago in one of these threads is that
>> ICU before version 54 do not support keyword lists, and we have
>> custom code to do that parsing ourselves, but we don't have code to
>> do the same for language tags. Therefore, if I understand this
>> right, if we automatically convert ICU locale IDs to language tags,
>> as shown above, then we break support for such locales in those
>> older ICU versions.
>
> Right. In versions 53 and earlier, and during pg_upgrade, we would
> just preserve the locale string as entered.

Another issue that came to mind: Right now, you can develop SQL schemas on a newer ICU version (say, your laptop) and then deploy them on a server running an older ICU version. If we have a cutoff beyond which we convert ICU locale IDs to language tags, then this won't work anymore for certain combinations. And RHEL/CentOS 7 is still pretty popular.
Re: Add macros for ReorderBufferTXN toptxn
Thanks for the review! On Mon, Mar 13, 2023 at 6:19 PM vignesh C wrote: ... > Few comments: > 1) Can we move the macros along with the other macros present in this > file, just above this structure, similar to the macros added for > txn_flags: > /* Toplevel transaction for this subxact (NULL for top-level). */ > +#define isa_toptxn(rbtxn) (rbtxn->toptxn == NULL) > +#define isa_subtxn(rbtxn) (rbtxn->toptxn != NULL) > +#define get_toptxn(rbtxn) (isa_subtxn(rbtxn) ? rbtxn->toptxn : rbtxn) > > 2) The macro name can be changed to rbtxn_is_toptxn, rbtxn_is_subtxn > and rbtxn_get_toptxn to keep it consistent with others: > /* Toplevel transaction for this subxact (NULL for top-level). */ > +#define isa_toptxn(rbtxn) (rbtxn->toptxn == NULL) > +#define isa_subtxn(rbtxn) (rbtxn->toptxn != NULL) > +#define get_toptxn(rbtxn) (isa_subtxn(rbtxn) ? rbtxn->toptxn : rbtxn) > > 3) We could add separate comments for each of the macros: > /* Toplevel transaction for this subxact (NULL for top-level). */ > +#define isa_toptxn(rbtxn) (rbtxn->toptxn == NULL) > +#define isa_subtxn(rbtxn) (rbtxn->toptxn != NULL) > +#define get_toptxn(rbtxn) (isa_subtxn(rbtxn) ? rbtxn->toptxn : rbtxn) > All the above are fixed as suggested. > 4) We check if txn->toptxn is not null twice here both in if condition > and in the assignment, we could retain the assignment operation as > earlier to remove the 2nd check: > - if (txn->toptxn) > - txn = txn->toptxn; > + if (isa_subtxn(txn)) > + txn = get_toptxn(txn); > > We could avoid one check again by: > + if (isa_subtxn(txn)) > + txn = txn->toptxn; > Yeah, that is true, but I chose not to keep the original assignment in this case mainly because then it is consistent with the other changed code --- e.g. Every other direct member assignment/access of the 'toptxn' member now hides behind the macros so I did not want this single place to be the odd one out. 
TBH, I don't think one extra check is of any significance, but it is not a problem to change it as you suggested if other people also want it done that way.

--
Kind Regards,
Peter Smith.
Fujitsu Australia.

v2-0001-Add-macros-for-ReorderBufferTXN-toptxn.patch
Description: Binary data
Re: Refactoring SysCacheGetAttr to know when attr cannot be NULL
On 13.03.23 14:19, Daniel Gustafsson wrote:
>> On 2 Mar 2023, at 15:44, Tom Lane wrote:
>> Peter Eisentraut writes:
>>> I think an error message like "unexpected null value in system cache
>>> %d column %d" is sufficient. Since these are "can't happen" errors,
>>> we don't need to spend too much extra effort to make it prettier.
>>
>> I'd at least like to see it give the catalog's OID. That's easily
>> convertible to a name, and it doesn't tend to move around across PG
>> versions, neither of which are true for syscache IDs. Also, I'm fairly
>> unconvinced that it's a "can't happen" --- this would be very likely to
>> fire as a result of catalog corruption, so it would be good if it's at
>> least minimally interpretable by a non-expert. Given that we'll now
>> have just one copy of the code, ISTM there's a good case for doing the
>> small extra work to report catalog and column by name.
>
> Rebased v3 on top of recent conflicting ICU changes causing the patch to
> not apply anymore. Also took another look around the tree to see if there
> were missed callsites but found none new.

I think the only open question here was the granularity of the error message, which I think you had addressed in v2.

+	if (isnull)
+	{
+		elog(ERROR,
+			 "unexpected NULL value in cached tuple for pg_catalog.%s.%s",
+			 get_rel_name(cacheinfo[cacheId].reloid),
+			 NameStr(TupleDescAttr(SysCache[cacheId]->cc_tupdesc, attributeNumber - 1)->attname));
+	}

I prefer to use "null value" for SQL null values, and NULL for the C symbol. I'm a bit hesitant about hardcoding pg_catalog here. That happens to be true, of course, but isn't actually enforced, I think. I think that could be left off. It's not like people will be confused about which schema "pg_class.relname" is in. Also, the cached tuple isn't really for the attribute, so maybe split that up a bit, like "unexpected null value in cached tuple for catalog %s column %s"
Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
On Mon, Mar 13, 2023 at 7:46 PM Önder Kalacı wrote: > > Attaching v47. > I have made the following changes in the attached patch (a) removed the function IsIdxSafeToSkipDuplicates() and used the check directly in the caller; (b) changed a few comments in the patch; (c) the test file was inconsistently using ';' while executing statement with safe_psql, changed it to remove ';'. -- With Regards, Amit Kapila. v48-0001-Allow-the-use-of-indexes-other-than-PK-and-REPLI.patch Description: Binary data
Re: Testing autovacuum wraparound (including failsafe)
On Wed, Mar 8, 2023 at 1:52 PM Masahiko Sawada wrote: > > On Fri, Mar 3, 2023 at 8:34 PM Heikki Linnakangas wrote: > > > > On 16/11/2022 06:38, Ian Lawrence Barwick wrote: > > > Thanks for the patch. While reviewing the patch backlog, we have > > > determined that > > > the latest version of this patch was submitted before meson support was > > > implemented, so it should have a "meson.build" file added for > > > consideration for > > > inclusion in PostgreSQL 16. > > > > I wanted to do some XID wraparound testing again, to test the 64-bit > > SLRUs patches [1], and revived this. > > Thank you for reviving this thread! > > > > > I took a different approach to consuming the XIDs. Instead of setting > > nextXID directly, bypassing GetNewTransactionId(), this patch introduces > > a helper function to call GetNewTransactionId() repeatedly. But because > > that's slow, it does include a shortcut to skip over "uninteresting" > > XIDs. Whenever nextXid is close to an SLRU page boundary or XID > > wraparound, it calls GetNewTransactionId(), and otherwise it bumps up > > nextXid close to the next "interesting" value. That's still a lot slower > > than just setting nextXid, but exercises the code more realistically. > > > > I've written some variant of this helper function many times over the > > years, for ad hoc testing. I'd love to have it permanently in the git tree. > > These functions seem to be better than mine. > > > In addition to Masahiko's test for emergency vacuum, this includes two > > other tests. 002_limits.pl tests the "warn limit" and "stop limit" in > > GetNewTransactionId(), and 003_wraparound.pl burns through 10 billion > > transactions in total, exercising XID wraparound in general. > > Unfortunately these tests are pretty slow; the tests run for about 4 > > minutes on my laptop in total, and use about 20 GB of disk space. So > > perhaps these need to be put in a special test suite that's not run as > > part of "check-world". 
Or perhaps leave out the 003_wraparounds.pl test, > > that's the slowest of the tests. But I'd love to have these in the git > > tree in some form. > > cbfot reports some failures. The main reason seems that meson.build in > xid_wraparound directory adds the regression tests but the .sql and > .out files are missing in the patch. Perhaps the patch wants to add > only tap tests as Makefile doesn't define REGRESS? > > Even after fixing this issue, CI tests (Cirrus CI) are not happy and > report failures due to a disk full. The size of xid_wraparound test > directory is 105MB out of 262MB: > > % du -sh testrun > 262Mtestrun > % du -sh testrun/xid_wraparound/ > 105Mtestrun/xid_wraparound/ > % du -sh testrun/xid_wraparound/* > 460Ktestrun/xid_wraparound/001_emergency_vacuum > 93M testrun/xid_wraparound/002_limits > 12M testrun/xid_wraparound/003_wraparounds > % ls -lh testrun/xid_wraparound/002_limits/log* > total 93M > -rw---. 1 masahiko masahiko 93M Mar 7 17:34 002_limits_wraparound.log > -rw-rw-r--. 1 masahiko masahiko 20K Mar 7 17:34 regress_log_002_limits > > The biggest file is the server logs since an autovacuum worker writes > autovacuum logs for every table for every second (autovacuum_naptime > is 1s). Maybe we can set log_autovacuum_min_duration reloption for the > test tables instead of globally enabling it I think it could be acceptable since 002 and 003 tests are executed only when required. And 001 test seems to be able to pass on cfbot but it takes more than 30 sec. In the attached patch, I made these tests optional and these are enabled if envar ENABLE_XID_WRAPAROUND_TESTS is defined (supporting only autoconf). > > The 001 test uses the 2PC transaction that holds locks on tables but > since we can consume xids while the server running, we don't need > that. Instead I think we can keep a transaction open in the background > like 002 test does. Updated in the new patch. Also, I added a check if the failsafe mode is triggered. 
Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com v2-0001-Add-tests-for-XID-wraparound.patch Description: Binary data
RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
On Mon, Mar 13, 2023 10:16 PM Önder Kalacı wrote: > > Attaching v47. > Thanks for updating the patch. Here are some comments. 1. in RemoteRelContainsLeftMostColumnOnIdx(): + if (indexInfo->ii_NumIndexAttrs < 1) + return false; Did you see any cases that the condition is true? I think there is at least one column in the index. If so, we can use an Assert(). + if (attrmap->maplen <= AttrNumberGetAttrOffset(keycol)) + return false; Similarly, I think `attrmap->maplen` is the number of columns and it is always greater than keycol. If you agree, we can check it with an Assert(). Besides, It seems we don't need AttrNumberGetAttrOffset(). 2. +# make sure that the subscriber has the correct data after the update UPDATE "update UPDATE" seems to be a typo. 3. +# now, drop the index with the expression, and re-create index on column lastname The comment says "re-create index on column lastname" but it seems we didn't do that, should it be modified to something like: # now, drop the index with the expression, we will use sequential scan Besides these, the patch LGTM. Regards, Shi Yu