Re: Introduce XID age and inactive timeout based replication slot invalidation
On Fri, Mar 8, 2024 at 8:08 PM Bharath Rupireddy wrote: > > On Wed, Mar 6, 2024 at 4:28 PM Amit Kapila wrote: > > > > IIUC, the current conflict_reason is primarily used to determine > > logical slots on standby that got invalidated due to recovery time > > conflict. On the primary, it will also show logical slots that got > > invalidated due to the corresponding WAL got removed. Is that > > understanding correct? > > That's right. > > > If so, we are already sort of overloading this > > column. However, now adding more invalidation reasons that won't > > happen during recovery conflict handling will change entirely the > > purpose (as per the name we use) of this variable. I think > > invalidation_reason could depict this column correctly but OTOH I > > guess it would lose its original meaning/purpose. > > Hm. I get the concern. Are you okay with having invalidation_reason > separately for both logical and physical slots? In such a case, > logical slots that got invalidated on the standby will have duplicate > info in conflict_reason and invalidation_reason, is this fine? > If we have duplicate information in two columns that could be confusing for users. BTW, doesn't the recovery conflict occur only because of the rows_removed and wal_level_insufficient reasons? The wal_removed or the new reasons you are proposing can't happen because of a recovery conflict. Am I missing something here? > Another idea is to make 'conflict_reason text' as a 'conflicting > boolean' again (revert 007693f2a3), and have 'invalidation_reason > text' for both logical and physical slots. So, whenever 'conflicting' > is true, one can look at invalidation_reason for the reason for > conflict. How does this sound? > So, does this mean that conflicting will only be true for some of the reasons (say wal_level_insufficient, rows_removed, wal_removed) and logical slots but not for others?
I think that will also not eliminate the duplicate information, as the user could have deduced that from a single column. -- With Regards, Amit Kapila.
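A rough sketch of the column semantics proposed in this exchange: a 'conflicting' boolean derived from a single 'invalidation_reason' column. The reason set below is purely illustrative and follows Amit's reading that only rows_removed and wal_level_insufficient arise from recovery conflicts; it is not the actual catalog code.

```python
# Illustrative model only, not PostgreSQL code.  Under the proposal,
# 'conflicting' in pg_replication_slots would be true exactly when the
# invalidation reason is one that a recovery conflict can cause.
RECOVERY_CONFLICT_REASONS = {"rows_removed", "wal_level_insufficient"}

def conflicting(invalidation_reason):
    """Derive the proposed 'conflicting' boolean from the single
    'invalidation_reason' column, avoiding duplicated information."""
    return invalidation_reason in RECOVERY_CONFLICT_REASONS
```

With such a derivation, wal_removed or the newly proposed XID-age/timeout reasons would set invalidation_reason without marking the slot as conflicting, so the two columns would not repeat each other.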
Re: Add new error_action COPY ON_ERROR "log"
On Sat, Mar 09, 2024 at 12:01:49AM +0530, Bharath Rupireddy wrote: > If NOTICE per attribute and incrementing num_errors per row is > implemented, it ends up erroring out with ERROR: missing data for > column "m" for all-column-empty-row. Shall we treat this ERROR softly > too if on_error ignore is specified? Or shall we discuss this idea > separately? > > ERROR: missing data for column "m" > CONTEXT: COPY check_ign_err, line 5: "" Hmm. I have spent some time looking at the behavior of ON_ERROR, and there are two tests in copy2.sql that check for errors, one for the case where there is more data than attributes and a second where there is not enough data in a row. For example, take this table: =# create table tab (a int, b int, c int); CREATE TABLE This case works, even if the row clearly has not enough attributes. The first attribute can be parsed, but not the second one, and this causes the remaining data of the row to be skipped: =# copy tab from stdin (delimiter ',', on_error ignore); Enter data to be copied followed by a newline. End with a backslash and a period on a line by itself, or an EOF signal. >> 1, >> \. NOTICE: 0: 1 row was skipped due to data type incompatibility LOCATION: CopyFrom, copyfrom.c:1314 COPY 0 This case fails. The first and the second attributes can be parsed, and the line fails because we are missing the last attribute due to a lack of a delimiter: =# copy tab from stdin (delimiter ',', on_error ignore); Enter data to be copied followed by a newline. End with a backslash and a period on a line by itself, or an EOF signal. >> 1,1 >> \. 
ERROR: 22P04: missing data for column "c" CONTEXT: COPY tab, line 1: "1,1" This brings a weird edge case for the all-column-empty-row case you are mentioning if we try to get information about all the rows we should expect, but this has the side effect of breaking a case that's intended to work with ON_ERROR, as far as I understand, which is to skip an entire row on the first conversion error we find, even if there is no data afterwards. I was a bit confused by that at first, but I can also see why it is useful as-is on HEAD. At the end of the day, this comes down to what is more helpful to the user. And I'd agree on the side of what ON_ERROR does currently, which is what your patch relies on: on the first conversion failure, give up and skip the rest of the row because we cannot trust its contents. That's my way of saying that I am fine with the proposal of your patch, and that we cannot provide the full state of a row without making the error stack of COPY more invasive. Perhaps we could discuss this idea of ensuring that all the attributes are on a row in a different thread, as you say, but I am not really convinced that there's a strong need for it at this stage as ON_ERROR is new to v17. So it does not sound like a bad thing to let it brew more before implementing more options and making the COPY paths more complicated than they already are. I suspect that this may trigger some discussion during the beta/stability phase depending on the initial feedback. Perhaps I'm wrong, though. + verbose, a NOTICE message + containing the line number and column name for each discarded row is + emitted. This should clarify that the column name refers to the attribute where the input conversion has failed, I guess. Specifying only "column name" without more context is a bit confusing. -- Michael
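The row-level semantics described above can be modeled with a small sketch (a simplified Python stand-in, not the actual copyfrom.c logic): a conversion failure soft-skips the whole row, while a row whose present fields all convert but which is short on fields still raises the hard "missing data for column" error.

```python
def soft_copy_row(raw_fields, parsers):
    """Sketch of COPY ... ON_ERROR ignore semantics: on the first
    conversion failure the rest of the row is skipped (soft error,
    returns None); but if every present field converts fine and
    fields are still missing, it is a hard error."""
    values = []
    for raw, parse in zip(raw_fields, parsers):
        try:
            values.append(parse(raw))
        except ValueError:
            return None  # soft error: whole row skipped, num_errors += 1
    if len(raw_fields) < len(parsers):
        raise ValueError("missing data for column")  # hard error on HEAD
    return values
```

This reproduces both psql transcripts above: "1," is soft-skipped because the empty second field fails integer conversion before the missing-column check is reached, while "1,1" converts both fields and then hits the hard error.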
Re: "type with xxxx does not exist" when doing ExecMemoize()
David Rowley 于2024年3月11日周一 13:25写道: > On Thu, 7 Mar 2024 at 22:50, David Rowley wrote: > > > > On Thu, 7 Mar 2024 at 15:24, Tender Wang wrote: > > > > > > Andrei Lepikhov 于2024年3月6日周三 11:37写道: > > >> I think, it is a bug. Should it be fixed (and back-patched) earlier? > > > > > > Agreed. Need David to review it as he knows this area best. > > > > This is on my list of things to do. Just not at the top yet. > > I've gone over this patch and I'm happy with the changes to > nodeMemoize.c. The thing I did change was the newly added test. The > problem there was the test was passing for me with and without the > code fix. I ended up changing the test so the cache hits and misses > are reported. That required moving the test to above where the > work_mem is set to 64KB so we can be certain the values will all be > cached and the cache hits are predictable. > > My other changes were just cosmetic. > > Thanks for working on this fix. I've pushed the patch. > > David > Thanks for pushing the patch. -- Tender Wang OpenPie: https://en.openpie.com/
Re: "type with xxxx does not exist" when doing ExecMemoize()
On Thu, 7 Mar 2024 at 22:50, David Rowley wrote: > > On Thu, 7 Mar 2024 at 15:24, Tender Wang wrote: > > > > Andrei Lepikhov 于2024年3月6日周三 11:37写道: > >> I think, it is a bug. Should it be fixed (and back-patched) earlier? > > > > Agreed. Need David to review it as he knows this area best. > > This is on my list of things to do. Just not at the top yet. I've gone over this patch and I'm happy with the changes to nodeMemoize.c. The thing I did change was the newly added test. The problem there was the test was passing for me with and without the code fix. I ended up changing the test so the cache hits and misses are reported. That required moving the test to above where the work_mem is set to 64KB so we can be certain the values will all be cached and the cache hits are predictable. My other changes were just cosmetic. Thanks for working on this fix. I've pushed the patch. David
Re: POC, WIP: OR-clause support for indexes
On 7/3/2024 21:51, Alexander Korotkov wrote: Hi! On Tue, Mar 5, 2024 at 9:59 AM Andrei Lepikhov wrote: > On 5/3/2024 12:30, Andrei Lepikhov wrote: > > On 4/3/2024 09:26, jian he wrote: > ... and the new version of the patchset is attached. I made some revisions for the patchset. Great! 1) Use hash_combine() to combine hash values. Looks better. 2) Upper limit the number of array elements by MAX_SAOP_ARRAY_SIZE. I'm not convinced about this limit. The initial reason was to combine long lists of ORs into the array because such a transformation made at an early stage increases efficiency. I understand the necessity of this limit in the array decomposition routine, but not in the creation one. 3) Better save the original order of clauses by putting hash entries and untransformable clauses into the same list. A lot of differences in the regression test output are gone. I agree that reducing the number of changes in regression tests looks better. But to achieve this, you introduced a hack that increases the complexity of the code. Is it worth it? Maybe it would be better to make one-time changes in tests instead of getting this burden on board. Or did you mean something more by introducing the node type? We don't make array values unique. That might make query execution performance somewhat worse, and also makes selectivity estimation worse. I suggest Andrei and/or Alena should implement making array values unique. The fix Alena has made looks correct. But I urge you to think twice: The optimizer doesn't care about duplicates, so why do we do it? What's more, this optimization is intended to speed up queries with long OR lists. Using the list_append_unique() comparator on such lists could impact performance. I suggest sticking to the common rule and leaving the responsibility on the user's shoulders. At least, we should do this optimization later, in one pass, sorting the elements before building the array. 
But what if we don't have a sort operator for the type? -- regards, Andrei Lepikhov Postgres Professional
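The one-pass alternative Andrei sketches (sort the elements once, then drop adjacent duplicates, instead of a list_append_unique-style uniqueness scan per element) looks roughly like this in Python; it presumes a sort operator exists for the element type, which is exactly the open question above:

```python
def unique_sorted_array(elems):
    """One-pass dedup for a long OR list turned into an array:
    sort once (O(n log n)), then keep only elements that differ
    from their predecessor.  This is cheaper than appending each
    element with a uniqueness scan over the whole list (O(n^2))."""
    out = []
    for e in sorted(elems):
        if not out or out[-1] != e:
            out.append(e)
    return out
```

Sorting once matters precisely for the long OR lists this optimization targets, where a per-element membership check would become quadratic.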
RE: speed up a logical replica setup
Dear Vignesh, Thanks for updating the patch, but cfbot still got angry [1]. Note that the two containers (autoconf and meson) failed at different places, so I think it is a genuine failure. It seems that there may be a bug related to the 32-bit build. We should look into it and fix it as soon as possible. [1]: http://cfbot.cputube.org/highlights/all.html#4637 Best Regards, Hayato Kuroda FUJITSU LIMITED https://www.fujitsu.com/
Re: Documentation: warn about two_phase when altering a subscription
On Sat, Feb 24, 2024 at 5:21 AM Tristen Raab wrote: > > I've reviewed your patch, it applies correctly and the documentation builds > without any errors. As for the content of the patch itself, I think so far > it's good but would make two modifications. I like how the patch was worded > originally when referring to the subscription, stating these parameters were > 'in' the subscription rather than 'by' it. So I'd propose changing > > > parameters specified by the subscription. When creating the slot, ensure > > to > > > parameters specified in the subscription. When creating the slot, ensure > This suggestion makes sense, so I updated the patch for this and committed. -- With Regards, Amit Kapila.
Re: speed up a logical replica setup
On Sat, 9 Mar 2024 at 00:56, Tomas Vondra wrote: > > > > On 3/8/24 10:44, Hayato Kuroda (Fujitsu) wrote: > > Dear Tomas, Euler, > > > > Thanks for starting to read the thread! Since I'm not an original author, > > I want to reply partially. > > > >> I decided to take a quick look at this patch today, to see how it works > >> and do some simple tests. I've only started to get familiar with it, so > >> I have only some comments / questions regarding usage, not on the code. > >> It's quite possible I didn't understand some finer points, or maybe it > >> was already discussed earlier in this very long thread, so please feel > >> free to push back or point me to the past discussion. > >> > >> Also, some of this is rather opinionated, but considering I didn't see > >> this patch before, my opinions may easily be wrong ... > > > > I felt your comments were quite valuable. > > > >> 1) SGML docs > >> > >> It seems the SGML docs are more about explaining how this works on the > >> inside, rather than how to use the tool. Maybe that's intentional, but > >> as someone who didn't work with pg_createsubscriber before I found it > >> confusing and not very helpful. > >> > >> For example, the first half of the page is prerequisites+warning, and > >> sure those are useful details, but prerequisites are checked by the > >> tool (so I can't really miss this) and warnings go into a lot of details > >> about different places where things may go wrong. Sure, worth knowing > >> and including in the docs, but maybe not right at the beginning, before > >> I learn how to even run the tool? > > > > Hmm, right. I considered the below improvements. Tomas and Euler, what do you > > think? > > > > * Adds more descriptions in the "Description" section. > > * Moves prerequisites+warning to the "Notes" section. > > * Adds a "Usage" section which describes the process starting from a single node. > > > >> I'm not sure FOR ALL TABLES is a good idea. Or said differently, I'm > >> sure it won't work for a number of use cases. 
I know in large databases > >> it's common to create "work tables" (not necessarily temporary) as part > >> of a batch job, but there's no need to replicate those tables. > > > > Indeed, the documentation does not describe that all tables in the database > > would be included in the publication. > > > >> I do understand that FOR ALL TABLES is the simplest approach, and for v1 > >> it may be an acceptable limitation, but maybe it'd be good to also > >> support restricting which tables should be replicated (e.g. blacklist or > >> whitelist based on table/schema name?). > > > > It may not be directly related, but we considered that accepting options was a > > next step [1]. > > > >> Note: I now realize this might fall under the warning about DDL, which > >> says this: > >> > >> Executing DDL commands on the source server while running > >> pg_createsubscriber is not recommended. If the target server has > >> already been converted to logical replica, the DDL commands must > >> not be replicated so an error would occur. > > > > Yeah, they would not be replicated, but that would not lead to an ERROR. > > So should we say something like "Creating tables on the source server..."? > > > > Perhaps. Clarifying the docs would help, but it depends on the wording. > For example, I doubt this should talk about "creating tables" because > there are other DDL that (probably) could cause issues (like adding a > column to the table, or something like that). > > >> 5) slot / publication / subscription name > >> > >> I find it somewhat annoying it's not possible to specify names for > >> objects created by the tool - replication slots, publication and > >> subscriptions. If this is meant to be a replica running for a while, > >> after a while I'll have no idea what pg_createsubscriber_569853 or > >> pg_createsubscriber_459548_2348239 was meant for. > >> > >> This is particularly annoying because renaming these objects later is > >> either not supported at all (e.g. 
for replication slots), or may be > >> quite difficult (e.g. publications). > >> > >> I do realize there are challenges with custom names (say, if there are > >> multiple databases to replicate), but can't we support some simple > >> formatting with basic placeholders? So we could specify > >> > >> --slot-name "myslot_%d_%p" > >> > >> or something like that? > > > > Not sure we can do this in the first version, but it looks nice. One concern is > > that I > > cannot find applications which accept escape strings like log_line_prefix. > > (It may just be because we do not have a use case.) Do you know examples? > > > > I can't think of a tool already doing that, but I think that's simply > because it was not needed. Why should we be concerned about this? > > >> BTW what will happen if we convert multiple standbys? Can't they all get > >> the same slot name (they all have the same database OID, and I'm not > >> sure how much entropy the PID has)? > > > > I tested and the second try did not work. The primary reason was the name of > >
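For what it's worth, the placeholder expansion Tomas suggests would only be a few dozen lines; here is a Python sketch with hypothetical %d (database OID) and %p (PID) escapes, purely illustrative and not an existing pg_createsubscriber option:

```python
def expand_slot_name(template, dboid, pid):
    """Expand hypothetical log_line_prefix-style placeholders in a
    --slot-name template: %d -> database OID, %p -> PID, %% -> %.
    Unknown escapes are passed through unchanged."""
    out = []
    i = 0
    while i < len(template):
        c = template[i]
        if c == "%" and i + 1 < len(template):
            nxt = template[i + 1]
            if nxt == "d":
                out.append(str(dboid))
            elif nxt == "p":
                out.append(str(pid))
            elif nxt == "%":
                out.append("%")
            else:
                out.append(c + nxt)
            i += 2
        else:
            out.append(c)
            i += 1
    return "".join(out)
```

Including the database OID in the template would also address the multiple-standby collision question, since each converted standby could add its own distinguishing component.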
Re: May be BUG. Periodic burst growth of the checkpoint_req counter on replica.
On 11.03.2024 03:39, Alexander Korotkov wrote: Now that we distinguish stats for checkpoints and restartpoints, we need to update the docs. Please, check the patch attached. Maybe bring the pg_stat_get_checkpointer_buffers_written() description into consistency with these changes? Like this: --- a/src/include/catalog/pg_proc.dat +++ b/src/include/catalog/pg_proc.dat @@ -5740 +5740 @@ - descr => 'statistics: number of buffers written by the checkpointer', + descr => 'statistics: number of buffers written during checkpoints and restartpoints', And after I took a fresh look at this patch I noticed a simple way to extract write_time and sync_time counters for restartpoints too. What do you think, does it make sense to do this? With the best wishes, -- Anton A. Melnikov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: Missing LWLock protection in pgstat_reset_replslot()
On Fri, Mar 08, 2024 at 07:46:26PM +0900, Michael Paquier wrote: > Sounds good to me. I've applied the patch of this thread as b36fbd9f8da1, though I did not see a huge point in backpatching, as in the end this is just a consistency improvement. -- Michael
Re: remaining sql/json patches
On Sun, Mar 10, 2024 at 10:57 PM jian he wrote: > > one more issue. Hi, one more documentation issue. After applying v42, 0001 to 0003, there are 11 appearances of `FORMAT JSON` in functions-json.html, yet not a single place explains what it is for. json_query ( context_item, path_expression [ PASSING { value AS varname } [, ...]] [ RETURNING data_type [ FORMAT JSON [ ENCODING UTF8 ] ] ] [ { WITHOUT | WITH { CONDITIONAL | [UNCONDITIONAL] } } [ ARRAY ] WRAPPER ] [ { KEEP | OMIT } QUOTES [ ON SCALAR STRING ] ] [ { ERROR | NULL | EMPTY { [ ARRAY ] | OBJECT } | DEFAULT expression } ON EMPTY ] [ { ERROR | NULL | EMPTY { [ ARRAY ] | OBJECT } | DEFAULT expression } ON ERROR ]) FORMAT JSON seems to be just syntactic sugar, or for compatibility, in json_query. But it returns an error when the returning type category is not TYPCATEGORY_STRING. For example, even the following will return an error. ` CREATE TYPE regtest_comptype AS (b text); SELECT JSON_QUERY(jsonb '{"a":{"b":"c"}}', '$.a' RETURNING regtest_comptype format json); ` It seems only the types in [0] will not generate an error when specifying FORMAT JSON in JSON_QUERY. So it actually does something; it's not just syntactic sugar? [0] https://www.postgresql.org/docs/current/datatype-character.html
Re: [PoC] Improve dead tuple storage for lazy vacuum
On Thu, Mar 7, 2024 at 10:35 PM Masahiko Sawada wrote: > > I've attached the remaining patches for CI. I've made some minor > changes in separate patches and drafted the commit message for > the tidstore patch. > > While reviewing the tidstore code, I thought that it would be more > appropriate to place tidstore.c under src/backend/lib instead of > src/backend/access/common since the tidstore is no longer implemented > only for heap or other access methods, and it might also be used by > executor nodes in the future. What do you think? That's a heck of a good question. I don't think src/backend/lib is right -- it seems that's for general-purpose data structures. Something like backend/utils is also too general. src/backend/access/common has things for tuple descriptors, toast, sessions, and I don't think tidstore is out of place here. I'm not sure there's a better place, but I could be convinced otherwise. v68-0001: I'm not sure if commit messages are much of a subject of review, and it's up to the committer, but I'll share a couple of comments just as something to think about, not something I would ask you to change: I think it's a bit distracting that the commit message talks about the justification to use it for vacuum. Let's save that for the commit with actual vacuum changes. Also, I suspect saying there are a "wide range" of uses is over-selling it a bit, and that paragraph is a bit awkward aside from that. + /* Collect TIDs extracted from the key-value pair */ + result->num_offsets = 0; + This comment has nothing at all to do with this line. If the comment is for several lines following, some of which are separated by blank lines, there should be a blank line after the comment. Also, why isn't tidstore_iter_extract_tids() responsible for setting that to zero? + ts->context = CurrentMemoryContext; As far as I can tell, this member is never accessed again -- am I missing something? 
+ /* DSA for tidstore will be detached at the end of session */ No other test module pins the mapping, but that doesn't necessarily mean it's wrong. Is there some advantage over explicitly detaching? +-- Add tids in random order. I don't see any randomization here. I do remember adding row_number to remove whitespace in the output, but I don't remember a random order. On that subject, the row_number was an easy trick to avoid extra whitespace, but maybe we should just teach the setting function to return blocknumber rather than null? +Datum +tidstore_create(PG_FUNCTION_ARGS) +{ ... + tidstore = TidStoreCreate(max_bytes, dsa); +Datum +tidstore_set_block_offsets(PG_FUNCTION_ARGS) +{ + TidStoreSetBlockOffsets(tidstore, blkno, offs, noffs); These names are too similar. Maybe the test module should do s/tidstore_/test_/ or similar. +/* Sanity check if we've called tidstore_create() */ +static void +check_tidstore_available(void) +{ + if (tidstore == NULL) + elog(ERROR, "tidstore is not initialized"); +} I don't find this very helpful. If a developer wiped out the create call, wouldn't the test crash and burn pretty obviously? In general, the .sql file is still very hard-coded. Functions are created that contain a VALUES statement. Maybe it's okay for now, but wanted to mention it. Ideally, we'd have some randomized tests, without having to display it. That could be in addition to (not replacing) the small tests we have that display input. (see below) v68-0002: @@ -329,6 +381,13 @@ TidStoreIsMember(TidStore *ts, ItemPointer tid) ret = (page->words[wordnum] & ((bitmapword) 1 << bitnum)) != 0; +#ifdef TIDSTORE_DEBUG + if (!TidStoreIsShared(ts)) + { + bool ret_debug = ts_debug_is_member(ts, tid); + Assert(ret == ret_debug); + } +#endif This is only checking the case where we haven't returned already. In particular... 
+ /* no entry for the blk */ + if (page == NULL) + return false; + + wordnum = WORDNUM(off); + bitnum = BITNUM(off); + + /* no bitmap for the off */ + if (wordnum >= page->nwords) + return false; ...these results are not checked. More broadly, it seems like the test module should be able to test everything that the debug-build array would complain about. Including ordered iteration. This may require first saving our test input to a table. We could create a cursor on a query that fetches the ordered input from the table and verifies that the tidstore iteration produces the same ordered set, maybe with pl/pgSQL. Or something like that. Seems like not a whole lot of work. I can try later in the week, if you like. v68-0005/6 look ready to squash v68-0008 - I'm not a fan of capitalizing short comment fragments. I use the style of either: short lower-case phrases, or full sentences including capitalization, correct grammar and period. I see these two styles all over the code base, as appropriate. + /* Remain attached until end of backend */ We'll probably want this comment, if in fact we want this behavior. + /* + * Note that funcctx->call_cntr is incremented in SRF_RETURN_NEXT + * before return. +
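For reference, the membership logic quoted above reduces to simple word/bit arithmetic. Here is a Python stand-in for the C macros (assuming 64-bit bitmapwords), including the early-return cases whose results the debug check does not compare:

```python
BITS_PER_BITMAPWORD = 64  # assumption; matches a 64-bit bitmapword build

def wordnum(off):
    """Which bitmap word holds this offset (WORDNUM macro analogue)."""
    return off // BITS_PER_BITMAPWORD

def bitnum(off):
    """Which bit within that word (BITNUM macro analogue)."""
    return off % BITS_PER_BITMAPWORD

def is_member(words, off):
    """Mirror of the quoted TidStoreIsMember tail: no bitmap word for
    the offset means 'not a member'; otherwise test the bit."""
    w = wordnum(off)
    if w >= len(words):
        return False  # early return: no bitmap for the off
    return (words[w] >> bitnum(off)) & 1 == 1
```

A test module that compares the early-return paths as well as the bit-test path would cover exactly the cases flagged as unchecked above.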
Re: Confine vacuum skip logic to lazy_scan_skip
On Mon, Mar 11, 2024 at 5:31 AM Melanie Plageman wrote: > On Wed, Mar 6, 2024 at 6:47 PM Melanie Plageman > wrote: > > Performance results: > > > > The TL;DR of my performance results is that streaming read vacuum is > > faster. However there is an issue with the interaction of the streaming > > read code and the vacuum buffer access strategy which must be addressed. Woo. > I have investigated the interaction between > maintenance_io_concurrency, streaming reads, and the vacuum buffer > access strategy (BAS_VACUUM). > > The streaming read API limits max_pinned_buffers to a pinned buffer > multiplier (currently 4) * maintenance_io_concurrency buffers with the > goal of constructing reads of at least MAX_BUFFERS_PER_TRANSFER size. > > Since the BAS_VACUUM ring buffer is size 256 kB or 32 buffers with > default block size, that means that for a fully uncached vacuum in > which all blocks must be vacuumed and will be dirtied, you'd have to > set maintenance_io_concurrency at 8 or lower to see the same number of > reuses (and shared buffer consumption) as master. > > Given that we allow users to specify BUFFER_USAGE_LIMIT to vacuum, it > seems like we should force max_pinned_buffers to a value that > guarantees the expected shared buffer usage by vacuum. But that means > that maintenance_io_concurrency does not have a predictable impact on > streaming read vacuum. > > What is the right thing to do here? > > At the least, the default size of the BAS_VACUUM ring buffer should be > BLCKSZ * pinned_buffer_multiplier * default maintenance_io_concurrency > (probably rounded up to the next power of two) bytes. Hmm, does the v6 look-ahead distance control algorithm mitigate that problem? Using the ABC classifications from the streaming read thread, I think for A it should now pin only 1, for B 16 and for C, it depends on the size of the random 'chunks': if you have a lot of size 1 random reads then it shouldn't go above 10 because of (default) maintenance_io_concurrency. 
The only way to get up to very high numbers would be to have a lot of random chunks triggering behaviour C, but each made up of long runs of misses. For example one can contrive a BHS query that happens to read pages 0-15 then 20-35 then 40-55 etc etc so that we want to get lots of wide I/Os running concurrently. Unless vacuum manages to do something like that, it shouldn't be able to exceed 32 buffers very easily. I suspect that if we taught streaming_read.c to ask the BufferAccessStrategy (if one is passed in) what its recommended pin limit is (strategy->nbuffers?), we could just clamp max_pinned_buffers, and it would be hard to find a workload where that makes a difference, and we could think about more complicated logic later. In other words, I think/hope your complaints about excessive pinning from v5 WRT all-cached heap scans might have also already improved this case by happy coincidence? I haven't tried it out though, I just read your description of the problem...
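The clamp suggested here could be as simple as the following sketch (illustrative names, not the actual streaming_read.c interface):

```python
PINNED_BUFFER_MULTIPLIER = 4  # current streaming read multiplier

def max_pinned_buffers(maintenance_io_concurrency, strategy_nbuffers=None):
    """Sketch of clamping the streaming read pin budget by the buffer
    access strategy's ring size (e.g. 32 buffers for BAS_VACUUM with
    8kB blocks), when a strategy is passed in."""
    limit = PINNED_BUFFER_MULTIPLIER * maintenance_io_concurrency
    if strategy_nbuffers is not None:
        limit = min(limit, strategy_nbuffers)
    return limit
```

With the default maintenance_io_concurrency of 10, the unclamped limit would be 40 pinned buffers, exceeding the 32-buffer BAS_VACUUM ring; asking the strategy for its size caps it at 32, preserving the expected reuse behaviour.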
Re: Reducing the log spam
On Sat, 2024-03-09 at 14:03 +0100, Laurenz Albe wrote: > Here is a patch that implements this. And here is patch v2 that fixes a bug and passes the regression tests. Yours, Laurenz Albe From 6c72ea7d0aa1df569a4e53f54fcb1a11542ac0ef Mon Sep 17 00:00:00 2001 From: Laurenz Albe Date: Mon, 11 Mar 2024 03:41:49 +0100 Subject: [PATCH v2] Add parameter log_suppress_errcodes The parameter contains a list of SQLSTATEs for which FATAL and ERROR messages are not logged. This is to suppress messages that are of little interest to the database administrator, but tend to clutter the log. Author: Laurenz Albe Discussion: https://postgr.es/m/408f399e7de1416c47bab7e260327ed5ad92838c.camel%40cybertec.at --- doc/src/sgml/config.sgml | 34 + doc/src/sgml/logical-replication.sgml | 3 +- src/backend/utils/error/elog.c| 116 ++ src/backend/utils/misc/guc_tables.c | 11 ++ src/backend/utils/misc/postgresql.conf.sample | 4 + src/bin/pg_ctl/t/004_logrotate.pl | 1 + src/include/pg_config_manual.h| 10 ++ src/include/utils/guc.h | 1 + src/include/utils/guc_hooks.h | 2 + src/test/subscription/t/014_binary.pl | 5 + src/test/subscription/t/029_on_error.pl | 1 + 11 files changed, 187 insertions(+), 1 deletion(-) diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index 43b1a132a2..89a698c7b6 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -6850,6 +6850,40 @@ local0.*/var/log/postgresql + + log_suppress_errcodes (string) + + log_suppress_errcodes configuration parameter + + + + +Causes ERROR and FATAL messages +with certain error codes to be excluded from the log. +The value is a comma-separated list of five-character error codes as +listed in . An error code that +represents a class of errors (ends with three zeros) suppresses logging +of all error codes within that class. For example, the entry +08000 (connection_exception) +would suppress an error with code 08P01 +(protocol_violation). The default setting is +23505,3D000,3F000,42601,42704,42883,42P01,57P03. 
+Only superusers and users with the appropriate SET +privilege can change this setting. + + + +This setting is useful to exclude error messages from the log that are +frequent but irrelevant. That makes it easier to spot relevant +messages in the log and keeps log files from growing too big. For +example, if you have a monitoring system that regularly establishes a +TCP connection to the server without sending a correct startup packet, +you could suppress the protocol violation errors by adding the error +code 08P01 to the list. + + + + log_min_duration_statement (integer) diff --git a/doc/src/sgml/logical-replication.sgml b/doc/src/sgml/logical-replication.sgml index ec2130669e..6c647a8782 100644 --- a/doc/src/sgml/logical-replication.sgml +++ b/doc/src/sgml/logical-replication.sgml @@ -1490,7 +1490,8 @@ test_sub=# SELECT * FROM t1 ORDER BY id; A conflict will produce an error and will stop the replication; it must be resolved manually by the user. Details about the conflict can be found in - the subscriber's server log. + the subscriber's server log if you remove 23505 from + . diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c index c9719f358b..01eb00e870 100644 --- a/src/backend/utils/error/elog.c +++ b/src/backend/utils/error/elog.c @@ -112,12 +112,16 @@ int Log_error_verbosity = PGERROR_DEFAULT; char *Log_line_prefix = NULL; /* format for extra log line info */ int Log_destination = LOG_DESTINATION_STDERR; char *Log_destination_string = NULL; +char *log_suppress_errcodes = NULL; bool syslog_sequence_numbers = true; bool syslog_split_messages = true; /* Processed form of backtrace_functions GUC */ static char *backtrace_function_list; +/* Processed form of log_suppress_errcodes (zero-terminated array) */ +static int *suppressed_errcodes; + #ifdef HAVE_SYSLOG /* @@ -866,6 +870,27 @@ errcode(int sqlerrcode) edata->sqlerrcode = sqlerrcode; + /* + * ERROR and FATAL messages with codes in log_suppress_errcodes don't get + * logged. 
+ */ + if ((edata->elevel == ERROR || + edata->elevel == FATAL) && + suppressed_errcodes != NULL) + { + int *state; + + for (state = suppressed_errcodes; *state != 0; state++) + /* error categories match all error codes in the category */ + if (sqlerrcode == *state || +(ERRCODE_IS_CATEGORY(*state) && + ERRCODE_TO_CATEGORY(sqlerrcode) == *state)) + { +edata->output_to_server = false; +break; + } + } + return 0; /* return value
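The suppression rule implemented by this hunk (suppress on an exact SQLSTATE match, or when a listed class code, one ending in 000, matches the error's class) can be sketched in Python as:

```python
def is_category(code):
    """A five-character SQLSTATE ending in '000' names a whole class
    (analogue of the ERRCODE_IS_CATEGORY test in the C code)."""
    return code.endswith("000")

def suppressed(sqlstate, suppress_list):
    """Mirror of the errcode() loop above: suppress logging on an
    exact match, or when a listed category matches the code's class
    (its first two characters)."""
    for entry in suppress_list:
        if sqlstate == entry:
            return True
        if is_category(entry) and sqlstate[:2] == entry[:2]:
            return True
    return False
```

So listing 08000 (connection_exception) suppresses 08P01 (protocol_violation), matching the documentation example added earlier in the patch.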
Re: Stack overflow issue
On Thu, Mar 7, 2024 at 11:07 AM Alexander Korotkov wrote: > On Thu, Mar 7, 2024 at 9:53 AM Egor Chindyaskin wrote: > > > > > 6 march 2024 г., at 19:17, Alexander Korotkov > > > wrote: > > > > > > The revised set of remaining patches is attached. > > > > > > 0001 Turn tail recursion into iteration in CommitTransactionCommand() > > > I did minor revision of comments and code blocks order to improve the > > > readability. > > > > > > 0002 Avoid stack overflow in ShowTransactionStateRec() > > > I didn't notice any issues, leave this piece as is. > > > > > > 0003 Avoid recursion in MemoryContext functions > > > I've renamed MemoryContextTraverse() => MemoryContextTraverseNext(), > > > which I think is a bit more intuitive. Also I fixed > > > MemoryContextMemConsumed(), which was still trying to use the removed > > > argument "print" of MemoryContextStatsInternal() function. > > > > > > Generally, I think this patchset fixes important stack overflow holes. > > > It is quite straightforward, clear and the code has a good shape. I'm > > > going to push this if no objections. > > > > I have tested the scripts from message [1]. After applying these patches > > and Tom Lane’s patch from message [2], all of the above mentioned functions > > no longer caused the server to crash. I also tried increasing the values in > > the presented scripts, which also did not lead to server crashes. Thank you! > > Also, I would like to clarify something. Will fixes from message [3] and > > others be backported to all other branches, not just the master branch? As > > far as I remember, Tom Lane made corrections to all branches. For example > > [4]. > > > > Links: > > 1. > > https://www.postgresql.org/message-id/343ff14f-3060-4f88-9cc6-efdb390185df%40mail.ru > > 2. https://www.postgresql.org/message-id/386032.1709765547%40sss.pgh.pa.us > > 3. > > https://www.postgresql.org/message-id/CAPpHfduZqAjF%2B7rDRP-RGNHjOXy7nvFROQ0MGS436f8FPY5DpQ%40mail.gmail.com > > 4. 
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=e07ebd4b > > Thank you for your feedback! > > Initially I didn't intend to backpatch any of these. But on second > thought with the references you provided, I think we should backpatch > simple check_stack_depth() checks from d57b7cc333 to all supported > branches, but apply refactoring of memory contexts and transaction > commit/abort just to master. Opinions? I've just backpatched check_stack_depth() checks to all supported branches. -- Regards, Alexander Korotkov
Re: Fix log_line_prefix to display the transaction id (%x) for statements not in a transaction block
On 2024/3/4 15:48, jian he wrote: Maybe we can tolerate LOG: first output the query plan, then the statement. It is more appropriate to let the extension solve its own problems. Of course, this change is not easy to implement. Due to the way XIDs are assigned, there seems to be no good solution at the moment.
Re: Make COPY format extendable: Extract COPY TO format implementations
Hi, In "Re: Make COPY format extendable: Extract COPY TO format implementations" on Mon, 11 Mar 2024 08:00:00 +0800, jian he wrote: > Hi, here are my cents: > Currently in v17, we have 3 extra functions within DoCopyTo > CopyToStart, one time, start, doing some preliminary work. > CopyToOneRow, doing the repetitive work, called many times, row by row. > CopyToEnd, one time doing the closing work. > > seems to need a function pointer for processing the format and other options. > or maybe the reason is we need a one time function call before doing DoCopyTo, > like one time initialization. I know that the JSON format wants it, but can we defer it? We can add more options later. I want to proceed with this improvement step by step. More use cases will help us determine which callbacks are needed, and we will be able to collect more use cases by providing basic callbacks. > generally in v17, the code pattern looks like this. > if (cstate->opts.binary) > { > /* handle binary format */ > } > else if (cstate->routine) > { > /* custom code, make the copy format extensible */ > } > else > { > /* handle non-binary, (csv or text) format */ > } > maybe we need another bool flag like `bool buildin_format`. > if the copy format is {csv|text|binary} then buildin_format is true else > false. > > so the code pattern would be: > if (cstate->opts.binary) > { > /* handle binary format */ > } > else if (cstate->routine && !buildin_format) > { > /* custom code, make the copy format extensible */ > } > else > { > /* handle non-binary, (csv or text) format */ > } > > otherwise the {CopyToRoutine| CopyFromRoutine} needs a function pointer > to distinguish native copy format and extensible supported format, > like I mentioned above? Hmm. I may be missing something, but I think we don't need the bool flag: we never set cstate->routine for the native copy formats, so we can distinguish a native copy format from an extension-supplied format by checking only cstate->routine. Thanks, -- kou
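For what it's worth, the dispatch being discussed can be sketched outside the tree like this. This is a toy model, not the actual COPY code: every name here (CopyToRoutine fields, do_copy_to, etc.) is an illustrative stand-in for the proposed callbacks, and it only demonstrates the point that a NULL routine pointer is enough to identify a built-in format, with no extra bool flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified mirror of the proposed CopyToRoutine callbacks. */
typedef struct CopyToRoutine
{
	void	(*start) (void);	/* one-time preliminary work */
	int		(*one_row) (int row);	/* per-row work */
	void	(*end) (void);		/* one-time closing work */
} CopyToRoutine;

typedef struct CopyToState
{
	bool	binary;				/* built-in binary format? */
	const CopyToRoutine *routine;	/* NULL for built-in text/csv/binary */
} CopyToState;

static int	rows_emitted;

static void custom_start(void) { rows_emitted = 0; }
static int	custom_one_row(int row) { (void) row; return ++rows_emitted; }
static void custom_end(void) { }

static const CopyToRoutine custom_routine = {
	custom_start, custom_one_row, custom_end
};

/*
 * Dispatch exactly as described in the thread: cstate->routine is only ever
 * set for extension-supplied formats, so checking it for NULL is sufficient.
 */
static int
do_copy_to(CopyToState *cstate, int nrows)
{
	int			emitted = 0;

	if (cstate->binary)
		emitted = nrows;		/* built-in binary path */
	else if (cstate->routine)
	{
		/* extension-supplied format */
		cstate->routine->start();
		for (int i = 0; i < nrows; i++)
			emitted = cstate->routine->one_row(i);
		cstate->routine->end();
	}
	else
		emitted = nrows;		/* built-in text/csv path */

	return emitted;
}
```

The point of the sketch is that the built-in paths never populate the routine pointer, so no separate `buildin_format` flag is needed to tell the cases apart.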
Re: May be BUG. Periodic burst growth of the checkpoint_req counter on replica.
On Sat, Mar 9, 2024 at 4:38 PM Magnus Hagander wrote: > Per the docs, the sync_time, write_time and buffers_written only apply > to checkpoints, not restartpoints. Is this correct? AFAICT from a > quick look at the code they include both checkpoints and restartpoints > in which case I think the docs should be clarified to indicate that? Right, these fields work as before, reflecting both checkpoints and restartpoints. The documentation said checkpoints, implying restartpoints as well. Now that we distinguish stats for checkpoints and restartpoints, we need to update the docs. Please check the attached patch. -- Regards, Alexander Korotkov rename_pg_stat_get_checkpointer_fields.patch Description: Binary data
Re: Add checkpoint/redo LSNs to recovery errors.
On Sun, Mar 10, 2024 at 04:58:19PM +1300, David Steele wrote: > No doubt there are many other recovery log messages that could be improved, > but I'd rather not bog down the patch. Fair argument, and these additions are useful when taken independently. I've applied your suggestions for now. -- Michael signature.asc Description: PGP signature
Re: Make COPY format extendable: Extract COPY TO format implementations
On Fri, Mar 8, 2024 at 8:23 AM Sutou Kouhei wrote: > > > This shows that the v17 approach doesn't affect the current > text/csv/binary implementations. (The v17 approach just adds > 2 new structs, Copy{From,To}Routine, without changing the > current text/csv/binary implementations.) > > Can we push the v17 patch and proceed following > implementations? Could someone (especially a PostgreSQL > committer) take a look at this for double-check? > Hi, here are my two cents: Currently in v17, we have 3 extra functions within DoCopyTo: CopyToStart, one-time, doing some preliminary work; CopyToOneRow, doing the repetitive work, called many times, row by row; CopyToEnd, one-time, doing the closing work. We seem to need a function pointer for processing the format and other options, or maybe the reason is that we need a one-time function call before doing DoCopyTo, like a one-time initialization. We could place the function pointer after: ` cstate = BeginCopyTo(pstate, rel, query, relid, stmt->filename, stmt->is_program, NULL, stmt->attlist, stmt->options); ` Generally in v17, the code pattern looks like this: if (cstate->opts.binary) { /* handle binary format */ } else if (cstate->routine) { /* custom code, make the copy format extensible */ } else { /* handle non-binary (csv or text) format */ } Maybe we need another bool flag like `bool buildin_format`: if the copy format is {csv|text|binary} then buildin_format is true, else false. So the code pattern would be: if (cstate->opts.binary) { /* handle binary format */ } else if (cstate->routine && !buildin_format) { /* custom code, make the copy format extensible */ } else { /* handle non-binary (csv or text) format */ } Otherwise the {CopyToRoutine|CopyFromRoutine} needs a function pointer to distinguish the native copy formats from extension-supplied formats, as I mentioned above?
Re: Vectored I/O in bulk_write.c
Here also is a first attempt at improving the memory allocation and memory layout. I wonder if bulk logging should trigger larger WAL writes in the "Have to write it ourselves" case in AdvanceXLInsertBuffer(), since writing 8kB of WAL at a time seems like an unnecessarily degraded level of performance, especially with wal_sync_method=open_datasync. Of course the real answer is "make sure wal_buffers is high enough for your workload" (usually indirectly by automatically scaling from shared_buffers), but this problem jumps out when tracing bulk_writes.c with default settings. We write out the index 128kB at a time, but the WAL 8kB at a time. From 793caf2db6c00314f5bd8f7146d4797508f2f627 Mon Sep 17 00:00:00 2001 From: Thomas Munro Date: Sat, 9 Mar 2024 16:04:21 +1300 Subject: [PATCH v3 1/3] Provide vectored variant of smgrextend(). Since mdwrite() and mdextend() were basically the same, merge them. They had different assertions, but those would surely apply to any implementation of smgr, so move them up to the smgr.c wrapper level. The other difference was the policy on segment creation, but that can be captured by having smgrwritev() and smgrextendv() call a single mdwritev() function with a new "extend" flag. Discussion: https://postgr.es/m/CA%2BhUKGLx5bLwezZKAYB2O_qHj%3Dov10RpgRVY7e8TSJVE74oVjg%40mail.gmail.com --- src/backend/storage/smgr/md.c | 106 +--- src/backend/storage/smgr/smgr.c | 32 ++ src/include/storage/md.h| 3 +- src/include/storage/smgr.h | 12 +++- 4 files changed, 47 insertions(+), 106 deletions(-) diff --git a/src/backend/storage/smgr/md.c b/src/backend/storage/smgr/md.c index bf0f3ca76d1..9b125841226 100644 --- a/src/backend/storage/smgr/md.c +++ b/src/backend/storage/smgr/md.c @@ -447,79 +447,10 @@ mdunlinkfork(RelFileLocatorBackend rlocator, ForkNumber forknum, bool isRedo) pfree(path); } -/* - * mdextend() -- Add a block to the specified relation. - * - * The semantics are nearly the same as mdwrite(): write at the - * specified position. 
However, this is to be used for the case of - * extending a relation (i.e., blocknum is at or beyond the current - * EOF). Note that we assume writing a block beyond current EOF - * causes intervening file space to become filled with zeroes. - */ -void -mdextend(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum, - const void *buffer, bool skipFsync) -{ - off_t seekpos; - int nbytes; - MdfdVec*v; - - /* If this build supports direct I/O, the buffer must be I/O aligned. */ - if (PG_O_DIRECT != 0 && PG_IO_ALIGN_SIZE <= BLCKSZ) - Assert((uintptr_t) buffer == TYPEALIGN(PG_IO_ALIGN_SIZE, buffer)); - - /* This assert is too expensive to have on normally ... */ -#ifdef CHECK_WRITE_VS_EXTEND - Assert(blocknum >= mdnblocks(reln, forknum)); -#endif - - /* - * If a relation manages to grow to 2^32-1 blocks, refuse to extend it any - * more --- we mustn't create a block whose number actually is - * InvalidBlockNumber. (Note that this failure should be unreachable - * because of upstream checks in bufmgr.c.) 
- */ - if (blocknum == InvalidBlockNumber) - ereport(ERROR, -(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED), - errmsg("cannot extend file \"%s\" beyond %u blocks", - relpath(reln->smgr_rlocator, forknum), - InvalidBlockNumber))); - - v = _mdfd_getseg(reln, forknum, blocknum, skipFsync, EXTENSION_CREATE); - - seekpos = (off_t) BLCKSZ * (blocknum % ((BlockNumber) RELSEG_SIZE)); - - Assert(seekpos < (off_t) BLCKSZ * RELSEG_SIZE); - - if ((nbytes = FileWrite(v->mdfd_vfd, buffer, BLCKSZ, seekpos, WAIT_EVENT_DATA_FILE_EXTEND)) != BLCKSZ) - { - if (nbytes < 0) - ereport(ERROR, - (errcode_for_file_access(), - errmsg("could not extend file \"%s\": %m", - FilePathName(v->mdfd_vfd)), - errhint("Check free disk space."))); - /* short write: complain appropriately */ - ereport(ERROR, -(errcode(ERRCODE_DISK_FULL), - errmsg("could not extend file \"%s\": wrote only %d of %d bytes at block %u", - FilePathName(v->mdfd_vfd), - nbytes, BLCKSZ, blocknum), - errhint("Check free disk space."))); - } - - if (!skipFsync && !SmgrIsTemp(reln)) - register_dirty_segment(reln, forknum, v); - - Assert(_mdnblocks(reln, forknum, v) <= ((BlockNumber) RELSEG_SIZE)); -} - /* * mdzeroextend() -- Add new zeroed out blocks to the specified relation. * - * Similar to mdextend(), except the relation can be extended by multiple - * blocks at once and the added blocks will be filled with zeroes. + * The added blocks will be filled with zeroes. */ void mdzeroextend(SMgrRelation reln, ForkNumber forknum, @@ -595,13 +526,7 @@ mdzeroextend(SMgrRelation reln, ForkNumber forknum, { int ret; - /* - * Even if we don't want to use fallocate, we can still extend a - * bit more efficiently than writing each 8kB block individually. - * pg_pwrite_zeros() (via FileZero()) uses pg_pwritev_with_retry() - * to avoid multiple
Re: Hooking into ExplainOneQuery() complicated by missing standard_ExplainOneQuery
On Thu, Mar 07, 2024 at 07:45:01AM +0900, Michael Paquier wrote: > That's nice. You would be able to shave quite a bit of code. If > there are no objections, I propose to apply the change of this thread > to have this standard explain wrapper at the beginning of next week. > If others have any comments, feel free. Well, done as of a04ddd077e61. -- Michael signature.asc Description: PGP signature
Re: Adding OLD/NEW support to RETURNING
On Sat, Mar 9, 2024 at 3:53 AM Dean Rasheed wrote: > > > Attached is a new patch, now with docs (no other code changes). > Hi, here are some issues I found while playing around with support-returning-old-new-v2.patch doc/src/sgml/ref/update.sgml: [ RETURNING [ WITH ( { OLD | NEW } AS output_alias [, ...] ) ] * | output_expression [ [ AS ] output_name ] [, ...] ] There is no parameter explanation for `*`, so I think the synopsis may not cover cases like: ` update foo set f3 = 443 RETURNING new.*; ` I saw the explanation at output_alias, though. - insert into foo select 1, 2 RETURNING old.*, new.f2, old.f1(); ERROR: function old.f1() does not exist LINE 1: ...sert into foo select 1, 2 RETURNING old.*, new.f2, old.f1(); ^ HINT: No function matches the given name and argument types. You might need to add explicit type casts. I guess that's OK; the evaluation context is slightly different. In "old.f1", "old" refers to the virtual table "old", but in "old.f1()" the "old" resolves to the schema "old". You need privileges on the schema "old" and execute privilege on the function "old.f1()" to run the above query, so there seems to be no security issue after all. - I found a fancy expression: ` CREATE TABLE foo (f1 serial, f2 text, f3 int default 42); insert into foo select 1, 2 union select 11, 22 RETURNING old.*, new.f2, (select sum(new.f1) over()); ` Is this OK? Also, the following works on PG16; I'm not sure whether it's a bug. ` insert into foo select 1, 2 union select 11, 22 RETURNING (select count(*)); ` but not these ` insert into foo select 1, 2 union select 11, 22 RETURNING (select count(old.*)); insert into foo select 1, 2 union select 11, 22 RETURNING (select sum(f1)); ` - I found another interesting case while trying to add some tests for the new code in createplan.c.
in postgres_fdw.sql, right after line `MERGE ought to fail cleanly` --this will work insert into itrtest select 1, 'foo' returning new.*,old.*; --these two will fail insert into remp1 select 1, 'foo' returning new.*; insert into remp1 select 1, 'foo' returning old.*; itrtest is the partitioned non-foreign table. remp1 is a partition of itrtest and a foreign table. -- I did find a segmentation fault bug. insert into foo select 1, 2 RETURNING (select sum(old.f1) over()); This information is set in a subplan node. /* update the ExprState's flags if Var refers to OLD/NEW */ if (variable->varreturningtype == VAR_RETURNING_OLD) state->flags |= EEO_FLAG_HAS_OLD; else if (variable->varreturningtype == VAR_RETURNING_NEW) state->flags |= EEO_FLAG_HAS_NEW; but in ExecInsert: ` else if (resultRelInfo->ri_projectReturning->pi_state.flags & EEO_FLAG_HAS_OLD) { oldSlot = ExecGetReturningSlot(estate, resultRelInfo); ExecStoreAllNullTuple(oldSlot); oldSlot->tts_tableOid = RelationGetRelid(resultRelInfo->ri_RelationDesc); } ` it didn't use the subplan node's state->flags information, so the ExecInsert code above is never called, although it should be executed. However, ` insert into foo select 1, 2 RETURNING (select sum(new.f1) over()); ` works. Similarly, ` delete from foo RETURNING (select sum(new.f1) over()); ` also causes a segmentation fault. -- diff --git a/src/include/executor/tuptable.h b/src/include/executor/tuptable.h new file mode 100644 index 6133dbc..c9d3661 --- a/src/include/executor/tuptable.h +++ b/src/include/executor/tuptable.h @@ -411,12 +411,21 @@ slot_getsysattr(TupleTableSlot *slot, in { Assert(attnum < 0); /* caller error */ + /* + * If the tid is not valid, there is no physical row, and all system + * attributes are deemed to be NULL, except for the tableoid.
+ */ if (attnum == TableOidAttributeNumber) { *isnull = false; return ObjectIdGetDatum(slot->tts_tableOid); } - else if (attnum == SelfItemPointerAttributeNumber) + if (!ItemPointerIsValid(&slot->tts_tid)) + { + *isnull = true; + return PointerGetDatum(NULL); + } + if (attnum == SelfItemPointerAttributeNumber) { *isnull = false; return PointerGetDatum(&slot->tts_tid); Aren't these changes in slot_getsysattr somehow independent of this feature?
Re: ExecAppendAsyncEventWait() in REL_14_STABLE can corrupt PG_exception_stack
On Mon, Feb 26, 2024 at 04:29:44PM +0900, Etsuro Fujita wrote: > Will do. (I was thinking you would get busy from now on.) Fujita-san, have you been able to look at this thread? -- Michael signature.asc Description: PGP signature
Re: psql: add \create_function command
> Maybe we could go with :{+...} or the like? > or maybe :{{ ... }} Tab completion didn't work for :{?} and I noted that the same problem would arise for :{+ or :{{ (and tab completion would be more important here). So I fixed that on: https://www.postgresql.org/message-id/flat/CAGRrpzZU48F2oV3d8eDLr=4tu9xfh5jt9ed+qu1+x91gmh6...@mail.gmail.com Would be great to have the above fix reviewed/committed to keep making progress here. Besides that, since :{ is already sort of a prefix for psql functions, how about having `:{file()}`? That would be clearer than :{+ or :{{. Best regards, Steve Chavez On Mon, 29 Jan 2024 at 12:29, Pavel Stehule wrote: > > > po 29. 1. 2024 v 18:11 odesílatel Tom Lane napsal: > >> Steve Chavez writes: >> > However, :{?variable_name} is already taken by psql to test whether a >> > variable is defined or not. It might be confusing to use the same >> syntax. >> >> Hmm. Maybe we could go with :{+...} or the like? >> >> > How about using the convention of interpreting an identifier as a file >> path >> > if it has an slash on it? >> >> Sorry, that is just horrid. foo/bar means division, and "foo/bar" >> is simply an identifier per SQL standard, so you can't squeeze that >> in without breaking an ocean of stuff. Plus, there are many use-cases >> where there's no reason to put a slash in a relative filename. >> > > sometimes paths starts by $ or . > > or maybe :{{ ... }} > > > >> >> regards, tom lane >> >
Re: Invalid query generated by postgres_fdw with UNION ALL and ORDER BY
On Fri, 8 Mar 2024 at 23:14, Richard Guo wrote: > I've looked at this patch a bit. I once wondered why we don't check > pathkey->pk_eclass->ec_has_const with EC_MUST_BE_REDUNDANT to see if the > pathkey is not needed. Then I realized that a child member would not be > marked as constant even if the child expr is a Const, as explained in > add_child_rel_equivalences(). This situation where the child member is a Const but the parent isn't is unique to UNION ALL queries. The only other cases where we have child members are with partitioned and inheritance tables. In those cases, the parent member just maps to the equivalent child member replacing parent Vars with the corresponding child Var according to the column mapping between the parent and child. It might be nice if partitioning supported mapping to a Const as in many cases that could save storing the same value in the table every time, but we don't support that, so this can only happen with UNION ALL queries. > BTW, I wonder if it is possible that we have a pseudoconstant expression > that is not of type Const. In such cases we would need to check > 'bms_is_empty(pull_varnos(em_expr))' instead of 'IsA(em_expr, Const)'. > However, I'm unable to think of such an expression in this context. I can't see how there'd be any problems with a misinterpretation of a pseudoconstant value as an ordinal column position on the remote server. Surely it's only for actual "Const" nodes that we're just going to call the type's output function, which risks yielding a string of digits that the remote server would take to mean an ordinal column position. > The patch looks good to me otherwise. Thanks David
Re: Failures in constraints regression test, "read only 0 of 8192 bytes"
On Mon, Mar 11, 2024 at 9:59 AM Thomas Munro wrote: > On Mon, Mar 11, 2024 at 9:30 AM Heikki Linnakangas wrote: > > Hmm, I'm not sure if we need even smgrreleaseall() here anymore. It's > > not required for correctness AFAICS. We don't do it in single-rel > > invalidation in RelationCacheInvalidateEntry() either. > > I think we do, because we have missed sinval messages. It's unlikely > but a relfilenode might have been recycled, and we might have file > descriptors that point to the unlinked files. That is, there are new > files with the same names and we need to open those ones. ... though I think you would be right if Dilip and Robert had succeeded in their quest to introduce 56-bit non-cycling relfilenodes. And for the record, we can also shoot ourselves in the foot in another known case without sinval[1], so more work is needed here, but that doesn't mean this sinval code path should also aim footwards. [1] https://www.postgresql.org/message-id/flat/CA%2BhUKGLs554tQFCUjv_vn7ft9Xv5LNjPoAd--3Df%2BJJKJ7A8kw%40mail.gmail.com#f099d68e95edcfe408818447d9da04a7
Re: Failures in constraints regression test, "read only 0 of 8192 bytes"
On Mon, Mar 11, 2024 at 9:30 AM Heikki Linnakangas wrote: > Barring objections, I'll commit the attached. +1 I guess the comment for smgrreleaseall() could also be updated. It mentions only PROCSIGNAL_BARRIER_SMGRRELEASE, but I think sinval overflow (InvalidateSystemCaches()) should also be mentioned? > Hmm, I'm not sure if we need even smgrreleaseall() here anymore. It's > not required for correctness AFAICS. We don't do it in single-rel > invalidation in RelationCacheInvalidateEntry() either. I think we do, because we have missed sinval messages. It's unlikely but a relfilenode might have been recycled, and we might have file descriptors that point to the unlinked files. That is, there are new files with the same names and we need to open those ones.
Re: Failures in constraints regression test, "read only 0 of 8192 bytes"
On 10/03/2024 11:20, Heikki Linnakangas wrote: On 10/03/2024 08:23, Thomas Munro wrote: On Sun, Mar 10, 2024 at 6:48 PM Thomas Munro wrote: I won't be surprised if the answer is: if you're holding a reference, you have to get a pin (referring to bulk_write.c). Ahhh, on second thoughts, I take that back, I think the original theory still actually works just fine. It's just that somewhere in our refactoring of that commit, when we were vacillating between different semantics for 'destroy' and 'release', I think we made a mistake: in RelationCacheInvalidate() I think we should now call smgrreleaseall(), not smgrdestroyall(). Yes, I ran the reproducer with 'rr', and came to the same conclusion. That smgrdestroyall() call destroys the SmgrRelation, breaking the new assumption that an unpinned SmgrRelation is valid until end of transaction. That satisfies the requirements for sinval queue overflow: we close md.c segments (and most importantly virtual file descriptors), so our lack of sinval records can't hurt us, we'll reopen all files as required. That's what CLOBBER_CACHE_ALWAYS is effectively testing (and more). But the smgr pointer remains valid, and retains only its "identity", eg hash table key, and that's also fine. It won't be destroyed until after the end of the transaction. Which was the point, and it allows things like bulk_write.c (and streaming_read.c) to hold an smgr reference. Right +1. I wonder if we can now simplify RelationCacheInvalidate() further. In the loop above that smgrdestroyall(): while ((idhentry = (RelIdCacheEnt *) hash_seq_search()) != NULL) { relation = idhentry->reldesc; /* Must close all smgr references to avoid leaving dangling ptrs */ RelationCloseSmgr(relation); /* * Ignore new relations; no other backend will manipulate them before * we commit. Likewise, before replacing a relation's relfilelocator, * we shall have acquired AccessExclusiveLock and drained any * applicable pending invalidations.
*/ if (relation->rd_createSubid != InvalidSubTransactionId || relation->rd_firstRelfilelocatorSubid != InvalidSubTransactionId) continue; I don't think we need to call RelationCloseSmgr() for relations created in the same transaction anymore. Barring objections, I'll commit the attached. Hmm, I'm not sure if we need even smgrreleaseall() here anymore. It's not required for correctness AFAICS. We don't do it in single-rel invalidation in RelationCacheInvalidateEntry() either. -- Heikki Linnakangas Neon (https://neon.tech) From be03591a4c69224de5b96335aa4a29b3462e2bc9 Mon Sep 17 00:00:00 2001 From: Heikki Linnakangas Date: Sun, 10 Mar 2024 22:18:59 +0200 Subject: [PATCH 1/1] Don't destroy SMgrRelations at relcache invalidation With commit 21d9c3ee4e, SMgrRelations remain valid until end of transaction (or longer if they're "pinned"). Relcache invalidation can happen in the middle of a transaction, so we must not destroy them at relcache invalidation anymore. This was revealed by failures in the 'constraints' test in buildfarm animals using -DCLOBBER_CACHE_ALWAYS. That started failing with commit 8af2565248, which was the first commit that started to rely on an SMgrRelation living until end of transaction. Diagnosed-by: Tomas Vondra, Thomas Munro Discussion: https://www.postgresql.org/message-id/CA%2BhUKGK%2B5DOmLaBp3Z7C4S-Yv6yoROvr1UncjH2S1ZbPT8D%2BZg%40mail.gmail.com --- src/backend/utils/cache/relcache.c | 10 +++--- 1 file changed, 3 insertions(+), 7 deletions(-) diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c index 7ad6a35a50e..2cd19d603fb 100644 --- a/src/backend/utils/cache/relcache.c +++ b/src/backend/utils/cache/relcache.c @@ -2985,9 +2985,6 @@ RelationCacheInvalidate(bool debug_discard) { relation = idhentry->reldesc; - /* Must close all smgr references to avoid leaving dangling ptrs */ - RelationCloseSmgr(relation); - /* * Ignore new relations; no other backend will manipulate them before * we commit. 
Likewise, before replacing a relation's relfilelocator, @@ -3039,11 +3036,10 @@ RelationCacheInvalidate(bool debug_discard) } /* - * Now zap any remaining smgr cache entries. This must happen before we - * start to rebuild entries, since that may involve catalog fetches which - * will re-open catalog files. + * We cannot destroy the SMgrRelations as there might still be references + * to them, but close the underlying file descriptors. */ - smgrdestroyall(); + smgrreleaseall(); /* Phase 2: rebuild the items found to need rebuild in phase 1 */ foreach(l, rebuildFirstList) -- 2.39.2
Re: Statistics Import and Export
On Sun, Mar 10, 2024 at 11:57 AM Bharath Rupireddy < bharath.rupireddyforpostg...@gmail.com> wrote: > On Fri, Mar 8, 2024 at 12:06 PM Corey Huinker > wrote: > > > > Anyway, here's v7. Eagerly awaiting feedback. > > Thanks for working on this. It looks useful to have the ability to > restore the stats after upgrade and restore. But, the exported stats > are valid only until the next ANALYZE is run on the table. IIUC, > postgres collects stats during VACUUM, autovacuum and ANALYZE, right? > Perhaps there are other ways to collect stats. I'm thinking what > problems does the user face if they are just asked to run ANALYZE on > the tables (I'm assuming ANALYZE doesn't block concurrent access to > the tables) instead of automatically exporting stats. > Correct. These are just as temporary as any other analyze of the table. Another analyze will happen later, probably through autovacuum, and wipe out these values. This is designed to QUICKLY get stats into a table to enable the database to be operational sooner. This is especially important after an upgrade/restore, when all stats were wiped out. Other uses could be adapting this for use in postgres_fdw so that we don't have to do table sampling on the remote table, and of course statistics injection to test the query planner. > 2. > +they are replaced by the next auto-analyze. This function is used > by > +pg_upgrade and pg_restore to > +convey the statistics from the old system version into the new > one. > + > > Is there any demonstration of pg_set_relation_stats and > pg_set_attribute_stats being used either in pg_upgrade or in > pg_restore? Perhaps, having them as 0002, 0003 and so on patches might > show real need for functions like this. It also clarifies how these > functions pull stats from tables on the old cluster to the tables on > the new cluster. > That code was adapted from do_analyze(), and yes, there is a patch for pg_dump, but as I noted earlier it is on hold pending feedback. > > 3.
pg_set_relation_stats and pg_set_attribute_stats seem to be writing > to pg_class and might affect the plans as stats can get tampered. Can > we REVOKE the execute permissions from the public out of the box in > src/backend/catalog/system_functions.sql? This way one can decide who > to give permissions to. > You'd have to be the table owner to alter the stats. I can envision these functions getting a special role, but they could also be fine as superuser-only. > > 4. > +SELECT pg_set_relation_stats('stats_export_import.test'::regclass, > 3.6::float4, 15000); > + pg_set_relation_stats > +--- > + t > +(1 row) > + > +SELECT reltuples, relpages FROM pg_class WHERE oid = > 'stats_export_import.test'::regclass; > + reltuples | relpages > +---+-- > + 3.6 |15000 > > Isn't this test case showing a misuse of these functions? Table > actually has no rows, but we are lying to the postgres optimizer on > stats. Consider this case. You want to know at what point the query planner will start using a given index. You can generate dummy data for a thousand, a million, a billion rows, and wait for that to complete, or you can just tell the table "I say you have a billion rows, twenty million pages, etc" and see when it changes. But again, in most cases, you're setting the values to the same values the table had on the old database just before the restore/upgrade. > I think altering stats of a table mustn't be that easy for the > end user. Only easy for the end users that happen to be the table owner or a superuser. > As mentioned in comment #3, permissions need to be > tightened. In addition, we can also mark the functions pg_upgrade only > with CHECK_IS_BINARY_UPGRADE, but that might not work for pg_restore > (or I don't know if we have a way to know within the server that the > server is running for pg_restore). > I think they will have usage both in postgres_fdw and for tuning. > > 5. 
In continuation to the comment #2, is pg_dump supposed to generate > pg_set_relation_stats and pg_set_attribute_stats statements for each > table? When pg_dump does that , pg_restore can automatically load the > stats. > Current plan is to have one TOC entry in the post-data section with a dependency on the table/index/matview. That lets us leverage existing filters. The TOC entry will have a series of statements in it, one pg_set_relation_stats() and one pg_set_attribute_stats() per attribute. > 9. Isn't it good to add a test case where the plan of a query on table > after exporting the stats would remain same as that of the original > table from which the stats are exported? IMO, this is a more realistic > than just comparing pg_statistic of the tables because this is what an > end-user wants eventually. > I'm sure we can add something like that, but query plan formats change a lot and are greatly dependent on database configuration, so maintaining such a test would be a lot of work.
Re: pg_dump include/exclude fails to error
On Sun, Mar 10, 2024 at 4:51 PM Pavel Stehule wrote: > > Hi > > ne 10. 3. 2024 v 15:23 odesílatel Magnus Hagander > napsal: >> >> When including tables with the new pg_dump functionality, it fails to >> error out if a table is missing, but only if more than one table is >> specified. >> >> E.g., if table foo exist, but not bar: >> >> pg_dump --table bar >> pg_dump: error: no matching tables were found >> >> with file "myfilter" containing just "table bar" >> pg_dump --filter myfilter >> pg_dump: error: no matching tables were found >> >> with the file "myfilter" containing both "table foo" and "table bar" >> (order doesn't matter): >> > > > is not this expected behaviour (consistent with -t option)? > > (2024-03-10 16:48:07) postgres=# \dt > List of relations > ┌┬──┬───┬───┐ > │ Schema │ Name │ Type │ Owner │ > ╞╪══╪═══╪═══╡ > │ public │ foo │ table │ pavel │ > └┴──┴───┴───┘ > (1 row) > > pavel@nemesis:~/src/orafce$ /usr/local/pgsql/master/bin/pg_dump -t foo -t boo > > /dev/null > pavel@nemesis:~/src/orafce$ > > if you want to raise error, you should to use option --strict-names. > > pavel@nemesis:~/src/orafce$ /usr/local/pgsql/master/bin/pg_dump -t foo -t boo > --strict-names > /dev/null > pg_dump: error: no matching tables were found for pattern "boo" Hmpf, you're right. I guess I don't use multiple-dash-t often enough :) So yeah, then I agree this is probably the right behaviour. Maybe the docs for --filter deserves a specific mention about these rules though, as it's going to be a lot more common to specify multiples when using --filter? -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
Re: Confine vacuum skip logic to lazy_scan_skip
On Wed, Mar 6, 2024 at 6:47 PM Melanie Plageman wrote: > > Performance results: > > The TL;DR of my performance results is that streaming read vacuum is > faster. However there is an issue with the interaction of the streaming > read code and the vacuum buffer access strategy which must be addressed. I have investigated the interaction between maintenance_io_concurrency, streaming reads, and the vacuum buffer access strategy (BAS_VACUUM). The streaming read API limits max_pinned_buffers to a pinned buffer multiplier (currently 4) * maintenance_io_concurrency buffers with the goal of constructing reads of at least MAX_BUFFERS_PER_TRANSFER size. Since the BAS_VACUUM ring buffer is size 256 kB or 32 buffers with default block size, that means that for a fully uncached vacuum in which all blocks must be vacuumed and will be dirtied, you'd have to set maintenance_io_concurrency at 8 or lower to see the same number of reuses (and shared buffer consumption) as master. Given that we allow users to specify BUFFER_USAGE_LIMIT to vacuum, it seems like we should force max_pinned_buffers to a value that guarantees the expected shared buffer usage by vacuum. But that means that maintenance_io_concurrency does not have a predictable impact on streaming read vacuum. What is the right thing to do here? At the least, the default size of the BAS_VACUUM ring buffer should be BLCKSZ * pinned_buffer_multiplier * default maintenance_io_concurrency (probably rounded up to the next power of two) bytes. - Melanie
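To make the arithmetic above easy to check, here is a small sketch (the constant and helper names are mine; the values — multiplier 4, 8 kB blocks, 256 kB BAS_VACUUM ring — are the ones quoted in this message):

```python
PINNED_BUFFER_MULTIPLIER = 4        # streaming read API's pinned-buffer multiplier
BLCKSZ = 8 * 1024                   # default block size
BAS_VACUUM_RING_BYTES = 256 * 1024  # default vacuum ring buffer size

def max_pinned_buffers(maintenance_io_concurrency):
    # upper bound on buffers the streaming read API may pin at once
    return PINNED_BUFFER_MULTIPLIER * maintenance_io_concurrency

ring_buffers = BAS_VACUUM_RING_BYTES // BLCKSZ  # 32 buffers in the ring

# The pins fit inside the ring only when 4 * maintenance_io_concurrency <= 32,
# i.e. maintenance_io_concurrency <= 8 -- hence "8 or lower" above:
highest_fitting = ring_buffers // PINNED_BUFFER_MULTIPLIER
```

With the stock maintenance_io_concurrency of 10, max_pinned_buffers (40) already exceeds the ring size, which is why the reuse behaviour diverges from master until the setting is lowered to 8.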
Re: UUID v7
> On 10 Mar 2024, at 17:59, Andrey M. Borodin wrote: > > I tried to "make docs", but it gives me a gazillion errors... Is there an > easy way to see resulting HTML? Oops, CFbot expectedly found a problem... Sorry for the noise, this version, I hope, will pass all the tests. Thanks! Best regards, Andrey Borodin. v19-0001-Implement-UUID-v7.patch Description: Binary data
Re: Statistics Import and Export
On Fri, Mar 8, 2024 at 12:06 PM Corey Huinker wrote:
>
> Anyway, here's v7. Eagerly awaiting feedback.

Thanks for working on this. It looks useful to have the ability to restore the stats after upgrade and restore. But the exported stats are valid only until the next ANALYZE is run on the table. IIUC, postgres collects stats during VACUUM, autovacuum and ANALYZE, right? Perhaps there are other ways to collect stats. I'm wondering what problems the user would face if they were just asked to run ANALYZE on the tables (I'm assuming ANALYZE doesn't block concurrent access to the tables) instead of automatically exporting stats.

Here are some comments on the v7 patch. I've not dived into the whole thread, so some comments may be repeats or may need clarification. Please bear with me.

1. The following two are unnecessary changes in pg_proc.dat, please remove them.

--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -8818,7 +8818,6 @@
 { oid => '3813', descr => 'generate XML text node',
   proname => 'xmltext', proisstrict => 't', prorettype => 'xml',
   proargtypes => 'text', prosrc => 'xmltext' },
-
 { oid => '2923', descr => 'map table contents to XML',
   proname => 'table_to_xml', procost => '100', provolatile => 's',
   proparallel => 'r', prorettype => 'xml',
@@ -12163,8 +12162,24 @@
 # GiST stratnum implementations
 { oid => '8047', descr => 'GiST support',
-  proname => 'gist_stratnum_identity', prorettype => 'int2',
+  proname => 'gist_stratnum_identity',prorettype => 'int2',
   proargtypes => 'int2', prosrc => 'gist_stratnum_identity' },

2.
+they are replaced by the next auto-analyze. This function is used by
+pg_upgrade and pg_restore to
+convey the statistics from the old system version into the new one.
+

Is there any demonstration of pg_set_relation_stats and pg_set_attribute_stats being used either in pg_upgrade or in pg_restore? Perhaps having them as 0002, 0003 and so on patches might show the real need for functions like this.
It also clarifies how these functions pull stats from tables on the old cluster into the tables on the new cluster.

3. pg_set_relation_stats and pg_set_attribute_stats seem to be writing to pg_class and might affect the plans, as the stats can get tampered with. Can we REVOKE the execute permissions from PUBLIC out of the box in src/backend/catalog/system_functions.sql? This way one can decide whom to grant permissions to.

4.
+SELECT pg_set_relation_stats('stats_export_import.test'::regclass, 3.6::float4, 15000);
+ pg_set_relation_stats
+-----------------------
+ t
+(1 row)
+
+SELECT reltuples, relpages FROM pg_class WHERE oid = 'stats_export_import.test'::regclass;
+ reltuples | relpages
+-----------+----------
+       3.6 |    15000

Isn't this test case showing a misuse of these functions? The table actually has no rows, but we are lying to the postgres optimizer about the stats. I think altering the stats of a table mustn't be that easy for the end user. As mentioned in comment #3, permissions need to be tightened. In addition, we can also mark the functions pg_upgrade-only with CHECK_IS_BINARY_UPGRADE, but that might not work for pg_restore (or I don't know if we have a way to know within the server that it is running for pg_restore).

5. In continuation to the comment #2, is pg_dump supposed to generate pg_set_relation_stats and pg_set_attribute_stats statements for each table? When pg_dump does that, pg_restore can automatically load the stats.

6.
+/*-
+ *
+ * statistics.c
+ *
+ * IDENTIFICATION
+ *	  src/backend/statistics/statistics.c
+ *
+ *-

A description of what the new file statistics.c does is missing.

7. pgindent isn't happy with the new file statistics.c, please check.

8.
+/*
+ * Import statistics for a given relation attribute
+ *
+ * pg_set_attribute_stats(relation Oid, attname name, stainherit bool,
+ *                        stanullfrac float4, stawidth int, stadistinct float4,

Having the function definition in the function comment isn't necessary - it's hard to keep it consistent with pg_proc.dat in the future.
If required, one can either look at pg_proc.dat or the docs.

9. Isn't it good to add a test case where the plan of a query on a table after exporting the stats would remain the same as that of the original table from which the stats are exported? IMO, this is more realistic than just comparing the pg_statistic rows of the tables, because this is what an end-user wants eventually.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Re: pg_dump include/exclude fails to error
Hi

ne 10. 3. 2024 v 15:23 odesílatel Magnus Hagander napsal:
> When including tables with the new pg_dump functionality, it fails to
> error out if a table is missing, but only if more than one table is
> specified.
>
> E.g., if table foo exists, but not bar:
>
> pg_dump --table bar
> pg_dump: error: no matching tables were found
>
> with file "myfilter" containing just "table bar"
> pg_dump --filter myfilter
> pg_dump: error: no matching tables were found
>
> with the file "myfilter" containing both "table foo" and "table bar"
> (order doesn't matter):
>

Is this not the expected behaviour (consistent with the -t option)?

(2024-03-10 16:48:07) postgres=# \dt
        List of relations
┌────────┬──────┬───────┬───────┐
│ Schema │ Name │ Type  │ Owner │
╞════════╪══════╪═══════╪═══════╡
│ public │ foo  │ table │ pavel │
└────────┴──────┴───────┴───────┘
(1 row)

pavel@nemesis:~/src/orafce$ /usr/local/pgsql/master/bin/pg_dump -t foo -t boo > /dev/null
pavel@nemesis:~/src/orafce$

If you want to raise an error, you should use the option --strict-names.

pavel@nemesis:~/src/orafce$ /usr/local/pgsql/master/bin/pg_dump -t foo -t boo --strict-names > /dev/null
pg_dump: error: no matching tables were found for pattern "boo"

Regards

Pavel

> Not having looked into the code, but it looks to me like some variable
> isn't properly reset, or perhaps there is a check for existence rather
> than count?
>
> //Magnus
Re: remaining sql/json patches
One more issue.

+			case JSON_VALUE_OP:
+				/* Always omit quotes from scalar strings. */
+				jsexpr->omit_quotes = (func->quotes == JS_QUOTES_OMIT);
+
+				/* JSON_VALUE returns text by default. */
+				if (!OidIsValid(jsexpr->returning->typid))
+				{
+					jsexpr->returning->typid = TEXTOID;
+					jsexpr->returning->typmod = -1;
+				}

By default, with makeNode(JsonExpr) node initialization, jsexpr->omit_quotes is initialized to false. Even though there was no impact on the JSON_TABLE patch (probably because of coerceJsonFuncExprOutput), all tests still passed. Based on the above comment and the regression tests, I think you still need to do:

` jsexpr->omit_quotes = true; `
Re: pg_dump include/exclude fails to error
> On 10 Mar 2024, at 15:22, Magnus Hagander wrote: > Not having looked into the code, but it looks to me like some variable > isn't properly reset, or perhaps there is a check for existence rather > than count? Thanks for the report, I'll have a look later today when back. -- Daniel Gustafsson
Re: [DOC] Add detail regarding resource consumption wrt max_connections
Hi,

On Sun, Mar 10, 2024 at 09:58:25AM -0400, Robert Treat wrote:
> On Fri, Mar 8, 2024 at 10:47 AM Michael Banck wrote:
> > On Fri, Jan 12, 2024 at 10:14:38PM +, Cary Huang wrote:
> > > I think it is good to warn the user about the increased allocation of
> > > memory for certain parameters so that they do not abuse it by setting
> > > it to a huge number without knowing the consequences.
> >
> > Right, and I think it might be useful to log (i.e. at LOG not DEBUG3
> > level, with a nicer message) the amount of memory we allocate on
> > startup, that is just one additional line per instance lifetime but
> > might be quite useful to admins. Or maybe two lines if we log whether we
> > could allocate it as huge pages or not as well:
> >
> > |2024-03-08 16:46:13.117 CET [237899] DEBUG: invoking IpcMemoryCreate(size=145145856)
> > |2024-03-08 16:46:13.117 CET [237899] DEBUG: mmap(146800640) with MAP_HUGETLB failed, huge pages disabled: Cannot allocate memory
>
> If we were going to add these details (and I very much like the idea),
> I would advocate that we put it somewhere more permanent than a single
> log entry at start-up. Given that database up-times easily run months
> and sometimes years, it is hard to imagine we'd always have access to
> the log files to figure this out on any actively running systems.

Well actually, those two numbers are already available at runtime, via the shared_memory_size and (from 17 on) huge_pages_status GUCs. So this would be geared towards admins who keep logs in long-term storage and want to know what the numbers were a while ago. Maybe it is not that interesting, but I think one or two lines at startup would not hurt.

Michael
pg_dump include/exclude fails to error
When including tables with the new pg_dump functionality, it fails to error out if a table is missing, but only if more than one table is specified.

E.g., if table foo exists, but not bar:

pg_dump --table bar
pg_dump: error: no matching tables were found

with file "myfilter" containing just "table bar":

pg_dump --filter myfilter
pg_dump: error: no matching tables were found

with the file "myfilter" containing both "table foo" and "table bar" (order doesn't matter):

Not having looked into the code, but it looks to me like some variable isn't properly reset, or perhaps there is a check for existence rather than count?

//Magnus
Re: [DOC] Add detail regarding resource consumption wrt max_connections
On Fri, Mar 8, 2024 at 10:47 AM Michael Banck wrote: > > Hi, > > On Fri, Jan 12, 2024 at 10:14:38PM +, Cary Huang wrote: > > I think it is good to warn the user about the increased allocation of > > memory for certain parameters so that they do not abuse it by setting > > it to a huge number without knowing the consequences. > > Right, and I think it might be useful to log (i.e. at LOG not DEBUG3 > level, with a nicer message) the amount of memory we allocate on > startup, that is just one additional line per instance lifetime but > might be quite useful to admins. Or maybe two lines if we log whether we > could allocate it as huge pages or not as well: > > |2024-03-08 16:46:13.117 CET [237899] DEBUG: invoking > IpcMemoryCreate(size=145145856) > |2024-03-08 16:46:13.117 CET [237899] DEBUG: mmap(146800640) with > MAP_HUGETLB failed, huge pages disabled: Cannot allocate memory > If we were going to add these details (and I very much like the idea), I would advocate that we put it somewhere more permanent than a single log entry at start-up. Given that database up-times easily run months and sometimes years, it is hard to imagine we'd always have access to the log files to figure this out on any actively running systems. Robert Treat https://xzilla.net
Re: UUID v7
Hi Peter, thank you for such a thoughtful review. > On 6 Mar 2024, at 12:13, Peter Eisentraut wrote: > > I have various comments on this patch: > > > - doc/src/sgml/func.sgml > > The documentation of the new functions should be broken up a bit. > It's all one paragraph now. At least make it several paragraphs, or > possibly tables or something else. I've split functions to generate UUIDs from functions to extract stuff. > > Avoid listing the functions twice: Once before the description and > then again in the description. That's just going to get out of date. > The first listing is not necessary, I think. Fixed. > The return values in the documentation should use the public-facing > type names, like "timestamp with time zone" and "smallint". Fixed. > The descriptions of the UUID generation functions use handwavy > language in their descriptions, like "It provides much better data > locality" or "unacceptable security or business intelligence > implications", which isn't useful. Either we cut that all out and > just say, it creates a UUIDv7, done, look elsewhere for more > information, or we provide some more concretely useful details. I've removed all that stuff entirely. > We shouldn't label a link as "IETF standard" when it's actually a > draft. Fixed. Well, all my modifications of documentation are kind of blind... I tried to "make docs", but it gives me a gazillion errors... Is there an easy way to see resulting HTML? > - src/include/catalog/pg_proc.dat > > The description of uuidv4 should be "generate UUID version 4", so that > it parallels uuidv7. Fixed. > The description of uuid_extract_time says 'extract timestamp from UUID > version 7', the implementation is not limited to version 7. Fixed. > I think uuid_extract_time should be named uuid_extract_timestamp, > because it extracts a timestamp, not a time. Renamed. > The functions uuid_extract_ver and uuid_extract_var could be named > uuid_extract_version and uuid_extract_variant.
Otherwise, it's hard > to tell them apart, with only one letter different. Renamed. > - src/test/regress/sql/uuid.sql > > Why are the tests using the input format '{...}', which is not the > standard one? Fixed. > - src/backend/utils/adt/uuid.c > > All this new code should have more comments. There is a lot of bit > twiddling going on, and I suppose one is expected to follow along in > the RFC? At least each function should have a header comment, so one > doesn't have to check in pg_proc.dat what it's supposed to do. I've added some header comments. One big comment is attached to v7; I tried to take parts mostly from the RFC. Yet there are a lot of my additions that now need review... > I'm suspicious that these functions all appear to return null for > erroneous input, rather than raising errors. I think at least some > explanation for this should be recorded somewhere. The input is not erroneous per se. But the fact that

# select 1/0;
ERROR: division by zero

makes me consider throwing an error. There was some argumentation upthread for not throwing an error though, but now I cannot find it... maybe I accepted this behaviour as more user-friendly. > I think the behavior of uuid_extract_var(iant) is wrong. The code > takes just two bits to return, but the draft document is quite clear > that the variant is 4 bits (see Table 1). Well, it was correct only for the implemented variant. I've made a version that implements the full Table 1 from Section 4.1. > The uuidv7 function could really use a header comment that explains > the choices that were made. The RFC draft provides various options > that implementations could use; we should describe which ones we > chose. Done. > > I would have expected that, since gettimeofday() provides microsecond > precision, we'd put the extra precision into "rand_a" as per Section 6.2 > method 3. I had chosen method 2 over method 3 as the most portable. Can we be sure how many bits (after reading milliseconds) are there across different OSes?
Even if we put an extra 10 bits of timestamp in, we cannot safely extract them. These bits could promote inter-backend sortability, i.e. when many backends generate data fast, the data is still somewhat ordered even within 1 ms. But I think the benefits of this sortability are outweighed by portability (unknown real resolution) and simplicity (we don't store microseconds, thus do not try to extract them). All these arguments are weak, but if one method were strictly better than another, there would be only one method.

> You use some kind of counter, but could you explain which method that counter implements?

I described the counter in the uuidv7() header.

> I don't see any acknowledgment of issues relating to concurrency or restarts. Like, how do we prevent duplicates being generated by concurrent sessions or between restarts? Maybe the counter or random stuff does that, but it's not explained.

I think a restart takes more than 1 ms, so this is covered with time
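As a rough illustration of the bit layout being discussed — a 48-bit unix_ts_ms field, the version and variant bits, and random filler — here is a Python sketch. It deliberately omits the counter and the sub-millisecond (method 2/3) details debated above, so it is not the patch's algorithm, just the common UUIDv7 skeleton:

```python
import os
import time
import uuid

def uuidv7_sketch(ms=None):
    """Build a UUIDv7-shaped value: 48-bit Unix timestamp in
    milliseconds, version nibble 7, RFC variant bits 0b10, and
    random bits everywhere else (no counter, no extra precision)."""
    if ms is None:
        ms = time.time_ns() // 1_000_000
    value = (ms & ((1 << 48) - 1)) << 80      # unix_ts_ms in the top 48 bits
    value |= 0x7 << 76                        # version = 7
    value |= int.from_bytes(os.urandom(10), "big") & ((1 << 76) - 1)
    value &= ~(0x3 << 62)                     # clear the two variant bits...
    value |= 0x2 << 62                        # ...and set them to 0b10
    return uuid.UUID(int=value)

def extract_timestamp_ms(u):
    # mirrors the idea behind uuid_extract_timestamp():
    # the top 48 bits are the millisecond Unix timestamp
    return u.int >> 80
```

Because only milliseconds are stored here, two values generated within the same millisecond sort arbitrarily relative to each other — exactly the sortability trade-off weighed above.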
Re: Failures in constraints regression test, "read only 0 of 8192 bytes"
Thanks for diagnosing this!

On 10/03/2024 08:23, Thomas Munro wrote: On Sun, Mar 10, 2024 at 6:48 PM Thomas Munro wrote: I won't be surprised if the answer is: if you're holding a reference, you have to get a pin (referring to bulk_write.c). Ahhh, on second thoughts, I take that back, I think the original theory still actually works just fine. It's just that somewhere in our refactoring of that commit, when we were vacillating between different semantics for 'destroy' and 'release', I think we made a mistake: in RelationCacheInvalidate() I think we should now call smgrreleaseall(), not smgrdestroyall().

Yes, I ran the reproducer with 'rr', and came to the same conclusion. That smgrdestroyall() call destroys the SmgrRelation, breaking the new assumption that an unpinned SmgrRelation is valid until end of transaction.

That satisfies the requirements for sinval queue overflow: we close md.c segments (and most importantly virtual file descriptors), so our lack of sinval records can't hurt us, we'll reopen all files as required. That's what CLOBBER_CACHE_ALWAYS is effectively testing (and more). But the smgr pointer remains valid, and retains only its "identity", eg hash table key, and that's also fine. It won't be destroyed until after the end of the transaction. Which was the point, and it allows things like bulk_write.c (and streaming_read.c) to hold an smgr reference.

Right, +1.

I wonder if we can now simplify RelationCacheInvalidate() further. In the loop above that smgrdestroyall():

	while ((idhentry = (RelIdCacheEnt *) hash_seq_search()) != NULL)
	{
		relation = idhentry->reldesc;

		/* Must close all smgr references to avoid leaving dangling ptrs */
		RelationCloseSmgr(relation);

		/*
		 * Ignore new relations; no other backend will manipulate them before
		 * we commit. Likewise, before replacing a relation's relfilelocator,
		 * we shall have acquired AccessExclusiveLock and drained any
		 * applicable pending invalidations.
		 */
		if (relation->rd_createSubid != InvalidSubTransactionId ||
			relation->rd_firstRelfilelocatorSubid != InvalidSubTransactionId)
			continue;

I don't think we need to call RelationCloseSmgr() for relations created in the same transaction anymore.

--
Heikki Linnakangas
Neon (https://neon.tech)
Re: Combine headerscheck and cpluspluscheck scripts
On 07.03.24 08:30, Michael Paquier wrote: On Thu, Mar 07, 2024 at 01:37:36PM +1300, Thomas Munro wrote: +1 Looking at the patch, nice cleanup. Committed, thanks.