Re: Direct I/O
On 2023-04-15 05:22, Tom Lane wrote:
> Thomas Munro writes:
>> On Sat, Apr 15, 2023 at 7:50 AM Mikael Kjellström wrote:
>>> want me to switch to clang instead?
>> I vote +1, that's the system compiler in modern OpenBSD.
> Ditto, we need coverage of that.

OK. I switched to clang on morepork now.

>> As for curculio, I don't understand the motivation for maintaining
>> that machine. I'd rather know if OpenBSD 7.3 works.
> Those aren't necessarily mutually exclusive :-). But I do agree that
> recent OpenBSD is more important to cover than ancient OpenBSD.

So do you want me to switch that machine to OpenBSD 7.3 instead?

/Mikael
Re: v16dev: invalid memory alloc request size 8488348128
David Rowley writes:
> I don't think that's really going to help. The crash already tells us
> there's a problem down the line, but if the commit you mention is to
> blame for this, then the problem is elsewhere, either in our
> assumption that we can get away without the datumCopy() or in the
> aggregate function producing the state that we're no longer copying.

It does smell like the aggregate output has been corrupted by the time
it got to the plpgsql function. I don't particularly want to try to
synthesize a test case from the essentially-zero SQL-level information
we've been provided, though. And I doubt we can track this down without
a test case. So please try to sanitize the case you have enough that
you can share it.

regards, tom lane
Re: Direct I/O
Thomas Munro writes:
> On Sat, Apr 15, 2023 at 7:50 AM Mikael Kjellström wrote:
>> want me to switch to clang instead?
> I vote +1, that's the system compiler in modern OpenBSD.

Ditto, we need coverage of that.

> As for curculio, I don't understand the motivation for maintaining
> that machine. I'd rather know if OpenBSD 7.3 works.

Those aren't necessarily mutually exclusive :-). But I do agree that
recent OpenBSD is more important to cover than ancient OpenBSD.

regards, tom lane
Re: Tab completion for CREATE SCHEMA AUTHORIZATION
On Fri, Apr 14, 2023 at 05:04:35PM +0100, Dagfinn Ilmari Mannsåker wrote:
> Looks like I completely dropped the ball on this one, sorry.

So did I ;)

> Here's a rebased patch which uses the new COMPLETE_WITH_QUERY_PLUS
> functionality added in commit 02b8048ba5dc36238f3e7c3c58c5946220298d71.

Thanks, I'll look at it.
--
Michael
Re: v16dev: invalid memory alloc request size 8488348128
On Sat, 15 Apr 2023 at 13:03, Justin Pryzby wrote:
> Maybe you'll find valgrind errors to be helpful.

I don't think that's really going to help. The crash already tells us
there's a problem down the line, but if the commit you mention is to
blame for this, then the problem is elsewhere, either in our assumption
that we can get away without the datumCopy() or in the aggregate
function producing the state that we're no longer copying.

David
Re: Support logical replication of DDLs
On Fri, 14 Apr 2023 at 13:06, vignesh C wrote:
>
> Few comments:

Some more comments on 0001 patch:

1) We could add a space after the 2nd parameter:

+ * Note we don't have the luxury of sprintf-like compiler warnings for
+ * malformed argument lists.
+ */
+static ObjTree *
+new_objtree_VA(char *fmt, int numobjs,...)

2) I felt objtree_to_jsonb_element is a helper function for
objtree_to_jsonb_rec, we should update the comments accordingly:

+/*
+ * Helper for objtree_to_jsonb: process an individual element from an object or
+ * an array into the output parse state.
+ */
+static void
+objtree_to_jsonb_element(JsonbParseState *state, ObjElem *object,
+                         JsonbIteratorToken elem_token)
+{
+    JsonbValue  val;
+
+    switch (object->objtype)
+    {
+        case ObjTypeNull:
+            val.type = jbvNull;
+            pushJsonbValue(&state, elem_token, &val);
+            break;

3) domainId parameter change should be removed from the first patch:

+static List *
+obtainConstraints(List *elements, Oid relationId, Oid domainId,
+                  ConstraintObjType objType)
+{
+    Relation    conRel;
+    ScanKeyData key;
+    SysScanDesc scan;
+    HeapTuple   tuple;
+    ObjTree    *constr;
+    Oid         relid;
+
+    /* Only one may be valid */
+    Assert(OidIsValid(relationId) ^ OidIsValid(domainId));
+
+    relid = OidIsValid(relationId) ? ConstraintRelidTypidNameIndexId :
+        ConstraintTypidIndexId;

4) Do we have any scenario for CONSTRAINT_TRIGGER in table/index? If so,
could we add a test for this?

+        case CONSTRAINT_UNIQUE:
+            contype = "unique";
+            break;
+        case CONSTRAINT_TRIGGER:
+            contype = "trigger";
+            break;
+        case CONSTRAINT_EXCLUSION:
+            contype = "exclusion";
+            break;

5) The below code adds information about compression, but the comment
says "USING clause"; the comment should be updated accordingly:

+    /* USING clause */
+    tmp_obj = new_objtree("COMPRESSION");
+    if (coldef->compression)
+        append_string_object(tmp_obj, "%{compression_method}I",
+                             "compression_method", coldef->compression);
+    else
+    {
+        append_null_object(tmp_obj, "%{compression_method}I");
+        append_not_present(tmp_obj);
+    }
+    append_object_object(ret, "%{compression}s", tmp_obj);

6) Generally we add append_null_object followed by append_not_present,
but it is not present for "COLLATE" handling. Is this correct?

+    tmp_obj = new_objtree("COMPRESSION");
+    if (coldef->compression)
+        append_string_object(tmp_obj, "%{compression_method}I",
+                             "compression_method", coldef->compression);
+    else
+    {
+        append_null_object(tmp_obj, "%{compression_method}I");
+        append_not_present(tmp_obj);
+    }
+    append_object_object(ret, "%{compression}s", tmp_obj);
+
+    tmp_obj = new_objtree("COLLATE");
+    if (OidIsValid(typcollation))
+        append_object_object(tmp_obj, "%{name}D",
+                             new_objtree_for_qualname_id(CollationRelationId,
+                                                         typcollation));
+    else
+        append_not_present(tmp_obj);
+    append_object_object(ret, "%{collation}s", tmp_obj);

7) I felt attrTup can be released after get_atttypetypmodcoll as we are
not using the tuple after that; I'm not sure if it is required to hold
the reference to this tuple till the end of the function:

+static ObjTree *
+deparse_ColumnDef_typed(Relation relation, List *dpcontext, ColumnDef *coldef)
+{
+    ObjTree    *ret = NULL;
+    ObjTree    *tmp_obj;
+    Oid         relid = RelationGetRelid(relation);
+    HeapTuple   attrTup;
+    Form_pg_attribute attrForm;
+    Oid         typid;
+    int32       typmod;
+    Oid         typcollation;
+    bool        saw_notnull;
+    ListCell   *cell;
+
+    attrTup = SearchSysCacheAttName(relid, coldef->colname);
+    if (!HeapTupleIsValid(attrTup))
+        elog(ERROR, "could not find cache entry for column \"%s\" of relation %u",
+             coldef->colname, relid);
+    attrForm = (Form_pg_attribute) GETSTRUCT(attrTup);
+
+    get_atttypetypmodcoll(relid, attrForm->attnum,
+                          &typid, &typmod, &typcollation);
+
+    /*
+     * Search for a NOT NULL declaration. As in deparse_ColumnDef, we rely on
+     * finding a constraint on the column rather than coldef->is_not_null.
+     * (This routine is never used for ALTER cases.)
+     */
+    saw_notnull = false;
Re: v16dev: invalid memory alloc request size 8488348128
Maybe you'll find valgrind errors to be helpful.

==17971== Source and destination overlap in memcpy(0x1eb8c078, 0x1d88cb20, 123876054)
==17971==    at 0x4C2E81D: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1035)
==17971==    by 0x9C705A: memcpy (string3.h:51)
==17971==    by 0x9C705A: pg_detoast_datum_copy (fmgr.c:1823)
==17971==    by 0x8952F8: expand_array (array_expanded.c:131)
==17971==    by 0x1E971A28: plpgsql_exec_function (pl_exec.c:556)
==17971==    by 0x1E97CF83: plpgsql_call_handler (pl_handler.c:277)
==17971==    by 0x6BFA4E: ExecInterpExpr (execExprInterp.c:733)
==17971==    by 0x6D9C8C: ExecEvalExprSwitchContext (executor.h:354)
==17971==    by 0x6D9C8C: ExecProject (executor.h:388)
==17971==    by 0x6D9C8C: project_aggregates (nodeAgg.c:1377)
==17971==    by 0x6DB2B4: agg_retrieve_direct (nodeAgg.c:2520)
==17971==    by 0x6DB2B4: ExecAgg (nodeAgg.c:2172)
==17971==    by 0x6C4821: ExecProcNode (executor.h:272)
==17971==    by 0x6C4821: ExecutePlan (execMain.c:1640)
==17971==    by 0x6C4821: standard_ExecutorRun (execMain.c:365)
==17971==    by 0x870535: PortalRunSelect (pquery.c:924)
==17971==    by 0x871CCE: PortalRun (pquery.c:768)
==17971==    by 0x86D552: exec_simple_query (postgres.c:1274)
==17971==
==17971== Invalid read of size 8
==17971==    at 0x4C2EA20: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1035)
==17971==    by 0x9C705A: memcpy (string3.h:51)
==17971==    by 0x9C705A: pg_detoast_datum_copy (fmgr.c:1823)
==17971==    by 0x8952F8: expand_array (array_expanded.c:131)
==17971==    by 0x1E971A28: plpgsql_exec_function (pl_exec.c:556)
==17971==    by 0x1E97CF83: plpgsql_call_handler (pl_handler.c:277)
==17971==    by 0x6BFA4E: ExecInterpExpr (execExprInterp.c:733)
==17971==    by 0x6D9C8C: ExecEvalExprSwitchContext (executor.h:354)
==17971==    by 0x6D9C8C: ExecProject (executor.h:388)
==17971==    by 0x6D9C8C: project_aggregates (nodeAgg.c:1377)
==17971==    by 0x6DB2B4: agg_retrieve_direct (nodeAgg.c:2520)
==17971==    by 0x6DB2B4: ExecAgg (nodeAgg.c:2172)
==17971==    by 0x6C4821: ExecProcNode (executor.h:272)
==17971==    by 0x6C4821: ExecutePlan (execMain.c:1640)
==17971==    by 0x6C4821: standard_ExecutorRun (execMain.c:365)
==17971==    by 0x870535: PortalRunSelect (pquery.c:924)
==17971==    by 0x871CCE: PortalRun (pquery.c:768)
==17971==    by 0x86D552: exec_simple_query (postgres.c:1274)
==17971==  Address 0x1eb8c038 is 8 bytes before a block of size 123,876,112 alloc'd
==17971==    at 0x4C29F73: malloc (vg_replace_malloc.c:309)
==17971==    by 0x9E4204: AllocSetAlloc (aset.c:732)
==17971==    by 0x9ED5BD: palloc (mcxt.c:1224)
==17971==    by 0x9C704C: pg_detoast_datum_copy (fmgr.c:1821)
==17971==    by 0x8952F8: expand_array (array_expanded.c:131)
==17971==    by 0x1E971A28: plpgsql_exec_function (pl_exec.c:556)
==17971==    by 0x1E97CF83: plpgsql_call_handler (pl_handler.c:277)
==17971==    by 0x6BFA4E: ExecInterpExpr (execExprInterp.c:733)
==17971==    by 0x6D9C8C: ExecEvalExprSwitchContext (executor.h:354)
==17971==    by 0x6D9C8C: ExecProject (executor.h:388)
==17971==    by 0x6D9C8C: project_aggregates (nodeAgg.c:1377)
==17971==    by 0x6DB2B4: agg_retrieve_direct (nodeAgg.c:2520)
==17971==    by 0x6DB2B4: ExecAgg (nodeAgg.c:2172)
==17971==    by 0x6C4821: ExecProcNode (executor.h:272)
==17971==    by 0x6C4821: ExecutePlan (execMain.c:1640)
==17971==    by 0x6C4821: standard_ExecutorRun (execMain.c:365)
==17971==    by 0x870535: PortalRunSelect (pquery.c:924)
==17971==
==17971== Invalid read of size 8
==17971==    at 0x4C2EA28: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1035)
==17971==    by 0x9C705A: memcpy (string3.h:51)
==17971==    by 0x9C705A: pg_detoast_datum_copy (fmgr.c:1823)
==17971==    by 0x8952F8: expand_array (array_expanded.c:131)
==17971==    by 0x1E971A28: plpgsql_exec_function (pl_exec.c:556)
==17971==    by 0x1E97CF83: plpgsql_call_handler (pl_handler.c:277)
==17971==    by 0x6BFA4E: ExecInterpExpr (execExprInterp.c:733)
==17971==    by 0x6D9C8C: ExecEvalExprSwitchContext (executor.h:354)
==17971==    by 0x6D9C8C: ExecProject (executor.h:388)
==17971==    by 0x6D9C8C: project_aggregates (nodeAgg.c:1377)
==17971==    by 0x6DB2B4: agg_retrieve_direct (nodeAgg.c:2520)
==17971==    by 0x6DB2B4: ExecAgg (nodeAgg.c:2172)
==17971==    by 0x6C4821: ExecProcNode (executor.h:272)
==17971==    by 0x6C4821: ExecutePlan (execMain.c:1640)
==17971==    by 0x6C4821: standard_ExecutorRun (execMain.c:365)
==17971==    by 0x870535: PortalRunSelect (pquery.c:924)
==17971==    by 0x871CCE: PortalRun (pquery.c:768)
==17971==    by 0x86D552: exec_simple_query (postgres.c:1274)
==17971==  Address 0x1eb8c030 is 16 bytes before a block of size 123,876,112 alloc'd
==17971==    at 0x4C29F73: malloc (vg_replace_malloc.c:309)
==17971==    by 0x9E4204: AllocSetAlloc (aset.c:732)
==17971==    by 0x9ED5BD: palloc (mcxt.c:1224)
==17971==    by 0x9C704C: pg_detoast_datum_copy (fmgr.c:1821)
==17971==    by 0x8952F8: expand_array (array_expanded.c:131)
==17971==    by 0x1E971A28: plpgsql_exec_function
Re: v16dev: invalid memory alloc request size 8488348128
David Rowley writes:
> Any chance you could try and come up with a minimal reproducer?

Yeah --- there's an awful lot of moving parts there, and a stack trace
is not much to go on.

regards, tom lane
Re: Option to not use ringbuffer in VACUUM, using it in failsafe mode
On Fri, 14 Apr 2023 at 19:20, Peter Eisentraut wrote:
>
> I came across these new options and had a little bit of trouble figuring
> them out from the documentation. Maybe this could be polished a bit.
>
> vacuumdb --help says
>
> --buffer-usage-limit=BUFSIZE
>
> I can guess what a "SIZE" might be, but is "BUFSIZE" different from a
> "SIZE"? Maybe simplify here.
>
> On the vacuumdb man page, the placeholder is
>
> buffer_usage_limit
>
> which is yet another way of phrasing it. Maybe also use "size" here?
>
> The VACUUM man page says
>
> BUFFER_USAGE_LIMIT [ string ]
>
> which had me really confused. The detailed description later doesn't
> give any further explanation of possible values, except that 0 is
> apparently a possible value, which in my mind is not a string. Then
> there is a link to guc-vacuum-buffer-usage-limit, which lifts the
> mystery that this is really just an integer setting with possible
> memory-size units, but it was really hard to figure that out from the
> start!
>
> Moreover, on the VACUUM man page, right below BUFFER_USAGE_LIMIT, it
> explains the different kinds of accepted values, and "string" wasn't
> added there. Maybe also change this to "size" here and add an
> explanation there what kinds of sizes are possible.
>
> Finally, the locations of the new options in the various documentation
> places seems a bit random. The vacuumdb --help output and the man page
> appear to be mostly alphabetical, so --buffer-usage-limit should be
> after -a/--all. (Also note that right now the option isn't even in the
> same place in the --help output versus the man page.)

These are all valid points. I've attached a patch aiming to address
each of them.

> The order of the options on the VACUUM man page doesn't make any sense
> anymore. This isn't really the fault of this patch, but maybe it's time
> to do a fresh reordering there.

Agreed, that likely wasn't a big problem say about 5 years ago when we
had far fewer options, but the number has grown quite a bit since then.
Right after I fix the points you've mentioned seems a good time to
address that.

David

Attachment: fix_buffer_usage_limit_docs.patch
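To make the confusion discussed above concrete: the setting is an integer that may carry a memory-size unit suffix, which is why documenting it as "string" misleads. The sketch below is an illustrative toy parser written for this message, not PostgreSQL's actual GUC-parsing code, and the exact set of accepted unit suffixes is an assumption:

```python
# Illustrative only: a toy parser for "integer with optional memory-size
# unit" values such as BUFFER_USAGE_LIMIT. Not PostgreSQL's real parser.
UNITS = {"kB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4, "B": 1}

def parse_size(text):
    text = text.strip()
    # Check the two-character suffixes before the bare "B" suffix so that
    # "256kB" is not misread as having the "B" suffix.
    for suffix, mult in UNITS.items():
        if text.endswith(suffix):
            return int(text[:-len(suffix)].strip()) * mult
    return int(text)  # a bare integer, e.g. "0"

print(parse_size("256kB"))  # 262144
print(parse_size("0"))      # 0
```

Both "0" and "256kB" are valid spellings of such a value, which supports documenting the placeholder as "size" rather than "string".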
Re: Direct I/O
On Sat, Apr 15, 2023 at 7:50 AM Mikael Kjellström wrote:
> want me to switch to clang instead?

I vote +1, that's the system compiler in modern OpenBSD.

https://www.cambus.net/the-state-of-toolchains-in-openbsd/

As for curculio, I don't understand the motivation for maintaining that
machine. I'd rather know if OpenBSD 7.3 works.
Re: v16dev: invalid memory alloc request size 8488348128
On Sat, 15 Apr 2023 at 10:48, Justin Pryzby wrote:
>
> On Sat, Apr 15, 2023 at 10:04:52AM +1200, David Rowley wrote:
> > Which aggregate function is being called here? Is it a custom
> > aggregate written in C, by any chance?
>
> That function is not an aggregate:

There's an aggregate somewhere, as indicated by this fragment from the
stack trace:

> #12 project_aggregates (aggstate=aggstate@entry=0x607200070d38) at
> ../src/backend/executor/nodeAgg.c:1377
> #13 0x00903eb6 in agg_retrieve_direct
> (aggstate=aggstate@entry=0x607200070d38) at
> ../src/backend/executor/nodeAgg.c:2520
> #14 0x00904074 in ExecAgg (pstate=0x607200070d38) at
> ../src/backend/executor/nodeAgg.c:2172

Any chance you could try and come up with a minimal reproducer? You
have access to see which aggregates are being used here and what data
types are being given to them, and then what's being done with the
return value of that aggregate that's causing the crash. Maybe you can
still get the crash if you mock up some data to aggregate and strip out
the guts from the plpgsql functions that we're crashing on?

David
Scans are offloaded to SeqScan instead of CustomScan when there are multiple relations in the same query
Hi there,

In my implementation of CustomScan, when I have a query that scans
multiple tables (select c from t1, t2, t3), the planner always picks
one table to be scanned by CustomScan and offloads the rest to SeqScan.
I tried assigning a cost of 0 to the CustomScan path, but it still
doesn't work. BeginCustomScan gets executed, ExecCustomScan is skipped,
and then EndCustomScan is executed for all the tables that are offloaded
to Seq Scan. EXPLAIN shows that only one table is ever picked to be
executed by CustomScan.

Any idea what I might be doing wrong? Like a value in a struct I might
be setting incorrectly?

Thanks!
Re: Direct I/O
On Sat, Apr 15, 2023 at 7:38 AM Tom Lane wrote:
> Andres Freund writes:
> > On 2023-04-14 15:21:18 -0400, Tom Lane wrote:
> >> +1 for that, though. (Also, the fact that these animals aren't
> >> actually failing suggests that 004_io_direct.pl needs expansion.)
>
> > It's skipped, due to lack of O_DIRECT:
> > [20:50:22] t/004_io_direct.pl .. skipped: no O_DIRECT
>
> Hmm, I'd say that might be just luck. Whether the compiler honors weird
> alignment of locals seems independent of whether the OS has O_DIRECT.
>
> > So perhaps we don't even need a configure test, just a bit of
> > ifdef'ery? It's a bit annoying structurally, because the PG*Aligned
> > structs are defined in c.h, but the different ways of spelling
> > O_DIRECT are dealt with in fd.h.
>
> > I wonder if we should try to move those structs to fd.h as well...
>
> I doubt they belong in c.h, so that could be plausible; except
> I'm not convinced that testing O_DIRECT is sufficient.

As far as I can tell, the failure to honour large alignment attributes,
even though the compiler passes our configure check that you can do
that, was considered to be approximately a bug [1], or at least a thing
to be improved, in fairly old GCC times, but the fix wasn't back-patched
that far. Unfortunately the projects that were allergic to the GPL3
change but wanted to ship a compiler (or some motivation related to
that) got stuck on 4.2 for a while before they flipped to Clang (as
OpenBSD has now done). It seems hard to get excited about doing
anything about that on our side, and those systems are also spewing
other warnings. But if we're going to do it, it looks like the right
place would indeed be a new compiler check that the attribute exists
*and* generates no warnings with alignment > 16, something like that?

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=16660
Re: v16dev: invalid memory alloc request size 8488348128
On Sat, Apr 15, 2023 at 10:04:52AM +1200, David Rowley wrote:
> On Sat, 15 Apr 2023 at 08:36, Justin Pryzby wrote:
> >
> > I hit this elog() while testing reports under v16 and changed to PANIC
> > to help diagnose.
> >
> > DETAILS: PANIC: invalid memory alloc request size 18446744072967930808
> > CONTEXT: PL/pgSQL function array_weight(real[],real[]) while storing
> > call arguments into local variables
> >
> > I can't share the query, data, nor plpgsql functions themselves.
>
> Which aggregate function is being called here? Is it a custom
> aggregate written in C, by any chance?

That function is not an aggregate:

ts=# \sf array_weight
CREATE OR REPLACE FUNCTION public.array_weight(real[], real[])
 RETURNS real
 LANGUAGE plpgsql
 IMMUTABLE PARALLEL SAFE

And we don't have any C code loaded to postgres.

We do have polymorphic aggregate functions using anycompatiblearray [*],
and array_weight is being called several times with those aggregates as
its arguments.

* As in:
9e38c2bb5093ceb0c04d6315ccd8975bd17add66
97f73a978fc1aca59c6ad765548ce0096d95a923
09878cdd489ff7aca761998e7cb104f4fd98ae02
Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert
> On 14 Apr 2023, at 19:34, Jacob Champion wrote:
>
> On Fri, Apr 14, 2023 at 7:20 AM Daniel Gustafsson wrote:
>> And again with the attachment.
>
> After some sleep... From inspection I think the final EOF branch could
> be masked by the new branch, if verification has failed but was already
> ignored.
>
> To test that, I tried hanging up on the client partway through the
> server handshake, and I got some strange results. With the patch, using
> sslmode=require and OpenSSL 1.0.1, I see:
>
>    connection to server at "127.0.0.1", port 50859 failed: SSL error:
>    certificate verify failed: self signed certificate
>
> Which is wrong -- we shouldn't care about the self-signed failure if
> we're not using verify-*. I was going to suggest a patch like the
> following:
>
>>         if (r == -1)
>> -           libpq_append_conn_error(conn, "SSL SYSCALL error: %s",
>> -                                   SOCK_STRERROR(SOCK_ERRNO, sebuf, sizeof(sebuf)));
>> +       {
>> +           /*
>> +            * If we get an X509 error here without an error in the
>> +            * socket layer it means that verification failed without
>> +            * it being a protocol error. A common cause is trying to
>> +            * use a default system CA which is missing or broken.
>> +            */
>> +           if (!save_errno && vcode != X509_V_OK)
>> +               libpq_append_conn_error(conn, "SSL error: certificate verify failed: %s",
>> +                                       X509_verify_cert_error_string(vcode));
>> +           else
>> +               libpq_append_conn_error(conn, "SSL SYSCALL error: %s",
>> +                                       SOCK_STRERROR(save_errno, sebuf, sizeof(sebuf)));
>> +       }
>>         else
>>             libpq_append_conn_error(conn, "SSL SYSCALL error: EOF detected");
>
> But then I tested my case against PG15, and I didn't get the EOF
> message I expected:
>
>    connection to server at "127.0.0.1", port 50283 failed: SSL SYSCALL
>    error: Success

This "error: Success" error has been reported to the list numerous
times as misleading, and I'd love to make progress on improving error
reporting during the v17 cycle.

> So it appears that this (hanging up on the client during the handshake)
> is _also_ a case where we could get a SYSCALL error with a zero errno,
> and my patch doesn't actually fix the misleading error message.
>
> That makes me worried, but I don't really have a concrete suggestion to
> make it better, yet. I'm not opposed to pushing this as a fix for the
> tests, but I suspect we'll have to iterate on this more.

So, taking a step back. We know that libpq error reporting for SSL
errors isn't great; the permutations of sslmodes and OpenSSL versions
and the very fine-grained error handling API of OpenSSL make it hard to
generalize well. That's not what we're trying to solve here.

What we are trying to solve is this one case where we know exactly what
went wrong, and we know that the error message as-is will be somewhere
between misleading and utterly bogus. The committed feature is working
as intended, and the connection is refused as it should be when no CA
is available, but we know it's a situation which is quite easy to get
oneself into (a typo in an environment variable can be enough). So what
we can do is pinpoint that specific case and leave the unknowns to the
current error reporting, for consistency with older postgres versions.

The attached checks for the specific known error, and leaves all the
other cases to the same logging that we have today. It relies on the
knowledge that system sslrootcert configs have deferred loading, and
will run with verify-full. So if we see an X509 failure in loading the
local issuer cert here, then we know the user wanted to use the system
CA pool for certificate verification but the root CA cannot be loaded
for some reason.

--
Daniel Gustafsson

Attachment: libpq_system_ca_fix_v5.diff
Re: v16dev: invalid memory alloc request size 8488348128
On Sat, 15 Apr 2023 at 08:36, Justin Pryzby wrote:
>
> I hit this elog() while testing reports under v16 and changed to PANIC
> to help diagnose.
>
> DETAILS: PANIC: invalid memory alloc request size 18446744072967930808
> CONTEXT: PL/pgSQL function array_weight(real[],real[]) while storing
> call arguments into local variables
>
> I can't share the query, data, nor plpgsql functions themselves.

Which aggregate function is being called here? Is it a custom
aggregate written in C, by any chance?

David
v16dev: invalid memory alloc request size 8488348128
I hit this elog() while testing reports under v16 and changed to PANIC
to help diagnose.

DETAILS: PANIC: invalid memory alloc request size 18446744072967930808
CONTEXT: PL/pgSQL function array_weight(real[],real[]) while storing
call arguments into local variables

I can't share the query, data, nor plpgsql functions themselves.

I reproduced the problem at this commit, but not at its parent.

commit 42b746d4c982257bf3f924176632b04dc288174b (HEAD)
Author: Tom Lane
Date:   Thu Oct 6 13:27:34 2022 -0400

    Remove uses of MemoryContextContains in nodeAgg.c and nodeWindowAgg.c.

#2  0x01067af5 in errfinish (filename=filename@entry=0x168f1e0 "../src/backend/utils/mmgr/mcxt.c",
    lineno=lineno@entry=1013, funcname=funcname@entry=0x16901a0 <__func__.17850> "MemoryContextAlloc")
    at ../src/backend/utils/error/elog.c:604
#3  0x010c57c7 in MemoryContextAlloc (context=context@entry=0x604200032600, size=size@entry=8488348128)
    at ../src/backend/utils/mmgr/mcxt.c:1013
#4  0x00db49a4 in copy_byval_expanded_array (eah=eah@entry=0x604200032718, oldeah=0x604200032718)
    at ../src/backend/utils/adt/array_expanded.c:195
#5  0x00db5f7a in expand_array (arraydatum=105836584314672, parentcontext=,
    metacache=0x7ffcbd2d29c0, metacache@entry=0x0) at ../src/backend/utils/adt/array_expanded.c:104
#6  0x7f6c05a6b4d0 in plpgsql_exec_function (func=func@entry=0x6092004a4c58,
    fcinfo=fcinfo@entry=0x7f6c04f7efc8, simple_eval_estate=simple_eval_estate@entry=0x0,
    simple_eval_resowner=simple_eval_resowner@entry=0x0,
    procedure_resowner=procedure_resowner@entry=0x0, atomic=atomic@entry=true)
    at ../src/pl/plpgsql/src/pl_exec.c:556
#7  0x7f6c05a76af4 in plpgsql_call_handler (fcinfo=) at ../src/pl/plpgsql/src/pl_handler.c:277
#8  0x008b30cd in ExecInterpExpr (state=0x7f6c04fd6750, econtext=0x6072000712d0, isnull=0x7ffcbd2d2fa0)
    at ../src/backend/executor/execExprInterp.c:733
#9  0x008a6c5f in ExecInterpExprStillValid (state=0x7f6c04fd6750, econtext=0x6072000712d0,
    isNull=0x7ffcbd2d2fa0) at ../src/backend/executor/execExprInterp.c:1858
#10 0x0090032b in ExecEvalExprSwitchContext (isNull=0x7ffcbd2d2fa0, econtext=0x6072000712d0,
    state=0x7f6c04fd6750) at ../src/include/executor/executor.h:354
#11 ExecProject (projInfo=0x7f6c04fd6748) at ../src/include/executor/executor.h:388
#12 project_aggregates (aggstate=aggstate@entry=0x607200070d38) at ../src/backend/executor/nodeAgg.c:1377
#13 0x00903eb6 in agg_retrieve_direct (aggstate=aggstate@entry=0x607200070d38)
    at ../src/backend/executor/nodeAgg.c:2520
#14 0x00904074 in ExecAgg (pstate=0x607200070d38) at ../src/backend/executor/nodeAgg.c:2172
#15 0x008d90e0 in ExecProcNodeFirst (node=0x607200070d38) at ../src/backend/executor/execProcnode.c:464
#16 0x008c1e5f in ExecProcNode (node=0x607200070d38) at ../src/include/executor/executor.h:272
#17 ExecutePlan (estate=estate@entry=0x607200070a18, planstate=0x607200070d38,
    use_parallel_mode=false, operation=operation@entry=CMD_SELECT, sendTuples=true,
    numberTuples=numberTuples@entry=0, direction=direction@entry=ForwardScanDirection,
    dest=dest@entry=0x7f6c051abd28, execute_once=execute_once@entry=true)
    at ../src/backend/executor/execMain.c:1640
#18 0x008c3ffb in standard_ExecutorRun (queryDesc=0x604200016998, direction=ForwardScanDirection,
    count=0, execute_once=) at ../src/backend/executor/execMain.c:365
#19 0x008c4125 in ExecutorRun (queryDesc=queryDesc@entry=0x604200016998,
    direction=direction@entry=ForwardScanDirection, count=count@entry=0, execute_once=)
    at ../src/backend/executor/execMain.c:309
#20 0x00d5d148 in PortalRunSelect (portal=portal@entry=0x607200028a18, forward=forward@entry=true,
    count=0, count@entry=9223372036854775807, dest=dest@entry=0x7f6c051abd28)
    at ../src/backend/tcop/pquery.c:924
#21 0x00d60dc8 in PortalRun (portal=portal@entry=0x607200028a18, count=count@entry=9223372036854775807,
    isTopLevel=isTopLevel@entry=true, run_once=run_once@entry=true, dest=dest@entry=0x7f6c051abd28,
    altdest=altdest@entry=0x7f6c051abd28, qc=, qc@entry=0x7ffcbd2d3580)
    at ../src/backend/tcop/pquery.c:768
#22 0x00d595fd in exec_simple_query (query_string=query_string@entry=0x6082000cf238 "...
#23 0x00d5c72c in PostgresMain (dbname=dbname@entry=0x6082b378 "postgres",
    username=username@entry=0x6082b358 "telsasoft") at ../src/backend/tcop/postgres.c:4632
#24 0x00bddc19 in BackendRun (port=port@entry=0x6030fc40) at ../src/backend/postmaster/postmaster.c:4461
#25 0x00be2583 in BackendStartup (port=port@entry=0x6030fc40)
    at ../src/backend/postmaster/postmaster.c:4189
#26 0x00be2a05 in ServerLoop () at ../src/backend/postmaster/postmaster.c:1779
#27 0x00be436b in PostmasterMain (argc=argc@entry=9, argv=argv@entry=0x600edf40)
    at ../src/backend/postmaster/postmaster.c:1463
#28
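An aside on the number above: the request size in the PANIC message is exactly what a negative 64-bit length word looks like when reinterpreted as unsigned, which fits the suspicion that the datum being copied is corrupt. A quick check (plain arithmetic, nothing PostgreSQL-specific):

```python
import struct

# Alloc request size reported in the PANIC message.
req = 18446744072967930808

# Reinterpret the unsigned 64-bit value as a signed 64-bit integer.
(signed,) = struct.unpack("<q", struct.pack("<Q", req))
print(signed)  # -741620808: a negative "length", i.e. garbage
```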
Re: allow_in_place_tablespaces vs. pg_basebackup
On Tue, Mar 28, 2023 at 9:40 PM Michael Paquier wrote:
> Looking at the patch, nothing really stands out..

It doesn't seem like anyone's unhappy about this patch. I don't think
it's necessary to back-patch it, given that in-place tablespaces are
intended for developer use, not real-world use, and also given that the
patch requires changing both a bit of server-side behavior and some
client-side behavior, and it seems unfriendly to create behavior skew
of that sort in a minor release.

However, I would like to get it committed to master. Do people think
it's OK to do that now, or should I wait until we've branched? I
personally think this is bug-fix-ish enough that now is OK, but I'll
certainly forbear if others disagree.

--
Robert Haas
EDB: http://www.enterprisedb.com
Re: Direct I/O
Andres Freund writes:
> On 2023-04-14 15:21:18 -0400, Tom Lane wrote:
>> +1 for that, though. (Also, the fact that these animals aren't
>> actually failing suggests that 004_io_direct.pl needs expansion.)

> It's skipped, due to lack of O_DIRECT:
> [20:50:22] t/004_io_direct.pl .. skipped: no O_DIRECT

Hmm, I'd say that might be just luck. Whether the compiler honors weird
alignment of locals seems independent of whether the OS has O_DIRECT.

> So perhaps we don't even need a configure test, just a bit of
> ifdef'ery? It's a bit annoying structurally, because the PG*Aligned
> structs are defined in c.h, but the different ways of spelling
> O_DIRECT are dealt with in fd.h.

> I wonder if we should try to move those structs to fd.h as well...

I doubt they belong in c.h, so that could be plausible; except
I'm not convinced that testing O_DIRECT is sufficient.

regards, tom lane
Re: Temporary tables versus wraparound... again
On Fri, Apr 14, 2023 at 10:47 AM Andres Freund wrote:
> I don't think it's outright wrong, but it is very confusing what it
> relates to. For some reason I tried to "attach" the parenthetical to
> the "otherwise", which doesn't make a whole lot of sense. How about:

I suppose that it doesn't matter whether it's outright wrong, or just
unclear. Either way it should be improved.

>  * If rel is not NULL the horizon may be considerably more recent (i.e.
>  * allowing more tuples to be removed) than otherwise. In the NULL case a
>  * horizon that is correct (but not optimal) for all relations will be
>  * returned. Thus, if possible, a relation should be provided.

That seems much better to me. The most important part is the last
sentence. The key idea is that you as a caller should provide a rel if
at all possible (and if not, you should feel a pang of guilt). That
emphasis makes the potential consequences much more obvious.

--
Peter Geoghegan
Re: Direct I/O
Hi,

On 2023-04-14 15:21:18 -0400, Tom Lane wrote:
> Andres Freund writes:
>> On 2023-04-14 13:21:33 -0400, Tom Lane wrote:
>>> ... I'm not sure why only those two animals
>>> are unhappy, but I think they have a point: typical ABIs don't
>>> guarantee alignment of function stack frames to better than
>>> 16 bytes or so. In principle the compiler could support a 4K
>>> alignment request anyway by doing the equivalent of alloca(3),
>>> but I do not think we can count on that to happen.
>
>> Hm. New-ish compilers seem to be ok with it.
>
> Oh! I was misled by the buildfarm label on morepork, which claims
> it's running clang 10.0.1. But actually, per its configure report,
> it's running
>
> configure: using compiler=gcc (GCC) 4.2.1 20070719

Huh. I wonder if that was an accident in the BF setup.

>> Perhaps we should have a configure check whether the compiler is OK
>> with that, and disable direct IO support if not?
>
> +1 for that, though. (Also, the fact that these animals aren't
> actually failing suggests that 004_io_direct.pl needs expansion.)

It's skipped, due to lack of O_DIRECT:

[20:50:22] t/004_io_direct.pl .. skipped: no O_DIRECT

So perhaps we don't even need a configure test, just a bit of
ifdef'ery? It's a bit annoying structurally, because the PG*Aligned
structs are defined in c.h, but the different ways of spelling O_DIRECT
are dealt with in fd.h.

I wonder if we should try to move those structs to fd.h as well...

Greetings,

Andres Freund
Re: Direct I/O
Andres Freund writes: > On 2023-04-14 13:21:33 -0400, Tom Lane wrote: >> ... I'm not sure why only those two animals >> are unhappy, but I think they have a point: typical ABIs don't >> guarantee alignment of function stack frames to better than >> 16 bytes or so. In principle the compiler could support a 4K >> alignment request anyway by doing the equivalent of alloca(3), >> but I do not think we can count on that to happen. > Hm. New-ish compilers seem to be ok with it. Oh! I was misled by the buildfarm label on morepork, which claims it's running clang 10.0.1. But actually, per its configure report, it's running configure: using compiler=gcc (GCC) 4.2.1 20070719 which is the same as curculio. So that explains why nothing else is complaining. I agree we needn't let 15-year-old compilers force us into the mess that would be entailed by not treating these variables as simple locals. > Perhaps we should have a > configure check whether the compiler is OK with that, and disable direct IO > support if not? +1 for that, though. (Also, the fact that these animals aren't actually failing suggests that 004_io_direct.pl needs expansion.) regards, tom lane
Re: Should we remove vacuum_defer_cleanup_age?
On 4/14/23 1:15 PM, Laurenz Albe wrote: Let's remove vacuum_defer_cleanup_age, and put a note in the release notes that recommends using statement_timeout and hot_standby_feedback = on on the standby instead. That should have pretty much the same effect, and it is measured in time and not in the number of transactions. +1. Jonathan

Re: Direct I/O
Hi, On 2023-04-14 13:21:33 -0400, Tom Lane wrote: > Since the direct I/O commit went in, buildfarm animals > curculio and morepork have been issuing warnings like > > hashpage.c: In function '_hash_expandtable': > hashpage.c:995: warning: ignoring alignment for stack allocated 'zerobuf' > > in places where there's a local variable of type PGIOAlignedBlock > or PGAlignedXLogBlock. I'm not sure why only those two animals > are unhappy, but I think they have a point: typical ABIs don't > guarantee alignment of function stack frames to better than > 16 bytes or so. In principle the compiler could support a 4K > alignment request anyway by doing the equivalent of alloca(3), > but I do not think we can count on that to happen. Hm. New-ish compilers seem to be ok with it. Perhaps we should have a configure check whether the compiler is OK with that, and disable direct IO support if not? Greetings, Andres Freund
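The configure probe Andres suggests could be as small as compiling and running something like the following — a sketch assuming GCC-style attributes; `DemoAlignedBlock` is an invented stand-in for the real PGIOAlignedBlock/PGAlignedXLogBlock unions in c.h:

```c
#include <assert.h>
#include <stdint.h>

typedef union
{
	char		data[8192];
	char		force_align[1] __attribute__((aligned(4096)));
} DemoAlignedBlock;

/*
 * Returns 1 if a stack-allocated DemoAlignedBlock actually lands on a
 * 4096-byte boundary, 0 if the compiler ignored the alignment request
 * (as the old gcc on curculio/morepork warns it will).
 */
static int
stack_block_is_aligned(void)
{
	DemoAlignedBlock buf;		/* stack-allocated, like zerobuf in hashpage.c */

	buf.data[0] = 0;			/* touch it so it is really materialized */
	return ((uintptr_t) &buf % 4096) == 0;
}
```

A compile-only check would catch the warning on old compilers; actually running the probe would also catch a compiler that accepts the attribute but silently under-aligns.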
Re: Assertion being hit during WAL replay
Andres Freund writes: > I pushed the fix + test now. Cool, thanks. regards, tom lane
Re: Assertion being hit during WAL replay
Hi, On 2023-04-11 15:03:02 -0700, Andres Freund wrote: > On 2023-04-11 16:54:53 -0400, Tom Lane wrote: > > Here's something related to what I hit that time: > > > > diff --git a/src/backend/optimizer/plan/subselect.c > > b/src/backend/optimizer/plan/subselect.c > > index 052263aea6..d43a7c7bcb 100644 > > --- a/src/backend/optimizer/plan/subselect.c > > +++ b/src/backend/optimizer/plan/subselect.c > > @@ -2188,6 +2188,7 @@ SS_charge_for_initplans(PlannerInfo *root, RelOptInfo > > *final_rel) > > void > > SS_attach_initplans(PlannerInfo *root, Plan *plan) > > { > > + Assert(root->init_plans == NIL); > > plan->initPlan = root->init_plans; > > } > > > > You won't get through initdb with this, but if you install this change > > into a successfully init'd database and then "make installcheck-parallel", > > it will crash and then fail to recover, at least a lot of the time. > > Ah, that allowed me to reproduce. Thanks. > > > Took me a bit to understand how we actually get into this situation. A PRUNE > record for relation+block that doesn't exist during recovery. That doesn't > commonly happen outside of PITR or such, because we obviously need a block > with content to generate the PRUNE. The way it does happen here, is that the > relation is vacuumed and then truncated. Then we crash. Thus we end up with a > PRUNE record for a block that doesn't exist on disk. > > Which is also why the test is quite timing sensitive. > > Seems like it'd be good to have a test that covers this scenario. There's > plenty code around it that doesn't currently get exercised. > > None of the existing tests seem like a great fit. I guess it could be added to > 013_crash_restart, but that really focuses on something else. > > So I guess I'll write a 036_notsureyet.pl... See also the separate report by Alexander Lakhin at https://postgr.es/m/0b5eb82b-cb99-e0a4-b932-3dc60e2e3...@gmail.com I pushed the fix + test now. Greetings, Andres Freund
Re: refactoring relation extension and BufferAlloc(), faster COPY
Hi, On 2023-04-12 08:00:00 +0300, Alexander Lakhin wrote: > 12.04.2023 02:21, Andres Freund wrote: > > Hi, > > > > On 2023-04-11 22:00:00 +0300, Alexander Lakhin wrote: > > > A few days later I've found a new defect introduced with 31966b151. > > That's the same issue that Tom also just reported, at > > https://postgr.es/m/392271.1681238924%40sss.pgh.pa.us > > > > Attached is my WIP fix, including a test. > > Thanks for the fix. I can confirm that the issue is gone. > ReadBuffer_common() contains an Assert(), that is similar to the fixed one, > but it looks unreachable for the WAL replay case after 26158b852. Good catch. I implemented it there too. As now all of the modes are supported, I removed the assertion. I also extended the test slightly to also test the case of dropped relations. Greetings, Andres Freund
Re: Should we remove vacuum_defer_cleanup_age?
On Fri, 14 Apr 2023 at 13:15, Laurenz Albe wrote: > > On Fri, 2023-04-14 at 18:43 +0200, Alvaro Herrera wrote: > > On 2023-Apr-14, Greg Stark wrote: > > > I assume people would use hot_standby_feedback if they have streaming > > > replication. > > > > Yes, either that or a replication slot. > > A replication slot doesn't do anything against snapshot conflicts, > which is what we are discussing here. Or are we not? They're related -- the replication slot holds the feedback xmin so that if your standby disconnects it can reconnect later and not have lost data in the meantime. At least I think that's what I think it does -- I don't know if I'm just assuming that, but xmin is indeed in pg_replication_slots. -- greg
Re: daitch_mokotoff module
Buildfarm member hamerkop has a niggle about this patch: c:\\build-farm-local\\buildroot\\head\\pgsql.build\\contrib\\fuzzystrmatch\\daitch_mokotoff.c : warning C4819: The file contains a character that cannot be represented in the current code page (932). Save the file in Unicode format to prevent data loss It's complaining about the comment in static const char iso8859_1_to_ascii_upper[] = /* "`abcdefghijklmnopqrstuvwxyz{|}~ ¡¢£¤¥¦§¨©ª«¬ ®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ" */ "`ABCDEFGHIJKLMNOPQRSTUVWXYZ{|}~ ! ?AAECDNO*OYDSAAECDNO/OYDY"; There are some other comments with non-ASCII characters elsewhere in the file, but I think it's mainly just the weird symbols here that might fail to translate to encodings that are not based on ISO 8859-1. I think we need to get rid of this warning: it's far from obvious that it's a non-issue, and because the compiler is not at all specific about where the issue is, people could waste a lot of time figuring that out. In fact, it might *not* be a non-issue, if it prevents the source tree as a whole from being processed by some tool or other. So I propose to replace those symbols with "... random symbols ..." or the like and see if the warning goes away. If not, we might have to resort to something more drastic like removing this comment altogether. We do have non-ASCII text in comments and test cases elsewhere in the tree, and have not had a lot of trouble with that, so I'm hoping the letters can stay because they are useful to compare to the constant. regards, tom lane
Re: Temporary tables versus wraparound... again
Hi, On 2023-04-14 10:05:08 -0400, Greg Stark wrote: > On Thu, 13 Apr 2023 at 13:01, Peter Geoghegan wrote: > > > > On Thu, Apr 13, 2023 at 9:45 AM Robert Haas wrote: > > > > > > On Wed, Apr 12, 2023 at 4:23 PM Greg Stark wrote: > > > > Am I crazy or is the parenthetical comment there exactly backwards? If > > > > the horizon is *more recent* then fewer tuples are *non*-removable. > > > > I.e. *more* tuples are removable, no? > > > > > > Isn't it the non-parenthetical part that's wrong? I would expect that > > > if we don't know which relation it is, the horizon might be > > > considerably LESS recent, which would result in fewer tuples being > > > removable. If rel is *not NULL*, the horizon is more recent - that seems correct? > > You can make arguments for either way of restating it being clearer > > than the other. > > Yeah, I think Robert is being confused by the implicit double > negative. If we don't know which relation it is it's because relation > is NULL and the comment is talking about if it's "not NULL". I think > you're right that it would be less confusing if it just says "if you > pass NULL we have to give a conservative result which means an older > xid and fewer removable tuples". > > But I'm saying the parenthetical part is not just confusing, it's > outright wrong. I guess that just means the first half was so > confusing it confused not only the reader but the author too. I don't think it's outright wrong, but it is very confusing what it relates to. For some reason I tried to "attach" the parenthetical to the "otherwise", which doesn't make a whole lot of sense. How about: * If rel is not NULL the horizon may be considerably more recent (i.e. * allowing more tuples to be removed) than otherwise. In the NULL case a * horizon that is correct (but not optimal) for all relations will be * returned. Thus, if possible, a relation should be provided. Greetings, Andres Freund
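To pin down the direction of the comparison being debated: a more recent horizon (numerically larger, deliberately ignoring XID wraparound here) allows more dead tuples to be removed. A toy sketch with invented names, not PostgreSQL's actual horizon machinery:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t DemoXid;

/*
 * Count how many dead tuples are removable under a given horizon: a
 * tuple is removable when its xmax is older than the horizon.  A
 * conservative (older) horizon, as returned for rel == NULL, removes
 * fewer tuples; a more recent one removes at least as many.
 */
static int
count_removable(const DemoXid *dead_xmax, size_t n, DemoXid horizon)
{
	int			removable = 0;

	for (size_t i = 0; i < n; i++)
	{
		if (dead_xmax[i] < horizon)		/* older than the horizon */
			removable++;
	}
	return removable;
}
```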
Re: doc: add missing "id" attributes to extension packaging page
On Thu, 13 Apr 2023 10:53:31 -0500 "Karl O. Pinc" wrote: > On Thu, 13 Apr 2023 16:01:35 +0200 > Brar Piening wrote: > > > On 13.04.2023 at 10:31, Peter Eisentraut wrote: > > > Side project: I noticed that these new hover links don't appear in > > > the single-page HTML output (make postgres.html), even though the > > > generated HTML source code looks correct. Maybe someone has an > > > idea there. > > I feel responsible for the feature to work for all use cases where > > it makes sense. I'll investigate this and post back. > > Looks to me like the ">" in the CSS was transformed into the > HTML entity when the stylesheet was included into the single-file > HTML. The XSLT 1.0 spec says that characters in
Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert
On Fri, Apr 14, 2023 at 7:20 AM Daniel Gustafsson wrote: > And again with the attachment. After some sleep... From inspection I think the final EOF branch could be masked by the new branch, if verification has failed but was already ignored. To test that, I tried hanging up on the client partway through the server handshake, and I got some strange results. With the patch, using sslmode=require and OpenSSL 1.0.1, I see: connection to server at "127.0.0.1", port 50859 failed: SSL error: certificate verify failed: self signed certificate Which is wrong -- we shouldn't care about the self-signed failure if we're not using verify-*. I was going to suggest a patch like the following: > if (r == -1) > - libpq_append_conn_error(conn, "SSL SYSCALL error: %s", > - SOCK_STRERROR(SOCK_ERRNO, sebuf, > sizeof(sebuf))); > + { > + /* > +* If we get an X509 error here without an error in > the > +* socket layer it means that verification failed > without > +* it being a protocol error. A common cause is > trying to > +* a default system CA which is missing or broken. > +*/ > + if (!save_errno && vcode != X509_V_OK) > + libpq_append_conn_error(conn, "SSL error: > certificate verify failed: %s", > + > X509_verify_cert_error_string(vcode)); > + else > + libpq_append_conn_error(conn, "SSL SYSCALL error: > %s", > + SOCK_STRERROR(save_errno, > sebuf, sizeof(sebuf))); > + } > else > libpq_append_conn_error(conn, "SSL SYSCALL error: EOF > detected"); But then I tested my case against PG15, and I didn't get the EOF message I expected: connection to server at "127.0.0.1", port 50283 failed: SSL SYSCALL error: Success So it appears that this (hanging up on the client during the handshake) is _also_ a case where we could get a SYSCALL error with a zero errno, and my patch doesn't actually fix the misleading error message. That makes me worried, but I don't really have a concrete suggestion to make it better, yet. 
I'm not opposed to pushing this as a fix for the tests, but I suspect we'll have to iterate on this more. Thanks, --Jacob
Re: Direct I/O
Since the direct I/O commit went in, buildfarm animals curculio and morepork have been issuing warnings like hashpage.c: In function '_hash_expandtable': hashpage.c:995: warning: ignoring alignment for stack allocated 'zerobuf' in places where there's a local variable of type PGIOAlignedBlock or PGAlignedXLogBlock. I'm not sure why only those two animals are unhappy, but I think they have a point: typical ABIs don't guarantee alignment of function stack frames to better than 16 bytes or so. In principle the compiler could support a 4K alignment request anyway by doing the equivalent of alloca(3), but I do not think we can count on that to happen. regards, tom lane
Re: Should we remove vacuum_defer_cleanup_age?
On Fri, 2023-04-14 at 18:43 +0200, Alvaro Herrera wrote: > On 2023-Apr-14, Greg Stark wrote: > > I assume people would use hot_standby_feedback if they have streaming > > replication. > > Yes, either that or a replication slot. A replication slot doesn't do anything against snapshot conflicts, which is what we are discussing here. Or are we not? > > I find it very hard to believe that people are doing stuff with > vacuum_defer_cleanup_age that cannot be done with either of the other > newer mechanisms, which have also seen much wider usage and so bugs > fixed, etc. vacuum_defer_cleanup_age offers a more fine-grained approach. With hot_standby_feedback you can only say "don't ever remove any dead tuples that the standby still needs". But perhaps you'd prefer "don't remove dead tuples unless they are quite old", so that you can get your shorter queries on the standby to complete, without delaying replay and without the danger that a long running query on the standby bloats your primary. How about this: Let's remove vacuum_defer_cleanup_age, and put a note in the release notes that recommends using statement_timeout and hot_standby_feedback = on on the standby instead. That should have pretty much the same effect, and it is measured in time and not in the number of transactions. Yours, Laurenz Albe
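The suggested replacement, as a standby-side postgresql.conf sketch (the values here are placeholders for illustration, not recommendations):

```
# On the standby, instead of vacuum_defer_cleanup_age on the primary:
hot_standby_feedback = on       # standby reports its xmin to the primary
statement_timeout = '30s'       # bound standby queries in time, not XIDs
```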
Re: Should we remove vacuum_defer_cleanup_age?
On 2023-Apr-14, Greg Stark wrote: > On Fri, 14 Apr 2023 at 09:47, Jonathan S. Katz wrote: > > > > Let me restate [1] in a different way. > > > > Using a large enough dataset, I did qualitatively look at overall usage > > of both "vacuum_defer_cleanup_age" and compared to > > "hot_standby_feedback", given you can use both to accomplish similar > > outcomes. > > I assume people would use hot_standby_feedback if they have streaming > replication. Yes, either that or a replication slot. vacuum_defer_cleanup_age was added in commit efc16ea52067 (2009-12-19), for Postgres 9.0. hot_standby_feedback is a bit newer (bca8b7f16a3e, 2011-02-16), and replication slots are newer still (858ec11858a9, 2014-01-31). > The main use cases for vacuum_defer_cleanup_age would be if you're > replaying WAL files. That may sound archaic but there are plenty of > circumstances where your standby may not have network access to your > primary at all or not want to be replaying continuously. True, those cases exist. However, it sounds to me like in those cases vacuum_defer_cleanup_age doesn't really help you either; you'd just want to pause WAL replay depending on your queries or whatever. After all, you'd have to feed the WAL files "manually" to replay, so you're in control anyway without having to touch the primary. I find it very hard to believe that people are doing stuff with vacuum_defer_cleanup_age that cannot be done with either of the other newer mechanisms, which have also seen much wider usage and so bugs fixed, etc. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ "Oh, great altar of passive entertainment, bestow upon me thy discordant images at such speed as to render linear thought impossible" (Calvin a la TV)
Re: [RFC] building postgres with meson -v8
Hi, On 2023-04-14 11:58:42 -0400, Greg Stark wrote: > On Wed, 11 May 2022 at 06:19, Peter Eisentraut > wrote: > > > > After that, these configure options don't have an equivalent yet: > > > > --enable-profiling > > Afaics this still doesn't exist? Is there a common idiom to enable > this? Like, if I add in something to cflags is that enough? Yes. Or, well, you might also need to specify it when linking. > I seem to recall we had some hack to actually get each backend's gmon.out to > not step on each other's which needed an explicit flag to enable? I think that's enabled by default in gcc these days, if supported by the platform? TBH, I really don't see the point of this style of profiling. It doesn't provide an accurate view of where time is spent. You're much better off using performance counter driven profiling with perf et al. Greetings, Andres Freund
Re: Tab completion for CREATE SCHEMAAUTHORIZATION
Michael Paquier writes: > On Wed, Aug 11, 2021 at 10:16:15AM +0900, Michael Paquier wrote: >> + else if (Matches("CREATE", "SCHEMA", "AUTHORIZATION")) >> + COMPLETE_WITH_QUERY(Query_for_list_of_owner_roles); >> + else if (Matches("CREATE", "SCHEMA", MatchAny, "AUTHORIZATION")) >> + COMPLETE_WITH_QUERY(Query_for_list_of_owner_roles); >> + else if (Matches("CREATE", "SCHEMA", MatchAny, "AUTHORIZATION", >> MatchAny)) >> + COMPLETE_WITH("CREATE", "GRANT"); >> + else if (Matches("CREATE", "SCHEMA", MatchAny)) >> + COMPLETE_WITH("AUTHORIZATION", "CREATE", "GRANT"); >> Looks like you forgot the case "CREATE SCHEMA AUTHORIZATION MatchAny" >> that should be completed by GRANT and CREATE. > > This patch has been waiting on author for more than a couple of weeks, > so I have marked it as returned with feedback in the CF app. Please > feel free to resubmit if you are able to work more on that. Looks like I completely dropped the ball on this one, sorry. Here's a rebased patch which uses the new COMPLETE_WITH_QUERY_PLUS functionality added in commit 02b8048ba5dc36238f3e7c3c58c5946220298d71. 
- ilmari

>From 4c4833dfdd2bb01cd35715223433f961e5ec004c Mon Sep 17 00:00:00 2001
From: tanghy
Date: Mon, 9 Aug 2021 18:47:12 +0100
Subject: [PATCH v2] Add tab completion for CREATE SCHEMA

- AUTHORIZATION both in addition to and after a schema name
- list of owner roles after AUTHORIZATION
- CREATE and GRANT after any otherwise-complete command
---
 src/bin/psql/tab-complete.c | 24 ++++++++++++++++++++----
 1 file changed, 20 insertions(+), 4 deletions(-)

diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 5825b2a195..222dd617a2 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -1031,9 +1031,13 @@ static const SchemaQuery Query_for_trigger_of_table = {
 "  FROM pg_catalog.pg_roles "\
 " WHERE rolname LIKE '%s'"
 
+/* add these to Query_for_list_of_roles in OWNER contexts */
+#define Keywords_for_list_of_owner_roles \
+"CURRENT_ROLE", "CURRENT_USER", "SESSION_USER"
+
 /* add these to Query_for_list_of_roles in GRANT contexts */
 #define Keywords_for_list_of_grant_roles \
-"PUBLIC", "CURRENT_ROLE", "CURRENT_USER", "SESSION_USER"
+Keywords_for_list_of_owner_roles, "PUBLIC"
 
 #define Query_for_all_table_constraints \
 "SELECT conname "\
@@ -3154,6 +3158,20 @@ psql_completion(const char *text, int start, int end)
 	else if (TailMatches("AS", "ON", "SELECT|UPDATE|INSERT|DELETE", "TO"))
 		COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_tables);
 
+/* CREATE SCHEMA [ ] [ AUTHORIZATION ] */
+	else if (Matches("CREATE", "SCHEMA"))
+		COMPLETE_WITH_QUERY_PLUS(Query_for_list_of_schemas,
+								 "AUTHORIZATION");
+	else if (Matches("CREATE", "SCHEMA", "AUTHORIZATION") ||
+			 Matches("CREATE", "SCHEMA", MatchAny, "AUTHORIZATION"))
+		COMPLETE_WITH_QUERY_PLUS(Query_for_list_of_roles,
+								 Keywords_for_list_of_owner_roles);
+	else if (Matches("CREATE", "SCHEMA", "AUTHORIZATION", MatchAny) ||
+			 Matches("CREATE", "SCHEMA", MatchAny, "AUTHORIZATION", MatchAny))
+		COMPLETE_WITH("CREATE", "GRANT");
+	else if (Matches("CREATE", "SCHEMA", MatchAny))
+		COMPLETE_WITH("AUTHORIZATION", "CREATE", "GRANT");
+
 	/* CREATE SEQUENCE --- is allowed inside CREATE SCHEMA, so use TailMatches */
 	else if (TailMatches("CREATE", "SEQUENCE", MatchAny) ||
 			 TailMatches("CREATE", "TEMP|TEMPORARY", "SEQUENCE", MatchAny))
@@ -4263,9 +4281,7 @@ psql_completion(const char *text, int start, int end)
 	/* OWNER TO - complete with available roles */
 	else if (TailMatches("OWNER", "TO"))
 		COMPLETE_WITH_QUERY_PLUS(Query_for_list_of_roles,
-								 "CURRENT_ROLE",
-								 "CURRENT_USER",
-								 "SESSION_USER");
+								 Keywords_for_list_of_owner_roles);
 
 	/* ORDER BY */
 	else if (TailMatches("FROM", MatchAny, "ORDER"))
-- 
2.39.2
Re: [RFC] building postgres with meson -v8
On Wed, 11 May 2022 at 06:19, Peter Eisentraut wrote: > > After that, these configure options don't have an equivalent yet: > > --enable-profiling Afaics this still doesn't exist? Is there a common idiom to enable this? Like, if I add in something to cflags is that enough? I seem to recall we had some hack to actually get each backend's gmon.out to not step on each other's which needed an explicit flag to enable? -- greg
Re: Should we remove vacuum_defer_cleanup_age?
On Fri, 14 Apr 2023 at 09:47, Jonathan S. Katz wrote: > > Let me restate [1] in a different way. > > Using a large enough dataset, I did qualitatively look at overall usage > of both "vacuum_defer_cleanup_age" and compared to > "hot_standby_feedback", given you can use both to accomplish similar > outcomes. I assume people would use hot_standby_feedback if they have streaming replication. The main use cases for vacuum_defer_cleanup_age would be if you're replaying WAL files. That may sound archaic but there are plenty of circumstances where your standby may not have network access to your primary at all or not want to be replaying continuously. I wonder whether your dataset is self-selecting sites that have streaming replication. That does seem like the more common usage pattern. Systems using wal files are more likely to be things like data warehouses, offline analytics systems, etc. They may not even be well known in the same organization that runs the online operations -- in my experience they're often run by marketing or sales organizations or in some cases infosec teams and consume data from lots of sources. The main reason to use wal archive replay is often to provide the isolation so that the operations team don't need to worry about the impact on production and that makes it easy to forget these even exist. -- greg
Re: Temporary tables versus wraparound... again
On Fri, Apr 14, 2023 at 7:05 AM Greg Stark wrote: > But I'm saying the parenthetical part is not just confusing, it's > outright wrong. I guess that just means the first half was so > confusing it confused not only the reader but the author too. I knew that that was what you meant. I agree that it's outright wrong. -- Peter Geoghegan
Where are we on supporting LLVM's opaque-pointer changes?
LLVM 16 is apparently released, because Fedora has started using it in their rawhide (development) branch, which means Postgres is failing to build there [1][2]: ../../../../src/include/jit/llvmjit_emit.h: In function 'l_load_gep1': ../../../../src/include/jit/llvmjit_emit.h:123:30: warning: implicit declaration of function 'LLVMBuildGEP'; did you mean 'LLVMBuildGEP2'? [-Wimplicit-function-declaration] 123 | LLVMValueRef v_ptr = LLVMBuildGEP(b, v, , 1, ""); | ^~~~ | LLVMBuildGEP2 ... etc etc etc ... leading to lots of +ERROR: could not load library "/builddir/build/BUILD/postgresql-15.1/tmp_install/usr/lib64/pgsql/llvmjit.so": /builddir/build/BUILD/postgresql-15.1/tmp_install/usr/lib64/pgsql/llvmjit.so: undefined symbol: LLVMBuildGEP I know we've been letting this topic slide, but we are out of runway. I propose adding this as a must-fix open item for PG 16. regards, tom lane [1] https://bugzilla.redhat.com/show_bug.cgi?id=2186381 [2] https://kojipkgs.fedoraproject.org/work/tasks/8408/99938408/build.log
Re: Issue in postgres_fdw causing unnecessary wait for cancel request reply
On 2023/04/14 18:59, Etsuro Fujita wrote: The primary message basically should avoid reference to implementation details such as specific structure names like PGcancel, shouldn't it, as per the error message style guide? I do not think that PGcancel is that specific, as it is described in the user-facing documentation [1]. (In addition, the error message I proposed was created by copying the existing error message "could not create OpenSSL BIO structure" in contrib/sslinfo.c.) I think that mentioning PGcancel in the error message could be confusing for average users who are just running a query on a foreign table and encounter the error message after pressing Ctrl-C. They may not understand why the PGcancel struct is referenced in the error message while accessing foreign tables. It could be viewed as an internal detail that is not necessary for the user to know. Although the primary message is the same, the supplemental message provides additional context that can help distinguish which function is reporting the message. If the user is familiar with the PQgetCancel/PQcancel internals, this is true, but if not, I do not think this is always true. Consider this error message, for example: 2023-04-14 17:48:55.862 JST [24344] WARNING: could not send cancel request: invalid integer value "999" for connection option "keepalives" It would be hard for users without the knowledge about those internals to distinguish that from this message. For average users, I think it would be good to use a more distinguishable error message. In this case, I believe that they should be able to understand that an invalid integer value "999" was specified in the "keepalives" connection option, which caused the warning message. Then, they would need to check the setting of the "keepalives" option and correct it if necessary. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert
> On 14 Apr 2023, at 16:20, Daniel Gustafsson wrote: > >> On 14 Apr 2023, at 15:51, Tom Lane wrote: >> >> Daniel Gustafsson writes: >>> I mainly put save_errno back into SOCK_ERRNO for greppability, I don't have >>> any >>> strong opinions either way so I went with the latter suggestion. Attached >>> v3 >>> does the above change and passes the tests both with a broken and working >>> system CA pool. Unless objections from those with failing local envs I >>> propose >>> this is pushed to close the open item. >> >> One more question when looking at it with fresh eyes: should the argument >> of X509_verify_cert_error_string be "ecode" or "vcode"? > > Good catch, it should be vcode. And again with the attachment. -- Daniel Gustafsson libpq_system_ca_fix_v4.diff Description: Binary data
Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert
> On 14 Apr 2023, at 15:51, Tom Lane wrote: > > Daniel Gustafsson writes: >> I mainly put save_errno back into SOCK_ERRNO for greppability, I don't have >> any >> strong opinions either way so I went with the latter suggestion. Attached v3 >> does the above change and passes the tests both with a broken and working >> system CA pool. Unless objections from those with failing local envs I >> propose >> this is pushed to close the open item. > > One more question when looking at it with fresh eyes: should the argument > of X509_verify_cert_error_string be "ecode" or "vcode"? Good catch, it should be vcode. -- Daniel Gustafsson
Re: Temporary tables versus wraparound... again
On Thu, 13 Apr 2023 at 13:01, Peter Geoghegan wrote: > > On Thu, Apr 13, 2023 at 9:45 AM Robert Haas wrote: > > > > On Wed, Apr 12, 2023 at 4:23 PM Greg Stark wrote: > > > Am I crazy or is the parenthetical comment there exactly backwards? If > > > the horizon is *more recent* then fewer tuples are *non*-removable. > > > I.e. *more* tuples are removable, no? > > > > Isn't it the non-parenthetical part that's wrong? I would expect that > > if we don't know which relation it is, the horizon might be > > considerably LESS recent, which would result in fewer tuples being > > removable. > > You can make arguments for either way of restating it being clearer > than the other. Yeah, I think Robert is being confused by the implicit double negative. If we don't know which relation it is it's because relation is NULL and the comment is talking about if it's "not NULL". I think you're right that it would be less confusing if it just says "if you pass NULL we have to give a conservative result which means an older xid and fewer removable tuples". But I'm saying the parenthetical part is not just confusing, it's outright wrong. I guess that just means the first half was so confusing it confused not only the reader but the author too. -- greg
Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert
Daniel Gustafsson writes: > I mainly put save_errno back into SOCK_ERRNO for greppability, I don't have > any > strong opinions either way so I went with the latter suggestion. Attached v3 > does the above change and passes the tests both with a broken and working > system CA pool. Unless objections from those with failing local envs I > propose > this is pushed to close the open item. One more question when looking at it with fresh eyes: should the argument of X509_verify_cert_error_string be "ecode" or "vcode"? regards, tom lane
Re: Should we remove vacuum_defer_cleanup_age?
On 4/14/23 8:30 AM, Robert Haas wrote:
> On Thu, Apr 13, 2023 at 11:06 PM Laurenz Albe wrote:
>> I am not against this in principle, but I know that there are people using
>> this parameter; see the discussion linked in
>>
>> https://postgr.es/m/e1jkzxe-0006dw...@gemulon.postgresql.org
>>
>> I can't say if they have a good use case for that parameter or not.
>
> Yeah, I feel similarly. Actually, personally I have no evidence, not even
> an anecdote, suggesting that this parameter is in use, but I'm a bit
> skeptical of any consensus of the form "no one is using X," because there
> sure are a lot of people running PostgreSQL and they sure do a lot of
> things. Some more justifiably than others, but often people have
> surprisingly good excuses for doing stuff that sounds bizarre when you
> first hear about it, and it doesn't seem totally impossible that somebody
> could have found a way to get value out of this.

Let me restate [1] in a different way. Using a large enough dataset, I did qualitatively look at overall usage of "vacuum_defer_cleanup_age" compared to "hot_standby_feedback", given you can use both to accomplish similar outcomes. The usage of "vacuum_defer_cleanup_age" was really minimal, in fact approaching "0", whereas "hot_standby_feedback" had significant adoption.

I'm not saying that "we should remove a setting just because it's not used" or that it may not have utility -- I'm saying that we can remove the setting given:

1. We know that using this setting incorrectly (which can be done fairly easily) can cause significant issues
2. There's another setting that can accomplish similar goals that's much safer
3. The setting itself is not widely used

It's the combination of all 3 that led to my conclusion. If it were just (1), I'd lean more strongly towards trying to fix it first.

> However, I suspect that there aren't many such people, and I think the
> setting is a kludge, so I support removing it. Maybe we'll find out that
> we ought to add something else instead, like a limit delimited in time
> rather than in XIDs. Or maybe the existing facilities are good enough.
> But as Peter rightly says, XID age is likely a poor proxy for whatever
> people really care about, so I don't think continuing to have a setting
> that works like that is a good plan.

That seems like a good eventual outcome.

Thanks,

Jonathan

[1] https://www.postgresql.org/message-id/bf42784f-4d57-0a3d-1a06-ffac1a09318c%40postgresql.org
Re: Synchronizing slots from primary to standby
Hi,

On 11/15/22 10:02 AM, Drouvot, Bertrand wrote:

Hi,

On 2/11/22 3:26 PM, Peter Eisentraut wrote:

On 10.02.22 22:47, Bruce Momjian wrote:

On Tue, Feb 8, 2022 at 08:27:32PM +0530, Ashutosh Sharma wrote:

Which means that if e.g. the standby_slot_names GUC differs from synchronize_slot_names on the physical replica, the slots synchronized on the physical replica are not going to be valid. Or if the primary drops its logical slots.

Should the redo function for the drop replication slot have the capability to drop it on standby and its subscribers (if any) as well?

Slots are not WAL logged (and shouldn't be). I think you pretty much need the recovery conflict handling infrastructure I referenced upthread, which recognized during replay if a record has a conflict with a slot on a standby. And then on top of that you can build something like this patch.

OK. Understood, thanks Andres.

I would love to see this feature in PG 15. Can someone explain its current status? Thanks.

The way I understand it:

1. This feature (probably) depends on the "Minimal logical decoding on standbys" patch. The details there aren't totally clear (to me). That patch had some activity lately but I don't see it in a state that it's nearing readiness.

FWIW, a proposal has been submitted in [1] to add information in the WAL records in preparation for logical slot conflict handling.

[1]: https://www.postgresql.org/message-id/178cf7da-9bd7-e328-9c49-e28ac4701...@gmail.com

Now that the "Minimal logical decoding on standby" patch series (mentioned up-thread) has been committed, I think we can resume working on this one ("Synchronizing slots from primary to standby"). I'll work on a rebase and share it once done (unless someone already started working on a rebase).

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Re: postgres_fdw: wrong results with self join + enable_nestloop off
Hi Nishant, On Fri, Apr 14, 2023 at 8:39 PM Nishant Sharma wrote: > I debugged this issue and was able to find a fix for the same. Kindly please > refer to the attached fix. With the fix I am able to resolve the issue. Thanks for the report and patch! > What is the technical issue? > The problem here is the use of extract_actual_clauses. Because of which the > plan creation misses adding the second condition of AND i.e "now() < > '23-Feb-2020'::timestamp" in the plan. Because it is not considered a pseudo > constant and extract_actual_clause is passed with false as the second > parameter and it gets skipped from the list. As a result that condition is > never taken into consideration as either one-time filter (before or after) or > part of SQL remote execution > > Why do I think the fix is correct? > The fix is simple, where we have created a new function similar to > extract_actual_clause which just extracts all the conditions from the list > with no checks and returns the list to the caller. As a result all conditions > would be taken into consideration in the query plan. I think that the root cause for this issue would be in the create_scan_plan handling of pseudoconstant quals when creating a foreign-join (or custom-join) plan. Anyway, I will look at your patch closely, first. Best regards, Etsuro Fujita
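[Editorial illustration] The clause-dropping behaviour described in this thread can be sketched in miniature. The Python below is a simplified stand-in for the planner's RestrictInfo list and extract_actual_clauses(), not the real PostgreSQL code; it only models the contract that passing pseudoconstant=false skips pseudoconstant clauses, which is how the "now() < '23-Feb-2020'::timestamp" qual silently vanishes from the foreign-join plan:

```python
# Miniature model of extract_actual_clauses() behaviour; the class and
# clause strings here are illustrative stand-ins, not PostgreSQL code.
from dataclasses import dataclass

@dataclass
class RestrictInfo:
    clause: str
    pseudoconstant: bool  # true for Var-free quals like now() < 'const'

def extract_actual_clauses(restrictinfo_list, pseudoconstant):
    """Return the bare clauses, skipping pseudoconstant ones unless
    the caller passes pseudoconstant=True (mirrors the real contract)."""
    return [ri.clause for ri in restrictinfo_list
            if pseudoconstant or not ri.pseudoconstant]

joinclauses = [
    RestrictInfo("r1.id1 < 5", pseudoconstant=False),
    RestrictInfo("now() < '2020-02-23'::timestamp", pseudoconstant=True),
]

# What the reported foreign-join path effectively does: the time qual
# is dropped and never becomes a one-time filter or remote condition.
print(extract_actual_clauses(joinclauses, pseudoconstant=False))
# Keeping pseudoconstant clauses retains both conditions for the plan.
print(extract_actual_clauses(joinclauses, pseudoconstant=True))
```

Under this toy model, the proposed fix corresponds to the second call: keep every clause so the planner can still attach the pseudoconstant one as a gating filter.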
Re: OOM in hash join
On Fri, Apr 14, 2023 at 11:43 PM Jehan-Guillaume de Rorthais wrote: > Would you be able to test the latest patchset posted [1] ? This does not fix > the work_mem overflow, but it helps to keep the number of batches > balanced and acceptable. Any feedback, comment or review would be useful. > > [1] > https://www.postgresql.org/message-id/flat/20230408020119.32a0841b%40karst#616c1f41fcc10e8f89d41e8e5693618c Hi Jehan-Guillaume. I hadn't paid attention to that thread before probably due to timing and the subject and erm ETOOMANYTHREADS. Thanks for all the work you've done to study this area and also review and summarise the previous writing/patches/ideas.
Re: Request for comment on setting binary format output per session
On Mon, Apr 3, 2023 at 11:29 AM Dave Cramer wrote:
>>> ... participating clients to receive GUC configured format. I do not
>>> think that libpq's result format being able to be overridden by GUC
>>> is a good idea at all; the library has to participate, and I think it
>>> can be made to do so without adjusting the interface (say, by
>>> resultFormat = 3).
>>
>> Interesting idea. I suppose you'd need to specify 3 for all result
>> columns? That is a protocol change, but wouldn't "break" older clients.
>> The newer clients would need to make sure that they're connecting to
>> v16+, so using the protocol version alone wouldn't be enough. Hmm.
>
> So this only works with extended protocol and not simple queries.
> Again, as Peter mentioned it's already easy enough to confuse psql using
> binary cursors, so it makes sense to fix psql either way.
>
> If you use resultFormat (3) I think you'd still end up doing the Describe
> (which we are trying to avoid) to make sure you could receive all the
> columns in binary.

Can you elaborate on why Describe would have to be passed? Agreed that would be a dealbreaker if so.

If you pass a 3 sending it in, then you'd be checking PQfformat on the data coming back as 0/1, or at least that'd be smart since you're indicating the server is able to address the format. This addresses the concern that libpq clients currently passing resultFormat zero could not have that decision overridden by the server, which I also think would be a dealbreaker. There might be other reasons why a Describe message may be forced, however.

merlin
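[Editorial illustration] A hypothetical sketch of the "resultFormat = 3" idea being discussed; format code 3 does not exist in the wire protocol today, and the GUC OID list here is invented. The point is only that the client need not Describe up front: it requests 3, the server picks per-column formats from its configured list, and the client branches on the format actually used (PQfformat in libpq terms):

```python
# Hypothetical model of the proposed negotiation; nothing here is an
# existing protocol feature. Format codes: 0 = text, 1 = binary,
# 3 = "server decides from the GUC" (proposed).
GUC_BINARY_OIDS = {20, 23}  # assumed server-side list: int8, int4

def server_choose_formats(requested_format, column_type_oids):
    """Server side: honour 0/1 as today; for 3, consult the GUC list."""
    if requested_format in (0, 1):
        return [requested_format] * len(column_type_oids)
    if requested_format == 3:
        return [1 if oid in GUC_BINARY_OIDS else 0 for oid in column_type_oids]
    raise ValueError("unknown result format code")

def client_decode(formats):
    """Client side: no prior Describe; branch per column on the format
    the server reported for the result."""
    return ["binary" if f == 1 else "text" for f in formats]

cols = [23, 25, 20]          # int4, text, int8
fmts = server_choose_formats(3, cols)
print(client_decode(fmts))   # text stays text, GUC-listed types go binary
```

A client explicitly sending 0 or 1 keeps today's behaviour in this model, which is the "server must not override resultFormat zero" property mentioned above.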
Re: Wrong results from Parallel Hash Full Join
On Thu, Apr 13, 2023 at 7:06 PM Thomas Munro wrote: > For some reason I thought we weren't supposed to use the flat thing, > but it looks like I'm just wrong and people do that all the time so I > take that back. > > Pushed. Thanks Richard and Melanie. I tend to use http://postgr.es/m/ or https://postgr.es/m/ just to keep the URL a bit shorter, and also because I like to point anyone reading the commit log to the particular message that I think is most relevant rather than to the thread as a whole. But I don't think there's any hard-and-fast rule that committers have to do it one way rather than another. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Should we remove vacuum_defer_cleanup_age?
> On 14 Apr 2023, at 14:30, Robert Haas wrote: > ..as Peter rightly says, XID age is likely a poor proxy for > whatever people really care about, so I don't think continuing to have > a setting that works like that is a good plan. Agreed, and removing it is likely to be a good vehicle for figuring out what we should have instead (if anything). -- Daniel Gustafsson
Re: Should we remove vacuum_defer_cleanup_age?
On Thu, Apr 13, 2023 at 11:06 PM Laurenz Albe wrote:
> I am not against this in principle, but I know that there are people using
> this parameter; see the discussion linked in
>
> https://postgr.es/m/e1jkzxe-0006dw...@gemulon.postgresql.org
>
> I can't say if they have a good use case for that parameter or not.

Yeah, I feel similarly. Actually, personally I have no evidence, not even an anecdote, suggesting that this parameter is in use, but I'm a bit skeptical of any consensus of the form "no one is using X," because there sure are a lot of people running PostgreSQL and they sure do a lot of things. Some more justifiably than others, but often people have surprisingly good excuses for doing stuff that sounds bizarre when you first hear about it, and it doesn't seem totally impossible that somebody could have found a way to get value out of this.

However, I suspect that there aren't many such people, and I think the setting is a kludge, so I support removing it. Maybe we'll find out that we ought to add something else instead, like a limit delimited in time rather than in XIDs. Or maybe the existing facilities are good enough. But as Peter rightly says, XID age is likely a poor proxy for whatever people really care about, so I don't think continuing to have a setting that works like that is a good plan.

--
Robert Haas
EDB: http://www.enterprisedb.com
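[Editorial illustration] The "XID age is a poor proxy for time" point can be made concrete with a back-of-envelope calculation; the XID consumption rates below are invented round numbers, not measurements:

```python
# How much wall-clock time does a fixed vacuum_defer_cleanup_age buy?
# Entirely dependent on how fast the system burns XIDs.
defer_age_xids = 100_000  # example setting (illustrative)

for xids_per_second in (10_000, 1_000, 10):  # busy vs. quiet systems
    seconds = defer_age_xids / xids_per_second
    print(f"{xids_per_second:>6} xid/s -> cleanup deferred ~{seconds:>8.0f} s")
# The same setting defers cleanup for 10 seconds on a busy system but
# for hours on a quiet one, which is why a time-based knob would track
# what users actually care about far better than an XID count.
```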
Re: OOM in hash join
On Fri, 14 Apr 2023 13:21:05 +0200 Matthias van de Meent wrote: > On Fri, 14 Apr 2023 at 12:59, Konstantin Knizhnik wrote: > > > > Hi hackers, > > > > Too small value of work_mem cause memory overflow in parallel hash join > > because of too much number batches. > > There is the plan: > > [...] > > > There is still some gap between size reported by memory context sump and > > actual size of backend. > > But is seems to be obvious, that trying to fit in work_mem > > sharedtuplestore creates so much batches, that them consume much more > > memory than work_mem. Indeed. The memory consumed by batches is not accounted and the consumption reported in explain analyze is wrong. Would you be able to test the latest patchset posted [1] ? This does not fix the work_mem overflow, but it helps to keep the number of batches balanced and acceptable. Any feedback, comment or review would be useful. [1] https://www.postgresql.org/message-id/flat/20230408020119.32a0841b%40karst#616c1f41fcc10e8f89d41e8e5693618c Regards,
postgres_fdw: wrong results with self join + enable_nestloop off
Hi,

We have observed that running the same self JOIN query on a postgres FDW setup returns different results with enable_nestloop off and on. I am at today's latest commit: 928e05ddfd4031c67e101c5e74dbb5c8ec4f9e23

I created a local FDW setup and ran this experiment on the same. Kindly refer to the P.S section for details.

*Below is the output difference along with query plan:-*

postgres@71609=# set enable_nestloop=off;
SET
postgres@71609=# select * from pg_tbl_foreign tbl1 join pg_tbl_foreign tbl2 on tbl1.id1 < 5 and now() < '23-Feb-2020'::timestamp;
 id1 | id2 | id1 | id2
-----+-----+-----+-----
   1 |  10 |   1 |  10
   2 |  20 |   1 |  10
   3 |  30 |   1 |  10
   1 |  10 |   2 |  20
   2 |  20 |   2 |  20
   3 |  30 |   2 |  20
   1 |  10 |   3 |  30
   2 |  20 |   3 |  30
   3 |  30 |   3 |  30
(9 rows)

postgres@71609=# explain (analyze, verbose) select * from pg_tbl_foreign tbl1 join pg_tbl_foreign tbl2 on tbl1.id1 < 5 and now() < '23-Feb-2020'::timestamp;
                                              QUERY PLAN
------------------------------------------------------------------------------------------------------
 Foreign Scan  (cost=100.00..49310.40 rows=2183680 width=16) (actual time=0.514..0.515 rows=9 loops=1)
   Output: tbl1.id1, tbl1.id2, tbl2.id1, tbl2.id2
   Relations: (public.pg_tbl_foreign tbl1) INNER JOIN (public.pg_tbl_foreign tbl2)
   Remote SQL: SELECT r1.id1, r1.id2, r2.id1, r2.id2 FROM (public.pg_tbl r1 INNER JOIN public.pg_tbl r2 ON (((r1.id1 < 5
 Planning Time: 0.139 ms
 Execution Time: 0.984 ms
(6 rows)

postgres@71609=# set enable_nestloop=on;
SET
postgres@71609=# select * from pg_tbl_foreign tbl1 join pg_tbl_foreign tbl2 on tbl1.id1 < 5 and now() < '23-Feb-2020'::timestamp;
 id1 | id2 | id1 | id2
-----+-----+-----+-----
(0 rows)

postgres@71609=# explain (analyze, verbose) select * from pg_tbl_foreign tbl1 join pg_tbl_foreign tbl2 on tbl1.id1 < 5 and now() < '23-Feb-2020'::timestamp;
                                              QUERY PLAN
---------------------------------------------------------------------------------------------------
 Result  (cost=200.00..27644.00 rows=2183680 width=16) (actual time=0.003..0.004 rows=0 loops=1)
   Output: tbl1.id1, tbl1.id2, tbl2.id1, tbl2.id2
   One-Time Filter: (now() < '2020-02-23 00:00:00'::timestamp without time zone)
   ->  Nested Loop  (cost=200.00..27644.00 rows=2183680 width=16) (never executed)
         Output: tbl1.id1, tbl1.id2, tbl2.id1, tbl2.id2
         ->  Foreign Scan on public.pg_tbl_foreign tbl2  (cost=100.00..186.80 rows=2560 width=8) (never executed)
               Output: tbl2.id1, tbl2.id2
               Remote SQL: SELECT id1, id2 FROM public.pg_tbl
         ->  Materialize  (cost=100.00..163.32 rows=853 width=8) (never executed)
               Output: tbl1.id1, tbl1.id2
               ->  Foreign Scan on public.pg_tbl_foreign tbl1  (cost=100.00..159.06 rows=853 width=8) (never executed)
                     Output: tbl1.id1, tbl1.id2
                     Remote SQL: SELECT id1, id2 FROM public.pg_tbl WHERE ((id1 < 5))
 Planning Time: 0.178 ms
 Execution Time: 0.292 ms
(15 rows)

I debugged this issue and was able to find a fix for it. Kindly refer to the attached fix; with it I am able to resolve the issue. I am not fully confident whether this change affects some other existing functionality, but it has resolved this difference in output.

*What is the technical issue?*
The problem here is the use of extract_actual_clauses, because of which plan creation misses adding the second condition of the AND, i.e. "now() < '23-Feb-2020'::timestamp", to the plan. Because that clause is marked pseudoconstant and extract_actual_clauses is passed false as the second parameter, it gets skipped from the list. As a result the condition is never taken into consideration, either as a one-time filter (before or after) or as part of remote SQL execution.

*Why do I think the fix is correct?*
The fix is simple: we create a new function similar to extract_actual_clauses which just extracts all the conditions from the list, with no checks, and returns the list to the caller. As a result all conditions are taken into consideration in the query plan.
*After my fix patch:-* postgres@78754=#set enable_nestloop=off; SET postgres@78754=#select * from pg_tbl_foreign tbl1 join pg_tbl_foreign tbl2 on tbl1.id1 < 5 and now() < '23-Feb-2020'::timestamp; id1 | id2 | id1 | id2 -+-+-+- (0 rows) ^ postgres@78754=#explain (analyze, verbose) select * from pg_tbl_foreign tbl1 join pg_tbl_foreign tbl2 on tbl1.id1 < 5 and now() < '23-Feb-2020'::timestamp; QUERY PLAN
Re: OOM in hash join
On Fri, Apr 14, 2023 at 10:59 PM Konstantin Knizhnik wrote: > Too small value of work_mem cause memory overflow in parallel hash join > because of too much number batches. Yeah. Not only in parallel hash join, but in any hash join (admittedly parallel hash join has higher per-batch overheads; that is perhaps something we could improve). That's why we tried to invent an alternative strategy where you loop over batches N times, instead of making more batches, at some point: https://www.postgresql.org/message-id/flat/CA+hUKGKWWmf=WELLG=augbcugrasqbtm0tkyibut-b2rvkx...@mail.gmail.com That thread starts out talking about 'extreme skew' etc but the more general problem is that, at some point, even with perfectly evenly distributed keys, adding more batches requires more memory than you can save by doing so. Sure, it's a problem that we don't account for that memory properly, as complained about here: https://www.postgresql.org/message-id/flat/20190504003414.bulcbnge3rhwhcsh@development If you did have perfect prediction of every byte you will need, maybe you could say, oh, well, we just don't have enough memory for a hash join, so let's do a sort/merge instead. But you can't, because (1) some types aren't merge-joinable, and (2) in reality sometimes you've already started the hash join due to imperfect stats so it's too late to change strategies.
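[Editorial illustration] The "at some point adding more batches requires more memory than you can save" argument can be made concrete with a toy memory model. The per-batch buffer size and inner relation size below are illustrative round numbers, not PostgreSQL's real accounting:

```python
# Toy model: doubling nbatch halves the resident hash table
# (inner_bytes / nbatch) but doubles per-batch buffering (nbatch * buf).
# Total memory is minimised near nbatch = sqrt(inner_bytes / buf) and
# grows again past that point, so extra batches then *cost* memory.
import math

inner_bytes = 16 * 1024**3   # 16 GiB inner relation (illustrative)
buf = 8192                   # per-batch buffer, e.g. one block (illustrative)

def total_memory(nbatch):
    return inner_bytes / nbatch + nbatch * buf

best = int(math.sqrt(inner_bytes / buf))
for nbatch in (1024, best, 262144, 1048576):
    print(f"nbatch={nbatch:>8}: ~{total_memory(nbatch) / 1024**2:8.1f} MiB")
```

In this model the 262144-batch case from the report sits far past the optimum: the batch buffers alone are around 2 GiB, dwarfing anything work_mem could save.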
Re: OOM in hash join
On Fri, 14 Apr 2023 at 12:59, Konstantin Knizhnik wrote: > > Hi hackers, > > Too small value of work_mem cause memory overflow in parallel hash join > because of too much number batches. > There is the plan: [...] > There is still some gap between size reported by memory context sump and > actual size of backend. > But is seems to be obvious, that trying to fit in work_mem > sharedtuplestore creates so much batches, that them consume much more > memory than work_mem. The same issue [0] was reported a few weeks ago, with the same diagnosis here [1]. I think it's being worked on over there. Kind regards, Matthias van de Meent [0] https://www.postgresql.org/message-id/flat/20230228190643.1e368315%40karst [1] https://www.postgresql.org/message-id/flat/3013398b-316c-638f-2a73-3783e8e2ef02%40enterprisedb.com#ceb9e14383122ade8b949b7479c6f7e2
OOM in hash join
Hi hackers,

Too small a value of work_mem causes memory overflow in parallel hash join because of too large a number of batches. Here is the plan:

explain SELECT * FROM solixschema.MIG_50GB_APR04_G1_H a
  join solixschema.MIG_50GB_APR04_G2_H b on a.seq_pk = b.seq_pk
  join solixschema.MIG_50GB_APR04_G3_H c on b.seq_pk = c.seq_pk
  join solixschema.MIG_50GB_APR04_G4_H d on c.seq_pk = d.seq_pk
  join solixschema.MIG_50GB_APR04_G5_H e on d.seq_pk = e.seq_pk;

                                              QUERY PLAN
------------------------------------------------------------------------------------------------------
 Gather  (cost=205209076.76..598109290.40 rows=121319744 width=63084)
   Workers Planned: 8
   ->  Parallel Hash Join  (cost=205208076.76..585976316.00 rows=15164968 width=63084)
         Hash Cond: (b.seq_pk = a.seq_pk)
         ->  Parallel Hash Join  (cost=55621683.59..251148173.17 rows=14936978 width=37851)
               Hash Cond: (b.seq_pk = c.seq_pk)
               ->  Parallel Hash Join  (cost=27797595.68..104604780.40 rows=15346430 width=25234)
                     Hash Cond: (b.seq_pk = d.seq_pk)
                     ->  Parallel Seq Scan on mig_50gb_apr04_g2_h b  (cost=0.00..4021793.90 rows=15783190 width=12617)
                     ->  Parallel Hash  (cost=3911716.30..3911716.30 rows=15346430 width=12617)
                           ->  Parallel Seq Scan on mig_50gb_apr04_g4_h d  (cost=0.00..3911716.30 rows=15346430 width=12617)
               ->  Parallel Hash  (cost=3913841.85..3913841.85 rows=15362085 width=12617)
                     ->  Parallel Seq Scan on mig_50gb_apr04_g3_h c  (cost=0.00..3913841.85 rows=15362085 width=12617)
         ->  Parallel Hash  (cost=102628306.07..102628306.07 rows=15164968 width=25233)
               ->  Parallel Hash Join  (cost=27848049.61..102628306.07 rows=15164968 width=25233)
                     Hash Cond: (a.seq_pk = e.seq_pk)
                     ->  Parallel Seq Scan on mig_50gb_apr04_g1_h a  (cost=0.00..3877018.68 rows=15164968 width=12617)
                     ->  Parallel Hash  (cost=3921510.05..3921510.05 rows=15382205 width=12616)
                           ->  Parallel Seq Scan on mig_50gb_apr04_g5_h e  (cost=0.00..3921510.05 rows=15382205 width=12616)

work_mem is 4MB, and the leader plus two parallel workers consume about 10GB each.
There are 262144 batches:

(gdb) p *hjstate->hj_HashTable
$2 = {nbuckets = 1024, log2_nbuckets = 10, nbuckets_original = 1024,
  nbuckets_optimal = 1024, log2_nbuckets_optimal = 10,
  buckets = {unshared = 0x7fa5d5211000, shared = 0x7fa5d5211000},
  keepNulls = false, skewEnabled = false, skewBucket = 0x0,
  skewBucketLen = 0, nSkewBuckets = 0, skewBucketNums = 0x0,
  nbatch = 262144, curbatch = 86506, nbatch_original = 262144,
  nbatch_outstart = 262144, growEnabled = true, totalTuples = 12260,
  partialTuples = 61136408, skewTuples = 0, innerBatchFile = 0x0,
  outerBatchFile = 0x0, outer_hashfunctions = 0x55ce086a3288,
  inner_hashfunctions = 0x55ce086a32d8, hashStrict = 0x55ce086a3328,
  collations = 0x55ce086a3340, spaceUsed = 0, spaceAllowed = 8388608,
  spacePeak = 204800, spaceUsedSkew = 0, spaceAllowedSkew = 167772,
  hashCxt = 0x55ce086a3170, batchCxt = 0x55ce086a5180, chunks = 0x0,
  current_chunk = 0x7fa5d5283000, area = 0x55ce085b56d8,
  parallel_state = 0x7fa5ee993520, batches = 0x7fa5d3ff8048,
  current_chunk_shared = 1099512193024}

The biggest memory contexts are:

ExecutorState: 1362623568
  HashTableContext: 102760280
    HashBatchContext: 7968
  HashTableContext: 178257752
    HashBatchContext: 7968
  HashTableContext: 5306745728
    HashBatchContext: 7968

There is still some gap between the size reported by the memory context dump and the actual size of the backend. But it seems obvious that, in trying to fit into work_mem, the shared tuplestore creates so many batches that they consume much more memory than work_mem itself.
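[Editorial illustration] The numbers in the report above are roughly consistent with unaccounted per-batch overhead dominating. A back-of-envelope check (the per-batch cost is an assumed round figure, not a measured one):

```python
# Sanity check against the reported case: with work_mem = 4MB and
# nbatch = 262144, even a modest fixed cost per batch dwarfs work_mem.
work_mem = 4 * 1024 * 1024
nbatch = 262144
per_batch_overhead = 8192          # assumed: ~one buffer/bookkeeping per batch

batch_overhead_total = nbatch * per_batch_overhead
print(f"work_mem:        {work_mem / 1024**2:.0f} MiB")
print(f"batch overhead:  {batch_overhead_total / 1024**3:.1f} GiB")
print(f"ratio:           {batch_overhead_total // work_mem}x work_mem")
# With several participants each paying similar costs, multi-GiB
# backends (as observed) are unsurprising; none of this is counted
# against work_mem.
```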
RE: [PoC] pg_upgrade: allow to upgrade publisher node
Dear Julien,

> Sorry for the delay, I didn't had time to come back to it until this
> afternoon.

No issues, everyone is busy :-).

> I don't think that your analysis is correct. Slots are guaranteed to be
> stopped after all the normal backends have been stopped, exactly to avoid
> such extraneous records.
>
> What is happening here is that the slot's confirmed_flush_lsn is properly
> updated in memory and ends up being the same as the current LSN before the
> shutdown. But as it's a logical slot and those records aren't decoded, the
> slot isn't marked as dirty and therefore isn't saved to disk. You don't see
> that behavior when doing a manual checkpoint before (per your script
> comment), as in that case the checkpoint also tries to save the slot to
> disk but then finds a slot that was marked as dirty and therefore saves it.
>
> In your script's scenario, when you restart the server the previous slot
> data is restored and the confirmed_flush_lsn goes backward, which explains
> those extraneous records.

So you meant to say that the key point was that some records which are not sent to the subscriber do not mark slots as dirty, hence the updated confirmed_flush was not written into the slot file. Is that right? LogicalConfirmReceivedLocation() is called by the walsender when the process gets a reply from the worker process, so your analysis seems correct.

> It's probably totally harmless to throw away that value for now (and
> probably also doesn't lead to crazy amount of work after restart, I really
> don't know much about the logical slot code), but clearly becomes
> problematic with your usecase. One easy way to fix this is to teach the
> checkpoint code to force saving the logical slots to disk even if they're
> not marked as dirty during a shutdown checkpoint, as done in the attached
> v1 patch (renamed as .txt to not interfere with the cfbot). With this
> patch applied I reliably only see a final shutdown checkpoint record with
> your scenario.
>
> Now such a change will make shutdown a bit more expensive when using
> logical replication, even if in 99% of cases you will not need to save the
> confirmed_flush_lsn value, so I don't know if that's acceptable or not.

In any case these records must be advanced. IIUC, currently such records are read after rebooting but ignored, and this patch just skips them. I have not measured, but there is a possibility that this is not additional overhead, just a trade-off. Currently I did not come up with another solution, so I have included your patch. Please see 0002.

Additionally, I added a checking function in 0003. According to pg_resetwal and other functions, the length of a CHECKPOINT_SHUTDOWN record seems to be (SizeOfXLogRecord + SizeOfXLogRecordDataHeaderShort + sizeof(CheckPoint)). Therefore, the function ensures that the difference between the current insert position and confirmed_flush_lsn is less than (the above + page header).

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments:
- v7-0001-pg_upgrade-Add-include-logical-replication-slots-.patch
- v7-0002-Always-persist-to-disk-logical-slots-during-a-shu.patch
- v7-0003-pg_upgrade-Add-check-function-for-include-logical.patch
Re: Tab completion for AT TIME ZONE
On 14.04.23 11:29, Dagfinn Ilmari Mannsåker wrote:
> It doesn't tab complete the AT TIME ZONE operator itself, just the
> timezone name after it, so this should work:
>
>   # SELECT now() AT TIME ZONE
>
> or
>
>   # SELECT now() AT TIME ZONE am
>
> However, looking more closely at the grammar, the word AT only occurs in
> AT TIME ZONE, so we could complete the operator itself as well. Updated
> patch attached.
>
> - ilmari

Got it. In that case, everything seems to work just fine:

postgres=# SELECT now() AT

.. autocompletes TIME ZONE:

postgres=# SELECT now() AT TIME ZONE
postgres=# SELECT now() AT TIME ZONE
Display all 598 possibilities? (y or n)
postgres=# SELECT now() AT TIME ZONE 'Europe/Is
Europe/Isle_of_Man  Europe/Istanbul

It also handles a neglected opening single quote ...

postgres=# SELECT now() AT TIME ZONE Europe/Is

... autocompleting it after:

postgres=# SELECT now() AT TIME ZONE 'Europe/Is

The patch applies cleanly and it does what it is proposing, and it's IMHO a very nice addition. I've marked the CF entry as "Ready for Committer".

Jim
Re: Adding argument names to aggregate functions
Jim Jones writes:
> On 12.04.23 19:53, Dagfinn Ilmari Mannsåker wrote:
>> Dagfinn Ilmari Mannsåker writes:
>>
>>> Hi hackers,
>>>
>>> I'm sure I'm not the only one who can never remember which way around
>>> the value and delimiter arguments go for string_agg() and has to look it
>>> up in the manual every time. To make it more convenient, here's a patch
>>> that adds proargnames to its pg_proc entries so that it can be seen with
>>> a quick \df in psql.
>>
>> Added to the 2023-07 commitfest:
>>
>> https://commitfest.postgresql.org/43/4275/
>>
>> - ilmari
>
> +1 for adding the argument names.
>
> The patch needs a rebase though.. it no longer applies:
>
> $ git apply ~/Downloads/0001-Add-argument-names-to-multi-argument-aggregates.patch
> error: patch failed: src/include/catalog/pg_proc.dat:8899
> error: src/include/catalog/pg_proc.dat: patch does not apply

Thanks for the heads-up, here's a rebased patch. I've also formatted the lines to match what reformat_dat_file.pl wants. It also wanted to reformat a bunch of other entries, but I left those alone.

- ilmari

>From a6ff997fcea7aa7201318cb94db0173ea6efdf02 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Dagfinn=20Ilmari=20Manns=C3=A5ker?=
Date: Mon, 27 Feb 2023 13:06:29 +
Subject: [RFC PATCH v2] Add argument names to multi-argument aggregates

This makes it easier to see which way around the arguments go when
using \dfa. This is particularly relevant for string_agg(), but add it
to json(b)_object_agg() too for good measure.
---
 src/include/catalog/pg_proc.dat | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index b516cee8bd..b2db8d07e1 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5017,7 +5017,7 @@
 { oid => '3538', descr => 'concatenate aggregate input into a string',
   proname => 'string_agg', prokind => 'a', proisstrict => 'f',
   prorettype => 'text', proargtypes => 'text text',
-  prosrc => 'aggregate_dummy' },
+  proargnames => '{value,delimiter}', prosrc => 'aggregate_dummy' },
 { oid => '3543', descr => 'aggregate transition function',
   proname => 'bytea_string_agg_transfn', proisstrict => 'f',
   prorettype => 'internal', proargtypes => 'internal bytea bytea',
@@ -5029,7 +5029,7 @@
 { oid => '3545', descr => 'concatenate aggregate input into a bytea',
   proname => 'string_agg', prokind => 'a', proisstrict => 'f',
   prorettype => 'bytea', proargtypes => 'bytea bytea',
-  prosrc => 'aggregate_dummy' },
+  proargnames => '{value,delimiter}', prosrc => 'aggregate_dummy' },

 # To ASCII conversion
 { oid => '1845', descr => 'encode text from DB encoding to ASCII text',
@@ -8978,7 +8978,7 @@
 { oid => '3197', descr => 'aggregate input into a json object',
   proname => 'json_object_agg', prokind => 'a', proisstrict => 'f',
   provolatile => 's', prorettype => 'json', proargtypes => 'any any',
-  prosrc => 'aggregate_dummy' },
+  proargnames => '{key,value}', prosrc => 'aggregate_dummy' },
 { oid => '8955', descr => 'aggregate non-NULL input into a json object',
   proname => 'json_object_agg_strict', prokind => 'a', proisstrict => 'f',
   provolatile => 's', prorettype => 'json', proargtypes => 'any any',
@@ -9906,7 +9906,7 @@
   prosrc => 'jsonb_object_agg_finalfn' },
 { oid => '3270', descr => 'aggregate inputs into jsonb object',
   proname => 'jsonb_object_agg', prokind => 'a', proisstrict => 'f',
-  prorettype => 'jsonb', proargtypes => 'any any',
+  prorettype => 'jsonb', proargtypes => 'any any',
+  proargnames => '{key,value}', prosrc => 'aggregate_dummy' },
 { oid => '8963', descr => 'aggregate non-NULL inputs into jsonb object',
   proname => 'jsonb_object_agg_strict', prokind => 'a', proisstrict => 'f',
--
2.39.2
Re: Issue in postgres_fdw causing unnecessary wait for cancel request reply
On Fri, Apr 14, 2023 at 3:19 AM Fujii Masao wrote:
> On 2023/04/13 15:13, Etsuro Fujita wrote:
>> I am not 100% sure that it is a good idea to use the same error
>> message "could not send cancel request" for the PQgetCancel() and
>> PQcancel() cases, because they are different functions. How about
>> "could not create PGcancel structure" or something like that, for the
>
> The primary message basically should avoid reference to implementation
> details such as specific structure names like PGcancel, shouldn't it, as
> per the error message style guide?

I do not think that PGcancel is that specific, as it is described in the user-facing documentation [1]. (In addition, the error message I proposed was created by copying the existing error message "could not create OpenSSL BIO structure" in contrib/sslinfo.c.)

>> former case, so we can distinguish the former error from the latter?
>
> Although the primary message is the same, the supplemental message
> provides additional context that can help distinguish which function is
> reporting the message.

If the user is familiar with the PQgetCancel/PQcancel internals, this is true, but if not, I do not think this is always true. Consider this error message, for example:

2023-04-14 17:48:55.862 JST [24344] WARNING: could not send cancel request: invalid integer value "999" for connection option "keepalives"

It would be hard for users without the knowledge about those internals to distinguish that from this message. For average users, I think it would be good to use a more distinguishable error message.

Best regards,
Etsuro Fujita

[1] https://www.postgresql.org/docs/current/libpq-cancel.html
Re: Tab completion for AT TIME ZONE
Hi Jim,

Thanks for having a look at my patch, but please don't top post on PostgreSQL lists.

Jim Jones writes:
> Hi,
>
> On 12.04.23 19:53, Dagfinn Ilmari Mannsåker wrote:
>>> Hi hackers,
>>>
>>> A while back we added support for completing time zone names after SET
>>> TIMEZONE, but we failed to do the same for the AT TIME ZONE operator.
>>> Here's a trivial patch for that.
>
> Is this supposed to provide tab completion for the AT TIME ZONE operator
> like in this query?
>
> SELECT '2023-04-14 08:00:00' AT TIME ZONE 'Europe/Lisbon';
>
> The patch applied cleanly but I'm afraid I cannot reproduce the intended
> behaviour:
>
> postgres=# SELECT '2023-04-14 08:00:00' AT
> postgres=# SELECT '2023-04-14 08:00:00' AT T
> postgres=# SELECT '2023-04-14 08:00:00' AT TIME Z
>
> Perhaps I'm testing it in the wrong place?
>
> Best, Jim

It doesn't tab complete the AT TIME ZONE operator itself, just the timezone name after it, so this should work:

# SELECT now() AT TIME ZONE

or

# SELECT now() AT TIME ZONE am

However, looking more closely at the grammar, the word AT only occurs in AT TIME ZONE, so we could complete the operator itself as well. Updated patch attached.

- ilmari

>From 365844db04d27c5bcd1edf8a9d0d44353bc34631 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Dagfinn=20Ilmari=20Manns=C3=A5ker?=
Date: Wed, 29 Mar 2023 11:16:01 +0100
Subject: [PATCH v2] psql: tab completion for AT TIME ZONE

Commit 7fa3db367986160dee2b2b0bbfb61e1a51d486fd added support for
completing time zone names, use that after the AT TIME ZONE operator.
Also complete the operator itself, since it's the only thing in the
grammar starting with AT.
---
 src/bin/psql/tab-complete.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 5825b2a195..e3870c68e9 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -4657,6 +4657,14 @@ psql_completion(const char *text, int start, int end)
 	else if (TailMatches("JOIN"))
 		COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_selectables);

+/* ... AT TIME ZONE ... */
+	else if (TailMatches("AT"))
+		COMPLETE_WITH("TIME ZONE");
+	else if (TailMatches("AT", "TIME"))
+		COMPLETE_WITH("ZONE");
+	else if (TailMatches("AT", "TIME", "ZONE"))
+		COMPLETE_WITH_TIMEZONE_NAME();
+
 /* Backslash commands */
 /* TODO: \dc \dd \dl */
 	else if (TailMatchesCS("\\?"))
--
2.39.2
Re: Adding argument names to aggregate functions
On 12.04.23 19:53, Dagfinn Ilmari Mannsåker wrote:
> Dagfinn Ilmari Mannsåker writes:
>> Hi hackers,
>>
>> I'm sure I'm not the only one who can never remember which way around
>> the value and delimiter arguments go for string_agg() and has to look
>> it up in the manual every time. To make it more convenient, here's a
>> patch that adds proargnames to its pg_proc entries so that it can be
>> seen with a quick \df in psql.
>
> Added to the 2023-07 commitfest: https://commitfest.postgresql.org/43/4275/
>
> - ilmari

+1 for adding the argument names.

The patch needs a rebase though... it no longer applies:

    $ git apply ~/Downloads/0001-Add-argument-names-to-multi-argument-aggregates.patch
    error: patch failed: src/include/catalog/pg_proc.dat:8899
    error: src/include/catalog/pg_proc.dat: patch does not apply

Jim
Re: Set arbitrary GUC options during initdb
On 27.01.23 15:48, Peter Eisentraut wrote:
> Btw., something that I have had in my notes for a while, but with this
> it would now be officially exposed: Not all options can be safely set
> during bootstrap. For example,
>
>     initdb -D data -c track_commit_timestamp=on
>
> will fail an assertion. This might be an exception, or there might be
> others.

I ran a test across all changeable boolean parameters with initdb
setting it to the opposite of their default. The only one besides
track_commit_timestamp that caused initdb to not complete was
default_transaction_read_only, which is to be expected.

We should fix track_commit_timestamp, but it doesn't look like there is
wider impact. (Obviously, this tested only boolean settings. If someone
wants to fuzz-test the others ...)
Re: User functions for building SCRAM secrets
> On 14 Apr 2023, at 05:50, Michael Paquier wrote:
>
> On Fri, Apr 14, 2023 at 01:27:46AM +0200, Daniel Gustafsson wrote:
>> What would be the intended use case? I don't have the RFC handy, does
>> it say anything about salt length?
>
> Hmm. I thought it did, but RFC 5802 has only these two paragraphs:
>
>     If the authentication information is stolen from the authentication
>     database, then an offline dictionary or brute-force attack can be
>     used to recover the user's password. The use of salt mitigates this
>     attack somewhat by requiring a separate attack on each password.
>     Authentication mechanisms that protect against this attack are
>     available (e.g., the EKE class of mechanisms). RFC 2945 [RFC2945] is
>     an example of such technology. The WG elected not to use EKE like
>     mechanisms as a basis for SCRAM.
>
>     If an attacker obtains the authentication information from the
>     authentication repository and either eavesdrops on one authentication
>     exchange or impersonates a server, the attacker gains the ability to
>     impersonate that user to all servers providing SCRAM access using the
>     same hash function, password, iteration count, and salt. For this
>     reason, it is important to use randomly generated salt values.

The salt needs to be unique, unpredictable and shall not repeat across
password generation. The current 16-byte salt generated with
pg_strong_random should provide that, and I'm not sure I see a use case
for allowing users to configure that. The iteration count has a direct
effect on the security/speed tradeoff, but changing the salt can
basically only lead to lowering the security while not gaining
efficiency, or am I missing something?

--
Daniel Gustafsson
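As context for the salt discussion above: SCRAM-SHA-256's SaltedPassword is Hi(password, salt, i), where Hi() is PBKDF2 with HMAC-SHA-256 (RFC 5802 section 2.2, RFC 7677). A minimal sketch of the scheme PostgreSQL uses — a random 16-byte salt plus a configurable iteration count — using only Python's standard library; the helper name and constants mirror the documented defaults but are illustrative, not PostgreSQL's actual code:

```python
import hashlib
import secrets

SALT_LEN = 16          # PostgreSQL generates a 16-byte salt with pg_strong_random()
ITERATIONS = 4096      # RFC 7677's minimum recommendation; PostgreSQL's default

def scram_salted_password(password: str, salt: bytes, iterations: int) -> bytes:
    # SCRAM's Hi() is PBKDF2-HMAC-SHA-256, yielding a 32-byte SaltedPassword.
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)

salt = secrets.token_bytes(SALT_LEN)
key = scram_salted_password("secret", salt, ITERATIONS)

# Same password + same salt + same iteration count => same SaltedPassword.
# That determinism is exactly why salt reuse lets one cracking effort
# cover every deployment sharing the salt, per the RFC text quoted above.
```

This also illustrates Daniel's point: the iteration count controls the cost of each PBKDF2 derivation, while the salt only needs to be random and unique, so user-configurable salts buy nothing security-wise.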
Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert
> On 14 Apr 2023, at 01:27, Tom Lane wrote:
>
> Daniel Gustafsson writes:
>> Good points, it should of course be SOCK_ERRNO. The attached saves off
>> errno and reinstates it to avoid clobbering. Will test it on Windows in
>> the morning as well.
>
> I think instead of this:
>
> +	SOCK_ERRNO_SET(save_errno);
>
> you could just do this:
>
> 	libpq_append_conn_error(conn, "SSL SYSCALL error: %s",
> -		SOCK_STRERROR(SOCK_ERRNO, sebuf, sizeof(sebuf)));
> +		SOCK_STRERROR(save_errno, sebuf, sizeof(sebuf)));
>
> Although ... we're already assuming that SSL_get_error and ERR_get_error
> don't clobber errno. Maybe SSL_get_verify_result doesn't either.
> Or we could make it look like this:
>
> +	SOCK_ERRNO_SET(0);
> 	ERR_clear_error();
> 	r = SSL_connect(conn->ssl);
> 	if (r <= 0)
> +		int save_errno = SOCK_ERRNO;
> 		int err = SSL_get_error(conn->ssl, r);
> 		unsigned long ecode;
>
> 		...
>
> -		SOCK_STRERROR(SOCK_ERRNO, sebuf, sizeof(sebuf)));
> +		SOCK_STRERROR(save_errno, sebuf, sizeof(sebuf)));
>
> to remove all doubt.

I mainly put save_errno back into SOCK_ERRNO for greppability; I don't
have any strong opinions either way, so I went with the latter
suggestion.

Attached v3 does the above change and passes the tests both with a
broken and a working system CA pool. Unless there are objections from
those with failing local envs, I propose this is pushed to close the
open item.

--
Daniel Gustafsson

Attachment: libpq_system_ca_fix_v3.diff
Re: Fix documentation for max_wal_size and min_wal_size
At Thu, 13 Apr 2023 12:01:04 -0700, sirisha chamarthi wrote in
> The documentation [1] says max_wal_size and min_wal_size defaults are
> 1GB and 80 MB respectively. However, these are configured based on the
> wal_segment_size and the documentation is not clear about it. Attached a
> patch to fix the documentation.
>
> [1] https://www.postgresql.org/docs/devel/runtime-config-wal.html

Good catch! Now wal_segment_size is easily changed.

    -The default is 1 GB.
    +The default value is configured to the maximum of 64 times the wal_segment_size or 1 GB.

    -The default is 80 MB.
    +The default value is configured to the maximum of 5 times the wal_segment_size or 80 MB.

However, I believe that most users don't change the WAL segment size, so
the primary information is that the default sizes are 1GB and 80MB. So,
I personally think it should be written like this: "The default size is
80MB. However, if you have changed the WAL segment size from the default
of 16MB, it will be five times the segment size.", but I'm not sure what
the others think about this...

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
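The sizing rule discussed above can be stated concretely. Assuming the thread's description — defaults of 5 and 64 WAL segments for min_wal_size and max_wal_size respectively — a quick check of what the defaults work out to for a couple of segment sizes (this is an illustration of the thread's claim, not PostgreSQL's actual initdb code):

```python
def default_wal_sizes(wal_segment_size_mb: int) -> tuple[int, int]:
    """Return (min_wal_size, max_wal_size) in MB, per the thread's
    description: 5 segments and 64 segments respectively."""
    return 5 * wal_segment_size_mb, 64 * wal_segment_size_mb

# With the stock 16MB segment size, the documented 80MB / 1GB defaults fall out:
print(default_wal_sizes(16))  # (80, 1024)
# With 64MB segments, the same rule scales the defaults to 320MB / 4GB:
print(default_wal_sizes(64))  # (320, 4096)
```

This makes Kyotaro's point visible: for the overwhelmingly common 16MB segment size, "80MB and 1GB" is exact, and the multiplier only matters for nonstandard segment sizes.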
Re: Tab completion for AT TIME ZONE
Hi,

Is this supposed to provide tab completion for the AT TIME ZONE operator
like in this query?

SELECT '2023-04-14 08:00:00' AT TIME ZONE 'Europe/Lisbon';

The patch applied cleanly but I'm afraid I cannot reproduce the intended
behaviour:

postgres=# SELECT '2023-04-14 08:00:00' AT
postgres=# SELECT '2023-04-14 08:00:00' AT T
postgres=# SELECT '2023-04-14 08:00:00' AT TIME Z

Perhaps I'm testing it in the wrong place?

Best, Jim

On 12.04.23 19:53, Dagfinn Ilmari Mannsåker wrote:
> Dagfinn Ilmari Mannsåker writes:
>> Hi hackers,
>>
>> A while back we added support for completing time zone names after SET
>> TIMEZONE, but we failed to do the same for the AT TIME ZONE operator.
>> Here's a trivial patch for that.
>
> Added to the 2023-07 commitfest: https://commitfest.postgresql.org/43/4274/
>
> - ilmari
Re: Support logical replication of DDLs
On Fri, 7 Apr 2023 at 08:52, houzj.f...@fujitsu.com wrote:
>
> On Friday, April 7, 2023 11:13 AM houzj.f...@fujitsu.com wrote:
> >
> > On Tuesday, April 4, 2023 7:35 PM shveta malik wrote:
> > >
> > > On Tue, Apr 4, 2023 at 8:43 AM houzj.f...@fujitsu.com wrote:
> > > >
> > > > Attach the new version patch set which did the following changes:
> > >
> > > Hello,
> > > I tried below:
> > > pubnew=# ALTER PUBLICATION mypub2 SET (ddl = 'table');
> > > ALTER PUBLICATION
> > >
> > > pubnew=# \dRp+
> > >              Publication mypub2
> > >  Owner  | All tables | All DDLs | Table DDLs |
> > > --------+------------+----------+------------+
> > >  shveta | t          | f        | f
> > > (1 row)
> > >
> > > I still see 'Table DDLs' as false and ddl replication did not work for
> > > this case.
> >
> > Thanks for reporting.
> >
> > Attach the new version patch which includes the following changes:
> > * Fix the above bug for ALTER PUBLICATION SET.
> > * Modify the corresponding event trigger when user executes ALTER
> >   PUBLICATION SET to change the ddl option.
> > * Fix a miss in pg_dump's code which causes CFbot failure.
> > * Rebase the patch due to recent commit 4826759.
>
> Sorry, there was a miss when rebasing the patch which could cause the
> CFbot to fail, and here is the correct patch set.
Few comments:

1) I felt the is_present_flag variable can be removed by moving
"object_name = append_object_to_format_string(tree, sub_fmt);" inside
the if condition:

+static void
+append_bool_object(ObjTree *tree, char *sub_fmt, bool value)
+{
+	ObjElem    *param;
+	char	   *object_name = sub_fmt;
+	bool		is_present_flag = false;
+
+	Assert(sub_fmt);
+
+	/*
+	 * Check if the format string is 'present' and if yes, store the boolean
+	 * value
+	 */
+	if (strcmp(sub_fmt, "present") == 0)
+	{
+		is_present_flag = true;
+		tree->present = value;
+	}
+
+	if (!is_present_flag)
+		object_name = append_object_to_format_string(tree, sub_fmt);
+
+	param = new_object(ObjTypeBool, object_name);
+	param->value.boolean = value;
+	append_premade_object(tree, param);
+}

By changing it to something like below:

+static void
+append_bool_object(ObjTree *tree, char *sub_fmt, bool value)
+{
+	ObjElem    *param;
+	char	   *object_name = sub_fmt;
+
+	Assert(sub_fmt);
+
+	/*
+	 * Check if the format string is 'present' and if yes, store the boolean
+	 * value
+	 */
+	if (strcmp(sub_fmt, "present") == 0)
+	{
+		tree->present = value;
+		object_name = append_object_to_format_string(tree, sub_fmt);
+	}
+
+	param = new_object(ObjTypeBool, object_name);
+	param->value.boolean = value;
+	append_premade_object(tree, param);
+}

2) We could remove the temporary variable tmp_str here:

+	if (start_ptr != NULL && end_ptr != NULL)
+	{
+		length = end_ptr - start_ptr - 1;
+		tmp_str = (char *) palloc(length + 1);
+		strncpy(tmp_str, start_ptr + 1, length);
+		tmp_str[length] = '\0';
+		appendStringInfoString(_name, tmp_str);
+		pfree(tmp_str);
+	}

by changing to:

+	if (start_ptr != NULL && end_ptr != NULL)
+		appendBinaryStringInfo(_name, start_ptr + 1, end_ptr - start_ptr - 1);

3) I did not see ObjTypeFloat used anywhere, we could remove it:

+typedef enum
+{
+	ObjTypeNull,
+	ObjTypeBool,
+	ObjTypeString,
+	ObjTypeArray,
+	ObjTypeInteger,
+	ObjTypeFloat,
+	ObjTypeObject
+} ObjType;

4) I noticed that none of the file names in src/backend/commands use "_"
in the filenames, but in the case of ddl_deparse.c and ddl_json.c we have
used "_"; it might be better to be consistent with the other filenames in
this directory:

diff --git a/src/backend/commands/Makefile b/src/backend/commands/Makefile
index 48f7348f91..171dfb2800 100644
--- a/src/backend/commands/Makefile
+++ b/src/backend/commands/Makefile
@@ -29,6 +29,8 @@ OBJS = \
 	copyto.o \
 	createas.o \
 	dbcommands.o \
+	ddl_deparse.o \
+	ddl_json.o \
 	define.o \
 	discard.o \
 	dropcmds.o \

5) The following includes are no longer required in ddl_deparse.c as we
have removed support for deparsing of other objects:

#include "catalog/pg_am.h"
#include "catalog/pg_aggregate.h"
#include "catalog/pg_authid.h"
#include "catalog/pg_cast.h"
#include "catalog/pg_conversion.h"
#include "catalog/pg_depend.h"
#include "catalog/pg_extension.h"
#include "catalog/pg_foreign_data_wrapper.h"
#include "catalog/pg_foreign_server.h"
#include "catalog/pg_language.h"
#include "catalog/pg_largeobject.h"
#include "catalog/pg_opclass.h"
#include "catalog/pg_operator.h"
#include "catalog/pg_opfamily.h"
Re: psql: Add role's membership options to the \du+ command
Sorry for joining in late...

At Thu, 13 Apr 2023 15:44:20 +0300, Pavel Luzanov wrote in
> After playing with the \du command, I found that we can't avoid
> translation. All attributes are translatable. Also, two of nine
> attributes show in new-line separated format (connection limit and
> password valid until).

Going a bit off-topic here, but I'd like the "infinity" to be
translatable...

As David G. appears to have expressed concern upthread, I don't think a
direct Japanese translation of "rolename from rolename" works well, as
the "from" needs an accompanying verb. As a Japanese speaker, I prefer a
more non-sentence-like notation, similar to David's suggestion but with
slight differences:

    "pg_read_all_stats (grantor: postgres, inherit, set)"

This is easily translated into Japanese:

    "pg_read_all_stats (付与者: postgres、継承、設定)"

Come to think of this, I recalled a past discussion in which we
concluded that translating punctuation marks appearing between a
variable number of items within list expressions should be avoided...

Thus, I'd like to propose to use an ACL-like notation, which doesn't
need translation:

 ..| Member of                                                             | ..
 ..| pg_read_server_files=ais/horiguti,pg_execute_server_program=/postgres | ..

If we'd like, but not likely, we might want to provide a parallel
function to aclexplode for this notation:

=# select memberofexplode('pg_read_server_files=ais/horiguti,pg_execute_server_program=/postgres');
         privilege          | grantor  | admin | inherit | set
----------------------------+----------+-------+---------+-------
 pg_read_server_files       | horiguti | true  | true    | true
 pg_execute_server_programs | postgres | false | false   | false

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
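The ACL-like notation proposed above is mechanical to parse, which is part of its appeal. As a sketch (the function name mirrors the suggested memberofexplode() and is hypothetical; the option letters are assumed to be 'a' = admin, 'i' = inherit, 's' = set, with the grantor after the slash):

```python
def member_of_explode(acl: str) -> list[dict]:
    """Parse e.g. 'pg_read_server_files=ais/horiguti' into option rows."""
    rows = []
    for item in acl.split(","):
        spec, _, grantor = item.partition("/")   # grantor follows the slash
        role, _, opts = spec.partition("=")      # option letters follow '='
        rows.append({
            "privilege": role,
            "grantor": grantor,
            "admin": "a" in opts,
            "inherit": "i" in opts,
            "set": "s" in opts,
        })
    return rows

for row in member_of_explode(
        "pg_read_server_files=ais/horiguti,pg_execute_server_program=/postgres"):
    print(row)
```

Since every field is a fixed single character or a delimiter, the notation needs no translation and no locale-dependent punctuation, which is exactly the argument made in the email.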
Re: Option to not use ringbuffer in VACUUM, using it in failsafe mode
On 07.04.23 02:52, David Rowley wrote:
> On Fri, 7 Apr 2023 at 09:44, Melanie Plageman wrote:
>> Otherwise, LGTM.
>
> Thanks for looking. I've also taken Justin's comments about the README
> into account and fixed that part.
>
> I've pushed the patch after a little more adjusting. I added some text
> to the docs that mention larger VACUUM_BUFFER_LIMITs can speed up
> vacuum and also a reason why they might not want to go nuts with it.

I came across these new options and had a little bit of trouble figuring
them out from the documentation. Maybe this could be polished a bit.

vacuumdb --help says

    --buffer-usage-limit=BUFSIZE

I can guess what a "SIZE" might be, but is "BUFSIZE" different from a
"SIZE"? Maybe simplify here.

On the vacuumdb man page, the placeholder is buffer_usage_limit, which
is yet another way of phrasing it. Maybe also use "size" here?

The VACUUM man page says

    BUFFER_USAGE_LIMIT [ string ]

which had me really confused. The detailed description later doesn't
give any further explanation of possible values, except that 0 is
apparently a possible value, which in my mind is not a string. Then
there is a link to guc-vacuum-buffer-usage-limit, which lifts the
mystery that this is really just an integer setting with possible
memory-size units, but it was really hard to figure that out from the
start!

Moreover, on the VACUUM man page, right below BUFFER_USAGE_LIMIT, it
explains the different kinds of accepted values, and "string" wasn't
added there. Maybe also change this to "size" here and add an
explanation there of what kinds of sizes are possible.

Finally, the locations of the new options in the various documentation
places seem a bit random. The vacuumdb --help output and the man page
appear to be mostly alphabetical, so --buffer-usage-limit should be
after -a/--all. (Also note that right now the option isn't even in the
same place in the --help output versus the man page.)

The order of the options on the VACUUM man page doesn't make any sense
anymore.
This isn't really the fault of this patch, but maybe it's time to do a fresh reordering there.
Re: Partial aggregates pushdown
On Thu, Apr 13, 2023 at 10:56:26AM +, fujii.y...@df.mitsubishielectric.co.jp wrote:
> > Yes, good. Did we never push down aggregates before? I thought we
> > pushed down partitionwise aggregates already, and such a check should
> > already be done. If the check isn't there, it should be.
>
> Yes. The last version of this patch (and original postgres_fdw) checks
> if a user-defined aggregate depends on some extension which is contained
> in 'extensions'. But, in the last version of this patch, there is no
> such check for the aggpartialfn of a user-defined aggregate. So, I will
> add such a check to this patch. I think that this modification is easy
> to do.

Good, so our existing code is correct and the patch just needs
adjustment.

--
Bruce Momjian  https://momjian.us
EDB            https://enterprisedb.com

Embrace your flaws. They make you human, rather than perfect, which you
will never be.
Re: [PoC] pg_upgrade: allow to upgrade publisher node
Hi,

Sorry for the delay, I didn't have time to come back to it until this
afternoon.

On Mon, Apr 10, 2023 at 09:18:46AM +, Hayato Kuroda (Fujitsu) wrote:
>
> I have analyzed the point but it seemed to be difficult. This is because
> some additional records like the following may be inserted. PSA the
> script which is used for testing. Note that the "double
> CHECKPOINT_SHUTDOWN" issue might be wrong, so I wanted to withdraw it
> once. Sorry for the noise.
>
> * HEAP/HEAP2 records. These records may be inserted by the checkpointer.
>
>   IIUC, if there are tuples which have not been flushed yet when
>   shutdown is requested, the checkpointer writes back all of them into
>   the heap file. At that time many WAL records are generated. I think we
>   cannot predict the number of records beforehand.
>
> * INVALIDATION(S) records. These records may be inserted by VACUUM.
>
>   There is a possibility that autovacuum runs and generates WAL records.
>   I think we cannot predict the number of records beforehand because it
>   depends on the number of objects.
>
> * RUNNING_XACTS record
>
>   It might be a timing issue, but I found that sometimes the background
>   writer generated a XLOG_RUNNING record. According to the function
>   BackgroundWriterMain(), it will be generated when the process has
>   spent 15 seconds since the last logging and there are important
>   records. I think it is difficult to predict whether this will appear
>   or not.

I don't think that your analysis is correct. Slots are guaranteed to be
stopped after all the normal backends have been stopped, exactly to
avoid such extraneous records.

What is happening here is that the slot's confirmed_flush_lsn is
properly updated in memory and ends up being the same as the current LSN
before the shutdown. But as it's a logical slot and those records aren't
decoded, the slot isn't marked as dirty and therefore isn't saved to
disk.
You don't see that behavior when doing a manual checkpoint before (per
your script comment), as in that case the checkpoint also tries to save
the slot to disk but then finds a slot that was marked as dirty and
therefore saves it.

In your script's scenario, when you restart the server the previous slot
data is restored and the confirmed_flush_lsn goes backward, which
explains those extraneous records. It's probably totally harmless to
throw away that value for now (and probably also doesn't lead to a crazy
amount of work after restart, I really don't know much about the logical
slot code), but it clearly becomes problematic with your use case.

One easy way to fix this is to teach the checkpoint code to force saving
the logical slots to disk even if they're not marked as dirty during a
shutdown checkpoint, as done in the attached v1 patch (renamed as .txt
to not interfere with the cfbot). With this patch applied I reliably
only see a final shutdown checkpoint record with your scenario.

Now such a change will make shutdown a bit more expensive when using
logical replication, even if in 99% of cases you will not need to save
the confirmed_flush_lsn value, so I don't know if that's acceptable or
not.

>From 77c3d2d361893de857627e036d0eaaf01cfe91c1 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud
Date: Fri, 14 Apr 2023 13:49:09 +0800
Subject: [PATCH v1] Always persist to disk logical slots during a shutdown
 checkpoint.

It's entirely possible for a logical slot to have a confirmed_flush_lsn
higher than the last value saved on disk while not being marked as
dirty. It's currently not a problem to lose that value during a clean
shutdown / restart cycle, but a later patch adding support for
pg_upgrade of publications and logical slots will rely on that value
being properly persisted to disk.
Author: Julien Rouhaud
Reviewed-by: FIXME
Discussion: FIXME
---
 src/backend/access/transam/xlog.c |  2 +-
 src/backend/replication/slot.c    | 26 ++++++++++++++----------
 src/include/replication/slot.h    |  2 +-
 3 files changed, 18 insertions(+), 12 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index b540ee293b..8100ca656e 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7011,7 +7011,7 @@ static void
 CheckPointGuts(XLogRecPtr checkPointRedo, int flags)
 {
 	CheckPointRelationMap();
-	CheckPointReplicationSlots();
+	CheckPointReplicationSlots(flags & CHECKPOINT_IS_SHUTDOWN);
 	CheckPointSnapBuild();
 	CheckPointLogicalRewriteHeap();
 	CheckPointReplicationOrigin();
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 8021aaa0a8..2bbf2af770 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -109,7 +109,8 @@ static void ReplicationSlotDropPtr(ReplicationSlot *slot);
 /* internal persistency functions */
 static void RestoreSlotFromDisk(const char *name);
 static void CreateSlotOnDisk(ReplicationSlot *slot);
-static void SaveSlotToPath(ReplicationSlot *slot, const char
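The behavioral change in the patch above — save a slot at a shutdown checkpoint even when its dirty flag is clear — can be modeled abstractly. This is a toy sketch of the dirty-flag logic described in the thread, not the actual slot.c code:

```python
class Slot:
    """Toy model of a replication slot's in-memory vs on-disk state."""

    def __init__(self):
        self.confirmed_flush_lsn = 0   # in-memory position
        self.on_disk_lsn = 0           # last value persisted
        self.dirty = False

    def advance(self, lsn, mark_dirty):
        # Logical decoding may advance confirmed_flush_lsn in memory
        # without dirtying the slot when the records need no decoding.
        self.confirmed_flush_lsn = lsn
        self.dirty = self.dirty or mark_dirty

    def checkpoint(self, is_shutdown):
        # Pre-patch behavior: save only if dirty.
        # Patched behavior: also force a save at the shutdown checkpoint,
        # so confirmed_flush_lsn never goes backward across a restart.
        if self.dirty or is_shutdown:
            self.on_disk_lsn = self.confirmed_flush_lsn
            self.dirty = False

slot = Slot()
slot.advance(100, mark_dirty=False)
slot.checkpoint(is_shutdown=False)   # not saved: slot isn't dirty
slot.checkpoint(is_shutdown=True)    # saved: shutdown forces it
```

Without the `is_shutdown` clause, the restored slot would come back with the stale on-disk LSN, which is exactly the "confirmed_flush_lsn goes backward" symptom Julien describes.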