Re: [bug?] Missed parallel safety checks, and wrong parallel safety
On Fri, Jul 2, 2021 at 8:16 PM Robert Haas wrote: > > On Wed, Jun 30, 2021 at 11:46 PM Greg Nancarrow wrote: > > I personally think "(b) provide an option to the user to specify > > whether inserts can be parallelized on a relation" is the preferable > > option. > > There seems to be too many issues with the alternative of trying to > > determine the parallel-safety of a partitioned table automatically. > > I think (b) is the simplest and most consistent approach, working the > > same way for all table types, and without the overhead of (a). > > Also, I don't think (b) is difficult for the user. At worst, the user > > can use the provided utility-functions at development-time to verify > > the intended declared table parallel-safety. > > I can't really see some mixture of (a) and (b) being acceptable. > > Yeah, I'd like to have it be automatic, but I don't have a clear idea > how to make that work nicely. It's possible somebody (Tom?) can > suggest something that I'm overlooking, though. In general, for a non-partitioned table, checking parallel safety has little overhead and invalidation is not a big problem either, so I am tempted to provide an automatic parallel-safety check. This would enable parallelism in more cases, wherever it is suitable, without user intervention. OTOH, I understand that automatic checking might be very costly if the number of partitions is large. Can't we provide a middle way, where parallelism is enabled by default for normal tables, but for partitioned tables it is disabled by default and the user has to mark them safe to enable parallelism? I agree that such behavior might sound a bit hackish. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: Using COPY FREEZE in pgbench
Hello Tatsuo-san, So overall gain by the patch is around 15%, whereas the last test before the commit was 14%. It seems the patch is still beneficial after the commit. Yes, that's good! I had a quick look again, and about the comment: /* * If partitioning is not enabled and server version is 14.0 or later, we * can take account of freeze option of copy. */ I'd suggest instead the shorter: /* use COPY with FREEZE on v14 and later without partitioning */ Or maybe even to fully drop the comment, because the code is clear enough and the doc already says it. -- Fabien.
Re: A qsort template
On Fri, Jul 2, 2021 at 2:32 PM John Naylor wrote: > I suspect if we experiment on two extremes of type "heaviness" (accessing and > comparing trivial or not), such as uint32 and tuplesort, we'll have a pretty > good idea what the parameters should be, if anything different. I'll do some > testing along those lines. Cool. Since you are experimenting with tuplesort and likely thinking similar thoughts, here's a patch I've been using to explore that area. I've seen it get, for example, ~1.18x speedup for simple index builds in favourable winds (YMMV, early hacking results only). Currently, it kicks in when the leading column is of type int4, int8, timestamp, timestamptz, date or text + friends (when abbreviatable, currently that means "C" and ICU collations only), while increasing the executable by only 8.5kB (Clang, amd64, -O2, no debug). These types are handled with just three specialisations. Their custom "fast" comparators all boiled down to comparisons of datum bits, varying only in signedness and width, so I tried throwing them away and using 3 new common routines. Then I extended tuplesort_sort_memtuples()'s pre-existing specialisation dispatch to recognise qualifying users of those and select 3 corresponding sort specialisations. It might turn out to be worth burning some more executable size on extra variants (for example, see XXX notes in the code comments for opportunities; one could also go nuts trying smaller things like special cases for not-null, nulls first, reverse sort, ... to kill all those branches), or not. From 62a8acbc745f3b4a90c9a14b6b61989a9d83bece Mon Sep 17 00:00:00 2001 From: Thomas Munro Date: Sat, 3 Jul 2021 19:02:10 +1200 Subject: [PATCH] WIP: Accelerate tuple sorting for common types. Several data types have their own fast comparator functions that can be replaced with common binary or signed comparator functions. 
These functions can then be recognized by tuplesort.c and used to dispatch to corresponding specialized sort functions, to accelerate sorting. XXX WIP, experiment grade only, many open questions... --- src/backend/access/nbtree/nbtcompare.c | 22 ++-- src/backend/utils/adt/date.c | 15 +-- src/backend/utils/adt/timestamp.c | 11 ++ src/backend/utils/adt/varlena.c| 26 +--- src/backend/utils/sort/tuplesort.c | 161 - src/include/utils/sortsupport.h| 115 ++ 6 files changed, 295 insertions(+), 55 deletions(-) diff --git a/src/backend/access/nbtree/nbtcompare.c b/src/backend/access/nbtree/nbtcompare.c index 7ac73cb8c2..204cf778fb 100644 --- a/src/backend/access/nbtree/nbtcompare.c +++ b/src/backend/access/nbtree/nbtcompare.c @@ -119,26 +119,12 @@ btint4cmp(PG_FUNCTION_ARGS) PG_RETURN_INT32(A_LESS_THAN_B); } -static int -btint4fastcmp(Datum x, Datum y, SortSupport ssup) -{ - int32 a = DatumGetInt32(x); - int32 b = DatumGetInt32(y); - - if (a > b) - return A_GREATER_THAN_B; - else if (a == b) - return 0; - else - return A_LESS_THAN_B; -} - Datum btint4sortsupport(PG_FUNCTION_ARGS) { SortSupport ssup = (SortSupport) PG_GETARG_POINTER(0); - ssup->comparator = btint4fastcmp; + ssup->comparator = ssup_datum_int32_cmp; PG_RETURN_VOID(); } @@ -156,6 +142,7 @@ btint8cmp(PG_FUNCTION_ARGS) PG_RETURN_INT32(A_LESS_THAN_B); } +#ifndef USE_FLOAT8_BYVAL static int btint8fastcmp(Datum x, Datum y, SortSupport ssup) { @@ -169,13 +156,18 @@ btint8fastcmp(Datum x, Datum y, SortSupport ssup) else return A_LESS_THAN_B; } +#endif Datum btint8sortsupport(PG_FUNCTION_ARGS) { SortSupport ssup = (SortSupport) PG_GETARG_POINTER(0); +#ifdef USE_FLOAT8_BYVAL + ssup->comparator = ssup_datum_signed_cmp; +#else ssup->comparator = btint8fastcmp; +#endif PG_RETURN_VOID(); } diff --git a/src/backend/utils/adt/date.c b/src/backend/utils/adt/date.c index 0bea16cb67..350c662d50 100644 --- a/src/backend/utils/adt/date.c +++ b/src/backend/utils/adt/date.c @@ -438,25 +438,12 @@ date_cmp(PG_FUNCTION_ARGS) 
PG_RETURN_INT32(0); } -static int -date_fastcmp(Datum x, Datum y, SortSupport ssup) -{ - DateADT a = DatumGetDateADT(x); - DateADT b = DatumGetDateADT(y); - - if (a < b) - return -1; - else if (a > b) - return 1; - return 0; -} - Datum date_sortsupport(PG_FUNCTION_ARGS) { SortSupport ssup = (SortSupport) PG_GETARG_POINTER(0); - ssup->comparator = date_fastcmp; + ssup->comparator = ssup_datum_int32_cmp; PG_RETURN_VOID(); } diff --git a/src/backend/utils/adt/timestamp.c b/src/backend/utils/adt/timestamp.c index 79761f809c..c678517db6 100644 --- a/src/backend/utils/adt/timestamp.c +++ b/src/backend/utils/adt/timestamp.c @@ -37,6 +37,7 @@ #include "utils/datetime.h"
Re: logical replication worker accesses catalogs in error context callback
On Sat, Jul 3, 2021 at 10:03 PM Tom Lane wrote: > > Bharath Rupireddy writes: > > The patch basically looks good to me except a minor comment to have a > > local variable for var->varattno which makes the code shorter. > > Here's a restructured version that uses rangetable data for the > simple-relation case too. I also modified the relevant test cases > so that it's visible that we're reporting aliases not true names. How about making the below else if statement and the attname assignment into a single line? They are falling below the 80 char limit. else if (colno > 0 && colno <= list_length(rte->eref->colnames)) attname = strVal(list_nth(rte->eref->colnames, colno - 1)); Otherwise the v8 patch looks good to me. Regards, Bharath Rupireddy.
Re: Re[3]: On login trigger: take three
On Thu, Jun 3, 2021 at 8:36 AM Greg Nancarrow wrote: > > On Fri, May 21, 2021 at 2:46 PM Greg Nancarrow wrote: > > > > On Thu, May 20, 2021 at 2:45 PM Ivan Panchenko wrote: > > > > > > I have upgraded the patch for the 14th version. > > > > > > > I have some feedback on the patch: > > > > I've attached an updated version of the patch. > I've applied my review comments and done a bit more tidying up of > documentation and comments. > Also, I found that the previously-posted patch was broken by > snapshot-handling changes in commit 84f5c290 (with patch applied, > resulting in a coredump during regression tests) so I've added a fix > for that too. CFBot shows the following failure: # poll_query_until timed out executing this query: # SELECT '0/3046250' <= replay_lsn AND state = 'streaming' FROM pg_catalog.pg_stat_replication WHERE application_name = 'standby_1'; # expecting this output: # t # last actual query output: # t # with stderr: # NOTICE: You are welcome! # Looks like your test exited with 29 before it could output anything. t/001_stream_rep.pl .. Dubious, test returned 29 (wstat 7424, 0x1d00) Regards, Vignesh
Re: Disable WAL logging to speed up data loading
On Wed, Apr 7, 2021 at 12:13 PM osumi.takami...@fujitsu.com wrote: > > Hi > > > Mainly affected by a commit 9de9294, > I've fixed minor things to rebase the patch. > All modifications I did are cosmetic changes and > a little bit of documentation updates. > Please have a look at attached v09. > The patch is not applying on HEAD; kindly post a rebased version. As this patch is in Ready for Committer state, it will help one of the committers to pick up this patch during the commitfest. Regards, Vignesh
Re: Using COPY FREEZE in pgbench
After this commit: https://git.postgresql.org/pg/commitdiff/8e03eb92e9ad54e2f1ed8b5a73617601f6262f81 I was worried that the benefit of the COPY FREEZE patch had been reduced or lost. So I ran a pgbench test again. Current master: $ pgbench -i -s 100 test : : done in 20.23 s (drop tables 0.00 s, create tables 0.02 s, client-side generate 13.54 s, vacuum 2.34 s, primary keys 4.33 s). With v5 patch: done in 16.92 s (drop tables 0.21 s, create tables 0.01 s, client-side generate 12.68 s, vacuum 0.24 s, primary keys 3.77 s). So overall gain by the patch is around 15%, whereas the last test before the commit was 14%. It seems the patch is still beneficial after the commit. Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp
Re: pgbench using COPY FREEZE
Hi Simon, >> Hello Simon, >> >> Indeed. >> >> There is already a "ready" patch in the queue, see: >> >> https://commitfest.postgresql.org/33/3034/ > > Ah, my bad. I withdraw this patch, apologies Tatsuo-san. No problem at all. Thank you for looking into the issue. -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp
Removing redundant check for transaction in progress in check_safe_enum_use
Hi, I was looking at: Relax transactional restrictions on ALTER TYPE ... ADD VALUE (redux). In check_safe_enum_use(): + if (!TransactionIdIsInProgress(xmin) && + TransactionIdDidCommit(xmin)) + return; Since the condition would be true only when TransactionIdDidCommit() returns true, I think the call to TransactionIdIsInProgress is not needed. If the transaction for xmin has committed, it cannot be in progress at the same time. Please see the simple patch for removing the redundant check. Thanks txn-in-progress-enum.patch Description: Binary data
Re: Reducing the cycle time for CLOBBER_CACHE_ALWAYS buildfarm members
I wrote: > Andrew Dunstan writes: >> Seems reasonable. I don't have a CCA animal any more, but I guess I >> could set up a test. > I can run a test here --- I'll commandeer sifaka for awhile, > since that's the fastest animal I have. Done, and here are the results: Traditional way (-DCLOBBER_CACHE_ALWAYS): 32:53:24 https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sifaka&dt=2021-07-01%2018%3A06%3A09 New way: 16:15:43 https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sifaka&dt=2021-07-03%2004%3A02%3A16 That's running sifaka's full test schedule including TAP tests, which ordinarily takes it a shade under 10 minutes. The savings on a non-TAP run would be a lot less, of course, thanks to fewer initdb invocations. Although I lacked the patience to run a full back-branch cycle with -DCLOBBER_CACHE_ALWAYS, I did verify that the patch correctly injects that #define when running an old branch. So I think it's ready to go into the buildfarm, modulo any cosmetic work you might want to do. regards, tom lane
Re: Preventing abort() and exit() calls in libpq
Noah Misch writes: > On Sat, Jul 03, 2021 at 10:45:59AM -0400, Tom Lane wrote: >> What I'm now thinking about is restricting the test to only be run on >> platforms where use of foo.a libraries is deprecated, so that we can >> be pretty sure that we won't hit this situation. Even if we only >> run the test on Linux, that'd be plenty to catch any mistakes. > Hmm. Static libraries are the rarer case on both AIX and Linux, but I'm not > aware of a relevant deprecation on either platform. If it comes to this, I'd > be more inclined to control the Makefile rule with an environment variable > (e.g. ENFORCE_LIBC_CALL_RESTRICTIONS) instead of reacting to the platform. That'd require buildfarm owner intervention, as well as intervention by users. Which seems like exporting our problems onto them. I'd really rather not go that way if we can avoid it. regards, tom lane
Re: Preventing abort() and exit() calls in libpq
On Sat, Jul 03, 2021 at 10:45:59AM -0400, Tom Lane wrote: > I'd imagined leaving, e.g., psprintf.c out of libpgcommon_shlib.a. > But if someone mistakenly introduced a psprintf call into libpq, > it'd still compile just fine; the symbol would be resolved against > psprintf in the calling application's code. I think that would fail to compile on Windows, where such references need exported symbols. We don't make an exports file for applications other than postgres.exe. So the strategy that inspired this may work. > What I'm now thinking about is restricting the test to only be run on > platforms where use of foo.a libraries is deprecated, so that we can > be pretty sure that we won't hit this situation. Even if we only > run the test on Linux, that'd be plenty to catch any mistakes. Hmm. Static libraries are the rarer case on both AIX and Linux, but I'm not aware of a relevant deprecation on either platform. If it comes to this, I'd be more inclined to control the Makefile rule with an environment variable (e.g. ENFORCE_LIBC_CALL_RESTRICTIONS) instead of reacting to the platform.
Re: Unbounded %s in sscanf
> On 28 Jun 2021, at 16:45, Daniel Gustafsson wrote: > >> On 28 Jun 2021, at 16:02, Tom Lane wrote: > >> Ugh. Shouldn't we instead modify the format to read not more than >> two characters? Even if this is safe on non-malicious input, it >> doesn't seem like good style. > > No disagreement, I was only basing it on what is in the tree. I would propose > that we change the sscanf in _LoadBlobs() too though to eliminate all such > callsites, even though that one is even safer. I'll prepare a patch once more > caffeine has been ingested. Returning to this, attached is a patchset which amends the two sscanf() callsites with their respective buffer sizes for %s format parsing. In pg_dump we need to inject the MAXPGPATH limit with the preprocessor and thus the buffer needs to be increased by one to account for the terminator (which is yet more hygiene coding since the fname buffer is now larger than the input buffer). While in here, I noticed that the fname variable is shadowed in the loop parsing the blobs TOC which yields a broken error message on parse errors. The attached 0003 fixes that. -- Daniel Gustafsson https://vmware.com/ v3-0003-Fix-bug-in-TOC-file-error-message-printing.patch Description: Binary data v3-0002-Fix-sscanf-limit-in-pg_dump.patch Description: Binary data v3-0001-Fix-sscanf-limit-in-pg_basebackup.patch Description: Binary data
Re: Synchronous commit behavior during network outage
On Sat, 2021-07-03 at 14:06 +0500, Andrey Borodin wrote: > > But until you've disabled sync rep, the primary will essentially be > > down for writes whether using this new feature or not. Even if you > > can > > terminate some backends to try to free space, the application will > > just > > make new connections that will get stuck the same way. > > Surely I'm talking about terminating postmaster, not individual > backends. But postmaster will need to terminate each running query. > We surely need to have a way to stop whole instance without making > any single query. And I do not like kill -9 for this purpose. kill -6 would suffice. I see the point that you don't want this to interfere with an administrative shutdown. But it seems like most shutdowns will need to escalate to SIGABRT for cases where things are going badly wrong (low memory, etc.) anyway. I don't see a better solution here. I don't fully understand why you'd be concerned about cancellation but not concerned about similar problems with termination, but if you think two GUCs are important I can do that. Regards, Jeff Davis
Re: cutting down the TODO list thread
On Thu, Jul 1, 2021 at 9:23 PM Bruce Momjian wrote: > Agreed. Please remove them or I can do it. Done, and also changed next release to "15". -- John Naylor EDB: http://www.enterprisedb.com
Re: logical replication worker accesses catalogs in error context callback
Bharath Rupireddy writes: > The patch basically looks good to me except a minor comment to have a > local variable for var->varattno which makes the code shorter. Here's a restructured version that uses rangetable data for the simple-relation case too. I also modified the relevant test cases so that it's visible that we're reporting aliases not true names. regards, tom lane diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out index 31b5de91ad..25112df916 100644 --- a/contrib/postgres_fdw/expected/postgres_fdw.out +++ b/contrib/postgres_fdw/expected/postgres_fdw.out @@ -4096,15 +4096,17 @@ DROP TABLE reind_fdw_parent; -- conversion error -- === ALTER FOREIGN TABLE ft1 ALTER COLUMN c8 TYPE int; -SELECT * FROM ft1 WHERE c1 = 1; -- ERROR +SELECT * FROM ft1 ftx(x1,x2,x3,x4,x5,x6,x7,x8) WHERE x1 = 1; -- ERROR ERROR: invalid input syntax for type integer: "foo" -CONTEXT: column "c8" of foreign table "ft1" -SELECT ft1.c1, ft2.c2, ft1.c8 FROM ft1, ft2 WHERE ft1.c1 = ft2.c1 AND ft1.c1 = 1; -- ERROR +CONTEXT: column "x8" of foreign table "ftx" +SELECT ftx.x1, ft2.c2, ftx.x8 FROM ft1 ftx(x1,x2,x3,x4,x5,x6,x7,x8), ft2 + WHERE ftx.x1 = ft2.c1 AND ftx.x1 = 1; -- ERROR ERROR: invalid input syntax for type integer: "foo" -CONTEXT: column "c8" of foreign table "ft1" -SELECT ft1.c1, ft2.c2, ft1 FROM ft1, ft2 WHERE ft1.c1 = ft2.c1 AND ft1.c1 = 1; -- ERROR +CONTEXT: column "x8" of foreign table "ftx" +SELECT ftx.x1, ft2.c2, ftx FROM ft1 ftx(x1,x2,x3,x4,x5,x6,x7,x8), ft2 + WHERE ftx.x1 = ft2.c1 AND ftx.x1 = 1; -- ERROR ERROR: invalid input syntax for type integer: "foo" -CONTEXT: whole-row reference to foreign table "ft1" +CONTEXT: whole-row reference to foreign table "ftx" SELECT sum(c2), array_agg(c8) FROM ft1 GROUP BY c8; -- ERROR ERROR: invalid input syntax for type integer: "foo" CONTEXT: processing expression at position 2 in select list diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c 
index fafbab6b02..696277ba10 100644 --- a/contrib/postgres_fdw/postgres_fdw.c +++ b/contrib/postgres_fdw/postgres_fdw.c @@ -302,16 +302,8 @@ typedef struct */ typedef struct ConversionLocation { - Relation rel; /* foreign table's relcache entry. */ AttrNumber cur_attno; /* attribute number being processed, or 0 */ - - /* - * In case of foreign join push down, fdw_scan_tlist is used to identify - * the Var node corresponding to the error location and - * fsstate->ss.ps.state gives access to the RTEs of corresponding relation - * to get the relation name and attribute name. - */ - ForeignScanState *fsstate; + ForeignScanState *fsstate; /* plan node being processed */ } ConversionLocation; /* Callback argument for ec_member_matches_foreign */ @@ -7082,7 +7074,6 @@ make_tuple_from_result_row(PGresult *res, /* * Set up and install callback to report where conversion error occurs. */ - errpos.rel = rel; errpos.cur_attno = 0; errpos.fsstate = fsstate; errcallback.callback = conversion_error_callback; @@ -7185,34 +7176,32 @@ make_tuple_from_result_row(PGresult *res, /* * Callback function which is called when error occurs during column value * conversion. Print names of column and relation. + * + * Note that this function mustn't do any catalog lookups, since we are in + * an already-failed transaction. Fortunately, we can get the needed info + * from the query's rangetable instead. 
*/ static void conversion_error_callback(void *arg) { + ConversionLocation *errpos = (ConversionLocation *) arg; + ForeignScanState *fsstate = errpos->fsstate; + ForeignScan *fsplan = castNode(ForeignScan, fsstate->ss.ps.plan); + int varno = 0; + AttrNumber colno = 0; const char *attname = NULL; const char *relname = NULL; bool is_wholerow = false; - ConversionLocation *errpos = (ConversionLocation *) arg; - if (errpos->rel) + if (fsplan->scan.scanrelid > 0) { /* error occurred in a scan against a foreign table */ - TupleDesc tupdesc = RelationGetDescr(errpos->rel); - Form_pg_attribute attr = TupleDescAttr(tupdesc, errpos->cur_attno - 1); - - if (errpos->cur_attno > 0 && errpos->cur_attno <= tupdesc->natts) - attname = NameStr(attr->attname); - else if (errpos->cur_attno == SelfItemPointerAttributeNumber) - attname = "ctid"; - - relname = RelationGetRelationName(errpos->rel); + varno = fsplan->scan.scanrelid; + colno = errpos->cur_attno; } else { /* error occurred in a scan against a foreign join */ - ForeignScanState *fsstate = errpos->fsstate; - ForeignScan *fsplan = castNode(ForeignScan, fsstate->ss.ps.plan); - EState *estate = fsstate->ss.ps.state; TargetEntry *tle; tle = list_nth_node(TargetEntry, fsplan->fdw_scan_tlist, @@ -7220,35 +7209,42 @@ conversion_error_callback(void *arg) /* * Target list can have Vars and expressions. For Vars, we can get -
Re: logical replication worker accesses catalogs in error context callback
Bharath Rupireddy writes: > Isn't it better to have the below comment before > slot_store_error_callback, similar to what's there before > conversion_error_callback in v7-0002. > * Note that this function mustn't do any catalog lookups, since we are in > * an already-failed transaction. Not really, as there's not much temptation to do so in the current form of that function. I have no desire to go around and plaster every errcontext callback with such comments. > Maybe it's worth considering > avoid_sys_cache_lookup_in_error_callback_v2.patch from [1] as another > way to enforce the above statement. That looks fundamentally broken from here. Wouldn't it forbid perfectly OK code like this randomly-chosen example from tablecmds.c? if (list_member_oid(inheritOids, parentOid)) ereport(ERROR, (errcode(ERRCODE_DUPLICATE_TABLE), errmsg("relation \"%s\" would be inherited from more than once", get_rel_name(parentOid; That is, it's a bit hard to say exactly where in the error processing sequence we should start deeming it unsafe to do a catalog lookup; but the mere presence of an error stack entry can't mean that. In a lot of situations, there wouldn't be any need to consider the transaction broken until we've done a longjmp, so that catalog lookups in errcontext callbacks would be perfectly safe. (Which indeed is why we've not yet seen an actual problem in either of the spots discussed in this thread.) The reason for being paranoid about what an errcontext callback can do is that the callback cannot know what the current failure's cause is. If we're trying to report some internal error that means that the transaction really is pretty broken, then it'd be problematic to initiate new catalog accesses. regards, tom lane
Re: [PATCH] Hooks at XactCommand level
Le 01/07/2021 à 18:47, Tom Lane a écrit : > Nicolas CHAHWEKILIAN writes: >> As far as I am concerned, I am totally awaiting for this kind of feature >> exposed here, for one single reason at this time : the extension >> pg_statement_rollback will be much more valuable with the ability of >> processing "rollback to savepoint" without the need for explicit >> instruction from client side (and this patch is giving this option). > What exactly do these hooks do that isn't done as well or better > by the RegisterXactCallback and RegisterSubXactCallback mechanisms? > Perhaps we need to define some additional event types for those? > Or pass more data to the callback functions? Sorry it took me some time to recall the reason for the hooks. Actually the problem is that the callbacks are not called when a statement is executed after an error, so that we fall back to the error: ERROR: current transaction is aborted, commands ignored until end of transaction block For example with the rollback at statement level extension: BEGIN; UPDATE tbl_rsl SET id = 'two', val = 2 WHERE id = 1; -- > will fail LOG: statement: UPDATE tbl_rsl SET id = 'two', val = 2 WHERE id = 1; ERROR: invalid input syntax for type integer: "two" LINE 1: UPDATE tbl_rsl SET id = 'two', val = 2 WHERE id = 1; ^ UPDATE tbl_rsl SET id = 'two', val = 2 WHERE id = 1; -- > will fail again LOG: statement: UPDATE tbl_rsl SET id = 'two', val = 2 WHERE id = 1; ERROR: current transaction is aborted, commands ignored until end of transaction block SELECT * FROM tbl_rsl; -- Should show records id 1 + 2 LOG: statement: SELECT * FROM tbl_rsl; ERROR: current transaction is aborted, commands ignored until end of transaction block With the extension and the hook on start_xact_command() we can continue and execute all the following statements. I have updated the patch to only keep the hook on start_xact_command(); as you've suggested, the other hooks can be replaced by the use of the xact callback.
The extension has also been updated for testing the feature, available here https://github.com/darold/pg_statement_rollbackv2 > I quite dislike inventing a hook that's defined as "run during > start_xact_command", because there is basically nothing that's > not ad-hoc about that function: it's internal to postgres.c > and both its responsibilities and its call sites have changed > over time. I think anyone hooking into that will be displeased > by the stability of their results. Unfortunately I have not found a better solution, but I just tried placing the hook in function BeginCommand() in src/backend/tcop/dest.c and the extension is working as expected. Do you think it would be a better place? In this case I can update the patch. For this feature we need a hook that is executed before any command, even if the transaction is in abort state, to be able to inject the rollback to savepoint; maybe I'm not looking at the right place to do that. Thanks -- Gilles Darold http://www.darold.net/ diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c index 8cea10c901..4b9f8eeb3c 100644 --- a/src/backend/tcop/postgres.c +++ b/src/backend/tcop/postgres.c @@ -207,6 +207,8 @@ static void log_disconnections(int code, Datum arg); static void enable_statement_timeout(void); static void disable_statement_timeout(void); +/* Hooks for plugins to get control at end of start_xact_command() */ +XactCommandStart_hook_type start_xact_command_hook = NULL; /* * routines to obtain user input @@ -2708,6 +2710,13 @@ start_xact_command(void) !get_timeout_active(CLIENT_CONNECTION_CHECK_TIMEOUT)) enable_timeout_after(CLIENT_CONNECTION_CHECK_TIMEOUT, client_connection_check_interval); + + /* + * Now give loadable modules a chance to execute code before a transaction + * command is processed.
+ */ + if (start_xact_command_hook) + (*start_xact_command_hook) (); } static void diff --git a/src/include/tcop/pquery.h b/src/include/tcop/pquery.h index 2318f04ff0..540ede42fd 100644 --- a/src/include/tcop/pquery.h +++ b/src/include/tcop/pquery.h @@ -48,4 +48,8 @@ extern bool PlannedStmtRequiresSnapshot(struct PlannedStmt *pstmt); extern void EnsurePortalSnapshotExists(void); +/* Hook for plugins to get control in start_xact_command() and finish_xact_command() */ +typedef void (*XactCommandStart_hook_type) (void); +extern PGDLLIMPORT XactCommandStart_hook_type start_xact_command_hook; + #endif /* PQUERY_H */ diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list index 64c06cf952..f473c2dc39 100644 --- a/src/tools/pgindent/typedefs.list +++ b/src/tools/pgindent/typedefs.list @@ -2934,6 +2934,7 @@ XPVIV XPVMG XactCallback XactCallbackItem +XactCommandStart_hook_type XactEvent XactLockTableWaitInfo XidBoundsViolation
Re: rand48 replacement
1. PostgreSQL source uses `uint64` and `uint32`, but not `uint64_t`/`uint32_t` Indeed you are right. Attached v6 does that as well. -- Fabien.diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c index 2c2f149fb0..146b524076 100644 --- a/contrib/file_fdw/file_fdw.c +++ b/contrib/file_fdw/file_fdw.c @@ -1188,7 +1188,7 @@ file_acquire_sample_rows(Relation onerel, int elevel, * Found a suitable tuple, so save it, replacing one old tuple * at random */ -int k = (int) (targrows * sampler_random_fract(rstate.randstate)); +int k = (int) (targrows * sampler_random_fract()); Assert(k >= 0 && k < targrows); heap_freetuple(rows[k]); diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c index fafbab6b02..3009861e45 100644 --- a/contrib/postgres_fdw/postgres_fdw.c +++ b/contrib/postgres_fdw/postgres_fdw.c @@ -5152,7 +5152,7 @@ analyze_row_processor(PGresult *res, int row, PgFdwAnalyzeState *astate) if (astate->rowstoskip <= 0) { /* Choose a random reservoir element to replace. 
*/ - pos = (int) (targrows * sampler_random_fract(astate->rstate.randstate)); + pos = (int) (targrows * sampler_random_fract(>rstate.randstate)); Assert(pos >= 0 && pos < targrows); heap_freetuple(astate->rows[pos]); } diff --git a/contrib/tsm_system_rows/tsm_system_rows.c b/contrib/tsm_system_rows/tsm_system_rows.c index 4996612902..1a46d4b143 100644 --- a/contrib/tsm_system_rows/tsm_system_rows.c +++ b/contrib/tsm_system_rows/tsm_system_rows.c @@ -69,7 +69,7 @@ static BlockNumber system_rows_nextsampleblock(SampleScanState *node, BlockNumbe static OffsetNumber system_rows_nextsampletuple(SampleScanState *node, BlockNumber blockno, OffsetNumber maxoffset); -static uint32 random_relative_prime(uint32 n, SamplerRandomState randstate); +static uint32 random_relative_prime(uint32 n, pg_prng_state *randstate); /* @@ -213,25 +213,25 @@ system_rows_nextsampleblock(SampleScanState *node, BlockNumber nblocks) if (sampler->step == 0) { /* Initialize now that we have scan descriptor */ - SamplerRandomState randstate; + pg_prng_state randstate; /* If relation is empty, there's nothing to scan */ if (nblocks == 0) return InvalidBlockNumber; /* We only need an RNG during this setup step */ - sampler_random_init_state(sampler->seed, randstate); + sampler_random_init_state(sampler->seed, ); /* Compute nblocks/firstblock/step only once per query */ sampler->nblocks = nblocks; /* Choose random starting block within the relation */ /* (Actually this is the predecessor of the first block visited) */ - sampler->firstblock = sampler_random_fract(randstate) * + sampler->firstblock = sampler_random_fract() * sampler->nblocks; /* Find relative prime as step size for linear probing */ - sampler->step = random_relative_prime(sampler->nblocks, randstate); + sampler->step = random_relative_prime(sampler->nblocks, ); } /* Reinitialize lb */ @@ -317,7 +317,7 @@ gcd(uint32 a, uint32 b) * (else return 1). 
*/ static uint32 -random_relative_prime(uint32 n, SamplerRandomState randstate) +random_relative_prime(uint32 n, pg_prng_state *randstate) { uint32 r; diff --git a/contrib/tsm_system_time/tsm_system_time.c b/contrib/tsm_system_time/tsm_system_time.c index 788d8f9a68..36acc6c106 100644 --- a/contrib/tsm_system_time/tsm_system_time.c +++ b/contrib/tsm_system_time/tsm_system_time.c @@ -69,7 +69,7 @@ static BlockNumber system_time_nextsampleblock(SampleScanState *node, BlockNumbe static OffsetNumber system_time_nextsampletuple(SampleScanState *node, BlockNumber blockno, OffsetNumber maxoffset); -static uint32 random_relative_prime(uint32 n, SamplerRandomState randstate); +static uint32 random_relative_prime(uint32 n, pg_prng_state *randstate); /* @@ -224,25 +224,25 @@ system_time_nextsampleblock(SampleScanState *node, BlockNumber nblocks) if (sampler->step == 0) { /* Initialize now that we have scan descriptor */ - SamplerRandomState randstate; + pg_prng_state randstate; /* If relation is empty, there's nothing to scan */ if (nblocks == 0) return InvalidBlockNumber; /* We only need an RNG during this setup step */ - sampler_random_init_state(sampler->seed, randstate); + sampler_random_init_state(sampler->seed, ); /* Compute nblocks/firstblock/step only once per query */ sampler->nblocks = nblocks; /* Choose random starting block within the relation */ /* (Actually this is the predecessor of the first block visited) */ - sampler->firstblock = sampler_random_fract(randstate) * + sampler->firstblock = sampler_random_fract() * sampler->nblocks; /* Find relative prime as step size for linear probing */ - sampler->step = random_relative_prime(sampler->nblocks, randstate); + sampler->step = random_relative_prime(sampler->nblocks,
Re: rand48 replacement
Hello Yura,

> 1. PostgreSQL source uses `uint64` and `uint32`, but not `uint64_t`/`uint32_t`
> 2. I don't see why pg_prng_state could not be `typedef uint64 pg_prng_state[2];`

It could, but I do not see that as desirable. From an API design point of view we want something clean and abstract, and for me a struct looks better for that. It would be a struct with an array inside; I think that the code is more readable by avoiding constant index accesses (s[0] vs s0), and we do not need actual indexes.

> 3. Then SamplerRandomState and pgbench RandomState could stay. The patch will be a lot shorter.

You mean "typedef pg_prng_state SamplerRandomState"? One point of the patch is to have "one" standard PRNG commonly used where one is needed, so I'd say we want the name to be used, hence the substitutions. Also, I have a thing against objects being named "Random" which are not random, which is highly misleading. A PRNG is purely deterministic. Removing misleading names is also a benefit. So if people want to keep the old name it can be done, but I see these name changes as desirable.

> I don't like the mix of semantic refactoring and syntactic refactoring in the same patch. While I could agree with replacing `SamplerRandomState => pg_prng_state`, I'd rather see it in a separate commit. And that separate commit could contain the transition: `typedef uint64 pg_prng_state[2];` => `typedef struct { uint64 s0, s1 } pg_prng_state;`

I would tend to agree on principle, but separating in two phases here looks pointless: why implement a cleaner rand48 interface, which would then NOT be the rand48 standard, just to upgrade it to something else in the next commit? And the other path is as painful and pointless. So I think that the new feature better comes with its associated refactoring, which is an integral part of it.

> 4. There is no need for ReservoirStateData->randstate_initialized.
> There could be a macro/function: `bool pg_prng_initiated(state) { return (state[0]|state[1]) != 0; }`

It would work for this peculiar implementation but not necessarily for others that we may want to substitute later, as it would mean either breaking the interface or adding a boolean in the structure if there is no special uninitialized state that can be detected, which would impact memory usage and alignment. So I think it is better to keep it that way: usually the user knows whether their structure has been initialized, and the special case for reservoir, where the user does not seem to know about that, can handle its own boolean without impacting the common API or the data structure.

> 5. Is there a need for a 128 bit prng at all?

This is a 64 bit PRNG with a 128 bit state. We are generating 64 bit values, so we want a 64 bit PRNG. A PRNG state must be larger than its generated value, so we need sizeof(state) > 64 bits, hence at least 128 bits if we add 128 bit memory alignment.

> And there is 4*32bit xoshiro128: https://prng.di.unimi.it/xoshiro128plusplus.c 32bit operations are faster on 32bit platforms. But 32bit platforms are quite rare in production these days. Therefore I don't have a strong opinion on this.

I think that 99.9% of the hardware running postgres is 64 bits, so 64 bits is the right choice.

-- Fabien.
Re: Preventing abort() and exit() calls in libpq
I wrote: > Hmmm ... mine is 8.4.1. > I'm about to go out to dinner, but will check into this with some > newer gcc versions later. Tried --enable-coverage on Fedora 34 (with gcc 11.1.1) and sure enough there's an exit() call being inserted. I've pushed a fix to just disable the check altogether in --enable-coverage builds. regards, tom lane
Re: OpenSSL 3.0.0 compatibility
On 12.03.21 08:51, Peter Eisentraut wrote: On 11.03.21 11:41, Daniel Gustafsson wrote: .. and apply the padding changes as proposed in a patch upthread like this (these work for all OpenSSL versions I've tested, and I'm rather more puzzled as to why we got away with not having them in the past): Yes, before proceeding with this, we should probably understand why these changes are effective and why they haven't been required in the past. I took another look at this with openssl-3.0.0-beta1. The issue with the garbled padding output is still there. What I found is that pgcrypto has been using the encryption and decryption API slightly incorrectly. You are supposed to call EVP_DecryptUpdate() followed by EVP_DecryptFinal_ex() (and similarly for encryption), but pgcrypto doesn't do the second one. (To be fair, this API was added to OpenSSL after pgcrypto first appeared.) The "final" functions take care of the padding. We have been getting away with it like this because we do the padding manually elsewhere. But apparently, something has changed in OpenSSL 3.0.0 in that if padding is enabled in OpenSSL, EVP_DecryptUpdate() doesn't flush the last normal block until the "final" function is called. Your proposed fix was to explicitly disable padding, and then this problem goes away. You can still call the "final" functions, but they won't do anything, except check that there is no more data left, but we already check that elsewhere. Another option is to throw out our own padding code and let OpenSSL handle it. See attached demo patch. But that breaks the non-OpenSSL code in internal.c, so we'd have to re-add the padding code there. So this isn't quite as straightforward an option. (At least, with the patch we can confirm that the OpenSSL padding works consistently with our own implementation.) So I think your proposed patch is sound and a good short-term and low-risk solution. 
From be5882c7e7b80a6d213eee88a8960f6c9773f958 Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Sat, 3 Jul 2021 15:39:30 +0200 Subject: [PATCH] Use EVP_EncryptFinal_ex() and EVP_DecryptFinal_ex() --- contrib/pgcrypto/openssl.c | 22 +-- contrib/pgcrypto/pgp-cfb.c | 4 +- contrib/pgcrypto/px.c | 119 + contrib/pgcrypto/px.h | 12 ++-- 4 files changed, 27 insertions(+), 130 deletions(-) diff --git a/contrib/pgcrypto/openssl.c b/contrib/pgcrypto/openssl.c index ed8e74a2b9..68fd61b716 100644 --- a/contrib/pgcrypto/openssl.c +++ b/contrib/pgcrypto/openssl.c @@ -369,16 +369,18 @@ gen_ossl_free(PX_Cipher *c) } static int -gen_ossl_decrypt(PX_Cipher *c, const uint8 *data, unsigned dlen, -uint8 *res) +gen_ossl_decrypt(PX_Cipher *c, int padding, const uint8 *data, unsigned dlen, +uint8 *res, unsigned *rlen) { OSSLCipher *od = c->ptr; - int outlen; + int outlen, outlen2; if (!od->init) { if (!EVP_DecryptInit_ex(od->evp_ctx, od->evp_ciph, NULL, NULL, NULL)) return PXE_CIPHER_INIT; + if (!EVP_CIPHER_CTX_set_padding(od->evp_ctx, padding)) + return PXE_CIPHER_INIT; if (!EVP_CIPHER_CTX_set_key_length(od->evp_ctx, od->klen)) return PXE_CIPHER_INIT; if (!EVP_DecryptInit_ex(od->evp_ctx, NULL, NULL, od->key, od->iv)) @@ -388,21 +390,26 @@ gen_ossl_decrypt(PX_Cipher *c, const uint8 *data, unsigned dlen, if (!EVP_DecryptUpdate(od->evp_ctx, res, &outlen, data, dlen)) return PXE_DECRYPT_FAILED; + if (!EVP_DecryptFinal_ex(od->evp_ctx, res + outlen, &outlen2)) + return PXE_DECRYPT_FAILED; + *rlen = outlen + outlen2; return 0; } static int -gen_ossl_encrypt(PX_Cipher *c, const uint8 *data, unsigned dlen, -uint8 *res) +gen_ossl_encrypt(PX_Cipher *c, int padding, const uint8 *data, unsigned dlen, +uint8 *res, unsigned *rlen) { OSSLCipher *od = c->ptr; - int outlen; + int outlen, outlen2; if (!od->init) { if (!EVP_EncryptInit_ex(od->evp_ctx, od->evp_ciph, NULL, NULL, NULL)) return PXE_CIPHER_INIT; + if (!EVP_CIPHER_CTX_set_padding(od->evp_ctx, padding)) + return PXE_CIPHER_INIT; if
(!EVP_CIPHER_CTX_set_key_length(od->evp_ctx, od->klen)) return PXE_CIPHER_INIT; if (!EVP_EncryptInit_ex(od->evp_ctx, NULL, NULL, od->key, od->iv)) @@ -412,6 +419,9 @@ gen_ossl_encrypt(PX_Cipher *c, const uint8 *data, unsigned dlen, if (!EVP_EncryptUpdate(od->evp_ctx, res, &outlen, data, dlen)) return PXE_ENCRYPT_FAILED; + if (!EVP_EncryptFinal_ex(od->evp_ctx,
Re: Preventing abort() and exit() calls in libpq
I wrote: > I'm now wondering about applying the test to *.o in libpq, > as well as libpgport_shlib.a and libpgcommon_shlib.a. > The latter would require some code changes, and it would make > the prohibition extend further than libpq alone. On the bright > side, we could reinstate the check for abort(). After consuming a bit more caffeine, I'm afraid that won't work. I'd imagined leaving, e.g., psprintf.c out of libpgcommon_shlib.a. But if someone mistakenly introduced a psprintf call into libpq, it'd still compile just fine; the symbol would be resolved against psprintf in the calling application's code. We'd only detect a failure when trying to use libpq with an app that didn't contain that function, which feels like something that our own testing could miss. What I'm now thinking about is restricting the test to only be run on platforms where use of foo.a libraries is deprecated, so that we can be pretty sure that we won't hit this situation. Even if we only run the test on Linux, that'd be plenty to catch any mistakes. regards, tom lane
Re: rand48 replacement
Yura Sokolov writes: > 2. I don't see why pg_prng_state could not be `typedef uint64 > pg_prng_state[2];` Please no. That sort of typedef behaves so weirdly that it's a foot-gun. regards, tom lane
Re: Preventing abort() and exit() calls in libpq
Noah Misch writes: > On Fri, Jul 02, 2021 at 11:20:17AM -0400, Tom Lane wrote: >> Ugh. What in the world is producing those references? > Those come from a statically-linked libldap_r: Blech! I wonder if there is some way to avoid counting that. It's not really hard to imagine that such a library might contain an exit() call, for example, thus negating our test altogether. I'm now wondering about applying the test to *.o in libpq, as well as libpgport_shlib.a and libpgcommon_shlib.a. The latter would require some code changes, and it would make the prohibition extend further than libpq alone. On the bright side, we could reinstate the check for abort(). regards, tom lane
Re: Keep notnullattrs in RelOptInfo (Was part of UniqueKey patch series)
Hi:

I'd like to start work on UniqueKey again; it would be great if we can target it for PG 15. The attached patch is just for the notnull_attrs. Since we can't say whether a column is nullable without saying in which result set, I think attaching it to RelOptInfo is unavoidable. Here is how my patch works.

@@ -686,6 +686,12 @@ typedef struct RelOptInfo /* default result targetlist for Paths scanning this relation */ struct PathTarget *reltarget; /* list of Vars/Exprs, cost, width */ + Bitmapset **notnull_attrs; /* The attnos which are not null after evaluating + * all the quals on this relation; for baserel + * the len would always be 1, and for others the array + * index is relid from relids. + */ +

For baserel, it records the notnull attrs as a bitmapset and stores it in RelOptInfo->notnull_attrs[0]. As for the joinrel, suppose the relids is {1, 3, 5}; then notnull_attrs[1/3/5] will be used to store the notnull_attrs Bitmapset for relations 1, 3, 5 separately. I don't handle this for all kinds of upper relations and subqueries so far, since UniqueKey doesn't rely on it and it looks like more stuff would need to be handled there.

The patch also includes some debug messages in set_baserel/joinrel_notnullattrs, and I attached test.sql for easier review. Any feedback is welcome, and I hope this implementation will not block the UniqueKey work.

v1-0001-add-the-not-null-attrs-for-RelOptInfo.-Here-is-ho.patch Description: Binary data
Re: [PATCH v3 1/1] Fix detection of preadv/pwritev support for OSX.
On 21.06.21 07:22, Thomas Munro wrote: I'm not personally against the proposed change. I'll admit there is something annoying about Apple's environment working in a way that doesn't suit traditional configure macros that have been the basis of portable software for a few decades, but when all's said and done, configure is a Unix wars era way to make things work across all the Unixes, and most of them are long gone, configure itself is on the way out, and Apple's still here, so... I think this change is perfectly appropriate (modulo some small cleanups). The objection was that you cannot reliably use AC_CHECK_FUNCS (and therefore AC_REPLACE_FUNCS) anymore, but that has always been true, since AC_CHECK_FUNCS doesn't handle macros, compiler built-ins, and functions that are not declared, or any other situation where looking for a symbol in a library is not the same as checking whether the symbol actually works for your purpose. This is not too different from the long transition from "does this header file exist" to "can I compile this header file". So in fact the correct way forward would be to get rid of all uses of AC_CHECK_FUNCS and related, and then this problem goes away by itself.
Re: Remove useless int64 range checks on BIGINT sequence MINVALUE/MAXVALUE values
On 21.06.21 13:32, Greg Nancarrow wrote: On Mon, Jun 21, 2021 at 8:32 PM David Rowley wrote: It might be worth putting in a comment to mention that the check is not needed. Just in case someone looks again one day and thinks the checks are missing. Probably best to put this in the July commitfest so it does not get missed. Updated the patch, and will add it to the Commitfest, thanks. I don't think this is a good change. It replaces one perfectly solid, harmless, and readable line of code with six lines of comment explaining why the code isn't necessary (times two). And the code is now less robust against changes elsewhere. To maintain this robustness, you'd have to add assertions that prove that what the comment is saying is actually true, thus adding even more code. I think we should leave it as is.
Re: Add ZSON extension to /contrib/
On 04.06.21 17:09, Aleksander Alekseev wrote: I decided to add the patch to the nearest commitfest. With respect to the commit fest submission, I don't think there is consensus right now to add this. I think people would prefer that this dictionary facility be somehow made available in the existing JSON types. Also, I sense that there is still some volatility about some of the details of how this extension should work and its scope. I think this is served best as an external extension for now.
Re: rand48 replacement
Fabien COELHO wrote 2021-07-03 11:45:
And a v5 where an unused test file does also compile if we insist.

About the patch:

1. PostgreSQL source uses `uint64` and `uint32`, but not `uint64_t`/`uint32_t`

2. I don't see why pg_prng_state could not be `typedef uint64 pg_prng_state[2];`

3. Then SamplerRandomState and pgbench RandomState could stay. The patch will be a lot shorter.

I don't like the mix of semantic refactoring and syntactic refactoring in the same patch. While I could agree with replacing `SamplerRandomState => pg_prng_state`, I'd rather see it in a separate commit. And that separate commit could contain the transition: `typedef uint64 pg_prng_state[2];` => `typedef struct { uint64 s0, s1 } pg_prng_state;`

4. There is no need for ReservoirStateData->randstate_initialized. There could be a macro/function: `bool pg_prng_initiated(state) { return (state[0]|state[1]) != 0; }`

5. Is there a need for a 128-bit (2*64-bit) PRNG at all? There is a 2*32bit xoroshiro64: https://prng.di.unimi.it/xoroshiro64starstar.c And there is a 4*32bit xoshiro128: https://prng.di.unimi.it/xoshiro128plusplus.c 32bit operations are faster on 32bit platforms. But 32bit platforms are quite rare in production these days. Therefore I don't have a strong opinion on this.

regards, Sokolov Yura.
Re: WIP: Relaxing the constraints on numeric scale
Attached is a more complete patch, with updated docs and tests. I chose to allow the scale to be in the range -1000 to 1000, which, to some extent, is quite arbitrary. The upper limit of 1000 makes sense, because nearly all numeric computations (other than multiply, add and subtract) have that as their upper scale limit (that's the maximum display scale). It also has to be at least 1000 for SQL compliance, since the precision can be up to 1000. The lower limit, on the other hand, really is quite arbitrary. -1000 is a nice round number, giving it a certain symmetry, and is almost certainly sufficient for any realistic use case (-1000 means numbers are rounded to the nearest multiple of 10^1000). Also, keeping some spare bits in the typemod might come in handy one day for something else (e.g., rounding mode choice). Regards, Dean diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml new file mode 100644 index c473d6a..97e4cdf --- a/doc/src/sgml/datatype.sgml +++ b/doc/src/sgml/datatype.sgml @@ -545,8 +545,8 @@ NUMERIC(precision, scale) - The precision must be positive, the scale zero or positive. - Alternatively: + The precision must be positive, the scale may be positive or negative + (see below). Alternatively: NUMERIC(precision) @@ -578,11 +578,41 @@ NUMERIC If the scale of a value to be stored is greater than the declared scale of the column, the system will round the value to the specified - number of fractional digits. Then, if the number of digits to the - left of the decimal point exceeds the declared precision minus the - declared scale, an error is raised. + number of fractional digits. If the declared scale of the column is + negative, the value will be rounded to the left of the decimal point. + If, after rounding, the number of digits to the left of the decimal point + exceeds the declared precision minus the declared scale, an error is + raised. 
Similarly, if the declared scale exceeds the declared precision + and the number of zero digits to the right of the decimal point is less + than the declared scale minus the declared precision, an error is raised. + For example, a column declared as + +NUMERIC(3, 1) + + will round values to 1 decimal place and be able to store values between + -99.9 and 99.9, inclusive. A column declared as + +NUMERIC(2, -3) + + will round values to the nearest thousand and be able to store values + between -99000 and 99000, inclusive. A column declared as + +NUMERIC(3, 5) + + will round values to 5 decimal places and be able to store values between + -0.00999 and 0.00999, inclusive. + + + The scale in a NUMERIC type declaration may be any value in + the range -1000 to 1000. (The SQL standard requires + the scale to be in the range 0 to precision. + Using values outside this range may not be portable to other database + systems.) + + + Numeric values are physically stored without any extra leading or trailing zeroes. Thus, the declared precision and scale of a column diff --git a/src/backend/utils/adt/numeric.c b/src/backend/utils/adt/numeric.c new file mode 100644 index eb78f0b..2001d75 --- a/src/backend/utils/adt/numeric.c +++ b/src/backend/utils/adt/numeric.c @@ -250,6 +250,17 @@ struct NumericData | ((n)->choice.n_short.n_header & NUMERIC_SHORT_WEIGHT_MASK)) \ : ((n)->choice.n_long.n_weight)) +/* + * Pack the numeric precision and scale in the typmod value. The upper 16 + * bits are used for the precision, and the lower 16 bits for the scale. Note + * that the scale may be negative, so use sign extension when unpacking it. + */ + +#define MAKE_TYPMOD(p, s) ((((p) << 16) | ((s) & 0xffff)) + VARHDRSZ) + +#define TYPMOD_PRECISION(t) ((((t) - VARHDRSZ) >> 16) & 0xffff) +#define TYPMOD_SCALE(t) ((int32) ((int16) (((t) - VARHDRSZ) & 0xffff))) + /* -- * NumericVar is the format we use for arithmetic.
The digit-array part * is the same as the NumericData storage format, but the header is more @@ -826,7 +837,7 @@ numeric_maximum_size(int32 typmod) return -1; /* precision (ie, max # of digits) is in upper bits of typmod */ - precision = ((typmod - VARHDRSZ) >> 16) & 0xffff; + precision = TYPMOD_PRECISION(typmod); /* * This formula computes the maximum number of NumericDigits we could need @@ -1080,10 +1091,10 @@ numeric_support(PG_FUNCTION_ARGS) Node *source = (Node *) linitial(expr->args); int32 old_typmod = exprTypmod(source); int32 new_typmod = DatumGetInt32(((Const *) typmod)->constvalue); - int32 old_scale = (old_typmod - VARHDRSZ) & 0xffff; - int32 new_scale = (new_typmod - VARHDRSZ) & 0xffff; - int32 old_precision = (old_typmod - VARHDRSZ) >> 16 & 0xffff; - int32 new_precision = (new_typmod - VARHDRSZ) >> 16 & 0xffff; + int32 old_scale = TYPMOD_SCALE(old_typmod); + int32 new_scale =
Re: Synchronous commit behavior during network outage
> On 3 Jul 2021, at 01:15, Jeff Davis wrote: > > On Fri, 2021-07-02 at 11:39 +0500, Andrey Borodin wrote: >> If the failover happens due to unresponsive node we cannot just turn >> off sync rep. We need to have some spare connections for that (number >> of stuck backends will skyrocket during network partitioning). We >> need available descriptors and some memory to fork new backend. We >> will need to re-read config. We need time to try after all. >> At some failures we may lack some of these. > > I think it's a good point that, when things start to go wrong, they can > go very wrong very quickly. > > But until you've disabled sync rep, the primary will essentially be > down for writes whether using this new feature or not. Even if you can > terminate some backends to try to free space, the application will just > make new connections that will get stuck the same way. Surely I'm talking about terminating the postmaster, not individual backends. But the postmaster will need to terminate each running query. We surely need a way to stop the whole instance without issuing a single query. And I do not like kill -9 for this purpose. > > You can avoid the "fork backend" problem by keeping a connection always > open from the HA tool, or by editing the conf to disable sync rep and > issuing SIGHUP instead. Granted, that still takes some memory. > >> Partial degradation is already hard task. Without ability to easily >> terminate running Postgres HA tool will often resort to SIGKILL. > > When the system is really wedged as you describe (waiting on sync rep, > tons of connections, and low memory), what information do you expect > the HA tool to be able to collect, and what actions do you expect it to > take? The HA tool is not going to collect anything. It just calls pg_ctl stop [0] or its equivalent. > > Presumably, you'd want it to disable sync rep at some point to get back > online. Where does SIGTERM fit into the picture?
The HA tool is going to terminate the running instance, rewind it, switch to the new timeline, and enroll into the cluster again as a standby. > >>> If you don't handle the termination case, then there's still a >>> chance >>> for the transaction to become visible to other clients before it's >>> replicated. >> >> Termination is an admin command, they know what they are doing. >> Cancelation is part of the user protocol. > > From the pg_terminate_backend() docs: "This is also allowed if the > calling role is a member of the role whose backend is being terminated > or the calling role has been granted pg_signal_backend", so it's not > really an admin command. Even for an admin, it might be hard to > understand why terminating a backend could result in losing a visible > transaction. Ok, I see backend termination is not described as an admin command. We cannot prevent users from doing stupid things; they are able to delete their data anyway. > I'm not really seeing two use cases here for two GUCs. Are you sure you > want to disable only cancels but allow termination to proceed? Yes, I'm sure. I had been running production with termination disabled for some weeks; cluster repair was much slower. For some reason kill-9-ed instances were successfully rewound much less often. But maybe I've done something wrong. If we can stop the whole instance the same way as we did without activating the proposed GUC, there is no problem. Thanks! Best regards, Andrey Borodin. [0] https://github.com/zalando/patroni/blob/master/patroni/postgresql/postmaster.py#L155
Re: rand48 replacement
Here is a v4, which: - moves the stuff to common and fully removes random/srandom (Tom) - includes a range generation function based on the bitmask method (Dean) but iterates with splitmix so that the state always advances once (Me) And a v5 where an unused test file does also compile if we insist. -- Fabien.

diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c index 2c2f149fb0..146b524076 100644 --- a/contrib/file_fdw/file_fdw.c +++ b/contrib/file_fdw/file_fdw.c @@ -1188,7 +1188,7 @@ file_acquire_sample_rows(Relation onerel, int elevel, * Found a suitable tuple, so save it, replacing one old tuple * at random */ -int k = (int) (targrows * sampler_random_fract(rstate.randstate)); +int k = (int) (targrows * sampler_random_fract(&rstate.randstate)); Assert(k >= 0 && k < targrows); heap_freetuple(rows[k]); diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c index fafbab6b02..3009861e45 100644 --- a/contrib/postgres_fdw/postgres_fdw.c +++ b/contrib/postgres_fdw/postgres_fdw.c @@ -5152,7 +5152,7 @@ analyze_row_processor(PGresult *res, int row, PgFdwAnalyzeState *astate) if (astate->rowstoskip <= 0) { /* Choose a random reservoir element to replace.
*/ - pos = (int) (targrows * sampler_random_fract(astate->rstate.randstate)); + pos = (int) (targrows * sampler_random_fract(&astate->rstate.randstate)); Assert(pos >= 0 && pos < targrows); heap_freetuple(astate->rows[pos]); } diff --git a/contrib/tsm_system_rows/tsm_system_rows.c b/contrib/tsm_system_rows/tsm_system_rows.c index 4996612902..1a46d4b143 100644 --- a/contrib/tsm_system_rows/tsm_system_rows.c +++ b/contrib/tsm_system_rows/tsm_system_rows.c @@ -69,7 +69,7 @@ static BlockNumber system_rows_nextsampleblock(SampleScanState *node, BlockNumbe static OffsetNumber system_rows_nextsampletuple(SampleScanState *node, BlockNumber blockno, OffsetNumber maxoffset); -static uint32 random_relative_prime(uint32 n, SamplerRandomState randstate); +static uint32 random_relative_prime(uint32 n, pg_prng_state *randstate); /* @@ -213,25 +213,25 @@ system_rows_nextsampleblock(SampleScanState *node, BlockNumber nblocks) if (sampler->step == 0) { /* Initialize now that we have scan descriptor */ - SamplerRandomState randstate; + pg_prng_state randstate; /* If relation is empty, there's nothing to scan */ if (nblocks == 0) return InvalidBlockNumber; /* We only need an RNG during this setup step */ - sampler_random_init_state(sampler->seed, randstate); + sampler_random_init_state(sampler->seed, &randstate); /* Compute nblocks/firstblock/step only once per query */ sampler->nblocks = nblocks; /* Choose random starting block within the relation */ /* (Actually this is the predecessor of the first block visited) */ - sampler->firstblock = sampler_random_fract(randstate) * + sampler->firstblock = sampler_random_fract(&randstate) * sampler->nblocks; /* Find relative prime as step size for linear probing */ - sampler->step = random_relative_prime(sampler->nblocks, randstate); + sampler->step = random_relative_prime(sampler->nblocks, &randstate); } /* Reinitialize lb */ @@ -317,7 +317,7 @@ gcd(uint32 a, uint32 b) * (else return 1).
*/ static uint32 -random_relative_prime(uint32 n, SamplerRandomState randstate) +random_relative_prime(uint32 n, pg_prng_state *randstate) { uint32 r; diff --git a/contrib/tsm_system_time/tsm_system_time.c b/contrib/tsm_system_time/tsm_system_time.c index 788d8f9a68..36acc6c106 100644 --- a/contrib/tsm_system_time/tsm_system_time.c +++ b/contrib/tsm_system_time/tsm_system_time.c @@ -69,7 +69,7 @@ static BlockNumber system_time_nextsampleblock(SampleScanState *node, BlockNumbe static OffsetNumber system_time_nextsampletuple(SampleScanState *node, BlockNumber blockno, OffsetNumber maxoffset); -static uint32 random_relative_prime(uint32 n, SamplerRandomState randstate); +static uint32 random_relative_prime(uint32 n, pg_prng_state *randstate); /* @@ -224,25 +224,25 @@ system_time_nextsampleblock(SampleScanState *node, BlockNumber nblocks) if (sampler->step == 0) { /* Initialize now that we have scan descriptor */ - SamplerRandomState randstate; + pg_prng_state randstate; /* If relation is empty, there's nothing to scan */ if (nblocks == 0) return InvalidBlockNumber; /* We only need an RNG during this setup step */ - sampler_random_init_state(sampler->seed, randstate); + sampler_random_init_state(sampler->seed, &randstate); /* Compute nblocks/firstblock/step only once per query */ sampler->nblocks = nblocks; /* Choose random starting block within the relation */ /* (Actually this is the predecessor of the first block visited) */ - sampler->firstblock = sampler_random_fract(randstate) * + sampler->firstblock = sampler_random_fract(&randstate) * sampler->nblocks; /* Find relative prime as step size for linear probing */ -
Re: rand48 replacement
Hello Dean & Tom, Here is a v4, which: - moves the stuff to common and fully removes random/srandom (Tom) - includes a range generation function based on the bitmask method (Dean) but iterates with splitmix so that the state always advances once (Me) -- Fabien.

diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c index 2c2f149fb0..146b524076 100644 --- a/contrib/file_fdw/file_fdw.c +++ b/contrib/file_fdw/file_fdw.c @@ -1188,7 +1188,7 @@ file_acquire_sample_rows(Relation onerel, int elevel, * Found a suitable tuple, so save it, replacing one old tuple * at random */ -int k = (int) (targrows * sampler_random_fract(rstate.randstate)); +int k = (int) (targrows * sampler_random_fract(&rstate.randstate)); Assert(k >= 0 && k < targrows); heap_freetuple(rows[k]); diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c index fafbab6b02..3009861e45 100644 --- a/contrib/postgres_fdw/postgres_fdw.c +++ b/contrib/postgres_fdw/postgres_fdw.c @@ -5152,7 +5152,7 @@ analyze_row_processor(PGresult *res, int row, PgFdwAnalyzeState *astate) if (astate->rowstoskip <= 0) { /* Choose a random reservoir element to replace.
*/ - pos = (int) (targrows * sampler_random_fract(astate->rstate.randstate)); + pos = (int) (targrows * sampler_random_fract(&astate->rstate.randstate)); Assert(pos >= 0 && pos < targrows); heap_freetuple(astate->rows[pos]); } diff --git a/contrib/tsm_system_rows/tsm_system_rows.c b/contrib/tsm_system_rows/tsm_system_rows.c index 4996612902..1a46d4b143 100644 --- a/contrib/tsm_system_rows/tsm_system_rows.c +++ b/contrib/tsm_system_rows/tsm_system_rows.c @@ -69,7 +69,7 @@ static BlockNumber system_rows_nextsampleblock(SampleScanState *node, BlockNumbe static OffsetNumber system_rows_nextsampletuple(SampleScanState *node, BlockNumber blockno, OffsetNumber maxoffset); -static uint32 random_relative_prime(uint32 n, SamplerRandomState randstate); +static uint32 random_relative_prime(uint32 n, pg_prng_state *randstate); /* @@ -213,25 +213,25 @@ system_rows_nextsampleblock(SampleScanState *node, BlockNumber nblocks) if (sampler->step == 0) { /* Initialize now that we have scan descriptor */ - SamplerRandomState randstate; + pg_prng_state randstate; /* If relation is empty, there's nothing to scan */ if (nblocks == 0) return InvalidBlockNumber; /* We only need an RNG during this setup step */ - sampler_random_init_state(sampler->seed, randstate); + sampler_random_init_state(sampler->seed, &randstate); /* Compute nblocks/firstblock/step only once per query */ sampler->nblocks = nblocks; /* Choose random starting block within the relation */ /* (Actually this is the predecessor of the first block visited) */ - sampler->firstblock = sampler_random_fract(randstate) * + sampler->firstblock = sampler_random_fract(&randstate) * sampler->nblocks; /* Find relative prime as step size for linear probing */ - sampler->step = random_relative_prime(sampler->nblocks, randstate); + sampler->step = random_relative_prime(sampler->nblocks, &randstate); } /* Reinitialize lb */ @@ -317,7 +317,7 @@ gcd(uint32 a, uint32 b) * (else return 1).
*/ static uint32 -random_relative_prime(uint32 n, SamplerRandomState randstate) +random_relative_prime(uint32 n, pg_prng_state *randstate) { uint32 r; diff --git a/contrib/tsm_system_time/tsm_system_time.c b/contrib/tsm_system_time/tsm_system_time.c index 788d8f9a68..36acc6c106 100644 --- a/contrib/tsm_system_time/tsm_system_time.c +++ b/contrib/tsm_system_time/tsm_system_time.c @@ -69,7 +69,7 @@ static BlockNumber system_time_nextsampleblock(SampleScanState *node, BlockNumbe static OffsetNumber system_time_nextsampletuple(SampleScanState *node, BlockNumber blockno, OffsetNumber maxoffset); -static uint32 random_relative_prime(uint32 n, SamplerRandomState randstate); +static uint32 random_relative_prime(uint32 n, pg_prng_state *randstate); /* @@ -224,25 +224,25 @@ system_time_nextsampleblock(SampleScanState *node, BlockNumber nblocks) if (sampler->step == 0) { /* Initialize now that we have scan descriptor */ - SamplerRandomState randstate; + pg_prng_state randstate; /* If relation is empty, there's nothing to scan */ if (nblocks == 0) return InvalidBlockNumber; /* We only need an RNG during this setup step */ - sampler_random_init_state(sampler->seed, randstate); + sampler_random_init_state(sampler->seed, &randstate); /* Compute nblocks/firstblock/step only once per query */ sampler->nblocks = nblocks; /* Choose random starting block within the relation */ /* (Actually this is the predecessor of the first block visited) */ - sampler->firstblock = sampler_random_fract(randstate) * + sampler->firstblock = sampler_random_fract(&randstate) * sampler->nblocks; /* Find relative prime as step size for linear probing */ -