[PATCH] Add min() and max() aggregate functions for xid8
Hi hackers,

Unlike xid, xid8 increases monotonically and cannot be reused. This trait makes it possible to support min() and max() aggregate functions for xid8. I thought they would be useful for monitoring. So I made a patch for this.

Best wishes,

-- 
Ken Kato
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 8754f2f89b..1b064b4feb 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -19973,7 +19973,7 @@ SELECT NULLIF(value, '(none)') ...
         values.  Available for any numeric, string, date/time, or enum type,
         as well as inet, interval, money, oid, pg_lsn,
-        tid,
+        tid, xid8,
         and arrays of any of these types.
        Yes
@@ -19992,7 +19992,7 @@ SELECT NULLIF(value, '(none)') ...
         values.  Available for any numeric, string, date/time, or enum type,
         as well as inet, interval, money, oid, pg_lsn,
-        tid,
+        tid, xid8,
         and arrays of any of these types.
        Yes
diff --git a/src/backend/utils/adt/xid.c b/src/backend/utils/adt/xid.c
index 9b4ceaea47..6e24697782 100644
--- a/src/backend/utils/adt/xid.c
+++ b/src/backend/utils/adt/xid.c
@@ -286,6 +286,24 @@ xid8cmp(PG_FUNCTION_ARGS)
 	PG_RETURN_INT32(-1);
 }
 
+Datum
+xid8_larger(PG_FUNCTION_ARGS)
+{
+	FullTransactionId fxid1 = PG_GETARG_FULLTRANSACTIONID(0);
+	FullTransactionId fxid2 = PG_GETARG_FULLTRANSACTIONID(1);
+
+	PG_RETURN_FULLTRANSACTIONID((U64FromFullTransactionId(fxid1) > U64FromFullTransactionId(fxid2)) ? fxid1 : fxid2);
+}
+
+Datum
+xid8_smaller(PG_FUNCTION_ARGS)
+{
+	FullTransactionId fxid1 = PG_GETARG_FULLTRANSACTIONID(0);
+	FullTransactionId fxid2 = PG_GETARG_FULLTRANSACTIONID(1);
+
+	PG_RETURN_FULLTRANSACTIONID((U64FromFullTransactionId(fxid1) < U64FromFullTransactionId(fxid2)) ? fxid1 : fxid2);
+}
+
 /*
  *	 COMMAND IDENTIFIER ROUTINES
  */
diff --git a/src/include/catalog/pg_aggregate.dat b/src/include/catalog/pg_aggregate.dat
index 137f6eef69..2843f4b415 100644
--- a/src/include/catalog/pg_aggregate.dat
+++ b/src/include/catalog/pg_aggregate.dat
@@ -149,6 +149,9 @@
 { aggfnoid => 'max(pg_lsn)', aggtransfn => 'pg_lsn_larger',
   aggcombinefn => 'pg_lsn_larger', aggsortop => '>(pg_lsn,pg_lsn)',
   aggtranstype => 'pg_lsn' },
+{ aggfnoid => 'max(xid8)', aggtransfn => 'xid8_larger',
+  aggcombinefn => 'xid8_larger', aggsortop => '>(xid8,xid8)',
+  aggtranstype => 'xid8' },
 
 # min
 { aggfnoid => 'min(int8)', aggtransfn => 'int8smaller',
@@ -214,6 +217,9 @@
 { aggfnoid => 'min(pg_lsn)', aggtransfn => 'pg_lsn_smaller',
   aggcombinefn => 'pg_lsn_smaller', aggsortop => '<(pg_lsn,pg_lsn)',
   aggtranstype => 'pg_lsn' },
+{ aggfnoid => 'min(xid8)', aggtransfn => 'xid8_smaller',
+  aggcombinefn => 'xid8_smaller', aggsortop => '<(xid8,xid8)',
+  aggtranstype => 'xid8' },
 
 # count
 { aggfnoid => 'count(any)', aggtransfn => 'int8inc_any',
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 7024dbe10a..62f36daa98 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -203,6 +203,12 @@
 { oid => '5071', descr => 'convert xid8 to xid',
   proname => 'xid', prorettype => 'xid', proargtypes => 'xid8',
   prosrc => 'xid8toxid' },
+{ oid => '5097', descr => 'larger of two',
+  proname => 'xid8_larger', prorettype => 'xid8', proargtypes => 'xid8 xid8',
+  prosrc => 'xid8_larger' },
+{ oid => '5098', descr => 'smaller of two',
+  proname => 'xid8_smaller', prorettype => 'xid8', proargtypes => 'xid8 xid8',
+  prosrc => 'xid8_smaller' },
 { oid => '69',
   proname => 'cideq', proleakproof => 't', prorettype => 'bool',
   proargtypes => 'cid cid', prosrc => 'cideq' },
@@ -6564,6 +6570,9 @@
 { oid => '4189', descr => 'maximum value of all pg_lsn input values',
   proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'pg_lsn',
   proargtypes => 'pg_lsn', prosrc => 'aggregate_dummy' },
+{ oid => '5099', descr => 'maximum value of all xid8 input values',
+  proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'xid8',
+  proargtypes => 'xid8', prosrc => 'aggregate_dummy' },
 { oid => '2131', descr => 'minimum value of all bigint input values',
   proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'int8',
@@ -6631,6 +6640,9 @@
 { oid => '4190', descr => 'minimum value of all pg_lsn input values',
   proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'pg_lsn',
   proargtypes => 'pg_lsn', prosrc => 'aggregate_dummy' },
+{ oid => '5100', descr => 'minimum value of all xid8 input values',
+  proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'xid8',
+  proargtypes => 'xid8', prosrc => 'aggregate_dummy' },
 
 # count has two
Re: [PATCH] Accept IP addresses in server certificate SANs
At Wed, 2 Feb 2022 19:46:13 +, Jacob Champion wrote in
> On Mon, 2022-01-31 at 17:29 +0900, Kyotaro Horiguchi wrote:
> > +#define PGSQL_AF_INET	(AF_INET + 0)
> > +#define PGSQL_AF_INET6	(AF_INET + 1)
> >
> > Now we have the same definition thrice in frontend code. Couldn't we
> > define them in, say, libpq-fe.h or inet-fe.h (nonexistent) then
> > include it from the three files?
>
> I started down the inet-fe.h route, and then realized I didn't know
> where that should go. Does it need to be included in (or part of)
> port.h? And should it be installed as part of the logic in
> src/include/Makefile?

I don't think it should be a part of port.h. Though I suggested a frontend-only header file by that name, isn't it enough to separate out the definitions from utils/inet.h into common/inet-common.h, then include inet-common.h from inet.h?

> > When an IP address is embedded in a URI, it won't be translated to
> > another IP address. Concretely, https://192.0.1.5/hoge cannot reach
> > the host 192.0.1.8. On the other hand, as done in the test, libpq
> > allows that when "host=192.0.1.5 hostaddr=192.0.1.8". I can't
> > understand what we are doing in that case. Don't we need to match the
> > SAN IP address with hostaddr instead of host?
>
> I thought that host, not hostaddr, was the part that corresponded to
> the URI. So in a hypothetical future where postgresqls:// exists, the
> two URIs
>
>   postgresqls://192.0.2.2:5432/db
>   postgresqls://192.0.2.2:5432/db?hostaddr=127.0.0.1
>
> should both be expecting the same certificate. That seems to match the
> libpq documentation as well.

Hmm. Well, considering that the objective of the validation is to check whether the server is actually the one the client intends to connect to, it is fine. Sorry for the noise.

> (Specifying a host parameter is also allowed... that seems like it
> could cause problems for a hypothetical postgresqls:// scheme, but it's
> probably not relevant for this thread.)

Yeah.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
Re: support for CREATE MODULE
On Thu 3. 2. 2022 at 5:46, Julien Rouhaud wrote:
> Hi,
>
> On Thu, Feb 03, 2022 at 05:25:27AM +0100, Pavel Stehule wrote:
> >
> > On Thu 3. 2. 2022 at 3:28, Swaha Miller wrote:
> >
> > > Hi,
> > >
> > > I'm following up from Jim's POC for adding MODULE to PostgreSQL. [1]
> > >
> > > My proposal implements modules as schema objects to be stored in a new
> > > system catalog pg_module with new syntax for CREATE [OR REPLACE] MODULE,
> > > ALTER MODULE, DROP MODULE and for GRANT and REVOKE for privileges on
> > > modules and module routines. I am attempting to follow the SQL spec.
> > > However, for right now, I'm proposing to support only routines as module
> > > contents, with local temporary tables and path specifications as defined
> > > in the SQL spec, to be supported in a future submission. We could also
> > > include support for variables depending on its status. [2]
> >
> > I dislike this feature. The modules are partially redundant to schemas and
> > to extensions in Postgres, and I am sure there is no reason to
> > introduce this.
> >
> > What is the benefit against schemas and extensions?
>
> I agree with Pavel. It seems that it's mainly adding another namespacing
> layer between schemas and objects, and it's likely going to create a mess.
> That's also going to be problematic if you want to add support for module
> variables, as you won't be able to use e.g.
> dbname.schemaname.modulename.variablename.fieldname.
>
> Also, my understanding was that the major interest of modules (at least
> for the routines part) was the ability to make some of them private to the
> module, but it doesn't look like it's doing that, so I also don't see any
> real benefit compared to schemas and extensions.

The biggest problem is the coexistence of Postgres's SEARCH_PATH object identification with the local and public scopes used in MODULEs or in Oracle's packages.
I can imagine MODULES as a third level of database unit object grouping, with the following functionality:

1. It should support all database objects, like schemas.
2. All public objects should be accessible directly when the outer schema is in SEARCH_PATH.
3. Private objects cannot be accessed from other modules.
4. Modules should be movable between schemas and databases without a loss of functionality.
5. Modules should support renaming without loss of functionality.
6. Some rules of visibility would have to be redefined, because there can be new identifier collisions and ambiguities.
7. The relation of a module's objects to a schema's objects has to be defined. Maybe the introduction of a default module would be a good idea.

I had the opportunity to see a man who modified routines in pgAdmin. It can be hell, but if we introduce a new concept (and it is an important concept), then there should be strong benefits - for example, the possibility of strong encapsulation of code inside modules (or some units - the name is not important).

The problem with pgAdmin maybe can be solved better by adding some annotations to database objects that allow more user-friendly organization of the object tree in pgAdmin (and similar tools). Maybe it can be useful to have more trees (defined by locality, semantics, quality, ...).

Regards

Pavel
Re: Add checkpoint and redo LSN to LogCheckpointEnd log message
At Thu, 3 Feb 2022 13:59:03 +0900, Fujii Masao wrote in
> On 2022/02/02 23:46, Bharath Rupireddy wrote:
> > On Tue, Feb 1, 2022 at 9:39 PM Fujii Masao wrote:
> >> I found that CreateRestartPoint() already reported the redo lsn as
> >> follows after emitting the restartpoint log message. To avoid
> >> duplicated logging of the same information, we should update this
> >> code?
> >>
> >> ereport((log_checkpoints ? LOG : DEBUG2),
> >>         (errmsg("recovery restart point at %X/%X",
> >>                 LSN_FORMAT_ARGS(lastCheckPoint.redo)),
> >>          xtime ? errdetail("Last completed transaction was at log time %s.",
> >>                            timestamptz_to_str(xtime)) : 0));
> >>
> >> This code reports lastCheckPoint.redo as redo lsn. OTOH, with the
> >> patch, LogCheckpointEnd() reports ControlFile->checkPointCopy.redo.
> >> They may be different, for example, when the current DB state is not
> >> DB_IN_ARCHIVE_RECOVERY. In this case, which lsn should we report as
> >> redo lsn?
> >
> > Do we ever reach CreateRestartPoint when ControlFile->state !=
> > DB_IN_ARCHIVE_RECOVERY? Assert(ControlFile->state ==
> > DB_IN_ARCHIVE_RECOVERY); in CreateRestartPoint doesn't fail any
> > regression tests.
>
> ISTM that CreateRestartPoint() can reach the condition
> ControlFile->state != DB_IN_ARCHIVE_RECOVERY. Please imagine the case
> where CreateRestartPoint() has already started and calls
> CheckPointGuts(). If the standby server is promoted while
> CreateRestartPoint() is flushing the data to disk at CheckPointGuts(),
> the state would be updated to DB_IN_PRODUCTION and
> CreateRestartPoint() can see the state != DB_IN_ARCHIVE_RECOVERY
> later.

By the way, a comment there:

  * this is a quick hack to make sure nothing really bad happens if somehow
  * we get here after the end-of-recovery checkpoint.

now looks a bit wrong, since it is now normal for a restartpoint to end after promotion.

> As far as I read the code, this case seems to be able to make the
> server unrecoverable. If this case happens, since pg_control is not
> updated, pg_control still indicates the REDO LSN of the last valid
> restartpoint. But CreateRestartPoint() seems to delete old WAL files
> based on its "current" REDO LSN, not pg_control's REDO LSN. That is,
> WAL files required for the crash recovery starting from pg_control's
> REDO LSN would be removed.

Seems right. (I didn't confirm the behavior, though.)

> If this understanding is right, to address this issue, probably we
> need to make CreateRestartPoint() do nothing (return immediately) when
> the state != DB_IN_ARCHIVE_RECOVERY?

One way to take. In that case, should we log something like "restartpoint canceled"?

By the way, a restartpoint should start only while recovering, and at the time of the start both checkpoint.redo and the checkpoint LSN are already in the past. We shouldn't update minRecoveryPoint after promotion, but is there any reason for not updating checkPoint and checkPointCopy? If we update them after promotion, the which-LSN-to-show problem would be gone.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
Re: XTS cipher mode for cluster file encryption
On 2022/2/2 01:50, Bruce Momjian wrote:
> Well, I sent an email a week ago asking if people want to advance this
> feature forward, and so far you are the only person to reply, which I
> think means there isn't enough interest in this feature to advance it.

I am still focused on this thread, and I have a small patch to solve the current buffile problem.

https://www.postgresql.org/message-id/a859a753-70f2-bb17-6830-19dbcad11c17%40sasa.su
Re: Add header support to text format and matching feature
On 31.01.22 07:54, Peter Eisentraut wrote:

On 30.01.22 23:56, Rémi Lapeyre wrote:
> I notice in the 0002 patch that there is no test case for the error
> "wrong header for column \"%s\": got \"%s\"", which I think is really
> the core functionality of this patch. So please add that.

I added a test for it in this new version of the patch.

The file_fdw.sql tests contain this

+CREATE FOREIGN TABLE header_doesnt_match (a int, foo text) SERVER file_server
+OPTIONS (format 'csv', filename :'filename', delimiter ',', header 'match'); -- ERROR

but no actual error is generated. Please review the additions to the file_fdw tests to see that they make sense.

A few more comments on your latest patch:

- The DefGetCopyHeader() function seems very bulky and might not be necessary. I think you can just check for the string "match" first and then use defGetBoolean() as before if it didn't match.

- If you define COPY_HEADER_ABSENT = 0 in the enum, then most of the existing truth checks don't need to be changed.

- In NextCopyFromRawFields(), it looks like you have code that replaces the null_print value if the supplied column name is null. I don't think we should allow null column values. Surely this should be an error. (Do we use null_print on output and make the column name null if it matches?)
Re: Doc: CREATE_REPLICATION_SLOT command requires the plugin name
On Wed, Feb 2, 2022 at 12:41 PM Antonin Houska wrote:
>
> Amit Kapila wrote:
>
> > On Tue, Feb 1, 2022 at 5:43 PM Antonin Houska wrote:
> > >
> > > Amit Kapila wrote:
> > >
> > > > On Tue, Feb 1, 2022 at 3:44 PM Antonin Houska wrote:
> > > > >
> > > > > I got a syntax error when using the command according to the existing
> > > > > documentation. The output_plugin parameter needs to be passed too.
> > > >
> > > > Why do we need it for physical slots?
> > >
> > > Sure we don't, the missing curly brackets seem to be the problem. I used
> > > the other form of the command for reference, which therefore might need
> > > a minor fix too.
> >
> > Instead of adding additional '{}', can't we simply use:
> >
> >   { PHYSICAL [ RESERVE_WAL ] | LOGICAL output_plugin } [ EXPORT_SNAPSHOT |
> >   NOEXPORT_SNAPSHOT | USE_SNAPSHOT | TWO_PHASE ]
>
> Do you mean changing
>
>   CREATE_REPLICATION_SLOT slot_name [ TEMPORARY ] { PHYSICAL [ RESERVE_WAL ] |
>   LOGICAL output_plugin [ EXPORT_SNAPSHOT | NOEXPORT_SNAPSHOT | USE_SNAPSHOT |
>   TWO_PHASE ] }
>
> to
>
>   CREATE_REPLICATION_SLOT slot_name [ TEMPORARY ] { PHYSICAL [ RESERVE_WAL ] |
>   LOGICAL output_plugin } [ EXPORT_SNAPSHOT | NOEXPORT_SNAPSHOT | USE_SNAPSHOT |
>   TWO_PHASE ]
>
> ?
>
> I'm not sure, IMHO that would still allow for commands like
>
>   CREATE_REPLICATION_SLOT slot_name PHYSICAL output_plugin

I don't think so. The symbol '|' differentiates that; check its usage at other places like the CREATE TABLE doc page [1]. Considering that, it seems to me that the way it is currently mentioned in the docs is correct:

CREATE_REPLICATION_SLOT slot_name [ TEMPORARY ] { PHYSICAL [ RESERVE_WAL ] | LOGICAL output_plugin [ EXPORT_SNAPSHOT | NOEXPORT_SNAPSHOT | USE_SNAPSHOT | TWO_PHASE ] }

According to me, this means output_plugin and all other options related to snapshots can be mentioned only for logical slots.

[1] - https://www.postgresql.org/docs/devel/sql-createtable.html

-- 
With Regards,
Amit Kapila.
Re: Replace pg_controldata output fields with macros for better code manageability
At Wed, 2 Feb 2022 18:02:39 -0500, Bruce Momjian wrote in
> On Sat, Jan 29, 2022 at 08:00:53PM +0530, Bharath Rupireddy wrote:
> > Hi,
> >
> > While working on another pg_control patch, I observed that the
> > pg_controldata output fields such as "Latest checkpoint's
> > TimeLineID:", "Latest checkpoint's NextOID:" and so on, are being
> > used in pg_resetwal.c, pg_controldata.c and pg_upgrade/controldata.c.
> > Direct usage of these fields in many places doesn't look good and can
> > be error prone if we try to change the text in one place and forget in
> > another place. I'm thinking of having those fields as macros in
> > pg_control.h, something like [1], and using them in all the places.
> > This will be a good idea for better code manageability IMO.
> >
> > Thoughts?
> >
> > [1]
> > #define PG_CONTROL_FIELD_VERSION_NUMBER "pg_control version number:"
> > #define PG_CONTROL_FIELD_CATALOG_NUMBER "Catalog version number:"
> > #define PG_CONTROL_FIELD_CHECKPOINT_TLI "Latest checkpoint's TimeLineID:"
> > #define PG_CONTROL_FIELD_CHECKPOINT_PREV_TLI "Latest checkpoint's PrevTimeLineID:"
> > #define PG_CONTROL_FIELD_CHECKPOINT_OLDESTXID "Latest checkpoint's oldestXID:"
> > #define PG_CONTROL_FIELD_CHECKPOINT_OLDESTXID_DB "Latest checkpoint's oldestXID's DB:"
> > and so on.
>
> That seems like a very good idea.

+1 for consolidating the texts in some shape. But simply replacing the concrete text labels with symbols doesn't sound like the way we should take. Significant downsides of doing that are untranslatability and the difficulty of adding the proper indentation before the value field. So we need to define the texts including the indentation spaces and a format placeholder. And if we define the strings separately anyway, I would rather define them in an array:

typedef enum pg_control_fields
{
	PGCNTRL_FIELD_ERROR = -1,
	PGCNTRL_FIELD_VERSION = 0,
	PGCNTRL_FIELD_CATALOG_VERSION,
	...
	PGCNTRL_NUM_FIELDS
} pg_control_fields;

const char *pg_control_item_formats[] = {
	gettext_noop("pg_control version number:            %u\n"),
	gettext_noop("Catalog version number:               %u\n"),
	...
};

const char *
get_pg_control_item_format(pg_control_fields item_id)
{
	return _(pg_control_item_formats[item_id]);
}

pg_control_fields
get_pg_control_item(const char *str, const char **valp)
{
	int			i;

	for (i = 0; i < PGCNTRL_NUM_FIELDS; i++)
	{
		const char *fmt = pg_control_item_formats[i];
		const char *p = strchr(fmt, ':');

		Assert(p);
		if (strncmp(str, fmt, p - fmt) == 0)
		{
			p = str + (p - fmt);
			for (p++; *p == ' '; p++)
				;
			*valp = p;
			return i;
		}
	}
	return PGCNTRL_FIELD_ERROR;
}

The printer side does:

	printf(get_pg_control_item_format(PGCNTRL_FIELD_VERSION),
		   ControlFile->pg_control_version);

And the reader side would do:

	switch (get_pg_control_item(str, &v))
	{
		case PGCNTRL_FIELD_VERSION:
			cluster->controldata.ctrl_ver = str2uint(v);
			break;
		...

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
Re: Design of pg_stat_subscription_workers vs pgstats
On Thu, Feb 3, 2022 at 1:48 PM David G. Johnston wrote:
>
> On Wednesday, February 2, 2022, Masahiko Sawada wrote:
> >
> > and have other error
> > information in pg_stat_subscription_workers view.
>
> What benefit is there to keeping the existing collector-based
> pg_stat_subscription_workers view? If we re-write it using shmem IPC then
> we might as well put everything there and forego using a catalog. Then it
> behaves in a similar manner to pg_stat_activity but for logical replication
> workers.

Yes, but if we use shmem IPC, we need to allocate shared memory for them based on the number of subscriptions, not the number of logical replication workers (i.e., max_logical_replication_workers), so we cannot estimate the required memory at startup. Also, IIUC the number of subscriptions that are concurrently working is limited by max_replication_slots (see ReplicationStateCtl), but I think we need to remember the state of disabled subscriptions too.

Regards,

-- 
Masahiko Sawada
EDB: https://www.enterprisedb.com/
Re: Extensible Rmgr for Table AMs
Hi,

On Tue, Feb 01, 2022 at 03:38:32PM -0800, Jeff Davis wrote:
> On Tue, 2022-02-01 at 20:45 +0800, Julien Rouhaud wrote:
> >
> > One last thing, did you do some benchmark with a couple custom rmgr
> > to see how much the O(n) access is showing up in profiles?
>
> What kind of a test case would be reasonable there? You mean having a
> lot of custom rmgrs?
>
> I was expecting that few people would have more than one custom rmgr
> loaded anyway, so a sparse array or hashtable seemed wasteful. If
> custom rmgrs become popular we probably need to have a larger ID space
> anyway, but it seems like overengineering to do so now.

I agree that having dozens of custom rmgrs doesn't seem likely, but I also have no idea how much overhead you get by not doing a direct array access. I think it would be informative to benchmark something like a simple OLTP write workload on fast storage (or a ramdisk, or with fsync off...), with the used rmgr being first the 1st and then the 2nd custom rmgr. Both scenarios still seem plausible and shouldn't degenerate on good hardware.
Re: Add checkpoint and redo LSN to LogCheckpointEnd log message
On 2022/02/02 23:46, Bharath Rupireddy wrote:
> On Tue, Feb 1, 2022 at 9:39 PM Fujii Masao wrote:
>> I found that CreateRestartPoint() already reported the redo lsn as
>> follows after emitting the restartpoint log message. To avoid
>> duplicated logging of the same information, we should update this
>> code?
>>
>> ereport((log_checkpoints ? LOG : DEBUG2),
>>         (errmsg("recovery restart point at %X/%X",
>>                 LSN_FORMAT_ARGS(lastCheckPoint.redo)),
>>          xtime ? errdetail("Last completed transaction was at log time %s.",
>>                            timestamptz_to_str(xtime)) : 0));
>>
>> This code reports lastCheckPoint.redo as redo lsn. OTOH, with the
>> patch, LogCheckpointEnd() reports ControlFile->checkPointCopy.redo.
>> They may be different, for example, when the current DB state is not
>> DB_IN_ARCHIVE_RECOVERY. In this case, which lsn should we report as
>> redo lsn?
>
> Do we ever reach CreateRestartPoint when ControlFile->state !=
> DB_IN_ARCHIVE_RECOVERY? Assert(ControlFile->state ==
> DB_IN_ARCHIVE_RECOVERY); in CreateRestartPoint doesn't fail any
> regression tests.

ISTM that CreateRestartPoint() can reach the condition ControlFile->state != DB_IN_ARCHIVE_RECOVERY. Please imagine the case where CreateRestartPoint() has already started and calls CheckPointGuts(). If the standby server is promoted while CreateRestartPoint() is flushing the data to disk at CheckPointGuts(), the state would be updated to DB_IN_PRODUCTION and CreateRestartPoint() can see the state != DB_IN_ARCHIVE_RECOVERY later.

As far as I read the code, this case seems to be able to make the server unrecoverable. If this case happens, since pg_control is not updated, pg_control still indicates the REDO LSN of the last valid restartpoint. But CreateRestartPoint() seems to delete old WAL files based on its "current" REDO LSN, not pg_control's REDO LSN. That is, WAL files required for the crash recovery starting from pg_control's REDO LSN would be removed.

If this understanding is right, to address this issue, probably we need to make CreateRestartPoint() do nothing (return immediately) when the state != DB_IN_ARCHIVE_RECOVERY?

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
Re: 2022-01 Commitfest
On Wed, Feb 02, 2022 at 01:10:39PM -0500, Jaime Casanova wrote:
> On Wed, Feb 02, 2022 at 01:00:18PM -0500, Tom Lane wrote:
> >
> > Anyway, thanks to Julien for doing this mostly-thankless task
> > this time!
>
> Agreed, great work!

Thanks a lot :)
Re: Design of pg_stat_subscription_workers vs pgstats
On Wednesday, February 2, 2022, Masahiko Sawada wrote:
> and have other error
> information in pg_stat_subscription_workers view.

What benefit is there to keeping the existing collector-based pg_stat_subscription_workers view? If we re-write it using shmem IPC then we might as well put everything there and forego using a catalog. Then it behaves in a similar manner to pg_stat_activity but for logical replication workers.

David J.
Re: support for CREATE MODULE
Hi,

On Thu, Feb 03, 2022 at 05:25:27AM +0100, Pavel Stehule wrote:
>
> On Thu 3. 2. 2022 at 3:28, Swaha Miller wrote:
>
> > Hi,
> >
> > I'm following up from Jim's POC for adding MODULE to PostgreSQL. [1]
> >
> > My proposal implements modules as schema objects to be stored in a new
> > system catalog pg_module with new syntax for CREATE [OR REPLACE] MODULE,
> > ALTER MODULE, DROP MODULE and for GRANT and REVOKE for privileges on
> > modules and module routines. I am attempting to follow the SQL spec.
> > However, for right now, I'm proposing to support only routines as module
> > contents, with local temporary tables and path specifications as defined
> > in the SQL spec, to be supported in a future submission. We could also
> > include support for variables depending on its status. [2]
>
> I dislike this feature. The modules are partially redundant to schemas and
> to extensions in Postgres, and I am sure there is no reason to
> introduce this.
>
> What is the benefit against schemas and extensions?

I agree with Pavel. It seems that it's mainly adding another namespacing layer between schemas and objects, and it's likely going to create a mess. That's also going to be problematic if you want to add support for module variables, as you won't be able to use e.g. dbname.schemaname.modulename.variablename.fieldname.

Also, my understanding was that the major interest of modules (at least for the routines part) was the ability to make some of them private to the module, but it doesn't look like it's doing that, so I also don't see any real benefit compared to schemas and extensions.
Re: Design of pg_stat_subscription_workers vs pgstats
On Wed, Feb 2, 2022 at 4:36 PM David G. Johnston wrote: > > On Tue, Feb 1, 2022 at 11:55 PM Amit Kapila wrote: >> >> On Wed, Feb 2, 2022 at 9:41 AM David G. Johnston >> wrote: >> > >> > On Tue, Feb 1, 2022 at 8:07 PM Amit Kapila wrote: >> >> >> >> On Tue, Feb 1, 2022 at 11:47 AM Masahiko Sawada >> >> wrote: >> >> >> >> > >> >> > I see that it's better to use a better IPC for ALTER SUBSCRIPTION SKIP >> >> > feature to pass error-XID or error-LSN information to the worker >> >> > whereas I'm also not sure of the advantages in storing all error >> >> > information in a system catalog. Since what we need to do for this >> >> > purpose is only error-XID/LSN, we can store only error-XID/LSN in the >> >> > catalog? That is, the worker stores error-XID/LSN in the catalog on an >> >> > error, and ALTER SUBSCRIPTION SKIP command enables the worker to skip >> >> > the transaction in question. The worker clears the error-XID/LSN after >> >> > successfully applying or skipping the first non-empty transaction. >> >> > >> >> >> >> Where do you propose to store this information? >> > >> > >> > pg_subscription_worker >> > >> > The error message and context is very important. Just make sure it is >> > only non-null when the worker state is "syncing failed" (or whatever term >> > we use). >> > >> > >> >> Sure, but is this the reason you want to store all the error info in >> the system catalog? I agree that providing more error info could be >> useful and also possibly the previously failed (apply) xacts info as >> well but I am not able to see why you want to have that sort of info >> in the catalog. I could see storing info like err_lsn/err_xid that can >> allow to proceed to apply worker automatically or to slow down the >> launch of errored apply worker but not all sort of other error info >> (like err_cnt, err_code, err_message, err_time, etc.). I want to know >> why you are insisting to make all the error info persistent via the >> system catalog? 
> > > I look at the catalog and am informed that the worker has stopped because of > an error. I'd rather simply read the error message right then instead of > having to go look at the log file. And if I am going to take an action in > order to overcome the error I would have to know what that error is; so the > error message is not something I can ignore. The error is an attribute of > system state, and the catalog stores the current state of the (workers) > system. > > I already explained that the concept of err_cnt is not useful. The fact that > you include it here makes me think you are still thinking that this all > somehow is meant to keep track of history. It is not. The workers are state > machines and "error" is one of the states - with relevant attributes to > display to the user, and system, while in that state. The state machine > reporting does not care about historical states nor does it report on them. > There is some uncertainty if we continue with the automatic re-launch; which, > now that I write this, I can see where what you call err_cnt is effectively a > count of how many times the worker re-launched without the underlying problem > being resolved and thus encountered the same error. If we persist with the > re-launch behavior then maybe err_cnt should be left in place - with the > description for it basically being the ah-ha! comment I just made. In a world > where we do not typically re-launch and simply re-try without being informed > there is a change - such a count remains of minimal value. > > I don't really understand the confusion here though - this error data already > exists in the pg_stat_subscription_workers stat collector view - the fact > that I want to keep it around (just changing the reset behavior) - doesn't > seem like it should be controversial. I, thinking as a user, really don't > care about all of these implementation details. 
Whether it is a pg_stat_* view (collector or shmem IPC) or a pg_* catalog is immaterial to me. The behavior I observe is what matters. As a developer I don't want to use the statistics collector because these are not statistics and the collector is unreliable. I don't know enough about the relevant differences between shared memory IPC and catalog tables to decide between them. But catalog tables seem like a lower bar to meet and seem like they can implement the user-facing requirements as I envision them.

I see that important information such as the error-XID that can be used for ALTER SUBSCRIPTION SKIP needs to be stored in a reliable way, and using system catalogs is a reasonable way for this purpose. But it's still unclear to me why all error information that is currently shown in the pg_stat_subscription_workers view, including the error-XID and the error message, relation OID, action, etc., needs to be stored in the catalog. The information other than the error-XID doesn't necessarily need to be as reliable as the error-XID. I think we can have
Re: Unclear problem reports
Hi,

On Wed, Feb 02, 2022 at 07:35:36PM -0500, Bruce Momjian wrote:
> The Postgres community is great at diagnosing problems and giving users
> feedback. In most cases, we can either diagnose a problem and give a
> fix, or at least give users a hint at finding the cause.
>
> However, there is a class of problems that are very hard to help with,
> and I have perhaps seen an increasing number of them recently, e.g.:
>
> https://www.postgresql.org/message-id/17384-f50f2eedf541e512%40postgresql.org
> https://www.postgresql.org/message-id/CALTnk7uOrMztNfzjNOZe3TdquAXDPD3vZKjWFWj%3D-Fv-gmROUQ%40mail.gmail.com
>
> I consider these as problems that need digging to find the cause, and
> users are usually unable to do sufficient digging, and we don't have
> time to give them instructions, so they never get a reply.
>
> Is there something we can do to improve this situation? Should we just
> tell them they need to hire a Postgres expert? I assume these are users
> who do not already have access to such experts.

There's also only so much you can do to help when interacting by mail, without access to the machine, even if you're willing to spend time helping people for free.

One thing that might help would be to work on a troubleshooting section on the wiki, with some common problems and clear instructions on either how to try to fix the problem or how to provide the information needed for further debugging. At least it would save the time spent on those steps, and one could quickly respond with that link, similarly to how we do for the slow-query-questions wiki entry.
Re: support for CREATE MODULE
Hi

čt 3. 2. 2022 v 3:28 odesílatel Swaha Miller napsal:
> Hi,
>
> I'm following up from Jim's POC for adding MODULE to PostgreSQL. [1]
>
> My proposal implements modules as schema objects to be stored in a new
> system catalog pg_module, with new syntax for CREATE [OR REPLACE] MODULE,
> ALTER MODULE, DROP MODULE, and for GRANT and REVOKE for privileges on
> modules and module routines. I am attempting to follow the SQL spec.
> However, for right now, I'm proposing to support only routines as module
> contents, with local temporary tables and path specifications as defined
> in the SQL spec to be supported in a future submission. We could also
> include support for variables depending on its status. [2]
>
> Following are some examples of what the new module syntax would look
> like. The attached patch has detailed documentation.
>
> CREATE MODULE mtest1
>     CREATE FUNCTION m1testa() RETURNS text
>     LANGUAGE sql
>     RETURN '1x';
> SELECT mtest1.m1testa();
>
> ALTER MODULE mtest1
>     CREATE FUNCTION m1testd() RETURNS text
>     LANGUAGE sql
>     RETURN 'm1testd';
> SELECT mtest1.m1testd();
>
> ALTER MODULE mtest1 RENAME TO mtest1renamed;
> SELECT mtest1renamed.m1testd();
>
> REVOKE ON MODULE mtest1 REFERENCES ON FUNCTION m1testa() FROM public;
> GRANT ON MODULE mtest1 REFERENCES ON FUNCTION m1testa() TO
>     regress_priv_user1;
>
> I am new to the PostgreSQL community and would really appreciate your
> input and feedback.

I dislike this feature. Modules are partially redundant with schemas and with extensions in Postgres, and I am sure there is no reason to introduce this. What is the benefit over schemas and extensions?

Regards

Pavel

> Thanks,
> Swaha Miller
> Amazon Web Services
>
> [1] https://www.postgresql.org/message-id/CAB_5SRebSCjO12%3DnLsaLCBw2vnkiNH7jcNchirPc0yQ2KmiknQ%40mail.gmail.com
> [2] https://www.postgresql.org/message-id/flat/CAFj8pRD053CY_N4%3D6SvPe7ke6xPbh%3DK50LUAOwjC3jm8Me9Obg%40mail.gmail.com
Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
On Thu, Feb 3, 2022 at 12:07 AM Nathan Bossart wrote:
> On Wed, Feb 02, 2022 at 05:19:26PM +0530, Bharath Rupireddy wrote:
> > On Wed, Feb 2, 2022 at 5:25 AM Nathan Bossart wrote:
> >> However, I'm not sure about the change to ReadDirExtended(). That might be
> >> okay for CheckPointSnapBuild(), which is just trying to remove old files,
> >> but CheckPointLogicalRewriteHeap() is responsible for ensuring that files
> >> are flushed to disk for the checkpoint. If we stop reading the directory
> >> after an error and let the checkpoint continue, isn't it possible that some
> >> mappings files won't be persisted to disk?
> >
> > Unless I mis-read your above statement, with LOG level in
> > ReadDirExtended, I don't think we stop reading the files in
> > CheckPointLogicalRewriteHeap. Am I missing something here?
>
> ReadDirExtended() has the following comment:
>
>  * If elevel < ERROR, returns NULL after any error. With the normal coding
>  * pattern, this will result in falling out of the loop immediately as
>  * though the directory contained no (more) entries.
>
> If there is a problem reading the directory, we will LOG and then exit the
> loop. If we didn't scan through all the entries in the directory, there is
> a chance that we didn't fsync() all the files that need it.

Thanks. I get it. For syncing map files, we don't want to tolerate any errors, whereas removal of the old map files (less than the cutoff LSN) can be tolerated in CheckPointLogicalRewriteHeap. Here's the v7 version using ReadDir for CheckPointLogicalRewriteHeap.

Regards,
Bharath Rupireddy.

v7-0001-Replace-ReadDir-with-ReadDirExtended.patch
Description: Binary data
Re: Adding CI to our tree
On Tue, Jan 18, 2022 at 03:08:47PM -0600, Justin Pryzby wrote:
> On Mon, Jan 17, 2022 at 12:16:19PM -0800, Andres Freund wrote:
> > I think it might still be worth adding a stopgap way of running all tap
> > tests on windows though. Having a vcregress.pl function to find all
> > directories with t/ and run the tests there, shouldn't be a lot of code...
>
> I started doing that, however it makes CI/windows even slower. I think it'll
> be necessary to run prove with all the tap tests to parallelize them, rather
> than looping around directories, many of which have only a single file, and
> are run serially.

FYI: I've rebased these against your cirrus/windows changes.

A recent cirrus result is here; this has other stuff in the branch, so you can see the artifacts with HTML docs and code coverage.
https://github.com/justinpryzby/postgres/runs/5046465342

95793675633 cirrus: spell ccache_maxsize
e5286ede1b4 cirrus: avoid unnecessary double star **
03f6de4643e cirrus: include hints how to install OS packages..
39cc2130e76 cirrus/linux: script test.log..
398b7342349 cirrus/macos: uname -a and export at end of sysinfo
9d0f03d3450 wip: cirrus: code coverage
bff64e8b998 cirrus: upload html docs as artifacts
80f52c3b172 wip: only upload changed docs
7f3b50f1847 pg_upgrade: show list of files copied only in verbose mode
ba229165fe1 wip: run pg_upgrade tests with data-checksums
97d7b039b9b vcregress/ci: test modules/contrib with NO_INSTALLCHECK=1
654b6375401 wip: vcsregress: add alltaptests

BTW, I think the double star added in f3feff825 probably doesn't need to be double, either:
    path: "crashlog-**.txt"

-- 
Justin
Re: psql tab completion versus Debian's libedit
Andres Freund writes: > On 2022-02-02 22:28:31 -0500, Tom Lane wrote: >> I conclude that there are no extant versions of readline/libedit >> that don't have rl_completion_append_character, so we could >> drop that configure test and save a cycle or two. > Sounds good to me! On it now. regards, tom lane
Re: psql tab completion versus Debian's libedit
On 2022-02-02 22:28:31 -0500, Tom Lane wrote: > I conclude that there are no extant versions of readline/libedit > that don't have rl_completion_append_character, so we could > drop that configure test and save a cycle or two. Sounds good to me!
Re: psql tab completion versus Debian's libedit
Andres Freund writes:
> I think this is caused by the feature flag detection being broken in the
> meson branch - unrelated to your commit - ending up with falsely believing
> that none of the rl_* variables exist (below for more on that aspect).
> Do we care that the tests would fail when using a readline without any of
> the rl_* variables? I don't know if those even exist.

Hm, interesting. I reproduced this by #undef'ing HAVE_RL_COMPLETION_APPEND_CHARACTER in pg_config.h. It's not too surprising, because without that, we have no way to prevent readline from appending a space to a completion. In the test at hand, that causes two failures:

1. After we do "pub", we get "public. " not just "public."; and now, that is no longer part of the word-to-be-completed. That breaks the test because FROM is no longer the previous word but the one before that, so we won't try to do table completion.

2. After we do "tab", we normally get "tab1 ", but since no completion is attempted, we should get just "tab". But we get "tab " because the dummy-completion code at the bottom of psql_completion fails to suppress adding a space. In general, *any* failed completion would nonetheless append a space.

Each of these is sufficiently bad to prompt user complaints, I think, but we've seen none. And I observe that every buildfarm animal reports having rl_completion_append_character. I conclude that there are no extant versions of readline/libedit that don't have rl_completion_append_character, so we could drop that configure test and save a cycle or two.

regards, tom lane
Re: Bugs in pgoutput.c
Amit Kapila writes: > Tom, is it okay for you if I go ahead with this patch after some testing? I've been too busy to get back to it, so sure. regards, tom lane
Re: Bugs in pgoutput.c
On Sat, Jan 29, 2022 at 8:32 AM Amit Kapila wrote:
> On Thu, Jan 6, 2022 at 3:42 AM Tom Lane wrote:
> >
> > Commit 6ce16088b caused me to look at pgoutput.c's handling of
> > cache invalidations, and I was pretty appalled by what I found.
> >
> > * rel_sync_cache_relation_cb does the wrong thing when called for
> > a cache flush (i.e., relid == 0). Instead of invalidating all
> > RelationSyncCache entries as it should, it will do nothing.
> >
> > * When rel_sync_cache_relation_cb does invalidate an entry,
> > it immediately zaps the entry->map structure, even though that
> > might still be in use (as per the adjacent comment that carefully
> > explains why this isn't safe). I'm not sure if this could lead
> > to a dangling-pointer core dump, but it sure seems like it could
> > lead to failing to translate tuples that are about to be sent.
> >
> > * Similarly, rel_sync_cache_publication_cb is way too eager to
> > reset the pubactions flags, which would likely lead to failing
> > to transmit changes that we should transmit.
>
> Are you planning to proceed with this patch?

Tom, is it okay for you if I go ahead with this patch after some testing?

-- 
With Regards,
Amit Kapila.
Re: Windows crash / abort handling
Hi,

On 2022-02-02 11:24:19 +1300, Thomas Munro wrote:
> On Sun, Jan 30, 2022 at 10:02 AM Andres Freund wrote:
> > On 2022-01-09 16:57:04 -0800, Andres Freund wrote:
> > > I've attached a patch implementing these changes.
> >
> > Unless somebody is planning to look at this soon, I'm planning to push it
> > to master. It's too annoying to have these hangs and not see backtraces.
>
> +1, I don't know enough about Windows development to have an opinion
> on the approach but we've got to try *something*, these hangs are
> terrible.

I've pushed the patch this thread is about now. Let's see what the buildfarm says; I could only test one Windows version.

Separately, I've also pushed a patch to run the windows tests under a timeout. I hope in combination these patches address the hangs.

Greetings,

Andres Freund
Re: Unclear problem reports
On Wed, Feb 2, 2022 at 5:35 PM Bruce Momjian wrote:
> I consider these as problems that need digging to find the cause, and
> users are usually unable to do sufficient digging, and we don't have
> time to give them instructions, so they never get a reply.
>
> Is there something we can do to improve this situation? Should we just
> tell them they need to hire a Postgres expert? I assume these are users
> who do not already have access to such experts.

Have pg_lister queue up a check for, say, two or three days after the bug reporting form is filled out. If the report hasn't been responded to by someone other than the OP, send out a reply that basically says:

    We're sorry your message hasn't yet attracted a response. If your
    report falls into the category of "tech support for a malfunctioning
    server", and this includes error messages that are difficult or
    impossible to replicate in a development environment (maybe give some
    examples), you may wish to consider eliciting paid professional help.
    Please see this page on our website for a directory of companies that
    provide such services. The PostgreSQL Core Project itself refrains
    from making recommendations since many, if not all, of these companies
    contribute back to the project in order to keep it both free and open
    source.

I would also consider adding a form that directs messages to pg_admin instead of pg_bugs. On that form we could put the above message upfront - basically saying that here is a place to ask for help, but the core project (this website) doesn't directly provide such services: so if you don't receive a reply, or your needs are immediate, consider enlisting paid support from our directory.

The problem with the separate form is that we would need users to self-select to report their problem to the support list instead of using a bug reporting form. Neither of the mentioned emails is a good example of a bug report. Either we make it easier and hope self-selection works, or just resign ourselves to seeing these messages on -bugs and use the first option to at least be a bit more responsive.

The risks related to having a rote response, namely it not really being applicable to the situation, seem minimal, and others seeing that response (assuming we'd send it to the list and not just the OP) would likely encourage someone to at least give better suggestions for next steps should that be necessary.

David J.
Re: psql tab completion versus Debian's libedit
Hi,

On 2022-02-01 16:30:11 -0500, Tom Lane wrote:
> I chased down the failure that kittiwake has been showing since
> 02b8048ba [1].

I just rebased my meson branch across the commit d33a81203e9, and on freebsd the meson-based build failed in the expanded tests, while autoconf succeeded. The failure is:

    not ok 22 - complete schema-qualified name
    #   Failed test 'complete schema-qualified name'
    #   at /tmp/cirrus-ci-build/src/bin/psql/t/010_tab_completion.pl line 236.
    #   Actual output was "tab "
    #   Did not match "(?^:tab1 )"

I think this is caused by the feature flag detection being broken in the meson branch - unrelated to your commit - ending up with falsely believing that none of the rl_* variables exist (below for more on that aspect).

Do we care that the tests would fail when using a readline without any of the rl_* variables? I don't know if those even exist.

The reason for meson not detecting the variables is either an "andres" or freebsd / readline issue. The tests fail with:

    /usr/local/include/readline/rltypedefs.h:71:36: error: unknown type name 'FILE'
    typedef int rl_getc_func_t PARAMS((FILE *));

Apparently the readline header on freebsd somehow has a dependency on stdio.h being included. Looks like it's easy enough to work around. My local copy of readline.h (8.1 on debian sid) has an explicit stdio.h include, but it looks like that's a debian addition...

Greetings,

Andres Freund
Re: ci/cfbot: run windows tests under a timeout
Hi,

On 2022-02-02 10:31:07 -0800, Andres Freund wrote:
> The attached test adds a timeout (using git's timeout binary) to all vcregress
> invocations. I've not re-added it to the other OSs, but I'm on the fence about
> doing so.

I've pushed this now.

Greetings,

Andres Freund
Unclear problem reports
The Postgres community is great at diagnosing problems and giving users feedback. In most cases, we can either diagnose a problem and give a fix, or at least give users a hint at finding the cause.

However, there is a class of problems that are very hard to help with, and I have perhaps seen an increasing number of them recently, e.g.:

https://www.postgresql.org/message-id/17384-f50f2eedf541e512%40postgresql.org
https://www.postgresql.org/message-id/CALTnk7uOrMztNfzjNOZe3TdquAXDPD3vZKjWFWj%3D-Fv-gmROUQ%40mail.gmail.com

I consider these as problems that need digging to find the cause, and users are usually unable to do sufficient digging, and we don't have time to give them instructions, so they never get a reply.

Is there something we can do to improve this situation? Should we just tell them they need to hire a Postgres expert? I assume these are users who do not already have access to such experts.

-- 
Bruce Momjian  https://momjian.us
EDB            https://enterprisedb.com

If only the physical world exists, free will is an illusion.
Re: Replace pg_controldata output fields with macros for better code manageability
On Sat, Jan 29, 2022 at 08:00:53PM +0530, Bharath Rupireddy wrote:
> Hi,
>
> While working on another pg_control patch, I observed that the
> pg_controldata output fields such as "Latest checkpoint's
> TimeLineID:", "Latest checkpoint's NextOID:" and so on, are being
> used in pg_resetwal.c, pg_controldata.c and pg_upgrade/controldata.c.
> Direct usage of these fields in many places doesn't look good and can
> be error prone if we try to change the text in one place and forget in
> another place. I'm thinking of having those fields as macros in
> pg_control.h, something like [1], and using them in all the places.
> This will be a good idea for better code manageability IMO.
>
> Thoughts?
>
> [1]
> #define PG_CONTROL_FIELD_VERSION_NUMBER "pg_control version number:"
> #define PG_CONTROL_FIELD_CATALOG_NUMBER "Catalog version number:"
> #define PG_CONTROL_FIELD_CHECKPOINT_TLI "Latest checkpoint's TimeLineID:"
> #define PG_CONTROL_FIELD_CHECKPOINT_PREV_TLI "Latest checkpoint's PrevTimeLineID:"
> #define PG_CONTROL_FIELD_CHECKPOINT_OLDESTXID "Latest checkpoint's oldestXID:"
> #define PG_CONTROL_FIELD_CHECKPOINT_OLDESTXID_DB "Latest checkpoint's oldestXID's DB:"
> and so on.

That seems like a very good idea.

-- 
Bruce Momjian  https://momjian.us
EDB            https://enterprisedb.com

If only the physical world exists, free will is an illusion.
Re: Server-side base backup: why superuser, not pg_write_server_files?
I wrote:
> The Windows animals don't like this:
> pg_basebackup: error: connection to server at "127.0.0.1", port 59539 failed:
> FATAL: SSPI authentication failed for user "backupuser"
> Not sure whether we have a standard method to get around that.

Ah, right, we do. Looks like adding something like

    auth_extra => [ '--create-role', 'backupuser' ]

to the $node->init call would do it, or you could mess with invoking pg_regress --config-auth directly.

regards, tom lane
Re: archive modules
Thanks for the review!

On Wed, Feb 02, 2022 at 01:42:55PM -0500, Robert Haas wrote:
> I think avoiding ERROR is going to be impractical. Catching it in the
> contrib module seems OK. Catching it in the generic code is probably
> also possible to do in a reasonable way. Not catching the error also
> seems like it would be OK, since we expect errors to be infrequent.
> I'm not objecting to anything you did here, but I'm uncertain why
> adding basic_archive along shell_archive changes the calculus here in
> any way. It just seems like a separate problem.

The main scenario I'm thinking about is when there is no space left for archives. The shell archiving logic is pretty good about avoiding ERRORs, so when there is a problem executing the command, the archiver will retry the command a few times before giving up for a while. If basic_archive just ERROR'd due to ENOSPC, it would cause the archiver to restart. Until space frees up, I believe the archiver will end up restarting every 10 seconds.

I thought some more about adding a generic exception handler for the archiving callback. I think we'd need to add a new callback function that would perform any required cleanup (e.g., closing any files that might be left open). That part isn't too bad. However, module authors will also need to keep in mind that the archiving callback runs in its own transient memory context. If the module needs to palloc() something that needs to stick around for a while, it will need to do so in a different memory context. With sufficient documentation, maybe this part isn't too bad either, but in the end, all of this is to save an optional ~15 lines of code in the module. It's not crucial to do your own ERROR handling in your archive module, but if you want to, you can use basic_archive as a good starting point.

tl;dr - I left it the same.

> +	/* Perform logging of memory contexts of this process */
> +	if (LogMemoryContextPending)
> +		ProcessLogMemoryContextInterrupt();
>
> Any special reason for moving this up higher? Not really an issue, just
> curious.

Since archive_library changes cause the archiver to restart, I thought it might be good to move this before the process might exit in case LogMemoryContextPending and ConfigReloadPending are both true.

> +	gettext_noop("This is unused if \"archive_library\" does not indicate
> archiving via shell is enabled.")
>
> This contains a double negative. We could describe it more positively:
> This is used only if \"archive_library\" specifies archiving via
> shell. But that's actually a little confusing, because the way you've
> set it up, archiving via shell can be specified by writing either
> archive_library = '' or archive_library = 'shell'. I don't see any
> particularly good reason to allow that to be spelled in two ways.
> Let's pick one. Then here we can write either:
>
> (a) This is used only if archive_library = 'shell'.
> -or-
> (b) This is used only if archive_library is not set.
>
> IMHO, either of those would be clearer than what you have right now,
> and it would definitely be shorter.

I went with (b). That felt a bit more natural to me, and it was easier to code because I don't have to add error checking for an empty string.

> +/*
> + * Callback that gets called to determine if the archive module is
> + * configured.
> + */
> +typedef bool (*ArchiveCheckConfiguredCB) (void);
> +
> +/*
> + * Callback called to archive a single WAL file.
> + */
> +typedef bool (*ArchiveFileCB) (const char *file, const char *path);
> +
> +/*
> + * Called to shutdown an archive module.
> + */
> +typedef void (*ArchiveShutdownCB) (void);
>
> I think that this is the wrong amount of comments. One theory is that
> the reader should refer to the documentation to understand how these
> callbacks work. In that case, having a separate comment for each one
> that doesn't really say anything is just taking up space. It would be
> better to have one comment for all three lines referring the reader to
> the documentation. Alternatively, one could take the position that the
> explanation should go into these comments, and then perhaps we don't
> even really need documentation. A one-line comment that doesn't really
> say anything non-obvious seems like the worst amount.

In my quest to write well-commented code, I sometimes overdo it. I adjusted these comments in the new revision.

> + There are considerable robustness and security risks in using archive
> modules
> + because, being written in the C language, they have access
> + to many server resources. Administrators wishing to enable archive modules
> + should exercise extreme caution. Only carefully audited modules should be
> + loaded.
>
> Maybe I'm just old and jaded, but do we really need this? I know we
> have the same thing for background workers, but if anything that seems
> like an argument against duplicating it elsewhere. Lots of
Re: Server-side base backup: why superuser, not pg_write_server_files?
=?utf-8?Q?Dagfinn_Ilmari_Manns=C3=A5ker?= writes:
> Here's a follow-on patch that adds a test for non-superuser server-side
> basebackup, which crashes without your patch and passes with it.

The Windows animals don't like this:

    # Running: pg_basebackup --no-sync -cfast -U backupuser --target server:C:\\prog\\bf\\root\\HEAD\\pgsql.build\\src\\bin\\pg_basebackup\\tmp_check\\tmp_test_VGMM/backuponserver -X none
    pg_basebackup: error: connection to server at "127.0.0.1", port 59539 failed: FATAL:  SSPI authentication failed for user "backupuser"
    not ok 108 - backup target server
    #   Failed test 'backup target server'
    #   at t/010_pg_basebackup.pl line 527.

Not sure whether we have a standard method to get around that.

regards, tom lane
Re: [PATCH] Add extra statistics to explain for Nested Loop
Hi, hackers. I'm attaching the new version of the patch.

Justin Pryzby wrote:
> I'm curious to hear what you and others think of the refactoring.

Thank you so much. With your changes, the patch has become more understandable and readable.

> It'd be nice if there's a good way to add a test case for verbose output
> involving parallel workers, but the output is unstable ...

Done!

Lukas Fittl wrote:
> I've briefly thought whether this needs documentation (currently the patch
> includes none), but there does not appear to be a good place to add
> documentation about this from a quick glance, so it seems acceptable to
> leave this out given the lack of more detailed EXPLAIN documentation in
> general.

You're right! I added a feature description to the patch header.

> Whilst no specific bad cases were provided, I wonder if even a simple
> pgbench with auto_explain (and log_analyze=1) would be a way to test this?

I wanted to measure overheads, but couldn't choose the correct way. Thanks for the idea with auto_explain. I loaded it and made 10 runs of pgbench (number of clients: 1, of threads: 1). I'm not sure I chose the right way to measure overhead, so any suggestions are welcome. Current results are in the file overhead_v0.txt.

Please feel free to share your suggestions and comments.

Regards,

-- 
Ekaterina Sokolova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

From: Ekaterina Sokolova
Subject: [PATCH] Add extra statistics to explain for Nested Loop

For some distributions of data in tables, different loops in nested loop joins can take different time and process different amounts of entries. This makes the average statistics returned by explain analyze not very useful for the DBA. This patch adds collection of min, max and total statistics for time and rows across all loops to EXPLAIN ANALYSE. You need to set the VERBOSE flag to display this information. The patch contains regression tests.
Example of results in TEXT format: -> Append (actual rows=5 loops=5) Loop Min Rows: 2 Max Rows: 6 Total Rows: 23 Reviewed-by: Lukas Fittl, Justin Pryzby, Yugo Nagata, Julien Rouhaud. --- src/backend/commands/explain.c| 147 +++--- src/backend/executor/instrument.c | 31 ++ src/include/executor/instrument.h | 6 ++ src/test/regress/expected/explain.out | 10 ++ src/test/regress/expected/partition_prune.out | 117 src/test/regress/sql/partition_prune.sql | 32 ++ 6 files changed, 280 insertions(+), 63 deletions(-) diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c index 10644dfac44..458f37c3fc6 100644 --- a/src/backend/commands/explain.c +++ b/src/backend/commands/explain.c @@ -118,6 +118,8 @@ static void show_instrumentation_count(const char *qlabel, int which, PlanState *planstate, ExplainState *es); static void show_foreignscan_info(ForeignScanState *fsstate, ExplainState *es); static void show_eval_params(Bitmapset *bms_params, ExplainState *es); +static void show_loop_info(Instrumentation *instrument, bool isworker, + ExplainState *es); static const char *explain_get_index_name(Oid indexId); static void show_buffer_usage(ExplainState *es, const BufferUsage *usage, bool planning); @@ -1606,42 +1608,14 @@ ExplainNode(PlanState *planstate, List *ancestors, if (es->analyze && planstate->instrument && planstate->instrument->nloops > 0) - { - double nloops = planstate->instrument->nloops; - double startup_ms = 1000.0 * planstate->instrument->startup / nloops; - double total_ms = 1000.0 * planstate->instrument->total / nloops; - double rows = planstate->instrument->ntuples / nloops; - - if (es->format == EXPLAIN_FORMAT_TEXT) - { - if (es->timing) -appendStringInfo(es->str, - " (actual time=%.3f..%.3f rows=%.0f loops=%.0f)", - startup_ms, total_ms, rows, nloops); - else -appendStringInfo(es->str, - " (actual rows=%.0f loops=%.0f)", - rows, nloops); - } - else - { - if (es->timing) - { -ExplainPropertyFloat("Actual Startup Time", "ms", startup_ms, - 
3, es); -ExplainPropertyFloat("Actual Total Time", "ms", total_ms, - 3, es); - } - ExplainPropertyFloat("Actual Rows", NULL, rows, 0, es); - ExplainPropertyFloat("Actual Loops", NULL, nloops, 0, es); - } - } + show_loop_info(planstate->instrument, false, es); else if (es->analyze) { if (es->format == EXPLAIN_FORMAT_TEXT) appendStringInfoString(es->str, " (never executed)"); else { + /* without min and max values because actual result is 0 */ if (es->timing) { ExplainPropertyFloat("Actual Startup Time", "ms", 0.0, 3, es); @@ -1664,44 +1638,14 @@ ExplainNode(PlanState *planstate, List *ancestors, for (int n = 0; n < w->num_workers; n++) { Instrumentation *instrument = >instrument[n]; - double nloops = instrument->nloops; - double startup_ms; -
Re: Support for NSS as a libpq TLS backend
On Tue, Feb 1, 2022 at 01:52:09PM -0800, Andres Freund wrote:
> There's https://hg.mozilla.org/projects/nspr/file/tip/pr/src - which is I
> think the upstream source.
>
> A project without even a bare-minimal README at the root does have an
> "internal only" feel to it...

I agree --- it is a library --- if they don't feel the need to publish the API, it seems to mean they want to maintain the ability to change it at any time, and therefore it is inappropriate for other software to rely on that API.

This is not the same as Postgres extensions needing to read the Postgres source code --- they are an important but edge use case, and we never saw the need to standardize or publish the internal functions that must be studied and possibly adjusted for major releases.

This kind of feels like the Chrome JavaScript code that used to be able to be built separately for PL/v8, but has gotten much harder to do in the past few years.

-- 
Bruce Momjian  https://momjian.us
EDB            https://enterprisedb.com

If only the physical world exists, free will is an illusion.
Re: CREATEROLE and role ownership hierarchies
On Wed, Feb 2, 2022 at 3:23 PM Mark Dilger wrote:
> It's perfectly reasonable (in my mind) that Robert, acting as superuser, may
> want to create a creator who acts like a superuser over the sandbox, while
> at the same time Stephen, acting as superuser, may want to create a creator
> who acts as a low privileged bot that only adds and removes roles, but
> cannot read their tables, SET ROLE to them, etc.
>
> I don't see any reason that Robert and Stephen can't both get what they
> want. We just have to make Thing 1 flexible enough.

Hmm, that would be fine with me. I don't mind a bit if other people get what they want, as long as I can get what I want, too! In fact, I'd prefer it if other people also get what they want...

That having been said, I have some reservations if it involves tightly coupling new features that we're trying to add to existing things that may or may not be that well designed, like the role-level INHERIT flag, or WITH ADMIN OPTION, or the not-properly-maintained pg_auth_members.grantor column, or even the SQL standard. I'm not saying we should ignore any of those things and I don't think that we should ... but at the same time, we can't ignore whether the feature does what people want it to do, either. If we do, this whole thing is really a complete waste of time. If a patch achieves infinitely large amounts of backward compatibility, standards compliance, and conformity with existing design but doesn't do the right stuff, forget it!

-- 
Robert Haas
EDB: http://www.enterprisedb.com
Re: Why is INSERT-driven autovacuuming based on pg_class.reltuples?
On Tue, Feb 1, 2022 at 3:02 PM David Rowley wrote:
> If we wanted a more current estimate for the number of tuples in a
> relation then we could use reltuples / relpages *
> RelationGetNumberOfBlocks(r). However, I still don't see why
> INSERT-driven auto-vacuums are a particularly special case. ANALYZE
> updating the reltuples estimate had an effect on when auto-vacuum
> would trigger for tables that generally grow in the number of live
> tuples but previously only (i.e. before insert vacuums existed)
> received auto-vacuum attention due to UPDATEs/DELETEs.

Sorry for the delay in my response; I've been travelling.

I admit that I jumped the gun here -- I now believe that it's operating as originally designed, and that the design is reasonable. It couldn't hurt to describe the design in a little more detail in the user docs, though.

> Nothing there seems to indicate the scale is based on the historical
> table size when the table was last vacuumed/analyzed, so you could
> claim that the 3 usages of relpages when deciding if the table should
> be vacuumed and/or analyzed are all wrong and should take into account
> RelationGetNumberOfBlocks too.

The route of my confusion might interest you. vacuumlazy.c's call to vac_update_relstats() provides relpages and reltuples values that are very accurate relative to the VACUUM operation's view of things (relative to its OldestXmin, which uses the size of the table at the start of the VACUUM, not the end). When the vac_update_relstats() call is made, the same accurate-relative-to-OldestXmin values might actually *already* be out of date relative to the physical table as it is at the end of the same VACUUM. The fact that these accurate values could be out of date like this is practically inevitable for a certain kind of table. But that might not matter at all, if they were later *interpreted* in a way that took this into account later on -- context matters.

For some reason I made the leap to thinking that everybody else believed the same thing, too, and that the intention with the design of the insert-driven autovacuum stuff was to capture that. But that's just not the case, at all. I apologize for the confusion.

BTW, this situation with the relstats only being accurate "relative to the last VACUUM's OldestXmin" is *especially* important with new_dead_tuples values, which are basically recently dead tuples relative to OldestXmin. In the common case where there are very few recently dead tuples, we have an incredibly distorted "sample" of dead tuples (it tells us much more about VACUUM implementation details than the truth of what's going on in the table). So that also recommends "relativistically interpreting" the values later on. This specific issue has even less to do with autovacuum_vacuum_scale_factor than the main point, of course. I agree with you that the insert-driven stuff isn't a special case.

-- 
Peter Geoghegan
Re: CREATEROLE and role ownership hierarchies
> On Feb 2, 2022, at 11:52 AM, Stephen Frost wrote: > > The question that we need to solve is how to give > users the ability to choose what roles have which of the privileges that > we've outlined above and agreed should be separable. Ok, there are really two different things going on here, and the conversation keeps conflating them. Maybe I'm wrong, but I think the conflation of these things is the primary problem preventing us from finishing up the design. Thing 1: The superuser needs to be able to create roles who can create other roles. Let's call them "creators". Not every organization will want the same level of privilege to be given to a creator, or even that all creators have equal levels of privilege. So when the superuser creates a creator, the superuser needs to be able to configure exactly what that creator can do. This includes which attributes the creator can give to new roles. It *might* include whether the creator maintains a dependency link with the created role, called "ownership" or somesuch. It *might* include whether the creator can create roles into which the creator is granted membership/administership. But there really isn't any reason that these things should be all-or-nothing. Maybe one creator maintains a dependency link with created roles, and that dependency link entails some privileges. Maybe other creators do not maintain such a link. It seems like the superuser can define a creator in many different ways, as long as we nail down what those ways are, and what they mean. Thing 2: The creator needs to be able to specify which attributes and role memberships are set up for the roles the creator creates. To the extent that the creator has been granted the privilege to create yet more creators, this recurses to Thing 1. But not all creators will have that ability. I think the conversation gets off topic and disagreement abounds when Thing 1 is assumed to be hardcoded, leaving just the details of Thing 2 to be discussed.
It's perfectly reasonable (in my mind) that Robert, acting as superuser, may want to create a creator who acts like a superuser over the sandbox, while at the same time Stephen, acting as superuser, may want to create a creator who acts as a low privileged bot that only adds and removes roles, but cannot read their tables, SET ROLE to them, etc. I don't see any reason that Robert and Stephen can't both get what they want. We just have to make Thing 1 flexible enough. Do you agree at least with this much? If so, I think we can hammer out what to do about Thing 1 and get something committed in time for postgres 15. If not, then I'm probably going to stop working on this until next year, because at this point, we don't have enough time to finish. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: speed up text_position() for utf-8
I wrote: > I tried this test on a newer CPU, and 0003 had no regression. Both > systems used gcc 11.2. Rather than try to figure out why an experiment > had unexpected behavior, I plan to test 0001 and 0002 on a couple > different compilers/architectures and call it a day. It's also worth > noting that 0002 by itself seemed to be decently faster on the newer > machine, but not as fast as 0001 and 0002 together. I tested four machines with various combinations of patches, and it seems the only thing they all agree on is that 0002 is a decent improvement (full results attached). The others can be faster or slower. 0002 also simplifies things, so it has that going for it. I plan to commit that this week unless there are objections. -- John Naylor EDB: http://www.enterprisedb.com

haswell xeon / gcc 8
master:    Time: 608.047 ms  Time: 2859.939 ms (00:02.860)  Time: 2255.402 ms (00:02.255)
0001 only: Time: 603.026 ms  Time: 3077.163 ms (00:03.077)  Time: 2377.189 ms (00:02.377)
0002 only: Time: 615.444 ms  Time: 2388.939 ms (00:02.389)  Time: 2227.665 ms (00:02.228)
0001-0002: Time: 608.757 ms  Time: 2746.711 ms (00:02.747)  Time: 2156.545 ms (00:02.157)
0001-0003: Time: 609.236 ms  Time: 1466.130 ms (00:01.466)  Time: 1353.658 ms (00:01.354)

power8 / gcc 4.8
master:    Time: 559.375 ms  Time: 7336.079 ms (00:07.336)  Time: 5049.222 ms (00:05.049)
0001 only: Time: 559.466 ms  Time: 7272.719 ms (00:07.273)  Time: 4878.817 ms (00:04.879)
0002 only: Time: 555.255 ms  Time: 6061.435 ms (00:06.061)  Time: 4440.829 ms (00:04.441)
0001-0002: Time: 555.144 ms  Time: 5378.270 ms (00:05.378)  Time: 5745.331 ms (00:05.745)
0001-0003: Time: 554.707 ms  Time: 2189.187 ms (00:02.189)  Time: 2193.870 ms (00:02.194)

comet lake / gcc 11
master:    Time: 383.196 ms  Time: 2323.875 ms (00:02.324)  Time: 1866.594 ms (00:01.867)
0001 only: Time: 379.193 ms  Time: 2046.540 ms (00:02.047)  Time: 1720.601 ms (00:01.721)
0002 only: Time: 374.072 ms  Time: 1769.163 ms (00:01.769)  Time: 1485.612 ms (00:01.486)
0001-0002: Time: 374.454 ms  Time: 1208.933 ms (00:01.209)  Time: 1018.236 ms (00:01.018)
0001-0003: Time: 376.698 ms  Time: 1278.807 ms (00:01.279)  Time: 1177.005 ms (00:01.177)

3-year old core i5 / gcc 11
master:    Time: 378.955 ms  Time: 2329.763 ms (00:02.330)  Time: 1891.931 ms (00:01.892)
0001 only: Time: 384.547 ms  Time: 2342.837 ms (00:02.343)  Time: 1906.436 ms (00:01.906)
0002 only: Time: 379.096 ms  Time: 2051.755 ms (00:02.052)  Time: 1641.376 ms (00:01.641)
0001-0002: Time: 377.758 ms  Time: 1293.624 ms (00:01.294)  Time: 1096.991 ms (00:01.097)
0001-0003: Time: 377.767 ms  Time: 2052.826 ms (00:02.053)  Time: 1329.185 ms (00:01.329)
Re: CREATEROLE and role ownership hierarchies
Greetings, * Mark Dilger (mark.dil...@enterprisedb.com) wrote: > > On Jan 31, 2022, at 10:50 AM, Stephen Frost wrote: > > Supporting that through ADMIN is one option, another would be a > > 'DROPROLE' attribute, though we'd want a way to curtail that from being > > able to be used for just any role and that does lead down a path similar > > to ownership or just generally the concept that some roles have certain > > rights over certain other roles (whether you create them or not...). > > I've been operating under the assumption that I have a lot more freedom to > create new features than to change how existing features behave, for two > reasons: backwards compatibility and sql-spec compliance. I agree that those are concerns that need to be considered, though I'm more concerned about the SQL compliance and less about backwards compatibility in this case. For one thing, I'm afraid that we're not as compliant as we really should be and that should really drive us to make change here anyway, to get closer to what the spec calls for. > Changing how having ADMIN on a role works seems problematic for both those > reasons. My family got me socks for Christmas, not what I actually wanted, a > copy of the SQL-spec. So I'm somewhat guessing here. But I believe we'd > have problems if we "fixed" the part where a role can revoke ADMIN from > others on themselves. Whatever we have, whether we call it "ownership", it > can't be something a role can unilaterally revoke. > > As for a 'DROPROLE' attribute, I don't think that gets us anywhere. You > don't seem to think so, either. So that leaves us with "ownership", perhaps > by another word? I only chose that word because it's what we use elsewhere, > but if we want to call it "managementship" and "manager" or whatever, that's > fine. I'm not to the point of debating the terminology just yet. I'm still > trying to get the behavior nailed down. 
Yeah, didn't mean to imply that those were great ideas or that I was particularly advocating for them, but just to bring up some other ideas to try and get more thought going into this. > > I do think there's a lot of value in being able to segregate certain > > rights- consider that you may want a role that's able to create other > > roles, perhaps grant them into some set of roles, can lock those roles > > (prevent them from logging in, maybe do a password reset, something like > > that), but which *isn't* able to drop those roles (and all their > > objects) as that's dangerous and mistakes can certainly happen, or be > > able to become that role because the creating role simply doesn't have > > any need to be able to do that (or desire to in many cases, as we > > discussed in the landlord-vs-tenant sub-thread). > > I'm totally on the same page. Your argument upthread about wanting any > malfeasance on the part of a service provider showing up in the audit logs > was compelling. Even for those things the "owner"/"manager" has the rights > to do, we might want to make them choose to do it explicitly and not merely > do it by accident. Glad to hear that. > > Naturally, you'd want *some* role to be able to drop that role (and one > > that doesn't have full superuser access) but that might be a role that's > > not able to create new roles or take over accounts. > > I think it's important to go beyond the idea of a role attribute here. It's > not that role "bob" can drop roles. It's that "bob" can drop *specific* > roles, and for that, there has to be some kind of dependency tracked between > "bob" and those other roles. I'm calling that "ownership". I think that > language isn't just arbitrary, but actually helpful (technically, not > politically) because REASSIGN OWNED should treat this kind of relationship > exactly the same as it treats ownership of schemas, tables, functions, etc. 
I agree that role attributes aren't a good approach and that we should be moving away from it. I'm less sure that the existence of REASSIGN OWNED for schemas and tables and such should be the driver for what this capability of one role being able to drop another role needs to be called. > > Separation of concerns and powers and all of that is what we want to be > going for here, more generically, which is why I was opposed to the > blanket "owners have all rights of all roles they own" implementation. > > I'm hoping to bring back, in v9, the idea of ownership/managership. The real > sticking point here is that we (Robert, Andrew, I, and possibly others) want > to be able to drop in a non-superuser-creator-role into existing systems that > use superuser for role management. We'd like it to be as transparent a > switch as possible. That description itself really makes me wonder about the sense of what was proposed. Specifically "existing systems that use superuser for role management" doesn't make me picture a system where this manager role has any need to run SELECT statements against the
Re: Refactoring SSL tests
> On 2 Feb 2022, at 17:09, Andrew Dunstan wrote: > On 2/2/22 08:26, Daniel Gustafsson wrote: >> Thoughts? I'm fairly sure there are many crimes against Perl in this patch, >> I'm happy to take pointers on how to improve that. > > It feels a bit odd to me from a perl POV. I think it needs to more along > the lines of standard OO patterns. I'll take a stab at that based on > this, might be a few days. That would be great, thanks! -- Daniel Gustafsson https://vmware.com/
Re: Server-side base backup: why superuser, not pg_write_server_files?
Daniel Gustafsson writes: >> On 2 Feb 2022, at 19:58, Robert Haas wrote: >> And one thing that concretely stinks about is the progress reporting >> you get while the tests are running: >> >> t/010_pg_basebackup.pl ... 142/? >> >> That's definitely less informative than 142/330 or whatever. > There is that. That's less informative, but only when looking at the tests > while they are running. There is no difference once the tests has finished so > CI runs etc are no less informative. This however is something to consider. TBH I don't see that that's worth much. None of our tests run so long that you'll be sitting there trying to estimate when it'll be done. I'd rather have the benefit of not having to maintain the test counts. regards, tom lane
Re: [PATCH] Accept IP addresses in server certificate SANs
On Mon, 2022-01-31 at 17:29 +0900, Kyotaro Horiguchi wrote: > However, 0002, > > +/* > + * In a frontend build, we can't include inet.h, but we still need to have > + * sensible definitions of these two constants. Note that pg_inet_net_ntop() > + * assumes that PGSQL_AF_INET is equal to AF_INET. > + */ > +#define PGSQL_AF_INET (AF_INET + 0) > +#define PGSQL_AF_INET6 (AF_INET + 1) > + > > Now we have the same definition thrice in frontend code. Coulnd't we > define them in, say, libpq-fe.h or inet-fe.h (nonexistent) then > include it from the three files? I started down the inet-fe.h route, and then realized I didn't know where that should go. Does it need to be included in (or part of) port.h? And should it be installed as part of the logic in src/include/Makefile? > +$node->connect_fails( > + "$common_connstr host=192.0.2.2", > + "host not matching an IPv4 address (Subject Alternative Name 1)", > > It is not the real IP address of the server. > > https://datatracker.ietf.org/doc/html/rfc6125 > > In some cases, the URI is specified as an IP address rather than a > > hostname. In this case, the iPAddress subjectAltName must be > > present in the certificate and must exactly match the IP in the URI. > > When IP address is embedded in URI, it won't be translated to another > IP address. Concretely https://192.0.1.5/hoge cannot reach to the host > 192.0.1.8. On the other hand, as done in the test, libpq allows that > when "host=192.0.1.5 hostaddr=192.0.1.8". I can't understand what we > are doing in that case. Don't we need to match the SAN IP address > with hostaddr instead of host? I thought that host, not hostaddr, was the part that corresponded to the URI. So in a hypothetical future where postgresqls:// exists, the two URIs postgresqls://192.0.2.2:5432/db postgresqls://192.0.2.2:5432/db?hostaddr=127.0.0.1 should both be expecting the same certificate. That seems to match the libpq documentation as well. (Specifying a host parameter is also allowed... 
that seems like it could cause problems for a hypothetical postgresqls:// scheme, but it's probably not relevant for this thread.) --Jacob
Re: [PoC] Delegating pg_ident to a third party
On Mon, 2022-01-10 at 15:09 -0500, Stephen Frost wrote: > Greetings, Sorry for the delay, the last few weeks have been insane. > * Jacob Champion (pchamp...@vmware.com) wrote: > > On Tue, 2022-01-04 at 22:24 -0500, Stephen Frost wrote: > > > On Tue, Jan 4, 2022 at 18:56 Jacob Champion wrote: > > > > Could you talk more about the use cases for which having the "actual > > > > user" is better? From an auditing perspective I don't see why > > > > "authenticated as ja...@example.net, logged in as admin" is any worse > > > > than "logged in as jacob". > > > > > > The above case isn’t what we are talking about, as far as I > > > understand anyway. You’re suggesting “authenticated as > > > ja...@example.net, logged in as sales” where the user in the database > > > is “sales”. Consider triggers which only have access to “sales”, or > > > a tool like pgaudit which only has access to “sales”. > > > > Okay. So an additional getter function in miscadmin.h, and surfacing > > that function to trigger languages, are needed to make authn_id more > > generally useful. Any other cases you can think of? > > That would help but now you've got two different things that have to be > tracked, potentially, because for some people you might not want to use > their system auth'd-as ID. I don't see that as a great solution and > instead as a workaround. There's nothing to be worked around. If you have a user mapping set up using the features that exist today, and you want to audit who logged in at some point in the past, then you need to log both the authenticated ID and the authorized role. There's no getting around that. It's not enough to say "just check the configuration" because the config can change over time. 
> > But to elaborate: *forcing* one-user-per-role is wasteful, because if I > > have a thousand employees, and I want to give all my employees access > > to a guest role in the database, then I have to administer a thousand > > roles: maintaining them through dump/restores and pg_upgrades, auditing > > them to figure out why Bob in Accounting somehow got a different > > privilege GRANT than the rest of the users, adding new accounts, > > purging old ones, maintaining the inevitable scripts that will result. > > pg_upgrade just handles it, no? pg_dumpall -g does too. Having to deal > with roles in general is a pain but the number of them isn't necessarily > an issue. A guest role which doesn't have any auditing requirements > might be a decent use-case for what you're talking about here but I > don't know that we'd implement this for just that case. Part of this > discussion was specifically about addressing the other challenges- like > having automation around the account addition/removal and sync'ing role > membership too. As for auditing privileges, that should be done > regardless and the case you outline isn't somehow different from others > (the same could be as easily said for how the 'guest' account got access > to whatever it did). I think there's a difference between auditing a small fixed number of roles and auditing many thousands of them that change on a weekly or daily basis. I'd rather maintain the former, given the choice. It's harder for things to slip through the cracks with fewer moving pieces. > > If none of the users need to be "special" in any way, that's all wasted > > overhead. (If they do actually need to be special, then at least some > > of that overhead becomes necessary. Otherwise it's waste.) You may be > > able to mitigate the cost of the waste, or absorb the mitigations into > > Postgres so that the user can't see the waste, or decide that the waste > > is not costly enough to care about. It's still waste. 
> > Except the amount of 'wasted' overhead being claimed here seems to be > hardly any. The biggest complaint levied at this seems to really be > just the issues around the load on the ldap systems from having to deal > with the frequent sync queries, and that's largely a solvable issue in > the majority of environments out there today. As long as we're in agreement that there is waste, I don't think I'm going to convince you about the cost. It's tangential anyway if you're not going to remove many-to-many maps. > > > Not sure exactly what you’re referring to here by “administer role > > > privileges externally too”..? Curious to hear what you are imagining > > > specifically. > > > > Just that it would be nice to centrally provision role GRANTs as well > > as role membership, that's all. No specifics in mind, and I'm not even > > sure if LDAP would be a helpful place to put that sort of config. > > GRANT's on objects, you mean? I agree, that would be interesting to > consider though it would involve custom entries in the LDAP directory, > no? Role membership would be able to be sync'd as part of group > membership and that was something I was thinking would be handled as > part of this in a similar manner to what the 3rd party solutions provide > today using
Re: Server-side base backup: why superuser, not pg_write_server_files?
> On 2 Feb 2022, at 19:58, Robert Haas wrote: > > On Wed, Feb 2, 2022 at 1:50 PM Robert Haas wrote: >> On Wed, Feb 2, 2022 at 1:46 PM Tom Lane wrote: >>> Well, if someone wants to step up and provide a patch that changes 'em >>> all at once, that'd be great. But we've discussed this before and >>> nothing's happened. >> >> I mean, I don't understand why it's even better. And I would go so far >> as to say that if nobody can be bothered to do the work to convert >> everything at once, it probably isn't better. I personally think it's better, so I went and did the work. The attached is a first pass over the tree to see what such a patch would look like. This should get a thread of its own and not be hidden here, but as it was discussed I piled on for now. > And one thing that concretely stinks about is the progress reporting > you get while the tests are running: > > t/010_pg_basebackup.pl ... 142/? > > That's definitely less informative than 142/330 or whatever. There is that. That's less informative, but only when looking at the tests while they are running. There is no difference once the tests have finished, so CI runs etc. are no less informative. This however is something to consider. -- Daniel Gustafsson https://vmware.com/ 0001-Replace-Test-More-plans-with-done_testing.patch Description: Binary data
Re: Server-side base backup: why superuser, not pg_write_server_files?
On Wed, Feb 2, 2022 at 1:50 PM Robert Haas wrote: > On Wed, Feb 2, 2022 at 1:46 PM Tom Lane wrote: > > Well, if someone wants to step up and provide a patch that changes 'em > > all at once, that'd be great. But we've discussed this before and > > nothing's happened. > > I mean, I don't understand why it's even better. And I would go so far > as to say that if nobody can be bothered to do the work to convert > everything at once, it probably isn't better. And one thing that concretely stinks about is the progress reporting you get while the tests are running: t/010_pg_basebackup.pl ... 142/? That's definitely less informative than 142/330 or whatever. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Server-side base backup: why superuser, not pg_write_server_files?
On Wed, Feb 2, 2022 at 1:46 PM Tom Lane wrote: > Well, if someone wants to step up and provide a patch that changes 'em > all at once, that'd be great. But we've discussed this before and > nothing's happened. I mean, I don't understand why it's even better. And I would go so far as to say that if nobody can be bothered to do the work to convert everything at once, it probably isn't better. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Deparsing rewritten query
On Wed, Feb 2, 2022 at 15:18, Julien Rouhaud wrote: > Hi, > > On Tue, Feb 01, 2022 at 08:35:00PM +0100, Pavel Stehule wrote: > > > > I tested your patch, and it looks so it is working without any problem. > All > > tests passed. > > > > There is just one question. If printalias = true will be active for all > > cases or just with some flag? > > Sorry, as I just replied to Bharath I sent the wrong patch. The new patch > has > the same modification with printalias = true though, so I can still answer > that > question. The change is active for all cases, however it's a no-op for any > in-core case, as a query sent by a client should be valid, and thus should > have > an alias attached to all subqueries. It's only different if you call > get_query_def() on the result of pg_analyze_and_rewrite(), since this code > doesn't add the subquery aliases as those aren't needed for the execution > part. > ok. I checked this trivial patch, and I don't see any problem. Again I run check-world with success. The documentation for this feature is not necessary. But I am not sure about regress tests. Without any other code, enforcing printalias will be invisible. What do you think about the transformation of your extension to a new module in src/test/modules? Maybe it can be used for other checks in future. Regards Pavel
Re: Server-side base backup: why superuser, not pg_write_server_files?
Robert Haas writes: > On Wed, Feb 2, 2022 at 12:55 PM Tom Lane wrote: >> Actually, it seemed that the consensus in the nearby thread [1] >> was to start doing exactly that, rather than try to convert them >> all in one big push. > Urk. Well, OK then. > Such an approach seems to me to have essentially nothing to recommend > it, but I just work here. Well, if someone wants to step up and provide a patch that changes 'em all at once, that'd be great. But we've discussed this before and nothing's happened. regards, tom lane
Re: Server-side base backup: why superuser, not pg_write_server_files?
On Wed, Feb 2, 2022 at 12:55 PM Tom Lane wrote: > Robert Haas writes: > > This seems like a good idea, but I'm not going to slip a change from > > an exact test count to done_testing() into a commit on some other > > topic... > > Actually, it seemed that the consensus in the nearby thread [1] > was to start doing exactly that, rather than try to convert them > all in one big push. Urk. Well, OK then. Such an approach seems to me to have essentially nothing to recommend it, but I just work here. -- Robert Haas EDB: http://www.enterprisedb.com
Re: archive modules
On Mon, Jan 31, 2022 at 8:36 PM Nathan Bossart wrote: > If basic_archive is to be in contrib, we probably want to avoid restarting > the archiver every time the module ERRORs. I debated trying to add a > generic exception handler that all archive modules could use, but I suspect > many will have unique cleanup requirements. Plus, AFAICT restarting the > archiver isn't terrible, it just causes most of the normal retry logic to > be skipped. > > I also looked into rewriting basic_archive to avoid ERRORs and return false > for all failures, but this was rather tedious. Instead, I just introduced > a custom exception handler for basic_archive's archive callback. This > allows us to ERROR as necessary (which helps ensure that failures show up > in the server logs), and the archiver can treat it like a normal failure > and avoid restarting. I think avoiding ERROR is going to be impractical. Catching it in the contrib module seems OK. Catching it in the generic code is probably also possible to do in a reasonable way. Not catching the error also seems like it would be OK, since we expect errors to be infrequent. I'm not objecting to anything you did here, but I'm uncertain why adding basic_archive alongside shell_archive changes the calculus here in any way. It just seems like a separate problem. + /* Perform logging of memory contexts of this process */ + if (LogMemoryContextPending) + ProcessLogMemoryContextInterrupt(); Any special reason for moving this up higher? Not really an issue, just curious. + gettext_noop("This is unused if \"archive_library\" does not indicate archiving via shell is enabled.") This contains a double negative. We could describe it more positively: This is used only if \"archive_library\" specifies archiving via shell. But that's actually a little confusing, because the way you've set it up, archiving via shell can be specified by writing either archive_library = '' or archive_library = 'shell'.
I don't see any particularly good reason to allow that to be spelled in two ways. Let's pick one. Then here we can write either: (a) This is used only if archive_library = 'shell'. -or- (b) This is used only if archive_library is not set. IMHO, either of those would be clearer than what you have right now, and it would definitely be shorter. +/* + * Callback that gets called to determine if the archive module is + * configured. + */ +typedef bool (*ArchiveCheckConfiguredCB) (void); + +/* + * Callback called to archive a single WAL file. + */ +typedef bool (*ArchiveFileCB) (const char *file, const char *path); + +/* + * Called to shutdown an archive module. + */ +typedef void (*ArchiveShutdownCB) (void); I think that this is the wrong amount of comments. One theory is that the reader should refer to the documentation to understand how these callbacks work. In that case, having a separate comment for each one that doesn't really say anything is just taking up space. It would be better to have one comment for all three lines referring the reader to the documentation. Alternatively, one could take the position that the explanation should go into these comments, and then perhaps we don't even really need documentation. A one-line comment that doesn't really say anything non-obvious seems like the worst amount. + + + There are considerable robustness and security risks in using archive modules + because, being written in the C language, they have access + to many server resources. Administrators wishing to enable archive modules + should exercise extreme caution. Only carefully audited modules should be + loaded. + + Maybe I'm just old and jaded, but do we really need this? I know we have the same thing for background workers, but if anything that seems like an argument against duplicating it elsewhere. Lots of copies of essentially identical warnings aren't the way to great documentation; if we copy this here, we'll probably copy it to more places. 
And also, it seems a bit like warning people that they shouldn't give their complete financial records to total strangers about whom they have little or no information. Do tell. + +typedef bool (*ArchiveCheckConfiguredCB) (void); + + +If true is returned, the server will proceed with +archiving the file by calling the archive_file_cb +callback. If false is returned, archiving will not +proceed. In the latter case, the server will periodically call this +function, and archiving will proceed if it eventually returns +true. It's not obvious from reading this why anyone would want to provide this callback, or have it do anything other than 'return true'. But there actually is a behavior difference if you provide this and have it return false, vs. just having archiving itself fail. At least, the message "archive_mode enabled, yet archiving is not configured" will be emitted. So that's something we could mention here. I would suggest s/if it
Re: A qsort template
I wrote: > > 0010 - Thresholds on my TODO list. > > I did some basic tests on the insertion sort thresholds, and it looks > like we could safely and profitably increase the current value from 7 > to 20 or so, in line with other more recent implementations. I've > attached an addendum on top of 0012 and the full test results on an > Intel Coffee Lake machine with gcc 11.1. I found that the object test > setup in 0012 had some kind of bug that was comparing the pointer of > the object array. Rather than fix that, I decided to use Datums, but > with the two extremes in comparator: simple branching with machine > instructions vs. a SQL-callable function. The papers I've read > indicate the results for Datum sizes would not be much different for > small structs. The largest existing sort element is SortTuple, but > that's only 24 bytes and has a bulky comparator as well. > > The first thing to note is that I rejected outright any testing of a > "middle value" where the pivot is simply the middle of the array. Even > the Bentley and McIlroy paper which is the reference for our > implementation says "The range that consists of the single integer 7 > could be eliminated, but has been left adjustable because on some > machines larger ranges are a few percent better". > > I tested thresholds up to 64, which is where I guessed results to get > worse (most implementations are smaller than that).
Here are the best > thresholds at a quick glance: > > - elementary comparator: > > random: 16 or greater > decreasing, rotate: get noticeably better all the way up to 64 > organ: little difference, but seems to get better all the way up to 64 > 0/1: seems to get worse above 20 > > - SQL-callable comparator: > > random: between 12 and 20, but slight differences until 32 > decreasing, rotate: get noticeably better all the way up to 64 > organ: seems best at 12, but slight differences until 32 > 0/1: slight differences > > Based on these tests and this machine, it seems 20 is a good default > value. I'll repeat this test on one older Intel and one non-Intel > platform with older compilers. The above was an Intel Comet Lake / gcc 11, and I've run the same test on a Haswell-era Xeon / gcc 8 and a Power8 machine / gcc 4.8. The results on those machines are pretty close to the above (full results attached). The noticeable exception is the Power8 on random input with a slow comparator -- those measurements there are more random than others so we can't draw conclusions from them, but the deviations are small in any case. I'm still thinking 20 or so is about right. I've put a lot out here recently, so I'll take a break now and come back in a few weeks. 
(no running tally here because the conclusions haven't changed since last message) -- John Naylor EDB: http://www.enterprisedb.com NOTICE: [direct] size=8MB, order=random, threshold=7, test=0, time=0.130188 NOTICE: [direct] size=8MB, order=random, threshold=7, test=1, time=0.124503 NOTICE: [direct] size=8MB, order=random, threshold=7, test=2, time=0.124557 NOTICE: [direct] size=8MB, order=random, threshold=12, test=0, time=0.119103 NOTICE: [direct] size=8MB, order=random, threshold=12, test=1, time=0.119035 NOTICE: [direct] size=8MB, order=random, threshold=12, test=2, time=0.086948 NOTICE: [direct] size=8MB, order=random, threshold=16, test=0, time=0.082710 NOTICE: [direct] size=8MB, order=random, threshold=16, test=1, time=0.082771 NOTICE: [direct] size=8MB, order=random, threshold=16, test=2, time=0.083004 NOTICE: [direct] size=8MB, order=random, threshold=20, test=0, time=0.080768 NOTICE: [direct] size=8MB, order=random, threshold=20, test=1, time=0.080764 NOTICE: [direct] size=8MB, order=random, threshold=20, test=2, time=0.080635 NOTICE: [direct] size=8MB, order=random, threshold=24, test=0, time=0.080678 NOTICE: [direct] size=8MB, order=random, threshold=24, test=1, time=0.080555 NOTICE: [direct] size=8MB, order=random, threshold=24, test=2, time=0.080604 NOTICE: [direct] size=8MB, order=random, threshold=28, test=0, time=0.079002 NOTICE: [direct] size=8MB, order=random, threshold=28, test=1, time=0.078901 NOTICE: [direct] size=8MB, order=random, threshold=28, test=2, time=0.082200 NOTICE: [direct] size=8MB, order=random, threshold=32, test=0, time=0.079317 NOTICE: [direct] size=8MB, order=random, threshold=32, test=1, time=0.079164 NOTICE: [direct] size=8MB, order=random, threshold=32, test=2, time=0.079308 NOTICE: [direct] size=8MB, order=random, threshold=64, test=0, time=0.078604 NOTICE: [direct] size=8MB, order=random, threshold=64, test=1, time=0.078718 NOTICE: [direct] size=8MB, order=random, threshold=64, test=2, time=0.078689 NOTICE: [direct] 
size=8MB, order=decreasing, threshold=7, test=0, time=0.025097 NOTICE: [direct] size=8MB, order=decreasing, threshold=7, test=1, time=0.025078 NOTICE: [direct] size=8MB, order=decreasing, threshold=7, test=2, time=0.025095 NOTICE: [direct] size=8MB, order=decreasing, threshold=12, test=0,
Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
On Wed, Feb 02, 2022 at 05:19:26PM +0530, Bharath Rupireddy wrote: > On Wed, Feb 2, 2022 at 5:25 AM Nathan Bossart > wrote: >> However, I'm not sure about the change to ReadDirExtended(). That might be >> okay for CheckPointSnapBuild(), which is just trying to remove old files, >> but CheckPointLogicalRewriteHeap() is responsible for ensuring that files >> are flushed to disk for the checkpoint. If we stop reading the directory >> after an error and let the checkpoint continue, isn't it possible that some >> mappings files won't be persisted to disk? > > Unless I mis-read your above statement, with LOG level in > ReadDirExtended, I don't think we stop reading the files in > CheckPointLogicalRewriteHeap. Am I missing something here? ReadDirExtended() has the following comment: * If elevel < ERROR, returns NULL after any error. With the normal coding * pattern, this will result in falling out of the loop immediately as * though the directory contained no (more) entries. If there is a problem reading the directory, we will LOG and then exit the loop. If we didn't scan through all the entries in the directory, there is a chance that we didn't fsync() all the files that need it. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
ci/cfbot: run windows tests under a timeout
Hi, On windows cfbot currently regularly hangs / times out. Presumably this is due to the issues discussed in https://postgr.es/m/CA%2BhUKG%2BG5DUNJfdE-qusq5pcj6omYTuWmmFuxCvs%3Dq1jNjkKKA%40mail.gmail.com which led to reverting [1] some networking related changes everywhere but master. But it's hard to tell - because the entire test task times out, we don't get to see debugging information. In earlier versions of the CI script I had tests run under a timeout command that killed the entire test run. I found that to be helpful when working on AIO. But I removed that, in an attempt to simplify things, before submitting. Turns out it was needed complexity. The attached patch adds a timeout (using git's timeout binary) to all vcregress invocations. I've not re-added it to the other OSs, but I'm on the fence about doing so. The diff is a bit larger than one might think necessary: Yaml doesn't like % - from the windows command variable syntax - at the start of an unquoted string... Separately, we should probably make Cluster.pm::psql() etc always use a "fallback" timeout (rather than just when the test writer thought it necessary). Or perhaps Utils.pm's INIT should set up a timer after which an individual test is terminated? Greetings, Andres Freund [1] commit 75674c7ec1b1607e7013b5cebcb22d9c8b4b2cb6 Author: Tom Lane Date: 2022-01-25 12:17:40 -0500 Revert "graceful shutdown" changes for Windows, in back branches only. This reverts commits 6051857fc and ed52c3707, but only in the back branches. Further testing has shown that while those changes do fix some things, they also break others; in particular, it looks like walreceivers fail to detect walsender-initiated connection close reliably if the walsender shuts down this way. We'll keep trying to improve matters in HEAD, but it now seems unwise to push these changes into stable releases. 
Discussion: https://postgr.es/m/CA+hUKG+OeoETZQ=Qw5Ub5h3tmwQhBmDA=nuNO3KG=zwfuyp...@mail.gmail.com >From e9795000729f649a861523c65376b978cea4a94c Mon Sep 17 00:00:00 2001 From: Andres Freund Date: Tue, 1 Feb 2022 15:25:54 -0800 Subject: [PATCH] wip: time out tests on windows ci-os-only: windows --- .cirrus.yml | 81 + 1 file changed, 45 insertions(+), 36 deletions(-) diff --git a/.cirrus.yml b/.cirrus.yml index 677bdf0e65e..8e0ae69beea 100644 --- a/.cirrus.yml +++ b/.cirrus.yml @@ -366,6 +366,14 @@ task: # build MSBFLAGS: -m -verbosity:minimal "-consoleLoggerParameters:Summary;ForceNoAlign" /p:TrackFileAccess=false -nologo +# If tests hang forever, cirrus eventually times out. In that case log +# output etc is not uploaded, making the problem hard to debug. Of course +# tests internally should have shorter timeouts, but that's proven to not +# be sufficient. 15min currently is fast enough to finish individual test +# "suites". +T_C: "\"C:/Program Files/Git/usr/bin/timeout.exe\" -v -k60s 15m" + + only_if: $CIRRUS_CHANGE_MESSAGE !=~ '.*\nci-os-only:.*' || $CIRRUS_CHANGE_MESSAGE =~ '.*\nci-os-only:[^\n]*windows.*' windows_container: @@ -391,42 +399,43 @@ task: # Installation on windows currently only completely works from src/tools/msvc - cd src/tools/msvc && perl install.pl %CIRRUS_WORKING_DIR%/tmp_install - test_regress_parallel_script: -- perl src/tools/msvc/vcregress.pl check parallel - startcreate_script: -# paths to binaries need backslashes -- tmp_install\bin\pg_ctl.exe initdb -D tmp_check/db -l tmp_check/initdb.log --options=--no-sync -- echo include '%TEMP_CONFIG%' >> tmp_check/db/postgresql.conf -- tmp_install\bin\pg_ctl.exe start -D tmp_check/db -l tmp_check/postmaster.log - test_pl_script: -- perl src/tools/msvc/vcregress.pl plcheck - test_isolation_script: -- perl src/tools/msvc/vcregress.pl isolationcheck - test_modules_script: -- perl src/tools/msvc/vcregress.pl modulescheck - test_contrib_script: -- perl src/tools/msvc/vcregress.pl contribcheck - 
stop_script: -- tmp_install\bin\pg_ctl.exe stop -D tmp_check/db -l tmp_check/postmaster.log - test_ssl_script: -- set with_ssl=openssl -- perl src/tools/msvc/vcregress.pl taptest ./src/test/ssl/ - test_subscription_script: -- perl src/tools/msvc/vcregress.pl taptest ./src/test/subscription/ - test_authentication_script: -- perl src/tools/msvc/vcregress.pl taptest ./src/test/authentication/ - test_recovery_script: -- perl src/tools/msvc/vcregress.pl recoverycheck - test_bin_script: -- perl src/tools/msvc/vcregress.pl bincheck - test_pg_upgrade_script: -- perl src/tools/msvc/vcregress.pl upgradecheck - test_ecpg_script: -# tries to build additional stuff -- vcvarsall x64 -# References ecpg_regression.proj in the current dir -- cd src/tools/msvc -- perl vcregress.pl ecpgcheck + test_regress_parallel_script: | +%T_C% perl src/tools/msvc/vcregress.pl
Re: 2022-01 Commitfest
On Wed, Feb 02, 2022 at 01:00:18PM -0500, Tom Lane wrote: > > Anyway, thanks to Julien for doing this mostly-thankless task > this time! > Agreed, great work! -- Jaime Casanova Director de Servicios Profesionales SystemGuards - Consultores de PostgreSQL
Re: 2022-01 Commitfest
Jaime Casanova writes: > On Thu, Feb 03, 2022 at 01:28:53AM +0800, Julien Rouhaud wrote: >> If we close all patches that had a review just because they weren't perfect >> in >> their initial submission, we're just going to force everyone to re-register >> their patch for every single commit fest. I don't see that doing anything >> apart from making sure that everyone stops contributing. > I had the same problem last time, "Returned with feedback" didn't feel > fine in some cases. Agreed, we're not here to cause make-work for submitters. RWF is appropriate if the patch has been in Waiting On Author for awhile and doesn't seem to be going anywhere, but otherwise we should just punt it to the next CF. Anyway, thanks to Julien for doing this mostly-thankless task this time! regards, tom lane
Re: 2022-01 Commitfest
On Wed, Feb 02, 2022 at 12:45:40PM -0500, Jaime Casanova wrote: > On Thu, Feb 03, 2022 at 01:28:53AM +0800, Julien Rouhaud wrote: > > > > My understanding of "Returned with Feedback" is that the patch implements > > something wanted, but as proposed won't be accepted without a major > > redesign or > > something like that. Not patches that are going through normal "review / > > addressing reviews" cycles. And definitely not bug fixes either. > > > > If we close all patches that had a review just because they weren't perfect > > in > > their initial submission, we're just going to force everyone to re-register > > their patch for every single commit fest. I don't see that doing anything > > apart from making sure that everyone stops contributing. > > > > I had the same problem last time, "Returned with feedback" didn't feel > fine in some cases. > > After reading this i started to wish there was some kind of guide about > this, and of course the wiki has that guide (outdated yes but something > to start with). > > https://wiki.postgresql.org/wiki/CommitFest_Checklist#Sudden_Death_Overtime > > This needs some love, still mentions rrreviewers for example Yes, I looked at it but to be honest it doesn't make any sense. It feels like this is punishing patches that get reviewed at the end of the commitfest or that previously got an incorrect review, and somehow tries to salvage patches from authors that don't review anything. > but if we > updated and put here a clear definition of the states maybe it could > help to do CF managment. I'm all for it, but looking at the current commit fest focusing on unresponsive authors (e.g. close one way or another patches that have been waiting on author for more than X days) should already help quite a lot.
Re: Server-side base backup: why superuser, not pg_write_server_files?
Robert Haas writes: > This seems like a good idea, but I'm not going to slip a change from > an exact test count to done_testing() into a commit on some other > topic... Actually, it seemed that the consensus in the nearby thread [1] was to start doing exactly that, rather than try to convert them all in one big push. regards, tom lane [1] https://www.postgresql.org/message-id/flat/9D4FFB61-392B-4A2C-B7E4-911797B4AC14%40yesql.se
Re: 2022-01 Commitfest
On Thu, Feb 03, 2022 at 01:28:53AM +0800, Julien Rouhaud wrote: > > My understanding of "Returned with Feedback" is that the patch implements > something wanted, but as proposed won't be accepted without a major redesign > or > something like that. Not patches that are going through normal "review / > addressing reviews" cycles. And definitely not bug fixes either. > > If we close all patches that had a review just because they weren't perfect in > their initial submission, we're just going to force everyone to re-register > their patch for every single commit fest. I don't see that doing anything > apart from making sure that everyone stops contributing. > I had the same problem last time, "Returned with feedback" didn't feel fine in some cases. After reading this I started to wish there was some kind of guide about this, and of course the wiki has that guide (outdated, yes, but something to start with). https://wiki.postgresql.org/wiki/CommitFest_Checklist#Sudden_Death_Overtime This needs some love, still mentions rrreviewers for example, but if we updated it and put a clear definition of the states there, maybe it could help with CF management. -- Jaime Casanova Director de Servicios Profesionales SystemGuards - Consultores de PostgreSQL
Re: 2022-01 Commitfest
Hi, On Wed, Feb 02, 2022 at 12:09:06PM -0500, Greg Stark wrote: > I gave two reviews and received one review but the patches have been > "Moved to next CF". For now I only moved to the next commit fest the patches that were in "Needs Review" or "Ready for Committer". I'm assuming that you failed to update the cf entry accordingly after your reviews, so yeah the patches were moved. I unfortunately don't have a lot of time right now and the commit fest still needs to be closed, so I prefer to use my time triaging the patches that were marked as Waiting on Author rather than going through a couple hundred of threads yet another time. > Should I update them to "Returned with Feedback" > given they all did get feedback? I was under the impression "Moved to > next CF" was only for patches that didn't get feedback in a CF and > were still waiting for feedback. My understanding of "Returned with Feedback" is that the patch implements something wanted, but as proposed won't be accepted without a major redesign or something like that. Not patches that are going through normal "review / addressing reviews" cycles. And definitely not bug fixes either. If we close all patches that had a review just because they weren't perfect in their initial submission, we're just going to force everyone to re-register their patch for every single commit fest. I don't see that doing anything apart from making sure that everyone stops contributing.
Re: [PATCH] Add reloption for views to enable RLS
Hi Laurenz, thank you again for the review! On 1/20/22 15:20, Laurenz Albe wrote: [..] I gave the new patch a spin, and got a surprising result: [..] INSERT INTO v VALUES (1); INSERT 0 1 Huh? "duff" has no permission to insert into "tab"! That really should not happen; thanks for finding that and helping me investigate how to fix it! This is now solved by checking the security_invoker property on the view in rewriteTargetView(). I've also added a testcase for this in v4 to catch that in the future. [..] About the documentation: --- a/doc/src/sgml/ref/create_view.sgml +++ b/doc/src/sgml/ref/create_view.sgml + +security_invoker (boolean) + + + If this option is set, it will cause all access to the underlying + tables to be checked as referenced by the invoking user, rather than + the view owner. This will only take effect when row level security is + enabled on the underlying tables (using + ALTER TABLE ... ENABLE ROW LEVEL SECURITY). + Why should this *only* take effect if (not "when") RLS is enabled? The above test shows that there is an effect even without RLS. + This option can be changed on existing views using ALTER VIEW. See + for more details on row level security. + I don't think that it is necessary to mention that this can be changed with ALTER VIEW - all storage parameters can be. I guess you copied that from the "check_option" documentation, but I would say it need not be mentioned there either. Exactly, I tried to fit it in with the existing parameters. I moved the link to ALTER VIEW to the end of the paragraph, as it applies to all options anyway. + +If the security_invoker option is set on the view, +access to tables is determined by permissions of the invoking user, rather +than the view owner. This can be used to provide stricter permission +checking to the underlying tables than by default. Since you are talking about use cases here, RLS might deserve a mention. Expanded upon a little bit in v4. 
--- a/src/backend/access/common/reloptions.c +++ b/src/backend/access/common/reloptions.c + { + { + "security_invoker", + "View subquery in invoked within the current security context.", + RELOPT_KIND_VIEW, + AccessExclusiveLock + }, + false + }, That doesn't seem to be proper English. Yes, that happened when rewriting this for v1 -> v2. Fixed. Thanks, Christoph HeissFrom 01437a45bfd069080ffe0eb45288bfddd3de6009 Mon Sep 17 00:00:00 2001 From: Christoph Heiss Date: Wed, 2 Feb 2022 16:44:38 +0100 Subject: [PATCH v4 1/3] Add new boolean reloption security_invoker to views When this reloption is set to true, all references to the underlying tables will be checked against the invoking user rather than the view owner, as is currently implemented. Signed-off-by: Christoph Heiss --- src/backend/access/common/reloptions.c| 11 src/backend/nodes/copyfuncs.c | 1 + src/backend/nodes/equalfuncs.c| 1 + src/backend/nodes/outfuncs.c | 1 + src/backend/nodes/readfuncs.c | 1 + src/backend/optimizer/plan/subselect.c| 1 + src/backend/optimizer/prep/prepjointree.c | 1 + src/backend/rewrite/rewriteHandler.c | 17 -- src/backend/utils/cache/relcache.c| 63 +-- src/include/nodes/parsenodes.h| 1 + src/include/utils/rel.h | 11 11 files changed, 77 insertions(+), 32 deletions(-) diff --git a/src/backend/access/common/reloptions.c b/src/backend/access/common/reloptions.c index d592655258..c7c62a0076 100644 --- a/src/backend/access/common/reloptions.c +++ b/src/backend/access/common/reloptions.c @@ -140,6 +140,15 @@ static relopt_bool boolRelOpts[] = }, false }, + { + { + "security_invoker", + "Check permissions against underlying tables as the calling user, not as view owner", + RELOPT_KIND_VIEW, + AccessExclusiveLock + }, + false + }, { { "vacuum_truncate", @@ -1996,6 +2005,8 @@ view_reloptions(Datum reloptions, bool validate) static const relopt_parse_elt tab[] = { {"security_barrier", RELOPT_TYPE_BOOL, offsetof(ViewOptions, security_barrier)}, + {"security_invoker", RELOPT_TYPE_BOOL, + 
offsetof(ViewOptions, security_invoker)}, {"check_option", RELOPT_TYPE_ENUM, offsetof(ViewOptions, check_option)} }; diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c index 90b5da51c9..b171992ba8 100644 --- a/src/backend/nodes/copyfuncs.c +++ b/src/backend/nodes/copyfuncs.c @@ -2465,6 +2465,7 @@ _copyRangeTblEntry(const RangeTblEntry *from) COPY_NODE_FIELD(tablesample); COPY_NODE_FIELD(subquery); COPY_SCALAR_FIELD(security_barrier); + COPY_SCALAR_FIELD(security_invoker); COPY_SCALAR_FIELD(jointype); COPY_SCALAR_FIELD(joinmergedcols);
Re: 2022-01 Commitfest
I gave two reviews and received one review but the patches have been "Moved to next CF". Should I update them to "Returned with Feedback" given they all did get feedback? I was under the impression "Moved to next CF" was only for patches that didn't get feedback in a CF and were still waiting for feedback. On Wed, 2 Feb 2022 at 11:16, Julien Rouhaud wrote: > > Hi, > > It's now at least Feb. 1st anywhere on earth, so the commit fest is now over. > > Since last week 5 entries were committed, 1 withdrawn, 3 returned with > feedback, 2 already moved to the next commitfest and 1 rejected. > > This gives a total of 211 patches still alive, most of them ready for the next > and final pg15 commitfest. > > Status summary: > - Needs review: 147. > - Waiting on Author: 38. > - Ready for Committer: 26. > - Committed: 59. > - Moved to next CF: 3. > - Returned with Feedback: 5. > - Rejected: 5. > - Withdrawn: 9. > - Total: 292. > > I will take care of closing the current commit fest and moving the entries to > the next one shortly. > > -- greg
Re: pg_walinspect - a new extension to get raw WAL data and WAL stats
Additionally, I've looked at the tests, and while I'm not sure, I don't think this arrangement is going to work. I don't have the time to run CLOBBER_CACHE and CLOBBER_CACHE_ALWAYS tests but I know they run *really* slowly. So the test can't just do a CHECKPOINT and then trust that the next few transactions will still be in the WAL to decode later. There could have been many more timed checkpoints in between. I think the way to do it is to create either a backup label or a replication slot. Then you can inspect the LSN of the label or slot and decode all transactions between that point and the current point. I also think you should try to have a wider set of WAL records in that range to test decoding records with and without full page writes, with DDL records, etc. I do like that the tests don't actually have the decoded record info in the test though. But they can make a minimal effort to check that the records they think they're testing are actually being tested. Insert into a temporary table and then run a few queries with WHERE clauses to test for a heap insert and a btree insert, test that the right relid is present, and test that a full page write is present (if full page writes are enabled, I guess). You don't need an exhaustive set of checks because you're not testing that WAL logging works properly, just that the tests aren't accidentally passing because they're not finding any interesting records.
Re: CREATEROLE and role ownership hierarchies
On Tue, Feb 1, 2022 at 6:38 PM Andrew Dunstan wrote: > > In existing postgresql releases, having CREATEROLE means you can give away > > most attributes, including ones you yourself don't have (createdb, login). > > So we already have the concept of NOFOO WITH ADMIN OPTION, we just don't > > call it that. In pre-v8 patches on this thread, I got rid of that; you > > *must* have the attribute to give it away. But maybe that was too > > restrictive, and we need a way to specify, attribute by attribute, how this > > works. Is this just a problem of surprising grammar? Is it surprising > > behavior? If the latter, I'm inclined to give up this WIP as having been a > > bad move. If the former, I'll try to propose some less objectionable > > grammar. > > > > Certainly the grammar would need to be better. But I'm not sure any > grammar that expresses what is supported here is not going to be > confusing, because the underlying scheme seems complex. But I'm > persuadable. I'd like to hear from others on the subject. Well, we've been moving more and more in the direction of using predefined roles to manage access. The things that are basically Boolean flags on the role are mostly legacy stuff. So my tentative opinion (and I'm susceptible to being persuaded that I'm wrong here) is that putting a lot of work into fleshing out that infrastructure does not necessarily make a ton of sense. Are we ever going to add even one more flag that works that way? Also, any account that can create roles is a pretty high-privilege account. Maybe it's superuser, or maybe not, but it's certainly powerful. In my opinion, that makes fine distinctions here less important. Is there really an argument for saying "well, we're going to let you bypass RLS, but we're not going to let you give that privilege to others"? It seems contrived to think of restricting a role that is powerful enough to create whole new accounts in such a way. 
I'm not saying that someone couldn't have a use case for it, but I think it'd be a pretty thin use case. In short, I think it makes tons of sense to say that CREATEROLE lets you give to others those role flags which you have, but not the ones you lack. However, to me, it feels like overengineering to distinguish between things you have and can give away, things you have and can't give away, and things you don't even have. -- Robert Haas EDB: http://www.enterprisedb.com
Re: autovacuum prioritization
On Wed, Jan 26, 2022 at 6:46 PM Greg Stark wrote: > One thing I've been wanting to do something about is I think > autovacuum needs to be a little cleverer about when *not* to vacuum a > table because it won't do any good. I agree. > I was thinking of just putting a check in before kicking off a vacuum > and if the globalxmin is a significant fraction of the distance to the > relfrozenxid then instead log a warning. Basically it means "we can't > keep the bloat below the threshold due to the idle transactions et al, > not because there's insufficient i/o bandwidth". Unfortunately, XID distances don't tell us much, because the tuples need not be uniformly distributed across the XID space. In fact, it seems highly likely that they will be very non-uniformly distributed, with a few transactions having created a lot of dead tuples and most having created none. Therefore, it's pretty plausible that a vacuum that permits relfrozenxid++ could solve every problem we have. If we knew something about the distribution of dead XIDs in the table, then we could make an intelligent judgement about whether vacuuming would be useful. But otherwise I feel like we're just guessing, so instead of really fixing the problem we'll just be making it happen in a set of cases that's even harder to grasp. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Ensure that STDERR is empty during connect_ok
On 2022-Feb-02, Andrew Dunstan wrote: > On 2/2/22 11:01, Dagfinn Ilmari Mannsåker wrote: > > Rather than waiting for Someone™ to find a suitably-shaped tuit to do a > > whole sweep converting everything to done_testing(), I think we should > > make a habit of converting individual scripts whenever a change breaks > > the count. +1. > Or when anyone edits a script, even if the number of tests doesn't get > broken. Sure, if the committer is up to doing it, but let's not make that a hard requirement for any commit that touches a test script. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "Ni aún el genio muy grande llegaría muy lejos si tuviera que sacarlo todo de su propio interior" (Goethe)
Re: 2022-01 Commitfest
Hi, It's now at least Feb. 1st anywhere on earth, so the commit fest is now over. Since last week 5 entries were committed, 1 withdrawn, 3 returned with feedback, 2 already moved to the next commitfest and 1 rejected. This gives a total of 211 patches still alive, most of them ready for the next and final pg15 commitfest. Status summary: - Needs review: 147. - Waiting on Author: 38. - Ready for Committer: 26. - Committed: 59. - Moved to next CF: 3. - Returned with Feedback: 5. - Rejected: 5. - Withdrawn: 9. - Total: 292. I will take care of closing the current commit fest and moving the entries to the next one shortly.
Re: Ensure that STDERR is empty during connect_ok
On 2/2/22 11:01, Dagfinn Ilmari Mannsåker wrote: > Tom Lane writes: > >> Daniel Gustafsson writes: >> >>> While I prefer to not plan at all and instead run done_testing(), >>> doing that consistently is for another patch, keeping this with the >>> remainder of the suites. >> +1 to that too, counting the tests is a pretty useless exercise. > Rather than waiting for Someone™ to find a suitably-shaped tuit to do a > whole sweep converting everything to done_testing(), I think we should > make a habit of converting individual scripts whenever a change breaks > the count. Or when anyone edits a script, even if the number of tests doesn't get broken. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: Refactoring SSL tests
On 2/2/22 08:26, Daniel Gustafsson wrote: > Thoughts? I'm fairly sure there are many crimes against Perl in this patch, > I'm happy to take pointers on how to improve that. It feels a bit odd to me from a perl POV. I think it needs to more along the lines of standard OO patterns. I'll take a stab at that based on this, might be a few days. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: Ensure that STDERR is empty during connect_ok
Tom Lane writes: > Daniel Gustafsson writes: > >> While I prefer to not plan at all and instead run done_testing(), >> doing that consistently is for another patch, keeping this with the >> remainder of the suites. > > +1 to that too, counting the tests is a pretty useless exercise. Rather than waiting for Someone™ to find a suitably-shaped tuit to do a whole sweep converting everything to done_testing(), I think we should make a habit of converting individual scripts whenever a change breaks the count. - ilmari
Re: pg_receivewal - couple of improvements
On Wed, Feb 02, 2022 at 09:14:03PM +0530, Bharath Rupireddy wrote: > > FYI that thread is closed, it committed the change (f61e1dd [1]) that > pg_receivewal can read from its replication slot restart lsn. > > I know that providing the start pos as an option came up there [2], > but I wanted to start the discussion fresh as that thread got closed. Ah, sorry, I misunderstood your email. I'm not sure it's a good idea. If you have missing WALs in your target directory but have an alternative backup location, you will have to restore the WAL from that alternative location anyway, so I'm not sure how accepting a different start position is going to help in that scenario. On the other hand, allowing a position at the command line can also lead to accepting a bogus position, which could possibly make things worse. > 2) Currently, RECONNECT_SLEEP_TIME is 5sec - but I may want to have > more reconnect time as I know that the primary can go down at any time > for whatever reasons in production environments which can take some > time till I bring up primary and I don't want to waste compute cycles > in the node on which pg_receivewal is running I don't think that attempting a connection is really costly. Also, increasing this retry time increases the amount of time you're not streaming WALs, and thus the amount of data you can lose, so I'm not sure that's actually a good idea. But you might also want to make it more aggressive, so no objection to making it configurable.
Re: Server-side base backup: why superuser, not pg_write_server_files?
On Wed, Feb 2, 2022 at 10:42 AM Dagfinn Ilmari Mannsåker wrote: > Here's a follow-on patch that adds a test for non-superuser server-side > basebackup, which crashes without your patch and passes with it. This seems like a good idea, but I'm not going to slip a change from an exact test count to done_testing() into a commit on some other topic... -- Robert Haas EDB: http://www.enterprisedb.com
Re: refactoring basebackup.c
On Tue, Jan 18, 2022 at 1:55 PM Robert Haas wrote: > 0001 adds "server" and "blackhole" as backup targets. It now has some > tests. This might be more or less ready to ship, unless somebody else > sees a problem, or I find one. I played around with this a bit and it seems quite easy to extend this further. So please find attached a couple more patches to generalize this mechanism. 0001 adds an extensibility framework for backup targets. The idea is that an extension loaded via shared_preload_libraries can call BaseBackupAddTarget() to define a new base backup target, which the user can then access via pg_basebackup --target TARGET_NAME, or if they want to pass a detail string, pg_basebackup --target TARGET_NAME:DETAIL. There might be slightly better ways of hooking this into the system. I'm not unhappy with this approach, but there might be a better idea out there. 0002 adds an example contrib module called basebackup_to_shell. The system administrator can set basebackup_to_shell.command='SOMETHING'. A backup directed to the 'shell' target will cause the server to execute the configured command once per generated archive, and once for the backup_manifest, if any. When executing the command, %f gets replaced with the archive filename (e.g. base.tar) and %d gets replaced with the detail. The actual contents of the file are passed to the command's standard input, and it can then do whatever it likes with that data. Clearly, this is not state of the art; for instance, if what you really want is to upload the backup files someplace via HTTP, using this to run 'curl' is probably not so good of an idea as using an extension module that links with libcurl. That would likely lead to better error checking, better performance, nicer configuration, and just generally fewer things that can go wrong. On the other hand, writing an integration in C is kind of tricky, and this thing is quite easy to use -- and it does work. 
There are a couple of things to be concerned about with 0002 from a security perspective. First, in a backend environment, we have a function to spawn a subprocess via popen(), namely OpenPipeStream(), but there is no function to spawn a subprocess with execve() and end up with a socket connected to its standard input. And that means that whatever command the administrator configures is being interpreted by the shell, which is a potential problem given that we're interpolating the target detail string supplied by the user, who must have at least replication privileges but need not be the superuser. I chose to handle this by allowing the target detail to contain only alphanumeric characters. Refinement is likely possible, but whether the effort is worthwhile seems questionable. Second, what if the superuser wants to allow the use of this module to only some of the users who have replication privileges? That seems a bit unlikely but it's possible, so I added a GUC basebackup_to_shell.required_role. If set, the functionality is only usable by members of the named role. If unset, anyone with replication privilege can use it. I guess someone could criticize this as defaulting to the least secure setting, but considering that you have to have replication privileges to use this at all, I don't find that argument much to get excited about. I have to say that I'm incredibly happy with how easy these patches were to write. I think this is going to make adding new base backup targets as accessible as we can realistically hope to make it. There is some boilerplate code, as an examination of the patches will reveal, but it's not a lot, and at least IMHO it's pretty straightforward. 
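The alphanumeric-only restriction on the target detail can be sketched as a standalone check (detail_is_shell_safe is an illustrative name, not the module's actual API):

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>

/*
 * Sketch of the restriction described above: because the configured
 * command is interpreted by the shell, a user-supplied target detail
 * is accepted only if it is entirely alphanumeric, so it cannot smuggle
 * in shell metacharacters.  Illustrative only, not the actual
 * basebackup_to_shell code.
 */
static bool
detail_is_shell_safe(const char *detail)
{
	const char *p;

	for (p = detail; *p; p++)
	{
		/* cast to unsigned char: isalnum() on a negative char is UB */
		if (!isalnum((unsigned char) *p))
			return false;
	}
	return true;
}
```

A whitelist like this is deliberately cruder than shell-escaping: it rejects some harmless inputs, but it cannot be defeated by a quoting corner case the escaping logic forgot about.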
Granted, coding up a new base backup target is something only experienced C hackers are likely to do, but the fact that I was able to throw this together so quickly suggests to me that I've got the design basically right, and that anyone who does want to plug into the new mechanism shouldn't have too much trouble doing so. Thoughts? -- Robert Haas EDB: http://www.enterprisedb.com 0001-Allow-extensions-to-add-new-backup-targets.patch Description: Binary data 0002-Add-basebackup_to_shell-contrib-module.patch Description: Binary data
Re: Ensure that STDERR is empty during connect_ok
On 2022-Feb-02, Daniel Gustafsson wrote: > Making this a subtest in order to not having to change the callers, turns the > patch into the attached. For this we must group the new test with one already > existing test, if we group more into it (which would make more sense) then we > need to change callers as that reduce the testcount across the tree. Well, it wasn't my intention that this patch would not have to change the callers. What I was thinking about is making connect_ok() have a subtest for *all* the tests it contains (and changing the callers to match), so that *in the future* we can add more tests there without having to change the callers *again*. > Or did I misunderstand you? I think you did. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "How we put our fingers in each other's clay. That is friendship: playing potter and seeing what shapes can be drawn out of the other" (C. Halloway in Something Wicked This Way Comes, R. Bradbury)
Re: pg_receivewal - couple of improvements
On Wed, Feb 2, 2022 at 9:05 PM Julien Rouhaud wrote: > > Hi, > > On Wed, Feb 02, 2022 at 08:53:13PM +0530, Bharath Rupireddy wrote: > > > > Here are some improvements we can make to pg_receivewal that were > > emanated after working with it in production environments: > > > > 1) As a user, I, sometimes, want my pg_receivewal to start streaming > > from the LSN that I provide as an input i.e. startpos instead of it > > calculating the stream start position 1) from its target directory or > > 2) from its replication slot's restart_lsn or 3) after sending > > IDENTIFY_SYSTEM on to the primary. > > This is already being discussed (at least part of) as of > https://www.postgresql.org/message-id/flat/18708360.4lzOvYHigE%40aivenronan. FYI that thread is closed, it committed the change (f61e1dd [1]) that pg_receivewal can read from its replication slot restart lsn. I know that providing the start pos as an option came up there [2], but I wanted to start the discussion fresh as that thread got closed. [1] commit f61e1dd2cee6b1a1da75c2bb0ca3bc72f18748c1 Author: Michael Paquier Date: Tue Oct 26 09:30:37 2021 +0900 Allow pg_receivewal to stream from a slot's restart LSN Author: Ronan Dunklau Reviewed-by: Kyotaro Horiguchi, Michael Paquier, Bharath Rupireddy Discussion: https://postgr.es/m/18708360.4lzOvYHigE@aivenronan [2] https://www.postgresql.org/message-id/20210902.144554.1303720268994714850.horikyota.ntt%40gmail.com Regards, Bharath Rupireddy.
Re: Ensure that STDERR is empty during connect_ok
> On 2 Feb 2022, at 16:01, Alvaro Herrera wrote: > > On 2022-Feb-02, Daniel Gustafsson wrote: > >> As part of the NSS patchset, quite a few bugs (and NSS quirks) were found by >> inspecting STDERR in connect_ok and require it to be empty. This is not >> really >> NSS specific, and could help find issues in other libraries as well so I >> propose to apply it regardless of the fate of the NSS patchset. >> >> (The change in the SCRAM tests stems from this now making all testruns have >> the >> same number of tests. While I prefer to not plan at all and instead run >> done_testing(), doing that consistently is for another patch, keeping this >> with >> the remainder of the suites.) > > Since commit 405f32fc4960 we can rely on subtests for this, so perhaps > we should group all the tests in connect_ok() (including your new one) > into a subtest; then each connect_ok() calls count as a single test, and > we can add more tests in it without having to change the callers. Disclaimer: longer term I would prefer to remove test plan counting like the above, per Tom's comment downthread. This is just to make this patch more palatable on the way there. Making this a subtest in order not to have to change the callers turns the patch into the attached. For this we must group the new test with one already-existing test; if we group more into it (which would make more sense) then we need to change callers, as that reduces the test count across the tree. This makes the patch smaller, but not more readable IMO (given that we can't make it fully use subtests), so my preference is still the v1 patch. Or did I misunderstand you? -- Daniel Gustafsson https://vmware.com/ v2-0001-Ensure-that-STDERR-is-empty-in-connect_ok-tests.patch Description: Binary data
Re: Server-side base backup: why superuser, not pg_write_server_files?
Robert Haas writes: > On Fri, Jan 28, 2022 at 12:35 PM Dagfinn Ilmari Mannsåker > wrote: >> On Fri, 28 Jan 2022, at 17:33, Robert Haas wrote: >> > LGTM. Committed. >> >> Thanks! > > It appears that neither of us actually tested that this works. Oops! > For me, it works when I test as a superuser, but if I test as a > non-superuser with or without pg_write_server_files, it crashes, > because we end up trying to do syscache lookups without a transaction > environment. I *think* that the attached is a sufficient fix; at > least, it passes simple testing. Here's a follow-on patch that adds a test for non-superuser server-side basebackup, which crashes without your patch and passes with it. - ilmari >From e88af403706338d068a7156d2a9c02e27196ce12 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Dagfinn=20Ilmari=20Manns=C3=A5ker?= Date: Wed, 2 Feb 2022 15:40:55 + Subject: [PATCH] Test server-side basebackup as non-superuser --- src/bin/pg_basebackup/t/010_pg_basebackup.pl | 13 - 1 file changed, 12 insertions(+), 1 deletion(-) diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl b/src/bin/pg_basebackup/t/010_pg_basebackup.pl index a827be5e59..2283a8c42d 100644 --- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl +++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl @@ -10,7 +10,7 @@ use Fcntl qw(:seek); use PostgreSQL::Test::Cluster; use PostgreSQL::Test::Utils; -use Test::More tests => 143; +use Test::More; program_help_ok('pg_basebackup'); program_version_ok('pg_basebackup'); @@ -521,6 +521,15 @@ ok(-f "$tempdir/backuponserver/base.tar", 'backup tar was created'); rmtree("$tempdir/backuponserver"); +$node->command_ok( + [qw(createuser --replication --role=pg_write_server_files backupuser)], + 'create backup user'); +$node->command_ok( + [ @pg_basebackup_defs, '-U', 'backupuser', '--target', "server:$real_tempdir/backuponserver", '-X', 'none' ], + 'backup target server'); +ok(-f "$tempdir/backuponserver/base.tar", 'backup tar was created as non-superuser'); 
+rmtree("$tempdir/backuponserver"); + $node->command_fails( [ @pg_basebackup_defs, '-D', @@ -768,3 +777,5 @@ rmtree("$tempdir/backup_gzip2"); rmtree("$tempdir/backup_gzip3"); } + +done_testing(); -- 2.30.2
Re: warn if GUC set to an invalid shared library
On Tue, Feb 1, 2022 at 11:06 PM Maciek Sakrejda wrote: > I tried running ALTER SYSTEM and got the warnings as expected: > > postgres=# alter system set shared_preload_libraries = > no_such_library,not_this_one_either; > WARNING: could not access file "$libdir/plugins/no_such_library" > WARNING: could not access file "$libdir/plugins/not_this_one_either" > ALTER SYSTEM > > I think this is great, but it would be really helpful to also indicate > that at this point the server will fail to come back up after a restart. In > my mind, that's a big part of the reason for having a warning here. Having > made this mistake a couple of times, I would be able to read between the > lines, as would many other users, but if you're not familiar with Postgres > this might still be pretty opaque. +1

I would at least consider having the UX go something like:

postgres=# ALTER SYSTEM SET shared_preload_libraries = no_such_library;
ERROR: .
HINT: to bypass the error please add FORCE before SET
postgres=# ALTER SYSTEM FORCE SET shared_preload_libraries = no_such_library;
NOTICE: Error suppressed while setting shared_preload_libraries.

That is, have the user express their desire to leave the system in a precarious state explicitly before actually doing so. At startup, if the system can already track each separate location where shared_preload_libraries is set, printing out those locations and current values would be useful context. Seeing ALTER SYSTEM in that listing would be helpful. David J.
Re: pg_receivewal - couple of improvements
Hi, On Wed, Feb 02, 2022 at 08:53:13PM +0530, Bharath Rupireddy wrote: > > Here are some improvements we can make to pg_receivewal that were > emanated after working with it in production environments: > > 1) As a user, I, sometimes, want my pg_receivewal to start streaming > from the LSN that I provide as an input i.e. startpos instead of it > calculating the stream start position 1) from its target directory or > 2) from its replication slot's restart_lsn or 3) after sending > IDENTIFY_SYSTEM on to the primary. This is already being discussed (at least part of) as of https://www.postgresql.org/message-id/flat/18708360.4lzOvYHigE%40aivenronan.
pg_receivewal - couple of improvements
Hi, Here are some improvements we can make to pg_receivewal that emanated from working with it in production environments:

1) As a user, I sometimes want my pg_receivewal to start streaming from the LSN that I provide as an input, i.e. startpos, instead of it calculating the stream start position 1) from its target directory, or 2) from its replication slot's restart_lsn, or 3) after sending IDENTIFY_SYSTEM to the primary. This will be particularly useful when the primary is down for some time (for whatever reason) and the WAL files that are required by pg_receivewal may have been removed by it (I know this situation is a bit messy, but it is quite possible in production environments). Then pg_receivewal will calculate the start position from its target directory and request it from the primary, which the primary may not have. I have to intervene, manually delete/move the WAL files in the pg_receivewal target directory, and restart pg_receivewal so that it can continue. Instead, if pg_receivewal can accept a startpos as an option, it can just go ahead and stream from the primary.

2) Currently, RECONNECT_SLEEP_TIME is 5 sec, but I may want a longer reconnect interval: the primary can go down at any time for whatever reason in production environments, and it can take some time until I bring it back up. I don't want to waste compute cycles on the node on which pg_receivewal is running, so I should be able to set the interval to a higher value, say 5 min or so, after which pg_receivewal can try StreamLog() and attempt a connection to the primary.

Thoughts? Regards, Bharath Rupireddy.
Re: Design of pg_stat_subscription_workers vs pgstats
On Wed, Feb 2, 2022 at 5:08 AM Amit Kapila wrote: > On Wed, Feb 2, 2022 at 1:06 PM David G. Johnston > wrote: > > ... > > > > I already explained that the concept of err_cnt is not useful. The fact > that you include it here makes me think you are still thinking that this > all somehow is meant to keep track of history. It is not. The workers are > state machines and "error" is one of the states - with relevant attributes > to display to the user, and system, while in that state. The state machine > reporting does not care about historical states nor does it report on > them. There is some uncertainty if we continue with the automatic > re-launch; > > > > I think automatic retry will help to allow some transient errors say > like network glitches that can be resolved on retry and will keep the > behavior transparent. This is also consistent with what we do in > standby mode where if there is an error on primary due to which > standby is not able to fetch some data it will just retry. We can't > fix any error that occurred on the server-side, so the way is to retry > which is true for both standby and subscribers. > Good points. In short there are two subsets of problems to deal with here. We should address them separately, though the pg_subscription_worker table should provide relevant information for both cases. If we are in a retry situation relevant information, like next_scheduled_retry (estimated), should be provided (if there is some kind of delay involved). In a situation like "unique constraint violation" the "next_scheduled_retry" would be null; or make the field a text field and print "Manual Intervention Required". Likewise, the XID/LSN would be null in a retry situation since we haven't received a wholly intact transaction from the publisher (we may know of such an ID but if the final COMMIT message is never even seen before the feed dies we should not be exposing that incomplete information to the user). 
A standby is not expected to encounter any user data constraint problems so even a system with manual intervention for such will work for standbys because they will never hit that code path. And you cannot simply skip applying the failed transaction and move onto the next one - that data also never came over. David J.
Re: Server-side base backup: why superuser, not pg_write_server_files?
On Fri, Jan 28, 2022 at 12:35 PM Dagfinn Ilmari Mannsåker wrote: > On Fri, 28 Jan 2022, at 17:33, Robert Haas wrote: > > LGTM. Committed. > > Thanks! It appears that neither of us actually tested that this works. For me, it works when I test as a superuser, but if I test as a non-superuser with or without pg_write_server_files, it crashes, because we end up trying to do syscache lookups without a transaction environment. I *think* that the attached is a sufficient fix; at least, it passes simple testing. -- Robert Haas EDB: http://www.enterprisedb.com 0001-Fix-server-crash-bug-in-server-backup-target.patch Description: Binary data
Re: Ensure that STDERR is empty during connect_ok
Daniel Gustafsson writes: > As part of the NSS patchset, quite a few bugs (and NSS quirks) were found by > inspecting STDERR in connect_ok and require it to be empty. This is not > really > NSS specific, and could help find issues in other libraries as well so I > propose to apply it regardless of the fate of the NSS patchset. +1 > (The change in the SCRAM tests stems from this now making all testruns have > the > same number of tests. While I prefer to not plan at all and instead run > done_testing(), doing that consistently is for another patch, keeping this > with > the remainder of the suites.) +1 to that too, counting the tests is a pretty useless exercise. regards, tom lane
Re: Ensure that STDERR is empty during connect_ok
On 2022-Feb-02, Daniel Gustafsson wrote: > As part of the NSS patchset, quite a few bugs (and NSS quirks) were found by > inspecting STDERR in connect_ok and require it to be empty. This is not > really > NSS specific, and could help find issues in other libraries as well so I > propose to apply it regardless of the fate of the NSS patchset. > > (The change in the SCRAM tests stems from this now making all testruns have > the > same number of tests. While I prefer to not plan at all and instead run > done_testing(), doing that consistently is for another patch, keeping this > with > the remainder of the suites.) Since commit 405f32fc4960 we can rely on subtests for this, so perhaps we should group all the tests in connect_ok() (including your new one) into a subtest; then each connect_ok() call counts as a single test, and we can add more tests in it without having to change the callers. -- Álvaro Herrera 39°49'30"S 73°17'W — https://www.EnterpriseDB.com/ In the beginning there was UNIX, and UNIX spoke and said: "Hello world\n". It didn't say "Hello New Jersey\n", nor "Hello USA\n".
Re: Add checkpoint and redo LSN to LogCheckpointEnd log message
On Tue, Feb 1, 2022 at 9:39 PM Fujii Masao wrote: > I found that CreateRestartPoint() already reported the redo lsn as follows > after emitting the restartpoint log message. To avoid duplicated logging of > the same information, we should update this code? > > ereport((log_checkpoints ? LOG : DEBUG2), > (errmsg("recovery restart point at %X/%X", > LSN_FORMAT_ARGS(lastCheckPoint.redo)), > xtime ? errdetail("Last completed transaction was at > log time %s.", > > timestamptz_to_str(xtime)) : 0)); > > This code reports lastCheckPoint.redo as redo lsn. OTOH, with the patch, > LogCheckpointEnd() reports ControlFile->checkPointCopy.redo. They may be > different, for example, when the current DB state is not > DB_IN_ARCHIVE_RECOVERY. In this case, which lsn should we report as redo lsn?

Do we ever reach CreateRestartPoint when ControlFile->state != DB_IN_ARCHIVE_RECOVERY? Assert(ControlFile->state == DB_IN_ARCHIVE_RECOVERY); in CreateRestartPoint doesn't fail any regression tests. Here's what can happen:

1) lastCheckPoint.redo is 100 and ControlFile->checkPointCopy.redo is 105: we log "skipping restartpoint, already performed at %X/%X" and LogCheckpointEnd isn't reached.

2) lastCheckPoint.redo is 105, ControlFile->checkPointCopy.redo is 100, and ControlFile->state == DB_IN_ARCHIVE_RECOVERY: the control file gets updated and LogCheckpointEnd prints the right redo lsn.

3) lastCheckPoint.redo is 105, ControlFile->checkPointCopy.redo is 100, and ControlFile->state != DB_IN_ARCHIVE_RECOVERY: the control file doesn't get updated and LogCheckpointEnd just prints the control file's redo lsn. This case looks rare, given that the Assert above doesn't fail any tests.

I think we should just let LogCheckpointEnd print the redo lsn from the control file.
We can just remove the above errmsg("recovery restart point at %X/%X") message altogether, or print it only in the rare scenario, something like below:

if (ControlFile->state != DB_IN_ARCHIVE_RECOVERY)
{
	ereport((log_checkpoints ? LOG : DEBUG2),
			(errmsg("performed recovery restart point at %X/%X while the database state is %s",
					LSN_FORMAT_ARGS(lastCheckPoint.redo),
					getDBState(ControlFile->state))));
}

And the last commit/abort record's timestamp will always get logged even before we reach here, in the main redo loop (errmsg("last completed transaction was at log time %s")). Or another way is to just pass the redo lsn to LogCheckpointEnd, using lastCheckPoint.redo in the ControlFile->state != DB_IN_ARCHIVE_RECOVERY case, or when the control file isn't updated but a restart point happened. Thoughts? Regards, Bharath Rupireddy.
Ensure that STDERR is empty during connect_ok
As part of the NSS patchset, quite a few bugs (and NSS quirks) were found by inspecting STDERR in connect_ok and requiring it to be empty. This is not really NSS specific, and could help find issues in other libraries as well, so I propose to apply it regardless of the fate of the NSS patchset. (The change in the SCRAM tests stems from this now making all test runs have the same number of tests. While I prefer to not plan at all and instead run done_testing(), doing that consistently is for another patch, keeping this with the remainder of the suites.) -- Daniel Gustafsson https://vmware.com/ 0001-Ensure-that-STDERR-is-empty-in-connect_ok-tests.patch Description: Binary data
Re: Deparsing rewritten query
Hi, On Tue, Feb 01, 2022 at 08:35:00PM +0100, Pavel Stehule wrote: > > I tested your patch, and it looks so it is working without any problem. All > tests passed. > > There is just one question. If printalias = true will be active for all > cases or just with some flag? Sorry, as I just replied to Bharath I sent the wrong patch. The new patch has the same modification with printalias = true though, so I can still answer that question. The change is active for all cases, however it's a no-op for any in-core case, as a query sent by a client should be valid, and thus should have an alias attached to all subqueries. It's only different if you call get_query_def() on the result of pg_analyze_and_rewrite(), since this code doesn't add the subquery aliases as those aren't needed for the execution part.
Re: Deparsing rewritten query
Hi, On Wed, Feb 02, 2022 at 07:09:35PM +0530, Bharath Rupireddy wrote: > On Tue, Feb 1, 2022 at 9:08 AM Julien Rouhaud wrote: > > > > Hi, > > Thanks. +1 for this work. Some comments on v3: > > 1) How about pg_get_rewritten_query()? Argh, I just realized that I sent the patch from the wrong branch. Per previous complaint from Tom, I'm not proposing that function anymore (I will publish an extension for that if the patch gets committed) but only expose get_query_def(). I'm attaching the correct patch this time, sorry about that. >From 0485ea1b507e8f2f1df782a97f11184276d7fca7 Mon Sep 17 00:00:00 2001 From: Julien Rouhaud Date: Tue, 29 Jun 2021 00:07:04 +0800 Subject: [PATCH v3] Expose get_query_def() This function can be useful for external module, for instance if they want to display a statement after the rewrite stage. In order to emit valid SQL, make sure that any subquery RTE comes with an alias. Author: Julien Rouhaud Reviewed-by: Gilles Darold Reviewed-by: Pavel Stehule Discussion: https://postgr.es/m/20210627041138.zklczwmu3ms4ufnk@nol --- src/backend/utils/adt/ruleutils.c | 15 +++ src/include/utils/ruleutils.h | 3 +++ 2 files changed, 14 insertions(+), 4 deletions(-) diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c index 039b1d2b95..3db2948984 100644 --- a/src/backend/utils/adt/ruleutils.c +++ b/src/backend/utils/adt/ruleutils.c @@ -389,9 +389,6 @@ static void make_ruledef(StringInfo buf, HeapTuple ruletup, TupleDesc rulettc, int prettyFlags); static void make_viewdef(StringInfo buf, HeapTuple ruletup, TupleDesc rulettc, int prettyFlags, int wrapColumn); -static void get_query_def(Query *query, StringInfo buf, List *parentnamespace, - TupleDesc resultDesc, - int prettyFlags, int wrapColumn, int startIndent); static void get_values_def(List *values_lists, deparse_context *context); static void get_with_clause(Query *query, deparse_context *context); static void get_select_query_def(Query *query, deparse_context *context, @@ 
-5344,7 +5341,7 @@ make_viewdef(StringInfo buf, HeapTuple ruletup, TupleDesc rulettc, * the view represented by a SELECT query. * -- */ -static void +void get_query_def(Query *query, StringInfo buf, List *parentnamespace, TupleDesc resultDesc, int prettyFlags, int wrapColumn, int startIndent) @@ -10989,6 +10986,16 @@ get_from_clause_item(Node *jtnode, Query *query, deparse_context *context) if (strcmp(refname, rte->ctename) != 0) printalias = true; } + else if (rte->rtekind == RTE_SUBQUERY) + { + /* +* For a subquery RTE, always print alias. A user-specified query +* should only be valid if an alias is provided, but our view +* expansion doesn't generate aliases, so a rewritten query might +* not be valid SQL. +*/ + printalias = true; + } if (printalias) appendStringInfo(buf, " %s", quote_identifier(refname)); diff --git a/src/include/utils/ruleutils.h b/src/include/utils/ruleutils.h index e8090c96d7..f512bb6867 100644 --- a/src/include/utils/ruleutils.h +++ b/src/include/utils/ruleutils.h @@ -39,6 +39,9 @@ extern List *select_rtable_names_for_explain(List *rtable, Bitmapset *rels_used); extern char *generate_collation_name(Oid collid); extern char *generate_opclass_name(Oid opclass); +void get_query_def(Query *query, StringInfo buf, List *parentnamespace, + TupleDesc resultDesc, + int prettyFlags, int wrapColumn, int startIndent); extern char *get_range_partbound_string(List *bound_datums); extern char *pg_get_statisticsobjdef_string(Oid statextid); -- 2.35.0
Re: Schema variables - new implementation for Postgres 15
Hi, On Sun, Jan 30, 2022 at 08:09:18PM +0100, Pavel Stehule wrote: > > rebase after 02b8048ba5dc36238f3e7c3c58c5946220298d71

Here are a few comments, mostly about pg_variable.c and sessionvariable.c. I stopped before reading the whole patch as I have some concerns about the sinval mechanism, which could change the rest of the patch a bit. I'm also attaching a patch (with a .txt extension to avoid problems with the cfbot) with some proposed comment updates.

In sessionvariable.c, why VariableEOXAction and VariableEOXActionCodes? Can't the parser emit the char value directly, like e.g. relpersistence?

Extraneous returns for 2 functions:

+void
+get_session_variable_type_typmod_collid(Oid varid, Oid *typid, int32 *typmod,
+ Oid *collid)
+{
[...]
+ return;
+}

+void
+initVariable(Variable *var, Oid varid, bool fast_only)
+{
[...]
+ return;
+}

VariableCreate(): Maybe add a bunch of AssertArg() for all the mandatory parameters? Also, the check for the variable already existing should be right after the AssertArg(), and use SearchSysCacheExistsX(). Maybe also add an Assert(OidIsValid(xxxoid)) just after the CatalogTupleInsert(), similarly to some other creation functions?

event-triggers.sgml needs updating for the firing matrix, as session variables are compatible with event triggers.

+typedef enum SVariableXActAction
+{
+ ON_COMMIT_DROP, /* used for ON COMMIT DROP */
+ ON_COMMIT_RESET, /* used for DROP VARIABLE */
+ RESET, /* used for ON TRANSACTION END RESET */
+ RECHECK /* recheck if session variable is living */
+} SVariableXActAction;

The names seem a bit generic, maybe add a prefix like SVAR_xxx? ON_COMMIT_RESET is also confusing as it looks like an SQL clause. Maybe PERFORM_DROP or something?

+static List *xact_drop_actions = NIL;
+static List *xact_reset_actions = NIL;

Maybe add a comment saying both are lists of SVariableXActAction?

+typedef SVariableData * SVariable;

looks like a missing bump to typedefs.list.
+char *
+get_session_variable_name(Oid varid)
+{
+ HeapTuple tup;
+ Form_pg_variable varform;
+ char *varname;
+
+ tup = SearchSysCache1(VARIABLEOID, ObjectIdGetDatum(varid));
+
+ if (!HeapTupleIsValid(tup))
+ elog(ERROR, "cache lookup failed for session variable %u", varid);
+
+ varform = (Form_pg_variable) GETSTRUCT(tup);
+
+ varname = NameStr(varform->varname);
+
+ ReleaseSysCache(tup);
+
+ return varname;
+}

This kind of function should return a palloc'd copy of the name.

+void
+ResetSessionVariables(void)
[...]
+ list_free_deep(xact_drop_actions);
+ xact_drop_actions = NIL;
+
+ list_free_deep(xact_reset_actions);
+ xact_drop_actions = NIL;
+}

The 2nd chunk should be xact_reset_actions = NIL.

+static void register_session_variable_xact_action(Oid varid, SVariableXActAction action);
+static void delete_session_variable_xact_action(Oid varid, SVariableXActAction action);

The naming is a bit confusing, maybe unregister_session_variable_xact_action() for consistency?

+void
+register_session_variable_xact_action(Oid varid,
+ SVariableXActAction action)

The function is missing the static keyword.

In AtPreEOXact_SessionVariable_on_xact_actions(), those 2 instructions are executed twice (once in the middle and once at the end):

list_free_deep(xact_drop_actions);
xact_drop_actions = NIL;

+* If this entry was created during the current transaction,
+* creating_subid is the ID of the creating subxact; if created in a prior
+* transaction, creating_subid is zero.

I don't see any place in the code where creating_subid can be zero? It looks like it's only there for a future transactional implementation, but for now this attribute seems unnecessary.

/* at transaction end recheck sinvalidated variables */
RegisterXactCallback(sync_sessionvars_xact_callback, NULL);

I don't think it's ok to use an xact callback for in-core code. The function explicitly says:

> * These functions are intended for use by dynamically loaded modules.
> * For built-in modules we generally just hardwire the appropriate calls > * (mainly because it's easier to control the order that way, where needed). Also, this function and AtPreEOXact_SessionVariable_on_xact_actions() skip all or part of the processing if there is no active transaction. Is that really ok? I'm particularly sceptical about AtPreEOXact_SessionVariable_on_xact_actions and the RECHECK actions, as the xact_reset_actions list is reset whether the recheck was done or not, so it seems to me that it could be leaking some entries in the hash table. If the database has a lot of objects, it seems possible (while unlikely) that a subsequent CREATE VARIABLE can get the same oid, leading to incorrect results? If that's somehow ok, wouldn't it be better to rearrange the code to call those functions less often, and only when
Re: Deparsing rewritten query
On Tue, Feb 1, 2022 at 9:08 AM Julien Rouhaud wrote: > > Hi, Thanks. +1 for this work. Some comments on v3:

1) How about pg_get_rewritten_query()?

2) Docs missing?

3) How about allowing only users who are explicitly granted permission to use this function, like pg_log_backend_memory_contexts, pg_log_query_plan (in discussion), and pg_log_backtrace (in discussion)?

4) initStringInfo in the for loop will palloc every time and will leak the memory. You probably need to do resetStringInfo in the for loop instead.
+ foreach(lc, querytree_list)
+ {
+ query = (Query *) lfirst(lc);
+ initStringInfo();

5) I would even suggest using a temp memory context for this function alone, because it will ensure we don't leak any memory that the parser, analyzer, and rewriter would probably use.

6) Why can't query be a for-loop variable?
+ Query *query;

7) Why can't the function check for an empty query string and emit an error immediately (empty string isn't allowed, or some other better error message), rather than depending on pg_parse_query?
+ parsetree_list = pg_parse_query(sql);
+
+ /* only support one statement at a time */
+ if (list_length(parsetree_list) != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("a single statement should be provided")));

8) Show rewritten query given raw query string
+{ oid => '9246', descr => 'show a query as rewritten',

9) Doesn't the input need a ; at the end of the query? Not sure if the parser assumes it is provided.
+SELECT pg_get_query_def('SELECT * FROM shoe') as def;

10) For pg_get_viewdef it makes sense to have the test case in rules.sql, but shouldn't this function be in misc_functions.sql?

11) Missing bump cat version note in the commit message.

12) I'm just thinking that adding an extra option to explain, which will then print the rewritten query in the explain output, would be more useful than having a separate function to do this?

13) Somebody might also be interested to just get the completed planned query, i.e. output of pg_plan_query?
Or, the other way around: given the query plan as input to a function, can we get the query back? Something like postgres_fdw/deparse.c does?

Regards,
Bharath Rupireddy.
Re: Make relfile tombstone files conditional on WAL level
On Wed, Feb 2, 2022 at 6:57 PM Robert Haas wrote:
> On Mon, Jan 31, 2022 at 9:37 AM Dilip Kumar wrote:
> > I agree that we are using 8 bytes unsigned int multiple places in code
> > as uint64. But I don't see it as an exposed data type and not used as
> > part of any exposed function. But we will have to use the relfilenode
> > in the exposed c function e.g. binary_upgrade_set_next_heap_relfilenode().
>
> Oh, I thought we were talking about the C data type uint8 i.e. an
> 8-bit unsigned integer. Which in retrospect was a dumb thought because
> you said you wanted to store the relfilenode AND the fork number
> there, which only make sense if you were talking about SQL data types
> rather than C data types. It is confusing that we have an SQL data
> type called int8 and a C data type called int8 and they're not the same.
>
> But if you're talking about SQL data types, why? pg_class only stores
> the relfilenode and not the fork number currently, and I don't see why
> that would change. I think that the data type for the relfilenode
> column would change to a 64-bit signed integer (i.e. bigint or int8)
> that only ever uses the low-order 56 bits, and then when you need to
> store a relfilenode and a fork number in the same 8-byte quantity
> you'd do that using either a struct with bit fields or by something
> like combined = ((uint64) signed_representation_of_relfilenode) |
> (((int) forknumber) << 56);

Yeah, you're right. I think whenever we are using the combined value we can use the uint64 C type, and in pg_class we can keep it as int64 because that only represents the relfilenode part. I think I was just confused and tried to use the same data type everywhere, whether or not it is combined with the fork number. Thanks for your input; I will change this.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Re: Make relfile tombstone files conditional on WAL level
On Mon, Jan 31, 2022 at 9:37 AM Dilip Kumar wrote:
> I agree that we are using 8 bytes unsigned int multiple places in code
> as uint64. But I don't see it as an exposed data type and not used as
> part of any exposed function. But we will have to use the relfilenode
> in the exposed c function e.g. binary_upgrade_set_next_heap_relfilenode().

Oh, I thought we were talking about the C data type uint8, i.e. an 8-bit unsigned integer. Which in retrospect was a dumb thought, because you said you wanted to store the relfilenode AND the fork number there, which only makes sense if you were talking about SQL data types rather than C data types. It is confusing that we have an SQL data type called int8 and a C data type called int8 and they're not the same.

But if you're talking about SQL data types, why? pg_class only stores the relfilenode and not the fork number currently, and I don't see why that would change. I think that the data type for the relfilenode column would change to a 64-bit signed integer (i.e. bigint or int8) that only ever uses the low-order 56 bits, and then when you need to store a relfilenode and a fork number in the same 8-byte quantity you'd do that using either a struct with bit fields or by something like:

combined = ((uint64) signed_representation_of_relfilenode) | (((int) forknumber) << 56);

-- 
Robert Haas
EDB: http://www.enterprisedb.com
Refactoring SSL tests
As part of the NSS patchset (and Secure Transport before that), I had to refactor the SSL tests to handle different SSL libraries. The current tests and test module are quite tied to how OpenSSL works wrt setting up the server; the attached refactors this and abstracts the OpenSSL specifics, more like how the rest of the codebase is set up. The tight coupling is of course not a problem right now, but I think this patch has enough benefits to make it a candidate for going in regardless of the fate of the NSS patchset. This is essentially the 0002 patch from that patchset with additional cleanup and documentation:

* switch_server_cert takes a set of named parameters rather than a fixed set with defaults depending on each other, which made adding ssl_passphrase to it cumbersome. It also adds readability IMO.

* SSLServer is renamed SSL::Server, which in turn uses SSL::Backend::X, where X is the backend pointed to by with_ssl. Each backend will implement its own module, which is responsible for setting up keys/certs and for resolving sslkey values to their full paths. The idea is that the namespace will also allow for an SSL::Client in the future, when we implement running client tests against different servers etc.

* The modules are POD documented.

* While not related to the refactor per se, the hardcoded number of planned tests is removed in favor of calling done_testing().

With this, adding a new SSL library is quite straightforward; I've done the legwork to test that =) I opted for avoiding too-invasive changes, leaving the tests somewhat easy to compare to back branches.

Thoughts? I'm fairly sure there are many crimes against Perl in this patch; I'm happy to take pointers on how to improve that.

-- 
Daniel Gustafsson
https://vmware.com/

Attachment: 0001-Refactor-SSL-tests.patch
Re: make MaxBackends available in _PG_init
On Tue, Feb 1, 2022 at 5:36 PM Nathan Bossart wrote: > I can work on a new patch if this is the direction we want to go. There > were a couple of functions that called GetMaxBackends() repetitively that I > should probably fix before the patch should be seriously considered. Sure, that sort of thing should be tidied up. It's unlikely to make any real difference, but as a fairly prominent PostgreSQL hacker once said, a cycle saved is a cycle earned. -- Robert Haas EDB: http://www.enterprisedb.com
Re: ICU for global collation
On 27.01.22 09:10, Peter Eisentraut wrote:
> On 21.01.22 17:13, Julien Rouhaud wrote:
> > On Fri, Jan 21, 2022 at 03:24:02PM +0100, Peter Eisentraut wrote:
> > > On 21.01.22 14:51, Julien Rouhaud wrote:
> > > > Is that change intended? There isn't any usage of the collversionstr
> > > > before the possible error when actual_versionstr is missing.
> > > I wanted to move it closer to the SysCacheGetAttr() where the "datum"
> > > value is obtained. It seemed weird to get the datum, then do other
> > > things, then decode the datum.
> > Oh ok. It won't make much difference performance-wise, so no objection.
> I have committed this and will provide follow-up patches in the next few days.

Here is the main patch rebased on the various changes that have been committed in the meantime. There is still some work to be done on the user interfaces of initdb, createdb, etc.

I have split out the database-level collation version tracking into a separate patch [0]. I think we should get that one in first and then refresh this one.

[0]: https://www.postgresql.org/message-id/flat/f0ff3190-29a3-5b39-a179-fa32eee57db6%40enterprisedb.com

From 4028980f6be3662c0302575ed92de77e941e5a9e Mon Sep 17 00:00:00 2001
From: Peter Eisentraut
Date: Wed, 2 Feb 2022 13:54:04 +0100
Subject: [PATCH v4] Add option to use ICU as global collation provider

This adds the option to use ICU as the default collation provider for either the whole cluster or a database. New options for initdb, createdb, and CREATE DATABASE are used to select this.
Discussion: https://www.postgresql.org/message-id/flat/5e756dd6-0e91-d778-96fd-b1bcb06c161a%402ndquadrant.com
---
 doc/src/sgml/catalogs.sgml                |   9 +
 doc/src/sgml/ref/create_database.sgml     |  16 ++
 doc/src/sgml/ref/createdb.sgml            |   9 +
 doc/src/sgml/ref/initdb.sgml              |  23 ++
 src/backend/catalog/pg_collation.c        |  18 +-
 src/backend/commands/collationcmds.c      |  93 ---
 src/backend/commands/dbcommands.c         |  72 +-
 src/backend/utils/adt/pg_locale.c         | 242 +++---
 src/backend/utils/init/postinit.c         |  26 ++
 src/bin/initdb/Makefile                   |   2 +
 src/bin/initdb/initdb.c                   |  64 -
 src/bin/initdb/t/001_initdb.pl            |  18 +-
 src/bin/pg_dump/pg_dump.c                 |  19 ++
 src/bin/pg_upgrade/check.c                |  10 +
 src/bin/pg_upgrade/info.c                 |  18 +-
 src/bin/pg_upgrade/pg_upgrade.h           |   2 +
 src/bin/psql/describe.c                   |  23 +-
 src/bin/psql/tab-complete.c               |   2 +-
 src/bin/scripts/Makefile                  |   2 +
 src/bin/scripts/createdb.c                |   9 +
 src/bin/scripts/t/020_createdb.pl         |  20 +-
 src/include/catalog/pg_collation.dat      |   3 +-
 src/include/catalog/pg_collation.h        |   6 +-
 src/include/catalog/pg_database.dat       |   4 +-
 src/include/catalog/pg_database.h         |   6 +
 src/include/utils/pg_locale.h             |   6 +
 .../regress/expected/collate.icu.utf8.out |  10 +-
 src/test/regress/sql/collate.icu.utf8.sql |   8 +-
 28 files changed, 572 insertions(+), 168 deletions(-)

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 7d5b0b1656..5a5779b9a3 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -2384,6 +2384,15 @@ pg_collation Columns
+
+
+ collicucoll text
+
+
+ ICU collation string
+
+
  collversion text

diff --git a/doc/src/sgml/ref/create_database.sgml b/doc/src/sgml/ref/create_database.sgml
index f22e28dc81..403010cddf 100644
--- a/doc/src/sgml/ref/create_database.sgml
+++ b/doc/src/sgml/ref/create_database.sgml
@@ -28,6 +28,7 @@
 [ LOCALE [=] locale ]
 [ LC_COLLATE [=] lc_collate ]
 [ LC_CTYPE [=] lc_ctype ]
+ [ COLLATION_PROVIDER [=] collation_provider ]
 [ TABLESPACE [=] tablespace_name ]
 [ ALLOW_CONNECTIONS [=] allowconn ]
 [ CONNECTION LIMIT [=]
connlimit ]

@@ -158,6 +159,21 @@ Parameters
+
+ collation_provider
+
+
+Specifies the provider to use for the default collation in this
+database. Possible values are:
+icu, ICU
+libc. libc is the default. The
+available choices depend on the operating system and build options.
+
+
  tablespace_name

diff --git a/doc/src/sgml/ref/createdb.sgml b/doc/src/sgml/ref/createdb.sgml
index 86473455c9..4b07363fcc 100644
--- a/doc/src/sgml/ref/createdb.sgml
+++ b/doc/src/sgml/ref/createdb.sgml
@@ -83,6 +83,15 @@ Options
+
+ --collation-provider={libc|icu}
+
+
+Specifies the collation provider for the database's default collation.
+
Re: RFC: Logging plan of the running query
On 2022-02-01 01:51, Fujii Masao wrote:

Thanks for reviewing and suggestions!

+ Note that there is the case where the request to log a query
+ plan is skipped even while the target backend is running a
+ query due to lock conflict avoidance.
+ If this happens, users can just retry pg_log_query_plan().

This may cause users to misunderstand that pg_log_query_plan() fails while the target backend is holding *any* locks? Isn't it better to mention "page-level locks", instead? So how about the following?

--
Note that the request to log the query plan will be ignored if it's received during a short period while the target backend is holding a page-level lock.
--

Agreed.

+ ereport(LOG_SERVER_ONLY,
+ errmsg("backend with PID %d is holding a page lock. Try again",
+ MyProcPid));

It seems more proper to output this message in DETAIL or HINT, instead. So how about something like the following messages?

LOG: could not log the query plan
DETAIL: query plan cannot be logged while page level lock is being held
HINT: Try pg_log_query_plan() after a few

Agreed. I felt the HINT message 'after a few ...' was difficult to phrase, and wrote it as below:

| HINT: Retrying pg_log_query_plan() might succeed since the lock duration of page-level locks is usually short

What do you think? Or do we not need a HINT message at all?

Removed the HINT message.

+ errmsg("could not log the query plan"),
+ errdetail("query plan cannot be logged while page level lock is being held"),

In the detail message, the first word of sentences should be capitalized. How about "Cannot log the query plan while holding page-level lock.", instead?

Agreed.

Thanks for updating the patch! Here are some review comments.

+ pg_log_query_plan

This entry is placed before the one for pg_log_backend_memory_contexts(). But it should be *after* that, since those entries seem to be placed in alphabetical order in the table?

Modified it.
+Requests to log the plan of the query currently running on the
+backend with specified process ID along with the untruncated
+query string.

Other descriptions about logging of the query string seem not to mention something like "untruncated query string" — for example, auto_explain, log_statement, etc. Why do we need to mention "along with the untruncated query string" specially for pg_log_query_plan()?

Modified it as below:

 Requests to log the plan of the query currently running on the
-backend with specified process ID along with the untruncated
-query string.
-They will be logged at LOG message level and
+backend with specified process ID.
+It will be logged at LOG message level and

+Note that nested statements (statements executed inside a function) are not
+considered for logging. Only the plan of the most deeply nested query is logged.

Now the plan of even a nested statement can be logged. So this description needs to be updated?

Modified it as below:

-Note that nested statements (statements executed inside a function) are not
-considered for logging. Only the plan of the most deeply nested query is logged.
+Note that when the statements are executed inside a function, only the
+plan of the most deeply nested query is logged.

@@ -440,6 +450,7 @@ standard_ExecutorFinish(QueryDesc *queryDesc)
 MemoryContextSwitchTo(oldcontext);
+ActiveQueryDesc = NULL;

ActiveQueryDesc seems unnecessary. Why does ActiveQueryDesc need to be reset to NULL in standard_ExecutorFinish()?

ActiveQueryDesc should not be reset in standard_ExecutorFinish(). Removed it.

Currently, even during ProcessLogQueryPlanInterrupt(), pg_log_query_plan() can be called and another ProcessLogQueryPlanInterrupt() can be executed. So repeated re-entrances into ProcessLogQueryPlanInterrupt() could cause a "stack depth limit exceeded" error. To avoid this, shouldn't we make ProcessLogQueryPlanInterrupt() do nothing and return immediately if it's called during another ProcessLogQueryPlanInterrupt()?
pg_log_backend_memory_contexts() might also have the same issue.

As you pointed out offlist, the issue could occur even when both pg_log_backend_memory_contexts and pg_log_query_plan are called, so it may be better to address that in another patch.

You also pointed out offlist that it would be necessary to call LockErrorCleanup() if ProcessLogQueryPlanInterrupt() can incur an ERROR. AFAICS it can call ereport(ERROR), i.e., palloc0() in NewExplainState(), so I added PG_TRY/CATCH and made it call LockErrorCleanup() when an ERROR occurs.

On 2022-02-01 17:27, Kyotaro Horiguchi wrote:

Thanks for reviewing, Horiguchi-san!

By the way, I'm anxious about the following part
RE: [Proposal] Add foreign-server health checks infrastructure
Dear Fujii-san,

Thank you for the good suggestions.

> This logic sounds complicated to me. I'm afraid that FDW developers may a bit
> easily misunderstand the logic and make the bug in their FDW.
> Isn't it simpler to just disable the timeout in core whenever the transaction ends,
> whether committed or aborted, like statement_timeout is disabled after each
> command? For example, call something like DisableForeignCheckTimeout() in
> CommitTransaction() etc.

Your idea is to stop the timer at the end of each transaction, right? I had not adopted that because I thought some developers might not want to stop the timer even when a transaction ends. That led to a complicated design without a good use case, however, so your logic is now implemented.

> > You are right. I think this suggests that error-reporting should be done in the core layer.
> > For meaningful error reporting, I changed the callback specification
> > so that it should return a ForeignServer* which points to the downed remote server.
>
> This approach seems to assume that FDW must manage all the ForeignServer
> information so that the callback can return it. Is this assumption valid for all the FDWs?

Not sure; the assumption might be too optimistic. mysql_fdw can easily return a ForeignServer* because it caches the serverid, but dblink and maybe oracle_fdw cannot.

> How about making FDW trigger a query cancel interrupt by signaling SIGINT to
> the backend, instead?

I understood that the error should be caught via QueryCancelPending. Could you check 0003? Does it follow that?

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments:
v08_0001_add_checking_infrastracture.patch
v08_0002_add_doc.patch
Re: Design of pg_stat_subscription_workers vs pgstats
On Wed, Feb 2, 2022 at 1:06 PM David G. Johnston wrote:
> On Tue, Feb 1, 2022 at 11:55 PM Amit Kapila wrote:
>> On Wed, Feb 2, 2022 at 9:41 AM David G. Johnston wrote:
>> > On Tue, Feb 1, 2022 at 8:07 PM Amit Kapila wrote:
>> >> On Tue, Feb 1, 2022 at 11:47 AM Masahiko Sawada wrote:
>> >> > I see that it's better to use a better IPC for ALTER SUBSCRIPTION SKIP
>> >> > feature to pass error-XID or error-LSN information to the worker
>> >> > whereas I'm also not sure of the advantages in storing all error
>> >> > information in a system catalog. Since what we need to do for this
>> >> > purpose is only error-XID/LSN, we can store only error-XID/LSN in the
>> >> > catalog? That is, the worker stores error-XID/LSN in the catalog on an
>> >> > error, and ALTER SUBSCRIPTION SKIP command enables the worker to skip
>> >> > the transaction in question. The worker clears the error-XID/LSN after
>> >> > successfully applying or skipping the first non-empty transaction.
>> >>
>> >> Where do you propose to store this information?
>> >
>> > pg_subscription_worker
>> >
>> > The error message and context is very important. Just make sure it is
>> > only non-null when the worker state is "syncing failed" (or whatever
>> > term we use).
>>
>> Sure, but is this the reason you want to store all the error info in
>> the system catalog? I agree that providing more error info could be
>> useful and also possibly the previously failed (apply) xacts info as
>> well, but I am not able to see why you want to have that sort of info
>> in the catalog. I could see storing info like err_lsn/err_xid that can
>> allow the apply worker to proceed automatically or to slow down the
>> launch of an errored apply worker, but not all sorts of other error info
>> (like err_cnt, err_code, err_message, err_time, etc.). I want to know
>> why you are insisting on making all the error info persistent via the
>> system catalog?
>
> ...
> I already explained that the concept of err_cnt is not useful. The fact that
> you include it here makes me think you are still thinking that this all
> somehow is meant to keep track of history. It is not. The workers are state
> machines and "error" is one of the states - with relevant attributes to
> display to the user, and system, while in that state. The state machine
> reporting does not care about historical states nor does it report on them.
> There is some uncertainty if we continue with the automatic re-launch;

I think automatic retry will help with some transient errors, say network glitches, that can be resolved on retry, and it will keep the behavior transparent. This is also consistent with what we do in standby mode: if there is an error on the primary due to which the standby is not able to fetch some data, it will just retry. We can't fix an error that occurred on the server side, so the way forward is to retry, which is true for both standbys and subscribers. Basically, I don't think every kind of error demands user intervention. We could allow controlling it via some parameter, say disable_on_error, as is discussed in CF entry [1], but I don't think that should be the default.

[1] - https://commitfest.postgresql.org/36/3407/

-- 
With Regards,
Amit Kapila.
Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
On Wed, Feb 2, 2022 at 5:25 AM Nathan Bossart wrote:
> On Mon, Jan 31, 2022 at 10:42:54AM +0530, Bharath Rupireddy wrote:
> > After an off-list discussion with Andreas, proposing here a patch that
> > basically replaces the ReadDir call with ReadDirExtended and gets rid of
> > lstat entirely. With this change, the checkpoint will only care about
> > the snapshot and mapping files and not fail if it finds other files in
> > the directories. Removing lstat enables us to make things faster, as we
> > avoid a bunch of extra system calls - one lstat call per mapping
> > or snapshot file.
>
> I think removing the lstat() is probably reasonable. We currently aren't
> doing proper error checking, and the chances of a non-regular file matching
> the prefix are likely pretty low. In the worst case, we'll LOG or ERROR
> when unlinking or fsyncing fails.
>
> However, I'm not sure about the change to ReadDirExtended(). That might be
> okay for CheckPointSnapBuild(), which is just trying to remove old files,
> but CheckPointLogicalRewriteHeap() is responsible for ensuring that files
> are flushed to disk for the checkpoint. If we stop reading the directory
> after an error and let the checkpoint continue, isn't it possible that some
> mapping files won't be persisted to disk?

Unless I misread your statement above, with LOG level in ReadDirExtended, I don't think we stop reading the files in CheckPointLogicalRewriteHeap. Am I missing something here? We also continue in CheckPointLogicalRewriteHeap if we can't parse/delete some files, with the "could not parse filename"/"could not remove file" messages changed to LOG level.

I'm attaching v6; the only change is elog(LOG, to ereport(LOG, in CheckPointLogicalRewriteHeap, other things remain the same.

Regards,
Bharath Rupireddy.

Attachment: v6-0001-Replace-ReadDir-with-ReadDirExtended.patch
Re: CREATE TABLE ( .. STORAGE ..)
Hi!

> Are they both set to name or ColId? Although they are the same.

Thank you, fixed; that was just an oversight.

> 2 For ColumnDef new member storage_name, did you miss the functions _copyColumnDef() and _equalColumnDef()?

Thank you, fixed.

Regards,
Wenjing

On 27 Dec 2021, at 15:51, Teodor Sigaev wrote:
> Hi!
> Working on the pluggable toaster (mostly for JSONB improvements, see links below),
> I found that the STORAGE attribute on a column is impossible to set in the
> CREATE TABLE command, but the COMPRESS option is possible. That looks
> unreasonable. The suggested patch implements this possibility.
>
> [1] http://www.sai.msu.su/~megera/postgres/talks/jsonb-pgconfnyc-2021.pdf
> [2] http://www.sai.msu.su/~megera/postgres/talks/jsonb-pgvision-2021.pdf
> [3] http://www.sai.msu.su/~megera/postgres/talks/jsonb-pgconfonline-2021.pdf
> [4] http://www.sai.msu.su/~megera/postgres/talks/bytea-pgconfonline-2021.pdf
>
> PS I will propose the pluggable toaster patch a bit later

-- 
Teodor Sigaev
E-mail: teo...@sigaev.ru
WWW: http://www.sigaev.ru/

Attachment: create_table_storage-v2.patch.gz