Re: error context for vacuum to include block number
Thanks for reviewing again.

On Sun, Feb 02, 2020 at 10:45:12AM +0900, Masahiko Sawada wrote:
> Thank you for updating the patch. Here is some review comments:
>
> 1.
> +typedef struct
> +{
> +	char	   *relnamespace;
> +	char	   *relname;
> +	char	   *indname;	/* If vacuuming index */
>
> I think "Non-null if vacuuming index" is better.

Actually it's undefined garbage (not NULL) if not vacuuming index.

> And tablename is better than relname for accuracy?

The existing code uses relname, so I left that, since it's strange to start
using tablename and then write things like:

| errcbarg.tblname = relname;
...
| errcontext(_("while scanning block %u of relation \"%s.%s\""),
|            cbarg->blkno, cbarg->relnamespace, cbarg->tblname);

Also, mat views can be vacuumed.

> 2.
> +	BlockNumber blkno;
> +	int		stage;	/* 0: scan heap; 1: vacuum heap; 2: vacuum index */
> +} vacuum_error_callback_arg;
>
> Why do we not support index cleanup phase?

The patch started out just handling scan_heap, then added vacuum_heap, then
vacuum_index. Now, I've added index cleanup.

> 4.
> +/*
> + * Setup error traceback support for ereport()
> + * ->relnamespace and ->relname are already set
> + */
> +errcbarg.blkno = InvalidBlockNumber; /* Not known yet */
> +errcbarg.stage = 1;
>
> relnamespace and relname of errcbarg is not set as it is initialized
> in this function.

Thanks. That's an oversight from switching back to local vars instead of
LVRelStats while updating the patch out of town.

I don't know how to consistently test the vacuum_heap case, but rechecked it
just now.

postgres=# SET client_min_messages=debug;
SET statement_timeout=0;
UPDATE t SET a=1+a;
SET statement_timeout=150;
VACUUM(VERBOSE, PARALLEL 1) t;
...
2020-02-01 23:11:06.482 CST [26609] ERROR:  canceling statement due to statement timeout
2020-02-01 23:11:06.482 CST [26609] CONTEXT:  while vacuuming block 33 of relation "public.t"

> 5.
> @@ -177,6 +177,7 @@ typedef struct LVShared
>  	 * the lazy vacuum.
>  	 */
>  	Oid			relid;
> +	char		relname[NAMEDATALEN];	/* tablename, used for error callback */
>
> How about getting relation name from index relation? That is, in
> lazy_vacuum_index we can get table oid from the index relation by
> IndexGetRelation() and therefore we can get the table name which is in
> palloc'd memory. That way we don't need to add relname to any existing
> struct such as LVRelStats and LVShared.

See attached.

Also, I think we shouldn't show a block number if it's "Invalid", to avoid
saying "while vacuuming block 4294967295 of relation ...". For now, I made it
not show any errcontext at all in that case.

From 94f715818dcdf3225a3e7404e395e4a0f0818b0c Mon Sep 17 00:00:00 2001
From: Justin Pryzby
Date: Thu, 12 Dec 2019 20:54:37 -0600
Subject: [PATCH v16 1/3] vacuum errcontext to show block being processed

As requested here.
https://www.postgresql.org/message-id/20190807235154.erbmr4o4bo6vgnjv%40alap3.anarazel.de
---
 src/backend/access/heap/vacuumlazy.c | 94
 1 file changed, 94 insertions(+)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 8ce5011..43859bd 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -292,6 +292,13 @@ typedef struct LVRelStats
 	bool		lock_waiter_detected;
 } LVRelStats;

+typedef struct
+{
+	char	   *relnamespace;
+	char	   *relname;
+	BlockNumber blkno;
+	int			phase;	/* Reusing same enums as for progress reporting */
+} vacuum_error_callback_arg;

 /* A few variables that don't seem worth passing around as parameters */
 static int	elevel = -1;
@@ -361,6 +368,7 @@ static void end_parallel_vacuum(Relation *Irel, IndexBulkDeleteResult **stats,
 								LVParallelState *lps, int nindexes);
 static LVSharedIndStats *get_indstats(LVShared *lvshared, int n);
 static bool skip_parallel_vacuum_index(Relation indrel, LVShared *lvshared);
+static void vacuum_error_callback(void *arg);

 /*
@@ -724,6 +732,8 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats
*vacrelstats,
 		PROGRESS_VACUUM_MAX_DEAD_TUPLES
 	};
 	int64		initprog_val[3];
+	ErrorContextCallback errcallback;
+	vacuum_error_callback_arg errcbarg;

 	pg_rusage_init(&ru0);
@@ -870,6 +880,17 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 	else
 		skipping_blocks = false;

+	/* Setup error traceback support for ereport() */
+	errcbarg.relnamespace = get_namespace_name(RelationGetNamespace(onerel));
+	errcbarg.relname = relname;
+	errcbarg.blkno = InvalidBlockNumber; /* Not known yet */
+	errcbarg.phase = PROGRESS_VACUUM_PHASE_SCAN_HEAP;
+
+	errcallback.callback = vacuum_error_callback;
+	errcallback.arg = (void *) &errcbarg;
+	errcallback.previous = error_context_stack;
+	error_context_stack = &errcallback;
+
 	for (blkno = 0; blkno < nblocks; blkno++)
 	{
 		Buffer
Re: pg_stat_progress_basebackup - progress reporting for pg_basebackup, in the server side
On Fri, 31 Jan 2020 at 02:29, Fujii Masao wrote:
>
> On 2020/01/30 12:58, Kyotaro Horiguchi wrote:
> > At Wed, 29 Jan 2020 23:16:08 +0900, Fujii Masao wrote in
> >> Hi,
> >>
> >> pg_basebackup reports the backup progress if --progress option is
> >> specified, and we can monitor it in the client side. I think that it's
> >> useful if we can monitor the progress information also in the server
> >> side because, for example, we can easily check that by using SQL. So
> >> I'd like to propose pg_stat_progress_basebackup view that allows us to
> >> monitor the progress of pg_basebackup, in the server side. Thought?
> >>
> >> POC patch is attached.
> >
> > | postgres=# \d pg_stat_progress_basebackup
> > |         View "pg_catalog.pg_stat_progress_basebackup"
> > |       Column        |  Type   | Collation | Nullable | Default
> > | --------------------+---------+-----------+----------+---------
> > | pid                 | integer |           |          |
> > | phase               | text    |           |          |
> > | backup_total        | bigint  |           |          |
> > | backup_streamed     | bigint  |           |          |
> > | tablespace_total    | bigint  |           |          |
> > | tablespace_streamed | bigint  |           |          |
> >
> > I think the view needs client identity such like host/port pair
> > besides pid (a host/port pair fails to identify the client in the case
> > of unix sockets).
>
> I don't think this is the job of progress reporting. If that information
> is necessary, we can join this view with pg_stat_replication using the
> pid column as the join key.
>
> > Also elapsed time from session start might be useful.
>
> +1
> I think that not only pg_stat_progress_basebackup but also all the other
> progress views should report the time when the target command was
> started and the time when the phase was last changed. Those times
> would be useful to estimate the remaining execution time from the
> progress information.
>
> > pg_stat_progress_vacuum has datid, datname and relid.
> >
> > + if (backup_total > 0 && backup_streamed > backup_total)
> > + {
> > +     backup_total = backup_streamed;
> >
> > Do we need the condition "backup_total > 0"?
> I added the condition for the case where --progress option is not supplied
> in pg_basebackup, i.e., the case where the total amount of backup is not
> estimated at the beginning of the backup. In this case, backup_total is
> always 0.
>
> > + int tblspc_streamed = 0;
> > +
> > + pgstat_progress_update_param(PROGRESS_BASEBACKUP_PHASE,
> > +                              PROGRESS_BASEBACKUP_PHASE_STREAM_BACKUP);
> >
> > This starts "streaming backup" phase with backup_total = 0. Couldn't
> > we move to that phase after setting backup total and tablespace total?
> > That is, just after calling SendBackupHeader().
>
> OK, that's a bit less confusing for users. I will change it that way.
>
> > +WHEN 3 THEN 'stopping backup'::text
> >
> > I'm not sure, but the "stop" seems to suggest the backup is terminated
> > before completion. If it is following the name of the function
> > pg_stop_backup, I think the name suggests stopping "the state for
> > performing backup", not a backup.
> >
> > In the first place, the progress is about "backup", so it seems strange
> > that we have another phase after the "stopping backup" phase. It might
> > be better for it to be "finishing file transfer" or such.
> >
> > "initializing"
> > -> "starting file transfer"
> > -> "transferring files"
> > -> "finishing file transfer"
> > -> "transferring WAL"
>
> Better names are always welcome! If "stopping backup" is confusing,
> what about "performing pg_stop_backup"? So
>
> initializing
> performing pg_start_backup
> streaming database files
> performing pg_stop_backup
> transferring WAL files

Another idea I came up with is to show steps that take time instead of
pg_start_backup/pg_stop_backup, for better understanding of the situation.
That is, "performing checkpoint" and "performing WAL archive" etc, which
account for most of the time spent in these functions.

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Autovacuum on partitioned table
On Wed, 29 Jan 2020 at 17:56, Amit Langote wrote:
>
> On Wed, Jan 29, 2020 at 11:29 AM yuzuko wrote:
> > > Besides the complexity of getting that infrastructure in place, an
> > > important question is whether the current system of applying threshold
> > > and scale factor to changes_since_analyze should be used as-is for
> > > inheritance parents (partitioned tables), because if users set those
> > > parameters similarly to for regular tables, autovacuum might analyze
> > > partitioned tables more than necessary. We'll either need a different
> > > formula, or some commentary in the documentation about how partitioned
> > > tables might need different settings, or maybe both.
> > >
> > I'm not sure but I think we need new autovacuum parameters for
> > partitioned tables (autovacuum, autovacuum_analyze_threshold,
> > autovacuum_analyze_scale_factor) because whether it's necessary
> > to run autovacuum on partitioned tables will depend on users.
> > What do you think?
>
> Yes, we will need to first support those parameters on partitioned
> tables. Currently, you get:
>
> create table p (a int) partition by list (a) with
> (autovacuum_analyze_scale_factor=0);
> ERROR: unrecognized parameter "autovacuum_analyze_scale_factor"
>
> > > How are you going to track changes_since_analyze of a partitioned table?
> > > It's just an idea but we can accumulate changes_since_analyze of a
> > > partitioned table by adding child tables' values after analyzing each
> > > child table, and compare the partitioned table's value to the threshold
> > > that is computed as (autovacuum_analyze_threshold + total rows
> > > including all child tables * autovacuum_analyze_scale_factor).
> > >
> > The idea Sawada-san mentioned is similar to mine.
>
> So if I understand this idea correctly, a partitioned table's analyze
> will only be triggered when partitions are analyzed.
> That is, inserts, updates, deletes of tuples in partitions will be tracked
> by pgstat, which in turn is used by autovacuum to trigger analyze on
> partitions. Then, the partitions' changes_since_analyze is added into the
> parent's changes_since_analyze, which in turn *may* trigger analyze on the
> parent. I said "may", because it would take multiple partition analyzes to
> accumulate enough changes to trigger one on the parent. Am I getting that
> right?

Yeah, that is what I meant. In addition, adding a partition's
changes_since_analyze to its parent needs to be done recursively, as the
parent table could itself be a partitioned table.

> > Also, for tracking changes_since_analyze, we have to make partitioned
> > table's statistics. To do that, we can invent a new
> > PgStat_StatPartitionedTabEntry based on PgStat_StatTabEntry. Through
> > talking with Amit, I think the new structure needs the following members:
> >
> > tableid
> > changes_since_analyze
> > analyze_timestamp
> > analyze_count
> > autovac_analyze_timestamp
> > autovac_analyze_count
> >
> > Vacuum doesn't run on partitioned tables, so I think members related to
> > (auto) vacuum need not be contained in the structure.
>
> On second thought, maybe we don't need a new PgStat_ struct. We can
> just use what's used for regular tables and leave the fields that
> don't make sense for partitioned tables set to 0, such as those that
> track the counts of scans, tuples, etc. That means we don't have to
> mess with interfaces of existing functions, like this one:
>
> static void relation_needs_vacanalyze(Oid relid,
>                                       AutoVacOpts *relopts,
>                                       Form_pg_class classForm,
>                                       PgStat_StatTabEntry *tabentry, ...

+1

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: error context for vacuum to include block number
On Tue, 28 Jan 2020 at 07:50, Justin Pryzby wrote:
>
> On Mon, Jan 27, 2020 at 03:59:58PM +0900, Masahiko Sawada wrote:
> > On Mon, 27 Jan 2020 at 14:38, Justin Pryzby wrote:
> > > On Sun, Jan 26, 2020 at 12:29:38PM -0800, Andres Freund wrote:
> > > > > CONTEXT: while vacuuming relation "public.t_a_idx"
> > > >
> > > > It'd be a bit nicer if it said index "public.t_a_idx" for relation
> > > > "public.t".
> > >
> > > I think that tips the scale in favour of making vacrelstats a global.
> > > I added that as a 1st patch, and squished the callback patches into one.
> >
> > Hmm, I don't think it's a good idea to make vacrelstats global. If we
> > want to display the relation name and its index name in the error
> > context, we might want to define a new struct dedicated to error context
> > reporting. That is, it has blkno, stage, and relation name and schema
> > name for both table and index, and then we set these variables of the
> > callback argument before performing a vacuum phase. We don't change
> > LVRelStats at all.
>
> On Mon, Jan 27, 2020 at 12:14:38AM -0600, Justin Pryzby wrote:
> > It occurred to me that there's an issue with sharing vacrelstats between
> > scan/vacuum, since blkno and stage are set by the heap/index vacuum
> > routines, but not reset on their return to heap scan. Not sure if we
> > should reset them, or go back to using a separate struct, like it was
> > here:
> > https://www.postgresql.org/message-id/20200120054159.GT26045%40telsasoft.com
>
> I went back to this, original, way of doing it.
> The parallel vacuum patch made it harder to pass the table around :(
> And it has to be separately tested:
>
> | SET statement_timeout=0; DROP TABLE t; CREATE TABLE t AS SELECT
> generate_series(1,9)a; CREATE INDEX ON t(a); CREATE INDEX ON t(a); UPDATE
> t SET a=1+a; SET statement_timeout=99; VACUUM(VERBOSE, PARALLEL 2) t;
>
> I had to allocate space for the table name within the LVShared struct,
> not just a pointer, otherwise it would variously crash or fail to output
> the index name. I think pointers can't be passed to parallel processes
> except using some heavyweight thing like shm_toc_...
>
> I guess the callback could also take the index relid instead of the name,
> and use something like IndexGetRelation().
>
> > Although the patch replaces get_namespace_name and
> > RelationGetRelationName, we use the namespace name of the relation in
> > only two places, and almost all ereport/elog messages use only the
> > relation name gotten by RelationGetRelationName, which is a macro to
> > access the relation name in the Relation struct. So I think adding
> > relname to LVRelStats would not be a big benefit. Similarly, adding the
> > table namespace to LVRelStats would be good to avoid calling
> > get_namespace_name, whereas I'm not sure it's worth having it because
> > it's expected not to be called really many times.
>
> Right, I only tried that to save a few LOC and maybe make shorter lines.
> It's not important so I'll drop that patch.

Thank you for updating the patch. Here is some review comments:

1.
+typedef struct
+{
+	char	   *relnamespace;
+	char	   *relname;
+	char	   *indname;	/* If vacuuming index */

I think "Non-null if vacuuming index" is better.
And tablename is better than relname for accuracy?

2.
+	BlockNumber blkno;
+	int		stage;	/* 0: scan heap; 1: vacuum heap; 2: vacuum index */
+} vacuum_error_callback_arg;

Why do we not support index cleanup phase?

3.
/* Work on all the indexes, then the heap */
lazy_vacuum_all_indexes(onerel, Irel, indstats, vacrelstats, lps, nindexes);

-	/* Remove tuples from heap */
	lazy_vacuum_heap(onerel, vacrelstats);

I think it's an unnecessary removal.

4.
static void
lazy_vacuum_heap(Relation onerel, LVRelStats *vacrelstats)
{
	int			tupindex;
	int			npages;
	PGRUsage	ru0;
	Buffer		vmbuffer = InvalidBuffer;
+	ErrorContextCallback errcallback;
+	vacuum_error_callback_arg errcbarg;

	/* Report that we are now vacuuming the heap */
	pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
								 PROGRESS_VACUUM_PHASE_VACUUM_HEAP);

+	/*
+	 * Setup error traceback support for ereport()
+	 * ->relnamespace and ->relname are already set
+	 */
+	errcbarg.blkno = InvalidBlockNumber; /* Not known yet */
+	errcbarg.stage = 1;

relnamespace and relname of errcbarg are not set, as it is initialized
in this function.

5.
@@ -177,6 +177,7 @@ typedef struct LVShared
	 * the lazy vacuum.
	 */
	Oid			relid;
+	char		relname[NAMEDATALEN];	/* tablename, used for error callback */

Hmm, I think it's not a good idea to have LVShared contain relname, because
the parallel vacuum worker can look up the table name by oid, and storing
the name consumes DSM memory. To pass the relation name down to
lazy_vacuum_index I thought to add a new argument relname to some
functions, but in
Internal key management system
Hi,

I've started a new separate thread from the previous long thread[1] for the
internal key management system for PostgreSQL. As I mentioned in the
previous mail[2], we've decided to step back and focus only on the internal
key management system for PG13.

The internal key management system introduces functionality to PostgreSQL
that allows users to encrypt and decrypt data without knowing the actual
key. Besides, it will be able to be integrated with transparent data
encryption in the future.

The basic idea is that PostgreSQL generates the master encryption key,
which is further protected by the user-provided passphrase. The key
management system provides two functions to wrap and unwrap a secret by the
master encryption key. A user generates a secret key locally, sends it to
PostgreSQL to wrap it using pg_kmgr_wrap(), and saves it somewhere. Then
the user can use the wrapped secret key to encrypt and decrypt data by
something like:

INSERT INTO tbl VALUES (pg_encrypt('user data', pg_kmgr_unwrap('x')));
SELECT pg_decrypt(secret_column, pg_kmgr_unwrap('x')) FROM tbl;

Where 'x' is the result of the pg_kmgr_wrap function.

That way we can get something encrypted and decrypted without ever knowing
the actual key that was used to encrypt it.

I'm currently updating the patch and will submit it.

On Sun, 2 Feb 2020 at 00:37, Sehrope Sarkuni wrote:
>
> On Fri, Jan 31, 2020 at 1:21 AM Masahiko Sawada wrote:
> > On Thu, 30 Jan 2020 at 20:36, Sehrope Sarkuni wrote:
> > > That would allow the internal usage to have a fixed output length of
> > > LEN(IV) + LEN(HMAC) + LEN(DATA) = 16 + 32 + 64 = 112 bytes.
> >
> > Probably you meant LEN(DATA) is 32? DATA will be an encryption key for
> > AES256 (master key) internally generated.
>
> No, it should be 64 bytes. That way we can have a separate 32-byte
> encryption key (for AES256) and a 32-byte MAC key (for HMAC-SHA256).
> While it's common to reuse the same 32-byte key for both AES256 and an
> HMAC-SHA256, and there aren't any known issues with doing so, when
> designing something from scratch it's more secure to use entirely
> separate keys.

The HMAC key you mentioned above is not the same as the HMAC key derived
from the user-provided passphrase, right? That is, each individual key
needs to have its own IV and HMAC key. Given that the HMAC key used for
HMAC(IV || ENCRYPT(KEY, IV, DATA)) is the latter key (derived from the
passphrase), what will the former key be used for?

> > > For the user facing piece, padding would be enabled to support
> > > arbitrary input data lengths. That would make the output length grow
> > > by up to 16 bytes (rounding the data length up to the AES block size)
> > > plus one more byte if a version field is added.
> >
> > I think the length of padding also needs to be added to the output.
> > Anyway, in the first version the same methods of wrapping/unwrapping
> > key are used for both internal use and the user facing function. And
> > the user input key needs to be a multiple-of-16-bytes value.
>
> A separate length field does not need to be added as the padding-enabled
> output will already include it at the end[1]. This would be handled
> automatically by the OpenSSL encryption / decryption operations (if it's
> enabled).

Yes, right.

Regards,

[1] https://www.postgresql.org/message-id/031401d3f41d%245c70ed90%241552c8b0%24%40lab.ntt.co.jp
[2] https://www.postgresql.org/message-id/CAD21AoD8QT0TWs3ma-aB821vwDKa1X519y1w3yrRKkAWjhZcrw%40mail.gmail.com

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Created feature for to_date() conversion using patterns 'YYYY-WW', 'YYYY-WW-D', 'YYYY-MM-W' and 'YYYY-MM-W-D'
I wrote:
> Either way, though, the WW weeks don't line up with the D weeks,
> and we're not likely to make them do so.
> So I think an acceptable version of this feature has to involve
> defining at least one new format code and maybe as many as three,
> to produce year, week and day values that agree on whichever
> definition of "a week" you want to use, and then to_date has to
> enforce that input uses matching year/week/day field types,
> very much like it already does for ISO versus Gregorian dates.

A different line of thought could be to accept the current to_char()
behavior for WW and D, and go ahead and teach to_date() to invert that.
That is, take the year plus WW as specifying a seven-day interval, and then
D chooses the matching day within that interval. This would still have the
property you complained about originally, that WW-plus-D don't form a
monotonically increasing sequence, but I think that ship has sailed.

			regards, tom lane
Re: BUG #16171: Potential malformed JSON in explain output
Hamid Akhtar writes:
> I've reviewed and verified this patch and IMHO, this is ready to be committed.

I took a look at this and I don't think it's really going in the right
direction. ISTM the clear intent of this code was to attach the "Subplans
Removed" item as a field of the parent [Merge]Append node, but the author
forgot about the intermediate "Plans" array. So I think that, rather than
doubling down on a mistake, we ought to move where the field is generated
so that it *is* a field of the parent node.

Another failure to follow the design conventions for EXPLAIN output is that
in non-text formats, the schema for each node type ought to be fixed; that
is, if a given field can appear for a particular node type and EXPLAIN
options, it should appear always, not be omitted just because it's zero.

So that leads me to propose 0001 attached. This does lead to some field
order rearrangement in text mode, as per the regression test changes, but I
think that's not a big deal. (A change can only happen if there are
initplan(s) attached to the parent node.)

Also, I wondered whether we had any other violations of correct formatting
in this code, which led me to the idea of running auto_explain in JSON mode
and having it feed its result to json_in. This isn't a complete test
because it won't whine about duplicate field names, but it did correctly
find the current bug --- and I couldn't find any others while running the
core regression tests with various auto_explain options. 0002 attached
isn't committable, because nobody would want the overhead in production,
but it seems like a good trick to keep up our sleeves.

Thoughts?
			regards, tom lane

diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index c367c75..d901dc4 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -119,8 +119,9 @@ static void ExplainModifyTarget(ModifyTable *plan, ExplainState *es);
 static void ExplainTargetRel(Plan *plan, Index rti, ExplainState *es);
 static void show_modifytable_info(ModifyTableState *mtstate, List *ancestors,
 								  ExplainState *es);
-static void ExplainMemberNodes(PlanState **planstates, int nsubnodes,
-							   int nplans, List *ancestors, ExplainState *es);
+static void ExplainMemberNodes(PlanState **planstates, int nplans,
+							   List *ancestors, ExplainState *es);
+static void ExplainMissingMembers(int nplans, int nchildren, ExplainState *es);
 static void ExplainSubPlans(List *plans, List *ancestors,
 							const char *relationship, ExplainState *es);
 static void ExplainCustomChildren(CustomScanState *css,
@@ -1967,6 +1968,30 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		ExplainFlushWorkersState(es);
 		es->workers_state = save_workers_state;

+	/*
+	 * If partition pruning was done during executor initialization, the
+	 * number of child plans we'll display below will be less than the number
+	 * of subplans that was specified in the plan.  To make this a bit less
+	 * mysterious, emit an indication that this happened.  Note that this
+	 * field is emitted now because we want it to be a property of the parent
+	 * node; it *cannot* be emitted within the Plans sub-node we'll open next.
+	 */
+	switch (nodeTag(plan))
+	{
+		case T_Append:
+			ExplainMissingMembers(((AppendState *) planstate)->as_nplans,
+								  list_length(((Append *) plan)->appendplans),
+								  es);
+			break;
+		case T_MergeAppend:
+			ExplainMissingMembers(((MergeAppendState *) planstate)->ms_nplans,
+								  list_length(((MergeAppend *) plan)->mergeplans),
+								  es);
+			break;
+		default:
+			break;
+	}
+
 	/* Get ready to display the child plans */
 	haschildren = planstate->initPlan ||
 		outerPlanState(planstate) ||
@@ -2007,31 +2032,26 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_ModifyTable:
 			ExplainMemberNodes(((ModifyTableState *) planstate)->mt_plans,
 							   ((ModifyTableState *) planstate)->mt_nplans,
-							   list_length(((ModifyTable *) plan)->plans),
 							   ancestors, es);
 			break;
 		case T_Append:
 			ExplainMemberNodes(((AppendState *) planstate)->appendplans,
 							   ((AppendState *) planstate)->as_nplans,
-							   list_length(((Append *) plan)->appendplans),
 							   ancestors, es);
 			break;
 		case T_MergeAppend:
 			ExplainMemberNodes(((MergeAppendState *) planstate)->mergeplans,
 							   ((MergeAppendState *) planstate)->ms_nplans,
-							   list_length(((MergeAppend *) plan)->mergeplans),
 							   ancestors, es);
 			break;
 		case T_BitmapAnd:
 			ExplainMemberNodes(((BitmapAndState *) planstate)->bitmapplans,
 							   ((BitmapAndState *) planstate)->nplans,
-							   list_length(((BitmapAnd *) plan)->bitmapplans),
 							   ancestors, es);
 			break;
 		case T_BitmapOr:
 			ExplainMemberNodes(((BitmapOrState *) planstate)->bitmapplans,
 							   ((BitmapOrState *) planstate)->nplans,
-							   list_length(((BitmapOr *) plan)->bitmapplans),
Re: [Proposal] Global temporary tables
On Sat, Feb 1, 2020 at 2:39 PM, 曾文旌(义从) wrote:
>
> > On Jan 30, 2020, at 10:21 PM, Pavel Stehule wrote:
> >
> > On Thu, Jan 30, 2020 at 3:17 PM, 曾文旌(义从) wrote:
> >>
> >> > On Jan 29, 2020, at 9:48 PM, Robert Haas wrote:
> >> >
> >> > On Tue, Jan 28, 2020 at 12:12 PM 曾文旌(义从) wrote:
> >> >>> Opinion by Pavel
> >> >>> + rel->rd_islocaltemp = true; <<< if this is valid, then the
> >> >>> name of field "rd_islocaltemp" is not probably best
> >> >>> I renamed rd_islocaltemp
> >> >>
> >> >> I don't see any change?
> >> >>
> >> >> Rename rd_islocaltemp to rd_istemp in
> >> >> global_temporary_table_v8-pg13.patch
> >> >
> >> > In view of commit 6919b7e3294702adc39effd16634b2715d04f012, I think
> >> > that this has approximately a 0% chance of being acceptable. If you're
> >> > setting a field in a way that is inconsistent with the current use of
> >> > the field, you're probably doing it wrong, because the field has an
> >> > existing purpose to which new code must conform. And if you're not
> >> > doing that, then you don't need to rename it.
> >>
> >> Thank you for pointing it out.
> >> I've rolled back the rename.
> >> But I still need rd_islocaltemp to be true, for these reasons:
> >> 1. The GTT needs to support DML in read-only transactions, like a local
> >> temp table.
> >> 2. The GTT does not need to hold the lock before modifying the index
> >> buffer, also like a local temp table.
> >>
> >> Please give me feedback.
> >
> > maybe something like
> >
> > rel->rd_globaltemp = true;
> >
> > and somewhere else
> >
> > if (rel->rd_localtemp || rel->rd_globaltemp)
> > {
> > ...
> > }
>
> I tried to optimize the code in global_temporary_table_v10-pg13.patch
>
> Please give me feedback.
>

I tested this patch and I have no objections - from my user perspective it
works as I expect.

+#define RELATION_IS_TEMP(relation) \
+	((relation)->rd_islocaltemp || \
+	 (relation)->rd_rel->relpersistence == RELPERSISTENCE_GLOBAL_TEMP)

It looks a little bit unbalanced. Maybe it is better to inject an
rd_isglobaltemp field into the relation structure, and then it should look
like

+#define RELATION_IS_TEMP(relation) \
+	((relation)->rd_islocaltemp || \
+	 (relation)->rd_isglobaltemp)

But I have no idea if it helps in complex

> Wenjing
>
> >> Wenjing
> >
> > --
> > Robert Haas
> > EnterpriseDB: http://www.enterprisedb.com
> > The Enterprise PostgreSQL Company
Re: [Proposal] Global temporary tables
> On Jan 27, 2020, at 5:38 PM, Konstantin Knizhnik wrote:
>
> On 25.01.2020 18:15, 曾文旌(义从) wrote:
>> I wonder why do we need some special check for GTT here.
>>> From my point of view cleanup at startup of local storage of temp tables
>>> should be performed in the same way for local and global temp tables.
>> After oom kill, in autovacuum, the isolated local temp table will be
>> cleaned like orphan temporary tables. The definition of the local temp
>> table is deleted with the storage file.
>> But GTT can not do that. So we have this implementation in my patch.
>> If you have other solutions, please let me know.
>>
> I wonder if it is possible that autovacuum or some other Postgres process
> is killed by OOM and postmaster does not notice it and doesn't restart the
> Postgres instance?
> As far as I know, a crash of any process connected to Postgres shared
> memory (and autovacuum definitely has such a connection) causes a Postgres
> restart.

Postmaster will not restart after OOM happens, but the startup process
will. GTT data files are cleaned up in the startup process.

>
>> In my design
>> 1. Because different sessions have different transaction information, I
>> choose to store the transaction information of GTT in MyProc, not the
>> catalog.
>> 2. About the XID wraparound problem, the reason is the design of the
>> temp table storage (local temp table and global temp table) that makes
>> it unable to be vacuumed by autovacuum.
>> It should be completely solved at the storage level.
>>
> My point of view is that vacuuming of temp tables is a common problem for
> local and global temp tables.
> So it has to be addressed in a common way and so we should not try to fix
> this problem only for GTT.

I think I agree with you on this point.
However, this does not mean that GTT transaction information stored in
pg_class is correct. If you keep it that way, as in
global_private_temp-8.patch, it may cause data loss in GTT after
autovacuum.

>
>> In fact, the DBA can still complete the DDL of the GTT.
>> I've provided a set of functions for this case.
>> If the DBA needs to modify a GTT A (or drop the GTT, or create an index
>> on the GTT), he needs to:
>> 1. Use the pg_gtt_attached_pids view to list the pids of the sessions
>> that are using GTT A.
>> 2. Use pg_terminate_backend(pid) to terminate them, except his own
>> session.
>> 3. Do the ALTER on GTT A.
>>
> IMHO forced termination of client sessions is not an acceptable solution.
> And it is not an absolutely necessary requirement.
> So from my point of view we should not add such limitations to the GTT
> design.

This limitation makes it possible for the GTT to do all the DDL.
IMHO even Oracle's GTT has similar limitations.

>
>>>
>>> What are the reasons for using RowExclusiveLock for GTT instead of
>>> AccessExclusiveLock?
>>> Yes, GTT data is accessed by only one backend so no locking here seems
>>> to be needed at all.
>>> But I wonder what are the motivations/benefits of using a weaker lock
>>> level here?
>> 1. Truncate GTT deletes only the data in the session, so no need to use
>> a high-level lock.
>> 2. I think it still needs to be blocked by DDL of the GTT, which is why
>> I use RowExclusiveLock.
>
> Sorry, I do not understand your arguments: we do not need an exclusive
> lock because we drop only local (private) data, but we need some kind of
> lock.

I agree with 1) and not 2). Yes, we don't need a lock for private data,
but the metadata does.

>
>>> There should be no conflicts in any case...
>>>
>>> +	/* We allow to create index on global temp table only this session
>>> 	   use it */
>>> +	if (is_other_backend_use_gtt(heapRelation->rd_node))
>>> +		elog(ERROR, "can not create index when have other backend
>>> 		     attached this global temp table");
>>> +
>>>
>>> The same argument as in case of dropping GTT: I do not think that
>>> prohibiting DDL operations on GTT used by more than one backend is a
>>> bad idea.
>> The idea was to give the GTT almost all the features of a regular table
>> with few code changes.
>> In the current version the DBA can still do all DDL for a GTT, as I've
>> already described.
>
> I absolutely agree with you that GTT should be given the same features as
> regular tables.
> The irony is that this most natural and convenient behavior is also the
> easiest to implement, without putting in extra restrictions.
> Just let indexes for a GTT be constructed on demand. It can be done using
> the same function used for regular index creation.

The limitation on index creation has been improved in
global_temporary_table_v10-pg13.patch.

>>> +/* global temp table not support foreign key constraint yet */
>>> +if (RELATION_IS_GLOBAL_TEMP(pkrel))
>>> +ereport(ERROR,
>>> +(errcode(ERRCODE_WRONG_OBJECT_TYPE),
>>> + errmsg("referenced relation \"%s\" is not a global temp table",
>>> +RelationGetRelationName(pkrel))));
>>> +
>>>
>>> Why do we need to prohibit foreign key constraints on GTT?
>>
Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)
On Fri, Jan 31, 2020 at 1:21 AM Masahiko Sawada wrote:
> On Thu, 30 Jan 2020 at 20:36, Sehrope Sarkuni wrote:
> > That would allow the internal usage to have a fixed output length of
> > LEN(IV) + LEN(HMAC) + LEN(DATA) = 16 + 32 + 64 = 112 bytes.
>
> Probably you meant LEN(DATA) is 32? DATA will be an encryption key for
> AES256 (master key) internally generated.

No, it should be 64 bytes. That way we can have a separate 32-byte
encryption key (for AES256) and a 32-byte MAC key (for HMAC-SHA256). While
it's common to reuse the same 32-byte key for both AES256 and HMAC-SHA256,
and there aren't any known issues with doing so, when designing something
from scratch it's more secure to use entirely separate keys.

> > For the user facing piece, padding would be enabled to support
> > arbitrary input data lengths. That would make the output length grow by
> > up to 16 bytes (rounding the data length up to the AES block size) plus
> > one more byte if a version field is added.
>
> I think the length of padding also needs to be added to the output.
> Anyway, in the first version the same methods of wrapping/unwrapping
> key are used for both internal use and user facing function. And user
> input key needs to be a multiple of 16 bytes value.

A separate length field does not need to be added, as the padding-enabled
output will already include it at the end [1]. This would be handled
automatically by the OpenSSL encryption / decryption operations (if padding
is enabled).

[1]: https://en.wikipedia.org/wiki/Padding_(cryptography)#PKCS#5_and_PKCS#7

> BTW I think this topic is better to be discussed on a separate thread
> as the scope no longer includes TDE. I'll start a new thread for
> introducing internal KMS.

Good idea.

Regards,
-- Sehrope Sarkuni
Founder & CEO | JackDB, Inc. | https://www.jackdb.com/
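[Editor's illustration] The fixed-length arithmetic in the message above (16 + 32 + 64 = 112 bytes) can be sketched as a C struct. The type and field names here are purely illustrative, not from any posted patch:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical layout of the fixed-length wrapped-key blob discussed
 * above: a 16-byte IV, a 32-byte HMAC-SHA256 tag, and 64 bytes of
 * payload holding two separate 32-byte keys (one for AES-256, one for
 * the HMAC).  All names are illustrative only.
 */
#define WRAP_IV_LEN		16
#define WRAP_HMAC_LEN	32
#define WRAP_DATA_LEN	64		/* 32-byte encryption key + 32-byte MAC key */

typedef struct WrappedKey
{
	uint8_t		iv[WRAP_IV_LEN];
	uint8_t		hmac[WRAP_HMAC_LEN];
	uint8_t		data[WRAP_DATA_LEN];
} WrappedKey;
```

Since every field is a uint8_t array, the struct has no padding and its size is exactly the 112 bytes quoted in the thread.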
Re: widen vacuum buffer counters
Michael Paquier writes: > On Fri, Jan 31, 2020 at 05:13:53PM -0500, Tom Lane wrote: >> Also, %zd is the wrong format code for int64. Recommended practice >> these days is to use "%lld" with an explicit cast of the printf argument >> to long long (just to be sure). That doesn't work safely before v12, >> and if you did insist on back-patching further, you'd need to jump >> through hoops to avoid having platform-specific format codes in a >> translatable string. (The side-effects for translation seem like >> an independent argument against back-patching.) > Surely you meant INT64_FORMAT here? No, because that varies depending on platform, so using it in a translatable string is a bad idea. See e.g. 6a1cd8b92. > Anyway, looking at the patch, > couldn't we just use uint64? Yeah, I was wondering if those counters shouldn't be unsigned, too. Probably doesn't matter once we widen them to 64 bits though. regards, tom lane
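[Editor's illustration] A minimal sketch of the convention Tom describes: the (translatable) format string keeps a plain "%lld" while the int64 argument is cast to long long, instead of splicing the platform-dependent INT64_FORMAT macro into the string. The helper name is hypothetical:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Illustrative only: format a 64-bit counter portably.  Casting to
 * long long makes the argument match "%lld" on every platform, and the
 * format string itself stays platform-independent, so translators see
 * a single stable string rather than one per INT64_FORMAT variant.
 */
void
format_counter(char *buf, size_t buflen, int64_t counter)
{
	snprintf(buf, buflen, "%lld buffer hits", (long long) counter);
}
```

The same pattern works in any message that must be translatable, which is the core of the argument against embedding INT64_FORMAT there.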
Re: PATCH: add support for IN and @> in functional-dependency statistics use
On Sat, Feb 01, 2020 at 08:51:04AM +0100, Pierre Ducroquet wrote:

Hello

At my current job, we have a lot of multi-tenant databases, thus with tables
containing a tenant_id column. Such a column introduces a severe bias in
statistics estimation since any other FK in the next columns is very likely
to have a functional dependency on the tenant id. We found several queries
where this functional dependency messed up the estimations so much the
optimizer chose wrong plans. When we tried to use extended statistics with
CREATE STATISTIC on tenant_id, other_id, we noticed that the current
implementation for detecting functional dependency lacks two features (at
least in our use case):
- support for IN clauses
- support for the array contains operator (that could be considered as a
  special case of IN)

Thanks for the patch. I don't think the lack of support for these clause
types is an oversight - we haven't done them because we were not quite sure
the functional dependencies can actually apply to them. But maybe we can
support them, I'm not against that in principle.

After digging in the source code, I think the lack of support for IN clauses
is an oversight and due to the fact that IN clauses are ScalarArrayOpExpr
instead of OpExpr. The attached patch fixes this by simply copying the code
path for OpExpr and changing the type name. It compiles and the results are
correct, with a dependency being correctly taken into consideration when
estimating rows. If you think such a copy paste is bad and should be
factored in another static bool function, please say so and I will happily
provide an updated patch.

Hmmm. Consider a query like this:

    ... WHERE tenant_id = 1 AND another_column IN (2,3)

which kinda contradicts the idea of functional dependencies that knowing a
value in one column tells us something about a value in a second column. But
that assumes a single value, which is not quite true here. The above query
is essentially the same thing as

    ... WHERE (tenant_id=1 AND (another_column=2 OR another_column=3))

and also

    ... WHERE (tenant_id=1 AND another_column=2)
        OR (tenant_id=1 AND another_column=3)

at which point we could apply functional dependencies - but we'd do it once
for each AND-clause, and then combine the results to compute selectivity for
the OR clause. But this means that if we apply functional dependencies
directly to the original clause, it'll be inconsistent. Which seems a bit
unfortunate. Or do I get this wrong?

BTW the code added in the 0001 patch is the same as for is_opclause, so
maybe we can simply do

    if (is_opclause(rinfo->clause) ||
        IsA(rinfo->clause, ScalarArrayOpExpr))
    {
        ...
    }

instead of just duplicating the code. We also need at least some regression
tests, testing functional dependencies with this clause type.

The lack of support for @> operator, on the other hand, seems to be a
decision taken when writing the initial code, but I can not find any
mathematical nor technical reason for it. The current code restricts
dependency calculation to the = operator, obviously because inequality
operators are not going to work... but array contains is just several =
operators grouped, thus the same for the dependency calculation. The second
patch refactors the operator check in order to also include array contains.

No concrete opinion on this yet. I think my concerns are pretty similar to
the IN clause, although I'm not sure what you mean by "this could be
considered as special case of IN".

I tested the patches on current HEAD, but I can test and provide back-ported
versions of the patch for other versions if needed (this code path hardly
changed since its introduction in 10).

I think the chance of this getting backpatched is zero, because it might
easily break existing apps.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
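[Editor's illustration] The OR-rewrite arithmetic Tomas sketches in the message above can be made concrete. This is illustrative arithmetic only, not planner code: it applies the functional-dependency formula P(a=x AND b=y) = P(a=x) * (f + (1 - f) * P(b=y)), where f is the degree of the dependency a => b, once per equality, then combines the two results with the usual s1 + s2 - s1*s2 rule for OR:

```c
#include <assert.h>

/*
 * Functional-dependency selectivity for a pair of equality clauses:
 * P(a=x AND b=y) = P(a=x) * (f + (1 - f) * P(b=y)), with f being the
 * degree of the dependency a => b (1.0 = perfect dependency).
 */
double
dependency_sel(double f, double sel_a, double sel_b)
{
	return sel_a * (f + (1.0 - f) * sel_b);
}

/*
 * "a=x AND b IN (y1,y2)" rewritten as two AND-clauses joined by OR,
 * per the message above: apply the dependency once per equality, then
 * combine the two selectivities with s1 + s2 - s1*s2.
 */
double
in_clause_sel(double f, double sel_a, double sel_b1, double sel_b2)
{
	double		s1 = dependency_sel(f, sel_a, sel_b1);
	double		s2 = dependency_sel(f, sel_a, sel_b2);

	return s1 + s2 - s1 * s2;
}
```

With a perfect dependency (f = 1.0) each AND-clause collapses to P(a=x), so the IN estimate becomes 2*P(a=x) - P(a=x)^2, which is what applying the dependency per-branch and then OR-combining would give, as opposed to applying it once to the whole IN clause.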
Re: psql - add SHOW_ALL_RESULTS option
This patch was marked as ready for committer, but clearly there's an ongoing
discussion about what the default behavior should be, whether this breaks
existing apps, etc. So I've marked it as "needs review" and moved it to the
next CF.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Global shared meta cache
This patch was broken and waiting for author since early December, so I've marked it as returned with feedback. Feel free to resubmit an updated version to a future commitfest. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Global temporary tables
Hi, this patch was marked as waiting on author since the beginning of the CF, most likely because it no longer applies (not sure). As there has been very little activity since then, I've marked it as returned with feedback. Feel free to re-submit an updated patch for 2020-03. This definitely does not mean the feature is not desirable, but my feeling is most of the discussion happens on the other thread dealing with global temp tables [1] so maybe we should keep just that one and combine the efforts. [1] https://commitfest.postgresql.org/26/2349/ -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [proposal] recovery_target "latest"
Hi, this patch was waiting on author without any update/response since early December, so I've marked it as returned with feedback. Feel free to re-submit an updated version to a future CF. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Speed up transaction completion faster after many relations are accessed in a transaction
Hi, the patch was in WoA since December, waiting for a rebase. I've marked it as returned with feedback. Feel free to re-submit an updated version into the next CF. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [Proposal] Add accumulated statistics for wait event
On Sat, Feb 1, 2020 at 12:34 PM Tomas Vondra wrote:
> This patch was in WoA, but that was wrong I think - we got a patch on
> January 15, followed by a benchmark by Pavel Stehule, so I think it
> should still be in "needs review". So I've updated it and moved it to
> the next CF.

Currently this patch needs a rebase.

Pavel

> regards
>
> --
> Tomas Vondra                  http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [Proposal] Add accumulated statistics for wait event
This patch was in WoA, but that was wrong I think - we got a patch on January 15, followed by a benchmark by Pavel Stehule, so I think it should still be in "needs review". So I've updated it and moved it to the next CF. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
On Fri, Nov 29, 2019 at 09:39:09AM +0100, Julien Rouhaud wrote:

On Fri, Nov 29, 2019 at 7:21 AM Michael Paquier wrote:

On Fri, Nov 29, 2019 at 03:19:49PM +0900, Michael Paquier wrote:
> On Wed, Nov 13, 2019 at 12:53:09PM +0100, Julien Rouhaud wrote:
>> I'd really like to have the queryid function available through SQL,
>> but I think that this specific case wouldn't work very well for
>> pg_stat_statements' approach as it's working with oid. The query
>> string in pg_stat_activity is the user provided one rather than a
>> fully-qualified version, so in order to get that query's queryid, you
>> need to know the exact search_path in use in that backend, and that's
>> not something available.
>
> Yeah.. So, we have a patch marked as ready for committer here, and it
> seems to me that we have a couple of issues to discuss more about
> first, particularly this query ID of 0. Again, do others have any more
> input to offer?

I just realized that with the current infrastructure it's not possible to
display a utility queryid. We need to recognize utility statements so as
not to process the counters twice (once in processUtility, once in the
underlying executor), so we don't provide a queryid for utility statements
in parse analysis. The current magic value 0 has the side effect of showing
an invalid queryid for all utility statements, and using a magic value
different from 0 will just always display that magic value. We could
instead add another field in the Query and PlannedStmt structs, say
"int queryid_flags", that extensions could use for their needs?

And while on it, the latest patch does not apply, so a rebase is needed
here.

Yep, I noticed that this morning. I already rebased the patch locally,
I'll send a new version with new modifications when we reach an agreement
on the utility issue.

Well, this patch was in WoA since November, but now that I look at it that
might have been wrong - we're clearly waiting for agreement on how to
handle queryid for utility commands.
I suspect the WoA status might have been driving people away from this thread :-( I've switched the patch to "needs review" and moved it to the next CF. What I think needs to happen is we get a patch implementing one of the proposed solutions, and discuss that. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Copy data to DSA area
I've marked this patch as returned with feedback. It's been sitting in the CF for a long time without any update (and does not apply anymore). That does not mean we don't want the feature (it seems useful) so feel free to resubmit an updated patch. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: enhance SPI to support EXECUTE commands
I've marked this patch as returned with feedback. It's been sitting in the CF without any response from the author since September, and it's not quite clear we actually want/need this feature. If needed, the patch can be resubmitted for 2020-03. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: fix for BUG #3720: wrong results at using ltree
Hi, I've moved this patch to the next CF - it's still in WoA state, but it's supposedly a bugfix so I've decided not to return it with feedback. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: pgbench - add pseudo-random permutation function
Hello Alvaro, I read the whole thread, I still don't know what this patch is supposed to do. I know what the words in the subject line mean, but I don't know how this helps a pgbench user run better benchmarks. I feel this is also the sentiment expressed by others earlier in the thread. You indicated that this functionality makes sense to those who want this functionality, but so far only two people, namely the patch author and the reviewer, have participated in the discussion on the substance of this patch. So either the feature is extremely niche, or nobody understands it. I think you ought to take about three steps back and explain this in more basic terms, even just in email at first so that we can then discuss what to put into the documentation. After re-reading one more time, it dawned on me that the point of this is similar in spirit to this one: https://wiki.postgresql.org/wiki/Pseudo_encrypt Indeed. The one in the wiki is useless because it is on all integers, whereas in a benchmark you want it for a given size and you want seeding, but otherwise the same correlation-avoidance problem is addressed. The idea seems to be to map the int4 domain into itself, so you can use a sequence to generate numbers that will not look like a sequence, allowing the user to hide some properties (such as the generation rate) that might be useful to an eavesdropper/attacker. In terms of writing benchmarks, it seems useful to destroy all locality of access, which changes the benchmark completely. Yes. (I'm not sure if this is something benchmark writers really want to have.) I do not get this sentence. I'm sure that a benchmark writer should really want to avoid unrealistic correlations that have a performance impact. If I'm right, then I agree that the documentation provided with the patch does a pretty bad job at explaining it, because until now I didn't at all realize this is what it was. The documentation is improvable, no doubt. Attached is an attempt at improving things. 
I have added a explicit note and hijacked an existing example to better illustrate the purpose of the function. -- Fabien.diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml index 4c48a58ed2..d2344b029b 100644 --- a/doc/src/sgml/ref/pgbench.sgml +++ b/doc/src/sgml/ref/pgbench.sgml @@ -998,7 +998,7 @@ pgbench options d default_seed - seed used in hash functions by default + seed used in hash and pseudo-random permutation functions by default @@ -1494,6 +1494,13 @@ SELECT 2 AS two, 3 AS three \gset p_ pow(2.0, 10), power(2.0, 10) 1024.0 + + pr_perm(i, size [, seed ] ) + integer + pseudo-random permutation in [0,size) + pr_perm(0, 4) + 0, 1, 2 or 3 + random(lb, ub) integer @@ -1545,6 +1552,20 @@ SELECT 2 AS two, 3 AS three \gset p_ shape of the distribution. + + + When designing a benchmark which selects rows non-uniformly, be aware + that using pgbench non-uniform random functions + directly leads to unduly correlations: rows with close ids come out with + similar probability, skewing performance measures because they also reside + in close pages. + + + You must use pr_perm to shuffle the selected ids, or + some other additional step with similar effect, to avoid these skewing impacts. + + + @@ -1639,24 +1660,54 @@ f(x) = PHI(2.0 * parameter * (x - mu) / (max - min + 1)) / hash_fnv1a accept an input value and an optional seed parameter. In case the seed isn't provided the value of :default_seed is used, which is initialized randomly unless set by the command-line --D option. Hash functions can be used to scatter the -distribution of random functions such as random_zipfian or -random_exponential. For instance, the following pgbench -script simulates possible real world workload typical for social media and +-D option. + + + +Function pr_perm implements a pseudo-random permutation +on an arbitrary size. +It allows to shuffle output of non uniform random functions so that +values drawn more often are not trivially correlated. 
+It permutes integers in [0, size) using a seed by applying rounds of +simple invertible functions, similarly to an encryption function, +although beware that it is not at all cryptographically secure. +Compared to hash functions discussed above, the function +ensures that a perfect permutation is applied: there are neither collisions +nor holes in the output values. +Values outside the interval are interpreted modulo the size. +The function raises an error if size is not positive. +If no seed is provided, :default_seed is used. + +For instance, the following pgbench script +simulates possible real world workload typical for social media and
Re: polymorphic table functions light
> On 24 Jan 2020, at 08:27, Peter Eisentraut wrote:
>
> I definitely want to make it work in a way that does not require writing
> C code. My idea was to create a new type, perhaps called "descriptor",
> that represents essentially a tuple descriptor. (It could be exactly a
> TupleDesc, as this patch does, or something similar.) For the sake of
> discussion, we could use JSON as the text representation of this. Then a
> PL/pgSQL function or something else high level could easily be written to
> assemble this. Interesting use cases are for example in the area of using
> PL/Perl or PL/Python for unpacking some serialization format using
> existing modules in those languages.

I do think it's very desirable to make it usable outside of C code.

> Obviously, there is a lot of leg work to be done between here and there,
> but it seems doable. The purpose of this initial patch submission was to
> get some opinions on the basic idea of "determine result tuple structure
> by calling helper function at parse time", and so far no one has fallen
> off their chair from that, so I'm encouraged. ;-)

I'm interested in this development, as it makes RECORD-returning SRFs in
the SELECT list a viable proposition, and that in turn allows a
ValuePerCall SRF to get meaningful benefit from pipelining. (They could
always pipeline, but there is no way to extract information from the RECORD
that's returned, with the sole exception of row_to_json.)

I couldn't check that it would work, though, because I couldn't apply the
v2 (or v1) patch against either 12.0 or 530609a (which I think was from
sometime around 25th Jan). Against 12.0, I got a few rejections
(prepjointree.c and clauses.c). I figured they might be inconsequential,
but no: initdb then fails at CREATE VIEW pg_policies. Different rejections
against 530609a, but still initdb fails.

But I'm definitely very much encouraged.

denty.
Re: widen vacuum buffer counters
On Fri, Jan 31, 2020 at 05:13:53PM -0500, Tom Lane wrote:
> +1 for widening these counters, but since they're global variables, -0.2
> or so for back-patching. I don't know of any reason that an extension
> would be touching these, but I feel like the problem isn't severe enough
> to justify taking an ABI-break risk.

I would not recommend doing a back-patch because of that. I don't think
that's worth taking any risk. Extension authors can have a lot of
imagination.

> Also, %zd is the wrong format code for int64. Recommended practice
> these days is to use "%lld" with an explicit cast of the printf argument
> to long long (just to be sure). That doesn't work safely before v12,
> and if you did insist on back-patching further, you'd need to jump
> through hoops to avoid having platform-specific format codes in a
> translatable string. (The side-effects for translation seem like
> an independent argument against back-patching.)

Surely you meant INT64_FORMAT here? Anyway, looking at the patch,
couldn't we just use uint64?
--
Michael
Re: Prevent pg_basebackup running as root
On Thu, Jan 30, 2020 at 04:00:40PM +0900, Michael Paquier wrote:
> Anyway, your patch looks like a good idea to me, so let's see if
> others have opinions or objections about it.

Seeing nothing, committed v2.
--
Michael
Re: The flinfo->fn_extra question, from me this time.
> On 28 Jan 2020, at 09:56, Thomas Munro wrote:
>
> ([…] I have no idea what GUI interaction causes that, but most Apple Mail
> attachments seem to be fine.)

I gathered from the other thread that posting plain text seems to attach
the patches in a way that's more acceptable. Seems to work, but it doesn't
explain exactly what the issue is, and I'm pretty sure I've not always had
to go via the "make plain text" menu item before.

> Here's a quick rebase in case it helps. It mostly applied fine (see
> below). The conflicts were just Makefile and expected output files, which
> I tried to do the obvious thing with. I had to add a #include
> "access/tupdesc.h" to plannodes.h to make something compile (because it
> uses TupleDesc). Passes check-world here.

Thanks a lot for doing that. I tried it against 530609a, and indeed it
seems to work.

I'm also watching the polymorphic table functions light thread [0], which
at first glance would also seem to make useful SRF RECORD-returning
functions when employed in the SELECT list. It's not doing what this patch
does, but people might be happy enough to transform their queries into
SELECT … FROM (SELECT fn(…)) to achieve pipelining, at least in the short
term.

[0] https://www.postgresql.org/message-id/46a1cb32-e9c6-e7a8-f3c0-78e6b3f70...@2ndquadrant.com

denty.
Re: pgbench - add pseudo-random permutation function
Hello Peter,

This patch was marked as RFC on 2019-03-30, but since then there have been
a couple more issues pointed out in a review by Thomas Munro, and it went
through 2019-09 and 2019-11 without any attention. Is the RFC status still
appropriate?

Thomas's review was about comments/documentation wording and about asking
for explanations, which I think I addressed, and the code did not actually
change, so I'm not sure that the "needs review" status is really needed,
but do as you feel.

I read the whole thread, I still don't know what this patch is supposed to
do. I know what the words in the subject line mean, but I don't know how
this helps a pgbench user run better benchmarks. I feel this is also the
sentiment expressed by others earlier in the thread. You indicated that
this functionality makes sense to those who want this functionality, but so
far only two people, namely the patch author and the reviewer, have
participated in the discussion on the substance of this patch. So either
the feature is extremely niche, or nobody understands it. I think you ought
to take about three steps back and explain this in more basic terms, even
just in email at first so that we can then discuss what to put into the
documentation.

Here is the motivation for this feature:

When you design a benchmark to test performance, you want to avoid unwanted
correlations which may impact performance unduly, one way or another.

For the default pgbench benchmarks, accounts are chosen uniformly, no
problem. However, if you want to test a non-uniform distribution, which is
what most people would encounter in practice, things are different. For
instance, you would replace random by random_exponential in the default
benchmarks. If you do that, then all accounts with lower ids would come out
more often. However this creates an unwanted correlation and performance
effect: why would frequent accounts just happen to be those with small ids,
which just happen to reside together in the first few pages of the table?
In order to avoid these effects, you need to shuffle the chosen account
ids, so that the frequent accounts are not specifically those at the
beginning of the table. Basically, as soon as you have a non-uniform random
selection step, you want to add some shuffle, otherwise you are going to
skew your benchmark measures. Nobody should use a non-uniform generator for
selecting rows without some shuffling, ever. I have no doubt that many
people do, without realizing that they are skewing their performance data.

Once you realize that, you can try to invent your own shuffling, but
frankly it is not as easy as it looks to find something non-trivial which
does not generate unwanted correlations. I had a lot of (very) bad designs
before coming up with the one in the patch: you want something fast and you
want good steering, which are contradictory objectives. There is also the
question of being able to change the shuffling for a given size, for
instance to check that there are no unwanted effects, hence the seeding.
Good seeded shuffling is what an encryption algorithm does, but for those
it is somewhat simpler because they usually work on power-of-two sizes.

In conclusion, a seeded shuffle step is needed as soon as you start using
non-uniform random generators. Providing one in pgbench, a tool designed to
run possibly non-uniform benchmarks against postgres, looks like common
courtesy. Not providing one helps people design bad benchmarks, possibly
without realizing it, which is outright thoughtlessness.

I have no doubt that the documentation should be improved to explain that
in a concise but clear way.

--
Fabien.
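[Editor's illustration] The kind of construction Fabien describes (a seeded, invertible scrambling, with cycle-walking to handle arbitrary sizes) can be sketched in a few lines of C. This is NOT pr_perm's actual algorithm, just a toy demonstration that composing invertible steps over the next power of two, then walking until the value falls back into [0, size), yields a seeded permutation:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Toy sketch, not pr_perm's real algorithm: a seeded bijection on
 * [0, size).  Each step in the loop is invertible on [0, 2^k), so
 * repeatedly applying the composition and "cycle-walking" until the
 * value lands back inside [0, size) yields a permutation of [0, size).
 */
uint64_t
toy_perm(uint64_t v, uint64_t size, uint64_t seed)
{
	uint64_t	mask = 1;

	while (mask < size)
		mask <<= 1;
	mask -= 1;					/* domain is now [0, mask], mask = 2^k - 1 */

	v &= mask;					/* values outside are taken modulo 2^k */
	do
	{
		/* odd multiplier + seed: bijective modulo a power of two */
		v = (v * 2862933555777941757ULL + seed) & mask;
		/* xorshift: also invertible on k-bit words */
		v ^= v >> 7;
		/* another invertible affine step */
		v = (v * 5 + 1) & mask;
	} while (v >= size);		/* cycle-walk back into [0, size) */

	return v;
}
```

Each step (odd-multiplier affine map modulo 2^k, xorshift) is a bijection, so their composition is too, and cycle-walking preserves bijectivity on the restricted domain; that is the essential property a shuffle like pr_perm must guarantee: no collisions and no holes.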
PATCH: add support for IN and @> in functional-dependency statistics use
Hello

At my current job, we have a lot of multi-tenant databases, thus with
tables containing a tenant_id column. Such a column introduces a severe
bias in statistics estimation since any other FK in the next columns is
very likely to have a functional dependency on the tenant id. We found
several queries where this functional dependency messed up the estimations
so much the optimizer chose wrong plans. When we tried to use extended
statistics with CREATE STATISTIC on tenant_id, other_id, we noticed that
the current implementation for detecting functional dependency lacks two
features (at least in our use case):
- support for IN clauses
- support for the array contains operator (that could be considered as a
  special case of IN)

After digging in the source code, I think the lack of support for IN
clauses is an oversight and due to the fact that IN clauses are
ScalarArrayOpExpr instead of OpExpr. The attached patch fixes this by
simply copying the code path for OpExpr and changing the type name. It
compiles and the results are correct, with a dependency being correctly
taken into consideration when estimating rows. If you think such a copy
paste is bad and should be factored in another static bool function, please
say so and I will happily provide an updated patch.

The lack of support for @> operator, on the other hand, seems to be a
decision taken when writing the initial code, but I can not find any
mathematical nor technical reason for it. The current code restricts
dependency calculation to the = operator, obviously because inequality
operators are not going to work... but array contains is just several =
operators grouped, thus the same for the dependency calculation. The second
patch refactors the operator check in order to also include array contains.

I tested the patches on current HEAD, but I can test and provide
back-ported versions of the patch for other versions if needed (this code
path hardly changed since its introduction in 10).
Best regards

 Pierre Ducroquet

>From d2b89748e1b520c276875538149eb466e97c21b4 Mon Sep 17 00:00:00 2001
From: Pierre
Date: Fri, 31 Jan 2020 23:58:35 +0100
Subject: [PATCH 1/2] Add support for IN clauses in dependencies check

---
 src/backend/statistics/dependencies.c | 34 +++++++++++++++++++++++++++
 1 file changed, 34 insertions(+)

diff --git a/src/backend/statistics/dependencies.c b/src/backend/statistics/dependencies.c
index e2f6c5bb97..d4844a9ec6 100644
--- a/src/backend/statistics/dependencies.c
+++ b/src/backend/statistics/dependencies.c
@@ -801,6 +801,40 @@ dependency_is_compatible_clause(Node *clause, Index relid, AttrNumber *attnum)
 
 		/* OK to proceed with checking "var" */
 	}
+	else if (IsA(rinfo->clause, ScalarArrayOpExpr))
+	{
+		/* If it's an opclause, check for Var IN Const. */
+		ScalarArrayOpExpr *expr = (ScalarArrayOpExpr *) rinfo->clause;
+
+		/* Only expressions with two arguments are candidates. */
+		if (list_length(expr->args) != 2)
+			return false;
+
+		/* Make sure non-selected argument is a pseudoconstant. */
+		if (is_pseudo_constant_clause(lsecond(expr->args)))
+			var = linitial(expr->args);
+		else if (is_pseudo_constant_clause(linitial(expr->args)))
+			var = lsecond(expr->args);
+		else
+			return false;
+
+		/*
+		 * If it's not an "=" operator, just ignore the clause, as it's not
+		 * compatible with functional dependencies.
+		 *
+		 * This uses the function for estimating selectivity, not the operator
+		 * directly (a bit awkward, but well ...).
+		 *
+		 * XXX this is pretty dubious; probably it'd be better to check btree
+		 * or hash opclass membership, so as not to be fooled by custom
+		 * selectivity functions, and to be more consistent with decisions
+		 * elsewhere in the planner.
+		 */
+		if (get_oprrest(expr->opno) != F_EQSEL)
+			return false;
+
+		/* OK to proceed with checking "var" */
+	}
 	else if (is_notclause(rinfo->clause))
 	{
 		/*
-- 
2.24.1

>From 02363c52d0d03f58b8e79088a7261a9fc1e3bbb7 Mon Sep 17 00:00:00 2001
From: Pierre
Date: Fri, 31 Jan 2020 23:58:55 +0100
Subject: [PATCH 2/2] Add support for array contains in dependency check

---
 src/backend/statistics/dependencies.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/src/backend/statistics/dependencies.c b/src/backend/statistics/dependencies.c
index d4844a9ec6..c3f37a474b 100644
--- a/src/backend/statistics/dependencies.c
+++ b/src/backend/statistics/dependencies.c
@@ -785,7 +785,7 @@ dependency_is_compatible_clause(Node *clause, Index relid, AttrNumber *attnum)
 		return false;
 
 	/*
-	 * If it's not an "=" operator, just ignore the clause, as it's not
+	 * If it's not an "=" or "@>" operator, just ignore the clause, as it's not
 	 * compatible with functional dependencies.
 	 *
 	 * This uses the function for estimating selectivity, not the operator
@@ -796,8 +796,13 @@ dependency_is_compatible_clause(Node *clause, Index relid, AttrNumber *attnum)
 	 * selectivity