Re: error context for vacuum to include block number

2020-02-01 Thread Justin Pryzby
Thanks for reviewing again

On Sun, Feb 02, 2020 at 10:45:12AM +0900, Masahiko Sawada wrote:
> Thank you for updating the patch. Here are some review comments:
> 
> 1.
> +typedef struct
> +{
> +   char       *relnamespace;
> +   char       *relname;
> +   char       *indname; /* If vacuuming index */
> 
> I think "Non-null if vacuuming index" is better.

Actually it's undefined garbage (not NULL) if not vacuuming index.

> And tablename is better than relname for accuracy?

The existing code uses relname, so I left that, since it's strange to
start using tablename and then write things like:

|   errcbarg.tblname = relname;
...
|   errcontext(_("while scanning block %u of relation \"%s.%s\""),
|   cbarg->blkno, cbarg->relnamespace, cbarg->tblname);

Also, mat views can be vacuumed.

> 2.
> +   BlockNumber blkno;
> +   int stage;  /* 0: scan heap; 1: vacuum heap; 2: vacuum index */
> +} vacuum_error_callback_arg;
> 
> Why do we not support index cleanup phase?

The patch started out just handling scan_heap.  Then I added vacuum_heap, then
vacuum_index.  Now, I've added index cleanup.

> 4.
> +/*
> + * Setup error traceback support for ereport()
> + * ->relnamespace and ->relname are already set
> + */
> +errcbarg.blkno = InvalidBlockNumber; /* Not known yet */
> +errcbarg.stage = 1;
> 
> relnamespace and relname of errcbarg are not set, as errcbarg is
> initialized in this function.

Thanks. That's an oversight from switching back to local vars instead of
LVRelStats while I was updating the patch out of town.

I don't know how to consistently test the vacuum_heap case, but rechecked it 
just now.

postgres=# SET client_min_messages=debug; SET statement_timeout=0; UPDATE t SET a=1+a; SET statement_timeout=150; VACUUM(VERBOSE, PARALLEL 1) t;
...
2020-02-01 23:11:06.482 CST [26609] ERROR:  canceling statement due to statement timeout
2020-02-01 23:11:06.482 CST [26609] CONTEXT:  while vacuuming block 33 of relation "public.t"

> 5.
> @@ -177,6 +177,7 @@ typedef struct LVShared
>  * the lazy vacuum.
>  */
> Oid relid;
> +   char        relname[NAMEDATALEN];   /* tablename, used for error callback */
> 
> How about getting relation
> name from index relation? That is, in lazy_vacuum_index we can get
> table oid from the index relation by IndexGetRelation() and therefore
> we can get the table name which is in palloc'd memory. That way we
> don't need to add relname to any existing struct such as LVRelStats
> and LVShared.

See attached
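
Roughly, the approach looks like this inside the index-vacuuming path (a
sketch only, not necessarily the exact code in the attachment;
IndexGetRelation(), get_rel_name() and get_namespace_name() are existing
backend helpers, and the name lookups return palloc'd copies):

	/* Derive the table's name from the index being vacuumed */
	Oid			tableoid = IndexGetRelation(RelationGetRelid(indrel), false);

	errcbarg.relname = get_rel_name(tableoid);
	errcbarg.relnamespace = get_namespace_name(get_rel_namespace(tableoid));
	errcbarg.indname = RelationGetRelationName(indrel);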

Also, I think we shouldn't show a block number if it's "Invalid", to avoid
saying "while vacuuming block 4294967295 of relation ..."

For now, I made it not show any errcontext at all in that case.
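
Concretely, the callback can just bail out when the block number is not yet
known -- a sketch (trimmed; not necessarily character-for-character what the
attached patch does):

static void
vacuum_error_callback(void *arg)
{
	vacuum_error_callback_arg *cbarg = arg;

	/* Report nothing rather than "block 4294967295" */
	if (!BlockNumberIsValid(cbarg->blkno))
		return;

	if (cbarg->phase == PROGRESS_VACUUM_PHASE_SCAN_HEAP)
		errcontext("while scanning block %u of relation \"%s.%s\"",
				   cbarg->blkno, cbarg->relnamespace, cbarg->relname);
	else if (cbarg->phase == PROGRESS_VACUUM_PHASE_VACUUM_HEAP)
		errcontext("while vacuuming block %u of relation \"%s.%s\"",
				   cbarg->blkno, cbarg->relnamespace, cbarg->relname);
}
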
From 94f715818dcdf3225a3e7404e395e4a0f0818b0c Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Thu, 12 Dec 2019 20:54:37 -0600
Subject: [PATCH v16 1/3] vacuum errcontext to show block being processed

As requested here.
https://www.postgresql.org/message-id/20190807235154.erbmr4o4bo6vgnjv%40alap3.anarazel.de
---
 src/backend/access/heap/vacuumlazy.c | 94 
 1 file changed, 94 insertions(+)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 8ce5011..43859bd 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -292,6 +292,13 @@ typedef struct LVRelStats
 	bool		lock_waiter_detected;
 } LVRelStats;
 
+typedef struct
+{
+	char 		*relnamespace;
+	char		*relname;
+	BlockNumber blkno;
+	int			phase;	/* Reusing same enums as for progress reporting */
+} vacuum_error_callback_arg;
 
 /* A few variables that don't seem worth passing around as parameters */
 static int	elevel = -1;
@@ -361,6 +368,7 @@ static void end_parallel_vacuum(Relation *Irel, IndexBulkDeleteResult **stats,
 LVParallelState *lps, int nindexes);
 static LVSharedIndStats *get_indstats(LVShared *lvshared, int n);
 static bool skip_parallel_vacuum_index(Relation indrel, LVShared *lvshared);
+static void vacuum_error_callback(void *arg);
 
 
 /*
@@ -724,6 +732,8 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 		PROGRESS_VACUUM_MAX_DEAD_TUPLES
 	};
 	int64		initprog_val[3];
+	ErrorContextCallback errcallback;
+	vacuum_error_callback_arg errcbarg;
 
 	pg_rusage_init(&ru0);
 
@@ -870,6 +880,17 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 	else
 		skipping_blocks = false;
 
+	/* Setup error traceback support for ereport() */
+	errcbarg.relnamespace = get_namespace_name(RelationGetNamespace(onerel));
+	errcbarg.relname = relname;
+	errcbarg.blkno = InvalidBlockNumber; /* Not known yet */
+	errcbarg.phase = PROGRESS_VACUUM_PHASE_SCAN_HEAP;
+
+	errcallback.callback = vacuum_error_callback;
+	errcallback.arg = (void *) &errcbarg;
+	errcallback.previous = error_context_stack;
+	error_context_stack = &errcallback;
+
 	for (blkno = 0; blkno < nblocks; blkno++)
 	{
 		Buffer		buf;

Re: pg_stat_progress_basebackup - progress reporting for pg_basebackup, in the server side

2020-02-01 Thread Masahiko Sawada
On Fri, 31 Jan 2020 at 02:29, Fujii Masao  wrote:
>
>
>
> On 2020/01/30 12:58, Kyotaro Horiguchi wrote:
> > At Wed, 29 Jan 2020 23:16:08 +0900, Fujii Masao 
> >  wrote in
> >> Hi,
> >>
> >> pg_basebackup reports the backup progress if the --progress option is
> >> specified, and we can monitor it on the client side. I think it's useful
> >> if we can monitor the progress information on the server side as well
> >> because, for example, we can easily check it by using SQL. So I'd like to
> >> propose a pg_stat_progress_basebackup view that allows us to monitor the
> >> progress of pg_basebackup on the server side. Thoughts?
> >>
> >> POC patch is attached.
> >
> > | postgres=# \d pg_stat_progress_basebackup
> > |  View "pg_catalog.pg_stat_progress_basebackup"
> > |Column|  Type   | Collation | Nullable | Default
> > | -+-+---+--+-
> > |  pid | integer |   |  |
> > |  phase   | text|   |  |
> > |  backup_total| bigint  |   |  |
> > |  backup_streamed | bigint  |   |  |
> > |  tablespace_total| bigint  |   |  |
> > |  tablespace_streamed | bigint  |   |  |
> >
> > I think the view needs client identity, such as a host/port pair,
> > besides pid (a host/port pair fails to identify the client in the case
> > of unix sockets).
>
> I don't think this is the job of progress reporting. If that information
> is necessary, we can join this view with pg_stat_replication using the
> pid column as the join key.
>
> > Also elapsed time from session start might be
> > useful.
>
> +1
> I think that not only pg_stat_progress_basebackup but also all the other
> progress views should report the time when the target command was
> started and the time when the phase was last changed. Those times
> would be useful to estimate the remaining execution time from the
> progress information.
>
> > pg_stat_progress_vacuum has datid, datname and relid.
> >
> > + if (backup_total > 0 && backup_streamed > backup_total)
> > + {
> > + backup_total = backup_streamed;
> >
> > Do we need the condition "backup_total > 0"?
>
> I added the condition for the case where the --progress option is not supplied
> in pg_basebackup, i.e., the case where the total amount of backup is not
> estimated at the beginning of the backup. In this case, backup_total is
> always 0.
>
> > + int tblspc_streamed = 0;
> > +
> > + pgstat_progress_update_param(PROGRESS_BASEBACKUP_PHASE,
> > +  
> > PROGRESS_BASEBACKUP_PHASE_STREAM_BACKUP);
> >
> > This starts the "streaming backup" phase with backup_total = 0. Couldn't
> > we move to that phase after setting backup total and tablespace total?
> > That is, just after calling SendBackupHeader().
>
> OK, that's a bit less confusing for users. I will change in that way.
>
> > +WHEN 3 THEN 'stopping backup'::text
> >
> > I'm not sure, but the "stop" seems to suggest the backup is terminated
> > before completion. If it is following the name of the function
> > pg_stop_backup, I think the name suggests stopping "the state for
> > performing a backup", not a backup.
> >
> > In the first place, the progress is about "backup", so it seems strange
> > that we have another phase after the "stopping backup" phase.  It
> > might be better to call it "finishing file transfer" or some such.
> >
> > "initializing"
> > -> "starting file transfer"
> > -> "transferring files"
> > -> "finishing file transfer"
> > -> "transferring WAL"
>
> Better names are always welcome! If "stopping backup" is confusing,
> what about "performing pg_stop_backup"? So
>
> initializing
> performing pg_start_backup
> streaming database files
> performing pg_stop_backup
> transferring WAL files

Another idea I came up with is to show the steps that take time instead of
pg_start_backup/pg_stop_backup, for a better understanding of the
situation. That is, "performing checkpoint", "performing WAL
archiving", etc., which account for most of the time spent in these functions.

Regards,

-- 
Masahiko Sawada  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Autovacuum on partitioned table

2020-02-01 Thread Masahiko Sawada
On Wed, 29 Jan 2020 at 17:56, Amit Langote  wrote:
>
> On Wed, Jan 29, 2020 at 11:29 AM yuzuko  wrote:
> > > Besides the complexity of
> > > getting that infrastructure in place, an important question is whether
> > > the current system of applying threshold and scale factor to
> > > changes_since_analyze should be used as-is for inheritance parents
> > > (partitioned tables), because if users set those parameters similarly
> > > to for regular tables, autovacuum might analyze partitioned tables
> > > more than necessary.  We'll either need a different formula, or some
> > > commentary in the documentation about how partitioned tables might
> > > need different setting, or maybe both.
> > >
> > I'm not sure but I think we need new autovacuum parameters for
> > partitioned tables (autovacuum, autovacuum_analyze_threshold,
> > autovacuum_analyze_scale_factor) because whether it's necessary
> > to run autovacuum on partitioned tables will depend on users.
> > What do you think?
>
> Yes, we will need to first support those parameters on partitioned
> tables.  Currently, you get:
>
> create table p (a int) partition by list (a) with
> (autovacuum_analyze_scale_factor=0);
> ERROR:  unrecognized parameter "autovacuum_analyze_scale_factor"
>
> > > How are you going to track changes_since_analyze of partitioned table?
> > > It's just an idea, but we can accumulate changes_since_analyze of a
> > > partitioned table by adding each child table's value after analyzing that
> > > child table, and compare the partitioned table's value to the threshold
> > > that is computed by (autovacuum_analyze_threshold  + total rows
> > > including all child tables * autovacuum_analyze_scale_factor).
> > >
> > The idea Sawada-san mentioned is similar to mine.
>
> So if I understand this idea correctly, a partitioned table's analyze
> will only be triggered when partitions are analyzed.  That is,
> inserts, updates, deletes of tuples in partitions will be tracked by
> pgstat, which in turn is used by autovacuum to trigger analyze on
> partitions.  Then, a partition's changes_since_analyze is added into the
> parent's changes_since_analyze, which in turn *may* trigger analyze on
> the parent.  I said "may", because it would take multiple partition
> analyzes to accumulate enough changes to trigger one on the parent.
> Am I getting that right?

Yeah that is what I meant. In addition, adding partition's
changes_since_analyze to its parent needs to be done recursively as
the parent table could also be a partitioned table.
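
Something like the following could do the propagation after analyzing a leaf
partition (a sketch only: get_partition_ancestors() is an existing catalog
function, while pgstat_report_partition_changes() is a hypothetical pgstat
call assumed here just for illustration):

#include "postgres.h"
#include "catalog/partition.h"
#include "nodes/pg_list.h"
#include "pgstat.h"

/*
 * Credit a partition's changes_since_analyze to every ancestor, so that
 * autovacuum can eventually decide to analyze the partitioned parents too.
 */
static void
report_changes_to_ancestors(Oid partoid, PgStat_Counter changes)
{
	List	   *ancestors = get_partition_ancestors(partoid);
	ListCell   *lc;

	foreach(lc, ancestors)
	{
		/* hypothetical pgstat message adding to the ancestor's counter */
		pgstat_report_partition_changes(lfirst_oid(lc), changes);
	}

	list_free(ancestors);
}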

>
> >  Also, for tracking
> > changes_since_analyze, we have to make partitioned table's statistics.
> > To do that, we can invent a new PgStat_StatPartitionedTabEntry based
> > on PgStat_StatTabEntry.  Through talking with Amit, I think the new 
> > structure
> > needs the following members:
> >
> > tableid
> > changes_since_analyze
> > analyze_timestamp
> > analyze_count
> > autovac_analyze_timestamp
> > autovac_analyze_count
> >
> > Vacuum doesn't run on partitioned tables, so I think members related to
> > (auto) vacuum need not be contained in the structure.
>
> On second thought, maybe we don't need a new PgStat_ struct.  We can
> just use what's used for regular tables and leave the fields that
> don't make sense for partitioned tables set to 0, such as those that
> track the counts of scans, tuples, etc.  That means we don't have to
> mess with interfaces of existing functions, like this one:
>
> static void relation_needs_vacanalyze(Oid relid,
>   AutoVacOpts *relopts,
>   Form_pg_class classForm,
>   PgStat_StatTabEntry *tabentry, ...

+1

Regards,

-- 
Masahiko Sawada  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: error context for vacuum to include block number

2020-02-01 Thread Masahiko Sawada
On Tue, 28 Jan 2020 at 07:50, Justin Pryzby  wrote:
>
> On Mon, Jan 27, 2020 at 03:59:58PM +0900, Masahiko Sawada wrote:
> > On Mon, 27 Jan 2020 at 14:38, Justin Pryzby  wrote:
> > > On Sun, Jan 26, 2020 at 12:29:38PM -0800, Andres Freund wrote:
> > > > > CONTEXT:  while vacuuming relation "public.t_a_idx"
> > > >
> > > > It'd be a bit nicer if it said index "public.t_a_idx" for relation 
> > > > "public.t".
> > >
> > > I think that tips the scale in favour of making vacrelstats a global.
> > > I added that as a 1st patch, and squished the callback patches into one.
> >
> > Hmm, I don't think it's a good idea to make vacrelstats global. If we
> > want to display the relation name and its index name in the error context,
> > we might want to define a new struct dedicated to error context
> > reporting. That is, it would have blkno, stage, and the relation and schema
> > names for both table and index, and we would set those fields of the
> > callback argument before performing each vacuum phase. We don't change
> > LVRelStats at all.
>
> On Mon, Jan 27, 2020 at 12:14:38AM -0600, Justin Pryzby wrote:
> > It occurred to me that there's an issue with sharing vacrelstats between
> > scan/vacuum, since blkno and stage are set by the heap/index vacuum 
> > routines,
> > but not reset on their return to heap scan.  Not sure if we should reset 
> > them,
> > or go back to using a separate struct, like it was here:
> > https://www.postgresql.org/message-id/20200120054159.GT26045%40telsasoft.com
>
> I went back to this, original, way of doing it.
> The parallel vacuum patch made it harder to pass the table around :(
> And has to be separately tested:
>
> | SET statement_timeout=0; DROP TABLE t; CREATE TABLE t AS SELECT generate_series(1,9)a;
> | CREATE INDEX ON t(a); CREATE INDEX ON t(a); UPDATE t SET a=1+a;
> | SET statement_timeout=99; VACUUM(VERBOSE, PARALLEL 2) t;
>
> I had to allocate space for the table name within the LVShared struct, not
> just a pointer, otherwise it would variously crash or fail to output the
> index name.  I think pointers can't be passed to a parallel process except
> by using some heavyweight thing like shm_toc_...
>
> I guess the callback could also take the index relid instead of name, and use
> something like IndexGetRelation().
>
> > Although the patch replaces get_namespace_name and
> > RelationGetRelationName, we use the relation's namespace name in only
> > two places, and almost all ereport/elog messages use only the relation
> > name obtained via RelationGetRelationName, which is a macro that accesses
> > the relation name in the Relation struct. So I think adding relname to
> > LVRelStats would not be a big benefit. Similarly, adding the table
> > namespace to LVRelStats would avoid calling get_namespace_name, but I'm
> > not sure it's worth having because it's not expected to be called
> > many times.
>
> Right, I only tried that to save a few LOC and maybe make shorter lines.
> It's not important so I'll drop that patch.

Thank you for updating the patch. Here are some review comments:

1.
+typedef struct
+{
+   char       *relnamespace;
+   char       *relname;
+   char       *indname; /* If vacuuming index */

I think "Non-null if vacuuming index" is better. And tablename is
better than relname for accuracy?

2.
+   BlockNumber blkno;
+   int stage;  /* 0: scan heap; 1: vacuum heap; 2: vacuum index */
+} vacuum_error_callback_arg;

Why do we not support index cleanup phase?

3.
/* Work on all the indexes, then the heap */
lazy_vacuum_all_indexes(onerel, Irel, indstats,
vacrelstats, lps, nindexes);
-
/* Remove tuples from heap */
lazy_vacuum_heap(onerel, vacrelstats);

I think it's an unnecessary removal.

4.
 static void
 lazy_vacuum_heap(Relation onerel, LVRelStats *vacrelstats)
 {
 int tupindex;
 int npages;
 PGRUsage    ru0;
 Buffer  vmbuffer = InvalidBuffer;
+ErrorContextCallback errcallback;
+vacuum_error_callback_arg errcbarg;

 /* Report that we are now vacuuming the heap */
 pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
  PROGRESS_VACUUM_PHASE_VACUUM_HEAP);

+/*
+ * Setup error traceback support for ereport()
+ * ->relnamespace and ->relname are already set
+ */
+errcbarg.blkno = InvalidBlockNumber; /* Not known yet */
+errcbarg.stage = 1;

relnamespace and relname of errcbarg are not set, as errcbarg is
initialized in this function.

5.
@@ -177,6 +177,7 @@ typedef struct LVShared
 * the lazy vacuum.
 */
Oid relid;
+   char        relname[NAMEDATALEN];   /* tablename, used for error callback */

Hmm, I think it's not a good idea to have LVShared carry relname, because
the parallel vacuum worker is able to look up the table name by oid, and
it consumes DSM memory. To pass the relation name down to
lazy_vacuum_index I thought to add a new relname argument to some
functions, but in 

Internal key management system

2020-02-01 Thread Masahiko Sawada
Hi,

I've started a new separate thread from the previous long thread[1]
for adding an internal key management system to PostgreSQL. As I mentioned in
the previous mail[2], we've decided to step back and focus on only the
internal key management system for PG13. The internal key management
system introduces functionality to PostgreSQL that allows users to
encrypt and decrypt data without knowing the actual key. In addition, it
will be able to be integrated with transparent data encryption in the
future.

The basic idea is that PostgreSQL generates the master encryption key,
which is further protected by the user-provided passphrase. The key
management system provides two functions to wrap and unwrap a secret
with the master encryption key. A user generates a secret key locally,
sends it to PostgreSQL to wrap it using pg_kmgr_wrap(), and saves
it somewhere. Then the user can use the encrypted secret key to
encrypt and decrypt data by something like:

INSERT INTO tbl VALUES (pg_encrypt('user data', pg_kmgr_unwrap('x')));
SELECT pg_decrypt(secret_column, pg_kmgr_unwrap('x')) FROM tbl;

where 'x' is the result of the pg_kmgr_wrap function.

That way we can get something encrypted and decrypted without ever
knowing the actual key that was used to encrypt it.

I'm currently updating the patch and will submit it.

On Sun, 2 Feb 2020 at 00:37, Sehrope Sarkuni  wrote:
>
> On Fri, Jan 31, 2020 at 1:21 AM Masahiko Sawada
>  wrote:
> > On Thu, 30 Jan 2020 at 20:36, Sehrope Sarkuni  wrote:
> > > That
> > > would allow the internal usage to have a fixed output length of
> > > LEN(IV) + LEN(HMAC) + LEN(DATA) = 16 + 32 + 64 = 112 bytes.
> >
> > Probably you meant LEN(DATA) is 32? DATA will be an encryption key for
> > AES256 (master key) internally generated.
>
> No it should be 64-bytes. That way we can have separate 32-byte
> encryption key (for AES256) and 32-byte MAC key (for HMAC-SHA256).
>
> While it's common to reuse the same 32-byte key for both AES256 and an
> HMAC-SHA256 and there aren't any known issues with doing so, when
> designing something from scratch it's more secure to use entirely
> separate keys.

The HMAC key you mentioned above is not the same as the HMAC key
derived from the user-provided passphrase, right? That is, each individual
key needs to have its own IV and HMAC key. Given that the HMAC key used
for HMAC(IV || ENCRYPT(KEY, IV, DATA)) is the latter key (derived from
the passphrase), what will the former key be used for?

>
> > > For the user facing piece, padding would be enabled to support arbitrary
> > > input data lengths. That would make the output length grow by up to
> > > 16-bytes (rounding the data length up to the AES block size) plus one
> > > more byte if a version field is added.
> >
> > I think the length of padding also needs to be added to the output.
> > Anyway, in the first version the same methods of wrapping/unwrapping a
> > key are used for both internal use and the user-facing function. And the
> > user input key needs to be a multiple of 16 bytes.
>
> A separate length field does not need to be added as the
> padding-enabled output will already include it at the end[1]. This
> would be handled automatically by the OpenSSL encryption / decryption
> operations (if it's enabled):
>

Yes, right.

Regards,

[1] 
https://www.postgresql.org/message-id/031401d3f41d%245c70ed90%241552c8b0%24%40lab.ntt.co.jp
[2] 
https://www.postgresql.org/message-id/CAD21AoD8QT0TWs3ma-aB821vwDKa1X519y1w3yrRKkAWjhZcrw%40mail.gmail.com

-- 
Masahiko Sawada  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Created feature for to_date() conversion using patterns 'YYYY-WW', 'YYYY-WW-D', 'YYYY-MM-W' and 'YYYY-MM-W-D'

2020-02-01 Thread Tom Lane
I wrote:
> Either way, though, the WW weeks don't line up with the D weeks,
> and we're not likely to make them do so.
> So I think an acceptable version of this feature has to involve
> defining at least one new format code and maybe as many as three,
> to produce year, week and day values that agree on whichever
> definition of "a week" you want to use, and then to_date has to
> enforce that input uses matching year/week/day field types,
> very much like it already does for ISO versus Gregorian dates.

A different line of thought could be to accept the current to_char()
behavior for WW and D, and go ahead and teach to_date() to invert that.
That is, take YYYY plus WW as specifying a seven-day interval, and then
D chooses the matching day within that interval.  This would still have
the property you complained about originally that WW-plus-D don't form
a monotonically increasing sequence, but I think that ship has sailed.

regards, tom lane




Re: BUG #16171: Potential malformed JSON in explain output

2020-02-01 Thread Tom Lane
Hamid Akhtar  writes:
> I've reviewed and verified this patch and IMHO, this is ready to be committed.

I took a look at this and I don't think it's really going in the right
direction.  ISTM the clear intent of this code was to attach the "Subplans
Removed" item as a field of the parent [Merge]Append node, but the author
forgot about the intermediate "Plans" array.  So I think that, rather than
doubling down on a mistake, we ought to move where the field is generated
so that it *is* a field of the parent node.

Another failure to follow the design conventions for EXPLAIN output is
that in non-text formats, the schema for each node type ought to be fixed;
that is, if a given field can appear for a particular node type and
EXPLAIN options, it should appear always, not be omitted just because it's
zero.

So that leads me to propose 0001 attached.  This does lead to some field
order rearrangement in text mode, as per the regression test changes,
but I think that's not a big deal.  (A change can only happen if there
are initplan(s) attached to the parent node.)

Also, I wondered whether we had any other violations of correct formatting
in this code, which led me to the idea of running auto_explain in JSON
mode and having it feed its result to json_in.  This isn't a complete
test because it won't whine about duplicate field names, but it did
correctly find the current bug --- and I couldn't find any others while
running the core regression tests with various auto_explain options.
0002 attached isn't committable, because nobody would want the overhead
in production, but it seems like a good trick to keep up our sleeves.
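
The core of that trick is tiny -- roughly this, as a sketch (illustrative
only, not the actual 0002 patch; it assumes es->str holds the finished
output and that utils/fmgrprotos.h is included for the json_in prototype):

	/* Round-trip auto_explain's JSON output through json_in() so that
	 * any malformed output raises an error during the regression run. */
	if (es->format == EXPLAIN_FORMAT_JSON)
		(void) DirectFunctionCall1(json_in, CStringGetDatum(es->str->data));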

Thoughts?

regards, tom lane

diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index c367c75..d901dc4 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -119,8 +119,9 @@ static void ExplainModifyTarget(ModifyTable *plan, ExplainState *es);
 static void ExplainTargetRel(Plan *plan, Index rti, ExplainState *es);
 static void show_modifytable_info(ModifyTableState *mtstate, List *ancestors,
   ExplainState *es);
-static void ExplainMemberNodes(PlanState **planstates, int nsubnodes,
-			   int nplans, List *ancestors, ExplainState *es);
+static void ExplainMemberNodes(PlanState **planstates, int nplans,
+			   List *ancestors, ExplainState *es);
+static void ExplainMissingMembers(int nplans, int nchildren, ExplainState *es);
 static void ExplainSubPlans(List *plans, List *ancestors,
 			const char *relationship, ExplainState *es);
 static void ExplainCustomChildren(CustomScanState *css,
@@ -1967,6 +1968,30 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		ExplainFlushWorkersState(es);
 	es->workers_state = save_workers_state;
 
+	/*
+	 * If partition pruning was done during executor initialization, the
+	 * number of child plans we'll display below will be less than the number
+	 * of subplans that was specified in the plan.  To make this a bit less
+	 * mysterious, emit an indication that this happened.  Note that this
+	 * field is emitted now because we want it to be a property of the parent
+	 * node; it *cannot* be emitted within the Plans sub-node we'll open next.
+	 */
+	switch (nodeTag(plan))
+	{
+		case T_Append:
+			ExplainMissingMembers(((AppendState *) planstate)->as_nplans,
+  list_length(((Append *) plan)->appendplans),
+  es);
+			break;
+		case T_MergeAppend:
+			ExplainMissingMembers(((MergeAppendState *) planstate)->ms_nplans,
+  list_length(((MergeAppend *) plan)->mergeplans),
+  es);
+			break;
+		default:
+			break;
+	}
+
 	/* Get ready to display the child plans */
 	haschildren = planstate->initPlan ||
 		outerPlanState(planstate) ||
@@ -2007,31 +2032,26 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_ModifyTable:
 			ExplainMemberNodes(((ModifyTableState *) planstate)->mt_plans,
 			   ((ModifyTableState *) planstate)->mt_nplans,
-			   list_length(((ModifyTable *) plan)->plans),
 			   ancestors, es);
 			break;
 		case T_Append:
 			ExplainMemberNodes(((AppendState *) planstate)->appendplans,
 			   ((AppendState *) planstate)->as_nplans,
-			   list_length(((Append *) plan)->appendplans),
 			   ancestors, es);
 			break;
 		case T_MergeAppend:
 			ExplainMemberNodes(((MergeAppendState *) planstate)->mergeplans,
 			   ((MergeAppendState *) planstate)->ms_nplans,
-			   list_length(((MergeAppend *) plan)->mergeplans),
 			   ancestors, es);
 			break;
 		case T_BitmapAnd:
 			ExplainMemberNodes(((BitmapAndState *) planstate)->bitmapplans,
 			   ((BitmapAndState *) planstate)->nplans,
-			   list_length(((BitmapAnd *) plan)->bitmapplans),
 			   ancestors, es);
 			break;
 		case T_BitmapOr:
 			ExplainMemberNodes(((BitmapOrState *) planstate)->bitmapplans,
 			   ((BitmapOrState *) planstate)->nplans,
-			   list_length(((BitmapOr *) plan)->bitmapplans),
 

Re: [Proposal] Global temporary tables

2020-02-01 Thread Pavel Stehule
On Sat, 1 Feb 2020 at 14:39, 曾文旌(义从)
wrote:

>
>
> On 30 Jan 2020, at 10:21 PM, Pavel Stehule  wrote:
>
>
>
> On Thu, 30 Jan 2020 at 15:17, 曾文旌(义从)
> wrote:
>
>>
>>
>> > On 29 Jan 2020, at 9:48 PM, Robert Haas  wrote:
>> >
>> > On Tue, Jan 28, 2020 at 12:12 PM 曾文旌(义从) 
>> wrote:
>> >>> Opinion by Pavel
>> >>> + rel->rd_islocaltemp = true;  <<< if this is valid, then the
>> name of the field "rd_islocaltemp" is probably not the best
>> >>> I renamed rd_islocaltemp
>> >>
>> >> I don't see any change?
>> >>
>> >> Rename rd_islocaltemp to rd_istemp  in
>> global_temporary_table_v8-pg13.patch
>> >
>> > In view of commit 6919b7e3294702adc39effd16634b2715d04f012, I think
>> > that this has approximately a 0% chance of being acceptable. If you're
>> > setting a field in a way that is inconsistent with the current use of
>> > the field, you're probably doing it wrong, because the field has an
>> > existing purpose to which new code must conform. And if you're not
>> > doing that, then you don't need to rename it.
>> Thank you for pointing it out.
>> I've rolled back the rename.
>> But I still need rd_islocaltemp to be true. The reasons are:
>> 1. GTT needs to support DML in read-only transactions, like a local
>> temp table.
>> 2. GTT does not need to hold the lock before modifying the index buffer,
>> also like a local temp table.
>>
>> Please give me feedback.
>>
>
> maybe some like
>
> rel->rd_globaltemp = true;
>
> and somewhere else
>
> if (rel->rd_localtemp || rel->rd_globaltemp)
> {
>   ...
> }
>
> I tried to optimize code in global_temporary_table_v10-pg13.patch
>
>
> Please give me feedback.
>

I tested this patch and I don't have any objections - from my user
perspective it works as I expect.

+#define RELATION_IS_TEMP(relation) \
+ ((relation)->rd_islocaltemp || \
+ (relation)->rd_rel->relpersistence == RELPERSISTENCE_GLOBAL_TEMP)

It looks a little bit unbalanced.

maybe it is better to add rd_isglobaltemp to the relation structure

and then

it would look like

+#define RELATION_IS_TEMP(relation) \
+ ((relation)->rd_islocaltemp || \
+ (relation)->rd_isglobaltemp))

But I have no idea whether it helps in the bigger picture.







> Wenjing
>
>
>
>
>
>>
>> Wenjing
>>
>>
>>
>>
>> >
>> > --
>> > Robert Haas
>> > EnterpriseDB: http://www.enterprisedb.com
>> > The Enterprise PostgreSQL Company
>>
>>
>


Re: [Proposal] Global temporary tables

2020-02-01 Thread 曾文旌(义从)


> On 27 Jan 2020, at 5:38 PM, Konstantin Knizhnik  wrote:
> 
> 
> 
> On 25.01.2020 18:15, 曾文旌(义从) wrote:
>> I wonder why do we need some special check for GTT here.
>>> From my point of view cleanup at startup of local storage of temp tables 
>>> should be performed in the same way for local and global temp tables.
>> After an OOM kill, autovacuum cleans up isolated local temp tables
>> like orphan temporary tables: the definition of the local temp table is
>> deleted along with its storage file.
>> But GTT cannot do that, so we have this implementation in my patch.
>> If you have other solutions, please let me know.
>> 
> I wonder if it is possible that autovacuum or some other Postgres process is
> killed by OOM and the postmaster does not notice it and so doesn't restart the
> Postgres instance?
> As far as I know, a crash of any process connected to Postgres shared memory
> (and autovacuum definitely has such a connection) causes a Postgres restart.
The postmaster will not restart after an OOM happens, but the startup process
will run, and GTT data files are cleaned up in the startup process.
> 
> 
>> In my design:
>> 1. Because different sessions have different transaction information, I
>> chose to store the transaction information of GTT in MyProc, not the catalog.
>> 2. As for the XID wraparound problem, the cause is the design of temp
>> table storage (local temp tables and global temp tables), which makes it
>> impossible for autovacuum to vacuum them.
>> It should be completely solved at the storage level.
>> 
> 
> My point of view is that vacuuming of temp tables is a common problem for local
> and global temp tables.
> So it has to be addressed in a common way, and we should not try to fix
> this problem only for GTT.
I think I agree with you on this point.
However, this does not mean that the GTT transaction information stored in
pg_class is correct.
If you keep it that way, as in global_private_temp-8.patch, it may cause data
loss in GTT after autovacuum.

> 
> 
>> In fact, the DBA can still complete DDL on a GTT.
>> I've provided a set of functions for this case.
>> If the DBA needs to modify a GTT A (or drop it, or create an index on it),
>> they need to:
>> 1. Use the pg_gtt_attached_pids view to list the pids of the sessions that
>> are using GTT A.
>> 2. Use pg_terminate_backend(pid) to terminate all of them except their own.
>> 3. Do the ALTER on GTT A.
>> 
> IMHO forced termination of client sessions is not an acceptable solution.
> And it is not an absolutely necessary requirement.
> So from my point of view we should not add such limitations to GTT design.
This limitation makes it possible to support all DDL on a GTT.
IMHO even Oracle's GTT has similar limitations.

> 
> 
> 
>>> 
>>> What are the reasons of using RowExclusiveLock for GTT instead of 
>>> AccessExclusiveLock?
>>> Yes, GTT data is access only by one backend so no locking here seems to be 
>>> needed at all.
>>> But I wonder what are the motivations/benefits of using weaker lock level 
>>> here?
>> 1. Truncate of a GTT deletes only the data in the session, so there is no
>> need to use a high-level lock.
>> 2. I think it still needs to be blocked by DDL on the GTT, which is why I use
>> RowExclusiveLock.
> 
> Sorry, I do not understand your arguments: we do not need exclusive lock 
> because we drop only local (private) data
> but we need some kind of lock. I agree with 1) and not 2).
Yes, we don't need a lock for the private data, but the metadata does.
> 
>> 
>>> There should be no conflicts in any case...
>>> 
>>> +/* We allow to create index on global temp table only this session 
>>> use it */
>>> +if (is_other_backend_use_gtt(heapRelation->rd_node))
>>> +elog(ERROR, "can not create index when have other backend 
>>> attached this global temp table");
>>> +
>>> 
>>> The same argument as in the case of dropping a GTT: I do not think that
>>> prohibiting DDL operations on a GTT used by more than one backend is a bad idea.
>> The idea was to give the GTT almost all the features of a regular table with
>> few code changes.
>> In the current version the DBA can still do all DDL for GTT, as I've already described.
> 
> I absolutely agree with you that GTT should be given the same features as
> regular tables.
> The irony is that this most natural and convenient behavior is easiest to
> implement without adding extra restrictions.
> Just let indexes for GTT be constructed on demand. It can be done using
> the same function used for regular index creation.
The limitation on index creation has been improved in
global_temporary_table_v10-pg13.patch.

> 
> 
>> 
>>> 
>>> +/* global temp table not support foreign key constraint yet */
>>> +if (RELATION_IS_GLOBAL_TEMP(pkrel))
>>> +ereport(ERROR,
>>> +(errcode(ERRCODE_WRONG_OBJECT_TYPE),
>>> + errmsg("referenced relation \"%s\" is not a global temp 
>>> table",
>>> +RelationGetRelationName(pkrel;
>>> +
>>> 
>>> Why do we need to prohibit foreign key constraint on GTT?
>> 

Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2020-02-01 Thread Sehrope Sarkuni
On Fri, Jan 31, 2020 at 1:21 AM Masahiko Sawada
 wrote:
> On Thu, 30 Jan 2020 at 20:36, Sehrope Sarkuni  wrote:
> > That
> > would allow the internal usage to have a fixed output length of
> > LEN(IV) + LEN(HMAC) + LEN(DATA) = 16 + 32 + 64 = 112 bytes.
>
> Probably you meant LEN(DATA) is 32? DATA will be an encryption key for
> AES256 (master key) internally generated.

No it should be 64-bytes. That way we can have separate 32-byte
encryption key (for AES256) and 32-byte MAC key (for HMAC-SHA256).

While it's common to reuse the same 32-byte key for both AES256 and an
HMAC-SHA256 and there aren't any known issues with doing so, when
designing something from scratch it's more secure to use entirely
separate keys.

> > For the user facing piece, padding would be enabled to support arbitrary
> > input data lengths. That would make the output length grow by up to
> > 16-bytes (rounding the data length up to the AES block size) plus one
> > more byte if a version field is added.
>
> I think the length of padding also needs to be added to the output.
> Anyway, in the first version the same methods of wrapping/unwrapping a
> key are used for both internal use and the user-facing function. And the
> user input key needs to be a multiple of 16 bytes.

A separate length field does not need to be added as the
padding-enabled output will already include it at the end[1]. This
would be handled automatically by the OpenSSL encryption / decryption
operations (if it's enabled):

[1]: https://en.wikipedia.org/wiki/Padding_(cryptography)#PKCS#5_and_PKCS#7

> BTW I think this topic is better to be discussed on a separate thread
> as the scope no longer includes TDE. I'll start a new thread for
> introducing internal KMS.

Good idea.

Regards,
-- Sehrope Sarkuni
Founder & CEO | JackDB, Inc. | https://www.jackdb.com/




Re: widen vacuum buffer counters

2020-02-01 Thread Tom Lane
Michael Paquier  writes:
> On Fri, Jan 31, 2020 at 05:13:53PM -0500, Tom Lane wrote:
>> Also, %zd is the wrong format code for int64.  Recommended practice
>> these days is to use "%lld" with an explicit cast of the printf argument
>> to long long (just to be sure).  That doesn't work safely before v12,
>> and if you did insist on back-patching further, you'd need to jump
>> through hoops to avoid having platform-specific format codes in a
>> translatable string.  (The side-effects for translation seem like
>> an independent argument against back-patching.)

> Surely you meant INT64_FORMAT here?

No, because that varies depending on platform, so using it in a
translatable string is a bad idea.  See e.g. 6a1cd8b92.

> Anyway, looking at the patch,
> couldn't we just use uint64?

Yeah, I was wondering if those counters shouldn't be unsigned, too.
Probably doesn't matter once we widen them to 64 bits though.
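
For illustration, the message-formatting pattern recommended above would look
roughly like this once the counters are 64 bits wide (a sketch, not the patch
itself):

	ereport(elevel,
			(errmsg("buffer usage: %lld hits, %lld misses, %lld dirtied",
					(long long) VacuumPageHit,
					(long long) VacuumPageMiss,
					(long long) VacuumPageDirty)));

The explicit casts keep the format string free of platform-specific codes,
which is what matters for translation.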

regards, tom lane




Re: PATCH: add support for IN and @> in functional-dependency statistics use

2020-02-01 Thread Tomas Vondra

On Sat, Feb 01, 2020 at 08:51:04AM +0100, Pierre Ducroquet wrote:

Hello

At my current job, we have a lot of multi-tenant databases, thus with tables
containing a tenant_id column. Such a column introduces a severe bias in
statistics estimation since any other FK in the next columns is very likely to
have a functional dependency on the tenant id. We found several queries where
this functional dependency messed up the estimations so much the optimizer
chose wrong plans.
When we tried to use extended statistics with CREATE STATISTICS on tenant_id,
other_id, we noticed that the current implementation for detecting functional
dependency lacks two features (at least in our use case):
- support for IN clauses
- support for the array contains operator (that could be considered as a
special case of IN)



Thanks for the patch. I don't think the lack of support for these clause
types is an oversight - we haven't done them because we were not quite
sure the functional dependencies can actually apply to them. But maybe
we can support them, I'm not against that in principle.


After digging in the source code, I think the lack of support for IN clauses
is an oversight, due to the fact that IN clauses are ScalarArrayOpExpr
instead of OpExpr. The attached patch fixes this by simply copying the
code path for OpExpr and changing the type name. It compiles and the results are
correct, with the dependency being correctly taken into consideration when
estimating rows. If you think such a copy-paste is bad and should be factored
into another static bool function, please say so and I will happily provide an
updated patch.


Hmmm. Consider a query like this:

  ... WHERE tenant_id = 1 AND another_column IN (2,3)

which kinda contradicts the idea of functional dependencies that knowing
a value in one column, tells us something about a value in a second
column. But that assumes a single value, which is not quite true here.

The above query is essentially the same thing as

  ... WHERE (tenant_id=1 AND (another_column=2 OR another_column=3))

and also

  ... WHERE (tenant_id=1 AND another_column=2)
 OR (tenant_id=1 AND another_column=3)

at which point we could apply functional dependencies - but we'd do it
once for each AND-clause, and then combine the results to compute
selectivity for the OR clause.

But this means that if we apply functional dependencies directly to the
original clause, it'll be inconsistent. Which seems a bit unfortunate.

Or do I get this wrong?

BTW the code added in the 0001 patch is the same as for is_opclause, so
maybe we can simply do

if (is_opclause(rinfo->clause) ||
IsA(rinfo->clause, ScalarArrayOpExpr))
{
...
}

instead of just duplicating the code. We also need at least some
regression tests, testing functional dependencies with this clause type.


The lack of support for the @> operator, on the other hand, seems to be a decision
taken when writing the initial code, but I cannot find any mathematical or
technical reason for it. The current code restricts dependency calculation to
the = operator, obviously because inequality operators are not going to
work... but array contains is just several = operators grouped, thus the same
for the dependency calculation. The second patch refactors the operator check
in order to also include array contains.



No concrete opinion on this yet. I think my concerns are pretty similar
to the IN clause, although I'm not sure what you mean by "this could be
considered as special case of IN".




I tested the patches on current HEAD, but I can test and provide back-ported
versions of the patch for other versions if needed (this code path hardly
changed since its introduction in 10).


I think the chance of this getting backpatched is zero, because it might
easily break existing apps.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: psql - add SHOW_ALL_RESULTS option

2020-02-01 Thread Tomas Vondra

This patch was marked as ready for committer, but clearly there's an
ongoing discussion about what the default behavior should be, whether this
breaks existing apps, etc. So I've marked it as "needs review" and moved
it to the next CF.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Global shared meta cache

2020-02-01 Thread Tomas Vondra

This patch was broken and waiting for author since early December, so
I've marked it as returned with feedback. Feel free to resubmit an
updated version to a future commitfest.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Global temporary tables

2020-02-01 Thread Tomas Vondra

Hi,

this patch was marked as waiting on author since the beginning of the
CF, most likely because it no longer applies (not sure). As there has
been very little activity since then, I've marked it as returned with
feedback. Feel free to re-submit an updated patch for 2020-03.

This definitely does not mean the feature is not desirable, but my
feeling is most of the discussion happens on the other thread dealing
with global temp tables [1] so maybe we should keep just that one and
combine the efforts.

[1] https://commitfest.postgresql.org/26/2349/

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [proposal] recovery_target "latest"

2020-02-01 Thread Tomas Vondra

Hi,

this patch was waiting on author without any update/response since early
December, so I've marked it as returned with feedback. Feel free to
re-submit an updated version to a future CF.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Speed up transaction completion faster after many relations are accessed in a transaction

2020-02-01 Thread Tomas Vondra

Hi, the patch was in WoA since December, waiting for a rebase. I've
marked it as returned with feedback. Feel free to re-submit an updated
version into the next CF.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [Proposal] Add accumulated statistics for wait event

2020-02-01 Thread Pavel Stehule
On Sat, 1 Feb 2020 at 12:34, Tomas Vondra
wrote:

> This patch was in WoA, but that was wrong I think - we got a patch on
> January 15, followed by a benchmark by Pavel Stehule, so I think it
> should still be in "needs review". So I've updated it and moved it to
> the next CF.
>

currently this patch needs a rebase

Pavel


>
> regards
>
>
> --
> Tomas Vondra  http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


Re: [Proposal] Add accumulated statistics for wait event

2020-02-01 Thread Tomas Vondra

This patch was in WoA, but that was wrong I think - we got a patch on
January 15, followed by a benchmark by Pavel Stehule, so I think it
should still be in "needs review". So I've updated it and moved it to
the next CF.


regards


--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2020-02-01 Thread Tomas Vondra

On Fri, Nov 29, 2019 at 09:39:09AM +0100, Julien Rouhaud wrote:

On Fri, Nov 29, 2019 at 7:21 AM Michael Paquier  wrote:


On Fri, Nov 29, 2019 at 03:19:49PM +0900, Michael Paquier wrote:
> On Wed, Nov 13, 2019 at 12:53:09PM +0100, Julien Rouhaud wrote:
>> I'd really like to have the queryid function available through SQL,
>> but I think that this specific case wouldn't work very well for
>> pg_stat_statements' approach as it's working with oid.  The query
>> string in pg_stat_activity is the user provided one rather than a
>> fully-qualified version, so in order to get that query's queryid, you
>> need to know the exact search_path in use in that backend, and that's
>> not something available.
>
> Yeah..  So, we have a patch marked as ready for committer here, and it
> seems to me that we have a couple of issues to discuss more about
> first particularly this query ID of 0.  Again, do others have more
> any input to offer?


I just realized that with the current infrastructure it's not possible to
display a utility queryid.  We need to recognize utility statements to avoid
processing the counters twice (once in processUtility, once in the
underlying executor), so we don't provide a queryid for utility
statements in parse analysis.  The current magic value 0 has the side
effect of showing an invalid queryid for all utility statements, and
using a magic value different from 0 will just always display that
magic value.  We could instead add another field to the Query and
PlannedStmt structs, say "int queryid_flags", that extensions could
use for their needs?


And while on it, the latest patch does not apply, so a rebase is
needed here.


Yep, I noticed that this morning.  I already rebased the patch
locally, I'll send a new version with new modifications when we reach
an agreement on the utility issue.



Well, this patch was in WoA since November, but now that I look at it
that might have been wrong - we're clearly waiting for agreement on how
to handle queryid for utility commands. I suspect the WoA status might
have been driving people away from this thread :-(

I've switched the patch to "needs review" and moved it to the next CF.
What I think needs to happen is we get a patch implementing one of the
proposed solutions, and discuss that.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Copy data to DSA area

2020-02-01 Thread Tomas Vondra

I've marked this patch as returned with feedback. It's been sitting in
the CF for a long time without any update (and does not apply anymore).
That does not mean we don't want the feature (it seems useful) so feel
free to resubmit an updated patch.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: enhance SPI to support EXECUTE commands

2020-02-01 Thread Tomas Vondra

I've marked this patch as returned with feedback. It's been sitting in
the CF without any response from the author since September, and it's
not quite clear we actually want/need this feature. If needed, the patch
can be resubmitted for 2020-03.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: fix for BUG #3720: wrong results at using ltree

2020-02-01 Thread Tomas Vondra

Hi,

I've moved this patch to the next CF - it's still in WoA state, but it's
supposedly a bugfix so I've decided not to return it with feedback.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: pgbench - add pseudo-random permutation function

2020-02-01 Thread Fabien COELHO


Hello Alvaro,


I read the whole thread, I still don't know what this patch is supposed to
do.  I know what the words in the subject line mean, but I don't know how
this helps a pgbench user run better benchmarks.  I feel this is also the
sentiment expressed by others earlier in the thread.  You indicated that
this functionality makes sense to those who want this functionality, but so
far only two people, namely the patch author and the reviewer, have
participated in the discussion on the substance of this patch.  So either
the feature is extremely niche, or nobody understands it.  I think you ought
to take about three steps back and explain this in more basic terms, even
just in email at first so that we can then discuss what to put into the
documentation.


After re-reading one more time, it dawned on me that the point of this
is similar in spirit to this one:
https://wiki.postgresql.org/wiki/Pseudo_encrypt


Indeed. The one in the wiki is useless because it is on all integers, 
whereas in a benchmark you want it for a given size and you want seeding, 
but otherwise the same correlation-avoidance problem is addressed.
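
To give the flavour of it, a toy version of such a sized, seeded permutation
could look like the following (a sketch for illustration only -- the patch's
actual algorithm differs): a few Feistel-like rounds over the smallest
power-of-two domain covering the size, with cycle-walking to stay inside
[0, size):

static uint64
toy_pr_perm(uint64 v, uint64 size, uint64 seed)
{
	int			bits = 1;
	uint64		mask;

	while (((uint64) 1 << bits) < size)
		bits++;
	if (bits % 2)				/* need an even number of bits to split */
		bits++;
	mask = ((uint64) 1 << (bits / 2)) - 1;

	v = v % size;				/* values outside are taken modulo the size */
	do
	{
		uint64		left = v >> (bits / 2);
		uint64		right = v & mask;

		for (int round = 0; round < 4; round++)
		{
			/* cheap invertible round; any deterministic mix would do */
			uint64		f = (right * UINT64CONST(0x9E3779B97F4A7C15) + seed + round) & mask;
			uint64		tmp = right;

			right = left ^ f;
			left = tmp;
		}
		v = (left << (bits / 2)) | right;
	} while (v >= size);		/* walk the cycle back into [0, size) */

	return v;
}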



The idea seems to be to map the int4 domain into itself, so you can use
a sequence to generate numbers that will not look like a sequence,
allowing the user to hide some properties (such as the generation rate)
that might be useful to an eavesdropper/attacker.  In terms of writing
benchmarks, it seems useful to destroy all locality of access, which
changes the benchmark completely.


Yes.

(I'm not sure if this is something benchmark writers really want to 
have.)


I do not get this sentence. I'm sure that a benchmark writer should really 
want to avoid unrealistic correlations that have a performance impact.



If I'm right, then I agree that the documentation provided with the
patch does a pretty bad job at explaining it, because until now I didn't
at all realize this is what it was.


The documentation is improvable, no doubt.

Attached is an attempt at improving things. I have added an explicit note
and hijacked an existing example to better illustrate the purpose of the
function.


--
Fabien.

diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 4c48a58ed2..d2344b029b 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -998,7 +998,7 @@ pgbench  options  d
 
   
 default_seed 
-   seed used in hash functions by default
+   seed used in hash and pseudo-random permutation functions by default
   
 
   
@@ -1494,6 +1494,13 @@ SELECT 2 AS two, 3 AS three \gset p_
pow(2.0, 10), power(2.0, 10)
1024.0
   
+  
+   pr_perm(i, size [, seed ] )
+   integer
+   pseudo-random permutation in [0,size)
+   pr_perm(0, 4)
+   0, 1, 2 or 3
+  
   
random(lb, ub)
integer
@@ -1545,6 +1552,20 @@ SELECT 2 AS two, 3 AS three \gset p_
 shape of the distribution.

 
+   
+
+  When designing a benchmark which selects rows non-uniformly, be aware
+  that using pgbench non-uniform random functions
+  directly leads to unduly correlations: rows with close ids come out with
+  similar probability, skewing performance measures because they also reside
+  in close pages.
+
+
+  You must use pr_perm to shuffle the selected ids, or
+  some other additional step with similar effect, to avoid these skewing impacts.
+
+   
+

 
  
@@ -1639,24 +1660,54 @@ f(x) = PHI(2.0 * parameter * (x - mu) / (max - min + 1)) /
 hash_fnv1a accept an input value and an optional seed parameter.
 In case the seed isn't provided the value of :default_seed
 is used, which is initialized randomly unless set by the command-line
--D option. Hash functions can be used to scatter the
-distribution of random functions such as random_zipfian or
-random_exponential. For instance, the following pgbench
-script simulates possible real world workload typical for social media and
+-D option.
+  
+
+  
+Function pr_perm implements a pseudo-random permutation
+on an arbitrary size.
+It allows to shuffle output of non uniform random functions so that 
+values drawn more often are not trivially correlated.
+It permutes integers in [0, size) using a seed by applying rounds of
+simple invertible functions, similarly to an encryption function,
+although beware that it is not at all cryptographically secure.
+Compared to hash functions discussed above, the function
+ensures that a perfect permutation is applied: there are neither collisions
+nor holes in the output values.
+Values outside the interval are interpreted modulo the size.
+The function raises an error if size is not positive.
+If no seed is provided, :default_seed is used.
+
+For instance, the following pgbench script
+simulates possible real world workload typical for social media and
 

Re: polymorphic table functions light

2020-02-01 Thread Dent John
> On 24 Jan 2020, at 08:27, Peter Eisentraut  
> wrote:
> 
> I definitely want to make it work in a way that does not require writing C 
> code.  My idea was to create a new type, perhaps called "descriptor", that 
> represents essentially a tuple descriptor.  (It could be exactly a TupleDesc, 
> as this patch does, or something similar.)  For the sake of discussion, we 
> could use JSON as the text representation of this.  Then a PL/pgSQL function 
> or something else high level could easily be written to assemble this.  
> Interesting use cases are for example in the area of using PL/Perl or 
> PL/Python for unpacking some serialization format using existing modules in 
> those languages.

I do think it’s very desirable to make it usable outside of C code.

> Obviously, there is a lot of leg work to be done between here and there, but 
> it seems doable.  The purpose of this initial patch submission was to get 
> some opinions on the basic idea of "determine result tuple structure by 
> calling helper function at parse time", and so far no one has fallen off 
> their chair from that, so I'm encouraged. ;-)

I’m interested in this development, as it makes RECORD-returning SRFs in the 
SELECT list a viable proposition, and that in turn allows a ValuePerCall SRF to 
get meaningful benefit from pipelining. (They could always pipeline, but there 
is no way to extract information from the RECORD that’s returned, with the sole 
exception of row_to_json.)

I couldn’t verify that it would work, though, because I couldn’t apply the v2 
(or v1) patch against either 12.0 or 530609a (which I think was sometime around 
25th Jan). Against 12.0, I got a few rejections (prepjointree.c and clauses.c). 
I figured they might be inconsequential, but no: initdb then fails at CREATE 
VIEW pg_policies. Different rejections against 530609a, but still initdb fails.

But I’m definitely very much encouraged.

denty.



Re: widen vacuum buffer counters

2020-02-01 Thread Michael Paquier
On Fri, Jan 31, 2020 at 05:13:53PM -0500, Tom Lane wrote:
> +1 for widening these counters, but since they're global variables, -0.2
> or so for back-patching.  I don't know of any reason that an extension
> would be touching these, but I feel like the problem isn't severe enough
> to justify taking an ABI-break risk.

I would not recommend doing a back-patch because of that.  I don't
think that's worth taking any risk.  Extension authors can have a lot
of imagination.

> Also, %zd is the wrong format code for int64.  Recommended practice
> these days is to use "%lld" with an explicit cast of the printf argument
> to long long (just to be sure).  That doesn't work safely before v12,
> and if you did insist on back-patching further, you'd need to jump
> through hoops to avoid having platform-specific format codes in a
> translatable string.  (The side-effects for translation seem like
> an independent argument against back-patching.)

Surely you meant INT64_FORMAT here?  Anyway, looking at the patch,
couldn't we just use uint64?
--
Michael


signature.asc
Description: PGP signature


Re: Prevent pg_basebackup running as root

2020-02-01 Thread Michael Paquier
On Thu, Jan 30, 2020 at 04:00:40PM +0900, Michael Paquier wrote:
> Anyway, your patch looks like a good idea to me, so let's see if
> others have opinions or objections about it.

Seeing nothing, committed v2.
--
Michael




Re: The flinfo->fn_extra question, from me this time.

2020-02-01 Thread Dent John
> On 28 Jan 2020, at 09:56, Thomas Munro  wrote:
> 
> ([…]  I have no
> idea what GUI interaction causes that, but most Apple Mail attachments
> seem to be fine.)

I gathered from the other thread that posting plain text seems to attach the 
patches in a way that’s more acceptable. Seems to work, but doesn’t explain 
exactly what the issue is, and I’m pretty sure I’ve not always had to go via 
the “make plain text” menu item before.

> Here's a quick rebase in case it helps.  I mostly applied fine (see
> below).  The conflicts were just Makefile and expected output files,
> which I tried to do the obvious thing with.  I had to add a #include
> "access/tupdesc.h" to plannodes.h to make something compile (because
> it uses TupleDesc).  Passes check-world here.

Thanks a lot for doing that. I tried it against 530609a, and indeed it seems to 
work.

I’m also watching the polymorphic table functions light thread[0], which at 
first glance also seems to make RECORD-returning SRFs useful in the SELECT 
list. It’s not doing what this patch does, but people might be happy enough to 
transform their queries into SELECT … FROM (SELECT fn(…)) to achieve 
pipelining, at least in the short term.

[0] 
https://www.postgresql.org/message-id/46a1cb32-e9c6-e7a8-f3c0-78e6b3f70...@2ndquadrant.com
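
(For archive readers: the transformation referred to above is roughly the
following, with fn standing in for any RECORD-returning SRF; the spelling is
illustrative and not taken from either patch.)

-- Today: a function scan in FROM materializes the SRF's output through a
-- tuplestore.
SELECT * FROM fn(42) AS t(a int, b int);

-- The short-term rewrite: call the SRF in a subquery's SELECT list, where it
-- is evaluated by ProjectSet and can stream its rows.
SELECT r FROM (SELECT fn(42) AS r) AS sub;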

denty.



Re: pgbench - add pseudo-random permutation function

2020-02-01 Thread Fabien COELHO



Hello Peter,


This patch was marked as RFC on 2019-03-30, but since then there have
been a couple more issues pointed out in a review by Thomas Munro, and
it went through 2019-09 and 2019-11 without any attention. Is the RFC
status still appropriate?


Thomas's review was about comment/documentation wording and asked for
explanations, which I think I addressed; the code did not actually
change, so I'm not sure that "needs review" is really warranted, but do
as you see fit.


I read the whole thread, I still don't know what this patch is supposed to 
do.  I know what the words in the subject line mean, but I don't know how 
this helps a pgbench user run better benchmarks.  I feel this is also the 
sentiment expressed by others earlier in the thread.  You indicated that this 
functionality makes sense to those who want this functionality, but so far 
only two people, namely the patch author and the reviewer, have participated 
in the discussion on the substance of this patch.  So either the feature is 
extremely niche, or nobody understands it.  I think you ought to take about 
three steps back and explain this in more basic terms, even just in email at 
first so that we can then discuss what to put into the documentation.


Here is the motivation for this feature:

When you design a benchmark to test performance, you want to avoid 
unwanted correlations which may unduly impact performance, one way or 
another. For the default pgbench benchmarks, accounts are chosen 
uniformly, so there is no problem. However, if you want to test a 
non-uniform distribution, which is what most people would encounter in 
practice, things are different.


For instance, you would replace random() with random_exponential() in the 
default benchmark scripts. If you do that, then all accounts with lower ids 
come out more often. However, this creates an unwanted correlation and 
performance effect: why should the frequently chosen accounts just happen 
to be those with small ids, which also happen to reside together in the 
first few pages of the table? To avoid these effects, you need to shuffle 
the chosen account ids, so that the frequent accounts are not specifically 
those at the beginning of the table.


Basically, as soon as you have a non-uniform random selection step, you 
want to add some shuffling, otherwise you are going to skew your benchmark 
measurements. Nobody should use a non-uniform generator for selecting rows 
without some shuffling, ever. I have no doubt that many people do, without 
realizing that they are skewing their performance data.
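
(For archive readers: a sketch of what such a script could look like, assuming
the permutation function proposed here is spelled pr_perm(i, size [, seed ]);
the function name, its signature, and the scale assumptions are illustrative,
so check the latest patch version for the exact spelling.)

-- :size is assumed to match the number of pgbench_accounts rows
-- (e.g. scale factor 10).
\set size 1000000
-- Non-uniform choice: small values of :r come out much more often.
\set r random_exponential(1, :size, 2.5)
-- Naive variant, skewed towards the first heap pages:
-- \set aid :r
-- Seeded permutation: hot ids are spread over the whole table instead.
\set aid 1 + pr_perm(:r - 1, :size)
SELECT abalance FROM pgbench_accounts WHERE aid = :aid;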


Once you realize that, you can try to invent your own shuffling, but 
frankly it is not as easy as it looks to come up with something non-trivial 
which does not generate unwanted correlations. I went through a lot of 
(very) bad designs before coming up with the one in the patch: you want 
something fast and you want good scrambling, which are contradictory 
objectives. There is also the question of being able to change the 
shuffling for a given size, for instance to check that there are no 
unwanted effects, hence the seeding. Good seeded shuffling is what an 
encryption algorithm does, but for those it is somewhat simpler because 
they usually work on power-of-two sizes.


In conclusion, a seeded shuffle step is needed as soon as you start using 
non-uniform random generators. Providing one in pgbench, a tool designed 
to run possibly non-uniform benchmarks against postgres, looks like common 
courtesy. Not providing one helps people design bad benchmarks, possibly 
without realizing it, which is outright thoughtlessness.


I have no doubt that the documentation should be improved to explain that 
in a concise but clear way.


--
Fabien.




PATCH: add support for IN and @> in functional-dependency statistics use

2020-02-01 Thread Pierre Ducroquet
Hello

At my current job, we have a lot of multi-tenant databases, and thus tables 
containing a tenant_id column. Such a column introduces a severe bias in 
statistics estimation, since any other FK column in the same table is very 
likely to have a functional dependency on the tenant id. We found several 
queries where this functional dependency messed up the estimations so much 
that the optimizer chose wrong plans.
When we tried to use extended statistics with CREATE STATISTICS on tenant_id, 
other_id, we noticed that the current implementation for detecting functional 
dependencies lacks two features (at least in our use case):
- support for IN clauses
- support for the array containment operator @> (which could be considered a 
special case of IN)
(A sketch of the kind of schema and clauses involved follows.)
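
(All names and values below are illustrative, not taken from the patches.)

CREATE TABLE documents (
    tenant_id int,
    folder_id int,    -- functionally dependent on tenant_id
    tag_ids   int[]   -- tags also belong to a single tenant
);
CREATE STATISTICS documents_deps (dependencies)
    ON tenant_id, folder_id, tag_ids FROM documents;
-- ... load multi-tenant data ...
ANALYZE documents;

-- Already recognized today: plain equality clauses on the covered columns.
EXPLAIN SELECT * FROM documents WHERE tenant_id = 42 AND folder_id = 4205;

-- Patch 1 aims to also recognize IN lists (ScalarArrayOpExpr):
EXPLAIN SELECT * FROM documents
WHERE tenant_id = 42 AND folder_id IN (4201, 4205);

-- Patch 2 aims to treat array containment like equality in the same check:
EXPLAIN SELECT * FROM documents
WHERE tenant_id = 42 AND tag_ids @> ARRAY[420001];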

After digging in the source code, I think the lack of support for IN clauses 
is an oversight, due to the fact that IN clauses are ScalarArrayOpExpr nodes 
instead of OpExpr. The attached patch fixes this by simply copying the code 
path for OpExpr and changing the node type. It compiles and the results are 
correct, with the dependency being correctly taken into consideration when 
estimating rows. If you think such a copy-paste is bad and should be factored 
out into a separate static bool function, please say so and I will happily 
provide an updated patch.
The lack of support for the @> operator, on the other hand, seems to be a 
decision taken when writing the initial code, but I cannot find any 
mathematical or technical reason for it. The current code restricts the 
dependency calculation to the = operator, obviously because inequality 
operators are not going to work... but array containment is just several = 
operators grouped together, and thus behaves the same for the dependency 
calculation. The second patch refactors the operator check so that it also 
accepts array containment.

I tested the patches on current HEAD, but I can test and provide back-ported 
versions for other branches if needed (this code path has hardly changed since 
its introduction in PostgreSQL 10).

Best regards

 Pierre Ducroquet
>From d2b89748e1b520c276875538149eb466e97c21b4 Mon Sep 17 00:00:00 2001
From: Pierre 
Date: Fri, 31 Jan 2020 23:58:35 +0100
Subject: [PATCH 1/2] Add support for IN clauses in dependencies check

---
 src/backend/statistics/dependencies.c | 34 +++
 1 file changed, 34 insertions(+)

diff --git a/src/backend/statistics/dependencies.c b/src/backend/statistics/dependencies.c
index e2f6c5bb97..d4844a9ec6 100644
--- a/src/backend/statistics/dependencies.c
+++ b/src/backend/statistics/dependencies.c
@@ -801,6 +801,40 @@ dependency_is_compatible_clause(Node *clause, Index relid, AttrNumber *attnum)
 
 		/* OK to proceed with checking "var" */
 	}
+	else if (IsA(rinfo->clause, ScalarArrayOpExpr))
+	{
+		/* If it's a ScalarArrayOpExpr, check for "Var IN (Const, ...)". */
+		ScalarArrayOpExpr	   *expr = (ScalarArrayOpExpr *) rinfo->clause;
+
+		/* Only expressions with two arguments are candidates. */
+		if (list_length(expr->args) != 2)
+			return false;
+
+		/* Make sure non-selected argument is a pseudoconstant. */
+		if (is_pseudo_constant_clause(lsecond(expr->args)))
+			var = linitial(expr->args);
+		else if (is_pseudo_constant_clause(linitial(expr->args)))
+			var = lsecond(expr->args);
+		else
+			return false;
+
+		/*
+		 * If it's not an "=" operator, just ignore the clause, as it's not
+		 * compatible with functional dependencies.
+		 *
+		 * This uses the function for estimating selectivity, not the operator
+		 * directly (a bit awkward, but well ...).
+		 *
+		 * XXX this is pretty dubious; probably it'd be better to check btree
+		 * or hash opclass membership, so as not to be fooled by custom
+		 * selectivity functions, and to be more consistent with decisions
+		 * elsewhere in the planner.
+		 */
+		if (get_oprrest(expr->opno) != F_EQSEL)
+			return false;
+
+		/* OK to proceed with checking "var" */
+	}
 	else if (is_notclause(rinfo->clause))
 	{
 		/*
-- 
2.24.1

>From 02363c52d0d03f58b8e79088a7261a9fc1e3bbb7 Mon Sep 17 00:00:00 2001
From: Pierre 
Date: Fri, 31 Jan 2020 23:58:55 +0100
Subject: [PATCH 2/2] Add support for array contains in dependency check

---
 src/backend/statistics/dependencies.c | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/src/backend/statistics/dependencies.c b/src/backend/statistics/dependencies.c
index d4844a9ec6..c3f37a474b 100644
--- a/src/backend/statistics/dependencies.c
+++ b/src/backend/statistics/dependencies.c
@@ -785,7 +785,7 @@ dependency_is_compatible_clause(Node *clause, Index relid, AttrNumber *attnum)
 			return false;
 
 		/*
-		 * If it's not an "=" operator, just ignore the clause, as it's not
+		 * If it's not an "=" or "@>" operator, just ignore the clause, as it's not
 		 * compatible with functional dependencies.
 		 *
 		 * This uses the function for estimating selectivity, not the operator
@@ -796,8 +796,13 @@ dependency_is_compatible_clause(Node *clause, Index relid, AttrNumber *attnum)
 		 * selectivity