Re: Internal key management system

2020-03-20 Thread Masahiko Sawada
On Sat, 21 Mar 2020 at 05:30, Bruce Momjian  wrote:
>
> On Thu, Mar 19, 2020 at 09:33:09PM +0900, Masahiko Sawada wrote:
> > Attached updated version patch. This patch incorporated the comments
> > and changed pg_upgrade so that we take over the master encryption key
> > from the old cluster to the new one if both enable key management.
>
> We had a crypto team meeting today, and came away with a few ideas:
>
> We should create an SQL-level master key that is different from the
> block-level master key.  By using separate keys, and not deriving them
> from a single key, they keys can be rotated and migrated to a different
> cluster independently.  For example, users might want to create a new
> cluster with a new block-level key, but might want to copy the SQL-level
> key from the old cluster to the new cluster.  Both keys would be
> unlocked with the same passphrase.

I've updated the patch according to yesterday's meeting. As Bruce
describes above, the current patch has two encryption keys. Previously we
kept the master key in pg_control, but since that would exceed the safe
size limit of pg_control, I moved the two keys to a dedicated file
located at global/pg_key. A wrapped key is 128 bytes, and the total size
including two wrapped keys came to 552 bytes, while the safe limit is
512 bytes.

During pg_upgrade we copy the key file from the old cluster to the new
one, so data wrapped on the old cluster can be unwrapped on the new
cluster.

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


kms_v8.patch
Description: Binary data


Re: plan cache overhead on plpgsql expression

2020-03-20 Thread Pavel Stehule
Hi

I did another test

I used a pi estimation algorithm; it is a little more realistic than an
almost empty loop body - though probably nobody would actually calculate
pi in plpgsql.

CREATE OR REPLACE FUNCTION pi_est(n int)
RETURNS numeric AS $$
DECLARE
  accum double precision DEFAULT 1.0;
  c1 double precision DEFAULT 2.0;
  c2 double precision DEFAULT 1.0;
  v constant double precision DEFAULT 2.0;
BEGIN
  FOR i IN 1..n
  LOOP
accum := accum * ((c1 * c1) / (c2 * (c2 + v)));
c1 := c1 + v;
c2 := c2 + v;
  END LOOP;
  RETURN accum * v;
END;
$$ LANGUAGE plpgsql;

For this code the patch increased the speed for 1000 iterations from 6.3
sec to 4.7 sec .. a speedup of about 25%.

The best performance (a 28% speedup) is with this code:

CREATE OR REPLACE FUNCTION pi_est_2(n int)
RETURNS numeric AS $$
DECLARE
  accum double precision DEFAULT 1.0;
  c1 double precision DEFAULT 2.0;
  c2 double precision DEFAULT 1.0;
BEGIN
  FOR i IN 1..n
  LOOP
accum := accum * ((c1 * c1) / (c2 * (c2 + double precision '2.0')));
c1 := c1 + double precision '2.0';
c2 := c2 + double precision '2.0';
  END LOOP;
  RETURN accum * double precision '2.0';
END;
$$ LANGUAGE plpgsql;

Unfortunately, for unoptimized code the performance is worse (about 55%
slower):

CREATE OR REPLACE FUNCTION pi_est_1(n int)
RETURNS numeric AS $$
DECLARE
  accum double precision DEFAULT 1.0;
  c1 double precision DEFAULT 2.0;
  c2 double precision DEFAULT 1.0;
BEGIN
  FOR i IN 1..n
  LOOP
accum := accum * ((c1 * c1) / (c2 * (c2 + 2.0)));
c1 := c1 + 2.0;
c2 := c2 + 2.0;
  END LOOP;
  RETURN accum * 2.0;
END;
$$ LANGUAGE plpgsql;

The same (bad) performance occurs with explicit casting:

CREATE OR REPLACE FUNCTION pi_est_3(n int)
RETURNS numeric AS $$
DECLARE
  accum double precision DEFAULT 1.0;
  c1 double precision DEFAULT 2.0;
  c2 double precision DEFAULT 1.0;
BEGIN
  FOR i IN 1..n
  LOOP
accum := accum * ((c1 * c1) / (c2 * (c2 + 2.0::double precision)));
c1 := c1 + 2.0::double precision;
c2 := c2 + 2.0::double precision;
  END LOOP;
  RETURN accum * double precision '2.0';
END;
$$ LANGUAGE plpgsql;

There is relatively high overhead in the cast from numeric
(init_var_from_num).

On master (without the patch) the speed of all double precision variants
is almost the same.

This example can be reduced to:

CREATE OR REPLACE FUNCTION public.fx(integer)
 RETURNS double precision
 LANGUAGE plpgsql
AS $function$
DECLARE
  result double precision DEFAULT 1.0;
BEGIN
  FOR i IN 1..$1
  LOOP
result := result * 1.01::double precision;
  END LOOP;
  RETURN result;
END;
$function$

CREATE OR REPLACE FUNCTION public.fx_1(integer)
 RETURNS double precision
 LANGUAGE plpgsql
AS $function$
DECLARE
  result double precision DEFAULT 1.0;
BEGIN
  FOR i IN 1..$1
  LOOP
result := result * 1.01;
  END LOOP;
  RETURN result;
END;
$function$

CREATE OR REPLACE FUNCTION public.fx_2(integer)
 RETURNS double precision
 LANGUAGE plpgsql
AS $function$
DECLARE
  result double precision DEFAULT 1.0;
BEGIN
  FOR i IN 1..$1
  LOOP
result := result * double precision '1.01';
  END LOOP;
  RETURN result;
END;
$function$

Patched: select fx(100) .. 400ms, fx_1 .. 400ms, fx_2 .. 126ms
Master:  select fx(100) .. 180ms, fx_1 .. 180ms, fx_2 .. 180ms

So the patch has a problem with constant casting - unfortunately, mixing
double precision variables with numeric constants is pretty common in
Postgres code.
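
The root cause is easy to see with pg_typeof(): a bare numeric literal in
plpgsql is of type numeric, so every loop iteration pays a numeric-to-double
cast:

    SELECT pg_typeof(1.01);                    -- numeric
    SELECT pg_typeof(double precision '1.01'); -- double precision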

Regards

Pavel


test.sql
Description: application/sql


Re: color by default

2020-03-20 Thread Jonah H. Harris
On Tue, Dec 31, 2019 at 8:35 AM Tom Lane  wrote:

> Peter Eisentraut  writes:
> > With the attached patch, I propose to enable the colored output by
> > default in PG13.
>
> FWIW, I shall be setting NO_COLOR permanently if this gets committed.
> I wonder how many people there are who actually *like* colored output?
> I find it to be invariably less readable than plain B&W text.
>

Same.


> I may well be in the minority, but I think some kind of straw poll
> might be advisable, rather than doing this just because.
>

+1 on no color by default.

-- 
Jonah H. Harris


Re: Why does [auto-]vacuum delay not report a wait event?

2020-03-20 Thread Justin Pryzby
On Thu, Mar 19, 2020 at 03:44:49PM -0700, Andres Freund wrote:
> But uh, unfortunately the vacuum delay code just sleeps without setting
> a wait event:
...
> Seems like it should instead use a new wait event in the PG_WAIT_TIMEOUT
> class?
> 
> Given how frequently we run into trouble with [auto]vacuum throttling
> being a problem, and there not being any way to monitor that currently,
> that seems like it'd be a significant improvement, given the effort?

I see that pg_sleep sets WAIT_EVENT_PG_SLEEP, but pg_usleep doesn't - I
guess because the overhead is significant for a small number of usecs, and
because it doesn't work for pg_usleep to set a generic event when callers
want to be able to set a more specific wait event.

Also, I noticed that the SLEEP_ON_ASSERT comment (31338352b) wants to use
pg_usleep, "which seems too short".  Surely it should use pg_sleep, added at
782eefc58 a few years later ?

Also, there was a suggestion recently that this should have a separate
vacuum_progress phase:
|vacuumlazy.c:#define VACUUM_TRUNCATE_LOCK_WAIT_INTERVAL50 /* ms */
|vacuumlazy.c:pg_usleep(VACUUM_TRUNCATE_LOCK_WAIT_INTERVAL * 1000L);

I was planning to look at that eventually; should it have a wait event
instead, or in addition ?

> It'd probably also be helpful to report the total time [auto]vacuum
> spent being delayed for vacuum verbose/autovacuum logging, but imo
> that'd be a parallel feature to a wait event, not a replacement.

This requires wider changes than I anticipated.

2020-03-20 22:35:51.308 CDT [16534] LOG:  automatic aggressive vacuum of table "template1.pg_catalog.pg_class": index scans: 1
	pages: 0 removed, 11 remain, 0 skipped due to pins, 0 skipped frozen
	tuples: 6 removed, 405 remain, 0 are dead but not yet removable, oldest xmin: 1574
	buffer usage: 76 hits, 7 misses, 8 dirtied
	avg read rate: 16.378 MB/s, avg write rate: 18.718 MB/s
	Cost-based delay: 2 msec
	system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s

VACUUM VERBOSE wouldn't normally be run with cost_delay > 0, so that field
will typically be zero, so I made it conditional - which is supposedly why
the code is written like that, even though that mechanism hasn't otherwise
been used since 17eaae98.
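
With the first patch applied, the throttling finally becomes observable
from another session; e.g. something like this (a sketch, using the
VacuumDelay event name added below):

    SELECT pid, wait_event_type, wait_event
    FROM pg_stat_activity
    WHERE wait_event = 'VacuumDelay';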

Added at https://commitfest.postgresql.org/28/2515/

-- 
Justin
>From 68c5ad8c7a9feb0c68afad310e3f52c21c3cdbaf Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Fri, 20 Mar 2020 20:47:30 -0500
Subject: [PATCH v1 1/2] Report wait event for cost-based vacuum delay

---
 doc/src/sgml/monitoring.sgml| 2 ++
 src/backend/commands/vacuum.c   | 2 ++
 src/backend/postmaster/pgstat.c | 3 +++
 src/include/pgstat.h| 3 ++-
 4 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 5bffdcce10..46c99a55b7 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1507,6 +1507,8 @@ postgres   27093  0.0  0.0  30096  2752 ?Ss   11:34   0:00 postgres: ser
   (pg_wal, archive or stream) before trying
   again to retrieve WAL data, at recovery.
  
+ VacuumDelay
+ Waiting in a cost-based vacuum delay point.
 
 
  IO
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index d625d17bf4..59731d687f 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -2019,7 +2019,9 @@ vacuum_delay_point(void)
 		if (msec > VacuumCostDelay * 4)
 			msec = VacuumCostDelay * 4;
 
+		pgstat_report_wait_start(WAIT_EVENT_VACUUM_DELAY);
 		pg_usleep((long) (msec * 1000));
+		pgstat_report_wait_end();
 
 		VacuumCostBalance = 0;
 
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index d29c211a76..742ec07b59 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3824,6 +3824,9 @@ pgstat_get_wait_timeout(WaitEventTimeout w)
 		case WAIT_EVENT_RECOVERY_RETRIEVE_RETRY_INTERVAL:
 			event_name = "RecoveryRetrieveRetryInterval";
 			break;
+		case WAIT_EVENT_VACUUM_DELAY:
+			event_name = "VacuumDelay";
+			break;
 			/* no default case, so that compiler will warn */
 	}
 
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 851d0a7246..4db40e23cc 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -848,7 +848,8 @@ typedef enum
 	WAIT_EVENT_BASE_BACKUP_THROTTLE = PG_WAIT_TIMEOUT,
 	WAIT_EVENT_PG_SLEEP,
 	WAIT_EVENT_RECOVERY_APPLY_DELAY,
-	WAIT_EVENT_RECOVERY_RETRIEVE_RETRY_INTERVAL
+	WAIT_EVENT_RECOVERY_RETRIEVE_RETRY_INTERVAL,
+	WAIT_EVENT_VACUUM_DELAY,
 } WaitEventTimeout;
 
 /* --
-- 
2.17.0

>From 8153d909cabb94474890f0b55c7733f33923e3c5 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Fri, 20 Mar 2020 22:08:09 -0500
Subject: [PATCH v1 2/2] vacuum to report time spent in cost-based delay

---
 src/backend/access/gin/ginfast.c  |  6 +++---
 src/backend/access/gin/ginvacuum.c|  6 +++---
 

Re: Add A Glossary

2020-03-20 Thread Corey Huinker
On Fri, Mar 20, 2020 at 6:32 PM Jürgen Purtz  wrote:

> man pages: Sorry, if I confused someone with my poor English. I just
> want to express in my 'offline' mail that we don't have to worry about
> man page generation. The patch doesn't affect files in the /ref
> subdirectory from where man pages are created.
>

It wasn't your poor English - everyone else understood what you meant. I
had wondered if our docs went into man page format as well, so my research
was still time well spent.


Re: color by default

2020-03-20 Thread Bruce Momjian
On Fri, Mar 20, 2020 at 11:15:07PM -0400, Tom Lane wrote:
> Bruce Momjian  writes:
> >> I can confirm there is still no mention of PG_COLORS in our
> >> documentation.
> 
> > My mistake, PG_COLOR (not PG_COLORS) is documented properly.
> 
> Yeah, but the point is precisely that pg_logging_init()
> also responds to PG_COLORS, which is not documented anywhere.

Oh, I thought it was a typo.  OK, so it's still an open item.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: color by default

2020-03-20 Thread Tom Lane
Bruce Momjian  writes:
>> I can confirm there is still no mention of PG_COLORS in our
>> documentation.

> My mistake, PG_COLOR (not PG_COLORS) is documented properly.

Yeah, but the point is precisely that pg_logging_init()
also responds to PG_COLORS, which is not documented anywhere.

regards, tom lane




Re: color by default

2020-03-20 Thread Bruce Momjian
On Thu, Mar 19, 2020 at 10:15:57PM -0400, Bruce Momjian wrote:
> On Tue, Mar  3, 2020 at 02:31:01PM +0900, Michael Paquier wrote:
> > On Mon, Mar 02, 2020 at 01:00:44PM +0100, Juan José Santamaría Flecha wrote:
> > > - The new entry in the documentation, specially as the PG_COLORS parameter
> > > seems to be currently undocumented. The programs that can use PG_COLOR
> > > would benefit from getting a link to it.
> > 
> > The actual problem here is that we don't have an actual centralized
> > place where we could put that stuff.  And anything able to use this
> > option is basically anything using src/common/logging.c.
> > 
> > Regarding PG_COLORS, the commit message of cc8d415 mentions it, but we
> > have no actual example of how to use it, and the original thread has
> > zero reference to it:
> > https://www.postgresql.org/message-id/6a609b43-4f57-7348-6480-bd022f924...@2ndquadrant.com
> > 
> > And in fact, it took me a while to figure out that using it is a mix
> > of three keywords ("error", "warning" or "locus") separated by colons
> > which need to have an equal sign to the color defined.  Here is for
> > example how to make the locus show up in yellow with errors in blue:
> > export PG_COLORS='error=01;34:locus=01;33'
> > 
> > Having to dig into the code to find out that stuff is not a good user
> > experience.  And I found out about that only because I worked on a
> > patch touching this area yesterday.
> 
> I can confirm there is still no mention of PG_COLORS in our
> documentation.

My mistake, PG_COLOR (not PG_COLORS) is documented properly.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: Add A Glossary

2020-03-20 Thread Justin Pryzby
On Fri, Mar 20, 2020 at 11:32:25PM +0100, Jürgen Purtz wrote:
> > > +   
> > > +File Segment
> > > +
> > > + 
> > > +   If a heap or index file grows in size over 1 GB, it will be split
> > 1GB is the default "segment size", which you should define.
> 
> ???

"A <> or other >>Relation<<" is larger than a >Cluster's< segment size
is stored in multiple physical files.  This avoids file size limitations which
vary across operating systems."

https://www.postgresql.org/docs/devel/runtime-config-preset.html

ts=# SELECT name, setting, unit, category, short_desc FROM pg_settings WHERE name~'block_size|segment_size';
       name       | setting  | unit |    category    |                  short_desc
------------------+----------+------+----------------+-----------------------------------------------
 block_size       | 8192     |      | Preset Options | Shows the size of a disk block.
 segment_size     | 131072   | 8kB  | Preset Options | Shows the number of pages per disk file.
 wal_block_size   | 8192     |      | Preset Options | Shows the block size in the write ahead log.
 wal_segment_size | 16777216 | B    | Preset Options | Shows the size of write ahead log segments.

> > > +   
> > > +Heap
> > > +
> > > + 
> > > +  Contains the original values of Row 
> > > attributes
> > I'm not sure what "original" means here ?
> 
Yes, this may be misleading. I wanted to express that values are stored in
the heap (the 'original') and possibly repeated as a key in an index.

Maybe "this is the content of rows/attributes in >>Tables<< or other 
>>Relations<<".
or "this is the data store for ..."

> > > +   
> > > +Host
> > > +
> > > + 
> > > +  See Server.
> > Or client.  Or proxy at some layer or other intermediate thing.  Maybe just
> > remove this.
> 
> Sometimes the term "host" is used in a different meaning. Therefor we shall
> have this glossary entry for clarification that it shall be used only in the
> sense of a "server".

I think that suggests just removing "host" and consistently saying "server".

-- 
Justin




Re: Add A Glossary

2020-03-20 Thread Jürgen Purtz



On 20.03.20 20:58, Justin Pryzby wrote:

On Thu, Mar 19, 2020 at 09:11:22PM -0300, Alvaro Herrera wrote:

+Aggregate
+
+ 
+  To combine a collection of data values into a single value, whose
+  value may not be of the same type as the original values.
+  Aggregate Functions
+  combine multiple Rows that share a common set
+  of values into one Row, which means that the
+  only data visible in the values in common, and the aggregates of the

IS the values in common ?
(or, "is the shared values")


+Analytic
+
+ 
+  A Function whose computed value can reference
+  values found in nearby Rows of the same
+  Result Set.
+Archiver

Can you change that to archiver process ?



I prefer the short term without the addition of 'process' - for 
'Archiver' as well as the other cases. But I'm not a native English 
speaker.




+Atomic

..

+ 
+  In reference to an operation: An event that cannot be completed in
+  part: it must either entirely succeed or entirely fail. A series of

Can you say: "an action which is not allowed to partially succed and then fail,
..."


+Autovacuum

Say autovacuum process ?


+
+ 
+  Processes that remove outdated MVCC

I would say "A set of processes that remove..."


+  Records of the Heap and

I'm not sure, can you say "tuples" ?



This concerns the upcoming MVCC terms. We need a linguistic distinction 
between the different versions of 'records' or 'tuples'. In my 
understanding the term 'tuple' is closer to a logical construct 
(relational algebra) and a 'record' to a concrete implementation on 
disk. Therefore I prefer 'record' in this context.




+Backend Process
+
+ 
+  Processes of an Instance which act on behalf of

Say DATABASE instance



-1: The term 'database' is overused. We should restrict it to a 
few cases.




+Backend Server
+
+ 
+  See Instance.

same


+Background Worker
+
+ 
+  Individual processes within an Instance, which

same


+  run system- or user-supplied code. Typical use cases are processes
+  which handle parts of an SQL query to take
+  advantage of parallel execution on servers with multiple
+  CPUs.

I would say "A typical use case is"

+1

+Background Writer

Add "process" ?


+
+ 
+  Writes continuously dirty pages from Shared

Say "Continuously writes"

+1

+  Memory to the file system. It starts periodically, but

Hm, maybe "wakes up periodically"

+1

+Cast
+
+ 
+  A conversion of a Datum from its current data
+  type to another data type.

maybe just say
A conversion of a Datum to another data type.


+Catalog
+
+ 
+  The SQL standard uses this standalone term to
+  indicate what is called a Database in
+  PostgreSQL's terminology.

Maybe remove "standalone" ?


+Checkpointer

Process


+  A process that writes dirty pages and WAL
+  Records to the file system and creates a special

Does the checkpointer actually write WAL ?



YES, not only WAL Writer.



+  checkpoint record. This process is initiated when predefined
+  conditions are met, such as a specified amount of time has passed, or
+  a certain volume of records have been collected.

collected or written?

I would say:

+  A checkpoint is usually initiated by
+  a specified amount of time having passed, or
+  a certain volume of records having been written.



+-0



+Checkpoint
+
+ 
+  A  Checkpoint is a point in time

Extra space


+   
+Connection
+
+ 
+  A TCP/IP or socket line for inter-process

I don't know if I've ever heard the phrase "socket line"
I guess you mean a unix socket.



+1



+Constraint
+
+ 
+  A concept of restricting the values of data allowed within a
+  Table.

Just say: "A restriction on the values..."?


+Data Area

Remove this ?  I've never heard this phrase before.



grep on *.sgml delivers 4 occurrences.



+
+ 
+  The base directory on the filesystem of a
+  Server that contains all data files and
+  subdirectories associated with a Cluster with
+  the exception of tablespaces. The environment variable

Should add an entry for "tablespace".



+1



+Datum
+
+ 
+  The internal representation of a SQL data type.

I'm not sure if we should use "a SQL" or "an SQL", but not both.



grep | wc delivers 106 occurrences of "an SQL" and 63 of "a SQL". It 
depends on how people pronounce the term SQL: "an es-cue-el" or "a sequel".




+Delete
+
+ 
+  A SQL command whose purpose is to remove

just say "which removes"



+1



+   
+File Segment
+
+ 
+   If a heap or index file grows in size over 1 GB, it will be split

1GB is the default "segment size", which you should define.



???



+   
+Foreign Data Wrapper
+
+ 
+  A means of representing data 

Re: Add A Glossary

2020-03-20 Thread Jürgen Purtz
man pages: Sorry if I confused someone with my poor English. I just 
wanted to express in my 'offline' mail that we don't have to worry about 
man page generation. The patch doesn't affect files in the /ref 
subdirectory, from which the man pages are created.


review process: Yes, it will be time-consuming, and it may be a hard 
job because a) the patch has multiple authors with divergent writing 
styles and b) the terms touch different fundamental areas: SQL basics 
and PG basics. Concerning PG basics, in the past we used a wide range of 
similar terms with different meanings, as well as different terms for the 
same thing - within our documentation as well as in secondary 
publications. The terms "backend server" / "instance" are one such 
example, and there should be a clear decision in favor of one of the two. 
Presumably we will see more discussions about which one is 
the preferred term (remember the discussion concerning the terms 
master/slave, primary/secondary some weeks ago).


ongoing: Intermediate questions for clarification are welcome.


Kind regards, Jürgen






Re: Should we add xid_current() or a int8->xid cast?

2020-03-20 Thread Thomas Munro
On Fri, Feb 14, 2020 at 6:31 PM Thomas Munro  wrote:
> On Wed, Jan 29, 2020 at 12:43 AM Fujii Masao  
> wrote:
> >   /*
> > - * do a TransactionId -> txid conversion for an XID near the given epoch
> > + * Do a TransactionId -> fxid conversion for an XID that is known to 
> > precede
> > + * the given 'next_fxid'.
> >*/
> > -static txid
> > -convert_xid(TransactionId xid, const TxidEpoch *state)
> > +static FullTransactionId
> > +convert_xid(TransactionId xid, FullTransactionId next_fxid)
> >
> > As the comment suggests, this function assumes that "xid" must
> > precede "next_fxid". But there is no check for the assumption.
> > Isn't it better to add, e.g., an assertion checking that?
> > Or convert_xid() should handle the case where "xid" follows
> > "next_fxid" like the orignal convert_xid() does. That is, don't
> > apply the following change.
> >
> > -   if (xid > state->last_xid &&
> > -   TransactionIdPrecedes(xid, state->last_xid))
> > +   if (xid > next_xid)
> > epoch--;
> > -   else if (xid < state->last_xid &&
> > -TransactionIdFollows(xid, state->last_xid))
> > -   epoch++;

I don't think it is reachable.  I have renamed the function to
widen_snapshot_xid() and rewritten the comments to explain the logic.

The other changes in this version:

* updated OIDs to avoid collisions
* added btequalimage to btree/xid8_ops
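
A quick smoke test of the new type, distilled from the regression tests the
patch adds (the input function and the comparison operator both come from
the catalog entries below):

    SELECT '42'::xid8;
    SELECT '42'::xid8 < '1000'::xid8;  -- plain 64-bit ordering, no wraparound logic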
From 7fc4387dae4395df111ac64aa28c66aebbb04dbb Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Tue, 28 Jan 2020 16:36:36 +1300
Subject: [PATCH v7 1/2] Add SQL type xid8 to expose FullTransactionId to
 users.

Similar to xid, but 64 bits wide.  This new type is suitable for use in
various system views and administration functions.

Reviewed-by: Fujii Masao 
Reviewed-by: Takao Fujii 
Reviewed-by: Yoshikazu Imai 
Reviewed-by: Mark Dilger 
Discussion: https://postgr.es/m/20190725000636.666m5mad25wfbrri%40alap3.anarazel.de
---
 doc/src/sgml/datatype.sgml   |   7 ++
 src/backend/access/hash/hashvalidate.c   |   3 +
 src/backend/utils/adt/xid.c  | 116 +++
 src/fe_utils/print.c |   1 +
 src/include/access/transam.h |  14 +++
 src/include/catalog/pg_amop.dat  |  22 
 src/include/catalog/pg_amproc.dat|   8 ++
 src/include/catalog/pg_cast.dat  |   4 +
 src/include/catalog/pg_opclass.dat   |   4 +
 src/include/catalog/pg_operator.dat  |  25 +
 src/include/catalog/pg_opfamily.dat  |   4 +
 src/include/catalog/pg_proc.dat  |  36 ++
 src/include/catalog/pg_type.dat  |   4 +
 src/include/utils/xid8.h |  22 
 src/test/regress/expected/opr_sanity.out |   7 ++
 src/test/regress/expected/xid.out| 136 +++
 src/test/regress/parallel_schedule   |   2 +-
 src/test/regress/serial_schedule |   1 +
 src/test/regress/sql/xid.sql |  48 
 19 files changed, 463 insertions(+), 1 deletion(-)
 create mode 100644 src/include/utils/xid8.h
 create mode 100644 src/test/regress/expected/xid.out
 create mode 100644 src/test/regress/sql/xid.sql

diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index 03971822c2..89f3a7c119 100644
--- a/doc/src/sgml/datatype.sgml
+++ b/doc/src/sgml/datatype.sgml
@@ -4516,6 +4516,10 @@ INSERT INTO mytable VALUES(-1);  -- fails
 regtype

 
+   
+xid8
+   
+

 cid

@@ -4719,6 +4723,9 @@ SELECT * FROM pg_attribute
 Another identifier type used by the system is xid, or transaction
 (abbreviated xact) identifier.  This is the data type of the system columns
 xmin and xmax.  Transaction identifiers are 32-bit quantities.
+In some contexts, a 64-bit variant xid8 is used.  Unlike
+xid values, xid8 values increase strictly
+monotonically and cannot be reused in the lifetime of a database cluster.

 

diff --git a/src/backend/access/hash/hashvalidate.c b/src/backend/access/hash/hashvalidate.c
index 6346e65865..2f2858021c 100644
--- a/src/backend/access/hash/hashvalidate.c
+++ b/src/backend/access/hash/hashvalidate.c
@@ -313,6 +313,9 @@ check_hash_func_signature(Oid funcid, int16 amprocnum, Oid argtype)
 			(argtype == DATEOID ||
 			 argtype == XIDOID || argtype == CIDOID))
 			 /* okay, allowed use of hashint4() */ ;
+		else if ((funcid == F_HASHINT8 || funcid == F_HASHINT8EXTENDED) &&
+			(argtype == XID8OID))
+			 /* okay, allowed use of hashint8() */ ;
 		else if ((funcid == F_TIMESTAMP_HASH ||
   funcid == F_TIMESTAMP_HASH_EXTENDED) &&
  argtype == TIMESTAMPTZOID)
diff --git a/src/backend/utils/adt/xid.c b/src/backend/utils/adt/xid.c
index db6fc9dd6b..20389aff1d 100644
--- a/src/backend/utils/adt/xid.c
+++ b/src/backend/utils/adt/xid.c
@@ -21,6 +21,7 @@
 #include "access/xact.h"
 #include "libpq/pqformat.h"
 #include "utils/builtins.h"
+#include "utils/xid8.h"
 
 #define 

GiST secondary split

2020-03-20 Thread Peter Griggs
I am hacking on some GiST code for a research project and wanted clarification
about what exactly a secondary split is in GiST. More specifically, I am
wondering why the supportSecondarySplit function (in
src/backend/access/gist/gistsplit.c) can assume that the data is currently
on the left side in order to swap it.

/*
* Clean up when we did a secondary split but the user-defined PickSplit
* method didn't support it (leaving spl_ldatum_exists or spl_rdatum_exists
* true).
*
* We consider whether to swap the left and right outputs of the secondary
* split; this can be worthwhile if the penalty for merging those tuples into
* the previously chosen sets is less that way.
*
* In any case we must update the union datums for the current column by
* adding in the previous union keys (oldL/oldR), since the user-defined
* PickSplit method didn't do so.
*/
static void
supportSecondarySplit(Relation r, GISTSTATE *giststate, int attno,
GIST_SPLITVEC *sv, Datum oldL, Datum oldR)
{

Best,
Peter

-- 
Peter Griggs
Masters of Engineering (MEng) in Computer Science
Massachusetts Institute of Technology | 2020


Re: Add FOREIGN to ALTER TABLE in pg_dump

2020-03-20 Thread Alvaro Herrera
On 2020-Mar-19, Daniel Gustafsson wrote:

> Moving this patch to Ready for Committer.

Thanks, pushed.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Internal key management system

2020-03-20 Thread Bruce Momjian
On Thu, Mar 19, 2020 at 09:33:09PM +0900, Masahiko Sawada wrote:
> Attached updated version patch. This patch incorporated the comments
> and changed pg_upgrade so that we take over the master encryption key
> from the old cluster to the new one if both enable key management.

We had a crypto team meeting today, and came away with a few ideas:

We should create an SQL-level master key that is different from the
block-level master key.  By using separate keys, and not deriving them
from a single key, they keys can be rotated and migrated to a different
cluster independently.  For example, users might want to create a new
cluster with a new block-level key, but might want to copy the SQL-level
key from the old cluster to the new cluster.  Both keys would be
unlocked with the same passphrase.

I was confused by how wrap/unwrap work.  Here is an example from
the proposed doc patch:

+
+=# SELECT pg_wrap('user sercret key');
+                                                                      pg_wrap
+------------------------------------------------------------------------------------------------------------------------------------------------
+ \xb2c89f76f04f95d029f179e0fc3df4ed7254127b5562a9e27d42d1cd037c942dea65ce7c0750c520fa4f4e90481c9eb7e1e42a068248c262c1a6f25c6eab64303b1154ccc9a14361223641aab4a7aabe
+(1 row)
+
+
+  
+   Once wrapping the user key, user can encrypt and decrypt user data using the
+   wrapped user key togehter with the key unwrap functions:
+  
+
+
+ =# INSERT INTO tbl
+    VALUES (pgp_sym_encrypt('secret data',
+            pg_unwrap('\xb2c89f76f04f95d029f179e0fc3df4ed7254127b5562a9e27d42d1cd037c942dea65ce7c0750c520fa4f4e90481c9eb7e1e42a068248c262c1a6f25c6eab64303b1154ccc9a14361223641aab4a7aabe')));
+ INSERT 1
+
+ =# SELECT * FROM tbl;
+                                                                   col
+------------------------------------------------------------------------------------------------------------------------------------------
+ \xc30d04070302a199ee38bea0320b75d23c01577bb3ffb315d67eecbeca3e40e869cea65efbf0b470f805549af905f94d94c447fbfb8113f585fc86b30c0bd784b10c9857322dc00d556aa8de14
+(1 row)
+
+ =# SELECT pgp_sym_decrypt(col,
+           pg_unwrap('\xb2c89f76f04f95d029f179e0fc3df4ed7254127b5562a9e27d42d1cd037c942dea65ce7c0750c520fa4f4e90481c9eb7e1e42a068248c262c1a6f25c6eab64303b1154ccc9a14361223641aab4a7aabe')) as col
+    FROM tbl;
+       col
+------------------
+ user secret data

All pg_wrap() does is take the user string, in this case 'user
sercret key', and encrypt it with the SQL-level master key.  It doesn't
mix the SQL-level master key into the output, which is what I originally
thought.  This means that the pg_unwrap() call above just returns 'user
sercret key'.

How would this be used?  Users would call pg_wrap() once and store the
result on the client.  The client could then use the output of pg_wrap()
in all future sessions, without exposing 'user sercret key', which is
the key used to encrypt user data.
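
A sketch of that flow in psql, assuming the pg_wrap()/pg_unwrap()
functions from the proposed patch:

	-- one-time setup: wrap the user key and store the result client-side
	SELECT pg_wrap('user sercret key') AS wrapped_key
	\gset
	-- later sessions send only the wrapped form
	INSERT INTO tbl VALUES (pgp_sym_encrypt('secret data', pg_unwrap(:'wrapped_key')));
	SELECT pgp_sym_decrypt(col, pg_unwrap(:'wrapped_key')) AS col FROM tbl;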

The passing of the parameter to pg_wrap() has to be done in a way that
doesn't permanently record the parameter anywhere, like in the logs. 
pgcryptokey (https://momjian.us/download/pgcryptokey/) has a method of
doing this.  This is how it passes the data encryption key without
making it visible in the logs, using psql:

	SELECT get_shared_key()
	\gset
	\set enc_access_password `echo 'my secret' | tr -d '\n' | openssl dgst -sha256 -binary | gpg2 --symmetric --batch --cipher-algo AES128 --passphrase :'get_shared_key' | xxd -plain | tr -d '\n'`
	SELECT set_session_access_password(:'enc_access_password');

Removing the sanity checks and user-interface simplicity, it is
internally doing this:

	SELECT set_config('pgcryptokey.shared_key',
			  encode(gen_random_bytes(32), 'hex'),
			  FALSE) AS get_shared_key
	\gset
	\set enc_access_password `echo 'my secret' | tr -d '\n' | openssl dgst -sha256 -binary | gpg2 --symmetric --batch --cipher-algo AES128 --passphrase :'get_shared_key' | xxd -plain | tr -d '\n'`
	SELECT set_config('pgcryptokey.access_password',
			  encode(pgp_sym_decrypt_bytea(decode(:'enc_access_password', 'hex'),
						       :'get_shared_key'),
				 'hex'),
			  FALSE) || NULL;

In English, what it does is: the server generates a random key, stores it
in a server-side variable, and sends it to the client.  The client
hashes a user-supplied key and encrypts it with the random key it 

Re: Add A Glossary

2020-03-20 Thread Justin Pryzby
On Thu, Mar 19, 2020 at 09:11:22PM -0300, Alvaro Herrera wrote:
> +Aggregate
> +
> + 
> +  To combine a collection of data values into a single value, whose
> +  value may not be of the same type as the original values.
> +  Aggregate Functions
> +  combine multiple Rows that share a common set
> +  of values into one Row, which means that the
> +  only data visible in the values in common, and the aggregates of the

IS the values in common ?
(or, "is the shared values")

> +Analytic
> +
> + 
> +  A Function whose computed value can reference
> +  values found in nearby Rows of the same
> +  Result Set.

> +Archiver

Can you change that to archiver process ?

> +Atomic
..
> + 
> +  In reference to an operation: An event that cannot be completed in
> +  part: it must either entirely succeed or entirely fail. A series of

Can you say: "an action which is not allowed to partially succed and then fail,
..."

> +Autovacuum

Say autovacuum process ?

> +
> + 
> +  Processes that remove outdated MVCC

I would say "A set of processes that remove..."

> +  Records of the Heap and

I'm not sure, can you say "tuples" ?

> +Backend Process
> +
> + 
> +  Processes of an Instance which act on behalf of

Say DATABASE instance

> +Backend Server
> +
> + 
> +  See Instance.
same

> +Background Worker
> +
> + 
> +  Individual processes within an Instance, which
same

> +  run system- or user-supplied code. Typical use cases are processes
> +  which handle parts of an SQL query to take
> +  advantage of parallel execution on servers with multiple
> +  CPUs.

I would say "A typical use case is"

> +Background Writer

Add "process" ?

> +
> + 
> +  Writes continuously dirty pages from Shared

Say "Continuously writes"

> +  Memory to the file system. It starts periodically, but

Hm, maybe "wakes up periodically"

> +Cast
> +
> + 
> +  A conversion of a Datum from its current data
> +  type to another data type.

maybe just say
A conversion of a Datum to another data type.

> +Catalog
> +
> + 
> +  The SQL standard uses this standalone term to
> +  indicate what is called a Database in
> +  PostgreSQL's terminology.

Maybe remove "standalone" ?

> +Checkpointer

Process

> +  A process that writes dirty pages and WAL
> +  Records to the file system and creates a special

Does the checkpointer actually write WAL ?

> +  checkpoint record. This process is initiated when predefined
> +  conditions are met, such as a specified amount of time has passed, or
> +  a certain volume of records have been collected.

collected or written?

I would say:
> +  A checkpoint is usually initiated by
> +  a specified amount of time having passed, or
> +  a certain volume of records having been written.

> +Checkpoint
> +
> + 
> +  A  Checkpoint is a point in time

Extra space

> +   
> +Connection
> +
> + 
> +  A TCP/IP or socket line for inter-process

I don't know if I've ever heard the phrase "socket line"
I guess you mean a unix socket.

> +Constraint
> +
> + 
> +  A concept of restricting the values of data allowed within a
> +  Table.

Just say: "A restriction on the values..."?

> +Data Area

Remove this ?  I've never heard this phrase before.

> +
> + 
> +  The base directory on the filesystem of a
> +  Server that contains all data files and
> +  subdirectories associated with a Cluster with
> +  the exception of tablespaces. The environment variable

Should add an entry for "tablespace".

> +Datum
> +
> + 
> +  The internal representation of a SQL data type.

I'm not sure if we should use "a SQL" or "an SQL", but not both.

> +Delete
> +
> + 
> +  A SQL command whose purpose is to remove

just say "which removes"

> +   
> +File Segment
> +
> + 
> +   If a heap or index file grows in size over 1 GB, it will be split

1GB is the default "segment size", which you should define.

> +   
> +Foreign Data Wrapper
> +
> + 
> +  A means of representing data that is not contained in the local
> +  Database as if were in local
> +  Table(s).

I'd say:

+ A means of representing data as a Table(s) even though
+ it is not contained in the local Database 


> +   
> +Foreign Key
> +
> + 
> +  A type of Constraint defined on one or more
> +  Columns in a Table which
> +  requires the value in those Columns to uniquely
> +  identify a Row in the specified
> +  Table.

An FK doesn't require the values in its table to be unique, right ?
I'd say something like: "..which enforces that the values in those Columns are
also present in an(other) table."
Reference Referential Integrity?

> +Function
> +
> + 
> +

Re: SQL/JSON: functions

2020-03-20 Thread Pavel Stehule
On Thu, 19 Mar 2020 at 23:57, Nikita Glukhov 
wrote:

> Attached 45th version of the patches.
>
> Nodes JsonFormat, JsonReturning, JsonPassing, JsonBehavior were fixed.
>
>
> On 17.03.2020 21:35, Pavel Stehule wrote:
>
>
> On Tue, 17 Mar 2020 at 1:55, Nikita Glukhov 
> wrote:
>
>> Attached 44th version of the patches.
>>
>>
>> On 12.03.2020 16:41, Pavel Stehule wrote:
>>
>>
>> On 12.03.2020 00:09 Nikita Glukhov wrote:
>>
>>> Attached 43rd version of the patches.
>>>
>>> The previous patch #4 ("Add function formats") was removed.
>>> Instead, introduced new executor node JsonCtorExpr which is used to wrap
>>> SQL/JSON constructor function calls (FuncExpr, Aggref, WindowFunc).
>>>
>>> Also, JsonIsPredicate node began to be used as executor node.
>>> This helped to remove unnecessary json[b]_is_valid() user functions.
>>>
>>>
>> It looks very well - all tests passed, code looks well.
>>
>> Now, when there are special nodes, then the introduction of
>> COERCE_INTERNAL_CAST is not necessary, and it is only my one and last
>> objection again this patch's set.
>>
>> I have removed patch #3 with COERCE_INTERNAL_CAST too.
>>
>> Coercions in JsonValueExpr in JsonCtorExpr, which were previously hidden with
>> COERCE_INTERNAL_CAST and which were outputted as RETURNING or FORMAT JSON
>> clauses, now moved into separate expressions.
>>
>> I am looking on the code, and although the code is correct it doesn't
> look well (consistently with other node types).
>
> I think so JsonFormat and JsonReturning should be node types, not just
> structs. If these types will be nodes, then you can simplify _readJsonExpr
> and all node operations on this node.
>
>
> JsonFormat and JsonReturning was transformed into nodes, and not included
> directly into other nodes now.
>
>
> User functions json[b]_build_object_ext() and json[b]_build_array_ext() also
>> can be easily removed.   But it seems harder to remove new aggregate 
>> functions
>> json[b]_objectagg() and json[b]_agg_strict(), because they can't be called
>> directly from JsonCtorExpr node.
>>
>>
> I don't see reasons for another reduction now. Can be great if you can
> finalize work what you plan for pg13.
>
>
> +<->READ_ENUM_FIELD(on_error.btype, JsonBehaviorType);
> +<->READ_NODE_FIELD(on_error.default_expr);
> +<->READ_ENUM_FIELD(on_empty.btype, JsonBehaviorType);
> +<->READ_NODE_FIELD(on_empty.default_expr);
>
> JsonBehavior is node type. Then why you don't write just
>
> READ_NODE_FIELD(on_error);
> READ_NODE_FIELD(on_empty)
>
> ??
>
> JsonBehavior now used in JsonExpr as a pointer to node.
>
>
> And maybe the code can be better if you don't use JsonPassing struct (or
> use it only inside gram.y) and pass just List *names, List *values.
>
> Nodes should to contains another nodes or scalar types. Using structs
> (that are not nodes)  inside doesn't look consistently.
>
> JsonPassing was replaced with two Lists.
>
>
> I found some not finished code in 0003 patch
> +
> +json_name_and_value:
> +/* TODO
> +<-><--><-->KEY c_expr VALUE_P json_value_expr %prec POSTFIXOP
> +<-><--><--><-->{ $$ = makeJsonKeyValue($2, $4); }
> +<-><--><-->|
> +*/
>
> This is unsupported variant of standard syntax, because it seems to lead
> to unresolvable conflicts.  The only supports syntax is:
>
> JSON_OBJECT(key : value)
> JSON_OBJECT(key VALUE value)
>
OK. So please change the comment from TODO to this explanation. Maybe also a
note in the docs (documenting unimplemented features can be a good idea).

Some of these unresolvable conflicts are solved with extra parentheses in
Postgres.


>
> The support for json type in jsonpath also seems to be immature, so I will try
>> to remove it in the next iteration.
>>
What do you think? This patch is a little bit off topic, so if you don't
> need it, it can be removed. Is there some dependency for "jsontable"?
>
> There is a dependency in SQL/JSON query functions executor, because executor
> uses new structure JsonItem instead of plain JsonbValue.  I will try to
> preserve refactoring with JsonItem introduction, but remove json support.
> If it will be still unacceptable, I will try to completely remove patch #1.
>
>
I have a much better feeling about version 45 (now it looks like Postgres C
:)). Still, there are some small issues.

1. commented code

+json_encoding:
+<-><--><-->name<--><--><--><--><--><--><--><--><-->{ $$ =
makeJsonEncoding($1); }
+<->/*
+<-><--><-->| UTF8<><--><--><--><--><--><--><--><-->{ $$ = JS_ENC_UTF8; }
+<-><--><-->| UTF16><--><--><--><--><--><--><--><-->{ $$ = JS_ENC_UTF16; }
+<-><--><-->| UTF32 <--><--><--><--><--><--><--><-->{ $$ = JS_ENC_UTF32; }
+<->*/
+<-><-->;

2. occasional useless empty lines

  silent boolean DEFAULT false)
+RETURNS text
+LANGUAGE INTERNAL
+STRICT IMMUTABLE PARALLEL SAFE
+AS 'jsonb_path_query_first_text';
+
+
+


+<-><-->if (!coerced)
+<-><-->{
+
+<-><--><-->/* If coercion failed, use to_json()/to_jsonb() functions. */


All tests passed; the build went through without any problem.

Looking forward to the next update.

Regards


Re: Planning counters in pg_stat_statements (using pgss_store)

2020-03-20 Thread Julien Rouhaud
On Fri, Mar 20, 2020 at 05:09:05PM +0300, Sergei Kornilov wrote:
> Hello
> 
> Yet another is missed in docs: total_time

Oh, good catch!  I rechecked the field many times but totally missed that the
documentation refers to the view, which has an additional column, and not
the function.  Attached v9 fixes that.

> I specifically verified that the new loaded library works with the old 
> version of the extension in the database. I have not noticed issues here.

Thanks for those extra checks.

> > 2.2% is a bit large, but I think it is still acceptable because people using
> > this feature might take into account that some overhead will happen for the
> > additional calls to a gettime function.
> 
> I will be happy even with 10% overhead due to enabled track_planning... (but 
> in this case disabled by default). log_min_duration_statement = 0 with log 
> parsing is much more expensive.
> I think 1-2% is acceptable and we can set track_planning = on by default as 
> the patch does.
> 
> > * Rows for executor time are changed from {total/min/max/mean/stddev}_time 
> > to {total/min/max/mean/stddev}_exec_time.
> 
> Maybe release it as 2.0 version instead of minor update 1.18?

I don't have an opinion on that, I'd be fine with any version.  I kept 1.18 in
the patch for now.
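
For anyone trying v9, the new counters can be compared side by side with
something like this (a sketch; the column names assume the plan/exec split
described above):

    SELECT query, calls, plans, total_plan_time, total_exec_time
    FROM pg_stat_statements
    ORDER BY total_exec_time DESC
    LIMIT 5;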
>From 16c888f7861300d08d21f3ebb97af4fad3ec1fa5 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud 
Date: Thu, 28 Mar 2019 13:33:23 +0100
Subject: [PATCH v9 1/2] Pass query string to the planner

---
 src/backend/commands/copy.c  |  3 ++-
 src/backend/commands/createas.c  |  3 ++-
 src/backend/commands/explain.c   |  2 +-
 src/backend/commands/extension.c |  2 +-
 src/backend/commands/matview.c   |  2 +-
 src/backend/commands/portalcmds.c|  2 +-
 src/backend/executor/functions.c |  1 +
 src/backend/optimizer/plan/planner.c | 10 ++
 src/backend/tcop/postgres.c  | 13 -
 src/backend/utils/cache/plancache.c  |  3 ++-
 src/include/optimizer/optimizer.h|  3 ++-
 src/include/optimizer/planner.h  |  4 +++-
 src/include/tcop/tcopprot.h  |  6 --
 13 files changed, 34 insertions(+), 20 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index fbde9f88e7..efb1e0d03e 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -1580,7 +1580,8 @@ BeginCopy(ParseState *pstate,
 		}
 
 		/* plan the query */
-		plan = pg_plan_query(query, CURSOR_OPT_PARALLEL_OK, NULL);
+		plan = pg_plan_query(query, pstate->p_sourcetext,
+			 CURSOR_OPT_PARALLEL_OK, NULL);
 
 		/*
 		 * With row level security and a user using "COPY relation TO", we
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 3a5676fb39..9febdc5165 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -329,7 +329,8 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
 		Assert(query->commandType == CMD_SELECT);
 
 		/* plan the query */
-		plan = pg_plan_query(query, CURSOR_OPT_PARALLEL_OK, params);
+		plan = pg_plan_query(query, pstate->p_sourcetext,
+			 CURSOR_OPT_PARALLEL_OK, params);
 
 		/*
 		 * Use a snapshot with an updated command ID to ensure this query sees
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index d901dc4a50..fb772aa5cd 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -375,7 +375,7 @@ ExplainOneQuery(Query *query, int cursorOptions,
 		INSTR_TIME_SET_CURRENT(planstart);
 
 		/* plan the query */
-		plan = pg_plan_query(query, cursorOptions, params);
+		plan = pg_plan_query(query, queryString, cursorOptions, params);
 
 		INSTR_TIME_SET_CURRENT(planduration);
 		INSTR_TIME_SUBTRACT(planduration, planstart);
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index 00cf4ef268..38cbea385a 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -751,7 +751,7 @@ execute_sql_string(const char *sql)
 		   NULL,
 		   0,
 		   NULL);
-		stmt_list = pg_plan_queries(stmt_list, CURSOR_OPT_PARALLEL_OK, NULL);
+		stmt_list = pg_plan_queries(stmt_list, sql, CURSOR_OPT_PARALLEL_OK, NULL);
 
 		foreach(lc2, stmt_list)
 		{
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index c3954f3e24..e5a5eef102 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -391,7 +391,7 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
 	CHECK_FOR_INTERRUPTS();
 
 	/* Plan the query which will generate data for the refresh. */
-	plan = pg_plan_query(query, 0, NULL);
+	plan = pg_plan_query(query, queryString, 0, NULL);
 
 	/*
 	 * Use a snapshot with an updated command ID to ensure this query sees
diff --git a/src/backend/commands/portalcmds.c b/src/backend/commands/portalcmds.c
index 40be5069fe..6a2c233615 100644
--- a/src/backend/commands/portalcmds.c
+++ 

Re: Add A Glossary

2020-03-20 Thread Corey Huinker
>
> It's hard to review work from a professional tech writer.  I'm under the
> constant impression that I'm ruining somebody's perfect end product,
> making a fool of myself.


If it makes you feel better, it's a mix of definitions I wrote that Roger
proofed and restructured, ones that Jürgen had written for a separate
effort which then got a Roger-pass, and then some edits of my own and some
by Jürgen which I merged without consulting Roger.


Re: Add A Glossary

2020-03-20 Thread Roger Harkavy
Alvaro, I know that you are joking, but I want to impress on everyone:
please don't feel like anyone here is breaking anything when it comes to
modifying the content and structure of this glossary.

I do have technical writing experience, but everyone else here is a subject
matter expert when it comes to the world of databases and how this one in
particular functions.

On Fri, Mar 20, 2020 at 1:51 PM Alvaro Herrera 
wrote:

> On 2020-Mar-20, Corey Huinker wrote:
>
> > > Jürgen mentioned off-list that the man page doesn't build. I was going
> to
> > > look into that, but if anyone has more familiarity with that, I'm
> listening.
>
> > Looking at this some more, I'm not sure anything needs to be done for man
> > pages.
>
> Yeah, I don't think he was saying that we needed to do anything to
> produce a glossary man page; rather that the "make man" command failed.
> I tried it here, and indeed it failed.  But on further investigation,
> after a "make maintainer-clean" it no longer failed.  I'm not sure what
> to make of it, but it seems that this patch needn't concern itself with
> that.
>
> I gave a read through the first few actual definitions.  It's a much
> slower work than I thought!  Attached you'll find the first few edits
> that I propose.
>
> Looking at the definition of "Aggregate" it seemed weird to have it
> stand as a verb infinitive.  I looked up other glossaries, found this
> one
> https://www.gartner.com/en/information-technology/glossary?glossaryletter=T
> and realized that when they do verbs, they put the present participle
> (-ing) form.  So I changed it to "Aggregating", and split out the
> "Aggregate function" into its own term.
>
> In Atomic, there seemed to be excessive use of  in the
> definitions.  Style guides seem to suggest to do that only the first
> time you use a term in a definition.  I removed some markup.
>
> I'm not sure about some terms such as "analytic" and "backend server".
> I put them in XML comments for now.
>
> The other changes should be self-explanatory.
>
> It's hard to review work from a professional tech writer.  I'm under the
> constant impression that I'm ruining somebody's perfect end product,
> making a fool of myself.
>
> --
> Álvaro Herrera                https://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


Re: Add A Glossary

2020-03-20 Thread Alvaro Herrera
On 2020-Mar-20, Corey Huinker wrote:

> > Jürgen mentioned off-list that the man page doesn't build. I was going to
> > look into that, but if anyone has more familiarity with that, I'm listening.

> Looking at this some more, I'm not sure anything needs to be done for man
> pages.

Yeah, I don't think he was saying that we needed to do anything to
produce a glossary man page; rather that the "make man" command failed.
I tried it here, and indeed it failed.  But on further investigation,
after a "make maintainer-clean" it no longer failed.  I'm not sure what
to make of it, but it seems that this patch needn't concern itself with
that.

I gave a read through the first few actual definitions.  It's a much
slower work than I thought!  Attached you'll find the first few edits
that I propose.

Looking at the definition of "Aggregate" it seemed weird to have it
stand as a verb infinitive.  I looked up other glossaries, found this
one
https://www.gartner.com/en/information-technology/glossary?glossaryletter=T
and realized that when they do verbs, they put the present participle
(-ing) form.  So I changed it to "Aggregating", and split out the
"Aggregate function" into its own term.

In Atomic, there seemed to be excessive use of  in the
definitions.  Style guides seem to suggest to do that only the first
time you use a term in a definition.  I removed some markup.

I'm not sure about some terms such as "analytic" and "backend server".
I put them in XML comments for now.

The other changes should be self-explanatory.

It's hard to review work from a professional tech writer.  I'm under the
constant impression that I'm ruining somebody's perfect end product,
making a fool of myself.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/doc/src/sgml/glossary.sgml b/doc/src/sgml/glossary.sgml
index 612ce6e5f4..05fca33d9b 100644
--- a/doc/src/sgml/glossary.sgml
+++ b/doc/src/sgml/glossary.sgml
@@ -6,25 +6,34 @@
   systems in general.
  
   
-   
-Aggregate
+   
+Aggregating
 
  
-  To combine a collection of data values into a single value, whose
-  value may not be of the same type as the original values.
-  Aggregate Functions
-  combine multiple Rows that share a common set
-  of values into one Row, which means that the
-  only data visible in the values in common, and the aggregates of the
-  non-common data.
+  The act of combining a collection of data (input) values into
+  a single output value, which may not be of the same type as the
+  input values.
+ 
+
+   
+
+   
+Aggregate Function
+
+  A Function that combines multiple input values,
+  for example by counting, averaging or adding them all together,
+  yielding a single output value.
  
  
   For more information, see
   .
  
+ 
+  See also Window Function.
 

 
+
 

 Archiver
 
  
-  A process that backs up WAL Files in order to
-  reclaim space on the file system.
+  A process that saves aside copies of WAL Files,
+  for the purposes of creating backup copies or keeping
+  Replicas current.
  
  
   For more information, see
@@ -59,16 +70,15 @@
 
  
   In reference to the value of an Attribute or
-  Datum: cannot be broken down into smaller
-  components.
+  Datum: the property that it cannot be broken down
+  into smaller components.
  
  
   In reference to an operation: An event that cannot be completed in
   part: it must either entirely succeed or entirely fail. A series of
   SQL statements can be combined into a
   Transaction, and that
-  transaction is said to be
-  Atomic.
+  transaction is said to be atomic.
  
 

@@ -112,6 +122,7 @@
 

 
+
 

 Background Worker
@@ -142,8 +154,9 @@
 Background Writer
 
  
-  Writes continuously dirty pages from Shared
-  Memory to the file system. It starts periodically, but
+  A process that continuously writes dirty pages from
+  Shared Memory to the file system.
+  It starts periodically, but
   works only for a short period in order to distribute expensive
   I/O activity over time instead of generating fewer
   large I/O peaks which could block other processes.
@@ -218,10 +231,9 @@
 Checkpoint
 
  
-  A  Checkpoint is a point in time
-  when all older dirty pages of the Shared
-  Memory, all older WAL records, and
-  a special Checkpoint record have been written
+  A point in time when all older dirty pages of the
+  Shared Memory, all older WAL records,
+  and a special Checkpoint record have been written
   and flushed to disk.
  
 
@@ -543,8 +555,8 @@
 
  
   A Relation that contains data derived from a
-  Table (or Relation such
-  as a 

Re: Correlated IN/Any Subquery Transformation

2020-03-20 Thread Tom Lane
"Li, Zheng"  writes:
> This patch enables correlated IN/Any subqueries to be transformed to joins; the 
> transformation is allowed only when the correlated Var is in the WHERE clause 
> of the subquery. It covers the most common correlated cases and follows the 
> same criteria as the correlated Exists transformation code.
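
(For illustration, the shape of query this targets - a sketch with
hypothetical tables, not taken from the patch:

    SELECT o.*
    FROM orders o
    WHERE o.item_id IN (SELECT i.id
                        FROM items i
                        WHERE i.region = o.region);  -- correlated Var in the subquery's WHERE

which could then be planned as a semi-join on i.id = o.item_id AND
i.region = o.region instead of re-running the subplan for every row.)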

It's too late to include this in v13, but please add the patch to the
next commitfest so that we remember to consider it for v14.

https://commitfest.postgresql.org

regards, tom lane




Re: where EXEC_BACKEND is defined

2020-03-20 Thread Tom Lane
Peter Eisentraut  writes:
> I noticed that when you want to define EXEC_BACKEND, perhaps to do some 
> non-Windows testing of that option, it's not clear how to do that.  I 
> would have expected it in pg_config_manual.h, but it's actually in 
> configure.in and in MSBuildProject.pm.  So if you want to define it 
> yourself, you kind of have to make up your own way to do it.

Yeah.  Personally, I tend to add the #define to pg_config.h
manually after running configure; I find this convenient because
the effects automatically go away at "make distclean", and I
don't have to remember to undo anything.
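
Concretely, that manual approach amounts to something like this for an
in-tree build (a sketch, not an official recipe):

    ./configure
    echo '#define EXEC_BACKEND 1' >> src/include/pg_config.h
    make

and "make distclean" removes the generated pg_config.h, taking the define
with it.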

> I don't see why this should be like that.  I propose the attached patch 
> to move the thing to pg_config_manual.h.

I wouldn't use the option of editing pg_config_manual.h myself, because
of the need to undo it + risk of forgetting and committing that as part
of a patch.  Still, this patch doesn't get in the way of continuing to
set it in pg_config.h, and it does offer a place to document the thing
which is a good idea as you say.  So no objection here.

One small point is that I believe the existing code has the effect of
"#define EXEC_BACKEND 1" not just "#define EXEC_BACKEND".  I don't
think this matters to anyplace in the core code, but it's conceivable
that somebody has extension code written to assume the former.
Nonetheless, I'm +1 for re-standardizing on the latter, because it's
a couple less keystrokes when inserting a manual definition ;-).
Might be worth mentioning in the commit log entry though.

regards, tom lane




Issues with building cpp extensions on PostgreSQL 10+

2020-03-20 Thread Oleksii Kliukin
Hello,

I’ve recently tried to build an extension that employs C++ files and also
passes them to a linker to make a shared library. I’ve discovered a few
issues with them:

- in v10 CFLAGS_SL is not appended to the CXXFLAGS in Makefile.shlib,
resulting in cpp files compiled without -fPIC, leading to errors when
creating the shared library out of them. In v11 and above CFLAGS_SL is
prepended to the PG_CXXFLAGS, but there are no PG_CXXFLAGS on v10, and the
Makefile does nothing to add them to CXXFLAGS. Patch is attached.

- not just on v10: when building bc files from cpp, no CXXFLAGS are
passed; as a result, when building a source with non-standard flags (e.g.
-std=c++11) one gets an error while building the bc files.

The rules in the Makefile.global.(in) look like:

ifndef COMPILE.c.bc
# -Wno-ignored-attributes added so gnu_printf doesn't trigger
# warnings, when the main binary is compiled with C.
COMPILE.c.bc = $(CLANG) -Wno-ignored-attributes $(BITCODE_CFLAGS) $(CPPFLAGS) 
-flto=thin -emit-llvm -c
endif

ifndef COMPILE.cxx.bc
COMPILE.cxx.bc = $(CLANG) -xc++ -Wno-ignored-attributes $(BITCODE_CXXFLAGS) 
$(CPPFLAGS) -flto=thin -emit-llvm -c
endif

%.bc : %.c
$(COMPILE.c.bc) -o $@ $<

%.bc : %.cpp
$(COMPILE.cxx.bc) -o $@ $<

However, there seems to be no way to override BITCODE_CXXFLAGS to include
any specific C++ compilation flags that are also required to build object
files from cpp. Same applies to .bc derived from .c files with
BITCODE_CFLAGS respectively.

I am wondering if we could define something like PG_BITCODE_CXXFLAGS and
PG_BITCODE_CFLAGS in pgxs.mk to be able to override those. If this sounds
like the right strategy, I'll prepare a patch.
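
For illustration, a minimal sketch of the extension Makefile this would
enable (PG_BITCODE_CXXFLAGS is the hypothetical variable proposed above;
PG_CXXFLAGS already exists in v11 and later):

MODULE_big = myext
OBJS = myext.o

# existing knob (v11+): extra flags for compiling .o files from C++
PG_CXXFLAGS = -std=c++11
# proposed knob: extra flags for emitting .bc files from C++
PG_BITCODE_CXXFLAGS = -std=c++11

PG_CONFIG = pg_config
PGXS := $(shell $(PG_CONFIG) --pgxs)
include $(PGXS)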

Cheers,
Oleksii “Alex” Kliukin



cxxflags_shared_libraries_pg10.diff
Description: Binary data


Re: Add A Glossary

2020-03-20 Thread Corey Huinker
>
> Jürgen mentioned off-list that the man page doesn't build. I was going to
>> look into that, but if anyone has more familiarity with that, I'm listening.
>>
>
Looking at this some more, I'm not sure anything needs to be done for man
pages. man1 is for executables, man3 seems to be dblink and SPI, and man7
is all SQL commands. This isn't any of those. The only possible thing left
would be how to render the text of a foo
sgml/postgres.sgml:  
sgml/release.sgml:[A-Z][A-Z_ ]+[A-Z_] , ,
, 
sgml/stylesheet.css:acronym { font-style: inherit; }

filelist.sgml, postgres.sgml, and stylesheet.css already have the
corresponding change, and release.sgml is just an incidental mention of
acronym.

Of course I could be missing something.

>


Re: potential stuck lock in SaveSlotToPath()

2020-03-20 Thread Robert Haas
On Fri, Mar 20, 2020 at 11:32 AM Peter Eisentraut
 wrote:
> On 2020-03-19 16:38, Robert Haas wrote:
> > Incidentally, the wait-event handling in SaveSlotToPath() doesn't look
> > right for the early-exit cases either.
>
> There appear to be appropriate pgstat_report_wait_end() calls.  What are
> you seeing?

Oh, you're right. I think I got confused because the rename() and
close() don't have that, but those don't have a wait event set either.
Sorry for the noise.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: potential stuck lock in SaveSlotToPath()

2020-03-20 Thread Peter Eisentraut

On 2020-03-19 16:38, Robert Haas wrote:

Incidentally, the wait-event handling in SaveSlotToPath() doesn't look
right for the early-exit cases either.


There appear to be appropriate pgstat_report_wait_end() calls.  What are 
you seeing?


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




where EXEC_BACKEND is defined

2020-03-20 Thread Peter Eisentraut
I noticed that when you want to define EXEC_BACKEND, perhaps to do some 
non-Windows testing of that option, it's not clear how to do that.  I 
would have expected it in pg_config_manual.h, but it's actually in 
configure.in and in MSBuildProject.pm.  So if you want to define it 
yourself, you kind of have to make up your own way to do it.


I don't see why this should be like that.  I propose the attached patch 
to move the thing to pg_config_manual.h.


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From c7d574d3b9f363cc44daf45c543ab4addfa25afd Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Thu, 19 Mar 2020 16:32:02 +0100
Subject: [PATCH] Define EXEC_BACKEND in pg_config_manual.h

It was for unclear reasons defined in a separate location, which makes
it more cumbersome to override for testing, and it also did not have
any prominent documentation.  Move to pg_config_manual.h, where
similar things are already collected.
---
 configure|  2 +-
 configure.in |  2 +-
 src/include/pg_config_manual.h   | 13 +
 src/tools/msvc/MSBuildProject.pm |  2 +-
 4 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/configure b/configure
index a7cf71b3f1..899116517c 100755
--- a/configure
+++ b/configure
@@ -6906,7 +6906,7 @@ fi
 
 # We already have this in Makefile.win32, but configure needs it too
 if test "$PORTNAME" = "win32"; then
-  CPPFLAGS="$CPPFLAGS -I$srcdir/src/include/port/win32 -DEXEC_BACKEND"
+  CPPFLAGS="$CPPFLAGS -I$srcdir/src/include/port/win32"
 fi
 
 # Now that we're done automatically adding stuff to C[XX]FLAGS, put back the
diff --git a/configure.in b/configure.in
index d36a7e94b3..ecdf172396 100644
--- a/configure.in
+++ b/configure.in
@@ -600,7 +600,7 @@ fi
 
 # We already have this in Makefile.win32, but configure needs it too
 if test "$PORTNAME" = "win32"; then
-  CPPFLAGS="$CPPFLAGS -I$srcdir/src/include/port/win32 -DEXEC_BACKEND"
+  CPPFLAGS="$CPPFLAGS -I$srcdir/src/include/port/win32"
 fi
 
 # Now that we're done automatically adding stuff to C[XX]FLAGS, put back the
diff --git a/src/include/pg_config_manual.h b/src/include/pg_config_manual.h
index d74a8dd808..b7410ff51e 100644
--- a/src/include/pg_config_manual.h
+++ b/src/include/pg_config_manual.h
@@ -122,6 +122,19 @@
  */
 #define ALIGNOF_BUFFER 32
 
+/*
+ * If EXEC_BACKEND is defined, the postmaster uses an alternative method for
+ * starting subprocesses: Instead of simply using fork(), as is standard on
+ * Unix platforms, it uses fork()+exec() or something equivalent on Windows,
+ * as well as lots of extra code to bring the required global state to those
+ * new processes.  This must be enabled on Windows (because there is no
+ * fork()).  On other platforms, it's only useful for verifying those
+ * otherwise Windows-specific code paths.
+ */
+#if defined(WIN32) && !defined(__CYGWIN__)
+#define EXEC_BACKEND
+#endif
+
 /*
  * Disable UNIX sockets for certain operating systems.
  */
diff --git a/src/tools/msvc/MSBuildProject.pm b/src/tools/msvc/MSBuildProject.pm
index 823357c023..ebb169e201 100644
--- a/src/tools/msvc/MSBuildProject.pm
+++ b/src/tools/msvc/MSBuildProject.pm
@@ -320,7 +320,7 @@ sub WriteItemDefinitionGroup
 
   $p->{opt}
   
$self->{prefixincludes}src/include;src/include/port/win32;src/include/port/win32_msvc;$includes\%(AdditionalIncludeDirectories)
-  
WIN32;_WINDOWS;__WINDOWS__;__WIN32__;EXEC_BACKEND;WIN32_STACK_RLIMIT=4194304;_CRT_SECURE_NO_DEPRECATE;_CRT_NONSTDC_NO_DEPRECATE$self->{defines}$p->{defs}\%(PreprocessorDefinitions)
+  
WIN32;_WINDOWS;__WINDOWS__;__WIN32__;WIN32_STACK_RLIMIT=4194304;_CRT_SECURE_NO_DEPRECATE;_CRT_NONSTDC_NO_DEPRECATE$self->{defines}$p->{defs}\%(PreprocessorDefinitions)
   $p->{strpool}
   $p->{runtime}
   
$self->{disablewarnings};\%(DisableSpecificWarnings)
-- 
2.25.0



Re: Making psql error out on output failures

2020-03-20 Thread Peter Eisentraut

On 2020-03-06 10:36, Daniel Verite wrote:

David Zhang wrote:


Thanks for your review, now the new patch with the error message in PG
style is attached.


The current status of the patch is "Needs review" at
https://commitfest.postgresql.org/27/2400/

If there's no more review to do, would you consider moving it to
Ready for Committer?


committed

--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: error context for vacuum to include block number

2020-03-20 Thread Justin Pryzby
On Fri, Mar 20, 2020 at 04:58:08PM +0530, Amit Kapila wrote:
> See how the attached looks.  I have written a commit message as well;
> see if I have missed anyone from the credit list.

Thanks for looking again.

Couple tweaks:

+/* Phases of vacuum during which an error can occur. */

Can you say: "during which we report error context"
Otherwise it sounds like we've somehow precluded errors from happening anywhere
else, which I don't think we can claim.

In the commit message:
|The additional information displayed will be block number for errors
|occurred while processing heap and index name for errors occurred
|while processing the index.

=> error occurring

|This will help us in diagnosing the problems that occurred during a
|vacuum.  For ex. due to corruption if we get some error while vacuuming,

=> problems that occur

Maybe it should say that this will help both 1) admins who have corruption due
to hardware (say); and 2) developers with corruption due to a bug.

-- 
Justin




Re: range_agg

2020-03-20 Thread Alvaro Herrera
On 2020-Mar-14, Paul A Jungwirth wrote:

> On Fri, Mar 13, 2020 at 2:39 PM Alvaro Herrera  
> wrote:
> > Here's the rebased version.
> >
> > I just realized I didn't include the API change I proposed in
> > https://postgr.es/m/20200306200343.GA625@alvherre.pgsql ...
> 
> Thanks for your help with this Alvaro!
> 
> I was just adding your changes to my own branch and I noticed your
> v12-0001 has different parameter names here:
> 
>  static MultirangeIOData *
> -get_multirange_io_data(FunctionCallInfo fcinfo, Oid mltrngtypid,
> IOFuncSelector func)
> +get_multirange_io_data(FunctionCallInfo fcinfo, Oid rngtypid,
> IOFuncSelector func)

> I'm pretty sure mltrngtypid is the correct name here. Right? Let me
> know if I'm missing something. :-)

Heh.  The intention here was to abbreviate to "typid", but if you want
to keep the longer name, it's OK too.  I don't think that name is
particularly critical, since it should be obvious that it must be a
multirange type.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Missing errcode() in ereport

2020-03-20 Thread Alvaro Herrera
On 2020-Mar-19, Tom Lane wrote:

> I think the key decision we'd have to make to move forward on this
> is to decide whether it's still project style to prefer the extra
> parens, or whether we want new code to do without them going
> forward.  If we don't want to risk requiring __VA_ARGS__ for the
> old branches then I'd vote in favor of keeping the parens as
> preferred style, at least till v11 is out of support.  If we do
> change that in the back branches then there'd be reason to prefer
> to go without parens.  New coders might still be confused about
> why there are all these calls with the useless parens, though.

It seems fine to accept new code in pg14 without the extra parens.  All
existing committers are (or should be) used to the style with the
parens, so it's unlikely that we'll mess up when backpatching bugfixes;
and even if we do, the buildfarm would alert us to that soon enough.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Missing errcode() in ereport

2020-03-20 Thread Tom Lane
Andres Freund  writes:
> On 2020-03-19 22:32:30 -0400, Tom Lane wrote:
>> Could we get away with moving the compiler goalposts for the back
>> branches?  I dunno, but it's a fact that we aren't testing anymore
>> with any compilers that would complain about unconditional use of
>> __VA_ARGS__.  So it might be broken already and we wouldn't know it.

> FWIW, I did grep for unprotected uses, and  didn't find anything.

Yeah, I also grepped the v11 branch for that.

>> (I suspect the last buildfarm animal that would've complained about
>> this was pademelon, which I retired more than a year ago IIRC.)

> I guess a query that searches the logs backwards for animals without
> __VA_ARGS__ would be a good idea?

I did a more wide-ranging scan, looking at the 9.4 branch as far back
as 2015.  Indeed, pademelon is the only animal that ever reported
not having __VA_ARGS__ in that timeframe.

So I've got mixed emotions about this.  On the one hand, it seems
quite unlikely that anyone would ever object if we started requiring
__VA_ARGS__ in the back branches.  On the other hand, it's definitely
not project policy to change requirements like that in minor releases.
Also the actual benefit might not be much.  If anyone did mistakenly
back-patch a fix that included a paren-free ereport, the buildfarm
would point out the error soon enough.

I thought for a little bit about making the back branches define ereport
with "..." if HAVE__VA_ARGS and otherwise not, but if we did that then
the buildfarm would *not* complain about paren-free ereports in the back
branches.  I think that would inevitably lead to there soon being some,
so that we'd effectively be requiring __VA_ARGS__ anyway.  (I suppose
I could resurrect pademelon ... but who knows whether that old war
horse will keep working for another four years till v11 is retired.)

On balance I'm leaning towards keeping the parens as preferred style
for now, adjusting v12 so that the macro will allow paren omission
but we don't break ABI, and not touching the older branches.  But
if there's a consensus to require __VA_ARGS__ in the back branches,
I won't stand in the way.
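
For anyone following along, the two spellings under discussion look like
this (a minimal sketch using a stock error code):

/* traditional style: auxiliary calls wrapped in extra parens */
ereport(ERROR,
        (errcode(ERRCODE_DIVISION_BY_ZERO),
         errmsg("division by zero")));

/* paren-free style, once ereport accepts __VA_ARGS__ */
ereport(ERROR,
        errcode(ERRCODE_DIVISION_BY_ZERO),
        errmsg("division by zero"));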

regards, tom lane




Re: Planning counters in pg_stat_statements (using pgss_store)

2020-03-20 Thread Sergei Kornilov
Hello

I was inactive for a while... Sorry.

>>  BTW, I rechecked the patchset.
>>  I think the code is ready for committer, but should we modify the 
>> documentation?
>>  {min,max,mean,stddev}_time is now obsolete, so it is better to modify it to
>>  {min,max,mean,stddev}_exec_time and add {min,max,mean,stddev}_plan_time.
>
> Oh indeed, I totally forgot about this. I'm attaching v8 with updated
> documentation that should match what has been implemented for some versions now.

Yet another one is missing in the docs: total_time

I specifically verified that the newly loaded library works with the old version 
of the extension in the database. I have not noticed any issues here.

> 2.2% is a bit large but I think it is still acceptable because people using 
> this feature
> might take into account that some overhead will happen from the additional calls to a
> gettime function.

I would be happy even with 10% overhead due to enabled track_planning (but in 
that case disabled by default); log_min_duration_statement = 0 with log parsing 
is much more expensive.
I think 1-2% is acceptable and we can set track_planning = on by default as 
the patch does.

> * Rows for executor time are changed from {total/min/max/mean/stddev}_time to 
> {total/min/max/mean/stddev}_exec_time.

Maybe release it as version 2.0 instead of the minor update 1.8?
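
As a hedged usage sketch against the patched view (column names as
renamed and added by this patch):

SELECT query,
       total_plan_time, mean_plan_time,
       total_exec_time, mean_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 5;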

regards, Sergei




Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-20 Thread Laurenz Albe
On Thu, 2020-03-19 at 23:20 -0700, Andres Freund wrote:
> I am not sure about b).  In my mind, the objective is not to prevent
> > anti-wraparound vacuums, but to see that they have less work to do,
> > because previous autovacuum runs already have frozen anything older than
> > vacuum_freeze_min_age.  So, assuming linear growth, the number of tuples
> > to freeze during any run would be at most one fourth of today's number
> > when we hit autovacuum_freeze_max_age.
> 
> Based on two IM conversations I think it might be worth emphasizing how
> vacuum_cleanup_index_scale_factor works:
> 
> For btree, even if there is not a single deleted tuple, we can *still*
> end up doing a full index scan at the end of vacuum. As the docs describe
> vacuum_cleanup_index_scale_factor:
> 
>
> Specifies the fraction of the total number of heap tuples counted in
> the previous statistics collection that can be inserted without
> incurring an index scan at the VACUUM cleanup 
> stage.
> This setting currently applies to B-tree indexes only.
>
> 
> I.e. with the default settings we will perform a whole-index scan
> (without visibility map or such) after every 10% growth of the
> table. Which means that, even if the visibility map prevents repeated
> table accesses, increasing the rate of vacuuming for insert-only tables
> can cause a lot more whole index scans.  Which means that vacuuming an
> insert-only workload frequently *will* increase the total amount of IO,
> even if there is not a single dead tuple. Rather than just spreading the
> same amount of IO over more vacuums.
> 
> And both gin and gist just always do a full index scan, regardless of
> vacuum_cleanup_index_scale_factor (either during a bulk delete, or
> during the cleanup).  Thus more frequent vacuuming for insert-only
> tables can cause a *lot* of pain (even an approx quadratic increase of
> IO?  O(increased_frequency * peak_index_size)?) if you have large
> indexes - which is very common for gin/gist.

Ok, ok.  Thanks for the explanation.

In the light of that, I agree that we should increase the scale_factor.

Yours,
Laurenz Albe





Re: Why does [auto-]vacuum delay not report a wait event?

2020-03-20 Thread Magnus Hagander
On Fri, Mar 20, 2020 at 12:32 PM Amit Kapila  wrote:
>
> On Fri, Mar 20, 2020 at 4:15 AM Andres Freund  wrote:
> >
> > Hi,
> >
> > I was looking at [1], wanting to suggest a query to monitor what
> > autovacuum is mostly waiting on. Partially to figure out whether it's
> > mostly autovacuum cost limiting.
> >
> > But uh, unfortunately the vacuum delay code just sleeps without setting
> > a wait event:
> >
> > void
> > vacuum_delay_point(void)
> > {
> > ...
> > /* Nap if appropriate */
> > if (msec > 0)
> > {
> > if (msec > VacuumCostDelay * 4)
> > msec = VacuumCostDelay * 4;
> >
> > pg_usleep((long) (msec * 1000));
> >
> >
> > Seems like it should instead use a new wait event in the PG_WAIT_TIMEOUT
> > class?
> >
>
> +1.  I think it will be quite helpful.

Definite +1. There should be a wait event, and identifying this
particular case is certainly interesting enough that it should have
its own.
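
Something like this minimal sketch, perhaps, where WAIT_EVENT_VACUUM_DELAY
is an assumed name for the new PG_WAIT_TIMEOUT member:

/* Nap if appropriate */
if (msec > 0)
{
    if (msec > VacuumCostDelay * 4)
        msec = VacuumCostDelay * 4;

    pgstat_report_wait_start(WAIT_EVENT_VACUUM_DELAY);
    pg_usleep((long) (msec * 1000));
    pgstat_report_wait_end();
}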

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/




Re: Why does [auto-]vacuum delay not report a wait event?

2020-03-20 Thread Amit Kapila
On Fri, Mar 20, 2020 at 4:15 AM Andres Freund  wrote:
>
> Hi,
>
> I was looking at [1], wanting to suggest a query to monitor what
> autovacuum is mostly waiting on. Partially to figure out whether it's
> mostly autovacuum cost limiting.
>
> But uh, unfortunately the vacuum delay code just sleeps without setting
> a wait event:
>
> void
> vacuum_delay_point(void)
> {
> ...
> /* Nap if appropriate */
> if (msec > 0)
> {
> if (msec > VacuumCostDelay * 4)
> msec = VacuumCostDelay * 4;
>
> pg_usleep((long) (msec * 1000));
>
>
> Seems like it should instead use a new wait event in the PG_WAIT_TIMEOUT
> class?
>

+1.  I think it will be quite helpful.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: error context for vacuum to include block number

2020-03-20 Thread Amit Kapila
On Fri, Mar 20, 2020 at 12:21 PM Justin Pryzby  wrote:
>
> On Fri, Mar 20, 2020 at 11:24:25AM +0530, Amit Kapila wrote:
> > On Fri, Mar 20, 2020 at 5:59 AM Justin Pryzby  wrote:
> > That makes sense.  I have a few more comments:
> >
> > 1.
> > + VACUUM_ERRCB_PHASE_INDEX_CLEANUP,
> > +} errcb_phase;
> >
> > Why do you need a comma after the last element in the above enum?
>
> It's not needed but a common convention to avoid needing a two-line patch in
> order to add a line at the end, like:
>
> - foo
> + foo,
> + bar
>

I don't think this is required and we don't have this in other places,
so I removed it.  Apart from that, I made a few additional changes:
(a) moved the typedef to a different place as it was looking odd
in-between other struct defines, (b) renamed the enum to ErrCbPhase as
that better suits the nearby typedefs, (c) added/edited comments in a
few places, (d) ran pgindent.

See how the attached looks.  I have written a commit message as well;
see if I have missed anyone from the credit list.

>
> > 8. Can we think of some easy way to add tests for this patch?
>
> Is it possible to make a corrupted index which errors during a scan in the
> regression tests?
>

I don't think so.

For now, let's focus on the main patch.  Once that is committed, we
can look into the other code rearrangement/cleanup patches.
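
For reference, a hedged sketch of the callback shape under discussion
(struct fields and the phase enum are from the patch; the callback name
and exact message wording are assumptions):

static void
vacuum_error_callback(void *arg)
{
    LVRelStats *errinfo = (LVRelStats *) arg;

    if (errinfo->phase == VACUUM_ERRCB_PHASE_SCAN_HEAP &&
        BlockNumberIsValid(errinfo->blkno))
        errcontext("while scanning block %u of relation \"%s.%s\"",
                   errinfo->blkno, errinfo->relnamespace, errinfo->relname);
}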

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


v30-0001-Introduce-vacuum-errcontext-to-display-additiona.patch
Description: Binary data


Re: plan cache overhead on plpgsql expression

2020-03-20 Thread Pavel Stehule
On Thu, Mar 19, 2020 at 10:47, Amit Langote wrote:

> Hi Pavel,
>
> Sorry it took me a while to look at this.
>
> On Tue, Feb 25, 2020 at 4:28 AM Pavel Stehule 
> wrote:
> > po 24. 2. 2020 v 18:56 odesílatel Pavel Stehule 
> napsal:
> >> But I found one issue - I don't know if this issue is related to your
> patch or plpgsql_check.
> >>
> >> plpgsql_check tries to clean up after it is executed - it cleans all plans.
> But some pointers to simple expressions are broken after caught exceptions.
> >>
> >> expr->plan = 0x80. Interestingly, the other fields of these expressions
> are correct.
> >
> > I am not sure, but after patching the SPI_prepare_params the current
> memory context is some short memory context.
> >
> > Can SPI_prepare_params change current memory context? It did before. But
> after patching different memory context is active.
>
> I haven't been able to see the behavior you reported.  Could you let
> me know what unexpected memory context you see in the problematic
> case?
>

There was a problem with plpgsql_check after I applied this patch. It
crashed differently on its own regression tests.

But I cannot reproduce this issue now. Probably there was more than one
issue in my build environment.

So my questions and notes about a change of MemoryContext after patching
are muddled. Sorry for the noise.

Regards

Pavel



>
> --
> Thank you,
> Amit
>


Re: explain HashAggregate to report bucket and memory stats

2020-03-20 Thread Justin Pryzby
On Fri, Mar 13, 2020 at 10:57:43AM -0700, Andres Freund wrote:
> On 2020-03-13 10:53:17 -0700, Jeff Davis wrote:
> > On Fri, 2020-03-13 at 10:27 -0700, Andres Freund wrote:
> > > On 2020-03-13 10:15:46 -0700, Jeff Davis wrote:
> > > > Also, is there a reason you report two different memory values
> > > > (hashtable and tuples)? I don't object, but it seems like a little too
> > > > much detail.
> > > 
> > > Seems useful to me - the hashtable is pre-allocated based on estimates,
> > > whereas the tuples are allocated "on demand". So seeing the difference
> > > will allow to investigate the more crucial issue...

> > Then do we also want to report separately on the by-ref transition
> > values? That could be useful if you are using ARRAY_AGG and the states
> > grow larger than you might expect.
> 
> I can see that being valuable - I've had to debug cases with too much
> memory being used due to aggregate transitions before. Right now it'd be
> mixed in with tuples, I believe - and we'd need a separate context for
> tracking the transition values? Due to that I'm inclined to not report
> separately for now.

I think that's already in a separate context indexed by grouping set:
src/include/nodes/execnodes.h:  ExprContext **aggcontexts;  /* econtexts 
for long-lived data (per GS) */

But the hashtable and tuples are combined.  I put them in separate contexts and
rebased on top of 1f39bce021540fde00990af55b4432c55ef4b3c7.

But didn't do anything yet with the aggcontexts.

Now I can get output like:

|template1=# explain analyze SELECT i,COUNT(1) FROM t GROUP BY 1;
| HashAggregate  (cost=4769.99..6769.98 rows=19 width=12) (actual 
time=266.465..27020.333 rows=19 loops=1)
|   Group Key: i
|   Buckets: 524288 (originally 262144)
|   Peak Memory Usage: hashtable: 12297kB, tuples: 24576kB
|   Disk Usage: 192 kB
|   HashAgg Batches: 3874
|   ->  Seq Scan on t  (cost=0.00..3769.99 rows=19 width=4) (actual 
time=13.043..64.017 rows=19 loops=1)

It looks somewhat funny next to hash join, which puts everything on one line:

|template1=# explain  analyze SELECT i,COUNT(1) FROM t a JOIN t b USING(i) 
GROUP BY 1;
| HashAggregate  (cost=13789.95..15789.94 rows=19 width=12) (actual 
time=657.733..27129.873 rows=19 loops=1)
|   Group Key: a.i
|   Buckets: 524288 (originally 262144)
|   Peak Memory Usage: hashtable: 12297kB, tuples: 24576kB
|   Disk Usage: 192 kB
|   HashAgg Batches: 3874
|   ->  Hash Join  (cost=6269.98..12789.95 rows=19 width=4) (actual 
time=135.932..426.071 rows=19 loops=1)
| Hash Cond: (a.i = b.i)
| ->  Seq Scan on t a  (cost=0.00..3769.99 rows=19 width=4) (actual 
time=3.265..47.598 rows=19 loops=1)
| ->  Hash  (cost=3769.99..3769.99 rows=19 width=4) (actual 
time=131.881..131.882 rows=19 loops=1)
|   Buckets: 262144  Batches: 1  Memory Usage: 9080kB
|   ->  Seq Scan on t b  (cost=0.00..3769.99 rows=19 width=4) 
(actual time=3.273..40.163 rows=19 loops=1)

-- 
Justin
>From e593c119c97ea31edac4c9f08a39eee451964a16 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Thu, 19 Mar 2020 23:03:25 -0500
Subject: [PATCH v8 1/8] nodeAgg: separate context for each hashtable

---
 src/backend/executor/execExpr.c |  2 +-
 src/backend/executor/nodeAgg.c  | 82 +++--
 src/include/executor/nodeAgg.h  |  2 +
 src/include/nodes/execnodes.h   |  2 -
 4 files changed, 51 insertions(+), 37 deletions(-)

diff --git a/src/backend/executor/execExpr.c b/src/backend/executor/execExpr.c
index 1370ffec50..039c5a8b5f 100644
--- a/src/backend/executor/execExpr.c
+++ b/src/backend/executor/execExpr.c
@@ -3241,7 +3241,7 @@ ExecBuildAggTransCall(ExprState *state, AggState *aggstate,
 	int adjust_jumpnull = -1;
 
 	if (ishash)
-		aggcontext = aggstate->hashcontext;
+		aggcontext = aggstate->perhash[setno].hashcontext;
 	else
 		aggcontext = aggstate->aggcontexts[setno];
 
diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
index 44c159ab2a..1d319f49d0 100644
--- a/src/backend/executor/nodeAgg.c
+++ b/src/backend/executor/nodeAgg.c
@@ -191,8 +191,7 @@
  *	  So we create an array, aggcontexts, with an ExprContext for each grouping
  *	  set in the largest rollup that we're going to process, and use the
  *	  per-tuple memory context of those ExprContexts to store the aggregate
- *	  transition values.  hashcontext is the single context created to support
- *	  all hash tables.
+ *	  transition values.
  *
  *	  Spilling To Disk
  *
@@ -457,7 +456,7 @@ select_current_set(AggState *aggstate, int setno, bool is_hash)
 	 * ExecAggPlainTransByRef().
 	 */
 	if (is_hash)
-		aggstate->curaggcontext = aggstate->hashcontext;
+		aggstate->curaggcontext = aggstate->perhash[setno].hashcontext;
 	else
 		aggstate->curaggcontext = aggstate->aggcontexts[setno];
 
@@ -1424,8 +1423,7 @@ find_unaggregated_cols_walker(Node *node, Bitmapset **colnos)
  * grouping set for which we're doing hashing.
  *
  

Re: proposal: schema variables

2020-03-20 Thread Pavel Stehule
On Fri, Mar 20, 2020 at 8:18, Pavel Stehule wrote:

>
>
> On Thu, Mar 19, 2020 at 10:43, DUVAL REMI wrote:
>
>> Hello
>>
>>
>>
>> I played around with the ALTER VARIABLE statement to make sure it’s OK
>> and it seems fine to me.
>>
>>
>>
>> One last thing before committing.
>>
>>
>>
>> I agree with Thomas Vondra, that this part might/should be simplified :
>>
>>
>>
>> [ { ON COMMIT DROP | ON { TRANSACTIONAL | TRANSACTION } END RESET } ]
>>
>>
>>
>> I would only allow “ON TRANSACTION END RESET”.
>>
>> I think we don’t need both here.
>>
>> Philippe Beaudoin was indeed talking about the TRANSACTIONAL keyword, but
>> that would have made sense (and I think that's what he meant), if you
>> could do something like “CREATE [NON] TRANSACTIONAL VARIABLE …”.
>>
>> But here I don't think that ON TRANSACTIONAL END RESET makes any sense
>> in English, and it only complicates the syntax.
>>
>>
>>
>> Maybe Thomas Vondra (if it's him) would be more inclined to commit the
>> patch if it has this simpler syntax, as he requested.
>>
>>
>>
>> What do you think ?
>>
>
> I removed TRANSACTIONAL from this clause, see attachment change.diff
>
> Updated patch attached.
>
> I hope this is the last touch, making it fully ready for a committer.
>>
>
> Thank you very much for review and testing
>

documentation fix



> Pavel
>
>>


schema-variables-20200320-2.patch.gz
Description: application/gzip


Re: proposal: schema variables

2020-03-20 Thread Pavel Stehule
On Thu, Mar 19, 2020 at 10:43, DUVAL REMI wrote:

> Hello
>
>
>
> I played around with the ALTER VARIABLE statement to make sure it’s OK and
> it seems fine to me.
>
>
>
> One last thing before committing.
>
>
>
> I agree with Thomas Vondra, that this part might/should be simplified :
>
>
>
> [ { ON COMMIT DROP | ON { TRANSACTIONAL | TRANSACTION } END RESET } ]
>
>
>
> I would only allow “ON TRANSACTION END RESET”.
>
> I think we don’t need both here.
>
> Philippe Beaudoin was indeed talking about the TRANSACTIONAL keyword, but
> that would have made sense (and I think that's what he meant), if you
> could do something like “CREATE [NON] TRANSACTIONAL VARIABLE …”.
>
> But here I don't think that ON TRANSACTIONAL END RESET makes any sense
> in English, and it only complicates the syntax.
>
>
>
> Maybe Thomas Vondra (if it's him) would be more inclined to commit the
> patch if it has this simpler syntax, as he requested.
>
>
>
> What do you think ?
>

I removed TRANSACTIONAL from this clause, see attachment change.diff

Updated patch attached.

I hope this is the last touch, making it fully ready for a committer.
>

Thank you very much for review and testing

Pavel

>
diff --git a/doc/src/sgml/ref/create_variable.sgml b/doc/src/sgml/ref/create_variable.sgml
index 1b1883a11a..cf175e05c6 100644
--- a/doc/src/sgml/ref/create_variable.sgml
+++ b/doc/src/sgml/ref/create_variable.sgml
@@ -22,7 +22,7 @@ PostgreSQL documentation
  
 
 CREATE [ { TEMPORARY | TEMP } ] [ IMMUTABLE ] VARIABLE [ IF NOT EXISTS ] name [ AS ] data_type ] [ COLLATE collation ]
-[ NOT NULL ] [ DEFAULT default_expr ] [ { ON COMMIT DROP | ON { TRANSACTIONAL | TRANSACTION } END RESET } ]
+[ NOT NULL ] [ DEFAULT default_expr ] [ { ON COMMIT DROP | ON { TRANSACTION } END RESET } ]
 
  
  
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 744342733d..c135a35d07 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -4567,7 +4567,6 @@ OptSchemaVarDefExpr: DEFAULT b_expr	{ $$ = $2; }
 
 OnEOXActionOption:  ON COMMIT DROP	{ $$ = VARIABLE_EOX_DROP; }
 			| ON TRANSACTION END_P RESET			{ $$ = VARIABLE_EOX_RESET; }
-			| ON TRANSACTIONAL END_P RESET			{ $$ = VARIABLE_EOX_RESET; }
 			| /*EMPTY*/{ $$ = VARIABLE_EOX_NOOP; }
 		;
 
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index d39bcfe9cf..d26b0efcd9 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5978,7 +5978,7 @@
   proname => 'pg_collation_is_visible', procost => '10', provolatile => 's',
   prorettype => 'bool', proargtypes => 'oid',
   prosrc => 'pg_collation_is_visible' },
-{ oid => '4191', descr => 'is schema variable visible in search path?',
+{ oid => '4142', descr => 'is schema variable visible in search path?',
   proname => 'pg_variable_is_visible', procost => '10', provolatile => 's',
   prorettype => 'bool', proargtypes => 'oid', prosrc => 'pg_variable_is_visible' },
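
For reference, a hedged usage sketch of the simplified syntax (LET is the
patch's assignment statement; the feature is not in core):

CREATE VARIABLE var1 AS integer DEFAULT 0 ON TRANSACTION END RESET;
LET var1 = 42;
SELECT var1;  -- 42 within this transaction; resets to the default afterwards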
 


schema-variables-20200320.patch.gz
Description: application/gzip


Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-20 Thread Masahiko Sawada
On Fri, 20 Mar 2020 at 15:20, Andres Freund  wrote:
>
> Hi,
>
> On 2020-03-19 06:45:48 +0100, Laurenz Albe wrote:
> > On Tue, 2020-03-17 at 18:02 -0700, Andres Freund wrote:
> > > I don't think a default scale factor of 0 is going to be ok. For
> > > large-ish tables this will basically cause permanent vacuums. And it'll
> > > sometimes trigger for tables that actually coped well so far. 10 million
> > > rows could be a few seconds, not more.
> > >
> > > I don't think that the argument that otherwise a table might not get
> > > vacuumed before autovacuum_freeze_max_age is convincing enough.
> > >
> > > a) if that's indeed the argument, we should increase the default
> > >   autovacuum_freeze_max_age - now that there's insert triggered vacuums,
> > >   the main argument against that from before isn't valid anymore.
> > >
> > > b) there's not really a good argument for vacuuming more often than
> > >   autovacuum_freeze_max_age for such tables. It'll not be frequent
> > >   enough to allow IOS for new data, and you're not preventing
> > >   anti-wraparound vacuums from happening.
> >
> > According to my reckoning, that is the remaining objection to the patch
> > as it is (with ordinary freezing behavior).
> >
> > How about a scale_factor of 0.005?  That will be high enough for large
> > tables, which seem to be the main concern here.
> >
> > I fully agree with your point a) - should that be part of the patch?
> >
> > I am not sure about b).  In my mind, the objective is not to prevent
> > anti-wraparound vacuums, but to see that they have less work to do,
> > because previous autovacuum runs already have frozen anything older than
> > vacuum_freeze_min_age.  So, assuming linear growth, the number of tuples
> > to freeze during any run would be at most one fourth of today's number
> > when we hit autovacuum_freeze_max_age.
>
> Based on two IM conversations I think it might be worth emphasizing how
> vacuum_cleanup_index_scale_factor works:
>
> For btree, even if there is not a single deleted tuple, we can *still*
> end up doing a full index scan at the end of vacuum. As the docs describe
> vacuum_cleanup_index_scale_factor:
>
>
> Specifies the fraction of the total number of heap tuples counted in
> the previous statistics collection that can be inserted without
> incurring an index scan at the VACUUM cleanup 
> stage.
> This setting currently applies to B-tree indexes only.
>
>
> I.e. with the default settings we will perform a whole-index scan
> (without visibility map or such) after every 10% growth of the
> table. Which means that, even if the visibility map prevents repeated
> table accesses, increasing the rate of vacuuming for insert-only tables
> can cause a lot more whole index scans.  Which means that vacuuming an
> insert-only workload frequently *will* increase the total amount of IO,
> even if there is not a single dead tuple. Rather than just spreading the
> same amount of IO over more vacuums.

Right.

>
> And both gin and gist just always do a full index scan, regardless of
> vacuum_cleanup_index_scale_factor (either during a bulk delete, or
> during the cleanup).  Thus more frequent vacuuming for insert-only
> tables can cause a *lot* of pain (even an approx quadratic increase of
> IO?  O(increased_frequency * peak_index_size)?) if you have large
> indexes - which is very common for gin/gist.

That's right, but for gin, more frequent vacuuming of insert-only
tables can help clean up the pending list, which increases search
speed and is better than having a backend process do it.

Regards,

-- 
Masahiko Sawadahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PATCH] Add schema and table names to partition error

2020-03-20 Thread Amit Kapila
On Thu, Mar 19, 2020 at 3:34 PM Amit Langote  wrote:
>
> Thank you Chris, Amit.
>
> On Thu, Mar 19, 2020 at 1:46 PM Amit Kapila  wrote:
> > On Thu, Mar 19, 2020 at 3:55 AM Chris Bandy  wrote:
> > >
> > >
> > > Sorry for these troubles. Attached are patches created using `git
> > > format-patch -n -v6` on master at 487e9861d0.
> > >
> >
> > No problem.  I have extracted your code changes as a separate patch
> > (see attached) as I am not sure we want to add tests for these cases.
> > This doesn't apply in back-branches, but I think that is small work
> > and we can do that if required.  The real question is do we want to
> > back-patch this?  Basically, this improves the errors in certain cases
> > by providing additional information that otherwise the user might need
> > to extract from error messages.  So, there doesn't seem to be pressing
> > need to back-patch this but OTOH, we have mentioned in docs that we
> > support to display this information for all SQLSTATE class 23
> > (integrity constraint violation) errors which is not true as we forgot
> > to adhere to that in some parts of code.
> >
> > What do you think?  Anybody else has an opinion on whether to
> > back-patch this or not?
>
> As nobody except Chris complained about this so far, maybe no?
>

Fair enough; unless I see any other opinions, I will push this on Monday.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: error context for vacuum to include block number

2020-03-20 Thread Justin Pryzby
On Fri, Mar 20, 2020 at 11:24:25AM +0530, Amit Kapila wrote:
> On Fri, Mar 20, 2020 at 5:59 AM Justin Pryzby  wrote:
> That makes sense.  I have a few more comments:
> 
> 1.
> + VACUUM_ERRCB_PHASE_INDEX_CLEANUP,
> +} errcb_phase;
> 
> Why do you need a comma after the last element in the above enum?

It's not needed but a common convention to avoid needing a two-line patch in
order to add a line at the end, like:

- foo
+ foo,
+ bar

> 2. update_vacuum_error_cbarg(vacrelstats, VACUUM_ERRCB_PHASE_SCAN_HEAP, 
> InvalidBlockNumber, NULL);
> 
> Why do we need to call update_vacuum_error_cbarg at the above place
> after we have added a new one inside for.. loop?

If we're going to update the error_context_stack global to point to our
callback with our vacrelstats arg, it'd better be initialized.  I changed it to do
vacrelstats->phase = UNKNOWN after its allocation in heap_vacuum_rel().
That matches parallel_vacuum_main().

> 4. At this and similar places, change the comment to something like:
> "Reset the old phase information for error traceback".

I did this:
/* Revert back to the old phase information for error traceback */

> 5. Subject: [PATCH v28 3/5] Drop reltuples
> 
> Is this patch directly related to the main patch (vacuum errcontext to
> show block being processed) or is it an independent improvement of
> code?

It's a cleanup after implementing the new feature.  I left it as a separate
patch to make review of the essential patch and of the cleanup easier.  
See here:
https://www.postgresql.org/message-id/CA%2Bfd4k4JA3YkP6-HUqHOqu6cTGqqZUhBfsMqQ4WXkD0Y8uotUg%40mail.gmail.com

> 6. [PATCH v28 4/5] add callback for truncation
> 
> + VACUUM_ERRCB_PHASE_TRUNCATE,
> + VACUUM_ERRCB_PHASE_TRUNCATE_PREFETCH,
> 
> Do we really need separate phases for truncate and truncate_prefetch?

The context is that there was a request to add err context for (yet another)
phase, TRUNCATE.  But I insisted on adding it to prefetch, too, since it does
ReadBuffer.  But there was an objection that the error might be misleading if
it said "while truncating" but it was actually "prefetching to truncate".

> 7. Is there a reason to keep the truncate phase patch separate from
> the main patch? If not, let's merge them.

They were separate since it's the most-recently added part, and (as now)
there's still discussion about it.

> 8. Can we think of some easy way to add tests for this patch?

Is it possible to make a corrupted index which errors during a scan in the
regression tests?

Thanks for looking.

-- 
Justin
>From 941bd2292bd1f3ed926aa20eb572906a6a36df3a Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Thu, 12 Dec 2019 20:54:37 -0600
Subject: [PATCH v29 1/3] vacuum errcontext to show block being processed

Discussion:
https://www.postgresql.org/message-id/20191120210600.gc30...@telsasoft.com
---
 src/backend/access/heap/vacuumlazy.c | 245 ---
 src/tools/pgindent/typedefs.list |   1 +
 2 files changed, 221 insertions(+), 25 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 03c43efc32..531f8471db 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -268,8 +268,20 @@ typedef struct LVParallelState
 	int			nindexes_parallel_condcleanup;
 } LVParallelState;
 
+typedef enum
+{
+	VACUUM_ERRCB_PHASE_UNKNOWN,
+	VACUUM_ERRCB_PHASE_SCAN_HEAP,
+	VACUUM_ERRCB_PHASE_VACUUM_INDEX,
+	VACUUM_ERRCB_PHASE_VACUUM_HEAP,
+	VACUUM_ERRCB_PHASE_INDEX_CLEANUP,
+	VACUUM_ERRCB_PHASE_TRUNCATE,
+} errcb_phase;
+
 typedef struct LVRelStats
 {
+	char	   *relnamespace;
+	char	   *relname;
 	/* useindex = true means two-pass strategy; false means one-pass */
 	bool		useindex;
 	/* Overall statistics about rel */
@@ -290,8 +302,12 @@ typedef struct LVRelStats
 	int			num_index_scans;
 	TransactionId latestRemovedXid;
 	bool		lock_waiter_detected;
-} LVRelStats;
 
+	/* Used for error callback: */
+	char	   *indname;
+	BlockNumber blkno;			/* used only for heap operations */
+	errcb_phase phase;
+} LVRelStats;
 
 /* A few variables that don't seem worth passing around as parameters */
 static int	elevel = -1;
@@ -314,10 +330,10 @@ static void lazy_vacuum_all_indexes(Relation onerel, Relation *Irel,
 	LVRelStats *vacrelstats, LVParallelState *lps,
 	int nindexes);
 static void lazy_vacuum_index(Relation indrel, IndexBulkDeleteResult **stats,
-			  LVDeadTuples *dead_tuples, double reltuples);
+			  LVDeadTuples *dead_tuples, double reltuples, LVRelStats *vacrelstats);
 static void lazy_cleanup_index(Relation indrel,
 			   IndexBulkDeleteResult **stats,
-			   double reltuples, bool estimated_count);
+			   double reltuples, bool estimated_count, LVRelStats *vacrelstats);
 static int	lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer,
 			 int tupindex, LVRelStats *vacrelstats, Buffer *vmbuffer);
 static bool should_attempt_truncation(VacuumParams *params,
@@ -337,13 +353,13 @@ 

Re: [PATCH] Add schema and table names to partition error

2020-03-20 Thread Amit Kapila
On Thu, Mar 19, 2020 at 8:21 PM Chris Bandy  wrote:
>
> On 3/18/20 11:46 PM, Amit Kapila wrote:
> > On Thu, Mar 19, 2020 at 3:55 AM Chris Bandy  wrote:
> >>
> >>
> >> Sorry for these troubles. Attached are patches created using `git
> >> format-patch -n -v6` on master at 487e9861d0.
> >>
> >
> > No problem.  I have extracted your code changes as a separate patch
> > (see attached) as I am not sure we want to add tests for these cases.
>
> Patch looks good.
>
> My last pitch to keep the tests: These would be the first and only
> automated tests that verify errtable, errtableconstraint, etc.
>

I don't object to those tests.  However, I don't feel adding them just for
this patch is advisable.  I suggest you start a new thread for these
tests and let us see what others think about them.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-20 Thread Andres Freund
Hi,

On 2020-03-20 06:59:57 +0100, Laurenz Albe wrote:
> On Thu, 2020-03-19 at 15:17 -0700, Andres Freund wrote:
> > I am *VERY* doubtful that the attempt of using a large threshold, and a
> > tiny scale factor, is going to work out well. I'm not confident enough
> > in my gut feeling to full throatedly object, but confident enough that
> > I'd immediately change it on any important database I operated.
> > 
> > Independent of how large a constant you set the threshold to, for
> > databases with substantially bigger tables this will lead to [near]
> > constant vacuuming. As soon as you hit 1 billion rows - which isn't
> > actually that much - this is equivalent to setting
> > autovacuum_{vacuum,analyze}_scale_factor to 0.01. There's cases where
> > that can be a sensible setting, but I don't think anybody would suggest
> > it as a default.
> 
> In that, you are assuming that the bigger a table is, the more data
> modifications it will get, so that making the scale factor the dominant
> element will work out better.

> My experience is that it is more likely for the change rate (inserts,
> I am less certain about updates and deletes) to be independent of the
> table size.  (Too) many large databases are so large not because the
> data influx grows linearly over time, but because people don't want to
> get rid of old data (or would very much like to do so, but never planned
> for it).

I don't think a growing ingest rate into insert-only tables is exactly
rare. Maybe I've been in the Bay Area too long, though.


> This second scenario would be much better served by a high threshold and
> a low scale factor.

I don't think that's really true. As soon as there are any gin/gist
indexes, a single non-HOT dead tuple, or a btree index that grew by more
than vacuum_cleanup_index_scale_factor, indexes are scanned as a
whole. See the email I just concurrently happened to write:
https://postgr.es/m/20200320062031.uwagypenawujwajx%40alap3.anarazel.de

Which means that often each additional vacuum causes IO that's
proportional to the *total* index size, *not* the table size
delta. Which means that the difference in total IO basically is
O(increased_frequency * peak_table_size) in the worst case.




> > After thinking about it for a while, I think it's fundamentally flawed
> > to use large constant thresholds to avoid unnecessary vacuums. It's easy
> > to see cases where it's bad for common databases of today, but it'll be
> > much worse a few years down the line where common table sizes have grown
> > by a magnitude or two. Nor do they address the difference between tables
> > of a certain size with e.g. 2kb wide rows, and a same sized table with
> > 28 byte wide rows.  The point of constant thresholds imo can only be to
> > avoid unnecessary work at the *small* (even tiny) end, not the opposite.
> > 
> > 
> > I think there's too much "reinventing" autovacuum scheduling in a
> > "local" insert-only manner happening in this thread. And as far as I can
> > tell additionally only looking at a somewhat narrow slice of insert only
> > workloads.
> 
> Perhaps.  The traditional "high scale factor, low threshold" system
> is (in my perception) mostly based on the objective of cleaning up
> dead tuples.  When autovacuum was introduced, index only scans were
> only a dream.
> 
> With the objective of getting rid of dead tuples, having the scale factor
> be the dominant part makes sense: it is OK for bloat to be a certain
> percentage of the table size.
> 

As far as I can tell this argument doesn't make sense in light of the
fact that many vacuums trigger whole-index scans, even if there are no
deleted tuples, as described above?


Even disregarding the index issue, I still don't think your argument is
very convincing.  For one, as I mentioned in another recent email, 10
million rows in a narrow table is something entirely different than 10
million rows in a very wide table. scale_factor doesn't have that
problem to the same degree.  Also, it's fairly obvious that this
argument doesn't hold in the general sense, otherwise we could just set
a threshold of, say, 1.

There's also the issue that frequent vacuums will often not be able to
mark most of the new data all-visible, due to concurrent
sessions. E.g. concurrent bulk loading sessions, analytics queries
actually looking at the data, replicas all can easily prevent data that
was just inserted from being marked 'all-visible' (not to speak of
frozen). That's not likely to be a problem in a purely oltp system that
inserts only single rows per xact, and has no longlived readers (nor
replicas with hs_feedback = on), but outside of that...


> Also, as you say, tables were much smaller then, and they will only
> become bigger in the future.  But I find that to be an argument *for*
> making the threshold the dominant element: otherwise, you vacuum less
> and less often, and the individual runs become larger and larger.

Which mostly is ok, because there are significant costs that 

Re: replay pause vs. standby promotion

2020-03-20 Thread Atsushi Torikoshi
On Fri, Mar 6, 2020 at 10:18 PM Fujii Masao 
wrote:

>
> OK, so patch attached.
>
> This patch causes, if a promotion is triggered while recovery is paused,
> the paused state to end and a promotion to continue. OTOH, this patch
> makes pg_wal_replay_pause() and _resume() throw an error if it's executed
> while a promotion is ongoing.

Regarding recovery_target_action, if the recovery target is reached
> while a promotion is ongoing, the "pause" setting will act the same as
> "promote",
> i.e., recovery will finish and the server will start to accept connections.
>
> To implement the above, I added a new shared variable indicating whether
> a promotion is triggered or not. Only the startup process can update this
> shared
> variable. Other processes like read-only backends can check whether a
> promotion is ongoing via this variable.
>
> I added a new function PromoteIsTriggered() that returns true if a promotion
> is triggered. Since the name of this function and the existing function
> IsPromoteTriggered() are confusingly similar, I changed the name of
> IsPromoteTriggered() to IsPromoteSignaled(), which is a more appropriate name.
>

I've confirmed the patch works as you described above.
And I also poked around it a little bit but found no problems.

Regards,

--
Atsushi Torikoshi


Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-20 Thread Andres Freund
Hi,

On 2020-03-19 06:45:48 +0100, Laurenz Albe wrote:
> On Tue, 2020-03-17 at 18:02 -0700, Andres Freund wrote:
> > I don't think a default scale factor of 0 is going to be ok. For
> > large-ish tables this will basically cause permanent vacuums. And it'll
> > sometimes trigger for tables that actually coped well so far. 10 million
> > rows could be a few seconds, not more.
> > 
> > I don't think that the argument that otherwise a table might not get
> > vacuumed before autovacuum_freeze_max_age is convincing enough.
> > 
> > a) if that's indeed the argument, we should increase the default
> >   autovacuum_freeze_max_age - now that there's insert triggered vacuums,
> >   the main argument against that from before isn't valid anymore.
> > 
> > b) there's not really a good argument for vacuuming more often than
> >   autovacuum_freeze_max_age for such tables. It'll not be frequent
> >   enough to allow IOS for new data, and you're not preventing
> >   anti-wraparound vacuums from happening.
> 
> According to my reckoning, that is the remaining objection to the patch
> as it is (with ordinary freezing behavior).
> 
> How about a scale_factor of 0.005?  That will be high enough for large
> tables, which seem to be the main concern here.
> 
> I fully agree with your point a) - should that be part of the patch?
> 
> I am not sure about b).  In my mind, the objective is not to prevent
> anti-wraparound vacuums, but to see that they have less work to do,
> because previous autovacuum runs already have frozen anything older than
> vacuum_freeze_min_age.  So, assuming linear growth, the number of tuples
> to freeze during any run would be at most one fourth of today's number
> when we hit autovacuum_freeze_max_age.

Based on two IM conversations I think it might be worth emphasizing how
vacuum_cleanup_index_scale_factor works:

For btree, even if there is not a single deleted tuple, we can *still*
end up doing a full index scan at the end of vacuum. As the docs describe
vacuum_cleanup_index_scale_factor:

   
Specifies the fraction of the total number of heap tuples counted in
the previous statistics collection that can be inserted without
incurring an index scan at the VACUUM cleanup stage.
This setting currently applies to B-tree indexes only.
   

I.e. with the default settings we will perform a whole-index scan
(without visibility map or such) after every 10% growth of the
table. Which means that, even if the visibility map prevents repeated
table accesses, increasing the rate of vacuuming for insert-only tables
can cause a lot more whole index scans.  Which means that vacuuming an
insert-only workload frequently *will* increase the total amount of IO,
even if there is not a single dead tuple. Rather than just spreading the
same amount of IO over more vacuums.

And both gin and gist just always do a full index scan, regardless of
vacuum_cleanup_index_scale_factor (either during a bulk delete, or
during the cleanup).  Thus more frequent vacuuming for insert-only
tables can cause a *lot* of pain (even an approx quadratic increase of
IO?  O(increased_frequency * peak_index_size)?) if you have large
indexes - which is very common for gin/gist.
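
For scale: with the default of 0.1, a btree index on a table that grew by
10% gets a full cleanup scan even with zero dead tuples. A hedged
mitigation sketch for an insert-only table (the setting is also a btree
storage parameter, not just a GUC):

ALTER INDEX big_insert_only_pkey
    SET (vacuum_cleanup_index_scale_factor = 10.0);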


Is there something missing in the above description?

Greetings,

Andres Freund




Re: Internal key management system

2020-03-20 Thread Masahiko Sawada
On Fri, 20 Mar 2020 at 01:38, Bruce Momjian  wrote:
>
> On Fri, Mar 20, 2020 at 12:50:27AM +0900, Masahiko Sawada wrote:
> > On Fri, Mar 20, 2020 at 0:35 Bruce Momjian  wrote:
> > Well, the issue is if the user can control the user key, there might
> > be a way to make the user key do nothing.
> >
> > Well I meant ‘USER_KEY:’ is a fixed-length string for the key used for wrap
> > and unwrap SQL interface functions. So the user cannot control it. We will have
> > another key derived by, for example, HKDF(MK, ‘TDE_KEY:’ || system_identifier) for
> > block encryption.
>
> OK, yes, something like that might make sense.
>

Attached is the updated version of the patch. It introduces a KDF to derive
a new key from the master encryption key. We use the derived key for
the pg_wrap and pg_unwrap SQL functions, instead of directly using the
master encryption key. In the future, we will be able to have
separate derived keys for block encryption. As a result of using a KDF,
the minimum version of OpenSSL when enabling key management is 1.1.0.
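
For reference, a hedged sketch of the kind of OpenSSL HKDF call involved
(EVP_PKEY_HKDF is what requires OpenSSL >= 1.1.0; error handling and
buffer management are simplified):

#include <stdbool.h>
#include <string.h>
#include <openssl/evp.h>
#include <openssl/kdf.h>

static bool
derive_key(const unsigned char *master_key, int master_key_len,
           const char *info, unsigned char *out, size_t outlen)
{
    EVP_PKEY_CTX *pctx = EVP_PKEY_CTX_new_id(EVP_PKEY_HKDF, NULL);
    bool        ok;

    /* HKDF-SHA256: derive outlen bytes, bound to the given info string */
    ok = pctx != NULL &&
        EVP_PKEY_derive_init(pctx) > 0 &&
        EVP_PKEY_CTX_set_hkdf_md(pctx, EVP_sha256()) > 0 &&
        EVP_PKEY_CTX_set1_hkdf_key(pctx, master_key, master_key_len) > 0 &&
        EVP_PKEY_CTX_add1_hkdf_info(pctx, (unsigned char *) info,
                                    strlen(info)) > 0 &&
        EVP_PKEY_derive(pctx, out, &outlen) > 0;

    EVP_PKEY_CTX_free(pctx);
    return ok;
}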

Regards,

-- 
Masahiko Sawadahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


kms_v7.patch
Description: Binary data


Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-20 Thread Laurenz Albe
On Thu, 2020-03-19 at 14:38 -0700, Andres Freund wrote:
> > I am not sure about b).  In my mind, the objective is not to prevent
> > anti-wraparound vacuums, but to see that they have less work to do,
> > because previous autovacuum runs already have frozen anything older than
> > vacuum_freeze_min_age.  So, assuming linear growth, the number of tuples
> > to freeze during any run would be at most one fourth of today's number
> > when we hit autovacuum_freeze_max_age.
> 
> This whole chain of arguments seems like it actually has little to do
> with vacuuming insert only/mostly tables. The same problem exists for
> tables that aren't insert only/mostly. Instead it IMO is an argument for
> a general change in logic about when to freeze.

My goal was to keep individual vacuum runs from having too much
work to do.  The freezing was an afterthought.

The difference (for me) is that I am more convinced that the insert
rate for an insert-only table is constant over time than I am that the
update rate is constant.

> What exactly is it that you want to achieve by having anti-wrap vacuums
> be quicker? If the goal is to reduce the window in which autovacuums
> aren't automatically cancelled when there's a conflicting lock request,
> or in which autovacuum just schedules based on xid age, then you can't
> have wraparound vacuums needing to do substantial amount of work.
> 
> Except for not auto-cancelling, and the autovac scheduling issue,
> there's really nothing magic about anti-wrap vacuums.

Yes.  I am under the impression that it is the duration and amount
of work per vacuum run that is the problem here, not the aggressiveness
as such.

If you are in the habit of frequently locking tables with high
lock modes (and I have seen people do that), you are lost anyway:
normal autovacuum runs will always die, and anti-wraparound vacuum
will kill you.  There is nothing we can do about that, except perhaps
put a fat warning in the documentation of LOCK.

> If the goal is to avoid redundant writes, then it's largely unrelated to
> anti-wrap vacuums, and can to a large degree addressed by
> opportunistically freezing (best even during hot pruning!).
> 
> 
> I am more and more convinced that it's a seriously bad idea to tie
> committing "autovacuum after inserts" to also committing a change in
> logic around freezing. That's not to say we shouldn't try to address
> both this cycle, but discussing them as if they really are one item
> makes it both more likely that we get nothing in, and more likely that
> we miss the larger picture.

I hear you, and I agree that we shouldn't do it with this patch.

> If there are no other modifications to the page, more aggressively
> freezing can lead to seriously increased write volume. Its quite normal
> to have databases where data in insert only tables *never* gets old
> enough to need to be frozen (either because xid usage is low, or because
> older partitions are dropped).  If data in an insert-only table isn't
> write-only, the hint bits are likely to already be set, which means that
> vacuum will just cause the entire table to be written another time,
> without a reason.
> 
> 
> I don't see how it's ok to substantially regress this very common
> workload. IMO this basically means that more aggressively and
> non-opportunistically freezing simply is a no-go (be it for insert or
> other causes for vacuuming).
> 
> What am I missing?

Nothing that I can see, and these are good examples why eager freezing
may not be such a smart idea after all.

I think your idea of freezing everything on a page when we know it is
going to be dirtied anyway is the smartest way of going about that.
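
As a toy illustration of that idea (self-contained C, not PostgreSQL
code): freezing only rewrites tuple headers, so doing it on a page that
some other change has already dirtied adds no extra page write:

#include <stdbool.h>
#include <stdint.h>

typedef struct
{
    uint32_t    xmin;       /* inserting transaction */
    bool        frozen;
} Tuple;

typedef struct
{
    Tuple       tuples[64];
    int         ntuples;
    bool        dirty;      /* page will be written out regardless */
} Page;

/*
 * Freeze all tuples older than freeze_limit, but only if the page is
 * already dirty -- never turn a clean page into an extra write.
 * (Toy model: real code would also have to handle xid wraparound.)
 */
void
maybe_freeze_page(Page *page, uint32_t freeze_limit)
{
    if (!page->dirty)
        return;

    for (int i = 0; i < page->ntuples; i++)
    {
        if (!page->tuples[i].frozen && page->tuples[i].xmin < freeze_limit)
            page->tuples[i].frozen = true;
    }
}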

My only remaining quibbles are about scale factor and threshold, see
my other mail.

Yours,
Laurenz Albe


Re: Improving connection scalability: GetSnapshotData()

2020-03-20 Thread Andres Freund
Hi,

Thanks for looking!

On 2020-03-20 18:23:03 +1300, Thomas Munro wrote:
> On Sun, Mar 1, 2020 at 9:36 PM Andres Freund  wrote:
> > I'm still working on cleaning that part of the patch up, I'll post it in
> > a bit.
> 
> I looked at that part on your public pgxact-split branch.  In that
> version you used "CSN" rather than something based on LSNs, which I
> assume avoids complications relating to WAL locking or something like
> that.

Right, I first tried to use LSNs, but after further tinkering found that
it's too hard to address the difference between visibility order and LSN
order (a transaction's commit record can have a smaller LSN than
another's and yet become visible later, because visibility is
established separately from WAL insertion). I don't think there's an
easy way to address the difference.


> We should probably be careful to avoid confusion with the
> pre-existing use of the term "commit sequence number" (CommitSeqNo,
> CSN) that appears in predicate.c.

I looked at that after you mentioned it on IM. But I find it hard to
grok how it's precisely defined. There are basically no comments
explaining what it's really supposed to do, and I find the relevant code
far from easy to grok :(.


> This also calls to mind the 2013-2016 work by Ants Aasma and others[1]
> on CSN-based snapshots, which is obviously a much more radical change,
> but really means what it says (commits).

Well, I think you could actually build some form of denser snapshots
on top of "my" CSN, with a bit of effort (and a lot of handwaving). I
don't think they're that different concepts.


> The CSN in your patch set is used purely as a level-change for
> snapshot cache invalidation IIUC, and it advances also for aborts --
> so maybe it should be called something like completed_xact_count,
> using existing terminology from procarray.c.

I expect it to be used outside of snapshots too, in the future, FWIW.

completed_xact_count sounds good to me.


> +   if (snapshot->csn != 0 && MyProc->xidCopy == InvalidTransactionId &&
> +   UINT64_ACCESS_ONCE(ShmemVariableCache->csn) == snapshot->csn)
> 
> Why is it OK to read ShmemVariableCache->csn without at least a read
> barrier?  I suppose this allows a cached snapshot to be used very soon
> after a transaction commits and should be visible to you, but ...
> hmmmrkwjherkjhg... I guess it must be really hard to observe any
> anomaly.  Let's see... maybe it's possible on a relaxed memory system
> like POWER or ARM, if you use a shm flag to say "hey I just committed
> a transaction", and the other guy sees the flag but can't yet see the
> new CSN, so an SPI query can't see the transaction?

Yea, it does need more thought / comments. I can't really see an actual
correctness violation though. As far as I can tell you'd never be able
to get an "older" ShmemVariableCache->csn than one from *after* the
last lock acquired/released by the current backend - which then also
means a different "ordering" would have been possible, allowing the
current backend to take the snapshot earlier.


> Another theoretical problem is the non-atomic read of a uint64 on some
> 32 bit platforms.

Yea, it probably should be a pg_atomic_uint64 to address that. I don't
think a torn read would actually cause problems, because it'd always
end up forcing an unnecessary snapshot build. But there's no need to go
there.
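
A sketch of how that could look (hypothetical code written against
PostgreSQL's atomics API from src/include/port/atomics.h; the CSN field
and variable names are made up):

#include "postgres.h"

#include "port/atomics.h"

/* In a real server this would live in a shared-memory struct; it is
 * bumped on every commit *and* abort. */
static pg_atomic_uint64 completed_xact_count;

/*
 * Can a cached snapshot still be reused?  pg_atomic_read_u64() is
 * atomic even on 32-bit platforms, which removes the torn-read
 * concern; a stale value can only make the comparison fail and force
 * a fresh (and therefore correct) snapshot build.
 */
static bool
CachedSnapshotStillValid(uint64 snapshot_csn)
{
    return snapshot_csn != 0 &&
        pg_atomic_read_u64(&completed_xact_count) == snapshot_csn;
}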


> > 0007: WIP: Introduce abstraction layer for "is tuple invisible" tests.
> >
> >   This is the most crucial piece. Instead of code directly using
> >   RecentOldestXmin, there's a new set of functions for testing
> >   whether an xid is visible (InvisibleToEveryoneTestXid() et al).
> >
> >   Those function use new horizon boundaries computed as part of
> >   GetSnapshotData(), and recompute an accurate boundary when the
> >   tested xid falls in between.
> >
> >   There's a bit more infrastructure needed - we need to limit how
> >   often an accurate snapshot is computed. Probably to once per
> >   snapshot? Or once per transaction?
> >
> >
> >   To avoid issues with the lower boundary getting too old and
> >   presenting a wraparound danger, I made all the xids be
> >   FullTransactionIds. That imo is a good thing?
> 
> +1, as long as we don't just move the wraparound danger to the places
> where we convert xids to fxids!
> 
> +/*
> + * Be very careful about when to use this function. It can only safely be 
> used
> + * when there is a guarantee that, at the time of the call, xid is within 2
> + * billion xids of rel. That e.g. can be guaranteed if the caller assures
> + * a snapshot is held by the backend, and xid is from a table (where
> + * vacuum/freezing ensures the xid has to be within that range).
> + */
> +static inline FullTransactionId
> +FullXidViaRelative(FullTransactionId rel, TransactionId xid)
> +{
> +   uint32 rel_epoch = EpochFromFullTransactionId(rel);
> +   TransactionId rel_xid = XidFromFullTransactionId(rel);
> +   uint32 epoch;
> +
> +   /*
> +* TODO: A function to easily write an assertion ensuring that xid is
> +* between [oldestXid, 

Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-20 Thread Laurenz Albe
On Thu, 2020-03-19 at 15:17 -0700, Andres Freund wrote:
> I am doubtful it should be committed with the current settings. See below.
> 
> > From 3ba4b572d82969bbb2af787d1bccc72f417ad3a0 Mon Sep 17 00:00:00 2001
> > From: Laurenz Albe 
> > Date: Thu, 19 Mar 2020 20:26:43 +0100
> > Subject: [PATCH] Autovacuum tables that have received only inserts
> > 
> > This avoids the known problem that insert-only tables
> > are never autovacuumed until they need to have their
> > anti-wraparound autovacuum, which then can be massive
> > and disruptive.
> 
> Shouldn't this also mention index only scans? IMO that's at least as big
> a problem as the "large vacuum" problem.

Yes, that would be good.

> I am *VERY* doubtful that the attempt of using a large threshold, and a
> tiny scale factor, is going to work out well. I'm not confident enough
> in my gut feeling to full-throatedly object, but confident enough that
> I'd immediately change it on any important database I operated.
> 
> Independent of how large a constant you set the threshold to, for
> databases with substantially bigger tables this will lead to [near]
> constant vacuuming. As soon as you hit 1 billion rows - which isn't
> actually that much - this is equivalent to setting
> autovacuum_{vacuum,analyze}_scale_factor to 0.01. There's cases where
> that can be a sensible setting, but I don't think anybody would suggest
> it as a default.

In that, you are assuming that the bigger a table is, the more data
modifications it will get, so that making the scale factor the dominant
element will work out better.

My experience is that it is more likely for the change rate (inserts,
I am less certain about updates and deletes) to be independent of the
table size.  (Too) many large databases are so large not because the
data influx grows linearly over time, but because people don't want to
get rid of old data (or would very much like to do so, but never planned
for it).

This second scenario would be much better served by a high threshold and
a low scale factor.
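
To make the arithmetic concrete, here is a toy model of the trigger
condition (hypothetical names; the 10 million threshold is the value
implied by Andres's 1-billion-row example above):

#include <stdbool.h>
#include <stdio.h>

/* Same shape as the usual autovacuum trigger: fire once the number of
 * changes exceeds threshold + scale_factor * reltuples. */
static bool
needs_insert_vacuum(double reltuples, double inserts_since_vacuum,
                    double threshold, double scale_factor)
{
    return inserts_since_vacuum > threshold + scale_factor * reltuples;
}

int
main(void)
{
    /* threshold = 10 million, scale_factor = 0: a 1-billion-row table
     * is vacuumed after every 10 million inserts, i.e. once 1% of the
     * table has changed -- the same effect as scale_factor = 0.01. */
    printf("%d\n", needs_insert_vacuum(1e9, 10e6 + 1, 10e6, 0.0));
    return 0;
}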

> After thinking about it for a while, I think it's fundamentally flawed
> to use large constant thresholds to avoid unnecessary vacuums. It's easy
> to see cases where it's bad for common databases of today, but it'll be
> much worse a few years down the line where common table sizes have grown
> by a magnitude or two. Nor do they address the difference between tables
> of a certain size with e.g. 2kb wide rows, and a same sized table with
> 28 byte wide rows.  The point of constant thresholds imo can only be to
> avoid unnecessary work at the *small* (even tiny) end, not the opposite.
> 
> 
> I think there's too much "reinventing" autovacuum scheduling in a
> "local" insert-only manner happening in this thread. And as far as I can
> tell additionally only looking at a somewhat narrow slice of insert only
> workloads.

Perhaps.  The traditional "high scale factor, low threshold" system
is (in my perception) mostly based on the objective of cleaning up
dead tuples.  When autovacuum was introduced, index only scans were
only a dream.

With the objective of getting rid of dead tuples, having the scale factor
be the dominant part makes sense: it is OK for bloat to be a certain
percentage of the table size.

Also, as you say, tables were much smaller then, and they will only
become bigger in the future.  But I find that to be an argument *for*
making the threshold the dominant element: otherwise, you vacuum less
and less often, and the individual runs become larger and larger.
Now that vacuum skips pages where it knows it has nothing to do,
doesn't that take away much of the pain of vacuuming large tables
where nothing much has changed?

> I, again, strongly suggest using much more conservative values here. And
> then try to address the shortcomings - like not freezing aggressively
> enough - in separate patches (and by now separate releases, in all
> likelihood).

There is much to say for that, I agree.

> This will have a huge impact on a lot of postgres
> installations. Autovacuum already is perceived as one of the biggest
> issues around postgres. If the ratio of cases where these changes
> improve things to the cases it regresses isn't huge, it'll be painful
> (silent improvements are obviously less noticed than breakages).

Yes, that makes it scary to mess with autovacuum.

One of the problems I see in the course of this discussion is that one
can always come up with examples that make any choice look bad.
It is impossible to do it right for everybody.

In the light of that, I won't object to a more conservative default
value for the parameters, even though my considerations above suggest
to me the opposite.  But perhaps my conclusions are based on flawed
premises.

Yours,
Laurenz Albe