Re: [PATCH] /src/backend/access/transam/xlog.c, tiny improvements

2020-01-26 Thread Michael Paquier
On Sun, Jan 26, 2020 at 06:47:57PM -0800, Mark Dilger wrote:
> There is something unusual about comparing a XLogSegNo variable in
> this way, but it seems to go back to 2014 when the replication slots
> were introduced in commit 858ec11858a914d4c380971985709b6d6b7dd6fc,
> and XLogSegNo was unsigned then, too.  Depending on how you look at
> it, this could be a thinko, or it could be defensive programming
> against future changes to the XLogSegNo typedef.  I’m betting it was
> defensive programming, given the context.  As such, I don’t think it
> would be appropriate to remove this defense in your patch. 

Yeah.  To be honest, I am not actually sure if it is worth bothering
about any of those three places.
--
Michael




Re: can we use different function in place of atoi in vacuumdb.c file

2020-01-26 Thread Surafel Temesgen
Hi,
On Thu, Jan 23, 2020 at 3:56 PM Mahendra Singh Thalor 
wrote:

> Hi all,
> While reviewing one patch, I found that if we give any non-integer string
> to atoi (say aa), then it is returning zero(0) as output so we are not
> giving any error(assuming 0 as valid argument) and continuing our
> operations.
>
> Ex:
> Let say, we gave "-P aa" (patch is in review[1]), then it will disable
> parallel vacuum because atoi is returning zero as parallel degree but
> ideally it should give error or at least it should not disable parallel
> vacuum.
>
> I think, in-place of atoi, we should use different function ( defGetInt32,
> strtoint) or we can write own function.
>
> Thoughts?
>
> [1]:
> https://www.postgresql.org/message-id/CA%2Bfd4k6DgwtQSr4%3DUeY%2BWbGuF7-oD%3Dm-ypHPy%2BsYHiXZc%2BhTUQ%40mail.gmail.com
> --
>


For the server side there is also the scanint8 function for string parsing,
and for the client side we can use the strtoint function, which doesn't have
the above issue and accepts a char pointer, which I think is suitable for the
usage in [1].
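As a self-contained illustration of both the atoi problem and the kind of
check being proposed (plain strtol() is used here so the snippet compiles on
its own; the strtoint() mentioned above follows, as far as I know, the same
endptr/errno convention):

#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Parse an integer option strictly: reject junk instead of returning 0. */
static bool
parse_int_arg(const char *arg, int *result)
{
	char	   *endptr;
	long		val;

	errno = 0;
	val = strtol(arg, &endptr, 10);

	/* no digits at all, trailing garbage, or out of range for int */
	if (endptr == arg || *endptr != '\0' ||
		errno == ERANGE || val < INT_MIN || val > INT_MAX)
		return false;

	*result = (int) val;
	return true;
}

int
main(void)
{
	int			degree;

	printf("atoi(\"aa\") = %d\n", atoi("aa"));	/* prints 0, silently */

	if (!parse_int_arg("aa", &degree))
		fprintf(stderr, "error: parallel degree must be an integer\n");

	if (parse_int_arg("3", &degree))
		printf("parallel degree = %d\n", degree);

	return 0;
}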

regards

Surafel


RE: [PoC] Non-volatile WAL buffer

2020-01-26 Thread Takashi Menjo
Hello Heikki,

> I have the same comments on this that I had on the previous patch, see:
> 
> https://www.postgresql.org/message-id/2aec6e2a-6a32-0c39-e4e2-aad854543aa8%40iki.fi

Thanks.  I re-read your messages [1][2].  What you meant, AFAIU, is to use
memory-mapped WAL segment files as WAL buffers, and to sync inserted WAL
records with either CPU cache-flush instructions or msync(), depending on
whether the segment files are on PMEM or not.

It sounds reasonable, but I'm sorry that I haven't tested such a program
yet.  I'll try it to compare with my non-volatile WAL buffer.  For now, I'm
a little worried about the overhead of mmap()/munmap() for each WAL segment
file.
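
For what it's worth, a rough, untested sketch of that mmap-based approach
(not taken from either patch) could look like the following, using libpmem;
the segment file name, record contents, and error handling are placeholders:

#include <stdio.h>
#include <string.h>
#include <libpmem.h>

#define WAL_SEG_SIZE (16 * 1024 * 1024)

int
main(void)
{
	size_t		mapped_len;
	int			is_pmem;
	char	   *buf;

	/* create or open the WAL segment file and map it */
	buf = pmem_map_file("000000010000000000000001", WAL_SEG_SIZE,
						PMEM_FILE_CREATE, 0600, &mapped_len, &is_pmem);
	if (buf == NULL)
	{
		perror("pmem_map_file");
		return 1;
	}

	/* "insert" a WAL record directly into the mapped segment */
	memcpy(buf, "some WAL record", 15);

	/* flush with CPU cache-flush instructions on PMEM, msync() otherwise */
	if (is_pmem)
		pmem_persist(buf, 15);
	else
		pmem_msync(buf, 15);

	pmem_unmap(buf, mapped_len);
	return 0;
}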

You also mentioned the SIGBUS problem with memory-mapped I/O.  I think it
applies to reading from bad memory blocks, as you mentioned, and also to
writing to such blocks [3].  Handling SIGBUS properly or working around it
is future work.

Best regards,
Takashi

[1] 
https://www.postgresql.org/message-id/83eafbfd-d9c5-6623-2423-7cab1be3888c%40iki.fi
[2] 
https://www.postgresql.org/message-id/2aec6e2a-6a32-0c39-e4e2-aad854543aa8%40iki.fi
[3] https://pmem.io/2018/11/26/bad-blocks.htm

-- 
Takashi Menjo 
NTT Software Innovation Center







Re: error context for vacuum to include block number

2020-01-26 Thread Masahiko Sawada
On Mon, 27 Jan 2020 at 14:38, Justin Pryzby  wrote:
>
> On Sun, Jan 26, 2020 at 12:29:38PM -0800, Andres Freund wrote:
> > > postgres=# SET client_min_messages=debug;SET statement_timeout=99; VACUUM 
> > > (VERBOSE, PARALLEL 0) t;
> > > INFO:  vacuuming "public.t"
> > > DEBUG:  "t_a_idx": vacuuming index
> > > 2020-01-20 15:47:36.338 CST [20139] ERROR:  canceling statement due to 
> > > statement timeout
> > > 2020-01-20 15:47:36.338 CST [20139] CONTEXT:  while vacuuming relation 
> > > "public.t_a_idx"
> > > 2020-01-20 15:47:36.338 CST [20139] STATEMENT:  VACUUM (VERBOSE, PARALLEL 
> > > 0) t;
> > > ERROR:  canceling statement due to statement timeout
> > > CONTEXT:  while vacuuming relation "public.t_a_idx"
> >
> > It'd be a bit nicer if it said index "public.t_a_idx" for relation 
> > "public.t".
>
> I think that tips the scale in favour of making vacrelstats a global.
> I added that as a 1st patch, and squished the callback patches into one.

Hmm, I don't think it's a good idea to make vacrelstats global. If we
want to display the relation name and its index name in the error context,
we might want to define a new struct dedicated to error context
reporting. That is, it would have blkno, the stage, and the relation and
schema names for both the table and the index, and we would set these
fields of the callback argument before performing each vacuum phase. We
wouldn't change LVRelStats at all.
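
For instance, something along these lines (a sketch only; field and function
names are illustrative, and it assumes the usual backend headers):

/* Callback argument dedicated to error context reporting (illustrative) */
typedef struct LVErrorCbArg
{
	char	   *relnamespace;	/* schema of the table being vacuumed */
	char	   *relname;		/* table name */
	char	   *indname;		/* index name, or NULL while working on the heap */
	BlockNumber blkno;			/* current block, or InvalidBlockNumber */
	int			phase;			/* 0: scan heap; 1: vacuum heap; 2: vacuum index */
} LVErrorCbArg;

static void
vacuum_error_callback(void *arg)
{
	LVErrorCbArg *cbarg = (LVErrorCbArg *) arg;

	if (cbarg->phase == 2 && cbarg->indname != NULL)
		errcontext("while vacuuming index \"%s\" on table \"%s.%s\"",
				   cbarg->indname, cbarg->relnamespace, cbarg->relname);
	else
		errcontext("while vacuuming block %u of relation \"%s.%s\"",
				   cbarg->blkno, cbarg->relnamespace, cbarg->relname);
}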

Although the patch replaces get_namespace_name and
RelationGetRelationName, we use the namespace name of the relation in only
two places, and almost all ereport/elog messages use only the relation
name obtained by RelationGetRelationName, which is a macro that accesses
the relation name in the Relation struct. So I think adding relname to
LVRelStats would not be a big benefit. Similarly, adding the table
namespace to LVRelStats would be good to avoid calling
get_namespace_name, but I'm not sure it's worth having because it's not
expected to be called really many times.

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: error context for vacuum to include block number

2020-01-26 Thread Justin Pryzby
It occurred to me that there's an issue with sharing vacrelstats between
scan/vacuum, since blkno and stage are set by the heap/index vacuum routines,
but not reset on their return to heap scan.  Not sure if we should reset them,
or go back to using a separate struct, like it was here:
https://www.postgresql.org/message-id/20200120054159.GT26045%40telsasoft.com

On Sun, Jan 26, 2020 at 11:38:13PM -0600, Justin Pryzby wrote:
> From 592a77554f99b5ff9035c55bf19a79a1443ae59e Mon Sep 17 00:00:00 2001
> From: Justin Pryzby 
> Date: Thu, 12 Dec 2019 20:54:37 -0600
> Subject: [PATCH v14 2/3] vacuum errcontext to show block being processed
> 
> As requested here.
> https://www.postgresql.org/message-id/20190807235154.erbmr4o4bo6vgnjv%40alap3.anarazel.de
> ---
>  src/backend/access/heap/vacuumlazy.c | 85 
> +++-
>  1 file changed, 84 insertions(+), 1 deletion(-)
> 
> diff --git a/src/backend/access/heap/vacuumlazy.c 
> b/src/backend/access/heap/vacuumlazy.c
> index 114428b..a62dc79 100644
> --- a/src/backend/access/heap/vacuumlazy.c
> +++ b/src/backend/access/heap/vacuumlazy.c
> @@ -290,8 +290,14 @@ typedef struct LVRelStats
>   int num_index_scans;
>   TransactionId latestRemovedXid;
>   boollock_waiter_detected;
> -} LVRelStats;
>  
> + /* Used by the error callback */
> + char*relname;
> + char*relnamespace;
> + BlockNumber blkno;
> + char*indname;
> + int stage;  /* 0: scan heap; 1: vacuum heap; 2: 
> vacuum index */
> +} LVRelStats;
>  
>  /* A few variables that don't seem worth passing around as parameters */
>  static int   elevel = -1;
> @@ -360,6 +366,7 @@ static void end_parallel_vacuum(Relation *Irel, 
> IndexBulkDeleteResult **stats,
>   LVParallelState 
> *lps, int nindexes);
>  static LVSharedIndStats *get_indstats(LVShared *lvshared, int n);
>  static bool skip_parallel_vacuum_index(Relation indrel, LVShared *lvshared);
> +static void vacuum_error_callback(void *arg);
>  
>  
>  /*
> @@ -721,6 +728,7 @@ lazy_scan_heap(Relation onerel, VacuumParams *params,
>   PROGRESS_VACUUM_MAX_DEAD_TUPLES
>   };
>   int64   initprog_val[3];
> + ErrorContextCallback errcallback;
>  
>   pg_rusage_init();
>  
> @@ -867,6 +875,17 @@ lazy_scan_heap(Relation onerel, VacuumParams *params,
>   else
>   skipping_blocks = false;
>  
> + /* Setup error traceback support for ereport() */
> + vacrelstats.relnamespace = 
> get_namespace_name(RelationGetNamespace(onerel));
> + vacrelstats.relname = relname;
> + vacrelstats.blkno = InvalidBlockNumber; /* Not known yet */
> + vacrelstats.stage = 0;
> +
> + errcallback.callback = vacuum_error_callback;
> + errcallback.arg = (void *) &vacrelstats;
> + errcallback.previous = error_context_stack;
> + error_context_stack = &errcallback;
> +
>   for (blkno = 0; blkno < nblocks; blkno++)
>   {
>   Buffer  buf;
> @@ -888,6 +907,8 @@ lazy_scan_heap(Relation onerel, VacuumParams *params,
>  #define FORCE_CHECK_PAGE() \
>   (blkno == nblocks - 1 && should_attempt_truncation(params))
>  
> + vacrelstats.blkno = blkno;
> +
>   pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, 
> blkno);
>  
>   if (blkno == next_unskippable_block)
> @@ -984,12 +1005,18 @@ lazy_scan_heap(Relation onerel, VacuumParams *params,
>   vmbuffer = InvalidBuffer;
>   }
>  
> + /* Pop the error context stack */
> + error_context_stack = errcallback.previous;
> +
>   /* Work on all the indexes, then the heap */
>   lazy_vacuum_all_indexes(onerel, Irel, indstats,
>   lps, 
> nindexes);
>   /* Remove tuples from heap */
>   lazy_vacuum_heap(onerel);
>  
> + /* Replace error context while continuing heap scan */
> + error_context_stack = &errcallback;
> +
>   /*
>* Forget the now-vacuumed tuples, and press on, but be 
> careful
>* not to reset latestRemovedXid since we want that 
> value to be
> @@ -1593,6 +1620,9 @@ lazy_scan_heap(Relation onerel, VacuumParams *params,
>   RecordPageWithFreeSpace(onerel, blkno, freespace);
>   }
>  
> + /* Pop the error context stack */
> + error_context_stack = errcallback.previous;
> +
>   /* report that everything is scanned and vacuumed */
>   pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, blkno);
>  
> @@ -1768,11 +1798,24 @@ lazy_vacuum_heap(Relation onerel)
>   int npages;
>   PGRUsageru0;
>   Buffer  

Re: [HACKERS] WAL logging problem in 9.4.3?

2020-01-26 Thread Kyotaro Horiguchi
By the way, the previous version looks somewhat different from what I
thought I posted..

At Sun, 26 Jan 2020 20:57:00 -0800, Noah Misch  wrote in 
> On Mon, Jan 27, 2020 at 01:44:13PM +0900, Kyotaro Horiguchi wrote:
> > > The purpose of this loop is to create relcache entries for rels locked in 
> > > the
> > > current transaction.  (The "r == NULL" case happens for rels no longer 
> > > visible
> > > in catalogs.  It is harmless.)  Since RelationIdGetRelationCache() never
> > > creates a relcache entry, calling it defeats that purpose.
> > > RelationIdGetRelation() is the right function to call.
> > 
> > I thought that the all required entry exist in the cache but actually
> > it's safer that recreate dropped caches. Does the following works?
> > 
> > r = RelationIdGetRelation(relid);
> > +   /* if not found, fetch a "dropped" entry if any  */
> > +   if (r == NULL)
> > +   r = RelationIdGetRelationCache(relid);
> > if (r == NULL)
> > continue;
> 
> That does not materially change the function's behavior.  Notice that the
> function does one thing with "r", which is to call RelationClose(r).  The
> function calls RelationIdGetRelation() for its side effects, not for its
> return value.

..Right.  The following loop accesses the relcache hash directly, so there
is no need to store the returned r in the rels array.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Setting min/max TLS protocol in clientside libpq

2020-01-26 Thread Michael Paquier
On Fri, Jan 24, 2020 at 12:19:31PM +0100, Daniel Gustafsson wrote:
> Attached is a v5 of the patch which hopefully address all the comments raised,
> sorry for the delay.

Thanks for the new version.

psql: error: could not connect to server: invalid or unsupported
maximum protocol version specified: TLSv1.3
Running the regression tests with OpenSSL 1.0.1, 1.0.2 or 1.1.0 fails,
because TLSv1.3 (TLS1_3_VERSION) is not supported in those versions.
I think that it is better to just rely on TLSv1.2 for all that,
knowing that the server default for the minimum bound is v1.2.

+test_connect_fails(
+   $common_connstr,
+   "sslrootcert=ssl/root+server_ca.crt sslmode=require
sslminprotocolversion=TLSv1.3 sslmaxprotocolversion=tlsv1.2",
+   qr/invalid protocol version range/,
+   "connect with an incorrect range of TLS protocol versions
leaving no versions allowed");
+
+test_connect_fails(
+   $common_connstr,
+   "sslrootcert=ssl/root+server_ca.crt sslmode=require
sslminprotocolversion=TLSv1.3 sslmaxprotocolversion=tlsv1",
+   qr/invalid protocol version range/,
+   "connect with an incorrect range of TLS protocol versions
leaving no versions allowed");
This is testing the same pattern twice, and I am not sure if it is
worth bothering about the special case with TLSv1 (using just a
comparison with pg_strcasecmp you don't actually need those special
checks..).

Tests should make sure that a failure happens when an incorrect value
is set for sslminprotocolversion and sslmaxprotocolversion.

For readability, I think that it is better to consider NULL or empty
values for the parameters to be valid.  They are actually valid
values, because they just get ignored when creating the connection.

Adding an assertion within the routine for the protocol range check to
make sure that the values are valid makes the code more robust.

+   {"sslminprotocolversion", "PGSSLMINPROTOCOLVERSION", NULL,
NULL,
+   "SSL-Minimum-Protocol-Version", "",  /*
sizeof("TLSv1.x") */ 7,
+   offsetof(struct pg_conn, sslminprotocolversion)},
+
+   {"sslmaxprotocolversion", "PGSSLMAXPROTOCOLVERSION", NULL,
NULL,
+   "SSL-Maximum-Protocol-Version", "", /*
sizeof("TLSv1.x") */ 7,
Missing a zero-terminator in the count here.  And actually
gssencmode is wrong as well..  I'll report that on a different
thread.

+# Test min/mix protocol versions
Typo here.

+bool
+pq_verify_ssl_protocol_option(const char *protocolversion)
[...]
+bool
+pq_verify_ssl_protocol_range(const char *min, const char *max)
Both routines are just used in fe-connect.c to validate the connection
parameters, so it is better to keep them static and in fe-connect.c in
my opinion.

+   if (*(min + strlen("TLSv1.")) > *(max + strlen("TLSv1.")))
+   return false;
It is enough to use pg_strcasecmp() here.
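
As a small standalone illustration of why that is enough (strcasecmp()
stands in for pg_strcasecmp() here; this is not the code from the patch):
the accepted spellings sort lexically, so one case-insensitive comparison
validates the whole range, including the plain "TLSv1" case.

#include <stdbool.h>
#include <stdio.h>
#include <strings.h>

static bool
ssl_protocol_range_ok(const char *min, const char *max)
{
	/* unset bounds are valid; they are simply ignored by the connection */
	if (min == NULL || min[0] == '\0' || max == NULL || max[0] == '\0')
		return true;

	/* "TLSv1" < "TLSv1.1" < "TLSv1.2" < "TLSv1.3" in a case-insensitive sort */
	return strcasecmp(min, max) <= 0;
}

int
main(void)
{
	printf("%d\n", ssl_protocol_range_ok("TLSv1.2", "TLSv1.3"));	/* 1 */
	printf("%d\n", ssl_protocol_range_ok("tlsv1.3", "TLSv1.2"));	/* 0 */
	printf("%d\n", ssl_protocol_range_ok("TLSv1", "TLSv1.2"));		/* 1 */
	return 0;
}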

Hm.  I am not sure that having a separate section "Client Protocol
Usage" brings much, so I have removed this one, and added an extra
sentence for the maximum protocol regarding its value for testing or
protocol compatibility.

The regression tests of postgres_fdw failed because of the new
parameters.  One update later, they run fine.

So, attached is an updated version of the patch that I have spent a
couple of hours polishing.  What do you think?
--
Michael
diff --git a/src/backend/libpq/be-secure-openssl.c b/src/backend/libpq/be-secure-openssl.c
index 0cc59f1be1..987ab660cb 100644
--- a/src/backend/libpq/be-secure-openssl.c
+++ b/src/backend/libpq/be-secure-openssl.c
@@ -1274,6 +1274,9 @@ X509_NAME_to_cstring(X509_NAME *name)
  * version, then we log with the given loglevel and return (if we return) -1.
  * If a nonnegative value is returned, subsequent code can assume it's working
  * with a supported version.
+ *
+ * Note: this is rather similar to libpq's routine in fe-secure-openssl.c,
+ * so make sure to update both routines if changing this one.
  */
 static int
 ssl_protocol_version_to_openssl(int v, const char *guc_name, int loglevel)
diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
index 80b54bc92b..8498f32f8d 100644
--- a/src/interfaces/libpq/fe-connect.c
+++ b/src/interfaces/libpq/fe-connect.c
@@ -320,6 +320,14 @@ static const internalPQconninfoOption PQconninfoOptions[] = {
 		"Require-Peer", "", 10,
 	offsetof(struct pg_conn, requirepeer)},
 
+	{"sslminprotocolversion", "PGSSLMINPROTOCOLVERSION", NULL, NULL,
+		"SSL-Minimum-Protocol-Version", "", 8,	/* sizeof("TLSv1.x") == 8 */
+	offsetof(struct pg_conn, sslminprotocolversion)},
+
+	{"sslmaxprotocolversion", "PGSSLMAXPROTOCOLVERSION", NULL, NULL,
+		"SSL-Maximum-Protocol-Version", "", 8,	/* sizeof("TLSv1.x") == 8 */
+	offsetof(struct pg_conn, sslmaxprotocolversion)},
+
 	/*
 	 * As with SSL, all GSS options are exposed even in builds that don't have
 	 * support.
@@ -426,6 +434,8 @@ static char *passwordFromFile(const char *hostname, const char *port, const char
 			  const char *username, const 

Re: error context for vacuum to include block number

2020-01-26 Thread Justin Pryzby
On Sun, Jan 26, 2020 at 12:29:38PM -0800, Andres Freund wrote:
> > postgres=# SET client_min_messages=debug;SET statement_timeout=99; VACUUM 
> > (VERBOSE, PARALLEL 0) t;
> > INFO:  vacuuming "public.t"
> > DEBUG:  "t_a_idx": vacuuming index
> > 2020-01-20 15:47:36.338 CST [20139] ERROR:  canceling statement due to 
> > statement timeout
> > 2020-01-20 15:47:36.338 CST [20139] CONTEXT:  while vacuuming relation 
> > "public.t_a_idx"
> > 2020-01-20 15:47:36.338 CST [20139] STATEMENT:  VACUUM (VERBOSE, PARALLEL 
> > 0) t;
> > ERROR:  canceling statement due to statement timeout
> > CONTEXT:  while vacuuming relation "public.t_a_idx"
> 
> It'd be a bit nicer if it said index "public.t_a_idx" for relation "public.t".

I think that tips the scale in favour of making vacrelstats a global.
I added that as a 1st patch, and squished the callback patches into one.

Also, it seems to me we shouldn't repeat the namespace of the index *and* its
table.  I tried looking for consistency here:

grep -r '\\"%s.%s\\"' --incl='*.c' |grep '\\"%s\\"'
src/backend/commands/cluster.c: (errmsg("clustering 
\"%s.%s\" using index scan on \"%s\"",
src/backend/access/heap/vacuumlazy.c:   errcontext(_("while vacuuming 
index \"%s\" on table \"%s.%s\""),

grep -r 'index \\".* table \\"' --incl='*.c'
src/backend/catalog/index.c:(errmsg("building index 
\"%s\" on table \"%s\" serially",
src/backend/catalog/index.c:
(errmsg_plural("building index \"%s\" on table \"%s\" with request for %d 
parallel worker",
src/backend/catalog/index.c:
   "building index \"%s\" on table \"%s\" with request for %d parallel workers",
src/backend/catalog/catalog.c:   errmsg("index \"%s\" 
does not belong to table \"%s\"",
src/backend/commands/indexcmds.c:   (errmsg("%s %s 
will create implicit index \"%s\" for table \"%s\"",
src/backend/commands/tablecmds.c:errmsg("index 
\"%s\" for table \"%s\" does not exist",
src/backend/commands/tablecmds.c:errmsg("index 
\"%s\" for table \"%s\" does not exist",
src/backend/commands/tablecmds.c:   
 errdetail("The index \"%s\" belongs to a constraint in table \"%s\" but no 
constraint exists for index \"%s\".",
src/backend/commands/cluster.c:  
errmsg("index \"%s\" for table \"%s\" does not exist",
src/backend/parser/parse_utilcmd.c:  
errmsg("index \"%s\" does not belong to table \"%s\"",
From 8ee9ffc1325118438309ee25e9b33c61cccd022f Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Sun, 26 Jan 2020 22:38:10 -0600
Subject: [PATCH v14 1/3] make vacrelstats a global

---
 src/backend/access/heap/vacuumlazy.c | 276 +--
 1 file changed, 136 insertions(+), 140 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 8ce5011..114428b 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -302,16 +302,17 @@ static MultiXactId MultiXactCutoff;
 
 static BufferAccessStrategy vac_strategy;
 
+LVRelStats vacrelstats = {0};
 
 /* non-export function prototypes */
 static void lazy_scan_heap(Relation onerel, VacuumParams *params,
-		   LVRelStats *vacrelstats, Relation *Irel, int nindexes,
+		   Relation *Irel, int nindexes,
 		   bool aggressive);
-static void lazy_vacuum_heap(Relation onerel, LVRelStats *vacrelstats);
+static void lazy_vacuum_heap(Relation onerel);
 static bool lazy_check_needs_freeze(Buffer buf, bool *hastup);
 static void lazy_vacuum_all_indexes(Relation onerel, Relation *Irel,
 	IndexBulkDeleteResult **stats,
-	LVRelStats *vacrelstats, LVParallelState *lps,
+	LVParallelState *lps,
 	int nindexes);
 static void lazy_vacuum_index(Relation indrel, IndexBulkDeleteResult **stats,
 			  LVDeadTuples *dead_tuples, double reltuples);
@@ -319,13 +320,11 @@ static void lazy_cleanup_index(Relation indrel,
 			   IndexBulkDeleteResult **stats,
 			   double reltuples, bool estimated_count);
 static int	lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer,
-			 int tupindex, LVRelStats *vacrelstats, Buffer *vmbuffer);
-static bool should_attempt_truncation(VacuumParams *params,
-	  LVRelStats *vacrelstats);
-static void lazy_truncate_heap(Relation onerel, LVRelStats *vacrelstats);
-static BlockNumber count_nondeletable_pages(Relation onerel,
-			LVRelStats *vacrelstats);
-static void lazy_space_alloc(LVRelStats *vacrelstats, BlockNumber relblocks);
+			 int tupindex, Buffer *vmbuffer);
+static bool should_attempt_truncation(VacuumParams *params);
+static void lazy_truncate_heap(Relation onerel);
+static BlockNumber 

Tid scan increments value of pg_stat_all_tables.seq_scan. (but not seq_tup_read)

2020-01-26 Thread Kasahara Tatsuhito
Hi, I noticed that since PostgreSQL 12, Tid scan increments the value of
pg_stat_all_tables.seq_scan (but not seq_tup_read).

The following is an example.

CREATE TABLE t (c int);
INSERT INTO t SELECT 1;
SET enable_seqscan to off;

(v12 -)
=# EXPLAIN ANALYZE SELECT * FROM t WHERE ctid = '(0,1)';
QUERY PLAN
---
 Tid Scan on t  (cost=0.00..4.01 rows=1 width=4) (actual
time=0.034..0.035 rows=1 loops=1)
   TID Cond: (ctid = '(0,1)'::tid)
 Planning Time: 0.341 ms
 Execution Time: 0.059 ms
(4 rows)

=# SELECT seq_scan, seq_tup_read FROM pg_stat_user_tables WHERE relname = 't';
 seq_scan | seq_tup_read
----------+--------------
        1 |            0
(1 row)

(- v11)
=# EXPLAIN ANALYZE SELECT * FROM t WHERE ctid = '(0,1)';
QUERY PLAN
---
 Tid Scan on t  (cost=0.00..4.01 rows=1 width=4) (actual
time=0.026..0.027 rows=1 loops=1)
   TID Cond: (ctid = '(0,1)'::tid)
 Planning Time: 1.003 ms
 Execution Time: 0.068 ms
(4 rows)

postgres=# SELECT seq_scan, seq_tup_read FROM pg_stat_user_tables
WHERE relname = 't';
 seq_scan | seq_tup_read
----------+--------------
        0 |            0
(1 row)


To be exact, this change was introduced by the following commit:
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=147e3722f7e531f15ba389a4d518efe8cd0bd736
I think that, since this commit, TidListEval() calls
table_beginscan(), which is why this increment happens.

I'm not sure whether this change is intentional or not, but it can confuse some users.

Best regards,
--
NTT Open Source Software Center
Tatsuhito Kasahara




Re: table partitioning and access privileges

2020-01-26 Thread Amit Langote
Fujii-san,

On Mon, Jan 27, 2020 at 11:19 AM Fujii Masao
 wrote:
> On 2020/01/23 22:14, Fujii Masao wrote:
> > Thanks for updating the patch! Barring any objection,
> > I will commit this fix and backport it to all supported versions.
>
> Attached are the back-port versions of the patches.
>
>
> The patch for master branch separates truncate_check_activity() into two
> functions, but in v11 or before, truncate_check_activity() didn't exist and
> its code was in truncate_check_rel(). So I had to write the back-port
> version
> of the patch for the previous versions and separate truncate_check_rel()
> into three functions, i.e., truncate_check_rel(),
> truncate_check_activity() and
> truncate_check_perms().

Thank you for creating the back-port versions.  I agree with making
the code look similar in all supported branches for the ease of future
maintenance.

Thanks,
Amit




Re: [HACKERS] WAL logging problem in 9.4.3?

2020-01-26 Thread Noah Misch
On Mon, Jan 27, 2020 at 01:44:13PM +0900, Kyotaro Horiguchi wrote:
> At Sun, 26 Jan 2020 20:22:01 -0800, Noah Misch  wrote in 
> > Diffing the two latest versions of one patch:
> > > --- v32-0002-Fix-the-defect-1.patch   2020-01-18 14:32:47.499129940 
> > > -0800
> > > +++ v33-0002-Fix-the-defect-1.patch   2020-01-26 16:23:52.846391035 
> > > -0800
> > > +@@ -2978,8 +3054,8 @@ AssertPendingSyncs_RelationCache(void)
> > > + LOCKTAG_RELATION)
> > > + continue;
> > > + relid = 
> > > ObjectIdGetDatum(locallock->tag.lock.locktag_field2);
> > > +-r = RelationIdGetRelation(relid);
> > > +-if (r == NULL)
> > > ++r = RelationIdGetRelationCache(relid);
> > 
> > The purpose of this loop is to create relcache entries for rels locked in 
> > the
> > current transaction.  (The "r == NULL" case happens for rels no longer 
> > visible
> > in catalogs.  It is harmless.)  Since RelationIdGetRelationCache() never
> > creates a relcache entry, calling it defeats that purpose.
> > RelationIdGetRelation() is the right function to call.
> 
> I thought that the all required entry exist in the cache but actually
> it's safer that recreate dropped caches. Does the following works?
> 
>   r = RelationIdGetRelation(relid);
> +   /* if not found, fetch a "dropped" entry if any  */
> + if (r == NULL)
> + r = RelationIdGetRelationCache(relid);
>   if (r == NULL)
>   continue;

That does not materially change the function's behavior.  Notice that the
function does one thing with "r", which is to call RelationClose(r).  The
function calls RelationIdGetRelation() for its side effects, not for its
return value.




Re: [HACKERS] WAL logging problem in 9.4.3?

2020-01-26 Thread Kyotaro Horiguchi
Thanks!

At Sun, 26 Jan 2020 20:22:01 -0800, Noah Misch  wrote in 
> Diffing the two latest versions of one patch:
> > --- v32-0002-Fix-the-defect-1.patch 2020-01-18 14:32:47.499129940 -0800
> > +++ v33-0002-Fix-the-defect-1.patch 2020-01-26 16:23:52.846391035 -0800
> > +@@ -2978,8 +3054,8 @@ AssertPendingSyncs_RelationCache(void)
> > +   LOCKTAG_RELATION)
> > +   continue;
> > +   relid = ObjectIdGetDatum(locallock->tag.lock.locktag_field2);
> > +-  r = RelationIdGetRelation(relid);
> > +-  if (r == NULL)
> > ++  r = RelationIdGetRelationCache(relid);
> 
> The purpose of this loop is to create relcache entries for rels locked in the
> current transaction.  (The "r == NULL" case happens for rels no longer visible
> in catalogs.  It is harmless.)  Since RelationIdGetRelationCache() never
> creates a relcache entry, calling it defeats that purpose.
> RelationIdGetRelation() is the right function to call.

I thought that all the required entries exist in the cache, but actually
it's safer to recreate dropped caches. Does the following work?

r = RelationIdGetRelation(relid);
+   /* if not found, fetch a "dropped" entry if any  */
+   if (r == NULL)
+   r = RelationIdGetRelationCache(relid);
if (r == NULL)
continue;

> On Tue, Jan 21, 2020 at 06:45:57PM +0900, Kyotaro Horiguchi wrote:
> > Three other fixes not mentined above are made. One is the useless
> > rd_firstRelfilenodeSubid in the condition to dicide whether to
> > preserve or not a relcache entry
> 
> It was not useless.  Test case:
> 
>   create table t (c int);
>   begin;
>   alter table t alter c type bigint;  -- sets rd_firstRelfilenodeSubid
>   savepoint q; drop table t; rollback to q;  -- forgets 
> rd_firstRelfilenodeSubid
>   commit;  -- assertion failure, after 
> s/RelationIdGetRelationCache/RelationIdGetRelation/ discussed above

Mmm? I somehow thought that that relcache entry would never be dropped, and I
believe I considered that case, of course.  But yes, you're right.

I'll post an updated version.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [HACKERS] WAL logging problem in 9.4.3?

2020-01-26 Thread Noah Misch
Diffing the two latest versions of one patch:
> --- v32-0002-Fix-the-defect-1.patch   2020-01-18 14:32:47.499129940 -0800
> +++ v33-0002-Fix-the-defect-1.patch   2020-01-26 16:23:52.846391035 -0800
> +@@ -2978,8 +3054,8 @@ AssertPendingSyncs_RelationCache(void)
> + LOCKTAG_RELATION)
> + continue;
> + relid = ObjectIdGetDatum(locallock->tag.lock.locktag_field2);
> +-r = RelationIdGetRelation(relid);
> +-if (r == NULL)
> ++r = RelationIdGetRelationCache(relid);

The purpose of this loop is to create relcache entries for rels locked in the
current transaction.  (The "r == NULL" case happens for rels no longer visible
in catalogs.  It is harmless.)  Since RelationIdGetRelationCache() never
creates a relcache entry, calling it defeats that purpose.
RelationIdGetRelation() is the right function to call.

On Tue, Jan 21, 2020 at 06:45:57PM +0900, Kyotaro Horiguchi wrote:
> Three other fixes not mentined above are made. One is the useless
> rd_firstRelfilenodeSubid in the condition to dicide whether to
> preserve or not a relcache entry

It was not useless.  Test case:

  create table t (c int);
  begin;
  alter table t alter c type bigint;  -- sets rd_firstRelfilenodeSubid
  savepoint q; drop table t; rollback to q;  -- forgets rd_firstRelfilenodeSubid
  commit;  -- assertion failure, after 
s/RelationIdGetRelationCache/RelationIdGetRelation/ discussed above




Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)

2020-01-26 Thread Kyotaro Horiguchi
Hello.

At Sun, 26 Jan 2020 12:22:03 -0800, Andres Freund  wrote in 
> Hi,

I feel the same on the specific issues brought in upthread.

> On 2020-01-26 16:20:03 +0100, Magnus Hagander wrote:
> > On Sun, Jan 26, 2020 at 1:44 AM Andres Freund  wrote:
> > > On 2020-01-25 15:43:41 +0100, Magnus Hagander wrote:
> > > > On Fri, Jan 24, 2020 at 8:52 PM Andres Freund  
> > > > wrote:
> > > > > Lastly, I don't understand what the point of sending fixed size stats,
> > > > > like the stuff underlying pg_stat_bgwriter, through pgstats IPC. While
> > > > > I don't like it's architecture, we obviously need something like 
> > > > > pgstat
> > > > > to handle variable amounts of stats (database, table level etc
> > > > > stats). But that doesn't at all apply to these types of global stats.
> > > >
> > > > That part has annoyed me as well a few times. +1 for just moving that
> > > > into a global shared memory. Given that we don't really care about
> > > > things being in sync between those different counters *or* if we loose
> > > > a bit of data (which the stats collector is designed to do), we could
> > > > even do that without a lock?
> > >
> > > I don't think we'd quite want to do it without any (single counter)
> > > synchronization - high concurrency setups would be pretty likely to
> > > loose values that way. I suspect the best would be to have a struct in
> > > shared memory that contains the potential counters for each potential
> > > process. And then sum them up when actually wanting the concrete
> > > value. That way we avoid unnecessary contention, in contrast to having a
> > > single shared memory value for each(which would just pingpong between
> > > different sockets and store buffers).  There's a few details like how
> > > exactly to implement resetting the counters, but ...
> > 
> > Right. Each process gets to do their own write, but still in shared
> > memory. But do you need to lock them when reading them (for the
> > summary)? That's the part where I figured you could just read and
> > summarize them, and accept the possible loss.
> 
> Oh, yea, I'd not lock for that. On nearly all machines aligned 64bit
> integers can be read / written without a danger of torn values, and I
> don't think we need perfect cross counter accuracy. To deal with the few
> platforms without 64bit "single copy atomicity", we can just use
> pg_atomic_read/write_u64. These days (e8fdbd58fe) they automatically
> fall back to using locked operations for those platforms.  So I don't
> think there's actually a danger of loss.
> 
> Obviously we could also use atomic ops to increment the value, but I'd
> rather not add all those atomic operations, even if it's on uncontended
> cachelines. It'd allow us to reset the backend values more easily by
> just swapping in a 0, which we can't do if the backend increments
> non-atomically. But I think we could instead just have one global "bias"
> value to implement resets (by subtracting that from the summarized
> value, and storing the current sum when resetting). Or use the new
> global barrier to trigger a reset. Or something similar.

Fixed or global stats are a suitable starting point for the shared-memory
stats collector. In the case of buffers_*_write, the global stats
entry for each process needs just 8 bytes, plus maybe an extra 8 bytes for
the bias value.  I'm not sure how many counters like this there are,
but is such a footprint acceptable?  (Each backend already uses
the same amount of local memory for pgstat use, though.)

Anyway I will do something like that as a trial, maybe by adding a
member to PgBackendStatus and one global shared variable for the bias value.

  int64   st_progress_param[PGSTAT_NUM_PROGRESS_PARAM];
+ PgBackendStatsCounters counters;
  } PgBackendStatus;
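
To make that concrete, here is a rough sketch with made-up names (only
PgBackendStatsCounters comes from the snippet above; GlobalStatsArea and the
helper functions are hypothetical), assuming the backend's atomics headers:

typedef struct PgBackendStatsCounters
{
	pg_atomic_uint64 buffers_backend_write;
} PgBackendStatsCounters;

typedef struct GlobalStatsArea
{
	pg_atomic_uint64 buffers_backend_write_bias;		/* subtracted on read */
	PgBackendStatsCounters slots[FLEXIBLE_ARRAY_MEMBER];	/* one per backend */
} GlobalStatsArea;

static GlobalStatsArea *statsArea;	/* points into shared memory */

/* Called by a backend after it writes out one of its buffers. */
static void
count_backend_buffer_write(int my_slot)
{
	pg_atomic_uint64 *c = &statsArea->slots[my_slot].buffers_backend_write;

	/* only this backend writes this slot, so a plain read-increment-write is fine */
	pg_atomic_write_u64(c, pg_atomic_read_u64(c) + 1);
}

/* Called by the reader to get the current total, honoring resets. */
static uint64
sum_backend_buffer_writes(int nslots)
{
	uint64		total = 0;

	for (int i = 0; i < nslots; i++)
		total += pg_atomic_read_u64(&statsArea->slots[i].buffers_backend_write);

	return total - pg_atomic_read_u64(&statsArea->buffers_backend_write_bias);
}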

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [PATCH] /src/backend/access/transam/xlog.c, tiny improvements

2020-01-26 Thread Mark Dilger



> On Jan 24, 2020, at 12:48 PM, Ranier Vilela  wrote:
> 
> 3. At function KeepLogSeg (line 9357)  the test if (slotSegNo <= 0), the var 
> slotSegNo is uint64 and not can be < 0.

There is something unusual about comparing a XLogSegNo variable in this way, 
but it seems to go back to 2014 when the replication slots were introduced in 
commit 858ec11858a914d4c380971985709b6d6b7dd6fc, and XLogSegNo was unsigned 
then, too.  Depending on how you look at it, this could be a thinko, or it 
could be defensive programming against future changes to the XLogSegNo typedef. 
 I’m betting it was defensive programming, given the context.  As such, I don’t 
think it would be appropriate to remove this defense in your patch.
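
A tiny standalone example of the point being made: with the unsigned typedef
the test degenerates to an equality check against zero, but it keeps doing
the right thing if the typedef ever becomes signed.

#include <stdint.h>
#include <stdio.h>

typedef uint64_t XLogSegNo;		/* unsigned today, as in xlogdefs.h */

int
main(void)
{
	XLogSegNo	slotSegNo = 0;

	/* equivalent to "== 0" while unsigned; still correct if signed someday */
	if (slotSegNo <= 0)
		printf("segment number is zero (or would be negative if signed)\n");

	return 0;
}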

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: Add MAIN_RELATION_CLEANUP and SECONDARY_RELATION_CLEANUP options to VACUUM

2020-01-26 Thread Michael Paquier
On Fri, Jan 24, 2020 at 09:31:26PM +, Bossart, Nathan wrote:
> On 1/21/20, 9:02 PM, "Michael Paquier"  wrote:
>> On Tue, Jan 21, 2020 at 09:21:46PM +, Bossart, Nathan wrote:
>>> I've attached a patch for a couple of new options for VACUUM:
>>> MAIN_RELATION_CLEANUP and SECONDARY_RELATION_CLEANUP.  The motive
>>> behind these options is to allow table owners to easily vacuum only
>>> the TOAST table or only the main relation.  This is especially useful
>>> for TOAST tables since roles do not have access to the pg_toast schema
>>> by default and some users may find it difficult to discover the name
>>> of a relation's TOAST table.  Next, I will explain a couple of the
>>> main design decisions.
>>
>> So that's similar to the autovacuum reloptions, but to be able to
>> enforce one policy or another manually.  Any issues with autovacuum
>> not able to keep up the bloat pace and where you need to issue manual
>> VACUUMs in periods of low activity, like nightly VACUUMs?
> 
> There have been a couple of occasions where I have seen the TOAST
> table become the most bloated part of the relation.  When this
> happens, it would be handy to be able to avoid scanning the heap and
> indexes.  I am not aware of any concrete problems with autovacuum
> other than needing to tune the parameters for certain workloads.

That's something I have faced as well.  I have some applications
around here where toast tables were the most bloated, and the
vacuuming of the main relation ate time, putting more pressure on the
vacuuming of the toast relation.  So that's a fair argument in my
opinion.  

>>> I chose to implement MAIN_RELATION_CLEANUP within vacuum_rel() instead
>>> of expand_vacuum_rel()/get_all_vacuum_rels().  This allows us to reuse
>>> most of the existing code with minimal changes, and it avoids adding
>>> complexity to the lookups and ownership checks in expand_vacuum_rel()
>>> and get_all_vacuum_rels() (especially the partition lookup logic).
>>> The main tradeoffs of this approach are that we will still create a
>>> transaction for the main relation and that we will still lock the main
>>> relation.
>>
>> Yeah, likely we should not make things more confusing in this area.
>> This was tricky enough to deal with with the recent VACUUM
>> refactoring for multiple relations.
> 
> Finding a way to avoid the lock on the main relation could be a future
> improvement, as that would allow you to manually vacuum both the main
> relation and its TOAST table in parallel.

I am not sure that we actually need that at all; any catalog change
takes a lock on the parent relation first, and those are the conflicts we
are looking at here with a share update exclusive lock.
--
Michael




RE: [PoC] Non-volatile WAL buffer

2020-01-26 Thread Takashi Menjo
Hello Fabien,

Thank you for your +1 :)

> Is it possible to emulate somthing without the actual hardware, at least
> for testing purposes?

Yes, you can emulate PMEM using DRAM on Linux, via the "memmap=nnG!ssG" kernel
parameter.  Please see [1] and [2] for emulation details.  If your emulation
does not work well, please check whether the kernel configuration options (like
CONFIG_FOOBAR) for PMEM and DAX (in [1] and [3]) are set up properly.

Best regards,
Takashi

[1] How to Emulate Persistent Memory Using Dynamic Random-access Memory (DRAM)
 
https://software.intel.com/en-us/articles/how-to-emulate-persistent-memory-on-an-intel-architecture-server
[2] how_to_choose_the_correct_memmap_kernel_parameter_for_pmem_on_your_system
 
https://nvdimm.wiki.kernel.org/how_to_choose_the_correct_memmap_kernel_parameter_for_pmem_on_your_system
[3] Persistent Memory Wiki
 https://nvdimm.wiki.kernel.org/

-- 
Takashi Menjo 
NTT Software Innovation Center







Re: table partitioning and access privileges

2020-01-26 Thread Fujii Masao



On 2020/01/23 22:14, Fujii Masao wrote:



On 2020/01/22 16:54, Amit Langote wrote:

Fujii-san,

Thanks for taking a look.

On Fri, Jan 10, 2020 at 10:29 AM Fujii Masao  
wrote:
On Tue, Jan 7, 2020 at 5:15 PM Amit Langote  
wrote:

I tend to agree that TRUNCATE's permission model for inheritance
should be consistent with that for the other commands.  How about the
attached patch toward that end?


Thanks for the patch!

The patch basically looks good to me.

+GRANT SELECT (f1, fz), UPDATE (fz) ON atestc TO regress_priv_user2;
+REVOKE TRUNCATE ON atestc FROM regress_priv_user2;

These seem not to be necessary for the test.


You're right.  Removed in the attached updated patch.


Thanks for updating the patch! Barring any objection,
I will commit this fix and backport it to all supported versions.


Attached are the back-port versions of the patches.

- patch for master and v12

0001-Don-t-check-child-s-TRUNCATE-privilege-when-truncate-fujii-pg12-13.patch

- patch for v11

0001-Don-t-check-child-s-TRUNCATE-privilege-when-truncate-fujii-pg11.patch

- patch for v10

0001-Don-t-check-child-s-TRUNCATE-privilege-when-truncate-fujii-pg10.patch

- patch for v9.6

0001-Don-t-check-child-s-TRUNCATE-privilege-when-truncate-fujii-pg96.patch

- patch for v9.5 and v9.4

0001-Don-t-check-child-s-TRUNCATE-privilege-when-truncate-fujii-pg94-95.patch

The patch for master branch separates truncate_check_activity() into two
functions, but in v11 or before, truncate_check_activity() didn't exist and
its code was in truncate_check_rel(). So I had to write the back-port version
of the patch for the previous versions and separate truncate_check_rel()
into three functions, i.e., truncate_check_rel(), truncate_check_activity()
and truncate_check_perms().

Also the names of users that the regression test for privileges use were
different between PostgreSQL versions. This is another reason
why I had to write several back-port versions of the patches.

Regards,

--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 1a005760d8..204d25aeb3 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -291,7 +291,9 @@ struct DropRelationCallbackState
 #define child_dependency_type(child_is_partition)  \
((child_is_partition) ? DEPENDENCY_AUTO : DEPENDENCY_NORMAL)
 
-static void truncate_check_rel(Relation rel);
+static void truncate_check_rel(Oid relid, Form_pg_class reltuple);
+static void truncate_check_perms(Oid relid, Form_pg_class reltuple);
+static void truncate_check_activity(Relation rel);
 static List *MergeAttributes(List *schema, List *supers, char relpersistence,
bool is_partition, List **supOids, List 
**supconstr,
int *supOidCount);
@@ -1242,7 +1244,11 @@ ExecuteTruncate(TruncateStmt *stmt)
heap_close(rel, lockmode);
continue;
}
-   truncate_check_rel(rel);
+
+   truncate_check_rel(myrelid, rel->rd_rel);
+   truncate_check_perms(myrelid, rel->rd_rel);
+   truncate_check_activity(rel);
+
rels = lappend(rels, rel);
relids = lappend_oid(relids, myrelid);
 
@@ -1278,7 +1284,15 @@ ExecuteTruncate(TruncateStmt *stmt)
continue;
}
 
-   truncate_check_rel(rel);
+   /*
+* Inherited TRUNCATE commands perform access
+* permission checks on the parent table only.
+* So we skip checking the children's 
permissions
+* and don't call truncate_check_perms() here.
+*/
+   truncate_check_rel(RelationGetRelid(rel), 
rel->rd_rel);
+   truncate_check_activity(rel);
+
rels = lappend(rels, rel);
relids = lappend_oid(relids, childrelid);
}
@@ -1317,7 +1331,9 @@ ExecuteTruncate(TruncateStmt *stmt)
ereport(NOTICE,
(errmsg("truncate cascades to 
table \"%s\"",

RelationGetRelationName(rel;
-   truncate_check_rel(rel);
+   truncate_check_rel(relid, rel->rd_rel);
+   truncate_check_perms(relid, rel->rd_rel);
+   truncate_check_activity(rel);
rels = lappend(rels, rel);
relids = lappend_oid(relids, relid);
  

Re: [PATCH] Resolve Parallel Hash Join Performance Issue

2020-01-26 Thread Thomas Munro
On Tue, Jan 21, 2020 at 6:20 PM Thomas Munro  wrote:
> On Fri, Jan 10, 2020 at 1:52 PM Deng, Gang  wrote:
> > Thank you for the comment. Yes, I agree the alternative of using 
> > '(!parallel)', so that no need to test the bit. Will someone submit patch 
> > to for it accordingly?
>
> Here's a patch like that.

Pushed.  Thanks again for the report!

I didn't try the TPC-DS query, but could see a small improvement from
this on various simple queries, especially with a fairly small hash
table and a large outer relation, when many cores are probing.

(Off topic for this thread, but after burning a few hours on a 72-way
box investigating various things including this, I was reminded of the
performance drop-off for joins with large hash tables that happens
somewhere around 8-16 workers.  That's because we can't give 32KB
chunks out fast enough, and if you increase the chunk size it helps
only a bit.  That really needs some work; maybe something like a
separation of reservation and allocation, so that multiple segments
can be created in parallel while respecting limits, or something like
that.  The other thing I was reminded of: FreeBSD blows Linux out of
the water on big parallel hash joins on identical hardware; I didn't
dig further today but I suspect this may be down to lack of huge pages
(TLB misses), and perhaps also those pesky fallocate() calls.  I'm
starting to wonder if we should have a new GUC shared_work_mem that
reserves a wodge of shm in the main region, and hand out 'fast DSM
segments' from there, or some other higher level abstraction that's
wired into the resource release system; they would benefit from
huge_pages=try on Linux, they'd be entirely allocated (in the VM
sense) and there'd be no system calls, though admittedly there'd be
more ways for things to go wrong...)




Re: making the backend's json parser work in frontend code

2020-01-26 Thread Mark Dilger


> On Jan 26, 2020, at 5:51 PM, Andrew Dunstan  
> wrote:
> 
>>> 0007 adds testing.
>>> 
>>> I would appreciate somebody looking at the portability issues for 0007
>>> on Windows.
>>> 
>> 
>> We'll need at a minimum something added to src/tools/msvc to build the
>> test program, maybe some other stuff too. I'll take a look.
> 
> 
> Patch complains that the 0007 patch is malformed:
> 
> andrew@ariana:pg_head (master)*$ patch -p 1 <
> ~/Downloads/v4-0007-Adding-frontend-tests-for-json-parser.patch
> patching file src/Makefile
> patching file src/test/Makefile
> patching file src/test/bin/.gitignore
> patching file src/test/bin/Makefile
> patching file src/test/bin/README
> patching file src/test/bin/t/001_test_json.pl
> patch:  malformed patch at line 201: diff --git
> a/src/test/bin/test_json.c b/src/test/bin/test_json.c

I manually removed a stray newline in the patch file.  I shouldn’t have done 
that.  I’ve removed the stray newline in the sources, committed (with git 
commit --amend) and am testing again, which is what I should have done the first 
time….

Ok, the tests pass.  Here are those two patches again, both regenerated with a 
fresh invocation of ‘git format-patch’.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




v5-0006-Relocating-jsonapi-to-common.patch
Description: Binary data


v5-0007-Adding-frontend-tests-for-json-parser.patch
Description: Binary data


Re: [PATCH] Windows port, fix some resources leaks

2020-01-26 Thread Michael Paquier
On Fri, Jan 24, 2020 at 09:37:25AM -0300, Ranier Vilela wrote:
> Em sex., 24 de jan. de 2020 às 04:13, Michael Paquier 
> escreveu:
>> There is some progress.  You should be careful about your patches,
>> as they generate compiler warnings.  Here is one quote from gcc-9:
>> logging.c:87:13: warning: passing argument 1 of ‘free’ discards
>> ‘const’ qualifier from pointer target type [-Wdiscarded-qualifiers]
>>87 |free(sgr_warning);
>
> Well, in this cases, the solution is cast.
> free((char *) sgr_warning);

Applying blindly a cast is never a good practice.

>> if (strcmp(name, "error") == 0)
>> +   {
>> +   free(sgr_error);
>> sgr_error = strdup(value);
>> +   }
>> I don't see the point of doing that in logging.c.  pg_logging_init()
>> is called only once per tools, so this cannot happen.  Another point
>> that may matter here though is that we do not complain about OOMs.
>> That's really unlikely to happen, and if it happens it leads to
>> partially colored output.
>
> Coverity shows the alert because it tries all the possibilities. It is
> inside a loop.  It seems to me that the only way for it to happen is by the
> user introducing a repeated and wrong sequence.

Again, Coverity may say something that does not apply to the reality,
and sometimes it misses some spots.  Here we should be looking at
query patterns which involve a memory leak.  So I'd rather look at
that separately, and actually on a separate thread because that's not
a Windows-only code path.  If you'd look at the rest of the regex
code, I suspect that there could a couple of ramifications which have
similar problems (I haven't looked at that myself).

>> For those two ones, it looks that you are right.  However, I think
>> that it would be safer to check if Advapi32Handle is NULL for both.
>
> Michael, I did it differently and modified the function to not need to test
> NULL, I think it was better.

advapi32.dll should be present in any modern Windows platform, so
logging an error is actually fine by me instead of a warning.

I have shaved from the patch the parts which are not completely
relevant to this thread, and committed a version addressing the most
obvious leaks after doing more tests, including the changes for
restricted_token.c as of 10a5252.  Thanks.
--
Michael




Re: making the backend's json parser work in frontend code

2020-01-26 Thread Andrew Dunstan
> > 0007 adds testing.
> >
> > I would appreciate somebody looking at the portability issues for 0007
> > on Windows.
> >
>
> We'll need at a minimum something added to src/tools/msvc to build the
> test program, maybe some other stuff too. I'll take a look.


Patch complains that the 0007 patch is malformed:

andrew@ariana:pg_head (master)*$ patch -p 1 <
~/Downloads/v4-0007-Adding-frontend-tests-for-json-parser.patch
patching file src/Makefile
patching file src/test/Makefile
patching file src/test/bin/.gitignore
patching file src/test/bin/Makefile
patching file src/test/bin/README
patching file src/test/bin/t/001_test_json.pl
patch:  malformed patch at line 201: diff --git
a/src/test/bin/test_json.c b/src/test/bin/test_json.c


cheers

andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: EXPLAIN's handling of output-a-field-or-not decisions

2020-01-26 Thread Andres Freund
Hi,

On 2020-01-26 17:53:09 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > I've previously wondered about adding a REGRESS option to EXPLAIN would
> > not actually be a good one, so we can move the magic into that, rather
> > than options that are also otherwise relevant.
> 
> I'd be inclined to think about a GUC actually.
> force_parallel_mode = regress is sort of precedent for that,
> and we do already have the infrastructure needed to force a
> database-level GUC setting for regression databases.

Yea, a GUC could work too. What would it do exactly? Hide COSTS, TIMING,
JIT, unless explicitly turned on in the EXPLAIN? And perhaps also
"redact" a few things that we currently manually have to filter out?
And then we'd leave the implicit JIT to on, to allow users to see where
time is spent?

Or were you thinking of something different entirely?


> I can see some advantages to making it an explicit EXPLAIN option
> instead --- but unless we wanted to back-patch it, it'd be a real
> pain in the rear for back-patching regression test cases.

Hm. Would it really be harder? I'd expect that we'd end up writing tests
in master that need additional options to be usable in the back
branches. Seems we'd definitely need to backpatch the GUC?

Greetings,

Andres Freund




Re: making the backend's json parser work in frontend code

2020-01-26 Thread Mark Dilger



> On Jan 26, 2020, at 5:09 PM, Andrew Dunstan  
> wrote:
> 
> We'll need at a minimum something added to src/tools/msvc to build the
> test program, maybe some other stuff too. I'll take a look.

Thanks, much appreciated.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: making the backend's json parser work in frontend code

2020-01-26 Thread Andrew Dunstan
On Mon, Jan 27, 2020 at 5:54 AM Mark Dilger
 wrote:
>
>
>
> > On Jan 22, 2020, at 10:53 AM, Robert Haas  wrote:
> >
> > 0004 is a substantially cleaned up version of the patch to make the
> > JSON parser return a result code rather than throwing errors. Names
> > have been fixed, interfaces have been tidied up, and the thing is
> > better integrated with the surrounding code. I would really like
> > comments, if anyone has them, on whether this approach is acceptable.
> >
> > 0005 builds on 0004 by moving three functions from jsonapi.c to
> > jsonfuncs.c. With that done, jsonapi.c has minimal remaining
> > dependencies on the backend environment. It would still need a
> > substitute for elog(ERROR, "some internal thing is broken"); I'm
> > thinking of using pg_log_fatal() for that case. It would also need a
> > fix for the problem that pg_mblen() is not available in the front-end
> > environment. I don't know what to do about that yet exactly, but it
> > doesn't seem unsolvable. The frontend environment just needs to know
> > which encoding to use, and needs a way to call PQmblen() rather than
> > pg_mblen().
>
> I have completed the work in the attached 0006 and 0007 patches.
> These are intended to apply after your 0004 and 0005; they won’t
> work directly on master which, as of this writing, only contains your
> 0001-0003 patches.
>
> 0006 finishes moving the json parser to src/include/common and src/common.
>
> 0007 adds testing.
>
> I would appreciate somebody looking at the portability issues for 0007
> on Windows.
>

We'll need at a minimum something added to src/tools/msvc to build the
test program, maybe some other stuff too. I'll take a look.

cheers

andrew


-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: making the backend's json parser work in frontend code

2020-01-26 Thread Andrew Dunstan
On Sat, Jan 25, 2020 at 6:20 AM Mark Dilger
 wrote:
>
>
>
> > On Jan 24, 2020, at 10:43 AM, Robert Haas  wrote:
> >
> > Since 0001-0003 have been reviewed by multiple people and nobody's
> > objected, I have committed those.
>
> I think 0004-0005 have been reviewed and accepted by both me and Andrew, if I 
> understood him correctly:
>
> > I've reviewed these patches and Robert's, and they seem basically good to 
> > me.
>
> Certainly, nothing in those two patches caused me any concern.  I’m going to 
> modify my patches as you suggested, get rid of the INSIST macro, and move the 
> pg_wchar changes to their own thread.  None of that should require changes in 
> your 0004 or 0005.  It won’t bother me if you commit those two.  Andrew?
>


Just reviewed the latest versions of 4 and 5, they look good to me.

cheers

andrew


-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Implementing Incremental View Maintenance

2020-01-26 Thread Takuma Hoshiai
On Mon, 20 Jan 2020 16:57:58 +0900
Yugo NAGATA  wrote:

> On Fri, 17 Jan 2020 14:10:32 -0700 (MST)
> legrand legrand  wrote:
> 
> > Hello,
> > 
> > It seems that patch v11 doesn't apply any more.
> > Problem with "scanRTEForColumn" maybe because of change:
> 
> Thank you for your reporting! We will fix this in the next update. 

Although I have been working on fixing the conflicts and merging with the
latest master, it is taking a little longer, because the impact is larger than we thought.

Please wait a little more.

Regards
Takuma Hoshiai


> Regards,
> Yugo Nagata
> 
> > 
> > https://git.postgresql.org/pg/commitdiff/b541e9accb28c90656388a3f827ca3a68dd2a308
> > 
> > Regards
> > PAscal
> > 
> > 
> > 
> > --
> > Sent from: 
> > https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
> > 
> > 
> 
> 
> -- 
> Yugo NAGATA 
> 
> 
> 


-- 
Takuma Hoshiai 





Re: Parallel leader process info in EXPLAIN

2020-01-26 Thread Thomas Munro
On Mon, Jan 27, 2020 at 11:49 AM Tom Lane  wrote:
> I've occasionally wondered whether we'd be better off presenting
> this info as if the leader were "worker 0" and then the N workers
> are workers 1 to N.  I've not worked out the implications of that
> in any detail though.  It's fairly easy to see what to do for
> fields that can be aggregated (the numbers printed for the node
> as a whole are totals), but it doesn't help us any with something
> like Sort Method.

Yeah, in the 0001 patch (which no longer applies and probably just
needs to be rewritten now), I used "Leader:" in the text format, but
worker number -1 in the structured formats, which I expected some
blowback on.  I also thought about adding one to all the numbers as
you suggest.

In PHJ I had a related problem: I had to +1 the worker number to get a
zero-based "participant number" so that the leader would have a slot
in various data structures, and I wondered if we shouldn't just do
that to the whole system (eg not just in explain's output or in
localised bits of PHJ code).

> On a narrower note, I'm not at all happy with the fact that 0001
> adds yet another field to *every* PlanState.  I think this is
> doubling down on a fundamentally wrong decision to have
> ExecParallelRetrieveInstrumentation do some aggregation immediately.
> I think we should abandon that and just say that it returns the raw
> leader and per-worker data, and then explain.c can aggregate as it
> wishes.

Fair point.  I will look into that.




Re: Duplicate Workers entries in some EXPLAIN plans

2020-01-26 Thread Tom Lane
I wrote:
> Andres Freund  writes:
>> I wonder if we could introduce a debug GUC that makes parallel worker
>> acquisition just retry in a loop, for a time determined by the GUC. That
>> obviously would be a bad idea to do in a production setup, but it could
>> be good enough for regression tests?  There are some deadlock dangers,
>> but I'm not sure they really matter for the tests.

> Hmmm  might work.  Seems like a better idea than "run it by itself"
> as we have to do now.

The more I think about this, the more it seems like a good idea, and
not only for regression test purposes.  If you're about to launch a
query that will run for hours even with the max number of workers,
you don't want it to launch with less than that number just because
somebody else was eating a worker slot for a few milliseconds.

So I'm imagining a somewhat general-purpose GUC defined like
"max_delay_to_acquire_parallel_worker", measured say in milliseconds.
The default would be zero (current behavior: try once and give up),
but you could set it to small positive values if you have that kind
of production concern, while the regression tests could set it to big
positive values.  This would alleviate all sorts of problems we have
with not being able to assume stable results from parallel worker
acquisition in the tests.
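
For illustration, a hypothetical sketch of what that could look like; the GUC
name is the one floated above, and try_launch_parallel_workers() is a made-up
placeholder rather than an existing function:

/* GUC: how long to keep retrying, in milliseconds; 0 = current behavior */
int			max_delay_to_acquire_parallel_worker = 0;

static int
launch_workers_with_retry(int nworkers_wanted)
{
	int			nlaunched;
	int			waited_ms = 0;

	for (;;)
	{
		/* hypothetical helper standing in for the real launch code */
		nlaunched = try_launch_parallel_workers(nworkers_wanted);

		if (nlaunched == nworkers_wanted ||
			waited_ms >= max_delay_to_acquire_parallel_worker)
			break;

		pg_usleep(1000L);		/* wait 1ms, then try to fill the gap again */
		waited_ms++;
	}

	return nlaunched;
}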

regards, tom lane




Re: EXPLAIN's handling of output-a-field-or-not decisions

2020-01-26 Thread Tom Lane
Andres Freund  writes:
> On 2020-01-26 16:54:58 -0500, Tom Lane wrote:
>> ... how are you going to square that desire with not breaking the
>> regression tests?

> Well, that's how we arrived at turning off JIT information when COSTS
> OFF, because that's already something all the EXPLAINs in the regression
> tests have to do. I do not want to regress from the current state, with
> regard to both regression tests, and seeing at least a top-line time in
> the normal EXPLAIN ANALYZE cases.

Right, but that's still just a hack.

> I've previously wondered whether adding a REGRESS option to EXPLAIN
> wouldn't actually be a good idea, so we could move the magic into that,
> rather than into options that are also otherwise relevant.

I'd be inclined to think about a GUC actually.
force_parallel_mode = regress is sort of precedent for that,
and we do already have the infrastructure needed to force a
database-level GUC setting for regression databases.

I can see some advantages to making it an explicit EXPLAIN option
instead --- but unless we wanted to back-patch it, it'd be a real
pain in the rear for back-patching regression test cases.

regards, tom lane




Delaying/avoiding BTreeTupleGetNAtts() call within _bt_compare()

2020-01-26 Thread Peter Geoghegan
Andres and I discussed bottlenecks in the nbtree code during the
recent PgDay SF community event. Andres observed that the call to
BTreeTupleGetNAtts() in _bt_compare() can become somewhat of a
bottleneck. I came up with the very simple attached POC-quality
patches, which Andres tested and profiled with his original complaint
in mind yesterday. He reported increased throughput on a memory
bound simple pgbench SELECT-only workload.

He reported that the problem went away with the patches applied. The
following pgbench SELECT-only result was sent to me privately:

before:
single: tps = 30300.202363 (excluding connections establishing)
all cores:  tps = 1026853.447047 (excluding connections establishing)

after:
single: tps = 31120.452919 (excluding connections establishing)
all cores:  tps = 1032376.361006 (excluding connections establishing)

(Actually, he tested something slightly different -- he inlined
_bt_compare() in his own way instead of using my 0002-*, and didn't
use the 0003-* optimization at all.)

Apparently this was a large multi-socket machine. Those are hard to
come by.

The main idea here is to make _bt_compare() delay
calling BTreeTupleGetNAtts() until the point after the first attribute
turns out to be equal to the _bt_compare() caller's insertion scankey.
Many or most calls to _bt_compare() won't even need to call
BTreeTupleGetNAtts().

This relies on the assumption that any tuple must have at least one
untruncated suffix column in the _bt_compare() loop. It doesn't matter
whether it's a pivot or non-pivot tuple -- the representation of the
first column will be exactly the same.

The negative infinity item on an internal page always has zero
attributes, which might seem like a snag. However, we already treat
that as a special case within _bt_compare(), for historical reasons
(pg_upgrade'd indexes won't have the number of attributes explicitly
set to zero in some cases).
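
In rough outline, the reordered comparison looks something like the
sketch below.  This is only a simplified sketch, not the actual patch;
compare_first_attr() is a stand-in for the existing per-attribute
comparison logic:

static int32
bt_compare_sketch(Relation rel, BTScanInsert key, IndexTuple itup,
                  TupleDesc itupdesc)
{
    bool        isNull;
    Datum       datum;
    int32       result;
    int         ntupatts;
    int         ncmpkey;

    /* Attribute 1 is never truncated away, so compare it unconditionally */
    datum = index_getattr(itup, 1, itupdesc, &isNull);
    result = compare_first_attr(&key->scankeys[0], datum, isNull);
    if (result != 0)
        return result;          /* most calls return right here */

    /* Only now do we pay for fetching the tuple's attribute count */
    ntupatts = BTreeTupleGetNAtts(itup, rel);
    ncmpkey = Min(ntupatts, key->keysz);
    for (int i = 2; i <= ncmpkey; i++)
    {
        /* ... compare the remaining attributes exactly as today ... */
    }

    /* keysz > ntupatts case, scantid handling, etc. unchanged */
    return 0;
}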

Another separate though related idea in 0003-* is to remove the
negative infinity check. It goes from _bt_compare() to _bt_binsrch(),
since it isn't something that we need to consider more than once per
page -- and only on internal pages. That way, _bt_compare() doesn't
have to look at the page special area to check if it's a leaf page or
an internal page at all. I haven't really profiled this just yet. This is
one of those patches where 95%+ of the work is profiling and
benchmarking.

Andres and I both agree that there is a lot more work to be done in
this area, but that will be a major undertaking. I am quite keen on
the idea of repurposing the redundant-to-nbtree ItemId.lp_len field to
store an abbreviated key. Making that work well is a considerable
undertaking, since you need to use prefix compression to get a high
entropy abbreviated key. It would probably take me the best part of a
whole release cycle to write such a patch. The attached patches get
us a relatively easy win in the short term, though.

Thoughts?
--
Peter Geoghegan


v1-0001-Avoid-calling-BTreeTupleGetNAtts-in-_bt_compare.patch
Description: Binary data


v1-0003-Remove-negative-infinity-check-from-_bt_compare.patch
Description: Binary data


v1-0002-Inline-_bt_compare.patch
Description: Binary data


Re: Parallel leader process info in EXPLAIN

2020-01-26 Thread Tom Lane
Thomas Munro  writes:
> I think I'm going to abandon 0002 for now, because that stuff is being
> refactored independently over here, so rebasing would be futile:
> https://www.postgresql.org/message-id/flat/CAOtHd0AvAA8CLB9Xz0wnxu1U%3DzJCKrr1r4QwwXi_kcQsHDVU%3DQ%40mail.gmail.com

Yeah, your 0002 needs some rethinking.  I kind of like the proposed
change in the text-format output:

  Workers Launched: 4
  ->  Sort (actual rows=2000 loops=15)
Sort Key: tenk1.ten
-   Sort Method: quicksort  Memory: xxx
+   Leader:  Sort Method: quicksort  Memory: xxx
Worker 0:  Sort Method: quicksort  Memory: xxx
Worker 1:  Sort Method: quicksort  Memory: xxx
Worker 2:  Sort Method: quicksort  Memory: xxx

but it's quite unclear to me how that translates into non-text
formats, especially if we're not to break invariants about which
fields are present in a non-text output structure (cf [1]).

I've occasionally wondered whether we'd be better off presenting
this info as if the leader were "worker 0" and then the N workers
are workers 1 to N.  I've not worked out the implications of that
in any detail though.  It's fairly easy to see what to do for
fields that can be aggregated (the numbers printed for the node
as a whole are totals), but it doesn't help us any with something
like Sort Method.

On a narrower note, I'm not at all happy with the fact that 0001
adds yet another field to *every* PlanState.  I think this is
doubling down on a fundamentally wrong decision to have
ExecParallelRetrieveInstrumentation do some aggregation immediately.
I think we should abandon that and just say that it returns the raw
leader and per-worker data, and then explain.c can aggregate as it
wishes.

regards, tom lane

[1] https://www.postgresql.org/message-id/19416.1580069629%40sss.pgh.pa.us




Re: EXPLAIN's handling of output-a-field-or-not decisions

2020-01-26 Thread Andres Freund
Hi,

On 2020-01-26 16:54:58 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > On 2020-01-26 15:13:49 -0500, Tom Lane wrote:
> >> The other offender is the JIT stuff: it prints if COSTS is on and
> >> there's some JIT activity to report, and otherwise you get nothing.
> >> This is OK for text mode but it's bogus for the other formats.
> >> Since we just rearranged EXPLAIN's JIT output anyway, now seems like
> >> a good time to fix it.
> 
> > No objection. I think the current choice really is just about hiding JIT
> > information in the cases where we display explain output in the
> > tests. That output can't change depending on the build environment and
> > settings (it's e.g. hugely useful to force all queries to be JITed for
> > coverage).
> 
> Right, but then ...
> 
> > One concern I do have is that I think we need the overall time for JIT
> > to be displayed regardless of the JIT option.
> 
> ... how are you going to square that desire with not breaking the
> regression tests?

Well, that's how we arrived at turning off JIT information when COSTS
OFF, because that's already something all the EXPLAINs in the regression
tests have to do. I do not want to regress from the current state, with
regard to both regression tests, and seeing at least a top-line time in
the normal EXPLAIN ANALYZE cases.

I've previously wondered whether adding a REGRESS option to EXPLAIN
wouldn't actually be a good idea, so we could move the magic into that,
rather than into options that are also otherwise relevant.

Greetings,

Andres Freund




Re: EXPLAIN's handling of output-a-field-or-not decisions

2020-01-26 Thread Tom Lane
Andres Freund  writes:
> On 2020-01-26 15:13:49 -0500, Tom Lane wrote:
>> The other offender is the JIT stuff: it prints if COSTS is on and
>> there's some JIT activity to report, and otherwise you get nothing.
>> This is OK for text mode but it's bogus for the other formats.
>> Since we just rearranged EXPLAIN's JIT output anyway, now seems like
>> a good time to fix it.

> No objection. I think the current choice really is just about hiding JIT
> information in the cases where we display explain output in the
> tests. That output can't change depending on the build environment and
> settings (it's e.g. hugely useful to force all queries to be JITed for
> coverage).

Right, but then ...

> One concern I do have is that I think we need the overall time for JIT
> to be displayed regardless of the JIT option.

... how are you going to square that desire with not breaking the
regression tests?

>> Another debatable question is whether to print anything in non-JIT
>> builds.

> Hm. I don't think I have an opinion on this. I can see an argument
> either way.

After a bit more thought I'm leaning to "print nothing", since as you say
tools would have to cope with that anyway if they want to work with old
releases.  Also, while it's not that hard to print dummy data for JIT
even in a non-JIT build, I can imagine some future feature where it
*would* be hard.  So setting a precedent that we must provide dummy
output for unimplemented features could come back to bite us.

regards, tom lane




Re: EXPLAIN's handling of output-a-field-or-not decisions

2020-01-26 Thread Andres Freund
Hi,

On 2020-01-26 15:13:49 -0500, Tom Lane wrote:
> The other offender is the JIT stuff: it prints if COSTS is on and
> there's some JIT activity to report, and otherwise you get nothing.
> This is OK for text mode but it's bogus for the other formats.
> Since we just rearranged EXPLAIN's JIT output anyway, now seems like
> a good time to fix it.

No objection. I think the current choice really is just about hiding JIT
information in the cases where we display explain output in the
tests. That output can't change depending on the build environment and
settings (it's e.g. hugely useful to force all queries to be JITed for
coverage).

We did discuss adding a JIT option in 11, but it wasn't clear it'd be
useful at that time (hence the "Might want to separate that out from
COSTS at a later stage." comment ExplainOnePlan).


I'm not really bothered by entire sections that are optional in the
other formats, tbh. Especially when they're configurable, more recent
and/or dependent on build options - tools are going to have to deal with
them being absent anyway, and that's usually just about no code.


> I think we might as well go a little further and invent an explicit
> JIT option for EXPLAIN, filling in the feature that Andres didn't
> bother with originally.

Yea, I've introduced that in slightly different form in the thread I
referenced yesterday too. See 0004 in
https://www.postgresql.org/message-id/20191029000229.fkjmuld3g7f2jq7i%40alap3.anarazel.de
although it was about adding additional details, rather than showing the
"basic" information.

I'd probably want to make JIT a tristate (off, on, details), instead of
a boolean, but that's details. And can be changed later.


> What's not entirely clear to me is whether to try to preserve the
> current behavior by making it track COSTS if not explicitly
> specified.

> I'd rather decouple that and say "you must write EXPLAIN (JIT [ON]) if
> you want JIT info"; but maybe people will argue that it's already too
> late to change this?

I don't think "too late" is a concern, I don't think there's much of a
"reliance" interest here. Especially not after whacking things around
already.

One concern I do have is that I think we need the overall time for JIT
to be displayed regardless of the JIT option. Otherwise it's going to be
much harder to diagnose cases where some person reports an issue with
executor startup being slow, because of JIT overhead. I think we
shouldn't require users to guess that that's where they should
look. That need, combined with wanting the regression test output to
keep stable, is really what lead to tying the jit information to COSTS.

I guess we could make it so that JIT is inferred based on COSTS, unless
explicitly present. A bit automagical, but ...


> Another debatable question is whether to print anything in non-JIT
> builds.  We could, with a little bit of pain, print a lot of zeroes
> and "falses".  If we stick with the current behavior of omitting
> the JIT fields entirely, then that's extending the existing policy
> to say that configuration options are also allowed to affect the
> set of fields that are printed.  Given that we allow GUCs to affect
> that set (cf track_io_timing), maybe this is okay; but it does seem
> like it's weakening the promise of a well-defined data schema for
> EXPLAIN output.

Hm. I don't think I have an opinion on this. I can see an argument
either way.


Greetings,

Andres Freund




Re: Online checksums patch - once again

2020-01-26 Thread Andres Freund
Hi,

On 2020-01-23 12:23:09 -0500, Robert Haas wrote:
> On Thu, Jan 23, 2020 at 6:19 AM Daniel Gustafsson  wrote:
> > A bigger question is how to handle the offline capabilities.  pg_checksums 
> > can
> > enable or disable checksums in an offline cluster, which will put the 
> > cluster
> > in a state where the pg_control file and the catalog don't match at startup.
> > One strategy could be to always trust the pg_control file and alter the 
> > catalog
> > accordingly, but that still leaves a window of inconsistent cluster state.
> 
> I suggest that we define things so that the catalog state is only
> meaningful during a state transition. That is, suppose the cluster
> state is either "on", "enabling", or "off". When it's "on", checksums
> are written and verified. When it is "off", checksums are not written
> and not verified. When it's "enabling", checksums are written but not
> verified. Also, when and only when the state is "enabling", the
> background workers that try to rewrite relations to add checksums run,
> and those workers look at the catalog state to figure out what to do.
> Once the state changes to "on", those workers don't run any more, and
> so the catalog state does not make any difference.
> 
> A tricky problem is to handling the case where the state is switched
> from "enabling" to "on" and then back to "off" and then to "enabling"
> again. You don't want to confuse the state from the previous round of
> enabling with the state for the current round of enabling. Suppose in
> addition to storing the cluster-wide state of on/off/enabling, we also
> store an "enable counter" which is incremented every time the state
> goes from "off" to "enabling". Then, for each database and relation,
> we store a counter that indicates the value of the enable counter at
> the time we last scanned/rewrote that relation to set checksums. Now,
> you're covered. And, to save space, it can probably be a 32-bit
> counter, since 4 billion disable/reenable cycles ought to be enough
> for anybody.
> 
> It would not be strictly necessary to store this in pg_class. Another
> thing that could be done is to store it in a separate system table
> that could even be truncated when enabling is not in progress - though
> it would be unwise to assume that it's always truncated at the
> beginning of an enabling cycle, since it would be hard to guarantee
> that the previous enabling cycle didn't fail when trying to truncate.
> So you'd probably still end up with something like the counter
> approach. I am inclined to think that inventing a whole new catalog
> for this is over-engineering, but someone might think differently.
> Note that creating a table while enabling is in progress needs to set
> the enabling counter for the new table to the new value of the
> enabling counter, not the old one, because the new table starts empty
> and won't end up with any pages that don't have valid checksums.
> Similarly, TRUNCATE, CLUSTER, VACUUM FULL, and rewriting variants of
> ALTER TABLE can set the new value for the enabling counter as a side
> effect. That's probably easier and more efficient if it's just value
> in pg_class than if they have to go poking around in another catalog.
> So I am tentatively inclined to think that just putting it in pg_class
> makes more sense.
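
For concreteness, the per-relation check described above could be as
simple as the sketch below (every name here is invented for
illustration; nothing comes from an actual patch or catalog):

typedef enum ChecksumMode
{
    CHECKSUMS_OFF,
    CHECKSUMS_ENABLING,
    CHECKSUMS_ON
} ChecksumMode;

typedef struct ChecksumClusterState
{
    ChecksumMode mode;
    uint32      enable_counter;     /* bumped on every off -> enabling */
} ChecksumClusterState;

/*
 * A worker only has work to do while enabling, and only for relations
 * whose stored counter doesn't match the current enable cycle.  Stale
 * values left behind by an aborted earlier cycle can never match, so
 * they are harmless.
 */
static bool
relation_needs_checksum_pass(const ChecksumClusterState *cluster,
                             uint32 rel_last_enable_counter)
{
    if (cluster->mode != CHECKSUMS_ENABLING)
        return false;
    return rel_last_enable_counter != cluster->enable_counter;
}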

I'm somewhat inclined to think that it's worth first making this robust
without catalog state - even though I find restartability
important. Especially due to not having convenient ways to have cross
database state that we can reset without again needing background
workers. I also wonder if it's not worthwhile to design the feature in a
way that, *in the future*, checksums could be separately set on the
standby/primary - without needing to ship the whole database through
WAL.

Oh, if all relation types had a metapage with a common header, this
would be so much easier...

It'd also be a lot easier if we could map from relfilenode back to a
relation oid, without needing catalog access. That'd allow us to acquire
locks on the relation for a filenode, without needing to be connected to
a database. Again, with a metapage, that'd be quite doable.


Probably not worth doing just for this, but I'm wondering about solving
the metapage issue by just adding a metadata relation fork. Sucks to
increase the number of files further, but I don't really see a path
towards having a metapage for everything, given pg_upgrade compat
requirements. Such a metadata fork, in contrast, could easily be filled
by pg_upgrade.  That metadata file could perhaps replace init forks too.

For debuggability storing some information about the relation in that
metadata fork would be great. Being able to identify the type of
relation etc from there, and perhaps even the relname at creation, would
certainly be helpful for cases where the database doesn't start up anymore.

With a bit of care, we could allow AMs to store additional information
in there, by having an offset pointer for AM information in
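
Just to illustrate the kind of thing such a metadata fork page could
carry -- purely hypothetical, every field name below is invented:

typedef struct RelMetapageData
{
    uint32      magic;              /* identifies this as a metadata page */
    Oid         dboid;              /* map a filenode back to database ... */
    Oid         reloid;             /* ... and relation without catalog access */
    char        relkind;            /* heap, index, sequence, ... */
    char        am_name[NAMEDATALEN];   /* which AM created the relation */
    char        relname_at_creation[NAMEDATALEN];  /* debugging aid only */
    uint32      am_info_offset;     /* start of AM-specific data on the page */
} RelMetapageData;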

Re: Ought to use heap_multi_insert() for pg_attribute/depend insertions?

2020-01-26 Thread Andres Freund
Hi,

On 2020-01-22 23:18:12 +0100, Daniel Gustafsson wrote:
> > On 26 Nov 2019, at 06:44, Michael Paquier  wrote:
> 
> Re this patch being in WoA state for some time [0]:
> 
> > The regression tests of contrib/test_decoding are still failing here:
> > +ERROR:  could not resolve cmin/cmax of catalog tuple
> 
> This is the main thing left with this patch, and I've been unable so far to
> figure it out.  I have an unscientific hunch that this patch is shaking out
> something (previously unexercised) in the logical decoding code supporting
> multi-inserts in the catalog.  If anyone has ideas or insights, I would love
> the help.

Perhaps we can take a look at it together while at fosdem? Feel free to
remind me...

Greetings,

Andres Freund




Re: error context for vacuum to include block number

2020-01-26 Thread Andres Freund
Hi,

On 2020-01-20 15:49:29 -0600, Justin Pryzby wrote:
> On Mon, Jan 20, 2020 at 11:11:20AM -0800, Andres Freund wrote:
> > Alternatively we could push another context for each index inside
> > lazy_vacuum_all_indexes(). There's been plenty bugs in indexes
> > triggering problems, so that could be worthwhile.
> 
> Did this too, although I'm not sure what kind of errors it'd find (?)

What do you mean with "kind of errors"? We had index corruptions that
caused index vacuuming to fail, and there was no way to diagnose which
table / index it was so far?


> postgres=# SET client_min_messages=debug;SET statement_timeout=99; VACUUM 
> (VERBOSE, PARALLEL 0) t;
> INFO:  vacuuming "public.t"
> DEBUG:  "t_a_idx": vacuuming index
> 2020-01-20 15:47:36.338 CST [20139] ERROR:  canceling statement due to 
> statement timeout
> 2020-01-20 15:47:36.338 CST [20139] CONTEXT:  while vacuuming relation 
> "public.t_a_idx"
> 2020-01-20 15:47:36.338 CST [20139] STATEMENT:  VACUUM (VERBOSE, PARALLEL 0) 
> t;
> ERROR:  canceling statement due to statement timeout
> CONTEXT:  while vacuuming relation "public.t_a_idx"

It'd be a bit nicer if it said index "public.t_a_idx" for relation "public.t".

Greetings,

Andres Freund




Re: error context for vacuum to include block number

2020-01-26 Thread Andres Freund
Hi,

On 2020-01-22 17:17:26 -0600, Justin Pryzby wrote:
> On Mon, Jan 20, 2020 at 11:11:20AM -0800, Andres Freund wrote:
> > > @@ -966,8 +986,11 @@ lazy_scan_heap(Relation onerel, VacuumParams 
> > > *params, LVRelStats *vacrelstats,
> > >   /* Work on all the indexes, then the heap */
> > > + /* Don't use the errcontext handler outside this 
> > > function */
> > > + error_context_stack = errcallback.previous;
> > >   lazy_vacuum_all_indexes(onerel, Irel, indstats,
> > >   
> > > vacrelstats, lps, nindexes);
> > > + error_context_stack = 
> > 
> > Alternatively we could push another context for each index inside
> > lazy_vacuum_all_indexes(). There's been plenty bugs in indexes
> > triggering problems, so that could be worthwhile.
> 
> Is the callback for index vacuum useful without a block number?

Yea, it is. Without at least that context I don't think we will even
reliably know which index we're dealing with in case of an error.
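
Something like the sketch below would be enough (just the shape, not
the actual patch; struct and variable names are made up):

typedef struct LVIndVacErrInfo
{
    Relation    indrel;
    Relation    heaprel;
} LVIndVacErrInfo;

static void
index_vacuum_error_callback(void *arg)
{
    LVIndVacErrInfo *info = (LVIndVacErrInfo *) arg;

    errcontext("while vacuuming index \"%s\" of relation \"%s\"",
               RelationGetRelationName(info->indrel),
               RelationGetRelationName(info->heaprel));
}

/* around each index's lazy_vacuum_index()/lazy_cleanup_index() call: */
    ErrorContextCallback errcallback;
    LVIndVacErrInfo errinfo;

    errinfo.indrel = Irel[i];
    errinfo.heaprel = onerel;
    errcallback.callback = index_vacuum_error_callback;
    errcallback.arg = (void *) &errinfo;
    errcallback.previous = error_context_stack;
    error_context_stack = &errcallback;

    /* ... vacuum or clean up the index ... */

    error_context_stack = errcallback.previous;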


> FYI, I have another patch which would add DEBUG output before each stage, 
> which
> would be just as much information, and without needing to use a callback.
> It's 0004 here:

I don't think that is equivalent at all. With a context I see the
context in the log in case of an error. With a DEBUG message I need to
be able to reproduce the error (without even knowing which relation
etc).

Greetings,

Andres Freund




Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)

2020-01-26 Thread Andres Freund
Hi,

On 2020-01-26 16:20:03 +0100, Magnus Hagander wrote:
> On Sun, Jan 26, 2020 at 1:44 AM Andres Freund  wrote:
> > On 2020-01-25 15:43:41 +0100, Magnus Hagander wrote:
> > > On Fri, Jan 24, 2020 at 8:52 PM Andres Freund  wrote:
> > > > Lastly, I don't understand what the point of sending fixed size stats,
> > > > like the stuff underlying pg_stat_bgwriter, through pgstats IPC. While
> > > > I don't like its architecture, we obviously need something like pgstat
> > > > to handle variable amounts of stats (database, table level etc
> > > > stats). But that doesn't at all apply to these types of global stats.
> > >
> > > That part has annoyed me as well a few times. +1 for just moving that
> > > into a global shared memory. Given that we don't really care about
> > > things being in sync between those different counters *or* if we lose
> > > a bit of data (which the stats collector is designed to do), we could
> > > even do that without a lock?
> >
> > I don't think we'd quite want to do it without any (single counter)
> > synchronization - high concurrency setups would be pretty likely to
> > loose values that way. I suspect the best would be to have a struct in
> > shared memory that contains the potential counters for each potential
> > process. And then sum them up when actually wanting the concrete
> > value. That way we avoid unnecessary contention, in contrast to having a
> > single shared memory value for each(which would just pingpong between
> > different sockets and store buffers).  There's a few details like how
> > exactly to implement resetting the counters, but ...
> 
> Right. Each process gets to do their own write, but still in shared
> memory. But do you need to lock them when reading them (for the
> summary)? That's the part where I figured you could just read and
> summarize them, and accept the possible loss.

Oh, yea, I'd not lock for that. On nearly all machines aligned 64bit
integers can be read / written without a danger of torn values, and I
don't think we need perfect cross counter accuracy. To deal with the few
platforms without 64bit "single copy atomicity", we can just use
pg_atomic_read/write_u64. These days (e8fdbd58fe) they automatically
fall back to using locked operations for those platforms.  So I don't
think there's actually a danger of loss.

Obviously we could also use atomic ops to increment the value, but I'd
rather not add all those atomic operations, even if it's on uncontended
cachelines. It'd allow us to reset the backend values more easily by
just swapping in a 0, which we can't do if the backend increments
non-atomically. But I think we could instead just have one global "bias"
value to implement resets (by subtracting that from the summarized
value, and storing the current sum when resetting). Or use the new
global barrier to trigger a reset. Or something similar.
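
To sketch what I mean (all names invented, this is not a patch): each
backend owns one slot and only ever writes to that slot, and readers
just sum the slots and subtract a reset bias:

typedef struct BackendIOStats
{
    /* would be padded/aligned to a cache line to avoid false sharing */
    pg_atomic_uint64 buffers_written;
} BackendIOStats;

static BackendIOStats *io_stats;            /* MaxBackends slots, in shmem */
static pg_atomic_uint64 *written_reset_bias;

/* hot path: plain per-backend increment, no atomic RMW, no contention */
static inline void
count_buffer_written(int my_slot)
{
    uint64      v = pg_atomic_read_u64(&io_stats[my_slot].buffers_written);

    pg_atomic_write_u64(&io_stats[my_slot].buffers_written, v + 1);
}

/* reader: lock-free sum; the atomics just guarantee untorn 64-bit reads */
static uint64
sum_buffers_written(int nslots)
{
    uint64      sum = 0;

    for (int i = 0; i < nslots; i++)
        sum += pg_atomic_read_u64(&io_stats[i].buffers_written);
    return sum - pg_atomic_read_u64(written_reset_bias);
}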

Greetings,

Andres Freund




Re: Strange coding in _mdfd_openseg()

2020-01-26 Thread Thomas Munro
On Sun, Jan 26, 2020 at 8:23 AM Noah Misch  wrote:
> Agreed.  The rest of md.c won't cope with a hole in this array, so allowing
> less-than-or-equal here is futile.  The patch in the original post looks fine.

Thanks.  Pushed.




EXPLAIN's handling of output-a-field-or-not decisions

2020-01-26 Thread Tom Lane
I believe that the design intention for EXPLAIN's non-text output
formats is that a given field should appear, or not, depending solely
on the plan shape, EXPLAIN options, and possibly GUC settings.
It's not okay to suppress a field just because it's empty or zero or
otherwise uninteresting, because that makes life harder for automated
tools that now have to cope with expected fields maybe not being there.
See for instance what I wrote in commit 8ebb69f85:

[Partial Mode] did not appear at all for a non-parallelized Agg plan node,
which is contrary to expectation in non-text formats.  We're notionally
producing objects that conform to a schema, so the set of fields for a
given node type and EXPLAIN mode should be well-defined.  I set it up to
fill in "Simple" in such cases.

Other fields that were added for parallel query, namely "Parallel Aware"
and Gather's "Single Copy", had not gotten the word on that point either.
Make them appear always in non-text output.

(This is intentionally different from the policy for TEXT-format output,
which is meant to be human-readable so suppressing boring data is
sensible.)

But I noticed while poking at the EXPLAIN code yesterday that recent
patches haven't adhered to this policy too well.

For one, EXPLAIN (SETTINGS) suppresses the "Settings" subnode if
there's nothing to report.  This is just wrong, but I think all we
have to do is delete the over-eager early exit:

/* also bail out of there are no options */
if (!num)
return;

The other offender is the JIT stuff: it prints if COSTS is on and
there's some JIT activity to report, and otherwise you get nothing.
This is OK for text mode but it's bogus for the other formats.
Since we just rearranged EXPLAIN's JIT output anyway, now seems like
a good time to fix it.

I think we might as well go a little further and invent an explicit
JIT option for EXPLAIN, filling in the feature that Andres didn't
bother with originally.  What's not entirely clear to me is whether
to try to preserve the current behavior by making it track COSTS
if not explicitly specified. I'd rather decouple that and say
"you must write EXPLAIN (JIT [ON]) if you want JIT info"; but maybe
people will argue that it's already too late to change this?

Another debatable question is whether to print anything in non-JIT
builds.  We could, with a little bit of pain, print a lot of zeroes
and "falses".  If we stick with the current behavior of omitting
the JIT fields entirely, then that's extending the existing policy
to say that configuration options are also allowed to affect the
set of fields that are printed.  Given that we allow GUCs to affect
that set (cf track_io_timing), maybe this is okay; but it does seem
like it's weakening the promise of a well-defined data schema for
EXPLAIN output.

Any thoughts?  I'm happy to go make this happen if there's not a
lot of argument over what it should look like.

regards, tom lane




Re: making the backend's json parser work in frontend code

2020-01-26 Thread Mark Dilger


> On Jan 22, 2020, at 10:53 AM, Robert Haas  wrote:
> 
> 0004 is a substantially cleaned up version of the patch to make the
> JSON parser return a result code rather than throwing errors. Names
> have been fixed, interfaces have been tidied up, and the thing is
> better integrated with the surrounding code. I would really like
> comments, if anyone has them, on whether this approach is acceptable.
> 
> 0005 builds on 0004 by moving three functions from jsonapi.c to
> jsonfuncs.c. With that done, jsonapi.c has minimal remaining
> dependencies on the backend environment. It would still need a
> substitute for elog(ERROR, "some internal thing is broken"); I'm
> thinking of using pg_log_fatal() for that case. It would also need a
> fix for the problem that pg_mblen() is not available in the front-end
> environment. I don't know what to do about that yet exactly, but it
> doesn't seem unsolvable. The frontend environment just needs to know
> which encoding to use, and needs a way to call PQmblen() rather than
> pg_mblen().

I have completed the work in the attached 0006 and 0007 patches.
These are intended to apply after your 0004 and 0005; they won’t
work directly on master which, as of this writing, only contains your
0001-0003 patches.

0006 finishes moving the json parser to src/include/common and src/common.

0007 adds testing.

I would appreciate somebody looking at the portability issues for 0007
on Windows.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




v4-0006-Relocating-jsonapi-to-common.patch
Description: Binary data


v4-0007-Adding-frontend-tests-for-json-parser.patch
Description: Binary data


Re: Add %x to PROMPT1 and PROMPT2

2020-01-26 Thread Vik Fearing
On 26/01/2020 19:48, Fabien COELHO wrote:
> 
> Hello Vik,
> 
>> I cannot ever think of a time when I don't want to know if I'm in a
>> transaction or not (and what its state is).  Every new setup I do, I add
>> %x to the psql prompt.
>>
>> I think it should be part of the default prompt.  Patch attached.
> 
> Isn't there examples in the documentation which use the default prompts?
> 
> If so, should they be updated accordingly?

Good catch!
I thought about the documentation but not the examples therein.

Updated patch attached.
-- 
Vik Fearing
>From fed7b95e13b3812d7756188f3b907389f2c45904 Mon Sep 17 00:00:00 2001
From: Vik Fearing 
Date: Fri, 23 Nov 2018 22:26:18 +0100
Subject: [PATCH] Add %x to the default psql prompt.

---
 doc/src/sgml/logicaldecoding.sgml | 6 +++---
 doc/src/sgml/ref/psql-ref.sgml| 2 +-
 src/bin/psql/settings.h   | 4 ++--
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/doc/src/sgml/logicaldecoding.sgml b/doc/src/sgml/logicaldecoding.sgml
index 8db968641e..bce6d379bf 100644
--- a/doc/src/sgml/logicaldecoding.sgml
+++ b/doc/src/sgml/logicaldecoding.sgml
@@ -92,9 +92,9 @@ postgres=# SELECT * FROM pg_logical_slot_get_changes('regression_slot', NULL, NU
 (0 rows)
 
 postgres=# BEGIN;
-postgres=# INSERT INTO data(data) VALUES('1');
-postgres=# INSERT INTO data(data) VALUES('2');
-postgres=# COMMIT;
+postgres=*# INSERT INTO data(data) VALUES('1');
+postgres=*# INSERT INTO data(data) VALUES('2');
+postgres=*# COMMIT;
 
 postgres=# SELECT * FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL);
 lsn|  xid  |  data   
diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index 48b081fd58..20ba105160 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -4332,7 +4332,7 @@ testdb= \set PROMPT1 '%[%033[1;33;40m%]%n@%/%R%[%033[0m%]%# '
 
 To insert a percent sign into your prompt, write
 %%. The default prompts are
-'%/%R%# ' for prompts 1 and 2, and
+'%/%R%x%# ' for prompts 1 and 2, and
 ' ' for prompt 3.
 
 
diff --git a/src/bin/psql/settings.h b/src/bin/psql/settings.h
index d2a9d4836a..2b384a38a1 100644
--- a/src/bin/psql/settings.h
+++ b/src/bin/psql/settings.h
@@ -23,8 +23,8 @@
 #define DEFAULT_EDITOR_LINENUMBER_ARG "+"
 #endif
 
-#define DEFAULT_PROMPT1 "%/%R%# "
-#define DEFAULT_PROMPT2 "%/%R%# "
+#define DEFAULT_PROMPT1 "%/%R%x%# "
+#define DEFAULT_PROMPT2 "%/%R%x%# "
 #define DEFAULT_PROMPT3 ">> "
 
 /*
-- 
2.17.1



Re: Add %x to PROMPT1 and PROMPT2

2020-01-26 Thread Fabien COELHO



Hello Vik,


I cannot ever think of a time when I don't want to know if I'm in a
transaction or not (and what its state is).  Every new setup I do, I add
%x to the psql prompt.

I think it should be part of the default prompt.  Patch attached.


Aren't there examples in the documentation which use the default prompts?

If so, should they be updated accordingly?

--
Fabien.




Re: proposal: schema variables

2020-01-26 Thread Pavel Stehule
>
>
> 12) I find it rather suspicious that we make decisions in utility.c
> solely based on commandType (whether it's CMD_UTILITY or not). IMO
> it's pretty strange/ugly that T_LetStmt can be both CMD_UTILITY and
> CMD_PLAN_UTILITY:
>
>  case T_LetStmt:
>  {
>  if (pstmt->commandType == CMD_UTILITY)
>  doLetStmtReset(pstmt);
>  else
>  {
>  Assert(pstmt->commandType == CMD_PLAN_UTILITY);
>  doLetStmtEval(pstmt, params, queryEnv, queryString);
>  }
>
>  if (completionTag)
>  strcpy(completionTag, "LET");
>  }
>  break;
>
>
>
It looks strange, but it makes sense, because the LET statement supports
resetting to the default value.

I can write

1. LET var = DEFAULT;
2. LET var = (query);

In the first case there is no query to assign, so in that case the LET
statement is really only a UTILITY command.

I added a comment there.

Regards

Pavel




>
>
> regards
>
> --
> Tomas Vondra  http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


0001-schema-variables-20200126.patch.gz
Description: application/gzip


Re: explain HashAggregate to report bucket and memory stats

2020-01-26 Thread Justin Pryzby
On Sun, Jan 26, 2020 at 08:14:25AM -0600, Justin Pryzby wrote:
> On Fri, Jan 03, 2020 at 10:19:25AM -0600, Justin Pryzby wrote:
> > On Sun, Feb 17, 2019 at 11:29:56AM -0500, Jeff Janes wrote:
> > https://www.postgresql.org/message-id/CAMkU%3D1zBJNVo2DGYBgLJqpu8fyjCE_ys%2Bmsr6pOEoiwA7y5jrA%40mail.gmail.com
> > > What would I find very useful is [...] if the HashAggregate node under
> > > "explain analyze" would report memory and bucket stats; and if the 
> > > Aggregate
> > > node would report...anything.
> > 
> > Find attached my WIP attempt to implement this.
> > 
> > Jeff: can you suggest what details Aggregate should show ?
> 
> Rebased on top of 10013684970453a0ddc86050bba813c64321
> And added https://commitfest.postgresql.org/27/2428/

Attached for real.
>From 42634353f44ab44a06ffe4ae36a0dcbb848408ca Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Wed, 1 Jan 2020 13:09:33 -0600
Subject: [PATCH v1 1/2] refactor: show_hinstrument and avoid showing memory
 use if not verbose..

This changes explain analyze at least for Hash(join), but doesn't affect
regression tests, since all the HashJoin tests seem to run explain without
analyze, so nbatch=0, and no stats are shown.

But for future patch to show stats for HashAgg (for which nbatch=1, always), we
want to show buckets in explain analyze, but don't want to show memory, since
it's machine-specific.
---
 src/backend/commands/explain.c | 79 +++---
 1 file changed, 52 insertions(+), 27 deletions(-)

diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index f523adb..11b5857 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -103,6 +103,7 @@ static void show_sortorder_options(StringInfo buf, Node *sortexpr,
 static void show_tablesample(TableSampleClause *tsc, PlanState *planstate,
 			 List *ancestors, ExplainState *es);
 static void show_sort_info(SortState *sortstate, ExplainState *es);
+static void show_hinstrument(ExplainState *es, HashInstrumentation *h);
 static void show_hash_info(HashState *hashstate, ExplainState *es);
 static void show_tidbitmap_info(BitmapHeapScanState *planstate,
 ExplainState *es);
@@ -2713,43 +2714,67 @@ show_hash_info(HashState *hashstate, ExplainState *es)
 		}
 	}
 
-	if (hinstrument.nbatch > 0)
+	show_hinstrument(es, );
+}
+
+/*
+ * Show hash bucket stats and (optionally) memory.
+ */
+static void
+show_hinstrument(ExplainState *es, HashInstrumentation *h)
+{
+	long		spacePeakKb = (h->space_peak + 1023) / 1024;
+
+	// Currently, this isn't shown for explain of hash(join) since nbatch=0 without analyze
+	// But, it's shown for hashAgg since nbatch=1, always.
+	// Need to 1) avoid showing memory use if !analyze; and, 2) avoid memory use if not verbose
+
+	if (h->nbatch <= 0)
+		return;
+	/* This avoids showing anything if it's explain without analyze; should we just check that, instead ? */
+	if (!es->analyze)
+		return;
+
+	if (es->format != EXPLAIN_FORMAT_TEXT)
+	{
+		ExplainPropertyInteger("Hash Buckets", NULL,
+			   h->nbuckets, es);
+		ExplainPropertyInteger("Original Hash Buckets", NULL,
+			   h->nbuckets_original, es);
+		ExplainPropertyInteger("Hash Batches", NULL,
+			   h->nbatch, es);
+		ExplainPropertyInteger("Original Hash Batches", NULL,
+			   h->nbatch_original, es);
+		ExplainPropertyInteger("Peak Memory Usage", "kB",
+			   spacePeakKb, es);
+	}
+	else
 	{
-		long		spacePeakKb = (hinstrument.space_peak + 1023) / 1024;
 
-		if (es->format != EXPLAIN_FORMAT_TEXT)
-		{
-			ExplainPropertyInteger("Hash Buckets", NULL,
-   hinstrument.nbuckets, es);
-			ExplainPropertyInteger("Original Hash Buckets", NULL,
-   hinstrument.nbuckets_original, es);
-			ExplainPropertyInteger("Hash Batches", NULL,
-   hinstrument.nbatch, es);
-			ExplainPropertyInteger("Original Hash Batches", NULL,
-   hinstrument.nbatch_original, es);
-			ExplainPropertyInteger("Peak Memory Usage", "kB",
-   spacePeakKb, es);
-		}
-		else if (hinstrument.nbatch_original != hinstrument.nbatch ||
- hinstrument.nbuckets_original != hinstrument.nbuckets)
-		{
+		if (h->nbatch_original != h->nbatch ||
+			 h->nbuckets_original != h->nbuckets) {
 			ExplainIndentText(es);
 			appendStringInfo(es->str,
-			 "Buckets: %d (originally %d)  Batches: %d (originally %d)  Memory Usage: %ldkB\n",
-			 hinstrument.nbuckets,
-			 hinstrument.nbuckets_original,
-			 hinstrument.nbatch,
-			 hinstrument.nbatch_original,
-			 spacePeakKb);
+		"Buckets: %d (originally %d)   Batches: %d (originally %d)",
+		h->nbuckets,
+		h->nbuckets_original,
+		h->nbatch,
+		h->nbatch_original);
 		}
 		else
 		{
 			ExplainIndentText(es);
 			appendStringInfo(es->str,
-			 "Buckets: %d  Batches: %d  Memory Usage: %ldkB\n",
-			 hinstrument.nbuckets, hinstrument.nbatch,
-			 spacePeakKb);
+		"Buckets: %d  Batches: %d",
+		h->nbuckets,
+		

Re: proposal: type info support functions for functions that use "any" type

2020-01-26 Thread Pavel Stehule
Hi

út 14. 1. 2020 v 22:09 odesílatel Tom Lane  napsal:

> Pavel Stehule  writes:
> >  [ parser-support-function-with-demo-20191128.patch ]
>
> TBH, I'm still not convinced that this is a good idea.  Restricting
> the support function to only change the function's return type is
> safer than the original proposal, but it's still not terribly safe.
> If you change the support function's algorithm in any way, how do
> you know whether you've broken existing stored queries?  If the
> support function consults external resources to make its choice
> (perhaps checking the existence of a cast), where could we record
> that the query depends on the existence of that cast?  There'd be
> no visible trace of that in the query parsetree.
>
>
I reread all the related mails and I think it should be safe - or at least
there is the same risk as with using any C extension for functions or hooks.

I use a example from demo

+CREATE FUNCTION decode_support(internal)
+RETURNS internal
+AS 'MODULE_PATHNAME'
+LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;
+
+--
+-- decode function - example of function that returns "any" type
+--
+CREATE FUNCTION decode(variadic "any")
+RETURNS "any"
+AS 'MODULE_PATHNAME'
+LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE SUPPORT decode_support;

The support function (and its implementation) is tied to the "decode" function.
So I cannot change the behavior of the support function without reloading the
extension, which typically requires a session reconnect.



> I'm also still not convinced that this idea allows doing anything
> that can't be done just as well with polymorphism.  It would be a
> really bad idea for the support function to be examining the values
> of the arguments (else what happens when they're not constants?).
> So all you can do is look at their types, and then it seems like
> the things you can usefully do are pretty much like polymorphism,
> i.e. select some one of the input types, or a related type such
> as an array type or element type.  If there are gaps in what you
> can express with polymorphism, I'd much rather spend effort on
> improving that facility than in adding something that is only
> accessible to advanced C coders.  (Yes, I know I've been slacking
> on reviewing [1].)
>

The design is based not on values, just on types. I don't need to know a
value, I need to know a type.

Currently our polymorphism is not enough - and it is necessary to use the "any"
datatype. This patch just adds the possibility to use "any" as a return type.

I spent a lot of time on this topic and one result is patch [1]. That patch
improves the situation a lot, but cannot cover everything. There are strong
limits on variadic usage.

This patch is really not about values, it is about types - and about having
more possibility (and more flexibility) to control the result type.


> Lastly, I still think that this patch doesn't begin to address
> all the places that would have to know about the feature.  There's
> a lot of places that know about polymorphism --- if this is
> polymorphism on steroids, which it is, then why don't all of those
> places need to be touched?
>

It works with the "any" type, and so it can be very small, because all the
work with this type is moved to the extension.


> On the whole I think we should reject this idea.
>

I will accept whatever you decide. Please try to understand my position as
the Orafce developer and maintainer. I would like to clean up this extension,
and the current state of polymorphism (even with patch [1]) doesn't allow it.

I am open to any proposals, ideas.

Regards

Pavel


> regards, tom lane
>
> [1] https://commitfest.postgresql.org/26/1911/
>


Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)

2020-01-26 Thread Magnus Hagander
On Sun, Jan 26, 2020 at 1:44 AM Andres Freund  wrote:
>
> Hi,
>
> On 2020-01-25 15:43:41 +0100, Magnus Hagander wrote:
> > On Fri, Jan 24, 2020 at 8:52 PM Andres Freund  wrote:
> > > Additionally pg_stat_bgwriter.buffers_backend also counts writes done by
> > > autovacuum et al.
>
> > > I think it'd make sense to at least split buffers_backend into
> > > buffers_backend_extend,
> > > buffers_backend_write,
> > > buffers_backend_write_strat
> > >
> > > but it could also be worthwhile to expand it into
> > > buffers_backend_extend,
> > > buffers_{backend,checkpoint,bgwriter,autovacuum}_write
> > > buffers_{backend,autovacuum}_write_stat
> >
> > Given that these are individual global counters, I don't really see
> > any reason not to expand it to the bigger set of counters. It's easy
> > enough to add them up together later if needed.
>
> Are you agreeing to
> buffers_{backend,checkpoint,bgwriter,autovacuum}_write
> or are you suggesting further ones?

The former.


> > > I think we also count things as writes that aren't writes: mdtruncate()
> > > is AFAICT counted as one backend write for each segment. Which seems
> > > weird to me.
> >
> > It's at least slightly weird :) Might it be worth counting truncate
> > events separately?
>
> Is that really something interesting? Feels like it'd have to be done at
> a higher level to be useful. E.g. the truncate done by TRUNCATE (when in
> same xact as creation) and VACUUM are quite different. I think it'd be
> better to just not include it.

Yeah, you're probably right. It certainly makes very little sense
where it is now.


> > > Lastly, I don't understand what the point of sending fixed size stats,
> > > like the stuff underlying pg_stat_bgwriter, through pgstats IPC. While
> > > I don't like its architecture, we obviously need something like pgstat
> > > to handle variable amounts of stats (database, table level etc
> > > stats). But that doesn't at all apply to these types of global stats.
> >
> > That part has annoyed me as well a few times. +1 for just moving that
> > into a global shared memory. Given that we don't really care about
> > things being in sync between those different counters *or* if we lose
> > a bit of data (which the stats collector is designed to do), we could
> > even do that without a lock?
>
> I don't think we'd quite want to do it without any (single counter)
> synchronization - high concurrency setups would be pretty likely to
> loose values that way. I suspect the best would be to have a struct in
> shared memory that contains the potential counters for each potential
> process. And then sum them up when actually wanting the concrete
> value. That way we avoid unnecessary contention, in contrast to having a
> single shared memory value for each(which would just pingpong between
> different sockets and store buffers).  There's a few details like how
> exactly to implement resetting the counters, but ...

Right. Each process gets to do their own write, but still in shared
memory. But do you need to lock them when reading them (for the
summary)? That's the part where I figured you could just read and
summarize them, and accept the possible loss.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/




Re: explain HashAggregate to report bucket and memory stats

2020-01-26 Thread Justin Pryzby
On Fri, Jan 03, 2020 at 10:19:25AM -0600, Justin Pryzby wrote:
> On Sun, Feb 17, 2019 at 11:29:56AM -0500, Jeff Janes wrote:
> https://www.postgresql.org/message-id/CAMkU%3D1zBJNVo2DGYBgLJqpu8fyjCE_ys%2Bmsr6pOEoiwA7y5jrA%40mail.gmail.com
> > What would I find very useful is [...] if the HashAggregate node under
> > "explain analyze" would report memory and bucket stats; and if the Aggregate
> > node would report...anything.
> 
> Find attached my WIP attempt to implement this.
> 
> Jeff: can you suggest what details Aggregate should show ?

Rebased on top of 10013684970453a0ddc86050bba813c64321
And added https://commitfest.postgresql.org/27/2428/




vacuum verbose: show pages marked allvisible/frozen/hintbits

2020-01-26 Thread Justin Pryzby
I'm forking this thread since it's a separate topic, and since keeping
everything in a single branch hasn't made maintaining the patches any easier.
https://www.postgresql.org/message-id/CAMkU%3D1xAyWnwnLGORBOD%3Dpyv%3DccEkDi%3DwKeyhwF%3DgtB7QxLBwQ%40mail.gmail.com
On Sun, Dec 29, 2019 at 01:15:24PM -0500, Jeff Janes wrote:
> Also, I'd appreciate a report on how many hint-bits were set, and how many
> pages were marked all-visible and/or frozen.  When I do a manual vacuum, it
> is more often for those purposes than it is for removing removable rows
> (which autovac generally does a good enough job of).

The first patch seems simple enough but the 2nd could use critical review.
>From 57eede7d1158904d6b66532c7d0ce6a59803210f Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Sun, 29 Dec 2019 14:56:02 -0600
Subject: [PATCH v1 1/2] Report number of pages marked allvisible/frozen..

..as requested by Jeff Janes
---
 src/backend/access/heap/vacuumlazy.c | 37 ++--
 1 file changed, 31 insertions(+), 6 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 8ce5011..9975699 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -283,7 +283,9 @@ typedef struct LVRelStats
 	double		new_rel_tuples; /* new estimated total # of tuples */
 	double		new_live_tuples;	/* new estimated total # of live tuples */
 	double		new_dead_tuples;	/* new estimated total # of dead tuples */
-	BlockNumber pages_removed;
+	BlockNumber pages_removed;	/* Due to truncation */
+	BlockNumber pages_allvisible;
+	BlockNumber pages_frozen;
 	double		tuples_deleted;
 	BlockNumber nonempty_pages; /* actually, last nonempty page + 1 */
 	LVDeadTuples *dead_tuples;
@@ -602,11 +604,13 @@ heap_vacuum_rel(Relation onerel, VacuumParams *params,
 			 get_namespace_name(RelationGetNamespace(onerel)),
 			 RelationGetRelationName(onerel),
 			 vacrelstats->num_index_scans);
-			appendStringInfo(, _("pages: %u removed, %u remain, %u skipped due to pins, %u skipped frozen\n"),
+			appendStringInfo(, _("pages: %u removed, %u remain, %u skipped due to pins, %u skipped frozen, %u marked all visible, %u marked frozen\n"),
 			 vacrelstats->pages_removed,
 			 vacrelstats->rel_pages,
 			 vacrelstats->pinskipped_pages,
-			 vacrelstats->frozenskipped_pages);
+			 vacrelstats->frozenskipped_pages,
+			 vacrelstats->pages_allvisible,
+			 vacrelstats->pages_frozen);
 			appendStringInfo(,
 			 _("tuples: %.0f removed, %.0f remain, %.0f are dead but not yet removable, oldest xmin: %u\n"),
 			 vacrelstats->tuples_deleted,
@@ -751,6 +755,9 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 	vacrelstats->scanned_pages = 0;
 	vacrelstats->tupcount_pages = 0;
 	vacrelstats->nonempty_pages = 0;
+	vacrelstats->pages_allvisible = 0;
+	vacrelstats->pages_frozen = 0;
+
 	vacrelstats->latestRemovedXid = InvalidTransactionId;
 
 	/*
@@ -1170,6 +1177,8 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 visibilitymap_set(onerel, blkno, buf, InvalidXLogRecPtr,
   vmbuffer, InvalidTransactionId,
   VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
+vacrelstats->pages_allvisible++;
+vacrelstats->pages_frozen++;
 END_CRIT_SECTION();
 			}
 
@@ -1501,8 +1510,12 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 		{
 			uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
 
-			if (all_frozen)
+			if (all_frozen) {
 flags |= VISIBILITYMAP_ALL_FROZEN;
+vacrelstats->pages_frozen++;
+			}
+
+			vacrelstats->pages_allvisible++;
 
 			/*
 			 * It should never be the case that the visibility map page is set
@@ -1690,6 +1703,14 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 	"%u pages are entirely empty.\n",
 	empty_pages),
 	 empty_pages);
+	appendStringInfo(, ngettext("Marked %u page all visible, ",
+	"Marked %u pages all visible, ",
+	vacrelstats->pages_allvisible),
+	vacrelstats->pages_allvisible);
+	appendStringInfo(, ngettext("%u page frozen.\n",
+	"%u pages frozen.\n",
+	vacrelstats->pages_frozen),
+	vacrelstats->pages_frozen);
 	appendStringInfo(, _("%s."), pg_rusage_show());
 
 	ereport(elevel,
@@ -1912,10 +1933,14 @@ lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer,
 		uint8		flags = 0;
 
 		/* Set the VM all-frozen bit to flag, if needed */
-		if ((vm_status & VISIBILITYMAP_ALL_VISIBLE) == 0)
+		if ((vm_status & VISIBILITYMAP_ALL_VISIBLE) == 0) {
 			flags |= VISIBILITYMAP_ALL_VISIBLE;
-		if ((vm_status & VISIBILITYMAP_ALL_FROZEN) == 0 && all_frozen)
+			vacrelstats->pages_allvisible++;
+		}
+		if ((vm_status & VISIBILITYMAP_ALL_FROZEN) == 0 && all_frozen) {
 			flags |= VISIBILITYMAP_ALL_FROZEN;
+			vacrelstats->pages_frozen++;
+		}
 
 		Assert(BufferIsValid(*vmbuffer));
 		if 

Re: Add %x to PROMPT1 and PROMPT2

2020-01-26 Thread Christoph Berg
Re: Vik Fearing 2020-01-26 
<09502c40-cfe1-bb29-10f9-4b3fa7b2b...@2ndquadrant.com>
> I cannot ever think of a time when I don't want to know if I'm in a
> transaction or not (and what its state is).  Every new setup I do, I add
> %x to the psql prompt.
> 
> I think it should be part of the default prompt.  Patch attached.

+1, same here.

Christoph




Add %x to PROMPT1 and PROMPT2

2020-01-26 Thread Vik Fearing
I cannot ever think of a time when I don't want to know if I'm in a
transaction or not (and what its state is).  Every new setup I do, I add
%x to the psql prompt.

I think it should be part of the default prompt.  Patch attached.
-- 
Vik Fearing
>From 6118b8b2ab4cfc70525666b8d57eaa351d6c2a3d Mon Sep 17 00:00:00 2001
From: Vik Fearing 
Date: Fri, 23 Nov 2018 22:26:18 +0100
Subject: [PATCH] add %x to the default psql prompt

---
 doc/src/sgml/ref/psql-ref.sgml | 2 +-
 src/bin/psql/settings.h| 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index 48b081fd58..20ba105160 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -4332,7 +4332,7 @@ testdb= \set PROMPT1 '%[%033[1;33;40m%]%n@%/%R%[%033[0m%]%# '
 
 To insert a percent sign into your prompt, write
 %%. The default prompts are
-'%/%R%# ' for prompts 1 and 2, and
+'%/%R%x%# ' for prompts 1 and 2, and
 ' ' for prompt 3.
 
 
diff --git a/src/bin/psql/settings.h b/src/bin/psql/settings.h
index d2a9d4836a..2b384a38a1 100644
--- a/src/bin/psql/settings.h
+++ b/src/bin/psql/settings.h
@@ -23,8 +23,8 @@
 #define DEFAULT_EDITOR_LINENUMBER_ARG "+"
 #endif
 
-#define DEFAULT_PROMPT1 "%/%R%# "
-#define DEFAULT_PROMPT2 "%/%R%# "
+#define DEFAULT_PROMPT1 "%/%R%x%# "
+#define DEFAULT_PROMPT2 "%/%R%x%# "
 #define DEFAULT_PROMPT3 ">> "
 
 /*
-- 
2.17.1



Re: Remove page-read callback from XLogReaderState.

2020-01-26 Thread Heikki Linnakangas

On 21/01/2020 13:33, Craig Ringer wrote:

On Tue, 21 Jan 2020 at 18:46, Kyotaro Horiguchi  wrote:

It seems to me that it works perfectly, and everything looks good


I seem to remember some considerable pain in this area when it came to
timeline switches. Especially with logical decoding and xlog records
that split across a segment boundary.

My first attempts at logical decoding timeline following appeared fine
and passed tests until they were put under extended real world
workloads, at which point they exploded when they tripped corner cases
like this.

I landed up writing ridiculous regression tests to trigger some of
these behaviours. I don't recall how many of them made it into the
final patch to core but it's worth a look in the TAP test suite.


Yeah, the timeline switching stuff is complicated. The small 
XLogBeginRead() patch isn't really affected, but it's definitely 
something to watch out for in the callback API patch. If you happen to 
have any extra ridiculous tests still lying around, it would be nice to 
take a look at them.


- Heikki




Re: Remove page-read callback from XLogReaderState.

2020-01-26 Thread Heikki Linnakangas

On 21/01/2020 12:45, Kyotaro Horiguchi wrote:

At Mon, 20 Jan 2020 17:24:07 +0900 (JST), Kyotaro Horiguchi 
 wrote in

Separating XLogBeginRead seems reasonable.  The annoying recptr trick
is no longer needed. In particular some logical-decoding stuff become
far cleaner by the patch.

fetching_ckpt of ReadRecord is a bit annoying but that couldn't be
moved outside to, say, ReadCheckpointRecord in a clean way. (Anyway,
XLogBeginRead is not the place, of course.)

As I looked through it, it looks good to me but give me some more time
to look closer.


It seems to me that it works perfectly, and everything looks good to
me except one point.

-* In this case, the passed-in record pointer should already be
+* In this case, EndRecPtr record pointer should already be

I'm not confident, but isn't "record pointer" redundant?  EndRecPtr
seems to contain that meaning already, and the other occurrences of that
kind of variable name are not accompanied by it.


I fixed that, fixed the XLogFindNextRecord() function so that it 
positions the reader like XLogBeginRead() does (I had already adjusted 
the comments to say that, but the function didn't actually do it), and 
pushed. Thanks for the review!
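
For anyone catching up on the thread, the calling convention after this
commit is roughly the sketch below, given an already-allocated
xlogreader and a starting LSN:

    char       *errormsg;
    XLogRecord *record;

    /* position the reader once, then read records sequentially */
    XLogBeginRead(xlogreader, startpoint);
    while ((record = XLogReadRecord(xlogreader, &errormsg)) != NULL)
    {
        /* ... decode or apply the record ... */
    }
    /* record == NULL means end of WAL or an error; errormsg says which */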


I'll continue looking at the callback API changes on Monday.

- Heikki