Re: Improve handling of parameter differences in physical replication

2020-02-27 Thread Peter Eisentraut

On 2020-02-28 08:45, Michael Paquier wrote:

On Thu, Feb 27, 2020 at 02:37:24PM +0100, Peter Eisentraut wrote:

On 2020-02-27 11:13, Fujii Masao wrote:

Btw., I think the current setup is slightly buggy.  The

MaxBackends value that is used to size shared memory is computed as
MaxConnections + autovacuum_max_workers + 1 + max_worker_processes +
max_wal_senders, but we don't track autovacuum_max_workers in WAL.

Maybe this is because autovacuum doesn't work during recovery?


Autovacuum on the primary can use locks or xids, and so it's possible that
the standby when processing WAL encounters more of those than it has locally
allocated shared memory to handle.


Putting aside your patch because that sounds like a separate issue..
Doesn't this mean that autovacuum_max_workers should be added to the
control file, that we need to record in WAL any updates done to it and
that CheckRequiredParameterValues() is wrong?


That would be a direct fix, yes.

Perhaps it might be better to track the combined MaxBackends instead, 
however.


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: more ALTER .. DEPENDS ON EXTENSION fixes

2020-02-27 Thread ahsan hadi
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   tested, passed
Documentation:not tested

Tested the pg_dump patch for dumping "ALTER .. DEPENDS ON EXTENSION" in case of 
indexes, functions, triggers etc. The "ALTER .. DEPENDS ON EXTENSION" is 
included in the dump. However in some case not sure why "ALTER 
INDEX.DEPENDS ON EXTENSION" is repeated several times in the dump?

Re: Improve handling of parameter differences in physical replication

2020-02-27 Thread Michael Paquier
On Thu, Feb 27, 2020 at 02:37:24PM +0100, Peter Eisentraut wrote:
> On 2020-02-27 11:13, Fujii Masao wrote:
>>> Btw., I think the current setup is slightly buggy.  The
> MaxBackends value that is used to size shared memory is computed as
> MaxConnections + autovacuum_max_workers + 1 + max_worker_processes +
> max_wal_senders, but we don't track autovacuum_max_workers in WAL. 
>> Maybe this is because autovacuum doesn't work during recovery?
> 
> Autovacuum on the primary can use locks or xids, and so it's possible that
> the standby when processing WAL encounters more of those than it has locally
> allocated shared memory to handle.

Putting aside your patch because that sounds like a separate issue..
Doesn't this mean that autovacuum_max_workers should be added to the
control file, that we need to record in WAL any updates done to it and
that CheckRequiredParameterValues() is wrong?
--
Michael


signature.asc
Description: PGP signature


Re: Make mesage at end-of-recovery less scary.

2020-02-27 Thread Michael Paquier
On Fri, Feb 28, 2020 at 04:01:00PM +0900, Kyotaro Horiguchi wrote:
> Hello, this is a followup thread of [1].
> 
> # I didn't noticed that the thread didn't cover -hackers..
> 
> When recovery of any type ends, we see several kinds of error messages
> that says "WAL is broken".

Have you considered an error context here?  Your patch leads to a bit
of duplication with the message a bit down of what you are changing
where the end of local pg_wal is reached.

> + * reached the end of WAL.  Otherwise something's really wrong and
> + * we report just only the errormsg if any. If we don't receive

This sentence sounds strange to me.  Or you meant "Something is wrong,
so use errormsg as report if it is set"?

> +  * Note: errormsg is alreay translated.

Typo here.

> + if (StandbyMode)
> + ereport(actual_emode,
> + (errmsg ("rached end of WAL at %X/%X on timeline %u in 
> %s during streaming replication",

StandbyMode happens also with only WAL archiving, depending on if
primary_conninfo is set or not.

> + (errmsg ("rached end of WAL at %X/%X on timeline %u in %s during crash 
> recovery",

FWIW, you are introducing three times the same typo, in the same
word, in three different messages.
--
Michael


signature.asc
Description: PGP signature


Re: Implementing Incremental View Maintenance

2020-02-27 Thread Takuma Hoshiai
On Thu, 27 Feb 2020 14:35:55 -0700 (MST)
legrand legrand  wrote:

> > I have tried to use an other patch with yours:
> > "Planning counters in pg_stat_statements (using pgss_store)"
> 
> > setting
> > shared_preload_libraries='pg_stat_statements'
> > pg_stat_statements.track=all
> > restarting the cluster and creating the extension
> 
> 
> > When trying following syntax:
> 
> > create table b1 (id integer, x numeric(10,3));
> > create incremental materialized view mv1 as select id, count(*),sum(x) 
> > from b1 group by id;
> > insert into b1 values (1,1)
> >
> > I got an ASSERT FAILURE in pg_stat_statements.c
> > on
> > Assert(query != NULL);
> >
> > comming from matview.c
> > refresh_matview_datafill(dest_old, query, queryEnv, NULL);
> > or
> > refresh_matview_datafill(dest_new, query, queryEnv, NULL);
> >
> > If this (last) NULL field was replaced by the query text, 
> > a comment or just "n/a",
> > it would fix the problem.
> 
> > Could this be investigated ?
> 
> Hello,
> 
> thank you for patch v14, that fix problems inherited from temporary tables.
> it seems that this ASSERT problem with pgss patch is still present ;o(
> 
> Could we have a look ?

Sorry but we are busy on fixing and improving IVM patches. I think fixing
the assertion failure needs non trivial changes to other part of PosthreSQL.
So we would like to work on the issue you reported after the pgss patch
gets committed.

Best Regards,

Takuma Hoshiai

 
> Thanks in advance
> Regards
> PAscal
> 
> 
> 
> --
> Sent from:
> https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
> 
> 
> 
> 
> 
> --
> Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
> 
> 
> 


-- 
Takuma Hoshiai 





Re: Assert failure due to "drop schema pg_temp_3 cascade" for temporary tables and \d+ is not showing any info after drooping temp table schema

2020-02-27 Thread Michael Paquier
On Fri, Feb 28, 2020 at 11:29:56AM +0530, Mahendra Singh Thalor wrote:
> Patch looks good to me and it is fixing the assert failure.

Thanks for looking at the patch.  Bringing the code to act
consistently with what was done in 246a6c8 still looks like the
correct direction to take, in short don't drop temp relations created
without an existing temp schema and ignore them instead of creating
more issues with the storage, so I'd like to apply and back-patch this
stuff down to 11.  First, let's wait a couple of extra days.
--
Michael


signature.asc
Description: PGP signature


Make mesage at end-of-recovery less scary.

2020-02-27 Thread Kyotaro Horiguchi
Hello, this is a followup thread of [1].

# I didn't noticed that the thread didn't cover -hackers..

When recovery of any type ends, we see several kinds of error messages
that says "WAL is broken".

> LOG:  invalid record length at 0/7CB6BC8: wanted 24, got 0
> LOG:  redo is not required
> LOG:  database system is ready to accept connections

This patch reduces the scariness of such messages as the follows.

> LOG:  rached end of WAL at 0/1551048 on timeline 1 in pg_wal during crash 
> recovery
> DETAIL:  invalid record length at 0/1551048: wanted 24, got 0
> LOG:  redo is not required
> LOG:  database system is ready to accept connections

[1] 
https://www.postgresql.org/message-id/20200117.172655.1384889922565817808.horikyota.ntt%40gmail.com

I'll register this to the coming CF.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From f3692cb484b7f1ebc351ba8a522039c0b91bcfdb Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Fri, 28 Feb 2020 15:52:58 +0900
Subject: [PATCH] Make End-Of-Recovery error less scary

When recovery in any type ends, we see a bit scary error message like
"invalid record length" that suggests something serious is happening.
Make this message less scary as "reached end of WAL".
---
 src/backend/access/transam/xlog.c | 45 ++-
 1 file changed, 38 insertions(+), 7 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index d19408b3be..452c376f62 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -4288,6 +4288,10 @@ ReadRecord(XLogReaderState *xlogreader, int emode,
 		EndRecPtr = xlogreader->EndRecPtr;
 		if (record == NULL)
 		{
+			int actual_emode =
+emode_for_corrupt_record(emode,
+		 ReadRecPtr ? ReadRecPtr : EndRecPtr);
+
 			if (readFile >= 0)
 			{
 close(readFile);
@@ -4295,14 +4299,41 @@ ReadRecord(XLogReaderState *xlogreader, int emode,
 			}
 
 			/*
-			 * We only end up here without a message when XLogPageRead()
-			 * failed - in that case we already logged something. In
-			 * StandbyMode that only happens if we have been triggered, so we
-			 * shouldn't loop anymore in that case.
+			 * randAccess here means we are reading successive records during
+			 * recovery. If we get here during recovery, we can assume that we
+			 * reached the end of WAL.  Otherwise something's really wrong and
+			 * we report just only the errormsg if any. If we don't receive
+			 * errormsg here, we already logged something.  We don't emit
+			 * "reached end of WAL" in muted messages.
+			 *
+			 * Note: errormsg is alreay translated.
 			 */
-			if (errormsg)
-ereport(emode_for_corrupt_record(emode, EndRecPtr),
-		(errmsg_internal("%s", errormsg) /* already translated */ ));
+			if (!private->randAccess && actual_emode == emode)
+			{
+if (StandbyMode)
+	ereport(actual_emode,
+			(errmsg ("rached end of WAL at %X/%X on timeline %u in %s during streaming replication",
+	 (uint32) (EndRecPtr >> 32), (uint32) EndRecPtr,
+	 ThisTimeLineID,
+	 xlogSourceNames[currentSource]),
+			 (errormsg ? errdetail_internal("%s", errormsg) : 0)));
+else if (InArchiveRecovery)
+	ereport(actual_emode,
+			(errmsg ("rached end of WAL at %X/%X on timeline %u in %s during archive recovery",
+	 (uint32) (EndRecPtr >> 32), (uint32) EndRecPtr,
+	 ThisTimeLineID,
+	 xlogSourceNames[currentSource]),
+			 (errormsg ? errdetail_internal("%s", errormsg) : 0)));
+else
+	ereport(actual_emode,
+			(errmsg ("rached end of WAL at %X/%X on timeline %u in %s during crash recovery",
+	 (uint32) (EndRecPtr >> 32), (uint32) EndRecPtr,
+	 ThisTimeLineID,
+	 xlogSourceNames[currentSource]),
+			 (errormsg ? errdetail_internal("%s", errormsg) : 0)));
+			}
+			else if (errormsg)
+ereport(actual_emode, (errmsg_internal("%s", errormsg)));
 		}
 
 		/*
-- 
2.18.2



Re: Trying to pull up EXPR SubLinks

2020-02-27 Thread Andy Fan
On Fri, Feb 28, 2020 at 2:35 PM Richard Guo  wrote:

> Hi All,
>
> Currently we will not consider EXPR_SUBLINK when pulling up sublinks and
> this would cause performance issues for some queries with the form of:
> 'a > (SELECT agg(b) from ...)' as described in [1].
>
> So here is a patch as an attempt to pull up EXPR SubLinks. The idea,
> which is based on Greenplum's implementation, is to perform the
> following transformation.
>
> For query:
>
> select * from foo where foo.a >
> (select avg(bar.a) from bar where foo.b = bar.b);
>
> we transform it to:
>
> select * from foo inner join
> (select bar.b, avg(bar.a) as avg from bar group by bar.b) sub
> on foo.b = sub.b and foo.a > sub.avg;
>

Glad to see this.  I think the hard part is this transform is not *always*
good.  for example foo.a only has 1 rows, but bar has a lot  of rows, if so
the original would be the better one.  doss this patch consider this
problem?


> Thanks
> Richard
>


Re: [Patch] pg_rewind: options to use restore_command from recovery.conf or command line

2020-02-27 Thread Michael Paquier
On Thu, Feb 27, 2020 at 06:29:34PM +0300, Alexey Kondratov wrote:
> On 2020-02-27 16:41, Alexey Kondratov wrote:
> > 
> > New version of the patch is attached. Thanks again for your review.
> > 
> 
> Last patch (v18) got a conflict with one of today commits (05d8449e73).
> Rebased version is attached.

The shape of the patch is getting better.  I have found some issues
when reading through the patch, but nothing huge.

+   printf(_("  -c, --restore-target-wal   use restore_command in target 
config\n"));
+   printf(_(" to retrieve WAL files from 
archive\n"));
[...]
{"progress", no_argument, NULL, 'P'},
+   {"restore-target-wal", no_argument, NULL, 'c'},
It may be better to reorder that alphabetically.

+   if (rc != 0)
+   /* Sanity check, should never happen. */
+   elog(ERROR, "failed to build restore_command due to missing 
parameters");
No point in having this comment IMO.

+/* logging support */
+#define pg_fatal(...) do { pg_log_fatal(__VA_ARGS__); exit(1); } while(0)
Actually, I don't think that this is a good idea to name this
pg_fatal() as we have the same think in pg_rewind so it could be
confusing.

-   while ((c = getopt_long(argc, argv, "D:nNPR", long_options,
_index)) != -1)
+   while ((c = getopt_long(argc, argv, "D:nNPRc", long_options,
_index)) != -1)
Alphabetical order here.

+   rmdir($node_master->archive_dir);
rmtree() is used in all our other tests.

+   pg_log_error("archive file \"%s\" has wrong size: %lu instead 
of %lu, %s",
+xlogfname, (unsigned long) stat_buf.st_size,
+(unsigned long) expectedSize, strerror(errno));
I think that the error message should be reworded: "unexpected WAL
file size for \"%s\": %lu instead of %lu".  Please note that there is
no need for strerror() here at all, as errno should be 0.

+if (xlogfd < 0)
+pg_log_error("could not open file \"%s\" restored from archive: %s\n",
+ xlogpath, strerror(errno));
[...]
+pg_log_error("could not stat file \"%s\" restored from archive: %s",
+xlogpath, strerror(errno));
No need for strerror() as you can just use %m.  And no need for the
extra newline at the end as pg_log_* routines do that by themselves. 

+   pg_log_error("could not restore file \"%s\" from archive\n",
+xlogfname);
No need for a newline here.

--
Michael


signature.asc
Description: PGP signature


Trying to pull up EXPR SubLinks

2020-02-27 Thread Richard Guo
Hi All,

Currently we will not consider EXPR_SUBLINK when pulling up sublinks and
this would cause performance issues for some queries with the form of:
'a > (SELECT agg(b) from ...)' as described in [1].

So here is a patch as an attempt to pull up EXPR SubLinks. The idea,
which is based on Greenplum's implementation, is to perform the
following transformation.

For query:

select * from foo where foo.a >
(select avg(bar.a) from bar where foo.b = bar.b);

we transform it to:

select * from foo inner join
(select bar.b, avg(bar.a) as avg from bar group by bar.b) sub
on foo.b = sub.b and foo.a > sub.avg;

To do that, we recurse through the quals in sub-select and extract quals
of form 'foo(outervar) = bar(innervar)' and then according to innervars
we make new SortGroupClause items and TargetEntry items for sub-select.
And at last we pull up the sub-select into upper range table.

As a result, the plan would change as:

FROM

   QUERY PLAN

 Seq Scan on foo
   Filter: ((a)::numeric > (SubPlan 1))
   SubPlan 1
 ->  Aggregate
   ->  Seq Scan on bar
 Filter: (foo.b = b)
(6 rows)

TO

QUERY PLAN
--
 Hash Join
   Hash Cond: (foo.b = bar.b)
   Join Filter: ((foo.a)::numeric > (avg(bar.a)))
   ->  Seq Scan on foo
   ->  Hash
 ->  HashAggregate
   Group Key: bar.b
   ->  Seq Scan on bar
(8 rows)

The patch works but still in draft stage. Post it here to see if it is
the right thing we want.

[1]
https://www.postgresql.org/message-id/flat/CAKU4AWodctmbU%2BZj6U83y_RniQk0UeXBvKH1ZaJ%3DLR_iC90GOw%40mail.gmail.com

Thanks
Richard


v1-0001-Draft-PR-for-pulling-up-EXPR_SUBLINK.patch
Description: Binary data


Re: Add LogicalTapeSetExtend() to logtape.c

2020-02-27 Thread Adam Lee
On Thu, Feb 27, 2020 at 01:01:08PM -0800, Jeff Davis wrote:
> Attached is a patch that exports a new function from logtape.c:
> 
>extern LogicalTapeSet *LogicalTapeSetExtend(
> LogicalTapeSet *lts, int nAdditional);
> 
> The purpose is to allow adding new tapes dynamically for the upcoming
> Hash Aggregation work[0]. HashAgg doesn't know in advance how many
> tapes it will need, though only a limited number are actually active at
> a time.
> 
> This was proposed and originally written by Adam Lee[1] (extract only
> the changes to logtape.c/h from his posted patch). Strangely, I'm
> seeing ~5% regression with his version when doing:
> 
> -- t20m_1_int4 has 20 million random integers
> select * from t20m_1_int4 order by i offset 1;
> 
> Which seems to be due to using a pointer rather than a flexible array
> member (I'm using gcc[2]). My version (attached) still uses a flexible
> array member, which doesn't see the regression; but I repalloc the
> whole struct so the caller needs to save the new pointer to the tape
> set.
> 
> That doesn't entirely make sense to me, and I'm wondering if someone
> else can repro that result and/or make a suggestion, because I don't
> have a good explanation. I'm fine with my version of the patch, but it
> would be nice to know why there's such a big difference using a pointer
> versus a flexible array member.
> 
> Regards,
>   Jeff Davis

I noticed another difference, I was using palloc0(), which could be one of the
reason, but not sure.

Tested your hashagg-20200226.patch on my laptop(Apple clang version 11.0.0),
the average time is 25.9s:

```
create table t20m_1_int4(i int);
copy t20m_1_int4 from program 'shuf -i 1-4294967295 -n 2000';
analyze t20m_1_int4;
```
```
explain analyze select * from t20m_1_int4 order by i offset 1;
  QUERY PLAN
---
 Limit  (cost=3310741.20..3310741.20 rows=1 width=4) (actual 
time=25666.471..25666.471 rows=0 loops=1)
   ->  Sort  (cost=3260740.96..3310741.20 rows=2096 width=4) (actual 
time=20663.065..24715.269 rows=2000 loops=1)
 Sort Key: i
 Sort Method: external merge  Disk: 274056kB
 ->  Seq Scan on t20m_1_int4  (cost=0.00..288496.96 rows=2096 
width=4) (actual time=0.075..2749.385 rows=2000 loops=1)
 Planning Time: 0.109 ms
 Execution Time: 25911.542 ms
(7 rows)
```

But if use the palloc0() or do the MemSet() like:

```
lts = (LogicalTapeSet *) palloc0(offsetof(LogicalTapeSet, tapes) +
ntapes * 
sizeof(LogicalTape));
...
MemSet(lts->tapes, 0, lts->nTapes * sizeof(LogicalTape)); <--- this line 
doesn't matter as I observed,
   which makes a 
little sense the compiler
   might know it's 
already zero.
```

The average time goes up to 26.6s:

```
explain analyze select * from t20m_1_int4 order by i offset 1;
  QUERY PLAN
---
 Limit  (cost=3310741.20..3310741.20 rows=1 width=4) (actual 
time=26419.712..26419.712 rows=0 loops=1)
   ->  Sort  (cost=3260740.96..3310741.20 rows=2096 width=4) (actual 
time=21430.044..25468.661 rows=2000 loops=1)
 Sort Key: i
 Sort Method: external merge  Disk: 274056kB
 ->  Seq Scan on t20m_1_int4  (cost=0.00..288496.96 rows=2096 
width=4) (actual time=0.060..2839.452 rows=2000 loops=1)
 Planning Time: 0.111 ms
 Execution Time: 26652.255 ms
(7 rows)
```

-- 
Adam Lee




Re: pgbench: option delaying queries till connections establishment?

2020-02-27 Thread Kyotaro Horiguchi
At Thu, 27 Feb 2020 10:51:29 -0800, Andres Freund  wrote in 
> Hi,
> 
> On 2020-02-27 10:01:00 -0800, Andres Freund wrote:
> > If so, should this be done unconditionally? A new option? Included in an
> > existing one somehow?
> 
> FWIW, leaving windows, error handling, and other annoyances aside, this
> can be implemented fairly simply. See below.
> 
> As an example of the difference:
> 
> Before:
> andres@awork3:~/build/postgres/dev-optimize/vpath$ ./src/bin/pgbench/pgbench 
> -M prepared -c 5000 -j 100 -T 100 -P1 -S
> starting vacuum...end.
> progress: 100.4 s, 515307.4 tps, lat 1.374 ms stddev 7.739
> transaction type: 
> scaling factor: 30
> query mode: prepared
> number of clients: 5000
> number of threads: 100
> duration: 100 s
> number of transactions actually processed: 51728348
> latency average = 1.374 ms
> latency stddev = 7.739 ms
> tps = 513802.541226 (including connections establishing)
> tps = 521342.427158 (excluding connections establishing)
> 
> 
> Note that there's no progress report until the end. That's because the
> main thread didn't get a connection until the other threads were done.
> 
> 
> After:
> 
> pgbench -M prepared -c 5000 -j 100 -T 100 -P1 -S
> starting vacuum...end.
> progress: 1.5 s, 9943.5 tps, lat 4.795 ms stddev 14.822
> progress: 2.0 s, 380312.6 tps, lat 1.728 ms stddev 15.461
> progress: 3.0 s, 478811.1 tps, lat 2.052 ms stddev 31.687
> progress: 4.0 s, 470804.6 tps, lat 1.941 ms stddev 24.661
> 
> 
> 
> I think this also shows that "including/excluding connections
> establishing" as well as some of the other stats reported pretty
> bogus. In the 'before' case a substantial numer of the connections had
> not yet been established until the end of the test run!

I see it useful. In most cases we don't care connection time of
pgbench. Especially in the mentioned case the result is just bogus.  I
think the reason for "including/excluding connection establishing" is
not that people wants to see how long connection took to establish but
that how long the substantial work took.  If each client did run with
continuously re-establishing new connections the connection time would
be useful, but actually all the connections are established at once at
the beginning.

So FWIW I prefer that the barrier is applied by default (that is, it
can be disabled) and the progress time starts at the time all clients
has been established.

> starting vacuum...end.
+ time to established 5000 connections: 1323ms
! progress: 1.0 s, 33.5 tps, lat 2.795 ms stddev 14.822
! progress: 2.0 s, 380312.6 tps, lat 1.728 ms stddev 15.461
! progress: 3.0 s, 478811.1 tps, lat 2.052 ms stddev 31.687
! progress: 4.0 s, 470804.6 tps, lat 1.941 ms stddev 24.661
> transaction type: 
> scaling factor: 30
> query mode: prepared
> number of clients: 5000
> number of threads: 100
> duration: 100 s
> number of transactions actually processed: 51728348
> latency average = 1.374 ms
> latency stddev = 7.739 ms
> tps = 513802.541226 (including connections establishing)
> tps = 521342.427158 (excluding connections establishing)

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Assert failure due to "drop schema pg_temp_3 cascade" for temporary tables and \d+ is not showing any info after drooping temp table schema

2020-02-27 Thread Mahendra Singh Thalor
On Thu, 16 Jan 2020 at 09:36, Michael Paquier  wrote:
>
> On Fri, Jan 10, 2020 at 08:07:48PM +0900, Michael Paquier wrote:
> > Thinking more about it, this has a race condition if a temporary
> > schema is removed after collecting the OIDs in the drop phase.  So the
> > updated attached is actually much more conservative and does not need
> > an update of the log message, without giving up on the improvements
> > done in v11~.  In 9.4~10, the code of the second phase relies on
> > GetTempNamespaceBackendId() which causes an orphaned relation to not
> > be dropped in the event of a missing namespace.  I'll just leave that
> > alone for a couple of days now..
>
> And back on that one, I still like better the solution as of the
> attached which skips any relations with their namespace gone missing
> as 246a6c87's intention was only to allow orphaned temp relations to
> be dropped by autovacuum when a backend slot is connected, but not
> using yet its own temp namespace.
>
> If we want the drop of temp relations to work properly, more thoughts
> are needed regarding the storage part, and I am not actually sure that
> it is autovacuum's job to handle that better.
>
> Any thoughts?
Hi,

Patch looks good to me and it is fixing the assert failure.

-- 
Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.com




Re: Identifying user-created objects

2020-02-27 Thread Amit Langote
On Wed, Feb 26, 2020 at 4:48 PM Masahiko Sawada
 wrote:
> On Thu, 13 Feb 2020 at 17:13, Julien Rouhaud  wrote:
> > On Thu, Feb 13, 2020 at 8:32 AM Amit Langote  
> > wrote:
> > > Maybe we could document that pg_is_user_object() and its internal
> > > counterpart returns true only for objects that are created during
> > > "normal" multi-user database operation.
> >
> > +1
>
> Agreed.
>
> Attached updated version patch.

Thanks for updating the patch.  Some comments:

+  
+   
pg_is_user_object(oid)
+   bool
+   
+true if oid is the object which is
created during
+normal multi-user database operation.
+   
+  

How about clarifying the description further as follows:

"true for objects created while database is operating in normal
multi-user mode, as opposed to single-user mode (see )."

Term "multi-user operation" is not mentioned elsewhere in the
documentation, so better to clarify what it means.

Also, maybe a minor nitpick, but how about adding the new function's
row at the end of the table (Table 9.72) instead of in the middle?

Other than that, patch looks to be in pretty good shape.

Thanks,
Amit




Re: Autovacuum on partitioned table

2020-02-27 Thread Amit Langote
On Fri, Feb 28, 2020 at 11:25 AM Alvaro Herrera
 wrote:
> >   /*
> > +  * If the relation is a partitioned table, we must add up children's
> > +  * reltuples.
> > +  */
> > + if (classForm->relkind == RELKIND_PARTITIONED_TABLE)
> > + {
> > + List *children;
> > + ListCell *lc;
> > +
> > + reltuples = 0;
> > +
> > + /* Find all members of inheritance set taking AccessShareLock 
> > */
> > + children = find_all_inheritors(relid, AccessShareLock, NULL);
> > +
> > + foreach(lc, children)
> > + {
> > + OidchildOID = lfirst_oid(lc);
> > + HeapTuple  childtuple;
> > + Form_pg_class childclass;
> > +
> > + /* Ignore the parent table */
> > + if (childOID == relid)
> > + continue;
>
> I think this loop counts partitioned partitions multiple times, because
> we add up reltuples for all levels, no?  (If I'm wrong, that is, if
> a partitioned rel does not have reltuples, then why skip the parent?)

+1, no need to skip partitioned tables here a their reltuples is always 0.

> > + /*
> > +  * If the table is a leaf partition, tell the stats collector its 
> > parent's
> > +  * changes_since_analyze for auto analyze

Maybe write:

For a leaf partition, add its current changes_since_analyze into its
ancestors' counts.  This must be done before sending the ANALYZE
message as it resets the partition's changes_since_analyze counter.

> > +  */
> > + if (IsAutoVacuumWorkerProcess() &&
> > + rel->rd_rel->relispartition &&
> > + !(rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE))
>
> I'm not sure I understand why we do this only on autovac. Why not all
> analyzes?

+1.  If there is a reason, it should at least be documented in the
comment above.

> > + {
> > + Oid  parentoid;
> > + Relation parentrel;
> > + PgStat_StatDBEntry *dbentry;
> > + PgStat_StatTabEntry *tabentry;
> > +
> > + /* Get its parent table's Oid and relation */
> > + parentoid = get_partition_parent(RelationGetRelid(rel));
> > + parentrel = table_open(parentoid, AccessShareLock);
>
> Climbing up the partitioning hierarchy acquiring locks on ancestor
> relations opens up for deadlocks.  It's better to avoid that.  (As a
> test, you could try what happens if you lock the topmost relation with
> access-exclusive and leave a transaction open, then have autoanalyze
> run).  At the same time, I wonder if it's sensible to move one level up
> here, and also have pgstat_report_partanalyze move more levels up.

Maybe fetch all ancestors here and process from the top.  But as we'd
have locked the leaf partition long before we got here, maybe we
should lock ancestors even before we start analyzing the leaf
partition? AccessShareLock should be enough on the ancestors because
we're not actually analyzing them.

(It appears get_partition_ancestors() returns a list where the root
parent is the last element, so need to be careful with that.)

Thanks,
Amit




Re: Crash by targetted recovery

2020-02-27 Thread Kyotaro Horiguchi
At Thu, 27 Feb 2020 20:04:41 +0900, Fujii Masao  
wrote in 
> 
> 
> On 2020/02/27 17:05, Kyotaro Horiguchi wrote:
> > Thank you for the comment.
> > At Thu, 27 Feb 2020 16:23:44 +0900, Fujii Masao
> >  wrote in
> >> On 2020/02/27 15:23, Kyotaro Horiguchi wrote:
>  I failed to understand why random access while reading from
>  stream is bad idea. Could you elaborate why?
> >>> It seems to me the word "streaming" suggests that WAL record should be
> >>> read sequentially. Random access, which means reading from arbitrary
> >>> location, breaks a stream.  (But the patch doesn't try to stop wal
> >>> sender if randAccess.)
> >>>
>  Isn't it sufficient to set currentSource to 0 when disabling
>  StandbyMode?
> >>> I thought that and it should work, but I hesitated to manipulate on
> >>> currentSource in StartupXLOG. currentSource is basically a private
> >>> state of WaitForWALToBecomeAvailable. ReadRecord modifies it but I
> >>> think it's not good to modify it out of the the logic in
> >>> WaitForWALToBecomeAvailable.
> >>
> >> If so, what about adding the following at the top of
> >> WaitForWALToBecomeAvailable()?
> >>
> >>  if (!StandbyMode && currentSource == XLOG_FROM_STREAM)
> >>   currentSource = 0;
> > It works virtually the same way. I'm happy to do that if you don't
> > agree to using randAccess. But I'd rather do that in the 'if
> > (!InArchiveRecovery)' section.
> 
> The approach using randAccess seems unsafe. Please imagine
> the case where currentSource is changed to XLOG_FROM_ARCHIVE
> because randAccess is true, while walreceiver is still running.
> For example, this case can occur when the record at REDO
> starting point is fetched with randAccess = true after walreceiver
> is invoked to fetch the last checkpoint record. The situation
> "currentSource != XLOG_FROM_STREAM while walreceiver is
>  running" seems invalid. No?

When I mentioned an possibility of changing ReadRecord so that it
modifies randAccess instead of currentSource, I thought that
WaitForWALToBecomeAvailable should shutdown wal receiver as
needed.

At Thu, 27 Feb 2020 15:23:07 +0900 (JST), Kyotaro Horiguchi 
 wrote in 
me> location, breaks a stream.  (But the patch doesn't try to stop wal
me> sender if randAccess.)

And random access during StandbyMode ususally (always?) lets RecPtr go
back.  I'm not sure WaitForWALToBecomeAvailable works correctly if we
don't have a file in pg_wal and the REDO point is far back by more
than a segment from the initial checkpoint record.  (It seems to cause
assertion failure, but I haven't checked that.)

If we go back to XLOG_FROM_ARCHIVE by random access, it correctly
re-connects to the primary for the past segment.

> So I think that the approach that I proposed is better.

It depends on how far we assume RecPtr go back.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Autovacuum on partitioned table

2020-02-27 Thread Amit Langote
Hosoya-san,

Thanks for the new patch.

On Wed, Feb 26, 2020 at 11:33 AM yuzuko  wrote:
> Attach the v5 patch.  In this patch, pgstat_report_analyze() always reports 0 
> as
> msg.m_live_tuples and m_dead_tuples when the relation is partitioned.

Some comments:

+ * PgStat_MsgPartAnalyzeSent by the backend or autovacuum daemon
+ *  after ANALYZE for partitioned tables

Looking at the way this message is used, it does not seem to be an
"analyze" message and also it's not sent "after ANALYZE of partitioned
tables", but really after ANALYZE of leaf partitions.  Analyze (for
both partitioned tables and leaf partitions) is reported as a
PgStat_MsgAnalyze message as before.  It seems that
PgStat_MsgPartAnalyze is only sent to update a leaf partition's
parent's (and recursively any grandparents') changes_since_analyze
counters, so maybe we should find a different name for it.  Maybe,
PgStat_MsgPartChanges and accordingly the message type enum value.

 /*
- * Report ANALYZE to the stats collector, too.  However, if doing
- * inherited stats we shouldn't report, because the stats collector only
- * tracks per-table stats.  Reset the changes_since_analyze counter only
- * if we analyzed all columns; otherwise, there is still work for
- * auto-analyze to do.
+ * Report ANALYZE to the stats collector, too.  If the table is a
+ * partition, report changes_since_analyze of its parent because
+ * autovacuum process for partitioned tables needs it.  Reset the
+ * changes_since_analyze counter only if we analyzed all columns;
+ * otherwise, there is still work for auto-analyze to do.
  */

The new comment says "partitions", which we typically use to refer to
a child table, but this comment really talks about parent tables.  Old
comment says we don't report "inherited stats", presumably because
stats collector lacks the infrastructure to distinguish a table's
inherited stats and own stats, at least in the case of traditional
inheritance.  With this patch, we are making an exception for
partitioned tables, because we are also teaching the stats collector
to maintain at least changes_since_analyze for them that accumulates
counts of changed tuples from partitions.

It seems Alvaro already reported some of the other issues I had with
the patch, such as why partanalyze messages are only sent from a
autovacuum worker.

Thanks,
Amit




Re: Autovacuum on partitioned table

2020-02-27 Thread Alvaro Herrera
Hello Yuzuko,

> +  * Report ANALYZE to the stats collector, too.  If the table is a
> +  * partition, report changes_since_analyze of its parent because
> +  * autovacuum process for partitioned tables needs it.  Reset the
> +  * changes_since_analyze counter only if we analyzed all columns;
> +  * otherwise, there is still work for auto-analyze to do.
>*/
> - if (!inh)
> - pgstat_report_analyze(onerel, totalrows, totaldeadrows,
> -   (va_cols == NIL));
> + if (!inh || onerel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
> + pgstat_report_analyze(onerel, totalrows, totaldeadrows,
> +   (va_cols == NIL));
 
Hmm, I think the comment has a bug: it says "report ... of its parent"
but the report is of the same rel.  (The pgstat_report_analyze line is
mis-indented also).


>   /*
> +  * If the relation is a partitioned table, we must add up children's
> +  * reltuples.
> +  */
> + if (classForm->relkind == RELKIND_PARTITIONED_TABLE)
> + {
> + List *children;
> + ListCell *lc;
> +
> + reltuples = 0;
> +
> + /* Find all members of inheritance set taking AccessShareLock */
> + children = find_all_inheritors(relid, AccessShareLock, NULL);
> +
> + foreach(lc, children)
> + {
> + OidchildOID = lfirst_oid(lc);
> + HeapTuple  childtuple;
> + Form_pg_class childclass;
> +
> + /* Ignore the parent table */
> + if (childOID == relid)
> + continue;

I think this loop counts partitioned partitions multiple times, because
we add up reltuples for all levels, no?  (If I'm wrong, that is, if
a partitioned rel does not have reltuples, then why skip the parent?)
 
> + /*
> +  * If the table is a leaf partition, tell the stats collector its 
> parent's
> +  * changes_since_analyze for auto analyze
> +  */
> + if (IsAutoVacuumWorkerProcess() &&
> + rel->rd_rel->relispartition &&
> + !(rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE))

I'm not sure I understand why we do this only on autovac. Why not all
analyzes?

> + {
> + Oid  parentoid;
> + Relation parentrel;
> + PgStat_StatDBEntry *dbentry;
> + PgStat_StatTabEntry *tabentry;
> +
> + /* Get its parent table's Oid and relation */
> + parentoid = get_partition_parent(RelationGetRelid(rel));
> + parentrel = table_open(parentoid, AccessShareLock);

Climbing up the partitioning hierarchy acquiring locks on ancestor
relations opens up for deadlocks.  It's better to avoid that.  (As a
test, you could try what happens if you lock the topmost relation with
access-exclusive and leave a transaction open, then have autoanalyze
run).  At the same time, I wonder if it's sensible to move one level up
here, and also have pgstat_report_partanalyze move more levels up.

> + * pgstat_report_partanalyze() -
> + *
> + *   Tell the collector about the parent table of which partition just 
> analyzed.
> + *
> + * Caller must provide a child's changes_since_analyze as a parents.

I'm not sure what the last line is trying to say.


Thanks,

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Autovacuum on partitioned table

2020-02-27 Thread Masahiko Sawada
On Wed, 26 Feb 2020 at 11:33, yuzuko  wrote:
>
> Hi,
>
> Thanks for reviewing the patch.
>
> > > We can make it work correctly but I think perhaps we can skip updating
> > > statistics values of partitioned tables other than n_mod_since_analyze
> > > as the first step. Because if we support also n_live_tup and
> > > n_dead_tup, user might get confused that other statistics values such
> > > as seq_scan, seq_tup_read however are not supported.
> >
> > +1, that makes sense.
> >
> Yes, Indeed.  I modified it not to update statistics other than
> n_mod_since_analyze.
> Attach the v5 patch.  In this patch, pgstat_report_analyze() always reports 0 
> as
> msg.m_live_tuples and m_dead_tuples when the relation is partitioned.
>

Thank you for updating the patch. I'll look at it. I'd recommend to
register this patch to the next commit fest so at not to forget.

Regards,

-- 
Masahiko Sawadahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Allowing ALTER TYPE to change storage strategy

2020-02-27 Thread Tomas Vondra

Hi,

From time to time I run into the limitation that ALTER TYPE does not
allow changing storage strategy. I've written a bunch of data types over
the years - in some cases I simply forgot to specify storage in CREATE
TYPE (so it defaulted to PLAIN) or I expected PLAIN to be sufficient and
then later wished I could enable TOAST.

Obviously, without ALTER TYPE supporting that it's rather tricky. You
either need to do dump/restore, or tweak the pg_type catalog directly.
So here is an initial patch extending ALTER TYPE to support this.

The question is why this was not implemented before - my assumption is
this is not simply because no one wanted that. We track the storage in
pg_attribute too, and ALTER TABLE allows changing that ...

My understanding is that pg_type.typstorage essentially specifies two
things: (a) default storage strategy for the attributes with this type,
and (b) whether the type implementation is prepared to handle TOAST-ed
values or not. And pg_attribute.attstorage has to respect this - when
the type says PLAIN then the attribute can't simply use strategy that
would enable TOAST.

Luckily, this is only a problem when switching typstorage to PLAIN (i.e.
when disabling TOAST for the type). The attached v1 patch checks if
there are attributes with non-PLAIN storage for this type, and errors
out if it finds one. But unfortunately that's not entirely correct,
because ALTER TABLE only sets storage for new data. A table may be
created with e.g. EXTENDED storage for an attribute, a bunch of rows may
be loaded and then the storage for the attribute may be changed to
PLAIN. That would pass the check as it's currently in the patch, yet
there may be TOAST-ed values for the type with PLAIN storage :-(

I'm not entirely sure what to do about this, but I think it's OK to just
reject changes in this direction (from non-PLAIN to PLAIN storage). I've
never needed it, and it seems pretty useless - it seems fine to just
instruct the user to do a dump/restore.

In the opposite direction - when switching from PLAIN to a TOAST-enabled
storage, or enabling/disabling compression, this is not an issue at all.
It's legal for type to specify e.g. EXTENDED but attribute to use PLAIN,
for example. So the attached v1 patch simply allows this direction.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/doc/src/sgml/ref/alter_type.sgml b/doc/src/sgml/ref/alter_type.sgml
index 67be1dd568..03a6a59468 100644
--- a/doc/src/sgml/ref/alter_type.sgml
+++ b/doc/src/sgml/ref/alter_type.sgml
@@ -30,6 +30,7 @@ ALTER TYPE name 
RENAME TO name SET SCHEMA 
new_schema
 ALTER TYPE name ADD VALUE [ IF 
NOT EXISTS ] new_enum_value [ { 
BEFORE | AFTER } neighbor_enum_value ]
 ALTER TYPE name RENAME VALUE 
existing_enum_value TO 
new_enum_value
+ALTER TYPE name SET STORAGE { 
PLAIN | EXTERNAL | EXTENDED | MAIN }
 
 where action is one 
of:
 
@@ -97,6 +98,38 @@ ALTER TYPE name 
RENAME VALUE 

 
+   
+
+ SET STORAGE
+ 
+  TOAST
+  per-type storage settings
+ 
+
+
+ 
+  This form sets the storage mode for a data type, controlling whether the
+  values are held inline or in a secondary TOAST table,
+  and whether the data should be compressed or not.
+  PLAIN must be used for fixed-length values such as
+  integer and is inline, uncompressed. MAIN
+  is for inline, compressible data. EXTERNAL is for
+  external, uncompressed data, and EXTENDED is for
+  external, compressed data.  EXTENDED is the default
+  for most data types that support non-PLAIN storage.
+  Use of EXTERNAL will make substring operations on
+  very large text and bytea values run faster,
+  at the penalty of increased storage space.  Note that
+  SET STORAGE doesn't itself change anything in the
+  tables, it just sets the strategy to be used by tables created in the
+  future. See  for more information.
+  Note that this merely modifies the default for a data type, but each
+  attribute specifies it's own strategy that overrides this value.
+  See  for how to change that.
+ 
+
+   
+

 SET SCHEMA
 
diff --git a/src/backend/commands/alter.c b/src/backend/commands/alter.c
index 1cb84182b0..d021b1d272 100644
--- a/src/backend/commands/alter.c
+++ b/src/backend/commands/alter.c
@@ -1042,3 +1042,48 @@ AlterObjectOwner_internal(Relation rel, Oid objectId, 
Oid new_ownerId)
 
InvokeObjectPostAlterHook(classId, objectId, 0);
 }
+
+/*
+ * Executes an ALTER TYPE / SET STORAGE statement.  At the moment this is
+ * supported only for OBJECT_TYPE nad OBJECT_DOMAIN.
+ */
+ObjectAddress
+ExecAlterTypeStorageStmt(AlterTypeStorageStmt *stmt)
+{
+   char   *storagemode;
+   charnewstorage;
+
+   Assert(IsA(stmt->newstorage, String));
+   storagemode = strVal(stmt->newstorage);
+
+   if (pg_strcasecmp(storagemode, "plain") == 

Re: ALTER INDEX fails on partitioned index

2020-02-27 Thread Alvaro Herrera
On 2020-Feb-27, Justin Pryzby wrote:

> The attached allows CREATE/ALTER to specify reloptions on a partitioned table
> which are used as defaults for future children.
> 
> I think that's a desirable behavior, same as for tablespaces.  Michael
> mentioned that ALTER INDEX ONLY doesn't exist, but that's only an issue if
> ALTER acts recursively, which isn't the case here.

I think ALTER not acting recursively is a bug that we would do well not
to propagate any further.  Enough effort we have to spend trying to fix
that already.  Let's add ALTER .. ONLY if needed.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: error context for vacuum to include block number

2020-02-27 Thread Alvaro Herrera
On 2020-Feb-27, Justin Pryzby wrote:

> Originally, the patch only supported "scanning heap", and set the callback
> strictly, to avoid having callback installed when calling other functions 
> (like
> vacuuming heap/indexes).
> 
> Then incrementally added callbacks in increasing number of places.  We only
> need one errcontext.  And possibly you're right that the callback could always
> be in place (?).  But what about things like vacuuming FSM ?  I think we'd 
> need
> another "phase" for that (or else invent a PHASE_IGNORE to do nothing).  Would
> VACUUM_FSM be added to progress reporting, too?  We're also talking about new
> phase for TRUNCATE_PREFETCH and TRUNCATE_WAIT.

I think we should use a separate enum. It's simple enough, and there's
no reason to use the same enum for two different things if it seems to
complicate matters.

> Regarding the cbarg, at one point I took a suggestion from Andres to use the
> LVRelStats struct.  I got rid of that since I didn't like sharing "blkno"
> between heap scanning and heap vacuuming, and needs to be reset when switching
> back to scanning heap.  I experimented now going back to that now.  The only
> utility is in having an single allocation of relname/space.

I'm unsure about reusing that struct.  Not saying don't do it, just ...
unsure.  It possibly has other responsibilities.

I don't think there's a reason to keep 0002 separate.

Regarding this,

> + case PROGRESS_VACUUM_PHASE_VACUUM_HEAP:
> + if (BlockNumberIsValid(cbarg->blkno))
> + errcontext("while vacuuming block %u of 
> relation \"%s.%s\"",
> + cbarg->blkno, 
> cbarg->relnamespace, cbarg->relname);
> + break;

I think you should still call errcontext() when blkno is invalid.  In
fact, just remove the "if" line altogether and let it show whatever
value is there.  It should work okay.  We don't expect the value to be
invalid anyway.

Maybe it would make sense to make the LVRelStats struct members be char
arrays rather than pointers.  Then you memcpy() or strlcpy() them
instead of palloc/free.

Please don't cuddle your braces.


-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Rethinking opclass member checks and dependency strength

2020-02-27 Thread Tom Lane
Tomas Vondra  writes:
> On Sun, Jan 05, 2020 at 12:33:10PM -0500, Tom Lane wrote:
>> I see your point that "check" suggests a read-only operation, but
>> I'm not sure about a better verb.  I thought of "amvalidatemembers",
>> but that's not really much better than "check" is it?

> I don't :-(

Still haven't got a better naming idea, but in the meantime here's
a rebase to fix a conflict with 612a1ab76.

regards, tom lane

diff --git a/contrib/bloom/blutils.c b/contrib/bloom/blutils.c
index 0104d02..1098ab7 100644
--- a/contrib/bloom/blutils.c
+++ b/contrib/bloom/blutils.c
@@ -138,6 +138,7 @@ blhandler(PG_FUNCTION_ARGS)
 	amroutine->amproperty = NULL;
 	amroutine->ambuildphasename = NULL;
 	amroutine->amvalidate = blvalidate;
+	amroutine->amcheckmembers = NULL;
 	amroutine->ambeginscan = blbeginscan;
 	amroutine->amrescan = blrescan;
 	amroutine->amgettuple = NULL;
diff --git a/doc/src/sgml/indexam.sgml b/doc/src/sgml/indexam.sgml
index 37f8d87..d76d17d 100644
--- a/doc/src/sgml/indexam.sgml
+++ b/doc/src/sgml/indexam.sgml
@@ -141,6 +141,7 @@ typedef struct IndexAmRoutine
 amproperty_function amproperty; /* can be NULL */
 ambuildphasename_function ambuildphasename;   /* can be NULL */
 amvalidate_function amvalidate;
+amcheckmembers_function amcheckmembers; /* can be NULL */
 ambeginscan_function ambeginscan;
 amrescan_function amrescan;
 amgettuple_function amgettuple; /* can be NULL */
@@ -500,7 +501,48 @@ amvalidate (Oid opclassoid);
the access method can reasonably do that.  For example, this might include
testing that all required support functions are provided.
The amvalidate function must return false if the opclass is
-   invalid.  Problems should be reported with ereport messages.
+   invalid.  Problems should be reported with ereport
+   messages, typically at INFO level.
+  
+
+  
+
+void
+amcheckmembers (Oid opfamilyoid,
+Oid opclassoid,
+List *operators,
+List *functions);
+
+   Validate proposed operator and function members of an operator family,
+   so far as the access method can reasonably do that, and set their
+   dependency types if the default is not satisfactory.  This is called
+   during CREATE OPERATOR CLASS and during
+   ALTER OPERATOR FAMILY ADD; in the latter
+   case opclassoid is InvalidOid.
+   The List arguments are lists
+   of OpFamilyMember structs, as defined
+   in amapi.h.
+
+   Tests done by this function will typically be a subset of those
+   performed by amvalidate,
+   since amcheckmembers cannot assume that it is
+   seeing a complete set of members.  For example, it would be reasonable
+   to check the signature of a support function, but not to check whether
+   all required support functions are provided.  Any problems can be
+   reported by throwing an error.
+
+   The dependency-related fields of
+   the OpFamilyMember structs are initialized by
+   the core code to create hard dependencies on the opclass if this
+   is CREATE OPERATOR CLASS, or soft dependencies on the
+   opfamily if this is ALTER OPERATOR FAMILY ADD.
+   amcheckmembers can adjust these fields if some other
+   behavior is more appropriate.  For example, GIN, GiST, and SP-GiST
+   always set operator members to have soft dependencies on the opfamily,
+   since the connection between an operator and an opclass is relatively
+   weak in these index types; so it is reasonable to allow operator members
+   to be added and removed freely.  Optional support functions are typically
+   also given soft dependencies, so that they can be removed if necessary.
   
 
 
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 2e8f67e..d1e1176 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -118,6 +118,7 @@ brinhandler(PG_FUNCTION_ARGS)
 	amroutine->amproperty = NULL;
 	amroutine->ambuildphasename = NULL;
 	amroutine->amvalidate = brinvalidate;
+	amroutine->amcheckmembers = NULL;
 	amroutine->ambeginscan = brinbeginscan;
 	amroutine->amrescan = brinrescan;
 	amroutine->amgettuple = NULL;
diff --git a/src/backend/access/gin/ginutil.c b/src/backend/access/gin/ginutil.c
index a7e55ca..e61e629 100644
--- a/src/backend/access/gin/ginutil.c
+++ b/src/backend/access/gin/ginutil.c
@@ -70,6 +70,7 @@ ginhandler(PG_FUNCTION_ARGS)
 	amroutine->amproperty = NULL;
 	amroutine->ambuildphasename = NULL;
 	amroutine->amvalidate = ginvalidate;
+	amroutine->amcheckmembers = gincheckmembers;
 	amroutine->ambeginscan = ginbeginscan;
 	amroutine->amrescan = ginrescan;
 	amroutine->amgettuple = NULL;
diff --git a/src/backend/access/gin/ginvalidate.c b/src/backend/access/gin/ginvalidate.c
index 0b62e0a..f44aa4d 100644
--- a/src/backend/access/gin/ginvalidate.c
+++ b/src/backend/access/gin/ginvalidate.c
@@ -267,3 +267,64 @@ ginvalidate(Oid opclassoid)
 
 	return result;
 }
+
+/*
+ * Prechecking function for adding operators/functions to a GIN 

Re: ALTER INDEX fails on partitioned index

2020-02-27 Thread Justin Pryzby
The attached allows CREATE/ALTER to specify reloptions on a partitioned table
which are used as defaults for future children.

I think that's a desirable behavior, same as for tablespaces.  Michael
mentioned that ALTER INDEX ONLY doesn't exist, but that's only an issue if
ALTER acts recursively, which isn't the case here.

The current behavior seems unreasonable: CREATE allows specifying fillfactor,
which does nothing, and unable to alter it, either:

postgres=# CREATE TABLE tt(i int)PARTITION BY RANGE (i);;
CREATE TABLE
postgres=# CREATE INDEX ON tt(i)WITH(fillfactor=11);
CREATE INDEX
postgres=# \d tt
...
"tt_i_idx" btree (i) WITH (fillfactor='11')
postgres=# ALTER INDEX tt_i_idx SET (fillfactor=12);
ERROR:  "tt_i_idx" is not a table, view, materialized view, or index

Maybe there are other ALTER commands to handle (UNLOGGED currently does nothing
on a partitioned table?, STATISTICS, ...).

The first patch makes a prettier message, per Robert's suggestion.

-- 
Justin
>From e5bb363f514d768a4f540d9c82ad5745944b1486 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Mon, 30 Dec 2019 09:31:03 -0600
Subject: [PATCH v2 1/2] More specific error message when failing to alter a
 partitioned index..

"..is not an index" is deemed to be unfriendly

https://www.postgresql.org/message-id/CA%2BTgmobq8_-DS7qDEmMi-4ARP1_0bkgFEjYfiK97L2eXq%2BQ%2Bnw%40mail.gmail.com
---
 src/backend/commands/tablecmds.c | 23 +--
 1 file changed, 17 insertions(+), 6 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index b7c8d66..1b271af 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -366,7 +366,7 @@ static void ATRewriteTables(AlterTableStmt *parsetree,
 static void ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap, LOCKMODE lockmode);
 static AlteredTableInfo *ATGetQueueEntry(List **wqueue, Relation rel);
 static void ATSimplePermissions(Relation rel, int allowed_targets);
-static void ATWrongRelkindError(Relation rel, int allowed_targets);
+static void ATWrongRelkindError(Relation rel, int allowed_targets, int actual_target);
 static void ATSimpleRecursion(List **wqueue, Relation rel,
 			  AlterTableCmd *cmd, bool recurse, LOCKMODE lockmode,
 			  AlterTableUtilityContext *context);
@@ -5458,7 +5458,7 @@ ATSimplePermissions(Relation rel, int allowed_targets)
 
 	/* Wrong target type? */
 	if ((actual_target & allowed_targets) == 0)
-		ATWrongRelkindError(rel, allowed_targets);
+		ATWrongRelkindError(rel, allowed_targets, actual_target);
 
 	/* Permissions checks */
 	if (!pg_class_ownercheck(RelationGetRelid(rel), GetUserId()))
@@ -5479,7 +5479,7 @@ ATSimplePermissions(Relation rel, int allowed_targets)
  * type.
  */
 static void
-ATWrongRelkindError(Relation rel, int allowed_targets)
+ATWrongRelkindError(Relation rel, int allowed_targets, int actual_target)
 {
 	char	   *msg;
 
@@ -5527,9 +5527,20 @@ ATWrongRelkindError(Relation rel, int allowed_targets)
 			break;
 	}
 
-	ereport(ERROR,
-			(errcode(ERRCODE_WRONG_OBJECT_TYPE),
-			 errmsg(msg, RelationGetRelationName(rel;
+	if (actual_target == ATT_PARTITIONED_INDEX &&
+			(allowed_targets_INDEX) &&
+			!(allowed_targets_PARTITIONED_INDEX))
+		/* Add a special errhint for this case, since "is not an index" message is unfriendly */
+		ereport(ERROR,
+(errcode(ERRCODE_WRONG_OBJECT_TYPE),
+ errmsg(msg, RelationGetRelationName(rel)),
+ // errhint("\"%s\" is a partitioned index", RelationGetRelationName(rel;
+ errhint("operation is not supported on partitioned indexes")));
+	else
+		ereport(ERROR,
+(errcode(ERRCODE_WRONG_OBJECT_TYPE),
+ errmsg(msg, RelationGetRelationName(rel;
+
 }
 
 /*
-- 
2.7.4

>From 5f7f79c4c8e85528e88ffa0b749faad1da4ab4d5 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Tue, 25 Feb 2020 14:27:09 -0600
Subject: [PATCH v2 2/2] Allow reloptions on partitioned tables and indexes..

..which provides default values used for future (direct) partitions.

See also similar behavior from:
dfa60814 - Fix tablespace handling for partitioned indexes
ca410302 - Fix tablespace handling for partitioned tables
c94e6942 - Don't allocate storage for partitioned tables.
---
 src/backend/access/common/reloptions.c   | 19 +--
 src/backend/commands/tablecmds.c | 32 +++-
 src/test/regress/expected/reloptions.out | 29 +
 src/test/regress/sql/reloptions.sql  | 11 +++
 4 files changed, 64 insertions(+), 27 deletions(-)

diff --git a/src/backend/access/common/reloptions.c b/src/backend/access/common/reloptions.c
index 79430d2..a2678e7 100644
--- a/src/backend/access/common/reloptions.c
+++ b/src/backend/access/common/reloptions.c
@@ -1108,10 +1108,8 @@ extractRelOptions(HeapTuple tuple, TupleDesc tupdesc,
 		case RELKIND_RELATION:
 		case RELKIND_TOASTVALUE:
 		case RELKIND_MATVIEW:
-			options = heap_reloptions(classForm->relkind, datum, 

Re: Implementing Incremental View Maintenance

2020-02-27 Thread legrand legrand
> I have tried to use an other patch with yours:
> "Planning counters in pg_stat_statements (using pgss_store)"

> setting
> shared_preload_libraries='pg_stat_statements'
> pg_stat_statements.track=all
> restarting the cluster and creating the extension


> When trying following syntax:

> create table b1 (id integer, x numeric(10,3));
> create incremental materialized view mv1 as select id, count(*),sum(x) 
> from b1 group by id;
> insert into b1 values (1,1)
>
> I got an ASSERT FAILURE in pg_stat_statements.c
> on
>   Assert(query != NULL);
>
> comming from matview.c
>   refresh_matview_datafill(dest_old, query, queryEnv, NULL);
> or
>   refresh_matview_datafill(dest_new, query, queryEnv, NULL);
>
> If this (last) NULL field was replaced by the query text, 
> a comment or just "n/a",
> it would fix the problem.

> Could this be investigated ?

Hello,

thank you for patch v14, that fix problems inherited from temporary tables.
it seems that this ASSERT problem with pgss patch is still present ;o(

Could we have a look ?

Thanks in advance
Regards
PAscal



--
Sent from:
https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html





--
Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html




Re: Resolving the python 2 -> python 3 mess

2020-02-27 Thread Tom Lane
Andrew Dunstan  writes:
> On Thu, Feb 27, 2020 at 1:33 AM Tom Lane  wrote:
>> OK, so we need that *and* the AddProject addition you mentioned?

> Yes, the first one builds it, the second one fixes the tests to run correctly.

Thanks, here's a patchset incorporating those fixes.  Otherwise
same as before.

regards, tom lane

diff --git a/doc/src/sgml/plpython.sgml b/doc/src/sgml/plpython.sgml
index 1921915..ac989a3 100644
--- a/doc/src/sgml/plpython.sgml
+++ b/doc/src/sgml/plpython.sgml
@@ -164,13 +164,6 @@
   
 
   
-   See also the
-   document https://docs.python.org/3/whatsnew/3.0.html;>What's
-   New In Python 3.0 for more information about porting to
-   Python 3.
-  
-
-  
It is not allowed to use PL/Python based on Python 2 and PL/Python
based on Python 3 in the same session, because the symbols in the
dynamic modules would clash, which could result in crashes of the
@@ -179,6 +172,90 @@
a mismatch is detected.  It is possible, however, to use both
PL/Python variants in the same database, from separate sessions.
   
+
+  
+   Converting from Python 2 to Python 3
+
+   
+See the
+document https://docs.python.org/3/whatsnew/3.0.html;>What's
+New In Python 3.0 for the Python community's information and
+recommendations about porting to Python 3.
+   
+
+   
+PostgreSQL provides some support for helping
+you to convert existing Python 2 routines to Python 3.  In an
+installation built with Python 3, there is an
+extension convert_python3 that changes functions
+and procedures from the plpythonu
+and plpython2u languages to
+the plpython3u language.  While doing so, it applies
+the 2to3 tool described in the above document to
+the body of each such routine.
+   
+
+   
+Using convert_python3 can be as simple as:
+
+CREATE EXTENSION convert_python3;
+CALL convert_python3_all();
+
+This must be done as database superuser.  If you wish, you can drop the
+extension once you're done converting everything.
+   
+
+   
+Since convert_python3 is Python 3 code, be careful
+not to install or run it in a session that has previously executed any
+Python 2 code.  As explained above, that won't work.
+   
+
+   
+convert_python3_all has two optional arguments: the
+name of the conversion tool to use (by default 2to3,
+but you might for instance need to provide a full path name) and any
+special command-line options to provide to it.  You might for example
+want to adjust the set of fixer rules
+that 2to3 applies:
+
+CALL convert_python3_all(options = '-f idioms -x apply');
+
+See 2to3's
+https://docs.python.org/3/library/2to3.html;>documentation
+for more information.
+   
+
+   
+The convert_python3 extension also provides a
+procedure that converts just one Python 2 function at a time:
+
+CALL convert_python3_one('myfunc(int)');
+
+The argument is the target function's OID, which can be written as
+a regprocedure constant (see
+).  The main reason to use this would be
+if you need to use different options for different functions.  It has
+the same optional arguments as convert_python3_all:
+
+CALL convert_python3_one('otherfunc(text)', tool = '/usr/bin/2to3',
+ options = '-f idioms');
+
+   
+
+   
+If you have needs that go beyond this, consult the source code for
+the convert_python3 extension (it's just a
+couple of plpython3u procedures) and adapt those
+procedures as necessary.
+   
+
+   
+Keep in mind that if you've constructed any DO blocks
+that use Python 2 code, those will have to be fixed up manually,
+wherever the source code for them exists.
+   
+  
  
 
  
diff --git a/src/pl/plpython/Makefile b/src/pl/plpython/Makefile
index 9e95285..03f858b 100644
--- a/src/pl/plpython/Makefile
+++ b/src/pl/plpython/Makefile
@@ -38,6 +38,9 @@ DATA = $(NAME)u.control $(NAME)u--1.0.sql
 ifeq ($(python_majorversion),2)
 DATA += plpythonu.control plpythonu--1.0.sql
 endif
+ifeq ($(python_majorversion),3)
+DATA += convert_python3.control convert_python3--1.0.sql
+endif
 
 # header files to install - it's not clear which of these might be needed
 # so install them all.
diff --git a/src/pl/plpython/convert_python3--1.0.sql b/src/pl/plpython/convert_python3--1.0.sql
new file mode 100644
index 000..3444ac0
--- /dev/null
+++ b/src/pl/plpython/convert_python3--1.0.sql
@@ -0,0 +1,149 @@
+/* convert_python3--1.0.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION convert_python3" to load this file. \quit
+
+-- This module provides two procedures, one to convert all python2
+-- functions and one to do just one.  They're nearly identical, and
+-- in principle convert_python3_all() could be written as a loop
+-- around convert_python3_one().  It's not done that way since
+-- creating a temp directory for each function in a bulk 

Re: error context for vacuum to include block number

2020-02-27 Thread Justin Pryzby
On Thu, Feb 20, 2020 at 02:02:36PM -0300, Alvaro Herrera wrote:
> On 2020-Feb-19, Justin Pryzby wrote:
> 
> > Also, I was thinking that lazy_scan_heap doesn't needs to do this:
> > 
> > +   /* Pop the error context stack while calling vacuum */
> > +   error_context_stack = errcallback.previous;
> > ...
> > +   /* Set the error context while continuing heap scan */
> > +   error_context_stack = 
> > 
> > It seems to me that's not actually necessary, since lazy_vacuum_heap will 
> > just
> > *push* a context handler onto the stack, and then pop it back off.
> 
> So if you don't pop before pushing, you'll end up with two context
> lines, right?

Hm, looks like you're right, but that's not what I intended (and I didn't hit
that in my test).

> I think it would make sense to
> initialize the context callback just *once* for a vacuum run, and from
> that point onwards, just update the errcbarg struct to match what
> you're currently doing -- not continually pop/push error callback stack
> entries.  See below ...

Originally, the patch only supported "scanning heap", and set the callback
strictly, to avoid having callback installed when calling other functions (like
vacuuming heap/indexes).

Then incrementally added callbacks in increasing number of places.  We only
need one errcontext.  And possibly you're right that the callback could always
be in place (?).  But what about things like vacuuming FSM ?  I think we'd need
another "phase" for that (or else invent a PHASE_IGNORE to do nothing).  Would
VACUUM_FSM be added to progress reporting, too?  We're also talking about new
phase for TRUNCATE_PREFETCH and TRUNCATE_WAIT.

Regarding the cbarg, at one point I took a suggestion from Andres to use the
LVRelStats struct.  I got rid of that since I didn't like sharing "blkno"
between heap scanning and heap vacuuming, and needs to be reset when switching
back to scanning heap.  I experimented now going back to that now.  The only
utility is in having an single allocation of relname/space.

-- 
Justin
>From cad1177c7b61f1543fca1ac2120a238fe6f9e555 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Thu, 12 Dec 2019 20:54:37 -0600
Subject: [PATCH v22 1/3] vacuum errcontext to show block being processed

Discussion:
https://www.postgresql.org/message-id/20191120210600.gc30...@telsasoft.com
---
 src/backend/access/heap/vacuumlazy.c | 162 +++
 1 file changed, 144 insertions(+), 18 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 03c43ef..d34d0c5 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -270,6 +270,8 @@ typedef struct LVParallelState
 
 typedef struct LVRelStats
 {
+	char 		*relnamespace;
+	char		*relname;
 	/* useindex = true means two-pass strategy; false means one-pass */
 	bool		useindex;
 	/* Overall statistics about rel */
@@ -290,8 +292,12 @@ typedef struct LVRelStats
 	int			num_index_scans;
 	TransactionId latestRemovedXid;
 	bool		lock_waiter_detected;
-} LVRelStats;
 
+	/* Used for error callback: */
+	char 		*indname;
+	BlockNumber blkno;	/* used only for heap operations */
+	int			phase;	/* Reusing same constants as for progress reporting */
+} LVRelStats;
 
 /* A few variables that don't seem worth passing around as parameters */
 static int	elevel = -1;
@@ -314,10 +320,10 @@ static void lazy_vacuum_all_indexes(Relation onerel, Relation *Irel,
 	LVRelStats *vacrelstats, LVParallelState *lps,
 	int nindexes);
 static void lazy_vacuum_index(Relation indrel, IndexBulkDeleteResult **stats,
-			  LVDeadTuples *dead_tuples, double reltuples);
+			  LVDeadTuples *dead_tuples, double reltuples, LVRelStats *vacrelstats);
 static void lazy_cleanup_index(Relation indrel,
 			   IndexBulkDeleteResult **stats,
-			   double reltuples, bool estimated_count);
+			   double reltuples, bool estimated_count, LVRelStats *vacrelstats);
 static int	lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer,
 			 int tupindex, LVRelStats *vacrelstats, Buffer *vmbuffer);
 static bool should_attempt_truncation(VacuumParams *params,
@@ -361,6 +367,9 @@ static void end_parallel_vacuum(Relation *Irel, IndexBulkDeleteResult **stats,
 LVParallelState *lps, int nindexes);
 static LVSharedIndStats *get_indstats(LVShared *lvshared, int n);
 static bool skip_parallel_vacuum_index(Relation indrel, LVShared *lvshared);
+static void vacuum_error_callback(void *arg);
+static void update_vacuum_error_cbarg(LVRelStats *errcbarg, int phase,
+		BlockNumber blkno, Relation rel);
 
 
 /*
@@ -460,6 +469,9 @@ heap_vacuum_rel(Relation onerel, VacuumParams *params,
 
 	vacrelstats = (LVRelStats *) palloc0(sizeof(LVRelStats));
 
+	vacrelstats->relnamespace = get_namespace_name(RelationGetNamespace(onerel));
+	vacrelstats->relname = pstrdup(RelationGetRelationName(onerel));
+	vacrelstats->indname = NULL;
 	

Add LogicalTapeSetExtend() to logtape.c

2020-02-27 Thread Jeff Davis
Attached is a patch that exports a new function from logtape.c:

   extern LogicalTapeSet *LogicalTapeSetExtend(
LogicalTapeSet *lts, int nAdditional);

The purpose is to allow adding new tapes dynamically for the upcoming
Hash Aggregation work[0]. HashAgg doesn't know in advance how many
tapes it will need, though only a limited number are actually active at
a time.

This was proposed and originally written by Adam Lee[1] (extract only
the changes to logtape.c/h from his posted patch). Strangely, I'm
seeing ~5% regression with his version when doing:

-- t20m_1_int4 has 20 million random integers
select * from t20m_1_int4 order by i offset 1;

Which seems to be due to using a pointer rather than a flexible array
member (I'm using gcc[2]). My version (attached) still uses a flexible
array member, which doesn't see the regression; but I repalloc the
whole struct so the caller needs to save the new pointer to the tape
set.

That doesn't entirely make sense to me, and I'm wondering if someone
else can repro that result and/or make a suggestion, because I don't
have a good explanation. I'm fine with my version of the patch, but it
would be nice to know why there's such a big difference using a pointer
versus a flexible array member.

Regards,
Jeff Davis

[0] 
https://postgr.es/m/6e7c269b9a84ff75fefcad8ab2d4758f03581e98.camel%40j-davis.com
[1] https://postgr.es/m/20200108071202.GA1511@mars.local
[2] gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0

diff --git a/src/backend/utils/sort/logtape.c b/src/backend/utils/sort/logtape.c
index 4f78b55fbaf..36104a73a75 100644
--- a/src/backend/utils/sort/logtape.c
+++ b/src/backend/utils/sort/logtape.c
@@ -201,6 +201,7 @@ static long ltsGetFreeBlock(LogicalTapeSet *lts);
 static void ltsReleaseBlock(LogicalTapeSet *lts, long blocknum);
 static void ltsConcatWorkerTapes(LogicalTapeSet *lts, TapeShare *shared,
  SharedFileSet *fileset);
+static void ltsInitTape(LogicalTape *lt);
 static void ltsInitReadBuffer(LogicalTapeSet *lts, LogicalTape *lt);
 
 
@@ -536,6 +537,30 @@ ltsConcatWorkerTapes(LogicalTapeSet *lts, TapeShare *shared,
 	lts->nHoleBlocks = lts->nBlocksAllocated - nphysicalblocks;
 }
 
+/*
+ * Initialize per-tape struct.  Note we allocate the I/O buffer and the first
+ * block for a tape only when it is first actually written to.  This avoids
+ * wasting memory space when tuplesort.c overestimates the number of tapes
+ * needed.
+ */
+static void
+ltsInitTape(LogicalTape *lt)
+{
+	lt->writing   = true;
+	lt->frozen= false;
+	lt->dirty = false;
+	lt->firstBlockNumber  = -1L;
+	lt->curBlockNumber= -1L;
+	lt->nextBlockNumber   = -1L;
+	lt->offsetBlockNumber = 0L;
+	lt->buffer= NULL;
+	lt->buffer_size   = 0;
+	/* palloc() larger than MaxAllocSize would fail */
+	lt->max_size  = MaxAllocSize;
+	lt->pos   = 0;
+	lt->nbytes= 0;
+}
+
 /*
  * Lazily allocate and initialize the read buffer. This avoids waste when many
  * tapes are open at once, but not all are active between rewinding and
@@ -579,7 +604,6 @@ LogicalTapeSetCreate(int ntapes, TapeShare *shared, SharedFileSet *fileset,
 	 int worker)
 {
 	LogicalTapeSet *lts;
-	LogicalTape *lt;
 	int			i;
 
 	/*
@@ -597,29 +621,8 @@ LogicalTapeSetCreate(int ntapes, TapeShare *shared, SharedFileSet *fileset,
 	lts->nFreeBlocks = 0;
 	lts->nTapes = ntapes;
 
-	/*
-	 * Initialize per-tape structs.  Note we allocate the I/O buffer and the
-	 * first block for a tape only when it is first actually written to.  This
-	 * avoids wasting memory space when tuplesort.c overestimates the number
-	 * of tapes needed.
-	 */
 	for (i = 0; i < ntapes; i++)
-	{
-		lt = >tapes[i];
-		lt->writing = true;
-		lt->frozen = false;
-		lt->dirty = false;
-		lt->firstBlockNumber = -1L;
-		lt->curBlockNumber = -1L;
-		lt->nextBlockNumber = -1L;
-		lt->offsetBlockNumber = 0L;
-		lt->buffer = NULL;
-		lt->buffer_size = 0;
-		/* palloc() larger than MaxAllocSize would fail */
-		lt->max_size = MaxAllocSize;
-		lt->pos = 0;
-		lt->nbytes = 0;
-	}
+		ltsInitTape(>tapes[i]);
 
 	/*
 	 * Create temp BufFile storage as required.
@@ -1004,6 +1007,29 @@ LogicalTapeFreeze(LogicalTapeSet *lts, int tapenum, TapeShare *share)
 	}
 }
 
+/*
+ * Add additional tapes to this tape set. Not intended to be used when any
+ * tapes are frozen.
+ */
+LogicalTapeSet *
+LogicalTapeSetExtend(LogicalTapeSet *lts, int nAdditional)
+{
+	int i;
+	int		nTapesOrig = lts->nTapes;
+	Size	newSize;
+
+	lts->nTapes += nAdditional;
+	newSize = offsetof(LogicalTapeSet, tapes) +
+		lts->nTapes * sizeof(LogicalTape);
+
+	lts = (LogicalTapeSet *) repalloc(lts, newSize);
+
+	for (i = nTapesOrig; i < lts->nTapes; i++)
+		ltsInitTape(>tapes[i]);
+
+	return lts;
+}
+
 /*
  * Backspace the tape a given number of bytes.  (We also support a more
  * general seek interface, see below.)
diff --git a/src/include/utils/logtape.h b/src/include/utils/logtape.h
index 

Add absolute value to dict_int

2020-02-27 Thread Jeff Janes
I've seen a few requests on how to make FTS search on the absolute value of
integers.  This question is usually driven by the fact that the text search
parser interprets a separating hyphen ("partnumber-987") as a minus sign.

There is currently no good answer for this that doesn't involve C coding.
I think this feature has a natural and trivial home in the dict_int
extension, so attached is a patch that does that.

There are no changes to the extension creation scripts, so there is no need
to bump the version.  And I think the convention is that we don't bump the
version just to signify a change which implements a new feature when that
doesn't change the creation scripts.

Cheers,

Jeff


dict_int_absval_v001.patch
Description: Binary data


Re: [HACKERS] Doubt in pgbench TPS number

2020-02-27 Thread Andres Freund
On 2020-02-27 12:26:36 -0800, Andres Freund wrote:
> Hi,
> 
> On 2015-09-25 20:35:45 +0200, Fabien COELHO wrote:
> > 
> > Hello Tatsuo,
> > 
> > > Hmmm... I never use -C. The formula seems ok:
> > > 
> > >tps_exclude = normal_xacts / (time_include -
> > >(INSTR_TIME_GET_DOUBLE(conn_total_time) / 
> > > nthreads));
> > 
> > Hmmm... it is not:-)
> > 
> > I think that the degree of parallelism to consider is nclients, not
> > nthreads: while connection time is accumulated in conn_time, other clients
> > are possibly doing their transactions, in parallel, even if it is in the
> > same thread, so it is not "stopped time" for all clients. It starts to
> > matter with "-j 1 -c 30" and slow transactions, the cumulated conn_time in
> > each thread may be arbitrary close to the whole time if there are many
> > clients.
> 
> I think this pretty much entirely broke the tps_exclude logic when not
> using -C, especially when -c and -j differ. The wait time there is
> actually per thread, not per client.
> 
> In this example I set post_auth_delay=1s on the server. Pgbench
> tells me:
> pgbench -M prepared -c 180 -j 180 -T 10 -P1 -S
> tps = 897607.544862 (including connections establishing)
> tps = 1004793.708611 (excluding connections establishing)
> 
> pgbench -M prepared -c 180 -j 60 -T 10 -P1 -S
> tps = 739502.979613 (including connections establishing)
> tps = 822639.038779 (excluding connections establishing)
> 
> pgbench -M prepared -c 180 -j 30 -T 10 -P1 -S
> tps = 376468.177081 (including connections establishing)
> tps = 418554.527585 (excluding connections establishing)
> 
> which pretty obviously is bogus. While I'd not expect it'd to work
> perfectly, the "excluding" number should stay roughly constant.
> 
> 
> The fundamental issue is that without -C *none* of the connections in
> each thread gets to actually execute work before all of them have
> established a connection. So dividing conn_total_time by / nclients is
> wrong.
> 
> For more realistic connection delays this leads to the 'excluding'
> number being way too close to the 'including' number, even if a
> substantial portion of the time is spent on it.

Not suggesting it as an actual fix, but just multiplying the computed
conn_time within runThread() by the number of connections the thread
handles (for the non -C case) results in much more reasonable output.

pgbench -M prepared -c 180 -j 30 -T 10 -P1 -S
before:
tps = 378393.985650 (including connections establishing)
tps = 420691.025930 (excluding connections establishing)
after:
tps = 379818.929680 (including connections establishing)
tps = 957405.785600 (excluding connections establishing)

pgbench -M prepared -c 180 -j 180 -T 10 -P1 -S

before:
tps = 906662.031099 (including connections establishing)
tps = 1012223.500880 (excluding connections establishing)

after:
tps = 906178.154876 (including connections establishing)
tps = 1012431.154463 (excluding connections establishing)


The numbers are still a bit bogus because of thread startup overhead
being included, and conn_time being computed relative to the thread
creation. But they at least make some basic sense.

diff --git i/src/bin/pgbench/pgbench.c w/src/bin/pgbench/pgbench.c
index 1159757acb0..3bc45107136 100644
--- i/src/bin/pgbench/pgbench.c
+++ w/src/bin/pgbench/pgbench.c
@@ -6265,6 +6265,16 @@ threadRun(void *arg)
 INSTR_TIME_SET_CURRENT(thread->conn_time);
 INSTR_TIME_SUBTRACT(thread->conn_time, thread->start_time);
 
+/* add once for each other connection */
+if (!is_connect)
+{
+instr_time e = thread->conn_time;
+for (i = 0; i < (nstate - 1); i++)
+{
+INSTR_TIME_ADD(thread->conn_time, e);
+}
+}
+
 /* explicitly initialize the state machines */
 for (i = 0; i < nstate; i++)
 {

Greetings,

Andres Freund




ALTER TEXT SEARCH DICTIONARY tab completion

2020-02-27 Thread Jeff Janes
"ALTER TEXT SEARCH DICTIONARY foobar" can be followed by an open
parenthesis, but that is not offered in tab completion.  That is useful,
because otherwise I have to look up the docs to see if I need a SET or
OPTION(S) or WITH or something before it, just to discover I don't.

The attached one-line patch adds "(".

We can't go beyond that, as available options for each dictionary are not
known in advance.

Cheers,

Jeff


alter_dict_tab_complete.patch
Description: Binary data


Re: [HACKERS] Doubt in pgbench TPS number

2020-02-27 Thread Andres Freund
Hi,

On 2015-09-25 20:35:45 +0200, Fabien COELHO wrote:
> 
> Hello Tatsuo,
> 
> > Hmmm... I never use -C. The formula seems ok:
> > 
> >tps_exclude = normal_xacts / (time_include -
> >(INSTR_TIME_GET_DOUBLE(conn_total_time) / nthreads));
> 
> Hmmm... it is not:-)
> 
> I think that the degree of parallelism to consider is nclients, not
> nthreads: while connection time is accumulated in conn_time, other clients
> are possibly doing their transactions, in parallel, even if it is in the
> same thread, so it is not "stopped time" for all clients. It starts to
> matter with "-j 1 -c 30" and slow transactions, the cumulated conn_time in
> each thread may be arbitrary close to the whole time if there are many
> clients.

I think this pretty much entirely broke the tps_exclude logic when not
using -C, especially when -c and -j differ. The wait time there is
actually per thread, not per client.

In this example I set post_auth_delay=1s on the server. Pgbench
tells me:
pgbench -M prepared -c 180 -j 180 -T 10 -P1 -S
tps = 897607.544862 (including connections establishing)
tps = 1004793.708611 (excluding connections establishing)

pgbench -M prepared -c 180 -j 60 -T 10 -P1 -S
tps = 739502.979613 (including connections establishing)
tps = 822639.038779 (excluding connections establishing)

pgbench -M prepared -c 180 -j 30 -T 10 -P1 -S
tps = 376468.177081 (including connections establishing)
tps = 418554.527585 (excluding connections establishing)

which pretty obviously is bogus. While I'd not expect it'd to work
perfectly, the "excluding" number should stay roughly constant.


The fundamental issue is that without -C *none* of the connections in
each thread gets to actually execute work before all of them have
established a connection. So dividing conn_total_time by / nclients is
wrong.

For more realistic connection delays this leads to the 'excluding'
number being way too close to the 'including' number, even if a
substantial portion of the time is spent on it.

Greetings,

Andres Freund




Less-silly selectivity for JSONB matching operators

2020-02-27 Thread Tom Lane
While looking at a recent complaint about bad planning, I was
reminded that jsonb's @> and related operators use "contsel"
as their selectivity estimator.  This is really bad, because
(a) contsel is only a stub, yielding a fixed default estimate,
and (b) that default is 0.001, meaning we estimate these operators
as five times more selective than equality, which is surely pretty
silly.

There's a good model for improving this in ltree's ltreeparentsel():
for any "var OP constant" query, we can try applying the operator
to all of the column's MCV and histogram values, taking the latter
as being a random sample of the non-MCV values.  That code is
actually 100% generic except for the question of exactly what
default selectivity ought to be plugged in when we don't have stats.

Hence, the attached draft patch moves that logic into a generic
function in selfuncs.c, and then invents "matchsel" and "matchjoinsel"
generic estimators that have a default estimate of twice DEFAULT_EQ_SEL.
(I'm not especially wedded to that number, but it seemed like a
reasonable starting point.)

There were a couple of other operators that seemed to be inappropriately
using contsel, so I changed all of these to use matchsel:

 @>(tsquery,tsquery)| tsq_mcontains
 <@(tsquery,tsquery)| tsq_mcontained
 @@(text,text)  | ts_match_tt
 @@(text,tsquery)   | ts_match_tq
 -|-(anyrange,anyrange) | range_adjacent
 @>(jsonb,jsonb)| jsonb_contains
 ?(jsonb,text)  | jsonb_exists
 ?|(jsonb,text[])   | jsonb_exists_any
 ?&(jsonb,text[])   | jsonb_exists_all
 <@(jsonb,jsonb)| jsonb_contained
 @?(jsonb,jsonpath) | jsonb_path_exists_opr
 @@(jsonb,jsonpath) | jsonb_path_match_opr

Note: you might think that we should just shove this generic logic
into contsel itself, and maybe areasel and patternsel while at it.
However, that would be pretty useless for these functions' intended
usage with the geometric operators, because we collect neither MCV
nor histogram stats for the geometric data types, making the extra
complexity worthless.  Pending somebody putting some effort into
estimation for the geometric data types, I think we should just get
out of the business of having non-geometric types relying on these
estimators.

This patch is not complete, because I didn't look at changing
the contrib modules, and grep says at least some of them are using
contsel for non-geometric data types.  But I thought I'd put it up
for discussion at this stage.

regards, tom lane

diff --git a/contrib/ltree/ltree_op.c b/contrib/ltree/ltree_op.c
index 070868f..2791037 100644
--- a/contrib/ltree/ltree_op.c
+++ b/contrib/ltree/ltree_op.c
@@ -559,8 +559,6 @@ ltree2text(PG_FUNCTION_ARGS)
 }
 
 
-#define DEFAULT_PARENT_SEL 0.001
-
 /*
  *	ltreeparentsel - Selectivity of parent relationship for ltree data types.
  */
@@ -571,101 +569,12 @@ ltreeparentsel(PG_FUNCTION_ARGS)
 	Oid			operator = PG_GETARG_OID(1);
 	List	   *args = (List *) PG_GETARG_POINTER(2);
 	int			varRelid = PG_GETARG_INT32(3);
-	VariableStatData vardata;
-	Node	   *other;
-	bool		varonleft;
 	double		selec;
 
-	/*
-	 * If expression is not variable <@ something or something <@ variable,
-	 * then punt and return a default estimate.
-	 */
-	if (!get_restriction_variable(root, args, varRelid,
-  , , ))
-		PG_RETURN_FLOAT8(DEFAULT_PARENT_SEL);
-
-	/*
-	 * If the something is a NULL constant, assume operator is strict and
-	 * return zero, ie, operator will never return TRUE.
-	 */
-	if (IsA(other, Const) &&
-		((Const *) other)->constisnull)
-	{
-		ReleaseVariableStats(vardata);
-		PG_RETURN_FLOAT8(0.0);
-	}
-
-	if (IsA(other, Const))
-	{
-		/* Variable is being compared to a known non-null constant */
-		Datum		constval = ((Const *) other)->constvalue;
-		FmgrInfo	contproc;
-		double		mcvsum;
-		double		mcvsel;
-		double		nullfrac;
-		int			hist_size;
-
-		fmgr_info(get_opcode(operator), );
-
-		/*
-		 * Is the constant "<@" to any of the column's most common values?
-		 */
-		mcvsel = mcv_selectivity(, , constval, varonleft,
- );
-
-		/*
-		 * If the histogram is large enough, see what fraction of it the
-		 * constant is "<@" to, and assume that's representative of the
-		 * non-MCV population.  Otherwise use the default selectivity for the
-		 * non-MCV population.
-		 */
-		selec = histogram_selectivity(, ,
-	  constval, varonleft,
-	  10, 1, _size);
-		if (selec < 0)
-		{
-			/* Nope, fall back on default */
-			selec = DEFAULT_PARENT_SEL;
-		}
-		else if (hist_size < 100)
-		{
-			/*
-			 * For histogram sizes from 10 to 100, we combine the histogram
-			 * and default selectivities, putting increasingly more trust in
-			 * the histogram for larger sizes.
-			 */
-			double		hist_weight = hist_size / 100.0;
-
-			selec = selec * hist_weight +
-DEFAULT_PARENT_SEL * (1.0 - hist_weight);
-		}
-
-		/* In any case, don't believe extremely small or large estimates. */
-		if (selec < 0.0001)
-			

Re: [GsoC] Read/write transaction-level routing in Odyssey Project Idea

2020-02-27 Thread Stephen Frost
Greetings,

* koschasia...@gmail.com (koschasia...@gmail.com) wrote:
> I send this response to remind you of my initial email because it might have 
> been lost among others. 

Andrey is the one listed as a possible mentor for that project- I've
added him to the CC list.  Hopefully he'll get back to you soon.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: [GsoC] Read/write transaction-level routing in Odyssey Project Idea

2020-02-27 Thread koschasialis
Hello once again,

I send this response to remind you of my initial email because it might have 
been lost among others. 

I also want to know, if I can familiarize myself with your project idea before 
even sending my proposal.

Thanks for your time, I understand you receive hundreds of emails.

> On 25 Feb 2020, at 17:56, Kostas Chasialis  wrote:
> 
> 
> Greetings, 
> 
> I am a Computer Science student at National and Kapodistrian University of 
> Athens, and I would like to be a part of this year's GsoC program.
> During my academic courses, I developed an interest on databases (back-end) 
> and the main language I used during my academic career is C.
> 
> A project I recently developed was In-Memory SUM SQL Query Execution on 
> RDBMS-like queries using SortMerge Join. As input data we used the data 
> provided from SIGMOD 2018 contest.
> (github link : https://github.com/kchasialis/Query-Compiler-Executor)
> 
> Therefore, the project idea "Read/Write transaction-level routing in Odyssea" 
> highly intrigued me and I would like to be a part of it.
> I believe that I have the necessary background to live up to your 
> expectations and I would like to learn more things about this project and 
> ways I can contribute to its development.
> 
> Thanks in advance, Kostas.
> 


Re: pgbench: option delaying queries till connections establishment?

2020-02-27 Thread Andres Freund
Hi,

On 2020-02-27 10:01:00 -0800, Andres Freund wrote:
> If so, should this be done unconditionally? A new option? Included in an
> existing one somehow?

FWIW, leaving windows, error handling, and other annoyances aside, this
can be implemented fairly simply. See below.

As an example of the difference:

Before:
andres@awork3:~/build/postgres/dev-optimize/vpath$ ./src/bin/pgbench/pgbench -M 
prepared -c 5000 -j 100 -T 100 -P1 -S
starting vacuum...end.
progress: 100.4 s, 515307.4 tps, lat 1.374 ms stddev 7.739
transaction type: 
scaling factor: 30
query mode: prepared
number of clients: 5000
number of threads: 100
duration: 100 s
number of transactions actually processed: 51728348
latency average = 1.374 ms
latency stddev = 7.739 ms
tps = 513802.541226 (including connections establishing)
tps = 521342.427158 (excluding connections establishing)


Note that there's no progress report until the end. That's because the
main thread didn't get a connection until the other threads were done.


After:

pgbench -M prepared -c 5000 -j 100 -T 100 -P1 -S
starting vacuum...end.
progress: 1.5 s, 9943.5 tps, lat 4.795 ms stddev 14.822
progress: 2.0 s, 380312.6 tps, lat 1.728 ms stddev 15.461
progress: 3.0 s, 478811.1 tps, lat 2.052 ms stddev 31.687
progress: 4.0 s, 470804.6 tps, lat 1.941 ms stddev 24.661



I think this also shows that "including/excluding connections
establishing" as well as some of the other stats reported pretty
bogus. In the 'before' case a substantial numer of the connections had
not yet been established until the end of the test run!



diff --git i/src/bin/pgbench/pgbench.c w/src/bin/pgbench/pgbench.c
index 1159757acb0..1a82c6a290e 100644
--- i/src/bin/pgbench/pgbench.c
+++ w/src/bin/pgbench/pgbench.c
@@ -310,6 +310,8 @@ typedef struct RandomState
 /* Various random sequences are initialized from this one. */
 static RandomState base_random_sequence;
 
+pthread_barrier_t conn_barrier;
+
 /*
  * Connection state machine states.
  */
@@ -6110,6 +6112,8 @@ main(int argc, char **argv)
 
 /* start threads */
 #ifdef ENABLE_THREAD_SAFETY
+pthread_barrier_init(_barrier, NULL, nthreads);
+
 for (i = 0; i < nthreads; i++)
 {
 TState *thread = [i];
@@ -6265,6 +6269,8 @@ threadRun(void *arg)
 INSTR_TIME_SET_CURRENT(thread->conn_time);
 INSTR_TIME_SUBTRACT(thread->conn_time, thread->start_time);
 
+pthread_barrier_wait(_barrier);
+
 /* explicitly initialize the state machines */
 for (i = 0; i < nstate; i++)
 {

Greetings,

Andres Freund




pgbench: option delaying queries till connections establishment?

2020-02-27 Thread Andres Freund
Hi,

I am trying to run a few benchmarks measuring the effects of patch to
make GetSnapshotData() faster in the face of larger numbers of
established connections.

Before the patch connection establishment often is very slow due to
contention. The first few connections are fast, but after that it takes
increasingly long. The first few connections constantly hold
ProcArrayLock in shared mode, which then makes it hard for new
connections to acquire it exclusively (I'm addressing that to a
significant degree in the patch FWIW).

But for a fair comparison of the runtime effects I'd like to only
compare the throughput for when connections are actually usable,
otherwise I end up benchmarking few vs many connections, which is not
useful. And because I'd like to run the numbers for a lot of different
numbers of connections etc, I can't just make each run several hour
longs to make the initial minutes not matter much.

Therefore I'd like to make pgbench wait till it has established all
connections, before they run queries.

Does anybody else see this as being useful?

If so, should this be done unconditionally? A new option? Included in an
existing one somehow?

Greetings,

Andres Freund




Re: Allow auto_explain to log plans before queries are executed

2020-02-27 Thread legrand legrand
Hi,

that feature for dumping plans with auto explain is already available in
https://github.com/legrandlegrand/pg_stat_sql_plans

This is an hybrid extension combining auto_explain and pg_stat_statements,
adding a planid and tracking metrics even on error, ..., ...

With 
pg_stat_sql_plans.track_planid = true
pg_stat_sql_plans.explain = true
-->  it writes explain plan in log file after planning and only one time
per (queryid,planid)
   then no need of sampling

and with
pg_stat_sql_plans.track = 'all'
  --> function pgssp_backend_queryid(pid) retrieves (nested) queryid of a
stuck statement, 
and permit to retrieve its plan (by its queryid) in logs.

Regards
PAscal




--
Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html




Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.

2020-02-27 Thread Peter Geoghegan
On Wed, Feb 26, 2020 at 10:03 PM Fujii Masao
 wrote:
> Thanks for committing this nice feature!

You're welcome!

> Here is one minor comment.
>
> +  deduplicate_items
> +  storage parameter
>
> This should be
>
>  deduplicate_items storage parameter

I pushed a fix for this.

Thanks
-- 
Peter Geoghegan




BEFORE ROW triggers for partitioned tables

2020-02-27 Thread Alvaro Herrera
Enabling BEFORE triggers FOR EACH ROW in partitioned tables is very easy
-- just remove the check against them.  With that, they work fine.

The main problem is that the executor is not prepared to re-route the
tuple if the user decides to change the partitioning columns (so you get
the error that the partitioning constraint is violated by the partition,
which makes no sense if you're inserting in the top-level partitioned
table).  There are several views about that:

1. Just let it be.  If the user does something silly, it's their problem
if they get an ugly error message.

2. If the partition keys are changed, raise an error.  The trigger is
allowed to change everything but those columns.  Then there's no
conflict, and it allows desirable use cases.

3. Allow the partition keys to change, as long as the tuple ends up in
the same partition.  This is the same as (1) except the error message is
nicer.

The attached patch implements (2).  The cases that are allowed by (3)
are a strict superset of those allowed by (2), so if we decide to allow
it in the future, it is possible without breaking anything that works
after implementing (2).

The implementation harnesses the newly added pg_trigger.tgparentid
column; if that is set to a non-zero value, then we search up the
partitioning hierarchy for each partitioning key column, and verify the
values are bitwise equal, up to the "root".  Notes:

* We must check all levels, not just the one immediately above, because
the routing might involve crawling down several levels, and any of those
might become invalidated if the trigger changes values.

* The "root" is not necessarily the root partitioned table, but instead
it's the table that was named in the command.  Because of this, we don't
need to acquire locks on the tables, since the executor already has the
tables open and locked (thus they cannot be modified by concurrent
commands).

* I find it a little odd that the leaf partition doesn't have any intel
on what its partitioning columns are.  I guess they haven't been needed
thus far, and it seems inappropriate for this admittedly very small
feature to add such a burden on the rest of the system.

* The new function I added, ReportTriggerPartkeyChange(), contains one
serious bug (namely: it doesn't map attribute numbers properly if
partitions are differently defined).   Also, it has a performance issue,
namely that we do heap_getattr() potentially repeatedly -- maybe it'd be
better to "slotify" the tuple prior to doing the checks.  Another
possible controversial point is that its location in commands/trigger.c
isn't great.  (Frankly, I don't understand why the executor trigger
firing functions are in commands/ at all.)

Thoughts?

-- 
Álvaro Herrera
>From 20ad3320d9d5f16756d3fd0bba2db5df74117d35 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Wed, 26 Feb 2020 17:34:54 -0300
Subject: [PATCH] Enable BEFORE row-level triggers for partitioned tables

... with the limitation that if partitioning columns are changed, the
operation is aborted.  (The reason for this limitation is that it might
require re-routing the tuple, which doesn't work.)
---
 src/backend/commands/tablecmds.c   |   7 --
 src/backend/commands/trigger.c | 106 ++---
 src/backend/partitioning/partdesc.c|   2 +-
 src/include/utils/reltrigger.h |   1 +
 src/test/regress/expected/triggers.out |  53 -
 src/test/regress/sql/triggers.sql  |  39 -
 6 files changed, 182 insertions(+), 26 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 02a7c04fdb..3b560067a4 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -16503,13 +16503,6 @@ CloneRowTriggersToPartition(Relation parent, Relation partition)
 			 !OidIsValid(trigForm->tgparentid)))
 			continue;
 
-		/*
-		 * Complain if we find an unexpected trigger type.
-		 */
-		if (!TRIGGER_FOR_AFTER(trigForm->tgtype))
-			elog(ERROR, "unexpected trigger \"%s\" found",
- NameStr(trigForm->tgname));
-
 		/* Use short-lived context for CREATE TRIGGER */
 		oldcxt = MemoryContextSwitchTo(perTupCxt);
 
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index 6e8b7223fe..d4687de211 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -53,10 +53,12 @@
 #include "utils/acl.h"
 #include "utils/builtins.h"
 #include "utils/bytea.h"
+#include "utils/datum.h"
 #include "utils/fmgroids.h"
 #include "utils/inval.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
+#include "utils/partcache.h"
 #include "utils/rel.h"
 #include "utils/snapmgr.h"
 #include "utils/syscache.h"
@@ -82,6 +84,9 @@ static int	MyTriggerDepth = 0;
 /* Local function prototypes */
 static void ConvertTriggerToFK(CreateTrigStmt *stmt, Oid funcoid);
 static void SetTriggerFlags(TriggerDesc *trigdesc, Trigger *trigger);
+static void ReportTriggerPartkeyChange(ResultRelInfo *relinfo,
+

Re: pg_trigger.tgparentid

2020-02-27 Thread Tom Lane
Alvaro Herrera  writes:
> Thanks both -- pushed.  I also changed regress/sql/triggers to leave
> tables around that have a non-zero tgparentid.  This ensures that the
> pg_upgrade test sees such objects, as well as findoidjoins.

> I refrained from doing the findoidjoins dance itself, though; I got a
> large number of false positives that I think are caused by some pg12-era
> hacking.

Generally I try to update findoidjoins once per release cycle, after
feature freeze.  I don't think it's worth messing with it more often
than that.  But thanks for making sure there'll be data for it to find.

regards, tom lane




Re: Error on failed COMMIT

2020-02-27 Thread Dave Cramer
On Thu, 27 Feb 2020 at 07:44, Dave Cramer  wrote:

>
>
>
> On Wed, 26 Feb 2020 at 16:57, Vik Fearing  wrote:
>
>> On 26/02/2020 22:22, Tom Lane wrote:
>> > Dave Cramer  writes:
>> >> OK, here is a patch that actually doesn't leave the transaction in a
>> failed
>> >> state but emits the error and rolls back the transaction.
>> >
>> >> This is far from complete as it fails a number of tests  and does not
>> cover
>> >> all of the possible paths.
>> >> But I'd like to know if this is strategy will be acceptable ?
>> >
>> > I really don't think that changing the server's behavior here is going
>> to
>> > fly.  The people who are unhappy that we changed it are going to vastly
>> > outnumber the people who are happy.  Even the people who are happy are
>> not
>> > going to find that their lives are improved all that much, because
>> they'll
>> > still have to deal with old servers with the old behavior for the
>> > foreseeable future.
>>
>> Dealing with old servers for a while is something that everyone is used
>> to.
>>
>> > Even granting that a behavioral incompatibility is acceptable, I'm not
>> > sure how a client is supposed to be sure that this "error" means that a
>> > rollback happened, as opposed to real errors that prevented any state
>> > change from occurring.
>>
>> Because the error is a Class 40 — Transaction Rollback.
>>
>> My original example was:
>>
>> postgres=!# commit;
>> ERROR:  40P00: transaction cannot be committed
>> DETAIL:  First error was "42601: syntax error at or near "error"".
>>
>>
>> > (A trivial example of that is misspelling the
>> > COMMIT command; which I'll grant is unlikely in practice.  But there are
>> > less-trivial examples involving internal server malfunctions.)
>>
>> Misspelling the COMMIT command is likely a syntax error, which is Class
>> 42.  Can you give one of those less-trivial examples please?
>>
>> > The only
>> > way to be sure you're out of the transaction is to check the transaction
>> > state that's sent along with ReadyForQuery ... but if you need to do
>> > that, it's not clear why we should change the server behavior at all.
>>
>> How does this differ from the deferred constraint violation example you
>> provided early on in the thread?  That gave the error 23505 and
>> terminated the transaction.  If you run the same scenario with the
>> primary key immediate, you get the *exact same error* but the
>> transaction is *not* terminated!
>>
>> I won't go so far as to suggest we change all COMMIT errors to Class 40
>> (as the spec says), but I'm thinking it very loudly.
>>
>> > I also don't think that this scales to the case of subtransaction
>> > commit/rollback.  That should surely act the same, but your patch
>> doesn't
>> > change it.
>>
>> How does one commit a subtransaction?
>>
>> > Lastly, introducing a new client-visible message level seems right out.
>> > That's a very fundamental protocol break, independently of all else.
>>
>> Yeah, this seemed like a bad idea to me, too.
>>
>
> Ok, here is a much less obtrusive solution thanks to Vladimir.
>

Still had to mess with error levels since commit and chain needs the
existing context to succeed.

After fixing up the tests only 1 still failing.


Dave Cramer
http://www.postgres.rocks


throwerror2.patch
Description: Binary data


Re: pg_trigger.tgparentid

2020-02-27 Thread Alvaro Herrera
Thanks both -- pushed.  I also changed regress/sql/triggers to leave
tables around that have a non-zero tgparentid.  This ensures that the
pg_upgrade test sees such objects, as well as findoidjoins.

I refrained from doing the findoidjoins dance itself, though; I got a
large number of false positives that I think are caused by some pg12-era
hacking.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




jsonpath syntax extensions

2020-02-27 Thread Nikita Glukhov

Hi, hackers!

Attached patches implement several useful jsonpath syntax extensions.
I already published them two years ago in the original SQL/JSON thread,
but then after creation of separate threads for SQL/JSON functions and
JSON_TABLE I forgot about them.

A brief description of the patches:

1. Introduced new jsonpath modifier 'pg' which is used for enabling
PostgreSQL-specific extensions.  This feature was already proposed in the
discussion of jsonpath's like_regex implementation.

2. Added support for raw jbvObject and jbvArray JsonbValues inside jsonpath
engine.  Now, jsonpath can operate with JSON arrays and objects only in
jbvBinary form.  But with introduction of array and object constructors in
patches #4 and #5 raw in-memory jsonb containers can appear in jsonpath engine.
In some places we can iterate through jbvArrays, in others we need to encode
jbvArrays and jbvObjects into jbvBinay.

3. SQL/JSON sequence construction syntax. A simple comma-separated list can be
used to concatenate single values or sequences into a single resulting sequence.

 SELECT jsonb_path_query('[1, 2, 3]', 'pg $[*], 4, 2 + 3');
  jsonb_path_query
 --
  1
  2
  3
  4
  5

 SELECT jsonb_path_query('{ "a": [1, 2, 3], "b": [4, 5] }',
'pg ($.a[*], $.b[*]) ? (@ % 2 == 1)');
   jsonb_path_query
 --
  1
  3
  5


Patches #4-#6 implement ECMAScript-like syntax constructors and accessors:

4. Array construction syntax.
This can also be considered as enclosing a sequence constructor into brackets.
 
 SELECT jsonb_path_query('[1, 2, 3]', 'pg [$[*], 4, 2 + 3]');

  jsonb_path_query
 --
  [1, 2, 3, 4, 5]

Having this feature, jsonb_path_query_array() becomes somewhat redundant.


5. Object construction syntax.  It is useful for constructing derived objects
from the interesting parts of the original object.  (But this is not sufficient
to "project" each object in array, item method like '.map()' is needed here.)

 SELECT jsonb_path_query('{"b": 2}', 'pg { a : 1, b : $.b, "x y" : $.b + 3 }');
 jsonb_path_query
 ---
  { "a" : 1, "b": 3, "x y": 5 }

Fields with empty values are simply skipped regardless of lax/strict mode:

 SELECT jsonb_path_query('{"a": 1}', 'pg { b : $.b, a : $.a ? (@ > 1) }');
  jsonb_path_query
 --
  {}


6. Object subscription syntax.  This gives us ability to specify what key to
extract on runtime.  The syntax is the same as ordinary array subscription
syntax.

 -- non-existent $.x is simply skipped in lax mode
 SELECT jsonb_path_query('{"a": "b", "b": "c"}', 'pg $[$.a, "x", "a"]');
  jsonb_path_query
 --
  "c"
  "b"

 SELECT jsonb_path_query('{"a": "b", "b": "c"}', 'pg $[$fld]', '{"fld": "b"}');
  jsonb_path_query
 --
  "c"

--
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

>From 258579bf45484fd150d952cf78f32f37fabff77a Mon Sep 17 00:00:00 2001
From: Nikita Glukhov 
Date: Thu, 27 Jun 2019 19:58:35 +0300
Subject: [PATCH v1 1/8] Add jsonpath 'pg' modifier for enabling extensions

---
 doc/src/sgml/func.sgml | 29 ++
 src/backend/utils/adt/jsonpath.c   | 54 ++
 src/backend/utils/adt/jsonpath_exec.c  |  4 +++
 src/backend/utils/adt/jsonpath_gram.y  | 17 +++
 src/backend/utils/adt/jsonpath_scan.l  |  1 +
 src/include/utils/jsonpath.h   |  9 --
 src/test/regress/expected/jsonpath.out | 18 
 src/test/regress/sql/jsonpath.sql  |  3 ++
 src/tools/pgindent/typedefs.list   |  1 +
 9 files changed, 116 insertions(+), 20 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 28035f1..d344b95 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -12601,6 +12601,35 @@ table2-mapping
 

 
+   
+Extensions
+
+ PostgreSQL has some extensions to the SQL/JSON
+ Path standard.  These syntax extensions that can enabled by the specifying
+ additional pg modifier before the
+ strict/lax flags.
+  shows these
+ extensions with examples.
+
+
+
+jsonpath Syntax Extensions
+ 
+  
+   
+Name
+Description
+Example JSON
+Example JSON path
+Result
+   
+  
+  
+  
+ 
+
+   
+

 Regular Expressions
 
diff --git a/src/backend/utils/adt/jsonpath.c b/src/backend/utils/adt/jsonpath.c
index 3c0dc38..17c09a7 100644
--- a/src/backend/utils/adt/jsonpath.c
+++ b/src/backend/utils/adt/jsonpath.c
@@ -72,11 +72,20 @@
 #include "utils/jsonpath.h"
 
 
+/* Context for jsonpath encoding. */
+typedef struct JsonPathEncodingContext
+{
+	StringInfo	buf;		/* output buffer */
+	bool		ext;		/* PG extensions are enabled? */
+} JsonPathEncodingContext;
+
 static Datum jsonPathFromCstring(char *in, int len);
 static char *jsonPathToCstring(StringInfo out, JsonPath *in,
 			   int 

Re: [Patch] pg_rewind: options to use restore_command from recovery.conf or command line

2020-02-27 Thread Alexey Kondratov

On 2020-02-27 16:41, Alexey Kondratov wrote:


New version of the patch is attached. Thanks again for your review.



Last patch (v18) got a conflict with one of today commits (05d8449e73). 
Rebased version is attached.


--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
The Russian Postgres Company
From ea93b52b298d80aac547735c5917386b37667595 Mon Sep 17 00:00:00 2001
From: Alexander Korotkov 
Date: Tue, 25 Feb 2020 02:22:45 +0300
Subject: [PATCH v19] pg_rewind: Add options to restore WAL files from archive

Currently, pg_rewind fails when it could not find required WAL files in the
target data directory.  One have to manually figure out which WAL files are
required and copy them back from archive.

This commit implements new pg_rewind options, which allow pg_rewind to
automatically retrieve missing WAL files from archival storage. The
restore_command option is read from postgresql.conf.

Discussion: https://postgr.es/m/a3acff50-5a0d-9a2c-b3b2-ee36168955c1%40postgrespro.ru
Author: Alexey Kondratov
Reviewed-by: Michael Paquier, Andrey Borodin, Alvaro Herrera
Reviewed-by: Andres Freund, Alexander Korotkov
---
 doc/src/sgml/ref/pg_rewind.sgml  |  28 --
 src/backend/access/transam/xlogarchive.c |  60 ++--
 src/bin/pg_rewind/parsexlog.c|  33 ++-
 src/bin/pg_rewind/pg_rewind.c|  77 +++-
 src/bin/pg_rewind/pg_rewind.h|   6 +-
 src/bin/pg_rewind/t/001_basic.pl |   3 +-
 src/bin/pg_rewind/t/RewindTest.pm|  66 +-
 src/common/Makefile  |   2 +
 src/common/archive.c | 102 +
 src/common/fe_archive.c  | 111 +++
 src/include/common/archive.h |  22 +
 src/include/common/fe_archive.h  |  19 
 src/tools/msvc/Mkvcbuild.pm  |   8 +-
 13 files changed, 457 insertions(+), 80 deletions(-)
 create mode 100644 src/common/archive.c
 create mode 100644 src/common/fe_archive.c
 create mode 100644 src/include/common/archive.h
 create mode 100644 src/include/common/fe_archive.h

diff --git a/doc/src/sgml/ref/pg_rewind.sgml b/doc/src/sgml/ref/pg_rewind.sgml
index 42d29edd4e..64a6942031 100644
--- a/doc/src/sgml/ref/pg_rewind.sgml
+++ b/doc/src/sgml/ref/pg_rewind.sgml
@@ -66,11 +66,11 @@ PostgreSQL documentation
can be found either on the target timeline, the source timeline, or their common
ancestor. In the typical failover scenario where the target cluster was
shut down soon after the divergence, this is not a problem, but if the
-   target cluster ran for a long time after the divergence, the old WAL
-   files might no longer be present. In that case, they can be manually
-   copied from the WAL archive to the pg_wal directory, or
-   fetched on startup by configuring  or
-   .  The use of
+   target cluster ran for a long time after the divergence, its old WAL
+   files might no longer be present. In this case, you can manually copy them
+   from the WAL archive to the pg_wal directory, or run
+   pg_rewind with the -c option to
+   automatically retrieve them from the WAL archive. The use of
pg_rewind is not limited to failover, e.g.  a standby
server can be promoted, run some write transactions, and then rewinded
to become a standby again.
@@ -232,6 +232,19 @@ PostgreSQL documentation
   
  
 
+ 
+  -c
+  --restore-target-wal
+  
+   
+Use the restore_command defined in
+the target cluster configuration to retrieve WAL files from
+the WAL archive if these files are no longer available in the
+pg_wal directory.
+   
+  
+ 
+
  
   --debug
   
@@ -318,7 +331,10 @@ GRANT EXECUTE ON function pg_catalog.pg_read_binary_file(text, bigint, bigint, b
   history forked off from the target cluster. For each WAL record,
   record each data block that was touched. This yields a list of all
   the data blocks that were changed in the target cluster, after the
-  source cluster forked off.
+  source cluster forked off. If some of the WAL files are no longer
+  available, try re-running pg_rewind with
+  the -c option to search for the missing files in
+  the WAL archive.
  
 
 
diff --git a/src/backend/access/transam/xlogarchive.c b/src/backend/access/transam/xlogarchive.c
index 188b73e752..f78a7e8f02 100644
--- a/src/backend/access/transam/xlogarchive.c
+++ b/src/backend/access/transam/xlogarchive.c
@@ -21,6 +21,7 @@
 
 #include "access/xlog.h"
 #include "access/xlog_internal.h"
+#include "common/archive.h"
 #include "miscadmin.h"
 #include "postmaster/startup.h"
 #include "replication/walsender.h"
@@ -55,9 +56,6 @@ RestoreArchivedFile(char *path, const char *xlogfname,
 	char		xlogpath[MAXPGPATH];
 	char		xlogRestoreCmd[MAXPGPATH];
 	char		lastRestartPointFname[MAXPGPATH];
-	char	   *dp;
-	char	   

Re: proposal: schema variables

2020-02-27 Thread Pavel Stehule
čt 27. 2. 2020 v 15:59 odesílatel DUVAL REMI  napsal:

> Hello Pavel.
>
>
>
> That looks pretty good to me !
>
>
>
> I’m adding Philippe Beaudoin who was also interested in this topic.
>
>
>
> Recap : We were looking for a way to separate variable from constants in
> the “Schema Variables” proposition from Pavel.
>
> Pavel was saying that there are some limitations regarding the keywords we
> can use, as the community don’t want to introduce too much new keywords in
> Postgres SQL (PL/pgSQL is a different list of keywords).
>
> “CONSTANT” is not a keyword in SQL for Now (though it is one in PL/pgSQL).
>
> Pavel’s syntax allow to use it as a keyword in the “WITH OPTIONS” clause
> that is already supported.
>
> … I think it’s a good idea.
>
>
>
> The list of keywords is defined in : postgresql\src\include\parser\kwlist.h
>
>
>
> Pavel, I saw that in DB2, those variables are called “Global Variables”,
> is it something we can consider changing, or do you prefer to keep using
> the “Schema Variable” name ?
>

It is most hard question. Global variables has sense, but when we will use
it in plpgsql, then this name can be little bit confusing. Personally I
prefer "schema variable" although my opinion is not too strong. This name
more signalize so this is more generic, more database related than some
special kind of plpgsql variables. Now, I think so maybe is better to use
schema variables, because there is different syntax then global temp
tables. Variables are global by design. So in this moment I prefer the name
"schema variables". It can be used as global variables in plpgsql, but it
is one case.

Pavel



>
>
>
> *De :* Pavel Stehule [mailto:pavel.steh...@gmail.com]
> *Envoyé :* jeudi 27 février 2020 15:38
> *À :* DUVAL REMI 
> *Cc :* PostgreSQL Hackers 
> *Objet :* Re: proposal: schema variables
>
>
>
>
>
> Hi
>
>
>
>
> 3) Any way to define CONSTANTs ?
> We already talked a bit about this subject and also Gilles Darold
> introduces it in this mailing-list topic but I'd like to insist on it.
> I think it would be nice to have a way to say that a variable should not
> be changed once defined.
> Maybe it's hard to implement and can be implemented later, but I just want
> to know if this concern is open.
>
>
>
> I played little bit with it and I didn't find any nice solution, but maybe
> I found the solution. I had ideas about some variants, but almost all time
> I had a problem with parser's shifts because all potential keywords are not
> reserved.
>
>
>
> last variant, but maybe best is using keyword WITH
>
>
>
> So the syntax can looks like
>
>
>
> CREATE [ TEMP ] VARIABLE varname [ AS ] type [ NOT NULL ] [ DEFAULT
> expression ] [ WITH [ OPTIONS ] '(' ... ')' ] ]
>
>
>
> What do you think about this syntax? It doesn't need any new keyword, and
> it easy to enhance it.
>
>
>
> CREATE VARIABLE foo AS int DEFAULT 10 WITH OPTIONS ( CONSTANT);
>
>
>
> ?
>
>
>
> Regards
>
>
>
> Pavel
>
>
>
>
>


RE: proposal: schema variables

2020-02-27 Thread DUVAL REMI
Hello Pavel.

That looks pretty good to me !

I’m adding Philippe Beaudoin who was also interested in this topic.

Recap : We were looking for a way to separate variable from constants in the 
“Schema Variables” proposition from Pavel.
Pavel was saying that there are some limitations regarding the keywords we can 
use, as the community don’t want to introduce too much new keywords in Postgres 
SQL (PL/pgSQL is a different list of keywords).
“CONSTANT” is not a keyword in SQL for Now (though it is one in PL/pgSQL).
Pavel’s syntax allow to use it as a keyword in the “WITH OPTIONS” clause that 
is already supported.
… I think it’s a good idea.

The list of keywords is defined in : postgresql\src\include\parser\kwlist.h

Pavel, I saw that in DB2, those variables are called “Global Variables”, is it 
something we can consider changing, or do you prefer to keep using the “Schema 
Variable” name ?


De : Pavel Stehule [mailto:pavel.steh...@gmail.com]
Envoyé : jeudi 27 février 2020 15:38
À : DUVAL REMI 
Cc : PostgreSQL Hackers 
Objet : Re: proposal: schema variables


Hi


3) Any way to define CONSTANTs ?
We already talked a bit about this subject and also Gilles Darold introduces it 
in this mailing-list topic but I'd like to insist on it.
I think it would be nice to have a way to say that a variable should not be 
changed once defined.
Maybe it's hard to implement and can be implemented later, but I just want to 
know if this concern is open.

I played little bit with it and I didn't find any nice solution, but maybe I 
found the solution. I had ideas about some variants, but almost all time I had 
a problem with parser's shifts because all potential keywords are not reserved.

last variant, but maybe best is using keyword WITH

So the syntax can looks like

CREATE [ TEMP ] VARIABLE varname [ AS ] type [ NOT NULL ] [ DEFAULT expression 
] [ WITH [ OPTIONS ] '(' ... ')' ] ]

What do you think about this syntax? It doesn't need any new keyword, and it 
easy to enhance it.

CREATE VARIABLE foo AS int DEFAULT 10 WITH OPTIONS ( CONSTANT);

?

Regards

Pavel




Re: proposal: schema variables

2020-02-27 Thread Pavel Stehule
Hi


> 3) Any way to define CONSTANTs ?
> We already talked a bit about this subject and also Gilles Darold
> introduces it in this mailing-list topic but I'd like to insist on it.
> I think it would be nice to have a way to say that a variable should not
> be changed once defined.
> Maybe it's hard to implement and can be implemented later, but I just want
> to know if this concern is open.
>

I played little bit with it and I didn't find any nice solution, but maybe
I found the solution. I had ideas about some variants, but almost all time
I had a problem with parser's shifts because all potential keywords are not
reserved.

last variant, but maybe best is using keyword WITH

So the syntax can looks like

CREATE [ TEMP ] VARIABLE varname [ AS ] type [ NOT NULL ] [ DEFAULT
expression ] [ WITH [ OPTIONS ] '(' ... ')' ] ]

What do you think about this syntax? It doesn't need any new keyword, and
it easy to enhance it.

CREATE VARIABLE foo AS int DEFAULT 10 WITH OPTIONS ( CONSTANT);

?

Regards

Pavel


Re: Commit fest manager for 2020-03

2020-02-27 Thread Alvaro Herrera
On 2020-Feb-27, Michael Paquier wrote:

> On Wed, Feb 26, 2020 at 08:34:26PM +0100, Tomas Vondra wrote:
> > Nope, the RMT for PG12 was announced on 2019/03/30 [1], i.e. shortly
> > before the end of the last CF (and before pgcon). I think there was some
> > discussion about the members at/after the FOSDEM dev meeting. The
> > overlap with CFM duties is still fairly minimal, and there's not much
> > for RMT to do before the end of the last CF anyway ...
> > 
> > [1] https://www.postgresql.org/message-id/20190330094043.ga28...@paquier.xyz
> > 
> > Maybe we shouldn't wait with assembling RMT until pgcon, though.
> 
> Waiting until PGCon if a bad idea,

+1.  As I recall, the RMT assembles around the time the last CF is over.
Last year it was announced on March 30th, which is the latest date it
has ever happened.  The history can be seen at the bottom here:
https://wiki.postgresql.org/wiki/RMT

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [Patch] pg_rewind: options to use restore_command from recovery.conf or command line

2020-02-27 Thread Alexey Kondratov

On 2020-02-27 04:52, Michael Paquier wrote:

On Thu, Feb 27, 2020 at 12:43:55AM +0300, Alexander Korotkov wrote:

Regarding text split change, it was made by pgindent.  I didn't notice
it belongs to unchanged part of code.  Sure, we shouldn't include this
into the patch.


I have read through v17 (not tested, sorry), and spotted a couple of
issues that need to be addressed.

+   "--source-pgdata=$standby_pgdata",
+   "--target-pgdata=$master_pgdata",
+   "--no-sync", "--no-ensure-shutdown",
FWIW, I think that perl indenting would reshape this part.  I would
recommend to run src/tools/pgindent/pgperltidy and
./src/tools/perlcheck/pgperlcritic before commit.



Thanks, formatted this part with perltidy. It also has modified 
RecursiveCopy's indents. Pgperlcritic has no complains about this file. 
BTW, being executed on the whole project pgperltidy modifies dozens of 
perl files an even pgindent itself.




+ * Copyright (c) 2020, PostgreSQL Global Development Group
Wouldn't it be better to just use the full copyright here?  I mean the
following:
Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
Portions Copyright (c) 1994, The Regents of the University of 
California




I think so, it contains some older code parts, so it is better to use 
unified copyrights.




+++ b/src/common/archive.c
[...]
+#include "postgres.h"
+
+#include "common/archive.h"
This is incorrect.  All files shared between the backend and the
frontend in src/common/ have to include the following set of headers:
#ifndef FRONTEND
#include "postgres.h"
#else
#include "postgres_fe.h"
#endif

+++ b/src/common/fe_archive.c
[...]
+#include "postgres_fe.h"
This is incomplete.  The following piece should be added:
#ifndef FRONTEND
#error "This file is not expected to be compiled for backend code"
#endif



Fixed both.



+   snprintf(postgres_cmd, sizeof(postgres_cmd), "%s -D %s
-C restore_command",
+postgres_exec_path, datadir_target);
+
I think that this is missing proper quoting.



Yep, added the same quoting as in pg_upgrade/options.



I would rename ConstructRestoreCommand() to BuildRestoreCommand()
while on it..



OK, shorter is better.



I think that it would be saner to check the return status of
ConstructRestoreCommand() in xlogarchive.c as a sanity check, with an
elog(ERROR) if not 0, as that should never happen.



Added.

New version of the patch is attached. Thanks again for your review.


Regards
--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
The Russian Postgres Company
From c775a2e40e405474f6ecef35843d276d43fb462f Mon Sep 17 00:00:00 2001
From: Alexander Korotkov 
Date: Tue, 25 Feb 2020 02:22:45 +0300
Subject: [PATCH v18] pg_rewind: Add options to restore WAL files from archive

Currently, pg_rewind fails when it could not find required WAL files in the
target data directory.  One have to manually figure out which WAL files are
required and copy them back from archive.

This commit implements new pg_rewind options, which allow pg_rewind to
automatically retrieve missing WAL files from archival storage. The
restore_command option is read from postgresql.conf.

Discussion: https://postgr.es/m/a3acff50-5a0d-9a2c-b3b2-ee36168955c1%40postgrespro.ru
Author: Alexey Kondratov
Reviewed-by: Michael Paquier, Andrey Borodin, Alvaro Herrera
Reviewed-by: Andres Freund, Alexander Korotkov
---
 doc/src/sgml/ref/pg_rewind.sgml  |  28 --
 src/backend/access/transam/xlogarchive.c |  60 ++--
 src/bin/pg_rewind/parsexlog.c|  33 ++-
 src/bin/pg_rewind/pg_rewind.c|  77 +++-
 src/bin/pg_rewind/pg_rewind.h|   6 +-
 src/bin/pg_rewind/t/001_basic.pl |   3 +-
 src/bin/pg_rewind/t/RewindTest.pm|  66 +-
 src/common/Makefile  |   2 +
 src/common/archive.c | 102 +
 src/common/fe_archive.c  | 111 +++
 src/include/common/archive.h |  22 +
 src/include/common/fe_archive.h  |  19 
 src/tools/msvc/Mkvcbuild.pm  |   8 +-
 13 files changed, 457 insertions(+), 80 deletions(-)
 create mode 100644 src/common/archive.c
 create mode 100644 src/common/fe_archive.c
 create mode 100644 src/include/common/archive.h
 create mode 100644 src/include/common/fe_archive.h

diff --git a/doc/src/sgml/ref/pg_rewind.sgml b/doc/src/sgml/ref/pg_rewind.sgml
index 42d29edd4e..64a6942031 100644
--- a/doc/src/sgml/ref/pg_rewind.sgml
+++ b/doc/src/sgml/ref/pg_rewind.sgml
@@ -66,11 +66,11 @@ PostgreSQL documentation
can be found either on the target timeline, the source timeline, or their common
ancestor. In the typical failover scenario where the target cluster was
shut down soon after the divergence, this is not a problem, but if the
-  

Re: Improve handling of parameter differences in physical replication

2020-02-27 Thread Peter Eisentraut

On 2020-02-27 11:13, Fujii Masao wrote:

Btw., I think the current setup is slightly buggy.  The MaxBackends value that 
is used to size shared memory is computed as MaxConnections + 
autovacuum_max_workers + 1 + max_worker_processes + max_wal_senders, but we 
don't track autovacuum_max_workers in WAL.

Maybe this is because autovacuum doesn't work during recovery?


Autovacuum on the primary can use locks or xids, and so it's possible 
that the standby when processing WAL encounters more of those than it 
has locally allocated shared memory to handle.


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Using stat collector for collecting long SQL

2020-02-27 Thread legrand legrand
Hi

> There is a often problem with taking source of long SQL strings. The
> pg_stat_activity field query is reduced to some short limit and is not too
> practical to increase this limit.

I thought it was "old story", since that track_activity_query_size can be
increased widely:
https://www.postgresql.org/message-id/flat/7b5ecc5a9991045e2f13c84e3047541d%40postgrespro.ru

[...]

> It can be base for implementation EXPLAIN PID ?
+1 for this concept, that would be very usefull for diagnosing stuck
queries.

Until now the only proposal I saw regarding this was detailled in
https://www.postgresql.org/message-id/1582756552256-0.p...@n3.nabble.com
a prerequisite to be able to use extension postgrespro/pg_query_state

Regards 
PAscal



--
Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html




Re: [Proposal] Level4 Warnings show many shadow vars

2020-02-27 Thread Michael Paquier
On Thu, Feb 27, 2020 at 10:43:50AM +0100, Peter Eisentraut wrote:
> This thread is still in the commit fest, but it's apparently gone as far as
> it will, so I've set it to "Committed" for lack of a "partial" status.

Thanks, that sounds right to me.  I was just looking at the latest
patch presented after seeing your reply, and I did not spot
immediately any issues standing out compared to the others.
--
Michael


signature.asc
Description: PGP signature


Re: PG v12.2 - Setting jit_above_cost is causing the server to crash

2020-02-27 Thread Dave Page
Hi

On Thu, Feb 27, 2020 at 12:41 PM Tom Lane  wrote:

> Aditya Toshniwal  writes:
> > On Mon, Feb 24, 2020 at 12:46 PM Andres Freund 
> wrote:
> >> This isn't reproducible here. Are you sure that you're running on a
> >> clean installation?
>
> > Yes I did a fresh installation using installer provided here -
> > https://www.enterprisedb.com/downloads/postgresql
>
> There is apparently something wrong with the JIT stuff in EDB's 12.2
> build for macOS.  At least, that's the conclusion I came to after
> off-list discussion with the submitter of bug #16264, which has pretty
> much exactly this symptom (especially if you're seeing "signal 9"
> reports in the postmaster log).  For him, either disabling JIT or
> reverting to 12.1 made it go away.
>

We've been looking into this;

Apple started a notarisation process some time ago, designed to mark their
applications as conforming to various security requirements, but prior to
Catalina it was essentially optional. When Catalina was released, they made
notarisation for distributed software a requirement, but had the process
issue warnings for non-compliance. As-of the end of January, those warnings
became hard errors, so now our packages must be notarised, and for that to
happen, must be hardened by linking with a special runtime and having
securely time stamped signatures on every binary before being checked and
notarised as such by Apple. Without that, users would have to disable
security features on their systems before they could run our software.

Our packages are being successfully notarised at the moment, because that's
essentially done through a static analysis. We can (and have) added what
Apple call an entitlement in test builds which essentially puts a flag in
the notarisation for the product that declares that it will do JIT
operations, however, it seems that this alone is not enough and that in
addition to the entitlement, we also need to include the MAP_JIT flag in
mmap() calls. See
https://developer.apple.com/documentation/security/hardened_runtime and
https://developer.apple.com/documentation/bundleresources/entitlements/com_apple_security_cs_allow-jit

We're working on trying to test a patch for that at the moment.

-- 
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Error on failed COMMIT

2020-02-27 Thread Dave Cramer
On Wed, 26 Feb 2020 at 16:57, Vik Fearing  wrote:

> On 26/02/2020 22:22, Tom Lane wrote:
> > Dave Cramer  writes:
> >> OK, here is a patch that actually doesn't leave the transaction in a
> failed
> >> state but emits the error and rolls back the transaction.
> >
> >> This is far from complete as it fails a number of tests  and does not
> cover
> >> all of the possible paths.
> >> But I'd like to know if this is strategy will be acceptable ?
> >
> > I really don't think that changing the server's behavior here is going to
> > fly.  The people who are unhappy that we changed it are going to vastly
> > outnumber the people who are happy.  Even the people who are happy are
> not
> > going to find that their lives are improved all that much, because
> they'll
> > still have to deal with old servers with the old behavior for the
> > foreseeable future.
>
> Dealing with old servers for a while is something that everyone is used to.
>
> > Even granting that a behavioral incompatibility is acceptable, I'm not
> > sure how a client is supposed to be sure that this "error" means that a
> > rollback happened, as opposed to real errors that prevented any state
> > change from occurring.
>
> Because the error is a Class 40 — Transaction Rollback.
>
> My original example was:
>
> postgres=!# commit;
> ERROR:  40P00: transaction cannot be committed
> DETAIL:  First error was "42601: syntax error at or near "error"".
>
>
> > (A trivial example of that is misspelling the
> > COMMIT command; which I'll grant is unlikely in practice.  But there are
> > less-trivial examples involving internal server malfunctions.)
>
> Misspelling the COMMIT command is likely a syntax error, which is Class
> 42.  Can you give one of those less-trivial examples please?
>
> > The only
> > way to be sure you're out of the transaction is to check the transaction
> > state that's sent along with ReadyForQuery ... but if you need to do
> > that, it's not clear why we should change the server behavior at all.
>
> How does this differ from the deferred constraint violation example you
> provided early on in the thread?  That gave the error 23505 and
> terminated the transaction.  If you run the same scenario with the
> primary key immediate, you get the *exact same error* but the
> transaction is *not* terminated!
>
> I won't go so far as to suggest we change all COMMIT errors to Class 40
> (as the spec says), but I'm thinking it very loudly.
>
> > I also don't think that this scales to the case of subtransaction
> > commit/rollback.  That should surely act the same, but your patch doesn't
> > change it.
>
> How does one commit a subtransaction?
>
> > Lastly, introducing a new client-visible message level seems right out.
> > That's a very fundamental protocol break, independently of all else.
>
> Yeah, this seemed like a bad idea to me, too.
>

Ok, here is a much less obtrusive solution thanks to Vladimir.

FWIW, only 10 of 196 tests fail.
Dave Cramer
www.postgres.rocks

>
>


throwerror2.patch
Description: Binary data


Re: Crash by targetted recovery

2020-02-27 Thread Fujii Masao




On 2020/02/27 17:05, Kyotaro Horiguchi wrote:

Thank you for the comment.
At Thu, 27 Feb 2020 16:23:44 +0900, Fujii Masao  
wrote in

On 2020/02/27 15:23, Kyotaro Horiguchi wrote:

I failed to understand why random access while reading from
stream is bad idea. Could you elaborate why?

It seems to me the word "streaming" suggests that WAL record should be
read sequentially. Random access, which means reading from arbitrary
location, breaks a stream.  (But the patch doesn't try to stop wal
sender if randAccess.)


Isn't it sufficient to set currentSource to 0 when disabling
StandbyMode?

I thought that and it should work, but I hesitated to manipulate on
currentSource in StartupXLOG. currentSource is basically a private
state of WaitForWALToBecomeAvailable. ReadRecord modifies it but I
think it's not good to modify it out of the the logic in
WaitForWALToBecomeAvailable.


If so, what about adding the following at the top of
WaitForWALToBecomeAvailable()?

 if (!StandbyMode && currentSource == XLOG_FROM_STREAM)
  currentSource = 0;


It works virtually the same way. I'm happy to do that if you don't
agree to using randAccess. But I'd rather do that in the 'if
(!InArchiveRecovery)' section.


The approach using randAccess seems unsafe. Please imagine
the case where currentSource is changed to XLOG_FROM_ARCHIVE
because randAccess is true, while walreceiver is still running.
For example, this case can occur when the record at REDO
starting point is fetched with randAccess = true after walreceiver
is invoked to fetch the last checkpoint record. The situation
"currentSource != XLOG_FROM_STREAM while walreceiver is
 running" seems invalid. No?

So I think that the approach that I proposed is better.

Regards,

--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters




Re: Online verification of checksums

2020-02-27 Thread Asif Rehman
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   tested, passed
Documentation:not tested

The patch applies cleanly and works as expected. Just a few minor observations:

- I would suggest refactoring PageIsZero function by getting rid of all_zeroes 
variable
and simply returning false when a non-zero byte is found, rather than setting 
all_zeros
variable to false and breaking the for loop. The function should simply return 
true at the
end otherwise.

- Remove the empty line:
+* would throw an assertion failure.  Consider this a
+* checksum failure.
+*/
+
+   checksum_failures++;


- Code needs to run through pgindent.

Also, I'd suggest to make "5" a define within the current file/function, 
perhaps 
something like "MAX_CHECKSUM_FAILURES". You could move the second 
warning outside the conditional statement as it appears in both "if" and "else" 
blocks.


Regards,
--Asif

The new status of this patch is: Waiting on Author


Re: Improve handling of parameter differences in physical replication

2020-02-27 Thread Fujii Masao




On 2020/02/27 17:23, Peter Eisentraut wrote:

When certain parameters are changed on a physical replication primary,   this 
is communicated to standbys using the XLOG_PARAMETER_CHANGE WAL record.  The 
standby then checks whether its own settings are at least as big as the ones on 
the primary.  If not, the standby shuts down with a fatal error.

The correspondence of settings between primary and standby is required because 
those settings influence certain shared memory sizings that are required for 
processing WAL records that the primary might send.  For example, if the 
primary sends a prepared transaction, the standby must have had 
max_prepared_transaction set appropriately or it won't be able to process those 
WAL records.

However, fatally shutting down the standby immediately upon receipt of the 
parameter change record might be a bit of an overreaction.  The resources 
related to those settings are not required immediately at that point, and might 
never be required if the activity on the primary does not exhaust all those 
resources.  An extreme example is raising max_prepared_transactions on the 
primary but never actually using prepared transactions.

Where this becomes a serious problem is if you have many standbys and you do a 
failover.  If the newly promoted standby happens to have a higher setting for 
one of the relevant parameters, all the other standbys that have followed it 
then shut down immediately and won't be able to continue until you change all 
their settings.

If we didn't do the hard shutdown and we just let the standby roll on with recovery, 
nothing bad will happen and it will eventually produce an appropriate error when those 
resources are required (e.g., "maximum number of prepared transactions 
reached").

So I think there are better ways to handle this.  It might be reasonable to 
provide options.  The attached patch doesn't do that but it would be pretty 
easy.  What the attached patch does is:

Upon receipt of XLOG_PARAMETER_CHANGE, we still check the settings but only 
issue a warning and set a global flag if there is a problem.  Then when we 
actually hit the resource issue and the flag was set, we issue another warning 
message with relevant information.  Additionally, at that point we pause 
recovery instead of shutting down, so a hot standby remains usable.  (That 
could certainly be configurable.)


+1

Btw., I think the current setup is slightly buggy.  The MaxBackends value that 
is used to size shared memory is computed as MaxConnections + 
autovacuum_max_workers + 1 + max_worker_processes + max_wal_senders, but we 
don't track autovacuum_max_workers in WAL.


Maybe this is because autovacuum doesn't work during recovery?

Regards,

--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters




Re: Improve handling of parameter differences in physical replication

2020-02-27 Thread Sergei Kornilov
Hello

Thank you for working on this!

> Where this becomes a serious problem is if you have many standbys and you do 
> a failover.

+1
Several times my team would like to pause recovery instead of panic after 
change settings on primary. (same thing for create_tablespace_directories 
replay errors too...)

We documented somewhere (excluding code) shutting down the standby immediately 
upon receipt of the parameter change? doc/src/sgml/high-availability.sgml says 
only about "refuse to start".

regards, Sergei




Re: [Proposal] Level4 Warnings show many shadow vars

2020-02-27 Thread Peter Eisentraut

On 2019-12-18 10:55, Alvaro Herrera wrote:

On 2019-Dec-18, Michael Paquier wrote:


Let's take one example.  The changes in pg_dump/ like
/progname/prog_name/ have just been done in haste, without actual
thoughts about how the problem ought to be fixed.  And in this case,
something which could be more adapted is to remove the argument from
usage() because progname is a global variable, initialized from the
beginning in pg_restore.


We discussed progname as a global/local before -- IIRC in the thread
that introduced the frontend logging API -- and while I think the whole
issue could stand some improvement, we shouldn't let it be driven by
minor changes; that'll only make it more confusing.  IMO if we want it
improved, a larger change (involving the bunch of frontend programs) is
what to look for.  Maybe what you suggest is an improvement, though
(certainly the "prog_name" patch wasn't).


This thread is still in the commit fest, but it's apparently gone as far 
as it will, so I've set it to "Committed" for lack of a "partial" status.


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PATCH] Connection time for \conninfo

2020-02-27 Thread Peter Eisentraut
My opinion is that this is not particularly useful and not appropriate 
to piggy-back onto \conninfo.  Connection information including host, 
port, database, user name is a well-established concept in PostgreSQL 
programs and tools and it contains a delimited set of information. 
Knowing what server and what database you are connected to also seems 
kind of important.  Moreover, this is information that is under control 
of the client, so it must be tracked on the client side.


Knowing how long you've been connected on the other hand seems kind of 
fundamentally unimportant.  If we add that, what's to stop us from 
adding other statistics of minor interest such as how many commands 
you've run, how many errors there were, etc.  The connection time is 
already available, and perhaps we should indeed make it a bit easier to 
get, but it doesn't need to be a psql command.


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: vacuum verbose detail logs are unclear; log at *start* of each stage

2020-02-27 Thread Peter Eisentraut

On 2020-01-26 06:36, Justin Pryzby wrote:

 From a3d0b41435655615ab13f808ec7c30e53e596e50 Mon Sep 17 00:00:00 2001
From: Justin Pryzby
Date: Sat, 25 Jan 2020 21:25:37 -0600
Subject: [PATCH v3 1/4] Remove gettext erronously readded at 580ddce

---
  src/backend/access/heap/vacuumlazy.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/backend/access/heap/vacuumlazy.c 
b/src/backend/access/heap/vacuumlazy.c
index 8ce5011..8e8ea9d 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1690,7 +1690,7 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, 
LVRelStats *vacrelstats,
"%u pages 
are entirely empty.\n",

empty_pages),
 empty_pages);
-   appendStringInfo(, _("%s."), pg_rusage_show());
+   appendStringInfo(, "%s.", pg_rusage_show());
  
  	ereport(elevel,

(errmsg("\"%s\": found %.0f removable, %.0f nonremovable row 
versions in %u out of %u pages",
-- 2.7.4


Why do you think it was erroneously added?

--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Yet another vectorized engine

2020-02-27 Thread Konstantin Knizhnik




On 27.02.2020 11:09, Hubert Zhang wrote:

Hi Konstantin,
I also vimdiff nodeAgg.c in your PG13 branch with nodeAgg.c in pg's 
main repo.
Many functions has changed from PG96 to PG13, e.g. 
'advance_aggregates', 'lookup_hash_entry'
The vectorized nodeAgg seems still follow the PG96 way of implementing 
these functions.
In general, I think we'd better port executor of PG13 to vectorized 
executor of PG13 instead of merge some PG13 code into vectorized 
executor of PG96 to make it works. Because It's hard to determine 
which functions need to be merged and it's buggy if the executor code 
of both PG13 and PG96 exist in one branch.


What's your opinion?



In new version of Postgres all logic of aggregates transition is 
encapsulated in expression and performed by execExprInterp or generated 
GIT code.
If we not going to embed vectorize engine in kernel and continue to 
develop it as extension, then I do not have any good idea how to achieve 
it without

copying and patching code of ExecInterpExpr.

In any case, the current prototype doesn't show any noticeable 
performance improvement  comparing with existed executor with enabled JIT.
And providing vectorized version of ExecInterpExpr will not help to 
increase speed (according to profile time is spent in other places).


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [postgis-devel] About EXTENSION from UNPACKAGED on PostgreSQL 13

2020-02-27 Thread Sandro Santilli
On Thu, Feb 27, 2020 at 09:32:24AM +0100, Sandro Santilli wrote:
> On Wed, Feb 26, 2020 at 11:18:43AM -0500, Chapman Flack wrote:
> > On 2/26/20 10:52 AM, Sandro Santilli wrote:
> > 
> > > This part is not clear to me. You're _assuming_ that the unpackaged--xxx
> > > will not make checks, so you _drop_ support for it ? Can't the normal
> > > extension script also be unsafe for some reason ? Or can't the
> > > unpackaged-xxx script be made safe by the publishers ? Or, as a last
> > > resort.. can't you just mark postgis as UNSAFE and still require
> > > superuser, which would give us the same experience as before ?
> > 
> > I am wondering: does anything in the PG 13 change preclude writing
> > a postgis_raster--unpackaged.sql script that could be applied with
> > CREATE EXTENSION postgis_raster VERSION unpackaged;
> > and would do perhaps nothing at all, or merely confirm that the
> > right unpackaged things are present and are the right things...
> > 
> > ... from which an ALTER EXTENSION postgis_raster UPDATE TO 3.0;
> > would naturally run the existing postgis_raster--unpackaged--3.0.sql
> > and execute all of its existing ALTER EXTENSION ... ADD operations?
> > 
> > Has the disadvantage of being goofy, but possibly the advantage of
> > being implementable in the current state of affairs.
> 
> Thanks for this hint, yes, seems to be technically feasible, as well
> as doing packaging in the extension creation scripts. Only... this
> would basically work around the intentionally removed syntax, which
> Steven Frost was against (still unclear to me why)...

NOTE: my suggestion was to directly have CREATE EXTENSION do the
packaging, which would give the same level of security as the
workaround suggested here, but with less hops.

--strk;




Re: WIP: Aggregation push-down

2020-02-27 Thread Antonin Houska
legrand legrand  wrote:

> Antonin Houska-2 wrote

> > Right now I recall two problems: 1) is the way I currently store
> > RelOptInfo for the grouped relations correct?, 2) how should we handle
> > types for which logical equality does not imply physical (byte-wise)
> > equality?
> > 
> > Fortunately it seems now that I'm not the only one who cares about 2), so
> > this
> > problem might be resolved soon:
> > 
> > https://www.postgresql.org/message-id/CAH2-Wzn3Ee49Gmxb7V1VJ3-AC8fWn-Fr8pfWQebHe8rYRxt5OQ%40mail.gmail.com
> > 
> > But 1) still remains.
> > 
> 
> Hello
> would "pgsql: Add equalimage B-Tree support functions." 
> https://www.postgresql.org/message-id/e1j72ny-0002gi...@gemulon.postgresql.org

Yes, it seems so. I'll adapt the patch soon, hopefully next week. Thanks for
reminder.

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com




Using stat collector for collecting long SQL

2020-02-27 Thread Pavel Stehule
Hi

There is a often problem with taking source of long SQL strings. The
pg_stat_activity field query is reduced to some short limit and is not too
practical to increase this limit.

I have a idea to use for solution of this problem stat_collector process.
When the query string is reduced, then full form of query string can be
saved in stat_collector process. Maybe with execution plan. Then any
process can read these information from this process.

What do you think about this idea? It can be base for implementation
EXPLAIN PID ?

Notes, comments?

Regards

Pavel


Re: [postgis-devel] About EXTENSION from UNPACKAGED on PostgreSQL 13

2020-02-27 Thread Sandro Santilli
On Wed, Feb 26, 2020 at 11:18:43AM -0500, Chapman Flack wrote:
> On 2/26/20 10:52 AM, Sandro Santilli wrote:
> 
> > This part is not clear to me. You're _assuming_ that the unpackaged--xxx
> > will not make checks, so you _drop_ support for it ? Can't the normal
> > extension script also be unsafe for some reason ? Or can't the
> > unpackaged-xxx script be made safe by the publishers ? Or, as a last
> > resort.. can't you just mark postgis as UNSAFE and still require
> > superuser, which would give us the same experience as before ?
> 
> I am wondering: does anything in the PG 13 change preclude writing
> a postgis_raster--unpackaged.sql script that could be applied with
> CREATE EXTENSION postgis_raster VERSION unpackaged;
> and would do perhaps nothing at all, or merely confirm that the
> right unpackaged things are present and are the right things...
> 
> ... from which an ALTER EXTENSION postgis_raster UPDATE TO 3.0;
> would naturally run the existing postgis_raster--unpackaged--3.0.sql
> and execute all of its existing ALTER EXTENSION ... ADD operations?
> 
> Has the disadvantage of being goofy, but possibly the advantage of
> being implementable in the current state of affairs.

Thanks for this hint, yes, seems to be technically feasible, as well
as doing packaging in the extension creation scripts. Only... this
would basically work around the intentionally removed syntax, which
Steven Frost was against (still unclear to me why)...

--strk;




Improve handling of parameter differences in physical replication

2020-02-27 Thread Peter Eisentraut
When certain parameters are changed on a physical replication primary, 
  this is communicated to standbys using the XLOG_PARAMETER_CHANGE WAL 
record.  The standby then checks whether its own settings are at least 
as big as the ones on the primary.  If not, the standby shuts down with 
a fatal error.


The correspondence of settings between primary and standby is required 
because those settings influence certain shared memory sizings that are 
required for processing WAL records that the primary might send.  For 
example, if the primary sends a prepared transaction, the standby must 
have had max_prepared_transaction set appropriately or it won't be able 
to process those WAL records.


However, fatally shutting down the standby immediately upon receipt of 
the parameter change record might be a bit of an overreaction.  The 
resources related to those settings are not required immediately at that 
point, and might never be required if the activity on the primary does 
not exhaust all those resources.  An extreme example is raising 
max_prepared_transactions on the primary but never actually using 
prepared transactions.


Where this becomes a serious problem is if you have many standbys and 
you do a failover.  If the newly promoted standby happens to have a 
higher setting for one of the relevant parameters, all the other 
standbys that have followed it then shut down immediately and won't be 
able to continue until you change all their settings.


If we didn't do the hard shutdown and we just let the standby roll on 
with recovery, nothing bad will happen and it will eventually produce an 
appropriate error when those resources are required (e.g., "maximum 
number of prepared transactions reached").


So I think there are better ways to handle this.  It might be reasonable 
to provide options.  The attached patch doesn't do that but it would be 
pretty easy.  What the attached patch does is:


Upon receipt of XLOG_PARAMETER_CHANGE, we still check the settings but 
only issue a warning and set a global flag if there is a problem.  Then 
when we actually hit the resource issue and the flag was set, we issue 
another warning message with relevant information.  Additionally, at 
that point we pause recovery instead of shutting down, so a hot standby 
remains usable.  (That could certainly be configurable.)


Btw., I think the current setup is slightly buggy.  The MaxBackends 
value that is used to size shared memory is computed as MaxConnections + 
autovacuum_max_workers + 1 + max_worker_processes + max_wal_senders, but 
we don't track autovacuum_max_workers in WAL.


(This patch was developed together with Simon Riggs.)

--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From f5b4b7fd853b0dba2deea6b1e8290ae4c6df7081 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Thu, 27 Feb 2020 08:50:37 +0100
Subject: [PATCH v1] Improve handling of parameter differences in physical
 replication

When certain parameters are changed on a physical replication primary,
this is communicated to standbys using the XLOG_PARAMETER_CHANGE WAL
record.  The standby then checks whether its own settings are at least
as big as the ones on the primary.  If not, the standby shuts down
with a fatal error.

The correspondence of settings between primary and standby is required
because those settings influence certain shared memory sizings that
are required for processing WAL records that the primary might send.
For example, if the primary sends a prepared transaction, the standby
must have had max_prepared_transaction set appropriately or it won't
be able to process those WAL records.

However, fatally shutting down the standby immediately upon receipt of
the parameter change record might be a bit of an overreaction.  The
resources related to those settings are not required immediately at
that point, and might never be required if the activity on the primary
does not exhaust all those resources.  If we just let the standby roll
on with recovery, it will eventually produce an appropriate error when
those resources are used.

So this patch relaxes this a bit.  Upon receipt of
XLOG_PARAMETER_CHANGE, we still check the settings but only issue a
warning and set a global flag if there is a problem.  Then when we
actually hit the resource issue and the flag was set, we issue another
warning message with relevant information.  Additionally, at that
point we pause recovery, so a hot standby remains usable.
---
 src/backend/access/transam/twophase.c |  3 ++
 src/backend/access/transam/xlog.c | 56 ++-
 src/backend/storage/ipc/procarray.c   | 15 ++-
 src/backend/storage/lmgr/lock.c   | 10 +
 src/include/access/xlog.h |  1 +
 5 files changed, 74 insertions(+), 11 deletions(-)

diff --git a/src/backend/access/transam/twophase.c 
b/src/backend/access/transam/twophase.c
index 5adf956f41..fdac2ed69d 100644
--- 

Re: Yet another vectorized engine

2020-02-27 Thread Hubert Zhang
Hi Konstantin,
I also vimdiff nodeAgg.c in your PG13 branch with nodeAgg.c in pg's main
repo.
Many functions has changed from PG96 to PG13, e.g. 'advance_aggregates',
'lookup_hash_entry'
The vectorized nodeAgg seems still follow the PG96 way of implementing
these functions.
In general, I think we'd better port executor of PG13 to vectorized
executor of PG13 instead of merge some PG13 code into vectorized executor
of PG96 to make it works. Because It's hard to determine which functions
need to be merged and it's buggy if the executor code of both PG13 and PG96
exist in one branch.

What's your opinion?


Re: reindex concurrently and two toast indexes

2020-02-27 Thread Julien Rouhaud
On Thu, Feb 27, 2020 at 04:32:11PM +0900, Michael Paquier wrote:
> On Sat, Feb 22, 2020 at 04:06:57PM +0100, Julien Rouhaud wrote:
> > Sorry, I just realized that I forgot to commit the last changes before 
> > sending
> > the patch, so here's the correct v2.
>
> Thanks for the patch.
>
> > +   if (skipit)
> > +   {
> > +   ereport(NOTICE,
> > +(errmsg("skipping invalid index \"%s.%s\"",
> > +
> > get_namespace_name(get_rel_namespace(indexOid)),
> > +get_rel_name(indexOid;
>
> ReindexRelationConcurrently() issues a WARNING when bumping on an
> invalid index, shouldn't the same log level be used?

For ReindexRelationConcurrently, the index is skipped because the feature isn't
supported, thus a warning.  In this case that would work, it's just that we
don't want to process such indexes, so I used a notice instead.

I'm not opposed to use a warning instead if you prefer.  What errcode should be
used though, ERRCODE_WARNING?  ERRCODE_FEATURE_NOT_SUPPORTED doesn't feel
right.

> Even with this patch, it is possible to reindex an invalid toast index
> with REINDEX INDEX (with and without CONCURRENTLY), which is the
> problem I mentioned upthread (Er, actually only for the non-concurrent
> case as told about reindex_index).  Shouldn't both cases be prevented
> as well with an ERROR?

Ah indeed, sorry I missed that.

While looking at it, I see that invalid indexes seem to leaked when the table
is dropped, with no way to get rid of them:

s1:
CREATE TABLE t1(val text);
CREATE INDEX ON t1 (val);
BEGIN;
SELECT * FROM t1 FOR UPDATE;

s2:
REINDEX TABLE CONCURRENTLY t1;
[stucked and canceled]
SELECT indexrelid::regclass, indrelid::regclass FROM pg_index WHERE NOT 
indisvalid;
 indexrelid  |indrelid
-+-
 t1_val_idx_ccold| t1
 pg_toast.pg_toast_16385_index_ccold | pg_toast.pg_toast_16385
(2 rows)

s1:
ROLLBACK;
DROP TABLE t1;

SELECT indexrelid::regclass, indrelid::regclass FROM pg_index WHERE NOT 
indisvalid;
 indexrelid  | indrelid
-+--
 t1_val_idx_ccold| 16385
 pg_toast.pg_toast_16385_index_ccold | 16388
(2 rows)

REINDEX INDEX t1_val_idx_ccold;
ERROR:  XX000: could not open relation with OID 16385
LOCATION:  relation_open, relation.c:62

DROP INDEX t1_val_idx_ccold;
ERROR:  XX000: could not open relation with OID 16385
LOCATION:  relation_open, relation.c:62

REINDEX INDEX pg_toast.pg_toast_16385_index_ccold;
ERROR:  XX000: could not open relation with OID 16388
LOCATION:  relation_open, relation.c:62

DROP INDEX pg_toast.pg_toast_16385_index_ccold;
ERROR:  XX000: could not open relation with OID 16388
LOCATION:  relation_open, relation.c:62

REINDEX DATABASE rjuju;
REINDEX

SELECT indexrelid::regclass, indrelid::regclass FROM pg_index WHERE NOT 
indisvalid;
 indexrelid  | indrelid
-+--
 t1_val_idx_ccold| 16385
 pg_toast.pg_toast_16385_index_ccold | 16388
(2 rows)

Shouldn't DROP TABLE be fixed to also drop invalid indexes?




Re: Crash by targetted recovery

2020-02-27 Thread Kyotaro Horiguchi
Thank you for the comment.
At Thu, 27 Feb 2020 16:23:44 +0900, Fujii Masao  
wrote in 
> On 2020/02/27 15:23, Kyotaro Horiguchi wrote:
> >> I failed to understand why random access while reading from
> >> stream is bad idea. Could you elaborate why?
> > It seems to me the word "streaming" suggests that WAL record should be
> > read sequentially. Random access, which means reading from arbitrary
> > location, breaks a stream.  (But the patch doesn't try to stop wal
> > sender if randAccess.)
> > 
> >> Isn't it sufficient to set currentSource to 0 when disabling
> >> StandbyMode?
> > I thought that and it should work, but I hesitated to manipulate on
> > currentSource in StartupXLOG. currentSource is basically a private
> > state of WaitForWALToBecomeAvailable. ReadRecord modifies it but I
> > think it's not good to modify it out of the the logic in
> > WaitForWALToBecomeAvailable.
> 
> If so, what about adding the following at the top of
> WaitForWALToBecomeAvailable()?
> 
> if (!StandbyMode && currentSource == XLOG_FROM_STREAM)
>  currentSource = 0;

It works virtually the same way. I'm happy to do that if you don't
agree to using randAccess. But I'd rather do that in the 'if
(!InArchiveRecovery)' section.

> > Come to think of that I got to think the
> > following part in ReadRecord should use randAccess instead..
> > xlog.c:4384
> >>  /*
> > -  * Before we retry, reset lastSourceFailed and currentSource
> > -  * so that we will check the archive next.
> > +  * Streaming has broken, we retry from the same LSN.
> >>   */
> >>  lastSourceFailed = false;
> > - currentSource = 0;
> > + private->randAccess = true;
> 
> Sorry, I failed to understand why this change is necessary...

It's not necessary, just for being tidy about the responsibility on
currentSource.

> At least the comment that you added seems incorrect
> because WAL streaming should not have started yet when
> we reach the above point.

Oops, right.

-  * Streaming has broken, we retry from the same LSN.
+  * Restart recovery from the current LSN.

For clarity, I don't insist on the change at all. If it were
necessary, it's another topic, anyway. Please forget it.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From 8cb817a43226a0d60dd62c6205219a2f0c807b9e Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Wed, 26 Feb 2020 20:41:11 +0900
Subject: [PATCH v2 1/2] TAP test for a crash bug

---
 src/test/recovery/t/003_recovery_targets.pl | 32 +
 1 file changed, 32 insertions(+)

diff --git a/src/test/recovery/t/003_recovery_targets.pl b/src/test/recovery/t/003_recovery_targets.pl
index fd14bab208..8e71788981 100644
--- a/src/test/recovery/t/003_recovery_targets.pl
+++ b/src/test/recovery/t/003_recovery_targets.pl
@@ -167,3 +167,35 @@ foreach my $i (0..100)
 $logfile = slurp_file($node_standby->logfile());
 ok($logfile =~ qr/FATAL:  recovery ended before configured recovery target was reached/,
 	'recovery end before target reached is a fatal error');
+
+
+# Edge case where targetted promotion happens on segment boundary
+$node_standby = get_new_node('standby_9');
+$node_standby->init_from_backup($node_master, 'my_backup',
+has_restoring => 1, has_streaming => 1);
+$node_standby->start;
+## read wal_segment_size
+my $result =
+  $node_standby->safe_psql('postgres', "SHOW wal_segment_size");
+die "unknown format of wal_segment_size: $result\n"
+  if ($result !~ /^([0-9]+)MB$/);
+my $segsize = $1 * 1024 * 1024;
+## stop just before the next segment boundary
+$result =
+  $node_standby->safe_psql('postgres', "SELECT pg_last_wal_replay_lsn()");
+my ($seg, $off) = split('/', $result);
+my $target = sprintf("$seg/%08X", (hex($off) / $segsize + 1) * $segsize);
+
+$node_standby->stop;
+$node_standby->append_conf('postgresql.conf', qq(
+recovery_target_inclusive=no
+recovery_target_lsn='$target'
+recovery_target_action='promote'
+));
+$node_standby->start;
+## do targetted promote
+$node_master->safe_psql('postgres', "CREATE TABLE t(); DROP TABLE t;");
+$node_master->safe_psql('postgres', "SELECT pg_switch_wal(); CHECKPOINT;");
+my $caughtup_query = "SELECT NOT pg_is_in_recovery()";
+$node_standby->poll_query_until('postgres', $caughtup_query)
+  or die "Timed out while waiting for standby to promote";
-- 
2.18.2

>From d917638b787b9d1393cbc0d77586168da844756b Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Wed, 26 Feb 2020 20:41:37 +0900
Subject: [PATCH v2 2/2] Fix a crash bug of targetted promotion.

After recovery target is reached, StartupXLOG turns off standby mode
then refetches the last record. If the last record starts from the
previous segment at the time, WaitForWALToBecomeAvailable crashes with
assertion failure.  WaitForWALToBecomeAvailable should move back to
XLOG_FROM_ARCHIVE if standby mode is turned off while
XLOG_FROM_STREAM.
---
 src/backend/access/transam/xlog.c | 13 +++--
 1 file changed, 11 insertions(+), 2