Re: adding partitioned tables to publications

2020-04-03 Thread Tom Lane
Petr Jelinek  writes:
> On 03/04/2020 17:51, Tom Lane wrote:
>> But the forked-off children have to write the gcov files independently,
>> don't they?

> Hmm, that's a very good point. I did see this missing-coverage issue when 
> running tests that explicitly start more instances of postgres, though. 
> And with some quick googling, parallel testing seems to be an issue with 
> gcov for other people as well.

I poked around and found this:

https://gcc.gnu.org/legacy-ml/gcc-help/2005-11/msg00074.html

which says

gcov instrumentation is multi-process safe, but not multi-thread
safe. The multi-processing safety relies on OS level file locking,
which is not available on some systems.

That would explain why it works for me, but then there's a question
of why it doesn't work for you ...

regards, tom lane




Re: WAL usage calculation patch

2020-04-03 Thread Amit Kapila
On Fri, Apr 3, 2020 at 7:36 PM Amit Kapila  wrote:
>
> On Fri, Apr 3, 2020 at 9:40 AM Amit Kapila  wrote:
> >
> > On Fri, Apr 3, 2020 at 9:35 AM Dilip Kumar  wrote:
> > >
> > > I have analyzed the WAL, and there could be multiple reasons for it.
> > > With small data, I noticed that while inserting into a system index
> > > there was a page split, and that created extra WAL.
> > >
> >
> > Thanks for the investigation.  I think it is clear that we can't
> > expect the same WAL size even if we repeat the same operation unless
> > it is a fresh database.
> >
>
> Please find the latest patches attached.  I have modified them based on our
> discussion on the user interface thread [1], run pgindent on all patches,
> slightly modified one comment based on Dilip's input, and added commit
> messages.  I think the patches are in good shape.  I would like to
> commit the first patch in this series tomorrow unless I see more
> comments or any other objections.
>

Pushed.

>  Patch-2 might need to be
> rebased if the other related patch [2] gets committed first, and we
> might need to tweak it a bit based on the input from the other thread [1]
> where we are discussing the user interface for it.
>

The primary question for patch-2 is whether we want to include WAL
usage information for the planning phase, as we did for BUFFERS in the
recent commit ce77abe63c (Include information on buffer usage during
planning phase, in EXPLAIN output, take two).  Initially, I thought it
might be a good idea to do the same for WAL, but after reading the
thread that led to that commit, I am not sure there is any pressing
need to include WAL information for the planning phase.  During
planning we might not write much WAL (with the exception of WAL due to
the setting of hint bits), so users might not care much.  What do you
think?
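
For patch-2, just to make the discussion concrete, the WAL usage would show
up in EXPLAIN roughly like this (a sketch only -- the option name, field
names, and numbers here are illustrative, not the committed format):

    EXPLAIN (ANALYZE, BUFFERS, WAL)
        INSERT INTO wal_demo SELECT generate_series(1, 1000);
    --   Insert on wal_demo  (actual time=... rows=0 loops=1)
    --     Buffers: shared hit=1004 dirtied=7
    --     WAL: records=1000 fpi=0 bytes=65000

The question above is only about whether a similar "WAL:" line should also
be attached to the planning section of that output.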

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: adding partitioned tables to publications

2020-04-03 Thread Petr Jelinek

On 03/04/2020 17:51, Tom Lane wrote:
> Petr Jelinek  writes:
>> On 03/04/2020 16:59, Tom Lane wrote:
>>> Petr Jelinek  writes:
>>>> AFAIK gcov can't handle multiple instances of the same process being
>>>> started, as it just overwrites the coverage files. So for TAP tests it
>>>> will report bogus info (as in, some code that's executed will look as
>>>> not executed).
>>>
>>> Hm, really?  I routinely run "make check" (ie, parallel regression
>>> tests) under coverage, and I get results that seem sane.  If I were
>>> losing large chunks of the data, I think I'd have noticed.
>>
>> Parallel regression still just starts a single postgres instance, no?
>
> But the forked-off children have to write the gcov files independently,
> don't they?

Hmm, that's a very good point. I did see this missing-coverage issue when 
running tests that explicitly start more instances of postgres, though. 
And with some quick googling, parallel testing seems to be an issue with 
gcov for other people as well.

I wonder if the program checksum that gcov calculates when merging the 
.gcda data is somehow different for separately started instances but not 
for ones forked from the same parent, or something. I don't know the 
internals of gcov well enough to say how exactly that works.


--
Petr Jelinek
2ndQuadrant - PostgreSQL Solutions for the Enterprise
https://www.2ndQuadrant.com/




Re: backup manifests and contemporaneous buildfarm failures

2020-04-03 Thread Petr Jelinek

On 04/04/2020 05:06, Andres Freund wrote:

Hi,

Peter, Petr, CCed you because it's probably a bug somewhere around the
initial copy code for logical replication.


On 2020-04-03 20:48:09 -0400, Robert Haas wrote:

'serinus' is also failing. This is less obviously related:


Hm. Tests passed once since then.



2020-04-04 02:08:57.299 CEST [5e87d019.506c1:4] LOG:  received
replication command: CREATE_REPLICATION_SLOT
"tap_sub_16390_sync_16384" TEMPORARY LOGICAL pgoutput USE_SNAPSHOT
2020-04-04 02:08:57.299 CEST [5e87d019.506c1:5] ERROR:  replication
slot "tap_sub_16390_sync_16384" already exists


That already seems suspicious. I checked the following (successful) run
and I did not see that in the stage's logs.

Looking at the failing log, it fails because for some reason there are
multiple rounds of copying the relation (once due to a refresh, once due to
an intentional replication failure). Each round creates its own temporary slot.

first time:
2020-04-04 02:08:57.276 CEST [5e87d019.506bd:1] LOG:  connection received: 
host=[local]
2020-04-04 02:08:57.278 CEST [5e87d019.506bd:4] LOG:  received replication command: 
CREATE_REPLICATION_SLOT "tap_sub_16390_sync_16384" TEMPORARY LOGICAL pgoutput 
USE_SNAPSHOT
2020-04-04 02:08:57.282 CEST [5e87d019.506bd:9] LOG:  statement: COPY 
public.tab_rep TO STDOUT
2020-04-04 02:08:57.284 CEST [5e87d019.506bd:10] LOG:  disconnection: session 
time: 0:00:00.007 user=bf database=postgres host=[local]

second time:
2020-04-04 02:08:57.288 CEST [5e87d019.506bf:1] LOG:  connection received: 
host=[local]
2020-04-04 02:08:57.289 CEST [5e87d019.506bf:4] LOG:  received replication command: 
CREATE_REPLICATION_SLOT "tap_sub_16390_sync_16384" TEMPORARY LOGICAL pgoutput 
USE_SNAPSHOT
2020-04-04 02:08:57.293 CEST [5e87d019.506bf:9] LOG:  statement: COPY 
public.tab_rep TO STDOUT

third time:
2020-04-04 02:08:57.297 CEST [5e87d019.506c1:1] LOG:  connection received: 
host=[local]
2020-04-04 02:08:57.299 CEST [5e87d019.506c1:4] LOG:  received replication command: 
CREATE_REPLICATION_SLOT "tap_sub_16390_sync_16384" TEMPORARY LOGICAL pgoutput 
USE_SNAPSHOT
2020-04-04 02:08:57.299 CEST [5e87d019.506c1:5] ERROR:  replication slot 
"tap_sub_16390_sync_16384" already exists

Note that the connection from the second attempt has not yet
disconnected. Hence the error about the replication slot already
existing - it's a temporary replication slot that'd otherwise already
have been dropped.


Seems the logical rep code needs to do something about this race?



The downstream:


2020-04-04 02:08:57.275 CEST [5e87d019.506bc:1] LOG:  logical replication table synchronization 
worker for subscription "tap_sub", table "tab_rep" has started
2020-04-04 02:08:57.282 CEST [5e87d019.506bc:2] ERROR:  duplicate key value violates 
unique constraint "tab_rep_pkey"
2020-04-04 02:08:57.282 CEST [5e87d019.506bc:3] DETAIL:  Key (a)=(1) already 
exists.
2020-04-04 02:08:57.282 CEST [5e87d019.506bc:4] CONTEXT:  COPY tab_rep, line 1
2020-04-04 02:08:57.283 CEST [5e87d018.50689:5] LOG:  background worker "logical 
replication worker" (PID 329404) exited with exit code 1
2020-04-04 02:08:57.287 CEST [5e87d019.506be:1] LOG:  logical replication table synchronization 
worker for subscription "tap_sub", table "tab_rep" has started
2020-04-04 02:08:57.293 CEST [5e87d019.506be:2] ERROR:  duplicate key value violates 
unique constraint "tab_rep_pkey"
2020-04-04 02:08:57.293 CEST [5e87d019.506be:3] DETAIL:  Key (a)=(1) already 
exists.
2020-04-04 02:08:57.293 CEST [5e87d019.506be:4] CONTEXT:  COPY tab_rep, line 1
2020-04-04 02:08:57.295 CEST [5e87d018.50689:6] LOG:  background worker "logical 
replication worker" (PID 329406) exited with exit code 1
2020-04-04 02:08:57.297 CEST [5e87d019.506c0:1] LOG:  logical replication table synchronization 
worker for subscription "tap_sub", table "tab_rep" has started
2020-04-04 02:08:57.299 CEST [5e87d019.506c0:2] ERROR:  could not create replication slot 
"tap_sub_16390_sync_16384": ERROR:  replication slot "tap_sub_16390_sync_16384" 
already exists
2020-04-04 02:08:57.300 CEST [5e87d018.50689:7] LOG:  background worker "logical replication worker" (PID 329408) exited with exit code 


Looks like we are simply retrying so fast that the upstream will not have 
finished cleaning up after the second try by the time we run the third one.

The last_start_times logic is supposed to protect against that, so I guess 
there is some issue with how that works.


--
Petr Jelinek
2ndQuadrant - PostgreSQL Solutions for the Enterprise
https://www.2ndQuadrant.com/




Re: Proposal: is_castable

2020-04-03 Thread Pavel Stehule
Hi


> So it's a two-step process for me currently because of this. I would love
> it if there were a better way to handle this type of work, though, because
> my plpgsql functions using exception blocks are not exactly great for
> performance.
>

We could probably write an "is_valid" method for some important built-in
types, and that method could be called directly. For custom types, or for
types without such a method, the exception-based solution could still be used.

This should not be too much code, and it can be fast for frequently used types.
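
To sketch what the direct call could look like for a built-in type (the
function name and signature below are purely hypothetical, only meant to
illustrate the idea):

    -- hypothetical built-in check: validates the text input for the given
    -- type without needing an exception block or subtransaction
    SELECT is_valid('12345', 'integer');   -- true
    SELECT is_valid('12x45', 'integer');   -- false, instead of an error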


Re: proposal \gcsv

2020-04-03 Thread Pavel Stehule
On Sat, Apr 4, 2020 at 0:24, Tom Lane  wrote:

> Pavel Stehule  writes:
> > We can have a new commands for cloning print environments and switch to
> one
> > shot environment. It can be based just on enhancing of \pset command
>
> > \pset save anyidentifier .. serialize settings
> > \pset load anyidentifier .. load setting
> > \pset oneshot [anyidentifer] .. prepare and set copy of current print
> > setting for next execution command
> > \pset main
> > \pset reset .. reset to defaults
>
> I feel like that's gotten pretty far away from the idea of a simple,
> easy-to-use way of adjusting the parameters for one \g operation.
> There'd be a whole lot of typing involved above and beyond the
> obviously-necessary part of specifying the new pset parameter values.
>

For my original proposal, only one new command is important: \pset oneshot.

So a one-shot setting can be done by

\pset oneshot
\pset format csv
\pset csv_separator ;
any command that prints tuples

This is your plan B, but we need to enhance only the \pset command, and all
the others can be used without any modification.


> (Also, it's not clear to me how that's any more robust than the
> stack idea.  If you could lose "\pset pop" to an error, you could
> lose "\pset reset" too.)
>

\pset reset should not switch from the one-shot settings back to the usual
settings (that is not necessary, because the one-shot settings are destroyed
after execution); my idea is that it resets to the initial psql settings.

>
> If people are upset about the extra whitespace in the paren-style
> proposal, we could do without it.  The only real problem would be
> if there's ever a pset parameter for which a trailing right paren
> could be a sensible part of the value.  Maybe that's not ever
> going to be an issue; or maybe we could provide a quoting mechanism
> for weird pset values.
>

Parameterization in parentheses is a usual pattern in Postgres (EXPLAIN,
COPY, ...), and for me it is the most natural syntax.
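
As an illustration, a one-shot \g under the paren-style idea would look
roughly like this (parameter names reuse the ones from my example above;
this is only a sketch of the proposal, nothing about the spelling is settled):

    SELECT * FROM pg_class \g (format=csv csv_separator=';') classes.csv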




>
> regards, tom lane
>


Re: Add A Glossary

2020-04-03 Thread Laurenz Albe
On Fri, 2020-04-03 at 16:01 -0500, Justin Pryzby wrote:
> BTW it's now visible at:
> https://www.postgresql.org/docs/devel/glossary.html

Great!

Some comments:

- SQL object: There are more kinds of objects, like roles or full text 
dictionaries.
  Perhaps better:

Anything that is created with a CREATE statement, for example ...
Most objects belong to a database schema, except ...

  Or do we consider a replication slot to be an object?

- The glossary has "Primary (server)", but not "Standby (server)".
  That should be a synonym for "Replica".

- Server: is that really our definition?
  I thought that "server" is what the glossary defines as "instance", and
  the thing called "server" in the glossary should really be called "host".

  Maybe I am too Unix-centered.

  Many people I know use "instance" synonymous to "cluster".

- Role: I understand the motivation behind the definition (except that the word 
"instance"
  is ill chosen), but a role is more than a collection of privileges.
  How can a collection of privileges have a password or own an object?
  Perhaps, instead of the first sentence:

A database object used for authentication, authorization and ownership.
Both database users and user groups are "roles" in PostgreSQL.

  In the second sentence, "roles" is mis-spelled as "roless".

- Null

  I think it should say "It represents the absence of *a definite* value."
  Usually it is better to think of NULL as "unknown".

- Function

  I don't know if "transformation of data" describes it well.
  Quite a lot of functions in PostgreSQL have side effects.
  How about:

Procedural code stored in the database that can be used in SQL statements.

Yours,
Laurenz Albe





Re: backup manifests and contemporaneous buildfarm failures

2020-04-03 Thread Andres Freund
Hi,

Peter, Petr, CCed you because it's probably a bug somewhere around the
initial copy code for logical replication.


On 2020-04-03 20:48:09 -0400, Robert Haas wrote:
> 'serinus' is also failing. This is less obviously related:

Hm. Tests passed once since then.


> 2020-04-04 02:08:57.299 CEST [5e87d019.506c1:4] LOG:  received
> replication command: CREATE_REPLICATION_SLOT
> "tap_sub_16390_sync_16384" TEMPORARY LOGICAL pgoutput USE_SNAPSHOT
> 2020-04-04 02:08:57.299 CEST [5e87d019.506c1:5] ERROR:  replication
> slot "tap_sub_16390_sync_16384" already exists

That already seems suspicious. I checked the following (successful) run
and I did not see that in the stage's logs.

Looking at the failing log, it fails because for some reason there are
multiple rounds of copying the relation (once due to a refresh, once due to
an intentional replication failure). Each round creates its own temporary slot.

first time:
2020-04-04 02:08:57.276 CEST [5e87d019.506bd:1] LOG:  connection received: 
host=[local]
2020-04-04 02:08:57.278 CEST [5e87d019.506bd:4] LOG:  received replication 
command: CREATE_REPLICATION_SLOT "tap_sub_16390_sync_16384" TEMPORARY LOGICAL 
pgoutput USE_SNAPSHOT
2020-04-04 02:08:57.282 CEST [5e87d019.506bd:9] LOG:  statement: COPY 
public.tab_rep TO STDOUT
2020-04-04 02:08:57.284 CEST [5e87d019.506bd:10] LOG:  disconnection: session 
time: 0:00:00.007 user=bf database=postgres host=[local]

second time:
2020-04-04 02:08:57.288 CEST [5e87d019.506bf:1] LOG:  connection received: 
host=[local]
2020-04-04 02:08:57.289 CEST [5e87d019.506bf:4] LOG:  received replication 
command: CREATE_REPLICATION_SLOT "tap_sub_16390_sync_16384" TEMPORARY LOGICAL 
pgoutput USE_SNAPSHOT
2020-04-04 02:08:57.293 CEST [5e87d019.506bf:9] LOG:  statement: COPY 
public.tab_rep TO STDOUT

third time:
2020-04-04 02:08:57.297 CEST [5e87d019.506c1:1] LOG:  connection received: 
host=[local]
2020-04-04 02:08:57.299 CEST [5e87d019.506c1:4] LOG:  received replication 
command: CREATE_REPLICATION_SLOT "tap_sub_16390_sync_16384" TEMPORARY LOGICAL 
pgoutput USE_SNAPSHOT
2020-04-04 02:08:57.299 CEST [5e87d019.506c1:5] ERROR:  replication slot 
"tap_sub_16390_sync_16384" already exists

Note that the connection from the second attempt has not yet
disconnected. Hence the error about the replication slot already
existing - it's a temporary replication slot that'd otherwise already
have been dropped.


Seems the logical rep code needs to do something about this race?


About the assertion failure:

TRAP: FailedAssertion("owner->bufferarr.nitems == 0", File: 
"/home/bf/build/buildfarm-serinus/HEAD/pgsql.build/../pgsql/src/backend/utils/resowner/resowner.c",
 Line: 718)
postgres: publisher: walsender bf [local] idle in transaction 
(aborted)(ExceptionalCondition+0x5c)[0x9a13ac]
postgres: publisher: walsender bf [local] idle in transaction 
(aborted)(ResourceOwnerDelete+0x295)[0x9db8e5]
postgres: publisher: walsender bf [local] idle in transaction 
(aborted)[0x54c61f]
postgres: publisher: walsender bf [local] idle in transaction 
(aborted)(AbortOutOfAnyTransaction+0x122)[0x550e32]
postgres: publisher: walsender bf [local] idle in transaction 
(aborted)[0x9b3bc9]
postgres: publisher: walsender bf [local] idle in transaction 
(aborted)(shmem_exit+0x35)[0x80db45]
postgres: publisher: walsender bf [local] idle in transaction 
(aborted)[0x80dc77]
postgres: publisher: walsender bf [local] idle in transaction 
(aborted)(proc_exit+0x8)[0x80dd08]
postgres: publisher: walsender bf [local] idle in transaction 
(aborted)(PostgresMain+0x59f)[0x83bd0f]
postgres: publisher: walsender bf [local] idle in transaction 
(aborted)[0x7a0264]
postgres: publisher: walsender bf [local] idle in transaction 
(aborted)(PostmasterMain+0xbfc)[0x7a2b8c]
postgres: publisher: walsender bf [local] idle in transaction 
(aborted)(main+0x6fb)[0x49749b]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xeb)[0x7fc52d83]
postgres: publisher: walsender bf [local] idle in transaction 
(aborted)(_start+0x2a)[0x49753a]
2020-04-04 02:08:57.302 CEST [5e87d018.5066b:4] LOG:  server process (PID 
329409) was terminated by signal 6: Aborted

Due to the log_line_prefix used, I was at first not entirely sure the
backend that crashed was the one with the ERROR. But it appears we print
the pid as hex for '%c' (why?), so it indeed is the one.


I, again, have to say that the amount of stuff that was done as part of

commit 7c4f52409a8c7d85ed169bbbc1f6092274d03920
Author: Peter Eisentraut 
Date:   2017-03-23 08:36:36 -0400

Logical replication support for initial data copy

is insane. Adding support for running sql over replication connections
and extending CREATE_REPLICATION_SLOT with new options (without even
mentioning that in the commit message!) as part of a commit described as
"Logical replication support for initial data copy" shouldn't happen.


It's not obvious to me what buffer pins could be held at this point. I
wonder if this could be somehow related to

commit 

Re: backup manifests and contemporaneous buildfarm failures

2020-04-03 Thread Robert Haas
On Fri, Apr 3, 2020 at 9:52 PM Tom Lane  wrote:
> Robert Haas  writes:
> > 'prairiedog' is also unhappy, and it looks related:
>
> Yeah, gaur also failed in the same place.  Both of those are
> alignment-picky 32-bit hardware, so I'm thinking the problem is
> pg_gmtime() trying to fetch a 64-bit pg_time_t from an insufficiently
> aligned address.  I'm trying to confirm that on gaur's host right now,
> but it's a slow machine ...

You might just want to wait until tomorrow and see whether it clears
up in newer runs. I just pushed yet another fix that might be
relevant.

I think I've done about as much as I can do for tonight, though. Most
things are green now, and the ones that aren't are failing because of
stuff that is at least plausibly fixed. By morning it should be
clearer how much broken stuff is left, although that will be somewhat
complicated by at least sidewinder and seawasp needing manual
intervention to get back on track.

I apologize to everyone who has been or will be inconvenienced by all
of this. So far I've pushed 4 test case fixes, 2 bug fixes, and 1
makefile fix, which I'm pretty sure is over quota for one patch. :-(

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: backup manifests and contemporaneous buildfarm failures

2020-04-03 Thread Tom Lane
Robert Haas  writes:
> Interestingly, on my machine, rmtree coped with a mode 0 directory
> just fine, but mode 0400 was more than its tiny brain could handle, so
> the originally committed fix had code to revert 0400 back to 0700, but
> I didn't add similar code to revert from 0 back to 0700 because that
> was working fine.

It seems really odd that an implementation could cope with mode-0
but not mode-400.  Not sure I care enough to dig into the Perl
library code, though.

regards, tom lane




Re: backup manifests and contemporaneous buildfarm failures

2020-04-03 Thread Tom Lane
Robert Haas  writes:
> 'prairiedog' is also unhappy, and it looks related:

Yeah, gaur also failed in the same place.  Both of those are
alignment-picky 32-bit hardware, so I'm thinking the problem is
pg_gmtime() trying to fetch a 64-bit pg_time_t from an insufficiently
aligned address.  I'm trying to confirm that on gaur's host right now,
but it's a slow machine ...

> 'serinus' is also failing. This is less obviously related:

Dunno about this one.

regards, tom lane




Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

2020-04-03 Thread David Rowley
On Fri, 3 Apr 2020 at 21:56, Andy Fan  wrote:
>
>
>
> On Fri, Apr 3, 2020 at 12:08 PM David Rowley  wrote:
>>
>> On Fri, 3 Apr 2020 at 16:40, Tom Lane  wrote:
>> > (It occurs to me BTW that we've been overly conservative about using
>> > NOT NULL constraints in planning, because of failing to consider that.
>> > Addition or drop of NOT NULL has to cause a change in
>> > pg_attribute.attnotnull, which will definitely cause a relcache inval
>> > on its table, cf rules in CacheInvalidateHeapTuple().  So we *don't*
>> > need to have a pg_constraint entry corresponding to the NOT NULL, as
>> > we've mistakenly supposed in some past discussions.)
>>
>> Agreed for remove_useless_groupby_columns(), but we'd need it if we
>> wanted to detect functional dependencies in
>> check_functional_grouping() using unique indexes.
>
>
> Thanks for the explanation.  I will add the removal in the next version of 
> this
> patch.

There's no need for this patch to touch
remove_useless_groupby_columns().  Fixes for that should be considered
independently and *possibly* even backpatched.
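
For context, the two behaviours mentioned above can be seen with a simple
primary-key example (this is existing PostgreSQL behaviour, shown only to
illustrate what those functions are responsible for):

    CREATE TABLE orders (id int PRIMARY KEY, customer text, total numeric);

    -- check_functional_grouping: "customer" may appear unaggregated because
    -- it is functionally dependent on the grouped-by primary key.
    SELECT id, customer, sum(total) FROM orders GROUP BY id;

    -- remove_useless_groupby_columns: the planner can internally drop
    -- "customer" from the grouping, since grouping by the PK is sufficient.
    SELECT id, customer, sum(total) FROM orders GROUP BY id, customer;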




Re: Proposal: is_castable

2020-04-03 Thread Adam Brusselback
>  What would you actually do with it?

I am one of the users of these do-it-yourself functions, and I use them in
my ETL pipelines heavily.

For me, data gets loaded into a staging table, all columns text, and I run
a whole bunch of validation queries
on the data prior to it moving to the next stage in the pipeline, a
strongly typed staging table, where more
validations are performed. So I currently check each column type with my
custom can_convert_sometype(text)
functions, and if the row has any columns that cannot convert, it marks a
boolean to ignore moving that row
to the next strongly typed table (thus avoiding the cast for those rows).

For this ETL process, I need to give users feedback about why specific rows
failed to be processed, so each of those validations also logs an error
message for the user for each row failing a specific validation.

So it's a two-step process for me currently because of this. I would love
it if there were a better way to handle this type of work, though, because
my plpgsql functions using exception blocks are not exactly great for
performance.
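
For reference, those helpers are roughly of the following shape (a minimal
sketch; the function name, target type, and staging table/columns are only
illustrative):

    CREATE OR REPLACE FUNCTION can_convert_date(val text)
    RETURNS boolean
    LANGUAGE plpgsql
    AS $$
    BEGIN
        PERFORM val::date;     -- attempt the cast
        RETURN true;
    EXCEPTION WHEN others THEN
        RETURN false;          -- any conversion error means "not castable"
    END;
    $$;

    -- flag rows that would fail the cast so the next stage can skip them
    UPDATE staging_raw
       SET skip_row = true
     WHERE NOT can_convert_date(order_date);

The EXCEPTION block is exactly where the cost is: every call has to set up a
subtransaction.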

>> Similar features are implemented in:
>> - SQL Server (as TRY_CONVERT)
>> - Oracle (as CONVERT([val] DEFAULT [expr] ON CONVERSION ERROR)
>
> Somehow, I don't think those have the semantics of what you suggest
here.

Agreed that they aren't exactly the same feature, but I would very much love
the ability both to know "will this cast fail?" and also to be able to "try
the cast, but if it fails just use this value and don't error".

They both have uses IMO, and while having is_castable() functions built in
would be great, I just want to
express my desire for something like the above feature in SQL Server or
Oracle as well.
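
For reference, those look roughly like this (from memory, so treat the exact
spelling as approximate):

    -- SQL Server: TRY_CONVERT returns NULL instead of raising an error
    SELECT TRY_CONVERT(int, 'not a number');

    -- Oracle: CAST ... DEFAULT ... ON CONVERSION ERROR substitutes a
    -- fallback value when the conversion fails
    SELECT CAST('not a number' AS NUMBER DEFAULT 0 ON CONVERSION ERROR)
      FROM dual;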


Re: backup manifests and contemporaneous buildfarm failures

2020-04-03 Thread Robert Haas
On Fri, Apr 3, 2020 at 8:12 PM Tom Lane  wrote:
> Yeah, so it would seem.  The buildfarm script uses rmtree to clean out
> the old build tree.  The man page for File::Path suggests (but can't
> quite bring itself to say in so many words) that by default, rmtree
> will adjust the permissions on target directories to allow the deletion
> to succeed.  But that's very clearly not happening on some platforms.
> (Maybe that represents a local patch on the part of some packagers
> who thought it was too unsafe?)

Interestingly, on my machine, rmtree coped with a mode 0 directory
just fine, but mode 0400 was more than its tiny brain could handle, so
the originally committed fix had code to revert 0400 back to 0700, but
I didn't add similar code to revert from 0 back to 0700 because that
was working fine.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: EINTR while resizing dsm segment.

2020-04-03 Thread Thomas Munro
On Thu, Apr 2, 2020 at 9:25 PM Kyotaro Horiguchi
 wrote:
> I provided the subject, and added -hackers.
>
> > Hello,
> > I am running postgres 11.5 and we were having issues with shared segments.
> > So I increased the max_connection as suggested by you guys and reduced my
> > work_mem to 600M.
> >
> > Right now instead, it is the second time I see this error :
> >
> > ERROR:  could not resize shared memory segment "/PostgreSQL.2137675995" to
> > 33624064 bytes: Interrupted system call
>
> The function posix_fallocate is protected against EINTR.
>
> | do
> | {
> |   rc = posix_fallocate(fd, 0, size);
> | } while (rc == EINTR && !(ProcDiePending || QueryCancelPending));
>
> But not for ftruncate and write. Don't we need to protect them from
> EINTR as in the attached?

We don't handle EINTR for write() generally because that's not
supposed to be necessary on local files (local disks are not "slow
devices", and we document that if you're using something like NFS you
should use its "hard" mount option so that it behaves that way too).
As for ftruncate(), you'd think it'd be similar, and I can't think of
a more local filesystem than tmpfs (where POSIX shmem lives on Linux),
but I can't seem to figure that out from reading man pages; maybe I'm
reading the wrong ones.  Perhaps in low memory situations, an I/O wait
path reached by ftruncate() can return EINTR here rather than entering
D state (non-interruptible sleep) or restarting due to our SA_RESTART
flag... anyone know?

Another thought: is there some way for the posix_fallocate() retry
loop to exit because (ProcDiePending || QueryCancelPending), but then
for CHECK_FOR_INTERRUPTS() to do nothing, so that we fall through to
reporting the EINTR?




Re: backup manifests and contemporaneous buildfarm failures

2020-04-03 Thread Robert Haas
On Fri, Apr 3, 2020 at 6:13 PM Tom Lane  wrote:
> Locally, I observe that "make clean" in src/bin/pg_validatebackup fails
> to clean up the tmp_check directory left behind by "make check".

Fixed.

I also tried to fix 'lapwing', which was complaining about a
call to pg_gmtime, saying that it "expected 'const pg_time_t *' but
argument is of type 'time_t *'". I was thinking that the problem had
something to do with const, but Thomas pointed out to me that
pg_time_t != time_t, so I pushed a fix which assumes that was the
issue. (It was certainly *an* issue.)

'prairiedog' is also unhappy, and it looks related:

/bin/sh ../../../../config/install-sh -c -d
'/Users/buildfarm/bf-data/HEAD/pgsql.build/src/test/modules/commit_ts'/tmp_check
cd . && 
TESTDIR='/Users/buildfarm/bf-data/HEAD/pgsql.build/src/test/modules/commit_ts'
PATH="/Users/buildfarm/bf-data/HEAD/pgsql.build/tmp_install/Users/buildfarm/bf-data/HEAD/inst/bin:$PATH"
DYLD_LIBRARY_PATH="/Users/buildfarm/bf-data/HEAD/pgsql.build/tmp_install/Users/buildfarm/bf-data/HEAD/inst/lib:$DYLD_LIBRARY_PATH"
 PGPORT='65678'
PG_REGRESS='/Users/buildfarm/bf-data/HEAD/pgsql.build/src/test/modules/commit_ts/../../../../src/test/regress/pg_regress'
REGRESS_SHLIB='/Users/buildfarm/bf-data/HEAD/pgsql.build/src/test/regress/regress.so'
/usr/local/perl5.8.3/bin/prove -I ../../../../src/test/perl/ -I .
t/*.pl
t/001_base.ok
t/002_standby..FAILED--Further testing stopped: system pg_basebackup failed
make: *** [check] Error 25

Unfortunately, that error message is not very informative and for some
reason the TAP logs don't seem to be included in the buildfarm output
in this case, so it's hard to tell exactly what went wrong. This
appears to be another 32-bit critter, which may be related somehow,
but I don't know how exactly.

'serinus' is also failing. This is less obviously related:

[02:08:55] t/003_constraints.pl .. ok 2048 ms ( 0.01 usr  0.00 sys
+  1.28 cusr  0.38 csys =  1.67 CPU)
# poll_query_until timed out executing this query:
# SELECT count(1) = 0 FROM pg_subscription_rel WHERE srsubstate NOT IN
('r', 's');
# expecting this output:
# t
# last actual query output:
# f
# with stderr:

But there's also this:

2020-04-04 02:08:57.297 CEST [5e87d019.506c1:1] LOG:  connection
received: host=[local]
2020-04-04 02:08:57.298 CEST [5e87d019.506c1:2] LOG:  replication
connection authorized: user=bf
application_name=tap_sub_16390_sync_16384
2020-04-04 02:08:57.299 CEST [5e87d019.506c1:3] LOG:  statement: BEGIN
READ ONLY ISOLATION LEVEL REPEATABLE READ
2020-04-04 02:08:57.299 CEST [5e87d019.506c1:4] LOG:  received
replication command: CREATE_REPLICATION_SLOT
"tap_sub_16390_sync_16384" TEMPORARY LOGICAL pgoutput USE_SNAPSHOT
2020-04-04 02:08:57.299 CEST [5e87d019.506c1:5] ERROR:  replication
slot "tap_sub_16390_sync_16384" already exists
TRAP: FailedAssertion("owner->bufferarr.nitems == 0", File:
"/home/bf/build/buildfarm-serinus/HEAD/pgsql.build/../pgsql/src/backend/utils/resowner/resowner.c",
Line: 718)
postgres: publisher: walsender bf [local] idle in transaction
(aborted)(ExceptionalCondition+0x5c)[0x9a13ac]
postgres: publisher: walsender bf [local] idle in transaction
(aborted)(ResourceOwnerDelete+0x295)[0x9db8e5]
postgres: publisher: walsender bf [local] idle in transaction
(aborted)[0x54c61f]
postgres: publisher: walsender bf [local] idle in transaction
(aborted)(AbortOutOfAnyTransaction+0x122)[0x550e32]
postgres: publisher: walsender bf [local] idle in transaction
(aborted)[0x9b3bc9]
postgres: publisher: walsender bf [local] idle in transaction
(aborted)(shmem_exit+0x35)[0x80db45]
postgres: publisher: walsender bf [local] idle in transaction
(aborted)[0x80dc77]
postgres: publisher: walsender bf [local] idle in transaction
(aborted)(proc_exit+0x8)[0x80dd08]
postgres: publisher: walsender bf [local] idle in transaction
(aborted)(PostgresMain+0x59f)[0x83bd0f]
postgres: publisher: walsender bf [local] idle in transaction
(aborted)[0x7a0264]
postgres: publisher: walsender bf [local] idle in transaction
(aborted)(PostmasterMain+0xbfc)[0x7a2b8c]
postgres: publisher: walsender bf [local] idle in transaction
(aborted)(main+0x6fb)[0x49749b]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xeb)[0x7fc52d83]
postgres: publisher: walsender bf [local] idle in transaction
(aborted)(_start+0x2a)[0x49753a]
2020-04-04 02:08:57.302 CEST [5e87d018.5066b:4] LOG:  server process
(PID 329409) was terminated by signal 6: Aborted
2020-04-04 02:08:57.302 CEST [5e87d018.5066b:5] DETAIL:  Failed
process was running: BEGIN READ ONLY ISOLATION LEVEL REPEATABLE READ

That might well be related. I note in passing that the DETAIL emitted
by the postmaster shows the previous SQL command rather than the
more-recent replication command, which seems like something to fix. (I
still really dislike the fact that we have this evil hack allowing one
connection to mix and match those sets of commands...)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL 

Re: [PATCH] Fix for slow GIN index queries when "gin_fuzzy_search_limit" setting is relatively small for large tables

2020-04-03 Thread Adé
Great. Thanks for refactoring it further and fixing other bugs in there
(and making it cleaner too)!

Regards,
Ade

On Fri, Apr 3, 2020 at 1:18 PM Tom Lane  wrote:

> I wrote:
> > I'm inclined to back-patch this.  Given how fuzzy the definition
> > of gin_fuzzy_search_limit is, it seems unlikely that anyone would
> > be depending on the current buggy behavior.
>
> And done.  Thanks for the bug report and patch!
>
> regards, tom lane
>


Re: backup manifests

2020-04-03 Thread Tom Lane
BTW, some of the buildfarm is showing a simpler portability problem:
they think you were too cavalier about the difference between time_t
and pg_time_t.  (On a platform with 32-bit time_t, that's an actual
bug, probably.)  lapwing is actually failing:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lapwing&dt=2020-04-03%2021%3A41%3A49

ccache gcc -std=gnu99 -Wall -Wmissing-prototypes -Wpointer-arith 
-Wdeclaration-after-statement -Werror=vla -Wendif-labels 
-Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv 
-fexcess-precision=standard -g -O2 -Werror -I. -I. -I../../../src/include  
-DENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS -D_GNU_SOURCE 
-I/usr/include/libxml2  -I/usr/include/et  -c -o basebackup.o basebackup.c
basebackup.c: In function 'AddFileToManifest':
basebackup.c:1199:10: error: passing argument 1 of 'pg_gmtime' from 
incompatible pointer type [-Werror]
In file included from ../../../src/include/access/xlog_internal.h:26:0,
 from basebackup.c:20:
../../../src/include/pgtime.h:49:22: note: expected 'const pg_time_t *' but 
argument is of type 'time_t *'
cc1: all warnings being treated as errors
make[3]: *** [basebackup.o] Error 1

but some others are showing it as a warning.

I suppose that judicious s/time_t/pg_time_t/ would fix this.

regards, tom lane




Re: [HACKERS] make async slave to wait for lsn to be replayed

2020-04-03 Thread Alexander Korotkov
Hi!

On Fri, Apr 3, 2020 at 9:51 PM Anna Akenteva  wrote:
> In the patch that I reviewed, you could do things like:
>  WAIT FOR
>  LSN lsn0,
>  LSN lsn1 TIMEOUT time1,
>  LSN lsn2 TIMEOUT time2;
> and such a statement was in practice equivalent to
>  WAIT FOR LSN(max(lsn0, lsn1, lsn2)) TIMEOUT (max(time1, time2))
>
> As you can see, even though grammatically lsn1 is grouped with time1 and
> lsn2 is grouped with time2, both timeouts that we specified are not
> connected to their respective LSN-s, and instead they kinda act like
> global timeouts. Therefore, I didn't see a point in keeping TIMEOUT
> necessarily grammatically connected to LSN.
>
> In the new syntax our statement would look like this:
>  WAIT FOR LSN lsn0, LSN lsn1, LSN lsn2, TIMEOUT time1, TIMEOUT time2;
> TIMEOUT-s are not forced to be grouped with LSN-s anymore, which makes
> it more clear that all specified TIMEOUTs will be global and will apply
> to all LSN-s at once.
>
> The point of having TIMEOUT is still to let us limit the time of waiting
> for LSNs. It's just that with the new syntax, we can also use TIMEOUT
> without an LSN. You are right, such a case is equivalent to pg_sleep.
> One way to avoid that is to prohibit waiting for TIMEOUT without
> specifying an LSN. Do you think we should do that?

I think specifying multiple LSNs/TIMEOUTs is kind of ridiculous.  We
can assume that the client application is smart enough to calculate the
minimum/maximum on its side.  When multiple LSNs/TIMEOUTs are
specified, what should we wait for?  Reaching all the LSNs?  Reaching
any of the LSNs?  Are the timeouts independent from the LSNs or tied to
them?  So if we don't manage to reach LSN1 within TIMEOUT1, do we still
wait for LSN2 with TIMEOUT2 (or not)?

I think that for now we would be fine with a single LSN and a single
TIMEOUT.  In the future we may add multiple LSNs/TIMEOUTs and/or support
for expressions as LSNs/TIMEOUTs if we figure out it's necessary.

I also think coupling waiting for an LSN with the beginning of a
transaction is a good idea.  A separate WAIT FOR LSN statement called in
the middle of a transaction looks problematic to me.  Imagine we have RR
isolation and have already acquired the snapshot.  Then our snapshot can
block applying the WAL records we are waiting for.  That would be an
implicit deadlock.  It would be nice to avoid such deadlocks by
design.
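
To make that concrete, the kind of usage this points at is something along
these lines (the syntax below is only a sketch of the single-LSN,
single-TIMEOUT form coupled with transaction start; nothing here is settled):

    -- block at transaction start until the standby has replayed past the
    -- given LSN, or give up after the timeout (in milliseconds)
    BEGIN WAIT FOR LSN '0/3F0A2B8' TIMEOUT 1000;
    SELECT ...;   -- the snapshot is taken only after the wait succeeds
    COMMIT;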

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: backup manifests and contemporaneous buildfarm failures

2020-04-03 Thread Tom Lane
Robert Haas  writes:
> On Fri, Apr 3, 2020 at 6:48 PM Tom Lane  wrote:
>> I'm guessing that we're looking at a platform-specific difference in
>> whether "rm -rf" fails outright on an unreadable subdirectory, or
>> just tries to carry on by unlinking it anyway.

> My intention was that it would be cleaned by the TAP framework itself,
> since the temporary directories it creates are marked for cleanup. But
> it may be that there's a platform dependency in the behavior of Perl's
> File::Path::rmtree, too.

Yeah, so it would seem.  The buildfarm script uses rmtree to clean out
the old build tree.  The man page for File::Path suggests (but can't
quite bring itself to say in so many words) that by default, rmtree
will adjust the permissions on target directories to allow the deletion
to succeed.  But that's very clearly not happening on some platforms.
(Maybe that represents a local patch on the part of some packagers
who thought it was too unsafe?)

Anyway, the end state presumably is that the pgsql.build directory
is still there at the end of the buildfarm run, and the next run's
attempt to also rmtree it fares no better.  Then look what it does
to set up the new build:

system("cp -R -p $target $build_path 2>&1");

Of course, if $build_path already exists, then cp copies to a subdirectory
of the target not the target itself.  So that explains the symptom
"./configure does not exist" --- it exists all right, but in a
subdirectory below the one where the buildfarm expects it to be.

It looks to me like the same problem would occur with VPATH or no.
The lack of failures among the VPATH-using critters probably has
more to do with whether their rmtree is willing to deal with this
case than with VPATH.

Anyway, it's evident that the buildfarm critters that are busted
will need manual cleanup, because the script is not going to be
able to get out of this by itself.  I remain of the opinion that
the hazard of that happening again in the future (eg, if a buildfarm
animal loses power during the test) is sufficient reason to remove
this test case.

regards, tom lane




Re: backup manifests and contemporaneous buildfarm failures

2020-04-03 Thread Robert Haas
On Fri, Apr 3, 2020 at 5:58 PM Fabien COELHO  wrote:
> seawasp just failed the same way. Good news, I can see "configure" under
> "HEAD/pgsql".

Ah, good.

> The only strange thing under buildroot I found is:
>
> HEAD/pgsql.build/src/bin/pg_validatebackup/tmp_check/t_003_corruption_master_data/backup/open_directory_fails/pg_subtrans/

Huh. I wonder how that got left behind ... it should've been cleaned
up by the TAP test framework. But I pushed a commit to change the
permissions back explicitly before exiting. As Tom says, I probably
need to remove that entire test, but I'm going to try this first.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: backup manifests and contemporaneous buildfarm failures

2020-04-03 Thread Robert Haas
On Fri, Apr 3, 2020 at 6:48 PM Tom Lane  wrote:
> I'm guessing that we're looking at a platform-specific difference in
> whether "rm -rf" fails outright on an unreadable subdirectory, or
> just tries to carry on by unlinking it anyway.

My intention was that it would be cleaned by the TAP framework itself,
since the temporary directories it creates are marked for cleanup. But
it may be that there's a platform dependency in the behavior of Perl's
File::Path::rmtree, too.

> A partial fix would be to have the test script put back normal
> permissions on that directory before it exits ... but any failure
> partway through the script would leave a time bomb requiring manual
> cleanup.

Yeah. I've pushed that fix for now, but as you say, it may not survive
contact with the enemy. That's kind of disappointing, because I put a
lot of work into trying to make the tests cover every line of code
that they possibly could, and there's no reason to suppose that
pg_validatebackup is the only tool that could benefit from having code
coverage of those kinds of scenarios. It's probably not even the tool
that is most in need of such testing; it must be far worse if, say,
pg_rewind can't cope with it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Ltree syntax improvement

2020-04-03 Thread Nikita Glukhov

On 02.04.2020 19:16, Tom Lane wrote:


Nikita Glukhov  writes:

Rebased patch attached.

Thanks for rebasing!  The cfbot's not very happy though:

4842 ltxtquery_io.c: In function ‘makepol’:
4843 ltxtquery_io.c:188:13: error: ‘escaped’ may be used uninitialized in this 
     function [-Werror=maybe-uninitialized]
4844   if (lenval - escaped <= 0)
4845      ^
4846 ltxtquery_io.c:230:6: note: ‘escaped’ was declared here
4847   int   escaped;
4848       ^
4849 cc1: all warnings being treated as errors

regards, tom lane


Fixed patch attached.

--
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
>From c4929c05423f5b29063c739f25dfebfe99681d01 Mon Sep 17 00:00:00 2001
From: Nikita Glukhov 
Date: Tue, 16 Jul 2019 17:59:32 +0300
Subject: [PATCH] Ltree syntax improvements

---
 contrib/ltree/expected/ltree.out | 1081 +-
 contrib/ltree/ltree.h|   25 +-
 contrib/ltree/ltree_io.c |  667 +--
 contrib/ltree/ltxtquery_io.c |  118 +++--
 contrib/ltree/sql/ltree.sql  |  267 ++
 doc/src/sgml/ltree.sgml  |   38 +-
 6 files changed, 1984 insertions(+), 212 deletions(-)

diff --git a/contrib/ltree/expected/ltree.out b/contrib/ltree/expected/ltree.out
index c6d8f3e..0c925b6 100644
--- a/contrib/ltree/expected/ltree.out
+++ b/contrib/ltree/expected/ltree.out
@@ -1,4 +1,5 @@
 CREATE EXTENSION ltree;
+SET standard_conforming_strings=on;
 -- Check whether any of our opclasses fail amvalidate
 SELECT amname, opcname
 FROM pg_opclass opc LEFT JOIN pg_am am ON am.oid = opcmethod
@@ -53,7 +54,7 @@ SELECT repeat('x', 255)::ltree;
 
 SELECT repeat('x', 256)::ltree;
 ERROR:  label string is too long
-DETAIL:  Label length is 256, must be at most 255, at character 257.
+DETAIL:  Label length is 256, must be at most 255, at character 1.
 SELECT ltree2text('1.2.3.34.sdf');
   ltree2text  
 --
@@ -336,6 +337,11 @@ SELECT lca('1.2.2.3','1.2.3.4.5.6','1');
  
 (1 row)
 
+SELECT ''::lquery;
+ERROR:  lquery syntax error
+LINE 1: SELECT ''::lquery;
+   ^
+DETAIL:  Unexpected end of input.
 SELECT '1'::lquery;
  lquery 
 
@@ -480,6 +486,16 @@ SELECT 'foo*@@*'::lquery;
  foo@*
 (1 row)
 
+SELECT '*'::lquery;
+ lquery 
+
+ *
+(1 row)
+
+SELECT '*{1}|2'::lquery;
+ERROR:  lquery syntax error at character 5
+LINE 1: SELECT '*{1}|2'::lquery;
+   ^
 SELECT 'qwerty%@*.tu'::lquery;
 lquery
 --
@@ -516,17 +532,15 @@ SELECT '!.2.3'::lquery;
 ERROR:  lquery syntax error at character 2
 LINE 1: SELECT '!.2.3'::lquery;
^
-DETAIL:  Empty labels are not allowed.
 SELECT '1.!.3'::lquery;
 ERROR:  lquery syntax error at character 4
 LINE 1: SELECT '1.!.3'::lquery;
^
-DETAIL:  Empty labels are not allowed.
 SELECT '1.2.!'::lquery;
-ERROR:  lquery syntax error at character 6
+ERROR:  lquery syntax error
 LINE 1: SELECT '1.2.!'::lquery;
^
-DETAIL:  Empty labels are not allowed.
+DETAIL:  Unexpected end of input.
 SELECT '1.2.3|@.4'::lquery;
 ERROR:  lquery syntax error at character 7
 LINE 1: SELECT '1.2.3|@.4'::lquery;
@@ -539,7 +553,7 @@ SELECT (repeat('x', 255) || '*@@*')::lquery;
 
 SELECT (repeat('x', 256) || '*@@*')::lquery;
 ERROR:  label string is too long
-DETAIL:  Label length is 256, must be at most 255, at character 257.
+DETAIL:  Label length is 256, must be at most 255, at character 1.
 SELECT ('!' || repeat('x', 255))::lquery;
   lquery  
 --
@@ -548,7 +562,7 @@ SELECT ('!' || repeat('x', 255))::lquery;
 
 SELECT ('!' || repeat('x', 256))::lquery;
 ERROR:  label string is too long
-DETAIL:  Label length is 256, must be at most 255, at character 258.
+DETAIL:  Label length is 256, must be at most 255, at character 2.
 SELECT nlevel('1.2.3.4');
  nlevel 
 
@@ -8084,3 +8098,1056 @@ SELECT count(*) FROM _ltreetest WHERE t ? '{23.*.1,23.*.2}' ;
 15
 (1 row)
 
+-- Extended syntax, escaping, quoting etc
+-- success
+SELECT E'\\.'::ltree;
+ ltree 
+---
+ "."
+(1 row)
+
+SELECT E'\\ '::ltree;
+ ltree 
+---
+ " "
+(1 row)
+
+SELECT E''::ltree;
+ ltree 
+---
+ "\\"
+(1 row)
+
+SELECT E'\\a'::ltree;
+ ltree 
+---
+ a
+(1 row)
+
+SELECT E'\\n'::ltree;
+ ltree 
+---
+ n
+(1 row)
+
+SELECT E'x'::ltree;
+ ltree 
+---
+ "x\\"
+(1 row)
+
+SELECT E'x\\ '::ltree;
+ ltree 
+---
+ "x "
+(1 row)
+
+SELECT E'x\\.'::ltree;
+ ltree 
+---
+ "x."
+(1 row)

Re: vacuum_defer_cleanup_age inconsistently applied on replicas

2020-04-03 Thread Peter Geoghegan
On Fri, Apr 3, 2020 at 4:18 PM Peter Geoghegan  wrote:
> OTOH, I wonder if it's possible that vacuum_defer_cleanup_age was
> deliberately intended to affect the behavior of
> XLogWalRcvSendHSFeedback(), which is probably one of the most common
> reasons why GetOldestXmin() is called on standbys.

Pressed "send" too soon. vacuum_defer_cleanup_age *doesn't* get
applied when recovery is in progress, so that definitely can't be
true.

Another hint that vacuum_defer_cleanup_age is only really supposed to
be used on the primary is the fact that it appears under "18.6.1.
Master Server" in the 9.1 docs.

-- 
Peter Geoghegan




Re: vacuum_defer_cleanup_age inconsistently applied on replicas

2020-04-03 Thread Peter Geoghegan
On Fri, Apr 3, 2020 at 3:53 PM Andres Freund  wrote:
> GetOldestXmin() applies vacuum_defer_cleanup_age only when
> !RecoveryInProgress(). In contrast to that GetSnapshotData() applies it
> unconditionally.
>
> I'm not actually clear whether including vacuum_defer_cleanup_age on a
> replica is meaningful. But it strikes me as odd to have that behavioural
> difference between GetOldestXmin() and GetSnapshotData() - without any
> need, as far as I can tell?

Did you notice the comments added by Tom in b4a0223d008, which repeat
the claim that it isn't used on standbys? I think that this is
probably just an oversight in bca8b7f1, as you suggested. It's not
that hard to imagine how this oversight might have happened: Hot
standby feedback was introduced, and nobody cared about
vacuum_defer_cleanup_age anymore. It was always very difficult to
tune.

OTOH, I wonder if it's possible that vacuum_defer_cleanup_age was
deliberately intended to affect the behavior of
XLogWalRcvSendHSFeedback(), which is probably one of the most common
reasons why GetOldestXmin() is called on standbys.

-- 
Peter Geoghegan




Re: [HACKERS] Restricting maximum keep segments by repslots

2020-04-03 Thread Alvaro Herrera
So, the more I look at this patch, the less I like the way the slots are
handled.

* I think it's a mistake to try to do anything in KeepLogSeg itself;
  that function is merely in charge of some arithmetic.  I propose to
  make that function aware of the new size limitation (so that it
  doesn't trust the slot's LSNs completely), but to make the function
  have no side effects.  The attached patch does that, I hope.
  To replace that responsibility, let's add another function.  I named it
  InvalidateObsoleteReplicationSlots().  In CreateCheckPoint and
  CreateRestartPoint, we call the new function just before removing
  segments.  Note: the one in this patch doesn't actually work or even
  compile.
  The new function must:

  1. mark the slot as "invalid" somehow.  Maybe it would make sense to
  add a new flag in the on-disk struct for this; but for now I'm just
  thinking that changing the slot's restart_lsn is sufficient.
  (Of course, I haven't tested this, so there might be side-effects that
  mean that this idea doesn't work).

  2. send SIGTERM to a walsender that's using such a slot.

  3. Send the warning message.  Instead of trying to construct a message
  with a list of slots, send one message per slot.  (I think this is
  better from a translatability point of view, and also from a
  monitoring PoV).

* GetWalAvailability seems too much in competition with
  DistanceToWalRemoval.  Which is weird, because both functions do
  pretty much the same thing.  I think a better design is to make the
  former function return the distance as an out parameter.

* Andres complained that the "distance" column was not a great value to
  expose (20171106132050.6apzynxrqrzgh...@alap3.anarazel.de).  That's
  right: it changes both by the insertion LSN as well as the slot's
  consumption.  Maybe we can expose the earliest live LSN (start of the
  earliest segment?) as a new column.  It'll be the same for all slots,
  I suppose, but we don't care, do we?
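
To make the monitoring side concrete, with the columns documented in the
attached sketch a session would check slot health with something like:

    SELECT slot_name, wal_status, remain
      FROM pg_replication_slots;

where wal_status is one of normal / keeping / losing / lost and remain is
the number of bytes that can still be written before the slot starts losing
required WAL.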

I attach a rough sketch, which as I said before doesn't work and doesn't
compile.  Sadly I have reached the end of my day here so I won't be able
to work on this for today anymore.  I'll be glad to try again tomorrow,
but in the meantime I thought it was better to send it over and see
whether you had any thoughts about this proposed design (maybe you know
it doesn't work for some reason), or better yet, you have the chance to
actually complete the code or at least move it a little further.

Thanks

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 64614b569c..01a7802ed4 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -9907,6 +9907,54 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
   
  
 
+ 
+  wal_status
+  text
+  
+
+  Availability of WAL files claimed by this slot.
+  Valid values are:
+   
+
+ normal means that the claimed files
+ are within max_wal_size
+
+
+ keeping means that max_wal_size
+ is exceeded but still held by replication slots or
+ wal_keep_segments
+
+
+ losing means that some of the files are on the verge
+ of deletion, but can still be accessed by a session that's currently
+ reading it
+
+
+ lost means that some of them are definitely lost
+ and the session using this slot cannot continue replication.
+ This state also implies that the session using this slot has been
+ stopped.
+
+   
+  The last two states are seen only when
+   is
+  non-negative. If restart_lsn is NULL, this
+  field is null.
+  
+ 
+
+ 
+  remain
+  bigint
+  
+  The amount in bytes of WAL that can be written before this slot
+loses required WAL files.
+If restart_lsn is null or
+wal_status is losing
+or lost, this field is null.
+  
+ 
+
 

   
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index c4d6ed4bbc..17c18386e2 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3753,6 +3753,29 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"'  # Windows
   
  
 
+  
+   max_slot_wal_keep_size (integer)
+   
+max_slot_wal_keep_size configuration parameter
+   
+   
+   
+   
+Specify the maximum size of WAL files
+that replication
+slots are allowed to retain in the pg_wal
+directory at checkpoint time.
+If max_slot_wal_keep_size is -1 (the default),
+replication slots retain unlimited amount of WAL files.  If
+restart_lsn of a replication slot gets behind more than that megabytes
+from the current LSN, the standby using the slot may no 

Re: backup manifests and contemporaneous buildfarm failures

2020-04-03 Thread Tom Lane
I wrote:
> I'm guessing that we're looking at a platform-specific difference in
> whether "rm -rf" fails outright on an unreadable subdirectory, or
> just tries to carry on by unlinking it anyway.

Yeah... on my RHEL6 box, "make check" cleans up the working directories
under tmp_check, but on a FreeBSD 12.1 box, not so much: I'm left with

$ ls tmp_check/
log/    t_003_corruption_master_data/
tgl@oldmini$ ls -R tmp_check/t_003_corruption_master_data/
backup/

tmp_check/t_003_corruption_master_data/backup:
open_directory_fails/

tmp_check/t_003_corruption_master_data/backup/open_directory_fails:
pg_subtrans/

tmp_check/t_003_corruption_master_data/backup/open_directory_fails/pg_subtrans:
ls: 
tmp_check/t_003_corruption_master_data/backup/open_directory_fails/pg_subtrans: 
Permission denied

I did not see any complaints printed to the terminal, but in
regress_log_003_corruption there's

...
ok 40 - corrupt backup fails validation: open_directory_fails: matches
cannot chdir to child for 
/usr/home/tgl/pgsql/src/bin/pg_validatebackup/tmp_check/t_003_corruption_master_data/backup/open_directory_fails/pg_subtrans:
 Permission denied at t/003_corruption.pl line 126.
cannot remove directory for 
/usr/home/tgl/pgsql/src/bin/pg_validatebackup/tmp_check/t_003_corruption_master_data/backup/open_directory_fails:
 Directory not empty at t/003_corruption.pl line 126.
# Running: pg_basebackup -D 
/usr/home/tgl/pgsql/src/bin/pg_validatebackup/tmp_check/t_003_corruption_master_data/backup/search_directory_fails
 --no-sync -T /tmp/lxaL_sLcnr=/tmp/_fegwVjoDR
ok 41 - base backup ok
...

This may be more of a Perl version issue than a platform issue,
but either way it's a problem.

Also, on the FreeBSD box, "rm -rf" isn't happy either:

$ rm -rf tmp_check
rm: 
tmp_check/t_003_corruption_master_data/backup/open_directory_fails/pg_subtrans: 
Permission denied
rm: tmp_check/t_003_corruption_master_data/backup/open_directory_fails: 
Directory not empty
rm: tmp_check/t_003_corruption_master_data/backup: Directory not empty
rm: tmp_check/t_003_corruption_master_data: Directory not empty
rm: tmp_check: Directory not empty


regards, tom lane




vacuum_defer_cleanup_age inconsistently applied on replicas

2020-04-03 Thread Andres Freund
Hi,

GetOldestXmin() applies vacuum_defer_cleanup_age only when
!RecoveryInProgress(). In contrast to that GetSnapshotData() applies it
unconditionally.

I'm not actually clear whether including vacuum_defer_cleanup_age on a
replica is meaningful. But it strikes me as odd to have that behavioural
difference between GetOldestXmin() and GetSnapshotData() - without any
need, as far as I can tell?


The difference seems to have been introduced in

commit bca8b7f16a3e720794cb0afbdb3733be4f8d9c2c
Author: Simon Riggs 
Date:   2011-02-16 19:29:37 +

Hot Standby feedback for avoidance of cleanup conflicts on standby.
Standby optionally sends back information about oldestXmin of queries
which is then checked and applied to the WALSender's proc->xmin.
GetOldestXmin() is modified slightly to agree with GetSnapshotData(),
so that all backends on primary include WALSender within their snapshots.
Note this does nothing to change the snapshot xmin on either master or
standby. Feedback piggybacks on the standby reply message.
vacuum_defer_cleanup_age is no longer used on standby, though parameter
still exists on primary, since some use cases still exist.

Simon Riggs, review comments from Fujii Masao, Heikki Linnakangas, Robert 
Haas


without, as far as I can tell, explaining why "vacuum_defer_cleanup_age
is no longer used on standby" shouldn't also apply to
GetSnapshotData().

I suspect it doesn't hurt all that much to unnecessarily apply
vacuum_defer_cleanup_age on a replica. The only thing I see where it
matters is that it makes get_actual_variable_endpoint() less accurate,
which we probably would like to avoid...

Greetings,

Andres Freund




Re: backup manifests and contemporaneous buildfarm failures

2020-04-03 Thread Tom Lane
Thomas Munro  writes:
> Same here, on elver.  I see pg_subtrans has been chmod(0)'d,
> presumably by the perl subroutine mutilate_open_directory_fails.  I
> see this in my inbox (the build farm wrote it to stderr or stdout
> rather than the log file):

> cannot chdir to child for
> pgsql.build/src/bin/pg_validatebackup/tmp_check/t_003_corruption_master_data/backup/open_directory_fails/pg_subtrans:
> Permission denied at ./run_build.pl line 1013.
> cannot remove directory for
> pgsql.build/src/bin/pg_validatebackup/tmp_check/t_003_corruption_master_data/backup/open_directory_fails:
> Directory not empty at ./run_build.pl line 1013.

I'm guessing that we're looking at a platform-specific difference in
whether "rm -rf" fails outright on an unreadable subdirectory, or
just tries to carry on by unlinking it anyway.

A partial fix would be to have the test script put back normal
permissions on that directory before it exits ... but any failure
partway through the script would leave a time bomb requiring manual
cleanup.

On the whole, I'd argue that testing that behavior is not valuable
enough to take risks of periodically breaking buildfarm members
in a way that will require manual recovery --- to say nothing of
annoying developers who trip over it.  So my vote is to remove
that part of the test and be satisfied with checking the behavior
for an unreadable file.

This doesn't directly explain the failure-at-next-configure behavior
that we're seeing in the buildfarm, but it wouldn't be too surprising
if it ends up being that the buildfarm client script doesn't manage
to fully recover from the situation.

regards, tom lane




Re: backup manifests and contemporaneous buildfarm failures

2020-04-03 Thread Stephen Frost
Greetings,

* Thomas Munro (thomas.mu...@gmail.com) wrote:
> On Sat, Apr 4, 2020 at 11:13 AM Tom Lane  wrote:
> > Fabien COELHO  writes:
> > > The only strange thing under buildroot I found is:
> >
> > > HEAD/pgsql.build/src/bin/pg_validatebackup/tmp_check/t_003_corruption_master_data/backup/open_directory_fails/pg_subtrans/
> >
> > > this last directory perms are d- which seems to break cleanup.
> 
> Same here, on elver.  I see pg_subtrans has been chmod(0)'d,
> presumably by the perl subroutine mutilate_open_directory_fails.  I
> see this in my inbox (the build farm wrote it to stderr or stdout
> rather than the log file):

Yup, saw the same here.

chmod'ing it to 755 seemed to result in the next run cleaning it up, at
least.  Not sure how things will go on the next actual build tho.

Thanks,

Stephen




Re: backup manifests and contemporaneous buildfarm failures

2020-04-03 Thread Thomas Munro
On Sat, Apr 4, 2020 at 11:13 AM Tom Lane  wrote:
> Fabien COELHO  writes:
> > The only strange thing under buildroot I found is:
>
> > HEAD/pgsql.build/src/bin/pg_validatebackup/tmp_check/t_003_corruption_master_data/backup/open_directory_fails/pg_subtrans/
>
> > this last directory perms are d- which seems to break cleanup.

Same here, on elver.  I see pg_subtrans has been chmod(0)'d,
presumably by the perl subroutine mutilate_open_directory_fails.  I
see this in my inbox (the build farm wrote it to stderr or stdout
rather than the log file):

cannot chdir to child for
pgsql.build/src/bin/pg_validatebackup/tmp_check/t_003_corruption_master_data/backup/open_directory_fails/pg_subtrans:
Permission denied at ./run_build.pl line 1013.
cannot remove directory for
pgsql.build/src/bin/pg_validatebackup/tmp_check/t_003_corruption_master_data/backup/open_directory_fails:
Directory not empty at ./run_build.pl line 1013.
cannot remove directory for
pgsql.build/src/bin/pg_validatebackup/tmp_check/t_003_corruption_master_data/backup:
Directory not empty at ./run_build.pl line 1013.
cannot remove directory for
pgsql.build/src/bin/pg_validatebackup/tmp_check/t_003_corruption_master_data:
Directory not empty at ./run_build.pl line 1013.
cannot remove directory for
pgsql.build/src/bin/pg_validatebackup/tmp_check: Directory not empty
at ./run_build.pl line 1013.
cannot remove directory for pgsql.build/src/bin/pg_validatebackup:
Directory not empty at ./run_build.pl line 1013.
cannot remove directory for pgsql.build/src/bin: Directory not empty
at ./run_build.pl line 1013.
cannot remove directory for pgsql.build/src: Directory not empty at
./run_build.pl line 1013.
cannot remove directory for pgsql.build: Directory not empty at
./run_build.pl line 1013.
cannot chdir to child for
pgsql.build/src/bin/pg_validatebackup/tmp_check/t_003_corruption_master_data/backup/open_directory_fails/pg_subtrans:
Permission denied at ./run_build.pl line 589.
cannot remove directory for
pgsql.build/src/bin/pg_validatebackup/tmp_check/t_003_corruption_master_data/backup/open_directory_fails:
Directory not empty at ./run_build.pl line 589.
cannot remove directory for
pgsql.build/src/bin/pg_validatebackup/tmp_check/t_003_corruption_master_data/backup:
Directory not empty at ./run_build.pl line 589.
cannot remove directory for
pgsql.build/src/bin/pg_validatebackup/tmp_check/t_003_corruption_master_data:
Directory not empty at ./run_build.pl line 589.
cannot remove directory for
pgsql.build/src/bin/pg_validatebackup/tmp_check: Directory not empty
at ./run_build.pl line 589.
cannot remove directory for pgsql.build/src/bin/pg_validatebackup:
Directory not empty at ./run_build.pl line 589.
cannot remove directory for pgsql.build/src/bin: Directory not empty
at ./run_build.pl line 589.
cannot remove directory for pgsql.build/src: Directory not empty at
./run_build.pl line 589.
cannot remove directory for pgsql.build: Directory not empty at
./run_build.pl line 589.




Re: proposal \gcsv

2020-04-03 Thread Tom Lane
Pavel Stehule  writes:
> We can have a new commands for cloning print environments and switch to one
> shot environment. It can be based just on enhancing of \pset command

> \pset save anyidentifier .. serialize settings
> \pset load anyidentifier .. load setting
> \pset oneshot [anyidentifer] .. prepare and set copy of current print
> setting for next execution command
> \pset main
> \pset reset .. reset to defaults

I feel like that's gotten pretty far away from the idea of a simple,
easy-to-use way of adjusting the parameters for one \g operation.
There'd be a whole lot of typing involved above and beyond the
obviously-necessary part of specifying the new pset parameter values.

(Also, it's not clear to me how that's any more robust than the
stack idea.  If you could lose "\pset pop" to an error, you could
lose "\pset reset" too.)

If people are upset about the extra whitespace in the paren-style
proposal, we could do without it.  The only real problem would be
if there's ever a pset parameter for which a trailing right paren
could be a sensible part of the value.  Maybe that's not ever
going to be an issue; or maybe we could provide a quoting mechanism
for weird pset values.
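
For concreteness, under the proposed grammar a one-shot CSV export might be
spelled something like this (purely illustrative, nothing is implemented yet;
format and csv_fieldsep are just existing pset parameters picked as examples):

select * from pgbench_accounts \g ( format csv csv_fieldsep ; ) accounts.csv

with the settings reverting to their previous values as soon as that one
command has run.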

regards, tom lane




Re: backup manifests and contemporaneous buildfarm failures

2020-04-03 Thread Alvaro Herrera
On 2020-Apr-03, Tom Lane wrote:

> I wonder if VPATH versus not-VPATH might be a relevant factor ...

Oh, absolutely.  The ones that failed show, in the last successful run,
the configure line invoked as "./configure", while the animals that are
still running are invoking configure from some other directory.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: backup manifests and contemporaneous buildfarm failures

2020-04-03 Thread Tom Lane
Fabien COELHO  writes:
> The only strange thing under buildroot I found is:

> HEAD/pgsql.build/src/bin/pg_validatebackup/tmp_check/t_003_corruption_master_data/backup/open_directory_fails/pg_subtrans/

> this last directory perms are d- which seems to break cleanup.

Locally, I observe that "make clean" in src/bin/pg_validatebackup fails
to clean up the tmp_check directory left behind by "make check".
So the new makefile is not fully plugged into its standard
responsibilities.  I don't see any unreadable subdirectories though.

I wonder if VPATH versus not-VPATH might be a relevant factor ...

regards, tom lane




Re: backup manifests and contemporaneous buildfarm failures

2020-04-03 Thread Fabien COELHO



Hello Robert,


Done now. Meanwhile, two more machines have reported the mysterious message:

sh: ./configure: not found

...that first appeared on spurfowl a few hours ago. The other two
machines are eelpout and elver, both of which list Thomas Munro as a
maintainer. spurfowl lists Stephen Frost. Thomas, Stephen, can one of
you check and see what's going on? spurfowl has failed this way four
times now, and eelpout and elver have each failed the last two runs,
but since there's no helpful information in the logs, it's hard to
guess what went wrong.

I'm sort of afraid that something in the new TAP tests accidentally
removed way too many files during the cleanup phase - e.g. it decided
the temporary directory was / and removed every file it could access,
or something like that. It doesn't do that here, or I, uh, would've
noticed by now. But sometimes strange things happen on other people's
machines. Hopefully one of those strange things is not that my test
code is single-handedly destroying the entire buildfarm, but it's
possible.


seawasp just failed the same way. Good news, I can see "configure" under 
"HEAD/pgsql".


The only strange thing under buildroot I found is:

HEAD/pgsql.build/src/bin/pg_validatebackup/tmp_check/t_003_corruption_master_data/backup/open_directory_fails/pg_subtrans/

this last directory perms are d- which seems to break cleanup.

It may be a leftover from a previous run which failed (possibly 21dc488 
?). I cannot see how this would be related to configure, though. Maybe 
something else fails silently and the message is about a consequence of 
the prior silent failure.


I commented out the cron job and will try to look into it tomorrow if 
the status has not changed by then.


--
Fabien.




backup manifests and contemporaneous buildfarm failures

2020-04-03 Thread Robert Haas
[ splitting this off into a separate thread ]

On Fri, Apr 3, 2020 at 5:07 PM Robert Haas  wrote:
> I'lll go see about adding that.

Done now. Meanwhile, two more machines have reported the mysterious message:

sh: ./configure: not found

...that first appeared on spurfowl a few hours ago. The other two
machines are eelpout and elver, both of which list Thomas Munro as a
maintainer. spurfowl lists Stephen Frost. Thomas, Stephen, can one of
you check and see what's going on? spurfowl has failed this way four
times now, and eelpout and elver have each failed the last two runs,
but since there's no helpful information in the logs, it's hard to
guess what went wrong.

I'm sort of afraid that something in the new TAP tests accidentally
removed way too many files during the cleanup phase - e.g. it decided
the temporary directory was / and removed every file it could access,
or something like that. It doesn't do that here, or I, uh, would've
noticed by now. But sometimes strange things happen on other people's
machines. Hopefully one of those strange things is not that my test
code is single-handedly destroying the entire buildfarm, but it's
possible.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: backup manifests

2020-04-03 Thread Justin Pryzby
On Fri, Apr 03, 2020 at 03:22:23PM -0400, Robert Haas wrote:
> I've pushed all the patches.

I didn't manage to look at this in advance but have some doc fixes.

word-diff:

diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 536de9a698..d84afb7b18 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -2586,7 +2586,7 @@ The commands accepted in replication mode are:
  and sent along with the backup.  The manifest is a list of every
  file present in the backup with the exception of any WAL files that
  may be included. It also stores the size, last modification time, and
  [-an optional-]{+optionally a+} checksum for each file.
  A value of force-escape forces all filenames
  to be hex-encoded; otherwise, this type of encoding is performed only
  for files whose names are non-UTF8 octet sequences.
@@ -2602,7 +2602,7 @@ The commands accepted in replication mode are:
MANIFEST_CHECKSUMS

 
  Specifies the {+checksum+} algorithm that should be applied to each 
file included
  in the backup manifest. Currently, the available
  algorithms are NONE, CRC32C,
  SHA224, SHA256,
diff --git a/doc/src/sgml/ref/pg_basebackup.sgml 
b/doc/src/sgml/ref/pg_basebackup.sgml
index c778e061f3..922688e227 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -604,7 +604,7 @@ PostgreSQL documentation
not contain any checksums. Otherwise, it will contain a checksum
of each file in the backup using the specified algorithm. In addition,
the manifest will always contain a SHA256
checksum of its own [-contents.-]{+content.+} The 
SHA algorithms
are significantly more CPU-intensive than CRC32C,
so selecting one of them may increase the time required to complete
the backup.
@@ -614,7 +614,7 @@ PostgreSQL documentation
of each file for users who wish to verify that the backup has not been
tampered with, while the CRC32C algorithm provides a checksum which is
much faster to calculate and good at catching errors due to accidental
changes but is not resistant to [-targeted-]{+malicious+} 
modifications.  Note that, to
be useful against an adversary who has access to the backup, the backup
manifest would need to be stored securely elsewhere or otherwise
verified not to have been modified since the backup was taken.
diff --git a/doc/src/sgml/ref/pg_validatebackup.sgml 
b/doc/src/sgml/ref/pg_validatebackup.sgml
index 19888dc196..748ac439a6 100644
--- a/doc/src/sgml/ref/pg_validatebackup.sgml
+++ b/doc/src/sgml/ref/pg_validatebackup.sgml
@@ -41,12 +41,12 @@ PostgreSQL documentation
  

  
   It is important to note that[-that-] the validation which is performed by
   pg_validatebackup does not and [-can 
not-]{+cannot+} include
   every check which will be performed by a running server when attempting
   to make use of the backup. Even if you use this tool, you should still
   perform test restores and verify that the resulting databases work as
   expected and that they[-appear to-] contain the correct data. However,
   pg_validatebackup can detect many problems
   that commonly occur due to storage problems or user error.
  
@@ -73,7 +73,7 @@ PostgreSQL documentation
   a backup_manifest file in the target directory or
   about anything inside pg_wal, even though these
   files won't be listed in the backup manifest. Only files are checked;
   the presence or absence [-or-]{+of+} directories is not verified, except
   indirectly: if a directory is missing, any files it should have contained
   will necessarily also be missing. 
  
@@ -84,7 +84,7 @@ PostgreSQL documentation
   for any files for which the computed checksum does not match the
   checksum stored in the manifest. This step is not performed for any files
   which produced errors in the previous step, since they are already known
   to have problems. [-Also, files-]{+Files+} which were ignored in the 
previous step are
   also ignored in this step.
  

@@ -123,7 +123,7 @@ PostgreSQL documentation
  Options

   
The following command-line options control the [-behavior.-]{+behavior of 
this program.+}


 
diff --git a/src/backend/replication/basebackup.c 
b/src/backend/replication/basebackup.c
index 3b18e733cd..aa72a6ff10 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -1148,7 +1148,7 @@ AddFileToManifest(manifest_info *manifest, const char 
*spcoid,
}

/*
 * Each file's entry [-need-]{+needs+} to be separated from any entry 
that follows by a
 * comma, but there's no comma before the first one or after the last 
one.
 * To make that work, adding a file to the manifest starts by 
terminating
 * the most recently added line, with a comma if appropriate, but does 
not

-- 

Re: d25ea01275 and partitionwise join

2020-04-03 Thread Tom Lane
Amit Langote  writes:
> Updated patches attached.

I looked through these and committed 0001+0002, with some further
comment-polishing.  However, I have no faith at all in 0003.  It is
blithely digging through COALESCE expressions with no concern for
whether they came from full joins or not, or whether the other values
being coalesced to might completely change the semantics.  Digging
through PlaceHolderVars scares me even more; what's that got to do
with the problem, anyway?  So while this might fix the complained-of
issue of failing to use a partitionwise join, I think it wouldn't be
hard to create examples that it would incorrectly turn into
partitionwise joins.

I wonder whether it'd be feasible to fix the problem by going in the
other direction; that is, while constructing the nullable_partexprs
lists for a full join, add synthesized COALESCE() expressions for the
output columns (by wrapping COALESCE around copies of the input rels'
partition expressions), and then not need to do anything special in
match_expr_to_partition_keys.  We'd still need to convince ourselves
that this did the right thing and not any of the wrong things, but
I think it might be easier to prove it that way.

regards, tom lane




Re: backup manifests

2020-04-03 Thread Robert Haas
On Fri, Apr 3, 2020 at 4:54 PM Alvaro Herrera  wrote:
> Maybe it needs perl2host?

*jaw drops*

Wow, OK, yeah, that looks like the thing.  Thanks for the suggestion;
I didn't know that existed (and I kinda wish I still didn't).

I'lll go see about adding that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Add A Glossary

2020-04-03 Thread Erik Rijkers

On 2020-04-03 22:51, Alvaro Herrera wrote:

On 2020-Apr-03, Erik Rijkers wrote:


On 2020-04-03 18:45, Alvaro Herrera wrote:
> Pushed now.  Many thanks to Corey who put the main thrust, and to Jürgen
> and Roger for the great help, and to Justin for the extensive review and
> Fabien for the initial discussion.

A few improvements:


Thanks!  That gives me the attached patch.


Should also be a  lemmata in the glossary:

ACID


Agreed.  Wording suggestions welcome.


How about:

"
ACID

Atomicity, consistency, isolation, and durability. ACID is a set of 
properties of database transactions intended to guarantee validity even 
in the event of power failures, etc.
ACID is concerned with how the database recovers from such failures that 
might occur while processing a transaction.

"

'archaic' should maybe be 'obsolete'. That seems to me to be an easier 
word

for non-native speakers.


Bummer ;-)


OK - we'll figure it out :)



--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: Add A Glossary

2020-04-03 Thread Justin Pryzby
On Fri, Apr 03, 2020 at 05:51:43PM -0300, Alvaro Herrera wrote:
> - The internal representation of one value of a SQL
> + The internal representation of one value of an SQL

I'm not sure about this one.  The new glossary says "a SQL" seven times, and
doesn't say "an sql" at all.

"An SQL" does appear to be more common in the rest of the docs, but if you
change one, I think you'd change them all.

BTW it's now visible at:
https://www.postgresql.org/docs/devel/glossary.html

-- 
Justin




Re: backup manifests

2020-04-03 Thread Alvaro Herrera
On 2020-Apr-03, Robert Haas wrote:

> sub tempdir_short
> {
> 
> return File::Temp::tempdir(CLEANUP => 1);
> }
> 
> And File::Temp's documentation says that the temporary directory is
> picked using File::Spec's tmpdir(), which says that it knows about
> different operating systems and will DTRT on Unix, Mac, OS2, Win32,
> and VMS. Yet on fairywren it is apparently DTWT. I'm not sure why.

Maybe it needs perl2host?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Add A Glossary

2020-04-03 Thread Alvaro Herrera
On 2020-Apr-03, Erik Rijkers wrote:

> On 2020-04-03 18:45, Alvaro Herrera wrote:
> > Pushed now.  Many thanks to Corey who put the main thrust, and to Jürgen
> > and Roger for the great help, and to Justin for the extensive review and
> > Fabien for the initial discussion.
> 
> A few improvements:

Thanks!  That gives me the attached patch.

> Should also be a  lemmata in the glossary:
> 
> ACID

Agreed.  Wording suggestions welcome.

> 'archaic' should maybe be 'obsolete'. That seems to me to be an easier word
> for non-native speakers.

Bummer ;-)

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/doc/src/sgml/glossary.sgml b/doc/src/sgml/glossary.sgml
index 8c6cb6e942..b5155e1a85 100644
--- a/doc/src/sgml/glossary.sgml
+++ b/doc/src/sgml/glossary.sgml
@@ -48,7 +48,7 @@

 
  In reference to a datum:
- the fact that its value that cannot be broken down into smaller
+ the fact that its value cannot be broken down into smaller
  components.
 

@@ -270,14 +270,14 @@
  global SQL objects. The
  cluster is managed by exactly one
  instance. A newly created
- Cluster will have three databases created automatically. They are
+ cluster will have three databases created automatically. They are
  template0, template1, and
  postgres. It is expected that an application will
  create one or more additional database aside from these three.
 
 
  (Don't confuse the PostgreSQL-specific term
- Cluster with the SQL
+ cluster with the SQL
  command CLUSTER).
 

@@ -363,7 +363,7 @@

 
  A restriction on the values of data allowed within a
- Table.
+ table.
 
 
  For more information, see
@@ -424,7 +424,7 @@
Datum

 
- The internal representation of one value of a SQL
+ The internal representation of one value of an SQL
  data type.
 

@@ -617,7 +617,7 @@

 
  Contains the values of row
- attributes (i.e. the data) for a
+ attributes (i.e., the data) for a
  relation.
  The heap is realized within
  segment files.
@@ -835,7 +835,7 @@

 
  A relation that is
- defined in the same way that a a view
+ defined in the same way that a view
  is, but stores data in the same way that a
  table does. It cannot be
  modified via INSERT, UPDATE, or
@@ -900,7 +900,7 @@
 
  In reference to a
  partitioned table:
- One of the tables that each contain part of the data of the partitioned table,
+ One of multiple tables that each contain part of the data of the partitioned table,
  which is said to be the parent.
  The partition is itself a table, so it can also be queried directly;
  at the same time, a partition can sometimes be a partitioned table,
@@ -910,7 +910,7 @@

 
  In reference to a window function:
- a partition is a user-defined criteria that identifies which neighboring
+ a partition is a user-defined criterion that identifies which neighboring
  rows can be considered by the
  function.
 
@@ -1103,7 +1103,7 @@
 
  A data structure transmitted from a
  backend process to
- client program upon the completion of a SQL
+ client program upon the completion of an SQL
  command, usually a SELECT but it can be an
  INSERT, UPDATE, or
  DELETE command if the RETURNING
@@ -1134,8 +1134,8 @@

 
  A collection of access privileges to the
- instance.
- Roless are themselves a privilege that can be granted to other roles.
+ instance.
+ Roles are themselves a privilege that can be granted to other roles.
  This is often done for convenience or to ensure completeness
  when multiple users need
  the same privileges.
@@ -1151,7 +1151,7 @@
Rollback

 
- A command to undo all of the operations performed since the beginning
+ A command to undo all operations performed since the beginning
  of a transaction.
 
 
@@ -1170,7 +1170,7 @@
Savepoint

 
- A special mark inside the sequence of steps in a
+ A special mark in the sequence of steps in a
  transaction.
  Data modifications after this point in time may be reverted
  to the time of the savepoint.
@@ -1192,7 +1192,8 @@
  SQL object must reside in exactly one schema.
 
 
- The names of SQL objects of the same type in the same schema are enforced unique.
+ The names of SQL objects of the same type in the same schema are enforced
+ to be unique.
  There is no restriction on reusing a name in multiple schemas.
 
 
@@ -1205,7 +1206,7 @@


 
- More generically, the term Schema is used to mean
+ More generically, the term schema is used to mean
  all data descriptions (table definitions,
  constraints, comments, etc)
  for 

Re: backup manifests

2020-04-03 Thread Robert Haas
On Fri, Apr 3, 2020 at 3:53 PM Robert Haas  wrote:
> 2. Also, a bunch of machines were super-unhappy with
> 003_corruption.pl, failing with this sort of thing:
>
> pg_basebackup: error: could not get COPY data stream: ERROR:  symbolic
> link target too long for tar format: file name "pg_tblspc/16387",
> target 
> "/home/fabien/pg/build-farm-11/buildroot/HEAD/pgsql.build/src/bin/pg_validatebackup/tmp_check/tmp_test_7w0w"
>
> Apparently, this is a known problem and the solution is to use
> TestLib::tempdir_short instead of TestLib::tempdir, so I pushed a fix
> to make it do that.

By and large, the buildfarm is a lot happier now, but fairywren
(Windows / Msys Server 2019 / 2 gcc 7.3.0 x86_64) failed like this:

# Postmaster PID for node "master" is 198420
error running SQL: 'psql::3: ERROR:  directory
"/tmp/9peoZHrEia" does not exist'
while running 'psql -XAtq -d port=51493 host=127.0.0.1
dbname='postgres' -f - -v ON_ERROR_STOP=1' with sql 'CREATE TABLE x1
(a int);
INSERT INTO x1 VALUES (111);
CREATE TABLESPACE ts1 LOCATION '/tmp/9peoZHrEia';
CREATE TABLE x2 (a int) TABLESPACE ts1;
INSERT INTO x1 VALUES (222);
' at 
/home/pgrunner/bf/root/HEAD/pgsql.build/../pgsql/src/test/perl/PostgresNode.pm
line 1531.
### Stopping node "master" using mode immediate

I wondered why this should be failing on this machine when none of the
other places where tempdir_short is used are similarly failing. The
answer appears to be that most of the TAP tests that use tempdir_short
just do this:

my $tempdir_short = TestLib::tempdir_short;

...and then ignore that variable completely for the rest of the
script.  That's not ideal, and we should probably remove those calls
to avoid giving the impression that it's actually used for something. The two TAP
tests that actually do something with it - apart from the one I just
added - are pg_basebackup's 010_pg_basebackup.pl and pg_ctl's
001_start_stop.pl. However, both of those are skipped on Windows.
Also, PostgresNode.pm itself uses it, but only when UNIX sockets are
used, so again not on Windows. So it sorta looks to me like we no
preexisting tests that meaningfully exercise TestLib::tempdir_short on
Windows.

Given that, I suppose I should consider myself lucky if this ends up
working on *any* of the Windows critters, but given the implementation
I'm kinda surprised we have a problem. That function is just:

sub tempdir_short
{

return File::Temp::tempdir(CLEANUP => 1);
}

And File::Temp's documentation says that the temporary directory is
picked using File::Spec's tmpdir(), which says that it knows about
different operating systems and will DTRT on Unix, Mac, OS2, Win32,
and VMS. Yet on fairywren it is apparently DTWT. I'm not sure why.

Any ideas?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Binary support for pgoutput plugin

2020-04-03 Thread Dave Cramer
On Fri, 3 Apr 2020 at 03:43, Petr Jelinek  wrote:

> Hi,
>
> On 08/03/2020 00:18, Dave Cramer wrote:
> > On Fri, 6 Mar 2020 at 08:54, Petr Jelinek  > > wrote:
> >
> > Hi Dave,
> >
> > On 29/02/2020 18:44, Dave Cramer wrote:
> >  >
> >  >
> >  > rebased and removed the catversion bump.
> >
> > Looked into this and it generally seems okay, but I do have one
> > gripe here:
> >
> >  > + tuple->values[i].data =
> > palloc(len + 1);
> >  > + /* and data */
> >  > +
> >  > + pq_copymsgbytes(in,
> > tuple->values[i].data, len);
> >  > + tuple->values[i].len = len;
> >  > + tuple->values[i].cursor = 0;
> >  > + tuple->values[i].maxlen =
> len;
> >  > + /* not strictly necessary
> > but the docs say it is required */
> >  > + tuple->values[i].data[len]
> > = '\0';
> >  > + break;
> >  > + }
> >  > + case 't':   /* text
> > formatted value */
> >  > + {
> >  > + tuple->changed[i] = true;
> >  > + int len = pq_getmsgint(in,
> > 4);  /* read length */
> >  >
> >  >   /* and data */
> >  > - tuple->values[i] =
> > palloc(len + 1);
> >  > - pq_copymsgbytes(in,
> > tuple->values[i], len);
> >  > - tuple->values[i][len] =
> '\0';
> >  > + tuple->values[i].data =
> > palloc(len + 1);
> >  > + pq_copymsgbytes(in,
> > tuple->values[i].data, len);
> >  > + tuple->values[i].data[len]
> > = '\0';
> >  > + tuple->values[i].len = len;
> >
> > The cursor should be set to 0 in the text formatted case too if this
> is
> > how we chose to encode data.
> >
> > However I am not quite convinced I like the StringInfoData usage
> here.
> > Why not just change the struct to include additional array of lengths
> > rather than replacing the existing values array with StringInfoData
> > array, that seems generally both simpler and should have smaller
> memory
> > footprint too, no?
> >
> >
> > Can you explain this a bit more? I don't really see a huge difference in
> > memory usage.
> > We still need length and the data copied into LogicalRepTupleData when
> > sending the data in binary, no?
> >
>
> Yes but we don't need to have fixed sized array of 1600 elements that
> contains maxlen and cursor positions of the StringInfoData struct which
> we don't use for anything AFAICS.
>

OK, I can see an easy way to only allocate the number of elements required
but since OidReceiveFunctionCall takes
StringInfo as one of its arguments, it seems like an easy path unless there
is something I am missing?

Dave


Re: proposal \gcsv

2020-04-03 Thread Pavel Stehule
st 1. 4. 2020 v 18:29 odesílatel Tom Lane  napsal:

> Pavel Stehule  writes:
> > It can work, but it is not user friendly - my proposal was motivated by
> > using some quick csv exports to gplot's pipe.
>
> I kind of liked the stack idea, myself.  It's simpler than what I was
> suggesting and it covers probably 90% of the use-case.
>

The stack idea probably needs a much stronger redesign of psql's error handling to
be safe

postgres=# \set ON_ERROR_STOP 1
postgres=# select 10/0 \echo 'ahoj' \g \echo 'nazdar
ahoj
ERROR:  division by zero

There is no guarantee that the command for returning to the stored state will
be executed.



> However, if we prefer something closer to Plan A ... I took a look at
> the psql lexer, and the only difference between OT_FILEPIPE and OT_NORMAL
> parsing is if the argument starts with '|'.  So we could make it work
> I think.  I'd modify my first proposal so far as to make it
>
> \g ( pset-option pset-value ... ) filename-or-pipe
>
> That is, require spaces around the parens, and require a value for each
> pset-option (no fair using the shortcuts like "\pset expanded").  Then
> it's easy to separate the option names and values from the paren markers.
> The \g parser would consume its first argument in OT_FILEPIPE mode, and
> then if it sees '(' it would consume arguments in OT_NORMAL mode until
> it's found the ')'.
>

Having this syntax could be nice, but requiring spaces around the
parentheses is not very user friendly or natural.

Following ideas are based on Tom's ideas

We can have a new commands for cloning print environments and switch to one
shot environment. It can be based just on enhancing of \pset command

\pset save anyidentifier .. serialize settings
\pset load anyidentifier .. load setting
\pset oneshot [anyidentifer] .. prepare and set copy of current print
setting for next execution command
\pset main
\pset reset .. reset to defaults

so this can support some scenarios

-- one shot csv
\pset oneshot  -- copy current settings to one shot environment and use one
shot environment
\pset format csv
\pset csv_delimiter ;
select 1; -- any output

-- prepare named configuration
\pset oneshot
\pset format csv
\pset csv_delimiter ;
\pset save czech_csv -- serialize changes against "main" environment
\pset main

\pset load czech_csv
select 1;

or

\pset oneshot czech_csv
select 1;

So we just need to enhance the syntax of the \pset command only, and we have to
support working with two print settings environments - "main" and "oneshot".

What do you think about this proposal?

Regards

Pavel






> This way also narrows the backwards-compatibility problem from "fails if
> your filename starts with '('" to "fails if your filename is exactly '('",
> which seems acceptably improbable to me.
>
> regards, tom lane
>


Re: backup manifests

2020-04-03 Thread Robert Haas
On Fri, Apr 3, 2020 at 3:22 PM Robert Haas  wrote:
> It looks like the buildfarm is unhappy though, so I guess I'd better
> go look at that.

I fixed two things so far, and there seems to be at least one more
possible issue that I don't understand.

1. Apparently, we have an automated perlcritic run built in to the
build farm, and apparently, it really hates Perl subroutines that
don't end with an explicit return statement. We have that overridden
to severity 5 in our Perl critic configuration. I guess I should've
known this, but didn't. I've pushed a fix adding return statements. I
believe I'm on record as thinking that perlcritic is a tool for
complaining about a lot of things that don't really matter and very
few that actually do -- but it's project style, so I'll suck it up!

2. Also, a bunch of machines were super-unhappy with
003_corruption.pl, failing with this sort of thing:

pg_basebackup: error: could not get COPY data stream: ERROR:  symbolic
link target too long for tar format: file name "pg_tblspc/16387",
target 
"/home/fabien/pg/build-farm-11/buildroot/HEAD/pgsql.build/src/bin/pg_validatebackup/tmp_check/tmp_test_7w0w"

Apparently, this is a known problem and the solution is to use
TestLib::tempdir_short instead of TestLib::tempdir, so I pushed a fix
to make it do that.

3. spurfowl has failed its last two runs like this:

sh: 1: ./configure: not found

I am not sure how this patch could've caused that to happen, but the
timing of the failures is certainly suspicious.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: backup manifests

2020-04-03 Thread Robert Haas
On Thu, Apr 2, 2020 at 4:34 PM David Steele  wrote:
> +1. These would be great tests to have and a win for pg_basebackup
> overall but I don't think they should be a prerequisite for this commit.

Not to mention the server. I can't say that I have a lot of confidence
that all of the server behavior in this area is well-understood and
sane.

I've pushed all the patches. Hopefully everyone is happy now, or at
least not so unhappy that they're going to break quarantine to beat me
up. I hope I acknowledged all of the relevant people in the commit
message, but it's possible that I missed somebody; if so, my
apologies. As is my usual custom, I added entries in roughly the order
that people chimed in on the thread, so the ordering should not be
taken as a reflection of magnitude of contribution or, well, anything
other than the approximate order in which they chimed in.

It looks like the buildfarm is unhappy though, so I guess I'd better
go look at that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: snapshot too old issues, first around wraparound and then more.

2020-04-03 Thread Andres Freund
Hi,

On 2020-04-03 14:32:09 +0530, Amit Kapila wrote:
> On Fri, Apr 3, 2020 at 6:52 AM Peter Geoghegan  wrote:
> >
> > On Thu, Apr 2, 2020 at 5:17 PM Andres Freund  wrote:
> > > Since this is a feature that can result in wrong query results (and
> > > quite possibly crashes / data corruption), I don't think we can just
> > > leave this unfixed.  But given the amount of code / infrastructure
> > > changes required to get this into a working feature, I don't see how we
> > > can unleash those changes onto the stable branches.
> >
>
> As per my initial understanding, the changes required are (a) There
> seem to be multiple places where TestForOldSnapshot is missing, (b)
> TestForOldSnapshot itself need to be reviewed carefully to see if it
> has problems, (c) Some of the members of OldSnapshotControlData like
> head_timestamp and xid_by_minute are not maintained accurately, (d)
> handling of wraparound for xids in the in-memory data-structure for
> this feature is required, (e) test infrastructure is not good enough
> to catch bugs or improve this feature.

And a bunch more correctness issues. But basically, yes.

When you say "(c) Some of the members of OldSnapshotControlData like
head_timestamp and xid_by_minute are not maintained accurately)" - note
that that's the core state for the whole feature.

With regards to test: "not good enough" is somewhat of an
understatement. Not a *single* tuple is removed in the tests due to
old_snapshot_threshold - and removing tuples is the entire point.


> Now, this sounds like quite a bit of work but OTOH, if we see most of the
> critical changes required will be in only a few functions like
> TransactionIdLimitedForOldSnapshots(),
> MaintainOldSnapshotTimeMapping(), TestForOldSnapshot().

I don't think that's really the case. Every place reading a buffer needs
to be inspected, and new calls added. They aren't free, and I'm not sure
all of them have the relevant snapshot available. To fix the issue of
spurious errors, we'd likely need changes outside of those, and it'd
quite possibly have performance / bloat implications.


> I don't deny the possibility that we might need much more work or we
> need to come up with quite a different design to address all these
> problems but unless Kevin or someone else doesn't come up with a
> solution to address all of these problems, we can't be sure of that.
>
> > I don't think that the feature can be allowed to remain in anything
> > like its current form. The current design is fundamentally unsound.
> >
>
> Agreed, but OTOH, not giving time to Kevin or others who might be
> interested to support this work is also not fair.  I think once
> somebody comes up with patches for problems we can decide whether this
> feature can be salvaged in back-branches or we need to disable it in a
> hard-way.  Now, if Kevin himself is not interested in fixing or nobody
> shows up to help, then surely we can take the decision sooner but
> giving time for a couple of weeks (or even till we are near for PG13
> release) in this case doesn't seem like a bad idea.

It'd certainly be great if somebody came up with fixes, yes. Even if we
had to disable it in the back branches, that'd allow us to keep the
feature around, at least.

The likelihood of regressions even when the feature is not on does not
seem that low. But you're right, we'll be able to better judge it with
fixes to look at.

Greetings,

Andres Freund




Re: [HACKERS] make async slave to wait for lsn to be replayed

2020-04-03 Thread Anna Akenteva
I did some code cleanup and added tests - both for the standalone WAIT 
FOR statement and for WAIT FOR as a part of BEGIN. The new patch is 
attached.


On 2020-04-03 17:29, Alexey Kondratov wrote:

On 2020-04-01 02:26, Anna Akenteva wrote:


- WAIT FOR [ANY | ALL] event [, ...]
- BEGIN [ WORK | TRANSACTION ] [ transaction_mode [, ...] ] [ WAIT FOR
[ANY | ALL] event [, ...]]
where event is one of:
LSN value
TIMEOUT number_of_milliseconds
timestamp

Now, one event cannot contain both an LSN and a TIMEOUT.



In my understanding the whole idea of having TIMEOUT was to do
something like 'Do wait for this LSN to be replicated, but no longer
than TIMEOUT milliseconds'. What is the point of having plain TIMEOUT?
It seems to be equivalent to pg_sleep, doesn't it?



In the patch that I reviewed, you could do things like:
WAIT FOR
LSN lsn0,
LSN lsn1 TIMEOUT time1,
LSN lsn2 TIMEOUT time2;
and such a statement was in practice equivalent to
WAIT FOR LSN(max(lsn0, lsn1, lsn2)) TIMEOUT (max(time1, time2))

As you can see, even though grammatically lsn1 is grouped with time1 and 
lsn2 is grouped with time2, both timeouts that we specified are not 
connected to their respective LSN-s, and instead they kinda act like 
global timeouts. Therefore, I didn't see a point in keeping TIMEOUT 
necessarily grammatically connected to LSN.


In the new syntax our statement would look like this:
WAIT FOR LSN lsn0, LSN lsn1, LSN lsn2, TIMEOUT time1, TIMEOUT time2;
TIMEOUT-s are not forced to be grouped with LSN-s anymore, which makes 
it more clear that all specified TIMEOUTs will be global and will apply 
to all LSN-s at once.


The point of having TIMEOUT is still to let us limit the time of waiting 
for LSNs. It's just that with the new syntax, we can also use TIMEOUT 
without an LSN. You are right, such a case is equivalent to pg_sleep. 
One way to avoid that is to prohibit waiting for TIMEOUT without 
specifying an LSN. Do you think we should do that?
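
For reference, here is a minimal sketch of the combined usage the feature is
really aimed at, written in the proposed (uncommitted) grammar; the LSN and the
10-second timeout are made-up values:

-- wait until this standby has replayed up to the given LSN,
-- but give up once 10000 ms have passed
WAIT FOR ANY LSN '0/306EE20', TIMEOUT 10000;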


--
Anna Akenteva
Postgres Professional:
The Russian Postgres Company
http://www.postgrespro.com
diff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml
index 8d91f3529e6..8697f9807ff 100644
--- a/doc/src/sgml/ref/allfiles.sgml
+++ b/doc/src/sgml/ref/allfiles.sgml
@@ -187,6 +187,7 @@ Complete list of usable sgml source files in this directory.
 
 
 
+
 
 
 
diff --git a/doc/src/sgml/ref/begin.sgml b/doc/src/sgml/ref/begin.sgml
index c23bbfb4e71..cfee3c8f102 100644
--- a/doc/src/sgml/ref/begin.sgml
+++ b/doc/src/sgml/ref/begin.sgml
@@ -21,13 +21,21 @@ PostgreSQL documentation
 
  
 
-BEGIN [ WORK | TRANSACTION ] [ transaction_mode [, ...] ]
+BEGIN [ WORK | TRANSACTION ] [ transaction_mode [, ...] ] wait_for_event
 
 where transaction_mode is one of:
 
 ISOLATION LEVEL { SERIALIZABLE | REPEATABLE READ | READ COMMITTED | READ UNCOMMITTED }
 READ WRITE | READ ONLY
 [ NOT ] DEFERRABLE
+
+where wait_for_event is:
+WAIT FOR [ANY | ALL] event [, ...]
+
+and event is one of:
+LSN lsn_value
+TIMEOUT number_of_milliseconds
+timestamp
 
  
 
diff --git a/doc/src/sgml/ref/start_transaction.sgml b/doc/src/sgml/ref/start_transaction.sgml
index d6cd1d41779..0a2ea7e80be 100644
--- a/doc/src/sgml/ref/start_transaction.sgml
+++ b/doc/src/sgml/ref/start_transaction.sgml
@@ -21,13 +21,21 @@ PostgreSQL documentation
 
  
 
-START TRANSACTION [ transaction_mode [, ...] ]
+START TRANSACTION [ transaction_mode [, ...] ] wait_for_event
 
 where transaction_mode is one of:
 
 ISOLATION LEVEL { SERIALIZABLE | REPEATABLE READ | READ COMMITTED | READ UNCOMMITTED }
 READ WRITE | READ ONLY
 [ NOT ] DEFERRABLE
+
+where wait_for_event is:
+WAIT FOR [ANY | ALL] event [, ...]
+
+and event is one of:
+LSN lsn_value
+TIMEOUT number_of_milliseconds
+timestamp
 
  
 
diff --git a/doc/src/sgml/ref/wait.sgml b/doc/src/sgml/ref/wait.sgml
new file mode 100644
index 000..26cae3ad859
--- /dev/null
+++ b/doc/src/sgml/ref/wait.sgml
@@ -0,0 +1,146 @@
+
+
+
+ 
+  WAIT FOR
+ 
+
+ 
+  WAIT FOR
+  7
+  SQL - Language Statements
+ 
+
+ 
+  WAIT FOR
+  wait for the target LSN to be replayed or for specified time to pass
+ 
+
+ 
+
+WAIT FOR [ANY | ALL] event [, ...]
+
+where event is one of:
+LSN value
+TIMEOUT number_of_milliseconds
+timestamp
+
+WAIT FOR LSN 'lsn_number'
+WAIT FOR LSN 'lsn_number' TIMEOUT wait_timeout
+WAIT FOR LSN 'lsn_number', TIMESTAMP wait_time
+WAIT FOR TIMESTAMP wait_time
+WAIT FOR ALL LSN 'lsn_number' TIMEOUT wait_timeout, TIMESTAMP wait_time
+WAIT FOR ANY LSN 'lsn_number', TIMESTAMP wait_time
+
+ 
+
+ 
+  Description
+
+  
+   WAIT FOR provides a simple interprocess communication
+   mechanism to wait for timestamp, timeout or the target log sequence number
+   (LSN) on standby in PostgreSQL
+   databases with master-standby asynchronous replication. When run with the
+   LSN option, the WAIT FOR
+   command waits for the specified LSN to be replayed.
+   If no 

Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly

2020-04-03 Thread Justin Pryzby
On Wed, Apr 01, 2020 at 08:08:36AM -0500, Justin Pryzby wrote:
> Or maybe you'd want me to squish my changes into yours and resend after any
> review comments ?

I didn't hear any feedback, so I've now squished all "parenthesized" and "fix"
commits.  0004 reduces duplicative error handling, as a separate commit so
Alexey can review it and/or integrate it.  The last two commits save a few
dozen lines of code, but not sure they're desirable.

As this changes REINDEX/CLUSTER to allow parenthesized options, it might be
pretty reasonable if someone were to kick this to the July CF.
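
For anyone skimming the thread, the kind of spelling the patch set enables is
along these lines (a sketch only; VERBOSE is the only option shown here, any
further options would come from the later patches in the series):

REINDEX (VERBOSE) TABLE measurement;
CLUSTER (VERBOSE) measurement USING measurement_pkey;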

-- 
Justin
>From e322fbe6e2fb5bda9fb8040e4401704c16c371c3 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Thu, 2 Apr 2020 14:20:11 -0500
Subject: [PATCH v17 1/8] tab completion for reindex(verbose)..

..which was added at ecd222e77 for v9.5.

This handles "verbose" itself as well as the following word.

Separate commit as this could be backpatched to v12 (but backpatching further
is less trivial, due to improvements added at 121213d9d).
---
 src/bin/psql/tab-complete.c | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 5fec59723c..e428b0b9df 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -3414,7 +3414,7 @@ psql_completion(const char *text, int start, int end)
 		COMPLETE_WITH("DATA");
 
 /* REINDEX */
-	else if (Matches("REINDEX"))
+	else if (Matches("REINDEX") || Matches("REINDEX", "(*)"))
 		COMPLETE_WITH("TABLE", "INDEX", "SYSTEM", "SCHEMA", "DATABASE");
 	else if (Matches("REINDEX", "TABLE"))
 		COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_indexables,
@@ -3436,6 +3436,17 @@ psql_completion(const char *text, int start, int end)
 		COMPLETE_WITH_QUERY(Query_for_list_of_schemas);
 	else if (Matches("REINDEX", "SYSTEM|DATABASE", "CONCURRENTLY"))
 		COMPLETE_WITH_QUERY(Query_for_list_of_databases);
+	else if (HeadMatches("REINDEX", "(*") &&
+			 !HeadMatches("REINDEX", "(*)"))
+	{
+		/*
+		 * This fires if we're in an unfinished parenthesized option list.
+		 * get_previous_words treats a completed parenthesized option list as
+		 * one word, so the above test is correct.
+		 */
+		if (ends_with(prev_wd, '(') || ends_with(prev_wd, ','))
+			COMPLETE_WITH("VERBOSE");
+	}
 
 /* SECURITY LABEL */
 	else if (Matches("SECURITY"))
-- 
2.17.0

>From f6fa2cbfa209d105750f7576afcd61999d9d861a Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Fri, 27 Mar 2020 17:50:46 -0500
Subject: [PATCH v17 2/8] Change REINDEX/CLUSTER to accept an option list..

..like reindex (CONCURRENTLY)

Change docs in the style of VACUUM.  See also: 52dcfda48778d16683c64ca4372299a099a15b96
---
 doc/src/sgml/ref/cluster.sgml| 13 
 doc/src/sgml/ref/reindex.sgml| 26 +++
 src/backend/commands/cluster.c   | 21 -
 src/backend/commands/indexcmds.c | 41 +---
 src/backend/nodes/copyfuncs.c|  2 ++
 src/backend/nodes/equalfuncs.c   |  2 ++
 src/backend/parser/gram.y| 54 +++-
 src/backend/tcop/utility.c   | 33 ---
 src/bin/psql/tab-complete.c  | 25 ---
 src/include/commands/cluster.h   |  3 +-
 src/include/commands/defrem.h|  7 ++---
 src/include/nodes/parsenodes.h   |  4 ++-
 12 files changed, 169 insertions(+), 62 deletions(-)

diff --git a/doc/src/sgml/ref/cluster.sgml b/doc/src/sgml/ref/cluster.sgml
index 4da60d8d56..bd0682ddfd 100644
--- a/doc/src/sgml/ref/cluster.sgml
+++ b/doc/src/sgml/ref/cluster.sgml
@@ -82,31 +82,32 @@ CLUSTER [VERBOSE]
 
   

-table_name
+VERBOSE
 
  
-  The name (possibly schema-qualified) of a table.
+  Prints a progress report as each table is clustered.
  
 

 

-index_name
+table_name
 
  
-  The name of an index.
+  The name (possibly schema-qualified) of a table.
  
 

 

-VERBOSE
+index_name
 
  
-  Prints a progress report as each table is clustered.
+  The name of an index.
  
 

+
   
  
 
diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml
index c54a7c420d..200526b6f4 100644
--- a/doc/src/sgml/ref/reindex.sgml
+++ b/doc/src/sgml/ref/reindex.sgml
@@ -141,19 +141,6 @@ REINDEX [ ( option [, ...] ) ] { IN
 

 
-   
-name
-
- 
-  The name of the specific index, table, or database to be
-  reindexed.  Index and table names can be schema-qualified.
-  Presently, REINDEX DATABASE and REINDEX SYSTEM
-  can only reindex the current database, so their parameter must match
-  the current database's name.
- 
-
-   
-

 CONCURRENTLY
 
@@ -182,6 +169,19 @@ REINDEX [ ( option [, ...] ) ] { IN
  
 

+
+   
+name
+
+ 
+  The name of the specific index, table, or database to be
+  reindexed.  Index and table names can be schema-qualified.
+  Presently, 

Re: Syntax rules for a text value inside the literal for a user-defined type—doc section “8.16.2. Constructing Composite Values”

2020-04-03 Thread Bryn Llewellyn
On 03-Apr-2020, at 00:05, David G. Johnston  wrote:

On Thu, Apr 2, 2020 at 8:46 PM Bryn Llewellyn <b...@yugabyte.com> wrote:
On 02-Apr-2020, at 19:25, Tom Lane <t...@sss.pgh.pa.us> wrote:

Bryn Llewellyn <b...@yugabyte.com> writes:
> The documentation in section “8.16.2. Constructing Composite Values” here:
> https://www.postgresql.org/docs/11/rowtypes.html#ROWTYPES-IO-SYNTAX

The authoritative documentation for that is at 8.16.6 "Composite Type
Input and Output Syntax", and it says quite clearly that whitespace is
not ignored (except for before and after the outer parentheses). 
[...] 

“Whitespace outside the parentheses is ignored, but within the parentheses it 
is considered part of the field value, and might or might not be significant 
depending on the input conversion rules for the field data type. For example, 
in:

'(  42)'

the whitespace will be ignored if the field type is integer, but not if it is 
text. As shown previously, when writing a composite value you can write double 
quotes around any individual field value.”

Notice the wording “double quotes around any individual field value.” The word 
“around” was the source of my confusion. For the docs to communicate what it 
seems they ought to, the word should be “within”. This demonstrates my 
point:

Actually, they do mean around (as in, the double-quotes must be adjacent to the 
commas that delimit the fields, or the parens).

The next two sentences clear things up a bit:

"You must do so if the field value would otherwise confuse the composite-value 
parser.  In particular, fields containing parentheses, commas, double quotes, 
or backslashes must be double-quoted."

That said the documentation doesn't match the behavior (which is considerably 
more forgiving and also willing to simply discard double-quotes instead of 
error-ing out when the documented rules are not adhered to)

Specifically:  '(a \"b c\" d, e \"f,g\" h)'::rt leaves the double-quote 
while '(a ""b c"" d, e ""f,g"" h)'::rt does not.  Neither have the field 
surround with double-quotes so should be invalid per the documentation.  When 
you follow the documentation they then both retain the double-quote.

So if you follow the guidelines set forth in the documentation you get the 
result the documentation promises.  If you fail to follow the guidelines you 
may still get a result but there is no promise made as to its contents.  Not 
ideal but also not likely to be changed after all this time.

create type rt as (a text, b text);
with v as (select '(a "b c" d, e "f,g" h)'::rt as r)
select
  '>'||(r).a||'<' as a,
  '>'||(r).b||'<' as b
from v;

This demonstrates that, in my input, the double quotes are *within* each of the 
two text values—and definitely *do not surround* them.

Yep, which is why you have an issue.   The "surround them" is indeed what the 
text meant to say.
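
For comparison, a minimal sketch of two field values containing a space and a
comma, written per the documented rule, with the double quotes placed directly
against the delimiting comma and parentheses; the quotes are then consumed by
the parser and do not end up inside the values:

-- reusing the rt type defined just above
with v as (select '("b c","f,g")'::rt as r)
select '>'||(r).a||'<' as a, '>'||(r).b||'<' as b from v;
-- returns a = '>b c<' and b = '>f,g<'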


I really would appreciate a reply to the second part of my earlier question:

“please explain the rationale for what seems to me to be a counter-intuitive 
syntax design choice.”
[...]
They chose the rule that a string value *must* be surrounded by double quotes 
and that other values must *not* be so surrounded. The JSON model resonates 
with my intuition.

This point wasn't answered because there is no good answer to be given.  The 
above is how someone in the mid-90s or so decided PostgreSQL should handle 
this.  I'll personally agree that more verbose but explicit, and less 
forgiving, syntax and parsing, would have been a better choice.  But the choice 
has been made and isn't subject to change.

But regardless of what drove the original design choice if you really care 
about it in a "want to learn" mode then it is very easy to observe the defined 
behavior and critique it independently of how it came to be.  If all you want 
to do is make a left-handed jab at PostgreSQL for having a design choice that is 
non-intuitive to you, don't act surprised when people don't choose to respond - especially 
when none of us made the original choice.

The only thing we can do today is describe the system more clearly if we have 
failed to do so adequately.  You are probably in the best position, then, to 
learn what it does and propose new wording that someone with inexperienced (or 
biased toward a different system) eyes would understand quickly and clearly.

David J.


Thanks for the explanation, David. It helped me a lot. You said “If all you 
want to do is make a left-handed jab at PostgreSQL”. That was not at all my 
intention. Rather, my question was driven by my experience of learning things 
over the years—and by my experience of teaching others. Sometimes, at the early 
stage of learning, a new idea strikes one as counter intuitive, and one feels 
that there 

Re: pgsql: Improve handling of parameter differences in physical replicatio

2020-04-03 Thread Robert Haas
On Mon, Mar 30, 2020 at 2:10 PM Andres Freund  wrote:
> One important issue seems to me to be the size of the array that
> TransactionIdIsInProgress() allocates:
> /*
>  * If first time through, get workspace to remember main XIDs in. We
>  * malloc it permanently to avoid repeated palloc/pfree overhead.
>  */
> if (xids == NULL)
> {
> /*
>  * In hot standby mode, reserve enough space to hold all xids 
> in the
>  * known-assigned list. If we later finish recovery, we no 
> longer need
>  * the bigger array, but we don't bother to shrink it.
>  */
> int maxxids = RecoveryInProgress() ? 
> TOTAL_MAX_CACHED_SUBXIDS : arrayP->maxProcs;
>
> xids = (TransactionId *) malloc(maxxids * 
> sizeof(TransactionId));
> if (xids == NULL)
> ereport(ERROR,
> (errcode(ERRCODE_OUT_OF_MEMORY),
>  errmsg("out of memory")));
> }
>
> Which I think means we'll just overrun the xids array in some cases,
> e.g. if KnownAssignedXids overflowed.  Obviously we should have a
> crosscheck in the code (which we don't), but it was previously a
> supposedly unreachable path.

I think this patch needs to be reverted. The only places where it
changes anything are places where we were about to throw some error
anyway. But as Andres's analysis shows, that's not nearly good enough.
I am kind of surprised that Peter thought that would be good enough.
It is necessary, for something like this, to investigate all the
places where the code may be relying on a certain assumption, not just
assume that there's an error check everywhere that we rely on that
assumption and change only those places.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Add A Glossary

2020-04-03 Thread Erik Rijkers

On 2020-04-03 18:45, Alvaro Herrera wrote:
Pushed now.  Many thanks to Corey who put the main thrust, and to 
Jürgen
and Roger for the great help, and to Justin for the extensive review 
and

Fabien for the initial discussion.


A few improvements:

'its value that cannot'  should be
'its value cannot'

'A newly created Cluster'  should be
'A newly created cluster'

'term Cluster'  should be
'term cluster'

'allowed within a Table.'  should be
'allowed within a table.'

'of a SQL data type.'  should be
'of an SQL data type.'

'A SQL command'  should be
'An SQL command'

'i.e. the data'  should be
'i.e., the data'

'that a a view is'  should be
'that a view is'

'One of the tables that each contain part'  should be
'One of multiple tables that each contain part'

'a partition is a user-defined criteria'  should be
'a partition is a user-defined criterion'

'Roless are'  should be
'Roles are'

'undo all of the operations'  should be
'undo all operations'

'A special mark inside the sequence of steps'  should be
'A special mark in the sequence of steps'

'are enforced unique'  should be (?)
'are enforced to be unique'

'the term Schema is used'  should be
'the term schema is used'

'belong to exactly one Schema.'  should be
'belong to exactly one schema.'

'about the Cluster's activities'  should be
'about the cluster's activities'

'the most common form of Relation'  should be
'the most common form of relation'

'A Trigger executes'  should be
'A trigger executes'

'and other closely related garbage-collection-like processing'  should 
be

'and other processing'

'each of the changes are replayed'  should be
'each of the changes is replayed'

Should also be a  lemmata in the glossary:

ACID


'archaic' should maybe be 'obsolete'. That seems to me to be an easier 
word for non-native speakers.



Thanks,

Erik Rijkers




Re: Add A Glossary

2020-04-03 Thread Roger Harkavy
On Fri, Apr 3, 2020 at 1:34 PM Corey Huinker 
wrote:

> Thanks for all your work on this!
>

And to add on to Corey's message of thanks, I also want to thank everyone
for their input and assistance on that. I am very grateful for the
opportunity to contribute to this project!


Re: Datum values consistency within one query

2020-04-03 Thread Tom Lane
Paul Ramsey  writes:
> So, if I tested for VARATT_IS_EXTENDED(), and then for 
> VARATT_IS_EXTERNAL_ONDISK(attr) and then did 
> VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr), I could use va_valueid + 
> va_toastrelid as keys in the cache for things that passed that filter?

I'm pretty sure VARATT_IS_EXTERNAL_ONDISK subsumes the other, so you
don't need to check VARATT_IS_EXTENDED, but yeah.

> What about large const values that haven't been stored in a table yet? (eg, 
> ST_Buffer(ST_MakePoint(0, 0), 100, 1)) is there a stable key I can use 
> for them?

Nope.  If you could convert them into "expanded datums" then you
might have something ... but without any knowledge about where they
are coming from it's hard to see how to detect that a value is
the same one you dealt with before.

regards, tom lane
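
To make the toast-pointer cache key concrete, here is a minimal sketch,
assuming a hypothetical DetoastCacheKey struct and helper function; only the
varatt macros and the va_valueid/va_toastrelid fields are existing PostgreSQL
interfaces:

#include "postgres.h"

typedef struct DetoastCacheKey
{
    Oid         toastrelid;     /* OID of the toast table the value lives in */
    Oid         valueid;        /* chunk_id of the value within that table */
} DetoastCacheKey;

/*
 * Fill *key if, and only if, the datum is an out-of-line, on-disk toast
 * pointer, which is the only case with a stable identity to cache on.
 */
static bool
get_detoast_cache_key(struct varlena *attr, DetoastCacheKey *key)
{
    struct varatt_external toast_pointer;

    if (!VARATT_IS_EXTERNAL_ONDISK(attr))
        return false;

    VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
    key->toastrelid = toast_pointer.va_toastrelid;
    key->valueid = toast_pointer.va_valueid;
    return true;
}

As noted above, such a key should only be trusted for the lifespan of a
statement or less.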




Re: Add A Glossary

2020-04-03 Thread Corey Huinker
>
> we have it, we can start thinking of patching the main part of the docs
> to make reference to it by using  in key spots.  Right now
> the glossary links to itself, but it makes lots of sense to have other
> places point to it.
>

I have some ideas about how to patch the main docs, but will leave those to
a separate thread.


> * I commented out the definition of "sequence", which seemed to go into
>   excessive detail.  Let's have a more concise definition?
>

That one's my fault.


>
> Patches for these omissions, and other contributions, welcome.
>

Thanks for all your work on this!


Re: Datum values consistency within one query

2020-04-03 Thread Paul Ramsey



> On Apr 2, 2020, at 4:30 PM, Tom Lane  wrote:
> 
> Paul Ramsey  writes:
>> Getting the datum value is really fast, so I can have a cache that
>> keeps the latest detoasted object around, and update it when the datum
>> changes, and store the cache information in the parent context. Like
>> so:
> 
> Jeez, no, not like that.  You're just testing a pointer.  
> ...
> The case where this would actually be worth doing, probably, is where
> you are receiving a toasted-out-of-line datum.  In that case you could
> legitimately use the toast pointer ID values (va_valueid + va_toastrelid)
> as a lookup key for a cache, as long as it had a lifespan of a statement
> or less.  You'd have to get a bit in bed with the details of toast
> pointers, but it's not like those are going anywhere.

So, if I tested for VARATT_IS_EXTENDED(), and then for 
VARATT_IS_EXTERNAL_ONDISK(attr) and then did 
VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr), I could use va_valueid + 
va_toastrelid as keys in the cache for things that passed that filter?
What about large const values that haven't been stored in a table yet? (eg, 
ST_Buffer(ST_MakePoint(0, 0), 100, 1)) is there a stable key I can use for 
them?

> It would be interesting to tie that into the "expanded object"
> infrastructure, perhaps, especially if the contents of the objects
> you're interested in aren't just flat blobs of data.

Yeah, I'm wrestling with the right place to do this stuff. It's not just the 
detoasting: I also build in-memory trees on large objects and hold them around 
for as long as the object keeps showing up repeatedly in the query. Right now 
I just test the cache by using memcmp on the previous value, and that's really 
pricey.

P





Re: [PATCH] Fix for slow GIN index queries when "gin_fuzzy_search_limit" setting is relatively small for large tables

2020-04-03 Thread Tom Lane
I wrote:
> I'm inclined to back-patch this.  Given how fuzzy the definition
> of gin_fuzzy_search_limit is, it seems unlikely that anyone would
> be depending on the current buggy behavior.

And done.  Thanks for the bug report and patch!

regards, tom lane




Re: Add A Glossary

2020-04-03 Thread Alvaro Herrera
Pushed now.  Many thanks to Corey who put the main thrust, and to Jürgen
and Roger for the great help, and to Justin for the extensive review and
Fabien for the initial discussion.

This is just a starting point.  Let's keep improving it.  And now that
we have it, we can start thinking of patching the main part of the docs
to make reference to it by using  in key spots.  Right now
the glossary links to itself, but it makes lots of sense to have other
places point to it.

On 2020-Apr-02, Justin Pryzby wrote:

> We already have Session:
> A Connection to the Database. 

Yes, but I didn't like that much, so I rewrote it -- I was asking for
suggestions on how to improve it further.  While I think we use those
terms (connection and session) interchangeably sometimes, they're not
exactly the same and the glossary should be more precise or at least
less vague about the distinction.

> I propose: Client:
>   A host (or a process on a host) which connects to a server to make
> queries or other requests.
> 
> But note, "host" is still defined as "server", which I didn't like.
> 
> Maybe it should be:
>   A computer which may act as a >client< or a >server<.

I changed all these terms, and a few others, added a couple more and
commented out some that I was not happy with, and pushed.

I think this still needs more work:

* We had "serializable", but none of the other isolation levels were
  defined.  If we think we should define them, let's define them all.
  But also the definition we had for serializable was not correct;
  it seemed more suited to define "repeatable read".

* I commented out the definition of "sequence", which seemed to go into
  excessive detail.  Let's have a more concise definition?

* We're missing exclusion constraints, and NOT NULL which is also a
  weird type of constraint.

Patches for these omissions, and other contributions, welcome.

-- 
Álvaro Herrera            https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: zombie connections

2020-04-03 Thread Robert Haas
On Fri, Apr 3, 2020 at 11:50 AM Tom Lane  wrote:
> (2) It only wins if a statement timeout is active, otherwise it makes
> things worse, because then you need setitimer() at statement start
> and end just to enable/disable the socket check timeout.  Whereas
> if you just let a once-a-minute timeout continue to run, you don't
> incur those kernel calls.

Oh, that's a really good point. I should have thought of that.

> Anyway, the core problem with the originally-submitted patch was that
> it was totally ignorant that timeout.c had restrictions it was breaking.
> You can either fix the restrictions, or you can try to design around them,
> but you've got to be aware of what that code can and can't do today.

No disagreement there.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: VACUUM memory management

2020-04-03 Thread Ibrar Ahmed
On Mon, Mar 16, 2020 at 6:35 PM David Steele  wrote:

> On 1/28/20 1:36 PM, Ibrar Ahmed wrote:
> > On Wed, Jan 22, 2020 at 11:17 AM k.jami...@fujitsu.com
> > I might have missed something more,
> >
> > but I'll continue reviewing after the rebased patch.
> >
> > Yes, I am working on that. I will send the rebased and updated patch.
>
> This patch has not had any updates in months and now we are halfway
> through the CF so I have marked it Returned with Feedback.
>
> If a patch arrives soon I'll be happy to revive the entry, otherwise
> please submit to a future CF when a new patch is available.
>
> Regards,
> --
> -David
> da...@pgmasters.net
>

Here is the latest patch, rebased on master
(19db23bcbda99e93321cb0636677ec9c6e121a2a, Fri Apr 3 12:20:42 2020). The patch
fixes all the issues that appeared after the parallel vacuum patch; it works
in the non-parallel case and allocates memory in chunks.

-- 
Ibrar Ahmed


vacuum_dyn_mem_ibrar_v3.patch
Description: Binary data


Re: adding partitioned tables to publications

2020-04-03 Thread Tom Lane
Petr Jelinek  writes:
> On 03/04/2020 16:59, Tom Lane wrote:
>> Petr Jelinek  writes:
>>> AFAIK gcov can't handle multiple instances of same process being started
>>> as it just overwrites the coverage files. So for TAP test it will report
>>> bogus info (as in some code that's executed will look as not executed).

>> Hm, really?  I routinely run "make check" (ie, parallel regression
>> tests) under coverage, and I get results that seem sane.  If I were
>> losing large chunks of the data, I think I'd have noticed.

> Parallel regression still just starts single postgres instance no?

But the forked-off children have to write the gcov files independently,
don't they?

regards, tom lane




Re: zombie connections

2020-04-03 Thread Tom Lane
Robert Haas  writes:
> On Fri, Apr 3, 2020 at 10:34 AM Tom Lane  wrote:
>> In general I think the threshold problem for a patch like this will be
>> "how do you keep the added overhead down".  As Robert noted upthread,
>> timeout.c is quite a bit shy of being able to handle timeouts that
>> persist across statements.  I don't think that there's any fundamental
>> reason it can't be improved, but it will need improvements.

> Why do we need that? If we're not executing a statement, we're
> probably trying to read() from the socket, and we'll notice if that
> returns 0 or -1. So it seems like we only need periodic checks while
> there's a statement  in progress.

Maybe you could build it that way, but I'm not sure it's a better way.

(1) You'll need to build a concept of a timeout that's not a statement
timeout, but nonetheless should be canceled exactly when the statement
timeout is (not before or after, unless you'd like to incur additional
setitimer() calls).   That's going to involve either timeout.c surgery
or fragile requirements on the callers.

(2) It only wins if a statement timeout is active, otherwise it makes
things worse, because then you need setitimer() at statement start
and end just to enable/disable the socket check timeout.  Whereas
if you just let a once-a-minute timeout continue to run, you don't
incur those kernel calls.

It's possible that we should run this timeout differently depending
on whether or not a statement timeout is active, though I'd prefer to
avoid such complexity if possible.  On the whole, if we have to
optimize just one of those cases, it should be the no-statement-timeout
case; with that timeout active, you're paying two setitimers per
statement anyway.

Anyway, the core problem with the originally-submitted patch was that
it was totally ignorant that timeout.c had restrictions it was breaking.
You can either fix the restrictions, or you can try to design around them,
but you've got to be aware of what that code can and can't do today.

regards, tom lane
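
For illustration, the "let a once-a-minute timeout keep running" approach
could take roughly this shape. This is only a sketch: the timeout.c calls
(RegisterTimeout, enable_timeout_after) are the existing API, but every other
name here is hypothetical and nothing below is taken from a posted patch:

#include "postgres.h"
#include "miscadmin.h"          /* InterruptPending, MyLatch */
#include "storage/latch.h"      /* SetLatch() */
#include "utils/timeout.h"      /* RegisterTimeout(), enable_timeout_after() */

static volatile sig_atomic_t socket_check_pending = false;
static TimeoutId socket_check_timeout;

/* Runs at signal time: only set flags and poke the latch. */
static void
socket_check_timeout_handler(void)
{
    socket_check_pending = true;
    InterruptPending = true;
    SetLatch(MyLatch);
}

/* Hypothetical: called once at backend startup. */
static void
setup_socket_check_timeout(void)
{
    socket_check_timeout = RegisterTimeout(USER_TIMEOUT,
                                           socket_check_timeout_handler);
    enable_timeout_after(socket_check_timeout, 60 * 1000);
}

/* Hypothetical: called from an interrupt-safe spot such as ProcessInterrupts(). */
static void
maybe_check_client_socket(void)
{
    if (!socket_check_pending)
        return;
    socket_check_pending = false;

    /* ... probe the client socket here; ereport(FATAL, ...) if it is gone ... */

    /* Re-arm for the next minute. */
    enable_timeout_after(socket_check_timeout, 60 * 1000);
}

The re-arm happens once a minute rather than at every statement start and end,
which is the tradeoff described in (2) above.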




Re: Should we add xid_current() or a int8->xid cast?

2020-04-03 Thread Mark Dilger



> On Apr 2, 2020, at 7:39 PM, Thomas Munro  wrote:
> 
> 

These apply cleanly, build and pass check-world on mac, and the documentation 
and regression test changes surrounding txid look good to me.

FYI, (not the responsibility of this patch), we never quite define what the 
abbreviation "xip" stands for.  If "Active xid8s at the time of the snapshot." 
were rewritten as "In progress xid8s at the time of the snapshot", it might be 
slightly easier for the reader to figure out that "xip" = "Xid8s In Progress".  
As it stands, nothing in the docs seems to explain the abbreviation.  See 
doc/src/sgml/func.sgml

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: adding partitioned tables to publications

2020-04-03 Thread Petr Jelinek

On 03/04/2020 16:59, Tom Lane wrote:

Petr Jelinek  writes:

AFAIK gcov can't handle multiple instances of same process being started
as it just overwrites the coverage files. So for TAP test it will report
bogus info (as in some code that's executed will look as not executed).


Hm, really?  I routinely run "make check" (ie, parallel regression
tests) under coverage, and I get results that seem sane.  If I were
losing large chunks of the data, I think I'd have noticed.



Parallel regression still just starts single postgres instance no?

--
Petr Jelinek
2ndQuadrant - PostgreSQL Solutions for the Enterprise
https://www.2ndQuadrant.com/




Re: adding partitioned tables to publications

2020-04-03 Thread Tom Lane
Petr Jelinek  writes:
> AFAIK gcov can't handle multiple instances of same process being started 
> as it just overwrites the coverage files. So for TAP test it will report 
> bogus info (as in some code that's executed will look as not executed). 

Hm, really?  I routinely run "make check" (ie, parallel regression
tests) under coverage, and I get results that seem sane.  If I were
losing large chunks of the data, I think I'd have noticed.

regards, tom lane




Re: adding partitioned tables to publications

2020-04-03 Thread Petr Jelinek

Hi,

On 03/04/2020 16:25, Amit Langote wrote:

On Fri, Apr 3, 2020 at 6:34 PM Amit Langote  wrote:

I am checking test coverage at the moment and should have the patches
ready by sometime later today.


Attached updated patches.

I confirmed using a coverage build that all the new code in
logical/worker.c due to 0002 is now covered. For some reason, coverage
report for pgoutput.c doesn't say the same thing for 0003's changes,
although I doubt that result.  It seems strange to believe that *none*
of the new code is tested.  I even checked by adding debugging elog()s
next to the lines that the coverage report says aren't exercised,
which tell me that that's not true. Perhaps my coverage build is
somehow getting messed up, so it would be nice if someone with
reliable coverage builds can confirm one way or the other. I will
continue to check what's wrong.



AFAIK gcov can't handle multiple instances of same process being started 
as it just overwrites the coverage files. So for TAP test it will report 
bogus info (as in some code that's executed will look as not executed). 
We'd probably have to do some kind of `GCOV_PREFIX` magic in the TAP 
framework and merge (gcov/lcov can do that AFAIK) the resulting files to 
get accurate coverage info. But that's beyond this patch IMHO.


--
Petr Jelinek
2ndQuadrant - PostgreSQL Solutions for the Enterprise
https://www.2ndQuadrant.com/




Re: zombie connections

2020-04-03 Thread Robert Haas
On Fri, Apr 3, 2020 at 10:34 AM Tom Lane  wrote:
> In general I think the threshold problem for a patch like this will be
> "how do you keep the added overhead down".  As Robert noted upthread,
> timeout.c is quite a bit shy of being able to handle timeouts that
> persist across statements.  I don't think that there's any fundamental
> reason it can't be improved, but it will need improvements.

Why do we need that? If we're not executing a statement, we're
probably trying to read() from the socket, and we'll notice if that
returns 0 or -1. So it seems like we only need periodic checks while
there's a statement  in progress.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: zombie connections

2020-04-03 Thread Tom Lane
Pavel Stehule  writes:
> I prefer a simple solution without any "intelligence". It is much safer to
> close the connection and roll back. Then the protocol is clean - when the server
> didn't report successful completion of an operation, the operation was reverted - always.

It would be a fatal mistake to imagine that this feature would offer any
greater guarantees in that line than we have today (which is to say,
none really).  It can be no better than the OS network stack's error
detection/reporting, which is necessarily pretty weak.  The fact that
the kernel accepted a "command complete" message from us doesn't mean
that the client was still alive at that instant, much less that the
message will be deliverable.

In general I think the threshold problem for a patch like this will be
"how do you keep the added overhead down".  As Robert noted upthread,
timeout.c is quite a bit shy of being able to handle timeouts that
persist across statements.  I don't think that there's any fundamental
reason it can't be improved, but it will need improvements.

regards, tom lane




Re: [HACKERS] make async slave to wait for lsn to be replayed

2020-04-03 Thread Alexey Kondratov

On 2020-04-01 02:26, Anna Akenteva wrote:

On 2020-03-27 04:15, Kartyshov Ivan wrote:

Anna, feel free to work on this patch.


Ivan and I worked on this patch a bit more. We fixed the bugs that we
could find and cleaned up the code. For now, we've kept both options:
WAIT as a standalone statement and WAIT as a part of BEGIN. The new
patch is attached.

The syntax looks a bit different now:

- WAIT FOR [ANY | ALL] event [, ...]
- BEGIN [ WORK | TRANSACTION ] [ transaction_mode [, ...] ] [ WAIT FOR
[ANY | ALL] event [, ...]]
where event is one of:
LSN value
TIMEOUT number_of_milliseconds
timestamp

Now, one event cannot contain both an LSN and a TIMEOUT.



In my understanding the whole idea of having TIMEOUT was to do something 
like 'Do wait for this LSN to be replicated, but no longer than TIMEOUT 
milliseconds'. What is the point of having plain TIMEOUT? It seems to be 
equivalent to pg_sleep, doesn't it?



Regards
--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company




Re: adding partitioned tables to publications

2020-04-03 Thread Amit Langote
On Fri, Apr 3, 2020 at 6:34 PM Amit Langote  wrote:
> I am checking test coverage at the moment and should have the patches
> ready by sometime later today.

Attached updated patches.

I confirmed using a coverage build that all the new code in
logical/worker.c due to 0002 is now covered. For some reason, coverage
report for pgoutput.c doesn't say the same thing for 0003's changes,
although I doubt that result.  It seems strange to believe that *none*
of the new code is tested.  I even checked by adding debugging elog()s
next to the lines that the coverage report says aren't exercised,
which tell me that that's not true. Perhaps my coverage build is
somehow getting messed up, so it would be nice if someone with
reliable coverage builds can confirm one way or the other. I will
continue to check what's wrong.

I fixed a couple of bugs in 0002. One of the bugs was that the
"partition map" hash table in logical/relation.c didn't really work,
so logicalrep_partition_open() would always create a new entry.

In 0003, changed the publication parameter name to publish_via_partition_root.

-- 
Thank you,

Amit Langote
EnterpriseDB: http://www.enterprisedb.com


v16-0001-Add-subscription-support-to-replicate-into-parti.patch
Description: Binary data


v16-0002-Publish-partitioned-table-inserts-as-its-own.patch
Description: Binary data


Re: Proposal: is_castable

2020-04-03 Thread Tom Lane
Michał Wadas  writes:
> Currently there is no way to check if CAST will succeed.
> Therefore I propose adding new function: is_castable

> SELECT is_castable('foo' as time) // false

What would you actually do with it?

> Similar features are implemented in:
> - SQL Server (as TRY_CONVERT)
> - Oracle (as CONVERT([val] DEFAULT [expr] ON CONVERSION ERROR)

Somehow, I don't think those have the semantics of what you suggest here.

I suspect you are imagining that you could write something like

CASE WHEN is_castable(x as y) THEN cast(x as y) ELSE ...

but that will not work.  The THEN condition has to pass parse analysis
whether or not execution will ever reach it.

regards, tom lane




where should I stick that backup?

2020-04-03 Thread Robert Haas
There are a couple of things that pg_basebackup can't do that might be
an issue for some users. One of them is that you might want to do
something like encrypt your backup. Another is that you might want to
store someplace other than in the filesystem, like maybe S3. We could
certainly teach pg_basebackup how to do specifically those things, and
maybe that is worthwhile. However, I wonder if it would be useful to
provide a more general capability, either instead of doing those more
specific things or in addition to doing those more specific things.

What I'm thinking about is: suppose we add an option to pg_basebackup
with a name like --pipe-output. This would be mutually exclusive with
-D, but would work at least with -Ft and maybe also with -Fp. The
argument to --pipe-output would be a shell command to be executed once
per output file. Any instance of %f in the shell command would be
replaced with the name of the file that would have been written (and
%% would turn into a single %). The shell command itself would be
executed via system(). So if you want to compress, but using some
other compression program instead of gzip, you could do something
like:

pg_basebackup -Ft --pipe-output 'bzip2 > %f.bz2'

And if you want to encrypt, you could do something like:

pg_basebackup -Ft --pipe-output 'gpg -e -o %f.gpg'

And if you want to ship it off to be stored in a concrete bunker deep
underground, you can just do something like:

pg_basebackup -Ft --pipe-output 'send-to-underground-storage.sh
backup-2020-04-03 %f'

You still have to write send-to-underground-storage.sh, of course, and
that may involve some work, and maybe also some expensive
construction. But what you don't have to do is first copy the entire
backup to your local filesystem and then as a second step figure out
how to put it through whatever post-processing it needs. Instead, you
can simply take your backup and stick it anywhere you like.

Thoughts?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
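
A minimal sketch of the %f / %% expansion described above, assuming a
hypothetical helper inside pg_basebackup; the function name is invented and
only the substitution rules come from the proposal. The expanded string would
then be handed to system(), once per output file:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: expand a --pipe-output template, replacing %f with
 * the output file name and %% with a literal %.  Caller must free() the
 * result.
 */
static char *
expand_pipe_command(const char *pattern, const char *filename)
{
    size_t      flen = strlen(filename);
    size_t      nsubst = 0;
    const char *src;
    char       *result;
    char       *dst;

    /* First pass: count %f occurrences to size the buffer. */
    for (src = pattern; *src; src++)
    {
        if (src[0] == '%' && (src[1] == 'f' || src[1] == '%'))
        {
            if (src[1] == 'f')
                nsubst++;
            src++;              /* skip the escaped character */
        }
    }

    result = malloc(strlen(pattern) + nsubst * flen + 1);
    if (result == NULL)
        return NULL;

    /* Second pass: copy, expanding %f and %%. */
    dst = result;
    for (src = pattern; *src; src++)
    {
        if (src[0] == '%' && src[1] == 'f')
        {
            memcpy(dst, filename, flen);
            dst += flen;
            src++;
        }
        else if (src[0] == '%' && src[1] == '%')
        {
            *dst++ = '%';
            src++;
        }
        else
            *dst++ = *src;
    }
    *dst = '\0';
    return result;
}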




Re: WAL usage calculation patch

2020-04-03 Thread Amit Kapila
On Fri, Apr 3, 2020 at 9:40 AM Amit Kapila  wrote:
>
> On Fri, Apr 3, 2020 at 9:35 AM Dilip Kumar  wrote:
> >
> > I have analyzed the WAL and there could be multiple reasons for the
> > same.  With small data, I have noticed that while inserting in the
> > system index there was a Page Split and that created extra WAL.
> >
>
> Thanks for the investigation.  I think it is clear that we can't
> expect the same WAL size even if we repeat the same operation unless
> it is a fresh database.
>

Attached find the latest patches.  I have modified based on our
discussion on user interface thread [1], ran pgindent on all patches,
slightly modified one comment based on Dilip's input and added commit
messages.  I think the patches are in good shape.  I would like to
commit the first patch in this series tomorrow unless I see more
comments or any other objections.  The patch-2 might need to be
rebased if the other related patch [2] got committed first and we
might need to tweak a bit based on the input from other thread [1]
where we are discussing user interface for it.


[1] - 
https://www.postgresql.org/message-id/CAA4eK1%2Bo1Vj4Rso09pKOaKhY8QWTA0gWwCL3TGCi1rCLBBf-QQ%40mail.gmail.com
[2] - 
https://www.postgresql.org/message-id/E1jKC4J-0007R3-Bo%40gemulon.postgresql.org

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


v14-0001-Add-infrastructure-to-track-WAL-usage.patch
Description: Binary data


v14-0002-Add-the-option-to-report-WAL-usage-in-EXPLAIN-an.patch
Description: Binary data


v14-0003-Allow-pg_stat_statements-to-track-WAL-usage-stat.patch
Description: Binary data


v14-0004-Allow-verbose-auto-vacuum-to-display-WAL-usage-s.patch
Description: Binary data


Re: zombie connections

2020-04-03 Thread Pavel Stehule
pá 3. 4. 2020 v 15:52 odesílatel Isaac Morland 
napsal:

> On Fri, 3 Apr 2020 at 08:30, Robert Haas  wrote:
>
>>
>> I don't have a terribly specific idea about how to improve this, but
>> is there some way that we could, at least periodically, check the
>> socket to see whether it's dead? Noticing the demise of the client
>> after a configurable interval (maybe 60s by default?) would be
>> infinitely better than never.
>>
>
> Does it make any difference if the query is making changes? If the query
> is just computing a result and returning it to the client, there is no
> point in continuing once the socket is closed. But if it is updating data
> or making DDL changes, then at least some of the time it would be
> preferable for the changes to be made. Having said that, in normal
> operation one wants, at the client end, to see the message from the server
> that the changes have been completed, not just fire off a change and hope
> it completes.
>

I prefer a simple solution without any "intelligence". It is much safer to
close the connection and roll back. Then the protocol is clean - when the server
didn't report successful completion of an operation, the operation was reverted - always.


Re: zombie connections

2020-04-03 Thread Robert Haas
On Fri, Apr 3, 2020 at 9:52 AM Isaac Morland  wrote:
> Does it make any difference if the query is making changes? If the query is 
> just computing a result and returning it to the client, there is no point in 
> continuing once the socket is closed. But if it is updating data or making 
> DDL changes, then at least some of the time it would be preferable for the 
> changes to be made. Having said that, in normal operation one wants, at the 
> client end, to see the message from the server that the changes have been 
> completed, not just fire off a change and hope it completes.

The system can't know whether the query is going to change anything,
because even if the query is a SELECT, it doesn't know whether any of
the functions or operators called from that SELECT might write data.

I don't think it would be smart to make behavior like this depend on
whether the statement is a SELECT vs. INSERT/UPDATE/DELETE, or on
things like whether there is an explicit transaction open. I think we
should just have a feature that kills the server process if the
connection goes away. If some people don't want that, it can be
optional.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: zombie connections

2020-04-03 Thread Isaac Morland
On Fri, 3 Apr 2020 at 08:30, Robert Haas  wrote:

>
> I don't have a terribly specific idea about how to improve this, but
> is there some way that we could, at least periodically, check the
> socket to see whether it's dead? Noticing the demise of the client
> after a configurable interval (maybe 60s by default?) would be
> infinitely better than never.
>

Does it make any difference if the query is making changes? If the query is
just computing a result and returning it to the client, there is no point
in continuing once the socket is closed. But if it is updating data or
making DDL changes, then at least some of the time it would be preferable
for the changes to be made. Having said that, in normal operation one
wants, at the client end, to see the message from the server that the
changes have been completed, not just fire off a change and hope it
completes.


Re: Nicer error when connecting to standby with hot_standby=off

2020-04-03 Thread James Coleman
On Thu, Apr 2, 2020 at 5:53 PM David Zhang  wrote:
>
> The following review has been posted through the commitfest application:
> make installcheck-world:  not tested
> Implements feature:   tested, passed
> Spec compliant:   not tested
> Documentation:not tested
>
> I applied the patch to the latest master branch and ran the test below. The 
> error messages are now separated. Below are the test steps.
>
> ### setup primary server
> initdb -D /tmp/primary/data
> mkdir /tmp/archive_dir
> echo "archive_mode='on'" >> /tmp/primary/data/postgresql.conf
> echo "archive_command='cp %p /tmp/archive_dir/%f'" >> 
> /tmp/primary/data/postgresql.conf
> pg_ctl -D /tmp/primary/data -l /tmp/primary-logs start
>
> ### setup host standby server
> pg_basebackup -p 5432 -w -R -D /tmp/hotstandby
> echo "primary_conninfo='host=127.0.0.1 port=5432 user=pgdev'" >> 
> /tmp/hotstandby/postgresql.conf
> echo "restore_command='cp /tmp/archive_dir/%f %p'" >> 
> /tmp/hotstandby/postgresql.conf
> echo "hot_standby = off" >> /tmp/hotstandby/postgresql.conf
> pg_ctl -D /tmp/hotstandby -l /tmp/hotstandby-logs -o "-p 5433" start
>
> ### keep trying to connect to hot standby server in order to get the error 
> messages in different stages.
> while true; do echo "`date`"; psql postgres -p 5433 -c "SELECT 
> txid_current_snapshot();"; sleep 0.2; done
>
> ### before the patch
> psql: error: could not connect to server: FATAL:  the database system is 
> starting up
> ...
>
> ### after the patch, got different messages, one message indicates 
> hot_standby is off
> psql: error: could not connect to server: FATAL:  the database system is 
> starting up
> ...
> psql: error: could not connect to server: FATAL:  the database system is up, 
> but hot_standby is off
> ...

Thanks for the review and testing!

James




Re: zombie connections

2020-04-03 Thread Robert Haas
On Fri, Apr 3, 2020 at 9:34 AM Konstantin Knizhnik
 wrote:
> There was a patch on commitfest addressing this problem:
> https://commitfest.postgresql.org/21/1882/
> It is currently included in PostgresPro EE, but the author of the patch
> is not working in our company any more.
> Should we resurrect this patch, or is there something wrong with the
> proposed approach?

Thanks for the link.

Tom seems to have offered some fairly specific criticism in
https://www.postgresql.org/message-id/31564.1563426253%40sss.pgh.pa.us

I haven't studied that thread in detail, but I would suggest that if
you want to resurrect the patch, that might be a good place to start.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: User Interface for WAL usage data

2020-04-03 Thread Julien Rouhaud
On Fri, Apr 3, 2020 at 11:58 AM Dilip Kumar  wrote:
>
> On Fri, Apr 3, 2020 at 1:57 PM Amit Kapila  wrote:
> >
> > On Fri, Apr 3, 2020 at 11:29 AM Amit Kapila  wrote:
> > >
> > > On Fri, Apr 3, 2020 at 11:14 AM Justin Pryzby  
> > > wrote:
> > > >
> > > > On Fri, Apr 03, 2020 at 10:52:02AM +0530, Amit Kapila wrote:
> > > > > On Fri, Apr 3, 2020 at 10:41 AM Justin Pryzby  
> > > > > wrote:
> > > > > >
> > > > > > > > > > > >WAL: records=2359 full page records=42 bytes=447788
> > > > > >
> > > > > > 1) records; 2) pages ("full page images"); 3) bytes
> > > > > >
> > > > > > That is exactly like sort (method/type/size) and hash 
> > > > > > (buckets/batches/size),
> > > > > > and *not* like buffers, which shows various values all in units of 
> > > > > > "pages".
> > > > > >
> > > > >
> > > > > The way you have written (2) appears a bit awkward. I would prefer
> > > > > "full page writes" or "full page images".
> > > >
> > > > I didn't mean it to be the description used in the patch or anywhere 
> > > > else, just
> > > > the list of units.
> > > >
> > > > I wonder if it should use colons instead of equals ?  As in:
> > > > |WAL: Records: 2359  Full Page Images: 42  Size: 437kB
> > > >
> > > > Note, that has: 1) two spaces; 2) capitalized "fields"; 3) size rather 
> > > > than
> > > > "bytes".  That's similar to Buckets:
> > > > |Buckets: 1024  Batches: 1  Memory Usage: 44kB
> > > >
> > > > I'm not sure if it should say "WAL: " or "WAL ", or perhaps "WAL:  "  If
> > > > there's no colon, then it looks like the first field is "WAL Records", 
> > > > but then
> > > > "size" isn't as tightly associated with WAL.  It could say:
> > > > |WAL Records: n  Full Page Images: n  WAL Size: nkB
> > > >
> > > > For comparison, buffers uses "equals" for the case showing multiple 
> > > > "fields",
> > > > which are all in units of pages:
> > > > |Buffers: shared hit=15 read=2006
> > > >
> > >
> > > I think this is more close to the case of Buffers where all fields are
> > > directly related to buffers/blocks.  Here all the fields we want to
> > > display are related to WAL, so we should try to make it display
> > > similar to Buffers.
> > >
> >
> > Dilip, Julien, others, do you have any suggestions here? I think we
> > need to decide something now.  We can change a few things like from
> > 'two spaces' to 'one space' between fields later as well.
>
> I also think it is closer to BufferUsage, so better to keep it
> similar to that.

+1 too for keeping consistency with BufferUsage, and adding extra
spaces if needed.

>  If we think the parsing is the problem we can keep
> '_'  in the multi-word name as shown below.
> WAL: records=n full_page_writes=n bytes=n

I'm fine with it too.

To answer Justin too:

> Also, for now, the output can be in kB, but I think in the future we should
> take a recent suggestion from Andres to make an ExplainPropertyBytes() which
> handles conversion to and display of a reasonable unit.

This could be nice, but I think that it raises some extra concerns.
There are multiple tools that parse those outputs, and having to deal
with new and non-fixed units may cause some issues.  And probably
the non-text output would also need to be displayed differently.




Re: zombie connections

2020-04-03 Thread Konstantin Knizhnik




On 03.04.2020 15:29, Robert Haas wrote:

Hi,

Suppose that the server is executing a lengthy query, and the client
breaks the connection. The operating system will be aware that the
connection is no more, but PostgreSQL doesn't notice, because it's not
trying to read from or write to the socket. It's not paying attention to 
the socket at all. In theory, the query could be one that runs for a
million years and continue to chew up CPU and I/O, or at the very
least a connection slot, essentially forever. That's sad.

I don't have a terribly specific idea about how to improve this, but
is there some way that we could, at least periodically, check the
socket to see whether it's dead? Noticing the demise of the client
after a configurable interval (maybe 60s by default?) would be
infinitely better than never.



There was a patch on commitfest addressing this problem:
https://commitfest.postgresql.org/21/1882/
It is currently included in PostgresPro EE, but the author of the patch 
is not working in our company any more.
Should we resurrect this patch, or is there something wrong with the 
proposed approach?






Re: zombie connections

2020-04-03 Thread Julien Rouhaud
On Fri, Apr 03, 2020 at 02:40:30PM +0200, Pavel Stehule wrote:
> pá 3. 4. 2020 v 14:30 odesílatel Robert Haas  napsal:
> >
> > Suppose that the server is executing a lengthy query, and the client
> > breaks the connection. The operating system will be aware that the
> > connection is no more, but PostgreSQL doesn't notice, because it's not
> > trying to read from or write to the socket. It's not paying attention to
> > the socket at all. In theory, the query could be one that runs for a
> > million years and continue to chew up CPU and I/O, or at the very
> > least a connection slot, essentially forever. That's sad.
> >
> > I don't have a terribly specific idea about how to improve this, but
> > is there some way that we could, at least periodically, check the
> > socket to see whether it's dead? Noticing the demise of the client
> > after a configurable interval (maybe 60s by default?) would be
> > infinitely better than never.
> >
> 
> + 1


+1 too, I already saw such behavior.


Maybe the postmaster could send some new PROCSIG SIGUSR1 signal to backends at
a configurable interval and let ProcessInterrupts handle it?




Re: [Proposal] Global temporary tables

2020-04-03 Thread Prabhat Sahu
Hi Wenjing,

Please check the allowed values for boolean parameter
"on_commit_delete_rows".

postgres=# create global temp table gtt1(c1 int)
with(on_commit_delete_rows='true');
CREATE TABLE
Similarly we can successfully create GTT by using the values as:
'true','false', true, false, 'ON', 'OFF', ON, OFF, 1, 0 for boolean
parameter "on_commit_delete_rows"

But we are getting error while using the boolean value as: '1', '0', 't',
'f', 'yes', 'no', 'y', 'n' as below.
postgres=# create global temp table gtt11(c1 int)
with(on_commit_delete_rows='1');
ERROR:  on_commit_delete_rows requires a Boolean value
postgres=# create global temp table gtt11(c1 int)
with(on_commit_delete_rows='0');
ERROR:  on_commit_delete_rows requires a Boolean value
postgres=# create global temp table gtt11(c1 int)
with(on_commit_delete_rows='t');
ERROR:  on_commit_delete_rows requires a Boolean value
postgres=# create global temp table gtt11(c1 int)
with(on_commit_delete_rows='f');
ERROR:  on_commit_delete_rows requires a Boolean value
postgres=# create global temp table gtt11(c1 int)
with(on_commit_delete_rows='yes');
ERROR:  on_commit_delete_rows requires a Boolean value
postgres=# create global temp table gtt11(c1 int)
with(on_commit_delete_rows='no');
ERROR:  on_commit_delete_rows requires a Boolean value
postgres=# create global temp table gtt11(c1 int)
with(on_commit_delete_rows='y');
ERROR:  on_commit_delete_rows requires a Boolean value
postgres=# create global temp table gtt11(c1 int)
with(on_commit_delete_rows='n');
ERROR:  on_commit_delete_rows requires a Boolean value

-- As per the error message "ERROR:  on_commit_delete_rows requires a
Boolean value" either we should allow all the boolean values.

*Example*: CREATE VIEW view1 WITH (security_barrier = 'true') as select 5;
The syntax of VIEW allows all the above possible boolean values for the
boolean parameter "security_barrier"


-- or else we should change the error message to something like
"ERROR:  on_commit_delete_rows requires 'true','false','ON','OFF',1,0 as
Boolean value".

-- 

With Regards,
Prabhat Kumar Sahu
EnterpriseDB: http://www.enterprisedb.com
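
If I recall correctly, GUC booleans and the core boolean reloptions accept all
of those spellings because they are parsed with parse_bool()
(src/backend/utils/adt/bool.c). A minimal sketch of validating the option the
same way follows; the wrapper function is hypothetical, only parse_bool()
itself is the existing routine:

#include "postgres.h"
#include "utils/builtins.h"     /* parse_bool() */

/*
 * Hypothetical validation for on_commit_delete_rows that accepts the same
 * spellings GUC booleans do: true/false, on/off, yes/no, 1/0, t/f, ...
 */
static bool
gtt_on_commit_delete_rows(const char *value)
{
    bool        result;

    if (!parse_bool(value, &result))
        ereport(ERROR,
                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                 errmsg("on_commit_delete_rows requires a Boolean value")));
    return result;
}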


Re: zombie connections

2020-04-03 Thread Pavel Stehule
pá 3. 4. 2020 v 14:30 odesílatel Robert Haas  napsal:

> Hi,
>
> Suppose that the server is executing a lengthy query, and the client
> breaks the connection. The operating system will be aware that the
> connection is no more, but PostgreSQL doesn't notice, because it's not
> trying to read from or write to the socket. It's not paying attention to
> the socket at all. In theory, the query could be one that runs for a
> million years and continue to chew up CPU and I/O, or at the very
> least a connection slot, essentially forever. That's sad.
>
> I don't have a terribly specific idea about how to improve this, but
> is there some way that we could, at least periodically, check the
> socket to see whether it's dead? Noticing the demise of the client
> after a configurable interval (maybe 60s by default?) would be
> infinitely better than never.
>

+ 1

Pavel


> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
>
>


zombie connections

2020-04-03 Thread Robert Haas
Hi,

Suppose that the server is executing a lengthy query, and the client
breaks the connection. The operating system will be aware that the
connection is no more, but PostgreSQL doesn't notice, because it's not
trying to read from or write to the socket. It's not paying attention to
the socket at all. In theory, the query could be one that runs for a
million years and continue to chew up CPU and I/O, or at the very
least a connection slot, essentially forever. That's sad.

I don't have a terribly specific idea about how to improve this, but
is there some way that we could, at least periodically, check the
socket to see whether it's dead? Noticing the demise of the client
after a configurable interval (maybe 60s by default?) would be
infinitely better than never.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
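
One possible shape for such a periodic liveness probe, as a standalone sketch:
the function name is invented, MSG_DONTWAIT is a widespread but not
universally available flag, and none of this is taken from an existing patch:

#include <sys/types.h>
#include <sys/socket.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Sketch: return false if the client socket is known to be gone.
 * recv() with MSG_PEEK leaves any pending data in place, and MSG_DONTWAIT
 * keeps the call from blocking when there is nothing to read.  A true
 * result only means "not known dead": a peer that vanished without the
 * kernel seeing a FIN or RST will still look alive.
 */
static bool
client_socket_alive(int sock)
{
    char        buf;
    ssize_t     rc;

    rc = recv(sock, &buf, 1, MSG_PEEK | MSG_DONTWAIT);

    if (rc == 0)
        return false;           /* orderly shutdown from the client */
    if (rc < 0 && errno != EAGAIN && errno != EWOULDBLOCK && errno != EINTR)
        return false;           /* e.g. ECONNRESET */

    return true;                /* data pending, or simply nothing to read yet */
}

A backend could run this at the configurable interval suggested above from an
interrupt-safe point and exit with a FATAL error once it returns false; since
it can only notice a disconnect the kernel has already seen, it is best-effort.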




Re: Proposal: is_castable

2020-04-03 Thread Pavel Stehule
Hi

pá 3. 4. 2020 v 13:45 odesílatel Michał Wadas 
napsal:

> Currently there is no way to check if CAST will succeed.
>
> Therefore I propose adding new function: is_castable
>
> SELECT is_castable('foo' as time) // false
> SELECT is_castable('123' as numeric) // true
> SELECT is_castable(1.5 as int) // true
> SELECT is_castable('1.5' as int) // false
>
> Many users write their own functions:
>
> https://stackoverflow.com/q/10306830/2446102 (11k views, ~25 upvotes)
> https://stackoverflow.com/q/36775814/2446102
> https://stackoverflow.com/a/16206123/2446102 (72k views, 70 upvotes)
> https://stackoverflow.com/q/2082686/2446102 (174k views, ~150 upvotes)
>
> Similar features are implemented in:
> - SQL Server (as TRY_CONVERT)
> - Oracle (as CONVERT([val] DEFAULT [expr] ON CONVERSION ERROR)
>
> I would love to implement it myself, but my knowledge of C is superficial.
>

It's an interesting feature - and the implementation can be very easy - but
without enhancing the type API this function can be pretty slow.

So there is a dilemma - a simple implementation (little work) but with a possibly
very negative performance impact under higher load due to working with savepoints,
or much larger work (probably easy) without the necessity to use savepoints.

Regards

Pavel



>
> Thanks,
> Michał Wadas
>
>


Proposal: is_castable

2020-04-03 Thread Michał Wadas
Currently there is no way to check if CAST will succeed.

Therefore I propose adding new function: is_castable

SELECT is_castable('foo' as time) // false
SELECT is_castable('123' as numeric) // true
SELECT is_castable(1.5 as int) // true
SELECT is_castable('1.5' as int) // false

Many users write their own functions:

https://stackoverflow.com/q/10306830/2446102 (11k views, ~25 upvotes)
https://stackoverflow.com/q/36775814/2446102
https://stackoverflow.com/a/16206123/2446102 (72k views, 70 upvotes)
https://stackoverflow.com/q/2082686/2446102 (174k views, ~150 upvotes)

Similar features are implemented in:
- SQL Server (as TRY_CONVERT)
- Oracle (as CONVERT([val] DEFAULT [expr] ON CONVERSION ERROR)

I would love to implement it myself, but my knowledge of C is superficial.

Thanks,
Michał Wadas


Re: WIP/PoC for parallel backup

2020-04-03 Thread Kashif Zeeshan
Hi Asif

When a non-existent slot is used together with a tablespace, the correct error
is displayed, but the backup folder is not cleaned up, which leaves a corrupt
backup.

Steps
===

edb@localhost bin]$
[edb@localhost bin]$ mkdir /home/edb/tbl1
[edb@localhost bin]$ mkdir /home/edb/tbl_res
[edb@localhost bin]$
postgres=#  create tablespace tbl1 location '/home/edb/tbl1';
CREATE TABLESPACE
postgres=#
postgres=# create table t1 (a int) tablespace tbl1;
CREATE TABLE
postgres=# insert into t1 values(100);
INSERT 0 1
postgres=# insert into t1 values(200);
INSERT 0 1
postgres=# insert into t1 values(300);
INSERT 0 1
postgres=#


[edb@localhost bin]$
[edb@localhost bin]$  ./pg_basebackup -v -j 2 -D  /home/edb/Desktop/backup/
-T /home/edb/tbl1=/home/edb/tbl_res -S test
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/2E28 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: error: could not send replication command
"START_REPLICATION": ERROR:  replication slot "test" does not exist
pg_basebackup: backup worker (0) created
pg_basebackup: backup worker (1) created
pg_basebackup: write-ahead log end point: 0/2E000100
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: error: child thread exited with error 1
[edb@localhost bin]$

backup folder not cleaned

[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$ ls /home/edb/Desktop/backup
backup_label  globalpg_dynshmem  pg_ident.conf  pg_multixact
 pg_replslot  pg_snapshots  pg_stat_tmp  pg_tblspcPG_VERSION  pg_xact
postgresql.conf
base  pg_commit_ts  pg_hba.conf  pg_logical pg_notify
pg_serialpg_stat   pg_subtrans  pg_twophase  pg_wal
 postgresql.auto.conf
[edb@localhost bin]$




If the same case is executed without the parallel backup patch then the
backup folder is cleaned after the error is displayed.

[edb@localhost bin]$ ./pg_basebackup -v -D  /home/edb/Desktop/backup/ -T
/home/edb/tbl1=/home/edb/tbl_res -S test999
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/2B28 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: error: could not send replication command
"START_REPLICATION": ERROR:  replication slot "test999" does not exist
pg_basebackup: write-ahead log end point: 0/2B000100
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: error: child process exited with exit code 1
*pg_basebackup: removing data directory " /home/edb/Desktop/backup"*
pg_basebackup: changes to tablespace directories will not be undone


On Fri, Apr 3, 2020 at 1:46 PM Asif Rehman  wrote:

>
>
> On Thu, Apr 2, 2020 at 8:45 PM Robert Haas  wrote:
>
>> On Thu, Apr 2, 2020 at 11:17 AM Asif Rehman 
>> wrote:
>> >> Why would you need to do that? As long as the process where
>> >> STOP_BACKUP can do the check, that seems good enough.
>> >
>> > Yes, but the user will get the error only after the STOP_BACKUP, not
>> while the backup is
>> > in progress. So if the backup is a large one, early error detection
>> would be much beneficial.
>> > This is the current behavior of non-parallel backup as well.
>>
>> Because non-parallel backup does not feature early detection of this
>> error, it is not necessary to make parallel backup do so. Indeed, it
>> is undesirable. If you want to fix that problem, do it on a separate
>> thread in a separate patch. A patch proposing to make parallel backup
>> inconsistent in behavior with non-parallel backup will be rejected, at
>> least if I have anything to say about it.
>>
>> TBH, fixing this doesn't seem like an urgent problem to me. The
>> current situation is not great, but promotions ought to be relatively
>> infrequent, so I'm not sure it's a huge problem in practice. It is
>> also worth considering whether the right fix is to figure out how to
>> make that case actually work, rather than just making it fail quicker.
>> I don't currently understand the reason for the prohibition so I can't
>> express an intelligent opinion on what the right answer is here, but
>> it seems like it ought to be investigated before somebody goes and
>> builds a bunch of infrastructure to make the error more timely.
>>
>
> Non-parallel backup already does the early error checking. I only intended
>
> to make parallel behave the same as non-parallel here. So, I agree with
>
> you that the behavior of parallel backup should be consistent with the
>
> non-parallel one.  Please see the code snippet below from
>
> basebackup.c:sendDir()
>
>
> /*
>>
>>  * Check if the postmaster has signaled us to exit, and abort with an
>>
>>  * error in that case. The error handler further up will call
>>
>>  * do_pg_abort_backup() for us. Also check that if the backup was
>>
>>  * started while still 

Re: User Interface for WAL usage data

2020-04-03 Thread Dilip Kumar
On Fri, Apr 3, 2020 at 1:57 PM Amit Kapila  wrote:
>
> On Fri, Apr 3, 2020 at 11:29 AM Amit Kapila  wrote:
> >
> > On Fri, Apr 3, 2020 at 11:14 AM Justin Pryzby  wrote:
> > >
> > > On Fri, Apr 03, 2020 at 10:52:02AM +0530, Amit Kapila wrote:
> > > > On Fri, Apr 3, 2020 at 10:41 AM Justin Pryzby  
> > > > wrote:
> > > > >
> > > > > > > > > > >WAL: records=2359 full page records=42 bytes=447788
> > > > >
> > > > > 1) records; 2) pages ("full page images"); 3) bytes
> > > > >
> > > > > That is exactly like sort (method/type/size) and hash 
> > > > > (buckets/batches/size),
> > > > > and *not* like buffers, which shows various values all in units of 
> > > > > "pages".
> > > > >
> > > >
> > > > The way you have written (2) appears a bit awkward. I would prefer
> > > > "full page writes" or "full page images".
> > >
> > > I didn't mean it to be the description used in the patch or anywhere 
> > > else, just
> > > the list of units.
> > >
> > > I wonder if it should use colons instead of equals ?  As in:
> > > |WAL: Records: 2359  Full Page Images: 42  Size: 437kB
> > >
> > > Note, that has: 1) two spaces; 2) capitalized "fields"; 3) size rather 
> > > than
> > > "bytes".  That's similar to Buckets:
> > > |Buckets: 1024  Batches: 1  Memory Usage: 44kB
> > >
> > > I'm not sure if it should say "WAL: " or "WAL ", or perhaps "WAL:  "  If
> > > there's no colon, then it looks like the first field is "WAL Records", 
> > > but then
> > > "size" isn't as tightly associated with WAL.  It could say:
> > > |WAL Records: n  Full Page Images: n  WAL Size: nkB
> > >
> > > For comparison, buffers uses "equals" for the case showing multiple 
> > > "fields",
> > > which are all in units of pages:
> > > |Buffers: shared hit=15 read=2006
> > >
> >
> > I think this is more close to the case of Buffers where all fields are
> > directly related to buffers/blocks.  Here all the fields we want to
> > display are related to WAL, so we should try to make it display
> > similar to Buffers.
> >
>
> Dilip, Julien, others, do you have any suggestions here? I think we
> need to decide something now.  We can change a few things like from
> 'two spaces' to 'one space' between fields later as well.

I also think it is closer to BufferUsage, so better to keep it
similar to that.  If we think the parsing is the problem we can keep
'_'  in the multi-word name as shown below.
WAL: records=n full_page_writes=n bytes=n

And, all three fields are related to WAL so we can use WAL: followed
by other fields as we are doing now in the current patch.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Online checksums verification in the backend

2020-04-03 Thread Julien Rouhaud
On Fri, Apr 03, 2020 at 12:24:50PM +0900, Masahiko Sawada wrote:
> On Sat, 28 Mar 2020 at 21:19, Julien Rouhaud  wrote:
> >
> The current patch still checks SearchSysCacheExists1() before
> relation_open. Why do we need to call SearchSysCacheExists1() here? I
> think if the given relation doesn't exist, relation_open() will raise
> an error "could not open relation with OID %u".
> 
> +   /* Open the relation if it exists.  */
> +   if (SearchSysCacheExists1(RELOID, ObjectIdGetDatum(relid)))
> +   {
> +   relation = relation_open(relid, AccessShareLock);
> +   }

Oops yes sorry about that.  Fixed.


> > > 8.
> > > +   if (PageIsVerified(buffer, blkno))
> > > +   {
> > > +   /*
> > > +* If the page is really new, there won't by any checksum to 
> > > be
> > > +* computed or expected.
> > > +*/
> > > +   *chk_expected = *chk_found = NoComputedChecksum;
> > > +   return true;
> > > +   }
> > > +   else
> > > +   {
> > > +   /*
> > > +* There's a corruption, but since this affect PageIsNew, we
> > > +* can't compute a checksum, so set NoComputedChecksum for the
> > > +* expected checksum.
> > > +*/
> > > +   *chk_expected = NoComputedChecksum;
> > > +   *chk_found = hdr->pd_checksum;
> > > +   }
> > > +   return false;
> > >
> > > * I think the 'else' is not necessary here.
> >
> > AFAICT it's, see below.
> >
> > > * Setting *chk_expected and *chk_found seems useless when we return
> > > true. The caller doesn't use them.
> >
> > Indeed, fixed.
> 
> The patch still sets values to both?
> 
> +   if (PageIsVerified(buffer, blkno))
> +   {
> +   /* No corruption. */
> +   *chk_expected = *chk_found = NoComputedChecksum;
> +   return true;
> +   }


Sorry again, fixed.


> > > * Should we forcibly overwrite ignore_checksum_failure to off in
> > > pg_check_relation()? Otherwise, this logic seems not work fine.
> > >
> > > * I don't understand why we cannot compute a checksum in case where a
> > > page looks like a new page but is actually corrupted. Could you please
> > > elaborate on that?
> >
> > PageIsVerified has a different behavior depending on whether the page looks 
> > new
> > or not.  If the page looks like new, it only checks that it's indeed a new
> > page, and otherwise try to verify the checksum.
> >
> > Also, pg_check_page() has an assert to make sure that the page isn't (or 
> > don't
> > look like) new.
> >
> > So it seems to me that the 'else' is required to properly detect a real or 
> > fake
> > PageIsNew, and try to compute checksums only when required.
> 
> Thank you for your explanation! I understand.
> 
> I thought we can arrange the code to something like:
> 
> if (PageIsNew(hdr))
> {
> if (PageIsVerified(hdr))
> {
> *chk_expected = *chk_found = NoComputedChecksum;
> return true;
> }
> 
> *chk_expected = NoComputedChecksum;
> *chk_found = hdr->pd_checksum;
> return false;
> }
> 
> But since it's not a critical problem you can ignore it.


I like it, so done!


> > > 8.
> > > +   {
> > > +   {"checksum_cost_page_hit", PGC_USERSET, RESOURCES_CHECKSUM_DELAY,
> > > +   gettext_noop("Checksum cost for a page found in the buffer 
> > > cache."),
> > > +   NULL
> > > +   },
> > > +   ,
> > > +   1, 0, 1,
> > > +   NULL, NULL, NULL
> > > +   },
> > >
> > > * There is no description about the newly added four GUC parameters in 
> > > the doc.
> > >
> > > * We need to put new GUC parameters into postgresql.conf.sample as well.
> >
> > Fixed both.
> >
> > > * The patch doesn't use checksum_cost_page_hit at all.
> >
> > Indeed, I also realized that while working on previous issues.  I removed it
> > and renamed checksum_cost_page_miss to checksum_cost_page.
> 
> Perhaps we can use checksum_cost_page_hit when we found the page in
> the shared buffer but it's marked as dirty?


The thing is that when the buffer is dirty, we won't do any additional check,
thus not adding any overhead.  What may be needed here is to account for the
locking overhead (in all cases), so that if all (or almost all) the buffers are
dirty and in shared buffers the execution can be throttled.  I don't know how
much of an issue it can be, but if that's something to be fixed then page_hit
doesn't look like the right answer for that.


> I've read the latest patch and here is random comments:
> 
> 1.
> +   /*
> +* Add a page miss cost, as we're always reading outside the shared
> +* buffers.
> +*/
> +   /* Add a page cost. */
> +   ChecksumCostBalance += ChecksumCostPage;
> 
> There are duplicate comments.

Fixed.


> 2.
> +   /* Dirty pages are ignored as they'll be flushed soon. */
> +   if (buf_state & BM_DIRTY)
> +   checkit = false;
> 
> Should we check the buffer if it has BM_TAG_VALID as well here? I
> 

Re: adding partitioned tables to publications

2020-04-03 Thread Amit Langote
On Fri, Apr 3, 2020 at 4:52 PM Petr Jelinek  wrote:
> On 02/04/2020 14:23, Peter Eisentraut wrote:
> > On 2020-03-30 17:42, Amit Langote wrote:
> >> I have updated the comments in apply_handle_tuple_routing() (see 0002)
> >> to better explain what's going on with UPDATE handling.  I also
> >> rearranged the tests a bit for clarity.
> >>
> >> Attached updated patches.
> > > Also, the coverage report reveals that in logicalrep_partmap_init(), the
> > patch is mistakenly initializing LogicalRepRelMapContext instead of
> > LogicalRepPartMapContext.  (Hmm, how does it even work like that?)
> >
>
> It works because it's just a MemoryContext and it's long lived. I wonder
> if the fix here is to simply remove the LogicalRepPartMapContext...

Actually, there is no LogicalRepPartMapContext in the patches posted
so far, but I have decided to add it in the updated patch. One
advantage beside avoiding confusion is that it might help to tell
memory consumed by the partitions apart from that consumed by the
actual replication targets.

> > I think apart from some of these details, this patch is okay, but I
> > don't have deep experience in the partitioning code, I can just see that
> > it looks like other code elsewhere.  Perhaps someone with more knowledge
> > can give this a look as well.
> >
>
> FWIW it looks okay to me as well from perspective of somebody who
> implemented something similar outside of core.

Thanks for giving it a look.

> > About patch 0003, I was talking to some people offline about the name of
> > the option.  There was some confusion about using the term "schema". How
> > about naming it "publish_via_partition_root", which also matches the
> > name of the analogous option in pg_dump.
> >
>
> +1 (disclaimer: I was one of the people who discussed this offline)

Okay, I like that too.

I am checking test coverage at the moment and should have the patches
ready by sometime later today.

-- 
Thank you,

Amit Langote
EnterpriseDB: http://www.enterprisedb.com




Re: snapshot too old issues, first around wraparound and then more.

2020-04-03 Thread Amit Kapila
On Fri, Apr 3, 2020 at 6:52 AM Peter Geoghegan  wrote:
>
> On Thu, Apr 2, 2020 at 5:17 PM Andres Freund  wrote:
> > Since this is a feature that can result in wrong query results (and
> > quite possibly crashes / data corruption), I don't think we can just
> > leave this unfixed.  But given the amount of code / infrastructure
> > changes required to get this into a working feature, I don't see how we
> > can unleash those changes onto the stable branches.
>

As per my initial understanding, the changes required are (a) There
seem to be multiple places where TestForOldSnapshot is missing,  (b)
TestForOldSnapshot itself needs to be reviewed carefully to see if it
has problems, (c) Some of the members of OldSnapshotControlData like
head_timestamp and xid_by_minute are not maintained accurately, (d)
handling of wraparound for xids in the in-memory data-structure for
this feature is required, (e) test infrastructure is not good enough
to catch bugs or improve this feature.

Now, this sounds like quite a bit of work, but OTOH, as we can see, most of the
critical changes required will be in only a few functions like
TransactionIdLimitedForOldSnapshots(),
MaintainOldSnapshotTimeMapping(), TestForOldSnapshot().  I don't deny
the possibility that we might need much more work or we need to come
up with quite a different design to address all these problems, but
unless Kevin or someone else comes up with a solution to
address all of these problems, we can't be sure of that.

> I don't think that the feature can be allowed to remain in anything
> like its current form. The current design is fundamentally unsound.
>

Agreed, but OTOH, not giving time to Kevin or others who might be
interested in supporting this work is also not fair.  I think once
somebody comes up with patches for the problems, we can decide whether this
feature can be salvaged in the back-branches or needs to be disabled the
hard way.  Now, if Kevin himself is not interested in fixing it and nobody
else shows up to help, then surely we can take the decision sooner, but
giving it a couple of weeks (or even until we are near the PG13
release) doesn't seem like a bad idea.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

2020-04-03 Thread Andy Fan
On Fri, Apr 3, 2020 at 12:08 PM David Rowley  wrote:

> On Fri, 3 Apr 2020 at 16:40, Tom Lane  wrote:
> > (It occurs to me BTW that we've been overly conservative about using
> > NOT NULL constraints in planning, because of failing to consider that.
> > Addition or drop of NOT NULL has to cause a change in
> > pg_attribute.attnotnull, which will definitely cause a relcache inval
> > on its table, cf rules in CacheInvalidateHeapTuple().  So we *don't*
> > need to have a pg_constraint entry corresponding to the NOT NULL, as
> > we've mistakenly supposed in some past discussions.)
>
> Agreed for remove_useless_groupby_columns(), but we'd need it if we
> wanted to detect functional dependencies in
> check_functional_grouping() using unique indexes.
>

Thanks for the explanation.  I will add the removal in the next version of
this patch.

Best Regards
Andy Fan


Re: A bug when use get_bit() function for a long bytea string

2020-04-03 Thread Daniel Verite
movead...@highgo.ca wrote:

> >I believe the formula for the upper limit in the 32-bit case is: 
> >  (len <= PG_INT32_MAX / 8) ? (len*8 - 1) : PG_INT32_MAX; 
> 
> >These functions could benefit from a comment mentioning that 
> >they cannot reach the full extent of a bytea, because of the size limit 
> >on the bit number. 
> 
> Because the 2nd argument is describing 'bit' location of every byte in bytea
> string, so an int32 is not enough for that. I think the change is nothing
> wrong, 
> or I have not caught your means? 

In existing releases, the SQL definitions are set_bit(bytea,int4,int4)
and get_bit(bytea,int4), and they cannot be changed without breaking the API.
So the patch meant for existing releases has to deal with a too-narrow
int32 bit number.

Internally, in the C functions, you may convert that number to int64
if you think it's necessary, but you may not use PG_GETARG_INT64
to fetch a 32-bit argument.
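
For illustration only, a rough sketch of that approach (not the committed
fix; the function name, error code, and message wording are from memory and
may differ): the SQL signature stays int4, and the C code widens the bit
number to int64 before computing byte/bit offsets, using the upper-limit
formula quoted above:

#include "postgres.h"
#include "fmgr.h"

/* get_bit(bytea, int4): sketch of a 64-bit-safe implementation. */
Datum
byteaGetBit(PG_FUNCTION_ARGS)
{
	bytea	   *v = PG_GETARG_BYTEA_PP(0);
	int64		n = (int64) PG_GETARG_INT32(1);	/* still int4 at the SQL level */
	int64		len = (int64) VARSIZE_ANY_EXHDR(v);
	int64		maxbit;
	int64		byteNo;
	int			bitNo;

	/* Highest addressable bit, capped by what an int4 argument can express. */
	maxbit = (len <= PG_INT32_MAX / 8) ? (len * 8 - 1) : (int64) PG_INT32_MAX;

	if (n < 0 || n > maxbit)
		ereport(ERROR,
				(errcode(ERRCODE_ARRAY_SUBSCRIPT_ERROR),
				 errmsg("index " INT64_FORMAT " out of valid range, 0.." INT64_FORMAT,
						n, maxbit)));

	byteNo = n / 8;				/* cannot overflow now that n is int64 */
	bitNo = (int) (n % 8);

	PG_RETURN_INT32((((unsigned char *) VARDATA_ANY(v))[byteNo] >> bitNo) & 1);
}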


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite




Re: WIP/PoC for parallel backup

2020-04-03 Thread Asif Rehman
On Thu, Apr 2, 2020 at 8:45 PM Robert Haas  wrote:

> On Thu, Apr 2, 2020 at 11:17 AM Asif Rehman 
> wrote:
> >> Why would you need to do that? As long as the process where
> >> STOP_BACKUP can do the check, that seems good enough.
> >
> > Yes, but the user will get the error only after the STOP_BACKUP, not
> while the backup is
> > in progress. So if the backup is a large one, early error detection
> would be much beneficial.
> > This is the current behavior of non-parallel backup as well.
>
> Because non-parallel backup does not feature early detection of this
> error, it is not necessary to make parallel backup do so. Indeed, it
> is undesirable. If you want to fix that problem, do it on a separate
> thread in a separate patch. A patch proposing to make parallel backup
> inconsistent in behavior with non-parallel backup will be rejected, at
> least if I have anything to say about it.
>
> TBH, fixing this doesn't seem like an urgent problem to me. The
> current situation is not great, but promotions ought to be relatively
> infrequent, so I'm not sure it's a huge problem in practice. It is
> also worth considering whether the right fix is to figure out how to
> make that case actually work, rather than just making it fail quicker.
> I don't currently understand the reason for the prohibition so I can't
> express an intelligent opinion on what the right answer is here, but
> it seems like it ought to be investigated before somebody goes and
> builds a bunch of infrastructure to make the error more timely.
>

Non-parallel backup already does the early error checking. I only intended
to make parallel backup behave the same as non-parallel here. So, I agree with
you that the behavior of parallel backup should be consistent with the
non-parallel one.  Please see the code snippet below from
basebackup.c:sendDir():


/*
 * Check if the postmaster has signaled us to exit, and abort with an
 * error in that case. The error handler further up will call
 * do_pg_abort_backup() for us. Also check that if the backup was
 * started while still in recovery, the server wasn't promoted.
 * do_pg_stop_backup() will check that too, but it's better to stop
 * the backup early than continue to the end and fail there.
 */
CHECK_FOR_INTERRUPTS();
if (RecoveryInProgress() != backup_started_in_recovery)
    ereport(ERROR,
            (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
             errmsg("the standby was promoted during online backup"),
             errhint("This means that the backup being taken is corrupt "
                     "and should not be used. "
                     "Try taking another online backup.")));

> > Okay, then I will add the shared state. And since we are adding the
> shared state, we can use
> > that for throttling, progress-reporting and standby early error checking.
>
> Please propose a grammar here for all the new replication commands you
> plan to add before going and implement everything. That will make it
> easier to hash out the design without forcing you to keep changing the
> code. Your design should include a sketch of how several sets of
> coordinating backends taking several concurrent parallel backups will
> end up with one shared state per parallel backup.
>
> > There are two possible options:
> >
> > (1) Server may generate a unique ID i.e. BackupID= OR
> > (2) (Preferred Option) Use the WAL start location as the BackupID.
> >
> > This BackupID should be given back as a response to start backup
> command. All client workers
> > must append this ID to all parallel backup replication commands. So that
> we can use this identifier
> > to search for that particular backup. Does that sound good?
>
> Using the WAL start location as the backup ID seems like it might be
> problematic -- could a single checkpoint not end up as the start
> location for multiple backups started at the same time? Whether that's
> possible now or not, it seems unwise to hard-wire that assumption into
> the wire protocol.
>
> I was thinking that perhaps the client should generate a unique backup
> ID, e.g. leader does:
>
> START_BACKUP unique_backup_id [options]...
>
> And then others do:
>
> JOIN_BACKUP unique_backup_id
>
> My thought is that you will have a number of shared memory structure
> equal to max_wal_senders, each one large enough to hold the shared
> state for one backup. The shared state will include
> char[NAMEDATALEN-or-something] which will be used to hold the backup
> ID. START_BACKUP would allocate one and copy the name into it;
> JOIN_BACKUP would search for one by name.
>
> If you want to generate the name on the server side, then I suppose
> START_BACKUP would return a result set that includes the backup ID,
> and clients would have to specify that same backup ID when invoking
> JOIN_BACKUP. The rest would stay the same. I am not sure which way is
> better. Either way, the backup ID should be something long and hard to
> guess, not e.g. the leader processes' PID. I think we should generate
> it using 
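
Purely to illustrate the shared-state idea under discussion, a per-backup
slot might look roughly like the sketch below; every name here is
hypothetical and not taken from any posted patch.  One array entry exists
per walsender, and a worker's JOIN_BACKUP locates its slot by backup id.

#include "postgres.h"
#include "access/xlogdefs.h"

/* Hypothetical per-backup shared-memory slot. */
typedef struct ParallelBackupSlot
{
	bool		in_use;
	char		backup_id[NAMEDATALEN];	/* set by START_BACKUP, matched by JOIN_BACKUP */
	XLogRecPtr	startptr;				/* backup start LSN handed back to the leader */
	bool		started_in_recovery;	/* lets each worker run the early promotion check */
	int			nworkers;				/* attached workers; usable for throttling/progress */
} ParallelBackupSlot;

/* Array sized by max_wal_senders, allocated in shared memory at startup. */
static ParallelBackupSlot *ParallelBackupSlots;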

Re: [Proposal] Global temporary tables

2020-04-03 Thread Pavel Stehule
On Fri, Apr 3, 2020 at 9:52 AM 曾文旌  wrote:

> In my opinion
> 1 We are developing GTT according to the SQL standard, not Oracle.
>
> 2 The implementation differences you listed come from pg and oracle
> storage modules and DDL implementations.
>
> 2.1 issue 1 and issue 2
> The creation of Normal table/GTT defines the catalog and initializes the
> data store file, in the case of the GTT, which initializes the store file
> for the current session.
> But in oracle It just looks like only defines the catalog.
> This causes other sessions can not drop the GTT in PostgreSQL.
> This is the reason for issue 1 and issue 2, I think it is reasonable.
>
> 2.2 issue 3
> I thinking the logic of drop GTT is
> When only the current session is using the GTT, it is safe to drop the
> GTT.
> because the GTT's definition and storage files can completely delete from
> db.
> But, If multiple sessions are using this GTT, it is hard to drop GTT in
> session a, because remove the local buffer and data file of the GTT in
> other session is difficult.
> I am not sure why oracle has this limitation.
> So, issue 3 is reasonable.
>
> 2.3 TRUNCATE Normal table/GTT
> TRUNCATE Normal table / GTT clean up the logical data but not unlink data
> store file. in the case of the GTT, which is the store file for the
> current session.
> But in oracle,  It just looks like data store file was cleaned up.
> PostgreSQL storage is obviously different from oracle, In other words,
> session is detached from storage.
> This is the reason for issue4 I think it is reasonable.
>

Although the implementation of GTT is different, I think TRUNCATE on
Postgres (when it is really finalized) can remove the session metadata of the
GTT too (and decrement the usage counter). It is not a critical feature, but I
think it should not be hard to implement. For practical reasons it would be
nice to have a way to refresh a GTT without having to close the session;
TRUNCATE can be that tool.

Regards

Pavel


> All in all, I think the current implementation is sufficient for dba to
> manage GTT.
>
> On Apr 2, 2020, at 4:45 PM, Prabhat Sahu  wrote:
>
> Hi All,
>
> I have noted down few behavioral difference in our GTT implementation in
> PG as compared to Oracle DB:
> As per my understanding, the behavior of DROP TABLE in case of "Normal
> table and GTT" in Oracle DB are as below:
>
>1. Any tables(Normal table / GTT) without having data in a session, we
>will be able to DROP from another session.
>2. For a completed transaction on a normal table having data, we will
>be able to DROP from another session. If the transaction is not yet
>complete, and we are trying to drop the table from another session, then we
>will get an error. (working as expected)
>3. For a completed transaction on GTT with(on commit delete rows)
>(i.e. no data in GTT) in a session, we will be able to DROP from another
>session.
>4. For a completed transaction on GTT with(on commit preserve rows)
>with data in a session, we will not be able to DROP from any session(not
>even from the session in which GTT is created), we need to truncate the
>table data first from all the session(session1, session2) which is having
>data.
>
> *1. Any tables(Normal table / GTT) without having data in a session, we
> will be able to DROP from another session.*
> *Session1:*
> create table t1 (c1 integer);
> create global temporary table gtt1 (c1 integer) on commit delete rows;
> create global temporary table gtt2 (c1 integer) on commit preserve rows;
>
> *Session2:*
> drop table t1;
> drop table gtt1;
> drop table gtt2;
>
> -- *Issue 1:* But we are able to drop a simple table and failed to drop
> GTT as below.
>
> postgres=# drop table t1;
> DROP TABLE
> postgres=# drop table gtt1;
> ERROR:  can not drop relation gtt1 when other backend attached this global
> temp table
> postgres=# drop table gtt2;
> ERROR:  can not drop relation gtt2 when other backend attached this global
> temp table
>
>
> *3. For a completed transaction on GTT with(on commit delete rows) (i.e.
> no data in GTT) in a session, we will be able to DROP from another session.*
>
> *Session1:*create global temporary table gtt1 (c1 integer) on commit
> delete rows;
>
> *Session2:*
> drop table gtt1;
>
> -- *Issue 2:* But we are getting error for GTT
> with(on_commit_delete_rows) without data.
>
> postgres=# drop table gtt1;
> ERROR:  can not drop relation gtt1 when other backend attached this global
> temp table
>
>
> *4. For a completed transaction on GTT with(on commit preserve rows) with
> data in any session, we will not be able to DROP from any session(not even
> from the session in which GTT is created)*
>
> *Case1:*
> create global temporary table gtt2 (c1 integer) on commit preserve rows;
> insert into gtt2 values(100);
> drop table gtt2;
>
> SQL> drop table gtt2;
> drop table gtt2
>   *
> ERROR at line 1:
> ORA-14452: attempt to create, alter or drop an index on temporary table
> already in use
>
> -- *Issue 3:* But, 

Re: User Interface for WAL usage data

2020-04-03 Thread Amit Kapila
On Fri, Apr 3, 2020 at 11:29 AM Amit Kapila  wrote:
>
> On Fri, Apr 3, 2020 at 11:14 AM Justin Pryzby  wrote:
> >
> > On Fri, Apr 03, 2020 at 10:52:02AM +0530, Amit Kapila wrote:
> > > On Fri, Apr 3, 2020 at 10:41 AM Justin Pryzby  
> > > wrote:
> > > >
> > > > > > > > > >WAL: records=2359 full page records=42 bytes=447788
> > > >
> > > > 1) records; 2) pages ("full page images"); 3) bytes
> > > >
> > > > That is exactly like sort (method/type/size) and hash 
> > > > (buckets/batches/size),
> > > > and *not* like buffers, which shows various values all in units of 
> > > > "pages".
> > > >
> > >
> > > The way you have written (2) appears to bit awkward. I would prefer
> > > "full page writes" or "full page images".
> >
> > I didn't mean it to be the description used in the patch or anywhere else, 
> > just
> > the list of units.
> >
> > I wonder if it should use colons instead of equals ?  As in:
> > |WAL: Records: 2359  Full Page Images: 42  Size: 437kB
> >
> > Note, that has: 1) two spaces; 2) capitalized "fields"; 3) size rather than
> > "bytes".  That's similar to Buckets:
> > |Buckets: 1024  Batches: 1  Memory Usage: 44kB
> >
> > I'm not sure if it should say "WAL: " or "WAL ", or perhaps "WAL:  "  If
> > there's no colon, then it looks like the first field is "WAL Records", but 
> > then
> > "size" isn't as tightly associated with WAL.  It could say:
> > |WAL Records: n  Full Page Images: n  WAL Size: nkB
> >
> > For comparison, buffers uses "equals" for the case showing multiple 
> > "fields",
> > which are all in units of pages:
> > |Buffers: shared hit=15 read=2006
> >
>
> I think this is more close to the case of Buffers where all fields are
> directly related to buffers/blocks.  Here all the fields we want to
> display are related to WAL, so we should try to make it display
> similar to Buffers.
>

Dilip, Julien, others, do you have any suggestions here? I think we
need to decide something now.  We can change a few things like from
'two spaces' to 'one space' between fields later as well.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: Yet another fast GiST build

2020-04-03 Thread Yuri Astrakhan
Awesome addition!  Would it make sense to use x86's BMI2 PDEP instruction, or
is the interleave computation too small a fraction of the total work to
justify not-so-easy-to-port code?  Also, I think it needs a bit more
documentation to explain the logic, e.g. a link to
https://stackoverflow.com/questions/39490345/interleave-bits-efficiently ?  Thx
for making it faster :)
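
For reference, a standalone sketch (not code from the patch) of the two
approaches: the portable magic-number bit spreading most interleaving
implementations use, and a BMI2 variant where PDEP scatters the source bits
over the even/odd bit masks in a single instruction each:

#include <stdint.h>

/* Spread the 32 bits of x so that bit i ends up at position 2*i. */
static uint64_t
spread_bits32(uint32_t x)
{
	uint64_t	v = x;

	v = (v | (v << 16)) & UINT64_C(0x0000FFFF0000FFFF);
	v = (v | (v << 8)) & UINT64_C(0x00FF00FF00FF00FF);
	v = (v | (v << 4)) & UINT64_C(0x0F0F0F0F0F0F0F0F);
	v = (v | (v << 2)) & UINT64_C(0x3333333333333333);
	v = (v | (v << 1)) & UINT64_C(0x5555555555555555);
	return v;
}

/* Z-order (Morton) key: x occupies the even bits, y the odd bits. */
static uint64_t
interleave_bits(uint32_t x, uint32_t y)
{
	/* e.g. interleave_bits(3, 1) == 7 (binary 0111) */
	return spread_bits32(x) | (spread_bits32(y) << 1);
}

#if defined(__BMI2__)
#include <immintrin.h>

/* Same result via PDEP: deposit x into the even bits, y into the odd bits. */
static uint64_t
interleave_bits_pdep(uint32_t x, uint32_t y)
{
	return _pdep_u64(x, UINT64_C(0x5555555555555555)) |
		   _pdep_u64(y, UINT64_C(0xAAAAAAAAAAAAAAAA));
}
#endif

Whether the PDEP version is worth the portability cost presumably depends on
how large a share of the whole index build the key computation is.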

Re: [Proposal] Global temporary tables

2020-04-03 Thread 曾文旌
In my opinion:

1. We are developing GTT according to the SQL standard, not Oracle.

2. The implementation differences you listed come from the PG and Oracle
storage modules and DDL implementations.

2.1 Issue 1 and issue 2
Creating a normal table or a GTT defines the catalog entries and initializes
the data store file; in the case of a GTT, it initializes the store file for
the current session.
In Oracle, it looks like only the catalog is defined.
This is why other sessions cannot drop the GTT in PostgreSQL.
That is the reason for issue 1 and issue 2, and I think it is reasonable.

2.2 Issue 3
My thinking on the logic of dropping a GTT is:
when only the current session is using the GTT, it is safe to drop it,
because the GTT's definition and storage files can be completely deleted
from the database.
But if multiple sessions are using the GTT, it is hard to drop it from
session a, because removing the GTT's local buffers and data files in the
other sessions is difficult.
I am not sure why Oracle has this limitation.
So issue 3 is reasonable.

2.3 TRUNCATE of a normal table/GTT
TRUNCATE of a normal table or GTT cleans up the logical data but does not
unlink the data store file; in the case of a GTT, that is the store file for
the current session.
In Oracle, it looks like the data store file is cleaned up.
PostgreSQL storage is obviously different from Oracle; in other words, the
session is detached from the storage.
This is the reason for issue 4, and I think it is reasonable.

All in all, I think the current implementation is sufficient for a DBA to
manage GTTs.

> On Apr 2, 2020, at 4:45 PM, Prabhat Sahu  wrote:
> 
> Hi All,
> 
> I have noted down few behavioral difference in our GTT implementation in PG 
> as compared to Oracle DB:
> As per my understanding, the behavior of DROP TABLE in case of "Normal table 
> and GTT" in Oracle DB are as below:
> Any tables(Normal table / GTT) without having data in a session, we will be 
> able to DROP from another session.
> For a completed transaction on a normal table having data, we will be able to 
> DROP from another session. If the transaction is not yet complete, and we are 
> trying to drop the table from another session, then we will get an error. 
> (working as expected)
> For a completed transaction on GTT with(on commit delete rows) (i.e. no data 
> in GTT) in a session, we will be able to DROP from another session.
> For a completed transaction on GTT with(on commit preserve rows) with data in 
> a session, we will not be able to DROP from any session(not even from the 
> session in which GTT is created), we need to truncate the table data first 
> from all the session(session1, session2) which is having data.
> 1. Any tables(Normal table / GTT) without having data in a session, we will 
> be able to DROP from another session.
> Session1:
> create table t1 (c1 integer);
> create global temporary table gtt1 (c1 integer) on commit delete rows;
> create global temporary table gtt2 (c1 integer) on commit preserve rows;
> 
> Session2:
> drop table t1;
> drop table gtt1;
> drop table gtt2;
> 
> -- Issue 1: But we are able to drop a simple table and failed to drop GTT as 
> below.
> postgres=# drop table t1;
> DROP TABLE
> postgres=# drop table gtt1;
> ERROR:  can not drop relation gtt1 when other backend attached this global 
> temp table
> postgres=# drop table gtt2;
> ERROR:  can not drop relation gtt2 when other backend attached this global 
> temp table
> 
> 3. For a completed transaction on GTT with(on commit delete rows) (i.e. no 
> data in GTT) in a session, we will be able to DROP from another session.
> Session1:
> create global temporary table gtt1 (c1 integer) on commit delete rows;
> 
> Session2:
> drop table gtt1;
> 
> -- Issue 2: But we are getting error for GTT with(on_commit_delete_rows) 
> without data.
> postgres=# drop table gtt1;
> ERROR:  can not drop relation gtt1 when other backend attached this global 
> temp table
> 
> 4. For a completed transaction on GTT with(on commit preserve rows) with data 
> in any session, we will not be able to DROP from any session(not even from 
> the session in which GTT is created)
> 
> Case1:
> create global temporary table gtt2 (c1 integer) on commit preserve rows;
> insert into gtt2 values(100);
> drop table gtt2;
> 
> SQL> drop table gtt2;
> drop table gtt2
>   *
> ERROR at line 1:
> ORA-14452: attempt to create, alter or drop an index on temporary table 
> already in use
> 
> -- Issue 3: But, we are able to drop the GTT(having data) which we have 
> created in the same session.
> postgres=# drop table gtt2;
> DROP TABLE
> 
> Case2: GTT with(on commit preserve rows) having data in both session1 and 
> session2
> Session1:
> create global temporary table gtt2 (c1 integer) on commit preserve rows;
> insert into gtt2 values(100);
> 
> Session2:
> insert into gtt2 values(200);
> 
> -- If we try to drop the table from any session we should get an error, it is 
> working fine.
> drop table gtt2;
> SQL> drop table gtt2;
> drop table gtt2
>   *
> ERROR at line 1:
> 
