Re: future pg+llvm compilation is broken

2020-05-27 Thread Fabien COELHO



Hello Justin,


 llvmjit_inline.cpp:59:10: fatal error: llvm/IR/CallSite.h: No such file or 
directory


Seems to be the same as here:
https://www.postgresql.org/message-id/flat/CAGf%2BfX4sDP5%2B43HBz_3fjchawO6boqwgbYVfuFc1D4gbA6qQxw%40mail.gmail.com#540c3746c79c0f13360b35c9c369a887


Definitely. I did not notice this thread, should have.


Which is why animal seawasp is in now the red.


Which I run, hence I noticed it. It should have turned red in April, but 
the host had some issues hence the compiler was not updated, and I fixed 
it only a few days ago.


Sorry for the noise!

On the fixing philosophy, I'm for sooner rather than later (which does not 
mean immedialtely), and that it has to be backpatched to supported 
branches.


--
Fabien.




Re: Getting ERROR with FOR UPDATE/SHARE for partitioned table.

2020-05-27 Thread amul sul
On Wed, May 27, 2020 at 12:53 PM Amit Langote 
wrote:

> On Fri, May 22, 2020 at 9:09 PM amul sul  wrote:
> > I tried similar things on inherit partitioning as follow and that looks
> fine:
> >
> > DROP TABLE tbl;
> > CREATE TABLE tbl (c1 INT,c2 TEXT);
> > CREATE TABLE tbl_null(check (c1 is NULL)) INHERITS (tbl);
> > CREATE TABLE tbl_1 (check (c1 > 0 and c1 < 4)) INHERITS (tbl);
> > INSERT INTO tbl_1 VALUES(generate_series(1,3));
> >
> > postgres=# SELECT func(10);
> >  func
> > --
> >10
> > (1 row)
> >
> > On looking further for declarative partition, I found that issue happens
> only if
> > the partitioning pruning enabled, see this:
> >
> > -- Execute on original set of test case.
> > postgres=# ALTER FUNCTION func SET enable_partition_pruning to off;
> > ALTER FUNCTION
> >
> > postgres=# SELECT func(10);
> >  func
> > --
> >10
> > (1 row)
> >
> > I think we need some indication in execCurrentOf() to skip error if the
> relation
> > is pruned.  Something like that we already doing for inheriting
> partitioning,
> > see following comment execCurrentOf():
> >
> > /*
> >  * This table didn't produce the cursor's current row; some other
> >  * inheritance child of the same parent must have.  Signal
> caller to
> >  * do nothing on this table.
> >  */
>
> Actually, if you declare the cursor without FOR SHARE/UPDATE, the case
> would fail even with traditional inheritance:
>
> drop table if exists p cascade;
> create table p (a int);
> create table c (check (a = 2)) inherits (p);
> insert into p values (1);
> insert into c values (2);
> begin;
> declare c cursor for select * from p where a = 1;
> fetch c;
> update p set a = a where current of c;
> ERROR:  cursor "c" is not a simply updatable scan of table "c"
> ROLLBACK
>
>
I am not sure I understood the point, you'll see the same error with
declarative
partitioning as well.


> When there are no RowMarks to use because no FOR SHARE/UPDATE clause
> was specified when declaring the cursor, execCurrentOf() tries to find
> the cursor's current table by looking up its Scan node in the plan
> tree but will not find it if it was excluded in the cursor's query.
>
> With FOR SHARE/UPDATE, it seems to work because the planner delivers
> the RowMarks of all the children irrespective of whether or not they
> are present in the plan tree itself (something I had complained about
> in past [1]).  execCurrentOf() doesn't complain as long as there is a
> RowMark present even if it's never used.  For partitioning, the
> planner doesn't make RowMarks for pruned partitions, so
> execCurrentOf() can't find one if it's passed a pruned partition's
> oid.
>
>
Right.


> I don't see a way to avoid these errors.  How does execCurrentOf()
> distinguish a table that could *never* be present in a cursor from a
> table that could be had it not been pruned/excluded?  If it can do
> that, then give an error for the former and return false for the
> latter.
>

Yeah. I haven't thought much about this; I was thinking initially just to
skip
error by assuming that the table that we are looking might have pruned, but
I am
not sure how bad or good approach it is.


> I guess the workaround is to declare the cursor such that no
> partitions/children are pruned/excluded.
>
>
Disabling pruning as well -- at-least for the statement or function.

Regards,
Amul


-- 
> Amit Langote
> EnterpriseDB: http://www.enterprisedb.com
>
> [1]
> https://www.postgresql.org/message-id/468c85d9-540e-66a2-1dde-fec2b741e688%40lab.ntt.co.jp
>


Re: Explain Analyze (Rollback off) Suggestion

2020-05-27 Thread David G. Johnston
On Wed, May 27, 2020 at 5:31 PM Tom Lane  wrote:

> Robert Haas  writes:
> > I think the only way to make the effects of an EXPLAIN ANALYZE
> > statement be automatically rolled back would be to wrap the entire
> > operation in a subtransaction. While we could certainly implement
> > that, it might have its own share of surprises; for example, it would
> > consume an XID, leading to faster wraparound vacuums if you do it
> > frequently.
>
> Right, but it's just automating something that people now do by hand
> (ie manually wrap the EXPLAIN in BEGIN/ROLLBACK) when that's what they
> need.  I think the idea of having an option to do it for you isn't bad.
>

Agreed

> I'm strongly against changing the very-longstanding default behavior of
> EXPLAIN ANALYZE, though; the villagers at your doorstep will not be
> bringing flowers.  So this new option has to *not* default to on.
>

The "safety" aspect of this is a motivator but at least having the option
exist makes users both more aware and also simplifies usage, so ok.

> As far as the general topic of the thread goes, I like the idea of
> controlling EXPLAIN options on the client side way better than inventing
> statement-behavior-altering GUCs.  We learned our lesson about that a
> decade or two back; only those who don't remember propose new ones.
>

I'm not seeing enough similarity with the reasons for, and specific
behaviors, of those previous GUCs to dismiss this proposal on that basis
alone.  These are "crap we messed things up" switches that alter a query
behind the scenes in ways that a user cannot do through SQL - they simply
provide for changing a default that we already allow the user to override
per-query.  Its akin to "DateStyle" and its pure cosmetic influencing
ease-of-use option rather than some changing the fundamental structural
meaning of '\n'

If that isn't enough then I would just drop the idea since I don't see
enough benefit to introducing a wrapper layer in psql on top of explain.

David J.


Re: Warn when parallel restoring a custom dump without data offsets

2020-05-27 Thread David Gilman
I've attached the latest patches after further review from Justin Pryzby.

-- 
David Gilman  :DG<
https://gilslotd.com
>From 90e06cbb724f6f6a244dfc69f3d59ca2e7d29c01 Mon Sep 17 00:00:00 2001
From: David Gilman 
Date: Wed, 20 May 2020 22:49:28 -0400
Subject: [PATCH 1/4] Scan all TOCs when restoring a custom dump file without
 offsets

TOC requests are not guaranteed to come in disk order. If the custom
dump file was written with data offsets, pg_restore can seek directly to
the data, making request order irrelevant. If there are no data offsets,
pg_restore would never attempt to seek backwards, even when seeking is
possible, and would be unable to find TOCs before the current read
position in the file. 548e50976 changed how pg_restore's parallel
algorithm worked at the cost of greatly increasing out-of-order TOC
requests.

This patch changes pg_restore to scan through all TOCs to service a TOC
read request when restoring a custom dump file without data offsets.
The odds of getting a successful parallel restore go way up at the cost
of a bunch of extra tiny reads when pg_restore starts up.

The pg_restore manpage now warns against running pg_dump with an
unseekable output file and suggests that if you plan on doing a parallel
restore of a custom dump, you should run pg_dump with --file.
---
 doc/src/sgml/ref/pg_restore.sgml   |  8 
 src/bin/pg_dump/pg_backup_custom.c | 25 -
 2 files changed, 28 insertions(+), 5 deletions(-)

diff --git a/doc/src/sgml/ref/pg_restore.sgml b/doc/src/sgml/ref/pg_restore.sgml
index 232f88024f..23286bb076 100644
--- a/doc/src/sgml/ref/pg_restore.sgml
+++ b/doc/src/sgml/ref/pg_restore.sgml
@@ -279,6 +279,14 @@ PostgreSQL documentation
 jobs cannot be used together with the
 option --single-transaction.

+
+   
+pg_restore with concurrent jobs may fail
+when restoring a custom archive format dump written to an unseekable
+output stream, like stdout. To allow for concurrent restoration of
+a custom archive format dump, use pg_dump's
+--file option to specify an output file.
+   
   
  
 
diff --git a/src/bin/pg_dump/pg_backup_custom.c b/src/bin/pg_dump/pg_backup_custom.c
index 369dcea429..5aa7ab33db 100644
--- a/src/bin/pg_dump/pg_backup_custom.c
+++ b/src/bin/pg_dump/pg_backup_custom.c
@@ -415,6 +415,7 @@ _PrintTocData(ArchiveHandle *AH, TocEntry *te)
 	lclTocEntry *tctx = (lclTocEntry *) te->formatData;
 	int			blkType;
 	int			id;
+	bool		initialScan = true;
 
 	if (tctx->dataState == K_OFFSET_NO_DATA)
 		return;
@@ -423,13 +424,28 @@ _PrintTocData(ArchiveHandle *AH, TocEntry *te)
 	{
 		/*
 		 * We cannot seek directly to the desired block.  Instead, skip over
-		 * block headers until we find the one we want.  This could fail if we
-		 * are asked to restore items out-of-order.
+		 * block headers until we find the one we want.
 		 */
-		_readBlockHeader(AH, , );
 
-		while (blkType != EOF && id != te->dumpId)
+		for (;;)
 		{
+			_readBlockHeader(AH, , );
+
+			if (blkType == EOF && ctx->hasSeek && initialScan)
+			{
+/*
+ * This was possibly an out-of-order request. Try one extra
+ * pass over the file to find the TOC.
+ */
+initialScan = false;
+if (fseeko(AH->FH, ctx->dataStart, SEEK_SET) != 0)
+	fatal("error during file seek: %m");
+continue;
+			}
+
+			if (blkType == EOF || id == te->dumpId)
+break;
+
 			switch (blkType)
 			{
 case BLK_DATA:
@@ -445,7 +461,6 @@ _PrintTocData(ArchiveHandle *AH, TocEntry *te)
 		  blkType);
 	break;
 			}
-			_readBlockHeader(AH, , );
 		}
 	}
 	else
-- 
2.26.2

>From 750958499b19e6295a15b01ccb3a9e3ce963af2c Mon Sep 17 00:00:00 2001
From: David Gilman 
Date: Sat, 23 May 2020 10:49:33 -0400
Subject: [PATCH 2/4] Add integration test for out-of-order TOC requests in
 pg_restore

Add undocumented --disable-seeking argument to pg_dump to emulate
writing to an unseekable output file.
---
 src/bin/pg_dump/pg_backup.h  |  4 +-
 src/bin/pg_dump/pg_backup_archiver.c | 14 ++--
 src/bin/pg_dump/pg_backup_archiver.h |  1 +
 src/bin/pg_dump/pg_backup_custom.c   |  7 +-
 src/bin/pg_dump/pg_dump.c|  6 +-
 src/bin/pg_dump/t/002_pg_dump.pl | 97 +++-
 6 files changed, 117 insertions(+), 12 deletions(-)

diff --git a/src/bin/pg_dump/pg_backup.h b/src/bin/pg_dump/pg_backup.h
index 8c0cedcd98..160d705eb5 100644
--- a/src/bin/pg_dump/pg_backup.h
+++ b/src/bin/pg_dump/pg_backup.h
@@ -158,6 +158,7 @@ typedef struct _dumpOptions
 	int			use_setsessauth;
 	int			enable_row_security;
 	int			load_via_partition_root;
+	int			disable_seeking;
 
 	/* default, if no "inclusion" switches appear, is to dump everything */
 	bool		include_everything;
@@ -270,7 +271,8 @@ extern Archive *OpenArchive(const char *FileSpec, const ArchiveFormat fmt);
 /* Create a new archive */
 extern Archive *CreateArchive(const char *FileSpec, const ArchiveFormat fmt,
 			  const int 

Re: Explain Analyze (Rollback off) Suggestion

2020-05-27 Thread Tom Lane
Robert Haas  writes:
> I think the only way to make the effects of an EXPLAIN ANALYZE
> statement be automatically rolled back would be to wrap the entire
> operation in a subtransaction. While we could certainly implement
> that, it might have its own share of surprises; for example, it would
> consume an XID, leading to faster wraparound vacuums if you do it
> frequently.

Right, but it's just automating something that people now do by hand
(ie manually wrap the EXPLAIN in BEGIN/ROLLBACK) when that's what they
need.  I think the idea of having an option to do it for you isn't bad.

I'm strongly against changing the very-longstanding default behavior of
EXPLAIN ANALYZE, though; the villagers at your doorstep will not be
bringing flowers.  So this new option has to *not* default to on.

As far as the general topic of the thread goes, I like the idea of
controlling EXPLAIN options on the client side way better than inventing
statement-behavior-altering GUCs.  We learned our lesson about that a
decade or two back; only those who don't remember propose new ones.

regards, tom lane




Re: tablespace_map code cleanup

2020-05-27 Thread Kyotaro Horiguchi
At Wed, 27 May 2020 07:57:38 -0400, Robert Haas  wrote 
in 
> On Wed, May 13, 2020 at 10:43 PM Kyotaro Horiguchi
>  wrote:
> > About 0002,
> >
> > +   boolsendtblspclinks = true;
> >
> > The boolean seems to me useless since it is always the inverse of
> > opt->sendtblspcmapfile when it is used.
> 
> Well, I think it might have some documentation value, to clarify that
> whether or not we send tablespace links is the opposite of whether we
> send the tablespace map file.

That makes sense to me. Thanks!

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: New 'pg' consolidated metacommand patch

2020-05-27 Thread Mark Dilger



> On May 27, 2020, at 2:42 PM, Christopher Browne  wrote:
> 
> There is merit to having a new, harmonious set of "pg commands."  But it's 
> eminently easy to get into trouble (and get people mad) by changing things 
> that have been working fine for many years.  Half the battle (against the 
> "getting people mad" part) is making sure that it's clear that people were 
> listened to.  Listening to the community is one of the important things to do 
> :-).

I totally agree.

There are options for keeping the existing tools and not modifying them.  If 
the "pg" command (or "pgsql" command, if we use that naming) knows, for 
example, how to execute pg_ctl, that's no harm to people who prefer to just run 
pg_ctl directly.  It only becomes a problem when this patch, or one like it, 
decides that "pg_ctl" needs to work differently, have a different set of 
command line options, etc.  The only thing I changed about pg_ctl and friends 
in the v1 patch is that they moved from BINDIR to LIBEXECDIR, and internally 
they were updated to be able to still work despite the move.  That change was 
partly designed to spark conversation.  If people prefer they get moved back 
into BINDIR, fine by me.  If people instead prefer that the patch include links 
between the old BINDIR location and the new LIBEXECDIR location, that's also 
fine by me.  The "pg" command doesn't really care either.  I'm intentionally 
not calling the shots here.  I'm asking the community members, many of whom 
expressed an interest in something along the lines of this patch.  I'm happy to 
do the grunt work of the patch to meet the community needs.

Dave Page expressed an interest upthread in standardizing the interfaces of the 
various commands.  He didn't say this, but I assume he is thinking about things 
like -d meaning --debug in initdb but meaning --dbname=CONNSTR in 
pg_basebackup.  We could break backwards compatibility by changing one or both 
of those commands to interpret those options in some new standardized way.  Or, 
we could preserve backwards compatibililty by having "pg" take --dbname and 
--debug options and pass them to the subcommand according to the grandfathered 
rules of the subcommand.  I tend towards preserving compability, but maybe 
somebody on this list wants to argue for the other side?  For new commands 
introduced after this patch gets committed (assuming it does), options could be 
passed from "pg" through to the subcommand unmolested.  That supports Robert's 
idea that people could install new subcommands from contrib modules without the 
"pg" command needing to know anything about them.  This, too, is still open to 
conversation and debate.

I'd like to hear from more community members on this.  I'm listening.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: BufFileRead() error signalling

2020-05-27 Thread Alvaro Herrera
On 2020-May-28, Thomas Munro wrote:

> My indecision on the back-patching question has been resolved by your
> crash report and a search on codesearch.debian.org.

Great news!

> BufFileRead() and BufFileWrite() aren't referenced by any of the
> extensions they package, so by that standard it's OK to change this
> stuff in back branches.

This makes me a bit uncomfortable.  For example,
https://inst.eecs.berkeley.edu/~cs186/fa03/hwk5/assign5.html (admittedly
a very old class) encourages students to use this API to create an
aggregate.  It might not be the smartest thing in the world, but I'd
prefer not to break such things if they exist proprietarily.  Can we
keep the API unchanged in stable branches and just ereport the errors?

> I'll post a rebased a patch with Robert and Ibrar's changes
> for last reviews later today.

... walks away wondering about BufFileSeekBlock's API ...

(BufFileSeek seems harder to change, due to tuplestore.c)

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: BufFileRead() error signalling

2020-05-27 Thread Thomas Munro
On Thu, May 28, 2020 at 4:16 AM Alvaro Herrera  wrote:
> On 2020-Jan-27, Robert Haas wrote:
> > OK, now that I've waxed eloquent on that topic, let me have a go at
> > your actual questions. Regarding back-patching, I don't mind
> > back-patching error handling patches like this, but I don't feel it's
> > necessary if we have no evidence that data is actually getting
> > corrupted as a result of the problem and the chances of it actually
> > happening seems remote.
>
> I do have evidence of postgres crashes because of a problem that could
> be explained by this bug, so I +1 backpatching this to all supported
> branches.
>
> (The problem I saw is a hash-join spilling data to temp tablespace,
> which fills up but somehow goes undetected, then when reading the data
> back it causes heap_fill_tuple to crash.)

Ooh.

> Thomas, if you're no longer interested in seeing this done, please let
> me know and I can see to it.

My indecision on the back-patching question has been resolved by your
crash report and a search on codesearch.debian.org.  BufFileRead() and
BufFileWrite() aren't referenced by any of the extensions they
package, so by that standard it's OK to change this stuff in back
branches.  I'll post a rebased a patch with Robert and Ibrar's changes
for last reviews later today.




Re: New 'pg' consolidated metacommand patch

2020-05-27 Thread Christopher Browne
On Wed, 27 May 2020 at 16:49, Isaac Morland  wrote:

> On Wed, 27 May 2020 at 16:00, Peter Eisentraut <
> peter.eisentr...@2ndquadrant.com> wrote:
>
>
>> Also consider some practical concerns with the command structure you
>> describe: Tab completion of commands wouldn't work anymore, unless you
>> supply custom tab completion setups.  The direct association between a
>> command and its man page would be broken.  Shell scripting becomes more
>> challenging:  Instead of writing common things like "if which
>> pg_waldump; then" you'd need some custom code, to be determined.  These
>> are all solvable, but just a sum of slight annoyances, for no real
>> benefit.
>>
>
> I don’t think the man page concern is justified. We could have a “help”
> subcommand, just like git; “git help add” is (to a casual observer;
> probably not precisely) the same as “man git-add”.
>

There's some very small gulf in between the concerns...

- On the one hand, git (and systems with similar "keyword" subsystems) have
arrived at reasonable solutions to cope with various of the systematic
issues, so that these shouldn't be considered to be gigantic insurmountable
barriers.

Indeed, some of these tools present systematic solutions to additional
matters.  I was very pleased when I found that some of the Kubernetes tools
I was working with included subcommands to configure my shell to know how
to do command completion.  Seems like a fine thing to me to have that
become systematically *easier*, and I think that would be a good new
subcommand...  "pg completion bash" and "pg completion zsh" would be mighty
fine things.

- On the other hand, mapping old commands that are names of programs onto
"pg subcommands" is some additional effort, and we haven't yet started
bikeshedding on the favoured names :-)

I have lately noticed some interesting looking apps wandering about that
briefly attracted my attention, but, which, due to being painfully
different from the existing commands and tools that I have already learned,
and have "muscle memory" for, am loath to leave.   I'll throw out 4
examples, 3 of them personal:
a) nnn is a terminal-based file manager.  It has some nifty features
surrounding the concept that you can set up custom file filters to look for
sorts of files that you find interesting, and then offers customizable UI
for running favorite actions against them.  I'm 25 years into using Emacs
Dired mode; as neat as nnn seems, it's not enough of an improvement to be
worth the pain in the neck of relearning stuff.
b) 3mux is a redo of tmux (which was a redo of GNU Screen), and has key
mappings that make it way easier for a new user to learn.  I'm 20-ish years
into Screen/Tmux; I wasn't looking for it to be easier to learn, because I
did that quite a while ago.
c) Kakoune is a vi-like editor which rotates from vi's "verb/object"
approach to commands to a "object/verb" approach, for apparent more
efficiency.  I think I already mentioned that my "muscle memory" is biased
by Emacs features...  I'm not adding a "rotated-vi-like" thing into my mix
:-(
d) systemd is a Controversial System; the folk that seem particularly irate
about it seem to be "Old Bearded Sysadmins" that hate the idea of redoing
their understandings of how Unix systems initialize.  Personally, my
feelings are ambivalent; I'm using it where I find some use, and have not
been displeased with my results.  And since modern systems now have USB and
network devices added and dropped on a whim, there's a critical need for
something newer with more dynamic responses than old SysV Init.  But I
certainly "get" that some aren't so happy with it, and I'm not thrilled at
the ongoing scope creep that never seems to end.

There is merit to having a new, harmonious set of "pg commands."  But it's
eminently easy to get into trouble (and get people mad) by changing things
that have been working fine for many years.  Half the battle (against the
"getting people mad" part) is making sure that it's clear that people were
listened to.  Listening to the community is one of the important things to
do :-).
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"


Re: New 'pg' consolidated metacommand patch

2020-05-27 Thread Isaac Morland
On Wed, 27 May 2020 at 16:00, Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:


> Also consider some practical concerns with the command structure you
> describe: Tab completion of commands wouldn't work anymore, unless you
> supply custom tab completion setups.  The direct association between a
> command and its man page would be broken.  Shell scripting becomes more
> challenging:  Instead of writing common things like "if which
> pg_waldump; then" you'd need some custom code, to be determined.  These
> are all solvable, but just a sum of slight annoyances, for no real benefit.
>

I don’t think the man page concern is justified. We could have a “help”
subcommand, just like git; “git help add” is (to a casual observer;
probably not precisely) the same as “man git-add”.


Re: future pg+llvm compilation is broken

2020-05-27 Thread Justin Pryzby
On Wed, May 27, 2020 at 07:40:27PM +0200, Fabien COELHO wrote:
>  llvmjit_inline.cpp:59:10: fatal error: llvm/IR/CallSite.h: No such file or

Seems to be the same as here:
https://www.postgresql.org/message-id/flat/CAGf%2BfX4sDP5%2B43HBz_3fjchawO6boqwgbYVfuFc1D4gbA6qQxw%40mail.gmail.com#540c3746c79c0f13360b35c9c369a887

> Which is why animal seawasp is in now the red.

-- 
Justin




Re: New 'pg' consolidated metacommand patch

2020-05-27 Thread Peter Eisentraut

On 2020-05-27 01:19, Mark Dilger wrote:

Attached is a patch for a `pg' command that consolidates various PostgreSQL 
functionality into a single command, along the lines of how `git' commands are 
run from a single 'git' executable.  In other words,

   `pg upgrade`   # instead of `pg_upgrade`
   `pg resetwal`   # instead of `pg_resetwal`

This has been discussed before on the -hackers list, but I don't recall seeing 
a patch.  I'm submitting this patch mostly as a way of moving the conversation 
along, fully expecting the community to want some (or all) of what I wrote to 
be changed.

I'd also appreciate +1 and -1 votes on the overall idea, in case this entire 
feature, regardless of implementation, is simply something the community does 
not want.


I'm not excited about this.

First, consider that git has over 170 subcommands.  PostgreSQL currently 
has 36, and we're probably not going to add dozens more any time soon. 
So the issue is not of the same scope.  It also seems to me that the way 
git is organized this becomes a self-perpetuating system:  They are 
adding subcommands all the time without much care where you might in 
other situations think harder about combining them and keeping the 
surface area smaller.  For example, we wouldn't really need separate 
commands clusterdb, reindexdb, vacuumdb if we had better support in psql 
for "run this command in each database [in parallel]".


git (and svn etc. before it) also has a much more consistent operating 
model that is sensible to reflect in the command structure.  They all 
more or less operate on a git repository, in apparently 170 different 
ways.  The 36 PostgreSQL commands don't all work in the same way.  Now 
if someone were to propose a way to combine server tools, perhaps like 
pginstancetool {init|controldata|resetwal|checksum}, and perhaps also in 
a way that actually saves code duplication and inconsistency, that would 
be something to consider.  Or maybe a client-side tool that does 
pgclienttool {create user|drop user|create database|...} -- but that 
pretty much already exists by the name of psql.  But just renaming 
everything that's shipped with PostgreSQL to one common bucket without 
regard to how it actually works and what role it plays would be 
unnecessarily confusing.


Also consider some practical concerns with the command structure you 
describe: Tab completion of commands wouldn't work anymore, unless you 
supply custom tab completion setups.  The direct association between a 
command and its man page would be broken.  Shell scripting becomes more 
challenging:  Instead of writing common things like "if which 
pg_waldump; then" you'd need some custom code, to be determined.  These 
are all solvable, but just a sum of slight annoyances, for no real benefit.


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Failure to create GiST on ltree column

2020-05-27 Thread Victor Yegorov
пн, 25 мая 2020 г. в 18:25, Justin Pryzby :

> I wonder if/how that fails if you create the index before adding data:
>
> CREATE TABLE test_path(path ltree);
> CREATE INDEX ON test_path USING GIST(path);
> INSERT INTO test_path SELECT * FROM comments.mp_comments;
>
> Does that fail on a particular row ?
>
> How many paths do you have and how long?  How big is the table?
>

Yes, it fails.

I got permission and created a partial dump of the data with:
CREATE TABLE lc AS SELECT id, path FROM comments.mp_comments WHERE
length(path::text)>=500;

Attached. It is reproduces the error I get. One needs to create ltree
extension first.

I understand, that issue most likely comes from the length of the ltree
data stored in the columns.
But error is a bit misleading…


-- 
Victor Yegorov


lc.pgdump
Description: Binary data


future pg+llvm compilation is broken

2020-05-27 Thread Fabien COELHO



Hello devs,

 commit 2c24051bacd2d0eb7141fc4adb870281aec4e714
 Author: Craig Topper 
 Date:   Fri Apr 24 22:12:21 2020 -0700

[CallSite removal] Rename CallSite.h to AbstractCallSite.h. NFC

The CallSite and ImmutableCallSite were removed in a previous
commit. So rename the file to match the remaining class and
the name of the cpp that implements it.

Hence :

 .. llvmjit_inline.cpp
 llvmjit_inline.cpp:59:10: fatal error: llvm/IR/CallSite.h: No such file or
 directory
   59 | #include 
  |  ^~~~

Which is why animal seawasp is in now the red.

It looks unlikely that it will vanish, so pg must probably start aiming at 
the moving llvm target.


--
Fabien.




Re: New 'pg' consolidated metacommand patch

2020-05-27 Thread Isaac Morland
On Wed, 27 May 2020 at 12:35, Robert Haas  wrote:

> Ugh, yeah, please don't do that. Renaming them just to make it "look more
> modern" helps nobody, really. Especially if the suggestion is people should
> be using the shared-launcher binary anyway.
>
> The way things like 'git' work is that 'git thunk' just looks in a
> designated directory for an executable called git-thunk, and invokes
> it if it's found. If you want to invent your own git subcommand, you
> can. I guess 'git help' wouldn't know to list it, but you can still
> get the metacommand to execute it. That only works if you use a
> standard naming, though. If the meta-executable has to hard-code the
> names of all the individual executables that it calls, then you can't
> really make that work.
>

You could make the legacy names symlinks to the new systematic names.


Re: segmentation fault using currtid and partitioned tables

2020-05-27 Thread Alvaro Herrera
On 2020-May-26, Michael Paquier wrote:

> On Mon, May 25, 2020 at 06:29:10PM +0900, Michael Paquier wrote:
> > Perhaps you are right though, and that we don't need to spend this
> > much energy into improving the error messages so I am fine to discard
> > this part.  At the end, in order to remove the crashes, you just need
> > to keep around the two RELKIND_HAS_STORAGE() checks.  But I would
> > rather keep these two to use ereport(ERRCODE_FEATURE_NOT_SUPPORTED)
> > instead of elog(), and keep the test coverage of the previous patch
> > (including the tests for the aggregates I noticed were missing).
> > Would you be fine with that?
> 
> And this means the attached.  Thoughts are welcome.

Yeah, this looks good to me.  I would have used elog() instead, but
I don't care enough ... as a translator, I can come up with a message as
undecipherable as the original without worrying too much, since I
suspect nobody will ever see it in practice.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: New 'pg' consolidated metacommand patch

2020-05-27 Thread Robert Haas
On Wed, May 27, 2020 at 4:51 AM Magnus Hagander  wrote:
>> For that reason, I did not change the names of the executables, merely their 
>> location.  During conversations with Robert off-list, we discussed renaming 
>> the executables to things like 'pg-ctl' (hyphen rather than underscore), 
>> mostly because that's the more modern way of doing it and follows what 'git' 
>> does.  To avoid breaking scripts that execute these commands by the old 
>> name, this patch doesn't go that far.  It also leaves the usage() functions 
>> alone such that when they report their own progname in the usage text, they 
>> do so under the old name.  This would need to change at some point, but I'm 
>> unclear on whether that would be for v14 or if it would be delayed.
>
> Ugh, yeah, please don't do that. Renaming them just to make it "look more 
> modern" helps nobody, really. Especially if the suggestion is people should 
> be using the shared-launcher binary anyway.

The way things like 'git' work is that 'git thunk' just looks in a
designated directory for an executable called git-thunk, and invokes
it if it's found. If you want to invent your own git subcommand, you
can. I guess 'git help' wouldn't know to list it, but you can still
get the metacommand to execute it. That only works if you use a
standard naming, though. If the meta-executable has to hard-code the
names of all the individual executables that it calls, then you can't
really make that work.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: BufFileRead() error signalling

2020-05-27 Thread Robert Haas
On Wed, May 27, 2020 at 12:16 PM Alvaro Herrera
 wrote:
> I do have evidence of postgres crashes because of a problem that could
> be explained by this bug, so I +1 backpatching this to all supported
> branches.
>
> (The problem I saw is a hash-join spilling data to temp tablespace,
> which fills up but somehow goes undetected, then when reading the data
> back it causes heap_fill_tuple to crash.)

FWIW, that seems like a plenty good enough reason for back-patching to me.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: BufFileRead() error signalling

2020-05-27 Thread Alvaro Herrera
On 2020-Jan-27, Robert Haas wrote:

> OK, now that I've waxed eloquent on that topic, let me have a go at
> your actual questions. Regarding back-patching, I don't mind
> back-patching error handling patches like this, but I don't feel it's
> necessary if we have no evidence that data is actually getting
> corrupted as a result of the problem and the chances of it actually
> happening seems remote.

I do have evidence of postgres crashes because of a problem that could
be explained by this bug, so I +1 backpatching this to all supported
branches.

(The problem I saw is a hash-join spilling data to temp tablespace,
which fills up but somehow goes undetected, then when reading the data
back it causes heap_fill_tuple to crash.)

Thomas, if you're no longer interested in seeing this done, please let
me know and I can see to it.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Explain Analyze (Rollback off) Suggestion

2020-05-27 Thread Robert Haas
On Wed, May 27, 2020 at 10:48 AM David G. Johnston
 wrote:
> The recent discussion about EXPLAIN and the possible inclusion of 
> default-specifying GUCs raised a behavior that I did not fully appreciate nor 
> find to be self-evident.  Running EXPLAIN ANALYZE results in any side-effects 
> of the explained and analyzed statement being permanently written to the 
> current transaction - which is in many cases is implicitly immediately 
> committed unless the user takes care otherwise.  This seems like an 
> implementation expedient behavior but an unfriendly default.  It doesn't seem 
> unreasonable for a part-time dba to expect an explain outcome to always be 
> non-persistent, even in ANALYZE mode since the execution of that command 
> could be done in a transaction (or savepoint...) and then immediately undone 
> before sending the explain output to the client.
>
> I'm against having a GUC that implicitly triggers an ANALYZE version of the 
> EXPLAIN command.  I also think that it would be worth the effort to try and 
> make EXPLAIN ANALYZE default to using auto-rollback behavior.  Overriding 
> that default behavior could be done on a per command basis by specifying the 
> option "ROLLBACK off".  With the new GUCs users that find themselves in the 
> situation of needing a non-permanent outcome across multiple commands could 
> then get back to the less safe behavior by setting the corresponding GUC to 
> off in their session.  I won't pretend to have any idea how often that would 
> be useful - especially as it would depend upon whether the auto-savepoint 
> idea is workable or whether the client has to be outside of a transaction in 
> order for the rollback limited behavior to work.

I think the only way to make the effects of an EXPLAIN ANALYZE
statement be automatically rolled back would be to wrap the entire
operation in a subtransaction. While we could certainly implement
that, it might have its own share of surprises; for example, it would
consume an XID, leading to faster wraparound vacuums if you do it
frequently.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: BufFileRead() error signalling

2020-05-27 Thread Alvaro Herrera
On 2020-Jan-29, Michael Paquier wrote:

> On Tue, Jan 28, 2020 at 03:51:54PM -0500, Robert Haas wrote:
> > I quickly reread that thread and I don't see that there's any firm
> > consensus there in favor of "read %d of %zu" over "read only %d of %zu
> > bytes". Now, if most people prefer the former, so be it, but I don't
> > think that's clear from that thread.
> 
> The argument of consistency falls in favor of the former on HEAD:
> $ git grep "could not read" | grep "read %d of %zu" | wc -l
> 59
> $ git grep "could not read" | grep "read only %d of %zu" | wc -l
> 0

In the discussion that led to 811b6e36a9e2 I did suggest to use "read
only M of N" instead, but there wasn't enough discussion on that fine
point so we settled on what you now call prevalent without a lot of
support specifically on that.  I guess it was enough of an improvement
over what was there.  But like Robert, I too prefer the wording that
includes "only" and "bytes" over the wording that doesn't.

I'll let it be known that from a translator's point of view, it's a
ten-seconds job to update a fuzzy string from not including "only" and
"bytes" to one that does.  So let's not make that an argument for not
changing.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: New 'pg' consolidated metacommand patch

2020-05-27 Thread Robert Haas
On Wed, May 27, 2020 at 4:51 AM Magnus Hagander  wrote:
> As mentioned at least once before, the "pg" name is already taken in posix. 
> Granted it has been removed now, but it was removed from posix in 2018, which 
> I think is nowhere near soon enough to "steal. See for example 
> https://en.wikipedia.org/wiki/Pg_(Unix)

The previous discussion of this general topic starts at
http://postgr.es/m/ca+tgmozqmdy7nlrq96nlm-wrnmnpy90qdmvz6ltjo941gwg...@mail.gmail.com
and the discussion of this particular issue starts at
https://www.postgresql.org/message-id/15135.1586703479%40sss.pgh.pa.us

I think I agree with what Andres said on that thread: rather than
waiting a long time to see what happens, we should grab the name
before somebody else does. As also discussed on that thread, perhaps
we should have the official name of the binary be 'pgsql' with 'pg' as
a symlink that some packagers might choose to omit.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: New 'pg' consolidated metacommand patch

2020-05-27 Thread Christoph Moench-Tegeder
## Magnus Hagander (mag...@hagander.net):

> Ugh, yeah, please don't do that. Renaming them just to make it "look more
> modern" helps nobody, really. Especially if the suggestion is people should
> be using the shared-launcher binary anyway.

Quick, let's invent a fancy name like "microcommand" for doing this
like we're used to; then we tell people that's the new "modern" (anybody
cares to write a Medium article for that? (why Medium? it's neither
Rare nor Well Done)). What might make sense for (some) version control
systems and is tempting in languages which have forgotten howto shared
library might not be the best architecture for everything. What has
become of the collection of small dedicated tools?

Regards,
Christoph

-- 
Spare Space




Re: Fix compilation failure against LLVM 11

2020-05-27 Thread Jesse Zhang
Hi Andres,
On the mensiversary of the last response, what can I do to help move
this along (aside from heeding the advice "don't use LLVM HEAD")?

Michael Paquier expressed concerns over flippant churn upthread:

On Mon, Apr 27, 2020 at 10:19 PM Andres Freund wrote:
AF> On 2020-04-28 13:56:23 +0900, Michael Paquier wrote:
MP> > On Mon, Apr 27, 2020 at 07:48:54AM -0700, Jesse Zhang wrote:
JZ> > > Are you expressing a concern against "churning" this part of the
JZ> > > code in reaction to upstream LLVM changes? I'd agree with you in
JZ> > > general. But then the question we need to ask is "will we need
JZ> > > to revert this 3 weeks from now if upstream reverts their
JZ> > > changes?", or "we change X to Y now, will we need to instead
JZ> > > change X to Z 3 weeks later?".
> >
MP> > My concerns are a mix of all that, because we may finish by doing
MP> > the same verification work multiple times instead of fixing all
MP> > existing issues at once.  A good thing is that we may be able to
MP> > conclude rather soon, it looks like LLVM releases a new major
MP> > version every 3 months or so.
>
AF> Given the low cost of a change like this, and the fact that we have
AF> a buildfarm animal building recent trunk versions of llvm, I think
AF> it's better to just backpatch now.

For bystanders: Andres and I seemed to agree that this is unlikely to be
flippant (in addition to other benefits mentioned below). We haven't
discussed more on this, though I'm uncertain we had a consensus.

>
JZ> > > In that frame of mind, the answer is simply "no" w.r.t this
JZ> > > patch: it's removing an #include that simply has been dead: the
JZ> > > upstream change merely exposed it.
> >
MP> > The docs claim support for LLVM down to 3.9.  Are versions older
MP> > than 8 fine with your proposed change?
>
AF> I'll check.
>

How goes the checking? I was 99% certain it'd work but that might have
been my excuse to be lazy. Do you need help on this?

>
JZ> > > OTOH, is your concern more around "how many more dead #include
JZ> > > will LLVM 11 reveal before September?", I'm open to suggestions.
JZ> > > I personally have a bias to keep things working.
> >
MP> > This position can have advantages, though it seems to me that we
MP> > should still wait to see if there are more issues popping up.
>
AF> What's the benefit? The cost of checking this will be not
AF> meaningfully lower if there's other things to check as well, given
AF> their backward compat story presumably is different.

For bystanders: Andres and I argued for "fixing this sooner and
backpatch" and Michael suggested "wait longer and whack all moles". We
have waited, and there seems to be only one mole (finding all dead
unbroken "include"s was left as an exercise for the reader). Have we
come to an agreement on this?

Cheers,
Jesse




Explain Analyze (Rollback off) Suggestion

2020-05-27 Thread David G. Johnston
The recent discussion about EXPLAIN and the possible inclusion of
default-specifying GUCs raised a behavior that I did not fully appreciate
nor find to be self-evident.  Running EXPLAIN ANALYZE results in any
side-effects of the explained and analyzed statement being permanently
written to the current transaction - which is in many cases is implicitly
immediately committed unless the user takes care otherwise.  This seems
like an implementation expedient behavior but an unfriendly default.  It
doesn't seem unreasonable for a part-time dba to expect an explain outcome
to always be non-persistent, even in ANALYZE mode since the execution of
that command could be done in a transaction (or savepoint...) and then
immediately undone before sending the explain output to the client.

I'm against having a GUC that implicitly triggers an ANALYZE version of the
EXPLAIN command.  I also think that it would be worth the effort to try and
make EXPLAIN ANALYZE default to using auto-rollback behavior.  Overriding
that default behavior could be done on a per command basis by specifying
the option "ROLLBACK off".  With the new GUCs users that find themselves in
the situation of needing a non-permanent outcome across multiple commands
could then get back to the less safe behavior by setting the corresponding
GUC to off in their session.  I won't pretend to have any idea how often
that would be useful - especially as it would depend upon whether the
auto-savepoint idea is workable or whether the client has to be outside of
a transaction in order for the rollback limited behavior to work.

I cannot make this happen even if there is interest but it seems like a
good time to bring up the idea.

David J.


Re: New 'pg' consolidated metacommand patch

2020-05-27 Thread Dave Page
On Wed, May 27, 2020 at 3:00 PM Mark Dilger 
wrote:

>
>
> > On May 26, 2020, at 4:59 PM, David G. Johnston <
> david.g.johns...@gmail.com> wrote:
> >
> > On Tue, May 26, 2020 at 4:19 PM Mark Dilger <
> mark.dil...@enterprisedb.com> wrote:
> > I'd also appreciate +1 and -1 votes on the overall idea, in case this
> entire feature, regardless of implementation, is simply something the
> community does not want.
> >
> > -1, at least as part of core.  My question would be how much of this is
> would be needed if someone were to create an external project that
> installed a "pg" command on top of an existing PostgreSQL installation.  Or
> put differently, how many of the changes to the existing binaries are
> required versus nice-to-have?
>
> If the only goal of something like this were to have a frontend that could
> execute the various postgres binaries, then I'd say no changes to those
> binaries would be needed, and the frontend would not be worth very much.
> The value in having the frontend is that it makes it less difficult to
> introduce new commands to the postgres suite of commands, as you don't need
> to worry about whether another executable by the same name might happen to
> already exist somewhere.  Even introducing a command named "pg" has already
> gotten such a response on this thread.  By having the commands installed in
> postgres's libexec rather than bin, you can put whatever commands you want
> in libexec without worrying about conflicts.  That still leaves open the
> question of whether existing commands get moved into libexec, and if so, if
> they keep the same name.  An external project for this would be worthless
> in this regard, as the community wouldn't get any benefit when debating the
> merits of introducing a new command vs. the potential for conflicts.
>

The issue you raise can almost certainly be resolved simply by prefixing
pg- or something similar on all the existing binary names.

I think the beauty of having a single CLI executable is that we can
redesign the user interface to make it nice and consistent for all the
different functions it offers, and to cleanup old cruft such as createuser
vs. createrole and pgbench vs. pg_* and so on.

-- 
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: New 'pg' consolidated metacommand patch

2020-05-27 Thread Mark Dilger



> On May 27, 2020, at 1:50 AM, Magnus Hagander  wrote:
> 
> On Wed, May 27, 2020 at 1:19 AM Mark Dilger  
> wrote:
> Hackers,
> 
> Attached is a patch for a `pg' command that consolidates various PostgreSQL 
> functionality into a single command, along the lines of how `git' commands 
> are run from a single 'git' executable.  In other words,
> 
>   `pg upgrade`   # instead of `pg_upgrade`
>   `pg resetwal`   # instead of `pg_resetwal`
> 
> This has been discussed before on the -hackers list, but I don't recall 
> seeing a patch.  I'm submitting this patch mostly as a way of moving the 
> conversation along, fully expecting the community to want some (or all) of 
> what I wrote to be changed.
> 
> As mentioned at least once before, the "pg" name is already taken in posix. 
> Granted it has been removed now, but it was removed from posix in 2018, which 
> I think is nowhere near soon enough to "steal. See for example 
> https://en.wikipedia.org/wiki/Pg_(Unix)

Care to recommend a different name?

> All other executables have been moved to LIBEXECDIR where they retain their 
> old names and can still be run directly from the command line.  If we 
> committed this patch for v14, I think it makes sense that packagers could put 
> the LIBEXECDIR in the PATH so that 3rd-party scripts which call pg_ctl, 
> initdb, etc. continue to work. 
> 
> I would definitely not expect a packager to change the PATH, as also 
> mentioned by others. More likely options would be to symlink the binaries 
> into the actual  bindir, or just set both those directories to the same one 
> (in the path) for a number of releases as a transition.

There is nothing in the patch that expects packagers to muck with the PATH.  
The idea, badly phrased, was that by keeping the names of the executables and 
only changing locations, people would have more options for how to deal with 
the change.

> But you should definitely poll the packagers separately to make sure 
> something is done that works well for them -- especially when it comes to 
> integrating with for example the debian/ubuntu wrapper system that already 
> supports multiple parallel installs. And mind that they don't typically 
> follow hackers actively (I think), so it would be worthwhile to bring their 
> attention specifically to the thread. In many ways I'd find them more 
> important to get input from than most "other hackers" :)

Yeah, good advice.  Since I've already floated this on -hackers, I might wait a 
few days for comment, then if it looks encouraging, ask on other lists.

> For that reason, I did not change the names of the executables, merely their 
> location.  During conversations with Robert off-list, we discussed renaming 
> the executables to things like 'pg-ctl' (hyphen rather than underscore), 
> mostly because that's the more modern way of doing it and follows what 'git' 
> does.  To avoid breaking scripts that execute these commands by the old name, 
> this patch doesn't go that far.  It also leaves the usage() functions alone 
> such that when they report their own progname in the usage text, they do so 
> under the old name.  This would need to change at some point, but I'm unclear 
> on whether that would be for v14 or if it would be delayed.
> 
> Ugh, yeah, please don't do that. Renaming them just to make it "look more 
> modern" helps nobody, really. Especially if the suggestion is people should 
> be using the shared-launcher binary anyway. 
> 
> usage() seems more reasonable to change as part of a patch like this.
> 
> 
> The binaries 'createuser' and 'dropuser' might be better named 'createrole' 
> and 'droprole'.  I don't currently have aliases in this patch, but it might 
> make sense to allow 'pg createrole' as a synonym for 'pg createuser' and 'pg 
> droprole' as a synonym for 'pg dropuser'.  I have not pursued that yet, 
> largely because as soon as you go that route, it starts making sense to have 
> things like 'pg drop user', 'pg cluster db' and so forth, with the extra 
> spaces.  How far would people want me to go in this direction?
> 
> I'd say a createrole would make sense, but certainly not a "create role". 
> You'd end up with unlimited number of commands. But in either of them, I'd 
> say keep aliases completely out of it for a first iteration.

Ok.

> Prior to this patch, postgres binaries that need to execute other postgres 
> binaries determine the BINDIR using find_my_exec and trimming off their own 
> executable name.  They can then assume the other binary is in that same 
> directory.  After this patch, binaries need to find the common prefix ROOTDIR 
> = commonprefix(BINDIR,LIBEXECDIR) and then assume the other binary is either 
> in ROOTDIR/binsuffix or ROOTDIR/libexecsuffix.  This may cause problems on 
> windows if BINDIR and LIBEXECDIR are configured on different drives, as there 
> won't be a common prefix of C:\my\pg\bin and D:\my\pg\libexec.  I'm hoping 
> somebody with more Windows savvy expresses an opinion about 

Re: New 'pg' consolidated metacommand patch

2020-05-27 Thread Mark Dilger



> On May 27, 2020, at 1:13 AM, Dave Page  wrote:
> 
> Hi
> 
> On Wed, May 27, 2020 at 12:19 AM Mark Dilger  
> wrote:
> 
> I think it makes sense that packagers could put the LIBEXECDIR in the PATH so 
> that 3rd-party scripts which call pg_ctl, initdb, etc. continue to work.  
> 
> Having packages that futz with the PATH is generally a bad idea, especially 
> those that support side-by-side installations of different versions. None of 
> ours (EDBs) will be doing so.

I probably phrased that badly.  The operative word in that sentence was 
"could".  If we rename the binaries, people can still make links to them from 
the old name, but if we don't rename them, then either links or PATH changes 
*could* be used.  I'm not trying to recommend any particular approach.  
Mentioning "packagers" probably wasn't helpful, as "people" works just as well 
in that sentence.

There is also the option of not moving the binaries at all, and only putting 
new commands into libexec, while grandfathering existing ones in bin.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: New 'pg' consolidated metacommand patch

2020-05-27 Thread Mark Dilger



> On May 26, 2020, at 4:59 PM, David G. Johnston  
> wrote:
> 
> On Tue, May 26, 2020 at 4:19 PM Mark Dilger  
> wrote:
> I'd also appreciate +1 and -1 votes on the overall idea, in case this entire 
> feature, regardless of implementation, is simply something the community does 
> not want.
> 
> -1, at least as part of core.  My question would be how much of this is would 
> be needed if someone were to create an external project that installed a "pg" 
> command on top of an existing PostgreSQL installation.  Or put differently, 
> how many of the changes to the existing binaries are required versus 
> nice-to-have?

If the only goal of something like this were to have a frontend that could 
execute the various postgres binaries, then I'd say no changes to those 
binaries would be needed, and the frontend would not be worth very much.  The 
value in having the frontend is that it makes it less difficult to introduce 
new commands to the postgres suite of commands, as you don't need to worry 
about whether another executable by the same name might happen to already exist 
somewhere.  Even introducing a command named "pg" has already gotten such a 
response on this thread.  By having the commands installed in postgres's 
libexec rather than bin, you can put whatever commands you want in libexec 
without worrying about conflicts.  That still leaves open the question of 
whether existing commands get moved into libexec, and if so, if they keep the 
same name.  An external project for this would be worthless in this regard, as 
the community wouldn't get any benefit when debating the merits of introducing 
a new command vs. the potential for conflicts.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: password_encryption default

2020-05-27 Thread Stephen Frost
Greetings,

* Jonathan S. Katz (jk...@postgresql.org) wrote:
> On 5/27/20 9:13 AM, Michael Paquier wrote:
> > On Wed, May 27, 2020 at 02:56:34PM +0200, Magnus Hagander wrote:
> >> Seems like the better choice yeah. Since we're changing the default anyway,
> >> maybe now is the time to do that? Or if not, maybe have it log an explicit
> >> deprecation warning when it loads a config with it?
> > 
> > Not sure that's worth it here, so I would just remove the whole.  It
> > would be confusing to keep the past values and have them map to
> > something we think is not an appropriate default.
> 
> +1 to removing the legacy options. It could break some people on legacy
> upgrades, but my guess would be that said situations are very small, and
> we would document the removal of these as "breaking changes" in the
> release notes.

Agreed- let's remove the legacy options.  As I've mentioned elsewhere,
distros may manage the issue for us, and if we want to get into it, we
could consider adding support to pg_upgrade to complain if it comes
across a legacy setting that isn't valid.  I'm not sure that's
worthwhile though.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: Make the qual cost on index Filter slightly higher than qual cost on index Cond.

2020-05-27 Thread Andy Fan
On Wed, May 27, 2020 at 8:01 PM Ashutosh Bapat <
ashutosh.ba...@2ndquadrant.com> wrote:

>
>
> On Wed, 27 May 2020 at 04:43, Andy Fan  wrote:
>
>> You can use the attached sql to reproduce this issue, but I'm not sure
>> you can
>> get the above result at the first time that is because when optimizer
>> think the
>> 2 index scan have the same cost, it will choose the first one it found,
>> the order
>> depends on RelationGetIndexList.  If so,  you may try drop and create
>> j1_i_im5 index.
>>
>> The sense behind this patch is we still use the cost based optimizer,
>> just when we
>> we find out the 2 index scans have the same cost,  we prefer to use the
>> index which
>> have more qual filter on Index Cond.  This is implemented by adjust the
>> qual cost
>> on index filter slightly higher.
>>
>
> Thanks for the example and the explanation.
>
> The execution time difference in your example is pretty high to account
> for executing the filter on so many rows. My guess is this has to do with
> the heap access. For applying the filter the entire row needs to be fetched
> from the heap. So we should investigate this case from that angle. Another
> guess I have is the statistics is not correct and hence the cost is wrong.
>
>
I believe this is a statistics issue and then the cost is wrong.  More
characters of this
issue are:  1).  If a data is out of range in the old statistics,
optimizer will given an 1 row
assumption.  2).  based on the 1 row assumption,  for query
"col1=out_of_range_val AND
col2 = any_value"   Index (col1, col2) and (col1, col3) will have exactly
same cost for current
cost model. 3).  If the statistics was wrong, (col1, col3) maybe a very bad
plan as shown
above, but index (col1, col2) should  always better/no worse than (col1,
col3) in any case.
4). To expand the rule, for query "col1 = out_of_range_val AND col2 =
any_value AND col3 = any_val",
index are (col1, col2, col_m) and (col1, col_m, col_n),  the former index
will aways has better/no worse
than the later one.  5). an statistics issue like this is not  uncommon,
for example
an log based application, creation_date is very easy to out of range in
statistics.

so we need to optimize the cost model for such case, the method is the
patch I mentioned above.
I can't have a solid data to prove oracle did something similar, but based
on the talk with my
customer,  oracle is likely did something like this.

-- 
Best Regards
Andy Fan


Re: password_encryption default

2020-05-27 Thread Jonathan S. Katz
On 5/27/20 9:13 AM, Michael Paquier wrote:
> On Wed, May 27, 2020 at 02:56:34PM +0200, Magnus Hagander wrote:
>> Seems like the better choice yeah. Since we're changing the default anyway,
>> maybe now is the time to do that? Or if not, maybe have it log an explicit
>> deprecation warning when it loads a config with it?
> 
> Not sure that's worth it here, so I would just remove the whole.  It
> would be confusing to keep the past values and have them map to
> something we think is not an appropriate default.

+1 to removing the legacy options. It could break some people on legacy
upgrades, but my guess would be that said situations are very small, and
we would document the removal of these as "breaking changes" in the
release notes.

Jonathan



signature.asc
Description: OpenPGP digital signature


Re: password_encryption default

2020-05-27 Thread Jonathan S. Katz
On 5/26/20 4:25 AM, Peter Eisentraut wrote:
> On 2020-05-25 17:57, Jonathan S. Katz wrote:
>> I took a look over, it looks good. One question on the initdb.c diff:
>>
>> -    if (strcmp(authmethodlocal, "scram-sha-256") == 0 ||
>> -    strcmp(authmethodhost, "scram-sha-256") == 0)
>> -    {
>> -    conflines = replace_token(conflines,
>> -  "#password_encryption = md5",
>> -  "password_encryption =
>> scram-sha-256");
>> -    }
>> -
>>
>> Would we reverse this, i.e. if someone chooses authmethodlocal to be
>> "md5", we would then set "password_encryption = md5"?
> 
> Yeah, I was too enthusiastic about removing that.  Here is a better patch.

Did some testing. Overall it looks good. Here are my test cases and what
happened:

$ initdb -D data

Deferred password_encryption to the default, confirmed it was indeed scram

$ initdb -D data --auth-local=md5


Set password_encryption to md5

$ initdb -D data --auth-host=md5

Set password_encryption to md5

$ initdb -D data --auth-local=md5 --auth-host=scram-sha-256

Got an error message:

initdb: error: must specify a password for the superuser to enable
scram-sha-256 authentication

$ initdb -D data --auth-local=scram-sha-256 --auth-host=md5

Got an error message:

"initdb: error: must specify a password for the superuser to enable md5
authentication"

For the last two, that behavior is to be expected (after all, you've set
the two login vectors to require passwords), but the error message seems
odd now. Perhaps we tweak it to be:


"initdb: error: must specify a password for the superuser when requiring
passwords for both local and host authentication."

Another option could be to set the message based on whether both
local/host have the same setting, and then default to something like the
above when they differ.

Other than that, looks great!

Jonathan



signature.asc
Description: OpenPGP digital signature


Re: Getting ERROR with FOR UPDATE/SHARE for partitioned table.

2020-05-27 Thread Amit Langote
On Wed, May 27, 2020 at 9:11 PM Ashutosh Bapat
 wrote:
> On Wed, May 27, 2020 at 12:53 PM Amit Langote  wrote:
> > On Fri, May 22, 2020 at 9:09 PM amul sul  wrote:
> > > I tried similar things on inherit partitioning as follow and that looks 
> > > fine:
> > >
> > > DROP TABLE tbl;
> > > CREATE TABLE tbl (c1 INT,c2 TEXT);
> > > CREATE TABLE tbl_null(check (c1 is NULL)) INHERITS (tbl);
> > > CREATE TABLE tbl_1 (check (c1 > 0 and c1 < 4)) INHERITS (tbl);
> > > INSERT INTO tbl_1 VALUES(generate_series(1,3));
> > >
> > > postgres=# SELECT func(10);
> > >  func
> > > --
> > >10
> > > (1 row)
> > >
> > > On looking further for declarative partition, I found that issue happens 
> > > only if
> > > the partitioning pruning enabled, see this:
> > >
> > > -- Execute on original set of test case.
> > > postgres=# ALTER FUNCTION func SET enable_partition_pruning to off;
> > > ALTER FUNCTION
> > >
> > > postgres=# SELECT func(10);
> > >  func
> > > --
> > >10
> > > (1 row)
> > >
> > > I think we need some indication in execCurrentOf() to skip error if the 
> > > relation
> > > is pruned.  Something like that we already doing for inheriting 
> > > partitioning,
> > > see following comment execCurrentOf():
> > >
> > > /*
> > >  * This table didn't produce the cursor's current row; some other
> > >  * inheritance child of the same parent must have.  Signal caller 
> > > to
> > >  * do nothing on this table.
> > >  */
> >
> > Actually, if you declare the cursor without FOR SHARE/UPDATE, the case
> > would fail even with traditional inheritance:
> >
> > drop table if exists p cascade;
> > create table p (a int);
> > create table c (check (a = 2)) inherits (p);
> > insert into p values (1);
> > insert into c values (2);
> > begin;
> > declare c cursor for select * from p where a = 1;
> > fetch c;
> > update p set a = a where current of c;
> > ERROR:  cursor "c" is not a simply updatable scan of table "c"
> > ROLLBACK
> >
> > When there are no RowMarks to use because no FOR SHARE/UPDATE clause
> > was specified when declaring the cursor, execCurrentOf() tries to find
> > the cursor's current table by looking up its Scan node in the plan
> > tree but will not find it if it was excluded in the cursor's query.
> >
> > With FOR SHARE/UPDATE, it seems to work because the planner delivers
> > the RowMarks of all the children irrespective of whether or not they
> > are present in the plan tree itself (something I had complained about
> > in past [1]).  execCurrentOf() doesn't complain as long as there is a
> > RowMark present even if it's never used.  For partitioning, the
> > planner doesn't make RowMarks for pruned partitions, so
> > execCurrentOf() can't find one if it's passed a pruned partition's
> > oid.
>
> I am missing something in this explanation. WHERE CURRENT OF works on
> the row that was last fetched from a cursor. How could a pruned
> partition's row be fetched and thus cause this error.

So in Rajkumar's example, the cursor is declared as:

CURSOR IS SELECT * FROM tbl WHERE c1< 5 FOR UPDATE;

and the WHERE CURRENT OF query is this:

 UPDATE tbl SET c2='aa' WHERE CURRENT OF cur;

You can see that the UPDATE will scan all partitions, whereas the
cursor's query does not.

-- 
Amit Langote
EnterpriseDB: http://www.enterprisedb.com




Re: password_encryption default

2020-05-27 Thread Michael Paquier
On Wed, May 27, 2020 at 02:56:34PM +0200, Magnus Hagander wrote:
> Seems like the better choice yeah. Since we're changing the default anyway,
> maybe now is the time to do that? Or if not, maybe have it log an explicit
> deprecation warning when it loads a config with it?

Not sure that's worth it here, so I would just remove the whole.  It
would be confusing to keep the past values and have them map to
something we think is not an appropriate default.
--
Michael


signature.asc
Description: PGP signature


Re: password_encryption default

2020-05-27 Thread Magnus Hagander
On Wed, May 27, 2020 at 8:29 AM Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:

> On 2020-05-27 08:00, Michael Paquier wrote:
> > On Tue, May 26, 2020 at 10:25:25AM +0200, Peter Eisentraut wrote:
> >> Yeah, I was too enthusiastic about removing that.  Here is a better
> patch.
> >
> > +as an MD5 hash.  (on is also accepted, as an
> alias
> > +for md5.)  The default is
> > +scram-sha-256.
> > Shouldn't password_encryption = on/true/1/yes be an equivalent of
> > scram-sha-256 as the default gets changed?
>
> I think these are mostly legacy options anyway, so if we wanted to make
> a change, we should remove them.
>

Seems like the better choice yeah. Since we're changing the default anyway,
maybe now is the time to do that? Or if not, maybe have it log an explicit
deprecation warning when it loads a config with it?

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: Getting ERROR with FOR UPDATE/SHARE for partitioned table.

2020-05-27 Thread Ashutosh Bapat
On Wed, May 27, 2020 at 12:53 PM Amit Langote  wrote:
>
> On Fri, May 22, 2020 at 9:09 PM amul sul  wrote:
> > I tried similar things on inherit partitioning as follow and that looks 
> > fine:
> >
> > DROP TABLE tbl;
> > CREATE TABLE tbl (c1 INT,c2 TEXT);
> > CREATE TABLE tbl_null(check (c1 is NULL)) INHERITS (tbl);
> > CREATE TABLE tbl_1 (check (c1 > 0 and c1 < 4)) INHERITS (tbl);
> > INSERT INTO tbl_1 VALUES(generate_series(1,3));
> >
> > postgres=# SELECT func(10);
> >  func
> > --
> >10
> > (1 row)
> >
> > On looking further for declarative partition, I found that issue happens 
> > only if
> > the partitioning pruning enabled, see this:
> >
> > -- Execute on original set of test case.
> > postgres=# ALTER FUNCTION func SET enable_partition_pruning to off;
> > ALTER FUNCTION
> >
> > postgres=# SELECT func(10);
> >  func
> > --
> >10
> > (1 row)
> >
> > I think we need some indication in execCurrentOf() to skip error if the 
> > relation
> > is pruned.  Something like that we already doing for inheriting 
> > partitioning,
> > see following comment execCurrentOf():
> >
> > /*
> >  * This table didn't produce the cursor's current row; some other
> >  * inheritance child of the same parent must have.  Signal caller to
> >  * do nothing on this table.
> >  */
>
> Actually, if you declare the cursor without FOR SHARE/UPDATE, the case
> would fail even with traditional inheritance:
>
> drop table if exists p cascade;
> create table p (a int);
> create table c (check (a = 2)) inherits (p);
> insert into p values (1);
> insert into c values (2);
> begin;
> declare c cursor for select * from p where a = 1;
> fetch c;
> update p set a = a where current of c;
> ERROR:  cursor "c" is not a simply updatable scan of table "c"
> ROLLBACK
>
> When there are no RowMarks to use because no FOR SHARE/UPDATE clause
> was specified when declaring the cursor, execCurrentOf() tries to find
> the cursor's current table by looking up its Scan node in the plan
> tree but will not find it if it was excluded in the cursor's query.
>
> With FOR SHARE/UPDATE, it seems to work because the planner delivers
> the RowMarks of all the children irrespective of whether or not they
> are present in the plan tree itself (something I had complained about
> in past [1]).  execCurrentOf() doesn't complain as long as there is a
> RowMark present even if it's never used.  For partitioning, the
> planner doesn't make RowMarks for pruned partitions, so
> execCurrentOf() can't find one if it's passed a pruned partition's
> oid.

I am missing something in this explanation. WHERE CURRENT OF works on
the row that was last fetched from a cursor. How could a pruned
partition's row be fetched and thus cause this error.
-- 
Best Wishes,
Ashutosh Bapat




Re: Make the qual cost on index Filter slightly higher than qual cost on index Cond.

2020-05-27 Thread Ashutosh Bapat
On Wed, 27 May 2020 at 04:43, Andy Fan  wrote:

> You can use the attached sql to reproduce this issue, but I'm not sure you
> can
> get the above result at the first time that is because when optimizer
> think the
> 2 index scan have the same cost, it will choose the first one it found,
> the order
> depends on RelationGetIndexList.  If so,  you may try drop and create
> j1_i_im5 index.
>
> The sense behind this patch is we still use the cost based optimizer, just
> when we
> we find out the 2 index scans have the same cost,  we prefer to use the
> index which
> have more qual filter on Index Cond.  This is implemented by adjust the
> qual cost
> on index filter slightly higher.
>

Thanks for the example and the explanation.

The execution time difference in your example is pretty high to account for
executing the filter on so many rows. My guess is this has to do with the
heap access. For applying the filter the entire row needs to be fetched
from the heap. So we should investigate this case from that angle. Another
guess I have is the statistics is not correct and hence the cost is wrong.

-- 
Best Wishes,
Ashutosh


Re: tablespace_map code cleanup

2020-05-27 Thread Robert Haas
On Wed, May 13, 2020 at 10:43 PM Kyotaro Horiguchi
 wrote:
> About 0002,
>
> +   boolsendtblspclinks = true;
>
> The boolean seems to me useless since it is always the inverse of
> opt->sendtblspcmapfile when it is used.

Well, I think it might have some documentation value, to clarify that
whether or not we send tablespace links is the opposite of whether we
send the tablespace map file.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

2020-05-27 Thread Mahendra Singh Thalor
On Tue, 26 May 2020 at 16:46, Amit Kapila  wrote:
>
> On Tue, May 26, 2020 at 2:44 PM Dilip Kumar  wrote:
> >
> > On Tue, May 26, 2020 at 10:27 AM Amit Kapila 
wrote:
> > >
> > > >
> > > > 2. There is a bug fix in handling the stream abort in 0008 (earlier
it
> > > > was 0006).
> > > >
> > >
> > > The code changes look fine but it is not clear what was the exact
> > > issue.  Can you explain?
> >
> > Basically, in case of an empty subtransaction, we were reading the
> > subxacts info but when we could not find the subxid in the subxacts
> > info we were not releasing the memory.  So on next subxact_info_read
> > it will expect that subxacts should be freed but we did not free it in
> > that !found case.
> >
>
> Okay, on looking at it again, the same code exists in
> subxact_info_write as well.  It is better to have a function for it.
> Can we have a structure like SubXactContext for all the variables used
> for subxact?  As mentioned earlier I find the allocation/deallocation
> of subxacts a bit ad-hoc, so there will always be a chance that we can
> forget to free it.  Having it allocated in memory context which we can
> reset later might reduce that risk.  One idea could be that we have a
> special memory context for start and stop messages which can be used
> to allocate the subxacts there.  In case of commit/abort, we can allow
> subxacts information to be allocated in ApplyMessageContext which is
> reset at the end of each protocol message.
>
> --
> With Regards,
> Amit Kapila.
> EnterpriseDB: http://www.enterprisedb.com
>
>

Hi all,
On the top of v16 patch set [1], I did some testing for DDL's and DML's to
test wal size and performance. Below is the testing summary;

*Test parameters:*
wal_level= 'logical
max_connections = '150'
wal_receiver_timeout = '600s'
max_wal_size = '2GB'
min_wal_size = '2GB'
autovacuum= 'off'
checkpoint_timeout= '1d'

*Test results:*

CREATE index operations Add col int(date) operations Add col text operations
SN. operation name LSN diff (in bytes) time (in sec) % LSN change LSN diff
(in bytes) time (in sec) % LSN change LSN diff (in bytes) time (in sec) %
LSN change

1
1 DDL without patch 17728 0.89116
1.624548
976 0.764393
11.475409
33904 0.80044
2.80792
with patch 18016 0.804868 1088 0.763602 34856 0.787108

2
2 DDL without patch 19872 0.860348
2.73752
1632 0.763199
13.7254902
34560 0.806086
3.078703
with patch 20416 0.839065 1856 0.733147 35624 0.829281

3
3 DDL without patch 22016 0.894891
3.63372093
2288 0.776871
14.685314
35216 0.803493
3.339391186
with patch 22816 0.828028 2624 0.737177 36392 0.800194

4
4 DDL without patch 24160 0.901686
4.4701986
2944 0.768445
15.217391
35872 0.77489
3.590544
with patch 25240 0.887143 3392 0.768382 37160 0.82777

5
5 DDL without patch 26328 0.901686
4.9832877
3600 0.751879
15.55
36528 0.817928
3.832676
with patch 27640 0.914078 4160 0.74709 37928 0.820621

6
6 DDL without patch 28472 0.936385
5.5071649
4256 0.745179
15.78947368
37184 0.797043
4.066265
with patch 30040 0.958226 4928 0.725321 38696 0.814535

7
8 DDL without patch 32760 1.0022203
6.422466
5568 0.757468
16.091954
38496 0.83207
4.509559
with patch 34864 0.966777 6464 0.769072 40232 0.903604

8
11 DDL without patch 50296 1.0022203
5.662478
7536 0.748332
16.66
40464 0.822266
5.179913
with patch 53144 0.966777 8792 0.750553 42560 0.797133

9
15 DDL without patch <#gid=2095312519=B9> 58896 1.267253
5.662478
10184 0.776875
16.496465
43112 0.821916
5.84524
with patch 62768 1.27234 11864 0.746844 45632 0.812567

10
1 DDL & 3 DML without patch 18240 0.812551
1.6228
1192 0.771993
10.067114
34120 0.849467
2.8113599
with patch 18536 0.819089 1312 0.785117 35080 0.855456

11
3 DDL & 5 DML without patch 23656 0.926616
3.4832606
2656 0.758029
13.55421687
35584 0.829377
3.372302
with patch 24480 0.915517 3016 0.797206 36784 0.839176

12
10 DDL & 5 DML without patch 52760 1.101005
4.958301744
7288 0.763065
16.02634468
40216 0.837843
4.993037
with patch 55376 1.105241 8456 0.779257 42224 0.835206

13
10 DML without patch 1008 0.791091
6.349206
1008 0.81105
6.349206
1008 0.78817
6.349206
with patch 1072 0.807875 1072 0.771113 1072 0.759789

To see all operations, please see[2] test_results


*Summary:*
Basically, we are writing per command invalidation message and for testing
that I have tested with different combinations of the DDL and DML
operation.  I have not observed any performance degradation with the patch.
For "create index" DDL's, %change in wal is 1-7% for 1-15 DDL's. For "add
col int/date" DDL's, it is 11-17% for 1-15 DDL's and for "add col text"
DDL's, it is 2-6% for 1-15 DDL's. For mix (DDL & DML), it is 2-10%.

why are we seeing 11-13 % of the extra wall, basically,  the amount of
extra WAL is not very high but the amount of WAL generated with add column
int/date is just ~1000 bytes so additional 100 bytes will be around 10% and
for add column text it is  ~35000 

Re: Inlining of couple of functions in pl_exec.c improves performance

2020-05-27 Thread Amit Khandekar
On Tue, 26 May 2020 at 09:06, Amit Khandekar  wrote:
>
> On Sat, 23 May 2020 at 23:24, Pavel Stehule  wrote:
> >
> >FOR counter IN 1..180 LOOP
> >   id = 0; id = 0; id1 = 0;
> >   id2 = 0; id3 = 0; id1 = 0; id2 = 0;
> >   id3 = 0; id = 0; id = 0; id1 = 0;
> >   id2 = 0; id3 = 0; id1 = 0; id2 = 0;
> >   id3 = 0;
> >END LOOP;
> >
> > This is not too much typical PLpgSQL code. All expressions are not 
> > parametrized - so this test is little bit obscure.
> >
> > Last strange performance plpgsql benchmark did calculation of pi value. It 
> > does something real
>
> Yeah, basically I wanted to have many statements, and that too with
> many assignments where casts are not required. Let me check if I can
> come up with a real-enough testcase. Thanks.

create table tab (id int[]);
insert into tab select array((select ((random() * 10)::bigint) id
from generate_series(1, 3) order by 1));
insert into tab select array((select ((random() * 60)::bigint) id
from generate_series(1, 3) order by 1));
insert into tab select array((select ((random() * 100)::bigint) id
from generate_series(1, 3) order by 1));
insert into tab select array((select ((random() * 10)::bigint) id
from generate_series(1, 3) order by 1));
insert into tab select array((select ((random() * 60)::bigint) id
from generate_series(1, 3) order by 1));
insert into tab select array((select ((random() * 100)::bigint) id
from generate_series(1, 3) order by 1));
insert into tab select array((select ((random() * 10)::bigint) id
from generate_series(1, 3) order by 1));
insert into tab select array((select ((random() * 60)::bigint) id
from generate_series(1, 3) order by 1));
insert into tab select array((select ((random() * 100)::bigint) id
from generate_series(1, 3) order by 1));


-- Return how much two consecutive array elements are apart from each
other, on average; i.e. how much the numbers are spaced out.
-- Input is an ordered array of integers.
CREATE OR REPLACE FUNCTION avg_space(int[]) RETURNS bigint AS $$
DECLARE
  diff int = 0;
  num int;
  prevnum int = 1;
BEGIN
  FOREACH num IN ARRAY $1
  LOOP
diff = diff + num - prevnum;
prevnum = num;
  END LOOP;
  RETURN diff/array_length($1, 1);
END;
$$ LANGUAGE plpgsql;

explain analyze select avg_space(id) from tab;
Like earlier figures, these are execution times in milliseconds, taken
from explain-analyze.
ARM VM:
   HEAD : 49.8
   patch 0001+0002   : 47.8 => 4.2%
   patch 0001+0002+0003 : 42.9 => 16.1%
x86 VM:
   HEAD : 32.8
   patch 0001+0002   : 32.7 => 0%
   patch 0001+0002+0003 : 28.0 => 17.1%




Re: Default gucs for EXPLAIN

2020-05-27 Thread Vik Fearing
On 5/27/20 7:27 AM, David G. Johnston wrote:
> On Tuesday, May 26, 2020, David Rowley  wrote:
> 
>> On Tue, 26 May 2020 at 23:59, Vik Fearing  wrote:
>>> Are you saying we should have all new EXPLAIN options off forever into
>>> the future because apps won't know about the new data?  I guess we
>>> should also not ever introduce new plan nodes because those won't be
>>> known either.
>>
>> Another argument against this is that it creates dependency among the
>> new GUCs. Many of the options are not compatible with each other. e.g.
>>
>> postgres=# explain (timing on) select 1;
>> ERROR:  EXPLAIN option TIMING requires ANALYZE
>>
>> Would you propose we just error out in that case, or should we
>> silently enable the required option, or disable the conflicting
>> option?
>>
>>
> The same thing we do today...ignore options that require analyze if analyze
> is not specified.  There are no other options documented that are dependent
> with options besides than analyze.  The docs say timing defaults to on, its
> only when explicitly specified instead of being treated as a default that
> the user message appears.  All the GUCs are doing is changing the default.


Yes, the patch handles this case the way you describe.  In fact, the
patch doesn't (or shouldn't) change any behavior at all.
-- 
Vik Fearing




Re: Trouble with hashagg spill I/O pattern and costing

2020-05-27 Thread Tomas Vondra

On Tue, May 26, 2020 at 06:42:50PM -0700, Melanie Plageman wrote:

On Tue, May 26, 2020 at 5:40 PM Jeff Davis  wrote:


On Tue, 2020-05-26 at 21:15 +0200, Tomas Vondra wrote:
>
> As for the tlist fix, I think that's mostly ready too - the one thing
> we
> should do is probably only doing it for AGG_HASHED. For AGG_SORTED
> it's
> not really necessary.

Melanie previously posted a patch to avoid spilling unneeded columns,
but it introduced more code:



https://www.postgresql.org/message-id/caakru_aefesv+ukqwqu+ioenoil2lju9diuh9br8mbyxuz0...@mail.gmail.com

and it seems that Heikki also looked at it. Perhaps we should get an
acknowledgement from one of them that your one-line change is the right
approach?



I spent some time looking at it today, and, it turns out I was wrong.

I thought that there was a case I had found where CP_SMALL_TLIST did not
eliminate as many columns as could be eliminated for the purposes of
spilling, but, that turned out not to be the case.

I changed CP_LABEL_TLIST to CP_SMALL_TLIST in
create_groupingsets_plan(), create_agg_plan(), etc and tried a bunch of
different queries and this 2-3 line change worked for all the cases I
tried. Is that where you made the change?


I've only made the change in create_agg_plan, because that's what was in
the query plan I was investigating. You may be right that the same fix
is needed in additional places, though.


And then are you proposing to set it based on the aggstrategy to either
CP_LABEL_TLIST or CP_SMALL_TLIST here?



Yes, something like that. The patch I shared on on 5/21 just changed
that, but I'm wondering if that could add overhead for sorted
aggregation, which already does the projection thanks to the sort.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: New 'pg' consolidated metacommand patch

2020-05-27 Thread Magnus Hagander
On Wed, May 27, 2020 at 1:19 AM Mark Dilger 
wrote:

> Hackers,
>
> Attached is a patch for a `pg' command that consolidates various
> PostgreSQL functionality into a single command, along the lines of how
> `git' commands are run from a single 'git' executable.  In other words,
>
>   `pg upgrade`   # instead of `pg_upgrade`
>   `pg resetwal`   # instead of `pg_resetwal`
>
> This has been discussed before on the -hackers list, but I don't recall
> seeing a patch.  I'm submitting this patch mostly as a way of moving the
> conversation along, fully expecting the community to want some (or all) of
> what I wrote to be changed.
>

As mentioned at least once before, the "pg" name is already taken in posix.
Granted it has been removed now, but it was removed from posix in 2018,
which I think is nowhere near soon enough to "steal. See for example
https://en.wikipedia.org/wiki/Pg_(Unix)



All other executables have been moved to LIBEXECDIR where they retain their
> old names and can still be run directly from the command line.  If we
> committed this patch for v14, I think it makes sense that packagers could
> put the LIBEXECDIR in the PATH so that 3rd-party scripts which call pg_ctl,
> initdb, etc. continue to work.


I would definitely not expect a packager to change the PATH, as also
mentioned by others. More likely options would be to symlink the binaries
into the actual  bindir, or just set both those directories to the same one
(in the path) for a number of releases as a transition.

But you should definitely poll the packagers separately to make sure
something is done that works well for them -- especially when it comes to
integrating with for example the debian/ubuntu wrapper system that already
supports multiple parallel installs. And mind that they don't typically
follow hackers actively (I think), so it would be worthwhile to bring their
attention specifically to the thread. In many ways I'd find them more
important to get input from than most "other hackers" :)



> For that reason, I did not change the names of the executables, merely
> their location.  During conversations with Robert off-list, we discussed
> renaming the executables to things like 'pg-ctl' (hyphen rather than
> underscore), mostly because that's the more modern way of doing it and
> follows what 'git' does.  To avoid breaking scripts that execute these
> commands by the old name, this patch doesn't go that far.  It also leaves
> the usage() functions alone such that when they report their own progname
> in the usage text, they do so under the old name.  This would need to
> change at some point, but I'm unclear on whether that would be for v14 or
> if it would be delayed.
>

Ugh, yeah, please don't do that. Renaming them just to make it "look more
modern" helps nobody, really. Especially if the suggestion is people should
be using the shared-launcher binary anyway.

usage() seems more reasonable to change as part of a patch like this.


The binaries 'createuser' and 'dropuser' might be better named 'createrole'
> and 'droprole'.  I don't currently have aliases in this patch, but it might
> make sense to allow 'pg createrole' as a synonym for 'pg createuser' and
> 'pg droprole' as a synonym for 'pg dropuser'.  I have not pursued that yet,
> largely because as soon as you go that route, it starts making sense to
> have things like 'pg drop user', 'pg cluster db' and so forth, with the
> extra spaces.  How far would people want me to go in this direction?
>

I'd say a createrole would make sense, but certainly not a "create role".
You'd end up with unlimited number of commands. But in either of them, I'd
say keep aliases completely out of it for a first iteration.


Prior to this patch, postgres binaries that need to execute other postgres
> binaries determine the BINDIR using find_my_exec and trimming off their own
> executable name.  They can then assume the other binary is in that same
> directory.  After this patch, binaries need to find the common prefix
> ROOTDIR = commonprefix(BINDIR,LIBEXECDIR) and then assume the other binary
> is either in ROOTDIR/binsuffix or ROOTDIR/libexecsuffix.  This may cause
> problems on windows if BINDIR and LIBEXECDIR are configured on different
> drives, as there won't be a common prefix of C:\my\pg\bin and
> D:\my\pg\libexec.  I'm hoping somebody with more Windows savvy expresses an
> opinion about how to handle this.
>

Maybe the "pg" binary could just pass down it's own location as a parameter
to the binary it calls, thereby making sure that binary has direct access
to both?


-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: New 'pg' consolidated metacommand patch

2020-05-27 Thread Dave Page
Hi

On Wed, May 27, 2020 at 12:19 AM Mark Dilger 
wrote:

>
> I think it makes sense that packagers could put the LIBEXECDIR in the PATH
> so that 3rd-party scripts which call pg_ctl, initdb, etc. continue to
> work.


Having packages that futz with the PATH is generally a bad idea, especially
those that support side-by-side installations of different versions. None
of ours (EDBs) will be doing so.

-- 
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: max_slot_wal_keep_size comment in postgresql.conf

2020-05-27 Thread Michael Paquier
On Wed, May 27, 2020 at 04:21:59PM +0900, Kyotaro Horiguchi wrote:
> I don't oppose to full-spelling.  How about the attached?

No problem from me.
--
Michael


signature.asc
Description: PGP signature


Re: repeat() function, CHECK_FOR_INTERRUPTS(), and unlikely()

2020-05-27 Thread Michael Paquier
On Mon, May 25, 2020 at 11:14:39AM -0400, Tom Lane wrote:
> Perhaps I'm an optimist, but I think that eventually we will figure out
> how to make unlikely() work for MSVC.  In the meantime we might as well
> let it work for gcc-on-Windows builds.

I am less optimistic than that, but there is hope.  This was mentioned
as something considered for implementation in April 2019:
https://developercommunity.visualstudio.com/idea/488669/please-add-likelyunlikely-builtins.html

> I think that each of those tests should have a separate unlikely() marker,
> since the whole point here is that we don't expect either of those tests
> to yield true in the huge majority of CHECK_FOR_INTERRUPTS executions.

+1.  I am not sure that the addition of unlikely() should be
backpatched though, that's not something usually done.
--
Michael


signature.asc
Description: PGP signature


Re: Getting ERROR with FOR UPDATE/SHARE for partitioned table.

2020-05-27 Thread Amit Langote
On Fri, May 22, 2020 at 9:09 PM amul sul  wrote:
> I tried similar things on inherit partitioning as follow and that looks fine:
>
> DROP TABLE tbl;
> CREATE TABLE tbl (c1 INT,c2 TEXT);
> CREATE TABLE tbl_null(check (c1 is NULL)) INHERITS (tbl);
> CREATE TABLE tbl_1 (check (c1 > 0 and c1 < 4)) INHERITS (tbl);
> INSERT INTO tbl_1 VALUES(generate_series(1,3));
>
> postgres=# SELECT func(10);
>  func
> --
>10
> (1 row)
>
> On looking further for declarative partition, I found that issue happens only 
> if
> the partitioning pruning enabled, see this:
>
> -- Execute on original set of test case.
> postgres=# ALTER FUNCTION func SET enable_partition_pruning to off;
> ALTER FUNCTION
>
> postgres=# SELECT func(10);
>  func
> --
>10
> (1 row)
>
> I think we need some indication in execCurrentOf() to skip error if the 
> relation
> is pruned.  Something like that we already doing for inheriting partitioning,
> see following comment execCurrentOf():
>
> /*
>  * This table didn't produce the cursor's current row; some other
>  * inheritance child of the same parent must have.  Signal caller to
>  * do nothing on this table.
>  */

Actually, if you declare the cursor without FOR SHARE/UPDATE, the case
would fail even with traditional inheritance:

drop table if exists p cascade;
create table p (a int);
create table c (check (a = 2)) inherits (p);
insert into p values (1);
insert into c values (2);
begin;
declare c cursor for select * from p where a = 1;
fetch c;
update p set a = a where current of c;
ERROR:  cursor "c" is not a simply updatable scan of table "c"
ROLLBACK

When there are no RowMarks to use because no FOR SHARE/UPDATE clause
was specified when declaring the cursor, execCurrentOf() tries to find
the cursor's current table by looking up its Scan node in the plan
tree but will not find it if it was excluded in the cursor's query.

With FOR SHARE/UPDATE, it seems to work because the planner delivers
the RowMarks of all the children irrespective of whether or not they
are present in the plan tree itself (something I had complained about
in past [1]).  execCurrentOf() doesn't complain as long as there is a
RowMark present even if it's never used.  For partitioning, the
planner doesn't make RowMarks for pruned partitions, so
execCurrentOf() can't find one if it's passed a pruned partition's
oid.

I don't see a way to avoid these errors.  How does execCurrentOf()
distinguish a table that could *never* be present in a cursor from a
table that could be had it not been pruned/excluded?  If it can do
that, then give an error for the former and return false for the
latter.

I guess the workaround is to declare the cursor such that no
partitions/children are pruned/excluded.

-- 
Amit Langote
EnterpriseDB: http://www.enterprisedb.com

[1] 
https://www.postgresql.org/message-id/468c85d9-540e-66a2-1dde-fec2b741e688%40lab.ntt.co.jp




Re: max_slot_wal_keep_size comment in postgresql.conf

2020-05-27 Thread Kyotaro Horiguchi
At Wed, 27 May 2020 15:11:00 +0900, Michael Paquier  wrote 
in 
> On Wed, May 27, 2020 at 10:46:27AM +0900, Kyotaro Horiguchi wrote:
> > Agreed. It should be a leftover at the time the unit was changed
> > (before committed) to MB from bytes.  The default value makes the
> > confusion worse.
> > 
> > Is the following works?
> > 
> > #max_slot_wal_keep_size = -1  # in MB; -1 disables
> 
> Indeed, better to fix that.  The few GUCs using memory units that have
> such a mention in their comments use the actual name of the memory
> unit, and not its abbreviation (see log_temp_files).  So it seems more

I was not sure which is preferable.  Does that mean we will fix the
following, too?

> #temp_file_limit = -1   # limits per-process temp file space
> # in kB, or -1 for no limit

> logic to me to just use "in megabytes; -1 disables", that would be
> also more consistent with the time-unit-based ones.

I don't oppose to full-spelling.  How about the attached?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From 0389911da7b2af708d518423e2c5cacebc4d5f31 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Wed, 27 May 2020 10:42:24 +0900
Subject: [PATCH v2] Change some comments in postgresql.conf.sample

Fix a lame comment for max_slot_wal_keep_size so that it suggests the
unit for the value.  On the way fixing that, fixed another instacne of
unit symbol (kB) used as unit name to its unit name (kilobytes).
---
 src/backend/utils/misc/postgresql.conf.sample | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 995b6ca155..81055edde7 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -149,7 +149,7 @@
 # - Disk -
 
 #temp_file_limit = -1			# limits per-process temp file space
-	# in kB, or -1 for no limit
+	# in kilobytes, or -1 for no limit
 
 # - Kernel Resources -
 
@@ -289,7 +289,7 @@
 #max_wal_senders = 10		# max number of walsender processes
 # (change requires restart)
 #wal_keep_segments = 0		# in logfile segments; 0 disables
-#max_slot_wal_keep_size = -1	# measured in bytes; -1 disables
+#max_slot_wal_keep_size = -1	# in megabytes; -1 disables
 #wal_sender_timeout = 60s	# in milliseconds; 0 disables
 
 #max_replication_slots = 10	# max number of replication slots
-- 
2.18.2



Re: race condition when writing pg_control

2020-05-27 Thread Michael Paquier
On Tue, May 26, 2020 at 07:30:54PM +, Bossart, Nathan wrote:
> While an assertion in UpdateControlFile() would not have helped us
> catch the problem I initially reported, it does seem worthwhile to add
> it.  I have attached a patch that adds this assertion and also
> attempts to fix XLogReportParameters().  Since there is only one place
> where we feel it is safe to call UpdateControlFile() without a lock, I
> just changed it to take the lock.  I don't think this adds any sort of
> significant contention risk, and IMO it is a bit cleaner than the
> boolean flag.

Let's see what Fujii-san and Thomas think about that.  I'd rather
avoid taking a lock here because we don't need it and because it makes
things IMO confusing with the beginning of StartupXLOG() where a lot
of the fields are read, even if we go without this extra assertion.

> For the XLogReportParameters() fix, I simply added an exclusive lock
> acquisition for the portion that updates the values in shared memory
> and calls UpdateControlFile().  IIUC the first part of this function
> that accesses several ControlFile values should be safe, as none of
> them can be updated after server start.

They can get updated when replaying a XLOG_PARAMETER_CHANGE record.
But you are right as all of this happens in the startup process, so
your patch looks right to me here.
--
Michael


signature.asc
Description: PGP signature


Re: max_slot_wal_keep_size comment in postgresql.conf

2020-05-27 Thread Fujii Masao




On 2020/05/27 15:11, Michael Paquier wrote:

On Wed, May 27, 2020 at 10:46:27AM +0900, Kyotaro Horiguchi wrote:

Agreed. It should be a leftover at the time the unit was changed
(before committed) to MB from bytes.  The default value makes the
confusion worse.

Is the following works?

#max_slot_wal_keep_size = -1  # in MB; -1 disables


Indeed, better to fix that.  The few GUCs using memory units that have
such a mention in their comments use the actual name of the memory
unit, and not its abbreviation (see log_temp_files).  So it seems more
logic to me to just use "in megabytes; -1 disables", that would be
also more consistent with the time-unit-based ones.


+1

#temp_file_limit = -1   # limits per-process temp file space
# in kB, or -1 for no limit

BTW, the abbreviation "in kB" is used in temp_file_limit.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: max_slot_wal_keep_size comment in postgresql.conf

2020-05-27 Thread Kyotaro Horiguchi
At Tue, 26 May 2020 22:56:39 -0400, Isaac Morland  
wrote in 
> On Tue, 26 May 2020 at 21:46, Kyotaro Horiguchi 
> wrote:
> 
> > At Tue, 26 May 2020 09:10:40 -0400, Jeff Janes 
> > wrote in
> > > In postgresql.conf, it says:
> > >
> > > #max_slot_wal_keep_size = -1  # measured in bytes; -1 disables
> > >
> > > I don't know if that is describing the dimension of this parameter or the
> > > units of it, but the default units for it are megabytes, not individual
> > > bytes, so I think it is pretty confusing.
> >
> > Agreed. It should be a leftover at the time the unit was changed
> > (before committed) to MB from bytes.  The default value makes the
> > confusion worse.
> >
> > Is the following works?
> >
> > #max_slot_wal_keep_size = -1  # in MB; -1 disables
> 
> 
> Extreme pedant question: Is it MB (10^6 bytes) or MiB (2^20 bytes)?


GUC variables for file/memory sizes are in a traditional
representation, that is, a power of two represented by
SI-prefixes. AFAICS PostgreSQL doesn't use binary-prefixed units.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: password_encryption default

2020-05-27 Thread Peter Eisentraut

On 2020-05-27 08:00, Michael Paquier wrote:

On Tue, May 26, 2020 at 10:25:25AM +0200, Peter Eisentraut wrote:

Yeah, I was too enthusiastic about removing that.  Here is a better patch.


+as an MD5 hash.  (on is also accepted, as an alias
+for md5.)  The default is
+scram-sha-256.
Shouldn't password_encryption = on/true/1/yes be an equivalent of
scram-sha-256 as the default gets changed?


I think these are mostly legacy options anyway, so if we wanted to make 
a change, we should remove them.


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: max_slot_wal_keep_size comment in postgresql.conf

2020-05-27 Thread Michael Paquier
On Wed, May 27, 2020 at 10:46:27AM +0900, Kyotaro Horiguchi wrote:
> Agreed. It should be a leftover at the time the unit was changed
> (before committed) to MB from bytes.  The default value makes the
> confusion worse.
> 
> Is the following works?
> 
> #max_slot_wal_keep_size = -1  # in MB; -1 disables

Indeed, better to fix that.  The few GUCs using memory units that have
such a mention in their comments use the actual name of the memory
unit, and not its abbreviation (see log_temp_files).  So it seems more
logic to me to just use "in megabytes; -1 disables", that would be
also more consistent with the time-unit-based ones.
--
Michael


signature.asc
Description: PGP signature


Re: segmentation fault using currtid and partitioned tables

2020-05-27 Thread Michael Paquier
On Wed, May 27, 2020 at 12:29:39AM -0500, Jaime Casanova wrote:
> so, currently the patch just installs protections on both currtid_*
> functions and adds some tests... therefore we can consider it as a bug
> fix and let it go in 13? actually also backpatch in 12...

Yes, and it has the advantage to be simple.

> only point to mention is a typo (a missing "l") in an added comment:
> 
> + * currtid_byrename

Oops, thanks.
--
Michael


signature.asc
Description: PGP signature


Re: password_encryption default

2020-05-27 Thread Michael Paquier
On Tue, May 26, 2020 at 10:25:25AM +0200, Peter Eisentraut wrote:
> Yeah, I was too enthusiastic about removing that.  Here is a better patch.

+as an MD5 hash.  (on is also accepted, as an alias
+for md5.)  The default is
+scram-sha-256.
Shouldn't password_encryption = on/true/1/yes be an equivalent of
scram-sha-256 as the default gets changed?
--
Michael


signature.asc
Description: PGP signature