Re: [Proposal] Global temporary tables

2019-11-01 Thread Pavel Stehule
On Fri, Nov 1, 2019 at 17:09, Konstantin Knizhnik <k.knizh...@postgrespro.ru> wrote:

>
>
> On 01.11.2019 18:26, Robert Haas wrote:
> > On Fri, Nov 1, 2019 at 11:15 AM Konstantin Knizhnik
> >  wrote:
> >> It seems to me that I have found a quite elegant solution for per-backend
> >> statistics for GTT: I just insert it into the backend's catalog cache, but
> >> not into the pg_statistic table itself.
> >> To do this I have to add InsertSysCache/InsertCatCache functions which
> >> insert a pinned entry into the corresponding cache.
> >> I wonder if there are some pitfalls to such an approach?
> > That sounds pretty hackish. You'd have to be very careful, for
> > example, that if the tables were dropped or re-analyzed, all of the
> > old entries got removed --
>
> I have checked it:
> - when a table is reanalyzed, its cache entries are replaced.
> - when a table is dropped, its cache entries are removed.
>
> > and then it would still fail if any code
> > tried to access the statistics directly from the table, rather than
> > via the caches. My assumption is that the statistics ought to be
> > stored in some backend-private data structure designed for that
> > purpose, and that the code that needs the data should be taught to
> > look for it there when the table is a GTT.
>
> Yes, if you do "select * from pg_statistic" then you will not see the
> statistics for GTTs in this case.
> But I do not think that this is so critical. I do not believe that anybody
> is trying to manually interpret the values in this table.
> And the optimizer retrieves statistics through the syscache mechanism, and so
> is able to build a correct plan in this case.
>

Years ago, when I thought about it, I wrote a patch with a similar design.
It works, but it's surely ugly.

I have another idea. Could pg_statistics be a view instead of a table?

Something like

SELECT * FROM pg_catalog.pg_statistics_rel
UNION ALL
SELECT * FROM pg_catalog.pg_statistics_gtt();

Internally, when the stat cache is filled, pg_statistics_rel and
pg_statistics_gtt() can be used directly. As far as I remember, it was not
possible to work with queries there, only with plain relations.

Or a crazy idea: today we can implement our own types of heaps. It should be
possible to create an engine where the result is a combination of some shared
data and local data, so the union would be implemented at the heap level.
This implementation could be simple, just scanning pages from shared buffers
and from local buffers. For these tables we don't need complex metadata.
It's a crazy idea, and I think the union with a table function would be best.

Regards

Pavel





> --
> Konstantin Knizhnik
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company
>
>


Re: [BUG] Partition creation fails after dropping a column and adding a partial index

2019-11-01 Thread Michael Paquier
On Fri, Nov 01, 2019 at 09:58:26AM +0900, Amit Langote wrote:
> On Thu, Oct 31, 2019 at 1:45 PM Michael Paquier  wrote:
>> The patch is rather simple as per the attached, with extended
>> regression tests included.  I have not checked on back-branches yet,
>> but that's visibly wrong since 8b08f7d down to v11 (will do that when
>> back-patching).
> 
> The patch looks correct and applies to both v12 and v11.

Thanks for the review, committed down to v11.  The version for v11 had
a couple of conflicts actually.

>> There could be a point in changing convert_tuples_by_name_map & co so
>> that they return the length of the map on top of the map, to avoid such
>> mistakes in the future.  That's a more invasive patch not really
>> adapted for a back-patch, but we could do that on HEAD once this bug
>> is fixed.  I have also checked other calls of this API and the
>> handling is done correctly.
> 
> I've been bitten by this logical error when deciding what length to
> use for the map, so seems like a good idea.

Okay, let's see about that then.
--
Michael




Re: Remove configure --disable-float4-byval and --disable-float8-byval

2019-11-01 Thread Peter Geoghegan
On Fri, Nov 1, 2019 at 7:47 PM Michael Paquier  wrote:
> > This line of argument seems to me to be the moral equivalent of
> > "let's drop 32-bit support altogether".  I'm not entirely on board
> > with that.  Certainly, a lot of the world is 64-bit these days,
> > but people are still building small systems and they might want
> > a database; preferably one that hasn't been detuned to the extent
> > that it barely manages to run at all on such a platform.  Making
> > a whole lot of internal APIs 64-bit would be a pretty big hit for
> > a 32-bit platform --- more instructions, more memory consumed for
> > things like Datum arrays, all in a memory space that's not that big.
>
> I don't agree either with the line of argument to just remove 32-bit
> support.

Clearly you didn't read what I actually wrote, Michael.

--
Peter Geoghegan




Re: Remove configure --disable-float4-byval and --disable-float8-byval

2019-11-01 Thread Andres Freund
On 2019-11-02 11:47:26 +0900, Michael Paquier wrote:
> On Fri, Nov 01, 2019 at 02:00:10PM -0400, Tom Lane wrote:
> > Peter Geoghegan  writes:
> >> Even Raspberry Pi devices (which can cost as little as $35) use 64-bit
> >> ARM processors. It's abundantly clear that 32-bit platforms do not
> >> matter enough to justify keeping all the SIZEOF_DATUM crud around.
> > 
> > This line of argument seems to me to be the moral equivalent of
> > "let's drop 32-bit support altogether".  I'm not entirely on board
> > with that.  Certainly, a lot of the world is 64-bit these days,
> > but people are still building small systems and they might want
> > a database; preferably one that hasn't been detuned to the extent
> > that it barely manages to run at all on such a platform.  Making
> > a whole lot of internal APIs 64-bit would be a pretty big hit for
> > a 32-bit platform --- more instructions, more memory consumed for
> > things like Datum arrays, all in a memory space that's not that big.
> 
> I don't agree either with the line of argument to just remove 32-bit
> support.

Nobody is actually proposing that, as far as I can tell? I mean, by all
means argue that the overhead is too high, but just claiming that
slowing down 32bit systems by a measurable fraction is morally
equivalent to removing 32bit support seems pointlessly facetious.

Greetings,

Andres Freund




Re: Remove configure --disable-float4-byval and --disable-float8-byval

2019-11-01 Thread Michael Paquier
On Fri, Nov 01, 2019 at 02:00:10PM -0400, Tom Lane wrote:
> Peter Geoghegan  writes:
>> Even Raspberry Pi devices (which can cost as little as $35) use 64-bit
>> ARM processors. It's abundantly clear that 32-bit platforms do not
>> matter enough to justify keeping all the SIZEOF_DATUM crud around.
> 
> This line of argument seems to me to be the moral equivalent of
> "let's drop 32-bit support altogether".  I'm not entirely on board
> with that.  Certainly, a lot of the world is 64-bit these days,
> but people are still building small systems and they might want
> a database; preferably one that hasn't been detuned to the extent
> that it barely manages to run at all on such a platform.  Making
> a whole lot of internal APIs 64-bit would be a pretty big hit for
> a 32-bit platform --- more instructions, more memory consumed for
> things like Datum arrays, all in a memory space that's not that big.

I don't agree either with the line of argument to just remove 32-bit
support.  The newest models of the Pi indeed use 64-bit ARM processors, but
the first model, as well as the Pi Zero, are on 32-bit if I recall
correctly, and I would like to believe that these are still widely
used.
--
Michael




Re: Make StringInfo available to frontend code.

2019-11-01 Thread Andres Freund
Hi,

On 2019-11-01 23:19:33 +0100, Daniel Gustafsson wrote:
> > On 30 Oct 2019, at 01:10, Andres Freund  wrote:
> 
> >Make StringInfo available to frontend code.
> 
> I’ve certainly wanted just that on multiple occasions, so +1 on this.

Cool.


> + * StringInfo provides an extensible string data type.  It can be used to
> 
> It might be useful to point out the upper bound on the extensibility in the
> rewrite of this sentence, and that it’s not guaranteed to be consistent
> between frontend and backend.

Hm. Something like 'Currently the maximum length of a StringInfo is
1GB.'? I don't really think it's worth pointing out that they may not be
consistent, when they currently are...

And I suspect we should just fix the length limit to be higher for both,
perhaps somehow allowing the length to be limited for the backend cases where
we want to error out if a string gets too long (possibly adding a
separate initialization API that allows specifying the memory allocation
flags or such).
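
To make that last idea concrete, here is a standalone mock (not the real
lib/stringinfo.h API; the type, function names, and per-instance cap are all
invented for illustration) of what an initializer taking a length limit
could look like:

#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Mock of a StringInfo-like buffer with a caller-chosen length cap. */
typedef struct
{
	char	   *data;
	size_t		len;
	size_t		maxlen;
	size_t		limit;		/* hypothetical cap, e.g. 1GB in the backend */
} CappedStringInfo;

static void
initCappedStringInfo(CappedStringInfo *str, size_t limit)
{
	str->maxlen = 1024;
	str->data = malloc(str->maxlen);
	str->data[0] = '\0';
	str->len = 0;
	str->limit = limit;
}

/* Refuses to grow past the cap; out-of-memory handling elided. */
static bool
appendCapped(CappedStringInfo *str, const char *s)
{
	size_t		need = str->len + strlen(s) + 1;

	if (need > str->limit)
		return false;
	while (str->maxlen < need)
	{
		str->maxlen *= 2;
		str->data = realloc(str->data, str->maxlen);
	}
	memcpy(str->data + str->len, s, strlen(s) + 1);
	str->len = need - 1;
	return true;
}

The real thing would presumably keep erroring out in the backend and fail
more gracefully in the frontend; the mock only shows where a per-instance
limit (and, by extension, allocation flags) could be plugged in.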


> > I'm still using stringinfo in the aforementioned thread, and I also want
> > to use it in a few more places. On the more ambitious side I really
> > would like to have a minimal version of elog.h available in the backend,
> > and that would really be a lot easier with stringinfo available.
> 
> s/backend/frontend/?

Indeed.

Thanks for looking,

Andres Freund




Re: Rearranging ALTER TABLE to avoid multi-operations bugs

2019-11-01 Thread Tom Lane
I wrote:
> Anyway, with the benefit of more time to let this thing percolate
> in my hindbrain, I am thinking that the fundamental error we've made
> is to do transformAlterTableStmt in advance of execution *at all*.
> The idea I now have is to scrap that, and instead apply the
> parse_utilcmd.c transformations individually to each AlterTable
> subcommand when it reaches execution in "phase 2" of AlterTable().

Attached is a patch that does things that way.  This appears to fix
all of the previously reported order-of-operations bugs in ALTER
TABLE, although there's still some squirrely-ness around identity
columns.

My original thought of postponing all parse analysis into the
execution phase turned out to be not quite right.  We still want
to analyze ALTER COLUMN TYPE subcommands before we start doing
anything.  The reason why is that any USING expressions in those
subcommands should all be parsed against the table's starting
rowtype, since those expressions will all be evaluated against
that state during a single rewrite pass in phase 3.  Fortunately
(but not coincidentally, I think) the execution-passes design is
"DROP, then ALTER COLUMN TYPE, then everything else", so that this
is okay.

I had to do some other finagling to get it to work, notably breaking
down some of the passes a bit more.  This allows us to have a rule
that any new subcommands deduced during mid-execution parse analysis
steps will be executed in a strictly later pass.  It might've been
possible to allow it to be "same pass", but I thought that would
be putting an undesirable amount of reliance on the semantics of
appending to a list that some other function is busy scanning.

What I did about the API issues we were arguing about before was
just to move the logic ProcessUtilitySlow had for handling
non-AlterTableStmts generated by ALTER TABLE parse analysis into
a new function that tablecmds.c calls.  This doesn't really resolve
any of the questions I had about event trigger processing, but
I think it at least doesn't make anything worse.  (The event
trigger, logical decoding, and sepgsql tests all pass without
any changes.)  It's tempting to consider providing a similar
API for CREATE SCHEMA to use, but I didn't do so here.

The squirrely-ness around identity is that while this now works:

regression=# CREATE TABLE itest8 (f1 int);
CREATE TABLE
regression=# ALTER TABLE itest8
regression-#   ADD COLUMN f2 int NOT NULL,
regression-#   ALTER COLUMN f2 ADD GENERATED ALWAYS AS IDENTITY;
ALTER TABLE

it doesn't work if there's rows in the table:

regression=# CREATE TABLE itest8 (f1 int);
CREATE TABLE
regression=# insert into itest8 default values;
INSERT 0 1
regression=# ALTER TABLE itest8
  ADD COLUMN f2 int NOT NULL,
  ALTER COLUMN f2 ADD GENERATED ALWAYS AS IDENTITY;
ERROR:  column "f2" contains null values

The same would be true if you tried to do the ALTER as two separate
operations (because the ADD ... NOT NULL, without a default, will
naturally fail on a nonempty table).  So I don't feel *too* awful
about that.  But it'd be better if this worked.  It'll require
some refactoring of where the dependency link from an identity
column to its sequence gets set up.  This patch seems large enough
as-is, and it covers all the cases we've gotten field complaints
about, so I'm content to leave the residual identity issues for later.

regards, tom lane

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 8d25d14..7dc5d9a 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -86,6 +86,7 @@
 #include "storage/lock.h"
 #include "storage/predicate.h"
 #include "storage/smgr.h"
+#include "tcop/utility.h"
 #include "utils/acl.h"
 #include "utils/builtins.h"
 #include "utils/fmgroids.h"
@@ -144,11 +145,13 @@ static List *on_commits = NIL;
 #define AT_PASS_OLD_CONSTR		3	/* re-add existing constraints */
 /* We could support a RENAME COLUMN pass here, but not currently used */
 #define AT_PASS_ADD_COL			4	/* ADD COLUMN */
-#define AT_PASS_COL_ATTRS		5	/* set other column attributes */
-#define AT_PASS_ADD_INDEX		6	/* ADD indexes */
-#define AT_PASS_ADD_CONSTR		7	/* ADD constraints, defaults */
-#define AT_PASS_MISC			8	/* other stuff */
-#define AT_NUM_PASSES			9
+#define AT_PASS_ADD_CONSTR		5	/* ADD constraints (initial examination) */
+#define AT_PASS_COL_ATTRS		6	/* set column attributes, eg NOT NULL */
+#define AT_PASS_ADD_INDEXCONSTR	7	/* ADD index-based constraints */
+#define AT_PASS_ADD_INDEX		8	/* ADD indexes */
+#define AT_PASS_ADD_OTHERCONSTR	9	/* ADD other constraints, defaults */
+#define AT_PASS_MISC			10	/* other stuff */
+#define AT_NUM_PASSES			11
 
 typedef struct AlteredTableInfo
 {
@@ -161,6 +164,7 @@ typedef struct AlteredTableInfo
 	/* Information saved by Phases 1/2 for Phase 3: */
 	List	   *constraints;	/* List of NewConstraint */
 	List	   *newvals;		/* List of NewColumnValue */
+	List	   *afterStmts;		/* List of utility command 

Re: Make StringInfo available to frontend code.

2019-11-01 Thread Daniel Gustafsson
> On 30 Oct 2019, at 01:10, Andres Freund  wrote:

>Make StringInfo available to frontend code.

I’ve certainly wanted just that on multiple occasions, so +1 on this.

>Therefore it seems best to just making StringInfo being usable by
>frontend code. There's not much to do for that, except for rewriting
>two subsequent elog/ereport calls into others types of error
>reporting, and deciding on a maximum string length.

Skimming (but not testing) the patch, it seems a reasonable approach.

+ * StringInfo provides an extensible string data type.  It can be used to

It might be useful to point out the upper bound on the extensibility in the
rewrite of this sentence, and that it’s not guaranteed to be consistent between
frontend and backend.

> I'm still using stringinfo in the aforementioned thread, and I also want
> to use it in a few more places. On the more ambitious side I really
> would like to have a minimal version of elog.h available in the backend,
> and that would really be a lot easier with stringinfo available.

s/backend/frontend/?

cheers ./daniel



Re: Join Correlation Name

2019-11-01 Thread Vik Fearing
On 30/10/2019 09:04, Fabien COELHO wrote:
>
>> I think possibly what the spec says (and that neither my patch nor
>> Peter's implements) is assigning the alias just to the <join column list>.
>
> I think you are right, the alias is only on the identical columns.
>
> It solves the issue I raised about inaccessible attributes, and
> explains why it is only available with USING and no other join variants.
>
>> So my original example query should actually be:
>>
>> SELECT a.x, b.y, j.z FROM a INNER JOIN b USING (z) AS j;
>
> Yep, only z should be in j, it is really just about the USING clause.


My reading of SQL:2016-2 7.10 SR 11.a convinces me that this is the case.


My reading of transformFromClauseItem() convinces me that this is way
over my head and I have to abandon it. :-(





Re: fe-utils - share query cancellation code

2019-11-01 Thread Fabien COELHO



> Then you need to add #include libpq-fe.h in cancel.h.  Our policy is
> that headers compile standalone (c.h / postgres_fe.h / postgres.h
> excluded).


Ok. I do not make a habit of adding headers in postgres, so I did not 
notice there was an alphabetical logic to that.


Attached patch v4 does it.

--
Fabien.

diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 03bcd22996..904d6f0e00 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -59,6 +59,7 @@
 
 #include "common/int.h"
 #include "common/logging.h"
+#include "fe_utils/cancel.h"
 #include "fe_utils/conditional.h"
 #include "getopt_long.h"
 #include "libpq-fe.h"
@@ -3887,6 +3888,9 @@ initGenerateData(PGconn *con)
 			exit(1);
 		}
 
+		if (CancelRequested)
+			break;
+
 		/*
 		 * If we want to stick with the original logging, print a message each
 		 * 100k inserted rows.
@@ -4057,6 +4061,9 @@ runInitSteps(const char *initialize_steps)
 	if ((con = doConnect()) == NULL)
 		exit(1);
 
+	setup_cancel_handler(NULL);
+	SetCancelConn(con);
+
 	for (step = initialize_steps; *step != '\0'; step++)
 	{
 		instr_time	start;
@@ -4120,6 +4127,7 @@ runInitSteps(const char *initialize_steps)
 	}
 
 	fprintf(stderr, "done in %.2f s (%s).\n", run_time, stats.data);
+	ResetCancelConn();
 	PQfinish(con);
 	termPQExpBuffer(&stats);
 }
diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index b981ae81ff..f1d9e0298a 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -29,6 +29,7 @@
 #include "libpq-fe.h"
 #include "pqexpbuffer.h"
 #include "common/logging.h"
+#include "fe_utils/cancel.h"
 #include "fe_utils/print.h"
 #include "fe_utils/string_utils.h"
 
diff --git a/src/bin/psql/common.c b/src/bin/psql/common.c
index 90f6380170..0a66b71372 100644
--- a/src/bin/psql/common.c
+++ b/src/bin/psql/common.c
@@ -24,6 +24,7 @@
 #include "copy.h"
 #include "crosstabview.h"
 #include "fe_utils/mbprint.h"
+#include "fe_utils/cancel.h"
 #include "fe_utils/string_utils.h"
 #include "portability/instr_time.h"
 #include "settings.h"
@@ -223,6 +224,7 @@ NoticeProcessor(void *arg, const char *message)
 }
 
 
+#ifndef WIN32
 
 /*
  * Code to support query cancellation
@@ -241,7 +243,7 @@ NoticeProcessor(void *arg, const char *message)
  *
  * SIGINT is supposed to abort all long-running psql operations, not only
  * database queries.  In most places, this is accomplished by checking
- * cancel_pressed during long-running loops.  However, that won't work when
+ * CancelRequested during long-running loops.  However, that won't work when
  * blocked on user input (in readline() or fgets()).  In those places, we
  * set sigint_interrupt_enabled true while blocked, instructing the signal
  * catcher to longjmp through sigint_interrupt_jmp.  We assume readline and
@@ -252,34 +254,9 @@ volatile bool sigint_interrupt_enabled = false;
 
 sigjmp_buf	sigint_interrupt_jmp;
 
-static PGcancel *volatile cancelConn = NULL;
-
-#ifdef WIN32
-static CRITICAL_SECTION cancelConnLock;
-#endif
-
-/*
- * Write a simple string to stderr --- must be safe in a signal handler.
- * We ignore the write() result since there's not much we could do about it.
- * Certain compilers make that harder than it ought to be.
- */
-#define write_stderr(str) \
-	do { \
-		const char *str_ = (str); \
-		int		rc_; \
-		rc_ = write(fileno(stderr), str_, strlen(str_)); \
-		(void) rc_; \
-	} while (0)
-
-
-#ifndef WIN32
-
 static void
-handle_sigint(SIGNAL_ARGS)
+psql_sigint_callback(void)
 {
-	int			save_errno = errno;
-	char		errbuf[256];
-
 	/* if we are waiting for input, longjmp out of it */
 	if (sigint_interrupt_enabled)
 	{
@@ -288,74 +265,20 @@ handle_sigint(SIGNAL_ARGS)
 	}
 
 	/* else, set cancel flag to stop any long-running loops */
-	cancel_pressed = true;
-
-	/* and send QueryCancel if we are processing a database query */
-	if (cancelConn != NULL)
-	{
-		if (PQcancel(cancelConn, errbuf, sizeof(errbuf)))
-			write_stderr("Cancel request sent\n");
-		else
-		{
-			write_stderr("Could not send cancel request: ");
-			write_stderr(errbuf);
-		}
-	}
-
-	errno = save_errno;			/* just in case the write changed it */
+	CancelRequested = true;
 }
 
-void
-setup_cancel_handler(void)
-{
-	pqsignal(SIGINT, handle_sigint);
-}
-#else			/* WIN32 */
-
-static BOOL WINAPI
-consoleHandler(DWORD dwCtrlType)
-{
-	char		errbuf[256];
-
-	if (dwCtrlType == CTRL_C_EVENT ||
-		dwCtrlType == CTRL_BREAK_EVENT)
-	{
-		/*
-		 * Can't longjmp here, because we are in wrong thread :-(
-		 */
-
-		/* set cancel flag to stop any long-running loops */
-		cancel_pressed = true;
-
-		/* and send QueryCancel if we are processing a database query */
-		EnterCriticalSection(&cancelConnLock);
-		if (cancelConn != NULL)
-		{
-			if (PQcancel(cancelConn, errbuf, sizeof(errbuf)))
-write_stderr("Cancel request sent\n");
-			else
-			{
-write_stderr("Could not send cancel request: ");
-write_stderr(errbuf);
-			}
-		}
-		LeaveCriticalSection(&cancelConnLock);
-
-		return TRUE;
-	}
-	else
-		/* Return FALSE for any 

Re: Remove configure --disable-float4-byval and --disable-float8-byval

2019-11-01 Thread Peter Geoghegan
On Fri, Nov 1, 2019 at 1:19 PM Robert Haas  wrote:
> Yeah! I mean, users who are using only 4-byte or smaller pass-by-value
> quantities will be harmed, especially in cases where they are storing
> a lot of them at the same time (e.g. sorting) and especially if they
> double their space consumption and run out of their very limited
> supply of memory.

I think that you meant treble, not double. You're forgetting about the
palloc() header overhead.   ;-)

Doing even slightly serious work on a 32-bit machine is penny wise and
pound foolish. I also believe that we should support minority
platforms reasonably well, including 32-bit platforms, because it's
always a good idea to try to meet people where they are. Your proposal
doesn't seem like it really gives up on that goal.

> Those are all worthwhile considerations and perhaps
> strong arguments against my proposal. But, people using 8-byte
> oughta-be-pass-by-value quantities are certainly being harmed by the
> present system. It's not a black-and-white thing, like, oh, 8-byte
> datums on 32-bit system is awful all the time. Or at least, I don't
> think it is.

Right.

> Yeah, I've had to fight with this multiple times, and so have other
> people. It's a nuisance, and it causes bugs, certainly in draft
> patches, sometimes in committed ones. It's not "free." If it's a cost
> worth paying, ok, but is it?

#ifdef crud is something that we should go out of our way to eliminate
on general principle. All good portable C codebases go to great
lengths to encapsulate platform differences, if necessary by adding a
compatibility layer. One of the worst things about the OpenSSL
codebase is that it makes writing portable code everybody's problem.

-- 
Peter Geoghegan




excluding CREATE INDEX CONCURRENTLY from OldestXmin

2019-11-01 Thread Alvaro Herrera
It would be useful to have CREATE INDEX CONCURRENTLY be ignored by
vacuuming's OldestXmin.  Frequently in OLTP scenarios, CIC transactions
are severely disruptive because they are the only long-running
transactions in the system, and VACUUM has to keep rows only for their
sake, pointlessly.  The motivation for this change seems well justified
to me (but feel free to argue if you think otherwise).

So the question is how to implement this.  Here's a very small patch for
it:

diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 374e2d0efe..9081dfe384 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -532,6 +532,12 @@ DefineIndex(Oid relationId,
 errmsg("cannot use more than %d columns in an 
index",
INDEX_MAX_KEYS)));
 
+   if (stmt->concurrent && !IsTransactionBlock())
+   {
+   Assert(GetCurrentTransactionIdIfAny() == InvalidTransactionId);
+   MyPgXact->vacuumFlags |= PROC_IN_VACUUM;
+   }
+
/*
 * Only SELECT ... FOR UPDATE/SHARE are allowed while doing a standard
 * index build; but for concurrent builds we allow INSERT/UPDATE/DELETE

There's an obvious flaw, which is that this doesn't consider expressions
in partial indexes and column definitions.  That's moderately easy to
fix.  But there are less obvious flaws, such as: is it possible that
CIC's xmin is required for other reasons? (such as access to catalogs,
which get cleaned by concurrent vacuum)  If it is, can we fix that
problem by keeping track of a separate Xmin for catalog vacuuming
purposes?  (We already have catalog_xmin for replication slots, so this
is not completely ridiculous I think ...)

-- 
Álvaro Herrera




Re: Remove configure --disable-float4-byval and --disable-float8-byval

2019-11-01 Thread Robert Haas
On Fri, Nov 1, 2019 at 3:15 PM Peter Geoghegan  wrote:
> I don't think that those two things are equivalent at all. There may
> even be workloads that will benefit when run on 32-bit hardware.
> Having to palloc() and pfree() with 8 byte integers is probably very
> slow.

Yeah! I mean, users who are using only 4-byte or smaller pass-by-value
quantities will be harmed, especially in cases where they are storing
a lot of them at the same time (e.g. sorting) and especially if they
double their space consumption and run out of their very limited
supply of memory. Those are all worthwhile considerations and perhaps
strong arguments against my proposal. But, people using 8-byte
oughta-be-pass-by-value quantities are certainly being harmed by the
present system. It's not a black-and-white thing, like, oh, 8-byte
datums on 32-bit system is awful all the time. Or at least, I don't
think it is.

> The mental burden of considering "SIZEOF_DATUM == 4" and
> "USE_FLOAT8_BYVAL" is a real cost for us. We maintain non-trivial
> 32-bit only code for numeric abbreviated keys. We also have to worry
> about pfree()'ing memory when USE_FLOAT8_BYVAL within
> heapam_index_validate_scan(). How confident are we that there isn't
> some place that leaks memory on !USE_FLOAT8_BYVAL builds because
> somebody forgot to add a pfree() in an #ifdef block?

Yeah, I've had to fight with this multiple times, and so have other
people. It's a nuisance, and it causes bugs, certainly in draft
patches, sometimes in committed ones. It's not "free." If it's a cost
worth paying, ok, but is it?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-11-01 Thread Robert Haas
On Tue, Aug 6, 2019 at 10:36 AM Bruce Momjian  wrote:
> OK, I think you are missing something.   Let me go over the details.
> First, I think we are all agreed we are using CTR for heap/index pages,
> and for WAL, because CTR allows byte granularity, it is faster, and
> might be more secure.
>
> So, to write 8k heap/index pages, we use the agreed-on LSN/page-number
> to encrypt each page.  In CTR mode, we do that by creating an 8k bit
> stream, which is created in 16-byte chunks with AES by incrementing the
> counter used for each 16-byte chunk.  We then XOR the bits with what we
> want to encrypt, and skip the LSN and CRC parts of the page.

Seems reasonable (not that I am an encryption expert).

> For WAL, we effectively create a 16MB bitstream, though we can create it
> in parts as needed.  (Creating it in parts is easier in CTR mode.)  The
> nonce is the segment number, but each 16-byte chunk uses a different
> counter.  Therefore, even if you are encrypting the same 8k page several
> times in the WAL, the 8k page would be different because of the LSN (and
> other changes), and the bitstream you encrypt/XOR it with would be
> different because the counter would be different for that offset in the
> WAL.
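
For reference, here is a minimal standalone sketch of the CTR construction
being described (the nonce layout, the number of header bytes skipped, the
function name, and the use of OpenSSL are all illustrative assumptions, not
the actual patch):

#include <stdint.h>
#include <string.h>
#include <openssl/evp.h>

#define PAGE_SIZE	8192
#define SKIP_BYTES	12		/* stand-in for pd_lsn + pd_checksum; illustrative */

/*
 * XOR an AES-256-CTR keystream over one page.  The keystream depends only
 * on the key and the per-page nonce (here: LSN plus block number packed
 * into the IV), so running the same function again decrypts.
 */
static void
crypt_page_ctr(const unsigned char key[32], uint64_t lsn, uint32_t blkno,
			   unsigned char page[PAGE_SIZE])
{
	unsigned char iv[16] = {0};
	unsigned char zeros[PAGE_SIZE] = {0};
	unsigned char keystream[PAGE_SIZE];
	int			outlen;
	EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();

	/* Pack the nonce; the low IV bytes serve as the 16-byte-block counter. */
	memcpy(iv, &lsn, sizeof(lsn));
	memcpy(iv + 8, &blkno, sizeof(blkno));

	/* Encrypting zeroes in CTR mode yields the raw keystream. */
	EVP_EncryptInit_ex(ctx, EVP_aes_256_ctr(), NULL, key, iv);
	EVP_EncryptUpdate(ctx, keystream, &outlen, zeros, PAGE_SIZE);
	EVP_CIPHER_CTX_free(ctx);

	/* XOR everything except the bytes that must stay readable. */
	for (int i = SKIP_BYTES; i < PAGE_SIZE; i++)
		page[i] ^= keystream[i];
}

Decryption is the very same XOR, which is also why the counter values used
for any given bytes must be reconstructible later from on-disk state, which
is the issue discussed below.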

But, if you encrypt the same WAL page several times, the LSN won't
change, because a WAL page doesn't have an LSN on it, and if it did,
it wouldn't be changing, because an LSN is just a position within the
WAL stream, so any given byte on any given WAL page always has the
same LSN, whatever it is.

And if the counter value changed on re-encryption, I don't see how
we'd know what counter value to use when decrypting.  There's no way
for the code that is decrypting to know how many times the page got
rewritten as it was being filled.

Please correct me if I'm being stupid here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-11-01 Thread Robert Haas
On Mon, Aug 5, 2019 at 8:44 PM Bruce Momjian  wrote:
> Right.  The 8k page LSN changes each time the page is modified, and the
> LSN is part of the page nonce.

What about hint bit changes?

I think even with wal_log_hints=on, it's not the case that *every*
change to hint bits results in an LSN change.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: fe-utils - share query cancellation code

2019-11-01 Thread Alvaro Herrera
On 2019-Nov-01, Fabien COELHO wrote:

> Because "cancel.h" requires PGconn declaration, thus must appear after
> "libpq-fe.h",

Then you need to add #include libpq-fe.h in cancel.h.  Our policy is
that headers compile standalone (c.h / postgres_fe.h / postgres.h
excluded).

-- 
Álvaro Herrera        https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Remove configure --disable-float4-byval and --disable-float8-byval

2019-11-01 Thread Peter Geoghegan
On Fri, Nov 1, 2019 at 11:00 AM Tom Lane  wrote:
> This line of argument seems to me to be the moral equivalent of
> "let's drop 32-bit support altogether".  I'm not entirely on board
> with that.

I don't think that those two things are equivalent at all. There may
even be workloads that will benefit when run on 32-bit hardware.
Having to palloc() and pfree() with 8 byte integers is probably very
slow.

> Certainly, a lot of the world is 64-bit these days,
> but people are still building small systems and they might want
> a database; preferably one that hasn't been detuned to the extent
> that it barely manages to run at all on such a platform.

Even the very low end is 64-bit these days. $100 smartphones have
64-bit CPUs and 4GB of ram. AFAICT, anything still being produced that
is recognizable as a general purpose CPU (e.g. by having virtual
memory) is 64-bit. We're talking about a change that can't affect
users until late 2020 at the earliest.

I accept that there are some number of users that still have 32-bit
Postgres installations, probably because they happened to have a
32-bit machine close at hand. The economics of running a database
server on a 32-bit machine are already awful, though.

> It seems especially insane to conclude that we should pull the plug
> on such use-cases just to get rid of one obscure configure option.
> If we were expending any significant devel effort on supporting
> 32-bit platforms, I might be ready to drop support, but we're not.
> (Robert's proposal looks to me like it's actually creating new work
> to do, not saving work.)

The mental burden of considering "SIZEOF_DATUM == 4" and
"USE_FLOAT8_BYVAL" is a real cost for us. We maintain non-trivial
32-bit only code for numeric abbreviated keys. We also have to worry
about pfree()'ing memory when USE_FLOAT8_BYVAL within
heapam_index_validate_scan(). How confident are we that there isn't
some place that leaks memory on !USE_FLOAT8_BYVAL builds because
somebody forgot to add a pfree() in an #ifdef block?

-- 
Peter Geoghegan




Re: ICU for global collation

2019-11-01 Thread Daniel Verite
Peter Eisentraut wrote:

> I looked into this problem.  The way to address this would be adding 
> proper collation support to the text search subsystem.  See the TODO 
> markers in src/backend/tsearch/ts_locale.c for starting points.  These 
> APIs spread out to a lot of places, so it will take some time to finish. 
>  In the meantime, I'm pausing this thread and will set the CF entry as RwF.

Even if the FTS code is improved in that matter, any extension code
with libc functions depending on LC_CTYPE is still going to be
potentially problematic. In particular when it happens to be set
to a different encoding than the database.

Couldn't we simply invent per-database GUC options, as in
ALTER DATABASE myicudb SET libc_lc_ctype TO 'value';
ALTER DATABASE myicudb SET libc_lc_collate TO 'value';

where libc_lc_ctype/libc_lc_collate would specifically set
the values in the LC_CTYPE and LC_COLLATE environment vars
of any backend serving the corresponding database?


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite




Re: [PATCH] Implement INSERT SET syntax

2019-11-01 Thread Marko Tiikkaja
On Fri, Nov 1, 2019 at 6:31 PM Robert Haas  wrote:

> On Sun, Aug 18, 2019 at 11:00 AM Tom Lane  wrote:
> > Perhaps the way to resolve Peter's objection is to make the syntax
> > more fully like UPDATE:
> >
> > INSERT INTO target SET c1 = x, c2 = y+z, ... FROM tables-providing-x-y-z
> >
> > (with the patch as-submitted corresponding to the case with an empty
> > FROM clause, hence no variables in the expressions-to-be-assigned).
> >
> > Of course, this is not functionally distinct from
> >
> > INSERT INTO target(c1,c2,...) SELECT x, y+z, ... FROM tables-providing-x-y-z
> >
> > and it's fair to question whether it's worth supporting a nonstandard
> > syntax just to allow the target column names to be written closer to
> > the expressions-to-be-assigned.
>
> For what it's worth, I think this would be useful enough to justify
> its existence. Back in days of yore when dragons roamed the earth and
> I wrote database-driven applications instead of hacking on the
> database itself, I often wondered why I had to write two
> completely-different looking SQL statements, one to insert the data
> which a user had entered into a webform into the database, and another
> to update previously-entered data. This feature would allow those
> queries to be written in the same way, which would have pleased me,
> back in the day.
>

I still do, and this would be a big help.  I don't care if it's
non-standard.


.m


Re: Remove configure --disable-float4-byval and --disable-float8-byval

2019-11-01 Thread Tom Lane
Peter Geoghegan  writes:
> On Fri, Nov 1, 2019 at 7:41 AM Robert Haas  wrote:
>> Could we get around this by making Datum 8 bytes everywhere?

> I really like that idea.

> Even Raspberry Pi devices (which can cost as little as $35) use 64-bit
> ARM processors. It's abundantly clear that 32-bit platforms do not
> matter enough to justify keeping all the SIZEOF_DATUM crud around.

This line of argument seems to me to be the moral equivalent of
"let's drop 32-bit support altogether".  I'm not entirely on board
with that.  Certainly, a lot of the world is 64-bit these days,
but people are still building small systems and they might want
a database; preferably one that hasn't been detuned to the extent
that it barely manages to run at all on such a platform.  Making
a whole lot of internal APIs 64-bit would be a pretty big hit for
a 32-bit platform --- more instructions, more memory consumed for
things like Datum arrays, all in a memory space that's not that big.

It seems especially insane to conclude that we should pull the plug
on such use-cases just to get rid of one obscure configure option.
If we were expending any significant devel effort on supporting
32-bit platforms, I might be ready to drop support, but we're not.
(Robert's proposal looks to me like it's actually creating new work
to do, not saving work.)

regards, tom lane




Re: Allow superuser to grant passwordless connection rights on postgres_fdw

2019-11-01 Thread Andrew Dunstan


On 11/1/19 12:58 PM, Robert Haas wrote:
> On Thu, Oct 31, 2019 at 4:58 PM Andrew Dunstan
>  wrote:
>> This patch allows the superuser to grant passwordless connection rights
>> in postgres_fdw user mappings.
> This is clearly something that we need, as the current code seems
> woefully ignorant of the fact that passwords are not the only
> authentication method supported by PostgreSQL, nor even the most
> secure.
>
> But, I do wonder a bit if we ought to think harder about the overall
> authentication model for FDW. Like, maybe we'd take a different view
> of how to solve this particular piece of the problem if we were
> thinking about how FDWs could do LDAP authentication, SSL
> authentication, credentials forwarding...
>


I'm certainly open to alternatives.


cheers


andrew


-- 
Andrew Dunstan        https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: ssl passphrase callback

2019-11-01 Thread Andrew Dunstan


On 11/1/19 11:01 AM, Robert Haas wrote:
> On Thu, Oct 31, 2019 at 11:37 AM Andrew Dunstan
>  wrote:
>> This patch provides a hook for a function that can supply an SSL
>> passphrase. The hook can be filled in by a shared preloadable module. In
>> order for that to be effective, the startup order is modified slightly.
>> There is a test attached that builds and uses one trivial
>> implementation, which just takes a configuration setting and rot13's it
>> before supplying the result as the passphrase.
> It seems to me that it would be a lot better to have an example in
> contrib that does something which might be of actual use to users,
> such as running a shell command and reading the passphrase from
> stdout.
>
> Features that are only accessible by writing C code are, in general,
> not as desirable as features which can be accessed via SQL or
> configuration.
>


Well, I tried to provide the most trivial and simple test I could come
up with. Running a shell command can already be accomplished via the
ssl_passphrase_command setting.


cheers


andrew


-- 
Andrew Dunstan        https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: MarkBufferDirtyHint() and LSN update

2019-11-01 Thread Antonin Houska
Robert Haas  wrote:

> On Wed, Oct 30, 2019 at 9:43 AM Antonin Houska  wrote:
> > 5. In the first session, FlushBuffer()->TerminateBufferIO() will not clear
> > BM_DIRTY because MarkBufferDirtyHint() has eventually set
> > BM_JUST_DIRTIED. Thus the hint bit change itself will be written by the next
> > call of FlushBuffer(). However, the page LSN hasn't been updated, so the
> > requirement that WAL must be flushed first is not met.
> 
> This part confuses me. Are you saying that MarkBufferDirtyHint() can
> set BM_JUST_DIRTIED without setting BM_DIRTY?

No, I'm saying that MarkBufferDirtyHint() leaves BM_DIRTY set, as
expected. However, if things happen in the order I described, then LSN
returned by XLogSaveBufferForHint() is not assigned to the page.

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com




Re: fe-utils - share query cancellation code

2019-11-01 Thread Fabien COELHO



>> I understand that you are unhappy about something, but where the issue is
>> escapes me; the "wtf" 3 characters are not enough to point me in the right
>> direction. Feel free to elaborate a little bit more :-)


> I don't see why you move the "conditional.h" line out of its correct
> alphabetical position (where it is now), and then add "cancel.h" next to
> it also out of its correct alphabetical position.


Because "cancel.h" requires PGconn declaration, thus must appear after 
"libpq-fe.h", and once I put it after that letting "conditional.h" above & 
alone looked a little bit silly. I put cancel after conditional because it 
was the new addition, which is somehow logical, although not alphabetical.


Now I can put cancel before conditional, sure.

Patch v3 attached does that.

--
Fabien.

diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 03bcd22996..7aab4bdf47 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -59,9 +59,10 @@
 
 #include "common/int.h"
 #include "common/logging.h"
-#include "fe_utils/conditional.h"
 #include "getopt_long.h"
 #include "libpq-fe.h"
+#include "fe_utils/cancel.h"
+#include "fe_utils/conditional.h"
 #include "pgbench.h"
 #include "portability/instr_time.h"
 
@@ -3887,6 +3888,9 @@ initGenerateData(PGconn *con)
 			exit(1);
 		}
 
+		if (CancelRequested)
+			break;
+
 		/*
 		 * If we want to stick with the original logging, print a message each
 		 * 100k inserted rows.
@@ -4057,6 +4061,9 @@ runInitSteps(const char *initialize_steps)
 	if ((con = doConnect()) == NULL)
 		exit(1);
 
+	setup_cancel_handler(NULL);
+	SetCancelConn(con);
+
 	for (step = initialize_steps; *step != '\0'; step++)
 	{
 		instr_time	start;
@@ -4120,6 +4127,7 @@ runInitSteps(const char *initialize_steps)
 	}
 
 	fprintf(stderr, "done in %.2f s (%s).\n", run_time, stats.data);
+	ResetCancelConn();
 	PQfinish(con);
 	termPQExpBuffer(&stats);
 }
diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index b981ae81ff..f1d9e0298a 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -29,6 +29,7 @@
 #include "libpq-fe.h"
 #include "pqexpbuffer.h"
 #include "common/logging.h"
+#include "fe_utils/cancel.h"
 #include "fe_utils/print.h"
 #include "fe_utils/string_utils.h"
 
diff --git a/src/bin/psql/common.c b/src/bin/psql/common.c
index 90f6380170..0a66b71372 100644
--- a/src/bin/psql/common.c
+++ b/src/bin/psql/common.c
@@ -24,6 +24,7 @@
 #include "copy.h"
 #include "crosstabview.h"
 #include "fe_utils/mbprint.h"
+#include "fe_utils/cancel.h"
 #include "fe_utils/string_utils.h"
 #include "portability/instr_time.h"
 #include "settings.h"
@@ -223,6 +224,7 @@ NoticeProcessor(void *arg, const char *message)
 }
 
 
+#ifndef WIN32
 
 /*
  * Code to support query cancellation
@@ -241,7 +243,7 @@ NoticeProcessor(void *arg, const char *message)
  *
  * SIGINT is supposed to abort all long-running psql operations, not only
  * database queries.  In most places, this is accomplished by checking
- * cancel_pressed during long-running loops.  However, that won't work when
+ * CancelRequested during long-running loops.  However, that won't work when
  * blocked on user input (in readline() or fgets()).  In those places, we
  * set sigint_interrupt_enabled true while blocked, instructing the signal
  * catcher to longjmp through sigint_interrupt_jmp.  We assume readline and
@@ -252,34 +254,9 @@ volatile bool sigint_interrupt_enabled = false;
 
 sigjmp_buf	sigint_interrupt_jmp;
 
-static PGcancel *volatile cancelConn = NULL;
-
-#ifdef WIN32
-static CRITICAL_SECTION cancelConnLock;
-#endif
-
-/*
- * Write a simple string to stderr --- must be safe in a signal handler.
- * We ignore the write() result since there's not much we could do about it.
- * Certain compilers make that harder than it ought to be.
- */
-#define write_stderr(str) \
-	do { \
-		const char *str_ = (str); \
-		int		rc_; \
-		rc_ = write(fileno(stderr), str_, strlen(str_)); \
-		(void) rc_; \
-	} while (0)
-
-
-#ifndef WIN32
-
 static void
-handle_sigint(SIGNAL_ARGS)
+psql_sigint_callback(void)
 {
-	int			save_errno = errno;
-	char		errbuf[256];
-
 	/* if we are waiting for input, longjmp out of it */
 	if (sigint_interrupt_enabled)
 	{
@@ -288,74 +265,20 @@ handle_sigint(SIGNAL_ARGS)
 	}
 
 	/* else, set cancel flag to stop any long-running loops */
-	cancel_pressed = true;
-
-	/* and send QueryCancel if we are processing a database query */
-	if (cancelConn != NULL)
-	{
-		if (PQcancel(cancelConn, errbuf, sizeof(errbuf)))
-			write_stderr("Cancel request sent\n");
-		else
-		{
-			write_stderr("Could not send cancel request: ");
-			write_stderr(errbuf);
-		}
-	}
-
-	errno = save_errno;			/* just in case the write changed it */
+	CancelRequested = true;
 }
 
-void
-setup_cancel_handler(void)
-{
-	pqsignal(SIGINT, handle_sigint);
-}
-#else			/* WIN32 */
-
-static BOOL WINAPI
-consoleHandler(DWORD dwCtrlType)
-{
-	char		errbuf[256];
-
-	if (dwCtrlType == CTRL_C_EVENT ||
-		dwCtrlType == CTRL_BREAK_EVENT)
-	{

Re: On disable_cost

2019-11-01 Thread Andres Freund
On 2019-11-01 12:56:30 -0400, Robert Haas wrote:
> On Fri, Nov 1, 2019 at 12:43 PM Andres Freund  wrote:
> > As a first step I'd be inclined to "just" adjust disable_cost up to
> > something like 1.0e12. Unfortunately much higher and and we're getting
> > into the area where the loss of precision starts to be significant
> > enough that I'm not sure that we're always careful enough to perform
> > math in the right order (e.g. 1.0e16 + 1 being 1.0e16, and 1e+20 + 1000
> > being 1e+20). I've seen queries with costs above 1e10 where that costing
> > wasn't insane.
> 
> We've done that before and we can do it again. But we're going to need
> to have something better eventually, I think, not just keep kicking
> the can down the road.

Yea, that's why I continued on to describe what we should do afterwards
;)


> Yet another approach would be to divide the cost into two parts, a
> "cost" component and a "violations" component. If two paths are
> compared, the one with fewer violations always wins; if it's a tie,
> they compare on cost. A path's violation count is the total of its
> children, plus one for itself if it does something that's disabled.
> This would be more principled than the current approach, but maybe
> it's too costly.

Namely go for something like this. I think we probably get away with the
additional comparison, especially if we were to store the violations as
an integer and did it like if (unlikely(path1->nviolations !=
path2->nviolations)) or such - that ought to be very well predicted in
nearly all cases.
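
A minimal sketch of that two-part comparison (the struct and function names
are invented here; the real change would live in compare_path_costs() and
friends):

#define unlikely(x) __builtin_expect((x) != 0, 0)	/* as in c.h */

typedef struct
{
	int			nviolations;	/* disabled node types used by this path and its children */
	double		total_cost;
} SketchPathCost;

/* Returns -1 if a is the better path, +1 if b is, 0 if they are equal. */
static int
compare_sketch_costs(const SketchPathCost *a, const SketchPathCost *b)
{
	/* Fewer violations always wins, regardless of estimated cost. */
	if (unlikely(a->nviolations != b->nviolations))
		return (a->nviolations < b->nviolations) ? -1 : 1;

	if (a->total_cost != b->total_cost)
		return (a->total_cost < b->total_cost) ? -1 : 1;
	return 0;
}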

I wonder how much we'd need to reformulate
compare_path_costs/compare_path_costs_fuzzily to allow the compiler to
auto-vectorize. Might not be worth caring...

Greetings,

Andres Freund




Re: 64 bit transaction id

2019-11-01 Thread Tomas Vondra

On Fri, Nov 01, 2019 at 12:05:12PM +0300, Павел Ерёмин wrote:

   Hi.
   Sorry for my English.
   I want to once again open the topic of 64-bit transaction ids. I did not
   manage to find in the archives the option that I want to discuss, so I am
   writing. If I searched poorly, then please forgive me.
   The idea is not very original and has probably already been considered;
   again, I did not find it, so please do not scold me severely.
   In discussions of 64-bit transaction ids, I did not find mention of an
   algorithm for storing them as it is done, for example, in MS SQL Server.
   What if, instead of 2 fields (xmin and xmax) with a total length of 64
   bits, we used 1 field (let's call it xid) with a length of 64 bits in the
   tuple header? This field would store the xid of the transaction that
   created the version. In that case a new transaction, in order to understand
   whether the version it reads is suitable for it or not, would have to read
   the next version as well. That is, the downside of such a decision is of
   course an increase in I/O: transactions will have to read the +1 version.
   On the plus side, the tuple header stays the same length.
   


I think that assumes we can easily identify the next version of a tuple,
and I don't think we can do that. We may be able to do that for HOT
chains, but that only works when the next version fits onto the same
page (and does not update indexed columns). But when we store the new
version on a separate page, we don't have any link between those tuples.
And adding it may easily mean more overhead than the 8B we'd save by
only storing a single XID.

IMO the most promising solution to this is the "page epoch" approach
discussed some time ago (1-2 years?).


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: Resume vacuum and autovacuum from interruption and cancellation

2019-11-01 Thread Robert Haas
On Thu, Aug 8, 2019 at 9:42 AM Rafia Sabih  wrote:
> Sounds like an interesting idea, but does it really help? Because if
> vacuum was interrupted previously, wouldn't it already know the dead
> tuples, etc in the next run quite quickly, as the VM, FSM is already
> updated for the page in the previous run.

+1. I don't deny that a patch like this could sometimes save
something, but it doesn't seem like it would save all that much all
that often. If your autovacuum runs are being frequently cancelled,
that's going to be a big problem, I think. And as Rafia says, even
though you might do a little extra work reclaiming garbage from
subsequently-modified pages toward the beginning of the table, it
would be unusual if they'd *all* been modified. Plus, if they've
recently been modified, they're more likely to be in cache.

I think this patch really needs a test scenario or demonstration of
some kind to prove that it produces a measurable benefit.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: 64 bit transaction id

2019-11-01 Thread Tomas Vondra

On Fri, Nov 01, 2019 at 10:25:17AM +0100, Pavel Stehule wrote:

Hi

On Fri, Nov 1, 2019 at 10:11, Павел Ерёмин wrote:


Hi.
Sorry for my English.
I want to once again open the topic of 64-bit transaction ids. I did not
manage to find in the archives the option that I want to discuss, so I am
writing. If I searched poorly, then please forgive me.
The idea is not very original and has probably already been considered;
again, I did not find it, so please do not scold me severely.
In discussions of 64-bit transaction ids, I did not find mention of an
algorithm for storing them as it is done, for example, in MS SQL Server.
What if, instead of 2 fields (xmin and xmax) with a total length of 64 bits,
we used 1 field (let's call it xid) with a length of 64 bits in the tuple
header? This field would store the xid of the transaction that created the
version. In that case a new transaction, in order to understand whether the
version it reads is suitable for it or not, would have to read the next
version as well. That is, the downside of such a decision is of course an
increase in I/O: transactions will have to read the +1 version. On the plus
side, the tuple header stays the same length.



is the 32-bit xid really a problem? Why do you need to know the state of the
last 2^31 transactions? Is the problem not rather too low usage (or maybe too
high overhead) of VACUUM FREEZE?



It certainly can be an issue for large and busy systems that may need
anti-wraparound vacuum every couple of days. If that requires rewriting a
couple of TB of data, it's not particularly nice. That's why 64-bit XIDs
were discussed repeatedly in the past, and it's likely to get even more
pressing as the systems get larger.


I am not sure if increasing this range can have much more fatal problems
(maybe later).



Well, not fatal, but naive approaches can increase per-tuple overhead.
And we already have plenty of that, hence there were proposals to use
page epochs and so on.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: On disable_cost

2019-11-01 Thread Tomas Vondra

On Fri, Nov 01, 2019 at 09:30:52AM -0700, Jim Finnerty wrote:

re: coping with adding disable_cost more than once

Another option would be to have a 2-part Cost structure.  If disable_cost is
ever added to the Cost, then you set a flag recording this.  If any plans
exist that have no disable_costs added to them, then the planner chooses the
minimum cost among those, otherwise you choose the minimum cost path.



Yeah, I agree that having an is_disabled flag, and treating all paths with
'true' as more expensive than paths with 'false' (and when both paths have
the same value, then actually comparing the cost), is probably the way
forward.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: pglz performance

2019-11-01 Thread Tomas Vondra

On Fri, Nov 01, 2019 at 12:48:28PM -0300, Alvaro Herrera wrote:

On 2019-Nov-01, Peter Eisentraut wrote:


On 2019-10-25 07:05, Andrey Borodin wrote:
> > On 21 Oct 2019, at 14:09, Andrey Borodin wrote:
> >
> > With Silesian corpus pglz_decompress_hacked is actually decreasing performance on high-entropy data.
> > Meanwhile pglz_decompress_hacked8 is still faster than usual pglz_decompress.
> > In spite of this benchmarks, I think that pglz_decompress_hacked8 is safer option.
>
> Here's v3 which takes into account recent benchmarks with Silesian Corpus and have better comments.

Your message from 21 October appears to say that this change makes the
performance worse.  So I don't know how to proceed with this.


As I understand that report, in these results "less is better", so the
hacked8 variant shows better performance (33.8) than current (42.5).
The "hacked" variant shows worse performance (48.2) that the current
code.  The "in spite" phrase seems to have been a mistake.

I am surprised that there is so much variability in the performance
numbers, though, based on such small tweaks of the code.



I'd try running the benchmarks to verify the numbers, and maybe do some
additional tests, but it's not clear to me which patches I should use.

I think the last patches with 'hacked' and 'hacked8' in the name are a
couple of months old, and the recent posts attach just a single patch.
Andrey, can you post current versions of both patches?


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: Allow superuser to grant passwordless connection rights on postgres_fdw

2019-11-01 Thread Robert Haas
On Thu, Oct 31, 2019 at 4:58 PM Andrew Dunstan
 wrote:
> This patch allows the superuser to grant passwordless connection rights
> in postgres_fdw user mappings.

This is clearly something that we need, as the current code seems
woefully ignorant of the fact that passwords are not the only
authentication method supported by PostgreSQL, nor even the most
secure.

But, I do wonder a bit if we ought to think harder about the overall
authentication model for FDW. Like, maybe we'd take a different view
of how to solve this particular piece of the problem if we were
thinking about how FDWs could do LDAP authentication, SSL
authentication, credentials forwarding...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: update ALTER TABLE with ATTACH PARTITION lock mode (docs)

2019-11-01 Thread Justin Pryzby
On Fri, Nov 01, 2019 at 11:01:22PM +0900, Michael Paquier wrote:
> On Fri, Nov 01, 2019 at 08:59:48AM -0500, Justin Pryzby wrote:
> > I guess you mean because it's not a child until after the ALTER.  Yes, that
> > makes sense.
> 
> Yes, perhaps you have another idea than mine on how to shape this
> sentence?

I can't think of anything better.

Attaching a partition acquires a SHARE UPDATE EXCLUSIVE lock
on the parent table, in addition to ACCESS EXCLUSIVE locks
on the table to be attached and the DEFAULT partition (if
any).

Justin




Re: On disable_cost

2019-11-01 Thread Robert Haas
On Fri, Nov 1, 2019 at 12:43 PM Andres Freund  wrote:
> Hm. That seems complicated. Is it clear that we'd always notice that we
> have no plan early enough to know which paths to reconsider? I think
> there's cases where that'd only happen a few levels up.

Yeah, there could be problems of that kind. I think if a baserel has
no paths, then we know right away that we've got a problem, but for
joinrels it might be more complicated.

> As a first step I'd be inclined to "just" adjust disable_cost up to
> something like 1.0e12. Unfortunately much higher and and we're getting
> into the area where the loss of precision starts to be significant
> enough that I'm not sure that we're always careful enough to perform
> math in the right order (e.g. 1.0e16 + 1 being 1.0e16, and 1e+20 + 1000
> being 1e+20). I've seen queries with costs above 1e10 where that costing
> wasn't insane.

We've done that before and we can do it again. But we're going to need
to have something better eventually, I think, not just keep kicking
the can down the road.

Another point to consider here is that in some cases we could really
just skip generating certain paths altogether. We already do this for
hash joins: if we're planning a join and enable_hashjoin is disabled,
we just don't generate hash join paths at all, except for full joins,
where there might be no other legal method. As this example shows,
this cannot be applied in all cases, but maybe we could do it more
widely than we do today. I'm not sure how beneficial that technique
would be, though, because it doesn't seem like it's quite enough to
solve this problem by itself.

Yet another approach would be to divide the cost into two parts, a
"cost" component and a "violations" component. If two paths are
compared, the one with fewer violations always wins; if it's a tie,
they compare on cost. A path's violation count is the total of its
children, plus one for itself if it does something that's disabled.
This would be more principled than the current approach, but maybe
it's too costly.
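
A throwaway sketch of that comparison rule, with invented names (the real
add_path() works on Path structs and is considerably more involved):

typedef struct SketchPath
{
	double	total_cost;		/* estimated cost, as today */
	int		violations;		/* disabled nodes in this subtree: self + children */
} SketchPath;

static int
compare_paths_with_violations(const SketchPath *a, const SketchPath *b)
{
	/* fewer violations always wins, regardless of cost */
	if (a->violations != b->violations)
		return (a->violations < b->violations) ? -1 : 1;
	/* only on a tie do we fall back to comparing cost */
	if (a->total_cost != b->total_cost)
		return (a->total_cost < b->total_cost) ? -1 : 1;
	return 0;
}

Presumably the extra field and the extra comparison on every add_path()
call are where the "too costly" worry would show up.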

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Looking for a demo of extensible nodes

2019-11-01 Thread Onder Kalaci
Hi,

I've a basic experimental extension where I use extensible nodes. This is
the commit which adds the extensible node to the project:
https://github.com/onderkalaci/pgcolor/commit/10cba5d02a828dbee4bc140f5e88d6c44b40e5c2

Hope that gives you some pointers.


On Fri, Nov 1, 2019 at 1:18 PM Adam Lee  wrote:

> Hi, hackers
>
> As the $Subject, does anyone have one? I'd like to refer to it, and
> write an example for people who are also looking for the documentation.
>
> Thanks.
>
> --
> Adam Lee
>
>
>


Re: On disable_cost

2019-11-01 Thread Andres Freund
Hi,

On 2019-11-01 12:22:06 -0400, Robert Haas wrote:
> On Fri, Nov 1, 2019 at 12:00 PM Andres Freund  wrote:
> > That seems like a bad idea - we add the cost multiple times. And we
> > still want to compare plans that potentially involve that cost, if
> > there's no other way to plan the query.
> 
> Yeah.  I kind of wonder if we shouldn't instead (a) skip adding paths
> that use methods which are disabled and then (b) if we don't end up
> with any paths for that reloptinfo, try again, ignoring disabling
> GUCs.

Hm. That seems complicated. Is it clear that we'd always notice that we
have no plan early enough to know which paths to reconsider? I think
there's cases where that'd only happen a few levels up.

As a first step I'd be inclined to "just" adjust disable_cost up to
something like 1.0e12. Unfortunately much higher and we're getting
into the area where the loss of precision starts to be significant
enough that I'm not sure that we're always careful enough to perform
math in the right order (e.g. 1.0e16 + 1 being 1.0e16, and 1e+20 + 1000
being 1e+20). I've seen queries with costs above 1e10 where that costing
wasn't insane.
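
A quick standalone check of that rounding behaviour (illustration only,
plain C, nothing PostgreSQL-specific):

#include <stdio.h>

int
main(void)
{
	printf("%g\n", (1.0e16 + 1) - 1.0e16);		/* 0: the +1 is lost */
	printf("%g\n", (1.0e20 + 1000) - 1.0e20);	/* 0: the +1000 is lost */
	printf("%g\n", (1.0e12 + 1) - 1.0e12);		/* 1: still exact around 1e12 */
	return 0;
}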

And then, in a larger patch, go for something like Heikki's proposal
quoted by Zhenghua Lyu upthread, where we treat 'forbidden' as a
separate factor in comparisons of path costs, rather than fudging the
cost upwards. But there's some care to be taken to make sure we don't
regress performance too much due to the additional logic in
compare_path_cost et al.

I'd also be curious to see if there's some other problem with cost
calculation here - some of the quoted final costs seem high enough to be
suspicious. I'd be curious to see a plan...

Greetings,

Andres Freund




Re: pglz performance

2019-11-01 Thread Alvaro Herrera
On 2019-Nov-01, Peter Eisentraut wrote:

> On 2019-10-25 07:05, Andrey Borodin wrote:
> > > 21 окт. 2019 г., в 14:09, Andrey Borodin  
> > > написал(а):
> > > 
> > > With Silesian corpus pglz_decompress_hacked is actually decreasing 
> > > performance on high-entropy data.
> > > Meanwhile pglz_decompress_hacked8 is still faster than usual 
> > > pglz_decompress.
> > > In spite of this benchmarks, I think that pglz_decompress_hacked8 is 
> > > safer option.
> > 
> > Here's v3 which takes into account recent benchmarks with Silesian Corpus 
> > and have better comments.
> 
> Your message from 21 October appears to say that this change makes the
> performance worse.  So I don't know how to proceed with this.

As I understand that report, in these results "less is better", so the
hacked8 variant shows better performance (33.8) than current (42.5).
The "hacked" variant shows worse performance (48.2) that the current
code.  The "in spite" phrase seems to have been a mistake.

I am surprised that there is so much variability in the performance
numbers, though, based on such small tweaks of the code.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PATCH] Implement INSERT SET syntax

2019-11-01 Thread Robert Haas
On Sun, Aug 18, 2019 at 11:00 AM Tom Lane  wrote:
> Perhaps the way to resolve Peter's objection is to make the syntax
> more fully like UPDATE:
>
> INSERT INTO target SET c1 = x, c2 = y+z, ... FROM tables-providing-x-y-z
>
> (with the patch as-submitted corresponding to the case with an empty
> FROM clause, hence no variables in the expressions-to-be-assigned).
>
> Of course, this is not functionally distinct from
>
> INSERT INTO target(c1,c2,...) SELECT x, y+z, ... FROM tables-providing-x-y-z
>
> and it's fair to question whether it's worth supporting a nonstandard
> syntax just to allow the target column names to be written closer to
> the expressions-to-be-assigned.

For what it's worth, I think this would be useful enough to justify
its existence. Back in days of yore when dragons roamed the earth and
I wrote database-driven applications instead of hacking on the
database itself, I often wondered why I had to write two
completely-different looking SQL statements, one to insert the data
which a user had entered into a webform into the database, and another
to update previously-entered data. This feature would allow those
queries to be written in the same way, which would have pleased me,
back in the day.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




[PATCH][DOC] Fix for PREPARE TRANSACTION doc and postgres_fdw message.

2019-11-01 Thread Gilles Darold
Hi,


As per the following code, t1 is a remote table through postgres_fdw:


test=# BEGIN;
BEGIN
test=# SELECT * FROM t1;
...

test=# PREPARE TRANSACTION 'gxid1';
ERROR:  cannot prepare a transaction that modified remote tables


I have attached a patch to the documentation that adds remote tables to
the list of objects on which any operation prevents using a prepared
transaction; currently the documentation only mentions "operations involving
temporary tables or the session's temporary namespace".


The second patch modifies the message returned by postgres_fdw; as per the
SELECT statement above, the message would be more comprehensible as:

    ERROR:  cannot PREPARE a transaction that has operated on remote tables

like for temporary objects:

    ERROR:  cannot PREPARE a transaction that has operated on temporary
objects


Best regards,

--

Gilles

-- 
Gilles Darold
http://www.darold.net/

diff --git a/doc/src/sgml/ref/prepare_transaction.sgml b/doc/src/sgml/ref/prepare_transaction.sgml
index 5016ca287e..443caf131e 100644
--- a/doc/src/sgml/ref/prepare_transaction.sgml
+++ b/doc/src/sgml/ref/prepare_transaction.sgml
@@ -99,8 +99,8 @@ PREPARE TRANSACTION transaction_id
   
It is not currently allowed to PREPARE a transaction that
has executed any operations involving temporary tables or the session's
-   temporary namespace, created any cursors WITH HOLD, or
-   executed LISTEN, UNLISTEN, or
+   temporary namespace or remote tables, created any cursors WITH HOLD,
+   or executed LISTEN, UNLISTEN, or
NOTIFY.
Those features are too tightly
tied to the current session to be useful in a transaction to be prepared.
diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c
index 7cd69cc709..2f5c37b159 100644
--- a/contrib/postgres_fdw/connection.c
+++ b/contrib/postgres_fdw/connection.c
@@ -725,7 +725,7 @@ pgfdw_xact_callback(XactEvent event, void *arg)
 	 */
 	ereport(ERROR,
 			(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-			 errmsg("cannot prepare a transaction that modified remote tables")));
+			 errmsg("cannot prepare a transaction that has operated on remote tables")));
 	break;
 case XACT_EVENT_PARALLEL_COMMIT:
 case XACT_EVENT_COMMIT:


Re: On disable_cost

2019-11-01 Thread Robert Haas
On Fri, Nov 1, 2019 at 12:00 PM Andres Freund  wrote:
> That seems like a bad idea - we add the cost multiple times. And we
> still want to compare plans that potentially involve that cost, if
> there's no other way to plan the query.

Yeah.  I kind of wonder if we shouldn't instead (a) skip adding paths
that use methods which are disabled and then (b) if we don't end up
with any paths for that reloptinfo, try again, ignoring disabling
GUCs.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [Proposal] Global temporary tables

2019-11-01 Thread Konstantin Knizhnik




On 01.11.2019 18:26, Robert Haas wrote:

On Fri, Nov 1, 2019 at 11:15 AM Konstantin Knizhnik
 wrote:

It seems to me that I have found quite elegant solution for per-backend 
statistic for GTT: I just inserting it in backend's catalog cache, but not in 
pg_statistic table itself.
To do it I have to add InsertSysCache/InsertCatCache functions which insert 
pinned entry in the correspondent cache.
I wonder if there are some pitfalls of such approach?

That sounds pretty hackish. You'd have to be very careful, for
example, that if the tables were dropped or re-analyzed, all of the
old entries got removed --


I have checked it:
- when table is reanalyzed, then cache entries are replaced.
- when table is dropped, then cache entries are removed.


and then it would still fail if any code
tried to access the statistics directly from the table, rather than
via the caches. My assumption is that the statistics ought to be
stored in some backend-private data structure designed for that
purpose, and that the code that needs the data should be taught to
look for it there when the table is a GTT.


Yes, if you do "select * from pg_statistic" then you will not see 
statistic for GTT in this case.
But I do not think that it is so critical. I do not believe that anybody 
is trying to manually interpret values in this table.
And optimizer is retrieving statistic through sys-cache mechanism and so 
is able to build correct plan in this case.


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: Adding percentile metrics to pg_stat_statements module

2019-11-01 Thread Igor Calabria
>
> That's not what I wrote. My point was that we *should* store the digests
> themselves, otherwise we just introduce additional errors into the
> estimates, because it discards the weights/frequencies.


Sorry. I meant to write "no reason to *not* store the digests"


Em sex, 1 de nov de 2019 às 11:17, Tomas Vondra <
tomas.von...@2ndquadrant.com> escreveu:

> On Fri, Nov 01, 2019 at 11:11:13AM -0300, Igor Calabria wrote:
> >Yeah, I agree that there's no reason to store the digests themselves and I
> >really liked the idea of it being optional.
>
> That's not what I wrote. My point was that we *should* store the digests
> themselves, otherwise we just introduce additional errors into the
> estimates, because it discards the weights/frequencies.
>
> >If it turns out that memory consumption on real workloads is small enough,
> >it could eventually be turned on by default.
> >
>
> Maybe, but it's not just about memory consumption. CPU matters too.
>
> regards
>
> --
> Tomas Vondra  http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


Re: On disable_cost

2019-11-01 Thread Andres Freund
Hi,

On 2019-11-01 19:58:04 +1300, Thomas Munro wrote:
> On Fri, Nov 1, 2019 at 7:42 PM Zhenghua Lyu  wrote:
> > It is tricky to set disable_cost a huge number. Can we come up with 
> > better solution?
> 
> What happens if you use DBL_MAX?

That seems like a bad idea - we add the cost multiple times. And we
still want to compare plans that potentially involve that cost, if
there's no other way to plan the query.
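
A tiny illustration of the first point (standalone C, not PostgreSQL code):
once the disable cost is added more than once, DBL_MAX overflows to infinity
and all such paths become indistinguishable:

#include <float.h>
#include <math.h>
#include <stdio.h>

int
main(void)
{
	double	cost = DBL_MAX;

	printf("%d\n", isinf(cost + cost));		/* nonzero: overflowed to +inf */
	return 0;
}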

- Andres




Re: WIP/PoC for parallel backup

2019-11-01 Thread Robert Haas
On Wed, Oct 30, 2019 at 10:16 AM Asif Rehman  wrote:
> 'startptr' is used by sendFile() during checksum verification. Since
> SendBackupFiles() is using sendFile() we have to set a valid WAL location.

Ugh, global variables.

Why are START_BACKUP, SEND_BACKUP_FILELIST, SEND_BACKUP_FILES, and
STOP_BACKUP all using the same base_backup_opt_list production as
BASE_BACKUP? Presumably most of those options are not applicable to
most of those commands, and the productions should therefore be
separated.

You should add docs, too.  I wouldn't have to guess what some of this
stuff was for if you wrote documentation explaining what this stuff
was for. :-)

>> The tablespace_path option appears entirely unused, and I don't know
>> why that should be necessary here, either.
>
> This is to calculate the basepathlen. We need to exclude the tablespace 
> location (or
> base path) from the filename before it is sent to the client with sendFile 
> call. I added
> this option primarily to avoid performing string manipulation on filename to 
> extract the
> tablespace location and then calculate the basepathlen.
>
> Alternatively we can do it by extracting the base path from the received 
> filename. What
> do you suggest?

I don't think the server needs any information from the client in
order to be able to exclude the tablespace location from the pathname.
Whatever it needs to know, it should be able to figure out, just as it
would in a non-parallel backup.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: fe-utils - share query cancellation code

2019-11-01 Thread Alvaro Herrera
On 2019-Nov-01, Fabien COELHO wrote:

> > >  #include "common/int.h"
> > >  #include "common/logging.h"
> > > -#include "fe_utils/conditional.h"
> > >  #include "getopt_long.h"
> > >  #include "libpq-fe.h"
> > > +#include "fe_utils/conditional.h"
> > > +#include "fe_utils/cancel.h"
> > >  #include "pgbench.h"
> > >  #include "portability/instr_time.h"
> > 
> > wtf?
> 
> I understand that you are unhappy about something, but what the issue is
> escapes me; the three characters "wtf" are not enough to point me in the
> right direction. Feel free to elaborate a little bit more :-)

I don't see why you move the "conditional.h" line out of its correct
alphabetical position (where it is now), and then add "cancel.h" next to
it also out of its correct alphabetical position.
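
Presumably the expectation is simply to keep the whole block sorted and slot
the new header into place, roughly:

#include "common/int.h"
#include "common/logging.h"
#include "fe_utils/cancel.h"
#include "fe_utils/conditional.h"
#include "getopt_long.h"
#include "libpq-fe.h"
#include "pgbench.h"
#include "portability/instr_time.h"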

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [Proposal] Global temporary tables

2019-11-01 Thread Robert Haas
On Fri, Nov 1, 2019 at 11:15 AM Konstantin Knizhnik
 wrote:
> It seems to me that I have found quite elegant solution for per-backend 
> statistic for GTT: I just inserting it in backend's catalog cache, but not in 
> pg_statistic table itself.
> To do it I have to add InsertSysCache/InsertCatCache functions which insert 
> pinned entry in the correspondent cache.
> I wonder if there are some pitfalls of such approach?

That sounds pretty hackish. You'd have to be very careful, for
example, that if the tables were dropped or re-analyzed, all of the
old entries got removed -- and then it would still fail if any code
tried to access the statistics directly from the table, rather than
via the caches. My assumption is that the statistics ought to be
stored in some backend-private data structure designed for that
purpose, and that the code that needs the data should be taught to
look for it there when the table is a GTT.
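
For illustration only, one shape such a backend-private structure could take
(all names and fields below are invented, not from any posted patch): a
backend-local hash keyed by relation and attribute, created in
CacheMemoryContext and consulted by the planner instead of the pg_statistic
syscache when the relation is a GTT.

#include "postgres.h"
#include "utils/hsearch.h"
#include "utils/memutils.h"

typedef struct GttStatsKey
{
	Oid		relid;
	int16	attnum;
} GttStatsKey;

typedef struct GttStatsEntry
{
	GttStatsKey key;			/* hash key, must be first */
	float4		stanullfrac;
	int32		stawidth;
	float4		stadistinct;
	/* ... whatever else the planner needs ... */
} GttStatsEntry;

static HTAB *gtt_stats = NULL;

static void
gtt_stats_init(void)
{
	HASHCTL		ctl;

	MemSet(&ctl, 0, sizeof(ctl));
	ctl.keysize = sizeof(GttStatsKey);
	ctl.entrysize = sizeof(GttStatsEntry);
	ctl.hcxt = CacheMemoryContext;	/* lives for the whole backend */
	gtt_stats = hash_create("GTT statistics", 64, &ctl,
							HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
}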

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: fe-utils - share query cancellation code

2019-11-01 Thread Fabien COELHO



Hello Alvaro,


 #include "common/int.h"
 #include "common/logging.h"
-#include "fe_utils/conditional.h"
 #include "getopt_long.h"
 #include "libpq-fe.h"
+#include "fe_utils/conditional.h"
+#include "fe_utils/cancel.h"
 #include "pgbench.h"
 #include "portability/instr_time.h"


wtf?


I understand that you are unhappy about something, but what the issue is
escapes me; the three characters "wtf" are not enough to point me in the
right direction. Feel free to elaborate a little bit more :-)


--
Fabien.




Re: MarkBufferDirtyHint() and LSN update

2019-11-01 Thread Robert Haas
On Wed, Oct 30, 2019 at 9:43 AM Antonin Houska  wrote:
> 5. In the first session, FlushBuffer()->TerminateBufferIO() will not clear
> BM_DIRTY because MarkBufferDirtyHint() has eventually set
> BM_JUST_DIRTIED. Thus the hint bit change itself will be written by the next
> call of FlushBuffer(). However, the page LSN hasn't been updated, so the
> requirement that WAL must be flushed first is not met.

This part confuses me. Are you saying that MarkBufferDirtyHint() can
set BM_JUST_DIRTIED without setting BM_DIRTY?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Remove configure --disable-float4-byval and --disable-float8-byval

2019-11-01 Thread Peter Geoghegan
On Fri, Nov 1, 2019 at 7:41 AM Robert Haas  wrote:
> Could we get around this by making Datum 8 bytes everywhere?

I really like that idea.

Even Raspberry Pi devices (which can cost as little as $35) use 64-bit
ARM processors. It's abundantly clear that 32-bit platforms do not
matter enough to justify keeping all the SIZEOF_DATUM crud around.

-- 
Peter Geoghegan




Re: [Proposal] Global temporary tables

2019-11-01 Thread Konstantin Knizhnik



On 25.10.2019 20:00, Pavel Stehule wrote:


>
>> So except the limitation mentioned above (which I do not
consider as critical) there is only one problem which was not
addressed: maintaining statistics for GTT.
>> If all of the following conditions are true:
>>
>> 1) GTT are used in joins
>> 2) There are indexes defined for GTT
>> 3) Size and histogram of GTT in different backends can
significantly vary.
>> 4) ANALYZE was explicitly called for GTT
>>
>> then query execution plan built in one backend will be also
used for other backends where it can be inefficient.
>> I also do not consider this problem as "show stopper" for
adding GTT to Postgres.
> I think that's *definitely* a show stopper.
Well, if both you and Pavel think that it is really "show
stopper", then
this problem really has to be addressed.
I am slightly confused about this opinion, because Pavel has told me
himself that 99% of users never create indexes for temp tables
or run "analyze" for them. And without it, this problem is not a
problem
at all.


Users don't run ANALYZE on temp tables in 99% of cases. It's true. But the
second fact is that those users then have a lot of problems. It's very
similar to wrong statistics on persistent tables. When the data are small,
then it is not a problem for users, although from my perspective it's not
optimal. When the data are not small, then the problem can be brutal.
Temporary tables are not an exception. And users and developers are people -
we only hear about the fatal problems. There are lots of unoptimized queries,
but because the problem is not fatal, it never gets reported. And lots of
people have no idea how fast the database could be. The knowledge of users
and app developers is a sad book.


Pavel


It seems to me that I have found quite elegant solution for per-backend 
statistic for GTT: I just inserting it in backend's catalog cache, but 
not in pg_statistic table itself.
To do it I have to add InsertSysCache/InsertCatCache functions which 
insert pinned entry in the correspondent cache.

I wonder if there are some pitfalls of such approach?

New patch for GTT is attached.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/src/backend/access/brin/brin_revmap.c b/src/backend/access/brin/brin_revmap.c
index 647350c..ca5f22d 100644
--- a/src/backend/access/brin/brin_revmap.c
+++ b/src/backend/access/brin/brin_revmap.c
@@ -25,6 +25,7 @@
 #include "access/brin_revmap.h"
 #include "access/brin_tuple.h"
 #include "access/brin_xlog.h"
+#include "access/brin.h"
 #include "access/rmgr.h"
 #include "access/xloginsert.h"
 #include "miscadmin.h"
@@ -79,6 +80,11 @@ brinRevmapInitialize(Relation idxrel, BlockNumber *pagesPerRange,
 	meta = ReadBuffer(idxrel, BRIN_METAPAGE_BLKNO);
 	LockBuffer(meta, BUFFER_LOCK_SHARE);
 	page = BufferGetPage(meta);
+
+	if (GlobalTempRelationPageIsNotInitialized(idxrel, page))
+		brin_metapage_init(page, BrinGetPagesPerRange(idxrel),
+		   BRIN_CURRENT_VERSION);
+
 	TestForOldSnapshot(snapshot, idxrel, page);
 	metadata = (BrinMetaPageData *) PageGetContents(page);
 
diff --git a/src/backend/access/gin/ginfast.c b/src/backend/access/gin/ginfast.c
index 439a91b..8a6ac71 100644
--- a/src/backend/access/gin/ginfast.c
+++ b/src/backend/access/gin/ginfast.c
@@ -241,6 +241,16 @@ ginHeapTupleFastInsert(GinState *ginstate, GinTupleCollector *collector)
 	metabuffer = ReadBuffer(index, GIN_METAPAGE_BLKNO);
 	metapage = BufferGetPage(metabuffer);
 
+	if (GlobalTempRelationPageIsNotInitialized(index, metapage))
+	{
+		Buffer rootbuffer = ReadBuffer(index, GIN_ROOT_BLKNO);
+		LockBuffer(rootbuffer, BUFFER_LOCK_EXCLUSIVE);
+		GinInitMetabuffer(metabuffer);
+		GinInitBuffer(rootbuffer, GIN_LEAF);
+		MarkBufferDirty(rootbuffer);
+		UnlockReleaseBuffer(rootbuffer);
+	}
+
 	/*
 	 * An insertion to the pending list could logically belong anywhere in the
 	 * tree, so it conflicts with all serializable scans.  All scans acquire a
diff --git a/src/backend/access/gin/ginget.c b/src/backend/access/gin/ginget.c
index b18ae2b..41bab5d 100644
--- a/src/backend/access/gin/ginget.c
+++ b/src/backend/access/gin/ginget.c
@@ -1750,7 +1750,7 @@ collectMatchesForHeapRow(IndexScanDesc scan, pendingPosition *pos)
 /*
  * Collect all matched rows from pending list into bitmap.
  */
-static void
+static bool
 scanPendingInsert(IndexScanDesc scan, TIDBitmap *tbm, int64 *ntids)
 {
 	GinScanOpaque so = (GinScanOpaque) scan->opaque;
@@ -1774,6 +1774,12 @@ scanPendingInsert(IndexScanDesc scan, TIDBitmap *tbm, int64 *ntids)
 	LockBuffer(metabuffer, GIN_SHARE);
 	page = BufferGetPage(metabuffer);
 	TestForOldSnapshot(scan->xs_snapshot, scan->indexRelation, page);
+
+	if (GlobalTempRelationPageIsNotInitialized(scan->indexRelation, page))
+	{
+		UnlockReleaseBuffer(metabuffer);
+		return false;
+	}
 	blkno = GinPageGetMeta(page)->head;
 
 	/*
@@ -1784,7 +1790,7 @@ 

Re: ssl passphrase callback

2019-11-01 Thread Robert Haas
On Thu, Oct 31, 2019 at 11:37 AM Andrew Dunstan
 wrote:
> This patch provides a hook for a function that can supply an SSL
> passphrase. The hook can be filled in by a shared preloadable module. In
> order for that to be effective, the startup order is modified slightly.
> There is a test attached that builds and uses one trivial
> implementation, which just takes a configuration setting and rot13's it
> before supplying the result as the passphrase.

It seems to me that it would be a lot better to have an example in
contrib that does something which might be of actual use to users,
such as running a shell command and reading the passphrase from
stdout.

Features that are only accessible by writing C code are, in general,
not as desirable as features which can be accessed via SQL or
configuration.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: On disable_cost

2019-11-01 Thread Euler Taveira
Em sex, 1 de nov de 2019 às 03:42, Zhenghua Lyu  escreveu:
>
> My issue: I did some spikes and tests on TPCDS 1TB Bytes data. For query 
> 104, it generates
>  nestloop join even with enable_nestloop set off. And the final plan's total 
> cost is very huge (about 1e24). But If I enlarge the disable_cost to 1e30, 
> then, planner will generate hash join.
>
> So I guess that disable_cost is not large enough for huge amount of data.
>
> It is tricky to set disable_cost a huge number. Can we come up with 
> better solution?
>
Isn't this a case for making disable_cost a GUC? As Thomas suggested, a
DBL_MAX upper limit should be sufficient.

> The following thoughts are from Heikki:
>>
>> Aside from not having a large enough disable cost, there's also the fact 
>> that the high cost might affect the rest of the plan, if we have to use a 
>> plan type that's disabled. For example, if a table doesn't have any indexes, 
>> but enable_seqscan is off, we might put the unavoidable Seq Scan on 
>> a different side of a join than we would with enable_seqscan=on, because of
>> the high cost estimate.
>
>
>>
>> I think a more robust way to disable forbidden plan types would be to handle 
>> the disabling in add_path(). Instead of having a high disable cost on the 
>> Path itself, the comparison add_path() would always consider disabled paths 
>> as more expensive than others, regardless of the cost.
>
I'm afraid it is not as cheap as using disable_cost as a node cost. Are
you proposing to add a new boolean variable in the Path struct to handle
those cases in compare_path_costs_fuzzily?


-- 
   Euler Taveira   Timbira -
http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento




Re: Remove configure --disable-float4-byval and --disable-float8-byval

2019-11-01 Thread Robert Haas
On Thu, Oct 31, 2019 at 9:36 AM Tom Lane  wrote:
> > float8 and related types are now hardcoded to pass-by-value or
> > pass-by-reference depending on whether the build is 64- or 32-bit, as
> > was previously also the default.
>
> I'm less happy with doing this.  It makes it impossible to test the
> pass-by-reference code paths without actually firing up a 32-bit
> environment.  It'd be fine to document --disable-float8-byval as
> a developer-only option (it might be so already), but I don't want
> to lose it completely.  I fail to see any advantage in getting rid
> of it, anyway, since we do still have to maintain both code paths.

Could we get around this by making Datum 8 bytes everywhere?

Years ago, we got rid of INT64_IS_BUSTED, so we're relying on 64-bit
types working on all platforms. However, Datum on a system where
pointers are only 32 bits wide is also only 32 bits wide, so we can't
pass 64-bit quantities the same way we do elsewhere. But, why is the
width of a Datum equal to the width of a pointer, anyway? It would
surely cost something to widen 1, 2, and 4 byte quantities to 8 bytes
when packing them into datums on 32-bit platforms, but (1) we've long
since accepted that cost on 64-bit platforms, (2) it would save
palloc/pfree cycles when packing 8 byte quantities into 4-byte values,
(3) 32-bit platforms are now marginal and performance on those
platforms is not critical, and (4) it would simplify a lot of code and
reduce future bugs.
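
For readers following along, the gist of the proposal in very rough sketch
form (illustration only; the real typedef lives in src/include/postgres.h):

#include <stdint.h>

#ifdef STATUS_QUO
typedef uintptr_t Datum;	/* pointer-sized: only 32 bits on 32-bit builds */
#else
typedef uint64_t Datum;		/* the idea above: 64 bits on every platform,
							 * so int8/float8 always fit by value */
#endif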

On a related note, why do we store typbyval in the catalog anyway
instead of inferring it from typlen and maybe typalign? It seems like
a bad idea to record on disk the way we pass around values in memory,
because it means that a change to how values are passed around in
memory has ramifications for on-disk compatibility.

rhaas=# select typname, typlen, typbyval, typalign from pg_type where
typlen in (1,2,4,8) != typbyval;
 typname  | typlen | typbyval | typalign
--++--+--
 macaddr8 |  8 | f| i
(1 row)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Adding percentile metrics to pg_stat_statements module

2019-11-01 Thread Tomas Vondra

On Fri, Nov 01, 2019 at 11:11:13AM -0300, Igor Calabria wrote:

Yeah, I agree that there's no reason to store the digests themselves and I
really liked the idea of it being optional.


That's not what I wrote. My point was that we *should* store the digests
themselves, otherwise we just introduce additional errors into the
estimates, because it discards the weights/frequencies.


If it turns out that memory consumption on real workloads is small enough,
it could eventually be turned on by default.



Maybe, but it's not just about memory consumption. CPU matters too.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: Adding percentile metrics to pg_stat_statements module

2019-11-01 Thread Igor Calabria
Yeah, I agree that there's no reason to store the digests themselves and I
really liked the idea of it being optional.
If it turns out that memory consumption on real workloads is small enough,
it could eventually be turned on by default.

I'll start working on a patch

Em qui, 31 de out de 2019 às 16:32, Tomas Vondra <
tomas.von...@2ndquadrant.com> escreveu:

> On Thu, Oct 31, 2019 at 12:51:17PM -0300, Igor Calabria wrote:
> >Hi everyone,
> >
> >I was taking a look at pg_stat_statements module and noticed that it does
> >not collect any percentile metrics. I believe that it would be really
> handy
> >to have those available and I'd love to contribute with this feature.
> >
> >The basic idea is to accumulate the the query execution times using an
> >approximation structure like q-digest or t-digest and add those results to
> >the pg_stat_statements table as fixed columns. Something like this
> >
> >p90_time:
> >p95_time:
> >p99_time:
> >p70_time:
> >...
> >
> >Another solution is to persist the digest structure in a binary column and
> >use a function to extract the desired quantile ilke this SELECT
> >approx_quantile(digest_times, 0.99) FROM pg_stat_statements
> >
>
> IMO having some sort of CDF approximation (being a q-digest or t-digest)
> would be useful, although it'd probably need to be optional (mostly
> becuase of memory consumption).
>
> I don't see why we would not store the digests themselves. Storing just
> some selected percentiles would be pretty problematic due to losing a
> lot of information on restart. Also, pg_stat_statements is not a table
> but a view on in-memory hash table.
>
>
> regards
>
> --
> Tomas Vondra  http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


Re: Problem with synchronous replication

2019-11-01 Thread Michael Paquier
On Thu, Oct 31, 2019 at 05:38:32PM +0900, Fujii Masao wrote:
> Thanks for the patch! Looks good to me.

Thanks.  I have applied the core fix down to 9.4, leaving the new
assertion improvements only for HEAD.
--
Michael


signature.asc
Description: PGP signature


Re: update ALTER TABLE with ATTACH PARTITION lock mode (docs)

2019-11-01 Thread Michael Paquier
On Fri, Nov 01, 2019 at 08:59:48AM -0500, Justin Pryzby wrote:
> I guess you mean because it's not a child until after the ALTER.  Yes, that
> makes sense.

Yes, perhaps you have another idea than mine on how to shape this
sentence?
--
Michael


signature.asc
Description: PGP signature


Re: update ALTER TABLE with ATTACH PARTITION lock mode (docs)

2019-11-01 Thread Justin Pryzby
On Thu, Oct 31, 2019 at 06:07:34PM +0900, Michael Paquier wrote:
> On Mon, Oct 28, 2019 at 10:56:33PM -0500, Justin Pryzby wrote:
> > I suppose it should something other than partition(ed), since partitions 
> > can be
> > partitioned, too...
> > 
> >   Attaching a partition acquires a SHARE UPDATE 
> > EXCLUSIVE
> >   lock on the parent table, in addition to
> >   ACCESS EXCLUSIVE locks on the child table and the
> >   DEFAULT partition (if any).
> 
> In this context, "on the child table" sounds a bit confusing?  Would
> it make more sense to say the "on the table to be attached" instead?

I guess you mean because it's not a child until after the ALTER.  Yes, that
makes sense.

Thanks,
Justin




Re: fe-utils - share query cancellation code

2019-11-01 Thread Alvaro Herrera
On 2019-Nov-01, Fabien COELHO wrote:

> diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
> index 03bcd22996..389b4d7bcd 100644
> --- a/src/bin/pgbench/pgbench.c
> +++ b/src/bin/pgbench/pgbench.c
> @@ -59,9 +59,10 @@
>  
>  #include "common/int.h"
>  #include "common/logging.h"
> -#include "fe_utils/conditional.h"
>  #include "getopt_long.h"
>  #include "libpq-fe.h"
> +#include "fe_utils/conditional.h"
> +#include "fe_utils/cancel.h"
>  #include "pgbench.h"
>  #include "portability/instr_time.h"

wtf?

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Add a GUC variable that control logical replication

2019-11-01 Thread Peter Eisentraut

On 2019-10-20 00:23, Euler Taveira wrote:

You can probably achieve that using ALTER PUBLICATION to disable
publication of deletes or truncates, as the case may be, either
permanently or just for the duration of the operations you want to skip.


... then you are skipping all tables in the publication.


You can group tables into different publications and set the 
subscription to subscribe to multiple publications if you need this kind 
of granularity.


In any case, this kind of thing needs to be handled by the decoding 
plugin based on its configuration policies and depending on its needs. 
For example, let's say you have two decoding plugins running: one for a 
replication system and one for writing an audit log.  It would not be 
appropriate to disable logging for both of them because of some 
performance optimization for one of them.  And it would also not be 
appropriate to do this with a USERSET setting.


If we need different hooks or more DDL commands do this better, then 
that can be considered.  But this seems to be the wrong way to do it.


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: pause recovery if pitr target not reached

2019-11-01 Thread Peter Eisentraut

On 2019-10-21 08:44, Fujii Masao wrote:

Probably we can use standby mode + recovery target setting for
the almost same purpose. In this configuration, if end-of-WAL is reached
before recovery target, the startup process keeps waiting for new WAL to
be available. Then, if recovery target is reached, the startup process works
as recovery_target_action indicates.


So basically get rid of recovery.signal mode and honor recovery target 
parameters in standby mode?  That has some appeal because it simplifies 
this whole space significantly, but perhaps it would be too confusing 
for end users?


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: pglz performance

2019-11-01 Thread Peter Eisentraut

On 2019-10-25 07:05, Andrey Borodin wrote:

21 окт. 2019 г., в 14:09, Andrey Borodin  написал(а):

With Silesian corpus pglz_decompress_hacked is actually decreasing performance 
on high-entropy data.
Meanwhile pglz_decompress_hacked8 is still faster than usual pglz_decompress.
In spite of this benchmarks, I think that pglz_decompress_hacked8 is safer 
option.


Here's v3 which takes into account recent benchmarks with Silesian Corpus and 
have better comments.


Your message from 21 October appears to say that this change makes the 
performance worse.  So I don't know how to proceed with this.


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PATCH] Add some useful asserts into View Options macroses

2019-11-01 Thread Peter Eisentraut

On 2019-10-08 12:44, Nikolay Shaplov wrote:

В письме от понедельник, 7 октября 2019 г. 12:59:27 MSK пользователь Robert
Haas написал:


This thread is a follow up to the thread
https://www.postgresql.org/message-id/2620882.s52SJui4ql@x200m where I've
been trying to remove StdRdOptions structure and replace it with unique
structure for each relation kind.

I've decided to split that patch into smaller parts.

This part adds some asserts to the ViewOptions macros.
Since an option pointer there is converted into (ViewOptions *), it would be
really good to make sure that these macros are called in the proper context
and that we do the conversion properly, at least when running tests with
asserts turned on.


Committed.

I simplified the parentheses by one level from your patch.

--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Looking for a demo of extensible nodes

2019-11-01 Thread Adam Lee
Hi, hackers

As the $Subject, does anyone have one? I'd like to refer to it, and
write an example for people who are also looking for the documentation.

Thanks.

-- 
Adam Lee




Re: alternative to PG_CATCH

2019-11-01 Thread Peter Eisentraut

On 2019-10-29 17:10, Tom Lane wrote:

Peter Eisentraut  writes:

On 2019-10-28 13:45, Robert Haas wrote:

In theory, the do_rethrow variable could conflict with a symbol
declared in the surrounding scope, but that doesn't seem like it's a
problem worth getting worked up about.



Right.  A PG_TRY block also declares other local variables for internal
use without much care about namespacing.  If it becomes a problem, it's
easy to address.


Although we haven't been terribly consistent about it, some of our macros
address this problem by using local variable names with a leading and/or
trailing underscore, or otherwise making them names you'd be quite
unlikely to use in normal code.  I suggest doing something similar
here.  (Wouldn't be a bad idea to make PG_TRY's variables follow suit.)


committed with a leading underscore

--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: 64 bit transaction id

2019-11-01 Thread Pavel Stehule
Hi

pá 1. 11. 2019 v 10:11 odesílatel Павел Ерёмин 
napsal:

> Hi.
> sorry for my English.
> I want to once again open the topic of 64 bit transaction id. I did not
> manage to find in the archive of the option that I want to discuss, so I
> write. If I searched poorly, then please forgive me.
> The idea is not very original and probably has already been considered,
> again I repeat - I did not find it. Therefore, please do not scold me
> severely.
> In discussions of 64-bit transaction id, I did not find mention of an
> algorithm for storing them, as it was done, for example, in MS SQL Server.
> What if instead of 2 fields (xmin and xmax) with a total length of 64 bits
> - use 1 field (let's call it xid) with a length of 64 bits in tuple header?
> In this field store the xid of the transaction that created the version. In
> this case, the new transaction in order to understand whether the read
> version is suitable for it or not, will have to read the next version as
> well. I.e., the downside of such a decision is of course an increase in
> I/O: transactions will have to read one extra version. On the plus side,
> the tuple header remains the same length.
>

is a 32-bit transaction id really a problem? Why do you need to know the
state of the last 2^31 transactions? Isn't the problem rather too low usage
(or maybe too high overhead) of VACUUM FREEZE?

I am not sure whether increasing this range might have much more fatal
problems of its own (maybe later)

Pavel



>
> regards, Eremin Pavel.
>


Re: fe-utils - share query cancellation code

2019-11-01 Thread Fabien COELHO


Hello,


I gave it a quick look and I think we can do:

void
psql_setup_cancel_handler(void)
{
   setup_cancel_handler(psql_sigint_callback);
}

Because it does not matter to setup_cancel_handler what we pass,
since it is ignored in the case of Windows.


The "psql_sigint_callback" function is not defined under WIN32.

I've fixed a missing NULL argument in the section you pointed out, though.

I've used the shared infrastructure in pgbench.

I've noticed yet another instance of the cancelation stuff in 
"src/bin/pg_dump/parallel.c", but it seems somehow different from the two 
others, so I have not tried to use the shared version.


--
Fabien.diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 03bcd22996..389b4d7bcd 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -59,9 +59,10 @@
 
 #include "common/int.h"
 #include "common/logging.h"
-#include "fe_utils/conditional.h"
 #include "getopt_long.h"
 #include "libpq-fe.h"
+#include "fe_utils/conditional.h"
+#include "fe_utils/cancel.h"
 #include "pgbench.h"
 #include "portability/instr_time.h"
 
@@ -3887,6 +3888,9 @@ initGenerateData(PGconn *con)
 			exit(1);
 		}
 
+		if (CancelRequested)
+			break;
+
 		/*
 		 * If we want to stick with the original logging, print a message each
 		 * 100k inserted rows.
@@ -4057,6 +4061,9 @@ runInitSteps(const char *initialize_steps)
 	if ((con = doConnect()) == NULL)
 		exit(1);
 
+	setup_cancel_handler(NULL);
+	SetCancelConn(con);
+
 	for (step = initialize_steps; *step != '\0'; step++)
 	{
 		instr_time	start;
@@ -4120,6 +4127,7 @@ runInitSteps(const char *initialize_steps)
 	}
 
 	fprintf(stderr, "done in %.2f s (%s).\n", run_time, stats.data);
+	ResetCancelConn();
 	PQfinish(con);
 	termPQExpBuffer(&stats);
 }
diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index b981ae81ff..f1d9e0298a 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -29,6 +29,7 @@
 #include "libpq-fe.h"
 #include "pqexpbuffer.h"
 #include "common/logging.h"
+#include "fe_utils/cancel.h"
 #include "fe_utils/print.h"
 #include "fe_utils/string_utils.h"
 
diff --git a/src/bin/psql/common.c b/src/bin/psql/common.c
index 90f6380170..0a66b71372 100644
--- a/src/bin/psql/common.c
+++ b/src/bin/psql/common.c
@@ -24,6 +24,7 @@
 #include "copy.h"
 #include "crosstabview.h"
 #include "fe_utils/mbprint.h"
+#include "fe_utils/cancel.h"
 #include "fe_utils/string_utils.h"
 #include "portability/instr_time.h"
 #include "settings.h"
@@ -223,6 +224,7 @@ NoticeProcessor(void *arg, const char *message)
 }
 
 
+#ifndef WIN32
 
 /*
  * Code to support query cancellation
@@ -241,7 +243,7 @@ NoticeProcessor(void *arg, const char *message)
  *
  * SIGINT is supposed to abort all long-running psql operations, not only
  * database queries.  In most places, this is accomplished by checking
- * cancel_pressed during long-running loops.  However, that won't work when
+ * CancelRequested during long-running loops.  However, that won't work when
  * blocked on user input (in readline() or fgets()).  In those places, we
  * set sigint_interrupt_enabled true while blocked, instructing the signal
  * catcher to longjmp through sigint_interrupt_jmp.  We assume readline and
@@ -252,34 +254,9 @@ volatile bool sigint_interrupt_enabled = false;
 
 sigjmp_buf	sigint_interrupt_jmp;
 
-static PGcancel *volatile cancelConn = NULL;
-
-#ifdef WIN32
-static CRITICAL_SECTION cancelConnLock;
-#endif
-
-/*
- * Write a simple string to stderr --- must be safe in a signal handler.
- * We ignore the write() result since there's not much we could do about it.
- * Certain compilers make that harder than it ought to be.
- */
-#define write_stderr(str) \
-	do { \
-		const char *str_ = (str); \
-		int		rc_; \
-		rc_ = write(fileno(stderr), str_, strlen(str_)); \
-		(void) rc_; \
-	} while (0)
-
-
-#ifndef WIN32
-
 static void
-handle_sigint(SIGNAL_ARGS)
+psql_sigint_callback(void)
 {
-	int			save_errno = errno;
-	char		errbuf[256];
-
 	/* if we are waiting for input, longjmp out of it */
 	if (sigint_interrupt_enabled)
 	{
@@ -288,74 +265,20 @@ handle_sigint(SIGNAL_ARGS)
 	}
 
 	/* else, set cancel flag to stop any long-running loops */
-	cancel_pressed = true;
-
-	/* and send QueryCancel if we are processing a database query */
-	if (cancelConn != NULL)
-	{
-		if (PQcancel(cancelConn, errbuf, sizeof(errbuf)))
-			write_stderr("Cancel request sent\n");
-		else
-		{
-			write_stderr("Could not send cancel request: ");
-			write_stderr(errbuf);
-		}
-	}
-
-	errno = save_errno;			/* just in case the write changed it */
+	CancelRequested = true;
 }
 
-void
-setup_cancel_handler(void)
-{
-	pqsignal(SIGINT, handle_sigint);
-}
-#else			/* WIN32 */
-
-static BOOL WINAPI
-consoleHandler(DWORD dwCtrlType)
-{
-	char		errbuf[256];
-
-	if (dwCtrlType == CTRL_C_EVENT ||
-		dwCtrlType == CTRL_BREAK_EVENT)
-	{
-		/*
-		 * Can't longjmp here, because we are in wrong thread :-(
-		 */
-
-		/* set cancel flag to stop any long-running loops */
-	

Re: The command tag of "ALTER MATERIALIZED VIEW RENAME COLUMN"

2019-11-01 Thread Ibrar Ahmed
On Fri, Nov 1, 2019 at 8:00 AM Fujii Masao  wrote:

> On Fri, Nov 1, 2019 at 6:34 AM Ibrar Ahmed  wrote:
> >
> >
> >
> > On Thu, Oct 31, 2019 at 6:56 PM Tom Lane  wrote:
> >>
> >> Fujii Masao  writes:
> >> > ... I found that the command tag of
> >> > ALTER MATERIALIZED VIEW RENAME COLUMN is "ALTER TABLE", not "ALTER
> VIEW".
> >>
> >> > =# ALTER MATERIALIZED VIEW hoge RENAME COLUMN j TO x;
> >> > ALTER TABLE
> >>
> >> > Is this intentional? Or bug?
> >>
> >> Seems like an oversight.
>
> Thanks for the check!
>
> > The same issue is with ALTER FOREIGN TABLE
>
> Yes.
>
> > Attached patch fixes that for ALTER VIEW , ALTER MATERIALIZED VIEW and
> ALTER FOREIGN TABLE
>
> You introduced subtype in your patch, but I think it's better and simpler
> to just give relationType to AlterObjectTypeCommandTag()
> if renaming the columns (i.e., renameType = OBJECT_COLUMN).
>
> That works perfectly, and also guards against future oversights about the command tag.


> To avoid this kind of oversight about command tag, I'd like to add
> regression
> tests to make sure that SQL returns valid and correct command tag.
> But currently there seems no mechanism for such test, in regression
> test. Right??
>

Do we really need regression test cases for such small oversights?


> Maybe we will need that mechanism.
>
> Regards,
>
> --
> Fujii Masao
>


-- 
Ibrar Ahmed


64 bit transaction id

2019-11-01 Thread Павел Ерёмин
Hi.

Sorry for my English. I want to once again open the topic of 64-bit
transaction ids. I did not manage to find in the archive the option that I
want to discuss, so I write. If I searched poorly, then please forgive me.

The idea is not very original and has probably already been considered;
again I repeat - I did not find it. Therefore, please do not scold me
severely.

In discussions of 64-bit transaction ids, I did not find any mention of an
algorithm for storing them as it was done, for example, in MS SQL Server.
What if, instead of 2 fields (xmin and xmax) with a total length of 64 bits,
we use 1 field (let's call it xid) with a length of 64 bits in the tuple
header, and in this field store the xid of the transaction that created the
version? In this case, a new transaction, in order to understand whether the
read version is suitable for it or not, will have to read the next version
as well. I.e., the downside of such a decision is of course an increase in
I/O: transactions will have to read one extra version. On the plus side, the
tuple header remains the same length.

regards, Eremin Pavel.

Re: [HACKERS] Block level parallel vacuum

2019-11-01 Thread Masahiko Sawada
On Thu, Oct 31, 2019 at 3:45 PM Dilip Kumar  wrote:
>
> On Thu, Oct 31, 2019 at 11:33 AM Dilip Kumar  wrote:
> >
> > On Tue, Oct 29, 2019 at 1:59 PM Masahiko Sawada  
> > wrote:
> > > Actually after increased shared_buffer I got expected results:
> > >
> > > * Test1 (after increased shared_buffers)
> > > normal  : 2807 ms (hit 56295, miss 2, dirty 3, total 56300)
> > > 2 workers : 2840 ms (hit 56295, miss 2, dirty 3, total 56300)
> > > 1 worker   : 2841 ms (hit 56295, miss 2, dirty 3, total 56300)
> > >
> > > I updated the patch that computes the total cost delay shared by
> > > Dilip[1] so that it collects the number of buffer hits and so on, and
> > > have attached it. It can be applied on top of my latest patch set[1].
>
> While reading your modified patch (PoC-delay-stats.patch), I have
> noticed that in my patch I used the formula below to compute the total
> delay:
> total delay = delay in heap scan + (total delay of index scan / nworkers).
> But, in your patch, I can see that it is just the total sum of all delays.
> IMHO, the total sleep time during the index vacuum phase
> must be divided by the number of workers, because even if at some
> point, all the workers go for sleep (e.g. 10 msec) then the delay in
> I/O will be only for 10msec not 30 msec.  I think the same is
> discussed upthread[1]
>

I think the two approaches make the parallel vacuum workers wait in
different ways: in approach (a) the vacuum delay works as if the vacuum were
performed by a single process, whereas in approach (b) the vacuum delay
works for each worker independently.

Suppose that the total number of blocks to vacuum is 10,000, the cost per
block is 10, the cost limit is 200 and the sleep time is 5 ms. In a
single-process vacuum the total sleep time is 2,500ms (= (10,000 * 10 /
200) * 5). Approach (a) is the same, 2,500ms, because all parallel vacuum
workers use the shared balance value and a worker sleeps once the balance
value exceeds the limit. In approach (b), since the cost limit is divided
evenly, the value for each worker is 40 (e.g. with parallel degree 5). And
supposing each worker processes blocks evenly, the total sleep time of all
workers is 12,500ms (= (2,000 * 10 / 40) * 5 * 5). I think that's why we
can compute the sleep time of approach (b) by dividing the total value by
the number of parallel workers.

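Just to make that arithmetic easy to re-check, a throwaway calculation with
the example's parameters (plain C, illustration only):

#include <stdio.h>

int
main(void)
{
	double	blocks = 10000, cost_per_block = 10, cost_limit = 200, delay_ms = 5;
	int		nworkers = 5;

	/* approach (a): shared balance, behaves like a single-process vacuum */
	double	sleep_a = blocks * cost_per_block / cost_limit * delay_ms;

	/* approach (b): limit split evenly, each worker sleeps on its own balance */
	double	per_worker_limit = cost_limit / nworkers;
	double	sleep_b = nworkers * (blocks / nworkers) * cost_per_block /
		per_worker_limit * delay_ms;

	printf("approach (a): %.0f ms total sleep\n", sleep_a);		/* 2500 */
	printf("approach (b): %.0f ms total sleep\n", sleep_b);	/* 12500 */
	return 0;
}
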
IOW, approach (b) delays parallel vacuum much more than normal vacuum, and
more than parallel vacuum with approach (a), even with the same settings.
Which behavior do we expect? I thought the vacuum delay for parallel
vacuum should work as if it were a single-process vacuum, as we did for
memory usage. I might be missing something. If we prefer approach (b) I
should change the patch so that the leader process divides the cost limit
evenly.

Regards,

--
Masahiko Sawada




Re: Creating foreign key on partitioned table is too slow

2019-11-01 Thread Amit Langote
Hello,

On Fri, Oct 25, 2019 at 7:18 AM Tomas Vondra
 wrote:
> On Thu, Oct 24, 2019 at 03:48:57PM -0300, Alvaro Herrera wrote:
> >On 2019-Oct-23, kato-...@fujitsu.com wrote:
> >
> >> Hello
> >>
> >> To benchmark with tpcb model, I tried to create a foreign key in the 
> >> partitioned history table, but backend process killed by OOM.
> >> the number of partitions is 8192. I tried in master(commit: ad4b7aeb84).
> >>
> >> I did the same thing in another server which has 200GB memory, but 
> >> creating foreign key did not end in 24 hours.
> >
> >Thanks for reporting.

Thank you Kato-san.

>  It sounds like there must be a memory leak here.
> >I am fairly pressed for time at present so I won't be able to
> >investigate this until, at least, mid November.
>
> I've briefly looked into this, and I think the main memory leak is in
> RelationBuildPartitionDesc. It gets called with PortalContext, it
> allocates a lot of memory building the descriptor, copies it into
> CacheContext but does not even try to free anything. So we end up with
> something like this:
...
> The attached patch trivially fixes that by adding a memory context
> tracking all the temporary data, and then just deletes it as a whole at
> the end of the function. This significantly reduces the memory usage for
> me, not sure it's 100% correct.

Thank you Tomas.  I think we have considered this temporary context
fix a number of times before, but it got stalled for one reason or
another ([1] comes to mind as the last thread where this came up).

Another angle to look at this is that our design where PartitionDesc
is rebuilt on relcache reload of the parent relation is not a great
one after all. It seems that we're rightly (?) invalidating the
parent's relcache 8192 times in this case, because its cacheable
foreign key descriptor changes on processing each partition, but
PartitionDesc itself doesn't change.  Having to pointlessly rebuild it
8192 times seems really wasteful.

I recall a discussion where it was proposed to build PartitionDesc
only when needed as opposed on every relcache reload of the parent
relation.  Attached PoC-at-best patch that does that seems to go
through without OOM and passes make check-world.  I think this should
have a very minor impact on select queries.

But...

> FWIW, even with this fix it still takes an awful lot to create the
> foreign key, because the CPU is stuck doing this
>
> 60.78%60.78%  postgres  postgres[.] bms_equal
> 32.58%32.58%  postgres  postgres[.] 
> get_eclass_for_sort_expr
>  3.83% 3.83%  postgres  postgres[.] 
> add_child_rel_equivalences
>  0.23% 0.00%  postgres  [unknown]   [.] 0x0005
>  0.22% 0.00%  postgres  [unknown]   [.] 
>  0.18% 0.18%  postgres  postgres[.] AllocSetCheck

...we have many problems to solve here. :-(

Thanks,
Amit

[1] 
https://www.postgresql.org/message-id/CA%2BTgmoY3bRmGB6-DUnoVy5fJoreiBJ43rwMrQRCdPXuKt4Ykaw%40mail.gmail.com


build-PartitionDesc-when-needed-PoC.patch
Description: Binary data


Refactor parse analysis of EXECUTE command

2019-11-01 Thread Peter Eisentraut

This patch moves the parse analysis component of ExecuteQuery() and
EvaluateParams() into a new transformExecuteStmt() that is called from
transformStmt().  This makes EXECUTE behave more like other utility
commands.

Effects are that error messages can have position information (see 
regression test case), and it allows using external parameters in the 
arguments of the EXECUTE command.


I had previously inquired about this in [0] and some vague concerns were 
raised.  I haven't dug very deep on this, but I figure with an actual 
patch it might be easier to review and figure out if there are any problems.



[0]: 
https://www.postgresql.org/message-id/flat/ed2767e5-c506-048d-8ddf-280ecbc9e1b7%402ndquadrant.com


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: On disable_cost

2019-11-01 Thread Thomas Munro
On Fri, Nov 1, 2019 at 7:42 PM Zhenghua Lyu  wrote:
> It is tricky to set disable_cost a huge number. Can we come up with 
> better solution?

What happens if you use DBL_MAX?




On disable_cost

2019-11-01 Thread Zhenghua Lyu
Hi,

    Postgres has a global variable `disable_cost`. It is set to the value
1.0e10.

    This value is added to the cost of a path if the related GUC is set off.
For example, if enable_nestloop is set off, when the planner tries to add a
nestloop join path, it still adds the path, but with the huge cost
`disable_cost` added in.

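For context, roughly what costsize.c does today (paraphrased; not the actual
function or its signature):

#include <stdbool.h>

double		disable_cost = 1.0e10;

static void
cost_nestloop_sketch(double *startup_cost, bool enable_nestloop)
{
	if (!enable_nestloop)
		*startup_cost += disable_cost;	/* path is still generated, only penalized */

	/* ... the usual nested-loop cost arithmetic then follows, and parent
	 * nodes fold these costs into their own, which is how the 1e10 penalty
	 * can be added many times over in a large plan ... */
}
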
    But 1.0e10 may not be large enough. I encountered this issue in
Greenplum (based on Postgres). Heikki told me that someone also encountered
the same issue on Postgres, so I am raising it here for discussion.

    My issue: I did some spikes and tests on 1TB of TPC-DS data. For query
104, it generates a nestloop join even with enable_nestloop set off. And the
final plan's total cost is huge (about 1e24). But if I enlarge disable_cost
to 1e30, the planner generates a hash join instead.

    So I guess that disable_cost is not large enough for huge amounts of
data.

    It is tricky to set disable_cost to a huge number. Can we come up with a
better solution?

The following thoughts are from Heikki:

> Aside from not having a large enough disable cost, there's also the
> fact that the high cost might affect the rest of the plan, if we have to
> use a plan type that's disabled. For example, if a table doesn't have any
> indexes, but enable_seqscan is off, we might put the unavoidable Seq Scan
> on a different side of a join than we would with enable_seqscan=on,
> because of the high cost estimate.



> I think a more robust way to disable forbidden plan types would be to
> handle the disabling in add_path(). Instead of having a high disable cost
> on the Path itself, the comparison add_path() would always consider
> disabled paths as more expensive than others, regardless of the cost.


  Any thoughts or ideas on the problem? Thanks!

Best Regards,
Zhenghua Lyu


Re: progress report for ANALYZE

2019-11-01 Thread Tatsuro Yamada

Hi vignesh!


On 2019/09/17 20:51, vignesh C wrote:

On Thu, Sep 5, 2019 at 2:31 AM Alvaro Herrera  wrote:


There were some minor problems in v5 -- bogus Docbook as well as
outdated rules.out, small "git diff --check" complaint about whitespace.
This v6 (on today's master) fixes those, no other changes.


+ 
+   The command is preparing to begin scanning the heap.  This phase is
+   expected to be very brief.
+ 
In the above, after "." there is an extra space; is this intentional? I
had noticed that in a lot of places there are a couple of spaces and
sometimes a single space across this file.

Like in the below, there is a single space after ".":
  Time when this process' current transaction was started, or null
   if no transaction is active. If the current
   query is the first of its transaction, this column is equal to the
   query_start column.
  



Sorry for the late reply.

Probably, it is intentional because there are many extra spaces
in other documents. See below:

# Results of grep
=
$ grep '\.  ' doc/src/sgml/monitoring.sgml | wc -l
114
$ grep '\.  ' doc/src/sgml/information_schema.sgml | wc -l
184
$ grep '\.  ' doc/src/sgml/func.sgml | wc -l
577
=

Therefore, I'm going to leave it as is. :)


Thanks,
Tatsuro Yamada