Re: utilities to rebuild commit logs from wal

2018-06-23 Thread Bruce Momjian
On Fri, Jun 22, 2018 at 10:49:58AM +0200, Chris Travers wrote:
> Before we reinvent the wheel here, does anyone know of utilities to build
> commit logs from wal segments?  Or even to just fill with commits?
> 
> I figure it is worth asking before I write one.

Uh, what are commit logs?  pg_waldump output?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



Re: Incorrect errno used with %m for backend code

2018-06-23 Thread Ashutosh Sharma
On Sat, Jun 23, 2018 at 6:43 PM, Michael Paquier  wrote:
> On Fri, Jun 22, 2018 at 03:45:33PM +0530, Ashutosh Sharma wrote:
>> Okay, thanks for the confirmation. Few of them are also there in
>> origin.c and snapbuild.c files.
>
> Thanks Ashutosh.  I have been reviewing the whole tree and I found more
> places where this is missing, like rewriteheap.c, reorderbuffer.c or
> pg_basebackup, which gives the attached.
> --

Okay, I too had a quick look at the source code to see if there are
still places where we might have missed setting errno to ENOSPC after a
write system call failure, but I couldn't find any such place in the
code. The v2 version of the patch looks good to me.

So, to conclude, the v2 patch fixes two things: 1) it makes ereport
print the correct error number (the one that matches the error
message), and 2) it sets errno to ENOSPC (assuming that the problem is
lack of disk space) if the write system call fails without setting errno.
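
For reference, the idiom the patch applies throughout the backend looks
roughly like this as a standalone sketch (write_or_die and its arguments
are illustrative, not part of the patch; the backend uses ereport instead
of fprintf/exit):

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Write len bytes or bail out with a useful message, mirroring the backend idiom. */
static void
write_or_die(int fd, const void *data, size_t len, const char *path)
{
	errno = 0;
	if (write(fd, data, len) != (ssize_t) len)
	{
		/* if write didn't set errno, assume problem is no disk space */
		if (errno == 0)
			errno = ENOSPC;
		fprintf(stderr, "could not write to file \"%s\": %s\n",
				path, strerror(errno));
		exit(1);
	}
}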

-- 
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com



Re: Keeping temporary tables in shared buffers

2018-06-23 Thread Bruce Momjian
On Thu, Jun 21, 2018 at 07:42:54AM +0530, Amit Kapila wrote:
> On Wed, Jun 20, 2018 at 8:47 PM, Bruce Momjian  wrote:
> > On Sat, Jun  2, 2018 at 05:18:17PM -0400, Asim Praveen wrote:
> >> Hi Amit
> >>
> >> On Mon, May 28, 2018 at 4:25 AM, Amit Kapila  
> >> wrote:
> >> >
> >> > This is one way, but I think there are other choices as well.  We can
> >> > identify and flush all the dirty (local) buffers for the relation
> >> > being accessed parallelly.  Now, once the parallel operation is
> >> > started, we won't allow performing any write operation on them.  It
> >>
> >> We talked about this in person in Ottawa and it was great meeting you!
> >>  To summarize, the above proposal to continue using local buffers for
> >> temp tables is a step forward, however, it enables only certain kinds
> >> of queries to be parallelized for temp tables.  E.g. queries changing
> >> a temp table in any way cannot be parallelized due to the restriction
> >> of no writes during parallel operation.
> >
> > Should this be a TODO item?
> >
> 
> +1.  I think we have not hammered out the design completely, but if
> somebody is willing to put effort, it is not an unsolvable problem.
> AFAIU, this thread is about parallelizing queries that refer temp
> tables, however, it is not clear from the title of this thread.

Seems it is already documented on the wiki:


https://wiki.postgresql.org/wiki/Parallel_Query#What_Parts_of_a_Query_Can_Run_In_Parallel.3F

o  Scans of plain tables may not appear below Gather if (1) they are
   temporary tables ...
   

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



Re: comma to delimit fractional seconds

2018-06-23 Thread Tom Lane
Chris Howard  writes:
> I looked at the TODO list and saw the issue of using comma to delimit 
> fractional seconds. ...
> Is there anything more to it than allowing HH:MM:SS,### as well as 
> HH:MM:SS.# ?

Here's the thing: most of the easy-looking stuff on the TODO list is
there because there's not actually consensus about how or whether
to do it.

In the case at hand, what you'd need to convince people of is that
there's not a significant penalty in terms of robustness (error
detection) if we allow commas to substitute for periods.  There's
a bunch of existing use-cases that depend on commas being noise,
so it's not obvious that this is OK.

regards, tom lane



Re: pg_upgrade from 9.4 to 10.4

2018-06-23 Thread Bruce Momjian
On Fri, Jun 15, 2018 at 03:01:37PM -0700, Vimalraj A wrote:
> Hi,
> 
> After pg_upgrade, the data in Postgres is corrupted. 
> 
> 1. Postgres 9.4, stop db with "immediate" mode
> 2. Run pg_upgrade, to upgrade to Postgres 10.4
> 3. Start Postgres 10 and run vacuum full, got a bunch of "concurrent insert in
> progress". Looks like the data is corrupted. (Loading the old data back on
> Postgres 9.4 works just fine)
> 
> But if I stop the 9.4 DB with "smart" mode, the data after pg_upgrade looks
> fine. 
> 
> pg_upgrade doesn't stop or throw warnings if the upgraded db gets into
> corrupted state. 
> 
> 
> I would like to understand why it happens so. 
> 1. What transient state corrupts the db?
> 2. Is it a known issue with pg_upgrade? 
> 3. Is there a way to get the data from pg_upgrade after "immediate" mode stop
> of previous version?

Well, that's interesting.  We document shutting down the old and new
server with pg_ctl stop, but don't say to avoid immediate mode.

The reason you are having problems is that pg_upgrade does not copy the
WAL from the old cluster to the new one, so there is no way to replay
the needed WAL during startup of the new server, which leads to
corruption.  Did you find this out in testing or in actual use?

What is also interesting is how pg_upgrade tries to avoid problems with
_crash_ shutdowns --- if it sees a postmaster lock file, it tries to
start the server, and if that works, it then stops it, causing the WAL
to be replayed and the server to be shut down cleanly.  What it _doesn't_
handle is pg_ctl -m immediate, which removes the lock file but does leave
WAL in need of replay.  Oops!

Ideally we could detect this before we check pg_controldata and then do
the start/stop trick to fix the WAL, but the ordering of the code makes
that hard.  Instead, I have developed the attached patch, which checks
the cluster state at the same time we are checking pg_controldata and
reports an error if there was not a clean shutdown.
Given how rare this is, that is probably the cleanest solution, and I
think it can be backpatched.

Patch attached.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +
diff --git a/src/bin/pg_upgrade/controldata.c b/src/bin/pg_upgrade/controldata.c
new file mode 100644
index 0fe98a5..bba3b1b
*** a/src/bin/pg_upgrade/controldata.c
--- b/src/bin/pg_upgrade/controldata.c
*** get_control_data(ClusterInfo *cluster, b
*** 58,63 
--- 58,64 
  	bool		got_large_object = false;
  	bool		got_date_is_int = false;
  	bool		got_data_checksum_version = false;
+ 	bool		got_cluster_state = false;
  	char	   *lc_collate = NULL;
  	char	   *lc_ctype = NULL;
  	char	   *lc_monetary = NULL;
*** get_control_data(ClusterInfo *cluster, b
*** 423,428 
--- 424,487 
  	pclose(output);
  
  	/*
+ 	 * Check for clean shutdown
+ 	 */
+ 
+ 	/* only pg_controldata outputs the cluster state */
+ 	snprintf(cmd, sizeof(cmd), "\"%s/pg_controldata\" \"%s\"",
+ 			 cluster->bindir, cluster->pgdata);
+ 	fflush(stdout);
+ 	fflush(stderr);
+ 
+ 	if ((output = popen(cmd, "r")) == NULL)
+ 		pg_fatal("could not get control data using %s: %s\n",
+  cmd, strerror(errno));
+ 
+ 	/* we have the result of cmd in "output". so parse it line by line now */
+ 	while (fgets(bufin, sizeof(bufin), output))
+ 	{
+ 		if ((!live_check || cluster == _cluster) &&
+ 			(p = strstr(bufin, "Database cluster state:")) != NULL)
+ 		{
+ 			p = strchr(p, ':');
+ 
+ 			if (p == NULL || strlen(p) <= 1)
+ pg_fatal("%d: database cluster state problem\n", __LINE__);
+ 
+ 			p++;/* remove ':' char */
+ 
+ 			/*
+ 			 * We checked earlier for a postmaster lock file, and if we found
+ 			 * one, we tried to start/stop the server to replay the WAL.  However,
+ 			 * pg_ctl -m immediate doesn't leave a lock file, but does require
+ 			 * WAL replay, so we check here that the server was shut down cleanly,
+ 			 * from the controldata perspective.
+ 			 */
+ 			/* remove leading spaces */
+ 			while (*p == ' ')
+ p++;
+ 			if (strcmp(p, "shut down\n") != 0)
+ 			{
+ if (cluster == &old_cluster)
+ 	pg_fatal("The source cluster was not shut down cleanly.\n");
+ else
+ 	pg_fatal("The target cluster was not shut down cleanly.\n");
+ 			}
+ 			got_cluster_state = true;
+ 		}
+ 	}
+ 
+ 	pclose(output);
+ 
+ 	if (!got_cluster_state)
+ 	{
+ 		if (cluster == &old_cluster)
+ 			pg_fatal("The source cluster lacks cluster state information:\n");
+ 		else
+ 			pg_fatal("The target cluster lacks cluster state information:\n");
+ 	}
+ 
+ 	/*
  	 * Restore environment variables
  	 */
  	pg_putenv("LC_COLLATE", lc_collate);


Re: WIP: BRIN multi-range indexes

2018-06-23 Thread Tomas Vondra
Hi,

Attached is a rebased version of this BRIN patch series, mostly fixing the
breakage due to 372728b0 (aka the initial-catalog-data format changes). As
the 2018-07 CF is meant for almost-ready patches, this is more 2018-09
material. But perhaps someone would like to take a look - and I'd have
to fix it anyway ...

At the pgcon dev meeting I suggested that long-running patches should
have a "summary" post once in a while, so that reviewers don't have to
reread the whole thread and follow all the various discussions. So let
me start with this thread, although it's not a particularly long or
complex one, nor does it have a long discussion. But anyway ...


The patches introduce two new BRIN opclasses - minmax-multi and bloom.


minmax-multi


minmax-multi is a variant of the current minmax opclass that handles
cases where the plain minmax opclass degrades due to outlier values.

Imagine almost perfectly correlated data (say, timestamps in a log
table) - that works great with regular minmax indexes. But if you go and
delete a bunch of historical messages (for whatever reason), new rows
with new timestamps will be routed to the empty space and the minmax
indexes will degrade because the ranges will get much "wider" due to the
new values.

The minmax-multi indexes deal with that by maintaining not a single
minmax range, but several of them. That allows tracking the outlier
values separately, without constructing one wide minmax range.

Consider this artificial example:

create table t (a bigint, b int);

alter table t set (fillfactor=95);

insert into t select i + 1000*random(), i+1000*random()
 from generate_series(1,1) s(i);

update t set a = 1, b = 1 where random() < 0.001;
update t set a = 1, b = 1 where random() < 0.001;

Now if you create a regular minmax index, it's going to perform
terribly, because pretty much every minmax range is [1,1] thanks
to the update of 0.1% of rows.

create index on t using brin (a);

explain analyze select * from t
where a between 1923300::int and 1923600::int;

  QUERY PLAN
  -
   Bitmap Heap Scan on t  (cost=75.11..75884.45 rows=319 width=12)
(actual time=948.906..101739.892 rows=308 loops=1)
 Recheck Cond: ((a >= 1923300) AND (a <= 1923600))
 Rows Removed by Index Recheck: 9692
 Heap Blocks: lossy=568182
 ->  Bitmap Index Scan on t_a_idx  (cost=0.00..75.03 rows=22587
  width=0) (actual time=89.357..89.357 rows=5681920 loops=1)
   Index Cond: ((a >= 1923300) AND (a <= 1923600))
   Planning Time: 2.161 ms
   Execution Time: 101740.776 ms
  (8 rows)

But with the minmax-multi opclass, this is not an issue:

create index on t using brin (a int8_minmax_multi_ops);

  QUERY PLAN
  ---
   Bitmap Heap Scan on t  (cost=1067.11..76876.45 rows=319 width=12)
   (actual time=38.906..49.763 rows=308 loops=1)
 Recheck Cond: ((a >= 1923300) AND (a <= 1923600))
 Rows Removed by Index Recheck: 0
 Heap Blocks: lossy=128
 ->  Bitmap Index Scan on t_a_idx  (cost=0.00..1067.03 rows=22587
   width=0) (actual time=28.069..28.069 rows=1280 loops=1)
   Index Cond: ((a >= 1923300) AND (a <= 1923600))
   Planning Time: 1.715 ms
   Execution Time: 50.866 ms
  (8 rows)

Which is clearly a big improvement.

Doing this required some changes to how BRIN evaluates conditions on
page ranges. With a single minmax range it was enough to evaluate them
one by one, but minmax-multi needs to see all of them at once (to match
them against the partial ranges).

Most of the complexity is in building the summary, particularly picking
which values (partial ranges) to merge. The max number of values in the
summary is specified as values_per_range index reloption, and by default
it's set to 64, so there can be either 64 points or 32 intervals or some
combination of those.

I've been thinking about some automated way to tune this (either
globally or for each page range independently), but so far I have not
been very successful. The challenge is that making good decisions
requires global information about values in the column (e.g. global
minimum and maximum). I think the reloption with 64 as a default is a
good enough solution for now.

Perhaps the stats from pg_statistic would be useful for improving this
in the future, but I'm not sure.


bloom
=

As the name suggests, this opclass uses a bloom filter for the summary.
Compared to minmax-multi it's a somewhat more experimental idea, but I
believe the foundations are safe.

Using bloom filter means that the index can only support equalities, but
for many use cases that's an acceptable limitation - UUID, IP addresses,
... (various identifiers in general).

Of course, how to size the bloom filter? It's 

Re: [GSoC] array processing questions

2018-06-23 Thread Andrew Gierth
> "Charles" == Charles Cui  writes:

 Charles> Hi mentors and hackers,
 Charles> One question about array processing in postgresql. Assume
 Charles> my input is a Datum (which contains an array). Are there any
 Charles> examples of looping through the array and operating on each
 Charles> element? It feels like it is not that straightforward.

See array_iter_setup and array_iter_next
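
For reference, a minimal sketch of walking an array Datum with
deconstruct_array, a simpler (flattening) alternative to the array_iter
approach above; the int4 element type is an assumption for illustration:

#include "postgres.h"

#include "catalog/pg_type.h"
#include "utils/array.h"

/* Sum the non-null elements of an int4[] passed in as a Datum. */
static int64
sum_int4_array(Datum arraydatum)
{
	ArrayType  *arr = DatumGetArrayTypeP(arraydatum);
	Datum	   *elems;
	bool	   *nulls;
	int			nelems;
	int			i;
	int64		sum = 0;

	/* flatten the array into one Datum and one null flag per element */
	deconstruct_array(arr, INT4OID, sizeof(int32), true, 'i',
					  &elems, &nulls, &nelems);

	for (i = 0; i < nelems; i++)
	{
		if (!nulls[i])
			sum += DatumGetInt32(elems[i]);
	}

	pfree(elems);
	pfree(nulls);
	return sum;
}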

-- 
Andrew (irc:RhodiumToad)



[GSoC] working status

2018-06-23 Thread Charles Cui
Hi mentors and hackers,

   Here is my current working status. I have resolved all the warnings
Aleksander found previously. I am working on two threads in parallel: one
is the thrift binary type implementation, the other is the thrift compact
byte interface implementation. For both, simple data types have been
implemented and tested, but I still need time to focus on complicated data
types like struct, map, or list. Let me know if you have any questions.
Here is the repo with the latest updates: https://github.com/charles-cui/pg_thrift

Thanks, Charles.


Re: pg11b1 from outside a txn: "VACUUM cannot run inside a transaction block": should be: ...or multi-command string

2018-06-23 Thread Andrew Gierth
> "Justin" == Justin Pryzby  writes:

 Justin> I couldn't figure out how to \set VERBOSITY verbose inside a
 Justin> psql command (??),

psql -v VERBOSITY=verbose -c 'query here'

-- 
Andrew (irc:RhodiumToad)



[GSoC] array processing questions

2018-06-23 Thread Charles Cui
Hi mentors and hackers,

One question about array processing in postgresql. Assume my input is a
Datum (which contains an array). Are there any examples of looping through
the array and operating on each element? It feels like it is not that
straightforward.

Thanks Charles


pg11b1 from outside a txn: "VACUUM cannot run inside a transaction block": should be: ...or multi-command string

2018-06-23 Thread Justin Pryzby
in pg10:

ts=# begin;VACUUM FULL pg_toast.pg_toast_2619;
BEGIN
ERROR:  25001: VACUUM cannot run inside a transaction block
LOCATION:  PreventTransactionChain, xact.c:3167
=> sounds fine

$ psql postgres -c 'SELECT 1; VACUUM pg_statistic'
ERROR:  VACUUM cannot be executed from a function or multi-command 
string
=> sounds fine

In pg11b1:

pryzbyj=# begin;VACUUM FULL pg_toast.pg_toast_2619;
BEGIN
ERROR:  25001: VACUUM cannot run inside a transaction block
LOCATION:  PreventInTransactionBlock, xact.c:3163
=> sounds fine

[pryzbyj@dev ~]$ psql -c 'SELECT 1; VACUUM pg_statistic'
ERROR:  VACUUM cannot run inside a transaction block
=> Error message seems off??

I couldn't figure out how to \set VERBOSITY verbose inside a psql command (??),
but strace shows for v10:
SERROR\0VERROR\0C25001\0MVACUUM cannot be executed from a function or 
multi-command string\0Fxact.c\0L3187\0RPreventTransactionChain

And for v11:
SERROR\0VERROR\0C25001\0MVACUUM cannot run inside a transaction 
block\0Fxact.c\0L3163\0RPreventInTransactionBlock

Function renamed by commit 04700b685f31508036456bea4d92533e5ceee9d6.

So behavior change maybe caused by 6eb52da3948dc8bc7c8a61cbacac14823b670c58 ?

Justin



Re: Does logical replication supports cross platform servers?

2018-06-23 Thread Bruce Momjian
On Wed, Jun 13, 2018 at 11:42:14AM +1000, Haribabu Kommi wrote:
> 
> On Tue, Jun 12, 2018 at 1:29 PM Craig Ringer  wrote:
> 
> On 12 June 2018 at 11:04, Haribabu Kommi  wrote:
> 
> Hi All,
> 
> I am not able to find any docs suggesting that PostgreSQL logical
> replication supports
> cross platform servers (windows --> Linux or vice-versa).
> 
> 
> 
> It should. I don't think there's buildfarm coverage yet, though. 
> 
> 
> Thanks for the confirmation.
> This is a good use case that must be explicitly mentioned in the docs.
> How about the attached patch?

OK, doc patch added to git head.  Let me know if I should back patch it
further.  Thanks.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



Re: Log query parameters for terminated execute

2018-06-23 Thread Pavel Stehule
2018-06-23 21:54 GMT+02:00 Sergei Kornilov :

> Hello all
> We already have feature to logging query parameters. If we use
> log_statement = 'all' we write parameters before execution and all is fine
> here. If we use log_min_duration_statement statement is logged obviously
> after execution, but currently we have no parameters if query was
> terminated by statement_timeout, lock_timeout or by pg_terminate_backend.
>
> I would like have parameters in logs at least for such three cases.
>

It is a good idea - many times I would have liked to see these values

Regards

Pavel


> Simple way achieve this is just add errdetail_params to such ereport as in
> attached patch.
> Another way is add something like printing global variable
> debug_query_string in send_message_to_server_log
> (src/backend/utils/error/elog.c). But i have no good idea how print
> ParamListInfo correctly. We can not use OidOutputFunctionCall in all cases,
> right?
>
> Another small question is why errdetail_params uses errdetail instead
> errdetail_log? We assume that the user wants to get their own parameters
> back (if he set client_min_messages to LOG)?
>
> Any feedback is strongly appreciated. Thank you!
>
> regards, Sergei


Retrieve memory size allocated by libpq

2018-06-23 Thread Lars Kanis
Hello,

I would like to be able to retrieve the size of memory internally
allocated by libpq for a result. The reason is that we have a Ruby
wrapper that exposes libpq in Ruby. The problem is that Ruby's GC
doesn't know how much memory has been allocated by libpq, so no pressure
is applied to the GC when it should be. With this function we could
instruct the GC about the memory usage associated to each result object.

This issue has already been discussed in the following thread, with the
request to use custom malloc/realloc/free functions:

https://www.postgresql.org/message-id/flat/20170828172834.GA71455%40TC.local#20170828172834.GA71455@TC.local

Retrieving the allocated memory size is another approach to solve the
same base issue. However since the relation between memory consumption
and the particular result object is maintained, it can additionally be
used to provide diagnostic information to each object.

What do you think about adding such a function?
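
For illustration, a minimal sketch of how a caller might use the proposed
PQresultMemsize() once the attached patch is applied (the connection string
and query are placeholders):

#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
	PGconn	   *conn = PQconnectdb("dbname=postgres");
	PGresult   *res;

	if (PQstatus(conn) != CONNECTION_OK)
	{
		fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
		PQfinish(conn);
		return 1;
	}

	res = PQexec(conn, "SELECT relname FROM pg_class");
	if (PQresultStatus(res) == PGRES_TUPLES_OK)
		printf("libpq allocated %zu bytes for this result\n",
			   PQresultMemsize(res));	/* value a wrapper could report to its GC */

	PQclear(res);
	PQfinish(conn);
	return 0;
}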

--
Kind Regards,
Lars

From d3ac8089a1b8c26d29d8d8e93c48a892cec75e53 Mon Sep 17 00:00:00 2001
From: Lars Kanis 
Date: Sat, 23 Jun 2018 19:34:11 +0200
Subject: [PATCH] libpq: Add function PQresultMemsize()

This function retrieves the number of bytes allocated for a given result.
That can be used to instruct the GC about the memory consumed behind a
wrapping object and for diagnosing memory consumption.

This is an alternative approach to customizable malloc/realloc/free
functions as discussed here:
https://www.postgresql.org/message-id/flat/20170828172834.GA71455%40TC.local#20170828172834.GA71455@TC.local
---
 src/interfaces/libpq/exports.txt |  1 +
 src/interfaces/libpq/fe-exec.c   | 14 +-
 src/interfaces/libpq/libpq-fe.h  |  1 +
 src/interfaces/libpq/libpq-int.h |  2 ++
 4 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/src/interfaces/libpq/exports.txt b/src/interfaces/libpq/exports.txt
index d6a38d0df8..0b50dddbb7 100644
--- a/src/interfaces/libpq/exports.txt
+++ b/src/interfaces/libpq/exports.txt
@@ -172,3 +172,4 @@ PQsslAttribute169
 PQsetErrorContextVisibility 170
 PQresultVerboseErrorMessage 171
 PQencryptPasswordConn 172
+PQresultMemsize   173
diff --git a/src/interfaces/libpq/fe-exec.c b/src/interfaces/libpq/fe-exec.c
index 4c0114c514..064c7a693c 100644
--- a/src/interfaces/libpq/fe-exec.c
+++ b/src/interfaces/libpq/fe-exec.c
@@ -166,6 +166,7 @@ PQmakeEmptyPGresult(PGconn *conn, ExecStatusType status)
 	result->curBlock = NULL;
 	result->curOffset = 0;
 	result->spaceLeft = 0;
+	result->memsize = sizeof(PGresult);
 
 	if (conn)
 	{
@@ -215,6 +216,12 @@ PQmakeEmptyPGresult(PGconn *conn, ExecStatusType status)
 	return result;
 }
 
+size_t
+PQresultMemsize(const PGresult *res)
+{
+	return res->memsize;
+}
+
 /*
  * PQsetResultAttrs
  *
@@ -567,9 +574,11 @@ pqResultAlloc(PGresult *res, size_t nBytes, bool isBinary)
 	 */
 	if (nBytes >= PGRESULT_SEP_ALLOC_THRESHOLD)
 	{
-		block = (PGresult_data *) malloc(nBytes + PGRESULT_BLOCK_OVERHEAD);
+		size_t alloc_size = nBytes + PGRESULT_BLOCK_OVERHEAD;
+		block = (PGresult_data *) malloc(alloc_size);
 		if (!block)
 			return NULL;
+		res->memsize += alloc_size;
 		space = block->space + PGRESULT_BLOCK_OVERHEAD;
 		if (res->curBlock)
 		{
@@ -594,6 +603,7 @@ pqResultAlloc(PGresult *res, size_t nBytes, bool isBinary)
 	block = (PGresult_data *) malloc(PGRESULT_DATA_BLOCKSIZE);
 	if (!block)
 		return NULL;
+	res->memsize += PGRESULT_DATA_BLOCKSIZE;
 	block->next = res->curBlock;
 	res->curBlock = block;
 	if (isBinary)
@@ -711,6 +721,7 @@ PQclear(PGresult *res)
 	res->errFields = NULL;
 	res->events = NULL;
 	res->nEvents = 0;
+	res->memsize = 0;
 	/* res->curBlock was zeroed out earlier */
 
 	/* Free the PGresult structure itself */
@@ -927,6 +938,7 @@ pqAddTuple(PGresult *res, PGresAttValue *tup, const char **errmsgp)
 realloc(res->tuples, newSize * sizeof(PGresAttValue *));
 		if (!newTuples)
 			return false;		/* malloc or realloc failed */
+		res->memsize += (newSize - res->tupArrSize) * sizeof(PGresAttValue *);
 		res->tupArrSize = newSize;
 		res->tuples = newTuples;
 	}
diff --git a/src/interfaces/libpq/libpq-fe.h b/src/interfaces/libpq/libpq-fe.h
index ed9c806861..4fd9a4fcda 100644
--- a/src/interfaces/libpq/libpq-fe.h
+++ b/src/interfaces/libpq/libpq-fe.h
@@ -491,6 +491,7 @@ extern int	PQgetlength(const PGresult *res, int tup_num, int field_num);
 extern int	PQgetisnull(const PGresult *res, int tup_num, int field_num);
 extern int	PQnparams(const PGresult *res);
 extern Oid	PQparamtype(const PGresult *res, int param_num);
+extern size_t	PQresultMemsize(const PGresult *res);
 
 /* Describe prepared statements and portals */
 extern PGresult *PQdescribePrepared(PGconn *conn, const char *stmt);
diff --git a/src/interfaces/libpq/libpq-int.h b/src/interfaces/libpq/libpq-int.h
index 9a586ff25a..37c9c3853d 100644
--- a/src/interfaces/libpq/libpq-int.h
+++ b/src/interfaces/libpq/libpq-int.h
@@ -208,6 +208,8 @@ struct pg_result
 	PGresult_data 

Log query parameters for terminated execute

2018-06-23 Thread Sergei Kornilov
Hello all
We already have a feature for logging query parameters. If we use
log_statement = 'all' we write the parameters before execution and all is
fine here. If we use log_min_duration_statement the statement is obviously
logged after execution, but currently we get no parameters if the query was
terminated by statement_timeout, lock_timeout or by pg_terminate_backend.

I would like to have the parameters in the logs at least for these three cases.

A simple way to achieve this is to just add errdetail_params to those
ereports, as in the attached patch.
Another way is to add something like printing the global variable
debug_query_string in send_message_to_server_log
(src/backend/utils/error/elog.c). But I have no good idea how to print a
ParamListInfo correctly there. We can not use OidOutputFunctionCall in all
cases, right?

Another small question is why errdetail_params uses errdetail instead of
errdetail_log? Do we assume that the user wants to get their own parameters
back (if client_min_messages is set to LOG)?

Any feedback is strongly appreciated. Thank you!

regards, Sergei

diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index f413395..ef6877e 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -171,6 +171,9 @@ static ProcSignalReason RecoveryConflictReason;
 static MemoryContext row_description_context = NULL;
 static StringInfoData row_description_buf;
 
+/* current portal parameters */
+static ParamListInfo debug_query_params = NULL;
+
 /* 
  *		decls for routines only used in this file
  * 
@@ -183,7 +186,7 @@ static void forbidden_in_wal_sender(char firstchar);
 static List *pg_rewrite_query(Query *query);
 static bool check_log_statement(List *stmt_list);
 static int	errdetail_execute(List *raw_parsetree_list);
-static int	errdetail_params(ParamListInfo params);
+static int	errdetail_log_params(ParamListInfo params);
 static int	errdetail_abort(void);
 static int	errdetail_recovery_conflict(void);
 static void start_xact_command(void);
@@ -1850,7 +1853,7 @@ exec_bind_message(StringInfo input_message)
 			*portal_name ? portal_name : "",
 			psrc->query_string),
 	 errhidestmt(true),
-	 errdetail_params(params)));
+	 errdetail_log_params(params)));
 			break;
 	}
 
@@ -1934,6 +1937,7 @@ exec_execute_message(const char *portal_name, long max_rows)
 		else
 			prepStmtName = "";
 		portalParams = portal->portalParams;
+		debug_query_params = portalParams;
 	}
 
 	/*
@@ -1985,7 +1989,7 @@ exec_execute_message(const char *portal_name, long max_rows)
 		*portal_name ? portal_name : "",
 		sourceText),
  errhidestmt(true),
- errdetail_params(portalParams)));
+ errdetail_log_params(portalParams)));
 		was_logged = true;
 	}
 
@@ -2074,7 +2078,7 @@ exec_execute_message(const char *portal_name, long max_rows)
 			*portal_name ? portal_name : "",
 			sourceText),
 	 errhidestmt(true),
-	 errdetail_params(portalParams)));
+	 errdetail_log_params(portalParams)));
 			break;
 	}
 
@@ -2082,6 +2086,7 @@ exec_execute_message(const char *portal_name, long max_rows)
 		ShowUsage("EXECUTE MESSAGE STATISTICS");
 
 	debug_query_string = NULL;
+	debug_query_params = NULL;
 }
 
 /*
@@ -2200,12 +2205,12 @@ errdetail_execute(List *raw_parsetree_list)
 }
 
 /*
- * errdetail_params
+ * errdetail_log_params
  *
- * Add an errdetail() line showing bind-parameter data, if available.
+ * Add an errdetail_log() line showing bind-parameter data, if available.
  */
 static int
-errdetail_params(ParamListInfo params)
+errdetail_log_params(ParamListInfo params)
 {
 	/* We mustn't call user-defined I/O functions when in an aborted xact */
 	if (params && params->numParams > 0 && !IsAbortedTransactionBlockState())
@@ -2256,7 +2261,7 @@ errdetail_params(ParamListInfo params)
 			pfree(pstring);
 		}
 
-		errdetail("parameters: %s", param_str.data);
+		errdetail_log("parameters: %s", param_str.data);
 
 		pfree(param_str.data);
 
@@ -2925,7 +2930,8 @@ ProcessInterrupts(void)
 		else
 			ereport(FATAL,
 	(errcode(ERRCODE_ADMIN_SHUTDOWN),
-	 errmsg("terminating connection due to administrator command")));
+	 errmsg("terminating connection due to administrator command"),
+	 errdetail_log_params(debug_query_params)));
 	}
 	if (ClientConnectionLost)
 	{
@@ -3001,14 +3007,16 @@ ProcessInterrupts(void)
 			LockErrorCleanup();
 			ereport(ERROR,
 	(errcode(ERRCODE_LOCK_NOT_AVAILABLE),
-	 errmsg("canceling statement due to lock timeout")));
+	 errmsg("canceling statement due to lock timeout"),
+	 errdetail_log_params(debug_query_params)));
 		}
 		if (stmt_timeout_occurred)
 		{
 			LockErrorCleanup();
 			ereport(ERROR,
 	(errcode(ERRCODE_QUERY_CANCELED),
-	 errmsg("canceling statement due to statement timeout")));
+	 errmsg("canceling statement due to statement timeout"),
+	 

comma to delimit fractional seconds

2018-06-23 Thread Chris Howard



I'm new at this.

I looked at the TODO list and saw the issue of using comma to delimit 
fractional seconds.


I looked at the parser stuff and I think I have at least a start on the 
issue.


Is there anything more to it than allowing HH:MM:SS,### as well as 
HH:MM:SS.# ?


Is it OK if it works that way in all timestamps, not just 
ISO-8601-compliant cases?



Chris Howard








postgresql_fdw doesn't handle defaults correctly

2018-06-23 Thread Pavel Stehule
Hi

I have a table boo

create table boo(id serial primary key, inserted date default current_date,
v varchar);

I imported this table via simple

IMPORT FOREIGN SCHEMA public FROM SERVER foreign_server INTO public;


The command insert into boo(v) values('ahoj'); works in the original
database, but in the second database, against the foreign table, it fails:

postgres=# insert into boo(v) values('ahoj');
ERROR:  null value in column "id" violates not-null constraint
DETAIL:  Failing row contains (null, null, ahoj).
CONTEXT:  remote SQL command: INSERT INTO public.boo(id, inserted, v)
VALUES ($1, $2, $3)

It does an unwanted transformation into an insert of all columns.

Is this expected behaviour?

Regards

Pavel


Re: Constraint documentation

2018-06-23 Thread Fabien COELHO


Hello lætitia,

My 0.02 € to try to simplify the suggested documentation.


Check constraint were not


are not


designed to enforce business rules across tables.



Avoid using check constraints with function accessing to other tables


accessing other tables (no "to")

and prefer triggers instead (please refer to  
for more information about triggers).


... and use  instead.


PostgreSQL won't prevent you from doing so,


Although PostgreSQL ... so,

but be aware you might encounter difficulties to restore dumps 
(generated with pg_dump or pg_dumpall) if you do.


beware that dumps generated by pg_*<...> or <...> may be hard 
to restore because of such checks, as the underlying dependencies are not 
taken into account.


--
Fabien.

Re: Adding Markdown formatting to psql output

2018-06-23 Thread Vik Fearing
On 23/06/18 17:01, Alvaro Herrera wrote:
> On 2018-Jun-23, Bruce Momjian wrote:
> 
>> On Tue, May 29, 2018 at 06:47:54PM +0200, Lætitia Avrot wrote:
>>>
>>> I will try to apply and review the patch.
>>
>> Uh, no one mentioned in this thread that psql has supported asciidoc
>> since PG 9.5.
> 
> I would welcome markdown output, particularly some version that can be
> processed nicely by pandoc :-)

I use latex format for that, but I also welcome a markdown format.
-- 
Vik Fearing  +33 6 46 75 15 36
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support



Re: Incorrect visibility test function assigned to snapshot

2018-06-23 Thread Bruce Momjian
On Wed, May 30, 2018 at 09:28:54AM -0400, Alvaro Herrera wrote:
> On 2018-May-30, Antonin Houska wrote:
> 
> > In the header comment, SnapBuildInitialSnapshot() claims to set
> > snapshot->satisfies to the HeapTupleSatisfiesMVCC test function, and indeed 
> > it
> > converts the "xid" array to match its semantics (i.e. the xid items 
> > eventually
> > represent running transactions as opposed to the committed ones). However 
> > the
> > test function remains HeapTupleSatisfiesHistoricMVCC as set by
> > SnapBuildBuildSnapshot().
> 
> Interesting.  While this sounds like an oversight that should have
> horrible consequences, it's seems not to because the current callers
> don't seem to care about the ->satisfies function.  Are you able to come
> up with some scenario in which it causes an actual problem?

Uh, are we going to fix this anyway?  Seems we should.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



Re: Supporting tls-server-end-point as SCRAM channel binding for OpenSSL 1.0.0 and 1.0.1

2018-06-23 Thread Bruce Momjian
On Wed, Jun  6, 2018 at 01:16:11PM -0700, Steven Fackler wrote:
> TLS 1.3, (which is currently in a draft state, but is theoretically being
> finalized soon) does not support the TLS channel binding algorithms [1]. From

Uh, according to this article, TLS 1.3 was finalized in March:

  
https://www.theregister.co.uk/2018/03/27/with_tls_13_signed_off_its_implementation_time/

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



Re: Adding Markdown formatting to psql output

2018-06-23 Thread Alvaro Herrera
On 2018-Jun-23, Bruce Momjian wrote:

> On Tue, May 29, 2018 at 06:47:54PM +0200, Lætitia Avrot wrote:
> > 
> > 
> > On Tue, May 29, 2018 at 17:28, Euler Taveira  wrote:
> > 
> > 2018-05-29 12:11 GMT-03:00 Lætitia Avrot :
> > > I write often documents for my customers in MarkDown to produce pdf
> > > documents. Quite often, I need to present data from queries in a 
> > tabular
> > > form.
> > >
> > https://www.postgresql.org/message-id/flat/CAAYBy8YU4pXYKDHeQhsA_=
> > FC93yOBZp5j1h=bssao9-olcw...@mail.gmail.com#CAAYBy8YU4pXYKDHeQhsA_=
> > FC93yOBZp5j1h=bssao9-olcw...@mail.gmail.com
> > 
> > Thanks a lot!
> > 
> >  It seems that other people need that feature too. :-)
> > 
> > I will try to apply and review the patch.
> 
> Uh, no one mentioned in this thread that psql has supported asciidoc
> since PG 9.5.

I would welcome markdown output, particularly some version that can be
processed nicely by pandoc :-)

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Concurrency bug in UPDATE of partition-key

2018-06-23 Thread Alvaro Herrera
Would you wait a little bit before pushing this?  I'd like to give this
a read too.

Thanks

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Adding Markdown formatting to psql output

2018-06-23 Thread Bruce Momjian
On Tue, May 29, 2018 at 06:47:54PM +0200, Lætitia Avrot wrote:
> 
> 
> On Tue, May 29, 2018 at 17:28, Euler Taveira  wrote:
> 
> 2018-05-29 12:11 GMT-03:00 Lætitia Avrot :
> > I write often documents for my customers in MarkDown to produce pdf
> > documents. Quite often, I need to present data from queries in a tabular
> > form.
> >
> https://www.postgresql.org/message-id/flat/CAAYBy8YU4pXYKDHeQhsA_=
> FC93yOBZp5j1h=bssao9-olcw...@mail.gmail.com#CAAYBy8YU4pXYKDHeQhsA_=
> FC93yOBZp5j1h=bssao9-olcw...@mail.gmail.com
> 
> Thanks a lot!
> 
>  It seems that other people need that feature too. :-)
> 
> I will try to apply and review the patch.

Uh, no one mentioned in this thread that psql has supported asciidoc
since PG 9.5.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



Re: SCRAM with channel binding downgrade attack

2018-06-23 Thread Bruce Momjian
On Sat, Jun 23, 2018 at 10:30:19PM +0900, Michael Paquier wrote:
> On Fri, Jun 22, 2018 at 11:01:53PM -0400, Bruce Momjian wrote:
> > Uh, as I am understanding it, if we don't allow clients to force channel
> > binding, then channel binding is useless because it cannot prevent
> > man-in-the-middle attacks.  I am sure some users will try to use it, and
> > not understand that it serves no purpose.  If we then allow clients to
> > force channel binding in PG 12, they will then need to fix their
> > clients.
> > 
> > I suggest that if we don't allow users to use channel binding
> > effectively that we should remove all documentation about this
> > feature.
> 
> Well, I don't agree with this position as the protocol put in place for
> SCRAM with or without channel binding perfectly allows a client to
> enforce the use channel binding.  While that's missing for libpq, other
> clients like JDBC or npgsql could perfectly implement that before this
> gets in Postgres core in the shape they want.  So I think that the docs
> should be kept.

Yes, the code is useful, but the _feature_ is not useful until some
interface allows forcing channel binding.  People are worried about
users having to change their API in PG 12, but the point is that to use
this feature people will have to change their API in PG 12 anyway, and
it doesn't do anything useful without an interface that we don't ship
and that hasn't been written, so why confuse people into thinking it is
a feature in PG 11?

Channel binding is listed as a _major_ feature in PG 11 in the release
notes, and you can bet people are going to look at how to use it:

  Channel binding for SCRAM authentication, to prevent potential
  man-in-the-middle attacks on database connections

It should perhaps be marked in the source code section, and listed as
not usable from PG 11's libpq or any of the interfaces built on it.  We
are also going to need to communicate to people who have already looked
at the release notes that this feature is not usable in PG 11 using
libpq.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



Re: SCRAM with channel binding downgrade attack

2018-06-23 Thread Michael Paquier
On Fri, Jun 22, 2018 at 11:01:53PM -0400, Bruce Momjian wrote:
> Uh, as I am understanding it, if we don't allow clients to force channel
> binding, then channel binding is useless because it cannot prevent
> man-in-the-middle attacks.  I am sure some users will try to use it, and
> not understand that it serves no purpose.  If we then allow clients to
> force channel binding in PG 12, they will then need to fix their
> clients.
> 
> I suggest that if we don't allow users to use channel binding
> effectively that we should remove all documentation about this
> feature.

Well, I don't agree with this position, as the protocol put in place for
SCRAM with or without channel binding perfectly allows a client to
enforce the use of channel binding.  While that's missing from libpq, other
clients like JDBC or npgsql could perfectly implement that before this
gets in Postgres core in the shape they want.  So I think that the docs
should be kept.
--
Michael




Re: Incorrect errno used with %m for backend code

2018-06-23 Thread Michael Paquier
On Fri, Jun 22, 2018 at 11:15:53AM -0400, Alvaro Herrera wrote:
> I wondered for a bit if it would be a better idea to have
> CloseTransientFile restore the existing errno if there is any and close
> works fine -- when I noticed that that function itself says that caller
> should check errno for close() errors.  Most callers seem to do it
> correctly, but a few fail to check the return value.

Yeah, the API in its current shape is simpler to understand.  Once you
get used to the errno stanza of course...

> A bunch of other places use CloseTransientFile while closing shop after
> some other syscall failure, which is what your patch fixes.  I didn't
> review those; checking for close failure seems pointless.

Agreed.

> In some cases, we fsync the file and check that return value, then close
> the file and do *not* check CloseTransientFile's return value --
> examples are CheckPointLogicalRewriteHeap, heap_xlog_logical_rewrite,
> SnapBuildSerialize, SaveSlotToPath, durable_rename.  I don't know if
> it's reasonable for close() to fail after successfully fsyncing a file;
> maybe this is not a problem.  I would patch those nonetheless.

And writeTimeLineHistory.

> be_lo_export() is certainly missing a check: it writes and closes,
> without intervening fsync.

One problem at the same time if possible :)  I think that those
adjustments should be a separate patch.
--
Michael




Re: Incorrect errno used with %m for backend code

2018-06-23 Thread Michael Paquier
On Fri, Jun 22, 2018 at 03:45:33PM +0530, Ashutosh Sharma wrote:
> Okay, thanks for the confirmation. Few of them are also there in
> origin.c and snapbuild.c files.

Thanks Ashutosh.  I have been reviewing the whole tree and I found more
places where this is missing, like rewriteheap.c, reorderbuffer.c or
pg_basebackup, which gives the attached.
--
Michael
diff --git a/src/backend/access/heap/rewriteheap.c b/src/backend/access/heap/rewriteheap.c
index 8d3c861a33..ed7ba181c7 100644
--- a/src/backend/access/heap/rewriteheap.c
+++ b/src/backend/access/heap/rewriteheap.c
@@ -1168,9 +1168,14 @@ heap_xlog_logical_rewrite(XLogReaderState *r)
 	/* write out tail end of mapping file (again) */
 	pgstat_report_wait_start(WAIT_EVENT_LOGICAL_REWRITE_MAPPING_WRITE);
 	if (write(fd, data, len) != len)
+	{
+		/* if write didn't set errno, assume problem is no disk space */
+		if (errno == 0)
+			errno = ENOSPC;
 		ereport(ERROR,
 (errcode_for_file_access(),
  errmsg("could not write to file \"%s\": %m", path)));
+	}
 	pgstat_report_wait_end();
 
 	/*
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index 65194db70e..a9ef1b3d73 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -1241,12 +1241,17 @@ ReadTwoPhaseFile(TransactionId xid, bool give_warnings)
 	 */
 	if (fstat(fd, &stat))
 	{
+		int			save_errno = errno;
+
 		CloseTransientFile(fd);
 		if (give_warnings)
+		{
+			errno = save_errno;
 			ereport(WARNING,
 	(errcode_for_file_access(),
 	 errmsg("could not stat two-phase state file \"%s\": %m",
 			path)));
+		}
 		return NULL;
 	}
 
@@ -1274,13 +1279,18 @@ ReadTwoPhaseFile(TransactionId xid, bool give_warnings)
 	pgstat_report_wait_start(WAIT_EVENT_TWOPHASE_FILE_READ);
 	if (read(fd, buf, stat.st_size) != stat.st_size)
 	{
+		int			save_errno = errno;
+
 		pgstat_report_wait_end();
 		CloseTransientFile(fd);
 		if (give_warnings)
+		{
+			errno = save_errno;
 			ereport(WARNING,
 	(errcode_for_file_access(),
 	 errmsg("could not read two-phase state file \"%s\": %m",
 			path)));
+		}
 		pfree(buf);
 		return NULL;
 	}
@@ -1663,16 +1673,26 @@ RecreateTwoPhaseFile(TransactionId xid, void *content, int len)
 	pgstat_report_wait_start(WAIT_EVENT_TWOPHASE_FILE_WRITE);
 	if (write(fd, content, len) != len)
 	{
+		int			save_errno = errno;
+
 		pgstat_report_wait_end();
 		CloseTransientFile(fd);
+
+		/* if write didn't set errno, assume problem is no disk space */
+		errno = save_errno ? save_errno : ENOSPC;
 		ereport(ERROR,
 (errcode_for_file_access(),
  errmsg("could not write two-phase state file: %m")));
 	}
 	if (write(fd, &statefile_crc, sizeof(pg_crc32c)) != sizeof(pg_crc32c))
 	{
+		int			save_errno = errno;
+
 		pgstat_report_wait_end();
 		CloseTransientFile(fd);
+
+		/* if write didn't set errno, assume problem is no disk space */
+		errno = save_errno ? save_errno : ENOSPC;
 		ereport(ERROR,
 (errcode_for_file_access(),
  errmsg("could not write two-phase state file: %m")));
@@ -1686,7 +1706,10 @@ RecreateTwoPhaseFile(TransactionId xid, void *content, int len)
 	pgstat_report_wait_start(WAIT_EVENT_TWOPHASE_FILE_SYNC);
 	if (pg_fsync(fd) != 0)
 	{
+		int			save_errno = errno;
+
 		CloseTransientFile(fd);
+		errno = save_errno;
 		ereport(ERROR,
 (errcode_for_file_access(),
  errmsg("could not fsync two-phase state file: %m")));
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index adbd6a2126..1a419aa49b 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -3268,7 +3268,10 @@ XLogFileInit(XLogSegNo logsegno, bool *use_existent, bool use_lock)
 	pgstat_report_wait_start(WAIT_EVENT_WAL_INIT_SYNC);
 	if (pg_fsync(fd) != 0)
 	{
+		int			save_errno = errno;
+
 		close(fd);
+		errno = save_errno;
 		ereport(ERROR,
 (errcode_for_file_access(),
  errmsg("could not fsync file \"%s\": %m", tmppath)));
@@ -11675,8 +11678,10 @@ retry:
 	if (lseek(readFile, (off_t) readOff, SEEK_SET) < 0)
 	{
 		char		fname[MAXFNAMELEN];
+		int			save_errno = errno;
 
 		XLogFileName(fname, curFileTLI, readSegNo, wal_segment_size);
+		errno = save_errno;
 		ereport(emode_for_corrupt_record(emode, targetPagePtr + reqLen),
 (errcode_for_file_access(),
  errmsg("could not seek in log segment %s to offset %u: %m",
@@ -11688,9 +11693,11 @@ retry:
 	if (read(readFile, readBuf, XLOG_BLCKSZ) != XLOG_BLCKSZ)
 	{
 		char		fname[MAXFNAMELEN];
+		int			save_errno = errno;
 
 		pgstat_report_wait_end();
 		XLogFileName(fname, curFileTLI, readSegNo, wal_segment_size);
+		errno = save_errno;
 		ereport(emode_for_corrupt_record(emode, targetPagePtr + reqLen),
 (errcode_for_file_access(),
  errmsg("could not read from log segment %s, offset %u: %m",
diff --git a/src/backend/access/transam/xlogutils.c b/src/backend/access/transam/xlogutils.c
index 52fe55e2af..4ecdc9220f 100644
--- a/src/backend/access/transam/xlogutils.c
+++ 

Re: PATCH: backtraces for error messages

2018-06-23 Thread Craig Ringer
On 23 June 2018 at 20:22, Craig Ringer  wrote:


> Yeah. libunwind doesn't expose any interface to get that information, so
> you'd have to use platform APIs, like glibc's dl_iterate_phdr or dladdr, or
> capture /proc/self/maps.
>

Ahem, I take that part back.

https://stackoverflow.com/a/21584356/398670

see /usr/include/link.h



-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: PATCH: backtraces for error messages

2018-06-23 Thread Craig Ringer
On 22 June 2018 at 23:26, Korry Douglas 
wrote:

> A few misc comments:
>
> In general, +1.
>
> It might be nice to move the stack trace code into a separate function
> (not static to elog.c) - I have often found it useful to attach backtraces
> to objects so I can debug complex allocation/deallocation problems.
>

Good suggestion.


>
> The major expense in capturing a stack trace is in the symbolization of
> the stack addresses, not the stack walk itself.  You typically want to
> symbolize the addresses at the time you generate the trace, but that's not
> always the case.  For example, if you want to record the stack trace of
> every call to AllocSetAlloc() (and attach the trace to the AllocChunk)
> performance gets unbearable if you symbolize every trace.  So a flag that
> specifies whether to symbolize would be nice too.
>

libunwind has some nifty caching that makes that a _lot_ cheaper; that's
part of why I went with it despite the extra lib requirement.



> If you don't symbolize, you need a way to capture the address range of
> each dynamically loaded shared object (so that you can later symbolize
> using something like addr2line).
>

Yeah. libunwind doesn't expose any interface to get that information, so
you'd have to use platform APIs, like glibc's dl_iterate_phdr or dladdr, or
capture /proc/self/maps. You have to make sure that info makes it to the
log, and is re-output often enough not to be lost to log rotation, and is
invalidated and re-output if mappings change due to new lib loads etc. But
you don't want to print it all the time either.

Then you have to walk the stack and print the instruction pointers and
stack pointers and spit out raw traces for the user to reassemble.

I'd argue that if you're doing the sort of thing where you want a stack of
every AllocSetAlloc, you shouldn't be trying to do that in-process. That's
where tools like perf, systemtap, etc really come into their own. I'm
focused on making additional diagnostic info for errors and key log
messages collectable for systems that aren't easily accessed, like users
who have 12-hour read/response latencies and security setups as strict as
they are insane and nonsensical.

I'd have no objection to the option to disable symbolic back traces and
print the raw call stack. It's trivial to do; in fact, I only removed the
ip/sp info to keep the output more concise.

I'm not interested in writing anything to handle the library load address
mapping collection etc though. I don't see a really sensible way to do that
in a log-based system.
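
For readers unfamiliar with libunwind, a minimal local-unwind sketch of the
kind of symbolized trace being discussed (a standalone illustration, built
with -lunwind; not the patch itself):

#define UNW_LOCAL_ONLY
#include <libunwind.h>
#include <stdio.h>

/* Print one line per frame of the current call stack to stderr. */
static void
print_backtrace(void)
{
	unw_context_t context;
	unw_cursor_t cursor;

	unw_getcontext(&context);
	unw_init_local(&cursor, &context);

	while (unw_step(&cursor) > 0)
	{
		char		sym[256];
		unw_word_t	offset;

		if (unw_get_proc_name(&cursor, sym, sizeof(sym), &offset) == 0)
			fprintf(stderr, "\t%s+0x%lx\n", sym, (unsigned long) offset);
		else
			fprintf(stderr, "\t<unknown>\n");
	}
}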

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: libpq compression

2018-06-23 Thread Konstantin Knizhnik




On 22.06.2018 20:56, Robbie Harwood wrote:

Konstantin Knizhnik  writes:


On 22.06.2018 18:59, Robbie Harwood wrote:

Konstantin Knizhnik  writes:

On 21.06.2018 20:14, Robbie Harwood wrote:

Konstantin Knizhnik  writes:

On 21.06.2018 17:56, Robbie Harwood wrote:

Konstantin Knizhnik  writes:

On 20.06.2018 23:34, Robbie Harwood wrote:

Konstantin Knizhnik  writes:


My idea was the following: client want to use compression. But
server may reject this attempt (for any reasons: it doesn't
support it, has no proper compression library, do not want to
spend CPU for decompression,...) Right now compression
algorithm is hardcoded. But in future client and server may
negotiate to choose proper compression protocol.  This is why
I prefer to perform negotiation between client and server to
enable compression.

Well, for negotiation you could put the name of the algorithm
you want in the startup.  It doesn't have to be a boolean for
compression, and then you don't need an additional round-trip.

Sorry, I can only repeat arguments I already mentioned:

- in future it may be possible to specify compression algorithm

- even with boolean compression option server may have some
reasons to reject client's request to use compression

Extra flexibility is always good thing if it doesn't cost too
much. And extra round of negotiation in case of enabling
compression seems to me not to be a high price for it.

You already have this flexibility even without negotiation.  I
don't want you to lose your flexibility.  Protocol looks like
this:

- Client sends connection option "compression" with list of
 algorithms it wants to use (comma-separated, or something).

- First packet from the server is compressed with one of those algorithms
 (or none, if it doesn't want to turn on compression).

No additional round-trips needed.

This is exactly how it works now...  Client includes compression
option in connection string and server replies with special
message ('Z') if it accepts request to compress traffic between
this client and server.

No, it's not.  You don't need this message.  If the server receives
a compression request, it should just turn compression on (or not),
and then have the client figure out whether it got compression
back.

How will it manage to do that? It receives some reply and first of
all it needs to know whether it has to be decompressed or not.

You can tell whether a message is compressed by looking at it.  The
way the protocol works, every message has a type associated with it:
a single byte, like 'R', that says what kind of message it is.

Compressed message can contain any sequence of bytes, including 'R':)

Then tag your messages with a type byte.  Or do it the other way around
- look for the zstd framing within a message.

Please, try to work with me on this instead of fighting every design
change.

Sorry, I do not want to fight.
I always vote for peace and constructive dialog.

But it is hard for me to understand and accept your arguments.

I do not understand why the secure_read function is a better place for
handling compression than pq_recvbuf.

And why is it destroying the existing model?
I already mentioned my arguments:
1. I want to use the same code for frontend and backend.
2. I think that streaming compression can be used not only for libpq.
This is why I tried to make zpq_stream independent from the
communication layer and pass it callbacks for sending/receiving data.

If pq_recvbuf is not the right place for performing decompression, I can
introduce some other function, like read_raw or something like that, and
do decompression there. But I do not see much sense in it.



Concerning the necessity of a special message for acknowledging compression
by the server: I also do not understand why you do not like the idea of
sending such a message and what is wrong with it. What you are suggesting,
"then tag your message", is actually the same as sending a new message.

Because what is the difference between a tag 'Z' and a message with code 'Z'?

Sorry, but I do not understand the problems you are trying to solve and do
not see any arguments except "I can not accept it".



Thanks,
--Robbie





Re: Keeping temporary tables in shared buffers

2018-06-23 Thread Amit Kapila
On Fri, Jun 22, 2018 at 6:09 PM, Robert Haas  wrote:
> On Mon, May 28, 2018 at 4:25 AM, Amit Kapila  wrote:
>> On Fri, May 25, 2018 at 6:33 AM, Asim Praveen  wrote:
>>> We are evaluating the use of shared buffers for temporary tables.  The
>>> advantage being queries involving temporary tables can make use of parallel
>>> workers.
>> This is one way, but I think there are other choices as well.  We can
>> identify and flush all the dirty (local) buffers for the relation
>> being accessed parallelly.  Now, once the parallel operation is
>> started, we won't allow performing any write operation on them.  It
>> could be expensive if we have a lot of dirty local buffers for a
>> particular relation.  I think if we are worried about the cost of
>> writes, then we can try some different way to parallelize temporary
>> table scan.  At the beginning of the scan, leader backend will
>> remember the dirty blocks present in local buffers, it can then share
>> the list with parallel workers which will skip scanning those blocks
>> and in the end leader ensures that all those blocks will be scanned by
>> the leader.  This shouldn't incur a much additional cost as the
>> skipped blocks should be present in local buffers of backend.
>
> This sounds awkward and limiting.  How about using DSA to allocate
> space for the backend's temporary buffers and a dshash for lookups?
>

That's a better idea.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Concurrency bug in UPDATE of partition-key

2018-06-23 Thread Amit Kapila
On Fri, Jun 22, 2018 at 10:25 PM, Amit Khandekar  wrote:
> On 20 June 2018 at 18:54, Amit Kapila  wrote:
>> On Tue, Jun 19, 2018 at 9:33 PM, Amit Khandekar  
>> wrote:
>>
>> 2.
>> ExecBRDeleteTriggers(..)
>> {
>> ..
>> + /* If requested, pass back the concurrently updated tuple, if any. */
>> + if (epqslot != NULL)
>> + *epqslot = newSlot;
>> +
>>   if (trigtuple == NULL)
>>   return false;
>> +
>> + /*
>> + * If the tuple was concurrently updated and the caller of this
>> + * function requested for the updated tuple, skip the trigger
>> + * execution.
>> + */
>> + if (newSlot != NULL && epqslot != NULL)
>> + return false;
>> ..
>> }
>>
>> Can't we merge the first change into second, something like:
>>
>> if (newSlot != NULL && epqslot != NULL)
>> {
>> *epqslot = newSlot;
>> return false;
>> }
>
> We want to update *epqslot with whatever the value of newSlot is. So
> if newSlot is NULL, epqslot should be NULL. So we can't do that in the
> "if (newSlot != NULL && epqslot != NULL)" condition.
>

Why do you need to update it when newSlot is NULL?  *epqslot is already
initialized as NULL in the caller (ExecUpdate). I think we only want
to update it when trigtuple is not null.  Apart from looking a bit
awkward, I think it is error-prone as well, because the code of
GetTupleForTrigger is such that it can return NULL with a valid value
of newSlot, in which case the behavior becomes undefined. The case
I am worried about is when EvalPlanQual first returns some valid
value of epqslot, but in the next execution, after heap_lock_tuple,
returns NULL.  In practice it won't happen, because EvalPlanQual locks
the tuple, so after that heap_lock_tuple shouldn't fail again, but it
seems prudent not to rely on that unless we need to.

For now, I have removed it in the attached patch, but if you have any
valid reason, then we can add back.

>>
>> 4.
>> +/* --
>> + * ExecBRDeleteTriggers()
>> + *
>> + * Called to execute BEFORE ROW DELETE triggers.
>> + *
>> + * Returns false in following scenarios :
>> + * 1. Trigger function returned NULL.
>> + * 2. The tuple was concurrently deleted OR it was concurrently updated and 
>> the
>> + * new tuple didn't pass EvalPlanQual() test.
>> + * 3. The tuple was concurrently updated and the new tuple passed the
>> + * EvalPlanQual() test, BUT epqslot parameter is not NULL. In such case, 
>> this
>> + * function skips the trigger function execution, because the caller has
>> + * indicated that it wants to further process the updated tuple. The updated
>> + * tuple slot is passed back through epqslot.
>> + *
>> + * In all other cases, returns true.
>> + * --
>> + */
>>  bool
>>  ExecBRDeleteTriggers(EState *estate, EPQState *epqstate,
>>
..
>
> If it looks complicated, can you please suggest something to make it a
> bit simpler.
>

See attached.

Apart from this, I have changed few comments and fixed indentation
issues.  Let me know what you think about attached?


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


fix_concurrency_bug_v5.patch
Description: Binary data


Re: effect of JIT tuple deform?

2018-06-23 Thread Pavel Stehule
2018-06-23 8:35 GMT+02:00 Pavel Stehule :

> Hi
>
> I try to measure effect of JIT tuple deform and I don't see any possible
> effect.
>
> Is it this feature active in master branch?
>
> Is possible to see this feature in EXPLAIN ANALYZE?
>

Unfortunately I got a slowdown:

0. shared buffers = 1GB
1. create table with 50 int columns
2. insert into this table 4M rows

postgres=# \dt+ wt
                   List of relations
+--------+------+-------+-------+--------+-------------+
| Schema | Name | Type  | Owner |  Size  | Description |
+--------+------+-------+-------+--------+-------------+
| public | wt   | table | pavel | 893 MB |             |
+--------+------+-------+-------+--------+-------------+
(1 row)

default setting

postgres=# explain analyze select sum(c45) from wt;
                                                            QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=136120.69..136120.70 rows=1 width=8) (actual time=879.547..879.547 rows=1 loops=1)
   ->  Gather  (cost=136120.47..136120.68 rows=2 width=8) (actual time=879.514..879.538 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial Aggregate  (cost=135120.47..135120.48 rows=1 width=8) (actual time=850.283..850.284 rows=1 loops=3)
               ->  Parallel Seq Scan on wt  (cost=0.00..130953.77 rows=178 width=4) (actual time=0.071..223.338 rows=147 loops=3)
 Planning Time: 0.158 ms
 JIT:
   Functions: 6
   Generation Time: 4.267 ms
   Inlining: false
   Inlining Time: 0.000 ms
   Optimization: false
   Optimization Time: 2.472 ms
   Emission Time: 15.929 ms
 Execution Time: 899.496 ms
(16 rows)

postgres=# set jit_tuple_deforming to off;
SET
postgres=# explain analyze select sum(c45) from wt;
                                                            QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=136120.69..136120.70 rows=1 width=8) (actual time=743.667..743.667 rows=1 loops=1)
   ->  Gather  (cost=136120.47..136120.68 rows=2 width=8) (actual time=743.654..743.657 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial Aggregate  (cost=135120.47..135120.48 rows=1 width=8) (actual time=715.532..715.533 rows=1 loops=3)
               ->  Parallel Seq Scan on wt  (cost=0.00..130953.77 rows=178 width=4) (actual time=0.067..216.245 rows=147 loops=3)
 Planning Time: 0.157 ms
 JIT:
   Functions: 4
   Generation Time: 1.989 ms
   Inlining: false
   Inlining Time: 0.000 ms
   Optimization: false
   Optimization Time: 0.449 ms
   Emission Time: 7.254 ms
 Execution Time: 761.549 ms
(16 rows)

When jit_tuple_deforming is enabled, the query is about 100 ms slower,
which looks like a performance regression.





> Regards
>
> Pavel
>


effect of JIT tuple deform?

2018-06-23 Thread Pavel Stehule
Hi

I am trying to measure the effect of JIT tuple deforming and I don't see
any effect.

Is this feature active in the master branch?

Is it possible to see this feature in EXPLAIN ANALYZE?

Regards

Pavel