Re: Copy function for logical replication slots

2018-08-13 Thread Masahiko Sawada
On Thu, Jul 12, 2018 at 9:28 PM, Masahiko Sawada  wrote:
> On Mon, Jul 9, 2018 at 10:34 AM, Michael Paquier  wrote:
>> On Mon, Jul 09, 2018 at 10:06:00AM +0900, Masahiko Sawada wrote:
>>> I think that this patch might be split, but I will be able to send
>>> an updated patch next week. As you suggested, this patch needs
>>> more careful thought. I'll move this patch to the next commit fest if
>>> I'm not able to send it in time. Is that okay?
>>
>> Fine by me.  Thanks for the update.
>
> Attached is a new version of the patch incorporating all the comments I got
> from Michael-san.
>
> To prevent the WAL segment at the origin slot's restart_lsn from being
> removed while the target slot is created, I've chosen to copy the new
> slot while still holding the origin slot. One problem with implementing
> it this way is that the current replication slot code doesn't allow us
> to do so straightforwardly; the code assumes that the process creating
> a new slot is not holding another slot. So I've changed the copy
> functions so that they temporarily save MyReplicationSlot and then
> restore and release it after creating the target slot. To ensure that
> both the origin and target slots are released properly in failure cases
> I used PG_ENSURE_ERROR_CLEANUP(). That way, we can keep the changes to
> the logical decoding code to a minimum. I considered changing the
> logical decoding code so that it can assume that a process may hold
> more than one slot at once, but that seems like overkill to me.
>
> Please review it.
>

The previous patch conflicts with current HEAD. Attached is an updated
version of the patch.
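
For illustration, here is a minimal sketch of the save/restore pattern
described above (not the actual patch: the helper copy_logical_slot_guts()
and the cleanup callback are made-up names, and the two-argument
ReplicationSlotAcquire() signature is assumed):

    /* Hypothetical cleanup callback: release whichever slots are still held. */
    static void
    copy_slot_cleanup(int code, Datum arg)
    {
        ReplicationSlot *origin = (ReplicationSlot *) DatumGetPointer(arg);

        if (MyReplicationSlot != NULL)
            ReplicationSlotRelease();       /* partially set-up target slot */
        MyReplicationSlot = origin;
        ReplicationSlotRelease();           /* the origin slot */
    }

    static void
    copy_slot_internal(const char *src_name, const char *dst_name)
    {
        ReplicationSlot *origin;

        ReplicationSlotAcquire(src_name, true); /* hold the origin slot */
        origin = MyReplicationSlot;

        PG_ENSURE_ERROR_CLEANUP(copy_slot_cleanup, PointerGetDatum(origin));
        {
            MyReplicationSlot = NULL;       /* allow creating another slot */
            copy_logical_slot_guts(origin, dst_name);
            ReplicationSlotRelease();       /* release the new target slot */
        }
        PG_END_ENSURE_ERROR_CLEANUP(copy_slot_cleanup, PointerGetDatum(origin));

        MyReplicationSlot = origin;         /* restore the origin slot ... */
        ReplicationSlotRelease();           /* ... and release it */
    }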

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


v5-0001-Copy-function-for-logical-and-physical-replicatio.patch
Description: Binary data


Re: Upper limit arguments of pg_logical_slot_xxx_changes functions accept invalid values

2018-08-13 Thread Masahiko Sawada
On Sat, Jul 28, 2018 at 2:00 AM, Robert Haas  wrote:
> On Wed, Jul 11, 2018 at 8:58 PM, Masahiko Sawada  
> wrote:
>> While reading the replication slot code, I found an incorrect assignment
>> in the pg_logical_slot_get_changes_guts() function:
>>
>>     if (PG_ARGISNULL(2))
>>         upto_nchanges = InvalidXLogRecPtr;
>>     else
>>         upto_nchanges = PG_GETARG_INT32(2);
>>
>> Since upto_nchanges is an integer value, we should set it to 0 (meaning
>> unlimited) instead of InvalidXLogRecPtr. Because InvalidXLogRecPtr is
>> actually 0, the function happens to work correctly so far.
>
> If somebody changes InvalidXLogRecPtr to (uint64)-1, then it breaks as
> the code is written.  On the other hand, if somebody reverted
> 0ab9d1c4b31622e9176472b4276f3e9831e3d6ba, it would keep working as
> written but break under your proposal.

I might be missing something, but I think setting it to either 0 or a
negative value solves this problem. Since upto_nchanges is an int32, the
code would not even compile if somebody reverted
0ab9d1c4b31622e9176472b4276f3e9831e3d6ba.
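
For clarity, the change I have in mind is simply (a sketch):

    if (PG_ARGISNULL(2))
        upto_nchanges = 0;          /* 0 already means "no limit" here */
    else
        upto_nchanges = PG_GETARG_INT32(2);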

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: NetBSD vs libxml2

2018-08-13 Thread Nico Williams
On Mon, Aug 13, 2018 at 11:16:23AM -0400, Tom Lane wrote:
> Well, the issue is that new kinds of switches introduce new potential
> for bugs.  In the case of -Wl,-R..., I'm not even sure that you can
> write that more than once per link, so absorbing one from xml2-config
> might well break things on some platforms.  Or, if you can write more
> than one, we'd need to make sure they end up in a desirable order.

I forgot to address this.  You're absolutely right that -Wl,-R options
could break things.  E.g., if you have dependency -lA in one location
and -lB in another, but the second location also has a libA.so that
doesn't match the one used at link-edit time.

The problem here is that ELF, the linker-editor, and the RTLD, don't
really have a way of keeping things together that they should keep
together.

In practice this usually doesn't cause serious problems that users
cannot work around...  For NetBSD in particular this should not cause
problems because NetBSD (presumably!) would not put a libA.so in
/usr/lib and in their ports.

Question: why even separate -lxml2 and the -L and -Wl,-R options into
LIBS and LDFLAGS?  Why not put all of them into LIBS and not LDFLAGS?
What would break (aside from the runpath confusion issue)?

After all, why should a user's use of LDFLAGS on the ./configure step stomp
all over xml2-config's --libs?

With ./configure just doing:

  LIBS="$LIBS `$XML2_CONFIG --libs`"

and a libxml2 built and installed into /tmp/xpg/, with xml2-config
modified to include -Wl,-rpath=/tmp/xpg, this works correctly:

$ ldd ./src/backend/postgres|grep xml2
libxml2.so.2 => /tmp/xpg/lib/libxml2.so.2 (0x7fc9ebfd1000)
$ readelf -a ./src/backend/postgres|grep xpg
 0x001d (RUNPATH)            Library runpath: [/opt/pg/lib:/tmp/xpg/lib]
$ 

And while we're here, why is configure parsing the output of xml2-config
--cflags??  That too seems to assume that there will never be any C
compiler options to absorb other than -I or -D.  That at least seems
like a safer assumption.

The -l/-L/-R mess is... a mess.  It's a shortcoming of ELF, and of the
linker-editors, and the run-time linker-loaders.

Solaris/Illumos has something called "direct linking", whereby ELF
objects record for each symbol which dependency should provide it, and
this speeds up run-time linking and loading significantly.  But that
seems insufficiently advanced: ELF should also record for each
dependency where to look for it, not just a one-size-fits-all rpath.
And the CLI for the linker-editor should reflect such a state of affairs
somehow (e.g., ld ... -lxml2:L/opt/foo/lib:R/opt/foo/lib).  The linker-
editor projects move awfully slowly.

Nico
-- 



Re: Undocumented(?) limits on regexp functions

2018-08-13 Thread Tom Lane
Andrew Gierth  writes:
> All the regexp functions blow up with "invalid memory alloc request"
> errors when the input string exceeds 256MB in length. This restriction
> does not seem to be documented anywhere that I could see.

> (Also for regexp_split* and regexp_matches, there's a limit of 64M total
> matches, which also doesn't seem to be documented anywhere).

> Should these limits:

> a) be removed

Doubt it --- we could use the "huge" request variants, maybe, but
I wonder whether the engine could run fast enough that you'd want to.

> c) have better error messages?

+1 for that, though.

regards, tom lane



Undocumented(?) limits on regexp functions

2018-08-13 Thread Andrew Gierth
All the regexp functions blow up with "invalid memory alloc request"
errors when the input string exceeds 256MB in length. This restriction
does not seem to be documented anywhere that I could see.

(Also for regexp_split* and regexp_matches, there's a limit of 64M total
matches, which also doesn't seem to be documented anywhere).

Should these limits:

a) be removed

b) be documented

c) have better error messages?

-- 
Andrew (irc:RhodiumToad)



Pre-v11 appearances of the word "procedure" in v11 docs

2018-08-13 Thread Peter Geoghegan
I noticed that the word "procedure" appears in at least a few places
in the v11 docs as a not-quite-apt synonym of "function". For example,
https://www.postgresql.org/docs/11/static/plpgsql-trigger.html talks
about "PL/pgSQL Trigger Procedures" which are actually all functions
in practice.

I think that this has the potential to confuse users, since, of
course, a function is a distinct variety of object from a procedure in
v11. Tightening up the wording seems like a good idea. I'm not sure if
this was discussed already, but didn't find anything during a quick
search.

-- 
Peter Geoghegan



Re: [HACKERS] pgbench - allow to store select results into variables

2018-08-13 Thread Fabien COELHO


Hello Andrew,

Attached is v18, another basic rebase after some perl automatic 
reindentation.


This patch contains CRLF line endings


Alas, not according to "file" nor "hexdump" (only 0A, no 0D) on my local 
version, AFAICS.


What happens on the path and what is done by mail clients depending on the 
mime type is another question (eg text/x-diff or text/plain).



- and in any case it doesn't apply any more. Please fix those things.


Here is the new generated version, v19, that I just tested on my linux 
ubuntu bionic laptop:


  sh> git checkout -b test master
  Switched to a new branch 'test'

  sh> cksum ~/pgbench-into-19.patch
  3375461661 26024 ~/pgbench-into-19.patch

  sh> hexdump ~/pgbench-into-19.patch
  000 6964  2d20 672d 7469 6120 642f 636f
  010 732f 6372 732f 6d67 2f6c 6572 2f66 6770
  020 6562 636e 2e68 6773 6c6d 6220 642f 636f
  030 732f 6372 732f 6d67 2f6c 6572 2f66 6770
  040 6562 636e 2e68 6773 6c6d 690a 646e 
  # no 0d in front of 0a ^^

  sh> git apply ~/pgbench-into-19.patch

  sh>  git status
  On branch test ...
modified:   doc/src/sgml/ref/pgbench.sgml
modified:   src/bin/pgbench/pgbench.c
modified:   src/bin/pgbench/pgbench.h
modified:   src/bin/pgbench/t/001_pgbench_with_server.pl
modified:   src/fe_utils/psqlscan.l
modified:   src/include/fe_utils/psqlscan_int.h

--
Fabien.

diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 88cf8b3933..946f08005d 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -942,6 +942,51 @@ pgbench  options 
 d
   
 
   
+   
+
+ \cset [prefix] or
+ \gset [prefix]
+
+
+
+ 
+  These commands may be used to end SQL queries, replacing a semicolon.
+  \cset replaces an embedded semicolon 
(\;) within
+  a compound SQL command, and \gset replaces a final
+  (;) semicolon which ends the SQL command. 
+ 
+
+ 
+  When these commands are used, the preceding SQL query is expected to
+  return one row, the columns of which are stored into variables named 
after
+  column names, and prefixed with prefix if 
provided.
+ 
+
+ 
+  The following example puts the final account balance from the first query
+  into variable abalance, and fills variables
+  one, two and
+  p_three with integers from a compound query.
+
+UPDATE pgbench_accounts
+  SET abalance = abalance + :delta
+  WHERE aid = :aid
+  RETURNING abalance \gset
+-- compound of two queries
+SELECT 1 AS one, 2 AS two \cset
+SELECT 3 AS three \gset p_
+
+ 
+
+ 
+  
+\cset and \gset commands do not 
work when
+empty SQL queries appear within a compound SQL command.
+  
+ 
+
+   
+

 \if expression
 \elif expression
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 41b756c089..4c2c263db4 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -438,12 +438,15 @@ static const char *QUERYMODE[] = {"simple", "extended", 
"prepared"};
 
 typedef struct
 {
-   char   *line;   /* text of command line */
+   char   *first_line; /* first line for short display */
+   char   *lines;  /* full multi-line text of 
command */
int command_num;/* unique index of this Command 
struct */
int type;   /* command type 
(SQL_COMMAND or META_COMMAND) */
MetaCommand meta;   /* meta command identifier, or 
META_NONE */
int argc;   /* number of command 
words */
char   *argv[MAX_ARGS]; /* command word list */
+   int compound;   /* last compound command 
(number of \;) */
+   char  **gset;   /* per-compound command prefix */
PgBenchExpr *expr;  /* parsed expression, if needed 
*/
SimpleStats stats;  /* time spent in this command */
 } Command;
@@ -1596,6 +1599,107 @@ valueTruth(PgBenchValue *pval)
}
 }
 
+/* read all responses from backend, storing into variable or discarding */
+static bool
+read_response(CState *st, char **gset)
+{
+   PGresult   *res;
+   int compound = 0;
+
+   while ((res = PQgetResult(st->con)) != NULL)
+   {
+   switch (PQresultStatus(res))
+   {
+   case PGRES_COMMAND_OK: /* non-SELECT commands */
+   case PGRES_EMPTY_QUERY: /* may be used for testing 
no-op overhead */
+   if (gset[compound] != NULL)
+   {
+   fprintf(stderr,
+   "client %d file %d 
command %d compound %d: "
+ 

Re: libpq should append auth failures, not overwrite

2018-08-13 Thread Tom Lane
Fabien COELHO  writes:
>> As I explained in my comments, the reason I did not do these things
>> is that I didn't want to change the output for cases in which just a
>> single host name is given.  I still don't.

> Ok, I get your argument when there is just one target server (cluster), 
> which is probably at least 99.9% of the use case in practice.
> However, ISTM multiple ip & multiple hostname look pretty close.
> I guess that the number of people that use these features is small, but 
> for them, when things go wrong, better messages are useful... so I would 
> see no harm in triggering the server introductory line when there are 
> multiple servers, whatever the reason why there are multiple servers.

The thing is that there are a *lot* of systems nowadays on which localhost
maps to both ipv4 and ipv6, so that "host=localhost" would be enough to
trigger the new behavior, and I think that would inevitably give rise to
more complaints than kudos.  So I'm still of the opinion that we're better
off with the second patch's behavior as it stands.

Responding to your other points from upthread:

> There are still 86 instances of "printfPQExpBuffer", which seems like a 
> lot. They are mostly OOM messages, but not only. This makes checking the 
> patch quite cumbersome.
> I'd rather there would be only one rule, and clear reset with a comment 
> when needed (eg "fe-lobj.c").

Well, I did this for discussion's sake, but I don't really find the
resulting changes in fe-lobj.c to be any sort of improvement.  In
particular, the messiness around suppressing extraneous complaints from
lo_close is still there, if anything worse than before (it's hard to
get rid of because lo_close will reset the buffer).  I'd rather just
leave fe-lobj.c alone, possibly adding a header comment explaining why
it's different.  Which is basically because none of those functions
expect to be appending multiple errors; they're all gonna quit after
the first one.

> I agree with your suggestion of adding a function trying to preserve 
> messages on OOM errors if possible. Maybe "appendPQExpBufferOOM"?

Yeah, done.

> It is unclear to me why the "printf" variant is used in 
> "PQencryptPasswordConn" and "connectOptions2", it looks that you skipped 
> these functions.

I did because it seemed like unnecessary extra diff content, since
we're expecting the error buffer to be initially empty anyway (for
connectOptions2) or we'd just need an extra reset (for
PQencryptPasswordConn).  In the attached I went ahead and made those
changes, but again I'm unsure that it's really a net improvement.

> In passing, some OOM messages are rather terse: "out of memory\n", others 
> are more precise: "out of memory allocating GSSAPI buffer (%d)\n".

I got rid of the latter; I think they're unnecessary translation burdens,
and it'd be hard to fit them into a one-size-fits-all OOM reporting
subroutine, and it's pretty unclear what's the point of special-casing
them anyway.

> I do not like the resulting output
> server:
> problem found
>more details
> I'd rather have
> server: problem found
>more details

Done in the attached, although I'm concerned about the resulting effects
for error message line length --- in my tests, lines that used to reliably
fit in 80 columns don't anymore.  Again, I'm not really convinced this is
an improvement.  It would absolutely *not* be an improvement if we tried
to also shoehorn the server's IP address in there; that'd make the lines
way too long, with way too much stuff before the important part.

regards, tom lane

diff --git a/src/interfaces/libpq/fe-auth-scram.c b/src/interfaces/libpq/fe-auth-scram.c
index 1d9c937..e7110d5 100644
--- a/src/interfaces/libpq/fe-auth-scram.c
+++ b/src/interfaces/libpq/fe-auth-scram.c
@@ -185,14 +185,14 @@ pg_fe_scram_exchange(void *opaq, char *input, int inputlen,
 	{
 		if (inputlen == 0)
 		{
-			printfPQExpBuffer(>errorMessage,
-			  libpq_gettext("malformed SCRAM message (empty message)\n"));
+			appendPQExpBufferStr(>errorMessage,
+ libpq_gettext("malformed SCRAM message (empty message)\n"));
 			goto error;
 		}
 		if (inputlen != strlen(input))
 		{
-			printfPQExpBuffer(>errorMessage,
-			  libpq_gettext("malformed SCRAM message (length mismatch)\n"));
+			appendPQExpBufferStr(>errorMessage,
+ libpq_gettext("malformed SCRAM message (length mismatch)\n"));
 			goto error;
 		}
 	}
@@ -240,17 +240,17 @@ pg_fe_scram_exchange(void *opaq, char *input, int inputlen,
 			else
 			{
 *success = false;
-printfPQExpBuffer(>errorMessage,
-  libpq_gettext("incorrect server signature\n"));
+appendPQExpBufferStr(>errorMessage,
+	 libpq_gettext("incorrect server signature\n"));
 			}
 			*done = true;
 			state->state = FE_SCRAM_FINISHED;
 			break;
 
 		default:
-			/* shouldn't happen */
-			printfPQExpBuffer(>errorMessage,
-			  libpq_gettext("invalid SCRAM exchange state\n"));
+			/* shouldn't happen, so we don't 

Re: [HACKERS] pgbench - allow to store select results into variables

2018-08-13 Thread Andrew Dunstan




On 04/27/2018 12:28 PM, Fabien COELHO wrote:


Hello Stephen,

Attached is v18, another basic rebase after some perl automatic 
reindentation.





This patch contains CRLF line endings - and in any case it doesn't apply 
any more. Please fix those things.


cheers

andrew

--
Andrew Dunstan                 https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: FailedAssertion on partprune

2018-08-13 Thread Robert Haas
On Sat, Aug 11, 2018 at 9:16 AM, David Rowley
 wrote:
> I started debugging this to see where things go wrong. I discovered
> that add_paths_to_append_rel() is called yet again from
> apply_scanjoin_target_to_paths() and this is where it's all going
> wrong. The problem is that the gather paths have been tagged onto the
> partial paths by this time, so accumulate_append_subpath() has no code
> to look through those to fetch the Append/MergeAppend subpaths, so it
> just appends the entire path to the subpaths List.  This all worked
> before 96030f9a481. This commit moved where generate_gather_paths() is
> called.

I'm baffled as to why looking through Gather to find
Append/MergeAppend subpaths would ever be a sane thing to do.

> I'm having a hard time understanding why we need to call
> add_paths_to_append_rel() from apply_scanjoin_target_to_paths().  The
> name of the function does not seem to imply that we'd do this. The
> comment just explains "what" rather than "why" making it a bit
> useless.
>
> /* Build new paths for this relation by appending child paths. */
> if (live_children != NIL)
> add_paths_to_append_rel(root, rel, live_children);
>
> I also think that accumulate_append_subpath() is a bit of a fragile
> way of flattening the append rel's paths, but I feel it's probably
> apply_scanjoin_target_to_paths() that's at fault here.

In my original design, apply_scanjoin_target_to_paths() -- or whatever
I called it in the original patch versions -- built an entirely new
RelOptInfo, so that we ended up with one RelOptInfo representing the
unvarnished result of scan/join planning, and a second one
representing the result of applying the scan/join target to that
result.  However, Amit Kapila was able to demonstrate that this caused
a small but noticeable regression in planning speed, which seemed like
it might plausibly cause some issues for people running very simple
queries.

In that original design, if the topmost scan/join rel was partitioned,
the new "tlist" upper rel was also partitioned, and in the same way.
In short, this was a kind of "partitionwise tlist" feature.  For each
child of the topmost scan/join rel, we built a child "tlist" rel,
which got the same paths but with the correct tlist applied to each
one.  The path for the toplevel "tlist" upper rel could then be built
by appending the children from each child "tlist" rel, or else by the
usual method of sticking a Result node over the cheapest path for the
topmost scan/join rel.  In general, the first method is superior.
Instead of a plan like Result -> Append -> (N children each producing
the unprojected tlist), you get Append -> (N children each producing
the projected tlist), which is cheaper.

To avoid the performance overhead of creating a whole new tree of
upper rels, I rewrote the code so that it modifies the RelOptInfos in
place.  That unfortunately makes the logic a bit more confusing, and
it sounds from your remarks like I didn't comment it as clearly as
might have been possible.  But the basic idea is the same: we want the
projection to be passed down to the child nodes rather than getting a
Result node on top.  The commit that did most of this --
11cf92f6e2e13c0a6e3f98be3e629e6bd90b74d5 -- also solved a few other
problems, as noted in the commit message, so I don't think we want to
rip it out wholesale.  There might be better ways to do some of it,
though.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Some pgq table rewrite incompatibility with logical decoding?

2018-08-13 Thread Jeremy Finzel
>
> I don't think that's true, for two reasons.
>
> Firstly, I don't think pgq updates catalogs directly, it simply truncates
> the queue tables when rotating them (which updates the relfilenode in
> pg_class, of course).
>
> Secondly, we're occasionally seeing this on systems that do not use pgq,
> but that do VACUUM FULL on custom "queue" tables. The symptoms are exactly
> the same ("ERROR: could not map filenode"). It's pretty damn rare and we
> don't have direct access to the systems, so investigation is difficult :-(
> Our current hypothesis is that it's somewhat related to subtransactions
> (because of triggers with exception blocks).
>
> Jeremy, are you able to reproduce the issue locally, using pgq? That would
> be very valuable.


We have tried but have been unable to reproduce it.  If we do encounter it
again, we will plan on reporting back and seeing if we can do some deep
debugging.

Thanks,
Jeremy


Re: libpq compression

2018-08-13 Thread Andrew Dunstan




On 08/13/2018 02:47 PM, Robert Haas wrote:

On Fri, Aug 10, 2018 at 5:55 PM, Andrew Dunstan
 wrote:

This thread appears to have gone quiet. What concerns me is that there
appears to be substantial disagreement between the author and the reviewers.
Since the last thing was this new patch it should really have been put back
into "needs review" (my fault to some extent - I missed that). So rather
than return the patch with feedfack I'm going to set it to "needs review"
and move it to the next CF. However, if we can't arrive at a consensus about
the direction during the next CF it should be returned with feedback.

I agree with the critiques from Robbie Harwood and Michael Paquier
that the way in that compression is being hooked into the existing
architecture looks like a kludge.  I'm not sure I know exactly how it
should be done, but the current approach doesn't look natural; it
looks like it was bolted on.  I agree with the critique from Peter
Eisentraut and others that we should not go around and add -Z as a
command-line option to all of our tools; this feature hasn't yet
proved itself to be useful enough to justify that.  Better to let
people specify it via a connection string option if they want it.  I
think Thomas Munro was right to ask about what will happen when
different compression libraries are in use, and I think failing
uncleanly is quite unacceptable.  I think there needs to be some
method for negotiating the compression type; the client can, for
example, provide a comma-separated list of methods it supports in
preference order, and the server can pick the first one it likes.  In
short, I think that a number of people have provided really good
feedback on this patch, and I suggest to Konstantin that he should
consider accepting all of those suggestions.

Commit ae65f6066dc3d19a55f4fdcd3b30003c5ad8dbed tried to introduce
some facilities that can be used for protocol version negotiation as
new features are added, but this patch doesn't use them.  It looks to
me like it instead just breaks backward compatibility.  The new
'compression' option won't be recognized by older servers.  If it were
spelled '_pq_.compression' then the facilities in that commit would
cause a NegotiateProtocolVersion message to be sent by servers which
have that commit but don't support the compression option.  I'm not
exactly sure what will happen on even-older servers that don't have
that commit either, but I hope they'll treat it as a GUC name; note
that setting an unknown GUC name with a namespace prefix is not an
error, but setting one without such a prefix IS an ERROR.  Servers
which do support compression would respond with a message indicating
that compression had been enabled or, maybe, just start sending back
compressed-packet messages, if we go with including some framing in
the libpq protocol.




Excellent summary, and well argued recommendations, thanks. I've changed 
the status to waiting on author.


cheers

andrew

--
Andrew Dunstan                 https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: libpq should not look up all host addresses at once

2018-08-13 Thread Tom Lane
Fabien COELHO  writes:
> Patch compiles, global "make check" ok, although I'm unsure whether the
> feature is actually tested somewhere. I think not:-(

Yeah, it's hard to test this stuff without either opening up security
hazards or making unwarranted assumptions about the local network setup.
I think that the standard regression tests only use Unix-socket
communication (except on Windows) for exactly that reason, and that makes
it hard to do anything much about regression-testing this feature.

> As you noted in another message, a small doc update should be needed.

Check.  Proposed doc patch attached.  (Only the last hunk is actually
specific to this patch, the rest is cleanup that I noticed while looking
around for possibly-relevant text.)

> I'd consider wrapping some of the logic. I'd check the port first, then 
> move the host resolution stuff into a function.

Don't really see the value of either ...

regards, tom lane

diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index 80e55f5..7c150f3 100644
*** a/doc/src/sgml/libpq.sgml
--- b/doc/src/sgml/libpq.sgml
*** PostgresPollingStatusType PQconnectPoll(
*** 303,311 
 
  
   
!   The hostaddr and host parameters are used appropriately to ensure that
!   name and reverse name queries are not made. See the documentation of
!   these parameters in  for details.
   
  
  
--- 303,311 
 
  
   
!   The hostaddr parameter must be used appropriately
!   to prevent DNS queries from being made.  See the documentation of
!   this parameter in  for details.
   
  
  
*** PostgresPollingStatusType PQconnectPoll(
*** 318,324 
  
  
   
!   You ensure that the socket is in the appropriate state
before calling PQconnectPoll, as described below.
   
  
--- 318,324 
  
  
   
!   You must ensure that the socket is in the appropriate state
before calling PQconnectPoll, as described below.
   
  
*** PostgresPollingStatusType PQconnectPoll(
*** 326,349 

  

!Note: use of PQconnectStartParams is analogous to
!PQconnectStart shown below.
!   
! 
!   
!To begin a nonblocking connection request, call conn = PQconnectStart("connection_info_string").
!If conn is null, then libpq has been unable to allocate a new PGconn
!structure. Otherwise, a valid PGconn pointer is returned (though not yet
!representing a valid connection to the database). On return from
!PQconnectStart, call status = PQstatus(conn). If status equals
!CONNECTION_BAD, PQconnectStart has failed.

  

!If PQconnectStart succeeds, the next stage is to poll
!libpq so that it can proceed with the connection sequence.
 Use PQsocket(conn) to obtain the descriptor of the
 socket underlying the database connection.
 Loop thus: If PQconnectPoll(conn) last returned
 PGRES_POLLING_READING, wait until the socket is ready to
 read (as indicated by select(), poll(), or
--- 326,352 

  

!To begin a nonblocking connection request,
!call PQconnectStart
!or PQconnectStartParams.  If the result is null,
!then libpq has been unable to allocate a
!new PGconn structure.  Otherwise, a
!valid PGconn pointer is returned (though not
!yet representing a valid connection to the database).  Next
!call PQstatus(conn).  If the result
!is CONNECTION_BAD, the connection attempt has already
!failed, typically because of invalid connection parameters.

  

!If PQconnectStart
!or PQconnectStartParams succeeds, the next stage
!is to poll libpq so that it can proceed with
!the connection sequence.
 Use PQsocket(conn) to obtain the descriptor of the
 socket underlying the database connection.
+(Caution: do not assume that the socket remains the same
+across PQconnectPoll calls.)
 Loop thus: If PQconnectPoll(conn) last returned
 PGRES_POLLING_READING, wait until the socket is ready to
 read (as indicated by select(), poll(), or
*** PostgresPollingStatusType PQconnectPoll(
*** 352,360 
 Conversely, if PQconnectPoll(conn) last returned
 PGRES_POLLING_WRITING, wait until the socket is ready
 to write, then call PQconnectPoll(conn) again.
!If you have yet to call
!PQconnectPoll, i.e., just after the call to
!PQconnectStart, behave as if it last returned
 PGRES_POLLING_WRITING.  Continue this loop until
 PQconnectPoll(conn) returns
 PGRES_POLLING_FAILED, indicating the connection 

Re: FailedAssertion on partprune

2018-08-13 Thread Robert Haas
On Wed, Aug 8, 2018 at 11:33 PM, Tom Lane  wrote:
> Oh ... never mind that last.  The parent Append will run its children
> sequentially, so that the Gathers execute one at a time, and at no
> point will more than 2 workers be active.

Yep.

> Nonetheless, it's a damfool query plan, because we'll be going through
> parallel worker startup/shutdown overhead 4 times for no benefit.
> We really should put in some kind of logic to force those sibling
> Gathers to be aggregated into one, or else disallow them altogether.

Disallowing them could be a big performance regression.  Combining
them is reasonable, but it's not clear to me how you'd fit that into
the planner structure.  There's no convenient RelOptInfo to associate
with the aggregated-Gather.  It can't use the parent RelOptInfo unless
all of the children have a partial path, and in that case this plan
never would have been generated in the first place.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: libpq compression

2018-08-13 Thread Robert Haas
On Fri, Aug 10, 2018 at 5:55 PM, Andrew Dunstan
 wrote:
> This thread appears to have gone quiet. What concerns me is that there
> appears to be substantial disagreement between the author and the reviewers.
> Since the last thing was this new patch it should really have been put back
> into "needs review" (my fault to some extent - I missed that). So rather
> than return the patch with feedback I'm going to set it to "needs review"
> and move it to the next CF. However, if we can't arrive at a consensus about
> the direction during the next CF it should be returned with feedback.

I agree with the critiques from Robbie Harwood and Michael Paquier
that the way in that compression is being hooked into the existing
architecture looks like a kludge.  I'm not sure I know exactly how it
should be done, but the current approach doesn't look natural; it
looks like it was bolted on.  I agree with the critique from Peter
Eisentraut and others that we should not go around and add -Z as a
command-line option to all of our tools; this feature hasn't yet
proved itself to be useful enough to justify that.  Better to let
people specify it via a connection string option if they want it.  I
think Thomas Munro was right to ask about what will happen when
different compression libraries are in use, and I think failing
uncleanly is quite unacceptable.  I think there needs to be some
method for negotiating the compression type; the client can, for
example, provide a comma-separated list of methods it supports in
preference order, and the server can pick the first one it likes.  In
short, I think that a number of people have provided really good
feedback on this patch, and I suggest to Konstantin that he should
consider accepting all of those suggestions.

Commit ae65f6066dc3d19a55f4fdcd3b30003c5ad8dbed tried to introduce
some facilities that can be used for protocol version negotiation as
new features are added, but this patch doesn't use them.  It looks to
me like it instead just breaks backward compatibility.  The new
'compression' option won't be recognized by older servers.  If it were
spelled '_pq_.compression' then the facilities in that commit would
cause a NegotiateProtocolVersion message to be sent by servers which
have that commit but don't support the compression option.  I'm not
exactly sure what will happen on even-older servers that don't have
that commit either, but I hope they'll treat it as a GUC name; note
that setting an unknown GUC name with a namespace prefix is not an
error, but setting one without such a prefix IS an ERROR.  Servers
which do support compression would respond with a message indicating
that compression had been enabled or, maybe, just start sending back
compressed-packet messages, if we go with including some framing in
the libpq protocol.
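
Just to illustrate the kind of negotiation I have in mind (this is not
libpq or backend code; the function and data structures are made up), the
server could walk the client's comma-separated preference list and take
the first entry it also supports:

    /* Illustration only: pick the first client-preferred method the server
     * also supports; returns NULL if there is no common method. */
    static const char *
    choose_compression_method(const char *client_list,
                              const char *const server_methods[],
                              int n_server_methods)
    {
        const char *p = client_list;

        while (*p != '\0')
        {
            const char *end = strchr(p, ',');
            size_t      len = end ? (size_t) (end - p) : strlen(p);
            int         i;

            for (i = 0; i < n_server_methods; i++)
            {
                if (strlen(server_methods[i]) == len &&
                    strncmp(server_methods[i], p, len) == 0)
                    return server_methods[i];
            }
            p = end ? end + 1 : p + len;
        }
        return NULL;    /* no mutually supported method: don't fail uncleanly */
    }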

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Improve behavior of concurrent TRUNCATE

2018-08-13 Thread Robert Haas
On Fri, Aug 10, 2018 at 5:05 PM, Alvaro Herrera
 wrote:
> I was actually thinking in applying to all back-branches, not just pg11,
> considering it a fix for a pretty serious bug.  But checking the
> history, it seems that Robert patched this in 9.2 as new development
> (2ad36c4e4, 1489e2f26, cbe24a6dd, 1da5c1195, 74a1d4fe7); holes remained,
> but none was patched until 94da2a6a in pg10 -- took some time!  And then
> nobody cared about the ones you're patching now.
>
> So I withdraw my argumentation, mostly because there's clearly not as
> much interest in seeing this fixed as all that.

The original patches would, I think, have been pretty scary to
back-patch, since the infrastructure didn't exist in older branches
and we were churning a fairly large amount of code.  Now that most
places are fixed and things have had five years to bake, we could
conceivably back-patch the remaining fixes.  However, I wonder if
we've really looked into how many instances of this problem remain.
If there's still ten more that we haven't bothered to fix,
back-patching one or two that we've gotten around to doing something
about doesn't make a lot of sense to me.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: libpq connection timeout mismanagement

2018-08-13 Thread Robert Haas
On Thu, Aug 9, 2018 at 11:23 AM, Tom Lane  wrote:
> The patch that taught libpq about allowing multiple target hosts
> modified connectDBComplete() with the intent of making the
> connect_timeout (if specified) apply per-host, not to the complete
> connection attempt.  It did not do a very good job though, because
> the timeout only gets reset when connectDBComplete() itself detects
> a timeout.  If PQconnectPoll advances to a new host due to some
> other cause, the previous host's timeout continues to run, possibly
> causing a premature timeout failure for the new one.

Oops.

> Another thing that I find pretty strange is that it is coded so that,
> in event of a timeout detection by connectDBComplete, we give up on the
> current connhost entry and advance to the next host, ignoring any
> additional addresses we might have for the current hostname.  This seems
> at best poorly thought through.  There's no good reason for libpq to
> assume that all the addresses returned by DNS point at the same machine,
> or share the same network failure points in between.

Hmm, well, that was deliberate, but maybe ill-advised.  I guess I was
worried about making users wait for a long time to no gain, but your
points are valid, too.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: libpq connection timeout mismanagement

2018-08-13 Thread Tom Lane
Fabien COELHO  writes:
> Patch does not "git apply", but is ok with "patch -p1". Compiles.

Yeah, as far as I can tell, "git apply" is very intolerant.

> The code suggests that timeout is always 2 or more
>  if (timeout < 2) timeout = 2;
> but doc says "It is not recommended to use a timeout of less than 2 
> seconds", which is inconsistent. It should read "Timeout is always set to 
> at least 2 whatever the user asks.".

Agreed, changed the docs to clarify this.

> connect_timeout=2.9 is accepted and considered as meaning 2. 
> connect_timeout=-10 or connect_timeout=two are also accepted and mean 
> forever. Probably thanks to "atoi".

Right.  As before, I'm not excited about rejecting trailing junk,
considering we never did before.  It is kind of bogus that the
resolution of the timeout is only 1 second though --- maybe in
2002 that was fine, but today it seems strange.  I'm tempted to
change the parameter to be floating point (parse with atof, say)
and allow millisecond precision.  Then we'd not really need to
have the restriction about minimum delay.
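
Roughly what I have in mind is just this kind of sketch (not a patch;
error handling omitted, and it assumes the raw parameter string is in
conn->connect_timeout):

    /* Parse connect_timeout with fractional seconds and convert to
     * milliseconds, instead of whole seconds via atoi(). */
    double      seconds = atof(conn->connect_timeout);
    int         timeout_ms;

    if (seconds <= 0)
        timeout_ms = -1;            /* zero, negative, or junk: wait forever */
    else
        timeout_ms = (int) (seconds * 1000.0);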

That's a task for a different patch though.  In the meantime,
pushed this one --- thanks for reviewing!

regards, tom lane



Re: Temporary tables prevent autovacuum, leading to XID wraparound

2018-08-13 Thread Alvaro Herrera
If writing an OID were not atomic, the assignment would be really
dangerous.  I don't think your proposed update is good.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Temporary tables prevent autovacuum, leading to XID wraparound

2018-08-13 Thread Michael Paquier
On Mon, Aug 13, 2018 at 02:56:16AM -0700, Andres Freund wrote:
> On 2018-08-09 18:50:47 +0200, Michael Paquier wrote:
> I don't think that comment, nor the comment that you ended up
> committing:
> +
> +   /*
> +* Reset the temporary namespace flag in MyProc.  We assume that
> +* this operation is atomic.  Even if it is not, the temporary
> +* table which created this namespace is still locked until this
> +* transaction aborts so it would not be visible yet, acting as a
> +* barrier.
> +*/
> 
> is actually correct. *Holding* a lock isn't a memory barrier. Acquring
> or releasing one is.

I cannot guess what you think, but would something like the attached be
more appropriate?  Both wordings look rather similar to me now, though
likely they do not to you.
--
Michael
diff --git a/src/backend/catalog/namespace.c b/src/backend/catalog/namespace.c
index 3971346e73..f0f692fe09 100644
--- a/src/backend/catalog/namespace.c
+++ b/src/backend/catalog/namespace.c
@@ -3937,9 +3937,11 @@ InitTempTableNamespace(void)
 	 * Mark MyProc as owning this namespace which other processes can use to
 	 * decide if a temporary namespace is in use or not.  We assume that
 	 * assignment of namespaceId is an atomic operation.  Even if it is not,
-	 * the temporary relation which resulted in the creation of this temporary
-	 * namespace is still locked until the current transaction commits, so it
-	 * would not be accessible yet, acting as a barrier.
+	 * the lock acquired on the temporary relation which created this
+	 * namespace serves as a memory barrier which guarantees a value of
+	 * tempNamespaceId recent enough.  All the contents of this namespace will
+	 * become visible once the current transaction commits, releasing the lock
+	 * previously acquired.
 	 */
 	MyProc->tempNamespaceId = namespaceId;
 
@@ -3976,10 +3978,12 @@ AtEOXact_Namespace(bool isCommit, bool parallel)
 
 			/*
 			 * Reset the temporary namespace flag in MyProc.  We assume that
-			 * this operation is atomic.  Even if it is not, the temporary
-			 * table which created this namespace is still locked until this
-			 * transaction aborts so it would not be visible yet, acting as a
-			 * barrier.
+			 * this operation is atomic.  Even if it is not, the lock acquired
+			 * on the temporary table which created this namespace serves as a
+			 * memory barrier which guarantees a value of tempNamespaceId
+			 * recent enough.  Nothing from this temporary namespace will be
+			 * visible as this transaction aborts, releasing also the
+			 * previously-acquired lock.
 			 */
 			MyProc->tempNamespaceId = InvalidOid;
 		}
@@ -4037,10 +4041,12 @@ AtEOSubXact_Namespace(bool isCommit, SubTransactionId mySubid,
 
 			/*
 			 * Reset the temporary namespace flag in MyProc.  We assume that
-			 * this operation is atomic.  Even if it is not, the temporary
-			 * table which created this namespace is still locked until this
-			 * transaction aborts so it would not be visible yet, acting as a
-			 * barrier.
+			 * this operation is atomic.  Even if it is not, the lock acquired
+			 * on the temporary table which created this namespace serves as a
+			 * memory barrier which guarantees a value of tempNamespaceId
+			 * recent enough.  Nothing from this temporary namespace will be
+			 * visible as this transaction aborts, releasing also the
+			 * previously-acquired lock.
 			 */
 			MyProc->tempNamespaceId = InvalidOid;
 		}


signature.asc
Description: PGP signature


Re: Temporary tables prevent autovacuum, leading to XID wraparound

2018-08-13 Thread Alvaro Herrera


From 1e9ba9fa9b172bda1ea54b1f3be1b930973ff45f Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Wed, 8 Aug 2018 19:45:31 +0200
Subject: [PATCH] Make autovacuum more aggressive to remove orphaned temp
 tables

I was here to complain about this piece:

> @@ -3975,6 +4033,15 @@ AtEOSubXact_Namespace(bool isCommit, SubTransactionId 
> mySubid,
>   myTempNamespace = InvalidOid;
>   myTempToastNamespace = InvalidOid;
>   baseSearchPathValid = false;/* need to rebuild list 
> */
> +
> + /*
> +  * Reset the temporary namespace flag in MyProc. The 
> creation of
> +  * the temporary namespace has failed for some reason 
> and should
> +  * not be seen by other processes as it has not been 
> committed
> +  * yet, hence this would be fine even if not atomic, 
> still we
> +  * assume that it is an atomic assignment.
> +  */
> + MyProc->tempNamespaceId = InvalidOid;
>   }
>   }

But after looking carefully, I realized that this only runs if the
subtransaction being aborted is the same that set up the temp namespace
(or one of its children) in the first place; therefore it's correct to
tear it down unconditionally.  I was afraid of this clobbering a temp
namespace that had been set up by some other branch of the transaction
tree.  However, that's not a concern because we compare the
subTransactionId (and update the value when propagating to parent
subxact.)

I do share Andres' concerns on the wording the comment.  I would say
something like

/*
 * Reset the temporary namespace flag in MyProc.  We assume this to be
 * an atomic assignment.
 *
 * Because this subtransaction is rolling back, the pg_namespace
 * row is not visible to anyone else anyway, but that doesn't matter:
 * it's not a problem if objects contained in this namespace are removed
 * concurrently.
 */

The fact of assignment being atomic and the fact of the pg_namespace row
being visible are separately important.  You care about it being atomic
because it means you must not have someone read "16" (0x10) when you
were partway through removing the value "65552" (0x10010), thus causing
that someone to remove namespace 16.  And you care about the visibility of the
pg_namespace row because of whether you're worried about a third party
removing the tables from that namespace or not: since the subxact is
aborting, you are not.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: ATTACH/DETACH PARTITION CONCURRENTLY

2018-08-13 Thread Robert Haas
On Sun, Aug 12, 2018 at 9:05 AM, David Rowley
 wrote:
> This is not a fully baked idea, but I'm wondering if a better way to
> do this, instead of having this PartitionIsValid macro to determine if
> the partition should be visible to the current transaction ID, we
> could, when we invalidate a relcache entry, send along the transaction
> ID that it's invalid from.  Other backends when they process the
> invalidation message they could wipe out the cache entry only if their
> xid is >= the invalidation's "xmax" value. Otherwise, just tag the
> xmax onto the cache somewhere and always check it before using the
> cache (perhaps make it part of the RelationIsValid macro).

Transactions don't necessarily commit in XID order, so this might be
an optimization to keep older transactions from having to do
unnecessary rebuilds -- which I actually doubt is a major problem, but
maybe I'm wrong -- but you can't rely solely on this as a way of
deciding which transactions will see the effects of some change.  If
transactions 100, 101, and 102 begin in that order, and transaction
101 commits, there's no principled justification for 102 seeing its
effects but 100 not seeing it.

> This would
> also require that we move away from SnapshotAny type catalogue scans
> in favour of MVCC scans so that backends populating their relcache
> build it based on their current xid.

I think this is a somewhat confused analysis.  We don't use
SnapshotAny for catalog scans, and we never have.  We used to use
SnapshotNow, and we now use a current MVCC snapshot.  What you're
talking about, I think, is possibly using the transaction snapshot
rather than a current MVCC snapshot for the catalog scans.

I've thought about similar things, but I think there's a pretty deep
can of worms.  For instance, if you built a relcache entry using the
transaction snapshot, you might end up building a seemingly-valid
relcache entry for a relation that has been dropped or rewritten.
When you try to access the relation data, you'll be attempt to access
a relfilenode that's not there any more.  Similarly, if you use an
older snapshot to build a partition descriptor, you might thing that
relation OID 12345 is still a partition of that table when in fact
it's been detached - and, maybe, altered in other ways, such as
changing column types.

It seems to me that overall you're not really focusing on the right
set of issues here.  I think the very first thing we need to worry
about is how we're going to keep the executor from following a bad
pointer and crashing.  Any time the partition descriptor changes, the
next relcache rebuild is going to replace rd_partdesc and free the old
one, but the executor may still have the old pointer cached in a
structure or local variable; the next attempt to dereference it will
be looking at freed memory, and kaboom.  Right now, we prevent this by
not allowing the partition descriptor to be modified while there are
any queries running against the partition, so while there may be a
rebuild, the old pointer will remain valid (cf. keep_partdesc).  I
think that whatever scheme you/we choose here should be tested with a
combination of CLOBBER_CACHE_ALWAYS and multiple concurrent sessions
-- one of them doing DDL on the table while the other runs a long
query.

Once we keep it from blowing up, the second question is what the
semantics are supposed to be.  It seems inconceivable to me that the
set of partitions that an in-progress query will scan can really be
changed on the fly.  I think we're going to have to rule that if you
add or remove partitions while a query is running, we're going to scan
exactly the set we had planned to scan at the beginning of the query;
anything else would require on-the-fly plan surgery to a degree that
seems unrealistic.  That means that when a new partition is attached,
already-running queries aren't going to scan it.  If they did, we'd
have big problems, because the transaction snapshot might see rows in
those tables from an earlier time period during which that table
wasn't attached.  There's no guarantee that the data at that time
conformed to the partition constraint, so it would be pretty
problematic to let users see it.  Conversely, when a partition is
detached, there may still be scans from existing queries hitting it
for a fairly arbitrary length of time afterwards.  That may be
surprising from a locking point of view or something, but it's correct
as far as MVCC goes.  Any changes made after the DETACH operation
can't be visible to the snapshot being used for the scan.

Now, what we could try to change on the fly is the set of partitions
that are used for tuple routing.  For instance, suppose we're
inserting a long stream of COPY data.  At some point, we attach a new
partition from another session.  If we then encounter a row that
doesn't route to any of the partitions that existed at the time the
query started, we could - instead of immediately failing - go and
reload the set of partitions that 

Re: WIP: "More fair" LWLocks

2018-08-13 Thread Dmitry Dolgov
> On Mon, 13 Aug 2018 at 17:36, Alexander Korotkov  
> wrote:
>
> Hi!
>
> I run pgbench (read-write and read-only benchmarks) on Amazon
> c5d.18xlarge virtual machine, which has 72 VCPU (approximately same
> power as 36 physical cores).  The results are attached
> (lwlock-fair-ro.png and lwlock-fair-rw.png).

Hi,

Thanks for working on that. I haven't read the patch yet, but I think it is
worth mentioning that when testing locks on AWS we also need to take into
account the lock holder/waiter preemption problem and such things as Intel
PLE, since I believe they can have a significant impact in this case.
Right now I'm doing some small research on this topic, and I hope to soon
finish a tool and benchmark setup to test the influence of such factors in a
virtualized environment. In the meantime I would be glad to review your
patches.



Re: NetBSD vs libxml2

2018-08-13 Thread Nico Williams
[Quoting out of order.]

On Mon, Aug 13, 2018 at 11:16:23AM -0400, Tom Lane wrote:
> Robert Haas  writes:
>
> I'm not, personally, eager to do that work for a requirement which
> somehow hasn't surfaced on any other platform, nor on any previous
> NetBSD release.  I think NetBSD is way out in left field here.

It's not just about the distros.  It's also about sites that build and
install alternate versions of things that PG might depend on.  I've seen
that many times.  It is PG that is being hostile to them, not NetBSD
that is being hostile to PG.

Building PG outside a distro, with libxml2 outside a distro, ought to be
possible, but currently isn't without hacking on PG.  That's not good.

In any case, the patch I posted is trivial and small and should do the
job.

> > I kind of agree with Nico: why do we think we get to tell operating
> > system distributions which switches they're allowed to need to make
> > things work?  The point of things like pg_config and xmlconfig is to
> > reveal what is needed.  If we editorialize on that, we do so at our
> > own risk.
> 
> Well, the issue is that new kinds of switches introduce new potential
> for bugs.  In the case of -Wl,-R..., I'm not even sure that you can
> write that more than once per link, so absorbing one from xml2-config
> might well break things on some platforms.  Or, if you can write more
> than one, we'd need to make sure they end up in a desirable order.
> (cf commit dddfc4cb2 which fixed similar issues for -L switches;
> it looks to me like the current coding would in fact fail to order
> our $(rpath) correctly against one absorbed from xml2, and it would
> not be trivial to fix that.)

I believe no new options have _ever_ been added to linker-editors for
specifying dependencies, but lots of options for other purposes have
been added over the last several decades.

The link-editors are free to add new linker options.  There is nothing
we can do about that.

The fundamental problem here is that *-config programs should have had
separate --libs and --ldflags options, but don't, obligating us (and
others too) to parse the output of the --libs option.

What we need is an approach to parsing the output of the --libs option
that is likely to be more stable.  I believe looking for words that
start with -l is better than looking for words that start with -L,
because there haven't been new ways to specify dependencies ever, but
there have been tons of new linker-editor options for other things.  It
follows that this approach should be more stable.

And yes, I understand that if the linker-editors ever add new options
for specifying dependencies then we'll be here again.  It just seems a)
unlikely, b) not that burdensome if it does ever happen.

Aside:  It is truly shocking that in a world where, for better or worse,
autoconf is so prevalent, *-config programs still fail to
separate --libs and --ldflags.  After all, the separation of
LIBS (not precious) and LDFLAGS (precious), is very much a
tradition in autoconf'ed projects, and autoconf very much
supports it.

Nico
-- 



Re: libpq should not look up all host addresses at once

2018-08-13 Thread Fabien COELHO




The Fatal error does not really say for which host/ip the password fails.


Yup, but that's not the province of this patch to improve.  See

https://www.postgresql.org/message-id/25918.1533918...@sss.pgh.pa.us

for one that is trying to improve it.


Yep, I gathered that afterwards.

--
Fabien.



WIP: "More fair" LWLocks

2018-08-13 Thread Alexander Korotkov
Hi!

This subject was already raised multiple times [1], [2], [3].  In
short, our LWLock implementation has pathological behavior when shared
lockers constitute a continuous flood.  In this case exclusive lock
waiters are not guaranteed to eventually get the lock.  When a shared
lock is held, other shared lockers go through without any queueing.  This is
why, even though individual shared lock durations are small, when the flood
of shared lockers is high enough there might be no gap in which to process
exclusive lockers.  Such behavior is periodically reported on
different LWLocks.  And that leads to an idea of making our LWLocks
"more fair", which would make infinite starvation of exclusive lock
waiters impossible.

This idea was a subject of criticism, and I can summarize the arguments of
the critics as follows:
1) LWLocks are designed to be unfair.  Their unfairness is the downside of
high performance in the majority of scenarios.
2) An attempt to make LWLocks "more fair" would lead to an unacceptable
general performance regression.
3) If exclusive lock waiters are faced with infinite starvation, then
that's not a problem of the LWLocks implementation, but a problem of the
particular use case.  So, we need to fix the LWLocks use cases, not
LWLocks themselves.

I see some truth in these arguments, but I can't agree that we
shouldn't try to fix LWLock unfairness.  I see the following
arguments for that:
1) It doesn't look like we can ever fix all the LWLock use cases in a
way which would make infinite starvation impossible.  Usage of
NUMA systems is rising, and more LWLock use cases are becoming
pathological.  For instance, much effort has been put into reducing
ProcArrayLock contention, but on a multicore machine with a heavy read-only
workload it might still be impossible to log in or commit a transaction.
Or another recent example: the buffer mapping lock becomes a reason for
eviction blocking [3].
2) The situation of infinite exclusive locker starvation is much worse
than just bad overall DBMS performance.  We are doing our best at
removing high contention points in PostgreSQL.  That's very good, but
it's an infinite race, given that new hardware platforms keep
arriving.  But a situation where you can't connect to the database while
the system has free resources is much worse than a situation where
PostgreSQL doesn't scale well enough on your brand new hardware.
3) It's not necessary to make LWLocks completely fair in order to
exclude infinite starvation of exclusive lockers.  So, it's not
necessary to put all the shared lockers into the queue.  In the
majority of cases, shared lockers might still go through directly, but
on some event we might decide that it's too much and they should go to
the queue.

So, taking all of the above into account, I made some experiments with
patches that make LWLocks "more fair".  I'd like to share some
intermediate results.  I've written two patches for comparison.
1) lwlock-fair-1.patch
A shared locker goes to the queue if there is already somebody in the
queue; otherwise it obtains the lock immediately.
2) lwlock-fair-2.patch
A new flag LW_FLAG_FAIR is introduced.  This flag is set when the first
shared locker in a row releases the lock.  When LW_FLAG_FAIR is set and
there is already somebody in the queue, a shared locker goes to the
queue.  Basically it means that the first shared locker "holds the
door" for other shared lockers to go through without queuing.

I ran pgbench (read-write and read-only benchmarks) on an Amazon
c5d.18xlarge virtual machine, which has 72 vCPUs (approximately the
same power as 36 physical cores).  The results are attached
(lwlock-fair-ro.png and lwlock-fair-rw.png).

We can see that for the read-only scenario there is no difference
between master and either of the patches.  That's expected, because in
this scenario no exclusive lock is obtained, so all shared lockers go
without the queue anyway.

For the read-write scenario we can see a regression with both patches.
The 1st version of the patch gives a dramatic regression of 2.5-3
times.  The 2nd version behaves better, with a regression of about 15%,
but that's still unacceptable.  However, I think the idea that some
event sends shared lockers to the queue is promising.  We should just
select this triggering event better.

I'm going to continue my experiments trying to make "more fair"
LWLocks (almost) without performance regression.  Any feedback is
welcome.

Links:
1. 
https://www.postgresql.org/message-id/0A3221C70F24FB45833433255569204D1F578E83%40G01JPEXMBYT05
2. 
https://www.postgresql.org/message-id/CAPpHfdsytkTFMy3N-zfSo+kAuUx=u-7jg6q2byb6fpuw2cd...@mail.gmail.com
3. 
https://www.postgresql.org/message-id/CAPpHfdt_HFxNKFbSUaDg5QHxzKcvPBB5OhRengRpVDp6ubdrFg%40mail.gmail.com

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


lwlock-fair-1.patch
Description: Binary data


lwlock-fair-2.patch
Description: Binary data


Re: libpq should append auth failures, not overwrite

2018-08-13 Thread Fabien COELHO



Hello Tom,


ISTM that both the hostname and ip should be shown to avoid confusion
about hosts with multiple ips, esp. as ips are given in any order by the
dns.
...
Also for homogeneity, I'd suggest to always add the server line. If the
server introduction is inserted in all cases, including when only one host
is used, hints become partially redundant:
   server "local.coelho.net" port 5434:
   could not connect to server: Connection refused
 Is the server running on host "local.coelho.net" (127.0.0.1) and 
accepting
 TCP/IP connections on port 5434?
This would allow to simplify more hints, which you seem to have done on
"open read-write session" and "SHOW transaction_read_only" but not others.


As I explained in my comments, the reason I did not do these things
is that I didn't want to change the output for cases in which just a
single host name is given.  I still don't.


Ok, I get your argument when there is just one target server (cluster), 
which is probably at least 99.9% of the use case in practice.


However, ISTM the multiple-IP and multiple-hostname cases look pretty close.

I guess that the number of people who use these features is small, but 
for them better messages are useful when things go wrong... so I would 
see no harm in emitting the introductory server line whenever there are 
multiple servers, whatever the reason there are multiple servers.


--
Fabien.



Re: NetBSD vs libxml2

2018-08-13 Thread Tom Lane
Robert Haas  writes:
> On Sat, Aug 11, 2018 at 1:18 PM, Tom Lane  wrote:
>> We could fix this by teaching configure to absorb -Wl,-R... switches
>> into LDFLAGS from xml2-config's output, and that seems to make things
>> work, but I wonder whether we should or not.  This seems like a new height
>> of unfriendliness to non-default packages on NetBSD's part, and it's a bit
>> hard to believe the behavior will make it to a formal release.

> I kind of agree with Nico: why do we think we get to tell operating
> system distributions which switches they're allowed to need to make
> things work?  The point of things like pg_config and xmlconfig is to
> reveal what is needed.  If we editorialize on that, we do so at our
> own risk.

Well, the issue is that new kinds of switches introduce new potential
for bugs.  In the case of -Wl,-R..., I'm not even sure that you can
write that more than once per link, so absorbing one from xml2-config
might well break things on some platforms.  Or, if you can write more
than one, we'd need to make sure they end up in a desirable order.
(cf commit dddfc4cb2 which fixed similar issues for -L switches;
it looks to me like the current coding would in fact fail to order
our $(rpath) correctly against one absorbed from xml2, and it would
not be trivial to fix that.)

I'm not, personally, eager to do that work for a requirement which
somehow hasn't surfaced on any other platform, nor on any previous
NetBSD release.  I think NetBSD is way out in left field here.

regards, tom lane



Re: libpq should append auth failures, not overwrite

2018-08-13 Thread Tom Lane
Fabien COELHO  writes:
> ISTM that both the hostname and ip should be shown to avoid confusion 
> about hosts with multiple ips, esp. as ips are given in any order by the 
> dns.
> ...
> Also for homogeneity, I'd suggest to always add the server line. If the 
> server introduction is inserted in all cases, including when only one host 
> is used, hints become partially redundant:
>server "local.coelho.net" port 5434:
>could not connect to server: Connection refused
>  Is the server running on host "local.coelho.net" (127.0.0.1) and 
> accepting
>  TCP/IP connections on port 5434?
> This would allow to simplify more hints, which you seem to have done on 
> "open read-write session" and "SHOW transaction_read_only" but not others.

As I explained in my comments, the reason I did not do these things
is that I didn't want to change the output for cases in which just a
single host name is given.  I still don't.  People will think it's
change for the sake of change, and they tend not to like that.
The multi-host feature is new enough that I think we can still get away
with changing how errors are reported in those cases ... but what you're
proposing here is to mess with error-reporting behavior that was settled
on decades ago.  I'm not really interested in taking the flak that will
come with that.

regards, tom lane



Re: logical decoding / rewrite map vs. maxAllocatedDescs

2018-08-13 Thread Andres Freund
On 2018-08-13 11:50:41 -0300, Alvaro Herrera wrote:
> On 2018-Aug-11, Andres Freund wrote:
> 
> > On 2018-08-11 00:46:25 -0700, Andres Freund wrote:
> > > Below you can find the bt full showing a bunch of nested invalidations. 
> > > Looking.
> > 
> > Hm. I wonder if the attached fixes the issue for you.  If it's this I
> > don't understand why this doesn't occur on older branches...
> > 
> > I've not yet been able to reliably reproduce the issue without the
> > patch, so I'm not sure it means much that it didn't reoccur afterwards.
> 
> Well, the line you're patching was introduced in 2ce64caaf793 (looking
> at pg10), and was not replacing a line with the exact same mistake.

I'm not clear on what you're trying to say?  My problem is that Thomas'
backtraces don't show involvement by that function, which makes it less
likely that it matters, no?

Greetings,

Andres Freund



Re: Repeatable Read Isolation in SQL running via background worker

2018-08-13 Thread Jeremy Finzel
On Thu, Aug 9, 2018 at 4:34 PM, Jeremy Finzel  wrote:

> I am using worker_spi as a model to run a SQL statement inside a
> background worker.  From my browsing of the Postgres library, I believe
> that if I want repeatable read isolation level, the proper way for me to
> attain this is to add this line after StartTransactionCommand()
> in worker_spi_main:
>
> XactIsoLevel = XACT_REPEATABLE_READ;
>
> Or - am I mistaken?  Does PushActiveSnapshot already ensure I will get the
> same snapshot of the data within this transaction?
>
> Can anyone help me if this is accurate or if there are any other gotchas I
> should be aware of?
>
> The SQL statement will be run every minute for example, and each time with
> this isolation level.  At least, that is my goal.
>
> Any help is much appreciated.
>
> Thanks,
> Jeremy
>

It seems to be working.  If anyone could provide any feedback though I
would be very appreciative.
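
For reference, the shape being tested looks roughly like this.  It is a
simplified fragment of worker_spi_main (error handling and pgstat
reporting omitted); whether assigning XactIsoLevel directly is the
sanctioned way to do this is exactly the question above.

    /* once per run of the SQL statement, inside the worker's main loop */
    SetCurrentStatementStartTimestamp();
    StartTransactionCommand();
    XactIsoLevel = XACT_REPEATABLE_READ;    /* before the first snapshot
                                             * of the transaction is taken */
    SPI_connect();
    PushActiveSnapshot(GetTransactionSnapshot());

    (void) SPI_execute(sql, false, 0);      /* "sql" is the statement text */

    SPI_finish();
    PopActiveSnapshot();
    CommitTransactionCommand();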


Re: logical decoding / rewrite map vs. maxAllocatedDescs

2018-08-13 Thread Alvaro Herrera
On 2018-Aug-11, Andres Freund wrote:

> On 2018-08-11 00:46:25 -0700, Andres Freund wrote:
> > Below you can find the bt full showing a bunch of nested invalidations. 
> > Looking.
> 
> Hm. I wonder if the attached fixes the issue for you.  If it's this I
> don't understand why this doesn't occur on older branches...
> 
> I've not yet been able to reliably reproduce the issue without the
> patch, so I'm not sure it means much that it didn't reoccur afterwards.

Well, the line you're patching was introduced in 2ce64caaf793 (looking
at pg10), and was not replacing a line with the exact same mistake.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: NetBSD vs libxml2

2018-08-13 Thread Robert Haas
On Sat, Aug 11, 2018 at 1:18 PM, Tom Lane  wrote:
> We could fix this by teaching configure to absorb -Wl,-R... switches
> into LDFLAGS from xml2-config's output, and that seems to make things
> work, but I wonder whether we should or not.  This seems like a new height
> of unfriendliness to non-default packages on NetBSD's part, and it's a bit
> hard to believe the behavior will make it to a formal release.

I kind of agree with Nico: why do we think we get to tell operating
system distributions which switches they're allowed to need to make
things work?  The point of things like pg_config and xmlconfig is to
reveal what is needed.  If we editorialize on that, we do so at our
own risk.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: logical decoding / rewrite map vs. maxAllocatedDescs

2018-08-13 Thread Andres Freund
Hi,

On 2018-08-13 11:46:30 -0300, Alvaro Herrera wrote:
> On 2018-Aug-11, Tomas Vondra wrote:
> 
> > Hmmm, it's difficult to compare "bt full" output, but my backtraces look
> > somewhat different (and all the backtraces I'm seeing are 100% exactly
> > the same). Attached for comparison.
> 
> Hmm, looks similar enough to me -- at the bottom you have the executor
> doing its thing, then an AcceptInvalidationMessages in the middle
> section atop which sit a few more catalog accesses, and further up from
> that you have another AcceptInvalidationMessages with more catalog
> accesses.  AFAICS that's pretty much the same thing Andres was
> describing.

It's somewhat different because it doesn't seem to involve a reload of a
nailed table, which my traces did.  I wasn't able to reproduce the crash
more than once, so I'm not at all sure how to properly verify the issue.
I'd appreciate if Thomas could try to do so again with the small patch I
provided.

Greetings,

Andres Freund



Re: logical decoding / rewrite map vs. maxAllocatedDescs

2018-08-13 Thread Alvaro Herrera
On 2018-Aug-11, Tomas Vondra wrote:

> Hmmm, it's difficult to compare "bt full" output, but my backtraces look
> somewhat different (and all the backtraces I'm seeing are 100% exactly
> the same). Attached for comparison.

Hmm, looks similar enough to me -- at the bottom you have the executor
doing its thing, then an AcceptInvalidationMessages in the middle
section atop which sit a few more catalog accesses, and further up from
that you have another AcceptInvalidationMessages with more catalog
accesses.  AFAICS that's pretty much the same thing Andres was
describing.  I think the exact shape of the executor bits is not
relevant, and I suppose the exact details of what occurs when
invalidation messages are processed are not terribly important either.
I think in Andres' backtrace there are *three*
AcceptInvalidationMessages rather than two as in your case, but that
shouldn't be important either, just the fact that there are N nested
ones.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Creating extensions for non-superusers

2018-08-13 Thread Robert Haas
On Fri, Aug 10, 2018 at 11:11 AM, Stephen Frost  wrote:
> For my 2c, I'd like something along these lines when it comes to a
> capability but it's just not that simple.

It seems pretty simple to me.  We have a bunch of other predefined
roles that allow otherwise-superuser-only operations to be delegated
to non-superusers.  Alexandra's proposal to add one more seems like a
logical extension of that work.  +1 from me.

> Further, while you might make it such that a non-superuser could install
> the extensions, those extensions may have superuser checks inside them
> as well which would need to be addressed or at least considered.  There
> isn't too much point in installing an extension if everything that
> extension allows requires superuser rights.
>
> Lastly, you'll certainly want to look at some of the extensions to see
> if what they install are things you really want a non-superuser to be
> able to do, in particular in cases where you're getting an extension
> from a third party but there may even be cases in contrib where an
> extension, once installed, allows a non-superuser to do things that a
> hosted environment might prefer they didn't.

While these might be good things for an individual DBA to consider
before granting the new pg_create_extension privilege to a user on
their system, they don't in my mind have much to do with whether or
not we should add the feature in the first place.  Our goal should be
to allow bits of superuser privilege to be given out according to
local policy; it is for individual DBAs to decide on what the local
policy should be, and the factors you mention are things they ought to
consider.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: libpq should not look up all host addresses at once

2018-08-13 Thread Tom Lane
Fabien COELHO  writes:
> In the same vein on a wrong password:

>   sh> psql "host=no-such-host,local2.coelho.net,local3.coelho.net"
>   Password for user fabien:
>   psql: could not translate host name "no-such-host" to address: Name or 
> service not known
>   could not connect to server: Connection refused
>  Is the server running on host "local2.coelho.net" (127.0.0.2) and 
> accepting
>  TCP/IP connections on port 5432?
>   could not connect to server: Connection refused
>  Is the server running on host "local2.coelho.net" (127.0.0.3) and 
> accepting
>  TCP/IP connections on port 5432?
>   FATAL:  password authentication failed for user "fabien"

> The Fatal error does not really say for which host/ip the password fail.

Yup, but that's not the province of this patch to improve.  See

https://www.postgresql.org/message-id/25918.1533918...@sss.pgh.pa.us

for one that is trying to improve it.

regards, tom lane



Re: libpq should not look up all host addresses at once

2018-08-13 Thread Tom Lane
Fabien COELHO  writes:
> About the behavior from psql point of view:

> * if dns works, error messages are only printed if all attempts failed:
> But nothing shows if one succeeds at some point. I understand that libpq 
> is doing its job, but I'm wondering whether this is the best behavior.

Yeah, this is the behavior that was established by the multi-host patch
to begin with.  This patch just extends that to treat DNS failures the
same way as we already treated other connection problems.

> Maybe the user would like to know that attempts are made and are failing? 
> This would suggest some callback mecanism so that the client is informed 
> of in progress issues? Maybe this is for future work.

Well, the application can already tell that if it wishes to, by noting
whether PQhost/PQport return the values for the first alternative or
later ones.  Not sure that we need anything more.
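
For instance, after the connection attempt a client that cares can
already do roughly this (a trivial sketch with hypothetical host
names):

    PGconn *conn = PQconnectdb("host=h1.example.com,h2.example.com dbname=test");

    if (PQstatus(conn) == CONNECTION_OK)
        printf("connected to host \"%s\", port %s\n",
               PQhost(conn), PQport(conn));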

> * when the password is required, there is no way to know for which host/ip 
> it is:

Again, I'm not here to re-litigate API decisions that were made in
connection with the multi-host patch.  What was decided (and documented)
at that point was that if you don't want to have the same password for all
the hosts in question, you need to use ~/.pgpass to supply per-host
passwords.

In practice, I'm not sure this matters too much.  It's hard to conceive of
a practical use-case in which all the target hosts aren't interchangeable
from the application's/user's standpoint.  That's why there's little or
no provision for varying the other conn parameters per-host.  We could
imagine future extensions to libpq to allow some or all of the rest of
them to be comma-separated lists, but I'm content to wait for a compelling
use-case to be shown before doing that work.

> atoi("5432+1") == 5432, so the port syntax check is loose, really.

libpq has always parsed port parameters that way.  Tightening it now
is not likely to earn us any thanks.

regards, tom lane



Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2018-08-13 Thread Marina Polyakova

On 12-08-2018 12:14, Fabien COELHO wrote:

HEllo Marina,


Hello, Fabien!


v10-0003-Pgbench-errors-use-the-Variables-structure-for-c.patch
- a patch for the Variables structure (this is used to reset client 
variables during the repeating of transactions after 
serialization/deadlock failures).


This patch adds an explicit structure to manage Variables, which is
useful to reset these on pgbench script retries, which is the purpose
of the whole patch series.

About part 3:

Patch applies cleanly,


On 12-08-2018 12:17, Fabien COELHO wrote:

About part 3:

Patch applies cleanly,


I forgot: compiles, global & local "make check" are ok.


I'm glad to hear it :-)


* typo in comments: "varaibles"


I'm sorry, I'll fix it.


* About enlargeVariables:

multiple INT_MAX error handling looks strange, especially as this code
can never be triggered because pgbench would be dead long before
having allocated INT_MAX variables. So I would not bother to add such
checks.
...
I'm not sure that the size_t cast here and there are useful for any
practical values likely to be encountered by pgbench.


Looking at the code of functions such as ParseScript and 
psql_scan_setup, where an integer variable is used for the size of the 
entire script, ISTM that you are right.  Therefore the size_t casts 
will also be removed.



ISTM that if something is amiss it will fail in pg_realloc anyway.


IIUC, if physical RAM is not enough, this may depend on the size of the 
swap.



Also I do not like the ExpBuf stuff, as usual.



The exponential allocation seems overkill. I'd simply add a constant
number of slots, with a simple rule:

/* reallocated with a margin */
if (max_vars < needed) max_vars = needed + 8;

So in the end the function should be much simpler.


Ok!
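
For concreteness, the simplified function could end up looking roughly
like this (just a sketch; the Variables/Variable field names here are
assumptions, not the patch's actual code):

static void
enlargeVariables(Variables *variables, int needed)
{
    /* reallocate with a constant margin instead of exponential growth */
    if (variables->max_vars < needed)
    {
        variables->max_vars = needed + 8;
        variables->vars = (Variable *)
            pg_realloc(variables->vars,
                       sizeof(Variable) * variables->max_vars);
    }
}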

--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-08-13 Thread Etsuro Fujita

(2018/08/13 11:57), Robert Haas wrote:

On Mon, Aug 6, 2018 at 8:30 AM, Etsuro Fujita
  wrote:

In the above I used the test whether the relation's reloptkind is
RELOPT_BASEREL or not, but I noticed that I had overlooked the case of a
multi-level partitioned table.  So I fixed that and added regression test
cases for that.  I also revised comments a bit.  Attached is an updated
version of the patch.


+   /* If so, consider partitionwise joins for that join. */
+   if (IS_PARTITIONED_REL(joinrel))
+   joinrel->consider_partitionwise_join = true;

Maybe this should assert that the inner and outer rels have
consider_partitionwise_join set.  There is an Assert quite a bit
earlier in the function that the parent join have it set, but I think
it might make sense to check the children have it set whenever we set
the flag.


Agreed.  Done.

One thing I noticed that might be an improvement is to skip 
build_joinrel_partition_info if the given joinrel is going to have 
consider_partitionwise_join=false.  In the previous patch, that function 
created the joinrel's partition info, such as part_scheme and part_rels, 
if the joinrel was considered partitioned, independently of the 
consider_partitionwise_join flag for it; but if that flag is false, we 
don't generate PWJ paths for the joinrel, so we would not need to create 
that partition info at all.  This would not only avoid unnecessary 
processing in that function, but also make unnecessary the changes I 
made to try_partitionwise_join, generate_partitionwise_join_paths, 
apply_scanjoin_target_to_paths, and create_ordinary_grouping_paths.  So 
I updated the patch that way.  Please find attached an updated version 
of the patch.
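
In code form the idea is roughly the following (a sketch only; the call
site and argument list are assumed shorthand, not the exact hunk in the
attached patch):

    /* in build_join_rel(), once consider_partitionwise_join is known */
    if (joinrel->consider_partitionwise_join)
        build_joinrel_partition_info(joinrel, outer_rel, inner_rel,
                                     restrictlist, sjinfo->jointype);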


Thanks for the review!

Best regards,
Etsuro Fujita
*** a/contrib/postgres_fdw/expected/postgres_fdw.out
--- b/contrib/postgres_fdw/expected/postgres_fdw.out
***
*** 8337,8344  ALTER TABLE fprt2_p1 SET (autovacuum_enabled = 'false');
  ALTER TABLE fprt2_p2 SET (autovacuum_enabled = 'false');
  INSERT INTO fprt2_p1 SELECT i, i, to_char(i/50, 'FM') FROM generate_series(0, 249, 3) i;
  INSERT INTO fprt2_p2 SELECT i, i, to_char(i/50, 'FM') FROM generate_series(250, 499, 3) i;
! CREATE FOREIGN TABLE ftprt2_p1 PARTITION OF fprt2 FOR VALUES FROM (0) TO (250)
  	SERVER loopback OPTIONS (table_name 'fprt2_p1', use_remote_estimate 'true');
  CREATE FOREIGN TABLE ftprt2_p2 PARTITION OF fprt2 FOR VALUES FROM (250) TO (500)
  	SERVER loopback OPTIONS (table_name 'fprt2_p2', use_remote_estimate 'true');
  ANALYZE fprt2;
--- 8337,8345 
  ALTER TABLE fprt2_p2 SET (autovacuum_enabled = 'false');
  INSERT INTO fprt2_p1 SELECT i, i, to_char(i/50, 'FM') FROM generate_series(0, 249, 3) i;
  INSERT INTO fprt2_p2 SELECT i, i, to_char(i/50, 'FM') FROM generate_series(250, 499, 3) i;
! CREATE FOREIGN TABLE ftprt2_p1 (b int, c varchar, a int)
  	SERVER loopback OPTIONS (table_name 'fprt2_p1', use_remote_estimate 'true');
+ ALTER TABLE fprt2 ATTACH PARTITION ftprt2_p1 FOR VALUES FROM (0) TO (250);
  CREATE FOREIGN TABLE ftprt2_p2 PARTITION OF fprt2 FOR VALUES FROM (250) TO (500)
  	SERVER loopback OPTIONS (table_name 'fprt2_p2', use_remote_estimate 'true');
  ANALYZE fprt2;
***
*** 8389,8416  SELECT t1.a,t2.b,t2.c FROM fprt1 t1 LEFT JOIN (SELECT * FROM fprt2 WHERE a < 10)
   8 |   | 
  (5 rows)
  
! -- with whole-row reference
  EXPLAIN (COSTS OFF)
! SELECT t1,t2 FROM fprt1 t1 JOIN fprt2 t2 ON (t1.a = t2.b and t1.b = t2.a) WHERE t1.a % 25 =0 ORDER BY 1,2;
!QUERY PLAN
! -
   Sort
 Sort Key: ((t1.*)::fprt1), ((t2.*)::fprt2)
!->  Append
!  ->  Foreign Scan
!Relations: (public.ftprt1_p1 t1) INNER JOIN (public.ftprt2_p1 t2)
!  ->  Foreign Scan
!Relations: (public.ftprt1_p2 t1) INNER JOIN (public.ftprt2_p2 t2)
! (7 rows)
  
! SELECT t1,t2 FROM fprt1 t1 JOIN fprt2 t2 ON (t1.a = t2.b and t1.b = t2.a) WHERE t1.a % 25 =0 ORDER BY 1,2;
!t1   |   t2   
  +
   (0,0,) | (0,0,)
   (150,150,0003) | (150,150,0003)
   (250,250,0005) | (250,250,0005)
   (400,400,0008) | (400,400,0008)
! (4 rows)
  
  -- join with lateral reference
  EXPLAIN (COSTS OFF)
--- 8390,8431 
   8 |   | 
  (5 rows)
  
! -- with whole-row reference; partitionwise join does not apply
  EXPLAIN (COSTS OFF)
! SELECT t1.wr, t2.wr FROM (SELECT t1 wr, a FROM fprt1 t1 WHERE t1.a % 25 = 0) t1 FULL JOIN (SELECT t2 wr, b FROM fprt2 t2 WHERE t2.b % 25 = 0) t2 ON (t1.a = t2.b) ORDER BY 1,2;
!QUERY PLAN   
! 
   Sort
 Sort Key: ((t1.*)::fprt1), ((t2.*)::fprt2)
!->  Hash Full Join
!  Hash Cond: (t1.a = t2.b)
!  ->  Append
!->  Foreign Scan on 

Re: Get funcid when create function

2018-08-13 Thread Robert Haas
On Fri, Aug 10, 2018 at 5:50 AM, 王翔宇  wrote:
> I'm developing an extension for pg. Now I have created an event trigger on
> ddl_command_end, and this function will be called after I enter a CREATE
> FUNCTION statement. In this function I can only get parseTree. In the pg
> source code, I found a function named "pg_event_trigger_ddl_commands" that
> seems to provide cmds which include the funcid. But I didn't find any
> example of how to call this function.
> Who can help?

Maybe it would help to read the documentation on that function:

https://www.postgresql.org/docs/current/static/functions-event-triggers.html

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] possible self-deadlock window after bad ProcessStartupPacket

2018-08-13 Thread Heikki Linnakangas

On 08/08/18 19:19, Heikki Linnakangas wrote:

So, with this commit, the various SIGQUIT quickdie handlers are in a
better shape. The regular backend's quickdie() handler still calls
ereport(), which is not safe, but it's a step in the right direction.
And we haven't addressed the original complaint, which was actually
about startup_die(), not quickdie().

What should we do about startup_die()? Ideas?


To recap: If a backend process receives a SIGTERM, or if the 
authentication timeout expires, while the backend is reading the startup 
packet, we call proc_exit(0) from the signal handler. proc_exit(0) is 
clearly not async-safe, so if the process was busy modifying backend 
state, for example doing a memory allocation, bad things can happen. At 
the beginning of this thread, Jimmy demonstrated a self-deadlock: the 
backend was in the middle of proc_exit() when the signal arrived, and a 
nested call to proc_exit() was made.


I think the proper fix here is to stop calling proc_exit(0) directly 
from the startup_die() signal handler. Instead, let's install the 
regular die() signal handler earlier, before reading the startup 
packet, and rely on the usual CHECK_FOR_INTERRUPTS() mechanism.
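
In rough terms the direction would be the following (a sketch of the
idea, not a worked-out patch; the spots that actually need the checks
still have to be identified):

    /* in BackendInitialize(), before reading the startup packet: */
    pqsignal(SIGTERM, die);         /* instead of startup_die() */

    /* die() (postgres.c) only latches the request: */
    InterruptPending = true;
    ProcDiePending = true;

    /*
     * ...so the places that can block while processing the startup
     * packet would have to call CHECK_FOR_INTERRUPTS() to honour the
     * request at a safe point.
     */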


One annoying thing is the pg_getnameinfo_all() call, in 
BackendInitialize(). If 'log_hostname' is enabled, it can perform a DNS 
lookup, which can take a very long time. It would be nice to still be 
able to interrupt that, but if we stop calling proc_exit() from the 
signal handler, we won't. I don't think it's safe to interrupt it, like 
we do today, but I wonder if the cure is worse than the disease.


We revamped the signal handling in backend processes in PostgreSQL 9.5, 
so I'm inclined to only backpatch this to 9.5. In 9.3 and 9.4, let's 
just add a '!proc_exit_in_progress' check to the signal handler, to 
prevent the nested proc_eixt() call that Jimmy ran into from happening. 
It won't fix the general problem, but since we haven't heard any other 
reports about this, I think that's the right amount of effort for 9.3 
and 9.4.


- Heikki



Re: Temporary tables prevent autovacuum, leading to XID wraparound

2018-08-13 Thread Andres Freund
On 2018-08-09 18:50:47 +0200, Michael Paquier wrote:
> On Thu, Aug 09, 2018 at 02:29:54AM -0700, Andres Freund wrote:
> +   /*
> +* Mark MyProc as owning this namespace which other processes can use to
> +* decide if a temporary namespace is in use or not.  We assume that
> +* assignment of namespaceId is an atomic operation.  Even if it is not,
> +* there is no visible temporary relations associated to it and the
> +* temporary namespace creation is not committed yet, so none of its
> +* contents should be seen yet if scanning pg_class or pg_namespace.
> +*/

> I actually have tried to mention what you are willing to see in the
> comments with the last sentence.  So that is awkward :)

I don't know what you're trying to say with this.

> I would propose to reword the last sentence of the patch as follows
> then:
> "Even if it is not atomic, the temporary relation which resulted in the
> creation of this temporary namespace is still locked until the current
> transaction commits, so it would not be accessible yet."
> 
> When resetting the value on abort I have that:
> +   /*
> +* Reset the temporary namespace flag in MyProc. The creation of
> +* the temporary namespace has failed for some reason and should
> +* not be seen by other processes as it has not been committed
> +* yet, hence this would be fine even if not atomic, still we
> +* assume that it is an atomic assignment.
> +*/
> 
> Hence I would propose the following wording for this part:
> "Reset the temporary namespace flag in MyProc.  We assume that this
> operation is atomic, however it would be fine even if not atomic as the
> temporary table which created this namespace is still locked until this
> transaction aborts so it would not be visible yet."

I don't think that comment, nor the comment that you ended up
committing:
+
+   /*
+* Reset the temporary namespace flag in MyProc.  We assume that
+* this operation is atomic.  Even if it is not, the temporary
+* table which created this namespace is still locked until this
+* transaction aborts so it would not be visible yet, acting as a
+* barrier.
+*/

is actually correct. *Holding* a lock isn't a memory barrier. Acquiring
or releasing one is.

Greetings,

Andres Freund



Re: Temporary tables prevent autovacuum, leading to XID wraparound

2018-08-13 Thread Michael Paquier
On Thu, Aug 09, 2018 at 06:50:47PM +0200, Michael Paquier wrote:
> Better ideas are of course welcome.

I have gone back and forth on this patch for the last couple of days,
reworded the comment blocks to outline the point Andres has been making,
and have finally been able to push it, back-patching it down to v11 but
not further, as we don't want an ABI breakage in already released
branches.
--
Michael


signature.asc
Description: PGP signature


Re: "WIP: Data at rest encryption" patch and, PostgreSQL 11-beta3

2018-08-13 Thread Antonin Houska
Toshi Harada  wrote:

> Hi.
> 
> If "data-at-rest-encryption-wip-2018.07.25.patch" is applied to PostgreSQL 
> 11-beta3 released last week, 
> patch execution will fail as follows.

> 
> patching file src/backend/replication/logical/reorderbuffer.c
> Hunk #9 FAILED at 2464.
> 
> 1 out of 7 hunks FAILED -- saving rejects to file 
> src/bin/pg_rewind/pg_rewind.c.rej
> 
> 1 out of 6 hunks FAILED -- saving rejects to file 
> src/bin/pg_upgrade/controldata.c.rej

The patch should be applicable to the master branch.

-- 
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26, A-2700 Wiener Neustadt
Web: https://www.cybertec-postgresql.com



"WIP: Data at rest encryption" patch and, PostgreSQL 11-beta3

2018-08-13 Thread Toshi Harada
Hi.

If "data-at-rest-encryption-wip-2018.07.25.patch" is applied to PostgreSQL 
11-beta3 released last week, 
patch execution will fail as follows.


patching file src/backend/replication/logical/reorderbuffer.c
Hunk #9 FAILED at 2464.

1 out of 7 hunks FAILED -- saving rejects to file 
src/bin/pg_rewind/pg_rewind.c.rej

1 out of 6 hunks FAILED -- saving rejects to file 
src/bin/pg_upgrade/controldata.c.rej



(See the attached file for the entire patch log)

Antonin Houska  wrote:
> Toshi Harada  wrote:
> 
> > Hi.
> > 
> > I applied the patch "WIP: Data at rest encryption" to PostgreSQL 11 - beta 
> > 2 and I'm working on it.
> > 
> > When this patch is applied, the following problem occurs.
> > 
> > * An error occurs when CHECKPOINT is executed during two-phase commit.
> > * After an error occurs, if you stop PostgreSQL, it will never start again.
> > 
> > (1) First, execute PREPARE TRANSACTION.
> > 
> > postgres=# BEGIN;
> > BEGIN
> > postgres=# PREPARE TRANSACTION 'foo';
> > PREPARE TRANSACTION
> > postgres=#
> > 
> > (2) Execute the CHECKPOINT command from another terminal.
> > CHEKPOINT command fails.
> > 
> > postgres=# CHECKPOINT;
> > ERROR:  checkpoint request failed
> > HINT:  Consult recent messages in the server log for details.
> > postgres=#
> 
> The patch version I posted in
> 
> https://www.postgresql.org/message-id/11678.1532519255%40localhost
> 
> fixes an issue (unitialized pointer) that caused failure here, but it was
> SEGFAULT rather than ERROR. And the scope of the bug was broader than just
> CHECKPOINT.
> 
> Can you please test it again with the new version of the patch?
> 
> Anyway, thanks for your reports!
> 
> 
> -- 
> Antonin Houska
> Cybertec Schonig & Schonig GmbH
> Grohrmuhlgasse 26, A-2700 Wiener Neustadt
> Web: https://www.cybertec-postgresql.com
> 


PG11-beta3+0725patch.log
Description: Binary data