Re: Building infrastructure for B-Tree deduplication that recognizes when opclass equality is also equivalence

2019-09-30 Thread Andrew Gierth
> "Antonin" == Antonin Houska  writes:

 >> It is set to false for numeric and float4, float8.

 Antonin> Are you sure about these?

numeric values can compare equal but have different display scales (see
hash_numeric).

float4 and float8 both have representations for -0, which compares equal
to 0. (numeric technically has a representation for -0 too, but I
believe the current code carefully avoids ever generating it.)

-- 
Andrew (irc:RhodiumToad)




Re: Building infrastructure for B-Tree deduplication that recognizes when opclass equality is also equivalence

2019-09-30 Thread Antonin Houska
Anastasia Lubennikova  wrote:

> The patch implementing new opclass option is attached.
> 
> It adds new attribute pg_opclass.opcisbitwise, which is set to true if opclass
> equality is the same as binary equality.
> By default it is true.

I think the default value should be false, and we should only set it to true
for individual opclasses that actually meet the bitwise-equality requirement.
Likewise, extension authors should have to state explicitly that their data
types are bitwise equal. Otherwise existing opclasses, when recreated via
pg_dump -> pg_restore, could be used incorrectly by the system.

> It is set to false for numeric and float4, float8.

Are you sure about these?

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com




Re: recovery_min_apply_delay in archive recovery causes assertion failure in latch

2019-09-30 Thread Michael Paquier
On Mon, Sep 30, 2019 at 05:50:03PM +0900, Fujii Masao wrote:
> Thanks for the review! OK, attached is the patch which also added
> two assertion checks as you described.

Thanks, looks fine.  The indentation looks a bit wrong for the
comments, but that's a nit.
--
Michael


signature.asc
Description: PGP signature


Re: pgbench - allow to create partitioned tables

2019-09-30 Thread Amit Kapila
On Mon, Sep 30, 2019 at 5:17 PM Fabien COELHO  wrote:
>
>
> > I don't want to introduce a new pattern in tests which people can then
> > tomorrow copy at other places even though such code is not required.
> > OTOH, if there is a genuine need for the same, then I am fine.
>
> Hmmm. The committer is right by definition. Here is a version without
> escaping but with a comment instead.
>

Thanks, attached is a patch with minor modifications which I am
planning to push after one more round of review on Thursday morning
IST unless there are more comments by anyone else.

The changes include:
1. ran pgindent
2. As per Alvaro's suggestion, moved a few function definitions.
3. Changed one or two comments and fixed a spelling mistake in one place.

The one place where some suggestion might help:
+ else if (PQntuples(res) == 0)
+ {
+ /*
+ * This case is unlikely as pgbench already found "pgbench_branches"
+ * above to compute the scale.
+ */
+ fprintf(stderr,
+ "no pgbench_accounts table found in search_path\n"
+ "Perhaps you need to do initialization (\"pgbench -i\") in database
\"%s\"\n", PQdb(con));
+ exit(1);
+ }

Can anyone else think of a better error message, either in wording or
style, for the above case?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


pgbench-init-partitioned-21.patch
Description: Binary data


Re: A comment fix in xlogreader.c

2019-09-30 Thread Kyotaro Horiguchi
At Thu, 26 Sep 2019 11:57:59 +0900, Michael Paquier  wrote 
in <20190926025759.gb2...@paquier.xyz>
> On Thu, Sep 26, 2019 at 11:08:09AM +0900, Kyotaro Horiguchi wrote:
> > While rechecking another patch, I found that 709d003fbd forgot to
> > edit a comment mentioning three members removed from
> > XLogReaderState.
> >
> @@ -103,8 +103,7 @@ XLogReaderAllocate(int wal_segment_size, const char 
> *waldir,
>  state->read_page = pagereadfunc;
>  /* system_identifier initialized to zeroes above */
>  state->private_data = private_data;
> -/* ReadRecPtr and EndRecPtr initialized to zeroes above */
> -/* readSegNo, readOff, readLen, readPageTLI initialized to zeroes above 
> */
> +/* ReadRecPtr, EndRecPtr and readLen initialized to zeroes above */
>  state->errormsg_buf = palloc_extended(MAX_ERRORMSG_LEN + 1,
>MCXT_ALLOC_NO_OOM);
>  if (!state->errormsg_buf)
> 
> I see.  readSegNo and readOff have been moved to WALOpenSegment and
> replaced by new, equivalent fields; still, all three fields are
> initialized by the palloc_extended() call that allocates the
> XLogReaderState, while the two others are now part of
> WALOpenSegmentInit().  Your change is correct, so applied.

Exactly. Thanks!

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Attempt to consolidate reading of XLOG page

2019-09-30 Thread Kyotaro Horiguchi
Hello,

At Sat, 28 Sep 2019 15:00:35 +0200, Antonin Houska  wrote in 
<9236.1569675635@antos>
> Antonin Houska  wrote:
> 
> > Alvaro Herrera  wrote:
> > 
> > > BTW that tli_p business to the openSegment callback is horribly
> > > inconsistent.  Some callers accept a NULL tli_p, others will outright
> > > crash, even though the API docs say that the callback must determine the
> > > timeline.  This is made more complicated by us having the TLI in "seg"
> > > also.  Unless I misread, the problem is again that the walsender code is
> > > doing nasty stuff with globals (endSegNo).  As a very minor stylistic
> > > point, we prefer to have out params at the end of the signature.
> > 
> > XLogRead() tests for NULL so it should not crash but I don't insist on doing
> > it this way. XLogRead() actually does not have to care whether the "open
> > segment callback" determines the TLI or not, so it (XLogRead) can always
> > receive a valid pointer to seg.ws_tli.
> 
> This is actually wrong - seg.ws_tli is not always the correct value to
> pass. If seg.ws_tli refers to the segment from which data was read last time,
> then XLogRead() still needs a separate argument to specify from which TLI the
> current call should read. If these two differ, new file needs to be opened.

openSegment represents the file that is *currently* open. XLogRead
needs the TLI of the file *to be* opened. If they differ, then as far
as the logical walsender and pg_waldump are concerned, XLogRead
switches to the new TLI and the new TLI is stored in
openSegment.ws_tli.  So it seems to me that the parameter doesn't
need to be in/out; an "in" parameter is enough.

> The problem of walsender.c is that its implementation of XLogRead() does not
> care about the TLI of the previous read. If the behavior of the new, generic
> implementation should be exactly the same, we need to tell XLogRead() that in
> some cases it also should not compare the current TLI to the previous
> one. That's why I tried to use the NULL pointer, or the InvalidTimeLineID
> earlier.

The physical walsender doesn't switch TLIs, so I don't think that
behavior does any harm (or it simply never fires). openSegment holds
the TLI set at the first call. (Even if a future walsender were to
switch TLIs, that behavior would still be needed.)

> Another approach is to add a boolean argument "check_tli", but that still
> forces caller to pass some (random) value of the tli. The concept of
> InvalidTimeLineID seems to me less disturbing than this.

So I think InvalidTimeLineID is not needed.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Hooks for session start and end, take two

2019-09-30 Thread Michael Paquier
On Mon, Sep 30, 2019 at 02:46:42PM -0300, Fabrízio de Royes Mello wrote:
> Makes sense to me. I became a reviewer and ran all tests (make check &&
> make check-world) and everything is OK. Changed status to "ready for
> committer".

Thanks.  There was an extra problem in the module, hidden in the logs
of the regression tests: the autovacuum launcher has a backend ID
assigned at the moment of the session end hook (not the start), but it
is connected to no database, so I added a check on that in the start
and end hooks of the module.  That's more logical this way anyway, as we
run a transaction in the test, and much better than directly using
GetUserIdAndSecContext(), which would not assert for an invalid user
ID.  And done with e788bd9.
--
Michael


signature.asc
Description: PGP signature


Re: Connections hang indefinitely while taking a gin index's LWLock buffer_content lock(PG10.7)

2019-09-30 Thread Alexander Korotkov
On Mon, Sep 30, 2019 at 10:54 PM Peter Geoghegan  wrote:
>
> On Sun, Sep 29, 2019 at 8:12 AM Alexander Korotkov
>  wrote:
> > I just managed to reproduce this using two sessions on master branch.
> >
> > session 1
> > session 2
>
> Was the involvement of the pending list stuff in Chen's example just a
> coincidence? Can you recreate the problem while eliminating that
> factor (i.e. while setting fastupdate to off)?
>
> Chen's example involved an INSERT that deadlocked against VACUUM --
> not a SELECT. Is this just a coincidence?

Chen wrote:

> Unfortunately the insert process (the one we ran gcore on) held no lwlock;
> it should be another process (we did not fetch its core file) that held the
> lwlock needed by the autovacuum process.

So he caught a backtrace of the INSERT and posted it for information.  But
since the INSERT held no lwlocks, it couldn't have participated in the
deadlock.  It was just a side waiter.

I've rerun my reproduction case and it still deadlocks.  The same
steps, but with the GIN index created with (fastupdate = off).
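
For reference, that just means disabling the pending list via the storage
parameter; the table and column names below are placeholders, not the ones
from the reproduction case:

    create index gin_idx on some_table using gin (some_column)
        with (fastupdate = off);
    -- or, for an existing index:
    alter index gin_idx set (fastupdate = off);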

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: Commit fest 2019-09

2019-09-30 Thread Amit Kapila
On Tue, Oct 1, 2019 at 12:54 AM Alvaro Herrera  wrote:
>
> Hello,
>
> After closing a few patches that have been dead for months, and moving
> the rest to the next commitfest, here's the results:
>
>       statusstring      │ week1 │ week2 │ week3 │ week4 │ final
> ────────────────────────┼───────┼───────┼───────┼───────┼───────
>  Needs review           │   165 │   138 │   116 │   118 │     0
>  Waiting on Author      │    30 │    44 │    51 │    44 │     0
>  Ready for Committer    │    11 │     5 │     8 │    11 │     0
>  Returned with Feedback │     1 │     4 │     5 │     5 │    28
>  Moved to next CF       │     2 │     4 │     4 │     4 │   191
>  Committed              │    14 │    23 │    32 │    34 │    42
>  Rejected               │     1 │     1 │     1 │     1 │     1
>  Withdrawn              │     4 │     9 │    11 │    11 │    12
>
> This means we got 28 patches committed during this month ... which is
> a rather concerningly low figure.
>

Yes, I agree that we can improve these numbers.  However, like others,
I would also like to thank you for your sincere and dedicated efforts in
this CF.  It really helps in moving things forward.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: Memory Accounting

2019-09-30 Thread Tomas Vondra

On Mon, Sep 30, 2019 at 01:34:13PM -0700, Jeff Davis wrote:
> On Sun, 2019-09-29 at 00:22 +0200, Tomas Vondra wrote:
> > Notice that when CLOBBER_FREED_MEMORY is defined, the code first
> > calls wipe_mem and then accesses fields of the (wiped) block.
> > Interestingly enough, the regression tests don't seem to exercise
> > these bits - I've tried adding elog(ERROR) and it still passes. For
> > (2) that's not very surprising because Generation context is only
> > really used in logical decoding (and we don't delete the context I
> > think). Not sure about (1) but it might be because AllocSetReset
> > does the right thing and only leaves behind the keeper block.
> >
> > I'm pretty sure a custom function calling the contexts explicitly
> > would fall over, but I haven't tried.
>
> Fixed.
>
> I tested with some custom use of memory contexts. The reason
> AllocSetDelete() didn't fail before is that most memory contexts use
> the free lists (the list of free memory contexts, not the free list of
> chunks), so you need to specify a non-default minsize in order to
> prevent that and trigger the bug.
>
> AllocSetReset() worked, but it was reading the header of the keeper
> block after wiping the contents of the keeper block. It technically
> worked, because the header of the keeper block was not wiped, but it
> seems more clear to explicitly save the size of the keeper block. In
> AllocSetDelete(), saving the keeper size is required, because it wipes
> the block headers in addition to the contents.

OK, looks good to me now.

> > Oh, and one more thing - this probably needs to add at least some
> > basic explanation of the accounting to src/backend/mmgr/README.
>
> Added.



Well, I meant a couple of paragraphs explaining the motivation and the
relevant trade-offs considered. So I've written a brief summary of the
design as I understand it and pushed it like that. Of course, if you
could proof-read it, that'd be good.

I had a bit of a hard time deciding who to list as a reviewer - this
patch started sometime in ~2015, and it was initially discussed as part
of the larger hashagg effort, with plenty of people discussing various
ancient versions of the patch. In the end I've included just people from
the current thread. If that omits important past reviews, I'm sorry.

For the record, results of the benchmarks I've done over the past couple
of days are in [1]. It includes both the reindex benchmark used by
Robert in 2015, and a regular read-only pgbench. In general, the
overhead of the accounting is pretty much indistinguishable from noise
(at least on those two machines).

In any case, thanks for the perseverance working on this.


[1] https://github.com/tvondra/memory-accounting-benchmarks


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: Two pg_rewind patches (auto generate recovery conf and ensure clean shutdown)

2019-09-30 Thread Paul Guo
>
> BTW in the future if you have two separate patches, please post them in
> separate threads and use separate commitfest items for each, even if
> they have minor conflicts.
>

Sure. Thanks.


Re: PL/pgSQL — "commit" illegal in the executable section of a block statement that has an exception section

2019-09-30 Thread Christophe Pettus



> On Sep 30, 2019, at 15:37, Bryn Llewellyn  wrote:
> I wrote up the problem here:
> 
> https://github.com/yugabyte/yugabyte-db/issues/2464

This is documented; it's the very last line of the page you reference in the 
Github issue:

A transaction cannot be ended inside a block with exception handlers.

Your discussion doesn't answer the specific issue: BEGIN/EXCEPTION/END in 
pl/pgSQL is implemented by savepoints.  What semantics should COMMIT / ROLLBACK 
have inside of that?  Doing a COMMIT / ROLLBACK (at the database level) at that 
point would lose the savepoints, which would break the BEGIN/EXCEPTION/END 
semantics.
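
To make the restriction concrete, here is a minimal sketch (the procedure
name is arbitrary, and the exact error wording may vary by version):

    create procedure demo() language plpgsql as $$
    begin
        commit;   -- raises an error at run time: the enclosing block has an
                  -- EXCEPTION clause, so its body runs inside a subtransaction
    exception when division_by_zero then
        raise notice 'never reached';
    end;
    $$;

    call demo();
    -- ERROR:  cannot commit while a subtransaction is active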

It's not clear to me what the alternative semantics would be.  Can you propose
specific database behavior for a COMMIT or ROLLBACK inside a
BEGIN/EXCEPTION/END block that retains the savepoint behavior of
BEGIN/EXCEPTION/END?
--
-- Christophe Pettus
   x...@thebuild.com





Re: subscriptionCheck failures on nightjar

2019-09-30 Thread Tom Lane
Andrew Dunstan  writes:
> On 9/20/19 6:17 PM, Tom Lane wrote:
>> Dromedary is running the last release of macOS that supports 32-bit
>> hardware, so if we decide to kick that to the curb, I'd either shut
>> down the box or put some newer Linux or BSD variant on it.

> Well, nightjar is on FBSD 9.0 which is oldish. I can replace it before
> long with an 11-stable instance if that's appropriate.

FYI, I've installed FreeBSD 12.0/i386 on that machine and it's
now running buildfarm member florican, using clang with -msse2
(a configuration we had no buildfarm coverage of before, AFAIK).

I can still boot the macOS installation if anyone is interested
in specific tests in that environment, but I don't intend to run
dromedary on a regular basis anymore.

regards, tom lane




Re: Don't codegen deform code for virtual tuples in expr eval for scan fetch

2019-09-30 Thread Soumyadeep Chakraborty
Awesome! Thanks so much for all the review! :)

--
Soumyadeep


Re: SSL tests failing for channel_binding with OpenSSL <= 1.0.1

2019-09-30 Thread Michael Paquier
On Mon, Sep 30, 2019 at 11:08:20AM -0700, Jeff Davis wrote:
> On Mon, 2019-09-30 at 09:37 -0400, Tom Lane wrote:
>> The committed fix looks odd: isn't the number of executed tests the
>> same in both code paths?  (I didn't try it yet.)
>
> test_connect_fails actually runs two tests, one for the failing exit
> code and one for the error message.

Yes.  The committed code still works as I would expect.  With OpenSSL
<= 1.0.1, I get 10 tests, and 9 with OpenSSL >= 1.0.2.  You can check
the difference from test 5 "SCRAM with SSL and channel_binding=require".
--
Michael


signature.asc
Description: PGP signature


Re: Don't codegen deform code for virtual tuples in expr eval for scan fetch

2019-09-30 Thread Andres Freund
Hi,

On 2019-09-30 09:14:45 -0700, Soumyadeep Chakraborty wrote:
> I don't feel very strongly about the changes I proposed.
> 
> > > I completely agree, that was an important consideration.
> > >
> > > I had some purely cosmetic suggestions:
> > > 1. Rename ExecComputeSlotInfo to eliminate the need for the asserts.
> >
> > How does renaming it do so? I feel like the asserts are a good idea
> > independent of anything else?
> 
> I felt that encoding the restriction that the function is meant to be called
> only in the context of fetch operations in the function name itself
> ensured that we don't call it from a non-fetch operation - something the
> asserts within ExecComputeSlotInfo() are guarding against.
> 
> >
> > > 2. Extract return value to a bool variable for slightly better
> > > readability.
> >
> > To me that seems clearly worse. The variable doesn't add anything, but
> > needing to track more state.
> 
> Agreed.

I pushed this to master. Thanks for your contribution!

Regards,

Andres




PL/pgSQL — "commit" illegal in the executable section of a block statement that has an exception section

2019-09-30 Thread Bryn Llewellyn
I work for YugaByte, Inc (www.yugabyte.com).
YugabyteDB re-uses the source code that implements the “upper half” of 
PostgreSQL Version 11.2. See here:

https://blog.yugabyte.com/distributed-postgresql-on-a-google-spanner-architecture-query-layer/

This means that the PostgreSQL problem tracked by the issue I describe below
affects users of YugabyteDB too.

I wrote up the problem here:

https://github.com/yugabyte/yugabyte-db/issues/2464 


Please read my account and comment.

Peter Eisentraut, I hope that you’ll read this. I’m told that you're the
authority on transaction control in PL/pgSQL.

Re: Commit fest 2019-09

2019-09-30 Thread David Steele

On 9/30/19 5:26 PM, David Fetter wrote:
> Thanks for doing all this work!


+1!

--
-David
da...@pgmasters.net




Value of Transparent Data Encryption (TDE)

2019-09-30 Thread Bruce Momjian
The plan for full-cluster Transparent Data Encryption (TDE) is here:

https://wiki.postgresql.org/wiki/Transparent_Data_Encryption#TODO_for_Full-Cluster_Encryption

The values it has, I think, are:

*  encrypts data for anyone with read-access to the file system (but not
   memory)

   *  I think write access would allow access to the encryption keys
      by modifying postgresql.conf or other files

   *  This is particularly useful if the storage is remote

*  encrypts non-logical/non-pg_dump-like backups

*  fulfills several security compliance requirements

*  encrypts storage

*  perhaps easier to implement than file system encryption

Is that accurate?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: Commit fest 2019-09

2019-09-30 Thread David Fetter
On Mon, Sep 30, 2019 at 04:24:15PM -0300, Alvaro Herrera wrote:
> Hello,
> 
> After closing a few patches that have been dead for months, and moving
> the rest to the next commitfest, here's the results:
> 
>       statusstring      │ week1 │ week2 │ week3 │ week4 │ final
> ────────────────────────┼───────┼───────┼───────┼───────┼───────
>  Needs review           │   165 │   138 │   116 │   118 │     0
>  Waiting on Author      │    30 │    44 │    51 │    44 │     0
>  Ready for Committer    │    11 │     5 │     8 │    11 │     0
>  Returned with Feedback │     1 │     4 │     5 │     5 │    28
>  Moved to next CF       │     2 │     4 │     4 │     4 │   191
>  Committed              │    14 │    23 │    32 │    34 │    42
>  Rejected               │     1 │     1 │     1 │     1 │     1
>  Withdrawn              │     4 │     9 │    11 │    11 │    12
> 
> This means we got 28 patches committed during this month ... which is
> a rather concerningly low figure.  I really hope we can do much better
> in the upcoming three commitfests -- at least, we're not going to have
> a stable release at the same time.
> 
> And with that, I've closed the commitfest.  I know there's still
> technically a few hours left ... but if anyone wants to commit one more
> patch, they can mark it committed in the next commitfest without
> remorse.
> 
> Thanks everyone,

Thanks for doing all this work!

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Transparent Data Encryption (TDE) and encrypted files

2019-09-30 Thread Bruce Momjian
For full-cluster Transparent Data Encryption (TDE), the current plan is
to encrypt all heap and index files, WAL, and all pgsql_tmp (work_mem
overflow).  The plan is:


https://wiki.postgresql.org/wiki/Transparent_Data_Encryption#TODO_for_Full-Cluster_Encryption

We don't see much value to encrypting vm, fsm, pg_xact, pg_multixact, or
other files.  Is that correct?  Do any other PGDATA files contain user
data?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




msys2 is missing pexports

2019-09-30 Thread Andrew Dunstan

We rely on pexports to extract exported symbols from DLL files (e.g. for
linking in PLs) when building with mingw. However, this program isn't
present in msys2. Instead the approved way is apparently to call
"gendef" from the appropriate toolset (e.g. /mingw64/bin). I have worked
around this on my new test machine by adding a script like this in
/usr/bin/pexports:


#!/bin/sh
gendef - "$@"


However, requiring that is a bit ugly. Instead I think we should do
something like the attached.


I would not be surprised if we need to test the msys version elsewhere
as time goes on, so this would stand us in good stead if we do.


cheers


andrew


-- 
Andrew Dunstan            https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

diff --git a/configure b/configure
index b3c92764be..57d634d501 100755
--- a/configure
+++ b/configure
@@ -649,6 +649,7 @@ MSGFMT
 PG_CRC32C_OBJS
 CFLAGS_ARMV8_CRC32C
 CFLAGS_SSE42
+msys_version
 have_win32_dbghelp
 LIBOBJS
 UUID_LIBS
@@ -16082,7 +16083,9 @@ esac
 fi
 
 # Win32 (really MinGW) support
+msys_version=0
 if test "$PORTNAME" = "win32"; then
+  msys_version=`uname -r | sed 's/..*//'`
   for ac_func in _configthreadlocale
 do :
   ac_fn_c_check_func "$LINENO" "_configthreadlocale" "ac_cv_func__configthreadlocale"
@@ -16185,6 +16188,7 @@ else
 
 fi
 
+
 # Cygwin needs only a bit of that
 if test "$PORTNAME" = "cygwin"; then
   case " $LIBOBJS " in
diff --git a/configure.in b/configure.in
index 0d16c1a971..692ea2fabf 100644
--- a/configure.in
+++ b/configure.in
@@ -1768,7 +1768,9 @@ if test "$PORTNAME" = "win32"; then
 fi
 
 # Win32 (really MinGW) support
+msys_version=0
 if test "$PORTNAME" = "win32"; then
+  msys_version=`uname -r | sed 's/[.].*//'`
   AC_CHECK_FUNCS(_configthreadlocale)
   AC_REPLACE_FUNCS(gettimeofday)
   AC_LIBOBJ(dirmod)
@@ -1792,6 +1794,7 @@ if test x"$pgac_minidump_type" = x"yes" ; then
 else
   AC_SUBST(have_win32_dbghelp,no)
 fi
+AC_SUBST(msys_version)
 
 # Cygwin needs only a bit of that
 if test "$PORTNAME" = "cygwin"; then
diff --git a/src/Makefile.global.in b/src/Makefile.global.in
index 2d21068183..2c68ffe1da 100644
--- a/src/Makefile.global.in
+++ b/src/Makefile.global.in
@@ -350,6 +350,12 @@ XGETTEXT = @XGETTEXT@
 GZIP	= gzip
 BZIP2	= bzip2
 
+msys_version = @msys_version@ # 0 = not msys
+ifeq ($(msys_version),1)
+	PEXPORTS = pexports
+else
+	PEXPORTS = "gendef -"
+endif
 
 # Tree-wide build support
 
diff --git a/src/pl/plperl/GNUmakefile b/src/pl/plperl/GNUmakefile
index 9b1c514101..04ba95b5ea 100644
--- a/src/pl/plperl/GNUmakefile
+++ b/src/pl/plperl/GNUmakefile
@@ -48,7 +48,7 @@ lib$(perlwithver).a: $(perlwithver).def
 	dlltool --dllname $(perlwithver).dll --def $(perlwithver).def --output-lib lib$(perlwithver).a
 
 $(perlwithver).def: $(PERLDLL)
-	pexports $^ > $@
+	$(PEXPORTS) $^ > $@
 
 endif # win32
 
diff --git a/src/pl/plpython/Makefile b/src/pl/plpython/Makefile
index 667a74469e..e919d17fc8 100644
--- a/src/pl/plpython/Makefile
+++ b/src/pl/plpython/Makefile
@@ -69,7 +69,7 @@ libpython${pytverstr}.a: python${pytverstr}.def
 	dlltool --dllname python${pytverstr}.dll --def python${pytverstr}.def --output-lib libpython${pytverstr}.a
 
 python${pytverstr}.def:
-	pexports $(PYTHONDLL) > $@
+	$(PEXPORTS) $(PYTHONDLL) > $@
 
 endif # win32
 
diff --git a/src/pl/tcl/Makefile b/src/pl/tcl/Makefile
index 8ee4c3ff15..52dd938182 100644
--- a/src/pl/tcl/Makefile
+++ b/src/pl/tcl/Makefile
@@ -44,7 +44,7 @@ lib$(tclwithver).a: $(tclwithver).def
 	dlltool --dllname $(tclwithver).dll --def $(tclwithver).def --output-lib lib$(tclwithver).a
 
 $(tclwithver).def: $(TCLDLL)
-	pexports $^ > $@
+	$(PEXPORTS) $^ > $@
 
 endif # win32
 


Re: Memory Accounting

2019-09-30 Thread Jeff Davis
On Sun, 2019-09-29 at 00:22 +0200, Tomas Vondra wrote:
> Notice that when CLOBBER_FREED_MEMORY is defined, the code first
> calls wipe_mem and then accesses fields of the (wiped) block.
> Interestingly enough, the regression tests don't seem to exercise
> these bits - I've tried adding elog(ERROR) and it still passes. For
> (2) that's not very surprising because Generation context is only
> really used in logical decoding (and we don't delete the context I
> think). Not sure about (1) but it might be because AllocSetReset
> does the right thing and only leaves behind the keeper block.
>
> I'm pretty sure a custom function calling the contexts explicitly
> would fall over, but I haven't tried.

Fixed.

I tested with some custom use of memory contexts. The reason
AllocSetDelete() didn't fail before is that most memory contexts use
the free lists (the list of free memory contexts, not the free list of
chunks), so you need to specify a non-default minsize in order to
prevent that and trigger the bug.

AllocSetReset() worked, but it was reading the header of the keeper
block after wiping the contents of the keeper block. It technically
worked, because the header of the keeper block was not wiped, but it
seems more clear to explicitly save the size of the keeper block. In
AllocSetDelete(), saving the keeper size is required, because it wipes
the block headers in addition to the contents.

> Oh, and one more thing - this probably needs to add at least some
> basic explanation of the accounting to src/backend/mmgr/README.

Added.

Regards,
Jeff Davis

From 7184df8a928d6c0c6e41d171968a54d46f256a22 Mon Sep 17 00:00:00 2001
From: Jeff Davis 
Date: Fri, 1 Jun 2018 13:35:21 -0700
Subject: [PATCH] Memory accounting at block level.

Track the memory allocated to a memory context in the form of blocks,
and introduce a new function MemoryContextMemAllocated() to report it
to the caller in bytes. This does not track individual chunks.
---
 src/backend/utils/mmgr/README   |  4 +++
 src/backend/utils/mmgr/aset.c   | 38 +
 src/backend/utils/mmgr/generation.c | 19 ---
 src/backend/utils/mmgr/mcxt.c   | 24 ++
 src/backend/utils/mmgr/slab.c   | 10 
 src/include/nodes/memnodes.h|  1 +
 src/include/utils/memutils.h|  1 +
 7 files changed, 93 insertions(+), 4 deletions(-)

diff --git a/src/backend/utils/mmgr/README b/src/backend/utils/mmgr/README
index 7e6541d0dee..bd5fa6d8e53 100644
--- a/src/backend/utils/mmgr/README
+++ b/src/backend/utils/mmgr/README
@@ -23,6 +23,10 @@ The basic operations on a memory context are:
 * reset a context (free all memory allocated in the context, but not the
   context object itself)
 
+* inquire about the total amount of memory allocated to the context
+  (the raw memory from which the context allocates chunks; not the
+  chunks themselves)
+
 Given a chunk of memory previously allocated from a context, one can
 free it or reallocate it larger or smaller (corresponding to standard C
 library's free() and realloc() routines).  These operations return memory
diff --git a/src/backend/utils/mmgr/aset.c b/src/backend/utils/mmgr/aset.c
index 6b63d6f85d0..90f370570fe 100644
--- a/src/backend/utils/mmgr/aset.c
+++ b/src/backend/utils/mmgr/aset.c
@@ -458,6 +458,9 @@ AllocSetContextCreateInternal(MemoryContext parent,
 parent,
 name);
 
+			((MemoryContext) set)->mem_allocated =
+set->keeper->endptr - ((char *) set);
+
 			return (MemoryContext) set;
 		}
 	}
@@ -546,6 +549,8 @@ AllocSetContextCreateInternal(MemoryContext parent,
 		parent,
 		name);
 
+	((MemoryContext) set)->mem_allocated = firstBlockSize;
+
 	return (MemoryContext) set;
 }
 
@@ -566,6 +571,7 @@ AllocSetReset(MemoryContext context)
 {
 	AllocSet	set = (AllocSet) context;
 	AllocBlock	block;
+	Size		keepersize = set->keeper->endptr - ((char *) set);
 
 	AssertArg(AllocSetIsValid(set));
 
@@ -604,6 +610,8 @@ AllocSetReset(MemoryContext context)
 		else
 		{
 			/* Normal case, release the block */
+			context->mem_allocated -= block->endptr - ((char*) block);
+
 #ifdef CLOBBER_FREED_MEMORY
 			wipe_mem(block, block->freeptr - ((char *) block));
 #endif
@@ -612,6 +620,8 @@ AllocSetReset(MemoryContext context)
 		block = next;
 	}
 
+	Assert(context->mem_allocated == keepersize);
+
 	/* Reset block size allocation sequence, too */
 	set->nextBlockSize = set->initBlockSize;
 }
@@ -628,6 +638,7 @@ AllocSetDelete(MemoryContext context)
 {
 	AllocSet	set = (AllocSet) context;
 	AllocBlock	block = set->blocks;
+	Size		keepersize = set->keeper->endptr - ((char *) set);
 
 	AssertArg(AllocSetIsValid(set));
 
@@ -683,6 +694,9 @@ AllocSetDelete(MemoryContext context)
 	{
 		AllocBlock	next = block->next;
 
+		if (block != set->keeper)
+			context->mem_allocated -= block->endptr - ((char *) block);
+
 #ifdef CLOBBER_FREED_MEMORY
 		wipe_mem(block, block->freeptr - 

Re: pgsql: Implement jsonpath .datetime() method

2019-09-30 Thread Robert Haas
On Sun, Sep 29, 2019 at 10:30 AM Alexander Korotkov
 wrote:
> So, jsonpath behaves like 100 is not greater than 2020.  This
> looks like plain false.  And user can't expect that unless she is
> familiar with our particular issues.  Now I got opinion  that such
> errors shouldn't be suppressed.  We can't suppress *every* error.  If
> trying to do this, we can come to an idea to suppress OOM error and
> return garbage then, which is obviously ridiculous.  Opinions?

I don't know enough about jsonpath to have a view on specifically
which errors ought to be suppressed, but I agree that it's probably
not all of them. In fact, I'd go so far as to say that thinking about
it in terms of error suppression is probably not the right approach in
the first place. Rather, you want to ask what behavior you're trying
to create.

For example, if I'm trying to write a function that takes a string as
input and returns JSON, where the result is formatted as a number if
possible or a string otherwise, I might want access at the C level to
the guts of numeric_in, with all parsing errors returned rather than
thrown. But it would be silly to suppress an out-of-memory condition,
because that doesn't help the caller. The caller wants to know whether
the thing can be parsed as a number or not, and that has nothing to do
with whether we're out of memory, so an out-of-memory error should
still be thrown.
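
For what it's worth, that behavior is easy to sketch at the SQL level with
plpgsql, trapping only the parse-related conditions so that anything else
(including out-of-memory) still propagates.  This is only an illustration of
the desired behavior, not the C-level interface being discussed, and the
function name is made up:

    create function to_json_number_or_string(s text) returns jsonb
    language plpgsql as $$
    begin
        -- if the string parses as a number, return a JSON number
        return to_jsonb(s::numeric);
    exception
        when invalid_text_representation or numeric_value_out_of_range then
            -- otherwise fall back to a JSON string
            return to_jsonb(s);
    end;
    $$;

    select to_json_number_or_string('42');            -- 42
    select to_json_number_or_string('not a number');  -- "not a number"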

In this case here, it seems to me that you should similarly start by
defining the behavior you're trying to create. Unless that's clearly
defined, deciding which errors to suppress may be difficult.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Connections hang indefinitely while taking a gin index's LWLock buffer_content lock(PG10.7)

2019-09-30 Thread Peter Geoghegan
On Sun, Sep 29, 2019 at 8:12 AM Alexander Korotkov
 wrote:
> I just managed to reproduce this using two sessions on master branch.
>
> session 1
> session 2

Was the involvement of the pending list stuff in Chen's example just a
coincidence? Can you recreate the problem while eliminating that
factor (i.e. while setting fastupdate to off)?

Chen's example involved an INSERT that deadlocked against VACUUM --
not a SELECT. Is this just a coincidence?

-- 
Peter Geoghegan




Re: Revert back to standard AC_STRUCT_TIMEZONE Autoconf macro

2019-09-30 Thread Tom Lane
Peter Eisentraut  writes:
> Instead of AC_STRUCT_TIMEZONE we use our own variant called
> PGAC_STRUCT_TIMEZONE that checks for tzname even if other variants were
> found first.  But since 63bd0db12199c5df043e1dea0f2b574f622b3a4c we
> don't use tzname anymore, so we don't need this anymore.

Hmm.  I wonder if we need AC_STRUCT_TIMEZONE either?  Seems like
we should only be using our own struct pg_tm.  If we could get
rid of that configure macro altogether, we could remove some dubious
junk like plpython.h's "#undef HAVE_TZNAME".

regards, tom lane




Re: Connections hang indefinitely while taking a gin index's LWLock buffer_content lock(PG10.7)

2019-09-30 Thread Peter Geoghegan
On Mon, Sep 30, 2019 at 11:00 AM Alexander Korotkov
 wrote:
> Thank you for pointing.  I remember this thread, but don't remember
> details.  I'll reread it.

I think that doing this now would be a good idea:

"""
Defensively checking the page type (pending, posting, etc) within
scanPendingInsert() would be a good start. That's already something
that we do for posting trees. If we're following the same rules as
posting trees (which sounds like a good idea), then we should have the
same defenses. Same applies to using elogs within ginInsertCleanup()
-- we should promote those Assert()s to elog()s.
"""

In other words, ginInsertCleanup() should have defensive "Page types
match?" checks that are similar to the existing checks in
ginStepRight(). In both cases we're stepping right while coupling
locks, to avoid concurrent page deletion.

> > > Locking from right to left is clearly wrong.  It could deadlock with
> > > concurrent ginStepRight(), which locks from left to right.  I expect
> > > this happened in your case.  I'm going to reproduce this and fix.

> Frankly speaking, I'm not very happy with the special version of B-tree
> that is built into GIN.  This version of B-tree lacks high keys.
> AFAIR, because of the lack of high keys, we can't implement the same
> concurrency model as nbtree.  Instead we have to do super-locking for
> page deletion and so on.  That looks ridiculous to me.  I think in
> the future we should somehow reimplement GIN on top of nbtree.

I think that that could work on top of the new nbtree posting list
stuff, provided that it was acceptable to not use posting list
compression in the main tree -- the way that posting list splits work
there needs to be able to think about free space in a very simple way
that is broken even by GIN's varbyte compression. Compression could
still be used in posting trees, though.

> But we have to provide somewhat less radical fixes for our stable
> releases.  In particular, I believe the patch I've posted in this thread
> makes the situation better, not worse.  That is, it fixes one bug without
> introducing more bugs.  But I'm going to analyze this further and
> document GIN concurrency better in the README.  Probably I'll spot
> more details.

Thanks.
-- 
Peter Geoghegan




Re: Commit fest 2019-09

2019-09-30 Thread Alvaro Herrera
Hello,

After closing a few patches that have been dead for months, and moving
the rest to the next commitfest, here's the results:

      statusstring      │ week1 │ week2 │ week3 │ week4 │ final
────────────────────────┼───────┼───────┼───────┼───────┼───────
 Needs review           │   165 │   138 │   116 │   118 │     0
 Waiting on Author      │    30 │    44 │    51 │    44 │     0
 Ready for Committer    │    11 │     5 │     8 │    11 │     0
 Returned with Feedback │     1 │     4 │     5 │     5 │    28
 Moved to next CF       │     2 │     4 │     4 │     4 │   191
 Committed              │    14 │    23 │    32 │    34 │    42
 Rejected               │     1 │     1 │     1 │     1 │     1
 Withdrawn              │     4 │     9 │    11 │    11 │    12

This means we got 28 patches committed during this month ... which is
a rather concerningly low figure.  I really hope we can do much better
in the upcoming three commitfests -- at least, we're not going to have
a stable release at the same time.

And with that, I've closed the commitfest.  I know there's still
technically a few hours left ... but if anyone wants to commit one more
patch, they can mark it committed in the next commitfest without
remorse.

Thanks everyone,

-- 
Álvaro Herrera            https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Revert back to standard AC_STRUCT_TIMEZONE Autoconf macro

2019-09-30 Thread Peter Eisentraut
Instead of AC_STRUCT_TIMEZONE we use our own variant called
PGAC_STRUCT_TIMEZONE that checks for tzname even if other variants were
found first.  But since 63bd0db12199c5df043e1dea0f2b574f622b3a4c we
don't use tzname anymore, so we don't need this anymore.

The attached patches revert back to the standard AC_STRUCT_TIMEZONE
macro and do some related cleanup.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 78597561e3f8e9119d6fd07e4106cc780e58507b Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Mon, 30 Sep 2019 20:50:16 +0200
Subject: [PATCH 1/2] Remove use of deprecated Autoconf define

Change from HAVE_TM_ZONE to HAVE_STRUCT_TM_TM_ZONE.
---
 src/interfaces/ecpg/pgtypeslib/dt_common.c | 4 ++--
 src/interfaces/ecpg/pgtypeslib/timestamp.c | 8 
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/src/interfaces/ecpg/pgtypeslib/dt_common.c 
b/src/interfaces/ecpg/pgtypeslib/dt_common.c
index e71defaa66..29c1117546 100644
--- a/src/interfaces/ecpg/pgtypeslib/dt_common.c
+++ b/src/interfaces/ecpg/pgtypeslib/dt_common.c
@@ -995,7 +995,7 @@ abstime2tm(AbsoluteTime _time, int *tzp, struct tm *tm, 
char **tzn)
tm->tm_sec = tx->tm_sec;
tm->tm_isdst = tx->tm_isdst;
 
-#if defined(HAVE_TM_ZONE)
+#if defined(HAVE_STRUCT_TM_TM_ZONE)
tm->tm_gmtoff = tx->tm_gmtoff;
tm->tm_zone = tx->tm_zone;
 
@@ -1041,7 +1041,7 @@ abstime2tm(AbsoluteTime _time, int *tzp, struct tm *tm, 
char **tzn)
}
else
tm->tm_isdst = -1;
-#else  /* not (HAVE_TM_ZONE || 
HAVE_INT_TIMEZONE) */
+#else  /* not 
(HAVE_STRUCT_TM_TM_ZONE || HAVE_INT_TIMEZONE) */
if (tzp != NULL)
{
/* default to UTC */
diff --git a/src/interfaces/ecpg/pgtypeslib/timestamp.c 
b/src/interfaces/ecpg/pgtypeslib/timestamp.c
index e830ee737e..2be151f7e6 100644
--- a/src/interfaces/ecpg/pgtypeslib/timestamp.c
+++ b/src/interfaces/ecpg/pgtypeslib/timestamp.c
@@ -100,7 +100,7 @@ timestamp2tm(timestamp dt, int *tzp, struct tm *tm, fsec_t 
*fsec, const char **t
int64   dDate,
date0;
int64   time;
-#if defined(HAVE_TM_ZONE) || defined(HAVE_INT_TIMEZONE)
+#if defined(HAVE_STRUCT_TM_TM_ZONE) || defined(HAVE_INT_TIMEZONE)
time_t  utime;
struct tm  *tx;
 #endif
@@ -134,7 +134,7 @@ timestamp2tm(timestamp dt, int *tzp, struct tm *tm, fsec_t 
*fsec, const char **t
 */
if (IS_VALID_UTIME(tm->tm_year, tm->tm_mon, tm->tm_mday))
{
-#if defined(HAVE_TM_ZONE) || defined(HAVE_INT_TIMEZONE)
+#if defined(HAVE_STRUCT_TM_TM_ZONE) || defined(HAVE_INT_TIMEZONE)
 
utime = dt / USECS_PER_SEC +
((date0 - date2j(1970, 1, 1)) * 
INT64CONST(86400));
@@ -147,7 +147,7 @@ timestamp2tm(timestamp dt, int *tzp, struct tm *tm, fsec_t 
*fsec, const char **t
tm->tm_min = tx->tm_min;
tm->tm_isdst = tx->tm_isdst;
 
-#if defined(HAVE_TM_ZONE)
+#if defined(HAVE_STRUCT_TM_TM_ZONE)
tm->tm_gmtoff = tx->tm_gmtoff;
tm->tm_zone = tx->tm_zone;
 
@@ -159,7 +159,7 @@ timestamp2tm(timestamp dt, int *tzp, struct tm *tm, fsec_t 
*fsec, const char **t
if (tzn != NULL)
*tzn = TZNAME_GLOBAL[(tm->tm_isdst > 0)];
 #endif
-#else  /* not (HAVE_TM_ZONE || 
HAVE_INT_TIMEZONE) */
+#else  /* not 
(HAVE_STRUCT_TM_TM_ZONE || HAVE_INT_TIMEZONE) */
*tzp = 0;
/* Mark this as *no* time zone available */
tm->tm_isdst = -1;
-- 
2.23.0

From 5fb3fb899b74f9eb8ba63c23da8a65582b5b2eb0 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Mon, 30 Sep 2019 21:12:58 +0200
Subject: [PATCH 2/2] Revert back to standard AC_STRUCT_TIMEZONE Autoconf macro

We used to use our own variant of this called PGAC_STRUCT_TIMEZONE
that checked for tzname even if other variants were found first.  But
since 63bd0db12199c5df043e1dea0f2b574f622b3a4c we don't use tzname
anymore, so we don't need this anymore.
---
 config/c-library.m4|  33 
 configure  | 300 +++--
 configure.in   |   2 +-
 src/include/pg_config.h.in |   7 +-
 4 files changed, 163 insertions(+), 179 deletions(-)

diff --git a/config/c-library.m4 b/config/c-library.m4
index 6f2b0fbb4e..a8fa6edd26 100644
--- a/config/c-library.m4
+++ b/config/c-library.m4
@@ -23,39 +23,6 @@ if test x"$pgac_cv_var_int_timezone" = xyes ; then
 fi])# PGAC_VAR_INT_TIMEZONE
 
 
-# PGAC_STRUCT_TIMEZONE
-# --
-# Figure out how to get the current 

Re: Auxiliary Processes and MyAuxProc

2019-09-30 Thread Mike Palmiotto
On Thu, Sep 26, 2019 at 6:03 PM Mike Palmiotto
 wrote:
>
> On Thu, Sep 26, 2019 at 10:56 AM Alvaro Herrera
>  wrote:
> 
> >
> > Well, I think it would be easier to manage as split patches, yeah.
> > I think it'd be infrastructure that needs to be carefully reviewed,
> > while the other ones are mostly boilerplate.  If I were the committer
> > for it, I would push that initial patch first immediately followed by
> > conversion of some process that's heavily exercised in buildfarm, wait
> > until lack of trouble is evident, followed by a trickle of pushes to
> > adapt the other processes.
>
> Thanks for the feedback! I've rebased and tested on my F30 box with
> and without EXEC_BACKEND. Just working on splitting out the patches
> now and will post the new patchset as soon as that's done (hopefully
> sometime tomorrow).

Attached is the reworked and rebased patch set. I put the hook on top
and a separate commit for each process type. Note that avworker and
avlauncher were intentionally left together. Let me know if you think
those should be split out as well.

Tested again with and without EXEC_BACKEND. Note that you'll need to
set randomize_va_space to 0 to disable ASLR in order to avoid
occasional failure in the EXEC_BACKEND case.

Please let me know if anything else jumps out to you.

Thanks,
-- 
Mike Palmiotto
https://crunchydata.com
From 205ea86dde898f7edac327d46b2b43b04721aadc Mon Sep 17 00:00:00 2001
From: Mike Palmiotto 
Date: Mon, 30 Sep 2019 14:42:53 -0400
Subject: [PATCH 7/8] Add backends to process centralization

---
 src/backend/postmaster/postmaster.c   | 309 +-
 src/include/postmaster/fork_process.h |   5 +
 src/include/postmaster/postmaster.h   |   1 -
 3 files changed, 160 insertions(+), 155 deletions(-)

diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 8f862fcd64..b55cc4556d 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -144,6 +144,7 @@ static Backend *ShmemBackendArray;
 
 BackgroundWorker *MyBgworkerEntry = NULL;
 RegisteredBgWorker *CurrentBgWorker = NULL;
+static Backend	   *MyBackend;
 static int			child_errno;
 
 /* The socket number we are listening for connections on */
@@ -356,7 +357,6 @@ static void BackendInitialize(Port *port);
 static void BackendRun(Port *port) pg_attribute_noreturn();
 static void ExitPostmaster(int status) pg_attribute_noreturn();
 static int	ServerLoop(void);
-static int	BackendStartup(Port *port);
 static int	ProcessStartupPacket(Port *port, bool secure_done);
 static void SendNegotiateProtocolVersion(List *unrecognized_protocol_options);
 static void processCancelRequest(Port *port, void *pkt);
@@ -409,7 +409,6 @@ typedef struct
 } win32_deadchild_waitinfo;
 #endif			/* WIN32 */
 
-static pid_t backend_forkexec(Port *port);
 static pid_t internal_forkexec(int argc, char *argv[]);
 
 /* Type for a socket that can be inherited to a client process */
@@ -505,6 +504,7 @@ static void ShmemBackendArrayRemove(Backend *bn);
 #define pgarch_start()			StartChildProcess(PgArchiverFork)
 #define SysLogger_Start()		StartChildProcess(SysLoggerFork)
 #define do_start_bgworker()		StartChildProcess(BgWorkerFork)
+#define BackendStartup()		StartChildProcess(BackendFork)
 
 /* Macros to check exit status of a child process */
 #define EXIT_STATUS_0(st)  ((st) == 0)
@@ -524,6 +524,141 @@ HANDLE		PostmasterHandle;
 
 static Port *ConnProcPort = NULL;
 
+/*
+ *	 BackendMain
+ *
+ *	 Child code when forking a Backend.
+ */
+static void BackendMain(pg_attribute_unused() int argc, pg_attribute_unused() char *argv[])
+{
+	/*
+	 * Perform additional initialization and collect startup packet.
+	 *
+	 * We want to do this before InitProcess() for a couple of reasons: 1.
+	 * so that we aren't eating up a PGPROC slot while waiting on the
+	 * client. 2. so that if InitProcess() fails due to being out of
+	 * PGPROC slots, we have already initialized libpq and are able to
+	 * report the error to the client.
+	 */
+	BackendInitialize(ConnProcPort);
+
+#ifdef EXEC_BACKEND
+	shmemSetup(false);
+#endif
+
+	/* And run the backend */
+	BackendRun(ConnProcPort);		/* does not return */
+}
+
+/*
+ *	 BackendPostmasterMain
+ *
+ *	 Parent code when forking a Backend.
+ */
+static void BackendPostmasterMain(pg_attribute_unused() int argc, pg_attribute_unused() char *argv[])
+{
+	/* in parent, successful fork */
+	ereport(DEBUG2,
+			(errmsg_internal("forked new backend, pid=%d socket=%d",
+			 (int) MyChildProcPid, (int) MyProcPort->sock)));
+
+	/*
+	 * Everything's been successful, it's safe to add this backend to our list
+	 * of backends.
+	 */
+	MyBackend->pid = MyChildProcPid;
+	MyBackend->bkend_type = BACKEND_TYPE_NORMAL;	/* Can change later to WALSND */
+	dlist_push_head(&BackendList, &MyBackend->elem);
+
+#ifdef EXEC_BACKEND
+	if (!MyBackend->dead_end)
+		ShmemBackendArrayAdd(MyBackend);
+#endif
+}
+
+/*
+ *	 BackendFailFork
+ *
+ *	 Backend cleanup in case a failure occurs forking a new 

Re: Commit fest 2019-09

2019-09-30 Thread Alvaro Herrera
On 2019-Sep-30, Alvaro Herrera wrote:

> I've closed the following items as Returned with Feedback.  These were
> all Waiting-on-Author for a long time, with no new versions posted.  In
> most cases these are still patches that appear to be useful ... but if
> they're not kept updated to feedback, they're clutter in the CF system.

Some more returned with feedback:

 patch_id │       author       │ name
──────────┼────────────────────┼─────────────────────────────
     1294 │ Ildus Kurbangaliev │ Custom compression methods
     1695 │ Takeshi Ideriha    │ Global shared meta cache
     2166 │ Takeshi Ideriha    │ Shared memory context
     2043 │ Jeff Janes         │ Bloom index cost model

-- 
Álvaro Herrera            https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Connections hang indefinitely while taking a gin index's LWLock buffer_content lock(PG10.7)

2019-09-30 Thread Peter Geoghegan
On Sun, Sep 29, 2019 at 10:38 PM Andrey Borodin  wrote:
> As far as I understand deleted page is stamped with
> GinPageSetDeleteXid(page, ReadNewTransactionId());
> It will not be recycled until that Xid is far behind.

That only gets used within posting tree pages, though.
ginInsertCleanup() is concerned with pending list pages.

> BTW we found a small bug (wraparound) in similar GiST and B-tree 
> implementations.
> Probably, it's there in GIN too.

Probably, but that's much less of a problem to me.

-- 
Peter Geoghegan




Re: Partitioning versus autovacuum

2019-09-30 Thread Greg Stark
Actually -- I'm sorry to follow up to myself (twice) -- but that's
wrong. That Todo item predates the modern partitioning code. It came
from when the partitioned statistics were added for inheritance trees.
The resulting comment almost doesn't make sense any more since it
talks about updates to the parent table and treats them as distinct
from updates to the children.

In any case it's actually not true any more as updates to the parent
table aren't even tracked any more -- see below. My modest proposal is
that we should count any updates that arrive through the parent table
as mods for both the parent and child.

A more ambitious proposal would have updates to the children also
count against the parent somehow but I'm not sure exactly how. And I'm
not sure we shouldn't be updating the parent statistics whenever we
run analyze on a child anyways but again I'm not sure how.

postgres=# create table p (i integer primary key, t text)
partition by range (i) ;
CREATE TABLE
postgres=# create table p0 partition of p for values from (0) to (10);
CREATE TABLE
postgres=# analyze p;
ANALYZE
postgres=# analyze p0;
ANALYZE
postgres=# select pg_stat_get_mod_since_analyze('p'::regclass) as p,
pg_stat_get_mod_since_analyze('p0'::regclass) as p0;
 p | p0
---+----
 0 |  0
(1 row)

postgres=# insert into p values (2);
INSERT 0 1
postgres=# select pg_stat_get_mod_since_analyze('p'::regclass) as p,
pg_stat_get_mod_since_analyze('p0'::regclass) as p0;
 p | p0
---+----
 0 |  1
(1 row)




Re: Commit fest 2019-09

2019-09-30 Thread Alvaro Herrera
I've closed the following items as Returned with Feedback.  These were
all Waiting-on-Author for a long time, with no new versions posted.  In
most cases these are still patches that appear to be useful ... but if
they're not kept updated to feedback, they're clutter in the CF system.

 patch_id │       author       │ name
──────────┼────────────────────┼──────────────────────────────────────────────────────────────────────
     1854 │ Aya Iwata          │ libpq trace log
     2268 │ Ian Barwick        │ psql: add tab completion for \df slash command suffixes
     1796 │ Chris Travers      │ documenting signal handling with readme
     2051 │ Karl Pinc          │ DOC: Document encode() and decode() base64 encoding
     2060 │ takuma hoshiai     │ suppress errors thrown by to_reg*()
     2082 │ Sandro Mani        │ Mingw: Fix import library extension, build actual static libraries
     2095 │ Daniel Gustafsson  │ pg_upgrade version and path checking
     2148 │ Timur Birsh        │ vacuumlo: report the number of large objects going to be removed
     2223 │ Daniel Migowski    │ Adding column "mem_usage" to view pg_prepared_statements
     2028 │ Mike Palmiotto     │ Flexible partition pruning hook
     2108 │ Thomas Munro       │ Parallel-aware file_fdw
      944 │ Nikhil Sontakke    │ Logical decoding of two-phase transactions
     1961 │ Petr Jelínek       │ Synchronizing slots from primary to standby
     2167 │ Peter Eisentraut   │ unlogged sequences
     2124 │ Matwey V. Kornilov │ Introduce spgist quadtree @<(point,circle) operator
     1533 │ Matheus Oliveira   │ Add support for ON UPDATE/DELETE actions on ALTER CONSTRAINT
     2034 │ Takamichi Osumi    │ extension patch of CREATE OR REPLACE TRIGGER

If somebody wants co-authorship for a patch whose author seems
unresponsive, I suggest to ping them about that and run with it if they
don't respond in a reasonable timeframe.

(There were a few others patches Waiting-on-Author, but these didn't
look as stale.  I didn't touch anything in the bugfixes section either.)

-- 
Álvaro Herrera            https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Partitioning versus autovacuum

2019-09-30 Thread Greg Stark
Actually I did just find it in the To-do wiki:

Have autoanalyze of parent tables occur when child tables are modified


   - http://archives.postgresql.org/pgsql-performance/2010-06/msg00137.php


On Mon., Sep. 30, 2019, 1:48 p.m. Greg Stark,  wrote:

> So we now support `ANALYZE partitioned_table` which will gather statistics
> for the main table by gathering stats from all the partitions.
>
> However as far as I can tell autovacuum will never actually trigger this
> analyze. Because we never generate any update records for the parent table
> in the statistics. Have I missed something?
>
> I didn't find any discussion of this in the threads from when partitioning
> was committed but there were a lot of discussions and I could easily have
> missed it.
>
> Is there a story for this? Some way to configure things so that autovacuum
> will analyze partitioned tables?
>
> Or should we look at doing something? Maybe whenever we analyze a child we
> should also update the parent -- and if there's no stats yet run analyze on
> it?
>
> This may be a serious enough problem for users that it may warrant
> backpatching. Not having any stats is resulting in some pretty weird plans
> for us.
>


Re: errbacktrace

2019-09-30 Thread Peter Eisentraut
On 2019-09-27 17:50, Alvaro Herrera wrote:
> On 2019-Sep-13, Alvaro Herrera wrote:
> 
>> On 2019-Aug-20, Peter Eisentraut wrote:
>>
>>> The memory management of that seems too complicated.  The "extra"
>>> mechanism of the check/assign hooks only supports one level of malloc.
>>> Using a List seems impossible.  I don't know if you can safely do a
>>> malloc-ed array of malloc-ed strings either.
>>
>> Here's an idea -- have the check/assign hooks create a different
>> representation, which is a single guc_malloc'ed chunk that is made up of
>> every function name listed in the GUC, separated by \0.  That can be
>> scanned at error time comparing the function name with each piece.
> 
> Peter, would you like me to clean this up for commit, or do you prefer
> to keep authorship and get it done yourself?

If you want to finish it using the idea from your previous message,
please feel free.  I won't get to it this week.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Minimal logical decoding on standbys

2019-09-30 Thread Robert Haas
On Mon, Sep 30, 2019 at 7:35 AM Amit Khandekar  wrote:
> Alright. Attached is the updated patch that splits the file into two
> files, one that does only xmin related testing, and the other test
> file that tests conflict recovery scenarios, and also one scenario
> where drop-database drops the slots on the database on standby.
> Removed get_slot_xmins() and get_node_from_slotname().
> Renamed 'replica' to 'standby'.
> Used node->backup() function instead of pg_basebackup command.
> Renamed $master_slot to $master_slotname, similarly for $standby_slot.

In general, I think this code is getting a lot clearer and easier to
understand in these last few revisions.

Why does create_logical_slot_on_standby include sleep(1)? Does the
test fail if you take that out? If so, it's probably going to fail on
the buildfarm even with that included, because some of the buildfarm
machines are really slow (e.g. because they use CLOBBER_CACHE_ALWAYS,
or because they're running on a shared system with low hardware
specifications and an ancient disk).

Similarly for the sleep(1) just after you VACUUM FREEZE all the databases.

I'm not sure what the point of the wait_for_xmins() stuff is in
019_standby_logical_decoding_conflicts.pl. Isn't that just duplicating
stuff we've already tested in 018?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: SSL tests failing for channel_binding with OpenSSL <= 1.0.1

2019-09-30 Thread Jeff Davis
On Mon, 2019-09-30 at 09:37 -0400, Tom Lane wrote:
> Michael Paquier  writes:
> > On Fri, Sep 27, 2019 at 11:44:57AM +0900, Michael Paquier wrote:
> > > We need to do something similar to c3d41cc for the test, as per
> > > the
> > > attached.  I have tested that with OpenSSL 1.0.1 and 1.0.2 to
> > > stress
> > > both scenarios.
> > > Any objections to this fix?
> > Committed as a12c75a1.
> 
> The committed fix looks odd: isn't the number of executed tests the
> same in both code paths?  (I didn't try it yet.)

test_connect_fails actually runs two tests, one for the failing exit
code and one for the error message.

Regards,
Jeff Davis






Re: Connections hang indefinitely while taking a gin index's LWLock buffer_content lock(PG10.7)

2019-09-30 Thread Alexander Korotkov
On Sun, Sep 29, 2019 at 10:53 PM Peter Geoghegan  wrote:
> On Sun, Sep 29, 2019 at 7:38 AM Alexander Korotkov
>  wrote:
> > Starting from root seems OK for me, because vacuum blocks all
> > concurrent inserts before doing this.  But this needs to be properly
> > documented in readme.
>
> I never got an adequate answer to this closely related question almost
> two years ago:
>
> https://www.postgresql.org/message-id/CAH2-Wz=gtnapzeezqyelov3h1fxpo5xhmrbp6amgeklkv95...@mail.gmail.com
>
> In general, ginInsertCleanup() seems badly designed. Why is it okay
> that there is no nbtree-style distinction between page deletion and
> page recycling?

Thank you for pointing.  I remember this thread, but don't remember
details.  I'll reread it.

> > Locking from right to left is clearly wrong.  It could deadlock with
> > concurrent ginStepRight(), which locks from left to right.  I expect
> > this happened in your case.  I'm going to reproduce this and fix.
>
> I am sick and tired of seeing extremely basic errors like this within
> GIN's locking protocols. Bugs happen, but these are not ordinary bugs.
> They're more or less all a result of giving no thought to the high
> level design. I'm not blaming you for this, or any one person. But
> this is not okay.
>
> Anything around index concurrency needs to be explained in
> excruciating detail, while taking a top-down approach that applies
> general rules (e.g. you can only do lock coupling left to right, or
> bottom to top in nbtree). Anything less than that should be assumed to
> be wrong on general principle.

Frankly speaking, I'm not very happy with the special version of B-tree
that is built into GIN.  This version of B-tree lacks high keys.  AFAIR,
because of the lack of high keys, we can't implement the same
concurrency model as nbtree.  Instead we have to do super-locking for
page deletion and so on.  That looks ridiculous to me.  I think in the
future we should somehow reimplement GIN on top of nbtree.

But we have to provide somewhat less radical fixes for our stable
releases.  In particular, I believe the patch I've posted in this thread
makes the situation better, not worse.  That is, it fixes one bug without
introducing more bugs.  But I'm going to analyze this further and
document GIN concurrency better in the README.  Probably I'll spot
more details.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Partitioning versus autovacuum

2019-09-30 Thread Greg Stark
So we now support `ANALYZE partitioned_table` which will gather statistics
for the main table by gathering stats from all the partitions.

However, as far as I can tell, autovacuum will never actually trigger this
analyze, because we never generate any update records for the parent table
in the statistics. Have I missed something?
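
For concreteness, here is a minimal sketch of the situation (table names
are made up; this only illustrates the behaviour described above):

CREATE TABLE measurements (ts timestamptz, val int) PARTITION BY RANGE (ts);
CREATE TABLE measurements_2019 PARTITION OF measurements
    FOR VALUES FROM ('2019-01-01') TO ('2020-01-01');

-- rows are routed to, and counted against, the leaf partition
INSERT INTO measurements
    SELECT '2019-06-01'::timestamptz + i * interval '1 minute', i
    FROM generate_series(1, 100000) AS i;

-- a manual ANALYZE of the parent works and gathers stats from all partitions
ANALYZE measurements;

-- but the modification counters that drive autoanalyze only accumulate on
-- measurements_2019, so autovacuum will eventually analyze the leaf
-- partition and never the parent "measurements" table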

I didn't find any discussion of this in the threads from when partitioning
was committed but there were a lot of discussions and I could easily have
missed it.

Is there a story for this? Some way to configure things so that autovacuum
will analyze partitioned tables?

Or should we look at doing something? Maybe whenever we analyze a child we
should also update the parent -- and if there are no stats yet, run analyze on
it?

This may be a serious enough problem for users that it may warrant
backpatching. Not having any stats is resulting in some pretty weird plans
for us.


Re: Hooks for session start and end, take two

2019-09-30 Thread Fabrízio de Royes Mello
On Sun, Sep 29, 2019 at 10:29 PM Michael Paquier 
wrote:
>
> This code path can only be taken by normal backends, so that would
> apply, still I don't actually see why we should limit us here on the
> backend side.  If for a reason or another those two code paths begin
> to be taken by a backend with InvalidBackendId, then users of the
> session start/end hook will need to think how to handle it if they
> didn't from the start, which sounds like a good thing to me.
>

Makes sense to me. I became a reviewer and ran all tests (make check &&
make check-world) and everything is OK. Changed the status to "ready for
committer".

Regards,

--
   Fabrízio de Royes Mello Timbira - http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento


Re: Connections hang indefinitely while taking a gin index's LWLock buffer_content lock(PG10.7)

2019-09-30 Thread Alexander Korotkov
On Mon, Sep 30, 2019 at 8:38 AM Andrey Borodin  wrote:
>
> > 29 сент. 2019 г., в 21:27, Alexander Korotkov  
> > написал(а):
> >
> > Patch with fix is attached.  Idea is simple: ginScanToDelete() now
> > keeps exclusive lock on left page eliminating the need to relock it.
> > So, we preserve left-to-right locking order and can't deadlock with
> > ginStepRight().
>
> In this function ginDeletePage(gvs, blkno, 
> BufferGetBlockNumber(me->leftBuffer),...)
> we are going to reread buffer
> lBuffer = ReadBufferExtended(gvs->index, MAIN_FORKNUM, leftBlkno,
>  RBM_NORMAL, gvs->strategy);
> Is it OK?


That's related not only to the left buffer.  We could also pass a buffer
instead of deleteBlkno.  And with some code changes it's also possible
to pass a buffer instead of parentBlkno.  But I decided to keep the code
changes minimal, at least for the backpatch version.  We could apply such
an optimization to master as a separate patch.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: Optimize partial TOAST decompression

2019-09-30 Thread Tomas Vondra

On Mon, Sep 30, 2019 at 09:20:22PM +0500, Andrey Borodin wrote:




30 сент. 2019 г., в 20:56, Tomas Vondra  
написал(а):

I mean this:

  /*
   * Use int64 to prevent overflow during calculation.
   */
  compressed_size = (int32) ((int64) rawsize * 9 + 8) / 8;

I'm not very familiar with pglz internals, but I'm a bit puzzled by
this. My first instinct was to compare it to this:

  #define PGLZ_MAX_OUTPUT(_dlen)((_dlen) + 4)

but clearly that's a very different (much simpler) formula. So why
shouldn't pglz_maximum_compressed_size simply use this macro?




compressed_size accounts for possible increase of size during
compression. pglz can consume up to 1 control byte for each 8 bytes of
data in worst case.


OK, but does that actually translate into the formula? We essentially
need to count 8-byte chunks in raw data, and multiply that by 9. Which
gives us something like

  nchunks = ((rawsize + 7) / 8) * 9;

which is not quite what the patch does.


Even if the whole datum is compressed well, the prefix can be compressed
extremely ineffectively. Thus, if you are going to decompress rawsize
bytes, you need at most compressed_size bytes of compressed input.


OK, that explains why we can't use the PGLZ_MAX_OUTPUT macro.
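
To make the bound concrete, here is a small standalone sketch of the
calculation (an illustration only, not the actual PostgreSQL code; the
function name is made up):

#include <stdint.h>
#include <stdio.h>

/*
 * pglz shares one control byte among 8 items, and in the worst case every
 * item is a single literal byte, i.e. 9 bytes of compressed input per 8
 * bytes of raw output.  Reading this many compressed bytes is therefore
 * always enough to reconstruct raw_target bytes.
 */
static int32_t
max_compressed_bytes_needed(int32_t raw_target)
{
	/* use 64-bit arithmetic so values close to INT32_MAX cannot overflow */
	return (int32_t) (((int64_t) raw_target * 9 + 8) / 8);
}

int
main(void)
{
	printf("%d\n", max_compressed_bytes_needed(8));		/* 10 */
	printf("%d\n", max_compressed_bytes_needed(2048));	/* 2305 */
	return 0;
}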

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Proposal: Add more compile-time asserts to expose inconsistencies.

2019-09-30 Thread Andres Freund
Hi, 

On September 30, 2019 10:20:54 AM PDT, Tom Lane  wrote:
>Andres Freund  writes:
>> On 2019-09-19 04:46:27 +, Smith, Peter wrote:
>>> In the attached patch example I have defined a new macro
>>> StaticAssertDecl. A typical usage of it is shown in the relpath.c
>>> file.
>
>> I'm in favor of adding that - in fact, when I was working on adding a
>> static_assert wrapper, I was advocating for only supporting compilers
>> that know static_assert, so we can put these in the global scope. The
>> number of compilers that don't support static_assert is pretty small
>> today, especially compared to 2012, when we added these.
>>
>https://www.postgresql.org/message-id/E1TIW1p-0002yE-AY%40gemulon.postgresql.org
>>
>https://www.postgresql.org/message-id/27446.1349024252%40sss.pgh.pa.us
>> Tom, you were arguing for restricting to file scope to get broader
>> compatibility, are you ok with adding a separate *Decl version?
>
>It could use another look now that we require C99.  I'd be in favor
>of having a decl-level static assert if practical.

What do you think about my proposal further down in the email to rely on extern 
function declarations to have one fallback that works in the relevant scopes 
(not in expressions, but we already treat that differently)? Seems to work on 
common compilers and seems standards-conformant?

Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.




Re: Proposal: Add more compile-time asserts to expose inconsistencies.

2019-09-30 Thread Tom Lane
Andres Freund  writes:
> On 2019-09-19 04:46:27 +, Smith, Peter wrote:
>> In the attached patch example I have defined a new macro
>> StaticAssertDecl. A typical usage of it is shown in the relpath.c
>> file.

> I'm in favor of adding that - in fact, when I was working on adding a
> static_assert wrapper, I was advocating for only supporting compilers
> that know static_assert, so we can put these in the global scope. The
> number of compilers that don't support static_assert is pretty small
> today, especially compared to 2012, when we added these.
> https://www.postgresql.org/message-id/E1TIW1p-0002yE-AY%40gemulon.postgresql.org
> https://www.postgresql.org/message-id/27446.1349024252%40sss.pgh.pa.us
> Tom, you were arguing for restricting to file scope to get broader
> compatibility, are you ok with adding a separate *Decl version?

It could use another look now that we require C99.  I'd be in favor
of having a decl-level static assert if practical.

regards, tom lane




Re: Two pg_rewind patches (auto generate recovery conf and ensure clean shutdown)

2019-09-30 Thread Alvaro Herrera
On 2019-Mar-19, Paul Guo wrote:

> Hello, Postgres hackers,
> 
> Please see the attached patches.

BTW in the future if you have two separate patches, please post them in
separate threads and use separate commitfest items for each, even if
they have minor conflicts.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Proposal: Add more compile-time asserts to expose inconsistencies.

2019-09-30 Thread Andres Freund
Hi,

On 2019-09-19 04:46:27 +, Smith, Peter wrote:
> In the attached patch example I have defined a new macro
> StaticAssertDecl. A typical usage of it is shown in the relpath.c
> file.

I'm in favor of adding that - in fact, when I was working on adding a
static_assert wrapper, I was advocating for only supporting compilers
that know static_assert, so we can put these in the global scope. The
number of compilers that don't support static_assert is pretty small
today, especially compared to 2012, when we added these.

https://www.postgresql.org/message-id/E1TIW1p-0002yE-AY%40gemulon.postgresql.org
https://www.postgresql.org/message-id/27446.1349024252%40sss.pgh.pa.us

Tom, you were arguing for restricting to file scope to get broader
compatibility, are you ok with adding a separate *Decl version?

Or perhaps it's time to just remove the fallback implementation? I think
we'd have to add proper MSVC support, but that seems like something we
should do anyway.

Back then I was wondering about using typedef to emulate static assert
that works both in file and block scope, but that struggles with needing
unique names.

FWIW, the perl5 implementation has precisely that problem. If it's used
in multiple headers (or a header and a .c file), two static asserts may
not be on the same line... - which one will only notice when using an
old compiler.

I wonder if defining the fallback static assert code to something like
  extern void static_assert_func(int static_assert_failed[(condition) ? 1 : 
-1]);
isn't a solution, however. I *think* that's standard C. Seems to work at
least with gcc, clang, msvc, icc.

Re standard: C99's "6.7 Declarations" + 6.7.1 defines 'declaration' to
include extern specifiers and in 6.7.1 5) says "The declaration of an
identifier for a function that has block scope shall have no explicit
storage-class specifier other than extern.".  And "6.8 Statements and
blocks", via "6.8.2 Compound statement" allows declarations in statements.

You can play with a good few compilers at: https://godbolt.org/z/fl0Mzu
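
For illustration, a minimal sketch of that fallback (the macro and
function names are made up, and the real macro would presumably also take
a message argument):

#define StaticAssertDecl(condition) \
	extern void static_assert_func(int static_assert_failed[(condition) ? 1 : -1])

/* usable at file scope ... */
StaticAssertDecl(sizeof(int) >= 4);

int
main(void)
{
	/*
	 * ... and at block scope; repeating it is harmless because it is only
	 * a compatible redeclaration of the same extern function.
	 */
	StaticAssertDecl(sizeof(long) >= sizeof(int));
	return 0;
}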


> The goal was to leave all existing Postgres static assert macros unchanged, 
> but to allow static asserts to be added in the code at file scope without the 
> need for the explicit ct_asserts function.

It'd probably be worthwhile to move many of the current ones.


> Notice, in reality the StaticAssertDecl macro still uses a static function as 
> a wrapper for the StaticAssertStmt,  but now the function is not visible in 
> the source.

I think this implementation is not ok, due to the unique-name issue.

Greetings,

Andres Freund




Re: Two pg_rewind patches (auto generate recovery conf and ensure clean shutdown)

2019-09-30 Thread Alvaro Herrera
OK, I pushed this patch as well as Alexey's test patch.  It all works
for me, and the coverage report shows that we're doing the new thing ...
though only in the case that rewind *is* required.  There is no test to
verify the case where rewind is *not* required.  I guess it'd also be
good to test the case when we throw the new error, if only for
completeness ...

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Building infrastructure for B-Tree deduplication that recognizes when opclass equality is also equivalence

2019-09-30 Thread Anastasia Lubennikova

26.08.2019 14:15, Antonin Houska wrote:

Peter Geoghegan  wrote:


Consumers of this new infrastructure probably won't be limited to the
deduplication feature;

It'd also solve an open problem of the aggregate push-down patch [1], in
particular see the mention of pg_opclass in [2]: the partial aggregate
node below the final join must not put multiple opclass-equal values
which are not byte-wise equal into the same group because some
information needed by WHERE or JOIN/ON condition may be lost this
way. The scale of the numeric type is the most obvious example.


I would like to:

* Get some buy-in on whether or not the precise distinctions I would
like to make are correct for deduplication in particular, and as
useful as possible for other cases that we may need to add later on.

* Figure out the exact interface through which opclass/opfamily
authors can represent that their notion of equality is compatible with
deduplication/compression.

It's not entirely clear to me whether opclass or opfamily should carry
this information. opclass probably makes more sense for index related
problems and the aggregate push-down patch can live with that. I don't
see a particular reason to add any flag to opfamily. (The planner uses
both pg_opclass and pg_opfamily catalogs.)

I think the fact that the aggregate push-down would benefit from this
enhancement should affect choice of the new catalog attribute name,
i.e. it should be not mention words as concrete as "deduplication" or
"compression".



The patch implementing new opclass option is attached.

It adds new attribute pg_opclass.opcisbitwise, which is set to true if 
opclass equality is the same as binary equality.

By default it is true. It is set to false for numeric and float4, float8.

Do anyarray opclasses need special treatment?

New syntax for create opclass is  "CREATE OPERATOR CLASS NOT BITWISE ..."

Any ideas on better names?
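
For illustration, a rough sketch of how the proposed clause might be used
(the opclass name is made up and the operator/function list is only an
example; the point is where NOT BITWISE appears):

CREATE OPERATOR CLASS my_numeric_ops NOT BITWISE FOR TYPE numeric
    USING btree AS
        OPERATOR 1 < ,
        OPERATOR 2 <= ,
        OPERATOR 3 = ,
        OPERATOR 4 >= ,
        OPERATOR 5 > ,
        FUNCTION 1 numeric_cmp(numeric, numeric);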

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

commit 5916d188be1cfff798845720d6e955327aa8c693
Author: Anastasia 
Date:   Mon Sep 30 19:31:40 2019 +0300

Opclass bitwise equality check

diff --git a/doc/src/sgml/ref/create_opclass.sgml b/doc/src/sgml/ref/create_opclass.sgml
index dd5252f..eb2e086 100644
--- a/doc/src/sgml/ref/create_opclass.sgml
+++ b/doc/src/sgml/ref/create_opclass.sgml
@@ -21,7 +21,7 @@ PostgreSQL documentation
 
  
 
-CREATE OPERATOR CLASS name [ DEFAULT ] FOR TYPE data_type
+CREATE OPERATOR CLASS name [ DEFAULT | NOT BITWISE ] FOR TYPE data_type
   USING index_method [ FAMILY family_name ] AS
   {  OPERATOR strategy_number operator_name [ ( op_type, op_type ) ] [ FOR SEARCH | FOR ORDER BY sort_family_name ]
| FUNCTION support_number [ ( op_type [ , op_type ] ) ] function_name ( argument_type [, ...] )
@@ -106,6 +106,18 @@ CREATE OPERATOR CLASS name [ DEFAUL
 

 
+
+NOT BITWISE
+
+ 
+  If present, the operator class equality is not the same as equivalence.
+  For example, two numerics can compare equal but have different scales.
+  Most opclasses implement bitwise equal comparison, alternative behaviour
+  must be set explicitly.
+ 
+
+   
+

 data_type
 
diff --git a/src/backend/commands/opclasscmds.c b/src/backend/commands/opclasscmds.c
index 6a1ccde..bb6a0a7 100644
--- a/src/backend/commands/opclasscmds.c
+++ b/src/backend/commands/opclasscmds.c
@@ -654,6 +654,7 @@ DefineOpClass(CreateOpClassStmt *stmt)
 	values[Anum_pg_opclass_opcintype - 1] = ObjectIdGetDatum(typeoid);
 	values[Anum_pg_opclass_opcdefault - 1] = BoolGetDatum(stmt->isDefault);
 	values[Anum_pg_opclass_opckeytype - 1] = ObjectIdGetDatum(storageoid);
+	values[Anum_pg_opclass_opcisbitwise - 1] = BoolGetDatum(!stmt->isNotBitwise);
 
 	tup = heap_form_tuple(rel->rd_att, values, nulls);
 
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 3432bb9..c2cf06e 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -3785,6 +3785,7 @@ _copyCreateOpClassStmt(const CreateOpClassStmt *from)
 	COPY_NODE_FIELD(datatype);
 	COPY_NODE_FIELD(items);
 	COPY_SCALAR_FIELD(isDefault);
+	COPY_SCALAR_FIELD(isNotBitwise);
 
 	return newnode;
 }
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 18cb014..52e8f0b 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -1607,6 +1607,7 @@ _equalCreateOpClassStmt(const CreateOpClassStmt *a, const CreateOpClassStmt *b)
 	COMPARE_NODE_FIELD(datatype);
 	COMPARE_NODE_FIELD(items);
 	COMPARE_SCALAR_FIELD(isDefault);
+	COMPARE_SCALAR_FIELD(isNotBitwise);
 
 	return true;
 }
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 3f67aaf..45a4f8a 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -590,6 +590,8 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 %type 		hash_partbound
 %type 		hash_partbound_elem
 

Re: FETCH FIRST clause PERCENT option

2019-09-30 Thread Ryan Lambert
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   not tested
Documentation:not tested

The latest patch applies cleanly and passes all tests.  I believe all feedback 
has been addressed except for the one case that can be improved upon in later 
work.  Updating status as ready for committer.

Ryan Lambert

The new status of this patch is: Ready for Committer


Re: Optimize partial TOAST decompression

2019-09-30 Thread Andrey Borodin



> 30 сент. 2019 г., в 20:56, Tomas Vondra  
> написал(а):
> 
> I mean this:
> 
>   /*
>* Use int64 to prevent overflow during calculation.
>*/
>   compressed_size = (int32) ((int64) rawsize * 9 + 8) / 8;
> 
> I'm not very familiar with pglz internals, but I'm a bit puzzled by
> this. My first instinct was to compare it to this:
> 
>   #define PGLZ_MAX_OUTPUT(_dlen)  ((_dlen) + 4)
> 
> but clearly that's a very different (much simpler) formula. So why
> shouldn't pglz_maximum_compressed_size simply use this macro?

compressed_size accounts for possible increase of size during compression. pglz 
can consume up to 1 control byte for each 8 bytes of data in worst case.
Even if the whole datum is compressed well, the prefix can be compressed 
extremely ineffectively. Thus, if you are going to decompress rawsize bytes, 
you need at most compressed_size bytes of compressed input.



Re: Don't codegen deform code for virtual tuples in expr eval for scan fetch

2019-09-30 Thread Soumyadeep Chakraborty
Hi Andres,

I don't feel very strongly about the changes I proposed.

> > I completely agree, that was an important consideration.
> >
> > I had some purely cosmetic suggestions:
> > 1. Rename ExecComputeSlotInfo to eliminate the need for the asserts.
>
> How does renaming it do so? I feel like the asserts are a good idea
> independent of anything else?

I felt that encoding the restriction that the function is meant to be called
only in the context of fetch operations in the function name itself
ensured that we don't call it from a non-fetch operation - something the
asserts within ExecComputeSlotInfo() are guarding against.

>
> > 2. Extract return value to a bool variable for slightly better
> > readability.
>
> To me that seems clearly worse. The variable doesn't add anything, but
> needing to track more state.

Agreed.

--
Soumyadeep


Re: Online checksums patch - once again

2019-09-30 Thread Andres Freund
Hi,

On 2019-09-30 16:59:00 +0200, Magnus Hagander wrote:
> On Mon, Sep 30, 2019 at 2:49 PM Tomas Vondra 
> wrote:
> > IMHO the patch is ready to go - I think the global barrier solves the
> > issue in the previous version, and that's the only problem I'm aware of.
> > So +1 from me to go ahead and push it.

I don't think the global barrier part is necessarily ready. I wrote it
on a flight back from a conference, to allow Magnus to make some
progress. And I don't think it has received meaningful review so far.


> Especially -- Andres, any chance I can bribe you to take another look?

I'll try to take a look.

Greetings,

Andres Freund




Re: SQL/JSON: JSON_TABLE

2019-09-30 Thread Pavel Stehule
Hi

so 28. 9. 2019 v 3:53 odesílatel Nikita Glukhov 
napsal:

> Attached 39th version of the patches rebased onto current master.
>
>
Regression tests fail on my machine - Intel 64-bit Linux, gcc 9.2.1

Comments:

* +<->/* Only XMLTABLE and JSON_TABLE are supported currently */

this comment no longer makes sense and can be removed. It will probably be a
long time before there is a new format besides XML or JSON

* there are 600 new lines in parse_clause.c; maybe this code can be placed
in a new file parse_jsontable.c? parse_clause.c is pretty long already
(json_table has a very complex syntax)

*
+<->if (list_length(ci->passing.values) > 0)
+<->{
+<-><-->ListCell   *exprlc;
+<-><-->ListCell   *namelc;
+

This is an uncommon usage of the list_length function. More common is just "if
(ci->passing.values) {}". Is there any reason for list_length?

* I tested some examples that I found on the net. It works very well. A minor
issue is whitespace handling for the json type. json_table should probably trim
the returned values, because after a value is cut from the document, the
original whitespace loses its meaning. This is not a problem for the jsonb
type, which reduces whitespace on input.

I did only simple tests and I didn't find any issues other than the whitespace
problems for the json type. I'll continue with some deeper tests. Please prepare
documentation; without documentation it is not clear which features are
supported, and I have to test blindly.

Regards

Pavel









> --
> Nikita Glukhov
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company
>


regression.diffs
Description: Binary data


Re: Optimize partial TOAST decompression

2019-09-30 Thread Tomas Vondra

On Fri, Sep 27, 2019 at 01:00:36AM +0200, Tomas Vondra wrote:

On Wed, Sep 25, 2019 at 05:38:34PM -0300, Alvaro Herrera wrote:

Hello, can you please update this patch?



I'm not the patch author, but I've been looking at the patch recently
and I have a rebased version at hand - so attached.

FWIW I believe the patch is solid and in good shape, and it got stuck
after I reported some benchmarks showing somewhat flaky performance. I
know Binguo Bao was trying to reproduce the benchmark, and I assume the
silence means he was not successful :-(

On the larger data set the patch however performed very nicely, so maybe
I just did something stupid while running the smaller one, or maybe it's
just noise (the queries were just a couple of ms in that test). I do
plan to rerun the benchmarks and investigate a bit - if I find the patch
is fine, I'd like to commit it shortly.



OK, I was just about to push this after polishing it a bit, but then I
noticed the patch does not address one of the points from Paul's review,
asking for a comment explaining the pglz_maximum_compressed_size formula.

I mean this:

   /*
* Use int64 to prevent overflow during calculation.
*/
   compressed_size = (int32) ((int64) rawsize * 9 + 8) / 8;

I'm not very familiar with pglz internals, but I'm a bit puzzled by
this. My first instinct was to compare it to this:

   #define PGLZ_MAX_OUTPUT(_dlen)   ((_dlen) + 4)

but clearly that's a very different (much simpler) formula. So why
shouldn't pglz_maximum_compressed_size simply use this macro?

Regarding benchmarks - as I mentioned, I've repeated the tests and
everything seems fine. The results from the two usual machines are
available in [1]. There are only very few noise-level regressions and
many very significant speedups.

I have a theory what went wrong in the first run that showed some
regressions - it's possible the machine had CPU power management
enabled. I can't check this retroactively, but it'd explain variability
for short queries, and smooth behavior on longer queries.

[1] https://github.com/tvondra/toast-optimize

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Online checksums patch - once again

2019-09-30 Thread Magnus Hagander
On Mon, Sep 30, 2019 at 2:49 PM Tomas Vondra 
wrote:

> On Mon, Sep 30, 2019 at 01:03:20PM +0200, Magnus Hagander wrote:
> >On Thu, Sep 26, 2019 at 9:48 PM Alvaro Herrera 
> >wrote:
> >
> >> On 2019-Aug-26, Magnus Hagander wrote:
> >>
> >> > OK, let's try this again :)
> >> >
> >> > This is work mainly based in the first version of the online checksums
> >> > patch, but based on top of Andres WIP patchset for global barriers (
> >> >
> >>
> https://www.postgresql.org/message-id/20181030051643.elbxjww5jjgnjaxg%40alap3.anarazel.de
> >> > )
> >> >
> >> > Andres patch has been enhanced with wait events per
> >> >
> >>
> https://www.postgresql.org/message-id/CABUevEwy4LUFqePC5YzanwtzyDDpYvgrj6R5WNznwrO5ouVg1w%40mail.gmail.com
> >> > .
> >>
> >> Travis says your SGML doesn't compile (maybe you just forgot to "git
> >> add" and edit allfiles.sgml?):
> >>
> >
> >Nope, even easier -- the reference pgVerifyChecksums was renamed to
> >pgChecksums and for some reason we missed that in the merge.
> >
> >I've rebased again on top of todays master, but that was the only change I
> >had to make.
> >
> >
> >Other than bots, this patch doesn't seem to have attracted any reviewers
> >> this time around.  Perhaps you need to bribe someone?  (Maybe "how sad
> >> your committer SSH key stopped working" would do?)
> >>
> >
> >Hmm. I don't think that's a bribe, that's a threat. However, maybe it will
> >work.
> >
>
> IMHO the patch is ready to go - I think the global barrier solves the
> issue in the previous version, and that's the only problem I'm aware of.
> So +1 from me to go ahead and push it.
>

Not to devalue your review, but I'd really appreciate a review from
someone who was one of the ones who spotted the issue initially.

Especially -- Andres, any chance I can bribe you to take another look?


And now please uncomment my commit SSH key again, please ;-)
>

I'll consider it...

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: Online checksums patch - once again

2019-09-30 Thread Bruce Momjian
On Mon, Sep 30, 2019 at 04:57:41PM +0200, Magnus Hagander wrote:
> 
> 
> On Mon, Sep 30, 2019 at 4:53 PM Bruce Momjian  wrote:
> 
> On Mon, Sep 30, 2019 at 02:49:44PM +0200, Tomas Vondra wrote:
> > On Mon, Sep 30, 2019 at 01:03:20PM +0200, Magnus Hagander wrote:
> > > Other than bots, this patch doesn't seem to have attracted any
> reviewers
> > > > this time around.  Perhaps you need to bribe someone?  (Maybe "how
> sad
> > > > your committer SSH key stopped working" would do?)
> > > >
> > >
> > > Hmm. I don't think that's a bribe, that's a threat. However, maybe it
> will
> > > work.
> > >
> >
> > IMHO the patch is ready to go - I think the global barrier solves the
> > issue in the previous version, and that's the only problem I'm aware of.
> > So +1 from me to go ahead and push it.
> >
> > And now please uncomment my commit SSH key again, please ;-)
> 
> For adding cluster-level encryption to Postgres, the plan is to create a
> standby that has encryption enabled, then switchover to it.  Is that a
> method we support now for adding checksums to Postgres?  Do we need the
> ability to do it in-place too?
> 
> 
> I definitely think we need the ability to do it in-place as well, yes. 

OK, just had to ask.  I think for encryption, for the first version, we
will do just replica-only changes.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: Online checksums patch - once again

2019-09-30 Thread Magnus Hagander
On Mon, Sep 30, 2019 at 4:53 PM Bruce Momjian  wrote:

> On Mon, Sep 30, 2019 at 02:49:44PM +0200, Tomas Vondra wrote:
> > On Mon, Sep 30, 2019 at 01:03:20PM +0200, Magnus Hagander wrote:
> > > Other than bots, this patch doesn't seem to have attracted any
> reviewers
> > > > this time around.  Perhaps you need to bribe someone?  (Maybe "how
> sad
> > > > your committer SSH key stopped working" would do?)
> > > >
> > >
> > > Hmm. I don't think that's a bribe, that's a threat. However, maybe it
> will
> > > work.
> > >
> >
> > IMHO the patch is ready to go - I think the global barrier solves the
> > issue in the previous version, and that's the only problem I'm aware of.
> > So +1 from me to go ahead and push it.
> >
> > And now please uncomment my commit SSH key again, please ;-)
>
> For adding cluster-level encryption to Postgres, the plan is to create a
> standby that has encryption enabled, then switchover to it.  Is that a
> method we support now for adding checksums to Postgres?  Do we need the
> ability to do it in-place too?
>

I definitely think we need the ability to do it in-place as well, yes.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: Online checksums patch - once again

2019-09-30 Thread Bruce Momjian
On Mon, Sep 30, 2019 at 02:49:44PM +0200, Tomas Vondra wrote:
> On Mon, Sep 30, 2019 at 01:03:20PM +0200, Magnus Hagander wrote:
> > Other than bots, this patch doesn't seem to have attracted any reviewers
> > > this time around.  Perhaps you need to bribe someone?  (Maybe "how sad
> > > your committer SSH key stopped working" would do?)
> > > 
> > 
> > Hmm. I don't think that's a bribe, that's a threat. However, maybe it will
> > work.
> > 
> 
> IMHO the patch is ready to go - I think the global barrier solves the
> issue in the previous version, and that's the only problem I'm aware of.
> So +1 from me to go ahead and push it.
> 
> And now please uncomment my commit SSH key again, please ;-)

For adding cluster-level encryption to Postgres, the plan is to create a
standby that has encryption enabled, then switchover to it.  Is that a
method we support now for adding checksums to Postgres?  Do we need the
ability to do it in-place too?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: Commit fest 2019-09

2019-09-30 Thread Alvaro Herrera
Pencils down!

Commitfest now has officially ended, and here are some numbers:

  statusstring  │ week1 │ week2 │ week3 │ week4 │ count 
┼───┼───┼───┼───┼───
 Needs review   │   165 │   138 │   116 │   118 │   117
 Waiting on Author  │30 │44 │51 │44 │39
 Ready for Committer│11 │ 5 │ 8 │11 │ 9
 Returned with Feedback │ 1 │ 4 │ 5 │ 5 │ 8
 Moved to next CF   │ 2 │ 4 │ 4 │ 4 │ 5
 Committed  │14 │23 │32 │34 │38
 Rejected   │ 1 │ 1 │ 1 │ 1 │ 1
 Withdrawn  │ 4 │ 9 │11 │11 │11

This fourth week was almost as disappointing as the previous one; only
four CF entries got marked committed.  I attribute that to the fact that
we have one of our most active committers busy with the pg12 release,
and most of the other committers are not paying any attention to the
commitfest, instead spending their time developing other patches (I
suppose that means our future still looks bright, but in CF terms it's
still sad.)

We made no progress at all on bugfixes this week -- numbers are
identical to the past one.

On the bright side, a lot of non-committer reviewers seem to have become
more active -- in a subjective look, quite a few patches look better now
than they did a week ago.  Also, patch authors seem to have become more
engaged, so there aren't as many patches that don't even apply.

I will be spending some time today marking patches that weren't updated
in time as returned-with-feedback, and moving others to the next
commitfest, and then I'll close this commitfest.  I'll be also trying to
commit a couple of pending patches, so I may yet provide one last
update.

Thanks everyone,

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: add a MAC check for TRUNCATE

2019-09-30 Thread Alvaro Herrera
On 2019-Sep-30, Joe Conway wrote:

> I am not sure I will get to this today. I assume it is ok for me to move
> it forward e.g. next weekend, or is that not in line with commitfest rules?

You can commit whatever patch whenever you feel like it.  I will
probably move this patch to the next commitfest before that, but you can
mark it committed there as soon as you commit it.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: add a MAC check for TRUNCATE

2019-09-30 Thread Joe Conway
On 9/25/19 4:47 PM, Joe Conway wrote:
> On 9/25/19 3:56 PM, Alvaro Herrera wrote:
>> Hello
>> 
>> On 2019-Sep-09, Yuli Khodorkovskiy wrote:
>> 
>>> I have included an updated version of the sepgql patch. The
>>> Truncate-Hook patch is unchanged from the last version.
>> 
>> This patch no longer applies.  Can you please rebase?
>> 
>> Joe, do you plan on being committer for this patch?  There seems to be
>> substantial agreement on it.
> 
> 
> I should be able to do that.


I am not sure I will get to this today. I assume it is ok for me to move
it forward e.g. next weekend, or is that not in line with commitfest rules?

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development




Re: SSL tests failing for channel_binding with OpenSSL <= 1.0.1

2019-09-30 Thread Tom Lane
Michael Paquier  writes:
> On Fri, Sep 27, 2019 at 11:44:57AM +0900, Michael Paquier wrote:
>> We need to do something similar to c3d41cc for the test, as per the
>> attached.  I have tested that with OpenSSL 1.0.1 and 1.0.2 to stress
>> both scenarios.
>> Any objections to this fix?

> Committed as a12c75a1.

The committed fix looks odd: isn't the number of executed tests the
same in both code paths?  (I didn't try it yet.)

regards, tom lane




Re: Implementing Incremental View Maintenance

2019-09-30 Thread Yugo Nagata
Hi,

Attached is the latest patch for supporting self-join views. This also
including the following fix mentioned by Tatsuo Ishii.

> > On 2019-Aug-06, Tatsuo Ishii wrote:
> > 
> >> It's not mentioned below but some bugs including seg fault when
> >> --enable-casser is enabled was also fixed in this patch.
> >> 
> >> BTW, I found a bug with min/max support in this patch and I believe
> >> Yugo is working on it. Details:
> >> https://github.com/sraoss/pgsql-ivm/issues/20

This patch adds support for self-join views, simultaneous updates of more
than one base table, and also multiple updates of the same base table.
I first tried to support just self-joins, but I found that this is essentially
the same as supporting simultaneous table updates, so I decided to support them
in the same commit. I think this will be a base for implementing
deferred maintenance in the future.



In the new implementation, AFTER triggers are used to collect tuplestores
containing the transition table contents. When multiple tables are changed,
multiple AFTER triggers are invoked, and the final AFTER trigger performs the
actual update of the matview. In addition to the AFTER triggers, a BEFORE
trigger is used to handle global information for view maintenance.

For example, suppose that we have a view V joining tables R, S, and T, and new
tuples dR, dS, and dT are inserted into each table respectively.

 V = R*S*T
 R_new = R + dR
 S_new = S + dS
 T_new = T + dT

In this situation, we can calculate the new view state as below.

V_new 
= R_new * S_new * T_new
= (R + dR) * (S + dS) * (T + dT)
= R*S*T + dR*(S + dS)*(T + dT) + R*dS*(T + dT) + R*S*dT
= V + dR*(S + dS)*(T + dT) + R*dS*(T + dT) + R*S*dT
= V + (dR *S_new*T_new) + (R*dS*T_new) + (R*S*dT)

To calculate view deltas, we need both the pre-state (R, S, and T) and the
post-state (R_new, S_new, and T_new) of the base tables.

Post-update states are available in the AFTER trigger, and we calculate
pre-update states by filtering out inserted tuples using the cmin/xmin system
columns and appending deleted tuples, which are contained in an old transition
table.

In the original core implementation, the tuplestores of transition tables were
freed at each query depth. However, we want to prolong their lifetime because
we have to preserve them for the whole query in case some base tables are
changed in other trigger functions, so I added a hack to trigger.c.

Regression tests are also added for self-join views, multiple changes to the
same table, simultaneous changes to two tables, and foreign key constraints.

Here are behavior examples:

1. Table definition
- t: for self-join
- r,s: for 2-way join

CREATE TABLE r (i int, v int);
CREATE TABLE
CREATE TABLE s (i int, v int);
CREATE TABLE
CREATE TABLE t (i int, v int);
CREATE TABLE

2. Initial data

INSERT INTO r VALUES (1, 10), (2, 20), (3, 30);
INSERT 0 3
INSERT INTO s VALUES (1, 100), (2, 200), (3, 300);
INSERT 0 3
INSERT INTO t VALUES (1, 10), (2, 20), (3, 30);
INSERT 0 3

3. View definition

3.1. self-join (mv_self, v_self)

CREATE INCREMENTAL MATERIALIZED VIEW mv_self(v1, v2) AS
 SELECT t1.v, t2.v FROM t t1 JOIN t t2 ON t1.i = t2.i;
SELECT 3
CREATE VIEW v_self(v1, v2) AS
 SELECT t1.v, t2.v FROM t t1 JOIN t t2 ON t1.i = t2.i;
CREATE VIEW

3.2. 2-way join (mv, v)

CREATE INCREMENTAL MATERIALIZED VIEW mv(v1, v2) AS
 SELECT r.v, s.v FROM r JOIN s USING(i);
SELECT 3
CREATE VIEW v(v1, v2) AS
 SELECT r.v, s.v FROM r JOIN s USING(i);
CREATE VIEW

3.3 Initial contents

SELECT * FROM mv_self ORDER BY v1;
 v1 | v2 
+
 10 | 10
 20 | 20
 30 | 30
(3 rows)

SELECT * FROM mv ORDER BY v1;
 v1 | v2  
+-
 10 | 100
 20 | 200
 30 | 300
(3 rows)

4. Update a base table for the self-join view

INSERT INTO t VALUES (4,40);
INSERT 0 1
DELETE FROM t WHERE i = 1;
DELETE 1
UPDATE t SET v = v*10 WHERE i=2;
UPDATE 1

4.1. Results
- Comparison with the normal view

SELECT * FROM mv_self ORDER BY v1;
 v1  | v2  
-+-
  30 |  30
  40 |  40
 200 | 200
(3 rows)

SELECT * FROM v_self ORDER BY v1;
 v1  | v2  
-+-
  30 |  30
  40 |  40
 200 | 200
(3 rows)

5. Update a base table for the 2-way join view

WITH
 ins_r AS (INSERT INTO r VALUES (1,11) RETURNING 1),
 ins_r2 AS (INSERT INTO r VALUES (3,33) RETURNING 1),
 ins_s AS (INSERT INTO s VALUES (2,222) RETURNING 1),
 upd_r AS (UPDATE r SET v = v + 1000 WHERE i = 2 RETURNING 1),
 dlt_s AS (DELETE FROM s WHERE i = 3 RETURNING 1)
SELECT NULL;
 ?column? 
--
 
(1 row)

5.1. Results
- Comparison with the normal view

SELECT * FROM mv ORDER BY v1;
  v1  | v2  
--+-
   10 | 100
   11 | 100
 1020 | 200
 1020 | 222
(4 rows)

SELECT * FROM v ORDER BY v1;
  v1  | v2  
--+-
   10 | 100
   11 | 100
 1020 | 200
 1020 | 222
(4 rows)



Best Regards,
Yugo Nagata

-- 
Yugo Nagata 
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 5e71a2e865..a288bddcda 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -1952,6 +1952,13 @@ SCRAM-SHA-256$iteration count:
   True if 

Re: Online checksums patch - once again

2019-09-30 Thread Tomas Vondra

On Mon, Sep 30, 2019 at 01:03:20PM +0200, Magnus Hagander wrote:

On Thu, Sep 26, 2019 at 9:48 PM Alvaro Herrera 
wrote:


On 2019-Aug-26, Magnus Hagander wrote:

> OK, let's try this again :)
>
> This is work mainly based in the first version of the online checksums
> patch, but based on top of Andres WIP patchset for global barriers (
>
https://www.postgresql.org/message-id/20181030051643.elbxjww5jjgnjaxg%40alap3.anarazel.de
> )
>
> Andres patch has been enhanced with wait events per
>
https://www.postgresql.org/message-id/CABUevEwy4LUFqePC5YzanwtzyDDpYvgrj6R5WNznwrO5ouVg1w%40mail.gmail.com
> .

Travis says your SGML doesn't compile (maybe you just forgot to "git
add" and edit allfiles.sgml?):



Nope, even easier -- the reference pgVerifyChecksums was renamed to
pgChecksums and for some reason we missed that in the merge.

I've rebased again on top of todays master, but that was the only change I
had to make.


Other than bots, this patch doesn't seem to have attracted any reviewers

this time around.  Perhaps you need to bribe someone?  (Maybe "how sad
your committer SSH key stopped working" would do?)



Hmm. I don't think that's a bribe, that's a threat. However, maybe it will
work.



IMHO the patch is ready to go - I think the global barrier solves the
issue in the previous version, and that's the only problem I'm aware of.
So +1 from me to go ahead and push it.

And now please uncomment my commit SSH key again, please ;-)


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: pgbench - allow to create partitioned tables

2019-09-30 Thread Fabien COELHO


Hello Amit,


$node->safe_psql('postgres',
   "CREATE TABLESPACE regress_pgbench_tap_1_ts LOCATION '$ets';"


I think that this last command fails if the path contains a "'", so the
'-escaping is necessary. I had to make changes in TAP tests before because
it was not working when the path was a little bit strange, so now I'm
careful.


Hmm, I don't know what kind of issues you have earlier faced,


AFAICR, a path with shell-sensitive characters ($ ? * ...) was 
breaking something somewhere.


but tablespace creation doesn't allow quotes.  See the message 
"tablespace location cannot contain single quotes" in CreateTableSpace.


Hmmm. That is the problem of CreateTableSpace. From an SQL perspective, 
escaping is required. If the command fails later, that is the problem of 
the command implementation, but otherwise this is just a plain syntax 
error at the SQL level.


Also, there are other places in tests like 
src/bin/pg_checksums/t/002_actions.pl which uses the way I have 
mentioned.


Yes, I looked at it and imported the Windows-specific function to handle 
the path. It does not do anything about escaping.



I don't think there is any need for escaping single-quotes
here


As said, this is required for SQL, or you must know that there are no 
single quotes in the string.



and I am not seeing the use of same.


Sure. It is probably buggy there too.

I don't want to introduce a new pattern in tests which people can then 
tomorrow copy at other places even though such code is not required. 
OTOH, if there is a genuine need for the same, then I am fine.


Hmmm. The committer is right by definition. Here is a version without 
escaping but with a comment instead.


--
Fabien.diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index c857aa3cba..e3a0abb4c7 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -306,6 +306,31 @@ pgbench  options  d
   
  
 
+ 
+  --partitions=NUM
+  
+   
+Create a partitioned pgbench_accounts table with
+NUM partitions of nearly equal size for
+the scaled number of accounts.
+Default is 0, meaning no partitioning.
+   
+  
+ 
+
+ 
+  --partition-method=NAME
+  
+   
+Create a partitioned pgbench_accounts table with
+NAME method.
+Expected values are range or hash.
+This option requires that --partitions is set to non-zero.
+If unspecified, default is range.
+   
+  
+ 
+
  
   --tablespace=tablespace
   
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index ed7652bfbf..d71e38b8a8 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -186,6 +186,25 @@ int64		latency_limit = 0;
 char	   *tablespace = NULL;
 char	   *index_tablespace = NULL;
 
+/*
+ * Number of "pgbench_accounts" partitions, found or to create.
+ * When creating, 0 is the default and means no partitioning.
+ * When running, this is the actual number of partitions.
+ */
+static int	partitions = 0;
+
+/* partitioning strategy for "pgbench_accounts" */
+typedef enum
+{
+	PART_NONE,		/* no partitioning */
+	PART_RANGE,	/* range partitioning */
+	PART_HASH		/* hash partitioning */
+}
+			partition_method_t;
+
+static partition_method_t partition_method = PART_NONE;
+static const char *PARTITION_METHOD[] = {"none", "range", "hash"};
+
 /* random seed used to initialize base_random_sequence */
 int64		random_seed = -1;
 
@@ -617,6 +636,9 @@ usage(void)
 		   "  --foreign-keys   create foreign key constraints between tables\n"
 		   "  --index-tablespace=TABLESPACE\n"
 		   "   create indexes in the specified tablespace\n"
+		   "  --partitions=NUM partition pgbench_accounts in NUM parts (default: 0)\n"
+		   "  --partition-method=(range|hash)\n"
+		   "   partition pgbench_accounts with this method (default: range)\n"
 		   "  --tablespace=TABLESPACE  create tables in the specified tablespace\n"
 		   "  --unlogged-tablescreate tables as unlogged tables\n"
 		   "\nOptions to select what to run:\n"
@@ -3601,6 +3623,89 @@ initDropTables(PGconn *con)
 	 "pgbench_tellers");
 }
 
+/*
+ * add fillfactor percent option.
+ *
+ * As default is 100, it could be removed in this case.
+ */
+static void
+append_fillfactor(char *opts, int len)
+{
+	snprintf(opts + strlen(opts), len - strlen(opts),
+			 " with (fillfactor=%d)", fillfactor);
+}
+
+/*
+ * Create "pgbench_accounts" partitions if needed.
+ *
+ * This is the larger table of pgbench default tpc-b like schema
+ * with a known size, so that it can be partitioned by range.
+ */
+static void
+createPartitions(PGconn *con)
+{
+	char		ff[64];
+
+	ff[0] = '\0';
+
+	/*
+	 * Per ddlinfo in initCreateTables below, fillfactor is needed on
+	 * table pgbench_accounts.
+	 */
+	append_fillfactor(ff, sizeof(ff));
+
+	/* we must have to create some partitions */
+	

Re: Minimal logical decoding on standbys

2019-09-30 Thread Amit Khandekar
On Fri, 27 Sep 2019 at 23:21, Robert Haas  wrote:
>
> On Fri, Sep 27, 2019 at 12:41 PM Amit Khandekar  
> wrote:
> > Preferably I want wait_for_xmins() to get rid of the $node parameter,
> > because we can deduce it using slot name. But that requires having
> > get_node_from_slotname(). Your suggestion was to remove
> > get_node_from_slotname() and add back the $node param so as to reduce
> > duplicate code. Instead, how about keeping  wait_for_xmins() in the
> > PostgresNode.pm() ? This way, we won't have duplication, and also we
> > can get rid of param $node. This is just my preference; if you are
> > quite inclined to not have get_node_from_slotname(), I will go with
> > your suggestion.
>
> I'd be inclined not to have it.  I think having a lookup function to
> go from slot name -> node is strange; it doesn't really simplify
> things that much for the caller, and it makes the logic harder to
> follow. It would break outright if you had the same slot name on
> multiple nodes, which is a perfectly reasonable scenario.

Alright. Attached is the updated patch that splits the file into two
files, one that does only xmin related testing, and the other test
file that tests conflict recovery scenarios, and also one scenario
where drop-database drops the slots on the database on standby.
Removed get_slot_xmins() and get_node_from_slotname().
Renamed 'replica' to 'standby'.
Used node->backup() function instead of pg_basebackup command.
Renamed $master_slot to $master_slotname, similarly for $standby_slot.

-- 
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company


logicaldecodng_standby_v3.tar.gz
Description: GNU Zip compressed data


Re: Online checksums patch - once again

2019-09-30 Thread Magnus Hagander
On Thu, Sep 26, 2019 at 9:48 PM Alvaro Herrera 
wrote:

> On 2019-Aug-26, Magnus Hagander wrote:
>
> > OK, let's try this again :)
> >
> > This is work mainly based in the first version of the online checksums
> > patch, but based on top of Andres WIP patchset for global barriers (
> >
> https://www.postgresql.org/message-id/20181030051643.elbxjww5jjgnjaxg%40alap3.anarazel.de
> > )
> >
> > Andres patch has been enhanced with wait events per
> >
> https://www.postgresql.org/message-id/CABUevEwy4LUFqePC5YzanwtzyDDpYvgrj6R5WNznwrO5ouVg1w%40mail.gmail.com
> > .
>
> Travis says your SGML doesn't compile (maybe you just forgot to "git
> add" and edit allfiles.sgml?):
>

Nope, even easier -- the reference pgVerifyChecksums was renamed to
pgChecksums and for some reason we missed that in the merge.

I've rebased again on top of todays master, but that was the only change I
had to make.


Other than bots, this patch doesn't seem to have attracted any reviewers
> this time around.  Perhaps you need to bribe someone?  (Maybe "how sad
> your committer SSH key stopped working" would do?)
>

Hmm. I don't think that's a bribe, that's a threat. However, maybe it will
work.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 
From a52a74193015c8701bf40bb87321dfd56e0dab76 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Mon, 29 Oct 2018 10:14:15 -0700
Subject: [PATCH 1/2] WIP: global barriers

This is a squash of three patches from Andres:
* Use procsignal_sigusr1_handler for all shmem connected bgworkers.
* Use  procsignal_sigusr1_handler in all auxiliary processes.
* WIP: global barriers.

And one from Magnus:
* Wait event for global barriers
---
 src/backend/postmaster/autovacuum.c   |   3 +-
 src/backend/postmaster/bgworker.c |  31 +---
 src/backend/postmaster/bgwriter.c |  24 ++
 src/backend/postmaster/checkpointer.c |  19 ++---
 src/backend/postmaster/pgstat.c   |   3 +
 src/backend/postmaster/startup.c  |  18 ++---
 src/backend/postmaster/walwriter.c|  17 +---
 src/backend/replication/walreceiver.c |  20 +
 src/backend/storage/buffer/bufmgr.c   |   4 +
 src/backend/storage/ipc/procsignal.c  | 141 ++
 src/backend/storage/lmgr/proc.c   |  20 +
 src/backend/tcop/postgres.c   |   7 ++
 src/include/pgstat.h  |   1 +
 src/include/storage/proc.h|   9 +++
 src/include/storage/procsignal.h  |  23 +-
 15 files changed, 255 insertions(+), 85 deletions(-)

diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 073f313337..24e28dd3a3 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -649,8 +649,9 @@ AutoVacLauncherMain(int argc, char *argv[])
 
 		ResetLatch(MyLatch);
 
-		/* Process sinval catchup interrupts that happened while sleeping */
+		/* Process pending interrupts. */
 		ProcessCatchupInterrupt();
+		ProcessGlobalBarrierIntterupt();
 
 		/* the normal shutdown case */
 		if (got_SIGTERM)
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index b66b517aca..f300f9285b 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -734,23 +734,32 @@ StartBackgroundWorker(void)
 	/*
 	 * Set up signal handlers.
 	 */
+
+
+	/*
+	 * SIGINT is used to signal canceling the current action for processes
+	 * able to run queries.
+	 */
 	if (worker->bgw_flags & BGWORKER_BACKEND_DATABASE_CONNECTION)
-	{
-		/*
-		 * SIGINT is used to signal canceling the current action
-		 */
 		pqsignal(SIGINT, StatementCancelHandler);
-		pqsignal(SIGUSR1, procsignal_sigusr1_handler);
-		pqsignal(SIGFPE, FloatExceptionHandler);
-
-		/* XXX Any other handlers needed here? */
-	}
 	else
-	{
 		pqsignal(SIGINT, SIG_IGN);
+
+	/*
+	 * Everything with a PGPROC should be able to receive procsignal.h style
+	 * signals.
+	 */
+	if (worker->bgw_flags & (BGWORKER_BACKEND_DATABASE_CONNECTION |
+			 BGWORKER_SHMEM_ACCESS))
+		pqsignal(SIGUSR1, procsignal_sigusr1_handler);
+	else
 		pqsignal(SIGUSR1, bgworker_sigusr1_handler);
+
+	if (worker->bgw_flags & BGWORKER_BACKEND_DATABASE_CONNECTION)
+		pqsignal(SIGFPE, FloatExceptionHandler);
+	else
 		pqsignal(SIGFPE, SIG_IGN);
-	}
+
 	pqsignal(SIGTERM, bgworker_die);
 	pqsignal(SIGHUP, SIG_IGN);
 
diff --git a/src/backend/postmaster/bgwriter.c b/src/backend/postmaster/bgwriter.c
index 8ec16a3fb8..80a8e3cf4b 100644
--- a/src/backend/postmaster/bgwriter.c
+++ b/src/backend/postmaster/bgwriter.c
@@ -51,6 +51,7 @@
 #include "storage/ipc.h"
 #include "storage/lwlock.h"
 #include "storage/proc.h"
+#include "storage/procsignal.h"
 #include "storage/shmem.h"
 #include "storage/smgr.h"
 #include "storage/spin.h"
@@ -97,7 +98,6 @@ static volatile sig_atomic_t shutdown_requested = false;
 static void bg_quickdie(SIGNAL_ARGS);
 static void BgSigHupHandler(SIGNAL_ARGS);
 

Re: pgbench - allow to create partitioned tables

2019-09-30 Thread Amit Kapila
On Mon, Sep 30, 2019 at 2:26 PM Fabien COELHO  wrote:
> > I am leaning towards approach (b) unless you and or Alvaro feels (a)
> > is good for now or if you have some other idea.
>
> No other idea. I put back the tablespace creation which I just removed,
> with comments about why it is there.
>
> > If we want to go with option (b), I have small comment in your previous 
> > test:
> > +# tablespace for testing
> > +my $ts = $node->basedir . '/regress_pgbench_tap_1_ts_dir';
> > +mkdir $ts or die "cannot create directory $ts";
> > +my $ets = TestLib::perl2host($ts);
> > +# add needed escaping!
> > +$ets =~ s/'/''/;
> >
> > I am not sure if we really need this quote skipping stuff.  Why can't
> > we write the test as below:
> >
> > # tablespace for testing
> > my $basedir = $node->basedir;
> > my $ts = "$basedir/regress_pgbench_tap_1_ts_dir";
> > mkdir $ts or die "cannot create directory $ts";
> > $ts = TestLib::perl2host($ts);
> > $node->safe_psql('postgres',
> >"CREATE TABLESPACE regress_pgbench_tap_1_ts LOCATION '$ets';"
>
> I think that this last command fails if the path contains a "'", so the
> '-escaping is necessary. I had to make changes in TAP tests before because
> it was not working when the path was a little bit strange, so now I'm
> careful.
>

Hmm, I don't know what kind of issues you have earlier faced, but
tablespace creation doesn't allow quotes.  See the message "tablespace
location cannot contain single quotes" in CreateTableSpace.  Also,
there are other places in tests like
src/bin/pg_checksums/t/002_actions.pl which uses the way I have
mentioned.  I don't think there is any need for escaping single-quotes
here and I am not seeing the use of same.  I don't want to introduce a
new pattern in tests which people can then tomorrow copy at other
places even though such code is not required.  OTOH, if there is a
genuine need for the same, then I am fine.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: Zedstore - compressed in-core columnar storage

2019-09-30 Thread Ashutosh Sharma
On Fri, Sep 27, 2019 at 3:09 PM Alexandra Wang  wrote:
>
> Hi Ashutosh,
>
> Sorry I indeed missed your question, thanks for the reminder!
>
> On Wed, Sep 25, 2019 at 4:10 AM Ashutosh Sharma  wrote:
>>
>> > Further, the UPDATE operation on zedstore table is very slow. I think
>> > that's because in case of zedstore table we have to update all the
>> > btree data structures even if one column is updated and that really
>> > sucks. Please let me know if there is some other reason for it.
>> >
>>
>> There was no answer for this in your previous reply. It seems like you
>> missed it. As I said earlier, I tried performing UPDATE operation with
>> optimised build and found that to update around 10 lacs record in
>> zedstore table it takes around 24k ms whereas for normal heap table it
>> takes 2k ms. Is that because in case of zedstore table we have to
>> update all the Btree data structures even if one column is updated or
>> there is some other reason for it. If yes, could you please let us
>> know. FYI, I'm trying to update the table with just two columns.
>
>
> Zedstore UPDATE operation currently fetches the old rows, updates the
> undo pointers stored in the tid btree, and inserts new rows into all
> the attribute btrees with the new tids. So performance of updating one
> column makes no difference from updating all the columns. That said,
> the wider the table is, the longer it takes to update, regardless
> updating one column or all the columns.
>
> However, since your test table only has two columns, and we also
> tested the same on a one-column table and got similar results as
> yours, there is definitely room for optimizations. Attached file
> zedstore_update_flames_lz4_first_update.svg is the profiling results
> for the update query on a one-column table with 1M records. It spent
> most of the time in zedstoream_fetch_row() and zsbt_tid_update(). For
> zedstoream_fetch_row(), Taylor and I had some interesting findings
> which I'm going to talk about next, I haven't dived into
> zsbt_tid_update() yet and need to think about it more.
>
> To understand what slows down zedstore UPDATE, Taylor and I did the
> following test and profiling on a zedstore table with only one column.
>
> postgres=# create table onecol(a int) using zedstore;
> postgres=# insert into onecol select i from generate_series(1, 100) i;
>
> -- Create view to count zedstore pages group by page types
> postgres=# CREATE VIEW pg_zs_page_counts AS
> SELECT
> c.relnamespace::regnamespace,
> c.oid,
> c.relname,
> pg_zs_page_type(c.oid, generate_series(0, c.relpages - 1)),
> count(*)
> FROM pg_am am
> JOIN pg_class c ON (c.relam = am.oid)
> WHERE am.amname='zedstore'
> GROUP BY 1,2,3,4;
>
> postgres=# select * from pg_zs_page_counts;
>  relnamespace |  oid  | relname | pg_zs_page_type | count
> --+---+-+-+---
>  public   | 32768 | onecol  | BTREE   |   640
>  public   | 32768 | onecol  | FREE|90
>  public   | 32768 | onecol  | META| 1
> (3 rows)
>
> -- Run update query the first time
> postgres=# update onecol set a = 200; -- profiling attached in 
> zedstore_update_flames_lz4_first_update.svg
> Time: 28760.199 ms (00:28.760)
>
> postgres=# select * from pg_zs_page_counts;
>  relnamespace |  oid  | relname | pg_zs_page_type | count
> --+---+-+-+---
>  public   | 32768 | onecol  | BTREE   |  6254
>  public   | 32768 | onecol  | FREE| 26915
>  public   | 32768 | onecol  | META| 1
> (6 rows)
>

Oops, the first UPDATE created a lot of free pages.

Just FYI, when the second update was run, it took around 5 mins (which
is almost 10-12 times longer than the 1st UPDATE took), but this time
no more free pages were added; instead, the already available free
pages were used. Here are the stats observed before and after the second
update:

before:
=
postgres=# select * from pg_zs_page_counts;
 relnamespace |  oid  | relname | pg_zs_page_type | count
--+---+-+-+---
 public   | 16390 | t1  | FREE| 26915
 public   | 16390 | t1  | BTREE   |  7277
 public   | 16390 | t1  | META| 1
(3 rows)


after:

postgres=# select * from pg_zs_page_counts;
 relnamespace |  oid  | relname | pg_zs_page_type | count
--+---+-+-+---
 public   | 16390 | t1  | FREE| 26370
 public   | 16390 | t1  | BTREE   |  7822
 public   | 16390 | t1  | META| 1
(3 rows)

You may see that around 545 pages got added this time and they were
all taken from the free pages list.

> postgres=# select count(*) from pg_zs_btree_pages('onecol') where attno = 0;
>  count
> ---
>   5740
> (1 row)
>

This could be because currently tid blocks 

Re: backup manifests

2019-09-30 Thread Rushabh Lathia
On Wed, Sep 25, 2019 at 6:17 PM vignesh C  wrote:

> On Sat, Sep 21, 2019 at 12:25 AM Robert Haas 
> wrote:
> >
> Some comments:
>
> The manifest file will be in plain text format even if compression is
> specified; should we compress it?
> Maybe this is intended, just raising the point to make sure that it is
> intended.
> +static void
> +ReceiveBackupManifestChunk(size_t r, char *copybuf, void *callback_data)
> +{
> + WriteManifestState *state = callback_data;
> +
> + if (fwrite(copybuf, r, 1, state->file) != 1)
> + {
> + pg_log_error("could not write to file \"%s\": %m", state->filename);
> + exit(1);
> + }
> +}
>
> WALfile.done file gets added but wal file information is not included
> in the manifest file, should we include WAL file also?
> @@ -599,16 +618,20 @@ perform_base_backup(basebackup_options *opt)
>   (errcode_for_file_access(),
>   errmsg("could not stat file \"%s\": %m", pathbuf)));
>
> - sendFile(pathbuf, pathbuf, , false, InvalidOid);
> + sendFile(pathbuf, pathbuf, , false, InvalidOid, manifest,
> + NULL);
>
>   /* unconditionally mark file as archived */
>   StatusFilePath(pathbuf, fname, ".done");
> - sendFileWithContent(pathbuf, "");
> + sendFileWithContent(pathbuf, "", manifest);
>
> Should we add an option to make manifest generation configurable to
> reduce overhead during backup?
>
> Manifest file does not include directory information, should we include it?
>
> There is one warning:
> In file included from ../../../src/include/fe_utils/string_utils.h:20:0,
>  from pg_basebackup.c:34:
> pg_basebackup.c: In function ‘ReceiveTarFile’:
> ../../../src/interfaces/libpq/pqexpbuffer.h:60:9: warning: the
> comparison will always evaluate as ‘false’ for the address of ‘buf’
> will never be NULL [-Waddress]
>   ((str) == NULL || (str)->maxlen == 0)
>  ^
> pg_basebackup.c:1203:7: note: in expansion of macro ‘PQExpBufferBroken’
>if (PQExpBufferBroken())
>
>
I also observed this warning.  PFA a patch to fix the same.

pg_gmtime can fail in case of malloc failure:
> + /*
> + * Convert time to a string. Since it's not clear what time zone to use
> + * and since time zone definitions can change, possibly causing confusion,
> + * use GMT always.
> + */
> + pg_strftime(timebuf, sizeof(timebuf), "%Y-%m-%d %H:%M:%S %Z",
> + pg_gmtime());
>
>
Fixed that in the attached patch.




Regards.
Rushabh Lathia
www.EnterpriseDB.com
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 5a2a7fc..990d758 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -903,6 +903,7 @@ AddFileToManifest(StringInfo manifest, const char *tsoid,
 	static char timebuf[128];
 	static char shatextbuf[PG_SHA256_DIGEST_LENGTH * 2 + 1];
 	int		shatextlen;
+	struct pg_tm *tm;
 
 	/*
 	 * If this file is part of a tablespace, the filename passed to this
@@ -924,8 +925,10 @@ AddFileToManifest(StringInfo manifest, const char *tsoid,
 	 * and since time zone definitions can change, possibly causing confusion,
 	 * use GMT always.
 	 */
-	pg_strftime(timebuf, sizeof(timebuf), "%Y-%m-%d %H:%M:%S %Z",
-pg_gmtime());
+	tm = pg_gmtime();
+	if (tm == NULL)
+		elog(ERROR, "could not convert epoch to timestamp: %m");
+	pg_strftime(timebuf, sizeof(timebuf), "%Y-%m-%d %H:%M:%S %Z", tm);
 
 	/* Convert checksum to hexadecimal. */
 	shatextlen =
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 2abe632..60d3a53 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -1200,7 +1200,7 @@ ReceiveTarFile(PGconn *conn, PGresult *res, int rownum)
 
 		initPQExpBuffer(&buf);
 		ReceiveBackupManifestInMemory(conn, &buf);
-		if (PQExpBufferBroken(&buf))
+		if (PQExpBufferDataBroken(buf))
 		{
 			pg_log_error("out of memory");
 			exit(1);


Drop Trigger Mechanism with Detached partitions

2019-09-30 Thread M Beena Emerson
DETACH PARTITION does not remove the partition trigger dependency, as seen
in the scenario below.

rm44010_p had 2 partitions, p1 and p2, and p2 was detached.
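
For anyone who wants to reproduce this, a minimal setup along the following
lines should show the same behaviour (the bounds of rm44010_p2 and the body
of trig_func() are my guesses; only the names and the p1 range appear in the
output below):

CREATE TABLE rm44010_p (c1 int, c2 int) PARTITION BY RANGE (c2);
CREATE TABLE rm44010_p1 PARTITION OF rm44010_p FOR VALUES FROM (1) TO (100);
CREATE TABLE rm44010_p2 PARTITION OF rm44010_p FOR VALUES FROM (100) TO (200);

CREATE FUNCTION trig_func() RETURNS trigger LANGUAGE plpgsql AS
$$ BEGIN RETURN NEW; END; $$;

CREATE TRIGGER rm44010_trig1 AFTER INSERT ON rm44010_p
    FOR EACH ROW EXECUTE FUNCTION trig_func();

-- detaching the partition leaves the cloned trigger (and its dependency) behind
ALTER TABLE rm44010_p DETACH PARTITION rm44010_p2;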

A. Description of a partitioned table
\d+ rm44010_p
                        Partitioned table "public.rm44010_p"
 Column |  Type   | Collation | Nullable | Default | Storage | Stats target | Description
--------+---------+-----------+----------+---------+---------+--------------+-------------
 c1     | integer |           |          |         | plain   |              |
 c2     | integer |           |          |         | plain   |              |
Partition key: RANGE (c2)
Triggers:
    rm44010_trig1 AFTER INSERT ON rm44010_p FOR EACH ROW EXECUTE FUNCTION trig_func()
Partitions: rm44010_p1 FOR VALUES FROM (1) TO (100)

B. Description of the detached partition still shows the trigger.
\d+ rm44010_p2
                             Table "public.rm44010_p2"
 Column |  Type   | Collation | Nullable | Default | Storage | Stats target | Description
--------+---------+-----------+----------+---------+---------+--------------+-------------
 c1     | integer |           |          |         | plain   |              |
 c2     | integer |           |          |         | plain   |              |
Triggers:
    rm44010_trig1 AFTER INSERT ON rm44010_p2 FOR EACH ROW EXECUTE FUNCTION trig_func()
Access method: heap

C. Drop Trigger on partitioned table also removes the trigger on the
detached partition.
DROP TRIGGER RM44010_trig1 ON RM44010_p;
DROP TRIGGER
\d+ rm44010_p2
                             Table "public.rm44010_p2"
 Column |  Type   | Collation | Nullable | Default | Storage | Stats target | Description
--------+---------+-----------+----------+---------+---------+--------------+-------------
 c1     | integer |           |          |         | plain   |              |
 c2     | integer |           |          |         | plain   |              |
Access method: heap



--
Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Parallel grouping sets

2019-09-30 Thread Pengzhou Tang
Hi Richard & Tomas:

I followed the idea of the second approach: add a gset_id to the
targetlist of the first stage of grouping sets and use it to combine the
aggregates in the final stage. The gset_id stuff is still needed because
GROUPING() cannot uniquely identify a grouping set; grouping sets may
contain duplicated sets, e.g. group by grouping sets((c1, c2), (c1, c2)).
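
As a plain-SQL illustration of that point (not part of the patch): both
members of a duplicated grouping set produce identical GROUPING() values,
so GROUPING() alone cannot tell the partial-aggregate rows apart.

SELECT c1, c2, GROUPING(c1, c2) AS grouping_bits
FROM (VALUES (1, 10), (1, 20), (2, 30)) AS t(c1, c2)
GROUP BY GROUPING SETS ((c1, c2), (c1, c2));
-- every group appears twice, always with grouping_bits = 0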

There are some differences in implementing the second approach compared to
Richard's original idea: gset_id is not used as an additional group key in
the final stage; instead, we use it to dispatch each input tuple directly
to the specified grouping set and then do the aggregation. One advantage of
this is that we can handle multiple rollups with better performance and
without an Append node.

The plan now looks like:

gpadmin=# explain select c1, c2 from gstest group by grouping
sets(rollup(c1, c2), rollup(c3));
 QUERY PLAN

 Finalize MixedAggregate  (cost=1000.00..73108.57 rows=8842 width=12)
   Dispatched by: (GROUPINGSETID())
   Hash Key: c1, c2
   Hash Key: c1
   Hash Key: c3
   Group Key: ()
 Group Key: ()
   ->  Gather  (cost=1000.00..71551.48 rows=17684 width=16)
 Workers Planned: 2
 ->  Partial MixedAggregate  (cost=0.00..68783.08 rows=8842
width=16)
   Hash Key: c1, c2
   Hash Key: c1
   Hash Key: c3
   Group Key: ()
   Group Key: ()
   ->  Parallel Seq Scan on gstest  (cost=0.00..47861.33
rows=208 width=12)
(16 rows)

gpadmin=# set enable_hashagg to off;
gpadmin=# explain select c1, c2 from gstest group by grouping
sets(rollup(c1, c2), rollup(c3));
   QUERY PLAN

 Finalize GroupAggregate  (cost=657730.66..663207.45 rows=8842 width=12)
   Dispatched by: (GROUPINGSETID())
   Group Key: c1, c2
   Sort Key: c1
 Group Key: c1
 Group Key: ()
 Group Key: ()
   Sort Key: c3
 Group Key: c3
   ->  Sort  (cost=657730.66..657774.87 rows=17684 width=16)
 Sort Key: c1, c2
 ->  Gather  (cost=338722.94..656483.04 rows=17684 width=16)
   Workers Planned: 2
   ->  Partial GroupAggregate  (cost=337722.94..653714.64
rows=8842 width=16)
 Group Key: c1, c2
 Group Key: c1
 Group Key: ()
 Group Key: ()
 Sort Key: c3
   Group Key: c3
 ->  Sort  (cost=337722.94..342931.28 rows=208
width=12)
   Sort Key: c1, c2
   ->  Parallel Seq Scan on gstest
 (cost=0.00..47861.33 rows=208 width=12)

References:
[1] https://github.com/greenplum-db/postgres/tree/parallel_groupingsets_3

On Wed, Jul 31, 2019 at 4:07 PM Richard Guo  wrote:

> On Tue, Jul 30, 2019 at 11:05 PM Tomas Vondra <
> tomas.von...@2ndquadrant.com> wrote:
>
>> On Tue, Jul 30, 2019 at 03:50:32PM +0800, Richard Guo wrote:
>> >On Wed, Jun 12, 2019 at 10:58 AM Richard Guo  wrote:
>> >
>> >> Hi all,
>> >>
>> >> Paul and I have been hacking recently to implement parallel grouping
>> >> sets, and here we have two implementations.
>> >>
>> >> Implementation 1
>> >> 
>> >>
>> >> Attached is the patch and also there is a github branch [1] for this
>> >> work.
>> >>
>> >
>> >Rebased with the latest master.
>> >
>>
>> Hi Richard,
>>
>> thanks for the rebased patch. I think the patch is mostly fine (at least I
>> don't see any serious issues). A couple minor comments:
>>
>
> Hi Tomas,
>
> Thank you for reviewing this patch.
>
>
>>
>> 1) I think get_number_of_groups() would deserve a short explanation why
>> it's OK to handle (non-partial) grouping sets and regular GROUP BY in the
>> same branch. Before these cases were clearly separated, now it seems a bit
>> mixed up and it may not be immediately obvious why it's OK.
>>
>
> Added a short comment in get_number_of_groups() explaining the behavior
> when doing partial aggregation for grouping sets.
>
>
>>
>> 2) There are new regression tests, but they are not added to any schedule
>> (parallel or serial), and so are not executed as part of "make check". I
>> suppose this is a mistake.
>>
>
> Yes, thanks. Added the new regression test in parallel_schedule and
> serial_schedule.
>
>
>>
>> 3) The regression tests do check plan and results like this:
>>
>> EXPLAIN (COSTS OFF, VERBOSE) SELECT ...;
>> SELECT ... ORDER BY 1, 2, 3;
>>
>> which however means that the query might easily use a different plan than
>> what's verified in the eplain (thanks to the additional ORDER BY clause).
>> So I think this should explain and execute the same query.
>>
>> (In this case the 

Re: backup manifests

2019-09-30 Thread Jeevan Chalke
An entry for directories is not added to the manifest, so it might be
difficult for the client to get to know about the directories. Would it
be good to add an entry for each directory too? Maybe something like:
Dir 

Also, on latest HEAD the patches do not apply.

On Wed, Sep 25, 2019 at 6:17 PM vignesh C  wrote:

> On Sat, Sep 21, 2019 at 12:25 AM Robert Haas 
> wrote:
> >
> Some comments:
>
> The manifest file will be in plain text format even if compression is
> specified; should we compress it?
> Maybe this is intended, just raising the point to make sure that it is
> intended.
> +static void
> +ReceiveBackupManifestChunk(size_t r, char *copybuf, void *callback_data)
> +{
> + WriteManifestState *state = callback_data;
> +
> + if (fwrite(copybuf, r, 1, state->file) != 1)
> + {
> + pg_log_error("could not write to file \"%s\": %m", state->filename);
> + exit(1);
> + }
> +}
>
> WALfile.done file gets added but wal file information is not included
> in the manifest file, should we include WAL file also?
> @@ -599,16 +618,20 @@ perform_base_backup(basebackup_options *opt)
>   (errcode_for_file_access(),
>   errmsg("could not stat file \"%s\": %m", pathbuf)));
>
> - sendFile(pathbuf, pathbuf, , false, InvalidOid);
> + sendFile(pathbuf, pathbuf, , false, InvalidOid, manifest,
> + NULL);
>
>   /* unconditionally mark file as archived */
>   StatusFilePath(pathbuf, fname, ".done");
> - sendFileWithContent(pathbuf, "");
> + sendFileWithContent(pathbuf, "", manifest);
>
> Should we add an option to make manifest generation configurable to
> reduce overhead during backup?
>
> Manifest file does not include directory information, should we include it?
>
> There is one warning:
> In file included from ../../../src/include/fe_utils/string_utils.h:20:0,
>  from pg_basebackup.c:34:
> pg_basebackup.c: In function ‘ReceiveTarFile’:
> ../../../src/interfaces/libpq/pqexpbuffer.h:60:9: warning: the
> comparison will always evaluate as ‘false’ for the address of ‘buf’
> will never be NULL [-Waddress]
>   ((str) == NULL || (str)->maxlen == 0)
>  ^
> pg_basebackup.c:1203:7: note: in expansion of macro ‘PQExpBufferBroken’
>if (PQExpBufferBroken())
>
>
Yes, I too observed this warning.


> pg_gmtime can fail in case of malloc failure:
> + /*
> + * Convert time to a string. Since it's not clear what time zone to use
> + * and since time zone definitions can change, possibly causing confusion,
> + * use GMT always.
> + */
> + pg_strftime(timebuf, sizeof(timebuf), "%Y-%m-%d %H:%M:%S %Z",
> + pg_gmtime());
>
> Regards,
> Vignesh
> EnterpriseDB: http://www.enterprisedb.com
>
>
>

-- 
Jeevan Chalke
Associate Database Architect & Team Lead, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company


Re: Usage of the system truststore for SSL certificate validation

2019-09-30 Thread Magnus Hagander
On Sat, Sep 28, 2019 at 9:59 PM Bruce Momjian  wrote:

> On Thu, Sep 19, 2019 at 12:26:27PM -0400, Isaac Morland wrote:
> > If we're going to open this up, can we add an option to say "this key is
> > allowed to log in to this account", SSH style?
> >
> > I like the idea of using keys rather than .pgpass, but I like the ~/.ssh/
> > authorized_keys model and don't like the "set up an entire certificate
> > infrastructure" approach.
>
> This is actually a good question --- why does ssh do it that way and
> Postgres does it another, more like a web server/client.  Maybe it is
> because ssh allows the user to create one key pair, and use it for
> several independent servers, while Postgres assumes the client will only
> connect to multiple related servers controlled by the same CA.  With the
> Postgres approach, you can change the client certificate with no changes
> on the server, while with the ssh model, changing the client certificate
> requires sending the public key to the ssh server to be added to
> ~/.ssh/authorized_keys.
>

The big difference between the two methods in general is the CA, yes. In the
SSL-based method, you have a central authority that says "these keys are
OK" by means of certificates. In the ssh key model, there's an individual
keypair.
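
(For context, today's CA-based model is what the cert authentication method
in pg_hba.conf implements, e.g. with an illustrative line such as

    hostssl  all  all  0.0.0.0/0  cert

where any client certificate that verifies against the server's trusted CA
and whose CN matches the role name is accepted.)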

It would make no sense to extend the cert model of authentication to
support ssh style keys, IMO. However, it might make perfect sense to add a
separate pure key based login method. And re-using the way ssh handles keys
there would make sense. But the question is, would you really want to
re-use the ssh *keys*? You couldn't do it server-side anyway (PostgreSQL
won't have access to authorized_keys files for other users than itself, as
unlike ssh it doesn't run as root), and since you need a separate keyspace
you probably wouldn't want to use .ssh/identity either.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: pgbench - allow to create partitioned tables

2019-09-30 Thread Fabien COELHO


Hello Amit,


Attached v18:
  - remove the test tablespace
I had to work around a strange issue around partitioned tables and
the default tablespace.


- if (tablespace != NULL)
+ if (tablespace != NULL && strcmp(tablespace, "pg_default") != 0)

[...]

I don't think we need any such check, rather if the user gives
default_tablespace with 'partitions' option, then let it fail with an
error "cannot specify default tablespace for partitioned relations".


That is the one I wanted to avoid, which is triggered by the TAP tests, but
I'm fine with putting back a tablespace. Given the strange constraints on
partitioned tables, ISTM desirable to check that they work with options such
as tablespace and fillfactor.



(b) Create a non-default tablespace to test partitions with "all
possible options" test as you have in your previous version.


Also, add a comment explaining why in that test we are using non-default 
tablespace.



I am leaning towards approach (b) unless you and or Alvaro feels (a)
is good for now or if you have some other idea.


No other idea. I put back the tablespace creation which I just removed, 
with comments about why it is there.



If we want to go with option (b), I have small comment in your previous test:
+# tablespace for testing
+my $ts = $node->basedir . '/regress_pgbench_tap_1_ts_dir';
+mkdir $ts or die "cannot create directory $ts";
+my $ets = TestLib::perl2host($ts);
+# add needed escaping!
+$ets =~ s/'/''/;

I am not sure if we really need this quote skipping stuff.  Why can't
we write the test as below:

# tablespace for testing
my $basedir = $node->basedir;
my $ts = "$basedir/regress_pgbench_tap_1_ts_dir";
mkdir $ts or die "cannot create directory $ts";
$ts = TestLib::perl2host($ts);
$node->safe_psql('postgres',
   "CREATE TABLESPACE regress_pgbench_tap_1_ts LOCATION '$ets';"


I think that this last command fails if the path contains a "'", so the 
'-escaping is necessary. I had to make changes in TAP tests before because 
it was not working when the path was a little bit strange, so now I'm 
careful.


Attached v19:
 - put back a local tablespace plus comments
 - remove the doubtful pg_default workaround.

--
Fabien.

diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index c857aa3cba..e3a0abb4c7 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -306,6 +306,31 @@ pgbench  options  d
   
  
 
+ 
+  --partitions=NUM
+  
+   
+Create a partitioned pgbench_accounts table with
+NUM partitions of nearly equal size for
+the scaled number of accounts.
+Default is 0, meaning no partitioning.
+   
+  
+ 
+
+ 
+  --partition-method=NAME
+  
+   
+Create a partitioned pgbench_accounts table with
+NAME method.
+Expected values are range or hash.
+This option requires that --partitions is set to non-zero.
+If unspecified, default is range.
+   
+  
+ 
+
  
   --tablespace=tablespace
   
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index ed7652bfbf..d71e38b8a8 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -186,6 +186,25 @@ int64		latency_limit = 0;
 char	   *tablespace = NULL;
 char	   *index_tablespace = NULL;
 
+/*
+ * Number of "pgbench_accounts" partitions, found or to create.
+ * When creating, 0 is the default and means no partitioning.
+ * When running, this is the actual number of partitions.
+ */
+static int	partitions = 0;
+
+/* partitioning strategy for "pgbench_accounts" */
+typedef enum
+{
+	PART_NONE,		/* no partitioning */
+	PART_RANGE,	/* range partitioning */
+	PART_HASH		/* hash partitioning */
+}
+			partition_method_t;
+
+static partition_method_t partition_method = PART_NONE;
+static const char *PARTITION_METHOD[] = {"none", "range", "hash"};
+
 /* random seed used to initialize base_random_sequence */
 int64		random_seed = -1;
 
@@ -617,6 +636,9 @@ usage(void)
 		   "  --foreign-keys   create foreign key constraints between tables\n"
 		   "  --index-tablespace=TABLESPACE\n"
 		   "   create indexes in the specified tablespace\n"
+		   "  --partitions=NUM partition pgbench_accounts in NUM parts (default: 0)\n"
+		   "  --partition-method=(range|hash)\n"
+		   "   partition pgbench_accounts with this method (default: range)\n"
 		   "  --tablespace=TABLESPACE  create tables in the specified tablespace\n"
 		   "  --unlogged-tablescreate tables as unlogged tables\n"
 		   "\nOptions to select what to run:\n"
@@ -3601,6 +3623,89 @@ initDropTables(PGconn *con)
 	 "pgbench_tellers");
 }
 
+/*
+ * add fillfactor percent option.
+ *
+ * As default is 100, it could be removed in this case.
+ */
+static void
+append_fillfactor(char *opts, int len)
+{
+	snprintf(opts + strlen(opts), len - strlen(opts),
+			 " with (fillfactor=%d)", fillfactor);
+}
+
+/*

Re: Two pg_rewind patches (auto generate recovery conf and ensure clean shutdown)

2019-09-30 Thread Alexey Kondratov

On 27.09.2019 17:28, Alvaro Herrera wrote:



+   # Now, when pg_rewind apparently succeeded with minimal 
permissions,
+   # add REPLICATION privilege.  So we could test that new standby
+   # is able to connect to the new master with generated config.
+   $node_standby->psql(
+   'postgres', "ALTER ROLE rewind_user WITH REPLICATION;");

I think this better use safe_psql.



Yes, indeed.

On 30.09.2019 10:07, Paul Guo wrote:


2) Are you going to leave -R option completely without tap-tests?
Attached is a small patch, which tests -R option along with the
existing
'remote' case. If needed it may be split into two separate cases.
First,
it tests that pg_rewind is able to succeed with minimal permissions
according to the Michael's patch d9f543e [1]. Next, it checks
presence
of standby.signal and adds REPLICATION permission to rewind_user
to test
that new standby is able to start with generated recovery
configuration.

[1]

https://github.com/postgres/postgres/commit/d9f543e9e9be15f92abdeaf870e57ef289020191

It seems that we could further disable the recovery info setting code
for the 'remote' test case?


-   my $port_standby = $node_standby->port;
-   $node_master->append_conf(
-       'postgresql.conf', qq(
-primary_conninfo='port=$port_standby'
-));
+   if ($test_mode ne "remote")
+   {
+       my $port_standby = $node_standby->port;
+       $node_master->append_conf(
+           'postgresql.conf',
+           qq(primary_conninfo='port=$port_standby'));

-   $node_master->set_standby_mode();
+       $node_master->set_standby_mode();
+   }




Yeah, it makes sense. It is superfluous for the remote case if we add '-R' there.
I've updated and attached my test-adding patch.




--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company

From b38bc7d71f7e7d68d66d3bf9af4e6371445aeab2 Mon Sep 17 00:00:00 2001
From: Alexey Kondratov 
Date: Fri, 27 Sep 2019 14:30:57 +0300
Subject: [PATCH v10 3/3] Test new standby start with generated config during
 pg_rewind remote

---
 src/bin/pg_rewind/t/001_basic.pl   |  2 +-
 src/bin/pg_rewind/t/002_databases.pl   |  2 +-
 src/bin/pg_rewind/t/003_extrafiles.pl  |  2 +-
 src/bin/pg_rewind/t/004_pg_xlog_symlink.pl |  2 +-
 src/bin/pg_rewind/t/RewindTest.pm  | 27 +++---
 5 files changed, 23 insertions(+), 12 deletions(-)

diff --git a/src/bin/pg_rewind/t/001_basic.pl b/src/bin/pg_rewind/t/001_basic.pl
index 115192170e..c3293e93df 100644
--- a/src/bin/pg_rewind/t/001_basic.pl
+++ b/src/bin/pg_rewind/t/001_basic.pl
@@ -1,7 +1,7 @@
 use strict;
 use warnings;
 use TestLib;
-use Test::More tests => 10;
+use Test::More tests => 11;
 
 use FindBin;
 use lib $FindBin::RealBin;
diff --git a/src/bin/pg_rewind/t/002_databases.pl b/src/bin/pg_rewind/t/002_databases.pl
index f1eb4fe1d2..1db534c0dc 100644
--- a/src/bin/pg_rewind/t/002_databases.pl
+++ b/src/bin/pg_rewind/t/002_databases.pl
@@ -1,7 +1,7 @@
 use strict;
 use warnings;
 use TestLib;
-use Test::More tests => 6;
+use Test::More tests => 7;
 
 use FindBin;
 use lib $FindBin::RealBin;
diff --git a/src/bin/pg_rewind/t/003_extrafiles.pl b/src/bin/pg_rewind/t/003_extrafiles.pl
index c4040bd562..f4710440fc 100644
--- a/src/bin/pg_rewind/t/003_extrafiles.pl
+++ b/src/bin/pg_rewind/t/003_extrafiles.pl
@@ -3,7 +3,7 @@
 use strict;
 use warnings;
 use TestLib;
-use Test::More tests => 4;
+use Test::More tests => 5;
 
 use File::Find;
 
diff --git a/src/bin/pg_rewind/t/004_pg_xlog_symlink.pl b/src/bin/pg_rewind/t/004_pg_xlog_symlink.pl
index ed1ddb6b60..639eeb9c91 100644
--- a/src/bin/pg_rewind/t/004_pg_xlog_symlink.pl
+++ b/src/bin/pg_rewind/t/004_pg_xlog_symlink.pl
@@ -14,7 +14,7 @@ if ($windows_os)
 }
 else
 {
-	plan tests => 4;
+	plan tests => 5;
 }
 
 use FindBin;
diff --git a/src/bin/pg_rewind/t/RewindTest.pm b/src/bin/pg_rewind/t/RewindTest.pm
index 68b6004e94..2b45c2789c 100644
--- a/src/bin/pg_rewind/t/RewindTest.pm
+++ b/src/bin/pg_rewind/t/RewindTest.pm
@@ -149,7 +149,7 @@ sub start_master
 
 	# Create custom role which is used to run pg_rewind, and adjust its
 	# permissions to the minimum necessary.
-	$node_master->psql(
+	$node_master->safe_psql(
 		'postgres', "
 		CREATE ROLE rewind_user LOGIN;
 		GRANT EXECUTE ON function pg_catalog.pg_ls_dir(text, boolean, boolean)
@@ -266,9 +266,18 @@ sub run_pg_rewind
 			[
 'pg_rewind',  "--debug",
 "--source-server",$standby_connstr,
-"--target-pgdata=$master_pgdata", "--no-sync"
+"--target-pgdata=$master_pgdata", "-R", "--no-sync"
 			],
 			'pg_rewind remote');
+
+		# Check that standby.signal has been created.
+		ok(-e "$master_pgdata/standby.signal");
+
+		# Now, when pg_rewind apparently succeeded with minimal permissions,
+		# add REPLICATION privilege.  So we could test that new standby
+		# is able to connect to the new master with 

Re: recovery_min_apply_delay in archive recovery causes assertion failure in latch

2019-09-30 Thread Fujii Masao
On Mon, Sep 30, 2019 at 12:42 PM Michael Paquier  wrote:
>
> On Mon, Sep 30, 2019 at 12:49:03AM +0900, Fujii Masao wrote:
> > Attached patch fixes this issue by making archive recovery always ignore
> > recovery_min_apply_delay. This change is OK because
> > recovery_min_apply_delay was introduced for standby mode, I think.
> >
> > This issue is not new in v12. I observed that the issue was reproduced
> > in v11. So the back-patch is necessary.
>
> I have not directly tested, but from a lookup at the code I think
> that you are right.  Perhaps we'd want more safeguards in
> WaitForWALToBecomeAvailable(), like an assert within the
> XLOG_FROM_STREAM part similar to the check you are adding?  My point
> is that we should switch to XLOG_FROM_STREAM only if we are in standby
> mode, and that's the only place where the startup process looks at
> recoveryWakeupLatch.

Thanks for the review! OK, attached is the patch which also added
two assertion checks as you described.

Regards,

-- 
Fujii Masao


fix-assertion-failure-in-latch_v2.patch
Description: Binary data


Re: Skip recovery/standby signal files in pg_basebackup

2019-09-30 Thread Ashutosh Sharma
Hi David,

On Sat, Sep 28, 2019 at 12:23 AM David Steele  wrote:
>
> Hackers,
>
> Restoring these files could cause surprising behaviors so it seems best
> to let the restore process create them when needed.
>

Could you please let us know what the surprising behaviour is that you are
talking about here when including recovery/standby signal files in the
pg_basebackup output?

If including recovery.conf in the pg_basebackup output earlier wasn't a
problem, then why should including recovery/standby.signal be a problem?

Your patch is just trying to skip the standby.signal or recovery.signal
files when the base backup is taken either on a standby server or on a
server where PITR is still going on or is paused.

What would be the behaviour with your patch when the *-R* option is used
with pg_basebackup to take a backup from a standby server? Won't it
create a standby.signal file?

> Patch is attached.
>

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com




Re: Removing unneeded self joins

2019-09-30 Thread Konstantin Knizhnik


Slightly refactored version of the patch with more comments.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index b0dcd02..dc8cb9c 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -2274,7 +2274,8 @@ _outRelOptInfo(StringInfo str, const RelOptInfo *node)
 	WRITE_OID_FIELD(userid);
 	WRITE_BOOL_FIELD(useridiscurrent);
 	/* we don't try to print fdwroutine or fdw_private */
-	/* can't print unique_for_rels/non_unique_for_rels; BMSes aren't Nodes */
+	WRITE_NODE_FIELD(unique_for_rels);
+	/* can't print non_unique_for_rels; BMSes aren't Nodes */
 	WRITE_NODE_FIELD(baserestrictinfo);
 	WRITE_UINT_FIELD(baserestrict_min_security);
 	WRITE_NODE_FIELD(joininfo);
@@ -2347,6 +2348,19 @@ _outStatisticExtInfo(StringInfo str, const StatisticExtInfo *node)
 }
 
 static void
+_outUniqueRelInfo(StringInfo str, const UniqueRelInfo *node)
+{
+	WRITE_NODE_TYPE("UNIQUERELINFO");
+
+	WRITE_BITMAPSET_FIELD(outerrelids);
+	if (node->index)
+	{
+		WRITE_OID_FIELD(index->indexoid);
+		WRITE_NODE_FIELD(column_values);
+	}
+}
+
+static void
 _outEquivalenceClass(StringInfo str, const EquivalenceClass *node)
 {
 	/*
@@ -4122,6 +4136,9 @@ outNode(StringInfo str, const void *obj)
 			case T_StatisticExtInfo:
 _outStatisticExtInfo(str, obj);
 break;
+			case T_UniqueRelInfo:
+_outUniqueRelInfo(str, obj);
+break;
 			case T_ExtensibleNode:
 _outExtensibleNode(str, obj);
 break;
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 37b257c..3d0d03b 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -3564,7 +3564,8 @@ ec_member_matches_indexcol(PlannerInfo *root, RelOptInfo *rel,
  * relation_has_unique_index_for
  *	  Determine whether the relation provably has at most one row satisfying
  *	  a set of equality conditions, because the conditions constrain all
- *	  columns of some unique index.
+ *	  columns of some unique index. If index_info is not null, it is set to
+ *	  point to a new UniqueRelInfo containing the index and conditions.
  *
  * The conditions can be represented in either or both of two ways:
  * 1. A list of RestrictInfo nodes, where the caller has already determined
@@ -3585,12 +3586,16 @@ ec_member_matches_indexcol(PlannerInfo *root, RelOptInfo *rel,
 bool
 relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
 			  List *restrictlist,
-			  List *exprlist, List *oprlist)
+			  List *exprlist, List *oprlist,
+			  UniqueRelInfo **unique_info)
 {
 	ListCell   *ic;
 
 	Assert(list_length(exprlist) == list_length(oprlist));
 
+	if (unique_info)
+		*unique_info = NULL;
+
 	/* Short-circuit if no indexes... */
 	if (rel->indexlist == NIL)
 		return false;
@@ -3641,6 +3646,7 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
 	{
 		IndexOptInfo *ind = (IndexOptInfo *) lfirst(ic);
 		int			c;
+		List *column_values = NIL;
 
 		/*
 		 * If the index is not unique, or not immediately enforced, or if it's
@@ -3689,6 +3695,9 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
 if (match_index_to_operand(rexpr, c, ind))
 {
 	matched = true; /* column is unique */
+	column_values = lappend(column_values, rinfo->outer_is_left
+			? get_leftop(rinfo->clause)
+			: get_rightop(rinfo->clause));
 	break;
 }
 			}
@@ -3731,7 +3740,22 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
 
 		/* Matched all key columns of this index? */
 		if (c == ind->nkeycolumns)
+		{
+			if (unique_info)
+			{
+/* This may be called in GEQO memory context. */
+MemoryContext oldContext = MemoryContextSwitchTo(root->planner_cxt);
+*unique_info = makeNode(UniqueRelInfo);
+(*unique_info)->index = ind;
+(*unique_info)->column_values = list_copy(column_values);
+MemoryContextSwitchTo(oldContext);
+			}
+			if (column_values)
+list_free(column_values);
 			return true;
+		}
+		if (column_values)
+			list_free(column_values);
 	}
 
 	return false;
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index dc28b56..450d2d4 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -176,7 +176,8 @@ add_paths_to_joinrel(PlannerInfo *root,
 	innerrel,
 	JOIN_INNER,
 	restrictlist,
-	false);
+	false,
+	NULL /*index_info*/);
 			break;
 		default:
 			extra.inner_unique = innerrel_is_unique(root,
@@ -185,7 +186,8 @@ add_paths_to_joinrel(PlannerInfo *root,
 	innerrel,
 	jointype,
 	restrictlist,
-	false);
+	false,
+	NULL /*index_info*/);
 			break;
 	}
 
diff --git a/src/backend/optimizer/plan/analyzejoins.c 

Re: pg_wal/RECOVERYHISTORY file remains after archive recovery

2019-09-30 Thread Masahiko Sawada
On Mon, Sep 30, 2019 at 5:03 PM Michael Paquier  wrote:
>
> On Mon, Sep 30, 2019 at 12:53:58PM +0900, Masahiko Sawada wrote:
> > I think that the above checks are always true because isnt() function
> > checks if the 1st argument and 2nd argument are not the same.
>
> Dammit.  I overlooked this part of the module's doc.
>
> > I've attached the updated version patch including the tests. Please
> > review it.
>
> Thanks, your test allows to reproduce the original problem, so that's
> nice.  I don't have much to say, except some improvements to the
> comments of the test as per the attached.  What do you think?

Thank you for updating! The comment in your patch is much better.

Regards,

--
Masahiko Sawada




Re: pg_wal/RECOVERYHISTORY file remains after archive recovery

2019-09-30 Thread Michael Paquier
On Mon, Sep 30, 2019 at 12:53:58PM +0900, Masahiko Sawada wrote:
> I think that the above checks are always true because isnt() function
> checks if the 1st argument and 2nd argument are not the same.

Dammit.  I overlooked this part of the module's doc.

> I've attached the updated version patch including the tests. Please
> review it.

Thanks, your test allows to reproduce the original problem, so that's
nice.  I don't have much to say, except some improvements to the
comments of the test as per the attached.  What do you think?
--
Michael
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 61ba6b852e..790e2c8714 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -5461,7 +5461,6 @@ validateRecoveryParameters(void)
 static void
 exitArchiveRecovery(TimeLineID endTLI, XLogRecPtr endOfLog)
 {
-	char		recoveryPath[MAXPGPATH];
 	char		xlogfname[MAXFNAMELEN];
 	XLogSegNo	endLogSegNo;
 	XLogSegNo	startLogSegNo;
@@ -5541,17 +5540,6 @@ exitArchiveRecovery(TimeLineID endTLI, XLogRecPtr endOfLog)
 	XLogFileName(xlogfname, ThisTimeLineID, startLogSegNo, wal_segment_size);
 	XLogArchiveCleanup(xlogfname);
 
-	/*
-	 * Since there might be a partial WAL segment named RECOVERYXLOG, get rid
-	 * of it.
-	 */
-	snprintf(recoveryPath, MAXPGPATH, XLOGDIR "/RECOVERYXLOG");
-	unlink(recoveryPath);		/* ignore any error */
-
-	/* Get rid of any remaining recovered timeline-history file, too */
-	snprintf(recoveryPath, MAXPGPATH, XLOGDIR "/RECOVERYHISTORY");
-	unlink(recoveryPath);		/* ignore any error */
-
 	/*
 	 * Remove the signal files out of the way, so that we don't accidentally
 	 * re-enter archive recovery mode in a subsequent crash.
@@ -7433,6 +7421,7 @@ StartupXLOG(void)
 	if (ArchiveRecoveryRequested)
 	{
 		char		reason[200];
+		char		recoveryPath[MAXPGPATH];
 
 		Assert(InArchiveRecovery);
 
@@ -7489,6 +7478,17 @@ StartupXLOG(void)
 		 */
 		writeTimeLineHistory(ThisTimeLineID, recoveryTargetTLI,
 			 EndRecPtr, reason);
+
+		/*
+		 * Since there might be a partial WAL segment named RECOVERYXLOG, get
+		 * rid of it.
+		 */
+		snprintf(recoveryPath, MAXPGPATH, XLOGDIR "/RECOVERYXLOG");
+		unlink(recoveryPath);	/* ignore any error */
+
+		/* Get rid of any remaining recovered timeline-history file, too */
+		snprintf(recoveryPath, MAXPGPATH, XLOGDIR "/RECOVERYHISTORY");
+		unlink(recoveryPath);	/* ignore any error */
 	}
 
 	/* Save the selected TimeLineID in shared memory, too */
diff --git a/src/test/recovery/t/002_archiving.pl b/src/test/recovery/t/002_archiving.pl
index e1bd3c95cc..b1c4694cdf 100644
--- a/src/test/recovery/t/002_archiving.pl
+++ b/src/test/recovery/t/002_archiving.pl
@@ -3,7 +3,7 @@ use strict;
 use warnings;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 1;
+use Test::More tests => 3;
 use File::Copy;
 
 # Initialize master node, doing archives
@@ -49,3 +49,28 @@ $node_standby->poll_query_until('postgres', $caughtup_query)
 my $result =
   $node_standby->safe_psql('postgres', "SELECT count(*) FROM tab_int");
 is($result, qq(1000), 'check content from archives');
+
+# Check the presence of temporary files specifically generated during
+# archive recovery.  To ensure the presence of the temporary history
+# file, switch to a timeline large enough to allow a standby to recover
+# a history file from an archive.  As this requires at least two timeline
+# switches, promote the existing standby first.  Then create a second
+# standby based on the promoted one.  Finally, the second standby is
+# promoted.
+$node_standby->promote;
+
+my $node_standby2 = get_new_node('standby2');
+$node_standby2->init_from_backup($node_master, $backup_name,
+ has_restoring => 1);
+$node_standby2->start;
+
+# Now promote standby2, and check that files specifically generated during
+# archive recovery are removed by the end of recovery.
+$node_standby2->promote;
+my $node_standby2_data = $node_standby2->data_dir;
+ok(
+	! -f "$node_standby2_data/pg_wal/RECOVERYHISTORY",
+	"RECOVERYHISTORY removed after promotion");
+ok(
+	! -f "$node_standby2_data/pg_wal/RECOVERYXLOG",
+	"RECOVERYXLOG removed after promotion");


signature.asc
Description: PGP signature


Re: Don't codegen deform code for virtual tuples in expr eval for scan fetch

2019-09-30 Thread Andres Freund
Hi,

On 2019-09-27 23:01:05 -0700, Soumyadeep Chakraborty wrote:
> I completely agree, that was an important consideration.
> 
> I had some purely cosmetic suggestions:
> 1. Rename ExecComputeSlotInfo to eliminate the need for the asserts.

How does renaming it do so? I feel like the asserts are a good idea
independent of anything else?


> 2. Extract return value to a bool variable for slightly better
> readability.

To me that seems clearly worse. The variable doesn't add anything, but
needing to track more state.


> 3. Taking the opportunity to use TTS_IS_VIRTUAL.

Good point.

- Andres




Inconsistent usage of BACKEND_* symbols

2019-09-30 Thread Kyotaro Horiguchi
Hello.

While looking around the shutdown sequence, I noticed that pmdie() uses
"BACKEND_TYPE_AUTOVAC | BACKEND_TYPE_BGWORKER" for sending signals,
while PostmasterStateMachine counts the same children using
BACKEND_TYPE_WORKER. pmdie() is the only place that spells out the
combination explicitly; it seems to me to be just a leftover of da07a1e856.
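
For reference, the relevant definitions in postmaster.c look roughly like
this (quoted from memory, so please double-check the file); it is why the
explicit OR and BACKEND_TYPE_WORKER select exactly the same children:

#define BACKEND_TYPE_NORMAL		0x0001	/* normal backend */
#define BACKEND_TYPE_AUTOVAC	0x0002	/* autovacuum worker process */
#define BACKEND_TYPE_WALSND		0x0004	/* walsender process */
#define BACKEND_TYPE_BGWORKER	0x0008	/* bgworker process */
#define BACKEND_TYPE_ALL		0x000F	/* OR of all the above */

#define BACKEND_TYPE_WORKER		(BACKEND_TYPE_AUTOVAC | BACKEND_TYPE_BGWORKER)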

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index eb9e0221f8..a071c3de87 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -2740,8 +2740,8 @@ pmdie(SIGNAL_ARGS)
 			{
 /* autovac workers are told to shut down immediately */
 /* and bgworkers too; does this need tweaking? */
-SignalSomeChildren(SIGTERM,
-   BACKEND_TYPE_AUTOVAC | BACKEND_TYPE_BGWORKER);
+SignalSomeChildren(SIGTERM, BACKEND_TYPE_WORKER);
+
 /* and the autovac launcher too */
 if (AutoVacPID != 0)
 	signal_child(AutoVacPID, SIGTERM);
@@ -2821,8 +2821,7 @@ pmdie(SIGNAL_ARGS)
 		(errmsg("aborting any active transactions")));
 /* shut down all backends and workers */
 SignalSomeChildren(SIGTERM,
-   BACKEND_TYPE_NORMAL | BACKEND_TYPE_AUTOVAC |
-   BACKEND_TYPE_BGWORKER);
+   BACKEND_TYPE_NORMAL | BACKEND_TYPE_WORKER);
 /* and the autovac launcher too */
 if (AutoVacPID != 0)
 	signal_child(AutoVacPID, SIGTERM);


Re: Batch insert in CTAS/MatView code

2019-09-30 Thread Andres Freund
Hi,

On 2019-09-30 12:12:31 +0800, Paul Guo wrote:
> > > > However, I can also see that there is no better alternative.  We need
> > to
> > > > compute the size of accumulated tuples so far, in order to decide
> > whether
> > > > to stop accumulating tuples.  There is no convenient way to obtain the
> > > > length of the tuple, given a slot.  How about making that decision
> > solely
> > > > based on number of tuples, so that we can avoid ExecFetchSlotHeapTuple
> > call
> > > > altogether?
> > >
> > > ... maybe we should add a new operation to slots, that returns the
> > > (approximate?) size of a tuple?
> >
> > Hm, I'm not convinced that it's worth adding that as a dedicated
> > operation. It's not that clear what it'd exactly mean anyway - what
> > would it measure? As referenced in the slot? As if it were stored on
> > disk? etc?
> >
> > I wonder if the right answer wouldn't be to just measure the size of a
> > memory context containing the batch slots, or something like that.
> >
> >
> Probably a better way is to move that logic (append slot to slots, judge
> when to flush, flush, clean up slots) into table_multi_insert()?

That does not strike me as a good idea. The upper layer is going to need
to manage some resources (e.g. it's the only bit that knows about how to
manage lifetime of the incoming data), and by exposing it to each AM
we're going to duplicate the necessary code too.


> Generally the final implementation of table_multi_insert() should be
> able to know the sizes easily. One concern is that currently just COPY
> in the repo uses multi insert, so not sure if other callers in the
> future want their own logic (or set up a flag to allow customization
> but seems a bit over-designed?).

And that is also a concern, it seems unlikely that we'll get the
interface good.


Greetings,

Andres Freund




Re: Skip recovery/standby signal files in pg_basebackup

2019-09-30 Thread Michael Paquier
On Fri, Sep 27, 2019 at 02:52:54PM -0400, David Steele wrote:
> Restoring these files could cause surprising behaviors so it seems best
> to let the restore process create them when needed.
> 
> Patch is attached.

When taking a base backup from a standby, we have always copied
recovery.conf if present, which would have triggered recovery (and
a standby if standby_mode was enabled).  Hence always including
RECOVERY_SIGNAL_FILE would be consistent with the past behavior.

Including STANDBY_SIGNAL_FILE would be consistent with checking if
standby_mode was set or not in recovery.conf.  We have replaced
standby_mode by the standby signal file, so including it if present
is consistent with the past as well, no?
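
(To spell out the mapping being discussed, roughly: a pre-v12 recovery.conf
such as

    standby_mode = 'on'
    primary_conninfo = 'host=primary port=5432'   # illustrative values

corresponds in v12 to an empty standby.signal file plus the primary_conninfo
setting moved to postgresql.conf/postgresql.auto.conf, while archive recovery
without standby_mode corresponds to recovery.signal.)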
--
Michael


signature.asc
Description: PGP signature


Re: Skip recovery/standby signal files in pg_basebackup

2019-09-30 Thread Masahiko Sawada
On Sat, Sep 28, 2019 at 3:53 AM David Steele  wrote:
>
> Hackers,
>
> Restoring these files could cause surprising behaviors so it seems best
> to let the restore process create them when needed.

It's not a normal situation for a running postgres to have either
recovery.signal or standby.signal, but I'm +1 on this change for
safety.

The patch looks good to me.

Regards,

--
Masahiko Sawada




Re: Two pg_rewind patches (auto generate recovery conf and ensure clean shutdown)

2019-09-30 Thread Paul Guo
>
>
> I went through the remaining two patches and they seem to be very clear
> and concise. However, there are two points I could complain about:
>
> 1) Maybe I've missed it somewhere in the thread above, but currently
> pg_rewind allows to run itself with -R and --source-pgdata. In that case
> -R option is just swallowed and neither standby.signal, nor
> postgresql.auto.conf is written, which is reasonable though. Should it
> be stated somehow in the docs that -R option always has to go altogether
> with --source-server? Or should pg_rewind notify user that options are
> incompatible and no recovery configuration will be written?
>

I modified the code & docs to address this. In the code, pg_rewind will
error out for the local case.


> 2) Are you going to leave -R option completely without tap-tests?
> Attached is a small patch, which tests -R option along with the existing
> 'remote' case. If needed it may be split into two separate cases. First,
> it tests that pg_rewind is able to succeed with minimal permissions
> according to the Michael's patch d9f543e [1]. Next, it checks presence
> of standby.signal and adds REPLICATION permission to rewind_user to test
> that new standby is able to start with generated recovery configuration.
>
> [1]
>
> https://github.com/postgres/postgres/commit/d9f543e9e9be15f92abdeaf870e57ef289020191
>
>
It seems that we could further disable the recovery info setting code for the
'remote' test case?

-   my $port_standby = $node_standby->port;
-   $node_master->append_conf(
-   'postgresql.conf', qq(
-primary_conninfo='port=$port_standby'
-));
+   if ($test_mode ne "remote")
+   {
+   my $port_standby = $node_standby->port;
+   $node_master->append_conf(
+   'postgresql.conf',
+   qq(primary_conninfo='port=$port_standby'));

-   $node_master->set_standby_mode();
+   $node_master->set_standby_mode();
+   }

Thanks.


v10-0001-Add-option-to-write-recovery-configuration-infor.patch
Description: Binary data


Re: pgbench - allow to create partitioned tables

2019-09-30 Thread Amit Kapila
On Sat, Sep 28, 2019 at 11:41 AM Fabien COELHO  wrote:
>
>
> Hello Amit,
>
> > I think we might also need to use pg_get_partkeydef along with
> > pg_partition_tree to fetch the partition method information.  However,
> > I think to find reloid of pgbench_accounts in the current search path,
> > we might need to use some part of query constructed by Fabien.
> >
> > Fabien, what do you think about Alvaro's suggestion?
>
> I think that the current straightforward SQL query is and works fine, and
> I find it pretty elegant. No doubt other solutions could be implemented to
> the same effect, with SQL or possibly through introspection functions.
>
> Incidentally, ISTM that "pg_partition_tree" appears in v12, while
> partitions exist in v11, so it would break uselessly backward
> compatibility of the feature which currently work with v11, which I do not
> find desirable.
>

Fair enough.  Alvaro, do let us know if you think this can be
simplified.  I think even if we find some better way to get that
information as compared to what Fabien has done here, we can change it
later without any impact.
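
For reference, a rough sketch of the v12-only introspection alternative
mentioned above (assuming pgbench_accounts is partitioned; this is not what
the patch does):

    -- partitioning method and key
    SELECT pg_get_partkeydef('pgbench_accounts'::regclass);
    -- number of leaf partitions
    SELECT count(*)
    FROM pg_partition_tree('pgbench_accounts'::regclass)
    WHERE isleaf;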

> Attached v18:
>   - remove the test tablespace
> I had to work around a strange issue around partitioned tables and
> the default tablespace.

- if (tablespace != NULL)
+
+ if (tablespace != NULL && strcmp(tablespace, "pg_default") != 0)
  {

- if (index_tablespace != NULL)
+ if (index_tablespace != NULL && strcmp(index_tablespace, "pg_default") != 0)

I don't think such a workaround is a good idea, for two reasons: (a)
having a check on the name ("pg_default") is not advisable; we should
get the tablespace OID and then check whether it is the same as
DEFAULTTABLESPACE_OID, and (b) this will change something which was
previously allowed, i.e. appending the default tablespace name for
non-partitioned tables.
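
(Purely as an illustration of point (a), a client could resolve the name
through the catalog instead, e.g.:

    SELECT oid = 1663 AS is_default   -- 1663 = DEFAULTTABLESPACE_OID (pg_default)
    FROM pg_catalog.pg_tablespace
    WHERE spcname = 'some_tablespace_name';  -- hypothetical user-supplied name

Not that pgbench necessarily needs such a check at all -- see below.)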

I don't think we need any such check; rather, if the user gives the
default tablespace with the 'partitions' option, then let it fail with the
error "cannot specify default tablespace for partitioned relations".
If we do that then one of the modified pgbench tests will start
failing.  I think we have two options here:

(a) Don't test partitions with "all possible options" test and add a
comment on why we are not testing it there.
(b) Create a non-default tablespace to test partitions with "all
possible options" test as you have in your previous version.  Also,
add a comment explaining why in that test we are using non-default
tablespace.

I am leaning towards approach (b), unless you and/or Alvaro feel (a)
is good for now, or you have some other idea.

If we want to go with option (b), I have a small comment on your previous test:
+# tablespace for testing
+my $ts = $node->basedir . '/regress_pgbench_tap_1_ts_dir';
+mkdir $ts or die "cannot create directory $ts";
+my $ets = TestLib::perl2host($ts);
+# add needed escaping!
+$ets =~ s/'/''/;

I am not sure if we really need this quote skipping stuff.  Why can't
we write the test as below:

# tablespace for testing
my $basedir = $node->basedir;
my $ts = "$basedir/regress_pgbench_tap_1_ts_dir";
mkdir $ts or die "cannot create directory $ts";
$ts = TestLib::perl2host($ts);
$node->safe_psql('postgres',
"CREATE TABLESPACE regress_pgbench_tap_1_ts LOCATION '$ets';"

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com