Re: [PATCH v3 1/1] Fix detection of preadv/pwritev support for OSX.

2021-01-30 Thread James Hilliard
On Fri, Jan 22, 2021 at 12:32 PM James Hilliard
 wrote:
>
> Fixes:
> gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement 
> -Werror=vla -Wendif-labels -Wmissing-format-attribute -Wformat-security 
> -fno-strict-aliasing -fwrapv -Wno-unused-command-line-argument -O2 
> -I../../../../src/include  -isysroot 
> /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX11.1.sdk
> -c -o fd.o fd.c
> fd.c:3661:10: warning: 'pwritev' is only available on macOS 11.0 or newer 
> [-Wunguarded-availability-new]
> part = pg_pwritev(fd, iov, iovcnt, offset);
>^~
> ../../../../src/include/port/pg_iovec.h:49:20: note: expanded from macro 
> 'pg_pwritev'
>^~~
> /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX11.1.sdk/usr/include/sys/uio.h:104:9:
>  note: 'pwritev' has been marked as being introduced in macOS 11.0
>   here, but the deployment target is macOS 10.15.0
> ssize_t pwritev(int, const struct iovec *, int, off_t) 
> __DARWIN_NOCANCEL(pwritev) __API_AVAILABLE(macos(11.0), ios(14.0), 
> watchos(7.0), tvos(14.0));
> ^
> fd.c:3661:10: note: enclose 'pwritev' in a __builtin_available check to 
> silence this warning
> part = pg_pwritev(fd, iov, iovcnt, offset);
>^~
> ../../../../src/include/port/pg_iovec.h:49:20: note: expanded from macro 
> 'pg_pwritev'
>^~~
> 1 warning generated.
>
> This results in a runtime error:
> running bootstrap script ... dyld: lazy symbol binding failed: Symbol not 
> found: _pwritev
>   Referenced from: /usr/local/pgsql/bin/postgres
>   Expected in: /usr/lib/libSystem.B.dylib
>
> dyld: Symbol not found: _pwritev
>   Referenced from: /usr/local/pgsql/bin/postgres
>   Expected in: /usr/lib/libSystem.B.dylib
>
> child process was terminated by signal 6: Abort trap: 6
>
> To fix this we set -Werror=unguarded-availability-new so that a declaration
> check for preadv/pwritev will fail if the symbol is unavailable on the
> requested SDK version.
> ---
> Changes v2 -> v3:
>   - Replace compile check with AC_CHECK_DECLS
>   - Fix preadv detection as well
> Changes v1 -> v2:
>   - Add AC_LIBOBJ(pwritev) when pwritev not available
>   - set -Werror=unguarded-availability-new for CXX flags as well
> ---
>  configure   | 164 ++--
>  configure.ac|   9 +-
>  src/include/pg_config.h.in  |  14 +--
>  src/include/port/pg_iovec.h |   4 +-
>  src/tools/msvc/Solution.pm  |   4 +-
>  5 files changed, 157 insertions(+), 38 deletions(-)
>
> diff --git a/configure b/configure
> index 8af4b99021..07a9b08d80 100755
> --- a/configure
> +++ b/configure
> @@ -5373,6 +5373,98 @@ if test x"$pgac_cv_prog_CC_cflags__Werror_vla" = 
> x"yes"; then
>  fi
>
>
> +  # Prevent usage of symbols marked as newer than our target.
> +
> +{ $as_echo "$as_me:${as_lineno-$LINENO}: checking whether ${CC} supports 
> -Werror=unguarded-availability-new, for CFLAGS" >&5
> +$as_echo_n "checking whether ${CC} supports 
> -Werror=unguarded-availability-new, for CFLAGS... " >&6; }
> +if ${pgac_cv_prog_CC_cflags__Werror_unguarded_availability_new+:} false; 
> then :
> +  $as_echo_n "(cached) " >&6
> +else
> +  pgac_save_CFLAGS=$CFLAGS
> +pgac_save_CC=$CC
> +CC=${CC}
> +CFLAGS="${CFLAGS} -Werror=unguarded-availability-new"
> +ac_save_c_werror_flag=$ac_c_werror_flag
> +ac_c_werror_flag=yes
> +cat confdefs.h - <<_ACEOF >conftest.$ac_ext
> +/* end confdefs.h.  */
> +
> +int
> +main ()
> +{
> +
> +  ;
> +  return 0;
> +}
> +_ACEOF
> +if ac_fn_c_try_compile "$LINENO"; then :
> +  pgac_cv_prog_CC_cflags__Werror_unguarded_availability_new=yes
> +else
> +  pgac_cv_prog_CC_cflags__Werror_unguarded_availability_new=no
> +fi
> +rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
> +ac_c_werror_flag=$ac_save_c_werror_flag
> +CFLAGS="$pgac_save_CFLAGS"
> +CC="$pgac_save_CC"
> +fi
> +{ $as_echo "$as_me:${as_lineno-$LINENO}: result: 
> $pgac_cv_prog_CC_cflags__Werror_unguarded_availability_new" >&5
> +$as_echo "$pgac_cv_prog_CC_cflags__Werror_unguarded_availability_new" >&6; }
> +if test x"$pgac_cv_prog_CC_cflags__Werror_unguarded_availability_new" = 
> x"yes"; then
> +  CFLAGS="${CFLAGS} -Werror=unguarded-availability-new"
> +fi
> +
> +
> +  { $as_echo "$as_me:${as_lineno-$LINENO}: checking whether ${CXX} supports 
> -Werror=unguarded-availability-new, for CXXFLAGS" >&5
> +$as_echo_n "checking whether ${CXX} supports 
> -Werror=unguarded-availability-new, for CXXFLAGS... " >&6; }
> +if ${pgac_cv_prog_CXX_cxxflags__Werror_unguarded_availability_new+:} false; 
> then :
> +  $as_echo_n "(cached) " >&6
> +else
> +  pgac_save_CXXFLAGS=$CXXFLAGS
> +pgac_save_CXX=$CXX
> +CXX=${CXX}
> +CXXFLAGS="${CXXFLAGS} -Werror=unguarded-availability-new"
> +ac_save_cxx_werror_flag=$ac_cxx_werror_flag
> +ac_cxx_werror_flag=yes
> +ac_ext=cpp
> 

Re: Is Recovery actually paused?

2021-01-30 Thread Dilip Kumar
On Fri, Jan 29, 2021 at 4:33 PM Dilip Kumar  wrote:
>
> On Fri, Jan 29, 2021 at 3:25 PM Yugo NAGATA  wrote:
> >
> > On Thu, 28 Jan 2021 09:55:42 +0530
> > Dilip Kumar  wrote:
> >
> > > On Wed, Jan 27, 2021 at 2:28 PM Dilip Kumar  wrote:
> > > >
> > > > On Wed, Jan 27, 2021 at 2:06 PM Yugo NAGATA  wrote:
> > > > >
> > > > > On Wed, 27 Jan 2021 13:29:23 +0530
> > > > > Dilip Kumar  wrote:
> > > > >
> > > > > > On Wed, Jan 27, 2021 at 12:50 PM Masahiko Sawada 
> > > > > >  wrote:
> > > > > > >
> > > > > > > On Tue, Jan 26, 2021 at 2:00 AM Robert Haas 
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > > On Sat, Jan 23, 2021 at 6:10 AM Bharath Rupireddy
> > > > > > > >  wrote:
> > > > > > > > > +1 to just show the recovery pause state in the output of
> > > > > > > > > pg_is_wal_replay_paused. But, should the function name
> > > > > > > > > "pg_is_wal_replay_paused" be something like
> > > > > > > > > "pg_get_wal_replay_pause_state" or some other? To me, when 
> > > > > > > > > "is" exists
> > > > > > > > > in a function, I expect a boolean output. Others may have 
> > > > > > > > > better
> > > > > > > > > thoughts.
> > > > > > > >
> > > > > > > > Maybe we should leave the existing function 
> > > > > > > > pg_is_wal_replay_paused()
> > > > > > > > alone and add a new one with the name you suggest that returns 
> > > > > > > > text.
> > > > > > > > That would create less burden for tool authors.
> > > > > > >
> > > > > > > +1
> > > > > > >
> > > > > >
> > > > > > Yeah, we can do that, I will send an updated patch soon.
> > > > >
> > > > > This means pg_is_wal_replay_paused is left unchanged and returns
> > > > > whether a pause has been requested? If so, it seems good to modify
> > > > > the documentation of this function to note that it may not reflect
> > > > > the actual pause state.
> > > >
> > > > Yes, we can say that it will return true if the replay pause is
> > > > requested.  I am changing that in my new patch.
> > >
> > > I have modified the patch, changes
> > >
> > > - I have added a new interface pg_get_wal_replay_pause_state to get
> > > the pause request state
> > > - Now that we no longer wait for recovery to actually get paused, I
> > > think it doesn't make sense to put a lot of checks for a pause request
> > > everywhere, so I have removed that check from recoveryApplyDelay.  But
> > > I think it is better to keep the check in WaitForWalToBecomeAvailable,
> > > because it can wait forever before the next WAL becomes available.
> >
> > I think basically the check in WaitForWalToBecomeAvailable is independent
> > of the feature of pg_get_wal_replay_pause_state, that is, reporting the
> > actual pause state.  This function could just return 'pause requested'
> > if a pause is requested during waiting for WAL.
> >
> > However, I agree the change to allow recovery to transit the state to
> > 'paused' during WAL waiting because 'paused' has more useful information
> > for users than 'pause requested'.  Returning 'paused' lets users know
> > clearly that no more WAL is applied until recovery is resumed.  On the
> > other hand, when 'pause requested' is returned, users can't tell from
> > this information whether the next WAL record will be applied or not.
> >
> > For the same reason, I think it is also useful to call recoveryPausesHere
> > in recoveryApplyDelay.
>
> IMHO, WaitForWalToBecomeAvailable can wait until the next WAL becomes
> available, which is not under the user's control, so it is good to put a
> check for the recovery pause there.  However, recoveryApplyDelay waits for
> the apply delay, which is configured by the user and is therefore a
> predictable value.  I don't have much objection to putting that check in
> recoveryApplyDelay as well, but I feel it is not necessary.  Any other
> thoughts on this?
>
> > In addition, in RecoveryRequiresIntParameter, recovery should get paused
> > if a parameter value has a problem.  However, pg_get_wal_replay_pause_state
> > will return 'pause requested' in this case. So, I think, we should pass
> > RECOVERY_PAUSED to SetRecoveryPause() instead of RECOVERY_PAUSE_REQUESTED,
> > or call CheckAndSetRecoveryPause() in the loop like recoveryPausesHere().
>
> Yeah, absolutely right, it must pass RECOVERY_PAUSED.  I will change
> this, thanks for noticing this.

I have changed this in the new patch.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


v9-0001-Provide-a-new-interface-to-get-the-recovery-pause.patch
Description: Binary data


Re: shared tempfile was not removed on statement_timeout

2021-01-30 Thread Tom Lane
Thomas Munro  writes:
>> +1, this seems like a good idea.  This is a little bit like the code
>> near the comments "Don't joggle the elbow of proc_exit".

> So that gives a very simple back-patchable patch.

Hmm, so is the *rest* of that function perfectly okay with being
interrupted?

regards, tom lane




Re: Thoughts on "killed tuples" index hint bits support on standby

2021-01-30 Thread Peter Geoghegan
On Sat, Jan 30, 2021 at 9:11 AM Michail Nikolaev
 wrote:
> > Yeah, it would help a lot. But those bits are precious. So it makes
> > sense to think about what to do with both of them in index AMs at the
> > same time.  Otherwise we risk missing some important opportunity.
>
> Hm. I was trying to "expand the scope" as you said and got an idea... Why
> should we even do any conflict resolution for hint bits? Or share precious
> LP bits?

What does it mean today when an LP_DEAD bit is set on a standby? I
don't think that it means nothing at all -- at least not if you think
about it in the most general terms.

In a way it actually means something like "recently dead" even today,
at least in one narrow specific sense: recovery may end, and then we
actually *will* do *something* with the LP_DEAD bit, without directly
concerning ourselves with when or how each LP_DEAD bit was set.

During recovery, we will probably always have to consider the
possibility that LP_DEAD bits that get set on the primary may be
received by a replica through some implementation detail (e.g. LP_DEAD
bits are set in FPIs we replay, maybe even some other thing that
neither of us have thought of). We can't really mask LP_DEAD bits from
the primary in recovery anyway, because of stuff like page-level
checksums. I suspect that it just isn't useful to fight against that.
These preexisting assumptions are baked into everything already.

Why should we assume that *anybody* understands all of the ways in
which that is true?

Even today, "LP_DEAD actually means a limited kind of 'recently dead'
during recovery + hot standby" is something that is true (as I just
went into), but at the same time also has a fuzzy definition. My gut
instinct is to be conservative about that. As I suggested earlier, you
could invent some distinct kind of "recently dead" that achieves your
goals in a way that is 100% orthogonal.

The two unused LP dead bits (unused in indexes, though not tables) are
only precious if we assume that somebody will eventually use them for
something -- if nobody ever uses them then they're 100% useless. The
number of possible projects that might end up using the two bits for
something is not that high -- certainly not more than 3 or 4. Besides,
it is always a good idea to keep the on-disk format as simple and
explicit as possible -- it should be easy to analyze forensically in
the event of bugs or some other kind of corruption, especially by
using tools like amcheck.

As I said in my earlier email, we can even play tricks during page
deletion by treating certain kinds of "recently dead" bits as
approximate things. As a further example, we can "rely" on the
"dead-to-all but only on standby" bits when recovery ends, during
subsequent write transactions. We can do so by simply using them in
_bt_deadblocks() as if they were LP_DEAD (we'll recheck heap tuples in
heapam.c instead).

> The only way a standby could get an "invalid" hint bit is via an FPI from
> the primary. We could just use the current *btree_mask* infrastructure and
> clear all "probably invalid" bits from the primary on the standby while
> applying the FPI (in `XLogReadBufferForRedoExtended`)!

I don't like that idea. Apart from what I said already, you're
assuming that setting LP_DEAD bits in indexes on the primary won't
eventually have some value on the standby after it is promoted and can
accept writes -- they really do have meaning and value on standbys.
Plus you'll introduce new overhead for this process during replay,
which creates significant overhead -- often most leaf pages have some
LP_DEAD bits set during recovery.

> For binary compatibility, we could use one of the `btpo_flags` bits to mark
> the page as "primary bits masked". The same approach would work for
> hash/gist too.

I agree that there are plenty of btpo_flags bits. However, I have my
doubts about this.

Why shouldn't this break page-level checksums (or wal_log_hints) in
some way? What about pg_rewind, some eventual implementation of
incremental backups, etc? I suspect that it will be necessary to
invent some entirely new concept that is like a hint bit, but works on
standbys (without breaking anything else).

> No conflicts, only the LP_DEAD bit is used by the standby, `btpo_flags` has
> many free bits for now, it is easy to implement, and the page content of
> primary/standby already differs in these bits...
> Looks like an easy and effective solution to me.

Note that the BTP_HAS_GARBAGE flag (which goes in btpo_flags) was
deprecated in commit cf2acaf4. It was never a reliable indicator of
whether or not some LP_DEAD bits are set in the page. And I think that
it never could be made to work like that.

> >> Also, btw, do you know any reason to keep minRecoveryPoint at a low value?
> > Not offhand.
>
> If so, it looks like it is not a bad idea to move minRecoveryPoint forward
> from time to time (for more aggressive standby index hint bits). For each
> `xl_running_xacts` (about every 15 seconds), for example.

It's necessary for recoverypoints 

Re: shared tempfile was not removed on statement_timeout

2021-01-30 Thread Thomas Munro
On Wed, Jan 27, 2021 at 9:34 AM Thomas Munro  wrote:
> On Wed, Jan 27, 2021 at 12:22 AM Kyotaro Horiguchi
>  wrote:
> > At Tue, 26 Jan 2021 11:00:56 +0200, Heikki Linnakangas  
> > wrote in
> > > Don't we potentially have the same problem with all on_dsm_detach
> > > callbacks? Looking at the other on_dsm_detach callbacks, I don't see
> > > any CHECK_FOR_INTERRUPT() calls in them, but it seems fragile to
> > > assume that.
> > >
> > > I'd suggest adding HOLD/RESUME_INTERRUPTS() to dsm_detach(). At least
> > > around the removal of the callback from the list and calling the
> > > callback. Maybe even over the whole function.
> >
> > Yes, I first came up with HOLD/RESUME_QUERY_INTERRUPTS() to the same
> > location.
>
> +1, this seems like a good idea.  This is a little bit like the code
> near the comments "Don't joggle the elbow of proc_exit".

So that gives a very simple back-patchable patch.
From b27cbabee9a5980c8673c4fee4ea6f7e0c89bdbc Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Sun, 31 Jan 2021 14:04:02 +1300
Subject: [PATCH] Hold interrupts while running dsm_detach() callbacks.

While cleaning up after a parallel query or parallel index creation that
created temporary files, we could be interrupted by a statement timeout.
The error handling path would then fail to clean up the files too,
because the callback was already popped off the list.  Prevent this
hazard by holding interrupts while the cleanup code runs.

Thanks to Heikki Linnakangas for this suggestion.

Back-patch to all supported releases.

Reported-by: Justin Pryzby 
Discussion: https://postgr.es/m/20191212180506.gr2...@telsasoft.com
---
 src/backend/storage/ipc/dsm.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/src/backend/storage/ipc/dsm.c b/src/backend/storage/ipc/dsm.c
index ae82b4bdc0..23bf192727 100644
--- a/src/backend/storage/ipc/dsm.c
+++ b/src/backend/storage/ipc/dsm.c
@@ -771,8 +771,10 @@ dsm_detach(dsm_segment *seg)
 	/*
 	 * Invoke registered callbacks.  Just in case one of those callbacks
 	 * throws a further error that brings us back here, pop the callback
-	 * before invoking it, to avoid infinite error recursion.
+	 * before invoking it, to avoid infinite error recursion.  Don't allow
+	 * interrupts to prevent cleanup from running to completion.
 	 */
+	HOLD_INTERRUPTS();
 	while (!slist_is_empty(&seg->on_detach))
 	{
 		slist_node *node;
@@ -788,6 +790,7 @@ dsm_detach(dsm_segment *seg)
 
 		function(seg, arg);
 	}
+	RESUME_INTERRUPTS();
 
 	/*
 	 * Try to remove the mapping, if one exists.  Normally, there will be, but
-- 
2.20.1



Re: [sqlsmith] Failed assertion during partition pruning

2021-01-30 Thread Tom Lane
I wrote:
> As I said, I'm now thinking it's not the Assert that's faulty.
> If I'm right about that, it's likely that the mistaken labeling
> of these paths has other consequences beyond triggering this
> assertion.  (If it has none, I think we'd be better off to remove
> these Path fields altogether, and re-identify the parent rels
> here from the RelOptInfo data.)

After more time poking at this, I've concluded that my parenthetical
remark is the right solution: [Merge]AppendPath.partitioned_rels
ought to be nuked from orbit.  It requires a whole lot of squirrely,
now-known-to-be-buggy logic in allpaths.c, and practically the only
place where we use the result is in make_partition_pruneinfo().
That means we're expending a lot of work *per AppendPath*, which
is quite foolish unless there's some reason we can't get the
information at create_plan() time instead.  Which we can.

For simplicity of review I divided the patch into two parts.
0001 revises make_partition_pruneinfo() and children to identify
the relevant parent partitions for themselves, which is not too
hard to do by chasing up the child-to-parent AppendRelInfo links.
Not formerly documented, AFAICT, is that we want to collect just
the parent partitions that are between the Append path's own rel
(possibly equal to it) and the subpaths' rels.  I'd first tried
to code this by using the top_parent_relids and part_rels[] links
in the RelOptInfos, but that turns out not to work.  We might
ascend to a top parent that's above the Append's rel (if considering
an appendpath for a sub-partition, which happens in partitionwise
join).  We could also descend to a child at or below some subpath
level, since there are also cases where subpaths correspond to
unflattened non-leaf partitions.  Either of those things result
in failures.  But once you wrap your head around handling those
restrictions, it's quite simple.

Then 0002 just nukes all the no-longer-necessary logic to compute
the partitioned_rels fields.  There are a couple of changes that
are not quite trivial.  create_[merge_]append_plan were testing
best_path->partitioned_rels != NIL to shortcut calling
make_partition_pruneinfo at all.  I just dropped that, reasoning
that it wasn't saving enough to worry about.  Also,
create_append_path was similarly testing partitioned_rels != NIL
to decide if it needed to go to the trouble of using
get_baserel_parampathinfo.  I changed that to an IS_PARTITIONED_REL()
test, which isn't *quite* the same thing but seems close enough.
(It changes no regression results, anyway.)

This fixes the cases reported by Andreas and Jaime, leaving me
more confident that there's nothing wrong with David's Assert.

I did not try to add a regression test case, mainly because it's
not quite clear to me where the original bug is.  (I'm a little
suspicious that the blame might lie with the "match_partition_order"
cases in generate_orderedappend_paths, which try to bypass
accumulate_append_subpath without updating the partitioned_rels
data.  But I'm not excited about trying to prove that.)

I wonder whether we should consider back-patching this.  Another
thing that seems unclear is whether there is any serious consequence
to omitting some intermediate partitions from the set considered
by make_partition_pruneinfo.  Presumably it could lead to missing
some potential run-time-pruning opportunities, but is there any
worse issue?  If there isn't, this is a bigger change than I want
to put in the back branches.

regards, tom lane

diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 42c7c5f554..c08cfd 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -138,10 +138,12 @@ typedef struct PruneStepResult
 } PruneStepResult;
 
 
+static List *add_part_rel_list(List *partrellists, List *partrellist);
 static List *make_partitionedrel_pruneinfo(PlannerInfo *root,
 		   RelOptInfo *parentrel,
+		   List *partrellist,
+		   List *prunequal,
 		   int *relid_subplan_map,
-		   Relids partrelids, List *prunequal,
 		   Bitmapset **matchedsubplans);
 static void gen_partprune_steps(RelOptInfo *rel, List *clauses,
 PartClauseTarget target,
@@ -213,67 +215,103 @@ static void partkey_datum_from_expr(PartitionPruneContext *context,
  *
  * 'parentrel' is the RelOptInfo for an appendrel, and 'subpaths' is the list
  * of scan paths for its child rels.
- *
- * 'partitioned_rels' is a List containing Lists of relids of partitioned
- * tables (a/k/a non-leaf partitions) that are parents of some of the child
- * rels.  Here we attempt to populate the PartitionPruneInfo by adding a
- * 'prune_infos' item for each sublist in the 'partitioned_rels' list.
- * However, some of the sets of partitioned relations may not require any
- * run-time pruning.  In these cases we'll simply not include a 'prune_infos'
- * item for that set and instead we'll add 

Re: Add primary keys to system catalogs

2021-01-30 Thread Tom Lane
Peter Eisentraut  writes:
> Committed with your update, thanks.

Hmm, shouldn't there have been a catversion bump in there?

regards, tom lane




Re: Is it worth accepting multiple CRLs?

2021-01-30 Thread Peter Eisentraut

On 2021-01-19 09:32, Kyotaro Horiguchi wrote:

At Tue, 19 Jan 2021 09:17:34 +0900 (JST), Kyotaro Horiguchi 
 wrote in

By the way we can do the same thing on CA file/dir, but I personally
think that the benefit from the specify-by-directory for CA files is
far less than CRL files. So I'm not going to do this for CA files for
now.


Here it is. A new GUC, ssl_crl_dir, and a connection option, crldir, are
added.


This looks pretty good to me overall.

You need to update the expected result of the postgres_fdw test.

Also check your patch for whitespace errors with git diff --check or 
similar.



One problem raised upthread is that the footprint of the test is quite
large because all certificate and key files are replaced by this patch.
I think we can shrink the footprint by generating those files on demand,
but that requires the openssl frontend to be installed in the development
environment.


I don't understand why you need to recreate all these files.  All your 
patch should contain are the new *.r0 files that are computed from the 
existing *.crl files.  Nothing else should change, AIUI.


Some of the makefile rules for generating the CRL files need some 
refinement.  In


+ssl/root+server-crldir: ssl/server.crl
+   mkdir ssl/root+server-crldir
+   cp ssl/server.crl ssl/root+server-crldir/`openssl crl -hash -noout -in ssl/server.crl`.r0
+   cp ssl/root.crl ssl/root+server-crldir/`openssl crl -hash -noout -in ssl/root.crl`.r0

+ssl/root+client-crldir: ssl/client.crl
+   mkdir ssl/root+client-crldir
+   cp ssl/client.crl ssl/root+client-crldir/`openssl crl -hash -noout -in ssl/client.crl`.r0
+   cp ssl/root.crl ssl/root+client-crldir/`openssl crl -hash -noout -in ssl/root.crl`.r0


the rules should also have a dependency on ssl/root.crl in addition to 
ssl/server.crl.


By the way:

-   print $sslconf "ssl_crl_file='root+client.crl'\n";
+   print $sslconf "ssl_crl_file='$crlfile'\n" if (defined $crlfile);
+   print $sslconf "ssl_crl_dir='$crldir'\n" if (defined $crldir);

Trailing "if" doesn't need parentheses.




Re: Allow matching whole DN from a client certificate

2021-01-30 Thread Andrew Dunstan

On 1/29/21 10:10 AM, Andrew Dunstan wrote:
> On 1/28/21 5:10 PM, Andrew Dunstan wrote:
>>> (I'd still recommend switching to use the RFC
>>> flag to OpenSSL, to ease future improvements.) There should be a bunch
>>> of warning documentation saying not to do anything more complex unless
>>> you're an expert, and that the exact regular expression needed may
>>> change depending on the TLS backend.
>> I'll play around with the RFC flag.
>>
>>
>>> If you want to add UTF-8 support to the above, so that things outside
>>> ASCII can be matched nicely, then the ASN1_STRFLGS_ESC_MSB flag should
>>> be removed from the _print_ex() call per OpenSSL documentation, and we
>>> should investigate how it plays with other non-UTF-8 database
>>> encodings. That seems like work but not a huge amount of it.
>> How about if we remove ASN1_STRFLGS_ESC_MSB when the server encoding is
>> UTF8? That should be an extremely simple change.
>>
>>
>>
> Of course, we don't have any idea what the database is at this stage, so
> that wasn't well thought out.
>
>
> I'm inclined at this stage just to turn on RFC2253. If someone wants to
> deal with UTF8 (or other encodings) at a later stage they can.
>
>

Here's a version of the patch that does it that way. For fun I have
modified the certificate so it has two OU fields in the DN.

Finding out how to generate the new cert without regenerating everything
else was also fun :-) Here's what I did in a pristine source with patch
applied, but where configure hasn't been run:

cd src/test/ssl
touch ../../Makefile.global
rm -f ssl/client-dn.crt  ssl/client-dn.key
touch ssl/root_ca-certindex
echo 01 > ssl/root_ca.srl
make ssl/client-dn.crt
rm -rf ssl/*certindex* ssl/root_ca.srl ssl/new_certs_dir
rm ../../Makefile.global

Making incremental additions to the certificate set easier wouldn't be a
bad thing.

I wonder if we should really be setting 1 as the serial number, though.
Might it not be better to use, say, `date +%Y%m%d01` rather like we do
with catalog version numbers?


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com

diff --git a/doc/src/sgml/client-auth.sgml b/doc/src/sgml/client-auth.sgml
index c4b9971a20..948c4f8aab 100644
--- a/doc/src/sgml/client-auth.sgml
+++ b/doc/src/sgml/client-auth.sgml
@@ -596,7 +596,7 @@ hostnogssenc  database  user
 
   
-   In addition to the method-specific options listed below, there is one
+   In addition to the method-specific options listed below, there is a
method-independent authentication option clientcert, which
can be specified in any hostssl record.
This option can be set to verify-ca or
@@ -610,6 +610,19 @@ hostnogssenc  database  userhostssl entries.
   
+  
+   On any record using client certificate authentication, that is one
+   using the cert authentication method or one
+   using the clientcert option, you can specify
+   which part of the client certificate credentials to match using
+   the clientname option. This option can have one
+   of two values. If you specify clientname=CN, which
+   is the default, the username is matched against the certificate's
+   Common Name (CN). If instead you specify
+   clientname=DN the username is matched against the
+   entire Distinguished Name (DN) of the certificate.
+   This option is probably best used in conjunction with a username map.
+  
  
 

diff --git a/src/backend/libpq/auth.c b/src/backend/libpq/auth.c
index 545635f41a..d2e4c0b8a8 100644
--- a/src/backend/libpq/auth.c
+++ b/src/backend/libpq/auth.c
@@ -2848,12 +2848,15 @@ static int
 CheckCertAuth(Port *port)
 {
 	int			status_check_usermap = STATUS_ERROR;
+	char	   *peer_username;
 
 	Assert(port->ssl);
 
 	/* Make sure we have received a username in the certificate */
-	if (port->peer_cn == NULL ||
-		strlen(port->peer_cn) <= 0)
+	peer_username = port->hba->clientcertname == clientCertCN ? port->peer_cn : port->peer_dn;
+
+	if (peer_username == NULL ||
+		strlen(peer_username) <= 0)
 	{
 		ereport(LOG,
 (errmsg("certificate authentication failed for user \"%s\": client certificate contains no user name",
@@ -2861,8 +2864,8 @@ CheckCertAuth(Port *port)
 		return STATUS_ERROR;
 	}
 
-	/* Just pass the certificate cn to the usermap check */
-	status_check_usermap = check_usermap(port->hba->usermap, port->user_name, port->peer_cn, false);
+	/* Just pass the certificate cn/dn to the usermap check */
+	status_check_usermap = check_usermap(port->hba->usermap, port->user_name, peer_username, false);
 	if (status_check_usermap != STATUS_OK)
 	{
 		/*
@@ -2873,7 +2876,7 @@ CheckCertAuth(Port *port)
 		if (port->hba->clientcert == clientCertFull && port->hba->auth_method != uaCert)
 		{
 			ereport(LOG,
-	(errmsg("certificate validation (clientcert=verify-full) failed for user \"%s\": CN mismatch",
+	(errmsg("certificate validation (clientcert=verify-full) failed for user \"%s\": 

Re: WIP: BRIN multi-range indexes

2021-01-30 Thread John Naylor
On Tue, Jan 26, 2021 at 6:59 PM Tomas Vondra 
wrote:
>
>
>
> On 1/26/21 7:52 PM, John Naylor wrote:
> > On Fri, Jan 22, 2021 at 10:59 PM Tomas Vondra
> > wrote:
> >  > Hmm. I think Alvaro also mentioned he'd like to use this as a drop-in
> >  > replacement for minmax (essentially, using these opclasses as the
> >  > default ones, with the option to switch back to plain minmax). I'm not
> >  > convinced we should do that - though. Imagine you have minmax indexes in
> >  > your existing DB, it's working perfectly fine, and then we come and just
> >  > silently change that during dump/restore. Is there some past example
> >  > when we did something similar and it turned out to be OK?
> >
> > I was assuming pg_dump can be taught to insert explicit opclasses for
> > minmax indexes, so that upgrade would not cause surprises. If that's
> > true, only new indexes would have the different default opclass.
> >
>
> Maybe, I suppose we could do that. But I always found such changes
> happening silently in the background a bit suspicious, because it may be
> quite confusing. I certainly wouldn't expect such difference between
> creating a new index and index created by dump/restore. Did we do such
> changes in the past? That might be a precedent, but I don't recall any
> example ...

I couldn't think of a comparable example either. It comes down to
evaluating risk. On the one hand, it's nice if users get an enhancement
without having to know about it; on the other hand, if there is some kind
of noticeable regression, that's bad.

--
John Naylor
EDB: http://www.enterprisedb.com


Re: Add primary keys to system catalogs

2021-01-30 Thread Peter Eisentraut

On 2021-01-21 18:15, Tom Lane wrote:

After reading the patch again, I have a couple more nits about comments,
which I'll just present as a proposed delta patch.  Otherwise it's good.
I'll mark it RFC.


Committed with your update, thanks.

--
Peter Eisentraut
2ndQuadrant, an EDB company
https://www.2ndquadrant.com/




Re: Perform COPY FROM encoding conversions in larger chunks

2021-01-30 Thread John Naylor
On Thu, Jan 28, 2021 at 7:36 AM Heikki Linnakangas  wrote:
>
> Even more surprising was that the second patch
> (0002-Replace-pg_utf8_verifystr-with-a-faster-implementati.patch)
> actually made things worse again. I thought it would give a modest gain,
> but nope.

Hmm, that surprised me too.

> Based on these results, I'm going to commit the first patch, but not the
> second one. There are much faster UTF-8 verification routines out there,
> using SIMD instructions and whatnot, and we should consider adopting one
> of those, but that's future work.

I have something in mind for that.

I took a look at v2, and for the first encoding I tried, it fails to report
the error for invalid input:

create database euctest WITH ENCODING 'EUC_CN' LC_COLLATE='zh_CN.eucCN'
LC_CTYPE='zh_CN.eucCN' TEMPLATE=template0;

\c euctest
create table foo (a text);

master:

euctest=# copy foo from stdin;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.
>> ä
>> \.
ERROR:  character with byte sequence 0xc3 0xa4 in encoding "UTF8" has no
equivalent in encoding "EUC_CN"
CONTEXT:  COPY foo, line 1

patch:

euctest=# copy foo from stdin;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.
>> ä
>> \.
COPY 0
euctest=#

I believe the problem is in UtfToLocal(). I've attached a fix formatted as
a text file to avoid confusing the cfbot. The fix keeps the debugging
ereport() in case you find it useful. Some additional test coverage might
be good here, but not sure how much work that would be. I didn't check any
other conversions yet.


v2-0002 seems fine to me, I just have cosmetic comments here:

+ * the same, no conversion is required by we must still validate that the

s/by/but/

This comment in copyfrom_internal.h above the *StateData struct is the same
as the corresponding one in copyto.c:

 * Multi-byte encodings: all supported client-side encodings encode
 * multi-byte characters by having the first byte's high bit set.
 * Subsequent bytes of the character can have the high bit not set. When
 * scanning data in such an encoding to look for a match to a single-byte
 * (ie ASCII) character, we must use the full pg_encoding_mblen() machinery
 * to skip over multibyte characters, else we might find a false match to a
 * trailing byte. In supported server encodings, there is no possibility of
 * a false match, and it's faster to make useless comparisons to trailing
 * bytes than it is to invoke pg_encoding_mblen() to skip over them.
 * encoding_embeds_ascii is true when we have to do it the hard way.

The references to pg_encoding_mblen() and encoding_embeds_ascii are out of
date for copy-from. I'm not sure the rest is relevant to copy-from anymore,
either. Can you confirm?

This comment inside the struct is now out of date as well:

 * Similarly, line_buf holds the whole input line being processed. The
 * input cycle is first to read the whole line into line_buf, convert it
 * to server encoding there, and then extract the individual attribute

HEAD has this macro already:

/* Shorthand for number of unconsumed bytes available in raw_buf */
#define RAW_BUF_BYTES(cstate) ((cstate)->raw_buf_len - (cstate)->raw_buf_index)

It might make sense to create a CONVERSION_BUF_BYTES equivalent since the
patch calculates cstate->conversion_buf_len - cstate->conversion_buf_index
in a couple places.
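If the macro route is taken, a minimal sketch of the suggested shorthand might look like the following (the struct here is a hypothetical stand-in for the relevant CopyFromStateData fields; the real definition lives in copyfrom_internal.h, and the exact field names in the patch may differ):

```c
#include <assert.h>

/* Hypothetical stand-in for the CopyFromStateData fields used here. */
typedef struct CopyFromStateData
{
	int			conversion_buf_len;		/* bytes currently in the buffer */
	int			conversion_buf_index;	/* bytes already consumed */
} CopyFromStateData;

/*
 * Shorthand for the number of unconsumed bytes in the conversion buffer,
 * by analogy with RAW_BUF_BYTES in HEAD.
 */
#define CONVERSION_BUF_BYTES(cstate) \
	((cstate)->conversion_buf_len - (cstate)->conversion_buf_index)

int
conversion_buf_bytes(CopyFromStateData *cstate)
{
	return CONVERSION_BUF_BYTES(cstate);
}
```

As with RAW_BUF_BYTES, the point is just to avoid repeating the subtraction at every call site.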

--
John Naylor
EDB: http://www.enterprisedb.com
diff --git a/src/backend/commands/copyfromparse.c 
b/src/backend/commands/copyfromparse.c
index 60dfebb0bd..1681efc7a0 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -397,6 +397,9 @@ CopyConvertBuf(CopyFromState cstate)
 										  (unsigned char *) cstate->raw_buf + cstate->raw_buf_len,
 										  dstlen,
 										  true);
+
+	ereport(NOTICE, errmsg_internal("convertedbytes: %d", convertedbytes));
+
 	if (convertedbytes == 0)
 	{
 		/*
diff --git a/src/backend/utils/mb/conv.c b/src/backend/utils/mb/conv.c
index b83358bc7a..26600e68ee 100644
--- a/src/backend/utils/mb/conv.c
+++ b/src/backend/utils/mb/conv.c
@@ -497,7 +497,7 @@ UtfToLocal(const unsigned char *utf, int len,
int l;
const pg_utf_to_local_combined *cp;
const unsigned char *start = utf;
-   const unsigned char *cur = utf;
+   unsigned char *cur = unconstify(unsigned char *, utf);
 
if (!PG_VALID_ENCODING(encoding))
ereport(ERROR,
@@ -511,8 +511,6 @@ UtfToLocal(const unsigned char *utf, int len,
unsigned char b3 = 0;
unsigned char b4 = 0;
 
-   cur = iso;
-
   

Re: Thoughts on "killed tuples" index hint bits support on standby

2021-01-30 Thread Michail Nikolaev
Hello, Peter.

> Yeah, it would help a lot. But those bits are precious. So it makes
> sense to think about what to do with both of them in index AMs at the
> same time.  Otherwise we risk missing some important opportunity.

Hm. I was trying to "expand the scope" as you said and got an idea... Why
should we even do any conflict resolution for hint bits? Or share precious
LP bits?

The only way a standby could get an "invalid" hint bit is from an FPI sent
by the primary. We could just use the current *btree_mask* infrastructure
and clear all "probably invalid" bits from the primary on the standby while
applying the FPI (in `XLogReadBufferForRedoExtended`)!
For binary compatibility, we could use one of the `btpo_flags` bits to mark
the page as "primary bits masked". The same approach would work for
hash/gist too.

No conflicts, only the LP_DEAD bit is used by the standby, `btpo_flags` has
many free bits for now, it's easy to implement, and the page content of
primary/standby already differs in these bits...
Looks like an easy and effective solution to me.

What do you think?
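To make the idea above concrete, here is a standalone, heavily simplified sketch. Every name and layout here is invented for illustration (the real implementation would operate on actual PostgreSQL page headers and line pointers inside `XLogReadBufferForRedoExtended`): a page-level flag records whether the primary's hint bits have already been masked, and if not, all LP_DEAD bits are cleared once when the FPI is applied.

```c
#include <assert.h>
#include <stdint.h>

#define LP_DEAD				0x01	/* per-item hint bit (illustrative) */
#define BTP_PRIMARY_MASKED	0x0400	/* hypothetical spare btpo_flags bit */

/* Toy stand-in for a btree page; not the real page layout. */
typedef struct FakePage
{
	uint16_t	btpo_flags;
	uint8_t		item_flags[8];	/* stand-in for line pointer flag bits */
	int			nitems;
} FakePage;

/* Called once when a full-page image from the primary is restored. */
static void
mask_primary_hint_bits(FakePage *page)
{
	if (page->btpo_flags & BTP_PRIMARY_MASKED)
		return;					/* already masked, nothing to do */

	/* Clear all "probably invalid" LP_DEAD bits inherited from the primary. */
	for (int i = 0; i < page->nitems; i++)
		page->item_flags[i] &= ~LP_DEAD;

	/* Remember that this page's hint bits are now locally owned. */
	page->btpo_flags |= BTP_PRIMARY_MASKED;
}
```

The flag makes the operation idempotent, so re-applying the same FPI (or hint bits set later by the standby itself) would not be clobbered twice.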

>> Also, btw, do you know any reason to keep minRecoveryPoint at a low
>> value?
> Not offhand.

In that case, it looks like it's not a bad idea to move minRecoveryPoint
forward from time to time (to make standby index hint bits more
aggressive). On each `xl_running_xacts` (about every 15s), for example.

> BTW, what happens when the page splits on the primary? Does your
> patch "move over" the LP_DEAD bits to each half of the split?

That part is not changed in any way.

Thanks,
Michail.


Re: Should we make Bitmapsets a kind of Node?

2021-01-30 Thread Tom Lane
Peter Geoghegan  writes:
> On Fri, Jan 29, 2021 at 6:44 PM Tom Lane  wrote:
>> Pointer width is interesting, but really it's a solved problem
>> compared to these.

> What about USE_FLOAT8_BYVAL?

That's an annoyance, sure, but I don't recall many recent bugs
related to violations of that coding rule.  As long as you don't
violate the Datum abstraction it's pretty safe.
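As a standalone illustration of why going through the abstraction is what keeps this safe (the names are simplified here; PostgreSQL's real macros live in postgres.h and use palloc rather than malloc): the same DatumGetFloat8/Float8GetDatum call sites compile under either representation, so code that sticks to the macros never notices whether USE_FLOAT8_BYVAL is in effect.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uintptr_t Datum;

/*
 * Two possible representations of a float8 Datum, selected at compile
 * time the way USE_FLOAT8_BYVAL does (simplified sketch).
 */
#if UINTPTR_MAX >= UINT64_MAX
/* pass-by-value: the bits of the double fit directly in a Datum */
static Datum
Float8GetDatum(double x)
{
	Datum		d = 0;

	memcpy(&d, &x, sizeof(x));
	return d;
}

static double
DatumGetFloat8(Datum d)
{
	double		x;

	memcpy(&x, &d, sizeof(x));
	return x;
}
#else
/* pass-by-reference: the Datum is a pointer to separately allocated storage */
static Datum
Float8GetDatum(double x)
{
	double	   *p = malloc(sizeof(double));

	*p = x;
	return (Datum) p;
}

static double
DatumGetFloat8(Datum d)
{
	return *(double *) d;
}
#endif
```

Code that instead casts a Datum to double directly (or compares Datums bitwise) is exactly the kind of coding-rule violation that breaks under one representation but not the other.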

regards, tom lane




Re: Key management with tests

2021-01-30 Thread Tom Kincaid
Thanks Stephen, Bruce and Masahiko,


> > discussions so far and the point behind the design so that everyone
> > can understand why this feature is designed in that way. To do that,
> > it might be a good start to sort the wiki page since it has data
> > encryption part, KMS, and ToDo mixed.
>
> I hope it's pretty clear that I'm also very much in support of both this
> effort with the KMS and of TDE in general- TDE is specifically,
> repeatedly, called out as a capability whose lack is blocking PG from
> being able to be used for certain use-cases that it would otherwise be
> well suited for, and that's really unfortunate.
>

It is clear you are supportive.

As you know, I share your point of view that PG adoption is suffering for
certain use cases because it does not have TDE.

> I appreciate the recent discussion and reviews of the KMS in particular,
> and of the patches which have been sent enabling TDE based on the KMS
> patches.  Having them be relatively independent seems to be an ongoing
> concern and perhaps we should figure out a way to more clearly put them
> together.  That is- the KMS patches have been posted on one thread, and
> TDE PoC patches which use the KMS patches have been on another thread,
> leading some to not realize that there's already been TDE PoC work done
> based on the KMS patches.  Seems like it might make sense to get one
> patch set which goes all the way from the KMS and includes the TDE PoC,
> even if they don't all go in at once.
>

Sounds good, thanks Masahiko. Let's see if we can get consensus on the
approach for moving this forward; see below.


>
> together, as a few on this thread have voiced, but there's no doubt that
> this is a large project and it's hard to see how we could possibly
> commit all of it at once.
>

I propose that we meet to discuss what approach we want to use to move TDE
forward. We would then start a new thread with a proposal on the approach
and finalize it via community consensus. I will invite Bruce, Stephen, and
Masahiko to this meeting. If anybody else would like to participate in this
discussion, and subsequently in the effort to get TDE into PG1x, please let
me know. Assuming Bruce, Stephen, and Masahiko are on board, I (or a
volunteer from this meeting) will post the proposal for how we move this
patch forward in another thread. Hopefully, we can get consensus on that
and then resume work on delivering this feature.





> Thanks!
>
> Stephen
>


-- 
Thomas John Kincaid


Re: Single transaction in the tablesync worker?

2021-01-30 Thread Amit Kapila
On Fri, Jan 29, 2021 at 4:07 PM Peter Smith  wrote:
>
>
> Differences from v21:
> + Patch is rebased to latest OSS HEAD @ 29/Jan.
> + Includes new code as suggested [ak0128] to ensure no dangling slots
> at Drop/AlterSubscription.
> + Removes the slot/origin cleanup done by the process interrupt logic
> (cleanup_at_shutdown function).
> + Addresses some minor review comments.
>

I have made the below changes in the patch. Let me know what you think
about them:
1. It was a bit difficult to understand the code in DropSubscription
so I have rearranged the code to match the way we are doing in HEAD
where we drop the slots at the end after finishing all the other
cleanup.
2. In AlterSubscription_refresh(), we can't allow workers to be
stopped at commit time as we have already dropped the slots because
the worker can access the dropped slot. We need to stop the workers
before dropping slots. This makes all the code related to
logicalrep_worker_stop_at_commit redundant.
3. In AlterSubscription_refresh(), we need to acquire the lock on
pg_subscription_rel only when we try to remove any subscription rel.
4. Added/Changed quite a few comments.

-- 
With Regards,
Amit Kapila.


v23-0001-Tablesync-Solution1.patch
Description: Binary data


Re: Fix DROP TABLESPACE on Windows with ProcSignalBarrier?

2021-01-30 Thread Thomas Munro
> While reading the ProcSignalBarrier code, I couldn't resist replacing
> its poll/sleep loop with condition variables.

Oops, that version accidentally added and then removed an unnecessary
change due to incorrect commit squashing.  Here's a better pair of
patches.
From 6a3f396bac604aa0b96b79d75983e1dfa2302ea6 Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Sat, 30 Jan 2021 14:02:32 +1300
Subject: [PATCH v2 1/2] Use a global barrier to fix DROP TABLESPACE on
 Windows.

Previously, it was possible for DROP TABLESPACE to fail because a table
had been dropped, but other backends still had open file handles.
Windows won't allow a directory containing unlinked-but-still-open files
to be unlinked.  Tackle this problem by forcing all backends to close
all fds.  No change for Unix systems, which didn't suffer from the
problem, but simulate the need to do it in assertion builds just to
exercise the code.

XXX maybe only do this after 2nd failure?
---
 src/backend/commands/tablespace.c| 22 +++---
 src/backend/storage/ipc/procsignal.c | 22 ++
 src/backend/storage/smgr/md.c|  6 ++
 src/backend/storage/smgr/smgr.c  |  6 ++
 src/include/storage/md.h |  1 +
 src/include/storage/procsignal.h |  7 +--
 src/include/storage/smgr.h   |  1 +
 7 files changed, 36 insertions(+), 29 deletions(-)

diff --git a/src/backend/commands/tablespace.c b/src/backend/commands/tablespace.c
index 69ea155d50..8853f1b658 100644
--- a/src/backend/commands/tablespace.c
+++ b/src/backend/commands/tablespace.c
@@ -520,15 +520,23 @@ DropTableSpace(DropTableSpaceStmt *stmt)
 		 * but we can't tell them apart from important data files that we
 		 * mustn't delete.  So instead, we force a checkpoint which will clean
 		 * out any lingering files, and try again.
-		 *
-		 * XXX On Windows, an unlinked file persists in the directory listing
-		 * until no process retains an open handle for the file.  The DDL
-		 * commands that schedule files for unlink send invalidation messages
-		 * directing other PostgreSQL processes to close the files.  DROP
-		 * TABLESPACE should not give up on the tablespace becoming empty
-		 * until all relevant invalidation processing is complete.
 		 */
 		RequestCheckpoint(CHECKPOINT_IMMEDIATE | CHECKPOINT_FORCE | CHECKPOINT_WAIT);
+		/*
+		 * On Windows, an unlinked file persists in the directory listing until
+		 * no process retains an open handle for the file.  The DDL
+		 * commands that schedule files for unlink send invalidation messages
+		 * directing other PostgreSQL processes to close the files, but nothing
+		 * guarantees they'll be processed in time.  So, we'll also use a
+		 * global barrier to ask all backends to close all files, and wait
+		 * until they're finished.
+		 */
+#if defined(WIN32) || defined(USE_ASSERT_CHECKING)
+		LWLockRelease(TablespaceCreateLock);
+		WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_SMGRRELEASE));
+		LWLockAcquire(TablespaceCreateLock, LW_EXCLUSIVE);
+#endif
+		/* And now try again. */
 		if (!destroy_tablespace_directories(tablespaceoid, false))
 		{
 			/* Still not empty, the files must be important then */
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index c43cdd685b..30082d443f 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -27,6 +27,7 @@
 #include "storage/latch.h"
 #include "storage/proc.h"
 #include "storage/shmem.h"
+#include "storage/smgr.h"
 #include "storage/sinval.h"
 #include "tcop/tcopprot.h"
 
@@ -98,7 +99,7 @@ static volatile ProcSignalSlot *MyProcSignalSlot = NULL;
 static bool CheckProcSignal(ProcSignalReason reason);
 static void CleanupProcSignalState(int status, Datum arg);
 static void ResetProcSignalBarrierBits(uint32 flags);
-static bool ProcessBarrierPlaceholder(void);
+static bool ProcessBarrierSmgrRelease(void);
 
 /*
  * ProcSignalShmemSize
@@ -538,8 +539,8 @@ ProcessProcSignalBarrier(void)
 type = (ProcSignalBarrierType) pg_rightmost_one_pos32(flags);
 switch (type)
 {
-	case PROCSIGNAL_BARRIER_PLACEHOLDER:
-		processed = ProcessBarrierPlaceholder();
+	case PROCSIGNAL_BARRIER_SMGRRELEASE:
+		processed = ProcessBarrierSmgrRelease();
 		break;
 }
 
@@ -605,20 +606,9 @@ ResetProcSignalBarrierBits(uint32 flags)
 }
 
 static bool
-ProcessBarrierPlaceholder(void)
+ProcessBarrierSmgrRelease(void)
 {
-	/*
-	 * XXX. This is just a placeholder until the first real user of this
-	 * machinery gets committed. Rename PROCSIGNAL_BARRIER_PLACEHOLDER to
-	 * PROCSIGNAL_BARRIER_SOMETHING_ELSE where SOMETHING_ELSE is something
-	 * appropriately descriptive. Get rid of this function and instead have
-	 * ProcessBarrierSomethingElse. Most likely, that function should live in
-	 * the file pertaining to that subsystem, rather than here.
-	 *
-	 * The return value should be 'true' if the barrier was successfully
-	 * absorbed and 

Fix DROP TABLESPACE on Windows with ProcSignalBarrier?

2021-01-30 Thread Thomas Munro
Hello,

In tablespace.c, a comment explains that DROP TABLESPACE can fail
bogusly because of Windows file semantics:

 * XXX On Windows, an unlinked file persists in the directory listing
 * until no process retains an open handle for the file.  The DDL
 * commands that schedule files for unlink send invalidation messages
 * directing other PostgreSQL processes to close the files.  DROP
 * TABLESPACE should not give up on the tablespace becoming empty
 * until all relevant invalidation processing is complete.

While trying to get the AIO patchset working on more operating
systems, this turned out to be a problem.  Andres mentioned the new
ProcSignalBarrier stuff as a good way to tackle this, so I tried it
and it seems to work well so far.

The idea in this initial version is to tell every process in the
cluster to close all fds, and then try again.  That's a pretty large
hammer, but it isn't reached on Unix, and with slightly more work it
could be made to happen only after 2 failures on Windows.  It was
tempting to try to figure out how to use the sinval mechanism to close
precisely the right files instead, but it doesn't look safe to run
sinval at arbitrary CFI() points.  It's easier to see that the
pre-existing closeAllVfds() function has an effect that is local to
fd.c and doesn't affect the VFDs or SMgrRelations, so any CFI() should
be an OK time to run that.

While reading the ProcSignalBarrier code, I couldn't resist replacing
its poll/sleep loop with condition variables.

Thoughts?
From bf32e7426cf780d33ae1140c443ae8964397a3c9 Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Sat, 30 Jan 2021 14:02:32 +1300
Subject: [PATCH 1/2] Use a global barrier to fix DROP TABLESPACE on Windows.

Previously, it was possible for DROP TABLESPACE to fail because a table
had been dropped, but other backends still had open file handles.
Windows won't allow a directory containing unlinked-but-still-open files
to be unlinked.  Tackle this problem by forcing all backends to close
all fds.  No change for Unix systems, which didn't suffer from the
problem, but simulate the need to do it in assertion builds just to
exercise the code.

XXX maybe only do this after 2nd failure?
---
 src/backend/commands/tablespace.c| 22 +++---
 src/backend/storage/ipc/procsignal.c | 28 
 src/backend/storage/smgr/md.c|  6 ++
 src/backend/storage/smgr/smgr.c  |  6 ++
 src/include/storage/md.h |  1 +
 src/include/storage/procsignal.h |  7 +--
 src/include/storage/smgr.h   |  1 +
 7 files changed, 42 insertions(+), 29 deletions(-)

diff --git a/src/backend/commands/tablespace.c b/src/backend/commands/tablespace.c
index 69ea155d50..8853f1b658 100644
--- a/src/backend/commands/tablespace.c
+++ b/src/backend/commands/tablespace.c
@@ -520,15 +520,23 @@ DropTableSpace(DropTableSpaceStmt *stmt)
 		 * but we can't tell them apart from important data files that we
 		 * mustn't delete.  So instead, we force a checkpoint which will clean
 		 * out any lingering files, and try again.
-		 *
-		 * XXX On Windows, an unlinked file persists in the directory listing
-		 * until no process retains an open handle for the file.  The DDL
-		 * commands that schedule files for unlink send invalidation messages
-		 * directing other PostgreSQL processes to close the files.  DROP
-		 * TABLESPACE should not give up on the tablespace becoming empty
-		 * until all relevant invalidation processing is complete.
 		 */
 		RequestCheckpoint(CHECKPOINT_IMMEDIATE | CHECKPOINT_FORCE | CHECKPOINT_WAIT);
+		/*
+		 * On Windows, an unlinked file persists in the directory listing until
+		 * no process retains an open handle for the file.  The DDL
+		 * commands that schedule files for unlink send invalidation messages
+		 * directing other PostgreSQL processes to close the files, but nothing
+		 * guarantees they'll be processed in time.  So, we'll also use a
+		 * global barrier to ask all backends to close all files, and wait
+		 * until they're finished.
+		 */
+#if defined(WIN32) || defined(USE_ASSERT_CHECKING)
+		LWLockRelease(TablespaceCreateLock);
+		WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_SMGRRELEASE));
+		LWLockAcquire(TablespaceCreateLock, LW_EXCLUSIVE);
+#endif
+		/* And now try again. */
 		if (!destroy_tablespace_directories(tablespaceoid, false))
 		{
 			/* Still not empty, the files must be important then */
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index c43cdd685b..c7cce5967d 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -27,6 +27,7 @@
 #include "storage/latch.h"
 #include "storage/proc.h"
 #include "storage/shmem.h"
+#include "storage/smgr.h"
 #include "storage/sinval.h"
 #include "tcop/tcopprot.h"
 
@@ -98,7 +99,7 @@ static volatile ProcSignalSlot *MyProcSignalSlot = NULL;
 static bool 

Re: pgbench: option delaying queries till connections establishment?

2021-01-30 Thread Fabien COELHO


Hello Thomas,


3. Decide if it's sane for the Windows-based emulation to be in here
too, or if it should stay in pgbench.c.  Or alternatively, if we're
emulating pthread stuff on Windows, why not also put the other pthread
emulation stuff from pgbench.c into a "ports" file; that seems
premature and overkill for your project.  I dunno.


I decided to solve only the macOS problem for now.  So in this
version, the A and B patches are exactly as you had them in your v7,
except that B includes “port/pg_pthread.h” instead of .

Maybe it’d make sense to move the Win32 pthread emulation stuff out of
pgbench.c into src/port too (the pre-existing stuff, and the new
barrier stuff you added), but that seems like a separate patch, one
that I’m not best placed to write, and it’s not clear to me that we’ll
want to be using pthread APIs as our main abstraction if/when thread
usage increases in the PG source tree anyway.  Other opinions welcome.


I think it would be much more consistent to move all the thread emulation 
stuff there directly: currently (v8) the Windows implementation is in 
pgbench and the macOS implementation in src/port, which is quite messy.


Attached is a patch set which does that. I cannot test it on either 
Windows or macOS. Patches 1 & 2 are really independent.
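For platforms that lack pthread_barrier_wait (such as macOS), the usual fallback is a mutex plus condition-variable emulation. A minimal sketch of the idea follows — this is an illustration of the technique, not the exact contents of the attached patch (the names barrier_t/barrier_init/barrier_wait are invented here; the patch would use the pg_pthread naming):

```c
#include <assert.h>
#include <pthread.h>

typedef struct
{
	pthread_mutex_t mutex;
	pthread_cond_t	cond;
	int				count;		/* threads required to trip the barrier */
	int				arrived;	/* threads currently waiting */
	int				phase;		/* generation counter, guards spurious wakeups */
} barrier_t;

static int
barrier_init(barrier_t *b, int count)
{
	pthread_mutex_init(&b->mutex, NULL);
	pthread_cond_init(&b->cond, NULL);
	b->count = count;
	b->arrived = 0;
	b->phase = 0;
	return 0;
}

/* Returns 1 in exactly one thread per phase, 0 in the others. */
static int
barrier_wait(barrier_t *b)
{
	int			last = 0;

	pthread_mutex_lock(&b->mutex);
	if (++b->arrived == b->count)
	{
		/* last arrival: reset and release the current generation */
		b->arrived = 0;
		b->phase++;
		last = 1;
		pthread_cond_broadcast(&b->cond);
	}
	else
	{
		int			phase = b->phase;

		/* wait until the generation we joined has been released */
		while (b->phase == phase)
			pthread_cond_wait(&b->cond, &b->mutex);
	}
	pthread_mutex_unlock(&b->mutex);
	return last;
}
```

The per-generation phase counter is what makes the barrier reusable: a thread that loops back and calls barrier_wait() again cannot be released early by the broadcast for the previous phase.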


--
Fabien.

diff --git a/configure b/configure
index e202697bbf..54c5a7963f 100755
--- a/configure
+++ b/configure
@@ -11668,6 +11668,62 @@ if test "$ac_res" != no; then :
 
 fi
 
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for library containing pthread_barrier_wait" >&5
+$as_echo_n "checking for library containing pthread_barrier_wait... " >&6; }
+if ${ac_cv_search_pthread_barrier_wait+:} false; then :
+  $as_echo_n "(cached) " >&6
+else
+  ac_func_search_save_LIBS=$LIBS
+cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+
+/* Override any GCC internal prototype to avoid an error.
+   Use char because int might match the return type of a GCC
+   builtin and then its argument prototype would still apply.  */
+#ifdef __cplusplus
+extern "C"
+#endif
+char pthread_barrier_wait ();
+int
+main ()
+{
+return pthread_barrier_wait ();
+  ;
+  return 0;
+}
+_ACEOF
+for ac_lib in '' pthread; do
+  if test -z "$ac_lib"; then
+ac_res="none required"
+  else
+ac_res=-l$ac_lib
+LIBS="-l$ac_lib  $ac_func_search_save_LIBS"
+  fi
+  if ac_fn_c_try_link "$LINENO"; then :
+  ac_cv_search_pthread_barrier_wait=$ac_res
+fi
+rm -f core conftest.err conftest.$ac_objext \
+conftest$ac_exeext
+  if ${ac_cv_search_pthread_barrier_wait+:} false; then :
+  break
+fi
+done
+if ${ac_cv_search_pthread_barrier_wait+:} false; then :
+
+else
+  ac_cv_search_pthread_barrier_wait=no
+fi
+rm conftest.$ac_ext
+LIBS=$ac_func_search_save_LIBS
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_search_pthread_barrier_wait" >&5
+$as_echo "$ac_cv_search_pthread_barrier_wait" >&6; }
+ac_res=$ac_cv_search_pthread_barrier_wait
+if test "$ac_res" != no; then :
+  test "$ac_res" = "none required" || LIBS="$ac_res $LIBS"
+
+fi
+
 # Solaris:
 { $as_echo "$as_me:${as_lineno-$LINENO}: checking for library containing fdatasync" >&5
 $as_echo_n "checking for library containing fdatasync... " >&6; }
@@ -15853,6 +15909,19 @@ esac
 
 fi
 
+ac_fn_c_check_func "$LINENO" "pthread_barrier_wait" "ac_cv_func_pthread_barrier_wait"
+if test "x$ac_cv_func_pthread_barrier_wait" = xyes; then :
+  $as_echo "#define HAVE_PTHREAD_BARRIER_WAIT 1" >>confdefs.h
+
+else
+  case " $LIBOBJS " in
+  *" pthread_barrier_wait.$ac_objext "* ) ;;
+  *) LIBOBJS="$LIBOBJS pthread_barrier_wait.$ac_objext"
+ ;;
+esac
+
+fi
+
 ac_fn_c_check_func "$LINENO" "pwrite" "ac_cv_func_pwrite"
 if test "x$ac_cv_func_pwrite" = xyes; then :
   $as_echo "#define HAVE_PWRITE 1" >>confdefs.h
diff --git a/configure.ac b/configure.ac
index a5ad072ee4..23ad861b27 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1152,6 +1152,7 @@ AC_SEARCH_LIBS(getopt_long, [getopt gnugetopt])
 AC_SEARCH_LIBS(shm_open, rt)
 AC_SEARCH_LIBS(shm_unlink, rt)
 AC_SEARCH_LIBS(clock_gettime, [rt posix4])
+AC_SEARCH_LIBS(pthread_barrier_wait, pthread)
 # Solaris:
 AC_SEARCH_LIBS(fdatasync, [rt posix4])
 # Required for thread_test.c on Solaris
@@ -1734,6 +1735,7 @@ AC_REPLACE_FUNCS(m4_normalize([
 	mkdtemp
 	pread
 	preadv
+	pthread_barrier_wait
 	pwrite
 	pwritev
 	random
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index a4a3f40048..0ac0e186ea 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -66,6 +66,7 @@
 #include "libpq-fe.h"
 #include "pgbench.h"
 #include "portability/instr_time.h"
+#include "port/pg_pthread.h"
 
 #ifndef M_PI
 #define M_PI 3.14159265358979323846
@@ -109,26 +110,6 @@ typedef struct socket_set
 
 #endif			/* POLL_USING_SELECT */
 
-/*
- * Multi-platform pthread implementations
- */
-
-#ifdef WIN32
-/* Use native win32 threads on Windows */
-typedef struct win32_pthread *pthread_t;
-typedef int pthread_attr_t;
-
-static int	

Re: [PATCH] pg_hba.conf error messages for logical replication connections

2021-01-30 Thread Paul Martinez
On Thu, Jan 28, 2021 at 8:17 PM Amit Kapila  wrote:
>
> What exactly are you bothered about here? Is the database name not
> present in the message your concern or the message uses 'replication'
> but actually it doesn't relate to 'replication' specified in
> pg_hba.conf your concern? I think with the current scheme one might
> say that replication word in error message helps them to distinguish
> logical replication connection error from a regular connection error.
> I am not telling what you are proposing is wrong but just that the
> current scheme of things might be helpful to some users. If you can
> explain a bit how the current message misled you and the proposed one
> solves that confusion that would be better?
>

My main confusion arose from conflating the word "replication" in the
error message with the "replication" keyword in pg_hba.conf.

In my case, I was actually confused because I was creating logical
replication connections that weren't getting rejected, despite a lack
of any "replication" rules in my pg_hba.conf. I had the faulty
assumption that a replication connection requires the "replication"
keyword, and my change to the documentation makes it explicit that
logical replication connections do not match the "replication" keyword.

I was digging through the code trying to understand why it was working,
and also making manual connections before I stumbled upon these error
messages.

The fact that the error message doesn't include the database name
definitely contributed to my confusion. It only mentions the word
"replication", and not a database name, and that reinforces the idea
that the "replication" keyword rule should apply, because it seems
"replication" has replaced the database name.

But overall, I would agree that the current messages aren't wrong,
and my fix could still cause confusion because now logical replication
connections won't be described as "replication" connections.

How about explicitly specifying physical vs. logical replication in the
error message, and also adding hints for clarifying the use of
the "replication" keyword in pg_hba.conf?

if physical replication
  Error "pg_hba.conf rejects physical replication connection ..."
  Hint "Physical replication connections only match pg_hba.conf rules
using the "replication" keyword"
else if logical replication
  Error "pg_hba.conf rejects logical replication connection to database %s ..."
  // Maybe add this?
  Hint "Logical replication connections do not match pg_hba.conf rules
using the "replication" keyword"
else
  Error "pg_hba.conf rejects connection to database %s ..."

If I did go with this approach, would it be better to have three
separate branches, or to just introduce another variable for the
connection type? I'm not sure what is optimal for translation. (If both
types of replication connections get hints then probably three branches
is better.)

const char *connection_type;

connection_type =
  am_db_walsender ? _("logical replication connection") :
  am_walsender ? _("physical replication connection") :
  _("connection");
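As a self-contained sketch of the three-branch shape (plain snprintf stands in for ereport/errhint here, and the walsender flags are ordinary parameters rather than the backend globals; note that in the backend, am_db_walsender implies am_walsender, so "physical" is the walsender-without-database case):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build the rejection message for the given connection type (sketch only). */
static const char *
rejection_message(int am_walsender, int am_db_walsender, const char *dbname,
				  char *buf, size_t buflen)
{
	if (am_walsender && !am_db_walsender)
		/* physical replication: no database, so none is mentioned */
		snprintf(buf, buflen,
				 "pg_hba.conf rejects physical replication connection");
	else if (am_walsender && am_db_walsender)
		/* logical replication: connects to a specific database */
		snprintf(buf, buflen,
				 "pg_hba.conf rejects logical replication connection to database \"%s\"",
				 dbname);
	else
		/* ordinary connection */
		snprintf(buf, buflen,
				 "pg_hba.conf rejects connection to database \"%s\"", dbname);
	return buf;
}
```

Keeping three whole message strings, rather than composing one from a connection_type fragment, is generally friendlier to translators, which argues for the three-branch version whenever the branches carry different hints anyway.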


- Paul




Re: SELECT INTO deprecation

2021-01-30 Thread Peter Eisentraut

On 2020-12-17 02:30, Michael Paquier wrote:

On Wed, Dec 16, 2020 at 06:07:08PM +0100, Peter Eisentraut wrote:

Right, we would very likely not add it now.  But it doesn't seem to cause a
lot of ongoing maintenance burden, so if there is a use case, it's not
unreasonable to keep it around.  I have no other motive here, I was just
curious.


 From the point of view of the code, this would simplify the grammar
rules of SELECT, which is a good thing.  And this would also simplify
some code in the rewriter where we create some CreateTableAsStmt
on-the-fly where an IntoClause is specified, but my take is that this
is not really a burden when it comes to long-term maintenance.  And
the git history tells, at least it seems to me so, that this is not
manipulated much.


I have committed my small documentation patch that points out 
compatibility with other implementations.





Re: Allow CURRENT_ROLE in GRANTED BY

2021-01-30 Thread Peter Eisentraut

On 2020-12-30 13:43, Simon Riggs wrote:

On Thu, 10 Dec 2020 at 18:40, Peter Eisentraut
 wrote:


On 2020-06-24 20:21, Peter Eisentraut wrote:

On 2020-06-24 10:12, Vik Fearing wrote:

On 6/24/20 8:35 AM, Peter Eisentraut wrote:

I was checking some loose ends in SQL conformance, when I noticed: We
support GRANT role ... GRANTED BY CURRENT_USER, but we don't support
CURRENT_ROLE in that place, even though in PostgreSQL they are
equivalent.  Here is a trivial patch to add that.



The only thing that isn't dead-obvious about this patch is the commit
message says "[PATCH 1/2]".  What is in the other part?


Hehe.  The second patch is some in-progress work to add the GRANTED BY
clause to the regular GRANT command.  More on that perhaps at a later date.


Here is the highly anticipated and quite underwhelming second part of
this patch set.


Looks great, but no test to confirm it works. I would suggest adding a
test and committing directly since I don't see any cause for further
discussion.


Committed with some tests.  Thanks.

--
Peter Eisentraut
2ndQuadrant, an EDB company
https://www.2ndquadrant.com/