Re: aggregate crash

2020-01-20 Thread Andres Freund
Hi,

On 2020-01-15 19:16:30 -0800, Andres Freund wrote:
> The bug, in a way, exists all the way back, but it's a bit harder to
> create NULL values where the datum component isn't 0.

> To fix I suggest we, in all branches, do the equivalent of adding
> something like:
> diff --git i/src/backend/executor/execExprInterp.c 
> w/src/backend/executor/execExprInterp.c
> index 790380051be..3260a63ac6b 100644
> --- i/src/backend/executor/execExprInterp.c
> +++ w/src/backend/executor/execExprInterp.c
> @@ -4199,6 +4199,12 @@ ExecAggTransReparent(AggState *aggstate, 
> AggStatePerTrans pertrans,
>   pertrans->transtypeByVal,
>   pertrans->transtypeLen);
>  }
> +else
> +{
> +/* ensure datum component is 0 for NULL transition values */
> +newValue = (Datum) 0;
> +}
> +
>  if (!oldValueIsNull)
>  {
>  if (DatumIsReadWriteExpandedObject(oldValue,
> 
> and a comment explaining why it's (now) safe to rely on datum
> comparisons for
> if (DatumGetPointer(newVal) != DatumGetPointer(pergroup->transValue))

Pushed something along those lines.


> A separate question is whether it's worth adding code to
> e.g. EEO_CASE(EEOP_FUNCEXPR_STRICT) also resetting *op->resvalue to
> (Datum) 0.  I personally don't think ensuring the datum is always
> 0 when isnull is true is all that helpful, if we can't guarantee it
> everywhere. So I'm a bit loath to add cycles to places that don't need
> it, and are hot.

I wonder if it's worth adding a few valgrind annotations marking values
as undefined when null? That would make it easier to catch such cases in the
future.
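
A minimal sketch of what such annotations could look like, using the wrappers
from utils/memdebug.h (the helper name and where it would be called from are
hypothetical, not an actual proposal):

#include "postgres.h"
#include "utils/memdebug.h"		/* VALGRIND_MAKE_MEM_UNDEFINED(), no-op without USE_VALGRIND */

/*
 * Hypothetical helper: when a value is NULL, tell Valgrind that the datum
 * bytes are undefined, so any later code relying on them gets reported.
 */
static inline void
MarkNullDatumUndefined(Datum *value, bool isnull)
{
	if (isnull)
		VALGRIND_MAKE_MEM_UNDEFINED(value, sizeof(Datum));
}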

Greetings,

Andres Freund




Re: [HACKERS] Block level parallel vacuum

2020-01-20 Thread Masahiko Sawada
On Tue, 21 Jan 2020 at 16:13, Amit Kapila  wrote:
>
> On Tue, Jan 21, 2020 at 12:11 PM Masahiko Sawada
>  wrote:
> >
> > On Tue, 21 Jan 2020 at 15:35, Amit Kapila  wrote:
> > >
> > > On Tue, Jan 21, 2020 at 11:30 AM Andres Freund  wrote:
> > > >
> > > > Hi,
> > > >
> > > > On 2020-01-20 09:09:35 +0530, Amit Kapila wrote:
> > > > > Pushed, after fixing these two comments.
> > > >
> > > > When attempting to vacuum a large table I just got:
> > > >
> > > > postgres=# vacuum FREEZE ;
> > > > ERROR:  invalid memory alloc request size 1073741828
> > > >
> > > > #0  palloc (size=1073741828) at 
> > > > /mnt/tools/src/postgresql/src/backend/utils/mmgr/mcxt.c:959
> > > > #1  0x56452cc45cac in lazy_space_alloc (vacrelstats=0x56452e5ab0e8, 
> > > > vacrelstats=0x56452e5ab0e8, relblocks=24686152)
> > > > at 
> > > > /mnt/tools/src/postgresql/src/backend/access/heap/vacuumlazy.c:2741
> > > > #2  lazy_scan_heap (aggressive=true, nindexes=1, Irel=0x56452e5ab1c8, 
> > > > vacrelstats=, params=0x7ffdf8c00290, onerel= > > > out>)
> > > > at 
> > > > /mnt/tools/src/postgresql/src/backend/access/heap/vacuumlazy.c:786
> > > > #3  heap_vacuum_rel (onerel=, params=0x7ffdf8c00290, 
> > > > bstrategy=)
> > > > at 
> > > > /mnt/tools/src/postgresql/src/backend/access/heap/vacuumlazy.c:472
> > > > #4  0x56452cd8b42c in table_relation_vacuum (bstrategy= > > > out>, params=0x7ffdf8c00290, rel=0x7fbcdff1e248)
> > > > at /mnt/tools/src/postgresql/src/include/access/tableam.h:1450
> > > > #5  vacuum_rel (relid=16454, relation=, 
> > > > params=params@entry=0x7ffdf8c00290) at 
> > > > /mnt/tools/src/postgresql/src/backend/commands/vacuum.c:1882
> > > >
> > > > Looks to me that the calculation moved into compute_max_dead_tuples()
> > > > continues to use an allocation ceiling
> > > > maxtuples = Min(maxtuples, MaxAllocSize / 
> > > > sizeof(ItemPointerData));
> > > > but the actual allocation now is
> > > >
> > > > #define SizeOfLVDeadTuples(cnt) \
> > > > add_size((offsetof(LVDeadTuples, itemptrs)), \
> > > >  mul_size(sizeof(ItemPointerData), cnt))
> > > >
> > > > i.e. the overhead of offsetof(LVDeadTuples, itemptrs) is not taken into
> > > > account.
> > > >
> > >
> > > Right, I think we need to take that into account in both places in
> > > compute_max_dead_tuples():
> > >
> > > maxtuples = (vac_work_mem * 1024L) / sizeof(ItemPointerData);
> > > ..
> > > maxtuples = Min(maxtuples, MaxAllocSize / sizeof(ItemPointerData));
> > >
> > >
> >
> > Agreed. Attached patch should fix this issue.
> >
>
> if (useindex)
>   {
> - maxtuples = (vac_work_mem * 1024L) / sizeof(ItemPointerData);
> + maxtuples = ((vac_work_mem * 1024L) - SizeOfLVDeadTuplesHeader) /
> sizeof(ItemPointerData);
>
> SizeOfLVDeadTuplesHeader is not defined by the patch.  Do you think it
> makes sense to add a comment here about the calculation?

Oops, it should be SizeOfLVDeadTuples. Attached is an updated version.

I defined two macros: SizeOfLVDeadTuples is the size of the LVDeadTuples
struct (the header), and SizeOfDeadTuples is the total size including the
LVDeadTuples struct and the dead tuple array.
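
For illustration, here is a standalone sketch of the header-aware ceiling being
discussed; the constants stand in for MaxAllocSize, sizeof(ItemPointerData) and
offsetof(LVDeadTuples, itemptrs) and are assumptions, not the exact patch:

#include <stddef.h>

#define MAX_ALLOC_SIZE		0x3fffffffL		/* MaxAllocSize */
#define ITEM_POINTER_SIZE	6L				/* sizeof(ItemPointerData) */
#define DEAD_TUPLES_HEADER	16L				/* offsetof(LVDeadTuples, itemptrs), assumed */

/* Cap the dead-tuple count so that header + array still fits in one palloc. */
static long
compute_max_dead_tuples_sketch(long vac_work_mem_kb)
{
	long		maxtuples;

	maxtuples = (vac_work_mem_kb * 1024L - DEAD_TUPLES_HEADER) / ITEM_POINTER_SIZE;
	if (maxtuples > (MAX_ALLOC_SIZE - DEAD_TUPLES_HEADER) / ITEM_POINTER_SIZE)
		maxtuples = (MAX_ALLOC_SIZE - DEAD_TUPLES_HEADER) / ITEM_POINTER_SIZE;

	return maxtuples;
}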

Regards,

-- 
Masahiko Sawada  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


fix_max_dead_tuples_v2.patch
Description: Binary data


Re: [HACKERS] Block level parallel vacuum

2020-01-20 Thread Amit Kapila
On Tue, Jan 21, 2020 at 12:11 PM Masahiko Sawada
 wrote:
>
> On Tue, 21 Jan 2020 at 15:35, Amit Kapila  wrote:
> >
> > On Tue, Jan 21, 2020 at 11:30 AM Andres Freund  wrote:
> > >
> > > Hi,
> > >
> > > On 2020-01-20 09:09:35 +0530, Amit Kapila wrote:
> > > > Pushed, after fixing these two comments.
> > >
> > > When attempting to vacuum a large table I just got:
> > >
> > > postgres=# vacuum FREEZE ;
> > > ERROR:  invalid memory alloc request size 1073741828
> > >
> > > #0  palloc (size=1073741828) at 
> > > /mnt/tools/src/postgresql/src/backend/utils/mmgr/mcxt.c:959
> > > #1  0x56452cc45cac in lazy_space_alloc (vacrelstats=0x56452e5ab0e8, 
> > > vacrelstats=0x56452e5ab0e8, relblocks=24686152)
> > > at /mnt/tools/src/postgresql/src/backend/access/heap/vacuumlazy.c:2741
> > > #2  lazy_scan_heap (aggressive=true, nindexes=1, Irel=0x56452e5ab1c8, 
> > > vacrelstats=, params=0x7ffdf8c00290, onerel= > > out>)
> > > at /mnt/tools/src/postgresql/src/backend/access/heap/vacuumlazy.c:786
> > > #3  heap_vacuum_rel (onerel=, params=0x7ffdf8c00290, 
> > > bstrategy=)
> > > at /mnt/tools/src/postgresql/src/backend/access/heap/vacuumlazy.c:472
> > > #4  0x56452cd8b42c in table_relation_vacuum (bstrategy= > > out>, params=0x7ffdf8c00290, rel=0x7fbcdff1e248)
> > > at /mnt/tools/src/postgresql/src/include/access/tableam.h:1450
> > > #5  vacuum_rel (relid=16454, relation=, 
> > > params=params@entry=0x7ffdf8c00290) at 
> > > /mnt/tools/src/postgresql/src/backend/commands/vacuum.c:1882
> > >
> > > Looks to me that the calculation moved into compute_max_dead_tuples()
> > > continues to use an allocation ceiling
> > > maxtuples = Min(maxtuples, MaxAllocSize / 
> > > sizeof(ItemPointerData));
> > > but the actual allocation now is
> > >
> > > #define SizeOfLVDeadTuples(cnt) \
> > > add_size((offsetof(LVDeadTuples, itemptrs)), \
> > >  mul_size(sizeof(ItemPointerData), cnt))
> > >
> > > i.e. the overhead of offsetof(LVDeadTuples, itemptrs) is not taken into
> > > account.
> > >
> >
> > Right, I think we need to take that into account in both places in
> > compute_max_dead_tuples():
> >
> > maxtuples = (vac_work_mem * 1024L) / sizeof(ItemPointerData);
> > ..
> > maxtuples = Min(maxtuples, MaxAllocSize / sizeof(ItemPointerData));
> >
> >
>
> Agreed. Attached patch should fix this issue.
>

if (useindex)
  {
- maxtuples = (vac_work_mem * 1024L) / sizeof(ItemPointerData);
+ maxtuples = ((vac_work_mem * 1024L) - SizeOfLVDeadTuplesHeader) /
sizeof(ItemPointerData);

SizeOfLVDeadTuplesHeader is not defined by the patch.  Do you think it
makes sense to add a comment here about the calculation?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: pg13 PGDLLIMPORT list

2020-01-20 Thread Craig Ringer
On Tue, 21 Jan 2020 at 12:49, Michael Paquier  wrote:
>
> On Tue, Jan 21, 2020 at 08:34:05AM +0530, Amit Kapila wrote:
> > As such no objection, but I am not sure if the other person need it on
> > back branches as well.  Are you planning to commit this, or if you
> > want I can take care of it?
>
> Thanks for the reminder.  Done now.  I have also reordered the
> surrounding parameters while at it so they are consistent.

While speaking of PGDLLIMPORT, I wrote a quick check target in the
makefiles for some extensions I work on. It identified the following
symbols as used by the extensions but not exported:

XactLastRecEnd (xlog.h)
criticalSharedRelcachesBuilt   (relcache.h)
hot_standby_feedback   (walreceiver.h)
pgstat_track_activities  (pgstat.h)
WalRcv  (walreceiver.h)
wal_receiver_status_interval (walreceiver.h)
wal_retrieve_retry_interval (walreceiver.h)

Of those, XactLastRecEnd is by far the most important.

Failure to export pgstat_track_activities is a bug IMO, since it's
referenced by the inline functions pgstat_report_wait_start() and
pgstat_report_wait_end() in pgstat.h.

criticalSharedRelcachesBuilt is useful in extensions that may do genam
systable_beginscan() etc in functions called both early in startup and
later on.

hot_standby_feedback can be worked around by reading the GUC via the
config options interface. But IMO all GUC symbols should be marked
PGDLLIMPORT, especially since we lack an interface for extensions to
read arbitrary GUC values without formatting to a string and then
parsing the string.
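
For reference, the string-based workaround mentioned above looks roughly like
this (a sketch; the function name is made up and error handling is omitted):

#include "postgres.h"
#include "utils/guc.h"

/*
 * Read hot_standby_feedback through the GUC machinery instead of the
 * (unexported) C variable; GetConfigOption() returns the value as text.
 */
static bool
extension_get_hot_standby_feedback(void)
{
	const char *val = GetConfigOption("hot_standby_feedback", false, false);

	return val != NULL && strcmp(val, "on") == 0;
}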

wal_receiver_status_interval and wal_retrieve_retry_interval are not
that important, but again they're GUCs.

Being able to see WalRcv is very useful when running extension code on
a streaming physical replica, where you want to make decisions based
on what's actually replicated.

Anyone object to exporting these?


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise




Re: Physical replication slot advance is not persistent

2020-01-20 Thread Craig Ringer
On Tue, 21 Jan 2020 at 11:06, Michael Paquier  wrote:

> > The replication interface should not immediately flush changes to the
> > slot replay position on advance. It should be marked dirty and left to
> > be flushed by the next checkpoint. Doing otherwise potentially
> > introduces a lot of unnecessary fsync()s and may have an unpleasant
> > impact on performance.
>
> Some portions of the advancing code tell a different story.  It seems
> to me that the intention behind the first implementation of slot
> advancing was to get things flushed if any advancing was done.

Taking a step back here, I have no concerns with proposed changes for
pg_replication_slot_advance(). Disregard my comments about safety with
the SQL interface for the purposes of this thread, they apply only to
logical slots and are really unrelated to
pg_replication_slot_advance().

Re your comment above: For slot advances in general the flush to disk
is done lazily for performance reasons, but I think you meant
pg_replication_slot_advance() specifically.

pg_replication_slot_advance() doesn't appear to make any promises as
to immediate durability either way. It updates the required LSN
immediately with ReplicationSlotsUpdateRequiredLSN() so it
theoretically marks WAL as removable before it's flushed. But I don't
think we'll ever actually remove any WAL segments until checkpoint, at
which point we'll also flush any dirty slots, so it doesn't really
matter. For logical slots the lsn and xmin are both protected by the
effective/actual tracking logic and can't advance until the slot is
flushed.

The app might be surprised if the slot goes backwards after a
pg_replication_slot_advance() call followed by a server crash, though.

> The
> check doing that is actually broken from the start, but that's another
> story.  Could you check with Petr what was the intention here or drag
> his attention to this thread?  He is the original author of the
> feature.  So his output would be nice to have.

I'll ask him. He's pretty bogged down at the moment though, and I've done a
lot of work in this area too. (See e.g. the catalog_xmin in hot
standby feedback changes.)

> > The SQL interface advances the slot restart position and marks the
> > slot dirty *before the client has confirmed receipt of the data and
> > flushed it to disk*. So there's a data loss window. If the client
> > disconnects or crashes before all the data from that function call is
> > safely flushed to disk it may lose the data, then be unable to fetch
> > it again from the server because of the restart_lsn position advance.
>
> Well, you have the same class of problems with for example synchronous
> replication.  The only state a client can be sure of is that it
> received a confirmation that the operation happens and completed.
> Any other state tells that the operation may have happened.  Or not.
> Now, being sure that the data of the new slot has been flushed once
> the advancing function is done once the client got the confirmation
> that the work is done is a property which could be interesting to some
> class of applications.

That's what we already provide for the streaming interface for slot access.

I agree there's no need to shove a fix to the SQL interface for
phys/logical slots into this same discussion. I'm just trying to make
sure we don't "fix" a "bug" that's actually an important part of the
design by trying to fix a perceived-missing flush in the streaming
interface too. I am not at all confident that the test coverage for
this is sufficient right now, since we lack a good way to make
postgres delay various lazy internal activity to let us reliably
examine intermediate states in a race-free way, so I'm not sure tests
would catch it.

> > Really, we should add a "no_advance_position" option to the SQL
> > interface, then expect the client to call a second function that
> > explicitly advances the restart_lsn and confirmed_flush_lsn. Otherwise
> > no SQL interface client can be crashsafe.
>
> Hm.  Could you elaborate more what you mean here?  I am not sure to
> understand.  Note that calling the advance function multiple times has
> no effects, and the result returned is the actual restart_lsn (or
> confirmed_flush_lsn of course).

I've probably confused things a bit here. I don't mind if whether or
not pg_replication_slot_advance() forces an immediate flush, I think
there are reasonable arguments in both directions.

In the above I was talking about how pg_logical_slot_get_changes()
presently advances the slot position immediately, so if the client
loses its connection before reading and flushing all the data it may
be unable to recover. And while pg_logical_slot_peek_changes() lets
the app read the data w/o advancing the slot, it has to then do a
separate pg_replication_slot_advance() which has to do the decoding
work again. I'd like to improve that, but I didn't intend to confuse
or sidetrack this discussion in the process. Sorry.

We don't have a SQL-level interface for 

Re: PATCH: standby crashed when replay block which truncated in standby but failed to truncate in master node

2020-01-20 Thread Fujii Masao




On 2020/01/21 13:39, Michael Paquier wrote:

On Tue, Jan 21, 2020 at 08:45:14AM +0530, Amit Kapila wrote:

The original email doesn't say so.  I might be missing something, but
can you explain what makes you think so.


Oops.  Incorrect thread, I was thinking about this one previously:
https://www.postgresql.org/message-id/822113470.250068.1573246011...@connect.xfinity.com

Re-reading the root of the thread, I am still not sure what we could
do, as that's rather tricky.  See here:
https://www.postgresql.org/message-id/20190927061414.gf8...@paquier.xyz


Is the original proposal, i.e., holding the interrupts during
the truncation, worth considering? It is not a perfect
solution but might improve the situation a bit.

Regards,

--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters




Re: [HACKERS] Block level parallel vacuum

2020-01-20 Thread Masahiko Sawada
On Tue, 21 Jan 2020 at 15:35, Amit Kapila  wrote:
>
> On Tue, Jan 21, 2020 at 11:30 AM Andres Freund  wrote:
> >
> > Hi,
> >
> > On 2020-01-20 09:09:35 +0530, Amit Kapila wrote:
> > > Pushed, after fixing these two comments.
> >
> > When attempting to vacuum a large table I just got:
> >
> > postgres=# vacuum FREEZE ;
> > ERROR:  invalid memory alloc request size 1073741828
> >
> > #0  palloc (size=1073741828) at 
> > /mnt/tools/src/postgresql/src/backend/utils/mmgr/mcxt.c:959
> > #1  0x56452cc45cac in lazy_space_alloc (vacrelstats=0x56452e5ab0e8, 
> > vacrelstats=0x56452e5ab0e8, relblocks=24686152)
> > at /mnt/tools/src/postgresql/src/backend/access/heap/vacuumlazy.c:2741
> > #2  lazy_scan_heap (aggressive=true, nindexes=1, Irel=0x56452e5ab1c8, 
> > vacrelstats=, params=0x7ffdf8c00290, onerel=)
> > at /mnt/tools/src/postgresql/src/backend/access/heap/vacuumlazy.c:786
> > #3  heap_vacuum_rel (onerel=, params=0x7ffdf8c00290, 
> > bstrategy=)
> > at /mnt/tools/src/postgresql/src/backend/access/heap/vacuumlazy.c:472
> > #4  0x56452cd8b42c in table_relation_vacuum (bstrategy=, 
> > params=0x7ffdf8c00290, rel=0x7fbcdff1e248)
> > at /mnt/tools/src/postgresql/src/include/access/tableam.h:1450
> > #5  vacuum_rel (relid=16454, relation=, 
> > params=params@entry=0x7ffdf8c00290) at 
> > /mnt/tools/src/postgresql/src/backend/commands/vacuum.c:1882
> >
> > Looks to me that the calculation moved into compute_max_dead_tuples()
> > continues to use an allocation ceiling
> > maxtuples = Min(maxtuples, MaxAllocSize / 
> > sizeof(ItemPointerData));
> > but the actual allocation now is
> >
> > #define SizeOfLVDeadTuples(cnt) \
> > add_size((offsetof(LVDeadTuples, itemptrs)), \
> >  mul_size(sizeof(ItemPointerData), cnt))
> >
> > i.e. the overhead of offsetof(LVDeadTuples, itemptrs) is not taken into
> > account.
> >
>
> Right, I think we need to take that into account in both places in
> compute_max_dead_tuples():
>
> maxtuples = (vac_work_mem * 1024L) / sizeof(ItemPointerData);
> ..
> maxtuples = Min(maxtuples, MaxAllocSize / sizeof(ItemPointerData));
>
>

Agreed. Attached patch should fix this issue.

Regards,

-- 
Masahiko Sawada  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


fix_max_dead_tuples.patch
Description: Binary data


Re: TRUNCATE on foreign tables

2020-01-20 Thread Michael Paquier
On Mon, Jan 20, 2020 at 10:50:21PM +0900, Michael Paquier wrote:
> I have spent a good amount of time polishing 0001, tweaking the docs,
> comments, error messages and a bit of its logic.  I am getting
> comfortable with it, but it still needs an extra lookup and an indent run
> which has some noise, and I ran out of time today.  0002 has some of its
> issues fixed and I have not reviewed it fully yet.  There are still
> some places not adapted in it (why do you use "Bug?" in all your
> elog() messages by the way?), so the postgres_fdw part needs more
> attention.  Could you think about some docs for it by the way?

I have more comments about the postgres_fdw that need to be
addressed.

+   if (!OidIsValid(server_id))
+   {
+   server_id = GetForeignServerIdByRelId(frel_oid);
+   user = GetUserMapping(GetUserId(), server_id);
+   conn = GetConnection(user, false);
+   }
+   else if (server_id != GetForeignServerIdByRelId(frel_oid))
+   elog(ERROR, "Bug? inconsistent Server-IDs were supplied");
I agree here that an elog() looks more appropriate than an assert.
However I would change the error message to be more like "incorrect
server OID supplied by the TRUNCATE callback" or something similar.
The server OID has to be valid anyway, so don't you just bypass any
errors if it is not set?

+-- truncate two tables at a command
What does this sentence mean?  Isn't that "truncate two tables in one
single command"?

The table names in the tests are rather hard to parse.  I think that
it would be better to avoid underscores at the beginning of the
relation names.

It would be nice to have some coverage with inheritance, and also
track down in the tests what we expect when ONLY is specified in that
case (with and without ONLY, both parent and child relations).

Anyway, attached is the polished version of 0001 that I would be fine
committing, except for one point: are there objections if we do not
have extra handling for ONLY when it comes to foreign tables with
inheritance?  As the patch stands, the list of relations is first
built, with an inheritance recursive lookup done depending on if ONLY
is used or not.  Hence, if using "TRUNCATE ONLY foreign_tab, ONLY
foreign_tab2", then only those two tables would be passed down to the
FDW.  If ONLY is removed, both tables as well as their children are
added to the lists of relations split by server OID.  One problem is
that this could be confusing for some users I guess?  For example,
with a 1:1 mapping in the schema of the local and remote servers, a
user asking for TRUNCATE ONLY foreign_tab would pass down to the
remote just the equivalent of "TRUNCATE foreign_tab" using
postgres_fdw, causing the full inheritance tree to be truncated on the
remote side.  The concept of ONLY mixed with inherited foreign tables
is rather blurry (point raised by Stephen upthread). 
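
For illustration, an FDW could fill the new callback declared in the attached
patch along these lines (assuming frels_list carries the already-opened foreign
relations; the function below is a made-up skeleton, not part of the patch):

#include "postgres.h"
#include "foreign/fdwapi.h"
#include "utils/rel.h"

static void
exampleExecForeignTruncate(List *frels_list, DropBehavior behavior,
						   bool restart_seqs)
{
	ListCell   *lc;

	foreach(lc, frels_list)
	{
		Relation	frel = (Relation) lfirst(lc);

		/* here the FDW would build and send one TRUNCATE per remote relation */
		elog(DEBUG1, "would truncate foreign table \"%s\"",
			 RelationGetRelationName(frel));
	}
}

/* in the FDW's handler function: routine->ExecForeignTruncate = exampleExecForeignTruncate; */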
--
Michael
From dce70bca1242372c4d36c9d84973adffa90e5bdb Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Tue, 21 Jan 2020 15:32:41 +0900
Subject: [PATCH] Add FDW callback for support of TRUNCATE

---
 src/include/foreign/fdwapi.h   |   7 ++
 src/backend/commands/tablecmds.c   | 113 -
 src/test/regress/expected/foreign_data.out |   8 +-
 doc/src/sgml/fdwhandler.sgml   |  36 +++
 src/tools/pgindent/typedefs.list   |   1 +
 5 files changed, 156 insertions(+), 9 deletions(-)

diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 95556dfb15..0a9f36735e 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -151,6 +151,10 @@ typedef bool (*AnalyzeForeignTable_function) (Relation relation,
 typedef List *(*ImportForeignSchema_function) (ImportForeignSchemaStmt *stmt,
 			   Oid serverOid);
 
+typedef void (*ExecForeignTruncate_function) (List *frels_list,
+			  DropBehavior behavior,
+			  bool restart_seqs);
+
 typedef Size (*EstimateDSMForeignScan_function) (ForeignScanState *node,
  ParallelContext *pcxt);
 typedef void (*InitializeDSMForeignScan_function) (ForeignScanState *node,
@@ -236,6 +240,9 @@ typedef struct FdwRoutine
 	/* Support functions for IMPORT FOREIGN SCHEMA */
 	ImportForeignSchema_function ImportForeignSchema;
 
+	/* Support functions for TRUNCATE */
+	ExecForeignTruncate_function ExecForeignTruncate;
+
 	/* Support functions for parallelism under Gather node */
 	IsForeignScanParallelSafe_function IsForeignScanParallelSafe;
 	EstimateDSMForeignScan_function EstimateDSMForeignScan;
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 30b72b6297..2c575668c2 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -59,6 +59,7 @@
 #include "commands/typecmds.h"
 #include "commands/user.h"
 #include "executor/executor.h"
+#include "foreign/fdwapi.h"
 #include "foreign/foreign.h"
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
@@ -295,6 +296,21 

Re: [HACKERS] Block level parallel vacuum

2020-01-20 Thread Amit Kapila
On Tue, Jan 21, 2020 at 11:30 AM Andres Freund  wrote:
>
> Hi,
>
> On 2020-01-20 09:09:35 +0530, Amit Kapila wrote:
> > Pushed, after fixing these two comments.
>
> When attempting to vacuum a large table I just got:
>
> postgres=# vacuum FREEZE ;
> ERROR:  invalid memory alloc request size 1073741828
>
> #0  palloc (size=1073741828) at 
> /mnt/tools/src/postgresql/src/backend/utils/mmgr/mcxt.c:959
> #1  0x56452cc45cac in lazy_space_alloc (vacrelstats=0x56452e5ab0e8, 
> vacrelstats=0x56452e5ab0e8, relblocks=24686152)
> at /mnt/tools/src/postgresql/src/backend/access/heap/vacuumlazy.c:2741
> #2  lazy_scan_heap (aggressive=true, nindexes=1, Irel=0x56452e5ab1c8, 
> vacrelstats=, params=0x7ffdf8c00290, onerel=)
> at /mnt/tools/src/postgresql/src/backend/access/heap/vacuumlazy.c:786
> #3  heap_vacuum_rel (onerel=, params=0x7ffdf8c00290, 
> bstrategy=)
> at /mnt/tools/src/postgresql/src/backend/access/heap/vacuumlazy.c:472
> #4  0x56452cd8b42c in table_relation_vacuum (bstrategy=, 
> params=0x7ffdf8c00290, rel=0x7fbcdff1e248)
> at /mnt/tools/src/postgresql/src/include/access/tableam.h:1450
> #5  vacuum_rel (relid=16454, relation=, 
> params=params@entry=0x7ffdf8c00290) at 
> /mnt/tools/src/postgresql/src/backend/commands/vacuum.c:1882
>
> Looks to me that the calculation moved into compute_max_dead_tuples()
> continues to use an allocation ceiling
> maxtuples = Min(maxtuples, MaxAllocSize / 
> sizeof(ItemPointerData));
> but the actual allocation now is
>
> #define SizeOfLVDeadTuples(cnt) \
> add_size((offsetof(LVDeadTuples, itemptrs)), \
>  mul_size(sizeof(ItemPointerData), cnt))
>
> i.e. the overhead of offsetof(LVDeadTuples, itemptrs) is not taken into
> account.
>

Right, I think we need to take that into account in both places in
compute_max_dead_tuples():

maxtuples = (vac_work_mem * 1024L) / sizeof(ItemPointerData);
..
maxtuples = Min(maxtuples, MaxAllocSize / sizeof(ItemPointerData));


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Minor issues in .pgpass

2020-01-20 Thread Fujii Masao

Hi,

When I was researching the maximum length of password in PostgreSQL
to answer the question from my customer, I found that there are two
minor issues in .pgpass file.

(1) If a line in the .pgpass file is longer than 319 bytes,
   libpq silently treats each 319-byte chunk of the line as a
   separate setting line.

(2) The documentation explains that a line beginning with # is treated
   as a comment in .pgpass. But as far as I read the code,
   there is no code doing such special handling. Whether a line
   begins with # or not, libpq just checks whether the first token
   in the line matches the host. That is, if you try to connect
   to a host whose hostname begins with #,
   it can match a line beginning with # in .pgpass.

   Also, if that "comment" line is longer than 319 bytes,
   the latter part of the line can be treated as a valid setting.

You may think that these unexpected behaviors are not so harmful
in practice because "usually" a password setting line is shorter
than 319 bytes and a hostname beginning with # is unlikely to be
used. But the problem exists. And there are people who want to use
long passwords or to write long comments (e.g., with multibyte
characters like Japanese) in .pgpass, so these may become more harmful
in the near future.

For (1), I think that we should make libpq warn if a line is longer
than 319 bytes, and throw away the remaining part starting at the
320th byte. Whether to enlarge the maximum line length should be
a separate discussion, I think.

For (2), libpq should treat any lines beginning with # as comments.

I've not created the patch yet, but will do so once we reach
a consensus.
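
To make the intent concrete, here is a rough sketch of the parsing behavior
proposed above (warn on overlong lines, skip # comments); this is illustrative
pseudologic, not the actual libpq patch:

#include <stdio.h>
#include <string.h>

#define LINELEN 320				/* current .pgpass line buffer size */

/* Read one .pgpass line into buf; returns 0 at EOF. */
static int
read_pgpass_line(FILE *fp, char *buf)
{
	if (fgets(buf, LINELEN, fp) == NULL)
		return 0;

	/* Heuristic: a full buffer without a newline means the line was longer. */
	if (strlen(buf) == LINELEN - 1 && buf[LINELEN - 2] != '\n')
	{
		int			c;

		fprintf(stderr, "WARNING: line in password file is too long, remainder ignored\n");
		while ((c = fgetc(fp)) != EOF && c != '\n')
			;					/* discard the rest of the overlong line */
	}

	/* Treat lines beginning with # as comments. */
	if (buf[0] == '#')
		buf[0] = '\0';

	return 1;
}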

Regards,

--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters




RE: SLRU statistics

2020-01-20 Thread tsunakawa.ta...@fujitsu.com
From: Tomas Vondra 
> You're right the users can't really take advantage of this - my primary
> motivation was providing a feedback for devs, benchmarking etc. That
> might have been done with DEBUG messages or something, but this seems
> more convenient.

Understood.  I'm in favor of adding performance information even if it doesn't
make sense for users (as other DBMSs sometimes do).  One concern is that all
the PostgreSQL performance statistics have been useful so far for tuning in
some way, and this may become the first exception.  Do we describe the SLRU
stats view in the manual, or hide it and keep it only for PG devs and support staff?


> I think it's unclear how desirable / necessary it is to allow users to
> tweak those caches. I don't think we should have a GUC for everything,
> but maybe there's some sort of heuristics to determine the size. The
> assumption is we actually find practical workloads where the size of
> these SLRUs is a performance issue.

I understood that the new performance statistics are expected to reveal what 
SLRUs need to be tunable and/or implemented with a different mechanism like 
shared buffers.


Regards
Takayuki Tsunakawa







Re: error context for vacuum to include block number

2020-01-20 Thread Masahiko Sawada
On Tue, 21 Jan 2020 at 06:49, Justin Pryzby  wrote:
>
> On Mon, Jan 20, 2020 at 11:11:20AM -0800, Andres Freund wrote:
> > This I do not get. I didn't yet fully wake up, so I might just be slow?
>
> It was needlessly cute at the cost of clarity (meant to avoid setting
> error_context_stack in lazy_scan_heap and again immediately on its return).
>
> On Mon, Jan 20, 2020 at 11:13:05AM -0800, Andres Freund wrote:
> > I was thinking that you could just use LVRelStats.
>
> Done.
>
> On Mon, Jan 20, 2020 at 11:11:20AM -0800, Andres Freund wrote:
> > Alternatively we could push another context for each index inside
> > lazy_vacuum_all_indexes(). There's been plenty bugs in indexes
> > triggering problems, so that could be worthwhile.
>
> Did this too, although I'm not sure what kind of errors it'd find (?)
>
> I considered eliminating other uses of RelationGetRelationName, or looping over
> vacrelstats->blkno instead of local blkno.  I did that in an additional patch
> (which will cause conflicts if you try to apply it, due to another vacuum patch
> in this branch).
>
> CREATE TABLE t AS SELECT generate_series(1,9)a;
>
> postgres=# SET client_min_messages=debug;SET statement_timeout=39; VACUUM 
> (VERBOSE, PARALLEL 0) t;
> INFO:  vacuuming "public.t"
> 2020-01-20 15:46:14.993 CST [20056] ERROR:  canceling statement due to 
> statement timeout
> 2020-01-20 15:46:14.993 CST [20056] CONTEXT:  while scanning block 211 of 
> relation "public.t"
> 2020-01-20 15:46:14.993 CST [20056] STATEMENT:  VACUUM (VERBOSE, PARALLEL 0) 
> t;
> ERROR:  canceling statement due to statement timeout
> CONTEXT:  while scanning block 211 of relation "public.t"
>
> SELECT 'CREATE INDEX ON t(a)' FROM generate_series(1,11);\gexec
> UPDATE t SET a=a+1;
>
> postgres=# SET client_min_messages=debug;SET statement_timeout=99; VACUUM 
> (VERBOSE, PARALLEL 0) t;
> INFO:  vacuuming "public.t"
> DEBUG:  "t_a_idx": vacuuming index
> 2020-01-20 15:47:36.338 CST [20139] ERROR:  canceling statement due to 
> statement timeout
> 2020-01-20 15:47:36.338 CST [20139] CONTEXT:  while vacuuming relation 
> "public.t_a_idx"
> 2020-01-20 15:47:36.338 CST [20139] STATEMENT:  VACUUM (VERBOSE, PARALLEL 0) 
> t;
> ERROR:  canceling statement due to statement timeout
> CONTEXT:  while vacuuming relation "public.t_a_idx"
>
> I haven't found a good way of exercising the "vacuuming heap" path, though.

Some of them conflict with the current HEAD (62c9b52231). Please rebase them.

Regards,


--
Masahiko Sawada  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




TODO: Allow a stalled COPY to exit if the backend is terminated

2020-01-20 Thread Tatsuo Ishii
In TODO wiki:
https://wiki.postgresql.org/wiki/TODO

- Allow a stalled COPY to exit if the backend is terminated

Re: possible bug not in open items

https://www.postgresql.org/message-id/flat/200904091648.n39GmMJ07139%40momjian.us#d86f9ba37b4b34d3931c7152a028fe45

Hasn't this been fixed?

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




Re: [HACKERS] Block level parallel vacuum

2020-01-20 Thread Andres Freund
Hi,

On 2020-01-20 09:09:35 +0530, Amit Kapila wrote:
> Pushed, after fixing these two comments.

When attempting to vacuum a large table I just got:

postgres=# vacuum FREEZE ;
ERROR:  invalid memory alloc request size 1073741828

#0  palloc (size=1073741828) at 
/mnt/tools/src/postgresql/src/backend/utils/mmgr/mcxt.c:959
#1  0x56452cc45cac in lazy_space_alloc (vacrelstats=0x56452e5ab0e8, 
vacrelstats=0x56452e5ab0e8, relblocks=24686152)
at /mnt/tools/src/postgresql/src/backend/access/heap/vacuumlazy.c:2741
#2  lazy_scan_heap (aggressive=true, nindexes=1, Irel=0x56452e5ab1c8, 
vacrelstats=<optimized out>, params=0x7ffdf8c00290, onerel=<optimized out>)
at /mnt/tools/src/postgresql/src/backend/access/heap/vacuumlazy.c:786
#3  heap_vacuum_rel (onerel=<optimized out>, params=0x7ffdf8c00290, 
bstrategy=<optimized out>)
at /mnt/tools/src/postgresql/src/backend/access/heap/vacuumlazy.c:472
#4  0x56452cd8b42c in table_relation_vacuum (bstrategy=<optimized out>, 
params=0x7ffdf8c00290, rel=0x7fbcdff1e248)
at /mnt/tools/src/postgresql/src/include/access/tableam.h:1450
#5  vacuum_rel (relid=16454, relation=<optimized out>, 
params=params@entry=0x7ffdf8c00290) at 
/mnt/tools/src/postgresql/src/backend/commands/vacuum.c:1882

Looks to me that the calculation moved into compute_max_dead_tuples()
continues to use an allocation ceiling
maxtuples = Min(maxtuples, MaxAllocSize / 
sizeof(ItemPointerData));
but the actual allocation now is

#define SizeOfLVDeadTuples(cnt) \
add_size((offsetof(LVDeadTuples, itemptrs)), \
 mul_size(sizeof(ItemPointerData), cnt))

i.e. the overhead of offsetof(LVDeadTuples, itemptrs) is not taken into
account.

Regards,

Andres




Re: closesocket behavior in different platforms

2020-01-20 Thread Amit Kapila
On Fri, Dec 6, 2019 at 11:24 AM vignesh C  wrote:
>
> It is noticed that in all 4 cases the message "FATAL:  terminating 
> connection due to administrator command" does not appear on Windows.
>
> However the following message is present in the server log file:
> FATAL:  terminating connection due to administrator command
>
> The reason for this looks like:
> When the server closes a connection, it sends the ErrorResponse packet, and 
> then closes the socket and terminates the backend process. If the packet is 
> received before the server closes the connection, the error message is 
> received in both windows and linux. If the packet is not received before the 
> server closes the connection, the error message is not received in case of 
> windows where as in linux it is received.
>
> There have been a couple of discussion earlier also on this [1] & [2], but we 
> could not find any alternate solution.
>
> One of the options that MSDN suggests in [3] is to use the SO_LINGER option; we 
> tried this option with no luck. One other thing that we tried was to sleep 
> for 1 second before closing the socket; this works if the client is active, 
> whereas for inactive clients it does not solve the problem. Another thought 
> was to check the connection from psql while waiting for query input in the 
> gets_interactive function, or to have a separate thread check the connection 
> status periodically; this might work for psql but will not work 
> for applications that use libpq. Amit had also suggested one solution in 
> [4], where he proposed 'I have also tried calling closesocket() explicitly in 
> our function socket_close which has changed the error message to "could not 
> receive data from server: Software caused connection abort 
> (0x2745/10053)".'
>

Based on previous investigation and information in this email, I don't
see anything we can do about this.

> Should we add some documentation for the above behavior?
>

That sounds reasonable to me.  Any proposal for the same?   One idea
could be to add something like "Client Disconnection Problems" after
the "Client Connection Problems" section in docs [1].

Anybody else has any better suggestions on this topic?


[1] - https://www.postgresql.org/docs/devel/server-start.html

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [Proposal] Global temporary tables

2020-01-20 Thread Pavel Stehule
Hi

I have some free time this evening, so I will check this patch.

I have one question:

+ /* global temp table get relstats from localhash */
+ if (RELATION_IS_GLOBAL_TEMP(rel))
+ {
+ get_gtt_relstats(RelationGetRelid(rel),
+ &relpages, &reltuples, &relallvisible,
+ NULL, NULL);
+ }
+ else
+ {
+ /* coerce values in pg_class to more desirable types */
+ relpages = (BlockNumber) rel->rd_rel->relpages;
+ reltuples = (double) rel->rd_rel->reltuples;
+ relallvisible = (BlockNumber) rel->rd_rel->relallvisible;
+ }

Isn't it possible to fill the rd_rel structure too, so this branching can be
reduced?
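
For what it's worth, the alternative asked about above would look roughly like
the fragment below (names follow the quoted patch; the exact get_gtt_relstats()
signature is assumed):

	/* Copy the localhash stats into rd_rel once, so the existing
	   pg_class code path below can be shared by both cases. */
	if (RELATION_IS_GLOBAL_TEMP(rel))
	{
		BlockNumber	gtt_pages;
		double		gtt_tuples;
		BlockNumber	gtt_allvisible;

		get_gtt_relstats(RelationGetRelid(rel),
						 &gtt_pages, &gtt_tuples, &gtt_allvisible,
						 NULL, NULL);

		rel->rd_rel->relpages = (int32) gtt_pages;
		rel->rd_rel->reltuples = (float4) gtt_tuples;
		rel->rd_rel->relallvisible = (int32) gtt_allvisible;
	}

	/* ... then the existing "coerce values in pg_class" code runs unchanged */
	relpages = (BlockNumber) rel->rd_rel->relpages;
	reltuples = (double) rel->rd_rel->reltuples;
	relallvisible = (BlockNumber) rel->rd_rel->relallvisible;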

Regards

Pavel

On Mon, 20 Jan 2020 at 17:27, 曾文旌(义从) wrote:

>
>
> > On 20 Jan 2020, at 01:32, Erik Rijkers wrote:
> >
> > On 2020-01-19 18:04, 曾文旌(义从) wrote:
> >>> On 14 Jan 2020, at 21:20, Pavel Stehule wrote:
> >>> On Tue, 14 Jan 2020 at 14:09, 曾文旌(义从) wrote:
> >
> >>> [global_temporary_table_v4-pg13.patch ]
> >
> > Hi,
> >
> > This patch doesn't quite apply for me:
> >
> > patching file src/backend/access/common/reloptions.c
> > patching file src/backend/access/gist/gistutil.c
> > patching file src/backend/access/hash/hash.c
> > Hunk #1 succeeded at 149 (offset 3 lines).
> > patching file src/backend/access/heap/heapam_handler.c
> > patching file src/backend/access/heap/vacuumlazy.c
> > patching file src/backend/access/nbtree/nbtpage.c
> > patching file src/backend/access/table/tableam.c
> > patching file src/backend/access/transam/xlog.c
> > patching file src/backend/catalog/Makefile
> > Hunk #1 FAILED at 44.
> > 1 out of 1 hunk FAILED -- saving rejects to file
> src/backend/catalog/Makefile.rej
> > [...]
> >   (The rest applies without errors)
> >
> > src/backend/catalog/Makefile.rej contains:
> >
> > 
> > --- src/backend/catalog/Makefile
> > +++ src/backend/catalog/Makefile
> > @@ -44,6 +44,8 @@ OBJS = \
> >   storage.o \
> >   toasting.o
> >
> > +OBJS += storage_gtt.o
> > +
> > BKIFILES = postgres.bki postgres.description postgres.shdescription
> >
> > include $(top_srcdir)/src/backend/common.mk
> > 
> >
> > Can you have a look?
> I updated the code and remade the patch.
> Please give me feedback if you have any more questions.
>
>
>
>
> >
> >
> > thanks,
> >
> > Erik Rijkers
> >
> >
> >
> >
> >
>
>


Re: Increase psql's password buffer size

2020-01-20 Thread Fujii Masao




On 2020/01/21 4:21, David Fetter wrote:

On Mon, Jan 20, 2020 at 07:44:25PM +0100, David Fetter wrote:

On Mon, Jan 20, 2020 at 01:12:35PM -0500, Tom Lane wrote:

David Fetter  writes:

At least two cloud providers are now stuffing large amounts of
information into the password field. This change makes it possible to
accommodate that usage in interactive sessions.


Like who?


AWS and Azure are two examples I know of.


It seems like a completely silly idea.  And if 2K is sane, why not
much more?


Good question. Does it make sense to rearrange these things so they're
allocated at runtime instead of compile time?


(I can't say that s/100/2048/ in one place is a particularly evil
change; what bothers me is the likelihood that there are other
places that won't cope with arbitrarily long passwords.  Not all of
them are necessarily under our control, either.)


I found one that is, so please find attached the next revision of the
patch.


I found another place that assumes 100 bytes and upped it to 2048.


There are other places where a 100-byte password length is assumed.
It's better to check the 0001 patch that was posted in the following thread:
https://www.postgresql.org/message-id/09512c4f-8cb9-4021-b455-ef4c4f0d5...@amazon.com

I have no strong opinion about the maximum password length
for now. But IMO it's worth committing that 0001 patch as the first step
toward fixing this problem.

Also, IMO the more problematic thing is that psql silently truncates
the password entered at the prompt to 99 bytes if it is longer
than that. I think that psql should emit a warning in this case
so that users can notice it.
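
As a rough illustration of the warning idea (the 100-byte buffer and
simple_prompt() call mirror current psql; the truncation check is a simple
heuristic, not a finished patch):

	char		password[100];

	simple_prompt("Password: ", password, sizeof(password), false);

	/* If the buffer is completely full, the input was most likely longer
	   than 99 bytes and has been silently truncated. */
	if (strlen(password) == sizeof(password) - 1)
		fprintf(stderr, "warning: password longer than %zu bytes was truncated\n",
				sizeof(password) - 1);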

Regards,

--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters




Re: Error message inconsistency

2020-01-20 Thread MBeena Emerson
Hi Amit,

On Tue, 21 Jan 2020 at 10:49, Amit Kapila  wrote:

> On Thu, Jan 9, 2020 at 5:42 PM MBeena Emerson 
> wrote:
> >
>
> Hi Beena,
>
> It is better to reply inline.
>
> > Hi Mahendra,
> >
> > Thanks for the patch.
> > I am not sure but maybe the relation name should also be added to the
> following test case?
> >
> > create table t4 (id int);
> > insert into t4 values (1);
> > ALTER TABLE t4 ADD CONSTRAINT c1 CHECK (id > 10) NOT VALID; -- succeeds
> > ALTER TABLE t4 VALIDATE CONSTRAINT c1;
> > ERROR:  check constraint "c1" is violated by some row
> >
>
> I see that in this case, we are using errtableconstraint which should
> set table/schema name, but then that doesn't seem to be used.  Can we
> explore it a bit from that angle?
>

The errtableconstraint function seems only to set schema_name,
table_name and constraint_name internally, not for display
purposes. This can be seen in the following two cases, where the relation name
is displayed using RelationGetRelationName while errtableconstraint is passed
as an auxiliary error field, not as part of errmsg.

  ereport(ERROR,
          (errcode(ERRCODE_CHECK_VIOLATION),
           errmsg("new row for relation \"%s\" violates check constraint \"%s\"",
                  RelationGetRelationName(orig_rel), failed),
           val_desc ? errdetail("Failing row contains %s.", val_desc) : 0,
           errtableconstraint(orig_rel, failed)));

  ereport(ERROR,
          (errcode(ERRCODE_UNIQUE_VIOLATION),
           errmsg("duplicate key value violates unique constraint \"%s\"",
                  RelationGetRelationName(rel)),
           key_desc ? errdetail("Key %s already exists.", key_desc) : 0,
           errtableconstraint(heapRel,
                              RelationGetRelationName(rel))));
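
For comparison, including the relation name in the VALIDATE CONSTRAINT message
would look something like the following (wording and variable names are
illustrative only, not a finished proposal):

			ereport(ERROR,
					(errcode(ERRCODE_CHECK_VIOLATION),
					 errmsg("check constraint \"%s\" of relation \"%s\" is violated by some row",
							constraint_name,
							RelationGetRelationName(rel)),
					 errtableconstraint(rel, constraint_name)));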


--
M Beena Emerson

EnterpriseDB: http://www.enterprisedb.com


Re: Error message inconsistency

2020-01-20 Thread Amit Kapila
On Mon, Jan 6, 2020 at 6:31 PM Mahendra Singh Thalor  wrote:
>
> On Sat, 6 Jul 2019 at 09:53, Amit Kapila  wrote:
> >
> > On Mon, Jul 1, 2019 at 10:05 PM Alvaro Herrera  
> > wrote:
> > >
> > > Do we have an actual patch here?
> > >
> >
> > We have a patch, but it needs some more work like finding similar
> > places and change all of them at the same time and then change the
> > tests to adapt the same.
> >
>
> Hi all,
> Based on the above discussion, I tried to find all the places where we need 
> to change the error for "not null constraint".  As Amit Kapila pointed out one 
> place, I changed the error and am attaching the modified patch.
>

It seems you have not used the two error codes
(ERRCODE_NOT_NULL_VIOLATION and ERRCODE_CHECK_VIOLATION) pointed by me
above.

> What does this patch do?
> Before this patch, when displaying the "not-null constraint" error, we were not 
> displaying the relation name in some cases, so the attached patch adds the relation 
> name to the "not-null constraint" error in 2 places. I didn't change the expected 
> output files of the test suite as we haven't finalized the error messages.
>
> I verified Robert's point about partitioned tables also. With the error, we 
> are adding the relation name of the child table, and I think that is correct.
>

Can you show the same with the help of an example?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [PATCH] Resolve Parallel Hash Join Performance Issue

2020-01-20 Thread Thomas Munro
(Replies to both Gang and Tom below).

On Fri, Jan 10, 2020 at 1:52 PM Deng, Gang  wrote:
> Thank you for the comment. Yes, I agree the alternative of using 
> '(!parallel)', so that no need to test the bit. Will someone submit patch to 
> for it accordingly?

Here's a patch like that.

On Fri, Jan 10, 2020 at 3:43 AM Tom Lane  wrote:
> Thomas Munro  writes:
> > Right, I see.  The funny thing is that the match bit is not even used
> > in this query (it's used for right and full hash join, and those
> > aren't supported for parallel joins yet).  Hmm.  So, instead of the
> > test you proposed, an alternative would be to use if (!parallel).
> > That's a value that will be constant-folded, so that there will be no
> > branch in the generated code (see the pg_attribute_always_inline
> > trick).  If, in a future release, we need the match bit for parallel
> > hash join because we add parallel right/full hash join support, we
> > could do it the way you showed, but only if it's one of those join
> > types, using another constant parameter.
>
> Can we base the test off the match type today, and avoid leaving
> something that will need to be fixed later?

I agree that it'd be nicer to use the logically correct thing, namely
a test of HJ_FILL_INNER(node), but that'd be a run-time check.  I'd
like to back-patch this and figured that we don't want to add new
branches too casually.

I have an experimental patch where "fill_inner" and "fill_outer" are
compile-time constants and you can skip various bits of code without
branching (part of a larger experiment to figure out which of many
parameters are worth specialising at a cost of a couple of KB of text
per combination, including the ability to use wider hashes so that
monster sized joins work better).  Then I could test the logically
correct thing explicitly without branches.
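
To illustrate the constant-folding trick being discussed, here is a simplified
sketch (not the actual hash join insert code; function names are made up):

#include "postgres.h"
#include "access/htup_details.h"
#include "executor/hashjoin.h"

/*
 * 'parallel' is a compile-time constant at each call site below, so the
 * branch is folded away in the generated code.
 */
static pg_attribute_always_inline void
insert_hash_tuple_impl(HashJoinTuple hashTuple, bool parallel)
{
	if (!parallel)
	{
		/* the match bit is only needed by (non-parallel) right/full joins today */
		HeapTupleHeaderClearMatch(HJTUPLE_MINTUPLE(hashTuple));
	}
}

static void
insert_hash_tuple_serial(HashJoinTuple hashTuple)
{
	insert_hash_tuple_impl(hashTuple, false);	/* keeps the write */
}

static void
insert_hash_tuple_parallel(HashJoinTuple hashTuple)
{
	insert_hash_tuple_impl(hashTuple, true);	/* write elided entirely */
}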


0001-Avoid-unnecessary-shmem-writes-in-Parallel-Hash-Join.patch
Description: Binary data


Re: Error message inconsistency

2020-01-20 Thread Amit Kapila
On Thu, Jan 9, 2020 at 5:42 PM MBeena Emerson  wrote:
>

Hi Beena,

It is better to reply inline.

> Hi Mahendra,
>
> Thanks for the patch.
> I am not sure but maybe the relation name should also be added to the 
> following test case?
>
> create table t4 (id int);
> insert into t4 values (1);
> ALTER TABLE t4 ADD CONSTRAINT c1 CHECK (id > 10) NOT VALID; -- succeeds
> ALTER TABLE t4 VALIDATE CONSTRAINT c1;
> ERROR:  check constraint "c1" is violated by some row
>

I see that in this case, we are using errtableconstraint which should
set table/schema name, but then that doesn't seem to be used.  Can we
explore it a bit from that angle?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: Option to dump foreign data in pg_dump

2020-01-20 Thread vignesh C
On Mon, Jan 20, 2020 at 8:34 PM Luis Carril  wrote:
>
>
> Hi Vignesh,
>
>    Yes, you are right; I could reproduce it also with 'file_fdw'. The issue is 
> that LOCK is not supported on foreign tables, so I guess that the safest 
> solution is to make --include-foreign-data incompatible with --jobs, 
> because skipping the locking for foreign tables may lead to a deadlock 
> anyway. Suggestions?
>

Yes, we can support --include-foreign-data without the parallel option and
later add support for the parallel option in a separate patch.

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com




Re: Physical replication slot advance is not persistent

2020-01-20 Thread Michael Paquier
On Mon, Jan 20, 2020 at 11:00:14AM -0800, Andres Freund wrote:
> On 2020-01-20 15:45:40 +0900, Michael Paquier wrote:
>> 1) The slot advancing has to mark the slot as dirty, but should we
>> make the change persistent at the end of the function or should we
>> wait for a checkpoint to do the work, meaning that any update done to
>> the slot would be lost if a crash occurs in-between?  Note that we
>> have this commit in slotfuncs.c for
>> pg_logical_replication_slot_advance():
>>  * Dirty the slot so it's written out at the next checkpoint.
>>  * We'll still lose its position on crash, as documented, but it's
>>  * better than always losing the position even on clean restart.
>> 
>> This comment refers to the documentation for the logical decoding
>> section (see logicaldecoding-replication-slots in
>> logicaldecoding.sgml), and even if nothing can be done until the slot
>> advance function reaches its hand, we ought to make the data
>> persistent if we can.
> 
> That doesn't really seem like a meaningful reference, because the
> concerns between constantly streaming out changes (where we don't want
> to fsync every single transaction), and doing so in a manual advance
> through an sql function, seem different.

No disagreement with that, still it is the only reference we have in
the docs about that.  I think that we should take the occasion to
update the docs of the advancing functions accordingly with what we
think is the best choice; should the slot information be flushed at
the end of the function, or at the follow-up checkpoint?

>> 3) The amount of testing related to slot advancing could be better
>> with cluster-wide operations.
>> 
>> @@ -370,6 +370,11 @@ pg_physical_replication_slot_advance(XLogRecPtr
>> moveto)
>> MyReplicationSlot->data.restart_lsn = moveto;
>> 
>> SpinLockRelease(&MyReplicationSlot->mutex);
>> retlsn = moveto;
>> +
>> +   ReplicationSlotMarkDirty();
>> +
>> +   /* We moved restart_lsn, update the global value. */
>> +   ReplicationSlotsComputeRequiredLSN();
>> I think that the proposed patch is missing a call to
>> ReplicationSlotsComputeRequiredXmin() here for physical slots.
> 
> Hm. It seems ok to include, but I don't think omitting it currently has
> negative effects?

effective_xmin can be used by WAL senders with physical slots.  It
seems safer in the long term to include it, IMO.

>>  static XLogRecPtr
>> -pg_physical_replication_slot_advance(XLogRecPtr moveto)
>> +pg_physical_replication_slot_advance(XLogRecPtr moveto, bool *advance_done)
>>  {
>>  XLogRecPtr  startlsn = MyReplicationSlot->data.restart_lsn;
>>  XLogRecPtr  retlsn = startlsn;
>>  
>> +*advance_done = false;
>> +
>>  if (startlsn < moveto)
>>  {
>>  SpinLockAcquire(&MyReplicationSlot->mutex);
>>  MyReplicationSlot->data.restart_lsn = moveto;
>>  SpinLockRelease(&MyReplicationSlot->mutex);
>>  retlsn = moveto;
>> +*advance_done = true;
>>  }
>>  
>>  return retlsn;
> 
> Hm. Why did you choose not to use endlsn as before (except being
> broken), or something?

When doing repetitive calls of the advancing functions, the advancing
happens in the first call, and the next ones do nothing, so if no
update is done there is no point in flushing the slot information.

> It seems quite conceivable somebody is using these functions in an
> extension. 

Not sure I get that, pg_physical_replication_slot_advance and
pg_logical_replication_slot_advance are static in slotfuncs.c.

> Hm. I'm think testing this with real LSNs is a better idea. What if the
> node actually already is past FF/ at this point? Quite unlikely,
> I know, but still. I.e. why not get the current LSN after the INSERT,
> and assert that the slot, after the restart, is that?

Sure.  If not disabling autovacuum in the test, we'd just need to make
sure if that advancing is at least ahead of the INSERT position.

Anyway, I am still not sure whether we should go down the road of just
marking the slot as dirty if any advancing is done and letting the follow-up
checkpoint do the work, whether the advancing function should do the slot
flush itself, or whether we choose one and make the other an optional choice
(not for back-branches, obviously).  Based on my reading of the code, my
guess is that a flush should happen at the end of the advancing
function.
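
Putting those pieces together, the physical advancing path would then look
roughly like the fragment below (a sketch of the direction discussed here, not
the final patch):

	SpinLockAcquire(&MyReplicationSlot->mutex);
	MyReplicationSlot->data.restart_lsn = moveto;
	SpinLockRelease(&MyReplicationSlot->mutex);
	retlsn = moveto;

	ReplicationSlotMarkDirty();

	/* recompute the cluster-wide minima derived from slot positions */
	ReplicationSlotsComputeRequiredLSN();
	ReplicationSlotsComputeRequiredXmin(false);

	/* if the function is to guarantee persistence, flush the slot here */
	ReplicationSlotSave();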
--
Michael


signature.asc
Description: PGP signature


Re: pg13 PGDLLIMPORT list

2020-01-20 Thread Michael Paquier
On Tue, Jan 21, 2020 at 08:34:05AM +0530, Amit Kapila wrote:
> As such no objection, but I am not sure if the other person need it on
> back branches as well.  Are you planning to commit this, or if you
> want I can take care of it?

Thanks for the reminder.  Done now.  I have also reordered the
surrounding parameters while at it so they are consistent.
--
Michael


signature.asc
Description: PGP signature


Re: PATCH: standby crashed when replay block which truncated in standby but failed to truncate in master node

2020-01-20 Thread Michael Paquier
On Tue, Jan 21, 2020 at 08:45:14AM +0530, Amit Kapila wrote:
> The original email doesn't say so.  I might be missing something, but
> can you explain what makes you think so.

Oops.  Incorrect thread, I was thinking about this one previously:
https://www.postgresql.org/message-id/822113470.250068.1573246011...@connect.xfinity.com

Re-reading the root of the thread, I am still not sure what we could
do, as that's rather tricky.  See here:
https://www.postgresql.org/message-id/20190927061414.gf8...@paquier.xyz
--
Michael


signature.asc
Description: PGP signature


Re: PATCH: standby crashed when replay block which truncated in standby but failed to truncate in master node

2020-01-20 Thread Amit Kapila
On Tue, Jan 21, 2020 at 6:05 AM Michael Paquier  wrote:
>
> On Mon, Jan 20, 2020 at 02:13:53PM +0530, Amit Kapila wrote:
> > Are we planning to do something about the original problem reported in
> > this thread?
>
> We should.  This is on my TODO list, though seeing that it involved
> full_page_writes=off I drifted a bit away from it.
>

The original email doesn't say so.  I might be missing something, but
can you explain what makes you think so?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: Physical replication slot advance is not persistent

2020-01-20 Thread Michael Paquier
On Tue, Jan 21, 2020 at 09:44:12AM +0800, Craig Ringer wrote:
> PLEASE do not make the streaming replication interface force flushes!

Yeah, that's a bad idea.  FWIW, my understanding is that this was
only proposed in v3, and it has been discarded:
https://www.postgresql.org/message-id/175c2760666a78205e053207794c0...@postgrespro.ru

> The replication interface should not immediately flush changes to the
> slot replay position on advance. It should be marked dirty and left to
> be flushed by the next checkpoint. Doing otherwise potentially
> introduces a lot of unnecessary fsync()s and may have an unpleasant
> impact on performance.

Some portions of the advancing code tell a different story.  It seems
to me that the intention behind the first implementation of slot
advancing was to get things flushed if any advancing was done.  The
check doing that has actually been broken from the start, but that's another
story.  Could you check with Petr what the intention was here, or draw
his attention to this thread?  He is the original author of the
feature, so his input would be nice to have.

> Clients of the replication protocol interface should be doing their
> own position tracking on the client side. They should not ever be
> relying on the server side restart position for correctness, since it
> can go backwards on crash and restart. Any that do rely on it are
> incorrect. I should propose a docs change that explains how the server
> and client restart position tracking interacts on both phy and logical
> rep since it's not really covered right now and naïve client
> implementations will be wrong.
>
> I don't really care if the SQL interface forces an immediate flush
> since it's never going to have good performance anyway.

Okay, the flush could be optional as well, but that's a different
discussion.  The docs of logical decoding mention that slot data may
go backwards in the event of a crash.  If you have improvements for
that, surely that's welcome.

> The SQL interface advances the slot restart position and marks the
> slot dirty *before the client has confirmed receipt of the data and
> flushed it to disk*. So there's a data loss window. If the client
> disconnects or crashes before all the data from that function call is
> safely flushed to disk it may lose the data, then be unable to fetch
> it again from the server because of the restart_lsn position advance.

Well, you have the same class of problems with, for example, synchronous
replication.  The only state a client can be sure of is that it
received a confirmation that the operation happened and completed.
Any other state tells you that the operation may have happened.  Or not.
Now, being sure that the new slot data has been flushed by the time
the advancing function returns and the client gets the confirmation
that the work is done is a property which could be interesting to some
classes of applications.

> Really, we should add a "no_advance_position" option to the SQL
> interface, then expect the client to call a second function that
> explicitly advances the restart_lsn and confirmed_flush_lsn. Otherwise
> no SQL interface client can be crashsafe.

Hm.  Could you elaborate more what you mean here?  I am not sure to
understand.  Note that calling the advance function multiple times has
no effects, and the result returned is the actual restart_lsn (or
confirmed_flush_lsn of course).
--
Michael


signature.asc
Description: PGP signature


Re: Minimal logical decoding on standbys

2020-01-20 Thread Andreas Joseph Krogh

On Tuesday, 21 January 2020 at 03:57:42, Amit Khandekar <amitdkhan...@gmail.com> wrote:
[...]
 Sorry for the late reply.
 This patch only supports logical decoding from standby. So it's just
 an infrastructure for supporting logical replication from standby. We
 don't support creating a publication from standby, but the publication
 on master is replicated on standby, so we might be able to create
 subscription nodes that connect to existing publications on standby,
 but basically we haven't tested whether the publication/subscription
 model works with a publication on a physical standby. This patch is
 focussed on providing a way to continue logical replication *after*
 the standby is promoted as master. 


Thanks for clarifying. 


--
 Andreas Joseph Krogh

Re: pg13 PGDLLIMPORT list

2020-01-20 Thread Amit Kapila
On Sat, Jan 18, 2020 at 3:19 PM Michael Paquier  wrote:
>
> On Sat, Jan 18, 2020 at 09:19:19AM +0200, Julien Rouhaud wrote:
> > On Sat, 18 Jan 2020, 09:04 Amit Kapila wrote:
> >> +1 for adding PGDLLIMPORT to these variables.  In the past, we have
> >> added it on the request of some extension authors, so I don't see any
> >> problem doing this time as well.
> >>
> >
> > +1 too
>
> Thanks.  If there are no objections, I would like to actually
> back-patch that.
>

As such, no objection, but I am not sure if the other person needs it on
back branches as well.  Are you planning to commit this, or, if you
want, I can take care of it?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: Minimal logical decoding on standbys

2020-01-20 Thread Amit Khandekar
On Fri, 17 Jan 2020 at 13:20, Andreas Joseph Krogh  wrote:
>
> On Thursday, January 16, 2020 at 05:42:24, Amit Khandekar
> wrote:
>
> On Fri, 10 Jan 2020 at 17:50, Rahila Syed  wrote:
> >
> > Hi Amit,
> >
> > Can you please rebase the patches as they don't apply on latest master?
>
> Thanks for notifying. Attached is the rebased version.
>
>
> Will this patch enable logical replication from a standby-server?

Sorry for the late reply.
This patch only supports logical decoding from standby. So it's just
an infrastructure for supporting logical replication from standby. We
don't support creating a publication from standby, but the publication
on master is replicated on standby, so we might be able to create
subscription nodes that connect to existing publications on standby,
but basically we haven't tested whether the publication/subscription
model works with a publication on a physical standby. This patch is
focussed on providing a way to continue logical replication *after*
the standby is promoted as master.


--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company




Re: Physical replication slot advance is not persistent

2020-01-20 Thread Craig Ringer
On Fri, 17 Jan 2020 at 01:09, Alexey Kondratov
 wrote:
>
> > I think we shouldn't touch the paths used by replication protocol. And
> > don't we focus on how we make a change of a replication slot from SQL
> > interface persistent?  It seems to me that generally we don't need to
> > save dirty slots other than checkpoint, but the SQL function seems
> > wanting the change to be saved immediately.

PLEASE do not make the streaming replication interface force flushes!

The replication interface should not immediately flush changes to the
slot replay position on advance. It should be marked dirty and left to
be flushed by the next checkpoint. Doing otherwise potentially
introduces a lot of unnecessary fsync()s and may have an unpleasant
impact on performance.

Clients of the replication protocol interface should be doing their
own position tracking on the client side. They should not ever be
relying on the server side restart position for correctness, since it
can go backwards on crash and restart. Any that do rely on it are
incorrect. I should propose a docs change that explains how the server
and client restart position tracking interacts on both phy and logical
rep since it's not really covered right now and naïve client
implementations will be wrong.

I don't really care if the SQL interface forces an immediate flush
since it's never going to have good performance anyway.

It's already impossible to write a strictly correct and crash safe
client with the SQL interface. Adding forced flushing won't make that
any better or worse.

The SQL interface advances the slot restart position and marks the
slot dirty *before the client has confirmed receipt of the data and
flushed it to disk*. So there's a data loss window. If the client
disconnects or crashes before all the data from that function call is
safely flushed to disk it may lose the data, then be unable to fetch
it again from the server because of the restart_lsn position advance.

Really, we should add a "no_advance_position" option to the SQL
interface, then expect the client to call a second function that
explicitly advances the restart_lsn and confirmed_flush_lsn. Otherwise
no SQL interface client can be crashsafe.




Re: [HACKERS] kqueue

2020-01-20 Thread Thomas Munro
On Tue, Jan 21, 2020 at 8:03 AM Tom Lane  wrote:
> I observe very similar behavior on FreeBSD/amd64 12.0-RELEASE-p12,
> so it's not just macOS.

Thanks for testing.  Fixed by handling the new
exit_on_postmaster_death flag from commit cfdf4dc4.

On Tue, Jan 21, 2020 at 5:55 AM Tom Lane  wrote:
> Thomas Munro  writes:
> > [ 0001-Add-kqueue-2-support-for-WaitEventSet-v13.patch ]
>
> I haven't read this patch in any detail, but a couple quick notes:
>
> * It needs to be rebased over the removal of pg_config.h.win32
> --- it should be touching Solution.pm instead, I believe.

Done.

> * I'm disturbed by the addition of a hunk to the supposedly
> system-API-independent WaitEventSetWait() function.  Is that
> a generic bug fix?  If not, can we either get rid of it, or
> at least wrap it in "#ifdef WAIT_USE_KQUEUE" so that this
> patch isn't inflicting a performance penalty on everyone else?

Here's a version that adds no new code to non-WAIT_USE_KQUEUE paths.
That code deals with the fact that we sometimes discover the
postmaster is gone before we're in a position to report an event, so
we need an inter-function memory of some kind.  The new coding also
handles a race case where someone reuses the postmaster's pid before
we notice it went away.  In theory, the need for that could be
entirely removed by collapsing the 'adjust' call into the 'wait' call
(a single kevent() invocation can do both things), but I'm not sure if
it's worth the complexity.  As for generally reducing syscalls noise,
for both kqueue and epoll, I think that should be addressed separately
by better reuse of WaitEventSet objects[1].

[1] 
https://www.postgresql.org/message-id/CA%2BhUKGJAC4Oqao%3DqforhNey20J8CiG2R%3DoBPqvfR0vOJrFysGw%40mail.gmail.com


0001-Add-kqueue-2-support-for-WaitEventSet-v14.patch
Description: Binary data


Re: Index Skip Scan

2020-01-20 Thread Peter Geoghegan
On Mon, Jan 20, 2020 at 1:19 PM Peter Geoghegan  wrote:
> On Mon, Jan 20, 2020 at 11:01 AM Jesper Pedersen
>  wrote:
> > > - nbtsearch.c _bt_skip line 1440
> > > if (BTScanPosIsValid(so->currPos) &&
> > >   _bt_scankey_within_page(scan, so->skipScanKey, 
> > > so->currPos.buf, dir))
> > >
> > > Is it allowed to look at the high key / low key of the page without having 
> > > a read lock on it?
> > >
> >
> > In case of a split the page will still contain a high key and a low key,
> > so this should be ok.
>
> This is definitely not okay.

I suggest that you find a way to add assertions to code like
_bt_readpage() that verify that we do in fact have the buffer content
lock. Actually, there is an existing assertion here that covers the
pin, but not the buffer content lock:

static bool
_bt_readpage(IndexScanDesc scan, ScanDirection dir, OffsetNumber offnum)
{

...

/*
 * We must have the buffer pinned and locked, but the usual macro can't be
 * used here; this function is what makes it good for currPos.
 */
Assert(BufferIsValid(so->currPos.buf));

You can add another assertion that calls a new utility function in
bufmgr.c. That can use the same logic as this existing assertion in
FlushOneBuffer():

Assert(LWLockHeldByMe(BufferDescriptorGetContentLock(bufHdr)));

We haven't needed assertions like this so far because it is usually
clear whether or not a buffer lock is held (plus the bufmgr.c
assertions help on their own). The fact that it isn't clear whether or
not a buffer lock will be held by the caller here suggests a problem. Even
still, having some guard rails in the form of these assertions could
be helpful. Also, it seems like _bt_scankey_within_page() should have
a similar set of assertions.
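
Something along these lines, perhaps (a rough sketch; the helper name and
exact placement are my invention, not from any patch, and it just wraps
the same check used by the existing FlushOneBuffer() assertion):

bool
BufferLockHeldByMe(Buffer buffer)
{
    BufferDesc *bufHdr;

    /* Local buffers are not protected by content locks */
    if (BufferIsLocal(buffer))
        return true;

    bufHdr = GetBufferDescriptor(buffer - 1);
    return LWLockHeldByMe(BufferDescriptorGetContentLock(bufHdr));
}

and then, in _bt_readpage() and _bt_scankey_within_page():

    Assert(BufferIsValid(so->currPos.buf));
    Assert(BufferLockHeldByMe(so->currPos.buf));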

BTW, there is a paper that describes optimizations like loose index
scan and skip scan together, in fairly general terms: "Efficient
Search of Multidimensional B-Trees". Loose index scans are given the
name "MDAM duplicate elimination" in the paper. See:

http://vldb.org/conf/1995/P710.PDF

Goetz Graefe told me about the paper. It seems like the closest thing
that exists to a taxonomy or conceptual framework for these
techniques.

-- 
Peter Geoghegan




Re: libxml2 is dropping xml2-config

2020-01-20 Thread Tom Lane
Christoph Berg  writes:
> Re: To PostgreSQL Hackers 2020-01-20 <20200120204715.ga73...@msg.df7cb.de>
>> Debian reports that libxml2 is dropping the xml2-config binary:

> Please disregard that, I had assumed this was a change made by libxml2
> upstream. I'm in contact with the libxml2 Debian maintainer to get
> that change off the table.

I was wondering about that --- I had thought libxml2 upstream was
not terribly active anymore.

> (Putting in support for pkg-config still makes sense, though.)

Perhaps.  Are there any platforms where libxml2 doesn't install a
pkg-config file?  What are we supposed to do if there's no pkg-config?
(The workaround we have for that with ICU is sufficiently bletcherous
that I'm not eager to replicate it for libxml2...)

regards, tom lane




Re: Rearranging ALTER TABLE to avoid multi-operations bugs

2020-01-20 Thread Tom Lane
I wrote:
> The squirrely-ness around identity is that while this now works:

> regression=# CREATE TABLE itest8 (f1 int);
> CREATE TABLE
> regression=# ALTER TABLE itest8
> regression-#   ADD COLUMN f2 int NOT NULL,
> regression-#   ALTER COLUMN f2 ADD GENERATED ALWAYS AS IDENTITY;
> ALTER TABLE

> it doesn't work if there's rows in the table:

> regression=# CREATE TABLE itest8 (f1 int);
> CREATE TABLE
> regression=# insert into itest8 default values;
> INSERT 0 1
> regression=# ALTER TABLE itest8
>   ADD COLUMN f2 int NOT NULL,
>   ALTER COLUMN f2 ADD GENERATED ALWAYS AS IDENTITY;
> ERROR:  column "f2" contains null values

> The same would be true if you tried to do the ALTER as two separate
> operations (because the ADD ... NOT NULL, without a default, will
> naturally fail on a nonempty table).  So I don't feel *too* awful
> about that.  But it'd be better if this worked.

After further poking at that, I've concluded that maybe this is not
a bug but operating as designed.  Adding the GENERATED property in a
separate step is arguably equivalent to setting a plain default in
a separate step, and look at how we handle that:

regression=# create table t1(x int);
CREATE TABLE
regression=# insert into t1 values(1);
INSERT 0 1
regression=# alter table t1 add column y int default 11,
  alter column y set default 12;
ALTER TABLE
regression=# table t1;
 x | y  
---+
 1 | 11
(1 row)

This is documented, rather opaquely perhaps, for the SET DEFAULT
case:

SET/DROP DEFAULT

These forms set or remove the default value for a column. Default
values only apply in subsequent INSERT or UPDATE commands; they do not
cause rows already in the table to change.

So the design principle here seems to be that we fill the column
using whatever is specified *in the ADD COLUMN subcommand*, and
any screwing-about in other subcommands just affects what the
behavior will be in subsequent INSERT commands.  That's a little
weird but it has potential use-cases.  If we attempt to apply the
"new" default immediately then this syntax devolves to having the
same effects as a simple ADD-COLUMN-with-default.  There's not
a lot of reason to write the longer form if that's what you wanted.

So I'm now inclined to think that the code is all right.  We could
improve the documentation, perhaps, with an explicit example.
Also, the man page's entry for SET GENERATED says nothing of this,
but it likely ought to say the same thing as SET DEFAULT.

Also, we don't really have any test cases proving it works that way.
Simple tests, such as the one above, are not too trustworthy because
the attmissingval optimization tends to hide what's really happening.
(I found this out the hard way while messing with a patch to change
the behavior --- which I now think we shouldn't do, anyhow.)

regards, tom lane




Reducing WaitEventSet syscall churn

2020-01-20 Thread Thomas Munro
Hi,

Here are some patches to get rid of frequent system calls.

0001 changes all qualifying WaitLatch() calls to use a new function
WaitMyLatch() that reuses a common WaitEventSet.  That's for callers
that only want to wait for their own latch or an optional timeout,
with automatic exit-on-postmaster-death.

0002 changes condition_variable.c to use WaitMyLatch(), instead of its
own local thing like that.  Perhaps this makes up for the use of the
extra fd consumed by 0001.

0003 changes pgstat.c to use its own local reusable WaitEventSet.

To see what I'm talking about, try tracing a whole cluster with eg
strace/truss/dtruss -f postgres -D pgdata.  This applies to Linux
systems, or BSD/macOS systems with the nearby kqueue patch applied.
On systems that fall back to poll(), there aren't any setup/teardown
syscalls.
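
For illustration, the core idea in 0001 is roughly the following (my own
sketch, not the actual patch; the real WaitMyLatch() signature and error
handling may well differ):

static WaitEventSet *MyLatchSet = NULL;

static int
WaitMyLatch(long timeout, uint32 wait_event_info)
{
    WaitEvent   event;
    int         nevents;

    /* Build the set once, then keep reusing it across calls. */
    if (MyLatchSet == NULL)
    {
        MyLatchSet = CreateWaitEventSet(TopMemoryContext, 2);
        AddWaitEventToSet(MyLatchSet, WL_LATCH_SET, PGINVALID_SOCKET,
                          MyLatch, NULL);
        AddWaitEventToSet(MyLatchSet, WL_EXIT_ON_PM_DEATH, PGINVALID_SOCKET,
                          NULL, NULL);
    }

    nevents = WaitEventSetWait(MyLatchSet, timeout, &event, 1,
                               wait_event_info);

    return (nevents > 0) ? event.events : WL_TIMEOUT;
}

That way the epoll/kqueue file descriptor and its registrations survive
across waits, instead of being created and torn down every time.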


0001-Add-WaitMyLatch-to-replace-many-WaitLatch-calls.patch
Description: Binary data


0002-Use-WaitMyLatch-for-condition-variables.patch
Description: Binary data


0003-Introduce-a-reusable-WaitEventSet-for-the-stats-coll.patch
Description: Binary data


Re: Physical replication slot advance is not persistent

2020-01-20 Thread Kyotaro Horiguchi
Thanks for looking this.

At Mon, 20 Jan 2020 11:00:14 -0800, Andres Freund  wrote in 
> > Here is a summary of the points raised (please correct me if that
> > does not sound right to you, Andres):
> 
> > 1) The slot advancing has to mark the slot as dirty, but should we
> > make the change persistent at the end of the function or should we
> > wait for a checkpoint to do the work, meaning that any update done to
> > the slot would be lost if a crash occurs in-between?  Note that we
> > have this commit in slotfuncs.c for
> > pg_logical_replication_slot_advance():
> >  * Dirty the slot so it's written out at the next checkpoint.
> >  * We'll still lose its position on crash, as documented, but it's
> >  * better than always losing the position even on clean restart.
> > 
> > This comment refers to the documentation for the logical decoding
> > section (see logicaldecoding-replication-slots in
> > logicaldecoding.sgml), and even if nothing can be done until the slot
> > advance function reaches its hand, we ought to make the data
> > persistent if we can.
> 
> That doesn't really seem like a meaningful reference, because the
> concerns between constantly streaming out changes (where we don't want
> to fsync every single transaction), and doing so in a manual advance
> through an sql function, seem different.

Yes, that is the reason I didn't suggest not to save the file there.
I don't have a clear opinion on it, but I agree that users expect that
any changes they make from the SQL interface should survive
crash recovery.

> > 3) The amount of testing related to slot advancing could be better
> > with cluster-wide operations.
> > 
> > @@ -370,6 +370,11 @@ pg_physical_replication_slot_advance(XLogRecPtr
> > moveto)
> > MyReplicationSlot->data.restart_lsn = moveto;
> > 
> > SpinLockRelease(&MyReplicationSlot->mutex);
> > retlsn = moveto;
> > +
> > +   ReplicationSlotMarkDirty();
> > +
> > +   /* We moved retart_lsn, update the global value. */
> > +   ReplicationSlotsComputeRequiredLSN();
> > I think that the proposed patch is missing a call to
> > ReplicationSlotsComputeRequiredXmin() here for physical slots.
> 
> Hm. It seems ok to include, but I don't think omitting it currently has
> negative effects?

I think not.  It is updated sooner or later once replication proceeds
and a reply message is received.
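
For reference, the combined change being discussed would presumably end up
looking something like this in pg_physical_replication_slot_advance() (a
sketch assembled from the hunk quoted above plus the missing call, not the
final patch):

    if (startlsn < moveto)
    {
        SpinLockAcquire(&MyReplicationSlot->mutex);
        MyReplicationSlot->data.restart_lsn = moveto;
        SpinLockRelease(&MyReplicationSlot->mutex);
        retlsn = moveto;

        ReplicationSlotMarkDirty();

        /* Recompute the global minimums after moving restart_lsn. */
        ReplicationSlotsComputeRequiredXmin(false);
        ReplicationSlotsComputeRequiredLSN();
    }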

> > So, I have been looking at this patch by myself, and updated it so as
> > the extra slot save is done only if any advancing has been done, on
> > top of the other computations that had better be around for
> > consistency.
> 
> Hm, I don't necessarily what that's necessary.

On the other hand, there is no negative effect from the extra saving of
the file, as long as the SQL function itself is not called extremely
frequently.  If I read Andres's comment above correctly, I agree not to
add complexity to suppress the "needless" saving of the file.

> > The patch includes TAP tests for physical and logical slots'
> > durability across restarts.
> 
> Cool!
> 
> 
> > diff --git a/src/backend/replication/slotfuncs.c 
> > b/src/backend/replication/slotfuncs.c
> > index bb69683e2a..af3e114fc9 100644
> > --- a/src/backend/replication/slotfuncs.c
> > +++ b/src/backend/replication/slotfuncs.c
> > @@ -359,17 +359,20 @@ pg_get_replication_slots(PG_FUNCTION_ARGS)
> >   * checkpoints.
> >   */
> >  static XLogRecPtr
> > -pg_physical_replication_slot_advance(XLogRecPtr moveto)
> > +pg_physical_replication_slot_advance(XLogRecPtr moveto, bool *advance_done)
> >  {
> > XLogRecPtr  startlsn = MyReplicationSlot->data.restart_lsn;
> > XLogRecPtr  retlsn = startlsn;
> >  
> > +   *advance_done = false;
> > +
> > if (startlsn < moveto)
> > {
> > SpinLockAcquire(&MyReplicationSlot->mutex);
> > MyReplicationSlot->data.restart_lsn = moveto;
> > SpinLockRelease(&MyReplicationSlot->mutex);
> > retlsn = moveto;
> > +   *advance_done = true;
> > }
> >  
> > return retlsn;
> 
> Hm. Why did you choose not to use endlsn as before (except being
> broken), or something? It seems quite conceivable somebody is using
> these functions in an extension.
> 
> 
> 
> 
> > +# Test physical slot advancing and its durability.  Create a new slot on
> > +# the primary, not used by any of the standbys. This reserves WAL at 
> > creation.
> > +my $phys_slot = 'phys_slot';
> > +$node_master->safe_psql('postgres',
> > +   "SELECT pg_create_physical_replication_slot('$phys_slot', true);");
> > +$node_master->psql('postgres', "
> > +   CREATE TABLE tab_phys_slot (a int);
> > +   INSERT INTO tab_phys_slot VALUES (generate_series(1,10));");
> > +my $psql_rc = $node_master->psql('postgres',
> > +   "SELECT pg_replication_slot_advance('$phys_slot', 'FF/');");
> > +is($psql_rc, '0', 'slot advancing works with physical slot');
> 
> Hm. I think testing this with real LSNs is a better idea. What if the
> node actually already is past FF/ at this point? Quite unlikely,
> I know, but still. I.e. why not get the current LSN 

Re: PATCH: standby crashed when replay block which truncated in standby but failed to truncate in master node

2020-01-20 Thread Michael Paquier
On Mon, Jan 20, 2020 at 02:13:53PM +0530, Amit Kapila wrote:
> Are we planning to do something about the original problem reported in
> this thread?

We should.  This is on my TODO list, though seeing that it involved
full_page_writes=off I drifted a bit away from it.
--
Michael




Re: Decade indication

2020-01-20 Thread Isaac Morland
On Fri, 17 Jan 2020 at 17:52, Bruce Momjian  wrote:


> I assume there is enough agreement that decades start on 20X0 that we
> don't need to document that Postgres does that.
>

I think the inconsistency between years, decades, centuries, and millennia
is worthy of documentation. In fact, it already is for EXTRACT:

https://www.postgresql.org/docs/current/functions-datetime.html#FUNCTIONS-DATETIME-EXTRACT

It describes decade as "The year field divided by 10", whereas for century
and millennium it refers to centuries and millennia beginning in '01 years.
I think if I were designing EXTRACT I would probably have decades follow
the pattern of century and millennium, mostly because if somebody wants
year / 10 they can just write that. But I am, to say the least, not
proposing any modifications to this particular API, for multiple reasons
which I'm sure almost any reader of this list will agree with.


Re: BRIN cost estimate breaks geometric indexes

2020-01-20 Thread Egor Rogov

On 21.01.2020 0:00, Darafei "Komяpa" Praliaskouski wrote:

Hi,

Found out today that BRIN indexes don't really work for PostGIS and 
box datatypes.


Since 
https://github.com/postgres/postgres/commit/7e534adcdc70866e7be74d626b0ed067c890a251 Postgres 
requires the datatype to provide correlation statistics. Such statistics 
weren't provided by PostGIS or the box types.


Today I tried to replace a 200gb gist index with 8mb brin index and 
queries didn't work as expected - it was never used. set 
enable_seqscan=off helped for a bit but that's not a permanent solution.
Plans for context: 
https://gist.github.com/Komzpa/2cd396ec9b65e2c93341e9934d974826


Debugging session on #postgis IRC channel leads to this ticket to 
create a (not that meaningful) correlation statistics for geometry 
datatype: https://trac.osgeo.org/postgis/ticket/4625#ticket


Postgres Professional mentioned symptoms of the issue in their 
in-depth manual: 
https://habr.com/ru/company/postgrespro/blog/346460/ - box datatype 
showed same unusable BRIN symptoms for them.



(Translated to English: 
https://habr.com/en/company/postgrespro/blog/452900/)



A reasonable course of action on the Postgres side seems to be to not 
assume a selectivity of 1 in the absence of correlation statistics, but 
something that would make such an index preferable to a parallel seq scan 
while still costing it higher than a similar GiST index.


Any other ideas?



As far as I understand, correlation is computed only for sortable types, 
which means that the current concept of correlation works as intended 
only for B-tree indexes.


Ideally, correlation should be computed for each (attribute, index) pair, 
taking into account the order of values returned by the index scan. A less 
ideal but easier approach would be to ignore the computed correlation for 
any index access method except B-tree, and just assume some predefined 
constant.







Re: error context for vacuum to include block number

2020-01-20 Thread Justin Pryzby
On Mon, Jan 20, 2020 at 11:11:20AM -0800, Andres Freund wrote:
> This I do not get. I didn't yet fully wake up, so I might just be slow?

It was needlessly cute at the cost of clarity (meant to avoid setting
error_context_stack in lazy_scan_heap and again immediately on its return).

On Mon, Jan 20, 2020 at 11:13:05AM -0800, Andres Freund wrote:
> I was thinking that you could just use LVRelStats.

Done.

On Mon, Jan 20, 2020 at 11:11:20AM -0800, Andres Freund wrote:
> Alternatively we could push another context for each index inside
> lazy_vacuum_all_indexes(). There's been plenty bugs in indexes
> triggering problems, so that could be worthwhile.

Did this too, although I'm not sure what kind of errors it'd find (?)

I considered eliminating other uses of RelationGetRelationName, or looping over
vacrelstats->blkno instead of a local blkno.  I did that in an additional patch
(which will cause conflicts if you try to apply it, due to the other vacuum patch in
this branch).

CREATE TABLE t AS SELECT generate_series(1,9)a;

postgres=# SET client_min_messages=debug;SET statement_timeout=39; VACUUM 
(VERBOSE, PARALLEL 0) t;
INFO:  vacuuming "public.t"
2020-01-20 15:46:14.993 CST [20056] ERROR:  canceling statement due to 
statement timeout
2020-01-20 15:46:14.993 CST [20056] CONTEXT:  while scanning block 211 of 
relation "public.t"
2020-01-20 15:46:14.993 CST [20056] STATEMENT:  VACUUM (VERBOSE, PARALLEL 0) t;
ERROR:  canceling statement due to statement timeout
CONTEXT:  while scanning block 211 of relation "public.t"

SELECT 'CREATE INDEX ON t(a)' FROM generate_series(1,11);\gexec
UPDATE t SET a=a+1;

postgres=# SET client_min_messages=debug;SET statement_timeout=99; VACUUM 
(VERBOSE, PARALLEL 0) t;
INFO:  vacuuming "public.t"
DEBUG:  "t_a_idx": vacuuming index
2020-01-20 15:47:36.338 CST [20139] ERROR:  canceling statement due to 
statement timeout
2020-01-20 15:47:36.338 CST [20139] CONTEXT:  while vacuuming relation 
"public.t_a_idx"
2020-01-20 15:47:36.338 CST [20139] STATEMENT:  VACUUM (VERBOSE, PARALLEL 0) t;
ERROR:  canceling statement due to statement timeout
CONTEXT:  while vacuuming relation "public.t_a_idx"

I haven't found a good way of exercising the "vacuuming heap" path, though.
From 44f5eaaef66c570395a9af2bdbe74943c9163c4d Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Thu, 12 Dec 2019 20:54:37 -0600
Subject: [PATCH v10 1/5] vacuum errcontext to show block being processed

As requested here.
https://www.postgresql.org/message-id/20190807235154.erbmr4o4bo6vgnjv%40alap3.anarazel.de
---
 src/backend/access/heap/vacuumlazy.c | 38 ++--
 1 file changed, 36 insertions(+), 2 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index bd2e7fb..2eb3caa 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -290,8 +290,12 @@ typedef struct LVRelStats
 	int			num_index_scans;
 	TransactionId latestRemovedXid;
 	bool		lock_waiter_detected;
-} LVRelStats;
 
+	/* Used by the error callback */
+	char		*relname;
+	char 		*relnamespace;
+	BlockNumber blkno;
+} LVRelStats;
 
 /* A few variables that don't seem worth passing around as parameters */
 static int	elevel = -1;
@@ -361,6 +365,7 @@ static void end_parallel_vacuum(Relation *Irel, IndexBulkDeleteResult **stats,
 LVParallelState *lps, int nindexes);
 static LVSharedIndStats *get_indstats(LVShared *lvshared, int n);
 static bool skip_parallel_vacuum_index(Relation indrel, LVShared *lvshared);
+static void vacuum_error_callback(void *arg);
 
 
 /*
@@ -813,6 +818,7 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 		PROGRESS_VACUUM_MAX_DEAD_TUPLES
 	};
 	int64		initprog_val[3];
+	ErrorContextCallback errcallback;
 
 	pg_rusage_init(&ru0);
 
@@ -892,6 +898,16 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 	next_unskippable_block = 0;
 	skipping_blocks = skip_blocks(onerel, params, &next_unskippable_block, nblocks, &vmbuffer, aggressive);
 
+	/* Setup error traceback support for ereport() */
+	vacrelstats->relnamespace = get_namespace_name(RelationGetNamespace(onerel));
+	vacrelstats->relname = relname;
+	vacrelstats->blkno = InvalidBlockNumber; /* Not known yet */
+
+	errcallback.callback = vacuum_error_callback;
+	errcallback.arg = (void *) vacrelstats;
+	errcallback.previous = error_context_stack;
+	error_context_stack = &errcallback;
+
 	for (blkno = 0; blkno < nblocks; blkno++)
 	{
 		Buffer		buf;
@@ -913,6 +929,8 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 #define FORCE_CHECK_PAGE() \
 		(blkno == nblocks - 1 && should_attempt_truncation(params, vacrelstats))
 
+		vacrelstats->blkno = blkno;
+
 		pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, blkno);
 
 		if (blkno == next_unskippable_block)
@@ -979,11 +997,13 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
 			}
 
 			/* Work on all the indexes, 
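
(The diff is cut off here before the callback body; based on the fields
added to LVRelStats and the CONTEXT lines shown above, it presumably looks
something like the following. This is my reconstruction, not the actual
patch.)

static void
vacuum_error_callback(void *arg)
{
    LVRelStats *cbarg = (LVRelStats *) arg;

    if (BlockNumberIsValid(cbarg->blkno))
        errcontext("while scanning block %u of relation \"%s.%s\"",
                   cbarg->blkno, cbarg->relnamespace, cbarg->relname);
    else
        errcontext("while vacuuming relation \"%s.%s\"",
                   cbarg->relnamespace, cbarg->relname);
}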

Re: Increase psql's password buffer size

2020-01-20 Thread David Fetter
On Mon, Jan 20, 2020 at 09:17:47PM +0100, Alexander Kukushkin wrote:
> Hi,
> 
> I think I should add my two cents.
> 
> On Mon, 20 Jan 2020 at 20:38, Tom Lane  wrote:
> >
> > > I found another place that assumes 100 bytes and upped it to 2048.
> 
> There is one more place, in the code which parses .pgpass

What I found that seems like it might be related was on line 6900 of
src/interfaces/libpq/fe-connect.c (passwordFromFile):

#define LINELEN NAMEDATALEN*5

which is 315 (63*5) by default and isn't 100 on any sane setup. What
did I miss?

In any case, having the lengths be different in different places
seems sub-optimal. PGPASSWORD is just a const char *, so could be
quite long. The password prompted for by psql can be up to 100 bytes,
and the one read from .pgpass is bounded from above by 

315
- 4 (colons)
- 4 (shortest possible hostname)
- 4 (usual port number)
- 1 (shortest db name)
- 1 (shortest possible username)
---
301

> > So this is pretty much exactly what I expected.  And have you tried
> > it with e.g. PAM, or LDAP?
> >
> > I think the AWS guys are fools to imagine that this will work in very
> > many places, and I don't see why we should be leading the charge to
> > make it work for them.  What's the point of having a huge amount of
> > data in a password, anyway?
> 
> We at Zalando are using JWT tokens as passwords. JWT tokens are
> self-contained and therefore quite huge (up to 700-800 bytes in our
> case). Tokens have a limited lifetime (1 hour) and we are using PAM to
> verify them.
> Altogether the whole thing works like a charm. The only problem that
> it is not possible to copy the token into psql password prompt,
> but there is a workaround, export PGPASSWORD=verylongtokenstring &&
> psql
> 
> JWT: https://jwt.io/
> PAM module to verify OAuth tokens: https://github.com/CyberDem0n/pam-oauth2

This reminds me of a patch that implemented PGPASSCOMMAND.
https://www.postgresql.org/message-id/flat/CAE35ztOGZqgwae3mBA%3DL97pSg3kvin2xycQh%3Dir%3D5NiwCApiYQ%40mail.gmail.com

Discussion of that seems to have trailed off, though. My thought on
that was that it was making a decision about the presence of both a
.pgpass file and a PGPASSCOMMAND setting that it shouldn't have made,
i.e. it decided which took precedence.  I think it should fail when
presented with both, as there's not a single right answer that will
cover all cases.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: Index Skip Scan

2020-01-20 Thread Peter Geoghegan
On Mon, Jan 20, 2020 at 11:01 AM Jesper Pedersen
 wrote:
> > - nbtsearch.c _bt_skip line 1440
> > if (BTScanPosIsValid(so->currPos) &&
> >   _bt_scankey_within_page(scan, so->skipScanKey, 
> > so->currPos.buf, dir))
> >
> > Is it allowed to look at the high key / low key of the page without having a 
> > read lock on it?
> >
>
> In case of a split the page will still contain a high key and a low key,
> so this should be ok.

This is definitely not okay.

> > - nbtsearch.c in general
> > Most of the code seems to rely quite heavily on the fact that xs_want_itup 
> > forces _bt_drop_lock_and_maybe_pin to never release the buffer pin. Have 
> > you considered that compacting of a page may still happen even if you hold 
> > the pin? [1] I've been trying to come up with cases in which this may break 
> > the patch, but I haven't able to produce such a scenario - so it may be 
> > fine.

Try making _bt_findinsertloc() call _bt_vacuum_one_page() whenever the
page is P_HAS_GARBAGE(), regardless of whether or not the page is
about to split. That will still be correct, while having a much better
chance of breaking the patch during stress-testing.
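
For instance, something like this in _bt_findinsertloc() (illustration
only, as a stress-testing hack rather than a proposed change; variable
names follow current nbtinsert.c):

    /* Clean up LP_DEAD items even when the page is not about to split */
    if (P_HAS_GARBAGE(lpageop))
    {
        _bt_vacuum_one_page(rel, insertstate->buf, heapRel);
        insertstate->bounds_valid = false;
    }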

Relying on a buffer pin to prevent the B-Tree structure itself from
changing in any important way seems likely to be broken already. Even
if it isn't, it sounds fragile.

A leaf page doesn't really have anything called a low key. It usually
has a current first "data item"/non-pivot tuple, which is an
inherently unstable thing. Also, it has a very loose relationship with
the high key of the left sibling page, which the the closest thing to
a low key that exists (often they'll have almost the same key values,
but that is not guaranteed at all). While I haven't studied the patch,
the logic within _bt_scankey_within_page() seems fishy to me for that
reason.

> There is a BT_READ lock in place when finding the correct leaf page, or
> searching within the leaf page itself. _bt_vacuum_one_page deletes only
> LP_DEAD tuples, but those are already ignored in _bt_readpage. Peter, do
> you have some feedback for this ?

It sounds like the design of the patch relies on doing something other
than stopping a scan "between" pages, in the sense that is outlined in
the commit message of commit 09cb5c0e. If so, then that's a serious
flaw in its design.

-- 
Peter Geoghegan




Re: libxml2 is dropping xml2-config

2020-01-20 Thread Christoph Berg
Re: To PostgreSQL Hackers 2020-01-20 <20200120204715.ga73...@msg.df7cb.de>
> Debian reports that libxml2 is dropping the xml2-config binary:

Please disregard that, I had assumed this was a change made by libxml2
upstream. I'm in contact with the libxml2 Debian maintainer to get
that change off the table.

> We should teach configure.in to recognize that natively as well, I
> guess.

(Putting in support for pkg-config still makes sense, though.)

Christoph




BRIN cost estimate breaks geometric indexes

2020-01-20 Thread Komяpa
Hi,

Found out today that BRIN indexes don't really work for PostGIS and box
datatypes.

Since
https://github.com/postgres/postgres/commit/7e534adcdc70866e7be74d626b0ed067c890a251
Postgres
requires the datatype to provide correlation statistics. Such statistics weren't
provided by PostGIS or the box types.

Today I tried to replace a 200gb gist index with 8mb brin index and queries
didn't work as expected - it was never used. set enable_seqscan=off helped
for a bit but that's not a permanent solution.
Plans for context:
https://gist.github.com/Komzpa/2cd396ec9b65e2c93341e9934d974826

Debugging session on #postgis IRC channel leads to this ticket to create a
(not that meaningful) correlation statistics for geometry datatype:
https://trac.osgeo.org/postgis/ticket/4625#ticket

Postgres Professional mentioned symptoms of the issue in their in-depth
manual: https://habr.com/ru/company/postgrespro/blog/346460/ - box datatype
showed same unusable BRIN symptoms for them.

A reasonable course of action on the Postgres side seems to be to not assume
a selectivity of 1 in the absence of correlation statistics, but something that
would make such an index preferable to a parallel seq scan while still costing
it higher than a similar GiST index.

Any other ideas?

-- 
Darafei Praliaskouski
Support me: http://patreon.com/komzpa


Re: Increase psql's password buffer size

2020-01-20 Thread Tom Lane
Alexander Kukushkin  writes:
> I think I should add my two cents.
> We at Zalando are using JWT tokens as passwords. JWT tokens are
> self-contained and therefore quite huge (up to 700-800 bytes in our
> case). Tokens have a limited lifetime (1 hour) and we are using PAM to
> verify them.
> Altogether the whole thing works like a charm. The only problem is that
> it is not possible to copy the token into the psql password prompt,
> but there is a workaround, export PGPASSWORD=verylongtokenstring &&
> psql

I remain unconvinced that this is a good design, as compared to the
alternative of hashing $large_secret_data down to a more typical
length for a password.

Quite aside from whether or not you run into any implementation
restrictions on password length, using externally-sourced secret
data directly as a password seems like a lousy idea from a pure
security standpoint.  What happens if somebody compromises your
database, or even just your connection to the database, and is
able to read out the raw password?  The damage is worse than the
ordinary case of just being able to get into your database account,
because now the attacker has info about a formerly-secure upstream
datum, which probably lets him into some other things.  It's not
unlike using the same password across multiple services.

In the case you describe, you're also exposing that data to wherever
the PAM mechanism is keeping its secrets, hence presenting an even
larger attack surface.  Hashing the data before it goes to any of
those places would go a long way towards mitigating the risk.

regards, tom lane




libxml2 is dropping xml2-config

2020-01-20 Thread Christoph Berg
Debian reports that libxml2 is dropping the xml2-config binary:

Date: Mon, 20 Jan 2020 20:42:47 +0100
From: Mattia Rizzolo 
Reply-To: Mattia Rizzolo , 949...@bugs.debian.org
Subject: Bug#949428: postgresql-12: FTBFS with libxml2 2.9.10 (uses xml2-config)

Source: postgresql-12
Version: 12.1-2
Severity: important
Tags: ftbfs
User: libx...@packages.debian.org
Usertags: ftbfs-2.9.10 xml2-config


Dear maintainer,

your package is using `xml2-config` to detect and use libxml2.  I'm
removing that script, so please update your build system to use
pkg-config instead.

The libxml2 package in experimental already doesn't ship the xml2-config
script.

Attached is the full build log, hopefully relevant excerpt follows:


checking for xml2-config... no
...
configure: error: header file  is required for XML support

[...]
- End forwarded message -

Luckily the ./configure script is compatible enough so that this hack
works: (tested on master)

./configure  --with-libxml XML2_CONFIG='pkg-config libxml-2.0'
[...]
checking for XML2_CONFIG... pkg-config libxml-2.0
[...]
checking for xmlSaveToBuffer in -lxml2... yes
[...]
checking libxml/parser.h usability... yes
checking libxml/parser.h presence... yes
checking for libxml/parser.h... yes

We should teach configure.in to recognize that natively as well, I
guess.

Christoph




Re: Add limit option to copy function

2020-01-20 Thread Tom Lane
김대호  writes:
> I suggest adding a limit option to the copy function that limits count of 
> input/output.
> I think this will be useful for testing with sample data.

I'm quite skeptical of the value of this.  On the output side, you
can already do it with

COPY (SELECT ... LIMIT n) TO wherever;

Moreover, that approach allows you to include an ORDER BY, which is
generally good practice in any query that includes LIMIT, in case
you'd like deterministic results.

On the input side, it's true that you'd have to resort to some
outside features (perhaps applying "head" to the input file, or
some such), or else copy the data into a temp table and post-process.
But that's true for most ways that you might want to adjust or
filter the input data; why should this one be different?

We don't consider that COPY is a general-purpose ETL engine, and
have resisted addition of features to it in the past because
they'd slow down the primary use-case.  That objection applies
here too.  Yeah, it's (probably) not a big slowdown ... but it's
hard to justify any cost at all for a feature that is outside
the design scope of COPY.

regards, tom lane




Re: Greatest Common Divisor

2020-01-20 Thread Dean Rasheed
On Mon, 20 Jan 2020 at 19:04, Alvaro Herrera  wrote:
>
> On 2020-Jan-20, Dean Rasheed wrote:
>
> > +   
> > +greatest common divisor &mdash; the largest positive number that
> > +divides both inputs with no remainder; returns 
> > 0 if
> > +both inputs are zero
> > +   
>
> Warning, severe TOC/bikeshedding ahead.
>
> I don't know why, but this dash-semicolon sequence reads strange to me
> and looks out of place.  I would use parens for the first phrase and
> keep the semicolon, that is "greatest common divisor (the largest ...);
> returns 0 if ..."
>
> That seems more natural to me, and we're already using parens in other
> description entries.
>

Hmm, OK. I suppose that's more logical because then the bit in parens
is the standard definition of gcd/lcm, and the part after the
semicolon is the implementation choice for the special case not
covered by the standard definition.

Regards,
Dean




Re: Increase psql's password buffer size

2020-01-20 Thread Alexander Kukushkin
Hi,

I think I should add my two cents.

On Mon, 20 Jan 2020 at 20:38, Tom Lane  wrote:
>
> > I found another place that assumes 100 bytes and upped it to 2048.

There is one more place, in the code which parses .pgpass

>
> So this is pretty much exactly what I expected.  And have you tried
> it with e.g. PAM, or LDAP?
>
> I think the AWS guys are fools to imagine that this will work in very
> many places, and I don't see why we should be leading the charge to
> make it work for them.  What's the point of having a huge amount of
> data in a password, anyway?

We at Zalando are using JWT tokens as passwords. JWT tokens are
self-contained and therefore quite huge (up to 700-800 bytes in our
case). Tokens have a limited lifetime (1 hour) and we are using PAM to
verify them.
Altogether the whole thing works like a charm. The only problem is that
it is not possible to copy the token into the psql password prompt,
but there is a workaround, export PGPASSWORD=verylongtokenstring &&
psql

JWT: https://jwt.io/
PAM module to verify OAuth tokens: https://github.com/CyberDem0n/pam-oauth2

Regards,
--
Alexander Kukushkin




Re: Online checksums patch - once again

2020-01-20 Thread Robert Haas
On Sat, Jan 18, 2020 at 6:18 PM Daniel Gustafsson  wrote:
> Thanks again for reviewing (and working on the infrastructure required for 
> this
> patch to begin with)!  Regarding the persisting the progress; that would be a
> really neat feature but I don't have any suggestion on how to do that safely
> for real use-cases.

Leaving to one side the question of how much work is involved, could
we do something conceptually similar to relfrozenxid/datfrozenxid,
i.e. use catalog state to keep track of which objects have been
handled and which not?

Very rough sketch:

* set a flag indicating that checksums must be computed for all page writes
* use barriers and other magic to make sure everyone has gotten the
memo from the previous step
* use new catalog fields pg_class.relhaschecksums and
pg_database.dathaschecksums to track whether checksums are enabled
* keep launching workers for databases where !pg_database.dathaschecksums
until none remain
* mark checksums as fully enabled
* party

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Greatest Common Divisor

2020-01-20 Thread Dean Rasheed
On Mon, 20 Jan 2020 at 18:52, Vik Fearing  wrote:
>
> On 20/01/2020 11:28, Dean Rasheed wrote:
> >
> > +   
> > +least common multiple &mdash; the smallest strictly positive number
> > +that is an integer multiple of both inputs; returns
> > 0
> > +if either input is zero
> > +   
>
> In that case should lcm be "...that is an integral multiple..." since
> the numeric version will return numeric?
>

So "integral multiple" instead of "integer multiple"? I think I'm more
used to the latter, but I'm happy with either.

Regards,
Dean




Re: Increase psql's password buffer size

2020-01-20 Thread Tom Lane
David Fetter  writes:
> On Mon, Jan 20, 2020 at 07:44:25PM +0100, David Fetter wrote:
>> On Mon, Jan 20, 2020 at 01:12:35PM -0500, Tom Lane wrote:
>>> (I can't say that s/100/2048/ in one place is a particularly evil
>>> change; what bothers me is the likelihood that there are other
>>> places that won't cope with arbitrarily long passwords.  Not all of
>>> them are necessarily under our control, either.)

>> I found one that is, so please find attached the next revision of the
>> patch.

> I found another place that assumes 100 bytes and upped it to 2048.

So this is pretty much exactly what I expected.  And have you tried
it with e.g. PAM, or LDAP?

I think the AWS guys are fools to imagine that this will work in very
many places, and I don't see why we should be leading the charge to
make it work for them.  What's the point of having a huge amount of
data in a password, anyway?  You can't expect to get it back out
again, and there's no reason to believe that it adds any security
after a certain point.  If they want a bunch of different things
contributing to the password, OK, but they could just hash those
things together and thereby keep their submitted password to a length
that will work with most services.

regards, tom lane




Re: Increase psql's password buffer size

2020-01-20 Thread David Fetter
On Mon, Jan 20, 2020 at 07:44:25PM +0100, David Fetter wrote:
> On Mon, Jan 20, 2020 at 01:12:35PM -0500, Tom Lane wrote:
> > David Fetter  writes:
> > > At least two cloud providers are now stuffing large amounts of
> > > information into the password field. This change makes it possible to
> > > accommodate that usage in interactive sessions.
> > 
> > Like who?
> 
> AWS and Azure are two examples I know of.
> 
> > It seems like a completely silly idea.  And if 2K is sane, why not
> > much more?
> 
> Good question. Does it make sense to rearrange these things so they're
> allocated at runtime instead of compile time?
> 
> > (I can't say that s/100/2048/ in one place is a particularly evil
> > change; what bothers me is the likelihood that there are other
> > places that won't cope with arbitrarily long passwords.  Not all of
> > them are necessarily under our control, either.)
> 
> I found one that is, so please find attached the next revision of the
> patch.

I found another place that assumes 100 bytes and upped it to 2048.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
>From fb05bf709df0a67a63bca413cd7f0f276cab78b9 Mon Sep 17 00:00:00 2001
From: David Fetter 
Date: Mon, 20 Jan 2020 09:58:19 -0800
Subject: [PATCH v3] Increase psql's password buffer size
To: hackers


At least two cloud providers are now stuffing large amounts of
information into the password field. This makes it possible to
accommodate that usage in interactive sessions.

diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index e111cee556..61386fe4ae 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -1827,8 +1827,8 @@ exec_command_password(PsqlScanState scan_state, bool active_branch)
 	{
 		char	   *opt0 = psql_scan_slash_option(scan_state,
   OT_SQLID, NULL, true);
-		char		pw1[100];
-		char		pw2[100];
+		char		pw1[2048];
+		char		pw2[2048];
 
 		simple_prompt("Enter new password: ", pw1, sizeof(pw1), false);
 		simple_prompt("Enter it again: ", pw2, sizeof(pw2), false);
@@ -2845,7 +2845,7 @@ copy_previous_query(PQExpBuffer query_buf, PQExpBuffer previous_buf)
 static char *
 prompt_for_password(const char *username)
 {
-	char		buf[100];
+	char		buf[2048];
 
 	if (username == NULL || username[0] == '\0')
 		simple_prompt("Password: ", buf, sizeof(buf), false);
diff --git a/src/bin/psql/startup.c b/src/bin/psql/startup.c
index 3302bd4dd3..a7e3263979 100644
--- a/src/bin/psql/startup.c
+++ b/src/bin/psql/startup.c
@@ -120,7 +120,7 @@ main(int argc, char *argv[])
 	struct adhoc_opts options;
 	int			successResult;
 	bool		have_password = false;
-	char		password[100];
+	char		password[2048];
 	bool		new_pass;
 
 	pg_logging_init(argv[0]);





Re: error context for vacuum to include block number

2020-01-20 Thread Andres Freund
Hi,

On 2019-12-12 21:08:31 -0600, Justin Pryzby wrote:
> On Wed, Dec 11, 2019 at 12:33:53PM -0300, Alvaro Herrera wrote:
> On Wed, Dec 11, 2019 at 08:54:25AM -0800, Andres Freund wrote:
> > Hm, wonder if could be worthwhile to not use a separate struct here, but
> > instead extend one of the existing structs to contain the necessary
> > information. Or perhaps have one new struct with all the necessary
> > information. There's already quite a few places that do
> > get_namespace_name(), for example.
> 
> Didn't find a better struct to use yet.

I was thinking that you could just use LVRelStats.

Greetings,

Andres Freund




Re: error context for vacuum to include block number

2020-01-20 Thread Andres Freund
Hi,

On 2020-01-19 23:41:59 -0600, Justin Pryzby wrote:
>  /*
> + * Return whether skipping blocks or not.
> + * Except when aggressive is set, we want to skip pages that are
> + * all-visible according to the visibility map, but only when we can skip
> + * at least SKIP_PAGES_THRESHOLD consecutive pages.  Since we're reading
> + * sequentially, the OS should be doing readahead for us, so there's no
> + * gain in skipping a page now and then; that's likely to disable
> + * readahead and so be counterproductive. Also, skipping even a single
> + * page means that we can't update relfrozenxid, so we only want to do it
> + * if we can skip a goodly number of pages.
> + *
> + * When aggressive is set, we can't skip pages just because they are
> + * all-visible, but we can still skip pages that are all-frozen, since
> + * such pages do not need freezing and do not affect the value that we can
> + * safely set for relfrozenxid or relminmxid.
> + *
> + * Before entering the main loop, establish the invariant that
> + * next_unskippable_block is the next block number >= blkno that we can't
> + * skip based on the visibility map, either all-visible for a regular scan
> + * or all-frozen for an aggressive scan.  We set it to nblocks if there's
> + * no such block.  We also set up the skipping_blocks flag correctly at
> + * this stage.
> + *
> + * Note: The value returned by visibilitymap_get_status could be slightly
> + * out-of-date, since we make this test before reading the corresponding
> + * heap page or locking the buffer.  This is OK.  If we mistakenly think
> + * that the page is all-visible or all-frozen when in fact the flag's just
> + * been cleared, we might fail to vacuum the page.  It's easy to see that
> + * skipping a page when aggressive is not set is not a very big deal; we
> + * might leave some dead tuples lying around, but the next vacuum will
> + * find them.  But even when aggressive *is* set, it's still OK if we miss
> + * a page whose all-frozen marking has just been cleared.  Any new XIDs
> + * just added to that page are necessarily newer than the GlobalXmin we
> + * computed, so they'll have no effect on the value to which we can safely
> + * set relfrozenxid.  A similar argument applies for MXIDs and relminmxid.
> + *
> + * We will scan the table's last page, at least to the extent of
> + * determining whether it has tuples or not, even if it should be skipped
> + * according to the above rules; except when we've already determined that
> + * it's not worth trying to truncate the table.  This avoids having
> + * lazy_truncate_heap() take access-exclusive lock on the table to attempt
> + * a truncation that just fails immediately because there are tuples in
> + * the last page.  This is worth avoiding mainly because such a lock must
> + * be replayed on any hot standby, where it can be disruptive.
> + */

FWIW, I think we should just flat out delete all this logic, and replace
it with a few explicit PrefetchBuffer() calls. Just by chance I
literally just now sped up a VACUUM by more than a factor of 10, by
manually prefetching buffers. At least the linux kernel readahead logic
doesn't deal well with reading and writing to different locations in the
same file, and that's what the ringbuffer pretty invariably leads to for
workloads that aren't cached.
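
As an illustration of the sort of thing that could replace it (a sketch
only, not from any patch; the prefetch distance is arbitrary):

#define VACUUM_PREFETCH_DISTANCE 32

    for (blkno = 0; blkno < nblocks; blkno++)
    {
        BlockNumber prefetch_blkno = blkno + VACUUM_PREFETCH_DISTANCE;

        /* Ask the OS to start reading a block we will need shortly. */
        if (prefetch_blkno < nblocks)
            PrefetchBuffer(onerel, MAIN_FORKNUM, prefetch_blkno);

        /* ... existing per-block vacuum work ... */
    }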

Partially so I'll find it when I invariably search for this in the
future:
select pg_prewarm(relid, 'buffer', 'main', blocks_done, 
least(blocks_done+10, blocks_total)) from pg_stat_progress_create_index 
where phase = 'building index: scanning table' and datid = (SELECT oid FROM 
pg_database WHERE datname = current_database());
\watch 0.5


> From 623c725c8add0670b28cdbfceca1824ba5b0647c Mon Sep 17 00:00:00 2001
> From: Justin Pryzby 
> Date: Thu, 12 Dec 2019 20:54:37 -0600
> Subject: [PATCH v9 2/3] vacuum errcontext to show block being processed
> 
> As requested here.
> https://www.postgresql.org/message-id/20190807235154.erbmr4o4bo6vgnjv%40alap3.anarazel.de
> ---
>  src/backend/access/heap/vacuumlazy.c | 37 
> 
>  1 file changed, 37 insertions(+)
> 
> diff --git a/src/backend/access/heap/vacuumlazy.c 
> b/src/backend/access/heap/vacuumlazy.c
> index 9849685..c96abdf 100644
> --- a/src/backend/access/heap/vacuumlazy.c
> +++ b/src/backend/access/heap/vacuumlazy.c
> @@ -289,6 +289,12 @@ typedef struct LVRelStats
>   boollock_waiter_detected;
>  } LVRelStats;
>  
> +typedef struct
> +{
> + char *relname;
> + char *relnamespace;
> + BlockNumber blkno;
> +} vacuum_error_callback_arg;
>  
>  /* A few variables that don't seem worth passing around as parameters */
>  static int   elevel = -1;
> @@ -358,6 +364,7 @@ static void end_parallel_vacuum(Relation *Irel, 
> IndexBulkDeleteResult **stats,
>   LVParallelState 
> *lps, int nindexes);
>  static LVSharedIndStats *get_indstats(LVShared *lvshared, int n);
>  static bool 

Re: [HACKERS] kqueue

2020-01-20 Thread Tom Lane
I wrote:
> Peter Eisentraut  writes:
>> I took this patch for a quick spin on macOS.  The result was that the 
>> test suite hangs in the test src/test/recovery/t/017_shm.pl.  I didn't 
>> see any mentions of this anywhere in the thread, but that test is newer 
>> than the beginning of this thread.  Can anyone confirm or deny this 
>> issue?  Is it specific to macOS perhaps?

> Yeah, I duplicated the problem in macOS Catalina (10.15.2), using today's
> HEAD.  The core regression tests pass, as do the earlier recovery tests
> (I didn't try a full check-world though).  Somewhere early in 017_shm.pl,
> things freeze up with four postmaster-child processes stuck in 100%-
> CPU-consuming loops.

I observe very similar behavior on FreeBSD/amd64 12.0-RELEASE-p12,
so it's not just macOS.

I now think that the autovac launcher isn't actually stuck in the way
that the other processes are.  The ones that are actually consuming
CPU are the checkpointer, bgwriter, and walwriter.  On the FreeBSD
box their stack traces are

(gdb) bt
#0  _close () at _close.S:3
#1  0x007b4dd1 in FreeWaitEventSet (set=) at latch.c:660
#2  WaitLatchOrSocket (latch=0x80a1477a8, wakeEvents=, sock=-1, 
timeout=, wait_event_info=83886084) at latch.c:432
#3  0x0074a1b0 in CheckpointerMain () at checkpointer.c:514
#4  0x005691e2 in AuxiliaryProcessMain (argc=2, argv=0x7fffce90)
at bootstrap.c:461

(gdb) bt
#0  _fcntl () at _fcntl.S:3
#1  0x000800a6cd84 in fcntl (fd=4, cmd=2)
at /usr/src/lib/libc/sys/fcntl.c:56
#2  0x007b4eb5 in CreateWaitEventSet (context=, 
nevents=) at latch.c:625
#3  0x007b4c82 in WaitLatchOrSocket (latch=0x80a147b00, wakeEvents=41, 
sock=-1, timeout=200, wait_event_info=83886083) at latch.c:389
#4  0x00749ecd in BackgroundWriterMain () at bgwriter.c:304
#5  0x005691dd in AuxiliaryProcessMain (argc=2, argv=0x7fffce90)
at bootstrap.c:456

(gdb) bt
#0  _kevent () at _kevent.S:3
#1  0x007b58a1 in WaitEventAdjustKqueue (set=0x800e6a120, 
event=0x800e6a170, old_events=) at latch.c:1034
#2  0x007b4d87 in AddWaitEventToSet (set=, 
events=, 
fd=-1, latch=, user_data=) at latch.c:778
#3  WaitLatchOrSocket (latch=0x80a147e58, wakeEvents=41, sock=-1, 
timeout=5000, wait_event_info=83886093) at latch.c:410
#4  0x0075b349 in WalWriterMain () at walwriter.c:256
#5  0x005691ec in AuxiliaryProcessMain (argc=2, argv=0x7fffce90)
at bootstrap.c:467

Note that these are just snapshots --- it looks like these processes
are repeatedly creating and destroying WaitEventSets, they're not
stuck inside the kernel.

regards, tom lane




Re: Greatest Common Divisor

2020-01-20 Thread Alvaro Herrera
On 2020-Jan-20, Dean Rasheed wrote:

> +   
> +greatest common divisor &mdash; the largest positive number that
> +divides both inputs with no remainder; returns 0 
> if
> +both inputs are zero
> +   

Warning, severe TOC/bikeshedding ahead.

I don't know why, but this dash-semicolon sequence reads strange to me
and looks out of place.  I would use parens for the first phrase and
keep the semicolon, that is "greatest common divisor (the largest ...);
returns 0 if ..."

That seems more natural to me, and we're already using parens in other
description entries.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Index Skip Scan

2020-01-20 Thread Jesper Pedersen

Hi Floris,

On 1/15/20 8:33 AM, Floris Van Nee wrote:

I reviewed the latest version of the patch. Overall some good improvements I 
think. Please find my feedback below.



Thanks for your review !


- I think I mentioned this before - it's not that big of a deal, but it just 
looks weird and inconsistent to me:
create table t2 as (select a, b, c, 10 d from generate_series(1,5) a, 
generate_series(1,100) b, generate_series(1,1) c); create index on t2 
(a,b,c desc);

postgres=# explain select distinct on (a,b) a,b,c from t2 where a=2 and b>=5 and 
b<=5 order by a,b,c desc;
QUERY PLAN
-
  Index Only Scan using t2_a_b_c_idx on t2  (cost=0.43..216.25 rows=500 
width=12)
Skip scan: true
Index Cond: ((a = 2) AND (b >= 5) AND (b <= 5))
(3 rows)

postgres=# explain select distinct on (a,b) a,b,c from t2 where a=2 and b=5 
order by a,b,c desc;
QUERY PLAN
-
  Unique  (cost=0.43..8361.56 rows=500 width=12)
->  Index Only Scan using t2_a_b_c_idx on t2  (cost=0.43..8361.56 rows=9807 
width=12)
  Index Cond: ((a = 2) AND (b = 5))
(3 rows)

When doing a distinct on (params) and having equality conditions for all 
params, it falls back to the regular index scan even though there's no reason 
not to use the skip scan here. It's much faster to write b between 5 and 5 now 
rather than writing b=5. I understand this was a limitation of the unique-keys 
patch at the moment which could be addressed in the future. I think for the 
sake of consistency it would make sense if this eventually gets addressed.



Agreed, that is an improvement that should be made. I would like 
David's view on this since it relates to the UniqueKey patch.



- nodeIndexScan.c, line 126
This sets xs_want_itup to true in all cases (even for non skip-scans). I don't 
think this is acceptable given the side-effects this has (page will never be 
unpinned in between returned tuples in _bt_drop_lock_and_maybe_pin)



Correct - fixed.


- nbsearch.c, _bt_skip, line 1440
_bt_update_skip_scankeys(scan, indexRel); This function is called twice now - 
once in the else {} and immediately after that outside of the else. The second 
call can be removed I think.



Yes, removed the "else" call site.


- nbtsearch.c _bt_skip line 1597
LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
scan->xs_itup = (IndexTuple) PageGetItem(page, 
itemid);

This is an UNLOCK followed by a read of the unlocked page. That looks incorrect?



Yes, that needed to be changed.
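
For the archives, the general rule at issue, sketched below (this is not the
patch's actual hunk, just the pattern: read or copy the tuple while the buffer
lock is still held, and only then release it):

/* Sketch only; assumes buf is pinned and read-locked on entry. */
static void
fetch_itup_then_unlock(IndexScanDesc scan, Buffer buf, OffsetNumber offnum)
{
	Page		page = BufferGetPage(buf);
	ItemId		itemid = PageGetItemId(page, offnum);
	IndexTuple	itup = (IndexTuple) PageGetItem(page, itemid);

	/* copy while still locked, so nothing dereferences an unlocked page */
	scan->xs_itup = CopyIndexTuple(itup);

	LockBuffer(buf, BUFFER_LOCK_UNLOCK);
}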


- nbtsearch.c _bt_skip line 1440
if (BTScanPosIsValid(so->currPos) &&
_bt_scankey_within_page(scan, so->skipScanKey, so->currPos.buf, 
dir))

Is it allowed to look at the high key / low key of the page without having a read 
lock on it?



In case of a split the page will still contain a high key and a low key, 
so this should be ok.



- nbtsearch.c line 1634
if (_bt_readpage(scan, indexdir, offnum))  ...
else
  error()

Is it really guaranteed that a match can be found on the page itself? Isn't it 
possible that an extra index condition, not part of the scan key, makes none of 
the keys on the page match?



The logic for this has been changed.


- nbtsearch.c in general
Most of the code seems to rely quite heavily on the fact that xs_want_itup 
forces _bt_drop_lock_and_maybe_pin to never release the buffer pin. Have you 
considered that compacting of a page may still happen even if you hold the pin? 
[1] I've been trying to come up with cases in which this may break the patch, 
but I haven't been able to produce such a scenario - so it may be fine. But it would 
be good to consider again. One thing I was thinking of was a scenario where 
page splits and/or compacting would happen in between returning tuples. Could 
this break the _bt_scankey_within_page check such that it thinks the scan key 
is within the current page, while it actually isn't? Mainly for backward and/or 
cursor scans. Forward scans shouldn't be a problem I think. Apologies for being 
a bit vague as I don't have a clear example ready when it would go wrong. It 
may well be fine, but it was one of the things on my mind.



There is a BT_READ lock in place when finding the correct leaf page, or 
searching within the leaf page itself. _bt_vacuum_one_page deletes only 
LP_DEAD tuples, but those are already ignored in _bt_readpage. Peter, do 
you have some feedback for this ?



Please, find the updated patches attached that Dmitry and I made.

Thanks again!

Best regards,
 Jesper
>From 3c540c93307e6cbe792b31b12d4ecb025cd6b327 Mon Sep 17 00:00:00 2001
From: jesperpedersen 
Date: Fri, 15 Nov 2019 09:46:05 -0500
Subject: [PATCH 1/2] Unique key

Design by David Rowley.

Author: Jesper Pedersen
---
 

Re: Physical replication slot advance is not persistent

2020-01-20 Thread Andres Freund
Hi,

On 2020-01-20 15:45:40 +0900, Michael Paquier wrote:
> On Thu, Jan 16, 2020 at 08:09:09PM +0300, Alexey Kondratov wrote:
> > OK, I have definitely overthought that, thanks. This looks like a minimal
> > subset of changes that actually solves the bug. I would only prefer to keep
> > some additional comments (something like the attached), otherwise after half
> > a year it will be unclear again, why we save slot unconditionally here.
> 
> Since this email, Andres has sent an email that did not reach the
> community lists, but where all the participants of this thread were in
> CC.

Ugh, that was an accident.


> Here is a summary of the points raised (please correct me if that
> does not sound right to you, Andres):

> 1) The slot advancing has to mark the slot as dirty, but should we
> make the change persistent at the end of the function or should we
> wait for a checkpoint to do the work, meaning that any update done to
> the slot would be lost if a crash occurs in-between?  Note that we
> have this commit in slotfuncs.c for
> pg_logical_replication_slot_advance():
>  * Dirty the slot so it's written out at the next checkpoint.
>  * We'll still lose its position on crash, as documented, but it's
>  * better than always losing the position even on clean restart.
> 
> This comment refers to the documentation for the logical decoding
> section (see logicaldecoding-replication-slots in
> logicaldecoding.sgml), and even if nothing can be done until the slot
> advance function reaches its hand, we ought to make the data
> persistent if we can.

That doesn't really seem like a meaningful reference, because the
concerns between constantly streaming out changes (where we don't want
to fsync every single transaction), and doing so in a manual advance
through an SQL function, seem different.


> 3) The amount of testing related to slot advancing could be better
> with cluster-wide operations.
> 
> @@ -370,6 +370,11 @@ pg_physical_replication_slot_advance(XLogRecPtr
> moveto)
> MyReplicationSlot->data.restart_lsn = moveto;
> 
> SpinLockRelease(&MyReplicationSlot->mutex);
> retlsn = moveto;
> +
> +   ReplicationSlotMarkDirty();
> +
> +   /* We moved restart_lsn, update the global value. */
> +   ReplicationSlotsComputeRequiredLSN();
> I think that the proposed patch is missing a call to
> ReplicationSlotsComputeRequiredXmin() here for physical slots.

Hm. It seems ok to include, but I don't think omitting it currently has
negative effects?
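
For readers following along, the sequence under discussion for the physical-slot
path would look roughly like this (a sketch, not the final patch; whether the
immediate save is wanted is exactly the open question above):

if (startlsn < moveto)
{
	SpinLockAcquire(&MyReplicationSlot->mutex);
	MyReplicationSlot->data.restart_lsn = moveto;
	SpinLockRelease(&MyReplicationSlot->mutex);

	/* make the new horizon visible, and optionally durable */
	ReplicationSlotMarkDirty();
	ReplicationSlotsComputeRequiredXmin(false);
	ReplicationSlotsComputeRequiredLSN();
	ReplicationSlotSave();		/* persist now rather than at next checkpoint */
}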


> So, I have been looking at this patch by myself, and updated it so as
> the extra slot save is done only if any advancing has been done, on
> top of the other computations that had better be around for
> consistency.

Hm, I don't necessarily think that's necessary.


> The patch includes TAP tests for physical and logical slots'
> durability across restarts.

Cool!


> diff --git a/src/backend/replication/slotfuncs.c 
> b/src/backend/replication/slotfuncs.c
> index bb69683e2a..af3e114fc9 100644
> --- a/src/backend/replication/slotfuncs.c
> +++ b/src/backend/replication/slotfuncs.c
> @@ -359,17 +359,20 @@ pg_get_replication_slots(PG_FUNCTION_ARGS)
>   * checkpoints.
>   */
>  static XLogRecPtr
> -pg_physical_replication_slot_advance(XLogRecPtr moveto)
> +pg_physical_replication_slot_advance(XLogRecPtr moveto, bool *advance_done)
>  {
>   XLogRecPtr  startlsn = MyReplicationSlot->data.restart_lsn;
>   XLogRecPtr  retlsn = startlsn;
>  
> + *advance_done = false;
> +
>   if (startlsn < moveto)
>   {
>   SpinLockAcquire(&MyReplicationSlot->mutex);
>   MyReplicationSlot->data.restart_lsn = moveto;
>   SpinLockRelease(&MyReplicationSlot->mutex);
>   retlsn = moveto;
> + *advance_done = true;
>   }
>  
>   return retlsn;

Hm. Why did you choose not to use endlsn as before (except being
broken), or something? It seems quite conceivable somebody is using
these functions in an extension.




> +# Test physical slot advancing and its durability.  Create a new slot on
> +# the primary, not used by any of the standbys. This reserves WAL at 
> creation.
> +my $phys_slot = 'phys_slot';
> +$node_master->safe_psql('postgres',
> + "SELECT pg_create_physical_replication_slot('$phys_slot', true);");
> +$node_master->psql('postgres', "
> + CREATE TABLE tab_phys_slot (a int);
> + INSERT INTO tab_phys_slot VALUES (generate_series(1,10));");
> +my $psql_rc = $node_master->psql('postgres',
> + "SELECT pg_replication_slot_advance('$phys_slot', 'FF/');");
> +is($psql_rc, '0', 'slot advancing works with physical slot');

Hm. I think testing this with real LSNs is a better idea. What if the
node actually already is past FF/ at this point? Quite unlikely,
I know, but still. I.e. why not get the current LSN after the INSERT,
and assert that the slot, after the restart, is that?


Greetings,

Andres Freund




Re: Greatest Common Divisor

2020-01-20 Thread Vik Fearing
On 20/01/2020 11:28, Dean Rasheed wrote:
> Looking at the docs, I think it's worth going a little further than
> just saying what the acronyms stand for -- especially since the
> behaviour for zero inputs is an implementation choice (albeit the most
> common one). I propose the following:
>
> +   <entry>
> +    greatest common divisor &mdash; the largest positive number that
> +    divides both inputs with no remainder; returns <literal>0</literal> if
> +    both inputs are zero
> +   </entry>
>
> and:
>
> +   <entry>
> +    least common multiple &mdash; the smallest strictly positive number
> +    that is an integer multiple of both inputs; returns <literal>0</literal>
> +    if either input is zero
> +   </entry>
>
> (I have tried to be precise in my use of terms like "number" and
> "integer", to cover the different cases)


In that case should lcm be "...that is an integral multiple..." since
the numeric version will return numeric?


Other than that, I'm happy with this change.

-- 

Vik Fearing





Re: SLRU statistics

2020-01-20 Thread Tomas Vondra

On Mon, Jan 20, 2020 at 03:01:36PM -0300, Alvaro Herrera wrote:

On 2020-Jan-20, Tomas Vondra wrote:


On Mon, Jan 20, 2020 at 01:04:33AM +, tsunakawa.ta...@fujitsu.com wrote:
> From: Tomas Vondra 
> > One of the stats I occasionally wanted to know are stats for the SLRU
> > stats (we have couple of those - clog, subtrans, ...). So here is a WIP
> > version of a patch adding that.
>
> How can users take advantage of this information?  I think we also need
> the ability to set the size of SLRU buffers.  (I want to be freed from
> the concern about the buffer shortage by setting the buffer size to its
> maximum.  For example, CLOG would be only 1 GB.)

You're right the users can't really take advantage of this - my primary
motivation was providing feedback for devs, benchmarking, etc. That
might have been done with DEBUG messages or something, but this seems
more convenient.


I think the stats are definitely needed if we keep the current code.
I've researched some specific problems in this code, such as the need
for more subtrans SLRU buffers; IIRC it was pretty painful to figure out
what the problem was without counters, and it'd have been trivial with
them.



Right. Improving our ability to monitor/measure things is the goal of
this patch.


I think it's unclear how desirable / necessary it is to allow users to
tweak those caches. I don't think we should have a GUC for everything,
but maybe there's some sort of heuristics to determine the size. The
assumption is we actually find practical workloads where the size of
these SLRUs is a performance issue.


I expect we'll eventually realize the need for changes in this area.
Either configurability in the buffer pool sizes, or moving them to be
part of shared_buffers (IIRC Thomas Munro had a patch for this.)
Example: SLRUs like pg_commit and pg_subtrans have higher buffer
consumption as the range of open transactions increases; for many users
this is not a concern and they can live with the default values.

(I think when pg_commit (née pg_clog) buffers were increased, we should
have increased pg_subtrans buffers to match.)



Quite possibly, yes. All I'm saying is that it's not something I intend
to address with this patch. It's quite possible the solutions will be
different for each SLRU, and that will require more research.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] kqueue

2020-01-20 Thread Thomas Munro
On Tue, Jan 21, 2020 at 2:34 AM Peter Eisentraut
 wrote:
> I took this patch for a quick spin on macOS.  The result was that the
> test suite hangs in the test src/test/recovery/t/017_shm.pl.  I didn't
> see any mentions of this anywhere in the thread, but that test is newer
> than the beginning of this thread.  Can anyone confirm or deny this
> issue?  Is it specific to macOS perhaps?

Thanks for testing, and sorry I didn't run a full check-world after
that rebase.  What happened here is that after commit cfdf4dc4 landed
on master, every implementation now needs to check for
exit_on_postmaster_death, and this patch didn't get the message.
Those processes are stuck in their main loops having detected
postmaster death, but not having any handling for it.  Will fix.
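
For context, this is roughly the shape of the check the existing
implementations in latch.c grew with commit cfdf4dc4, and which the kqueue
branch needs to mirror (a sketch, not the final fix):

/* inside the kqueue WaitEventSetWaitBlock(), when the PM-death event fires */
if ((cur_event->events & WL_POSTMASTER_DEATH) &&
	!PostmasterIsAliveInternal())
{
	if (set->exit_on_postmaster_death)
		proc_exit(1);

	occurred_events->fd = PGINVALID_SOCKET;
	occurred_events->events = WL_POSTMASTER_DEATH;
	occurred_events++;
	returned_events++;
}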




Re: Increase psql's password buffer size

2020-01-20 Thread David Fetter
On Mon, Jan 20, 2020 at 01:12:35PM -0500, Tom Lane wrote:
> David Fetter  writes:
> > At least two cloud providers are now stuffing large amounts of
> > information into the password field. This change makes it possible to
> > accommodate that usage in interactive sessions.
> 
> Like who?

AWS and Azure are two examples I know of.

> It seems like a completely silly idea.  And if 2K is sane, why not
> much more?

Good question. Does it make sense to rearrange these things so they're
allocated at runtime instead of compile time?
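
To illustrate what I mean, a rough sketch (the helper name is hypothetical,
and simple_prompt() keeps its current signature):

/* Hypothetical helper: allocate the prompt buffer at runtime. */
static char *
prompt_for_password_alloc(const char *prompt, size_t maxlen)
{
	char	   *buf = pg_malloc(maxlen);

	simple_prompt(prompt, buf, maxlen, false);
	return buf;					/* caller is responsible for freeing it */
}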

> (I can't say that s/100/2048/ in one place is a particularly evil
> change; what bothers me is the likelihood that there are other
> places that won't cope with arbitrarily long passwords.  Not all of
> them are necessarily under our control, either.)

I found one that is, so please find attached the next revision of the
patch.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
>From dfe72e1f7b3af646ba3d0bff017c9574eb54eb4c Mon Sep 17 00:00:00 2001
From: David Fetter 
Date: Mon, 20 Jan 2020 09:58:19 -0800
Subject: [PATCH v2] Increase psql's password buffer size
To: hackers


At least two cloud providers are now stuffing large amounts of
information into the password field. This makes it possible to
accommodate that usage in interactive sessions.

diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index e111cee556..61386fe4ae 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -1827,8 +1827,8 @@ exec_command_password(PsqlScanState scan_state, bool active_branch)
 	{
 		char	   *opt0 = psql_scan_slash_option(scan_state,
   OT_SQLID, NULL, true);
-		char		pw1[100];
-		char		pw2[100];
+		char		pw1[2048];
+		char		pw2[2048];
 
 		simple_prompt("Enter new password: ", pw1, sizeof(pw1), false);
 		simple_prompt("Enter it again: ", pw2, sizeof(pw2), false);
@@ -2845,7 +2845,7 @@ copy_previous_query(PQExpBuffer query_buf, PQExpBuffer previous_buf)
 static char *
 prompt_for_password(const char *username)
 {
-	char		buf[100];
+	char		buf[2048];
 
 	if (username == NULL || username[0] == '\0')
 		simple_prompt("Password: ", buf, sizeof(buf), false);





Re: SLRU statistics

2020-01-20 Thread Alvaro Herrera
On 2020-Jan-20, Tomas Vondra wrote:

> On Mon, Jan 20, 2020 at 01:04:33AM +, tsunakawa.ta...@fujitsu.com wrote:
> > From: Tomas Vondra 
> > > One of the stats I occasionally wanted to know are stats for the SLRU
> > > stats (we have couple of those - clog, subtrans, ...). So here is a WIP
> > > version of a patch adding that.
> > 
> > How can users take advantage of this information?  I think we also need
> > the ability to set the size of SLRU buffers.  (I want to be freed from
> > the concern about the buffer shortage by setting the buffer size to its
> > maximum.  For example, CLOG would be only 1 GB.)
> 
> You're right the users can't really take advantage of this - my primary
motivation was providing feedback for devs, benchmarking, etc. That
> might have been done with DEBUG messages or something, but this seems
> more convenient.

I think the stats are definitely needed if we keep the current code.
I've researched some specific problems in this code, such as the need
for more subtrans SLRU buffers; IIRC it was pretty painful to figure out
what the problem was without counters, and it'd have been trivial with
them.

> I think it's unclear how desirable / necessary it is to allow users to
> tweak those caches. I don't think we should have a GUC for everything,
> but maybe there's some sort of heuristics to determine the size. The
> assumption is we actually find practical workloads where the size of
> these SLRUs is a performance issue.

I expect we'll eventually realize the need for changes in this area.
Either configurability in the buffer pool sizes, or moving them to be
part of shared_buffers (IIRC Thomas Munro had a patch for this.)
Example: SLRUs like pg_commit and pg_subtrans have higher buffer
consumption as the range of open transactions increases; for many users
this is not a concern and they can live with the default values.

(I think when pg_commit (née pg_clog) buffers were increased, we should
have increased pg_subtrans buffers to match.)

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Increase psql's password buffer size

2020-01-20 Thread Tom Lane
David Fetter  writes:
> At least two cloud providers are now stuffing large amounts of
> information into the password field. This change makes it possible to
> accommodate that usage in interactive sessions.

Like who?  It seems like a completely silly idea.  And if 2K is sane,
why not much more?

(I can't say that s/100/2048/ in one place is a particularly evil change;
what bothers me is the likelihood that there are other places that won't
cope with arbitrarily long passwords.  Not all of them are necessarily
under our control, either.)

regards, tom lane




Increase psql's password buffer size

2020-01-20 Thread David Fetter
Folks,

At least two cloud providers are now stuffing large amounts of
information into the password field. This change makes it possible to
accommodate that usage in interactive sessions.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
>From 168c380548d1f97e4f6e9245851c22029931e8b5 Mon Sep 17 00:00:00 2001
From: David Fetter 
Date: Mon, 20 Jan 2020 09:58:19 -0800
Subject: [PATCH v1] Increase psql's password buffer size
To: hackers


At least two cloud providers are now stuffing large amounts of
information into the password field. This makes it possible to
accommodate that usage in interactive sessions.

diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index e111cee556..84c2a0a37f 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -2845,7 +2845,7 @@ copy_previous_query(PQExpBuffer query_buf, PQExpBuffer previous_buf)
 static char *
 prompt_for_password(const char *username)
 {
-	char		buf[100];
+	char		buf[2048];
 
 	if (username == NULL || username[0] == '\0')
 		simple_prompt("Password: ", buf, sizeof(buf), false);





Re: [HACKERS] kqueue

2020-01-20 Thread Tom Lane
Thomas Munro  writes:
> [ 0001-Add-kqueue-2-support-for-WaitEventSet-v13.patch ]

I haven't read this patch in any detail, but a couple quick notes:

* It needs to be rebased over the removal of pg_config.h.win32
--- it should be touching Solution.pm instead, I believe.

* I'm disturbed by the addition of a hunk to the supposedly
system-API-independent WaitEventSetWait() function.  Is that
a generic bug fix?  If not, can we either get rid of it, or
at least wrap it in "#ifdef WAIT_USE_KQUEUE" so that this
patch isn't inflicting a performance penalty on everyone else?

regards, tom lane




Re: [HACKERS] kqueue

2020-01-20 Thread Tom Lane
Peter Eisentraut  writes:
> I took this patch for a quick spin on macOS.  The result was that the 
> test suite hangs in the test src/test/recovery/t/017_shm.pl.  I didn't 
> see any mentions of this anywhere in the thread, but that test is newer 
> than the beginning of this thread.  Can anyone confirm or deny this 
> issue?  Is it specific to macOS perhaps?

Yeah, I duplicated the problem in macOS Catalina (10.15.2), using today's
HEAD.  The core regression tests pass, as do the earlier recovery tests
(I didn't try a full check-world though).  Somewhere early in 017_shm.pl,
things freeze up with four postmaster-child processes stuck in 100%-
CPU-consuming loops.  I captured stack traces:

(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
  * frame #0: 0x7fff6554dbb6 libsystem_kernel.dylib`kqueue + 10
frame #1: 0x000105511533 
postgres`CreateWaitEventSet(context=, nevents=) at 
latch.c:622:19 [opt]
frame #2: 0x000105511305 
postgres`WaitLatchOrSocket(latch=0x000112e02da4, wakeEvents=41, sock=-1, 
timeout=237000, wait_event_info=83886084) at latch.c:389:22 [opt]
frame #3: 0x0001054a7073 postgres`CheckpointerMain at 
checkpointer.c:514:10 [opt]
frame #4: 0x0001052da390 postgres`AuxiliaryProcessMain(argc=2, 
argv=0x7ffeea9dded0) at bootstrap.c:461:4 [opt]

(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
  * frame #0: 0x7fff6554dbce libsystem_kernel.dylib`kevent + 10
frame #1: 0x000105511ddc 
postgres`WaitEventAdjustKqueue(set=0x7fc8e8805920, 
event=0x7fc8e8805958, old_events=) at latch.c:1034:7 [opt]
frame #2: 0x000105511638 postgres`AddWaitEventToSet(set=, 
events=, fd=, latch=, 
user_data=) at latch.c:778:2 [opt]
frame #3: 0x000105511342 
postgres`WaitLatchOrSocket(latch=0x000112e030f4, wakeEvents=41, sock=-1, 
timeout=200, wait_event_info=83886083) at latch.c:397:3 [opt]
frame #4: 0x0001054a6d69 postgres`BackgroundWriterMain at 
bgwriter.c:304:8 [opt]
frame #5: 0x0001052da38b postgres`AuxiliaryProcessMain(argc=2, 
argv=0x7ffeea9dded0) at bootstrap.c:456:4 [opt]

(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
  * frame #0: 0x7fff65549c66 libsystem_kernel.dylib`close + 10
frame #1: 0x000105511466 postgres`WaitLatchOrSocket [inlined] 
FreeWaitEventSet(set=) at latch.c:660:2 [opt]
frame #2: 0x00010551145d 
postgres`WaitLatchOrSocket(latch=0x000112e03444, wakeEvents=, 
sock=-1, timeout=5000, wait_event_info=83886093) at latch.c:432 [opt]
frame #3: 0x0001054b8685 postgres`WalWriterMain at walwriter.c:256:10 
[opt]
frame #4: 0x0001052da39a postgres`AuxiliaryProcessMain(argc=2, 
argv=0x7ffeea9dded0) at bootstrap.c:467:4 [opt]

(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
  * frame #0: 0x7fff655515be libsystem_kernel.dylib`__select + 10
frame #1: 0x0001056a6191 postgres`pg_usleep(microsec=) at 
pgsleep.c:56:10 [opt]
frame #2: 0x0001054abe12 postgres`backend_read_statsfile at 
pgstat.c:5720:3 [opt]
frame #3: 0x0001054adcc0 
postgres`pgstat_fetch_stat_dbentry(dbid=) at pgstat.c:2431:2 [opt]
frame #4: 0x0001054a320c postgres`do_start_worker at 
autovacuum.c:1248:20 [opt]
frame #5: 0x0001054a2639 postgres`AutoVacLauncherMain [inlined] 
launch_worker(now=632853327674576) at autovacuum.c:1357:9 [opt]
frame #6: 0x0001054a2634 
postgres`AutoVacLauncherMain(argc=, argv=) at 
autovacuum.c:769 [opt]
frame #7: 0x0001054a1ea7 postgres`StartAutoVacLauncher at 
autovacuum.c:415:4 [opt]

I'm not sure how much faith to put in the last couple of those, as
stopping the earlier processes could perhaps have had side-effects.
But evidently 017_shm.pl is doing something that interferes with
our ability to create kqueue-based WaitEventSets.

regards, tom lane




Re: SLRU statistics

2020-01-20 Thread Tomas Vondra

On Mon, Jan 20, 2020 at 01:04:33AM +, tsunakawa.ta...@fujitsu.com wrote:

From: Tomas Vondra 

One of the stats I occasionally wanted to know are stats for the SLRU
stats (we have couple of those - clog, subtrans, ...). So here is a WIP
version of a patch adding that.


How can users take advantage of this information?  I think we also need
the ability to set the size of SLRU buffers.  (I want to be freed from
the concern about the buffer shortage by setting the buffer size to its
maximum.  For example, CLOG would be only 1 GB.)



You're right the users can't really take advantage of this - my primary
motivation was providing feedback for devs, benchmarking, etc. That
might have been done with DEBUG messages or something, but this seems
more convenient.

I think it's unclear how desirable / necessary it is to allow users to
tweak those caches. I don't think we should have a GUC for everything,
but maybe there's some sort of heuristics to determine the size. The
assumption is we actually find practical workloads where the size of
these SLRUs is a performance issue.
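
As a point of reference for "some sort of heuristics": the clog SLRU already
sizes itself off shared_buffers rather than a dedicated GUC, roughly like
this (from clog.c, quoted from memory):

Size
CLOGShmemBuffers(void)
{
	return Min(128, Max(4, NBuffers / 512));
}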


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Add support for automatically updating Unicode derived files

2020-01-20 Thread Tom Lane
Peter Eisentraut  writes:
> On 2020-01-15 01:37, Tom Lane wrote:
>> This patch is making src/tools/pginclude/headerscheck unhappy:
>> ./src/include/common/unicode_combining_table.h:3: error: array type has 
>> incomplete element type

> Hmm, this file is only meant to be included inside one particular 
> function.  Making it standalone includable would seem to be unnecessary. 
>   What should we do?

Well, we could make it a documented exception in headerscheck and
cpluspluscheck.

regards, tom lane




Re: Option to dump foreign data in pg_dump

2020-01-20 Thread Luis Carril
On Tue, Jan 14, 2020 at 5:22 PM Luis Carril <luis.car...@swarm64.com> wrote:
Can you have a look at dump with parallel option. Parallel option will
take a lock on table while invoking lockTableForWorker. May be this is
not required for foreign tables.
Thoughts?
I tried with -j and found no issue. I guess that the foreign table needs 
locking anyway to prevent anyone from modifying it while it is being dumped.


I'm able to get the problem with the following steps:
Bring up a postgres setup with servers running on ports 5432 and 5433.

Execute the following commands in Server1 configured on 5432 port:

  *   CREATE EXTENSION postgres_fdw;

  *   CREATE SERVER foreign_server FOREIGN DATA WRAPPER postgres_fdw OPTIONS 
(host '127.0.0.1', port '5433', dbname 'postgres');

  *   create user user1 password '123';

  *   alter user user1 with superuser;

  *   CREATE USER MAPPING FOR user1 SERVER foreign_server OPTIONS (user 
'user1', password '123');

Execute the following commands in Server2 configured on 5433 port:

  *   create user user1 password '123';

  *   alter user user1 with superuser;

Execute the following commands in Server2 configured on 5433 port as user1 user:

  *   create schema test;

  *   create table test.test1(id int);

  *   insert into test.test1 values(10);

Execute the following commands in Server1 configured on 5432 port as user1 user:

  *   CREATE FOREIGN TABLE foreign_table1 (id integer NOT NULL) SERVER 
foreign_server OPTIONS (schema_name 'test', table_name 'test1');

Without parallel option, the operation is successful:

  *   ./pg_dump -d postgres -f dumpdir -U user1 -F d  --include-foreign-data 
foreign_server

With parallel option it fails:

  *   ./pg_dump -d postgres -f dumpdir1 -U user1 -F d -j 5 
--include-foreign-data foreign_server

pg_dump: error: could not obtain lock on relation "public.foreign_table1"
This usually means that someone requested an ACCESS EXCLUSIVE lock on the table 
after the pg_dump parent process had gotten the initial ACCESS SHARE lock on 
the table.
pg_dump: error: a worker process died unexpectedly

There may be simpler steps than this to reproduce the issue; I have not tried to 
optimize it.

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com


Hi Vignesh,

   Yes, you are right, I could reproduce it also with 'file_fdw'. The issue is 
that LOCK is not supported on foreign tables, so I guess the safest 
solution is to make --include-foreign-data incompatible with --jobs, 
because skipping the locking for foreign tables may lead to a deadlock 
anyway. Suggestions?
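
If we go that route, the guard itself would be tiny; a sketch, assuming the
option state is tracked in a list called foreign_servers_include_patterns
(name hypothetical), modeled on the existing mutually-exclusive-option checks
in pg_dump.c's main():

	/* sketch only: refuse the combination up front */
	if (foreign_servers_include_patterns.head != NULL && numWorkers > 1)
		fatal("option --include-foreign-data cannot be used with parallel jobs (-j)");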

Cheers
Luis M Carril



Re: Physical replication slot advance is not persistent

2020-01-20 Thread a . kondratov
On 20 Jan 2020, 09:45 +0300, Michael Paquier , wrote:
>
> So, I have been looking at this patch by myself, and updated it so as
> the extra slot save is done only if any advancing has been done, on
> top of the other computations that had better be around for
> consistency. The patch includes TAP tests for physical and logical
> slots' durability across restarts.
>
> Thoughts?


I still think that this extra check of whether any advance just happened 
adds extra complexity. We could use slot dirtiness for the same purpose, 
and the slot saving routines check it automatically. Anyway, the approach of 
adding a new flag should resolve this bug as well, of course, and maybe it 
will be a bit more transparent and explicit.

Just eyeballed your patch and it looks fine at a first glance, excepting the 
logical slot advance part:

   retlsn = MyReplicationSlot->data.confirmed_flush;
+   *advance_done = true;

   /* free context, call shutdown callback */
   FreeDecodingContext(ctx);

I am not sure that this is the right place to set the advance_done flag to true. 
Wouldn’t it be set to true even in the no-op case, when 
LogicalConfirmReceivedLocation was never executed? Probably we should set the 
flag near the LogicalConfirmReceivedLocation call?


--
Alexey Kondratov



Re: TRUNCATE on foreign tables

2020-01-20 Thread Michael Paquier
On Mon, Jan 20, 2020 at 11:30:34AM +0900, Kohei KaiGai wrote:
> Sorry, it was a midnight job on Friday.

Should I be, err, worried about that?  ;)

> Please check the attached patch.

+   switch (behavior)
+   {
+   case DROP_RESTRICT:
+   appendStringInfoString(buf, " RESTRICT");
+   break;
+   case DROP_CASCADE:
+   appendStringInfoString(buf, " CASCADE");
+   break;
+   default:
+   elog(ERROR, "Bug? unexpected DropBehavior (%d)",
(int)behavior);
+   break;
+   }
Here, you can actually remove the default clause.  By doing so,
compilation would generate a warning if a new value is added to
DropBehavior if it is not listed.  So anybody adding a new value to
the enum will need to think about this code path.
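
In other words, the switch would simply become (sketch):

switch (behavior)
{
	case DROP_RESTRICT:
		appendStringInfoString(buf, " RESTRICT");
		break;
	case DROP_CASCADE:
		appendStringInfoString(buf, " CASCADE");
		break;
}
/* no default: -Wswitch then warns if DropBehavior grows a new member */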

+   ereport(ERROR,
+   (errcode(ERRCODE_WRONG_OBJECT_TYPE),
+errmsg("foreign data wrapper \"%s\" on behalf of the
foreign table \"%s\" does not support TRUNCATE",
+   fdw->fdwname, relname)));
I see two problems here:
- The error message is too complicated.  I would just use "cannot
truncate foreign table \"%s\"".
- The error code should be ERRCODE_FEATURE_NOT_SUPPORTED.

The docs for the FDW description can be improved.  I found that a
large portion of it used rather unclear English, and that things were
not clear enough regarding the use of a list of relations, when an
error is raised because ExecForeignTruncate is NULL, etc.  I have also
cut the last paragraph which is actually implementation-specific
(think for example about callbacks at xact commit/abort time).

Documentation needs to be added to postgres_fdw about the truncation
support.  Particularly, providing details about the possibility to do
truncates in our shot for a set of relations so as dependencies are
automatically handled is an advantage to mention.

There is no need to include the truncate routine in
ForeignTruncateInfo, as the server OID can be used to find it.

Another thing is that I would prefer splitting the patch into two
separate commits, so attached are two patches:
- 0001 for the addition of the in-core API
- 0002 for the addition of support in postgres_fdw.

I have spent a good amount of time polishing 0001, tweaking the docs,
comments, error messages and a bit of its logic.  I am getting
comfortable with it, but it still needs an extra lookup and an indent run
(which has some noise), and I lacked time today.  0002 has some of its
issues fixed and I have not reviewed it fully yet.  There are still
some places not adapted in it (why do you use "Bug?" in all your
elog() messages by the way?), so the postgres_fdw part needs more
attention.  Could you think about some docs for it by the way?
--
Michael
From ed63df0fdeac43443b8e15709c19fe6c1f38a1f8 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Mon, 20 Jan 2020 22:39:34 +0900
Subject: [PATCH 1/2] Add FDW callback for support of TRUNCATE

---
 src/include/foreign/fdwapi.h   |   7 ++
 src/backend/commands/tablecmds.c   | 110 -
 src/test/regress/expected/foreign_data.out |   8 +-
 doc/src/sgml/fdwhandler.sgml   |  36 +++
 4 files changed, 153 insertions(+), 8 deletions(-)

diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 95556dfb15..0a9f36735e 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -151,6 +151,10 @@ typedef bool (*AnalyzeForeignTable_function) (Relation relation,
 typedef List *(*ImportForeignSchema_function) (ImportForeignSchemaStmt *stmt,
 			   Oid serverOid);
 
+typedef void (*ExecForeignTruncate_function) (List *frels_list,
+			  DropBehavior behavior,
+			  bool restart_seqs);
+
 typedef Size (*EstimateDSMForeignScan_function) (ForeignScanState *node,
  ParallelContext *pcxt);
 typedef void (*InitializeDSMForeignScan_function) (ForeignScanState *node,
@@ -236,6 +240,9 @@ typedef struct FdwRoutine
 	/* Support functions for IMPORT FOREIGN SCHEMA */
 	ImportForeignSchema_function ImportForeignSchema;
 
+	/* Support functions for TRUNCATE */
+	ExecForeignTruncate_function ExecForeignTruncate;
+
 	/* Support functions for parallelism under Gather node */
 	IsForeignScanParallelSafe_function IsForeignScanParallelSafe;
 	EstimateDSMForeignScan_function EstimateDSMForeignScan;
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 30b72b6297..f83f88a82f 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -60,6 +60,7 @@
 #include "commands/user.h"
 #include "executor/executor.h"
 #include "foreign/foreign.h"
+#include "foreign/fdwapi.h"
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
@@ -295,6 +296,20 @@ struct DropRelationCallbackState
 #define		ATT_FOREIGN_TABLE		0x0020
 #define		ATT_PARTITIONED_INDEX	0x0040
 
+/*
+ * ForeignTruncateInfo
+ *
+ * Information related to truncation of foreign tables.  This
+ * is a single 

Re: [HACKERS] kqueue

2020-01-20 Thread Peter Eisentraut

On 2019-12-20 01:26, Thomas Munro wrote:

It's still my intention to get this committed eventually, but I got a
bit frazzled by conflicting reports on several operating systems.  For
FreeBSD, performance was improved in many cases, but there were also
some regressions that seemed to be related to ongoing work in the
kernel that seemed worth waiting for.  I don't have the details
swapped into my brain right now, but there was something about a big
kernel lock for Unix domain sockets which possibly explained some
local pgbench problems, and there was also a problem relating to
wakeup priority with some test parameters, which I'd need to go and
dig up.  If you want to test this and let us know how you get on,
that'd be great!  Here's a rebase against PostgreSQL's master branch,


I took this patch for a quick spin on macOS.  The result was that the 
test suite hangs in the test src/test/recovery/t/017_shm.pl.  I didn't 
see any mentions of this anywhere in the thread, but that test is newer 
than the beginning of this thread.  Can anyone confirm or deny this 
issue?  Is it specific to macOS perhaps?


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Crash in BRIN summarization

2020-01-20 Thread Alvaro Herrera
On 2020-Jan-20, Heikki Linnakangas wrote:

> Sorry, forgot all about it. Pushed now.

Thank you!

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Making psql error out on output failures

2020-01-20 Thread Daniel Verite
David Zhang wrote:

> Yes, I agree with you. For case 2 "select repeat('111', 100) \g 
> /mnt/ramdisk/file", it can be easily fixed with more accurate error 
> message similar to pg_dump, one example could be something like below. 
> But for case 1 "psql -d postgres  -At -c "select repeat('111', 100)" 
> > /mnt/ramdisk/file" , it may require a lot of refactoring work.

I don't quite see why you make that distinction? The relevant bits of
code are common, it's all the code in src/fe_utils/print.c called
from PrintQuery().

Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite




Re: Greatest Common Divisor

2020-01-20 Thread Dean Rasheed
Looking at the docs, I think it's worth going a little further than
just saying what the acronyms stand for -- especially since the
behaviour for zero inputs is an implementation choice (albeit the most
common one). I propose the following:

+   <entry>
+    greatest common divisor &mdash; the largest positive number that
+    divides both inputs with no remainder; returns <literal>0</literal> if
+    both inputs are zero
+   </entry>

and:

+   <entry>
+    least common multiple &mdash; the smallest strictly positive number
+    that is an integer multiple of both inputs; returns <literal>0</literal>
+    if either input is zero
+   </entry>

(I have tried to be precise in my use of terms like "number" and
"integer", to cover the different cases)

Regards,
Dean




Re: pgsql: Allow 'sslkey' and 'sslcert' in postgres_fdw user mappings

2020-01-20 Thread Andrew Dunstan


On 1/20/20 2:48 AM, Craig Ringer wrote:
> On Thu, 9 Jan 2020 at 22:38, Christoph Berg wrote:
>
> Re: Robert Haas 2020-01-09
>
> > Does this mean that a non-superuser can induce postgres_fdw to
> > read an arbitrary file from the local filesystem?
>
> Yes, see my comments in the "Allow 'sslkey' and 'sslcert' in
> postgres_fdw user mappings" thread.
>
>
> Ugh, I misread your comment.
>
> You raise a sensible concern.
>
> These options should be treated the same as the proposed option to
> allow passwordless connections: disallow creation or alteration of FDW
> connection strings that use them by non-superusers. So a superuser can
> define a user mapping that uses these options, but normal users may not.
>
>


Already done.


cheers


andrew


-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: Add support for automatically updating Unicode derived files

2020-01-20 Thread Peter Eisentraut

On 2020-01-15 01:37, Tom Lane wrote:

Peter Eisentraut  writes:

Committed, thanks.


This patch is making src/tools/pginclude/headerscheck unhappy:

./src/include/common/unicode_combining_table.h:3: error: array type has 
incomplete element type

I guess that header needs another #include, or else you need to
move some declarations around.


Hmm, this file is only meant to be included inside one particular 
function.  Making it standalone includable would seem to be unnecessary. 
 What should we do?


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Crash in BRIN summarization

2020-01-20 Thread Heikki Linnakangas

On 17/01/2020 23:35, Alvaro Herrera wrote:

On 2019-Aug-28, Heikki Linnakangas wrote:


I bumped into a little bug in BRIN, while hacking on something unrelated.
This causes a segfault, or an assertion failure if assertions are enabled:


Heikki, I just noticed that you haven't pushed this bugfix.  Would you
like me to?  (If I don't hear from you, I'll probably get to it next
week.)


Sorry, forgot all about it. Pushed now.

On 28/08/2019 14:03, Emre Hasegeli wrote:
>> brin_inclusion_union() has a similar issue, but I didn't write a script
>> to reproduce that. Fix attached.
>
> I am not sure about this part.  If it's okay to use col_a->bv_values
> without copying, it should also be okay to use col_b->bv_values, no?

No, the 'a' and 'b' arguments are not symmetric. 'a' is in a long-lived 
memory context, and the function modifies it in place, whereas 'b' is 
short-lived and isn't touched by the function.
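
So the rule of thumb is that anything taken from 'b' has to be copied before
it is stored into 'a'; roughly (a sketch of the pattern, not the committed
diff, with attr describing the stored type):

	col_a->bv_values[INCLUSION_UNION] =
		datumCopy(col_b->bv_values[INCLUSION_UNION],
				  attr->attbyval, attr->attlen);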


- Heikki




Re: PATCH: standby crashed when replay block which truncated in standby but failed to truncate in master node

2020-01-20 Thread Amit Kapila
On Sat, Jan 18, 2020 at 9:18 AM Michael Paquier  wrote:
>
> On Fri, Jan 17, 2020 at 07:36:51PM +0900, Fujii Masao wrote:
> > So, what about
> >
> > ---
> > causes the system to ignore invalid page references in WAL records
> > (but still report a warning), and continue the recovery.
> > ---
>
> And that sounds good to me.  Switching the patch as ready for
> committer.
>

Are we planning to do something about the original problem reported in
this thread?


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: Remove page-read callback from XLogReaderState.

2020-01-20 Thread Kyotaro Horiguchi
Thanks!

At Fri, 17 Jan 2020 20:14:36 +0200, Heikki Linnakangas  wrote 
in 
> On 29/11/2019 10:14, Kyotaro Horiguchi wrote:
> > At Thu, 28 Nov 2019 21:37:03 +0900 (JST), Kyotaro Horiguchi
> >  wrote in
>  0dc8ead463 hit this. Rebased.
> >>>
> >>> Please review the pg_waldump.c hunks in 0001; they revert recent
> >>> changes.
> >>
> >> Ughhh! I'l check it. Thank you for noticing!!
> > Fixed that, re-rebased and small comment and cosmetic changes in this
> > version.
> 
> Thanks! I finally got around to look at this again. A lot has happened
> since I last looked at this. Notably, commit 0dc8ead463 introduced
> another callback function into the XLogReader interface. It's not
> entirely clear what the big picture with the new callback was and how
> that interacts with the refactoring here. I'll have to spend some time
> to make myself familiar with those changes.
> 
> Earlier in this thread, you wrote:
> > - Changed calling convention of XLogReadRecord
> > To make caller loop simple, XLogReadRecord now allows to specify
> > the same valid value while reading the a record. No longer need
> > to change lsn to invalid after the first call in the following
> > reader loop.
> >while (XLogReadRecord(state, lsn, , ) ==
> >XLREAD_NEED_DATA)
> >{
> >  if (!page_reader(state))
> >break;
> >}
> 
> Actually, I had also made a similar change in the patch version I
> posted at
> https://www.postgresql.org/message-id/4f7a5fad-fa04-b0a3-231b-56d200b646dc%40iki.fi.
>  Maybe
> you missed that email? It looks like you didn't include any of the
> changes from that in the patch series. In any case, clearly that idea
> has some merit, since we both independently made a change in that
> calling convention :-).

I'm very sorry but I missed that...

> Actually, I propose that we make that change first, and then continue
> reviewing the rest of these patches. I think it's a more convenient
> interface, independently of the callback refactoring. What do you
> think of the attached patch?

Separating XLogBeginRead seems reasonable.  The annoying recptr trick
is no longer needed. In particular, some logical-decoding stuff becomes
far cleaner with the patch.
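
If I read the patch correctly, callers end up with roughly this shape (sketch):

	XLogBeginRead(xlogreader, startpoint);
	for (;;)
	{
		XLogRecord *record = XLogReadRecord(xlogreader, &errormsg);

		if (record == NULL)
			break;				/* errormsg, if set, tells why */

		/* ... consume the record ... */
	}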

fetching_ckpt of ReadRecord is a bit annoying but that couldn't be
moved outside to, say, ReadCheckpointRecord in a clean way. (Anyway
XLogBeginRead is not the place, of course).

As I looked through it, it looks good to me but give me some more time
to look closer.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Improve errors when setting incorrect bounds for SSL protocols

2020-01-20 Thread Peter Eisentraut

On 2020-01-15 03:28, Michael Paquier wrote:

Good points.  And the get routines are not that portable in OpenSSL
either even if HEAD supports 1.0.1 and newer versions...  Attached is
an updated patch which uses a GUC check for both parameters, and
provides a hint on top of the original error message.  The SSL context
does not get reloaded if there is an error, so the errors from OpenSSL
cannot be triggered as far as I checked (after mixing a couple of
correct and incorrect combinations manually).


The reason this wasn't done originally is that it is not correct to have 
GUC check hooks that refer to other GUC variables, because otherwise you 
get inconsistent behavior depending on the order of processing of the 
assignments.  In this case, I think it would work because you have 
symmetric checks for both variables, but in general it is a problematic 
strategy.
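
To make the ordering hazard concrete, a simplified sketch of such a check
hook (not the patch's exact code):

static bool
check_ssl_min_protocol_version(int *newval, void **extra, GucSource source)
{
	/*
	 * This consults another GUC's current value; with asymmetric checks,
	 * whether a configuration is rejected would depend on which of the two
	 * assignments the GUC machinery happens to process first.
	 */
	if (ssl_max_protocol_version != 0 &&
		*newval > ssl_max_protocol_version)
	{
		GUC_check_errdetail("\"ssl_min_protocol_version\" cannot be greater "
							"than \"ssl_max_protocol_version\".");
		return false;
	}
	return true;
}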


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Allow 'sslkey' and 'sslcert' in postgres_fdw user mappings

2020-01-20 Thread Craig Ringer
On Fri, 10 Jan 2020 at 06:16, Andrew Dunstan 
wrote:

> On Fri, Jan 10, 2020 at 8:32 AM Tom Lane  wrote:
> >
> > Andrew Dunstan  writes:
> > > On Fri, Jan 10, 2020 at 1:21 AM Robert Haas 
> wrote:
> > >> I share the concern about the security issue here. I can't testify to
> > >> whether Christoph's whole analysis is here, but as a general point,
> > >> non-superusers can't be allowed to do things that cause the server to
> > >> access arbitrary local files.
> >
> > > It's probably fairly easy to do (c.f. 6136e94dcb). I'm not (yet)
> > > convinced that there is any significant security threat here. This
> > > doesn't give the user or indeed any postgres code any access to the
> > > contents of these files. But if there is a consensus to restrict this
> > > I'll do it.
> >
> > Well, even without access to the file contents, the mere ability to
> > probe the existence of a file is something we don't want unprivileged
> > users to have.  And (I suppose) this is enough for that, by looking
> > at what error you get back from trying it.
> >
>
>
> OK, that's convincing enough. Will do it before long.


Thanks. I'm 100% convinced the superuser restriction should be imposed. I
can imagine there being a risk of leaking file contents in error output
such as parse errors from OpenSSL that we pass on for example. Tricking Pg
into reading from a fifo could be problematic too.

I should've applied that restriction from the start, the same way as
passwordless connections are restricted.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise


Re: Greatest Common Divisor

2020-01-20 Thread Vik Fearing
On 20/01/2020 08:44, Dean Rasheed wrote:
> On Tue, 7 Jan 2020 at 12:31, Tom Lane  wrote:
>> Dean Rasheed  writes:
>>> Do we actually need the smallint versions of these functions?
>> Doubt it.  It'd be fairly hard even to call those, since e.g. "42"
>> is an int not a smallint.
>>
> I see this has been marked RFC. I'll take it, 


Thanks!


> and barring objections,
> I'll start by ripping out the smallint code.


No strong objection.

-- 

Vik Fearing