Re: empty column name in error message

2019-12-17 Thread Michael Paquier
On Wed, Dec 18, 2019 at 11:23:17AM +0900, Amit Langote wrote:
> Thanks for adding the test case.

For the archives: this has been applied as of 2acab05.
--
Michael




Re: Rework manipulation and structure of attribute mappings

2019-12-17 Thread Michael Paquier
On Tue, Dec 17, 2019 at 01:54:27PM +0900, Amit Langote wrote:
> Thanks for the updated patch.  I don't have any comments, except that
> the text I suggested a couple of weeks ago no longer reads clearly:

I have spent a couple of extra hours on the patch, and committed it.
There was one issue in logicalrelation.h which failed to compile
standalone.
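
For the archives, the fix was just to add the missing include there; from
memory it was along these lines, so treat the exact header name as
illustrative:

/* in src/include/replication/logicalrelation.h */
#include "access/attmap.h"		/* for AttrMap, which this header now uses */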

> + * by DDL operating on inheritance and partition trees to convert fully
> + * transformed expression trees from parent rowtype to child rowtype or
> + * vice-versa.
> 
> Maybe:
> 
> ...to adjust the Vars in fully transformed expression trees to bear
> output relation's attribute numbers.

I have used something more generic at the end:
+ * mappings by comparing input and output TupleDescs.  Such mappings
+ * are typically used by DDL operating on inheritance and partition trees
+ * to do a conversion between rowtypes logically equivalent but with
+ * columns in a different order, taking into account dropped columns.
--
Michael




Re: logical decoding : exceeded maxAllocatedDescs for .spill files

2019-12-17 Thread Amit Khandekar
On Tue, 17 Dec 2019 at 17:40, Amit Khandekar  wrote:
> By the way, the backport patch is turning out to be simpler. It's
> because in pre-12 versions, the file offset is part of the Vfd
> structure, so all the offset handling is not required.

Please have a look at the attached backport patch for the PG 11 branch.
Once you are ok with the patch, I will port it to the other branches.
Note that in the patch, wherever applicable, I have renamed the fd
variable to vfd to signify that it is a VFD and not a kernel fd. Without
the renaming the patch would be even smaller, but I think the renaming
makes sense.
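
To illustrate the distinction for anyone not familiar with fd.c: a File
is an index into fd.c's virtual file descriptor table, subject to LRU
open/close, rather than a raw kernel fd. An illustrative sketch using the
PG 11 fd.c API (not an excerpt from the patch; path is assumed to be the
spill file's path):

{
	char		buf[BLCKSZ];
	int			nread;
	File		vfd;			/* fd.c's VFD handle, not a kernel fd */

	vfd = PathNameOpenFile(path, O_RDONLY | PG_BINARY);
	if (vfd < 0)
		ereport(ERROR,
				(errcode_for_file_access(),
				 errmsg("could not open file \"%s\": %m", path)));
	nread = FileRead(vfd, buf, sizeof(buf), WAIT_EVENT_REORDER_BUFFER_READ);
	if (nread < 0)
		ereport(ERROR,
				(errcode_for_file_access(),
				 errmsg("could not read file \"%s\": %m", path)));
	FileClose(vfd);				/* releases the VFD slot as well */
}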

The recovery TAP tests don't seem to exist on the 9.4 and 9.5 branches,
so I think it's ok for the patches on those branches not to include any
tests.

-- 
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company


pg11_use_vfd_for_logrep.patch
Description: Binary data


obsolete comment in ExecBRUpdateTriggers()

2019-12-17 Thread Amit Langote
Hi,

It seems that commit d986d4e87f6 forgot to update a comment when renaming a variable.

Attached fixes it.

Thanks,
Amit
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index faeea16d21..99cb5bf557 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -3033,8 +3033,8 @@ ExecBRUpdateTriggers(EState *estate, EPQState *epqstate,
 		/*
 		 * In READ COMMITTED isolation level it's possible that target tuple
 		 * was changed due to concurrent update.  In that case we have a raw
-		 * subplan output tuple in newSlot, and need to run it through the
-		 * junk filter to produce an insertable tuple.
+		 * subplan output tuple in epqslot_candidate, and need to run it
+		 * through the junk filter to produce an insertable tuple.
 		 *
 		 * Caution: more than likely, the passed-in slot is the same as the
 		 * junkfilter's output slot, so we are clobbering the original value


Re: [HACKERS] Block level parallel vacuum

2019-12-17 Thread Amit Kapila
[please trim extra text before responding]

On Wed, Dec 18, 2019 at 12:01 PM Mahendra Singh  wrote:
>
> On Tue, 10 Dec 2019 at 00:30, Mahendra Singh  wrote:
> >
> >
> > 3.
> > After the v35 patch, the vacuum.sql regression test is taking too much
> > time due to the large number of inserts, so by reducing the number of
> > tuples we can reduce that time.
> > +INSERT INTO pvactst SELECT i, array[1,2,3], point(i, i+1) FROM generate_series(1,10) i;
> >
> > Here, instead of 10, we can make it 1000 to reduce the time of this test
> > case, because we only want to test the code and functionality.
>
> As we added a check of min_parallel_index_scan_size in the v36 patch set
> to decide on parallel vacuum, 1000 tuples are not enough to do a parallel
> vacuum. I can see that we are not launching any workers in the vacuum.sql
> test case and hence code coverage has also decreased. I am not sure
> how to fix this.
>

Try setting min_parallel_index_scan_size to 0 in the test case.



-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Block level parallel vacuum

2019-12-17 Thread Amit Kapila
On Wed, Dec 18, 2019 at 11:46 AM Masahiko Sawada  wrote:
>
> On Wed, 18 Dec 2019 at 15:03, Amit Kapila  wrote:
> >
> > I was analyzing your changes related to ReinitializeParallelDSM() and
> > it seems like we might launch more workers than required for the
> > bulkdelete phase.  While creating a parallel context, we used the
> > maximum of "workers required for the bulkdelete phase" and "workers
> > required for cleanup", but now if the number of workers required for the
> > bulkdelete phase is less than for the cleanup phase (as mentioned by you
> > in one example), then we would launch more workers for the bulkdelete
> > phase.
>
> Good catch. Currently when creating a parallel context the number of
> workers passed to CreateParallelContext() is set not only to
> pcxt->nworkers but also pcxt->nworkers_to_launch. We would need to
> specify the number of workers to actually launch after creating the
> parallel context, or when creating it. Alternatively, I think we could
> call ReinitializeParallelDSM() even the first time we run index vacuum.
>

How about just having a ReinitializeParallelWorkers() function which,
for now, can be called only from vacuum, even the first time before the
workers are launched?
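
Roughly like this, say (an untested sketch; it relies on ParallelContext
already keeping nworkers, used for DSM sizing, separate from
nworkers_to_launch):

/*
 * Reduce the number of workers to launch, without touching the DSM,
 * which stays sized for pcxt->nworkers.
 */
void
ReinitializeParallelWorkers(ParallelContext *pcxt, int nworkers_to_launch)
{
	/* We may only reduce the launch count, never exceed the DSM sizing. */
	Assert(pcxt->nworkers >= nworkers_to_launch);
	pcxt->nworkers_to_launch = nworkers_to_launch;
}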


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Block level parallel vacuum

2019-12-17 Thread Mahendra Singh
On Tue, 10 Dec 2019 at 00:30, Mahendra Singh  wrote:
>
> On Fri, 6 Dec 2019 at 10:50, Amit Kapila  wrote:
> >
> > On Thu, Dec 5, 2019 at 7:44 PM Robert Haas  wrote:
> > >
> > > I think it might be a good idea to change what we expect index AMs to
> > > do rather than trying to make anything that they happen to be doing
> > > right now work, no matter how crazy. In particular, suppose we say
> > > that you CAN'T add data on to the end of IndexBulkDeleteResult any
> > > more, and that instead the extra data is passed through a separate
> > > parameter. And then you add an estimate method that gives the size of
> > > the space provided by that parameter (and if the estimate method isn't
> > > defined then the extra parameter is passed as NULL) and document that
> > > the data stored there might get flat-copied.
> > >
> >
> > I think this is a good idea and serves the purpose we are trying to
> > achieve currently.  However, if there are any index AMs that use the
> > current way of passing stats with additional information, they would
> > need to change even if they don't want to use the parallel vacuum
> > functionality (say because their indexes are too small or for whatever
> > other reason).  I think this is a reasonable trade-off and the
> > changes on their end won't be that big.  So, we should do this.
> >
> > > Now, you've taken the
> > > onus off of parallel vacuum to cope with any crazy thing a
> > > hypothetical AM might be doing, and instead you've defined the
> > > behavior of that hypothetical AM as wrong. If somebody really needs
> > > that, it's now their job to modify the index AM machinery further
> > > instead of your job to somehow cope.
> > >
> >
> > makes sense.
> >
> > > > Here, we have a need to reduce the number of workers.  Index Vacuum
> > > > has two different phases (index vacuum and index cleanup) which use
> > > > the same parallel context/DSM, but both could have different
> > > > requirements for workers.  The second phase (cleanup) would normally
> > > > need fewer workers, because if the work was done in the first phase,
> > > > the second wouldn't need it; but we have exceptions like gin indexes,
> > > > where we need workers for the second phase as well because it makes a
> > > > pass over the index again even if we cleaned the index in the first
> > > > phase.  Now, consider the case where we have 3 btree indexes and 2 gin
> > > > indexes: we would need 5 workers for the index vacuum phase and 2
> > > > workers for the index cleanup phase.  There are other cases too.
> > > >
> > > > We also considered having a separate DSM for each phase, but that
> > > > appeared to have overhead without much benefit.
> > >
> > > How about adding an additional argument to ReinitializeParallelDSM()
> > > that allows the number of workers to be reduced? That seems like it
> > > would be less confusing than what you have now, and would involve
> > > modify code in a lot fewer places.
> > >
> >
> > Yeah, we can do that.  We can maintain some information in
> > LVParallelState which indicates whether we need to reinitialize the
> > DSM before launching workers.  Sawada-San, do you see any problem with
> > this idea?
> >
> >
> > > > > Is there any legitimate use case for parallel vacuum in combination
> > > > > with vacuum cost delay?
> > > > >
> > > >
> > > > Yeah, we also initially thought that it is not legitimate to use a
> > > > parallel vacuum with a cost delay.  But to get a wider view, we
> > > > started a separate thread [2] and there we reach to the conclusion
> > > > that we need a solution for throttling [3].
> > >
> > > OK, thanks for the pointer. This doesn't address the other part of my
> > > complaint, though, which is that the whole discussion between you and
> > > Dilip and Sawada-san presumes that you want the delays ought to be
> > > scattered across the workers roughly in proportion to their share of
> > > the I/O, and it seems NOT AT ALL clear that this is actually a
> > > desirable property. You're all assuming that, but none of you has
> > > justified it, and I think the opposite might be true in some cases.
> > >
> >
> > IIUC, your complaint is that in some cases, even if the I/O rate is
> > enough for one worker, we will still launch more workers and throttle
> > them.  The point is we can't know in advance how much I/O is required
> > for each index.  We can try to do that based on index size, but I
> > don't think that will be right, because it is possible that for a
> > bigger index we don't need to dirty the pages, most of the pages
> > are in shared buffers, etc.  The current algorithm won't use more I/O
> > than required, and it will be good for cases where one or some of the
> > indexes are doing more I/O as compared to others; it will also work
> > equally well when the indexes have a similar amount of work.  I
> > think we could do better if we could predict how much I/O each index
> > requires before actually scanning the index.
> >
> > I agree with the other points (add a 

Re: partition routing layering in nodeModifyTable.c

2019-12-17 Thread Amit Langote
On Thu, Sep 26, 2019 at 1:56 PM Amit Langote  wrote:
> On Wed, Sep 4, 2019 at 10:45 AM Amit Langote  wrote:
> > On Fri, Aug 9, 2019 at 10:51 AM Amit Langote  
> > wrote:
> > To avoid losing track of this, I've added this to November CF.
> >
> > https://commitfest.postgresql.org/25/2277/
> >
> > Struggled a bit to give a title to the entry though.
>
> Noticed that one of the patches needed a rebase.
>
> Attached updated patches.  Note that v8-0001 is v7-0001 unchanged that
> Fujita-san posted on Aug 8.

Rebased again.

Thanks,
Amit
From a0f6939e5b2ee8f78c813545f45d59a346d22e5f Mon Sep 17 00:00:00 2001
From: Etsuro Fujita 
Date: Thu, 8 Aug 2019 21:41:12 +0900
Subject: [PATCH v9 1/4] Remove dependency on estate->es_result_relation_info
 from FDW APIs.

FDW APIs for executing a foreign table direct modification assumed that
the FDW would obtain the target foreign table's ResultRelInfo from
estate->es_result_relation_info of the passed-in ForeignScanState node,
but the upcoming patch(es) to refactor partitioning-related code in
nodeModifyTable.c will remove the es_result_relation_info variable.
Revise BeginDirectModify()'s API to pass the ResultRelInfo explicitly, to
remove the dependency on that variable from the FDW APIs.  For
ExecInitForeignScan() to efficiently get the ResultRelInfo to pass to
BeginDirectModify(), add a field to ForeignScan that gives the index of
the target foreign table in the list of the query result relations.

Patch by Amit Langote, following a proposal by Andres Freund, reviewed by
Andres Freund and me

Discussion: 
https://postgr.es/m/20190718010911.l6xcdv6birtxi...@alap3.anarazel.de
---
 contrib/postgres_fdw/postgres_fdw.c | 25 +++--
 doc/src/sgml/fdwhandler.sgml|  8 ++--
 src/backend/executor/nodeForeignscan.c  | 14 ++
 src/backend/nodes/copyfuncs.c   |  1 +
 src/backend/nodes/outfuncs.c|  1 +
 src/backend/nodes/readfuncs.c   |  1 +
 src/backend/optimizer/plan/createplan.c |  2 ++
 src/backend/optimizer/plan/setrefs.c| 15 +++
 src/include/foreign/fdwapi.h|  1 +
 src/include/nodes/plannodes.h   |  3 +++
 10 files changed, 59 insertions(+), 12 deletions(-)

diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index bdc21b36d1..642deaf7cb 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -218,6 +218,7 @@ typedef struct PgFdwDirectModifyState
 	int			num_tuples;		/* # of result tuples */
 	int			next_tuple;		/* index of next one to return */
 	Relation	resultRel;		/* relcache entry for the target relation */
+	ResultRelInfo *resultRelInfo;	/* ResultRelInfo for the target relation */
 	AttrNumber *attnoMap;		/* array of attnums of input user columns */
 	AttrNumber	ctidAttno;		/* attnum of input ctid column */
 	AttrNumber	oidAttno;		/* attnum of input oid column */
@@ -361,7 +362,9 @@ static bool postgresPlanDirectModify(PlannerInfo *root,
 									 ModifyTable *plan,
 									 Index resultRelation,
 									 int subplan_index);
-static void postgresBeginDirectModify(ForeignScanState *node, int eflags);
+static void postgresBeginDirectModify(ForeignScanState *node,
+									  ResultRelInfo *rinfo,
+									  int eflags);
 static TupleTableSlot *postgresIterateDirectModify(ForeignScanState *node);
 static void postgresEndDirectModify(ForeignScanState *node);
 static void postgresExplainForeignScan(ForeignScanState *node,
@@ -2319,6 +2322,11 @@ postgresPlanDirectModify(PlannerInfo *root,
 		rebuild_fdw_scan_tlist(fscan, returningList);
 	}
 
+	/*
+	 * Set the index of the subplan result rel.
+	 */
+	fscan->resultRelIndex = subplan_index;
+
 	table_close(rel, NoLock);
 	return true;
 }
@@ -2328,7 +2336,9 @@ postgresPlanDirectModify(PlannerInfo *root,
  * Prepare a direct foreign table modification
  */
 static void
-postgresBeginDirectModify(ForeignScanState *node, int eflags)
+postgresBeginDirectModify(ForeignScanState *node,
+						  ResultRelInfo *rinfo,
+						  int eflags)
 {
 	ForeignScan *fsplan = (ForeignScan *) node->ss.ps.plan;
 	EState	   *estate = node->ss.ps.state;
@@ -2356,7 +2366,7 @@ postgresBeginDirectModify(ForeignScanState *node, int eflags)
 	 * Identify which user to do the remote access as.  This should match what
 	 * ExecCheckRTEPerms() does.
 	 */
-

Re: [HACKERS] Block level parallel vacuum

2019-12-17 Thread Masahiko Sawada
On Wed, 18 Dec 2019 at 03:39, Mahendra Singh  wrote:
>
>
> Thanks for the updated patches.  I verified all my reported issues and
> all are fixed in the v36 patch set.
>
> Below are some review comments:
> 1.
> +   /* cap by max_parallel_maintenace_workers */
> +   parallel_workers = Min(parallel_workers, max_parallel_maintenance_workers);
>
> Here, the spelling of max_parallel_maintenace_workers is wrong.  (correct:
> max_parallel_maintenance_workers)
>
> 2.
> + * size of stats for each index.  Also, this function   Since currently we 
> don't support parallel vacuum
> + * for autovacuum we don't need to care about autovacuum_work_mem
>
> Here, I think the first line should be changed because it is not
> grammatically correct.

Thank you for reviewing and testing this patch. I'll incorporate your
comments in the next version of the patch.

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Block level parallel vacuum

2019-12-17 Thread Masahiko Sawada
On Wed, 18 Dec 2019 at 15:03, Amit Kapila  wrote:
>
> On Tue, Dec 17, 2019 at 6:07 PM Masahiko Sawada
>  wrote:
> >
> > On Fri, 13 Dec 2019 at 15:50, Amit Kapila  wrote:
> > >
> > > > > I think it shouldn't be more than the number with which we have
> > > > > created a parallel context, no?  If that is the case, then I think it
> > > > > should be fine.
> > > >
> > > Right. I thought that ReinitializeParallelDSM() with an additional
> > > argument would reduce the DSM but I understand that it doesn't actually
> > > reduce the DSM but just has a variable for the number of workers to
> > > launch, is that right?
> > > >
> > >
> > > Yeah, probably we need to change the nworkers stored in the context,
> > > and it should be less than the value already stored in that number.
> > >
> > > > And we also would need to call
> > > > ReinitializeParallelDSM() at the beginning of vacuum index or vacuum
> > > > cleanup since we don't know that we will do either index vacuum or
> > > > index cleanup, at the end of index vacuum.
> > > >
> > >
> > > Right.
> >
> > I've attached the latest version patch set. These patches require the
> > gist vacuum patch [1]. The patch set incorporates the review comments.
> >
>
> I was analyzing your changes related to ReinitializeParallelDSM() and
> it seems like we might launch more workers than required for the
> bulkdelete phase.  While creating a parallel context, we used the
> maximum of "workers required for the bulkdelete phase" and "workers
> required for cleanup", but now if the number of workers required for the
> bulkdelete phase is less than for the cleanup phase (as mentioned by you
> in one example), then we would launch more workers for the bulkdelete
> phase.

Good catch. Currently when creating a parallel context the number of
workers passed to CreateParallelContext() is set not only to
pcxt->nworkers but also pcxt->nworkers_to_launch. We would need to
specify the number of workers to actually launch after creating the
parallel context, or when creating it. Alternatively, I think we could
call ReinitializeParallelDSM() even the first time we run index vacuum.

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: unsupportable composite type partition keys

2019-12-17 Thread Amit Langote
On Wed, Dec 18, 2019 at 2:12 AM Tom Lane  wrote:
> Amit Langote  writes:
> > It seems to me that we currently allow expressions that are anonymous
> > and self-referencing composite type records as partition key, but
> > shouldn't.  Allowing them leads to this:
>
> Hm.  Seems like the restrictions here ought to be just about the same
> as on index columns, no?  That is, it should be roughly a test like
> "no pseudo-types".  The check you're proposing seems awfully specific,
> and I doubt that the equivalent check in CREATE INDEX looks the same.
> (But I didn't go look ... I might be wrong.)

We also need to disallow self-referencing composite types in the case
of partitioning, because otherwise it leads to the infinite recursion
shown in my first email.

The timing of building PartitionDesc is what causes it: the
construction of PartitionBoundInfo in turn requires opening the parent
relation if the partition key is of a self-referencing composite type,
because we need the TupleDesc when sorting the partition bounds.  Maybe
we'll need to rearrange that someday so that PartitionDesc is built
outside the RelationBuildDesc path and this infinite recursion doesn't
occur, but maybe allowing this case isn't that useful to begin with?
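
Perhaps partitioning could simply reuse the checks applied to table and
index columns, something like this rough, untested sketch for
ComputePartitionAttrs() (whether CHKATYPE_ANYARRAY is the right flag set
for partition keys needs more thought):

/*
 * Reject pseudo-types, and seed containing_rowtypes with the table's
 * own rowtype so that self-referencing composite types are rejected.
 */
CheckAttributeType("partition key",
				   exprType(expr), exprCollation(expr),
				   list_make1_oid(rel->rd_rel->reltype),
				   CHKATYPE_ANYARRAY);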

Thanks,
Amit




Re: [HACKERS] Block level parallel vacuum

2019-12-17 Thread Amit Kapila
On Tue, Dec 17, 2019 at 6:07 PM Masahiko Sawada  wrote:
>
> On Fri, 13 Dec 2019 at 15:50, Amit Kapila  wrote:
> >
> > > > I think it shouldn't be more than the number with which we have
> > > > created a parallel context, no?  If that is the case, then I think it
> > > > should be fine.
> > >
> > > Right. I thought that ReinitializeParallelDSM() with an additional
> > > argument would reduce the DSM but I understand that it doesn't actually
> > > reduce the DSM but just has a variable for the number of workers to
> > > launch, is that right?
> > >
> >
> > Yeah, probably we need to change the nworkers stored in the context,
> > and it should be less than the value already stored in that number.
> >
> > > And we also would need to call
> > > ReinitializeParallelDSM() at the beginning of vacuum index or vacuum
> > > cleanup since we don't know that we will do either index vacuum or
> > > index cleanup, at the end of index vacuum.
> > >
> >
> > Right.
>
> I've attached the latest version patch set. These patches require the
> gist vacuum patch [1]. The patch set incorporates the review comments.
>

I was analyzing your changes related to ReinitializeParallelDSM() and
it seems like we might launch more workers than required for the
bulkdelete phase.  While creating a parallel context, we used the
maximum of "workers required for the bulkdelete phase" and "workers
required for cleanup", but now if the number of workers required for the
bulkdelete phase is less than for the cleanup phase (as mentioned by you
in one example), then we would launch more workers for the bulkdelete
phase.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: progress report for ANALYZE

2019-12-17 Thread Tatsuro Yamada

Hi Amit-san,



  >>> 1)

For now, I'm not sure whether current_child_table_relid should be set to zero
when the current phase changes from "acquiring inherited sample rows" to
"computing stats". See below.


In the upthread discussion [1], Robert asked to *not* do such things,
that is, resetting some values due to phase change.  I'm not sure his
point applies to this case too though.


//Below "computing stats" is for the partitioned table hoge,
//therefore the second column from the left side would be
//better to set Zero to easy to understand.


On second thought, maybe we should not reset, because that might be
considered loss of information.  To avoid confusion, we should simply
document that current_child_table_relid is only valid during the
"acquiring inherited sample rows" phase.



Okay, agreed. :)

 

2)
There are many "finalizing analyze" phases based on relids in the case
of partitioning tables. Would it better to fix the document? or it
would be better to reduce it to one?


It would be better to modify the documentation of the "finalizing analyze" phase.

# Before modify
 The command is updating pg_class. When this phase is completed,
 ANALYZE will end.

# Modified
 The command is updating pg_class. When this phase is completed,
 ANALYZE will end. In the case of a partitioned table,
 it might be shown for each partition.

What do you think? I'm going to fix it if you agree. :)


*All* phases are repeated in this case, not just "finalizing
analyze", because ANALYZE repeatedly runs for each partition after the
parent partitioned table's ANALYZE finishes.  ANALYZE's documentation
mentions that analyzing a partitioned table also analyzes all of its
partitions, so users should expect to see the progress information for
each partition.  So, I don't think we need to clarify that in only
one phase's description.  Maybe we can add a note after the phase
description table which mentions this implementation detail about
partitioned tables.  Like this:

   

 Note that when ANALYZE is run on a partitioned table,
 all of its partitions are also recursively analyzed as also mentioned on
 .  In that case, ANALYZE
 progress is reported first for the parent table, whereby its inheritance
 statistics are collected, followed by that for each partition.

   



Ah, you are right: all phases are repeated, so we shouldn't fix
only the one phase's description.



Some more comments on the documentation:

+   Number of computed extended stats.  This counter only advances
when the phase
+   is computing extended stats.

Number of computed extended stats -> Number of extended stats computed



Will fix.

 

+   Number of analyzed child tables.  This counter only advances
when the phase
+   is computing extended stats.

Regarding, "Number of analyzed child table", note that we don't
"analyze" child tables in this phase, only scan its blocks to collect
samples for parent's ANALYZE.  Also, the 2nd sentence is wrong -- you
meant "when the phase is acquiring inherited sample
rows.  I suggest to write this as follows:

Number of child tables scanned.  This counter only advances when the phase
is acquiring inherited sample rows.



Oops, I will fix it. :)


 

+ OID of the child table currently being scanned.
+   It might be different from relid when analyzing tables that
have child tables.

I suggest:

OID of the child table currently being scanned.  This field is only valid when
the phase is acquiring inherited sample rows.



Will fix.



+   The command is currently scanning the
current_relid
+   to obtain samples.

I suggest:

The command is currently scanning the table given by
current_relid to obtain sample rows.



Will fix.

 

+   The command is currently scanning the
current_child_table_relid
+   to obtain samples.

I suggest (based on the pg_stat_progress_create_index phase
descriptions):

The command is currently scanning child tables to obtain sample rows.  Columns
child_tables_total,
child_tables_done, and
current_child_table_relid contain the progress
information for this phase.



Will fix.



+
+ computing stats

I think the phase name should really be "computing statistics", that
is, use the full word.



Will fix.

 

+ 
+   The command is computing stats from the samples obtained
during the table scan.
+ 
+

So I suggest:

The command is computing statistics from the sample rows obtained during
the table scan.



Will fix.

 

+ computing extended stats
+ 
+   The command is computing extended stats from the samples
obtained in the previous phase.
+ 

I suggest:

The command is computing extended statistics from the sample rows obtained
during the table scan.



Will fix.


Thanks for your useful suggestions above. They help me a lot. :)


Cheers!
Tatsuro Yamada





Re: archive status ".ready" files may be created too early

2019-12-17 Thread Fujii Masao
On Fri, Dec 13, 2019 at 7:50 AM Bossart, Nathan  wrote:
>
> Hi hackers,
>
> I believe I've uncovered a bug that may cause archive status ".ready"
> files to be created too early, which in turn may cause an incorrect
> version of the corresponding WAL segment to be archived.
>
> The crux of the issue seems to be that XLogWrite() does not wait for
> the entire record to be written to disk before creating the ".ready"
> file.  Instead, it just waits for the last page of the segment to be
> written before notifying the archiver.  If PostgreSQL crashes before
> it is able to write the rest of the record, it will end up reusing the
> ".ready" segment at the end of crash recovery.  In the meantime, the
> archiver process may have already processed the old version of the
> segment.

Maybe I'm missing something... But since XLogWrite() seems to
call issue_xlog_fsync() before XLogArchiveNotifySeg(), ISTM that
this trouble shouldn't happen. No?

Regards,

-- 
Fujii Masao




Re: non-exclusive backup cleanup is mildly broken

2019-12-17 Thread Kyotaro Horiguchi
At Tue, 17 Dec 2019 15:48:40 -0500, Tom Lane  wrote in 
> I wrote:
> > Took a quick look.  I agree that this seems a lot cleaner than the
> > alternative proposals.  I'd suggest however that the header comment
> > for do_pg_abort_backup could do with more work, perhaps along the
> > lines of "The odd-looking signature allows this to be registered
> > directly as a shmem_exit handler".
> 
> > Personally I'd have kept the handler as a separate function that's just
> > a one-line wrapper around "void do_pg_abort_backup(bool emit_warning)".
> > We don't usually treat callbacks as functions to be also called in
> > their own right.  But if you don't want to do that, I'll settle for an
> > acknowledgement of the hack in the comment.
> 
> Oh, scratch that --- looking closer, I see that the only two use-cases in
> the patched code are via before_shmem_exit and PG_ENSURE_ERROR_CLEANUP,
> and both of those require a function with the signature of an on_exit
> callback.  So there's no need for a separate wrapper because this isn't
> going to be called any other way.  I still recommend amending the
> comment to explain why it has this signature, though.

The existing comment of do_pg_abort_backup, quoted below, seems a bit
stale to me. I think it can be revised that way.

  * NB: This is only for aborting a non-exclusive backup that doesn't write
  * backup_label. A backup started with pg_start_backup() needs to be finished
  * with pg_stop_backup().


Don't we need a regression test for this behavior?


+ * Register a handler that will warn about unterminated backups at end of
+ * session, unless this has already been done.

Though I'm not sure it's necessary, the comment might need to mention
cleaning up, which is what the function actually does.


Other than the above, this looks good to me. The patch applies cleanly
and works as expected.


In another branch of this thread,

At Tue, 17 Dec 2019 08:38:13 -0500, Robert Haas  wrote in
> On the question of whether the assertion is needed, it is currently
> the case that you could call cancel_before_shmem_exit() without
> knowing whether you'd actually registered a handler or not. With the
> proposed change, that would no longer be legal. Maybe that's OK. But
> it doesn't seem entirely crazy to have some error-recovery path where
> cancel_before_shmem_exit() could get called twice in obscure
> circumstances, and right now, you can rest easy, knowing that the
> second call just won't do anything. If we make it an assertion failure
> to do such things, then you can't. On the other hand, maybe there's no
> use for such a construct, and it'd be better to just reduce confusion.
> Anyway, I think this is a separate topic from fixing this specific
> problem.

That part is not for the case of "not knowing whether you'd actually
registered a handler or not", but for the case of "not knowing whether
someone else could have registered a handler after the registration I
made". But there is not much difference, and I don't insist on it,
for the reasons you mentioned above. Thanks for elaborating.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: non-exclusive backup cleanup is mildly broken

2019-12-17 Thread Fujii Masao
On Wed, Dec 18, 2019 at 12:19 PM Robert Haas  wrote:
>
> On Tue, Dec 17, 2019 at 6:40 PM Michael Paquier  wrote:
> > So that's how you prevent piling up multiple registrations of this
> > callback compared to v1.  FWIW, I think that it is a cleaner approach
> > to remove the callback once a non-exclusive backup is done, because a
> > session has no need for it once it is done with its non-exclusive
> > backup, and this session may remain around for some time.
>
> The fact that the session may remain around for some time isn't really
> relevant, because the callback isn't consuming any resources. It does
> not increase memory usage by a single byte, nor CPU consumption
> either. It does consume a few CPU cycles when the backend finally
> exits, but the number of such cycles is very small and unrelated to
> the length of the session. And removing the callback isn't entirely
> free, either.
>
> I think the real point for me is that it's bad to have two sources of
> truth. We have the sessionBackupState variable that tells us whether
> we're in a backup, and then we separately have whether or not the
> callback is registered. If those two things ever get out of sync, as
> they do in the test cases that I've proposed, then we have problems --
> so it's better not to maintain the state in two separate ways.
>
> The way it's set up right now actually seems quite fragile even apart
> from the problem with cancel_before_shmem_exit(). do_pg_stop_backup()
> sets sessionBackupState to SESSION_BACKUP_NONE and then does things
> that might fail. If they do, then the cancel_before_shmem_exit()
> callback will leave the callback installed, which can lead to a
> spurious warning or assertion failure later as in the original report.
> The only way to avoid that problem would be to move the
> cancel_before_shmem_exit() callback so that it happens right next to
> setting sessionBackupState to SESSION_BACKUP_NONE -- note the comments
> there saying we can't even do WALInsertLockRelease() between updating
> XLogCtl and updating sessionBackupState. But that would be very ugly,
> because we'd then have to pass a flag to do_pg_stop_backup() saying
> whether to remove the callback, since only one caller wants that
> behavior.
>
> And, we'd have to change cancel_before_shmem_exit() to search the
> whole array, which would almost certainly cost more cycles than
> leaving a do-nothing callback around.

If the pg_abort_backup callback function can be called safely even when
a backup is not in progress, we can just use a global variable like
pg_abort_backup_registered to register the callback function only
on the first call. That way, cancel_before_shmem_exit() doesn't need to
search the array to get rid of the function.
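
Something like this, say (a sketch; it is essentially what
register_persistent_abort_backup_handler() in the v2 patch does, with the
function name made up here):

static bool pg_abort_backup_registered = false;

/* Register do_pg_abort_backup() once per session and never cancel it. */
static void
register_abort_backup_callback(void)
{
	if (!pg_abort_backup_registered)
	{
		before_shmem_exit(do_pg_abort_backup, BoolGetDatum(true));
		pg_abort_backup_registered = true;
	}
}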

Regards,

-- 
Fujii Masao




Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.

2019-12-17 Thread Peter Geoghegan
On Thu, Dec 12, 2019 at 6:21 PM Peter Geoghegan  wrote:
> Still waiting for some review of the first patch, to get it out of the
> way. Anastasia?

I plan to commit this first patch [1] in the next day or two, barring
any objections.

It's clear that the nbtree "pin scan" VACUUM code is totally
unnecessary -- it really should have been fully removed by commit
3e4b7d87 back in 2016.

[1] 
https://www.postgresql.org/message-id/flat/CAH2-WzkWLRDzCaxsGvA_pZoaix_2AC9S6%3D-D6JMLkQYhqrJuEg%40mail.gmail.com#daed349a71ff9d7ac726cc0e3e01a436
-- 
Peter Geoghegan




Re: non-exclusive backup cleanup is mildly broken

2019-12-17 Thread Robert Haas
On Tue, Dec 17, 2019 at 6:40 PM Michael Paquier  wrote:
> So that's how you prevent piling up multiple registrations of this
> callback compared to v1.  FWIW, I think that it is a cleaner approach
> to remove the callback once a non-exclusive backup is done, because a
> session has no need for it once it is done with its non-exclusive
> backup, and this session may remain around for some time.

The fact that the session may remain around for some time isn't really
relevant, because the callback isn't consuming any resources. It does
not increase memory usage by a single byte, nor CPU consumption
either. It does consume a few CPU cycles when the backend finally
exits, but the number of such cycles is very small and unrelated to
the length of the session. And removing the callback isn't entirely
free, either.

I think the real point for me is that it's bad to have two sources of
truth. We have the sessionBackupState variable that tells us whether
we're in a backup, and then we separately have whether or not the
callback is registered. If those two things ever get out of sync, as
they do in the test cases that I've proposed, then we have problems --
so it's better not to maintain the state in two separate ways.

The way it's set up right now actually seems quite fragile even apart
from the problem with cancel_before_shmem_exit(). do_pg_stop_backup()
sets sessionBackupState to SESSION_BACKUP_NONE and then does things
that might fail. If they do, then the cancel_before_shmem_exit()
callback will leave the callback installed, which can lead to a
spurious warning or assertion failure later as in the original report.
The only way to avoid that problem would be to move the
cancel_before_shmem_exit() callback so that it happens right next to
setting sessionBackupState to SESSION_BACKUP_NONE -- note the comments
there saying we can't even do WALInsertLockRelease() between updating
XLogCtl and updating sessionBackupState. But that would be very ugly,
because we'd then have to pass a flag to do_pg_stop_backup() saying
whether to remove the callback, since only one caller wants that
behavior.

And, we'd have to change cancel_before_shmem_exit() to search the
whole array, which would almost certainly cost more cycles than
leaving a do-nothing callback around.
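
For the record, a whole-array version would look roughly like this (a
sketch against the static arrays in ipc.c; today only the most recent
registration is examined):

void
cancel_before_shmem_exit(pg_on_exit_callback function, Datum arg)
{
	int			i;

	for (i = before_shmem_exit_index - 1; i >= 0; i--)
	{
		if (before_shmem_exit_list[i].function == function &&
			before_shmem_exit_list[i].arg == arg)
		{
			/* Slide the remaining entries down to fill the hole. */
			memmove(&before_shmem_exit_list[i],
					&before_shmem_exit_list[i + 1],
					(before_shmem_exit_index - i - 1) *
					sizeof(before_shmem_exit_list[0]));
			before_shmem_exit_index--;
			return;
		}
	}
}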

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: segmentation fault when cassert enabled

2019-12-17 Thread Amit Kapila
On Tue, Dec 17, 2019 at 6:01 PM vignesh C  wrote:
>
> On Tue, Dec 17, 2019 at 10:09 AM Amit Kapila  wrote:
> >
> > Attached patch with updated commit message based on suggestions.  I am
> > planning to commit this tomorrow unless there are more comments.
> >
>
While testing the patch on back branches, I found that the patch does
not apply on the PG 11 and PG 10 branches. The attached patch has the
changes for PG 11 and PG 10. The only difference in the patch was that
table_open needed to be changed to heap_open. I have verified the patch
on the back branches and found it to be working. For PG 12 and current,
the previous patch from Amit needs to be used; I'm not reattaching it here.
>

Pushed.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: RFC: split OBJS lines to one object per line

2019-12-17 Thread Mahendra Singh
On Wed, 18 Dec 2019 at 07:23, Michael Paquier  wrote:
>
> On Tue, Dec 17, 2019 at 11:40:17PM +0530, Mahendra Singh wrote:
> > I found some inconsistency in alphabetical order in
> > src/backend/tsearch/Makefile, src/backend/utils/Makefile and
> > src/pl/plpython/Makefile files.  Attached patch is fixing those order
> > related inconsistency.
>
> Thanks, committed.  The one-liner style is also used in ifaddrs, but

Thanks Michael for quick response.

> fmgrtab.c is generated so I have left that out.  Now, have you tried
> to compile plpython before sending this patch?  Because as you forgot
> to add one backslash after WIN32RES, compilation was failing there.
> And you also forgot to remove two backslashes at the end of the other
> two lists modified :)

Sorry, I forgot to add the backslashes. I will take care of it next time.

Thanks and Regards
Mahendra Thalor
EnterpriseDB: http://www.enterprisedb.com




Re: empty column name in error message

2019-12-17 Thread Amit Langote
On Wed, Dec 18, 2019 at 1:21 AM Tom Lane  wrote:
> Haven't read the patch in any detail yet, but that seems like
> an improvement.  And I guess we need a test case, or we'll
> break it again :-(

Thanks for adding the test case.

Regards,
Amit




Re: [PATCH] Windows port add support to BCryptGenRandom

2019-12-17 Thread Michael Paquier
On Tue, Dec 17, 2019 at 02:20:20PM +, Ranier Vilela wrote:
> Regarding [1], at src/include/port/win32.h, the comments still
> reference Windows XP and warn about a possible MinGW breakage.

This looks like a leftover of d9dd406, which has made the code to
require C99.  As we don't support compilation with Windows XP and
require Windows 7, we should be able to remove all the dance around
MIN_WINNT in win32.h, don't you think?
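
Something as simple as this should then do (a sketch; 0x0601 is the
Windows 7 version constant in the SDK headers):

/*
 * We require Windows 7 or newer, so the per-compiler computation of
 * MIN_WINNT can go away entirely.
 */
#define MIN_WINNT 0x0601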
--
Michael




Re: [PATCH] Improve documentation of REINDEX options

2019-12-17 Thread Michael Paquier
On Tue, Dec 17, 2019 at 10:23:51AM +0100, Josef Šimánek wrote:
> For me this is the default way to reindex a whole table manually in psql,
> since you get some "progress". Anyway, I can remove it if you don't see any
> benefit in extending this example.

I have thought more about this one through the night, and I am still
not sure that this brings much.  So, committed a simpler version
handling only the option part.
--
Michael




Re: RFC: split OBJS lines to one object per line

2019-12-17 Thread Michael Paquier
On Tue, Dec 17, 2019 at 11:40:17PM +0530, Mahendra Singh wrote:
> I found some inconsistency in alphabetical order in
> src/backend/tsearch/Makefile, src/backend/utils/Makefile and
> src/pl/plpython/Makefile files.  Attached patch is fixing those order
> related inconsistency.

Thanks, committed.  The one-liner style is also used in ifaddrs, but
fmgrtab.c is generated so I have left that out.  Now, have you tried
to compile plpython before sending this patch?  Because as you forgot
to add one backslash after WIN32RES, compilation was failing there.
And you also forgot to remove two backslashes at the end of the other
two lists modified :)
--
Michael




Re: [PATCH] Tiny optimization.

2019-12-17 Thread Bruce Momjian
On Sat, Nov 23, 2019 at 10:44:47AM +0100, Andreas Karlsson wrote:
> On 11/22/19 10:58 PM, Ranier Vilela wrote:
> > Remove redundant test.
> 
> Yeah, this test does look redundant, since we already check whether parent is
> NULL earlier in the function. Any optimizing compiler should see this too,
> but it is still a good idea to remove it for code clarity.

Agreed, patch applied.
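
For readers without the patch handy, the shape of the redundancy is
essentially this (a generic sketch with made-up names, not the actual
hunk):

static void
example(Node *parent)
{
	if (parent == NULL)
		return;

	/* ... other work ... */

	if (parent != NULL)			/* always true here, hence redundant */
		process(parent);
}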

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.

2019-12-17 Thread Peter Geoghegan
On Tue, Dec 17, 2019 at 5:18 PM Bruce Momjian  wrote:
> On Tue, Dec 17, 2019 at 03:30:33PM -0800, Peter Geoghegan wrote:
> > With many real world unique indexes, the true reason behind most or
> > all B-Tree page splits is "version churn". I view these page splits as
> > a permanent solution to a temporary problem -- we *permanently*
> > degrade the index structure in order to deal with a *temporary* burst
> > in versions that need to be stored. That's really bad.
>
> Yes, I was wondering why we would need to optimize duplicates in a unique
> index, but then remembered it is a version problem.

The whole idea of deduplication in unique indexes is hard to explain.
It just sounds odd. Also, it works using the same infrastructure as
regular deduplication, while having rather different goals.
Fortunately, it seems like we don't really have to tell users about it
in order for them to see a benefit -- there will be no choice for them
to make there (they just get it).

The regular deduplication stuff isn't confusing at all, though. It has
some noticeable though small downside, so it will be documented and
configurable. (I'm optimistic that it can be enabled by default,
because even with high cardinality non-unique indexes the downside is
rather small -- we waste some CPU cycles just before a page is split.)

-- 
Peter Geoghegan




Re: [Proposal] Level4 Warnings show many shadow vars

2019-12-17 Thread Michael Paquier
On Tue, Dec 17, 2019 at 09:36:13AM +0900, Michael Paquier wrote:
> As that's a confusion I introduced with d9fadbf, I would like to fix
> that and backpatch this change down to 11.  (Ranier gets the
> authorship per se as that's extracted from a larger patch).

Committed that part.

I got to look at the rest of the stuff discussed, and I am not sure
that any of the changes are actually things which improve
readability.

Let's take one example.  The changes in pg_dump/ like
/progname/prog_name/ have just been done in haste, without actual
thought about how the problem ought to be fixed.  And in this case,
something more suitable would be to remove the argument from
usage(), because progname is a global variable, initialized at startup
in pg_restore.
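
In other words, something like this (a sketch):

/* progname is global and set at startup, so usage() needs no parameter */
static void
usage(void)
{
	printf(_("%s restores a PostgreSQL database from an archive created by pg_dump.\n\n"),
		   progname);
	/* ... rest of the help output unchanged ... */
}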
--
Michael




Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.

2019-12-17 Thread Bruce Momjian
On Tue, Dec 17, 2019 at 03:30:33PM -0800, Peter Geoghegan wrote:
> With many real world unique indexes, the true reason behind most or
> all B-Tree page splits is "version churn". I view these page splits as
> a permanent solution to a temporary problem -- we *permanently*
> degrade the index structure in order to deal with a *temporary* burst
> in versions that need to be stored. That's really bad.

Yes, I was wondering why we would need to optimize duplicates in a unique
index, but then remembered it is a version problem.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: [PATCH] Fix possible underflow in expression (maxoff - 1)

2019-12-17 Thread Thomas Munro
On Mon, Nov 25, 2019 at 8:21 AM Ranier Vilela  wrote:
> >Where are you getting this stuff from? Are you using a static analysis tool?

> Yes, two static tools, but reviewed by me.

If you're working on/with static code analysis tools, I have some
requests :-)  How could we automate the discovery of latch wait
programming mistakes?
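
(To illustrate the kind of mistake I mean: resetting the latch after
checking for work can eat a wakeup. The documented-safe loop shape, with
made-up have_work()/do_work() helpers, looks like this; a checker could
flag loops that reset in the wrong place:)

for (;;)
{
	ResetLatch(MyLatch);		/* reset *before* re-checking the condition */

	if (have_work())
		do_work();

	(void) WaitLatch(MyLatch, WL_LATCH_SET | WL_EXIT_ON_PM_DEATH,
					 -1L, WAIT_EVENT_MQ_INTERNAL);
	CHECK_FOR_INTERRUPTS();
}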




Re: [HACKERS] Proposal to add work_mem option to postgres_fdw module

2019-12-17 Thread Jeff Janes
On Fri, Sep 7, 2018 at 9:17 AM Peter Eisentraut <peter.eisentr...@2ndquadrant.com> wrote:

> On 05/09/2018 18:46, Peter Eisentraut wrote:
> > On 01/09/2018 06:33, Shinoda, Noriyoshi (PN Japan GCS Delivery) wrote:
> >> Certainly the PQconndefaults function specifies Debug flag for the
> >> "options" option.
> >> I think that eliminating the Debug flag is the simplest solution.
> >> For attached patches, GUC can be specified with the following syntax.
> >>
> >> CREATE SERVER remsvr1 FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host
> >> 'host 1', ..., options '-c work_mem=64MB -c geqo=off');
> >> ALTER SERVER remsv11 OPTIONS (SET options '-c work_mem=64MB -c geqo=off');
> >>
> >> However, I am afraid of the effect that this patch will change the
> >> behavior of official API PQconndefaults.
> >
> > It doesn't change the behavior of the API, it just changes the result of
> > the API function, which is legitimate and the reason we have the API
> > function in the first place.
> >
> > I think this patch is fine.  I'll work on committing it.
>
> I have committed just the libpq change.  The documentation change was
> redundant, because the documentation already stated that all libpq
> options are accepted.  (Arguably, the documentation was wrong before.)
>

Since the effect of this commit was to make postgres_fdw actually comply
with its documentation, should this have been back-patched?  Is there a
danger in backpatching this libpq change to older versions?

This seems like more of a bug fix than an added feature.

Cheers,

Jeff


Re: non-exclusive backup cleanup is mildly broken

2019-12-17 Thread Michael Paquier
On Tue, Dec 17, 2019 at 12:52:05PM -0500, Robert Haas wrote:
> On Tue, Dec 17, 2019 at 8:38 AM Robert Haas  wrote:
>> Since there doesn't seem to be any opposition to my original fix,
>> except for the fact that I included a bug in it, I'm going to go see
>> about getting that committed.
> 
> Perhaps I spoke too soon: I'm not sure whether Michael's comments
> amount to an objection. While I give him a chance to respond, here's
> an updated patch.

stoppoint = do_pg_stop_backup(label_file->data, waitforarchive, NULL);
-   cancel_before_shmem_exit(nonexclusive_base_backup_cleanup, (Datum) 0);
[...]
+void
+register_persistent_abort_backup_handler(void)
+{
+   static bool already_done = false;
+
+   if (already_done)
+   return;
So that's how you prevent piling up multiple registrations of this
callback compared to v1.  FWIW, I think that it is a cleaner approach
to remove the callback once a non-exclusive backup is done, because a
session has no need for it once it is done with its non-exclusive
backup, and this session may remain around for some time.

+   if (emit_warning)
+       ereport(WARNING,
+               (errmsg("aborting backup due to backend exiting before pg_stop_back up was called")));
This warning is incorrect => "pg_stop_back up".  (Mentioned upthread.)
--
Michael




Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.

2019-12-17 Thread Peter Geoghegan
On Tue, Dec 17, 2019 at 1:58 PM Bruce Momjian  wrote:
> > Attached is v26, which adds this new criteria/heuristic for unique
> > indexes. We now seem to consistently get good results with unique
> > indexes.
>
> In the past we tried to increase the number of cases where HOT updates
> can happen but were unable to.

Right -- the WARM project.

The Z-heap project won't change the fundamentals here. It isn't going
to solve the fundamental problem of requiring that the index AM create
a new set of physical index tuples in at least *some* cases. A heap
tuple cannot be updated in-place when even one indexed column changes
-- you're not much better off than you were with the classic heapam,
because indexes get bloated in a way that wouldn't happen with Oracle.
(Even still, Z-heap isn't sensitive to when and how opportunistic heap
pruning takes place, and doesn't have the same issue with having to
fit the heap tuple on the same page or create a new HOT chain. This
will make things much better with some workloads.)

> Would this help with non-HOT updates?

Definitely, yes. The strategy used with unique indexes is specifically
designed to avoid "unnecessary" page splits altogether -- it only
makes sense because of the possibility of non-HOT UPDATEs with
mostly-unchanged index tuples. Thinking about what's going on here
from first principles is what drove the unique index deduplication
design:

With many real world unique indexes, the true reason behind most or
all B-Tree page splits is "version churn". I view these page splits as
a permanent solution to a temporary problem -- we *permanently*
degrade the index structure in order to deal with a *temporary* burst
in versions that need to be stored. That's really bad.

Consider a classic pgbench workload, for example. The smaller indexes
on the smaller tables (pgbench_tellers_pkey and pgbench_branches_pkey)
have leaf pages that will almost certainly be split a few minutes in,
even though the UPDATEs on the underlying tables never modify indexed
columns (i.e. even though HOT is as effective as it possibly could be
with this unthrottled workload). Actually, even the resulting split
pages will themselves usually be split again, and maybe even once more
after that.  We started out with leaf pages that stored just under 370
items each (with fillfactor 90 + 8KiB BLCKSZ), and end up
with leaf pages that often have less than 50 items (sometimes as few
as 10). Even though the "logical contents" of the index are *totally*
unchanged. This could almost be considered pathological by users.
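
(For anyone checking the arithmetic -- assuming a 24-byte page header,
16 bytes of b-tree special space, and 20 bytes per entry, i.e. a 16-byte
index tuple plus its 4-byte line pointer:
$\lfloor (8192 - 24 - 16) \times 0.90 / 20 \rfloor = 366$,
hence "just under 370".)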

Of course, it's easy to imagine a case where it matters a lot more
than classic pgbench (pgbench_tellers_pkey and pgbench_branches_pkey
are always small, so it's easy to see the effect, which is why I went
with that example). For example, you could have a long running
transaction, which would probably have the effect of significantly
bloating even the large pgbench index (pgbench_accounts_pkey) --
typically you won't see that with classic pgbench until you do
something to frustrate VACUUM (and opportunistic cleanup). (I have
mostly been using non-HOT UPDATEs to test the patch, though.)

In theory we could go even further than this by having some kind of
version store for indexes, and using this to stash old versions rather
than performing a page split. Then you wouldn't have any page splits
in the pgbench indexes; VACUUM would eventually be able to return the
index to its "pristine" state. The trade-off with that design would be
that index scans would have to access two index pages for a while (a
leaf page, plus its subsidiary old version page). Maybe we can
actually go that far in the future -- there are various database
research papers that describe designs like this (the designs described
within these papers do things like determine whether a "version split"
or a "value split" should be performed).

What we have now is an incremental improvement, that doesn't have any
apparent downside with unique indexes -- the way that deduplication is
triggered for unique indexes is almost certain to be a win. When
deduplication isn't triggered, everything works in the same way as
before -- it's "zero overhead" for unique indexes that don't benefit.
The design augments existing garbage collection mechanisms,
particularly the way in which we set LP_DEAD bits within
_bt_check_unique().

> Do we have any benchmarks where non-HOT updates cause slowdowns that we
> can test on this?

AFAICT, any workload that has lots of non-HOT updates will benefit at
least a little bit -- indexes will finish up smaller, there will be
higher throughput, and there will be a reduction in latency for
queries.

With the right distribution of values, it's not that hard to mostly
control bloat in an index that doubles in size without the
optimization, which is much more significant. I have already reported
on this [1]. I've also been able to observe increases of 15%-20% in
TPS with similar workloads (with commensurate 

Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.

2019-12-17 Thread Bruce Momjian
On Thu, Dec 12, 2019 at 06:21:20PM -0800, Peter Geoghegan wrote:
> On Tue, Dec 3, 2019 at 12:13 PM Peter Geoghegan  wrote:
> > The new criteria/heuristic for unique indexes is very simple: If a
> > unique index has an existing item that is a duplicate on the incoming
> > item at the point that we might have to split the page, then apply
> > deduplication. Otherwise (when the incoming item has no duplicates),
> > don't apply deduplication at all -- just accept that we'll have to
> > split the page. We already cache the bounds of our initial binary
> > search in insert state, so we can reuse that information within
> > _bt_findinsertloc() when considering deduplication in unique indexes.
> 
> Attached is v26, which adds this new criteria/heuristic for unique
> indexes. We now seem to consistently get good results with unique
> indexes.

In the past we tried to increase the number of cases where HOT updates
can happen but were unable to.  Would this help with non-HOT updates? 
Do we have any benchmarks where non-HOT updates cause slowdowns that we
can test on this?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: tableam vs. TOAST

2019-12-17 Thread Robert Haas
On Fri, Nov 22, 2019 at 10:41 AM Robert Haas  wrote:
> Thanks for the review. Updated patches attached. This version is more
> complete than the last set of patches I posted.  It looks like this:
>
> 0001 - Lets a table AM that needs a toast table choose the AM that
> will be used to implement the toast table.
> 0002 - Refactoring and code cleanup for TOAST code.
> 0003 - Move heap-specific portion of logic refactored by previous
> patch to a separate function.
> 0004 - Lets a table AM arrange to call a different function when
> detoasting, instead of the one created by 0003.

Hearing no further comments, I went ahead and pushed 0002 today. That
turned out to have a bug, so I pushed a fix for that. Hopefully the
buildfarm will agree that it's fixed.

Meanwhile, here are the remaining patches again, rebased over the bug fix.
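
To give a flavor of 0003/0004, the new table AM callback looks roughly
like this (paraphrased from the patch, so treat the exact signature as
illustrative):

/*
 * Fetch a slice of the TOAST value identified by valueid from this AM's
 * toast relation into result, replacing the heap-only code path.
 */
void		(*relation_fetch_toast_slice) (Relation toastrel,
										   Oid valueid,
										   int32 attrsize,
										   int32 sliceoffset,
										   int32 slicelength,
										   struct varlena *result);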

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


v10-0003-tableam-New-callback-relation_fetch_toast_slice.patch
Description: Binary data


v10-0001-tableam-Allow-choice-of-toast-AM.patch
Description: Binary data


v10-0002-Move-heap-specific-detoasting-logic-into-a-separ.patch
Description: Binary data


Re: non-exclusive backup cleanup is mildly broken

2019-12-17 Thread Tom Lane
I wrote:
> Took a quick look.  I agree that this seems a lot cleaner than the
> alternative proposals.  I'd suggest however that the header comment
> for do_pg_abort_backup could do with more work, perhaps along the
> lines of "The odd-looking signature allows this to be registered
> directly as a shmem_exit handler".

> Personally I'd have kept the handler as a separate function that's just
> a one-line wrapper around "void do_pg_abort_backup(bool emit_warning)".
> We don't usually treat callbacks as functions to be also called in
> their own right.  But if you don't want to do that, I'll settle for an
> acknowledgement of the hack in the comment.

Oh, scratch that --- looking closer, I see that the only two use-cases in
the patched code are via before_shmem_exit and PG_ENSURE_ERROR_CLEANUP,
and both of those require a function with the signature of an on_exit
callback.  So there's no need for a separate wrapper because this isn't
going to be called any other way.  I still recommend amending the
comment to explain why it has this signature, though.
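
(For readers following along, a minimal sketch of the shape in
question; the body is paraphrased and only the signature plus the two
registration idioms are the point, so treat this as an illustration,
not the committed code:)

/*
 * Both registration paths below demand the standard on_exit callback
 * shape (int code, Datum arg), which is why do_pg_abort_backup is
 * declared this way.
 */
static void
do_pg_abort_backup(int code, Datum arg)
{
	bool		emit_warning = DatumGetBool(arg);

	/* ... abort any non-exclusive backup still in progress ... */

	if (emit_warning)
		ereport(WARNING,
				(errmsg("aborting backup due to backend exiting "
						"before pg_stop_backup was called")));
}

/* used directly as a shmem_exit handler: */
before_shmem_exit(do_pg_abort_backup, BoolGetDatum(true));

/* ... and to guard a code range: */
PG_ENSURE_ERROR_CLEANUP(do_pg_abort_backup, BoolGetDatum(false));
{
	/* code that must not leave a backup dangling */
}
PG_END_ENSURE_ERROR_CLEANUP(do_pg_abort_backup, BoolGetDatum(false));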

regards, tom lane




Re: client auth docs seem to have devolved

2019-12-17 Thread Tom Lane
Magnus Hagander  writes:
> On Tue, Dec 17, 2019 at 5:01 PM Tom Lane  wrote:
>> I suggest changing the sect1's contents to be a list of available auth
>> methods, linked to their subsections.  That would provide approximately
>> the same quality-of-use as the subsection TOC that used to be there.

> Yeah, that sounds better. Is there some docbook magic that can do that for
> us?

I was just intending to do it the hard way, since even if such magic
exists, it'd probably only regurgitate the section titles.  It seems
more useful to allow for some descriptive text along with that.
(Not a lot, but maybe a full sentence for each one.)

regards, tom lane




Re: client auth docs seem to have devolved

2019-12-17 Thread Magnus Hagander
On Tue, Dec 17, 2019 at 5:01 PM Tom Lane  wrote:

> Magnus Hagander  writes:
> > This was changed by Peter in
> > commit 56811e57323faa453947eb82f007e323a952e1a1 along with the
> > restructuring. It used to say "the following subsections". So techically
> I
> > think that change is correct, but that doesn't necessarily make it
> helpful.
>
> > But based on how it actually renders, since that section doesn't contain
> > any actual useful info, we should perhaps just remove section 20.3
> > completely. Peter, thoughts?
>
> Then, URLs pointing to that page (such as Dave evidently has bookmarked)
> would break entirely, which doesn't seem like an improvement.
>

Ugh, that's a good point of course. Didn't think of that.


I suggest changing the sect1's contents to be a list of available auth
> methods, linked to their subsections.  That would provide approximately
> the same quality-of-use as the subsection TOC that used to be there.
>

Yeah, that sounds better. Is there some docbook magic that can do that for
us?

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: non-exclusive backup cleanup is mildly broken

2019-12-17 Thread Magnus Hagander
On Tue, Dec 17, 2019 at 7:05 PM Tom Lane  wrote:

> Robert Haas  writes:
> > Perhaps I spoke too soon: I'm not sure whether Michael's comments
> > amount to an objection. While I give him a chance to respond, here's
> > an updated patch.
>
> Took a quick look.  I agree that this seems a lot cleaner than the
> alternative proposals.  I'd suggest however that the header comment
> for do_pg_abort_backup could do with more work, perhaps along the
> lines of "The odd-looking signature allows this to be registered
> directly as a shmem_exit handler".
>
> Personally I'd have kept the handler as a separate function that's just
> a one-line wrapper around "void do_pg_abort_backup(bool emit_warning)".
> We don't usually treat callbacks as functions to be also called in
> their own right.  But if you don't want to do that, I'll settle for an
> acknowledgement of the hack in the comment.
>

As would I, but I'm also fine with either of the two ways.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: allowing broader use of simplehash

2019-12-17 Thread Robert Haas
On Sat, Dec 14, 2019 at 10:24 PM Robert Haas  wrote:
> On Thu, Dec 12, 2019 at 2:33 PM Andres Freund  wrote:
> > I was basically just thinking that we could pass the context to use via
> > CurrentMemoryContext, instead of explicitly passing it in.
>
> I thought about that, but as a general rule, replacing a function
> parameter with a global variable is the wrong direction. One could
> argue this particular case is a counterexample, and I won't fight
> tooth and nail if you want to take that position, but I don't think I
> believe it myself.

After confirming with Andres that he didn't have an objection to me
pressing forward with these patches, I have committed them.
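
(To illustrate the design choice for readers skimming the archive, here
is a caricature of the two options, using a hypothetical "foohash"
simplehash instantiation; only the comments distinguish them:)

/* explicit parameter, the direction argued for above: */
foohash_hash *h = foohash_create(my_context, 1024, NULL);

/* versus the implicit-global alternative that was discussed: */
MemoryContext oldcxt = MemoryContextSwitchTo(my_context);
foohash_hash *h2 = foohash_create(1024, NULL);	/* would read CurrentMemoryContext */
MemoryContextSwitchTo(oldcxt);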

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: global / super barriers (for checksums)

2019-12-17 Thread Sergei Kornilov
Hi

> Andrew Gierth complained about this too over on -committers, and I saw
> his message first and pushed a fix. It includes the first and third
> hunks from your proposed patch, but not the second one.

Yep, I received his email just after sending mine. Thanks, my build is clean 
now.

regards, Sergei




Re: global / super barriers (for checksums)

2019-12-17 Thread Robert Haas
On Tue, Dec 17, 2019 at 1:44 PM Sergei Kornilov  wrote:
> > Stellar. If nobody objects in the meantime, I plan to commit 0001-0003
> > next week.
>
> My compiler (gcc 8.3.0) is not happy with recent 
> 5910d6c7e311f0b14e3d3cb9ce3597c01d3a3cde commit:
>
> autovacuum.c:831:1: error: ‘AutoVacLauncherShutdown’ was used with no 
> prototype before its definition [-Werror=missing-prototypes]
> checkpointer.c:524:1: error: ‘HandleCheckpointerInterrupts’ was used with no 
> prototype before its definition [-Werror=missing-prototypes]
>
> I think definition should looks as in attached patch. With this change build 
> is clean

Andrew Gierth complained about this too over on -committers, and I saw
his message first and pushed a fix. It includes the first and third
hunks from your proposed patch, but not the second one.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: global / super barriers (for checksums)

2019-12-17 Thread Sergei Kornilov
Hello

> Stellar. If nobody objects in the meantime, I plan to commit 0001-0003
> next week.

My compiler (gcc 8.3.0) is not happy with the recent 
5910d6c7e311f0b14e3d3cb9ce3597c01d3a3cde commit:

autovacuum.c:831:1: error: ‘AutoVacLauncherShutdown’ was used with no prototype 
before its definition [-Werror=missing-prototypes]
checkpointer.c:524:1: error: ‘HandleCheckpointerInterrupts’ was used with no 
prototype before its definition [-Werror=missing-prototypes]

I think the definitions should look as in the attached patch. With this change 
the build is clean.

regards, Sergeidiff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 1792008ebe..47b4bacadb 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -311,7 +311,7 @@ NON_EXEC_STATIC void AutoVacLauncherMain(int argc, char *argv[]) pg_attribute_no
 
 static Oid	do_start_worker(void);
 static void HandleAutoVacLauncherInterrupts(void);
-static void AutoVacLauncherShutdown() pg_attribute_noreturn();
+static void AutoVacLauncherShutdown(void) pg_attribute_noreturn();
 static void launcher_determine_sleep(bool canlaunch, bool recursing,
 	 struct timeval *nap);
 static void launch_worker(TimestampTz now);
@@ -828,7 +828,7 @@ HandleAutoVacLauncherInterrupts(void)
  * Perform a normal exit from the autovac launcher.
  */
 static void
-AutoVacLauncherShutdown()
+AutoVacLauncherShutdown(void)
 {
 	ereport(DEBUG1,
 			(errmsg("autovacuum launcher shutting down")));
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index 3f35b324c3..df527ac021 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -161,7 +161,7 @@ static pg_time_t last_xlog_switch_time;
 
 /* Prototypes for private functions */
 
-static void HandleCheckpointerInterrupts();
+static void HandleCheckpointerInterrupts(void);
 static void CheckArchiveTimeout(void);
 static bool IsCheckpointOnSchedule(double progress);
 static bool ImmediateCheckpointRequested(void);


Re: [HACKERS] Block level parallel vacuum

2019-12-17 Thread Mahendra Singh
On Tue, 17 Dec 2019 at 18:07, Masahiko Sawada <
masahiko.saw...@2ndquadrant.com> wrote:
>
> On Fri, 13 Dec 2019 at 15:50, Amit Kapila  wrote:
> >
> > On Fri, Dec 13, 2019 at 11:08 AM Masahiko Sawada
> >  wrote:
> > >
> > > On Fri, 13 Dec 2019 at 14:19, Amit Kapila 
wrote:
> > > >
> > > > > > >
> > > > > > > How about adding an additional argument to
ReinitializeParallelDSM()
> > > > > > > that allows the number of workers to be reduced? That seems
like it
> > > > > > > would be less confusing than what you have now, and would
involve
> > > > > > > modify code in a lot fewer places.
> > > > > > >
> > > > > >
> > > > > > Yeah, we can do that.  We can maintain some information in
> > > > > > LVParallelState which indicates whether we need to reinitialize
the
> > > > > > DSM before launching workers.  Sawada-San, do you see any
problem with
> > > > > > this idea?
> > > > >
> > > > > I think the number of workers could be increased in cleanup
phase. For
> > > > > example, if we have 1 brin index and 2 gin indexes then in
bulkdelete
> > > > > phase we need only 1 worker but in cleanup we need 2 workers.
> > > > >
> > > >
> > > > I think it shouldn't be more than the number with which we have
> > > > created a parallel context, no?  If that is the case, then I think
it
> > > > should be fine.
> > >
> > > Right. I thought that ReinitializeParallelDSM() with an additional
> > > argument would reduce DSM but I understand that it doesn't actually
> > > reduce DSM but just have a variable for the number of workers to
> > > launch, is that right?
> > >
> >
> > Yeah, probably, we need to change the nworkers stored in the context
> > and it should be lesser than the value already stored in that number.
> >
> > > And we also would need to call
> > > ReinitializeParallelDSM() at the beginning of vacuum index or vacuum
> > > cleanup since we don't know that we will do either index vacuum or
> > > index cleanup, at the end of index vacum.
> > >
> >
> > Right.
>
> I've attached the latest version patch set. These patches requires the
> gist vacuum patch[1]. The patch incorporated the review comments. In
> current version patch only indexes that support parallel vacuum and
> whose size is larger than min_parallel_index_scan_size can participate
> parallel vacuum. I'm still not unclear to me that using
> min_parallel_index_scan_size is the best approach but I agreed to set
> a lower bound of relation size. I separated the patch for
> PARALLEL_VACUUM_DISABLE_LEADER_PARTICIPATION  from the main patch and
> I'm working on that patch.
>
> Please review it.
>
> [1]
https://www.postgresql.org/message-id/CAA4eK1J1RxmXFAHC34S4_BznT76cfbrvqORSk23iBgRAOj1azw%40mail.gmail.com

Thanks for the updated patches.  I verified all my reported issues, and all of
them are fixed in the v36 patch set.

Below are some review comments:
1.
+   /* cap by *max_parallel_maintenace_workers* */
+   parallel_workers = Min(parallel_workers, max_parallel_maintenance_workers);

Here, the spelling of max_parallel_*maintenace*_workers is wrong (correct:
max_parallel_maintenance_workers).

2.
+ * size of stats for each index.  Also, this function   Since currently we
don't support parallel vacuum

+ * for autovacuum we don't need to care about autovacuum_work_mem

Here, I think the first line should be changed because it is not
grammatically correct.

Thanks and Regards
Mahendra Thalor
EnterpriseDB: http://www.enterprisedb.com


Re: global / super barriers (for checksums)

2019-12-17 Thread Robert Haas
On Thu, Dec 12, 2019 at 2:54 PM Andres Freund  wrote:
> I'd either add a test (if we have some) or placeholder kind
> initially. But I'd also be ok with going for either of the other
> versions directly - but it seems harder to tackle the patches together.

OK. I have committed 0001-0003 as I had mentioned last week that I
intended to do. For 0004, I have replaced "sample" with "placeholder"
and added comments explaining that this is intended to be replaced by
the first real user of the mechanism. If there are no further
comments/objections I'll go ahead with this one as well.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


v4-0001-Extend-the-ProcSignal-mechanism-to-support-barrie.patch
Description: Binary data


Re: RFC: split OBJS lines to one object per line

2019-12-17 Thread Mahendra Singh
On Fri, 8 Nov 2019 at 14:38, Michael Paquier  wrote:
>
> On Thu, Nov 07, 2019 at 12:02:04PM -0500, Tom Lane wrote:
> > I don't think it'd be a great idea to change parallel_schedule like
> > that.  Independently adding test scripts to the same parallel batch
> > probably won't end well: you might end up over the concurrency limit,
> > or the scripts might conflict through sharing table names or the like.
> > So I'd rather see that there's a conflict to worry about.
> >
> > Anyway, merge conflicts there aren't so common IME.
>
> FWIW, I was not referring to the schedule files here, just to REGRESS
> and ISOLATION in the modules' Makefiles.  If you think that's not
> worth doing it, let's drop my suggestion then.
> --

I found some inconsistencies in the alphabetical ordering in
src/backend/tsearch/Makefile, src/backend/utils/Makefile and
src/pl/plpython/Makefile.  The attached patch fixes those ordering
inconsistencies.

Thanks and Regards
Mahendra Thalor
EnterpriseDB: http://www.enterprisedb.com


Fixed_backend_folder_makefiles_v1.patch
Description: Binary data


Re: non-exclusive backup cleanup is mildly broken

2019-12-17 Thread Tom Lane
Robert Haas  writes:
> Perhaps I spoke too soon: I'm not sure whether Michael's comments
> amount to an objection. While I give him a chance to respond, here's
> an updated patch.

Took a quick look.  I agree that this seems a lot cleaner than the
alternative proposals.  I'd suggest however that the header comment
for do_pg_abort_backup could do with more work, perhaps along the
lines of "The odd-looking signature allows this to be registered
directly as a shmem_exit handler".

Personally I'd have kept the handler as a separate function that's just
a one-line wrapper around "void do_pg_abort_backup(bool emit_warning)".
We don't usually treat callbacks as functions to be also called in
their own right.  But if you don't want to do that, I'll settle for an
acknowledgement of the hack in the comment.

regards, tom lane




Re: non-exclusive backup cleanup is mildly broken

2019-12-17 Thread Robert Haas
On Tue, Dec 17, 2019 at 8:38 AM Robert Haas  wrote:
> Since there doesn't seem to be any opposition to my original fix,
> except for the fact that I included a bug in it, I'm going to go see
> about getting that committed.

Perhaps I spoke too soon: I'm not sure whether Michael's comments
amount to an objection. While I give him a chance to respond, here's
an updated patch.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


v2-0001-Fix-minor-problems-with-non-exclusive-backup-clea.patch
Description: Binary data


Re: unsupportable composite type partition keys

2019-12-17 Thread Tom Lane
Amit Langote  writes:
> It seems to me that we currently allow expressions that are anonymous
> and self-referencing composite type records as partition key, but
> shouldn't.  Allowing them leads to this:

Hm.  Seems like the restrictions here ought to be just about the same
as on index columns, no?  That is, it should be roughly a test like
"no pseudo-types".  The check you're proposing seems awfully specific,
and I doubt that the equivalent check in CREATE INDEX looks the same.
(But I didn't go look ... I might be wrong.)
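
(Roughly the kind of test being suggested here, as a sketch only: the
analogous index-column error comes from CheckAttributeType() in
catalog/heap.c, and "attn" below is a hypothetical loop variable over
the partition key columns:)

if (get_typtype(atttypid) == TYPTYPE_PSEUDO)
	ereport(ERROR,
			(errcode(ERRCODE_INVALID_TABLE_DEFINITION),
			 errmsg("partition key column %d has pseudo-type %s",
					attn + 1, format_type_be(atttypid))));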

regards, tom lane




Re: client auth docs seem to have devolved

2019-12-17 Thread Dave Cramer
> Then, URLs pointing to that page (such as Dave evidently has bookmarked)
> would break entirely, which doesn't seem like an improvement.
>

it was linked to in a bug report.

Dave Cramer


Re: empty column name in error message

2019-12-17 Thread Tom Lane
Amit Langote  writes:
> I wonder if it's worthwhile to fix the following not-so-friendly error 
> message:

> create index on foo ((row(a)));
> ERROR:  column "" has pseudo-type record

Ugh.  That used to work more nicely:

regression=# create index fooi on foo ((row(a)));
ERROR:  column "pg_expression_1" has pseudo-type record

But that was back in 8.4 :-( ... 9.0 and up behave as you show.
I'm guessing we broke it when we rearranged the rules for naming
index expression columns.

> For example, the attached patch makes it this:

> create index on foo ((row(a)));
> ERROR:  column "row" has pseudo-type record

Haven't read the patch in any detail yet, but that seems like
an improvement.  And I guess we need a test case, or we'll
break it again :-(
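
(A sketch of what the improvement could look like at the error site;
whether the patch does exactly this is for review, and the surrounding
variable names are hypothetical:)

/*
 * If the column has no name, derive one from the expression the way
 * the parser would; FigureColname() returns "row" for row(a).
 */
if (attname == NULL || attname[0] == '\0')
	attname = FigureColname(expr);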

regards, tom lane




Re: client auth docs seem to have devolved

2019-12-17 Thread Tom Lane
Magnus Hagander  writes:
> This was changed by Peter in
> commit 56811e57323faa453947eb82f007e323a952e1a1 along with the
> restructuring. It used to say "the following subsections". So techically I
> think that change is correct, but that doesn't necessarily make it helpful.

> But based on how it actually renders, since that section doesn't contain
> any actual useful info, we should perhaps just remove section 20.3
> completely. Peter, thoughts?

Then, URLs pointing to that page (such as Dave evidently has bookmarked)
would break entirely, which doesn't seem like an improvement.

I suggest changing the sect1's contents to be a list of available auth
methods, linked to their subsections.  That would provide approximately
the same quality-of-use as the subsection TOC that used to be there.

regards, tom lane




RE: [PATCH] Windows port add support to BCryptGenRandom

2019-12-17 Thread Ranier Vilela
From: Michael Paquier
Sent: Tuesday, 17 December 2019 04:34
>So, this basically matches with what the MS documents tell us, and my
>impression: this API is available down to at least MSVC 2008, which is
>much more than what we support on HEAD where one can use MSVC 2013 and
>newer versions.  Note that for the minimal platforms supported our
>documentation cite Windows Server 2008 R2 SP1 and Windows 7, implying
>_WIN32_WINNT >= 0x0600.
Regarding [1], in src/include/port/win32.h the comments still reference 
Windows XP and warn about a possible MinGW breakage.

>In short, this means two things:
>- Your patch, as presented, is wrong.
Well, I tried to correct it to target MSVC 2013.

>There is no need to make conditional the use of BCryptGenRandom.
If the legacy Windows Crypto API is to remain, and the patch could break 
MinGW, I believe conditional use of BCryptGenRandom is necessary.
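
(A minimal sketch of what that conditional call could look like; this
assumes bcrypt.h and linking with bcrypt.lib, which are available from
_WIN32_WINNT >= 0x0600 onward, and the fallback branch is only
indicated by a comment:)

#include <stdbool.h>
#include <windows.h>
#include <bcrypt.h>

static bool
win32_system_random(void *buf, size_t len)
{
#if defined(_WIN32_WINNT) && _WIN32_WINNT >= 0x0600
	return BCRYPT_SUCCESS(BCryptGenRandom(NULL, (PUCHAR) buf, (ULONG) len,
										  BCRYPT_USE_SYSTEM_PREFERRED_RNG));
#else
	/* fall back to the legacy CryptGenRandom() path here */
	return false;
#endif
}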

Best regards,
Ranier Vilela


Re: non-exclusive backup cleanup is mildly broken

2019-12-17 Thread Robert Haas
On Fri, Dec 13, 2019 at 3:00 AM Michael Paquier  wrote:
> Agreed, that's an issue and do_pg_abort_backup should not touch
> sessionBackupState, so you should keep cancel_before_shmem_exit after
> pg_stop_backup().

I don't understand this comment, because that can't possibly work.  It
assumes either that nobody else is allowed to use before_shmem_exit()
after we do, or that cancel_before_shmem_exit() does something that it
doesn't actually do.

In general, before_shmem_exit() callbacks are intended to be
persistent, and therefore it's the responsibility of the callback to
test whether any work needs to be done. This particular callback is an
exception, assuming that it can remove itself when there's no longer
any work to be done.

> Other than that, I have looked in details at how
> safe it is to move before_shmem_exit(do_pg_abort_backup) before
> do_pg_start_backup() and the cleanups of nonExclusiveBackups happen
> safely and consistently in the event of an error during
> pg_start_backup().

I came to the same conclusion, but I think it's still better to
register the callback first. If the callback is properly written to do
nothing when there's nothing to do, then having it registered earlier
is harmless. And if, in the future, do_pg_start_backup() should be
changed in such a way that, say, it can throw an error at the very
end, then registering the handler first would prevent that from being
a bug.

It is generally more robust to register a cleanup handler in advance
and then count on it to do the right thing than to try to write code
that isn't allowed to fail in the wrong place.
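
(The convention described above, as a sketch with hypothetical names;
the callback is registered once, early, and is a no-op until there is
something to clean up:)

static bool backup_running = false;

/* persistent callback: tests whether there is anything to do */
static void
my_abort_backup(int code, Datum arg)
{
	if (!backup_running)
		return;				/* registered but idle: harmless no-op */
	backup_running = false;
	/* ... actual cleanup ... */
}

/* register first, then do the failure-prone work it protects */
before_shmem_exit(my_abort_backup, (Datum) 0);
start_backup();				/* hypothetical */
backup_running = true;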

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Improvement to psql's connection defaults

2019-12-17 Thread Christoph Moench-Tegeder
## Tomas Zubiri (m...@tomaszubiri.com):

> We already established that a tcp connection was subpar in terms of
> latency, we shall note then that a tcp connection is subpar in terms
> of security.

It's an entirely different thing, I'd argue. I'm not even convinced
that an error message is a bad thing: not specifying connection parameters
gives you the defaults (which are clearly documented - having the doc
maintainers enter into a SEO-contest would be expecting too much); and
if that fails, there's an error. Adding more guesswork on how to connect
to your database server will add confusion instead of reducing it.
(Where does "localhost" resolve to? Does it resolve at all? What
about IPv4 vs. IPv6? Is IP traffic allowed there? That's all stuff
which has been relevant in one way or the other while looking at existing
systems. The real world can deviate quite significantly from what one
would expect as "sane".) I for one prefer to have clear defaults
and clear error messages in case that does not work.

>  Additionally, it's
> already possible to have this subpar connection and differing
> interface on non-unix platforms.

I think there's only one relevant platform without unix sockets
left (I'm not sure about vxWorks and other embedded systems, but
their applications rarely include full-blown database servers),
and that system has gone great lengths to include a linux subsystem -
that might tell you something.

> >(a) don't mix-and-match Postgres packages from different vendors,
> 
> Since there's a client-server architecture here, I'm assuming that
> there's compatibility between different versions of the software.

Generally speaking: yes. But compiled-in defaults may not match
(like the location of the Unix sockets) across different vendor
packages of the same version. And client tools might have a hard
time working against a newer major release of the server: the protocol
does not change (at least, it didn't for a long time, except for some
additions like SCRAM authentication), but the catalog may have changed
between major versions and the client can't get the information it
needs.

>  Consider this example, if you are away
> from home and you tell Google Maps or Uber that you want to go to your
> city, does it fail claiming that it doesn't have enough information or
> claiming that the route it would take given the subpar information you
> gave it would be subpar?

That's a good example, but not in the way you think: people and vehicles
(trucks and lorries, even) ending up some hundreds of kilometres from
their intended destination because their navigation system "tried its
best" with an ambiguously entered name is quite a common occurrence here.
(For example, there are at least six places called "Forst" and some dozens
"Neustadt" - many more if you count boroughs and similar - in Germany).

Regards,
Christoph

-- 
Spare Space




RE: Windows port minor fixes

2019-12-17 Thread Ranier Vilela
From: Michael Paquier
Sent: Tuesday, 17 December 2019 04:45
>And if you actually group things together so as any individual looking
>at your patches does not have to figure out which piece applies to
>what, that's also better.
I'm still trying to find the best way.

>Anyway, the patch for putenv() is wrong in the way the memory is freed, but 
>this >has been mentioned on another thread.
Oh yes: depending on the glibc version, putenv copies the string, and in
other versions it keeps the pointer.
On the Windows side, Dr. Memory reported two memory leaks involving strdup.
The v2 patch is better because it simplifies the function.
I submitted a proposal for setenv support on Windows in another thread.

>We rely on MAXPGPATH heavily so your patch trying to change
>the buffer length does not bring much,
I am a little confused about which path you are talking about.
If it is about the path variable in the validate_exec function, I believe
there is a mistake there:

char path_exe[MAXPGPATH + sizeof(".exe") - 1];

The -1 is suspicious and can be removed.

Once there, I tried to improve the code by simplifying it and removing the
excessive number of functions.

On the Windows side, the code paths are less tested.
Dr. Memory reported 3794 potential unaddressable accesses in the WIN32 block,
in the pipe_read_line function, which calls validate_exec.

Best regards,
Ranier Vilela



Re: non-exclusive backup cleanup is mildly broken

2019-12-17 Thread Robert Haas
On Tue, Dec 17, 2019 at 1:31 AM Kyotaro Horiguchi
 wrote:
> The patch can cause removal of a wrong cleanup function on non-cassert
> build. That might be unwanted. But I think the assertion is needed
> anyway.

I agree with the first part of this critique, but not necessarily with
the second part. Right now, if you call cancel_before_shmem_exit(),
you might not remove the handler that you intended to remove, but you
won't remove some unrelated handler. With the patch, if assertions are
disabled, you will definitely remove something, but it might not be
the thing you intended to remove. That seems worse.
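
(Concretely, with two hypothetical handlers:)

before_shmem_exit(handler_a, (Datum) 0);
before_shmem_exit(handler_b, (Datum) 0);	/* unrelated, registered later */

/*
 * Today this is a silent no-op, because handler_a is not the most
 * recently registered entry; with the proposed change on a non-assert
 * build it could pop handler_b instead, which seems worse.
 */
cancel_before_shmem_exit(handler_a, (Datum) 0);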

On the question of whether the assertion is needed, it is currently
the case that you could call cancel_before_shmem_exit() without
knowing whether you'd actually registered a handler or not. With the
proposed change, that would no longer be legal. Maybe that's OK. But
it doesn't seem entirely crazy to have some error-recovery path where
cancel_before_shmem_exit() could get called twice in obscure
circumstances, and right now, you can rest easy, knowing that the
second call just won't do anything. If we make it an assertion failure
to do such things, then you can't. On the other hand, maybe there's no
use for such a construct, and it'd be better to just reduce confusion.
Anyway, I think this is a separate topic from fixing this specific
problem.

Since there doesn't seem to be any opposition to my original fix,
except for the fact that I included a bug in it, I'm going to go see
about getting that committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Block level parallel vacuum

2019-12-17 Thread Masahiko Sawada
On Fri, 13 Dec 2019 at 15:50, Amit Kapila  wrote:
>
> On Fri, Dec 13, 2019 at 11:08 AM Masahiko Sawada
>  wrote:
> >
> > On Fri, 13 Dec 2019 at 14:19, Amit Kapila  wrote:
> > >
> > > > > >
> > > > > > How about adding an additional argument to ReinitializeParallelDSM()
> > > > > > that allows the number of workers to be reduced? That seems like it
> > > > > > would be less confusing than what you have now, and would involve
> > > > > > modify code in a lot fewer places.
> > > > > >
> > > > >
> > > > > Yeah, we can do that.  We can maintain some information in
> > > > > LVParallelState which indicates whether we need to reinitialize the
> > > > > DSM before launching workers.  Sawada-San, do you see any problem with
> > > > > this idea?
> > > >
> > > > I think the number of workers could be increased in cleanup phase. For
> > > > example, if we have 1 brin index and 2 gin indexes then in bulkdelete
> > > > phase we need only 1 worker but in cleanup we need 2 workers.
> > > >
> > >
> > > I think it shouldn't be more than the number with which we have
> > > created a parallel context, no?  If that is the case, then I think it
> > > should be fine.
> >
> > Right. I thought that ReinitializeParallelDSM() with an additional
> > argument would reduce DSM but I understand that it doesn't actually
> > reduce DSM but just have a variable for the number of workers to
> > launch, is that right?
> >
>
> Yeah, probably, we need to change the nworkers stored in the context
> and it should be lesser than the value already stored in that number.
>
> > And we also would need to call
> > ReinitializeParallelDSM() at the beginning of vacuum index or vacuum
> > cleanup since we don't know that we will do either index vacuum or
> > index cleanup, at the end of index vacum.
> >
>
> Right.

I've attached the latest version patch set. These patches require the
gist vacuum patch[1]. The patches incorporate the review comments. In
the current version, only indexes that support parallel vacuum and
whose size is larger than min_parallel_index_scan_size can participate
in parallel vacuum. It's still unclear to me whether using
min_parallel_index_scan_size is the best approach, but I agreed to set
a lower bound on relation size. I separated the patch for
PARALLEL_VACUUM_DISABLE_LEADER_PARTICIPATION from the main patch and
I'm working on that patch.

Please review it.

[1] 
https://www.postgresql.org/message-id/CAA4eK1J1RxmXFAHC34S4_BznT76cfbrvqORSk23iBgRAOj1azw%40mail.gmail.com
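
(For reference, the ReinitializeParallelDSM() idea discussed above
might end up looking roughly like this; everything beyond the existing
ReinitializeParallelDSM()/LaunchParallelWorkers() API, including the
lps pointer and the nworkers_to_launch field, is hypothetical:)

/* before each index-vacuum or index-cleanup pass */
ReinitializeParallelDSM(lps->pcxt);
lps->pcxt->nworkers_to_launch =		/* hypothetical field or argument */
	Min(lps->pcxt->nworkers, nindexes_needing_workers);
LaunchParallelWorkers(lps->pcxt);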

--
Masahiko Sawadahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


v36-0001-Add-index-AM-field-and-callback-for-parallel-ind.patch
Description: Binary data


v36-0003-Add-FAST-option-to-vacuum-command.patch
Description: Binary data


v36-0002-Add-parallel-option-to-VACUUM-command.patch
Description: Binary data


Re: segmentation fault when cassert enabled

2019-12-17 Thread vignesh C
On Tue, Dec 17, 2019 at 10:09 AM Amit Kapila  wrote:
>
> On Mon, Dec 16, 2019 at 9:16 PM Jehan-Guillaume de Rorthais
>  wrote:
> >
> > On Mon, 16 Dec 2019 13:27:43 +0100
> > Peter Eisentraut  wrote:
> >
> > > On 2019-12-16 11:11, Amit Kapila wrote:
> > > > I agree that this is a timing issue.  I also don't see a way to write
> > > > a reproducible test for this.  However, I could reproduce it via
> > > > debugger consistently by following the below steps.  I have updated a
> > > > few comments and commit messages in the attached patch.
> > > >
> > > > Peter E., Petr J or anyone else, do you have comments or objections on
> > > > this patch?  If none, then I am planning to commit (and backpatch)
> > > > this patch in a few days time.
> > >
> > > The patch seems fine to me.  Writing a test seems hard.  Let's skip it.
> > >
>
> Okay.
>
> > > The commit message has a duplicate "building"/"built" in the first 
> > > sentence.
> >
> > I think the sentence is quite long. I had to re-read it to get it.
> >
> > What about:
> >
> >   This patch allows building the local relmap cache for a subscribed 
> > relation
> >   after processing pending invalidation messages and potential relcache
> >   updates.
> >
>
> Attached patch with updated commit message based on suggestions.  I am
> planning to commit this tomorrow unless there are more comments.
>

While testing the patch on back branches, I found that it does not
apply on the PG 11 and PG 10 branches. The attached patch has the
changes for the PG 11 and PG 10 branches. The only difference is that
table_open needed to be changed to heap_open. I have verified the patch
on the back branches and found it to be working. For PG 12 and master,
the previous patch from Amit needs to be used; I'm not reattaching it here.

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
From 2c4123d514473a9203c3617655229286a84487e6 Mon Sep 17 00:00:00 2001
From: vignesh 
Date: Tue, 17 Dec 2019 17:05:35 +0530
Subject: [PATCH] Fix subscriber invalid memory access on DDL.

This patch allows building the local relmap cache for a subscribed
relation after processing pending invalidation messages and potential
relcache updates.  Without this, the attributes in the local cache don't
tally with the updated relcache entry leading to invalid memory access.

Reported-by: Jehan-Guillaume de Rorthais
Author: Jehan-Guillaume de Rorthais and Vignesh C
Reviewed-by: Amit Kapila
Backpatch-through: 10
Discussion: https://postgr.es/m/20191025175929.7e90dbf5@firost
---
 src/backend/replication/logical/relation.c | 40 +++---
 1 file changed, 26 insertions(+), 14 deletions(-)

diff --git a/src/backend/replication/logical/relation.c b/src/backend/replication/logical/relation.c
index cf6f0a7..e1bbabd 100644
--- a/src/backend/replication/logical/relation.c
+++ b/src/backend/replication/logical/relation.c
@@ -220,6 +220,8 @@ logicalrep_rel_open(LogicalRepRelId remoteid, LOCKMODE lockmode)
 {
 	LogicalRepRelMapEntry *entry;
 	bool		found;
+	Oid			relid = InvalidOid;
+	LogicalRepRelation *remoterel;
 
 	if (LogicalRepRelMap == NULL)
 		logicalrep_relmap_init();
@@ -232,20 +234,17 @@ logicalrep_rel_open(LogicalRepRelId remoteid, LOCKMODE lockmode)
 		elog(ERROR, "no relation map entry for remote relation ID %u",
 			 remoteid);
 
-	/* Need to update the local cache? */
+	remoterel = >remoterel;
+
+	/*
+	 * When opening and locking a relation, pending invalidation messages are
+	 * processed which can invalidate the relation.  We need to update the
+	 * local cache both when we are first time accessing the relation and when
+	 * the relation is invalidated (aka entry->localreloid is set InvalidOid).
+	 */
 	if (!OidIsValid(entry->localreloid))
 	{
-		Oid			relid;
-		int			i;
-		int			found;
-		Bitmapset  *idkey;
-		TupleDesc	desc;
-		LogicalRepRelation *remoterel;
-		MemoryContext oldctx;
-
-		remoterel = >remoterel;
-
-		/* Try to find and lock the relation by name. */
+		/* Try to find and lock the relation by name */
 		relid = RangeVarGetRelid(makeRangeVar(remoterel->nspname,
 			  remoterel->relname, -1),
  lockmode, true);
@@ -256,6 +255,21 @@ logicalrep_rel_open(LogicalRepRelId remoteid, LOCKMODE lockmode)
 			remoterel->nspname, remoterel->relname)));
 		entry->localrel = heap_open(relid, NoLock);
 
+	}
+	else
+	{
+		relid = entry->localreloid;
+		entry->localrel = heap_open(entry->localreloid, lockmode);
+	}
+
+	if (!OidIsValid(entry->localreloid))
+	{
+		int			found;
+		Bitmapset  *idkey;
+		TupleDesc	desc;
+		MemoryContext oldctx;
+		int			i;
+
 		/* Check for supported relkind. */
 		CheckSubscriptionRelkind(entry->localrel->rd_rel->relkind,
  remoterel->nspname, remoterel->relname);
@@ -350,8 +364,6 @@ logicalrep_rel_open(LogicalRepRelId remoteid, LOCKMODE lockmode)
 
 		entry->localreloid = relid;
 	}
-	else
-		entry->localrel = heap_open(entry->localreloid, lockmode);
 
 	if (entry->state != SUBREL_STATE_READY)
 		entry->state = 

Re: logical decoding : exceeded maxAllocatedDescs for .spill files

2019-12-17 Thread Amit Khandekar
On Mon, 16 Dec 2019 at 16:52, Amit Kapila  wrote:
>
> On Mon, Dec 16, 2019 at 3:26 PM Amit Khandekar  wrote:
> >
> > On Sat, 14 Dec 2019 at 11:59, Amit Kapila  wrote:
> > >
> > > I have also made minor changes related to below code in patch:
> > > - else if (readBytes != sizeof(ReorderBufferDiskChange))
> > > +
> > > + file->curOffset += readBytes;
> > > +
> > > + if (readBytes !=
> > > sizeof(ReorderBufferDiskChange))
> > >
> > > Why the size is added before the error check?
> > The logic was : even though it's an error that the readBytes does not
> > match the expected size, the file read is successful so update the vfd
> > offset as early as possible. In our case, this might not matter much,
> > but who knows, in the future, in the exception block (say, in
> > ReorderBufferIterTXNFinish, someone assumes that the file offset is
> > correct and does something with that, then we will get in trouble,
> > although I agree that it's very unlikely.
> >
>
> I am not sure if there is any such need, but even if it is there, I
> think updating after a *short* read (read less than expected) doesn't
> seem like a good idea because there is clearly some problem with the
> read call.  Also, in the case below that case where we read the actual
> change data, the offset is updated after the check of *short* read.  I
> don't see any advantage in such an inconsistency.  I still feel it is
> better to update the offset after all error checks.
Ok, no problem; I don't see any harm in doing the updates after the size checks.

By the way, the backport patch is turning out to be simpler, because
in pre-12 versions the file offset is part of the Vfd structure, so
none of the offset handling is required.
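
(The agreed ordering, as a stand-alone illustration using plain POSIX
pread() rather than fd.c's API: the cached offset only advances after
the short-read check has passed.)

#include <unistd.h>

static int
read_exact(int fd, off_t *cached_off, void *buf, size_t len)
{
	ssize_t		nread = pread(fd, buf, len, *cached_off);

	if (nread != (ssize_t) len)
		return -1;			/* error or short read: leave the offset alone */

	*cached_off += nread;	/* advance only after the size check passes */
	return 0;
}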

>
> >
> > > and see if you can run perltidy for the test file.
> > Hmm, I tried perltidy, and it seems to mostly add a space after ( and
> > a space before ) if there's already; so "('postgres'," is replaced by
> > "( 'postgres',". And this is going to be inconsistent with
> > other places. And it replaces tab with spaces. Do you think we should
> > try perltidy, or have we before been using this tool for the tap tests
> > ?
> >
>
> See text in src/test/perl/README (Note that all tests and test tools
> should have perltidy run on them before patches are submitted, using
> perltidy - profile=src/tools/pgindent/perltidyrc).  It is recommended
> to use perltidy.
>
> Now, if it is making the added code inconsistent with nearby code,
> then I suggest to leave it.
In many places it would become inconsistent, but I will see if there are
some places where it does make sense and does not break consistency.

-- 
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company




Re: client auth docs seem to have devolved

2019-12-17 Thread Magnus Hagander
On Tue, Dec 17, 2019 at 1:02 PM Dave Cramer  wrote:

> On Tue, 17 Dec 2019 at 06:53, Magnus Hagander  wrote:
>
>> On Tue, Dec 17, 2019 at 12:43 PM Dave Cramer 
>> wrote:
>>
>>> While following an old link to
>>> https://www.postgresql.org/docs/10/auth-methods.html
>>>
>>> I see a list of links to authentication methods. However:
>>>
>>> When I hit the current version
>>> https://www.postgresql.org/docs/current/auth-methods.html
>>>
>>> There are absolutely no links...
>>>
>>>
>> That's because the structure of the docs changed. You need to hit "up",
>> which will take you to
>> https://www.postgresql.org/docs/current/client-authentication.html,
>> which now has the list of links. Note how the different methods used to be
>> 20.3.x, and are now directly listed as 20.y.
>>
>> I'm unsure if that was intentional in the upstream docs, but that's what
>> makes the website behave like it does.
>>
>
> Fair enough but
>
> 20.3. Authentication Methods
> The following sections describe the authentication methods in more detail.
>
> certainly is misleading.
>
>
This was changed by Peter in
commit 56811e57323faa453947eb82f007e323a952e1a1 along with the
restructuring. It used to say "the following subsections". So technically I
think that change is correct, but that doesn't necessarily make it helpful.

But based on how it actually renders, since that section doesn't contain
any actual useful info, we should perhaps just remove section 20.3
completely. Peter, thoughts?

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: client auth docs seem to have devolved

2019-12-17 Thread Dave Cramer
On Tue, 17 Dec 2019 at 06:53, Magnus Hagander  wrote:

> On Tue, Dec 17, 2019 at 12:43 PM Dave Cramer  wrote:
>
>> While following an old link to
>> https://www.postgresql.org/docs/10/auth-methods.html
>>
>> I see a list of links to authentication methods. However:
>>
>> When I hit the current version
>> https://www.postgresql.org/docs/current/auth-methods.html
>>
>> There are absolutely no links...
>>
>>
> That's because the structure of the docs changed. You need to hit "up",
> which will take you to
> https://www.postgresql.org/docs/current/client-authentication.html, which
> now has the list of links. Note how the different methods used to be
> 20.3.x, and are now directly listed as 20.y.
>
> I'm unsure if that was intentional in the upstream docs, but that's what
> makes the website behave like it does.
>

Fair enough but

20.3. Authentication Methods
The following sections describe the authentication methods in more detail.

certainly is misleading.

Thanks,

Dave

>


Re: client auth docs seem to have devolved

2019-12-17 Thread Magnus Hagander
On Tue, Dec 17, 2019 at 12:43 PM Dave Cramer  wrote:

> While following an old link to
> https://www.postgresql.org/docs/10/auth-methods.html
>
> I see a list of links to authentication methods. However:
>
> When I hit the current version
> https://www.postgresql.org/docs/current/auth-methods.html
>
> There are absolutely no links...
>
>
That's because the structure of the docs changed. You need to hit "up",
which will take you to
https://www.postgresql.org/docs/current/client-authentication.html, which
now has the list of links. Note how the different methods used to be
20.3.x, and are now directly listed as 20.y.

I'm unsure if that was intentional in the upstream docs, but that's what
makes the website behave like it does.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


client auth docs seem to have devolved

2019-12-17 Thread Dave Cramer
While following an old link to
https://www.postgresql.org/docs/10/auth-methods.html

I see a list of links to authentication methods. However:

When I hit the current version
https://www.postgresql.org/docs/current/auth-methods.html

There are absolutely no links...

Dave Cramer


Re: error context for vacuum to include block number

2019-12-17 Thread Kyotaro Horiguchi
At Mon, 16 Dec 2019 11:49:56 +0900, Michael Paquier  wrote 
in 
> On Sun, Dec 15, 2019 at 10:27:12AM -0600, Justin Pryzby wrote:
> > I named it so because it calls both lazy_vacuum_index
> > ("PROGRESS_VACUUM_PHASE_VACUUM_INDEX") and
> > lazy_vacuum_heap("PROGRESS_VACUUM_PHASE_VACUUM_HEAP")
> > 
> > I suppose you don't think the other way around is better?
> > lazy_vacuum_index_heap
> 
> Dunno.  Let's see if others have other thoughts on the matter.  FWIW,
> I have a long history at naming things in a way others don't like :)

lazy_vacuum_heap_index() seems confusing to me.  I read the name the
way Michael did before reading the above explanation.

lazy_vacuum_heap_and_index() is clearer to me.
lazy_vacuum_heap_with_index() could also work, but I'm not sure it's
any better.

I see some function names like that, and some others that have two
verbs joined by "_and_".

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Extracting only the columns needed for a query

2019-12-17 Thread Dmitry Dolgov
> On Tue, Jul 16, 2019 at 06:49:10PM -0700, Melanie Plageman wrote:
>
> We implemented Approach B in the attached patch set (patch 0001) and
> then implemented Approach A (patch 0002) to sanity check the pruned
> list of columns to scan we were getting at plan-time.
> We emit a notice in SeqNext() if the two column sets differ.
> Currently, for all of the queries in the regression suite, no
> differences are found.
> We did notice that none of the current regression tests for triggers
> are showing a difference between columns that can be extracted at plan
> time and those that can be extracted at execution time--though we
> haven't dug into this at all.

Thanks for the patch! If I understand this thread correctly,
approach B is preferable, so I've concentrated on patch 0001
and have a few comments/questions:

* The idea is to collect the columns that are used by selects/updates
  (where it makes sense for columnar storage, to avoid extra work); do I see
  that right? If so, then scanCols could be a bit misleading, since it
  gives the impression that it's only about reads.

* After a quick experiment, it seems that extract_used_columns is invoked for
  updates, but collects too many columns, e.g.:

create table test (id int primary key, a text, b text, c text);
update test set a = 'something' where id = 1;

  This collects into scanCols all columns (varattno from 1 to 4) plus, again,
  the first column from baserestrictinfo. Is that correct?

* Not sure if it is supposed to be covered by this functionality, but if we do

insert ... on conflict (uniq_col) do update set other_col = 'something'

  and actually have to perform an update, extract_used_columns is not called.

* Probably it also makes sense to check IS_DUMMY_REL in extract_used_columns?

> > On Sat, Jun 15, 2019 at 10:02 AM Tom Lane  wrote:
> >
> > Another reason for having the planner do this is that presumably, in
> > an AM that's excited about this, the set of fetched columns should
> > play into the cost estimates for the scan.  I've not been paying
> > enough attention to the tableam work to know if we've got hooks for
> > the AM to affect scan costing ... but if we don't, that seems like
> > a hole that needs plugged.
>
> AM callback relation_estimate_size exists currently which planner leverages.
> Via this callback it fetches tuples, pages, etc.. So, our thought is to extend
> this API if possible to pass down needed column and help perform better 
> costing
> for the query. Though we think if wish to leverage this function, need to know
> list of columns before planning hence might need to use query tree.

I believe it would be beneficial to add this potential API extension patch into
the thread (as an example of an interface defining how scanCols could be used)
and review them together.
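
(To make that suggestion concrete: the existing callback in
TableAmRoutine is relation_estimate_size, and the hypothetical
extension could simply grow a column-set parameter, roughly:)

/* current shape (tableam.h): */
void		(*relation_estimate_size) (Relation rel, int32 *attr_widths,
									   BlockNumber *pages, double *tuples,
									   double *allvisfrac);

/* hypothetical extension carrying the columns the scan will fetch: */
void		(*relation_estimate_size) (Relation rel, int32 *attr_widths,
									   BlockNumber *pages, double *tuples,
									   double *allvisfrac,
									   Bitmapset *scanCols);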




Re: automating pg_config.h.win32 maintenance

2019-12-17 Thread Peter Eisentraut

On 2019-12-17 07:30, Michael Paquier wrote:
> The patch looks pretty clean.  I have a few minor comments.
>
> -   if (/^AC_INIT\(\[PostgreSQL\], \[([^\]]+)\]/)
> +   if (/^AC_INIT\(\[([^\]]+)\], \[([^\]]+)\], \[([^\]]+)\]/)
>  {
> Why did you remove the bit about "PostgreSQL"?

Just to make it more general.  If we're going to parse the arguments,
why not parse all of them the same way.

> +   open(my $i, '<', "src/include/pg_config.h.in")
> + || confess "Could not open pg_config.h.in\n";
> +   open(my $o, '>', "src/include/pg_config.h")
> + || confess "Could not write to pg_config.h\n";
> Failure to open pg_config.h.
>
> Wouldn't it be better to remove pg_config_ext.h.win32 as well?

Yeah, good idea.  The attached patch is refactored so that all three
header files managed by AC_CONFIG_HEADERS are processed the same way.


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From b60719bd445fd63f0c42db8823f7d6cda89a6732 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Thu, 7 Nov 2019 15:56:22 +0100
Subject: [PATCH v3 1/2] Generate pg_config.h from pg_config.h.in on Windows

---
 src/tools/msvc/Solution.pm | 544 +
 1 file changed, 435 insertions(+), 109 deletions(-)

diff --git a/src/tools/msvc/Solution.pm b/src/tools/msvc/Solution.pm
index ac626dfa53..0a40557f9e 100644
--- a/src/tools/msvc/Solution.pm
+++ b/src/tools/msvc/Solution.pm
@@ -20,7 +20,6 @@ sub _new
projects   => {},
options=> $options,
numver => '',
-   strver => '',
VisualStudioVersion=> undef,
MinimumVisualStudioVersion => undef,
vcver  => undef,
@@ -140,16 +139,22 @@ sub GenerateFiles
 {
my $self = shift;
my $bits = $self->{platform} eq 'Win32' ? 32 : 64;
+   my $package_name;
+   my $package_version;
+   my $package_bugreport;
 
# Parse configure.in to get version numbers
open(my $c, '<', "configure.in")
  || confess("Could not open configure.in for reading\n");
while (<$c>)
{
-   if (/^AC_INIT\(\[PostgreSQL\], \[([^\]]+)\]/)
+   if (/^AC_INIT\(\[([^\]]+)\], \[([^\]]+)\], \[([^\]]+)\]/)
{
-   $self->{strver} = $1;
-   if ($self->{strver} !~ /^(\d+)(?:\.(\d+))?/)
+   $package_name  = $1;
+   $package_version   = $2;
+   $package_bugreport = $3;
+
+   if ($package_version !~ /^(\d+)(?:\.(\d+))?/)
{
confess "Bad format of version: 
$self->{strver}\n";
}
@@ -159,7 +164,10 @@ sub GenerateFiles
}
close($c);
confess "Unable to parse configure.in for all variables!"
- if ($self->{strver} eq '' || $self->{numver} eq '');
+ if ( $package_name eq ''
+   || $package_version eq ''
+   || $self->{numver} eq ''
+   || $package_bugreport eq '');
 
if (IsNewer("src/include/pg_config_os.h", "src/include/port/win32.h"))
{
@@ -167,100 +175,371 @@ sub GenerateFiles
copyFile("src/include/port/win32.h", 
"src/include/pg_config_os.h");
}
 
-   if (IsNewer("src/include/pg_config.h", "src/include/pg_config.h.win32"))
+   print "Generating configuration headers...\n";
+   my $extraver = $self->{options}->{extraver};
+   $extraver = '' unless defined $extraver;
+   my $port = $self->{options}->{"--with-pgport"} || 5432;
+
+   # Every symbol in pg_config.h.in must be accounted for here.  Set
+   # to undef if the symbol should not be defined.
+   my %define = (
+   ACCEPT_TYPE_ARG1 => 'unsigned int',
+   ACCEPT_TYPE_ARG2 => 'struct sockaddr *',
+   ACCEPT_TYPE_ARG3 => 'int',
+   ACCEPT_TYPE_RETURN   => 'unsigned int PASCAL',
+   ALIGNOF_DOUBLE   => 8,
+   ALIGNOF_INT  => 4,
+   ALIGNOF_LONG => 4,
+   ALIGNOF_LONG_LONG_INT=> 8,
+   ALIGNOF_PG_INT128_TYPE   => undef,
+   ALIGNOF_SHORT=> 2,
+   AC_APPLE_UNIVERSAL_BUILD => undef,
+   BLCKSZ   => 1024 * 
$self->{options}->{blocksize},
+   DEF_PGPORT   => $port,
+   DEF_PGPORT_STR   => qq{"$port"},
+   ENABLE_GSS   => $self->{options}->{gss} ? 1 : undef,
+   ENABLE_NLS   => $self->{options}->{nls} ? 1 : undef,
+   ENABLE_THREAD_SAFETY => 1,
+   FLEXIBLE_ARRAY_MEMBER=> '/**/',
+   

Re: archive status ".ready" files may be created too early

2019-12-17 Thread Kyotaro Horiguchi
Thank you Alvaro for the comment (on my comment).

At Fri, 13 Dec 2019 18:33:44 -0300, Alvaro Herrera  
wrote in 
> On 2019-Dec-13, Kyotaro Horiguchi wrote:
> 
> > At Thu, 12 Dec 2019 22:50:20 +, "Bossart, Nathan"  
> > wrote in 
> 
> > > The crux of the issue seems to be that XLogWrite() does not wait for
> > > the entire record to be written to disk before creating the ".ready"
> > > file.  Instead, it just waits for the last page of the segment to be
> > > written before notifying the archiver.  If PostgreSQL crashes before
> > > it is able to write the rest of the record, it will end up reusing the
> > > ".ready" segment at the end of crash recovery.  In the meantime, the
> > > archiver process may have already processed the old version of the
> > > segment.
> > 
> > Year, that can happen if the server restarted after the crash.
> 
> ... which is the normal way to run things, no?

Yes. In older versions (< 10), the default value for wal_level was
minimal; in 10, the default for wal_level was changed to
replica. Still, I'm not sure restart_after_crash can be recommended
for streaming replication...

> Why is it bad?  It's the default value.

I reconsidered it more deeply, and concluded that it does not harm
replication as I had thought.

A WAL-buffer overflow may write out a partial continuation record, and that
write can be flushed immediately. That made me misunderstand things: I
thought the standby could receive only the first half of a continuation
record. Actually, such a write doesn't advance LogwrtResult.Flush, so the
standby doesn't receive a record split on a page boundary. (The cases where
a crashed master is used as a new standby as-is might have contaminated my
thinking..)

Sorry for the bogus comment.  My conclusion here is that
restart_after_crash doesn't seem to harm the standby immediately.

> > The standby can be incosistent at the time of master crash, so it
> > should be fixed using pg_rewind or should be recreated from a base
> > backup.
> 
> Surely the master will just come up and replay its WAL, and there should
> be no inconsistency.
> 
> You seem to be thinking that a standby is promoted immediately on crash
> of the master, but this is not a given.

Basically no, but it might be mixed a bit. Anyway, returning to the
proposal: I think XLogWrite can be called when the WAL buffer is full,
and it can go into the last page of a segment. The proposed patch
doesn't work then, since that XLogWrite call didn't write the whole
continuation record. But I'm not sure that corner case is worth
amending..

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Allow cluster owner to bypass authentication

2019-12-17 Thread Peter Eisentraut

On 2019-12-17 05:40, Stephen Frost wrote:
> * Peter Eisentraut (peter.eisentr...@2ndquadrant.com) wrote:
>> The idea is that if you connect over a Unix-domain socket and the local
>> (effective) user is the same as the server's (effective) user, then
>> access should be granted immediately without any checking of
>> pg_hba.conf.  Because it's "your own" server and you can do anything you
>> want with it anyway.
>
> While I understand where you're generally coming from, I'm not entirely
> convinced that this is a good direction to go in.  Yes, you could go
> change pg_hba.conf (maybe..)- but would doing so trigger an email to
> someone else?  Would you really be able to change pg_hba.conf when you
> consider more restrictive environments, like where there are SELinux
> checks?  These days, a simple getpeereid() doesn't actually convey all of
> the information about a process that would let you be confident that the
> client really has the same access to the system that the running PG
> server does.

I realize that there are a number of facilities nowadays to do enhanced 
security setups.  But let's consider what 99% of users are using.  If 
the database server runs as user X and you are logged in as user X, you 
should be able to manage the database server that is running as user X 
without further restrictions.  Anything else would call into question 
the entire security model that postgres is built around.  But also, 
there is an option to turn this off in my patch, if you really have the 
need.
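
(A minimal sketch of the same-user test under discussion, using the BSD
getpeereid() interface, for which PostgreSQL carries a src/port
fallback; "sock" here is the accepted Unix-domain socket:)

#include <sys/types.h>
#include <stdbool.h>
#include <unistd.h>

static bool
peer_is_server_user(int sock)
{
	uid_t		uid;
	gid_t		gid;

	if (getpeereid(sock, &uid, &gid) != 0)
		return false;		/* unknown peer: fall through to pg_hba.conf */

	return uid == geteuid();	/* effective users match */
}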



>> This addresses the shortcomings of using peer as the default mechanism
>> in initdb.  In a subsequent step, my idea would be to make the default
>> initdb authentication setup to use md5 (or scram, tbd.) for both local
>> and host.
>
> I'm definitely in favor of having 'peer' be used by default in initdb.

'peer' is not a good default for initdb.  Consider setting up a database 
server on a notional multiuser host with peer authentication.  As soon 
as you create a database user, that would allow some random OS user to 
log into your database server, if the name matches.  'peer' is useful if 
there is a strong coordination between the OS user creation and the 
database user creation.  But the default set up by initdb should really 
only let the instance owner in by default and require some additional 
authentication (like passwords) from everybody else.  'peer' cannot 
express that.


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: archive status ".ready" files may be created too early

2019-12-17 Thread Kyotaro Horiguchi
Uggg. I must apologize for the last bogus comment.

At Fri, 13 Dec 2019 21:24:36 +, "Bossart, Nathan"  
wrote in 
> On 12/12/19, 8:08 PM, "Kyotaro Horiguchi"  wrote:
> > As the result the patch doesn't seem to save anything than setting up
> > and operating correctly.
> 
> Disregarding the behavior of standby servers for a minute, I think

I'm sorry. A continuation record split across a segment boundary
doesn't seem to harm replication. Please forget it.

> that what I've described is still a problem for archiving.  If the

Yeah, I think that happens and it seems a problem.

> segment is archived too early, point-in-time restores that require it
> will fail.  If the server refuses to overwrite existing archive files,
> the archiver process may fail to process the "good" version of the
> segment until someone takes action to fix it.  I think this is
> especially troubling for backup utilities like pgBackRest that check
> the archive_status directory independently since it is difficult to
> know if the segment is truly ".ready".
> 
> I've attached a slightly improved patch to show how this might be
> fixed.  I am curious what concerns there are about doing something
> like it to prevent this scenario.

Basically, I agree with the direction of delaying the .ready
notification until all requested WAL bytes are written out.

But I think I found a corner case where the patch doesn't work. As I
mentioned in another message, if the WAL buffer is full,
AdvanceXLInsertBuffer calls XLogWrite to write out the victim buffer
regardless of whether the last record in the page was the first half of
a continuation record. XLogWrite can mark the segment as .ready even
with the patch.

Is that correct? And do you think the corner case is worth amending?

If so, we could amend that case as well by marking the last segment as
.ready when XLogWrite writes the first bytes of the next segment. (As
a further corner case, it still doesn't work if a continuation record
spans three or more segments.. but I don't think (or don't want to
think) that we need to consider that case..)
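
(One way to picture the guard both messages are circling around, as a
sketch inside XLogWrite()'s existing finishing_seg branch; the
EndOfCrossBoundaryRecord() helper and the pending-notification
bookkeeping are hypothetical:)

if (finishing_seg)
{
	/* only notify once the whole cross-boundary record is flushed */
	if (LogwrtResult.Flush >= EndOfCrossBoundaryRecord(openLogSegNo))
		XLogArchiveNotifySeg(openLogSegNo);
	else
		remember_ready_segment(openLogSegNo);	/* notify later */
}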


Thoughts?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [PATCH] Improve documentation of REINDEX options

2019-12-17 Thread Josef Šimánek
út 17. 12. 2019 v 6:36 odesílatel Michael Paquier 
napsal:

> On Fri, Dec 13, 2019 at 10:28:33AM +0100, Josef Šimánek wrote:
> > I have prepared patch to improve documentation for REINDEX. It
> > should be more inline with another documentation pages.
> >
> > You can see the change applied in attached file. Patch can be found at
> > https://github.com/simi/postgres/pull/3 (diff -
> > https://github.com/simi/postgres/pull/3.diff, patch -
> > https://github.com/simi/postgres/pull/3.patch).
>
> Please, always attach your patches to emails sent on this mailing
> list.  If for one reason or another the data located at the external
> link is lost (imagine for example that your github account is gone or
> that github is reduced to ashes), then such patches would be lost, and
> anybody looking at this email 10 years from now would not know what
> you have been writing about here.  I am attaching it here for the
> archive's sake.
>
> +where option can be:
> +
> +VERBOSE
> Why not...  We did that in the docs of ANALYZE for v11 when
> introducing the parenthesized grammar flavor for the options
> available.
>
> -   Rebuild all the indexes on the table my_table:
> +   Rebuild all the indexes on the table my_table with progress report per index:
>
>  
> -REINDEX TABLE my_table;
> +REINDEX (VERBOSE) TABLE my_table;
> Not sure if this part brings much to the reader though.  It is not
> like the command description of REINDEX is complicated with dozens
> of option choices.
>

For me this is the default way to reindex a whole table manually in psql,
since you get some "progress" output. Anyway, I can remove it if you don't
see any benefit in extending this example.


> --
> Michael
>


Re: [PATCH] Improve documentation of REINDEX options

2019-12-17 Thread Josef Šimánek
On Tue, Dec 17, 2019 at 6:36 Michael Paquier wrote:

> On Fri, Dec 13, 2019 at 10:28:33AM +0100, Josef Šimánek wrote:
> > I have prepared patch to improve documentation for REINDEX. It
> > should be more inline with another documentation pages.
> >
> > You can see the change applied in attached file. Patch can be found at
> > https://github.com/simi/postgres/pull/3 (diff -
> > https://github.com/simi/postgres/pull/3.diff, patch -
> > https://github.com/simi/postgres/pull/3.patch).
>
> Please, always attach your patches to emails sent on this mailing
> list.  If for one reason or another the data located at the external
> link is lost (imagine for example that your github account is gone or
> that github is reduced to ashes), then such patches would be lost, and
> anybody looking at this email 10 years from now would not know what
> you have been writing about here.  I am attaching it here for the
> archive's sake.
>

Sorry, I'm attaching the same patch now for future reference.


>
> +where option can be:
> +
> +VERBOSE
> Why not...  We did that in the docs of ANALYZE for v11 when
> introducing the parenthesized grammar flavor for the options
> available.
>
> -   Rebuild all the indexes on the table my_table:
> +   Rebuild all the indexes on the table my_table with progress report per index:
>
>  
> -REINDEX TABLE my_table;
> +REINDEX (VERBOSE) TABLE my_table;
> Not sure if this part brings much to the reader though.  It is not
> like the command description of REINDEX is complicated with dozens
> of option choices.
> --
> Michael
>
From 992fbb946daf0e5483319665c74a1a75dfea503b Mon Sep 17 00:00:00 2001
From: Josef Šimánek 
Date: Fri, 13 Dec 2019 01:59:07 +0100
Subject: [PATCH] Unify REINDEX options documentation and extend examples.

---
 doc/src/sgml/ref/reindex.sgml | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml
index 10881ab03a85..007cceb320ee 100644
--- a/doc/src/sgml/ref/reindex.sgml
+++ b/doc/src/sgml/ref/reindex.sgml
@@ -21,7 +21,11 @@ PostgreSQL documentation
 
  
 
-REINDEX [ ( VERBOSE ) ] { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM } [ CONCURRENTLY ] name
+REINDEX [ ( option [, ...] ) ] { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM } [ CONCURRENTLY ] name
+
+where option can be:
+
+VERBOSE
 
  
 
@@ -422,10 +426,10 @@ REINDEX INDEX my_index;
   
 
   
-   Rebuild all the indexes on the table my_table:
+   Rebuild all the indexes on the table my_table with progress report per index:
 
 
-REINDEX TABLE my_table;
+REINDEX (VERBOSE) TABLE my_table;
 
   
 


unsupportable composite type partition keys

2019-12-17 Thread Amit Langote
Hi,

It seems to me that we currently allow expressions that are anonymous
records or self-referencing composite types as the partition key, but
shouldn't.  Allowing them leads to this:

create table foo (a int) partition by list ((row(a, b)));
create table foo1 partition of foo for values in ('(1)'::foo);
create table foo2 partition of foo for values in ('(2)'::foo);
explain select * from foo where row(a) = '(1)'::foo;
ERROR:  stack depth limit exceeded

Stack trace is this:

#0  errfinish (dummy=0) at elog.c:442
#1  0x00911a51 in check_stack_depth () at postgres.c:3288
#2  0x007970e6 in expression_tree_mutator (node=0x31890a0,
mutator=0x82095f ,
context=0x7fff0578ef60) at nodeFuncs.c:2526
#3  0x0082340b in eval_const_expressions_mutator
(node=0x31890a0, context=0x7fff0578ef60) at clauses.c:3605
#4  0x0079875c in expression_tree_mutator (node=0x31890f8,
mutator=0x82095f ,
context=0x7fff0578ef60) at nodeFuncs.c:2996
#5  0x0082340b in eval_const_expressions_mutator
(node=0x31890f8, context=0x7fff0578ef60) at clauses.c:3605
#6  0x0079810c in expression_tree_mutator (node=0x3188cc8,
mutator=0x82095f ,
context=0x7fff0578ef60) at nodeFuncs.c:2863
#7  0x0082225d in eval_const_expressions_mutator
(node=0x3188cc8, context=0x7fff0578ef60) at clauses.c:3154
#8  0x0079875c in expression_tree_mutator (node=0x3189240,
mutator=0x82095f ,
context=0x7fff0578ef60) at nodeFuncs.c:2996
#9  0x0082340b in eval_const_expressions_mutator
(node=0x3189240, context=0x7fff0578ef60) at clauses.c:3605
#10 0x0082090c in eval_const_expressions (root=0x0,
node=0x3189240) at clauses.c:2265
#11 0x00a75169 in RelationBuildPartitionKey
(relation=0x7f5ca3e479a8) at partcache.c:139
#12 0x00a7aa5e in RelationBuildDesc (targetRelId=17178,
insertIt=true) at relcache.c:1171
#13 0x00a7c975 in RelationIdGetRelation (relationId=17178) at
relcache.c:2035
#14 0x0048e0c0 in relation_open (relationId=17178, lockmode=1)
at relation.c:59
#15 0x00a8a4f7 in load_typcache_tupdesc (typentry=0x1c16bc0)
at typcache.c:793
#16 0x00a8a3bb in lookup_type_cache (type_id=17180, flags=256)
at typcache.c:748
#17 0x00a8bba4 in lookup_rowtype_tupdesc_internal
(type_id=17180, typmod=-1, noError=false) at typcache.c:1570
#18 0x00a8be43 in lookup_rowtype_tupdesc (type_id=17180,
typmod=-1) at typcache.c:1656
#19 0x00a0713f in record_cmp (fcinfo=0x7fff0578f4d0) at rowtypes.c:815
#20 0x00a083e2 in btrecordcmp (fcinfo=0x7fff0578f4d0) at rowtypes.c:1276
#21 0x00a97bd9 in FunctionCall2Coll (flinfo=0x2bb4a98,
collation=0, arg1=51939144, arg2=5194) at fmgr.c:1162
#22 0x008443f6 in qsort_partition_list_value_cmp (a=0x3188c50,
b=0x3188c58, arg=0x2bb46c0) at partbounds.c:1769
#23 0x00af9dc6 in qsort_arg (a=0x3188c50, n=2, es=8,
cmp=0x84439a , arg=0x2bb46c0) at
qsort_arg.c:132
#24 0x0084186a in create_list_bounds (boundspecs=0x3188650,
nparts=2, key=0x2bb46c0, mapping=0x7fff0578f7d8) at partbounds.c:396
#25 0x008410ec in partition_bounds_create
(boundspecs=0x3188650, nparts=2, key=0x2bb46c0,
mapping=0x7fff0578f7d8) at partbounds.c:206
#26 0x00847622 in RelationBuildPartitionDesc
(rel=0x7f5ca3e47560) at partdesc.c:205
#27 0x00a7aa6a in RelationBuildDesc (targetRelId=17178,
insertIt=true) at relcache.c:1172

Also:

create table foo (a int) partition by list ((row(a)));
create table foo1 partition of foo for values in (row(1));
create table foo2 partition of foo for values in (row(2));

explain select * from foo where row(a) = '(1)'::foo;
QUERY PLAN
--
 Seq Scan on foo1 foo  (cost=0.00..41.88 rows=13 width=4)
   Filter: (ROW(a) = '(1)'::foo)
(2 rows)

explain select * from foo where row(a) = '(2)'::foo;
QUERY PLAN
--
 Seq Scan on foo2 foo  (cost=0.00..41.88 rows=13 width=4)
   Filter: (ROW(a) = '(2)'::foo)
(2 rows)

-- another session
explain select * from foo where row(a) = '(1)'::foo;
ERROR:  record type has not been registered
LINE 1: explain select * from foo where row(a) = '(1)'::foo;

Attached a patch to fix that.
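
(For illustration only, a hypothetical sketch of the kind of check this
needs, placed somewhere like ComputePartitionAttrs(); the error wording
and the use of get_rel_type_id() here are my own, and the attached patch
is the authoritative version.)

	Oid			exprtype = exprType(expr);

	/*
	 * Hypothetical check: an anonymous record type cannot be looked up
	 * again from the catalogs, and the table's own rowtype makes bound
	 * comparisons recurse into building the very relcache entry that is
	 * being constructed (see the stack trace above).
	 */
	if (exprtype == RECORDOID ||
		exprtype == get_rel_type_id(RelationGetRelid(rel)))
		ereport(ERROR,
				(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
				 errmsg("partition key expressions cannot return anonymous records or the table's own row type")));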

Thanks,
Amit
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index daa80ec4aa..8e5e3296df 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -15157,8 +15157,9 @@ ComputePartitionAttrs(ParseState *pstate, Relation rel, List *partParams, AttrNu
 		 */
 
 		/*
-		 * Cannot have expressions containing whole-row references or
-		 * system column references.
+		 * Cannot have expressions containing whole-row references,
+		 * system column references, or anonymous

Re: Reorderbuffer crash during recovery

2019-12-17 Thread Amit Kapila
On Wed, Dec 11, 2019 at 11:13 AM vignesh C  wrote:
>
>
> It sets the final_lsn here so that it can iterate from the start_lsn
> to final_lsn and clean up the serialized files in the
> ReorderBufferRestoreCleanup function. One solution we were thinking
> of was to store the lsn of the last serialized change while serializing
> and set the final_lsn in the above crash case, like the code below:

Sure, we can do something along the lines of what you are suggesting,
but why can't we update final_lsn at the time of serializing the changes?
If we do that, then we don't even need to compute it separately during
ReorderBufferAbortOld.

Let me try to explain the problem and the proposed solutions for it.
Currently, after serializing the changes we remove the 'changes' from
ReorderBufferTXN.  Now, if the system crashes for any reason after
that, we won't be able to compute final_lsn after the restart.  That
leads to an access violation in ReorderBufferAbortOld, which tries to
access the changes list from ReorderBufferTXN to compute final_lsn.

We could fix it either by tracking 'last_serialized_change' as
proposed by Vignesh or by updating final_lsn while we serialize the
changes.
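
A minimal sketch of the second option: the function and the fields below
exist in the current code, but performing the update at this spot is just
the proposal, with the serialization body elided.

static void
ReorderBufferSerializeChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
							 int fd, ReorderBufferChange *change)
{
	/* ... existing code writing 'change' out to 'fd' ... */

	/*
	 * Remember how far this transaction has been serialized, so that
	 * after a restart ReorderBufferAbortOld() no longer has to derive
	 * final_lsn from the (already freed) in-memory changes list.
	 */
	if (change->lsn > txn->final_lsn)
		txn->final_lsn = change->lsn;
}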

IIUC, commit df9f682c7bf81674b6ae3900fd0146f35df0ae2e [1] tried to fix
a related issue, which led to this other problem.  Alvaro, Andres, do
you have any suggestions?


[1] -
commit df9f682c7bf81674b6ae3900fd0146f35df0ae2e
Author: Alvaro Herrera 
Date:   Fri Jan 5 12:17:10 2018 -0300

Fix failure to delete spill files of aborted transactions

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: pg_upgrade fails with non-standard ACL

2019-12-17 Thread Arthur Zakirov

Hello,

On 2019/12/05 11:31, Michael Paquier wrote:

On Wed, Dec 04, 2019 at 06:15:52PM +0900, Arthur Zakirov wrote:

Ah, I thought that pg_identify_object() gives a properly quoted identity,
and that it could be used to build an SQL script.


It depends on the object type.  For columns I can see in your patch
that you are using a dedicated code path, but I don't think that we
should assume that there is an exact one-to-one mapping between the
object type and the grammar of GRANT/REVOKE for the object types
supported, because both are completely independent things and
facilities.  Take for example foreign data wrappers.  Of course for
this patch this depends on whether we have system objects of the type
which would be impacted.  That's not the case for foreign data wrappers
and likely it never will be, but there is no way to be sure that this
won't get broken silently in the future.


I attached a new version of the patch. It still uses
pg_identify_object(); I'm not sure about other ways to build identities
yet.


It handles relations, their columns, functions, procedures and
languages. There are other GRANT'able objects, like databases, foreign
data wrappers and servers, schemas and tablespaces.


I didn't include handling of databases, schemas and tablespaces because
I don't know yet how to identify whether such an object is a system
object, other than just hard-coding them. They are not pinned and are
not created within the pg_catalog schema.
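
For what it's worth, the only pin test I can think of is the pg_depend
lookup below. object_is_pinned() is a hypothetical helper (only
executeQueryOrDie() exists in pg_upgrade today), and as said above it
would return false for databases, schemas and tablespaces, so it does
not close that gap.

static bool
object_is_pinned(PGconn *conn, Oid classid, Oid objid)
{
	PGresult   *res;
	bool		pinned;

	/* a DEPENDENCY_PIN ('p') entry marks the referenced object as system */
	res = executeQueryOrDie(conn,
							"SELECT 1 FROM pg_catalog.pg_depend "
							"WHERE refclassid = %u AND refobjid = %u "
							"AND deptype = 'p'",
							classid, objid);
	pinned = (PQntuples(res) != 0);
	PQclear(res);

	return pinned;
}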


Foreign data wrappers and servers are not handled either. There are no
such built-in objects, so it is not possible to test them with
"test_rename_catalog_objects.patch". And I'm not sure whether they
would be pinned (to test whether an object is a system one) if such
objects appear in the future.



Sure! But I'm not sure that I understood the idea. Do you mean the patch
which adds a TAP test? It can initialize two clusters; in the first it
renames some system objects and changes their ACLs. And finally the test
runs pg_upgrade, which will fail.


A TAP test won't help here because the idea is to create a patch for
HEAD which deliberately introduces changes for system objects, where
the source binaries have ACLs on object types which are broken on the
target binaries with the custom patch.  That's to make sure that all
the logic which would get added to pg_upgrade is working correctly.
Other ideas are welcome.


I added "test_rename_catalog_objects.patch" and
"test_add_acl_to_catalog_objects.sql". So the testing steps are as
follows:
- initialize a new source instance (e.g. pg v12) and run
"test_add_acl_to_catalog_objects.sql" on it
- apply "pg_upgrade_ACL_check_v6.patch" and
"test_add_acl_to_catalog_objects.sql" for HEAD

- initialize a new target instance for HEAD
- run pg_upgrade; it should create the revoke_objects.sql file

"test_rename_catalog_objects.patch" should be applied after 
"pg_upgrade_ACL_check_v6.patch".


Renamed objects are the following:
- table pg_subscription -> pg_sub
- columns pg_subscription.subenabled -> subactive, 
pg_subscription.subconninfo -> subconn

- view pg_stat_subscription -> pg_stat_sub
- columns pg_stat_subscription.received_lsn -> received_location, 
pg_stat_subscription.latest_end_lsn -> latest_end_location

- function pg_stat_get_subscription -> pg_stat_get_sub
- language sql -> pgsql

--
Arthur
diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index cdd8788b9e..3a2004c23b 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -16,6 +16,7 @@
 
 static void check_new_cluster_is_empty(void);
 static void check_databases_are_compatible(void);
+static void check_for_changed_signatures(void);
 static void check_locale_and_encoding(DbInfo *olddb, DbInfo *newdb);
 static bool equivalent_locale(int category, const char *loca, const char *locb);
 static void check_is_install_user(ClusterInfo *cluster);
@@ -142,6 +143,8 @@ check_and_dump_old_cluster(bool live_check)
 	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 804)
 		new_9_0_populate_pg_largeobject_metadata(&old_cluster, true);
 
+	get_non_default_acl_infos(&old_cluster);
+
/*
 	 * While not a check option, we do this now because this is the only time
 	 * the old server is running.
@@ -161,6 +164,7 @@ check_new_cluster(void)
 
check_new_cluster_is_empty();
check_databases_are_compatible();
+   check_for_changed_signatures();
 
check_loadable_libraries();
 
@@ -443,6 +447,223 @@ check_databases_are_compatible(void)
}
 }
 
+/*
+ * Find the location of the last dot, return NULL if not found.
+ */
+static char *
+last_dot_location(const char *identity)
+{
+   const char *p,
+  *ret = NULL;
+
+   for (p = identity; *p; p++)
+   if (*p == '.')
+   ret = p;
+   return unconstify(char *, ret);
+}
+
+/*
+ * check_for_changed_signatures()
+ *
+ * Checks that the old cluster doesn't have non-default ACL's for system objects
+ * which had different signatures in the new