Re: standby recovery fails (tablespace related) (tentative patch and discussion)

2020-03-24 Thread Fujii Masao




On 2020/01/28 0:24, Fujii Masao wrote:



On 2020/01/15 19:18, Paul Guo wrote:

I further fixed the last test failure (due to a small bug in the test, not in 
code). Attached are the new patch series. Let's see the CI pipeline result.


Thanks for updating the patches!

I started reading the 0003 patch.


I marked this patch as Waiting on Author in CF because there has been no update
since my last review comments. Could you mark it as Needs Review again
when you post the updated version of the patch?

Regards,

--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters




Re: Missing errcode() in ereport

2020-03-24 Thread Tom Lane
Thomas Munro  writes:
> I think this caused anole to say:

> "reloptions.c", line 1362: error #2042: operand types are incompatible
> ("void" and "int")
>   errdetail_internal("%s", _(optenum->detailmsg)) : 0));

Yeah, I was just looking at that :-(

We could revert the change to have these functions return void,
or we could run around and change the places with this usage
pattern to use "(void) 0" instead of just "0".  The latter would
be somewhat painful if only a minority of compilers warn, though.
Also, I don't think that having to change ereport usage was part
of the agreed-to plan here ... so I'm leaning to the former.
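For context, a minimal standalone sketch (not the actual reloptions.c code) of
why the idiom breaks once the auxiliary function returns void, and what the
"(void) 0" spelling would look like:

#include <stdio.h>

/* Stand-in for an ereport auxiliary function changed to return void. */
static void
errdetail_like(const char *msg)
{
	printf("DETAIL:  %s\n", msg);
}

int
main(void)
{
	const char *detailmsg = "some detail";

	/*
	 * "detailmsg ? errdetail_like(detailmsg) : 0" mixes a void expression
	 * with an int in a conditional expression, which ISO C forbids; stricter
	 * compilers (like anole's) reject it outright, while others accept it,
	 * possibly with a warning.  Making both arms void keeps the idiom legal:
	 */
	detailmsg ? errdetail_like(detailmsg) : (void) 0;

	return 0;
}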

regards, tom lane




Re: Missing errcode() in ereport

2020-03-24 Thread Thomas Munro
On Wed, Mar 25, 2020 at 9:30 AM Tom Lane  wrote:
> Andres Freund  writes:
> > On 2020-03-23 17:24:49 -0400, Tom Lane wrote:
> >> Hearing no objections, I started to review Andres' patchset with
> >> that plan in mind.
>
> > Thanks for pushing the first part!
>
> I pushed all of it, actually.

I think this caused anole to say:

"reloptions.c", line 1362: error #2042: operand types are incompatible
("void" and "int")
  errdetail_internal("%s", _(optenum->detailmsg)) : 0));




Re: replay pause vs. standby promotion

2020-03-24 Thread Fujii Masao




On 2020/03/25 0:17, Sergei Kornilov wrote:

Hello


I pushed the latest version of the patch. If you have further opinion
about immediate promotion, let's keep discussing that!


Thank you!

Honestly, I forgot that the promotion is documented in high-availability.sgml 
as:


Before failover, any WAL immediately available in the archive or in pg_wal will 
be
restored, but no attempt is made to connect to the master.


I mistakenly thought that promotion should happen "immediately"...


If a promotion is triggered while recovery is paused, the paused state ends and 
a promotion continues.


Could we add a few words in func.sgml to clarify the behavior, especially for 
users hitting the case in my example above? Something like:


If a promotion is triggered while recovery is paused, the paused state ends, 
replay of any WAL immediately available in the archive or in pg_wal will be 
continued and then a promotion will be completed.


This description is true if the pause is requested by pg_wal_replay_pause(),
but not if the recovery target is reached and the pause is requested by
recovery_target_action=pause. In the latter case, even if there is WAL data
available in pg_wal or the archive, it is not replayed, i.e., the promotion
completes immediately. Probably we should document those two cases
explicitly to avoid confusion about promotion and recovery pause?
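For illustration, a minimal sketch of the pg_wal_replay_pause() case
(hypothetical session on a standby):

-- On the standby:
SELECT pg_wal_replay_pause();
SELECT pg_is_wal_replay_paused();   -- returns true

-- Triggering promotion ends the paused state; any WAL already available
-- in pg_wal or the archive is replayed before the promotion completes.
SELECT pg_promote();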

Regards,

--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters




Re: error context for vacuum to include block number

2020-03-24 Thread Masahiko Sawada
On Wed, 25 Mar 2020 at 14:08, Justin Pryzby  wrote:
>
> On Wed, Mar 25, 2020 at 10:22:21AM +0530, Amit Kapila wrote:
> > On Wed, Mar 25, 2020 at 10:05 AM Masahiko Sawada
> >  wrote:
> > >
> > > On Wed, 25 Mar 2020 at 12:44, Amit Kapila  wrote:
> > > >
> > > > On Tue, Mar 24, 2020 at 7:51 PM Masahiko Sawada
> > > >  wrote:
> > > > >
> > > > >
> > > > > I got the point. But if we set the error context before that, I think
> > > > > we need to change the error context message. The error context message
> > > > > of heap truncation phase is "while truncating relation \"%s.%s\" to %u
> > > > > blocks", but cbarg->blkno will be the number of blocks of the current
> > > > > relation.
> > > > >
> > > > >case VACUUM_ERRCB_PHASE_TRUNCATE:
> > > > >if (BlockNumberIsValid(cbarg->blkno))
> > > > >errcontext("while truncating relation \"%s.%s\" to %u 
> > > > > blocks",
> > > > >   cbarg->relnamespace, cbarg->relname, 
> > > > > cbarg->blkno);
> > > > >break;
> > > > >
> > > >
> > > > Do you mean to say that actually we are just prefetching or reading
> > > > the pages in count_nondeletable_pages() but the message doesn't have
> > > > any such indication?  If not that, what problem do you see with the
> > > > message?   What is your suggestion?
> > >
> > > I meant that with the patch, suppose that the table has 100 blocks and
> > > we're truncating it to 50 blocks in RelationTruncate(), the error
> > > context message will be "while truncating relation "aaa.bbb" to 100
> > > blocks", which is not correct. I think it should be "while truncating
> > > relation "aaa.bbb" to 50 blocks". We can know the relation can be
> > > truncated to 50 blocks by the result of count_nondeletable_pages(). So
> > > if we update the arguments before it we will use the number of blocks
> > > of relation before truncation.
> > >
> >
> > Won't the latest patch by Justin will fix this as he has updated the
> > block count after count_nondeletable_pages?  Apart from that, I feel
>
> The issue is if the error happens *during* count_nondeletable_pages().
> We don't want it to say "truncating relation to 100 blocks".

Right.

>
> > the first call to update_vacuum_error_cbarg in lazy_truncate_heap
> > should have input parameter as vacrelstats->nonempty_pages instead of
> > new_rel_pages to indicate the remaining pages after truncation?
>
> Yea, I think that addresses the issue.

+1

Regards,

-- 
Masahiko Sawada  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: error context for vacuum to include block number

2020-03-24 Thread Justin Pryzby
On Wed, Mar 25, 2020 at 10:22:21AM +0530, Amit Kapila wrote:
> On Wed, Mar 25, 2020 at 10:05 AM Masahiko Sawada
>  wrote:
> >
> > On Wed, 25 Mar 2020 at 12:44, Amit Kapila  wrote:
> > >
> > > On Tue, Mar 24, 2020 at 7:51 PM Masahiko Sawada
> > >  wrote:
> > > >
> > > >
> > > > I got the point. But if we set the error context before that, I think
> > > > we need to change the error context message. The error context message
> > > > of heap truncation phase is "while truncating relation \"%s.%s\" to %u
> > > > blocks", but cbarg->blkno will be the number of blocks of the current
> > > > relation.
> > > >
> > > >case VACUUM_ERRCB_PHASE_TRUNCATE:
> > > >if (BlockNumberIsValid(cbarg->blkno))
> > > >errcontext("while truncating relation \"%s.%s\" to %u 
> > > > blocks",
> > > >   cbarg->relnamespace, cbarg->relname, 
> > > > cbarg->blkno);
> > > >break;
> > > >
> > >
> > > Do you mean to say that actually we are just prefetching or reading
> > > the pages in count_nondeletable_pages() but the message doesn't have
> > > any such indication?  If not that, what problem do you see with the
> > > message?   What is your suggestion?
> >
> > I meant that with the patch, suppose that the table has 100 blocks and
> > we're truncating it to 50 blocks in RelationTruncate(), the error
> > context message will be "while truncating relation "aaa.bbb" to 100
> > blocks", which is not correct. I think it should be "while truncating
> > relation "aaa.bbb" to 50 blocks". We can know the relation can be
> > truncated to 50 blocks by the result of count_nondeletable_pages(). So
> > if we update the arguments before it we will use the number of blocks
> > of relation before truncation.
> >
> 
> Won't the latest patch by Justin will fix this as he has updated the
> block count after count_nondeletable_pages?  Apart from that, I feel

The issue is if the error happens *during* count_nondeletable_pages().
We don't want it to say "truncating relation to 100 blocks".

> the first call to update_vacuum_error_cbarg in lazy_truncate_heap
> should have input parameter as vacrelstats->nonempty_pages instead of
> new_rel_pages to indicate the remaining pages after truncation?

Yea, I think that addresses the issue.

-- 
Justin




Re: bad logging around broken restore_command

2020-03-24 Thread Fujii Masao




On 2020/03/10 11:47, Kyotaro Horiguchi wrote:

At Thu, 6 Feb 2020 23:23:42 +0900, Fujii Masao  
wrote in

On 2020/02/06 1:10, Jeff Janes wrote:

If the restore command claims to have succeeded, but fails to have created
a file with the right name (due to typos or escaping or quoting issues, for
example), no complaint is issued at the time, and it then fails later with
a relatively uninformative error message like "could not locate required
checkpoint record".

...

I don't see why ENOENT is thought to deserve the silent treatment.  It
seems weird that success gets logged ("restored log file \"%s\" from
archive"), but one particular type of unexpected failure does not.


Agreed.


In the first place it is not perfectly silent and that problem cannot
happen.  In the ENOENT case, the function reports "could not restore
file \"%s\" from archive: %s", but at DEBUG2, then returns false, and
the callers treat the failure properly.


Yes.


I've attached a patch which emits a LOG message for ENOENT.

Isn't it better to use the "could not stat file" message even in the ENOENT
case?
If yes, something like the message that you used in the patch should be
logged as a DETAIL or HINT message. So, what about the attached patch?


If you want to see some log messages in the case, it is sufficient to
raise the loglevel of the existing message from DEBUG2 to LOG.


Isn't it too noisy to log a message every time we fail to restore
an archived file? In the archive recovery case, it's common to fail
to restore archive files and then try to replay the WAL files in pg_wal.
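For reference, a minimal sketch (assuming the current shape of
RestoreArchivedFile(), not the attached patch) of the earlier suggestion, i.e.
when restore_command reports success but the expected file is missing, keep
the "could not stat file" wording and move the explanation into a DETAIL:

/* After restore_command reported success, check the file really exists. */
if (stat(xlogpath, &stat_buf) != 0)
{
	if (errno != ENOENT)
		ereport(FATAL,
				(errcode_for_file_access(),
				 errmsg("could not stat file \"%s\": %m", xlogpath)));

	ereport(LOG,
			(errcode_for_file_access(),
			 errmsg("could not stat file \"%s\": %m", xlogpath),
			 errdetail("restore_command returned success, but the restored file could not be found.")));
	return false;
}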

Regards,

--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters




Re: error context for vacuum to include block number

2020-03-24 Thread Amit Kapila
On Wed, Mar 25, 2020 at 10:05 AM Masahiko Sawada
 wrote:
>
> On Wed, 25 Mar 2020 at 12:44, Amit Kapila  wrote:
> >
> > On Tue, Mar 24, 2020 at 7:51 PM Masahiko Sawada
> >  wrote:
> > >
> > >
> > > I got the point. But if we set the error context before that, I think
> > > we need to change the error context message. The error context message
> > > of heap truncation phase is "while truncating relation \"%s.%s\" to %u
> > > blocks", but cbarg->blkno will be the number of blocks of the current
> > > relation.
> > >
> > >case VACUUM_ERRCB_PHASE_TRUNCATE:
> > >if (BlockNumberIsValid(cbarg->blkno))
> > >errcontext("while truncating relation \"%s.%s\" to %u 
> > > blocks",
> > >   cbarg->relnamespace, cbarg->relname, 
> > > cbarg->blkno);
> > >break;
> > >
> >
> > Do you mean to say that actually we are just prefetching or reading
> > the pages in count_nondeletable_pages() but the message doesn't have
> > any such indication?  If not that, what problem do you see with the
> > message?   What is your suggestion?
>
> I meant that with the patch, suppose that the table has 100 blocks and
> we're truncating it to 50 blocks in RelationTruncate(), the error
> context message will be "while truncating relation "aaa.bbb" to 100
> blocks", which is not correct. I think it should be "while truncating
> relation "aaa.bbb" to 50 blocks". We can know the relation can be
> truncated to 50 blocks by the result of count_nondeletable_pages(). So
> if we update the arguments before it we will use the number of blocks
> of relation before truncation.
>

Won't the latest patch by Justin will fix this as he has updated the
block count after count_nondeletable_pages?  Apart from that, I feel
the first call to update_vacuum_error_cbarg in lazy_truncate_heap
should have input parameter as vacrelstats->nonempty_pages instead of
new_rel_pages to indicate the remaining pages after truncation?
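In code form, the suggestion would amount to something like the following
sketch (argument order taken from the fragments quoted upthread, not from the
final patch):

update_vacuum_error_cbarg(vacrelstats,
                          VACUUM_ERRCB_PHASE_TRUNCATE,
                          vacrelstats->nonempty_pages, /* instead of new_rel_pages */
                          NULL, false);

/* Errors raised in here are then reported with the intended block count. */
new_rel_pages = count_nondeletable_pages(onerel, vacrelstats);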

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: error context for vacuum to include block number

2020-03-24 Thread Justin Pryzby
On Wed, Mar 25, 2020 at 01:34:43PM +0900, Masahiko Sawada wrote:
> I meant that with the patch, suppose that the table has 100 blocks and
> we're truncating it to 50 blocks in RelationTruncate(), the error
> context message will be "while truncating relation "aaa.bbb" to 100
> blocks", which is not correct.

> I think it should be "while truncating
> relation "aaa.bbb" to 50 blocks". We can know the relation can be
> truncated to 50 blocks by the result of count_nondeletable_pages(). So
> if we update the arguments before it we will use the number of blocks
> of relation before truncation.

Hm, yea, at that point it's:
|new_rel_pages = RelationGetNumberOfBlocks(onerel);
..so we can do better.

> My suggestion is either that we change the error message to, for
> example, "while truncating relation "aaa.bbb" having 100 blocks", or
> that we change the patch so that we can use "50 blocks" in the error
> context message.

We could do:

update_vacuum_error_cbarg(vacrelstats,
  VACUUM_ERRCB_PHASE_TRUNCATE,
  InvalidBlockNumber, NULL, false);

new_rel_pages = count_nondeletable_pages(onerel, vacrelstats);
vacrelstats->blkno = new_rel_pages;

...

case VACUUM_ERRCB_PHASE_TRUNCATE:
if (BlockNumberIsValid(cbarg->blkno))
errcontext("while truncating relation \"%s.%s\" to %u blocks",
   cbarg->relnamespace, cbarg->relname, cbarg->blkno);
else
/* Error happened before/during count_nondeletable_pages() */
errcontext("while truncating relation \"%s.%s\"",
   cbarg->relnamespace, cbarg->relname);
break;

-- 
Justin




Re: error context for vacuum to include block number

2020-03-24 Thread Masahiko Sawada
On Wed, 25 Mar 2020 at 12:44, Amit Kapila  wrote:
>
> On Tue, Mar 24, 2020 at 7:51 PM Masahiko Sawada
>  wrote:
> >
> > On Tue, 24 Mar 2020 at 22:37, Amit Kapila  wrote:
> > >
> > > On Tue, Mar 24, 2020 at 6:18 PM Masahiko Sawada
> > >  wrote:
> > > >
> > > >
> > > > I've read through the latest version patch (v31), here are my comments:
> > > >
> > > > 1.
> > > > +/* Update error traceback information */
> > > > +olderrcbarg = *vacrelstats;
> > > > +update_vacuum_error_cbarg(vacrelstats,
> > > > +  VACUUM_ERRCB_PHASE_TRUNCATE,
> > > > new_rel_pages, NULL,
> > > > +  false);
> > > > +
> > > >  /*
> > > >   * Scan backwards from the end to verify that the end pages 
> > > > actually
> > > >   * contain no tuples.  This is *necessary*, not optional, 
> > > > because
> > > >   * other backends could have added tuples to these pages 
> > > > whilst we
> > > >   * were vacuuming.
> > > >   */
> > > >  new_rel_pages = count_nondeletable_pages(onerel, vacrelstats);
> > > >
> > > > We need to set the error context after setting new_rel_pages.
> > > >
> > >
> > > We want to cover the errors raised in count_nondeletable_pages().  In
> > > an earlier version of the patch, we had TRUNCATE_PREFETCH phase which
> > > use to cover those errors, but that was not good as we were
> > > setting/resetting it multiple times and it was not clear such a
> > > separate phase would add any value.
> >
> > I got the point. But if we set the error context before that, I think
> > we need to change the error context message. The error context message
> > of heap truncation phase is "while truncating relation \"%s.%s\" to %u
> > blocks", but cbarg->blkno will be the number of blocks of the current
> > relation.
> >
> >case VACUUM_ERRCB_PHASE_TRUNCATE:
> >if (BlockNumberIsValid(cbarg->blkno))
> >errcontext("while truncating relation \"%s.%s\" to %u 
> > blocks",
> >   cbarg->relnamespace, cbarg->relname, 
> > cbarg->blkno);
> >break;
> >
>
> Do you mean to say that actually we are just prefetching or reading
> the pages in count_nondeletable_pages() but the message doesn't have
> any such indication?  If not that, what problem do you see with the
> message?   What is your suggestion?

I meant that with the patch, suppose that the table has 100 blocks and
we're truncating it to 50 blocks in RelationTruncate(), the error
context message will be "while truncating relation "aaa.bbb" to 100
blocks", which is not correct. I think it should be "while truncating
relation "aaa.bbb" to 50 blocks". We can know the relation can be
truncated to 50 blocks by the result of count_nondeletable_pages(). So
if we update the arguments before it we will use the number of blocks
of relation before truncation.

My suggestion is either that we change the error message to, for
example, "while truncating relation "aaa.bbb" having 100 blocks", or
that we change the patch so that we can use "50 blocks" in the error
context message.

Regards,

-- 
Masahiko Sawada  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Fastpath while arranging the changes in LSN order in logical decoding

2020-03-24 Thread Dilip Kumar
On Wed, Mar 25, 2020 at 9:23 AM Amit Kapila  wrote:
>
> On Wed, Mar 25, 2020 at 12:46 AM Andres Freund  wrote:
> >
> > On 2020-03-24 18:36:03 +0530, Dilip Kumar wrote:
> > > IMHO, I have tried the best case but did not see any performance gain
> > > so I am not planning to test this further.  I have attached the patch
> > > for removing the TODO.
> >
> > Pushed. Thanks!
> >
>
> I have updated the CF entry.  Thanks to all involved in this.

Thanks!

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Fastpath while arranging the changes in LSN order in logical decoding

2020-03-24 Thread Amit Kapila
On Wed, Mar 25, 2020 at 12:46 AM Andres Freund  wrote:
>
> On 2020-03-24 18:36:03 +0530, Dilip Kumar wrote:
> > IMHO, I have tried the best case but did not see any performance gain
> > so I am not planning to test this further.  I have attached the patch
> > for removing the TODO.
>
> Pushed. Thanks!
>

I have updated the CF entry.  Thanks to all involved in this.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: error context for vacuum to include block number

2020-03-24 Thread Amit Kapila
On Wed, Mar 25, 2020 at 8:49 AM Justin Pryzby  wrote:
>
> On Tue, Mar 24, 2020 at 09:47:30PM +0900, Masahiko Sawada wrote:
> > We need to set the error context after setting new_rel_pages.
>
> Done
>
> > The type name ErrCbPhase seems to be very generic name, how about
> > VacErrCbPhase or VacuumErrCbPhase?
>
> Done.
>
> Thanks for your review comments.
>

@@ -870,6 +904,12 @@ lazy_scan_heap(Relation onerel, VacuumParams
*params, LVRelStats *vacrelstats,
  else
  skipping_blocks = false;

+ /* Setup error traceback support for ereport() */
+ errcallback.callback = vacuum_error_callback;
+ errcallback.arg = vacrelstats;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;

I think by mistake you have re-introduced this chunk of code.  We
don't need this as we have done it in the caller.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: error context for vacuum to include block number

2020-03-24 Thread Amit Kapila
On Tue, Mar 24, 2020 at 7:51 PM Masahiko Sawada
 wrote:
>
> On Tue, 24 Mar 2020 at 22:37, Amit Kapila  wrote:
> >
> > On Tue, Mar 24, 2020 at 6:18 PM Masahiko Sawada
> >  wrote:
> > >
> > >
> > > I've read through the latest version patch (v31), here are my comments:
> > >
> > > 1.
> > > +/* Update error traceback information */
> > > +olderrcbarg = *vacrelstats;
> > > +update_vacuum_error_cbarg(vacrelstats,
> > > +  VACUUM_ERRCB_PHASE_TRUNCATE,
> > > new_rel_pages, NULL,
> > > +  false);
> > > +
> > >  /*
> > >   * Scan backwards from the end to verify that the end pages 
> > > actually
> > >   * contain no tuples.  This is *necessary*, not optional, because
> > >   * other backends could have added tuples to these pages whilst 
> > > we
> > >   * were vacuuming.
> > >   */
> > >  new_rel_pages = count_nondeletable_pages(onerel, vacrelstats);
> > >
> > > We need to set the error context after setting new_rel_pages.
> > >
> >
> > We want to cover the errors raised in count_nondeletable_pages().  In
> > an earlier version of the patch, we had TRUNCATE_PREFETCH phase which
> > use to cover those errors, but that was not good as we were
> > setting/resetting it multiple times and it was not clear such a
> > separate phase would add any value.
>
> I got the point. But if we set the error context before that, I think
> we need to change the error context message. The error context message
> of heap truncation phase is "while truncating relation \"%s.%s\" to %u
> blocks", but cbarg->blkno will be the number of blocks of the current
> relation.
>
>case VACUUM_ERRCB_PHASE_TRUNCATE:
>if (BlockNumberIsValid(cbarg->blkno))
>errcontext("while truncating relation \"%s.%s\" to %u blocks",
>   cbarg->relnamespace, cbarg->relname, cbarg->blkno);
>break;
>

Do you mean to say that actually we are just prefetching or reading
the pages in count_nondeletable_pages() but the message doesn't have
any such indication?  If not that, what problem do you see with the
message?   What is your suggestion?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: error context for vacuum to include block number

2020-03-24 Thread Justin Pryzby
On Tue, Mar 24, 2020 at 09:47:30PM +0900, Masahiko Sawada wrote:
> We need to set the error context after setting new_rel_pages.

Done

> The type name ErrCbPhase seems to be very generic name, how about
> VacErrCbPhase or VacuumErrCbPhase?

Done.

Thanks for your review comments.

-- 
Justin
From 26c57039135896ebf29b96c172d35d869ed1ce69 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Thu, 12 Dec 2019 20:54:37 -0600
Subject: [PATCH v32 1/3] Introduce vacuum errcontext to display additional
 information.

The additional information displayed will be block number for error
occurring while processing heap and index name for error occurring
while processing the index.

This will help us in diagnosing the problems that occur during a vacuum.
For ex. due to corruption (either caused by bad hardware or by some bug)
if we get some error while vacuuming, it can help us identify the block
in heap and or additional index information.

It sets up an error context callback to display additional information
with the error.  During different phases of vacuum (heap scan, heap
vacuum, index vacuum, index clean up, heap truncate), we update the error
context callback to display appropriate information.  We can extend it to
a bit more granular level like adding the phases for FSM operations or for
prefetching the blocks while truncating. However, I felt that it requires
adding many more error callback function calls and can make the code a bit
complex, so left those for now.

Author: Justin Pryzby, with few changes by Amit Kapila
Reviewed-by: Alvaro Herrera, Amit Kapila, Andres Freund, Michael Paquier
and Sawada Masahiko
Discussion: https://www.postgresql.org/message-id/20191120210600.gc30...@telsasoft.com
---
 src/backend/access/heap/vacuumlazy.c | 260 ---
 src/tools/pgindent/typedefs.list |   1 +
 2 files changed, 236 insertions(+), 25 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 03c43efc32..cbea791968 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -144,6 +144,17 @@
  */
 #define ParallelVacuumIsActive(lps) PointerIsValid(lps)
 
+/* Phases of vacuum during which we report error context. */
+typedef enum
+{
+	VACUUM_ERRCB_PHASE_UNKNOWN,
+	VACUUM_ERRCB_PHASE_SCAN_HEAP,
+	VACUUM_ERRCB_PHASE_VACUUM_INDEX,
+	VACUUM_ERRCB_PHASE_VACUUM_HEAP,
+	VACUUM_ERRCB_PHASE_INDEX_CLEANUP,
+	VACUUM_ERRCB_PHASE_TRUNCATE
+} VacErrCbPhase;
+
 /*
  * LVDeadTuples stores the dead tuple TIDs collected during the heap scan.
  * This is allocated in the DSM segment in parallel mode and in local memory
@@ -270,6 +281,8 @@ typedef struct LVParallelState
 
 typedef struct LVRelStats
 {
+	char	   *relnamespace;
+	char	   *relname;
 	/* useindex = true means two-pass strategy; false means one-pass */
 	bool		useindex;
 	/* Overall statistics about rel */
@@ -290,8 +303,12 @@ typedef struct LVRelStats
 	int			num_index_scans;
 	TransactionId latestRemovedXid;
 	bool		lock_waiter_detected;
-} LVRelStats;
 
+	/* Used for error callback */
+	char	   *indname;
+	BlockNumber blkno;			/* used only for heap operations */
+	VacErrCbPhase	phase;
+} LVRelStats;
 
 /* A few variables that don't seem worth passing around as parameters */
 static int	elevel = -1;
@@ -314,10 +331,10 @@ static void lazy_vacuum_all_indexes(Relation onerel, Relation *Irel,
 	LVRelStats *vacrelstats, LVParallelState *lps,
 	int nindexes);
 static void lazy_vacuum_index(Relation indrel, IndexBulkDeleteResult **stats,
-			  LVDeadTuples *dead_tuples, double reltuples);
+			  LVDeadTuples *dead_tuples, double reltuples, LVRelStats *vacrelstats);
 static void lazy_cleanup_index(Relation indrel,
 			   IndexBulkDeleteResult **stats,
-			   double reltuples, bool estimated_count);
+			   double reltuples, bool estimated_count, LVRelStats *vacrelstats);
 static int	lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer,
 			 int tupindex, LVRelStats *vacrelstats, Buffer *vmbuffer);
 static bool should_attempt_truncation(VacuumParams *params,
@@ -337,13 +354,13 @@ static void lazy_parallel_vacuum_indexes(Relation *Irel, IndexBulkDeleteResult *
 		 int nindexes);
 static void parallel_vacuum_index(Relation *Irel, IndexBulkDeleteResult **stats,
   LVShared *lvshared, LVDeadTuples *dead_tuples,
-  int nindexes);
+  int nindexes, LVRelStats *vacrelstats);
 static void vacuum_indexes_leader(Relation *Irel, IndexBulkDeleteResult **stats,
   LVRelStats *vacrelstats, LVParallelState *lps,
   int nindexes);
 static void vacuum_one_index(Relation indrel, IndexBulkDeleteResult **stats,
 			 LVShared *lvshared, LVSharedIndStats *shared_indstats,
-			 LVDeadTuples *dead_tuples);
+			 LVDeadTuples *dead_tuples, LVRelStats *vacrelstats);
 static void lazy_cleanup_all_indexes(Relation *Irel, IndexBulkDeleteResult **stats,
 	 LVRelStats *vacrelstats, 

Re: Collation versions on Windows (help wanted, apply within)

2020-03-24 Thread Thomas Munro
On Tue, Mar 24, 2020 at 7:56 AM Juan José Santamaría Flecha
 wrote:
> On Mon, Mar 23, 2020 at 5:59 AM Thomas Munro  wrote:
>> Done in this new 0002 patch (untested).  0001 removes the comment that
>> individual collations can't have a NULL version, reports NULL for
>> Linux/glibc collations like C.UTF-8 by stripping the suffix and
>> comparing with C and POSIX as suggested by Peter E.
>
>  It applies and passes tests without a problem in Windows, and works as 
> expected.

Thanks!  Pushed.

From the things we learned in this thread, I think there is an open
item for someone to write a patch to call EnumSystemLocalesEx() and
populate the initial set of collations, where we use "locale -a" on
Unix.  I'm not sure where the encoding is supposed to come from
though, which is why I didn't try to write a patch myself.
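For whoever picks that up, a minimal standalone sketch (plain Win32 API usage,
not PostgreSQL code) of what the enumeration itself looks like; mapping each
returned locale name to an encoding is the open question:

#include <windows.h>
#include <stdio.h>

/* Callback invoked once per installed locale name, e.g. "en-US". */
static BOOL CALLBACK
print_locale(LPWSTR name, DWORD flags, LPARAM lparam)
{
	wprintf(L"%ls\n", name);
	return TRUE;				/* keep enumerating */
}

int
main(void)
{
	if (!EnumSystemLocalesEx(print_locale, LOCALE_ALL, 0, NULL))
		fprintf(stderr, "EnumSystemLocalesEx() failed: %lu\n", GetLastError());
	return 0;
}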




Re: improve transparency of bitmap-only heap scans

2020-03-24 Thread Amit Kapila
On Wed, Mar 25, 2020 at 12:44 AM Tom Lane  wrote:
>
> I took a quick look through this patch.  While I see nothing to complain
> about implementation-wise, I'm a bit befuddled as to why we need this
> reporting when there is no comparable data provided for regular index-only
> scans.  Over there, you just get "Heap Fetches: n", and the existing
> counts for bitmap scans seem to cover the same territory.
>

Isn't deducing similar information ("Skipped Heap Fetches: n") there
a bit easier than it is here?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




[PATCH] Fix CommitTransactionCommand() to CallXactCallbacks() in TBLOCK_ABORT_END

2020-03-24 Thread Dave Sharpe
Hi pghackers,
This is my first time posting here ...  Gilles Darold and I are developing a 
new FDW which is based on the contrib/postgres_fdw. The postgres_fdw logic uses 
a RegisterXactCallback function to send local transactions remote. But I found 
that a registered XactCallback is not always called for a successful client 
ROLLBACK statement. So, a successful local ROLLBACK does not get sent remote by 
FDW, and now the local and remote transaction states are messed up, out of 
sync. The local database is "eating" the successful rollback.

I attach a git format-patch file 
0001-Fix-CommitTransactionCommand-to-CallXactCallbacks-in.patch
The patch fixes the problem and is ready to commit as far as I can tell. It 
adds some comment lines and three lines of code to CommitTransactionCommand() 
in the TBLOCK_ABORT_END case: (1) reset the transaction's blockState 
back to TBLOCK_ABORT, ahead of (2) a new call to CallXactCallbacks(). If the 
callbacks return successfully (which is usually the case), (3) the new code 
switches back to TBLOCK_ABORT_END, then continues with the old code to 
CleanupTransaction() as before. If any callback does error out, 
TBLOCK_ABORT was the correct block state for an error.
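As a rough sketch of that change (reconstructed from the description above;
the authoritative version is in the attached patch), the TBLOCK_ABORT_END case
would look roughly like:

case TBLOCK_ABORT_END:
	/* (1) Pretend we are still in the aborted transaction ... */
	s->blockState = TBLOCK_ABORT;
	/* (2) ... so registered callbacks (e.g. an FDW's) see the ROLLBACK. */
	CallXactCallbacks(XACT_EVENT_ABORT);
	/* (3) No callback errored out: resume the normal cleanup path. */
	s->blockState = TBLOCK_ABORT_END;
	CleanupTransaction();
	s->blockState = TBLOCK_DEFAULT;
	break;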

I have not added a regression test since my scenario involves a C module ... I 
didn't see such a regression test, but somebody can teach me where to put the C 
module and Makefile if a regression test should be added. Heads up that the 
scenario to hit this is a bit complex (to me) ... I attach the following unit 
test files:
1) eat_rollback.c, _PG_init() installs my_utility_hook() for INFO log for 
debugging.
RegisterSubXactCallback(mySubtransactionCallback) which injects some logging 
and an ERROR for savepoints, i.e., negative testing, e.g., like a remote FDW 
savepoint failed.
And RegisterXactCallback(myTransactionCallback) with info logging.
2) Makefile, to make the eat_rollback.so
3) eat_rollback2.sql, drives the fail sequence, especially the SAVEPOINT, which 
errors out, then later a successful ROLLBACK gets incorrectly eaten (no 
CALLBACK info logging, demonstrates the bug), then another successful ROLLBACK 
(now there is CALLBACK info logging).
4) eat_rollback2.out, output without the fix, the rollback at line 68 is 
successful but there is no myTransactionCallback() INFO output
5) eat_rollback2.fixed, output after the fix, the rollback at line 68 is 
successful, and now there is a myTransactionCallback() INFO log. Success!

It first failed for me in v11.1, I suspect it failed since before then too. And 
it is still failing in current master.

By the way, we worked around the bug in our FDW by handling sub/xact in the 
utility hook. A transaction callback is still needed for implicit, internal 
rollbacks. Another advantage of the workaround is (I think) it reduces the code 
complexity and improves performance because the entire subxact stack is not 
unwound to drive each and every subtransaction level to remote. Also 
sub/transaction statements are sent remote as they arrive (local and remote are 
more "transactionally" synced, not stacked by FDW for later). And of course, 
the workaround doesn't hit the above bug, since our utility hook correctly 
handles the client ROLLBACK. If it makes sense to the community, I could try 
and patch contrib/postgres_fdw to do what we are doing. But note that I don't 
need it myself: we have our own new FDW for remote DB2 z/OS (!) ... at LzLabs 
we are building a software-defined mainframe with PostgreSQL (of all things).

Hope it helps!

Dave Sharpe
I don't necessarily agree with everything I say. (MM)
www.lzlabs.com


0001-Fix-CommitTransactionCommand-to-CallXactCallbacks-in.patch
Description:  0001-Fix-CommitTransactionCommand-to-CallXactCallbacks-in.patch


eat_rollback.c
Description: eat_rollback.c


Makefile
Description: Makefile


eat_rollback2.sql
Description: eat_rollback2.sql


eat_rollback2.out
Description: eat_rollback2.out


eat_rollback2.fixed
Description: eat_rollback2.fixed


Re: Run-time pruning for ModifyTable

2020-03-24 Thread Amit Langote
Hi David,

Sorry I couldn't get to this sooner.

On Wed, Mar 25, 2020 at 9:49 AM David Rowley  wrote:
> On Wed, 25 Mar 2020 at 13:00, Tom Lane  wrote:
> > David Rowley  writes:
> > > I had a closer look at this today and the code I have in
> > > inheritance_planner() is certainly not right.
> >
> > Although I didn't get around to it for v13, there's still a plan on the
> > table for inheritance_planner() to get nuked from orbit [1].
> >
> > Maybe this improvement should be put on hold till that's done?
>
> Possibly. I'm not really wedded to the idea of getting it in. However,
> it would really only be the inheritance planner part that would need
> to be changed later. I don't think any of the other code would need to
> be adjusted.
>
> Amit shared his thoughts in [1].  If you'd rather I held off, then I will.
>
> David
>
> [1] 
> https://www.postgresql.org/message-id/CA%2BHiwqGhD7ieKsv5%2BGkmHgs-XhP2DoUhtESVb3MU-4j14%3DG6LA%40mail.gmail.com

Actually, I was saying in that email that the update/delete planning
overhaul being talked about will make the functionality this patch is
adding (the ModifyTable node being able to prune its subplans based on
run-time parameter values) entirely redundant.  That's because, with
the overhaul, there won't be multiple subplans under ModifyTable, only
one, which would take care of any pruning that's necessary.

What I did say in favor of this patch, though, is that it does not seem
that invasive, so it may be okay to get it in for v13.

-- 
Thank you,

Amit Langote
EnterpriseDB: http://www.enterprisedb.com




Re: some AppVeyor files

2020-03-24 Thread Thomas Munro
On Tue, Mar 24, 2020 at 5:05 AM Peter Eisentraut
 wrote:
> You can also run this yourself without the detour through the commit
> fest app.  Attached are three patches that add .appveyor.yml files, for
> MSVC, MinGW, and Cygwin respectively.  (An open problem is to combine
> them all into one.)  I have been using these regularly over the last few
> months to test code on these Windows variants.

Thanks!  I added a link to this thread to a Wiki page that tries to
collect information on this topic[1].  Another thing you could be
interested in is the ability to test on several different MSVC
versions (I tried to find some appveyor.yml files I had around here
somewhere to do that, but no cigar... it's just different paths for
those .bat files that set up the environment).

Here is my current wish list for Windows CI:

1.  Run check-world with tap tests.
2.  Turn on the equivalent of -Werror (maybe).
3.  Turn on asserts.
4.  Print backtraces on crash.
5.  Dump all potentially relevant logs on failure (initdb.log,
regression.diff etc).
6.  Find a Windows thing that is like ccache and preserve its cache
across builds (like Travis, which saves some build time).

[1] https://wiki.postgresql.org/wiki/Continuous_Integration




Re: pg_upgrade fails with non-standard ACL

2020-03-24 Thread Artur Zakirov

Hello David,

On 3/25/2020 2:08 AM, David Steele wrote:

On 12/17/19 3:10 AM, Arthur Zakirov wrote:


I attached new version of the patch. It still uses 
pg_identify_object(), I'm not sure about other ways to build 
identities yet.


This patch applies and builds but fails regression tests on Linux and 
Windows:

https://travis-ci.org/postgresql-cfbot/postgresql/builds/666134656
https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.85292


Regression tests fail because cfbot applies 
"test_rename_catalog_objects.patch". Regression tests pass without it.


This patch shouldn't be applied by cfbot. I'm not sure how to arrange that, 
but maybe it is possible to use a different extension for the test patch, 
something other than ".patch".


--
Artur




AllocSetEstimateChunkSpace()

2020-03-24 Thread Jeff Davis
Attached is a small patch that introduces a simple function,
AllocSetEstimateChunkSpace(), and uses it to improve the estimate for
the size of a hash entry for hash aggregation.

Getting reasonable estimates for the hash entry size (including the
TupleHashEntryData, the group key tuple, the per-group state, and by-
ref transition values) is important for multiple reasons:

* It helps the planner know when hash aggregation is likely to spill,
and how to cost it.

* It helps hash aggregation regulate itself by setting a group limit
(separate from the memory limit) to account for growing by-ref
transition values.

* It helps choose a good initial size for the hash table. Too large of
a hash table will crowd out space that could be used for the group keys
or the per-group state.

Each group requires up to three palloc chunks: one for the group key
tuple, one for the per-group states, and one for a by-ref transition
value. Each chunk can have a lot of overhead: in addition to the chunk
header (16 bytes overhead), it also rounds the value up to a power of
two (~25% overhead). So, it's important to account for this chunk
overhead.
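As a concrete (hypothetical) illustration of that overhead for a small
request:

#include <stdio.h>

/*
 * Toy calculation, not the PostgreSQL implementation: a 40-byte palloc
 * request in an AllocSet is rounded up to the next power of two (64 bytes)
 * and carries a 16-byte chunk header, so it really consumes about 80 bytes,
 * roughly double what a naive estimate based on the request would assume.
 */
int
main(void)
{
	size_t		request = 40;
	size_t		chunk_hdr = 16;
	size_t		usable = 8;		/* smallest AllocSet chunk size */

	while (usable < request)
		usable <<= 1;			/* power-of-two rounding */

	printf("requested %zu bytes, consumes about %zu bytes\n",
		   request, usable + chunk_hdr);
	return 0;
}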

I considered making it a method of a memory context, but the planner
will call this function before the hash agg memory context is created.
It seems to make more sense for this to just be an AllocSet-specific
function for now.

Regards,
Jeff Davis

diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
index 2d94d7dc25e..7fa71ad2d65 100644
--- a/src/backend/executor/nodeAgg.c
+++ b/src/backend/executor/nodeAgg.c
@@ -1637,16 +1637,37 @@ find_hash_columns(AggState *aggstate)
 
 /*
  * Estimate per-hash-table-entry overhead.
+ *
+ * NB: This assumes the memory context at execution time will be an
+ * AllocSetContext.
  */
 Size
-hash_agg_entry_size(int numAggs, Size tupleWidth, Size transitionSpace)
+hash_agg_entry_size(int numTrans, Size tupleWidth, Size transitionSpace)
 {
+	Size	 tupleChunkSize;
+	Size	 pergroupChunkSize;
+	Size	 transitionChunkSize;
+	Size	 tupleSize			 = (MAXALIGN(SizeofMinimalTupleHeader) +
+	tupleWidth);
+	Size	 pergroupSize		 = numTrans * sizeof(AggStatePerGroupData);
+
+	tupleChunkSize = AllocSetEstimateChunkSpace(tupleSize);
+
+	if (pergroupSize > 0)
+		pergroupChunkSize = AllocSetEstimateChunkSpace(pergroupSize);
+	else
+		pergroupChunkSize = 0;
+
+	if (transitionSpace > 0)
+		transitionChunkSize = AllocSetEstimateChunkSpace(transitionSpace);
+	else
+		transitionChunkSize = 0;
+
 	return
-		MAXALIGN(SizeofMinimalTupleHeader) +
-		MAXALIGN(tupleWidth) +
-		MAXALIGN(sizeof(TupleHashEntryData) +
- numAggs * sizeof(AggStatePerGroupData)) +
-		transitionSpace;
+		sizeof(TupleHashEntryData) +
+		tupleChunkSize +
+		pergroupChunkSize +
+		transitionChunkSize;
 }
 
 /*
diff --git a/src/backend/utils/mmgr/aset.c b/src/backend/utils/mmgr/aset.c
index c0623f106d2..678ddd77912 100644
--- a/src/backend/utils/mmgr/aset.c
+++ b/src/backend/utils/mmgr/aset.c
@@ -560,6 +560,30 @@ AllocSetContextCreateInternal(MemoryContext parent,
 	return (MemoryContext) set;
 }
 
+/*
+ * Estimate total memory consumed for a chunk of the requested size.
+ */
+Size
+AllocSetEstimateChunkSpace(Size size)
+{
+	Size		chunk_size;
+
+	if (size > ALLOC_CHUNK_LIMIT)
+	{
+		chunk_size = MAXALIGN(size);
+
+		return chunk_size + ALLOC_BLOCKHDRSZ + ALLOC_CHUNKHDRSZ;
+	}
+	else
+	{
+		int			fidx = AllocSetFreeIndex(size);
+
+		chunk_size = (1 << ALLOC_MINBITS) << fidx;
+
+		return chunk_size + ALLOC_CHUNKHDRSZ;
+	}
+}
+
 /*
  * AllocSetReset
  *		Frees all memory which is allocated in the given set.
diff --git a/src/include/executor/nodeAgg.h b/src/include/executor/nodeAgg.h
index a5b8a004d1e..c2b55728bfa 100644
--- a/src/include/executor/nodeAgg.h
+++ b/src/include/executor/nodeAgg.h
@@ -314,7 +314,7 @@ extern AggState *ExecInitAgg(Agg *node, EState *estate, int eflags);
 extern void ExecEndAgg(AggState *node);
 extern void ExecReScanAgg(AggState *node);
 
-extern Size hash_agg_entry_size(int numAggs, Size tupleWidth,
+extern Size hash_agg_entry_size(int numTrans, Size tupleWidth,
 Size transitionSpace);
 extern void hash_agg_set_limits(double hashentrysize, uint64 input_groups,
 int used_bits, Size *mem_limit,
diff --git a/src/include/utils/memutils.h b/src/include/utils/memutils.h
index 909bc2e9888..67e53e4b977 100644
--- a/src/include/utils/memutils.h
+++ b/src/include/utils/memutils.h
@@ -155,6 +155,7 @@ extern MemoryContext AllocSetContextCreateInternal(MemoryContext parent,
    Size minContextSize,
    Size initBlockSize,
    Size maxBlockSize);
+extern Size AllocSetEstimateChunkSpace(Size size);
 
 /*
  * This wrapper macro exists to check for non-constant strings used as context


Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-03-24 Thread James Coleman
On Tue, Mar 24, 2020 at 8:23 PM James Coleman  wrote:
> While working on finding a test case to show rescan isn't implemented
> properly yet, I came across a bug. At the top of
> ExecInitIncrementalSort, we assert that eflags does not contain
> EXEC_FLAG_REWIND. But the following query (with merge and hash joins
> disabled) breaks that assertion:
>
> select * from t join (select * from t order by a, b) s on s.a = t.a
> where t.a in (1,2);
>
> The comments about this flag in src/include/executor/executor.h say:
>
> * REWIND indicates that the plan node should try to efficiently support
> * rescans without parameter changes. (Nodes must support ExecReScan calls
> * in any case, but if this flag was not given, they are at liberty to do it
> * through complete recalculation. Note that a parameter change forces a
> * full recalculation in any case.)
>
> Now we know that except in rare cases (as just discussed recently up
> thread) we can't implement rescan efficiently.
>
> So is this a planner bug (i.e., should we try not to generate
> incremental sort plans that require efficient rewind)? Or can we just
> remove that part of the assertion and know that we'll implement the
> rescan, albeit inefficiently? We already explicitly declare that we
> don't support backwards scanning, but I don't see a way to declare the
> same for rewind.

Other nodes seem to get a materialization node placed above them to
support this case "better". Is that something we should be doing?

> I'm going to try the latter approach now to see if it at least solves
> the immediate problem...

So a couple of interesting results here. First, it does seem to fix
the assertion failure, and rescan is being used in this case (as I
expected).

The plans have a bit of a weird shape, at least in my mind. First, to
get the incremental sort on the right side of the join I had to:
set enable_mergejoin = off;
set enable_hashjoin = off;
and got this plan:

 Gather  (cost=1000.47..108541.96 rows=2 width=16)
   Workers Planned: 2
   ->  Nested Loop  (cost=0.47..107541.76 rows=1 width=16)
 Join Filter: (t.a = t_1.a)
 ->  Parallel Seq Scan on t  (cost=0.00..9633.33 rows=1 width=8)
   Filter: (a = ANY ('{1,2}'::integer[]))
 ->  Incremental Sort  (cost=0.47..75408.43 rows=100 width=8)
   Sort Key: t_1.a, t_1.b
   Presorted Key: t_1.a
   ->  Index Scan using idx_t_a on t t_1
(cost=0.42..30408.42 rows=100 width=8)

To get rid of the parallelism but keep the same basic plan shape I had
to further:
set enable_seqscan = off;
set enable_material = off;
and got this plan:

 Nested Loop  (cost=0.89..195829.74 rows=2 width=16)
   Join Filter: (t.a = t_1.a)
   ->  Index Scan using idx_t_a on t  (cost=0.42..12.88 rows=2 width=8)
 Index Cond: (a = ANY ('{1,2}'::integer[]))
   ->  Incremental Sort  (cost=0.47..75408.43 rows=100 width=8)
 Sort Key: t_1.a, t_1.b
 Presorted Key: t_1.a
 ->  Index Scan using idx_t_a on t t_1  (cost=0.42..30408.42
rows=100 width=8)

Two observations:
1. Ideally the planner would have realized that the join condition can
be safely pushed down into both index scans. I was surprised this
didn't happen, but...maybe that's just not supported?

2. Ideally the nested loop node would have the smarts to know that the
right child is ordered, and therefore it can stop pulling tuples from
it as soon as they stop matching the join condition for each iteration
of the loop. I'm less surprised this isn't supported; it seems like a
fairly advanced optimization (OTOH it is the kind of interesting
optimization incremental sort opens up in more cases).

I don't *think* either of these are issues with the patch, but wanted
to mention them in case it piqued someone's curiosity or in case it
actually is a bug [in our patch or otherwise].

James




Re: Run-time pruning for ModifyTable

2020-03-24 Thread David Rowley
On Wed, 25 Mar 2020 at 13:00, Tom Lane  wrote:
>
> David Rowley  writes:
> > I had a closer look at this today and the code I have in
> > inheritance_planner() is certainly not right.
>
> Although I didn't get around to it for v13, there's still a plan on the
> table for inheritance_planner() to get nuked from orbit [1].
>
> Maybe this improvement should be put on hold till that's done?

Possibly. I'm not really wedded to the idea of getting it in. However,
it would really only be the inheritance planner part that would need
to be changed later. I don't think any of the other code would need to
be adjusted.

Amit shared his thoughts in [1].  If you'd rather I held off, then I will.

David

[1] 
https://www.postgresql.org/message-id/CA%2BHiwqGhD7ieKsv5%2BGkmHgs-XhP2DoUhtESVb3MU-4j14%3DG6LA%40mail.gmail.com




Re: Include sequence relation support in logical replication

2020-03-24 Thread Andres Freund
On 2020-03-24 16:19:21 -0700, Cary Huang wrote:
> Hi
> 
> 
> 
> From the PG logical replication documentation, I see that there is a
> listed limitation that sequence relation is not replicated
> logically. After some examination, I see that retrieving the next
> value from a sequence using the nextval() call will emits a WAL update
> every 32 calls to nextval(). In fact, when it emits a WAL update, it
> will write a future value 32 increments from now, and maintain a
> internal cache for delivering sequence numbers. It is done this way to
> minimize the write operation to WAL record at a risk of losing some
> values during a crash. So if we were to replicate the sequence, the
> subscriber will receive a future value (32 calls to nextval()) from
> now, and it obviously does not reflect current status. Sequence
> changes caused by other sequence-related SQL functions like setval()
> or ALTER SEQUENCE xxx, will always emit a WAL update, so replicating
> changes caused by these should not be a problem. 
> 
> 
> 
> I have shared a patch that allows sequence relation to be supported in 
> logical replication via the decoding plugin ( test_decoding for example ); it 
> does not support sequence relation in logical replication between a PG 
> publisher and a PG subscriber via pgoutput plugin as it will require much 
> more work. For the replication to make sense, the patch actually disables the 
> WAL update at every 32 nextval() calls, so every call to nextval() will emit 
> a WAL update for proper replication. This is done by setting SEQ_LOG_VALS to 
> 0 in sequence.c
> 
> 
> 
> I think the question is whether we should minimize the WAL update frequency 
> (every 32 calls) for getting the next value in a sequence, at the cost of 
> losing values during a crash, or be able to replicate a sequence relation 
> properly at the cost of more WAL updates?
> 
> 
> 
> 
> 
> Cary Huang
> 
> -
> 
> HighGo Software Inc. (Canada)
> 
> mailto:cary.hu...@highgo.ca
> 
> http://www.highgo.ca

> diff --git a/contrib/test_decoding/test_decoding.c 
> b/contrib/test_decoding/test_decoding.c
> index 93c948856e..7a7e572d6c 100644
> --- a/contrib/test_decoding/test_decoding.c
> +++ b/contrib/test_decoding/test_decoding.c
> @@ -466,6 +466,15 @@ pg_decode_change(LogicalDecodingContext *ctx, 
> ReorderBufferTXN *txn,
>   
> >data.tp.oldtuple->tuple,
>   true);
>   break;
> + case REORDER_BUFFER_CHANGE_SEQUENCE:
> + appendStringInfoString(ctx->out, " 
> SEQUENCE:");
> + if (change->data.sequence.newtuple == 
> NULL)
> + 
> appendStringInfoString(ctx->out, " (no-tuple-data)");
> + else
> + tuple_to_stringinfo(ctx->out, 
> tupdesc,
> + 
> >data.sequence.newtuple->tuple,
> + 
> false);
> + break;
>   default:
>   Assert(false);
>   }
> diff --git a/src/backend/commands/sequence.c b/src/backend/commands/sequence.c
> index 6aab73bfd4..941015e4aa 100644
> --- a/src/backend/commands/sequence.c
> +++ b/src/backend/commands/sequence.c
> @@ -49,11 +49,10 @@
>  
>  
>  /*
> - * We don't want to log each fetching of a value from a sequence,
> - * so we pre-log a few fetches in advance. In the event of
> - * crash we can lose (skip over) as many values as we pre-logged.
> + * Sequence replication is now supported and we will now need to log each 
> sequence
> + * update to WAL such that the standby can properly receive the sequence 
> change
>   */
> -#define SEQ_LOG_VALS 32
> +#define SEQ_LOG_VALS 0
>  
>  /*
>   * The "special area" of a sequence's buffer page looks like this.
> diff --git a/src/backend/replication/logical/decode.c 
> b/src/backend/replication/logical/decode.c
> index c2e5e3abf8..3dc14ead08 100644
> --- a/src/backend/replication/logical/decode.c
> +++ b/src/backend/replication/logical/decode.c
> @@ -42,6 +42,7 @@
>  #include "replication/reorderbuffer.h"
>  #include "replication/snapbuild.h"
>  #include "storage/standby.h"
> +#include "commands/sequence.h"
>  
>  typedef struct XLogRecordBuffer
>  {
> @@ -70,9 +71,11 @@ static void DecodeCommit(LogicalDecodingContext *ctx, 
> XLogRecordBuffer *buf,
>xl_xact_parsed_commit *parsed, 
> TransactionId xid);
>  static void DecodeAbort(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
>   xl_xact_parsed_abort *parsed, 
> TransactionId xid);
> +static void DecodeSequence(LogicalDecodingContext *ctx, 

Re: Index Skip Scan

2020-03-24 Thread Andy Fan
On Wed, Mar 25, 2020 at 12:41 AM Dmitry Dolgov <9erthali...@gmail.com>
wrote:

> > On Wed, Mar 11, 2020 at 06:56:09PM +0800, Andy Fan wrote:
> >
> > There was a dedicated thread  [1]  where David explain his idea very
> > detailed, and you can also check that messages around that message for
> > the context. hope it helps.
>
> Thanks for pointing out to this thread! Somehow I've missed it, and now
> looks like we need to make some efforts to align patches for index skip
> scan and distinctClause elimination.
>

Yes :).  It looks like index skip scan is a way of producing a distinct
result without a real Distinct node, which happens after the UniqueKeys
check (where I try to see whether the result is already unique) and before
the place where we create a Unique node for the DISTINCT clause (with index
skip scan we don't need it at all).  Currently the logic in my patch is:
1) check the UniqueKeys to see whether the result is already unique;
2) if not, after the distinct paths are created, add the result of the
distinct path as a unique key.  Will you add the index skip scan path
during create_distinct_paths() and add the UniqueKey to RelOptInfo?  If
so, I guess my current patch can handle it, since it only cares about the
result of the distinct path and not about how we achieve it.


Best Regards
Andy Fan


Re: PATCH: add support for IN and @> in functional-dependency statistics use

2020-03-24 Thread Tomas Vondra

On Thu, Mar 19, 2020 at 07:53:39PM +, Dean Rasheed wrote:

On Wed, 18 Mar 2020 at 00:29, Tomas Vondra  wrote:


OK, I took a look. I think from the correctness POV the patch is OK, but
I think the dependencies_clauselist_selectivity() function now got a bit
too complex. I've been able to parse it now, but I'm sure I'll have
trouble in the future :-(

Can we refactor / split it somehow and move bits of the logic to smaller
functions, or something like that?



Yeah, it has gotten a bit long. It's somewhat tricky splitting it up,
because of the number of shared variables used throughout the
function, but here's an updated patch splitting it into what seemed
like the 2 most logical pieces. The first piece (still in
dependencies_clauselist_selectivity()) works out what dependencies
can/should be applied, and the second piece in a new function does the
actual work of applying the list of functional dependencies to the
clause list.

I think that has made it easier to follow, and it has also reduced the
complexity of the final "no applicable stats" branch.



Seems OK to me.

I'd perhaps name deps_clauselist_selectivity differently, it's a bit too
similar to dependencies_clauselist_selectivity. Perhaps something like
clauselist_apply_dependencies? But that's a minor detail.


Another thing I'd like to suggest is keeping the "old" formula, and
instead of just replacing it with

P(a,b) = f * Min(P(a), P(b)) + (1-f) * P(a) * P(b)

but explaining how the old formula may produce nonsensical selectivity,
and how the new formula addresses that issue.



I think this is purely a comment issue? I've added some more extensive
comments attempting to justify the formulae.



Yes, it was purely a comment issue. Seems fine now.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-03-24 Thread James Coleman
On Tue, Mar 24, 2020 at 7:08 PM Tomas Vondra
 wrote:
>
> On Tue, Mar 24, 2020 at 06:26:11PM -0400, James Coleman wrote:
> >On Mon, Mar 23, 2020 at 11:44 PM Alvaro Herrera
> > wrote:
> >>
> >> On 2020-Mar-23, James Coleman wrote:
> >>
> >> > 4. nodeIncrementalSort.c ExecReScanIncrementalSort: This whole chunk
> >> > is suspect. I've mentioned previously I don't have a great mental
> >> > model of how rescan works and its invariants (IIRC someone said it was
> >> > about moving around a result set in a cursor). Regardless I'm pretty
> >> > sure this code just doesn't work correctly.
> >>
> >> I don't think that's the whole of it.  My own vague understanding of
> >> ReScan is that it's there to support running a node again, possibly with
> >> different parameters.  For example if you have a join of an indexscan
> >> on the outer side and an incremental sort on the inner side, and the
> >> values from the index are used as parameters to the incremental sort,
> >> then the incremental sort is going to receive ReScan calls for each of
> >> the values that the index returns.  Sometimes the index could give you
> >> the same values as before (because there's a dupe in the index), so you
> >> can just return the same values from the incremental sort; but other
> >> times it's going to return different values so you need to reset the
> >> incremental sort to "start from scratch" using the new values as
> >> parameters.
> >>
> >> Now, if you have a cursor reading from the incremental sort and fetch
> >> all tuples, then rewind completely and fetch all again, then that's
> >> going to be a rescan as well.
> >>
> >> I agree with you that the code doesn't seem to implement that.
> >
> >I grepped the codebase for rescan, and noted this relevant info in
> >src/backend/executor/README:
> >
> >* Rescan command to reset a node and make it generate its output sequence
> >over again.
> >
> >* Parameters that can alter a node's results. After adjusting a parameter,
> >the rescan command must be applied to that node and all nodes above it.
> >There is a moderately intelligent scheme to avoid rescanning nodes
> >unnecessarily (for example, Sort does not rescan its input if no parameters
> >of the input have changed, since it can just reread its stored sorted data).
> >
> >That jives pretty well with what you're saying.
> >
> >The interesting thing with incremental sort, as the comments in the
> >patch already note, is that even if the params haven't changed, we
> >can't regenerate the same values again *unless* we know that we're
> >still in the same batch, or, have only processed a single full batch
> >(and the tuples are still in the full sort state) or we've
> >transitioned to prefix mode and have only transferred tuples from the
> >full sort state for a single prefix key group.
> >
> >That's a pretty narrow range of applicability of not needing to
> >re-execute the entire node, at least based on my assumptions about
> >when rescanning will typically happen.
> >
> >So, two followup questions:
> >
> >1. Given the narrow applicability, might it make sense to just say
> >"we're only going to do a total reset and rescan and not try to
> >implement a smart 'don't rescan if we don't have to'"?
> >
>
> I think that's a sensible approach.
>
> >2. What would be a typical or good way to test this? Should I
> >basically repeat many of the existing implementation tests but with a
> >cursor and verify that rescanning produces the same results? That's
> >probably the path I'm going to take if there are no objections. Of
> >course we would need even more testing if we wanted to have the "smart
> >rescan" functionality.
> >
>
> I haven't checked, but how are we testing it for the other nodes?

I haven't checked yet, but figured I'd ask in case someone had ideas
off the top of their head.

While working on finding a test case to show rescan isn't implemented
properly yet, I came across a bug. At the top of
ExecInitIncrementalSort, we assert that eflags does not contain
EXEC_FLAG_REWIND. But the following query (with merge and hash joins
disabled) breaks that assertion:

select * from t join (select * from t order by a, b) s on s.a = t.a
where t.a in (1,2);

The comments about this flag in src/include/executor/executor.h say:

* REWIND indicates that the plan node should try to efficiently support
* rescans without parameter changes. (Nodes must support ExecReScan calls
* in any case, but if this flag was not given, they are at liberty to do it
* through complete recalculation. Note that a parameter change forces a
* full recalculation in any case.)

Now we know that except in rare cases (as just discussed recently up
thread) we can't implement rescan efficiently.

So is this a planner bug (i.e., should we try not to generate
incremental sort plans that require efficient rewind)? Or can we just
remove that part of the assertion and know that we'll implement the
rescan, albeit inefficiently? We already explicitly declare that we
don't support 

Re: Run-time pruning for ModifyTable

2020-03-24 Thread Tom Lane
David Rowley  writes:
> I had a closer look at this today and the code I have in
> inheritance_planner() is certainly not right.

Although I didn't get around to it for v13, there's still a plan on the
table for inheritance_planner() to get nuked from orbit [1].

Maybe this improvement should be put on hold till that's done?

regards, tom lane

[1] https://www.postgresql.org/message-id/357.1550612935%40sss.pgh.pa.us




Re: Run-time pruning for ModifyTable

2020-03-24 Thread David Rowley
On Tue, 10 Mar 2020 at 00:13, David Rowley  wrote:
> Over in inheritance_planner(), I noticed that the RT index of the
> SELECT query and the UPDATE/DELETE query can differ. There was some
> code that performed translations. I changed that code slightly so that
> it's a bit more optimal.  It was building two lists, one for the old
> RT index and one for the new. It added elements to this list
> regardless of if the RT indexes were the same or not. I've now changed
> that to only add to the list if they differ, which I feel should never
> be slower and most likely always faster.   I'm also now building a
> translation map between the old and new RT indexes, however, I only
> found one test in the regression tests which require any sort of
> translation of these RT indexes.  This was with an inheritance table,
> so I need to do a bit more work to find a case where this happens with
> a partitioned table to ensure all this works.

I had a closer look at this today and the code I have in
inheritance_planner() is certainly not right.

It's pretty easy to make the SELECT and UPDATE/DELETE's RT indexes
differ with something like:

drop table part_t cascade;
create table part_t (a int, b int, c int) partition by list (a);
create table part_t12 partition of part_t for values in(1,2) partition
by list (a);
create table part_t12_1 partition of part_t12 for values in(1);
create table part_t12_2 partition of part_t12 for values in(2);
create table part_t3 partition of part_t for values in(3);
create view vw_part_t as select * from part_t;

explain analyze update vw_part_t set a = t2.a +0 from part_t t2 where
t2.a = vw_part_t.a and vw_part_t.a = (select 1);

In this case, the sub-partitioned table changes RT index.  I can't
just take the RelOptInfo's from the partition_root's simple_rel_array
and put them in the correct element in the root's simple_rel_array as
the RT indexes stored within also need to be translated.

I'll be having another look at this to see what the best fix is going to be.

David




Re: NOT IN subquery optimization

2020-03-24 Thread Tom Lane
"Li, Zheng"  writes:
> * I find it entirely unacceptable to stick some planner temporary
> fields into struct SubLink.  If you need that storage you'll have
> to find some other place to put it.  But in point of fact I don't
> think you need it; it doesn't look to me to be critical to generate
> the subquery's plan any earlier than make_subplan would have done it.
> Moreover, you should really strive to *not* do that, because it's
> likely to get in the way of other future optimizations.  As the
> existing comment in make_subplan already suggests, we might want to
> delay subplan planning even further than that in future.

>   The reason for calling make_subplan this early is that we want to
> Call subplan_is_hashable(plan), to decide whether to proceed with the proposed
> transformation.

Well, you're going to have to find another way, because this one won't do.

If you really need to get whacked over the head with a counterexample for
this approach, consider what happens if some part of the planner decides
to pass the SubLink through copyObject, expression_tree_mutator, etc
in between where you've done the planning and where make_subplan looks
at it.  Since you haven't taught copyfuncs.c about these fields, they'll
semi-accidentally wind up as NULL afterwards, meaning you lost the
information anyway.  (In fact, I wouldn't be surprised if that's happening
already in some cases; you couldn't really tell, since make_subplan will
just repeat the work.)  On the other hand, you can't have copyfuncs.c
copying such fields either --- we don't have copyfuncs support for
PlannerInfo, and if we did, the case would end up as infinite recursion.
Nor would it be particularly cool to try to fake things out by copying the
pointers as scalars; that will lead to dangling pointers later on.

BTW, so far as I can see, the only reason you're bothering with the whole
thing is to compare the size of the subquery output with work_mem, because
that's all that subplan_is_hashable does.  I wonder whether that
consideration is even still necessary in the wake of 1f39bce02.  If it is,
I wonder whether there isn't a cheaper way to figure it out.  (Note
similar comment in make_subplan.)

Also ...

> We try to stick with the fast hashed subplan when possible to avoid
> potential performance degradation from the transformation which may
> restrict the planner to choose Nested Loop Anti Join in order to handle
> Null properly,

But can't you detect that case directly?  It seems like you'd need to
figure out the NULL situation anyway to know whether the transformation
to antijoin is valid in the first place.

regards, tom lane




Include sequence relation support in logical replication

2020-03-24 Thread Cary Huang
Hi



From the PG logical replication documentation, I see that there is a listed 
limitation that sequence relations are not replicated logically. After some 
examination, I see that retrieving the next value from a sequence using 
nextval() only emits a WAL update every 32 calls to nextval(). In fact, when it 
emits a WAL update, it writes a future value 32 increments ahead and maintains 
an internal cache for delivering sequence numbers. It is done this way to 
minimize WAL writes, at the risk of losing some values during a crash. So if we 
were to replicate the sequence, the subscriber would receive a future value (32 
calls to nextval() ahead of now), which obviously does not reflect the current 
status. Sequence changes caused by other sequence-related SQL functions like 
setval() or ALTER SEQUENCE will always emit a WAL update, so replicating 
changes caused by these should not be a problem.



I have shared a patch that adds support for sequence relations in logical 
replication via a decoding plugin (test_decoding, for example); it does not 
support sequence relations in logical replication between a PG publisher and a 
PG subscriber via the pgoutput plugin, as that will require much more work. For 
the replication to make sense, the patch disables the batching of WAL updates 
every 32 nextval() calls, so that every call to nextval() emits a WAL update 
for proper replication. This is done by setting SEQ_LOG_VALS to 0 in sequence.c.
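
For reference, the constant lives in src/backend/commands/sequence.c; 
conceptually the change amounts to something like the sketch below (purely 
illustrative, not the actual patch hunk):

/* src/backend/commands/sequence.c -- sketch only, not the actual diff.
 * Upstream pre-logs 32 fetches per WAL record; setting this to 0 makes
 * every nextval() call emit its own WAL update. */
#define SEQ_LOG_VALS	0		/* upstream default is 32 */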



I think the question is: should we minimize the WAL update frequency (every 32 
calls) for getting the next value in a sequence, at the cost of losing values 
during a crash, or should we be able to replicate a sequence relation properly, 
at the cost of more WAL updates?





Cary Huang

-

HighGo Software Inc. (Canada)

mailto:cary.hu...@highgo.ca

http://www.highgo.ca

sequence_replication.patch
Description: Binary data


Re: Ltree syntax improvement

2020-03-24 Thread Tom Lane
Nikita Glukhov  writes:
> Attached new version of the patch.

I spent a little bit of time looking through this, and have a few
comments:

* You have a lot of places where tests for particular ASCII characters
are done like this:

if ((charlen == 1) && t_iseq(src, '\\'))

This is a tedious and expensive way to spell

if (*src == '\\')

because charlen is necessarily 1 if you are looking at an ASCII character;
there is no allowed backend encoding in which an ASCII character can be
the first byte of a multibyte character.  Aside from the direct
simplifications of the tests that this makes possible, I see some places
where you'd not have to pass charlen around, either.

* I spent a fair amount of time thinking that a lot of the added code
was wrong because it was only considering escaping and not
double-quoting.  I eventually concluded that the idea is to convert
double-quoting to a pure escape-based representation during input
and store it that way.  However, I don't really see why that is either
necessary or a good idea --- the internal storage already has a length
counter for each label.  So I think what you really ought to be doing
here is simplifying out both quotes and escapes during ltree_in
and just storing the notionally-represented string internally.
(If I've misunderstood what the plan is, well the utter lack of
documentation in the patch isn't helping.)

* The added test cases seem a bit excessive and repetitive.

regards, tom lane




Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-03-24 Thread Tomas Vondra

On Tue, Mar 24, 2020 at 06:26:11PM -0400, James Coleman wrote:

On Mon, Mar 23, 2020 at 11:44 PM Alvaro Herrera
 wrote:


On 2020-Mar-23, James Coleman wrote:

> 4. nodeIncrementalSort.c ExecReScanIncrementalSort: This whole chunk
> is suspect. I've mentioned previously I don't have a great mental
> model of how rescan works and its invariants (IIRC someone said it was
> about moving around a result set in a cursor). Regardless I'm pretty
> sure this code just doesn't work correctly.

I don't think that's the whole of it.  My own vague understanding of
ReScan is that it's there to support running a node again, possibly with
different parameters.  For example if you have a join of an indexscan
on the outer side and an incremental sort on the inner side, and the
values from the index are used as parameters to the incremental sort,
then the incremental sort is going to receive ReScan calls for each of
the values that the index returns.  Sometimes the index could give you
the same values as before (because there's a dupe in the index), so you
can just return the same values from the incremental sort; but other
times it's going to return different values so you need to reset the
incremental sort to "start from scratch" using the new values as
parameters.

Now, if you have a cursor reading from the incremental sort and fetch
all tuples, then rewind completely and fetch all again, then that's
going to be a rescan as well.

I agree with you that the code doesn't seem to implement that.


I grepped the codebase for rescan, and noted this relevant info in
src/backend/executor/README:

* Rescan command to reset a node and make it generate its output sequence
over again.

* Parameters that can alter a node's results. After adjusting a parameter,
the rescan command must be applied to that node and all nodes above it.
There is a moderately intelligent scheme to avoid rescanning nodes
unnecessarily (for example, Sort does not rescan its input if no parameters
of the input have changed, since it can just reread its stored sorted data).

That jives pretty well with what you're saying.

The interesting thing with incremental sort, as the comments in the
patch already note, is that even if the params haven't changed, we
can't regenerate the same values again *unless* we know that we're
still in the same batch, or, have only processed a single full batch
(and the tuples are still in the full sort state) or we've
transitioned to prefix mode and have only transferred tuples from the
full sort state for a single prefix key group.

That's a pretty narrow range of applicability of not needing to
re-execute the entire node, at least based on my assumptions about
when rescanning will typically happen.

So, two followup questions:

1. Given the narrow applicability, might it make sense to just say
"we're only going to do a total reset and rescan and not try to
implement a smart 'don't rescan if we don't have to'"?



I think that's a sensible approach.


2. What would be a typical or good way to test this? Should I
basically repeat many of the existing implementation tests but with a
cursor and verify that rescanning produces the same results? That's
probably the path I'm going to take if there are no objections. Of
course we would need even more testing if we wanted to have the "smart
rescan" functionality.



I haven't checked, but how are we testing it for the other nodes?


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: NOT IN subquery optimization

2020-03-24 Thread Li, Zheng
Hi Tom,

Thanks for the feedback.

* I find it entirely unacceptable to stick some planner temporary
fields into struct SubLink.  If you need that storage you'll have
to find some other place to put it.  But in point of fact I don't
think you need it; it doesn't look to me to be critical to generate
the subquery's plan any earlier than make_subplan would have done it.
Moreover, you should really strive to *not* do that, because it's
likely to get in the way of other future optimizations.  As the
existing comment in make_subplan already suggests, we might want to
delay subplan planning even further than that in future.

  The reason for calling make_subplan this early is that we want to
Call subplan_is_hashable(plan), to decide whether to proceed with the proposed
transformation. We try to stick with the fast hashed subplan when possible to 
avoid
potential performance degradation from the transformation which may restrict the
planner to choose Nested Loop Anti Join in order to handle Null properly,
the following is an example from subselect.out:
explain (costs false) select * from s where n not in (select u from l);
  QUERY PLAN
---
 Nested Loop Anti Join
   InitPlan 1 (returns $0)
 ->  Seq Scan on l l_1
   ->  Seq Scan on s
 Filter: ((n IS NOT NULL) OR (NOT $0))
   ->  Index Only Scan using l_u on l
 Index Cond: (u = s.n)

However, if the subplan is not hashable, the above Nested Loop Anti Join is
actually faster.

* I'm also not too happy with the (undocumented) rearrangement of
reduce_outer_joins.  There's a specific sequence of processing that
that's involved in, as documented at the top of prepjointree.c, and
I doubt that you can just randomly call it from other places and expect
good results.  In particular, since JOIN alias var flattening won't have
happened yet when this code is being run from pull_up_sublinks, it's
unlikely that reduce_outer_joins will reliably get the same answers it
would get normally.  I also wonder whether it's safe to make the
parsetree changes it makes earlier than normal, and whether it will be
problematic to run it twice on the same tree, and whether its rather
indirect connection to distribute_qual_to_rels is going to misbehave.

  The rearrangement of  reduce_outer_joins was to make the null test function
is_node_nonnullable() more accurate. Later we added strict predicates logic in
is_node_nonnullable(), so I think we can get rid of the rearrangement of
reduce_outer_joins now without losing accuracy.

* The proposed test additions seem to about triple the runtime of
subselect.sql.  This seems excessive.  I also wonder why it's necessary
for this test to build its own large tables; couldn't it re-use ones
that already exist in the regression database?

  I added a lot of test cases. But yes, I can reuse the existing large table if
there is one that doesn't fit in 64KB work_mem.

* Not really sure that we need a new planner GUC for this, but if we
do, it needs to be documented.

  The new GUC is just in case if anything goes wrong, the user can
easily turn it off.

Regards,
Zheng




Re: WIP: WAL prefetch (another approach)

2020-03-24 Thread Andres Freund
Hi,

On 2020-03-18 18:18:44 +1300, Thomas Munro wrote:
> From 1b03eb5ada24c3b23ab8ca6db50e0c5d90d38259 Mon Sep 17 00:00:00 2001
> From: Thomas Munro 
> Date: Mon, 9 Dec 2019 17:22:07 +1300
> Subject: [PATCH 3/5] Add WalRcvGetWriteRecPtr() (new definition).
> 
> A later patch will read received WAL to prefetch referenced blocks,
> without waiting for the data to be flushed to disk.  To do that, it
> needs to be able to see the write pointer advancing in shared memory.
> 
> The function formerly bearing name was recently renamed to
> WalRcvGetFlushRecPtr(), which better described what it does.

Hm. I'm a bit wary of reusing the name with a different meaning. If
there's any external references, this'll hide that they need to
adapt. Perhaps, even if it's a bit clunky, name it GetUnflushedRecPtr?


> From c62fde23f70ff06833d743a1c85716e15f3c813c Mon Sep 17 00:00:00 2001
> From: Thomas Munro 
> Date: Tue, 17 Mar 2020 17:26:41 +1300
> Subject: [PATCH 4/5] Allow PrefetchBuffer() to report what happened.
> 
> Report whether a prefetch was actually initiated due to a cache miss, so
> that callers can limit the number of concurrent I/Os they try to issue,
> without counting the prefetch calls that did nothing because the page
> was already in our buffers.
> 
> If the requested block was already cached, return a valid buffer.  This
> might enable future code to avoid a buffer mapping lookup, though it
> will need to recheck the buffer before using it because it's not pinned
> so could be reclaimed at any time.
> 
> Report neither hit nor miss when a relation's backing file is missing,
> to prepare for use during recovery.  This will be used to handle cases
> of relations that are referenced in the WAL but have been unlinked
> already due to actions covered by WAL records that haven't been replayed
> yet, after a crash.

We probably should take this into account in nodeBitmapHeapscan.c


> diff --git a/src/backend/storage/buffer/bufmgr.c 
> b/src/backend/storage/buffer/bufmgr.c
> index d30aed6fd9..4ceb40a856 100644
> --- a/src/backend/storage/buffer/bufmgr.c
> +++ b/src/backend/storage/buffer/bufmgr.c
> @@ -469,11 +469,13 @@ static int  ts_ckpt_progress_comparator(Datum a, 
> Datum b, void *arg);
>  /*
>   * Implementation of PrefetchBuffer() for shared buffers.
>   */
> -void
> +PrefetchBufferResult
>  PrefetchSharedBuffer(struct SMgrRelationData *smgr_reln,
>ForkNumber forkNum,
>BlockNumber blockNum)
>  {
> + PrefetchBufferResult result = { InvalidBuffer, false };
> +
>  #ifdef USE_PREFETCH
>   BufferTag   newTag; /* identity of requested block */
>   uint32  newHash;/* hash value for newTag */
> @@ -497,7 +499,23 @@ PrefetchSharedBuffer(struct SMgrRelationData *smgr_reln,
>  
>   /* If not in buffers, initiate prefetch */
>   if (buf_id < 0)
> - smgrprefetch(smgr_reln, forkNum, blockNum);
> + {
> + /*
> +  * Try to initiate an asynchronous read.  This returns false in
> +  * recovery if the relation file doesn't exist.
> +  */
> + if (smgrprefetch(smgr_reln, forkNum, blockNum))
> + result.initiated_io = true;
> + }
> + else
> + {
> + /*
> +  * Report the buffer it was in at that time.  The caller may be 
> able
> +  * to avoid a buffer table lookup, but it's not pinned and it 
> must be
> +  * rechecked!
> +  */
> + result.buffer = buf_id + 1;

Perhaps it'd be better to name this "last_buffer" or such, to make it
clearer that it may be outdated?


> -void
> +PrefetchBufferResult
>  PrefetchBuffer(Relation reln, ForkNumber forkNum, BlockNumber blockNum)
>  {
>  #ifdef USE_PREFETCH
> @@ -540,13 +564,17 @@ PrefetchBuffer(Relation reln, ForkNumber forkNum, 
> BlockNumber blockNum)
>errmsg("cannot access temporary tables 
> of other sessions")));
>  
>   /* pass it off to localbuf.c */
> - PrefetchLocalBuffer(reln->rd_smgr, forkNum, blockNum);
> + return PrefetchLocalBuffer(reln->rd_smgr, forkNum, blockNum);
>   }
>   else
>   {
>   /* pass it to the shared buffer version */
> - PrefetchSharedBuffer(reln->rd_smgr, forkNum, blockNum);
> + return PrefetchSharedBuffer(reln->rd_smgr, forkNum, blockNum);
>   }
> +#else
> + PrefetchBuffer result = { InvalidBuffer, false };
> +
> + return result;
>  #endif   /* USE_PREFETCH 
> */
>  }

Hm. Now that results are returned indicating whether the buffer is in
s_b - shouldn't the return value be accurate regardless of USE_PREFETCH?



> +/*
> + * Type returned by PrefetchBuffer().
> + */
> +typedef struct PrefetchBufferResult
> +{
> + Buffer  buffer; /* If valid, 

Re: Option to dump foreign data in pg_dump

2020-03-24 Thread Alvaro Herrera
On 2020-Mar-24, Alvaro Herrera wrote:

> On 2020-Mar-23, Alvaro Herrera wrote:
> 
> > COPY public.ft1 (c1, c2, c3) FROM stdin;
> > pg_dump: error: query failed: ERROR:  foreign-data wrapper "dummy" has no 
> > handler
> > pg_dump: error: query was: COPY (SELECT c1, c2, c3 FROM public.ft1 ) TO 
> > stdout;
> > 
> > Maybe what we should do just verify that you do get that error (and no
> > other errors).
> 
> Done that way.  Will be pushing this shortly.

Hmm, but travis is failing on the cfbot, and I can't see why ...

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Improve checking for pg_index.xmin

2020-03-24 Thread Alexander Korotkov
On Tue, Mar 24, 2020 at 3:38 PM Amit Kapila  wrote:
> On Sun, Jan 12, 2020 at 3:33 AM Alexander Korotkov
>  wrote:
> >
> > On Wed, Jan 8, 2020 at 4:37 PM Tom Lane  wrote:
> > > Heikki Linnakangas  writes:
> > > > On 01/11/2019 01:50, Alexander Korotkov wrote:
> > > >> This happens so, because we're checking that there is no broken HOT
> > > >> chains after index creation by comparison pg_index.xmin and
> > > >> TransactionXmin.   So, we check that pg_index.xmin is in the past for
> > > >> current transaction in lossy way by comparison just xmins.  Attached
> > > >> patch changes this check to XidInMVCCSnapshot().
> > > >> With patch the issue is gone.  My doubt about this patch is that it
> > > >> changes check with TransactionXmin to check with GetActiveSnapshot(),
> > > >> which might be more recent.  However, query shouldn't be executer with
> > > >> older snapshot than one it was planned with.
> > >
> > > > Hmm. Maybe you could construct a case like that with a creative mix of
> > > > stable and volatile functions? Using GetOldestSnapshot() would be safer.
> > >
> > > I really wonder if this is safe at all.
> > >
> > > (1) Can we assume that the query will be executed with same-or-newer
> > > snapshot as what was used by the planner?  There's no such constraint
> > > in the plancache, I'm pretty sure.
> > >
> > > (2) Is committed-ness of the index-creating transaction actually
> > > sufficient to ensure that none of the broken HOT chains it saw are
> > > a problem for the onlooker transaction?  This is, at best, really
> > > un-obvious.  Some of those HOT chains could involve xacts that were
> > > still not committed when the index build finished, I believe.
> > >
> > > (3) What if the index was made with CREATE INDEX CONCURRENTLY ---
> > > which xid is actually on the pg_index row, and how does that factor
> > > into (1) and (2)?
> >
> > Thank you for pointing.  I'll investigate these issues in detail.
> >
>
> Are you planning to work on this patch [1] for current CF?  If not,
> then I think it is better to either move this to the next CF or mark
> it as RWF.

I didn't manage to investigate this subject and provide new patch
version.  I'm marking this RWF.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-03-24 Thread James Coleman
On Mon, Mar 23, 2020 at 11:44 PM Alvaro Herrera
 wrote:
>
> On 2020-Mar-23, James Coleman wrote:
>
> > 4. nodeIncrementalSort.c ExecReScanIncrementalSort: This whole chunk
> > is suspect. I've mentioned previously I don't have a great mental
> > model of how rescan works and its invariants (IIRC someone said it was
> > about moving around a result set in a cursor). Regardless I'm pretty
> > sure this code just doesn't work correctly.
>
> I don't think that's the whole of it.  My own vague understanding of
> ReScan is that it's there to support running a node again, possibly with
> different parameters.  For example if you have a join of an indexscan
> on the outer side and an incremental sort on the inner side, and the
> values from the index are used as parameters to the incremental sort,
> then the incremental sort is going to receive ReScan calls for each of
> the values that the index returns.  Sometimes the index could give you
> the same values as before (because there's a dupe in the index), so you
> can just return the same values from the incremental sort; but other
> times it's going to return different values so you need to reset the
> incremental sort to "start from scratch" using the new values as
> parameters.
>
> Now, if you have a cursor reading from the incremental sort and fetch
> all tuples, then rewind completely and fetch all again, then that's
> going to be a rescan as well.
>
> I agree with you that the code doesn't seem to implement that.

I grepped the codebase for rescan, and noted this relevant info in
src/backend/executor/README:

* Rescan command to reset a node and make it generate its output sequence
over again.

* Parameters that can alter a node's results. After adjusting a parameter,
the rescan command must be applied to that node and all nodes above it.
There is a moderately intelligent scheme to avoid rescanning nodes
unnecessarily (for example, Sort does not rescan its input if no parameters
of the input have changed, since it can just reread its stored sorted data).

That jives pretty well with what you're saying.

The interesting thing with incremental sort, as the comments in the
patch already note, is that even if the params haven't changed, we
can't regenerate the same values again *unless* we know that we're
still in the same batch, or, have only processed a single full batch
(and the tuples are still in the full sort state) or we've
transitioned to prefix mode and have only transferred tuples from the
full sort state for a single prefix key group.

That's a pretty narrow range of applicability of not needing to
re-execute the entire node, at least based on my assumptions about
when rescanning will typically happen.

So, two followup questions:

1. Given the narrow applicability, might it make sense to just say
"we're only going to do a total reset and rescan and not try to
implement a smart 'don't rescan if we don't have to'"?

2. What would be a typical or good way to test this? Should I
basically repeat many of the existing implementation tests but with a
cursor and verify that rescanning produces the same results? That's
probably the path I'm going to take if there are no objections. Of
course we would need even more testing if we wanted to have the "smart
rescan" functionality.

Thoughts?

James




Re: PostgreSQL proposal of Google Summer of Code

2020-03-24 Thread Fabrízio de Royes Mello
On Tue, Mar 24, 2020 at 7:07 PM Maryam Farrukh 
wrote:

> Dear Sir/Madam,
>
> I am very much interested in working on a project of PostgreSQL for Google
> summer internship. While I was writing a proposal, I came across some
> guidelines by the company to get in touch about the nature of the project
> and then draft the proposal. I would be very much interested in learning
> more about the project so I can come up with a reasonable proposal.
>
>
Hi Maryam,

You can start having a look on the following links:
https://wiki.postgresql.org/wiki/GSoC
https://wiki.postgresql.org/wiki/GSoC_2020

As an old PostgreSQL GSoC student I can tell you it's an amazing experience.

Regards,

-- 
   Fabrízio de Royes Mello        Timbira - http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento


PostgreSQL proposal of Google Summer of Code

2020-03-24 Thread Maryam Farrukh
Dear Sir/Madam,

I am very much interested in working on a project of PostgreSQL for Google
summer internship. While I was writing a proposal, I came across some
guidelines by the company to get in touch about the nature of the project
and then draft the proposal. I would be very much interested in learning
more about the project so I can come up with a reasonable proposal.

Best regards,
Maryam Farrukh


Re: allow online change primary_conninfo

2020-03-24 Thread Alvaro Herrera
I think the startup sighup handler should be in startup.c, not xlog.c,
which has enough random code already.  We can add an accessor in xlog.c
to allow changing the walrcv status flag, to be called from the signal
handler.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: vacuum verbose detail logs are unclear; log at *start* of each stage

2020-03-24 Thread Tom Lane
Justin Pryzby  writes:
> On Wed, Jan 22, 2020 at 02:34:57PM +0900, Michael Paquier wrote:
>> From patch 0003:
>> /*
>> +* Indent multi-line DETAIL if being sent to client (verbose)
>> +* We don't know if it's sent to the client (client_min_messages);
>> +* Also, that affects output to the logfile, too; assume that it's 
>> more
>> +* important to format messages requested by the client than to make
>> +* verbose logs pretty when also sent to the logfile.
>> +*/
>> +   msgprefix = elevel==INFO ? "!\t" : "";
>> Such stuff gets a -1 from me.  This is not project-like, and you make
>> the translation of those messages much harder than they should be.

> I don't see why its harder to translate ?

The really fundamental problem with this is that you are trying to make
the server do what is properly the client's job, namely format messages
nicely.  Please read the message style guidelines [1], particularly
the bit about "Formatting", which basically says "don't":

Formatting

Don't put any specific assumptions about formatting into the message
texts. Expect clients and the server log to wrap lines to fit their
own needs. In long messages, newline characters (\n) can be used to
indicate suggested paragraph breaks. Don't end a message with a
newline. Don't use tabs or other formatting characters. (In error
context displays, newlines are automatically added to separate levels
of context such as function calls.)

Rationale: Messages are not necessarily displayed on terminal-type
displays. In GUI displays or browsers these formatting instructions
are at best ignored.


regards, tom lane

[1] https://www.postgresql.org/docs/devel/error-style-guide.html




Re: Additional improvements to extended statistics

2020-03-24 Thread Thomas Munro
On Sun, Mar 15, 2020 at 3:23 PM Tomas Vondra
 wrote:
> On Sun, Mar 15, 2020 at 02:48:02PM +1300, Thomas Munro wrote:
> >Stimulated by some bad plans involving JSON, I found my way to your
> >WIP stats-on-expressions patch in this thread.  Do I understand
> >correctly that it will eventually also support single expressions,
> >like CREATE STATISTICS t_distinct_abc (ndistinct) ON
> >(my_jsonb_column->>'abc') FROM t?  It looks like that would solve
> >problems that otherwise require a generated column or an expression
> >index just to get ndistinct.
>
> Yes, I think that's generally the plan. I was also thinking about
> inventing some sort of special JSON statistics (e.g. extracting paths
> from the JSONB and computing frequencies, or something like that). But
> stats on expressions are one of the things I'd like to do in PG14.

Interesting idea.  If you had simple single-expression statistics, I
suppose a cave-person version of this would be to write a
script/stored procedure that extracts the distinct set of JSON paths
and does CREATE STATISTICS for expressions to access each path.  That
said, I suspect that in many cases there's a small set of a paths and
a human designer would know what to do.  I didn't manage to try your
WIP stats-on-expressions patch due to bitrot and unfinished parts, but
I am hoping it just needs to remove the "if (numcols < 2)
ereport(ERROR ...)" check to get a very very useful thing.




Re: Missing errcode() in ereport

2020-03-24 Thread Tom Lane
Andres Freund  writes:
> On 2020-03-23 17:24:49 -0400, Tom Lane wrote:
>> Hearing no objections, I started to review Andres' patchset with
>> that plan in mind.

> Thanks for pushing the first part!

I pushed all of it, actually.

regards, tom lane




Re: Option to dump foreign data in pg_dump

2020-03-24 Thread Alvaro Herrera
On 2020-Mar-23, Alvaro Herrera wrote:

> COPY public.ft1 (c1, c2, c3) FROM stdin;
> pg_dump: error: query failed: ERROR:  foreign-data wrapper "dummy" has no 
> handler
> pg_dump: error: query was: COPY (SELECT c1, c2, c3 FROM public.ft1 ) TO 
> stdout;
> 
> Maybe what we should do just verify that you do get that error (and no
> other errors).

Done that way.  Will be pushing this shortly.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From 4c9456992640431c45ed47aec488ac20cae9a4b0 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Fri, 20 Mar 2020 18:42:03 -0300
Subject: [PATCH v9 1/2] pg_dump: Allow dumping data of specific foreign
 servers

Author: Luis Carril
Discussion: https://postgr.es/m/lejpr01mb0185483c0079d2f651b16231e7...@lejpr01mb0185.deuprd01.prod.outlook.de
---
 doc/src/sgml/ref/pg_dump.sgml |  30 ++
 src/bin/pg_dump/pg_dump.c | 110 --
 src/bin/pg_dump/pg_dump.h |   1 +
 3 files changed, 136 insertions(+), 5 deletions(-)

diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index 13bd320b31..a9bc397165 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -767,6 +767,36 @@ PostgreSQL documentation
   
  
 
+ 
+  --include-foreign-data=foreignserver
+  
+   
+Dump the data for any foreign table with a foreign server
+matching foreignserver
+pattern. Multiple foreign servers can be selected by writing multiple
+--include-foreign-data switches.
+Also, the foreignserver parameter is
+interpreted as a pattern according to the same rules used by
+psql's \d commands (see ),
+so multiple foreign servers can also be selected by writing wildcard characters
+in the pattern.  When using wildcards, be careful to quote the pattern
+if needed to prevent the shell from expanding the wildcards; see
+.
+The only exception is that an empty pattern is disallowed.
+   
+
+   
+
+ When --include-foreign-data is specified,
+ pg_dump does not check that the foreign
+ table is writeable.  Therefore, there is no guarantee that the
+ results of a foreign table dump can be successfully restored.
+
+   
+  
+ 
+
  
   --inserts
   
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 959b36a95c..1849dfe3d7 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -119,6 +119,8 @@ static SimpleStringList table_exclude_patterns = {NULL, NULL};
 static SimpleOidList table_exclude_oids = {NULL, NULL};
 static SimpleStringList tabledata_exclude_patterns = {NULL, NULL};
 static SimpleOidList tabledata_exclude_oids = {NULL, NULL};
+static SimpleStringList foreign_servers_include_patterns = {NULL, NULL};
+static SimpleOidList foreign_servers_include_oids = {NULL, NULL};
 
 
 /* placeholders for the delimiters for comments */
@@ -153,6 +155,9 @@ static void expand_schema_name_patterns(Archive *fout,
 		SimpleStringList *patterns,
 		SimpleOidList *oids,
 		bool strict_names);
+static void expand_foreign_server_name_patterns(Archive *fout,
+SimpleStringList *patterns,
+SimpleOidList *oids);
 static void expand_table_name_patterns(Archive *fout,
 	   SimpleStringList *patterns,
 	   SimpleOidList *oids,
@@ -385,6 +390,7 @@ main(int argc, char **argv)
 		{"no-sync", no_argument, NULL, 7},
 		{"on-conflict-do-nothing", no_argument, _nothing, 1},
 		{"rows-per-insert", required_argument, NULL, 10},
+		{"include-foreign-data", required_argument, NULL, 11},
 
 		{NULL, 0, NULL, 0}
 	};
@@ -600,6 +606,11 @@ main(int argc, char **argv)
 dopt.dump_inserts = (int) rowsPerInsert;
 break;
 
+			case 11:			/* include foreign data */
+simple_string_list_append(&foreign_servers_include_patterns,
+		  optarg);
+break;
+
 			default:
 fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
 exit_nicely(1);
@@ -641,6 +652,12 @@ main(int argc, char **argv)
 		exit_nicely(1);
 	}
 
+	if (dopt.schemaOnly && foreign_servers_include_patterns.head != NULL)
+		fatal("options -s/--schema-only and --include-foreign-data cannot be used together");
+
+	if (numWorkers > 1 && foreign_servers_include_patterns.head != NULL)
+		fatal("option --include-foreign-data is not supported with parallel backup");
+
 	if (dopt.dataOnly && dopt.outputClean)
 	{
 		pg_log_error("options -c/--clean and -a/--data-only cannot be used together");
@@ -808,6 +825,9 @@ main(int argc, char **argv)
			   &tabledata_exclude_oids,
 			   false);
 
+	expand_foreign_server_name_patterns(fout, &foreign_servers_include_patterns,
+		&foreign_servers_include_oids);
+
 	/* non-matching exclusion patterns aren't an error */
 
 	/*
@@ -1011,6 +1031,9 @@ help(const char *progname)
 	

Re: Add A Glossary

2020-03-24 Thread Robert Haas
On Tue, Mar 24, 2020 at 3:40 PM Jürgen Purtz  wrote:
> On 24.03.20 19:46, Robert Haas wrote:
> Do we use shared_buffers for WAL ?
>
> No.
>
> What's about the explanation in 
> https://www.postgresql.org/docs/12/runtime-config-wal.html : "wal_buffers 
> (integer)The amount of shared memory used for WAL data that has not yet 
> been written to disk. The default setting of -1 selects a size equal to 
> 1/32nd (about 3%) of shared_buffers, ... " ? My understanding was, that the 
> parameter wal_buffers grabs some of the existing shared_buffers for its own 
> purpose. Is this a misinterpretation? Are shared_buffers and wal_buffers two 
> different shared memory areas?

Yes. The code adds up the shared memory requests from all of the
different subsystems and then allocates one giant chunk of shared
memory which is divided up between them. The overwhelming majority of
that memory goes into shared_buffers, but not all of it. You can use
the new pg_get_shmem_allocations() function to see how it's used. For
example, with shared_buffers=4GB:

rhaas=# select name, pg_size_pretty(size) from
pg_get_shmem_allocations() order by size desc limit 10;
 name | pg_size_pretty
--+
 Buffer Blocks| 4096 MB
 Buffer Descriptors   | 32 MB
   | 32 MB
 XLOG Ctl | 16 MB
 Buffer IO Locks  | 16 MB
 Checkpointer Data| 12 MB
 Checkpoint BufferIds | 10 MB
 clog | 2067 kB
  | 1876 kB
 subtrans | 261 kB
(10 rows)

rhaas=# select count(*), pg_size_pretty(sum(size)) from
pg_get_shmem_allocations();
 count | pg_size_pretty
---+
54 | 4219 MB
(1 row)

So, in this configuration, the whole shared memory segment is
4219MB, of which 4096MB is allocated to shared_buffers and the rest to
dozens of smaller allocations, with 1876 kB left over that might get
snapped up later by an extension that wants some shared memory.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Add A Glossary

2020-03-24 Thread Jürgen Purtz


On 24.03.20 19:46, Robert Haas wrote:

Do we use shared_buffers for WAL ?

No.


What's about the explanation in 
https://www.postgresql.org/docs/12/runtime-config-wal.html : 
"wal_buffers (integer)    The amount of shared memory used for WAL data 
that has not yet been written to disk. The default setting of -1 selects 
a size equal to 1/32nd (about 3%) of shared_buffers, ... " ? My 
understanding was, that the parameter wal_buffers grabs some of the 
existing shared_buffers for its own purpose. Is this a 
misinterpretation? Are shared_buffers and wal_buffers two different 
shared memory areas?


Kind regards, Jürgen




Re: Missing errcode() in ereport

2020-03-24 Thread Andres Freund
On 2020-03-23 17:24:49 -0400, Tom Lane wrote:
> Hearing no objections, I started to review Andres' patchset with
> that plan in mind.

Thanks for pushing the first part!




Re: Add A Glossary

2020-03-24 Thread Corey Huinker
>
>
> > > +  Records to the file system and creates a special
> >
> > Does the chckpointer actually write WAL ?
>
> Yes.
>
> > An FK doesn't require the values in its table to be unique, right ?
>
> I believe it does require that the values are unique.
>
> > I think there's some confusion.  Constraints are not objects, right ?
>
> I think constraints are definitely objects. They have names and you
> can, for example, COMMENT on them.
>
> > Do we use shared_buffers for WAL ?
>
> No.
>
> (I have not reviewed the patch; these are just a few comments on your
> comments.)
>
>
I'm going to be coalescing the feedback into an updated patch very soon
(tonight/tomorrow), so please keep the feedback on the text/wording coming
until then.
If anyone has a first attempt at all the ACID definitions, I'd love to see
those as well.


Re: Fastpath while arranging the changes in LSN order in logical decoding

2020-03-24 Thread Andres Freund
On 2020-03-24 18:36:03 +0530, Dilip Kumar wrote:
> IMHO, I have tried the best case but did not see any performance gain
> so I am not planning to test this further.  I have attached the patch
> for removing the TODO.

Pushed. Thanks!




Re: improve transparency of bitmap-only heap scans

2020-03-24 Thread Tom Lane
I took a quick look through this patch.  While I see nothing to complain
about implementation-wise, I'm a bit befuddled as to why we need this
reporting when there is no comparable data provided for regular index-only
scans.  Over there, you just get "Heap Fetches: n", and the existing
counts for bitmap scans seem to cover the same territory.

I agree with the original comment that it's pretty strange that
EXPLAIN doesn't identify an index-only BMS at all; but fixing that
is a different patch.

regards, tom lane




Re: Adding a test for speculative insert abort case

2020-03-24 Thread Andres Freund
On 2020-03-24 18:03:57 +0530, Amit Kapila wrote:
> Can we change the status of CF entry for this patch [1] to committed
> or is there any work pending?

Done!




Re: How to only auto-restart BGW only on crash or _PG_init

2020-03-24 Thread Robert Haas
On Tue, Mar 24, 2020 at 2:33 PM Jeremy Finzel  wrote:
> I would be grateful for some direction on how to use Background workers to 
> have a worker automatically restart *only* in certain cases, i.e. on 
> postmaster start (_PG_init) or a soft crash.  I run into all sorts of trouble 
> if I set bgw_restart_time to actually restart on sigterm, because in most 
> cases I don't want it to restart (i.e. it was launched with invalid config, 
> the SQL becomes invalid...).  But I *do* want it to auto-restart in any kind 
> of crash.  If I set bgw_restart_time to never restart, then it doesn't 
> restart after a soft crash, which I want.
>
> This is for my extension pglogical_ticker, and specifically within this main 
> function where a sigterm might happen:
> https://github.com/enova/pglogical_ticker/blob/ef9b68fd6b5b99787034520009577f8cfec0049c/pglogical_ticker.c#L85-L201
>
> I have tried several things unsuccessfully (checking result of SPI_execute or 
> SPI_connect) , usually resulting in a constantly restarting and failing 
> worker.  So, is there a straightforward way to only have the worker 
> auto-restart in a very narrow range of cases?

I think what you can do is configure the worker to always restart, but
then have it exit(0) in the cases where you don't want it to restart,
and exit(1) in the cases where you do want it to restart.  See:

https://git.postgresql.org/pg/commitdiff/be7558162acc5578d0b2cf0c8d4c76b6076ce352
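
For illustration, the shape of that could be something like the sketch below -- 
untested, with config_is_valid() and tick_once() as made-up stand-ins for 
pglogical_ticker's own checks, and assuming the worker was registered with a 
finite bgw_restart_time:

#include "postgres.h"

#include "miscadmin.h"
#include "pgstat.h"
#include "postmaster/bgworker.h"
#include "storage/ipc.h"
#include "storage/latch.h"

static bool config_is_valid(void) { return true; }	/* stand-in */
static bool tick_once(void) { return true; }		/* stand-in */

void
ticker_main(Datum main_arg)
{
	BackgroundWorkerUnblockSignals();

	if (!config_is_valid())
		proc_exit(0);			/* exit(0): worker is unregistered, never restarted */

	for (;;)
	{
		if (!tick_once())
			proc_exit(1);		/* exit(1): postmaster restarts it after
								 * bgw_restart_time seconds */

		/* sleep between ticks, waking up early if the latch is set */
		(void) WaitLatch(MyLatch,
						 WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
						 1000L, PG_WAIT_EXTENSION);
		ResetLatch(MyLatch);
	}
}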

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Add A Glossary

2020-03-24 Thread Robert Haas
On Fri, Mar 20, 2020 at 3:58 PM Justin Pryzby  wrote:
> > +  A process that writes dirty pages and WAL
> > +  Records to the file system and creates a special
>
> Does the chckpointer actually write WAL ?

Yes.

> An FK doesn't require the values in its table to be unique, right ?

I believe it does require that the values are unique.

> I think there's some confusion.  Constraints are not objects, right ?

I think constraints are definitely objects. They have names and you
can, for example, COMMENT on them.
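
For instance (with made-up names):

ALTER TABLE orders
    ADD CONSTRAINT orders_qty_positive CHECK (qty > 0);
COMMENT ON CONSTRAINT orders_qty_positive ON orders
    IS 'order quantities must be positive';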

> Do we use shared_buffers for WAL ?

No.

(I have not reviewed the patch; these are just a few comments on your comments.)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [PATCH] Implement INSERT SET syntax

2020-03-24 Thread Tom Lane
I wrote:
> No doubt that's all fixable, but the realization that some cases of
> this syntax are *not* just syntactic sugar for standards-compliant
> syntax is giving me pause.  Do we really want to get out front of
> the SQL committee on extending INSERT in an incompatible way?

One compromise that might be worth thinking about is to disallow
multiassignments in this syntax, so as to (1) avoid the possibility
of generating something that can't be represented by standard INSERT
and (2) get something done in time for v13.  The end of March is not
that far off.  Perhaps somebody would come back and extend it later,
or perhaps not.

A slightly more ambitious compromise would be to allow multiassignment
only when the source can be pulled apart into independent subexpressions,
comparable to the restriction we used to have in UPDATE itself (before
8f889b108 or thereabouts).

In either case the transformation could be done right in gram.y and
a helpful error thrown for unsupported cases.
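
To illustrate with made-up tables (and the proposed SET syntax), the
distinction would be roughly:

-- still accepted: plain per-column assignments, directly equivalent to
-- INSERT INTO bar (f1, f2, f3) VALUES (1, 2, 3)
INSERT INTO bar SET f1 = 1, f2 = 2, f3 = 3;

-- rejected under the first compromise: multiassignment from a sub-select,
-- which has no direct standard-INSERT equivalent
INSERT INTO bar SET (f1, f2, f3) = (SELECT f1, f2, f3 FROM foo);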

regards, tom lane




How to only auto-restart BGW only on crash or _PG_init

2020-03-24 Thread Jeremy Finzel
Good afternoon!

I would be grateful for some direction on how to use Background workers to
have a worker automatically restart *only* in certain cases, i.e. on
postmaster start (_PG_init) or a soft crash.  I run into all sorts of
trouble if I set bgw_restart_time to actually restart on sigterm, because
in most cases I don't want it to restart (i.e. it was launched with invalid
config, the SQL becomes invalid...).  But I *do* want it to auto-restart in
any kind of crash.  If I set bgw_restart_time to never restart, then it
doesn't restart after a soft crash, which I want.

This is for my extension pglogical_ticker, and specifically within this
main function where a sigterm might happen:
https://github.com/enova/pglogical_ticker/blob/ef9b68fd6b5b99787034520009577f8cfec0049c/pglogical_ticker.c#L85-L201

I have tried several things unsuccessfully (checking result of SPI_execute
or SPI_connect) , usually resulting in a constantly restarting and failing
worker.  So, is there a straightforward way to only have the worker
auto-restart in a very narrow range of cases?

Many thanks!
Jeremy


Re: Add A Glossary

2020-03-24 Thread Peter Eisentraut

On 2020-03-20 01:11, Alvaro Herrera wrote:

I gave this a look.  I first reformatted it so I could read it; that's
0001.  Second I changed all the long  items into s, which
are shorter and don't have to repeat the title of the referred-to page.
(Of course, this changes the link to be in the same style as every other
link in our documentation; some people don't like it. But it's our
style.)


AFAICT, all the  elements in this patch should be changed to .

If there is something undesirable about the output style, we can change 
that, but it's not this patch's job to make up its own rules.


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PATCH] Implement INSERT SET syntax

2020-03-24 Thread Tom Lane
David Steele  writes:
> On 12/3/19 4:44 AM, Gareth Palmer wrote:
>> Attached is a fixed version.

> Does this version of the patch address your concerns?

No.  I still find the reliance on a FROM clause being present
to be pretty arbitrary.  Also, I don't believe that ruleutils.c
requires no changes, because it's not going to be possible to
transform every usage of this syntax to old-style.  I tried to
prove the point with this trivial example:

regression=# create table foo (f1 int ,f2 int, f3 int);
CREATE TABLE
regression=# create table bar (f1 int ,f2 int, f3 int);
CREATE TABLE
regression=# create rule r1 as on insert to foo do instead
regression-# insert into bar set (f1,f2,f3) = (select f1,f2,f3 from foo);

intending to show that the rule decompilation was bogus, but
I didn't get that far because the parser crashed:

TRAP: FailedAssertion("pstate->p_multiassign_exprs == NIL", File: 
"parse_target.c", Line: 287)
postgres: postgres regression [local] CREATE 
RULE(ExceptionalCondition+0x55)[0x8fb6e5]
postgres: postgres regression [local] CREATE RULE[0x5bd0c3]
postgres: postgres regression [local] CREATE RULE[0x583def]
postgres: postgres regression [local] CREATE RULE(transformStmt+0x2d5)[0x582665]
postgres: postgres regression [local] CREATE 
RULE(transformRuleStmt+0x2ad)[0x5bf2ad]
postgres: postgres regression [local] CREATE RULE(DefineRule+0x17)[0x793847]

If I do it like this, I get a different assertion:

regression=# insert into bar set (f1,f2,f3) = (select f1,f2,f3) from foo;
server closed the connection unexpectedly

TRAP: FailedAssertion("exprKind == EXPR_KIND_UPDATE_SOURCE", File: 
"parse_target.c", Line: 209)
postgres: postgres regression [local] 
INSERT(ExceptionalCondition+0x55)[0x8fb6e5]
postgres: postgres regression [local] 
INSERT(transformTargetList+0x1a7)[0x5bd277]
postgres: postgres regression [local] INSERT(transformStmt+0xbe0)[0x582f70]
postgres: postgres regression [local] INSERT[0x5839f3]
postgres: postgres regression [local] INSERT(transformStmt+0x2d5)[0x582665]
postgres: postgres regression [local] 
INSERT(transformTopLevelStmt+0xd)[0x58411d]
postgres: postgres regression [local] INSERT(parse_analyze+0x69)[0x584269]


No doubt that's all fixable, but the realization that some cases of
this syntax are *not* just syntactic sugar for standards-compliant
syntax is giving me pause.  Do we really want to get out front of
the SQL committee on extending INSERT in an incompatible way?

regards, tom lane




Re: Columns correlation and adaptive query optimization

2020-03-24 Thread David Steele

On 12/24/19 3:15 AM, Konstantin Knizhnik wrote:
New version of patch implicitly adding multicolumn statistic in 
auto_explain extension and using it in optimizer for more precise 
estimation of join selectivity.
This patch fixes race condition while adding statistics and restricts 
generated statistic name to fit in 64 bytes (NameData).


This patch no longer applies: https://commitfest.postgresql.org/27/2386/

The CF entry has been updated to Waiting on Author.

Regards,
--
-David
da...@pgmasters.net




Re: pg_upgrade fails with non-standard ACL

2020-03-24 Thread David Steele

On 12/17/19 3:10 AM, Arthur Zakirov wrote:


I attached new version of the patch. It still uses pg_identify_object(), 
I'm not sure about other ways to build identities yet.


This patch applies and builds but fails regression tests on Linux and 
Windows:

https://travis-ci.org/postgresql-cfbot/postgresql/builds/666134656
https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.85292

The CF entry has been updated to Waiting on Author.

Regards,
--
-David
da...@pgmasters.net




Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits

2020-03-24 Thread Ibrar Ahmed
On Fri, Mar 13, 2020 at 6:58 AM Justin Pryzby  wrote:

>
> Thanks for picking up this patch.  There's a minor typo:
>
> +* readable outside of this sessoin. Therefore
> doing IO here isn't
>
> => session
>
> --
> Justin
>

Thanks, please see the updated and rebased patch. (master
17a28b03645e27d73bf69a95d7569b61e58f06eb)

-- 
Ibrar Ahmed
diff --git a/contrib/pg_visibility/expected/pg_visibility.out b/contrib/pg_visibility/expected/pg_visibility.out
index f0dcb897c4..6ac3e525eb 100644
--- a/contrib/pg_visibility/expected/pg_visibility.out
+++ b/contrib/pg_visibility/expected/pg_visibility.out
@@ -131,6 +131,69 @@ select pg_truncate_visibility_map('test_partition');
  
 (1 row)
 
+-- test copy freeze
+create table copyfreeze (a int, b char(1500));
+-- load all rows via COPY FREEZE and ensure that all pages are set all-visible
+-- and all-frozen.
+begin;
+truncate copyfreeze;
+copy copyfreeze from stdin freeze;
+commit;
+select * from pg_visibility_map('copyfreeze');
+ blkno | all_visible | all_frozen 
+---+-+
+ 0 | t   | t
+ 1 | t   | t
+ 2 | t   | t
+(3 rows)
+
+select * from pg_check_frozen('copyfreeze');
+ t_ctid 
+
+(0 rows)
+
+-- load half the rows via regular COPY and rest via COPY FREEZE. The pages
+-- which are touched by regular COPY must not be set all-visible/all-frozen. On
+-- the other hand, pages allocated by COPY FREEZE should be marked
+-- all-frozen/all-visible.
+begin;
+truncate copyfreeze;
+copy copyfreeze from stdin;
+copy copyfreeze from stdin freeze;
+commit;
+select * from pg_visibility_map('copyfreeze');
+ blkno | all_visible | all_frozen 
+---+-+
+ 0 | f   | f
+ 1 | f   | f
+ 2 | t   | t
+(3 rows)
+
+select * from pg_check_frozen('copyfreeze');
+ t_ctid 
+
+(0 rows)
+
+-- Try a mix of regular COPY and COPY FREEZE.
+begin;
+truncate copyfreeze;
+copy copyfreeze from stdin freeze;
+copy copyfreeze from stdin;
+copy copyfreeze from stdin freeze;
+commit;
+select * from pg_visibility_map('copyfreeze');
+ blkno | all_visible | all_frozen 
+---+-+
+ 0 | t   | t
+ 1 | f   | f
+ 2 | t   | t
+(3 rows)
+
+select * from pg_check_frozen('copyfreeze');
+ t_ctid 
+
+(0 rows)
+
 -- cleanup
 drop table test_partitioned;
 drop view test_view;
@@ -140,3 +203,4 @@ drop server dummy_server;
 drop foreign data wrapper dummy;
 drop materialized view matview_visibility_test;
 drop table regular_table;
+drop table copyfreeze;
diff --git a/contrib/pg_visibility/sql/pg_visibility.sql b/contrib/pg_visibility/sql/pg_visibility.sql
index c2a7f1d9e4..01a65fdab4 100644
--- a/contrib/pg_visibility/sql/pg_visibility.sql
+++ b/contrib/pg_visibility/sql/pg_visibility.sql
@@ -72,6 +72,82 @@ select count(*) > 0 from pg_visibility_map_summary('test_partition');
 select * from pg_check_frozen('test_partition'); -- hopefully none
 select pg_truncate_visibility_map('test_partition');
 
+-- test copy freeze
+create table copyfreeze (a int, b char(1500));
+
+-- load all rows via COPY FREEZE and ensure that all pages are set all-visible
+-- and all-frozen.
+begin;
+truncate copyfreeze;
+copy copyfreeze from stdin freeze;
+1	'1'
+2	'2'
+3	'3'
+4	'4'
+5	'5'
+6	'6'
+7	'7'
+8	'8'
+9	'9'
+10	'10'
+11	'11'
+12	'12'
+\.
+commit;
+select * from pg_visibility_map('copyfreeze');
+select * from pg_check_frozen('copyfreeze');
+
+-- load half the rows via regular COPY and rest via COPY FREEZE. The pages
+-- which are touched by regular COPY must not be set all-visible/all-frozen. On
+-- the other hand, pages allocated by COPY FREEZE should be marked
+-- all-frozen/all-visible.
+begin;
+truncate copyfreeze;
+copy copyfreeze from stdin;
+1	'1'
+2	'2'
+3	'3'
+4	'4'
+5	'5'
+6	'6'
+\.
+copy copyfreeze from stdin freeze;
+7	'7'
+8	'8'
+9	'9'
+10	'10'
+11	'11'
+12	'12'
+\.
+commit;
+select * from pg_visibility_map('copyfreeze');
+select * from pg_check_frozen('copyfreeze');
+
+-- Try a mix of regular COPY and COPY FREEZE.
+begin;
+truncate copyfreeze;
+copy copyfreeze from stdin freeze;
+1	'1'
+2	'2'
+3	'3'
+4	'4'
+5	'5'
+\.
+copy copyfreeze from stdin;
+6	'6'
+\.
+copy copyfreeze from stdin freeze;
+7	'7'
+8	'8'
+9	'9'
+10	'10'
+11	'11'
+12	'12'
+\.
+commit;
+select * from pg_visibility_map('copyfreeze');
+select * from pg_check_frozen('copyfreeze');
+
 -- cleanup
 drop table test_partitioned;
 drop view test_view;
@@ -81,3 +157,4 @@ drop server dummy_server;
 drop foreign data wrapper dummy;
 drop materialized view matview_visibility_test;
 drop table regular_table;
+drop table copyfreeze;
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 29694b8aa4..614958e5ee 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2114,6 +2114,7 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 	int			ndone;
 	PGAlignedBlock 

Re: pg_upgrade fails with non-standard ACL

2020-03-24 Thread Anastasia Lubennikova

On 17.12.2019 11:10, Arthur Zakirov wrote:

On 2019/12/05 11:31, Michael Paquier wrote:

On Wed, Dec 04, 2019 at 06:15:52PM +0900, Arthur Zakirov wrote:
Ah, I thought that pg_identify_object() gives properly quoted 
identity, and

it could be used to make SQL script.


It depends on the object type.  For columns I can see in your patch
that you are using a dedicated code path, but I don't think that we
should assume that there is an exact one-one mapping between the
object type and the grammar of GRANT/REVOKE for the object type
supported because both are completely independent things and
facilities.  Take for example foreign data wrappers.  Of course for
this patch this depends if we have system object of the type which
would be impacted.  That's not the case of foreign data wrappers and
likely it will never be, but there is no way to be sure that this
won't get broken silently in the future.


I attached new version of the patch. It still uses 
pg_identify_object(), I'm not sure about other ways to build 
identities yet.


Michael, do I understand it correctly that your concerns about 
pg_identify_object() relate only to the revoke sql script?


Now, I tend to agree that we should split this patch into two separate 
parts, to make it simpler.
The first patch would find the affected objects and print warnings, and 
the second would generate the script.




I added "test_rename_catalog_objects.patch" and 
"test_add_acl_to_catalog_objects.sql". So testing steps are following:
- initialize new source instance (e.g. pg v12) and run 
"test_add_acl_to_catalog_objects.sql" on it
- apply "pg_upgrade_ACL_check_v6.patch" and 
"test_add_acl_to_catalog_objects.sql" for HEAD

- initialize new target instance for HEAD
- run pg_upgrade, it should create revoke_objects.sql file

"test_rename_catalog_objects.patch" should be applied after 
"pg_upgrade_ACL_check_v6.patch".


Renamed objects are the following:
- table pg_subscription -> pg_sub
- columns pg_subscription.subenabled -> subactive, 
pg_subscription.subconninfo -> subconn

- view pg_stat_subscription -> pg_stat_sub
- columns pg_stat_subscription.received_lsn -> received_location, 
pg_stat_subscription.latest_end_lsn -> latest_end_location

- function pg_stat_get_subscription -> pg_stat_get_sub
- language sql -> pgsql


I've tried to test it; the script is attached.
The described test case works. If a new cluster contains renamed objects, 
/pg_upgrade --check/ successfully finds them and generates a script to 
revoke the non-default ACLs. However, without 
test_rename_catalog_objects.patch, /pg_upgrade --check/ still generates 
the same message, which is a false positive.
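
For reference, the kind of non-default ACL on catalog objects that this check
is meant to flag looks roughly like the sketch below (the role name is just an
example, and the column names are the pre-rename ones from the test above):

  -- run on the old cluster before pg_upgrade --check
  CREATE ROLE acl_test_role;
  GRANT SELECT (subenabled, subconninfo) ON pg_subscription TO acl_test_role;
  GRANT SELECT ON pg_stat_subscription TO acl_test_role;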


I am going to fix it and send the updated patch later this week.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



pg_upgrade_ACL_test.sh
Description: application/shellscript


Re: backup manifests

2020-03-24 Thread Robert Haas
On Mon, Mar 23, 2020 at 11:43 PM Amit Kapila  wrote:
> All others except one are passing now.  See the summary of the failed
> test below and attached are failed run logs.
>
> Test Summary Report
> ---
> t/003_corruption.pl  (Wstat: 65280 Tests: 14 Failed: 0)
>   Non-zero exit status: 255
>   Parse errors: Bad plan.  You planned 44 tests but ran 14.
> Files=6, Tests=123, 164 wallclock secs ( 0.06 usr +  0.02 sys =  0.08 CPU)
> Result: FAIL

Hmm. It looks like it's trying to remove the symlink that points to
the tablespace directory, and failing with no error message. I could
set that permutation to be skipped on Windows, or maybe there's an
alternate method you can suggest that would work?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [PATCH] Implement INSERT SET syntax

2020-03-24 Thread David Steele

Hi Tom,

On 12/3/19 4:44 AM, Gareth Palmer wrote:

On Sun, Dec 1, 2019 at 4:32 PM Michael Paquier  wrote:


On Fri, Nov 22, 2019 at 12:24:15PM +1300, Gareth Palmer wrote:

Attached is an updated patch with for_locking_clause added, test-cases
re-use existing tables and the comments and documentation have been
expanded.


Per the automatic patch tester, documentation included in the patch
does not build.  Could you please fix that?  I have moved the patch to
next CF, waiting on author.


Attached is a fixed version.


Does this version of the patch address your concerns?

Regards,
--
-David
da...@pgmasters.net




Re: Index Skip Scan

2020-03-24 Thread Dmitry Dolgov
> On Wed, Mar 11, 2020 at 06:56:09PM +0800, Andy Fan wrote:
>
> There was a dedicated thread  [1]  where David explain his idea very
> detailed, and you can also check that messages around that message for
> the context. hope it helps.

Thanks for pointing out this thread! Somehow I had missed it, and now it
looks like we need to make some effort to align the patches for index skip
scan and distinctClause elimination.




Re: Index Skip Scan

2020-03-24 Thread Dmitry Dolgov
> On Wed, Mar 11, 2020 at 11:17:51AM +1300, David Rowley wrote:
>
> Yes, I was complaining that a ProjectionPath breaks the optimisation
> and I don't believe there's any reason that it should.
>
> I believe the way to make that work correctly requires paying
> attention to the Path's uniquekeys rather than what type of path it
> is.

Thanks for the suggestion. As a result of the discussion I've modified
the patch, does it look similar to what you had in mind?

In this version, if all conditions are met and there are corresponding
unique keys, a new index skip scan path will be added to
unique_pathlists. If the requested distinct clauses match the
unique keys, create_distinct_paths can choose this path without needing
to know what kind of path it is. Also, unique_keys are passed through
ProjectionPath, so the optimization for the example mentioned earlier in
this thread should now work (I've added one test for that).
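
To illustrate the kind of query this targets, here is a sketch (the table and
queries are only an example, not the exact case from the thread); with the
patch applied, DISTINCT on the leading index column can use the new path, and
the unique keys now survive a projection as well:

  CREATE TABLE skip_demo (a int, b int);
  CREATE INDEX ON skip_demo (a, b);
  INSERT INTO skip_demo SELECT i % 10, i FROM generate_series(1, 100000) i;
  ANALYZE skip_demo;
  EXPLAIN (COSTS OFF) SELECT DISTINCT a FROM skip_demo;
  EXPLAIN (COSTS OFF) SELECT DISTINCT a, a + 1 AS a1 FROM skip_demo;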

I haven't changed anything about the UniqueKey structure itself (one of the
suggestions was to use Expr instead of EquivalenceClass), but I believe
we still need to figure out how the two existing implementations of this
idea (in this patch and in [1]) can be connected.

[1]: 
https://www.postgresql.org/message-id/flat/CAKU4AWrwZMAL%3DuaFUDMf4WGOVkEL3ONbatqju9nSXTUucpp_pw%40mail.gmail.com
From c7af8157da82db1cedf02e6ec0de355b56275680 Mon Sep 17 00:00:00 2001
From: Dmitrii Dolgov <9erthali...@gmail.com>
Date: Tue, 24 Mar 2020 17:04:32 +0100
Subject: [PATCH v33 1/2] Unique key

Design by David Rowley.

Author: Jesper Pedersen
---
 src/backend/nodes/outfuncs.c  | 14 ++
 src/backend/nodes/print.c | 39 +++
 src/backend/optimizer/path/Makefile   |  3 +-
 src/backend/optimizer/path/allpaths.c |  8 +++
 src/backend/optimizer/path/indxpath.c | 41 
 src/backend/optimizer/path/pathkeys.c | 71 ++-
 src/backend/optimizer/plan/planagg.c  |  1 +
 src/backend/optimizer/plan/planmain.c |  1 +
 src/backend/optimizer/plan/planner.c  | 37 +-
 src/backend/optimizer/util/pathnode.c | 46 +
 src/include/nodes/nodes.h |  1 +
 src/include/nodes/pathnodes.h | 19 +++
 src/include/nodes/print.h |  1 +
 src/include/optimizer/pathnode.h  |  2 +
 src/include/optimizer/paths.h | 11 +
 15 files changed, 272 insertions(+), 23 deletions(-)

diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index d76fae44b8..16083e7a7e 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -1723,6 +1723,7 @@ _outPathInfo(StringInfo str, const Path *node)
 	WRITE_FLOAT_FIELD(startup_cost, "%.2f");
 	WRITE_FLOAT_FIELD(total_cost, "%.2f");
 	WRITE_NODE_FIELD(pathkeys);
+	WRITE_NODE_FIELD(uniquekeys);
 }
 
 /*
@@ -2205,6 +2206,7 @@ _outPlannerInfo(StringInfo str, const PlannerInfo *node)
 	WRITE_NODE_FIELD(eq_classes);
 	WRITE_BOOL_FIELD(ec_merging_done);
 	WRITE_NODE_FIELD(canon_pathkeys);
+	WRITE_NODE_FIELD(canon_uniquekeys);
 	WRITE_NODE_FIELD(left_join_clauses);
 	WRITE_NODE_FIELD(right_join_clauses);
 	WRITE_NODE_FIELD(full_join_clauses);
@@ -2214,6 +2216,7 @@ _outPlannerInfo(StringInfo str, const PlannerInfo *node)
 	WRITE_NODE_FIELD(placeholder_list);
 	WRITE_NODE_FIELD(fkey_list);
 	WRITE_NODE_FIELD(query_pathkeys);
+	WRITE_NODE_FIELD(query_uniquekeys);
 	WRITE_NODE_FIELD(group_pathkeys);
 	WRITE_NODE_FIELD(window_pathkeys);
 	WRITE_NODE_FIELD(distinct_pathkeys);
@@ -2401,6 +2404,14 @@ _outPathKey(StringInfo str, const PathKey *node)
 	WRITE_BOOL_FIELD(pk_nulls_first);
 }
 
+static void
+_outUniqueKey(StringInfo str, const UniqueKey *node)
+{
+	WRITE_NODE_TYPE("UNIQUEKEY");
+
+	WRITE_NODE_FIELD(eq_clause);
+}
+
 static void
 _outPathTarget(StringInfo str, const PathTarget *node)
 {
@@ -4092,6 +4103,9 @@ outNode(StringInfo str, const void *obj)
 			case T_PathKey:
 _outPathKey(str, obj);
 break;
+			case T_UniqueKey:
+_outUniqueKey(str, obj);
+break;
 			case T_PathTarget:
 _outPathTarget(str, obj);
 break;
diff --git a/src/backend/nodes/print.c b/src/backend/nodes/print.c
index 42476724d8..d286b34544 100644
--- a/src/backend/nodes/print.c
+++ b/src/backend/nodes/print.c
@@ -459,6 +459,45 @@ print_pathkeys(const List *pathkeys, const List *rtable)
 	printf(")\n");
 }
 
+/*
+ * print_uniquekeys -
+ *	  uniquekeys list of UniqueKeys
+ */
+void
+print_uniquekeys(const List *uniquekeys, const List *rtable)
+{
+	ListCell   *l;
+
+	printf("(");
+	foreach(l, uniquekeys)
+	{
+		UniqueKey *unique_key = (UniqueKey *) lfirst(l);
+		EquivalenceClass *eclass = (EquivalenceClass *) unique_key->eq_clause;
+		ListCell   *k;
+		bool		first = true;
+
+		/* chase up */
+		while (eclass->ec_merged)
+			eclass = eclass->ec_merged;
+
+		printf("(");
+		foreach(k, eclass->ec_members)
+		{
+			EquivalenceMember *mem = (EquivalenceMember *) lfirst(k);
+
+			if (first)
+first = false;
+			else
+printf(", ");
+			print_expr((Node *) mem->em_expr, 

Re: Built-in connection pooler

2020-03-24 Thread Konstantin Knizhnik

Hi David,

On 24.03.2020 16:26, David Steele wrote:

Hi Konstantin,

On 11/14/19 2:06 AM, Konstantin Knizhnik wrote:

Attached please find rebased patch with this bug fixed.


This patch no longer applies: http://cfbot.cputube.org/patch_27_2067.log

CF entry has been updated to Waiting on Author.



Rebased version of the patch is attached.
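
For anyone who wants to try it, here is a minimal sketch of enabling the
pooler with the GUCs documented in the attached patch (the parameter names
come from the patch, the values are only examples, and the settings require
a server restart to take effect):

  ALTER SYSTEM SET connection_proxies = 2;   -- 0 (the default) disables session pooling
  ALTER SYSTEM SET session_pool_size = 10;   -- backends per database/user combination
  ALTER SYSTEM SET proxy_port = 6543;        -- pooled clients connect to this port
  ALTER SYSTEM SET max_sessions = 1000;      -- client sessions per proxy
  -- after restarting the server, point pooled clients at proxy_port
  -- instead of the regular port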

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/contrib/spi/refint.c b/contrib/spi/refint.c
index 6fbfef2..27aa6cb 100644
--- a/contrib/spi/refint.c
+++ b/contrib/spi/refint.c
@@ -11,6 +11,7 @@
 
 #include "commands/trigger.h"
 #include "executor/spi.h"
+#include "storage/proc.h"
 #include "utils/builtins.h"
 #include "utils/memutils.h"
 #include "utils/rel.h"
@@ -94,6 +95,8 @@ check_primary_key(PG_FUNCTION_ARGS)
 	else
 		tuple = trigdata->tg_newtuple;
 
+	MyProc->is_tainted = true;
+
 	trigger = trigdata->tg_trigger;
 	nargs = trigger->tgnargs;
 	args = trigger->tgargs;
@@ -286,6 +289,8 @@ check_foreign_key(PG_FUNCTION_ARGS)
 		/* internal error */
 		elog(ERROR, "check_foreign_key: cannot process INSERT events");
 
+	MyProc->is_tainted = true;
+
 	/* Have to check tg_trigtuple - tuple being deleted */
 	trigtuple = trigdata->tg_trigtuple;
 
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 355b408..23210ba 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -732,6 +732,169 @@ include_dir 'conf.d'
   
  
 
+ 
+  max_sessions (integer)
+  
+   max_sessions configuration parameter
+  
+  
+  
+
+  The maximum number of client sessions that can be handled by
+  one connection proxy when session pooling is enabled.
+  This parameter does not add any memory or CPU overhead, so
+  specifying a large max_sessions value
+  does not affect performance.
+  If the max_sessions limit is reached new connections are not accepted.
+
+
+  The default value is 1000. This parameter can only be set at server start.
+   
+  
+ 
+
+ 
+  session_pool_size (integer)
+  
+   session_pool_size configuration parameter
+  
+  
+  
+
+  Enables session pooling and defines the maximum number of
+  backends that can be used by client sessions for each database/user combination.
+  Launched non-tainted backends are never terminated even if there are no active sessions.
+  A backend is considered tainted if the client updates GUCs, creates a temporary table or creates prepared statements.
+  A tainted backend can serve only one client.
+
+
+  The default value is 10, so up to 10 backends will serve each database,
+
+  
+ 
+
+ 
+  proxy_port (integer)
+  
+   proxy_port configuration parameter
+  
+  
+  
+
+  Sets the TCP port for the connection pooler.
+  Clients connected to the main "port" will be assigned dedicated backends,
+  while clients connected to the proxy port will be connected to backends through a proxy which
+  performs transaction-level scheduling. 
+   
+
+  The default value is 6543.
+
+  
+ 
+
+ 
+  connection_proxies (integer)
+  
+   connection_proxies configuration parameter
+  
+  
+  
+
+  Sets the number of connection proxies.
+  The postmaster spawns a separate worker process for each proxy and scatters connections between proxies using one of the scheduling policies (round-robin, random, load-balancing).
+  Each proxy launches its own subset of backends.
+  So the maximal number of non-tainted backends is session_pool_size * connection_proxies * databases * roles.
+   
+
+  The default value is 0, so session pooling is disabled.
+
+  
+ 
+
+ 
+  session_schedule (enum)
+  
+   session_schedule configuration parameter
+  
+  
+  
+
+  Specifies scheduling policy for assigning session to proxies in case of
+  connection pooling. Default policy is round-robin.
+
+
+  With the round-robin policy the postmaster cyclically scatters sessions between proxies.
+
+
+  With the random policy the postmaster randomly chooses a proxy for each new session.
+
+
+  With the load-balancing policy the postmaster chooses the proxy with the lowest load average.
+  The load average of a proxy is estimated by the number of client connections assigned to it, with extra weight for SSL connections.
+   
+  
+ 
+
+ 
+  idle_pool_worker_timeout (integer)
+  
+   idle_pool_worker_timeout configuration parameter
+  
+  
+  
+
+ Terminate an idle connection pool worker after the specified number of milliseconds.
+ The default value is 0, so 

Re: Control your disk usage in PG: Introduction to Disk Quota Extension

2020-03-24 Thread David Steele

On 12/2/19 1:39 AM, Haozhou Wang wrote:


Thank you very much for your email. I rebased the code with the newest 
master and attached it in the attachment.


This patch applies but no longer builds on Linux or Windows:
https://travis-ci.org/github/postgresql-cfbot/postgresql/builds/666113036
https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.85284

The CF entry has been updated to Waiting on Author.

Regards,
--
-David
da...@pgmasters.net




Re: Creating foreign key on partitioned table is too slow

2020-03-24 Thread Alvaro Herrera
On 2020-Mar-24, David Steele wrote:

> This patch still applies but there seems to be some disagreement on
> how to proceed.

Actually, I don't think there's any disagreement regarding the patch I
last posted.  (There was disagreement on the previous patches, which
were very different).  Tom suggested to look at the heuristics used for
RECOVER_RELATION_BUILD_MEMORY, and the patch does exactly that.  It
would be great if Kato Sho can try the original test case with my latest
patch (the one in https://postgr.es/m/20191113214544.GA16060@alvherre.pgsql )
and let us know if it improves things.

The patch as posted generates these warnings in my current GCC that it
didn't when I checked last, but they're harmless -- if/when I push,
it'll be without the parens.

/pgsql/source/master/src/backend/utils/cache/relcache.c:1064:21: warning: 
equality comparison with extraneous parentheses [-Wparentheses-equality]
if ((relp->relkind == RELKIND_PARTITIONED_TABLE)
 ~~^~~~
/pgsql/source/master/src/backend/utils/cache/relcache.c:1064:21: note: remove 
extraneous parentheses around the comparison to silence this warning
if ((relp->relkind == RELKIND_PARTITIONED_TABLE)
~  ^   ~
/pgsql/source/master/src/backend/utils/cache/relcache.c:1064:21: note: use '=' 
to turn this equality comparison into an assignment
if ((relp->relkind == RELKIND_PARTITIONED_TABLE)
   ^~
   =
/pgsql/source/master/src/backend/utils/cache/relcache.c:1242:33: warning: 
equality comparison with extraneous parentheses [-Wparentheses-equality]
if ((relation->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
 ~~^~~~
/pgsql/source/master/src/backend/utils/cache/relcache.c:1242:33: note: remove 
extraneous parentheses around the comparison to silence this warning
if ((relation->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
~  ^   ~
/pgsql/source/master/src/backend/utils/cache/relcache.c:1242:33: note: use '=' 
to turn this equality comparison into an assignment
if ((relation->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
   ^~
   =
2 warnings generated.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: replay pause vs. standby promotion

2020-03-24 Thread Sergei Kornilov
Hello

> I pushed the latest version of the patch. If you have further opinion
> about immediate promotion, let's keep discussing that!

Thank you!

Honestly, I forgot that the promotion is documented in high-availability.sgml 
as:

> Before failover, any WAL immediately available in the archive or in pg_wal 
> will be
> restored, but no attempt is made to connect to the master.

I mistakenly thought that promotion should be "immediate"...

> If a promotion is triggered while recovery is paused, the paused state ends 
> and a promotion continues.

Could we add a few words in func.sgml to clarify the behavior? Especially for 
users from my example above. Something like:

> If a promotion is triggered while recovery is paused, the paused state ends, 
> replay of any WAL immediately available in the archive or in pg_wal will be 
> continued and then a promotion will be completed.
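
To make the sequence concrete, a minimal sketch run on the standby (these are
existing core functions; the point is only to show where the paused state ends):

  SELECT pg_wal_replay_pause();
  SELECT pg_is_wal_replay_paused();   -- true
  -- triggering a promotion now ends the paused state and the promotion continues
  SELECT pg_promote();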

regards, Sergei




Re: documentation pdf build fail (HEAD)

2020-03-24 Thread Tom Lane
Peter Eisentraut  writes:
> On 2020-03-24 15:31, Tom Lane wrote:
>> The problem seems to be that cedffbdb8
>> introduced some broken table markup.  I wonder why xmllint
>> failed to catch it?

> It's not a validity issue in the DocBook markup.  The error comes from 
> FOP, which complains because it requires the column count, but other 
> processors don't necessarily require it.

Maybe not, but if the count is there, shouldn't it be checked?

In this particular case, the table was obviously broken if you looked
at the rendered HTML, but I'd kind of expect the toolchain to provide
basic sanity checks without having to do that.

regards, tom lane




Re: extension patch of CREATE OR REPLACE TRIGGER

2020-03-24 Thread David Steele

On 12/1/19 8:56 PM, osumi.takami...@fujitsu.com wrote:



The latest patch includes calls to heap_open(), causing its compilation to fail.
Could you please send a rebased version of the patch?  I have moved the entry to
next CF, waiting on author.

Thanks. I've fixed where you pointed out.


This patch no longer applies: http://cfbot.cputube.org/patch_27_2307.log

CF entry has been updated to Waiting on Author.


Also, I'm waiting for other kind of feedbacks from anyone.


Hopefully a re-based patch will help with that.

Regards,
--
-David
da...@pgmasters.net




Re: pgbench - add \aset to store results of a combined query

2020-03-24 Thread David Steele

On 11/29/19 4:34 AM, Fabien COELHO wrote:


It seems to me that there is no point to have the variable aset in 
Command because this structure includes already MetaCommand, so the 
information is duplicated. [...] Perhaps I am missing something?


Yep. ISTM that you are missing that aset is not an independent meta 
command like most others but really changes the state of the previous 
SQL command, so that it needs to be stored into that with some 
additional fields. This is the same with "gset" which is tagged by a 
non-null "varprefix".


So I cannot remove the "aset" field.

And I would suggest to change readCommandResponse() to use a 
MetaCommand in argument.


MetaCommand is not enough: we need varprefix, and then distinguishing 
between aset and gset. Although this last point can be done with a 
MetaCommand, ISTM that a bool is_aset is clear and good enough. It is 
possible to switch if you insist on it, but I do not think it is desirable.


Michael, do you agree with Fabien's comments?

Attached v4 removes an unwanted rebased comment duplication and does 
minor changes while re-reading the code.


This patch no longer applies: http://cfbot.cputube.org/patch_27_2091.log

CF entry has been updated to Waiting on Author.

Regards,
--
-David
da...@pgmasters.net




Re: [patch]socket_timeout in interfaces/libpq

2020-03-24 Thread David Steele

On 11/29/19 12:22 AM, nagaura.ryo...@fujitsu.com wrote:



From: Michael Paquier 
On Wed, Jun 26, 2019 at 11:56:28AM +, nagaura.ryo...@fujitsu.com wrote:

It seems that you did not think so at that time.
# Please refer to [1]

I don't think all the reviewers are completely negative.


I recall having a negative impression on the patch when first looking at it, 
and still
have the same impression when looking at the last version.  Just with a quick
look, assuming that you can bypass all cleanup operations normally taken by
pqDropConnection() through a hijacking of pqWait() is not fine as this routine
explicitely assumes to *never* have a timeout for its wait.

>

I couldn't understand what you meant.
Do you say that we shouldn't change pqWait() behavior?
Or should I modify my patch to use pqDropConnection()?


This patch no longer applies: http://cfbot.cputube.org/patch_27_2175.log

CF entry has been updated to Waiting on Author.

More importantly it looks like there is still no consensus on this  
patch, which is an uncommitted part of a previous patch [1].


Unless somebody chimes in I'll mark this Returned with Feedback at the  
end of the CF.


Regards,
--
-David
da...@pgmasters.net

[1]  
https://www.postgresql.org/message-id/raw/20190406065428.GA2145%40paquier.xyz





Re: psql - improve test coverage from 41% to 88%

2020-03-24 Thread David Steele

Hi Fabien,

On 11/27/19 11:01 PM, Michael Paquier wrote:

On Wed, Nov 27, 2019 at 10:14:16AM +0100, Fabien COELHO wrote:

Indeed, I did not notice.



This patch no longer applies: http://cfbot.cputube.org/patch_27_2262.log

CF entry has been updated to Waiting on Author.

Regards,
--
-David
da...@pgmasters.net




Re: documentation pdf build fail (HEAD)

2020-03-24 Thread Peter Eisentraut

On 2020-03-24 15:31, Tom Lane wrote:

The problem seems to be that cedffbdb8
introduced some broken table markup.  I wonder why xmllint
failed to catch it?


It's not a validity issue in the DocBook markup.  The error comes from 
FOP, which complains because it requires the column count, but other 
processors don't necessarily require it.


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Creating foreign key on partitioned table is too slow

2020-03-24 Thread David Steele

On 11/13/19 4:45 PM, Alvaro Herrera wrote:
>

But it is not totally
unreasonable to have lots of partitions, and as we improve the system,
more and more people will want to.


+1

This patch still applies but there seems to be some disagreement on how 
to proceed.


Any thoughts?

Regards,
--
-David
da...@pgmasters.net




Re: color by default

2020-03-24 Thread Juan José Santamaría Flecha
On Mon, Mar 23, 2020 at 9:32 AM Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:

>
> I'm giving up on making color the default, since there is clearly no
> consensus.
>
> Attached is the documentation patch reworked.
>

I think there is also some value in adding the functionality proposed in
terminal_supports_color().

Regards,

Juan José Santamaría Flecha


Re: Additional improvements to extended statistics

2020-03-24 Thread Tomas Vondra

On Tue, Mar 24, 2020 at 01:20:07PM +, Dean Rasheed wrote:

On Tue, 24 Mar 2020 at 01:08, Tomas Vondra  wrote:


Hmmm. So let's consider a simple OR clause with two arguments, both
covered by single statistics object. Something like this:

   CREATE TABLE t (a int, b int);
   INSERT INTO t SELECT mod(i, 10), mod(i, 10)
 FROM generate_series(1,10);
   CREATE STATISTICS s (mcv) ON a,b FROM t;

   SELECT * FROM t WHERE a = 0 OR b = 0;

Which is estimated correctly...



Hmm, the reason that is estimated correctly is that the MCV values
cover all values in the table, so mcv_totalsel is 1 (or pretty close
to 1), and other_sel is capped to nearly 0, and so the result is
basically just mcv_sel -- i.e., in this case the MCV estimates are
returned more-or-less as-is, rather than being applied as a
correction, and so the result is independent of how many times
extended stats are applied.

The more interesting (and probably more realistic) case is where the
MCV values do not cover the all values in the table, as in the new
mcv_lists_partial examples in the regression tests, for example this
test case, which produces a significant overestimate:

 SELECT * FROM mcv_lists_partial WHERE a = 0 OR b = 0 OR c = 0

although actually, I think there's another reason for that (in
addition to the extended stats correction being applied twice) -- the
way the MCV code makes use of base selectivity doesn't seem really
appropriate for OR clauses, because the way base_frequency is computed
is really based on the assumption that every column would be matched.
I'm not sure that there's any easy answer to that though. I feels like
what's needed when computing the match bitmaps in mcv.c, is to produce
a bitmap (would it fit in 1 byte?) per value, to indicate which
columns match, and base_frequency values per column. That would be
significantly more work though, so almost certainly isn't feasible for
PG13.



Good point. I haven't thought about the base frequencies. I think 1 byte
should be enough, as we limit the number of columns to 8.


IIUC the problem you have in mind is that we end up calling
statext_mcv_clauselist_selectivity twice, once for the top-level AND
clause with a single argument, and then recursively for the expanded OR
clause. Indeed, that seems to be applying the correction twice.


>I'm not sure what's the best way to resolve that. Perhaps
>statext_mcv_clauselist_selectivity() / statext_is_compatible_clause()
>should ignore OR clauses from an AND-list, on the basis that they will
>get processed recursively later. Or perhaps estimatedclauses can
>somehow be used to prevent this, though I'm not sure exactly how that
>would work.

I don't know. I feel uneasy about just ignoring some of the clauses,
because what happens for complex clauses, where the OR is not directly
in the AND clause, but is negated or something like that?

Isn't it the case that clauselist_selectivity_simple (and the OR
variant) should ignore extended stats entirely? That is, we'd need to
add a flag (or _simple variant) to clause_selectivity, so that it calls
clauselist_selectivity_simple_or. So the calls would look like this:

   clauselist_selectivity
 statext_clauselist_selectivity
   statext_mcv_clauselist_selectivity
 clauselist_selectivity_simple  <--- disable extended stats
   clause_selectivity
 clauselist_selectivity_simple_or
   clause_selectivity
   clause_selectivity
 mcv_clauselist_selectivity
 clauselist_selectivity_simple
   already estimated

I've only quickly hacked clause_selectivity, but it does not seem very
invasive (of course, it means disruption to clause_selectivity callers,
but I suppose most call clauselist_selectivity).



Sounds like a reasonable approach, but I think it would be better to
preserve the current public API by having clauselist_selectivity()
become a thin wrapper around  a new function that optionally applies
extended stats.



OK, makes sense. I'll take a stab at it.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: documentation pdf build fail (HEAD)

2020-03-24 Thread Tom Lane
Erikjan Rijkers  writes:
> I build the pdf (for HEAD) almost daily without problems, but at the 
> moment I get the error below.
> I am not sure whether to blame my particular setup (debian stretch), or 
> whether there might be an error in the .sgml.  The html files still 
> build OK.

Yeah, I see it too.  The problem seems to be that cedffbdb8
introduced some broken table markup.  I wonder why xmllint
failed to catch it?  While catching morerows mistakes might be
hard in general, it shouldn't have been difficult to notice that
this table row contained more columns than the table spec allowed.

> If anyone has a suggestion on how to tackle this I'd be grateful.

The "position" noted in the error report seems to be a line number
and column number in the .fo file.  Once you go there and look around
at surrounding text, you can locate the matching .sgml input and then
try to eyeball what's wrong with it.

Fix pushed.

regards, tom lane




Re: error context for vacuum to include block number

2020-03-24 Thread Masahiko Sawada
On Tue, 24 Mar 2020 at 22:37, Amit Kapila  wrote:
>
> On Tue, Mar 24, 2020 at 6:18 PM Masahiko Sawada
>  wrote:
> >
> >
> > I've read through the latest version patch (v31), here are my comments:
> >
> > 1.
> > +/* Update error traceback information */
> > +olderrcbarg = *vacrelstats;
> > +update_vacuum_error_cbarg(vacrelstats,
> > +  VACUUM_ERRCB_PHASE_TRUNCATE,
> > new_rel_pages, NULL,
> > +  false);
> > +
> >  /*
> >   * Scan backwards from the end to verify that the end pages 
> > actually
> >   * contain no tuples.  This is *necessary*, not optional, because
> >   * other backends could have added tuples to these pages whilst we
> >   * were vacuuming.
> >   */
> >  new_rel_pages = count_nondeletable_pages(onerel, vacrelstats);
> >
> > We need to set the error context after setting new_rel_pages.
> >
>
> We want to cover the errors raised in count_nondeletable_pages().  In
> an earlier version of the patch, we had TRUNCATE_PREFETCH phase which
> use to cover those errors, but that was not good as we were
> setting/resetting it multiple times and it was not clear such a
> separate phase would add any value.

I got the point. But if we set the error context before that, I think
we need to change the error context message. The error context message
of the heap truncation phase is "while truncating relation \"%s.%s\" to %u
blocks", but cbarg->blkno will be the current number of blocks of the
relation.

   case VACUUM_ERRCB_PHASE_TRUNCATE:
   if (BlockNumberIsValid(cbarg->blkno))
   errcontext("while truncating relation \"%s.%s\" to %u blocks",
  cbarg->relnamespace, cbarg->relname, cbarg->blkno);
   break;

>
> > 2.
> > +   vacrelstats->relnamespace =
> > get_namespace_name(RelationGetNamespace(onerel));
> > +   vacrelstats->relname = pstrdup(RelationGetRelationName(onerel));
> >
> > I think we can pfree these two variables to avoid a memory leak during
> > vacuum on multiple relations.
> >
>
> Yeah, I had also thought about it but I noticed that we are not
> freeing for vacrelstats.  Also, I think the memory is allocated in
> TopTransactionContext which should be freed via
> CommitTransactionCommand before vacuuming of the next relation, so not
> sure if there is much value in freeing those variables.

Right, thank you for the explanation. I was concerned about a memory leak
because the relation name and schema name could be large compared to
vacrelstats, but I agree with freeing them at commit time.

>
> > 3.
> > +/* Phases of vacuum during which we report error context. */
> > +typedef enum
> > +{
> > +   VACUUM_ERRCB_PHASE_UNKNOWN,
> > +   VACUUM_ERRCB_PHASE_SCAN_HEAP,
> > +   VACUUM_ERRCB_PHASE_VACUUM_INDEX,
> > +   VACUUM_ERRCB_PHASE_VACUUM_HEAP,
> > +   VACUUM_ERRCB_PHASE_INDEX_CLEANUP,
> > +   VACUUM_ERRCB_PHASE_TRUNCATE
> > +} ErrCbPhase;
> >
> > Comparing to the vacuum progress phases, there is not a phase for
> > final cleanup which includes updating the relation stats. Is there any
> > reason why we don't have that phase for ErrCbPhase?
> >
>
> I think we can add it if we want, but it was not clear to me if that is 
> helpful.

Yeah, me too. The most likely place where an error happens is
vac_update_relstats but the error message seems to be enough.

Regards,

--
Masahiko Sawadahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Internal key management system

2020-03-24 Thread Bruce Momjian
On Tue, Mar 24, 2020 at 02:29:57PM +0900, Masahiko Sawada wrote:
> That seems to work fine.
> 
> So we will have pg_cryptokeys within PGDATA and each key is stored
> into separate file named the key id such as "sql", "tde-wal" and
> "tde-block". I'll update the patch and post.

Yes, that makes sense to me.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: documentation pdf build fail (HEAD)

2020-03-24 Thread Jürgen Purtz

Ubuntu 18.04: no crash, but possibly a side effect:

[INFO] FOUserAgent - Rendered page #2685.
[INFO] FOUserAgent - Rendered page #2686.
[INFO] FOUserAgent - Rendered page #2687.
[WARN] FOUserAgent - Destination: Unresolved ID reference 
"function-encode" found.
[WARN] FOUserAgent - Destination: Unresolved ID reference 
"function-decode" found.
[WARN] FOUserAgent - Destination: Unresolved ID reference 
"sql-altercollation-notes-title" found.
[WARN] FOUserAgent - Destination: Unresolved ID reference 
"sql-altertable-notes-title" found.
[WARN] FOUserAgent - Destination: Unresolved ID reference 
"sql-createaggregate-notes-title" found.
[WARN] FOUserAgent - Destination: Unresolved ID reference 
"sql-createindex-storage-parameters-title" found.
[WARN] FOUserAgent - Destination: Unresolved ID reference 
"sql-createindex-concurrently-title" found.
[WARN] FOUserAgent - Destination: Unresolved ID reference 
"sql-createtable-storage-parameters-title" found.
[WARN] FOUserAgent - Destination: Unresolved ID reference 
"sql-createtable-compatibility-title" found.
[WARN] FOUserAgent - Destination: Unresolved ID reference 
"sql-declare-notes-title" found.
[WARN] FOUserAgent - Destination: Unresolved ID reference 
"sql-inserting-params-title" found.
[WARN] FOUserAgent - Destination: Unresolved ID reference 
"sql-on-conflict-title" found.
[WARN] FOUserAgent - Destination: Unresolved ID reference 
"sql-prepare-examples-title" found.
[WARN] FOUserAgent - Destination: Unresolved ID reference 
"sql-reindex-concurrently-title" found.
[WARN] FOUserAgent - Destination: Unresolved ID reference 
"sql-with-title" found.
[WARN] FOUserAgent - Destination: Unresolved ID reference 
"sql-from-title" found.
[WARN] FOUserAgent - Destination: Unresolved ID reference 
"sql-where-title" found.
[WARN] FOUserAgent - Destination: Unresolved ID reference 
"sql-groupby-title" found.
[WARN] FOUserAgent - Destination: Unresolved ID reference 
"sql-having-title" found.
[WARN] FOUserAgent - Destination: Unresolved ID reference 
"sql-window-title" found.
[WARN] FOUserAgent - Destination: Unresolved ID reference 
"sql-select-list-title" found.
[WARN] FOUserAgent - Destination: Unresolved ID reference 
"sql-distinct-title" found.
[WARN] FOUserAgent - Destination: Unresolved ID reference 
"sql-union-title" found.
[WARN] FOUserAgent - Destination: Unresolved ID reference 
"sql-intersect-title" found.
[WARN] FOUserAgent - Destination: Unresolved ID reference 
"sql-except-title" found.
[WARN] FOUserAgent - Destination: Unresolved ID reference 
"sql-orderby-title" found.
[WARN] FOUserAgent - Destination: Unresolved ID reference 
"sql-limit-title" found.
[WARN] FOUserAgent - Destination: Unresolved ID reference 
"sql-for-update-share-title" found.
[WARN] FOUserAgent - Destination: Unresolved ID reference 
"pg-dump-examples-title" found.
[WARN] FOUserAgent - Destination: Unresolved ID reference 
"app-psql-patterns-title" found.
[WARN] FOUserAgent - Destination: Unresolved ID reference 
"app-psql-variables-title" found.
[WARN] FOUserAgent - Destination: Unresolved ID reference 
"app-psql-interpolation-title" found.
[WARN] FOUserAgent - Destination: Unresolved ID reference 
"app-psql-prompting-title" found.
[WARN] FOUserAgent - Destination: Unresolved ID reference 
"app-psql-environment-title" found.
[WARN] FOUserAgent - Destination: Unresolved ID reference 
"app-psql-examples-title" found.
[WARN] FOUserAgent - Destination: Unresolved ID reference 
"app-postgres-single-user-title" found.

[INFO] FOUserAgent - Rendered page #2688.
[WARN] FOUserAgent - Page 226: Unresolved ID reference "function-decode" 
found.
[WARN] FOUserAgent - Page 226: Unresolved ID reference "function-encode" 
found.


Kind regards, J. Purtz





Re: improve transparency of bitmap-only heap scans

2020-03-24 Thread James Coleman
On Tue, Mar 24, 2020 at 1:24 AM Amit Kapila  wrote:
>
> On Fri, Mar 20, 2020 at 7:09 AM James Coleman  wrote:
> >
> > Awesome, thanks for confirming with an actual plan.
> >
> > > I don't think it matters in nontext mode, but at least in text mode, I 
> > > think
> > > maybe the Unfetched blocks should be output after the exact and lossy 
> > > blocks,
> > > in case someone is parsing it, and because bitmap-only is a relatively new
> > > feature.  Its output is probably less common than exact/lossy.
> >
> > I tweaked that (and a comment that didn't reference the change); see 
> > attached.
> >
>
> Few comments:
> 1.
> -
> - if (tbmres->ntuples >= 0)
> + else if (tbmres->ntuples >= 0)
>   node->exact_pages++;
>
> How is this change related to this patch?



> 2.
> + * unfetched_pagestotal number of pages not retrieved due to vm
>   * prefetch_iterator  iterator for prefetching ahead of current page
>   * prefetch_pages# pages prefetch iterator is ahead of current
>   * prefetch_targetcurrent target prefetch distance
> @@ -1591,6 +1592,7 @@ typedef struct BitmapHeapScanState
>   Buffer pvmbuffer;
>   long exact_pages;
>   long lossy_pages;
> + long unfetched_pages;
>
> Can we name it as skipped_pages?

That seems easy enough to do.

> 3. Can we add a test or two for this functionality?

From what I can tell the current lossy page count isn't tested either;
would we expect the explain output from such a test to be stable
across different architectures etc.?
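
For what it's worth, a sketch of the kind of setup such a test could use (the
table and GUC settings are only an illustration; whether the resulting EXPLAIN
counters are stable across platforms is exactly the open question):

  CREATE TABLE bitmap_only_demo AS
    SELECT g AS a, g % 100 AS b FROM generate_series(1, 100000) g;
  CREATE INDEX ON bitmap_only_demo (a);
  VACUUM (ANALYZE) bitmap_only_demo;   -- make the pages all-visible
  SET enable_seqscan = off;
  SET enable_indexscan = off;
  SET enable_indexonlyscan = off;      -- leave the bitmap path
  -- no heap columns are needed, so all-visible pages can be skipped
  EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, SUMMARY OFF)
    SELECT count(*) FROM bitmap_only_demo WHERE a < 1000;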

James




Re: error context for vacuum to include block number

2020-03-24 Thread Amit Kapila
On Tue, Mar 24, 2020 at 7:18 PM Justin Pryzby  wrote:
>
> On Tue, Mar 24, 2020 at 07:07:03PM +0530, Amit Kapila wrote:
> > On Tue, Mar 24, 2020 at 6:18 PM Masahiko Sawada 
> >  wrote:
> > > 1.
> > > +/* Update error traceback information */
> > > +olderrcbarg = *vacrelstats;
> > > +update_vacuum_error_cbarg(vacrelstats,
> > > +  VACUUM_ERRCB_PHASE_TRUNCATE,
> > > new_rel_pages, NULL,
> > > +  false);
> > > +
> > >  /*
> > >   * Scan backwards from the end to verify that the end pages 
> > > actually
> > >   * contain no tuples.  This is *necessary*, not optional, because
> > >   * other backends could have added tuples to these pages whilst 
> > > we
> > >   * were vacuuming.
> > >   */
> > >  new_rel_pages = count_nondeletable_pages(onerel, vacrelstats);
> > >
> > > We need to set the error context after setting new_rel_pages.
> >
> > We want to cover the errors raised in count_nondeletable_pages().  In
> > an earlier version of the patch, we had TRUNCATE_PREFETCH phase which
> > use to cover those errors, but that was not good as we were
> > setting/resetting it multiple times and it was not clear such a
> > separate phase would add any value.
>
> I insisted on covering count_nondeletable_pages since it calls ReadBuffer(),
> but I think we need to at least set vacrelstats->blkno = new_rel_pages, since 
> it
> may be different, right ?
>

yeah, that makes sense.

> > > 2.
> > > +   vacrelstats->relnamespace =
> > > get_namespace_name(RelationGetNamespace(onerel));
> > > +   vacrelstats->relname = pstrdup(RelationGetRelationName(onerel));
> > >
> > > I think we can pfree these two variables to avoid a memory leak during
> > > vacuum on multiple relations.
> >
> > Yeah, I had also thought about it but I noticed that we are not
> > freeing for vacrelstats.  Also, I think the memory is allocated in
> > TopTransactionContext which should be freed via
> > CommitTransactionCommand before vacuuming of the next relation, so not
> > sure if there is much value in freeing those variables.
>
> One small reason to free them is that (as Tom mentioned upthread) it's good to
> ensure that those variables are their own allocation, and not depending on
> being able to access relcache or anything else during an unexpected error.
>

That is a good reason to allocate them separately, but not for doing a
retail free, especially when the caller of the function will free the
context in which they are allocated.  I think Sawada-San's concern was
that it would leak memory across the vacuum of multiple relations, but
that is not the case here.  Won't it look odd if we free memory
for members of vacrelstats but not for vacrelstats itself?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: error context for vacuum to include block number

2020-03-24 Thread Justin Pryzby
On Tue, Mar 24, 2020 at 07:07:03PM +0530, Amit Kapila wrote:
> On Tue, Mar 24, 2020 at 6:18 PM Masahiko Sawada 
>  wrote:
> > 1.
> > +/* Update error traceback information */
> > +olderrcbarg = *vacrelstats;
> > +update_vacuum_error_cbarg(vacrelstats,
> > +  VACUUM_ERRCB_PHASE_TRUNCATE,
> > new_rel_pages, NULL,
> > +  false);
> > +
> >  /*
> >   * Scan backwards from the end to verify that the end pages 
> > actually
> >   * contain no tuples.  This is *necessary*, not optional, because
> >   * other backends could have added tuples to these pages whilst we
> >   * were vacuuming.
> >   */
> >  new_rel_pages = count_nondeletable_pages(onerel, vacrelstats);
> >
> > We need to set the error context after setting new_rel_pages.
> 
> We want to cover the errors raised in count_nondeletable_pages().  In
> an earlier version of the patch, we had TRUNCATE_PREFETCH phase which
> use to cover those errors, but that was not good as we were
> setting/resetting it multiple times and it was not clear such a
> separate phase would add any value.

I insisted on covering count_nondeletable_pages since it calls ReadBuffer(),
but I think we need to at least set vacrelstats->blkno = new_rel_pages, since it
may be different, right?

> > 2.
> > +   vacrelstats->relnamespace =
> > get_namespace_name(RelationGetNamespace(onerel));
> > +   vacrelstats->relname = pstrdup(RelationGetRelationName(onerel));
> >
> > I think we can pfree these two variables to avoid a memory leak during
> > vacuum on multiple relations.
> 
> Yeah, I had also thought about it but I noticed that we are not
> freeing for vacrelstats.  Also, I think the memory is allocated in
> TopTransactionContext which should be freed via
> CommitTransactionCommand before vacuuming of the next relation, so not
> sure if there is much value in freeing those variables.

One small reason to free them is that (as Tom mentioned upthread) it's good to
ensure that those variables are their own allocation, and not depending on
being able to access relcache or anything else during an unexpected error.

-- 
Justin




documentation pdf build fail (HEAD)

2020-03-24 Thread Erikjan Rijkers

Hello,

I build the pdf (for HEAD) almost daily without problems, but at the 
moment I get the error below.


I am not sure whether to blame my particular setup (debian stretch), or 
whether there might be an error in the .sgml.  The html files still 
build OK.


If anyone has a suggestion on how to tackle this I'd be grateful.

thanks,

Erik Rijkers



[...]
[INFO] FOUserAgent - Rendered page #526.
[INFO] FOUserAgent - Rendered page #527.
[INFO] FOUserAgent - Rendered page #528.
[INFO] FOUserAgent - Rendered page #529.
[[ERROR] FOP - Exception org.apache.fop.fo.ValidationException: The column-number or number of 
cells in the row overflows the number of fo:table-columns specified for 
the table. (See position 47337:52207)
javax.xml.transform.TransformerException: 
org.apache.fop.fo.ValidationException: The column-number or number of 
cells in the row overflows the number of fo:table-columns specified for 
the table. (See position 47337:52207)>org.apache.fop.apps.FOPException: 
org.apache.fop.fo.ValidationException: The column-number or number of 
cells in the row overflows the number of fo:table-columns specified for 
the table. (See position 47337:52207)
javax.xml.transform.TransformerException: 
org.apache.fop.fo.ValidationException: The column-number or number of 
cells in the row overflows the number of fo:table-columns specified for 
the table. (See position 47337:52207)
at 
org.apache.fop.cli.InputHandler.transformTo(InputHandler.java:289)
at 
org.apache.fop.cli.InputHandler.renderTo(InputHandler.java:115)

at org.apache.fop.cli.Main.startFOP(Main.java:186)
at org.apache.fop.cli.Main.main(Main.java:217)
Caused by: javax.xml.transform.TransformerException: 
org.apache.fop.fo.ValidationException: The column-number or number of 
cells in the row overflows the number of fo:table-columns specified for 
the table. (See position 47337:52207)
at 
org.apache.xalan.transformer.TransformerIdentityImpl.transform(TransformerIdentityImpl.java:502)
at 
org.apache.fop.cli.InputHandler.transformTo(InputHandler.java:286)

... 3 more
Caused by: org.apache.fop.fo.ValidationException: The column-number or 
number of cells in the row overflows the number of fo:table-columns 
specified for the table. (See position 47337:52207)
at 
org.apache.fop.events.ValidationExceptionFactory.createException(ValidationExceptionFactory.java:38)
at 
org.apache.fop.events.EventExceptionManager.throwException(EventExceptionManager.java:58)
at 
org.apache.fop.events.DefaultEventBroadcaster$1.invoke(DefaultEventBroadcaster.java:175)

at com.sun.proxy.$Proxy4.tooManyCells(Unknown Source)
at 
org.apache.fop.fo.flow.table.TableCellContainer.addTableCellChild(TableCellContainer.java:75)
at 
org.apache.fop.fo.flow.table.TableRow.addChildNode(TableRow.java:95)
at 
org.apache.fop.fo.FOTreeBuilder$MainFOHandler.startElement(FOTreeBuilder.java:324)
at 
org.apache.fop.fo.FOTreeBuilder.startElement(FOTreeBuilder.java:179)
at 
org.apache.xalan.transformer.TransformerIdentityImpl.startElement(TransformerIdentityImpl.java:1073)
at 
org.apache.xerces.parsers.AbstractSAXParser.startElement(Unknown Source)
at 
org.apache.xerces.xinclude.XIncludeHandler.startElement(Unknown Source)
at 
org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanStartElement(Unknown 
Source)
at 
org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown 
Source)
at 
org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown 
Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown 
Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown 
Source)

at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown 
Source)
at 
org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
at 
org.apache.xalan.transformer.TransformerIdentityImpl.transform(TransformerIdentityImpl.java:485)

... 4 more





Re: error context for vacuum to include block number

2020-03-24 Thread Amit Kapila
On Tue, Mar 24, 2020 at 6:18 PM Masahiko Sawada
 wrote:
>
>
> I've read through the latest version patch (v31), here are my comments:
>
> 1.
> +/* Update error traceback information */
> +olderrcbarg = *vacrelstats;
> +update_vacuum_error_cbarg(vacrelstats,
> +  VACUUM_ERRCB_PHASE_TRUNCATE,
> new_rel_pages, NULL,
> +  false);
> +
>  /*
>   * Scan backwards from the end to verify that the end pages actually
>   * contain no tuples.  This is *necessary*, not optional, because
>   * other backends could have added tuples to these pages whilst we
>   * were vacuuming.
>   */
>  new_rel_pages = count_nondeletable_pages(onerel, vacrelstats);
>
> We need to set the error context after setting new_rel_pages.
>

We want to cover the errors raised in count_nondeletable_pages().  In
an earlier version of the patch, we had TRUNCATE_PREFETCH phase which
use to cover those errors, but that was not good as we were
setting/resetting it multiple times and it was not clear such a
separate phase would add any value.

> 2.
> +   vacrelstats->relnamespace =
> get_namespace_name(RelationGetNamespace(onerel));
> +   vacrelstats->relname = pstrdup(RelationGetRelationName(onerel));
>
> I think we can pfree these two variables to avoid a memory leak during
> vacuum on multiple relations.
>

Yeah, I had also thought about it but I noticed that we are not
freeing for vacrelstats.  Also, I think the memory is allocated in
TopTransactionContext which should be freed via
CommitTransactionCommand before vacuuming of the next relation, so not
sure if there is much value in freeing those variables.

> 3.
> +/* Phases of vacuum during which we report error context. */
> +typedef enum
> +{
> +   VACUUM_ERRCB_PHASE_UNKNOWN,
> +   VACUUM_ERRCB_PHASE_SCAN_HEAP,
> +   VACUUM_ERRCB_PHASE_VACUUM_INDEX,
> +   VACUUM_ERRCB_PHASE_VACUUM_HEAP,
> +   VACUUM_ERRCB_PHASE_INDEX_CLEANUP,
> +   VACUUM_ERRCB_PHASE_TRUNCATE
> +} ErrCbPhase;
>
> Comparing to the vacuum progress phases, there is not a phase for
> final cleanup which includes updating the relation stats. Is there any
> reason why we don't have that phase for ErrCbPhase?
>

I think we can add it if we want, but it was not clear to me if that is helpful.

> The type name ErrCbPhase seems to be very generic name, how about
> VacErrCbPhase or VacuumErrCbPhase?
>

It sounds like a better name.



-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: truncating timestamps on arbitrary intervals

2020-03-24 Thread Tels

Hello John,

this looks like a nice feature. I'm wondering how it relates to the 
following use-case:


When drawing charts, the user can select pre-defined time widths 
(like "15 min", "1 hour").


The data for these slots is always fitted to intervals that start at 0 
within the slot, e.g. if the user selects "15 min", the interval always 
starts at xx:00, xx:15, xx:30 or xx:45. This is to aid caching of the 
resulting data, and so that requesting the same chart at xx:00 and xx:01 
actually draws the same chart, and not one with a bar containing only one 
minute of data at the end.


In PostgreSQL, this works out to using this expression as the GROUP BY key 
and then summing up all values:


  SELECT FLOOR(EXTRACT(EPOCH from thetime) / 3600) * 3600, SUM(events) 
FROM mytable ... GROUP BY 1;


where 3600 means "hourly".

It is of course easy for things like "1 hour", but for yearly I had to 
come up with things like:


  EXTRACT(YEAR FROM thetime) * 12 + EXTRACT(MONTH FROM thetime)

and it gets even more involved for weeks, weekdays or odd things like 
fortnights.


So would that mean one could replace this by:

 date_trunc_interval('3600 seconds'::interval, thetime)

and it would be easier, and (hopefully) faster?
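
For comparison, here is a sketch of both forms side by side (date_trunc_interval()
is the function proposed in this patch, so its name and exact behavior may still
change; mytable, thetime and events are from the example above):

  -- current approach: 15-minute buckets via epoch arithmetic
  SELECT to_timestamp(FLOOR(EXTRACT(EPOCH FROM thetime) / 900) * 900) AS slot,
         SUM(events)
    FROM mytable
   GROUP BY 1;

  -- with the proposed function, the bucket width is just an interval
  SELECT date_trunc_interval('15 minutes'::interval, thetime) AS slot,
         SUM(events)
    FROM mytable
   GROUP BY 1;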

Best regards,

Tels




Re: [Proposal] Global temporary tables

2020-03-24 Thread Prabhat Sahu
Hi Wenjing,
Please check my findings (on gtt_v20.patch) below:

*TestCase1:* (cache lookup failed on GTT)

-- Session1:
postgres=# create global temporary table gtt1(c1 int) on commit delete rows;
CREATE TABLE

-- Session2:
postgres=# drop table gtt1 ;
DROP TABLE

-- Session1:
postgres=# create global temporary table gtt1(c1 int) on commit delete rows;
ERROR:  cache lookup failed for relation 16384


*TestCase2:*

-- Session1:
postgres=# create global temporary table gtt (c1 integer) on commit
preserve rows;
CREATE TABLE
postgres=# insert into gtt values(10);
INSERT 0 1

-- Session2:
postgres=# drop table gtt;
DROP TABLE


I hope "session2" should not allow to perform the "DROP" operation on GTT
having data.

*Behavior of GTT in Oracle Database in such a scenario:* For a committed
transaction on a GTT with (on_commit_delete_rows='FALSE') that still has data
in a session, we are not able to DROP it from any session; we need to TRUNCATE
the data first before we can DROP the table.

SQL> drop table gtt;
drop table gtt
   *
ERROR at line 1:
ORA-14452: attempt to create, alter or drop an index on temporary table
already
in use



On Tue, Mar 17, 2020 at 9:16 AM 曾文旌(义从)  wrote:

>
>
> On Mar 11, 2020, at 3:52 PM, Prabhat Sahu  wrote:
>
> On Mon, Mar 9, 2020 at 10:02 PM 曾文旌(义从) 
> wrote:
>
>>
>>
>> Fixed in global_temporary_table_v18-pg13.patch.
>>
> Hi Wenjing,
> Thanks for the patch. I have verified the previous issues with
> "gtt_v18_pg13.patch" and those are resolved.
> Please find below case:
>
> postgres=# create sequence seq;
> CREATE SEQUENCE
>
> postgres=# CREATE GLOBAL TEMPORARY TABLE gtt1(c1 int PRIMARY KEY) ON
> COMMIT DELETE ROWS;
> CREATE TABLE
>
> postgres=# CREATE GLOBAL TEMPORARY TABLE gtt2(c1 int PRIMARY KEY) ON
> COMMIT PRESERVE ROWS;
> CREATE TABLE
>
> postgres=# alter table gtt1 add c2 int default nextval('seq');
> ERROR:  cannot reindex global temporary tables
>
> postgres=# alter table gtt2 add c2 int default nextval('seq');
> ERROR:  cannot reindex global temporary tables
>
> reindex GTT is already supported
>
> Please check global_temporary_table_v20-pg13.patch
>
>
> Wenjing
>
>
>
>
> *Note*: We are getting this error if we have a key column(PK/UNIQUE) in a
> GTT, and trying to add a column with a default sequence into it.
>
> --
>
> With Regards,
> Prabhat Kumar Sahu
> EnterpriseDB: http://www.enterprisedb.com
>
>
>

-- 

With Regards,
Prabhat Kumar Sahu
EnterpriseDB: http://www.enterprisedb.com


Re: Built-in connection pooler

2020-03-24 Thread David Steele

Hi Konstantin,

On 11/14/19 2:06 AM, Konstantin Knizhnik wrote:

Attached please find rebased patch with this bug fixed.


This patch no longer applies: http://cfbot.cputube.org/patch_27_2067.log

CF entry has been updated to Waiting on Author.

Regards,
--
-David
da...@pgmasters.net




Re: Additional improvements to extended statistics

2020-03-24 Thread Dean Rasheed
On Tue, 24 Mar 2020 at 01:08, Tomas Vondra  wrote:
>
> Hmmm. So let's consider a simple OR clause with two arguments, both
> covered by single statistics object. Something like this:
>
>CREATE TABLE t (a int, b int);
>INSERT INTO t SELECT mod(i, 10), mod(i, 10)
>  FROM generate_series(1,10);
>CREATE STATISTICS s (mcv) ON a,b FROM t;
>
>SELECT * FROM t WHERE a = 0 OR b = 0;
>
> Which is estimated correctly...
>

Hmm, the reason that is estimated correctly is that the MCV values
cover all values in the table, so mcv_totalsel is 1 (or pretty close
to 1), and other_sel is capped to nearly 0, and so the result is
basically just mcv_sel -- i.e., in this case the MCV estimates are
returned more-or-less as-is, rather than being applied as a
correction, and so the result is independent of how many times
extended stats are applied.

The more interesting (and probably more realistic) case is where the
MCV values do not cover the all values in the table, as in the new
mcv_lists_partial examples in the regression tests, for example this
test case, which produces a significant overestimate:

  SELECT * FROM mcv_lists_partial WHERE a = 0 OR b = 0 OR c = 0

although actually, I think there's another reason for that (in
addition to the extended stats correction being applied twice) -- the
way the MCV code makes use of base selectivity doesn't seem really
appropriate for OR clauses, because the way base_frequency is computed
is really based on the assumption that every column would be matched.
I'm not sure that there's any easy answer to that though. I feels like
what's needed when computing the match bitmaps in mcv.c, is to produce
a bitmap (would it fit in 1 byte?) per value, to indicate which
columns match, and base_frequency values per column. That would be
significantly more work though, so almost certainly isn't feasible for
PG13.
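
As a self-contained illustration of that shape of data (this is not the actual
regression-test table, just a sketch where the MCV list cannot cover every
value), the estimate can be compared against the actual row count like this:

  CREATE TABLE mcv_partial_demo (a int, b int);
  -- a few frequent values plus a long tail that the MCV list will not cover
  INSERT INTO mcv_partial_demo
    SELECT mod(i, 10), mod(i, 10) FROM generate_series(1, 50000) i;
  INSERT INTO mcv_partial_demo
    SELECT i, i FROM generate_series(1000, 6000) i;
  CREATE STATISTICS mcv_partial_demo_s (mcv) ON a, b FROM mcv_partial_demo;
  ANALYZE mcv_partial_demo;
  EXPLAIN (ANALYZE, TIMING OFF, SUMMARY OFF)
    SELECT * FROM mcv_partial_demo WHERE a = 0 OR b = 0;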

> IIUC the problem you have in mind is that we end up calling
> statext_mcv_clauselist_selectivity twice, once for the top-level AND
> clause with a single argument, and then recursively for the expanded OR
> clause. Indeed, that seems to be applying the correction twice.
>
>
> >I'm not sure what's the best way to resolve that. Perhaps
> >statext_mcv_clauselist_selectivity() / statext_is_compatible_clause()
> >should ignore OR clauses from an AND-list, on the basis that they will
> >get processed recursively later. Or perhaps estimatedclauses can
> >somehow be used to prevent this, though I'm not sure exactly how that
> >would work.
>
> I don't know. I feel uneasy about just ignoring some of the clauses,
> because what happens for complex clauses, where the OR is not directly
> in the AND clause, but is negated or something like that?
>
> Isn't it the case that clauselist_selectivity_simple (and the OR
> variant) should ignore extended stats entirely? That is, we'd need to
> add a flag (or _simple variant) to clause_selectivity, so that it calls
> clauselist_selectivity_simple_or. So the calls would look like this:
>
>clauselist_selectivity
>  statext_clauselist_selectivity
>statext_mcv_clauselist_selectivity
>  clauselist_selectivity_simple  <--- disable extended stats
>clause_selectivity
>  clauselist_selectivity_simple_or
>clause_selectivity
>clause_selectivity
>  mcv_clauselist_selectivity
>  clauselist_selectivity_simple
>already estimated
>
> I've only quickly hacked clause_selectivity, but it does not seem very
> invasive (of course, it means disruption to clause_selectivity callers,
> but I suppose most call clauselist_selectivity).
>

Sounds like a reasonable approach, but I think it would be better to
preserve the current public API by having clauselist_selectivity()
become a thin wrapper around a new function that optionally applies
extended stats.
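
A minimal sketch of that shape, assuming a hypothetical name and flag for
the new function (clauselist_selectivity_ext and its extra boolean are
placeholders, not settled names):

    Selectivity
    clauselist_selectivity(PlannerInfo *root, List *clauses, int varRelid,
                           JoinType jointype, SpecialJoinInfo *sjinfo)
    {
        /* existing entry point keeps its signature and behaviour */
        return clauselist_selectivity_ext(root, clauses, varRelid,
                                          jointype, sjinfo,
                                          true);    /* use extended stats */
    }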

Regards,
Dean




Re: Fastpath while arranging the changes in LSN order in logical decoding

2020-03-24 Thread Dilip Kumar
On Tue, Mar 24, 2020 at 6:16 PM Amit Kapila  wrote:
>
> On Mon, Mar 9, 2020 at 11:07 PM Andres Freund  wrote:
> >
> > On 2020-03-07 11:15:27 +0530, Dilip Kumar wrote:
> > > IMHO, if we conclude that because there is no performance gain so we
> > > don't want to add one extra path in the code then we might want to
> > > remove that TODO from the code so that we don't spend time for
> > > optimizing this in the future.
> >
> > +1
> >
>
> Dilip, are you planning to do more tests for this?  Does anyone else
> want to do more tests? If not, based on the current results, we can
> remove that TODO and, if someone later comes up with a test case that
> shows a benefit from adding the fastpath, we can consider the patch
> proposed by Dilip.

IMHO, I have tried the best case but did not see any performance gain,
so I am not planning to test this further.  I have attached the patch
for removing the TODO.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


0001-Remove-TODO-comments-for-fast-path.patch
Description: Binary data


Re: error context for vacuum to include block number

2020-03-24 Thread Masahiko Sawada
On Tue, 24 Mar 2020 at 18:19, Amit Kapila  wrote:
>
> On Tue, Mar 24, 2020 at 2:37 PM Masahiko Sawada
>  wrote:
> >
> > On Tue, 24 Mar 2020 at 13:53, Amit Kapila  wrote:
> > >
> > > On Tue, Mar 24, 2020 at 9:46 AM Justin Pryzby  
> > > wrote:
> > > >
> > > > On Mon, Mar 23, 2020 at 02:25:14PM +0530, Amit Kapila wrote:
> > > > > > Yea, and it would be misleading if we reported "while scanning 
> > > > > > block..of
> > > > > > relation" if we actually failed while writing its FSM.
> > > > > >
> > > > > > My previous patches did this:
> > > > > >
> > > > > > +   case VACUUM_ERRCB_PHASE_VACUUM_FSM:
> > > > > > +   errcontext("while vacuuming free space map 
> > > > > > of relation \"%s.%s\"",
> > > > > > +   cbarg->relnamespace, 
> > > > > > cbarg->relname);
> > > > > > +   break;
> > > > > >
> > > > >
> > > > > In what kind of errors will this help?
> > > >
> > > > If there's an I/O error on an _fsm file, for one.
> > > >
> > >
> > > If there is a read or write failure, then we give error like below
> > > which already has required information.
> > > ereport(ERROR,
> > > (errcode_for_file_access(),
> > > errmsg("could not read block %u in file \"%s\": %m",
> > > blocknum, FilePathName(v->mdfd_vfd;
> >
> > Yeah, you're right. We, however, cannot see that the error happened
> > while recording freespace or while vacuuming freespace map but perhaps
> > we can see the situation by seeing the error message in most cases. So
> > I'm okay with the current set of phases.
> >
> > I'll also test the current version of patch today.
> >
>
> okay, thanks.

I've read through the latest version of the patch (v31); here are my comments:

1.
+/* Update error traceback information */
+olderrcbarg = *vacrelstats;
+update_vacuum_error_cbarg(vacrelstats,
+  VACUUM_ERRCB_PHASE_TRUNCATE,
new_rel_pages, NULL,
+  false);
+
 /*
  * Scan backwards from the end to verify that the end pages actually
  * contain no tuples.  This is *necessary*, not optional, because
  * other backends could have added tuples to these pages whilst we
  * were vacuuming.
  */
 new_rel_pages = count_nondeletable_pages(onerel, vacrelstats);

We need to set the error context after setting new_rel_pages.
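
That is, something along these lines (a sketch reusing the names from the
quoted hunk, not a tested change):

    new_rel_pages = count_nondeletable_pages(onerel, vacrelstats);

    /* Update error traceback information, now that new_rel_pages is final */
    olderrcbarg = *vacrelstats;
    update_vacuum_error_cbarg(vacrelstats,
                              VACUUM_ERRCB_PHASE_TRUNCATE,
                              new_rel_pages, NULL,
                              false);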

2.
+   vacrelstats->relnamespace =
get_namespace_name(RelationGetNamespace(onerel));
+   vacrelstats->relname = pstrdup(RelationGetRelationName(onerel));

I think we can pfree these two variables to avoid a memory leak when
vacuuming multiple relations.
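
For example (assuming these fields are no longer needed once the error
callback has been popped):

    pfree(vacrelstats->relnamespace);
    pfree(vacrelstats->relname);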

3.
+/* Phases of vacuum during which we report error context. */
+typedef enum
+{
+   VACUUM_ERRCB_PHASE_UNKNOWN,
+   VACUUM_ERRCB_PHASE_SCAN_HEAP,
+   VACUUM_ERRCB_PHASE_VACUUM_INDEX,
+   VACUUM_ERRCB_PHASE_VACUUM_HEAP,
+   VACUUM_ERRCB_PHASE_INDEX_CLEANUP,
+   VACUUM_ERRCB_PHASE_TRUNCATE
+} ErrCbPhase;

Compared to the vacuum progress phases, there is no phase for final
cleanup, which includes updating the relation stats. Is there any
reason why we don't have that phase in ErrCbPhase?

The type name ErrCbPhase seems to be a very generic name; how about
VacErrCbPhase or VacuumErrCbPhase?

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Fastpath while arranging the changes in LSN order in logical decoding

2020-03-24 Thread Amit Kapila
On Mon, Mar 9, 2020 at 11:07 PM Andres Freund  wrote:
>
> On 2020-03-07 11:15:27 +0530, Dilip Kumar wrote:
> > IMHO, if we conclude that because there is no performance gain so we
> > don't want to add one extra path in the code then we might want to
> > remove that TODO from the code so that we don't spend time for
> > optimizing this in the future.
>
> +1
>

Dilip, are you planning to do more tests for this?  Does anyone else
want to do more tests? If not, based on the current results, we can
remove that TODO and, if someone later comes up with a test case that
shows a benefit from adding the fastpath, we can consider the patch
proposed by Dilip.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: Improve checking for pg_index.xmin

2020-03-24 Thread Amit Kapila
On Sun, Jan 12, 2020 at 3:33 AM Alexander Korotkov
 wrote:
>
> On Wed, Jan 8, 2020 at 4:37 PM Tom Lane  wrote:
> > Heikki Linnakangas  writes:
> > > On 01/11/2019 01:50, Alexander Korotkov wrote:
> > >> This happens so, because we're checking that there is no broken HOT
> > >> chains after index creation by comparison pg_index.xmin and
> > >> TransactionXmin.   So, we check that pg_index.xmin is in the past for
> > >> current transaction in lossy way by comparison just xmins.  Attached
> > >> patch changes this check to XidInMVCCSnapshot().
> > >> With patch the issue is gone.  My doubt about this patch is that it
> > >> changes check with TransactionXmin to check with GetActiveSnapshot(),
> > >> which might be more recent.  However, query shouldn't be executer with
> > >> older snapshot than one it was planned with.
> >
> > > Hmm. Maybe you could construct a case like that with a creative mix of
> > > stable and volatile functions? Using GetOldestSnapshot() would be safer.
> >
> > I really wonder if this is safe at all.
> >
> > (1) Can we assume that the query will be executed with same-or-newer
> > snapshot as what was used by the planner?  There's no such constraint
> > in the plancache, I'm pretty sure.
> >
> > (2) Is committed-ness of the index-creating transaction actually
> > sufficient to ensure that none of the broken HOT chains it saw are
> > a problem for the onlooker transaction?  This is, at best, really
> > un-obvious.  Some of those HOT chains could involve xacts that were
> > still not committed when the index build finished, I believe.
> >
> > (3) What if the index was made with CREATE INDEX CONCURRENTLY ---
> > which xid is actually on the pg_index row, and how does that factor
> > into (1) and (2)?
>
> Thank you for pointing.  I'll investigate these issues in detail.
>
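
For context, the check under discussion is the indcheckxmin test in
get_relation_info(); very roughly, the proposal swaps a plain xmin
comparison for a snapshot test. Sketch only, with index_xmin standing in
for the xmin of the pg_index tuple; this is not the actual patch:

    /* current lossy check: index unusable unless its creating transaction
     * is already older than our TransactionXmin */
    if (index->indcheckxmin &&
        !TransactionIdPrecedes(index_xmin, TransactionXmin))
        continue;           /* skip index; plan is marked transient */

    /* idea in the patch: consult the snapshot instead, so a committed but
     * recent index-creating transaction no longer disables the index */
    if (index->indcheckxmin &&
        XidInMVCCSnapshot(index_xmin, GetActiveSnapshot()))
        continue;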

Are you planning to work on this patch [1] for the current CF?  If not,
then I think it is better to either move this to the next CF or mark
it as RWF.

[1] - https://commitfest.postgresql.org/27/2337/

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: Adding a test for speculative insert abort case

2020-03-24 Thread Amit Kapila
On Wed, Feb 12, 2020 at 6:50 AM Melanie Plageman
 wrote:
>
>
> On Tue, Feb 11, 2020 at 4:45 PM Andres Freund  wrote:
>>
>>
>> I additionally restricted the controller_print_speculative_locks step to
>> the current database and made a bunch of debug output more
>> precise. Survived ~150 runs locally.
>>
>> Lets see what the buildfarm says...
>>
>
> Thanks so much for finishing the patch and checking for race
> conditions!
>

Can we change the status of the CF entry for this patch [1] to committed,
or is there any work pending?


[1] - https://commitfest.postgresql.org/27/2200/
-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: Silence compiler warnings with Python 3.9

2020-03-24 Thread Peter Eisentraut

On 2020-03-02 14:22, Peter Eisentraut wrote:

Starting with Python 3.9, the Python headers contain inline functions
that fall afoul of our -Wdeclaration-after-statement coding style.  In
order to silence those warnings, I've added some GCC-specific
contortions to disable that warning for Python.h only.  Clang doesn't
appear to warn about this at all; maybe it recognizes that this is an
external header file.  We could also write a configure check for this if
we want to be more flexible.

(Attempts to convince upstream to change the coding style were
unsuccessful (https://bugs.python.org/issue39615).)
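
The kind of header-only suppression being described would look roughly
like this (a sketch assuming plain GCC diagnostic pragmas; not necessarily
what the proposed patch did):

    #if defined(__GNUC__)
    #pragma GCC diagnostic push
    #pragma GCC diagnostic ignored "-Wdeclaration-after-statement"
    #endif

    #include <Python.h>

    #if defined(__GNUC__)
    #pragma GCC diagnostic pop
    #endif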


My fix in cpython was accepted after all and the issue is no longer 
present in the latest alpha (3.9.0a5), so this can be considered closed.


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: truncating timestamps on arbitrary intervals

2020-03-24 Thread John Naylor
On Sun, Mar 15, 2020 at 2:26 PM I wrote:
>
> To get more logical behavior, perhaps the optional parameter is better
> as an offset instead of an origin. Alternatively (or additionally),
> the function could do the math on int64 timestamps directly.

For v6, I changed the algorithm to use pg_tm for months and years, and
int64 for all smaller units. Despite the split, I think it's easier to
read now, and certainly shorter. This has the advantage that mixing
units is now allowed (not yet documented), as long as you don't mix
months/years with days or smaller, which often doesn't make sense and
is not very practical anyway. One consequence is that when operating
on months/years, any smaller units in the origin are ignored. Example:

select date_trunc_interval('12 months'::interval, timestamp
'2012-03-01 01:21:01', timestamp '2011-03-22');
 date_trunc_interval
-
 2012-03-01 00:00:00
(1 row)

Even though not quite a full year has passed, it ignores the days in
the origin time and detects a difference of 12 calendar months. That
might be fine, although we could also throw an error and require origins
to be of the form 'YYYY-01-01 00:00:00' when truncating on months
and/or years.
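
For units of days and smaller, the int64 arithmetic boils down to
something like this sketch (illustrative names only, not code from the
patch):

    static int64
    trunc_to_stride(int64 ts, int64 origin, int64 stride_usecs)
    {
        int64   delta = ts - origin;
        int64   rem = delta % stride_usecs;

        /* C's % truncates toward zero, so adjust for ts before the origin */
        if (rem < 0)
            rem += stride_usecs;

        return ts - rem;        /* start of the stride containing ts */
    }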

I added a sketch of documentation for the origin parameter and more tests.

On Fri, Mar 13, 2020 at 7:48 PM Isaac Morland  wrote:
> I'm confused by this. If my calendars are correct, both 1900-01-02 and 
> 2020-02-11 are Tuesdays. So if the date being adjusted and the origin are 
> both Tuesday, shouldn't the day part be left alone when truncating to 7 days? 
> Also, I'd like to confirm that the default starting point for 7 day periods 
> (weeks) is Monday, per ISO.

This is fixed.

select date_trunc_interval('7 days'::interval, timestamp '2020-02-11
01:01:01.0', TIMESTAMP '1900-01-02');
 date_trunc_interval
-
 2020-02-11 00:00:00
(1 row)

select date_trunc_interval('7 days'::interval, d), count(*) from
generate_series( '2020-02-01'::timestamp, '2020-03-31', '1 day') d
group by 1 order by 1;
 date_trunc_interval | count
-+---
 2020-01-27 00:00:00 | 2
 2020-02-03 00:00:00 | 7
 2020-02-10 00:00:00 | 7
 2020-02-17 00:00:00 | 7
 2020-02-24 00:00:00 | 7
 2020-03-02 00:00:00 | 7
 2020-03-09 00:00:00 | 7
 2020-03-16 00:00:00 | 7
 2020-03-23 00:00:00 | 7
 2020-03-30 00:00:00 | 2
(10 rows)

> Perhaps the starting point for dates should be either 0001-01-01 (the 
> proleptic beginning of the CE calendar) or 2001-01-01 (the beginning of the 
> current 400-year repeating cycle of leap years and weeks, and a Monday, 
> giving the appropriate ISO result for truncating to 7 day periods).

I went ahead with 2001-01-01 for the time being.

On Thu, Mar 19, 2020 at 4:20 PM Artur Zakirov  wrote:
>
> =# select date_trunc_interval('1.1 year'::interval, TIMESTAMP
> '2020-02-01 01:21:01');
> ERROR:  only one interval unit allowed for truncation

For any lingering cases like this (I don't see any), maybe an error
hint is in order. The following works now, as expected for 1 year 1
month:

select date_trunc_interval('1.1 year'::interval, timestamp '2002-05-01
01:21:01');
 date_trunc_interval
-
 2002-02-01 00:00:00

I'm going to look into implementing time zone support while awaiting comments on v6.

-- 
John Naylor                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


v6-datetrunc_interval.patch
Description: Binary data

