Re: kill_prior_tuple and index scan costing

2020-03-21 Thread Andres Freund
Hi,

On 2020-03-21 23:53:05 -0500, Justin Pryzby wrote:
> On Sat, Mar 21, 2020 at 07:33:02PM -0700, Andres Freund wrote:
> > While your recent btree work ensures that we get the heap tids for an
> > equality lookup in heap order (right?),
> 
> I think when I tested the TID tiebreaker patch, it didn't help for our case,
> which is for inequality: (timestamptz >= start AND timestamptz < end).
> 
> That seems to explain it, although I don't understand why it wouldn't also
> apply to inequality comparisons?

Because tids are only ordered for a single lookup key. So if you scan
across multiple values, you could have key:page visited in the order of
1:1 1:2 1:99 2:1 2:7 99:1 or such, i.e. the heap pages would not be
monotonically increasing. You can't, however, have 1:17 followed by 1:1,
because for a specific key value the tid is used as an additional
comparison value.
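
A rough illustration, assuming the t / t_i_idx setup from Justin's message below (the exact ctids will of course vary):

  SET enable_seqscan = off; SET enable_bitmapscan = off;
  SELECT ctid, i FROM t WHERE i BETWEEN 2000 AND 2002 LIMIT 10;

Within each value of i the ctids come back in ascending order, but they can jump backwards whenever the scan moves on to the next value of i.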

Greetings,

Andres Freund




Re: kill_prior_tuple and index scan costing

2020-03-21 Thread Justin Pryzby
On Sat, Mar 21, 2020 at 07:33:02PM -0700, Andres Freund wrote:
> While your recent btree work ensures that we get the heap tids for an
> equality lookup in heap order (right?),

I think when I tested the TID tiebreaker patch, it didn't help for our case,
which is for inequality: (timestamptz >= start AND timestamptz < end).

That seems to explain it, although I don't understand why it wouldn't also
apply to inequality comparisons?

|template1=# CREATE TABLE t(i int,j int); CREATE INDEX ON t(i); INSERT INTO t SELECT (0.0001*a+9*(random()-0.5))::int FROM generate_series(1,) a; VACUUM ANALYZE t;
|template1=# explain (analyze,buffers) SELECT * FROM t WHERE i BETWEEN 2000 AND 3000;
| Index Scan using t_i_idx on t  (cost=0.44..277164.86 rows=10026349 width=8) (actual time=0.199..6839.564 rows=10010076 loops=1)
|   Index Cond: ((i >= 2000) AND (i <= 3000))
|   Buffers: shared hit=394701 read=52699

vs.

|template1=# SET enable_seqscan=off; SET enable_indexscan=off; explain (analyze,buffers) SELECT * FROM t WHERE i BETWEEN 2000 AND 3000;
| Bitmap Heap Scan on t  (cost=135038.52..1977571.10 rows=10026349 width=8) (actual time=743.649..3760.643 rows=10010076 loops=1)
|   Recheck Cond: ((i >= 2000) AND (i <= 3000))
|   Heap Blocks: exact=44685
|   Buffers: shared read=52700
|   ->  Bitmap Index Scan on t_i_idx  (cost=0.00..132531.93 rows=10026349 width=0) (actual time=726.474..726.475 rows=10010076 loops=1)
|         Index Cond: ((i >= 2000) AND (i <= 3000))
|         Buffers: shared read=8015

I'm not concerned with the "actual" time or hit vs. cached, but with the total
number of buffer pages: the index scan accessed ~450k buffers vs. ~52k for the
bitmap scan.

> I don't think we currently have
> the planner infrastructure to know that that's the case (since other
> index types don't guarantee that) / take it into account for planning?

Right, since correlation is a property of the table column and not of the
index.  See also:
https://www.postgresql.org/message-id/14438.1512499...@sss.pgh.pa.us

Years ago I had a patch to make correlation a property of indexes.

-- 
Justin




Re: kill_prior_tuple and index scan costing

2020-03-21 Thread Andres Freund
Hi,

Reply largely based on a quick IM conversation between Peter and me.

On 2020-03-04 17:13:33 -0800, Peter Geoghegan wrote:
> Both plans are very similar, really. The number of heap accesses and
> B-Tree index page accesses is exactly the same in each case.

Note that bitmap heap scans, currently, have the huge advantage of being
able to efficiently prefetch heap data. That can be a *huge* performance
boon (I've seen several orders of magnitude on SSDs).
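
As a rough sketch, assuming the t / t_i_idx setup from Justin's message: the prefetch depth used by bitmap heap scans is driven by effective_io_concurrency, while the plain index scan path does no equivalent heap prefetching today.

  SET effective_io_concurrency = 200;  -- deeper prefetch queue for bitmap heap scans
  SET enable_indexscan = off;          -- force the bitmap plan for comparison
  EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM t WHERE i BETWEEN 2000 AND 3000;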

There are also other benefits to bitmap heap scans. For heap, the
"single tid" index->heap lookup path locks the page once for each
tid, whereas a bitmap heap scan only does that once per page - adding more
lock cycles obviously can have a noticeable performance impact.  Not having
interspersed I/O between index and heap can be beneficial too.


I thought we had optimized the non-lossy bitmap path for heap
(i.e. heapam_scan_bitmap_next_block()) to perform visibility checks more
efficiently than single tid fetches
(i.e. heapam_index_fetch_tuple()). But both use
heap_hot_search_buffer(), even though the number of page locks differs.

I'm a bit surprised that neither heap_hot_search_buffer() nor the "lossy
path" in heapam_scan_bitmap_next_block() takes advantage of the page being
all-visible. I don't really see a good reason for that. The HTSV calls
do show up noticeably in profiles, in my experience.


While your recent btree work ensures that we get the heap tids for an
equality lookup in heap order (right?), I don't think we currently have
the planner infrastructure to know that that's the case (since other
index types don't guarantee that) / take it into account for planning?


> But the index scan plan has one non-obvious advantage, that might
> matter a lot in the real world: it can apply the kill_prior_tuple
> optimization. (It is never possible to use the kill_prior_tuple
> optimization during a bitmap index scan.)

Indeed. I've seen this cause very significant issues a couple
times. Basically whenever the handful of very common queries that
touched most of the data switched to bitmap heap scans, the indexes
would explode in size.  Due to the index sizes involved there was no way
normal vacuum could clean up dead tuples quickly enough to prevent
bloat, but with kill_prior_tuple that wasn't a problem.

I have wondered whether we could "just" add some support for
kill_prior_tuple to the bitmap index scan infrastructure. Obviously
that'd require some way of calling "back" into the index code once
(several?) tuples on a page are found to be dead during a bitmap heap
scan. Which would require keeping track of additional metadata for each
tuple in the tid bitmap, which is obviously not free and would have to
be conditional.

I don't really like the kill_prior_tuple interface much. But I don't
immediately see how to do better, without increasing the overhead.


> It makes sense that the planner determines that a bitmap index scan is
> faster -- or it used to make sense. Before commit dd299df8, which made
> heap TID a tiebreaker nbtree index column, we might find ourselves
> accessing the same heap page multiple times, pinning it a second or a
> third time within the executor (it depended on very unstable
> implementation details in the nbtree code). These days we should
> *reliably* access the same number of heap pages (and index pages) with
> either plan. (There are a couple of caveats that I'm glossing over for
> now, like pg_upgrade'd indexes.)

Leaving aside the locking difference pointed out above, there still is a
significant difference in how many times we indirectly call into the
index AM, and how much setup work has to be done, though?

There's at least one index_getnext_tid() call for each result tuple,
which each time indirectly has to call btgettuple(). And each
btgettuple() has to do checks (do array keys need to be advanced, has
_bt_first() been called). Whereas btgetbitmap() can amortize across
all tuples.

And that's without considering the fact that, to me, btgetbitmap() could
be significantly optimized by adding multiple tuples to the bitmap at
once, rather than doing so one-by-one.

Greetings,

Andres Freund




doc review for parallel vacuum

2020-03-21 Thread Justin Pryzby
Original, long thread
https://www.postgresql.org/message-id/flat/CAA4eK1%2Bnw1FBK3_sDnW%2B7kB%2Bx4qbDJqetgqwYW8k2xv82RZ%2BKw%40mail.gmail.com#b1745ee853b137043e584b500b41300f
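
The option being documented here is the new PARALLEL clause; as a rough usage sketch (the table name is made up):

  VACUUM (PARALLEL 4, VERBOSE) some_table;   -- use up to 4 workers for index vacuum/cleanup
  VACUUM (PARALLEL 0) some_table;            -- disable parallel vacuum for this run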

diff --git a/doc/src/sgml/ref/vacuum.sgml b/doc/src/sgml/ref/vacuum.sgml
index ab1b8c2398..140637983a 100644
--- a/doc/src/sgml/ref/vacuum.sgml
+++ b/doc/src/sgml/ref/vacuum.sgml
@@ -237,15 +237,15 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ 
integer
-  background workers (for the detail of each vacuum phases, please
+  background workers (for the detail of each vacuum phase, please
   refer to ).  If the
-  PARALLEL option is omitted, then
-  VACUUM decides the number of workers based on number
-  of indexes that support parallel vacuum operation on the relation which
-  is further limited by .
-  The index can participate in a parallel vacuum if and only if the size
+  PARALLEL option is omitted, then the number of workers
+  is determined based on the number of indexes that support parallel vacuum
+  operation on the relation, and is further limited by .
+  An index can participate in parallel vacuum if and only if the size
   of the index is more than .
   Please note that it is not guaranteed that the number of parallel workers
   specified in integer will
@@ -253,7 +253,7 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ 
VACUUM causes a substantial increase in I/O traffic,
 which might cause poor performance for other active sessions.  Therefore,
 it is sometimes advisable to use the cost-based vacuum delay feature.  For
-parallel vacuum, each worker sleeps proportional to the work done by that
+parallel vacuum, each worker sleeps proportionally to the work done by that
 worker.  See  for
 details.


diff --git a/src/backend/access/heap/vacuumlazy.c 
b/src/backend/access/heap/vacuumlazy.c
index e017db4446..0ab0bea312 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -194,7 +194,7 @@ typedef struct LVShared
 * live tuples in the index vacuum case or the new live tuples in the
 * index cleanup case.
 *
-* estimated_count is true if the reltuples is an estimated value.
+* estimated_count is true if reltuples is an estimated value.
 */
double  reltuples;
boolestimated_count;
@@ -757,7 +757,7 @@ skip_blocks(Relation onerel, VacuumParams *params, 
BlockNumber *next_unskippable
  * to reclaim dead line pointers.
  *
  * If the table has at least two indexes, we execute both index 
vacuum
- * and index cleanup with parallel workers unless the parallel 
vacuum is
+ * and index cleanup with parallel workers unless parallel vacuum 
is
  * disabled.  In a parallel vacuum, we enter parallel mode and then
  * create both the parallel context and the DSM segment before 
starting
  * heap scan so that we can record dead tuples to the DSM segment. 
 All
@@ -836,7 +836,7 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, 
LVRelStats *vacrelstats,
vacrelstats->latestRemovedXid = InvalidTransactionId;
 
/*
-* Initialize the state for a parallel vacuum.  As of now, only one 
worker
+* Initialize state for a parallel vacuum.  As of now, only one worker
 * can be used for an index, so we invoke parallelism only if there are 
at
 * least two indexes on a table.
 */
@@ -864,7 +864,7 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, 
LVRelStats *vacrelstats,
}
 
/*
-* Allocate the space for dead tuples in case the parallel vacuum is not
+* Allocate the space for dead tuples in case parallel vacuum is not
 * initialized.
 */
if (!ParallelVacuumIsActive(lps))
@@ -2111,7 +2111,7 @@ parallel_vacuum_index(Relation *Irel, 
IndexBulkDeleteResult **stats,
shared_indstats = get_indstats(lvshared, idx);
 
/*
-* Skip processing indexes that doesn't participate in parallel
+* Skip processing indexes that don't participate in parallel
 * operation
 */
if (shared_indstats == NULL ||
@@ -2223,7 +2223,7 @@ vacuum_one_index(Relation indrel, IndexBulkDeleteResult 
**stats,
shared_indstats->updated = true;
 
/*
-* Now that the stats[idx] points to the DSM segment, we don't 
need
+* Now that stats[idx] points to the DSM segment, we don't need
 * the locally allocated results.
 */
pfree(*stats);
@@ -2329,7 +2329,7 @@ lazy_vacuum_index(Relation indrel, IndexBulkDeleteResult 
**stats,
  * lazy_cleanup_index() -- do post-vacuum cleanup for one index relation.
  *
  * reltuples is the number of heap 

Re: Additional size of hash table is alway zero for hash aggregates

2020-03-21 Thread Andres Freund
Hi,

On 2020-03-21 17:45:31 -0700, Jeff Davis wrote:
> Or, we can keep the 'additionalsize' argument but put it to work storing
> the AggStatePerGroupData inline in the hash table. That would allow us
> to remove the 'additional' pointer from TupleHashEntryData, saving 8
> bytes plus the chunk header for every group. That sounds very tempting.

I don't see how? That'd require making the hash bucket addressing deal
with variable sizes, which'd be bad for performance reasons. Since there
can be aggstate->numtrans AggStatePerGroupDatas for each hash table
entry, I don't see how to avoid a variable size?


> If we want to get even more clever, we could try to squish
> AggStatePerGroupData into 8 bytes by putting the flags
> (transValueIsNull and noTransValue) into unused bits of the Datum.
> That would work if the transtype is by-ref (low bits of the pointer will
> be unused), or if the type's size is less than 8, or if the particular
> aggregate doesn't need either of those booleans. It would get messy,
> but saving 8 bytes per group is non-trivial.

I'm somewhat doubtful it's worth going for those per-type optimizations
- the wins don't seem large enough, relative to other per-group space
needs. It also adds additional instructions to fetching those values...

If we want to optimize memory usage, I think I'd first go for allocating
the group's "firstTuple" together with all the AggStatePerGroupDatas.

Greetings,

Andres Freund




Re: Why does [auto-]vacuum delay not report a wait event?

2020-03-21 Thread Peter Geoghegan
On Sat, Mar 21, 2020 at 5:53 PM Andres Freund  wrote:
> My comment is entirely unrelated to GIN, but about the way the delay 
> infrastructure manages state (in global vars).

The fact that ginInsertCleanup() uses "stats != NULL" to indicate
whether it is being called from within VACUUM or not is surely
relevant, or at least relevant to the issue that Mahendra just
reported. shiftList() relies on this directly already.

-- 
Peter Geoghegan




Re: Why does [auto-]vacuum delay not report a wait event?

2020-03-21 Thread Andres Freund
Hi,

On March 21, 2020 5:51:19 PM PDT, Peter Geoghegan  wrote:
>On Sat, Mar 21, 2020 at 5:25 PM Andres Freund 
>wrote:
>> > diff --git a/src/backend/access/gin/ginfast.c
>b/src/backend/access/gin/ginfast.c
>> > index 11d7ec067a..c99dc4a8be 100644
>> > --- a/src/backend/access/gin/ginfast.c
>> > +++ b/src/backend/access/gin/ginfast.c
>> > @@ -892,7 +892,7 @@ ginInsertCleanup(GinState *ginstate, bool
>full_clean,
>> >*/
>> >   processPendingPage(, , page,
>FirstOffsetNumber);
>> >
>> > - vacuum_delay_point();
>> > + stats->delay_msec += vacuum_delay_point();
>> >
>> >   /*
>> >* Is it time to flush memory to disk?  Flush if we
>are at the end of
>> > @@ -929,7 +929,7 @@ ginInsertCleanup(GinState *ginstate, bool
>full_clean,
>> >   {
>> >   ginEntryInsert(ginstate, attnum, key,
>category,
>> >  list,
>nlist, NULL);
>> > - vacuum_delay_point();
>> > + stats->delay_msec +=
>vacuum_delay_point();
>> >   }
>> >
>> >   /*
>> > @@ -1002,7 +1002,7 @@ ginInsertCleanup(GinState *ginstate, bool
>full_clean,
>> >   /*
>> >* Read next page in pending list
>> >*/
>> > - vacuum_delay_point();
>> > + stats->delay_msec += vacuum_delay_point();
>> >   buffer = ReadBuffer(index, blkno);
>> >   LockBuffer(buffer, GIN_SHARE);
>> >   page = BufferGetPage(buffer);
>>
>> On a green field I'd really like to pass a 'vacuum state' struct to
>> vacuum_delay_point().
>
>In a green field situation, there'd be no ginInsertCleanup() at all.
>It is a Lovecraftian horror show. The entire thing should be scrapped
>now, in fact.

My comment is entirely unrelated to GIN, but about the way the delay 
infrastructure manages state (in global vars).

Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.




Re: Why does [auto-]vacuum delay not report a wait event?

2020-03-21 Thread Peter Geoghegan
On Sat, Mar 21, 2020 at 5:25 PM Andres Freund  wrote:
> > diff --git a/src/backend/access/gin/ginfast.c 
> > b/src/backend/access/gin/ginfast.c
> > index 11d7ec067a..c99dc4a8be 100644
> > --- a/src/backend/access/gin/ginfast.c
> > +++ b/src/backend/access/gin/ginfast.c
> > @@ -892,7 +892,7 @@ ginInsertCleanup(GinState *ginstate, bool full_clean,
> >*/
> >   processPendingPage(, , page, FirstOffsetNumber);
> >
> > - vacuum_delay_point();
> > + stats->delay_msec += vacuum_delay_point();
> >
> >   /*
> >* Is it time to flush memory to disk?  Flush if we are at 
> > the end of
> > @@ -929,7 +929,7 @@ ginInsertCleanup(GinState *ginstate, bool full_clean,
> >   {
> >   ginEntryInsert(ginstate, attnum, key, 
> > category,
> >  list, nlist, NULL);
> > - vacuum_delay_point();
> > + stats->delay_msec += vacuum_delay_point();
> >   }
> >
> >   /*
> > @@ -1002,7 +1002,7 @@ ginInsertCleanup(GinState *ginstate, bool full_clean,
> >   /*
> >* Read next page in pending list
> >*/
> > - vacuum_delay_point();
> > + stats->delay_msec += vacuum_delay_point();
> >   buffer = ReadBuffer(index, blkno);
> >   LockBuffer(buffer, GIN_SHARE);
> >   page = BufferGetPage(buffer);
>
> On a green field I'd really like to pass a 'vacuum state' struct to
> vacuum_delay_point().

In a green field situation, there'd be no ginInsertCleanup() at all.
It is a Lovecraftian horror show. The entire thing should be scrapped
now, in fact.

--
Peter Geoghegan




Re: Additional size of hash table is alway zero for hash aggregates

2020-03-21 Thread Jeff Davis
On Fri, 2020-03-13 at 00:34 +, Andrew Gierth wrote:
> > > > > > "Justin" == Justin Pryzby  writes:
> 
>  > On Thu, Mar 12, 2020 at 12:16:26PM -0700, Andres Freund wrote:
>  >> Indeed, that's incorrect. Causes the number of buckets for the
>  >> hashtable to be set higher - the size is just used for that. I'm
> a

It's also used to set the 'entrysize' field of the TupleHashTable,
which doesn't appear to be used for anything? Maybe we should just
remove that field... it confused me for a moment as I was looking into
this.

>  >> bit wary of changing this in the stable branches - could cause
>  >> performance changes?
> 
> I think (offhand, not tested) that the number of buckets would only be
> affected if the (planner-supplied) numGroups value would cause work_mem
> to be exceeded; the planner doesn't plan a hashagg at all in that
> 

Now that we have disk-based HashAgg, which already tries to choose the
number of buckets with work_mem in mind, and no other caller specifies a
non-zero additionalsize, why not just get rid of that argument
completely? It can still sanity-check against work_mem for the sake of
other callers, but it doesn't need 'additionalsize' to do so.

Or, we can keep the 'additionalsize' argument but put it to work storing
the AggStatePerGroupData inline in the hash table. That would allow us
to remove the 'additional' pointer from TupleHashEntryData, saving 8
bytes plus the chunk header for every group. That sounds very tempting.

If we want to get even more clever, we could try to squish
AggStatePerGroupData into 8 bytes by putting the flags
(transValueIsNull and noTransValue) into unused bits of the Datum. That
would work if the transtype is by-ref (low bits of the pointer will be
unused), or if the type's size is less than 8, or if the particular
aggregate doesn't need either of those booleans. It would get messy,
but saving 8 bytes per group is non-trivial.

Regards,
Jeff Davis






Re: Why does [auto-]vacuum delay not report a wait event?

2020-03-21 Thread Andres Freund
Hi,

> On Thu, Mar 19, 2020 at 03:44:49PM -0700, Andres Freund wrote:
> > But uh, unfortunately the vacuum delay code just sleeps without setting
> > a wait event:
> ...
> > Seems like it should instead use a new wait event in the PG_WAIT_TIMEOUT
> > class?
> > 
> > Given how frequently we run into trouble with [auto]vacuum throttling
> > being a problem, and there not being any way to monitor that currently,
> > that seems like it'd be a significant improvement, given the effort?
> 
> I see that pg_sleep sets WAIT_EVENT_PG_SLEEP, but pg_usleep doesn't, I guess
> because the overhead is significant for a small number of usecs, and because 
> it
> doesn't work for pg_usleep to set a generic event if callers want to be able 
> to
> set a more specific wait event.

I don't think the overhead is a meaningful issue - compared to yielding
to the kernel / context switching, setting the wait event isn't a
significant cost.

I think the issue is more the second part - it's used as part of other
things using their own wait events potentially.
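
Once such a wait event exists, the throttling at least becomes observable from another session -- roughly like this, using the VacuumDelay name from Justin's patch:

  SELECT pid, backend_type, wait_event_type, wait_event
    FROM pg_stat_activity
   WHERE wait_event = 'VacuumDelay';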

I think we should just rip out pg_usleep and replace it with latch
waits. While the case at hand is user configurable (but the max wait
time is 100ms, so it's not too bad), it's generally not great to sleep
without ensuring that interrupts are handled.  Nor is it great that we
don't sleep again if the sleep is interrupted.  There may be a case or
two where we don't want to layer on top of latches (perhaps the spinlock
delay loop?), but pretty much everywhere else a different routine would
make more sense.


> Also, I noticed that SLEEP_ON_ASSERT comment (31338352b) wants to use 
> pg_usleep
> "which seems too short.".  Surely it should use pg_sleep, added at 782eefc58 a
> few years later ?

I don't see a problem with using sleep here?


> Also, there was a suggestion recently that this should have a separate
> vacuum_progress phase:
> |vacuumlazy.c:#define VACUUM_TRUNCATE_LOCK_WAIT_INTERVAL  50 /* ms */
> |vacuumlazy.c:pg_usleep(VACUUM_TRUNCATE_LOCK_WAIT_INTERVAL * 1000L);
> 
> I was planning to look at that eventually ; should it have a wait event 
> instead
> or in addition ?

A separate phase? What would that look like? We don't want to replace the
knowledge that, e.g., the heap scan is currently in progress?



> > It'd probably also be helpful to report the total time [auto]vacuum
> > spent being delayed for vacuum verbose/autovacuum logging, but imo
> > that'd be a parallel feature to a wait event, not a replacement.
> 
> This requires wider changes than I anticipated.
> 
> 2020-03-20 22:35:51.308 CDT [16534] LOG:  automatic aggressive vacuum of 
> table "template1.pg_catalog.pg_class": index scans: 1
> pages: 0 removed, 11 remain, 0 skipped due to pins, 0 skipped frozen
> tuples: 6 removed, 405 remain, 0 are dead but not yet removable, 
> oldest xmin: 1574
> buffer usage: 76 hits, 7 misses, 8 dirtied
> avg read rate: 16.378 MB/s, avg write rate: 18.718 MB/s
> Cost-based delay: 2 msec
> system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
> 
> VACUUM VERBOSE wouldn't normally be run with cost_delay > 0, so that field 
> will
> typically be zero, so I made it conditional

I personally dislike conditional output like that, because it makes
parsing the output harder.


> , which is supposedly why it's written like that, even though that's
> not otherwise being used since 17eaae98.

Well, it's also just hard to otherwise manage these long translatable
strings. And we're essentially still conditional, due to the 'msgfmt'
branches - if the whole output were output in a single appendStringInfo
call, we'd have to duplicate all the following format strings too.

Personally I really wish we'd just merge the vacuum verbose and the
autovacuum logging code, even if it causes some temporary pain.


On 2020-03-20 23:07:51 -0500, Justin Pryzby wrote:
> From 68c5ad8c7a9feb0c68afad310e3f52c21c3cdbaf Mon Sep 17 00:00:00 2001
> From: Justin Pryzby 
> Date: Fri, 20 Mar 2020 20:47:30 -0500
> Subject: [PATCH v1 1/2] Report wait event for cost-based vacuum delay
> 
> ---
>  doc/src/sgml/monitoring.sgml| 2 ++
>  src/backend/commands/vacuum.c   | 2 ++
>  src/backend/postmaster/pgstat.c | 3 +++
>  src/include/pgstat.h| 3 ++-
>  4 files changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
> index 5bffdcce10..46c99a55b7 100644
> --- a/doc/src/sgml/monitoring.sgml
> +++ b/doc/src/sgml/monitoring.sgml
> @@ -1507,6 +1507,8 @@ postgres   27093  0.0  0.0  30096  2752 ?Ss   
> 11:34   0:00 postgres: ser
>(pg_wal, archive or stream) before trying
>again to retrieve WAL data, at recovery.
>   
> + VacuumDelay
> + Waiting in a cost-based vacuum delay point.
>  
>  
>   IO
> diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
> index d625d17bf4..59731d687f 100644
> --- 

Re: ssl passphrase callback

2020-03-21 Thread Andrew Dunstan

On 3/21/20 9:18 AM, Andrew Dunstan wrote:
> On 3/19/20 4:10 AM, asaba.takan...@fujitsu.com wrote:
>
>
>
>> Trailing space:
>>
>> 220 +   X509v3 Subject Key Identifier:
>> 222 +   X509v3 Authority Key Identifier:
>
> We're going to remove all the text, so this becomes moot.
>
>
>> Missing "d"(password?):
>>
>> 121 +/* init hook for SSL, the default sets the passwor callback if 
>> appropriate */
>>
> Will fix, thanks.
>
>


Latest patch attached; I think all comments have been addressed. I
propose to push this later in the coming week if there are no more comments.


cheers


andrew


-- 
Andrew Dunstan                 https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

diff --git a/src/backend/libpq/be-secure-openssl.c b/src/backend/libpq/be-secure-openssl.c
index 18d3fff068..b0a8816cff 100644
--- a/src/backend/libpq/be-secure-openssl.c
+++ b/src/backend/libpq/be-secure-openssl.c
@@ -45,6 +45,9 @@
 #include "tcop/tcopprot.h"
 #include "utils/memutils.h"
 
+/* default init hook can be overridden by a shared library */
+static void  default_openssl_tls_init(SSL_CTX *context, bool isServerStart);
+openssl_tls_init_hook_typ openssl_tls_init_hook = default_openssl_tls_init;
 
 static int	my_sock_read(BIO *h, char *buf, int size);
 static int	my_sock_write(BIO *h, const char *buf, int size);
@@ -116,27 +119,10 @@ be_tls_init(bool isServerStart)
 	SSL_CTX_set_mode(context, SSL_MODE_ACCEPT_MOVING_WRITE_BUFFER);
 
 	/*
-	 * Set password callback
+	 * Call init hook (usually to set password callback)
 	 */
-	if (isServerStart)
-	{
-		if (ssl_passphrase_command[0])
-			SSL_CTX_set_default_passwd_cb(context, ssl_external_passwd_cb);
-	}
-	else
-	{
-		if (ssl_passphrase_command[0] && ssl_passphrase_command_supports_reload)
-			SSL_CTX_set_default_passwd_cb(context, ssl_external_passwd_cb);
-		else
+	(* openssl_tls_init_hook)(context, isServerStart);
 
-			/*
-			 * If reloading and no external command is configured, override
-			 * OpenSSL's default handling of passphrase-protected files,
-			 * because we don't want to prompt for a passphrase in an
-			 * already-running server.
-			 */
-			SSL_CTX_set_default_passwd_cb(context, dummy_ssl_passwd_cb);
-	}
 	/* used by the callback */
 	ssl_is_server_start = isServerStart;
 
@@ -1313,3 +1299,27 @@ ssl_protocol_version_to_openssl(int v, const char *guc_name, int loglevel)
 	GetConfigOption(guc_name, false, false;
 	return -1;
 }
+
+
+static void
+default_openssl_tls_init(SSL_CTX *context, bool isServerStart)
+{
+	if (isServerStart)
+	{
+		if (ssl_passphrase_command[0])
+			SSL_CTX_set_default_passwd_cb(context, ssl_external_passwd_cb);
+	}
+	else
+	{
+		if (ssl_passphrase_command[0] && ssl_passphrase_command_supports_reload)
+			SSL_CTX_set_default_passwd_cb(context, ssl_external_passwd_cb);
+		else
+			/*
+			 * If reloading and no external command is configured, override
+			 * OpenSSL's default handling of passphrase-protected files,
+			 * because we don't want to prompt for a passphrase in an
+			 * already-running server.
+			 */
+			SSL_CTX_set_default_passwd_cb(context, dummy_ssl_passwd_cb);
+	}
+}
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 2b9ab32293..73d278f3b2 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -972,17 +972,6 @@ PostmasterMain(int argc, char *argv[])
 	 */
 	LocalProcessControlFile(false);
 
-	/*
-	 * Initialize SSL library, if specified.
-	 */
-#ifdef USE_SSL
-	if (EnableSSL)
-	{
-		(void) secure_initialize(true);
-		LoadedSSL = true;
-	}
-#endif
-
 	/*
 	 * Register the apply launcher.  Since it registers a background worker,
 	 * it needs to be called before InitializeMaxBackends(), and it's probably
@@ -996,6 +985,17 @@ PostmasterMain(int argc, char *argv[])
 	 */
 	process_shared_preload_libraries();
 
+	/*
+	 * Initialize SSL library, if specified.
+	 */
+#ifdef USE_SSL
+	if (EnableSSL)
+	{
+		(void) secure_initialize(true);
+		LoadedSSL = true;
+	}
+#endif
+
 	/*
 	 * Now that loadable modules have had their chance to register background
 	 * workers, calculate MaxBackends.
diff --git a/src/include/libpq/libpq-be.h b/src/include/libpq/libpq-be.h
index 82e57afc64..ee57fdc301 100644
--- a/src/include/libpq/libpq-be.h
+++ b/src/include/libpq/libpq-be.h
@@ -287,6 +287,10 @@ extern void be_tls_get_peer_serial(Port *port, char *ptr, size_t len);
 extern char *be_tls_get_certificate_hash(Port *port, size_t *len);
 #endif
 
+/* init hook for SSL, the default sets the password callback if appropriate */
+typedef void(* openssl_tls_init_hook_typ)(SSL_CTX *context, bool isServerStart);
+extern openssl_tls_init_hook_typ openssl_tls_init_hook;
+
 #endif			/* USE_SSL */
 
 #ifdef ENABLE_GSS
diff --git a/src/test/modules/Makefile b/src/test/modules/Makefile
index b2eaef3bff..5f975ebcba 100644
--- a/src/test/modules/Makefile
+++ b/src/test/modules/Makefile
@@ -25,4 +25,9 @@ SUBDIRS = \
 		 

Re: Why does [auto-]vacuum delay not report a wait event?

2020-03-21 Thread Mahendra Singh Thalor
On Sat, 21 Mar 2020 at 09:38, Justin Pryzby  wrote:
>
> On Thu, Mar 19, 2020 at 03:44:49PM -0700, Andres Freund wrote:
> > But uh, unfortunately the vacuum delay code just sleeps without setting
> > a wait event:
> ...
> > Seems like it should instead use a new wait event in the PG_WAIT_TIMEOUT
> > class?
> >
> > Given how frequently we run into trouble with [auto]vacuum throttling
> > being a problem, and there not being any way to monitor that currently,
> > that seems like it'd be a significant improvement, given the effort?
>
> I see that pg_sleep sets WAIT_EVENT_PG_SLEEP, but pg_usleep doesn't, I
guess
> because the overhead is significant for a small number of usecs, and
because it
> doesn't work for pg_usleep to set a generic event if callers want to be
able to
> set a more specific wait event.
>
> Also, I noticed that SLEEP_ON_ASSERT comment (31338352b) wants to use
pg_usleep
> "which seems too short.".  Surely it should use pg_sleep, added at
782eefc58 a
> few years later ?
>
> Also, there was a suggestion recently that this should have a separate
> vacuum_progress phase:
> |vacuumlazy.c:#define VACUUM_TRUNCATE_LOCK_WAIT_INTERVAL50 /* ms
*/
> |vacuumlazy.c:pg_usleep(VACUUM_TRUNCATE_LOCK_WAIT_INTERVAL * 1000L);
>
> I was planning to look at that eventually ; should it have a wait event
instead
> or in addition ?
>
> > It'd probably also be helpful to report the total time [auto]vacuum
> > spent being delayed for vacuum verbose/autovacuum logging, but imo
> > that'd be a parallel feature to a wait event, not a replacement.
>
> This requires wider changes than I anticipated.
>
> 2020-03-20 22:35:51.308 CDT [16534] LOG:  automatic aggressive vacuum of
table "template1.pg_catalog.pg_class": index scans: 1
> pages: 0 removed, 11 remain, 0 skipped due to pins, 0 skipped
frozen
> tuples: 6 removed, 405 remain, 0 are dead but not yet removable,
oldest xmin: 1574
> buffer usage: 76 hits, 7 misses, 8 dirtied
> avg read rate: 16.378 MB/s, avg write rate: 18.718 MB/s
> Cost-based delay: 2 msec
> system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
>
> VACUUM VERBOSE wouldn't normally be run with cost_delay > 0, so that
field will
> typically be zero, so I made it conditional, which is supposedly why it's
> written like that, even though that's not otherwise being used since
17eaae98.
>
> Added at https://commitfest.postgresql.org/28/2515/
>
> --
> Justin

Thanks, Justin, for the quick patch.

I haven't reviewed your full patch, but I can see that "make installcheck"
is failing with a segmentation fault.

*Stack trace:*
Core was generated by `postgres: autovacuum worker regression
 '.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x560080cc827c in ginInsertCleanup (ginstate=0x7ffe0b648980,
full_clean=false, fill_fsm=true, forceCleanup=true, stats=0x0) at
ginfast.c:895
895             stats->delay_msec += vacuum_delay_point();
(gdb) bt
#0  0x560080cc827c in ginInsertCleanup (ginstate=0x7ffe0b648980,
full_clean=false, fill_fsm=true, forceCleanup=true, stats=0x0) at
ginfast.c:895
#1  0x560080cdd0c3 in ginvacuumcleanup (info=0x7ffe0b64b0c0, stats=0x0)
at ginvacuum.c:706
#2  0x560080d791d4 in index_vacuum_cleanup (info=0x7ffe0b64b0c0,
stats=0x0) at indexam.c:711
#3  0x560080fa790e in do_analyze_rel (onerel=0x56008259e6e0,
params=0x560082206de4, va_cols=0x0, acquirefunc=0x560080fa8a75
, relpages=25, inh=false,
in_outer_xact=false, elevel=13) at analyze.c:683
#4  0x560080fa5f3e in analyze_rel (relid=37789,
relation=0x5600822ba1a0, params=0x560082206de4, va_cols=0x0,
in_outer_xact=false, bstrategy=0x5600822064e0) at analyze.c:263
#5  0x5600810d9eb7 in vacuum (relations=0x56008227e5b8,
params=0x560082206de4, bstrategy=0x5600822064e0, isTopLevel=true) at
vacuum.c:468
#6  0x560081357608 in autovacuum_do_vac_analyze (tab=0x560082206de0,
bstrategy=0x5600822064e0) at autovacuum.c:3115
#7  0x5600813557dd in do_autovacuum () at autovacuum.c:2466
#8  0x56008135373d in AutoVacWorkerMain (argc=0, argv=0x0) at
autovacuum.c:1693
#9  0x560081352f75 in StartAutoVacWorker () at autovacuum.c:1487
#10 0x56008137ed5f in StartAutovacuumWorker () at postmaster.c:5580
#11 0x56008137e199 in sigusr1_handler (postgres_signal_arg=10) at
postmaster.c:5297
#12 
#13 0x7f18b778bff7 in __GI___select (nfds=9, readfds=0x7ffe0b64c050,
writefds=0x0, exceptfds=0x0, timeout=0x7ffe0b64bfc0) at
../sysdeps/unix/sysv/linux/select.c:41
#14 0x56008137499a in ServerLoop () at postmaster.c:1691
#15 0x560081373e63 in PostmasterMain (argc=3, argv=0x560082189020) at
postmaster.c:1400
#16 0x5600811d37ea in main (argc=3, argv=0x560082189020) at main.c:210

Here, stats is NULL, so it is crashing.

-- 
Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.com


Database recovery from tablespace only

2020-03-21 Thread Phillip Black
Hey Hackers,

We had a database running on Debian whose disks became corrupted. Currently
it is not possible to access those disks; they are unreadable.

For backup we used:

File System Level Backup
Continuous Archiving and Point-in-Time Recovery (PITR)

We used tablespaces for the entire database.

When trying to recover from the backup, we noticed that the backup was only
partially complete; here is the info:

PG_VERSION 8.2.4

base  we are missing the files

global  we are missing the files

pg_clog  we are missing the files

pg_multixact  we are missing the files

pg_subtrans  we are missing the files

pg_tblspc  All files are preserved there from a backup

pg_twophase  we are missing the files

pg_xlog  we are missing the files


Previous backups had the same error. We only have backups from the last 2
weeks.

We have a very old (about 10 years old) copy of the data/ directory.

What are the alternatives for recovery here? (besides trying to recover
disk data with specialists)

Thanks!

Phillip Black.


Re: Refactor compile-time assertion checks for C/C++

2020-03-21 Thread Tom Lane
Michael Paquier  writes:
> The fun does not stop here.  gcc is fine when using that for C and
> C++:
> #define StaticAssertStmt(condition, errmessage) \
>do { struct static_assert_struct { int static_assert_failure : (condition) 
> ? 1 : -1; }; } while(0)
> #define StaticAssertExpr(condition, errmessage) \
>((void) ({ StaticAssertStmt(condition, errmessage); }))

Hm, I'm not so sure.  I just noticed that cpluspluscheck is failing
for me now:

$ src/tools/pginclude/cpluspluscheck
In file included from /tmp/cpluspluscheck.HRgpVA/test.cpp:4:
./src/include/common/int128.h: In function 'void 
int128_add_int64_mul_int64(INT128*, int64, int64)':
./src/include/common/int128.h:180: error: types may not be defined in 'sizeof' 
expressions

which of course is pointing at

StaticAssertStmt(((int64) -1 >> 1) == (int64) -1,
 "arithmetic right shift is needed");

so the existing "C and C++" fallback StaticAssertStmt doesn't work for
older g++.  (This is g++ 4.4.7 from RHEL6.)

> But then problems come from MSVC which does not like the do{} part for
> statements, and this works:

Huh?  Surely do{} is a legal statement.

Maybe we should just revert b7f64c64d instead of putting more time
into this.  It's looking like we're going to end up with four or so
implementations no matter what, so it's getting hard to see any
real benefit.

regards, tom lane




Re: [HACKERS] WAL logging problem in 9.4.3?

2020-03-21 Thread Noah Misch
On Sat, Mar 21, 2020 at 12:01:27PM -0700, Noah Misch wrote:
> Pushed, after adding a missing "break" to gist_identify() and tweaking two
> more comments.  However, a diverse minority of buildfarm members are failing
> like this, in most branches:
> 
> Mar 21 13:16:37 #   Failed test 'wal_level = minimal, SET TABLESPACE, hint 
> bit'
> Mar 21 13:16:37 #   at t/018_wal_optimize.pl line 231.
> Mar 21 13:16:37 #  got: '1'
> Mar 21 13:16:37 # expected: '2'
> Mar 21 13:16:46 # Looks like you failed 1 test of 34.
> Mar 21 13:16:46 [13:16:46] t/018_wal_optimize.pl  
>   -- 
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=crake=2020-03-21%2016%3A52%3A05
> 
> Since I run two of the failing animals, I expect to reproduce this soon.

force_parallel_mode = regress was the setting needed to reproduce this:

  printf '%s\n%s\n%s\n' 'log_statement = all' 'force_parallel_mode = regress' >/tmp/force_parallel.conf
  make -C src/test/recovery check PROVE_TESTS=t/018_wal_optimize.pl TEMP_CONFIG=/tmp/force_parallel.conf

The proximate cause is the RelFileNodeSkippingWAL() call that we added to
MarkBufferDirtyHint().  MarkBufferDirtyHint() runs in parallel workers, but
parallel workers have zeroes for pendingSyncHash and rd_*Subid.  I hacked up
the attached patch to understand the scope of the problem (not to commit).  It
logs a message whenever a parallel worker uses pendingSyncHash or
RelationNeedsWAL().  Some of the cases happen often enough to make logs huge,
so the patch suppresses logging for them.  You can see the lower-volume calls
like this:

  printf '%s\n%s\n%s\n%s\n' 'log_statement = all' 'wal_level = minimal' 'max_wal_senders = 0' 'force_parallel_mode = regress' >/tmp/minimal_parallel.conf
  make check-world TEMP_CONFIG=/tmp/minimal_parallel.conf
  find . -name log | xargs grep -rl 'nm0 invalid'

Not all are actual bugs.  For example, get_relation_info() behaves fine:

/* Temporary and unlogged relations are inaccessible during recovery. */
if (!RelationNeedsWAL(relation) && RecoveryInProgress())

Kyotaro, can you look through the affected code and propose a strategy for
good coexistence of parallel query with the WAL skipping mechanism?

Since I don't expect one strategy to win clearly and quickly, I plan to revert
the main patch around 2020-03-22 17:30 UTC.  That will give the patch about
twenty-four hours in the buildfarm, so more animals can report in.  I will
leave the three smaller patches in place.

> fairywren failed differently on 9.5; I have not yet studied it:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren=2020-03-21%2018%3A01%3A10

This did not remain specific to 9.5.  On platforms where SIZEOF_SIZE_T==4 or
SIZEOF_LONG==4, wal_skip_threshold cannot exceed 2GB.  A simple s/1TB/1GB/ in
the test should fix this.
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index a25d539..885049d 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -7183,7 +7183,7 @@ log_heap_clean(Relation reln, Buffer buffer,
XLogRecPtr  recptr;
 
/* Caller should not call me on a non-WAL-logged relation */
-   Assert(RelationNeedsWAL(reln));
+   Assert(RelationNeedsWALKnownProblem(reln));
 
xlrec.latestRemovedXid = latestRemovedXid;
xlrec.nredirected = nredirected;
diff --git a/src/backend/access/heap/pruneheap.c 
b/src/backend/access/heap/pruneheap.c
index 1794cfd..5be469d 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -258,7 +258,7 @@ heap_page_prune(Relation relation, Buffer buffer, 
TransactionId OldestXmin,
/*
 * Emit a WAL XLOG_HEAP2_CLEAN record showing what we did
 */
-   if (RelationNeedsWAL(relation))
+   if (RelationNeedsWALKnownProblem(relation))
{
XLogRecPtr  recptr;
 
diff --git a/src/backend/access/nbtree/nbtsearch.c 
b/src/backend/access/nbtree/nbtsearch.c
index 8ff49ce..cfa72ce 100644
--- a/src/backend/access/nbtree/nbtsearch.c
+++ b/src/backend/access/nbtree/nbtsearch.c
@@ -67,7 +67,7 @@ _bt_drop_lock_and_maybe_pin(IndexScanDesc scan, BTScanPos sp)
LockBuffer(sp->buf, BUFFER_LOCK_UNLOCK);
 
if (IsMVCCSnapshot(scan->xs_snapshot) &&
-   RelationNeedsWAL(scan->indexRelation) &&
+   RelationNeedsWALKnownProblem(scan->indexRelation) &&
!scan->xs_want_itup)
{
ReleaseBuffer(sp->buf);
diff --git a/src/backend/catalog/storage.c b/src/backend/catalog/storage.c
index 0ed7c64..14fb36f 100644
--- a/src/backend/catalog/storage.c
+++ b/src/backend/catalog/storage.c
@@ -93,6 +93,8 @@ RelationCreateStorage(RelFileNode rnode, char relpersistence)
BackendId   backend;
boolneeds_wal;
 
+   PreventParallelWorker(); /* can't update pendingSyncHash */
+
switch (relpersistence)
 

Re: Ecpg dependency

2020-03-21 Thread Bruce Momjian
On Sat, Mar 21, 2020 at 07:30:48PM +, Dagfinn Ilmari Mannsåker wrote:
> Bruce Momjian  writes:
> 
> > On Sat, Mar 21, 2020 at 02:14:44PM -0400, Bruce Momjian wrote:
> >> On Tue, Mar 10, 2020 at 01:47:14PM +0100, Filip Janus wrote:
> >> > Hello,
> >> > After upgrade from 11.2 to 12.2 I found, that build of ecpg component 
> >> > depends
> >> > on pgcommon_shlib and pgport_shlib.  But build of ecpg doesn't include 
> >> > build
> >> > of pgcommon_shlib and pgport_shlib. That means, if I want to build ecpg, 
> >> > first
> >> > I need to build  pgcommon_shlib and pgport_shlib and after that I am 
> >> > able to
> >> > build ecpg.
> >> > 
> >> > I would like to ask if this behavior is expected or not ? Because 
> >> > previous
> >> > version doesn't require this separate builds.
> >> 
> >> Ah, I see the problem, and this is a new bug in PG 12.  The attached
> >> patch fixes PG 12 and master.
> >
> >> + all-lib: | submake-libpgport
> >
> > Oh, I forgot to mention I got this line from
> > src/interfaces/libpq/Makefile:
> 
> And that line is wrong, but my patch to fix it¹ seems to have fallen
> between the cracks.
> 
> [1] 
> https://www.postgresql.org/message-id/flat/871rsa13ae.fsf%40wibble.ilmari.org 
> 
> Adding the dependency to `all-lib` only fixes it for serial builds. To
> fix it properly, so it works with parallel builds (e.g. 'make -j4 -C
> src/interfaces/ecpg'), the dependency needs to be declared via
> SHLIB_PREREQS, as attached.

Oh, good catch.  I did not notice that patch before.  Adding that change
to src/interfaces/ecpg/pgtypeslib/Makefile fixes the stand-alone
compile.

The attached patch does this, and changes libpq to use it too, so
parallel Make works there too.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +
diff --git a/src/interfaces/ecpg/pgtypeslib/Makefile b/src/interfaces/ecpg/pgtypeslib/Makefile
index 530b580d7c..ae79ead7a7 100644
--- a/src/interfaces/ecpg/pgtypeslib/Makefile
+++ b/src/interfaces/ecpg/pgtypeslib/Makefile
@@ -24,6 +24,7 @@ override CFLAGS += $(PTHREAD_CFLAGS)
 
 SHLIB_LINK_INTERNAL = -lpgcommon_shlib -lpgport_shlib
 SHLIB_LINK += $(filter -lintl -lm, $(LIBS))
+SHLIB_PREREQS = submake-libpgport
 
 SHLIB_EXPORTS = exports.txt
 
diff --git a/src/interfaces/libpq/Makefile b/src/interfaces/libpq/Makefile
index a06882651f..d4919970f8 100644
--- a/src/interfaces/libpq/Makefile
+++ b/src/interfaces/libpq/Makefile
@@ -85,13 +85,12 @@ endif
 ifeq ($(PORTNAME), win32)
 SHLIB_LINK += -lshell32 -lws2_32 -lsecur32 $(filter -leay32 -lssleay32 -lcomerr32 -lkrb5_32, $(LIBS))
 endif
+SHLIB_PREREQS = submake-libpgport
 
 SHLIB_EXPORTS = exports.txt
 
 all: all-lib
 
-all-lib: | submake-libpgport
-
 # Shared library stuff
 include $(top_srcdir)/src/Makefile.shlib
 backend_src = $(top_srcdir)/src/backend


Re: Additional improvements to extended statistics

2020-03-21 Thread Tomas Vondra

On Thu, Mar 19, 2020 at 07:08:07PM +, Dean Rasheed wrote:

On Wed, 18 Mar 2020 at 19:31, Tomas Vondra  wrote:


Attached is a rebased patch series, addressing both those issues.

I've been wondering why none of the regression tests failed because of
the 0.0 vs. 1.0 issue, but I think the explanation is pretty simple - to
make the tests stable, all the MCV lists we use are "perfect" i.e. it
represents 100% of the data. But this selectivity is used to compute
selectivity only for the part not represented by the MCV list, i.e. it's
not really used. I suppose we could add a test that would use larger
MCV item, but I'm afraid that'd be inherently unstable :-(



I think it ought to be possible to write stable tests for this code
branch -- I think all you need is for the number of rows to remain
small, so that the stats sample every row and are predictable, while
the MCVs don't cover all values, which can be achieved with a mix of
some common values, and some that only occur once.

I haven't tried it, but it seems like it would be possible in principle.



Ah, right. Yeah, I think that should work. I thought there would be some
volatility due to groups randomly not making it into the MCV list, but
you're right it's possible to construct the data in a way to make it
perfectly deterministic. So that's what I've done in the attached patch.
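
A rough sketch of that shape of test data (the names here are made up, not necessarily what the attached patch uses):

  CREATE TABLE mcv_lists_partial (a int, b int);
  -- a few common values, which should make it into the MCV list
  INSERT INTO mcv_lists_partial SELECT mod(i, 10), mod(i, 10) FROM generate_series(1, 100) i;
  -- values occurring only once, so the MCV list does not cover everything
  INSERT INTO mcv_lists_partial SELECT i, i FROM generate_series(101, 150) i;
  CREATE STATISTICS mcv_lists_partial_stats (mcv) ON a, b FROM mcv_lists_partial;
  ANALYZE mcv_lists_partial;
  EXPLAIN SELECT * FROM mcv_lists_partial WHERE a = 0 OR b = 0;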



Another thing I was thinking about is the changes to the API. We need to
pass information about whether the clauses are connected by AND or OR to a
number of places, and 0001 does that in two ways. For some functions it
adds a new parameter (called is_or), and for other functions it creates
a new copy of the function. So for example

   - statext_mcv_clauselist_selectivity
   - statext_clauselist_selectivity

got the new flag, while e.g. clauselist_selectivity gets a new "copy"
sibling called clauselist_selectivity_or.

There were two reasons for not using a flag. First, clauselist_selectivity
and similar functions have to do very different stuff for these two
cases, so it'd be just one huge if/else block. Second, minimizing
breakage of third-party code - pretty much all the extensions I've seen
only work with AND clauses, and call clauselist_selectivity. Adding a
flag would break that code. (Also, there's a bit of laziness, because
this was the simplest thing to do during development.)

But I wonder if that's sufficient reason - maybe we should just add the
flag in all cases. It might break some code, but the fix is trivial (add
a false there).

Opinions?



-1

I think of clause_selectivity() and clauselist_selectivity() as the
public API that everyone is using, whilst the functions that support
lists of clauses to be combined using OR are internal (to the planner)
implementation details. I think callers of the public API tend to have
either an implicitly AND'ed list of clauses or a single OR clause.



OK, thanks. That was mostly my reasoning too - not wanting to cause
unnecessary breakage. And yes, I agree most people just call
clauselist_selectivity with a list of clauses combined using AND.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From f94929e936670d189fd628430fd73e49e410e71a Mon Sep 17 00:00:00 2001
From: Tomas Vondra 
Date: Sat, 21 Mar 2020 22:29:24 +0100
Subject: [PATCH] Improve estimation of OR clauses using extended statistics

Until now, OR clauses were estimated using extended statistics only when
the whole clause (all the arguments) are compatible. If even just one
argument was found to be incompatible, the whole clause was estimated
ignoring extended statistics. Estimation errors for OR clauses tend to
be fairly mild, so this was considered acceptable, but it may become an
issue for OR clauses with more complex arguments, etc.

This commit relaxes the restriction, using mostly the same logic as AND
clauses. We first apply extended statistics to as many arguments as
possible, and then use the (s1 + s2 - s1 * s2) formula to factor in the
remaining clauses.

The OR clause is still considered incompatible, though. If any argument
is unsupported or references variable not covered by the statistics, the
whole OR clause is incompatible. The consequence is that e.g. clauses

(a = 1) AND (b = 1 OR c = 1 OR d = 1)

can't be estimated by statistics on (a,b,c) because the OR clause also
references "d". So we'll estimate each of the AND arguments separately,
and the extended statistics will be used only to estimate the OR clause.
This may be solved by creating statistics including the "d" column, but
the issue applies to cases where the clause type is unsupported, e.g.

(a = 1) AND (b = 1 OR c = 1 OR mod(d,10) = 0)

which can't be solved by adding "d" to the statistics, at least for now.

Author: Tomas Vondra
Reviewed-by: Dean Rasheed, Thomas Munro
Discussion: https://www.postgresql.org/message-id/flat/20200113230008.g67iyk4cs3xbnjju@development
---
 

Re: plan cache overhead on plpgsql expression

2020-03-21 Thread Pavel Stehule
so 21. 3. 2020 v 19:24 odesílatel Tom Lane  napsal:

> Pavel Stehule  writes:
> > So the patch has a problem with constant casting - unfortunately the mix
> of
> > double precision variables and numeric constants is pretty often in
> > Postgres.
>
> Yeah.  I believe the cause of that is that the patch thinks it can skip
> passing an inline-function-free simple expression through the planner.
> That's flat out wrong.  Quite aside from failing to perform
> constant-folding (which is presumably the cause of the slowdown you
> spotted), that means that we miss performing such non-optional
> transformations as rearranging named-function-argument notation into
> positional order.  I didn't bother to test that but I'm sure it can be
> shown to lead to crashes.
>
> Now that I've looked at the patch I don't like it one bit from a
> structural standpoint either.  It's basically trying to make an end
> run around the plancache, which is not going to be maintainable even
> if it correctly accounted for everything the plancache does today.
> Which it doesn't.  Two big problems are:
>
> * It doesn't account for the possibility of search_path changes
> affecting the interpretation of an expression.
>
> * It assumes that the *only* things that a simple plan could get
> invalidated for are functions that were inlined.  This isn't the
> case --- a counterexample is that removal of no-op CoerceToDomain
> nodes requires the plan to be invalidated if the domain's constraints
> change.  And there are likely to be more such issues in future.
>
>
> So while there's clearly something worth pursuing here, I do not like
> anything about the way it was done.  I think that the right way to
> think about this problem is "how can the plan cache provide a fast
> path for checking validity of simple-expression plans?".  And when you
> think about it that way, there's a pretty obvious answer: if the plan
> involves no table references, there's not going to be any locks that
> have to be taken before we can check the is_valid flag.  So we can
> have a fast path that skips AcquirePlannerLocks and
> AcquireExecutorLocks, which are a big part of the problem, and we can
> also bypass some of the other random checks that GetCachedPlan has to
> make, like whether RLS affects the plan.
>
> Another chunk of the issue is the constant acquisition and release of
> reference counts on the plan.  We can't really skip that (I suspect
> there are additional bugs in Amit's patch arising from trying to do so).
> However, plpgsql already has mechanisms for paying simple-expression
> setup costs once per transaction rather than once per expression use.
> So we can set up a simple-expression ResourceOwner managed much like
> the simple-expression EState, and have it hold a refcount on the
> CachedPlan for each simple expression, and pay that overhead just once
> per transaction.
>
> So I worked on those ideas for awhile, and came up with the attached
> patchset:
>
> 0001 adds some regression tests in this area (Amit's patch fails the
> tests concerning search_path changes).
>
> 0002 does what's suggested above.  I also did a little bit of marginal
> micro-tuning in exec_eval_simple_expr() itself.
>
> 0003 improves the biggest remaining cost of validity rechecking,
> which is verifying that the search_path is the same as it was when
> the plan was cached.
>
> I haven't done any serious performance testing on this, but it gives
> circa 2X speedup on Pavel's original example, which is at least
> fairly close to the results that Amit's patch got there.  And it
> makes this last batch of test cases faster not slower, too.
>
> With this patch, perf shows the hotspots on Pavel's original example
> as being
>
> +   19.24%19.17% 46470  postmaster   plpgsql.so
>[.] exec_eval_expr
> +   15.19%15.15% 36720  postmaster   plpgsql.so
>[.] plpgsql_param_eval_var
> +   14.98%14.94% 36213  postmaster   postgres
>[.] ExecInterpExpr
> +6.32% 6.30% 15262  postmaster   plpgsql.so
>[.] exec_stmt
> +6.08% 6.06% 14681  postmaster   plpgsql.so
>[.] exec_assign_value
>
> Maybe there's more that could be done to knock fat out of
> exec_eval_expr and/or plpgsql_param_eval_var, but at least
> the plan cache isn't the bottleneck anymore.
>

I tested Tom's patches, and I can confirm these results.

It doesn't break the tests (and all plpgsql_check tests passed without
problems).

The high overhead of ExecInterpExpr is related to preparing fcinfo and
checking arguments for NULL, because all the functions are strict. In
plpgsql_param_eval_var, the expensive parts look like var = (PLpgSQL_var *)
estate->datums[dno] and *op->resvalue = var->value;

It looks great.
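
For reference, the kind of pattern this is all about -- not Pavel's exact
benchmark, just a hypothetical sketch -- is a tight loop over a simple
expression, e.g. mixing a double precision variable with numeric constants
as mentioned above:

  DO $$
  DECLARE
    s double precision := 0;
  BEGIN
    FOR i IN 1..1000000 LOOP
      s := s + 0.1;   -- numeric constant; its cast to float8 should be constant-folded at plan time
    END LOOP;
  END $$;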

Pavel



>
> regards, tom lane
>
>


Re: SQL/JSON: functions

2020-03-21 Thread Pavel Stehule
so 21. 3. 2020 v 11:07 odesílatel Nikita Glukhov 
napsal:

> Attached 46th version of the patches.
>
>
> On 20.03.2020 22:34, Pavel Stehule wrote:
>
>
> čt 19. 3. 2020 v 23:57 odesílatel Nikita Glukhov 
> napsal:
>
>> Attached 45th version of the patches.
>>
>> Nodes JsonFormat, JsonReturning, JsonPassing, JsonBehavior were fixed.
>>
>>
>> On 17.03.2020 21:35, Pavel Stehule wrote:
>>
>>
>> út 17. 3. 2020 v 1:55 odesílatel Nikita Glukhov 
>> napsal:
>>
>>> Attached 44th version of the patches.
>>>
>>>
>>> On 12.03.2020 16:41, Pavel Stehule wrote:
>>>
>>>
>>> On 12.03.2020 00:09 Nikita Glukhov wrote:
>>>
 Attached 43rd version of the patches.

 The previous patch #4 ("Add function formats") was removed.
 Instead, introduced new executor node JsonCtorExpr which is used to wrap
 SQL/JSON constructor function calls (FuncExpr, Aggref, WindowFunc).

 Also, JsonIsPredicate node began to be used as executor node.
 This helped to remove unnecessary json[b]_is_valid() user functions.


>>> It looks very well - all tests passed, code looks well.
>>>
>>> Now, when there are special nodes, then the introduction of
>>> COERCE_INTERNAL_CAST is not necessary, and it is only my one and last
>>> objection again this patch's set.
>>>
>>> I have removed patch #3 with COERCE_INTERNAL_CAST too.
>>>
>>> Coercions in JsonValueExpr in JsonCtorExpr, which were previously hidden 
>>> with
>>> COERCE_INTERNAL_CAST and which were outputted as RETURNING or FORMAT JSON
>>> clauses, now moved into separate expressions.
>>>
>>> I am looking on the code, and although the code is correct it doesn't
>> look well (consistently with other node types).
>>
>> I think so JsonFormat and JsonReturning should be node types, not just
>> structs. If these types will be nodes, then you can simplify _readJsonExpr
>> and all node operations on this node.
>>
>>
>> JsonFormat and JsonReturning was transformed into nodes, and not included
>> directly into other nodes now.
>>
>>
>> User functions json[b]_build_object_ext() and json[b]_build_array_ext() also
>>> can be easily removed.   But it seems harder to remove new aggregate 
>>> functions
>>> json[b]_objectagg() and json[b]_agg_strict(), because they can't be called
>>> directly from JsonCtorExpr node.
>>>
>>>
>> I don't see reasons for another reduction now. Can be great if you can
>> finalize work what you plan for pg13.
>>
>> I have removed json[b]_build_object_ext() and json[b]_build_array_ext().
>
> But json[b]_objectagg() and json[b]_agg_strict() are still present.
> It seems that removing them requires majors refactoring of the execution
> of Aggref and WindowFunc nodes.
>
>
> +<->READ_ENUM_FIELD(on_error.btype, JsonBehaviorType);
>> +<->READ_NODE_FIELD(on_error.default_expr);
>> +<->READ_ENUM_FIELD(on_empty.btype, JsonBehaviorType);
>> +<->READ_NODE_FIELD(on_empty.default_expr);
>>
>> JsonBehavior is node type. Then why you don't write just
>>
>> READ_NODE_FIELD(on_error);
>> READ_NODE_FIELD(on_empty)
>>
>> ??
>>
>> JsonBehavior now used in JsonExpr as a pointer to node.
>>
>>
>> And maybe the code can be better if you don't use JsonPassing struct (or
>> use it only inside gram.y) and pass just List *names, List *values.
>>
>> Nodes should contain other nodes or scalar types. Using structs
>> (that are not nodes) inside doesn't look consistent.
>>
>> JsonPassing was replaced with two Lists.
>>
>>
>> I found some not finished code in 0003 patch
>> +
>> +json_name_and_value:
>> +/* TODO
>> +<-><--><-->KEY c_expr VALUE_P json_value_expr %prec POSTFIXOP
>> +<-><--><--><-->{ $$ = makeJsonKeyValue($2, $4); }
>> +<-><--><-->|
>> +*/
>>
>> This is an unsupported variant of the standard syntax, because it seems to lead
>> to unresolvable conflicts.  The only supported syntax is:
>>
>> JSON_OBJECT(key : value)
>> JSON_OBJECT(key VALUE value)
>>
>> ok. So please change comment from ToDo to this explanation. Maybe note in
> doc (unimplemented features can be good idea)
>
> Some of these unresolvable conflicts are solved with extra parentheses in
> Postgres.
>
>
>>
>> The support for json type in jsonpath also seems to be immature, so I will 
>> try
>>> to remove it in the next iteration.
>>>
>>> What do you think? This patch is a little bit off topic, so if you don't
>> need it, then it can be removed. Is there some dependency for "jsontable"?
>>
>> There is a dependency in SQL/JSON query functions executor, because executor
>> uses new structure JsonItem instead of plain JsonbValue.  I will try to
>> preserve refactoring with JsonItem introduction, but remove json support.
>> If it will be still unacceptable, I will try to completely remove patch #1.
>>
>> I have completely removed patch #1. It turned out to be not so difficult.
>
>
>
> I have much better feeling from version 45 (now it looks like Postgres C
> :)). Still there are some small issues.
>
> 1. commented code
>
> +json_encoding:
> +<-><--><-->name<--><--><--><--><--><--><--><--><-->{ $$ =
> 

Re: Ecpg dependency

2020-03-21 Thread Dagfinn Ilmari Mannsåker
Bruce Momjian  writes:

> On Sat, Mar 21, 2020 at 02:14:44PM -0400, Bruce Momjian wrote:
>> On Tue, Mar 10, 2020 at 01:47:14PM +0100, Filip Janus wrote:
>> > Hello,
>> > After upgrade from 11.2 to 12.2 I found, that build of ecpg component 
>> > depends
>> > on pgcommon_shlib and pgport_shlib.  But build of ecpg doesn't include 
>> > build
>> > of pgcommon_shlib and pgport_shlib. That means, if I want to build ecpg, 
>> > first
>> > I need to build  pgcommon_shlib and pgport_shlib and after that I am able 
>> > to
>> > build ecpg.
>> > 
>> > I would like to ask if this behavior is expected or not ? Because previous
>> > version doesn't require this separate builds.
>> 
>> Ah, I see the problem, and this is a new bug in PG 12.  The attached
>> patch fixes PG 12 and master.
>
>> + all-lib: | submake-libpgport
>
> Oh, I forgot to mention I got this line from
> src/interfaces/libpq/Makefile:

And that line is wrong, but my patch to fix it¹ seems to have fallen
between the cracks.

[1] 
https://www.postgresql.org/message-id/flat/871rsa13ae.fsf%40wibble.ilmari.org 

Adding the dependency to `all-lib` only fixes it for serial builds. To
fix it properly, so it works with parallel builds (e.g. 'make -j4 -C
src/interfaces/ecpg'), the dependency needs to be declared via
SHLIB_PREREQS, as attached.

- ilmari
-- 
- Twitter seems more influential [than blogs] in the 'gets reported in
  the mainstream press' sense at least.   - Matt McLeod
- That'd be because the content of a tweet is easier to condense down
  to a mainstream media article.  - Calle Dybedahl

>From fb9e077308433970588505604d79ab21e7b1404a Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Dagfinn=20Ilmari=20Manns=C3=A5ker?= 
Date: Sat, 21 Mar 2020 19:29:51 +
Subject: [PATCH] Add missing libpgport prereq in ecpg's pgtypeslib

---
 src/interfaces/ecpg/pgtypeslib/Makefile | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/interfaces/ecpg/pgtypeslib/Makefile b/src/interfaces/ecpg/pgtypeslib/Makefile
index 530b580d7c..6094a8d08a 100644
--- a/src/interfaces/ecpg/pgtypeslib/Makefile
+++ b/src/interfaces/ecpg/pgtypeslib/Makefile
@@ -25,6 +25,8 @@ override CFLAGS += $(PTHREAD_CFLAGS)
 SHLIB_LINK_INTERNAL = -lpgcommon_shlib -lpgport_shlib
 SHLIB_LINK += $(filter -lintl -lm, $(LIBS))
 
+SHLIB_PREREQS = submake-libpgport
+
 SHLIB_EXPORTS = exports.txt
 
 OBJS = \
-- 
2.20.1



Re: [HACKERS] WAL logging problem in 9.4.3?

2020-03-21 Thread Bruce Momjian


Wow, this thread started in 2015.  :-O

Date: Fri, 3 Jul 2015 00:05:24 +0200

---

On Sat, Mar 21, 2020 at 12:01:27PM -0700, Noah Misch wrote:
> On Sun, Mar 15, 2020 at 08:46:47PM -0700, Noah Misch wrote:
> > On Wed, Mar 04, 2020 at 04:29:19PM +0900, Kyotaro Horiguchi wrote:
> > > The attached is back-patches from 9.5 through master.
> > 
> > Thanks.  I've made some edits.  I'll plan to push the attached patches on
> > Friday or Saturday.
> 
> Pushed, after adding a missing "break" to gist_identify() and tweaking two
> more comments.  However, a diverse minority of buildfarm members are failing
> like this, in most branches:
> 
> Mar 21 13:16:37 #   Failed test 'wal_level = minimal, SET TABLESPACE, hint 
> bit'
> Mar 21 13:16:37 #   at t/018_wal_optimize.pl line 231.
> Mar 21 13:16:37 #  got: '1'
> Mar 21 13:16:37 # expected: '2'
> Mar 21 13:16:46 # Looks like you failed 1 test of 34.
> Mar 21 13:16:46 [13:16:46] t/018_wal_optimize.pl  
>   -- 
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=crake=2020-03-21%2016%3A52%3A05
> 
> Since I run two of the failing animals, I expect to reproduce this soon.
> 
> fairywren failed differently on 9.5; I have not yet studied it:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren=2020-03-21%2018%3A01%3A10
> 
> > > It lacks a part of TAP infrastructure nowadays we have, but I
> > > want to have the test (and it actually found a bug I made during this
> > > work). So I added a patch to back-patch TestLib.pm, PostgresNode.pm
> > > and RecursiveCopy.pm along with 018_wal_optimize.pl.
> > > (0004-Add-TAP-test-for-WAL-skipping-feature.patch)
> > 
> > That is a good idea.  Rather than make it specific to this test, I would 
> > like
> > to back-patch all applicable test files from 9.6 src/test/recovery.  I'll 
> > plan
> > to push that one part on Thursday.
> 
> That push did not cause failures.
> 
> 

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: [HACKERS] WAL logging problem in 9.4.3?

2020-03-21 Thread Noah Misch
On Sun, Mar 15, 2020 at 08:46:47PM -0700, Noah Misch wrote:
> On Wed, Mar 04, 2020 at 04:29:19PM +0900, Kyotaro Horiguchi wrote:
> > The attached is back-patches from 9.5 through master.
> 
> Thanks.  I've made some edits.  I'll plan to push the attached patches on
> Friday or Saturday.

Pushed, after adding a missing "break" to gist_identify() and tweaking two
more comments.  However, a diverse minority of buildfarm members are failing
like this, in most branches:

Mar 21 13:16:37 #   Failed test 'wal_level = minimal, SET TABLESPACE, hint bit'
Mar 21 13:16:37 #   at t/018_wal_optimize.pl line 231.
Mar 21 13:16:37 #  got: '1'
Mar 21 13:16:37 # expected: '2'
Mar 21 13:16:46 # Looks like you failed 1 test of 34.
Mar 21 13:16:46 [13:16:46] t/018_wal_optimize.pl  
  -- 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=crake=2020-03-21%2016%3A52%3A05

Since I run two of the failing animals, I expect to reproduce this soon.

fairywren failed differently on 9.5; I have not yet studied it:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren=2020-03-21%2018%3A01%3A10

> > It lacks a part of TAP infrastructure nowadays we have, but I
> > want to have the test (and it actually found a bug I made during this
> > work). So I added a patch to back-patch TestLib.pm, PostgresNode.pm
> > and RecursiveCopy.pm along with 018_wal_optimize.pl.
> > (0004-Add-TAP-test-for-WAL-skipping-feature.patch)
> 
> That is a good idea.  Rather than make it specific to this test, I would like
> to back-patch all applicable test files from 9.6 src/test/recovery.  I'll plan
> to push that one part on Thursday.

That push did not cause failures.




Re: Ecpg dependency

2020-03-21 Thread Bruce Momjian
On Sat, Mar 21, 2020 at 02:14:44PM -0400, Bruce Momjian wrote:
> On Tue, Mar 10, 2020 at 01:47:14PM +0100, Filip Janus wrote:
> > Hello,
> > After upgrade from 11.2 to 12.2 I found, that build of ecpg component 
> > depends
> > on pgcommon_shlib and pgport_shlib.  But build of ecpg doesn't include build
> > of pgcommon_shlib and pgport_shlib. That means, if I want to build ecpg, 
> > first
> > I need to build  pgcommon_shlib and pgport_shlib and after that I am able to
> > build ecpg.
> > 
> > I would like to ask if this behavior is expected or not ? Because previous
> > version doesn't require this separate builds.
> 
> Ah, I see the problem, and this is a new bug in PG 12.  The attached
> patch fixes PG 12 and master.

> + all-lib: | submake-libpgport

Oh, I forgot to mention I got this line from
src/interfaces/libpq/Makefile:

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: plan cache overhead on plpgsql expression

2020-03-21 Thread Tom Lane
Pavel Stehule  writes:
> So the patch has a problem with constant casting - unfortunately the mix of
> double precision variables and numeric constants is pretty often in
> Postgres.

Yeah.  I believe the cause of that is that the patch thinks it can skip
passing an inline-function-free simple expression through the planner.
That's flat out wrong.  Quite aside from failing to perform
constant-folding (which is presumably the cause of the slowdown you
spotted), that means that we miss performing such non-optional
transformations as rearranging named-function-argument notation into
positional order.  I didn't bother to test that but I'm sure it can be
shown to lead to crashes.

Now that I've looked at the patch I don't like it one bit from a
structural standpoint either.  It's basically trying to make an end
run around the plancache, which is not going to be maintainable even
if it correctly accounted for everything the plancache does today.
Which it doesn't.  Two big problems are:

* It doesn't account for the possibility of search_path changes
affecting the interpretation of an expression.

* It assumes that the *only* things that a simple plan could get
invalidated for are functions that were inlined.  This isn't the
case --- a counterexample is that removal of no-op CoerceToDomain
nodes requires the plan to be invalidated if the domain's constraints
change.  And there are likely to be more such issues in future.


So while there's clearly something worth pursuing here, I do not like
anything about the way it was done.  I think that the right way to
think about this problem is "how can the plan cache provide a fast
path for checking validity of simple-expression plans?".  And when you
think about it that way, there's a pretty obvious answer: if the plan
involves no table references, there's not going to be any locks that
have to be taken before we can check the is_valid flag.  So we can
have a fast path that skips AcquirePlannerLocks and
AcquireExecutorLocks, which are a big part of the problem, and we can
also bypass some of the other random checks that GetCachedPlan has to
make, like whether RLS affects the plan.
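
To illustrate the idea (a conceptual sketch only, with hypothetical names;
this is not the attached patch): once the saved plan is known to reference
no relations, revalidation reduces to checking a couple of flags on
structures we already hold.

#include "postgres.h"
#include "utils/plancache.h"

/*
 * Hypothetical fast-path validity check for a "simple" cached plan.
 * Assumes the plan source was verified, when it was saved, to reference
 * no relations, so no locks need to be taken before trusting the flags.
 */
static bool
simple_plan_is_valid(CachedPlanSource *plansource, CachedPlan *cplan)
{
	/* sinval callbacks clear these flags on any relevant invalidation */
	if (!plansource->is_valid)
		return false;
	if (cplan == NULL || !cplan->is_valid)
		return false;
	return true;
}

If either flag has been cleared, the caller falls back to the regular
GetCachedPlan path.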

Another chunk of the issue is the constant acquisition and release of
reference counts on the plan.  We can't really skip that (I suspect
there are additional bugs in Amit's patch arising from trying to do so).
However, plpgsql already has mechanisms for paying simple-expression
setup costs once per transaction rather than once per expression use.
So we can set up a simple-expression ResourceOwner managed much like
the simple-expression EState, and have it hold a refcount on the
CachedPlan for each simple expression, and pay that overhead just once
per transaction.

So I worked on those ideas for awhile, and came up with the attached
patchset:

0001 adds some regression tests in this area (Amit's patch fails the
tests concerning search_path changes).

0002 does what's suggested above.  I also did a little bit of marginal
micro-tuning in exec_eval_simple_expr() itself.

0003 improves the biggest remaining cost of validity rechecking,
which is verifying that the search_path is the same as it was when
the plan was cached.

I haven't done any serious performance testing on this, but it gives
circa 2X speedup on Pavel's original example, which is at least
fairly close to the results that Amit's patch got there.  And it
makes this last batch of test cases faster not slower, too.

With this patch, perf shows the hotspots on Pavel's original example
as being

+   19.24%19.17% 46470  postmaster   plpgsql.so 
  [.] exec_eval_expr
+   15.19%15.15% 36720  postmaster   plpgsql.so 
  [.] plpgsql_param_eval_var
+   14.98%14.94% 36213  postmaster   postgres   
  [.] ExecInterpExpr
+6.32% 6.30% 15262  postmaster   plpgsql.so 
  [.] exec_stmt
+6.08% 6.06% 14681  postmaster   plpgsql.so 
  [.] exec_assign_value

Maybe there's more that could be done to knock fat out of
exec_eval_expr and/or plpgsql_param_eval_var, but at least
the plan cache isn't the bottleneck anymore.

regards, tom lane

diff --git a/src/pl/plpgsql/src/Makefile b/src/pl/plpgsql/src/Makefile
index 70a9c34..193df8a 100644
--- a/src/pl/plpgsql/src/Makefile
+++ b/src/pl/plpgsql/src/Makefile
@@ -33,8 +33,8 @@ DATA = plpgsql.control plpgsql--1.0.sql
 REGRESS_OPTS = --dbname=$(PL_TESTDB)
 
 REGRESS = plpgsql_call plpgsql_control plpgsql_copy plpgsql_domain \
-	plpgsql_record plpgsql_cache plpgsql_transaction plpgsql_trap \
-	plpgsql_trigger plpgsql_varprops
+	plpgsql_record plpgsql_cache plpgsql_simple plpgsql_transaction \
+	plpgsql_trap plpgsql_trigger plpgsql_varprops
 
 # where to find gen_keywordlist.pl and subsidiary files
 TOOLSDIR = $(top_srcdir)/src/tools
diff --git a/src/pl/plpgsql/src/expected/plpgsql_simple.out 

Re: Ecpg dependency

2020-03-21 Thread Bruce Momjian
On Tue, Mar 10, 2020 at 01:47:14PM +0100, Filip Janus wrote:
> Hello,
> After upgrade from 11.2 to 12.2 I found, that build of ecpg component depends
> on pgcommon_shlib and pgport_shlib.  But build of ecpg doesn't include build
> of pgcommon_shlib and pgport_shlib. That means, if I want to build ecpg, first
> I need to build  pgcommon_shlib and pgport_shlib and after that I am able to
> build ecpg.
> 
> I would like to ask if this behavior is expected or not ? Because previous
> version doesn't require this separate builds.

Ah, I see the problem, and this is a new bug in PG 12.  The attached
patch fixes PG 12 and master.
   
-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +
diff --git a/src/interfaces/ecpg/pgtypeslib/Makefile b/src/interfaces/ecpg/pgtypeslib/Makefile
new file mode 100644
index 530b580..6ce129b
*** a/src/interfaces/ecpg/pgtypeslib/Makefile
--- b/src/interfaces/ecpg/pgtypeslib/Makefile
*** OBJS = \
*** 38,43 
--- 38,45 
  
  all: all-lib
  
+ all-lib: | submake-libpgport
+ 
  # Shared library stuff
  include $(top_srcdir)/src/Makefile.shlib
  


Re: Auxiliary Processes and MyAuxProc

2020-03-21 Thread Peter Eisentraut

On 2020-03-19 14:29, Mike Palmiotto wrote:

More specifically, I don't agree with the wholesale renaming of
auxiliary process to subprocess.  Besides the massive code churn, the


I'm not sure I understand the argument here. Where do you see
wholesale renaming of AuxProc to Subprocess? Subprocess is the name
for postmaster subprocesses, whereas Auxiliary Processes are a subset
of those processes.


Specifically, renaming AuxProcType to SubprocessType and everything that
follows from that, including the names of the new files.


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




GSoC chat link not working

2020-03-21 Thread Ananya Srivastava
Hey Folks!
The irc://irc.freenode.net/postgresql link is not working and I am not able to
use the chat option to clear up some doubts. Here is a screenshot if you require it.

Thanking you in advance
Ananya Srivastava
[image: image.png]


Re: Add A Glossary

2020-03-21 Thread Justin Pryzby
On Sat, Mar 21, 2020 at 03:08:30PM +0100, Jürgen Purtz wrote:
> On 21.03.20 00:03, Justin Pryzby wrote:
> > > > > +   
> > > > > +Host
> > > > > +
> > > > > + 
> > > > > +  See Server.
> > > > Or client.  Or proxy at some layer or other intermediate thing.  Maybe 
> > > > just
> > > > remove this.
> > > Sometimes the term "host" is used in a different meaning. Therefor we 
> > > shall
> > > have this glossary entry for clarification that it shall be used only in 
> > > the
> > > sense of a "server".
> > I think that suggests just removing "host" and consistently saying "server".
> 
> "server", "host", "database server": All three terms are used intensively in
> the documentation. When we define glossary terms, we should also take into
> account the consequences for those parts.

The documentation uses "host", but doesn't always mean "server".

$ git grep -Fw host doc/src/
doc/src/sgml/backup.sgml:   that you can perform this backup procedure from any 
remote host that has

pg_hba appears to be all about client "hosts".
FATAL:  no pg_hba.conf entry for host "123.123.123.123", user "andym", database 
"testdb"

-- 
Justin




Re: GSoC applicant proposal, Uday PB

2020-03-21 Thread Chapman Flack
On 03/21/20 10:36, inout wrote:

> Uday, how about starting work on your proposal and claimed interest in
> PostgreSQL without worrying about the GSoC money?

Naturally I'd be pleased with that outcome, but it's a purely personal
and situational decision that I wouldn't presume to press.

Regards,
-Chap




Re: Internal key management system

2020-03-21 Thread Bruce Momjian
On Sat, Mar 21, 2020 at 10:01:02AM -0400, Bruce Momjian wrote:
> On Sat, Mar 21, 2020 at 02:12:46PM +0900, Masahiko Sawada wrote:
> > On Sat, 21 Mar 2020 at 05:30, Bruce Momjian  wrote:
> > > We should create an SQL-level master key that is different from the
> > > block-level master key.  By using separate keys, and not deriving them
> > > from a single key, they keys can be rotated and migrated to a different
> > > cluster independently.  For example, users might want to create a new
> > > cluster with a new block-level key, but might want to copy the SQL-level
> > > key from the old cluster to the new cluster.  Both keys would be
> > > unlocked with the same passphrase.
> > 
> > I've updated the patch according to yesterday's meeting. As the above
> > description by Bruce, the current patch have two encryption keys.
> > Previously we have the master key in pg_control but due to exceeding
> > the safe size limit of pg_control I moved two keys to the dedicated
> > file located at global/pg_key. A wrapped key is 128 bytes and the
> > total size including two wrapped key became 552 bytes while safe limit
> > is 512 bytes.
> > 
> > During pg_upgrade we copy the key file from the old cluster to the new
> > cluster. Therefore we can unwrap the data that is wrapped on the old
> > cluster on the new cluster.
> 
> I wonder if we should just use two files, one for each key.

Actually, I think we need three files:

*  TDE WAL key file
*  TDE block key file
*  SQL-level file

Primaries and standbys have to use the same TDE WAL key file, but can
use different TDE block key files to allow for key rotation, so having
separate files makes sense --- maybe they need to be in their own
directory.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: GSoC applicant proposal, Uday PB

2020-03-21 Thread inout
>  ---Original Message---
>  From: Chapman Flack 
>  
>  p.b uday, thank you for your interest and the time spent on your
>  proposal; I'm sorry to be withdrawing this project. I hope you might
>  continue to have interest in PostgreSQL generally or PL/Java specifically
>  in the future.
>  
>  Regards,
>  -Chap
>  

Uday, how about starting work on your proposal and claimed interest in 
PostgreSQL without worrying about the GSoC money?




Re: GSoC applicant proposal, Uday PB

2020-03-21 Thread Chapman Flack
On 03/19/20 15:31, Stephen Frost wrote:
>> So I'm not certain how it should be categorized, or whether GSoC
>> rules should preclude it. Judgment call?
>
> You could ask on the GSoC mentors list, but I feel pretty confident that
> this doesn't meet the criteria to be a GSoC project, unfortunately.

It did produce a useful thread on the mentors list, with views like
"while GSOC project doesn't allow project about *WRITING* documentation,
I don't see any reason why coding some solution which improves your
release cycles for eternity wouldn't qualify as a valid GSOC project" and
"if it's just writing a few scripts that will take a week and the remainder
is documenting that work, I'm fairly certain that doesn't count as a valid
GSoC. Now if it's code-heavy -- then definitely that counts as a valid
GSoC project."

At the same time, an off-list response added "However if the org admins
don't like the project, or prefer other projects instead of it, the
project has low chances of being selected" and, given that there doesn't
seem to be clear agreement, and the organization surely will have more
worthy projects than slots will be available for, it is probably best
that I withdraw this one for now.

p.b uday, thank you for your interest and the time spent on your
proposal; I'm sorry to be withdrawing this project. I hope you might
continue to have interest in PostgreSQL generally or PL/Java specifically
in the future.

Regards,
-Chap




Re: Add A Glossary

2020-03-21 Thread Jürgen Purtz

On 21.03.20 00:03, Justin Pryzby wrote:

+   
+Host
+
+ 
+  See Server.

Or client.  Or proxy at some layer or other intermediate thing.  Maybe just
remove this.

Sometimes the term "host" is used in a different meaning. Therefor we shall
have this glossary entry for clarification that it shall be used only in the
sense of a "server".

I think that suggests just removing "host" and consistently saying "server".


"server", "host", "database server": All three terms are used 
intensively in the documentation. When we define glossary terms, we 
should also take into account the consequences for those parts. 
"database server" is the most diffuse. E.g.: In 'config.sgml' he is used 
in the sense of some hardware or VM "... If you have a dedicated 
database server with 1GB or more of RAM ..." as well as in the sense of 
an instance "... To start the database server on the command prompt 
...". Additionally the term is completely misleading. In both cases we 
do not mean something which is related to a database but something which 
is related to a cluster.


In the past, people accepted such blurs. My - minimal - intention is to 
raise awareness of such ambiguities, or - better - to clearly define the 
situation in the glossary. But this is only a first step. The second 
step shall be a rework of the documentation to use the preferred terms 
defined in the glossary. Because there will be a time gap between the 
two steps, we may want to be a little chatty in the glossary and define 
ambiguous terms as shown in the following example:


---

Server: The term "Server" denotes   .

Host: An outdated term which will be replaced by 
Server over time.


Database Server: An outdated term which sometimes denotes a 
Server and sometimes an 
Instance.


---

This is a pattern for all terms which we currently describe with the 
phrase "See ...". Later, after the documentation has been reviewed and 
the non-preferred terms eliminated, the glossary entries with "An 
outdated term ..." can be dropped.


Over the last few days we have seen many proposals, both large and small. 
I think it will be helpful to consolidate this work by waiting for a patch 
from Alvaro containing everything he deems useful from his point of view.


Kind regards, Jürgen




Re: Internal key management system

2020-03-21 Thread Bruce Momjian
On Sat, Mar 21, 2020 at 02:12:46PM +0900, Masahiko Sawada wrote:
> On Sat, 21 Mar 2020 at 05:30, Bruce Momjian  wrote:
> > We should create an SQL-level master key that is different from the
> > block-level master key.  By using separate keys, and not deriving them
> > from a single key, they keys can be rotated and migrated to a different
> > cluster independently.  For example, users might want to create a new
> > cluster with a new block-level key, but might want to copy the SQL-level
> > key from the old cluster to the new cluster.  Both keys would be
> > unlocked with the same passphrase.
> 
> I've updated the patch according to yesterday's meeting. As the above
> description by Bruce, the current patch have two encryption keys.
> Previously we have the master key in pg_control but due to exceeding
> the safe size limit of pg_control I moved two keys to the dedicated
> file located at global/pg_key. A wrapped key is 128 bytes and the
> total size including two wrapped key became 552 bytes while safe limit
> is 512 bytes.
> 
> During pg_upgrade we copy the key file from the old cluster to the new
> cluster. Therefore we can unwrap the data that is wrapped on the old
> cluster on the new cluster.

I wonder if we should just use two files, one for each key.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: color by default

2020-03-21 Thread Isaac Morland
On Sat, 21 Mar 2020 at 00:25, Jonah H. Harris 
wrote:

> On Tue, Dec 31, 2019 at 8:35 AM Tom Lane  wrote:
>
>> Peter Eisentraut  writes:
>> > With the attached patch, I propose to enable the colored output by
>> > default in PG13.
>>
>> FWIW, I shall be setting NO_COLOR permanently if this gets committed.
>> I wonder how many people there are who actually *like* colored output?
>> I find it to be invariably less readable than plain B text.
>>
>
> Same.
>

For me it depends on what the colour is doing. I was very pleased when I
first saw coloured output from ls, which if I remember correctly repeats
the information provided by -F but more prominently. Similarly, I
appreciate diff output that highlights the differences. At the same time I
can appreciate a preference for "just plain text please".


> I may well be in the minority, but I think some kind of straw poll
>> might be advisable, rather than doing this just because.
>>
>
> +1 on no color by default.
>

Given that there is apparently a standard NO_COLOR environment variable
which can be set, I think it's reasonable to default to gentle use of
colour, but turn it off if the standard variable is set. Somebody who wants
no colour will probably already have the variable set, so for those people
plain output would effectively already be the default (not the default
default, but the de facto default).
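
A minimal standalone sketch of that behaviour (just an illustration, not the
actual common/logging.c code; per the no-color.org convention, any non-empty
NO_COLOR value asks for plain output):

#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>

static bool
use_color(void)
{
	const char *no_color = getenv("NO_COLOR");

	/* colorize by default, but back off if NO_COLOR is set to anything */
	return (no_color == NULL || no_color[0] == '\0');
}

int
main(void)
{
	if (use_color())
		printf("\x1b[31mcolored\x1b[0m output\n");
	else
		printf("plain output\n");
	return 0;
}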


Re: ssl passphrase callback

2020-03-21 Thread Andrew Dunstan


On 3/19/20 4:10 AM, asaba.takan...@fujitsu.com wrote:



> Trailing space:
>
> 220 +   X509v3 Subject Key Identifier:
> 222 +   X509v3 Authority Key Identifier:


We're going to remove all the text, so this becomes moot.


>
> Missing "d"(password?):
>
> 121 +/* init hook for SSL, the default sets the passwor callback if 
> appropriate */
>

Will fix, thanks.


cheers


andrew


-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: ssl passphrase callback

2020-03-21 Thread Andrew Dunstan


On 3/15/20 10:14 PM, Andreas Karlsson wrote:
> On 2/18/20 11:39 PM, Andrew Dunstan wrote:
>> This should fix the issue, it happened when I switched to using a
>> pre-generated cert/key.
>
> # Review
>
> The patch still applies and passes the test suite, both with openssl
> enabled and with it disabled.
>
> As for the feature I agree that it is nice to expose this callback to
> extension writers and I agree with the approach taken. The other
> proposals up-thread seem over engineered to me. Maybe if it was a
> general feature used in many places those proposals would be worth it,
> but I am still skeptical even then. This approach is so much simpler.
>
> The only real risk I see is that if people install multiple libraries
> for this they will overwrite the hook for each other but we have other
> cases like that already so I think that is fine.


Right, me too.


>
> The patch moves secure_initialize() to after
> process_shared_preload_libraries() which in theory could break some
> extension but it does not seem like a likely thing for extensions to
> rely on. Or is it?


I don't think so.


>
> An idea would be to have the code in ssl_passphrase_func.c to warn if
> the ssl_passphrase_command GUC is set to make it more useful as
> example code for people implementing this hook.


I'll look at that. Should be possible.


>
> # Nitpicking
>
> The certificate expires in 2030 while all other certificates used in
> tests expires in 2046. Should we be consistent?


Sure. will fix.


>
> There is text in server.crt and server.key, while other certificates
> and keys used in the tests do not have this. Again, should we be
> consistent?


Not in server.key, but I will suppress it for the crt file.



>
> Empty first line in
> src/test/modules/ssl_passphrase_callback/t/001_testfunc.pl which
> should probably just be removed or replaced with a shebang.


OK


>
> There is an extra space between the parentheses in the line below.
> Does that follow our code style for Perl?
>
> +unless ( ($ENV{with_openssl} || 'no') eq 'yes')
>
> Missing space after comma in:
>
> +ok(-e "$ddir/postmaster.pid","postgres started");
>
> Missing space after comma in:
>
> +ok(! -e "$ddir/postmaster.pid","postgres not started with bad
> passphrase");


I'll make sure to run it through our perl indenter.


Thanks for the review.


cheers


andrew


-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: backup manifests

2020-03-21 Thread Amit Kapila
On Sat, Mar 21, 2020 at 4:00 AM Robert Haas  wrote:
>
> On Mon, Mar 16, 2020 at 2:03 AM Suraj Kharage
>  wrote:
> > One more suggestion, recent commit (1933ae62) has added the PostgreSQL home 
> > page to --help output.
>
> Good catch. Fixed. I also attempted to address the compiler warning
> you mentioned in your other email.
>
> Also, I realized that the previous patch versions didn't handle the
> hex-encoded path format that we need to use for non-UTF8 filenames,
> and that there was no easy way to test that format. So, in this
> version I added an option to force all pathnames to be encoded in that
> format. I also made that option capable of suppressing the backup
> manifest altogether. Other than that, this version is pretty much the
> same as the last version, except for a few additional test cases which
> I added to get the code coverage up even a little more. It would be
> nice if someone could test whether the tests pass on Windows.
>

On my CentOS, the patch gives below compilation failure:
pg_validatebackup.c: In function ‘parse_manifest_file’:
pg_validatebackup.c:335:19: error: assignment left-hand side might be
a candidate for a format attribute [-Werror=suggest-attribute=format]
  context.error_cb = report_manifest_error;

I have tested it on Windows and found there are multiple failures.
The failures are as below:
Test Summary Report
---
t/002_algorithm.pl   (Wstat: 512 Tests: 5 Failed: 4)
  Failed tests:  2-5
  Non-zero exit status: 2
  Parse errors: Bad plan.  You planned 19 tests but ran 5.
t/003_corruption.pl  (Wstat: 256 Tests: 14 Failed: 7)
  Failed tests:  2, 4, 6, 8, 10, 12, 14
  Non-zero exit status: 1
  Parse errors: Bad plan.  You planned 44 tests but ran 14.
t/004_options.pl (Wstat: 4352 Tests: 25 Failed: 17)
  Failed tests:  2, 4, 6-12, 14-17, 19-20, 22, 25
  Non-zero exit status: 17
t/005_bad_manifest.pl (Wstat: 1792 Tests: 44 Failed: 7)
  Failed tests:  18, 24, 26, 30, 32, 34, 36
  Non-zero exit status: 7
Files=6, Tests=109, 72 wallclock secs ( 0.05 usr +  0.01 sys =  0.06 CPU)
Result: FAIL

Failure Report

t/002_algorithm.pl . 1/19
#   Failed test 'backup ok with algorithm "none"'
#   at t/002_algorithm.pl line 33.

#   Failed test 'backup manifest exists'
#   at t/002_algorithm.pl line 39.

t/002_algorithm.pl . 4/19 #   Failed test 'validate backup with
algorithm "none"'
#   at t/002_algorithm.pl line 53.

#   Failed test 'backup ok with algorithm "crc32c"'
#   at t/002_algorithm.pl line 33.
# Looks like you planned 19 tests but ran 5.
# Looks like you failed 4 tests of 5 run.
# Looks like your test exited with 2 just after 5.
t/002_algorithm.pl . Dubious, test returned 2 (wstat 512, 0x200)
Failed 18/19 subtests
t/003_corruption.pl  1/44
#   Failed test 'intact backup validated'
#   at t/003_corruption.pl line 110.

#   Failed test 'corrupt backup fails validation: extra_file: matches'
#   at t/003_corruption.pl line 117.
#   'pg_validatebackup: fatal: could not parse backup
manifest: both pathname and encoded pathname
# '
# doesn't match '(?^:extra_file.*present on disk but not in the manifest)'
t/003_corruption.pl  5/44
#   Failed test 'intact backup validated'
#   at t/003_corruption.pl line 110.
t/003_corruption.pl  7/44
#   Failed test 'corrupt backup fails validation:
extra_tablespace_file: matches'
#   at t/003_corruption.pl line 117.
#   'pg_validatebackup: fatal: could not parse backup
manifest: both pathname and encoded pathname
# '
# doesn't match '(?^:extra_ts_file.*present on disk but not in the
manifest)'
t/003_corruption.pl  9/44
#   Failed test 'intact backup validated'
#   at t/003_corruption.pl line 110.

#   Failed test 'corrupt backup fails validation: missing_file: matches'
#   at t/003_corruption.pl line 117.
#   'pg_validatebackup: fatal: could not parse backup
manifest: both pathname and encoded pathname
# '
# doesn't match '(?^:pg_xact/.*present in the manifest but not on disk)'
t/003_corruption.pl  13/44
#   Failed test 'intact backup validated'
#   at t/003_corruption.pl line 110.
# Looks like you planned 44 tests but ran 14.
# Looks like you failed 7 tests of 14 run.
# Looks like your test exited with 1 just after 14.
t/003_corruption.pl  Dubious, test returned 1 (wstat 256, 0x100)
Failed 37/44 subtests
t/004_options.pl ... 1/25
#   Failed test '-q succeeds: exit code 0'
#   at t/004_options.pl line 25.

#   Failed test '-q succeeds: no stderr'
#   at t/004_options.pl line 27.
#  got: 'pg_validatebackup: fatal: could not parse backup
manifest: both pathname and encoded pathname
# '
# expected: ''

#   Failed test '-q checksum mismatch: matches'
#   at t/004_options.pl line 37.
#   'pg_validatebackup: fatal: could not parse backup
manifest: both pathname and encoded pathname
# '
# doesn't match '(?^:checksum mismatch for file \"PG_VERSION\")'

Re: [HACKERS] make async slave to wait for lsn to be replayed

2020-03-21 Thread Kartyshov Ivan
As discussed earlier, I added the WAIT FOR clause to the BEGIN/START
TRANSACTION statement.


Synopsis
==
BEGIN [ WORK | TRANSACTION ] [ transaction_mode[, ...] ] wait_for_event
  where transaction_mode is one of:
ISOLATION LEVEL { SERIALIZABLE | REPEATABLE READ | READ 
COMMITTED | READ UNCOMMITTED }

READ WRITE | READ ONLY
[ NOT ] DEFERRABLE

  WAIT FOR [ANY | SOME | ALL] event [, event ...]
  and event is:
  LSN value [options]
  TIMESTAMP value

  and options is:
  TIMEOUT delay
  UNTIL TIMESTAMP timestamp



--
Ivan Kartyshov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Companydiff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml
index 8d91f3529e..8697f9807f 100644
--- a/doc/src/sgml/ref/allfiles.sgml
+++ b/doc/src/sgml/ref/allfiles.sgml
@@ -187,6 +187,7 @@ Complete list of usable sgml source files in this directory.
 
 
 
+
 
 
 
diff --git a/doc/src/sgml/ref/begin.sgml b/doc/src/sgml/ref/begin.sgml
index c23bbfb4e7..d86465b6d3 100644
--- a/doc/src/sgml/ref/begin.sgml
+++ b/doc/src/sgml/ref/begin.sgml
@@ -21,13 +21,25 @@ PostgreSQL documentation
 
  
 
-BEGIN [ WORK | TRANSACTION ] [ transaction_mode [, ...] ]
+BEGIN [ WORK | TRANSACTION ] [ transaction_mode [, ...] ] wait_for_event
 
 where transaction_mode is one of:
 
 ISOLATION LEVEL { SERIALIZABLE | REPEATABLE READ | READ COMMITTED | READ UNCOMMITTED }
 READ WRITE | READ ONLY
 [ NOT ] DEFERRABLE
+
+wait_for_event
+WAIT FOR [ANY | SOME | ALL] event [, event ...]
+
+where event is:
+LSN value [options]
+TIMESTAMP value
+
+and where options is one of:
+TIMEOUT delay
+UNTIL TIMESTAMP timestamp
+
 
  
 
diff --git a/doc/src/sgml/ref/wait.sgml b/doc/src/sgml/ref/wait.sgml
new file mode 100644
index 00..41b8eb019f
--- /dev/null
+++ b/doc/src/sgml/ref/wait.sgml
@@ -0,0 +1,148 @@
+
+
+
+ 
+  WAIT FOR
+ 
+
+ 
+  WAIT FOR
+  7
+  SQL - Language Statements
+ 
+
+ 
+  WAIT FOR
+  wait for the target LSN to be replayed
+ 
+
+ 
+
+WAIT FOR [ANY | SOME | ALL] event [, event ...]
+
+where event is:
+LSN value [options]
+TIMESTAMP value
+
+and where options is one of:
+TIMEOUT delay
+UNTIL TIMESTAMP timestamp
+
+WAIT FOR LSN 'lsn_number'
+WAIT FOR LSN 'lsn_number' TIMEOUT wait_timeout
+WAIT FOR LSN 'lsn_number' UNTIL TIMESTAMP wait_time
+WAIT FOR TIMESTAMP wait_time
+WAIT FOR ALL LSN 'lsn_number' TIMEOUT wait_timeout, TIMESTAMP wait_time
+WAIT FOR ANY LSN 'lsn_number', TIMESTAMP wait_time
+
+ 
+
+ 
+  Description
+
+  
+   WAIT FOR provides a simple
+   interprocess communication mechanism to wait for timestamp or the target log sequence
+   number (LSN) on standby in PostgreSQL
+   databases with master-standby asynchronous replication. When run with the
+   LSN option, the WAIT FOR command
+   waits for the specified LSN to be replayed. By default, wait
+   time is unlimited. Waiting can be interrupted using Ctrl+C, or
+   by shutting down the postgres server. You can also limit the wait
+   time using the TIMEOUT option.
+  
+
+ 
+
+ 
+  Parameters
+
+  
+   
+LSN
+
+ 
+  Specify the target log sequence number to wait for.
+ 
+
+   
+
+   
+TIMEOUT wait_timeout
+
+ 
+  Limit the time interval to wait for the LSN to be replayed.
+  The specified wait_timeout must be an integer
+  and is measured in milliseconds.
+ 
+
+   
+
+   
+UNTIL TIMESTAMP wait_time
+
+ 
+  Limit the time to wait for the LSN to be replayed.
+  The specified wait_time must be timestamp.
+ 
+
+   
+
+  
+ 
+
+ 
+  Examples
+
+  
+   Run WAIT FOR from psql,
+   limiting wait time to 1 milliseconds:
+
+
+WAIT FOR LSN '0/3F07A6B1' TIMEOUT 1;
+NOTICE:  LSN is not reached. Try to increase wait time.
+LSN reached
+-
+ f
+(1 row)
+
+  
+
+  
+   Wait until the specified LSN is replayed:
+
+WAIT FOR LSN '0/3F07A611';
+LSN reached
+-
+ t
+(1 row)
+
+  
+
+  
+   Limit LSN wait time to 50 milliseconds, and then cancel the command:
+
+WAIT FOR LSN '0/3F0FF791' TIMEOUT 50;
+^CCancel request sent
+NOTICE:  LSN is not reached. Try to increase wait time.
+ERROR:  canceling statement due to user request
+ LSN reached
+-
+ f
+(1 row)
+
+
+ 
+
+ 
+  Compatibility
+
+  
+   There is no WAIT FOR statement in the SQL
+   standard.
+  
+ 
+
diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml
index cef09dd38b..588e96aa14 100644
--- a/doc/src/sgml/reference.sgml
+++ b/doc/src/sgml/reference.sgml
@@ -215,6 +215,7 @@



+   
 
  
 
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 793c076da6..8591472e21 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -41,6 +41,7 @@
 #include "catalog/pg_database.h"
 #include "commands/progress.h"
 #include "commands/tablespace.h"
+#include 

Re: GiST secondary split

2020-03-21 Thread Alexander Korotkov
Hi, Peter!

On Sat, Mar 21, 2020 at 12:36 AM Peter Griggs  wrote:
> I am hacking some GIST code for a research project and wanted clarification 
> about what exactly a secondary split is in GIST.

Secondary split in GiST is the split by the second and subsequent columns
of a multicolumn GiST index.  In general it works as follows.  The split by
the first column produces two union keys.  It might happen that some
first-column values are contained in both union keys.  If so, the
corresponding tuples are subject to a secondary split.

> More specifically I am wondering why the supportSecondarySplit function 
> (which is in src/backend/access/gist/gistsplit.c) can assume that the data is 
> currently on the left side in order to swap it.

I don't think it assumes that all the data is currently on the left
side.  There are left and right sides of the primary split, and at the
same time there are left and right sides of the secondary split.  They
might be unioned straight or crosswise.  The name of the leaveOnLeft
variable might be confusing: leaveOnLeft == true means a straight union,
while leaveOnLeft == false means a crosswise union.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: [HACKERS] make async slave to wait for lsn to be replayed

2020-03-21 Thread Anna Akenteva

On 2020-03-17 15:47, Kartyshov Ivan wrote:

Synopsis
==
WAIT FOR [ANY | SOME | ALL] event [, event ...]

I'm confused as to what SOME would mean in this
command's syntax, but I can see you removed it
from gram.y since the last patch. Did you
decide to not implement it after all?

Also, I had a look at the code and tested it a bit.


If I specify many events, here's what happens:

For WAIT_FOR_ALL strategy, it chooses
- maximum LSN
- maximum delay
and waits for the resulting event.

For WAIT_FOR_ANY strategy - same, but it uses
minimal LSN/delay.

In other words, statements
  (1) WAIT FOR ALL
LSN '7F97208' TIMEOUT 11,
LSN '3002808' TIMEOUT 50;
  (2) WAIT FOR ANY
LSN '7F97208' TIMEOUT 11,
LSN '3002808' TIMEOUT 50;
are essentially equivalent to:
  (1) WAIT FOR LSN '7F97208' TIMEOUT 50;
  (2) WAIT FOR LSN '3002808' TIMEOUT 11;

It seems a bit counter-intuitive to me, because
I expected events to be treated independently.
Is this the expected behaviour?


In utility.c:
if (event->delay < time_val)
time_val = event->delay / 1000;

Since event->delay is an int, the result will
be zero for any delay value less than 1000.
I suggest either dividing by 1000.0 or
explicitly converting int to float.

Also, shouldn't event->delay be divided
by 1000 in the 'if' part as well?
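
A tiny standalone example of the truncation (the variable names follow the
quoted snippet; the program itself is only an illustration):

#include <stdio.h>

int
main(void)
{
	int		delay = 900;		/* e.g. event->delay, in milliseconds */
	double	time_val;

	time_val = delay / 1000;	/* integer division: stores 0 */
	printf("int division:   %g\n", time_val);

	time_val = delay / 1000.0;	/* promote to double first: stores 0.9 */
	printf("float division: %g\n", time_val);

	return 0;
}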


You compare two LSN-s using pg_lsn_cmp():
res = DatumGetUInt32(
DirectFunctionCall2(pg_lsn_cmp,
lsn, trg_lsn));

As far as I understand, it'd be enough to use
operators such as "<=", as you do in wait.c:
/* If LSN has been replayed */
if (trg_lsn <= cur_lsn)
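
Since XLogRecPtr is a plain 64-bit integer (see access/xlogdefs.h), the
ordinary comparison operators are sufficient.  Something along these lines
(a sketch only; the function name is made up):

#include "postgres.h"
#include "access/xlogdefs.h"

/* true if the target LSN has already been replayed */
static bool
target_lsn_reached(XLogRecPtr target_lsn, XLogRecPtr replayed_lsn)
{
	return target_lsn <= replayed_lsn;
}

Going through DatumGetUInt32(DirectFunctionCall2(pg_lsn_cmp, ...)) just adds
a function-call round trip for the same result.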

--
Anna Akenteva
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: SQL/JSON: functions

2020-03-21 Thread Nikita Glukhov

Attached 46th version of the patches.


On 20.03.2020 22:34, Pavel Stehule wrote:


On Thu, Mar 19, 2020 at 11:57 PM Nikita Glukhov <n.glu...@postgrespro.ru> wrote:


Attached 45th version of the patches.

Nodes JsonFormat, JsonReturning, JsonPassing, JsonBehavior were fixed.

On 17.03.2020 21:35, Pavel Stehule wrote:


On Tue, Mar 17, 2020 at 1:55 AM Nikita Glukhov <n.glu...@postgrespro.ru> wrote:

Attached 44th version of the patches.


On 12.03.2020 16:41, Pavel Stehule wrote:


On 12.03.2020 00:09 Nikita Glukhov wrote:

Attached 43rd version of the patches.

The previous patch #4 ("Add function formats") was removed.
Instead, introduced new executor node JsonCtorExpr which is used to 
wrap
SQL/JSON constructor function calls (FuncExpr, Aggref, WindowFunc).

Also, JsonIsPredicate node began to be used as executor node.
This helped to remove unnecessary json[b]_is_valid() user functions.


It looks very well - all tests passed, code looks well.

Now, when there are special nodes, then the introduction of
COERCE_INTERNAL_CAST is not necessary, and it is only my one
and last objection against this patch set.


I have removed patch #3 with COERCE_INTERNAL_CAST too.

Coercions in JsonValueExpr in JsonCtorExpr, which were previously 
hidden with
COERCE_INTERNAL_CAST and which were outputted as RETURNING or FORMAT 
JSON
clauses, now moved into separate expressions.

I am looking on the code, and although the code is correct it
doesn't look well (consistently with other node types).

I think so JsonFormat and JsonReturning should be node types, not
just structs. If these types will be nodes, then you can simplify
_readJsonExpr and all node operations on this node.


JsonFormat and JsonReturning was transformed into nodes, and not included
directly into other nodes now.



User functions json[b]_build_object_ext() and json[b]_build_array_ext() 
also
can be easily removed.   But it seems harder to remove new aggregate 
functions
json[b]_objectagg() and json[b]_agg_strict(), because they can't be 
called
directly from JsonCtorExpr node.


I don't see reasons for another reduction now. It would be great if
you can finalize the work you plan for pg13.


I have removed json[b]_build_object_ext() and json[b]_build_array_ext().

But json[b]_objectagg() and json[b]_agg_strict() are still present.
It seems that removing them requires major refactoring of the execution
of Aggref and WindowFunc nodes.


+<->READ_ENUM_FIELD(on_error.btype, JsonBehaviorType);
+<->READ_NODE_FIELD(on_error.default_expr);
+<->READ_ENUM_FIELD(on_empty.btype, JsonBehaviorType);
+<->READ_NODE_FIELD(on_empty.default_expr);

JsonBehavior is node type. Then why you don't write just

READ_NODE_FIELD(on_error);
READ_NODE_FIELD(on_empty)

??


JsonBehavior now used in JsonExpr as a pointer to node.



And maybe the code can be better if you don't use JsonPassing
struct (or use it only inside gram.y) and pass just List *names,
List *values.

Nodes should contain other nodes or scalar types. Using
structs (that are not nodes) inside doesn't look consistent.


JsonPassing was replaced with two Lists.



I found some not finished code in 0003 patch
+
+json_name_and_value:
+/* TODO
+<-><--><-->KEY c_expr VALUE_P json_value_expr %prec POSTFIXOP
+<-><--><--><-->{ $$ = makeJsonKeyValue($2, $4); }
+<-><--><-->|
+*/


This is an unsupported variant of the standard syntax, because it seems to lead
to unresolvable conflicts.  The only supported syntax is:

JSON_OBJECT(key : value)
JSON_OBJECT(key VALUE value)

ok. So please change comment from ToDo to this explanation. Maybe note 
in doc (unimplemented features can be good idea)


Some of these unresolvable conflicts are solved with extra parentheses in 
Postgres.




The support for json type in jsonpath also seems to be immature, so I 
will try
to remove it in the next iteration.

What do you think? This patch is a little bit off topic, so if you
don't need it, then it can be removed. Is there some dependency for
"jsontable"?


There is a dependency in SQL/JSON query functions executor, because executor
uses new structure JsonItem instead of plain JsonbValue.  I will try to
preserve refactoring with JsonItem introduction, but remove json support.
If it will be still unacceptable, I will try to completely remove patch #1.


I have completely removed patch #1. It turned out to be not so difficult.




I have much better feeling from version 45 (now it looks like Postgres 
C :)). Still there are some small issues.


1. commented code

+json_encoding:
+<-><--><-->name<--><--><--><--><--><--><--><--><-->{ $$ = 

Re: error context for vacuum to include block number

2020-03-21 Thread Justin Pryzby
On Sat, Mar 21, 2020 at 01:00:03PM +0530, Amit Kapila wrote:
> I have addressed your comments in the attached patch.  Today, while
> testing error messages from various phases, I noticed that the patch
> fails to display error context if the error occurs during the truncate
> phase.  The reason was that we had popped the error stack in
> lazy_scan_heap due to which it never calls the callback.  I think we
> need to set up callback at a higher level as is done in the attached
> patch.  I have done the testing by inducing errors in various phases
> and it prints the required information.  Let me know what you think of
> the attached?

Thanks.  My tests with TRUNCATE were probably back when we had multiple
push/pop cycles of local error callbacks.

This passes my tests.

-- 
Justin




Re: error context for vacuum to include block number

2020-03-21 Thread Amit Kapila
On Fri, Mar 20, 2020 at 8:24 PM Justin Pryzby  wrote:
>
> On Fri, Mar 20, 2020 at 04:58:08PM +0530, Amit Kapila wrote:
> > See, how the attached looks?  I have written a commit message as well,
> > see if I have missed anyone is from the credit list?
>
> Thanks for looking again.
>
> Couple tweaks:
>

I have addressed your comments in the attached patch.  Today, while
testing error messages from various phases, I noticed that the patch
fails to display error context if the error occurs during the truncate
phase.  The reason was that we had popped the error stack in
lazy_scan_heap due to which it never calls the callback.  I think we
need to set up callback at a higher level as is done in the attached
patch.  I have done the testing by inducing errors in various phases
and it prints the required information.  Let me know what you think of
the attached?
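
For reference, the general shape of such a setup (a simplified sketch, not
the attached patch; the state struct and function names are illustrative):

#include "postgres.h"
#include "utils/elog.h"

typedef struct VacuumErrCtx
{
	const char *relname;
	uint32		blkno;
	const char *phase;
} VacuumErrCtx;

static void
vacuum_error_callback(void *arg)
{
	VacuumErrCtx *ctx = (VacuumErrCtx *) arg;

	errcontext("while vacuuming block %u of relation \"%s\" during %s",
			   ctx->blkno, ctx->relname, ctx->phase);
}

static void
vacuum_with_context(VacuumErrCtx *ctx)
{
	ErrorContextCallback errcallback;

	/* push the callback once, so it covers every phase including truncation */
	errcallback.callback = vacuum_error_callback;
	errcallback.arg = ctx;
	errcallback.previous = error_context_stack;
	error_context_stack = &errcallback;

	/* ... heap scan, index vacuuming, heap truncation ... */

	/* pop the callback before returning */
	error_context_stack = errcallback.previous;
}

Setting it up at this outer level means an ereport() raised during the
truncate phase still goes through the callback.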

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


v31-0001-Introduce-vacuum-errcontext-to-display-additiona.patch
Description: Binary data