Re: [PATCH][postgres_fdw] Add push down of CASE WHEN clauses
On 07/07/2021 at 06:59, David Rowley wrote:
> On Wed, 7 Jul 2021 at 10:18, Gilles Darold wrote:
>> I have noticed that postgres_fdw does not push down CASE WHEN clauses. In
>> the following case this is normal:
>
> This looks very similar to [1], which is in the current commitfest.
>
> Are you able to look over that patch and check that you're not doing
> anything extra that the other patch isn't? If so, then likely the best
> way to progress would be for you to test and review that patch.
>
> David
>
> [1] https://commitfest.postgresql.org/33/3171/

Strange, I searched the commitfest yesterday but without success; this is
clearly a duplicate. Anyway, thanks for the pointer, and yes, I will
review Alexander's patch as I know the subject now :-)

Best regards

--
Gilles Darold
MigOps Inc (https://migops.com/)
Re: [PATCH] Make jsonapi usable from libpq
Michael Paquier writes:
> It seems to me that this does not address yet the problems with the
> palloc/pstrdup in jsonapi.c, though, which would need to rely on
> malloc() if we end up using this code in libpq. I am not sure that we
> need to do that, as we may end up not using OAUTH as the SASL mechanism
> in core. So perhaps it would be better to just give up on this thread
> for now?

Yeah, I think there's nothing to do here unless we decide that we have
to have JSON-parsing ability inside libpq ... which is a situation I
think we should try hard to avoid.

			regards, tom lane
Re: [PATCH] Make jsonapi usable from libpq
On Tue, Jun 29, 2021 at 03:34:29PM -0400, Tom Lane wrote:
> Actually, I'd forgotten that the PQExpBuffer functions are already
> exported by libpq, and much of our frontend code already uses them
> from there. So we don't really need to move anything unless there's
> a call to use this code in clients that don't use libpq, which are
> a pretty small set.
>
> Also, having them be available both from libpq.so and from libpgcommon.a
> would be a tad problematic I think; it'd be hard to tell which way the
> linker would choose to resolve that.

Coming back to this thread. You were thinking about switching to
PQExpBuffer for the error strings generated depending on error code for
the frontend, right? Using a PQExpBuffer would be a problem if
attempting to get a more detailed error for pg_verifybackup, though I
guess that this tool can continue to live without this much detail in
the error strings.

It seems to me that this does not address yet the problems with the
palloc/pstrdup in jsonapi.c, though, which would need to rely on
malloc() if we end up using this code in libpq. I am not sure that we
need to do that, as we may end up not using OAUTH as the SASL mechanism
in core. So perhaps it would be better to just give up on this thread
for now?
--
Michael
Re: row filtering for logical replication
On Thu, Jul 1, 2021 at 10:43 AM Euler Taveira wrote:
>
> Amit, thanks for rebasing this patch. I already had a similar rebased
> patch in my local tree. A recent patch broke your version v15 so I
> rebased it.
>

Hi, I did some testing of the performance of the row filtering, in the
case of the publisher INSERTing 100,000 rows, using a similar test setup
and timing as previously used in the "commands_to_perf_test.sql" script
posted by Önder Kalacı.

I found that with the call to ExecInitExtraTupleSlot() in
pgoutput_row_filter(), the performance of pgoutput_row_filter() degrades
considerably over the 100,000 invocations, and on my system it took
about 43 seconds to filter the rows and send them to the subscriber.
However, by caching the tuple table slot in RelationSyncEntry, this
duration can be reduced by 38+ seconds. A further improvement can be
made by using this in combination with Peter's plan cache (v16-0004).

I've attached a patch for this, which relies on the latest v16-0001 and
v16-0004 patches posted by Peter Smith (noting that v16-0001 is
identical to your previously-posted 0001 patch). Also attached is a
graph (created by Peter Smith -- thanks!) detailing the performance
improvement.

Regards,
Greg Nancarrow
Fujitsu Australia

v16-0005-Improve-row-filtering-performance.patch
Description: Binary data
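The caching idea described above can be sketched in miniature: allocate a scratch slot once per cache entry and reuse it across invocations, instead of creating a fresh one on every call. The `Toy*` names here are hypothetical stand-ins; the real patch caches a `TupleTableSlot` inside `RelationSyncEntry`, not a plain buffer.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical per-relation cache entry.  Instead of allocating a fresh
 * scratch slot on every invocation of the row filter (the slow path
 * described above), the slot is created lazily on first use and reused
 * on all later calls.
 */
typedef struct ToySyncEntry
{
	double	   *filter_slot;	/* cached scratch slot, NULL until first use */
	size_t		slot_len;
} ToySyncEntry;

double *
toy_get_filter_slot(ToySyncEntry *entry, size_t len)
{
	if (entry->filter_slot == NULL)
	{
		/* first call for this entry: allocate once */
		entry->filter_slot = calloc(len, sizeof(double));
		entry->slot_len = len;
	}
	/* later calls: hand back the cached slot, no allocation */
	return entry->filter_slot;
}
```

With 100,000 invocations per transaction, turning a per-call allocation into a one-time allocation is exactly the kind of amortization the reported 38+ second saving comes from.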
Re: Keep notnullattrs in RelOptInfo (Was part of UniqueKey patch series)
> > For example: SELECT nullablecol FROM tab WHERE nullablecol = ;
>
> If the equality operator is strict then the nullablecol can be NULL in
> the WHERE clause but not in the SELECT list. Tom's idea should allow
> us to determine both of those things but your idea cannot tell them
> apart, so, in theory at least, Tom's idea seems better to me.
>
> David

That's really something I can't do, thanks for the explanation.

--
Best Regards
Andy Fan (https://www.aliyun.com/)
Re: [PATCH] Pull general SASL framework out of SCRAM
On Tue, Jul 06, 2021 at 06:20:49PM +, Jacob Champion wrote:
> On Mon, 2021-07-05 at 17:17 +0900, Michael Paquier wrote:
> Each name must be null-terminated, not just null-separated. That way
> the list of names ends with an empty string:
>
>     name-one\0   <- added by the mechanism
>     name-two\0   <- added by the mechanism
>     \0           <- added by the framework
>
> The way it's worded now, I could see some implementers failing to
> terminate the final name because the framework adds a trailing null
> already -- but the framework is terminating the list, not the final
> name.

Good point. I have used ending with '\0' bytes instead.

>> + * init()
>> + *
>> + * Initializes mechanism-specific state for a connection. This
>> + * callback must return a pointer to its allocated state, which will
>> + * be passed as-is as the first argument to the other callbacks.
>> + * free() is called to release any state resources.
>
> Maybe say "The free() callback is called" to differentiate it from
> standard free()?

Yes, that could be confusing. Switched to your wording instead.

> It's possible for conn->sasl to be NULL here, say if the client has
> channel_binding=require but connects as a user with an MD5 secret. The
> SCRAM TAP tests have one such case.

Indeed.

>> Hmm. Do the RFCs tell us anything about that?
>
> Just in general terms:
>
>>    Each authentication exchange consists of a message from the client to
>>    the server requesting authentication via a particular mechanism,
>>    followed by one or more pairs of challenges from the server and
>>    responses from the client, followed by a message from the server
>>    indicating the outcome of the authentication exchange. (Note:
>>    exchanges may also be aborted as discussed in Section 3.5.)
>
> So a challenge must be met with a response, or the exchange must be
> aborted. (And I don't think our protocol implementation provides a
> client abort message; if something goes wrong, we just tear down the
> connection.)

Thanks. At the same time, section 3.5 also says that the client may send
a message to abort. So one can interpret that the client also has the
choice to abort without sending a response back to the server? Or am I
just interpreting the use of "may" in this context incorrectly?
--
Michael
Re: [PATCH][postgres_fdw] Add push down of CASE WHEN clauses
On Wed, 7 Jul 2021 at 10:18, Gilles Darold wrote:
> I have noticed that postgres_fdw does not push down CASE WHEN clauses. In
> the following case this is normal:

This looks very similar to [1], which is in the current commitfest.

Are you able to look over that patch and check that you're not doing
anything extra that the other patch isn't? If so, then likely the best
way to progress would be for you to test and review that patch.

David

[1] https://commitfest.postgresql.org/33/3171/
Re: ExecRTCheckPerms() and many prunable partitions
On Fri, 2 Jul 2021 at 12:41, Amit Langote wrote:
> I'll mark the CF entry as WoA, unless you'd rather I just mark it RwF.

I've set it to waiting on author. It was still set to needs review.

If you think you'll not get time to write the patch during this CF, feel
free to bump it out.

David
Re: [PATCH] expand the units that pg_size_pretty supports on output
On Wed, 7 Jul 2021 at 02:46, David Christensen wrote:
> if we do decide to expand the units table there will be a
> few additional changes (most significantly, the return value of
> `pg_size_bytes()` will need to switch to `numeric`).

I wonder if it's worth changing pg_size_bytes() to return NUMERIC
regardless of whether we add any additional units or not.

Would you like to create two patches, one to change the return type and
another to add the new units, both based on top of the v2 patch I sent
earlier?

David
Re: Binary search in ScalarArrayOpExpr for OR'd constant arrays
On Tue, 6 Jul 2021 at 22:39, David Rowley wrote:
> If anyone feels differently, please let me know in the next couple of
> days. Otherwise, I plan on taking a final look and pushing it soon.

After doing some very minor adjustments, I pushed this (29f45e299).

Thanks to James and Zhihong for reviewing.

David
Re: [CLOBBER_CACHE]Server crashed with segfault 11 while executing clusterdb
On Tue, Jul 6, 2021 at 11:06 PM Tom Lane wrote:
>
> Amul Sul writes:
> > On Tue, Apr 20, 2021 at 6:59 AM Kyotaro Horiguchi wrote:
> >> I don't mind RelationGetSmgr(index)->smgr_rnode alone, or ->member
> >> alone when there's no previous call to RelationGetSmgr just above.
> >> How about using a temporary variable?
> >>
> >> SMgrRelation srel = RelationGetSmgr(index);
> >> smgrwrite(srel, ...);
> >> log_newpage(srel->..);
>
> > Understood. Used a temporary variable for the places where
> > RelationGetSmgr() calls are placed too close together or in a loop.
>
> [ squint... ] Doesn't this risk introducing exactly the sort of
> cache-clobber hazard we're trying to prevent? That is, the above is
> not safe unless you are *entirely* certain that there is not and never
> will be any possibility of a relcache flush before you are done using
> the temporary variable. Otherwise it can become a dangling pointer.

Yeah, there will be a hazard: even if we are sure it is safe right now,
we cannot guarantee that future changes in any subroutine called in
between won't cause a flush.

> The point of the static-inline function idea was to be cheap enough
> that it isn't worth worrying about this sort of risky optimization.
> Given that an smgr function is sure to involve some kernel calls,
> I doubt it's worth sweating over an extra test-and-branch beforehand.
> So where I was hoping to get to is that smgr objects are *only*
> referenced by RelationGetSmgr() calls and nobody ever keeps any
> other pointers to them across any non-smgr operations.

Ok, will revert the changes added in the previous version, thanks.

Regards,
Amul
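Tom's dangling-pointer concern can be illustrated with a toy model (all `Toy*` names are hypothetical stand-ins, not PostgreSQL code): a cached pointer is only safe while no flush can occur in between, whereas re-fetching through a cheap accessor on every use, as the static-inline `RelationGetSmgr()` idea does, stays safe across flushes.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-ins for a relcache entry and its smgr object. */
typedef struct ToySmgr { int rnode; } ToySmgr;
typedef struct ToyRelation { ToySmgr *smgr; int rnode; } ToyRelation;

/*
 * Simulate a cache flush: the old smgr object is destroyed.  Any pointer
 * previously returned by toy_get_smgr() now dangles.
 */
void
toy_cache_flush(ToyRelation *rel)
{
	free(rel->smgr);
	rel->smgr = NULL;
}

/*
 * The accessor pattern: (re)open the smgr object on each call, so
 * callers never hold a stale pointer across a possible flush.
 */
ToySmgr *
toy_get_smgr(ToyRelation *rel)
{
	if (rel->smgr == NULL)
	{
		rel->smgr = malloc(sizeof(ToySmgr));
		rel->smgr->rnode = rel->rnode;
	}
	return rel->smgr;
}
```

The unsafe pattern is `ToySmgr *srel = toy_get_smgr(rel);` followed by `toy_cache_flush(rel);` and then a use of `srel`: that last use reads freed memory, which is exactly the hazard of keeping `SMgrRelation srel` in a temporary variable across code that might flush the relcache.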
Re: Refactor "mutually exclusive options" error reporting code in parse_subscription_options
On Wed, Jul 7, 2021 at 5:33 AM Peter Smith wrote:
> PSA my patch which includes all the fixes mentioned above.

I agree with Amit to start a separate thread to discuss these points.
IMO, we can close this thread. What do you think?

Regards,
Bharath Rupireddy.
Re: Refactor "mutually exclusive options" error reporting code in parse_subscription_options
On Wed, Jul 7, 2021 at 7:36 AM Alvaro Herrera wrote:
>
> On 2021-Jul-07, Peter Smith wrote:
>
> > 1. Zap 'opts' up-front
> >
> > + *
> > + * Caller is expected to have cleared 'opts'.
> >
> > This comment is putting the onus on the caller to "do the right thing".
> >
> > I think that hopeful expectations about input should be removed - the
> > function should just be responsible itself to zap the SubOpts
> > up-front. It makes the code more readable, and it removes any
> > potential risk of garbage unintentionally passed in that struct.
> >
> > /* Start out with cleared opts. */
> > memset(opts, 0, sizeof(SubOpts));
>
> Yeah, I gave the initialization aspect some thought too when I reviewed
> 0001. Having an explicit memset() just for a sanity check is a waste
> when you don't really need it; and we're initializing the struct
> already at declaration time by assigning {0} to it, so having to add
> memset feels like such a waste. Another point in the same area is that
> some of the struct members are initialized to some default value
> different from 0, so I wondered if it would have been better to remove
> the = {0} and instead have another function that would set everything
> up the way we want; but it seemed a bit excessive, so I ended up not
> suggesting that.
>
> We have many places in the source tree where the caller is expected to
> do the right thing, even when those right things are more complex than
> this one. This one place isn't terribly bad in that regard, given that
> it's a pretty well contained static struct anyway (we would certainly
> not export a struct of this name in any .h file.) I don't think it's
> all that bad.
>
> > Alternatively, at least there should be an assertion for some sanity
> > check.
> >
> > Assert(opt->specified_opts == 0);
>
> No opinion on this. It doesn't seem valuable enough, but maybe I'm in
> the minority on this.
>

I am also not sure if such an assertion adds much value.

> > 2. Remove redundant conditions
> >
> > /* Check for incompatible options from the user. */
> > - if (enabled && *enabled_given && *enabled)
> > + if (opts->enabled &&
> > +     IsSet(supported_opts, SUBOPT_ENABLED) &&
> > +     IsSet(opts->specified_opts, SUBOPT_ENABLED))
> > (etc)
>
> Yeah, I thought about this too when I saw the 0002 patch in this series.
> I agree that the extra rechecks are a bit pointless.
>

I don't think the committed patch has made anything worse here or added
any new condition. Now, even if we want to change these conditions, it
is better to discuss them separately. As per the current code these
checks look a bit redundant, but OTOH, in the future one might expect
that if a supported option is not passed by the caller and the user has
specified it, then it should be an error. For example, we might not want
to support some option via some ALTER variant even though it is
supported by the CREATE variant.

--
With Regards,
Amit Kapila.
Re: rand48 replacement
Fabien COELHO wrote, 2021-07-06 23:49:

> Hello Yura,
>
>>> However, I'm not enthusiastic about combining two methods depending
>>> on the range; the function looks complex enough without that, so I
>>> would suggest not to take this option. Also, the decision process
>>> adds to the average cost, which is undesirable.
>>
>> Given 99.99% of cases will be in the likely branch, the branch
>> predictor should eliminate the decision cost.
>
> Hmmm. ISTM that a branch predictor should predict that unknown < small
> should probably be false, so a hint should be given that it is really
> true.

Why? The branch predictor is history based: if it was usually true here,
then it will be predicted true this time as well. "unknown < small" is
usually true, therefore the branch predictor will assume it is true. I
put `likely` in for the compiler: the compiler then places the `likely`
path closer. And as Dean Rasheed correctly mentioned, the mask method
will have a really bad pattern for the branch predictor if the range is
not just below or equal to a power of two.

> On average the bitmask is the better unbiased method, if the online
> figures are to be trusted. Also, as already said, I do not really want
> to add code complexity, especially to get lower average performance,
> and especially with code like "threshold = - range % range", where
> both variables are unsigned, I have a headache just looking at it :-)

If you mean https://www.pcg-random.org/posts/bounded-rands.html then:

1. The first graphs are made with a non-exact version of Lemire's code.
The latest code from
https://lemire.me/blog/2016/06/30/fast-random-shuffling/ (which mine is
derived from) performs the modulo operation only if (leftover < range).
Even with rand_range(100) that happens just once in four thousand runs.

2. You can see the "Small-Constant Benchmark" on that page: Debiased Int
is 1.5 times faster. And even in the "Small-Shuffle" benchmark their
unoptimized version is on par with the mask method.

3. If you go to the "Making Improvements/Faster Threshold-Based
Discarding" section, you'll see the code my version is matched with. It
is twice as fast as the masked method in the Small-Shuffle benchmark,
and just a bit slower in Large-Shuffle.

And __builtin_clzl is not a free lunch either: it has a latency of 3-4
cycles on a modern processor. Well, % is not cheap either. It could
probably run in parallel with some part of xoroshiro, but that depends
on how the compiler optimizes this function.

>>> I would certainly select the unbiased multiply method if we want a
>>> u32 range variant.
>>
>> There could be two functions.
>
> Yep, but do we need them? Who is likely to want 32 bit pseudo random
> ints in a range? pgbench needs 64 bits.
>
> So I'm still inclined to just keep the faster-on-average bitmask
> method, despite that it may be slower for some ranges. The average
> cost for the worst case in PRNG calls is, if I'm not mistaken:
>
>   1 * 0.5 + 2 * 0.25 + 3 * 0.125 + ... ~ 2
>
> which does not look too bad.

You don't count the cost of branch misprediction:

https://stackoverflow.com/questions/11227809/why-is-processing-a-sorted-array-faster-than-processing-an-unsorted-array
https://lemire.me/blog/2019/10/15/mispredicted-branches-can-multiply-your-running-times/

Therefore the calculation should be at least:

  1 * 0.5 + 0.5 * (3 + 0.5 * (3 + ...)) = 6

By the way, we have a 64 bit random value. If we use 44 bits of it for a
range <= (1<<20), then the bias will be less than 1/(2**24). Could we
just ignore it (given it is not a cryptographically strong random)?

  uint64
  pg_prng_u64_range(pg_prng_state *state, uint64 range)
  {
      uint64 val = xoroshiro128ss(state);
      uint64 m;

      /* handle all power of 2 cases */
      if ((range & (range - 1)) == 0)
          return range != 0 ? val & (range - 1) : 0;

      if (likely(range < (1 << 20)))
      {
          /*
           * While the multiply method is biased, the bias will be smaller
           * than 1/(1<<24) for such small ranges. Let's ignore it.
           */
          return ((val >> 20) * range) >> 44;
      }

      /* Apple's mask method */
      m = mask_u64(range - 1);
      val &= m;
      while (val >= range)
          val = xoroshiro128ss(state) & m;

      return val;
  }

Anyway, excuse me for heating up this discussion over such a
non-essential issue. I'll try to control myself and not take it further.

regards,
Sokolov Yura.
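For reference, the bitmask (rejection) method being debated above can be sketched as a self-contained program. Here `splitmix64` stands in for `xoroshiro128ss` purely so the example runs on its own, and `toy_next`/`prng_u64_range` are illustrative names, not the proposed pgbench API:

```c
#include <assert.h>
#include <stdint.h>

/* Any 64-bit generator works here; splitmix64 stands in for xoroshiro128ss. */
uint64_t
toy_next(uint64_t *state)
{
	uint64_t	z = (*state += 0x9e3779b97f4a7c15ULL);

	z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
	z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
	return z ^ (z >> 31);
}

/* Smallest mask of the form 2^k - 1 that covers v. */
uint64_t
mask_u64(uint64_t v)
{
	v |= v >> 1;
	v |= v >> 2;
	v |= v >> 4;
	v |= v >> 8;
	v |= v >> 16;
	v |= v >> 32;
	return v;
}

/*
 * Unbiased value in [0, range) by the bitmask (rejection) method: mask
 * the raw 64-bit value down to the next power of two covering the range,
 * and retry whenever the masked value still falls outside the range.
 * Since the mask at most doubles the range, each draw is accepted with
 * probability > 1/2, so fewer than two generator calls are needed on
 * average -- the "~ 2" worst-case figure from the discussion above.
 */
uint64_t
prng_u64_range(uint64_t *state, uint64_t range)
{
	uint64_t	m,
				val;

	if (range < 2)
		return 0;
	m = mask_u64(range - 1);
	do
		val = toy_next(state) & m;
	while (val >= range);
	return val;
}
```

The retry loop is exactly the branch whose predictability was questioned above: for ranges just over a power of two the rejection probability approaches 1/2, which is where mispredictions get expensive.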
Re: [PATCH] expand the units that pg_size_pretty supports on output
On Wed, 7 Jul 2021 at 02:54, Tom Lane wrote:
>
> Minor nit: use "const char *text" in the struct declaration, so
> that all of the static data can be placed in fixed storage.

Thanks for pointing that out.

> David Rowley writes:
> > (I'm not sure why pgindent removed the space between != and NULL, but
> > it did, so I left it.)
>
> It did that because "text" is a typedef name, so it's a bit confused
> about whether the statement is really a declaration. Personally I'd
> have used "name" or something like that for that field, anyway.

I should have thought of that. Thanks for highlighting it. I've renamed
the field.

Updated patch attached.

David

v2-0001-Use-lookup-table-for-units-in-pg_size_pretty-and-.patch
Description: Binary data
Re: [PATCH] expand the units that pg_size_pretty supports on output
On Wed, 7 Jul 2021 at 00:51, Dean Rasheed wrote:
> 10. half_round(21) == 11 :: 20 >> 1 = 10
>
> The correct result should be 10 (it would be very odd to claim that
> 10241 bytes should be displayed as 11kB), but the half-rounding keeps
> rounding up at each stage.
>
> That's a general property of rounding -- you need to be very careful
> when rounding more than once, since otherwise errors will propagate.
> C.f. 4083f445c0, which removed a double-round in numeric sqrt().

Thanks. I've adjusted the patch to re-add the round bool flag and get
rid of the rightShift field. I'm now calculating how many bits to shift
right by based on the difference between the unitbits of the current and
next unit, then taking 1 bit less if the next unit does half rounding
and the current one does not, or adding an extra bit in the opposite
case.

I'll post another patch shortly.

David
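Dean's double-rounding point can be demonstrated directly: half-rounding at every shift step overstates 10241 bytes as 11 kB, while shifting with a single rounding step gives the correct 10 kB. This is a standalone sketch of the arithmetic, not the patch's actual code:

```c
#include <assert.h>
#include <stdint.h>

/* Round-to-nearest halving, as in the example above: half_round(21) == 11. */
uint64_t
half_round(uint64_t x)
{
	return (x + 1) >> 1;
}

/* Shift right one bit at a time, rounding at every step (error propagates). */
uint64_t
shift_rounding_each_step(uint64_t x, int bits)
{
	while (bits-- > 0)
		x = half_round(x);
	return x;
}

/* Shift in one go, rounding exactly once (bits must be > 0). */
uint64_t
shift_rounding_once(uint64_t x, int bits)
{
	return (x + (UINT64_C(1) << (bits - 1))) >> bits;
}
```

Tracing 10241 through ten half-rounding steps gives 5121, 2561, 1281, 641, 321, 161, 81, 41, 21, 11: each step rounds up by at most one, and the errors accumulate. Rounding once, (10241 + 512) >> 10 is 10, since 10241 is only 10.0009... kB.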
Re: Refactor "mutually exclusive options" error reporting code in parse_subscription_options
On Tue, Jul 6, 2021 at 9:24 PM Alvaro Herrera wrote:
>
> On 2021-Jul-06, Bharath Rupireddy wrote:
>
> > Thanks, Amit. I'm posting the 0002 patch which removes extra ereport
> > calls using local variables. Please review it.
>
> I looked at this the other day and I'm not sure I like it very much.
> It's not making anything any simpler, it's barely saving two lines of
> code. I think we can do without this change.

Just for the record: I will withdraw the 0002 patch as no one has shown
interest. Thanks.

Regards,
Bharath Rupireddy.
Warn if initdb's --sync-only option is mixed with other options
When reading through the code for my previous patch [1], I realized that
initdb does *not* warn users that it ignores all other options (except
-D/--pgdata) if the --sync-only option is used.

I'm not able to come up with an exact situation to prove this, but this
behaviour seems potentially dangerous. The user might mix the
--sync-only option with other options, but would be extremely surprised
if those other options didn't take effect.

I _think_ we should throw an error if the user specifies any options
that are being ignored. But an error might break someone's automation
(perhaps for their own good), since the current behaviour has been in
place for a very long time, so I'm willing to settle for at least a
warning in such a case.

[1]: Slightly improve initdb --sync-only option's help message
https://www.postgresql.org/message-id/CABwTF4U6hbNNE1bv%3DLxQdJybmUdZ5NJQ9rKY9tN82NXM8QH%2BiQ%40mail.gmail.com

Best regards,
--
Gurjeet Singh
http://gurjeet.singh.im/

v1-0001-Warn-if-sync-only-is-used-with-other-options.patch
Description: Binary data
Re: postgres_fdw - should we tighten up batch_size, fetch_size options against non-numeric values?
On 2021/07/01 21:41, Bharath Rupireddy wrote:
> On Thu, Jul 1, 2021 at 6:07 PM Fujii Masao wrote:
>> On 2021/07/01 13:16, Bharath Rupireddy wrote:
>>> On Thu, Jul 1, 2021 at 8:23 AM Fujii Masao wrote:
>>>> The recent commit 61d599ede7 documented that the type of those
>>>> options is floating point. But the docs still use "is a numeric
>>>> value" in their descriptions. Probably it should be replaced with
>>>> "is a floating point value" there. If we do this, isn't it better
>>>> to use "floating point" even in the error message?
>>> Agreed. PSA v5 patch.
>> Thanks for updating the patch! LGTM. Barring any objection, I will
>> commit this patch. Thanks.
>>
>> One question is: should we back-patch this? This is not a bug fix, so
>> I'm not sure if it's worth back-patching to already-released versions.
>> But it may be better to do that to v14.
> IMO, it's a good-to-have fix in v14. But -1 for backpatching to v13 and
> lower branches.

Agreed. So I pushed the patch to master and v14. Thanks!

Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
Re: Refactor "mutually exclusive options" error reporting code in parse_subscription_options
On 2021-Jul-07, Peter Smith wrote:

> 1. Zap 'opts' up-front
>
> + *
> + * Caller is expected to have cleared 'opts'.
>
> This comment is putting the onus on the caller to "do the right thing".
>
> I think that hopeful expectations about input should be removed - the
> function should just be responsible itself to zap the SubOpts
> up-front. It makes the code more readable, and it removes any
> potential risk of garbage unintentionally passed in that struct.
>
> /* Start out with cleared opts. */
> memset(opts, 0, sizeof(SubOpts));

Yeah, I gave the initialization aspect some thought too when I reviewed
0001. Having an explicit memset() just for a sanity check is a waste
when you don't really need it; and we're initializing the struct already
at declaration time by assigning {0} to it, so having to add memset
feels like such a waste. Another point in the same area is that some of
the struct members are initialized to some default value different from
0, so I wondered if it would have been better to remove the = {0} and
instead have another function that would set everything up the way we
want; but it seemed a bit excessive, so I ended up not suggesting that.

We have many places in the source tree where the caller is expected to
do the right thing, even when those right things are more complex than
this one. This one place isn't terribly bad in that regard, given that
it's a pretty well contained static struct anyway (we would certainly
not export a struct of this name in any .h file.) I don't think it's
all that bad.

> Alternatively, at least there should be an assertion for some sanity
> check.
>
> Assert(opt->specified_opts == 0);

No opinion on this. It doesn't seem valuable enough, but maybe I'm in
the minority on this.

> 2. Remove redundant conditions
>
> /* Check for incompatible options from the user. */
> - if (enabled && *enabled_given && *enabled)
> + if (opts->enabled &&
> +     IsSet(supported_opts, SUBOPT_ENABLED) &&
> +     IsSet(opts->specified_opts, SUBOPT_ENABLED))
> (etc)

Yeah, I thought about this too when I saw the 0002 patch in this series.
I agree that the extra rechecks are a bit pointless.

--
Álvaro Herrera               PostgreSQL Developer
https://www.EnterpriseDB.com/
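The invariant behind the "redundant conditions" point quoted above (an option can only land in `specified_opts` if it is also in `supported_opts`, so `specified_opts` is always a subset of `supported_opts` and a later recheck of `supported_opts` is pointless) can be sketched with hypothetical names; this is an illustration of the bit-flag pattern, not the actual PostgreSQL code:

```c
#include <assert.h>
#include <stdint.h>

#define SUBOPT_ENABLED     0x01
#define SUBOPT_CREATE_SLOT 0x02
#define IS_SET(bits, opt)  (((bits) & (opt)) != 0)

/* Toy counterpart of the SubOpts struct (illustrative only). */
typedef struct ToyOpts
{
	uint32_t	specified_opts; /* options the user actually gave */
} ToyOpts;

/*
 * Record an option only when the caller declared it supported.  Because
 * this is the sole place that sets bits in specified_opts, the invariant
 * (specified_opts & ~supported_opts) == 0 always holds afterwards, which
 * is why downstream checks need only consult specified_opts.
 */
int
toy_parse_option(ToyOpts *opts, uint32_t supported_opts, uint32_t opt)
{
	if (!IS_SET(supported_opts, opt))
		return 0;				/* unsupported option: rejected */
	opts->specified_opts |= opt;
	return 1;
}
```

Given that invariant, a condition like `IsSet(supported_opts, X) && IsSet(opts->specified_opts, X)` collapses to just the `specified_opts` test, which is the simplification proposed in the review.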
Slightly improve initdb --sync-only option's help message
When reading the output of `initdb --help` I could not clearly
understand what the purpose of the --sync-only option was, until I read
the documentation of initdb.

  -S, --sync-only       only sync data directory

Perhaps the confusion was caused by the fact that sync(hronization)
means different things in different contexts, and many of those contexts
apply to databases and to data directories: time sync, data sync,
replica sync, etc.

I think it would be helpful if the help message was slightly more
descriptive. Some options:

Used in the patch:

  only sync data directory; does not modify any data

To match the wording of the --sync-only option:

  write contents of data directory to disk; helpful after --no-sync option

Clearly specify the system operation used for the option:

  perform fsync on data directory; helpful after --no-sync option

Best regards,
--
Gurjeet Singh
http://gurjeet.singh.im/

v1-0001-Explicitly-state-that-sync-only-does-not-modify-d.patch
Description: Binary data
Re: visibility map corruption
On Tue, Jul 6, 2021 at 08:36:13PM -0400, Bruce Momjian wrote:
> On Tue, Jul 6, 2021 at 06:49:10PM -0400, Bruce Momjian wrote:
> > My point is that there are a lot of internals involved here that are
> > not part of pg_upgrade, though it probably only affects pg_upgrade.
> > Anyway, Bertrand's patch seems to have what I need.
>
> One question is how do we want to handle cases where -x next_xid is
> used but -u oldestXid is not used? Compute a value for oldestXid like
> we did previously? Throw an error? Leave oldestXid unchanged? I am
> thinking the last option.

Here is a modified version of Bertrand's patch, with docs, that does the
last option.

--
Bruce Momjian        https://momjian.us
EDB                  https://enterprisedb.com

If only the physical world exists, free will is an illusion.

oldestxid.diff.gz
Description: application/gzip
Re: Keep notnullattrs in RelOptInfo (Was part of UniqueKey patch series)
On Wed, 7 Jul 2021 at 13:04, Andy Fan wrote:
> Looking forward to watching this change closely, thank you both David
> and Tom! But I still don't understand what faults my way has; do you
> mind telling me the details?

The problem is that we don't need 6 different ways to determine if a Var
can be NULL or not. You're proposing to add a method using Bitmapsets,
and Tom has proposed some ideas around tracking nullability in Vars. We
don't need both.

It seems to me that having it in Var allows us to have a much finer
gradient about where exactly a Var can be NULL. For example:

  SELECT nullablecol FROM tab WHERE nullablecol = ;

If the equality operator is strict then nullablecol can be NULL in the
WHERE clause but not in the SELECT list. Tom's idea should allow us to
determine both of those things but your idea cannot tell them apart, so,
in theory at least, Tom's idea seems better to me.

David
RE: [HACKERS] logical decoding of two-phase transactions
On Tuesday, July 6, 2021 7:18 PM Ajin Cherian wrote:
>
> thanks for the test!
> I hadn't handled the case where sending the schema across was the
> first change of the transaction, as part of the decoding of the
> truncate command. In this test case, the schema was sent across
> without the stream start, hence the error on the apply worker.
> I have updated with a fix. Please do a test and confirm.

Thanks for your patch. I have tested and confirmed that the issue was
fixed.

Regards
Tang
Re: Keep notnullattrs in RelOptInfo (Was part of UniqueKey patch series)
On Tue, Jul 6, 2021 at 9:14 PM Tom Lane wrote:
> David Rowley writes:
> > Tom, I'm wondering if you might get a chance to draw up a design for
> > what you've got in mind with this? I assume adding a new field in
> > Var, but I'm drawing a few blanks on how things might work for
> > equal() when one Var has the field set and another does not.
>
> As I said before, it hasn't progressed much past the handwaving stage,
> but it does seem like it's time to get it done. I doubt I'll have any
> cycles for it during the commitfest, but maybe I can devote a block of
> time during August.
>
> 			regards, tom lane

Looking forward to watching this change closely, thank you both David
and Tom! But I still don't understand what faults my way has; do you
mind telling me the details?

--
Best Regards
Andy Fan (https://www.aliyun.com/)
Re: Re[3]: On login trigger: take three
On Sun, Jul 4, 2021 at 1:21 PM vignesh C wrote:
>
> CFBot shows the following failure:
> # poll_query_until timed out executing this query:
> # SELECT '0/3046250' <= replay_lsn AND state = 'streaming' FROM
> pg_catalog.pg_stat_replication WHERE application_name = 'standby_1';
> # expecting this output:
> # t
> # last actual query output:
> # t
> # with stderr:
> # NOTICE:  You are welcome!
> # Looks like your test exited with 29 before it could output anything.
> t/001_stream_rep.pl ..
> Dubious, test returned 29 (wstat 7424, 0x1d00)

Thanks. I found that the patch was broken by commit f452aaf7d (the part
"adjust poll_query_until to insist on empty stderr as well as a stdout
match"). So I had to remove a "RAISE NOTICE" (which was just an
informational message) from the login trigger function, to satisfy the
new poll_query_until expectations.

Also, I updated a PG14 version check (which must now check for the PG15
version).

Regards,
Greg Nancarrow
Fujitsu Australia

v16-0001-on_client_connect_event_trigger.patch
Description: Binary data
Re: visibility map corruption
On Tue, Jul 6, 2021 at 06:49:10PM -0400, Bruce Momjian wrote:
> On Tue, Jul 6, 2021 at 03:46:48PM -0700, Peter Geoghegan wrote:
> > On Tue, Jul 6, 2021 at 3:30 PM Bruce Momjian wrote:
> > > Yes, I can, though it seems like a much bigger issue than
> > > pg_upgrade. I will be glad to dig into it.
> >
> > I'm not sure what you mean by that. Technically this would be an
> > issue for any program that uses "pg_resetwal -x" in the way that
> > pg_upgrade does, with those same expectations. But isn't pg_upgrade
> > the only known program that behaves like that?
> >
> > I don't see any reason why this wouldn't be treated as a pg_upgrade
> > bug in the release notes, regardless of the exact nature or
> > provenance of the issue -- the pg_upgrade framing seems useful
> > because this is a practical problem for pg_upgrade users alone. Have
> > I missed something?
>
> My point is that there are a lot of internals involved here that are
> not part of pg_upgrade, though it probably only affects pg_upgrade.
> Anyway, Bertrand's patch seems to have what I need.

One question is how do we want to handle cases where -x next_xid is used
but -u oldestXid is not used? Compute a value for oldestXid like we did
previously? Throw an error? Leave oldestXid unchanged? I am thinking the
last option.

--
Bruce Momjian        https://momjian.us
EDB                  https://enterprisedb.com

If only the physical world exists, free will is an illusion.
Re: Refactor "mutually exclusive options" error reporting code in parse_subscription_options
On Tue, Jul 6, 2021 at 6:21 PM Amit Kapila wrote:
>
> On Fri, Jul 2, 2021 at 12:36 PM Amit Kapila wrote:
> >
> > On Fri, Jul 2, 2021 at 8:35 AM Alvaro Herrera wrote:
> > >
> > > > The latest patch sent by Bharath looks good to me. Would you like
> > > > to commit it or shall I take care of it?
> > >
> > > Please, go ahead.
> > >
> >
> > Okay, I'll push it early next week (by Tuesday) unless there are more
> > comments or suggestions. Thanks!
> >
>
> Pushed!

Yesterday, I needed to refactor a lot of code due to this push [1]. The
refactoring exercise caused me to study these v11 changes much more
deeply. IMO there are a few improvements that should be made:

//

1. Zap 'opts' up-front

+ *
+ * Caller is expected to have cleared 'opts'.

This comment is putting the onus on the caller to "do the right thing".

I think that hopeful expectations about input should be removed - the
function should just be responsible itself to zap the SubOpts up-front.
It makes the code more readable, and it removes any potential risk of
garbage unintentionally passed in that struct.

/* Start out with cleared opts. */
memset(opts, 0, sizeof(SubOpts));

Alternatively, at least there should be an assertion for some sanity
check.

Assert(opt->specified_opts == 0);

2. Remove redundant conditions

/* Check for incompatible options from the user. */
- if (enabled && *enabled_given && *enabled)
+ if (opts->enabled &&
+     IsSet(supported_opts, SUBOPT_ENABLED) &&
+     IsSet(opts->specified_opts, SUBOPT_ENABLED))
  ereport(ERROR,
          (errcode(ERRCODE_SYNTAX_ERROR),
  /*- translator: both %s are strings of the form "option = value" */
           errmsg("%s and %s are mutually exclusive options",
                  "connect = false", "enabled = true")));

- if (create_slot && create_slot_given && *create_slot)
+ if (opts->create_slot &&
+     IsSet(supported_opts, SUBOPT_CREATE_SLOT) &&
+     IsSet(opts->specified_opts, SUBOPT_CREATE_SLOT))
  ereport(ERROR,
          (errcode(ERRCODE_SYNTAX_ERROR),
           errmsg("%s and %s are mutually exclusive options",
                  "connect = false", "create_slot = true")));

- if (copy_data && copy_data_given && *copy_data)
+ if (opts->copy_data &&
+     IsSet(supported_opts, SUBOPT_COPY_DATA) &&
+     IsSet(opts->specified_opts, SUBOPT_COPY_DATA))
  ereport(ERROR,
          (errcode(ERRCODE_SYNTAX_ERROR),
           errmsg("%s and %s are mutually exclusive options",
                  "connect = false", "copy_data = true")));

By definition, this function only allows an option into
"specified_opts" if that option is also in "supported_opts". So there is
really no need in the above code to re-check that it is supported. It
can be simplified like this:

/* Check for incompatible options from the user. */
- if (enabled && *enabled_given && *enabled)
+ if (opts->enabled &&
+     IsSet(opts->specified_opts, SUBOPT_ENABLED))
  ereport(ERROR,
          (errcode(ERRCODE_SYNTAX_ERROR),
  /*- translator: both %s are strings of the form "option = value" */
           errmsg("%s and %s are mutually exclusive options",
                  "connect = false", "enabled = true")));

- if (create_slot && create_slot_given && *create_slot)
+ if (opts->create_slot &&
+     IsSet(opts->specified_opts, SUBOPT_CREATE_SLOT))
  ereport(ERROR,
          (errcode(ERRCODE_SYNTAX_ERROR),
           errmsg("%s and %s are mutually exclusive options",
                  "connect = false", "create_slot = true")));

- if (copy_data && copy_data_given && *copy_data)
+ if (opts->copy_data &&
+     IsSet(opts->specified_opts, SUBOPT_COPY_DATA))
  ereport(ERROR,
          (errcode(ERRCODE_SYNTAX_ERROR),
           errmsg("%s and %s are mutually exclusive options",
                  "connect = false", "copy_data = true")));

3. Remove redundant conditions

Same as 2. Here are more examples of conditions where the redundant
checking of "supported_opts" can be removed.

/*
 * Do additional checking for disallowed combination when slot_name = NONE
 * was used.
*/ - if (slot_name && *slot_name_given && !*slot_name) + if (!opts->slot_name && + IsSet(supported_opts, SUBOPT_SLOT_NAME) && + IsSet(opts->specified_opts, SUBOPT_SLOT_NAME)) { - if (enabled && *enabled_given && *enabled) + if (opts->enabled && + IsSet(supported_opts, SUBOPT_ENABLED) && + IsSet(opts->specified_opts, SUBOPT_ENABLED)) ereport(ERROR, (errcode(ERRCODE_SYNTAX_ERROR), /*- translator: both %s are strings of the form "option = value" */ errmsg("%s and %s are mutually exclusive options", "slot_name = NONE", "enabled = true"))); - if (create_slot && create_slot_given && *create_slot) + if (opts->create_slot && + IsSet(supported_opts, SUBOPT_CREATE_SLOT) && + IsSet(opts->specified_opts, SUBOPT_CREATE_SLOT)) ereport(ERROR, (errcode(ERRCODE_SYNTAX_ERROR), + /*- translator: both %s are strings of the form "option = value" */ errmsg("%s and %s are mutually exclusive options", "slot_name = NONE", "create_slot = true"))); It can be simplified like this: /* * Do additional checking for disallowed combination when slot_name = NONE * was used. */ - if (slot_name && *slot_name_given && !*slot_name) + if
Re: Column Filtering in Logical Replication
Hello, here are a few comments on this patch. The patch adds a function get_att_num_by_name; but we have a lsyscache.c function for that purpose, get_attnum. Maybe that one should be used instead? get_tuple_columns_map() returns a bitmapset of the attnos of the columns in the given list, so its name feels wrong. I propose get_table_columnset(). However, this function is invoked for every insert/update change, so it's going to be far too slow to be usable. I think you need to cache the bitmapset somewhere, so that the function is only called on first use. I didn't look very closely, but it seems that struct RelationSyncEntry may be a good place to cache it. The patch adds a new parse node PublicationTable, but doesn't add copyfuncs.c, equalfuncs.c, readfuncs.c, outfuncs.c support for it. Maybe try a compile with WRITE_READ_PARSE_PLAN_TREES and/or COPY_PARSE_PLAN_TREES enabled to make sure everything is covered. (I didn't verify that this actually catches anything ...) The new column in pg_publication_rel is prrel_attr. This name seems at odds with existing column names (we don't use underscores in catalog column names). Maybe prattrs is good enough? prrelattrs? We tend to use plurals for columns that are arrays. It's not super clear to me that strlist_to_textarray() and related processing will behave sanely when the column names contain weird characters such as commas or quotes, or just when used with uppercase column names. Maybe it's worth having tests that try to break such cases. You seem to have left a debugging "elog(LOG)" line in OpenTableList. I got warnings from "git am" about trailing whitespace being added by the patch in two places. Thanks! -- Álvaro Herrera Valdivia, Chile — https://www.EnterpriseDB.com/
Re: visibility map corruption
On Tue, Jul 6, 2021 at 3:49 PM Bruce Momjian wrote: > My point is that there are a lot of internals involved here that are not > part of pg_upgrade, though it probably only affects pg_upgrade. Anyway, > Bertrand's patch seems to have what I need. I was confused by your remarks because I am kind of looking at it from the opposite angle. At least now that I've thought about it a bit. Since the snippet of pg_resetwal code that sets oldestXid wasn't ever intended to be used by pg_upgrade, but was anyway, what we have is something that's clearly totally wrong (at least in the pg_upgrade case). It's not just wrong for pg_upgrade to do things that way -- it's also wildly unreasonable. We heard a complaint about this from Reddit only because it worked "as designed", and so made the cluster immediately have an anti-wraparound autovacuum. But why would anybody want that behavior, even if it was implemented correctly? It simply makes no sense. The consequences of this bug are indeed complicated and subtle and will probably never be fully understood. But at the same time fixing the bug now seems kind of simple. (Working backwards to arrive here was a bit tricky, mind you.) -- Peter Geoghegan
Re: visibility map corruption
On Tue, Jul 6, 2021 at 03:46:48PM -0700, Peter Geoghegan wrote: > On Tue, Jul 6, 2021 at 3:30 PM Bruce Momjian wrote: > > Yes, I can, though it seems like a much bigger issue than pg_upgrade. > > I will be glad to dig into it. > > I'm not sure what you mean by that. Technically this would be an issue > for any program that uses "pg_resetwal -x" in the way that pg_upgrade > does, with those same expectations. But isn't pg_upgrade the only > known program that behaves like that? > > I don't see any reason why this wouldn't be treated as a pg_upgrade > bug in the release notes, regardless of the exact nature or provenance > of the issue -- the pg_upgrade framing seems useful because this is a > practical problem for pg_upgrade users alone. Have I missed something? My point is that there are a lot of internals involved here that are not part of pg_upgrade, though it probably only affects pg_upgrade. Anyway, Bertrand's patch seems to have what I need. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Re: visibility map corruption
On Tue, Jul 6, 2021 at 06:30:41PM -0400, Bruce Momjian wrote: > On Tue, Jul 6, 2021 at 02:27:34PM -0700, Peter Geoghegan wrote: > > > BTW, is it really necessary for copy_xact_xlog_xid to invoke pg_resetwal > > > so many times? Why can't we pass all of the update-this options in one > > > call? > > > > No opinion here. > > > > > Who's going to do the legwork on this? > > > > Can Bruce take care of committing the patch for this? Bruce? > > > > This should definitely be in the next point release IMV. > > Yes, I can, though it seems like a much bigger issue than pg_upgrade. > I will be glad to dig into it. Bertrand Drouvot provided a patch in the thread, so I will start from that; CC'ing him too. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Re: visibility map corruption
On Tue, Jul 6, 2021 at 3:30 PM Bruce Momjian wrote: > Yes, I can, though it seems like a much bigger issue than pg_upgrade. > I will be glad to dig into it. I'm not sure what you mean by that. Technically this would be an issue for any program that uses "pg_resetwal -x" in the way that pg_upgrade does, with those same expectations. But isn't pg_upgrade the only known program that behaves like that? I don't see any reason why this wouldn't be treated as a pg_upgrade bug in the release notes, regardless of the exact nature or provenance of the issue -- the pg_upgrade framing seems useful because this is a practical problem for pg_upgrade users alone. Have I missed something? -- Peter Geoghegan
Re: visibility map corruption
On Tue, Jul 6, 2021 at 02:27:34PM -0700, Peter Geoghegan wrote: > > BTW, is it really necessary for copy_xact_xlog_xid to invoke pg_resetwal > > so many times? Why can't we pass all of the update-this options in one > > call? > > No opinion here. > > > Who's going to do the legwork on this? > > Can Bruce take care of committing the patch for this? Bruce? > > This should definitely be in the next point release IMV. Yes, I can, though it seems like a much bigger issue than pg_upgrade. I will be glad to dig into it. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Re: visibility map corruption
On Tue, Jul 6, 2021 at 3:12 PM Mark Dilger wrote: > Thanks, Peter, for drawing my attention to this. I had already been > following this thread, but had not yet thought about the problem in terms of > amcheck. > > I will investigate possible solutions in verify_heapam(). Thanks! Great that we might be able to make a whole class of bugs detectable with the new amcheck stuff. Glad that I didn't forget about amcheck myself -- I almost forgot. When I was working on the btree amcheck code, I looked for interesting historical bugs and made sure that I could detect them. That seems even more important with heapam. I wouldn't be surprised if one or two important invariants were missed, in part because the heapam design didn't have invariants as a starting point. -- Peter Geoghegan
[PATCH][postgres_fdw] Add push down of CASE WHEN clauses
Hi, I have noticed that postgres_fdw does not push down CASE WHEN clauses. In the following case this is normal: contrib_regression=# EXPLAIN (ANALYZE, VERBOSE) SELECT (CASE WHEN mod(c1, 4) = 0 THEN 1 ELSE 2 END) FROM ft1; QUERY PLAN - Foreign Scan on public.ft1 (cost=100.00..146.00 rows=1000 width=4) (actual time=0.306..0.844 rows=822 loops=1) Output: CASE WHEN (mod(c1, 4) = 0) THEN 1 ELSE 2 END Remote SQL: SELECT "C 1" FROM "S 1"."T 1" Planning Time: 0.139 ms Execution Time: 1.057 ms (5 rows) but in these other cases this is a performance killer: all records are fetched contrib_regression=# EXPLAIN (ANALYZE, VERBOSE) SELECT sum(CASE WHEN mod(c1, 4) = 0 THEN 1 ELSE 2 END) FROM ft1; QUERY PLAN --- Aggregate (cost=148.50..148.51 rows=1 width=8) (actual time=1.421..1.422 rows=1 loops=1) Output: sum(CASE WHEN (mod(c1, 4) = 0) THEN 1 ELSE 2 END) -> Foreign Scan on public.ft1 (cost=100.00..141.00 rows=1000 width=4) (actual time=0.694..1.366 rows=822 loops=1) Output: c1 Remote SQL: SELECT "C 1" FROM "S 1"."T 1" Planning Time: 1.531 ms Execution Time: 3.901 ms (7 rows) contrib_regression=# EXPLAIN (ANALYZE, VERBOSE) SELECT * FROM ft1 WHERE c1 > (CASE WHEN mod(c1, 4) = 0 THEN 1 ELSE 100 END); QUERY PLAN - Foreign Scan on public.ft1 (cost=100.00..148.48 rows=333 width=47) (actual time=0.763..3.003 rows=762 loops=1) Output: c1, c2, c3, c4, c5, c6, c7, c8 Filter: (ft1.c1 > CASE WHEN (mod(ft1.c1, 4) = 0) THEN 1 ELSE 100 END) Rows Removed by Filter: 60 Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" Planning Time: 0.584 ms Execution Time: 3.392 ms (7 rows) The attached patch adds push down of CASE WHEN clauses. 
Queries above have the following plans when this patch is applied: contrib_regression=# EXPLAIN (ANALYZE, VERBOSE) SELECT sum(CASE WHEN mod(c1, 4) = 0 THEN 1 ELSE 2 END) FROM ft1; QUERY PLAN -- Foreign Scan (cost=107.50..128.53 rows=1 width=8) (actual time=2.022..2.024 rows=1 loops=1) Output: (sum(CASE WHEN (mod(c1, 4) = 0) THEN 1 ELSE 2 END)) Relations: Aggregate on (public.ft1) Remote SQL: SELECT sum(CASE WHEN (mod("C 1", 4) = 0) THEN 1 ELSE 2 END) FROM "S 1"."T 1" Planning Time: 0.252 ms Execution Time: 2.684 ms (6 rows) contrib_regression=# EXPLAIN (ANALYZE, VERBOSE) SELECT * FROM ft1 WHERE c1 > (CASE WHEN mod(c1, 4) = 0 THEN 1 ELSE 100 END); QUERY PLAN -- Foreign Scan on public.ft1 (cost=100.00..135.16 rows=333 width=47) (actual time=1.797..3.463 rows=762 loops=1) Output: c1, c2, c3, c4, c5, c6, c7, c8 Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" > CASE WHEN (mod("C 1", 4) = 0) THEN 1 ELSE 100 END)) Planning Time: 0.745 ms Execution Time: 3.860 ms (5 rows) I don't see a good reason to never push the CASE WHEN clause, but perhaps I'm missing something, any thoughts? 
Best regards, -- Gilles Darold MigOps Inc (http://migops.com) diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c index 31919fda8c..8b8ca91e25 100644 --- a/contrib/postgres_fdw/deparse.c +++ b/contrib/postgres_fdw/deparse.c @@ -185,6 +185,7 @@ static void appendAggOrderBy(List *orderList, List *targetList, static void appendFunctionName(Oid funcid, deparse_expr_cxt *context); static Node *deparseSortGroupClause(Index ref, List *tlist, bool force_colno, deparse_expr_cxt *context); +static void deparseCaseExpr(CaseExpr *node, deparse_expr_cxt *context); /* * Helper functions @@ -796,6 +797,53 @@ foreign_expr_walker(Node *node, state = FDW_COLLATE_UNSAFE; } break; + case T_CaseTestExpr: + { +CaseTestExpr *c = (CaseTestExpr *) node; + +/* + * If the expression has nondefault collation, either it's of a + * non-builtin type, or it reflects folding of a CollateExpr. + * It's unsafe to send to the remote unless it's used in a + * non-collation-sensitive context. + */ +collation = c->collation; +if
Re: [PATCH] Allow CustomScan nodes to signal projection support
Aleksander Alekseev writes: >> I named the flag CUSTOMPATH_SUPPORT_PROJECTION similar to the other >> custom node flags, but this would revert the current logic > This seems to be a typical Kobayashi Maru situation, i.e any choice is > a bad one. I suggest keeping the patch as is and hoping that the > developers of existing extensions read the release notes. Yeah, I concur that's the least bad choice. I got annoyed by the fact that the existing checks of CustomPath flags had several randomly different styles for basically identical tests, and this patch wanted to introduce yet another. I cleaned that up and pushed this. regards, tom lane
Re: visibility map corruption
> On Jul 6, 2021, at 2:27 PM, Peter Geoghegan wrote: > > It looks like amcheck's verify_heapam.c functionality almost catches > bugs like this one. Something for Mark (CC'd) to consider. Does it > matter that we usually "ctx.oldest_xid = ctx.relfrozenxid", and so > usually use pg_class.relfrozenxid as our oldest_xid (and not > ShmemVariableCache->oldestXid)? In other words, could we be doing more > to sanitize ShmemVariableCache->oldestXid, especially when the > relation's pg_class.relfrozenxid happens to be set to a real XID? Thanks, Peter, for drawing my attention to this. I had already been following this thread, but had not yet thought about the problem in terms of amcheck. I will investigate possible solutions in verify_heapam(). — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: trivial improvement to system_or_bail
On 2021-Jun-30, Daniel Gustafsson wrote: > + BAIL_OUT("system $_[0] failed: $!\n"); > I wonder if we should take more inspiration from the Perl manual and change it > to "failed to execute" to make it clear that the failure was in executing the > program, not from the program itself? You're right, that's a good distinction to make. I've used this wording. Thanks. > +1 for adding the extra details, but another thing that I've always found > very confusing is just the phrasing of the message itself. It makes > no sense unless (a) you know that "system" is Perl's function for > executing a shell command, (b) you are familiar with Perl's generally > cavalier approach to parentheses, and (c) you are also unbothered by > whether the word "failed" is part of the message text or the command > being complained of. We really need to do something to set off the > shell command's text from the surrounding verbiage a little better. > > I'd prefer something like > > command "pg_ctl start" failed: details here Done that way, thanks for the suggestion. Failures now look like this, respectively: Bailout called. Further testing stopped: failed to execute command "finitdb -D /home/alvherre/Code/pgsql-build/master/src/test/recovery/tmp_check/t_019_replslot_limit_primary_data/pgdata -A trust -N --wal-segsize=1": No such file or directory Bailout called. Further testing stopped: command "initdb -0D /home/alvherre/Code/pgsql-build/master/src/test/recovery/tmp_check/t_019_replslot_limit_primary_data/pgdata -A trust -N --wal-segsize=1" exited with value 1 Bailout called. Further testing stopped: command "initdb -0D /home/alvherre/Code/pgsql-build/master/src/test/recovery/tmp_check/t_019_replslot_limit_primary_data/pgdata -A trust -N --wal-segsize=1" died with signal 11 Previously it was just Bailout called. 
Further testing stopped: system initdb failed -- Álvaro Herrera Valdivia, Chile — https://www.EnterpriseDB.com/ "I must say, I am absolutely impressed with what pgsql's implementation of VALUES allows me to do. It's kind of ridiculous how much "work" goes away in my code. Too bad I can't do this at work (Oracle 8/9)." (Tom Allison) http://archives.postgresql.org/pgsql-general/2007-06/msg00016.php
Re: visibility map corruption
On Tue, Jul 6, 2021 at 11:57 AM Tom Lane wrote: > Peter Geoghegan writes: > > ... We should just carry forward the original oldestXid. > > Yup. It's a bit silly that we recognized the need to do that > for oldestMultiXid yet not for oldestXid. True. But at the same time it somehow doesn't seem silly at all. IME some of the most devious bugs evade detection by hiding in plain sight. It looks like amcheck's verify_heapam.c functionality almost catches bugs like this one. Something for Mark (CC'd) to consider. Does it matter that we usually "ctx.oldest_xid = ctx.relfrozenxid", and so usually use pg_class.relfrozenxid as our oldest_xid (and not ShmemVariableCache->oldestXid)? In other words, could we be doing more to sanitize ShmemVariableCache->oldestXid, especially when the relation's pg_class.relfrozenxid happens to be set to a real XID? > BTW, is it really necessary for copy_xact_xlog_xid to invoke pg_resetwal > so many times? Why can't we pass all of the update-this options in one > call? No opinion here. > Who's going to do the legwork on this? Can Bruce take care of committing the patch for this? Bruce? This should definitely be in the next point release IMV. -- Peter Geoghegan
Re: [PATCH v3 1/1] Fix detection of preadv/pwritev support for OSX.
James Hilliard writes: > On Tue, Jul 6, 2021 at 2:34 PM Tom Lane wrote: >> As far as I can tell, the only way to really deal with #2 is to >> perform a runtime dlsym() probe to see whether pwritev exists, and >> then fall back to our src/port/ implementation if not. This does >> not look particularly hard (especially since the lookup code only >> needs to work on macOS), though I admit I've not tried to code it. > Maybe we just need to do something like this?: > https://lists.gnu.org/archive/html/qemu-devel/2021-01/msg06499.html Hm. I don't trust __builtin_available too much --- does it exist on really old macOS? dlsym() seems less likely to create problems in that respect. Of course, if there are scenarios where __builtin_available knows that the symbol is available but doesn't work, then we might have to use it. In any case, I don't like their superstructure around the probe. What I'd prefer is a function pointer that initially points to the probe code, and then we change it to point at either pwritev or the pg_pwritev emulation. Certainly failing with ENOSYS is exactly what *not* to do. regards, tom lane
Re: [PATCH v3 1/1] Fix detection of preadv/pwritev support for OSX.
On Tue, Jul 6, 2021 at 2:34 PM Tom Lane wrote: > > Peter Eisentraut writes: > > I think this change is perfectly appropriate (modulo some small cleanups). > > I think there are a couple of issues here. > > 1. People who are already using MACOSX_DEPLOYMENT_TARGET to control > their builds would like to keep on doing so, but the AC_CHECK_FUNCS > probe doesn't work with that. James' latest patch seems like a > reasonable fix for that (it's a lot less invasive than where we > started from). There is a worry about side-effects on other > platforms, but I don't see an answer to that that's better than > "throw it on the buildfarm and see if anything breaks". > > However ... > > 2. We'd really like to use preadv/pwritev where available. I > maintain that MACOSX_DEPLOYMENT_TARGET is not only not the right > approach to that, but it's actually counterproductive. It forces > you to build for the lowest common denominator, ie the oldest macOS > release you want to support. Even when running on a release that > has pwritev, your build will never use it. > > As far as I can tell, the only way to really deal with #2 is to > perform a runtime dlsym() probe to see whether pwritev exists, and > then fall back to our src/port/ implementation if not. This does > not look particularly hard (especially since the lookup code only > needs to work on macOS), though I admit I've not tried to code it. Maybe we just need to do something like this?: https://lists.gnu.org/archive/html/qemu-devel/2021-01/msg06499.html > > What's unclear to me at the moment is whether #1 and #2 interact, > ie is there still any point in changing configure's probes if > we put in a runtime check on Darwin? I think that we might want > to pay no attention to what the available header files say about > pwritev, as long as we can get the correct 'struct iovec' > declaration from them. > > regards, tom lane
Re: rand48 replacement
Hello Yura, However, I'm not enthusiastic about combining two methods depending on the range, the function looks complex enough without that, so I would suggest not to take this option. Also, the decision process adds to the average cost, which is undesirable. Given 99.99% cases will be in the likely case, branch predictor should eliminate decision cost. Hmmm. ISTM that a branch predictor should predict that unknown < small should probably be false, so a hint should be given that it is really true. And as Dean Rasheed correctly mentioned, mask method will have really bad pattern for branch predictor if range is not just below or equal to power of two. On average the bitmask is the better unbiased method, if the online figures are to be trusted. Also, as already said, I do not really want to add code complexity, especially to get lower average performance, and especially with code like "threshold = - range % range", where both variables are unsigned, I have a headache just looking at it:-) And __builtin_clzl is not a free lunch either, it has latency 3-4 cycles on modern processors. Well, % is not cheap either. Well, probably it could run in parallel with some part of xoroshiro, but it depends on how the compiler will optimize this function. I would certainly select the unbiased multiply method if we want a u32 range variant. There could be two functions. Yep, but do we need them? Who is likely to want 32-bit pseudo-random ints in a range? pgbench needs 64 bits. So I'm still inclined to just keep the faster-on-average bitmask method, despite that it may be slower for some ranges. The average cost for the worst case in PRNG calls is, if I'm not mistaken: 1 * 0.5 + 2 * 0.25 + 3 * 0.125 + ... ~ 2 which does not look too bad. -- Fabien.
Re: [PATCH v3 1/1] Fix detection of preadv/pwritev support for OSX.
Peter Eisentraut writes: > I think this change is perfectly appropriate (modulo some small cleanups). I think there are a couple of issues here. 1. People who are already using MACOSX_DEPLOYMENT_TARGET to control their builds would like to keep on doing so, but the AC_CHECK_FUNCS probe doesn't work with that. James' latest patch seems like a reasonable fix for that (it's a lot less invasive than where we started from). There is a worry about side-effects on other platforms, but I don't see an answer to that that's better than "throw it on the buildfarm and see if anything breaks". However ... 2. We'd really like to use preadv/pwritev where available. I maintain that MACOSX_DEPLOYMENT_TARGET is not only not the right approach to that, but it's actually counterproductive. It forces you to build for the lowest common denominator, ie the oldest macOS release you want to support. Even when running on a release that has pwritev, your build will never use it. As far as I can tell, the only way to really deal with #2 is to perform a runtime dlsym() probe to see whether pwritev exists, and then fall back to our src/port/ implementation if not. This does not look particularly hard (especially since the lookup code only needs to work on macOS), though I admit I've not tried to code it. What's unclear to me at the moment is whether #1 and #2 interact, ie is there still any point in changing configure's probes if we put in a runtime check on Darwin? I think that we might want to pay no attention to what the available header files say about pwritev, as long as we can get the correct 'struct iovec' declaration from them. regards, tom lane
Re: Using COPY FREEZE in pgbench
On Sun, 4 Jul 2021 at 09:32, Tatsuo Ishii wrote: > > >> So overall gain by the patch is around 15%, whereas the last test > >> before the commit was 14%. It seems the patch is still beneficial > >> after the commit. > > > > Yes, that's good! > > Yeah! > I tested this with -s100 and got similar results, but with -s1000 it was more like 1.5x faster: master: done in 111.33 s (drop tables 0.00 s, create tables 0.01 s, client-side generate 52.45 s, vacuum 32.30 s, primary keys 26.58 s) patch: done in 74.04 s (drop tables 0.46 s, create tables 0.04 s, client-side generate 51.81 s, vacuum 2.11 s, primary keys 19.63 s) Nice! Regards, Dean
Re: Hook for extensible parsing.
On Sat, Jun 12, 2021 at 4:29 AM Julien Rouhaud wrote: > I'd like to propose an alternative approach, which is to allow multiple > parsers > to coexist, and let third-party parsers optionally fall back on the core > parsers. I'm sending this now as a follow-up of [1] and to avoid duplicated > efforts, as multiple people are interested in that topic. The patches all build properly and pass all regression tests. > pg_parse_query() will instruct plugins to parse a query at a time. They're > free to ignore that mode if they want to implement the 3rd mode. If so, they > should either return multiple RawStmt, a single RawStmt with a 0 or > strlen(query_string) stmt_len, or error out. Otherwise, they will implement > either mode 1 or 2, and they should always return a List containing a single > RawStmt with properly set stmt_len, even if the underlying statement is NULL. > This is required to properly skip valid strings that don't contain a > statements, and pg_parse_query() will skip RawStmt that don't contain an > underlying statement. Wouldn't we want to only loop through the individual statements if parser_hook exists? The current patch seems to go through the new code path regardless of the hook being grabbed.
Re: proposal - log_full_scan
> On 6 Jul 2021, at 18:14, Pavel Stehule wrote: > I thought about it more, and sometimes bitmap index scans are problematic > too, index scans in nested loops can be a problem too. Right. Depending on the circumstances, pretty much anything in a plan can be something deemed problematic in some production setting. > Unfortunately, this idea is not well prepared. My patch was a proof of concept - > but I think it is not a good starting point. Maybe it needs some tracing API > on executor level and some tool like "perf top", but for the executor. Post- > execution analysis is not a good direction with big overhead, and mainly it > is not friendly in critical situations. I need some handy tool like perf, but > for executor nodes. I don't know how to do it effectively. These are hot codepaths, so adding tracing instrumentation which collects enough information to be useful, and which can be drained fast enough to not be a resource problem, is tricky. > Thank you for your review and for your time, but I think it is better to > remove this patch from the commitfest. I have no idea how to design this feature > well :-/ No worries, I hope we see an updated approach at some time. In the meantime I'm marking this patch Returned with Feedback. -- Daniel Gustafsson https://vmware.com/
Re: prion failed with ERROR: missing chunk number 0 for toast value 14334 in pg_toast_2619
On Wed, Jun 30, 2021 at 03:29:41PM +0900, Michael Paquier wrote: > On Tue, May 18, 2021 at 01:04:18PM +0200, Drouvot, Bertrand wrote: > > On 5/17/21 8:56 PM, Andres Freund wrote: > >> On 2021-05-17 20:14:40 +0200, Drouvot, Bertrand wrote: > >>> I was also wondering if: > >>> > >>> * We should keep the old behavior in case pg_resetwal -x is being used > >>> without -u? (The proposed patch does not set an arbitrary oldestXID > >>> anymore in case -x is used) > >> I don't think we should. I don't see anything in the old behaviour worth > >> maintaining. > > So, pg_resetwal logic with the oldest XID assignment is causing some > problem here. This open item is opened for some time now and it is > idle for a couple of weeks. It looks that we have some solution > drafted, to be able to move forward, with the following things (no > patches yet): > - More robustness safety checks in procarray.c. > - A rework of oldestXid in pg_resetwal. > > Is there somebody working on that? Bertrand sent a patch which is awaiting review. This allows both of Tom's reproducers to pass pg_upgrade (with assertions and delays). http://cfbot.cputube.org/bertrand-drouvot.html I re-arranged the pg_upgrade output of that patch: it was in the middle of the two halves: "Setting next transaction ID and epoch for new cluster" -- Justin >From 30204d85005d6288d2dc93bcd28ff4fb2454a703 Mon Sep 17 00:00:00 2001 From: Tom Lane Date: Sun, 16 May 2021 16:23:02 -0400 Subject: [PATCH 1/3] prion failed with ERROR: missing chunk number 0 for toast value 14334 in pg_toast_2619 It also seems like some assertions in procarray.c would be a good idea. With the attached patch, we get through core regression just fine, but the pg_upgrade test fails immediately after the "Resetting WAL archives" step. 
---
 src/backend/storage/ipc/procarray.c | 20
 1 file changed, 20 insertions(+)

diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 4c91e721d0..324e105c59 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -2486,6 +2486,15 @@ GetSnapshotData(Snapshot snapshot)
 											oldestfxid);
 		/* accurate value known */
 		GlobalVisTempRels.maybe_needed = GlobalVisTempRels.definitely_needed;
+
+		/* Do basic sanity check on these XIDs */
+		Assert(FullTransactionIdPrecedesOrEquals(GlobalVisSharedRels.maybe_needed,
+												 GlobalVisSharedRels.definitely_needed));
+		Assert(FullTransactionIdPrecedesOrEquals(GlobalVisCatalogRels.maybe_needed,
+												 GlobalVisCatalogRels.definitely_needed));
+		Assert(FullTransactionIdPrecedesOrEquals(GlobalVisDataRels.maybe_needed,
+												 GlobalVisDataRels.definitely_needed));
+		/* not much point in checking GlobalVisTempRels, given the above */
 	}

 	RecentXmin = xmin;
@@ -4020,6 +4029,8 @@ GlobalVisTestFor(Relation rel)
 	Assert(FullTransactionIdIsValid(state->definitely_needed) &&
 		   FullTransactionIdIsValid(state->maybe_needed));
+	Assert(FullTransactionIdPrecedesOrEquals(state->maybe_needed,
+											 state->definitely_needed));

 	return state;
 }
@@ -4085,6 +4096,15 @@ GlobalVisUpdateApply(ComputeXidHorizonsResult *horizons)
 											GlobalVisDataRels.definitely_needed);
 	GlobalVisTempRels.definitely_needed = GlobalVisTempRels.maybe_needed;

+	/* Do basic sanity check on these XIDs */
+	Assert(FullTransactionIdPrecedesOrEquals(GlobalVisSharedRels.maybe_needed,
+											 GlobalVisSharedRels.definitely_needed));
+	Assert(FullTransactionIdPrecedesOrEquals(GlobalVisCatalogRels.maybe_needed,
+											 GlobalVisCatalogRels.definitely_needed));
+	Assert(FullTransactionIdPrecedesOrEquals(GlobalVisDataRels.maybe_needed,
+											 GlobalVisDataRels.definitely_needed));
+	/* not much point in checking GlobalVisTempRels, given the above */
+
 	ComputeXidHorizonsResultLastXmin = RecentXmin;
 }
--
2.17.0

From 544a88b1651b0db177d5883219c7ae75f1100f7b Mon Sep 17 00:00:00 2001
From: "Drouvot, Bertrand"
Date: Tue, 18 May 2021 13:26:38 +0200
Subject: [PATCH 2/3] pg_upgrade can result in early wraparound on databases with high transaction load
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

On 5/4/21 10:17 AM, Drouvot, Bertrand wrote:

Copy/pasting Andres' feedback (thanks, Andres, for this feedback) on those questions from another thread [1].

> I was also wondering if:
>
> * We should keep the old behavior in case pg_resetwal -x is being used
>   without -u? (The proposed patch does not set an arbitrary oldestXID
>   anymore in case -x is used)

Andres: I don't think we should. I don't see anything in the old behaviour worth maintaining.

> * We should ensure that the xid provided with -x or -u is
>   >= FirstNormalTransactionId (currently the only check is that it is
>   != 0)?

Andres: Applying TransactionIdIsNormal() seems like a good idea.

=> I am attaching a new version that makes use of TransactionIdIsNormal() checks.
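The TransactionIdIsNormal() check agreed on above amounts to requiring the supplied XID to be at least FirstNormalTransactionId, since PostgreSQL reserves XIDs 0-2 (invalid, bootstrap, frozen). A minimal standalone sketch — the constants mirror transam.h, but this is not the real pg_resetwal code:

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Reserved XIDs, as in src/include/access/transam.h */
#define InvalidTransactionId		((TransactionId) 0)
#define BootstrapTransactionId		((TransactionId) 1)
#define FrozenTransactionId			((TransactionId) 2)
#define FirstNormalTransactionId	((TransactionId) 3)

/* A "normal" XID is one that can belong to an ordinary transaction. */
static int
transaction_id_is_normal(TransactionId xid)
{
	return xid >= FirstNormalTransactionId;
}
```

With this check, `pg_resetwal -x 1` or `-u 2` would be rejected instead of silently accepted as merely nonzero.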
Re: visibility map corruption
Peter Geoghegan writes: > ... We should just carry forward the original oldestXid. Yup. It's a bit silly that we recognized the need to do that for oldestMultiXid yet not for oldestXid. BTW, is it really necessary for copy_xact_xlog_xid to invoke pg_resetwal so many times? Why can't we pass all of the update-this options in one call? Who's going to do the legwork on this? regards, tom lane
Re: visibility map corruption
On Tue, Jul 6, 2021 at 10:58 AM Bruce Momjian wrote: > Well, pg_upgrade corruptions are rare, but so is modifying > autovacuum_freeze_max_age. If we have a corruption and we know > autovacuum_freeze_max_age was modified, odds are that is the cause. My point is that there isn't necessarily that much use in trying to determine what really happened here. It would be nice to know for sure, but it shouldn't affect what we do about the bug. In a way the situation with the bug is simple. Clearly Tom wasn't thinking of pg_upgrade when he wrote the relevant pg_resetwal code that sets oldestXid, because pg_upgrade didn't exist at the time. He was thinking of restoring the database to a relatively sane state in the event of some kind of corruption, with all of the uncertainty that goes with that. Nobody noticed that pg_upgrade gets this same behavior until much more recently. Now that we see the problem laid out, there isn't much to think about that will affect the response to the issue. At least as far as I can tell. We know that pg_upgrade uses pg_resetwal's -x flag in a context where there is no reason at all to think that the database is corrupt -- Tom can't have anticipated that all those years ago. It's easy to see that the behavior is wrong for pg_upgrade, and it's very hard to imagine any way in which it might have accidentally made some sense all along. We should just carry forward the original oldestXid. -- Peter Geoghegan
Re: [PATCH] Pull general SASL framework out of SCRAM
On Mon, 2021-07-05 at 17:17 +0900, Michael Paquier wrote:
> On Wed, Jun 30, 2021 at 10:30:12PM +, Jacob Champion wrote:
> > Done in v3, with a second patch for the code motion.
>
> I have gone through that, tweaking the documentation you have added as
> that's the meat of the patch, reworking a bit the declarations of the
> callbacks (no need for several typedefs here) and doing some small
> format changes to make the indentation happy. And that looks pretty
> good.

Looks very good, thanks! A few comments on the docs changes:

> + * Output parameters:
> + *
> + *	buf: A StringInfo buffer that the callback should populate with
> + *		supported mechanism names. The names are appended into this
> + *		StringInfo, separated by '\0' bytes.

Each name must be null-terminated, not just null-separated. That way the list of names ends with an empty string:

    name-one\0  <- added by the mechanism
    name-two\0  <- added by the mechanism
    \0          <- added by the framework

The way it's worded now, I could see some implementers failing to terminate the final name because the framework adds a trailing null already -- but the framework is terminating the list, not the final name.

> + * init()
> + *
> + *	Initializes mechanism-specific state for a connection. This
> + *	callback must return a pointer to its allocated state, which will
> + *	be passed as-is as the first argument to the other callbacks.
> + *	free() is called to release any state resources.

Maybe say "The free() callback is called" to differentiate it from standard free()?

> It is a bit sad that the SCRAM part cannot be completely
> unplugged from the auth part, because of the call to the free function
> and the HBA checks, but adding more wrappers to accommodate that
> is not really worth it.

Yeah. I think that additional improvements/refactoring here will come naturally if clients are ever allowed to negotiate SASL mechanisms in the future. Doesn't need to happen now.
> - if (!pg_fe_scram_channel_bound(conn->sasl_state)) > + if (!conn->sasl || > !conn->sasl->channel_bound(conn->sasl_state)) > conn->sasl should be set in this code path. This style is safer. It's possible for conn->sasl to be NULL here, say if the client has channel_binding=require but connects as a user with an MD5 secret. The SCRAM TAP tests have one such case. > The top comment of scram_init() still mentioned > pg_be_scram_get_mechanisms(), while it should be > scram_get_mechanisms(). > > PG_MAX_SASL_MESSAGE_LENGTH can stay within auth-sasl.c. Looks good to me. > > - I don't think it's legal for a client to refuse a challenge from the > > server without aborting the exchange, so we should probably check to > > make sure that client responses are non-NULL in the success case. > > Hmm. Does the RFCs tell us anything about that? Just in general terms: >Each authentication exchange consists of a message from the client to >the server requesting authentication via a particular mechanism, >followed by one or more pairs of challenges from the server and >responses from the client, followed by a message from the server >indicating the outcome of the authentication exchange. (Note: >exchanges may also be aborted as discussed in Section 3.5.) So a challenge must be met with a response, or the exchange must be aborted. (And I don't think our protocol implementation provides a client abort message; if something goes wrong, we just tear down the connection.) Thanks, --Jacob
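The termination convention discussed above — each mechanism name carries its own NUL, and the framework then appends one more NUL so the list ends with an empty string — can be sketched in plain C. This is a simplified illustration using a raw buffer; the real code uses PostgreSQL's StringInfo API:

```c
#include <assert.h>
#include <string.h>

/* Mechanism side: append one name, *including* its terminating NUL.
 * Returns the new list length. */
static size_t
append_mechanism(char *buf, size_t len, const char *name)
{
	size_t		n = strlen(name) + 1;	/* keep the name's own terminator */

	memcpy(buf + len, name, n);
	return len + n;
}

/* Framework side: terminate the whole list with an empty string. */
static size_t
terminate_list(char *buf, size_t len)
{
	buf[len] = '\0';
	return len + 1;
}

/* Walk NUL-terminated names until the empty string that ends the list. */
static int
count_mechanisms(const char *buf)
{
	int			count = 0;

	while (*buf != '\0')
	{
		count++;
		buf += strlen(buf) + 1;
	}
	return count;
}
```

A reader (like count_mechanisms here) stops at the empty string, which only exists if the framework's extra NUL terminates the list rather than the final name.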
Re: Pipeline mode and PQpipelineSync()
On 2021-Jul-06, Boris Kolpackov wrote: > Alvaro Herrera writes: > > > Ah, yes it does. I can reproduce this now. I thought PQconsumeInput > > was sufficient, but it's not: you have to have the PQgetResult in there > > too. Looking ... > > Any progress on fixing this? Yeah ... the problem as I understand it is that the state transition in libpq when the connection is in pipeline aborted state is bogus. I'll post a patch in a bit. -- Álvaro Herrera Valdivia, Chile — https://www.EnterpriseDB.com/ #error "Operator lives in the wrong universe" ("Use of cookies in real-time system development", M. Gleixner, M. Mc Guire)
Re: Pipeline mode and PQpipelineSync()
On 2021-Jul-06, Boris Kolpackov wrote:

> Alvaro Herrera writes:
>
> > Ah, yes it does. I can reproduce this now. I thought PQconsumeInput
> > was sufficient, but it's not: you have to have the PQgetResult in there
> > too. Looking ...
>
> Any progress on fixing this?

Can you please try with this patch?

Thanks

--
Álvaro Herrera 39°49'30"S 73°17'W — https://www.EnterpriseDB.com/

From c84b66821249631ba1654b22866ca54c49f238c4 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera
Date: Tue, 6 Jul 2021 12:46:42 -0400
Subject: [PATCH] fix libpq state machine

---
 src/interfaces/libpq/fe-exec.c | 46 ++
 1 file changed, 35 insertions(+), 11 deletions(-)

diff --git a/src/interfaces/libpq/fe-exec.c b/src/interfaces/libpq/fe-exec.c
index b13ddab393..1295a417c1 100644
--- a/src/interfaces/libpq/fe-exec.c
+++ b/src/interfaces/libpq/fe-exec.c
@@ -1223,7 +1223,8 @@ pqAllocCmdQueueEntry(PGconn *conn)

 /*
  * pqAppendCmdQueueEntry
- *		Append a caller-allocated command queue entry to the queue.
+ *		Append a caller-allocated command queue entry to the queue, and update
+ *		conn->asyncStatus as needed to account for it.
  *
  * The query itself must already have been put in the output buffer by the
  * caller.
@@ -1239,6 +1240,33 @@ pqAppendCmdQueueEntry(PGconn *conn, PGcmdQueueEntry *entry)
 		conn->cmd_queue_tail->next = entry;

 	conn->cmd_queue_tail = entry;
+
+	switch (conn->pipelineStatus)
+	{
+		case PQ_PIPELINE_OFF:
+		case PQ_PIPELINE_ON:
+			/*
+			 * If there's a result ready to be consumed, let it be so (that
+			 * is, don't change away from READY or READY_MORE); otherwise we
+			 * wait for some result to arrive from the server.
+			 */
+			if (conn->asyncStatus == PGASYNC_IDLE)
+				conn->asyncStatus = PGASYNC_BUSY;
+			break;
+
+			/*
+			 * If in aborted pipeline state, we don't expect anything from the
+			 * server, so we have to let PQgetResult consume from the aborted
+			 * queue right away.
+			 */
+		case PQ_PIPELINE_ABORTED:
+			if (conn->asyncStatus == PGASYNC_IDLE)
+			{
+				resetPQExpBuffer(&conn->errorMessage);
+				pqPipelineProcessQueue(conn);
+			}
+			break;
+	}
 }

 /*
@@ -1375,7 +1403,6 @@ PQsendQueryInternal(PGconn *conn, const char *query, bool newQuery)

 	/* OK, it's launched! */
 	pqAppendCmdQueueEntry(conn, entry);
-	conn->asyncStatus = PGASYNC_BUSY;
 	return 1;

 sendFailed:
@@ -1510,10 +1537,6 @@ PQsendPrepare(PGconn *conn,
 	/* if insufficient memory, query just winds up NULL */
 	entry->query = strdup(query);

-	pqAppendCmdQueueEntry(conn, entry);
-
-	conn->asyncStatus = PGASYNC_BUSY;
-
 	/*
 	 * Give the data a push (in pipeline mode, only if we're past the size
 	 * threshold). In nonblock mode, don't complain if we're unable to send
@@ -1522,6 +1545,8 @@ PQsendPrepare(PGconn *conn,
 	if (pqPipelineFlush(conn) < 0)
 		goto sendFailed;

+	pqAppendCmdQueueEntry(conn, entry);
+
 	return 1;

 sendFailed:
@@ -1815,7 +1840,7 @@ PQsendQueryGuts(PGconn *conn,

 	/* OK, it's launched! */
 	pqAppendCmdQueueEntry(conn, entry);
-	conn->asyncStatus = PGASYNC_BUSY;
+
 	return 1;

 sendFailed:
@@ -2445,7 +2470,7 @@ PQsendDescribe(PGconn *conn, char desc_type, const char *desc_target)

 	/* OK, it's launched! */
 	pqAppendCmdQueueEntry(conn, entry);
-	conn->asyncStatus = PGASYNC_BUSY;
+
 	return 1;

 sendFailed:
@@ -3072,15 +3097,14 @@ PQpipelineSync(PGconn *conn)
 		pqPutMsgEnd(conn) < 0)
 		goto sendFailed;

-	pqAppendCmdQueueEntry(conn, entry);
-
 	/*
 	 * Give the data a push. In nonblock mode, don't complain if we're unable
 	 * to send it all; PQgetResult() will do any additional flushing needed.
 	 */
 	if (PQflush(conn) < 0)
 		goto sendFailed;
-	conn->asyncStatus = PGASYNC_BUSY;
+
+	pqAppendCmdQueueEntry(conn, entry);

 	return 1;
--
2.20.1
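The core of the state transition in the patch can be modeled outside libpq: appending a queue entry normally flips an idle connection to busy (waiting on the server), but in an aborted pipeline no reply will arrive, so the queue must be consumed immediately. A toy model with an invented Conn struct — not libpq's internals:

```c
#include <assert.h>

typedef enum { PIPELINE_OFF, PIPELINE_ON, PIPELINE_ABORTED } PipelineStatus;
typedef enum { ASYNC_IDLE, ASYNC_BUSY, ASYNC_READY } AsyncStatus;

typedef struct
{
	PipelineStatus pipeline;
	AsyncStatus async;
	int			queue_processed;	/* counts immediate queue processing */
} Conn;

/* Model of what pqAppendCmdQueueEntry does after linking the entry in. */
static void
append_cmd_queue_entry(Conn *conn)
{
	switch (conn->pipeline)
	{
		case PIPELINE_OFF:
		case PIPELINE_ON:
			/* Wait for a server reply, unless a result is already ready. */
			if (conn->async == ASYNC_IDLE)
				conn->async = ASYNC_BUSY;
			break;

		case PIPELINE_ABORTED:
			/* No reply is coming; consume the aborted queue right away. */
			if (conn->async == ASYNC_IDLE)
				conn->queue_processed++;
			break;
	}
}
```

The bug being fixed was exactly this missing third arm: in the aborted state, waiting for the server deadlocks the client, since nothing will ever arrive to wake it.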
Re: visibility map corruption
On Tue, Jul 6, 2021 at 10:32:24AM -0700, Peter Geoghegan wrote: > On Tue, Jul 6, 2021 at 10:27 AM Bruce Momjian wrote: > > OK, this is confirmation that the pg_resetwal bug, and its use by > > pg_upgrade, is a serious issue that needs to be addressed. I am > > prepared to work on it now. > > To be clear, I'm not 100% sure that this is related to the pg_upgrade > + "pg_resetwal sets oldestXid to an invented value" issue. I am sure > that that is a serious issue that needs to be addressed sooner rather > than later, though. Well, pg_upgrade corruptions are rare, but so is modifying autovacuum_freeze_max_age. If we have a corruption and we know autovacuum_freeze_max_age was modified, odds are that is the cause. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Re: Planning time grows exponentially with levels of nested views
Dean Rasheed writes: > I took a look at this and wasn't able to find any way to break it, and > your argument that it can't really make such rewriter bugs any worse > makes sense. Thanks for looking! > Would it make sense to update the comment prior to copying the subquery? Yeah, I hadn't touched that yet because the question was exactly about whether it's correct or not. I think we can shorten it to * Need a modifiable copy of the subquery to hack on, so that the * RTE can be left unchanged in case we decide below that we can't * pull it up after all. > Out of curiosity, I also tested DML against these deeply nested views > to see how the pull-up code in the rewriter compares in terms of > performance, since it does a very similar job. As expected, it's > O(N^2) as well, but it's about an order of magnitude faster: Oh good. I hadn't thought to look at that angle of things. > ... for a one-line change, that's pretty impressive. Yeah. The problem might get less bad if we get anywhere with the idea I suggested at [1]. If we can reduce the number of RTEs in a view's query, then copying it would get cheaper. Still, not copying it at all is always going to be better. I'll go ahead and push the patch. regards, tom lane [1] https://www.postgresql.org/message-id/697679.1625154303%40sss.pgh.pa.us
Re: [CLOBBER_CACHE]Server crashed with segfault 11 while executing clusterdb
Amul Sul writes: > On Tue, Apr 20, 2021 at 6:59 AM Kyotaro Horiguchi > wrote: >> I don't mind RelationGetSmgr(index)->smgr_rnode alone or >> >member alone and there's not the previous call to >> RelationGetSmgr just above. How about using a temporary variable? >> >> SMgrRelation srel = RelationGetSmgr(index); >> smgrwrite(srel, ...); >> log_newpage(srel->..); > Understood. Used a temporary variable for the place where > RelationGetSmgr() calls are placed too close or in a loop. [ squint... ] Doesn't this risk introducing exactly the sort of cache-clobber hazard we're trying to prevent? That is, the above is not safe unless you are *entirely* certain that there is not and never will be any possibility of a relcache flush before you are done using the temporary variable. Otherwise it can become a dangling pointer. The point of the static-inline function idea was to be cheap enough that it isn't worth worrying about this sort of risky optimization. Given that an smgr function is sure to involve some kernel calls, I doubt it's worth sweating over an extra test-and-branch beforehand. So where I was hoping to get to is that smgr objects are *only* referenced by RelationGetSmgr() calls and nobody ever keeps any other pointers to them across any non-smgr operations. regards, tom lane
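The hazard Tom describes — a cached SMgrRelation pointer going stale across a relcache flush — can be illustrated with a toy model. The types here are invented stand-ins; the real RelationGetSmgr() is a static-inline wrapper around smgropen():

```c
#include <assert.h>
#include <stddef.h>

typedef struct SMgrRelationData { int valid; } SMgrRelationData;
typedef struct RelationData { SMgrRelationData *rd_smgr; } RelationData;

static SMgrRelationData storage = { 1 };

/* The safe pattern: re-resolve the smgr pointer on every access, so a
 * flush between two uses cannot leave us holding a dangling pointer. */
static inline SMgrRelationData *
relation_get_smgr(RelationData *rel)
{
	if (rel->rd_smgr == NULL)
		rel->rd_smgr = &storage;	/* stand-in for smgropen() */
	return rel->rd_smgr;
}

/* A relcache invalidation drops the cached pointer at an arbitrary time. */
static void
simulate_cache_flush(RelationData *rel)
{
	rel->rd_smgr = NULL;
}
```

A temporary `SMgrRelation srel = relation_get_smgr(rel);` held across simulate_cache_flush() would keep pointing at storage the relation no longer owns in the real system — which is why the accessor-per-use style is what the thread settles on.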
Re: visibility map corruption
On Tue, Jul 6, 2021 at 10:27 AM Bruce Momjian wrote: > OK, this is confirmation that the pg_resetwal bug, and its use by > pg_upgrade, is a serious issue that needs to be addressed. I am > prepared to work on it now. To be clear, I'm not 100% sure that this is related to the pg_upgrade + "pg_resetwal sets oldestXid to an invented value" issue. I am sure that that is a serious issue that needs to be addressed sooner rather than later, though. -- Peter Geoghegan
Re: Planning time grows exponentially with levels of nested views
On Sun, 18 Apr 2021 at 21:42, Tom Lane wrote:
>
> > If multiple references are actually possible then this'd break it.
>
> I think this patch doesn't make things any worse for such a case though.
> If we re-introduced such a bug, the result would be an immediate null
> pointer crash while trying to process the second reference to a
> flattenable subquery. That's probably better for debuggability than
> what happens now, where we just silently process the duplicate reference.

I took a look at this and wasn't able to find any way to break it, and your argument that it can't really make such rewriter bugs any worse makes sense. Would it make sense to update the comment prior to copying the subquery?

Out of curiosity, I also tested DML against these deeply nested views to see how the pull-up code in the rewriter compares in terms of performance, since it does a very similar job. As expected, it's O(N^2) as well, but it's about an order of magnitude faster (times to run a plain EXPLAIN in ms, with patch):

          | SELECT | INSERT | UPDATE | DELETE
     -----+--------+--------+--------+--------
     v64  |  1.259 |  0.189 |  0.250 |  0.187
     v128 |  5.035 |  0.506 |  0.547 |  0.509
     v256 | 20.393 |  1.633 |  1.607 |  1.638
     v512 | 81.101 |  6.649 |  6.517 |  6.699

Maybe that's not surprising, since there's less to do at that stage. Anyway, it's reassuring to know that it copes OK with this (I've seen some quite deeply nested views in practice, but never that deep).

For comparison, this is what SELECT performance looked like for me without the patch:

          |   SELECT
     -----+----------
     v64  |    9.589
     v128 |   73.292
     v256 |  826.964
     v512 | 7844.419

so, for a one-line change, that's pretty impressive.

Regards, Dean
Re: visibility map corruption
On Sun, Jul 4, 2021 at 10:28:25PM +, Floris Van Nee wrote: > > > > I wonder if it's related to this issue: > > > > https://www.postgresql.org/message- > > id/20210423234256.hwopuftipdmp3...@alap3.anarazel.de > > > > Have you increased autovacuum_freeze_max_age from its default? This > > already sounds like the kind of database where that would make > > sense. > > > > autovacuum_freeze_max_age is increased in our setup indeed (it is > set to 500M). However, we do regularly run manual VACUUM (FREEZE) > on individual tables in the database, including this one. A lot of > tables in the database follow an INSERT-only pattern and since it's > not running v13 yet, this meant that these tables would only rarely > be touched by autovacuum. Autovacuum would sometimes kick in on some > of these tables at the same time at unfortunate moments. Therefore we > have some regular job running that VACUUM (FREEZE)s tables with a xact > age higher than a (low, 10M) threshold ourselves. OK, this is confirmation that the pg_resetwal bug, and its use by pg_upgrade, is a serious issue that needs to be addressed. I am prepared to work on it now. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Re: Grammar railroad diagram
Hello Bruce!

You can download the railroad generator to run offline with Java from https://www.bottlecaps.de/rr/download/rr-1.63-java8.zip (link on the Welcome tab of https://www.bottlecaps.de/rr/ui):

    java -jar rr.war -out:Dafny.atg.xhtml grammar.txt

Cheers!

On 6/7/21 18:51, Bruce Momjian wrote:
> On Sat, Jul 3, 2021 at 10:39:02AM +0200, Domingo Alvarez Duarte wrote:
> > I've done an experimental tool to convert bison grammars to a kind of EBNF understood by https://www.bottlecaps.de/rr/ui to generate railroad diagrams; see below the converted 'postgresql-13.3/src/backend/parser/gram.y', with some hand-made changes to allow viewing it at https://www.bottlecaps.de/rr/ui. The order of the rules could be changed for a better view of the railroad diagrams. Copy and paste the EBNF below on the https://www.bottlecaps.de/rr/ui "Edit Grammar" tab, then switch to the "View Diagram" tab.
>
> That is pretty cool. I had trouble figuring out how to get it working, so here are the steps I used:
>
> 1. save my attachment (created by Domingo)
> 2. go to https://www.bottlecaps.de/rr/ui
> 3. select "Edit Grammar"
> 4. choose "Browse" at the bottom
> 5. select the attachment you saved in #1
> 6. choose "Load" at the bottom
> 7. select "View Diagram"
>
> You can even click on the yellow boxes to see the sub-grammar. People have asked for railroad diagrams in the past, and this certainly produces them, and "Options" allows many customizations. I tried downloading as XHTML+SVG and HTML+PNG but got an error:
>
>     HTTP Status 500 – Internal Server Error
>     Type: Exception Report
>     Message: The multi-part request contained parameter data (excluding uploaded files) that exceeded the limit for maxPostSize set on the associated connector
>     Description: The server encountered an unexpected condition that prevented it from fulfilling the request.
>
> It might be nice to download this output and host it on the Postgres website at some point.
Re: Grammar railroad diagram
On Sat, Jul 3, 2021 at 10:39:02AM +0200, Domingo Alvarez Duarte wrote:
> I've done an experimental tool to convert bison grammars to a kind of EBNF
> understood by https://www.bottlecaps.de/rr/ui to generate railroad diagrams; see
> below the converted 'postgresql-13.3/src/backend/parser/gram.y', with some
> hand-made changes to allow viewing it at https://www.bottlecaps.de/rr/ui. The order
> of the rules could be changed for a better view of the railroad diagrams. Copy
> and paste the EBNF below on the https://www.bottlecaps.de/rr/ui "Edit Grammar" tab,
> then switch to the "View Diagram" tab.

That is pretty cool. I had trouble figuring out how to get it working, so here are the steps I used:

1. save my attachment (created by Domingo)
2. go to https://www.bottlecaps.de/rr/ui
3. select "Edit Grammar"
4. choose "Browse" at the bottom
5. select the attachment you saved in #1
6. choose "Load" at the bottom
7. select "View Diagram"

You can even click on the yellow boxes to see the sub-grammar. People have asked for railroad diagrams in the past, and this certainly produces them, and "Options" allows many customizations. I tried downloading as XHTML+SVG and HTML+PNG but got an error:

    HTTP Status 500 – Internal Server Error
    Type: Exception Report
    Message: The multi-part request contained parameter data (excluding uploaded files) that exceeded the limit for maxPostSize set on the associated connector
    Description: The server encountered an unexpected condition that prevented it from fulfilling the request.

It might be nice to download this output and host it on the Postgres website at some point.

--
Bruce Momjian  https://momjian.us
EDB  https://enterprisedb.com

If only the physical world exists, free will is an illusion.
stmtblock ::= stmtmulti

stmtmulti ::= stmtmulti ';' stmt | stmt

stmt ::= AlterEventTrigStmt | AlterCollationStmt | AlterDatabaseStmt | AlterDatabaseSetStmt | AlterDefaultPrivilegesStmt | AlterDomainStmt | AlterEnumStmt | AlterExtensionStmt | AlterExtensionContentsStmt | AlterFdwStmt | AlterForeignServerStmt | AlterForeignTableStmt | AlterFunctionStmt | AlterGroupStmt | AlterObjectDependsStmt | AlterObjectSchemaStmt | AlterOwnerStmt | AlterOperatorStmt | AlterTypeStmt | AlterPolicyStmt | AlterSeqStmt | AlterSystemStmt | AlterTableStmt | AlterTblSpcStmt | AlterCompositeTypeStmt | AlterPublicationStmt | AlterRoleSetStmt | AlterRoleStmt | AlterSubscriptionStmt | AlterStatsStmt | AlterTSConfigurationStmt | AlterTSDictionaryStmt | AlterUserMappingStmt | AnalyzeStmt | CallStmt | CheckPointStmt | ClosePortalStmt | ClusterStmt | CommentStmt | ConstraintsSetStmt | CopyStmt | CreateAmStmt | CreateAsStmt | CreateAssertionStmt | CreateCastStmt | CreateConversionStmt | CreateDomainStmt | CreateExtensionStmt | CreateFdwStmt | CreateForeignServerStmt | CreateForeignTableStmt | CreateFunctionStmt | CreateGroupStmt | CreateMatViewStmt | CreateOpClassStmt | CreateOpFamilyStmt | CreatePublicationStmt | AlterOpFamilyStmt | CreatePolicyStmt | CreatePLangStmt | CreateSchemaStmt | CreateSeqStmt | CreateStmt | CreateSubscriptionStmt | CreateStatsStmt | CreateTableSpaceStmt | CreateTransformStmt | CreateTrigStmt | CreateEventTrigStmt | CreateRoleStmt | CreateUserStmt | CreateUserMappingStmt | CreatedbStmt | DeallocateStmt | DeclareCursorStmt | DefineStmt | DeleteStmt | DiscardStmt | DoStmt | DropCastStmt | DropOpClassStmt | DropOpFamilyStmt | DropOwnedStmt | DropPLangStmt | DropStmt | DropSubscriptionStmt | DropTableSpaceStmt | DropTransformStmt | DropRoleStmt | DropUserMappingStmt | DropdbStmt | ExecuteStmt | ExplainStmt | FetchStmt | GrantStmt | GrantRoleStmt | ImportForeignSchemaStmt | IndexStmt | InsertStmt | ListenStmt | RefreshMatViewStmt | LoadStmt | LockStmt | NotifyStmt | PrepareStmt | ReassignOwnedStmt | ReindexStmt | RemoveAggrStmt | RemoveFuncStmt | RemoveOperStmt | RenameStmt | RevokeStmt | RevokeRoleStmt | RuleStmt | SecLabelStmt | SelectStmt | TransactionStmt | TruncateStmt | UnlistenStmt | UpdateStmt | VacuumStmt | VariableResetStmt | VariableSetStmt | VariableShowStmt | ViewStmt |

CallStmt ::= CALL func_application

CreateRoleStmt ::= CREATE ROLE RoleId opt_with OptRoleList

opt_with ::= WITH | WITH_LA |

OptRoleList ::= OptRoleList CreateOptRoleElem |

AlterOptRoleList ::= AlterOptRoleList AlterOptRoleElem |

AlterOptRoleElem ::= PASSWORD Sconst | PASSWORD NULL_P | ENCRYPTED PASSWORD Sconst | UNENCRYPTED PASSWORD Sconst | INHERIT | CONNECTION LIMIT SignedIconst | VALID UNTIL Sconst | USER role_list | IDENT

CreateOptRoleElem ::= AlterOptRoleElem | SYSID Iconst | ADMIN role_list | ROLE role_list | IN_P ROLE role_list | IN_P GROUP_P role_list

CreateUserStmt ::= CREATE USER RoleId opt_with OptRoleList

AlterRoleStmt ::= ALTER ROLE RoleSpec opt_with AlterOptRoleList | ALTER USER RoleSpec
Re: Delegating superuser tasks to new security roles (Was: Granting control of SUSET gucs to non-superusers)
> On Jul 5, 2021, at 1:50 AM, Andrey Borodin wrote: > > I'm not sure, but maybe we should allow replication role to change > session_replication_role? Thanks, Andrey, for taking a look. Yes, there is certainly some logic to that suggestion. The patch v4-0005 only delegates authority to perform ALTER SYSTEM SET to three roles: pg_database_security, pg_network_security, and pg_host_security. I don't mind expanding this list to include the replication attribute, but I am curious about opinions on the general design. There may be an advantage in keeping the list short. In particular, as the list gets longer, will it get harder to decide which role to associate with each new GUC that gets added? For third-party extensions, will it be harder for them to decide in any principled way which role to assign to each GUC that they add? There are multiple ways to cut up the set of all GUCs. database/host/network is not an entirely clean distinction, and perhaps database/host/network/replication is better, but I'm uncertain. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: logical replication worker accesses catalogs in error context callback
Bharath Rupireddy writes: > How about making the below else if statement and the attname > assignment into a single line? They are falling below the 80 char > limit. > else if (colno > 0 && colno <= list_length(rte->eref->colnames)) > attname = strVal(list_nth(rte->eref->colnames, colno - 1)); Pushed that way. (I think possibly I'd had this code further indented in its first incarnation, thus the extra line breaks.) regards, tom lane
Re: proposal - log_full_scan
Hi

On Tue, Jul 6, 2021 at 16:07, Daniel Gustafsson wrote:
> Looking at this I like the idea in principle, but I'm not convinced that
> auto_explain is the right tool for this. auto_explain is for identifying slow
> queries, and what you are proposing is to identify queries with a certain
> "shape" (for lack of a better term) even if they aren't slow as per the
> log_min_duration setting. If log_min_duration is deemed too crude due to query
> volume then sample_rate is the tool. If sample_rate is also discarded, then
> pg_stat_statements seems a better option.

I don't think pg_stat_statements can be used: a) it doesn't check the execution plan, so this feature could have big overhead compared to the current pg_stat_statements, which works just with the AST; b) pg_stat_statements has one entry per AST — but this can be a problem at the execution plan level, and that is outside the perspective of pg_stat_statements.

> Also, why just sequential scans (apart from it being this specific usecase)?
> If the idea is to track aspects of execution which are deemed slow, then
> tracking for example spills etc would be just as valid. Do you have thoughts
> on that?

Yes, I have thought about it more, and sometimes bitmap index scans are problematic too, and index scans in nested loops can be a problem as well. For my last customer I had to detect queries with a large bitmap index scan. I can do it with a combination of pg_stat_statements and log checking, but this work is not very friendly. My current idea is to have some extension that can trace generally specified executor nodes. Sometimes I want to say: I need to know all queries that do a seq scan over tabx where tuples processed > N.
In other cases it can be interesting to know the queries that use index x for a bitmap index scan.

> That being said, a few comments on the patch:
>
> - (auto_explain_log_min_duration >= 0 && \
> + ((auto_explain_log_min_duration >= 0 || auto_explain_log_seqscan != -1) && \
> Is there a reason to not follow the existing code and check for >= 0?
>
> + DefineCustomIntVariable("auto_explain.log_seqscan",
> It's only a matter of time before another node is proposed for logging, and
> then we'll be stuck adding log_XXXnode GUCs. Is there a more future-proof way
> to do this?
>
> + "Sets the minimum tuples produced by sequantial scans which plans will be logged",
> s/sequantial/sequential/
>
> - es->analyze = (queryDesc->instrument_options && auto_explain_log_analyze);
> + es->analyze = (queryDesc->instrument_options && (auto_explain_log_analyze || auto_explain_log_seqscan != -1));
> Turning on ANALYZE when log_analyze isn't set to True is a no-no IMO.
>
> + * Colllect relations where log_seqscan limit was exceeded
> s/Colllect/Collect/
>
> + if (*relnames.data != '\0')
> + appendStringInfoString(&relnames, ",");
> This should use appendStringInfoChar instead.
>
> + (errmsg("duration: %.3f ms, over limit seqscans: %s, plan:\n%s",
> The "over limit" part is superfluous since it otherwise wouldn't be logged. If
> we're prefixing something wouldn't it be more helpful to include the limit,
> like: "seqscans >= %d tuples returned:". I'm not a fan of "seqscans" but
> spelling it out is also quite verbose and this is grep-able.
>
> Documentation and tests are also missing

Unfortunately, this idea is not well prepared. My patch was a proof of concept — but I think it is not a good starting point. Maybe it needs some tracing API at the executor level and some tool like "perf top", but for the executor. Post-execution analysis is not a good direction — it has big overhead, and mainly it is not friendly in critical situations. I need some handy tool like perf, but for executor nodes.
I don't know how to do it effectively. Thank you for your review and your time, but I think it is better to remove this patch from the commitfest. I have no idea how to design this feature well :-/ Regards Pavel > -- > Daniel Gustafsson https://vmware.com/
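For reference, the post-execution collection step that the patch sketched — walk the finished plan tree and gather the relations whose sequential scans returned more tuples than a configured limit — looks roughly like this. The structures are invented for illustration, not the PostgreSQL executor API:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { NODE_SEQSCAN, NODE_OTHER } NodeType;

typedef struct PlanNode
{
	NodeType	type;
	const char *relname;		/* only meaningful for seq scans */
	long		ntuples;		/* instrumentation: tuples produced */
	struct PlanNode *left, *right;
} PlanNode;

/* Recursively collect names of seq scans over the limit into names[],
 * starting at index n; returns the new count. */
static int
collect_large_seqscans(const PlanNode *node, long limit,
					   const char **names, int n)
{
	if (node == NULL)
		return n;
	if (node->type == NODE_SEQSCAN && node->ntuples > limit)
		names[n++] = node->relname;
	n = collect_large_seqscans(node->left, limit, names, n);
	n = collect_large_seqscans(node->right, limit, names, n);
	return n;
}
```

The same walk generalizes to the broader idea discussed above — trigger on any node type and threshold, not just sequential scans — which is part of why a single log_seqscan GUC felt too narrow.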
Re: Refactor "mutually exclusive options" error reporting code in parse_subscription_options
On 2021-Jul-06, Bharath Rupireddy wrote: > Thanks, Amit. I'm posting the 0002 patch which removes extra ereport > calls using local variables. Please review it. I looked at this the other day and I'm not sure I like it very much. It's not making anything any simpler, it's barely saving two lines of code. I think we can do without this change. -- Álvaro Herrera Valdivia, Chile — https://www.EnterpriseDB.com/
Re: [PATCH] Use optimized single-datum tuplesort in ExecSort
On Tue, Jul 6, 2021 at 11:03 AM Ronan Dunklau wrote: > > Thank you for the review, I will address those shortly, but will answer some > questions in the meantime. > > > > First, the changes are lacking any explanatory comments. Probably we > > > should follow how nodeAgg does this and add both comments to the > > > ExecSort function header as well as specific comments above the "if" > > > around the new tuplesort_begin_datum explaining the specific > > > conditions that are required for the optimization to be useful and > > > safe. > > Done, since I lifted the restrictions following your questions, there isn't > much left to comment. (see below) > > > > > > > That leads to a question I had: I don't follow why bounded mode (when > > > using byval) needs to be excluded. Comments should be added if there's > > > a good reason (as noted above), but maybe it's a case we can handle > > > safely? > > I had test failures when trying to move the Datum around when performing a > bounded sort, but did not look into it at first. > > Now I've looked into it, and the switch to a heapsort when using bounded mode > just unconditionally tried to free a tuple that was never there to begin with. > So if the SortTuple does not contain an actual tuple, but only a single datum, > do not do that. > > I've updated the patch to fix this and enable the optimization in the case of > bounded sort. Awesome. > > > A second question: at first glance it's intuitively the case we might > > > not be able to handle byref values. But nodeAgg doesn't seem to have > > > that restriction. What's the difference here? > > > > I think tuplesort_begin_datum doesn't have any such limitation, it > > can handle any type of Datum so I think we don't need to consider > > only attbyval, we can consider any type of attribute for this > > optimization.
> > I've restricted the optimization to byval types because of the following > comment in nodeAgg.c: > > /* > * Note: if input type is pass-by-ref, the datums returned by the > sort are > * freshly palloc'd in the per-query context, so we must be careful > to > * pfree them when they are no longer needed. > */ > > As I was not sure how to handle that, I preferred the safety of not enabling > it. Since you both told me it should be safe, I've lifted that restriction > too. To be clear, I don't know for certain it's safe [without extra work], but even if it involves some extra manual pfree'ing (a la nodeAgg) it's probably worth it. Maybe someone else will weigh in on whether or not anything special is required here to ensure we don't leak memory (I haven't looked in detail yet). > > A few small code observations: > > - In my view the addition of unlikely() in ExecSort is unlikely to be > > of benefit because it's a single call for the entire node's execution > > (not in the tuple loop). > > Done. > > > - It seems clearer to change the "if (!node->is_single_val)" to flip > > the true/false cases so we don't need the negation. > > Agreed, done. Thanks > > - I assume there are tests that likely already cover this case, but > > it'd be worth verifying that. > > Yes many test cases cover that, but maybe it would be better to explicitly > check for it on some cases: do you think we could add a debug message that can > be checked for ? Mostly I think we should verify code coverage and _maybe_ add a specific test or two that we know exercises this path. I don't know that the debug message needs to be matched in the test (probably more pain than it's worth), but the debug message ("enabling datum sort optimization" or similar) might be good anyway. I wonder if we need to change costing of sorts for this case. I don't like having to do so, but it's a significant change in speed, so probably should impact what plan gets chosen. Hopefully others will weigh in on this also. 
> > Finally, I believe the same optimization likely ought to be added to > > nodeIncrementalSort. It's less likely the tests there are sufficient > > for both this and the original case, but we'll see. > > I will look into it, but isn't incrementalsort used to sort tuples on several > keys, when they are already sorted on the first ? In that case, I doubt we > would ever have a single-valued tuple here, except if there is a projection to > strip the tuple from extraneous attributes. Yes and no. When incremental sort has to do a full sort there will always be at least 2 attributes. But in prefix sort mode (see prefixsort_state) only non-presorted columns are sorted (i.e., if given a,b already sorted by a, then only b is sorted). So the prefixsort_state could use this optimization. James
Re: Enhanced error message to include hint messages for redundant options error
On Wed, Jun 30, 2021 at 7:48 PM vignesh C wrote: > > On Thu, May 13, 2021 at 8:09 PM vignesh C wrote: > > > > On Thu, May 13, 2021 at 4:58 AM Alvaro Herrera > > wrote: > > > > > > > Thanks for the comments, Attached patch has the changes for the same. > > > > The Patch was not applying on Head, the attached patch is rebased on > top of Head. The patch was not applying on the head because of the recent commit "8aafb02616753f5c6c90bbc567636b73c0cbb9d4", attached patch which is rebased on HEAD. Regards, Vignesh From 5ccc262d895d688fbca42d795ea01c7c16c09b54 Mon Sep 17 00:00:00 2001 From: vignesh Date: Tue, 6 Jul 2021 20:22:45 +0530 Subject: [PATCH v8] Enhance error message to include option name in case of duplicate option error. Enhanced error message to include option name in case of duplication option error, so that the user can easily identify the error. --- contrib/file_fdw/file_fdw.c | 13 +-- src/backend/catalog/aclchk.c| 14 +-- src/backend/commands/copy.c | 57 +++--- src/backend/commands/dbcommands.c | 76 +++-- src/backend/commands/extension.c| 23 +--- src/backend/commands/foreigncmds.c | 21 ++-- src/backend/commands/functioncmds.c | 59 -- src/backend/commands/publicationcmds.c | 30 ++--- src/backend/commands/sequence.c | 48 ++-- src/backend/commands/subscriptioncmds.c | 62 +- src/backend/commands/tablecmds.c| 4 +- src/backend/commands/typecmds.c | 34 ++ src/backend/commands/user.c | 118 +--- src/backend/parser/parse_utilcmd.c | 4 +- src/backend/replication/pgoutput/pgoutput.c | 23 ++-- src/backend/replication/walsender.c | 15 +-- src/backend/tcop/utility.c | 20 ++-- src/include/commands/defrem.h | 16 ++- src/include/commands/publicationcmds.h | 4 +- src/include/commands/subscriptioncmds.h | 4 +- src/include/commands/typecmds.h | 2 +- src/include/commands/user.h | 2 +- src/test/regress/expected/copy2.out | 24 ++-- src/test/regress/expected/foreign_data.out | 8 +- src/test/regress/expected/publication.out | 4 +- 25 files changed, 232 insertions(+), 453 
deletions(-) diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c index 2c2f149fb0..f49dd47930 100644 --- a/contrib/file_fdw/file_fdw.c +++ b/contrib/file_fdw/file_fdw.c @@ -200,6 +200,7 @@ file_fdw_validator(PG_FUNCTION_ARGS) char *filename = NULL; DefElem*force_not_null = NULL; DefElem*force_null = NULL; + DefElem*def; List *other_options = NIL; ListCell *cell; @@ -209,7 +210,7 @@ file_fdw_validator(PG_FUNCTION_ARGS) */ foreach(cell, options_list) { - DefElem*def = (DefElem *) lfirst(cell); + def = (DefElem *) lfirst(cell); if (!is_valid_option(def->defname, catalog)) { @@ -290,10 +291,7 @@ file_fdw_validator(PG_FUNCTION_ARGS) else if (strcmp(def->defname, "force_not_null") == 0) { if (force_not_null) -ereport(ERROR, - (errcode(ERRCODE_SYNTAX_ERROR), - errmsg("conflicting or redundant options"), - errhint("Option \"force_not_null\" supplied more than once for a column."))); +ReportDuplicateOptionError(def, NULL); force_not_null = def; /* Don't care what the value is, as long as it's a legal boolean */ (void) defGetBoolean(def); @@ -302,10 +300,7 @@ file_fdw_validator(PG_FUNCTION_ARGS) else if (strcmp(def->defname, "force_null") == 0) { if (force_null) -ereport(ERROR, - (errcode(ERRCODE_SYNTAX_ERROR), - errmsg("conflicting or redundant options"), - errhint("Option \"force_null\" supplied more than once for a column."))); +ReportDuplicateOptionError(def, NULL); force_null = def; (void) defGetBoolean(def); } diff --git a/src/backend/catalog/aclchk.c b/src/backend/catalog/aclchk.c index 53392414f1..264cdc2730 100644 --- a/src/backend/catalog/aclchk.c +++ b/src/backend/catalog/aclchk.c @@ -59,6 +59,7 @@ #include "catalog/pg_ts_template.h" #include "catalog/pg_type.h" #include "commands/dbcommands.h" +#include "commands/defrem.h" #include "commands/event_trigger.h" #include "commands/extension.h" #include "commands/proclang.h" @@ -910,30 +911,25 @@ ExecAlterDefaultPrivilegesStmt(ParseState *pstate, AlterDefaultPrivilegesStmt *s List *nspnames = NIL; 
DefElem*drolespecs = NULL; DefElem*dnspnames = NULL; + DefElem*defel; AclMode all_privileges; const char *errormsg; /* Deconstruct the "options" part of the statement */ foreach(cell, stmt->options) { - DefElem*defel = (DefElem *) lfirst(cell); + defel = (DefElem *) lfirst(cell); if (strcmp(defel->defname, "schemas") == 0) { if (dnspnames) -ereport(ERROR, - (errcode(ERRCODE_SYNTAX_ERROR), - errmsg("conflicting or redundant
Re: [PATCH] Use optimized single-datum tuplesort in ExecSort
Thank you for the review, I will address those shortly, but will answer some questions in the meantime. > > First, the changes are lacking any explanatory comments. Probably we > > should follow how nodeAgg does this and add both comments to the > > ExecSort function header as well as specific comments above the "if" > > around the new tuplesort_begin_datum explaining the specific > > conditions that are required for the optimization to be useful and > > safe. Done, since I lifted the restrictions following your questions, there isn't much left to comment. (see below) > > > > That leads to a question I had: I don't follow why bounded mode (when > > using byval) needs to be excluded. Comments should be added if there's > > a good reason (as noted above), but maybe it's a case we can handle > > safely? I had test failures when trying to move the Datum around when performing a bounded sort, but did not look into it at first. Now I've looked into it, and the switch to a heapsort when using bounded mode just unconditionally tried to free a tuple that was never there to begin with. So if the SortTuple does not contain an actual tuple, but only a single datum, do not do that. I've updated the patch to fix this and enable the optimization in the case of bounded sort. > > > > A second question: at first glance it's intuitively the case we might > > not be able to handle byref values. But nodeAgg doesn't seem to have > > that restriction. What's the difference here? > > I think tuplesort_begin_datum, doesn't have any such limitation, it > can handle any type of Datum so I think we don't need to consider the > only attbyval, we can consider any type of attribute for this > optimization. I've restricted the optimization to byval types because of the following comment in nodeAgg.c: /* * Note: if input type is pass-by-ref, the datums returned by the sort are * freshly palloc'd in the per-query context, so we must be careful to * pfree them when they are no longer needed. 
*/ As I was not sure how to handle that, I preferred the safety of not enabling it. Since you both told me it should be safe, I've lifted that restriction too. > A few small code observations: > - In my view the addition of unlikely() in ExecSort is unlikely to be > of benefit because it's a single call for the entire node's execution > (not in the tuple loop). Done. > - It seems clearer to change the "if (!node->is_single_val)" to flip > the true/false cases so we don't need the negation. Agreed, done. > - I assume there are tests that likely already cover this case, but > it'd be worth verifying that. Yes many test cases cover that, but maybe it would be better to explicitly check for it on some cases: do you think we could add a debug message that can be checked for ? > Finally, I believe the same optimization likely ought to be added to > nodeIncrementalSort. It's less likely the tests there are sufficient > for both this and the original case, but we'll see. I will look into it, but isn't incrementalsort used to sort tuples on several keys, when they are already sorted on the first ? In that case, I doubt we would ever have a single-valued tuple here, except if there is a projection to strip the tuple from extraneous attributes. -- Ronan Dunklau
diff --git a/src/backend/executor/nodeSort.c b/src/backend/executor/nodeSort.c index b99027e0d7..ff443d15a9 100644 --- a/src/backend/executor/nodeSort.c +++ b/src/backend/executor/nodeSort.c @@ -29,6 +29,10 @@ * which saves the results in a temporary file or memory. After the * initial call, returns a tuple from the file with each call. * + * The tuplesort can either occur on the whole tuple (this is the nominal + * case) or, when the input / output tuple consists of only one attribute, + * we switch to the tuplesort_*_datum API, optimized for that specific case. + * * Conditions: * -- none. 
* @@ -85,33 +89,59 @@ ExecSort(PlanState *pstate) outerNode = outerPlanState(node); tupDesc = ExecGetResultType(outerNode); - - tuplesortstate = tuplesort_begin_heap(tupDesc, - plannode->numCols, - plannode->sortColIdx, - plannode->sortOperators, - plannode->collations, - plannode->nullsFirst, - work_mem, - NULL, - node->randomAccess); + /* + * Switch to the tuplesort_*_datum interface when we have only one key, + * as it is much more efficient especially when the type is pass-by-value. + */ + if (tupDesc->natts == 1) + { + node->is_single_val = true; + tuplesortstate = tuplesort_begin_datum(TupleDescAttr(tupDesc, 0)->atttypid, + plannode->sortOperators[0], + plannode->collations[0], + plannode->nullsFirst[0], + work_mem, + NULL, + node->randomAccess); + } else + tuplesortstate = tuplesort_begin_heap(tupDesc, +
Re: [PATCH] expand the units that pg_size_pretty supports on output
David Rowley writes: > Does anyone want to have a look over this? If not, I plan to push it > in the next day or so. Minor nit: use "const char *text" in the struct declaration, so that all of the static data can be placed in fixed storage. > (I'm not sure why pgindent removed the space between != and NULL, but > it did, so I left it.) It did that because "text" is a typedef name, so it's a bit confused about whether the statement is really a declaration. Personally I'd have used "name" or something like that for that field, anyway. regards, tom lane
Re: .ready and .done files considered harmful
> I have a few suggestions on the patch > 1. > + > + /* > + * Found the oldest WAL, reset timeline ID and log segment number to > generate > + * the next WAL file in the sequence. > + */ > + if (found && !historyFound) > + { > + XLogFromFileName(xlog, , , wal_segment_size); > + ereport(LOG, > + (errmsg("directory scan to archive write-ahead log file \"%s\"", > + xlog))); > + } > > If a history file is found we are not updating curFileTLI and > nextLogSegNo, so it will attempt the previously found segment. This > is fine because it will not find that segment and it will rescan the > directory. But I think we can do better, instead of searching the > same old segment in the previous timeline we can search that old > segment in the new TL so that if the TL switch happened within the > segment then we will find the segment and we will avoid the directory > search. > > > /* > + * Log segment number and timeline ID to get next WAL file in a sequence. > + */ > +static XLogSegNo nextLogSegNo = 0; > +static TimeLineID curFileTLI = 0; > + > > So every time the archiver will start with searching segno=0 in timeline=0. > Instead of doing this can't we first scan the directory and once we > get the first segment to archive then only we can start predicting the > next wal segment? I think there is nothing wrong even if we try to > look for seg 0 in timeline 0, every time we start the archiver but that > will be true only once in the history of the cluster so why not skip > this until we scan the directory once? > +1, I like Dilip's ideas here to optimize further. Also, one minor comment: + /* +* Log segment number already points to the next file in the sequence +* (as part of successful archival of the previous file). Generate the path +* for status file. +*/ This comment is a bit confusing with the name of the variable nextLogSegNo. 
I think the name of the variable is appropriate here, but maybe we can reword the comment something like: + /* +* We already have the next anticipated log segment number and the +* timeline, check if this WAL file is ready to be archived. If yes, skip +* the directory scan. +*/ Regards, Jeevan Ladhe
Re: [PATCH] expand the units that pg_size_pretty supports on output
David Rowley writes: > On Mon, 5 Jul 2021 at 20:00, David Rowley wrote: >> I don't really like the fact that I had to add the doHalfRound field >> to get the same rounding behaviour as the original functions. I'm >> wondering if it would just be too clever just to track how many bits >> we've shifted right by in pg_size_pretty* and compare that to the >> value of multishift for the current unit and do appropriate rounding >> to get us to the value of multishift. In theory, we could just keep >> calling the half_rounded macro until we make it to the multishift >> value. My current thoughts are that it's unlikely that anyone would >> twiddle with the size_pretty_units array in such a way for the extra >> code to be worth it. Maybe someone else feels differently. > > I made another pass over this and ended up removing the doHalfRound > field in favour of just doing rounding based on the previous > bitshifts. > > I did a few other tidy ups and I think it's a useful patch as it > reduces the amount of code a bit and makes it dead simple to add new > units in the future. Most importantly it'll help keep pg_size_pretty, > pg_size_pretty_numeric and pg_size_bytes all in sync in regards to > what units they support. > > Does anyone want to have a look over this? If not, I plan to push it > in the next day or so. > > (I'm not sure why pgindent removed the space between != and NULL, but > it did, so I left it.) > > David I like the approach you took here; much cleaner to have one table for all of the individual codepaths. Testing worked as expected; if we do decide to expand the units table there will be a few additional changes (most significantly, the return value of `pg_size_bytes()` will need to switch to `numeric`). Thanks, David
Re: Identify missing publications from publisher while create/alter subscription.
On Wed, Jun 30, 2021 at 8:23 PM vignesh C wrote: > > On Sun, Jun 6, 2021 at 11:55 AM vignesh C wrote: > > > > On Fri, May 7, 2021 at 6:44 PM vignesh C wrote: > > > > > > Thanks for the comments, the attached patch has the fix for the same. > > > > The patch was not applying on the head, attached patch which is rebased on > > HEAD. > > The patch was not applying on the head, attached patch which is rebased on > HEAD. The patch was not applying on the head, attached patch which is rebased on HEAD. Regards, Vignesh From 059130a6d393c8c020234747865368596a3c4702 Mon Sep 17 00:00:00 2001 From: vignesh Date: Tue, 6 Jul 2021 19:54:11 +0530 Subject: [PATCH v11] Identify missing publications from publisher while create/alter subscription. Creating/altering subscription is successful when we specify a publication which does not exist in the publisher. This patch checks if the specified publications are present in the publisher and throws an error if any of the publication is missing in the publisher. --- doc/src/sgml/ref/alter_subscription.sgml | 13 ++ doc/src/sgml/ref/create_subscription.sgml | 18 +- src/backend/commands/subscriptioncmds.c | 208 +++--- src/bin/psql/tab-complete.c | 6 +- src/test/subscription/t/007_ddl.pl| 68 ++- 5 files changed, 281 insertions(+), 32 deletions(-) diff --git a/doc/src/sgml/ref/alter_subscription.sgml b/doc/src/sgml/ref/alter_subscription.sgml index b3d173179f..205f82d0d0 100644 --- a/doc/src/sgml/ref/alter_subscription.sgml +++ b/doc/src/sgml/ref/alter_subscription.sgml @@ -161,6 +161,19 @@ ALTER SUBSCRIPTION name RENAME TO < + + +validate_publication (boolean) + + + When true, the command verifies if all the specified publications + that are being subscribed to are present in the publisher and throws + an error if any of the publication doesn't exist. The default is + false. 
+ + + + diff --git a/doc/src/sgml/ref/create_subscription.sgml b/doc/src/sgml/ref/create_subscription.sgml index e812beee37..cad9285c16 100644 --- a/doc/src/sgml/ref/create_subscription.sgml +++ b/doc/src/sgml/ref/create_subscription.sgml @@ -207,8 +207,9 @@ CREATE SUBSCRIPTION subscription_nameCREATE SUBSCRIPTION should connect to the publisher at all. Setting this to false will change default values of - enabled, create_slot and - copy_data to false. + enabled, create_slot, + copy_data and + validate_publication to false. @@ -239,6 +240,19 @@ CREATE SUBSCRIPTION subscription_name + + +validate_publication (boolean) + + + When true, the command verifies if all the specified publications + that are being subscribed to are present in the publisher and throws + an error if any of the publication doesn't exist. The default is + false. + + + + diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c index eb88d877a5..e3f8438e5f 100644 --- a/src/backend/commands/subscriptioncmds.c +++ b/src/backend/commands/subscriptioncmds.c @@ -59,6 +59,7 @@ #define SUBOPT_REFRESH0x0040 #define SUBOPT_BINARY0x0080 #define SUBOPT_STREAMING 0x0100 +#define SUBOPT_VALIDATE_PUB 0x0200 /* check if the 'val' has 'bits' set */ #define IsSet(val, bits) (((val) & (bits)) == (bits)) @@ -79,6 +80,7 @@ typedef struct SubOpts bool refresh; bool binary; bool streaming; + bool validate_publication; } SubOpts; static List *fetch_table_list(WalReceiverConn *wrconn, List *publications); @@ -123,6 +125,8 @@ parse_subscription_options(List *stmt_options, bits32 supported_opts, SubOpts *o opts->binary = false; if (IsSet(supported_opts, SUBOPT_STREAMING)) opts->streaming = false; + if (IsSet(supported_opts, SUBOPT_VALIDATE_PUB)) + opts->validate_publication = false; /* Parse options */ foreach(lc, stmt_options) @@ -237,6 +241,17 @@ parse_subscription_options(List *stmt_options, bits32 supported_opts, SubOpts *o opts->specified_opts |= SUBOPT_STREAMING; opts->streaming 
= defGetBoolean(defel); } + else if (IsSet(supported_opts, SUBOPT_VALIDATE_PUB) && + strcmp(defel->defname, "validate_publication") == 0) + { + if (IsSet(opts->specified_opts, SUBOPT_VALIDATE_PUB)) +ereport(ERROR, + (errcode(ERRCODE_SYNTAX_ERROR), + errmsg("conflicting or redundant options"))); + + opts->specified_opts |= SUBOPT_VALIDATE_PUB; + opts->validate_publication = defGetBoolean(defel); + } else ereport(ERROR, (errcode(ERRCODE_SYNTAX_ERROR), @@ -275,10 +290,19 @@ parse_subscription_options(List *stmt_options, bits32 supported_opts, SubOpts *o errmsg("%s and %s are mutually exclusive options",
Re: .ready and .done files considered harmful
> specifically about history files being given higher priority for > archiving. If we go with this change then we'd at least want to rewrite > or remove those comments, but I don't actually agree that we should > remove that preference to archive history files ahead of WAL, for the > reasons brought up previously. > As was suggested on that subthread, it seems like it should be possible > to just track the current timeline and adjust what we're doing if the > timeline changes, and we should even know what the .history file is at > that point and likely don't even need to scan the directory for it, as > it'll be the old timeline ID. I agree, I missed this part. The .history file should be given higher preference. I will take care of it in the next patch. Thanks, Dipesh
Re: proposal - log_full_scan
Looking at this I like the idea in principle, but I'm not convinced that auto_explain is the right tool for this. auto_explain is for identifying slow queries, and what you are proposing is to identify queries with a certain "shape" (for lack of a better term) even if they aren't slow as per the log_min_duration setting. If log_min_duration is deemed too crude due to query volume then sample_rate is the tool. If sample_rate is also discarded, then pg_stat_statements seems a better option. Also, why just sequential scans (apart from it being this specific use case)? If the idea is to track aspects of execution which are deemed slow, then tracking for example spills etc would be just as valid. Do you have thoughts on that? That being said, a few comments on the patch: - (auto_explain_log_min_duration >= 0 && \ + ((auto_explain_log_min_duration >= 0 || auto_explain_log_seqscan != -1) && \ Is there a reason to not follow the existing code and check for >= 0? + DefineCustomIntVariable("auto_explain.log_seqscan", It's only a matter of time before another node is proposed for logging, and then we'll be stuck adding log_XXXnode GUCs. Is there a more future-proof way to do this? + "Sets the minimum tuples produced by sequantial scans which plans will be logged", s/sequantial/sequential/ - es->analyze = (queryDesc->instrument_options && auto_explain_log_analyze); + es->analyze = (queryDesc->instrument_options && (auto_explain_log_analyze || auto_explain_log_seqscan != -1)); Turning on ANALYZE when log_analyze isn't set to True is a no-no IMO. + * Colllect relations where log_seqscan limit was exceeded s/Colllect/Collect/ + if (*relnames.data != '\0') + appendStringInfoString(, ","); This should use appendStringInfoChar instead. + (errmsg("duration: %.3f ms, over limit seqscans: %s, plan:\n%s", The "over limit" part is superfluous since it otherwise wouldn't be logged. 
If we're prefixing something wouldn't it be more helpful to include the limit, like: "seqscans >= %d tuples returned:". I'm not a fan of "seqscans" but spelling it out is also quite verbose and this is grep-able. Documentation and tests are also missing -- Daniel Gustafsson https://vmware.com/
Re: [PATCH] Use optimized single-datum tuplesort in ExecSort
On Tue, Jul 6, 2021 at 6:49 PM James Coleman wrote: > > Adding David since this patch is likely a precondition for [1]. > > On Tue, Jul 6, 2021 at 2:15 AM Ronan Dunklau wrote: > > > > Hello, > > > > While testing the patch "Add proper planner support for ORDER BY / DISTINCT > > aggregates" [0] I discovered the performance penalty from adding a sort node > > essentially came from not using the single-datum tuplesort optimization in > > ExecSort (contrary to the sorting done in ExecAgg). > > > > I originally proposed this patch as a companion in the same thread [1], but > > following James suggestion I'm making a separate thread just for this as the > > optimization is worthwhile independently of David's patch: it looks like we > > can expect a 2x speedup on a "select a single ordered column" case. > > > > The patch aimed to be as simple as possible: we only turn this optimization > > on > > when the tuple being sorted has only one attribute, it is "byval" (so as not > > to incur copies which would be hard to track in the execution tree) and > > unbound (again, not having to deal with copying borrowed datum anywhere). > > Thanks again for finding this and working up a patch. > > I've taken a look, and while I haven't dug into testing it yet, I have > a few comments. > > First, the changes are lacking any explanatory comments. Probably we > should follow how nodeAgg does this and add both comments to the > ExecSort function header as well as specific comments above the "if" > around the new tuplesort_begin_datum explaining the specific > conditions that are required for the optimization to be useful and > safe. > > That leads to a question I had: I don't follow why bounded mode (when > using byval) needs to be excluded. Comments should be added if there's > a good reason (as noted above), but maybe it's a case we can handle > safely? > > A second question: at first glance it's intuitively the case we might > not be able to handle byref values. 
But nodeAgg doesn't seem to have > that restriction. What's the difference here? > I think tuplesort_begin_datum doesn't have any such limitation; it can handle any type of Datum, so I think we don't need to consider only attbyval: we can consider any type of attribute for this optimization. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: [PATCH] Use optimized single-datum tuplesort in ExecSort
On Tue, Jul 6, 2021 at 10:19 AM, James Coleman wrote: > Adding David since this patch is likely a precondition for [1]. > > On Tue, Jul 6, 2021 at 2:15 AM Ronan Dunklau > wrote: > > > > Hello, > > > > While testing the patch "Add proper planner support for ORDER BY / > DISTINCT > > aggregates" [0] I discovered the performance penalty from adding a sort > node > > essentially came from not using the single-datum tuplesort optimization > in > > ExecSort (contrary to the sorting done in ExecAgg). > > > > I originally proposed this patch as a companion in the same thread [1], > but > > following James suggestion I'm making a separate thread just for this as > the > > optimization is worthwhile independently of David's patch: it looks like > we > > can expect a 2x speedup on a "select a single ordered column" case. > > > > The patch aimed to be as simple as possible: we only turn this > optimization on > > when the tuple being sorted has only one attribute, it is "byval" (so as > not > > to incur copies which would be hard to track in the execution tree) and > > unbound (again, not having to deal with copying borrowed datum anywhere). > > Thanks again for finding this and working up a patch. > > I've taken a look, and while I haven't dug into testing it yet, I have > a few comments. > > First, the changes are lacking any explanatory comments. Probably we > should follow how nodeAgg does this and add both comments to the > ExecSort function header as well as specific comments above the "if" > around the new tuplesort_begin_datum explaining the specific > conditions that are required for the optimization to be useful and > safe. > > That leads to a question I had: I don't follow why bounded mode (when > using byval) needs to be excluded. Comments should be added if there's > a good reason (as noted above), but maybe it's a case we can handle > safely? > > A second question: at first glance it's intuitively the case we might > not be able to handle byref values. 
But nodeAgg doesn't seem to have > that restriction. What's the difference here? > > A few small code observations: > - In my view the addition of unlikely() in ExecSort is unlikely to be > of benefit because it's a single call for the entire node's execution > (not in the tuple loop). > No objection. And I agree that testing is complex and needs to remain as it is. - It seems clearer to change the "if (!node->is_single_val)" to flip > the true/false cases so we don't need the negation. > I think yes, it can be better. regards, Ranier Vilela
Re: .ready and .done files considered harmful
Greetings, * Dipesh Pandit (dipesh.pan...@gmail.com) wrote: > We have addressed the O(n^2) problem which involves directory scan for > archiving individual WAL files by maintaining a WAL counter to identify > the next WAL file in a sequence. This seems to have missed the concerns raised in https://postgr.es/m/20210505170601.gf20...@tamriel.snowman.net ..? And also the comments immediately above the ones being added here: > @@ -596,29 +606,55 @@ pgarch_archiveXlog(char *xlog) > * larger ID; the net result being that past timelines are given higher > * priority for archiving. This seems okay, or at least not obviously worth > * changing. > + * > + * WAL files are generated in a specific order of log segment number. The > + * directory scan for each WAL file can be minimized by identifying the next > + * WAL file in the sequence. This can be achieved by maintaining log segment > + * number and timeline ID corresponding to WAL file currently being archived. > + * The log segment number of current WAL file can be incremented by '1' upon > + * successful archival to point to the next WAL file. specifically about history files being given higher priority for archiving. If we go with this change then we'd at least want to rewrite or remove those comments, but I don't actually agree that we should remove that preference to archive history files ahead of WAL, for the reasons brought up previously. As was suggested on that subthread, it seems like it should be possible to just track the current timeline and adjust what we're doing if the timeline changes, and we should even know what the .history file is at that point and likely don't even need to scan the directory for it, as it'll be the old timeline ID. Thanks, Stephen signature.asc Description: PGP signature
Re: "debug_invalidate_system_caches_always" is too long
Andrew Dunstan writes: > On 7/5/21 11:46 PM, Bharath Rupireddy wrote: >> On Tue, Jul 6, 2021 at 12:43 AM Tom Lane wrote: >>> I like "debug_flush_caches" --- it's short and accurate. >> Do we always flush the cache entries into the disk? Sometimes we just >> invalidate the cache entries in the registered invalidation callbacks, >> right? Since we already use the term "clobber" in the user visible >> config option --clobber-cache, isn't it consistent to use >> debug_clobber_caches? > I think 'flush' here means simply 'discard'. Maybe that would be a > better word to use. "Discard" could be misinterpreted too, no doubt. None of these words have one single exact meaning, so I have only limited patience for this sort of argumentation. (As for initdb's "--clobber-cache", I'm assuming we'd rename that to match whatever we come up with here. It is what it is now only because I was unwilling to call it "--use-debug-invalidate-system-caches-always".) regards, tom lane
Re: [PATCH] Use optimized single-datum tuplesort in ExecSort
Adding David since this patch is likely a precondition for [1]. On Tue, Jul 6, 2021 at 2:15 AM Ronan Dunklau wrote: > > Hello, > > While testing the patch "Add proper planner support for ORDER BY / DISTINCT > aggregates" [0] I discovered the performance penalty from adding a sort node > essentially came from not using the single-datum tuplesort optimization in > ExecSort (contrary to the sorting done in ExecAgg). > > I originally proposed this patch as a companion in the same thread [1], but > following James suggestion I'm making a separate thread just for this as the > optimization is worthwhile independently of David's patch: it looks like we > can expect a 2x speedup on a "select a single ordered column" case. > > The patch aimed to be as simple as possible: we only turn this optimization on > when the tuple being sorted has only one attribute, it is "byval" (so as not > to incur copies which would be hard to track in the execution tree) and > unbound (again, not having to deal with copying borrowed datum anywhere). Thanks again for finding this and working up a patch. I've taken a look, and while I haven't dug into testing it yet, I have a few comments. First, the changes are lacking any explanatory comments. Probably we should follow how nodeAgg does this and add both comments to the ExecSort function header as well as specific comments above the "if" around the new tuplesort_begin_datum explaining the specific conditions that are required for the optimization to be useful and safe. That leads to a question I had: I don't follow why bounded mode (when using byval) needs to be excluded. Comments should be added if there's a good reason (as noted above), but maybe it's a case we can handle safely? A second question: at first glance it's intuitively the case we might not be able to handle byref values. But nodeAgg doesn't seem to have that restriction. What's the difference here? 
A few small code observations: - In my view the addition of unlikely() in ExecSort is unlikely to be of benefit because it's a single call for the entire node's execution (not in the tuple loop). - It seems clearer to change the "if (!node->is_single_val)" to flip the true/false cases so we don't need the negation. - I assume there are tests that likely already cover this case, but it'd be worth verifying that. Finally, I believe the same optimization likely ought to be added to nodeIncrementalSort. It's less likely the tests there are sufficient for both this and the original case, but we'll see. Thanks, James 1: https://www.postgresql.org/message-id/CAApHDvpHzfo92%3DR4W0%2BxVua3BUYCKMckWAmo-2t_KiXN-wYH%3Dw%40mail.gmail.com
Re: Keep notnullattrs in RelOptInfo (Was part of UniqueKey patch series)
David Rowley writes: > Tom, I'm wondering if you might get a chance to draw up a design for > what you've got in mind with this? I assume adding a new field in > Var, but I'm drawing a few blanks on how things might work for equal() > when one Var has the field set and another does not. As I said before, it hasn't progressed much past the handwaving stage, but it does seem like it's time to get it done. I doubt I'll have any cycles for it during the commitfest, but maybe I can devote a block of time during August. regards, tom lane
Re: Fix pkg-config file for static linking
On 21.06.21 15:47, Filip Gospodinov wrote: -PKG_CONFIG_REQUIRES_PRIVATE = libssl libcrypto +PKG_CONFIG_REQUIRES_PRIVATE = libpgcommon libpgport libssl libcrypto This doesn't work. The patch adds libpgcommon and libpgport to Requires.private, but those are library names, not pkg-config module names, so they should be added to Libs.private instead.
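Concretely, the distinction looks like this in a .pc file. The fragment below is a hypothetical sketch of where each kind of name belongs, not the actual generated libpq.pc:

```
# pkg-config module names: resolved recursively via their own .pc files
Requires.private: libssl libcrypto
# plain linker flags used only for static linking: no .pc lookup here
Libs.private: -lpgcommon -lpgport
```

`pkg-config --static --libs libpq` would then emit the private libraries along with the public ones.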
Re: Asymmetric partition-wise JOIN
Andrey Lepikhov wrote on 2021-07-06 12:28: On 5/7/21 23:15, Zhihong Yu wrote: On Mon, Jul 5, 2021 at 2:57 AM Andrey Lepikhov <a.lepik...@postgrespro.ru> wrote: + * Can't imagine situation when join relation already exists. But in + * the 'partition_join' regression test it happens. + * It may be an indicator of possible problems. Should a log be added in the above case ? I made additional analysis of this branch of code. This situation can happen in the case of one child, or if we join two plain tables with a partitioned one. Both situations are legal and I think we don't need to add any log message here. Other mistakes were fixed. Hi. Small typo in a comment in src/backend/optimizer/plan/setrefs.c: 281 282 /* 283 * Adjust RT indexes of AppendRelInfos and add to final appendrels list. 284 * The AppendRelInfos are copied, because as a part of a subplan its could 285 * be visited many times in the case of asymmetric join. 286 */ 287 foreach(lc, root->append_rel_list) 288 { its -> it (or they) ? -- Best regards, Alexander Pyhalov, Postgres Professional
Re: [PATCH] expand the units that pg_size_pretty supports on output
On Tue, 6 Jul 2021 at 13:15, David Rowley wrote: > > Can you give an example where calling half_rounded too many times will > give the wrong value? Keeping in mind we call half_rounded the number > of times that the passed in value would need to be left-shifted by to > get the equivalent truncated value. > ./half_rounded 10241 10 1. half_round(10241) == 5121 :: 10241 >> 1 = 5120 2. half_round(5121) == 2561 :: 5120 >> 1 = 2560 3. half_round(2561) == 1281 :: 2560 >> 1 = 1280 4. half_round(1281) == 641 :: 1280 >> 1 = 640 5. half_round(641) == 321 :: 640 >> 1 = 320 6. half_round(321) == 161 :: 320 >> 1 = 160 7. half_round(161) == 81 :: 160 >> 1 = 80 8. half_round(81) == 41 :: 80 >> 1 = 40 9. half_round(41) == 21 :: 40 >> 1 = 20 10. half_round(21) == 11 :: 20 >> 1 = 10 The correct result should be 10 (it would be very odd to claim that 10241 bytes should be displayed as 11kb), but the half-rounding keeps rounding up at each stage. That's a general property of rounding -- you need to be very careful when rounding more than once, since otherwise errors will propagate. C.f. 4083f445c0, which removed a double-round in numeric sqrt(). To be clear, I'm not saying that the current code half-rounds more than once, just that it reads as if it does. Regards, Dean
Re: Why is XLOG_FPI_FOR_HINT always need backups?
Thank you for the reply. You are right: the PostgreSQL server writes the entire content of each disk page to WAL during the first modification of that page after a checkpoint while data checksums are on. But I wonder whether that is necessary when my file system can protect database blocks from being torn. I also read this comment in the function MarkBufferDirtyHint:

```
/*
 * If we need to protect hint bit updates from torn writes, WAL-log a
 * full page image of the page. This full page image is only necessary
 * if the hint bit update is the first change to the page since the
 * last checkpoint.
 *
 * We don't check full_page_writes here because that logic is included
 * when we call XLogInsert() since the value changes dynamically.
 */
```

However, the code tells me it has nothing to do with full_page_writes. I can't figure it out. -- Zhang Wenjie
Re: Removing redundant check for transaction in progress in check_safe_enum_use
On Sun, 4 Jul 2021, 03:40 Zhihong Yu, wrote: > > Hi, > I was looking at : > Relax transactional restrictions on ALTER TYPE ... ADD VALUE (redux). > > In check_safe_enum_use(): > > + if (!TransactionIdIsInProgress(xmin) && > + TransactionIdDidCommit(xmin)) > + return; > > Since the condition would be true only when TransactionIdDidCommit() returns > true, I think the call to TransactionIdIsInProgress is not needed. > If transaction for xmin is committed, the transaction cannot be in progress > at the same time. I'm not sure that removing the !TransactionIdIsInProgress-check is correct. The comment in heapam_visibility.c:13 explains that we need to check TransactionIdIsInProgress before TransactionIdDidCommit in non-MVCC snapshots, and I'm fairly certain that check_safe_enum_use() is not guaranteed to run only in MVCC snapshots (at least its documentation does not warn against non-MVCC snapshots). Kind regards, Matthias van de Meent
Re: [PATCH] expand the units that pg_size_pretty supports on output
On Tue, 6 Jul 2021 at 23:39, Dean Rasheed wrote: > When I first read this: > > +/* half-round until we get down to unitBits */ > +while (rightshifts++ < unit->unitBits) > +size = half_rounded(size); > > it looked to me like it would be invoking half_rounded() multiple > times, which raised alarm bells because that would risk rounding the > wrong way. Eventually I realised that by the time it reaches that, > rightshifts will always equal unit->unitBits or unit->unitBits - 1, so > it'll never do more than one half-round, which is important. It's true that based on how the units table is set up now, it'll only ever do it once for all but the first loop. I wrote the attached .c file just to try to see if it ever goes wrong and I didn't manage to find any inputs where it did. I always seem to get the half rounded value either the same as the shifted value or 1 higher towards positive infinity $ ./half_rounded -102 3 1. half_round(-102) == -51 :: -102 >> 1 = -51 2. half_round(-51) == -25 :: -51 >> 1 = -26 3. half_round(-25) == -12 :: -26 >> 1 = -13 $ ./half_rounded 6432 3 1. half_round(6432) == 3216 :: 6432 >> 1 = 3216 2. half_round(3216) == 1608 :: 3216 >> 1 = 1608 3. half_round(1608) == 804 :: 1608 >> 1 = 804 Can you give an example where calling half_rounded too many times will give the wrong value? Keeping in mind we call half_rounded the number of times that the passed in value would need to be left-shifted by to get the equivalent truncated value. David

```c
#include <stdio.h>
#include <stdlib.h>

#define half_rounded(x) (((x) + ((x) < 0 ? 0 : 1)) / 2)

int
main(int argc, char **argv)
{
	int			num, times;
	int			i;
	int			r, shift;

	if (argc < 3)
	{
		printf("Usage: %s <value> <times>\n", argv[0]);
		return -1;
	}

	num = atoi(argv[1]);
	times = atoi(argv[2]);

	r = num;
	shift = num;
	for (i = 0; i < times; i++)
	{
		int			rr = half_rounded(r);

		printf("%d. half_round(%d) == %d :: %d >> 1 = %d\n",
			   i + 1, r, rr, shift, shift >> 1);
		r = rr;
		shift >>= 1;
	}
	return 0;
}
```
Re: Remove useless int64 range checks on BIGINT sequence MINVALUE/MAXVALUE values
On Tue, Jul 6, 2021 at 8:43 PM David Rowley wrote: > > On Sun, 4 Jul 2021 at 20:53, David Rowley wrote: > > > > On Sat, 3 Jul 2021 at 22:44, Peter Eisentraut > > wrote: > > > I don't think this is a good change. > > > > > I think we should leave it as is. > > > > I'm inclined to agree. > > Does anyone object to marking this patch as rejected in the CF app? > I think if you're going to reject this patch, a brief comment should be added to that code to justify why that existing superfluous check is worthwhile. (After all, similar checks are not being done elsewhere in the Postgres code, AFAICS. e.g. "int" variables are not being checked to see whether they hold values greater than MAXINT). Regards, Greg Nancarrow Fujitsu Australia
Re: [PATCH] Use optimized single-datum tuplesort in ExecSort
On Tue, Jul 6, 2021 at 8:25 AM, Ranier Vilela wrote: > On Tue, Jul 6, 2021 at 3:15 AM, Ronan Dunklau > wrote: > >> Hello, >> >> While testing the patch "Add proper planner support for ORDER BY / >> DISTINCT >> aggregates" [0] I discovered the performance penalty from adding a sort >> node >> essentially came from not using the single-datum tuplesort optimization >> in >> ExecSort (contrary to the sorting done in ExecAgg). >> >> I originally proposed this patch as a companion in the same thread [1], >> but >> following James suggestion I'm making a separate thread just for this as >> the >> optimization is worthwhile independently of David's patch: it looks like >> we >> can expect a 2x speedup on a "select a single ordered column" case. >> >> The patch aimed to be as simple as possible: we only turn this >> optimization on >> when the tuple being sorted has only one attribute, it is "byval" (so as >> not >> to incur copies which would be hard to track in the execution tree) and >> unbound (again, not having to deal with copying borrowed datum anywhere). >> >> The attached patch is originally by me, with some cleanup by Ranier >> Vilela. >> I'm sending Ranier's version here. >> > Nice Ronan. > But I think there is some confusion here. > The author is not you? > > Just to clarify, at Commitfest, it was supposed to be the other way around. > You as an author and David as a reviewer. > I'll put myself as a reviewer too. > Sorry David, my mistake. I confused the numbers (id) of Commitfest. regards, Ranier Vilela
Re: [PATCH] expand the units that pg_size_pretty supports on output
On Tue, 6 Jul 2021 at 10:20, David Rowley wrote: > > I made another pass over this and ended up removing the doHalfRound > field in favour of just doing rounding based on the previous > bitshifts. > When I first read this: +/* half-round until we get down to unitBits */ +while (rightshifts++ < unit->unitBits) +size = half_rounded(size); it looked to me like it would be invoking half_rounded() multiple times, which raised alarm bells because that would risk rounding the wrong way. Eventually I realised that by the time it reaches that, rightshifts will always equal unit->unitBits or unit->unitBits - 1, so it'll never do more than one half-round, which is important. So perhaps using doHalfRound would be clearer, but it could just be a local variable tracking whether or not it's the first time through the loop. Regards, Dean
Re: Can a child process detect postmaster death when in pg_usleep?
On Tue, Jul 6, 2021 at 4:33 PM Michael Paquier wrote: > > On Tue, Jul 06, 2021 at 03:54:07PM +0530, Bharath Rupireddy wrote: > > Thanks. You are right. The issue is due to the MyLatch being set by > > SwitchToSharedLatch before WaitLatch. If we use (WL_TIMEOUT | > > WL_EXIT_ON_PM_DEATH), then the backends will honour the > > post_auth_delay as well as detect the postmaster death. Since we are > > not using WL_LATCH_SET, I removed ResetLatch. Also, added some > > comments around why we are not using WL_LATCH_SET. > > > > For PreAuthDelay, there's no problem to use WL_LATCH_SET as MyLatch > > still points to the local latch(which is not set) in > > BackendInitialize(). > > FWIW, I think that it could be a good idea to use the same set of > flags for all the pre/post_auth_delay paths for consistency. That's > useful when grepping for one. Please note that I don't plan to look > more at this patch set for this CF as I am not really excited by the > updates involving developer options, and I suspect more issues like > the one I found upthread so this needs a close lookup. > > If somebody else wishes to look at it, please feel free, of course. Thanks. Anyways, I removed WL_LATCH_SET for PreAuthDelay as well. PSA v4 patch. Regards, Bharath Rupireddy. v3-0001-Use-a-WaitLatch-for-Pre-and-Post-Auth-Delay.patch Description: Binary data
Re: Commit fest manager
On Tue, Jul 6, 2021 at 3:58 PM Michael Paquier wrote: > On Tue, Jul 06, 2021 at 03:38:23PM +0500, Ibrar Ahmed wrote: > > Any update and decision on this? so I can start working on this. > > Working on the CF does not strongly require the admin permissions. I > have already switched the current CF as in progress, so most of the > admin job is done for this month :) > > Another piece of interest are the reviewer/author stat reports, but > poking at patching does not need this information in most cases. > -- > Michael > Great, thanks Michael, I will start doing my work :) -- Ibrar Ahmed
Re: [PATCH] Use optimized single-datum tuplesort in ExecSort
On Tue, Jul 6, 2021 at 3:15 AM, Ronan Dunklau wrote: > Hello, > > While testing the patch "Add proper planner support for ORDER BY / > DISTINCT > aggregates" [0] I discovered the performance penalty from adding a sort > node > essentially came from not using the single-datum tuplesort optimization in > ExecSort (contrary to the sorting done in ExecAgg). > > I originally proposed this patch as a companion in the same thread [1], > but > following James suggestion I'm making a separate thread just for this as > the > optimization is worthwhile independently of David's patch: it looks like > we > can expect a 2x speedup on a "select a single ordered column" case. > > The patch aimed to be as simple as possible: we only turn this > optimization on > when the tuple being sorted has only one attribute, it is "byval" (so as > not > to incur copies which would be hard to track in the execution tree) and > unbound (again, not having to deal with copying borrowed datum anywhere). > > The attached patch is originally by me, with some cleanup by Ranier > Vilela. > I'm sending Ranier's version here. > Nice Ronan. But I think there is some confusion here. The author is not you? Just to clarify, at Commitfest, it was supposed to be the other way around. You as an author and David as a reviewer. I'll put myself as a reviewer too. regards, Ranier Vilela
Re: ECPG doesn't compile CREATE AS EXECUTE properly.
On Tue, Jul 06, 2021 at 05:47:34PM +0900, Kyotaro Horiguchi wrote: > More accurately, I didn't come up with the way to split out some of > the rule-components in a rule out as another rule using the existing > infrastructure. > > [...] > > Then add the following component to the rule "stmt". I see. That sounds interesting as solution, and consistent with the existing cursor queries. The ECPG parser is not that advanced, and we may not really need to make it more complicated with sub-clause handling like INEs. So I'd be rather in favor of what you are describing here. -- Michael signature.asc Description: PGP signature
Re: Why is XLOG_FPI_FOR_HINT always need backups?
On Tue, 06 Jul 2021 at 17:58, zwj <757634...@qq.com> wrote: > Hi all, > > When I read the source code file src/backend/access/transam/xloginsert.c, I > get something confused me. > In the function XLogSaveBufferForHint, the flags are always > REGBUF_FORCE_IMAGE which means it is always need backups. > Is it right? Why do not check the full_page_writes? The documentation [1] says: -- wal_log_hints (boolean) When this parameter is on, the PostgreSQL server writes the entire content of each disk page to WAL during the first modification of that page after a checkpoint, even for non-critical modifications of so-called hint bits. -- Does that mean that, whether full_page_writes is enabled or not, if wal_log_hints is enabled we always write the entire content of each disk page to WAL? If I'm right, should we mention this in wal_log_hints? [1] https://www.postgresql.org/docs/current/runtime-config-wal.html -- Regards, Japin Li. ChengDu WenWu Information Technology Co.,Ltd.
Re: Can a child process detect postmaster death when in pg_usleep?
On Tue, Jul 06, 2021 at 03:54:07PM +0530, Bharath Rupireddy wrote: > Thanks. You are right. The issue is due to the MyLatch being set by > SwitchToSharedLatch before WaitLatch. If we use (WL_TIMEOUT | > WL_EXIT_ON_PM_DEATH), then the backends will honour the > post_auth_delay as well as detect the postmaster death. Since we are > not using WL_LATCH_SET, I removed ResetLatch. Also, added some > comments around why we are not using WL_LATCH_SET. > > For PreAuthDelay, there's no problem to use WL_LATCH_SET as MyLatch > still points to the local latch(which is not set) in > BackendInitialize(). FWIW, I think that it could be a good idea to use the same set of flags for all the pre/post_auth_delay paths for consistency. That's useful when grepping for one. Please note that I don't plan to look more at this patch set for this CF as I am not really excited by the updates involving developer options, and I suspect more issues like the one I found upthread so this needs a close lookup. If somebody else wishes to look at it, please feel free, of course. -- Michael signature.asc Description: PGP signature
Re: Commit fest manager
On Tue, Jul 06, 2021 at 03:38:23PM +0500, Ibrar Ahmed wrote: > Any update and decision on this? so I can start working on this. Working on the CF does not strongly require admin permissions. I have already switched the current CF to in progress, so most of the admin job is done for this month :) Another piece of interest is the reviewer/author stat reports, but poking at patches does not need this information in most cases. -- Michael signature.asc Description: PGP signature
Re: Removing unneeded self joins
On Mon, Jul 5, 2021 at 2:20 PM Andrey Lepikhov wrote: > On 2/7/21 01:56, Hywel Carver wrote: > > On Wed, Jun 30, 2021 at 12:21 PM Andrey Lepikhov > > mailto:a.lepik...@postgrespro.ru>> wrote: > > I think, here we could ask more general question: do we want to > > remove a > > 'IS NOT NULL' clause from the clause list if the rest of the list > > implicitly implies it? > > > > > > My suggestion was not to remove it, but to avoid adding it in the first > > place. When your optimisation has found a join on a group of columns > > under a uniqueness constraint, you would do something like this (forgive > > the pseudo-code) > > > > foreach(column, join_clause) { > >if(column.nullable) { // This condition is what I'm suggesting is > added > > add_null_test(column, IS_NOT_NULL); > >} > > } > > > > But it may be that that's not possible or practical at this point in the > > code. > I think, such option will require to implement a new machinery to prove > that arbitrary column couldn't produce NULL value. > Got it, and it makes sense to me that this would be out of scope for this change. I remember in the previous conversation about this, Tomas acknowledged that while there are some silly queries that would benefit from this change, there are also some well-written ones (e.g. properly denormalised table structures, with decomposed views that need joining together in some queries). So the optimization needs to be essentially free to run to minimise impact on other queries. Looking through the email chain, a previous version of this patch added ~0.6% to planning time in the worst case tested - does that meet the "essentially free" requirement?
Re: Remove useless int64 range checks on BIGINT sequence MINVALUE/MAXVALUE values
On Sun, 4 Jul 2021 at 20:53, David Rowley wrote: > > On Sat, 3 Jul 2021 at 22:44, Peter Eisentraut > wrote: > > I don't think this is a good change. > > > I think we should leave it as is. > > I'm inclined to agree. Does anyone object to marking this patch as rejected in the CF app? David
Re: Binary search in ScalarArrayOpExpr for OR'd constant arrays
I've been looking at the NOT IN hashing patch again and after making a few minor tweaks I think it's pretty much ready to go. If anyone feels differently, please let me know in the next couple of days. Otherwise, I plan on taking a final look and pushing it soon. David v5-0001-Use-hash-table-to-speed-up-NOT-IN-with-a-set-of-C.patch Description: Binary data
Re: Commit fest manager
On Fri, Jul 2, 2021 at 7:15 PM Ibrar Ahmed wrote: > > > On Fri, 2 Jul 2021 at 7:06 PM, vignesh C wrote: > >> On Fri, Jul 2, 2021 at 6:05 PM Ibrar Ahmed wrote: >> > >> > >> > >> > On Fri, 2 Jul 2021 at 1:47 PM, David Rowley >> wrote: >> >> >> >> On Fri, 2 Jul 2021 at 15:04, vignesh C wrote: >> >> > I'm interested in assisting Ibrar Ahmed. >> >> >> >> It might be worth talking to Ibrar to see where you can lend a hand. I >> >> think in terms of the number of patches, this might be our biggest >> >> commitfest yet. >> >> >> >> 2020-07 246 >> >> 2020-09 235 >> >> 2020-11 244 >> >> 2021-01 260 >> >> 2021-03 295 >> >> 2021-07 342 >> >> >> >> It's possible Ibrar would welcome you helping to take care of some of >> >> the duties. I've never been brave enough to take on the CF manager >> >> role yet, but from what I can see, to do a good job takes a huge >> >> amount of effort. >> >> >> >> David >> > >> > >> > I am willing to take the responsibility; help from Vignesh is welcome >> >> Thanks. Can someone provide me permissions, as this will be my first >> time? My username is vignesh.postgres. >> >> Regards, >> Vignesh > > I need permission; my id is ibrar.ah...@gmail.com > >> >> -- > Ibrar Ahmed > Any update and decision on this, so I can start working on it? -- Ibrar Ahmed