Re: snapshot too old issues, first around wraparound and then more.

2021-06-22 Thread Greg Stark
On Thu, 17 Jun 2021 at 23:49, Noah Misch  wrote:
>
> On Wed, Jun 16, 2021 at 12:00:57PM -0400, Tom Lane wrote:
> > I agree that's a great use-case.  I don't like this implementation though.
> > I think if you want to set things up like that, you should draw a line
> > between the tables it's okay for the long transaction to touch and those
> > it isn't, and then any access to the latter should predictably draw an
> > error.

> I agree that would be a useful capability, but it solves a different problem.

Yeah, I think this discussion veered off into how to improve vacuum
snapshot tracking. That's a worthwhile endeavour, but it doesn't
really address the use case this patch was there to target.

Fundamentally there's no way in SQL for users to give this information
to Postgres. There's nothing in SQL or our API that lets a client
inform Postgres what tables a session is going to access within a
transaction in the future.

This alternative would amount to a command that a client would have to
issue at the start of every transaction, listing every table that
transaction will be allowed to touch. Any attempt to read from any
other table during the transaction would then get an error.

That sounds like it would be neat but it wouldn't work great with the
general approach in Postgres of having internal functions accessing
relations on demand (think of catalog tables, toast tables, and
pg_proc functions).

The "snapshot too old" approach is much more in line with Postgres's
general approach of giving users a general purpose platform and then
dealing gracefully with the consequences.

-- 
greg




Re: [HACKERS] logical decoding of two-phase transactions

2021-06-22 Thread vignesh C
On Wed, Jun 23, 2021 at 9:10 AM Ajin Cherian  wrote:
>
> On Tue, Jun 22, 2021 at 3:36 PM Greg Nancarrow  wrote:
>
> > Some minor comments:
> >
> > (1)
> > v88-0002
> >
> > doc/src/sgml/logicaldecoding.sgml
> >
> > "examples shows" is not correct.
> > I think there is only ONE example being referred to.
> >
> > BEFORE:
> > +The following examples shows how logical decoding is controlled over 
> > the
> > AFTER:
> > +The following example shows how logical decoding is controlled over the
> >
> >
> fixed.
>
> > (2)
> > v88 - 0003
> >
> > doc/src/sgml/ref/create_subscription.sgml
> >
> > (i)
> >
> > BEFORE:
> > +  to the subscriber on the PREPARE TRANSACTION. By default,
> > the transaction
> > +  prepared on publisher is decoded as a normal transaction at 
> > commit.
> > AFTER:
> > +  to the subscriber on the PREPARE TRANSACTION. By default,
> > the transaction
> > +  prepared on the publisher is decoded as a normal
> > transaction at commit time.
> >
>
> fixed.
>
> > (ii)
> >
> > src/backend/access/transam/twophase.c
> >
> > The double-bracketing is unnecessary:
> >
> > BEFORE:
> > + if ((gxact->valid && strcmp(gxact->gid, gid) == 0))
> > AFTER:
> > + if (gxact->valid && strcmp(gxact->gid, gid) == 0)
> >
>
> fixed.
>
> > (iii)
> >
> > src/backend/replication/logical/snapbuild.c
> >
> > Need to add some commas to make the following easier to read, and
> > change "needs" to "need":
> >
> > BEFORE:
> > + * The prepared transactions that were skipped because previously
> > + * two-phase was not enabled or are not covered by initial snapshot needs
> > + * to be sent later along with commit prepared and they must be before
> > + * this point.
> > AFTER:
> > + * The prepared transactions, that were skipped because previously
> > + * two-phase was not enabled or are not covered by initial snapshot, need
> > + * to be sent later along with commit prepared and they must be before
> > + * this point.
> >
>
> fixed.
>
> > (iv)
> >
> > src/backend/replication/logical/tablesync.c
> >
> > I think the convention used in Postgres code is to check for empty
> > Lists using "== NIL" and non-empty Lists using "!= NIL".
> >
> > BEFORE:
> > + if (table_states_not_ready && !last_start_times)
> > AFTER:
> > + if (table_states_not_ready != NIL && !last_start_times)
> >
> >
> > BEFORE:
> > + else if (!table_states_not_ready && last_start_times)
> > AFTER:
> > + else if (table_states_not_ready == NIL && last_start_times)
> >
>
> fixed.
>
> Also fixed comments from Vignesh:
>
> 1) This content is present in
> v87-0001-Add-option-to-set-two-phase-in-CREATE_REPLICATIO.patch and
> v87-0003-Add-support-for-prepared-transactions-to-built-i.patch, it
> can be removed from one of them
>
> +   TWO_PHASE
> +   
> +
> + Specify that this logical replication slot supports decoding
> of two-phase
> + transactions. With this option, two-phase commands like
> + PREPARE TRANSACTION, COMMIT
> PREPARED
> + and ROLLBACK PREPARED are decoded and 
> transmitted.
> + The transaction will be decoded and transmitted at
> + PREPARE TRANSACTION time.
> +
> +   
> +  
> +
> +  
>
> I don't see this duplicate content.

Thanks for the updated patch.
The patch v89-0001-Add-option-to-set-two-phase-in-CREATE_REPLICATIO.patch
has the following:
+   TWO_PHASE
+   
+
+ Specify that this logical replication slot supports decoding
of two-phase
+ transactions. With this option, two-phase commands like
+ PREPARE TRANSACTION, COMMIT
PREPARED
+ and ROLLBACK PREPARED are decoded and transmitted.
+ The transaction will be decoded and transmitted at
+ PREPARE TRANSACTION time.
+
+   
+  

The patch v89-0003-Add-support-for-prepared-transactions-to-built-i.patch
has the following:
+   TWO_PHASE
+   
+
+ Specify that this replication slot supports decode of two-phase
+ transactions. With this option, two-phase commands like
+ PREPARE TRANSACTION, COMMIT
PREPARED
+ and ROLLBACK PREPARED are decoded and transmitted.
+ The transaction will be decoded and transmitted at
+ PREPARE TRANSACTION time.
+
+   
+  

We can remove one of them.

Regards,
Vignesh




Automatic notification for top transaction IDs

2021-06-22 Thread Gurjeet Singh
I came across this thread [1] to disallow canceling a transaction not
yet confirmed by a synchronous replica. I think my proposed patch
might help that case as well, hence adding all involved in that thread
to BCC, for one-time notification.

As mentioned in that thread, when sending a cancellation signal, the
client cannot be sure whether the cancel signal was honored and the
transaction was cancelled successfully. In the attached patch, the
backend emits a NotificationResponse containing the current full
transaction id. It does so only if the relevant GUC is enabled, and
only when the top-level transaction is being assigned its ID.

This information can be useful to the client when:
i) it wants to cancel a transaction _after_ issuing a COMMIT, and
ii) it wants to check the status of a transaction it sent a COMMIT
for but never received a response to (perhaps because the server
crashed).

Additionally, this information can be useful for middleware, like
Transaction Processing Monitors, which can now transparently (without
any change in application code) monitor the status of transactions (by
watching for the transaction status indicator in the ReadyForQuery
protocol message). They can use the transaction ID from the
NotificationResponse to open a watcher, and on seeing either an 'E' or
'I' payload in subsequent ReadyForQuery messages, close the watcher.
On server crash, or other adverse events, they can then use the
transaction IDs still being watched to check status of those
transactions, and take appropriate actions, e.g. retry any aborted
transactions.
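
As a rough illustration of the consumer side, a minimal libpq sketch is
below (the "txid" channel name and the remember_xid_to_watch() helper
are placeholders for this example, not necessarily what the patch uses):

#include <string.h>
#include <libpq-fe.h>

/* Illustrative only: poll for the proposed transaction-id notification. */
static void
poll_for_xid_notifications(PGconn *conn)
{
	PGnotify   *n;

	PQconsumeInput(conn);
	while ((n = PQnotifies(conn)) != NULL)
	{
		/* channel name and helper are hypothetical */
		if (strcmp(n->relname, "txid") == 0)
			remember_xid_to_watch(n->extra);	/* payload carries the xid */
		PQfreemem(n);
	}
}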

We cannot use the elog() mechanism for this notification because it is
sensitive to the value of client_min_messages. Hence I used the NOTIFY
infrastructure for this message. I understand that this usage violates
some expectations as to how NOTIFY messages are supposed to behave
(see [2] below), but I think these are acceptable violations; open to
hearing if/why this might not be acceptable, and any possible
alternatives.

I'm not very familiar with the parallel workers infrastructure, so the
patch is missing any consideration for those.

Reviews welcome.

[1]: subject was: Re: Disallow cancellation of waiting for synchronous
replication
thread: 
https://www.postgresql.org/message-id/flat/C1F7905E-5DB2-497D-ABCC-E14D4DEE506C%40yandex-team.ru

[2]:
 At present, NotificationResponse can only be sent outside a
 transaction, and thus it will not occur in the middle of a
 command-response series, though it might occur just before ReadyForQuery.
 It is unwise to design frontend logic that assumes that, however.
 Good practice is to be able to accept NotificationResponse at any
 point in the protocol.

Best regards,
--
Gurjeet Singh http://gurjeet.singh.im/


notify_xid.patch
Description: Binary data


Re: Assertion failure in HEAD and 13 after calling COMMIT in a stored proc

2021-06-22 Thread Justin Pryzby
On Wed, Jun 23, 2021 at 12:07:11AM -0400, Tom Lane wrote:
> Justin Pryzby  writes:
> > This causes the server to crash during FETCH.
> 
> > ts=# begin; declare b cursor for VALUES(1); fetch 100 in b;
> > BEGIN
> > DECLARE CURSOR
> > server closed the connection unexpectedly
> > This probably means the server terminated abnormally
> > before or while processing the request.
> > The connection to the server was lost. Attempting reset: Failed.
> 
> Hm, works for me:

I think it's because I had an old pg_stat_statements module, and hadn't run
make -C contrib.

Sorry for the noise.

-- 
Justin




Partition Check not updated when insert into a partition

2021-06-22 Thread houzj.f...@fujitsu.com
Hi,

When inserting directly into a partition, postgres will invoke ExecPartitionCheck,
which executes the partition constraint checks of its parent and grandparent.
From the code, the whole constraint check is saved in relcache::rd_partcheck.

Consider a multi-level partition setup, for example: table 'A' is a partition of
table 'B', and 'B' is in turn a partition of table 'C'. After I 'ALTER TABLE C
DETACH B', I expected the partition constraint check of table 'C' to no longer
matter when inserting into table 'A'. But it looks like the relcache of 'A' is not
invalidated after detaching 'B', and relcache::rd_partcheck still includes the
partition constraint of table 'C'. Note that if I invalidate table 'A''s relcache
manually, then the next time relcache::rd_partcheck is rebuilt it is updated to
the expected form, which does not include the partition constraint check of
table 'C'. (ATTACH PARTITION has the same behaviour: relcache::rd_partcheck is
not updated immediately.)

Is this working as expected? I didn't find any explanation in the docs.
(Sorry if I missed something.)

Best regards,
houzj




Re: Doc chapter for Hash Indexes

2021-06-22 Thread Amit Kapila
On Tue, Jun 22, 2021 at 2:31 PM Simon Riggs
 wrote:
>
> I attach both clean and compare versions.
>

Do we want to hold this work for PG15, or commit it in PG14 and back-patch
it to v10, where we made hash indexes crash-safe? I would vote for
committing in PG14 and back-patching to v10; however, I am fine if we
want to commit just to PG14 or PG15.

-- 
With Regards,
Amit Kapila.




Re: Assertion failure in HEAD and 13 after calling COMMIT in a stored proc

2021-06-22 Thread Tom Lane
Justin Pryzby  writes:
> This causes the server to crash during FETCH.

> ts=# begin; declare b cursor for VALUES(1); fetch 100 in b;
> BEGIN
> DECLARE CURSOR
> server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.

Hm, works for me:

regression=# begin; declare b cursor for VALUES(1); fetch 100 in b;
BEGIN
DECLARE CURSOR
 column1 
---------
       1
(1 row)

regards, tom lane




Re: Assertion failure in HEAD and 13 after calling COMMIT in a stored proc

2021-06-22 Thread Justin Pryzby
On Tue, Jun 22, 2021 at 01:13:08PM -0400, Tom Lane wrote:
> I wrote:
> > The attached seems to be enough to resolve Jim's example.  I'd like
> > to invent a test case that involves a detoast of the simple
> > expression's result, too, to show that transiently pushing a
> > snapshot for the duration of the expression is not the right fix.
> 
> Here we go.  This test case gives "cannot fetch toast data without an
> active snapshot" in v11 and v12 branch tips.  Since those branches lack
> the 73b06cf89 optimization, they push a snapshot while calling the
> SQL-language function, thus it doesn't complain.  But what comes back
> is toasted, and then we fail trying to detoast it.

This causes the server to crash during FETCH.

ts=# begin; declare b cursor for VALUES(1); fetch 100 in b;
BEGIN
DECLARE CURSOR
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

7c337b6b527b7052e6a751f966d5734c56f668b5 is the first bad commit
|   commit 7c337b6b527b7052e6a751f966d5734c56f668b5
|   Author: Tom Lane 
|   Date:   Fri Jun 18 11:22:58 2021 -0400
|
|   Centralize the logic for protective copying of utility statements.

I noticed because it was tickled by pg_dump.

[109037.576659] postgres[32358]: segfault at 9a ip 7f86a68fa7b1 sp 
7fffd5ae2a88 error 4 in libc-2.17.so[7f86a678b000+1c4000]

< 2021-06-22 20:00:06.557 EDT  >LOG:  server process (PID 32358) was terminated 
by signal 11: Segmentation fault
< 2021-06-22 20:00:06.557 EDT  >DETAIL:  Failed process was running: FETCH 1000 
IN bloboid

Core was generated by `postgres: postgres ts [local] FETCH  
'.

(gdb) bt
#0  0x7f86a68fa7b1 in __strlen_sse2_pminub () from /lib64/libc.so.6
#1  0x008f7151 in string_hash (key=0x9a, keysize=64) at hashfn.c:667
#2  0x008c1dd0 in hash_search (hashp=0x1534168, keyPtr=0x9a, 
action=action@entry=HASH_REMOVE, foundPtr=foundPtr@entry=0x0) at dynahash.c:959
#3  0x008df29b in PortalDrop (portal=0x1532158, isTopCommit=) at portalmem.c:514
#4  0x007959a7 in exec_simple_query (query_string=0x14bce88 "FETCH 1000 
IN bloboid") at postgres.c:1224
#5  0x00796e2d in PostgresMain (argc=argc@entry=1, 
argv=argv@entry=0x7fffd5ae2ff0, dbname=0x14b9a78 "ts", username=) at postgres.c:4486
#6  0x004890c1 in BackendRun (port=, port=) at postmaster.c:4507
#7  BackendStartup (port=0x14e4280) at postmaster.c:4229
#8  ServerLoop () at postmaster.c:1745
#9  0x00718dcd in PostmasterMain (argc=argc@entry=3, 
argv=argv@entry=0x14b79e0) at postmaster.c:1417
#10 0x00489f32 in main (argc=3, argv=0x14b79e0) at main.c:209

(gdb) p *portal
$1 = {name = 0x9a , prepStmtName = 0x0, 
portalContext = 0x15f5b00, resowner = 0x14f7d50, cleanup = 0x0, createSubid = 
1, activeSubid = 1,
  sourceText = 0x14bce88 "FETCH 1000 IN bloboid", commandTag = CMDTAG_FETCH, qc 
= {commandTag = CMDTAG_FETCH, nprocessed = 0}, stmts = 0x14bdc58, cplan = 0x0,
  portalParams = 0x0, queryEnv = 0x0, strategy = PORTAL_UTIL_SELECT, 
cursorOptions = 4, run_once = true, status = PORTAL_READY, portalPinned = 
false, autoHeld = false,
  queryDesc = 0x0, tupDesc = 0x15f5c18, formats = 0x15f5d28, portalSnapshot = 
0x0, holdStore = 0x165d688, holdContext = 0x165d570, holdSnapshot = 0x0, 
atStart = true,
  atEnd = true, portalPos = 0, creation_time = 677721606449684, visible = false}




Re: pgbench logging broken by time logic changes

2021-06-22 Thread Michael Paquier
On Tue, Jun 22, 2021 at 12:06:45PM +0200, Fabien COELHO wrote:
> Attached an updated v8 patch which adds (reinstate) an improved TAP test
> which would have caught the various regressions on logs.

> Given that such tests were removed once before, I'm unsure whether they will
> be acceptable, despite that their usefulness has been clearly demonstrated.
> At least it is for the record. Sigh:-(

Thanks!

This v8 is an addition of the fix for the epoch with the adjustments
for the aggregate reports in the logs.  The maths look rather right
after a read and after some tests.

+# note: this test is time sensitive, and may fail on a very
+#   loaded host.
+# note: --progress-timestamp is not tested
+my $delay = pgbench(
+   '-T 2 -P 1 -l --aggregate-interval=1 -S -b se@2'
+   . ' --rate=20 --latency-limit=1000 -j ' . $nthreads
+   . ' -c 3 -r',
+   0,
+   [   qr{type: multiple},
+   qr{clients: 3},
+   qr{threads: $nthreads},
+   qr{duration: 2 s},
+   qr{script 1: .* select only},
+   qr{script 2: .* select only},
+   qr{statement latencies in milliseconds},
+   qr{FROM pgbench_accounts} ],
+   [ qr{vacuum}, qr{progress: 1\b} ],
+   'pgbench progress', undef,
+   "--log-prefix=$bdir/001_pgbench_log_1");
Could we make that shorter at 1s?  That will shorten the duration of
the test run.  It is easy to miss that this test is for
--aggregate-interval (aka the logAgg() path) without a comment.

+# cool check that we are around 2 seconds
+# The rate may results in an unlucky schedule which triggers
+# an early exit, hence the loose bound.
+ok(0.0 < $delay && $delay < 4.0, "-T 2 run around 2 seconds");

The command itself would not fail, but we would just fail on a machine
where the difference in start/stop time is higher than 4 seconds,
right?  On RPI-level machines, this could fail reliably.  I am not
completely sure what additional value we can get from that extra
test, to be honest.

+# $nthreads threads, 2 seconds, but due to timing imprecision we might get
+# only 1 or as many as 3 progress reports per thread.
+check_pgbench_logs($bdir, '001_pgbench_log_1', $nthreads, 1, 3,
+   qr{^\d{10,} \d{1,2} \d+ \d+ \d+ \d+ \d+ \d+ \d+ \d+ \d+$});
+
Now this one is good and actually stable, thanks to the fact that we'd
get at least the final logs, and the main complaint discussed on this
thread was the format of the logs.  I would say to give up on the
first test and keep the second.  But is this regex correct?  Shouldn't
we check for exactly six integer fields, with the first one having a
minimal number of digits for the epoch?
--
Michael




Re: intermittent failures in Cygwin from select_parallel tests

2021-06-22 Thread Noah Misch
On Tue, Jun 22, 2021 at 08:18:42AM -0400, Andrew Dunstan wrote:
> On 6/22/21 2:42 AM, Noah Misch wrote:
> > I was wondering about suggesting some kind of
> >> official warning, but I guess the manual already covers it with this
> >> 10 year old notice.  I don't know much about Windows or Cygwin so I'm
> >> not sure if it needs updating or not, but I would guess that there are
> >> no longer any such systems?
> >>
> >>   Cygwin is not recommended for running a
> >>   production server, and it should only be used for running on
> >>   older versions of Windows where
> >>   the native build does not work.
> > I expect native builds work on all Microsoft-supported Windows versions, so 
> > +1
> > for removing everything after the comma.
> 
> If we just want to declare Cygwin unsupported I can halt lorikeet and we
> can be done with it.

No way.  lorikeet is valuable. 




Re: Optionally automatically disable logical replication subscriptions on error

2021-06-22 Thread Amit Kapila
On Mon, Jun 21, 2021 at 4:17 PM Amit Kapila  wrote:
>
> On Mon, Jun 21, 2021 at 11:19 AM Amit Kapila  wrote:
> >
>
> I think we should store the input from the user (like disable_on_error
> flag or xid to skip) in the system catalog pg_subscription and send
> the error information (subscription_id, rel_id, xid of failed xact,
> error_code, error_message, etc.) to the stats collector which can be
> used to display such information via a stat view.
>
> The disable_on_error flag handling could be that on error it sends the
> required error info to stats collector and then updates the subenabled
> in pg_subscription. In rare conditions, where we are able to send the
> message but couldn't update the subenabled info in pg_subscription
> either due to some error or server restart, the apply worker would
> again try to apply the same change and would hit the same error again
> which I think should be fine because it will ultimately succeed.
>
> The skip xid handling will also be somewhat similar where on an error,
> we will send the error information to stats collector which will be
> displayed via stats view. Then the user is expected to ask for skip
> xid (Alter Subscription ... SKIP ) based on information
> displayed via stat view. Now, the apply worker can skip changes from
> such a transaction, and then during processing of commit record of the
> skipped transaction, it should update xid to invalid value, so that
> next time that shouldn't be used. I think it is important to update
> xid to an invalid value as part of the skipped transaction because
> otherwise, after the restart, we won't be able to decide whether we
> still want to skip the xid stored for a subscription.
>

One minor detail I missed in the above sketch for the skipped-transaction
feature: we actually only need the replication origin state from the
commit record of the skipped transaction, and then I think we need to
start a transaction, update the xid value to invalid, set the
replication origin state, and commit that transaction.
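
A rough sketch of that sequence, just to make the idea concrete (the
UpdateSubscriptionSkipXid() helper here is an invented placeholder, not
existing code):

static void
finish_skipped_transaction(Oid subid, XLogRecPtr origin_lsn,
						   TimestampTz origin_ts)
{
	StartTransactionCommand();

	/* clear the stored skip xid (placeholder helper) */
	UpdateSubscriptionSkipXid(subid, InvalidTransactionId);

	/* set the origin state so it is committed with the catalog change */
	replorigin_session_origin_lsn = origin_lsn;
	replorigin_session_origin_timestamp = origin_ts;

	CommitTransactionCommand();
}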

-- 
With Regards,
Amit Kapila.




Re: subscription/t/010_truncate.pl failure on desmoxytes in REL_13_STABLE

2021-06-22 Thread Amit Kapila
On Tue, Jun 22, 2021 at 6:54 PM Tom Lane  wrote:
>
> Thomas Munro  writes:
> > On Tue, Jun 22, 2021 at 10:01 PM Amit Kapila  
> > wrote:
> >> I see this report is from 16th June 2021 and the commit is on 18th
> >> June 2021. So, I am hoping this should have been fixed but if we see
> >> it again then probably we need to investigate it further.
>
> > Ahh, that makes sense.  Thanks for checking, and sorry for the noise.
>
> BTW, the reason that the walsender is still showing its "query" as
> "SELECT pg_config..." is that pre-v14 versions don't update the
> reported query for replication commands, only plain-SQL commands.
> I recall that we fixed that in HEAD awhile ago; should we consider
> back-patching something for it?
>

I think it would be great if we could do that. For analyzing such
failures, and replication errors in general, that would be a nice
improvement and would make many people's jobs a bit easier.

-- 
With Regards,
Amit Kapila.




Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

2021-06-22 Thread Greg Nancarrow
On Wed, Jun 23, 2021 at 1:06 AM Maxim Orlov  wrote:
>
> The analysis in the beginning of the discussion seems to be right, but
> the fix v2 looks too invasive for me.
>
> Personally, I'd like not to remove snapshot even if transaction is
> read-only. I propose to consider "xid < TransactionXmin" as a legit case
> and just promote xid to TransactionXmin.
>
> It's annoying this old bug still not fixed. What do you think?


This v3 patch doesn't look right to me at all.
It's not addressing the fundamental problem; it just seems to work
around it by fiddling an xid value to avoid an assertion failure.

I really can't see how the v2 patch "removes a snapshot" in the
read-only transaction case, and is "invasive".
Have you found a case that the v2 patch breaks?

The v2 patch does follow the analysis in the beginning of the
discussion, which identified that in setting up parallel workers, a
"later transaction snapshot" was taken than the one actually used in
the statement execution, and that's what eventually leads to the
observed Assertion failure.
The original problem reporter stated: "So the root cause is the
Parallel Workers process set the TransactionXmin with later
transcation snapshot".

Also, as far as I can see, it makes no sense to pass parallel workers
both an active snapshot and a (later) transaction snapshot. In the
leader, prior to execution and running parallel workers, a transaction
snapshot is obtained and pushed as the active snapshot (note: only ONE
snapshot at this point). It makes little sense to me to then obtain
ANOTHER transaction snapshot when setting up parallel workers, and
pass that (only) to the workers along with the original (earlier)
active snapshot. (If there is a reason for passing the TWO snapshots
to parallel workers, original code authors and/or snapshot experts
should speak up now, and code comments should be added accordingly to
explain the purpose and how it is MEANT to work.)

This is why the v2 patch updates the code to just pass one snapshot
(the active snapshot) to the parallel workers, which gets restored in
each worker as the transaction snapshot and set as the active snapshot
(so it is then consistent with the snapshot setup in the parallel
leader).

The v3 patch doesn't address any of this at all. It is just checking
whether the xid is < the TransactionXMin in the snapshot, and if so,
setting the xid to be TransactionXMin, thus avoiding the Assertion
failure. But that TransactionXMin was from the "later transaction
snapshot", which was only obtained for the workers and doesn't
necessarily match the earlier active snapshot, which is associated
with the actual statement execution. As far as I am concerned, that
"later transaction snapshot", taken for use by the workers, doesn't
seem right, so the v2 patch removes the use of it and only uses the
active snapshot, for consistency with the rest of the code. I think
that if there were something fundamentally wrong with the v2 patch's
removal of that "later transaction snapshot" used by the parallel
workers, then surely there would be a test failure somewhere - but
that is not the case.

I think the v2 patch should be restored as the proposed solution here.
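
For what it's worth, my reading of the v2 worker-side change, expressed
as a rough sketch only (consult the actual patch for the real code):

static void
restore_single_snapshot(char *active_snapshot_space, PGPROC *leader_pgproc)
{
	Snapshot	asnapshot = RestoreSnapshot(active_snapshot_space);

	/* the one serialized snapshot serves as both transaction and active snapshot */
	RestoreTransactionSnapshot(asnapshot, leader_pgproc);
	PushActiveSnapshot(asnapshot);
}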

Regards,
Greg Nancarrow
Fujitsu Australia




RE: [bug?] Missed parallel safety checks, and wrong parallel safety

2021-06-22 Thread houzj.f...@fujitsu.com
On Tuesday, June 22, 2021 8:25 PM Amit Kapila  wrote:
> On Wed, Jun 16, 2021 at 8:57 AM houzj.f...@fujitsu.com 
>  wrote:
> >
> > I think the check of partition could be even more complicated if we
> > need to check the parallel safety of partition key expression. If user
> > directly insert into a partition, then we need invoke
> > ExecPartitionCheck which will execute all it's parent's and
> > grandparent's partition key expressions. It means if we change a
> > parent table's partition key expression(by 1) change function in expr
> > or 2) attach the parent table as partition of another parent table), then we
> need to invalidate all its child's relcache.
> >
> 
> I think we already invalidate the child entries when we add/drop constraints 
> on
> a parent table. See ATAddCheckConstraint, ATExecDropConstraint. If I am not
> missing anything then this case shouldn't be a problem. Do you have
> something else in mind?

Currently, attaching/detaching a partition doesn't invalidate the child entries
recursively, except when detaching a partition concurrently, which adds a
constraint to all the children. Do you mean we could add logic to invalidate
the child entries recursively when attaching/detaching a partition?
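
Something like the following rough sketch is what I'm imagining (just an
illustration, untested; the function name is invented):

static void
InvalidatePartitionTreeRelcache(Oid parentrelid)
{
	List	   *children;
	ListCell   *lc;

	/* collect the parent and every partition below it */
	children = find_all_inheritors(parentrelid, AccessShareLock, NULL);

	foreach(lc, children)
		CacheInvalidateRelcacheByRelid(lfirst_oid(lc));

	list_free(children);
}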

Best regards,
houzj


Re: Pipeline mode and PQpipelineSync()

2021-06-22 Thread Alvaro Herrera
On 2021-Jun-22, Alvaro Herrera wrote:

> > So I think it would be useful to clarify the server behavior and
> > specify it in the documentation.
> 
> I'll see about improving the docs on these points.

So I started to modify the second paragraph to indicate that the client
would send data on PQflush/buffer full/PQpipelineSync, only to realize
that the first paragraph already explains this.  So I'm not sure if any
changes are needed.

Maybe your complaint is only based on disagreement about what libpq
does regarding queueing commands; as far as I can tell from quick
experimentation with libpq, it already works as the docs state.

-- 
Álvaro Herrera   Valdivia, Chile




Re: Speed up pg_checksums in cases where checksum already set

2021-06-22 Thread Michael Paquier
On Fri, Jun 18, 2021 at 08:01:17PM -0400, Greg Sabino Mullane wrote:
> I don't know that we need to bother: the default is already to sync and one
> has to go out of one's way using the -N argument to NOT sync, so I think
> it's a pretty safe assumption to everyone (except those who read my first
> version of my patch!) that syncing always happens.

Perhaps you are right to keep it simple.  If people would like to
document that more precisely, it could always be changed if
necessary.  What you have here sounds pretty much right to me.
--
Michael




Re: Avoid stuck of pbgench due to skipped transactions

2021-06-22 Thread Yugo NAGATA
Hello Greg,

On Tue, 22 Jun 2021 19:22:38 +
Greg Sabino Mullane  wrote:

> The following review has been posted through the commitfest application:
> make installcheck-world:  tested, failed
> Implements feature:   tested, failed
> Spec compliant:   not tested
> Documentation:not tested
> 
> Looks fine to me, as a way of catching this edge case.

Thank you for looking into this!

'make installcheck-world' and 'Implements feature' are marked "failed",
but did you find any problem with this patch?

-- 
Yugo NAGATA 




Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors

2021-06-22 Thread Yugo NAGATA
Hello Fabien,

On Tue, 22 Jun 2021 20:03:58 +0200 (CEST)
Fabien COELHO  wrote:

> 
> Hello Yugo-san,
> 
> Thanks a lot for continuing this work started by Marina!
> 
> I'm planning to review it for the July CF. I've just added an entry there:
> 
>   https://commitfest.postgresql.org/33/3194/

Thanks!

-- 
Yugo NAGATA 




Re: Use simplehash.h instead of dynahash in SMgr

2021-06-22 Thread Thomas Munro
On Tue, Jun 22, 2021 at 6:51 PM David Rowley  wrote:
> I guess we could also ask ourselves how many join algorithms we need.

David and I discussed this a bit off-list, and I just wanted to share
how I understand the idea so far in case it helps someone else.  There
are essentially three subcomponents working together:

1.  A data structure similar in some ways to a C++ std::deque,
which gives O(1) access to elements by index, is densely packed to
enable cache-friendly scanning of all elements, has stable addresses
(as long as you only add new elements at the end or overwrite existing
slots), and is internally backed by an array of pointers to a set of
chunks.

2.  A bitmapset that tracks unused elements in 1, making it easy to
find the lowest-index hole when looking for a place to put a new one
by linear search for a 1 bit, so that we tend towards maximum density
despite having random frees from time to time (seems good, the same
idea is used in  kernels to allocate the lowest unused file descriptor
number).

3. A hash table that has as elements indexes into 1.  It somehow hides
the difference between keys (what callers look things up with) and
keys reachable by following an index into 1 (where elements' keys
live).

One thought is that you could do 1 as a separate component as the
"primary" data structure, and use a plain old simplehash for 3 as a
kind of index into it, but use pointers (rather than indexes) to
objects in 1 as elements.  I don't know if it's better.
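
To make that concrete, here is a very rough sketch of the three pieces
(all of these names are invented here, not taken from David's patch):

/* 1. densely packed, chunked array with stable element addresses */
typedef struct ChunkedVector
{
	void	  **chunks;			/* array of pointers to fixed-size chunks */
	uint32		nchunks;
	uint32		high_water;		/* one past the highest index in use */
	Size		element_size;
} ChunkedVector;

/* 2. free-slot tracking: lowest 1-bit marks the hole to reuse next */
typedef struct ChunkedVectorAllocator
{
	ChunkedVector vec;
	Bitmapset  *free_slots;
} ChunkedVectorAllocator;

/* 3. hash table whose elements are just indexes into component 1 */
typedef struct SMgrLookupEntry
{
	uint32		index;			/* where the SMgrRelationData actually lives */
	char		status;			/* simplehash bookkeeping */
} SMgrLookupEntry;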




Re: Maintaining a list of pgindent commits for "git blame" to ignore

2021-06-22 Thread Peter Geoghegan
On Tue, Jun 22, 2021 at 5:01 PM Alvaro Herrera  wrote:
> I have 2.30.  It works better.  To be clear: some lines still appear as
> originating in some pgindent commit, when they are created by such a
> commit.  But as far as I've seen, they're mostly uninteresting ones
> (whitespace, only braces, only "else", only "for (;;)" and similar).

As I understand it there are a small number of remaining lines that
are fundamentally impossible to attribute to any commit but a pgindent
commit. These are lines that a pgindent commit created, typically when
it adds a new single line of whitespace (carriage return). I think
that these remaining lines of whitespace probably *should* be
attributed to a pgindent commit -- it's actually a good thing. In any
case they're unlikely to be called up because they're just whitespace.

> The git blame experience seems much better.  Thanks!

I'm very pleased with the results myself.

It's particularly nice when you "git blame" an old file that has been
through multiple huge pgindent changes. You can actually see
reasonable attributions for commits that go back to the 1990s now.

-- 
Peter Geoghegan




Re: Maintaining a list of pgindent commits for "git blame" to ignore

2021-06-22 Thread Alvaro Herrera
On 2021-Mar-18, Peter Geoghegan wrote:

> On Thu, Mar 18, 2021 at 3:12 PM Bruce Momjian  wrote:
> > It would be kind of nice if the file can be generated automatically.  I
> > have you checked if 'pgindent' being on the first line of the commit is
> > sufficient?
> 
> I generated the file by looking for commits that:
> 
> 1) Mentioned "pgindent" or "PGINDENT" in the entire commit message.
> 
> 2) Had more than 20 or 30 files changed.

Is there a minimum git version for this to work?  It doesn't seem to
work for me.

... ah, apparently one needs git 2.23:
https://www.moxio.com/blog/43/ignoring-bulk-change-commits-with-git-blame

I have 2.20.

[ apt install -t buster-backports git ]

I have 2.30.  It works better.  To be clear: some lines still appear as
originating in some pgindent commit, when they are created by such a
commit.  But as far as I've seen, they're mostly uninteresting ones
(whitespace, only braces, only "else", only "for (;;)" and similar).
   
The git blame experience seems much better.  Thanks!

-- 
Álvaro Herrera39°49'30"S 73°17'W
"El que vive para el futuro es un iluso, y el que vive para el pasado,
un imbécil" (Luis Adler, "Los tripulantes de la noche")




Re: cleanup temporary files after crash

2021-06-22 Thread Michael Paquier
On Tue, Jun 22, 2021 at 04:12:51PM -0300, Euler Taveira wrote:
> Was it? I seem to have missed this suggestion.
> 
> I'm attaching a patch to fix it.

Looks adapted to me.
--
Michael




Re: [PoC] Federated Authn/z with OAUTHBEARER

2021-06-22 Thread Jacob Champion
On Fri, 2021-06-18 at 13:07 +0900, Michael Paquier wrote:
> On Tue, Jun 08, 2021 at 04:37:46PM +, Jacob Champion wrote:
> > 1. Prep
> > 
> >   0001 decouples the SASL code from the SCRAM implementation.
> >   0002 makes it possible to use common/jsonapi from the frontend.
> >   0003 lets the json_errdetail() result be freed, to avoid leaks.
> > 
> > The first three patches are, hopefully, generally useful outside of
> > this implementation, and I'll plan to register them in the next
> > commitfest. The middle two patches are the "interesting" pieces, and
> > I've split them into client and server for ease of understanding,
> > though neither is particularly useful without the other.
> 
> Beginning with the beginning, could you spawn two threads for the
> jsonapi rework and the SASL/SCRAM business?

Done [1, 2]. I've copied your comments into those threads with my
responses, and I'll have them registered in commitfest shortly.

Thanks!
--Jacob

[1] 
https://www.postgresql.org/message-id/3d2a6f5d50e741117d6baf83eb67ebf1a8a35a11.camel%40vmware.com
[2] 
https://www.postgresql.org/message-id/a250d475ba1c0cc0efb7dfec8e538fcc77cdcb8e.camel%40vmware.com


Re: [PoC] Federated Authn/z with OAUTHBEARER

2021-06-22 Thread Jacob Champion
On Fri, 2021-06-18 at 11:31 +0300, Heikki Linnakangas wrote:
> On 08/06/2021 19:37, Jacob Champion wrote:
> > We've been working on ways to expand the list of third-party auth
> > methods that Postgres provides. Some example use cases might be "I want
> > to let anyone with a Google account read this table" or "let anyone who
> > belongs to this GitHub organization connect as a superuser".
> 
> Cool!

Glad you think so! :D

> > The iddawc dependency for client-side OAuth was extremely helpful to
> > develop this proof of concept quickly, but I don't think it would be an
> > appropriate component to build a real feature on. It's extremely
> > heavyweight -- it incorporates a huge stack of dependencies, including
> > a logging framework and a web server, to implement features we would
> > probably never use -- and it's fairly difficult to debug in practice.
> > If a device authorization flow were the only thing that libpq needed to
> > support natively, I think we should just depend on a widely used HTTP
> > client, like libcurl or neon, and implement the minimum spec directly
> > against the existing test suite.
> 
> You could punt and let the application implement that stuff. I'm 
> imagining that the application code would look something like this:
> 
> conn = PQconnectStartParams(...);
> for (;;)
> {
>  status = PQconnectPoll(conn)
>  switch (status)
>  {
>  case CONNECTION_SASL_TOKEN_REQUIRED:
>  /* open a browser for the user, get token */
>  token = open_browser()
>  PQauthResponse(token);
>  break;
>  ...
>  }
> }

I was toying with the idea of having a callback for libpq clients,
where they could take full control of the OAuth flow if they wanted to.
Doing it inline with PQconnectPoll seems like it would work too. It has
a couple of drawbacks that I can see:

- If a client isn't currently using a poll loop, they'd have to switch
to one to be able to use OAuth connections. Not a difficult change, but
considering all the other hurdles to making this work, I'm hoping to
minimize the hoop-jumping.

- A client would still have to receive a bunch of OAuth parameters from
some new libpq API in order to construct the correct URL to visit, so
the overall complexity for implementers might be higher than if we just
passed those params directly in a callback.
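
Just to sketch the callback idea, something along these lines is what I
had in mind (every name here is hypothetical; none of this exists in
libpq today):

/* hypothetical: called when the server asks for an OAuth bearer token */
typedef int (*PQoauthTokenHook) (PGconn *conn,
								 const char *discovery_uri,
								 const char *client_id,
								 char **bearer_token_out,
								 void *hook_arg);

/* hypothetical registration function */
extern void PQsetOauthTokenHook(PGconn *conn,
								PQoauthTokenHook hook,
								void *hook_arg);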

> It would be nice to have a simple default implementation, though, for 
> psql and all the other client applications that come with PostgreSQL itself.

I agree. I think having a bare-bones implementation in libpq itself
would make initial adoption *much* easier, and then if specific
applications wanted to have richer control over an authorization flow,
then they could implement that themselves with the aforementioned
callback.

The Device Authorization flow was the most minimal working
implementation I could find, since it doesn't require a web browser on
the system, just the ability to print a prompt to the console. But if
anyone knows of a better flow for this use case, I'm all ears.

> > If you've read this far, thank you for your interest, and I hope you
> > enjoy playing with it!
> 
> A few small things caught my eye in the backend oauth_exchange function:
> 
> > +   /* Handle the client's initial message. */
> > +   p = strdup(input);
> 
> this strdup() should be pstrdup().

Thanks, I'll fix that in the next re-roll.

> In the same function, there are a bunch of reports like this:
> 
> >ereport(ERROR,
> > +  (errcode(ERRCODE_PROTOCOL_VIOLATION),
> > +   errmsg("malformed OAUTHBEARER message"),
> > +   errdetail("Comma expected, but found character 
> > \"%s\".",
> > + sanitize_char(*p;
> 
> I don't think the double quotes are needed here, because sanitize_char 
> will return quotes if it's a single character. So it would end up 
> looking like this: ... found character "'x'".

I'll fix this too. Thanks!

--Jacob


[PATCH] Make jsonapi usable from libpq

2021-06-22 Thread Jacob Champion
(This is split off from my work on OAUTHBEARER [1].)

The jsonapi in src/common can't currently be compiled into libpq. The
first patch here removes the dependency on pg_log_fatal(), which is not
available to the sharedlib. The second patch makes sure that all of the
return values from json_errdetail() can be pfree'd, to avoid long-
running leaks.

In the original thread, Michael Paquier commented:

> +#  define check_stack_depth()
> +#  ifdef JSONAPI_NO_LOG
> +#define json_log_and_abort(...) \
> +   do { fprintf(stderr, __VA_ARGS__); exit(1); } while(0)
> +#  else
> In patch 0002, this is the wrong approach.  libpq will not be able to
> feed on such reports, and you cannot use any of the APIs from the
> palloc() family either as these just fail on OOM.  libpq should be
> able to know about the error, and would fill in the error back to the
> application.  This abstraction is not necessary on HEAD as
> pg_verifybackup is fine with this level of reporting.  My rough guess
> is that we will need to split the existing jsonapi.c into two files,
> one that can be used in shared libraries and a second that handles the 
> errors.

Hmm. I'm honestly hesitant to start splitting files apart, mostly
because json_log_and_abort() is only called from two places, and both
those places are triggered by programmer error as opposed to user
error.

Would it make more sense to switch to an fprintf-and-abort case, to
match the approach taken by PGTHREAD_ERROR and the out-of-memory
conditions in fe-print.c? Or is there already precedent for handling
can't-happen code paths with in-band errors, through the PGconn?

--Jacob

[1] 
https://www.postgresql.org/message-id/d1b467a78e0e36ed85a09adf979d04cf124a9d4b.ca...@vmware.com
From 0541598e4f0bad1b9ff41a4640ec69491b393d54 Mon Sep 17 00:00:00 2001
From: Jacob Champion 
Date: Mon, 3 May 2021 11:15:15 -0700
Subject: [PATCH 2/7] src/common: remove logging from jsonapi for shlib

The can't-happen code in jsonapi was pulling in logging code, which for
libpq is not included.
---
 src/common/Makefile  |  4 
 src/common/jsonapi.c | 11 ---
 2 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/src/common/Makefile b/src/common/Makefile
index 38a8599337..6f1039bc78 100644
--- a/src/common/Makefile
+++ b/src/common/Makefile
@@ -28,6 +28,10 @@ subdir = src/common
 top_builddir = ../..
 include $(top_builddir)/src/Makefile.global
 
+# For use in shared libraries, jsonapi needs to not link in any logging
+# functions.
+override CFLAGS_SL += -DJSONAPI_NO_LOG
+
 # don't include subdirectory-path-dependent -I and -L switches
 STD_CPPFLAGS := $(filter-out -I$(top_srcdir)/src/include -I$(top_builddir)/src/include,$(CPPFLAGS))
 STD_LDFLAGS := $(filter-out -L$(top_builddir)/src/common -L$(top_builddir)/src/port,$(LDFLAGS))
diff --git a/src/common/jsonapi.c b/src/common/jsonapi.c
index 1bf38d7b42..6b6001b118 100644
--- a/src/common/jsonapi.c
+++ b/src/common/jsonapi.c
@@ -27,11 +27,16 @@
 #endif
 
 #ifdef FRONTEND
-#define check_stack_depth()
-#define json_log_and_abort(...) \
+#  define check_stack_depth()
+#  ifdef JSONAPI_NO_LOG
+#define json_log_and_abort(...) \
+	do { fprintf(stderr, __VA_ARGS__); exit(1); } while(0)
+#  else
+#define json_log_and_abort(...) \
 	do { pg_log_fatal(__VA_ARGS__); exit(1); } while(0)
+#  endif
 #else
-#define json_log_and_abort(...) elog(ERROR, __VA_ARGS__)
+#  define json_log_and_abort(...) elog(ERROR, __VA_ARGS__)
 #endif
 
 /*
-- 
2.25.1

From 5ad4b3c7835fe9e0f284702ec7b827c27770854e Mon Sep 17 00:00:00 2001
From: Jacob Champion 
Date: Mon, 3 May 2021 15:38:26 -0700
Subject: [PATCH 3/7] common/jsonapi: always palloc the error strings

...so that client code can pfree() to avoid memory leaks in long-running
operations.
---
 src/common/jsonapi.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/src/common/jsonapi.c b/src/common/jsonapi.c
index 6b6001b118..f7304f584f 100644
--- a/src/common/jsonapi.c
+++ b/src/common/jsonapi.c
@@ -1089,7 +1089,7 @@ json_errdetail(JsonParseErrorType error, JsonLexContext *lex)
 			return psprintf(_("Expected JSON value, but found \"%s\"."),
 			extract_token(lex));
 		case JSON_EXPECTED_MORE:
-			return _("The input string ended unexpectedly.");
+			return pstrdup(_("The input string ended unexpectedly."));
 		case JSON_EXPECTED_OBJECT_FIRST:
 			return psprintf(_("Expected string or \"}\", but found \"%s\"."),
 			extract_token(lex));
@@ -1103,16 +1103,16 @@ json_errdetail(JsonParseErrorType error, JsonLexContext *lex)
 			return psprintf(_("Token \"%s\" is invalid."),
 			extract_token(lex));
 		case JSON_UNICODE_CODE_POINT_ZERO:
-			return _("\\u cannot be converted to text.");
+			return pstrdup(_("\\u cannot be converted to text."));
 		case JSON_UNICODE_ESCAPE_FORMAT:
-			return _("\"\\u\" must be followed by four hexadecimal digits.");
+			return pstrdup(_("\"\\u\" must be followed by four hexadecimal digits."));
 		case JSON_UNICODE_HIGH_ESCAPE:
 	

Re: Pipeline mode and PQpipelineSync()

2021-06-22 Thread Alvaro Herrera
On 2021-Jun-21, Boris Kolpackov wrote:

> Alvaro Herrera  writes:
> 
> > I think I should rephrase this to say that PQpipelineSync() is needed
> > where the user needs the server to start executing commands; and
> > separately indicate that it is possible (but not promised) that the
> > server would start executing commands ahead of time because $reasons.
> 
> I think always requiring PQpipelineSync() is fine since it also serves
> as an error recovery boundary. But the fact that the server waits until
> the sync message to start executing the pipeline is surprising. To me
> this seems to go contrary to the idea of a "pipeline".

But does that actually happen?  There's a very easy test we can do by
sending queries that sleep.  If my libpq program sends a "SELECT
pg_sleep(2)", then PQflush(), then sleep in the client program two more
seconds without sending the sync; and *then* send the sync, I find that
the program takes 2 seconds, not four.  This shows that both client and
server slept in parallel, even though I didn't send the Sync until after
the client was done sleeping.

In order to see this, I patched libpq_pipeline.c with the attached, and
ran it under time:

time ./libpq_pipeline  simple_pipeline -t simple.trace
simple pipeline... sent and flushed the sleep. Sleeping 2s here:
client sleep done
ok

real    0m2,008s
user    0m0,000s
sys     0m0,003s
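
For reference, the core of that experiment looks roughly like this (a
simplified sketch, not the exact patch I attached):

#include <unistd.h>
#include <libpq-fe.h>

static void
overlapping_sleeps(PGconn *conn)
{
	PGresult   *res;

	PQenterPipelineMode(conn);
	PQsendQueryParams(conn, "SELECT pg_sleep(2)",
					  0, NULL, NULL, NULL, NULL, 0);
	PQflush(conn);				/* the query goes out now, before any Sync */

	sleep(2);					/* client-side sleep overlaps the server's */

	PQpipelineSync(conn);		/* only now send the Sync message */

	res = PQgetResult(conn);	/* pg_sleep's result, already available */
	PQclear(res);
	res = PQgetResult(conn);	/* NULL terminator for that query */
	(void) res;
	res = PQgetResult(conn);	/* PGRES_PIPELINE_SYNC */
	PQclear(res);

	PQexitPipelineMode(conn);
}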


So I see things happening as you describe in (1):

> In fact, I see the following ways the server could behave:
> 
> 1. The server starts executing queries and sending their results before
>receiving the sync message.

I am completely at a loss on how to explain a server that behaves in any
other way, given how the protocol is designed.  There is no buffering on
the server side.

> While it can be tempting to say that this is an implementation detail,
> this affects the way one writes a client. For example, I currently have
> the following comment in my code:
> 
>   // Send queries until we get blocked. This feels like a better
>   // overall strategy to keep the server busy compared to sending one
>   // query at a time and then re-checking if there is anything to read
>   // because the results of INSERT/UPDATE/DELETE are presumably small
>   // and quite a few of them can get buffered before the server gets
>   // blocked.
> 
> This would be a good strategy for behavior (1) but not (3) (where it
> would make more sense to queue the queries on the client side).

Agreed, that's the kind of strategy I would have thought was the most
reasonable, given my understanding of how the protocol works.

I wonder if your program is being affected by something else.  Maybe the
socket is nonblocking (though I don't quite understand how that would
affect the client behavior in just this way), or your program is
buffering elsewhere.  I don't do C++ much so I can't help you with that.

> So I think it would be useful to clarify the server behavior and
> specify it in the documentation.

I'll see about improving the docs on these points.

> > Do I have it right that other than this documentation problem, you've
> > been able to use pipeline mode successfully?
> 
> So far I've only tried it in a simple prototype (single INSERT statement).
> But I am busy plugging it into ODB's bulk operation support (that we
> already have for Oracle and MSSQL) and once that's done I should be
> able to exercise things in more meaningful ways.

Fair enough.

-- 
Álvaro Herrera39°49'30"S 73°17'W




[PATCH] Pull general SASL framework out of SCRAM

2021-06-22 Thread Jacob Champion
(This is split off from my work on OAUTHBEARER [1].)

Currently, the SASL logic is tightly coupled to the SCRAM
implementation. This patch untangles the two, by introducing callback
structs for both the frontend and backend.

In the original thread, Michael Paquier commented:

> +   /* TODO: SASL_EXCHANGE_FAILURE with output is forbidden in SASL */
> if (result == SASL_EXCHANGE_SUCCESS)
> sendAuthRequest(port,
> AUTH_REQ_SASL_FIN,
> output,
> outputlen);
> Perhaps that's an issue we need to worry on its own?  I didn't recall
> this part..

Yeah, it was non-obvious to me on the first read too. It's a
consequence of a couple parts of the SASL spec [2]:

>The protocol may include an optional additional data field in this
>outcome message.  This field can only include additional data when
>the outcome is successful.

and

>SASL mechanism specifications MUST supply the following information:
> 
>[...]
> 
>   b) An indication of whether the server is expected to provide
>  additional data when indicating a successful outcome.  If so,
>  if the server sends the additional data as a challenge, the
>  specification MUST indicate that the response to this challenge
>  is an empty response.

There isn't a corresponding provision for data for a *failed* outcome,
so any such data would have to be sent as a separate, mechanism-
specific, challenge. This is what OAUTHBEARER has to do, with an
annoying "failure handshake".

(Note that our protocol implementation provides an "additional data"
field for the initial client response, but *not* for the authentication
outcome. That seems odd to me, but it is what it is, I suppose.)

Regarding that specific TODO -- I think it'd be good for the framework
to fail hard if a mechanism tries to send data during a failure
outcome, as it probably means the mechanism isn't implemented to spec.
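
Concretely, I'm imagining the framework doing something like this (a
sketch only, not code from the attached patches):

	if (result == SASL_EXCHANGE_FAILURE && output != NULL && outputlen > 0)
		ereport(ERROR,
				(errcode(ERRCODE_PROTOCOL_VIOLATION),
				 errmsg("SASL mechanism attempted to send data with a failure outcome")));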

--Jacob

[1] 
https://www.postgresql.org/message-id/d1b467a78e0e36ed85a09adf979d04cf124a9d4b.ca...@vmware.com
[2] https://datatracker.ietf.org/doc/html/rfc4422
From a6a65b66cc3dc5da7219378dbadb090ff10fd42b Mon Sep 17 00:00:00 2001
From: Jacob Champion 
Date: Tue, 13 Apr 2021 10:25:48 -0700
Subject: [PATCH 1/7] auth: generalize SASL mechanisms

Split the SASL logic out from the SCRAM implementation, so that it can
be reused by other mechanisms.  New implementations will implement both
a pg_sasl_mech and a pg_be_sasl_mech.
---
 src/backend/libpq/auth-scram.c   | 34 ++-
 src/backend/libpq/auth.c | 34 ---
 src/include/libpq/sasl.h | 34 +++
 src/include/libpq/scram.h| 13 +++--
 src/interfaces/libpq/fe-auth-scram.c | 40 +++-
 src/interfaces/libpq/fe-auth.c   | 16 ---
 src/interfaces/libpq/fe-auth.h   | 11 ++--
 src/interfaces/libpq/fe-connect.c|  6 +
 src/interfaces/libpq/libpq-int.h | 14 ++
 9 files changed, 139 insertions(+), 63 deletions(-)
 create mode 100644 src/include/libpq/sasl.h

diff --git a/src/backend/libpq/auth-scram.c b/src/backend/libpq/auth-scram.c
index f9e1026a12..db3ca75a60 100644
--- a/src/backend/libpq/auth-scram.c
+++ b/src/backend/libpq/auth-scram.c
@@ -101,11 +101,25 @@
 #include "common/sha2.h"
 #include "libpq/auth.h"
 #include "libpq/crypt.h"
+#include "libpq/sasl.h"
 #include "libpq/scram.h"
 #include "miscadmin.h"
 #include "utils/builtins.h"
 #include "utils/timestamp.h"
 
+static void  scram_get_mechanisms(Port *port, StringInfo buf);
+static void *scram_init(Port *port, const char *selected_mech,
+		const char *shadow_pass);
+static int   scram_exchange(void *opaq, const char *input, int inputlen,
+			char **output, int *outputlen, char **logdetail);
+
+/* Mechanism declaration */
+const pg_be_sasl_mech pg_be_scram_mech = {
+	scram_get_mechanisms,
+	scram_init,
+	scram_exchange,
+};
+
 /*
  * Status data for a SCRAM authentication exchange.  This should be kept
  * internal to this file.
@@ -170,16 +184,14 @@ static char *sanitize_str(const char *s);
 static char *scram_mock_salt(const char *username);
 
 /*
- * pg_be_scram_get_mechanisms
- *
  * Get a list of SASL mechanisms that this module supports.
  *
  * For the convenience of building the FE/BE packet that lists the
  * mechanisms, the names are appended to the given StringInfo buffer,
  * separated by '\0' bytes.
  */
-void
-pg_be_scram_get_mechanisms(Port *port, StringInfo buf)
+static void
+scram_get_mechanisms(Port *port, StringInfo buf)
 {
 	/*
 	 * Advertise the mechanisms in decreasing order of importance.  So the
@@ -199,8 +211,6 @@ pg_be_scram_get_mechanisms(Port *port, StringInfo buf)
 }
 
 /*
- * pg_be_scram_init
- *
  * Initialize a new SCRAM authentication exchange status tracker.  This
  * needs to be called before doing any exchange.  It will be filled 

Re: disfavoring unparameterized nested loops

2021-06-22 Thread Peter Geoghegan
On Tue, Jun 22, 2021 at 2:53 AM Tomas Vondra
 wrote:
> Yeah, I like the insurance analogy - it gets to the crux of the problem,
> because insurance is pretty much exactly about managing risk.

The user's exposure to harm is what truly matters. I admit that that's
very hard to quantify, but we should at least try to do so.

We sometimes think about a plan that is 10x slower as if it's
infinitely slow, or might as well be. But it's usually not like that
-- it is generally meaningfully much better than the plan being 100x
slower, which is itself sometimes appreciably better than 1000x
slower. And besides, users often don't get anything like the optimal
plan, even on what they would consider to be a good day (which is most
days). So maybe 10x slower is actually the baseline good case already,
without anybody knowing it. Most individual queries are not executed
very often, even on the busiest databases. The extremes really do
matter a lot.

If a web app or OLTP query is ~10x slower than optimal then it might
be the practical equivalent of an outage that affects the query alone
(i.e. "infinitely slow") -- but probably not. I think that it is worth
paying more than nothing to avoid plans that are so far from optimal
that they might as well take forever to execute. This is not
meaningfully different from a database outage affecting one particular
query. It kind of is in a category of its own that surpasses "slow
plan", albeit one that is very difficult or impossible to define
formally.

There may be a huge amount of variation in risk tolerance among
basically reasonable people. For example, if somebody chooses to
engage in some kind of extreme sport, to me it seems understandable.
It's just not my cup of tea. Whereas if somebody chooses to never wear
a seatbelt while driving, then to me they're simply behaving
foolishly. They're not willing to incur the tiniest inconvenience in
order to get a huge reduction in potential harm -- including a very
real risk of approximately the worst thing that can happen to you.
Sure, refusing to wear a seatbelt can theoretically be classified as
just another point on the risk tolerance spectrum, but that seems
utterly contrived to me. Some things (maybe not that many) really are
like that, or can at least be assumed to work that way as a practical
matter.

> But making
> everything slower will be a hard sell, because the vast majority of
> workloads already running on Postgres don't have this issue at all, so
> for them it's not worth the expense.

I think that we're accepting too much risk here. But I bet it's also
true that we're not taking enough risk in other areas. That was really
my point with the insurance analogy -- we can afford to take lots of
individual risks as long as they don't increase our exposure to truly
disastrous outcomes -- by which I mean queries that might as well take
forever to execute as far as the user is concerned. (Easier said than
done, of course.)

A simple trade-off between fast and robust doesn't seem like a
universally helpful thing. Sometimes it's a very unhelpful way of
looking at the situation. If you make something more robust to extreme
bad outcomes, then you may have simultaneously made it *faster* (not
slower) for all practical purposes. This can happen when the increase
in robustness allows the user to tune the system aggressively, and
only take on new risks that they can truly live with (which wouldn't
have been possible without the increase in robustness). For example, I
imagine that the failsafe mechanism added to VACUUM will actually make
it possible to tune VACUUM much more aggressively -- it might actually
end up significantly improving performance for all practical purposes,
even though technically it has nothing to do with performance.

Having your indexes a little more bloated because the failsafe
kicked-in is a survivable event -- the DBA lives to fight another day,
and *learns* to tune vacuum/the app so it doesn't happen again and
again. An anti-wraparound failure is perhaps not a survivable event --
the DBA gets fired. This really does seem like a fundamental
difference to me.

> Following the insurance analogy,
> selling tornado insurance in Europe is mostly pointless.

Principled skepticism of this kind of thing is of course necessary and
welcome. It *could* be taken too far.

> And the lack of data also plays role - the insurance company will ask
> for higher rates when it does not have enough accurate data about the
> phenomenon, or when there's a lot of unknowns. Maybe this would allow
> some basic measure of uncertainty, based on the number and type of
> restrictions, joins, etc.

I don't think that you can really model uncertainty. But you can have
true certainty (or close to it) about a trade-off that makes the
system fundamentally more robust over time. You can largely be certain
about both the cost of the insurance, as well as how it ameliorates
the problem in at least some cases.

> So maybe some fairly rough measure of 

Fwd: Emit namespace in post-copy output

2021-06-22 Thread Mike
When running a VACUUM or CLUSTER command, the namespace name is not part of
the emitted message.

Using the `vacuumdb` CLI tool recently with multiple jobs, I found it harder
to match the relations with their namespaces when reading the output
messages.

Example:

INFO:  vacuuming "sendgrid.open"
INFO:  vacuuming "mailgun.open"
...
INFO:  "open": found 0 removable, 31460776 nonremovable row versions in
1358656 pages
DETAIL:  0 dead row versions cannot be removed yet.
CPU 31.35s/261.26u sec elapsed 1620.68 sec.
...

In this example, the user can't readily tell which `open` relation was
completed.

Attached is a patch using existing functions to include the namespace in
the output string.

Looking forward to feedback!
-Mike Fiedler
From c66506ad7440a1b501cca858b5120cc568f7a00b Mon Sep 17 00:00:00 2001
From: Mike Fiedler 
Date: Tue, 22 Jun 2021 12:05:48 -0400
Subject: [PATCH] Emit namespace in post-copy output

During a VACUUM or CLUSTER command, the initial output emits the
relation namespace along with the relation name.

Add the namespace to the post-action to match the initial `errmsg`
using the same functions to build the string.

Signed-off-by: Mike Fiedler 
---
 src/backend/access/heap/vacuumlazy.c | 3 ++-
 src/backend/commands/cluster.c   | 3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 44f198398c..bd30461dec 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1673,7 +1673,8 @@ lazy_scan_heap(LVRelState *vacrel, VacuumParams *params, bool aggressive)
 	appendStringInfo(&buf, _("%s."), pg_rusage_show(&ru0));
 
 	ereport(elevel,
-			(errmsg("\"%s\": found %lld removable, %lld nonremovable row versions in %u out of %u pages",
+			(errmsg("\"%s.%s\": found %lld removable, %lld nonremovable row versions in %u out of %u pages",
+	vacrel->relnamespace,
 	vacrel->relname,
 	(long long) vacrel->tuples_deleted,
 	(long long) vacrel->num_tuples, vacrel->scanned_pages,
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 6487a9e3fc..e4c577cfff 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -921,7 +921,8 @@ copy_table_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex, bool verbose,
 
 	/* Log what we did */
 	ereport(elevel,
-			(errmsg("\"%s\": found %.0f removable, %.0f nonremovable row versions in %u pages",
+			(errmsg("\"%s.%s\": found %.0f removable, %.0f nonremovable row versions in %u pages",
+	get_namespace_name(RelationGetNamespace(OldHeap)),
 	RelationGetRelationName(OldHeap),
 	tups_vacuumed, num_tuples,
 	RelationGetNumberOfBlocks(OldHeap)),
-- 
2.30.1 (Apple Git-130)



Re: Reducing the cycle time for CLOBBER_CACHE_ALWAYS buildfarm members

2021-06-22 Thread Tom Lane
Andrew Dunstan  writes:
> On 6/20/21 6:10 PM, Tom Lane wrote:
>> (3) Since this only works in v14 and up, older branches would
>> have to fall back to -DCLOBBER_CACHE_ALWAYS.  Perhaps we could
>> improve the buildfarm client script so that buildfarm owners
>> just configure "clobber_cache_testing => 1" and then the script
>> would do the right branch-dependent thing.

> Maybe. Let's see what you come up with.

Here's a couple of draft-quality patches --- one for initdb, one
for the buildfarm --- to implement this idea.  These are just
lightly tested; in particular I've not had the patience to run
full BF cycles to see how much is actually saved.

regards, tom lane

diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml
index afd344b4c0..3077530c7b 100644
--- a/doc/src/sgml/ref/initdb.sgml
+++ b/doc/src/sgml/ref/initdb.sgml
@@ -388,6 +388,17 @@ PostgreSQL documentation
 Other, less commonly used, options are also available:
 
 
+ 
+  --clobber-cache
+  
+   
+Run the bootstrap backend with the
+debug_invalidate_system_caches_always=1 option.
+This takes a very long time and is only of use for deep debugging.
+   
+  
+ 
+
  
   -d
   --debug
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 152d21e88b..0945d70061 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -202,6 +202,9 @@ static bool authwarning = false;
 static const char *boot_options = "-F";
 static const char *backend_options = "--single -F -O -j -c search_path=pg_catalog -c exit_on_error=true";
 
+/* Additional switches to pass to backend (either boot or standalone) */
+static char *extra_options = "";
+
 static const char *const subdirs[] = {
 	"global",
 	"pg_wal/archive_status",
@@ -962,12 +965,12 @@ test_config_settings(void)
 		test_buffs = MIN_BUFS_FOR_CONNS(test_conns);
 
 		snprintf(cmd, sizeof(cmd),
- "\"%s\" --boot -x0 %s "
+ "\"%s\" --boot -x0 %s %s "
  "-c max_connections=%d "
  "-c shared_buffers=%d "
  "-c dynamic_shared_memory_type=%s "
  "< \"%s\" > \"%s\" 2>&1",
- backend_exec, boot_options,
+ backend_exec, boot_options, extra_options,
  test_conns, test_buffs,
  dynamic_shared_memory_type,
  DEVNULL, DEVNULL);
@@ -998,12 +1001,12 @@ test_config_settings(void)
 		}
 
 		snprintf(cmd, sizeof(cmd),
- "\"%s\" --boot -x0 %s "
+ "\"%s\" --boot -x0 %s %s "
  "-c max_connections=%d "
  "-c shared_buffers=%d "
  "-c dynamic_shared_memory_type=%s "
  "< \"%s\" > \"%s\" 2>&1",
- backend_exec, boot_options,
+ backend_exec, boot_options, extra_options,
  n_connections, test_buffs,
  dynamic_shared_memory_type,
  DEVNULL, DEVNULL);
@@ -1403,11 +1406,11 @@ bootstrap_template1(void)
 	unsetenv("PGCLIENTENCODING");
 
 	snprintf(cmd, sizeof(cmd),
-			 "\"%s\" --boot -x1 -X %u %s %s %s",
+			 "\"%s\" --boot -x1 -X %u %s %s %s %s",
 			 backend_exec,
 			 wal_segment_size_mb * (1024 * 1024),
 			 data_checksums ? "-k" : "",
-			 boot_options,
+			 boot_options, extra_options,
 			 debug ? "-d 5" : "");
 
 
@@ -2263,6 +2266,7 @@ usage(const char *progname)
 	printf(_("  -X, --waldir=WALDIR   location for the write-ahead log directory\n"));
 	printf(_("  --wal-segsize=SIZEsize of WAL segments, in megabytes\n"));
 	printf(_("\nLess commonly used options:\n"));
+	printf(_("  --clobber-cache   use cache-clobbering debug option\n"));
 	printf(_("  -d, --debug   generate lots of debugging output\n"));
 	printf(_("  -L DIRECTORY  where to find the input files\n"));
 	printf(_("  -n, --no-cleando not clean up after errors\n"));
@@ -2863,8 +2867,8 @@ initialize_data_directory(void)
 	fflush(stdout);
 
 	snprintf(cmd, sizeof(cmd),
-			 "\"%s\" %s template1 >%s",
-			 backend_exec, backend_options,
+			 "\"%s\" %s %s template1 >%s",
+			 backend_exec, backend_options, extra_options,
 			 DEVNULL);
 
 	PG_CMD_OPEN;
@@ -2943,6 +2947,7 @@ main(int argc, char *argv[])
 		{"wal-segsize", required_argument, NULL, 12},
 		{"data-checksums", no_argument, NULL, 'k'},
 		{"allow-group-access", no_argument, NULL, 'g'},
+		{"clobber-cache", no_argument, NULL, 14},
 		{NULL, 0, NULL, 0}
 	};
 
@@ -3084,6 +3089,11 @@ main(int argc, char *argv[])
 			case 'g':
 SetDataDirectoryCreatePerm(PG_DIR_MODE_GROUP);
 break;
+			case 14:
+extra_options = psprintf("%s %s",
+		 extra_options,
+		 "-c debug_invalidate_system_caches_always=1");
+break;
 			default:
 /* getopt_long already emitted a complaint */
 fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
diff --git a/build-farm.conf.sample b/build-farm.conf.sample
index 7e80451..d7e6a62 100644
--- a/build-farm.conf.sample
+++ b/build-farm.conf.sample
@@ -98,6 +98,11 @@ my $confdir = File::Spec->rel2abs(File::Basename::dirname(__FILE__));
 		  --leak-check=no --gen-suppressions=all   

Re: Decouple operator classes from index access methods

2021-06-22 Thread Alexander Korotkov
On Tue, Jun 22, 2021 at 8:06 PM Tom Lane  wrote:
> > I suggest the initial version to come with 2 new access methods in the
> > new type: hashing and ordering.  We can use those in the functions
> > that are currently searching for the hash and btree operator classes.
>
> Again, exactly what does that buy us, other than more complication
> and overhead?
>
> I can see some value perhaps in letting other opclasses refer to
> btree and hash opclasses rather than duplicating their entries.
> But that doesn't seem to be what you're proposing here.

In future we could have, for instance, LSM or in-memory B-tree or
other index AM, which could use existing B-tree or hash opclasses.

But even now, we could use this decoupling to get rid of ugly
btree_gist and btree_gin.  And also solve the extensibility problem
here.  If an extension provides datatype with B-tree opclass, we
currently can't directly use it with GiST and GIN.  So, in order to
provide B-tree-like indexing for GiST and GIN, an extension needs to
explicitly define GiST and GIN B-tree-like opclasses.
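
To make that concrete, here is a minimal sketch of today's workaround (the
table and column names are only illustrative): an exclusion constraint that
mixes plain equality with a range overlap needs btree_gist just to get a
GiST-compatible equality opclass for text:

CREATE EXTENSION btree_gist;

CREATE TABLE room_reservation (
    room   text,
    during tsrange,
    -- "room WITH =" works under GiST only because btree_gist supplies
    -- a B-tree-like GiST opclass for text
    EXCLUDE USING gist (room WITH =, during WITH &&)
);

With decoupled opclasses, GiST could presumably use the existing B-tree
opclass for text directly.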

>From my point of view, we can consider a decoupling patch if it will
come with an ability to use B-tree opclasses directly in GiST and GIN.

--
Regards,
Alexander Korotkov




Re: Remove useless int64 range checks on BIGINT sequence MINVALUE/MAXVALUE values

2021-06-22 Thread Greg Sabino Mullane
Those code comments look good.

Re: Avoid stuck of pgbench due to skipped transactions

2021-06-22 Thread Greg Sabino Mullane
The following review has been posted through the commitfest application:
make installcheck-world:  tested, failed
Implements feature:   tested, failed
Spec compliant:   not tested
Documentation:not tested

Looks fine to me, as a way of catching this edge case.

Re: cleanup temporary files after crash

2021-06-22 Thread Euler Taveira
On Fri, Jun 11, 2021, at 9:43 PM, Justin Pryzby wrote:
> On Sat, Oct 31, 2020 at 09:01:15PM -0300, Euler Taveira wrote:
> > > > The current behavior is only useful for debugging purposes.
> 
> On Wed, 28 Oct 2020 at 15:42, Tomas Vondra  
> wrote:
> > > One thing I'm not sure about is whether we should have the GUC as
> > > proposed, or have a negative "keep_temp_files_after_restart" defaulting
> > > to false. But I don't have a very good justification for the alternative
> > > other than vague personal preference.
> 
> On Sat, Oct 31, 2020 at 09:01:15PM -0300, Euler Taveira wrote:
> > I thought about not providing a GUC at all or provide it in the developer
> > section. I've never heard someone saying that they use those temporary
> > files to investigate an issue. Regarding a crash, all information is already
> > available and temporary files don't provide extra details. This new GUC is 
> > just to keep the
> > previous behavior. I'm fine without the GUC, though.
> 
> Should this GUC be classified as a developer option, and removed from
> postgresql.sample.conf ?
It probably should.

> That was discussed initially in October but not since.
Was it? I seem to have missed this suggestion.

I'm attaching a patch to fix it.


--
Euler Taveira
EDB   https://www.enterprisedb.com/
From fbfd63fcb0d75045b56576cb220e7caea6af9e92 Mon Sep 17 00:00:00 2001
From: Euler Taveira 
Date: Tue, 22 Jun 2021 14:39:56 -0300
Subject: [PATCH v1] Move new GUC remove_temp_files_after_crash to another
 section

The appropriate section for GUC remove_temp_files_after_crash is
probably Developer Options since the main goal of the feature is to
allow inspecting temporary files for debugging purposes. It was an
oversight in commit cd91de0d. Per report from Justin Pryzby.
---
 doc/src/sgml/config.sgml  | 41 +--
 src/backend/utils/misc/guc.c  |  2 +-
 src/backend/utils/misc/postgresql.conf.sample |  2 -
 3 files changed, 20 insertions(+), 25 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index eeba2caa43..f5a753e589 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -9863,28 +9863,6 @@ dynamic_library_path = 'C:\tools\postgresql;H:\my_project\lib;$libdir'
   
  
 
- 
-  remove_temp_files_after_crash (boolean)
-  
-   remove_temp_files_after_crash configuration parameter
-  
-  
-  
-   
-When set to on, which is the default,
-PostgreSQL will automatically remove
-temporary files after a backend crash. If disabled, the files will be
-retained and may be used for debugging, for example. Repeated crashes
-may however result in accumulation of useless files.
-   
-
-   
-This parameter can only be set in the postgresql.conf
-file or on the server command line.
-   
-  
- 
-
  
   data_sync_retry (boolean)
   
@@ -10912,6 +10890,25 @@ LOG:  CleanUpLock: deleting: lock(0xb7acd844) id(24688,24696,0,0,0,1)
   
  
 
+ 
+  remove_temp_files_after_crash (boolean)
+  
+   remove_temp_files_after_crash configuration parameter
+  
+  
+  
+   
+When set to on, which is the default,
+PostgreSQL will automatically remove
+temporary files after a backend crash. If disabled, the files will be
+retained and may be used for debugging, for example. Repeated crashes
+may however result in accumulation of useless files. This parameter 
+can only be set in the postgresql.conf file or on
+the server command line.
+   
+  
+ 
+
 
   
   
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 68b62d523d..d8b8f4a572 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -1404,7 +1404,7 @@ static struct config_bool ConfigureNamesBool[] =
 		NULL, NULL, NULL
 	},
 	{
-		{"remove_temp_files_after_crash", PGC_SIGHUP, ERROR_HANDLING_OPTIONS,
+		{"remove_temp_files_after_crash", PGC_SIGHUP, DEVELOPER_OPTIONS,
 			gettext_noop("Remove temporary files after backend crash."),
 			NULL
 		},
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index ddbb6dc2be..a5a7174b0e 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -768,8 +768,6 @@
 
 #exit_on_error = off			# terminate session on any error?
 #restart_after_crash = on		# reinitialize after backend crash?
-#remove_temp_files_after_crash = on	# remove temporary files after
-	# backend crash?
 #data_sync_retry = off			# retry or panic on failure to fsync
 	# data?
 	# (change requires restart)
-- 
2.20.1



Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors

2021-06-22 Thread Fabien COELHO



Hello Yugo-san,

Thanks a lot for continuing this work started by Marina!

I'm planning to review it for the July CF. I've just added an entry there:

https://commitfest.postgresql.org/33/3194/

--
Fabien.




Re: Decouple operator classes from index access methods

2021-06-22 Thread Emre Hasegeli
> I can see some value perhaps in letting other opclasses refer to
> btree and hash opclasses rather than duplicating their entries.
> But that doesn't seem to be what you're proposing here.

That's what I am proposing.  My idea is to change the current btree
and hash opclasses to be under the new proposed access methods so
btree, hash, and any other index access method can refer to them.




Re: Assertion failure in HEAD and 13 after calling COMMIT in a stored proc

2021-06-22 Thread Tom Lane
I wrote:
> The attached seems to be enough to resolve Jim's example.  I'd like
> to invent a test case that involves a detoast of the simple
> expression's result, too, to show that transiently pushing a
> snapshot for the duration of the expression is not the right fix.

Here we go.  This test case gives "cannot fetch toast data without an
active snapshot" in v11 and v12 branch tips.  Since those branches lack
the 73b06cf89 optimization, they push a snapshot while calling the
SQL-language function, thus it doesn't complain.  But what comes back
is toasted, and then we fail trying to detoast it.

regards, tom lane

diff --git a/src/pl/plpgsql/src/expected/plpgsql_transaction.out b/src/pl/plpgsql/src/expected/plpgsql_transaction.out
index 918cc0913e..35845d1d6b 100644
--- a/src/pl/plpgsql/src/expected/plpgsql_transaction.out
+++ b/src/pl/plpgsql/src/expected/plpgsql_transaction.out
@@ -430,6 +430,24 @@ SELECT * FROM test1;
 ---+---
 (0 rows)
 
+-- detoast result of simple expression after commit
+CREATE TEMP TABLE test4(f1 text);
+ALTER TABLE test4 ALTER COLUMN f1 SET STORAGE EXTERNAL; -- disable compression
+INSERT INTO test4 SELECT repeat('xyzzy', 2000);
+-- immutable mark is a bit of a lie, but it serves to make call a simple expr
+-- that will return a still-toasted value
+CREATE FUNCTION data_source(i int) RETURNS TEXT LANGUAGE sql
+AS 'select f1 from test4' IMMUTABLE;
+DO $$
+declare x text;
+begin
+  for i in 1..3 loop
+x := data_source(i);
+commit;
+  end loop;
+  raise notice 'length(x) = %', length(x);
+end $$;
+NOTICE:  length(x) = 10000
 -- operations on composite types vs. internal transactions
 DO LANGUAGE plpgsql $$
 declare
diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c
index 392456ae85..06bdd04774 100644
--- a/src/pl/plpgsql/src/pl_exec.c
+++ b/src/pl/plpgsql/src/pl_exec.c
@@ -35,6 +35,7 @@
 #include "parser/parse_type.h"
 #include "parser/scansup.h"
 #include "storage/proc.h"
+#include "tcop/pquery.h"
 #include "tcop/tcopprot.h"
 #include "tcop/utility.h"
 #include "utils/array.h"
@@ -6153,6 +6154,15 @@ exec_eval_simple_expr(PLpgSQL_execstate *estate,
 	if (expr->expr_simple_in_use && expr->expr_simple_lxid == curlxid)
 		return false;
 
+	/*
+	 * Ensure that there's a portal-level snapshot, in case this simple
+	 * expression is the first thing evaluated after a COMMIT or ROLLBACK.
+	 * We'd have to do this anyway before executing the expression, so we
+	 * might as well do it now to ensure that any possible replanning doesn't
+	 * need to take a new snapshot.
+	 */
+	EnsurePortalSnapshotExists();
+
 	/*
 	 * Revalidate cached plan, so that we will notice if it became stale. (We
 	 * need to hold a refcount while using the plan, anyway.)  If replanning
diff --git a/src/pl/plpgsql/src/sql/plpgsql_transaction.sql b/src/pl/plpgsql/src/sql/plpgsql_transaction.sql
index cc26788b9a..8e4783c9a5 100644
--- a/src/pl/plpgsql/src/sql/plpgsql_transaction.sql
+++ b/src/pl/plpgsql/src/sql/plpgsql_transaction.sql
@@ -354,6 +354,27 @@ $$;
 SELECT * FROM test1;
 
 
+-- detoast result of simple expression after commit
+CREATE TEMP TABLE test4(f1 text);
+ALTER TABLE test4 ALTER COLUMN f1 SET STORAGE EXTERNAL; -- disable compression
+INSERT INTO test4 SELECT repeat('xyzzy', 2000);
+
+-- immutable mark is a bit of a lie, but it serves to make call a simple expr
+-- that will return a still-toasted value
+CREATE FUNCTION data_source(i int) RETURNS TEXT LANGUAGE sql
+AS 'select f1 from test4' IMMUTABLE;
+
+DO $$
+declare x text;
+begin
+  for i in 1..3 loop
+x := data_source(i);
+commit;
+  end loop;
+  raise notice 'length(x) = %', length(x);
+end $$;
+
+
 -- operations on composite types vs. internal transactions
 DO LANGUAGE plpgsql $$
 declare


Re: Decouple operator classes from index access methods

2021-06-22 Thread Tom Lane
Emre Hasegeli  writes:
> I think we can benefit from higher level operator classes which can
> support multiple index implementations.  This is achievable by
> introducing another type of access method.

I do not really understand what the point of that is?

To the extent that operator classes have any meaning at all apart
from the associated index AMs, ISTM it's that they embody specific
well-known semantics, as btree and hash opclasses do for sorting
and hashing respectively.  I'm not clear on what a multi-AM
opclass notion would do for us.

> I suggest the initial version to come with 2 new access methods in the
> new type: hashing and ordering.  We can use those in the functions
> that are currently searching for the hash and btree operator classes.

Again, exactly what does that buy us, other than more complication
and overhead?

I can see some value perhaps in letting other opclasses refer to
btree and hash opclasses rather than duplicating their entries.
But that doesn't seem to be what you're proposing here.

regards, tom lane




Decouple operator classes from index access methods

2021-06-22 Thread Emre Hasegeli
I think we can benefit from higher level operator classes which can
support multiple index implementations.  This is achievable by
introducing another type of access method.  Here is my idea in SQL:

CREATE ACCESS METHOD ordering
TYPE INTERFACE HANDLER ordering_ifam_handler;

CREATE ACCESS METHOD btree
TYPE INDEX HANDLER bthandler
IMPLEMENTS (ordering);

CREATE OPERATOR CLASS btree_int_ops
FOR TYPE int USING ordering AS
FUNCTION 1 btint4cmp(int, int),
FUNCTION 3 =(int, int);

This can make it easier to develop both new index access methods and
data types.  The extensions can provide them without depending on each
other.

The initial implementation is attached.  I wrote it only to ask for
feedback.  I am happy to work on the missing pieces if the community
supports the idea.

I suggest the initial version to come with 2 new access methods in the
new type: hashing and ordering.  We can use those in the functions
that are currently searching for the hash and btree operator classes.

Later, I want to work on developing another access method for
containment.  It can support high level operator classes with only SQL
callable functions.  GiST, SP-GiST, and BRIN can implement this.  We
can allow the new implementations together with the existing ones.


0001-v01-Implement-INTERFACE-access-method.patch
Description: Binary data


Re: Maintaining a list of pgindent commits for "git blame" to ignore

2021-06-22 Thread Peter Geoghegan
On Tue, Jun 22, 2021 at 9:43 AM Tom Lane  wrote:
> > Pushed. Thanks.
>
> Um.  You probably should have waited for beta2 to be tagged.
> No harm done likely, but it's not per release process.

Sorry about that. I was aware of the policy, but somehow overlooked
that we were in the window between stamping and tagging. I'll be more
careful about this in the future.

-- 
Peter Geoghegan




Re: Maintaining a list of pgindent commits for "git blame" to ignore

2021-06-22 Thread Tom Lane
Peter Geoghegan  writes:
> On Tue, Jun 22, 2021 at 8:18 AM Tom Lane  wrote:
>> OK, no objections then.

> Pushed. Thanks.

Um.  You probably should have waited for beta2 to be tagged.
No harm done likely, but it's not per release process.

regards, tom lane




Re: Maintaining a list of pgindent commits for "git blame" to ignore

2021-06-22 Thread Peter Geoghegan
On Tue, Jun 22, 2021 at 8:18 AM Tom Lane  wrote:
> OK, no objections then.

Pushed. Thanks.

-- 
Peter Geoghegan




Re: Assertion failure in HEAD and 13 after calling COMMIT in a stored proc

2021-06-22 Thread Tom Lane
I wrote:
> Hmm.  I think the real issue here is that commit 84f5c2908 did
> not cover the "simple expression" code path in plpgsql.  We
> need to re-establish an outer snapshot when the next thing
> that happens after COMMIT is a simple expression, too.

The attached seems to be enough to resolve Jim's example.  I'd like
to invent a test case that involves a detoast of the simple
expression's result, too, to show that transiently pushing a
snapshot for the duration of the expression is not the right fix.

regards, tom lane

diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c
index 0ce382e123..96bb77e0b1 100644
--- a/src/pl/plpgsql/src/pl_exec.c
+++ b/src/pl/plpgsql/src/pl_exec.c
@@ -38,6 +38,7 @@
 #include "plpgsql.h"
 #include "storage/proc.h"
 #include "tcop/cmdtag.h"
+#include "tcop/pquery.h"
 #include "tcop/tcopprot.h"
 #include "tcop/utility.h"
 #include "utils/array.h"
@@ -5958,6 +5959,15 @@ exec_eval_simple_expr(PLpgSQL_execstate *estate,
 		expr->expr_simple_lxid == curlxid)
 		return false;
 
+	/*
+	 * Ensure that there's a portal-level snapshot, in case this simple
+	 * expression is the first thing evaluated after a COMMIT or ROLLBACK.
+	 * We'd have to do this anyway before executing the expression, so we
+	 * might as well do it now to ensure that any possible replanning doesn't
+	 * need to take a new snapshot.
+	 */
+	EnsurePortalSnapshotExists();
+
 	/*
 	 * Check to see if the cached plan has been invalidated.  If not, and this
 	 * is the first use in the current transaction, save a plan refcount in


Re: intermittent failures in Cygwin from select_parallel tests

2021-06-22 Thread Andrew Dunstan


On 6/22/21 10:41 AM, Tom Lane wrote:
> Andrew Dunstan  writes:
>> It's worth pointing out that Cygwin psql is built with readline, which
>> makes it rather nicer to use than the Native builds. We still have this
>> in configure.ac:
>>
>> # readline on MinGW has problems with backslashes in psql and other bugs.
>> # This is particularly a problem with non-US code pages.
>> # Therefore disable its use until we understand the cause. 2004-07-20
>>
>> I have no idea if it's still true. My msys2 system (fairywren) has
>> readline installed. Assuming I disable this piece of code and build with
>> readline, how would I test for the errors reported in 2004?
> A bit of archive excavation turned up various contemporaneous reports:
>
> https://www.postgresql.org/message-id/flat/6BCB9D8A16AC4241919521715F4D8BCE34BE7A%40algol.sollentuna.se
> https://www.postgresql.org/message-id/flat/ca3pfg%2427tt%241%40news.hub.org
> https://www.postgresql.org/message-id/flat/OFD583EA49.2F089C03-ONC1256E6A.002706B8-C1256E6A.002784FC%40flender.com
> https://www.postgresql.org/message-id/flat/OF585C42B8.14897FF3-ONC1256EBC.0042A26E-C1256EBC.0042E23F%40flender.com
>
> It's certainly possible that some or all of those problems are gone.
>
>   



Thanks, I will need more info than seems to be there though.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Maintaining a list of pgindent commits for "git blame" to ignore

2021-06-22 Thread Tom Lane
Peter Geoghegan  writes:
> On Tue, Jun 22, 2021 at 8:04 AM Tom Lane  wrote:
>> Hmm, is the "git config blame.ignoreRevsFile" setting actually
>> repo-relative?  I'm a bit confused as to whether an absolute
>> file path would be needed to ensure correct behavior.

> That seems to be standard practice, and certainly works for me.

OK, no objections then.

regards, tom lane




Re: Maintaining a list of pgindent commits for "git blame" to ignore

2021-06-22 Thread Peter Geoghegan
On Tue, Jun 22, 2021 at 8:04 AM Tom Lane  wrote:
> Hmm, is the "git config blame.ignoreRevsFile" setting actually
> repo-relative?  I'm a bit confused as to whether an absolute
> file path would be needed to ensure correct behavior.

That seems to be standard practice, and certainly works for me.

If any of the hashes are not well formed, or even appear in
abbreviated form, "git blame"  breaks in a very obvious and visible
way. So there is zero chance of it breaking without somebody noticing
immediately.

-- 
Peter Geoghegan




Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

2021-06-22 Thread Maxim Orlov
The analysis in the beginning of the discussion seems to be right, but 
the fix v2 looks too invasive for me.


Personally, I'd prefer not to remove the snapshot even if the transaction is
read-only. I propose to consider "xid < TransactionXmin" as a legitimate case
and just promote xid to TransactionXmin.

It's annoying that this old bug is still not fixed. What do you think?
---
Best regards,
Maxim Orlov.
diff --git a/src/backend/access/transam/subtrans.c b/src/backend/access/transam/subtrans.c
index 6a8e521f894..ce7c90aa38f 100644
--- a/src/backend/access/transam/subtrans.c
+++ b/src/backend/access/transam/subtrans.c
@@ -29,6 +29,7 @@
 #include "postgres.h"
 
 #include "access/slru.h"
+#include "access/parallel.h"
 #include "access/subtrans.h"
 #include "access/transam.h"
 #include "pg_trace.h"
@@ -152,6 +153,21 @@ SubTransGetTopmostTransaction(TransactionId xid)
 	TransactionId parentXid = xid,
 previousXid = xid;
 
+	/*
+	 * We may have different xmins in the active and transaction snapshots in
+	 * parallel workers, and TransactionXmin can be promoted to the greater of
+	 * them.  So we do this to avoid the assert below when restoring an active
+	 * snapshot with a smaller xmin value.
+	 *
+	 * XXX: the same may be fixed by prohibiting read-only transactions (like
+	 * parallel scans) to increment TransactionXmin.
+	 */
+	if (IsParallelWorker())
+	{
+		xid = xid > TransactionXmin ? xid : TransactionXmin;
+		parentXid = xid;
+		previousXid = xid;
+	}
 	/* Can't ask about stuff that might not be around anymore */
 	Assert(TransactionIdFollowsOrEquals(xid, TransactionXmin));
 


Re: Maintaining a list of pgindent commits for "git blame" to ignore

2021-06-22 Thread Tom Lane
Peter Geoghegan  writes:
> Attached is a patch file that puts it all together. I would like to
> commit this in the next couple of days.

Hmm, is the "git config blame.ignoreRevsFile" setting actually
repo-relative?  I'm a bit confused as to whether an absolute
file path would be needed to ensure correct behavior.

regards, tom lane




Re: Assertion failure in HEAD and 13 after calling COMMIT in a stored proc

2021-06-22 Thread Tom Lane
Michael Paquier  writes:
> On Mon, Jun 21, 2021 at 04:19:27PM -0700, Jim Nasby wrote:
>> The following generates an assertion failure.

> A bisect run points me to the following commit:
> commit 73b06cf893c9d3bb38c11878a12cc29407e78b6c
> Author: Tom Lane 
> Date:   Fri Nov 22 15:02:18 2019 -0500
> Avoid taking a new snapshot for an immutable simple expression in plpgsql.

Hmm.  I think the real issue here is that commit 84f5c2908 did
not cover the "simple expression" code path in plpgsql.  We
need to re-establish an outer snapshot when the next thing
that happens after COMMIT is a simple expression, too.

In this view, 73b06cf8 just removed code that was masking the
lack of a snapshot during the evaluation of the simple expr
itself.  However, we'd still have had a problem if the simple
expr returned a toast pointer that we had to dereference after
returning (and popping that snapshot).  So I'm thinking
back-patch to v11, as before.

regards, tom lane




Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors

2021-06-22 Thread Yugo NAGATA
Hi hackers,

On Mon, 24 May 2021 11:29:10 +0900
Yugo NAGATA  wrote:

> Hi hackers,
> 
> On Tue, 10 Mar 2020 09:48:23 +1300
> Thomas Munro  wrote:
> 
> > On Tue, Mar 10, 2020 at 8:43 AM Fabien COELHO  wrote:
> > > >> Thank you very much! I'm going to send a new patch set until the end of
> > > >> this week (I'm sorry I was very busy in the release of Postgres Pro
> > > >> 11...).
> > > >
> > > > Is anyone interested in rebasing this, and summarising what needs to
> > > > be done to get it in?  It's arguably a bug or at least quite
> > > > unfortunate that pgbench doesn't work with SERIALIZABLE, and I heard
> > > > that a couple of forks already ship Marina's patch set.
> 
> I got interested in this and now looking into the patch and the past 
> discussion. 
> If anyone other won't do it and there are no objection, I would like to rebase
> this. Is that okay?

I rebased and fixed the previous patches (v11) written by Marina Polyakova,
and attached the revised version (v12).

v12-0001-Pgbench-errors-use-the-Variables-structure-for-c.patch
- a patch for the Variables structure (this is used to reset client 
variables during the repeating of transactions after 
serialization/deadlock failures).

v12-0002-Pgbench-errors-and-serialization-deadlock-retrie.patch
- the main patch for handling client errors and repetition of 
transactions with serialization/deadlock failures (see the detailed 
description in the file).

These are the revised versions of v11-0002 and v11-0003. v11-0001
(for the RandomState structure) is not included because it has already
been committed (40923191944). v11-0004 (for a separate error reporting
function) is not included either, because pgbench now uses the common
logging APIs (30a3e772b40).

In addition to rebasing on master, I updated the patch according to the
review from Fabien COELHO [1] and the discussions after it. I also added
some other fixes while reviewing the previous patch.

[1] 
https://www.postgresql.org/message-id/alpine.DEB.2.21.1809081450100.10506%40lancre

The following are fixes according to Fabien's review.

> * Features

> As far as the actual retry feature is concerned, I'd say we are nearly 
> there. However I have an issue with changing the behavior on meta command 
> and other sql errors, which I find not desirable.
...
> I do not think that these changes of behavior are desirable. Meta command and
> miscellaneous SQL errors should result in immediatly aborting the whole run,
> because the client test code itself could not run correctly or the SQL sent
> was somehow wrong, which is also the client's fault, and the server 
> performance bench does not make much sense in such conditions.
> 
> ISTM that the focus of this patch should only be to handle some server 
> runtime errors that can be retryed, but not to change pgbench behavior on 
> other kind of errors. If these are to be changed, ISTM that it would be a 
> distinct patch and would require some discussion, and possibly an option 
> to enable it or not if some use case emerge. AFA this patch is concerned, 
> I'd suggest to let that out.

Previously, all SQL and meta command errors could be retried, but I changed
this so that only serialization & deadlock errors are retried.

> Doc says "you cannot use an infinite number of retries without 
> latency-limit..."
> 
> Why should this be forbidden? At least if -T timeout takes precedent and
> shortens the execution, ISTM that there could be good reason to test that.
> Maybe it could be blocked only under -t if this would lead to an non-ending
> run.

I changed this to allow using --max-tries with the -T option even if
latency-limit is not used.

> As "--print-errors" is really for debug, maybe it could be named
> "--debug-errors". I'm not sure that having "--debug" implying this option
> is useful: As there are two distinct options, the user may be allowed
> to trigger one or the other as they wish?

print-errors was renamed to debug-errors.

> makeVariableValue error message is not for debug, but must be kept in all
> cases, and the false returned must result in an immediate abort. Same thing 
> about
> lookupCreateVariable, an invalid name is a user error which warrants an 
> immediate
> abort. Same thing again about coerce* functions or evalStandardFunc...
> Basically, most/all added "debug_level >= DEBUG_ERRORS" are not desirable.

"DEBUG_ERRORS" messages unrelated to serialization & deadlock errors were 
removed.

> sendRollback(): I'd suggest to simplify. The prepare/extended statement stuff 
> is
> really about the transaction script, not dealing with errors, esp as there is 
> no
> significant advantage in preparing a "ROLLBACK" statement which is short and 
> has
> no parameters. I'd suggest to remove this function and just issue
> PQsendQuery("ROLLBACK;") in all cases.

Now, we just issue PQsendQuery("ROLLBACK;").

> In copyVariables, I'd simplify
>
>  + if (source_var->svalue == NULL)
>  +   dest_var->svalue = NULL;
>  + else
>  +   dest_var->svalue = 

Re: intermittent failures in Cygwin from select_parallel tests

2021-06-22 Thread Tom Lane
Andrew Dunstan  writes:
> It's worth pointing out that Cygwin psql is built with readline, which
> makes it rather nicer to use than the Native builds. We still have this
> in configure.ac:
>
> # readline on MinGW has problems with backslashes in psql and other bugs.
> # This is particularly a problem with non-US code pages.
> # Therefore disable its use until we understand the cause. 2004-07-20
>
> I have no idea if it's still true. My msys2 system (fairywren) has
> readline installed. Assuming I disable this piece of code and build with
> readline, how would I test for the errors reported in 2004?

A bit of archive excavation turned up various contemporaneous reports:

https://www.postgresql.org/message-id/flat/6BCB9D8A16AC4241919521715F4D8BCE34BE7A%40algol.sollentuna.se
https://www.postgresql.org/message-id/flat/ca3pfg%2427tt%241%40news.hub.org
https://www.postgresql.org/message-id/flat/OFD583EA49.2F089C03-ONC1256E6A.002706B8-C1256E6A.002784FC%40flender.com
https://www.postgresql.org/message-id/flat/OF585C42B8.14897FF3-ONC1256EBC.0042A26E-C1256EBC.0042E23F%40flender.com

It's certainly possible that some or all of those problems are gone.

regards, tom lane




Re: Cosmic ray hits integerset

2021-06-22 Thread Alvaro Herrera
On 2021-Jun-22, Thomas Munro wrote:

> Hi,
> 
> Here's a curious one-off failure in test_integerset:
> 
> +ERROR:  iterate returned wrong value; got 519985430528, expected 485625692160

Cosmic rays indeed.  The base-2 representation of the expected value is
11100010001000110001100
and that of the actual value is
0010001000110001100

There's a single bit of difference.

-- 
Álvaro Herrera   Valdivia, Chile
"No hay hombre que no aspire a la plenitud, es decir,
la suma de experiencias de que un hombre es capaz"




Re: Case expression pushdown

2021-06-22 Thread Alexander Pyhalov

Seino Yuki wrote on 2021-06-22 16:03:

On 2021-06-16 01:29, Alexander Pyhalov wrote:

Hi.

Ashutosh Bapat wrote on 2021-06-15 16:24:
Looks quite useful to me. Can you please add this to the next 
commitfest?




Addded to commitfest. Here is an updated patch version.


Thanks for posting the patch.
I agree with this content.


+ Foreign Scan on public.ft2  (cost=156.58..165.45 rows=394 width=14)

It's not a big issue, but is there any intention behind the pattern of
outputting costs in regression tests?


Hi.

No, I don't think it makes much sense. Updated tests (also added case 
with empty else).
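
For reference, a rough sketch of the kind of query this is meant to help
(ft2 and its integer columns c1 and c2 are the postgres_fdw regression-test
foreign table; the exact plan output may differ):

EXPLAIN (VERBOSE, COSTS OFF)
SELECT c1, c2
  FROM ft2
 WHERE CASE WHEN c1 > 990 THEN c2 ELSE c2 + 10 END = 10;

With the patch, the whole CASE expression should show up in the pushed-down
Remote SQL instead of being evaluated locally after fetching the rows.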

--
Best regards,
Alexander Pyhalov,
Postgres Professional
From f6b57a3b84d1be0325321d4a0971f7b05b13a80a Mon Sep 17 00:00:00 2001
From: Alexander Pyhalov 
Date: Tue, 30 Mar 2021 13:24:14 +0300
Subject: [PATCH] Allow pushing CASE expression to foreign server

---
 contrib/postgres_fdw/deparse.c| 118 ++
 .../postgres_fdw/expected/postgres_fdw.out|  64 ++
 contrib/postgres_fdw/sql/postgres_fdw.sql |  28 +
 3 files changed, 210 insertions(+)

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 31919fda8c6..3621fed4b54 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -87,6 +87,7 @@ typedef struct foreign_loc_cxt
 {
 	Oid			collation;		/* OID of current collation, if any */
 	FDWCollateState state;		/* state of current collation choice */
+	Expr	   *case_arg;		/* the last case arg to inspect */
 } foreign_loc_cxt;
 
 /*
@@ -101,6 +102,7 @@ typedef struct deparse_expr_cxt
  * a base relation. */
 	StringInfo	buf;			/* output buffer to append to */
 	List	  **params_list;	/* exprs that will become remote Params */
+	List	   *case_args;		/* list of args to deparse CaseTestExpr */
 } deparse_expr_cxt;
 
 #define REL_ALIAS_PREFIX	"r"
@@ -186,6 +188,9 @@ static void appendFunctionName(Oid funcid, deparse_expr_cxt *context);
 static Node *deparseSortGroupClause(Index ref, List *tlist, bool force_colno,
 	deparse_expr_cxt *context);
 
+static void deparseCaseExpr(CaseExpr *node, deparse_expr_cxt *context);
+static void deparseCaseTestExpr(CaseTestExpr *node, deparse_expr_cxt *context);
+
 /*
  * Helper functions
  */
@@ -254,6 +259,7 @@ is_foreign_expr(PlannerInfo *root,
 		glob_cxt.relids = baserel->relids;
 	loc_cxt.collation = InvalidOid;
 	loc_cxt.state = FDW_COLLATE_NONE;
+	loc_cxt.case_arg = NULL;
 	if (!foreign_expr_walker((Node *) expr, &glob_cxt, &loc_cxt))
 		return false;
 
@@ -312,6 +318,7 @@ foreign_expr_walker(Node *node,
 	/* Set up inner_cxt for possible recursion to child nodes */
 	inner_cxt.collation = InvalidOid;
 	inner_cxt.state = FDW_COLLATE_NONE;
+	inner_cxt.case_arg = outer_cxt->case_arg;
 
 	switch (nodeTag(node))
 	{
@@ -509,6 +516,62 @@ foreign_expr_walker(Node *node,
 	state = FDW_COLLATE_UNSAFE;
 			}
 			break;
+		case T_CaseExpr:
+			{
+CaseExpr   *ce = (CaseExpr *) node;
+ListCell   *arg;
+
+if (ce->arg)
+	inner_cxt.case_arg = ce->arg;
+
+foreach(arg, ce->args)
+{
+	CaseWhen   *w = lfirst_node(CaseWhen, arg);
+
+	if (!foreign_expr_walker((Node *) w->expr,
+			 glob_cxt, &inner_cxt))
+		return false;
+
+	if (!foreign_expr_walker((Node *) w->result,
+			 glob_cxt, &inner_cxt))
+		return false;
+}
+
+if (!foreign_expr_walker((Node *) ce->defresult,
+		 glob_cxt, &inner_cxt))
+	return false;
+
+collation = ce->casecollid;
+if (collation == InvalidOid)
+	state = FDW_COLLATE_NONE;
+else if (inner_cxt.state == FDW_COLLATE_SAFE &&
+		 collation == inner_cxt.collation)
+	state = FDW_COLLATE_SAFE;
+else if (collation == DEFAULT_COLLATION_OID)
+	state = FDW_COLLATE_NONE;
+else
+	state = FDW_COLLATE_UNSAFE;
+			}
+			break;
+		case T_CaseTestExpr:
+			{
+Expr	   *arg;
+
+Assert(outer_cxt->case_arg != NULL);
+arg = outer_cxt->case_arg;
+
+if (!foreign_expr_walker((Node *) arg,
+		 glob_cxt, &inner_cxt))
+	return false;
+
+/*
+ * Collation and state just bubble up from the previously
+ * saved case argument
+ */
+collation = inner_cxt.collation;
+state = inner_cxt.state;
+			}
+			break;
 		case T_OpExpr:
 		case T_DistinctExpr:	/* struct-equivalent to OpExpr */
 			{
@@ -1019,6 +1082,7 @@ deparseSelectStmtForRel(StringInfo buf, PlannerInfo *root, RelOptInfo *rel,
 	context.foreignrel = rel;
 	context.scanrel = IS_UPPER_REL(rel) ? fpinfo->outerrel : rel;
 	context.params_list = params_list;
+	context.case_args = NIL;
 
 	/* Construct SELECT clause */
 	deparseSelectSql(tlist, is_subquery, retrieved_attrs, &context);
@@ -1598,6 +1662,7 @@ deparseFromExprForRel(StringInfo buf, PlannerInfo *root, RelOptInfo *foreignrel,
 			context.scanrel = foreignrel;
 			context.root = root;
 			context.params_list = params_list;
+			context.case_args = NIL;
 
 			appendStringInfoChar(buf, '(');
 	appendConditions(fpinfo->joinclauses, &context);
@@ -1901,6 +1966,7 @@ 

Re: Case expression pushdown

2021-06-22 Thread Seino Yuki

On 2021-06-16 01:29, Alexander Pyhalov wrote:

Hi.

Ashutosh Bapat wrote on 2021-06-15 16:24:
Looks quite useful to me. Can you please add this to the next 
commitfest?




Addded to commitfest. Here is an updated patch version.


Thanks for posting the patch.
I agree with this content.


+ Foreign Scan on public.ft2  (cost=156.58..165.45 rows=394 width=14)

It's not a big issue, but is there any intention behind the pattern of
outputting costs in regression tests?

Regards,

--
Yuki Seino
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Assertion failure in HEAD and 13 after calling COMMIT in a stored proc

2021-06-22 Thread Ranier Vilela
 On Mon, Jun 21, 2021 at 04:19:27PM -0700, Jim Nasby wrote:
> The following generates an assertion failure. Quick testing with start and
> stop as well as the core dump shows it’s failing on the execution of
> `schema_name := schema_name(i)` immediately after COMMIT, because there’s
no
> active snapshot. On a build without asserts I get a failure in
> GetActiveSnapshot() (second stack trace). This works fine on 12_STABLE,
but
> fails on 13_STABLE and HEAD.

For me it's a typo.
need_snapshot = (expr->expr_simple_mutable || !estate->readonly_func);

HEAD with no assertion:

 CREATE OR REPLACE FUNCTION public.schema_name(i integer)
postgres-#   RETURNS text
postgres-#   LANGUAGE sql
postgres-#   IMMUTABLE
postgres-# AS $function$
postgres$# SELECT 'test_' || trim(to_char(i, '00'))
postgres$# $function$;
CREATE FUNCTION
postgres=# CREATE OR REPLACE PROCEDURE public.build_schema(start integer,
stop
postgres(# integer, commit_interval integer DEFAULT 10, do_insert boolean
DEFAULT true)
postgres-#   LANGUAGE plpgsql
postgres-# AS $procedure$
postgres$# DECLARE
postgres$#  schema_name text;
postgres$# BEGIN
postgres$# FOR i IN start .. stop LOOP
postgres$#  schema_name := schema_name(i);
postgres$#  IF i % commit_interval = 0 THEN
postgres$#  --RAISE NOTICE 'COMMIT CREATE step %', i;
postgres$#  COMMIT;
postgres$#  END IF;
postgres$# END LOOP;
postgres$# END$procedure$;
CREATE PROCEDURE
postgres=# CALL build_schema(1,11);
CALL
postgres=# CALL build_schema(1,11);
CALL
postgres=# CALL build_schema(1,11);
CALL

The comments in the function are clear: if the expression is mutable OR the
function is not read-only, then a snapshot is needed.

Can you test please?

regards,
Ranier Vilela


fix_segfault_pl_exec.patch
Description: Binary data


Re: [bug?] Missed parallel safety checks, and wrong parallel safety

2021-06-22 Thread Amit Kapila
On Wed, Jun 16, 2021 at 8:57 AM houzj.f...@fujitsu.com
 wrote:
>
> I think the check of partition could be even more complicated if we need to
> check the parallel safety of partition key expression. If user directly 
> insert into
> a partition, then we need invoke ExecPartitionCheck which will execute all 
> it's
> parent's and grandparent's partition key expressions. It means if we change a
> parent table's partition key expression(by 1) change function in expr or 2) 
> attach
> the parent table as partition of another parent table), then we need to 
> invalidate
> all its child's relcache.
>

I think we already invalidate the child entries when we add/drop
constraints on a parent table. See ATAddCheckConstraint,
ATExecDropConstraint. If I am not missing anything then this case
shouldn't be a problem. Do you have something else in mind?
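
For example (a hedged sketch, with made-up object names):

CREATE TABLE parent (a int) PARTITION BY LIST (a);
CREATE TABLE child PARTITION OF parent FOR VALUES IN (1);

-- Both statements go through ATAddCheckConstraint/ATExecDropConstraint,
-- which recurse to the partitions, so child's relcache entry should be
-- invalidated along with parent's.
ALTER TABLE parent ADD CONSTRAINT a_positive CHECK (a > 0);
ALTER TABLE parent DROP CONSTRAINT a_positive;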

-- 
With Regards,
Amit Kapila.




Re: intermittent failures in Cygwin from select_parallel tests

2021-06-22 Thread Andrew Dunstan


On 6/22/21 2:42 AM, Noah Misch wrote:
> I was wondering about suggesting some kind of
>> official warning, but I guess the manual already covers it with this
>> 10 year old notice.  I don't know much about Windows or Cygwin so I'm
>> not sure if it needs updating or not, but I would guess that there are
>> no longer any such systems?
>>
>>   Cygwin is not recommended for running a
>>   production server, and it should only be used for running on
>>   older versions of Windows where
>>   the native build does not work.
> I expect native builds work on all Microsoft-supported Windows versions, so +1
> for removing everything after the comma.
>
>


If we just want to declare Cygwin unsupported I can halt lorikeet and we
can be done with it. Brolga is on its last legs anyway, as we can't
built on NT past release 10.

It's worth pointing out that Cygwin psql is built with readline, which
makes it rather nicer to use than the Native builds. We still have this
in configure.ac:


# readline on MinGW has problems with backslashes in psql and other bugs.
# This is particularly a problem with non-US code pages.
# Therefore disable its use until we understand the cause. 2004-07-20
if test "$PORTNAME" = "win32"; then
  if test "$with_readline" = yes; then
    AC_MSG_WARN([*** Readline does not work on MinGW --- disabling])
    with_readline=no
  fi
fi


I have no idea if it's still true. My msys2 system (fairywren) has
readline installed. Assuming I disable this piece of code and build with
readline, how would I test for the errors reported in 2004?


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: disfavoring unparameterized nested loops

2021-06-22 Thread Robert Haas
On Mon, Jun 21, 2021 at 4:52 PM Tom Lane  wrote:
> I'm willing to take some flak if there's not an easy proof that the outer
> scan is single-row, but I don't think we should just up and change it
> for cases where there is.

I think that's a reasonable request. I'm not sure that I believe it's
100% necessary, but it's certainly an improvement on a technical
level, and given that the proposed change could impact quite a lot of
plans, it's fair to want to see some effort being put into mitigating
the possible downsides. Now, I'm not sure when I might have time to
actually try to do the work, which kind of sucks, but that's how it
goes sometimes.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: disfavoring unparameterized nested loops

2021-06-22 Thread Finnerty, Jim
> But making everything slower will be a hard sell, because vast majority of
> workloads already running on Postgres don't have this issue at all, so
> for them it's not worth the expense. Following the insurance analogy,
> selling tornado insurance in Europe is mostly pointless.
>

Agree. I've been surprised about NOT hearing complaints from PostgreSQL
customers about a particular "bad" plan choice that was common in other
rdbms products where large, complex queries were the norm.  The situation
occurs late in a plan with many joins where a hash join can be used and  
where either side is estimated to fit in memory.  On one side is a base table 
with cardinality that we have statistics for, while the other side has an
estimated cardinality that is the result of many estimates each of which
has error that can compound, and that in some cases amounts to a wild guess.
(e.g. what is the selectivity of SUM(x) < 12 ?).  If the planner's point
estimate of cardinality is such that both sides could fit in memory, then a
bad plan can easily be made.  As Peter said, [ most ] humans have no trouble
dealing with this kind of situation. They take the risk of being wrong into
account.
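
A tiny sketch of the shape I mean (all object names are invented; the point
is only that the cardinality of the aggregated side is little more than a
guess):

EXPLAIN
SELECT *
  FROM orders o
  JOIN (SELECT customer_id
          FROM order_items
         GROUP BY customer_id
        HAVING sum(quantity) < 12) s
    ON s.customer_id = o.customer_id;

The row estimate for the HAVING side compounds several estimation errors, so
the planner may decide that it fits in memory and hash it when it really
does not.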

So in our world, the useful numbers are 0, 1, measured N, and estimated N,
but we don't distinguish between measured N and estimated N.

But that doesn't mean that OLTP customers would be willing to accept
slightly suboptimal plans to mitigate a risk they don't experience.

> Insurance is also about personal preference / risk tolerance. Maybe I'm
> fine with accepting risk that my house burns down, or whatever ...

Right, and that's why the problem mentioned above is still out there
annoying customers who have complex plans.  To them it looks like
an obviously bad plan choice.

Something that might help is to have the planner cost be a structure instead
of a number.  Costs of plans that are deemed "risky" are accumulated
separately from plans that make no risky choices, and for a given set of join
items you keep the minimum cost plan of both types.  It may happen that all
plans eventually make a risky choice, in which case you take the plan with
the minimum cost, but if there exists a plan with no risky choices, then the
minimum cost plan with no risky choices is chosen, with a GUC that enables a
customer to ignore risk when making this choice.  This is not in the spirit
of the hoped-for simple heuristic, and it would be heuristic in its
classification of plans that are risky, but in the NLJ case the cost of an
unparameterized NLJ could be deemed risky if the cardinality of the inner
relation is not 0, 1, or measured N.






RE: [bug?] Missed parallel safety checks, and wrong parallel safety

2021-06-22 Thread houzj.f...@fujitsu.com
On Monday, June 21, 2021 11:23 PM Robert Haas  wrote:
> On Mon, Jun 21, 2021 at 12:56 AM Amit Kapila  wrote:
> > Yeah, the session in which we are doing Alter Function won't be able
> > to lock it but it will wait for the AccessExclusiveLock on the rel to
> > be released because it will also try to acquire it before sending
> > invalidation.
> 
> I think users would not be very happy with such behavior. Users accept that if
> they try to access a relation, they might end up needing to wait for a lock 
> on it,
> but what you are proposing here might make a session block waiting for a lock
> on a relation which it never attempted to access.
> 
> I think this whole line of attack is a complete dead-end. We can invent new
> types of invalidations if we want, but they need to be sent based on which
> objects actually got changed, not based on what we think might be affected
> indirectly as a result of those changes. It's reasonable to regard something 
> like
> a trigger or constraint as a property of the table because it is really a
> dependent object. It is associated with precisely one table when it is created
> and the association can never be changed. On the other hand, functions clearly
> have their own existence. They can be created and dropped independently of
> any table and the tables with which they are associated can change at any 
> time.
> In that kind of situation, invalidating the table based on changes to the 
> function
> is riddled with problems which I am pretty convinced we're never going to be
> able to solve. I'm not 100% sure what we ought to do here, but I'm pretty sure
> that looking up the tables that happen to be associated with the function in 
> the
> session that is modifying the function is not it.

I agree that we should send an invalidation message like "function OID's
parallel safety has changed".  When a session accepts this invalidation
message, it needs to invalidate the related tables.  Based on previous mails,
we only want to invalidate the tables that use this function in an index
expression/trigger/constraint.  The problem is how to get all the related
tables.  Robert-san suggested caching a list of pg_proc OIDs, which means we
need to rebuild the list every time the relcache is invalidated.  The cost of
doing that could be expensive, especially for extracting pg_proc OIDs from
index expressions, because we need to invoke index_open(index, lock) to get
the index expression.

Or maybe we can let each session use pg_depend to get the related tables and
invalidate them after accepting the new type of invalidation message.
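
For example, something along these lines could find the tables whose index
expressions depend on a given function (my_func(int) is just a placeholder;
triggers and constraints would need similar lookups via pg_trigger and
pg_constraint):

SELECT DISTINCT i.indrelid::regclass AS table_to_invalidate
  FROM pg_depend d
  JOIN pg_index i ON d.classid = 'pg_class'::regclass
                 AND d.objid = i.indexrelid
 WHERE d.refclassid = 'pg_proc'::regclass
   AND d.refobjid = 'my_func(int)'::regprocedure;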

Best regards,
houzj


Re: subscription/t/010_truncate.pl failure on desmoxytes in REL_13_STABLE

2021-06-22 Thread Thomas Munro
On Tue, Jun 22, 2021 at 10:01 PM Amit Kapila  wrote:
> On Tue, Jun 22, 2021 at 10:33 AM Thomas Munro  wrote:
> > While scanning for assertion failures on the build farm that don't
> > appear to have been discussed, I found this[1] in
> > 010_truncate_publisher.log on the 13 branch:
> >
> > TRAP: FailedAssertion("tupdesc->tdrefcount <= 0", File:
> > "/home/bf/build/buildfarm-desmoxytes/REL_13_STABLE/pgsql.build/../pgsql/src/backend/access/common/tupdesc.c",
> > Line: 321)
>
> I guess this could be similar to what we see at:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink=2021-06-15%2020%3A49%3A26
>
> We have discussed this in another thread at:
> https://www.postgresql.org/message-id/648020.1623854904%40sss.pgh.pa.us
>
> The reason why I think it is the same is that assertion failure shown
> in the report is from function FreeTupleDesc() which we can call from
> pgoutput.c while processing the invalidation. Ideally, we shouldn't
> call invalidation before initializing the tuple conversion map for
> partitions but in some rare cases, that was happening which we have
> fixed in commit
> (https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=357cb8f07f95665ea533ff534821c22c35b01288).
>
> I see this report is from 16th June 2021 and the commit is on 18th
> June 2021. So, I am hoping this should have been fixed but if we see
> it again then probably we need to investigate it further.

Ahh, that makes sense.  Thanks for checking, and sorry for the noise.




Re: pgbench logging broken by time logic changes

2021-06-22 Thread Fabien COELHO


Bonjour Michaël,


If this were core server code threatening data integrity I would be
inclined to be more strict, but after all pg_bench is a utility program,
and I think we can allow a little more latitude.


+1.  Let's be flexible here.  It looks better to not rush a fix, and
we still have some time ahead.


Attached is an updated v8 patch which adds (reinstates) an improved TAP test
that would have caught the various regressions in the logs.


Given that such tests were removed once before, I'm unsure whether they
will be acceptable, despite the fact that their usefulness has been clearly
demonstrated. At least it is on the record. Sigh :-(


--
Fabien.
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index e61055b6b7..dfb2ff6859 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -343,6 +343,12 @@ typedef struct StatsData
 	SimpleStats lag;
 } StatsData;
 
+/*
+ * For displaying epoch timestamp, as some time functions may have
+ * another reference.
+ */
+pg_time_usec_t epoch_shift;
+
 /*
  * Struct to keep random state.
  */
@@ -649,7 +655,7 @@ static void setDoubleValue(PgBenchValue *pv, double dval);
 static bool evaluateExpr(CState *st, PgBenchExpr *expr,
 		 PgBenchValue *retval);
 static ConnectionStateEnum executeMetaCommand(CState *st, pg_time_usec_t *now);
-static void doLog(TState *thread, CState *st,
+static void doLog(TState *thread, CState *st, bool tx,
   StatsData *agg, bool skipped, double latency, double lag);
 static void processXactStats(TState *thread, CState *st, pg_time_usec_t *now,
 			 bool skipped, StatsData *agg);
@@ -3766,16 +3772,47 @@ executeMetaCommand(CState *st, pg_time_usec_t *now)
 	return CSTATE_END_COMMAND;
 }
 
+/* print aggregated report to logfile */
+static void
+logAgg(FILE *logfile, StatsData *agg)
+{
+	fprintf(logfile, INT64_FORMAT " " INT64_FORMAT " %.0f %.0f %.0f %.0f",
+			(agg->start_time + epoch_shift) / 1000000,
+			agg->cnt,
+			agg->latency.sum,
+			agg->latency.sum2,
+			agg->latency.min,
+			agg->latency.max);
+	if (throttle_delay)
+	{
+		fprintf(logfile, " %.0f %.0f %.0f %.0f",
+agg->lag.sum,
+agg->lag.sum2,
+agg->lag.min,
+agg->lag.max);
+		if (latency_limit)
+			fprintf(logfile, " " INT64_FORMAT, agg->skipped);
+	}
+	fputc('\n', logfile);
+}
+
 /*
  * Print log entry after completing one transaction.
  *
+ * Param tx tells whether the call actually represents a transaction,
+ * or if it is used to flush aggregated logs.
+ *
+ * The function behavior changes depending on sample_rate (a fraction of
+ * transaction is reported) and agg_interval (transactions are aggregated
+ * and reported once every agg_interval seconds).
+ *
  * We print Unix-epoch timestamps in the log, so that entries can be
  * correlated against other logs.  On some platforms this could be obtained
  * from the caller, but rather than get entangled with that, we just eat
  * the cost of an extra syscall in all cases.
  */
 static void
-doLog(TState *thread, CState *st,
+doLog(TState *thread, CState *st, bool tx,
 	  StatsData *agg, bool skipped, double latency, double lag)
 {
 	FILE	   *logfile = thread->logfile;
@@ -3794,43 +3831,36 @@ doLog(TState *thread, CState *st,
 	/* should we aggregate the results or not? */
 	if (agg_interval > 0)
 	{
+		pg_time_usec_t	next;
+
 		/*
 		 * Loop until we reach the interval of the current moment, and print
 		 * any empty intervals in between (this may happen with very low tps,
 		 * e.g. --rate=0.1).
 		 */
-
-		while (agg->start_time + agg_interval <= now)
+		while ((next = agg->start_time + agg_interval * INT64CONST(1000000)) <= now)
 		{
-			/* print aggregated report to logfile */
-			fprintf(logfile, INT64_FORMAT " " INT64_FORMAT " %.0f %.0f %.0f %.0f",
-	agg->start_time,
-	agg->cnt,
-	agg->latency.sum,
-	agg->latency.sum2,
-	agg->latency.min,
-	agg->latency.max);
-			if (throttle_delay)
-			{
-fprintf(logfile, " %.0f %.0f %.0f %.0f",
-		agg->lag.sum,
-		agg->lag.sum2,
-		agg->lag.min,
-		agg->lag.max);
-if (latency_limit)
-	fprintf(logfile, " " INT64_FORMAT, agg->skipped);
-			}
-			fputc('\n', logfile);
+			logAgg(logfile, agg);
 
 			/* reset data and move to next interval */
-			initStats(agg, agg->start_time + agg_interval);
+			initStats(agg, next);
 		}
 
-		/* accumulate the current transaction */
-		accumStats(agg, skipped, latency, lag);
+		if (tx)
+			/* accumulate the current transaction */
+			accumStats(agg, skipped, latency, lag);
+		else
+			/* final call to show the last aggregate */
+			logAgg(logfile, agg);
 	}
 	else
 	{
+		/* switch to epoch */
+		now += epoch_shift;
+
+		/* !tx only used for aggregated data */
+		Assert(tx);
+
 		/* no, print raw transactions */
 		if (skipped)
 			fprintf(logfile, "%d " INT64_FORMAT " skipped %d " INT64_FORMAT " "
@@ -3890,7 +3920,7 @@ processXactStats(TState *thread, CState *st, pg_time_usec_t *now,
 	st->cnt++;
 
 	if (use_log)
-		doLog(thread, st, agg, skipped, 

Re: subscription/t/010_truncate.pl failure on desmoxytes in REL_13_STABLE

2021-06-22 Thread Amit Kapila
On Tue, Jun 22, 2021 at 10:33 AM Thomas Munro  wrote:
>
> While scanning for assertion failures on the build farm that don't
> appear to have been discussed, I found this[1] in
> 010_truncate_publisher.log on the 13 branch:
>
> TRAP: FailedAssertion("tupdesc->tdrefcount <= 0", File:
> "/home/bf/build/buildfarm-desmoxytes/REL_13_STABLE/pgsql.build/../pgsql/src/backend/access/common/tupdesc.c",
> Line: 321)
>

I guess this could be similar to what we see at:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2021-06-15%2020%3A49%3A26

We have discussed this in another thread at:
https://www.postgresql.org/message-id/648020.1623854904%40sss.pgh.pa.us

The reason I think it is the same is that the assertion failure shown
in the report is from the function FreeTupleDesc(), which we can call
from pgoutput.c while processing an invalidation. Ideally, we shouldn't
process invalidations before initializing the tuple conversion map for
partitions, but in some rare cases that was happening; we have fixed
it in commit
(https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=357cb8f07f95665ea533ff534821c22c35b01288).

I see this report is from 16th June 2021 and the commit is from 18th
June 2021, so I am hoping it has been fixed, but if we see it again
then we probably need to investigate further.

-- 
With Regards,
Amit Kapila.




Re: disfavoring unparameterized nested loops

2021-06-22 Thread Tomas Vondra



On 6/22/21 2:25 AM, Peter Geoghegan wrote:
> On Mon, Jun 21, 2021 at 4:51 PM Bruce Momjian  wrote:
>> There were a lot of interesting ideas in this thread and I want to
>> analyze some of them.  First, there is the common assumption (not
>> stated) that over-estimating by 5% and underestimating by 5% cause the
>> same harm, which is clearly false.  If I go to a restaurant and estimate
>> the bill to be 5% higher or 5% lower, assuming I have sufficient funds,
>> under or over estimating is probably fine.  If I am driving and
>> under-estimate the traction of my tires, I am probably fine, but if I
>> over-estimate their traction by 5%, I might crash.
> 
> My favorite analogy is the home insurance one:
> 
> It might make sense to buy home insurance because losing one's home
> (say through fire) is a loss that usually just cannot be tolerated --
> you are literally ruined. We can debate how likely it is to happen,
> but in the end it's not so unlikely that it can't be ruled out. At the
> same time I may be completely unwilling to buy insurance for personal
> electronic devices. I can afford to replace all of them if I truly
> have to. And the chances of all of them breaking or being stolen on
> the same day is remote (unless my home burns down!). If I drop my cell
> phone and crack the screen, I'll be annoyed, but it's certainly not
> the end of the world.
> 
> This behavior will make perfect sense to most people. But it doesn't
> scale up or down. I have quite a few electronic devices, but only a
> single home, so technically I'm taking risks way more often than I am
> playing it safe here. Am I risk tolerant when it comes to insurance?
> Conservative?
> 
> I myself don't think that it is sensible to apply either term here.
> It's easier to just look at the specifics. A home is a pretty
> important thing to almost everybody; we can afford to treat it as a
> special case.
> 
>> If that is accurate, I think the big question is how common are cases
>> where the outer side can't be proven to have zero or one row and nested
>> loops are enough of a win to risk its greater sensitivity to
>> misestimation.  If it is uncommon, seems we could just code the
>> optimizer to use hash joins in those cases without a user-visible knob,
>> beyond the knob that already turns off nested loop joins.
> 
> I think it's possible that Robert's proposal will lead to very
> slightly slower plans in the vast majority of cases that are affected,
> while still being a very good idea. Why should insurance be 100% free,
> though? Maybe it can be in some cases where we get lucky, but why
> should that be the starting point? It just has to be very cheap
> relative to what we do today for us to come out ahead, certainly, but
> that seems quite possible in at least this case.
> 

Yeah, I like the insurance analogy - it gets to the crux of the problem,
because insurance is pretty much exactly about managing risk. But making
everything slower will be a hard sell, because the vast majority of
workloads already running on Postgres don't have this issue at all, so
for them it's not worth the expense. Following the insurance analogy,
selling tornado insurance in Europe is mostly pointless.

Insurance is also about personal preference / risk tolerance. Maybe I'm
fine with accepting the risk that my house burns down, or whatever ...

And the lack of data also plays a role - an insurance company will ask
for higher rates when it does not have enough accurate data about the
phenomenon, or when there are a lot of unknowns. Maybe this would allow
some basic measure of uncertainty, based on the number and type of
restrictions, joins, etc. The more restrictions we have, the less
certain the estimates are. Some conditions are estimated less
accurately than others, and using default estimates makes things much
less accurate.

So maybe some fairly rough measure of uncertainty might work, and the
user might specify how much risk they are willing to tolerate.
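
(As an aside, the only hedge available to users today is the blunt
per-session / per-transaction GUC, which turns off all nested loops
rather than just the risky unparameterized ones. A quick sketch - the
table names here are made up:

    BEGIN;
    SET LOCAL enable_nestloop = off;
    EXPLAIN SELECT *
      FROM orders o JOIN customers c ON c.id = o.customer_id;
    COMMIT;

A finer-grained, risk-based setting would presumably sit somewhere
between this and doing nothing.)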


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Doc chapter for Hash Indexes

2021-06-22 Thread Simon Riggs
On Tue, Jun 22, 2021 at 7:15 AM Amit Kapila  wrote:
>
> On Mon, Jun 21, 2021 at 6:38 PM Simon Riggs
>  wrote:
> >
> > New chapter for Hash Indexes, designed to help users understand how
> > they work and when to use them.
> >
> > Mostly newly written, but a few paras lifted from README when they were 
> > helpful.
> >
>
> Few comments
> ==
> 1.
> +  Hash indexes are best optimized for SELECTs and UPDATEs using equality
> +  scans on larger tables.
>
> Is there a reason to mention Selects and Updates but not Deletes?

Deletes decrease the number of rows, so they must eventually be matched
by inserts; hence deletes imply inserts.

Perhaps it should say "update-heavy"
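
(For anyone following along, the intended access pattern is roughly the
following sketch - the table and column names are invented, not taken
from the patch:

    CREATE TABLE events (id bigint, payload text);
    CREATE INDEX events_id_hash ON events USING hash (id);

    SELECT payload FROM events WHERE id = 42;       -- equality: can use the index
    UPDATE events SET payload = 'x' WHERE id = 42;  -- likewise
    SELECT payload FROM events WHERE id < 100;      -- range: cannot use it

Hash indexes only support equality lookups, which is why the doc talks
about equality scans.)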

> 2.
> +  Like B-Trees, hash indexes perform simple index tuple deletion. This
> +  is a deferred maintenance operation that deletes index tuples that are
> +  known to be safe to delete (those whose item identifier's LP_DEAD bit
> +  is already set). This is performed speculatively upon each insert,
> +  though may not succeed if the page is pinned by another backend.
>
> From the above sentence, it is not very clear to me when we perform
> simple index tuple deletion. We perform it when there is no space to
> accommodate a new tuple on the bucket page and, as a result, we might
> need to create an overflow page. Basically, I am not sure that saying
> "This is performed speculatively upon each insert .." is helpful.

OK, thanks, will reword.

> 3.
> +  incrementally expanded.  When a new bucket is to be added to the index,
> +  exactly one existing bucket will need to be "split", with some of its
> +  tuples being transferred to the new bucket according to the updated
> +  key-to-bucket-number mapping.  This is essentially the same hash table
>
> In most places, the patch has used a single space after the full stop
> but at some places like above, it has used two spaces after full stop.
> I think it is better to be consistent.

OK

> 4.
>  This is essentially the same hash table
> +  management technique embodied in src/backend/utils/hash/dynahash.c for
> +  in-memory hash tables used within PostgreSQL internals.
>
> I am not sure if there is a need to mention this in the user-facing
> doc. I think README is a better place for this.

OK, will remove. Thanks


I've reworded most things from both Amit and Justin; thanks for your reviews.

I attach both clean and compare versions.

-- 
Simon Riggs                http://www.EnterpriseDB.com/


doc_hash_index.v1-v2.diff
Description: Binary data


doc_hash_index.v2.patch
Description: Binary data


Cosmic ray hits integerset

2021-06-22 Thread Thomas Munro
Hi,

Here's a curious one-off failure in test_integerset:

+ERROR:  iterate returned wrong value; got 519985430528, expected 485625692160

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=rhinoceros&dt=2021-04-01%2018:19:47




Re: improvements in Unicode tables generation code

2021-06-22 Thread Kyotaro Horiguchi
At Tue, 22 Jun 2021 11:20:46 +0300, Heikki Linnakangas  wrote 
in 
> On 22/06/2021 10:20, Peter Eisentraut wrote:
> > v1-0004-Simplify-code-generation-code.patch
> > This simplifies the code a bit, which helps with the next patch.
> 
> If we do that, let's add the trailing commas to the other arrays too,
> not just the combined maps.
> 
> No objection, but how does this help the next patch?
> 
> If we want to avoid the stray commas (and I think they are a little
> ugly, but that's a matter of taste), we could adopt the approach that
> print_radix_table() uses to avoid the comma. That seems simpler than
> what print_from_utf8_combined_map and print_to_utf8_combined_map are
> doing.

+1 for adopting the same method as print_radix_table *if* we do want
to avoid the stray commas.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: improvements in Unicode tables generation code

2021-06-22 Thread Heikki Linnakangas

On 22/06/2021 10:20, Peter Eisentraut wrote:

I have accumulated a few patches to improve the output of the scripts in
src/backend/utils/mb/Unicode/ to be less non-standard-looking and fix a
few other minor things in that area.

v1-0001-Make-Unicode-makefile-more-parallel-safe.patch

The makefile rule that calls UCS_to_most.pl was written incorrectly for
parallel make.  The script writes all output files in one go, but the
rule as written would call the command once for each output file in
parallel.


This could use a comment. At a quick glance, I don't understand what all 
the $(wordlist ...) magic does.


Perhaps we should change the script or Makefile so that it doesn't 
create all the maps in one go?



v1-0002-Make-UCS_to_most.pl-process-encodings-in-sorted-o.patch

This mainly just helps eyeball the output while debugging the previous
patch.


+1


v1-0003-Remove-some-whitespace-in-generated-C-output.patch

Improve a small formatting issue in the output.


I'm surprised the added \n in the perl code didn't result in extra 
newlines in the outputs.



v1-0004-Simplify-code-generation-code.patch

This simplifies the code a bit, which helps with the next patch.


If we do that, let's add the trailing commas to the other arrays too, 
not just the combined maps.


No objection, but how does this help the next patch?

If we want to avoid the stray commas (and I think they are a little 
ugly, but that's a matter of taste), we could adopt the approach that 
print_radix_table() uses to avoid the comma. That seems simpler than 
what print_from_utf8_combined_map and print_to_utf8_combined_map are doing.



v1-0005-Fix-indentation-in-generated-output.patch

This changes the indentation in the output from two spaces to a tab.

I haven't included the actual output changes in the last patch, because
they would be huge, but the idea should be clear.

All together, these make the output look closer to how pgindent would
make it.


Thanks!

- Heikki




Re: improvements in Unicode tables generation code

2021-06-22 Thread Kyotaro Horiguchi
At Tue, 22 Jun 2021 09:20:16 +0200, Peter Eisentraut 
 wrote in 
> I have accumulated a few patches to improve the output of the scripts
> in src/backend/utils/mb/Unicode/ to be less non-standard-looking and
> fix a few other minor things in that area.
> 
> v1-0001-Make-Unicode-makefile-more-parallel-safe.patch
> 
> The makefile rule that calls UCS_to_most.pl was written incorrectly
> for parallel make.  The script writes all output files in one go, but
> the rule as written would call the command once for each output file
> in parallel.

I was annoyed by that behavior but hadn't found how to stop it.  It
looks like it should work. (But I haven't been able to run it myself,
for the reason at the end of this mail.)

> v1-0002-Make-UCS_to_most.pl-process-encodings-in-sorted-o.patch
> 
> This mainly just helps eyeball the output while debugging the previous
> patch.
> 
> v1-0003-Remove-some-whitespace-in-generated-C-output.patch
> 
> Improve a small formatting issue in the output.

These look just fine.

> v1-0004-Simplify-code-generation-code.patch
> 
> This simplifies the code a bit, which helps with the next patch.

This simplifies the code in exchange for allowing a comma after the
last element of array literals.  I'm fine with it as long as we allow
that style in the tree.

> v1-0005-Fix-indentation-in-generated-output.patch
> 
> This changes the indentation in the output from two spaces to a tab.
> 
> I haven't included the actual output changes in the last patch,
> because they would be huge, but the idea should be clear.
> 
> All together, these make the output look closer to how pgindent would
> make it.

I agree to the fix.

Mmm. (This is somewhat unrelated to this patch set.) I tried this, but
I found that www.unicode.org has not been responding (for at least the
last several days).  I'm not sure what is happening here.

> wget -O 8859-2.TXT --no-use-server-timestamps 
> https://www.unicode.org/Public/MAPPINGS/ISO8859/8859-2.TXT
> --2021-06-22 17:09:34--  
> https://www.unicode.org/Public/MAPPINGS/ISO8859/8859-2.TXT
> Resolving www.unicode.org (www.unicode.org)... 66.34.208.12
> Connecting to www.unicode.org (www.unicode.org)|66.34.208.12|:443... 
(timeouts)


regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Toast compression method options

2021-06-22 Thread Michael Paquier
On Tue, Jun 22, 2021 at 11:05:22AM +0530, Dilip Kumar wrote:
> IMHO there is certainly a use case: basically, we compress the data
> so that we can avoid storing it externally.  Now suppose that for some
> data, with default LZ4 compression, the compression ratio is so high
> that you are able to compress it to a size which is way under the
> limit.  For such data, the acceleration can be increased so that
> compression is fast while the ratio is still good enough to keep the
> data from going to external storage.  I agree it will be difficult for
> the user to make such a decision and select the acceleration value,
> but based on the data pattern and its compressed length the admin can
> make that decision.  So, in short: select the acceleration value such
> that compression is fast and the compression ratio is good enough to
> keep the data from being stored externally.

Theoretically, I agree that there could be a use case, and that was
the point I was trying to outline above.  My point is more from a
practical point of view.  LZ4 is designed to be fast and cheap in CPU
with a rather low compression ratio compared to other modern algos.

Is it possible to think of some worst cases where one may want to
reduce the compression effort to save some CPU?  The point of allowing
the acceleration to be tuned would be, as you say, that one may want to
save a bit of CPU and not care about the extra disk space this takes.
Still, I am wondering why one would not just store the values
externally in such cases and save as much compression effort as
possible.
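
For reference, what exists today is only per-column selection of the
method; the acceleration discussed here would be an extra knob on top
of that.  A quick sketch, with a made-up table:

    ALTER TABLE docs ALTER COLUMN body SET COMPRESSION lz4;
    -- check which method was applied to a stored value
    SELECT pg_column_compression(body) FROM docs LIMIT 1;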
--
Michael


signature.asc
Description: PGP signature


improvements in Unicode tables generation code

2021-06-22 Thread Peter Eisentraut
I have accumulated a few patches to improve the output of the scripts in 
src/backend/utils/mb/Unicode/ to be less non-standard-looking and fix a 
few other minor things in that area.


v1-0001-Make-Unicode-makefile-more-parallel-safe.patch

The makefile rule that calls UCS_to_most.pl was written incorrectly for 
parallel make.  The script writes all output files in one go, but the 
rule as written would call the command once for each output file in 
parallel.


v1-0002-Make-UCS_to_most.pl-process-encodings-in-sorted-o.patch

This mainly just helps eyeball the output while debugging the previous 
patch.


v1-0003-Remove-some-whitespace-in-generated-C-output.patch

Improve a small formatting issue in the output.

v1-0004-Simplify-code-generation-code.patch

This simplifies the code a bit, which helps with the next patch.

v1-0005-Fix-indentation-in-generated-output.patch

This changes the indentation in the output from two spaces to a tab.

I haven't included the actual output changes in the last patch, because 
they would be huge, but the idea should be clear.


All together, these make the output look closer to how pgindent would 
make it.
From 3dce99c8e57aec91db85965b6cef947484c00a5e Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Tue, 22 Jun 2021 09:06:28 +0200
Subject: [PATCH v1 1/5] Make Unicode makefile more parallel-safe

---
 src/backend/utils/mb/Unicode/Makefile | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/backend/utils/mb/Unicode/Makefile b/src/backend/utils/mb/Unicode/Makefile
index ed6fc07e08..4969b8c385 100644
--- a/src/backend/utils/mb/Unicode/Makefile
+++ b/src/backend/utils/mb/Unicode/Makefile
@@ -72,7 +72,9 @@ GENERICTEXTS = $(ISO8859TEXTS) $(WINTEXTS) \
 
 all: $(MAPS)
 
-$(GENERICMAPS): UCS_to_most.pl $(GENERICTEXTS)
+$(wordlist 2, $(words $(GENERICMAPS)), $(GENERICMAPS)): $(firstword $(GENERICMAPS)) ;
+
+$(firstword $(GENERICMAPS)): UCS_to_most.pl $(GENERICTEXTS)
$(PERL) -I $(srcdir) $<
 
 johab_to_utf8.map utf8_to_johab.map: UCS_to_JOHAB.pl JOHAB.TXT
-- 
2.32.0

From 291f0acfd5331ab2f29710b156c40b0dad703ca2 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Tue, 22 Jun 2021 09:06:28 +0200
Subject: [PATCH v1 2/5] Make UCS_to_most.pl process encodings in sorted order

This just makes the progress output easier to follow.
---
 src/backend/utils/mb/Unicode/UCS_to_most.pl | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/backend/utils/mb/Unicode/UCS_to_most.pl b/src/backend/utils/mb/Unicode/UCS_to_most.pl
index 4f974388d7..6b699b376d 100755
--- a/src/backend/utils/mb/Unicode/UCS_to_most.pl
+++ b/src/backend/utils/mb/Unicode/UCS_to_most.pl
@@ -54,7 +54,8 @@
 # make maps for all encodings if not specified
 my @charsets = (scalar(@ARGV) > 0) ? @ARGV : sort keys(%filename);
 
-foreach my $charset (@charsets)
+# the sort is just so that the output is easier to eyeball
+foreach my $charset (sort @charsets)
 {
my $mapping = _source($filename{$charset});
 
-- 
2.32.0

From 242440c79a10aab92e1293e1441dc963fd26e2ce Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Tue, 22 Jun 2021 09:06:28 +0200
Subject: [PATCH v1 3/5] Remove some whitespace in generated C output

It doesn't match the normal coding style.
---
 src/backend/utils/mb/Unicode/convutils.pm   | 4 ++--
 src/backend/utils/mb/Unicode/euc_jis_2004_to_utf8.map   | 2 +-
 src/backend/utils/mb/Unicode/shift_jis_2004_to_utf8.map | 2 +-
 src/backend/utils/mb/Unicode/utf8_to_euc_jis_2004.map   | 2 +-
 src/backend/utils/mb/Unicode/utf8_to_shift_jis_2004.map | 2 +-
 5 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/src/backend/utils/mb/Unicode/convutils.pm b/src/backend/utils/mb/Unicode/convutils.pm
index 5ad38514be..8369e91b2d 100644
--- a/src/backend/utils/mb/Unicode/convutils.pm
+++ b/src/backend/utils/mb/Unicode/convutils.pm
@@ -173,7 +173,7 @@ sub print_from_utf8_combined_map
 
printf $out "\n/* Combined character map */\n";
printf $out
- "static const pg_utf_to_local_combined ULmap${charset}_combined[ %d ] 
= {",
+ "static const pg_utf_to_local_combined ULmap${charset}_combined[%d] = 
{\n",
  scalar(@$table);
my $first = 1;
foreach my $i (sort { $a->{utf8} <=> $b->{utf8} } @$table)
@@ -208,7 +208,7 @@ sub print_to_utf8_combined_map
 
printf $out "\n/* Combined character map */\n";
printf $out
- "static const pg_local_to_utf_combined LUmap${charset}_combined[ %d ] 
= {",
+ "static const pg_local_to_utf_combined LUmap${charset}_combined[%d] = 
{\n",
  scalar(@$table);
 
my $first = 1;
diff --git a/src/backend/utils/mb/Unicode/euc_jis_2004_to_utf8.map b/src/backend/utils/mb/Unicode/euc_jis_2004_to_utf8.map
index d2da4a383b..3a8fc9d26f 100644
--- a/src/backend/utils/mb/Unicode/euc_jis_2004_to_utf8.map
+++ b/src/backend/utils/mb/Unicode/euc_jis_2004_to_utf8.map
@@ -3414,7 +3414,7 @@ static const uint32 

Re: intermittent failures in Cygwin from select_parallel tests

2021-06-22 Thread Noah Misch
On Tue, Jun 22, 2021 at 05:52:03PM +1200, Thomas Munro wrote:
> On Tue, Jun 22, 2021 at 5:21 PM Noah Misch  wrote:
> > On Thu, Aug 03, 2017 at 10:45:50AM -0400, Robert Haas wrote:
> > > On Wed, Aug 2, 2017 at 11:47 PM, Noah Misch  wrote:
> > > > postmaster algorithms rely on the PG_SETMASK() calls preventing that.  
> > > > Without
> > > > such protection, duplicate bgworkers are an understandable result.  I 
> > > > caught
> > > > several other assertions; the PMChildFlags failure is another case of
> > > > duplicate postmaster children:
> > > >
> > > >   6 TRAP: FailedAssertion("!(entry->trans == ((void *)0))", File: 
> > > > "pgstat.c", Line: 871)
> > > >   3 TRAP: FailedAssertion("!(PMSignalState->PMChildFlags[slot] == 
> > > > 1)", File: "pmsignal.c", Line: 229)
> > > >  20 TRAP: FailedAssertion("!(RefCountErrors == 0)", File: 
> > > > "bufmgr.c", Line: 2523)
> > > >  21 TRAP: FailedAssertion("!(vmq->mq_sender == ((void *)0))", File: 
> > > > "shm_mq.c", Line: 221)
> > > >  Also, got a few "select() failed in postmaster: Bad address"
> > > >
> > > > I suspect a Cygwin signals bug.  I'll try to distill a self-contained 
> > > > test
> > > > case for the Cygwin hackers.  The lack of failures on buildfarm member 
> > > > brolga
> > > > argues that older Cygwin is not affected.
> > >
> > > Nice detective work.
> >
> > Thanks.  http://marc.info/?t=15018329641 has my upstream report.  The
> > Cygwin project lead reproduced this, but a fix remained elusive.
> >
> > I guess we'll ignore weird postmaster-associated lorikeet failures for the
> > foreseeable future.
> 
> While reading a list of recent build farm assertion failures I learned that
> this is still broken in Cygwin 3.2, and eventually found my way back
> to this thread.

Interesting.  Which branch(es) showed you failures?  I had wondered if the
move to sa_mask (commit 9abb2bfc) would effectively end the problem in v13+.
Perhaps the Cygwin bug pokes through even that.  Perhaps the sa_mask
conditionals need to be "#if defined(WIN32) && !defined(__CYGWIN__)" to help
current buildfarm members.

> I was wondering about suggesting some kind of
> official warning, but I guess the manual already covers it with this
> 10 year old notice.  I don't know much about Windows or Cygwin so I'm
> not sure if it needs updating or not, but I would guess that there are
> no longer any such systems?
> 
>   Cygwin is not recommended for running a
>   production server, and it should only be used for running on
>   older versions of Windows where
>   the native build does not work.

I expect native builds work on all Microsoft-supported Windows versions, so +1
for removing everything after the comma.




Re: Assertion failure in HEAD and 13 after calling COMMIT in a stored proc

2021-06-22 Thread Michael Paquier
On Mon, Jun 21, 2021 at 04:19:27PM -0700, Jim Nasby wrote:
> The following generates an assertion failure. Quick testing with start and
> stop as well as the core dump shows it’s failing on the execution of
> `schema_name := schema_name(i)` immediately after COMMIT, because there’s no
> active snapshot. On a build without asserts I get a failure in
> GetActiveSnapshot() (second stack trace). This works fine on 12_STABLE, but
> fails on 13_STABLE and HEAD.

A bisect run points me to the following commit:
commit 73b06cf893c9d3bb38c11878a12cc29407e78b6c
Author: Tom Lane 
Date:   Fri Nov 22 15:02:18 2019 -0500

Avoid taking a new snapshot for an immutable simple expression in plpgsql.

Snapshots would be taken when using non-immutable functions.  I'd need
to study this code more to grasp whether we could improve the situation
after committing the transaction, but, Tom, shouldn't we enforce a
snapshot in the case where the expression has not been prepared for
execution in the new XACT, even for the immutable case?  It seems to
me that this refers to the case where expr_simple_lxid is still
invalid, no?
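
For reference, the shape of the pattern described in the report is
roughly the following (the function, names and loop are my own
illustration, not the original test case, and this sketch has not been
verified as a reproducer):

    CREATE FUNCTION schema_name(i int) RETURNS text
      LANGUAGE sql IMMUTABLE AS $$ SELECT 'schema_' || i $$;

    CREATE PROCEDURE repro() LANGUAGE plpgsql AS $$
    DECLARE
      schema_name text;
    BEGIN
      FOR i IN 1..3 LOOP
        COMMIT;                         -- transaction ends here
        schema_name := schema_name(i);  -- simple expression right after COMMIT
      END LOOP;
    END $$;

    CALL repro();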
--
Michael


signature.asc
Description: PGP signature


Re: Doc chapter for Hash Indexes

2021-06-22 Thread Amit Kapila
On Mon, Jun 21, 2021 at 6:38 PM Simon Riggs
 wrote:
>
> New chapter for Hash Indexes, designed to help users understand how
> they work and when to use them.
>
> Mostly newly written, but a few paras lifted from README when they were 
> helpful.
>

Few comments
==
1.
+  Hash indexes are best optimized for SELECTs and UPDATEs using equality
+  scans on larger tables.

Is there a reason to mention Selects and Updates but not Deletes?

2.
+  Like B-Trees, hash indexes perform simple index tuple deletion. This
+  is a deferred maintenance operation that deletes index tuples that are
+  known to be safe to delete (those whose item identifier's LP_DEAD bit
+  is already set). This is performed speculatively upon each insert,
+  though may not succeed if the page is pinned by another backend.

From the above sentence, it is not very clear to me when we perform
simple index tuple deletion. We perform it when there is no space to
accommodate a new tuple on the bucket page and, as a result, we might
need to create an overflow page. Basically, I am not sure that saying
"This is performed speculatively upon each insert .." is helpful.

3.
+  incrementally expanded.  When a new bucket is to be added to the index,
+  exactly one existing bucket will need to be "split", with some of its
+  tuples being transferred to the new bucket according to the updated
+  key-to-bucket-number mapping.  This is essentially the same hash table

In most places, the patch has used a single space after the full stop
but at some places like above, it has used two spaces after full stop.
I think it is better to be consistent.

4.
 This is essentially the same hash table
+  management technique embodied in src/backend/utils/hash/dynahash.c for
+  in-memory hash tables used within PostgreSQL internals.

I am not sure if there is a need to mention this in the user-facing
doc. I think README is a better place for this.

-- 
With Regards,
Amit Kapila.