RE: Time delayed LR (WAS Re: logical replication restrictions)

2022-12-11 Thread Takamichi Osumi (Fujitsu)
On Monday, December 12, 2022 2:54 PM Kyotaro Horiguchi 
 wrote:
> I asked about unexpected walsender termination caused by this feature, but I
> don't think I received an answer for it, and the behavior still exists.
> 
> Specifically, if servers have the following settings, the walsender terminates on a
> replication timeout. After that, the connection is restored once the LR delay
> elapses. Although it can be said to be working in that sense, the error happens
> repeatedly at roughly min_apply_delay intervals and is hard to distinguish
> from network trouble.  I'm not sure you're deliberately okay with it, but I don't
> think behavior that causes replication timeouts is acceptable.
> 
> > wal_sender_timeout = 10s;
> > wal_receiver_timeout = 10s;
> >
> > create subscription ... with (min_apply_delay='60s');
> 
> This is somewhat artificial, but timeout=60s and delay=5m is not an uncommon
> setup, and that also causes this "issue".
> 
> subscriber:
> > 2022-12-12 14:17:18.139 JST LOG:  terminating walsender process due to replication timeout
> > 2022-12-12 14:18:11.076 JST LOG:  starting logical decoding for slot "s1"
> ...
Hi, Horiguchi-san


Thank you so much for your report!
Yes, how to deal with the timeout issue is currently under discussion,
and some analysis of the root cause is there as well.

Kindly have a look at [1].


[1] - 
https://www.postgresql.org/message-id/TYAPR01MB58669394A67F2340B82E42D1F5E29%40TYAPR01MB5866.jpnprd01.prod.outlook.com



Best Regards,
Takamichi Osumi





RE: Time delayed LR (WAS Re: logical replication restrictions)

2022-12-11 Thread Hayato Kuroda (Fujitsu)
Dear Amit,

This is a reply to the later part of your e-mail.

> > (2) About the timeout issue
> >
> > Looking at the physical replication internals, it sends feedback and applies
> > the delay in separate processes.
> > OTOH, logical replication needs to achieve both within one process.
> >
> > When we want to apply the delay and avoid the timeout,
> > we should not keep all the transaction data in memory.
> > So, one approach is to serialize the transaction data and, after the delay,
> > apply it.
> >
> 
> It is not clear to me how this will avoid a timeout.

The reason the timeout occurs is that, while delaying, the apply
worker neither reads messages from the walsender nor replies to it.
The worker's last_recv_timeout is not updated because it does not receive
messages, which leads to a wal_receiver_timeout. Similarly, the walsender's
last_processing is not updated, and it exits due to the timeout because the
worker does not reply to the upstream.

Based on the above, we thought that workers must receive and handle messages
even if they are delaying applying transactions. In more detail, workers must
keep iterating the outer loop in LogicalRepApplyLoop().

If workers receive transactions but need to delay applying them, they must keep
the messages somewhere. So we came up with the idea that workers serialize changes
once and apply them later. Our basic design is as follows:

* All transactions are serialized to files if min_apply_delay is set to non-zero.
* After receiving the commit message and waiting out the delay, the worker reads
  and applies the spooled messages.

> > However, this means that if users adopt this feature,
> > all transaction data that should be delayed would be serialized.
> > We are not sure whether this is a valid approach or not.
> >
> > Another approach was to divide the delay time in apply_delay(), use the
> > divided time for WaitLatch, and send the keepalive messages from there.
> >
> 
> Do we ever send keepalive messages from the apply side? I think we
> only send feedback reply messages as a response to the publisher's
> keep_alive message. So, we need to do something similar for this if
> you want to follow this approach.

Right, and the above mechanism is needed for workers to understand messages
and send feedback replies as a response to the publisher's keepalive message.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



RE: Time delayed LR (WAS Re: logical replication restrictions)

2022-12-11 Thread Takamichi Osumi (Fujitsu)
On Wednesday, December 7, 2022 2:07 PM Amit Kapila  
wrote:
> On Tue, Dec 6, 2022 at 5:44 PM Takamichi Osumi (Fujitsu)
>  wrote:
> >
> > On Friday, December 2, 2022 4:05 PM Amit Kapila 
> wrote:
> > > On Tue, Nov 15, 2022 at 12:33 PM Amit Kapila
> > > 
> > > wrote:
> > > > One more thing I would like you to consider is the point raised by
> > > > me related to this patch's interaction with the parallel apply
> > > > feature as mentioned in the email [1]. I am not sure the idea
> > > > proposed in that email [1] is a good one because delaying after
> > > > applying commit may not be good as we want to delay the apply of
> > > > the transaction(s) on subscribers by this feature. I feel this needs more
> > > > thought.
> > > >
> > >
> > > I have thought a bit more about this and we have the following
> > > options to choose the delay point from. (a) apply delay just before
> > > committing a transaction. As mentioned in comments in the patch this
> > > can lead to bloat and locks held for a long time. (b) apply delay
> > > before starting to apply changes for a transaction but here the
> > > problem is which time to consider. In some cases, like for streaming
> > > transactions, we don't receive the commit/prepare xact time in the start
> > > message. (c) use (b) but use the previous transaction's commit time.
> > > (d) apply delay after committing a transaction by using the xact's commit
> > > time.
> > >
> > > At this stage, among above, I feel any one of (c) or (d) is worth
> > > considering. Now, the difference between (c) and (d) is that if
> > > after commit the next xact's data is already delayed by more than
> > > min_apply_delay time then we don't need to kick the new logic of apply
> > > delay.
> > >
> > > The other thing to consider whether we need to process any keepalive
> > > messages during the delay because otherwise, walsender may think
> > > that the subscriber is not available and time out. This may not be a
> > > problem for synchronous replication but otherwise, it could be a problem.
> > >
> > > Thoughts?
> > (1) About the timing to apply the delay
> >
> > An approach like (b) would be best. The idea is to delay the application of
> > all types of transactions based on the time when the transaction arrives at
> > the subscriber node.
> >
> 
> But I think it will unnecessarily add the delay when there is a delay in sending
> the transaction by the publisher (say because the publisher was busy
> handling other workloads or there was a temporary network communication
> break between publisher and subscriber). This could probably be the reason
> why physical replication (via recovery_min_apply_delay) uses the commit time of
> the sending side.
You are right. Approach (b) adds additional (or unnecessary) delay
due to network communication or machine trouble, including in streamed
in-progress cases. We agree that approach (b) has this disadvantage.


> > One advantage of this approach over (c) and (d) is that this can avoid
> > the case where we might apply a transaction immediately without
> > waiting, if there are two transactions sequentially and the time in between
> > exceeds the min_apply_delay time.
> >
> 
> I am not sure I understand your point. However, I think that even if the
> transactions are sequential, if the time between them exceeds min_apply_delay
> (say because the publisher was down), there is no need to apply additional
> delay.
I'm sorry, my description was not accurate.

As for approach (c), kindly imagine two transactions (txn1, txn2) executed
on the publisher side, with the publisher sending both of them to the
subscriber. There is no network trouble, and the publisher isn't busy with
other workloads. However, the time between txn1 and txn2 exceeds
"min_apply_delay" (which is set on the subscriber).

In this case, when txn2 is a streamed in-progress transaction,
we don't apply any delay for txn2 when it arrives at the subscriber,
because before txn2 reaches the subscriber, "min_apply_delay"
has already passed on the publisher side.
This means there are cases where we apply no delay at all if we choose approach (c).

Approach (d) has a similar disadvantage.
IIUC, in this approach the subscriber applies the delay after committing a
transaction, based on the commit/prepare time on the publisher side. Again,
imagine two transactions executed on the publisher, where the 2nd transaction
completes after the subscriber's delay for the 1st transaction, with no
network trouble and no heavy workload on the publisher.
In that case, the delay for txn1 has already finished when the 2nd transaction
arrives at the subscriber, so the 2nd transaction will be applied immediately,
without delay.

Another new discussion point is to utilize (b) together with the stream
commit/stream prepare time, and apply the delay immediately before applying
the spool files of the transactions in the streamed in-progress cases.

Does anyone have an opinion on this?

Re: Checksum errors in pg_stat_database

2022-12-11 Thread Drouvot, Bertrand




On 12/12/22 5:09 AM, Michael Paquier wrote:

On Sun, Dec 11, 2022 at 08:48:15PM -0500, Tom Lane wrote:

I think a stats table indexed solely by relfilenode wouldn't be a great
idea, because you'd lose all the stats about a table as soon as you
do vacuum full/cluster/rewriting-alter-table/etc.  Can we create two
index structures over the same stats table entries, so you can look
up by either relfilenode or OID?  I'm not quite sure how to manage
rewrites, where you transiently have two relfilenodes for "the
same" table ... maybe we could allow multiple pointers to the same
stats entry??


FWIW, I am not sure that I would care much if we were to drop the
stats associated with a relfilenode on a rewrite.  In terms of checksum
failures, tuples are deformed, so if there is one checksum failure a
rewrite would just not happen.  The potential complexity is not really
appealing compared to the implementation simplicity and its gains, and
rewrites are lock-heavy, so I'd like to think that people avoid them
(cough)..


Agree that this is less "problematic" for the checksum use case.
On the other hand, losing I/O stats (such as the ones we could add later on,
as suggested by Andres up-thread) looks like more of a concern to me.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: slab allocator performance issues

2022-12-11 Thread John Naylor
On Sat, Dec 10, 2022 at 11:02 AM David Rowley  wrote:
> [v4]

Thanks for working on this!

I ran an in-situ benchmark using the v13 radix tree patchset ([1] WIP but
should be useful enough for testing allocation speed), only applying the
first five, which are local-memory only. The benchmark is not meant to
represent a realistic workload, and primarily stresses traversal and
allocation of the smallest node type. Minimum of five runs, with turbo-boost
off, on recent Intel laptop hardware:

v13-0001 to 0005:

# select * from bench_load_random_int(500 * 1000);
 mem_allocated | load_ms
---+-
 151123432 | 222

47.06%  postgres  postgres [.] rt_set
22.89%  postgres  postgres [.] SlabAlloc
 9.65%  postgres  postgres [.] rt_node_insert_inner.isra.0
 5.94%  postgres  [unknown][k] 0xb5e011b7
 3.62%  postgres  postgres [.] MemoryContextAlloc
 2.70%  postgres  libc.so.6[.] __memmove_avx_unaligned_erms
 2.60%  postgres  postgres [.] SlabFree

+ v4 slab:

# select * from bench_load_random_int(500 * 1000);
 mem_allocated | load_ms
---+-
 152463112 | 213

  52.42%  postgres  postgres  [.] rt_set
  12.80%  postgres  postgres  [.] SlabAlloc
   9.38%  postgres  postgres  [.] rt_node_insert_inner.isra.0
   7.87%  postgres  [unknown] [k] 0xb5e011b7
   4.98%  postgres  postgres  [.] SlabFree

While allocation is markedly improved, freeing looks worse here. The
proportion is surprising because only about 2% of nodes are freed during
the load, but doing that takes up 10-40% of the time compared to allocating.

num_keys = 50, height = 7
n4 = 2501016, n15 = 56932, n32 = 270, n125 = 0, n256 = 257

Sidenote: I don't recall ever seeing vsyscall (I think that's what the
0xb5e011b7 address is referring to) in a profile, so not sure what
is happening there.

[1]
https://www.postgresql.org/message-id/CAFBsxsHNE621mGuPhd7kxaGc22vMkoSu7R4JW9Zan1jjorGy3g%40mail.gmail.com

--
John Naylor
EDB: http://www.enterprisedb.com


Re: Checksum errors in pg_stat_database

2022-12-11 Thread Drouvot, Bertrand




On 12/12/22 12:40 AM, Michael Paquier wrote:

On Sun, Dec 11, 2022 at 09:18:42PM +0100, Magnus Hagander wrote:

It would be less of a concern yes, but I think it still would be a concern.
If you have a large amount of corruption you could quickly get to millions
of rows to keep track of which would definitely be a problem in shared
memory as well, wouldn't it?


Yes.  I have discussed this item with Bertrand off-list and I share
the same concern.  This would lead to a lot of extra workload on a
large seqscan of a corrupted relation when the stats are written
(shutdown delay), while bloating shared memory with potentially
millions of items, even if variable lists are handled through a dshash
and DSM.


But perhaps we could keep a list of "the last 100 checksum failures" or
something like that?


Applying a threshold is one solution.  Now, a second thing I have seen
in the past is that some disk partitions were busted but not others,
and the current database-level counters are not enough to make a
difference when it comes to grab patterns in this area.  A list of the
last N failures may be able to show some pattern, but that would be
like analyzing things with a lot of noise without a clear conclusion.
Anyway, the workload caused by the threshold number had better be
measured before it is decided (large set of relation files with a full
range of blocks corrupted, much better if these are in the OS cache
when scanned), which does not change the need for a benchmark.

What about just adding a counter tracking the number of checksum
failures for relfilenodes?


I agree with your concern about tracking the corruption of every single block.
I like the idea of tracking relfilenodes instead. Indeed, it looks like that
is enough useful historical information to work with.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: ExecRTCheckPerms() and many prunable partitions (checkAsUser)

2022-12-11 Thread Amit Langote
On Sun, Dec 11, 2022 at 6:25 PM Amit Langote  wrote:
> I've attached 0001 to remove those extraneous code blocks and add a
> comment mentioning that userid need not be recomputed.
>
> While staring at the build_simple_rel() bit mentioned above, I
> realized that this code fails to set userid correctly in the
> inheritance parent rels that are child relations of subquery parent
> relations, such as UNION ALL subqueries.  In that case, instead of
> copying the userid (= 0) of the parent rel, the child should look up
> its own RTEPermissionInfo, which should be there, and use the
> checkAsUser value from there.  I've attached 0002 to fix this hole.  I
> am not sure whether there's a way to add a test case for this in the
> core suite.

Ah, I realized we could just expand the test added by 553d2ec27 with a
wrapper view (to test checkAsUser functionality) and a UNION ALL query
over the view (to test this change).

I've done that in the attached updated patch, in which I've also
addressed Justin's comments.

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com


v2-0001-Remove-some-dead-code-in-selfuncs.c.patch
Description: Binary data


v2-0002-Correctly-set-userid-of-subquery-rel-s-child-rels.patch
Description: Binary data


Re: Time delayed LR (WAS Re: logical replication restrictions)

2022-12-11 Thread Kyotaro Horiguchi
Hello.

I asked about unexpected walsender termination caused by this feature,
but I don't think I received an answer for it, and the behavior still
exists.

Specifically, if servers have the following settings, the walsender
terminates on a replication timeout. After that, the connection is
restored once the LR delay elapses. Although it can be said to be working
in that sense, the error happens repeatedly at roughly min_apply_delay
intervals and is hard to distinguish from network trouble.  I'm not
sure you're deliberately okay with it, but I don't think behavior that
causes replication timeouts is acceptable.

> wal_sender_timeout = 10s;
> wal_receiver_timeout = 10s;
> 
> create subscription ... with (min_apply_delay='60s');

This is somewhat artificial, but timeout=60s and delay=5m is not an
uncommon setup, and that also causes this "issue".

subscriber:
> 2022-12-12 14:17:18.139 JST LOG:  terminating walsender process due to replication timeout
> 2022-12-12 14:18:11.076 JST LOG:  starting logical decoding for slot "s1"
...

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Testing DDL Deparser

2022-12-11 Thread Zheng Li
Hi,

Thanks for working on this and for the feedback!

I've added the updated deparser testing module to the DDL replication
thread in [1].

We'll add more test cases to the testing module and continue the
discussion there.

[1] 
https://www.postgresql.org/message-id/CAAD30U%2BA%3D2rjZ%2BxejNz%2Be1A%3DudWPQMxHD8W48nbhxwJRfw_qrA%40mail.gmail.com

Regards,
Zheng

On Mon, Oct 24, 2022 at 7:29 AM Alvaro Herrera  wrote:
>
> On 2022-Oct-20, Runqi Tian wrote:
>
> > My question regarding subcommand is actually on commands other than
> > ALTER TABLE. Let me give an example (You can also find this example in
> > the patch), when command like
> >
> > CREATE SCHEMA element_test CREATE TABLE foo (id int)
> >
> > is caught by ddl_command_end trigger, function
> > pg_event_trigger_ddl_commands() will return 2 records which I called
> > as subcommands in the previous email.
>
> Ah, right.
>
> I don't remember why we made these commands be separate; but for
> instance if you try to add a SERIAL column you'll also see one command
> to create the sequence, then the table, then the sequence gets its OWNED
> BY the column.
>
> I think the point is that we want to have some regularity so that an
> application can inspect the JSON blobs and perhaps modify them; if you
> make a bunch of sub-objects, this becomes more difficult.  For DDL
> replication purposes perhaps this isn't very useful (since you just grab
> it and execute on the other side as-is), but other use cases might have
> other ideas.
>
> > Is this behavior expected? I thought the deparser is designed to
> > deparse the entire command to a single command instead of dividing
> > them into 2 commands.
>
> It is expected.
>
> > It seems that keeping separate test cases in deparser tests folder is
> > better than using the test cases in core regression tests folder
> > directly. I will write some test cases in the new deparser test
> > folder.
>
> Well, the reason to use the regular regression tests rather than
> separate, is that when a developer adds a new feature, they will add
> test cases for it in regular regression tests, so deparsing of their
> command will be automatically picked up by the DDL-deparse testing
> framework.  We discussed at the time that another option would be to
> have patch reviewers ensure that the added DDL commands are also tested
> in the DDL-deparse framework, but nobody wants to add yet another thing
> that we have to remember (or, more likely, forget).
>
> > I see, it seems that a function to deparse DROP command to JSON output
> > is needed but not provided yet. I implemented a function
> > deparse_drop_ddl() in the testing harness, maybe we could consider
> > exposing a function in engine to deparse DROP command as
> > deparse_ddl_command() does.
>
> No objection against this idea.
>
> > updated to:
> >
> > 1, The deparsed JSON is the same as the expected string
>
> I would rather say "has the same effects as".
>
> > 1, Build DDL event triggers and DDL deparser into pg_regress tests so
> > that DDL commands in these tests can be captured and deparsed.
> > 2, Let the goal 3 implementation (aka the TAP test) execute test
> > cases from pg_regress; if the sub and pub nodes don't dump the same
> > results, some DDL commands must be changed.
> >
> > Solution 1 is lighter weight, as we only need to run pg_regress
> > once. Any other thoughts?
>
> We have several runs of pg_regress already -- apart from the normal one,
> we run it once in recovery/t/027_stream_regress.pl and once in the
> pg_upgrade test.  I'm not sure that we necessarily need to avoid another
> one here, particularly if avoiding it would potentially pollute the
> results for the regular tests.  I am okay with solution 2 personally.
>
> If we really wanted to optimize this, perhaps we should try to drive all
> three uses (pg_upgrade, stream_regress, this new test) from a single
> pg_regress run.  But ISTM we don't *have* to.
>
> --
> Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
> "Hay dos momentos en la vida de un hombre en los que no debería
> especular: cuando puede permitírselo y cuando no puede" (Mark Twain)
>
>




Re: Checksum errors in pg_stat_database

2022-12-11 Thread Michael Paquier
On Sun, Dec 11, 2022 at 08:48:15PM -0500, Tom Lane wrote:
> I think a stats table indexed solely by relfilenode wouldn't be a great
> idea, because you'd lose all the stats about a table as soon as you
> do vacuum full/cluster/rewriting-alter-table/etc.  Can we create two
> index structures over the same stats table entries, so you can look
> up by either relfilenode or OID?  I'm not quite sure how to manage
> rewrites, where you transiently have two relfilenodes for "the
> same" table ... maybe we could allow multiple pointers to the same
> stats entry??

FWIW, I am not sure that I would care much if we were to drop the
stats associated with a relfilenode on a rewrite.  In terms of checksum
failures, tuples are deformed, so if there is one checksum failure a
rewrite would just not happen.  The potential complexity is not really
appealing compared to the implementation simplicity and its gains, and
rewrites are lock-heavy, so I'd like to think that people avoid them
(cough)..
--
Michael


signature.asc
Description: PGP signature


Re: Tree-walker callbacks vs -Wdeprecated-non-prototype

2022-12-11 Thread Thomas Munro
On Mon, Dec 12, 2022 at 4:07 PM Tom Lane  wrote:
> Thomas Munro  writes:
> > As visible on seawasp and locally (16/main branch nightly packages),
> > they decided to start warning about these casts with a new strict
> > variant of the warning.  Their discussion:
>
> > https://reviews.llvm.org/D134831
>
> > There are also a few other cases unrelated to this thread's original
> > problem, for example casts involving pg_funcptr_t, HashCompareFunc.  I
> > guess our options would be to turn that warning off, or reconsider and
> > try shoving the cast of "generic" arguments pointers down into the
> > functions?
>
> I'm for "turn the warning off".  Per previous discussion, adhering
> strictly to that rule would make our code worse (less legible AND
> less safe), not better.

Alright, this seems to do the trick here.
From b01690ca859b5f27ece386577b7b9d9d7572b8d2 Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Mon, 12 Dec 2022 16:28:24 +1300
Subject: [PATCH] Disable clang 16's -Wcast-function-type-strict.

Clang 16 is still in development, but seawasp reveals that it has
started warning about many of our casts of function pointers.  Disable
the new warning for now, since otherwise buildfarm animal seawasp fails.

May be back-patched with other Clang/LLVM 16 changes around release
time.

Discussion: https://postgr.es/m/CA%2BhUKGJvX%2BL3aMN84ksT-cGy08VHErRNip3nV-WmTx7f6Pqhyw%40mail.gmail.com
---
 configure| 44 
 configure.ac |  6 ++
 meson.build  |  3 +++
 3 files changed, 53 insertions(+)

diff --git a/configure b/configure
index 62f382c1d1..5025448a05 100755
--- a/configure
+++ b/configure
@@ -6611,6 +6611,50 @@ fi
   if test -n "$NOT_THE_CFLAGS"; then
 CFLAGS="$CFLAGS -Wno-stringop-truncation"
   fi
+  # Suppress clang 16's strict warnings about function casts
+  NOT_THE_CFLAGS=""
+  { $as_echo "$as_me:${as_lineno-$LINENO}: checking whether ${CC} supports -Wcast-function-type-strict, for NOT_THE_CFLAGS" >&5
+$as_echo_n "checking whether ${CC} supports -Wcast-function-type-strict, for NOT_THE_CFLAGS... " >&6; }
+if ${pgac_cv_prog_CC_cflags__Wcast_function_type_strict+:} false; then :
+  $as_echo_n "(cached) " >&6
+else
+  pgac_save_CFLAGS=$CFLAGS
+pgac_save_CC=$CC
+CC=${CC}
+CFLAGS="${NOT_THE_CFLAGS} -Wcast-function-type-strict"
+ac_save_c_werror_flag=$ac_c_werror_flag
+ac_c_werror_flag=yes
+cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+
+int
+main ()
+{
+
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_c_try_compile "$LINENO"; then :
+  pgac_cv_prog_CC_cflags__Wcast_function_type_strict=yes
+else
+  pgac_cv_prog_CC_cflags__Wcast_function_type_strict=no
+fi
+rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
+ac_c_werror_flag=$ac_save_c_werror_flag
+CFLAGS="$pgac_save_CFLAGS"
+CC="$pgac_save_CC"
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $pgac_cv_prog_CC_cflags__Wcast_function_type_strict" >&5
+$as_echo "$pgac_cv_prog_CC_cflags__Wcast_function_type_strict" >&6; }
+if test x"$pgac_cv_prog_CC_cflags__Wcast_function_type_strict" = x"yes"; then
+  NOT_THE_CFLAGS="${NOT_THE_CFLAGS} -Wcast-function-type-strict"
+fi
+
+
+  if test -n "$NOT_THE_CFLAGS"; then
+CFLAGS="$CFLAGS -Wno-cast-function-type-strict"
+  fi
 elif test "$ICC" = yes; then
   # Intel's compiler has a bug/misoptimization in checking for
   # division by NAN (NaN == 0), -mp1 fixes it, so add it to the CFLAGS.
diff --git a/configure.ac b/configure.ac
index cfb10f59ce..4cd328fc83 100644
--- a/configure.ac
+++ b/configure.ac
@@ -573,6 +573,12 @@ if test "$GCC" = yes -a "$ICC" = no; then
   if test -n "$NOT_THE_CFLAGS"; then
 CFLAGS="$CFLAGS -Wno-stringop-truncation"
   fi
+  # Suppress clang 16's strict warnings about function casts
+  NOT_THE_CFLAGS=""
+  PGAC_PROG_CC_VAR_OPT(NOT_THE_CFLAGS, [-Wcast-function-type-strict])
+  if test -n "$NOT_THE_CFLAGS"; then
+CFLAGS="$CFLAGS -Wno-cast-function-type-strict"
+  fi
 elif test "$ICC" = yes; then
   # Intel's compiler has a bug/misoptimization in checking for
   # division by NAN (NaN == 0), -mp1 fixes it, so add it to the CFLAGS.
diff --git a/meson.build b/meson.build
index a62d8171cc..9df8685dfd 100644
--- a/meson.build
+++ b/meson.build
@@ -1773,6 +1773,9 @@ negative_warning_flags = [
   'format-truncation',
   'stringop-truncation',
 
+  # Suppress clang 16's strict warnings about function casts
+  'cast-function-type-strict',
+
   # To make warning_level=2 / -Wextra work, we'd need at least the following
   # 'clobbered',
   # 'missing-field-initializers',
-- 
2.35.1



Re: Improve WALRead() to suck data directly from WAL buffers when possible

2022-12-11 Thread Kyotaro Horiguchi
Sorry for the confusion.

At Mon, 12 Dec 2022 12:06:36 +0900 (JST), Kyotaro Horiguchi 
 wrote in 
> At Mon, 12 Dec 2022 11:57:17 +0900 (JST), Kyotaro Horiguchi 
>  wrote in 
> > This patch copies the bleeding-edge WAL page without recording the
> > (next) insertion point or checking whether all in-progress insertions
> > behind the target LSN have finished. Thus the copied page may have
> > holes.  That being said, the sequential-reading nature and the fact
> > that WAL buffers are zero-initialized may make it work for recovery,
> > but I don't think this also works for replication.
> 
> Mmm. I'm a bit dim. Recovery doesn't read concurrently-written
> records. Please forget about recovery.

NO... Logical walsenders do that. So, please forget about this...

> > I remember that one of the advantages of reading the in-memory WAL
> > records is that it allows the walsender to presend the unwritten
> > records. So perhaps we should manage how far the buffer is filled with
> > valid content (or how far we can presend) in this feature.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Tree-walker callbacks vs -Wdeprecated-non-prototype

2022-12-11 Thread Tom Lane
Thomas Munro  writes:
> As visible on seawasp and locally (16/main branch nightly packages),
> they decided to start warning about these casts with a new strict
> variant of the warning.  Their discussion:

> https://reviews.llvm.org/D134831

> There are also a few other cases unrelated to this thread's original
> problem, for example casts involving pg_funcptr_t, HashCompareFunc.  I
> guess our options would be to turn that warning off, or reconsider and
> try shoving the cast of "generic" arguments pointers down into the
> functions?

I'm for "turn the warning off".  Per previous discussion, adhering
strictly to that rule would make our code worse (less legible AND
less safe), not better.

regards, tom lane




Re: Improve WALRead() to suck data directly from WAL buffers when possible

2022-12-11 Thread Kyotaro Horiguchi
At Mon, 12 Dec 2022 11:57:17 +0900 (JST), Kyotaro Horiguchi 
 wrote in 
> This patch copies the bleeding-edge WAL page without recording the
> (next) insertion point or checking whether all in-progress insertions
> behind the target LSN have finished. Thus the copied page may have
> holes.  That being said, the sequential-reading nature and the fact
> that WAL buffers are zero-initialized may make it work for recovery,
> but I don't think this also works for replication.

Mmm. I'm a bit dim. Recovery doesn't read concurrently-written
records. Please forget about recovery.

> I remember that one of the advantages of reading the in-memory WAL
> records is that it allows the walsender to presend the unwritten
> records. So perhaps we should manage how far the buffer is filled with
> valid content (or how far we can presend) in this feature.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Improve WALRead() to suck data directly from WAL buffers when possible

2022-12-11 Thread Kyotaro Horiguchi
At Fri, 9 Dec 2022 14:33:39 +0530, Bharath Rupireddy 
 wrote in 
> The patch introduces concurrent readers for the WAL buffers, so far
> only there are concurrent writers. In the patch, WALRead() takes just
> one lock (WALBufMappingLock) in shared mode to enable concurrent
> readers and does minimal things - checks if the requested WAL page is
> present in WAL buffers, if so, copies the page and releases the lock.
> I think taking just WALBufMappingLock is enough here as the concurrent
> writers depend on it to initialize and replace a page in WAL buffers.
> 
> I'll add this to the next commitfest.
> 
> Thoughts?

This adds copying of a whole page (at least) on every WAL *record*
read, fighting all WAL writers by taking WALBufMappingLock on a very
busy page while copying. I'm a bit doubtful that it results in an
overall improvement. It seems to me that almost all pread()s here happen
on file buffers, so it is unclear to me that copying a whole WAL page
(then copying the target record again) wins over a pread() call that
copies only the record to read. Do you have an actual number for how
frequently WAL reads go to disk, or the actual performance
gain or real I/O reduction this patch offers?

This patch copies the bleeding-edge WAL page without recording the
(next) insertion point or checking whether all in-progress insertions
behind the target LSN have finished. Thus the copied page may have
holes.  That being said, the sequential-reading nature and the fact
that WAL buffers are zero-initialized may make it work for recovery,
but I don't think this also works for replication.

I remember that one of the advantages of reading the in-memory WAL
records is that it allows the walsender to presend the unwritten
records. So perhaps we should manage how far the buffer is filled with
valid content (or how far we can presend) in this feature.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Tree-walker callbacks vs -Wdeprecated-non-prototype

2022-12-11 Thread Thomas Munro
As visible on seawasp and locally (16/main branch nightly packages),
they decided to start warning about these casts with a new strict
variant of the warning.  Their discussion:

https://reviews.llvm.org/D134831

There are also a few other cases unrelated to this thread's original
problem, for example casts involving pg_funcptr_t, HashCompareFunc.  I
guess our options would be to turn that warning off, or reconsider and
try shoving the casts of "generic" argument pointers down into the
functions?




Re: Why does L Blink Tree need lock coupling?

2022-12-11 Thread Peter Geoghegan
On Sun, Dec 11, 2022 at 5:38 PM Oliver Yang  wrote:
> The README in nbtree mentions that the L&Y algorithm must couple
> locks when moving right during ascent for insertion.  However,
> it's hard to see why that's necessary.

You're not the first person to ask about this exact point in the last
few years; the last time, somebody asked me about it via private email.
I'm surprised that there is even that level of interest.

It's not necessary to couple locks on internal levels of the tree in
nbtree, of course. Which is why we don't couple locks in
_bt_getstackbuf().

Note that we have two "move right" functions here --
_bt_getstackbuf() and _bt_moveright() -- whereas the L&Y pseudocode has
one function for both things. Perhaps just because it's only abstract
pseudocode -- it needs to be understood in context.

You have to be realistic about how faithfully a real-world system can
be expected to implement something like L&Y. Some of the assumptions
made by the paper just aren't reasonable, especially the assumption
about atomic page reads (I always thought that that assumption was
odd). Plus nbtree does plenty of things that L&Y don't consider,
including things that are obviously related to the stuff they do talk
about. For example, nbtree has left links, while L&Y does not (nor
does Lanin & Shasha, though they do have something weirdly similar
that they call "out links").

> Since L&Y mostly
> discussed concurrent insertions and searches, what can go wrong
> if inserters only acquire one lock at a time?

You have to consider that we don't match on the separator key during
the ascent of the B-tree structure following a split. That's another
big difference between nbtree and the paper -- we store a block number
in our descent stack instead.

Imagine if PostgreSQL's nbtree did match on a separator key, like
Lehman and Yao. Think about this scenario:

* We descend the tree and follow a downlink, while remembering the
associated separator key on our descent stack. It is a simple 3 level
B-Tree.

* We split a leaf page, and have to relocate the separator 1 level up
(in level 1).

* The high key of the internal page on level 1 exactly matches our
separator key -- so it must be that the separator key is to the right.

* We release our lock on the original parent, and then lock and read
its right sibling. But where do we insert our new separator key?

We cannot match the original separator key because it doesn't exist in
this other internal page to the right -- there is no first separator
key in any internal page, including when it isn't the root of the
tree. Actually, you could say that there is a separator key associated
with the downlink, but it's only a negative infinity sentinel key.
Negative infinity keys represent "absolute negative infinity" when
they're from the root of the entire B-Tree, but in any other internal
page they represent "relative negative infinity" -- only lower than
everything in that particular subtree.

At the very least it seems risky to assume that it's safe to match on
the separator key without lock coupling so that we see that the
downlink really does come after the matches-descent-stack high key
separator in the original parent page. You could probably make it work
if you had to, but it's annoying to explain, and not actually that
valuable -- moving right within _bt_getstackbuf() is a rare case in
general (and trying to relocate the downlink that happens to become
the first downlink following a concurrent internal page split is even
rarer).

In PostgreSQL it's not annoying to understand why it's okay, because
it's obviously okay to just match on the downlink/block number
directly, which is how it has always worked. It only becomes a problem
when you try to understand what Lehman and Yao meant. It's unfortunate
that they say "at most 3 locks", and therefore draw attention to this
not-very-important issue. Lehman and Yao probably found it easier to
say "let's try to keep our paper simple by making the move right
routine couple locks in very rare cases where it is actually necessary
to move right".

> The Lanin paper cited in README also agrees that B-link
> structure allows inserts and searches to lock only one node at a
> time although it's not apparent in L&Y itself.

But the Lanin & Shasha paper has a far more optimistic approach. They
make rather bold claims about how many locks they can get away with
holding at any one time. That makes it significantly different to L&Y
as well as nbtree (nbtree is far closer to L&Y than it is to Lanin &
Shasha).

--
Peter Geoghegan




Re: Speedup generation of command completion tags

2022-12-11 Thread David Rowley
Thanks for having a look at this.

On Sun, 11 Dec 2022 at 11:05, Andres Freund  wrote:
>
> On 2022-12-10 20:32:06 +1300, David Rowley wrote:
> > @@ -20,13 +20,14 @@
> >  typedef struct CommandTagBehavior
> >  {
> >   const char *name;
> > + const uint8 namelen;
>
> Perhaps worth adding a comment noting that namelen is the length without the
> null byte?

I've added that now plus a few other fields that could be easily
documented. I left out commenting all of the remaining fields as
documenting table_rewrite_ok seemed slightly more than this patch
should be doing.  There are commands that rewrite tables, e.g. VACUUM
FULL, that have it set to false. It looks like this is only used for
commands that *might* rewrite the table. I didn't want to tackle that
in this patch.

> What about moving make_completion_tag() to cmdtag.c? Then we could just get
> the entire CommandTagBehaviour struct at once. It's not super pretty to pass
> QueryCompletion to a routine in cmdtag.c, but it's not awful. And if we deem
> it problematic, we could just pass qc->commandTag, qc->nprocessed as a
> separate arguments.

That seems like a good idea. I've renamed and moved the function in
the attached. I also adjusted how the trailing NUL char is appended to
avoid having to calculate len + 1 and append the NUL char twice for
the most commonly taken path.

> I wonder if any of the other GetCommandTagName() would benefit noticably from
> not having to compute the length. I guess the calls
> set_ps_display(GetCommandTagName()) calls in exec_simple_query() and
> exec_execute_message() might, although set_ps_display() isn't exactly zero
> overhead. But I do see it show up as a few percent in profiles, with the
> biggest contributor being the call to strlen.

I think that could be improved for sure.  It does seem like we'd need
to add set_ps_display_with_len() to make what you said work.  There's
probably lower hanging fruit in that function that could be fixed
without having to do that, however.  For example:

strlcpy(ps_buffer + ps_buffer_fixed_size, activity,
        ps_buffer_size - ps_buffer_fixed_size);
ps_buffer_cur_len = strlen(ps_buffer);

could be written as:

strlcpy(ps_buffer + ps_buffer_fixed_size, activity,
        ps_buffer_size - ps_buffer_fixed_size);
ps_buffer_cur_len = ps_buffer_fixed_size + Min(strlen(activity),
        ps_buffer_size - ps_buffer_fixed_size - 1);

That's pretty horrible to read though.

This sort of thing also makes me think that our investment in having
more usages of strlcpy() and fewer usages of strncpy() was partially a
mistake. There are exactly 2 usages of the return value of strlcpy()
in our entire source tree. That's about 1% of all calls.  Likely what
would be better is a function that returns the number of bytes
*actually* copied instead of one that returns the number of bytes it
would have copied if it hadn't run out of space. Such a function
could be defined as:

size_t
strdcpy(char * const dst, const char *src, ptrdiff_t len)
{
    char       *dstp = dst;

    while (len-- > 0)
    {
        if ((*dstp = *src++) == '\0')
        {
            *dstp = '\0';
            break;
        }
        dstp++;
    }
    return (dstp - dst);
}

Then we could append to strings like:

char buffer[STRING_SIZE];
char *bufp = buffer;

bufp += strdcpy(bufp, "01", STRING_SIZE - (bufp - buffer));
bufp += strdcpy(bufp, "23", STRING_SIZE - (bufp - buffer));
bufp += strdcpy(bufp, "45", STRING_SIZE - (bufp - buffer));

which allows transformation of the set_ps_display() code to:

ps_buffer_cur_len = ps_buffer_fixed_size;
ps_buffer_cur_len += strdcpy(ps_buffer + ps_buffer_fixed_size,
        activity, ps_buffer_size - ps_buffer_fixed_size);

(Take strdcpy as a placeholder for an actual name that does not
conflict with anything it shouldn't.)

I'd rather not go into too much detail about that here though. I don't
see any places that can make use of the known tag length without going
to the trouble of inventing new functions or changing the signature of
existing ones.

David
From 20b386ca1794421ec0f2aaf81a08ab6f0ed2d9e2 Mon Sep 17 00:00:00 2001
From: David Rowley 
Date: Sat, 10 Dec 2022 20:27:01 +1300
Subject: [PATCH v2] Speed up creation of command completion tags

Author: David Rowley, Andres Freund
---
 src/backend/tcop/cmdtag.c | 73 +--
 src/backend/tcop/dest.c   | 29 +++-
 src/include/tcop/cmdtag.h |  5 +++
 src/include/tcop/dest.h   |  2 --
 4 files changed, 79 insertions(+), 30 deletions(-)

diff --git a/src/backend/tcop/cmdtag.c b/src/backend/tcop/cmdtag.c
index 262484f561..8e52ff084a 100644
--- a/src/backend/tcop/cmdtag.c
+++ b/src/backend/tcop/cmdtag.c
@@ -15,18 +15,22 @@
 
 #include "miscadmin.h"
 #include "tcop/cmdtag.h"
+#include "utils/builtins.h"
 
 
 typedef struct CommandTagBehavior
 {
-   const char *name;
+   const char *name;   /* tag name, e.g. 
"SELECT" */
+   const uint8 namelen;/* set to 

Re: Checksum errors in pg_stat_database

2022-12-11 Thread Tom Lane
Michael Paquier  writes:
> On Sun, Dec 11, 2022 at 04:51:49PM -0800, Andres Freund wrote:
>> I think there's a good argument for starting to track some stats based on the
>> relfilenode, rather the oid, because it'd allow us to track e.g. the number 
>> of
>> writes for a relation too (we don't have the oid when writing out
>> buffers). But that's a relatively large change...

> Yeah.  I was thinking along the lines of sync requests and sync
> failures, as well.

I think a stats table indexed solely by relfilenode wouldn't be a great
idea, because you'd lose all the stats about a table as soon as you
do vacuum full/cluster/rewriting-alter-table/etc.  Can we create two
index structures over the same stats table entries, so you can look
up by either relfilenode or OID?  I'm not quite sure how to manage
rewrites, where you transiently have two relfilenodes for "the
same" table ... maybe we could allow multiple pointers to the same
stats entry??

regards, tom lane




Why does L&Y Blink Tree need lock coupling?

2022-12-11 Thread Oliver Yang
Hi,

The README in nbtree mentions that the L&Y algorithm must couple
locks when moving right during ascent for insertion.  However,
it's hard to see why that's necessary.  Since L&Y mostly
discussed concurrent insertions and searches, what can go wrong
if inserters only acquire one lock at a time?

The Lanin paper cited in README also agrees that B-link
structure allows inserts and searches to lock only one node at a
time although it's not apparent in L&Y itself.



Thanks,

Hong


Re: Checksum errors in pg_stat_database

2022-12-11 Thread Michael Paquier
On Sun, Dec 11, 2022 at 04:51:49PM -0800, Andres Freund wrote:
> Why were you thinking of tracking it separately from PgStat_StatTabEntry?

We only know the relfilenode when loading the page on a checksum
failure, not its parent relation, and there are things like physical
base backups where we would not know them anyway because we may not be
connected to a database.  Or perhaps it would be possible to link
table entries with their relfilenodes using some tweaks in the stat
APIs?  I am sure that you know the business in this area better than I
do currently :)

> I think there's a good argument for starting to track some stats based on the
> relfilenode, rather the oid, because it'd allow us to track e.g. the number of
> writes for a relation too (we don't have the oid when writing out
> buffers). But that's a relatively large change...

Yeah.  I was thinking along the lines of sync requests and sync
failures, as well.
--
Michael


signature.asc
Description: PGP signature


Re: Checksum errors in pg_stat_database

2022-12-11 Thread Andres Freund
Hi,

On 2022-12-12 08:40:04 +0900, Michael Paquier wrote:
> What about just adding a counter tracking the number of checksum
> failures for relfilenodes in a new structure related to them (note
> that I did not write PgStat_StatTabEntry)?

Why were you thinking of tracking it separately from PgStat_StatTabEntry?

I think there's a good argument for starting to track some stats based on the
relfilenode, rather the oid, because it'd allow us to track e.g. the number of
writes for a relation too (we don't have the oid when writing out
buffers). But that's a relatively large change...

Greetings,

Andres Freund




Re: GetNewObjectId question

2022-12-11 Thread Michael Paquier
On Sun, Dec 11, 2022 at 04:20:17PM +0900, Michael Paquier wrote:
> Looks OK seen from here.  Thanks for the patch!  I don't have much
> freshness and time today, but I can get back to this thread tomorrow.

Applied as of eae7fe4.
--
Michael


signature.asc
Description: PGP signature


Re: Force streaming every change in logical decoding

2022-12-11 Thread Peter Smith
On Sat, Dec 10, 2022 at 5:03 PM Dilip Kumar  wrote:
>
> On Tue, Dec 6, 2022 at 11:53 AM shiy.f...@fujitsu.com
>  wrote:
> >
> > Hi hackers,
> >
> > In logical decoding, when logical_decoding_work_mem is exceeded, the 
> > changes are
> > sent to output plugin in streaming mode. But there is a restriction that the
> > minimum value of logical_decoding_work_mem is 64kB. I tried to add a GUC to
> > allow sending every change to output plugin without waiting until
> > logical_decoding_work_mem is exceeded.
> >
> > This helps to test streaming mode. For example, to test "Avoid streaming the
> > transaction which are skipped" [1], it needs many XLOG_XACT_INVALIDATIONS
> > messages. With the new option, it can be tested with fewer changes and in 
> > less
> > time. Also, this new option helps to test more scenarios for "Perform 
> > streaming
> > logical transactions by background workers" [2].
>
> Some comments on the patch
>
...
> This GUC name "force_stream_mode" somehow appears like we are forcing
> this streaming mode irrespective of whether the
> subscriber has requested for this mode or not.  But actually it is not
> that, it is just streaming each change if
> it is enabled.  So we might need to think on the name (at least we
> should avoid using *mode* in the name IMHO).
>

I thought the same. Names like those shown below might be more appropriate:
stream_checks_work_mem = true/false
stream_mode_checks_size = true/false
stream_for_large_tx_only = true/false
... etc.

The GUC name length could get a bit unwieldy but isn't it better for
it to have the correct meaning than to have a shorter but slightly
ambiguous name? Anyway, it is a developer option so I guess longer
names are less of a problem.

--
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Checksum errors in pg_stat_database

2022-12-11 Thread Michael Paquier
On Sun, Dec 11, 2022 at 09:18:42PM +0100, Magnus Hagander wrote:
> It would be less of a concern yes, but I think it still would be a concern.
> If you have a large amount of corruption you could quickly get to millions
> of rows to keep track of which would definitely be a problem in shared
> memory as well, wouldn't it?

Yes.  I have discussed this item with Bertrand off-list and I share
the same concern.  This would lead to a lot of extra workload on a
large seqscan of a corrupted relation when the stats are written
(shutdown delay), while bloating shared memory with potentially
millions of items even if variable lists are handled through a dshash
and DSM.

> But perhaps we could keep a list of "the last 100 checksum failures" or
> something like that?

Applying a threshold is one solution.  Now, a second thing I have seen
in the past is that some disk partitions were busted but not others,
and the current database-level counters are not enough to make a
difference when it comes to grab patterns in this area.  A list of the
last N failures may be able to show some pattern, but that would be
like analyzing things with a lot of noise without a clear conclusion.
Anyway, the workload caused by the threshold number had better be
measured before being decided on (a large set of relation files with a
full range of blocks corrupted, much better if these are in the OS
cache when scanned), which does not change the need for a benchmark.

What about just adding a counter tracking the number of checksum
failures for relfilenodes in a new structure related to them (note
that I did not write PgStat_StatTabEntry)?

If we do that, it is then possible to cross-check the failures with
tablespaces, which would point to disk areas that are more sensitive
to corruption.
--
Michael


signature.asc
Description: PGP signature


Re: PGDOCS - Logical replication GUCs - added some xrefs

2022-12-11 Thread Tom Lane
Peter Smith  writes:
> On Sat, Dec 10, 2022 at 5:10 AM samay sharma  wrote:
>> Also, I don't see this patch in the 2023/01 commitfest. Might be worth 
>> moving to that one.

> Hmm, it was already recorded in the 2022-11 commitfest [1], so I
> assumed it would just carry forward to the next one.

Ian is still working on closing out the November 'fest :-(.
I suspect that in a day or so that one will get moved, and
you will have duplicate entries in the January 'fest.

regards, tom lane




Re: PGDOCS - Logical replication GUCs - added some xrefs

2022-12-11 Thread Peter Smith
On Sat, Dec 10, 2022 at 5:10 AM samay sharma  wrote:
>

>
> I don't have any other feedback. This looks good to me.
>
> Also, I don't see this patch in the 2023/01 commitfest. Might be worth moving 
> to that one.
>

Hmm, it was already recorded in the 2022-11 commitfest [1], so I
assumed it would just carry forward to the next one.

Anyway, I've added it again to 2023-01 commitfest [2]. Thanks for telling me.

--
[1] 2022-11 CF - https://commitfest.postgresql.org/40/3959/
[2] 2023-01 CF - https://commitfest.postgresql.org/41/4061/

Kind Regards,
Peter Smith.
Fujitsu Australia.




Date-time extraneous fields with reserved keywords

2022-12-11 Thread Joseph Koshakow
Hi all,

Attached is a patch to fix another parsing error for date-time types
that allow extraneous fields with certain reserved keywords. For
example both `date '1995-08-06 epoch'` and `date 'today epoch'` were
considered valid dates that both resolve to 1970-01-01.

- Joe Koshakow
From fb4c161afff08b926eea12d8689a148e99cbdb5c Mon Sep 17 00:00:00 2001
From: Joseph Koshakow 
Date: Sun, 11 Dec 2022 16:08:43 -0500
Subject: [PATCH] Handle extraneous fields in date-time input

DecodeDateTime sometimes allowed extraneous fields to be included with
reserved keywords. For example `date '1995-08-06 epoch'` would be
parsed successfully, but the date was ignored. This commit fixes the
issue so an error is returned instead.
---
 src/backend/utils/adt/datetime.c|  3 +++
 src/test/regress/expected/date.out  | 17 +
 src/test/regress/expected/timestamp.out | 17 +
 src/test/regress/sql/date.sql   |  6 ++
 src/test/regress/sql/timestamp.sql  |  6 ++
 5 files changed, 49 insertions(+)

diff --git a/src/backend/utils/adt/datetime.c b/src/backend/utils/adt/datetime.c
index b5b117a8ca..1e141a06f4 100644
--- a/src/backend/utils/adt/datetime.c
+++ b/src/backend/utils/adt/datetime.c
@@ -1431,6 +1431,9 @@ DecodeDateTime(char **field, int *ftype, int nf,
 break;
 
 			default:
+/* only allowed if we haven't already parsed some fields */
+if (fmask)
+	return DTERR_BAD_FORMAT;
 *dtype = val;
 		}
 
diff --git a/src/test/regress/expected/date.out b/src/test/regress/expected/date.out
index f8f83e40e9..50a4a52d8c 100644
--- a/src/test/regress/expected/date.out
+++ b/src/test/regress/expected/date.out
@@ -1526,3 +1526,20 @@ select make_time(10, 55, 100.1);
 ERROR:  time field value out of range: 10:55:100.1
 select make_time(24, 0, 2.1);
 ERROR:  time field value out of range: 24:00:2.1
+-- test errors with reserved keywords
+SELECT date '1995-08-06 epoch';
+ERROR:  invalid input syntax for type date: "1995-08-06 epoch"
+LINE 1: SELECT date '1995-08-06 epoch';
+^
+SELECT date '1995-08-06 infinity';
+ERROR:  invalid input syntax for type date: "1995-08-06 infinity"
+LINE 1: SELECT date '1995-08-06 infinity';
+^
+SELECT date '1995-08-06 -infinity';
+ERROR:  invalid input syntax for type date: "1995-08-06 -infinity"
+LINE 1: SELECT date '1995-08-06 -infinity';
+^
+SELECT date 'now infinity';
+ERROR:  invalid input syntax for type date: "now infinity"
+LINE 1: SELECT date 'now infinity';
+^
diff --git a/src/test/regress/expected/timestamp.out b/src/test/regress/expected/timestamp.out
index be66274738..f68ecd19ea 100644
--- a/src/test/regress/expected/timestamp.out
+++ b/src/test/regress/expected/timestamp.out
@@ -2110,3 +2110,20 @@ select * from generate_series('2020-01-01 00:00'::timestamp,
   '2020-01-02 03:00'::timestamp,
   '0 hour'::interval);
 ERROR:  step size cannot equal zero
+-- test errors with reserved keywords
+SELECT timestamp '1995-08-06 01:01:01 epoch';
+ERROR:  invalid input syntax for type timestamp: "1995-08-06 01:01:01 epoch"
+LINE 1: SELECT timestamp '1995-08-06 01:01:01 epoch';
+ ^
+SELECT timestamp '1995-08-06 01:01:01 infinity';
+ERROR:  invalid input syntax for type timestamp: "1995-08-06 01:01:01 infinity"
+LINE 1: SELECT timestamp '1995-08-06 01:01:01 infinity';
+ ^
+SELECT timestamp '1995-08-06 01:01:01 -infinity';
+ERROR:  invalid input syntax for type timestamp: "1995-08-06 01:01:01 -infinity"
+LINE 1: SELECT timestamp '1995-08-06 01:01:01 -infinity';
+ ^
+SELECT timestamp 'today epoch';
+ERROR:  invalid input syntax for type timestamp: "today epoch"
+LINE 1: SELECT timestamp 'today epoch';
+ ^
diff --git a/src/test/regress/sql/date.sql b/src/test/regress/sql/date.sql
index 9fd15be5f9..82da992e3a 100644
--- a/src/test/regress/sql/date.sql
+++ b/src/test/regress/sql/date.sql
@@ -371,3 +371,9 @@ select make_date(2013, 13, 1);
 select make_date(2013, 11, -1);
 select make_time(10, 55, 100.1);
 select make_time(24, 0, 2.1);
+
+-- test errors with reserved keywords
+SELECT date '1995-08-06 epoch';
+SELECT date '1995-08-06 infinity';
+SELECT date '1995-08-06 -infinity';
+SELECT date 'now infinity';
diff --git a/src/test/regress/sql/timestamp.sql b/src/test/regress/sql/timestamp.sql
index e1175b12ce..f7e3fe1270 100644
--- a/src/test/regress/sql/timestamp.sql
+++ b/src/test/regress/sql/timestamp.sql
@@ -391,3 +391,9 @@ select generate_series('2022-01-01 00:00'::timestamp,
 select * from generate_series('2020-01-01 00:00'::timestamp,
   '2020-01-02 03:00'::timestamp,
   '0 hour'::interval);
+
+-- test errors with reserved keywords
+SELECT timestamp '1995-08-06 01:01:01 epoch';
+SELECT timestamp '1995-08-06 01:01:01 infinity';

Re: static assert cleanup

2022-12-11 Thread Peter Smith
On Fri, Dec 9, 2022 at 6:47 PM Peter Eisentraut
 wrote:
>
> A number of static assertions could be moved to better places.
>
> We first added StaticAssertStmt() in 2012, which required all static
> assertions to be inside function bodies.  We then added
> StaticAssertDecl() in 2020, which enabled static assertions on file
> level.  We have a number of calls that were stuck in not-really-related
> functions for this historical reason.  This patch set cleans that up.
>
> 0001-Update-static-assert-usage-comment.patch
>
> This updates the usage information in c.h to be more current and precise.
>
> 0002-Move-array-size-related-static-assertions.patch
>
> This moves some obviously poorly placed ones.
>
> 0003-Move-some-static-assertions-to-better-places.patch
>
> This moves some that I thought were suboptimally placed but it could be
> debated or refined.
>
> 0004-Use-StaticAssertDecl-where-possible.patch
>
> This just changes some StaticAssertStmt() to StaticAssertDecl() where
> appropriate.  It's optional.

Patch 0002

diff --git a/src/backend/utils/cache/syscache.c
b/src/backend/utils/cache/syscache.c
index eec644ec84..bb3dd6f4d2 100644
--- a/src/backend/utils/cache/syscache.c
+++ b/src/backend/utils/cache/syscache.c
@@ -1040,6 +1040,9 @@ static const struct cachedesc cacheinfo[] = {
  }
 };

+StaticAssertDecl(SysCacheSize == (int) lengthof(cacheinfo),
+ "SysCacheSize does not match syscache.c's array");
+
 static CatCache *SysCache[SysCacheSize];

In almost every example I found of StaticAssertXXX, the lengthof(arr)
part came first in the condition. Since you are modifying this anyway,
should this one also be reversed for consistency?

==

Patch 0004

diff --git a/src/backend/executor/execExprInterp.c
b/src/backend/executor/execExprInterp.c
index 1dab2787b7..ec26ae506f 100644
--- a/src/backend/executor/execExprInterp.c
+++ b/src/backend/executor/execExprInterp.c
@@ -496,7 +496,7 @@ ExecInterpExpr(ExprState *state, ExprContext
*econtext, bool *isnull)
  &_EEOP_LAST
  };

- StaticAssertStmt(EEOP_LAST + 1 == lengthof(dispatch_table),
+ StaticAssertDecl(EEOP_LAST + 1 == lengthof(dispatch_table),
  "dispatch_table out of whack with ExprEvalOp");

Ditto the previous comment.

--
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Error-safe user functions

2022-12-11 Thread Tom Lane
Andres Freund  writes:
> On 2022-12-11 12:24:11 -0500, Tom Lane wrote:
>> I spent some time looking at this, and was discouraged to conclude
>> that the notational mess would probably be substantially out of
>> proportion to the value.  The main problem is that we'd have to change
>> the API of pushJsonbValue, which has more than 150 call sites, most
>> of which would need to grow a new test for failure return.  Maybe
>> somebody will feel like tackling that at some point, but not me today.

> Could we address this more minimally by putting the error state into the
> JsonbParseState and add a check for that error state to convertToJsonb() or
> such (by passing in the JsonbParseState)? We'd need to return immediately in
> pushJsonbValue() if there's already an error, but that's not too bad.

We could shoehorn error state into the JsonbParseState, although the
fact that that stack normally starts out empty is a bit of a problem.
I think you'd have to push a dummy entry if you want soft errors,
store the error state pointer into that, and have pushState() copy
down the parent's error pointer.  Kind of ugly, but do-able.  Whether
it's better than replacing that argument with a pointer-to-struct-
that-includes-the-stack-and-the-error-pointer wasn't real clear to me.

What seemed like a mess was getting the calling code to quit early.
I'm not convinced that just putting an immediate exit into pushJsonbValue
would be enough, because the callers tend to assume a series of calls
will behave as they expect.  Probably some of the call sites could
ignore the issue, but you'd still end up with a lot of messy changes,
I fear.

regards, tom lane




Re: Error-safe user functions

2022-12-11 Thread Andres Freund
Hi,

On 2022-12-11 12:24:11 -0500, Tom Lane wrote:
> Andrew Dunstan  writes:
> > On 2022-12-10 Sa 14:38, Tom Lane wrote:
> >> I have not done anything here about errors within JsonbValueToJsonb.
> >> There would need to be another round of API-extension in that area
> >> if we want to be able to trap its errors.  As you say, those are mostly
> >> about exceeding implementation size limits, so I suppose one could argue
> >> that they are not so different from palloc failure.  It's still annoying.
> >> If people are good with the changes attached, I might take a look at
> >> that.
> 
> > Awesome.
> 
> I spent some time looking at this, and was discouraged to conclude
> that the notational mess would probably be substantially out of
> proportion to the value.  The main problem is that we'd have to change
> the API of pushJsonbValue, which has more than 150 call sites, most
> of which would need to grow a new test for failure return.  Maybe
> somebody will feel like tackling that at some point, but not me today.

Could we address this more minimally by putting the error state into the
JsonbParseState and add a check for that error state to convertToJsonb() or
such (by passing in the JsonbParseState)? We'd need to return immediately in
pushJsonbValue() if there's already an error, but that's not too bad.

Greetings,

Andres Freund




Re: Checksum errors in pg_stat_database

2022-12-11 Thread Magnus Hagander
On Thu, Dec 8, 2022 at 2:35 PM Drouvot, Bertrand <
bertranddrouvot...@gmail.com> wrote:

>
>
> On 4/2/19 7:06 PM, Magnus Hagander wrote:
> > On Tue, Apr 2, 2019 at 8:47 AM Michael Paquier  > wrote:
> >
> > On Tue, Apr 02, 2019 at 07:43:12AM +0200, Julien Rouhaud wrote:
> >  > On Tue, Apr 2, 2019 at 6:56 AM Michael Paquier <
> mich...@paquier.xyz > wrote:
> >  >>  One thing which is not
> >  >> proposed on this patch, and I am fine with it as a first draft,
> is
> >  >> that we don't have any information about the broken block number
> and
> >  >> the file involved.  My gut tells me that we'd want a separate
> view,
> >  >> like pg_stat_checksums_details with one tuple per (dboid, rel,
> fork,
> >  >> blck) to be complete.  But that's just for future work.
> >  >
> >  > That could indeed be nice.
> >
> > Actually, backpedaling on this one...  pg_stat_checksums_details may
> > be a bad idea as we could finish with one row per broken block.  If
> > a corruption is spreading quickly, pgstat would not be able to
> sustain
> > that amount of objects.  Having pg_stat_checksums would allow us to
> > plugin more data easily based on the last failure state:
> > - last relid of failure
> > - last fork type of failure
> > - last block number of failure.
> > Not saying to do that now, but having that in pg_stat_database does
> > not seem very natural to me.  And on top of that we would have an
> > extra row full of NULLs for shared objects in pg_stat_database if we
> > adopt the unique view approach...  I find that rather ugly.
> >
> >
> > I think that tracking each and every block is of course a non-starter,
> as you've noticed.
>
> I think that's less of a concern now that the stats collector process has
> gone and that the stats are now collected in shared memory, what do you
> think?
>

It would be less of a concern, yes, but I think it still would be a
concern. If you have a large amount of corruption you could quickly get
to millions of rows to keep track of, which would definitely be a
problem in shared memory as well, wouldn't it?

But perhaps we could keep a list of "the last 100 checksum failures" or
something like that?

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: Error-safe user functions

2022-12-11 Thread Tom Lane
Andrew Dunstan  writes:
> Maybe as we work through the remaining input functions (there are about
> 60 core candidates left on my list) we should mark them with a comment
> if no adjustment is needed.

I did a quick pass through them last night.  Assuming that we don't
need to touch the unimplemented input functions (eg for pseudotypes),
I count these core functions as still needing work:

aclitemin
bit_in
box_in
bpcharin
byteain
cash_in
cidin
cidr_in
circle_in
inet_in
int2vectorin
jsonpath_in
line_in
lseg_in
macaddr8_in
macaddr_in
multirange_in
namein
oidin
oidvectorin
path_in
pg_lsn_in
pg_snapshot_in
point_in
poly_in
range_in
regclassin
regcollationin
regconfigin
regdictionaryin
regnamespacein
regoperatorin
regoperin
regprocedurein
regprocin
regrolein
regtypein
tidin
tsqueryin
tsvectorin
uuid_in
varbit_in
varcharin
xid8in
xidin
xml_in

and these contrib functions:

hstore:
hstore_in
intarray:
bqarr_in
isn:
ean13_in
isbn_in
ismn_in
issn_in
upc_in
ltree:
ltree_in
lquery_in
ltxtq_in
seg:
seg_in

Maybe we should have a conversation about which of these are
highest priority to get to a credible feature.  We clearly need
to fix the remaining SQL-spec types (varchar and bpchar, mainly).
At the other extreme, likely nobody would weep if we never fixed
int2vectorin, for instance.

I'm a little concerned about the cost-benefit of fixing the reg* types.
The ones that accept type names actually use the core grammar to parse
those.  Now, we probably could fix the grammar to be non-throwing, but
it'd be very invasive and I'm not sure about the performance impact.
It might be best to content ourselves with soft reporting of lookup
failures, as opposed to syntax problems.

regards, tom lane




Re: Error-safe user functions

2022-12-11 Thread Andrew Dunstan


On 2022-12-11 Su 12:24, Tom Lane wrote:
> Andrew Dunstan  writes:
>> On 2022-12-10 Sa 14:38, Tom Lane wrote:
>>> I have not done anything here about errors within JsonbValueToJsonb.
>>> There would need to be another round of API-extension in that area
>>> if we want to be able to trap its errors.  As you say, those are mostly
>>> about exceeding implementation size limits, so I suppose one could argue
>>> that they are not so different from palloc failure.  It's still annoying.
>>> If people are good with the changes attached, I might take a look at
>>> that.
>> Awesome.
> I spent some time looking at this, and was discouraged to conclude
> that the notational mess would probably be substantially out of
> proportion to the value.  The main problem is that we'd have to change
> the API of pushJsonbValue, which has more than 150 call sites, most
> of which would need to grow a new test for failure return.  Maybe
> somebody will feel like tackling that at some point, but not me today.
>
>   


Yes, I had similar feelings when I looked at it. I don't think this
needs to hold up proceeding with the SQL/JSON rework, which I think can
reasonably restart now.

Maybe as we work through the remaining input functions (there are about
60 core candidates left on my list) we should mark them with a comment
if no adjustment is needed.

I'm going to look at jsonpath and the text types next. I'm somewhat tied
up this week but might get to relook at pushJsonbValue later in the month.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Error-safe user functions

2022-12-11 Thread Tom Lane
Andrew Dunstan  writes:
> On 2022-12-10 Sa 14:38, Tom Lane wrote:
>> I have not done anything here about errors within JsonbValueToJsonb.
>> There would need to be another round of API-extension in that area
>> if we want to be able to trap its errors.  As you say, those are mostly
>> about exceeding implementation size limits, so I suppose one could argue
>> that they are not so different from palloc failure.  It's still annoying.
>> If people are good with the changes attached, I might take a look at
>> that.

> Awesome.

I spent some time looking at this, and was discouraged to conclude
that the notational mess would probably be substantially out of
proportion to the value.  The main problem is that we'd have to change
the API of pushJsonbValue, which has more than 150 call sites, most
of which would need to grow a new test for failure return.  Maybe
somebody will feel like tackling that at some point, but not me today.

regards, tom lane




Re: Date-Time dangling unit fix

2022-12-11 Thread Joseph Koshakow
On Sun, Dec 11, 2022 at 10:29 AM Joseph Koshakow  wrote:
>
> Hi all,
>
> Attached is a patch to fix a parsing error for date-time types that
> allow dangling units in the input. For example,
> `date '1995-08-06 m y d'` was considered a valid date and the dangling
> units were ignored.
>
> Intervals also suffer from a similar issue, but the attached patch
> doesn't fix that issue. For example,
> `interval '1 day second month 6 hours days years ago'` is parsed as a
> valid interval with -1 days and -6 hours. I'm hoping to fix that in a
> later patch, but it will likely be more complicated than the other
> date-time fixes.
>
> - Joe Koshakow

I think I sent that to the wrong email address.
From fbcf39211fc7a379ea021160298604694383d56c Mon Sep 17 00:00:00 2001
From: Joseph Koshakow 
Date: Sat, 10 Dec 2022 18:59:26 -0500
Subject: [PATCH] Handle dangling units in date-time input

DecodeDateTime and DecodeTimeOnly allowed dangling unit types on input
without returning an error. For example, `date '1995-08-06 m y d'` was
considered a valid date and the dangling units were ignored. This
commit fixes this issue so an error is returned instead.
---
 src/backend/utils/adt/datetime.c  | 8 
 src/test/regress/expected/date.out| 5 +
 src/test/regress/expected/time.out| 5 +
 src/test/regress/expected/timestamp.out   | 5 +
 src/test/regress/expected/timestamptz.out | 5 +
 src/test/regress/expected/timetz.out  | 5 +
 src/test/regress/sql/date.sql | 3 +++
 src/test/regress/sql/time.sql | 3 +++
 src/test/regress/sql/timestamp.sql| 3 +++
 src/test/regress/sql/timestamptz.sql  | 3 +++
 src/test/regress/sql/timetz.sql   | 3 +++
 11 files changed, 48 insertions(+)

diff --git a/src/backend/utils/adt/datetime.c b/src/backend/utils/adt/datetime.c
index b5b117a8ca..a985d1b6ea 100644
--- a/src/backend/utils/adt/datetime.c
+++ b/src/backend/utils/adt/datetime.c
@@ -1566,6 +1566,10 @@ DecodeDateTime(char **field, int *ftype, int nf,
 		fmask |= tmask;
 	}			/* end loop over fields */
 
+	/* prefix type was dangling and never handled */
+	if (ptype != 0)
+		return DTERR_BAD_FORMAT;
+
 	/* do final checking/adjustment of Y/M/D fields */
 	dterr = ValidateDate(fmask, isjulian, is2digits, bc, tm);
 	if (dterr)
@@ -2415,6 +2419,10 @@ DecodeTimeOnly(char **field, int *ftype, int nf,
 		fmask |= tmask;
 	}			/* end loop over fields */
 
+	/* prefix type was dangling and never handled */
+	if (ptype != 0)
+		return DTERR_BAD_FORMAT;
+
 	/* do final checking/adjustment of Y/M/D fields */
 	dterr = ValidateDate(fmask, isjulian, is2digits, bc, tm);
 	if (dterr)
diff --git a/src/test/regress/expected/date.out b/src/test/regress/expected/date.out
index f8f83e40e9..fec466a594 100644
--- a/src/test/regress/expected/date.out
+++ b/src/test/regress/expected/date.out
@@ -1526,3 +1526,8 @@ select make_time(10, 55, 100.1);
 ERROR:  time field value out of range: 10:55:100.1
 select make_time(24, 0, 2.1);
 ERROR:  time field value out of range: 24:00:2.1
+-- test error on dangling units
+SELECT date '1995-08-06 m';
+ERROR:  invalid input syntax for type date: "1995-08-06 m"
+LINE 1: SELECT date '1995-08-06 m';
+^
diff --git a/src/test/regress/expected/time.out b/src/test/regress/expected/time.out
index a44caededd..5f7058eca8 100644
--- a/src/test/regress/expected/time.out
+++ b/src/test/regress/expected/time.out
@@ -229,3 +229,8 @@ SELECT date_part('epoch',   TIME '2020-05-26 13:30:25.575401');
  48625.575401
 (1 row)
 
+-- test error on dangling units
+SELECT time '12:30:15 d';
+ERROR:  invalid input syntax for type time: "12:30:15 d"
+LINE 1: SELECT time '12:30:15 d';
+^
diff --git a/src/test/regress/expected/timestamp.out b/src/test/regress/expected/timestamp.out
index be66274738..6fce7319eb 100644
--- a/src/test/regress/expected/timestamp.out
+++ b/src/test/regress/expected/timestamp.out
@@ -2110,3 +2110,8 @@ select * from generate_series('2020-01-01 00:00'::timestamp,
   '2020-01-02 03:00'::timestamp,
   '0 hour'::interval);
 ERROR:  step size cannot equal zero
+-- test error on dangling units
+SELECT timestamp '1995-08-06 12:30:15 y';
+ERROR:  invalid input syntax for type timestamp: "1995-08-06 12:30:15 y"
+LINE 1: SELECT timestamp '1995-08-06 12:30:15 y';
+ ^
diff --git a/src/test/regress/expected/timestamptz.out b/src/test/regress/expected/timestamptz.out
index fb06acbccc..565c5595ea 100644
--- a/src/test/regress/expected/timestamptz.out
+++ b/src/test/regress/expected/timestamptz.out
@@ -3085,3 +3085,8 @@ select * from tmptz where f1 at time zone 'utc' = '2017-01-18 00:00';
  Tue Jan 17 16:00:00 2017 PST
 (1 row)
 
+-- test error on dangling units
+SELECT timestamptz '1995-08-06 12:30:15 m';
+ERROR:  invalid input syntax for type timestamp with time zone: "1995-08-06 12:30:15 m"
+LINE 1: SELECT timestamptz '1995-08-06 12:30:15 m';
+   ^

Date-Time dangling unit fix

2022-12-11 Thread Joseph Koshakow
Hi all,

Attached is a patch to fix a parsing error for date-time types that
allow dangling units in the input. For example,
`date '1995-08-06 m y d'` was considered a valid date and the dangling
units were ignored.

Intervals also suffer from a similar issue, but the attached patch
doesn't fix that issue. For example,
`interval '1 day second month 6 hours days years ago'` is parsed as a
valid interval with -1 days and -6 hours. I'm hoping to fix that in a
later patch, but it will likely be more complicated than the other
date-time fixes.

- Joe Koshakow
From fbcf39211fc7a379ea021160298604694383d56c Mon Sep 17 00:00:00 2001
From: Joseph Koshakow 
Date: Sat, 10 Dec 2022 18:59:26 -0500
Subject: [PATCH] Handle dangling units in date-time input

DecodeDateTime and DecodeTimeOnly allowed dangling unit types on input
without returning an error. For example, `date '1995-08-06 m y d'` was
considered a valid date and the dangling units were ignored. This
commit fixes this issue so an error is returned instead.
---
 src/backend/utils/adt/datetime.c  | 8 
 src/test/regress/expected/date.out| 5 +
 src/test/regress/expected/time.out| 5 +
 src/test/regress/expected/timestamp.out   | 5 +
 src/test/regress/expected/timestamptz.out | 5 +
 src/test/regress/expected/timetz.out  | 5 +
 src/test/regress/sql/date.sql | 3 +++
 src/test/regress/sql/time.sql | 3 +++
 src/test/regress/sql/timestamp.sql| 3 +++
 src/test/regress/sql/timestamptz.sql  | 3 +++
 src/test/regress/sql/timetz.sql   | 3 +++
 11 files changed, 48 insertions(+)

diff --git a/src/backend/utils/adt/datetime.c b/src/backend/utils/adt/datetime.c
index b5b117a8ca..a985d1b6ea 100644
--- a/src/backend/utils/adt/datetime.c
+++ b/src/backend/utils/adt/datetime.c
@@ -1566,6 +1566,10 @@ DecodeDateTime(char **field, int *ftype, int nf,
 		fmask |= tmask;
 	}			/* end loop over fields */
 
+	/* prefix type was dangling and never handled */
+	if (ptype != 0)
+		return DTERR_BAD_FORMAT;
+
 	/* do final checking/adjustment of Y/M/D fields */
 	dterr = ValidateDate(fmask, isjulian, is2digits, bc, tm);
 	if (dterr)
@@ -2415,6 +2419,10 @@ DecodeTimeOnly(char **field, int *ftype, int nf,
 		fmask |= tmask;
 	}			/* end loop over fields */
 
+	/* prefix type was dangling and never handled */
+	if (ptype != 0)
+		return DTERR_BAD_FORMAT;
+
 	/* do final checking/adjustment of Y/M/D fields */
 	dterr = ValidateDate(fmask, isjulian, is2digits, bc, tm);
 	if (dterr)
diff --git a/src/test/regress/expected/date.out b/src/test/regress/expected/date.out
index f8f83e40e9..fec466a594 100644
--- a/src/test/regress/expected/date.out
+++ b/src/test/regress/expected/date.out
@@ -1526,3 +1526,8 @@ select make_time(10, 55, 100.1);
 ERROR:  time field value out of range: 10:55:100.1
 select make_time(24, 0, 2.1);
 ERROR:  time field value out of range: 24:00:2.1
+-- test error on dangling units
+SELECT date '1995-08-06 m';
+ERROR:  invalid input syntax for type date: "1995-08-06 m"
+LINE 1: SELECT date '1995-08-06 m';
+^
diff --git a/src/test/regress/expected/time.out b/src/test/regress/expected/time.out
index a44caededd..5f7058eca8 100644
--- a/src/test/regress/expected/time.out
+++ b/src/test/regress/expected/time.out
@@ -229,3 +229,8 @@ SELECT date_part('epoch',   TIME '2020-05-26 13:30:25.575401');
  48625.575401
 (1 row)
 
+-- test error on dangling units
+SELECT time '12:30:15 d';
+ERROR:  invalid input syntax for type time: "12:30:15 d"
+LINE 1: SELECT time '12:30:15 d';
+^
diff --git a/src/test/regress/expected/timestamp.out b/src/test/regress/expected/timestamp.out
index be66274738..6fce7319eb 100644
--- a/src/test/regress/expected/timestamp.out
+++ b/src/test/regress/expected/timestamp.out
@@ -2110,3 +2110,8 @@ select * from generate_series('2020-01-01 00:00'::timestamp,
   '2020-01-02 03:00'::timestamp,
   '0 hour'::interval);
 ERROR:  step size cannot equal zero
+-- test error on dangling units
+SELECT timestamp '1995-08-06 12:30:15 y';
+ERROR:  invalid input syntax for type timestamp: "1995-08-06 12:30:15 y"
+LINE 1: SELECT timestamp '1995-08-06 12:30:15 y';
+ ^
diff --git a/src/test/regress/expected/timestamptz.out b/src/test/regress/expected/timestamptz.out
index fb06acbccc..565c5595ea 100644
--- a/src/test/regress/expected/timestamptz.out
+++ b/src/test/regress/expected/timestamptz.out
@@ -3085,3 +3085,8 @@ select * from tmptz where f1 at time zone 'utc' = '2017-01-18 00:00';
  Tue Jan 17 16:00:00 2017 PST
 (1 row)
 
+-- test error on dangling units
+SELECT timestamptz '1995-08-06 12:30:15 m';
+ERROR:  invalid input syntax for type timestamp with time zone: "1995-08-06 12:30:15 m"
+LINE 1: SELECT timestamptz '1995-08-06 12:30:15 m';
+   ^
diff --git a/src/test/regress/expected/timetz.out b/src/test/regress/expected/timetz.out

Re: Error-safe user functions

2022-12-11 Thread Andrew Dunstan


On 2022-12-10 Sa 19:00, Tom Lane wrote:
> Looking at my notes, there's one other loose end
> I forgot to mention:
>
>  * Note: pg_unicode_to_server() will throw an error for a
>  * conversion failure, rather than returning a failure
>  * indication.  That seems OK.
>
> We ought to do something about that, but I'm not sure how hard we
> ought to work at it.  Perhaps it's sufficient to make a variant of
> pg_unicode_to_server that just returns true/false instead of failing,
> and add a JsonParseErrorType for "untranslatable character" to let
> json_errdetail return a reasonably on-point message. 


Seems reasonable.


>  We could imagine
> extending the ErrorSaveContext infrastructure into the encoding
> conversion modules, and maybe at some point that'll be worth doing,
> but in this particular context it doesn't seem like we'd be getting
> a very much better error message.  The main thing that we would get
> from such an extension is a chance to capture the report from
> report_untranslatable_char.  But what that adds is the ability to
> identify exactly which character couldn't be translated --- and in
> this use-case there's always just one.
>
>   


Yeah, probably overkill for now.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: ExecRTCheckPerms() and many prunable partitions (checkAsUser)

2022-12-11 Thread Justin Pryzby
On Sun, Dec 11, 2022 at 06:25:48PM +0900, Amit Langote wrote:
> On Sun, Dec 11, 2022 at 5:17 AM Justin Pryzby  wrote:
> > The original code rechecks rte->checkAsUser with the rte of the parent
> > rel.  The patch changed to access onerel instead, but that's not updated
> > after looping to find the parent.
> >
> > Is that okay ?  It doesn't seem intentional, since "userid" is still
> > being recomputed, but based on onerel, which hasn't changed.  The
> > original intent (since 553d2ec27) is to recheck the parent's
> > "checkAsUser".
> >
> > It seems like this would matter for partitioned tables, when the
> > partition isn't readable, but its parent is, and accessed via a view
> > owned by another user?
> 
> Thanks for pointing this out.
> 
> I think these blocks which are rewriting userid to basically the same
> value should have been removed from these sites as part of 599b33b94.

I thought maybe; thanks for checking.

Little nitpicks:

001:
Fine to use the same userid as it's same in all
=> the same

002:
give that it's a subquery rel.
=> given

-- 
Justin




Re: ExecRTCheckPerms() and many prunable partitions (checkAsUser)

2022-12-11 Thread Amit Langote
Hi,

On Sun, Dec 11, 2022 at 5:17 AM Justin Pryzby  wrote:
> On Tue, Nov 29, 2022 at 10:37:56PM +0900, Amit Langote wrote:
> > 0002 contains changes that has to do with changing how we access
> > checkAsUser in some foreign table planning/execution code sites.
> > Thought it might be better to describe it separately too.
>
> This was committed as 599b33b94:
> Stop accessing checkAsUser via RTE in some cases
>
> Which does this in a couple places in selfuncs.c:
>
> if (!vardata->acl_ok &&
> root->append_rel_array != NULL)
> {
> AppendRelInfo *appinfo;
> Index   varno = index->rel->relid;
>
> appinfo = root->append_rel_array[varno];
> while (appinfo &&
>planner_rt_fetch(appinfo->parent_relid,
> root)->rtekind == RTE_RELATION)
> {
> varno = appinfo->parent_relid;
> appinfo = root->append_rel_array[varno];
> }
> if (varno != index->rel->relid)
> {
> /* Repeat access check on this rel */
> rte = planner_rt_fetch(varno, root);
> Assert(rte->rtekind == RTE_RELATION);
>
> -   userid = rte->checkAsUser ? rte->checkAsUser : GetUserId();
> +   userid = OidIsValid(onerel->userid) ?
> +   onerel->userid : GetUserId();
>
> vardata->acl_ok =
> rte->securityQuals == NIL &&
> (pg_class_aclcheck(rte->relid,
>userid,
>ACL_SELECT) == ACLCHECK_OK);
> }
> }
>
>
> The original code rechecks rte->checkAsUser with the rte of the parent
> rel.  The patch changed to access onerel instead, but that's not updated
> after looping to find the parent.
>
> Is that okay ?  It doesn't seem intentional, since "userid" is still
> being recomputed, but based on onerel, which hasn't changed.  The
> original intent (since 553d2ec27) is to recheck the parent's
> "checkAsUser".
>
> It seems like this would matter for partitioned tables, when the
> partition isn't readable, but its parent is, and accessed via a view
> owned by another user?

Thanks for pointing this out.

I think these blocks which are rewriting userid to basically the same
value should have been removed from these sites as part of 599b33b94.
Even before that commit, the checkAsUser value should have been the
same in the RTE of both the child relation passed to these functions
and that of the root parent that's looked up by looping.  That's
because expand_single_inheritance_child(), which adds child RTEs,
copies the parent RTE's checkAsUser, that is, as of commit 599b33b94.
A later commit a61b1f74823c has removed the checkAsUser field from
RangeTblEntry.

Moreover, 599b33b94 adds some code in build_simple_rel() to set a
given rel's userid value by copying it from the parent rel, such that
the userid value would be the same in all relations in a given
inheritance tree.

I've attached 0001 to remove those extraneous code blocks and add a
comment mentioning that userid need not be recomputed.

While staring at the build_simple_rel() bit mentioned above, I
realized that this code fails to set userid correctly in the
inheritance parent rels that are child relations of subquery parent
relations, such as UNION ALL subqueries.  In that case, instead of
copying the userid (= 0) of the parent rel, the child should look up
its own RTEPermissionInfo, which should be there, and use the
checkAsUser value from there.  I've attached 0002 to fix this hole.  I
am not sure whether there's a way to add a test case for this in the
core suite.

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com


v1-0001-Remove-some-dead-code-in-selfuncs.c.patch
Description: Binary data


v1-0002-Fix-build_simple_rel-to-correctly-set-childrel-us.patch
Description: Binary data