Re: Introduce XID age and inactive timeout based replication slot invalidation

2024-03-10 Thread Amit Kapila
On Fri, Mar 8, 2024 at 8:08 PM Bharath Rupireddy
 wrote:
>
> On Wed, Mar 6, 2024 at 4:28 PM Amit Kapila  wrote:
> >
> > IIUC, the current conflict_reason is primarily used to determine
> > logical slots on standby that got invalidated due to recovery time
> > conflict. On the primary, it will also show logical slots that got
> > invalidated due to the corresponding WAL got removed. Is that
> > understanding correct?
>
> That's right.
>
> > If so, we are already sort of overloading this
> > column. However, now adding more invalidation reasons that won't
> > happen during recovery conflict handling will change entirely the
> > purpose (as per the name we use) of this variable. I think
> > invalidation_reason could depict this column correctly but OTOH I
> > guess it would lose its original meaning/purpose.
>
> Hm. I get the concern. Are you okay with having invalidation_reason
> separately for both logical and physical slots? In such a case,
> logical slots that got invalidated on the standby will have duplicate
> info in conflict_reason and invalidation_reason, is this fine?
>

If we have duplicate information in two columns that could be
confusing for users. BTW, doesn't the recovery conflict occur only
because of the rows_removed and wal_level_insufficient reasons? The
wal_removed or the new reasons you are proposing can't happen because
of a recovery conflict. Am I missing something here?

> Another idea is to make 'conflict_reason text' as a 'conflicting
> boolean' again (revert 007693f2a3), and have 'invalidation_reason
> text' for both logical and physical slots. So, whenever 'conflicting'
> is true, one can look at invalidation_reason for the reason for
> conflict. How does this sound?
>

So, does this mean that conflicting will only be true for some of the
reasons (say wal_level_insufficient, rows_removed, wal_removed) and
for logical slots but not for others? I think that will also not
eliminate the duplicate information, as the user could have deduced
that from a single column.
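
For illustration, a sketch of how that second idea would look from SQL; the
column names here are only the ones proposed above, nothing is committed.
The query picks out slots that were invalidated for reasons other than a
recovery conflict:

    -- hypothetical columns: 'conflicting' (boolean) and 'invalidation_reason'
    SELECT slot_name, slot_type, conflicting, invalidation_reason
    FROM pg_replication_slots
    WHERE NOT conflicting AND invalidation_reason IS NOT NULL;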

-- 
With Regards,
Amit Kapila.




Re: Add new error_action COPY ON_ERROR "log"

2024-03-10 Thread Michael Paquier
On Sat, Mar 09, 2024 at 12:01:49AM +0530, Bharath Rupireddy wrote:
> If NOTICE per attribute and incrementing num_errors per row is
> implemented, it ends up erroring out with ERROR:  missing data for
> column "m"  for all-column-empty-row. Shall we treat this ERROR softly
> too if on_error ignore is specified? Or shall we discuss this idea
> separately?
>
> ERROR:  missing data for column "m"
> CONTEXT:  COPY check_ign_err, line 5: ""

Hmm.  I have spent some time looking at the behavior of ON_ERROR, and
there are two tests in copy2.sql that check for errors: one for the case
where there is more data than attributes, and a second where there is not
enough data in a row.

For example, take this table:
=# create table tab (a int, b int, c int);
CREATE TABLE

This case works, even if the row has clearly not enough attributes.
The first attribute can be parsed, not the second one and this causes
the remaining data of the row to be skipped:
=# copy tab from stdin (delimiter ',', on_error ignore);
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.
>> 1,
>> \.
NOTICE:  0: 1 row was skipped due to data type incompatibility
LOCATION:  CopyFrom, copyfrom.c:1314
COPY 0

This case fails.  The first and the second attributes can be parsed,
and the line fails because we are missing the last attribute due to
the lack of a delimiter:
=# copy tab from stdin (delimiter ',', on_error ignore);
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.
>> 1,1
>> \.
ERROR:  22P04: missing data for column "c"
CONTEXT:  COPY tab, line 1: "1,1"

This brings up a weird edge case for the all-column-empty-row case you
are mentioning once we try to get information about all the rows we
should expect, but it has the side effect of breaking a case that's
intended to work with ON_ERROR, as far as I understand, which is to
skip a row entirely on the first conversion error we find, even if
there is no data afterwards.  I was a bit confused by that at first,
but I can also see why it is useful as-is on HEAD.

At the end of the day, this comes down to what is more helpful to the
user.  And I'd side with what ON_ERROR does currently, which is what
your patch relies on: on the first conversion failure, give up and
skip the rest of the row because we cannot trust its contents.
That's my way of saying that I am fine with the proposal of your
patch, and that we cannot provide the full state of a row without
making the error stack of COPY more invasive.

Perhaps we could discuss this idea of ensuring that all the attributes
are on a row in a different thread, as you say, but I am not really
convinced that there's a strong need for it at this stage, as ON_ERROR
is new to v17.  So it does not sound like a bad thing to let it brew
more before implementing more options and making the COPY paths more
complicated than they already are.  I suspect that this may trigger
some discussion during the beta/stability phase depending on the
initial feedback.  Perhaps I'm wrong, though.

+  verbose, a NOTICE message
+  containing the line number and column name for each discarded row is
+  emitted.

This should clarify that the column name refers to the attribute where
the input conversion has failed, I guess.  Specifying only "column
name" without more context is a bit confusing.
--
Michael




Re: "type with xxxx does not exist" when doing ExecMemoize()

2024-03-10 Thread Tender Wang
David Rowley  wrote on Mon, Mar 11, 2024 at 13:25:

> On Thu, 7 Mar 2024 at 22:50, David Rowley  wrote:
> >
> > On Thu, 7 Mar 2024 at 15:24, Tender Wang  wrote:
> > >
> > > Andrei Lepikhov  wrote on Wed, Mar 6, 2024 at 11:37:
> > >> I think, it is a bug. Should it be fixed (and back-patched) earlier?
> > >
> > > Agreed.   Need David to review it as he knows this area best.
> >
> > This is on my list of things to do. Just not at the top yet.
>
> I've gone over this patch and I'm happy with the changes to
> nodeMemoize.c.  The thing I did change was the newly added test.  The
> problem there was the test was passing for me with and without the
> code fix.  I ended up changing the test so the cache hits and misses
> are reported.  That required moving the test to above where the
> work_mem is set to 64KB so we can be certain the values will all be
> cached and the cache hits are predictable.
>
> My other changes were just cosmetic.
>
> Thanks for working on this fix.  I've pushed the patch.
>
> David
>

Thanks for pushing the patch.
-- 
Tender Wang
OpenPie:  https://en.openpie.com/


Re: "type with xxxx does not exist" when doing ExecMemoize()

2024-03-10 Thread David Rowley
On Thu, 7 Mar 2024 at 22:50, David Rowley  wrote:
>
> On Thu, 7 Mar 2024 at 15:24, Tender Wang  wrote:
> >
> > Andrei Lepikhov  wrote on Wed, Mar 6, 2024 at 11:37:
> >> I think, it is a bug. Should it be fixed (and back-patched) earlier?
> >
> > Agreed.   Need David to review it as he knows this area best.
>
> This is on my list of things to do. Just not at the top yet.

I've gone over this patch and I'm happy with the changes to
nodeMemoize.c.  The thing I did change was the newly added test.  The
problem there was the test was passing for me with and without the
code fix.  I ended up changing the test so the cache hits and misses
are reported.  That required moving the test to above where the
work_mem is set to 64KB so we can be certain the values will all be
cached and the cache hits are predictable.
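
As a sketch of what "cache hits and misses are reported" looks like in such a
test (the table names are made up, and whether the planner actually picks a
Memoize node depends on the data and the enable_* settings, so treat this
only as an illustration of where the numbers show up):

    -- run while work_mem is still at its default, i.e. before the test file
    -- lowers it to 64kB, so all entries fit in the cache
    EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, SUMMARY OFF)
    SELECT * FROM t1 JOIN t2 ON t1.a = t2.a;
    -- the Memoize node in the output then carries a line such as:
    --   Hits: N  Misses: M  Evictions: 0  Overflows: 0  Memory Usage: ...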

My other changes were just cosmetic.

Thanks for working on this fix.  I've pushed the patch.

David




Re: POC, WIP: OR-clause support for indexes

2024-03-10 Thread Andrei Lepikhov

On 7/3/2024 21:51, Alexander Korotkov wrote:

Hi!

On Tue, Mar 5, 2024 at 9:59 AM Andrei Lepikhov <a.lepik...@postgrespro.ru> wrote:

 > On 5/3/2024 12:30, Andrei Lepikhov wrote:
 > > On 4/3/2024 09:26, jian he wrote:
 > ... and the new version of the patchset is attached.

I made some revisions for the patchset.

Great!

1) Use hash_combine() to combine hash values.

Looks better

2) Upper limit the number of array elements by MAX_SAOP_ARRAY_SIZE.


I'm not convinced about this limit. The initial reason was to combine 
long lists of ORs into the array because such a transformation made at 
an early stage increases efficiency.
I understand the necessity of this limit in the array decomposition 
routine but not in the creation one.


3) Better save the original order of clauses by putting hash entries and 
untransformable clauses to the same list.  A lot of differences in 
regression tests output have gone.
I agree that reducing the number of changes in regression tests looks
better. But to achieve this, you introduced a hack that increases the
complexity of the code. Is it worth it? Maybe it would be better to make
one-time changes in the tests instead of taking this burden on board. Or
did you mean something more by introducing the node type?


We don't make array values unique.  That might make query execution 
performance somewhat worse, and also makes selectivity estimation 
worse.  I suggest Andrei and/or Alena should implement making array 
values unique.

The fix Alena has made looks correct. But I urge you to think twice:
The optimizer doesn't care about duplicates, so why do we do it?
What's more, this optimization is intended to speed up queries with long 
OR lists. Using the list_append_unique() comparator on such lists could 
impact performance. I suggest sticking to the common rule and leaving 
the responsibility on the user's shoulders.
At least, we should do this optimization later, in one pass, with 
sorting elements before building the array. But what if we don't have a 
sort operator for the type?
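
To make the duplicates question concrete, a small sketch (the exact
transformed form is up to the patch; this only shows the trade-off being
discussed):

    -- as written by the user, possibly with duplicates:
    SELECT * FROM t WHERE a = 1 OR a = 2 OR a = 1;
    -- after the OR-to-ANY transformation, roughly:
    SELECT * FROM t WHERE a = ANY (ARRAY[1, 2, 1]);
    -- the question above: is it worth paying list_append_unique()-style
    -- comparisons at transformation time just to get ARRAY[1, 2] instead?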


--
regards,
Andrei Lepikhov
Postgres Professional





RE: speed up a logical replica setup

2024-03-10 Thread Hayato Kuroda (Fujitsu)
Dear Vignesh,

Thanks for updating the patch, but cfbot still got angry [1].
Note that the two containers (autoconf and meson) failed at different places,
so I think it is a genuine failure. It seems that there may be a bug related to
the 32-bit build.

We should look into this and fix it as soon as possible.

[1]: http://cfbot.cputube.org/highlights/all.html#4637

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 



Re: Documentation: warn about two_phase when altering a subscription

2024-03-10 Thread Amit Kapila
On Sat, Feb 24, 2024 at 5:21 AM Tristen Raab  wrote:
>
> I've reviewed your patch, it applies correctly and the documentation builds 
> without any errors. As for the content of the patch itself, I think so far 
> it's good but would make two modifications. I like how the patch was worded 
> originally when referring to the subscription, stating these parameters were 
> 'in' the subscription rather than 'by' it. So I'd propose changing
>
> > parameters specified by the subscription. When creating the slot, ensure
>
> to
>
> > parameters specified in the subscription. When creating the slot, ensure
>

This suggestion makes sense, so I updated the patch for this and committed.

-- 
With Regards,
Amit Kapila.




Re: speed up a logical replica setup

2024-03-10 Thread vignesh C
On Sat, 9 Mar 2024 at 00:56, Tomas Vondra  wrote:
>
>
>
> On 3/8/24 10:44, Hayato Kuroda (Fujitsu) wrote:
> > Dear Tomas, Euler,
> >
> > Thanks for starting to read the thread! Since I'm not an original author,
> > I want to reply partially.
> >
> >> I decided to take a quick look on this patch today, to see how it works
> >> and do some simple tests. I've only started to get familiar with it, so
> >> I have only some comments / questions regarding usage, not on the code.
> >> It's quite possible I didn't understand some finer points, or maybe it
> >> was already discussed earlier in this very long thread, so please feel
> >> free to push back or point me to the past discussion.
> >>
> >> Also, some of this is rather opinionated, but considering I didn't see
> >> this patch before, my opinions may easily be wrong ...
> >
> > I felt your comments were quite valuable.
> >
> >> 1) SGML docs
> >>
> >> It seems the SGML docs are more about explaining how this works on the
> >> inside, rather than how to use the tool. Maybe that's intentional, but
> >> as someone who didn't work with pg_createsubscriber before I found it
> >> confusing and not very helpful.
> >>
> >> For example, the first half of the page is prerequisites+warning, and
> >> sure those are useful details, but prerequisites are checked by the
> >> tool (so I can't really miss this) and warnings go into a lot of details
> >> about different places where things may go wrong. Sure, worth knowing
> >> and including in the docs, but maybe not right at the beginning, before
> >> I learn how to even run the tool?
> >
> > Hmm, right. I considered the improvements below. Tomas and Euler, what do
> > you think?
> >
> > * Adds more descriptions to the "Description" section.
> > * Moves the prerequisites + warning to a "Notes" section.
> > * Adds a "Usage" section which describes the procedure starting from a single node.
> >
> >> I'm not sure FOR ALL TABLES is a good idea. Or said differently, I'm
> >> sure it won't work for a number of use cases. I know large databases
> >> it's common to create "work tables" (not necessarily temporary) as part
> >> of a batch job, but there's no need to replicate those tables.
> >
> > Indeed, the documentation does not describe that all tables in the database
> > would be included in the publication.
> >
> >> I do understand that FOR ALL TABLES is the simplest approach, and for v1
> >> it may be an acceptable limitation, but maybe it'd be good to also
> >> support restricting which tables should be replicated (e.g. blacklist or
> >> whitelist based on table/schema name?).
> >
> > It may not be directly related, but we considered accepting options as a
> > next step [1].
> >
> >> Note: I now realize this might fall under the warning about DDL, which
> >> says this:
> >>
> >> Executing DDL commands on the source server while running
> >> pg_createsubscriber is not recommended. If the target server has
> >> already been converted to logical replica, the DDL commands must
> >> not be replicated so an error would occur.
> >
> > Yeah, they would not be replicated, but they would not lead to an ERROR.
> > So should we say something like "Creating tables on the source server..."?
> >
>
> Perhaps. Clarifying the docs would help, but it depends on the wording.
> For example, I doubt this should talk about "creating tables" because
> there are other DDL that (probably) could cause issues (like adding a
> column to the table, or something like that).
>
> >> 5) slot / publication / subscription name
> >>
> >> I find it somewhat annoying it's not possible to specify names for
> >> objects created by the tool - replication slots, publication and
> >> subscriptions. If this is meant to be a replica running for a while,
> >> after a while I'll have no idea what pg_createsubscriber_569853 or
> >> pg_createsubscriber_459548_2348239 was meant for.
> >>
> >> This is particularly annoying because renaming these objects later is
> >> either not supported at all (e.g. for replication slots), or may be
> >> quite difficult (e.g. publications).
> >>
> >> I do realize there are challenges with custom names (say, if there are
> >> multiple databases to replicate), but can't we support some simple
> >> formatting with basic placeholders? So we could specify
> >>
> >> --slot-name "myslot_%d_%p"
> >>
> >> or something like that?
> >
> > Not sure we can do it in the first version, but it looks nice. One concern
> > is that I cannot find applications which accept escape sequences like
> > log_line_prefix. (It may just be because we do not have a use case.) Do you
> > know of examples?
> >
>
> I can't think of a tool already doing that, but I think that's simply
> because it was not needed. Why should we be concerned about this?
>
> >> BTW what will happen if we convert multiple standbys? Can't they all get
> >> the same slot name (they all have the same database OID, and I'm not
> >> sure how much entropy the PID has)?
> >
> > I tested it and the second try did not work. The primary reason was the name of 
> > 

Re: May be BUG. Periodic burst growth of the checkpoint_req counter on replica.

2024-03-10 Thread Anton A. Melnikov

On 11.03.2024 03:39, Alexander Korotkov wrote:

Now that we distinguish stats for checkpoints and
restartpoints, we need to update the docs.  Please, check the patch
attached.


Maybe bring the pg_stat_get_checkpointer_buffers_written() description into
consistency with these changes?
Like that:

--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5740 +5740 @@
-  descr => 'statistics: number of buffers written by the checkpointer',
+  descr => 'statistics: number of buffers written during checkpoints and 
restartpoints',

And after I took a fresh look at this patch, I noticed a simple way to extract
the write_time and sync_time counters for restartpoints too.

What do you think, does it make sense to do this?


With the best wishes,


--
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: Missing LWLock protection in pgstat_reset_replslot()

2024-03-10 Thread Michael Paquier
On Fri, Mar 08, 2024 at 07:46:26PM +0900, Michael Paquier wrote:
> Sounds good to me.

I've applied the patch of this thread as b36fbd9f8da1, though I did
not see a huge point in backpatching, as in the end this is just a
consistency improvement.
--
Michael




Re: remaining sql/json patches

2024-03-10 Thread jian he
On Sun, Mar 10, 2024 at 10:57 PM jian he  wrote:
>
> one more issue.

Hi,
one more documentation issue.
After applying V42, patches 0001 to 0003, there are 11 appearances of
`FORMAT JSON` in functions-json.html, but still not a single place
explains what it is for.

json_query ( context_item, path_expression [ PASSING { value AS
varname } [, ...]] [ RETURNING data_type [ FORMAT JSON [ ENCODING UTF8
] ] ] [ { WITHOUT | WITH { CONDITIONAL | [UNCONDITIONAL] } } [ ARRAY ]
WRAPPER ] [ { KEEP | OMIT } QUOTES [ ON SCALAR STRING ] ] [ { ERROR |
NULL | EMPTY { [ ARRAY ] | OBJECT } | DEFAULT expression } ON EMPTY ]
[ { ERROR | NULL | EMPTY { [ ARRAY ] | OBJECT } | DEFAULT expression }
ON ERROR ])

FORMAT JSON seems to be just syntactic sugar, or there for compatibility, in
json_query. But it returns an error when the returning type category is not
TYPCATEGORY_STRING.

for example, even the following will return an error.
`
CREATE TYPE regtest_comptype AS (b text);
SELECT JSON_QUERY(jsonb '{"a":{"b":"c"}}', '$.a' RETURNING
regtest_comptype format json);
`

It seems only the types in [0] will not generate an error when specifying
FORMAT JSON in JSON_QUERY.

So it actually does something, and is not just syntactic sugar?

[0] https://www.postgresql.org/docs/current/datatype-character.html
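
For contrast with the failing example above, a return type in the string
category should be accepted (a sketch following that observation):

    SELECT JSON_QUERY(jsonb '{"a":{"b":"c"}}', '$.a' RETURNING text FORMAT JSON);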




Re: [PoC] Improve dead tuple storage for lazy vacuum

2024-03-10 Thread John Naylor
On Thu, Mar 7, 2024 at 10:35 PM Masahiko Sawada  wrote:
>
> I've attached the remaining patches for CI. I've made some minor
> changes in separate patches and drafted the commit message for
> tidstore patch.
>
> While reviewing the tidstore code, I thought that it would be more
> appropriate to place tidstore.c under src/backend/lib instead of
> src/backend/common/access since the tidstore is no longer implemented
> only for heap or other access methods, and it might also be used by
> executor nodes in the future. What do you think?

That's a heck of a good question. I don't think src/backend/lib is
right -- it seems that's for general-purpose data structures.
Something like backend/utils is also too general.
src/backend/access/common has things for tuple descriptors, toast,
sessions, and I don't think tidstore is out of place here. I'm not
sure there's a better place, but I could be convinced otherwise.

v68-0001:

I'm not sure if commit messages are much a subject of review, and it's
up to the committer, but I'll share a couple comments just as
something to think about, not something I would ask you to change: I
think it's a bit distracting that the commit message talks about the
justification to use it for vacuum. Let's save that for the commit
with actual vacuum changes. Also, I suspect saying there are a "wide
range" of uses is over-selling it a bit, and that paragraph is a bit
awkward aside from that.

+ /* Collect TIDs extracted from the key-value pair */
+ result->num_offsets = 0;
+

This comment has nothing at all to do with this line. If the comment
is for several lines following, some of which are separated by blank
lines, there should be a blank line after the comment. Also, why isn't
tidstore_iter_extract_tids() responsible for setting that to zero?

+ ts->context = CurrentMemoryContext;

As far as I can tell, this member is never accessed again -- am I
missing something?

+ /* DSA for tidstore will be detached at the end of session */

No other test module pins the mapping, but that doesn't necessarily
mean it's wrong. Is there some advantage over explicitly detaching?

+-- Add tids in random order.

I don't see any randomization here. I do remember adding row_number to
remove whitespace in the output, but I don't remember a random order.
On that subject, the row_number was an easy trick to avoid extra
whitespace, but maybe we should just teach the setting function to
return blocknumber rather than null?

+Datum
+tidstore_create(PG_FUNCTION_ARGS)
+{
...
+ tidstore = TidStoreCreate(max_bytes, dsa);

+Datum
+tidstore_set_block_offsets(PG_FUNCTION_ARGS)
+{

+ TidStoreSetBlockOffsets(tidstore, blkno, offs, noffs);

These names are too similar. Maybe the test module should do
s/tidstore_/test_/ or similar.

+/* Sanity check if we've called tidstore_create() */
+static void
+check_tidstore_available(void)
+{
+ if (tidstore == NULL)
+ elog(ERROR, "tidstore is not initialized");
+}

I don't find this very helpful. If a developer wiped out the create
call, wouldn't the test crash and burn pretty obviously?

In general, the .sql file is still very hard-coded. Functions are
created that contain a VALUES statement. Maybe it's okay for now, but
wanted to mention it. Ideally, we'd have some randomized tests,
without having to display it. That could be in addition to (not
replacing) the small tests we have that display input. (see below)


v68-0002:

@@ -329,6 +381,13 @@ TidStoreIsMember(TidStore *ts, ItemPointer tid)

  ret = (page->words[wordnum] & ((bitmapword) 1 << bitnum)) != 0;

+#ifdef TIDSTORE_DEBUG
+ if (!TidStoreIsShared(ts))
+ {
+ bool ret_debug = ts_debug_is_member(ts, tid);;
+ Assert(ret == ret_debug);
+ }
+#endif

This only checks the case where we haven't returned already. In particular...

+ /* no entry for the blk */
+ if (page == NULL)
+ return false;
+
+ wordnum = WORDNUM(off);
+ bitnum = BITNUM(off);
+
+ /* no bitmap for the off */
+ if (wordnum >= page->nwords)
+ return false;

...these results are not checked.

More broadly, it seems like the test module should be able to test
everything that the debug-build array would complain about. Including
ordered iteration. This may require first saving our test input to a
table. We could create a cursor on a query that fetches the ordered
input from the table and verifies that the tid store iterate produces
the same ordered set, maybe with pl/pgSQL. Or something like that.
Seems like not a whole lot of work. I can try later in the week, if
you like.
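
A rough sketch of that idea in plain SQL terms; tidstore_dump_tids() is a
purely hypothetical stand-in for whatever set-returning function the test
module ends up exposing for iteration:

    CREATE TEMP TABLE tid_input (blkno bigint, off int);
    -- ... populate tid_input and load the same TIDs into the tid store ...
    SELECT bool_and(t1.blkno = t2.blkno AND t1.off = t2.off) AS orders_match
    FROM (SELECT blkno, off,
                 row_number() OVER (ORDER BY blkno, off) AS i
          FROM tid_input) t1
    JOIN (SELECT blkno, off, row_number() OVER () AS i
          FROM tidstore_dump_tids()) t2 USING (i);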

v68-0005/6 look ready to squash

v68-0008 - I'm not a fan of capitalizing short comment fragments. I use
the style of either: short lower-case phrases, or full sentences
including capitalization, correct grammar and period. I see these two
styles all over the code base, as appropriate.

+ /* Remain attached until end of backend */

We'll probably want this comment, if in fact we want this behavior.

+ /*
+ * Note that funcctx->call_cntr is incremented in SRF_RETURN_NEXT
+ * before return.
+ 

Re: Confine vacuum skip logic to lazy_scan_skip

2024-03-10 Thread Thomas Munro
On Mon, Mar 11, 2024 at 5:31 AM Melanie Plageman
 wrote:
> On Wed, Mar 6, 2024 at 6:47 PM Melanie Plageman
>  wrote:
> > Performance results:
> >
> > The TL;DR of my performance results is that streaming read vacuum is
> > faster. However there is an issue with the interaction of the streaming
> > read code and the vacuum buffer access strategy which must be addressed.

Woo.

> I have investigated the interaction between
> maintenance_io_concurrency, streaming reads, and the vacuum buffer
> access strategy (BAS_VACUUM).
>
> The streaming read API limits max_pinned_buffers to a pinned buffer
> multiplier (currently 4) * maintenance_io_concurrency buffers with the
> goal of constructing reads of at least MAX_BUFFERS_PER_TRANSFER size.
>
> Since the BAS_VACUUM ring buffer is size 256 kB or 32 buffers with
> default block size, that means that for a fully uncached vacuum in
> which all blocks must be vacuumed and will be dirtied, you'd have to
> set maintenance_io_concurrency at 8 or lower to see the same number of
> reuses (and shared buffer consumption) as master.
>
> Given that we allow users to specify BUFFER_USAGE_LIMIT to vacuum, it
> seems like we should force max_pinned_buffers to a value that
> guarantees the expected shared buffer usage by vacuum. But that means
> that maintenance_io_concurrency does not have a predictable impact on
> streaming read vacuum.
>
> What is the right thing to do here?
>
> At the least, the default size of the BAS_VACUUM ring buffer should be
> BLCKSZ * pinned_buffer_multiplier * default maintenance_io_concurrency
> (probably rounded up to the next power of two) bytes.
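
(As a rough worked example of that formula, with the stock defaults of
BLCKSZ = 8kB, a pinned-buffer multiplier of 4 and maintenance_io_concurrency
= 10, that is 4 * 10 * 8kB = 320kB, or 512kB after rounding up to the next
power of two, versus the current 256kB / 32-buffer BAS_VACUUM ring.)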

Hmm, does the v6 look-ahead distance control algorithm mitigate that
problem?  Using the ABC classifications from the streaming read
thread, I think for A it should now pin only 1, for B 16 and for C, it
depends on the size of the random 'chunks': if you have a lot of size
1 random reads then it shouldn't go above 10 because of (default)
maintenance_io_concurrency.  The only way to get up to very high
numbers would be to have a lot of random chunks triggering behaviour
C, but each made up of long runs of misses.  For example one can
contrive a BHS query that happens to read pages 0-15 then 20-35 then
40-55 etc etc so that we want to get lots of wide I/Os running
concurrently.  Unless vacuum manages to do something like that, it
shouldn't be able to exceed 32 buffers very easily.

I suspect that if we taught streaming_read.c to ask the
BufferAccessStrategy (if one is passed in) what its recommended pin
limit is (strategy->nbuffers?), we could just clamp
max_pinned_buffers, and it would be hard to find a workload where that
makes a difference, and we could think about more complicated logic
later.

In other words, I think/hope your complaints about excessive pinning
from v5 WRT all-cached heap scans might have also already improved
this case by happy coincidence?  I haven't tried it out though, I just
read your description of the problem...




Re: Reducing the log spam

2024-03-10 Thread Laurenz Albe
On Sat, 2024-03-09 at 14:03 +0100, Laurenz Albe wrote:
> Here is a patch that implements this.

And here is patch v2 that fixes a bug and passes the regression tests.

Yours,
Laurenz Albe
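
As a usage sketch (the GUC name and the default list are taken from the patch
below; the exact spelling is of course subject to review), suppressing the
protocol-violation errors produced by a monitoring system on top of the
defaults:

    ALTER SYSTEM SET log_suppress_errcodes =
        '23505,3D000,3F000,42601,42704,42883,42P01,57P03,08P01';
    SELECT pg_reload_conf();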
From 6c72ea7d0aa1df569a4e53f54fcb1a11542ac0ef Mon Sep 17 00:00:00 2001
From: Laurenz Albe 
Date: Mon, 11 Mar 2024 03:41:49 +0100
Subject: [PATCH v2] Add parameter log_suppress_errcodes

The parameter contains a list of SQLSTATEs for which
FATAL and ERROR messages are not logged.  This is to
suppress messages that are of little interest to the
database administrator, but tend to clutter the log.

Author: Laurenz Albe
Discussion: https://postgr.es/m/408f399e7de1416c47bab7e260327ed5ad92838c.camel%40cybertec.at
---
 doc/src/sgml/config.sgml  |  34 +
 doc/src/sgml/logical-replication.sgml |   3 +-
 src/backend/utils/error/elog.c| 116 ++
 src/backend/utils/misc/guc_tables.c   |  11 ++
 src/backend/utils/misc/postgresql.conf.sample |   4 +
 src/bin/pg_ctl/t/004_logrotate.pl |   1 +
 src/include/pg_config_manual.h|  10 ++
 src/include/utils/guc.h   |   1 +
 src/include/utils/guc_hooks.h |   2 +
 src/test/subscription/t/014_binary.pl |   5 +
 src/test/subscription/t/029_on_error.pl   |   1 +
 11 files changed, 187 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 43b1a132a2..89a698c7b6 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -6850,6 +6850,40 @@ local0.*/var/log/postgresql
   
  
 
+ 
+  log_suppress_errcodes (string)
+  
+   log_suppress_errcodes configuration parameter
+  
+  
+  
+   
+Causes ERROR and FATAL messages
+with certain error codes to be excluded from the log.
+The value is a comma-separated list of five-character error codes as
+listed in .  An error code that
+represents a class of errors (ends with three zeros) suppresses logging
+of all error codes within that class.  For example, the entry
+08000 (connection_exception)
+would suppress an error with code 08P01
+(protocol_violation).  The default setting is
+23505,3D000,3F000,42601,42704,42883,42P01,57P03.
+Only superusers and users with the appropriate SET
+privilege can change this setting.
+   
+
+   
+This setting is useful to exclude error messages from the log that are
+frequent but irrelevant.  That makes it easier to spot relevant
+messages in the log and keeps log files from growing too big.  For
+example, if you have a monitoring system that regularly establishes a
+TCP connection to the server without sending a correct startup packet,
+you could suppress the protocol violation errors by adding the error
+code 08P01 to the list.
+   
+  
+ 
+
  
   log_min_duration_statement (integer)
   
diff --git a/doc/src/sgml/logical-replication.sgml b/doc/src/sgml/logical-replication.sgml
index ec2130669e..6c647a8782 100644
--- a/doc/src/sgml/logical-replication.sgml
+++ b/doc/src/sgml/logical-replication.sgml
@@ -1490,7 +1490,8 @@ test_sub=# SELECT * FROM t1 ORDER BY id;
   
A conflict will produce an error and will stop the replication; it must be
resolved manually by the user.  Details about the conflict can be found in
-   the subscriber's server log.
+   the subscriber's server log if you remove 23505 from
+   .
   
 
   
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index c9719f358b..01eb00e870 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -112,12 +112,16 @@ int			Log_error_verbosity = PGERROR_DEFAULT;
 char	   *Log_line_prefix = NULL; /* format for extra log line info */
 int			Log_destination = LOG_DESTINATION_STDERR;
 char	   *Log_destination_string = NULL;
+char	   *log_suppress_errcodes = NULL;
 bool		syslog_sequence_numbers = true;
 bool		syslog_split_messages = true;
 
 /* Processed form of backtrace_functions GUC */
 static char *backtrace_function_list;
 
+/* Processed form of log_suppress_errcodes (zero-terminated array) */
+static int *suppressed_errcodes;
+
 #ifdef HAVE_SYSLOG
 
 /*
@@ -866,6 +870,27 @@ errcode(int sqlerrcode)
 
 	edata->sqlerrcode = sqlerrcode;
 
+	/*
+	 * ERROR and FATAL messages with codes in log_suppress_errcodes don't get
+	 * logged.
+	 */
+	if ((edata->elevel == ERROR ||
+		 edata->elevel == FATAL) &&
+		suppressed_errcodes != NULL)
+	{
+		int *state;
+
+		for (state = suppressed_errcodes; *state != 0; state++)
+			/* error categories match all error codes in the category */
+			if (sqlerrcode == *state ||
+(ERRCODE_IS_CATEGORY(*state) &&
+ ERRCODE_TO_CATEGORY(sqlerrcode) == *state))
+			{
+edata->output_to_server = false;
+break;
+			}
+	}
+
 	return 0;	/* return value 

Re: Stack overflow issue

2024-03-10 Thread Alexander Korotkov
On Thu, Mar 7, 2024 at 11:07 AM Alexander Korotkov  wrote:
> On Thu, Mar 7, 2024 at 9:53 AM Egor Chindyaskin  wrote:
> >
> > > 6 march 2024 г., at 19:17, Alexander Korotkov  
> > > wrote:
> > >
> > > The revised set of remaining patches is attached.
> > >
> > > 0001 Turn tail recursion into iteration in CommitTransactionCommand()
> > > I did minor revision of comments and code blocks order to improve the
> > > readability.
> > >
> > > 0002 Avoid stack overflow in ShowTransactionStateRec()
> > > I didn't notice any issues, leave this piece as is.
> > >
> > > 0003 Avoid recursion in MemoryContext functions
> > > I've renamed MemoryContextTraverse() => MemoryContextTraverseNext(),
> > > which I think is a bit more intuitive.  Also I fixed
> > > MemoryContextMemConsumed(), which was still trying to use the removed
> > > argument "print" of MemoryContextStatsInternal() function.
> > >
> > > Generally, I think this patchset fixes important stack overflow holes.
> > > It is quite straightforward, clear and the code has a good shape.  I'm
> > > going to push this if no objections.
> >
> > I have tested the scripts from message [1]. After applying these patches 
> > and Tom Lane’s patch from message [2], all of the above mentioned functions 
> > no longer caused the server to crash. I also tried increasing the values in 
> > the presented scripts, which also did not lead to server crashes. Thank you!
> > Also, I would like to clarify something. Will fixes from message [3] and 
> > others be backported to all other branches, not just the master branch? As 
> > far as I remember, Tom Lane made corrections to all branches. For example 
> > [4].
> >
> > Links:
> > 1. 
> > https://www.postgresql.org/message-id/343ff14f-3060-4f88-9cc6-efdb390185df%40mail.ru
> > 2. https://www.postgresql.org/message-id/386032.1709765547%40sss.pgh.pa.us
> > 3. 
> > https://www.postgresql.org/message-id/CAPpHfduZqAjF%2B7rDRP-RGNHjOXy7nvFROQ0MGS436f8FPY5DpQ%40mail.gmail.com
> > 4. https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=e07ebd4b
>
> Thank you for your feedback!
>
> Initially I didn't intend to backpatch any of these.  But on second
> thought with the references you provided, I think we should backpatch
> simple check_stack_depth() checks from d57b7cc333 to all supported
> branches, but apply the refactoring of memory contexts and transaction
> commit/abort just to master.  Opinions?

I've just backpatched check_stack_depth() checks to all supported branches.

--
Regards,
Alexander Korotkov




Re: Fix log_line_prefix to display the transaction id (%x) for statements not in a transaction block

2024-03-10 Thread Quan Zongliang




On 2024/3/4 15:48, jian he wrote:


Maybe we can tolerate LOG: first output the query plan, then the statement.


It is more appropriate to let the extension solve its own problems.
Of course, this change is not easy to implement.
Due to the way XID is assigned, there seems to be no good solution at 
the moment.






Re: Make COPY format extendable: Extract COPY TO format implementations

2024-03-10 Thread Sutou Kouhei
Hi,

In 
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Mon, 11 Mar 2024 08:00:00 +0800,
  jian he  wrote:

> Hi, here are my two cents:
> Currently in v17, we have 3 extra functions within DoCopyTo:
> CopyToStart, called once at the start, doing some preliminary work;
> CopyToOneRow, doing the repetitive work, called many times, row by row;
> CopyToEnd, called once, doing the closing work.
> 
> It seems we need a function pointer for processing the format and other
> options, or maybe the reason is that we need a one-time function call before
> doing DoCopyTo, like a one-time initialization.

I know that the JSON format wants it, but can we defer it? We can
add more options later. I want to proceed with this improvement
step by step.

More use cases will help us determine which callbacks are needed. We
will be able to collect more use cases by providing basic
callbacks.

> generally in v17, the code pattern looks like this.
> if (cstate->opts.binary)
> {
> /* handle binary format */
> }
> else if (cstate->routine)
> {
> /* custom code, make the copy format extensible */
> }
> else
> {
> /* handle non-binary, (csv or text) format */
> }
> Maybe we need another bool flag like `bool builtin_format`:
> if the copy format is {csv|text|binary} then builtin_format is true, else
> false.
> 
> so the code pattern would be:
> if (cstate->opts.binary)
> {
> /* handle binary format */
> }
> else if (cstate->routine && !builtin_format)
> {
> /* custom code, make the copy format extensible */
> }
> else
> {
> /* handle non-binary, (csv or text) format */
> }
> 
> Otherwise, {CopyToRoutine | CopyFromRoutine} needs a function pointer
> to distinguish the native copy formats from extension-supported formats,
> as I mentioned above?

Hmm. I may be missing something, but I think that we don't need the
bool flag, because we don't set cstate->routine for native
copy formats. So we can distinguish the native copy formats from
extension-supported formats by checking only
cstate->routine.


Thanks,
-- 
kou




Re: May be BUG. Periodic burst growth of the checkpoint_req counter on replica.

2024-03-10 Thread Alexander Korotkov
On Sat, Mar 9, 2024 at 4:38 PM Magnus Hagander  wrote:
> Per the docs, the sync_time, write_time and buffers_written only apply
> to checkpoints, not restartpoints. Is this correct? AFAICT from a
> quick look at the code they include both checkpoints and restartpoints
> in which case I think the docs should be clarified to indicate that?

Right, these fields work as before, reflecting both checkpoints and
restartpoints.  The documentation said checkpoints, implying restartpoints
as well.  Now that we distinguish stats for checkpoints and
restartpoints, we need to update the docs.  Please, check the patch
attached.
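
For reference, a sketch of the fields in question as they appear in the view
(column names as of the development branch at the time of this thread; check
the documentation of pg_stat_checkpointer for the authoritative list):

    SELECT num_timed, num_requested,
           restartpoints_timed, restartpoints_req, restartpoints_done,
           write_time, sync_time, buffers_written
    FROM pg_stat_checkpointer;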

--
Regards,
Alexander Korotkov


rename_pg_stat_get_checkpointer_fields.patch
Description: Binary data


Re: Add checkpoint/redo LSNs to recovery errors.

2024-03-10 Thread Michael Paquier
On Sun, Mar 10, 2024 at 04:58:19PM +1300, David Steele wrote:
> No doubt there are many other recovery log messages that could be improved,
> but I'd rather not bog down the patch.

Fair argument, and these additions are useful when taken
independently.  I've applied your suggestions for now.
--
Michael




Re: Make COPY format extendable: Extract COPY TO format implementations

2024-03-10 Thread jian he
On Fri, Mar 8, 2024 at 8:23 AM Sutou Kouhei  wrote:
>
>
> This shows that the v17 approach doesn't affect the current
> text/csv/binary implementations. (The v17 approach just adds
> 2 new structs, Copy{From,To}Rountine, without changing the
> current text/csv/binary implementations.)
>
> Can we push the v17 patch and proceed following
> implementations? Could someone (especially a PostgreSQL
> committer) take a look at this for double-check?
>

Hi, here are my two cents:
Currently in v17, we have 3 extra functions within DoCopyTo:
CopyToStart, called once at the start, doing some preliminary work;
CopyToOneRow, doing the repetitive work, called many times, row by row;
CopyToEnd, called once, doing the closing work.

It seems we need a function pointer for processing the format and other
options, or maybe the reason is that we need a one-time function call before
doing DoCopyTo, like a one-time initialization.

We can place the function pointer after:
`
cstate = BeginCopyTo(pstate, rel, query, relid,
stmt->filename, stmt->is_program,
NULL, stmt->attlist, stmt->options);
`


generally in v17, the code pattern looks like this.
if (cstate->opts.binary)
{
/* handle binary format */
}
else if (cstate->routine)
{
/* custom code, make the copy format extensible */
}
else
{
/* handle non-binary, (csv or text) format */
}
Maybe we need another bool flag like `bool builtin_format`:
if the copy format is {csv|text|binary} then builtin_format is true, else false.

so the code pattern would be:
if (cstate->opts.binary)
{
/* handle binary format */
}
else if (cstate->routine && !builtin_format)
{
/* custom code, make the copy format extensible */
}
else
{
/* handle non-binary, (csv or text) format */
}

Otherwise, {CopyToRoutine | CopyFromRoutine} needs a function pointer
to distinguish the native copy formats from extension-supported formats,
as I mentioned above?




Re: Vectored I/O in bulk_write.c

2024-03-10 Thread Thomas Munro
Here also is a first attempt at improving the memory allocation and
memory layout.

I wonder if bulk logging should trigger larger WAL writes in the "Have
to write it ourselves" case in AdvanceXLInsertBuffer(), since writing
8kB of WAL at a time seems like an unnecessarily degraded level of
performance, especially with wal_sync_method=open_datasync.  Of course
the real answer is "make sure wal_buffers is high enough for your
workload" (usually indirectly by automatically scaling from
shared_buffers), but this problem jumps out when tracing bulk_writes.c
with default settings.  We write out the index 128kB at a time, but
the WAL 8kB at a time.
From 793caf2db6c00314f5bd8f7146d4797508f2f627 Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Sat, 9 Mar 2024 16:04:21 +1300
Subject: [PATCH v3 1/3] Provide vectored variant of smgrextend().

Since mdwrite() and mdextend() were basically the same, merge them.
They had different assertions, but those would surely apply to any
implementation of smgr, so move them up to the smgr.c wrapper
level.  The other difference was the policy on segment creation, but
that can be captured by having smgrwritev() and smgrextendv() call a
single mdwritev() function with a new "extend" flag.

Discussion: https://postgr.es/m/CA%2BhUKGLx5bLwezZKAYB2O_qHj%3Dov10RpgRVY7e8TSJVE74oVjg%40mail.gmail.com
---
 src/backend/storage/smgr/md.c   | 106 +---
 src/backend/storage/smgr/smgr.c |  32 ++
 src/include/storage/md.h|   3 +-
 src/include/storage/smgr.h  |  12 +++-
 4 files changed, 47 insertions(+), 106 deletions(-)

diff --git a/src/backend/storage/smgr/md.c b/src/backend/storage/smgr/md.c
index bf0f3ca76d1..9b125841226 100644
--- a/src/backend/storage/smgr/md.c
+++ b/src/backend/storage/smgr/md.c
@@ -447,79 +447,10 @@ mdunlinkfork(RelFileLocatorBackend rlocator, ForkNumber forknum, bool isRedo)
 	pfree(path);
 }
 
-/*
- * mdextend() -- Add a block to the specified relation.
- *
- * The semantics are nearly the same as mdwrite(): write at the
- * specified position.  However, this is to be used for the case of
- * extending a relation (i.e., blocknum is at or beyond the current
- * EOF).  Note that we assume writing a block beyond current EOF
- * causes intervening file space to become filled with zeroes.
- */
-void
-mdextend(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
-		 const void *buffer, bool skipFsync)
-{
-	off_t		seekpos;
-	int			nbytes;
-	MdfdVec*v;
-
-	/* If this build supports direct I/O, the buffer must be I/O aligned. */
-	if (PG_O_DIRECT != 0 && PG_IO_ALIGN_SIZE <= BLCKSZ)
-		Assert((uintptr_t) buffer == TYPEALIGN(PG_IO_ALIGN_SIZE, buffer));
-
-	/* This assert is too expensive to have on normally ... */
-#ifdef CHECK_WRITE_VS_EXTEND
-	Assert(blocknum >= mdnblocks(reln, forknum));
-#endif
-
-	/*
-	 * If a relation manages to grow to 2^32-1 blocks, refuse to extend it any
-	 * more --- we mustn't create a block whose number actually is
-	 * InvalidBlockNumber.  (Note that this failure should be unreachable
-	 * because of upstream checks in bufmgr.c.)
-	 */
-	if (blocknum == InvalidBlockNumber)
-		ereport(ERROR,
-(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
- errmsg("cannot extend file \"%s\" beyond %u blocks",
-		relpath(reln->smgr_rlocator, forknum),
-		InvalidBlockNumber)));
-
-	v = _mdfd_getseg(reln, forknum, blocknum, skipFsync, EXTENSION_CREATE);
-
-	seekpos = (off_t) BLCKSZ * (blocknum % ((BlockNumber) RELSEG_SIZE));
-
-	Assert(seekpos < (off_t) BLCKSZ * RELSEG_SIZE);
-
-	if ((nbytes = FileWrite(v->mdfd_vfd, buffer, BLCKSZ, seekpos, WAIT_EVENT_DATA_FILE_EXTEND)) != BLCKSZ)
-	{
-		if (nbytes < 0)
-			ereport(ERROR,
-	(errcode_for_file_access(),
-	 errmsg("could not extend file \"%s\": %m",
-			FilePathName(v->mdfd_vfd)),
-	 errhint("Check free disk space.")));
-		/* short write: complain appropriately */
-		ereport(ERROR,
-(errcode(ERRCODE_DISK_FULL),
- errmsg("could not extend file \"%s\": wrote only %d of %d bytes at block %u",
-		FilePathName(v->mdfd_vfd),
-		nbytes, BLCKSZ, blocknum),
- errhint("Check free disk space.")));
-	}
-
-	if (!skipFsync && !SmgrIsTemp(reln))
-		register_dirty_segment(reln, forknum, v);
-
-	Assert(_mdnblocks(reln, forknum, v) <= ((BlockNumber) RELSEG_SIZE));
-}
-
 /*
  * mdzeroextend() -- Add new zeroed out blocks to the specified relation.
  *
- * Similar to mdextend(), except the relation can be extended by multiple
- * blocks at once and the added blocks will be filled with zeroes.
+ * The added blocks will be filled with zeroes.
  */
 void
 mdzeroextend(SMgrRelation reln, ForkNumber forknum,
@@ -595,13 +526,7 @@ mdzeroextend(SMgrRelation reln, ForkNumber forknum,
 		{
 			int			ret;
 
-			/*
-			 * Even if we don't want to use fallocate, we can still extend a
-			 * bit more efficiently than writing each 8kB block individually.
-			 * pg_pwrite_zeros() (via FileZero()) uses pg_pwritev_with_retry()
-			 * to avoid multiple 

Re: Hooking into ExplainOneQuery() complicated by missing standard_ExplainOneQuery

2024-03-10 Thread Michael Paquier
On Thu, Mar 07, 2024 at 07:45:01AM +0900, Michael Paquier wrote:
> That's nice.  You would be able to shave quite a bit of code.  If
> there are no objections, I propose to apply the change of this thread
> to have this standard explain wrapper at the beginning of next week.
> If others have any comments, feel free.

Well, done as of a04ddd077e61.
--
Michael




Re: Adding OLD/NEW support to RETURNING

2024-03-10 Thread jian he
On Sat, Mar 9, 2024 at 3:53 AM Dean Rasheed  wrote:
>
>
> Attached is a new patch, now with docs (no other code changes).
>

Hi,
some issues I found, while playing around with
support-returning-old-new-v2.patch

doc/src/sgml/ref/update.sgml:
[ RETURNING [ WITH ( { OLD | NEW } AS output_alias [, ...] ) ]
* | output_expression [ [ AS ]
output_name ] [, ...] ]


There is no parameter explanation for `*`.
so, I think the synopsis may not cover cases like:
`
update foo set f3 = 443 RETURNING new.*;
`
I saw the explanation at output_alias, though.

-
insert into foo select 1, 2 RETURNING old.*, new.f2, old.f1();
ERROR:  function old.f1() does not exist
LINE 1: ...sert into foo select 1, 2 RETURNING old.*, new.f2, old.f1();
  ^
HINT:  No function matches the given name and argument types. You
might need to add explicit type casts.

I guess that's OK, just slightly different context evaluation: if you say
"old.f1", old refers to the virtual table "old",
but with "old.f1()", the "old" is re-evaluated as the schema "old".
You need privileges on the schema "old", and you also need execute privilege
on the function "old.f1()" to execute the above query.
So there seems to be no security issue after all.
-
I found a fancy expression:
`
CREATE  TABLE foo (f1 serial, f2 text, f3 int default 42);
insert into foo select 1, 2 union select 11, 22 RETURNING old.*,
new.f2, (select sum(new.f1) over());
`
is this ok?

Also, the following works on PG16; I'm not sure whether it's a bug.
`
 insert into foo select 1, 2 union select 11, 22 RETURNING  (select count(*));
`
but not these
`
insert into foo select 1, 2 union select 11, 22 RETURNING  (select
count(old.*));
insert into foo select 1, 2 union select 11, 22 RETURNING  (select sum(f1));
`
-
I found another interesting case while trying to add some tests
for the new code in createplan.c:
in postgres_fdw.sql, right after the line `MERGE ought to fail cleanly`

--this will work
insert into itrtest select 1, 'foo' returning new.*,old.*;
--these two will fail
insert into remp1 select 1, 'foo' returning new.*;
insert into remp1 select 1, 'foo' returning old.*;

itrtest is the partitioned non-foreign table.
remp1 is the partition of itrtest, foreign table.

--
I did find a segmentation fault bug.
insert into foo select 1, 2 RETURNING (select sum(old.f1) over());

This information is set in a subplan node.
/* update the ExprState's flags if Var refers to OLD/NEW */
if (variable->varreturningtype == VAR_RETURNING_OLD)
state->flags |= EEO_FLAG_HAS_OLD;
else if (variable->varreturningtype == VAR_RETURNING_NEW)
state->flags |= EEO_FLAG_HAS_NEW;

but in ExecInsert:
`
else if (resultRelInfo->ri_projectReturning->pi_state.flags & EEO_FLAG_HAS_OLD)
{
oldSlot = ExecGetReturningSlot(estate, resultRelInfo);
ExecStoreAllNullTuple(oldSlot);
oldSlot->tts_tableOid = RelationGetRelid(resultRelInfo->ri_RelationDesc);
}
`
it didn't use the subplan node's state->flags information, so the ExecInsert
code above is never called, though it should be executed.
However,
`
insert into foo select 1, 2 RETURNING (select sum(new.f1)over());`
works

Similarly this
 `
delete from foo RETURNING (select sum(new.f1) over());
`
also causes segmentation fault.
--
diff --git a/src/include/executor/tuptable.h b/src/include/executor/tuptable.h
new file mode 100644
index 6133dbc..c9d3661
--- a/src/include/executor/tuptable.h
+++ b/src/include/executor/tuptable.h
@@ -411,12 +411,21 @@ slot_getsysattr(TupleTableSlot *slot, in
 {
  Assert(attnum < 0); /* caller error */

+ /*
+ * If the tid is not valid, there is no physical row, and all system
+ * attributes are deemed to be NULL, except for the tableoid.
+ */
  if (attnum == TableOidAttributeNumber)
  {
  *isnull = false;
  return ObjectIdGetDatum(slot->tts_tableOid);
  }
- else if (attnum == SelfItemPointerAttributeNumber)
+ if (!ItemPointerIsValid(&slot->tts_tid))
+ {
+ *isnull = true;
+ return PointerGetDatum(NULL);
+ }
+ if (attnum == SelfItemPointerAttributeNumber)
  {
  *isnull = false;
  return PointerGetDatum(&slot->tts_tid);

Are these changes in slot_getsysattr somehow independent of this feature?




Re: ExecAppendAsyncEventWait() in REL_14_STABLE can corrupt PG_exception_stack

2024-03-10 Thread Michael Paquier
On Mon, Feb 26, 2024 at 04:29:44PM +0900, Etsuro Fujita wrote:
> Will do.  (I was thinking you would get busy from now on.)

Fujita-san, have you been able to look at this thread?
--
Michael




Re: psql: add \create_function command

2024-03-10 Thread Steve Chavez
> Maybe we could go with :{+...} or the like?
> or maybe :{{ ... }}

Tab completion didn't work for :{?} and I noted that the same problem
would arise for :{+ or :{{ (and tab completion would be more important
here). So I fixed that on:

https://www.postgresql.org/message-id/flat/CAGRrpzZU48F2oV3d8eDLr=4tu9xfh5jt9ed+qu1+x91gmh6...@mail.gmail.com

Would be great to have the above fix reviewed/committed to keep making
progress here.

Besides that, since :{ is already sort of a prefix for psql functions, how
about having `:{file()}`? That would be clearer than :{+ or :{{.
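
To make that concrete, a purely hypothetical sketch of how such a syntax
could read (the spelling `:{file(...)}` is exactly what is being debated
here; nothing is implemented):

    \set body :{file(/path/to/fn_body.sql)}
    CREATE FUNCTION my_func() RETURNS text LANGUAGE sql AS :'body';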

Best regards,
Steve Chavez

On Mon, 29 Jan 2024 at 12:29, Pavel Stehule  wrote:

>
>
> po 29. 1. 2024 v 18:11 odesílatel Tom Lane  napsal:
>
>> Steve Chavez  writes:
>> > However, :{?variable_name} is already taken by psql to test whether a
>> > variable is defined or not. It might be confusing to use the same
>> syntax.
>>
>> Hmm.  Maybe we could go with :{+...} or the like?
>>
>> > How about using the convention of interpreting an identifier as a file
>> path
>> > if it has an slash on it?
>>
>> Sorry, that is just horrid.  foo/bar means division, and "foo/bar"
>> is simply an identifier per SQL standard, so you can't squeeze that
>> in without breaking an ocean of stuff.  Plus, there are many use-cases
>> where there's no reason to put a slash in a relative filename.
>>
>
> sometimes paths starts by $ or .
>
> or maybe :{{ ... }}
>
>
>
>>
>> regards, tom lane
>>
>


Re: Invalid query generated by postgres_fdw with UNION ALL and ORDER BY

2024-03-10 Thread David Rowley
On Fri, 8 Mar 2024 at 23:14, Richard Guo  wrote:
> I've looked at this patch a bit.  I once wondered why we don't check
> pathkey->pk_eclass->ec_has_const with EC_MUST_BE_REDUNDANT to see if the
> pathkey is not needed.  Then I realized that a child member would not be
> marked as constant even if the child expr is a Const, as explained in
> add_child_rel_equivalences().

This situation where the child member is a Const but the parent isn't
is unique to UNION ALL queries.  The only other cases where we have
child members are with partitioned and inheritance tables. In those
cases, the parent member just maps to the equivalent child member
replacing parent Vars with the corresponding child Var according to
the column mapping between the parent and child. It might be nice if
partitioning supported mapping to a Const as in many cases that could
save storing the same value in the table every time, but we don't
support that, so this can only happen with UNION ALL queries.

> BTW, I wonder if it is possible that we have a pseudoconstant expression
> that is not of type Const.  In such cases we would need to check
> 'bms_is_empty(pull_varnos(em_expr))' instead of 'IsA(em_expr, Const)'.
> However, I'm unable to think of such an expression in this context.

I can't see how there'd be any problems with a misinterpretation of a
pseudoconstant value as an ordinal column position on the remote
server. Surely it's only actual "Const" node types for which we're
going to call the type's output function, which risks yielding a
string of digits and the remote server thinking that we must mean an
ordinal column position.
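
For concreteness, roughly the kind of query shape being discussed (the
foreign tables ft1/ft2 are made up): one UNION ALL branch exposes an integer
constant, so the child pathkey member for that branch is a Const whose output
is a string of digits.

    SELECT a, 1 AS lvl FROM ft1
    UNION ALL
    SELECT a, 2 AS lvl FROM ft2
    ORDER BY lvl;
    -- naively deparsing that child sort key would send "ORDER BY 1" to the
    -- remote server, which would read it as an ordinal column position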

> The patch looks good to me otherwise.

Thanks

David




Re: Failures in constraints regression test, "read only 0 of 8192 bytes"

2024-03-10 Thread Thomas Munro
On Mon, Mar 11, 2024 at 9:59 AM Thomas Munro  wrote:
> On Mon, Mar 11, 2024 at 9:30 AM Heikki Linnakangas  wrote:
> > Hmm, I'm not sure if we need even smgrreleaseall() here anymore. It's
> > not required for correctness AFAICS. We don't do it in single-rel
> > invalidation in RelationCacheInvalidateEntry() either.
>
> I think we do, because we have missed sinval messages.  It's unlikely
> but a relfilenode might have been recycled, and we might have file
> descriptors that point to the unlinked files.  That is, there are new
> files with the same names and we need to open those ones.

... though I think you would be right if Dilip and Robert had
succeeded in their quest to introduce 56-bit non-cycling relfilenodes.
And for the record, we can also shoot ourselves in the foot in another
known case without sinval[1], so more work is needed here, but that
doesn't mean this sinval code path should also aim footwards.

[1] 
https://www.postgresql.org/message-id/flat/CA%2BhUKGLs554tQFCUjv_vn7ft9Xv5LNjPoAd--3Df%2BJJKJ7A8kw%40mail.gmail.com#f099d68e95edcfe408818447d9da04a7




Re: Failures in constraints regression test, "read only 0 of 8192 bytes"

2024-03-10 Thread Thomas Munro
On Mon, Mar 11, 2024 at 9:30 AM Heikki Linnakangas  wrote:
> Barring objections, I'll commit the attached.

+1

I guess the comment for smgrreleaseall() could also be updated.  It
mentions only PROCSIGNAL_BARRIER_SMGRRELEASE, but I think sinval
overflow (InvalidateSystemCaches()) should also be mentioned?

> Hmm, I'm not sure if we need even smgrreleaseall() here anymore. It's
> not required for correctness AFAICS. We don't do it in single-rel
> invalidation in RelationCacheInvalidateEntry() either.

I think we do, because we have missed sinval messages.  It's unlikely
but a relfilenode might have been recycled, and we might have file
descriptors that point to the unlinked files.  That is, there are new
files with the same names and we need to open those ones.




Re: Failures in constraints regression test, "read only 0 of 8192 bytes"

2024-03-10 Thread Heikki Linnakangas

On 10/03/2024 11:20, Heikki Linnakangas wrote:

On 10/03/2024 08:23, Thomas Munro wrote:

On Sun, Mar 10, 2024 at 6:48 PM Thomas Munro  wrote:

I won't be surprised if the answer is: if you're holding a reference,
you have to get a pin (referring to bulk_write.c).


Ahhh, on second thoughts, I take that back, I think the original
theory still actually works just fine.  It's just that somewhere in
our refactoring of that commit, when we were vacillating between
different semantics for 'destroy' and 'release', I think we made a
mistake: in RelationCacheInvalidate() I think we should now call
smgrreleaseall(), not smgrdestroyall().


Yes, I ran the reproducer with 'rr', and came to the same conclusion.
That smgrdestroyall() call closes and destroys the SmgrRelation, breaking
the new assumption that an unpinned SmgrRelation is valid until end of
transaction.


That satisfies the
requirements for sinval queue overflow: we close md.c segments (and
most importantly virtual file descriptors), so our lack of sinval
records can't hurt us, we'll reopen all files as required.  That's
what CLOBBER_CACHE_ALWAYS is effectively testing (and more).  But the
smgr pointer remains valid, and retains only its "identity", eg hash
table key, and that's also fine.  It won't be destroyed until after
the end of the transaction.  Which was the point, and it allows things
like bulk_write.c (and streaming_read.c) to hold an smgr reference.
Right


+1.

I wonder if we can now simplify RelationCacheInvalidate() further. In
the loop above that smgrdestroyall():


while ((idhentry = (RelIdCacheEnt *) hash_seq_search()) != NULL)
{
    relation = idhentry->reldesc;

    /* Must close all smgr references to avoid leaving dangling ptrs */
    RelationCloseSmgr(relation);

    /*
     * Ignore new relations; no other backend will manipulate them before
     * we commit.  Likewise, before replacing a relation's relfilelocator,
     * we shall have acquired AccessExclusiveLock and drained any
     * applicable pending invalidations.
     */
    if (relation->rd_createSubid != InvalidSubTransactionId ||
        relation->rd_firstRelfilelocatorSubid != InvalidSubTransactionId)
        continue;



I don't think we need to call RelationCloseSmgr() for relations created
in the same transaction anymore.


Barring objections, I'll commit the attached.

Hmm, I'm not sure if we need even smgrreleaseall() here anymore. It's 
not required for correctness AFAICS. We don't do it in single-rel 
invalidation in RelationCacheInvalidateEntry() either.


--
Heikki Linnakangas
Neon (https://neon.tech)
From be03591a4c69224de5b96335aa4a29b3462e2bc9 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas 
Date: Sun, 10 Mar 2024 22:18:59 +0200
Subject: [PATCH 1/1] Don't destroy SMgrRelations at relcache invalidation

With commit 21d9c3ee4e, SMgrRelations remain valid until end of
transaction (or longer if they're "pinned"). Relcache invalidation can
happen in the middle of a transaction, so we must not destroy them at
relcache invalidation anymore.

This was revealed by failures in the 'constraints' test in buildfarm
animals using -DCLOBBER_CACHE_ALWAYS. That started failing with commit
8af2565248, which was the first commit that started to rely on an
SMgrRelation living until end of transaction.

Diagnosed-by: Tomas Vondra, Thomas Munro
Discussion: https://www.postgresql.org/message-id/CA%2BhUKGK%2B5DOmLaBp3Z7C4S-Yv6yoROvr1UncjH2S1ZbPT8D%2BZg%40mail.gmail.com
---
 src/backend/utils/cache/relcache.c | 10 +++---
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 7ad6a35a50e..2cd19d603fb 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -2985,9 +2985,6 @@ RelationCacheInvalidate(bool debug_discard)
 	{
 		relation = idhentry->reldesc;
 
-		/* Must close all smgr references to avoid leaving dangling ptrs */
-		RelationCloseSmgr(relation);
-
 		/*
 		 * Ignore new relations; no other backend will manipulate them before
 		 * we commit.  Likewise, before replacing a relation's relfilelocator,
@@ -3039,11 +3036,10 @@ RelationCacheInvalidate(bool debug_discard)
 	}
 
 	/*
-	 * Now zap any remaining smgr cache entries.  This must happen before we
-	 * start to rebuild entries, since that may involve catalog fetches which
-	 * will re-open catalog files.
+	 * We cannot destroy the SMgrRelations as there might still be references
+	 * to them, but close the underlying file descriptors.
 	 */
-	smgrdestroyall();
+	smgrreleaseall();
 
 	/* Phase 2: rebuild the items found to need rebuild in phase 1 */
 	foreach(l, rebuildFirstList)
-- 
2.39.2



Re: Statistics Import and Export

2024-03-10 Thread Corey Huinker
On Sun, Mar 10, 2024 at 11:57 AM Bharath Rupireddy <
bharath.rupireddyforpostg...@gmail.com> wrote:

> On Fri, Mar 8, 2024 at 12:06 PM Corey Huinker 
> wrote:
> >
> > Anyway, here's v7. Eagerly awaiting feedback.
>
> Thanks for working on this. It looks useful to have the ability to
> restore the stats after upgrade and restore. But the exported stats
> are valid only until the next ANALYZE is run on the table. IIUC,
> postgres collects stats during VACUUM, autovacuum and ANALYZE, right?
> Perhaps there are other ways to collect stats. I'm wondering what
> problems the user faces if they are just asked to run ANALYZE on
> the tables (I'm assuming ANALYZE doesn't block concurrent access to
> the tables) instead of automatically exporting stats.
>

Correct. These are just as temporary as any other analyze of the table.
Another analyze will happen later, probably through autovacuum, and will wipe
out these values. This is designed to QUICKLY get stats into a table to enable
the database to be operational sooner. This is especially important after
an upgrade/restore, when all stats were wiped out. Other uses could be
adapting this for use by postgres_fdw, so that we don't have to do table
sampling on the remote table, and of course statistics injection to test
the query planner.


> 2.
> +they are replaced by the next auto-analyze. This function is used
> by
> +pg_upgrade and pg_restore to
> +convey the statistics from the old system version into the new
> one.
> +   
>
> Is there any demonstration of pg_set_relation_stats and
> pg_set_attribute_stats being used either in pg_upgrade or in
> pg_restore? Perhaps, having them as 0002, 0003 and so on patches might
> show real need for functions like this. It also clarifies how these
> functions pull stats from tables on the old cluster to the tables on
> the new cluster.
>

That code was adapted from do_analyze(), and yes, there is a patch for
pg_dump, but as I noted earlier it is on hold pending feedback.


>
> 3. pg_set_relation_stats and pg_set_attribute_stats seem to be writing
> to pg_class and might affect the plans as stats can get tampered. Can
> we REVOKE the execute permissions from the public out of the box in
> src/backend/catalog/system_functions.sql? This way one can decide who
> to give permissions to.
>

You'd have to be the table owner to alter the stats. I can envision these
functions getting a special role, but they could also be fine as
superuser-only.


>
> 4.
> +SELECT pg_set_relation_stats('stats_export_import.test'::regclass,
> 3.6::float4, 15000);
> + pg_set_relation_stats
> +---
> + t
> +(1 row)
> +
> +SELECT reltuples, relpages FROM pg_class WHERE oid =
> 'stats_export_import.test'::regclass;
> + reltuples | relpages
> +---+--
> +   3.6 |15000
>
> Isn't this test case showing a misuse of these functions? Table
> actually has no rows, but we are lying to the postgres optimizer on
> stats.


Consider this case. You want to know at what point the query planner will
start using a given index. You can generate dummy data for a thousand, a
million, a billion rows, and wait for that to complete, or you can just
tell the table "I say you have a billion rows, twenty million pages, etc"
and see when it changes.

But again, in most cases, you're setting the values to the same values the
table had on the old database just before the restore/upgrade.
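
As a rough sketch of that workflow, using the pg_set_relation_stats()
signature from the v7 patch (the table name and numbers are made up):

  CREATE TABLE planner_probe (id int PRIMARY KEY, payload text);

  -- Claim a billion rows spread over twenty million pages, without
  -- generating any data.
  SELECT pg_set_relation_stats('planner_probe'::regclass,
                               1000000000::float4, 20000000);

  -- See whether the planner now picks the index.
  EXPLAIN SELECT * FROM planner_probe WHERE id = 42;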


> I think altering stats of a table mustn't be that easy for the
> end user.


Only easy for the end users that happen to be the table owner or a
superuser.


> As mentioned in comment #3, permissions need to be
> tightened. In addition, we can also mark the functions as pg_upgrade-only
> with CHECK_IS_BINARY_UPGRADE, but that might not work for pg_restore
> (or I don't know if we have a way to know within the server that the
> server is running for pg_restore).
>

I think they will be useful both in postgres_fdw and for tuning.


>
> 5. In continuation of comment #2, is pg_dump supposed to generate
> pg_set_relation_stats and pg_set_attribute_stats statements for each
> table? When pg_dump does that, pg_restore can automatically load the
> stats.
>

The current plan is to have one TOC entry in the post-data section with a
dependency on the table/index/matview. That lets us leverage existing
filters. The TOC entry will have a series of statements in it, one
pg_set_relation_stats() and one pg_set_attribute_stats() per attribute.


> 9. Isn't it good to add a test case where the plan of a query on a table
> after exporting the stats would remain the same as that of the original
> table from which the stats are exported? IMO, this is more realistic
> than just comparing pg_statistic of the tables because this is what an
> end-user wants eventually.
>

I'm sure we can add something like that, but query plan formats change a
lot and are greatly dependent on database configuration, so maintaining
such a test would be a lot of work.


Re: pg_dump include/exclude fails to error

2024-03-10 Thread Magnus Hagander
On Sun, Mar 10, 2024 at 4:51 PM Pavel Stehule  wrote:
>
> Hi
>
> ne 10. 3. 2024 v 15:23 odesílatel Magnus Hagander  
> napsal:
>>
>> When including tables with the new pg_dump functionality, it fails to
>> error out if a table is missing, but only if more than one table is
>> specified.
>>
>> E.g., if table foo exist, but not bar:
>>
>> pg_dump --table bar
>> pg_dump: error: no matching tables were found
>>
>> with file "myfilter" containing just "table bar"
>> pg_dump --filter myfilter
>> pg_dump: error: no matching tables were found
>>
>> with the file "myfilter" containing both "table foo" and "table bar"
>> (order doesn't matter):
>> 
>
>
> Is this not the expected behaviour (consistent with the -t option)?
>
> (2024-03-10 16:48:07) postgres=# \dt
> List of relations
> ┌┬──┬───┬───┐
> │ Schema │ Name │ Type  │ Owner │
> ╞╪══╪═══╪═══╡
> │ public │ foo  │ table │ pavel │
> └┴──┴───┴───┘
> (1 row)
>
> pavel@nemesis:~/src/orafce$ /usr/local/pgsql/master/bin/pg_dump -t foo -t boo 
> > /dev/null
> pavel@nemesis:~/src/orafce$
>
> If you want to raise an error, you should use the --strict-names option.
>
> pavel@nemesis:~/src/orafce$ /usr/local/pgsql/master/bin/pg_dump -t foo -t boo 
> --strict-names > /dev/null
> pg_dump: error: no matching tables were found for pattern "boo"

Hmpf, you're right. I guess I don't use multiple-dash-t often enough
:) So yeah, then I agree this is probably the right behaviour. Maybe
the docs for --filter deserve a specific mention of these rules
though, as it's going to be a lot more common to specify multiples
when using --filter?

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/




Re: Confine vacuum skip logic to lazy_scan_skip

2024-03-10 Thread Melanie Plageman
On Wed, Mar 6, 2024 at 6:47 PM Melanie Plageman
 wrote:
>
> Performance results:
>
> The TL;DR of my performance results is that streaming read vacuum is
> faster. However there is an issue with the interaction of the streaming
> read code and the vacuum buffer access strategy which must be addressed.

I have investigated the interaction between
maintenance_io_concurrency, streaming reads, and the vacuum buffer
access strategy (BAS_VACUUM).

The streaming read API limits max_pinned_buffers to a pinned buffer
multiplier (currently 4) * maintenance_io_concurrency buffers with the
goal of constructing reads of at least MAX_BUFFERS_PER_TRANSFER size.

Since the BAS_VACUUM ring buffer is size 256 kB or 32 buffers with
default block size, that means that for a fully uncached vacuum in
which all blocks must be vacuumed and will be dirtied, you'd have to
set maintenance_io_concurrency at 8 or lower to see the same number of
reuses (and shared buffer consumption) as master.

Given that we allow users to specify BUFFER_USAGE_LIMIT to vacuum, it
seems like we should force max_pinned_buffers to a value that
guarantees the expected shared buffer usage by vacuum. But that means
that maintenance_io_concurrency does not have a predictable impact on
streaming read vacuum.

What is the right thing to do here?

At the least, the default size of the BAS_VACUUM ring buffer should be
BLCKSZ * pinned_buffer_multiplier * default maintenance_io_concurrency
(probably rounded up to the next power of two) bytes.
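
As a rough worked example with the current defaults (BLCKSZ = 8192, pinned
buffer multiplier = 4, maintenance_io_concurrency = 10):

  8192 bytes * 4 * 10 = 327,680 bytes (~320 kB), rounded up to 512 kB

i.e. double the current 256 kB BAS_VACUUM default.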

- Melanie




Re: UUID v7

2024-03-10 Thread Andrey M. Borodin


> On 10 Mar 2024, at 17:59, Andrey M. Borodin  wrote:
> 
> I tried to "make docs", but it gives me a gazillion errors... Is there an
> easy way to see the resulting HTML?

Oops, CFbot expectedly found a problem...
Sorry for the noise, this version, I hope, will pass all the tests.
Thanks!


Best regards, Andrey Borodin.


v19-0001-Implement-UUID-v7.patch
Description: Binary data


Re: Statistics Import and Export

2024-03-10 Thread Bharath Rupireddy
On Fri, Mar 8, 2024 at 12:06 PM Corey Huinker  wrote:
>
> Anyway, here's v7. Eagerly awaiting feedback.

Thanks for working on this. It looks useful to have the ability to
restore the stats after upgrade and restore. But the exported stats
are valid only until the next ANALYZE is run on the table. IIUC,
postgres collects stats during VACUUM, autovacuum and ANALYZE, right?
Perhaps there are other ways to collect stats. I'm wondering what
problems the user faces if they are just asked to run ANALYZE on
the tables (I'm assuming ANALYZE doesn't block concurrent access to
the tables) instead of automatically exporting stats.

Here are some comments on the v7 patch. I've not dived into the whole
thread, so some comments may be of type repeated or need
clarification. Please bear with me.

1. The following two are unnecessary changes in pg_proc.dat, please remove them.

--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -8818,7 +8818,6 @@
 { oid => '3813', descr => 'generate XML text node',
   proname => 'xmltext', proisstrict => 't', prorettype => 'xml',
   proargtypes => 'text', prosrc => 'xmltext' },
-
 { oid => '2923', descr => 'map table contents to XML',
   proname => 'table_to_xml', procost => '100', provolatile => 's',
   proparallel => 'r', prorettype => 'xml',
@@ -12163,8 +12162,24 @@

 # GiST stratnum implementations
 { oid => '8047', descr => 'GiST support',
-  proname => 'gist_stratnum_identity', prorettype => 'int2',
+  proname => 'gist_stratnum_identity',prorettype => 'int2',
   proargtypes => 'int2',
   prosrc => 'gist_stratnum_identity' },

2.
+they are replaced by the next auto-analyze. This function is used by
+pg_upgrade and pg_restore to
+convey the statistics from the old system version into the new one.
+   

Is there any demonstration of pg_set_relation_stats and
pg_set_attribute_stats being used either in pg_upgrade or in
pg_restore? Perhaps, having them as 0002, 0003 and so on patches might
show real need for functions like this. It also clarifies how these
functions pull stats from tables on the old cluster to the tables on
the new cluster.

3. pg_set_relation_stats and pg_set_attribute_stats seem to be writing
to pg_class and might affect the plans as stats can get tampered. Can
we REVOKE the execute permissions from the public out of the box in
src/backend/catalog/system_functions.sql? This way one can decide who
to give permissions to.
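
For example, something like this could be added there (only a sketch; the
argument types follow the v7 patch's test and may not match the final
signature):

  REVOKE EXECUTE ON FUNCTION
      pg_set_relation_stats(regclass, real, integer)
      FROM PUBLIC;
  -- and similarly for pg_set_attribute_stats(), whose full argument list
  -- is omitted here.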

4.
+SELECT pg_set_relation_stats('stats_export_import.test'::regclass,
3.6::float4, 15000);
+ pg_set_relation_stats
+---
+ t
+(1 row)
+
+SELECT reltuples, relpages FROM pg_class WHERE oid =
'stats_export_import.test'::regclass;
+ reltuples | relpages
+---+--
+   3.6 |15000

Isn't this test case showing a misuse of these functions? Table
actually has no rows, but we are lying to the postgres optimizer on
stats. I think altering stats of a table mustn't be that easy for the
end user. As mentioned in comment #3, permissions need to be
tightened. In addition, we can also mark the functions as pg_upgrade-only
with CHECK_IS_BINARY_UPGRADE, but that might not work for pg_restore
(or I don't know if we have a way to know within the server that the
server is running for pg_restore).

5. In continuation of comment #2, is pg_dump supposed to generate
pg_set_relation_stats and pg_set_attribute_stats statements for each
table? When pg_dump does that, pg_restore can automatically load the
stats.

6.
+/*-
+ * statistics.c
+ *
+ * IDENTIFICATION
+ *   src/backend/statistics/statistics.c
+ *
+ *-

A description of what the new file statistics.c does is missing.

7. pgindent isn't happy with new file statistics.c, please check.

8.
+/*
+ * Import statistics for a given relation attribute
+ *
+ * pg_set_attribute_stats(relation Oid, attname name, stainherit bool,
+ *stanullfrac float4, stawidth int, stadistinct float4,

Having function definition in the function comment isn't necessary -
it's hard to keep it consistent with pg_proc.dat in future. If
required, one can either look at pg_proc.dat or docs.

9. Isn't it good to add a test case where the plan of a query on a table
after exporting the stats would remain the same as that of the original
table from which the stats are exported? IMO, this is more realistic
than just comparing pg_statistic of the tables because this is what an
end-user wants eventually.
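
Something along these lines, perhaps (hypothetical tables, sketch only):

  -- t_copy received its stats from t_orig via pg_set_relation_stats() /
  -- pg_set_attribute_stats(); the two plans are expected to match.
  EXPLAIN (COSTS OFF) SELECT * FROM t_orig WHERE id < 1000;
  EXPLAIN (COSTS OFF) SELECT * FROM t_copy WHERE id < 1000;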

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: pg_dump include/exclude fails to error

2024-03-10 Thread Pavel Stehule
Hi

ne 10. 3. 2024 v 15:23 odesílatel Magnus Hagander 
napsal:

> When including tables with the new pg_dump functionality, it fails to
> error out if a table is missing, but only if more than one table is
> specified.
>
> E.g., if table foo exist, but not bar:
>
> pg_dump --table bar
> pg_dump: error: no matching tables were found
>
> with file "myfilter" containing just "table bar"
> pg_dump --filter myfilter
> pg_dump: error: no matching tables were found
>
> with the file "myfilter" containing both "table foo" and "table bar"
> (order doesn't matter):
> 
>

Is this not the expected behaviour (consistent with the -t option)?

(2024-03-10 16:48:07) postgres=# \dt
List of relations
┌┬──┬───┬───┐
│ Schema │ Name │ Type  │ Owner │
╞╪══╪═══╪═══╡
│ public │ foo  │ table │ pavel │
└┴──┴───┴───┘
(1 row)

pavel@nemesis:~/src/orafce$ /usr/local/pgsql/master/bin/pg_dump -t foo -t
boo > /dev/null
pavel@nemesis:~/src/orafce$

If you want to raise an error, you should use the --strict-names option.

pavel@nemesis:~/src/orafce$ /usr/local/pgsql/master/bin/pg_dump -t foo -t
boo --strict-names > /dev/null
pg_dump: error: no matching tables were found for pattern "boo"

Regards

Pavel


>
> Not having looked into the code, but it looks to me like some variable
> isn't properly reset, or perhaps there is a check for existence rather
> than count?
>
> //Magnus
>


Re: remaining sql/json patches

2024-03-10 Thread jian he
one more issue.
+ case JSON_VALUE_OP:
+ /* Always omit quotes from scalar strings. */
+ jsexpr->omit_quotes = (func->quotes == JS_QUOTES_OMIT);
+
+ /* JSON_VALUE returns text by default. */
+ if (!OidIsValid(jsexpr->returning->typid))
+ {
+ jsexpr->returning->typid = TEXTOID;
+ jsexpr->returning->typmod = -1;
+ }

By default, after makeNode(JsonExpr) node initialization,
jsexpr->omit_quotes will be initialized to false.
Even though this had no impact on the JSON_TABLE patch (probably
because of coerceJsonFuncExprOutput), all tests still passed.
Based on the above comment and the regress test, I think you still need to do
`
jsexpr->omit_quotes = true;
`
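
For reference, a quick sketch of the behaviour expected here with the
in-progress grammar (scalar strings should come back unquoted by default):

  SELECT JSON_VALUE('"abc"'::jsonb, '$');
  -- expected: abc   (not "abc")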




Re: pg_dump include/exclude fails to error

2024-03-10 Thread Daniel Gustafsson
> On 10 Mar 2024, at 15:22, Magnus Hagander  wrote:

> Not having looked into the code, but it looks to me like some variable
> isn't properly reset, or perhaps there is a check for existence rather
> than count?

Thanks for the report, I'll have a look later today when back.

--
Daniel Gustafsson





Re: [DOC] Add detail regarding resource consumption wrt max_connections

2024-03-10 Thread Michael Banck
Hi,

On Sun, Mar 10, 2024 at 09:58:25AM -0400, Robert Treat wrote:
> On Fri, Mar 8, 2024 at 10:47 AM Michael Banck  wrote:
> > On Fri, Jan 12, 2024 at 10:14:38PM +, Cary Huang wrote:
> > > I think it is good to warn the user about the increased allocation of
> > > memory for certain parameters so that they do not abuse it by setting
> > > it to a huge number without knowing the consequences.
> >
> > Right, and I think it might be useful to log (i.e. at LOG not DEBUG3
> > level, with a nicer message) the amount of memory we allocate on
> > startup, that is just one additional line per instance lifetime but
> > might be quite useful to admins. Or maybe two lines if we log whether we
> > could allocate it as huge pages or not as well:
> >
> > |2024-03-08 16:46:13.117 CET [237899] DEBUG:  invoking 
> > IpcMemoryCreate(size=145145856)
> > |2024-03-08 16:46:13.117 CET [237899] DEBUG:  mmap(146800640) with 
> > MAP_HUGETLB failed, huge pages disabled: Cannot allocate memory
> >
> 
> If we were going to add these details (and I very much like the idea),
> I would advocate that we put it somewhere more permanent than a single
> log entry at start-up. Given that database up-times easily run months
> and sometimes years, it is hard to imagine we'd always have access to
> the log files to figure this out on any actively running systems.

Well actually, those two numbers are already available at runtime, via
the shared_memory_size and (from 17 on) huge_pages_status GUCs.

So this would be geared at admins who keep their logs in long-term storage
and want to know what the numbers were a while ago. Maybe it is not that
interesting, but I think one or two lines at startup would not hurt.
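
For example:

  SHOW shared_memory_size;
  SHOW huge_pages_status;   -- available from 17 on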


Michael




pg_dump include/exclude fails to error

2024-03-10 Thread Magnus Hagander
When including tables with the new pg_dump functionality, it fails to
error out if a table is missing, but only if more than one table is
specified.

E.g., if table foo exist, but not bar:

pg_dump --table bar
pg_dump: error: no matching tables were found

with file "myfilter" containing just "table bar"
pg_dump --filter myfilter
pg_dump: error: no matching tables were found

with the file "myfilter" containing both "table foo" and "table bar"
(order doesn't matter):


Not having looked into the code, but it looks to me like some variable
isn't properly reset, or perhaps there is a check for existence rather
than count?

//Magnus




Re: [DOC] Add detail regarding resource consumption wrt max_connections

2024-03-10 Thread Robert Treat
On Fri, Mar 8, 2024 at 10:47 AM Michael Banck  wrote:
>
> Hi,
>
> On Fri, Jan 12, 2024 at 10:14:38PM +, Cary Huang wrote:
> > I think it is good to warn the user about the increased allocation of
> > memory for certain parameters so that they do not abuse it by setting
> > it to a huge number without knowing the consequences.
>
> Right, and I think it might be useful to log (i.e. at LOG not DEBUG3
> level, with a nicer message) the amount of memory we allocate on
> startup, that is just one additional line per instance lifetime but
> might be quite useful to admins. Or maybe two lines if we log whether we
> could allocate it as huge pages or not as well:
>
> |2024-03-08 16:46:13.117 CET [237899] DEBUG:  invoking 
> IpcMemoryCreate(size=145145856)
> |2024-03-08 16:46:13.117 CET [237899] DEBUG:  mmap(146800640) with 
> MAP_HUGETLB failed, huge pages disabled: Cannot allocate memory
>

If we were going to add these details (and I very much like the idea),
I would advocate that we put it somewhere more permanent than a single
log entry at start-up. Given that database up-times easily run months
and sometimes years, it is hard to imagine we'd always have access to
the log files to figure this out on any actively running systems.

Robert Treat
https://xzilla.net




Re: UUID v7

2024-03-10 Thread Andrey M. Borodin
Hi Peter,

thank you for so thoughtful review.

> On 6 Mar 2024, at 12:13, Peter Eisentraut  wrote:
> 
> I have various comments on this patch:
> 
> 
> - doc/src/sgml/func.sgml
> 
> The documentation of the new functions should be broken up a bit.
> It's all one paragraph now.  At least make it several paragraphs, or
> possibly tables or something else.
I've split functions to generate UUIDs from functions to extract stuff.

> 
> Avoid listing the functions twice: Once before the description and
> then again in the description.  That's just going to get out of date.
> The first listing is not necessary, I think.

Fixed.

> The return values in the documentation should use the public-facing
> type names, like "timestamp with time zone" and "smallint".

Fixed.

> The descriptions of the UUID generation functions use handwavy
> language in their descriptions, like "It provides much better data
> locality" or "unacceptable security or business intelligence
> implications", which isn't useful.  Either we cut that all out and
> just say, it creates a UUIDv7, done, look elsewhere for more
> information, or we provide some more concretely useful details.

I've removed all that stuff entirely.

> We shouldn't label a link as "IETF standard" when it's actually a
> draft.

Fixed.

Well, all my modifications of the documentation are kind of blind... I tried to
"make docs", but it gives me a gazillion errors... Is there an easy way to see
the resulting HTML?


> - src/include/catalog/pg_proc.dat
> 
> The description of uuidv4 should be "generate UUID version 4", so that
> it parallels uuidv7.

Fixed.

> The description of uuid_extract_time says 'extract timestamp from UUID
> version 7', the implementation is not limited to version 7.

Fixed.

> I think uuid_extract_time should be named uuid_extract_timestamp,
> because it extracts a timestamp, not a time.

Renamed.

> The functions uuid_extract_ver and uuid_extract_var could be named
> uuid_extract_version and uuid_extract_variant.  Otherwise, it's hard
> to tell them apart, with only one letter different.

Renamed.
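
With the renamed functions, usage would look roughly like this (a sketch
against the attached patch, not a released server):

  SELECT uuidv7();
  SELECT uuid_extract_timestamp(uuidv7());  -- timestamp with time zone
  SELECT uuid_extract_version(uuidv7());    -- 7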

> - src/test/regress/sql/uuid.sql
> 
> Why are the tests using the input format '{...}', which is not the
> standard one?

Fixed.

> - src/backend/utils/adt/uuid.c
> 
> All this new code should have more comments.  There is a lot of bit
> twiddling going on, and I suppose one is expected to follow along in
> the RFC?  At least each function should have a header comment, so one
> doesn't have to check in pg_proc.dat what it's supposed to do.

I've added some header comment. One big comment is attached to v7, I tried to 
take parts mostly from RFC. Yet there are a lot of my additions that now need 
review...

> I'm suspicious that these functions all appear to return null for
> erroneous input, rather than raising errors.  I think at least some
> explanation for this should be recorded somewhere.

The input is not erroneous per se.
But the fact that
# select 1/0;
ERROR:  division by zero
makes me consider throwing an error. There was some argumentation upthread for
not throwing an error, but now I cannot find it... maybe I accepted this
behaviour as more user-friendly.

> I think the behavior of uuid_extract_var(iant) is wrong.  The code
> takes just two bits to return, but the draft document is quite clear
> that the variant is 4 bits (see Table 1).

Well, it was correct only for implemented variant. I've made version that 
implements full table 1 from section 4.1.

> The uuidv7 function could really use a header comment that explains
> the choices that were made.  The RFC draft provides various options
> that implementations could use; we should describe which ones we
> chose.

Done.

> 
> I would have expected that, since gettimeofday() provides microsecond
> precision, we'd put the extra precision into "rand_a" as per Section 6.2 
> method 3.

I chose method 2 over method 3 as the most portable. Can we be sure how many
bits (beyond the milliseconds) are actually available across different OSes?
Even if we put an extra 10 bits of timestamp in, we cannot safely extract them.
Those bits could improve inter-backend sortability, i.e. when many backends
generate data fast, the data is still somewhat ordered even within 1ms. But I
think the benefits of this sortability are outweighed by portability (unknown
real resolution) and simplicity (we don't store microseconds, thus do not try
to extract them).
All these arguments are weak, but if one method were strictly better than
another, there would be only one method.

> 
> You use some kind of counter, but could you explain which method that
> counter implements?
I described counter in uuidv7() header.

> 
> I don't see any acknowledgment of issues relating to concurrency or
> restarts.  Like, how do we prevent duplicates being generated by
> concurrent sessions or between restarts?  Maybe the counter or random
> stuff does that, but it's not explained.

I think restart takes more than 1ms, so this is covered with time 

Re: Failures in constraints regression test, "read only 0 of 8192 bytes"

2024-03-10 Thread Heikki Linnakangas

Thanks for diagnosing this!

On 10/03/2024 08:23, Thomas Munro wrote:

On Sun, Mar 10, 2024 at 6:48 PM Thomas Munro  wrote:

I won't be surprised if the answer is: if you're holding a reference,
you have to get a pin (referring to bulk_write.c).


Ahhh, on second thoughts, I take that back, I think the original
theory still actually works just fine.  It's just that somewhere in
our refactoring of that commit, when we were vacillating between
different semantics for 'destroy' and 'release', I think we made a
mistake: in RelationCacheInvalidate() I think we should now call
smgrreleaseall(), not smgrdestroyall().


Yes, I ran the reproducer with 'rr', and came to the same conclusion. 
That smgrdestroyall() call closes and destroys the SmgrRelation, breaking 
the new assumption that an unpinned SmgrRelation is valid until end of 
transaction.



That satisfies the
requirements for sinval queue overflow: we close md.c segments (and
most importantly virtual file descriptors), so our lack of sinval
records can't hurt us, we'll reopen all files as required.  That's
what CLOBBER_CACHE_ALWAYS is effectively testing (and more).  But the
smgr pointer remains valid, and retains only its "identity", eg hash
table key, and that's also fine.  It won't be destroyed until after
the end of the transaction.  Which was the point, and it allows things
like bulk_write.c (and streaming_read.c) to hold an smgr reference.
Right


+1.

I wonder if we can now simplify RelationCacheInvalidate() further. In 
the loop above that smgrdestroyall():



while ((idhentry = (RelIdCacheEnt *) hash_seq_search()) != NULL)
{
    relation = idhentry->reldesc;

    /* Must close all smgr references to avoid leaving dangling ptrs */
    RelationCloseSmgr(relation);

    /*
     * Ignore new relations; no other backend will manipulate them before
     * we commit.  Likewise, before replacing a relation's relfilelocator,
     * we shall have acquired AccessExclusiveLock and drained any
     * applicable pending invalidations.
     */
    if (relation->rd_createSubid != InvalidSubTransactionId ||
        relation->rd_firstRelfilelocatorSubid != InvalidSubTransactionId)
        continue;



I don't think we need to call RelationCloseSmgr() for relations created 
in the same transaction anymore.


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Combine headerscheck and cpluspluscheck scripts

2024-03-10 Thread Peter Eisentraut

On 07.03.24 08:30, Michael Paquier wrote:

On Thu, Mar 07, 2024 at 01:37:36PM +1300, Thomas Munro wrote:

+1


Looking at the patch, nice cleanup.


Committed, thanks.