RE: Is this a problem in GenericXLogFinish()?

2024-02-04 Thread Hayato Kuroda (Fujitsu)
Dear Amit,

> 
> @@ -692,6 +697,9 @@ _hash_freeovflpage(Relation rel, Buffer bucketbuf,
> Buffer ovflbuf,
>   if (!xlrec.is_prev_bucket_same_wrt)
>   wbuf_flags |= REGBUF_NO_CHANGE;
>   XLogRegisterBuffer(1, wbuf, wbuf_flags);
> +
> + /* Track the registration status for later use */
> + wbuf_registered = true;
>   }
> 
>   XLogRegisterBuffer(2, ovflbuf, REGBUF_STANDARD);
> @@ -719,7 +727,12 @@ _hash_freeovflpage(Relation rel, Buffer
> bucketbuf, Buffer ovflbuf,
> 
>   recptr = XLogInsert(RM_HASH_ID, XLOG_HASH_SQUEEZE_PAGE);
> 
> - PageSetLSN(BufferGetPage(wbuf), recptr);
> + /* Set LSN to wbuf page buffer only when it is being registered */
> + if (wbuf_registered)
> + PageSetLSN(BufferGetPage(wbuf), recptr);
> 
> Why set LSN when the page is not modified (say when we use the flag
> REGBUF_NO_CHANGE)?  I think you need to use a flag mod_wbuf and set it
> in appropriate places during register buffer calls.

You are right. Based on the previous discussions, PageSetLSN() must be called
after MarkBufferDirty(). REGBUF_NO_CHANGE has been introduced to skip this
requirement. Definitely, no_change buffers must not be PageSetLSN()'d.
Other pages, e.g., metabuf, already follow this rule.

I updated the patch based on the requirement.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 



v2_avoid_registration.patch
Description: v2_avoid_registration.patch


Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

2024-02-04 Thread Dilip Kumar
On Sun, Feb 4, 2024 at 7:10 PM Alvaro Herrera  wrote:
>
> On 2024-Feb-02, Dilip Kumar wrote:
>
> > I have checked the patch and it looks fine to me other than the above
> > question related to memory barrier usage one more question about the
> > same, basically below to instances 1 and 2 look similar but in 1 you
> > are not using the memory write_barrier whereas in 2 you are using the
> > write_barrier, why is it so?  I mean why the reordering can not happen
> > in 1 and it may happen in 2?
>
> What I was thinking is that there's a lwlock operation just below, which
> acts as a barrier.  But I realized something more important: there are
> only two places that matter, which are SlruSelectLRUPage and
> SimpleLruZeroPage.  The others are all initialization code that run at a
> point where there's no going to be any concurrency in SLRU access, so we
> don't need barriers anyway.  In SlruSelectLRUPage we definitely don't
> want to evict the page that SimpleLruZeroPage has initialized, starting
> from the point where it returns that new page to its caller.
> But if you consider the code of those two routines, you realize that the
> only time an equality between latest_page_number and "this_page_number"
> is going to occur, is when both pages are in the same bank ... and both
> routines are required to be holding the bank lock while they run, so in
> practice this is never a problem.

Right, in fact when I first converted this 'latest_page_number' to an
atomic the thinking was to protect it from concurrently setting the
values in SimpleLruZeroPage() and also concurrently reading in
SlruSelectLRUPage() should not read the corrupted value.  All other
usages were during the initialization phase where we do not need any
protection.

>
> We need the atomic write and atomic read so that multiple processes
> processing pages in different banks can update latest_page_number
> simultaneously.  But the equality condition that we're looking for?
> it can never happen concurrently.

Yeah, that's right, after you told I also realized that the case is
protected by the bank lock.  Earlier I didn't think about this case.

> In other words, these barriers are fully useless.
>
> (We also have SimpleLruTruncate, but I think it's not as critical to
> have a barrier there anyhow: accessing a slightly outdated page number
> could only be a problem if a bug elsewhere causes us to try to truncate
> in the current page.  I think we only have this code there because we
> did have such bugs in the past, but IIUC this shouldn't happen anymore.)

+1, I agree with this theory in general.  But the below comment in
SimpleLruTruncate in your v3 patch doesn't seem correct.  Here we are
checking whether latest_page_number is smaller than the cutoff; if so,
we log it as a wraparound and skip the whole thing.  That is fine even
if we read the atomic variable and get a slightly outdated value.  But
the comment claims this is safe because we hold the same bank lock as
SimpleLruZeroPage(), which is not true here: we will be acquiring
different bank locks one by one based on which slotno we are checking.
Am I missing something?


+ * An important safety check: the current endpoint page must not be
+ * eligible for removal.  Like SlruSelectLRUPage, we don't need a
+ * memory barrier here because for the affected page to be relevant,
+ * we'd have to have the same bank lock as SimpleLruZeroPage.
  */
- if (ctl->PagePrecedes(shared->latest_page_number, cutoffPage))
+ if (ctl->PagePrecedes(pg_atomic_read_u64(&shared->latest_page_number),
+   cutoffPage))


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: cfbot does not list patches in 'Current commitfest'

2024-02-04 Thread Daniel Gustafsson
> On 5 Feb 2024, at 08:00, Richard Guo  wrote:
> 
> ... and the patches in CF #47 (currently open) are listed in 'Next
> commitfest'.  I guess we need to manually create a "next" commitfest?

I've added a year's worth of commitfests to the CF app, but I don't know how to
update the CFBot (it might just happen automagically).

--
Daniel Gustafsson





Re: pg_column_toast_chunk_id: a function to get a chunk ID of a TOASTed value

2024-02-04 Thread Yugo NAGATA
On Thu, 1 Feb 2024 17:59:56 +0800
jian he  wrote:

> On Thu, Feb 1, 2024 at 12:45 PM Yugo NAGATA  wrote:
> >
> > Here is a updated patch, v6.
> 
> v6 patch looks good.

Thank you for your review and updating the status to RwC!

Regards,
Yugo Nagata


-- 
Yugo NAGATA 




Re: cataloguing NOT NULL constraints

2024-02-04 Thread Michael Paquier
On Fri, Feb 02, 2024 at 07:00:01PM +0300, Alexander Lakhin wrote:
> results in:
> NOTICE:  merging definition of column "i" for child "b"
> NOTICE:  merging definition of column "i" for child "c"
> ERROR:  tuple already updated by self
> 
> (This is similar to bug #18297, but ATExecAddColumn() isn't guilty in this
> case.)

Still I suspect that the fix should be similar, so I'll go put a coin
on a missing CCI().
--
Michael


signature.asc
Description: PGP signature


Re: Make COPY format extendable: Extract COPY TO format implementations

2024-02-04 Thread Michael Paquier
On Fri, Feb 02, 2024 at 05:46:18PM +0900, Sutou Kouhei wrote:
> Hi,
> 
> In 
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" 
> on Fri, 2 Feb 2024 17:04:28 +0900,
>   Michael Paquier  wrote:
> 
> > One idea I was considering is whether we should use a special value in
> > the "format" DefElem, say "custom:$my_custom_format" where it would be
> > possible to bypass the formay check when processing options and find
> > the routines after processing all the options.  I'm not wedded to
> > that, but attaching the routines to the state data is IMO the correct
> > thing, because this has nothing to do with CopyFormatOptions.
> 
> Thanks for sharing your idea.
> Let's discuss how to support custom options after we
> complete the current performance changes.
> 
> >> I'm OK with the approach. But how about adding the extra
> >> callbacks to Copy{From,To}StateData not
> >> Copy{From,To}Routines like CopyToStateData::data_dest_cb and
> >> CopyFromStateData::data_source_cb? They are only needed for
> >> "text" and "csv". So we don't need to add them to
> >> Copy{From,To}Routines to keep required callback minimum.
> > 
> > And set them in cstate while we are in the Start routine, right?
> 
> I imagined that it's done around the following part:
> 
> @@ -1418,6 +1579,9 @@ BeginCopyFrom(ParseState *pstate,
> /* Extract options from the statement node tree */
> ProcessCopyOptions(pstate, >opts, true /* is_from */ , 
> options);
>  
> +   /* Set format routine */
> +   cstate->routine = CopyFromGetRoutine(cstate->opts);
> +
> /* Process the target relation */
> cstate->rel = rel;
>  
> 
> Example1:
> 
> /* Set format routine */
> cstate->routine = CopyFromGetRoutine(cstate->opts);
> if (!cstate->opts.binary)
> if (cstate->opts.csv_mode)
> cstate->copy_read_attributes = CopyReadAttributesCSV;
> else
> cstate->copy_read_attributes = CopyReadAttributesText;
> 
> Example2:
> 
> static void
> CopyFromSetRoutine(CopyFromState cstate)
> {
> if (cstate->opts.csv_mode)
> {
> cstate->routine = 
> cstate->copy_read_attributes = CopyReadAttributesCSV;
> }
> else if (cstate.binary)
> cstate->routine = 
> else
> {
> cstate->routine = 
> cstate->copy_read_attributes = CopyReadAttributesText;
> }
> }
> 
> BeginCopyFrom()
> {
> /* Set format routine */
> CopyFromSetRoutine(cstate);
> }
> 
> 
> But I don't object your original approach. If we have the
> extra callbacks in Copy{From,To}Routines, I just don't use
> them for my custom format extension.
> 
> >> What is the better next action for us? Do you want to
> >> complete the WIP v11 patch set by yourself (and commit it)?
> >> Or should I take over it?
> > 
> > I was planning to work on that, but wanted to be sure how you felt
> > about the problem with text and csv first.
> 
> OK.
> My opinion is the above. I have an idea how to implement it
> but it's not a strong idea. You can choose whichever you like.

So, I've looked at all that today, and finished by applying two
patches as of 2889fd23be56 and 95fb5b49024a to get some of the
weirdness with the workhorse routines out of the way.  Both have added
callbacks assigned in their respective cstate data for text and csv.
As this is called within the OneRow routine, I can live with that.  If
there is an opposition to that, we could just attach it within the
routines.  The CopyAttributeOut routines had a strange argument
layout: the flag for the quotes is required since a header uses no
quotes, but there was little point in it for the "single arg" case, so
I've removed it.

I am attaching a v12 which is close to what I want it to be, with
much more documentation and comments.  There are two things that I've
changed compared to the previous versions though:
1) I have added a callback to set up the input and output functions
rather than attach that in the Start callback.  These routines are now
called once per argument, where we know that the argument is valid.
The callbacks are in charge of filling the FmgrInfos.  There are some
good reasons behind that:
- No need for plugins to think about how to allocate this data.  v11
and other versions were doing things the wrong way by allocating this
stuff in the wrong memory context as we switch to the COPY context
when we are in the Start routines.
- This avoids attisdropped problems, and we have a long history of
bugs regarding that.  I'm ready to bet that custom formats would get
that wrong.
2) I have backpedaled on the postpare callback, which did not bring
much in clarity IMO while being a CSV-only callback.  Note that we
have in copyfromparse.c more paths that are only for CSV but the past
versions of the patch never cared about that.  This makes the text and
CSV implementations much closer to each other, as a result.

I had mixed feelings about CopySendEndOfRow() being split to
CopyToTextSendEndOfRow() to send the line 

cfbot does not list patches in 'Current commitfest'

2024-02-04 Thread Richard Guo
... and the patches in CF #47 (currently open) are listed in 'Next
commitfest'.  I guess we need to manually create a "next" commitfest?

Thanks
Richard


Re: meson: catalog/syscache_ids.h isn't installed

2024-02-04 Thread Masahiko Sawada
On Mon, Feb 5, 2024 at 10:29 AM Sutou Kouhei  wrote:
>
> Hi,
>
> catalog/syscache_ids.h is referred by utils/syscache.h but
> it's not installed with Meson.
>
> FYI:
> * 9b1a6f50b91dca6610932650c8c81a3c924259f9
>   It uses catalog/syscache_ids.h in utils/syscache.h but
>   catalog/syscache_ids.h isn't installed.
> * 6eb6086faa3842c2a38a1ee2f97bf9a42ce27610
>   It changes a Makefile to install catalog/syscache_ids.h but
>   it doesn't change meson.build.
>
> 
> diff --git a/src/include/catalog/meson.build b/src/include/catalog/meson.build
> index 6be76dca1d..0bf6e112d5 100644
> --- a/src/include/catalog/meson.build
> +++ b/src/include/catalog/meson.build
> @@ -114,7 +114,7 @@ output_install = [
>dir_data,
>dir_data,
>dir_include_server / 'catalog',
> -  false,
> +  dir_include_server / 'catalog',
>dir_include_server / 'catalog',
>dir_include_server / 'catalog',
>  ]
> 
>

Thank you for reporting the issue and the patch.

I've confirmed this patch fixes the issue. But I don't have enough
knowledge of meson to assess this fix.

Peter, could you check this fix as it seems the recent commits forgot
to update the meson.build file?

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: Is this a problem in GenericXLogFinish()?

2024-02-04 Thread Amit Kapila
On Mon, Feb 5, 2024 at 10:00 AM Hayato Kuroda (Fujitsu)
 wrote:
>
> >
> > Amit, this has been applied as of 861f86beea1c, and I got pinged about
> > the fact this triggers inconsistencies because we always set the LSN
> > of the write buffer (wbuf in _hash_freeovflpage) but
> > XLogRegisterBuffer() would *not* be called when the two following
> > conditions happen:
> > - When xlrec.ntups <= 0.
> > - When !xlrec.is_prim_bucket_same_wrt && !xlrec.is_prev_bucket_same_wrt
> >
> > And it seems to me that there is still a bug here: there should be no
> > point in setting the LSN on the write buffer if we don't register it
> > in WAL at all, no?
>
> Thanks for pointing out, I agreed your saying. PSA the patch for diagnosing 
> the
> issue.
>

@@ -692,6 +697,9 @@ _hash_freeovflpage(Relation rel, Buffer bucketbuf,
Buffer ovflbuf,
  if (!xlrec.is_prev_bucket_same_wrt)
  wbuf_flags |= REGBUF_NO_CHANGE;
  XLogRegisterBuffer(1, wbuf, wbuf_flags);
+
+ /* Track the registration status for later use */
+ wbuf_registered = true;
  }

  XLogRegisterBuffer(2, ovflbuf, REGBUF_STANDARD);
@@ -719,7 +727,12 @@ _hash_freeovflpage(Relation rel, Buffer
bucketbuf, Buffer ovflbuf,

  recptr = XLogInsert(RM_HASH_ID, XLOG_HASH_SQUEEZE_PAGE);

- PageSetLSN(BufferGetPage(wbuf), recptr);
+ /* Set LSN to wbuf page buffer only when it is being registered */
+ if (wbuf_registered)
+ PageSetLSN(BufferGetPage(wbuf), recptr);

Why set LSN when the page is not modified (say when we use the flag
REGBUF_NO_CHANGE)?  I think you need to use a flag mod_wbuf and set it
in appropriate places during register buffer calls.

-- 
With Regards,
Amit Kapila.




Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row

2024-02-04 Thread jian he
On Mon, Feb 5, 2024 at 10:29 AM torikoshia  wrote:
>
> Hi,
>
> On 2024-02-03 15:22, jian he wrote:
> > The idea of on_error is to tolerate errors, I think.
> > if a column has a not null constraint, let it cannot be used with
> > (on_error 'null')
>
> > +   /*
> > +* we can specify on_error 'null', but it can only apply to
> > columns
> > +* don't have not null constraint.
> > +   */
> > +   if (att->attnotnull && cstate->opts.on_error ==
> > COPY_ON_ERROR_NULL)
> > +   ereport(ERROR,
> > +   (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
> > +errmsg("copy on_error 'null' cannot be used with
> > not null constraint column")));
>
> This means we cannot use ON_ERROR 'null' even when there is one column
> which have NOT NULL constraint, i.e. primary key, right?
> IMHO this is strong constraint and will decrease the opportunity to use
> this feature.

I don't want to fail in the middle of bulk inserts,
so I thought immediately erroring out would be a great idea.
Let's see what other people think.




Re: Synchronizing slots from primary to standby

2024-02-04 Thread Ajin Cherian
On Mon, Feb 5, 2024 at 1:29 PM Zhijie Hou (Fujitsu) 
wrote:

> On Monday, February 5, 2024 10:17 AM Zhijie Hou (Fujitsu) <
> houzj.f...@fujitsu.com> wrote:
>
> There was one miss in the doc that cause CFbot failure,
> attach the correct version V77_2 here. There are no code changes compared
> to V77 version.
>
> Best Regards,
> Hou zj
>

Just noticed that doc/src/sgml/config.sgml still refers to enable_syncslot
instead of sync_replication_slots:

The standbys corresponding to the physical replication slots in
standby_slot_names must configure
enable_syncslot = true so they can receive
 failover logical slots changes from the primary.

regards,
Ajin Cherian
Fujitsu Australia


Re: src/bin/pg_upgrade/t/004_subscription.pl test comment fix

2024-02-04 Thread Amit Kapila
On Mon, Feb 5, 2024 at 2:42 AM Peter Smith  wrote:
>
> On Sat, Feb 3, 2024 at 5:28 PM Amit Kapila  wrote:
> >
> > On Thu, Feb 1, 2024 at 5:58 AM Peter Smith  wrote:
> > >
> > > OK. Now using your suggested 2nd sentence:
> > >
> > > +# The subscription's running status and failover option should be 
> > > preserved
> > > +# in the upgraded instance. So regress_sub1 should still have
> > > subenabled,subfailover
> > > +# set to true, while regress_sub2 should have both set to false.
> > >
> > > ~
> > >
> > > I also tweaked some other nearby comments/messages which did not
> > > mention the 'failover' preservation.
> > >
> >
> > Looks mostly good to me. One minor nitpick:
> > *
> > along with retaining the replication origin's remote lsn
> > -# and subscription's running status.
> > +# and subscription's running status and failover option.
> >
> > I don't think we need to use 'and' twice in the above sentence. We
> > should use ',' between different properties. I can change this on
> > Monday and push it unless you think otherwise.
> >
>
> +1
>

Pushed!

-- 
With Regards,
Amit Kapila.




RE: Is this a problem in GenericXLogFinish()?

2024-02-04 Thread Hayato Kuroda (Fujitsu)
Dear Michael, Amit,

> 
> Amit, this has been applied as of 861f86beea1c, and I got pinged about
> the fact this triggers inconsistencies because we always set the LSN
> of the write buffer (wbuf in _hash_freeovflpage) but
> XLogRegisterBuffer() would *not* be called when the two following
> conditions happen:
> - When xlrec.ntups <= 0.
> - When !xlrec.is_prim_bucket_same_wrt && !xlrec.is_prev_bucket_same_wrt
> 
> And it seems to me that there is still a bug here: there should be no
> point in setting the LSN on the write buffer if we don't register it
> in WAL at all, no?

Thanks for pointing this out; I agree with you. PSA a patch for diagnosing the
issue.

This patch can avoid the inconsistency due to the LSN setting, and it outputs a
debug LOG when we hit such a case. I executed hash_index.sql and confirmed the
log was output [1]. This means the current tests already contain a workload
which meets the conditions below:

 - the overflow page has no tuples (xlrec.ntups is 0),
 - to-be-written page - wbuf - is not the primary (xlrec.is_prim_bucket_same_wrt
   is false), and
 - to-be-written buffer is not next to the overflow page
   (xlrec.is_prev_bucket_same_wrt is false)

So, I think my patch (after removing the elog(...) part) can fix the issue. Thoughts?

[1]:
```
LOG:  XXX: is_wbuf_registered: false
CONTEXT:  while vacuuming index "hash_cleanup_index" of relation 
"public.hash_cleanup_heap"
STATEMENT:  VACUUM hash_cleanup_heap;
```

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 



avoid_registration.patch
Description: avoid_registration.patch


RE: Synchronizing slots from primary to standby

2024-02-04 Thread Zhijie Hou (Fujitsu)
On Thursday, February 1, 2024 12:20 PM Amit Kapila  
wrote:
> 
> On Thu, Feb 1, 2024 at 8:15 AM Euler Taveira  wrote:
> >
> > On Mon, Jan 29, 2024, at 10:17 AM, Zhijie Hou (Fujitsu) wrote:
> >
> > Attach the V72-0001 which addressed above comments, other patches will
> be
> > rebased and posted after pushing first patch. Thanks Shveta for helping
> address
> > the comments.
> >
> >
> > While working on another patch I noticed a new NOTICE message:
> >
> > NOTICE:  changed the failover state of replication slot "foo" on publisher 
> > to
> false
> >
> > I wasn't paying much attention to this thread then I start reading the 2
> > patches that was recently committed. The message above surprises me
> because
> > pg_createsubscriber starts to emit this message. The reason is that it 
> > doesn't
> > create the replication slot during the CREATE SUBSCRIPTION. Instead, it
> creates
> > the replication slot with failover = false and no such option is informed
> > during CREATE SUBSCRIPTION which means it uses the default value (failover
> =
> > false). I expect that I don't see any message because it is *not* changing 
> > the
> > behavior. I was wrong. It doesn't check the failover state on publisher, it
> > just executes walrcv_alter_slot() and emits a message.
> >
> > IMO if we are changing an outstanding property on node A from node B,
> node B
> > already knows (or might know) about that behavior change (because it is
> sending
> > the command), however, node A doesn't (unless log_replication_commands
> = on --
> > it is not the default).
> >
> > Do we really need this message as NOTICE?
> >
> 
> The reason for adding this NOTICE was to keep it similar to other
> Notice messages in these commands like create/drop slot. However, here
> the difference is we may not have altered the slot as the property is
> already the same as we want to set on the publisher. So, I am not sure
> whether we should follow the existing behavior or just get rid of it.
> And then do we remove similar NOTICE in AlterSubscription() as well?
> Normally, I think NOTICE intends to let users know if we did anything
> with slots while executing subscription commands. Does anyone else
> have an opinion on this point?
> 
> A related point, I think we can avoid setting the 'failover' property
> in ReplicationSlotAlter() if it is not changed, the advantage is we
> will avoid saving slots. OTOH, this won't be a frequent operation so
> we can leave it as it is as well.

Here is a patch to remove the NOTICE and improve ReplicationSlotAlter.
The patch also includes a few cleanups based on Peter's feedback.

Best Regards,
Hou zj


v2-0001-clean-up-for-776621a5.patch
Description: v2-0001-clean-up-for-776621a5.patch


Re: Synchronizing slots from primary to standby

2024-02-04 Thread Peter Smith
On Fri, Feb 2, 2024 at 11:18 PM shveta malik  wrote:
>
> On Fri, Feb 2, 2024 at 12:25 PM Peter Smith  wrote:
> >
> > Here are some review comments for v750002.
>
> Thanks for the feedback Peter. Addressed all in v76 except one.
>
> > (this is a WIP but this is what I found so far...)
>
> > I wonder if it is better to log all the problems in one go instead of
> > making users stumble onto them one at a time after fixing one and then
> > hitting the next problem. e.g. just set some variable "all_ok =
> > false;" each time instead of all the "return false;"
> >
> > Then at the end of the function just "return all_ok;"
>
> If we do this way, then we need to find a way to combine the msgs as
> well, otherwise the same msg will be repeated multiple times. For the
> concerned functionality (which needs one time config effort by user),
> I feel the existing way looks okay. We may consider optimizing it if
> we get more comments here.
>

I don't think combining messages is necessary; I considered these all
as different (not the same msg repeated multiple times) since they all
have different errhints.

I felt a user would only know to make a configuration correction when
they are informed something is wrong, so my review point was we could
tell them all the wrong things up-front so then those can all be fixed
with a "one time config effort by user".

Otherwise, if multiple settings (e.g. from the list below) have wrong
values, I imagined the user will fix the first reported one, then the
next bad config will be reported, then the user will fix that one,
then the next bad config will be reported, then the user will fix that
one, and so on. It just seemed potentially/unnecessarily painful.

- errhint("\"%s\" must be defined.", "primary_slot_name"));
- errhint("\"%s\" must be enabled.", "hot_standby_feedback"));
- errhint("\"wal_level\" must be >= logical."));
- errhint("\"%s\" must be defined.", "primary_conninfo"));
- errhint("'dbname' must be specified in \"%s\".", "primary_conninfo"));

~

Anyway, I just wanted to explain my review comment some more because
maybe my reason wasn't clear the first time. Whatever your decision
is, it is fine by me.

==
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Where can I find the doxyfile?

2024-02-04 Thread John Morris
>> I wish you'd stop proposing the lex filter so that we can discuss the
>> Doxyfile.in file and the accompanying Meson rule.

I see no obstacle to discussing Doxyfile.in and the meson.build file. Your 
suggestions are welcome. If you can find the original Doxyfile, it
would make sense to incorporate it.

>> but having something that purports to display our source code and
>> then show something that is *not* our source tree sounds insane.

The filter enables doxygen to interpret Postgres comments. Any code displayed 
is the original code, not the filtered code.

>> Speaking from my personal point of view, I make very little use
>> of our Doxygen pages,

Same for me. The pages are missing critical information, such as function 
descriptions and descriptions of structure fields. This filter is an attempt at 
fixing those problems.

Of course, nothing will fix doxygen when there are no comments to begin with. 
If we want comprehensive doxygen output, we need to add comments. However, we 
don’t need to add those irritating extra symbols.




Re: Reordering DISTINCT keys to match input path's pathkeys

2024-02-04 Thread Richard Guo
cfbot reminds us that this patch no longer applies.  So I've rebased
it on master, and also adjusted the test cases a bit.

Thanks
Richard


v2-0001-Reordering-DISTINCT-keys-to-match-input-path-s-pathkeys.patch
Description: Binary data


Encoding protection for pgcrypto

2024-02-04 Thread shihao zhong
Hi hackers,

Currently, pgp_sym_decrypt_text and pgp_pub_decrypt_text don't
enforce database encoding validation even when returning text. This
patch adds validation and dedicated tests to verify its
effectiveness. Additionally, some existing unit tests were moved to
the new tests as they failed in some encodings.

Thanks,
SHihao


fix_pycrypto.patch
Description: Binary data


Re: Small fix on COPY ON_ERROR document

2024-02-04 Thread Yugo NAGATA
On Sat, 3 Feb 2024 01:51:56 +0200
Alexander Korotkov  wrote:

> On Fri, Feb 2, 2024 at 10:07 PM David G. Johnston
>  wrote:
> > On Thu, Feb 1, 2024 at 11:59 PM Yugo NAGATA  wrote:
> >>
> >>
> >> I attached a updated patch including fixes you pointed out above.
> >>
> >
> > Removed "which"; changed "occupying" to "occupy"
> > Removed on of the two "amounts"
> > Changed "unacceptable to the input function" to just "converting" as that 
> > is what the average reader is more likely to be thinking.
> > The rest was good.
> 
> Thanks to everybody in the thread.
> Pushed.

Thanks!
> 
> --
> Regards,
> Alexander Korotkov


-- 
Yugo NAGATA 




Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row

2024-02-04 Thread torikoshia

Hi,

On 2024-02-03 15:22, jian he wrote:

The idea of on_error is to tolerate errors, I think.
if a column has a not null constraint, let it cannot be used with
(on_error 'null')



+   /*
+* we can specify on_error 'null', but it can only apply to 
columns

+* don't have not null constraint.
+   */
+   if (att->attnotnull && cstate->opts.on_error == 
COPY_ON_ERROR_NULL)

+   ereport(ERROR,
+   (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+errmsg("copy on_error 'null' cannot be used with 
not null constraint column")));


This means we cannot use ON_ERROR 'null' even when there is one column 
which has a NOT NULL constraint, i.e. a primary key, right?
IMHO this is a strong constraint and will decrease the opportunity to use 
this feature.


It might be better to allow error_action 'null' for tables which have 
NOT NULL constraint columns, and when facing soft errors for those rows, 
skip that row or stop COPY.



Based on this, I've made a patch.
based on COPY Synopsis: ON_ERROR 'error_action'
on_error 'null', the  keyword NULL should be single quoted.


As you mentioned, single quotation seems a little odd..

I'm not sure what the best name and syntax for this feature is, but 
since the current error_action values are verbs ('stop' and 'ignore'), I feel 
'null' might not be appropriate.



demo:
COPY check_ign_err FROM STDIN WITH (on_error 'null');
1 {1} a
2 {2} 1
3 {3} 2
4 {4} b
a {5} c
\.

\pset null NULL

SELECT * FROM check_ign_err;
  n   |  m  |  k
--+-+--
1 | {1} | NULL
2 | {2} |1
3 | {3} |2
4 | {4} | NULL
 NULL | {5} | NULL


Since we report the number of ignored rows when ON_ERROR is 'ignore', 
users may want to know the number of rows which were changed to NULL when 
using ON_ERROR 'null'.


--
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation




Re: An improvement on parallel DISTINCT

2024-02-04 Thread Richard Guo
On Fri, Feb 2, 2024 at 7:36 PM David Rowley  wrote:

> Now for the other stuff you had.   I didn't really like this part:
>
> + /*
> + * Set target for partial_distinct_rel as generate_useful_gather_paths
> + * requires that the input rel has a valid reltarget.
> + */
> + partial_distinct_rel->reltarget = cheapest_partial_path->pathtarget;
>
> I think we should just make it work the same way as
> create_grouping_paths(), where grouping_target is passed as a
> parameter.
>
> I've done it that way in the attached.


The change looks good to me.

BTW, I kind of doubt that 'create_partial_distinct_paths' is a proper
function name given what it actually does.  It not only generates
distinct paths based on input_rel's partial paths, but also adds
Gather/GatherMerge on top of these partially distinct paths, followed by
a final unique/aggregate path to ensure uniqueness of the final result.
So maybe 'create_parallel_distinct_paths' or something like that would
be better?

I asked because I noticed that in create_partial_grouping_paths(), we
only generate partially aggregated paths, and any subsequent
FinalizeAggregate step is called in the caller.

Thanks
Richard


Re: An improvement on parallel DISTINCT

2024-02-04 Thread Richard Guo
On Fri, Feb 2, 2024 at 6:39 PM David Rowley  wrote:

> So the gains increase with more parallel workers due to pushing more
> work to the worker. Amdahl's law approves of this.
>
> I'll push the patch shortly.


Thanks for the detailed testing and pushing the patch!

Thanks
Richard


meson: catalog/syscache_ids.h isn't installed

2024-02-04 Thread Sutou Kouhei
Hi,

catalog/syscache_ids.h is referred by utils/syscache.h but
it's not installed with Meson.

FYI:
* 9b1a6f50b91dca6610932650c8c81a3c924259f9
  It uses catalog/syscache_ids.h in utils/syscache.h but
  catalog/syscache_ids.h isn't installed.
* 6eb6086faa3842c2a38a1ee2f97bf9a42ce27610
  It changes a Makefile to install catalog/syscache_ids.h but
  it doesn't change meson.build.


diff --git a/src/include/catalog/meson.build b/src/include/catalog/meson.build
index 6be76dca1d..0bf6e112d5 100644
--- a/src/include/catalog/meson.build
+++ b/src/include/catalog/meson.build
@@ -114,7 +114,7 @@ output_install = [
   dir_data,
   dir_data,
   dir_include_server / 'catalog',
-  false,
+  dir_include_server / 'catalog',
   dir_include_server / 'catalog',
   dir_include_server / 'catalog',
 ]


Thanks,
-- 
kou




Re: Grant read-only access to exactly one database amongst many

2024-02-04 Thread David G. Johnston
On Sun, Feb 4, 2024 at 5:04 PM Graham Leggett  wrote:

> Hi all,
>
> I have a postgresql 15 instance with two databases in it, and I have a
> need to grant read-only access to one of those databases to a given user.
>
> To do this I created a dedicated role for readonly access to the database
> db1:
>
> CREATE ROLE "dv_read_db1"
> GRANT CONNECT ON DATABASE db1 TO dv_read_db1
>

This grant is basically pointless since by default all roles can connect
everywhere via the PUBLIC pseudo-role.  You need to revoke that grant, or
even stop it from being given out by default.



> Trouble is, I can create tables in db1 which is write access.


Since in v15 PUBLIC also gets CREATE on the public schema.

> I can also connect to db2 (bad),


See my comment regarding the pointless grant in a default setup.

> and I can enumerate the tables in db2 (bad),

Connect privilege grants reading all catalog data by design.


> It appears the mechanism I am using above has insecure side effects.
>

It has, from your expectation, insecure defaults which you never changed.
We changed public schema in v16 but the ease-of-use database connecting
remains.

David J.


Re: Grant read-only access to exactly one database amongst many

2024-02-04 Thread Tom Lane
Graham Leggett  writes:
> Trouble is, I can create tables in db1 which is write access. I can also 
> connect to db2 (bad), and I can enumerate the tables in db2 (bad), although 
> the queries of the contents say access is denied.

You need to read the docs about default privileges: see about
halfway down

https://www.postgresql.org/docs/15/ddl-priv.html

where it says "PostgreSQL grants privileges on some types of objects
to PUBLIC by default ...".  In this case I think you likely need to
revoke the default public CREATE privilege on schema public in db1,
and revoke the default public CONNECT privilege on database db2.
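
To make this concrete, a minimal sketch of the two revokes (database and role
names taken from the thread; run the first while connected to db1):

```sql
-- In db1: remove PUBLIC's default CREATE privilege on schema public,
-- so the read-only role can no longer create tables there.
REVOKE CREATE ON SCHEMA public FROM PUBLIC;

-- For db2: remove PUBLIC's default CONNECT privilege,
-- so only explicitly granted roles can connect.
REVOKE CONNECT ON DATABASE db2 FROM PUBLIC;
```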

regards, tom lane




Re: to_regtype() Raises Error

2024-02-04 Thread Erik Wienhold
On 2024-02-04 20:20 +0100, David E. Wheeler wrote:
> On Feb 2, 2024, at 15:33, David E. Wheeler  wrote:
> 
> > Anyway, I’m happy to submit a documentation patch along the lines you 
> > suggested.
> 
> How’s this?
> 
> --- a/doc/src/sgml/func.sgml
> +++ b/doc/src/sgml/func.sgml
> @@ -25460,11 +25460,12 @@ SELECT collation for ('foo' COLLATE "de_DE");
>  regtype
> 
> 
> -Translates a textual type name to its OID.  A similar result is
> +Parses a string of text, extracts a potential type name from it, and
> +translates that name into an OID. A similar result is
>  obtained by casting the string to type regtype (see
> -); however, this function will return
> -NULL rather than throwing an error if the name is
> -not found.
> +). Failure to extract a valid potential
> +type name results in an error; however, if the extracted names is not

Here "extracted names" should be "extracted name" (singular).
Otherwise, the text looks good.

> +known to the system, this function will return NULL.
> 
>
>   
> 
> Does similar wording need to apply to other `to_reg*` functions?

Just to_regtype() is fine IMO.  The other to_reg* functions don't throw
errors on similar input, e.g.:

test=> select to_regproc('foo bar');
 to_regproc
------------

(1 row)
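
For contrast, the to_regtype() behavior being documented, as described
upthread (a sketch; exact error wording depends on the server version):

```sql
-- Extracted name is valid but unknown to the system: returns NULL.
SELECT to_regtype('no_such_type');

-- The string cannot be parsed as a type name at all: raises an error.
SELECT to_regtype('foo bar');
```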

-- 
Erik




Grant read-only access to exactly one database amongst many

2024-02-04 Thread Graham Leggett
Hi all,

I have a postgresql 15 instance with two databases in it, and I have a need to 
grant read-only access to one of those databases to a given user.

To do this I created a dedicated role for readonly access to the database db1:

CREATE ROLE "dv_read_db1"
GRANT CONNECT ON DATABASE db1 TO dv_read_db1
GRANT USAGE ON SCHEMA public TO "dv_read_db1"
GRANT SELECT ON ALL TABLES IN SCHEMA public TO "dv_read_db1"
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES TO "dv_read_db1"

CREATE USER minfrin LOGIN;
GRANT dv_read_db1 TO minfrin;

On the surface this works, I get readonly access to db1.

Trouble is, I can create tables in db1 which is write access. I can also 
connect to db2 (bad), and I can enumerate the tables in db2 (bad), although the 
queries of the contents say access is denied.

It appears the mechanism I am using above has insecure side effects.

What is the way to grant read only access to a single database, without 
exposing other databases, and being futureproof against future features 
offering potential write access to a read only user?

Regards,
Graham
—





Re: [PATCH] ltree hash functions

2024-02-04 Thread jian he
On Thu, Feb 1, 2024 at 11:11 PM vignesh C  wrote:
>
> On Wed, 6 Dec 2023 at 04:08, Tommy Pavlicek  wrote:
> >
> > Thanks.
> >
> > I've attached the latest version that updates the naming in line with
> > the convention.
> >
> > On Mon, Dec 4, 2023 at 12:46 AM jian he  wrote:
> > >
> > > On Fri, Dec 1, 2023 at 8:44 AM Tommy Pavlicek  
> > > wrote:
> > > >
> > > >
> > > > Patch updated for those comments (and a touch of cleanup in the tests) 
> > > > attached.
> > >
> > > it would be a better name as hash_ltree than ltree_hash, similar logic
> > > applies to ltree_hash_extended.
> > > that would be the convention. see: 
> > > https://stackoverflow.com/a/69650940/15603477
> > >
> > >
> > > Other than that, it looks good.
>
> CFBot shows one of the test is failing as in [1]:
> diff -U3 /tmp/cirrus-ci-build/contrib/ltree/expected/ltree.out
> /tmp/cirrus-ci-build/build-32/testrun/ltree/regress/results/ltree.out
> --- /tmp/cirrus-ci-build/contrib/ltree/expected/ltree.out 2024-01-31
> 15:18:42.893039599 +
> +++ /tmp/cirrus-ci-build/build-32/testrun/ltree/regress/results/ltree.out
> 2024-01-31 15:23:25.309028749 +
> @@ -1442,9 +1442,14 @@
> ('0.1.2'::ltree), ('0'::ltree), ('0_asd.1_ASD'::ltree)) x(v)
>  WHERE  hash_ltree(v)::bit(32) != hash_ltree_extended(v, 0)::bit(32)
> OR hash_ltree(v)::bit(32) = hash_ltree_extended(v, 1)::bit(32);
> - value | standard | extended0 | extended1
> --------+----------+-----------+-----------
> -(0 rows)
> +    value    |         standard         |          extended0           |          extended1
> +-------------+--------------------------+------------------------------+------------------------------
> + 0           | 10001010100010011011     | 0101100001000100011001011011 | 010110000100010001101001
> + 0.1         | 101010001010110001001110 | 0000100010001100110111010101 | 0000100010001101100011010101
> + 0.1.2       | 0111010101110100         | 1010111001110101100011010111 | 101011100111011100100011
> + 0           | 10001010100010011011     | 0101100001000100011001011011 | 010110000100010001101001
> + 0_asd.1_ASD | 011000101000101001001101 | 0000100010001100110111010101 | 0000100010001101100011010101
> +(5 rows)
>
> Please post an updated version for the same.
>
> [1] - 
> https://api.cirrus-ci.com/v1/artifact/task/5572544858685440/testrun/build-32/testrun/ltree/regress/regression.diffs
>

It only fails on Linux - Debian Bullseye - Meson.
I fixed the white space, named it v5.
I also made the following changes:
from

uint64 levelHash = hash_any_extended((unsigned char *) al->name, al->len, seed);
uint32 levelHash = hash_any((unsigned char *) al->name, al->len);

to
uint64 levelHash = DatumGetUInt64(hash_any_extended((unsigned char *)
al->name, al->len, seed));
uint32 levelHash = DatumGetUInt32(hash_any((unsigned char *) al->name,
al->len));

(these two lines live in different functions)

I have some problems testing it locally, so I post the patch.
From ffc0e3daea5e490d8b9e6b7a0345f780d88b5624 Mon Sep 17 00:00:00 2001
From: jian he 
Date: Fri, 2 Feb 2024 16:03:56 +0800
Subject: [PATCH v5 1/1] add hash functions for the ltree extension.

---
 contrib/ltree/Makefile|  2 +-
 contrib/ltree/expected/ltree.out  | 65 +
 contrib/ltree/ltree--1.2--1.3.sql | 23 +++
 contrib/ltree/ltree.control   |  2 +-
 contrib/ltree/ltree_op.c  | 68 +++
 contrib/ltree/ltreetest.sql   |  1 +
 contrib/ltree/meson.build |  1 +
 contrib/ltree/sql/ltree.sql   | 43 +++
 doc/src/sgml/ltree.sgml   |  8 
 9 files changed, 211 insertions(+), 2 deletions(-)
 create mode 100644 contrib/ltree/ltree--1.2--1.3.sql

diff --git a/contrib/ltree/Makefile b/contrib/ltree/Makefile
index 770769a7..f50f2c94 100644
--- a/contrib/ltree/Makefile
+++ b/contrib/ltree/Makefile
@@ -14,7 +14,7 @@ OBJS = \
 	ltxtquery_op.o
 
 EXTENSION = ltree
-DATA = ltree--1.1--1.2.sql ltree--1.1.sql ltree--1.0--1.1.sql
+DATA = ltree--1.2--1.3.sql ltree--1.1--1.2.sql ltree--1.1.sql ltree--1.0--1.1.sql
 PGFILEDESC = "ltree - hierarchical label data type"
 
 HEADERS = ltree.h
diff --git a/contrib/ltree/expected/ltree.out b/contrib/ltree/expected/ltree.out
index 984cd030..dd4f0fc1 100644
--- a/contrib/ltree/expected/ltree.out
+++ b/contrib/ltree/expected/ltree.out
@@ -1433,8 +1433,27 @@ SELECT '{j.k.l.m, g.b.c.d.e}'::ltree[] ?~ 'A*@|g.b.c.d.e';
  g.b.c.d.e
 (1 row)
 
+-- Check that the hash_ltree() and hash_ltree_extended() function's lower
+-- 32 bits match when the seed is 0 and do not match when the seed != 0
+SELECT v as value, hash_ltree(v)::bit(32) as standard,
+   hash_ltree_extended(v, 0)::bit(32) as extended0,
+   hash_ltree_extended(v, 1)::bit(32) as extended1
+FROM   (VALUES (NULL::ltree), (''::ltree), ('0'::ltree), ('0.1'::ltree),
+   ('0.1.2'::ltree), 

Re: On login trigger: take three

2024-02-04 Thread Alexander Korotkov
On Tue, Jan 23, 2024 at 11:52 PM Alexander Korotkov
 wrote:
> On Tue, Jan 23, 2024 at 8:00 PM Alexander Lakhin  wrote:
> > Please look at the following query, which triggers an assertion failure on
> > updating the field dathasloginevt for an entry in pg_database:
> > SELECT format('CREATE DATABASE db1 LOCALE_PROVIDER icu ICU_LOCALE en 
> > ENCODING utf8
> > ICU_RULES ''' || repeat(' ', 20) || ''' TEMPLATE template0;')
> > \gexec
> > \c db1 -
> >
> > CREATE FUNCTION on_login_proc() RETURNS event_trigger AS $$
> > BEGIN
> >RAISE NOTICE 'You are welcome!';
> > END;
> > $$ LANGUAGE plpgsql;
> >
> > CREATE EVENT TRIGGER on_login_trigger ON login EXECUTE PROCEDURE 
> > on_login_proc();
> > DROP EVENT TRIGGER on_login_trigger;
> >
> > \c
> >
> > \connect: connection to server on socket "/tmp/.s.PGSQL.5432" failed: 
> > server closed the connection unexpectedly
> >
> > The stack trace of the assertion failure is:
> > ...
> > #5  0x55c8699b9b8d in ExceptionalCondition (
> >  conditionName=conditionName@entry=0x55c869a1f7c0 
> > "HaveRegisteredOrActiveSnapshot()",
> >  fileName=fileName@entry=0x55c869a1f4c6 "toast_internals.c", 
> > lineNumber=lineNumber@entry=669) at assert.c:66
> > #6  0x55c86945df0a in init_toast_snapshot (...) at toast_internals.c:669
> > #7  0x55c86945dfbe in toast_delete_datum (...) at toast_internals.c:429
> > #8  0x55c8694fd1da in toast_tuple_cleanup (...) at toast_helper.c:309
> > #9  0x55c8694b55a1 in heap_toast_insert_or_update (...) at 
> > heaptoast.c:333
> > #10 0x55c8694a8c6c in heap_update (... at heapam.c:3604
> > #11 0x55c8694a96cb in simple_heap_update (...) at heapam.c:4062
> > #12 0x55c869555b7b in CatalogTupleUpdate (...) at indexing.c:322
> > #13 0x55c8695f0725 in EventTriggerOnLogin () at event_trigger.c:957
> > ...
> >
> > Funnily enough, when Tom Lane was wondering, whether pg_database's toast
> > table could pose a risk [1], I came to the conclusion that it's impossible,
> > but that was before the login triggers introduction...
> >
> > [1] https://www.postgresql.org/message-id/1284094.1695479962%40sss.pgh.pa.us
>
> Thank you for reporting.  I'm looking into this...
> I wonder if there is a way to avoid toast update given that we don't
> touch the toasted field here.

Usage of heap_inplace_update() seems appropriate for me here.  It
avoids trouble with both TOAST and row-level locks.  Alexander, could
you please recheck this fixes the problem.

--
Regards,
Alexander Korotkov


0001-Use-heap_inplace_update-to-unset-pg_database.dath-v1.patch
Description: Binary data


Re: src/bin/pg_upgrade/t/004_subscription.pl test comment fix

2024-02-04 Thread Peter Smith
On Sat, Feb 3, 2024 at 5:28 PM Amit Kapila  wrote:
>
> On Thu, Feb 1, 2024 at 5:58 AM Peter Smith  wrote:
> >
> > OK. Now using your suggested 2nd sentence:
> >
> > +# The subscription's running status and failover option should be preserved
> > +# in the upgraded instance. So regress_sub1 should still have
> > subenabled,subfailover
> > +# set to true, while regress_sub2 should have both set to false.
> >
> > ~
> >
> > I also tweaked some other nearby comments/messages which did not
> > mention the 'failover' preservation.
> >
>
> Looks mostly good to me. One minor nitpick:
> *
> along with retaining the replication origin's remote lsn
> -# and subscription's running status.
> +# and subscription's running status and failover option.
>
> I don't think we need to use 'and' twice in the above sentence. We
> should use ',' between different properties. I can change this on
> Monday and push it unless you think otherwise.
>

+1

==
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: to_regtype() Raises Error

2024-02-04 Thread David E. Wheeler
On Feb 2, 2024, at 15:33, David E. Wheeler  wrote:

> Anyway, I’m happy to submit a documentation patch along the lines you 
> suggested.

How’s this?

--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -25460,11 +25460,12 @@ SELECT collation for ('foo' COLLATE "de_DE");
 regtype


-Translates a textual type name to its OID.  A similar result is
+Parses a string of text, extracts a potential type name from it, and
+translates that name into an OID. A similar result is
 obtained by casting the string to type regtype (see
-); however, this function will return
-NULL rather than throwing an error if the name is
-not found.
+). Failure to extract a valid potential
+type name results in an error; however, if the extracted names is not
+known to the system, this function will return NULL.

   
  

Does similar wording need to apply to other `to_reg*` functions?

Best,

David




v1-0001-Update-to_regtype-docs-regarding-error-condition.patch
Description: Binary data


Re: Patch: Add parse_type Function

2024-02-04 Thread David E. Wheeler
On Feb 4, 2024, at 13:52, Pavel Stehule  wrote:

> not yet, but I'll do it

Nice, thank you. I put it into the Commitfest.

  https://commitfest.postgresql.org/47/4807/

Best,

David





Re: Draft release notes for minor releases are up

2024-02-04 Thread Noah Misch
On Sun, Feb 04, 2024 at 01:13:53PM -0500, Tom Lane wrote:
> I now have this text for your CREATE DATABASE fixes:
> 
>  
>   Ensure durability of CREATE DATABASE (Noah Misch)
>  
> 
>  
>   If an operating system crash occurred during or shortly
>   after CREATE DATABASE, recovery could fail, or
>   subsequent connections to the new database could fail.  If a base
>   backup was taken in that window, similar problems could be observed
>   when trying to use the backup.  The symptom would be that the
>   database directory, PG_VERSION file, or
>   pg_filenode.map file was missing or empty.
>  

Thanks for updating it; this text works for me.

> This is ignoring the point that the set of observable symptoms might
> differ between the OS crash and base-backup-recovery cases, but
> I'm not sure that that's actually true, and in any case I don't think
> it matters for the release notes.

I agree with stopping short of adding that detail; it wouldn't help users make
a known decision.




Re: Patch: Add parse_type Function

2024-02-04 Thread Pavel Stehule
On Sun, Feb 4, 2024 at 7:30 PM David E. Wheeler 
wrote:

> On Feb 4, 2024, at 13:02, Pavel Stehule  wrote:
>
> > I think the proposed functionality can be useful
>
> Great, thank you!
>
> Is that a review? :-)
>

not yet, but I'll do it

Pavel




>
> D
>
>
>


Re: Patch: Add parse_type Function

2024-02-04 Thread David E. Wheeler
On Feb 4, 2024, at 13:02, Pavel Stehule  wrote:

> I think the proposed functionality can be useful 

Great, thank you!

Is that a review? :-)

D






Re: Draft release notes for minor releases are up

2024-02-04 Thread Tom Lane
Noah Misch  writes:
> On Fri, Feb 02, 2024 at 08:18:50PM -0500, Tom Lane wrote:
>> Noah Misch  writes:
>>> Shall the top of the notes advise to reindex GIN indexes?

>> I thought about that, but it's a pretty low-probability failure
>> I think, so I didn't write that advice.  Maybe I misjudged it.

> I can see there being failures so low-probability to omit that text, e.g. a
> failure identified theoretically and requiring a process to lose the CPU for
> hours.  For this one, the reporter seems to have arrived at it without a
> deliberate search.  This one just needs a recovery at the right WAL record,
> then two processes reaching the incomplete split concurrently.

The reporter didn't exactly say, but it did seem that the initial
detection was made without any code modifications, so I take your
point.  I'll add the advice.  Also, I now have this text for your
CREATE DATABASE fixes:

 
  Ensure durability of CREATE DATABASE (Noah Misch)
 

 
  If an operating system crash occurred during or shortly
  after CREATE DATABASE, recovery could fail, or
  subsequent connections to the new database could fail.  If a base
  backup was taken in that window, similar problems could be observed
  when trying to use the backup.  The symptom would be that the
  database directory, PG_VERSION file, or
  pg_filenode.map file was missing or empty.
 

This is ignoring the point that the set of observable symptoms might
differ between the OS crash and base-backup-recovery cases, but
I'm not sure that that's actually true, and in any case I don't think
it matters for the release notes.

regards, tom lane




Re: Patch: Add parse_type Function

2024-02-04 Thread Pavel Stehule
Hi

On Sun, Feb 4, 2024 at 6:51 PM David E. Wheeler 
wrote:

> Hackers,
>
> Attached is a patch to add a new function, `parse_type()`. It parses a
> type string and returns a record with the typid and typmod for the type, or
> raises an error if the type is invalid. It’s effectively a thin layer over
> the parser’s parseTypeString() function.
>
> The purpose of this function is to allow users to resolve any type name,
> including aliases, to its formal name as rendered by, for example, pg_dump.
> An example:
>
> david=# WITH s(s) AS (
> SELECT * FROM unnest(ARRAY[
> 'timestamp(4)',
> 'interval(0)',
> 'interval second(0)',
> 'timestamptz',
> 'timestamptz(6)',
> 'varchar',
> 'varchar(128)'
> ])
> ),
> p(type, typid, typmod) AS (
> SELECT s, ((parse_type(s)).*)
>   FROM s
> )
> SELECT type, format_type(typid, typmod) FROM p;
> type|  format_type
> +
>  timestamp(4)   | timestamp(4) without time zone
>  interval(0)| interval(0)
>  interval second(0) | interval second(0)
>  timestamptz| timestamp with time zone
>  timestamptz(6) | timestamp(6) with time zone
>  varchar| character varying
>  varchar(128)   | character varying(128)
> (7 rows)
>
>
> The use case leading to this patch was pgTAP column data type assertions.
> pgTAP traditionally required full type names for testing data types, but
> recently added support for aliases. This broke for some types, because it
> was difficult to extract the typmod correctly, and even when we did, it
> failed for types such as `interval second(0)`, where `second (0)` is the
> typmod, and there was no sensible way to access the bit mask to generate
> the proper typmod to pass to `format_type()`.
>
> Erik Wienhold wrote this function to solve the problem, and I volunteered
> to submit a patch.
>
> Potential issues and questions for this patch:
>
> * Is `parse_type()` the right name to use? I can see arguments for
> `parse_type_string()`, `pg_parse_type()`, and `pg_parse_type_string()`.
> Whatever the consensus is works for me.
>

there can be an analogy with parse_ident, so parse_type looks ok


> * The function returns a record, which is a little fussy. I could see
> adding a `format_type_string()` function that just contains `SELECT
> format_type(p.typid, p.typmod) FROM parse_type($1) p` and returns the
> formatted type. But I think it might also be useful to have access to the
> typmod and typid directly via this method, since they’re already exposed as
> `format_type()`’s arguments.
>

I think the record makes sense for this case


>
> * Is format_type.c the right home for the function? I put it there because
> I closely associate it with format_type(), but happy to move it to a more
> appropriate location.
>

yes


>
> * I tried to link to `format_type` from the docs entry for `parse_type`,
> and added an ID for the former, but I’m not sure it resolves.
>

I think the proposed functionality can be useful

Regards

Pavel


>
> Best,
>
> David
>
>


Re: Clean up command argument assembly

2024-02-04 Thread Tom Lane
Peter Eisentraut  writes:
> Should anything outside of libpq be using PQExpBuffer?

Perhaps not.  PQExpBuffer's behavior for OOM cases is designed
specifically for libpq, where exit-on-OOM is not okay and we
can hope to include failure checks wherever needed.  For most
of our application code, we'd much rather just exit-on-OOM
and not have to think about failure checks at the call sites.

Having said that, converting stuff like pg_dump would be quite awful
in terms of code churn and creating a back-patching nightmare.

Would it make any sense to think about having two sets of
routines with identical call APIs, but different failure
behavior, so that we don't have to touch the callers?

regards, tom lane




Patch: Add parse_type Function

2024-02-04 Thread David E. Wheeler
Hackers,

Attached is a patch to add a new function, `parse_type()`. It parses a type 
string and returns a record with the typid and typmod for the type, or raises 
an error if the type is invalid. It’s effectively a thin layer over the 
parser’s parseTypeString() function.

The purpose of this function is to allow users to resolve any type name, 
including aliases, to its formal name as rendered by, for example, pg_dump. An 
example:

david=# WITH s(s) AS (
SELECT * FROM unnest(ARRAY[
'timestamp(4)',
'interval(0)',
'interval second(0)',
'timestamptz',
'timestamptz(6)',
'varchar',
'varchar(128)' 
]) 
),
p(type, typid, typmod) AS (
SELECT s, ((parse_type(s)).*)
  FROM s 
)   
SELECT type, format_type(typid, typmod) FROM p;
type|  format_type   
+
 timestamp(4)   | timestamp(4) without time zone
 interval(0)| interval(0)
 interval second(0) | interval second(0)
 timestamptz| timestamp with time zone
 timestamptz(6) | timestamp(6) with time zone
 varchar| character varying
 varchar(128)   | character varying(128)
(7 rows)


The use case leading to this patch was pgTAP column data type assertions. pgTAP 
traditionally required full type names for testing data types, but recently 
added support for aliases. This broke for some types, because it was difficult 
to extract the typmod correctly, and even when we did, it failed for types such 
as `interval second(0)`, where `second (0)` is the typmod, and there was no 
sensible way to access the bit mask to generate the proper typmod to pass to 
`format_type()`.

Erik Wienhold wrote this function to solve the problem, and I volunteered to 
submit a patch.

Potential issues and questions for this patch:

* Is `parse_type()` the right name to use? I can see arguments for 
`parse_type_string()`, `pg_parse_type()`, and `pg_parse_type_string()`. 
Whatever the consensus is works for me.

* The function returns a record, which is a little fussy. I could see adding a 
`format_type_string()` function that just contains `SELECT 
format_type(p.typid, p.typmod) FROM parse_type($1) p` and returns the formatted 
type. But I think it might also be useful to have access to the typmod and 
typid directly via this method, since they’re already exposed as 
`format_type()`’s arguments.

* Is format_type.c the right home for the function? I put it there because I 
closely associate it with format_type(), but happy to move it to a more 
appropriate location.

* I tried to link to `format_type` from the docs entry for `parse_type`, and 
added an ID for the former, but I’m not sure it resolves.

Best,

David



v1-0001-Add-parse_type-SQL-function.patch
Description: Binary data


Re: Clean up command argument assembly

2024-02-04 Thread Peter Eisentraut

On 05.07.23 07:22, Peter Eisentraut wrote:
It's a bit bogus to use PQExpBuffer for these. If you run out of 
memory, you silently get an empty string instead. StringInfo, which 
exits the process on OOM, would be more appropriate. We have tons of 
such inappropriate uses of PQExpBuffer in all our client programs, 
though, so I don't insist on fixing this particular case right now.


Interesting point.  But as you say better dealt with as a separate problem.


I was inspired by a33e17f210 (for pg_rewind) to clean up some more 
fixed-buffer command assembly and replace it with extensible buffers and 
some more elegant code.  And then I remembered this thread, and it's 
really a continuation of this.


The first patch deals with pg_regress and pg_isolation_regress.  It is 
pretty straightforward.


The second patch deals with pg_upgrade.  It would require exporting 
appendPQExpBufferVA() from libpq, which might be overkill.  But this 
gets to your point earlier:  Should pg_upgrade rather be using 
StringInfo instead of PQExpBuffer?  (Then we'd use appendStringInfoVA(), 
which already exists, but even if not we wouldn't need to change libpq 
to get it.)  Should anything outside of libpq be using PQExpBuffer?
From dc3da0dce85a69314fff0a03998fd89b4ca19d8c Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Sun, 4 Feb 2024 18:32:16 +0100
Subject: [PATCH 1/2] Use extensible buffers to assemble command lines

This makes use of StringInfo to assemble command lines, instead of
using fixed-size buffers and the (remote) possibility of "command too
long" errors.  Also makes the code a bit simpler.

This covers the test driver programs pg_regress and
pg_isolation_regress.

Similar to the changes done for pg_rewind in a33e17f210.
---
 src/test/isolation/isolation_main.c | 37 ++
 src/test/regress/pg_regress_main.c  | 41 +++--
 2 files changed, 30 insertions(+), 48 deletions(-)

diff --git a/src/test/isolation/isolation_main.c 
b/src/test/isolation/isolation_main.c
index 05e81035c1f..2a3e41d2101 100644
--- a/src/test/isolation/isolation_main.c
+++ b/src/test/isolation/isolation_main.c
@@ -12,6 +12,7 @@
 
 #include "postgres_fe.h"
 
+#include "lib/stringinfo.h"
 #include "pg_regress.h"
 
 char   saved_argv0[MAXPGPATH];
@@ -34,8 +35,7 @@ isolation_start_test(const char *testname,
 	char		infile[MAXPGPATH];
 	char		outfile[MAXPGPATH];
 	char		expectfile[MAXPGPATH];
-	char		psql_cmd[MAXPGPATH * 3];
-	size_t		offset = 0;
+	StringInfoData psql_cmd;
char   *appnameenv;
 
/* need to do the path lookup here, check isolation_init() for details 
*/
@@ -75,34 +75,23 @@ isolation_start_test(const char *testname,
add_stringlist_item(resultfiles, outfile);
add_stringlist_item(expectfiles, expectfile);
 
+	initStringInfo(&psql_cmd);
+
if (launcher)
-	{
-		offset += snprintf(psql_cmd + offset, sizeof(psql_cmd) - offset,
-						   "%s ", launcher);
-		if (offset >= sizeof(psql_cmd))
-		{
-			fprintf(stderr, _("command too long\n"));
-			exit(2);
-		}
-	}
+		appendStringInfo(&psql_cmd, "%s ", launcher);
 
-	offset += snprintf(psql_cmd + offset, sizeof(psql_cmd) - offset,
-					   "\"%s\" \"dbname=%s\" < \"%s\" > \"%s\" 2>&1",
-					   isolation_exec,
-					   dblist->str,
-					   infile,
-					   outfile);
-	if (offset >= sizeof(psql_cmd))
-	{
-		fprintf(stderr, _("command too long\n"));
-		exit(2);
-	}
+	appendStringInfo(&psql_cmd,
+					 "\"%s\" \"dbname=%s\" < \"%s\" > \"%s\" 2>&1",
+					 isolation_exec,
+					 dblist->str,
+					 infile,
+					 outfile);
 
appnameenv = psprintf("isolation/%s", testname);
setenv("PGAPPNAME", appnameenv, 1);
free(appnameenv);
 
-   pid = spawn_process(psql_cmd);
+   pid = spawn_process(psql_cmd.data);
 
if (pid == INVALID_PID)
{
@@ -113,6 +102,8 @@ isolation_start_test(const char *testname,
 
unsetenv("PGAPPNAME");
 
+   pfree(psql_cmd.data);
+
return pid;
 }
 
diff --git a/src/test/regress/pg_regress_main.c 
b/src/test/regress/pg_regress_main.c
index 18155ef97e2..d607a3023b2 100644
--- a/src/test/regress/pg_regress_main.c
+++ b/src/test/regress/pg_regress_main.c
@@ -18,6 +18,7 @@
 
 #include "postgres_fe.h"
 
+#include "lib/stringinfo.h"
 #include "pg_regress.h"
 
 /*
@@ -34,8 +35,7 @@ psql_start_test(const char 

Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

2024-02-04 Thread Andrey M. Borodin



> On 4 Feb 2024, at 18:38, Alvaro Herrera  wrote:
> 
> In other words, these barriers are fully useless.

+1. I've tried to understand the ideas behind the barriers, but latest_page_number is 
a heuristic that does not need any guarantees at all. It's also used in a 
safety check which can fire only when everything is already broken beyond 
repair. (Though using atomic access seems a good idea anyway.)

This patch uses the wording "banks" in comments before banks start to exist. But as 
far as I understand, it is expected to be committed before the "banks" patch.

Besides this patch looks good to me.


Best regards, Andrey Borodin.



Re: Collation version tracking for macOS

2024-02-04 Thread Jeff Davis
On Thu, 2024-02-01 at 15:58 -0500, Robert Haas wrote:
> Not that I'm the most qualified person to have an opinion on this
> topic, but did you intend to attach this stuff to this email, or is
> it
> somewhere else?

The previous patch is here:

https://www.postgresql.org/message-id/6f4a8c01a5cb1edf3a07d204c371fbddaef252f9.camel%40j-davis.com

And I attached the rendered HTML doc page, which conveniently renders
in the archives (thanks to web team -- I didn't know if that would
actually work until I tried it):

https://www.postgresql.org/message-id/attachment/142818/icu-multilib.html

For anyone interested in this work, the docs are the best place to
start.

I'm hesitant to put much more work into it (e.g. new patches, etc.)
without more feedback. Your opinion would certainly be valuable -- for
instance, when reading the docs, can you imagine yourself actually
using this if you ran into a collation versioning/migration problem?

Regards,
Jeff Davis





Re: [EXTERNAL] Re: Add non-blocking version of PQcancel

2024-02-04 Thread Alvaro Herrera
On 2024-Feb-02, Jelte Fennema-Nio wrote:

> The only reasonable thing I can think of to make that situation better
> is to move that part of the function outside of PQcancelPoll and
> create a dedicated PQcancelStart function for it. It introduces an
> extra function, but it does seem more in line with how we do the
> regular connection establishment. Basically you would have code like
> this then, which looks quite nice honestly:
> 
> cancelConn = PQcancelConn(conn);
> if (!PQcancelStart(cancelConn))
> pg_fatal("bad cancel connection: %s", 
> PQcancelErrorMessage(cancelConn));
> while (true)
> {
>  // polling using PQcancelPoll here
> }

Maybe this is okay?  I'll have a look at the whole final situation more
carefully later; or if somebody else wants to share an opinion, please
do so.

In the meantime I pushed your 0002 and 0003 patches, so you can take
this as an opportunity to rebase the remaining ones.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"The saddest aspect of life right now is that science gathers knowledge faster
 than society gathers wisdom."  (Isaac Asimov)




Re: Commitfest 2024-01 first week update

2024-02-04 Thread vignesh C
On Sun, 4 Feb 2024 at 14:32, Alvaro Herrera  wrote:
>
> On 2024-Jan-10, Daniel Gustafsson wrote:
>
> > > On 9 Jan 2024, at 23:18, Robert Haas  wrote:
> >
> > > I think we need to be more aggressive about marking things returned
> > > with feedback when they don't get updated.
> >
> > I very much agree.  Having marked quite a lot of patches as RwF when being 
> > CFM
> > I can attest that it gets very little off-list pushback or angry emails.  
> > While
> > it does happen, the overwhelming majority of responses are understanding and
> > positive, so no CFM should be worried about "being the bad guy".
>
> I like this idea very much -- return patches when the author does not
> respond AFTER receiving feedback or after the patch rots.
>
> However, this time around I saw that a bunch of patches were returned or
> threatened to be returned JUST BECAUSE nobody had replied to the thread,
> with a justification like "you need to generate more interest in your
> patch".  This is a TERRIBLE idea, and there's one reason why creating a
> new commitfest entry in the following commitfest is no good:

I have seen that most of the threads are being discussed and promptly
updated. But a few of the entries become stale and just move from one
commitfest to the next without anything being done. For these kinds of
entries, we were just trying to see whether the author or anybody else
is really interested in pursuing them.

We should do something about these kinds of entries. There were a few
suggestions, like tagging them under a new category; can we add a new
status to park these entries, something like "Waiting for direction"?
Threads which have had no discussion for six months or so could be
flagged with this new status, discussed in one of the developer
meetings, and concluded on.

Regards,
Vignesh




Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

2024-02-04 Thread Alvaro Herrera
Sorry, brown paper bag bug there.  Here's the correct one.

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"I can't go to a restaurant and order food because I keep looking at the
fonts on the menu.  Five minutes later I realize that it's also talking
about food" (Donald Knuth)
From 99cadfdf7475146953e9846c20c4a708a3527937 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Wed, 31 Jan 2024 12:27:51 +0100
Subject: [PATCH v3] Use atomics for SlruSharedData->latest_page_number

---
 src/backend/access/transam/clog.c  |  6 +---
 src/backend/access/transam/commit_ts.c |  7 ++---
 src/backend/access/transam/multixact.c | 28 +++---
 src/backend/access/transam/slru.c  | 40 +++---
 src/include/access/slru.h  |  5 +++-
 5 files changed, 54 insertions(+), 32 deletions(-)

diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index f6e7da7ffc..06fc2989ba 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -766,14 +766,10 @@ StartupCLOG(void)
 	TransactionId xid = XidFromFullTransactionId(TransamVariables->nextXid);
 	int64		pageno = TransactionIdToPage(xid);
 
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
-
 	/*
 	 * Initialize our idea of the latest page number.
 	 */
-	XactCtl->shared->latest_page_number = pageno;
-
-	LWLockRelease(XactSLRULock);
+	pg_atomic_write_u64(&XactCtl->shared->latest_page_number, pageno);
 }
 
 /*
diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c
index 61b82385f3..6bfe60343e 100644
--- a/src/backend/access/transam/commit_ts.c
+++ b/src/backend/access/transam/commit_ts.c
@@ -689,9 +689,7 @@ ActivateCommitTs(void)
 	/*
 	 * Re-Initialize our idea of the latest page number.
 	 */
-	LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
-	CommitTsCtl->shared->latest_page_number = pageno;
-	LWLockRelease(CommitTsSLRULock);
+	pg_atomic_write_u64(&CommitTsCtl->shared->latest_page_number, pageno);
 
 	/*
 	 * If CommitTs is enabled, but it wasn't in the previous server run, we
@@ -1006,7 +1004,8 @@ commit_ts_redo(XLogReaderState *record)
 		 * During XLOG replay, latest_page_number isn't set up yet; insert a
 		 * suitable value to bypass the sanity test in SimpleLruTruncate.
 		 */
-		CommitTsCtl->shared->latest_page_number = trunc->pageno;
+		pg_atomic_write_u64(&CommitTsCtl->shared->latest_page_number,
+			trunc->pageno);
 
 		SimpleLruTruncate(CommitTsCtl, trunc->pageno);
 	}
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 59523be901..febc429f72 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -2017,13 +2017,15 @@ StartupMultiXact(void)
 	 * Initialize offset's idea of the latest page number.
 	 */
 	pageno = MultiXactIdToOffsetPage(multi);
-	MultiXactOffsetCtl->shared->latest_page_number = pageno;
+	pg_atomic_write_u64(&MultiXactOffsetCtl->shared->latest_page_number,
+		pageno);
 
 	/*
 	 * Initialize member's idea of the latest page number.
 	 */
 	pageno = MXOffsetToMemberPage(offset);
-	MultiXactMemberCtl->shared->latest_page_number = pageno;
+	pg_atomic_write_u64(&MultiXactMemberCtl->shared->latest_page_number,
+		pageno);
 }
 
 /*
@@ -2047,14 +2049,15 @@ TrimMultiXact(void)
 	oldestMXactDB = MultiXactState->oldestMultiXactDB;
 	LWLockRelease(MultiXactGenLock);
 
-	/* Clean up offsets state */
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
-
 	/*
 	 * (Re-)Initialize our idea of the latest page number for offsets.
 	 */
 	pageno = MultiXactIdToOffsetPage(nextMXact);
-	MultiXactOffsetCtl->shared->latest_page_number = pageno;
+	pg_atomic_write_u64(&MultiXactOffsetCtl->shared->latest_page_number,
+		pageno);
+
+	/* Clean up offsets state */
+	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
 
 	/*
 	 * Zero out the remainder of the current offsets page.  See notes in
@@ -2081,14 +2084,16 @@ TrimMultiXact(void)
 
 	LWLockRelease(MultiXactOffsetSLRULock);
 
-	/* And the same for members */
-	LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
-
 	/*
+	 * And the same for members.
+	 *
 	 * (Re-)Initialize our idea of the latest page number for members.
 	 */
 	pageno = MXOffsetToMemberPage(offset);
-	MultiXactMemberCtl->shared->latest_page_number = pageno;
+	pg_atomic_write_u64(&MultiXactMemberCtl->shared->latest_page_number,
+		pageno);
+
+	LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
 
 	/*
 	 * Zero out the remainder of the current members page.  See notes in
@@ -,7 +3338,8 @@ multixact_redo(XLogReaderState *record)
 		 * SimpleLruTruncate.
 		 */
 		pageno = MultiXactIdToOffsetPage(xlrec.endTruncOff);
-		MultiXactOffsetCtl->shared->latest_page_number = pageno;
+		pg_atomic_write_u64(&MultiXactOffsetCtl->shared->latest_page_number,
+			pageno);
 		PerformOffsetsTruncation(xlrec.startTruncOff, xlrec.endTruncOff);
 
 		LWLockRelease(MultiXactTruncationLock);
diff --git a/src/backend/access/transam/slru.c b/src/backend/access/transam/slru.c
index 9ac4790f16..c1d0dfc73b 100644
--- 

Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

2024-02-04 Thread Alvaro Herrera
In short, I propose the attached.

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/
From b4ba8135f8044e0077a27fcf6ad18451380cbcb3 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Wed, 31 Jan 2024 12:27:51 +0100
Subject: [PATCH v2] Use atomics for SlruSharedData->latest_page_number

---
 src/backend/access/transam/clog.c  |  6 +---
 src/backend/access/transam/commit_ts.c |  7 ++---
 src/backend/access/transam/multixact.c | 28 +++---
 src/backend/access/transam/slru.c  | 40 +++---
 src/include/access/slru.h  |  5 +++-
 5 files changed, 54 insertions(+), 32 deletions(-)

diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index f6e7da7ffc..f8aa91eb0a 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -766,14 +766,10 @@ StartupCLOG(void)
 	TransactionId xid = XidFromFullTransactionId(TransamVariables->nextXid);
 	int64		pageno = TransactionIdToPage(xid);
 
-	LWLockAcquire(XactSLRULock, LW_EXCLUSIVE);
-
 	/*
 	 * Initialize our idea of the latest page number.
 	 */
-	XactCtl->shared->latest_page_number = pageno;
-
-	LWLockRelease(XactSLRULock);
+	pg_atomic_write_u64(XactCtl->shared->latest_page_number, pageno);
 }
 
 /*
diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c
index 61b82385f3..6bfe60343e 100644
--- a/src/backend/access/transam/commit_ts.c
+++ b/src/backend/access/transam/commit_ts.c
@@ -689,9 +689,7 @@ ActivateCommitTs(void)
 	/*
 	 * Re-Initialize our idea of the latest page number.
 	 */
-	LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
-	CommitTsCtl->shared->latest_page_number = pageno;
-	LWLockRelease(CommitTsSLRULock);
+	pg_atomic_write_u64(&CommitTsCtl->shared->latest_page_number, pageno);
 
 	/*
 	 * If CommitTs is enabled, but it wasn't in the previous server run, we
@@ -1006,7 +1004,8 @@ commit_ts_redo(XLogReaderState *record)
 		 * During XLOG replay, latest_page_number isn't set up yet; insert a
 		 * suitable value to bypass the sanity test in SimpleLruTruncate.
 		 */
-		CommitTsCtl->shared->latest_page_number = trunc->pageno;
+		pg_atomic_write_u64(&CommitTsCtl->shared->latest_page_number,
+			trunc->pageno);
 
 		SimpleLruTruncate(CommitTsCtl, trunc->pageno);
 	}
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 59523be901..febc429f72 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -2017,13 +2017,15 @@ StartupMultiXact(void)
 	 * Initialize offset's idea of the latest page number.
 	 */
 	pageno = MultiXactIdToOffsetPage(multi);
-	MultiXactOffsetCtl->shared->latest_page_number = pageno;
+	pg_atomic_write_u64(&MultiXactOffsetCtl->shared->latest_page_number,
+		pageno);
 
 	/*
 	 * Initialize member's idea of the latest page number.
 	 */
 	pageno = MXOffsetToMemberPage(offset);
-	MultiXactMemberCtl->shared->latest_page_number = pageno;
+	pg_atomic_write_u64(&MultiXactMemberCtl->shared->latest_page_number,
+		pageno);
 }
 
 /*
@@ -2047,14 +2049,15 @@ TrimMultiXact(void)
 	oldestMXactDB = MultiXactState->oldestMultiXactDB;
 	LWLockRelease(MultiXactGenLock);
 
-	/* Clean up offsets state */
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
-
 	/*
 	 * (Re-)Initialize our idea of the latest page number for offsets.
 	 */
 	pageno = MultiXactIdToOffsetPage(nextMXact);
-	MultiXactOffsetCtl->shared->latest_page_number = pageno;
+	pg_atomic_write_u64(&MultiXactOffsetCtl->shared->latest_page_number,
+		pageno);
+
+	/* Clean up offsets state */
+	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
 
 	/*
 	 * Zero out the remainder of the current offsets page.  See notes in
@@ -2081,14 +2084,16 @@ TrimMultiXact(void)
 
 	LWLockRelease(MultiXactOffsetSLRULock);
 
-	/* And the same for members */
-	LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
-
 	/*
+	 * And the same for members.
+	 *
 	 * (Re-)Initialize our idea of the latest page number for members.
 	 */
 	pageno = MXOffsetToMemberPage(offset);
-	MultiXactMemberCtl->shared->latest_page_number = pageno;
+	pg_atomic_write_u64(&MultiXactMemberCtl->shared->latest_page_number,
+		pageno);
+
+	LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE);
 
 	/*
 	 * Zero out the remainder of the current members page.  See notes in
@@ -,7 +3338,8 @@ multixact_redo(XLogReaderState *record)
 		 * SimpleLruTruncate.
 		 */
 		pageno = MultiXactIdToOffsetPage(xlrec.endTruncOff);
-		MultiXactOffsetCtl->shared->latest_page_number = pageno;
+		pg_atomic_write_u64(&MultiXactOffsetCtl->shared->latest_page_number,
+			pageno);
 		PerformOffsetsTruncation(xlrec.startTruncOff, xlrec.endTruncOff);
 
 		LWLockRelease(MultiXactTruncationLock);
diff --git a/src/backend/access/transam/slru.c b/src/backend/access/transam/slru.c
index 9ac4790f16..c1d0dfc73b 100644
--- a/src/backend/access/transam/slru.c
+++ b/src/backend/access/transam/slru.c
@@ -17,7 +17,8 @@
  * per-buffer LWLocks that synchronize I/O for each buffer.  The control lock
  * must be held to 

Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

2024-02-04 Thread Alvaro Herrera
On 2024-Feb-02, Dilip Kumar wrote:

> I have checked the patch and it looks fine to me, other than the above
> question related to memory barrier usage. One more question about the
> same: basically, the two instances 1 and 2 below look similar, but in 1
> you are not using the memory write_barrier whereas in 2 you are using
> the write_barrier. Why is it so?  I mean, why can the reordering not
> happen in 1 while it may happen in 2?

What I was thinking is that there's a lwlock operation just below, which
acts as a barrier.  But I realized something more important: there are
only two places that matter, which are SlruSelectLRUPage and
SimpleLruZeroPage.  The others are all initialization code that runs at a
point where there's not going to be any concurrency in SLRU access, so we
don't need barriers anyway.  In SlruSelectLRUPage we definitely don't
want to evict the page that SimpleLruZeroPage has initialized, starting
from the point where it returns that new page to its caller.

But if you consider the code of those two routines, you realize that the
only time an equality between latest_page_number and "this_page_number"
is going to occur, is when both pages are in the same bank ... and both
routines are required to be holding the bank lock while they run, so in
practice this is never a problem.

We need the atomic write and atomic read so that multiple processes
processing pages in different banks can update latest_page_number
simultaneously.  But the equality condition that we're looking for?  It
can never happen concurrently.

In other words, these barriers are fully useless.

(We also have SimpleLruTruncate, but I think it's not as critical to
have a barrier there anyhow: accessing a slightly outdated page number
could only be a problem if a bug elsewhere causes us to try to truncate
in the current page.  I think we only have this code there because we
did have such bugs in the past, but IIUC this shouldn't happen anymore.)

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/




Re: Commitfest 2024-01 first week update

2024-02-04 Thread Alvaro Herrera
On 2024-Jan-10, Daniel Gustafsson wrote:

> > On 9 Jan 2024, at 23:18, Robert Haas  wrote:
> 
> > I think we need to be more aggressive about marking things returned
> > with feedback when they don't get updated.
> 
> I very much agree.  Having marked quite a lot of patches as RwF when being
> CFM I can attest that it gets very little off-list pushback or angry
> emails.  While it does happen, the overwhelming majority of responses are
> understanding and positive, so no CFM should be worried about "being the
> bad guy".

I like this idea very much -- return patches when the author does not
respond AFTER receiving feedback or the patch rotting.

However, this time around I saw that a bunch of patches were returned or
threatened to be returned JUST BECAUSE nobody had replied to the thread,
with a justification like "you need to generate more interest in your
patch".  This is a TERRIBLE idea, and there's one reason why creating a
new commitfest entry in the following commitfest is no good: 

At the FOSDEM developer meeting, we do a run of CF patch triage, where
we check the topmost patches in order of number-of-commitfests.  If you
return an old patch and a new CF entry is created, this number is reset,
and we could quite possibly fail to detect some very old patch because
of this.  At times, the attention a patch gets during the CF triage is
sufficient to get the patch moving forward after long inactivity, so
this is not academic.  Case in point: [1].

So by all means let's return patches that rot or fail to get updated per
feedback.  But DO NOT return patches because of inactivity.

[1] https://postgr.es/m/202402011550.sfszd46247zi@alvherre.pgsql

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/
"Las cosas son buenas o malas segun las hace nuestra opinión" (Lisias)




Re: Where can I find the doxyfile?

2024-02-04 Thread Alvaro Herrera
On 2024-Feb-02, John Morris wrote:

> Here is the updated patch. It should fix the meson issue when no
> doxygen is present.

I wish you'd stop proposing the lex filter so that we can discuss the
Doxyfile.in file and the accompanying Meson rule.

I think there's pretty much zero chance that we'd accept the
doxy_filter.l thingy in our source tree.  It seems something you want to
keep for your personal use.  I think this filter is the kind of genius
idea that is OK if you do it at home, but having something that
purports to display our source code and then shows something that is
*not* our source tree sounds insane.  Who knows what artifacts are going
to show up in the doxygen output, and then we'll be on the hook to fix
those.  I propose to add the file to the wiki, maybe link to it from the
Developer_FAQ page.

Also, I'm not sure about the use case of people who want to study the
Postgres source code but don't have time to download it into VSCode or
whatever.

In short: -1 from me.

I see the Doxy thing as roughly parallel to the coverage build rules ...
but the coverage rules seem much more closely intertwined with the tree
in a way that the Doxygen processing is not.  Speaking from my personal
point of view, I make very little use of our Doxygen pages, but I do use
them --- and very occasionally I would like to see what the changes
would be with some patch applied.  However, again, maybe having the .in
file in the wiki is enough, where you can download it and have your own
script to run it, and tweak the prefs however you want them and so on?

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/