date:20220711

Re: [PATCH] Optimize json_lex_string by batching character copying

2022-07-11 Thread John Naylor

On Mon, Jul 11, 2022 at 11:07 PM Andres Freund  wrote:

> I wonder if we can add a somewhat more general function for scanning until
> some characters are found using SIMD? There's plenty other places that
could
> be useful.

In simple cases, we could possibly abstract the entire loop. With this
particular case, I imagine the most approachable way to write the loop
would be a bit more low-level:

while (p < end - VECTOR_WIDTH &&
   !vector_has_byte(p, '\\') &&
   !vector_has_byte(p, '"') &&
   vector_min_byte(p, 0x20))
p += VECTOR_WIDTH

I wonder if we'd lose a bit of efficiency here by not accumulating set bits
from the three conditions, but it's worth trying.
-- 
John Naylor
EDB: http://www.enterprisedb.com

Re: First draft of the PG 15 release notes

2022-07-11 Thread Noah Misch

On Mon, Jul 11, 2022 at 12:39:57PM -0400, Bruce Momjian wrote:
> I had trouble reading the sentences in the order you used so I
> restructured it:
> 
>   The new default is one of the secure schema usage patterns thatlinkend="ddl-schemas-patterns"/> has recommended since the security
>   release for CVE-2018-1058.  The change applies to newly-created
>   databases in existing clusters and for new clusters.  Upgrading a
>   cluster or restoring a database dump will preserve existing permissions.

I agree with the sentence order change.

>   For existing databases, especially those having multiple users, consider
>   issuing REVOKE to adopt this new default.  For new
>   databases having zero need to defend against insider threats, granting
>   USAGE permission on their public
>   schemas will yield the behavior of prior releases.

s/USAGE/CREATE/ in the last sentence.  Looks good with that change.

Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns

2022-07-11 Thread Amit Kapila

On Tue, Jul 12, 2022 at 11:38 AM Masahiko Sawada  wrote:
>
> On Tue, Jul 12, 2022 at 10:28 AM Masahiko Sawada  
> wrote:
> >
> >
> > I'm doing benchmark tests and will share the results.
> >
>
> I've done benchmark tests to measure the overhead introduced by doing
> bsearch() every time when decoding a commit record. I've simulated a
> very intensified situation where we decode 1M commit records while
> keeping builder->catchange.xip array but the overhead is negilible:
>
> HEAD: 584 ms
> Patched: 614 ms
>
> I've attached the benchmark script I used. With increasing
> LOG_SNAPSHOT_INTERVAL_MS to 9, the last decoding by
> pg_logicla_slot_get_changes() decodes 1M commit records while keeping
> catalog modifying transactions.
>

Thanks for the test. We should also see how it performs when (a) we
don't change LOG_SNAPSHOT_INTERVAL_MS, and (b) we have more DDL xacts
so that the array to search is somewhat bigger

-- 
With Regards,
Amit Kapila.

Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns

2022-07-11 Thread Masahiko Sawada

On Tue, Jul 12, 2022 at 10:28 AM Masahiko Sawada  wrote:
>
> On Tue, Jul 12, 2022 at 9:48 AM Masahiko Sawada  wrote:
> >
> > On Fri, Jul 8, 2022 at 8:20 PM Masahiko Sawada  
> > wrote:
> > >
> > > On Fri, Jul 8, 2022 at 5:59 PM Amit Kapila  
> > > wrote:
> > > >
> > > > On Fri, Jul 8, 2022 at 12:46 PM Masahiko Sawada  
> > > > wrote:
> > > > >
> > > > > On Fri, Jul 8, 2022 at 3:27 PM Amit Kapila  
> > > > > wrote:
> > > > > >
> > > > >
> > > > > > 1.
> > > > > > In ReorderBufferGetCatalogChangesXacts(), isn't it better to use the
> > > > > > list length of 'catchange_txns' to allocate xids array? If we can do
> > > > > > so, then we will save the need to repalloc as well.
> > > > >
> > > > > Since ReorderBufferGetcatalogChangesXacts() collects all ongoing
> > > > > catalog modifying transactions, the length of the array could be
> > > > > bigger than the one taken last time. We can start with the previous
> > > > > length but I think we cannot remove the need for repalloc.
> > > > >
> > > >
> > > > It is using the list "catchange_txns" to form xid array which
> > > > shouldn't change for the duration of
> > > > ReorderBufferGetCatalogChangesXacts(). Then the caller frees the xid
> > > > array after its use. Next time in
> > > > ReorderBufferGetCatalogChangesXacts(), the fresh allocation for xid
> > > > array happens, so not sure why repalloc would be required?
> > >
> > > Oops, I mistook catchange_txns for catchange->xcnt. You're right.
> > > Starting with the length of catchange_txns should be sufficient.
> > >
> >
> > I've attached an updated patch.
> >
> > While trying this idea, I noticed there is no API to get the length of
> > dlist, as we discussed offlist. Alternative idea was to use List
> > (T_XidList) but I'm not sure it's a great idea since deleting an xid
> > from the list is O(N), we need to implement list_delete_xid, and we
> > need to make sure allocating list node in the reorder buffer context.
> > So in the patch, I added a variable, catchange_ntxns, to keep track of
> > the length of the dlist. Please review it.
> >
>
> I'm doing benchmark tests and will share the results.
>

I've done benchmark tests to measure the overhead introduced by doing
bsearch() every time when decoding a commit record. I've simulated a
very intensified situation where we decode 1M commit records while
keeping builder->catchange.xip array but the overhead is negilible:

HEAD: 584 ms
Patched: 614 ms

I've attached the benchmark script I used. With increasing
LOG_SNAPSHOT_INTERVAL_MS to 9, the last decoding by
pg_logicla_slot_get_changes() decodes 1M commit records while keeping
catalog modifying transactions.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/


bench.spec
Description: Binary data

Re: Reducing Memory Consumption (aset and generation)

2022-07-11 Thread David Rowley

On Mon, 11 Jul 2022 at 20:48, Matthias van de Meent
 wrote:
> > 2) v1-002-generation-reduces-memory-consumption.patch
> > Reduces memory used by struct GenerationBlock, by minus 8 bits,
>
> That seems fairly straight-forward -- 8 bytes saved on each page isn't
> a lot, but it's something.

I think 002 is likely the only patch here that has some merit.
However, it's hard to imagine any measurable performance gains from
it.  I think the smallest generation block we have today is 8192
bytes. Saving 8 bytes in that equates to a saving of 0.1% of memory.
For an 8MB page, it's 1024 times less than that.

I imagine Ranier has been working on this due the performance
regression mentioned in [1].  I think it'll be much more worthwhile to
aim to reduce the memory chunk overheads rather than the block
overheads, as Ranier is doing here. I posted a patch in [2] which does
that.  To make that work, I need to have the owning context in the
block. The 001 and 003 patch seems to remove those here.

David

[1] 
https://www.postgresql.org/message-id/caaphdvqxplzav6duer5vo_rbh_fehrhmlhigvqxw9jhcykp...@mail.gmail.com
[2] 
https://www.postgresql.org/message-id/CAApHDvpjauCRXcgcaL6+e3eqecEHoeRm9D-kcbuvBitgPnW=v...@mail.gmail.com

Re: Remove trailing newlines from pg_upgrade's messages

2022-07-11 Thread Peter Eisentraut




On 14.06.22 20:57, Tom Lane wrote:

Hence, the patch below removes trailing newlines from all of
pg_upgrade's message strings, and teaches its logging infrastructure
to print them where appropriate.  As in logging.c, there's now an
Assert that no format string passed to pg_log() et al ends with
a newline.


This patch looks okay to me.  I compared the output before and after in 
a few scenarios and didn't see any problematic differences.



This doesn't quite exactly match the code's prior behavior.  Aside
from the buggy-looking newlines mentioned above, there are a few
messages that formerly ended with a double newline, thus intentionally
producing a blank line, and now they don't.  I could have removed just
one of their newlines, but I'd have had to give up the Assert about
it, and I did not think that the extra blank lines were important
enough to justify that.


In this particular patch, the few empty lines that disappeared don't 
bother me.  In general, however, I think we can just fprintf(stderr, 
"\n") directly as necessary.

Re: PG15 beta1 sort performance regression due to Generation context change

2022-07-11 Thread David Rowley

On Fri, 3 Jun 2022 at 16:03, John Naylor  wrote:
>
> On Fri, Jun 3, 2022 at 10:14 AM David Rowley  wrote:
> > 4. As I just demonstrated in [1], if anyone is caught by this and has
> > a problem, the work_mem size increase required seems very small to get
> > performance back to better than in PG14. I found that setting work_mem
> > to 64.3MB makes PG15 faster than PG14 for the problem case. If anyone
> > happened to hit this case and find the performance regression
> > unacceptable then they have a way out... increase work_mem a little.
>
> Since #4 is such a small lift, I'd be comfortable with closing the open item.

I also think that we should close off this open item for PG15.  The
majority of cases are faster now and it is possible for anyone who
does happen to hit a bad case to raise work_mem by some fairly small
fraction.

I posted a WIP patch on [1] that we aim to get into PG16 to improve
the situation here further. I'm hoping the fact that we do have some
means to fix this for PG16 might mean we can leave this as a known
issue for PG15.

So far only Robert has raised concerns with this regression for PG15
(see [2]). Tom voted for leaving things as they are for PG15 in [3].
John agrees, as quoted above. Does anyone else have any opinion?

David

[1] 
https://www.postgresql.org/message-id/CAApHDvpjauCRXcgcaL6+e3eqecEHoeRm9D-kcbuvBitgPnW=v...@mail.gmail.com
[2] 
https://www.postgresql.org/message-id/ca+tgmozoyxfbn+aejgfjjccbew8mmkhffvcc61kp2oncmey...@mail.gmail.com
[3] https://www.postgresql.org/message-id/180278.1654009...@sss.pgh.pa.us

Re: [PATCH] New [relation] option engine

2022-07-11 Thread Nikolay Shaplov

В письме от понедельник, 11 июля 2022 г. 23:03:55 MSK пользователь Jeff Davis 
написал:

> > For index access methods "amoptions" member function that preformed
> > option
> > processing, were replaced with "amreloptspecset" member function that
> > provided
> > an SpecSet for reloptions for this AM, so caller can trigger option
> > processing
> > himself.
> 
> What about table access methods? There have been a couple attempts to
> allow custom reloptions for table AMs. Does this patch help that use
> case?

This patch does not add custom reloptions for table AM. It is already huge 
enough, with no extra functionality. But new option engine will make adding 
custom options for table AM more easy task, as main goal of this patch is to 
simplify adding options everywhere they needed. And yes, adding custom table 
AM options is one of my next goals, as soon as this patch is commit.

-- 
Nikolay Shaplov aka Nataraj
Fuzzing Engineer at Postgres Professional
Matrix IM: @dhyan:nataraj.su


signature.asc
Description: This is a digitally signed message part.

Re: Handle infinite recursion in logical replication setup

2022-07-11 Thread Amit Kapila

On Tue, Jul 12, 2022 at 8:43 AM vignesh C  wrote:
>
> On Mon, Jul 11, 2022 at 9:11 AM Peter Smith  wrote:
> >
> > Here are my review comments for the v30* patches:
> >
> > 
> > v30-0001
> > 
> >
> > 1.1 
> >
> > I was wondering if it is better to implement a new defGetOrigin method
> > now instead of just using the defGetString to process the 'origin',
> > since you may need to do that in future anyway if the 'origin' name is
> > planned to become specifiable by the user. OTOH maybe you prefer to
> > change this code later when the time comes. I am not sure what way is
> > best.
>
> I preferred to do that change when the feature is getting extended.
>

+1.

*
+$node_C->safe_psql(
+ 'postgres', "
+DELETE FROM tab");
+$node_B->safe_psql(
+ 'postgres', "
+DELETE FROM tab where a = 32");
+
+$node_A->wait_for_catchup($subname_BA);
+$node_B->wait_for_catchup($subname_AB);

Here, don't we need to use node_C instead of node_A for waiting as we
have performed an operation on node_C?

-- 
With Regards,
Amit Kapila.

Re: DropRelFileLocatorBuffers

2022-07-11 Thread Dilip Kumar

On Tue, Jul 12, 2022 at 7:55 AM Kyotaro Horiguchi
 wrote:
>
> At Mon, 11 Jul 2022 13:51:12 -0400, Robert Haas  wrote 
> in
> > On Fri, Jul 8, 2022 at 1:59 AM Kyotaro Horiguchi
> >  wrote:
> > > The function CreateAndCopyRelationData exists since before b0a55e4329
> > > but renamed since it takes RelFileLocators.
> >
> > I'm not very sold on this. I think that the places where you've
> > replaced RelFileLocator with just RelFile in various functions might
> > be an improvement, but the places where you've replaced Relation with
> > RelFile seem to me to be worse. I don't really see that there's
> > anything wrong with names like CreateAndCopyRelationData or
> > FlushRelationsAllBuffers, and in general I prefer function names that
> > are made up of whole words rather than parts of words.
>
> Fair enough. My first thought was that Relation can represent both
> Relation and "RelFile" but in the patch I choosed to make distinction
> between them by associating respectively to the types of the primary
> parameter (Relation or RelFileLocator).  So I'm fine with Relation
> instead since I see it more intuitive than RelFileLocator in the
> function names.
>
> The attached is that.

I think the naming used in your patch looks better to me. So +1 for the change.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Re: Issue with pg_stat_subscription_stats

2022-07-11 Thread Amit Kapila

On Tue, Jul 12, 2022 at 3:26 AM Andres Freund  wrote:
>
> On 2022-07-07 10:50:27 +0900, Masahiko Sawada wrote:
> > Right, it's more robust. I've updated the patch accordingly.
>
> Do others have thoughts about backpatching this to 15 or not?
>

I am not against backpatching this but OTOH it doesn't appear critical
enough to block one's work, so not backpatching should be fine.

-- 
With Regards,
Amit Kapila.

RE: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns

2022-07-11 Thread shiy.f...@fujitsu.com

On Tue, Jul 12, 2022 8:49 AM Masahiko Sawada  wrote:
> 
> I've attached an updated patch.
> 
> While trying this idea, I noticed there is no API to get the length of
> dlist, as we discussed offlist. Alternative idea was to use List
> (T_XidList) but I'm not sure it's a great idea since deleting an xid
> from the list is O(N), we need to implement list_delete_xid, and we
> need to make sure allocating list node in the reorder buffer context.
> So in the patch, I added a variable, catchange_ntxns, to keep track of
> the length of the dlist. Please review it.
> 

Thanks for your patch. Here are some comments on the master patch.

1.
In catalog_change_snapshot.spec, should we use "RUNNING_XACTS record" instead of
"RUNNING_XACT record" / "XACT_RUNNING record" in the comment?

2.
+* Since catchange.xip is sorted, we find the lower bound of
+* xids that sill are interesting.

Typo?
"sill" -> "still"

3.
+* This array is set once when restoring the snapshot, xids are removed
+* from the array when decoding xl_running_xacts record, and then 
eventually
+* becomes an empty.

+   /* catchange list becomes an empty */
+   pfree(builder->catchange.xip);
+   builder->catchange.xip = NULL;

Should "becomes an empty" be modified to "becomes empty"?

4.
+ * changes that are smaller than ->xmin. Those won't ever get checked via
+ * the ->committed array and ->catchange, respectively. The committed xids will

Should we change 
"the ->committed array and ->catchange"
to
"the ->committed or ->catchange array"
?

Regards,
Shi yu

Re: Handle infinite recursion in logical replication setup

2022-07-11 Thread vignesh C

On Mon, Jul 11, 2022 at 5:03 PM Amit Kapila  wrote:
>
> On Sat, Jul 9, 2022 at 8:11 PM vignesh C  wrote:
> >
>
> Thanks, a few more comments on v30_0002*
> 1.
> +/*
> + * Represents whether copy_data parameter is specified with off, on or force.
>
> A comma is required after on.

Modified

> 2.
>   qsort(subrel_local_oids, list_length(subrel_states),
> sizeof(Oid), oid_cmp);
>
> + check_pub_table_subscribed(wrconn, sub->publications, copy_data,
> +sub->origin, subrel_local_oids,
> +list_length(subrel_states));
>
> We can avoid using list_length by using an additional variable in this case.

Modified

> 3.
> errmsg("table: \"%s.%s\" might have replicated data in the publisher",
> +nspname, relname),
>
> Why ':' is used after the table in the above message? I don't see such
> a convention at other places in the code. Also, having might in the
> error messages makes it less clear, so, can we slightly change the
> message as in the attached patch?

Modified as suggested

> 4. I have made some additional changes in the comments, kindly check
> the attached and merge those if you are okay.

I have merged the changes

> 5.
> +$node_C->safe_psql(
> + 'postgres', "
> +DELETE FROM tab_full");
> +$node_B->safe_psql(
> + 'postgres', "
> +DELETE FROM tab_full where a = 13");
>
> Don't we need to wait for these operations to replicate?

Modified to include wait

Thanks for the comments, the v31 patch attached at [1] has the changes
for the same.
[1] - 
https://www.postgresql.org/message-id/CALDaNm2-D860yULtcmZAzDbdiof-Dg6Y_YaY4owbO6Rj%3DXEHMw%40mail.gmail.com

Regards,
Vignesh

Re: DropRelFileLocatorBuffers

2022-07-11 Thread Kyotaro Horiguchi

At Mon, 11 Jul 2022 13:51:12 -0400, Robert Haas  wrote 
in 
> On Fri, Jul 8, 2022 at 1:59 AM Kyotaro Horiguchi
>  wrote:
> > The function CreateAndCopyRelationData exists since before b0a55e4329
> > but renamed since it takes RelFileLocators.
> 
> I'm not very sold on this. I think that the places where you've
> replaced RelFileLocator with just RelFile in various functions might
> be an improvement, but the places where you've replaced Relation with
> RelFile seem to me to be worse. I don't really see that there's
> anything wrong with names like CreateAndCopyRelationData or
> FlushRelationsAllBuffers, and in general I prefer function names that
> are made up of whole words rather than parts of words.

Fair enough. My first thought was that Relation can represent both
Relation and "RelFile" but in the patch I choosed to make distinction
between them by associating respectively to the types of the primary
parameter (Relation or RelFileLocator).  So I'm fine with Relation
instead since I see it more intuitive than RelFileLocator in the
function names.

The attached is that.

> > While working on this, I found that the following coment is wrong.
..
> I committed this.

Thanks!

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From dbb3dbebbea6aae29588bfccdbda28d0cf15a6fe Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Fri, 8 Jul 2022 13:20:31 +0900
Subject: [PATCH v2] Rename some *RelFileLocator* functions

A recent commit b0a55e4329 changed the term RelFileNode to
RelFileLocator. Accordingly the names of some functions were
changed. However, since RelFileLocator doesn't fully cover what
RelFileNode represented, some of the function names have gotten less
intuitive.

This commit replaces all uses of RelFileLocator in function names with
Relation because that uses of RelFileLocator actually mean storage
object.
---
 src/backend/storage/buffer/bufmgr.c   | 63 +--
 src/backend/storage/buffer/localbuf.c | 16 +++
 src/backend/storage/smgr/smgr.c   |  4 +-
 src/include/storage/buf_internals.h   |  7 +--
 src/include/storage/bufmgr.h  | 10 ++---
 5 files changed, 50 insertions(+), 50 deletions(-)

diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index e257ae23e4..c7d7abcd73 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -121,7 +121,7 @@ typedef struct CkptTsStatus
  * Type for array used to sort SMgrRelations
  *
  * FlushRelationsAllBuffers shares the same comparator function with
- * DropRelFileLocatorsAllBuffers. Pointer to this struct and RelFileLocator must be
+ * DropRelationsAllBuffers. Pointer to this struct and RelFileLocator must be
  * compatible.
  */
 typedef struct SMgrSortArray
@@ -483,10 +483,10 @@ static BufferDesc *BufferAlloc(SMgrRelation smgr,
 			   BufferAccessStrategy strategy,
 			   bool *foundPtr);
 static void FlushBuffer(BufferDesc *buf, SMgrRelation reln);
-static void FindAndDropRelFileLocatorBuffers(RelFileLocator rlocator,
-			 ForkNumber forkNum,
-			 BlockNumber nForkBlock,
-			 BlockNumber firstDelBlock);
+static void FindAndDropRelationBuffers(RelFileLocator rlocator,
+	   ForkNumber forkNum,
+	   BlockNumber nForkBlock,
+	   BlockNumber firstDelBlock);
 static void RelationCopyStorageUsingBuffer(Relation src, Relation dst,
 		   ForkNumber forkNum,
 		   bool isunlogged);
@@ -3026,7 +3026,7 @@ BufferGetLSNAtomic(Buffer buffer)
 }
 
 /* -
- *		DropRelFileLocatorBuffers
+ *		DropRelationBuffers
  *
  *		This function removes from the buffer pool all the pages of the
  *		specified relation forks that have block numbers >= firstDelBlock.
@@ -3047,8 +3047,8 @@ BufferGetLSNAtomic(Buffer buffer)
  * 
  */
 void
-DropRelFileLocatorBuffers(SMgrRelation smgr_reln, ForkNumber *forkNum,
-		  int nforks, BlockNumber *firstDelBlock)
+DropRelationBuffers(SMgrRelation smgr_reln, ForkNumber *forkNum,
+	int nforks, BlockNumber *firstDelBlock)
 {
 	int			i;
 	int			j;
@@ -3064,8 +3064,8 @@ DropRelFileLocatorBuffers(SMgrRelation smgr_reln, ForkNumber *forkNum,
 		if (rlocator.backend == MyBackendId)
 		{
 			for (j = 0; j < nforks; j++)
-DropRelFileLocatorLocalBuffers(rlocator.locator, forkNum[j],
-			   firstDelBlock[j]);
+DropRelationLocalBuffers(rlocator.locator, forkNum[j],
+		 firstDelBlock[j]);
 		}
 		return;
 	}
@@ -3115,8 +3115,8 @@ DropRelFileLocatorBuffers(SMgrRelation smgr_reln, ForkNumber *forkNum,
 		nBlocksToInvalidate < BUF_DROP_FULL_SCAN_THRESHOLD)
 	{
 		for (j = 0; j < nforks; j++)
-			FindAndDropRelFileLocatorBuffers(rlocator.locator, forkNum[j],
-			 nForkBlock[j], firstDelBlock[j]);
+			FindAndDropRelationBuffers(rlocator.locator, forkNum[j],
+	   nForkBlock[j], firstDelBlock[j]

Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)

2022-07-11 Thread Melanie Plageman

Hi,

In the attached patch set, I've added in missing IO operations for
certain IO Paths as well as enumerating in the commit message which IO
Paths and IO Operations are not currently counted and or not possible.

There is a TODO in HandleWalWriterInterrupts() about removing
pgstat_report_wal() since it is immediately before a proc_exit()

I was wondering if LocalBufferAlloc() should increment the counter or if
I should wait until GetLocalBufferStorage() to increment the counter.

I also realized that I am not differentiating between IOPATH_SHARED and
IOPATH_STRATEGY for IOOP_FSYNC. But, given that we don't know what type
of buffer we are fsync'ing by the time we call register_dirty_segment(),
I'm not sure how we would fix this.

On Wed, Jul 6, 2022 at 3:20 PM Andres Freund  wrote:

> On 2022-07-05 13:24:55 -0400, Melanie Plageman wrote:
> > From 2d089e26236c55d1be5b93833baa0cf7667ba38d Mon Sep 17 00:00:00 2001
> > From: Melanie Plageman 
> > Date: Tue, 28 Jun 2022 11:33:04 -0400
> > Subject: [PATCH v22 1/3] Add BackendType for standalone backends
> >
> > All backends should have a BackendType to enable statistics reporting
> > per BackendType.
> >
> > Add a new BackendType for standalone backends, B_STANDALONE_BACKEND (and
> > alphabetize the BackendTypes). Both the bootstrap backend and single
> > user mode backends will have BackendType B_STANDALONE_BACKEND.
> >
> > Author: Melanie Plageman 
> > Discussion:
> https://www.postgresql.org/message-id/CAAKRu_aaq33UnG4TXq3S-OSXGWj1QGf0sU%2BECH4tNwGFNERkZA%40mail.gmail.com
> > ---
> >  src/backend/utils/init/miscinit.c | 17 +++--
> >  src/include/miscadmin.h   |  5 +++--
> >  2 files changed, 14 insertions(+), 8 deletions(-)
> >
> > diff --git a/src/backend/utils/init/miscinit.c
> b/src/backend/utils/init/miscinit.c
> > index eb43b2c5e5..07e6db1a1c 100644
> > --- a/src/backend/utils/init/miscinit.c
> > +++ b/src/backend/utils/init/miscinit.c
> > @@ -176,6 +176,8 @@ InitStandaloneProcess(const char *argv0)
> >  {
> >   Assert(!IsPostmasterEnvironment);
> >
> > + MyBackendType = B_STANDALONE_BACKEND;
>
> Hm. This is used for singleuser mode as well as bootstrap. Should we
> split those? It's not like bootstrap mode really matters for stats, so
> I'm inclined not to.
>
>
I have no opinion currently.
It depends on how commonly you think developers might want separate
bootstrap and single user mode IO stats.


>
> > @@ -375,6 +376,8 @@ BootstrapModeMain(int argc, char *argv[], bool
> check_only)
> >* out the initial relation mapping files.
> >*/
> >   RelationMapFinishBootstrap();
> > + // TODO: should this be done for bootstrap?
> > + pgstat_report_io_ops();
>
> Hm. Not particularly useful, but also not harmful. But we don't need an
> explicit call, because it'll be done at process exit too. At least I
> think, it could be that it's different for bootstrap.
>
>
>
I've removed this and other occurrences which were before proc_exit()
(and thus redundant). (Though I did not explicitly check if it was
different for bootstrap.)


>
> > diff --git a/src/backend/postmaster/autovacuum.c
> b/src/backend/postmaster/autovacuum.c
> > index 2e146aac93..e6dbb1c4bb 100644
> > --- a/src/backend/postmaster/autovacuum.c
> > +++ b/src/backend/postmaster/autovacuum.c
> > @@ -1712,6 +1712,9 @@ AutoVacWorkerMain(int argc, char *argv[])
> >   recentXid = ReadNextTransactionId();
> >   recentMulti = ReadNextMultiXactId();
> >   do_autovacuum();
> > +
> > + // TODO: should this be done more often somewhere in
> do_autovacuum()?
> > + pgstat_report_io_ops();
> >   }
>
> Don't think you need all these calls before process exit - it'll happen
> via pgstat_shutdown_hook().
>
> IMO it'd be a good idea to add pgstat_report_io_ops() to
> pgstat_report_vacuum()/analyze(), so that the stats for a longrunning
> autovac worker get updated more regularly.
>

noted and fixed.


>
>
> > diff --git a/src/backend/postmaster/bgwriter.c
> b/src/backend/postmaster/bgwriter.c
> > index 91e6f6ea18..87e4b9e9bd 100644
> > --- a/src/backend/postmaster/bgwriter.c
> > +++ b/src/backend/postmaster/bgwriter.c
> > @@ -242,6 +242,7 @@ BackgroundWriterMain(void)
> >
> >   /* Report pending statistics to the cumulative stats
> system */
> >   pgstat_report_bgwriter();
> > + pgstat_report_io_ops();
> >
> >   if (FirstCallSinceLastCheckpoint())
> >   {
>
> How about moving the pgstat_report_io_ops() into
> pgstat_report_bgwriter(), pgstat_report_autovacuum() etc? Seems
> unnecessary to have multiple pgstat_* calls in these places.
>
>
>
noted and fixed.


>
> > +/*
> > + * Flush out locally pending IO Operation statistics entries
> > + *
> > + * If nowait is true, this function returns false on lock failure.
> Otherwise
> > + * this function always returns true. Writer processes are mutually
> excluded
> > + * using LWLock, but readers are expe

Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns

2022-07-11 Thread Masahiko Sawada

On Tue, Jul 12, 2022 at 9:48 AM Masahiko Sawada  wrote:
>
> On Fri, Jul 8, 2022 at 8:20 PM Masahiko Sawada  wrote:
> >
> > On Fri, Jul 8, 2022 at 5:59 PM Amit Kapila  wrote:
> > >
> > > On Fri, Jul 8, 2022 at 12:46 PM Masahiko Sawada  
> > > wrote:
> > > >
> > > > On Fri, Jul 8, 2022 at 3:27 PM Amit Kapila  
> > > > wrote:
> > > > >
> > > >
> > > > > 1.
> > > > > In ReorderBufferGetCatalogChangesXacts(), isn't it better to use the
> > > > > list length of 'catchange_txns' to allocate xids array? If we can do
> > > > > so, then we will save the need to repalloc as well.
> > > >
> > > > Since ReorderBufferGetcatalogChangesXacts() collects all ongoing
> > > > catalog modifying transactions, the length of the array could be
> > > > bigger than the one taken last time. We can start with the previous
> > > > length but I think we cannot remove the need for repalloc.
> > > >
> > >
> > > It is using the list "catchange_txns" to form xid array which
> > > shouldn't change for the duration of
> > > ReorderBufferGetCatalogChangesXacts(). Then the caller frees the xid
> > > array after its use. Next time in
> > > ReorderBufferGetCatalogChangesXacts(), the fresh allocation for xid
> > > array happens, so not sure why repalloc would be required?
> >
> > Oops, I mistook catchange_txns for catchange->xcnt. You're right.
> > Starting with the length of catchange_txns should be sufficient.
> >
>
> I've attached an updated patch.
>
> While trying this idea, I noticed there is no API to get the length of
> dlist, as we discussed offlist. Alternative idea was to use List
> (T_XidList) but I'm not sure it's a great idea since deleting an xid
> from the list is O(N), we need to implement list_delete_xid, and we
> need to make sure allocating list node in the reorder buffer context.
> So in the patch, I added a variable, catchange_ntxns, to keep track of
> the length of the dlist. Please review it.
>

I'm doing benchmark tests and will share the results.

Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/

Re: [PoC] Improve dead tuple storage for lazy vacuum

2022-07-11 Thread Masahiko Sawada

On Fri, Jul 8, 2022 at 3:43 PM John Naylor  wrote:
>
> On Fri, Jul 8, 2022 at 9:10 AM Masahiko Sawada  wrote:
>
> > I guess that the tree height is affected by where garbages are, right?
> > For example, even if all garbage in the table is concentrated in
> > 0.5GB, if they exist between 2^17 and 2^18 block, we use the first
> > byte of blockhi. If the table is larger than 128GB, the second byte of
> > the blockhi could be used depending on where the garbage exists.
>
> Right.
>
> > Another variation of how to store TID would be that we use the block
> > number as a key and store a bitmap of the offset as a value. We can
> > use Bitmapset for example,
>
> I like the idea of using existing code to set/check a bitmap if it's
> convenient. But (in case that was implied here) I'd really like to
> stay away from variable-length values, which would require
> "Single-value leaves" (slow). I also think it's fine to treat the
> key/value as just bits, and not care where exactly they came from, as
> we've been talking about.
>
> > or an approach like Roaring bitmap.
>
> This would require two new data structures instead of one. That
> doesn't seem like a path to success.

Agreed.

>
> > I think that at this stage it's better to define the design first. For
> > example, key size and value size, and these sizes are fixed or can be
> > set the arbitary size?
>
> I don't think we need to start over. Andres' prototype had certain
> design decisions built in for the intended use case (although maybe
> not clearly documented as such). Subsequent patches in this thread
> substantially changed many design aspects. If there were any changes
> that made things wonderful for vacuum, it wasn't explained, but Andres
> did explain how some of these changes were not good for other uses.
> Going to fixed 64-bit keys and values should still allow many future
> applications, so let's do that if there's no reason not to.

I thought Andres pointed out that given that we store BufferTag (or
part of that) into the key, the fixed 64-bit keys might not be enough
for buffer mapping use cases. If we want to use wider keys more than
64-bit, we would need to consider it.

>
> > For value size, if we support
> > different value sizes specified by the user, we can either embed
> > multiple values in the leaf node (called Multi-value leaves in ART
> > paper)
>
> I don't think "Multi-value leaves" allow for variable-length values,
> FWIW. And now I see I also used this term wrong in my earlier review
> comment -- v3/4 don't actually use "multi-value leaves", but Andres'
> does (going by the multiple leaf types). From the paper: "Multi-value
> leaves: The values are stored in one of four different leaf node
> types, which mirror the structure of inner nodes, but contain values
> instead of pointers."

Right, but sorry I meant the user specifies the arbitrary fixed-size
value length on creation like we do in dynahash.c.

>
> (It seems v3/v4 could be called a variation of "Combined pointer/value
> slots: If values fit into pointers, no separate node types are
> necessary. Instead, each pointer storage location in an inner node can
> either store a pointer or a value." But without the advantage of
> variable length keys).

Agreed.

Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/

Re: Cleaning up historical portability baggage

2022-07-11 Thread Thomas Munro

On Tue, Jul 12, 2022 at 4:46 AM Robert Haas  wrote:
> I don't think that 0001 is buying us a whole lot, really. I prefer the
> style where we have PG-specific functions that behave differently on
> different platforms to the one where we call something that looks like
> a native OS function call on all platforms but on some of them it is
> secretly invoking a replacement implementation in src/port. The
> problem with the latter is it looks like you're using something that's
> universally supported and works the same way everywhere, but you're
> really not. If it were up to me, we'd have more pg_whatever() that
> calls whatever() on non-Windows and something else on Windows, rather
> than going in the direction that this patch takes us.

Hmm, but that's not what we're doing in general.  For example, on
Windows we're redirecting open() to a replacement function of our own,
we're not using "pg_open()" in our code.  That's not an example based
on AC_REPLACE_FUNCS, but there are plenty of those too.  Isn't this
quite well established?

AFAIK we generally only use pg_whatever() when there's a good reason,
such as an incompatibility, a complication or a different abstraction
that you want to highlight to a reader.  The reason here was
temporary: we couldn't implement standard pread/pwrite perfectly on
ancient HP-UX, but we *can* implement it on Windows, so the reason is
gone.

These particular pg_ prefixes have only been in our tree for a few
years and I was hoping to boot them out again before they stick, like
"Size".  I like using standard interfaces where possible for the very
basic stuff, to de-weird our stuff.

> I like all of the other patches. Reducing the number of configure
> tests that we need seems like a really good idea.

Thanks for looking.  Yeah, we could also be a little more aggressive
about removing configure tests, in the cases where it's just Windows
vs !Windows.  "HAVE_XXX" tests that are always true on POSIX systems
at the level we require would then be unnecessary.

Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns

2022-07-11 Thread Masahiko Sawada

On Fri, Jul 8, 2022 at 8:20 PM Masahiko Sawada  wrote:
>
> On Fri, Jul 8, 2022 at 5:59 PM Amit Kapila  wrote:
> >
> > On Fri, Jul 8, 2022 at 12:46 PM Masahiko Sawada  
> > wrote:
> > >
> > > On Fri, Jul 8, 2022 at 3:27 PM Amit Kapila  
> > > wrote:
> > > >
> > >
> > > > 1.
> > > > In ReorderBufferGetCatalogChangesXacts(), isn't it better to use the
> > > > list length of 'catchange_txns' to allocate xids array? If we can do
> > > > so, then we will save the need to repalloc as well.
> > >
> > > Since ReorderBufferGetcatalogChangesXacts() collects all ongoing
> > > catalog modifying transactions, the length of the array could be
> > > bigger than the one taken last time. We can start with the previous
> > > length but I think we cannot remove the need for repalloc.
> > >
> >
> > It is using the list "catchange_txns" to form xid array which
> > shouldn't change for the duration of
> > ReorderBufferGetCatalogChangesXacts(). Then the caller frees the xid
> > array after its use. Next time in
> > ReorderBufferGetCatalogChangesXacts(), the fresh allocation for xid
> > array happens, so not sure why repalloc would be required?
>
> Oops, I mistook catchange_txns for catchange->xcnt. You're right.
> Starting with the length of catchange_txns should be sufficient.
>

I've attached an updated patch.

While trying this idea, I noticed there is no API to get the length of
dlist, as we discussed offlist. Alternative idea was to use List
(T_XidList) but I'm not sure it's a great idea since deleting an xid
from the list is O(N), we need to implement list_delete_xid, and we
need to make sure allocating list node in the reorder buffer context.
So in the patch, I added a variable, catchange_ntxns, to keep track of
the length of the dlist. Please review it.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/


master-v3-0001-Add-catalog-modifying-transactions-to-logical-dec.patch
Description: Binary data

Re: Support TRUNCATE triggers on foreign tables

2022-07-11 Thread Fujii Masao





On 2022/07/08 17:13, Ian Lawrence Barwick wrote:

If we want to add such prevention, we will need similar checks for
INSERT/DELETE/UPDATE not only TRUNCATE. However, I think such fix is independent
from this and it can be proposed as another patch.


Ah OK, makes sense from that point of view. Thanks for the clarification!


So I pushed the v2 patch that Yugo-san posted. Thanks!

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Re: making relfilenodes 56 bits

2022-07-11 Thread Andres Freund

On 2022-07-11 16:11:53 -0400, Robert Haas wrote:
> On Mon, Jul 11, 2022 at 3:34 PM Andres Freund  wrote:
> > Seems pretty simple to do. Have write_relmapper_file() write to a .tmp file
> > first (likely adding O_TRUNC to flags), use durable_rename() to rename it 
> > into
> > place.  The tempfile should probably be written out before the XLogInsert(),
> > the durable_rename() after, although I think it'd also be correct to more
> > closely approximate the current sequence.
> 
> Something like this?

Yea. I've not looked carefully, but on a quick skim it looks good.

> I chose not to use durable_rename() here, because that allowed me to
> do more of the work before starting the critical section, and it's
> probably slightly more efficient this way, too. That could be changed,
> though, if you really want to stick with durable_rename().

I guess I'm not enthused in duplicating the necessary knowledge in evermore
places. We've forgotten one of the magic incantations in the past, and needing
to find all the places that need to be patched is a bit bothersome.

Perhaps we could add extract helpers out of durable_rename()?

OTOH, I don't really see what we gain by keeping things out of the critical
section? It does seem good to have the temp-file creation/truncation and write
separately, but after that I don't think it's worth much to avoid a
PANIC. What legitimate issue does it avoid?

Greetings,

Andres Freund

Re: AIX support - alignment issues

2022-07-11 Thread Thomas Munro

On Tue, Jul 12, 2022 at 10:30 AM Tom Lane  wrote:
> Thomas Munro  writes:
> > Here's a patch to remove all of these.
>
> Looks sane by eyeball --- I didn't grep for other references, though.

Thanks, pushed.

> > I didn't originally suggest that because of some kind of (mostly
> > vicarious) nostalgia.  I wonder if we should allow ourselves a
> > paragraph where we remember these systems.  I personally think it's
> > one of the amazing things about this project.  Here's what I came up
> > with, but I'm sure there are more.
>
> PlayStation 2 [1]?  Although I suppose that falls under MIPS,
> which probably means we could still run on it, if you can find one.

Yeah.  PS had MIPS, then PowerPC (Cell), and currently AMD
(interestingly they also run a modified FreeBSD kernel, but you can't
really get at it...).  Sega Dreamcast had SH4.

I added one more: Tru64 (but I didn't bother to list Digital UNIX or
OSF/1, not sure if software historians consider those different OSes
or just rebrands...).  Patches to improve this little paragraph
welcome.  Pushed.

Re: Making CallContext and InlineCodeBlock less special-case-y

2022-07-11 Thread Tom Lane

I wrote:
> Peter Eisentraut  writes:
>> On 10.07.22 01:50, Tom Lane wrote:
>>> As committed, gen_node_support.pl excludes CallContext and InlineCodeBlock
>>> from getting unneeded support functions via some very ad-hoc code.

>> Couldn't we just enable those support functions?  I think they were just 
>> excluded because they didn't have any before and nobody bothered to make 
>> any.

> Well, we could I suppose, but that path leads to a lot of dead code in
> backend/nodes/ --- obviously these two alone are negligible, but I want
> a story other than "it's a hack" for execnodes.h and the other files
> we exclude from generation of support code.

Here's a proposed patch for this bit.  Again, whether these two
node types have unnecessary support functions is not the point ---
obviously we could afford to waste that much space.  Rather, what
I'm after is to have a more explainable and flexible way of dealing
with the file-level exclusions applied to a lot of other node types.
This patch doesn't make any change in the script's output now, but
it gives us flexibility for the future.

regards, tom lane

diff --git a/src/backend/nodes/gen_node_support.pl b/src/backend/nodes/gen_node_support.pl
index 2c06609726..056530a657 100644
--- a/src/backend/nodes/gen_node_support.pl
+++ b/src/backend/nodes/gen_node_support.pl
@@ -50,6 +50,8 @@ my @no_equal;
 my @no_read;
 # node types we don't want read/write support for
 my @no_read_write;
+# node types we don't want any support functions for, just node tags
+my @nodetag_only;
 
 # types that are copied by straight assignment
 my @scalar_types = qw(
@@ -95,7 +97,10 @@ push @scalar_types, qw(EquivalenceClass* EquivalenceMember*);
 # currently not required.
 push @scalar_types, qw(QualCost);
 
-# Nodes from these input files don't need support functions, just node tags.
+# Nodes from these input files are automatically treated as nodetag_only.
+# In the future we might add explicit pg_node_attr labeling to some of these
+# files and remove them from this list, but for now this is the path of least
+# resistance.
 my @nodetag_only_files = qw(
   nodes/execnodes.h
   access/amapi.h
@@ -113,10 +118,8 @@ my @nodetag_only_files = qw(
 
 # XXX various things we are not publishing right now to stay level
 # with the manual system
-push @no_copy,  qw(CallContext InlineCodeBlock);
-push @no_equal, qw(CallContext InlineCodeBlock);
 push @no_read_write,
-  qw(AccessPriv AlterTableCmd CallContext CreateOpClassItem FunctionParameter InferClause InlineCodeBlock ObjectWithArgs OnConflictClause PartitionCmd RoleSpec VacuumRelation);
+  qw(AccessPriv AlterTableCmd CreateOpClassItem FunctionParameter InferClause ObjectWithArgs OnConflictClause PartitionCmd RoleSpec VacuumRelation);
 push @no_read, qw(A_ArrayExpr A_Indices A_Indirection AlterStatsStmt
   CollateClause ColumnDef ColumnRef CreateForeignTableStmt CreateStatsStmt
   CreateStmt FuncCall ImportForeignSchemaStmt IndexElem IndexStmt
@@ -254,6 +257,10 @@ foreach my $infile (@ARGV)
 		{
 			push @no_read, $in_struct;
 		}
+		elsif ($attr eq 'nodetag_only')
+		{
+			push @nodetag_only, $in_struct;
+		}
 		elsif ($attr eq 'special_read_write')
 		{
 			# This attribute is called
@@ -314,13 +321,9 @@ foreach my $infile (@ARGV)
 	$node_type_info{$in_struct}->{field_types} = \%ft;
 	$node_type_info{$in_struct}->{field_attrs} = \%fa;
 
-	# Exclude nodes in nodetag_only_files from support.
-	if (elem $infile, @nodetag_only_files)
-	{
-		push @no_copy,   $in_struct;
-		push @no_equal,  $in_struct;
-		push @no_read_write, $in_struct;
-	}
+	# Propagate nodetag_only marking from files to nodes
+	push @nodetag_only, $in_struct
+	  if (elem $infile, @nodetag_only_files);
 
 	# Propagate some node attributes from supertypes
 	if ($supertype)
@@ -515,6 +518,7 @@ print $eff $node_includes;
 foreach my $n (@node_types)
 {
 	next if elem $n, @abstract_types;
+	next if elem $n, @nodetag_only;
 	my $struct_no_copy  = (elem $n, @no_copy);
 	my $struct_no_equal = (elem $n, @no_equal);
 	next if $struct_no_copy && $struct_no_equal;
@@ -706,6 +710,7 @@ print $rff $node_includes;
 foreach my $n (@node_types)
 {
 	next if elem $n, @abstract_types;
+	next if elem $n, @nodetag_only;
 	next if elem $n, @no_read_write;
 
 	# XXX For now, skip all "Stmt"s except that ones that were there before.
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index adc549002a..c8ed4e0552 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -61,12 +61,17 @@ typedef enum NodeTag
  *
  * - no_read: Does not support nodeRead() at all.
  *
+ * - nodetag_only: Does not support copyObject(), equal(), outNode(),
+ *   or nodeRead().
+ *
  * - special_read_write: Has special treatment in outNode() and nodeRead().
  *
  * Node types can be supertypes of other types whether or not they are marked
  * abstract: if a node struct ap

Re: automatically generating node support functions

2022-07-11 Thread Andres Freund

On 2022-07-11 18:39:44 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > I'm not entirely sure how well either the existing or the sketch above works
> > when doing a VPATH build using tarball sources, and updating the files.
> 
> Seems like an awful lot of effort to avoid having multiple copies
> of the file list.  I think we should just do what I sketched earlier,
> ie put the master list into gen_node_support.pl and have it cross-check
> that against its command line.  If the meson system can avoid having
> its own copy of the list, great; but I don't feel like we have to make
> that happen for the makefiles or Solution.pm.

WFM.

Re: automatically generating node support functions

2022-07-11 Thread Tom Lane

Andres Freund  writes:
> I'm not entirely sure how well either the existing or the sketch above works
> when doing a VPATH build using tarball sources, and updating the files.

Seems like an awful lot of effort to avoid having multiple copies
of the file list.  I think we should just do what I sketched earlier,
ie put the master list into gen_node_support.pl and have it cross-check
that against its command line.  If the meson system can avoid having
its own copy of the list, great; but I don't feel like we have to make
that happen for the makefiles or Solution.pm.

regards, tom lane

Re: AIX support - alignment issues

2022-07-11 Thread Tom Lane

Thomas Munro  writes:
> Here's a patch to remove all of these.

Looks sane by eyeball --- I didn't grep for other references, though.

> I didn't originally suggest that because of some kind of (mostly
> vicarious) nostalgia.  I wonder if we should allow ourselves a
> paragraph where we remember these systems.  I personally think it's
> one of the amazing things about this project.  Here's what I came up
> with, but I'm sure there are more.

PlayStation 2 [1]?  Although I suppose that falls under MIPS,
which probably means we could still run on it, if you can find one.

regards, tom lane

[1] 
https://www.postgresql.org/message-id/flat/05e101c1834a%24e398b920%24f90e10ac%40toronto.redhat.com

Re: automatically generating node support functions

2022-07-11 Thread Andres Freund

Hi,

On 2022-07-11 18:09:15 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > I don't think it's worth worrying about this not working reliably for non
> > --enable-depend builds, there's a lot more broken than this.
>
> Well, *I* care about that, and I won't stand for making the
> non-enable-depend case significantly more broken than it is now.
>
> In particular, what you're proposing would mean that "make clean"
> followed by rebuild wouldn't be sufficient to update everything
> anymore; you'd have to resort to maintainer-clean or "git clean -dfx"
> after touching any node definition file, else gen_node_support.pl
> would not get re-run.  Up with that I will not put.

I'm not sure it'd have to mean that, but we could just implement the
dependency stuff independent of the existing autodepend logic. Something like:

# ensure that dependencies of
-include gen_node_support.pl.deps
node-support-stamp: gen_node_support.pl
$(PERL) --deps $^.deps $^

I guess we'd have to distribute gen_node_support.pl.deps to make this work in
tarball builds - which is probably fine? Not really different than including
stamp files.

I'm not entirely sure how well either the existing or the sketch above works
when doing a VPATH build using tarball sources, and updating the files.

Greetings,

Andres Freund

Re: AIX support - alignment issues

2022-07-11 Thread Thomas Munro

On Tue, Jul 12, 2022 at 7:24 AM Tom Lane  wrote:
> Thomas Munro  writes:
> > It's funny to think that you probably could run modern PostgreSQL on
> > the Sun 3 boxes the project started on in 1986 (based on clues from
> > the papers in our history section) if you put NetBSD on them, but
> > you'd probably need to cross compile due to lack of RAM.
>
> Yeah.  I'm wondering if that sh3el package was cross-compiled,
> and if so whether it was just part of a mass package build rather
> than something somebody was specifically interested in.  You'd
> have to be a glutton for pain to want to do actual work with PG
> on the kind of SH3 hardware that seems to be available.

/me pictures Stark wheeling a real Sun 3 into a conference room

Yeah, we can always consider putting SuperH back if someone showed up
to maintain/test it.  That seems unlikely, but apparently there's an
open source silicon project based on this ISA, so maybe a fast one
isn't impossible...

Here's a patch to remove all of these.

I didn't originally suggest that because of some kind of (mostly
vicarious) nostalgia.  I wonder if we should allow ourselves a
paragraph where we remember these systems.  I personally think it's
one of the amazing things about this project.  Here's what I came up
with, but I'm sure there are more.
From 9c20f8b70632a6a79333c835bcf3f4c7d427f1cf Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Tue, 12 Jul 2022 08:04:46 +1200
Subject: [PATCH 1/2] Further tidy-up for supported CPUs.

Further to commit 92d70b77, let's drop the code we carry for the
following untested architectures: M68K, M88K, M32R, SuperH.  We have no
idea if anything actually works there, and surely as vintage hardware
and microcontrollers they would be underpowered for any modern purpose.

We could always consider re-adding SuperH based on modern evidence of
usage and build farm support, if someone shows up to provide it.

While here, SPARC is usually written in all caps.

Discussion: https://postgr.es/m/959917.1657522169%40sss.pgh.pa.us
---
 doc/src/sgml/installation.sgml|   5 +-
 src/backend/storage/lmgr/s_lock.c |  65 ---
 src/include/storage/s_lock.h  | 102 --
 3 files changed, 2 insertions(+), 170 deletions(-)

diff --git a/doc/src/sgml/installation.sgml b/doc/src/sgml/installation.sgml
index dec13eaa93..381b728e08 100644
--- a/doc/src/sgml/installation.sgml
+++ b/doc/src/sgml/installation.sgml
@@ -2125,11 +2125,10 @@ export MANPATH
 
   
In general, PostgreSQL can be expected to work on
-   these CPU architectures: x86, PowerPC, S/390, Sparc, ARM, MIPS, RISC-V,
+   these CPU architectures: x86, PowerPC, S/390, SPARC, ARM, MIPS, RISC-V,
and PA-RISC, including
big-endian, little-endian, 32-bit, and 64-bit variants where applicable.
-   Code support exists for M68K, M88K, M32R, and SuperH, but these
-   architectures are not known to have been tested recently.  It is often
+   It is often
possible to build on an unsupported CPU type by configuring with
--disable-spinlocks, but performance will be poor.
   
diff --git a/src/backend/storage/lmgr/s_lock.c b/src/backend/storage/lmgr/s_lock.c
index baea773a02..2a658ff594 100644
--- a/src/backend/storage/lmgr/s_lock.c
+++ b/src/backend/storage/lmgr/s_lock.c
@@ -220,71 +220,6 @@ update_spins_per_delay(int shared_spins_per_delay)
 }
 
 
-/*
- * Various TAS implementations that cannot live in s_lock.h as no inline
- * definition exists (yet).
- * In the future, get rid of tas.[cso] and fold it into this file.
- *
- * If you change something here, you will likely need to modify s_lock.h too,
- * because the definitions for these are split between this file and s_lock.h.
- */
-
-
-#ifdef HAVE_SPINLOCKS			/* skip spinlocks if requested */
-
-
-#if defined(__GNUC__)
-
-/*
- * All the gcc flavors that are not inlined
- */
-
-
-/*
- * Note: all the if-tests here probably ought to be testing gcc version
- * rather than platform, but I don't have adequate info to know what to
- * write.  Ideally we'd flush all this in favor of the inline version.
- */
-#if defined(__m68k__) && !defined(__linux__)
-/* really means: extern int tas(slock_t* **lock); */
-static void
-tas_dummy()
-{
-	__asm__ __volatile__(
-#if (defined(__NetBSD__) || defined(__OpenBSD__)) && defined(__ELF__)
-/* no underscore for label and % for registers */
-		 "\
-.global		tas \n\
-tas:			\n\
-			movel	%sp@(0x4),%a0	\n\
-			tas 	%a0@		\n\
-			beq 	_success	\n\
-			moveq	#-128,%d0	\n\
-			rts \n\
-_success:		\n\
-			moveq	#0,%d0		\n\
-			rts \n"
-#else
-		 "\
-.global		_tas\n\
-_tas:			\n\
-			movel	sp@(0x4),a0	\n\
-			tas 	a0@			\n\
-			beq 	_success	\n\
-			moveq 	#-128,d0	\n\
-			rts	\n\
-_success:		\n\
-			moveq 	#0,d0		\n\
-			rts	\n"
-#endif			/* (__NetBSD__ || __OpenBSD__) && __ELF__ */
-		);
-}
-#endif			/* __m68k__ && !__linux__ */
-#endif			/* not __GNUC__ */
-#endif			/* HAVE_SPINLOCKS */
-
-
-
 /

Re: automatically generating node support functions

2022-07-11 Thread Tom Lane

Andres Freund  writes:
> I don't think it's worth worrying about this not working reliably for non
> --enable-depend builds, there's a lot more broken than this.

Well, *I* care about that, and I won't stand for making the
non-enable-depend case significantly more broken than it is now.
In particular, what you're proposing would mean that "make clean"
followed by rebuild wouldn't be sufficient to update everything
anymore; you'd have to resort to maintainer-clean or "git clean -dfx"
after touching any node definition file, else gen_node_support.pl
would not get re-run.  Up with that I will not put.

regards, tom lane

Re: Issue with pg_stat_subscription_stats

2022-07-11 Thread Andres Freund

On 2022-07-07 10:50:27 +0900, Masahiko Sawada wrote:
> Right, it's more robust. I've updated the patch accordingly.

Do others have thoughts about backpatching this to 15 or not?

Re: automatically generating node support functions

2022-07-11 Thread Andres Freund

Hi,

On 2022-07-11 16:38:05 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2022-07-11 15:54:22 -0400, Tom Lane wrote:
> >> We can't simply move the file list into gen_node_support.pl, because
> >> (a) the build system has to know about the dependencies involved
> 
> > Meson has builtin support for tools like gen_node_support.pl reporting which
> > files they've read and then to use those as dependencies. It'd not be a lot 
> > of
> > effort to open-code that with make either.
> 
> If you want to provide code for that, sure, but I don't know how to do it.

It'd basically be something like a --deps option providing a path to a file
(e.g. .deps/nodetags.Po) where the script would emit something roughly
equivalent to

path/to/nodetags.h: path/to/nodes/nodes.h
path/to/nodetags.h: path/to/nodes/primnodes.h
...
path/to/readfuncs.c: path/to/nodetags.h

It might or might not make sense to output this as one rule instead of
multiple ones.

I think our existing dependency support would do the rest.

We'd still need a dependency on node-support-stamp (or nodetags.h or ...), to
trigger the first invocation of gen_node_support.pl.

I don't think it's worth worrying about this not working reliably for non
--enable-depend builds, there's a lot more broken than this. But it might be a
bit annoying to deal with either a) creating the .deps directory even without
--enable-depend, or b) specifying --deps only optionally.

I can give it a go if this doesn't sound insane.

Greetings,

Andres Freund

Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work

2022-07-11 Thread Nathan Bossart

On Mon, Jul 11, 2022 at 01:25:33PM -0700, Andres Freund wrote:
> On 2022-04-08 13:18:57 -0700, Nathan Bossart wrote:
>> @@ -1035,32 +1036,9 @@ ParseConfigDirectory(const char *includedir,
>>  
>>  join_path_components(filename, directory, de->d_name);
>>  canonicalize_path(filename);
>> -if (stat(filename, &st) == 0)
>> +de_type = get_dirent_type(filename, de, true, elevel);
>> +if (de_type == PGFILETYPE_ERROR)
>>  {
>> -if (!S_ISDIR(st.st_mode))
>> -{
>> -/* Add file to array, increasing its size in 
>> blocks of 32 */
>> -if (num_filenames >= size_filenames)
>> -{
>> -size_filenames += 32;
>> -filenames = (char **) 
>> repalloc(filenames,
>> -
>> size_filenames * sizeof(char *));
>> -}
>> -filenames[num_filenames] = pstrdup(filename);
>> -num_filenames++;
>> -}
>> -}
>> -else
>> -{
>> -/*
>> - * stat does not care about permissions, so the most 
>> likely reason
>> - * a file can't be accessed now is if it was removed 
>> between the
>> - * directory listing and now.
>> - */
>> -ereport(elevel,
>> -(errcode_for_file_access(),
>> - errmsg("could not stat file \"%s\": 
>> %m",
>> -filename)));
>>  record_config_file_error(psprintf("could not stat file 
>> \"%s\"",
>>  
>>   filename),
>>   
>> calling_file, calling_lineno,
>> @@ -1068,6 +1046,18 @@ ParseConfigDirectory(const char *includedir,
>>  status = false;
>>  goto cleanup;
>>  }
>> +else if (de_type != PGFILETYPE_DIR)
>> +{
>> +/* Add file to array, increasing its size in blocks of 
>> 32 */
>> +if (num_filenames >= size_filenames)
>> +{
>> +size_filenames += 32;
>> +filenames = (char **) repalloc(filenames,
>> +
>>size_filenames * sizeof(char *));
>> +}
>> +filenames[num_filenames] = pstrdup(filename);
>> +num_filenames++;
>> +}
>>  }
>>  
>>  if (num_filenames > 0)
> 
> Seems like the diff would be easier to read if it didn't move code around as
> much?

I opted to reorganize in order save an indent and simplify the conditions a
bit.  The alternative is something like this:

if (de_type != PGFILETYPE_ERROR)
{
if (de_type != PGTFILETYPE_DIR)
{
...
}
}
else
{
...
}

I don't feel strongly one way or another, and I'll change it if you think
it's worth it to simplify the diff.

> Looks pretty reasonable, I'd be happy to commit it, I think.

Appreciate it.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Re: automatically generating node support functions

2022-07-11 Thread Tom Lane

I wrote:
> Andres Freund  writes:
>> Additionally, I think we've had to add tags to the enum in minor releases
>> before and I'm afraid this now would end up looking even more awkward?

> Peter and I already had a discussion about that upthread --- we figured
> that if there's a way to manually assign a nodetag's number, you could use
> that option when you have to add a tag in a stable branch.  We didn't
> actually build out that idea, but I can go do that, if we can solve the
> more fundamental problem of keeping the autogenerated numbers stable.

> One issue with that idea, of course, is that you have to remember to do
> it like that when back-patching a node addition.  Ideally there'd be
> something that'd carp if the last autogenerated tag moves in a stable
> branch, but I'm not very sure where to put that.

One way to do it is to provide logic in gen_node_support.pl to check
that, and activate that logic only in back branches.  If we make that
part of the branch-making procedure, we'd not forget to do it.

Proposed patch attached.

regards, tom lane

diff --git a/src/backend/nodes/gen_node_support.pl b/src/backend/nodes/gen_node_support.pl
index 2c06609726..2c6766f537 100644
--- a/src/backend/nodes/gen_node_support.pl
+++ b/src/backend/nodes/gen_node_support.pl
@@ -34,6 +34,20 @@ sub elem
 	return grep { $_ eq $x } @_;
 }
 
+
+# ARM ABI STABILITY CHECK HERE:
+#
+# In stable branches, set $last_nodetag to the name of the last node type
+# that should receive an auto-generated nodetag number, and $last_nodetag_no
+# to its number.  The script will then complain if those values don't match
+# reality, providing a cross-check that we haven't broken ABI by adding or
+# removing nodetags.
+# In HEAD, these variables should be left undef, since we don't promise
+# ABI stability during development.
+
+my $last_nodetag= undef;
+my $last_nodetag_no = undef;
+
 # output file names
 my @output_files;
 
@@ -88,6 +102,9 @@ my @custom_copy_equal;
 # Similarly for custom read/write implementations.
 my @custom_read_write;
 
+# Track node types with manually assigned NodeTag numbers.
+my %manual_nodetag_number;
+
 # EquivalenceClasses are never moved, so just shallow-copy the pointer
 push @scalar_types, qw(EquivalenceClass* EquivalenceMember*);
 
@@ -267,6 +284,10 @@ foreach my $infile (@ARGV)
 			# does in fact exist.
 			push @no_read_write, $in_struct;
 		}
+		elsif ($attr =~ /^nodetag_number\((\d+)\)$/)
+		{
+			$manual_nodetag_number{$in_struct} = $1;
+		}
 		else
 		{
 			die
@@ -472,14 +493,31 @@ open my $nt, '>', 'nodetags.h' . $tmpext or die $!;
 
 printf $nt $header_comment, 'nodetags.h';
 
-my $i = 1;
+my $tagno= 0;
+my $last_tag = undef;
 foreach my $n (@node_types, @extra_tags)
 {
 	next if elem $n, @abstract_types;
-	print $nt "\tT_${n} = $i,\n";
-	$i++;
+	if (defined $manual_nodetag_number{$n})
+	{
+		# do not change $tagno or $last_tag
+		print $nt "\tT_${n} = $manual_nodetag_number{$n},\n";
+	}
+	else
+	{
+		$tagno++;
+		$last_tag = $n;
+		print $nt "\tT_${n} = $tagno,\n";
+	}
 }
 
+# verify that last auto-assigned nodetag stays stable
+die "ABI stability break: last nodetag is $last_tag not $last_nodetag\n"
+  if (defined $last_nodetag && $last_nodetag ne $last_tag);
+die
+  "ABI stability break: last nodetag number is $tagno not $last_nodetag_no\n"
+  if (defined $last_nodetag_no && $last_nodetag_no ne $tagno);
+
 close $nt;
 
 
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index adc549002a..e0b336cd28 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -63,6 +63,11 @@ typedef enum NodeTag
  *
  * - special_read_write: Has special treatment in outNode() and nodeRead().
  *
+ * - nodetag_number(VALUE): assign the specified nodetag number instead of
+ *   an auto-generated number.  Typically this would only be used in stable
+ *   branches, to give a newly-added node type a number without breaking ABI
+ *   by changing the numbers of existing node types.
+ *
  * Node types can be supertypes of other types whether or not they are marked
  * abstract: if a node struct appears as the first field of another struct
  * type, then it is the supertype of that type.  The no_copy, no_equal, and
diff --git a/src/tools/RELEASE_CHANGES b/src/tools/RELEASE_CHANGES
index e8de724fcd..73b02fa2a4 100644
--- a/src/tools/RELEASE_CHANGES
+++ b/src/tools/RELEASE_CHANGES
@@ -107,6 +107,10 @@ Starting a New Development Cycle
   placeholder), "git rm" the previous one, and update release.sgml and
   filelist.sgml to match.
 
+* In the newly-made branch, change src/backend/nodes/gen_node_support.pl
+  to enforce ABI stability of the NodeTag list (see "ARM ABI STABILITY
+  CHECK HERE" therein).
+
 * Notify the private committers email list, to ensure all committers
   are aware of the new branch even if they're not paying close attention
   to pgsql-hackers.

Re: Weird behaviour with binary copy, arrays and column count

2022-07-11 Thread Greg Stark

On Fri, 8 Jul 2022 at 13:09, James Vanns  wrote:
>
> It does seem to smell of an alignment, padding, buffer overrun, parsing kind 
> of error.

It does I think you may need to bust out a debugger and see what
array_recv is actually seeing...

-- 
greg

Re: automatically generating node support functions

2022-07-11 Thread Tom Lane

Andres Freund  writes:
> On 2022-07-11 15:54:22 -0400, Tom Lane wrote:
>> We can't simply move the file list into gen_node_support.pl, because
>> (a) the build system has to know about the dependencies involved

> Meson has builtin support for tools like gen_node_support.pl reporting which
> files they've read and then to use those as dependencies. It'd not be a lot of
> effort to open-code that with make either.

If you want to provide code for that, sure, but I don't know how to do it.

>> (b) gen_node_support.pl wouldn't know what to do in VPATH situations.

> We could easily add a --include-path argument or such. That'd be trivial to
> set for all of the build solutions.

True.

regards, tom lane

Re: automatically generating node support functions

2022-07-11 Thread Tom Lane

Andres Freund  writes:
> On 2022-07-11 16:17:28 -0400, Robert Haas wrote:
>> Sorry if I'm being dense, but why do we have to duplicate the list of
>> files instead of having gen_node_support.pl just sort whatever list
>> the build system provides to it?

> Because right now there's two buildsystems already (look at
> Solution.pm). Looks like we'll briefly have three, then two again.

There are two things we need: (1) be sure that the build system knows
about all the files of interest, and (2) process them in the correct
order, which is *not* alphabetical.  "Just sort" won't achieve either.

regards, tom lane

Re: automatically generating node support functions

2022-07-11 Thread Andres Freund

Hi,

On 2022-07-11 16:17:28 -0400, Robert Haas wrote:
> On Mon, Jul 11, 2022 at 3:54 PM Tom Lane  wrote:
> > We can't simply move the file list into gen_node_support.pl, because
> > (a) the build system has to know about the dependencies involved, and
> > (b) gen_node_support.pl wouldn't know what to do in VPATH situations.
> > However, we could have gen_node_support.pl contain a canonical list
> > of the files it expects to be handed, and make it bitch if its
> > arguments don't match that.
> 
> Sorry if I'm being dense, but why do we have to duplicate the list of
> files instead of having gen_node_support.pl just sort whatever list
> the build system provides to it?

Because right now there's two buildsystems already (look at
Solution.pm). Looks like we'll briefly have three, then two again.

Greetings,

Andres Freund

Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work

2022-07-11 Thread Andres Freund

Hi,

On 2022-04-08 13:18:57 -0700, Nathan Bossart wrote:
> @@ -1035,32 +1036,9 @@ ParseConfigDirectory(const char *includedir,
>  
>   join_path_components(filename, directory, de->d_name);
>   canonicalize_path(filename);
> - if (stat(filename, &st) == 0)
> + de_type = get_dirent_type(filename, de, true, elevel);
> + if (de_type == PGFILETYPE_ERROR)
>   {
> - if (!S_ISDIR(st.st_mode))
> - {
> - /* Add file to array, increasing its size in 
> blocks of 32 */
> - if (num_filenames >= size_filenames)
> - {
> - size_filenames += 32;
> - filenames = (char **) 
> repalloc(filenames,
> - 
> size_filenames * sizeof(char *));
> - }
> - filenames[num_filenames] = pstrdup(filename);
> - num_filenames++;
> - }
> - }
> - else
> - {
> - /*
> -  * stat does not care about permissions, so the most 
> likely reason
> -  * a file can't be accessed now is if it was removed 
> between the
> -  * directory listing and now.
> -  */
> - ereport(elevel,
> - (errcode_for_file_access(),
> -  errmsg("could not stat file \"%s\": 
> %m",
> - filename)));
>   record_config_file_error(psprintf("could not stat file 
> \"%s\"",
>   
>   filename),
>
> calling_file, calling_lineno,
> @@ -1068,6 +1046,18 @@ ParseConfigDirectory(const char *includedir,
>   status = false;
>   goto cleanup;
>   }
> + else if (de_type != PGFILETYPE_DIR)
> + {
> + /* Add file to array, increasing its size in blocks of 
> 32 */
> + if (num_filenames >= size_filenames)
> + {
> + size_filenames += 32;
> + filenames = (char **) repalloc(filenames,
> + 
>size_filenames * sizeof(char *));
> + }
> + filenames[num_filenames] = pstrdup(filename);
> + num_filenames++;
> + }
>   }
>  
>   if (num_filenames > 0)

Seems like the diff would be easier to read if it didn't move code around as
much?

Looks pretty reasonable, I'd be happy to commit it, I think.

Greetings,

Andres Freund

Re: automatically generating node support functions

2022-07-11 Thread Andres Freund

Hi,

On 2022-07-11 15:54:22 -0400, Tom Lane wrote:
> Robert Haas  writes:
> > On Mon, Jul 11, 2022 at 1:57 PM Tom Lane  wrote:
> >> More generally, I'm having second thoughts about the wisdom of
> >> auto-generating the NodeTag enum at all.  With the current setup,
> >> I am absolutely petrified about the risk of silent ABI breakage
> >> thanks to the enum order changing.
> 
> > I think this is a valid concern, but having it be automatically
> > generated is awfully handy, so I think it would be nice to find some
> > way of preserving that.
> 
> Agreed.  The fundamental problem seems to be that each build toolchain
> has its own source of truth about the file processing order, but we now
> see that there had better be only one.  We could make the sole source
> of truth about that be gen_node_support.pl itself, I think.
> 
> We can't simply move the file list into gen_node_support.pl, because

> (a) the build system has to know about the dependencies involved

Meson has builtin support for tools like gen_node_support.pl reporting which
files they've read and then to use those as dependencies. It'd not be a lot of
effort to open-code that with make either.

Doesn't look like we have dependency handling in Solution.pm?

> (b) gen_node_support.pl wouldn't know what to do in VPATH situations.

We could easily add a --include-path argument or such. That'd be trivial to
set for all of the build solutions.

FWIW, for meson I already needed to add an option to specify the location of
output files (since scripts are called from the root of the build directory).

Greetings,

Andres Freund

Re: automatically generating node support functions

2022-07-11 Thread Robert Haas

On Mon, Jul 11, 2022 at 3:54 PM Tom Lane  wrote:
> We can't simply move the file list into gen_node_support.pl, because
> (a) the build system has to know about the dependencies involved, and
> (b) gen_node_support.pl wouldn't know what to do in VPATH situations.
> However, we could have gen_node_support.pl contain a canonical list
> of the files it expects to be handed, and make it bitch if its
> arguments don't match that.

Sorry if I'm being dense, but why do we have to duplicate the list of
files instead of having gen_node_support.pl just sort whatever list
the build system provides to it?

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: making relfilenodes 56 bits

2022-07-11 Thread Robert Haas

On Mon, Jul 11, 2022 at 3:34 PM Andres Freund  wrote:
> Seems pretty simple to do. Have write_relmapper_file() write to a .tmp file
> first (likely adding O_TRUNC to flags), use durable_rename() to rename it into
> place.  The tempfile should probably be written out before the XLogInsert(),
> the durable_rename() after, although I think it'd also be correct to more
> closely approximate the current sequence.

Something like this?

I chose not to use durable_rename() here, because that allowed me to
do more of the work before starting the critical section, and it's
probably slightly more efficient this way, too. That could be changed,
though, if you really want to stick with durable_rename().

I haven't done anything about actually making the file variable-length
here, either, which I think is what we would want to do. If this seems
more or less all right, I can work on that next.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

v1-0001-rough-draft-of-removing-relmap-size-restriction.patch
Description: Binary data

Re: automatically generating node support functions

2022-07-11 Thread Tom Lane

Andres Freund  writes:
> Additionally, I think we've had to add tags to the enum in minor releases
> before and I'm afraid this now would end up looking even more awkward?

Peter and I already had a discussion about that upthread --- we figured
that if there's a way to manually assign a nodetag's number, you could use
that option when you have to add a tag in a stable branch.  We didn't
actually build out that idea, but I can go do that, if we can solve the
more fundamental problem of keeping the autogenerated numbers stable.

One issue with that idea, of course, is that you have to remember to do
it like that when back-patching a node addition.  Ideally there'd be
something that'd carp if the last autogenerated tag moves in a stable
branch, but I'm not very sure where to put that.

regards, tom lane

Re: [PATCH] New [relation] option engine

2022-07-11 Thread Jeff Davis

On Mon, 2022-02-14 at 00:43 +0300, Nikolay Shaplov wrote:
> For index access methods "amoptions" member function that preformed
> option 
> processing, were replaced with "amreloptspecset" member function that
> provided 
> an SpecSet for reloptions for this AM, so caller can trigger option
> processing 
> himself.

What about table access methods? There have been a couple attempts to
allow custom reloptions for table AMs. Does this patch help that use
case?

Regards,
Jeff Davis

Re: automatically generating node support functions

2022-07-11 Thread Tom Lane

Robert Haas  writes:
> On Mon, Jul 11, 2022 at 1:57 PM Tom Lane  wrote:
>> More generally, I'm having second thoughts about the wisdom of
>> auto-generating the NodeTag enum at all.  With the current setup,
>> I am absolutely petrified about the risk of silent ABI breakage
>> thanks to the enum order changing.

> I think this is a valid concern, but having it be automatically
> generated is awfully handy, so I think it would be nice to find some
> way of preserving that.

Agreed.  The fundamental problem seems to be that each build toolchain
has its own source of truth about the file processing order, but we now
see that there had better be only one.  We could make the sole source
of truth about that be gen_node_support.pl itself, I think.

We can't simply move the file list into gen_node_support.pl, because
(a) the build system has to know about the dependencies involved, and
(b) gen_node_support.pl wouldn't know what to do in VPATH situations.
However, we could have gen_node_support.pl contain a canonical list
of the files it expects to be handed, and make it bitch if its
arguments don't match that.

That's ugly I admit, but the set of files of interest doesn't change
so often that maintaining one additional copy would be a big problem.

Anybody got a better idea?

regards, tom lane

Re: annoyance with .git-blame-ignore-revs

2022-07-11 Thread Peter Geoghegan

On Mon, Jul 11, 2022 at 12:36 PM Tom Lane  wrote:
> Only if we had to update all those copies all the time.  But
> I'm guessing we wouldn't need a branch's copy to be newer than
> the last pgindent run affecting that branch?

I wouldn't care very much if the file itself was empty in the
backbranches, and remained that way -- that would at least suppress
annoying error messages on those branches (from my text editor's git
blame feature).

You might as well have the relevant commits when you backpatch, but
that's kind of not the point. At least to me. In any case I don't see
a need to maintain the file on the backbranches.

-- 
Peter Geoghegan

Re: annoyance with .git-blame-ignore-revs

2022-07-11 Thread Daniel Gustafsson

> On 11 Jul 2022, at 21:35, Tom Lane  wrote:
> 
> Alvaro Herrera  writes:
>> A viable option would be to backpatch the addition of
>> .git-blame-ignore-revs to all live branches.  Would that bother anyone?
> 
> Only if we had to update all those copies all the time.  But
> I'm guessing we wouldn't need a branch's copy to be newer than
> the last pgindent run affecting that branch?

We shouldn't need that, if we do it would indicate we did cosmetic-only commits
in backbranches which IIUC isn't in line with project policy (or at least rare
to the point of not being a problem).

--
Daniel Gustafsson   https://vmware.com/

Re: annoyance with .git-blame-ignore-revs

2022-07-11 Thread Peter Geoghegan

On Mon, Jul 11, 2022 at 12:30 PM Alvaro Herrera  wrote:
> Anybody has any idea how to handle this better?
>
> A viable option would be to backpatch the addition of
> .git-blame-ignore-revs to all live branches.  Would that bother anyone?

+1. I was thinking of suggesting the same thing myself, for the same reasons.

This solution is a good start, but it does leave one remaining
problem: commits from before the introduction of
.git-blame-ignore-revs still won't have the file. There was actually a
patch for git that tried to address the problem directly, but it
didn't go anywhere. Maybe just backpatching the file is good enough.

-- 
Peter Geoghegan

Re: annoyance with .git-blame-ignore-revs

2022-07-11 Thread Tom Lane

Alvaro Herrera  writes:
> A viable option would be to backpatch the addition of
> .git-blame-ignore-revs to all live branches.  Would that bother anyone?

Only if we had to update all those copies all the time.  But
I'm guessing we wouldn't need a branch's copy to be newer than
the last pgindent run affecting that branch?

regards, tom lane

Re: making relfilenodes 56 bits

2022-07-11 Thread Andres Freund

Hi,

On 2022-07-11 15:08:57 -0400, Robert Haas wrote:
> On Mon, Jul 11, 2022 at 2:57 PM Andres Freund  wrote:
> > I don't know where we could fit a sanity check that connects to all 
> > databases
> > and detects duplicates across all the pg_class instances. Perhaps 
> > pg_amcheck?
> 
> Unless we're going to change the way CREATE DATABASE works, uniqueness
> across databases is not guaranteed.

You could likely address that by not flagging conflicts iff oid also matches?
Not sure if worth it, but ...

> > Maybe the easiest fix here would be to replace the file atomically. Then we
> > don't need this <= 512 byte stuff. These are done rarely enough that I don't
> > think the overhead of creating a separate file, fsyncing that, renaming,
> > fsyncing, would be a problem?
> 
> Anything we can reasonably do to reduce the number of places where
> we're relying on things being <= 512 bytes seems like a step in the
> right direction to me. It's very difficult to know whether such code
> is correct, or what the probability is that crossing the 512-byte
> boundary would break anything.

Seems pretty simple to do. Have write_relmapper_file() write to a .tmp file
first (likely adding O_TRUNC to flags), use durable_rename() to rename it into
place.  The tempfile should probably be written out before the XLogInsert(),
the durable_rename() after, although I think it'd also be correct to more
closely approximate the current sequence.

It's a lot more problematic to do this for the control file, because we can
end up updating that at a high frequency on standbys, due to minRecoveryPoint.

I have wondered about maintaining that in a dedicated file instead, and
perhaps even doing so on a primary.

Greetings,

Andres Freund

Re: annoyance with .git-blame-ignore-revs

2022-07-11 Thread Daniel Gustafsson

> On 11 Jul 2022, at 18:31, Alvaro Herrera  wrote:

> A viable option would be to backpatch the addition of
> .git-blame-ignore-revs to all live branches.  Would that bother anyone?

I wouldn't mind having it backpatched.

--
Daniel Gustafsson   https://vmware.com/

annoyance with .git-blame-ignore-revs

2022-07-11 Thread Alvaro Herrera

I like the ignore-revs file, but I run into a problem with my setup:
because I keep checkouts of all branches as worktrees, then all branches
share the same .git/config file.  So if I put the recommended stanza for
[blame] in it, then 'git blame' complains in branches older than 13,
since those don't have the file:

$ git blame configure
fatal: could not open object name list: .git-blame-ignore-revs

My first workaround was to add empty .git-blame-ignore-revs in all
checkouts.  This was moderately ok (shrug), until after a recent `tig`
upgrade the empty file started to show up in the history as an untracked
file.

So I'm now by the second workaround, which is to move the [blame]
section of config to a separate file, and use a [includeIf] sections
like this:

[includeIf "onbranch:master"]
path=config.blame.inc
[includeIf "onbranch:REL_1{4,5,6,7,8,9}_STABLE"]
path=config.blame.inc
[includeIf "onbranch:REL_2*_STABLE"]
path=config.blame.inc

This is quite ugly, and it doesn't work at all if I run `git blame` in a
worktree that I create for development purposes (I don't name those
after the upstream PG branch they're based on).

Anybody has any idea how to handle this better?

A viable option would be to backpatch the addition of
.git-blame-ignore-revs to all live branches.  Would that bother anyone?

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/

Re: AIX support - alignment issues

2022-07-11 Thread Tom Lane

Thomas Munro  writes:
> On Mon, Jul 11, 2022 at 6:49 PM Tom Lane  wrote:
>> SuperH might be twitching a bit less feebly than these three,
>> but it seems to be a legacy architecture as well.  Not much
>> has happened there since the early 2000's AFAICS.

> It looks like there's an sh3el package for PostgreSQL on NetBSD here,
> so whoever maintains that might be in touch:
> https://ftp.netbsd.org/pub/pkgsrc/current/pkgsrc/databases/postgresql14-server/index.html

Hm.  For a moment there I was feeling bad about recommending cutting
off a platform somebody still pays attention to ... but looking at
the relevant NetBSD mailing list archives makes it look like that
port is pretty darn moribund.

> It's funny to think that you probably could run modern PostgreSQL on
> the Sun 3 boxes the project started on in 1986 (based on clues from
> the papers in our history section) if you put NetBSD on them, but
> you'd probably need to cross compile due to lack of RAM.

Yeah.  I'm wondering if that sh3el package was cross-compiled,
and if so whether it was just part of a mass package build rather
than something somebody was specifically interested in.  You'd
have to be a glutton for pain to want to do actual work with PG
on the kind of SH3 hardware that seems to be available.

regards, tom lane

Re: making relfilenodes 56 bits

2022-07-11 Thread Robert Haas

On Mon, Jul 11, 2022 at 2:57 PM Andres Freund  wrote:
> ISTM that we should redefine pg_class_tblspc_relfilenode_index to only cover
> relfilenode - afaics there's no real connection to the tablespace
> anymore. That'd a) reduce the size of the index b) guarantee uniqueness across
> tablespaces.

Sounds like a good idea.

> I don't know where we could fit a sanity check that connects to all databases
> and detects duplicates across all the pg_class instances. Perhaps pg_amcheck?

Unless we're going to change the way CREATE DATABASE works, uniqueness
across databases is not guaranteed.

> I think that's a very good idea. My concern around doing an XLogFlush() is
> that it could lead to a lot of tiny f[data]syncs(), because not much else
> needs to be written out. But the scheme you describe would likely lead the
> XLogFlush() flushing plenty other WAL writes out, addressing that.

Oh, interesting. I hadn't considered that angle.

> Maybe the easiest fix here would be to replace the file atomically. Then we
> don't need this <= 512 byte stuff. These are done rarely enough that I don't
> think the overhead of creating a separate file, fsyncing that, renaming,
> fsyncing, would be a problem?

Anything we can reasonably do to reduce the number of places where
we're relying on things being <= 512 bytes seems like a step in the
right direction to me. It's very difficult to know whether such code
is correct, or what the probability is that crossing the 512-byte
boundary would break anything.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: Avoid unecessary MemSet call (src/backend/utils/cache/relcache.c)

2022-07-11 Thread Ranier Vilela

Em qui., 7 de jul. de 2022 às 14:01, Ranier Vilela 
escreveu:

> Attached the v1 of your patch.
> I think that all is safe to switch MemSet by {0}.
>
Here the rebased patch v2, against latest head.

regards,
Ranier Vilela


v2-0001-WIP-Replace-MemSet-calls-with-struct-initialization.patch
Description: Binary data

Re: making relfilenodes 56 bits

2022-07-11 Thread Andres Freund

Hi,

On 2022-07-07 13:26:29 -0400, Robert Haas wrote:
> We're trying to create a system where the relfilenumber counter is
> always ahead of all the relfilenumbers used on disk, but the coupling
> between the relfilenumber-advancement machinery and the
> make-files-on-disk machinery is pretty loose, and so there is a risk
> that bugs could escape detection. Whatever we can do to increase the
> probability of noticing when things have gone wrong, and/or to notice
> it quicker, will be good.

ISTM that we should redefine pg_class_tblspc_relfilenode_index to only cover
relfilenode - afaics there's no real connection to the tablespace
anymore. That'd a) reduce the size of the index b) guarantee uniqueness across
tablespaces.

I don't know where we could fit a sanity check that connects to all databases
and detects duplicates across all the pg_class instances. Perhaps pg_amcheck?

It may be worth changing RelidByRelfilenumber() / its infrastructure to not
use reltablespace anymore.

> One thing we could think about doing here is try to stagger the xlog
> and the flush. When we've used VAR_RELNUMBER_PREFETCH/2
> relfilenumbers, log a record reserving VAR_RELNUMBER_PREFETCH from
> where we are now, and remember the LSN. When we've used up our entire
> previous allocation, XLogFlush() that record before allowing the
> additional values to be used. The bookkeeping would be a bit more
> complicated than currently, but I don't think it would be too bad. I'm
> not sure how much it would actually help, though, or whether we need
> it.

I think that's a very good idea. My concern around doing an XLogFlush() is
that it could lead to a lot of tiny f[data]syncs(), because not much else
needs to be written out. But the scheme you describe would likely lead the
XLogFlush() flushing plenty other WAL writes out, addressing that.

> If new relfilenumbers are being used up really quickly, then maybe
> the record won't get flushed into the background before we run out of
> available numbers anyway, and if they aren't, then maybe it doesn't
> matter. On the other hand, even one transaction commit between when
> the record is logged and when we run out of the previous allocation is
> enough to force a flush, at least with synchronous_commit=on, so maybe
> the chances of being able to piggyback on an existing flush are not so
> bad after all. I'm not sure.

Even if the record isn't yet flushed out by the time we need to, the
deferred-ness means that there's a good chance more useful records can also be
flushed out at the same time...

> I notice that the patch makes no changes to relmapper.c, and I think
> that's a problem. Notice in particular:
> 
> #define MAX_MAPPINGS62  /* 62 * 8 + 16 = 512 */
> 
> I believe that making RelFileNumber into a 64-bit value will cause the
> 8 in the calculation above to change to 16, defeating the intention
> that the size of the file ought to be the smallest imaginable size of
> a disk sector. It does seem like it would have been smart to include a
> StaticAssertStmt in this file someplace that checks that the data
> structure has the expected size, and now might be a good time, perhaps
> in a separate patch, to add one.

+1

Perhaps MAX_MAPPINGS should be at least partially computed instead of doing
the math in a comment? sizeof(RelMapping) could directly be used, and we could
define SIZEOF_RELMAPFILE_START with a StaticAssert() enforcing it to be equal
to offsetof(RelMapFile, mappings), if we move crc & pad to *before* mappings -
afaics that should be entirely doable.

> If we do nothing fancy here, the maximum number of mappings will have to be
> reduced from 62 to 31, which is a problem because global/pg_filenode.map
> currently has 48 entries. We could try to arrange to squeeze padding out of
> the RelMapping struct, which would let us use just 12 bytes per mapping,
> which would increase the limit to 41, but that's still less than we're using
> already, never mind leaving room for future growth.

Ugh.

> I don't know what to do about this exactly. I believe it's been
> previously suggested that the actual minimum sector size on reasonably
> modern hardware is never as small as 512 bytes, so maybe the file size
> can just be increased to 1kB or something.

I'm not so sure that's a good idea - while the hardware sector size likely
isn't 512 on much storage anymore, it's still the size that most storage
protocols use. Which then means you need to be confident that you not just
rely on storage atomicity, but also that nothing in the
  filesystem <-> block layer <-> driver
stack somehow causes a single larger write to be split up into two.

And if you use a filesystem with a smaller filesystem block size, there might
not even be a choice for the write to be split into two writes. E.g. XFS still
supports 512 byte blocks (although I think it's deprecating block size < 1024).

Maybe the easiest fix here would be to replace the file atomically. Then we
don't need this <= 512 byte st

Re: Commitfest Update

2022-07-11 Thread Daniel Gustafsson

> On 11 Jul 2022, at 15:07, Matthias van de Meent 
>  wrote:

> No objections, but this adds another item to the implicit commitfest
> app user manual, that to the best of my knowledge seems to be mostly
> implicit institutional knowledge plus bits of information spread
> around the mailing lists.

That's mostly true yes, which means that anything I write below is to be taken
with n grains of salt as it's my interpretation of said institutional
knowledge.

> Do we have an actual manual or otherwise a (single?) place with
> documentation on how certain fields of the CFA should be used or
> interpreted, so that (new) hackers know what to expect or where to
> look?

We don't AFAIK, but we should.  Either an actual written manual (which may end
up in many tldr folders) or inline help within the app (the latter being my
preference I think).

> Examples of information about using the CFA that I couldn't find:
> - The Version field may contain a single specific postgresql version
> number, 'stable', or nothing. If my patch needs backpatching to all
> backbranches, which do I select? The oldest supported PG version, or
> 'stable'? Related to that: which version is indicated by 'stable'?

I'll refer to the commitmessage from the CF app repo on this:

commit a3bac5922db76efd5b6bb331a7141e9ca3209c4a
Author: Magnus Hagander 
Date:   Wed Feb 6 21:05:06 2019 +0100

Add a field to each patch for target version

This is particularly interesting towards the end of a cycle where it can
be used to flag patches that are not intended for the current version
but still needs review.

The thread on -hackers which concluded on adding the field has a lot more of
the reasoning but some quick digging didn't find it.

> - When creating or updating a CF entry, who are responsible for
> filling in which fields? May the author assign reviewers/committers,
> or should they do so themselves?

Reviewers and committers sign themselves up.

> - Should the 'git link' be filled with a link to the committed patch
> once committed, or is it a general purpose link to share a git
> repository with the proposed changes?

The gitlink field is (was?) primarily meant to hold links to external repos for
large patchsets where providing a repo on top of the patches in the thread is
valuable.  An example would be Andres et.al's IO work where being able to
follow the work as it unfolds in a repo is valuable for reviewers.

> - What should (or shoudn't) Annotations be used for?

Annotations are used for indicating that certain emails are specifically
important and/or highlight them as taking specific design decisions etc.  It
can be used for anything that is providing value to the a new reader of the
thread really.

> - What should I expect of the comment / review system of the CFA?
> Should I use feature over direct on-list mails?

I think that's up to personal preference, for reviewers who aren't subscribed
to -hackers it's clearly useful to attach it to the thread.  For anyone already
subscribed and used to corresponding on the mailinglist I would think that's
the easiest option.

> I'm not asking anyone to drop all and get all the features of the CFA
> documented, but for my almost 2 years of following the -hackers list I
> feel like I still haven't gotten a good understanding of the
> application that is meant to coordinate the shared state in patch
> development, and I think that's a quite a shame.

There has been a lot of discussions around how to improve the CF app but they
have to a large extent boiled down to ENOTENOUGHTIME.  I still have my hopes
that we can address these before too long, and adding user documentation is
clearly an important one.

--
Daniel Gustafsson   https://vmware.com/

Re: automatically generating node support functions

2022-07-11 Thread Andres Freund

Hi,

On 2022-07-11 13:57:38 -0400, Tom Lane wrote:
> More generally, I'm having second thoughts about the wisdom of
> auto-generating the NodeTag enum at all.  With the current setup,
> I am absolutely petrified about the risk of silent ABI breakage
> thanks to the enum order changing.  In particular, if the meson
> build fails to use the same input-file order as the makefile build,
> then we will get different enum orders from the two builds, causing
> an ABI discrepancy that nobody would notice until we had catastrophic
> extension-compatibility issues in the field.

Ugh, yes. And it already exists due to Solution.pm, although that's perhaps
less likely to be encountered "in the wild".

Additionally, I think we've had to add tags to the enum in minor releases
before and I'm afraid this now would end up looking even more awkward?


> Of course, sorting the tags by name is a simple way to fix that.
> But I'm not sure I want to buy into being forced to do it like that,
> because of the switch-density question.
> 
> So at this point I'm rather attracted to the idea of reverting to
> a manually-maintained NodeTag enum.

+0.5 - there might be a better solution to this, but I'm not immediately
seeing it.

Greetings,

Andres Freund

Re: First draft of the PG 15 release notes

2022-07-11 Thread Bruce Momjian

On Mon, Jul 11, 2022 at 12:39:23PM -0500, Justin Pryzby wrote:
> On Tue, May 10, 2022 at 11:41:08PM -0400, Bruce Momjian wrote:
> > On Tue, May 10, 2022 at 08:31:17PM -0500, Justin Pryzby wrote:
> 
> > > | Store server-level statistics in shared memory (Kyotaro Horiguchi, 
> > > Andres Freund, Melanie Plageman)
> > > 
> > > Should this be called "cumulative" statistics?  As in 
> > > b3abca68106d518ce5d3c0d9a1e0ec02a647ceda.
> > 
> > Uh, they are counters, which I guess is cummulative, but that doesn't
> > seem very descriptive.  The documentation call it the statistics
> > collector, but I am not sure we even have that anymore with an in-memory
> > implementation.  I am kind of not sure what to call it.
> 
> What I was trying to say is that it's now called the cumulative stats system.

It is actually called the "cumulative statistics system", so updated; 
patch attached and applied.

> > > | New function
> > > 
> > > "The new function .." (a few times)
> > 
> > Uh, I only see it once.
> 
> There's still a couple of these without "The".

Ah, found them, fixed.

> > > Should any of these be listed as incompatible changes (some of these I 
> > > asked
> > > before, but the others are from another list).
> 
> > > ccd10a9bfa5 Fix enforcement of PL/pgSQL variable CONSTANT markings (Tom 
> > > Lane)
> > 
> > I didn't see not enforcing constant as an incompatibility, but rather a
> > bug.
> 
> Yes it's a bug, but it's going to be seen as a compatibility issue for someone
> whose application breaks.  The same goes for other things I mentioned.

We don't guarantee that the only breakage is listed in the
incompatibilities section, only the most common ones.

> > > 376ce3e404b Prefer $HOME when looking up the current user's home 
> > > directory.
> > 
> > Uh, I didn't think so.
> > 
> > > 7844c9918a4 psql: Show all query results by default
> > 
> > Same.
> > 
> > > 17a856d08be Change aggregated log format of pgbench.
> > 
> > We have a pgbench section and I can't see it. I am trying to keep
> > incompatiblities as things related to in-production problems or
> > surprises.
> > 
> > > ? 73508475d69 Remove pg_atoi()
> > 
> > I don't see who would care except for internals folks.
> > 
> > > ? aa64f23b029 Remove MaxBackends variable in favor of GetMaxBackends() 
> > > function.
> > 
> > Same.
> > 
> > > ? d816f366bc4 psql: Make SSL info display more compact
> > 
> > I did look at that but considered that this wouldn't be something that
> > would break anything.
> > 
> > > ? 27b02e070fd pg_upgrade: Don't print progress status when output is not 
> > > a tty.
> > 
> > Same.
> > 
> > > ? ab4fd4f868e Remove 'datlastsysoid'.
> > 
> > Seemed too internal.
> 
> FYI, removal of this column broke a tool one of my coworkers uses (navicat).
> I'm told that the fix will be in navicat v16.1 (but their existing users will
> need to pay to upgrade from v15).

This actually supports my point --- only navicat needs to know about this
renaming, it its users.  Telling navicat users about this change does
not help them.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Indecision is a decision.  Inaction is an action.  Mark Batterson

diff --git a/doc/src/sgml/release-15.sgml b/doc/src/sgml/release-15.sgml
index 179ad37d9d..cde78b6181 100644
--- a/doc/src/sgml/release-15.sgml
+++ b/doc/src/sgml/release-15.sgml
@@ -1100,8 +1100,8 @@ Author: Andres Freund 
 
   

-Store server-level
-statistics in shared memory (Kyotaro Horiguchi, Andres
+Store cumulative statistics
+system data in shared memory (Kyotaro Horiguchi, Andres
 Freund, Melanie Plageman)

 
@@ -1248,7 +1248,7 @@ Author: Tom Lane 

 

-New function has_parameter_privilege()
+The new function has_parameter_privilege()
 reports on this privilege.

   
@@ -1656,7 +1656,7 @@ Author: Amit Kapila 

 

-New function pg_stat_reset_subscription_stats()
 allows the resetting of subscriber statistics.

Re: automatically generating node support functions

2022-07-11 Thread Robert Haas

On Mon, Jul 11, 2022 at 1:57 PM Tom Lane  wrote:
> More generally, I'm having second thoughts about the wisdom of
> auto-generating the NodeTag enum at all.  With the current setup,
> I am absolutely petrified about the risk of silent ABI breakage
> thanks to the enum order changing.  In particular, if the meson
> build fails to use the same input-file order as the makefile build,
> then we will get different enum orders from the two builds, causing
> an ABI discrepancy that nobody would notice until we had catastrophic
> extension-compatibility issues in the field.

I think this is a valid concern, but having it be automatically
generated is awfully handy, so I think it would be nice to find some
way of preserving that.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: Eliminating SPI from RI triggers - take 2

2022-07-11 Thread Robert Haas

On Fri, Jul 8, 2022 at 10:07 PM Tom Lane  wrote:
> Uh ... if such caching behavior is at all competently implemented,
> it will be transparent because the cache will notice and respond to
> events that should change its outputs.

Well, that assumes that we emit appropriate invalidations in every
place where permissions are updated, and take appropriate locks every
place where they are checked. I think that the first one might be too
optimistic, and the second one is definitely too optimistic. For
instance, consider pg_proc_ownercheck. There's no lock of any kind
taken on the function here, and at least in typical cases, I don't
think the caller takes one either. Compare the extensive tap-dancing
around locking and permissions checking in RangeVarGetRelidExtended
against the blithe unconcern in FuncnameGetCandidates.

I believe that of all the types of SQL objects in the system, only
relations have anything like proper interlocking against concurrent
DDL. Other examples of not caring at all include LookupCollation() and
LookupTypeNameExtended(). There's just no heavyweight locking here at
all, and so no invalidation based on sinval messages can ever be
reliable.

GRANT and REVOKE don't take proper locks, either, even on tables:

rhaas=# begin;
BEGIN
rhaas=*# lock table pgbench_accounts;
LOCK TABLE
rhaas=*#

Then, in another session:

rhaas=# create role foo;
CREATE ROLE
rhaas=# grant select on pgbench_accounts to foo;
GRANT
rhaas=#

Executing "SELECT * FROM pgbench_accounts" in the other session would
have blocked, but the GRANT has no problem at all.

I don't see that any of this is this patch's job to fix. If nobody's
cared enough to fix it any time in the past 20 years, or just didn't
want to pay the locking cost, well then we probably don't need to do
it now either. But I think it means that even the slightest change in
the timing or frequency of permissions checks is in theory a
user-visible change, because there are no grounds for assuming that
the permissions on any of the objects involved aren't changing while
the query is executing.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: automatically generating node support functions

2022-07-11 Thread Tom Lane

I wrote:
>> Andres Freund  writes:
>>> I was just rebasing meson ontop of this and was wondering whether the input
>>> filenames were in a particular order:

Pushed a patch to make that a bit less random-looking.

> +1 for sorting alphabetically.  I experimented with that and it's a
> really trivial change.

I had second thoughts about that, after noticing that alphabetizing
the NodeTag enum increased the backend's size by 20K or so.  Presumably
that's telling us that a bunch of switch statements got less dense,
which might possibly cause performance issues thanks to poorer cache
behavior or the like.  Maybe it's still appropriate to do, but it's
not as open-and-shut as I first thought.

More generally, I'm having second thoughts about the wisdom of
auto-generating the NodeTag enum at all.  With the current setup,
I am absolutely petrified about the risk of silent ABI breakage
thanks to the enum order changing.  In particular, if the meson
build fails to use the same input-file order as the makefile build,
then we will get different enum orders from the two builds, causing
an ABI discrepancy that nobody would notice until we had catastrophic
extension-compatibility issues in the field.

Of course, sorting the tags by name is a simple way to fix that.
But I'm not sure I want to buy into being forced to do it like that,
because of the switch-density question.

So at this point I'm rather attracted to the idea of reverting to
a manually-maintained NodeTag enum.  We know how to avoid ABI
breakage with that, and it's not exactly the most painful part
of adding a new node type.  Plus, that'd remove (most of?) the
need for gen_node_support.pl to deal with "node-tag-only" structs
at all.

Thoughts?

regards, tom lane

Re: DropRelFileLocatorBuffers

2022-07-11 Thread Robert Haas

On Fri, Jul 8, 2022 at 1:59 AM Kyotaro Horiguchi
 wrote:
> I thought for a moment that "Relation" sounded better but that naming
> is confusing in bufmgr.c, where functions take Relation and those take
> RelFileLocator exist together. So the (second) attached introduces
> "RelFile" to represent RelFileNode excluding RelFileLocator.
>
> The function CreateAndCopyRelationData exists since before b0a55e4329
> but renamed since it takes RelFileLocators.

I'm not very sold on this. I think that the places where you've
replaced RelFileLocator with just RelFile in various functions might
be an improvement, but the places where you've replaced Relation with
RelFile seem to me to be worse. I don't really see that there's
anything wrong with names like CreateAndCopyRelationData or
FlushRelationsAllBuffers, and in general I prefer function names that
are made up of whole words rather than parts of words.

> While working on this, I found that the following coment is wrong.
>
>  *  FlushRelationsAllBuffers
>  *
>  *  This function flushes out of the buffer pool all the pages of 
> all
>  *  forks of the specified smgr relations.  It's equivalent to 
> calling
>  *  FlushRelationBuffers once per fork per relation.  The 
> relations are
>  *  assumed not to use local buffers.
>
> It is equivalent to calling FlushRelationBuffers "per relation". This
> is attached as the first patch, which could be thought as a separate
> patch.

I committed this.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: EINTR in ftruncate()

2022-07-11 Thread Andres Freund

Hi,

On 2022-07-07 17:58:10 +1200, Thomas Munro wrote:
> Even if we go with this approach now, I think it's plausible that we
> might want to reconsider this yet again one day, perhaps allocating
> memory with some future asynchronous infrastructure while still
> processing interrupts.  It's not very nice to hold up recovery or
> ProcSignalBarrier for long operations.

I think the next improvement would be to do the fallocate in smaller chunks,
and accept interrupts inbetween.


> I'm a little unclear about ftruncate() here.  I don't expect it to
> report EINTR in other places where we use it (ie to make a local file
> on a non-"slow device" smaller), because I expect that to be like
> read(), write() etc which we don't wrap in EINTR loops.  Here you've
> observed EINTR when messing around with a debugger*.  It seems
> inconsistent to put posix_fallocate() in an EINTR retry loop for the
> benefit of debuggers, but not ftruncate().  But perhaps that's good
> enough, on the theory that posix_fallocate(1GB) is a very large target
> and you have a decent chance of hitting it.

> *It's funny that ftruncate() apparently doesn't automatically restart
> for ptrace SIGCONT on Linux according to your report, while poll()
> does according to my experiments, even though the latter is one of the
> never-restart functions (it doesn't on other OSes I hack on, and you
> feel the difference when debugging missing wakeup type bugs...).
> Random implementation details...

My test was basically while (true); {if (!ftruncate()) bleat(); if
(!fallocate()) bleat();} with a SIGALRM triggering regularly in the
background. The ftruncate failed, rarely, when attaching / detaching strace
-p. Rarely enough that I had already written you an IM saying that I couldn't
make it fail...  So it's hard to be confident this can't otherwise be
hit. With that caveat: I didn't hit it with a "real" file on a "real"
filesystem in a few minutes of trying. But unsurprisingly it triggers when
putting the file on a tmpfs.


> @@ -362,6 +355,14 @@ dsm_impl_posix_resize(int fd, off_t size)
>  {
>   int rc;
>  
> + /*
> +  * Block all blockable signals, except SIGQUIT.  posix_fallocate() can 
> run
> +  * for quite a long time, and is an all-or-nothing operation.  If we
> +  * allowed SIGUSR1 to interrupt us repeatedly (for example, due to 
> recovery
> +  * conflicts), the retry loop might never succeed.
> +  */
> + PG_SETMASK(&BlockSig);
> +
>   /* Truncate (or extend) the file to the requested size. */
>   rc = ftruncate(fd, size);
>

Hm - given that we've observed ftruncate failing with strace, and that
stracing to find problems isn't insane, shouldn't we retry the ftruncate too?
It's kind of obsoleted by your next patch, but not really, because it's not
unconceivable that other OSs behave similarly? And IIUC you're planning to not
backpatch 0002?


> + pgstat_report_wait_start(WAIT_EVENT_DSM_FILL_ZERO_WRITE);

(not new in this patch, just moved around) - FILL_ZERO_WRITE is imo an odd
description of what's happening... In my understanding this isn't doing any
writing, just allocating. But whatever...


Greetings,

Andres Freund

Re: First draft of the PG 15 release notes

2022-07-11 Thread Justin Pryzby

On Tue, May 10, 2022 at 11:41:08PM -0400, Bruce Momjian wrote:
> On Tue, May 10, 2022 at 08:31:17PM -0500, Justin Pryzby wrote:

> > | Store server-level statistics in shared memory (Kyotaro Horiguchi, Andres 
> > Freund, Melanie Plageman)
> > 
> > Should this be called "cumulative" statistics?  As in 
> > b3abca68106d518ce5d3c0d9a1e0ec02a647ceda.
> 
> Uh, they are counters, which I guess is cummulative, but that doesn't
> seem very descriptive.  The documentation call it the statistics
> collector, but I am not sure we even have that anymore with an in-memory
> implementation.  I am kind of not sure what to call it.

What I was trying to say is that it's now called the cumulative stats system.

> > | New function
> > 
> > "The new function .." (a few times)
> 
> Uh, I only see it once.

There's still a couple of these without "The".

> > Should any of these be listed as incompatible changes (some of these I asked
> > before, but the others are from another list).

> > ccd10a9bfa5 Fix enforcement of PL/pgSQL variable CONSTANT markings (Tom 
> > Lane)
> 
> I didn't see not enforcing constant as an incompatibility, but rather a
> bug.

Yes it's a bug, but it's going to be seen as a compatibility issue for someone
whose application breaks.  The same goes for other things I mentioned.

> > 376ce3e404b Prefer $HOME when looking up the current user's home directory.
> 
> Uh, I didn't think so.
> 
> > 7844c9918a4 psql: Show all query results by default
> 
> Same.
> 
> > 17a856d08be Change aggregated log format of pgbench.
> 
> We have a pgbench section and I can't see it. I am trying to keep
> incompatiblities as things related to in-production problems or
> surprises.
> 
> > ? 73508475d69 Remove pg_atoi()
> 
> I don't see who would care except for internals folks.
> 
> > ? aa64f23b029 Remove MaxBackends variable in favor of GetMaxBackends() 
> > function.
> 
> Same.
> 
> > ? d816f366bc4 psql: Make SSL info display more compact
> 
> I did look at that but considered that this wouldn't be something that
> would break anything.
> 
> > ? 27b02e070fd pg_upgrade: Don't print progress status when output is not a 
> > tty.
> 
> Same.
> 
> > ? ab4fd4f868e Remove 'datlastsysoid'.
> 
> Seemed too internal.

FYI, removal of this column broke a tool one of my coworkers uses (navicat).
I'm told that the fix will be in navicat v16.1 (but their existing users will
need to pay to upgrade from v15).

-- 
Justin

Re: replacing role-level NOINHERIT with a grant-level option

2022-07-11 Thread Robert Haas

On Mon, Jul 11, 2022 at 12:48 PM tushar  wrote:
> One scenario where the syntax is created in pg_dumpall is wrong
>
> postgres=# create user u1;
> postgres=# create group g1 with user u1;
> postgres=# grant g1 to u1 with admin option, inherit false;
>
> Perform pg_dumpall
>
> GRANT g1 TO u1 WITH ADMIN OPTION WITH INHERIT FALSE GRANTED BY edb;

Oops. Here is a rebased version of v3 which aims to fix this bug.

It seems that I can replace the previous changes to src/backend/nodes
with nothing at all in view of Peter's commit to automatically
generate node support functions. Nice.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

v4-0001-Allow-grant-level-control-of-role-inheritance-beh.patch
Description: Binary data

Re: AIX support - alignment issues

2022-07-11 Thread Tom Lane

Robert Haas  writes:
> On Mon, Jul 11, 2022 at 2:49 AM Tom Lane  wrote:
>> I think it'd be pretty reasonable to disclaim support for
>> any architecture that doesn't have a representative in our
>> buildfarm, which would lead to dropping all four of these.
>> If you don't like it, step up and run a buildfarm animal.

> I strongly suspect that anyone who tried to use a modern PostgreSQL on
> any of these platforms would find it quite an adventure, which is
> fine, because if you're trying to use any of those platforms in 2022,
> you are probably the sort of person who enjoys an adventure. But it
> can't really be useful to list them in the documentation, and it's
> unlikely that any of them "just work".

It's possible that they "just work", but we have no way of knowing that,
or knowing if we break them in future.  Thus the importance of having
a buildfarm animal to tell us that.

More generally, I think the value of carrying support for niche
architectures is that it helps keep us from falling into the
software-monoculture trap, from which we'd be unable to escape when
the hardware landscape inevitably changes.  However, it only helps
if somebody is testing such arches on a regular basis.  The fact that
there's some #ifdef'd code somewhere for M88K proves diddly-squat
about whether we could actually run on M88K today.  The situation
for niche operating systems is precisely analogous.

regards, tom lane

Re: replacing role-level NOINHERIT with a grant-level option

2022-07-11 Thread tushar

On Sat, Jul 9, 2022 at 1:27 AM Robert Haas  wrote:

> On Tue, Jul 5, 2022 at 8:04 AM Robert Haas  wrote:
> > On Sun, Jul 3, 2022 at 1:17 PM Nathan Bossart 
> wrote:
> > > If by "bolder" you mean "mark [NO]INHERIT as
> deprecated-and-to-be-removed
> > > and begin emitting WARNINGs when it and WITH INHERIT DEFAULT are
> used," I
> > > think it's worth consideration.  I suspect it will be hard to sell
> removing
> > > [NO]INHERIT in v16 because it would introduce a compatibility break
> without
> > > giving users much time to migrate.  I could be wrong, though.
> >
> > It's a fair point. But, if our goal for v16 is to do something that
> > could lead to an eventual deprecation of [NO]INHERIT, I still think
> > removing WITH INHERIT DEFAULT from the patch set is probably a good
> > idea.
>
> So here is an updated patch with that change.
>
>
Thanks, Robert, I created a few objects with different privileges on v14.4
e.g


postgres=# \dp+ atest2

   Access privileges

 Schema |  Name  | Type  |   Access privileges   |
Column privileges | Policies

++---+---+---+--

 public | atest2 | table | regress_priv_user1=arwdDxt/regress_priv_user1+|
|

||   | regress_priv_user2=r/regress_priv_user1  +|
|

||   | regress_priv_user3=w/regress_priv_user1  +|
|

||   | regress_priv_user4=a/regress_priv_user1  +|
|

||   | regress_priv_user5=D/regress_priv_user1   |
|

(1 row)




and found that after pg_upgrade there is no change on privileges  on
v16(w/patch)


One scenario where the syntax is created in pg_dumpall is wrong


postgres=# create user u1;

CREATE ROLE

postgres=# create group g1 with user u1;

CREATE ROLE

postgres=# grant g1 to u1 with admin option, inherit false;

GRANT ROLE

postgres=#


Perform pg_dumpall


This is the syntax coming


"


-- Role memberships

--


GRANT g1 TO u1 WITH ADMIN OPTION WITH INHERIT FALSE GRANTED BY edb;


"


If we run this syntax on psql, there is an error.


postgres=# GRANT g1 TO u1 WITH ADMIN OPTION WITH INHERIT FALSE GRANTED BY
edb;

ERROR:  syntax error at or near "WITH"



regards,

Re: Cleaning up historical portability baggage

2022-07-11 Thread Robert Haas

On Sat, Jul 9, 2022 at 9:46 PM Thomas Munro  wrote:
> The pwritev/preadv functions are unfortunately not standardised by
> POSIX (I dunno why, it's the obvious combination of the p* and *v
> functions) despite every OS in the list having them except for Solaris
> and old macOS.  Oh well.

I don't think that 0001 is buying us a whole lot, really. I prefer the
style where we have PG-specific functions that behave differently on
different platforms to the one where we call something that looks like
a native OS function call on all platforms but on some of them it is
secretly invoking a replacement implementation in src/port. The
problem with the latter is it looks like you're using something that's
universally supported and works the same way everywhere, but you're
really not. If it were up to me, we'd have more pg_whatever() that
calls whatever() on non-Windows and something else on Windows, rather
than going in the direction that this patch takes us.

I like all of the other patches. Reducing the number of configure
tests that we need seems like a really good idea.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: First draft of the PG 15 release notes

2022-07-11 Thread Bruce Momjian

On Sat, Jul  9, 2022 at 08:19:41PM -0700, Noah Misch wrote:
> > I think you would need to say "previous behavior" since people might be
> > upgrading from releases before PG 14.  I also would change "In existing
> 
> I felt "previous behavior" was mildly ambiguous.  I've changed it to "the
> behavior of prior releases".

Sure.
> 
> > databases" to "For existing databases".  I think your big risk here is
> 
> Done.  New version attached.

I had trouble reading the sentences in the order you used so I
restructured it:

The new default is one of the secure schema usage patterns that  has recommended since the security
release for CVE-2018-1058.  The change applies to newly-created
databases in existing clusters and for new clusters.  Upgrading a
cluster or restoring a database dump will preserve existing permissions.

For existing databases, especially those having multiple users, consider
issuing REVOKE to adopt this new default.  For new
databases having zero need to defend against insider threats, granting
USAGE permission on their public
schemas will yield the behavior of prior releases.

> > Is this something we want to get into in the release notes, or perhaps
> > do we need to link to a wiki page for these details?
> 
> No supported release has a wiki page link in its release notes.  We used wiki
> pages in the more-distant past, but I don't recall why.  I am not aware of
> wiki pages having relevant benefits.

I think the wiki was good if you needed a lot of release-specific text,
or if you wanted to adjust the wording after the release.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Indecision is a decision.  Inaction is an action.  Mark Batterson

Re: AIX support - alignment issues

2022-07-11 Thread Robert Haas

On Mon, Jul 11, 2022 at 2:49 AM Tom Lane  wrote:
> While we're here ...
>
> +   Code support exists for M68K, M88K, M32R, and SuperH, but these
> architectures are not known to have been tested recently.
>
> I think it'd be pretty reasonable to disclaim support for
> any architecture that doesn't have a representative in our
> buildfarm, which would lead to dropping all four of these.
> If you don't like it, step up and run a buildfarm animal.

+1. Keeping stuff like this in the documentation doesn't make those
platforms supported. What it does do is make it look like we're bad at
updating our documentation.

I strongly suspect that anyone who tried to use a modern PostgreSQL on
any of these platforms would find it quite an adventure, which is
fine, because if you're trying to use any of those platforms in 2022,
you are probably the sort of person who enjoys an adventure. But it
can't really be useful to list them in the documentation, and it's
unlikely that any of them "just work".

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: doc: Clarify what "excluded" represents for INSERT ON CONFLICT

2022-07-11 Thread Robert Haas

On Fri, Jul 8, 2022 at 11:18 PM Bruce Momjian  wrote:
> I found two places where a singular "row" would be better, doc patch
> attached.

+1.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: making relfilenodes 56 bits

2022-07-11 Thread Robert Haas

On Mon, Jul 11, 2022 at 7:39 AM Dilip Kumar  wrote:
> > buf_init.c:119:4: error: implicit truncation from 'int' to bit-field
> > changes value from -1 to 255 [-Werror,-Wbitfield-constant-conversion]
> > CLEAR_BUFFERTAG(buf->tag);
> > ^
> > ../../../../src/include/storage/buf_internals.h:122:14: note: expanded
> > from macro 'CLEAR_BUFFERTAG'
> > (a).forkNum = InvalidForkNumber, \
> > ^ ~
> > 1 error generated.
>
> Hmm so now we are using an unsigned int field so IMHO we can make
> InvalidForkNumber to 255 instead of -1?

If we're going to do that I think we had better do it as a separate,
preparatory patch.

It also makes me wonder why we're using macros rather than static
inline functions in buf_internals.h. I wonder whether we could do
something like this, for example, and keep InvalidForkNumber as -1:

static inline ForkNumber
BufTagGetForkNum(BufferTag *tagPtr)
{
int8 ret;

StaticAssertStmt(MAX_FORKNUM <= INT8_MAX);
ret = (int8) ((tagPtr->relForkDetails[0] >> BUFFERTAG_RELNUMBER_BITS);
return (ForkNumber) ret;
}

Even if we don't use that particular trick, I think we've generally
been moving toward using static inline functions rather than macros,
because it provides better type-safety and the code is often easier to
read. Maybe we should also approach it that way here. Or even commit a
preparatory patch replacing the existing macros with inline functions.
Or maybe it's best to leave it alone, not sure.

It feels like some of the changes to buf_internals.h in 0002 could be
moved into 0001. If we're going to introduce a combined method to set
the relnumber and fork, I think we could do that in 0001 rather than
making 0001 introduce a macro to set just the relfilenumber and then
having 0002 change it around again.

BUFFERTAG_RELNUMBER_BITS feels like a lie. It's defined to be 24, but
based on the name you'd expect it to be 56.

> But we are already logging this if we are setting the relfilenumber
> which is out of the already logged range, am I missing something?
> Check this change.
> +relnumbercount = relnumber - ShmemVariableCache->nextRelFileNumber;
> +if (ShmemVariableCache->relnumbercount <= relnumbercount)
> +{
> +LogNextRelFileNumber(relnumber + VAR_RELNUMBER_PREFETCH, NULL);
> +ShmemVariableCache->relnumbercount = VAR_RELNUMBER_PREFETCH;
> +}
> +else
> +ShmemVariableCache->relnumbercount -= relnumbercount;

Oh, I guess I missed that.

> I had those changes in v7-0003, now I have merged with 0002.  This has
> assert check while replaying the WAL for smgr create and smgr
> truncate, and while during normal path when allocating the new
> relfilenumber we are asserting for any existing file.

I think a test-and-elog might be better. Most users won't be running
assert-enabled builds, but this seems worth checking regardless.

> I have done some performance tests, with very small values I can see a
> lot of wait events for RelFileNumberGen but with bigger numbers like
> 256 or 512 it is not really bad.  See results at the end of the
> mail[1]

It's a little hard to interpret these results because you don't say
how often you were checking the wait events, or how often the
operation took to complete. I suppose we can guess the relative time
scale from the number of Activity events: if there were 190
WalWriterMain events observed, then the time to complete the operation
is probably 190 times how often you were checking the wait events, but
was that every second or every half second or every tenth of a second?

> I have done these changes during GetNewRelFileNumber() this required
> to track the last logged record pointer as well but I think this looks
> clean.  With this I can see some reduction in RelFileNumberGen wait
> event[1]

I find the code you wrote here a little bit magical. I believe it
depends heavily on choosing to issue the new WAL record when we've
exhausted exactly 50% of the available space. I suggest having two
constants, one of which is the number of relfilenumber values per WAL
record, and the other of which is the threshold for issuing a new WAL
record. Maybe something like RFN_VALUES_PER_XLOG and
RFN_NEW_XLOG_THRESHOLD, or something. And then work code that works
correctly for any value of RFN_NEW_XLOG_THRESHOLD between 0 (don't log
new RFNs until old allocation is completely exhausted) and
RFN_VALUES_PER_XLOG - 1 (log new RFNs after using just 1 item from the
previous allocation). That way, if in the future someone decides to
change the constant values, they can do that and the code still works.

> I am not sure what is the best solution here, but I agree that most of
> the modern hardware will have bigger sector size than 512 so we can
> just change file size of 1024.

I went looking for previous discussion of this topic. Here's Heikki
doubting whether even 512 is too big:

http://postgr.es/m/f03d9166-ad12-2a

Re: automatically generating node support functions

2022-07-11 Thread Andres Freund

Hi,

On 2022-07-11 12:07:09 -0400, Tom Lane wrote:
> I wrote:
> > Peter Eisentraut  writes:
> >> could not handle type "ScanDirection" in struct "IndexScan" field
> >> "indexorderdir"
>
> > Ah, I see.  Still, we could also handle that with
> > push @enum_types, qw(ScanDirection);
>
> I tried that, and it does work.  The only other input file we could
> get rid of that way is nodes/lockoptions.h, which likewise contributes
> only a couple of enum type names.

Kinda wonder if those headers are even worth having. Plenty other enums in
primnodes.h.


> Not sure it's worth messing with --- both ways seem crufty, though for
> different reasons.

Not sure either.

Greetings,

Andres Freund

Re: [PATCH] Optimize json_lex_string by batching character copying

2022-07-11 Thread Andres Freund

Hi,

On 2022-07-11 11:53:26 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > I wonder if we can't abstract this at least a bit better. If we go that 
> > route
> > a bit further, then add another arch, this code will be pretty much
> > unreadable.
>
> IMO, it's pretty unreadable *now*, for lack of comments about what it's
> doing and why.

Yea, that could at least be addressed by adding comments. But even with a
bunch of comments, it'd still be pretty hard to read once the events above
have happened (and they seem kind of inevitable).

I wonder if we can add a somewhat more general function for scanning until
some characters are found using SIMD? There's plenty other places that could
be useful.

Greetings,

Andres Freund

Re: automatically generating node support functions

2022-07-11 Thread Tom Lane

I wrote:
> Peter Eisentraut  writes:
>> could not handle type "ScanDirection" in struct "IndexScan" field 
>> "indexorderdir"

> Ah, I see.  Still, we could also handle that with
> push @enum_types, qw(ScanDirection);

I tried that, and it does work.  The only other input file we could
get rid of that way is nodes/lockoptions.h, which likewise contributes
only a couple of enum type names.  Not sure it's worth messing with
--- both ways seem crufty, though for different reasons.

regards, tom lane

Re: [PATCH] Optimize json_lex_string by batching character copying

2022-07-11 Thread Tom Lane

Andres Freund  writes:
> I wonder if we can't abstract this at least a bit better. If we go that route
> a bit further, then add another arch, this code will be pretty much
> unreadable.

IMO, it's pretty unreadable *now*, for lack of comments about what it's
doing and why.

regards, tom lane

Re: [PATCH] Optimize json_lex_string by batching character copying

2022-07-11 Thread Andres Freund

Hi,

On 2022-07-06 15:58:44 +0700, John Naylor wrote:
> From 82e13b6bebd85a152ededcfd75495c0c0f642354 Mon Sep 17 00:00:00 2001
> From: John Naylor 
> Date: Wed, 6 Jul 2022 15:50:09 +0700
> Subject: [PATCH v4 4/4] Use vectorized lookahead in json_lex_string on x86
> 
> ---
>  src/common/jsonapi.c | 48 
>  1 file changed, 48 insertions(+)
> 
> diff --git a/src/common/jsonapi.c b/src/common/jsonapi.c
> index 81e176ad8d..44e8ed2b2f 100644
> --- a/src/common/jsonapi.c
> +++ b/src/common/jsonapi.c
> @@ -24,6 +24,12 @@
>  #include "miscadmin.h"
>  #endif
>  
> +/*  WIP: put somewhere sensible and consider removing CRC from the names */
> +#if (defined (__x86_64__) || defined(_M_AMD64)) && 
> (defined(USE_SSE42_CRC32C) || defined(USE_SSE42_CRC32C_WITH_RUNTIME_CHECK))
> +#include 
> +#define USE_SSE2
> +#endif
> +
>  /*
>   * The context of the parser is maintained by the recursive descent
>   * mechanism, but is passed explicitly to the error reporting routine
> @@ -842,12 +848,54 @@ json_lex_string(JsonLexContext *lex)
>   }
>   else
>   {
> +#ifdef USE_SSE2
> + __m128i block,
> + has_backslash,
> + has_doublequote,
> + control,
> + has_control,
> + error_cum = _mm_setzero_si128();
> + const   __m128i backslash = _mm_set1_epi8('\\');
> + const   __m128i doublequote = 
> _mm_set1_epi8('"');
> + const   __m128i max_control = 
> _mm_set1_epi8(0x1F);
> +#endif
>   /* start lookahead at current byte */
>   char   *p = s;
>  
>   if (hi_surrogate != -1)
>   return JSON_UNICODE_LOW_SURROGATE;
>  
> +#ifdef USE_SSE2
> + while (p < end - sizeof(__m128i))
> + {
> + block = _mm_loadu_si128((const __m128i *) p);
> +
> + /* direct comparison to quotes and backslashes 
> */
> + has_backslash = _mm_cmpeq_epi8(block, 
> backslash);
> + has_doublequote = _mm_cmpeq_epi8(block, 
> doublequote);
> +
> + /*
> +  * use saturation arithmetic to check for <= 
> highest control
> +  * char
> +  */
> + control = _mm_subs_epu8(block, max_control);
> + has_control = _mm_cmpeq_epi8(control, 
> _mm_setzero_si128());
> +
> + /*
> +  * set bits in error_cum where the 
> corresponding lanes in has_*
> +  * are set
> +  */
> + error_cum = _mm_or_si128(error_cum, 
> has_backslash);
> + error_cum = _mm_or_si128(error_cum, 
> has_doublequote);
> + error_cum = _mm_or_si128(error_cum, 
> has_control);
> +
> + if (_mm_movemask_epi8(error_cum))
> + break;
> +
> + p += sizeof(__m128i);
> + }
> +#endif   /* USE_SSE2 */
> +
>   while (p < end)
>   {
>   if (*p == '\\' || *p == '"')
> -- 
> 2.36.1
> 

I wonder if we can't abstract this at least a bit better. If we go that route
a bit further, then add another arch, this code will be pretty much
unreadable.

Greetings,

Andres Freund

Re: [PATCH] Compression dictionaries for JSONB

2022-07-11 Thread Aleksander Alekseev

Hi hackers,

> I invite anyone interested to join this effort as a co-author!

Here is v5. Same as v4 but with a fixed compiler warning (thanks,
cfbot). Sorry for the noise.

-- 
Best regards,
Aleksander Alekseev


v5-0001-Compression-dictionaries-for-JSONB.patch
Description: Binary data

Re: automatically generating node support functions

2022-07-11 Thread Tom Lane

Peter Eisentraut  writes:
> On 11.07.22 01:09, Tom Lane wrote:
>> Andres Freund  writes:
> I was just rebasing meson ontop of this and was wondering whether the input
> filenames were in a particular order:

> First, things used by later files need to be found in earlier files.  So 
> that constrains the order a bit.

Yeah, the script needs to see supertype nodes before subtype nodes,
else it will not realize that the subtypes are nodes at all.  However,
there is not very much cross-header-file subtyping.  I experimented with
rearranging the input-file order, and found that the *only* thing that
breaks it is to put primnodes.h after pathnodes.h (which fails because
PlaceHolderVar is a subtype of Expr).  You don't even need nodes.h to be
first, which astonished me initially, but then I realized that both
NodeTag and struct Node are special-cased in gen_node_support.pl,
so we know enough to get by even before reading nodes.h.

More generally, the main *nodes.h files themselves are arranged in
pipeline order, eg parsenodes.h #includes primnodes.h.  So that seems
to be a pretty safe thing to rely on even if we grow more cross-header
subtyping cases later.  But I'd vote for putting the incidental files
in alphabetical order.

> Second, the order of the files determines the ordering of the output. 
> The current order of the files reflects approximately the order how the 
> manual code was arranged.  That could be changed.  We could also just 
> sort the node types in the script and dump out everything alphabetically.

+1 for sorting alphabetically.  I experimented with that and it's a
really trivial change.

regards, tom lane

Re: [PATCH] Compression dictionaries for JSONB

2022-07-11 Thread Aleksander Alekseev

Hi hackers,

> OK, I see your point now. And I think this is a very good point.
> Basing "Compression dictionaries" on the API provided by "pluggable
> TOASTer" can also be less hacky than what I'm currently doing with
> `typmod` argument. I'm going to switch the implementation at some
> point, unless anyone will object to the idea.

Here is the rebased patch. I reworked the memory management a bit but
other than that there are no new changes.

So far we seem to have a consensus to:

1. Use bytea instead of NameData to store dictionary entries;

2. Assign monotonically ascending IDs to the entries instead of using
Oids, as it is done with pg_class.relnatts. In order to do this we
should either add a corresponding column to pg_type, or add a new
catalog table, e.g. pg_dict_meta. Personally I don't have a strong
opinion on what is better. Thoughts?

Both changes should be straightforward to implement and also are a
good exercise to newcomers.

I invite anyone interested to join this effort as a co-author! (since,
honestly, rewriting the same feature over and over again alone is
quite boring :D).

-- 
Best regards,
Aleksander Alekseev


v4-0001-Compression-dictionaries-for-JSONB.patch
Description: Binary data

Re: Add connection active, idle time to pg_stat_activity

2022-07-11 Thread Sergey Dudoladov

Hello,

thanks for the helpful review. I have incorporated most of the
suggestions into the patch. I have also rebased and tested the patch
on top of the current master (2cd2569c72b89200).

> +   int64   active_time_diff = 0;
> +   int64   transaction_idle_time_diff = 0;
>
> I think here we can use only a single variable say "state_time_diff" for
> example, as later only one of those two is incremented anyway.

I have written it this way to avoid cluttering the critical section
between PGSTAT_(BEGIN|END)_WRITE_ACTIVITY.
With two variable one can leave only actual increments in the section
and check conditions / call TimestampDifference outside of it.

Regards,
Sergey
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 4549c2560e..a0384fd3a5 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -979,6 +979,26 @@ postgres   27093  0.0  0.0  30096  2752 ?Ss   11:34   0:00 postgres: ser
additional types.
   
  
+
+ 
+  
+   total_active_time double precision
+  
+  
+   Time in milliseconds this backend spent in active and
+   fastpath states.
+  
+ 
+
+ 
+  
+   total_idle_in_transaction_time double precision
+  
+  
+   Time in milliseconds this backend spent in idle in transaction
+   and idle in transaction (aborted) states.
+  
+ 
 

   
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index fedaed533b..3498ea874a 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -864,7 +864,9 @@ CREATE VIEW pg_stat_activity AS
 s.backend_xmin,
 S.query_id,
 S.query,
-S.backend_type
+S.backend_type,
+S.total_active_time,
+S.total_idle_in_transaction_time
 FROM pg_stat_get_activity(NULL) AS S
 LEFT JOIN pg_database AS D ON (S.datid = D.oid)
 LEFT JOIN pg_authid AS U ON (S.usesysid = U.oid);
diff --git a/src/backend/utils/activity/backend_status.c b/src/backend/utils/activity/backend_status.c
index c7ed1e6d7a..27285cb27d 100644
--- a/src/backend/utils/activity/backend_status.c
+++ b/src/backend/utils/activity/backend_status.c
@@ -336,6 +336,8 @@ pgstat_bestart(void)
 	lbeentry.st_activity_start_timestamp = 0;
 	lbeentry.st_state_start_timestamp = 0;
 	lbeentry.st_xact_start_timestamp = 0;
+	lbeentry.st_total_active_time = 0;
+	lbeentry.st_total_transaction_idle_time = 0;
 	lbeentry.st_databaseid = MyDatabaseId;
 
 	/* We have userid for client-backends, wal-sender and bgworker processes */
@@ -524,6 +526,8 @@ pgstat_report_activity(BackendState state, const char *cmd_str)
 	TimestampTz start_timestamp;
 	TimestampTz current_timestamp;
 	int			len = 0;
+	int64		active_time_diff = 0;
+	int64		transaction_idle_time_diff = 0;
 
 	TRACE_POSTGRESQL_STATEMENT_STATUS(cmd_str);
 
@@ -550,6 +554,9 @@ pgstat_report_activity(BackendState state, const char *cmd_str)
 			beentry->st_xact_start_timestamp = 0;
 			beentry->st_query_id = UINT64CONST(0);
 			proc->wait_event_info = 0;
+
+			beentry->st_total_active_time = 0;
+			beentry->st_total_transaction_idle_time = 0;
 			PGSTAT_END_WRITE_ACTIVITY(beentry);
 		}
 		return;
@@ -575,24 +582,31 @@ pgstat_report_activity(BackendState state, const char *cmd_str)
 	 * If the state has changed from "active" or "idle in transaction",
 	 * calculate the duration.
 	 */
-	if ((beentry->st_state == STATE_RUNNING ||
-		 beentry->st_state == STATE_FASTPATH ||
-		 beentry->st_state == STATE_IDLEINTRANSACTION ||
-		 beentry->st_state == STATE_IDLEINTRANSACTION_ABORTED) &&
+	if ((PGSTAT_IS_ACTIVE(beentry) || PGSTAT_IS_IDLEINTRANSACTION(beentry)) &&
 		state != beentry->st_state)
 	{
 		long		secs;
 		int			usecs;
+		int64		usecs_diff;
 
 		TimestampDifference(beentry->st_state_start_timestamp,
 			current_timestamp,
 			&secs, &usecs);
+		usecs_diff = secs * 100 + usecs;
 
-		if (beentry->st_state == STATE_RUNNING ||
-			beentry->st_state == STATE_FASTPATH)
-			pgstat_count_conn_active_time((PgStat_Counter) secs * 100 + usecs);
+		/*
+		 * We update per-backend st_total_active_time and st_total_transaction_idle_time
+		 * separately from pgStatActiveTime and pgStatTransactionIdleTime
+		 * used in pg_stat_database to provide per-DB statistics because
+		 * 1. Changing the former values implies modifying beentry and thus
+		 * have to be wrapped into PGSTAT_*_WRITE_ACTIVITY macros (see below).
+		 * 2. The latter values are reset to 0 once the data has been sent
+		 * to the statistics collector.
+		 */
+		if (PGSTAT_IS_ACTIVE(beentry))
+			active_time_diff = usecs_diff;
 		else
-			pgstat_count_conn_txn_idle_time((PgStat_Counter) secs * 100 + usecs);
+			transaction_idle_time_diff = usecs_diff;
 	}
 
 	/*
@@ -618,6 +632,9 @@ pgstat_report_activity(BackendState state, const char *cmd_str)
 		beentry->st_activity_start_t

Re: automatically generating node support functions

2022-07-11 Thread Tom Lane

Peter Eisentraut  writes:
> On 11.07.22 01:09, Tom Lane wrote:
>> The rest could probably be alphabetical.  I was also wondering if
>> all of them really need to be read at all --- I'm unclear on what
>> access/sdir.h is contributing, for example.

> could not handle type "ScanDirection" in struct "IndexScan" field 
> "indexorderdir"

Ah, I see.  Still, we could also handle that with

push @enum_types, qw(ScanDirection);

which would be exactly one place that needs to know about this, rather
than the three (soon to be four) places that know that access/sdir.h
needs to be read and then mostly ignored.

regards, tom lane

Re: doc: Bring mention of unique index forced transaction wait behavior outside of the internal section

2022-07-11 Thread Aleksander Alekseev

Hi Bruce,

> I was not happy with putting this in the Transaction Isolation section.
> I rewrote it and put it in the INSERT secion, right before ON CONFLICT;
> patch attached.

Looks good.

-- 
Best regards,
Aleksander Alekseev

Re: Reducing Memory Consumption (aset and generation)

2022-07-11 Thread Ranier Vilela

Em seg., 11 de jul. de 2022 às 09:25, Ranier Vilela 
escreveu:

> Hi,
> Thanks for take a look.
>
> Em seg., 11 de jul. de 2022 às 05:48, Matthias van de Meent <
> boekewurm+postg...@gmail.com> escreveu:
>
>> On Tue, 7 Jun 2022 at 03:09, Ranier Vilela  wrote:
>> >
>> > Let's restart this, to simplify the review and commit work.
>>
>> The patchset fails to apply. Could you send an updated version that
>> applies to the current master branch?
>>
> Sure.
>
Here the patchs updated.

regards,
Ranier Vilela


v2-003-aset-reduces-memory-consumption.patch
Description: Binary data


v2-002-generation-reduces-memory-consumption.patch
Description: Binary data


v2-001-aset-reduces-memory-consumption.diff
Description: Binary data

Re: Extending outfuncs support to utility statements

2022-07-11 Thread Tom Lane

Peter Eisentraut  writes:
> On 10.07.22 00:20, Tom Lane wrote:
>> We've long avoided building I/O support for utility-statement node
>> types, mainly because it didn't seem worth the trouble to write and
>> maintain such code by hand.
k
> This is also needed to be able to store utility statements in (unquoted) 
> SQL function bodies.  I have some in-progress code for that that I need 
> to dust off.  IIRC, there are still some nontrivial issues to work 
> through on the reading side.  I don't have a problem with enabling the 
> outfuncs side in the meantime.

Oh!  I'd not thought of that, but yes that is a plausible near-term
requirement for readfuncs support for utility statements.  So my
concern about suppressing those is largely a waste of effort.

There might be enough node types that are raw-parse-tree-only,
but not involved in utility statements, to make it worth
continuing to suppress readfuncs support for them.  But I kinda
doubt it.  I'll try to get some numbers later today.

regards, tom lane

Re: Making CallContext and InlineCodeBlock less special-case-y

2022-07-11 Thread Tom Lane

Peter Eisentraut  writes:
> On 10.07.22 01:50, Tom Lane wrote:
>> As committed, gen_node_support.pl excludes CallContext and InlineCodeBlock
>> from getting unneeded support functions via some very ad-hoc code.

> Couldn't we just enable those support functions?  I think they were just 
> excluded because they didn't have any before and nobody bothered to make 
> any.

Well, we could I suppose, but that path leads to a lot of dead code in
backend/nodes/ --- obviously these two alone are negligible, but I want
a story other than "it's a hack" for execnodes.h and the other files
we exclude from generation of support code.

After sleeping on it, I'm thinking the "pg_node_attr(nodetag_only)"
solution is the way to go, as that can lead to per-node rather than
per-file exclusion of support code, which we're surely going to want
eventually in more places.

regards, tom lane

Re: automatically generating node support functions

2022-07-11 Thread Peter Eisentraut


On 11.07.22 01:09, Tom Lane wrote:

Andres Freund  writes:

I was just rebasing meson ontop of this and was wondering whether the input
filenames were in a particular order:


First, things used by later files need to be found in earlier files.  So 
that constrains the order a bit.


Second, the order of the files determines the ordering of the output. 
The current order of the files reflects approximately the order how the 
manual code was arranged.  That could be changed.  We could also just 
sort the node types in the script and dump out everything alphabetically.



That annoyed me too.  I think it's sensible to list the "main" input
files first, but I'd put them in our traditional pipeline order:


nodes/nodes.h \
nodes/primnodes.h \
nodes/parsenodes.h \
nodes/pathnodes.h \
nodes/plannodes.h \
nodes/execnodes.h \


The seems worth trying out.


The rest could probably be alphabetical.  I was also wondering if
all of them really need to be read at all --- I'm unclear on what
access/sdir.h is contributing, for example.


could not handle type "ScanDirection" in struct "IndexScan" field 
"indexorderdir"

Foreign Key constraints on xocolatl/periods

2022-07-11 Thread Jean Carlo Giambastiani Lopes

Hi,
I'm sending this to pgsql-hackers because Vik Fearing (xocolatl), the reviewer 
of https://commitfest.postgresql.org/30/2316 also has a repository with a pgsql 
implementation of said functionalities: https://github.com/xocolatl/periods.

I have stumbled upon a probable issue 
(https://github.com/xocolatl/periods/issues/27), can anyone take a look and 
confirm if the current behavior is the expected? 

Thanks!

Re: Making CallContext and InlineCodeBlock less special-case-y

2022-07-11 Thread Peter Eisentraut


On 10.07.22 01:50, Tom Lane wrote:

As committed, gen_node_support.pl excludes CallContext and InlineCodeBlock
from getting unneeded support functions via some very ad-hoc code.


Couldn't we just enable those support functions?  I think they were just 
excluded because they didn't have any before and nobody bothered to make 
any.

Re: Extending outfuncs support to utility statements

2022-07-11 Thread Peter Eisentraut


On 10.07.22 00:20, Tom Lane wrote:

We've long avoided building I/O support for utility-statement node
types, mainly because it didn't seem worth the trouble to write and
maintain such code by hand.  Now that the automatic node-support-code
generation patch is in, that argument is gone, and it's just a matter
of whether the benefits are worth the backend code bloat.  I can
see two benefits worth considering:


This is also needed to be able to store utility statements in (unquoted) 
SQL function bodies.  I have some in-progress code for that that I need 
to dust off.  IIRC, there are still some nontrivial issues to work 
through on the reading side.  I don't have a problem with enabling the 
outfuncs side in the meantime.

Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns

2022-07-11 Thread Masahiko Sawada

On Wed, Jul 6, 2022 at 3:01 PM Amit Kapila  wrote:
>
> On Wed, Jul 6, 2022 at 7:38 AM Masahiko Sawada  wrote:
> >
> > I'll post a new version patch in the next email with replying to other 
> > comments.
> >
>
> Okay, thanks for working on this. Few comments/suggestions on
> poc_remember_last_running_xacts_v2 patch:
>
> 1.
> +ReorderBufferSetLastRunningXactsCatalogChanges(ReorderBuffer *rb,
> TransactionId xid,
> +uint32 xinfo, int subxcnt,
> +TransactionId *subxacts, XLogRecPtr lsn)
> +{
> ...
> ...
> +
> + test = bsearch(&xid, rb->last_running_xacts, rb->n_last_running_xacts,
> +sizeof(TransactionId), xidComparator);
> +
> + if (test == NULL)
> + {
> + for (int i = 0; i < subxcnt; i++)
> + {
> + test = bsearch(&subxacts[i], rb->last_running_xacts, 
> rb->n_last_running_xacts,
> +sizeof(TransactionId), xidComparator);
> ...
>
> Is there ever a possibility that the top transaction id is not in the
> running_xacts list but one of its subxids is present? If yes, it is
> not very obvious at least to me so adding a comment here could be
> useful. If not, then why do we need this additional check for each of
> the sub-transaction ids?

I think there is no possibility. The check for subtransactions is not necessary.

>
> 2.
> @@ -627,6 +647,15 @@ DecodeCommit(LogicalDecodingContext *ctx,
> XLogRecordBuffer *buf,
>   commit_time = parsed->origin_timestamp;
>   }
>
> + /*
> + * Set the last running xacts as containing catalog change if necessary.
> + * This must be done before SnapBuildCommitTxn() so that we include catalog
> + * change transactions to the historic snapshot.
> + */
> + ReorderBufferSetLastRunningXactsCatalogChanges(ctx->reorder, xid,
> parsed->xinfo,
> +parsed->nsubxacts, parsed->subxacts,
> +buf->origptr);
> +
>   SnapBuildCommitTxn(ctx->snapshot_builder, buf->origptr, xid,
>  parsed->nsubxacts, parsed->subxacts);
>
> As mentioned previously as well, marking it before SnapBuildCommitTxn
> has one disadvantage, we sometimes do this work even if the snapshot
> state is SNAPBUILD_START or SNAPBUILD_BUILDING_SNAPSHOT in which case
> SnapBuildCommitTxn wouldn't do anything. Can we instead check whether
> the particular txn has invalidations and is present in the
> last_running_xacts list along with the check
> ReorderBufferXidHasCatalogChanges? I think that has the additional
> advantage that we don't need this additional marking if the xact is
> already marked as containing catalog changes.

Agreed.

>
> 3.
> 1.
> + /*
> + * We rely on HEAP2_NEW_CID records and XACT_INVALIDATIONS to know
> + * if the transaction has changed the catalog, and that information
> + * is not serialized to SnapBuilder.  Therefore, if the logical
> + * decoding decodes the commit record of the transaction that actually
> + * has done catalog changes without these records, we miss to add
> + * the xid to the snapshot so up creating the wrong snapshot.
>
> The part of the sentence "... snapshot so up creating the wrong
> snapshot." is not clear. In this comment, at one place you have used
> two spaces after a full stop, and at another place, there is one
> space. I think let's follow nearby code practice to use a single space
> before a new sentence.

Agreed.

>
> 4.
> +void
> +ReorderBufferProcessLastRunningXacts(ReorderBuffer *rb,
> xl_running_xacts *running)
> +{
> + /* Quick exit if there is no longer last running xacts */
> + if (likely(rb->n_last_running_xacts == 0))
> + return;
> +
> + /* First call, build the last running xact list */
> + if (rb->n_last_running_xacts == -1)
> + {
> + int nxacts = running->subxcnt + running->xcnt;
> + Size sz = sizeof(TransactionId) * nxacts;;
> +
> + rb->last_running_xacts = MemoryContextAlloc(rb->context, sz);
> + memcpy(rb->last_running_xacts, running->xids, sz);
> + qsort(rb->last_running_xacts, nxacts, sizeof(TransactionId), xidComparator);
> +
> + rb->n_last_running_xacts = nxacts;
> +
> + return;
> + }
>
> a. Can we add the function header comments for this function?

Updated.

> b. We seem to be tracking the running_xact information for the first
> running_xact record after start/restart. The name last_running_xacts
> doesn't sound appropriate for that, how about initial_running_xacts?

Sound good, updated.

>
> 5.
> + /*
> + * Purge xids in the last running xacts list if we can do that for at least
> + * one xid.
> + */
> + if (NormalTransactionIdPrecedes(rb->last_running_xacts[0],
> + running->oldestRunningXid))
>
> I think it would be a good idea to add a few lines here explaining why
> it is safe to purge. IIUC, it is because the commit for those xacts
> would have already been processed and we don't need such a xid
> anymore.

Right, updated.

>
> 6. As per the discussion above in this thread having
> XACT_XINFO_HAS_INVALS in the commit record doesn't indicate that the
> xact has catalog changes, so can we add somewhere in comments that for
> such a case we can't distinguish whether the txn has catalog change
> but we still mark the txn has cata

Re: Avoid unecessary MemSet call (src/backend/utils/cache/relcache.c)

2022-07-11 Thread Peter Eisentraut




On 07.07.22 13:16, Alvaro Herrera wrote:

On 2022-Jul-07, Peter Eisentraut wrote:


diff --git a/src/bin/pg_basebackup/pg_basebackup.c 
b/src/bin/pg_basebackup/pg_basebackup.c
index 4445a86aee..79b23fa7d7 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c



@@ -1952,7 +1948,6 @@ BaseBackup(char *compression_algorithm, char 
*compression_detail,
else
starttli = latesttli;
PQclear(res);
-   MemSet(xlogend, 0, sizeof(xlogend));
  
  	if (verbose && includewal != NO_WAL)

pg_log_info("write-ahead log start point: %s on timeline %u",


You removed the MemSet here, but there's no corresponding
initialization.


Maybe that was an oversight by me, but it seems to me that that 
initialization was useless anyway, since xlogend is later 
unconditionally overwritten anyway.



diff --git a/src/port/snprintf.c b/src/port/snprintf.c
index abb1c59770..e646b0e642 100644
--- a/src/port/snprintf.c
+++ b/src/port/snprintf.c
@@ -756,12 +756,9 @@ find_arguments(const char *format, va_list args,
int longflag;
int fmtpos;
int i;
-   int last_dollar;
-   PrintfArgType argtypes[PG_NL_ARGMAX + 1];
-
/* Initialize to "no dollar arguments known" */
-   last_dollar = 0;
-   MemSet(argtypes, 0, sizeof(argtypes));
+   int last_dollar = 0;
+   PrintfArgType argtypes[PG_NL_ARGMAX + 1] = {0};


pgindent will insert a blank line before the comment, which I personally
find quite ugly (because it splits the block of declarations).


Yeah.  I think I can convert that to an end-of-line comment instead.

Re: CREATE TABLE ( .. STORAGE ..)

2022-07-11 Thread Peter Eisentraut


On 11.07.22 11:27, Aleksander Alekseev wrote:

Here is a patch updated according to all the recent feedback, except
for two suggestions:


In v4 I forgot to list possible arguments for STORAGE in
alter_table.sgml, similarly as it is done for other subcommands. Here
is a corrected patch.


Here is the rebased patch.


The "safety check: do not allow toasted storage modes unless column 
datatype is TOAST-aware" could be moved into GetAttributeStorage(), so 
it doesn't have to be repeated.  (Note that GetAttributeCompression() 
does similar checking.)


ATExecSetStorage() currently doesn't do any such check, and your patch 
isn't adding one.  Is there a reason for that?

Re: pg15b2: large objects lost on upgrade

2022-07-11 Thread Robert Haas

On Sun, Jul 10, 2022 at 9:31 PM Michael Paquier  wrote:
> Hmm.  That would mean that the more LOs a cluster has, the more bloat
> there will be in the new cluster once the upgrade is done. That could
> be quite a few gigs worth of data laying around depending on the data
> inserted in the source cluster, and we don't have a way to know which
> files to remove post-upgrade, do we?

The files that are being leaked here are the files backing the
pg_largeobject table and the corresponding index as they existed in
the new cluster just prior to the upgrade. Hopefully, the table is a
zero-length file and the index is just one block, because you're
supposed to use a newly-initdb'd cluster as the target for a
pg_upgrade operation. Now, you don't actually have to do that: as
we've been discussing, there aren't as many sanity checks in this code
as there probably should be. But it would still be surprising to
initdb a new cluster, load gigabytes of large objects into it, and
then use it as the target cluster for a pg_upgrade.

As for whether it's possible to know which files to remove
post-upgrade, that's the same problem as trying to figure out whether
their are orphaned files in any other PostgreSQL cluster. We don't
have a tool for it, but if you're sure that the system is more or less
quiescent - no uncommitted DDL, in particular - you can compare
pg_class.relfilenode to the actual filesystem contents and figure out
what extra stuff is present on the filesystem level.

I am not saying we shouldn't try to fix this up more thoroughly, just
that I think you are overestimating the consequences.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: [PATCH] Log details for client certificate failures

2022-07-11 Thread Peter Eisentraut



On 08.07.22 20:39, Jacob Champion wrote:

I also added an optional 0002 that bubbles the error info up to the
final ereport(ERROR), using errdetail() and errhint(). I can squash it
into 0001 if you like it, or drop it if you don't. (This approach
could be adapted to the client, too.)


I squashed those two together.  I also adjusted the error message a bit 
more for project style.  (We can put both lines into detail.)


I had to read up on this "ex_data" API.  Interesting.  But I'm wondering 
a bit about how the life cycle of these objects is managed.  What 
happens if the allocated error string is deallocated before its 
containing object?  Or vice versa?  How do we ensure we don't 
accidentally reuse the error string when the code runs again?  (I guess 
currently it can't?)  Maybe we should avoid this and just put the 
errdetail itself into a static variable that we unset once the message 
is printed?From e9bbf69618bfea433a6916af9cdcc59f04b96eda Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Mon, 11 Jul 2022 14:28:43 +0200
Subject: [PATCH v5] Log details for client certificate failures

Currently, debugging client cert verification failures is mostly
limited to looking at the TLS alert code on the client side.  For
simple deployments, sometimes it's enough to see "sslv3 alert
certificate revoked" and know exactly what needs to be fixed, but if
you add any more complexity (multiple CA layers, misconfigured CA
certificates, etc.), trying to debug what happened based on the TLS
alert alone can be an exercise in frustration.

Luckily, the server has more information about exactly what failed in
the chain, and we already have the requisite callback implemented as a
stub.  Add an ex_data pointer to the SSL handle, so that we can store
error information from the verification callback.  This lets us add
our error details directly to the final "could not accept SSL
connection" log message, as opposed to issuing intermediate LOGs.

It ends up looking like

LOG:  connection received: host=localhost port=43112
LOG:  could not accept SSL connection: certificate verify failed
DETAIL:  Client certificate verification failed at depth 1: unable to get 
local issuer certificate.
Failed certificate data (unverified): subject "/CN=Test CA for 
PostgreSQL SSL regression test client certs", serial number 
2315134995201656577, issuer "/CN=Test root CA for PostgreSQL SSL regression 
test suite".

The length of the Subject and Issuer strings is limited to prevent
malicious client certs from spamming the logs.  In case the truncation
makes things ambiguous, the certificate's serial number is also
logged.

Author: Jacob Champion 
Discussion: 
https://www.postgresql.org/message-id/flat/d13c4a5787c2a3f83705124f0391e0738c796751.ca...@vmware.com
---
 src/backend/libpq/be-secure-openssl.c | 132 +-
 src/test/ssl/conf/client-long.config  |  14 +++
 src/test/ssl/ssl/client-long.crt  |  20 
 src/test/ssl/ssl/client-long.key  |  27 ++
 src/test/ssl/sslfiles.mk  |   2 +-
 src/test/ssl/t/001_ssltests.pl|  40 +++-
 src/test/ssl/t/SSL/Backend/OpenSSL.pm |   2 +-
 7 files changed, 230 insertions(+), 7 deletions(-)
 create mode 100644 src/test/ssl/conf/client-long.config
 create mode 100644 src/test/ssl/ssl/client-long.crt
 create mode 100644 src/test/ssl/ssl/client-long.key

diff --git a/src/backend/libpq/be-secure-openssl.c 
b/src/backend/libpq/be-secure-openssl.c
index 3d0168a369..8d5eee0d16 100644
--- a/src/backend/libpq/be-secure-openssl.c
+++ b/src/backend/libpq/be-secure-openssl.c
@@ -81,6 +81,14 @@ static bool ssl_is_server_start;
 static int ssl_protocol_version_to_openssl(int v);
 static const char *ssl_protocol_version_to_string(int v);
 
+/* Helpers for storing application data in the SSL handle */
+struct pg_ex_data
+{
+   /* to bubble errors out of callbacks */
+   char   *errdetail;
+};
+static int ssl_ex_index;
+
 /*  */
 /*  Public interface   
*/
 /*  */
@@ -102,6 +110,7 @@ be_tls_init(bool isServerStart)
SSL_library_init();
SSL_load_error_strings();
 #endif
+   ssl_ex_index = SSL_get_ex_new_index(0, NULL, NULL, NULL, NULL);
SSL_initialized = true;
}
 
@@ -413,6 +422,7 @@ be_tls_open_server(Port *port)
int err;
int waitfor;
unsigned long ecode;
+   struct pg_ex_data exdata = {0};
boolgive_proto_hint;
 
Assert(!port->ssl);
@@ -445,6 +455,15 @@ be_tls_open_server(Port *port)

SSLerrmessage(ERR_get_error();
return -1;
}
+   if (!SSL_set_ex_data(port->ssl, ssl_ex_index, &ex

Re: Commitfest Update

2022-07-11 Thread Matthias van de Meent

On Fri, 8 Jul 2022, 23:41 Jacob Champion,  wrote:
>
> On 3/31/22 07:37, Tom Lane wrote:
>> Robert Haas  writes:
>>> On Thu, Mar 31, 2022 at 10:11 AM Tom Lane  wrote:
 ... Would it be feasible or reasonable
 to drop reviewers if they've not commented in the thread in X amount
 of time?
>>
>>> In theory, this might cause someone who made a valuable contribution
>>> to the discussion to not get credited in the commit message. But it
>>> probably wouldn't in practice, because I at least always construct the
>>> list of reviewers from the thread, not the CF app, since that tends to
>>> be wildly inaccurate in both directions. So maybe it's fine? Not sure.
>>
>> Hmm, I tend to believe what's in the CF app, so maybe I'm dropping the
>> ball on review credits :-(.  But there are various ways we could implement
>> this.  One way would be a nagbot that sends private email along the lines
>> of "you haven't commented on patch X in Y months.  Please remove your name
>> from the list of reviewers if you don't intend to review it any more."
>
> It seems there wasn't a definitive decision here. Are there any
> objections to more aggressive pruning of the Reviewers entries? So
> committers would need to go through the thread for full attribution,
> moving forward.
>
> If there are no objections, I'll start doing that during next Friday's
> patch sweep.

No objections, but this adds another item to the implicit commitfest
app user manual, that to the best of my knowledge seems to be mostly
implicit institutional knowledge plus bits of information spread
around the mailing lists.

Do we have an actual manual or otherwise a (single?) place with
documentation on how certain fields of the CFA should be used or
interpreted, so that (new) hackers know what to expect or where to
look?

Examples of information about using the CFA that I couldn't find:
- The Version field may contain a single specific postgresql version
number, 'stable', or nothing. If my patch needs backpatching to all
backbranches, which do I select? The oldest supported PG version, or
'stable'? Related to that: which version is indicated by 'stable'?

- When creating or updating a CF entry, who are responsible for
filling in which fields? May the author assign reviewers/committers,
or should they do so themselves?

- Should the 'git link' be filled with a link to the committed patch
once committed, or is it a general purpose link to share a git
repository with the proposed changes?

- What should (or shoudn't) Annotations be used for?

- What should I expect of the comment / review system of the CFA?
Should I use feature over direct on-list mails?

I have checked the wiki page on becoming a developer [0], but that
page seems woefully outdated with statements like "Commitfests are
scheduled to start on the 15th of the month" which hasn't been true
since 2015. The pages on Commitfests [1] and the Developer FAQ [2]
don't add much help either on how to use the CommitFest app. Even
(parts of) the checklist for the CFM on the wiki [3] still assumes the
old CF app that was last used in 2014: "It's based on the current
CommitFest app (written by Robert Haas), and will change once the new
CF App is done."

I'm not asking anyone to drop all and get all the features of the CFA
documented, but for my almost 2 years of following the -hackers list I
feel like I still haven't gotten a good understanding of the
application that is meant to coordinate the shared state in patch
development, and I think that's a quite a shame.

-Matthias

[0] https://wiki.postgresql.org/wiki/So,_you_want_to_be_a_developer%3F
[1] https://wiki.postgresql.org/wiki/CommitFest
[2] https://wiki.postgresql.org/wiki/Developer_FAQ
[3] https://wiki.postgresql.org/wiki/CommitFest_Checklist

Re: Non-replayable WAL records through overflows and >MaxAllocSize lengths

2022-07-11 Thread Matthias van de Meent

On Fri, 8 Jul 2022 at 21:35, David Zhang  wrote:
>
> Hi,
>
> I tried to apply this patch v5 to current master branch but it complains,
> "git apply --check
> v5-0001-Add-protections-in-xlog-record-APIs-against-large.patch
> error: patch failed: src/include/access/xloginsert.h:43
> error: src/include/access/xloginsert.h: patch does not apply"
>
> then I checked it out before the commit
> `b0a55e43299c4ea2a9a8c757f9c26352407d0ccc` and applied this v5 patch.

The attached rebased patchset should work with master @ 2cd2569c and
REL_15_STABLE @ 53df1e28. I've also added a patch that works for PG14
and earlier, which should be correct for all versions that include
commit 2c03216d (that is, all versions back to 9.5).

> 1) both make check and make installcheck passed.
>
> 2) and I can also see this patch v5 prevents the error happens previously,
>
> "postgres=# SELECT pg_logical_emit_message(false, long, long) FROM
> repeat(repeat(' ', 1024), 1024*1023) as l(long);
> ERROR:  too much WAL data"
>
> 3) without this v5 patch, the same test will cause the standby crash
> like below, and the standby not be able to boot up after this crash.

Thanks for reviewing.

Kind regards,

Matthias van de Meent


v6-0001-Add-protections-in-xlog-record-APIs-against-large.patch
Description: Binary data


v6-0001-Add-protections-in-xlog-record-APIs-against-large.v15-patch
Description: Binary data


v6-0001-Add-protections-in-xlog-record-APIs-against-large.v14-patch
Description: Binary data

Re: Reducing Memory Consumption (aset and generation)

2022-07-11 Thread Ranier Vilela

Hi,
Thanks for take a look.

Em seg., 11 de jul. de 2022 às 05:48, Matthias van de Meent <
boekewurm+postg...@gmail.com> escreveu:

> On Tue, 7 Jun 2022 at 03:09, Ranier Vilela  wrote:
> >
> > Let's restart this, to simplify the review and commit work.
>
> The patchset fails to apply. Could you send an updated version that
> applies to the current master branch?
>
Sure.


>
> > Regarding the patches now, we have:
> > 1) v1-001-aset-reduces-memory-consumption.patch
> > Reduces memory used by struct AllocBlockData by minus 8 bits,
>
> This seems reasonable, considering we don't generally use the field
> for anything but validation.
>
> > reducing the total size to 32 bits, which leads to "fitting" two structs
> in a 64bit cache.
>
> By bits, you mean bytes, right?
>
Correct.


>
> Regarding fitting 2 structs in 64 bytes, that point is moot, as each
> of these structs are stored at the front of each malloc-ed block, so
> you will never see more than one of these in the same cache line. Less
> space used is nice, but not as critical there IMO.
>
> > Remove tests elog(ERROR, "could not find block containing chunk %p" and
> > elog(ERROR, "could not find block containing chunk %p", moving them to
> > MEMORY_CONTEXT_CHECKING context.
> >
> > Since 8.2 versions, nobody complains about these tests.
> >
> > But if is not acceptable, have the option (3)
> v1-003-aset-reduces-memory-consumption.patch
> >
> > 2) v1-002-generation-reduces-memory-consumption.patch
> > Reduces memory used by struct GenerationBlock, by minus 8 bits,
>
> That seems fairly straight-forward -- 8 bytes saved on each page isn't
> a lot, but it's something.
>
> > reducing the total size to 32 bits, which leads to "fitting" two structs
> in a 64bit cache.
>
> Your size accounting seems wrong. On 64-bit architectures, we have
> dlist_node (=16) + Size (=8) + 2*int (=8) + 2 * (char*) (=16) = 48
> bytes. Shaving off the Size field reduces that by 8 bytes to 40 bytes.
>
> The argument of fitting 2 of these structures into one cache line is
> moot again, because here, too, two of this struct will not share a
> cache line (unless somehow we allocate 0-sized blocks, which would be
> a bug).
>
Right. I think I was very tired.


>
> > 3) v1-003-aset-reduces-memory-consumption.patch
> > Same to the (1), but without remove the tests:
> > elog(ERROR, "could not find block containing chunk %p" and
> > elog(ERROR, "could not find block containing chunk %p",
> > But at the cost of removing a one tiny part of the tests and
> > moving them to MEMORY_CONTEXT_CHECKING context.
>
> I like this patch over 001 due to allowing less corruption to occur in
> the memory context code. This allows for detecting some issues in 003,
> as opposed to none in 001.
>
I understand.

regards,
Ranier Vilela

Re: Pluggable toaster

2022-07-11 Thread Nikita Malakhov

Hi!
We have branch with incremental commits worm where patches were generated
with format-patch -
https://github.com/postgrespro/postgres/tree/toasterapi_clean
I'll clean up commits from garbage files asap, sorry, haven't noticed them
while moving changes.

Best regards,
Nikita Malakhov

On Fri, Jul 1, 2022 at 3:27 PM Matthias van de Meent <
boekewurm+postg...@gmail.com> wrote:

> On Thu, 30 Jun 2022 at 22:26, Nikita Malakhov  wrote:
> >
> > Hi hackers!
> > Here is the patch set rebased onto current master (15 rel beta 2 with
> commit from 29.06).
>
> Thanks!
>
> > Just to remind:
> > With this patch set you will be able to develop and plug in your own
> TOAST mechanics for table columns. Knowing internals and/or workflow and
> workload
> > of data being TOASTed makes Custom Toasters much more efficient in
> performance and storage.
>
> The new toast API doesn't seem to be very well documented, nor are the
> new features. Could you include a README or extend the comments on how
> this is expected to work, and/or how you expect people to use (the
> result of) `get_vtable`?
>
> > Patch set consists of 9 incremental patches:
> > [...]
> > 0002_toaster_interface_v7.patch - TOAST API interface and SQL syntax
> allowing creation of custom Toaster (CREATE TOASTER ...)
> > and setting Toaster to a table column (CREATE TABLE t (data bytea
> STORAGE EXTERNAL TOASTER bytea_toaster);)
>
> This patch 0002 seems to include changes to log files (!) that don't
> exist in current HEAD, but at the same time are not created by patch
> 0001. Could you please check and sanitize your patches to ensure that
> the changes are actually accurate?
>
> Like Robert Haas mentioned earlier[0], please create a branch in a git
> repository that has a commit containing the changes for each patch,
> and then use git format-patch to generate a single patchset, one that
> shares a single version number. Keeping track of what patches are
> needed to test this CF entry is already quite difficult due to the
> amount of patches and their packaging (I'm having troubles managing
> these seperate .patch.gz), and the different version tags definitely
> don't help in finding the correct set of patches to apply once
> downloaded.
>
> Kind regards,
>
> Matthias van de Meent
>
> [0]
> https://www.postgresql.org/message-id/CA%2BTgmoZBgNipyKuQAJzNw2w7C9z%2B2SMC0SAHqCnc_dG1nSLNcw%40mail.gmail.com
>

Re: Handle infinite recursion in logical replication setup

2022-07-11 Thread Amit Kapila

On Sat, Jul 9, 2022 at 8:11 PM vignesh C  wrote:
>

Thanks, a few more comments on v30_0002*
1.
+/*
+ * Represents whether copy_data parameter is specified with off, on or force.

A comma is required after on.

2.
  qsort(subrel_local_oids, list_length(subrel_states),
sizeof(Oid), oid_cmp);

+ check_pub_table_subscribed(wrconn, sub->publications, copy_data,
+sub->origin, subrel_local_oids,
+list_length(subrel_states));

We can avoid using list_length by using an additional variable in this case.

3.
errmsg("table: \"%s.%s\" might have replicated data in the publisher",
+nspname, relname),

Why ':' is used after the table in the above message? I don't see such
a convention at other places in the code. Also, having might in the
error messages makes it less clear, so, can we slightly change the
message as in the attached patch?

4. I have made some additional changes in the comments, kindly check
the attached and merge those if you are okay.

5.
+$node_C->safe_psql(
+ 'postgres', "
+DELETE FROM tab_full");
+$node_B->safe_psql(
+ 'postgres', "
+DELETE FROM tab_full where a = 13");

Don't we need to wait for these operations to replicate?

-- 
With Regards,
Amit Kapila.


v30_0002_amit.patch
Description: Binary data

1 2 >

1 - 100 of 106 matches

Mail list logo