Re: Add primary keys to system catalogs

2021-01-20 Thread Peter Eisentraut

On 2021-01-18 18:23, Tom Lane wrote:

The reason I got interested in this right now is the nearby
discussion [1] about why findoidjoins misses some catalog
relationships and whether we should fix that and/or make its results
more readily accessible.  I'd thought perhaps FK constraint entries
could supersede the need for that tool altogether, but now it seems
like not.


Yes, catalog consistency checking was also something I had in mind. 
There are clearly a number of complications still to resolve.  But in
general, representing relationships between tables in the
DDL/schema sounds like a sensible idea.


--
Peter Eisentraut
2ndQuadrant, an EDB company
https://www.2ndquadrant.com/




Re: Add primary keys to system catalogs

2021-01-20 Thread Peter Eisentraut

On 2021-01-18 00:35, Robert Haas wrote:

I don't have any complaint about labelling some of the unique indexes
as primary keys, but I think installing foreign keys that don't really
enforce anything may lead to confusion.


FWIW, "not enforced" constraints (such as foreign keys) is a feature 
that is in the SQL standard and in other SQL implementations.


--
Peter Eisentraut
2ndQuadrant, an EDB company
https://www.2ndquadrant.com/




Re: Parallel INSERT (INTO ... SELECT ...)

2021-01-20 Thread Greg Nancarrow
On Mon, Dec 21, 2020 at 1:50 PM Hou, Zhijie  wrote:
>
> Hi
>
> +
> +   index_oid_list = RelationGetIndexList(rel);
> ...
>
> As mentioned in the comments of RelationGetIndexList:
> * we return a copy of the list palloc'd in the caller's context.  The caller
> * may list_free() the returned list after scanning it.
>
> Shall we list_free(index_oid_list) at the end of the function,
> just to avoid a potential memory leak?
>

I think that's a good idea, so I'll make that update in the next
version of the patch.
I do notice, however, that there seem to be quite a few places in the
Postgres code where RelationGetIndexList() is called without a
corresponding list_free(), obviously relying instead on the list being
deallocated when the caller's memory context is destroyed.
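
For reference, a minimal sketch of the pattern being suggested (the
function and variable names here are illustrative, not taken from the
patch):

#include "postgres.h"

#include "nodes/pg_list.h"
#include "utils/rel.h"
#include "utils/relcache.h"

/*
 * Illustrative only: walk the indexes of "rel" and then free the list
 * copy that RelationGetIndexList() palloc'd in our memory context.
 */
static void
scan_indexes_and_free(Relation rel)
{
    List       *index_oid_list = RelationGetIndexList(rel);
    ListCell   *lc;

    foreach(lc, index_oid_list)
    {
        Oid         index_oid = lfirst_oid(lc);

        (void) index_oid;       /* ... inspect each index here ... */
    }

    /* Avoid the potential leak mentioned above. */
    list_free(index_oid_list);
}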

Regards,
Greg Nancarrow
Fujitsu Australia




Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

2021-01-20 Thread Bharath Rupireddy
On Thu, Jan 21, 2021 at 12:17 PM Fujii Masao
 wrote:
> On 2021/01/21 14:46, Bharath Rupireddy wrote:
> > On Thu, Jan 21, 2021 at 10:06 AM Fujii Masao
> >  wrote:
>  +   if (entry->server_hashvalue == hashvalue &&
>  +   (entry->xact_depth > 0 || result))
>  +   {
>  +   hash_seq_term();
>  +   break;
> 
>  entry->server_hashvalue can be 0? If yes, since postgres_fdw_disconnect_all()
>  specifies 0 as hashvalue, ISTM that the above condition can be true
>  unexpectedly. Can we replace this condition with just "if (!all)"?
> >>>
> >>> I don't think entry->server_hashvalue can be zero, because
> >>> GetSysCacheHashValue1/CatalogCacheComputeHashValue will not return 0
> >>> as a hash value. I have not seen anyone comparing a hashvalue with the
> >>> expectation that it can be 0; for instance, see if (hashvalue == 0
> >>> || riinfo->oidHashValue == hashvalue).
> >>>
> >>> Having if (!all) with something like below there doesn't suffice, because we
> >>> might call hash_seq_term when some connection other than the given
> >>> foreign server's connection is in use.
> >>
> >> No because we check the following condition before reaching that code. No?
> >>
> >> +   if ((all || entry->server_hashvalue == hashvalue) &&
> >>
> >>
> >> I was thinking that the "(entry->xact_depth > 0 || result))" condition is not
> >> necessary, because "result" is set to true when xact_depth <= 0, so that
> >> condition is always true.
> >
> > I think that condition is too confusing. How about having a boolean
> > can_terminate_scan like below?
>
> Thanks for thinking about this. But at least to me, "if (!all)" does not look
> so confusing, and the comment seems to explain why we can end the scan.

May I know if it's okay to have the boolean can_terminate_scan as shown in [1]?

[1] - 
https://www.postgresql.org/message-id/flat/CALj2ACVx0%2BiOsrAA-wXbo3RLAKqUoNvvEd7foJ0vLwOdu8XjXw%40mail.gmail.com

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: Parallel INSERT (INTO ... SELECT ...)

2021-01-20 Thread Greg Nancarrow
On Wed, Dec 23, 2020 at 1:45 PM Amit Kapila  wrote:
>
> On Wed, Dec 23, 2020 at 7:52 AM Hou, Zhijie  wrote:
> >
> > Hi
> >
> > > > I may be wrong, and if I missed something in previous mails, please give
> > > > me some hints. IMO, serial insertion with an underlying parallel SELECT
> > > > can be considered for a foreign table or temporary table, as the
> > > > insertions only happen in the leader process.
> > > >
> > >
> > > I don't think we support parallel scan for temporary tables. Can you
> > > please try once both of these operations without Insert being involved?
> > > If you are able to produce a parallel plan without Insert then we can see
> > > why it is not supported with Insert.
> >
> > Sorry, maybe I did not express it clearly; I actually mean the case where
> > the INSERT's target table (not one in the SELECT part) is temporary.
> > And you are right that parallel SELECT is not enabled when a temporary
> > table is in the SELECT part.
> >
>
> I think Select can be parallel for this case and we should support this case.
>

So I think we're saying that if the target table is a foreign table or
temporary table, it can be regarded as PARALLEL_RESTRICTED, right?

i.e. code-wise:

/*
-* We can't support table modification in parallel-mode if it's a foreign
-* table/partition (no FDW API for supporting parallel access) or a
+* We can't support table modification in a parallel worker if it's a
+* foreign table/partition (no FDW API for supporting parallel access) or a
 * temporary table.
 */
if (rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE ||
RelationUsesLocalBuffers(rel))
{
-   table_close(rel, lockmode);
-   context->max_hazard = PROPARALLEL_UNSAFE;
-   return true;
+   if (max_parallel_hazard_test(PROPARALLEL_RESTRICTED, context))
+   {
+   table_close(rel, lockmode);
+   return true;
+   }
}


Regards,
Greg Nancarrow
Fujitsu Australia




Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

2021-01-20 Thread Fujii Masao




On 2021/01/21 14:46, Bharath Rupireddy wrote:

On Thu, Jan 21, 2021 at 10:06 AM Fujii Masao
 wrote:
+   if (entry->server_hashvalue == hashvalue &&

+   (entry->xact_depth > 0 || result))
+   {
+   hash_seq_term();
+   break;

entry->server_hashvalue can be 0? If yes, since postgres_fdw_disconnect_all()
specifies 0 as hashvalue, ISTM that the above condition can be true
unexpectedly. Can we replace this condition with just "if (!all)"?


I don't think entry->server_hashvalue can be zero, because
GetSysCacheHashValue1/CatalogCacheComputeHashValue will not return 0
as a hash value. I have not seen anyone comparing a hashvalue with the
expectation that it can be 0; for instance, see if (hashvalue == 0
|| riinfo->oidHashValue == hashvalue).

Having if (!all) with something like below there doesn't suffice, because we
might call hash_seq_term when some connection other than the given
foreign server's connection is in use.


No because we check the following condition before reaching that code. No?

+   if ((all || entry->server_hashvalue == hashvalue) &&


I was thinking that the "(entry->xact_depth > 0 || result))" condition is not
necessary, because "result" is set to true when xact_depth <= 0, so that
condition is always true.


I think that condition is too confusing. How about having a boolean
can_terminate_scan like below?


Thanks for thinking about this. But at least to me, "if (!all)" does not look
so confusing, and the comment seems to explain why we can end the scan.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




RE: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

2021-01-20 Thread Hou, Zhijie
> > > Attaching v15 patch set. Please consider it for further review.
> >
> > Hi
> >
> > I have some comments for the 0001 patch
> >
> > In v15-0001-postgres_fdw-function-to-discard-cached-connecti
> >
> > 1.
> > +  If there is no open connection to the given foreign server, false
> > +  is returned. If no foreign server with the given name is found, an error
> >
> > Do you think it's better to add some test cases for calling
> > postgres_fdw_disconnect and postgres_fdw_disconnect_all
> > when there is no open connection to the given foreign server?
> 
> Do you mean a test case where the foreign server exists but
> postgres_fdw_disconnect() returns false because there's no connection for
> that server?


Yes, I read this from the doc, so I think it's better to test this.




> > 2.
> > +   /*
> > +* For the given server, if we closed connection or it is still in
> > +* use, then no need of scanning the cache further.
> > +*/
> > +   if (entry->server_hashvalue == hashvalue &&
> > +   (entry->xact_depth > 0 || result))
> > +   {
> > +   hash_seq_term();
> > +   break;
> > +   }
> >
> > If I am not wrong, is the following condition always true?
> > (entry->xact_depth > 0 || result)
> 
> It's not always true. But since it seems too confusing, please have
> a look at the upthread suggestion to change this to a can_terminate_scan
> boolean.

Thanks for the reminder, I will look at that.



Best regards,
houzj




Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

2021-01-20 Thread Bharath Rupireddy
On Thu, Jan 21, 2021 at 11:15 AM Hou, Zhijie  wrote:
>
> > Attaching v15 patch set. Please consider it for further review.
>
> Hi
>
> I have some comments for the 0001 patch
>
> In v15-0001-postgres_fdw-function-to-discard-cached-connecti
>
> 1.
> +  If there is no open connection to the given foreign server, false
> +  is returned. If no foreign server with the given name is found, an error
>
> Do you think it's better to add some test cases for calling
> postgres_fdw_disconnect and postgres_fdw_disconnect_all when
> there is no open connection to the given foreign server?

Do you mean a test case where the foreign server exists but
postgres_fdw_disconnect() returns false because there's no connection
for that server?

> 2.
> +   /*
> +* For the given server, if we closed connection or it is still in
> +* use, then no need of scanning the cache further.
> +*/
> +   if (entry->server_hashvalue == hashvalue &&
> +   (entry->xact_depth > 0 || result))
> +   {
> +   hash_seq_term();
> +   break;
> +   }
>
> If I am not wrong, is the following condition always true?
> (entry->xact_depth > 0 || result)

It's not always true. But since it seems too confusing, please
have a look at the upthread suggestion to change this to a
can_terminate_scan boolean.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: Parallel INSERT (INTO ... SELECT ...)

2021-01-20 Thread Greg Nancarrow
On Thu, Jan 21, 2021 at 12:47 PM Hou, Zhijie  wrote:
>
> >
> Hi
>
> It seems there are some previous comments [1][2] not addressed in the current
> patch.
> Just to make sure they're not missed.
>
> [1]
> https://www.postgresql.org/message-id/77e1c06ffb2240838e5fc94ec8dcb7d3%40G08CNEXMBPEKD05.g08.fujitsu.local
>
> [2]
> https://www.postgresql.org/message-id/CAA4eK1LMmz58ej5BgVLJ8VsUGd%3D%2BKcaA8X%3DkStORhxpfpODOxg%40mail.gmail.com
>

Thanks for alerting me to those, somehow I completely missed them,
sorry about that.
I'll be sure to investigate and address them in the next version of
the patch I post.

Regards,
Greg Nancarrow
Fujitsu Australia




Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

2021-01-20 Thread Bharath Rupireddy
On Thu, Jan 21, 2021 at 10:06 AM Fujii Masao
 wrote:
 > >> +   if (entry->server_hashvalue == hashvalue &&
> >> +   (entry->xact_depth > 0 || result))
> >> +   {
> >> +   hash_seq_term();
> >> +   break;
> >>
> >> entry->server_hashvalue can be 0? If yes, since 
> >> postgres_fdw_disconnect_all()
> >> specifies 0 as hashvalue, ISTM that the above condition can be true
> >> unexpectedly. Can we replace this condition with just "if (!all)"?
> >
> > I don't think entry->server_hashvalue can be zero, because
> > GetSysCacheHashValue1/CatalogCacheComputeHashValue will not return 0
> > as a hash value. I have not seen anyone comparing a hashvalue with the
> > expectation that it can be 0; for instance, see if (hashvalue == 0
> > || riinfo->oidHashValue == hashvalue).
> >
> > Having if (!all) with something like below there doesn't suffice, because we
> > might call hash_seq_term when some connection other than the given
> > foreign server's connection is in use.
>
> No because we check the following condition before reaching that code. No?
>
> +   if ((all || entry->server_hashvalue == hashvalue) &&
>
>
> I was thinking that the "(entry->xact_depth > 0 || result))" condition is not
> necessary, because "result" is set to true when xact_depth <= 0, so that
> condition is always true.

I think that condition is too confusing. How about having a boolean
can_terminate_scan like below?

while ((entry = (ConnCacheEntry *) hash_seq_search()))
{
bool        can_terminate_scan = false;

/*
 * Either disconnect given or all the active and not in use cached
 * connections.
 */
if ((all || entry->server_hashvalue == hashvalue) &&
 entry->conn)
{
/* We cannot close connection that's in use, so issue a warning. */
if (entry->xact_depth > 0)
{
ForeignServer *server;

if (!all)
can_terminate_scan = true;

server = GetForeignServerExtended(entry->serverid,
  FSV_MISSING_OK);

if (!server)
{
/*
 * If the server has been dropped in the current explicit
 * transaction, then this entry would have been invalidated
 * in pgfdw_inval_callback at the end of drop sever
 * command. Note that this connection would not have been
 * closed in pgfdw_inval_callback because it is still being
 * used in the current explicit transaction. So, assert
 * that here.
 */
Assert(entry->invalidated);

ereport(WARNING,
        (errmsg("cannot close dropped server connection because it is still in use")));
}
else
ereport(WARNING,
        (errmsg("cannot close connection for server \"%s\" because it is still in use",
                server->servername)));
}
else
{
elog(DEBUG3, "discarding connection %p", entry->conn);
disconnect_pg_server(entry);
result = true;

if (!all)
can_terminate_scan = true;
}

/*
 * For the given server, if we closed connection or it is still in
 * use, then no need of scanning the cache further.
 */
if (can_terminate_scan)
{
hash_seq_term();
break;
}
}
}

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




RE: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

2021-01-20 Thread Hou, Zhijie
> Attaching v15 patch set. Please consider it for further review.

Hi

I have some comments for the 0001 patch

In v15-0001-postgres_fdw-function-to-discard-cached-connecti

1.
+  If there is no open connection to the given foreign server, false
+  is returned. If no foreign server with the given name is found, an error

Do you think it's better to add some test cases for calling
postgres_fdw_disconnect and postgres_fdw_disconnect_all when there
is no open connection to the given foreign server?

2.
+   /*
+* For the given server, if we closed connection or it is still in
+* use, then no need of scanning the cache further.
+*/
+   if (entry->server_hashvalue == hashvalue &&
+   (entry->xact_depth > 0 || result))
+   {
+   hash_seq_term();
+   break;
+   }

If I am not wrong, is the following condition always true?
(entry->xact_depth > 0 || result)

Best regards,
houzj





Re: Support for NSS as a libpq TLS backend

2021-01-20 Thread Michael Paquier
On Tue, Jan 19, 2021 at 09:21:41PM +0100, Daniel Gustafsson wrote:
>> In order to
>> move on with this set, I would suggest to extract some parts of the
>> patch set independently of the others and have two buildfarm members
>> for the MSVC and non-MSVC cases to stress the parts that can be
>> committed.  Just seeing the size, we could move on with:
>> - The ./configure set, with the change to introduce --with-ssl=openssl. 
>> - 0004 for strong randoms.
>> - Support for cryptohashes.
> 
> I will leave it to others to decide the feasibility of this; I'm happy to slice
> and dice the commits into smaller bits to, for example, separate out the
> --with-ssl autoconf change into a non-NSS-dependent commit, if that's wanted.

IMO it makes sense to extract the independent pieces and build on top
of them.  The bulk of the changes is likely going to attract a bunch of
comments if reviewed deeply, so I think that we had better remove the
small-ish problems from the stack to ease the next moves.  The
./configure part and replacement of with_openssl by with_ssl is mixed
in 0001 and 0002, which is actually confusing.  And, FWIW, I would be
fine with applying a patch that introduces a --with-ssl with a
compatibility kept for --with-openssl.  This is what 0001 is doing,
actually, similarly to the past switches for --with-uuid.

A point that you mentioned offline, but that has not been mentioned on
this list: the structure of the modules in src/test/ssl/ could be
refactored to help with easier integration of more SSL libraries.
This makes sense taken independently.

> Based on an offlist discussion I believe this was a misunderstanding, but if I
> instead misunderstood that feel free to correct me with how you think this
> should be done.

The point would be to rename BITS_PER_BYTE to PG_BITS_PER_BYTE in the
code and avoid conflicts.  I am not completely sure if others would
agree here, but this would remove quite some ifdef/undef stuff from
the code dedicated to NSS.
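
To make that concrete, here is a sketch of the kind of workaround the
rename would remove; it assumes, purely for illustration, that the
NSS/NSPR headers bring in their own BITS_PER_BYTE definition:

/*
 * Sketch only.  PostgreSQL's headers define BITS_PER_BYTE, and the NSS
 * headers are assumed here to define a conflicting macro of the same
 * name, which forces NSS-specific files to do something like this:
 */
#include "postgres.h"

#undef BITS_PER_BYTE            /* give way to the definition from NSS */
#include <nss.h>
#include <pk11pub.h>

/*
 * With the PostgreSQL macro renamed to PG_BITS_PER_BYTE, the #undef
 * above (and any later re-#define) would no longer be needed in code
 * dedicated to NSS.
 */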

> > src/sgml/libpq.sgml needs to document PQdefaultSSLKeyPassHook_nss, no?
> 
> Good point, fixed.

Please note that patch 0001 is failing to apply after the recent
commit b663a41.  There are conflicts in postgres_fdw.out.

Also, what's the minimum version of NSS that would be supported?  It
would be good to define an acceptable oldest version, to keep that
documented, and perhaps to track it with some configure checks (?),
similarly to what is done for OpenSSL.

Patch 0006 has three trailing whitespaces (git diff --check
complains).  Running the regression tests of pgcrypto, I think that
the SHA2 implementation is not completely right.  Some SHA2 encoding
reports results from already-freed data.  I have spotted a second
issue within scram_HMAC_init(), where pg_cryptohash_create() remains
stuck inside NSS_InitContext(), freezing the regression tests where
passwords hashed for SCRAM are created.

+   ResourceOwnerEnlargeCryptoHash(CurrentResourceOwner);
+   ctx = MemoryContextAlloc(TopMemoryContext, sizeof(pg_cryptohash_ctx));
+#else
+   ctx = pg_malloc(sizeof(pg_cryptohash_ctx));
+#endif
cryptohash_nss.c cannot use pg_malloc() for frontend allocations.  On
OOM, your patch would call exit() directly, even within libpq.  But
shared library callers need to know about the OOM failure.

+   explicit_bzero(ctx, sizeof(pg_cryptohash_ctx));
+   pfree(ctx);
For similar reasons, pfree should not be used for the frontend code in
cryptohash_nss.c.  The fallback should be just a malloc/free set.
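
As a hedged sketch of what that could look like (the macro names are
illustrative, mirroring the kind of frontend/backend split used elsewhere
in the tree, and are not taken from the posted patch):

/*
 * Sketch only: allocation wrappers for code shared between backend and
 * frontend builds.  In the backend we can rely on palloc-style
 * allocation; in the frontend (including libpq) an OOM must be reported
 * to the caller, so plain malloc/free is used and callers check for NULL.
 */
#ifndef FRONTEND
#include "postgres.h"
#include "utils/memutils.h"

#define ALLOC(size) MemoryContextAlloc(TopMemoryContext, (size))
#define FREE(ptr)   pfree(ptr)
#else
#include "postgres_fe.h"
#include <stdlib.h>

#define ALLOC(size) malloc(size)
#define FREE(ptr)   free(ptr)
#endif

A caller would then check the ALLOC() result for NULL and pass the
failure back up to the libpq caller instead of exiting.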

+   status = PK11_DigestBegin(ctx->pk11_context);
+
+   if (status != SECSuccess)
+   return 1;
+   return 0;
This needs to return -1 on failure, not 1.

I really need to study more the choice of options used for
NSS_InitContext()...  But based on the docs I can read on the matter, I
think that saving nsscontext in pg_cryptohash_ctx is right for each
cryptohash built.

src/tools/msvc/ is missing an update for cryptohash_nss.c.
--
Michael


signature.asc
Description: PGP signature


Re: Tid scan improvements

2021-01-20 Thread David Rowley
On Wed, 13 Jan 2021 at 15:38, Edmund Horner  wrote:
> Thanks for updating the patch.  I'd be very happy if this got picked up 
> again, and I'd certainly follow along and do some review.

Likewise here. I think this patch was pretty close, so it seems a shame to
let it slip through the cracks.

I spoke to Andres off-list about this patch. He mentioned that he
wasn't too keen on seeing the setscanlimits being baked into the table
AM API.  He mentioned that he'd rather not assume too much about table
AMs having all of their tids in a given range consecutively on a set
of pages.   That seems reasonable to me.  He suggested that we add a
new callback that just allows a range of ItemPointers to scan and
leave it up to the implementation to decide which pages should be
scanned to fetch the tuples within the given range.  I didn't argue. I
just went and coded it all, hopefully to Andres' description. The new
table AM callback is optional.

I've implemented this in the attached.
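
Roughly, the shape I had in mind for such an optional callback is
something like the following (a sketch only -- the callback name and
signature here are illustrative, not necessarily what the attached patch
ends up with):

#include "postgres.h"

#include "access/relscan.h"
#include "storage/itemptr.h"

/*
 * Sketch of an optional table AM callback: restrict an already-started
 * scan to tuples whose TIDs fall within [mintid, maxtid].  A heap AM can
 * translate the TID range into a block range, while other AMs are free
 * to pick whatever pages they need to cover that range.
 */
typedef struct TableAmTidRangeSketch
{
    /* ... other table AM callbacks ... */

    void        (*scan_set_tidrange) (TableScanDesc scan,
                                      ItemPointer mintid,
                                      ItemPointer maxtid);
} TableAmTidRangeSketch;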

I also took the time to support backwards TID Range scans and added a
few more tests to test rescans.  I just found it annoying that TID
Scans supported backwards scans and TID Range scans did not.

The 0002 patch is the guts of it. The 0001 patch fixes an existing bug
that needs to be fixed before 0002 can go in (backwards TID Range
Scans are broken without it). I've posted separately about this bug
in [1].

I also didn't really like the idea of adding possibly lots of bool
fields to RelOptInfo to describe what the planner can do in regards to
what the given table AM supports.   I know that IndexOptInfo has such
a set of bool fields.  I'd rather not repeat that, so I just went with
a single int field named "amflags" and just added a single constant to
define a flag that specifies if the rel supports scanning ranges of
TIDs.
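
Roughly, the RelOptInfo change I'm describing looks like this (a sketch
with illustrative names, not necessarily the exact ones used in the
patch):

#include "postgres.h"

/* Flag bits for the single "amflags" field */
#define AMFLAG_HAS_TID_RANGE    (1 << 0)    /* AM can scan a range of TIDs */

typedef struct RelOptInfoSketch
{
    /* ... existing planner fields ... */
    uint32      amflags;        /* capabilities advertised by the table AM */
} RelOptInfoSketch;

/* The planner can then test the capability cheaply: */
static bool
rel_supports_tid_range_scan(const RelOptInfoSketch *rel)
{
    return (rel->amflags & AMFLAG_HAS_TID_RANGE) != 0;
}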

Edmund, will you get time to take a look at this?

David

[1] 
https://www.postgresql.org/message-id/caaphdvpgc9h0_ovd2ctgbcxcs1n-qdyzsebrnuh+0cwja9c...@mail.gmail.com
From 3a0469df93690889793823fdec235f72c8fb81d7 Mon Sep 17 00:00:00 2001
From: "dgrow...@gmail.com" 
Date: Thu, 21 Jan 2021 16:27:25 +1300
Subject: [PATCH v11 1/2] Fix hypothetical bug in heap backward scans

Both heapgettup() and heapgettup_pagemode() incorrectly set the first page
to scan in a backward scan in which the pages to scan were specified by
heap_setscanlimits(). In theory, this could result in the incorrect pages
being scanned.  In practice, nothing in core code performs backward scans
after having used heap_setscanlimits().  However, it's possible that an
extension uses the heap functions in this way.

For the bug to manifest, the scan must be limited to fewer than the number
of pages in the relation and start at page 0.  The scan will start on the
final page in the table rather than the final page in the range of pages
to scan.  The correct number of pages is always scanned; it's just the
pages which are scanned that can be incorrect.

This is a precursor fix to a future patch which allows TID Range scans to
scan a subset of a heap table.

Proper adjustment of the heap scan code seems to have been missed when
heap_setscanlimits() was added in 7516f5259.

Author: David Rowley
Discussion: 
https://postgr.es/m/caaphdvpgc9h0_ovd2ctgbcxcs1n-qdyzsebrnuh+0cwja9c...@mail.gmail.com
---
 src/backend/access/heap/heapam.c | 24 
 1 file changed, 16 insertions(+), 8 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index faffbb1865..ddd214b7af 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -603,11 +603,15 @@ heapgettup(HeapScanDesc scan,
 * forward scanners.
 */
scan->rs_base.rs_flags &= ~SO_ALLOW_SYNC;
-   /* start from last page of the scan */
-   if (scan->rs_startblock > 0)
-   page = scan->rs_startblock - 1;
+
+   /*
+* Start from last page of the scan.  Ensure we take into account
+* rs_numblocks if it's been adjusted by heap_setscanlimits().
+*/
+   if (scan->rs_numblocks != InvalidBlockNumber)
+   page = (scan->rs_startblock + scan->rs_numblocks - 1) % scan->rs_nblocks;
else
-   page = scan->rs_nblocks - 1;
+   page = (scan->rs_startblock + scan->rs_nblocks - 1) % scan->rs_nblocks;
heapgetpage((TableScanDesc) scan, page);
}
else
@@ -918,11 +922,15 @@ heapgettup_pagemode(HeapScanDesc scan,
 * forward scanners.
 */
scan->rs_base.rs_flags &= ~SO_ALLOW_SYNC;
-   /* start from last page of the scan */
-   if (scan->rs_startblock > 0)
-  

Re: Parallel INSERT (INTO ... SELECT ...)

2021-01-20 Thread Greg Nancarrow
On Thu, Jan 21, 2021 at 1:50 PM Zhihong Yu  wrote:
>
> For v12-0003-Enable-parallel-INSERT-and-or-SELECT-for-INSERT-INTO.patch:
>
> +   bool    isParallelModifyLeader = IsA(planstate, GatherState) && IsA(outerPlanState(planstate), ModifyTableState);
>
> Please wrap long line.
>

OK.
I thought I ran pgindent fairly recently, but maybe it chose not to
wrap that line.


> +   uint64 *processed_count_space;
>
> If I read the code correctly, it seems it can be dropped (use 
> pei->processed_count directly).
>

You're right. I'll change it in the next version.

Regards,
Greg Nancarrow
Fujitsu Australia




Re: [HACKERS] [PATCH] Generic type subscripting

2021-01-20 Thread Dian M Fay
On Wed Jan 20, 2021 at 2:08 PM EST, Dmitry Dolgov wrote:
> > On Wed, Jan 20, 2021 at 11:34:16AM -0500, Dian M Fay wrote:
> > > Thanks, I need to remember not to skip doc building as part of the testing
> > > process, even for such small changes. Hope I didn't forget anything this time.
> > >
> > > > On Wed, Jan 20, 2021 at 09:58:43AM -0500, Dian M Fay wrote:
> > >
> > > > Here's a full editing pass on the documentation, with v45 and Pavel's
> > > > doc-whitespaces-fix.patch applied. I also corrected a typo in one of the
> > > > added hints.
> > >
> > > Great! I've applied almost all of it, except:
> > >
> > > + A jsonb value will accept assignments to nonexistent
> > > subscript
> > > + paths as long as the nonexistent elements being traversed are all
> > > arrays.
> > >
> > > Maybe I've misunderstood the intention, but there is no requirement
> > > about arrays for creating such an empty path. I've formulated it as:
> > >
> > > + A jsonb value will accept assignments to nonexistent
> > > subscript
> > > + paths as long as the last existing path key is an object or an array.
> >
> > My intention there was to highlight the difference between:
> >
> > * SET obj['a']['b']['c'] = '"newvalue"'
> > * SET arr[0][0][3] = '"newvalue"'
> >
> > obj has to conform to {"a": {"b": {...}}} in order to receive the
> > assignment of the nested c. If it doesn't, that's the error case we
> > discussed earlier. But arr can be null, [], and so on, and any missing
> > structure [[[null, null, null, "newvalue"]]] will be created.
>
> If arr is 'null', or any other scalar value, such subscripting will work
> only one level deep, because scalars are represented internally as an array of
> one element. If arr is '[]', the path will comply by definition. So it's
> essentially the same as for objects, with no particular difference. If
> such a quirk about scalars being treated like arrays is bothersome, we
> could also bend it in this case as well (see the attached version).

I missed that distinction in the original UPDATE paragraph too. Here's
another revision based on v48.
From a486ee221469037b08d3663f1ec142a905406f8b Mon Sep 17 00:00:00 2001
From: Dian M Fay 
Date: Wed, 20 Jan 2021 23:36:34 -0500
Subject: [PATCH] more jsonb subscripting documentation edits

---
 doc/src/sgml/json.sgml | 40 ++--
 1 file changed, 22 insertions(+), 18 deletions(-)

diff --git a/doc/src/sgml/json.sgml b/doc/src/sgml/json.sgml
index deeb9e66e0..e16dd6973d 100644
--- a/doc/src/sgml/json.sgml
+++ b/doc/src/sgml/json.sgml
@@ -616,16 +616,17 @@ SELECT jdoc-'guid', jdoc-'name' FROM api WHERE 
jdoc @ '{"tags": ["qu
 
   
UPDATE statements may use subscripting in the
-   SET clause to modify jsonb values. Object
-   values being traversed must exist as specified by the subscript path. For
-   instance, the path val['a']['b']['c'] assumes that
-   val, val['a'], and val['a']['b']
-   are all objects in every record being updated (val['a']['b']
-   may or may not contain a field named c, as long as it's an
-   object). If any individual val, val['a'],
-   or val['a']['b'] is a non-object such as a string, a number,
-   or NULL, an error is raised even if other values do conform.
-   Array values are not subject to this restriction, as detailed below.
+   SET clause to modify jsonb values. Subscript
+   paths must be traversible for all affected values insofar as they exist. For
+   instance, the path val['a']['b']['c'] can be traversed all
+   the way to c if every val,
+   val['a'], and val['a']['b'] is an
+   object. If any val['a'] or val['a']['b']
+   is not defined, it will be created as an empty object and filled as
+   necessary. However, if any val itself or one of the
+   intermediary values is defined as a non-object such as a string, number, or
+   jsonb null, traversal cannot proceed so
+   an error is raised and the transaction aborted.
   
 
   
@@ -658,8 +659,9 @@ SELECT * FROM table_name WHERE jsonb_field['key'] = '"value"';
 
 jsonb assignment via subscripting handles a few edge cases
 differently from jsonb_set. When a source jsonb
-   is NULL, assignment via subscripting will proceed as if
-   it was an empty JSON object:
+   value is NULL, assignment via subscripting will proceed
+   as if it was an empty JSON value of the type (object or array) implied by the
+   subscript key:
 
 
 -- Where jsonb_field was NULL, it is now {"a": 1}
@@ -680,17 +682,19 @@ UPDATE table_name SET jsonb_field[2] = '2';
 
 
A jsonb value will accept assignments to nonexistent subscript
-   paths as long as the last existing path key is an object or an array. Since
-   the final subscript is not traversed, it may be an object key. Nested arrays
-   will be created and NULL-padded according to the path until
-   the value can be placed appropriately.
+   paths as long as the last existing element to be traversed is an object or
+   array, as implied by the corresponding subscript (the element indicated by
+   the last subscript 

Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

2021-01-20 Thread Fujii Masao




On 2021/01/21 12:00, Bharath Rupireddy wrote:

On Wed, Jan 20, 2021 at 6:58 PM Fujii Masao  wrote:

+ * It checks if the cache has a connection for the given foreign server that's
+ * not being used within current transaction, then returns true. If the
+ * connection is in use, then it emits a warning and returns false.

The comment also should mention the case where no open connection
for the given server is found? What about rewriting this to the following?

-
If the cached connection for the given foreign server is found and has not
been used within current transaction yet, close the connection and return
true. Even when it's found, if it's already used, keep the connection, emit
a warning and return false. If it's not found, return false.
-


Done.


+ * It returns true, if it closes at least one connection, otherwise false.
+ *
+ * It returns false, if the cache doesn't exit.

The above second comment looks redundant.


Yes. "otherwise false" means it.


+   if (ConnectionHash)
+   result = disconnect_cached_connections(0, true);

Isn't it smarter to make disconnect_cached_connections() check
ConnectionHash and return false if it's NULL? If we do that, we can
simplify the code of postgres_fdw_disconnect() and _all().


Done.


+ * current transaction are disconnected. Otherwise, the unused entries with the
+ * given hashvalue are disconnected.

In the above second comment, shouldn't a singular form be used instead?
Because there cannot be multiple entries with the given hashvalue.


Rephrased the function comment a bit. Mentioned the _disconnect and
_disconnect_all in comments because we have said enough what each of
those two functions do.

+/*
+ * Workhorse to disconnect cached connections.
+ *
+ * This function disconnects either all unused connections when called from
+ * postgres_fdw_disconnect_all or a given foreign server unused connection when
+ * called from postgres_fdw_disconnect.
+ *
+ * This function returns true if at least one connection is disconnected,
+ * otherwise false.
+ */


+   server = GetForeignServer(entry->serverid);

This seems to cause an error "cache lookup failed"
if postgres_fdw_disconnect_all() is called when there is
a connection in use but its server is dropped. To avoid this error,
GetForeignServerExtended() with FSV_MISSING_OK should be used
instead, like postgres_fdw_get_connections() does?


+1.  So, I changed it to GetForeignServerExtended and added an assertion
for invalidation, just like postgres_fdw_get_connections. I also added
a test case for this; we now emit a slightly different warning for
this case alone, namely (errmsg("cannot close dropped server
connection because it is still in use")));. This warning looks okay, as
we cannot show any other server name in the output, and we know that
this rare case can exist when someone drops the server in an explicit
transaction.


+   if (entry->server_hashvalue == hashvalue &&
+   (entry->xact_depth > 0 || result))
+   {
+   hash_seq_term();
+   break;

entry->server_hashvalue can be 0? If yes, since postgres_fdw_disconnect_all()
specifies 0 as hashvalue, ISTM that the above condition can be true
unexpectedly. Can we replace this condition with just "if (!all)"?


I don't think entry->server_hashvalue can be zero, because
GetSysCacheHashValue1/CatalogCacheComputeHashValue will not return 0
as a hash value. I have not seen anyone comparing a hashvalue with the
expectation that it can be 0; for instance, see if (hashvalue == 0
|| riinfo->oidHashValue == hashvalue).

Having if (!all) with something like below there doesn't suffice, because we
might call hash_seq_term when some connection other than the given
foreign server's connection is in use.


No because we check the following condition before reaching that code. No?

+   if ((all || entry->server_hashvalue == hashvalue) &&


I was thinking that the "(entry->xact_depth > 0 || result))" condition is not
necessary, because "result" is set to true when xact_depth <= 0, so that
condition is always true.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Parallel INSERT (INTO ... SELECT ...)

2021-01-20 Thread Greg Nancarrow
On Thu, Jan 21, 2021 at 1:28 PM Zhihong Yu  wrote:
>
> Hi,
> For v12-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch :
>
> is found from the additional parallel-safety checks, or from the existing
> parallel-safety checks for SELECT that it currently performs.
>
> existing and 'it currently performs' are redundant. You can omit 'that it 
> currently performs'.
>

OK, but this is very minor.

> Minor. For index_expr_max_parallel_hazard_for_modify(),
>
> +   if (keycol == 0)
> +   {
> +   /* Found an index expression */
>
> You can check if keycol != 0 and continue with the loop. This would save some
> indentation.

Yes I know, but I don't really see any issue with indent (I'm using
4-space tabs).

>
> +   if (index_expr_item == NULL)/* shouldn't happen */
> +   {
> +   elog(WARNING, "too few entries in indexprs list");
>
> I think the warning should be an error since there is an assertion ahead of the
> if statement.
>

Assertions are normally for DEBUG builds, so the Assert would have no
effect in a production (release) build.
Besides, as I have explained in my reply to previous feedback, the
logging of a WARNING (rather than ERROR) is intentional, because I
want processing to continue (not stop) if ever this very rare
condition was to occur - so that the issue can be dealt with by the
current non-parallel processing (rather than stop dead in the middle
of parallel-safety-checking code). For a DEBUG build, it is handy for
the Assert to immediately alert us to the issue (which could really
only be caused by a database corruption, not bug in the code).
Note that Vignesh originally suggested changing it from
"elog(ERROR,...)" to "elog(WARNING,...)", and I agree with him.

Regards,
Greg Nancarrow
Fujitsu Australia




Re: ResourceOwner refactoring

2021-01-20 Thread Michael Paquier
On Thu, Jan 21, 2021 at 12:11:37AM +0200, Heikki Linnakangas wrote:
> Summary: In the worst scenario, the patched version is still 24% slower
> than unpatched. But many other scenarios are now faster with the patch.

Is there a reason explaining the sudden drop for numsnaps within the
[10,60] range?  The gap looks deeper with a low numkeep.
--
Michael


signature.asc
Description: PGP signature


Re: pg_upgrade fails with non-standard ACL

2021-01-20 Thread Noah Misch
On Thu, Jan 21, 2021 at 01:03:58AM +0300, Anastasia Lubennikova wrote:
> On 03.01.2021 14:29, Noah Misch wrote:
> >On Thu, Jun 11, 2020 at 07:58:43PM +0300, Anastasia Lubennikova wrote:

> Thank you for the review.
> A new version of the patch is attached, though I haven't tested it properly
> yet. I am planning to do so in a couple of days.

Once that testing completes, please change the commitfest entry status to
Ready for Committer.

> >>+   snprintf(output_path, sizeof(output_path), "revoke_objects.sql");
> >If an upgrade deleted function pg_last_wal_replay_lsn, upgrading a database 
> >in
> >which "REVOKE ALL ON FUNCTION pg_last_wal_replay_lsn FROM public" happened
> >requires a GRANT.  Can you use a file name that will still make sense if
> >someone enhances pg_upgrade to generate such GRANT statements?
> I changed the name to 'fix_system_objects_ACL.sql'. Does it look good to
> you?

That name is fine with me.

> >   ALTER VIEW pg_locks RENAME COLUMN objid TO "a.b";
> >   GRANT SELECT ("a.b") ON pg_locks TO backup;
> >
> >After which revoke_objects.sql has:
> >
> >   psql:./revoke_objects.sql:9: ERROR:  syntax error at or near "") ON 
> > pg_catalog.pg_locks.""
> >   LINE 1: REVOKE ALL (b") ON pg_catalog.pg_locks."a FROM "backup";
> >
> >While that ALTER VIEW probably should have required allow_system_table_mods,
> >we need to assume such databases exist.
> 
> I've fixed it by using pg_identify_object_as_address(). Now we don't have to
> parse obj_identity.

Nice.




Re: adding wait_start column to pg_locks

2021-01-20 Thread Fujii Masao




On 2021/01/18 12:00, torikoshia wrote:

On 2021-01-15 15:23, torikoshia wrote:

Thanks for your reviewing and comments!

On 2021-01-14 12:39, Ian Lawrence Barwick wrote:

Looking at the code, this happens as the wait start time is being recorded in
the lock record itself, so it always contains the value reported by the latest lock
acquisition attempt.


I think you are right and wait_start should not be recorded
in the LOCK.


On 2021-01-15 11:48, Ian Lawrence Barwick wrote:

2021年1月15日(金) 3:45 Robert Haas :


On Wed, Jan 13, 2021 at 10:40 PM Ian Lawrence Barwick
 wrote:

It looks like the logical place to store the value is in the

PROCLOCK

structure; ...


That seems surprising, because there's one PROCLOCK for every
combination of a process and a lock. But, a process can't be waiting
for more than one lock at the same time, because once it starts
waiting to acquire the first one, it can't do anything else, and
thus
can't begin waiting for a second one. So I would have thought that
this would be recorded in the PROC.


Umm, I think we're at cross-purposes here. The suggestion is to note
the time when the process started waiting for the lock in the
process's
PROCLOCK, rather than in the lock itself (which in the original
version
of the patch resulted in all processes with an interest in the lock
appearing
to have been waiting to acquire it since the time a lock acquisition
was most recently attempted).


AFAIU, it seems possible to record wait_start in the PROCLOCK but
redundant, since each process can wait for at most one lock at a time.

To confirm my understanding, I'm going to make another patch that
records wait_start in the PGPROC.


Attached a patch.

I noticed previous patches left the wait_start untouched even after
the lock was acquired.
The patch also fixes that.

Any thoughts?


Thanks for updating the patch! I think that this is a really useful feature!
I have two minor comments.

+  
+   wait_start timestamptz

The column name "wait_start" should be "waitstart" for the sake of consistency
with other column names in pg_locks? pg_locks seems to avoid including
an underscore in column names, so "locktype" is used instead of "lock_type",
"virtualtransaction" is used instead of "virtual_transaction", etc.

+   Lock acquisition wait start time. NULL if
+   lock acquired.

There seems to be a case where the wait start time is NULL even when "granted"
is false. Wouldn't it be better to add a note about that case to the docs? For example,
I found that the wait start time is NULL while the startup process is waiting
for the lock. Is that the only such case?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: REINDEX backend filtering

2021-01-20 Thread Julien Rouhaud
On Wed, Dec 16, 2020 at 8:27 AM Michael Paquier  wrote:
>
> On Tue, Dec 15, 2020 at 06:34:16PM +0100, Magnus Hagander wrote:
> > Is this really a common enough operation that we need it in the main 
> > grammar?
> >
> > Having the functionality, definitely, but what if it was "just" a
> > function instead? So you'd do something like:
> > SELECT 'reindex index ' || i FROM pg_blah(some, arguments, here)
> > \gexec
> >
> > Or even a function that returns the REINDEX commands directly (taking
> > a parameter to turn on/off concurrency for example).
> >
> > That also seems like it would be easier to make flexible, and just as
> > easy to plug into reindexdb?
>
> Having control in the grammar to choose which index to reindex for a
> table is very useful when it comes to parallel reindexing, because
> it is a no-brainer in terms of knowing which index to distribute to one
> job or another.  In short, with this grammar you can just issue a set
> of REINDEX TABLE commands that we know will not conflict with each
> other.  You cannot get that level of control with REINDEX INDEX, as it
> may be possible that two or more commands conflict if they work on an
> index of the same relation, because it is also required to take a lock on
> the parent table.  Of course, we could decide to implement
> redistribution logic in all frontend tools that need such things, like
> reindexdb, but that's not something I think we should let the client
> decide.  Backend-side filtering is IMO much simpler, less code,
> and more elegant.

Maybe additional filtering capabilities are not something that will be
required frequently, but I'm pretty sure that reindexing only the indexes
that might be corrupt is something that will be required often.  So I
agree: having all that logic in the backend makes everything easier
for users, who keep the choice of which tool to use to issue the query
while still having all features available.

There was a conflict with a3dc926009be8 (Refactor option handling of
CLUSTER, REINDEX and VACUUM), so a rebased version is attached.  No other
changes are included yet.


v2-0001-Add-a-new-COLLATION-option-to-REINDEX.patch
Description: Binary data


Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

2021-01-20 Thread Bharath Rupireddy
On Wed, Jan 20, 2021 at 6:58 PM Fujii Masao  wrote:
> + * It checks if the cache has a connection for the given foreign server 
> that's
> + * not being used within current transaction, then returns true. If the
> + * connection is in use, then it emits a warning and returns false.
>
> The comment also should mention the case where no open connection
> for the given server is found? What about rewriting this to the following?
>
> -
> If the cached connection for the given foreign server is found and has not
> been used within current transaction yet, close the connection and return
> true. Even when it's found, if it's already used, keep the connection, emit
> a warning and return false. If it's not found, return false.
> -

Done.

> + * It returns true, if it closes at least one connection, otherwise false.
> + *
> + * It returns false, if the cache doesn't exit.
>
> The above second comment looks redundant.

Yes. "otherwise false" means it.

> +   if (ConnectionHash)
> +   result = disconnect_cached_connections(0, true);
>
> Isn't it smarter to make disconnect_cached_connections() check
> ConnectionHash and return false if it's NULL? If we do that, we can
> simplify the code of postgres_fdw_disconnect() and _all().

Done.

> + * current transaction are disconnected. Otherwise, the unused entries with the
> + * given hashvalue are disconnected.
>
> In the above second comment, shouldn't a singular form be used instead?
> Because there cannot be multiple entries with the given hashvalue.

Rephrased the function comment a bit. Mentioned the _disconnect and
_disconnect_all in comments because we have said enough what each of
those two functions do.

+/*
+ * Workhorse to disconnect cached connections.
+ *
+ * This function disconnects either all unused connections when called from
+ * postgres_fdw_disconnect_all or a given foreign server unused connection when
+ * called from postgres_fdw_disconnect.
+ *
+ * This function returns true if at least one connection is disconnected,
+ * otherwise false.
+ */

> +   server = GetForeignServer(entry->serverid);
>
> This seems to cause an error "cache lookup failed"
> if postgres_fdw_disconnect_all() is called when there is
> a connection in use but its server is dropped. To avoid this error,
> GetForeignServerExtended() with FSV_MISSING_OK should be used
> instead, like postgres_fdw_get_connections() does?

+1.  So, I changed it to GetForeignServerExtended and added an assertion
for invalidation, just like postgres_fdw_get_connections. I also added
a test case for this; we now emit a slightly different warning for
this case alone, namely (errmsg("cannot close dropped server
connection because it is still in use")));. This warning looks okay, as
we cannot show any other server name in the output, and we know that
this rare case can exist when someone drops the server in an explicit
transaction.

> +   if (entry->server_hashvalue == hashvalue &&
> +   (entry->xact_depth > 0 || result))
> +   {
> +   hash_seq_term();
> +   break;
>
> entry->server_hashvalue can be 0? If yes, since postgres_fdw_disconnect_all()
> specifies 0 as hashvalue, ISTM that the above condition can be true
> unexpectedly. Can we replace this condition with just "if (!all)"?

I don't think entry->server_hashvalue can be zero, because
GetSysCacheHashValue1/CatalogCacheComputeHashValue will not return 0
as a hash value. I have not seen anyone comparing a hashvalue with the
expectation that it can be 0; for instance, see if (hashvalue == 0
|| riinfo->oidHashValue == hashvalue).

Having if (!all) with something like below there doesn't suffice, because we
might call hash_seq_term when some connection other than the given
foreign server's connection is in use. Our intention is to call
hash_seq_term only when the given server is found and its connection is
either in use or has been closed.

 if (!all && (entry->xact_depth > 0 || result))
{
hash_seq_term();
break;
}

Given the above points, the existing check looks good to me.

> +-- Closes loopback connection, returns true and issues a warning as loopback2
> +-- connection is still in use and can not be closed.
> +SELECT * FROM postgres_fdw_disconnect_all();
> +WARNING:  cannot close connection for server "loopback2" because it is still 
> in use
> + postgres_fdw_disconnect_all
> +-
> + t
> +(1 row)
>
> After the above test, isn't it better to call postgres_fdw_get_connections()
> to check that loopback is not output?

+1.

> +WARNING:  cannot close connection for server "loopback" because it is still 
> in use
> +WARNING:  cannot close connection for server "loopback2" because it is still 
> in use
>
> Just in case, please let me confirm that the order of these warning
> messages is 

Re: Parallel INSERT (INTO ... SELECT ...)

2021-01-20 Thread Zhihong Yu
Hi,
For v12-0003-Enable-parallel-INSERT-and-or-SELECT-for-INSERT-INTO.patch:

+   bool    isParallelModifyLeader = IsA(planstate, GatherState) && IsA(outerPlanState(planstate), ModifyTableState);

Please wrap long line.

+   uint64 *processed_count_space;

If I read the code correctly, it seems it can be dropped (use
pei->processed_count directly).

Cheers



On Wed, Jan 20, 2021 at 6:29 PM Zhihong Yu  wrote:

> Hi,
> For v12-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch :
>
> is found from the additional parallel-safety checks, or from the existing
> parallel-safety checks for SELECT that it currently performs.
>
> existing and 'it currently performs' are redundant. You can omit 'that it
> currently performs'.
>
> Minor. For index_expr_max_parallel_hazard_for_modify(),
>
> +   if (keycol == 0)
> +   {
> +   /* Found an index expression */
>
> You can check if keycol != 0 and continue with the loop. This would save some
> indentation.
>
> +   if (index_expr_item == NULL)/* shouldn't happen */
> +   {
> +   elog(WARNING, "too few entries in indexprs list");
>
> I think the warning should be an error since there is an assertion ahead of
> the if statement.
>
> +   Assert(!isnull);
> +   if (isnull)
> +   {
> +   /*
> +* This shouldn't ever happen, but if it does, log a
> WARNING
> +* and return UNSAFE, rather than erroring out.
> +*/
> +   elog(WARNING, "null conbin for constraint %u", con->oid);
>
> The above should be an error as well.
>
> Cheers
>
> On Wed, Jan 20, 2021 at 5:06 PM Greg Nancarrow 
> wrote:
>
>> Thanks for the feedback.
>> Posting an updated set of patches. Changes are based on feedback, as
>> detailed below:
>>
>> There's a couple of potential issues currently being looked at:
>> - locking issues in additional parallel-safety checks?
>> - apparent uneven work distribution across the parallel workers, for
>> large insert data
>>
>>
>> [Antonin]
>> - Fixed bad Assert in PrepareParallelMode()
>> - Added missing comment to explain use of GetCurrentCommandId() in
>> PrepareParallelMode()
>> - Some variable name shortening in a few places
>> - Created common function for creation of non-parallel and parallel
>> ModifyTable paths
>> - Path count variable changed to bool
>> - Added FIXME comment to dubious code for creating Gather target-list
>> from ModifyTable subplan
>> - Fixed check on returningLists to use NIL instead of NULL
>>
>> [Amit]
>> - Moved additional parallel-safety checks (for modify case) into
>> max_parallel_hazard()
>> - Removed redundant calls to max_parallel_hazard_test()
>> - Added Asserts to "should never happen" null-attribute cases (and
>> added WARNING log missing from one case)
>> - Added comment for use of NoLock in max_parallel_hazard_for_modify()
>>
>> [Vignesh]
>> - Fixed a couple of typos
>> - Added a couple of test cases for testing that the same transaction
>> is used by all parallel workers
>>
>>
>> Regards,
>> Greg Nancarrow
>> Fujitsu Australia
>>
>


Re: [PATCH 1/1] Initial mach based shared memory support.

2021-01-20 Thread Thomas Munro
On Sun, Nov 22, 2020 at 9:19 PM James Hilliard
 wrote:
> In order to avoid hitting these limits we can bypass the wrapper layer
> and just use mach directly.

FWIW I looked into using mach_vm_allocate() years ago because I
wanted to be able to use its VM_FLAGS_SUPERPAGE_SIZE_2MB flag to
implement PostgreSQL's huge_pages option for the main shared memory
area, but I ran into some difficulty getting such a mapping to be
inherited by fork() children.  There may be some way to get past that,
but it seems the current crop of Apple Silicon has only 4KB and 16KB
pages and I don't know if that's interesting enough.  On the other
hand, I just saw a claim that "running an arm64 Debian VM on Apple M1,
using a 16K page size instead of 4K reduces this kernel build time by
*16%*".

[1] https://twitter.com/AtTheHackOfDawn/status/1333895115174187011
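
For reference, a minimal sketch of the kind of call I was experimenting
with (standalone, with error handling trimmed; whether the superpage flag
is accepted depends on the hardware and OS version, and whether such a
mapping is usable across fork() is exactly the difficulty mentioned
above):

#include <mach/mach.h>
#include <mach/mach_vm.h>
#include <mach/vm_statistics.h>
#include <stdio.h>

int
main(void)
{
    mach_vm_address_t addr = 0;
    mach_vm_size_t    size = 64 * 1024 * 1024;   /* 64MB, for illustration */
    kern_return_t     kr;

    /* Ask Mach for an anonymous region, requesting 2MB superpages. */
    kr = mach_vm_allocate(mach_task_self(), &addr, size,
                          VM_FLAGS_ANYWHERE | VM_FLAGS_SUPERPAGE_SIZE_2MB);
    if (kr != KERN_SUCCESS)
    {
        /* Fall back to regular pages if superpages are unavailable. */
        kr = mach_vm_allocate(mach_task_self(), &addr, size, VM_FLAGS_ANYWHERE);
    }
    if (kr != KERN_SUCCESS)
    {
        fprintf(stderr, "mach_vm_allocate failed: %d\n", (int) kr);
        return 1;
    }

    printf("allocated %llu bytes at %p\n",
           (unsigned long long) size, (void *) addr);

    mach_vm_deallocate(mach_task_self(), addr, size);
    return 0;
}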




Re: POC: postgres_fdw insert batching

2021-01-20 Thread Tomas Vondra

On 1/21/21 3:09 AM, tsunakawa.ta...@fujitsu.com wrote:

From: Tomas Vondra 

Right, that's pretty much what I ended up doing (without the CMD_INSERT
check it'd add batching info to explain for updates too, for example).
I'll do a bit more testing on the attached patch, but I think that's the right
fix to push.


Thanks to the outer check for operation ==  CMD_INSERT, the inner one became 
unnecessary.



Right. I've pushed the fix; hopefully the buildfarm will be happy again.


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Parallel INSERT (INTO ... SELECT ...)

2021-01-20 Thread Zhihong Yu
Hi,
For v12-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch :

is found from the additional parallel-safety checks, or from the existing
parallel-safety checks for SELECT that it currently performs.

existing and 'it currently performs' are redundant. You can omit 'that it
currently performs'.

Minor. For index_expr_max_parallel_hazard_for_modify(),

+   if (keycol == 0)
+   {
+   /* Found an index expression */

You can check if keycol != 0 and continue with the loop. This would save some
indentation.

+   if (index_expr_item == NULL)/* shouldn't happen */
+   {
+   elog(WARNING, "too few entries in indexprs list");

I think the warning should be an error since there is an assertion ahead of
the if statement.

+   Assert(!isnull);
+   if (isnull)
+   {
+   /*
+* This shouldn't ever happen, but if it does, log a WARNING
+* and return UNSAFE, rather than erroring out.
+*/
+   elog(WARNING, "null conbin for constraint %u", con->oid);

The above should be an error as well.

Cheers

On Wed, Jan 20, 2021 at 5:06 PM Greg Nancarrow  wrote:

> Thanks for the feedback.
> Posting an updated set of patches. Changes are based on feedback, as
> detailed below:
>
> There's a couple of potential issues currently being looked at:
> - locking issues in additional parallel-safety checks?
> - apparent uneven work distribution across the parallel workers, for
> large insert data
>
>
> [Antonin]
> - Fixed bad Assert in PrepareParallelMode()
> - Added missing comment to explain use of GetCurrentCommandId() in
> PrepareParallelMode()
> - Some variable name shortening in a few places
> - Created common function for creation of non-parallel and parallel
> ModifyTable paths
> - Path count variable changed to bool
> - Added FIXME comment to dubious code for creating Gather target-list
> from ModifyTable subplan
> - Fixed check on returningLists to use NIL instead of NULL
>
> [Amit]
> - Moved additional parallel-safety checks (for modify case) into
> max_parallel_hazard()
> - Removed redundant calls to max_parallel_hazard_test()
> - Added Asserts to "should never happen" null-attribute cases (and
> added WARNING log missing from one case)
> - Added comment for use of NoLock in max_parallel_hazard_for_modify()
>
> [Vignesh]
> - Fixed a couple of typos
> - Added a couple of test cases for testing that the same transaction
> is used by all parallel workers
>
>
> Regards,
> Greg Nancarrow
> Fujitsu Australia
>


Re: Printing backtrace of postgres processes

2021-01-20 Thread Craig Ringer
On Thu, 21 Jan 2021 at 09:56, Tom Lane  wrote:

> Craig Ringer  writes:
> > On Wed, 20 Jan 2021 at 05:23, Tom Lane  wrote:
> >> BTW, it also looks like the patch is doing nothing to prevent the
> >> backtrace from being sent to the connected client.
>
> > I don't see a good reason to send a bt to a client. Even though these
> backtraces won't involve analysing debuginfo and populating args, locals, etc.,
> they should still just go to the server log.
>
> Yeah.  That's easier than I was thinking, we just need to
> s/LOG/LOG_SERVER_ONLY/.
>
> >> Maybe, given all of these things, we should forget using elog at
> >> all and just emit the trace with fprintf(stderr).
>
> > That causes quite a lot of pain with MemoryContextStats() already
>
> True.  Given the changes discussed in the last couple messages, I don't
> see any really killer reasons why we can't ship the trace through elog.
> We can always try that first, and back off to fprintf if we do find
> reasons why it's too unstable.
>

Yep, works for me.

Thanks for being open to considering this.

I know lots of this stuff can seem like a pointless sidetrack because the
utility of it is not obvious on dev systems or when you're doing your own
hands-on expert support on systems you own and operate yourself. These
sorts of things really only start to make sense when you're touching many
different postgres systems "in the wild" - such as commercial support
services, helping people on -general, -bugs or stackoverflow, etc.

I really appreciate your help with it.


RE: POC: postgres_fdw insert batching

2021-01-20 Thread tsunakawa.ta...@fujitsu.com
From: Tom Lane 
> The "single target table" could be partitioned, in which case there'll be
> multiple resultrelinfos, some of which could be foreign tables.

Thank you.  I thought so at first, but later I found that ExecInsert() only 
handles one element in mtstate->resultRelInfo.  So I thought just the first 
element is processed in the INSERT case.

I understood (guessed) that the for loop is for UPDATE and DELETE.  EXPLAIN
(without ANALYZE) of UPDATE/DELETE on a partitioned table shows the partitions,
which would be mtstate->resultRelInfo.  EXPLAIN of INSERT doesn't show
partitions, so I think INSERT will find the relevant partitions based on the
input rows during execution.
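
(For my own understanding: I assume the routing at INSERT time looks roughly
like the following in ExecInsert(); this is a sketch from memory, not exact
code.)

    /* route the incoming tuple to the matching leaf partition, if any */
    if (proute)
        resultRelInfo = ExecFindPartition(mtstate, rootResultRelInfo,
                                          proute, slot, estate);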


Regards
Takayuki Tsunakawa





RE: ResourceOwner refactoring

2021-01-20 Thread kuroda.hay...@fujitsu.com
Dear Heikki,

I tested in the same situation, and I confirmed that almost the same results are
returned.
(Results are at the end of the e-mail.)

You think these results are acceptable because resource owners generally own
many resources (more than 64), and the patched code is mostly faster in that
situation, is that right?

I cannot judge that precisely myself, but it sounds good to me.
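
To make sure I read the benchmark parameters correctly, this is roughly the
shape I assume (a sketch only, not the actual test module; only the
ResourceOwnerForgetSnapshot() call is the real API):

    /*
     * Remember "numsnaps" snapshots in a ResourceOwner, then forget all but
     * "numkeep" of them, in either LIFO or FIFO order, timing the forget loop.
     */
    static void
    bench_forget(ResourceOwner owner, Snapshot *snaps, int numsnaps,
                 int numkeep, bool lifo)
    {
        if (lifo)
        {
            for (int i = numsnaps - 1; i >= numkeep; i--)
                ResourceOwnerForgetSnapshot(owner, snaps[i]);
        }
        else
        {
            for (int i = numkeep; i < numsnaps; i++)
                ResourceOwnerForgetSnapshot(owner, snaps[i]);
        }
    }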

test result

unpatched

 numkeep | numsnaps | lifo_time_ns | fifo_time_ns 
-+--+--+--
   0 |1 | 68.5 | 69.9
   0 |5 | 73.2 | 76.8
   0 |   10 | 70.6 | 74.7
   0 |   60 | 68.0 | 75.6
   0 |   70 | 91.3 | 94.8
   0 |  100 | 89.0 | 89.1
   0 | 1000 | 97.9 | 98.9
   0 |1 |116.0 |115.9
   9 |   10 | 74.7 | 76.6
   9 |  100 | 80.8 | 80.1
   9 | 1000 | 86.0 | 86.2
   9 |1 |116.1 |116.8
  65 |   70 | 84.7 | 85.3
  65 |  100 | 80.5 | 80.3
  65 | 1000 | 86.3 | 86.2
  65 |1 |115.4 |115.9
(16 rows)

patched

 numkeep | numsnaps | lifo_time_ns | fifo_time_ns 
-+--+--+--
   0 |1 | 62.4 | 62.6
   0 |5 | 68.0 | 66.9
   0 |   10 | 73.6 | 78.1
   0 |   60 | 82.3 | 87.2
   0 |   70 | 83.0 | 89.1
   0 |  100 | 82.8 | 87.9
   0 | 1000 | 88.2 | 96.6
   0 |1 |119.6 |124.5
   9 |   10 | 62.0 | 62.8
   9 |  100 | 75.3 | 78.0
   9 | 1000 | 82.6 | 89.3
   9 |1 |116.6 |122.6
  65 |   70 | 66.7 | 66.4
  65 |  100 | 74.6 | 77.2
  65 | 1000 | 82.1 | 88.2
  65 |1 |118.0 |124.1
(16 rows)

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



Re: Add docs stub for recovery.conf

2021-01-20 Thread Craig Ringer
On Wed, 20 Jan 2021 at 02:45, Stephen Frost  wrote:

> Greetings,
>
> * Stephen Frost (sfr...@snowman.net) wrote:
> > * Craig Ringer (craig.rin...@enterprisedb.com) wrote:
> > > On Thu, 14 Jan 2021 at 03:44, Stephen Frost 
> wrote:
> > > > Alright, how does this look?  The new entries are all under the
> > > > 'obsolete' section to keep it out of the main line, but should work
> to
> > > > 'fix' the links that currently 404 and provide a bit of a 'softer'
> > > > landing for the other cases that currently just forcibly redirect
> using
> > > > the website doc alias capability.
> > >
> > > Thanks for expanding the change to other high profile obsoleted or
> renamed
> > > features and tools.
> >
> > Thanks for taking the time to review it and comment on it!
> >
> > > One minor point. I'm not sure this is quite the best way to spell the
> index
> > > entries:
> > >
> > > +   <indexterm>
> > > +     <primary>obsolete</primary>
> > > +     <secondary>pg_receivexlog</secondary>
> > > +   </indexterm>
> > >
> > > as it will produce an index term "obsolete" with a list of various
> > > components under it. While that concentrates them nicely, it means
> people
> > > won't actually find them if they're using the index alphabetically.
> >
> > Ah, yeah, that's definitely a good point and one that I hadn't really
> > spent much time thinking about.
> >
> > > I'd slightly prefer
> > >
> > > +   <indexterm>
> > > +     <primary>pg_receivexlog</primary>
> > > +     <seealso>pg_receivewal</seealso>
> > > +   </indexterm>
> > >
> > > even though that bulks the index up a little, because then people are
> a bit
> > > more likely to find it.
> >
> > Yup, makes sense, updated patch attached which makes that change.
> >
> > > > I ended up not actually doing this for the catalog -> view change of
> > > > pg_replication_slots simply because I don't really think folks will
> > > > misunderstand or be confused by that redirect since it's still the
> same
> > > > relation.  If others disagree though, we could certainly change that
> > > > too.
> > >
> > > I agree with you.
> >
> > Ok, great.
> >
> > How does the attached look then?
>

Pretty good to me. Thanks so much for your help and support with this.


Index entries render as e.g.

pg_xlogdump, The pg_xlogdump command
(see also pg_waldump)

whereas with the obsolete subhead they would render as something like:

obsolete, Obsolete or renamed features, settings and files
pg_xlogdump, The pg_xlogdump command

The see also spelling is much easier to find in the index but doesn't make
it as obvious that it's obsoleted/replaced.

A look at the DocBook docs suggests we should use <see> not <seealso> for
these.

A quick

sed -i -e 's/<seealso>/<see>/g' -e 's/<\/seealso>/<\/see>/g'
doc/src/sgml/appendix-obsolete*

causes them to render much better:

pg_receivexlog, The pg_receivexlog command (see pg_receivewal)

It might be worth changing the <title>s too, so I've done so in the attached. The
terms now render as:

pg_receivexlog, pg_receivexlog renamed to pg_receivewal (see
pg_receivewal)

which is good enough in my opinion. The duplication is messy but an
expected artifact of index generation. I don't see any docbook <indexterm>
attribute that lets you suppress insertion of the <title> of the section
containing the <indexterm>, and it's not worth fiddling to try to eliminate
it with structural hacks.

The attached changes the titles, changes <seealso> to <see>, and also
updates the comments in the obsolete entries SGML docs to specify that the
id must be unchanged + give a recommended index term format.
From c11e0f079c07c669abac0f00ae3f1ebdfc18eae7 Mon Sep 17 00:00:00 2001
From: Craig Ringer 
Date: Thu, 21 Jan 2021 10:01:29 +0800
Subject: [PATCH v3] Add a docs section for obsoleted and renamed functions and
 settings

The new appendix groups information on renamed or removed settings,
commands, etc into an out-of-the-way part of the docs.

The original id elements are retained in each subsection to ensure that
the same filenames are produced for HTML docs. This prevents /current/
links on the web from breaking, and allows users of the web docs
to follow links from old version pages to info on the changes in the
new version. Prior to this change, a link to /current/ for renamed
sections like the recovery.conf docs would just 404. Similarly if
someone searched for recovery.conf they would find the pg11 docs,
but there would be no /12/ or /current/ link, so they couldn't easily
find out that it was removed in pg12 or how to adapt.

Index entries are also added so that there's a breadcrumb trail for
users to follow when they know the old name, but not what we changed it
to. So a user who is trying to find out how to set standby_mode in
PostgreSQL 12+, or where pg_resetxlog went, now has more chance of
finding that information.

Craig Ringer and Stephen Frost
---
 .../sgml/appendix-obsolete-pgreceivexlog.sgml | 24 
 .../sgml/appendix-obsolete-pgresetxlog.sgml   | 24 
 .../sgml/appendix-obsolete-pgxlogdump.sgml| 24 
 .../appendix-obsolete-recovery-config.sgml| 58 +++
 doc/src/sgml/appendix-obsolete.sgml

Re: [PoC] Non-volatile WAL buffer

2021-01-20 Thread Masahiko Sawada
On Thu, Jan 7, 2021 at 2:16 AM Tomas Vondra
 wrote:
>
> Hi,
>
> I think I've managed to get the 0002 patch [1] rebased to master and
> working (with help from Masahiko Sawada). It's not clear to me how it
> could have worked as submitted - my theory is that an incomplete patch
> was submitted by mistake, or something like that.
>
> Unfortunately, the benchmark results were kinda disappointing. For a
> pgbench on scale 500 (fits into shared buffers), an average of three
> 5-minute runs looks like this:
>
>    branch                  1       16       32       64       96
>
>    master               7291    87704   165310   150437   224186
>    ntt                  7912   106095   213206   212410   237819
>    simple-no-buffers    7654    96544   115416    95828   103065
>
> NTT refers to the patch from September 10, pre-allocating a large WAL
> file on PMEM, and simple-no-buffers is the simpler patch simply removing
> the WAL buffers and writing directly to a mmap-ed WAL segment on PMEM.
>
> Note: The patch is just replacing the old implementation with mmap.
> That's good enough for experiments like this, but we probably want to
> keep the old one for setups without PMEM. But it's good enough for
> testing, benchmarking etc.
>
> Unfortunately, the results for this simple approach are pretty bad. Not
> only compared to the "ntt" patch, but even to master. I'm not entirely
> sure what's the root cause, but I have a couple hypotheses:
>
> 1) bug in the patch - That's clearly a possibility, although I've tried
> tried to eliminate this possibility.
>
> 2) PMEM is slower than DRAM - From what I know, PMEM is much faster than
> NVMe storage, but still much slower than DRAM (both in terms of latency
> and bandwidth, see [2] for some data). It's not terrible, but the
> latency is maybe 2-3x higher - not a huge difference, but may matter for
> WAL buffers?
>
> 3) PMEM does not handle parallel writes well - If you look at [2],
> Figure 4(b), you'll see that the throughput actually *drops" as the
> number of threads increase. That's pretty strange / annoying, because
> that's how we write into WAL buffers - each thread writes it's own data,
> so parallelism is not something we can get rid of.
>
> I've added some simple profiling, to measure number of calls / time for
> each operation (use -DXLOG_DEBUG_STATS to enable). It accumulates data
> for each backend, and logs the counts every 1M ops.
>
> Typical stats from a concurrent run looks like this:
>
>xlog stats cnt 4300
>   map cnt 100 time 5448333 unmap cnt 100 time 3730963
>   memcpy cnt 985964 time 1550442272 len 15150499
>   memset cnt 0 time 0 len 0
>   persist cnt 13836 time 10369617 len 16292182
>
> The times are in nanoseconds, so this says the backend did 100  mmap and
> unmap calls, taking ~10ms in total. There were ~14k pmem_persist calls,
> taking 10ms in total. And the most time (~1.5s) was used by pmem_memcpy
> copying about 15MB of data. That's quite a lot :-(

It might also be interesting to see how much time is spent in each
logging function, such as XLogInsert(), XLogWrite(), and XLogFlush().
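
Something like the following is what I have in mind, just as a rough sketch
(the wrapper and counter names are made up for illustration; the timing
macros are the existing ones from instr_time.h):

    /* accumulate per-backend time spent in XLogFlush() */
    static instr_time xlogflush_time;
    static uint64     xlogflush_calls;

    static void
    timed_XLogFlush(XLogRecPtr record)
    {
        instr_time  start, elapsed;

        INSTR_TIME_SET_CURRENT(start);
        XLogFlush(record);
        INSTR_TIME_SET_CURRENT(elapsed);
        INSTR_TIME_SUBTRACT(elapsed, start);
        INSTR_TIME_ADD(xlogflush_time, elapsed);
        xlogflush_calls++;
    }
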

>
> My conclusion from this is that eliminating WAL buffers and writing WAL
> directly to PMEM (by memcpy to mmap-ed WAL segments) is probably not the
> right approach.
>
> I suppose we should keep WAL buffers, and then just write the data to
> mmap-ed WAL segments on PMEM. Which I think is what the NTT patch does,
> except that it allocates one huge file on PMEM and writes to that
> (instead of the traditional WAL segments).
>
> So I decided to try how it'd work with writing to regular WAL segments,
> mmap-ed ad hoc. The pmem-with-wal-buffers-master.patch patch does that,
> and the results look a bit nicer:
>
>    branch                  1       16       32       64       96
>
>    master               7291    87704   165310   150437   224186
>    ntt                  7912   106095   213206   212410   237819
>    simple-no-buffers    7654    96544   115416    95828   103065
>    with-wal-buffers     7477    95454   181702   140167   214715
>
> So, much better than the version without WAL buffers, somewhat better
> than master (except for 64/96 clients), but still not as good as NTT.
>
> At this point I was wondering how could the NTT patch be faster when
> it's doing roughly the same thing. I'm sire there are some differences,
> but it seemed strange. The main difference seems to be that it only maps
> one large file, and only once. OTOH the alternative "simple" patch maps
> segments one by one, in each backend. Per the debug stats the map/unmap
> calls are fairly cheap, but maybe it interferes with the memcpy somehow.
>

While looking at the two methods, NTT and simple-no-buffer, I realized
that in XLogFlush(), the NTT patch flushes (by pmem_flush() and
pmem_drain()) WAL

RE: POC: postgres_fdw insert batching

2021-01-20 Thread tsunakawa.ta...@fujitsu.com
From: Tomas Vondra 
> Right, that's pretty much what I ended up doing (without the CMD_INSERT
> check it'd add batching info to explain for updates too, for example).
> I'll do a bit more testing on the attached patch, but I think that's the 
> right fix to
> push.

Thanks to the outer check for operation == CMD_INSERT, the inner one became
unnecessary.


Regards
Takayuki Tsunakawa



Re: POC: postgres_fdw insert batching

2021-01-20 Thread Tom Lane
"tsunakawa.ta...@fujitsu.com"  writes:
> Just for learning, could anyone tell me what this loop is for?  I thought
> current Postgres's DML supports a single target table, so it's enough to 
> handle the first element of mtstate->resultRelInfo.

The "single target table" could be partitioned, in which case there'll be
multiple resultrelinfos, some of which could be foreign tables.

regards, tom lane




Re: POC: postgres_fdw insert batching

2021-01-20 Thread Tomas Vondra



On 1/21/21 2:53 AM, Amit Langote wrote:

On Thu, Jan 21, 2021 at 10:42 AM Tomas Vondra
 wrote:

On 1/21/21 2:24 AM, Amit Langote wrote:

On Thu, Jan 21, 2021 at 9:56 AM Tomas Vondra
 wrote:

On 1/21/21 1:17 AM, Zhihong Yu wrote:

Hi,
The assignment to resultRelInfo is done when junk_filter_needed is true:

   if (junk_filter_needed)
   {
   resultRelInfo = mtstate->resultRelInfo;

Should the code for determining batch size access mtstate->resultRelInfo
directly ?



IMO the issue is that code iterates over all plans and moves to the next
for each one:

   resultRelInfo++;

so it ends up pointing past the last element, hence the failures. So
yeah, either the code needs to move before the loop (per my patch), or
we need to access mtstate->resultRelInfo directly.


Accessing mtstate->resultRelInfo directly would do.  The only
constraint on where this block should be placed is that
ri_projectReturning must be valid as of calling
GetForeignModifyBatchSize(), as Tsunakawa-san pointed out upthread.
So, after this block in ExecInitModifyTable:

  /*
   * Initialize RETURNING projections if needed.
   */
  if (node->returningLists)
  {
  
  /*
   * Build a projection for each result rel.
   */
  resultRelInfo = mtstate->resultRelInfo;
  foreach(l, node->returningLists)
  {
  List   *rlist = (List *) lfirst(l);

  resultRelInfo->ri_returningList = rlist;
  resultRelInfo->ri_projectReturning =
  ExecBuildProjectionInfo(rlist, econtext, slot, &mtstate->ps,
                          resultRelInfo->ri_RelationDesc->rd_att);
  resultRelInfo++;
  }
  }



Right. But I think Tom is right this should initialize ri_BatchSize for
all the resultRelInfo elements, not just the first one. Per the attached
patch, which resolves the issue both on x86_64 and armv7l for me.


+1 in general.  To avoid looping uselessly in the case of
UPDATE/DELETE where batching can't be used today, I'd suggest putting
if (operation == CMD_INSERT) around the loop.



Right, that's pretty much what I ended up doing (without the CMD_INSERT 
check it'd add batching info to explain for updates too, for example). 
I'll do a bit more testing on the attached patch, but I think that's the 
right fix to push.


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 9c36860704..a4870d621a 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -2797,18 +2797,30 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	 * Determine if the FDW supports batch insert and determine the batch
 	 * size (a FDW may support batching, but it may be disabled for the
 	 * server/table).
+	 *
+	 * We only do this for INSERT, so that for UPDATE/DELETE the value
+	 * remains set to 0.
 	 */
-	if (!resultRelInfo->ri_usesFdwDirectModify &&
-		operation == CMD_INSERT &&
-		resultRelInfo->ri_FdwRoutine != NULL &&
-		resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize &&
-		resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert)
-		resultRelInfo->ri_BatchSize =
-			resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize(resultRelInfo);
-	else
-		resultRelInfo->ri_BatchSize = 1;
+	if (operation == CMD_INSERT)
+	{
+		resultRelInfo = mtstate->resultRelInfo;
+		for (i = 0; i < nplans; i++)
+		{
+			if (!resultRelInfo->ri_usesFdwDirectModify &&
+operation == CMD_INSERT &&
+resultRelInfo->ri_FdwRoutine != NULL &&
+resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize &&
+resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert)
+resultRelInfo->ri_BatchSize =
+	resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize(resultRelInfo);
+			else
+resultRelInfo->ri_BatchSize = 1;
+
+			Assert(resultRelInfo->ri_BatchSize >= 1);
 
-	Assert(resultRelInfo->ri_BatchSize >= 1);
+			resultRelInfo++;
+		}
+	}
 
 	/*
 	 * Lastly, if this is not the primary (canSetTag) ModifyTable node, add it


RE: POC: postgres_fdw insert batching

2021-01-20 Thread tsunakawa.ta...@fujitsu.com
From: Zhihong Yu 
> My first name is Zhihong.

> You can call me Ted if you want to save some typing :-)

Ah, I'm very sorry.  Thank you, let me call you Ted then; that way I can't
get it wrong.

Regards
Takayuki Tsunakawa




Re: Printing backtrace of postgres processes

2021-01-20 Thread Tom Lane
Craig Ringer  writes:
> On Wed, 20 Jan 2021 at 05:23, Tom Lane  wrote:
>> BTW, it also looks like the patch is doing nothing to prevent the
>> backtrace from being sent to the connected client.

> I don't see a good reason to send a bt to a client. Even though these
> backtraces won't be analysing debuginfo and populating args, locals, etc,
> it should still just go to the server log.

Yeah.  That's easier than I was thinking, we just need to
s/LOG/LOG_SERVER_ONLY/.
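
Roughly like this, I'd think (just a sketch; "errtrace" stands in for
whatever StringInfo the patch builds the trace in):

    ereport(LOG_SERVER_ONLY,
            (errmsg("current backtrace:%s", errtrace.data),
             errhidestmt(true),
             errhidecontext(true)));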

>> Maybe, given all of these things, we should forget using elog at
>> all and just emit the trace with fprintf(stderr).

> That causes quite a lot of pain with MemoryContextStats() already

True.  Given the changes discussed in the last couple messages, I don't
see any really killer reasons why we can't ship the trace through elog.
We can always try that first, and back off to fprintf if we do find
reasons why it's too unstable.

regards, tom lane




Re: POC: postgres_fdw insert batching

2021-01-20 Thread Amit Langote
On Thu, Jan 21, 2021 at 10:42 AM Tomas Vondra
 wrote:
> On 1/21/21 2:24 AM, Amit Langote wrote:
> > On Thu, Jan 21, 2021 at 9:56 AM Tomas Vondra
> >  wrote:
> >> On 1/21/21 1:17 AM, Zhihong Yu wrote:
> >>> Hi,
> >>> The assignment to resultRelInfo is done when junk_filter_needed is true:
> >>>
> >>>   if (junk_filter_needed)
> >>>   {
> >>>   resultRelInfo = mtstate->resultRelInfo;
> >>>
> >>> Should the code for determining batch size access mtstate->resultRelInfo
> >>> directly ?
> >>>
> >>
> >> IMO the issue is that code iterates over all plans and moves to the next
> >> for each one:
> >>
> >>   resultRelInfo++;
> >>
> >> so it ends up pointing past the last element, hence the failures. So
> >> yeah, either the code needs to move before the loop (per my patch), or
> >> we need to access mtstate->resultRelInfo directly.
> >
> > Accessing mtstate->resultRelInfo directly would do.  The only
> > constraint on where this block should be placed is that
> > ri_projectReturning must be valid as of calling
> > GetForeignModifyBatchSize(), as Tsunakawa-san pointed out upthread.
> > So, after this block in ExecInitModifyTable:
> >
> >  /*
> >   * Initialize RETURNING projections if needed.
> >   */
> >  if (node->returningLists)
> >  {
> >  
> >  /*
> >   * Build a projection for each result rel.
> >   */
> >  resultRelInfo = mtstate->resultRelInfo;
> >  foreach(l, node->returningLists)
> >  {
> >  List   *rlist = (List *) lfirst(l);
> >
> >  resultRelInfo->ri_returningList = rlist;
> >  resultRelInfo->ri_projectReturning =
> >  ExecBuildProjectionInfo(rlist, econtext, slot, &mtstate->ps,
> >  resultRelInfo->ri_RelationDesc->rd_att);
> >  resultRelInfo++;
> >  }
> >  }
> >
>
> Right. But I think Tom is right this should initialize ri_BatchSize for
> all the resultRelInfo elements, not just the first one. Per the attached
> patch, which resolves the issue both on x86_64 and armv7l for me.

+1 in general.  To avoid looping uselessly in the case of
UPDATE/DELETE where batching can't be used today, I'd suggest putting
if (operation == CMD_INSERT) around the loop.

-- 
Amit Langote
EDB: http://www.enterprisedb.com




RE: POC: postgres_fdw insert batching

2021-01-20 Thread tsunakawa.ta...@fujitsu.com
From: Tomas Vondra 
> Right. But I think Tom is right this should initialize ri_BatchSize for all 
> the
> resultRelInfo elements, not just the first one. Per the attached patch, which
> resolves the issue both on x86_64 and armv7l for me.

I think your patch is perfect in the sense that it's ready for the future
multi-target DML support.  +1

Just for learning, could anyone tell me what this loop is for?  I thought current
Postgres's DML supports a single target table, so it's enough to handle the
first element of mtstate->resultRelInfo.  In that sense, Amit-san and I agreed
not to put the if block inside the for loop yet.


Regards
Takayuki Tsunakawa



Re: POC: postgres_fdw insert batching

2021-01-20 Thread Zhihong Yu
Hi, Takayuki-san:
My first name is Zhihong.

You can call me Ted if you want to save some typing :-)

Cheers

On Wed, Jan 20, 2021 at 5:37 PM tsunakawa.ta...@fujitsu.com <
tsunakawa.ta...@fujitsu.com> wrote:

> From: Zhihong Yu 
>
> > Do we need to consider how this part of code inside
> ExecInitModifyTable() would evolve ?
>
>
>
> > I think placing the compound condition toward the end of
> ExecInitModifyTable() is reasonable because it checks the latest
> information.
>
>
>
> +1 for Zhihong-san's idea.  But instead of rewriting every resultRelInfo
> to mtstate->resultRelInfo, which makes it a bit harder to read, I'd like to
> suggest just adding "resultRelInfo = mtstate->resultRelInfo;" immediately
> before the if block.
>
>
>
> Thanks a lot, all for helping to solve the problem quickly!
>
>
>
>
>
> Regards
>
> Takayuki Tsunakawa
>
>
>
>


RE: Parallel INSERT (INTO ... SELECT ...)

2021-01-20 Thread Hou, Zhijie
> 
> Thanks for the feedback.
> Posting an updated set of patches. Changes are based on feedback, as detailed
> below:
Hi

It seems there are some previous comments [1][2] that are not addressed in the current patch.
Just making sure they don't get missed.

[1]
https://www.postgresql.org/message-id/77e1c06ffb2240838e5fc94ec8dcb7d3%40G08CNEXMBPEKD05.g08.fujitsu.local

[2]
https://www.postgresql.org/message-id/CAA4eK1LMmz58ej5BgVLJ8VsUGd%3D%2BKcaA8X%3DkStORhxpfpODOxg%40mail.gmail.com

Best regards,
houzj




Re: POC: postgres_fdw insert batching

2021-01-20 Thread Tomas Vondra




On 1/21/21 2:22 AM, Tom Lane wrote:

Tomas Vondra  writes:

I may be wrong, but the most likely explanation seems to be this is due
to the junk filter initialization, which simply moves past the end of
the mtstate->resultRelInfo array.


resultRelInfo is certainly pointing at garbage at that point.



Yup. It's pretty amazing the x86 machines seem to be mostly OK with it.


It kinda seems the GetForeignModifyBatchSize call should happen before
that block. The attached patch fixes this for me (i.e. regression tests
pass with no valgrind reports.



Or did I get that wrong?


Don't we need to initialize ri_BatchSize for *each* resultrelinfo,
not merely the first one?  That is, this new code needs to be
somewhere inside a loop over the result rels.



Yeah, I think you're right. That's an embarrassing oversight :-(


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Printing backtrace of postgres processes

2021-01-20 Thread Craig Ringer
On Wed, 20 Jan 2021 at 05:23, Tom Lane  wrote:

>
> Recursion is scary, but it should (I think) not be possible if this
> is driven off CHECK_FOR_INTERRUPTS.  There will certainly be none
> of those in libbacktrace.
>

We can also hold interrupts for the call, and it might be wise to do so.

> One point here is that it might be a good idea to suppress elog.c's
> calls to functions in the error context stack.  As we saw in another
> connection recently, allowing that to happen makes for a *very*
> large increase in the footprint of code that you are expecting to
> work at any random CHECK_FOR_INTERRUPTS call site.
>

I strongly agree. Treat it as errhidecontext().

> BTW, it also looks like the patch is doing nothing to prevent the
> backtrace from being sent to the connected client.  I'm not sure
> what I think about whether it'd be okay from a security standpoint
> to do that on the connection that requested the trace, but I sure
> as heck don't want it to happen on connections that didn't.


I don't see a good reason to send a bt to a client. Even though these
backtraces won't be analysing debuginfo and populating args, locals, etc,
it should still just go to the server log.


> Maybe, given all of these things, we should forget using elog at
> all and just emit the trace with fprintf(stderr).


That causes quite a lot of pain with MemoryContextStats() already as it's
frequently difficult to actually locate the output given the variations
that exist in customer logging configurations. Sometimes stderr goes to a
separate file or to journald. It's also much harder to locate the desired
output since there's no log_line_prefix. I have a WIP patch floating around
somewhere that tries to teach MemoryContextStats to write to the ereport
channel when not called during an actual out-of-memory situation for that
reason; an early version is somewhere in the archives.

This is one of those "ok in development, painful in production" situations.

So I'm not a big fan of pushing it to stderr, though I'd rather have that
than not have the ability at all.


Re: POC: postgres_fdw insert batching

2021-01-20 Thread Tomas Vondra



On 1/21/21 2:24 AM, Amit Langote wrote:

On Thu, Jan 21, 2021 at 9:56 AM Tomas Vondra
 wrote:

On 1/21/21 1:17 AM, Zhihong Yu wrote:

Hi,
The assignment to resultRelInfo is done when junk_filter_needed is true:

  if (junk_filter_needed)
  {
  resultRelInfo = mtstate->resultRelInfo;

Should the code for determining batch size access mtstate->resultRelInfo
directly ?



IMO the issue is that code iterates over all plans and moves to the next
for each one:

  resultRelInfo++;

so it ends up pointing past the last element, hence the failures. So
yeah, either the code needs to move before the loop (per my patch), or
we need to access mtstate->resultRelInfo directly.


Accessing mtstate->resultRelInfo directly would do.  The only
constraint on where this block should be placed is that
ri_projectReturning must be valid as of calling
GetForeignModifyBatchSize(), as Tsunakawa-san pointed out upthread.
So, after this block in ExecInitModifyTable:

 /*
  * Initialize RETURNING projections if needed.
  */
 if (node->returningLists)
 {
 
 /*
  * Build a projection for each result rel.
  */
 resultRelInfo = mtstate->resultRelInfo;
 foreach(l, node->returningLists)
 {
 List   *rlist = (List *) lfirst(l);

 resultRelInfo->ri_returningList = rlist;
 resultRelInfo->ri_projectReturning =
 ExecBuildProjectionInfo(rlist, econtext, slot, &mtstate->ps,
                         resultRelInfo->ri_RelationDesc->rd_att);
 resultRelInfo++;
 }
 }



Right. But I think Tom is right this should initialize ri_BatchSize for 
all the resultRelInfo elements, not just the first one. Per the attached 
patch, which resolves the issue both on x86_64 and armv7l for me.



regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 9c36860704..10febcae8a 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -2798,17 +2798,23 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	 * size (a FDW may support batching, but it may be disabled for the
 	 * server/table).
 	 */
-	if (!resultRelInfo->ri_usesFdwDirectModify &&
-		operation == CMD_INSERT &&
-		resultRelInfo->ri_FdwRoutine != NULL &&
-		resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize &&
-		resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert)
-		resultRelInfo->ri_BatchSize =
-			resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize(resultRelInfo);
-	else
-		resultRelInfo->ri_BatchSize = 1;
+	resultRelInfo = mtstate->resultRelInfo;
+	for (i = 0; i < nplans; i++)
+	{
+		if (!resultRelInfo->ri_usesFdwDirectModify &&
+			operation == CMD_INSERT &&
+			resultRelInfo->ri_FdwRoutine != NULL &&
+			resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize &&
+			resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert)
+			resultRelInfo->ri_BatchSize =
+resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize(resultRelInfo);
+		else
+			resultRelInfo->ri_BatchSize = 1;
+
+		Assert(resultRelInfo->ri_BatchSize >= 1);
 
-	Assert(resultRelInfo->ri_BatchSize >= 1);
+		resultRelInfo++;
+	}
 
 	/*
 	 * Lastly, if this is not the primary (canSetTag) ModifyTable node, add it


Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly

2021-01-20 Thread Michael Paquier
On Wed, Jan 20, 2021 at 03:34:39PM -0300, Alvaro Herrera wrote:
> On 2021-Jan-20, Alexey Kondratov wrote:
>> Ugh, forgot to attach the patches. Here they are.
> 
> Yeah, looks reasonable.

Patch 0002 still has a whole set of issues as I pointed out a couple
of hours ago, but if we agree on 0001 as being useful if done
independently, I'd rather get that done first.  This way or just
merging both things in a single commit is not a big deal seeing the
amount of code, so I am fine with any approach.  It may be possible
that 0001 requires more changes depending on the work to-be-done for
0002 though?

>> +/* No work if no change in tablespace. */
>> +oldTablespaceOid = rd_rel->reltablespace;
>> +if (tablespaceOid != oldTablespaceOid ||
>> +(tablespaceOid == MyDatabaseTableSpace && 
>> OidIsValid(oldTablespaceOid)))
>> +{
>> +/* Update the pg_class row. */
>> +rd_rel->reltablespace = (tablespaceOid == MyDatabaseTableSpace) 
>> ?
>> +InvalidOid : tablespaceOid;
>> +CatalogTupleUpdate(pg_class, >t_self, tuple);
>> +
>> +changed = true;
>> +}
>> +
>> +if (changed)
>> +/* Record dependency on tablespace */
>> +changeDependencyOnTablespace(RelationRelationId,
>> + 
>> reloid, rd_rel->reltablespace);
> 
> Why have a separate "if (changed)" block here instead of merging with
> the above?

Yep.

+   if (SetRelTablespace(reloid, newTableSpace))
+   /* Make sure the reltablespace change is visible */
+   CommandCounterIncrement();
At quick glance, I am wondering why you just don't do a CCI within
SetRelTablespace().
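
As a sketch, using the names from the quoted hunks (SetRelTablespace() is the
patch's helper, so this is only how I'd imagine its tail looking):

    /* in SetRelTablespace(), after updating the pg_class row */
    if (changed)
    {
        /* Record dependency on tablespace */
        changeDependencyOnTablespace(RelationRelationId,
                                     reloid, rd_rel->reltablespace);

        /* Make the reltablespace change visible to later commands */
        CommandCounterIncrement();
    }

    return changed;

That way callers don't need to remember the CCI themselves.
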
--
Michael




RE: POC: postgres_fdw insert batching

2021-01-20 Thread tsunakawa.ta...@fujitsu.com
From: Zhihong Yu 
> Do we need to consider how this part of code inside ExecInitModifyTable() 
> would evolve ?

> I think placing the compound condition toward the end of 
> ExecInitModifyTable() is reasonable because it checks the latest information.

+1 for Zhihong-san's idea.  But instead of rewriting every resultRelInfo to
mtstate->resultRelInfo, which makes it a bit harder to read, I'd like to 
suggest just adding "resultRelInfo = mtstate->resultRelInfo;" immediately 
before the if block.

Thanks a lot, all for helping to solve the problem quickly!


Regards
Takayuki Tsunakawa



Re: Printing backtrace of postgres processes

2021-01-20 Thread Craig Ringer
On Wed, 20 Jan 2021 at 01:31, Robert Haas  wrote:

> On Sat, Jan 16, 2021 at 3:21 PM Tom Lane  wrote:
> > I'd argue that backtraces for those processes aren't really essential,
> > and indeed that trying to make the syslogger report its own backtrace
> > is damn dangerous.
>
> I agree. Ideally I'd like to be able to use the same mechanism
> everywhere and include those processes too, but surely regular
> backends and parallel workers are going to be the things that come up
> most often.
>
> > (Personally, I think this whole patch fails the safety-vs-usefulness
> > tradeoff, but I expect I'll get shouted down.)
>
> You and I are frequently on opposite sides of these kinds of
> questions, but I think this is a closer call than many cases. I'm
> convinced that it's useful, but I'm not sure whether it's safe. On the
> usefulness side, backtraces are often the only way to troubleshoot
> problems that occur on production systems. I wish we had better
> logging and tracing tools instead of having to ask for this sort of
> thing, but we don't.


Agreed.

In theory we should be able to do this sort of thing using external trace
and diagnostic tools like perf, systemtap, etc. In practice, these tools
tend to be quite version-sensitive and hard to get right without multiple
rounds of back-and-forth to deal with specifics of the site's setup,
installed debuginfo or lack thereof, specific tool versions, etc.

It's quite common to have to fall back on attaching gdb with a breakpoint
on a function in the export symbol table (so it works w/o debuginfo),
saving a core, and then analysing the core on a separate system on which
debuginfo is available for all the loaded modules. It's a major pain.

The ability to get a basic bt from within Pg is strongly desirable. IIRC
gdb's basic unwinder works without external debuginfo, if not especially
well. libunwind produces much better results, but that didn't pass the
extra-dependency bar when backtracing support was introduced to core
postgres.

On a side note, to help get better diagnostics I've also been meaning to
look into building --enable-debug with -ggdb3 so we can embed macro info,
and using dwz to deduplicate+compress the debuginfo so we can encourage
people to install it by default on production. I also want to start
exporting pointers to all the important data symbols for diagnostic use,
even if we do so in a separate ELF section just for debug use.


Re: strange error reporting

2021-01-20 Thread Tom Lane
Robert Haas  writes:
>>> Maybe it would be better if it said:
>>> connection to database at socket "/tmp/.s.PGSQL.5432" failed: FATAL:
>>> database "rhaas" does not exist

>> I'd be inclined to spell it "connection to server at ... failed",
>> but that sort of wording is surely also possible.

> "connection to server" rather than "connection to database" works for
> me; in fact, I think I like it slightly better.

If I don't hear any other opinions, I'll change these messages to

"connection to server at socket \"%s\" failed: "
"connection to server at \"%s\" (%s), port %s failed: "

(or maybe "server on socket"?  "at" sounds right for the IP address
case, but it feels a little off in the socket pathname case.)

regards, tom lane




Re: POC: postgres_fdw insert batching

2021-01-20 Thread Amit Langote
On Thu, Jan 21, 2021 at 9:56 AM Tomas Vondra
 wrote:
> On 1/21/21 1:17 AM, Zhihong Yu wrote:
> > Hi,
> > The assignment to resultRelInfo is done when junk_filter_needed is true:
> >
> >  if (junk_filter_needed)
> >  {
> >  resultRelInfo = mtstate->resultRelInfo;
> >
> > Should the code for determining batch size access mtstate->resultRelInfo
> > directly ?
> >
>
> IMO the issue is that code iterates over all plans and moves to the next
> for each one:
>
>  resultRelInfo++;
>
> so it ends up pointing past the last element, hence the failures. So
> yeah, either the code needs to move before the loop (per my patch), or
> we need to access mtstate->resultRelInfo directly.

Accessing mtstate->resultRelInfo directly would do.  The only
constraint on where this block should be placed is that
ri_projectReturning must be valid as of calling
GetForeignModifyBatchSize(), as Tsunakawa-san pointed out upthread.
So, after this block in ExecInitModifyTable:

/*
 * Initialize RETURNING projections if needed.
 */
if (node->returningLists)
{

/*
 * Build a projection for each result rel.
 */
resultRelInfo = mtstate->resultRelInfo;
foreach(l, node->returningLists)
{
List   *rlist = (List *) lfirst(l);

resultRelInfo->ri_returningList = rlist;
resultRelInfo->ri_projectReturning =
ExecBuildProjectionInfo(rlist, econtext, slot, &mtstate->ps,
                        resultRelInfo->ri_RelationDesc->rd_att);
resultRelInfo++;
}
}

-- 
Amit Langote
EDB: http://www.enterprisedb.com




Re: POC: postgres_fdw insert batching

2021-01-20 Thread Tom Lane
Tomas Vondra  writes:
> I may be wrong, but the most likely explanation seems to be this is due 
> to the junk filter initialization, which simply moves past the end of 
> the mtstate->resultRelInfo array.

resultRelInfo is certainly pointing at garbage at that point.

> It kinda seems the GetForeignModifyBatchSize call should happen before 
> that block. The attached patch fixes this for me (i.e. regression tests 
> pass with no valgrind reports).

> Or did I get that wrong?

Don't we need to initialize ri_BatchSize for *each* resultrelinfo,
not merely the first one?  That is, this new code needs to be
somewhere inside a loop over the result rels.

regards, tom lane




Re: POC: postgres_fdw insert batching

2021-01-20 Thread Zhihong Yu
Hi,
Do we need to consider how this part of code inside ExecInitModifyTable()
would evolve ?

I think placing the compound condition toward the end
of ExecInitModifyTable() is reasonable because it checks the latest
information.

Regards

On Wed, Jan 20, 2021 at 5:11 PM Tomas Vondra 
wrote:

>
>
> On 1/21/21 2:02 AM, Zhihong Yu wrote:
> > Hi, Tomas:
> > In my opinion, my patch is a little better.
> > Suppose one of the conditions in the if block changes in between the
> > start of loop and the end of the loop:
> >
> >   * Determine if the FDW supports batch insert and determine the
> batch
> >   * size (a FDW may support batching, but it may be disabled for the
> >   * server/table).
> >
> > My patch would reflect that change. I guess this was the reason the if /
> > else block was placed there in the first place.
> >
>
> But can it change? All the loop does is extracting junk attributes from
> the plans, it does not modify anything related to the batching. Or maybe
> I just don't understand what you mean.
>
>
> regards
>
> --
> Tomas Vondra
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>


Re: POC: postgres_fdw insert batching

2021-01-20 Thread Tomas Vondra




On 1/21/21 2:02 AM, Zhihong Yu wrote:

Hi, Tomas:
In my opinion, my patch is a little better.
Suppose one of the conditions in the if block changes in between the 
start of loop and the end of the loop:


      * Determine if the FDW supports batch insert and determine the batch
      * size (a FDW may support batching, but it may be disabled for the
      * server/table).

My patch would reflect that change. I guess this was the reason the if / 
else block was placed there in the first place.




But can it change? All the loop does is extracting junk attributes from 
the plans, it does not modify anything related to the batching. Or maybe 
I just don't understand what you mean.



regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: POC: postgres_fdw insert batching

2021-01-20 Thread Zhihong Yu
Hi, Tomas:
In my opinion, my patch is a little better.
Suppose one of the conditions in the if block changes in between the start
of loop and the end of the loop:

 * Determine if the FDW supports batch insert and determine the batch
 * size (a FDW may support batching, but it may be disabled for the
 * server/table).

My patch would reflect that change. I guess this was the reason the if /
else block was placed there in the first place.

Cheers

On Wed, Jan 20, 2021 at 4:56 PM Tomas Vondra 
wrote:

>
>
> On 1/21/21 1:17 AM, Zhihong Yu wrote:
> > Hi,
> > The assignment to resultRelInfo is done when junk_filter_needed is true:
> >
> >  if (junk_filter_needed)
> >  {
> >  resultRelInfo = mtstate->resultRelInfo;
> >
> > Should the code for determining batch size access mtstate->resultRelInfo
> > directly ?
> >
>
> IMO the issue is that code iterates over all plans and moves to the next
> for each one:
>
>  resultRelInfo++;
>
> so it ends up pointing past the last element, hence the failures. So
> yeah, either the code needs to move before the loop (per my patch), or
> we need to access mtstate->resultRelInfo directly.
>
> I'm pretty amazed this did not crash during any of the many regression
> runs I did recently.
>
> regards
>
> --
> Tomas Vondra
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>


Re: POC: postgres_fdw insert batching

2021-01-20 Thread Tomas Vondra




On 1/21/21 1:17 AM, Zhihong Yu wrote:

Hi,
The assignment to resultRelInfo is done when junk_filter_needed is true:

         if (junk_filter_needed)
         {
             resultRelInfo = mtstate->resultRelInfo;

Should the code for determining batch size access mtstate->resultRelInfo 
directly ?




IMO the issue is that code iterates over all plans and moves to the next 
for each one:


resultRelInfo++;

so it ends up pointing past the last element, hence the failures. So 
yeah, either the code needs to move before the loop (per my patch), or 
we need to access mtstate->resultRelInfo directly.


I'm pretty amazed this did not crash during any of the many regression 
runs I did recently.


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: POC: postgres_fdw insert batching

2021-01-20 Thread Tomas Vondra

On 1/21/21 12:59 AM, Tom Lane wrote:

Tomas Vondra  writes:

OK, pushed after a little bit of additional polishing (mostly comments).
Thanks everyone!


florican reports this is seriously broken on 32-bit hardware:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=florican=2021-01-20%2023%3A08%3A15

First guess is incorrect memory-allocation computations ...



I know, although it seems more like an access to uninitialized memory.
I've already posted a patch that resolves that for me on 64-bits (per 
valgrind, I suppose it's the same issue).


I'm working on reproducing it on 32-bits, hopefully it won't take long.


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Printing LSN made easy

2021-01-20 Thread Kyotaro Horiguchi
At Wed, 20 Jan 2021 16:40:59 +0900, Michael Paquier  wrote 
in 
> On Wed, Jan 20, 2021 at 07:25:37AM +0100, Peter Eisentraut wrote:
> > It looks like we are not getting any consensus on this approach.  One
> > reduced version I would consider is just the second part, so you'd write
> > something like
> > 
> > snprintf(lsnchar, sizeof(lsnchar), "%X/%X",
> >  LSN_FORMAT_ARGS(lsn));
> > 
> > This would still reduce notational complexity quite a bit but avoid any
> > funny business with the format strings.
> 
> That seems reasonable to me.  So +1.

That seems like a good balance. +1, too.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: POC: postgres_fdw insert batching

2021-01-20 Thread Tomas Vondra

On 1/21/21 12:52 AM, Tomas Vondra wrote:

Hmm, seems that florican doesn't like this :-(

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=florican=2021-01-20%2023%3A08%3A15 



It's a i386 machine running FreeBSD, so not sure what exactly it's picky 
about. But when I tried running this under valgrind, I get some strange 
failures in the new chunk in ExecInitModifyTable:


   /*
    * Determine if the FDW supports batch insert and determine the batch
    * size (a FDW may support batching, but it may be disabled for the
    * server/table).
    */
   if (!resultRelInfo->ri_usesFdwDirectModify &&
   operation == CMD_INSERT &&
   resultRelInfo->ri_FdwRoutine != NULL &&
   resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize &&
   resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert)
   resultRelInfo->ri_BatchSize =

resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize(resultRelInfo);
   else
   resultRelInfo->ri_BatchSize = 1;

   Assert(resultRelInfo->ri_BatchSize >= 1);

It seems as if the resultRelInfo is not initialized, or something like 
that. I wouldn't be surprised if the 32-bit machine was pickier and 
failing because of that.


A sample of the valgrind log is attached. It's pretty much just 
repetitions of these three reports.




OK, it's definitely accessing uninitialized memory, because the 
resultRelInfo (on line 2801, i.e. the "if" condition) looks like this:


(gdb) p resultRelInfo
$1 = (ResultRelInfo *) 0xe595988
(gdb) p *resultRelInfo
$2 = {type = 2139062142, ri_RangeTableIndex = 2139062143, 
ri_RelationDesc = 0x7f7f7f7f7f7f7f7f, ri_NumIndices = 2139062143, 
ri_IndexRelationDescs = 0x7f7f7f7f7f7f7f7f, ri_IndexRelationInfo = 
0x7f7f7f7f7f7f7f7f,
  ri_TrigDesc = 0x7f7f7f7f7f7f7f7f, ri_TrigFunctions = 
0x7f7f7f7f7f7f7f7f, ri_TrigWhenExprs = 0x7f7f7f7f7f7f7f7f, 
ri_TrigInstrument = 0x7f7f7f7f7f7f7f7f, ri_ReturningSlot = 
0x7f7f7f7f7f7f7f7f, ri_TrigOldSlot = 0x7f7f7f7f7f7f7f7f,
  ri_TrigNewSlot = 0x7f7f7f7f7f7f7f7f, ri_FdwRoutine = 
0x7f7f7f7f7f7f7f7f, ri_FdwState = 0x7f7f7f7f7f7f7f7f, 
ri_usesFdwDirectModify = 127, ri_NumSlots = 2139062143, ri_BatchSize = 
2139062143, ri_Slots = 0x7f7f7f7f7f7f7f7f,
  ri_PlanSlots = 0x7f7f7f7f7f7f7f7f, ri_WithCheckOptions = 
0x7f7f7f7f7f7f7f7f, ri_WithCheckOptionExprs = 0x7f7f7f7f7f7f7f7f, 
ri_ConstraintExprs = 0x7f7f7f7f7f7f7f7f, ri_GeneratedExprs = 
0x7f7f7f7f7f7f7f7f,
  ri_NumGeneratedNeeded = 2139062143, ri_junkFilter = 
0x7f7f7f7f7f7f7f7f, ri_returningList = 0x7f7f7f7f7f7f7f7f, 
ri_projectReturning = 0x7f7f7f7f7f7f7f7f, ri_onConflictArbiterIndexes = 
0x7f7f7f7f7f7f7f7f,
  ri_onConflict = 0x7f7f7f7f7f7f7f7f, ri_PartitionCheckExpr = 
0x7f7f7f7f7f7f7f7f, ri_PartitionRoot = 0x7f7f7f7f7f7f7f7f, 
ri_RootToPartitionMap = 0x8, ri_PartitionTupleSlot = 0x8, 
ri_ChildToRootMap = 0xe5952b0,

  ri_CopyMultiInsertBuffer = 0xe596740}
(gdb)

I may be wrong, but the most likely explanation seems to be this is due 
to the junk filter initialization, which simply moves past the end of 
the mtstate->resultRelInfo array.


It kinda seems the GetForeignModifyBatchSize call should happen before 
that block. The attached patch fixes this for me (i.e. regression tests 
pass with no valgrind reports).


Or did I get that wrong?

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 9c36860704..2ac4999dc8 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -2672,6 +2672,23 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 		}
 	}
 
+	/*
+	 * Determine if the FDW supports batch insert and determine the batch
+	 * size (a FDW may support batching, but it may be disabled for the
+	 * server/table).
+	 */
+	if (!resultRelInfo->ri_usesFdwDirectModify &&
+		operation == CMD_INSERT &&
+		resultRelInfo->ri_FdwRoutine != NULL &&
+		resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize &&
+		resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert)
+		resultRelInfo->ri_BatchSize =
+			resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize(resultRelInfo);
+	else
+		resultRelInfo->ri_BatchSize = 1;
+
+	Assert(resultRelInfo->ri_BatchSize >= 1);
+
 	/* select first subplan */
 	mtstate->mt_whichplan = 0;
 	subplan = (Plan *) linitial(node->plans);
@@ -2793,23 +2810,6 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 		}
 	}
 
-	/*
-	 * Determine if the FDW supports batch insert and determine the batch
-	 * size (a FDW may support batching, but it may be disabled for the
-	 * server/table).
-	 */
-	if (!resultRelInfo->ri_usesFdwDirectModify &&
-		operation == CMD_INSERT &&
-		resultRelInfo->ri_FdwRoutine != NULL &&
-		resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize &&
-		resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert)
-		resultRelInfo->ri_BatchSize =
-			resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize(resultRelInfo);

Heap's backwards scan scans the incorrect pages with heap_setscanlimits()

2021-01-20 Thread David Rowley
Hackers,

It looks like both heapgettup() and heapgettup_pagemode() are coded
incorrectly when setting the page to start the scan on for a backwards
scan when heap_setscanlimits() has been used.

It looks like the code was not updated during 7516f5259.

The current code is:

/* start from last page of the scan */
if (scan->rs_startblock > 0)
page = scan->rs_startblock - 1;
else
page = scan->rs_nblocks - 1;


Where rs_startblock is either the sync scan start location, or the
start page set by heap_setscanlimits().  rs_nblocks is the number of
blocks in the relation.

Let's say we have a 100 block relation and we want to scan blocks 10
to 30 in a backwards order. We'll do heap_setscanlimits(scan, 10, 21);
to indicate that we want to scan 21 blocks starting at page 10 and
finishing after scanning page 30.

What the code above does is wrong.  Since rs_startblock is > 0 we'll execute:

page = scan->rs_startblock - 1;

i.e. 9, then proceed to scan backwards from block 9, wrapping around to the
end of the relation once we hit block 0.  Oops. Not quite the 10 to 30 that
we asked for.

Now, it does not appear that there are any live bugs here, in core at
least.  The only usage I see of heap_setscanlimits() is in
heapam_index_build_range_scan() to which I see the scan is a forward
scan. I only noticed the bug as I'm in the middle of fixing up [1] to
implement backwards TID Range scans.

Proposed patch attached.
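
As a sanity check on the new formula, using the example above: with
rs_startblock = 10, rs_numblocks = 21 and rs_nblocks = 100 we get
page = (10 + 21 - 1) % 100 = 30, so the backwards scan now correctly starts
at block 30 and works down to block 10.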

Since this is not a live bug, is it worth a backpatch?  I guess some
extensions could suffer from this; I'm just not sure how likely that is,
since if anyone were doing a backwards scan with scan limits set, they'd
surely have noticed this already!?

David

[1] 
https://postgr.es/m/CAMyN-kB-nFTkF=va_jpwfno08s0d-yk0f741s2b7ldmyai8...@mail.gmail.com
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index faffbb1865..ddd214b7af 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -603,11 +603,15 @@ heapgettup(HeapScanDesc scan,
 * forward scanners.
 */
scan->rs_base.rs_flags &= ~SO_ALLOW_SYNC;
-   /* start from last page of the scan */
-   if (scan->rs_startblock > 0)
-   page = scan->rs_startblock - 1;
+
+		/*
+		 * Start from last page of the scan.  Ensure we take into account
+		 * rs_numblocks if it's been adjusted by heap_setscanlimits().
+		 */
+		if (scan->rs_numblocks != InvalidBlockNumber)
+			page = (scan->rs_startblock + scan->rs_numblocks - 1) % scan->rs_nblocks;
 		else
-			page = scan->rs_nblocks - 1;
+			page = (scan->rs_startblock + scan->rs_nblocks - 1) % scan->rs_nblocks;
heapgetpage((TableScanDesc) scan, page);
}
else
@@ -918,11 +922,15 @@ heapgettup_pagemode(HeapScanDesc scan,
 * forward scanners.
 */
scan->rs_base.rs_flags &= ~SO_ALLOW_SYNC;
-   /* start from last page of the scan */
-   if (scan->rs_startblock > 0)
-   page = scan->rs_startblock - 1;
+
+		/*
+		 * Start from last page of the scan.  Ensure we take into account
+		 * rs_numblocks if it's been adjusted by heap_setscanlimits().
+		 */
+		if (scan->rs_numblocks != InvalidBlockNumber)
+			page = (scan->rs_startblock + scan->rs_numblocks - 1) % scan->rs_nblocks;
 		else
-			page = scan->rs_nblocks - 1;
+			page = (scan->rs_startblock + scan->rs_nblocks - 1) % scan->rs_nblocks;
heapgetpage((TableScanDesc) scan, page);
}
else


Re: POC: postgres_fdw insert batching

2021-01-20 Thread Zhihong Yu
Hi,
The assignment to resultRelInfo is done when junk_filter_needed is true:

if (junk_filter_needed)
{
resultRelInfo = mtstate->resultRelInfo;

Should the code for determining batch size access mtstate->resultRelInfo
directly ?

diff --git a/src/backend/executor/nodeModifyTable.c
b/src/backend/executor/nodeModifyTable.c
index 9c36860704..a6a814454d 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -2798,17 +2798,17 @@ ExecInitModifyTable(ModifyTable *node, EState
*estate, int eflags)
  * size (a FDW may support batching, but it may be disabled for the
  * server/table).
  */
-if (!resultRelInfo->ri_usesFdwDirectModify &&
+if (!mtstate->resultRelInfo->ri_usesFdwDirectModify &&
 operation == CMD_INSERT &&
-resultRelInfo->ri_FdwRoutine != NULL &&
-resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize &&
-resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert)
-resultRelInfo->ri_BatchSize =
-resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize(resultRelInfo);
+mtstate->resultRelInfo->ri_FdwRoutine != NULL &&
+mtstate->resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize &&
+mtstate->resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert)
+mtstate->resultRelInfo->ri_BatchSize =
+mtstate->resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize(mtstate->resultRelInfo);
 else
-resultRelInfo->ri_BatchSize = 1;
+mtstate->resultRelInfo->ri_BatchSize = 1;

-Assert(resultRelInfo->ri_BatchSize >= 1);
+Assert(mtstate->resultRelInfo->ri_BatchSize >= 1);

 /*
  * Lastly, if this is not the primary (canSetTag) ModifyTable node,
add it

Cheers

On Wed, Jan 20, 2021 at 3:52 PM Tomas Vondra 
wrote:

> Hmm, seems that florican doesn't like this :-(
>
>
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=florican=2021-01-20%2023%3A08%3A15
>
> It's a i386 machine running FreeBSD, so not sure what exactly it's picky
> about. But when I tried running this under valgrind, I get some strange
> failures in the new chunk in ExecInitModifyTable:
>
>/*
> * Determine if the FDW supports batch insert and determine the batch
> * size (a FDW may support batching, but it may be disabled for the
> * server/table).
> */
>if (!resultRelInfo->ri_usesFdwDirectModify &&
>operation == CMD_INSERT &&
>resultRelInfo->ri_FdwRoutine != NULL &&
>resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize &&
>resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert)
>resultRelInfo->ri_BatchSize =
>
> resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize(resultRelInfo);
>else
>resultRelInfo->ri_BatchSize = 1;
>
>Assert(resultRelInfo->ri_BatchSize >= 1);
>
> It seems as if the resultRelInfo is not initialized, or something like
> that. I wouldn't be surprised if the 32-bit machine was pickier and
> failing because of that.
>
> A sample of the valgrind log is attached. It's pretty much just
> repetitions of these three reports.
>
> regards
>
> --
> Tomas Vondra
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>


Re: POC: postgres_fdw insert batching

2021-01-20 Thread Tom Lane
Tomas Vondra  writes:
> OK, pushed after a little bit of additional polishing (mostly comments).
> Thanks everyone!

florican reports this is seriously broken on 32-bit hardware:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=florican=2021-01-20%2023%3A08%3A15

First guess is incorrect memory-allocation computations ...

regards, tom lane




Re: list of extended statistics on psql

2021-01-20 Thread Tatsuro Yamada

Hi Tomas and hackers,

On 2021/01/21 7:00, Tomas Vondra wrote:

I created patches and my test results on PG10, 11, 12, and 14 are fine.

   0001:
 - Fix query to use pg_statistic_ext only
 - Replace statuses "required" and "built" with "defined"
 - Remove the size columns
 - Fix document
 - Add schema name as a filter condition on the query

   0002:
 - Fix all results of \dX
 - Add new testcase by non-superuser

Please find attached files. :-D


Thanks, I've pushed this. I had to tweak the regression tests a bit, for two 
reasons:

1) to change user in regression tests, don't use \connect, but SET ROLE and 
RESET ROLE

2) roles in regression tests should use names with regress_ prefix



Thanks for reviewing many times and committing the feature!

I understood 1) and 2), and I'll keep them in mind when developing the next patch.
Also, if possible, could you add Justin to the commit message as a reviewer?
I revised the document partly based on his comments.

Finally, as extended statistics get used more widely, this feature becomes more useful.
I hope it helps DBAs. :-D


Thanks,
Tatsuro Yamada






Re: POC: postgres_fdw insert batching

2021-01-20 Thread Tomas Vondra

Hmm, seems that florican doesn't like this :-(

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=florican=2021-01-20%2023%3A08%3A15

It's a i386 machine running FreeBSD, so not sure what exactly it's picky 
about. But when I tried running this under valgrind, I get some strange 
failures in the new chunk in ExecInitModifyTable:


  /*
   * Determine if the FDW supports batch insert and determine the batch
   * size (a FDW may support batching, but it may be disabled for the
   * server/table).
   */
  if (!resultRelInfo->ri_usesFdwDirectModify &&
  operation == CMD_INSERT &&
  resultRelInfo->ri_FdwRoutine != NULL &&
  resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize &&
  resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert)
  resultRelInfo->ri_BatchSize =

resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize(resultRelInfo);
  else
  resultRelInfo->ri_BatchSize = 1;

  Assert(resultRelInfo->ri_BatchSize >= 1);

It seems as if the resultRelInfo is not initialized, or something like 
that. I wouldn't be surprised if the 32-bit machine was pickier and 
failing because of that.


A sample of the valgrind log is attached. It's pretty much just 
repetitions of these three reports.


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
==186005== Invalid read of size 1
==186005==at 0x759C12: ExecInitModifyTable (nodeModifyTable.c:2801)
==186005==by 0x720B35: ExecInitNode (execProcnode.c:174)
==186005==by 0x716F99: InitPlan (execMain.c:939)
==186005==by 0x715F47: standard_ExecutorStart (execMain.c:266)
==186005==by 0x715CD3: ExecutorStart (execMain.c:148)
==186005==by 0x9495B1: ProcessQuery (pquery.c:155)
==186005==by 0x94AF24: PortalRunMulti (pquery.c:1267)
==186005==by 0x94A4C8: PortalRun (pquery.c:779)
==186005==by 0x94438E: exec_simple_query (postgres.c:1240)
==186005==by 0x9485F5: PostgresMain (postgres.c:4394)
==186005==by 0x88D1D4: BackendRun (postmaster.c:4484)
==186005==by 0x88CB53: BackendStartup (postmaster.c:4206)
==186005==by 0x889205: ServerLoop (postmaster.c:1730)
==186005==by 0x888AD0: PostmasterMain (postmaster.c:1402)
==186005==by 0x78D466: main (main.c:209)
==186005==  Address 0xe594f78 is 1,864 bytes inside a block of size 8,192 alloc'd
==186005==at 0x483A809: malloc (vg_replace_malloc.c:307)
==186005==by 0xAFEFE6: AllocSetContextCreateInternal (aset.c:468)
==186005==by 0x7298F9: CreateExprContextInternal (execUtils.c:252)
==186005==by 0x7299DD: CreateExprContext (execUtils.c:302)
==186005==by 0x729C5E: ExecAssignExprContext (execUtils.c:481)
==186005==by 0x75D24D: ExecInitSeqScan (nodeSeqscan.c:147)
==186005==by 0x720BEF: ExecInitNode (execProcnode.c:207)
==186005==by 0x716F99: InitPlan (execMain.c:939)
==186005==by 0x715F47: standard_ExecutorStart (execMain.c:266)
==186005==by 0x715CD3: ExecutorStart (execMain.c:148)
==186005==by 0x949E64: PortalStart (pquery.c:505)
==186005==by 0x9442A2: exec_simple_query (postgres.c:1201)
==186005==by 0x9485F5: PostgresMain (postgres.c:4394)
==186005==by 0x88D1D4: BackendRun (postmaster.c:4484)
==186005==by 0x88CB53: BackendStartup (postmaster.c:4206)
==186005==by 0x889205: ServerLoop (postmaster.c:1730)
==186005==by 0x888AD0: PostmasterMain (postmaster.c:1402)
==186005==by 0x78D466: main (main.c:209)
==186005== 
{
   
   Memcheck:Addr1
   fun:ExecInitModifyTable
   fun:ExecInitNode
   fun:InitPlan
   fun:standard_ExecutorStart
   fun:ExecutorStart
   fun:ProcessQuery
   fun:PortalRunMulti
   fun:PortalRun
   fun:exec_simple_query
   fun:PostgresMain
   fun:BackendRun
   fun:BackendStartup
   fun:ServerLoop
   fun:PostmasterMain
   fun:main
}
==186005== Invalid write of size 4
==186005==at 0x759C74: ExecInitModifyTable (nodeModifyTable.c:2809)
==186005==by 0x720B35: ExecInitNode (execProcnode.c:174)
==186005==by 0x716F99: InitPlan (execMain.c:939)
==186005==by 0x715F47: standard_ExecutorStart (execMain.c:266)
==186005==by 0x715CD3: ExecutorStart (execMain.c:148)
==186005==by 0x9495B1: ProcessQuery (pquery.c:155)
==186005==by 0x94AF24: PortalRunMulti (pquery.c:1267)
==186005==by 0x94A4C8: PortalRun (pquery.c:779)
==186005==by 0x94438E: exec_simple_query (postgres.c:1240)
==186005==by 0x9485F5: PostgresMain (postgres.c:4394)
==186005==by 0x88D1D4: BackendRun (postmaster.c:4484)
==186005==by 0x88CB53: BackendStartup (postmaster.c:4206)
==186005==by 0x889205: ServerLoop (postmaster.c:1730)
==186005==by 0x888AD0: PostmasterMain (postmaster.c:1402)
==186005==by 0x78D466: main (main.c:209)
==186005==  Address 0xe594f80 is 1,872 bytes inside a block of size 8,192 alloc'd
==186005==at 0x483A809: malloc (vg_replace_malloc.c:307)
==186005==by 0xAFEFE6: AllocSetContextCreateInternal (aset.c:468)
==186005==by 0x7298F9: CreateExprContextInternal (execUtils.c:252)
==186005==by 

Re: [PATCH 1/1] Fix detection of pwritev support for OSX.

2021-01-20 Thread James Hilliard
On Wed, Jan 20, 2021 at 4:07 PM Tom Lane  wrote:
>
> James Hilliard  writes:
> > On Tue, Jan 19, 2021 at 6:37 PM Tom Lane  wrote:
> >> I've found no direct means to control the
> >> SDK path at all, but so far it appears that "xcrun --show-sdk-path"
> >> agrees with the compiler's default -isysroot path as seen in the
> >> compiler's -v output.  I suspect that this isn't coincidental,
> >> but reflects xcrun actually being used in the compiler launch
> >> process.  If it were to flip over to using a IOS SDK, that would
> >> mean that bare "cc" would generate nonfunctional executables,
> >> which just about any onlooker would agree is broken.
>
> > So there's some more weirdness involved here, whether or not you
> > have the command line install seems to affect the output of the
> > "xcrun --show-sdk-path" command, but not the
> > "xcrun --sdk macosx --show-sdk-path" command.
>
> Yeah, that's what we discovered in the other thread.  It seems that
> with "--sdk macosx" you'll always get a pointer to the (solitary)
> SDK under /Applications/Xcode.app, but with the short "xcrun
> --show-sdk-path" command you might get either that or a pointer to
> something under /Library/Developer/CommandLineTools.
>
> I now believe what is actually happening with the short command is
> that it's iterating through the available SDKs (according to some not
> very clear search path) and picking the first one it finds that
> matches the host system version.  That matches the ktrace evidence
> that shows it reading the SDKSettings.plist file in each SDK
> directory.  The fact that it can seize on either an actual directory
> or an equivalent symlink might be due to chance ordering of directory
> entries.  (It'd be interesting to see "ls -f" output for your
> /Library/Developer/CommandLineTools/SDKs directory ... though if

Well at the moment I completely deleted that directory...and the build
works fine with my patch still.

> you've been experimenting with deinstall/reinstall, there's no
> reason to suppose the entry order is still the same.)
>
> I'm not sure that the case of not having the "command line tools"
> installed is interesting for our purposes.  AFAIK you have to have
> that in order to have access to required tools like bison and gmake.
> (That reminds me, I was intending to add something to our docs
> about how-to-build-from-source to say that you need to install those.)

Yeah, not 100% sure but I was able to build just fine after deleting my
command line tools. I think it just switched to using the normal SDK
toolchain, I guess that's the fallback logic doing that.

It would be pretty annoying to have to install an outdated SDK just to
build postgres for no other reason than the autoconf feature detection
being broken.

>
> > Note that with my patch the binaries will always be compatible with the
> > host system by default, even if the SDK is capable of producing binaries
> > that are incompatible so building postgres works with and without the
> > command line tools SDK.
>
> Yeah.  I don't see that as a benefit actually.  Adding the
> -no_weak_imports linker switch (or the other one you're suggesting)
> means that you *cannot* cross-compile for a newer macOS version,
> even if you set PG_SYSROOT and/or MACOSX_DEPLOYMENT_TARGET with the
> intention of doing so.

Best I can tell this isn't true; I was able to cross-compile for a newer
MACOSX_DEPLOYMENT_TARGET than my build host just fine. The
binary fails with a "Symbol not found: _pwritev" error when I try
to run it on the system that built it.

In regards to the -no_weak_imports switch... that is something different,
from my understanding, as it just strips the weak imports, forcing the
fallback code paths to be taken instead, essentially functioning as if
the weak symbols were never available. It's largely separate from the
deployment target, from my understanding, as weak symbols are a feature
that lets you use newer syscalls while still providing backwards-compatible
fallbacks for older systems.

> You'll still get a build that reflects the set
> of kernel calls available on the host system.  Admittedly, this is a
> case that's not likely to be of interest to very many people, but
> I don't see why a method with that restriction is superior to picking
> a default SDK that matches the host system (and can be overridden).

But to fix the build when using a newer SDK overriding the SDK location
does not help, you would have to override the broken feature detection.

>
> > So I think "xcrun --sdk macosx --show-sdk-path" is probably preferable
> > but either should work as long as we can properly detect deployment
> > target symbol availability, regardless this SDK sysroot selection issue is
> > effectively an entirely different issue from the feature detection not 
> > properly
> > respecting the configured deployment target.
>
> No, I think it's pretty much equivalent.  If we pick the right SDK
> then we'll get the build we want.

Generally any recent SDK installed should 

Re: Allow matching whole DN from a client certificate

2021-01-20 Thread Jacob Champion
On Wed, 2021-01-20 at 19:07 +, Jacob Champion wrote:
> I think you'll want to be careful to specify the format as much as
> possible, both to make sure that other backend TLS implementations can
> actually use the same escaping system and to ensure that user regexes
> don't suddenly start matching different things at some point in the
> future.

Along those lines: the current implementation doesn't escape commas in
fields, which means you can inject them to force a bad regex match. For
instance, when using the usermap that's in the patch:

dn"/^.*OU=Testing,.*$"username

if I create a certificate with the Organizational Unit name "Testing,
or something", then that will also match. Switching to RFC 2253/4514
quoting fixes comma injection (and reverses the order of the RDNs,
which requires a change to the regex patterns). But I think that the
regex as supplied still isn't strong enough to prevent problems.

For example, the equals sign isn't a delimiter and therefore isn't
quoted. So if I get my CA to sign a certificate with some arbitrary
field value of "HEY YOU=Testing", then that will also match the above
usermap. You'd need to write the regex with extreme attention to detail
and a full understanding of the escaping scheme to get around that --
assuming that the scheme is generally parsable with regexes to begin
with.

> I'm going to test this patch with some UTF-8 DNs later today; I'll share my 
> findings.

UTF-8 has the opposite issue; it's escaped in a way that makes it
unusable in a regex match. For example, say I have a (simple for the
sake of example, but broken as noted above) usermap of

dn"/^CN=([^,]*).*$"\1

which is supposed to emulate the functionality of the "clientname=CN"
mode, and two users named "postgres" and "οδυσσέας". The "CN=postgres"
user will work just fine, but the UTF-8 CN of "οδυσσέας" will be
escaped into "\U03BF\U03B4\U03C5\U03C3\U03C3\U03AD\U03B1\U03C2" and
fail to match the internal user. (I'm not seeing an RFC describe the
"\U" escaping scheme; maybe it's OpenSSL-specific?)

--Jacob






Re: [PATCH 1/1] Fix detection of pwritev support for OSX.

2021-01-20 Thread Tom Lane
James Hilliard  writes:
> On Tue, Jan 19, 2021 at 6:37 PM Tom Lane  wrote:
>> I've found no direct means to control the
>> SDK path at all, but so far it appears that "xcrun --show-sdk-path"
>> agrees with the compiler's default -isysroot path as seen in the
>> compiler's -v output.  I suspect that this isn't coincidental,
>> but reflects xcrun actually being used in the compiler launch
>> process.  If it were to flip over to using a IOS SDK, that would
>> mean that bare "cc" would generate nonfunctional executables,
>> which just about any onlooker would agree is broken.

> So there's some more weirdness involved here, whether or not you
> have the command line install seems to affect the output of the
> "xcrun --show-sdk-path" command, but not the
> "xcrun --sdk macosx --show-sdk-path" command.

Yeah, that's what we discovered in the other thread.  It seems that
with "--sdk macosx" you'll always get a pointer to the (solitary)
SDK under /Applications/Xcode.app, but with the short "xcrun
--show-sdk-path" command you might get either that or a pointer to
something under /Library/Developer/CommandLineTools.

I now believe what is actually happening with the short command is
that it's iterating through the available SDKs (according to some not
very clear search path) and picking the first one it finds that
matches the host system version.  That matches the ktrace evidence
that shows it reading the SDKSettings.plist file in each SDK
directory.  The fact that it can seize on either an actual directory
or an equivalent symlink might be due to chance ordering of directory
entries.  (It'd be interesting to see "ls -f" output for your
/Library/Developer/CommandLineTools/SDKs directory ... though if
you've been experimenting with deinstall/reinstall, there's no
reason to suppose the entry order is still the same.)

I'm not sure that the case of not having the "command line tools"
installed is interesting for our purposes.  AFAIK you have to have
that in order to have access to required tools like bison and gmake.
(That reminds me, I was intending to add something to our docs
about how-to-build-from-source to say that you need to install those.)

> Note that with my patch the binaries will always be compatible with the
> host system by default, even if the SDK is capable of producing binaries
> that are incompatible so building postgres works with and without the
> command line tools SDK.

Yeah.  I don't see that as a benefit actually.  Adding the
-no_weak_imports linker switch (or the other one you're suggesting)
means that you *cannot* cross-compile for a newer macOS version,
even if you set PG_SYSROOT and/or MACOSX_DEPLOYMENT_TARGET with the
intention of doing so.  You'll still get a build that reflects the set
of kernel calls available on the host system.  Admittedly, this is a
case that's not likely to be of interest to very many people, but
I don't see why a method with that restriction is superior to picking
a default SDK that matches the host system (and can be overridden).

> So I think "xcrun --sdk macosx --show-sdk-path" is probably preferable
> but either should work as long as we can properly detect deployment
> target symbol availability, regardless this SDK sysroot selection issue is
> effectively an entirely different issue from the feature detection not 
> properly
> respecting the configured deployment target.

No, I think it's pretty much equivalent.  If we pick the right SDK
then we'll get the build we want.

regards, tom lane




Re: POC: postgres_fdw insert batching

2021-01-20 Thread Tomas Vondra

OK, pushed after a little bit of additional polishing (mostly comments).

Thanks everyone!

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: ResourceOwner refactoring

2021-01-20 Thread Heikki Linnakangas

On 19/01/2021 11:09, Heikki Linnakangas wrote:

On 18/01/2021 18:11, Robert Haas wrote:

On Mon, Jan 18, 2021 at 11:11 AM Robert Haas  wrote:

On Mon, Jan 18, 2021 at 10:19 AM Heikki Linnakangas  wrote:

On 18/01/2021 16:34, Alvaro Herrera wrote:

So according to your performance benchmark, we're willing to accept a
30% performance loss on an allegedly common operation -- numkeep=0
numsnaps=10 becomes 49.8ns from 37.6ns.  That seems a bit shocking.
Maybe you can claim that these operations aren't exactly hot spots, and
so the fact that we remain in the same power-of-ten is sufficient.  Is
that the argument?


That's right. The fast path is fast, and that's important. The slow path
becomes 30% slower, but that's acceptable.


I don't know whether a 30% slowdown will hurt anybody, but it seems
like kind of a lot, and I'm not sure I understand what corresponding
benefit we're getting.


The benefit is to make it easy for extensions to use resource owners to
track whatever resources they need to track. And arguably, the new
mechanism is nicer for built-in code, too.

I'll see if I can optimize the slow paths, to make it more palatable.


Ok, here's a new set of patches, and new test results. I replaced the 
hash function with a cheaper one. I also added the missing wrappers that 
Alvaro and Hayato Kuroda commented on, and fixed the typos that Michael 
Paquier pointed out.


In the test script, I increased the number of iterations used in the 
tests, to make them run longer and produce more stable results. There is 
still a fair amount of jitter in the results, so take any particular 
number with a grain of salt, but the overall trend is repeatable.


The results now look like this:

Unpatched
-

postgres=# \i snaptest.sql
 numkeep | numsnaps | lifo_time_ns | fifo_time_ns
-+--+--+--
   0 |1 | 32.7 | 32.3
   0 |5 | 33.0 | 32.9
   0 |   10 | 34.1 | 35.5
   0 |   60 | 32.5 | 38.3
   0 |   70 | 47.6 | 48.9
   0 |  100 | 47.9 | 49.3
   0 | 1000 | 52.9 | 52.7
   0 |1 | 61.7 | 62.4
   9 |   10 | 38.4 | 37.6
   9 |  100 | 42.3 | 42.3
   9 | 1000 | 43.9 | 45.0
   9 |1 | 62.2 | 62.5
  65 |   70 | 42.4 | 42.9
  65 |  100 | 43.2 | 43.9
  65 | 1000 | 44.0 | 45.1
  65 |1 | 62.4 | 62.6
(16 rows)

Patched
---

postgres=# \i snaptest.sql
 numkeep | numsnaps | lifo_time_ns | fifo_time_ns
-+--+--+--
   0 |1 | 32.8 | 34.2
   0 |5 | 33.8 | 32.2
   0 |   10 | 37.2 | 40.2
   0 |   60 | 40.2 | 45.1
   0 |   70 | 40.9 | 48.4
   0 |  100 | 41.9 | 45.2
   0 | 1000 | 49.0 | 55.6
   0 |1 | 62.4 | 67.2
   9 |   10 | 33.6 | 33.0
   9 |  100 | 39.6 | 39.7
   9 | 1000 | 42.2 | 45.0
   9 |1 | 60.7 | 63.9
  65 |   70 | 32.5 | 33.6
  65 |  100 | 37.5 | 39.7
  65 | 1000 | 42.3 | 46.3
  65 |1 | 61.9 | 64.8
(16 rows)

For easier side-by-side comparison, here are the same numbers with the 
patched and unpatched results side by side, and their ratio (ratio < 1 
means the patched version is faster):


LIFO tests:
 numkeep | numsnaps | unpatched | patched | ratio
-+--+---+-+---
   0 |1 |  32.7 |32.8 |  1.00
   0 |5 |  33.0 |33.8 |  1.02
   0 |   10 |  34.1 |37.2 |  1.09
   0 |   60 |  32.5 |40.2 |  1.24
   0 |   70 |  47.6 |40.9 |  0.86
   0 |  100 |  47.9 |41.9 |  0.87
   0 | 1000 |  52.9 |49.0 |  0.93
   0 |1 |  61.7 |62.4 |  1.01
   9 |   10 |  38.4 |33.6 |  0.88
   9 |  100 |  42.3 |39.6 |  0.94
   9 | 1000 |  43.9 |42.2 |  0.96
   9 |1 |  62.2 |60.7 |  0.98
  65 |   70 |  42.4 |32.5 |  0.77
  65 |  100 |  43.2 |37.5 |  0.87
  65 | 1000 |  44.0 |42.3 |  0.96
  65 |1 |  62.4 |61.9 |  0.99
(16 rows)


FIFO tests:
 numkeep | numsnaps | unpatched | patched | ratio
-+--+---+-+---
   0 |1 |  32.3 |34.2 |  1.06
   0 |5 |  32.9 |32.2 |  0.98
   0 |   10 |  35.5 |40.2 |  1.13
  

Re: pg_upgrade fails with non-standard ACL

2021-01-20 Thread Anastasia Lubennikova

On 03.01.2021 14:29, Noah Misch wrote:

On Thu, Jun 11, 2020 at 07:58:43PM +0300, Anastasia Lubennikova wrote:

On 08.06.2020 19:31, Alvaro Herrera wrote:

I'm thinking what's a good way to have a test that's committable.  Maybe
it would work to add a TAP test to pg_upgrade that runs initdb, does a
few GRANTS as per your attachment, then runs pg_upgrade?  Currently the
pg_upgrade tests are not TAP ... we'd have to revive
https://postgr.es/m/20180126080026.gi17...@paquier.xyz
(Some progress was already made on the buildfarm front to permit this)

I would be glad to add some test, but it seems to me that the infrastructure
changes for cross-version pg_upgrade test is much more complicated task than
this modest bugfix.

Agreed.

Thank you for the review.
A new version of the patch is attached, though I haven't tested it
properly yet. I am planning to do so in a couple of days.

--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
+static void
+check_for_changed_signatures(void)
+{
+   snprintf(output_path, sizeof(output_path), "revoke_objects.sql");

If an upgrade deleted function pg_last_wal_replay_lsn, upgrading a database in
which "REVOKE ALL ON FUNCTION pg_last_wal_replay_lsn FROM public" happened
requires a GRANT.  Can you use a file name that will still make sense if
someone enhances pg_upgrade to generate such GRANT statements?
I changed the name to 'fix_system_objects_ACL.sql'. Does it look good to 
you?





+   if (script == NULL && (script = fopen_priv(output_path, "w")) == NULL)
+       pg_fatal("could not open file \"%s\": %s\n",
+                output_path, strerror(errno));

Use %m instead of passing strerror(errno) to %s.

Done.

+   }
+
+   /* Handle columns separately */
+   if (strstr(aclinfo->obj_type, "column") != NULL)
+   {
+   char   *pdot = last_dot_location(aclinfo->obj_ident);

I'd write strrchr(aclinfo->obj_ident, '.') instead of introducing
last_dot_location().

This assumes column names don't contain '.'; how difficult would it be to
remove that assumption?  If it's difficult, a cheap approach would be to
ignore any entry containing too many dots.  We're not likely to create such
column names at initdb time, but v13 does allow:

   ALTER VIEW pg_locks RENAME COLUMN objid TO "a.b";
   GRANT SELECT ("a.b") ON pg_locks TO backup;

After which revoke_objects.sql has:

   psql:./revoke_objects.sql:9: ERROR:  syntax error at or near "") ON pg_catalog.pg_locks.""
   LINE 1: REVOKE ALL (b") ON pg_catalog.pg_locks."a FROM "backup";

While that ALTER VIEW probably should have required allow_system_table_mods,
we need to assume such databases exist.


I've fixed it by using pg_identify_object_as_address(). Now we don't 
have to parse obj_identity.
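
For context, a minimal sketch of what that function returns (the column
number 3 here is arbitrary, purely for illustration):

SELECT type, object_names, object_args
FROM pg_identify_object_as_address('pg_class'::regclass,
                                    'pg_catalog.pg_locks'::regclass, 3);

The returned name and argument arrays can then be quoted and reassembled
directly, without any string parsing of the object identity.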




Overall, this patch predicts a subset of cases where pg_dump will emit a
failing GRANT or REVOKE that targets a pg_catalog object.  Can you write a
code comment stating what it does and does not detect?  I think it's okay to
not predict every failure, but let's record the intended coverage.  Here are a
few examples I know; there may be more.  The above query only finds GRANTs to
non-pinned roles.  pg_dump dumps the following, but the above query doesn't
find them:

   REVOKE ALL ON FUNCTION pg_last_wal_replay_lsn FROM public;
   GRANT EXECUTE ON FUNCTION pg_reload_conf() TO pg_signal_backend;

The above query should test refclassid.




This function should not run queries against servers older than 9.6, because
pg_dump emits GRANT/REVOKE for pg_catalog objects only when targeting 9.6 or
later.  An upgrade from 8.4 to master is failing on the above query:


Fixed.


The patch adds many lines wider than 78 columns, this being one example.  In
general, break such lines.  (Don't break string literal arguments of
ereport(), though.)

Fixed.

nm



--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 43fc297eb6..429e4468f2 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -16,6 +16,7 @@
 
 static void check_new_cluster_is_empty(void);
 static void check_databases_are_compatible(void);
+static void check_for_changed_signatures(void);
 static void check_locale_and_encoding(DbInfo *olddb, DbInfo *newdb);
 static bool equivalent_locale(int category, const char *loca, const char *locb);
 static void check_is_install_user(ClusterInfo *cluster);
@@ -151,6 +152,8 @@ check_and_dump_old_cluster(bool live_check)
 	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 804)
 		new_9_0_populate_pg_largeobject_metadata(_cluster, true);
 
+	get_non_default_acl_infos(_cluster);
+
 	/*
 	 * While not a check option, we do this now because this is the 

Re: list of extended statistics on psql

2021-01-20 Thread Tomas Vondra




On 1/20/21 7:41 AM, Tatsuro Yamada wrote:

Hi Tomas,

On 2021/01/20 11:35, Tatsuro Yamada wrote:
Apologies for all the extra work - I haven't realized this flaw when 
pushing for showing more stuff :-(


Don't worry about it. We didn't notice the problem even when viewed by 
multiple

people on -hackers. Let's keep moving forward. :-D

I'll send a patch including a regression test on the next patch.



I created patches and my test results on PG10, 11, 12, and 14 are fine.

   0001:
     - Fix query to use pg_statistic_ext only
     - Replace statuses "required" and "built" with "defined"
     - Remove the size columns
     - Fix document
     - Add schema name as a filter condition on the query

   0002:
     - Fix all results of \dX
     - Add new testcase by non-superuser

Please find attached files. :-D


Thanks, I've pushed this. I had to tweak the regression tests a bit, for 
two reasons:


1) to change user in regression tests, don't use \connect, but SET ROLE 
and RESET ROLE


2) roles in regression tests should use names with regress_ prefix
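
A minimal sketch of that pattern (the role name is illustrative only):

CREATE ROLE regress_stats_user;
SET ROLE regress_stats_user;
\dX          -- run the meta-command as the non-superuser role
RESET ROLE;
DROP ROLE regress_stats_user;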



regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [PATCH 1/1] Fix detection of pwritev support for OSX.

2021-01-20 Thread James Hilliard
On Tue, Jan 19, 2021 at 6:37 PM Tom Lane  wrote:
>
> James Hilliard  writes:
> > Actually, this looks path looks wrong in general, the value for
> > "xcrun --sdk macosx --show-sdk-path" should take precedence over
> > "xcrun --show-sdk-path" as the latter may be used for IOS potentially.
>
> What is "potentially"?

Well I'm not sure the SDK parameter always defaults to macos although
I guess it probably does as I couldn't figure out a way to change it:
$ xcodebuild -showsdks
iOS SDKs:
iOS 14.3  -sdk iphoneos14.3
iOS Simulator SDKs:
Simulator - iOS 14.3  -sdk iphonesimulator14.3
macOS SDKs:
DriverKit 20.2-sdk driverkit.macosx20.2
macOS 11.1-sdk macosx11.1
tvOS SDKs:
tvOS 14.3 -sdk appletvos14.3
tvOS Simulator SDKs:
Simulator - tvOS 14.3 -sdk appletvsimulator14.3
watchOS SDKs:
watchOS 7.2   -sdk watchos7.2
watchOS Simulator SDKs:
Simulator - watchOS 7.2   -sdk watchsimulator7.2

> I've found no direct means to control the
> SDK path at all, but so far it appears that "xcrun --show-sdk-path"
> agrees with the compiler's default -isysroot path as seen in the
> compiler's -v output.  I suspect that this isn't coincidental,
> but reflects xcrun actually being used in the compiler launch
> process.  If it were to flip over to using a IOS SDK, that would
> mean that bare "cc" would generate nonfunctional executables,
> which just about any onlooker would agree is broken.

So there's some more weirdness involved here, whether or not you
have the command line install seems to affect the output of the
"xcrun --show-sdk-path" command, but not the
"xcrun --sdk macosx --show-sdk-path" command.

This is what I get without the command line tools:
$ xcrun --show-sdk-path
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk
$ xcrun --sdk macosx --show-sdk-path
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX11.1.sdk
this last one is just a symlink to the other path.

With command line tools this is different however:
$ xcrun --show-sdk-path
/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk
$ xcrun --sdk macosx --show-sdk-path
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX11.1.sdk

Note that the /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk
is different from the normal SDK and doesn't seem to be able to generate
binaries that target an 11.0 deployment target on my 10.15 system; however,
I am unsure whether this behavior can be relied upon.

So in terms of what works best, the newer normal SDK has the most flexibility,
as it can produce both 10.15-target and 11.0-target binaries depending on
the MACOSX_DEPLOYMENT_TARGET, while the command line tools SDK,
it would appear, can only produce 10.15-target binaries.

Note that with my patch the binaries will always be compatible with the
host system by default, even if the SDK is capable of producing binaries
that are incompatible so building postgres works with and without the
command line tools SDK.

So I think "xcrun --sdk macosx --show-sdk-path" is probably preferable
but either should work as long as we can properly detect deployment
target symbol availability, regardless this SDK sysroot selection issue is
effectively an entirely different issue from the feature detection not properly
respecting the configured deployment target.

>
> I'm really not excited about trying to make the build work with
> a non-native SDK as you are proposing.  I think that's just going
> to lead to a continuing stream of problems, because of Apple's
> opinions about how cross-version compatibility should work.

Well, the minimum required target version is pretty much strictly based on
MACOSX_DEPLOYMENT_TARGET, so our feature detection still needs
to use that; otherwise cross-target compilation for newer or older targets will
not work correctly.

From my understanding, the reason AC_REPLACE_FUNCS does not
throw an error for deployment-target-incompatible functions is that it only
checks whether the function exists, not whether it is actually usable. This is
why I had to add an explicit AC_LANG_PROGRAM compile test to
properly trigger a compile failure if the function is not usable for a
particular deployment target version; merely checking that the function
exists in the header is not sufficient.

> It also seems like unnecessary complexity, because there is always
> (AFAICS) a native SDK version available.  We just need to find it.

Best I can tell this is not true; it is some (most?) of the time, but it's not
something we can rely upon, as systems may only contain a newer SDK. This
newer SDK is still capable of producing binaries that can run on the build
host system, so this shouldn't be an issue as long as we can do target
feature detection properly.

>
> regards, tom lane




Re: Deleting older versions in unique indexes to avoid page splits

2021-01-20 Thread Peter Geoghegan
On Wed, Jan 20, 2021 at 10:53 AM Peter Geoghegan  wrote:
> This patch is unusual in that you really need to think about emergent
> behaviors to understand it. That is certainly a difficult thing to do,
> and it's understandable that even an expert might not grok it without
> considering it carefully.

I happened to stumble upon a recent blog post that seems like a light,
approachable introduction to some of the key concepts here:

https://jessitron.com/2021/01/18/when-costs-are-nonlinear-keep-it-small/

Bottom-up index deletion enhances a complex system whose maintenance
costs are *dramatically* nonlinear, at least in many important cases.
If you apply linear thinking to such a system then you'll probably end
up with a bad design.

The system as a whole is made efficient by making sure that we're lazy
when that makes sense, while also making sure that we're eager when
that makes sense. So it almost *has* to be structured as a bottom-up,
reactive mechanism -- no other approach is able to ramp up or down in
exactly the right way. Talking about small cost differences (things
that can easily be empirically measured, perhaps with a
microbenchmark) is almost irrelevant to the big picture. It's even
irrelevant to the "medium picture".

What's more, it's basically a mistake to think of heap page accesses
that don't yield any deletable index tuples as wasted effort (even
though that's how I describe them myself!). Here's why: we have to
access the heap page to learn that it has nothing for us in the first
place! If we somehow knew ahead of time that some useless-to-us
heap block was useless, then the whole system wouldn't be bottom-up
(by definition). In other words, failing to get any index tuple
deletes from an entire heap page *is itself a form of feedback* at the
local level -- it guides the entire system's behavior over time. Why
should we expect to get that information at zero cost?

This is somehow both simple and complicated, which creates huge
potential for miscommunications. I tried to describe this in various
ways at various points. Perhaps I could have done better with that.

-- 
Peter Geoghegan




Re: poc - possibility to write window function in PL languages

2021-01-20 Thread Pavel Stehule
st 20. 1. 2021 v 21:32 odesílatel Tom Lane  napsal:

> Pavel Stehule  writes:
> > st 20. 1. 2021 v 21:07 odesílatel Tom Lane  napsal:
> >> Uh, what?  I don't understand what this "partition context" is.
>
> > It was my name for an access to window partition local memory -
> > WinGetPartitionLocalMemory
>
> Ah.
>
> > We need some interface for this cache
>
> I'm not convinced we need to expose that, or that it'd be very
> satisfactory to plpgsql users if we did.  The fact that it's fixed-size
> and initializes to zeroes are both things that are okay for C programmers
> but might be awkward to deal with in plpgsql code.  At the very least it
> would greatly constrain what data types you could usefully store.
>
> So I'd be inclined to leave that out, at least for the first version.
>

I think this functionality is relatively important. If somebody tries to
implement their own window function, they usually start with some variation
of the row_number function.

We can support only fixed-length types to begin with.

Regards

Pavel



>
> regards, tom lane
>


Re: poc - possibility to write window function in PL languages

2021-01-20 Thread Tom Lane
Pavel Stehule  writes:
> st 20. 1. 2021 v 21:07 odesílatel Tom Lane  napsal:
>> Uh, what?  I don't understand what this "partition context" is.

> It was my name for an access to window partition local memory -
> WinGetPartitionLocalMemory

Ah.

> We need some interface for this cache

I'm not convinced we need to expose that, or that it'd be very
satisfactory to plpgsql users if we did.  The fact that it's fixed-size
and initializes to zeroes are both things that are okay for C programmers
but might be awkward to deal with in plpgsql code.  At the very least it
would greatly constrain what data types you could usefully store.

So I'd be inclined to leave that out, at least for the first version.

regards, tom lane




Re: poc - possibility to write window function in PL languages

2021-01-20 Thread Pavel Stehule
st 20. 1. 2021 v 21:07 odesílatel Tom Lane  napsal:

> Pavel Stehule  writes:
> > The second question is work with partition context value. This should be
> > only one value, and of only one but of any type per function. In this
> case
> > we cannot use GET statements. I had an idea of enhancing declaration.
> Some
> > like
>
> > DECLARE
> >   pcx PARTITION CONTEXT (int); -- read partition context
> > BEGIN
> >   pcx := 10; -- set partition context
>
> > What do you think about it?
>
> Uh, what?  I don't understand what this "partition context" is.
>

It was my name for an access to window partition local memory -
WinGetPartitionLocalMemory

We need some interface for this cache

Regards

Pavel







>
> regards, tom lane
>


Re: poc - possibility to write window function in PL languages

2021-01-20 Thread Tom Lane
Pavel Stehule  writes:
> The second question is work with partition context value. This should be
> only one value, and of only one but of any type per function. In this case
> we cannot use GET statements. I had an idea of enhancing declaration. Some
> like

> DECLARE
>   pcx PARTITION CONTEXT (int); -- read partition context
> BEGIN
>   pcx := 10; -- set partition context

> What do you think about it?

Uh, what?  I don't understand what this "partition context" is.

regards, tom lane




Re: strange error reporting

2021-01-20 Thread Robert Haas
On Wed, Jan 20, 2021 at 1:54 PM Tom Lane  wrote:
> Alvaro Herrera  writes:
> > On 2021-Jan-20, Robert Haas wrote:
> >> I figured it was something like that. I don't know whether the right
> >> thing is to use something like PQdb() to get the correct database
> >> name, or whether we should go with Tom's suggestion and omit that
> >> detail altogether, but I think showing the empty string when the user
> >> relied on the default is too confusing.
>
> > Well, the patch seems small enough, and I don't think it'll be in any
> > way helpful to omit that detail.
>
> I'm +1 for applying and back-patching that.  I still think we might
> want to just drop the phrase altogether in HEAD, but we wouldn't do
> that in the back branches, and the message is surely misleading as-is.

Sure, that makes sense.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Calculation of relids (pull_varnos result) for PlaceHolderVars

2021-01-20 Thread Tom Lane
I wrote:
> ...
> 2. pull_varnos() is not passed the planner "root" data structure,
> so it can't get at the PlaceHolderInfo list.  We can change its
> API of course, but that propagates to dozens of places.
> ...
> The 0001 patch attached goes ahead and makes those API changes.
> I think this is perfectly reasonable to do in HEAD, but it most
> likely is an unacceptable API/ABI break for the back branches.
> ...
> A third way is to preserve the existing pull_varnos() API in
> the back branches, changing all the internal calls to use a
> new function that has the additional "root" parameter.  This
> seems feasible but I've not attempted to code it yet.

Here's a proposed fix that does it like that.  The 0001 patch
is the same as before, and then 0002 is a delta to be applied
only in the back branches.  What I did there was install a layer
of macros in the relevant .c files that cause calls that look like
the HEAD versions to be redirected to the "xxx_new" functions.
The idea is to keep the actual code in sync with HEAD, for
readability and to minimize back-patching pain.  It could be
argued that this is too cute and the internal references should
just go to the "new" functions in the back branches.

I did not bother to preserve ABI for these two functions:
indexcol_is_bool_constant_for_query()
build_implied_join_equality()
because I judged it highly unlikely that any extensions are
calling them.  If anybody thinks differently, we could hack
those in the same way.

regards, tom lane

diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 2f2d4d171c..48c2f23ae0 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -5709,7 +5709,8 @@ foreign_grouping_ok(PlannerInfo *root, RelOptInfo *grouped_rel,
 			 * RestrictInfos, so we must make our own.
 			 */
 			Assert(!IsA(expr, RestrictInfo));
-			rinfo = make_restrictinfo(expr,
+			rinfo = make_restrictinfo(root,
+	  expr,
 	  true,
 	  false,
 	  false,
diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c
index 4f5b870d1b..d263ecf082 100644
--- a/src/backend/optimizer/path/clausesel.c
+++ b/src/backend/optimizer/path/clausesel.c
@@ -227,7 +227,7 @@ clauselist_selectivity_ext(PlannerInfo *root,
 			}
 			else
 			{
-ok = (NumRelids(clause) == 1) &&
+ok = (NumRelids(root, clause) == 1) &&
 	(is_pseudo_constant_clause(lsecond(expr->args)) ||
 	 (varonleft = false,
 	  is_pseudo_constant_clause(linitial(expr->args;
@@ -609,7 +609,7 @@ bms_is_subset_singleton(const Bitmapset *s, int x)
  *	  restriction or join estimator.  Subroutine for clause_selectivity().
  */
 static inline bool
-treat_as_join_clause(Node *clause, RestrictInfo *rinfo,
+treat_as_join_clause(PlannerInfo *root, Node *clause, RestrictInfo *rinfo,
 	 int varRelid, SpecialJoinInfo *sjinfo)
 {
 	if (varRelid != 0)
@@ -643,7 +643,7 @@ treat_as_join_clause(Node *clause, RestrictInfo *rinfo,
 		if (rinfo)
 			return (bms_membership(rinfo->clause_relids) == BMS_MULTIPLE);
 		else
-			return (NumRelids(clause) > 1);
+			return (NumRelids(root, clause) > 1);
 	}
 }
 
@@ -860,7 +860,7 @@ clause_selectivity_ext(PlannerInfo *root,
 		OpExpr	   *opclause = (OpExpr *) clause;
 		Oid			opno = opclause->opno;
 
-		if (treat_as_join_clause(clause, rinfo, varRelid, sjinfo))
+		if (treat_as_join_clause(root, clause, rinfo, varRelid, sjinfo))
 		{
 			/* Estimate selectivity for a join clause. */
 			s1 = join_selectivity(root, opno,
@@ -896,7 +896,7 @@ clause_selectivity_ext(PlannerInfo *root,
   funcclause->funcid,
   funcclause->args,
   funcclause->inputcollid,
-  treat_as_join_clause(clause, rinfo,
+  treat_as_join_clause(root, clause, rinfo,
 	   varRelid, sjinfo),
   varRelid,
   jointype,
@@ -907,7 +907,7 @@ clause_selectivity_ext(PlannerInfo *root,
 		/* Use node specific selectivity calculation function */
 		s1 = scalararraysel(root,
 			(ScalarArrayOpExpr *) clause,
-			treat_as_join_clause(clause, rinfo,
+			treat_as_join_clause(root, clause, rinfo,
  varRelid, sjinfo),
 			varRelid,
 			jointype,
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 380336518f..aab06c7d21 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -1858,7 +1858,7 @@ cost_incremental_sort(Path *path,
 		 * Check if the expression contains Var with "varno 0" so that we
 		 * don't call estimate_num_groups in that case.
 		 */
-		if (bms_is_member(0, pull_varnos((Node *) member->em_expr)))
+		if (bms_is_member(0, pull_varnos(root, (Node *) member->em_expr)))
 		{
 			unknown_varno = true;
 			break;
diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index ab6eaaead1..0188c1e9a1 100644
--- 

Re: [HACKERS] [PATCH] Generic type subscripting

2021-01-20 Thread Dmitry Dolgov
> On Wed, Jan 20, 2021 at 11:34:16AM -0500, Dian M Fay wrote:
> > Thanks, I need to remember to not skipp doc building for testing process
> > even for such small changes. Hope now I didn't forget anything.
> >
> > > On Wed, Jan 20, 2021 at 09:58:43AM -0500, Dian M Fay wrote:
> >
> > > Here's a full editing pass on the documentation, with v45 and Pavel's
> > > doc-whitespaces-fix.patch applied. I also corrected a typo in one of the
> > > added hints.
> >
> > Great! I've applied almost all of it, except:
> >
> > + A jsonb value will accept assignments to nonexistent
> > subscript
> > + paths as long as the nonexistent elements being traversed are all
> > arrays.
> >
> > Maybe I've misunderstood the intention, but there is no requirement
> > about arrays for creating such an empty path. I've formulated it as:
> >
> > + A jsonb value will accept assignments to nonexistent
> > subscript
> > + paths as long as the last existing path key is an object or an array.
>
> My intention there was to highlight the difference between:
>
> * SET obj['a']['b']['c'] = '"newvalue"'
> * SET arr[0][0][3] = '"newvalue"'
>
> obj has to conform to {"a": {"b": {...}}} in order to receive the
> assignment of the nested c. If it doesn't, that's the error case we
> discussed earlier. But arr can be null, [], and so on, and any missing
> structure [[[null, null, null, "newvalue"]]] will be created.

If arr is 'null', or any other scalar value, such subscripting will work
only one level deep, because scalars are represented internally as an array
of one element. If arr is '[]', the path will comply by definition. So it's
essentially the same as for objects, with no particular difference. If the
quirk of scalars being treated like arrays is bothersome, we could also bend
it in this case as well (see the attached version).
From a4037c651a0cfd2448f38b6c8c932b5a09136b0a Mon Sep 17 00:00:00 2001
From: Dmitrii Dolgov <9erthali...@gmail.com>
Date: Fri, 18 Dec 2020 17:19:51 +0100
Subject: [PATCH v48 1/3] Subscripting for jsonb

Subscripting implementation for jsonb. It does not support slices, does
not have a limit for number of subscripts and for assignment expects a
replace value to be of jsonb type. There is also one functional
difference in assignment via subscripting from jsonb_set, when an
original jsonb container is NULL, subscripting replaces it with an empty
jsonb and proceed with assignment.

For the sake of code reuse, some parts of jsonb functionality were
rearranged to allow use the same functions for jsonb_set and assign
subscripting operation.

The original idea belongs to Oleg Bartunov.

Reviewed-by: Tom Lane, Arthur Zakirov, Pavel Stehule, Dian M Fay
---
 doc/src/sgml/json.sgml  |  51 
 src/backend/utils/adt/Makefile  |   1 +
 src/backend/utils/adt/jsonb_util.c  |  76 -
 src/backend/utils/adt/jsonbsubs.c   | 413 
 src/backend/utils/adt/jsonfuncs.c   | 180 ++--
 src/include/catalog/pg_proc.dat |   4 +
 src/include/catalog/pg_type.dat |   3 +-
 src/include/utils/jsonb.h   |   6 +-
 src/test/regress/expected/jsonb.out | 272 +-
 src/test/regress/sql/jsonb.sql  |  84 +-
 10 files changed, 985 insertions(+), 105 deletions(-)
 create mode 100644 src/backend/utils/adt/jsonbsubs.c

diff --git a/doc/src/sgml/json.sgml b/doc/src/sgml/json.sgml
index 5b9a5557a4..3ace5e444b 100644
--- a/doc/src/sgml/json.sgml
+++ b/doc/src/sgml/json.sgml
@@ -602,6 +602,57 @@ SELECT jdoc->'guid', jdoc->'name' FROM api WHERE jdoc @> '{"tags": ["qu
   
  
 
+ 
+  jsonb Subscripting
+  
+   The jsonb data type supports array-style subscripting expressions
+   to extract and modify elements. Nested values can be indicated by chaining
+   subscripting expressions, following the same rules as the path
+   argument in the jsonb_set function. If a jsonb
+   value is an array, numeric subscripts start at zero, and negative integers count
+   backwards from the last element of the array. Slice expressions are not supported.
+   The result of a subscripting expression is always of the jsonb data type.
+  
+
+  
+   An example of subscripting syntax:
+
+
+-- Extract object value by key
+SELECT ('{"a": 1}'::jsonb)['a'];
+
+-- Extract nested object value by key path
+SELECT ('{"a": {"b": {"c": 1}}}'::jsonb)['a']['b']['c'];
+
+-- Extract array element by index
+SELECT ('[1, "2", null]'::jsonb)[1];
+
+-- Update object value by key. Note the quotes around '1': the assigned
+-- value must be of the jsonb type as well
+UPDATE table_name SET jsonb_field['key'] = '1';
+
+-- Filter records using a WHERE clause with subscripting. Since the result of
+-- subscripting is jsonb, the value we compare it against must also be jsonb.
+-- The double quotes make "value" also a valid jsonb string.
+SELECT * FROM table_name WHERE jsonb_field['key'] = '"value"';
+
+
+   jsonb assignment via subscripting handles a few edge cases
+   differently from jsonb_set. When a source jsonb
+   is NULL, 

Re: Allow matching whole DN from a client certificate

2021-01-20 Thread Jacob Champion
On Mon, 2021-01-18 at 11:23 +0100, Daniel Gustafsson wrote:
> +   /* use commas instead of slashes */
> +   X509_NAME_print_ex(bio, x509name, 0, XN_FLAG_SEP_COMMA_PLUS);
> I don't have strong opinions on whether we shold use slashes or commas, but I
> think it needs to be documented that commas are required since slashes is the
> more common way to print a DN.

There's also a XN_FLAG_RFC2253 flag, which claims to print in an RFC-
compatible escape system. (It includes XN_FLAG_SEP_COMMA_PLUS.) I think
NSS uses a similar RFC-based escape scheme, but I haven't checked to
see whether it's fully compatible.

I think you'll want to be careful to specify the format as much as
possible, both to make sure that other backend TLS implementations can
actually use the same escaping system and to ensure that user regexes
don't suddenly start matching different things at some point in the
future. As a cautionary tale, nginx is stuck with two versions of their
similar feature (ssl_client_s_dn_legacy vs ssl_client_s_dn) after their
switch to an RFC-compatible system [1].

Even when using RFC 2253 (now 4514) escaping, there are things left to the 
implementation, such as the order of AVAs inside multivalued RDNs. The RFC 
warns not to consider these representations to be canonical forms, so it's 
possible that switching (or upgrading) the TLS library in use could force users 
to change their regexes at some point in the future.
I'm going to test this patch with some UTF-8 DNs later today; I'll
share my findings. I'm also going to read up on [2] a bit more.

--Jacob

[1] 
https://serverfault.com/questions/829754/why-did-the-format-of-nginx-ssl-client-i-dn-suddenly-change

[2] 
https://frasertweedale.github.io/blog-redhat/posts/2019-05-28-a-dn-is-not-a-string.html


catalogs.sgml documentation ambiguity

2021-01-20 Thread Joel Jacobson
Some catalog tables have references to pg_attribute.attnum.

In the documentation, it only says "(references pg_attribute.attnum)"
but not which oid column to include in the two-column "foreign key".

This would not be a problem if there were only one reference to
pg_class.oid, but some catalog tables have multiple columns that reference
pg_class.oid.

For instance, pg_constraint has two columns (conkey, confkey) referencing 
pg_attribute,
and three columns (conrelid, conindid, confrelid) referencing pg_class.

A user might wonder:
- Which one of these three columns should be used in combination with the 
conkey/confkey elements to join pg_attribute?

If we would have array foreign key support, I would guess the "foreign keys" 
should be:

FOREIGN KEY (confrelid, EACH ELEMENT OF confkey) REFERENCES 
pg_catalog.pg_attribute (attrelid, attnum)
FOREIGN KEY (conrelid, EACH ELEMENT OF conkey) REFERENCES 
pg_catalog.pg_attribute (attrelid, attnum)
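
Until such constraints exist, the intended join can at least be spelled out
by hand; a sketch for the conkey case, assuming the pairing above is right:

SELECT c.conname, a.attname
FROM pg_constraint AS c
JOIN pg_attribute AS a
  ON a.attrelid = c.conrelid
 AND a.attnum = ANY (c.conkey)
WHERE c.conkey IS NOT NULL;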

It's of course harder for a machine to guess, though; that would need a
separate human-produced lookup table.

Could it be meaningful to clarify these multi-key relations in the 
documentation?

As a bonus, machines could then parse the information out of catalogs.sgml.

Here is a list of catalogs referencing pg_attribute and with multiple pg_class 
references:

  table_name  |   array_agg
--+---
pg_constraint| {confrelid,conindid,conrelid}
pg_index | {indexrelid,indrelid}
pg_partitioned_table | {partdefid,partrelid}
pg_trigger   | {tgconstrindid,tgconstrrelid,tgrelid}
(4 rows)

Produced using query:

SELECT b.table_name, array_agg(DISTINCT b.column_name)
FROM pit.oid_joins AS a
JOIN pit.oid_joins AS b
ON b.table_name = a.table_name
WHERE a.ref_table_name = 'pg_attribute'
AND b.ref_table_name = 'pg_class'
GROUP BY b.table_name
HAVING cardinality(array_agg(DISTINCT b.column_name)) > 1
;


Re: strange error reporting

2021-01-20 Thread Tom Lane
Alvaro Herrera  writes:
> On 2021-Jan-20, Robert Haas wrote:
>> I figured it was something like that. I don't know whether the right
>> thing is to use something like PQdb() to get the correct database
>> name, or whether we should go with Tom's suggestion and omit that
>> detail altogether, but I think showing the empty string when the user
>> relied on the default is too confusing.

> Well, the patch seems small enough, and I don't think it'll be in any
> way helpful to omit that detail.

I'm +1 for applying and back-patching that.  I still think we might
want to just drop the phrase altogether in HEAD, but we wouldn't do
that in the back branches, and the message is surely misleading as-is.

regards, tom lane




Re: Deleting older versions in unique indexes to avoid page splits

2021-01-20 Thread Peter Geoghegan
On Wed, Jan 20, 2021 at 5:33 AM Amit Kapila  wrote:
> > Victor independently came up with a benchmark that ran over several
> > hours, with cleanup consistently held back by ~5 minutes by a long
> > running transaction:
> >
>
> AFAICS, the long-running transaction used in the test is below:
> SELECT abalance, pg_sleep(300) FROM pgbench_accounts WHERE mtime >
> now() - INTERVAL '15min' ORDER BY aid LIMIT 1;
>
> This shouldn't generate a transaction id so would it be sufficient to
> hold back the clean-up?

It will hold back clean-up because it holds open a snapshot. Whether
or not the long running transaction has been allocated a true XID (not
just a VXID) is irrelevant. Victor's test case is perfectly valid.

In general there are significant benefits for cases with long-running
transactions, which will be quite apparent if you do something simple
like run pgbench (a script with non-HOT updates) while a REPEATABLE
READ transaction runs in psql (and has actually acquired a snapshot by
running a simple query -- the xact snapshot is acquired lazily). I
understand that this will be surprising if you believe that the
problem in these cases is strictly that there are too many "recently
dead" versions that still need to be stored (i.e. versions that
cleanup simply isn't able to remove, given the xmin horizon
invariant). But it's now clear that that's not what really happens in
most cases with a long running transaction. What we actually see is
that local page-level inefficiencies in cleanup were (and perhaps to
some degree still are) a much bigger problem than the inherent
inability of cleanup to remove even one or two tuples. This is at
least true until the bloat problem becomes a full-blown disaster
(because cleanup really is inherently restricted by the global xmin
horizon, and therefore hopelessly behind).
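
For anyone who wants to reproduce that quickly: something like the following,
left open in one psql session while pgbench runs elsewhere, is enough to hold
back cleanup -- no XID assignment needed.

BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT 1;   -- any query; this is what actually acquires the snapshot
-- now leave the transaction open while pgbench runs in another session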

In reality there are seldom that many individual logical rows that get
updated more than a few times in (say) any given one hour period. This
is true even with skewed updates -- the skew is almost never visible
at the level of an individual leaf page. The behavior we see now that
we have bottom-up index deletion is much closer to the true optimal
behavior for the general approach Postgres takes to cleanup of garbage
tuples, since it tends to make the number of versions for any given
logical row as low as possible (within the confines of the global xmin
horizon limitations for cleanup).

Of course it would also be helpful to have something like zheap --
some mechanism that can store "recently dead" versions some place
where they at least don't bloat up the main relation structures. But
that's only one part of the big picture for Postgres MVCC garbage. We
should not store garbage tuples (i.e. those that are dead rather than
just recently dead) *anywhere*.

> First, it is not clear to me if that has properly simulated the
> long-running test but even if it is what I intend to say was to have
> an open long-running transaction possibly for the entire duration of
> the test? If we do that, we will come to know if there is any overhead
> and if so how much?

I am confident that you are completely wrong about regressing cases
with long-running transactions, except perhaps in some very narrow
sense that is of little practical relevance. Victor's test case did
result in a small loss of throughput, for example, but that's a small
price to pay to not have your indexes explode (note also that most of
the indexes weren't even used for reads, so in the real world it would
probably also improve throughput even in the short term). FWIW the
small drop in TPS probably had little to do with the cost of visiting
the heap for visibility information. Workloads must be made to "live
within their means". You can call that a regression if you like, but I
think that almost anybody else would take issue with that
characterization.

Slowing down non-HOT updaters in these extreme cases may actually be a
good thing, even when bottom-up deletion finally becomes ineffective.
It can be thought of as backpressure. I am not worried about slowing
down something that is already hopelessly inefficient and
unsustainable. I'd go even further than that, in fact -- I now wonder
if we should *deliberately* slow them down some more!

> Test with 2 un-modified indexes
> ===
> create table t1(c1 int, c2 int, c3 int, c4 char(10));
> create index idx_t1 on t1(c1);
> create index idx_t2 on t1(c2);
> create index idx_t3 on t1(c3);
>
> insert into t1 values(generate_series(1,500), 1, 10, 'aa');
> update t1 set c2 = 2;

> postgres=# update t1 set c2 = 2;
> UPDATE 500
> Time: 46533.530 ms (00:46.534)
>
> With HEAD
> ==
> postgres=# update t1 set c2 = 2;
> UPDATE 500
> Time: 52529.839 ms (00:52.530)
>
> I have dropped and recreated the table after each update in the test.

Good thing that you remembered to drop and recreate the table, since
otherwise bottom-up index deletion would look really good!

Besides, 

Re: strange error reporting

2021-01-20 Thread Alvaro Herrera
On 2021-Jan-20, Robert Haas wrote:

> On Wed, Jan 20, 2021 at 1:25 PM Alvaro Herrera  
> wrote:
> > That's because pgbench reports the input argument dbname, but since you
> > didn't specify anything, then PQconnectdbParams() uses the libpq
> > behavior.  I think we'd have to use PQdb() instead.
> 
> I figured it was something like that. I don't know whether the right
> thing is to use something like PQdb() to get the correct database
> name, or whether we should go with Tom's suggestion and omit that
> detail altogether, but I think showing the empty string when the user
> relied on the default is too confusing.

Well, the patch seems small enough, and I don't think it'll be in any
way helpful to omit that detail.

-- 
Álvaro Herrera39°49'30"S 73°17'W
"Having your biases confirmed independently is how scientific progress is
made, and hence made our great society what it is today" (Mary Gardiner)
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index f7da3e1f62..30fde9e9ec 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -1226,7 +1226,7 @@ doConnect(void)
 	if (PQstatus(conn) == CONNECTION_BAD)
 	{
 		pg_log_error("connection to database \"%s\" failed: %s",
-	 dbName, PQerrorMessage(conn));
+	 PQdb(conn), PQerrorMessage(conn));
 		PQfinish(conn);
 		return NULL;
 	}


Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly

2021-01-20 Thread Alvaro Herrera
On 2021-Jan-20, Alexey Kondratov wrote:

> On 2021-01-20 21:08, Alexey Kondratov wrote:
> > 
> > I did a refactoring of ATExecSetTableSpaceNoStorage() in the 0001. New
> > function SetRelTablesapce() is placed into the tablecmds.c. Following
> > 0002 gets use of it. Is it close to what you and Michael suggested?
> 
> Ugh, forgot to attach the patches. Here they are.

Yeah, looks reasonable.

> + /* No work if no change in tablespace. */
> + oldTablespaceOid = rd_rel->reltablespace;
> + if (tablespaceOid != oldTablespaceOid ||
> + (tablespaceOid == MyDatabaseTableSpace && 
> OidIsValid(oldTablespaceOid)))
> + {
> + /* Update the pg_class row. */
> + rd_rel->reltablespace = (tablespaceOid == MyDatabaseTableSpace) 
> ?
> + InvalidOid : tablespaceOid;
> + CatalogTupleUpdate(pg_class, &tuple->t_self, tuple);
> +
> + changed = true;
> + }
> +
> + if (changed)
> + /* Record dependency on tablespace */
> + changeDependencyOnTablespace(RelationRelationId,
> +  
> reloid, rd_rel->reltablespace);

Why have a separate "if (changed)" block here instead of merging with
the above?
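
For illustration, here is a rough sketch of the merged form being
suggested (an editorial fragment reusing the names from the quoted hunk,
not code taken from the posted patch):

	/* No work if no change in tablespace. */
	oldTablespaceOid = rd_rel->reltablespace;
	if (tablespaceOid != oldTablespaceOid ||
		(tablespaceOid == MyDatabaseTableSpace && OidIsValid(oldTablespaceOid)))
	{
		/* Update the pg_class row. */
		rd_rel->reltablespace = (tablespaceOid == MyDatabaseTableSpace) ?
			InvalidOid : tablespaceOid;
		CatalogTupleUpdate(pg_class, &tuple->t_self, tuple);

		/* Record the dependency on the tablespace in the same branch. */
		changeDependencyOnTablespace(RelationRelationId, reloid,
									 rd_rel->reltablespace);

		changed = true;
	}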


-- 
Álvaro Herrera39°49'30"S 73°17'W




Re: strange error reporting

2021-01-20 Thread Robert Haas
On Wed, Jan 20, 2021 at 1:25 PM Alvaro Herrera  wrote:
> That's because pgbench reports the input argument dbname, but since you
> didn't specify anything, then PQconnectdbParams() uses the libpq
> behavior.  I think we'd have to use PQdb() instead.

I figured it was something like that. I don't know whether the right
thing is to use something like PQdb() to get the correct database
name, or whether we should go with Tom's suggestion and omit that
detail altogether, but I think showing the empty string when the user
relied on the default is too confusing.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: strange error reporting

2021-01-20 Thread Alvaro Herrera
On 2021-Jan-20, Robert Haas wrote:

> On Wed, Jan 20, 2021 at 12:19 PM Tom Lane  wrote:
> > Robert Haas  writes:
> > > I just made the mistake of trying to run pgbench without first running
> > > createdb and got this:
> >
> > > pgbench: error: connection to database "" failed: could not connect to
> > > socket "/tmp/.s.PGSQL.5432": FATAL:  database "rhaas" does not exist
> >
> > > This looks pretty bogus because (1) I was not attempting to connect to
> > > a database whose name is the empty string [...]
> >
> > I'm not sure about the empty DB name in the first part (presumably
> > that's from pgbench, so what was your pgbench command exactly?).
> 
> I think it was just 'pgbench -i 40'. For sure, I didn't specify a database 
> name.

That's because pgbench reports the input argument dbname, but since you
didn't specify anything, then PQconnectdbParams() uses the libpq
behavior.  I think we'd have to use PQdb() instead.
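
A minimal standalone sketch of that approach (editorial, not the pgbench
code itself; the socket directory is just an example and no dbname is
supplied, so libpq falls back to its default):

#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
	/* Connect without an explicit dbname, as pgbench effectively did. */
	const char *keywords[] = {"host", NULL};
	const char *values[]   = {"/tmp", NULL};
	PGconn	   *conn = PQconnectdbParams(keywords, values, 0);

	if (PQstatus(conn) != CONNECTION_OK)
		/* PQdb() reports the database name libpq actually resolved. */
		fprintf(stderr, "connection to database \"%s\" failed: %s",
				PQdb(conn), PQerrorMessage(conn));
	else
		printf("connected to \"%s\"\n", PQdb(conn));

	PQfinish(conn);
	return 0;
}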

-- 
Álvaro Herrera   Valdivia, Chile




Re: Jsonpath ** vs lax mode

2021-01-20 Thread Alvaro Herrera
On 2021-Jan-20, Alexander Korotkov wrote:

> My proposal is to make everything after the ** operator use strict mode
> (patch attached).  I think this shouldn't be backpatched, just applied to
> the v14.  Other suggestions?

I think changing the mode midway through the operation is strange.  What
do you think of requiring strict mode for **?  That is, if ** is used
and the mode is lax, an error is thrown.

Thanks

-- 
Álvaro Herrera   Valdivia, Chile




Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly

2021-01-20 Thread Alexey Kondratov

On 2021-01-20 21:08, Alexey Kondratov wrote:

On 2021-01-20 18:54, Alvaro Herrera wrote:

On 2021-Jan-20, Alvaro Herrera wrote:


On 2021-Jan-20, Michael Paquier wrote:

> +/*
> + * This is mostly duplicating ATExecSetTableSpaceNoStorage,
> + * which should maybe be factored out to a library function.
> + */
> Wouldn't it be better to do first the refactoring of 0002 and then
> 0001 so as REINDEX can use the new routine, instead of putting that
> into a comment?

I think merging 0001 and 0002 into a single commit is a reasonable
approach.


... except it doesn't make a lot of sense to have set_rel_tablespace in
either indexcmds.c or index.c.  I think tablecmds.c is a better place
for it.  (I would have thought catalog/storage.c, but that one's not the
right abstraction level it seems.)



I did a refactoring of ATExecSetTableSpaceNoStorage() in 0001. The new
function SetRelTablespace() is placed in tablecmds.c, and the following
0002 makes use of it. Is it close to what you and Michael suggested?



Ugh, forgot to attach the patches. Here they are.

--
AlexeyFrom 2c3876f99bc8ebdd07c532619992e7ec3093e50a Mon Sep 17 00:00:00 2001
From: Alexey Kondratov 
Date: Mon, 23 Mar 2020 21:10:29 +0300
Subject: [PATCH v2 2/2] Allow REINDEX to change tablespace

REINDEX already does full relation rewrite, this patch adds a
possibility to specify a new tablespace where new relfilenode
will be created.
---
 doc/src/sgml/ref/reindex.sgml |  22 +
 src/backend/catalog/index.c   |  72 ++-
 src/backend/commands/indexcmds.c  |  68 ++-
 src/bin/psql/tab-complete.c   |   4 +-
 src/include/catalog/index.h   |   2 +
 src/test/regress/input/tablespace.source  |  53 +++
 src/test/regress/output/tablespace.source | 102 ++
 7 files changed, 318 insertions(+), 5 deletions(-)

diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml
index 627b36300c..4f84060c4d 100644
--- a/doc/src/sgml/ref/reindex.sgml
+++ b/doc/src/sgml/ref/reindex.sgml
@@ -27,6 +27,7 @@ REINDEX [ ( option [, ...] ) ] { IN
 
 CONCURRENTLY [ boolean ]
 VERBOSE [ boolean ]
+TABLESPACE new_tablespace
 
  
 
@@ -187,6 +188,19 @@ REINDEX [ ( option [, ...] ) ] { IN
 

 
+   
+TABLESPACE
+
+ 
+  This specifies that indexes will be rebuilt on a new tablespace.
+  Cannot be used with "mapped" relations. If SCHEMA,
+  DATABASE or SYSTEM is specified, then
+  all unsuitable relations will be skipped and a single WARNING
+  will be generated.
+ 
+
+   
+

 VERBOSE
 
@@ -210,6 +224,14 @@ REINDEX [ ( option [, ...] ) ] { IN
 

 
+   
+new_tablespace
+
+ 
+  The tablespace where indexes will be rebuilt.
+ 
+
+   
   
  
 
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index b8cd35e995..ed98b17483 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -57,6 +57,7 @@
 #include "commands/event_trigger.h"
 #include "commands/progress.h"
 #include "commands/tablecmds.h"
+#include "commands/tablespace.h"
 #include "commands/trigger.h"
 #include "executor/executor.h"
 #include "miscadmin.h"
@@ -1394,9 +1395,13 @@ index_update_collation_versions(Oid relid, Oid coll)
  * Create concurrently an index based on the definition of the one provided by
  * caller.  The index is inserted into catalogs and needs to be built later
  * on.  This is called during concurrent reindex processing.
+ *
+ * "tablespaceOid" is the new tablespace to use for this index.  If
+ * InvalidOid, use the tablespace in-use instead.
  */
 Oid
-index_concurrently_create_copy(Relation heapRelation, Oid oldIndexId, const char *newName)
+index_concurrently_create_copy(Relation heapRelation, Oid oldIndexId,
+			   Oid tablespaceOid, const char *newName)
 {
 	Relation	indexRelation;
 	IndexInfo  *oldInfo,
@@ -1526,7 +1531,8 @@ index_concurrently_create_copy(Relation heapRelation, Oid oldIndexId, const char
 			  newInfo,
 			  indexColNames,
 			  indexRelation->rd_rel->relam,
-			  indexRelation->rd_rel->reltablespace,
+			  OidIsValid(tablespaceOid) ?
+tablespaceOid : indexRelation->rd_rel->reltablespace,
 			  indexRelation->rd_indcollation,
 			  indclass->values,
 			  indcoloptions->values,
@@ -3591,6 +3597,8 @@ IndexGetRelation(Oid indexId, bool missing_ok)
 
 /*
  * reindex_index - This routine is used to recreate a single index
+ *
+ * See comments of reindex_relation() for details about "tablespaceOid".
  */
 void
 reindex_index(Oid indexId, bool skip_constraint_checks, char persistence,
@@ -3603,6 +3611,7 @@ reindex_index(Oid indexId, bool skip_constraint_checks, char persistence,
 	volatile bool skipped_constraint = false;
 	PGRUsage	ru0;
 	bool		progress = ((params->options & REINDEXOPT_REPORT_PROGRESS) != 0);
+	bool		set_tablespace = OidIsValid(params->tablespaceOid);
 
 	pg_rusage_init();
 

Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly

2021-01-20 Thread Alexey Kondratov

On 2021-01-20 18:54, Alvaro Herrera wrote:

On 2021-Jan-20, Alvaro Herrera wrote:


On 2021-Jan-20, Michael Paquier wrote:

> +/*
> + * This is mostly duplicating ATExecSetTableSpaceNoStorage,
> + * which should maybe be factored out to a library function.
> + */
> Wouldn't it be better to do first the refactoring of 0002 and then
> 0001 so as REINDEX can use the new routine, instead of putting that
> into a comment?

I think merging 0001 and 0002 into a single commit is a reasonable
approach.


... except it doesn't make a lot of sense to have set_rel_tablespace in
either indexcmds.c or index.c.  I think tablecmds.c is a better place
for it.  (I would have thought catalog/storage.c, but that one's not the
right abstraction level it seems.)



I did a refactoring of ATExecSetTableSpaceNoStorage() in 0001. The new
function SetRelTablespace() is placed in tablecmds.c, and the following
0002 makes use of it. Is it close to what you and Michael suggested?




But surely ATExecSetTableSpaceNoStorage should be using this new
routine.  (I first thought 0002 was doing that, since that commit is
calling itself a "refactoring", but now that I look closer, it's not.)



Yeah, this 'refactoring' initially referred to refactoring what Justin
added to a previous version of 0001. It was meant to be merged into 0001
once agreed, but we got distracted by other stuff.


I have not yet addressed Michael's concerns regarding reindex of
partitions. I am going to look more closely at it tomorrow.
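
For what it's worth, a rough editorial sketch of how Alvaro's point might
look once the helper exists (names are taken from this discussion, not
from the posted patches; the elided checks and invalidation logic would
stay as in the current code):

static void
ATExecSetTableSpaceNoStorage(Relation rel, Oid newTableSpace)
{
	Oid			reloid = RelationGetRelid(rel);

	/* ... existing permission and sanity checks ... */

	/* Delegate the pg_class and dependency update to the shared helper. */
	SetRelTablespace(reloid, newTableSpace);

	/* ... existing relcache invalidation ... */
}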



Regards
--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company




Re: strange error reporting

2021-01-20 Thread Robert Haas
On Wed, Jan 20, 2021 at 12:47 PM Tom Lane  wrote:
> Fair.  One possibility, which'd take a few more cycles in libpq but
> likely not anything significant, is to replace "could not connect to ..."
> with "while connecting to ..." once we're past the connect() per se.

Yeah. I think this is kind of a client-side version of errcontext(),
except we don't really have that context formally, so we're trying to
figure out how to fake it in specific cases.

> > Maybe it would be better if it said:
>
> > connection to database at socket "/tmp/.s.PGSQL.5432" failed: FATAL:
> > database "rhaas" does not exist
>
> I'd be inclined to spell it "connection to server at ... failed",
> but that sort of wording is surely also possible.

"connection to server" rather than "connection to database" works for
me; in fact, I think I like it slightly better.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: strange error reporting

2021-01-20 Thread Tom Lane
Robert Haas  writes:
> On Wed, Jan 20, 2021 at 12:19 PM Tom Lane  wrote:
>> But the 'could not connect to socket' part is a consequence of my
>> recent fiddling with libpq's connection failure reporting, see
>> 52a10224e.  We could discuss exactly how that ought to be spelled,
>> but the idea is to consistently identify the host that we were trying
>> to connect to.  If you have a multi-host connection string, it's
>> conceivable that "rhaas" exists on some of those hosts and not others,
>> so I do not think the info is irrelevant.

> I'm not saying that which socket I used is totally irrelevant although
> in most cases it's going to be a lot of detail. I'm just saying that,
> at least for me, when you say you can't connect to a socket, I at
> least think about the return value of connect(2), which was clearly 0
> here.

Fair.  One possibility, which'd take a few more cycles in libpq but
likely not anything significant, is to replace "could not connect to ..."
with "while connecting to ..." once we're past the connect() per se.

> Maybe it would be better if it said:

> connection to database at socket "/tmp/.s.PGSQL.5432" failed: FATAL:
> database "rhaas" does not exist

I'd be inclined to spell it "connection to server at ... failed",
but that sort of wording is surely also possible.

regards, tom lane




Re: Getting column names/types from select query?

2021-01-20 Thread Wesley Aptekar-Cassels
> Where do you need this information?

I'm writing some code that takes a given query, and generates type-safe 
bindings for it, so people can write SQL queries and get structs (or vectors of 
structs) out the other end. So I'm pretty flexible about where I get it, given 
that it'll be part of my build/codegen process. I hadn't seen libpq yet, I'll 
look into that — thanks!




Re: strange error reporting

2021-01-20 Thread Robert Haas
On Wed, Jan 20, 2021 at 12:19 PM Tom Lane  wrote:
> Robert Haas  writes:
> > I just made the mistake of trying to run pgbench without first running
> > createdb and got this:
>
> > pgbench: error: connection to database "" failed: could not connect to
> > socket "/tmp/.s.PGSQL.5432": FATAL:  database "rhaas" does not exist
>
> > This looks pretty bogus because (1) I was not attempting to connect to
> > a database whose name is the empty string and (2) saying that it
> > couldn't connect to the socket is wrong, else it would not also be
> > showing a server message.
>
> I'm not sure about the empty DB name in the first part (presumably
> that's from pgbench, so what was your pgbench command exactly?).

I think it was just 'pgbench -i 40'. For sure, I didn't specify a database name.

> But the 'could not connect to socket' part is a consequence of my
> recent fiddling with libpq's connection failure reporting, see
> 52a10224e.  We could discuss exactly how that ought to be spelled,
> but the idea is to consistently identify the host that we were trying
> to connect to.  If you have a multi-host connection string, it's
> conceivable that "rhaas" exists on some of those hosts and not others,
> so I do not think the info is irrelevant.

I'm not saying that which socket I used is totally irrelevant although
in most cases it's going to be a lot of detail. I'm just saying that,
at least for me, when you say you can't connect to a socket, I at
least think about the return value of connect(2), which was clearly 0
here.

> Just looking at this, I wonder if we ought to drop pgbench's
> contribution to the message entirely; it seems like libpq's
> message is now fairly freestanding.

Maybe it would be better if it said:

connection to database at socket "/tmp/.s.PGSQL.5432" failed: FATAL:
database "rhaas" does not exist

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Phrase search vs. multi-lexeme tokens

2021-01-20 Thread Alexander Korotkov
On Thu, Jan 7, 2021 at 6:36 AM Alexander Korotkov  wrote:
>
> > I read your patch over quickly and it seems like a reasonable
> > approach (but sadly underdocumented).  Can we extend the idea
> > to fix the to_tsquery case?
>
> Sure, I'll provide a revised patch.

The next version of the patch is attached.  Now, it just makes
to_tsquery() and websearch_to_tsquery() use the phrase operator to
connect multiple lexemes of the same tsquery token.  I leave
plainto_tsquery() aside because it considers the whole argument as a
single token; changing it would make it equivalent to phraseto_tsquery().

--
Regards,
Alexander Korotkov


tsquery_phrase_fix.path
Description: Binary data


Re: strange error reporting

2021-01-20 Thread Tom Lane
Robert Haas  writes:
> I just made the mistake of trying to run pgbench without first running
> createdb and got this:

> pgbench: error: connection to database "" failed: could not connect to
> socket "/tmp/.s.PGSQL.5432": FATAL:  database "rhaas" does not exist

> This looks pretty bogus because (1) I was not attempting to connect to
> a database whose name is the empty string and (2) saying that it
> couldn't connect to the socket is wrong, else it would not also be
> showing a server message.

I'm not sure about the empty DB name in the first part (presumably
that's from pgbench, so what was your pgbench command exactly?).
But the 'could not connect to socket' part is a consequence of my
recent fiddling with libpq's connection failure reporting, see
52a10224e.  We could discuss exactly how that ought to be spelled,
but the idea is to consistently identify the host that we were trying
to connect to.  If you have a multi-host connection string, it's
conceivable that "rhaas" exists on some of those hosts and not others,
so I do not think the info is irrelevant.

Just looking at this, I wonder if we ought to drop pgbench's
contribution to the message entirely; it seems like libpq's
message is now fairly freestanding.

regards, tom lane




Jsonpath ** vs lax mode

2021-01-20 Thread Alexander Korotkov
Hi!

We have a bug report saying that the jsonpath ** operator behaves
strangely in lax mode [1].

Naturally, the result of this query looks counter-intuitive.

# select jsonb_path_query_array('[{"a": 1, "b": [{"a": 2}]}]', 'lax
$.**.a');
 jsonb_path_query_array
-------------------------
 [1, 1, 2, 2]
(1 row)

But actually, everything works as designed.  The ** operator reports both
objects and wrapping arrays, while the object key accessor automatically
unwraps arrays.

# select x, jsonb_path_query_array(x, '$.a') from jsonb_path_query('[{"a":
1, "b": [{"a": 2}]}]', 'lax $.**') x;
              x              | jsonb_path_query_array
-----------------------------+------------------------
 [{"a": 1, "b": [{"a": 2}]}] | [1]
 {"a": 1, "b": [{"a": 2}]}   | [1]
 1                           | []
 [{"a": 2}]                  | [2]
 {"a": 2}                    | [2]
 2                           | []
(6 rows)

At first sight, we might just say that lax mode sucks and
counter-intuitive results are expected.  But on second thought, lax
mode is used by default and the current behavior may look too surprising.

My proposal is to make everything after the ** operator use strict mode
(patch attached).  I think this shouldn't be backpatched, just applied to
the v14.  Other suggestions?

Links
1.
https://www.postgresql.org/message-id/16828-2b0229babfad2d8c%40postgresql.org

--
Regards,
Alexander Korotkov


jsonpath_double_star_strict_mode.patch
Description: Binary data


strange error reporting

2021-01-20 Thread Robert Haas
I just made the mistake of trying to run pgbench without first running
createdb and got this:

pgbench: error: connection to database "" failed: could not connect to
socket "/tmp/.s.PGSQL.5432": FATAL:  database "rhaas" does not exist

This looks pretty bogus because (1) I was not attempting to connect to
a database whose name is the empty string and (2) saying that it
couldn't connect to the socket is wrong, else it would not also be
showing a server message.

I haven't investigated why this is happening; apologies if this is a
known issue.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Odd, intermittent failure in contrib/pageinspect

2021-01-20 Thread Tom Lane
Michael Paquier  writes:
> On Tue, Jan 19, 2021 at 05:03:49PM -0500, Tom Lane wrote:
>> In short I propose the attached patch, which also gets rid of
>> that duplicate query.

> Agreed, +1.

Pushed, thanks for looking at it.

regards, tom lane




Re: Getting column names/types from select query?

2021-01-20 Thread Tom Lane
"Wesley Aptekar-Cassels"  writes:
> I am interested in figuring out how to get the names and types of the
> columns from an arbitrary query.

Where do you need this information?  Usually the easiest way is to
prepare (plan) the query and then extract metadata, for instance
PQprepare and PQdescribePrepared if using libpq.
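
For concreteness, a minimal sketch of that libpq route (the connection
string and statement name are placeholders; PQftype() returns a pg_type
OID, and for a varchar column the declared length is carried in the
typmod):

#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
	PGconn	   *conn = PQconnectdb("dbname=postgres");
	PGresult   *res;
	int			i;

	if (PQstatus(conn) != CONNECTION_OK)
	{
		fprintf(stderr, "%s", PQerrorMessage(conn));
		PQfinish(conn);
		return 1;
	}

	/* Plan the query without executing it. */
	res = PQprepare(conn, "stmt", "SELECT * FROM foo WHERE bar = 42", 0, NULL);
	PQclear(res);

	/* Fetch metadata describing the prepared statement's result columns. */
	res = PQdescribePrepared(conn, "stmt");
	for (i = 0; i < PQnfields(res); i++)
		printf("column %d: name=%s type_oid=%u typmod=%d\n",
			   i, PQfname(res, i), PQftype(res, i), PQfmod(res, i));

	PQclear(res);
	PQfinish(conn);
	return 0;
}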

regards, tom lane




Re: [HACKERS] [PATCH] Generic type subscripting

2021-01-20 Thread Dian M Fay
On Wed Jan 20, 2021 at 11:22 AM EST, Dmitry Dolgov wrote:
> > On Tue Jan 19, 2021 at 1:42 PM EST, Pavel Stehule wrote:
> >
> > I found minor issues.
> >
> > Doc - missing tag
> >
> > and three whitespaces issues
> >
> > see attached patch
>
> Thanks, I need to remember not to skip doc building during testing,
> even for such small changes. I hope I didn't forget anything this time.
>
> > On Wed, Jan 20, 2021 at 09:58:43AM -0500, Dian M Fay wrote:
>
> > Here's a full editing pass on the documentation, with v45 and Pavel's
> > doc-whitespaces-fix.patch applied. I also corrected a typo in one of the
> > added hints.
>
> Great! I've applied almost all of it, except:
>
> + A jsonb value will accept assignments to nonexistent subscript
> + paths as long as the nonexistent elements being traversed are all arrays.
>
> Maybe I've misunderstood the intention, but there is no requirement
> about arrays for creating such an empty path. I've formulated it as:
>
> + A jsonb value will accept assignments to nonexistent subscript
> + paths as long as the last existing path key is an object or an array.

My intention there was to highlight the difference between:

* SET obj['a']['b']['c'] = '"newvalue"'
* SET arr[0][0][3] = '"newvalue"'

obj has to conform to {"a": {"b": {...}}} in order to receive the
assignment of the nested c. If it doesn't, that's the error case we
discussed earlier. But arr can be null, [], and so on, and any missing
structure [[[null, null, null, "newvalue"]]] will be created. Take 2:

A jsonb value will accept assignments to nonexistent
subscript paths as long as object key subscripts can be traversed as
described above. The final subscript is not traversed and, if it
describes a missing object key, will be created. Nested arrays will
always be created and NULL-padded according to the
path until the value can be placed appropriately.




Getting column names/types from select query?

2021-01-20 Thread Wesley Aptekar-Cassels
Hi all,

I am interested in figuring out how to get the names and types of the columns 
from an arbitrary query. Essentially, I want to be able to take a query like:

CREATE TABLE foo(
bar bigserial,
baz varchar(256)
);

SELECT * FROM foo WHERE bar = 42;

and figure out programmatically that the SELECT will return a column "bar" of
type bigserial and a column "baz" of type varchar(256). I would like this to
work for more complex queries as well (joins, CTEs, etc).

I've found https://wiki.postgresql.org/wiki/Query_Parsing, which talks about 
related ways to hook into postgres, but that seems to only talk about the parse 
tree — a lot more detail and processing seems to be required in order to figure 
out the output types. It seems like there should be somewhere I can hook into 
in postgres that will get me this information, but I have no familiarity with 
the codebase, so I don't know the best way to get this.

How would you recommend that I approach this? I'm comfortable patching postgres 
if needed, although if there's a solution that doesn't require that, I'd prefer 
that.

Thanks,

:w



