Re: ERROR: wrong varnullingrels (b 5 7) (expected (b)) for Var 3/3

2023-05-18 Thread Michael Paquier
On Fri, May 19, 2023 at 09:42:35AM +0800, tender wang wrote:
> postgres=# \d
> Did not find any relations.
> postgres=# CREATE TEMP TABLE t0(c0 inet, c1 money, c2 TEXT);
> CREATE TABLE
> postgres=# CREATE TEMP TABLE IF NOT EXISTS t1(LIKE t0);
> CREATE TABLE
> postgres=# CREATE TEMP TABLE t2(c0 boolean , c1 DECIMAL  NOT NULL UNIQUE);
> CREATE TABLE
> postgres=# CREATE TEMPORARY TABLE t3(LIKE t1);
> CREATE TABLE
> postgres=# CREATE VIEW v0(c0) AS (SELECT DISTINCT 0 FROM t3);
> NOTICE:  view "v0" will be a temporary view
> CREATE VIEW
> postgres=# SELECT SUM(count) FROM (SELECT
> (((t1.c2)LIKE(t0.c2)||(((103556691)-(v0.c0)||(v0.c0)::INT as
> count FROM t0, ONLY t1 RIGHT OUTER JOIN ONLY t2 ON t2.c0 RIGHT OUTER JOIN
> v0 ON ((t2.c1)=(0.08182310538090898))) as res;
> ERROR:  wrong varnullingrels (b 5 7) (expected (b)) for Var 3/3

Thanks for the test case; the issue reproduces here on HEAD but not on v15.
This causes an assertion failure here:
#4  0x55a6f8faa776 in ExceptionalCondition (conditionName=0x55a6f915ac60 
"bms_equal(rel->relids, root->all_query_rels)", fileName=0x55a6f915ac3d 
"allpaths.c", 
lineNumber=234) at assert.c:66
#5  0x55a6f8c55b6d in make_one_rel (root=0x55a6fa814ea8, 
joinlist=0x55a6fa83f758) at allpaths.c:234
#6  0x55a6f8c95c45 in query_planner (root=0x55a6fa814ea8, 
qp_callback=0x55a6f8c9c373 <standard_qp_callback>, qp_extra=0x7ffc98138570) at 
planmain.c:278
#7  0x55a6f8c9860a in grouping_planner (root=0x55a6fa814ea8, 
tuple_fraction=0) at planner.c:1493

I am adding an open item.
--
Michael




Re: createuser --memeber and PG 16

2023-05-18 Thread Michael Paquier
On Thu, May 18, 2023 at 10:22:32PM -0400, Bruce Momjian wrote:
> Patch applied.

Thanks, Bruce.
--
Michael




Re: PostgreSQL 16 Beta 1 release announcement draft

2023-05-18 Thread Erik Rijkers

On 5/19/23 at 06:17, Jonathan S. Katz wrote:

Hi,

Attached is a draft of the release announcement for PostgreSQL 16 Beta 1.

Hi,


The usual small fry:


'continues to to'  should be
'continues to'

'continues to give users to the ability'  should be
'continues to give users the ability to'

'pg_createsubscription'  should be
'pg_create_subscription'

'starting with release'  should be
'starting with this release'

'credentials to connected to other services'  should be
'credentials to connect to other services'


Thanks,

Erik





Re: psql: Could we get "-- " prefixing on the **** QUERY **** outputs? (ECHO_HIDDEN)

2023-05-18 Thread Andrey M. Borodin



> On 18 May 2023, at 02:23, Kirk Wolak  wrote:
> 
> I labelled this v2. 

+1 to the feature and the patch looks good to me.

I have a question, but mostly for my own knowledge. Translation changes seem 
trivial for all languages; do we typically fix .po files in such cases, or do 
we leave it to the translators to revise them?

Thanks!

Best regards, Andrey Borodin.





PostgreSQL 16 Beta 1 release announcement draft

2023-05-18 Thread Jonathan S. Katz

Hi,

Attached is a draft of the release announcement for PostgreSQL 16 Beta 
1. The goal of this announcement is to get people excited about testing 
the beta and highlight many of the new features.


Please review for inaccuracies, omissions, and other suggestions / errors.

Please provide feedback no later than May 24, 0:00 AoE. This will give 
me enough time to incorporate the changes prior to the release the next day.


Thanks,

Jonathan
The PostgreSQL Global Development Group announces that the first beta release of
PostgreSQL 16 is now [available for 
download](https://www.postgresql.org/download/).
This release contains previews of all features that will be available when
PostgreSQL 16 is made generally available, though some details of the release
can change during the beta period.

You can find information about all of the features and changes found in
PostgreSQL 16 in the [release 
notes](https://www.postgresql.org/docs/16/release-16.html):

  
[https://www.postgresql.org/docs/16/release-16.html](https://www.postgresql.org/docs/16/release-16.html)

In the spirit of the open source PostgreSQL community, we strongly encourage you
to test the new features of PostgreSQL 16 on your systems to help us eliminate
bugs or other issues that may exist. While we do not advise you to run
PostgreSQL 16 Beta 1 in production environments, we encourage you to find ways
to run your typical application workloads against this beta release.

Your testing and feedback will help the community ensure that the PostgreSQL 16
release upholds our standards of delivering a stable, reliable release of the
world's most advanced open source relational database. Please read more about
our [beta testing process](https://www.postgresql.org/developer/beta/) and how
you can contribute:

  
[https://www.postgresql.org/developer/beta/](https://www.postgresql.org/developer/beta/)

## PostgreSQL 16 Feature Highlights


### Logical replication enhancements

Logical replication lets PostgreSQL users stream data in real-time to other
PostgreSQL or other external systems that implement the logical protocol. Until
PostgreSQL 16, users could only create logical replication publishers on primary
instances. PostgreSQL 16 adds the ability to perform logical decoding on a
standby instance, giving users more options to distribute their workload, for
example, using a standby that's less busy than the primary to logically
replicate changes.

PostgreSQL 16 also includes several performance improvements to logical
replication. This includes allowing the subscriber to apply large transactions
in parallel, to use indexes other than the `PRIMARY KEY` to perform lookups
during `UPDATE` or `DELETE` operations, and to copy tables using the binary
format during initialization.
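
For illustration, a minimal sketch (not part of the draft) of a subscription
using the new options; the option names follow the PostgreSQL 16
documentation, while the connection string and publication name are
placeholders:

    -- apply large transactions in parallel and copy tables in binary format
    CREATE SUBSCRIPTION sub
        CONNECTION 'host=publisher dbname=postgres'
        PUBLICATION pub
        WITH (streaming = parallel, binary = true);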

### Performance

PostgreSQL 16 includes performance improvements in query execution. This release
adds more query parallelism, including allowing `FULL` and `OUTER` joins to
execute in parallel, and parallel execution of the `string_agg` and `array_agg`
aggregate functions. Additionally, PostgreSQL 16 can use incremental sorts in
`SELECT DISTINCT` queries. There are also several optimizations for
[window 
queries](https://www.postgresql.org/docs/devel/sql-expressions.html#SYNTAX-WINDOW-FUNCTIONS),
improvements in lookups for `RANGE` and `LIST` partitions, and support for
"anti-joins" in `RIGHT` and `OUTER` queries.

This release also introduces support for CPU acceleration using SIMD for both
x86 and ARM architectures, including optimizations for processing ASCII and JSON
strings, and array and subtransaction searches. Additionally, PostgreSQL 16
introduces [load 
balancing](https://www.postgresql.org/docs/devel/libpq-connect.html#LIBPQ-CONNECT-LOAD-BALANCE-HOSTS)
to libpq, the client library for PostgreSQL.

### Developer Experience

PostgreSQL 16 continues to to implement the 
[SQL/JSON](https://www.postgresql.org/docs/devel/functions-json.html)
standard for manipulating 
[JSON](https://www.postgresql.org/docs/devel/datatype-json.html)
data, including support for SQL/JSON constructors (e.g. `JSON_ARRAY()`,
`JSON_ARRAYAGG()` et al), and identity functions (`IS JSON`). This release also
adds the SQL standard 
[`ANY_VALUE`](https://www.postgresql.org/docs/devel/functions-aggregate.html#id-1.5.8.27.5.2.4.1.1.1.1)
aggregate function, which returns any arbitrary value from the aggregate set.
For convenience, PostgreSQL 16 now lets you specify non-decimal integer
literals, such as `0xff`, `0o777`, and `0b101010`, and use thousands separators,
such as `5_432`.
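
As a quick illustration (not part of the draft), the new literal syntax in a
psql session; the four values below evaluate to 255, 511, 42, and 5432:

    SELECT 0xff, 0o777, 0b101010, 5_432;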

This release adds support for the extended query protocol to the 
[`psql`](https://www.postgresql.org/docs/devel/app-psql.html)
client. Users can execute a query, e.g. `SELECT $1 + $2`, and use the
[`\bind`](https://www.postgresql.org/docs/devel/app-psql.html#APP-PSQL-META-COMMAND-BIND)
command to substitute the variables.
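
A short sketch of what this looks like in psql 16 (the parameter values here
are placeholders):

    SELECT $1 + $2 \bind 1 2 \g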

### Security features

PostgreSQL 16 continues to give users to the ability 

RE: Reload configuration more frequently in apply worker.

2023-05-18 Thread Zhijie Hou (Fujitsu)
On Wednesday, May 17, 2023 11:05 AM Amit Kapila  wrote:
> 
> On Wed, May 17, 2023 at 7:18 AM Zhijie Hou (Fujitsu) wrote:
> >
> > Currently, the main loop of apply worker looks like below[1]. Since
> > there are two loops, the inner loop will keep receiving and applying
> > message from publisher until no more message left. The worker only
> > reloads the configuration in the outer loop. This means if the
> > publisher keeps sending messages (it could keep sending multiple
> > transactions), the apply worker won't get a chance to update the GUCs.
> >
> 
> Apart from that, I think in rare cases, it seems possible that after the apply
> worker has waited for the data and just before it receives the new replication
> data/message, the reload happens, then it won't get a chance to process the
> reload before processing the new message.
> I think such a theory can explain the rare BF failure you pointed out later 
> in the
> thread. Does that make sense?

Yes, that makes sense. That's another case where we would miss the reload, and
I think it is the reason for the failure, because the apply worker had finished
applying changes (which means it was idle) before the failed case.
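
For context, a minimal sketch of the kind of check being discussed, using the
existing ConfigReloadPending / ProcessConfigFile() facilities; the exact
placement inside the apply worker's receive loop is illustrative, not the
committed change:

    /* inside the inner receive/apply loop, between messages */
    if (ConfigReloadPending)
    {
        ConfigReloadPending = false;
        ProcessConfigFile(PGC_SIGHUP);
    }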

Best Regards,
Hou zj


Re: Assert failure of the cross-check for nullingrels

2023-05-18 Thread Richard Guo
On Fri, May 19, 2023 at 12:33 AM Tom Lane  wrote:

> Bleah.  The other solution I'd been poking at involved adding an
> extra check for clone clauses, as attached (note this requires
> 8a2523ff3).  This survives your example, but I wonder if it might
> reject all the clones in some cases.  It seems a bit expensive
> too, although as I said before, I don't think the clone cases get
> traversed all that often.


I tried the v4 patch and found that, as you predicted, it can reject
all the clones in some cases.  Check the query below:

explain (costs off)
select * from t t1
left join t t2 on t1.a = t2.a
left join t t3 on t2.a = t3.a
left join t t4 on t3.a = t4.a and t2.b = t4.b;
QUERY PLAN
--
 Hash Left Join
   Hash Cond: (t2.b = t4.b)
   ->  Hash Left Join
 Hash Cond: (t2.a = t3.a)
 ->  Hash Left Join
   Hash Cond: (t1.a = t2.a)
   ->  Seq Scan on t t1
   ->  Hash
 ->  Seq Scan on t t2
 ->  Hash
   ->  Seq Scan on t t3
   ->  Hash
 ->  Seq Scan on t t4
(13 rows)

So the qual 't3.a = t4.a' is missing in this plan shape.


> Perhaps another answer could be to compare against syn_righthand
> for clone clauses and min_righthand for non-clones?  That seems
> mighty unprincipled though.


I also checked this solution with the same query.

explain (costs off)
select * from t t1
left join t t2 on t1.a = t2.a
left join t t3 on t2.a = t3.a
left join t t4 on t3.a = t4.a and t2.b = t4.b;
QUERY PLAN
--
 Hash Left Join
   Hash Cond: ((t3.a = t4.a) AND (t3.a = t4.a) AND (t2.b = t4.b))
   ->  Hash Left Join
 Hash Cond: (t2.a = t3.a)
 ->  Hash Left Join
   Hash Cond: (t1.a = t2.a)
   ->  Seq Scan on t t1
   ->  Hash
 ->  Seq Scan on t t2
 ->  Hash
   ->  Seq Scan on t t3
   ->  Hash
 ->  Seq Scan on t t4
(13 rows)

This time the qual 't3.a = t4.a' is back, but twice.

I keep thinking about my proposal in the v2 patch.  It seems to me a more
natural way to fix this issue, because an outer join's quals are always treated
as a whole when we check if identity 3 applies in make_outerjoininfo, as
well as when we adjust the outer join's quals for commutation in
deconstruct_distribute_oj_quals.  So when it comes to check if quals are
computable at a join level, they should be still treated as a whole.
This should have the same effect regarding qual placement if the quals
of an outer join are in form of 'qual1 OR qual2 OR ...' rather than
'qual1 AND qual2 AND ...'.

Thanks
Richard


Re: PG 16 draft release notes ready

2023-05-18 Thread Bruce Momjian
On Fri, May 19, 2023 at 10:41:59AM +0800, jian he wrote:
> On Fri, May 19, 2023 at 4:49 AM Bruce Momjian  wrote:
> * When granting role membership, require the granted-by role to be a role
> that has appropriate permissions (Robert Haas)
> This is a requirement even when the superuser is granting role membership.
> 
> 
> an exception would be the granted-by is the bootstrap superuser.

Okay, updated text is:



When granting role membership, require the granted-by role to be a role 
that has appropriate permissions (Robert Haas)



This is a requirement even when a non-bootstrap superuser is granting 
role membership.
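
As an illustrative sketch of the requirement (my reading of the change; the
role names are hypothetical):

    GRANT staff TO alice GRANTED BY bob;
    -- even when run by a superuser, bob must hold ADMIN OPTION on staff
    -- (or be the bootstrap superuser) for this to succeed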



-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: PG 16 draft release notes ready

2023-05-18 Thread Bruce Momjian
On Thu, May 18, 2023 at 07:15:26PM -0700, Peter Geoghegan wrote:
> On Thu, May 18, 2023 at 6:44 PM Bruce Momjian  wrote:
> > > I don't understand what you mean by that. The changes to
> > > *_till_end_of_wal() (the way that those duplicative functions were
> > removed, and more permissive end_lsn behavior was added) are unrelated
> > > to all of the other changes. Plus it's just not very important.
> >
> > I see what you mean now.  I have moved the function removal to the
> > incompatibilities section and kept the existing entry but removed the
> > text about the removed functions.
> 
> Your patch now has two separate items for "[5c1b66280] Rework design
> of functions in pg_walinspect", but even one item is arguably one too
> many. The "ending LSN" item (the second item for this same commit)

I see your point.  pg_get_wal_block_info() doesn't exist in pre-PG 16,
so I have removed that text from the release note item, but kept the
other two functions.

> should probably be removed altogether. If you're going to keep the
> sentences that appear under that second item, then it should at least
> be consolidated with the first item, in order that commit 5c1b66280
> isn't listed twice.

We can list items twice if they have different focuses.

> Note also that the patch doesn't remove a remaining reference to an
> update in how pg_get_wal_block_info() works, which (as I've said) is a
> brand new function as far as users are concerned. Users don't need to
> hear that it has been updated, since these release notes will also be
> the first time they've been presented with any information about
> pg_get_wal_block_info(). (This isn't very important; again, I suggest
> that you avoid saying anything about any specific function, even if
> you feel strongly that the "ending LSN" issue must be spelled out like
> this.)

Agreed.

> > > There is pretty much one truly new piece of functionality added to
> > > pg_walinspect (the function called pg_get_wal_block_info was added) --
> > > since the enhancement to rmgr description output applies equally to
> > > pg_waldump, no matter where you place it in the release notes. So not
> > > sure what you mean.
> >
> > I see what you mean now.  I have removed the mention of
> > pg_get_wal_block_info() and moved the three items back into the
> > extension section since there are only three pg_walinspect items now.
> 
> The wording for this item as it appears in the patch is: "Improve
> descriptions of pg_walinspect WAL record descriptions". I suggest the
> following wording be used instead: "Provide more detailed descriptions
> of certain WAL records in the output of pg_walinspect and pg_waldump".

I went with, "Add detailed descriptions of WAL records in pg_walinspect
and pg_waldump (Melanie Plageman, Peter Geoghegan)".

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: walsender performance regression due to logical decoding on standby changes

2023-05-18 Thread Kyotaro Horiguchi
At Thu, 18 May 2023 20:11:11 +0530, Bharath Rupireddy wrote in:
> > > + ConditionVariableInit(&WalSndCtl->physicalWALSndCV);
> > > + ConditionVariableInit(&WalSndCtl->logicalWALSndCV);
> >
> > It's not obvious to me that it's worth having two CVs, because it's more
> > expensive to find no waiters in two CVs than to find no waiters in one CV.
> 
> I disagree. In the tight per-WAL record recovery loop, WalSndWakeup
> wakes up logical walsenders for every WAL record, but it wakes up
> physical walsenders only if the applied WAL record causes a TLI
> switch. Therefore, the extra cost of spinlock acquire-release for per
> WAL record applies only for logical walsenders. On the other hand, if
> we were to use a single CV, we would be unnecessarily waking up (if at
> all they are sleeping) physical walsenders for every WAL record -
> which is costly IMO.

As I was reading this, I started thinking that one reason for the
regression could be the excessive frequency of wakeups during logical
replication. In physical replication, we make sure to avoid excessive
wakeups when the stream is tightly packed.  I'm just wondering why
logical replication doesn't (or can't) do the same thing.
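
For readers following along, a rough sketch of the two-CV wakeup shape being
argued for; the function signature and field names are assumptions for
illustration, not the committed code:

    /* wake only the walsenders that can care about this record */
    static void
    WalSndWakeup(bool physical, bool logical)
    {
        if (physical)
            ConditionVariableBroadcast(&WalSndCtl->physicalWALSndCV);
        if (logical)
            ConditionVariableBroadcast(&WalSndCtl->logicalWALSndCV);
    }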

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center







Re: PG 16 draft release notes ready

2023-05-18 Thread jian he
On Fri, May 19, 2023 at 4:49 AM Bruce Momjian  wrote:

> I have completed the first draft of the PG 16 release notes.  You can
> see the output here:
>
> https://momjian.us/pgsql_docs/release-16.html
>
> I will adjust it to the feedback I receive;  that URL will quickly show
> all updates.
>
> I learned a few things creating it this time:
>
> *  I can get confused over C function names and SQL function names in
>commit messages.
>
> *  The sections and ordering of the entries can greatly clarify the
>items.
>
> *  The feature count is slightly higher than recent releases:
>
> release-10:  189
> release-11:  170
> release-12:  180
> release-13:  178
> release-14:  220
> release-15:  184
> --> release-16:  200
>
> --
>   Bruce Momjian  https://momjian.us
>   EDB  https://enterprisedb.com
>
>   Only you can decide what is important to you.



* When granting role membership, require the granted-by role to be a role
> that has appropriate permissions (Robert Haas)
> This is a requirement even when the superuser is granting role membership.


an exception would be when the granted-by role is the bootstrap superuser.


Re: createuser --memeber and PG 16

2023-05-18 Thread Bruce Momjian
On Wed, May 10, 2023 at 01:33:26PM -0400, Bruce Momjian wrote:
> In writing the PG 16 release notes, I came upon an oddity in our new
> createuser syntax, specifically --role and --member.  It turns out that
> --role matches CREATE ROLE ... IN ROLE (and has prior to PG 16) while
> the new --member option matches CREATE ROLE ... ROLE.  The PG 16 feature
> discussion thread is here:
> 
>   
> https://www.postgresql.org/message-id/flat/69a9851035cf0f0477bcc5d742b031a3%40oss.nttdata.com
> 
> This seems like it will be forever confusing to people.  I frankly don't
> know why --role matching CREATE ROLE ... IN ROLE was not already
> confusing in pre-PG 16.  Any new ideas for improvement?
> 
> At a minimum I would like to apply the attached doc patch to PG 16 to
> improve awkward wording in CREATE ROLE and createuser.

Patch applied.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: PG 16 draft release notes ready

2023-05-18 Thread Peter Geoghegan
On Thu, May 18, 2023 at 6:44 PM Bruce Momjian  wrote:
> > I don't understand what you mean by that. The changes to
> > *_till_end_of_wal() (the way that those duplicative functions were
> > removed, and more permissive end_lsn behavior was added) are unrelated
> > to all of the other changes. Plus it's just not very important.
>
> I see what you mean now.  I have moved the function removal to the
> incompatibilities section and kept the existing entry but removed the
> text about the removed functions.

Your patch now has two separate items for "[5c1b66280] Rework design
of functions in pg_walinspect", but even one item is arguably one too
many. The "ending LSN" item (the second item for this same commit)
should probably be removed altogether. If you're going to keep the
sentences that appear under that second item, then it should at least
be consolidated with the first item, in order that commit 5c1b66280
isn't listed twice.

Note also that the patch doesn't remove a remaining reference to an
update in how pg_get_wal_block_info() works, which (as I've said) is a
brand new function as far as users are concerned. Users don't need to
hear that it has been updated, since these release notes will also be
the first time they've been presented with any information about
pg_get_wal_block_info(). (This isn't very important; again, I suggest
that you avoid saying anything about any specific function, even if
you feel strongly that the "ending LSN" issue must be spelled out like
this.)

> > There is pretty much one truly new piece of functionality added to
> > pg_walinspect (the function called pg_get_wal_block_info was added) --
> > since the enhancement to rmgr description output applies equally to
> > pg_waldump, no matter where you place it in the release notes. So not
> > sure what you mean.
>
> I see what you mean now.  I have removed the mention of
> pg_get_wal_block_info() and moved the three items back into the
> extension section since there are only three pg_walinspect items now.

The wording for this item as it appears in the patch is: "Improve
descriptions of pg_walinspect WAL record descriptions". I suggest the
following wording be used instead: "Provide more detailed descriptions
of certain WAL records in the output of pg_walinspect and pg_waldump".

-- 
Peter Geoghegan




Re: PG 16 draft release notes ready

2023-05-18 Thread Bruce Momjian
On Fri, May 19, 2023 at 01:33:17AM +0200, Matthias van de Meent wrote:
> On Thu, 18 May 2023 at 22:49, Bruce Momjian  wrote:
> >
> > I have completed the first draft of the PG 16 release notes.  You can
> > see the output here:
> >
> > https://momjian.us/pgsql_docs/release-16.html
> >
> > I will adjust it to the feedback I receive;  that URL will quickly show
> > all updates.
> 
> I'm not sure if bugfixes like these are considered for release notes,
> but I'm putting it up here just in case:
> 
> As of 8fcb32db (new, only in PG16) we now enforce limits on the size
> of WAL records during construction, where previously we hoped that the
> WAL records didn't exceed those limits.
> This change is immediately user-visible through a change in behaviour
> of `pg_logical_emit_message(true, repeat('_', 2^30 - 10), repeat('_',
> 2^30 - 10))`, and extensions that implement their own rmgrs could also
> see a change in behavior from this change.

I saw that commit but I considered it sufficiently rare and sufficiently
internal that I did not include it.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: PG 16 draft release notes ready

2023-05-18 Thread Bruce Momjian
On Thu, May 18, 2023 at 04:12:26PM -0700, Peter Geoghegan wrote:
> On Thu, May 18, 2023 at 3:52 PM Bruce Momjian  wrote:
> > So, I looked at this and the problem is that this is best as a single
> > release note entry because we are removing and adding, and if I moved it
> > to compatibility, I am concerned the new feature will be missed.  Since
> > WAL inspection is a utility operation, in general, I think having it in
> > the pg_walinspect section makes the most sense.
> 
> I don't understand what you mean by that. The changes to
> *_till_end_of_wal() (the way that those duplicative functions were
> removed, and more permissive end_lsn behavior was added) are unrelated
> to all of the other changes. Plus it's just not very important.

I see what you mean now.  I have moved the function removal to the
incompatibilities section and kept the existing entry but removed the
text about the removed functions.

> > Okay, I went with:
> >
> > Improve descriptions of pg_walinspect WAL record descriptions
> > (Melanie Plageman, Peter Geoghegan)
> >
> > > Note also that the item "Add pg_waldump option --save-fullpage to dump
> > > full page images (David Christensen)" is tangentially related to
> > > pg_get_wal_block_info(), since you can also get FPIs using
> > > pg_get_wal_block_info() (in fact, that was originally its main
> > > purpose). I'm not saying that you necessarily need to connect them
> > > together in any way, but you might consider it.
> >
> > Well, there is so much _new_ in that tool that listing everything new
> > seems confusing.
> 
> There is pretty much one truly new piece of functionality added to
> pg_walinspect (the function called pg_get_wal_block_info was added) --
> since the enhancement to rmgr description output applies equally to
> pg_waldump, no matter where you place it in the release notes. So not
> sure what you mean.

I see what you mean now.  I have removed the mention of
pg_get_wal_block_info() and moved the three items back into the
extension section since there are only three pg_walinspect items now.

> > All changes committed.
> 
> Even after these changes, the release notes still refer to a function
> called "pg_get_wal_block". There is no such function, though -- not in
> Postgres 16, and not in any other major version.

Oh

> 
> As I said, there is a new function called "pg_get_wal_block_info". It
> should simply be presented as a whole new function that offers novel
> new functionality compared to what was available in Postgres 15 --
> without any further elaboration. (It happens to be true that
> pg_get_wal_block_info only reached its final form following multiple
> rounds of work in multiple commits, but that is of no consequence to
> users -- even the earliest form of the function appeared in a commit
> in the Postgres 16 cycle.)

Done.  Please see the URL for the updated text, diff attached.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.
diff --git a/doc/src/sgml/release-16.sgml b/doc/src/sgml/release-16.sgml
index 47f6cae907..5c17237966 100644
--- a/doc/src/sgml/release-16.sgml
+++ b/doc/src/sgml/release-16.sgml
@@ -91,6 +91,17 @@ Processing such indexes is still possible using REINDEX SYSTEM.
 
 
 
+
+
+
+
+Remove pg_walinspect functions pg_get_wal_records_info_till_end_of_wal() and pg_get_wal_stats_till_end_of_wal().
+
+
+
 
+
+
+
+Add pg_walinspect function pg_get_wal_block_info() to report WAL block information (Michael Paquier, Melanie Plageman, Bharath Rupireddy)
+
+
+
+
+
+
+
+Change how pg_walinspect functions pg_get_wal_records_info(), pg_get_wal_stats(), and pg_get_wal_block_info() interpret ending LSNs (Bharath Rupireddy)
+
+
+
+Previously ending LSNs which represent nonexistent WAL locations would generate an error, while they will now be interpreted as the end of the WAL.
+
+
+
+
+
+
+
+Improve descriptions of pg_walinspect WAL record descriptions (Melanie Plageman, Peter Geoghegan)
+
+
+
 
-
-
-
-Add pg_walinspect function pg_get_wal_block() to report WAL block information (Michael Paquier, Melanie Plageman, Bharath Rupireddy)
-
-
-
-
-
-
-
-Add output fields to pg_walinspect's function pg_get_wal_block_info() (Bharath Rupireddy, Peter Geoghegan)
-
-
-
-
-
-
-
-Change how pg_walinspect functions pg_get_wal_records_info(), pg_get_wal_stats(), and pg_get_wal_block_info() interpret ending LSNs (Bharath Rupireddy)
-
-
-
-Previously ending LSNs which represent nonexistent WAL locations would generate an error, while they will now be interpreted as the end of the WAL.  Functions pg_get_wal_records_info_till_end_of_wal() and
-pg_get_wal_stats_till_end_of_wal() have been removed.
-
-
-
-
-
-
-
-Improve descriptions of pg_walinspect WAL record descriptions (Melanie Plageman, Peter Geoghegan)
-
 
 
  


ERROR: wrong varnullingrels (b 5 7) (expected (b)) for Var 3/3

2023-05-18 Thread tender wang
Hi hackers,
   I found the $subject problem when using SQLancer.

   How to repeat:
 CREATE TEMP TABLE t0(c0 inet, c1 money, c2 TEXT);
CREATE TEMP TABLE IF NOT EXISTS t1(LIKE t0); CREATE TEMP TABLE t2(c0
boolean , c1 DECIMAL NOT NULL UNIQUE); CREATE TEMPORARY TABLE t3(LIKE t1);
CREATE VIEW v0(c0) AS (SELECT DISTINCT 0 FROM t3); SELECT SUM(count) FROM
(SELECT
(((t1.c2)LIKE(t0.c2)||(((103556691)-(v0.c0)||(v0.c0)::INT as
count FROM t0, ONLY t1 RIGHT OUTER JOIN ONLY t2 ON t2.c0 RIGHT OUTER JOIN
v0 ON ((t2.c1)=(0.08182310538090898))) as res;


psql (16devel)
Type "help" for help.

postgres=# \d
Did not find any relations.
postgres=# CREATE TEMP TABLE t0(c0 inet, c1 money, c2 TEXT);
CREATE TABLE
postgres=# CREATE TEMP TABLE IF NOT EXISTS t1(LIKE t0);
CREATE TABLE
postgres=# CREATE TEMP TABLE t2(c0 boolean , c1 DECIMAL  NOT NULL UNIQUE);
CREATE TABLE
postgres=# CREATE TEMPORARY TABLE t3(LIKE t1);
CREATE TABLE
postgres=# CREATE VIEW v0(c0) AS (SELECT DISTINCT 0 FROM t3);
NOTICE:  view "v0" will be a temporary view
CREATE VIEW
postgres=# SELECT SUM(count) FROM (SELECT
(((t1.c2)LIKE(t0.c2)||(((103556691)-(v0.c0)||(v0.c0)::INT as
count FROM t0, ONLY t1 RIGHT OUTER JOIN ONLY t2 ON t2.c0 RIGHT OUTER JOIN
v0 ON ((t2.c1)=(0.08182310538090898))) as res;
ERROR:  wrong varnullingrels (b 5 7) (expected (b)) for Var 3/3

regards, tender wang


Re: Missing update of all_hasnulls in BRIN opclasses

2023-05-18 Thread Tomas Vondra
On 5/18/23 20:45, Tomas Vondra wrote:
> ...
>
> 0001 fixes the issue. 0002 is the original fix, and 0003 is just the
> pageinspect changes (for master only).
> 
> For the backbranches, I thought about making the code more like master
> (by moving some of the handling from opclasses to brin.c), but decided
> not to. It'd be low-risk, but it feels wrong to kinda do what the master
> does under "oi_regular_nulls" flag.
> 

I've now pushed all these patches into the relevant branches, after some
minor last-minute tweaks, and so far they haven't caused any buildfarm
issues. Assuming this fully fixes the NULL-handling for BRIN, this
leaves just the deadlock issue discussed in [1].

It seems rather unfortunate that all these issues went unnoticed / unreported
essentially since BRIN was introduced in 9.5. To some extent it might be
explained by the fairly low likelihood of actually hitting the issue (just
the right timing, concurrency with summarization, NULL values, ...). It
took me quite a bit of time and luck to (accidentally) hit these issues
while stress testing the code.

But there's also the problem of writing tests for this kind of thing. To
exercise the interesting parts (e.g. the union_tuples), it's necessary
to coordinate the order of concurrent steps - but what's a good generic
way to do that (which we could use in TAP tests)? In manual testing it's
doable by setting breakpoints on particular lines and stepping through
the concurrent processes that way.

But that doesn't seem like a particularly great solution for regression
tests. I can imagine adding some sort of "probes" into the code and then
attaching breakpoints to those, but surely we're not the first project
needing this ...
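
Purely as a hypothetical sketch of the "probes" idea -- no such facility
exists in core at the time of this thread, and all names below are invented --
a test harness could register a hook and block a backend at a named point to
coordinate the concurrent steps:

    /* invented illustration, not a PostgreSQL API */
    typedef void (*probe_hook_type) (const char *name);
    extern probe_hook_type probe_hook;

    #define PROBE(name) \
        do { \
            if (probe_hook != NULL) \
                probe_hook(name); \
        } while (0)

    /* e.g. in brin.c, just before the union_tuples() path is taken */
    PROBE("brin-before-union-tuples");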


regards

[1]
https://www.postgresql.org/message-id/261e68bc-f5f5-5234-fb2c-af4f58351...@enterprisedb.com

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: The documentation for READ COMMITTED may be incomplete or wrong

2023-05-18 Thread Tom Lane
I wrote:
> Debian Code Search doesn't know of any outside code touching
> relsubs_done, so I think we are safe in dropping that code in
> ExecScanReScan.  It seems quite pointless anyway considering
> that up to now, EvalPlanQualBegin has always zeroed the whole
> array.

Oh, belay that.  What I'd forgotten is that it's possible that
the target relation is on the inside of a nestloop, meaning that
we might need to fetch the EPQ substitute tuple more than once.
So there are three possible states: blocked (never return a
tuple), ready to return a tuple, and done returning a tuple
for this scan.  ExecScanReScan needs to reset "done" to "ready",
but not touch the "blocked" state.  The attached v2 mechanizes
that using two bool arrays.

What I'm thinking about doing to back-patch this is to replace
one of the pointer fields in EPQState with a pointer to a
subsidiary palloc'd structure, where we can put the new fields
along with the cannibalized old one.  We've done something
similar before, and it seems a lot safer than having basically
different logic in v16 than earlier branches.

regards, tom lane

diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 0186be452c..c76fdf59ec 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -2469,7 +2469,7 @@ ExecBuildAuxRowMark(ExecRowMark *erm, List *targetlist)
  *	relation - table containing tuple
  *	rti - rangetable index of table containing tuple
  *	inputslot - tuple for processing - this can be the slot from
- *		EvalPlanQualSlot(), for the increased efficiency.
+ *		EvalPlanQualSlot() for this rel, for increased efficiency.
  *
  * This tests whether the tuple in inputslot still matches the relevant
  * quals. For that result to be useful, typically the input tuple has to be
@@ -2503,6 +2503,14 @@ EvalPlanQual(EPQState *epqstate, Relation relation,
 	if (testslot != inputslot)
 		ExecCopySlot(testslot, inputslot);
 
+	/*
+	 * Mark that an EPQ tuple is available for this relation.  (If there is
+	 * more than one result relation, the others remain marked as having no
+	 * tuple available.)
+	 */
+	epqstate->relsubs_done[rti - 1] = false;
+	epqstate->relsubs_blocked[rti - 1] = false;
+
 	/*
 	 * Run the EPQ query.  We assume it will return at most one tuple.
 	 */
@@ -2519,11 +2527,12 @@ EvalPlanQual(EPQState *epqstate, Relation relation,
 		ExecMaterializeSlot(slot);
 
 	/*
-	 * Clear out the test tuple.  This is needed in case the EPQ query is
-	 * re-used to test a tuple for a different relation.  (Not clear that can
-	 * really happen, but let's be safe.)
+	 * Clear out the test tuple, and mark that no tuple is available here.
+	 * This is needed in case the EPQ state is re-used to test a tuple for a
+	 * different target relation.
 	 */
 	ExecClearTuple(testslot);
+	epqstate->relsubs_blocked[rti - 1] = true;
 
 	return slot;
 }
@@ -2532,18 +2541,26 @@ EvalPlanQual(EPQState *epqstate, Relation relation,
  * EvalPlanQualInit -- initialize during creation of a plan state node
  * that might need to invoke EPQ processing.
  *
+ * If the caller intends to use EvalPlanQual(), resultRelations should be
+ * a list of RT indexes of potential target relations for EvalPlanQual(),
+ * and we will arrange that the other listed relations don't return any
+ * tuple during an EvalPlanQual() call.  Otherwise resultRelations
+ * should be NIL.
+ *
  * Note: subplan/auxrowmarks can be NULL/NIL if they will be set later
  * with EvalPlanQualSetPlan.
  */
 void
 EvalPlanQualInit(EPQState *epqstate, EState *parentestate,
- Plan *subplan, List *auxrowmarks, int epqParam)
+ Plan *subplan, List *auxrowmarks,
+ int epqParam, List *resultRelations)
 {
 	Index		rtsize = parentestate->es_range_table_size;
 
 	/* initialize data not changing over EPQState's lifetime */
 	epqstate->parentestate = parentestate;
 	epqstate->epqParam = epqParam;
+	epqstate->resultRelations = resultRelations;
 
 	/*
 	 * Allocate space to reference a slot for each potential rti - do so now
@@ -2566,6 +2583,7 @@ EvalPlanQualInit(EPQState *epqstate, EState *parentestate,
 	epqstate->recheckplanstate = NULL;
 	epqstate->relsubs_rowmark = NULL;
 	epqstate->relsubs_done = NULL;
+	epqstate->relsubs_blocked = NULL;
 }
 
 /*
@@ -2763,7 +2781,13 @@ EvalPlanQualBegin(EPQState *epqstate)
 		Index		rtsize = parentestate->es_range_table_size;
 		PlanState  *rcplanstate = epqstate->recheckplanstate;
 
-		MemSet(epqstate->relsubs_done, 0, rtsize * sizeof(bool));
+		/*
+		 * Reset the relsubs_done[] flags to equal relsubs_blocked[], so that
+		 * the EPQ run will never attempt to fetch tuples from blocked target
+		 * relations.
+		 */
+		memcpy(epqstate->relsubs_done, epqstate->relsubs_blocked,
+			   rtsize * sizeof(bool));
 
 		/* Recopy current values of parent parameters */
 		if (parentestate->es_plannedstmt->paramExecTypes != NIL)
@@ -2931,10 +2955,22 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
 	}
 
 	/*
-	 

Re: Adding SHOW CREATE TABLE

2023-05-18 Thread Stephen Frost
Greetings,

* Kirk Wolak (wol...@gmail.com) wrote:
> My approach for now is to develop this as the \st command.
> After reviewing the code/output from the 3 sources (psql, fdw, and
> pg_dump), this trivializes the approach and requires the smallest set of
> changes (psql is already close with existing queries, etc).
> 
> And frankly, I would rather have an \st feature that handles 99% of the use
> cases than go 15 years waiting for a perfect solution.
> Once this works well for the group, then, IMO, that would be the time to
> discuss moving it.

Having this only available via psql seems like the least desirable
option as then it wouldn't be available to any other callers..

Having it in libpq, on the other hand, would make it available to psql
as well as any other utility, library, or language / driver which uses
libpq, including pg_dump..

Using libpq would also make sense from the perspective that libpq can be
used to connect to a number of different major versions of PG, and this
could work for them all in much the way that pg_dump does.

The downside with this approach is that drivers which don't use libpq
(eg: JDBC) would have to re-implement this if they wanted to keep
feature parity with libpq..

Thanks,

Stephen




Re: XLog size reductions: smaller XLRec block header for PG17

2023-05-18 Thread Matthias van de Meent
On Thu, 18 May 2023 at 18:22, Heikki Linnakangas  wrote:
>
> On 18/05/2023 17:59, Matthias van de Meent wrote:
> Perhaps we should introduce a few generic inline functions to do varint
> encoding. That could be useful in many places, while this scheme is very
> tailored for XLogRecordBlockHeader.

I'm not sure about the reusability of such code, as not all varint
encodings are made equal:

Here, I chose to determine the size of the field with some bits stored
in leftover bits of another field, i.e. storing the field and its size
separately.
But in other cases, such as UTF8's code point encoding, each byte has
a carry bit indicating whether the value has more bytes to go.
In even more other cases, such as sqlite's integer encoding, the value
is stored in a single byte, unless that byte contains a sentinel value
that indicates the number of bytes that the value continues into.

What I'm trying to say is that there is no perfect encoding that is
better than all others, and I picked what I thought worked best in
this specific case. I think it is reasonable to expect that
varint-encoding of e.g. blockNo or RelFileNode into the WAL record
could want to choose a different method than the method I've chosen
for the block data length.
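
For comparison, here is a self-contained sketch of the "carry bit" style of
encoding mentioned above (unsigned LEB128, the scheme protobuf uses); it is
distinct from the leftover-bits scheme the patch uses, and is shown only to
illustrate why the encodings aren't interchangeable:

    #include <stdint.h>

    /* Encode value into buf; returns the number of bytes written (1..10). */
    static int
    varint_encode(uint64_t value, uint8_t *buf)
    {
        int len = 0;

        do
        {
            uint8_t b = value & 0x7F;   /* low 7 payload bits */

            value >>= 7;
            if (value != 0)
                b |= 0x80;              /* carry bit: more bytes follow */
            buf[len++] = b;
        } while (value != 0);

        return len;
    }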

> We could replace XLogRecordDataHeaderShort and XLogRecordDataHeaderLong
> with this too. With just one XLogRecordDataHeader, with a
> variable-length length field.

Yes, that could be used too. But that's not part of the patch right
now, and I've not yet planned on implementing that for this patch.

> > [0] https://wiki.postgresql.org/wiki/Updating_the_WAL_infrastructure
>
> Good ideas here. Eliminating the two padding bytes from XLogRecord in
> particular seems like a pure win.

It requires code churn and probably increases complexity, but apart
from that I think 'pure win' is accurate, yes.

> > PS. Benchmark results on my system (5950x with other light tasks
> > running) don't show an obviously negative effect in a 10-minute run
> > with these arbitrary pgbench settings on a fresh cluster with default
> > configuration:
> >
> > ./pg_install/bin/pgbench postgres -j 2 -c 6 -T 600 -M prepared
> > [...]
> > master: tps = 375
> > patched: tps = 381
>
> That was probably not CPU limited, so that any overhead in generating
> the WAL would not show up. Try PGOPTIONS="-csynchronous_commit=off" and
> pgbench -N option. And make sure the scale is large enough that there is
> no lock contention. Also would be good to measure the overhead in
> replaying the WAL.

With assertions disabled and the following configuration:

synchronous_commit = off
fsync = off
full_page_writes = off
checkpoint_timeout = 1d
autovacuum = off

I get:
master: tps = 3500.815859
patched: tps = 3535.188054

With autovacuum enabled it's worked similarly well, within 1% of these results.

> How much space saving does this yield?

No meaningful savings in the pgbench workload, mostly because xlog record
lengths currently don't MAXALIGN favorably in the pgbench workload. But
record sizes have dropped by 1 or 2 bytes in several
cases, as can be seen at the bottom of this mail.

Kind regards,

Matthias van de Meent
Neon, Inc.

The data: Record type, then record length averages (average aligned
length between parens) for both master and patched, and the average
per-record savings with this patch.

| record type   | master avg  | patched avg | delta | delta   |
|   | (aligned avg)   | (aligned avg)   |   | aligned |
|---|-|-|---|-|
| BT/DEDUP  | 64.00 (64.00)   | 63.00 (64.00)   |-1 |   0 |
| BT/INS_LEAF   | 81.41 (81.41)   | 80.41 (81.41)   |-1 |   0 |
| CLOG/0PG  | 30.00 (32.00)   | 30.00 (32.00)   | 0 |   0 |
| HEAP/DEL  | 54.00 (56.00)   | 52.00 (56.00)   |-2 |   0 |
| HEAP/HOT_UPD  | 72.02 (72.19)   | 71.02 (72.19)   | 0 |   0 |
| HEAP/INS  | 79.00 (80.00)   | 78.00 (80.00)   |-1 |   0 |
| HEAP/INS+INIT | 79.00 (80.00)   | 78.00 (80.00)   |-1 |   0 |
| HEAP/LOCK | 54.00 (56.00)   | 52.00 (56.00) * |-2 |   0 |
| HEAP2/MUL_INS | 85.00 (88.00)   | 84.00 (88.00) * |-1 |   0 |
| HEAP2/PRUNE   | 65.17 (68.19)   | 64.17 (68.19)   |-1 |   0 |
| STDBY/R_XACTS | 52.76 (56.00)   | 52.21 (56.00)   |  -0.5 |   0 |
| TX/COMMIT | 34.00 (40.00)   | 34.00 (40.00)   | 0 |   0 |
| XLOG/CHCKPT_O | 114.00 (120.00) | 114.00 (120.00) | 0 |   0 |




Re: PG 16 draft release notes ready

2023-05-18 Thread Matthias van de Meent
On Thu, 18 May 2023 at 22:49, Bruce Momjian  wrote:
>
> I have completed the first draft of the PG 16 release notes.  You can
> see the output here:
>
> https://momjian.us/pgsql_docs/release-16.html
>
> I will adjust it to the feedback I receive;  that URL will quickly show
> all updates.

I'm not sure if bugfixes like these are considered for release notes,
but I'm putting it up here just in case:

As of 8fcb32db (new, only in PG16) we now enforce limits on the size
of WAL records during construction, where previously we hoped that the
WAL records didn't exceed those limits.
This change is immediately user-visible through a change in behaviour
of `pg_logical_emit_message(true, repeat('_', 2^30 - 10), repeat('_',
2^30 - 10))`, and extensions that implement their own rmgrs could also
see a change in behavior from this change.

Kind regards,

Matthias van de Meent
Neon, Inc.




Re: Missing warning on revokes with grant options

2023-05-18 Thread Joseph Koshakow
On Thu, May 18, 2023 at 7:17 PM Joseph Koshakow  wrote:
>
>I looked into this function and that is correct. We fail to find a
>match for the revoked privilege here:
>
>/*
>* Search the ACL for an existing entry for this grantee and grantor. If
>* one exists, just modify the entry in-place (well, in the same
position,
>* since we actually return a copy); otherwise, insert the new entry at
>* the end.
>*/
>
>for (dst = 0; dst < num; ++dst)
>{
>if (aclitem_match(mod_aip, old_aip + dst))
>{
>/* found a match, so modify existing item */
>new_acl = allocacl(num);
>new_aip = ACL_DAT(new_acl);
>memcpy(new_acl, old_acl, ACL_SIZE(old_acl));
>break;
>}
>}
>
>Seeing that there was no match, we add a new empty privilege to the end
>of the existing ACL list here:
>
>if (dst == num)
>{
>/* need to append a new item */
>new_acl = allocacl(num + 1);
>new_aip = ACL_DAT(new_acl);
>memcpy(new_aip, old_aip, num * sizeof(AclItem));
>
>/* initialize the new entry with no permissions */
>new_aip[dst].ai_grantee = mod_aip->ai_grantee;
>new_aip[dst].ai_grantor = mod_aip->ai_grantor;
>ACLITEM_SET_PRIVS_GOPTIONS(new_aip[dst],
>  ACL_NO_RIGHTS, ACL_NO_RIGHTS);
>num++; /* set num to the size of new_acl */
>}
>
>We then try and revoke the specified privileges from the new empty
>privilege, leaving it empty (modechg will equal ACL_MODECHG_DEL here):
>
>old_rights = ACLITEM_GET_RIGHTS(new_aip[dst]);
>old_goptions = ACLITEM_GET_GOPTIONS(new_aip[dst]);
>
>/* apply the specified permissions change */
>switch (modechg)
>{
>case ACL_MODECHG_ADD:
>ACLITEM_SET_RIGHTS(new_aip[dst],
>  old_rights | ACLITEM_GET_RIGHTS(*mod_aip));
>break;
>case ACL_MODECHG_DEL:
>ACLITEM_SET_RIGHTS(new_aip[dst],
>  old_rights & ~ACLITEM_GET_RIGHTS(*mod_aip));
>break;
>case ACL_MODECHG_EQL:
>ACLITEM_SET_RIGHTS(new_aip[dst],
>  ACLITEM_GET_RIGHTS(*mod_aip));
>break;
>}
>
>Then since the new privilege remains empty, we remove it from the ACL
>list:
>
>new_rights = ACLITEM_GET_RIGHTS(new_aip[dst]);
>new_goptions = ACLITEM_GET_GOPTIONS(new_aip[dst]);
>
>/*
>* If the adjusted entry has no permissions, delete it from the list.
>*/
>if (new_rights == ACL_NO_RIGHTS)
>{
>memmove(new_aip + dst,
>new_aip + dst + 1,
>(num - dst - 1) * sizeof(AclItem));
>/* Adjust array size to be 'num - 1' items */
>ARR_DIMS(new_acl)[0] = num - 1;
>SET_VARSIZE(new_acl, ACL_N_SIZE(num - 1));
>}

Sorry about the unformatted code; here's the entire quoted section
again with proper formatting:

I looked into this function and that is correct. We fail to find a
match for the revoked privilege here:

/*
 * Search the ACL for an existing entry for this grantee and grantor. If
 * one exists, just modify the entry in-place (well, in the same
position,
 * since we actually return a copy); otherwise, insert the new entry at
 * the end.
 */

for (dst = 0; dst < num; ++dst)
{
if (aclitem_match(mod_aip, old_aip + dst))
{
/* found a match, so modify existing item */
new_acl = allocacl(num);
new_aip = ACL_DAT(new_acl);
memcpy(new_acl, old_acl, ACL_SIZE(old_acl));
break;
}
}

Seeing that there was no match, we add a new empty privilege to the end
of the existing ACL list here:

if (dst == num)
{
/* need to append a new item */
new_acl = allocacl(num + 1);
new_aip = ACL_DAT(new_acl);
memcpy(new_aip, old_aip, num * sizeof(AclItem));

/* initialize the new entry with no permissions */
new_aip[dst].ai_grantee = mod_aip->ai_grantee;
new_aip[dst].ai_grantor = mod_aip->ai_grantor;
ACLITEM_SET_PRIVS_GOPTIONS(new_aip[dst],
   ACL_NO_RIGHTS, ACL_NO_RIGHTS);
num++;  /* set num to the size of new_acl */
}

We then try and revoke the specified privileges from the new empty
privilege, leaving it empty (modechg will equal ACL_MODECHG_DEL here):

old_rights = ACLITEM_GET_RIGHTS(new_aip[dst]);
old_goptions = ACLITEM_GET_GOPTIONS(new_aip[dst]);

/* apply the specified permissions change */
switch (modechg)
{
case ACL_MODECHG_ADD:
ACLITEM_SET_RIGHTS(new_aip[dst],
   old_rights | ACLITEM_GET_RIGHTS(*mod_aip));
break;
case ACL_MODECHG_DEL:
ACLITEM_SET_RIGHTS(new_aip[dst],
   old_rights & ~ACLITEM_GET_RIGHTS(*mod_aip));
break;
case ACL_MODECHG_EQL:
ACLITEM_SET_RIGHTS(new_aip[dst],
   ACLITEM_GET_RIGHTS(*mod_aip));
break;
}

Then since the new privilege remains empty, we remove it from the ACL
list:


Re: Missing warning on revokes with grant options

2023-05-18 Thread Joseph Koshakow
On Wed, May 17, 2023 at 11:48 PM Nathan Bossart wrote:
>
>The thread for the aforementioned change [0] mentions the standard quite a
>bit, which might explain the current behavior.

I went through that thread and the quoted parts of the SQL standard. It
seems clear that if a user tries to REVOKE some privilege and they
don't have a grant option on that privilege, then a warning should be
issued. There was some additional discussion on when there should be
an error vs a warning, but I don't think it's that relevant to this
discussion. However, I was not able to find any discussion about the
restriction that a revoker can only revoke privileges that they granted
themselves.
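
As a sketch of the scenario (reconstructed for illustration, since the
thread's original example isn't quoted here; the names are hypothetical):

    CREATE ROLE r1;
    CREATE TABLE t (a int);
    GRANT SELECT ON t TO r1 WITH GRANT OPTION;  -- grantor: current superuser
    SET ROLE r1;
    REVOKE SELECT ON t FROM r1;   -- no grant with grantor r1 exists, so
                                  -- nothing is revoked, and today no warning
                                  -- is raised either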

The restriction was added to PostgreSQL at the same time as GRANT
OPTIONs were introduced. The commit [0] and mailing thread [1] don't
provide much details on this specific restriction.

The SQL99 standard for REVOKE is very dense and I may have
misunderstood parts, but here's my interpretation of how this
restriction might come from the standard and what it says about issuing
a warning (section 12.6).

Let's start with the Syntax Rules:

1) Let O be the object identified by the <object name> contained in
<privileges>

In my example O is the table t.

3) Let U be the current user identifier and R be the current role name.
4) Case:
  a) If GRANTED BY <grantor> is not specified, then
Case:
  i) If U is not the null value, then let A be U.
  ii) Otherwise, let A be R.

In my example A is the role r1.

9) Case:
  a) If the  is a , then
for every 
 specified, a set of privilege descriptors is identified. A
privilege descriptor P is said to be
 identified if it belongs to the set of privilege descriptors that
defined, for any 
 explicitly or implicitly in , that  on O, or
any of the objects in S, granted
 by A to 

In my example, <grantee> is the role r1, <privileges> is the list of
privileges that contains only SELECT, and <action> is SELECT. Therefore the
descriptor on table t where the privileges contain SELECT, the grantor
is r1, and the grantee is r1. Such a privilege does not exist, so the
identified privilege set is empty.

Now onto the General Rules:

1) Case:
  a) If the <revoke statement> is a <revoke privilege statement>, then
Case:
  i) If neither WITH HIERARCHY OPTION nor GRANT OPTION FOR is
specified, then:
2) The identified privilege descriptors are destroyed.

In my example, the identified set of privileges is empty, so no
privileges are destroyed (which I'm interpreting to mean the same thing
as revoked).

18) If the <revoke statement> is a <revoke privilege statement>, then:
  a) For every combination of <grantee> and <action> on O specified in
     <privileges>, if there is no corresponding privilege descriptor in the
     set of identified privilege descriptors, then a completion condition is
     raised: warning — privilege not revoked.

In my example the identified privileges set is empty, therefore it
cannot contain a corresponding privilege descriptor, therefore we
should be issuing a warning.

So I think our current behavior is not in spec. Would you agree with
this evaluation or do you think I've misunderstood something?

>> I wasn't able to locate where the check for
>>> A user can only revoke privileges that were granted directly by that
>>> user.
>> is in the code, but we should probably just add a warning there.
>
>I'm not certain, but I suspect the calls to aclupdate() in
>merge_acl_with_grant() take care of this because the grantors will never
>match.

I looked into this function and that is correct. We fail to find a
match for the revoked privilege here:

/*
* Search the ACL for an existing entry for this grantee and grantor. If
* one exists, just modify the entry in-place (well, in the same position,
* since we actually return a copy); otherwise, insert the new entry at
* the end.
*/

for (dst = 0; dst < num; ++dst)
{
if (aclitem_match(mod_aip, old_aip + dst))
{
/* found a match, so modify existing item */
new_acl = allocacl(num);
new_aip = ACL_DAT(new_acl);
memcpy(new_acl, old_acl, ACL_SIZE(old_acl));
break;
}
}

Seeing that there was no match, we add a new empty privilege to the end
of the existing ACL list here:

if (dst == num)
{
/* need to append a new item */
new_acl = allocacl(num + 1);
new_aip = ACL_DAT(new_acl);
memcpy(new_aip, old_aip, num * sizeof(AclItem));

/* initialize the new entry with no permissions */
new_aip[dst].ai_grantee = mod_aip->ai_grantee;
new_aip[dst].ai_grantor = mod_aip->ai_grantor;
ACLITEM_SET_PRIVS_GOPTIONS(new_aip[dst],
  ACL_NO_RIGHTS, ACL_NO_RIGHTS);
num++; /* set num to the size of new_acl */
}

We then try and revoke the specified privileges from the new empty
privilege, leaving it empty (modechg will equal ACL_MODECHG_DEL here):

old_rights = ACLITEM_GET_RIGHTS(new_aip[dst]);
old_goptions = ACLITEM_GET_GOPTIONS(new_aip[dst]);

/* apply the specified permissions change */
switch (modechg)
{
case ACL_MODECHG_ADD:

Re: PG 16 draft release notes ready

2023-05-18 Thread Peter Geoghegan
On Thu, May 18, 2023 at 3:52 PM Bruce Momjian  wrote:
> So, I looked at this and the problem is that this is best as a single
> release note entry because we are removing and adding, and if I moved it
> to compatibility, I am concerned the new feature will be missed.  Since
> WAL inspection is a utility operation, in general, I think having it in
> the pg_walinspect section makes the most sense.

I don't understand what you mean by that. The changes to
*_till_end_of_wal() (the way that those duplicative functions were
removed, and more permissive end_lsn behavior was added) are unrelated
to all of the other changes. Plus it's just not very important.

> Okay, I went with:
>
> Improve descriptions of pg_walinspect WAL record descriptions
> (Melanie Plageman, Peter Geoghegan)
>
> > Note also that the item "Add pg_waldump option --save-fullpage to dump
> > full page images (David Christensen)" is tangentially related to
> > pg_get_wal_block_info(), since you can also get FPIs using
> > pg_get_wal_block_info() (in fact, that was originally its main
> > purpose). I'm not saying that you necessarily need to connect them
> > together in any way, but you might consider it.
>
> Well, there is so much _new_ in that tool that listing everything new
> seems confusing.

There is pretty much one truly new piece of functionality added to
pg_walinspect (the function called pg_get_wal_block_info was added) --
since the enhancement to rmgr description output applies equally to
pg_waldump, no matter where you place it in the release notes. So not
sure what you mean.

> All changes committed.

Even after these changes, the release notes still refer to a function
called "pg_get_wal_block". There is no such function, though -- not in
Postgres 16, and not in any other major version.

As I said, there is a new function called "pg_get_wal_block_info". It
should simply be presented as a whole new function that offers novel
new functionality compared to what was available in Postgres 15 --
without any further elaboration. (It happens to be true that
pg_get_wal_block_info only reached its final form following multiple
rounds of work in multiple commits, but that is of no consequence to
users -- even the earliest form of the function appeared in a commit
in the Postgres 16 cycle.)

-- 
Peter Geoghegan




Re: PG 16 draft release notes ready

2023-05-18 Thread Bruce Momjian
On Thu, May 18, 2023 at 02:53:25PM -0700, Peter Geoghegan wrote:
> On Thu, May 18, 2023 at 1:49 PM Bruce Momjian  wrote:
> > I will adjust it to the feedback I receive;  that URL will quickly show
> > all updates.
> >
> > I learned a few things creating it this time:
> >
> > *  I can get confused over C function names and SQL function names in
> >commit messages.
> 
> The commit history covering pg_walinspect was complicated. Some of the
> newly added stuff was revised multiple times, by multiple authors due
> to changing ideas about the best UI. Here is some concrete feedback
> about that:
> 
> * Two functions that were in 15 that each end in *_till_end_of_wal()
> were removed for 16, since the same functionality is now provided
> through a more intuitive UI: we now tolerate invalid end_lsn values
> "from the future", per a new "Tip" in the pg_walinspect documentation
> for 16.
> 
> In my opinion this should (at most) be covered as a compatibility
> item. It's not really new functionality.

So, I looked at this and the problem is that this is best as a single
release note entry because we are removing and adding, and if I moved it
to compatibility, I am concerned the new feature will be missed.  Since
WAL inspection is a utility operation, in general, I think having it in
the pg_walinspect section makes the most sense.
 
> * There is one truly new pg_walinspect function added to 16:
> pg_get_wal_block_info(). Its main purpose is to see how each
> individual block changed -- it's far easier to track how blocks
> changed over time using the new function. The original "record
> orientated" functions made that very difficult (regex hacks were
> required).
> 
> pg_get_wal_block_info first appeared under another name, and had
> somewhat narrower functionality than the final version, all of which
> shouldn't matter to users -- since they never saw a stable release
> with any of that. There is no point in telling users about the commits
> that changed the name/functionality of pg_get_wal_block_info that came
> only a couple of months after the earliest version was committed -- to
> users, it is simply a new function with new functionality.

Right.

> I also suggest merging the pg_waldump items with the section you've
> added covering pg_walinspect. A "pg_walinspect and pg_waldump" section
> seems natural to me. Some of the enhancements in this area benefit
> pg_walinspect and pg_waldump in exactly the same way, since
> pg_walinspect is essentially a backend/SQL interface equivalent of
> pg_waldump's frontend/CLI interface. Melanie's work on improving the
> descriptions output for WAL records like Heap's PRUNE and VACUUM
> records is a great example of this -- it does exactly the same thing
> for pg_walinspect and pg_waldump, without directly targeting either
> (it also affects the wal_debug developer option).

Well, pg_waldump is an installed binary while pg_walinspect is an
extension, so I am not sure where I would put a merged section.

> It might also make sense to say that the enhanced WAL record
> descriptions from Melanie generally apply to records used by VACUUM
> only.

Okay, I went with:

Improve pg_walinspect WAL record descriptions
(Melanie Plageman, Peter Geoghegan)

> Note also that the item "Add pg_waldump option --save-fullpage to dump
> full page images (David Christensen)" is tangentially related to
> pg_get_wal_block_info(), since you can also get FPIs using
> pg_get_wal_block_info() (in fact, that was originally its main
> purpose). I'm not saying that you necessarily need to connect them
> together in any way, but you might consider it.

Well, there is so much _new_ in that tool that listing everything new
seems confusing.

FYI, I have just added an item about more aggressive freezing:

During non-freeze operations, perform page freezing where appropriate
(Peter Geoghegan)

This makes full-table freeze vacuums less necessary.



All changes committed.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: Order changes in PG16 since ICU introduction

2023-05-18 Thread Jeff Davis
On Thu, 2023-05-18 at 20:11 +0200, Matthias van de Meent wrote:
> As I complain about in [0], since 5cd1a5af --no-locale has been
> broken
> / behaving outside its description: Instead of being equivalent to
> `--locale=C` it now also overrides `--locale-provider=libc`,
> resulting
> in undocumented behaviour.

I agree that 5cd1a5af is incomplete.

Posting updated patches. Feedback on the approaches below would be
appreciated.

For context, in version 15:

  $ initdb -D data --locale-provider=icu --icu-locale=en
  => create database clocale template template0 locale='C';
  => select datname, datlocprovider, daticulocale
 from pg_database where datname='clocale';
   datname | datlocprovider | daticulocale 
  -++--
   clocale | i  | en
  (1 row)

That behavior is confusing, and when I made ICU the default provider in
v16, the confusion was extended into more cases.

If we leave the CREATE DATABASE (and createdb and initdb) syntax in
place, such that LOCALE (and --locale) do not apply to ICU at all, then
I don't see a path to a good ICU user experience.

Therefore I conclude that we need LOCALE (and --locale) to apply to ICU
somehow. (The LOCALE option already applies to ICU during CREATE
COLLATION, just not CREATE DATABASE or initdb.)

Patch 0003 does this. It's fairly straightforward and I believe we need
this patch.

But to actually fix your complaint we also need --no-locale to be
equivalent to --locale=C and for those options to both use memcmp()
semantics. There are several approaches to accomplish this, and I think
this is the part where I most need some feedback. There are only so
many approaches, and each one has some potential downsides, but I
believe we need to select one:


(1) Give up and leave the existing CREATE DATABASE (and createdb, and
initdb) semantics in place, along with the confusing behavior in v15.

This is a last resort, in my opinion. It gives us no path toward a good
user experience with ICU, and leaves us with all of the problems of the
OS as a collation provider.

(2) Automatically change the provider to libc when locale=C.

Almost works, but it's not clear how we handle the case "provider=icu
lc_collate='fr_FR.utf8' locale=C".

If we change it to "provider=libc lc_collate=C", we've overridden the
specified lc_collate. If we ignore the locale=C, that would be
surprising to users. If we throw an error, that would be a backwards
compatibility issue.
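Spelled out in initdb terms, the problem case is something like this (a
sketch; under option 2 it is unclear whether the provider should flip to
libc, overriding the explicit --lc-collate, or whether locale=C should
simply be ignored):

  $ initdb -D data --locale-provider=icu --lc-collate=fr_FR.utf8 --locale=C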

One possible solution would be to change the catalog representation to
allow setting the default collation locale separately from datcollate
even for the libc provider. For instance, rename daticulocale to
datdeflocale, and store the default collation locale there for both
libc and ICU. Then, "provider=icu lc_collate='fr_FR.utf8' locale=C"
could be changed into "provider=libc lc_collate='fr_FR.utf8'
deflocale=C". It may be confusing that datcollate is a different
concept from datdeflocale; but then again they are different concepts
and it's confusing that they are currently combined into one.

(3) Support iculocale=C in the ICU provider using the memcmp() path.

In other words, if provider=icu and iculocale=C, lc_collate_is_c() and
lc_ctype_is_c() would both return true.

There's a potential problem for users who've misused ICU in the past
(15 or earlier) by using provider=icu and iculocale=C. ICU would accept
such a locale name, but not recognize it and fall back to the root
locale, so it never worked as the user intended it. But if we redefine
C to be memcmp(), then such users will have broken indexes if they
upgrade.

We could add a check at pg_upgrade time for iculocale=C in versions 15
and earlier, and cause the check (and therefore the upgrade) to fail.
That may be reasonable considering that it never really worked in the
past, and perhaps very few users actually ever created such a
collation. But if some user runs into that problem, we'd have to resort
to a hack like telling them to "update pg_collation set iculocale='und'
where iculocale='C'" and then try the upgrade again, which is not a
great answer (as far as I can tell it would be a correct answer and
should not break their indexes, but it feels pretty dangerous).

There may be some other resolutions to this problem, such as catalog
hacks that allow for different representations of iculocale=C pre-16
and post-16. That doesn't sound great though, and we'd have to figure
out what to do with pg_dump.

(4) Create a new "none" provider (which has no locale and always memcmp
semantics), and automatically change the provider to "none" if
provider=icu and iculocale=C.

This solves the problem case in #2 and the potential upgrade problem in
#3. It also makes the documentation a bit more natural, in my opinion,
even if we retain the special case for provider=libc collate=C.


#4 is the approach I chose (patches 0001 and 0002), but I'd like to
hear what others think.


For historical reasons, users may assume that LC_COLLATE 

Re: Introduce WAIT_EVENT_EXTENSION and WAIT_EVENT_BUFFER_PIN

2023-05-18 Thread Michael Paquier
On Thu, May 18, 2023 at 12:28:20PM -0400, Robert Haas wrote:
> I mean, I agree that it would probably be hard to measure any real
> performance difference. But I'm not sure that's a good reason to add
> cycles to a path where we don't really need to.

Honestly, I am not sure that it's worth worrying about performance
here, or perhaps you know of some external stuff that could set the
extension class type in a code path hot enough that it would matter.
Anyway, why couldn't we make all these functions static inline
instead, then?
--
Michael




Re: Memory leak from ExecutorState context?

2023-05-18 Thread Melanie Plageman
On Wed, May 17, 2023 at 6:35 PM Jehan-Guillaume de Rorthais
 wrote:
>
> On Wed, 17 May 2023 13:46:35 -0400
> Melanie Plageman  wrote:
>
> > On Wed, May 17, 2023 at 07:10:08PM +0200, Jehan-Guillaume de Rorthais wrote:
> > > On Tue, 16 May 2023 16:00:52 -0400
> > > Melanie Plageman  wrote:
> > > > ...
> > > > There are some existing indentation issues in these files, but you can
> > > > leave those or put them in a separate commit.
> > >
> > > Reindented in v9.
> > >
> > > I put existing indentation issues in a separate commit to keep the actual
> > > patches clean from distractions.
> >
> > It is a matter of opinion, but I tend to prefer the commit with the fix
> > for the existing indentation issues to be first in the patch set.
>
> OK. moved in v10 patch set.

v10 LGTM.




Re: The documentation for READ COMMITTED may be incomplete or wrong

2023-05-18 Thread Tom Lane
Nathan Bossart  writes:
> On Thu, May 18, 2023 at 04:03:36PM -0400, Tom Lane wrote:
>> What we need to do, I think, is set epqstate->relsubs_done[] for
>> all target relations except the one we are stuffing a tuple into.

> This seems generally reasonable to me.

Here's a draft patch for this.  I think it's OK for HEAD, but we are
going to need some different details for the back branches, because
adding a field to struct EPQState would create an ABI break.  Our
standard trick of shoving the field to the end in the back branches
won't help, because struct EPQState is actually embedded in
struct ModifyTableState, meaning that all the subsequent fields
therein will move.  Maybe we can get away with that, but I bet not.

I think what the back branches will have to do is reinitialize the
relsubs_done array every time through EvalPlanQual(), which is a bit sad
but probably doesn't amount to anything compared to the startup overhead
of the sub-executor.
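A minimal sketch of that back-branch variant (not a committed patch;
field names as in the diff below):

    /* at the top of EvalPlanQual(): forget state from any previous cycle */
    Index       rtsize = epqstate->parentestate->es_range_table_size;

    memset(epqstate->relsubs_done, 0, rtsize * sizeof(bool));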

Debian Code Search doesn't know of any outside code touching
relsubs_done, so I think we are safe in dropping that code in
ExecScanReScan.  It seems quite pointless anyway considering
that up to now, EvalPlanQualBegin has always zeroed the whole
array.

regards, tom lane

diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 0186be452c..c518daa7d0 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -2469,7 +2469,7 @@ ExecBuildAuxRowMark(ExecRowMark *erm, List *targetlist)
  *	relation - table containing tuple
  *	rti - rangetable index of table containing tuple
  *	inputslot - tuple for processing - this can be the slot from
- *		EvalPlanQualSlot(), for the increased efficiency.
+ *		EvalPlanQualSlot() for this rel, for increased efficiency.
  *
  * This tests whether the tuple in inputslot still matches the relevant
  * quals. For that result to be useful, typically the input tuple has to be
@@ -2503,6 +2503,13 @@ EvalPlanQual(EPQState *epqstate, Relation relation,
 	if (testslot != inputslot)
 		ExecCopySlot(testslot, inputslot);
 
+	/*
+	 * Mark that an EPQ tuple is available for this relation.  (If there is
+	 * more than one result relation, the others remain marked as having no
+	 * tuple available.)
+	 */
+	epqstate->relsubs_done[rti - 1] = false;
+
 	/*
 	 * Run the EPQ query.  We assume it will return at most one tuple.
 	 */
@@ -2519,11 +2526,12 @@ EvalPlanQual(EPQState *epqstate, Relation relation,
 		ExecMaterializeSlot(slot);
 
 	/*
-	 * Clear out the test tuple.  This is needed in case the EPQ query is
-	 * re-used to test a tuple for a different relation.  (Not clear that can
-	 * really happen, but let's be safe.)
+	 * Clear out the test tuple, and mark that no tuple is available here.
+	 * This is needed in case the EPQ state is re-used to test a tuple for a
+	 * different target relation.
 	 */
 	ExecClearTuple(testslot);
+	epqstate->relsubs_done[rti - 1] = true;
 
 	return slot;
 }
@@ -2532,18 +2540,26 @@ EvalPlanQual(EPQState *epqstate, Relation relation,
  * EvalPlanQualInit -- initialize during creation of a plan state node
  * that might need to invoke EPQ processing.
  *
+ * If the caller intends to use EvalPlanQual(), resultRelations should be
+ * a list of RT indexes of potential target relations for EvalPlanQual(),
+ * and we will arrange that the other listed relations don't return any
+ * tuple during an EvalPlanQual() call.  Otherwise resultRelations
+ * should be NIL.
+ *
  * Note: subplan/auxrowmarks can be NULL/NIL if they will be set later
  * with EvalPlanQualSetPlan.
  */
 void
 EvalPlanQualInit(EPQState *epqstate, EState *parentestate,
- Plan *subplan, List *auxrowmarks, int epqParam)
+ Plan *subplan, List *auxrowmarks,
+ int epqParam, List *resultRelations)
 {
 	Index		rtsize = parentestate->es_range_table_size;
 
 	/* initialize data not changing over EPQState's lifetime */
 	epqstate->parentestate = parentestate;
 	epqstate->epqParam = epqParam;
+	epqstate->resultRelations = resultRelations;
 
 	/*
 	 * Allocate space to reference a slot for each potential rti - do so now
@@ -2760,10 +2776,20 @@ EvalPlanQualBegin(EPQState *epqstate)
 		/*
 		 * We already have a suitable child EPQ tree, so just reset it.
 		 */
-		Index		rtsize = parentestate->es_range_table_size;
 		PlanState  *rcplanstate = epqstate->recheckplanstate;
 
-		MemSet(epqstate->relsubs_done, 0, rtsize * sizeof(bool));
+		/*
+		 * We reset the relsubs_done[] flags only when the caller of
+		 * EvalPlanQualInit provided no result relations.  Otherwise, we keep
+		 * the current state (which should have all result relations marked as
+		 * "done", and others not).
+		 */
+		if (epqstate->resultRelations == NIL)
+		{
+			Index		rtsize = parentestate->es_range_table_size;
+
+			memset(epqstate->relsubs_done, 0, rtsize * sizeof(bool));
+		}
 
 		/* Recopy current values of parent parameters */
 		if (parentestate->es_plannedstmt->paramExecTypes != NIL)
@@ -2931,11 

Re: PG 16 draft release notes ready

2023-05-18 Thread Peter Geoghegan
On Thu, May 18, 2023 at 1:49 PM Bruce Momjian  wrote:
> I will adjust it to the feedback I receive;  that URL will quickly show
> all updates.
>
> I learned a few things creating it this time:
>
> *  I can get confused over C function names and SQL function names in
>commit messages.

The commit history covering pg_walinspect was complicated. Some of the
newly added stuff was revised multiple times, by multiple authors due
to changing ideas about the best UI. Here is some concrete feedback
about that:

* Two functions that were in 15 that each end in *_till_end_of_wal()
were removed for 16, since the same functionality is now provided
through a more intuitive UI: we now tolerate invalid end_lsn values
"from the future", per a new "Tip" in the pg_walinspect documentation
for 16.

In my opinion this should (at most) be covered as a compatibility
item. It's not really new functionality.

* There is one truly new pg_walinspect function added to 16:
pg_get_wal_block_info(). Its main purpose is to see how each
individual block changed -- it's far easier to track how blocks
changed over time using the new function. The original "record
orientated" functions made that very difficult (regex hacks were
required).

pg_get_wal_block_info first appeared under another name, and had
somewhat narrower functionality to the final version, all of which
shouldn't matter to users -- since they never saw a stable release
with any of that. There is no point in telling users about the commits
that changed the name/functionality of pg_get_wal_block_info that came
only a couple of months after the earliest version was committed -- to
users, it is simply a new function with new functionality.

I also suggest merging the pg_waldump items with the section you've
added covering pg_walinspect. A "pg_walinspect and pg_waldump" section
seems natural to me. Some of the enhancements in this area benefit
pg_walinspect and pg_waldump in exactly the same way, since
pg_walinspect is essentially a backend/SQL interface equivalent of
pg_waldump's frontend/CLI interface. Melanie's work on improving the
descriptions output for WAL records like Heap's PRUNE and VACUUM
records is a great example of this -- it does exactly the same thing
for pg_walinspect and pg_waldump, without directly targeting either
(it also affects the wal_debug developer option).

It might also make sense to say that the enhanced WAL record
descriptions from Melanie generally apply to records used by VACUUM
only.

Note also that the item "Add pg_waldump option --save-fullpage to dump
full page images (David Christensen)" is tangentially related to
pg_get_wal_block_info(), since you can also get FPIs using
pg_get_wal_block_info() (in fact, that was originally its main
purpose). I'm not saying that you necessarily need to connect them
together in any way, but you might consider it.

-- 
Peter Geoghegan




Re: PG 16 draft release notes ready

2023-05-18 Thread Bruce Momjian
On Thu, May 18, 2023 at 05:39:08PM -0400, Jonathan Katz wrote:
> On 5/18/23 4:49 PM, Bruce Momjian wrote:
> > I have completed the first draft of the PG 16 release notes.  You can
> > see the output here:
> > 
> > https://momjian.us/pgsql_docs/release-16.html
> > 
> > I will adjust it to the feedback I receive;  that URL will quickly show
> > all updates.
> 
> Still reading, but saw this:
> 
>   Allow incremental sorts in more cases, including DISTINCT (David
> Rowley)window
> 
> I didn't realize we had a DISTINCT (David Rowley) window, but it sounds like
> an awesome feature ;)

I have fixed this and several others.  The URL will show the new text.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: Easy dump of partitioned and inherited data

2023-05-18 Thread Przemysław Sztoch

Przemysław Sztoch wrote on 17.05.2023 17:46:
> 2. Another common usability problem is a quick dump of the selected
> parent table. It is important that it include all inherited tables or
> subpartitions. Now you can't just specify -t root-table-name, because
> you'll usually get an empty data dump.
>
> The list of necessary tables is sometimes very long and, when using
> timescaledb, the partition tables do not have human-readable names.
Just saw that the problem was solved in "[Proposal] Allow pg_dump to 
include all child tables with the root table"
and "pg_dump - read data for some options from external file 
".

--
Przemysław Sztoch | Mobile +48 509 99 00 66


Re: PG 16 draft release notes ready

2023-05-18 Thread Jonathan S. Katz

On 5/18/23 4:49 PM, Bruce Momjian wrote:

I have completed the first draft of the PG 16 release notes.  You can
see the output here:

https://momjian.us/pgsql_docs/release-16.html

I will adjust it to the feedback I receive;  that URL will quickly show
all updates.


Still reading, but saw this:

  Allow incremental sorts in more cases, including DISTINCT (David 
Rowley)window


I didn't realize we had a DISTINCT (David Rowley) window, but it sounds 
like an awesome feature ;)


Thanks,

Jonathan




Re: The documentation for READ COMMITTED may be incomplete or wrong

2023-05-18 Thread Nathan Bossart
On Thu, May 18, 2023 at 04:03:36PM -0400, Tom Lane wrote:
> Yeah.  I see the problem: when starting up an EPQ recheck, we stuff
> the tuple-to-test into the epqstate->relsubs_slot[] entry for the
> relation it came from, but we do nothing to the EPQ state for the
> other target relations, which allows the EPQ plan to fetch rows
> from those relations as usual.  If it finds a (non-updated) row
> passing the qual, kaboom!  We decide the EPQ check passed.

Ah, so the EPQ check only fails for the last tuple because we won't fetch
rows from the other relations.  I think that explains the behavior I'm
seeing.

> What we need to do, I think, is set epqstate->relsubs_done[] for
> all target relations except the one we are stuffing a tuple into.

This seems generally reasonable to me.

> While nodeModifyTable can certainly be made to do that, things are
> complicated by the fact that currently ExecScanReScan thinks it ought
> to clear all the relsubs_done flags, which would break things again.
> I wonder if we can simply delete that code.  Dropping the
> FDW/Custom-specific code there is a bit scary, but on the whole that
> looks like code that got cargo-culted in rather than anything we
> actually need.

I see that part was added in 385f337 [0].  I haven't had a chance to
evaluate whether it seems necessary.

[0] 
https://postgr.es/m/9A28C8860F777E439AA12E8AEA7694F80117370C%40BPXM15GP.gisp.nec.co.jp

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: PG 16 draft release notes ready

2023-05-18 Thread Jonathan S. Katz

On 5/18/23 4:49 PM, Bruce Momjian wrote:

I have completed the first draft of the PG 16 release notes.  You can
see the output here:

https://momjian.us/pgsql_docs/release-16.html

I will adjust it to the feedback I receive;  that URL will quickly show
all updates.


Thanks for going through this. The release announcement draft will 
follow shortly after in a different thread.



I learned a few things creating it this time:

*  I can get confused over C function names and SQL function names in
commit messages.

*  The sections and ordering of the entries can greatly clarify the
items.

*  The feature count is slightly higher than recent releases:

release-10:  189
release-11:  170
release-12:  180
release-13:  178
release-14:  220
release-15:  184
-->  release-16:  200


This definitely feels like a very full release. Personally I'm very excited.

Thanks,

Jonathan





Re: [PATCH] Support % wildcard in extension upgrade filenames

2023-05-18 Thread Przemysław Sztoch
For me, it would be a big help if you could specify a simple from/to 
range as the source version:

myext--1.0.0-1.9.9--2.0.0.sql
myext--2.0.0-2.9.9--3.0.0.sql
myext--3.0.0-3.2.3--3.2.4.sql

In my opinion, the % wildcard idea is fully covered by a from/to
range.


The name "ANY" should also be allowed as the source version.
Some extensions are just a collection of CREATE OR REPLACE FUNCTION
statements, so all upgrades share one and the same SQL file.


For example: address_standardizer_data_us--ANY--3.2.4.sql
--
Przemysław Sztoch | Mobile +48 509 99 00 66


Re: Add operator for dividing interval by an interval

2023-05-18 Thread Tom Lane
Andres Freund  writes:
> What about an interval / interval -> double operator that errors out whenever
> month is non-zero? As far as I can tell that would always be deterministic.

We have months, days, and microseconds, and microseconds-per-day isn't
much more stable than days-per-month (because DST).  By the time you
restrict this to give deterministic results, I think it won't be
particularly useful.

You could arbitrarily define months as 30 days and days as 24 hours,
which is what some other interval functions do, and then the double
result would be well-defined; but how useful is it really?

regards, tom lane




PG 16 draft release notes ready

2023-05-18 Thread Bruce Momjian
I have completed the first draft of the PG 16 release notes.  You can
see the output here:

https://momjian.us/pgsql_docs/release-16.html

I will adjust it to the feedback I receive;  that URL will quickly show
all updates.

I learned a few things creating it this time:

*  I can get confused over C function names and SQL function names in
   commit messages.

*  The sections and ordering of the entries can greatly clarify the
   items.

*  The feature count is slightly higher than recent releases:

release-10:  189
release-11:  170
release-12:  180
release-13:  178
release-14:  220
release-15:  184
--> release-16:  200

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Add operator for dividing interval by an interval

2023-05-18 Thread Andres Freund
Hi,

I've open-coded $subject many times. I wonder if we should add at least a
restricted version of it.

I did find one past discussion of it on the list:
https://www.postgresql.org/message-id/24948.1259797531%40sss.pgh.pa.us

We have workarounds for it on the wiki:
https://wiki.postgresql.org/wiki/Working_with_Dates_and_Times_in_PostgreSQL#4._Multiplication_and_division_of_INTERVALS_is_under_development_and_discussion_at_this_time

There are plenty of search results with various, often quite wrong,
workarounds.


Of course, it's true that there are plenty of intervals where division would
not have a clearly determinable result, e.g. '1 month'::interval / '1
day'::interval.

I think there's no clear result whenever the month component is non-zero,
although possibly there are some cases of using months that could be made to work
(e.g. '12 months' / '1 month').


In the cases I have wanted interval division, I typically dealt with intervals
without the month component - typically the intervals are the result of
subtracting timestamps or such.

One typical use case for me is to divide the total runtime of a benchmark by
the time taken for some portion of that (e.g. time spent waiting for IO).
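For completeness, the workaround I reach for today (a sketch; note that
extract(epoch ...) bakes in 30-day months and 24-hour days):

    -- ratio of two month-free intervals via their epoch lengths
    SELECT extract(epoch FROM interval '10 minutes')
         / extract(epoch FROM interval '90 seconds') AS ratio;

    -- benchmark-style use (bench_results and its columns are made up)
    SELECT extract(epoch FROM io_wait_time)
         / extract(epoch FROM total_runtime) AS io_fraction
      FROM bench_results;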


What about an interval / interval -> double operator that errors out whenever
month is non-zero? As far as I can tell that would always be deterministic.

Greetings,

Andres Freund




Re: The documentation for READ COMMITTED may be incomplete or wrong

2023-05-18 Thread Tom Lane
Nathan Bossart  writes:
> On Thu, May 18, 2023 at 11:22:54AM -0400, Tom Lane wrote:
>> Ugh.  Bisecting says it broke at
>> commit 86dc90056dfdbd9d1b891718d2e5614e3e432f35
>> which was absolutely not supposed to be breaking any concurrent-execution
>> guarantees.  I wonder what we got wrong.

> With the reproduction steps listed upthread, I see that XMAX for both
> tuples is set to the deleting transaction, but the one in inh_child_2 has
> two additional infomask flags: HEAP_XMAX_EXCL_LOCK and HEAP_XMAX_LOCK_ONLY.
> If I add a third table (i.e., inh_child_3), XMAX for all three tuples is
> set to the deleting transaction, and only the one in inh_child_3 has the
> lock bits set.  Also, in the three-table case, the DELETE statement reports
> "DELETE 2".

Yeah.  I see the problem: when starting up an EPQ recheck, we stuff
the tuple-to-test into the epqstate->relsubs_slot[] entry for the
relation it came from, but we do nothing to the EPQ state for the
other target relations, which allows the EPQ plan to fetch rows
from those relations as usual.  If it finds a (non-updated) row
passing the qual, kaboom!  We decide the EPQ check passed.

What we need to do, I think, is set epqstate->relsubs_done[] for
all target relations except the one we are stuffing a tuple into.

While nodeModifyTable can certainly be made to do that, things are
complicated by the fact that currently ExecScanReScan thinks it ought
to clear all the relsubs_done flags, which would break things again.
I wonder if we can simply delete that code.  Dropping the
FDW/Custom-specific code there is a bit scary, but on the whole that
looks like code that got cargo-culted in rather than anything we
actually need.

The reason this wasn't a bug before 86dc90056 is that any given
plan tree could have only one target relation, so there was not
anything else to suppress.
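To make that concrete, a sketch of the failing scenario (hypothetical
table names, not the exact script from upthread):

    CREATE TABLE inh_parent (val int);
    CREATE TABLE inh_child_1 () INHERITS (inh_parent);
    CREATE TABLE inh_child_2 () INHERITS (inh_parent);
    INSERT INTO inh_child_1 VALUES (1);
    INSERT INTO inh_child_2 VALUES (1);

    -- session 1
    BEGIN;
    UPDATE inh_parent SET val = 2;          -- hold the transaction open

    -- session 2, READ COMMITTED: blocks behind session 1
    DELETE FROM inh_parent WHERE val = 1;

    -- session 1
    COMMIT;
    -- session 2's EPQ recheck should now find no qualifying rows and
    -- report DELETE 0; with the bug, old rows fetched from the *other*
    -- child table let the recheck pass, so some rows are deleted anyway.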

regards, tom lane




Re: The documentation for READ COMMITTED may be incomplete or wrong

2023-05-18 Thread Nathan Bossart
On Thu, May 18, 2023 at 11:22:54AM -0400, Tom Lane wrote:
> Ugh.  Bisecting says it broke at
> 
> 86dc90056dfdbd9d1b891718d2e5614e3e432f35 is the first bad commit
> commit 86dc90056dfdbd9d1b891718d2e5614e3e432f35
> Author: Tom Lane 
> Date:   Wed Mar 31 11:52:34 2021 -0400
> 
> Rework planning and execution of UPDATE and DELETE.
> 
> which was absolutely not supposed to be breaking any concurrent-execution
> guarantees.  I wonder what we got wrong.

With the reproduction steps listed upthread, I see that XMAX for both
tuples is set to the deleting transaction, but the one in inh_child_2 has
two additional infomask flags: HEAP_XMAX_EXCL_LOCK and HEAP_XMAX_LOCK_ONLY.
If I add a third table (i.e., inh_child_3), XMAX for all three tuples is
set to the deleting transaction, and only the one in inh_child_3 has the
lock bits set.  Also, in the three-table case, the DELETE statement reports
"DELETE 2".

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Missing update of all_hasnulls in BRIN opclasses

2023-05-18 Thread Tomas Vondra
On 5/15/23 12:06, Alvaro Herrera wrote:
> On 2023-May-07, Tomas Vondra wrote:
> 
>>> Álvaro wrote:
 In backbranches, the new field to BrinMemTuple needs to be at the end of
 the struct, to avoid ABI breakage.
>>
>> Unfortunately, this is not actually possible :-(
>>
>> The BrinMemTuple has a FLEXIBLE_ARRAY_MEMBER at the end, so we can't
>> place anything after it. I think we have three options:
>>
>> a) some other approach? - I really can't see any, except maybe for going
>> back to the previous approach (i.e. encoding the info using the existing
>> BrinValues allnulls/hasnulls flags)
> 
> Actually, mine was quite the stupid suggestion: the BrinMemTuple already
> has a 3 byte hole in the place where you originally wanted to add the
> flag:
> 
> struct BrinMemTuple {
> _Bool  bt_placeholder;   /* 0 1 */
> 
> /* XXX 3 bytes hole, try to pack */
> 
> BlockNumberbt_blkno; /* 4 4 */
> MemoryContext  bt_context;   /* 8 8 */
> Datum *bt_values;/*16 8 */
> _Bool *bt_allnulls;  /*24 8 */
> _Bool *bt_hasnulls;  /*32 8 */
> BrinValues bt_columns[]; /*40 0 */
> 
> /* size: 40, cachelines: 1, members: 7 */
> /* sum members: 37, holes: 1, sum holes: 3 */
> /* last cacheline: 40 bytes */
> };
> 
> so putting it there was already not causing any ABI breakage.  So, the
> solution to this problem of not being able to put it at the end is just
> to return the struct to your original formulation.
> 

Thanks, that's pretty lucky. It means we're breaking neither the
on-disk format nor the ABI, which is great.

Attached is a final version of the patches - I intend to do a bit more
testing, go through the comments once more, and get this committed today
or perhaps tomorrow morning, so that it gets into beta1.

Unfortunately, while polishing the patches, I realized union_tuples()
has yet another long-standing bug with handling NULL values, because it
does this:

/* Adjust "hasnulls". */
if (!col_a->bv_hasnulls && col_b->bv_hasnulls)
col_a->bv_hasnulls = true;

but let's assume "col_a" is a summary representing "1" and "col_b"
represents NULL (col_b->bv_hasnulls=false col_b->bv_allnulls=true).
Well, in that case we fail to "remember" col_a should represent NULL
values too :-(

This is a somewhat separate issue, because it's unrelated to empty ranges
(neither of the two ranges is empty). It's hard to test it, because it
requires a particular timing of the concurrent actions, but a breakpoint
in brin.c on the brin_can_do_samepage_update call (in summarize_range)
does the trick for manual testing.
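The needed adjustment amounts to this (a sketch; patch 0001 attached
below does the equivalent):

    /* Does the "b" summary represent any NULL values? */
    bool    b_has_nulls = (col_b->bv_hasnulls || col_b->bv_allnulls);

    /* Adjust "hasnulls" -- the old test ignored col_b->bv_allnulls. */
    if (!col_a->bv_hasnulls && b_has_nulls)
        col_a->bv_hasnulls = true;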

0001 fixes the issue. 0002 is the original fix, and 0003 is just the
pageinspect changes (for master only).

For the backbranches, I thought about making the code more like master
(by moving some of the handling from opclasses to brin.c), but decided
not to. It'd be low-risk, but it feels wrong to kinda do what the master
does under "oi_regular_nulls" flag.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From 698609154324f64831ac44e4440a2283691184e2 Mon Sep 17 00:00:00 2001
From: Tomas Vondra 
Date: Thu, 18 May 2023 13:00:31 +0200
Subject: [PATCH 1/3] Fix handling of NULLs when merging BRIN summaries

When merging BRIN summaries, union_tuples() did not correctly update the
target hasnulls/allnulls flags. When merging all-NULL summary into a
summary without any NULL values, the result had both flags set to false
(instead of having hasnulls=true).

This happened because the code only considered the hasnulls flags,
ignoring the possibility the source summary has allnulls=true.

Discovered while investigating issues with handling empty BRIN ranges
and handling of NULL values, but it's a separate problem (has nothing to
do with empty ranges).

Fixed by considering both flags on the source summary, and updating the
hasnulls flag on the target summary.

Backpatch to 11. The bug exists since 9.5 (where BRIN indexes were
introduced), but those releases are EOL already.

Discussion: https://postgr.es/m/402430e4-7d9d-6cf1-09ef-464d80afff3b%40enterprisedb.com
---
 src/backend/access/brin/brin.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 41bf950a4af..a155525b7df 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -1613,10 +1613,13 @@ union_tuples(BrinDesc *bdesc, BrinMemTuple *a, BrinTuple *b)
 		BrinValues *col_b = &db->bt_columns[keyno];
 		BrinOpcInfo *opcinfo = bdesc->bd_info[keyno];
 
 		if (opcinfo->oi_regular_nulls)
 		{
+			/* Does the "b" summary represent any NULL values? */
+			bool		b_has_nulls = (col_b->bv_hasnulls || col_b->bv_allnulls);
+
 			/* Adjust "hasnulls". 

Re: Possible regression setting GUCs on \connect

2023-05-18 Thread Tom Lane
Robert Haas  writes:
> This discussion made me go back and look at the commit in question. My
> opinion is that the feature as it was committed is quite hard to
> understand. The documentation for it said this: "Specifies that
> variable should be set on behalf of ordinary role." But what does that
> even mean? What's an "ordinary role"? What does "on behalf of" mean?

Yeah.  And even more to the point: how would the feature interact with
per-user grants of SET privilege?  It seems like it would have to ignore
or override that, which is not a conclusion I like at all.

I think that commit a0ffa885e pretty much nailed down the user interface
we want, and what remains is to work out how granting SET privilege
interacts with the time-delayed nature of ALTER USER/DATABASE SET.
But the answer to that does not seem difficult to me: remember who
issued the ALTER and see if they still have SET privilege at the time
we activate a particular entry.

regards, tom lane




Re: Order changes in PG16 since ICU introduction

2023-05-18 Thread Jeff Davis
On Thu, 2023-05-18 at 13:58 -0400, Jonathan S. Katz wrote:
>  From my read of them, as an app developer I'd be very unlikely to
> use 
> this. Maybe there is something with building out some collation rules
> vis-a-vis an extension, but I have trouble imagining the use-case. I
> may 
> also not be the target audience for this feature.

That's a problem for the ICU rules feature. I understand some features
may be for domain experts only, but we at least need to call that out
so that ordinary developers don't get confused. And we should hear from
some of those domain experts that they actually want it and it solves a
real problem.

For the features that can be described with collation
settings/attributes right in the locale name, the use cases are more
plausible and we've supported them since v10, so it's good to document
them as best we can. It's hard to expose only the particular ICU
collation settings we understand best (e.g. the "ks" setting that
allows case insensitive collation), so it's inevitable that there will
be some settings that are more obscure and harder to document.

But in the case of ICU rules, they are newly-supported in 16, so there
should be a clear reason we're adding them. Otherwise we're just
setting up users for confusion or problems, and creating backwards-
compatibility headaches for ourselves (and the last thing we want is to
fret over backwards compatibility for a feature with no users).

Beyond that, there seems to be some danger: if the syntax for rules is
not perfectly compatible between ICU versions, the user might run into
big problems.

Regards,
Jeff Davis





Re: Order changes in PG16 since ICU introduction

2023-05-18 Thread Matthias van de Meent
On Fri, 21 Apr 2023 at 22:46, Jeff Davis  wrote:
>
> On Fri, 2023-04-21 at 19:00 +0100, Andrew Gierth wrote:
> > > > > >
> > Also, somewhere along the line someone broke initdb --no-locale,
> > which
> > should result in C locale being the default everywhere, but when I
> > just
> > tested it it picked 'en' for an ICU locale, which is not the right
> > thing.
>
> Fixed, thank you.

As I complain about in [0], since 5cd1a5af --no-locale has been broken
/ behaving outside its description: Instead of being equivalent to
`--locale=C` it now also overrides `--locale-provider=libc`, resulting
in undocumented behaviour.

Kind regards,

Matthias van de Meent
Neon, Inc.

[0] 
https://www.postgresql.org/message-id/CAEze2WiZFQyyb-DcKwayUmE4rY42Bo6kuK9nBjvqRHYxUYJ-DA%40mail.gmail.com




Re: Order changes in PG16 since ICU introduction

2023-05-18 Thread Jonathan S. Katz

On 5/18/23 1:55 PM, Jeff Davis wrote:

On Wed, 2023-05-17 at 19:59 -0400, Jonathan S. Katz wrote:

I did a quicker read through this time. LGTM overall. I like what you
did with the explanations around sensitivity (now it makes sense).


Committed, thank you.


\o/


There are a few things I don't understand that would be good to
document better:

* Rules. I still don't quite understand the use case: are these for
people inventing new languages? What is a plausible use case that isn't
covered by the existing locales and collation settings? Do rules make
sense for a database default collation? Are they for language experts
only or might an ordinary developer benefit from using them?


From my read of them, as an app developer I'd be very unlikely to use 
this. Maybe there is something with building out some collation rules 
vis-a-vis an extension, but I have trouble imagining the use-case. I may 
also not be the target audience for this feature.



* The collation types "phonebk", "emoji", etc.: are these variants of
particular locales, or do they make sense in multiple locales? I don't
know where they fit in or how to document them.


I remember I had an exploratory use case for "phonebk" but I couldn't 
figure out how to get it to work. AIUI from random searching, the idea 
is that it provides the "phonebook" rules for ordering "names" in a 
particular locale, but I couldn't get it to work.



* I don't understand what "kc" means if "ks" is not set to "level1".


Me neither, but I haven't stared at this as hard as others.

Thanks,

Jonathan





Re: Order changes in PG16 since ICU introduction

2023-05-18 Thread Jeff Davis
On Wed, 2023-05-17 at 19:59 -0400, Jonathan S. Katz wrote:
> I did a quicker read through this time. LGTM overall. I like what you
> did with the explanations around sensitivity (now it makes sense).

Committed, thank you.

There are a few things I don't understand that would be good to
document better:

* Rules. I still don't quite understand the use case: are these for
people inventing new languages? What is a plausible use case that isn't
covered by the existing locales and collation settings? Do rules make
sense for a database default collation? Are they for language experts
only or might an ordinary developer benefit from using them?

* The collation types "phonebk", "emoji", etc.: are these variants of
particular locales, or do they make sense in multiple locales? I don't
know where they fit in or how to document them.

* I don't understand what "kc" means if "ks" is not set to "level1".

Regards,
Jeff Davis





Re: Should CSV parsing be stricter about mid-field quotes?

2023-05-18 Thread Daniel Verite
Joel Jacobson wrote:

> I've been using that trick myself many times in the past, but thanks to this
> deep-dive into this topic, it looks to me like TEXT would be a better format
> fit when dealing with unquoted TSV files, or?
> 
> OTOH, one would then need to inspect the TSV file doesn't contain \. on an
> empty line...

Note that this is the case for valid CSV contents, since backslash-dot
on a line by itself is both an end-of-data marker for COPY FROM and a
valid CSV line.
Having this line in the data results in either an error or having the
rest of the data silently discarded, depending on the context. There
is some previous discussion about this in [1].
Since the TEXT format doesn't have this kind of problem, one solution
is to filter the data through PROGRAM with an [untrusted CSV]->TEXT
filter. This is to be preferred over direct CSV loading when
strictness or robustness are more important than convenience.
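A sketch of that approach (csv_to_pgtext stands in for any CSV-to-TEXT
converter; it is not a real tool):

    -- load untrusted CSV through a filter that emits TEXT-format rows,
    -- so a backslash-dot line arrives escaped instead of ending the data
    COPY target_table FROM PROGRAM 'csv_to_pgtext < /path/to/input.csv'
        WITH (FORMAT text);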


[1]
https://www.postgresql.org/message-id/10e3eff6-eb04-4b3f-aeb4-b920192b9...@manitou-mail.org

Best regards,
-- 
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite




Re: Assert failure of the cross-check for nullingrels

2023-05-18 Thread Tom Lane
Richard Guo  writes:
> On Thu, May 18, 2023 at 3:34 AM Tom Lane  wrote:
>> After some poking at it I hit on what seems like a really simple
>> solution: we should be checking syn_righthand not min_righthand
>> to see whether a Var should be considered nullable by a given OJ.

> I thought about this solution before but proved it was not right in
> https://www.postgresql.org/message-id/CAMbWs48fObJJ%3DYVb4ip8tnwxwixUNKUThfnA1eGfPzJxJRRgZQ%40mail.gmail.com
> I checked the query shown there and it still fails with v3 patch.

Bleah.  The other solution I'd been poking at involved adding an
extra check for clone clauses, as attached (note this requires
8a2523ff3).  This survives your example, but I wonder if it might
reject all the clones in some cases.  It seems a bit expensive
too, although as I said before, I don't think the clone cases get
traversed all that often.

Perhaps another answer could be to compare against syn_righthand
for clone clauses and min_righthand for non-clones?  That seems
mighty unprincipled though.

regards, tom lane

diff --git a/src/backend/optimizer/util/restrictinfo.c b/src/backend/optimizer/util/restrictinfo.c
index d2bc096e1c..2071783987 100644
--- a/src/backend/optimizer/util/restrictinfo.c
+++ b/src/backend/optimizer/util/restrictinfo.c
@@ -532,6 +532,12 @@ extract_actual_join_clauses(List *restrictinfo_list,
  * This function checks the second condition; we assume the caller already
  * saw to the first one.
  *
+ * For has_clone and is_clone clauses, there is a third condition: the
+ * clause_relids and eval_relids must list the same subset of relevant
+ * commutating outer joins.  This ensures we don't select the wrong one
+ * of the clones.  (We do not check this for non-cloned clauses, as it
+ * might improperly reject them.)
+ *
  * For speed reasons, we don't individually examine each Var/PHV of the
  * expression, but just look at the overall clause_relids (the union of the
  * varnos and varnullingrels).  This could give a misleading answer if the
@@ -561,7 +567,27 @@ clause_is_computable_at(PlannerInfo *root,
 
 		/* OK if clause lists it (we assume all Vars in it agree). */
 		if (bms_is_member(sjinfo->ojrelid, clause_relids))
-			continue;
+		{
+			if (rinfo->has_clone || rinfo->is_clone)
+			{
+/* Must check whether it's the right one of the clones */
+Relids		commutators = bms_union(sjinfo->commute_below_l,
+	sjinfo->commute_above_r);
+Relids		clause_commutators = bms_intersect(clause_relids,
+			   commutators);
+Relids		eval_commutators = bms_intersect(eval_relids,
+			 commutators);
+bool		match = bms_equal(clause_commutators,
+			  eval_commutators);
+
+bms_free(commutators);	/* be tidy */
+bms_free(clause_commutators);
+bms_free(eval_commutators);
+if (!match)
+	return false;	/* wrong clone */
+			}
+			continue;			/* OK, clause expects this OJ to be done */
+		}
 
 		/* Else, trouble if clause mentions any nullable Vars. */
 		if (bms_overlap(clause_relids, sjinfo->min_righthand) ||
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 9bafadde66..b254a1b649 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -2500,6 +2500,52 @@ select * from int4_tbl t1
  ->  Seq Scan on int4_tbl t4
 (12 rows)
 
+explain (costs off)
+select * from int4_tbl t1
+  left join (int4_tbl t2 left join int4_tbl t3 on t2.f1 > 0) on t2.f1 > 1
+  left join int4_tbl t4 on t2.f1 > 2 and t3.f1 > 3
+where t1.f1 = coalesce(t2.f1, 1);
+ QUERY PLAN 
+
+ Nested Loop Left Join
+   Join Filter: ((t2.f1 > 2) AND (t3.f1 > 3))
+   ->  Nested Loop Left Join
+ Join Filter: (t2.f1 > 0)
+ ->  Nested Loop Left Join
+   Filter: (t1.f1 = COALESCE(t2.f1, 1))
+   ->  Seq Scan on int4_tbl t1
+   ->  Materialize
+ ->  Seq Scan on int4_tbl t2
+   Filter: (f1 > 1)
+ ->  Seq Scan on int4_tbl t3
+   ->  Materialize
+ ->  Seq Scan on int4_tbl t4
+(13 rows)
+
+explain (costs off)
+select * from int4_tbl t1
+  left join ((select t2.f1 from int4_tbl t2
+left join int4_tbl t3 on t2.f1 > 0
+where t3.f1 is null) s
+ left join tenk1 t4 on s.f1 > 1)
+on s.f1 = t1.f1;
+   QUERY PLAN
+-
+ Nested Loop Left Join
+   Join Filter: (t2.f1 > 1)
+   ->  Hash Right Join
+ Hash Cond: (t2.f1 = t1.f1)
+ ->  Nested Loop Left Join
+   Join Filter: (t2.f1 > 0)
+   Filter: (t3.f1 IS NULL)
+   ->  Seq Scan on int4_tbl t2
+   ->  Materialize
+ ->  Seq Scan on int4_tbl t3
+ ->  Hash
+   ->  Seq Scan on 

Re: Introduce WAIT_EVENT_EXTENSION and WAIT_EVENT_BUFFER_PIN

2023-05-18 Thread Robert Haas
On Wed, May 17, 2023 at 7:38 PM Andres Freund  wrote:
> On 2023-05-17 09:22:19 -0400, Robert Haas wrote:
> > Adding pgstat_get_wait_extension adds runtime cost for no corresponding
> > benefit. Having a special case in the code to avoid that seems worthwhile.
>
> I don't think that should ever be used in a path where performance is
> relevant?

I mean, I agree that it would probably be hard to measure any real
performance difference. But I'm not sure that's a good reason to add
cycles to a path where we don't really need to.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: XLog size reductions: smaller XLRec block header for PG17

2023-05-18 Thread Heikki Linnakangas

On 18/05/2023 17:59, Matthias van de Meent wrote:

Attached is a patch that reduces this overhead by up to 2 bytes by
encoding how large the block data length field is into the block ID,
and thus optionally reducing the block data's length field to 0 bytes.
Examples: cross-page update records will now be 2 bytes shorter,
because the record never registers any data for the new block of the
update; pgbench transactions are now either 6 or 8 bytes smaller
depending on whether the update crosses a page boundary (in xlog
record size; after alignment it is 0 or 4/8 bytes, depending on
MAXALIGN and whether the updates are cross-page updates).

It changes the block IDs used to fit in 6 bits, using the upper 2 bits
of the block_id field to store how much data is contained in the
record (0, <=UINT8_MAX, or <=UINT16_MAX bytes).
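Concretely, the scheme described above amounts to something like this
(an illustrative sketch; the macro names are invented, not the patch's):

    /* upper 2 bits of block_id select the width of the length field */
    #define XLR_SZCLASS_MASK    0xC0
    #define XLR_SZCLASS_ZERO    (0 << 6)    /* no data registered */
    #define XLR_SZCLASS_UINT8   (1 << 6)    /* length in 1 byte */
    #define XLR_SZCLASS_UINT16  (2 << 6)    /* length in 2 bytes */

    uint8   block_id = raw_id & ~XLR_SZCLASS_MASK;  /* 6 usable bits */

    switch (raw_id & XLR_SZCLASS_MASK)
    {
        case XLR_SZCLASS_ZERO:   data_len = 0; break;
        case XLR_SZCLASS_UINT8:  memcpy(&len8, ptr, 1);  data_len = len8;  break;
        case XLR_SZCLASS_UINT16: memcpy(&len16, ptr, 2); data_len = len16; break;
    }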


Perhaps we should introduce a few generic inline functions to do varint 
encoding. That could be useful in many places, while this scheme is very 
tailored for XLogRecordBlockHeader.
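Something like the following, perhaps (a self-contained sketch of
LEB128-style helpers; the names and placement are assumptions, not an
existing API):

    #include <stdint.h>
    #include <stddef.h>

    /* Encode an unsigned 64-bit value, 7 bits per byte, high bit = more. */
    static inline size_t
    pg_varint_encode_uint64(uint8_t *buf, uint64_t value)
    {
        size_t  len = 0;

        do
        {
            uint8_t b = value & 0x7F;

            value >>= 7;
            if (value != 0)
                b |= 0x80;          /* continuation bit */
            buf[len++] = b;
        } while (value != 0);

        return len;                 /* 1..10 bytes */
    }

    static inline size_t
    pg_varint_decode_uint64(const uint8_t *buf, uint64_t *value)
    {
        uint64_t    result = 0;
        size_t      len = 0;
        int         shift = 0;

        for (;;)
        {
            uint8_t b = buf[len++];

            result |= (uint64_t) (b & 0x7F) << shift;
            if ((b & 0x80) == 0)
                break;
            shift += 7;
        }

        *value = result;
        return len;
    }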


We could replace XLogRecordDataHeaderShort and XLogRecordDataHeaderLong 
with this too. With just one XLogRecordDataHeader, with a 
variable-length length field.



This is part 1 of a series of patches trying to decrease the size of
WAL; see also [0], [1] and [2] for more info on what's still to go.
I'm working on a separate, much more involved patch for the XLogRecord
header itself, similar to the patch in [1], which I expect to send
sometime soon as well.
Unless someone thinks the patches should be discussed as one series,
I'm planning on posting that in another thread, as I don't see any
meaningful dependencies between the patches, and the XLR header patch
will be quite a bit larger than this one.

Kind regards,

Matthias van de Meent
Neon, Inc.

[0] https://wiki.postgresql.org/wiki/Updating_the_WAL_infrastructure


Good ideas here. Eliminating the two padding bytes from XLogRecord in 
particular seems like a pure win.



PS. Benchmark results on my system (5950x with other light tasks
running) don't show an obviously negative effect in a 10-minute run
with these arbitrary pgbench settings on a fresh cluster with default
configuration:

./pg_install/bin/pgbench postgres -j 2 -c 6 -T 600 -M prepared
[...]
master: tps = 375
patched: tps = 381


That was probably not CPU limited, so that any overhead in generating 
the WAL would not show up. Try PGOPTIONS="-csynchronous_commit=off" and 
pgbench -N option. And make sure the scale is large enough that there is 
no lock contention. Also would be good to measure the overhead in 
replaying the WAL.


How much space saving does this yield?

- Heikki





Re: RFI: Extending the TOAST Pointer

2023-05-18 Thread Nikita Malakhov
Hi!

Matthias, in the Pluggable TOAST thread we proposed an additional pointer
definition, without modifying the original varatt_external - we have to
keep it untouched for compatibility reasons. The following extension of
the TOAST pointer was proposed:

typedef struct varatt_custom
{
	uint16		va_toasterdatalen;	/* total size of toast pointer, < BLCKSZ */
	uint32		va_rawsize;		/* Original data size (includes header) */
	uint32		va_toasterid;		/* Toaster ID, actually Oid */
	char		va_toasterdata[FLEXIBLE_ARRAY_MEMBER];	/* Custom toaster data */
} varatt_custom;

with the new tag VARTAG_CUSTOM = 127.

We have to keep the raw size because it is used by the executor, and the
Toaster ID is the OID by which we identify the extension that uses this
pointer variant.
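A sketch of how a reader might dispatch on the proposed tag
(VARATT_IS_EXTERNAL, VARTAG_EXTERNAL and VARDATA_EXTERNAL are existing
macros; VARTAG_CUSTOM and varatt_custom are the proposal above):

    if (VARATT_IS_EXTERNAL(ptr) && VARTAG_EXTERNAL(ptr) == VARTAG_CUSTOM)
    {
        varatt_custom toast_ptr;

        /* copy out the fixed header; the datum may be unaligned */
        memcpy(&toast_ptr, VARDATA_EXTERNAL(ptr),
               offsetof(varatt_custom, va_toasterdata));

        /* look up and call the toaster named by toast_ptr.va_toasterid */
    }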


On Thu, May 18, 2023 at 5:34 PM Matthias van de Meent <
boekewurm+postg...@gmail.com> wrote:

> On Thu, 18 May 2023 at 15:12, Aleksander Alekseev
>  wrote:
> >
> > Hi,
> >
> > > I agree that va_tag can have another use. But since we are going to
> > > make varatt_external variable in size (otherwise I don't see how it
> > > could be really **extendable**) I don't think this is the right
> > > approach.
>
> Why would we modify va_tag=18; data=varatt_external? A different
> va_tag option would allow us to keep the current layout around without
> much maintenance and a consistent low overhead.
>
> > On second thought, perhaps we are talking more or less about the same
> thing?
> >
> > It doesn't matter what will be used as a sign of presence of a varint
> > bitmask in the pointer. My initial proposal to use ToastCompressionId
> > for this is probably redundant since va_tag > 18 will already tell
> > that.
>
> I'm not sure "extendable" would be the right word, but as I see it:
>
> 1. We need more space to store more metadata;
> 2. Essentially all bits in varatt_external are already accounted for; and
> 3. There are still many options left in va_tag
>
> It seems to me that adding a new variant to va_att for marking new
> features in the toast infrastructure makes the most sense, as we'd
> also be able to do the new things like varints etc without needing to
> modify existing toast paths significantly.
>
> We'd need to stop using the va_tag as length indicator, but I don't
> think it's currently assumed to be a length indicator anyway (see
> VARSIZE_EXTERNAL(ptr)). By not using the varatt_external struct
> currently in use, we could be able to get down to <18B toast pointers
> as well, though I'd consider that unlikely.
>
> Kind regards,
>
> Matthias van de Meent
> Neon, Inc.
>


-- 
Regards,
Nikita Malakhov
Postgres Professional
The Russian Postgres Company
https://postgrespro.ru/


Re: Possible regression setting GUCs on \connect

2023-05-18 Thread Robert Haas
On Wed, May 17, 2023 at 1:31 PM Alexander Korotkov  wrote:
> I have carefully noted your concerns regarding the USER SET patch that
> I've committed.  It's clear that you have strong convictions about
> this, particularly in relation to your plan of storing the setter role
> OID in pg_db_role_setting.
>
> I want to take a moment to acknowledge the significance of your
> perspective and I respect that you have a different view on this
> matter.  Although I have not yet had the opportunity to see the
> feasibility of your approach, I am open to understanding it further.
>
> Anyway, I don't want to do anything counter-productive.  So, I've
> taken the decision to revert the USER SET patch for the time being.

This discussion made me go back and look at the commit in question. My
opinion is that the feature as it was committed is quite hard to
understand. The documentation for it said this: "Specifies that
variable should be set on behalf of ordinary role." But what does that
even mean? What's an "ordinary role"? What does "on behalf of" mean? I
think these are not terms we use elsewhere in the documentation, and I
think it wouldn't be easy for users to understand how to use the
feature properly. I'm not sure whether Tom's idea about what the
design should be is good or bad, but I think that whatever we end up
with, we should try to explain more clearly and thoroughly what
problem the feature solves and how it does so.

Imagine a paragraph someplace that says something like "You might want
to do X. But if you try to do it, you'll find that it doesn't work
because Y: [SQL example] We can work around this problem using the Z
feature. That lets us tell the system that it should Q, which fixes Y:
[SQL example].". It sounds like Tom might be proposing that we solve
this problem in some way that doesn't actually require any new SQL
syntax, and if we do that, then this might not be needed. But if we do
add syntax, then I think something like this would be really good to
have.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Meson build updates

2023-05-18 Thread Tristan Partin
Hi,

I took some time to look at the Meson build for Postgres. I contribute
to Meson itself from time to time. Within this patchset you will find
pretty small changes. Most of the commits attempt to create more
consistency with the surrounding code. I think there is more that can be
done to improve the build, like including subprojects for optional
dependencies such as lz4, zstd, etc.

While I was reading through the code, I also noticed a few XXX/FIXMEs. I
don't mind taking a look at those in the future, but I think I need more
context for almost all of them since I am brand new to Postgres
development. Since I also have _some_ sway in the Meson community, I can
help get more attention to Meson issues that benefit Postgres. For
instance, I currently count 3 Meson issues referenced in code comments.
If anyone ever has any problems or questions about Meson or
specifically the Meson build of Postgres, I am more than happy to
receive an email to see if I can help out.

Highlighting the biggest changes in this patchset:

- Add Meson overrides
- Remove Meson program options for specifying paths

Everything but the last patch could most likely be backported to
previous maintained releases that include Meson support, though the
patchset initially targets the master branch.

-- 
Tristan Partin
Neon (https://neon.tech)
From ae61706e5466b689dfbb3d7d3eafec218547204e Mon Sep 17 00:00:00 2001
From: Tristan Partin 
Date: Tue, 16 May 2023 07:55:03 -0500
Subject: [PATCH postgres v1 01/17] Remove triple-quoted strings

Triple-quoted strings are for multiline strings in Meson. None of the
descriptions that got changed were multiline and the entire file uses
single-line descriptions.
---
 meson_options.txt | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/meson_options.txt b/meson_options.txt
index 5b44a8829d..1ea9729dc2 100644
--- a/meson_options.txt
+++ b/meson_options.txt
@@ -10,19 +10,19 @@ option('blocksize', type : 'combo',
 option('wal_blocksize', type : 'combo',
   choices: ['1', '2', '4', '8', '16', '32', '64'],
   value: '8',
-  description : '''WAL block size, in kilobytes''')
+  description : 'WAL block size, in kilobytes')
 
 option('segsize', type : 'integer', value : 1,
-  description : '''Segment size, in gigabytes''')
+  description : 'Segment size, in gigabytes')
 
 option('segsize_blocks', type : 'integer', value: 0,
-  description : '''Segment size, in blocks''')
+  description : 'Segment size, in blocks')
 
 
 # Miscellaneous options
 
 option('krb_srvnam', type : 'string', value : 'postgres',
-  description : '''Default Kerberos service principal for GSSAPI''')
+  description : 'Default Kerberos service principal for GSSAPI')
 
 option('system_tzdata', type: 'string', value: '',
   description: 'use system time zone data in specified directory')
@@ -32,7 +32,7 @@ option('system_tzdata', type: 'string', value: '',
 
 option('pgport', type : 'integer', value : 5432,
   min: 1, max: 65535,
-  description : '''Default port number for server and clients''')
+  description : 'Default port number for server and clients')
 
 
 # Developer options
-- 

Tristan Partin
Neon (https://neon.tech)

From 63d982604a2c1ee13f290c4cbd6af5e9404e541e Mon Sep 17 00:00:00 2001
From: Tristan Partin 
Date: Tue, 16 May 2023 08:03:31 -0500
Subject: [PATCH postgres v1 02/17] Use consistent casing in Meson option
 descriptions

Meson itself uses capital letters for option descriptions, so follow
that.
---
 meson_options.txt | 90 +++
 1 file changed, 45 insertions(+), 45 deletions(-)

diff --git a/meson_options.txt b/meson_options.txt
index 1ea9729dc2..fa823fd088 100644
--- a/meson_options.txt
+++ b/meson_options.txt
@@ -5,7 +5,7 @@
 option('blocksize', type : 'combo',
   choices : ['1', '2', '4', '8', '16', '32'],
   value : '8',
-  description: 'set relation block size in kB')
+  description: 'Set relation block size in kB')
 
 option('wal_blocksize', type : 'combo',
   choices: ['1', '2', '4', '8', '16', '32', '64'],
@@ -25,7 +25,7 @@ option('krb_srvnam', type : 'string', value : 'postgres',
   description : 'Default Kerberos service principal for GSSAPI')
 
 option('system_tzdata', type: 'string', value: '',
-  description: 'use system time zone data in specified directory')
+  description: 'Use system time zone data in specified directory')
 
 
 # Defaults
@@ -38,7 +38,7 @@ option('pgport', type : 'integer', value : 5432,
 # Developer options
 
 option('cassert', type : 'boolean', value: false,
-  description: 'enable assertion checks (for debugging)')
+  description: 'Enable assertion checks (for debugging)')
 
 option('tap_tests', type : 'feature', value : 'auto',
   description : 'Whether to enable tap tests')
@@ -47,43 +47,43 @@ option('PG_TEST_EXTRA', type : 'string', value: '',
   description: 'Enable selected extra tests')
 
 option('atomics', type : 'boolean', value: true,
-  description: 'whether to use atomic operations')
+  description: 'Whether to use atomic operations')

Re: The documentation for READ COMMITTED may be incomplete or wrong

2023-05-18 Thread Aleksander Alekseev
Hi,

> I wonder what we got wrong.

One thing we noticed is that the description for EvalPlanQual may be wrong [1]:

"""
In UPDATE/DELETE, only the target relation needs to be handled this way.
In SELECT FOR UPDATE, there may be multiple relations flagged FOR UPDATE,
so we obtain lock on the current tuple version in each such relation before
executing the recheck.
"""

[1]: 
https://github.com/postgres/postgres/blob/master/src/backend/executor/README#L381

-- 
Best regards,
Aleksander Alekseev




Re: The documentation for READ COMMITTED may be incomplete or wrong

2023-05-18 Thread Tom Lane
I wrote:
> Aleksander Alekseev  writes:
>> A colleague of mine, Ante Krešić, got puzzled by the following behavior:

> That's not a documentation problem.  That's a bug, and an extremely
> nasty one.  A quick check shows that it works as expected up through
> v13, but fails as described in v14 and later.  Needs bisecting ...

Ugh.  Bisecting says it broke at

86dc90056dfdbd9d1b891718d2e5614e3e432f35 is the first bad commit
commit 86dc90056dfdbd9d1b891718d2e5614e3e432f35
Author: Tom Lane 
Date:   Wed Mar 31 11:52:34 2021 -0400

Rework planning and execution of UPDATE and DELETE.

which was absolutely not supposed to be breaking any concurrent-execution
guarantees.  I wonder what we got wrong.

regards, tom lane




Re: The documentation for READ COMMITTED may be incomplete or wrong

2023-05-18 Thread Nathan Bossart
On Thu, May 18, 2023 at 10:53:35AM -0400, Tom Lane wrote:
> That's not a documentation problem.  That's a bug, and an extremely
> nasty one.  A quick check shows that it works as expected up through
> v13, but fails as described in v14 and later.  Needs bisecting ...

git-bisect points me to 86dc900.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




XLog size reductions: smaller XLRec block header for PG17

2023-05-18 Thread Matthias van de Meent
Hi all,

As was previously discussed in the thread surrounding [1]: currently
any block registration in WAL takes up at least 8 bytes of xlog
overhead, regardless of how much data is included in that block: 1
byte for block ID, 1 byte for fork and flags, 2 bytes for the block
data length, and 4 bytes for the blockNo. (Usually, another 12 bytes
are used for a RelFileLocator; but that may not be included for some
blocks in the record when conditions apply)

Attached is a patch that reduces this overhead by up to 2 bytes by
encoding how large the block data length field is into the block ID,
and thus optionally reducing the block data's length field to 0 bytes.
Examples: cross-page update records will now be 2 bytes shorter,
because the record never registers any data for the new block of the
update; pgbench transactions are now either 6 or 8 bytes smaller
depending on whether the update crosses a page boundary (in xlog
record size; after alignment it is 0 or 4/8 bytes, depending on
MAXALIGN and whether the updates are cross-page updates).

It changes the block IDs used to fit in 6 bits, using the upper 2 bits
of the block_id field to store how much data is contained in the
record (0, <=UINT8_MAX, or <=UINT16_MAX bytes).
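
To make the scheme concrete, here is a minimal standalone sketch of the
2-bit encoding described above (all identifiers are illustrative, not
the patch's actual names):

```c
/*
 * The upper 2 bits of the block ID byte select the width of the
 * data-length field that follows (none, 1 byte, or 2 bytes), leaving
 * 6 bits for the block ID itself.
 */
#include <stdint.h>

#define XLR_SZCLASS_ZERO    0x00    /* no data, no length field stored */
#define XLR_SZCLASS_UINT8   0x40    /* 1-byte length field follows */
#define XLR_SZCLASS_UINT16  0x80    /* 2-byte length field follows */
#define XLR_SZCLASS_MASK    0xC0
#define XLR_BLOCK_ID_MASK   0x3F    /* block IDs now fit in 6 bits */

static inline uint8_t
encode_block_id(uint8_t block_id, uint32_t data_len)
{
    if (data_len == 0)
        return (uint8_t) (block_id | XLR_SZCLASS_ZERO);
    if (data_len <= UINT8_MAX)
        return (uint8_t) (block_id | XLR_SZCLASS_UINT8);
    return (uint8_t) (block_id | XLR_SZCLASS_UINT16);
}

static inline int
length_field_size(uint8_t encoded)
{
    /* How many length bytes must be read back for this block? */
    switch (encoded & XLR_SZCLASS_MASK)
    {
        case XLR_SZCLASS_ZERO:  return 0;
        case XLR_SZCLASS_UINT8: return 1;
        default:                return 2;
    }
}
```

This is where the up-to-2-byte saving comes from: a block that registers
no data stores no length field at all.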

This is part 1 of a series of patches trying to decrease the size of
WAL; see also [0], [1] and [2] for more info on what's still to go.
I'm working on a separate, much more involved patch for the XLogRecord
header itself, similar to the patch in [1], which I expect to send
sometime soon as well.
Unless someone thinks the patches should be discussed as one series,
I'm planning on posting that in another thread, as I don't see any
meaningful dependencies between the patches, and the XLR header patch
will be quite a bit larger than this one.

Kind regards,

Matthias van de Meent
Neon, Inc.

[0] https://wiki.postgresql.org/wiki/Updating_the_WAL_infrastructure
[1] 
https://www.postgresql.org/message-id/flat/CAEze2Wjd3jY_UhhOGdGGnC6NO%3D%2BNmtNOmd%3DJaYv-v-nwBAiXXA%40mail.gmail.com#17a51d83923f4390d8f407d0d6c5da07
[2] 
https://www.postgresql.org/message-id/flat/CAEze2Whf%3DfwAj7rosf6aDM9t%2B7MU1w-bJn28HFWYGkz%2Bics-hg%40mail.gmail.com

PS. Benchmark results on my system (5950x with other light tasks
running) don't show an obviously negative effect in a 10-minute run
with these arbitrary pgbench settings on a fresh cluster with default
configuration:

./pg_install/bin/pgbench postgres -j 2 -c 6 -T 600 -M prepared
[...]
master: tps = 375
patched: tps = 381


v1-0001-Reduce-overhead-of-small-block-data-in-xlog-recor.patch
Description: Binary data


Re: The documentation for READ COMMITTED may be incomplete or wrong

2023-05-18 Thread Tom Lane
Aleksander Alekseev  writes:
> A colleague of mine, Ante Krešić, got puzzled by the following behavior:

That's not a documentation problem.  That's a bug, and an extremely
nasty one.  A quick check shows that it works as expected up through
v13, but fails as described in v14 and later.  Needs bisecting ...

regards, tom lane




Re: walsender performance regression due to logical decoding on standby changes

2023-05-18 Thread Bharath Rupireddy
On Thu, May 18, 2023 at 1:23 AM Andres Freund  wrote:
>
> On 2023-05-15 20:09:00 +0530, Bharath Rupireddy wrote:
> > > [1]
> > > max_wal_senders = 100
> > > before regression(ms)    after regression(ms)    v2 patch(ms)
> > > 13394.4013               14141.2615              13455.2543
> > > Compared with before regression    5.58%    0.45%
> > >
> > > max_wal_senders = 200
> > > before regression(ms)    after regression(ms)    v2 patch(ms)
> > > 13280.8507               14597.1173              13632.0606
> > > Compared with before regression    9.91%    1.64%
> > >
> > > max_wal_senders = 300
> > > before regression(ms)    after regression(ms)    v2 patch(ms)
> > > 13535.0232               16735.7379              13705.7135
> > > Compared with before regression    23.65%   1.26%
> >
> > Yes, the numbers with v2 patch look close to where we were before.
> > Thanks for confirming. Just wondering, where does this extra
> > 0.45%/1.64%/1.26% coming from?
>
> We still do more work for each WAL record than before, so I'd expect something
> small. I'd say right now the main overhead with the patch comes from the
> spinlock acquisitions in ConditionVariableBroadcast(), which happen even when
> nobody is waiting.


> > + ConditionVariableInit(&WalSndCtl->physicalWALSndCV);
> > + ConditionVariableInit(&WalSndCtl->logicalWALSndCV);
>
> It's not obvious to me that it's worth having two CVs, because it's more
> expensive to find no waiters in two CVs than to find no waiters in one CV.

I disagree. In the tight per-WAL record recovery loop, WalSndWakeup
wakes up logical walsenders for every WAL record, but it wakes up
physical walsenders only if the applied WAL record causes a TLI
switch. Therefore, the extra per-WAL-record cost of the spinlock
acquire-release applies only to logical walsenders. On the other hand, if
we were to use a single CV, we would be unnecessarily waking up (if at
all they are sleeping) physical walsenders for every WAL record -
which is costly IMO.

I still think separate CVs are good for selective wake-ups, given that
there should be few TLI-switch WAL records.

> > +  *
> > +  * XXX: When available, WaitEventSetWait() can be replaced with its 
> > CV's
> > +  * counterpart.
>
> I don't really understand that XXX - the potential bright future would be to
> add support for CVs into wait event sets, not to replace WES with a CV?

Yes, I meant it and modified that part to:

 * XXX: A desirable future improvement would be to add support for CVs into
 * WaitEventSetWait().

> FWIW, if I just make WalSndWakeup() do nothing, I still see a very small, but
> reproducible, overhead: 1.72s - that's just the cost of the additional
> external function call.

Hm, this is unavoidable.

> If I add a no-waiters fastpath using proclist_is_empty() to
> ConditionVariableBroadcast(), I get 1.77s. So the majority of the remaining
> slowdown indeed comes from the spinlock acquisition in
> ConditionVariableBroadcast().

Acquiring a spinlock for every replayed WAL record seems not great. In
general, this problem exists for the CV infrastructure as a whole if
ConditionVariableBroadcast()/CVB() is called in a loop/hot path. I
think this can be demonstrated easily by doing something like - select
pg_replication_origin_session_setup()/pg_wal_replay_resume()/pg_create_physical_replication_slot()
from generate_series(1, 100...); - all of these functions end up in
CVB().

Do you think adding a fastpath exit to CVB() for the no-waiters case is
worth implementing? Something like an atomic state variable in each CV,
similar to LWLock's state variable: set it when adding waiters, reset
it when removing waiters, and have CVB() read the state atomically for
the fastpath (of course, we need memory barriers here). This might
complicate things as each CV structure gets an extra state variable (4
bytes), memory barriers, and atomic writes and reads.

Or, given that with the patch recovery is the only caller of CVB() in
a hot path, maybe we can add an atomic waiter
count to WalSndCtl, incrementing it atomically in WalSndWait() before
the wait, decrementing it after the wait, and a fastpath in
WalSndWakeup() reading it atomically to avoid CVB() calls. IIUC, this
is something Jeff proposed upthread.
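
To illustrate, here is a minimal standalone model of that waiter-count
fastpath, using C11 atomics in place of PostgreSQL's pg_atomic_* API
(field and function names are made up for this sketch):

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Stand-in for a new counter that would live in WalSndCtlData. */
typedef struct WalSndCtlModel
{
    atomic_uint logical_walsnd_waiters;
} WalSndCtlModel;

static WalSndCtlModel ctl;

/* Called by a walsender just before sleeping in WalSndWait(). */
static void
walsnd_before_sleep(void)
{
    atomic_fetch_add_explicit(&ctl.logical_walsnd_waiters, 1,
                              memory_order_acq_rel);
}

/* Called by a walsender right after waking up. */
static void
walsnd_after_wakeup(void)
{
    atomic_fetch_sub_explicit(&ctl.logical_walsnd_waiters, 1,
                              memory_order_acq_rel);
}

/*
 * Fastpath check in WalSndWakeup(): skip the CV broadcast (and its
 * spinlock acquisition) entirely when nobody is sleeping.
 */
static bool
walsnd_wakeup_can_skip(void)
{
    return atomic_load_explicit(&ctl.logical_walsnd_waiters,
                                memory_order_acquire) == 0;
}
```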

Thoughts?

Please find the attached v4 patch addressing the review comment (not
the fastpath one).

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com


v4-0001-Optimize-walsender-wake-up-logic-with-Conditional.patch
Description: Binary data


Re: Assert failure of the cross-check for nullingrels

2023-05-18 Thread Tom Lane
Richard Guo  writes:
> On Thu, May 18, 2023 at 3:42 AM Tom Lane  wrote:
>> ... BTW, something I'd considered in an earlier attempt at fixing this
>> was to change clause_is_computable_at's API to pass the clause's
>> RestrictInfo not just the clause_relids, along the lines of

> This change looks good to me.

Did that part.

regards, tom lane




Re: RFI: Extending the TOAST Pointer

2023-05-18 Thread Matthias van de Meent
On Thu, 18 May 2023 at 15:12, Aleksander Alekseev
 wrote:
>
> Hi,
>
> > I agree that va_tag can have another use. But since we are going to
> > make varatt_external variable in size (otherwise I don't see how it
> > could be really **extendable**) I don't think this is the right
> > approach.

Why would we modify va_tag=18; data=varatt_external? A different
va_tag option would allow us to keep the current layout around without
much maintenance, and with consistently low overhead.

> On second thought, perhaps we are talking more or less about the same thing?
>
> It doesn't matter what will be used as a sign of presence of a varint
> bitmask in the pointer. My initial proposal to use ToastCompressionId
> for this is probably redundant since va_tag > 18 will already tell
> that.

I'm not sure "extendable" would be the right word, but as I see it:

1. We need more space to store more metadata;
2. Essentially all bits in varatt_external are already accounted for; and
3. There are still many options left in va_tag

It seems to me that adding a new variant to va_tag for marking new
features in the toast infrastructure makes the most sense, as we'd
also be able to add new things like varints etc. without needing to
modify existing toast paths significantly.

We'd need to stop using the va_tag as a length indicator, but I don't
think it's currently assumed to be a length indicator anyway (see
VARSIZE_EXTERNAL(ptr)). By not using the varatt_external struct
currently in use, we might be able to get down to <18B toast pointers
as well, though I'd consider that unlikely.
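
As a purely hypothetical illustration (the value 19 and the name are
invented for this sketch, not proposed identifiers), a new feature could
claim a fresh tag instead of altering the existing VARTAG_ONDISK layout:

```c
typedef enum vartag_external_sketch
{
    VARTAG_INDIRECT = 1,
    VARTAG_EXPANDED_RO = 2,
    VARTAG_EXPANDED_RW = 3,
    VARTAG_ONDISK = 18,
    VARTAG_ONDISK_EXT = 19      /* hypothetical extended on-disk pointer */
} vartag_external_sketch;
```

Existing readers would keep dispatching on VARTAG_ONDISK unchanged,
while a new branch handles the extended layout.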

Kind regards,

Matthias van de Meent
Neon, Inc.




Re: logical decoding and replication of sequences, take 2

2023-05-18 Thread Ashutosh Bapat
Hi,
Sorry for jumping late in this thread.

I started experimenting with the functionality. Some of this may have
been discussed earlier, but given how long the thread has been active
and how many changes it has gone through, revalidating the
functionality is useful.

I considered the following aspects:
Changes to the sequence on subscriber
-
1. Since this is logical decoding, the logical replica is writable. So
the logically replicated sequence can be manipulated on the subscriber
as well. This implementation consolidates the changes on the subscriber
and publisher rather than replicating the publisher state as is. That's
good. See the example command sequence below:
a. publisher calls nextval() - this sets the sequence state on
publisher as (1, 32, t) which is replicated to the subscriber.
b. subscriber calls nextval() once - this sets the sequence state on
subscriber as (34, 32, t)
c. subscriber calls nextval() 32 times - on-disk state of sequence
doesn't change on subscriber
d. subscriber calls nextval() 33 times - this sets the sequence state
on subscriber as (99, 0, t)
e. publisher calls nextval() 32 times - this sets the sequence state
on publisher as (33, 0, t)

The on-disk state on the publisher at the end of e. is replicated to
the subscriber, but the subscriber doesn't apply it. The state there is
still (99, 0, t). I think this is closer to how logical replication of
sequences should look. This is also good enough as long as we expect
the replication of sequences to be used for failover and switchover.

But it might not help if we want to consolidate the INSERTs that use
nextval(). If we were to treat sequences as accumulating the
increments, we might be able to resolve the conflicts by adjusting the
columns values considering the increments made on subscriber. IIUC,
conflict resolution is not part of built-in logical replication. So we
may not want to go this route. But worth considering.

Implementation-agnostic decoded change

The current method of decoding and replicating sequences is tied to
the implementation - it replicates the sequence row as is. If the
implementation changes in the future, we might need to revise the
decoded representation of a sequence. I think only nextval() matters
for a sequence. So as long as we replicate enough information to
calculate the nextval we should be good. The current implementation
does that by replicating the log_value and is_called. is_called can be
consolidated
into log_value itself. The implemented protocol thus requires two
extra values to be replicated. Those can be ignored right now. But
they might pose a problem in the future, if some downstream starts using
them. We will be forced to provide fake but sane values even if a
future upstream implementation does not produce those values. Of
course we can't predict the future implementation well enough to
decide what would be an implementation-independent format. E.g. if
pluggable storage were to be used to implement sequences, or if we came
around to implementing distributed sequences, their shape can't be
predicted right now. So a change in protocol seems to be unavoidable
whatever we do. But starting with the bare minimum might save us from
larger trouble. I think it's better to just replicate the nextval()
and craft the representation on the subscriber so that it produces
that nextval().
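
As a sketch of what such a bare-minimum decoded change could carry
(purely illustrative, not the patch's actual wire format; Oid is
assumed to be a 4-byte unsigned integer as in postgres_ext.h):

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t Oid;

typedef struct SequenceChangeSketch
{
    Oid     seq_relid;      /* sequence's OID on the publisher */
    int64_t next_value;     /* value the next nextval() must return */
    bool    transactional;  /* created/altered in the same transaction? */
} SequenceChangeSketch;
```

The subscriber would then craft whatever on-disk state its own
implementation needs so that its next nextval() returns next_value.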

3. Primary key sequences
---
I have not experimented with this. But I think we will need to add the
sequences associated with the primary keys to the publications
publishing their owning tables. Otherwise, we will have problems with
failover. And it needs to be done automatically, since a. the names of
these sequences are generated automatically, and b. publications with
FOR ALL TABLES will add tables automatically and start replicating the
changes. Users may not be able to intercept the replication activity
to ensure the associated sequences are also added to the publication.

--
Best Wishes,
Ashutosh Bapat




The documentation for READ COMMITTED may be incomplete or wrong

2023-05-18 Thread Aleksander Alekseev
Hi hackers,

A colleague of mine, Ante Krešić, got puzzled by the following behavior:

Setup:

postgres=# create table inh_test (id serial, value float);
CREATE TABLE
postgres=# create table inh_child_1 () INHERITS ( inh_test);
CREATE TABLE
postgres=# create table inh_child_2 () INHERITS ( inh_test);
CREATE TABLE
postgres=# insert into inh_child_1 values (1,1);
INSERT 0 1
postgres=# insert into inh_child_2 values (1,1);
INSERT 0 1

Update tuples in first transaction:

postgres=# begin;
BEGIN
postgres=*# update inh_test set value = 2 where value = 1;
UPDATE 2

Delete in second transaction while the first is still active:

postgres=# delete from inh_test where value = 1;

Commit in the first transaction, and we get a delete in the second one
even though the committed values no longer qualify after the update.

postgres=# COMMIT;

postgres=# delete from inh_test where value = 1;
DELETE 1

The same happens for declarative partitioned tables as well. When
working on a table without inheritance / partitioning the result is
different, DELETE 0.

So what's the problem?

According to the documentation [1]:

"""
UPDATE, DELETE [..] commands behave the same as SELECT in terms of
searching for target rows: they will only find target rows that were
committed as of the command start time. However, such a target row
might have already been updated (or deleted or locked) by another
concurrent transaction by the time it is found. In this case, the
would-be updater will wait for the first updating transaction to
commit or roll back (if it is still in progress). If the first updater
rolls back, then its effects are negated and the second updater can
proceed with updating the originally found row. If the first updater
commits, the second updater will ignore the row if the first updater
deleted it, otherwise it will attempt to apply its operation to the
updated version of the row. The search condition of the command (the
WHERE clause) is re-evaluated to see if the updated version of the row
still matches the search condition. If so, the second updater proceeds
with its operation using the updated version of the row.
"""

It looks like the observed behavior contradicts the documentation. If
we read it literally, the second transaction should delete 0 rows, as
it does for non-partitioned and non-inherited tables. From what I can
tell, the observed behavior doesn't contradict the general guarantees
promised by READ COMMITTED.

Perhaps we should update the documentation for this case, or maybe
remove the quoted part of it.

Thoughts?

[1]: https://www.postgresql.org/docs/current/transaction-iso.html

-- 
Best regards,
Aleksander Alekseev




Re: RFI: Extending the TOAST Pointer

2023-05-18 Thread Aleksander Alekseev
Hi,

> I agree that va_tag can have another use. But since we are going to
> make varatt_external variable in size (otherwise I don't see how it
> could be really **extendable**) I don't think this is the right
> approach.

On second thought, perhaps we are talking more or less about the same thing?

It doesn't matter what will be used as a sign of presence of a varint
bitmask in the pointer. My initial proposal to use ToastCompressionId
for this is probably redundant since va_tag > 18 will already tell
that.

-- 
Best regards,
Aleksander Alekseev




Re: psql: Add role's membership options to the \du+ command

2023-05-18 Thread Pavel Luzanov

On 18.05.2023 05:42, Jonathan S. Katz wrote:

That said, from a readability standpoint, it was easier for me to 
follow the tabular form vs. the sentence form.


Maybe it's possible to reach an agreement on the sentence form. Similar
descriptions are used for referential constraints in the \d command:


create table t1 (id int primary key);
create table t2 (id int references t1(id));

\d t2
                 Table "public.t2"
 Column |  Type   | Collation | Nullable | Default
--------+---------+-----------+----------+---------
 id     | integer |           |          |
Foreign-key constraints:
    "t2_id_fkey" FOREIGN KEY (id) REFERENCES t1(id)

As for the tabular form, it looks more natural to have a separate psql
command for the pg_auth_members system catalog. Something based on this
query:

SELECT r.rolname role, m.rolname member,
       admin_option admin, inherit_option inherit, set_option set,
       g.rolname grantor
FROM pg_catalog.pg_auth_members pam
     JOIN pg_catalog.pg_roles r ON (pam.roleid = r.oid)
     JOIN pg_catalog.pg_roles m ON (pam.member = m.oid)
     JOIN pg_catalog.pg_roles g ON (pam.grantor = g.oid)
WHERE r.rolname !~ '^pg_'
ORDER BY role, member, grantor;

       role       |      member      | admin | inherit | set |     grantor
------------------+------------------+-------+---------+-----+------------------
 regress_du_role0 | regress_du_admin | t     | t       | t   | postgres
 regress_du_role0 | regress_du_role1 | t     | t       | t   | regress_du_admin
 regress_du_role0 | regress_du_role1 | f     | t       | f   | regress_du_role1
 regress_du_role0 | regress_du_role1 | f     | f       | t   | regress_du_role2
 regress_du_role0 | regress_du_role2 | t     | f       | f   | regress_du_admin
 regress_du_role0 | regress_du_role2 | f     | t       | t   | regress_du_role1
 regress_du_role0 | regress_du_role2 | f     | f       | f   | regress_du_role2
 regress_du_role1 | regress_du_admin | t     | t       | t   | postgres
 regress_du_role1 | regress_du_role2 | t     | f       | t   | regress_du_admin
 regress_du_role2 | regress_du_admin | t     | t       | t   | postgres
(10 rows)

But is it worth inventing a new psql command for this?


-
Pavel Luzanov


Re: No buildfarm animals are running both typedefs and --with-llvm

2023-05-18 Thread Tom Lane
Andrew Dunstan  writes:
> On 2023-05-17 We 18:14, Tom Lane wrote:
>> I did a preliminary test run of pgindent today, and was dismayed
>> to see a bunch of misformatting in backend/jit/llvm/, which
>> evidently is because the typedefs list available from the
>> buildfarm no longer includes any LLVM typedefs.  We apparently
>> used to have at least one buildfarm animal that was configured
>> --with-llvm and ran the "typedefs" task, but there aren't any
>> today.

> Do you remember which animals that used to be?

Nope, but last night I configured indri to do it, so we have some
typedefs coverage now.  It'd be good to have more than one animal
doing it though.

regards, tom lane




Re: run pgindent on a regular basis / scripted manner

2023-05-18 Thread Andrew Dunstan


On 2023-05-17 We 17:10, Tom Lane wrote:

I wrote:

Andrew Dunstan  writes:

I doubt there's something like that.

I had a read-through of the latest version's man page, and found
this promising-looking entry:
-boc, --break-at-old-comma-breakpoints

Sadly, this seems completely not ready for prime time.  I experimented
with it under perltidy 20230309, and found that it caused hundreds
of kilobytes of gratuitous changes that don't seem to have a direct
connection to the claimed purpose.  Most of these seemed to be from
forcing a line break after a function call's open paren, like

@@ -50,10 +50,12 @@ detects_heap_corruption(
  #
  fresh_test_table('test');
  $node->safe_psql('postgres', q(VACUUM (FREEZE, DISABLE_PAGE_SKIPPING) test));
-detects_no_corruption("verify_heapam('test')",
+detects_no_corruption(
+   "verify_heapam('test')",
"all-frozen not corrupted table");
  corrupt_first_page('test');
-detects_heap_corruption("verify_heapam('test')",
+detects_heap_corruption(
+   "verify_heapam('test')",
"all-frozen corrupted table");
  detects_no_corruption(
"verify_heapam('test', skip := 'all-frozen')",

although in some places it just wanted to insert a space, like this:

@@ -77,9 +81,9 @@ print "standby 2: $result\n";
  is($result, qq(33|0|t), 'check streamed sequence content on standby 2');
  
  # Check that only READ-only queries can run on standbys

-is($node_standby_1->psql('postgres', 'INSERT INTO tab_int VALUES (1)'),
+is( $node_standby_1->psql('postgres', 'INSERT INTO tab_int VALUES (1)'),
3, 'read-only queries on standby 1');
-is($node_standby_2->psql('postgres', 'INSERT INTO tab_int VALUES (1)'),
+is( $node_standby_2->psql('postgres', 'INSERT INTO tab_int VALUES (1)'),
3, 'read-only queries on standby 2');
  
  # Tests for connection parameter target_session_attrs



So I don't think we want that.  Maybe in some future version it'll
be more under control.

Barring objections, I'll use the attached on Friday.





LGTM


cheers


andrew

--
Andrew Dunstan
EDB:https://www.enterprisedb.com


Re: RFI: Extending the TOAST Pointer

2023-05-18 Thread Aleksander Alekseev
Hi,

> No, that's inaccurate. The complete on-disk representation of a varatt is
>
> {
> uint8   va_header;  /* Always 0x80 or 0x01 */
> uint8   va_tag; /* Type of datum */
> charva_data[FLEXIBLE_ARRAY_MEMBER]; /* Type-dependent
> data, for toasted values that's currently only a varatt_external */
> } varattrib_1b_e;
>
> With va_tag being filled with one of the vartag_external values:
>
> typedef enum vartag_external
> {
> VARTAG_INDIRECT = 1,
> VARTAG_EXPANDED_RO = 2,
> VARTAG_EXPANDED_RW = 3,
> VARTAG_ONDISK = 18
> } vartag_external;
>
> This enum still has many options to go before it exceeds the maximum
> of the uint8 va_tag field. Therefore, I don't think we have no disk
> representations left, nor do I think we'll need to add another option
> to the ToastCompressionId enum.
> As an example, we can add another VARTAG option for dictionary-enabled
> external toast; like what the pluggable toast patch worked on. I think
> we can salvage some ideas from that patch, even if the main idea got
> stuck.

The problem here is that the comments are ambiguous regarding what to
call "TOAST pointer" exactly. I proposed a patch for this but it was
not accepted [1].

So the exact on-disk representation of a TOAST'ed value (for
little-endian machines) is:

0b00000001 (va_header), 18 (va_tag), (varatt_external here)

Where 18 is sizeof(varatt_external) + 2, because the length includes
the length of the header.
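
For the record, here is a standalone check of that arithmetic,
mirroring the struct with fixed-width types (assuming Oid is a 4-byte
unsigned integer, as it is today):

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t Oid;

typedef struct varatt_external_sketch
{
    int32_t  va_rawsize;    /* original data size (includes header) */
    uint32_t va_extinfo;    /* external size and compression method */
    Oid      va_valueid;    /* unique ID of value within TOAST table */
    Oid      va_toastrelid; /* RelID of TOAST table containing it */
} varatt_external_sketch;

/* 16 bytes of payload + 1-byte varlena header + 1-byte va_tag = 18. */
static_assert(sizeof(varatt_external_sketch) + 2 == 18,
              "on-disk TOAST pointer is 18 bytes");
```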

I agree that va_tag can have another use. But since we are going to
make varatt_external variable in size (otherwise I don't see how it
could be really **extendable**) I don't think this is the right
approach.

Also I agree that this particular statement is incorrect:

> This will allow us to extend the pointers indefinitely.

varatt_external is going to be limited to 255 bytes. But that seems to
be a reasonable limitation for the next 10-20 years or so.

[1]: https://commitfest.postgresql.org/39/3820/

-- 
Best regards,
Aleksander Alekseev




Re: Should CSV parsing be stricter about mid-field quotes?

2023-05-18 Thread Andrew Dunstan


On 2023-05-18 Th 02:19, Joel Jacobson wrote:

On Thu, May 18, 2023, at 08:00, Joel Jacobson wrote:
> 1. How about adding a `WITHOUT QUOTE` or `QUOTE NONE` option in conjunction
> with `COPY ... WITH CSV`?

More ideas:
[ QUOTE 'quote_character' | UNQUOTED ]
or
[ QUOTE 'quote_character' | NO_QUOTE ]

Thinking about it, I recall another hack;
specifying a non-existing char as the delimiter to force the entire line
into a single-column table. For that use-case, we could also provide an
option that would internally set:

    delimc = '\0';

How about:

[DELIMITER 'delimiter_character' | UNDELIMITED ]
or
[DELIMITER 'delimiter_character' | NO_DELIMITER ]
or it should be more use-case-based and intuitive:
[DELIMITER 'delimiter_character' | WHOLE_LINE_AS_RECORD ]




QUOTE NONE and DELIMITER NONE should work fine. NONE is already a 
keyword, so the disturbance should be minimal.



cheers


andrew


--
Andrew Dunstan
EDB:https://www.enterprisedb.com


Re: No buildfarm animals are running both typedefs and --with-llvm

2023-05-18 Thread Andrew Dunstan


On 2023-05-17 We 18:14, Tom Lane wrote:

I did a preliminary test run of pgindent today, and was dismayed
to see a bunch of misformatting in backend/jit/llvm/, which
evidently is because the typedefs list available from the
buildfarm no longer includes any LLVM typedefs.  We apparently
used to have at least one buildfarm animal that was configured
--with-llvm and ran the "typedefs" task, but there aren't any
today.

I can manually prevent those typedefs from getting deleted from
typedefs.list, but it'd be better if the normal process was
taking care of this -- the more so because we're hoping to automate
that process some more.





Do you remember which animals that used to be? A lot of animals are not 
building with LLVM. Maybe we should encourage it more, e.g. by adding 
--with-llvm to the default config set in the sample file.



cheers


andrew

--
Andrew Dunstan
EDB:https://www.enterprisedb.com


Re: RFI: Extending the TOAST Pointer

2023-05-18 Thread Matthias van de Meent
On Thu, 18 May 2023 at 12:52, Aleksander Alekseev
 wrote:
>
> Hi Nikita,
>
> > this part of the PostgreSQL screams to be revised and improved
>
> I completely agree. The problem with TOAST pointers is that they are
> not extendable at the moment which prevents adding new compression
> algorithms (e.g. ZSTD), new features like compression dictionaries
> [1], etc. I suggest we add extensibility in order to solve this
> problem for the foreseeable future for everyone.
>
> > where Custom TOAST Pointer is distinguished from Regular one by va_flag 
> > field
> > which is a part of varlena header
>
> I don't think that varlena header is the best place to distinguish a
> classical TOAST pointer from an extended one. On top of that I don't
> see any free bits that would allow adding such a flag to the on-disk
> varlena representation [2].
>
> The current on-disk TOAST pointer representation is following:
>
> ```
> typedef struct varatt_external
> {
> int32 va_rawsize; /* Original data size (includes header) */
> uint32 va_extinfo; /* External saved size (without header) and
>   * compression method */
> Oid va_valueid; /* Unique ID of value within TOAST table */
> Oid va_toastrelid; /* RelID of TOAST table containing it */
> } varatt_external;
> ```

No, that's inaccurate. The complete on-disk representation of a varatt is

{
uint8   va_header;  /* Always 0x80 or 0x01 */
uint8   va_tag; /* Type of datum */
charva_data[FLEXIBLE_ARRAY_MEMBER]; /* Type-dependent
data, for toasted values that's currently only a varatt_external */
} varattrib_1b_e;

With va_tag being filled with one of the vartag_external values:

typedef enum vartag_external
{
VARTAG_INDIRECT = 1,
VARTAG_EXPANDED_RO = 2,
VARTAG_EXPANDED_RW = 3,
VARTAG_ONDISK = 18
} vartag_external;

This enum still has many options to go before it exceeds the maximum
of the uint8 va_tag field. Therefore, I don't think we are out of
on-disk representations, nor do I think we'll need to add another
option to the ToastCompressionId enum.
As an example, we can add another VARTAG option for dictionary-enabled
external toast; like what the pluggable toast patch worked on. I think
we can salvage some ideas from that patch, even if the main idea got
stuck.

Kind regards,

Matthias van de Meent
Neon Inc.




Re: [PATCH] Allow Postgres to pick an unused port to listen

2023-05-18 Thread Aleksander Alekseev
Hi,

> I think either a specific pg_ctl command to return the port like Greg had 
> initially mentioned or simply a separate file to store the port numbers would 
> be ideal.

+1, if we are going to do this we definitely need a pg_ctl command
and/or a file.

> The standalone file being the simpler option

Agree. However, I think we will have to add the display of the port
number to "pg_ctl status" too, for the sake of consistency [1].

[1]: https://www.postgresql.org/docs/current/app-pg-ctl.html

-- 
Best regards,
Aleksander Alekseev




Re: RFI: Extending the TOAST Pointer

2023-05-18 Thread Aleksander Alekseev
Hi Nikita,

> this part of the PostgreSQL screams to be revised and improved

I completely agree. The problem with TOAST pointers is that they are
not extendable at the moment which prevents adding new compression
algorithms (e.g. ZSTD), new features like compression dictionaries
[1], etc. I suggest we add extensibility in order to solve this
problem for the foreseeable future for everyone.

> where Custom TOAST Pointer is distinguished from Regular one by va_flag field
> which is a part of varlena header

I don't think that varlena header is the best place to distinguish a
classical TOAST pointer from an extended one. On top of that I don't
see any free bits that would allow adding such a flag to the on-disk
varlena representation [2].

The current on-disk TOAST pointer representation is following:

```
typedef struct varatt_external
{
int32 va_rawsize; /* Original data size (includes header) */
uint32 va_extinfo; /* External saved size (without header) and
  * compression method */
Oid va_valueid; /* Unique ID of value within TOAST table */
Oid va_toastrelid; /* RelID of TOAST table containing it */
} varatt_external;
```

Note that currently only 2 compression methods are supported:

```
typedef enum ToastCompressionId
{
TOAST_PGLZ_COMPRESSION_ID = 0,
TOAST_LZ4_COMPRESSION_ID = 1,
TOAST_INVALID_COMPRESSION_ID = 2
} ToastCompressionId;
```

I suggest adding a new flag that will mark an extended TOAST format:

```
typedef enum ToastCompressionId
{
TOAST_PGLZ_COMPRESSION_ID = 0,
TOAST_LZ4_COMPRESSION_ID = 1,
TOAST_RESERVED_COMPRESSION_ID = 2,
TOAST_HAS_EXTENDED_FORMAT = 3,
} ToastCompressionId;
```

For an extended format we add a varint (utf8-like) bitmask right after
varatt_external that marks the features supported in this particular
instance of the pointer. The rest of the data is interpreted depending
on the bits set. This will allow us to extend the pointers
indefinitely.
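
To sketch one plausible shape for such a bitmask (this encoding is an
assumption of mine, not a settled format): the high bit of each byte
signals that another bitmask byte follows, leaving 7 feature bits per
byte, so the flag space can grow byte by byte as features are added.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Decode a variable-length feature bitmask starting at p; returns the
 * number of bytes consumed and stores up to 63 feature bits in *features.
 */
static size_t
decode_feature_bitmask(const uint8_t *p, uint64_t *features)
{
    uint64_t val = 0;
    size_t   i = 0;

    do
    {
        val |= (uint64_t) (p[i] & 0x7F) << (7 * i);
    } while ((p[i++] & 0x80) != 0 && i < 9);

    *features = val;
    return i;
}
```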

Note that the proposed approach doesn't require running any
migrations. Note also that I described only the on-disk
representation. We can tweak the in-memory representation as we want
without affecting the end user.

Thoughts?

[1]: https://commitfest.postgresql.org/43/3626/
[2]: 
https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/include/postgres.h;h=0446daa0e61722067bb75aa693a92b38736e12df;hb=164d174bbf9a3aba719c845497863cd3c49a3ad0#l178


-- 
Best regards,
Aleksander Alekseev




RE: WL_SOCKET_ACCEPT fairness on Windows

2023-05-18 Thread Wei Wang (Fujitsu)
On Sat, April 1, 2023 at 11:00 AM Thomas Munro  wrote:
>
Hi,
Thanks for your patch.

I tried to test this patch on Windows, and I covered the new code path below:
```
+   /* We have another event to decode. */
+   cur_event = &set->events[next_pos + (rc - WAIT_OBJECT_0)];
```

But I have one thing I want to confirm:
In my tests, I think the scenario with two different events (e.g., one ending
socket connection and one incoming socket connection) has been optimized.
However, it seems that when there are multiple incoming socket connections, the
function WaitEventSetWaitBlock is called multiple times instead of being called
once. Is this the expected result?
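
For reference, here is a standalone sketch of the draining pattern I
understand the patch to be testing (illustrative only, not the patch's
actual code): after one blocking wait reports a signaled handle, the
remaining handles are polled with a zero timeout so that several
already-pending events can be decoded from a single wait.

```c
#include <windows.h>

static int
drain_signaled_handles(HANDLE *handles, DWORD nhandles, DWORD *signaled)
{
    int   count = 0;
    DWORD base = 0;
    DWORD rc = WaitForMultipleObjects(nhandles, handles, FALSE, INFINITE);

    while (rc < WAIT_OBJECT_0 + (nhandles - base))
    {
        DWORD pos = base + (rc - WAIT_OBJECT_0);

        signaled[count++] = pos;
        base = pos + 1;
        if (base >= nhandles)
            break;
        /* Zero timeout: pick up only events that are already pending. */
        rc = WaitForMultipleObjects(nhandles - base, handles + base,
                                    FALSE, 0);
    }
    return count;               /* number of handles found signaled */
}
```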

Here are my test details:
I use the debugger to ensure that multiple events occur when the function
WaitEventSetWaitBlock is called. First, I added a breakpoint in the below code:
```
/*
* Sleep.
*
* Need to wait for ->nevents + 1, because signal handle is in [0].
*/
b   rc = WaitForMultipleObjects(set->nevents + 1, set->handles, FALSE,
                                cur_timeout);
```
And then make sure that the postmaster process enters the function
WaitForMultipleObjects. (I think the postmaster process will only return from
the function WaitForMultipleObjects when any object is signaled or timeout
occurs). Before the timeout occurs, I set up multiple socket connections using
psql (the first connection makes the process return from the function
WaitForMultipleObjects). Then, as I continued debugging, multiple socket
connections were handled by different calls of the function
WaitEventSetWaitBlock.

Please let me know if I am missing something.

Regards,
Wang wei


Re: Should CSV parsing be stricter about mid-field quotes?

2023-05-18 Thread Joel Jacobson
On Thu, May 18, 2023, at 08:35, Pavel Stehule wrote:
> Maybe there is another third implementation in Libre Office.
>
> Generally TSV is not well specified, and then the implementations are not 
> consistent.

Thanks Pavel, that was a very interesting case indeed:

Libre Office (tested on Mac) doesn't have a separate TSV format,
but its CSV format allows specifying custom "Field delimiter" and
"String delimiter".

How peculiar, in Libre Office, when trying to write double quotation marks
(using Shift+2 on my keyboard) you actually don't get the normal double
quotation marks, but some special type of Unicode-quoting,
e2 80 9c ("LEFT DOUBLE QUOTATION MARK") and
e2 80 9d ("RIGHT DOUBLE QUOTATION MARK"),
and in the .CSV file you get the normal double quotation marks as
"String delimiter":

a,b,c,d,e
unquoted,“this field is quoted”,this “word” is quoted,"field with , 
comma",field with  tab

So, my "this field is quoted" experiment was exported unquoted since their
quotation marks don't need to be quoted.


Re: Assert failure of the cross-check for nullingrels

2023-05-18 Thread Richard Guo
On Thu, May 18, 2023 at 3:42 AM Tom Lane  wrote:

> ... BTW, something I'd considered in an earlier attempt at fixing this
> was to change clause_is_computable_at's API to pass the clause's
> RestrictInfo not just the clause_relids, along the lines of
>
> @@ -541,9 +547,10 @@ extract_actual_join_clauses(List *restrictinfo_list,
>   */
>  bool
>  clause_is_computable_at(PlannerInfo *root,
> -Relids clause_relids,
> +RestrictInfo *rinfo,
>  Relids eval_relids)
>  {
> +Relidsclause_relids = rinfo->clause_relids;
>  ListCell   *lc;
>
>  /* Nothing to do if no outer joins have been performed yet. */
>
> with corresponding simplifications at the call sites.  That was with
> a view to examining has_clone/is_clone inside this function.  My
> current proposal doesn't require that, but I'm somewhat tempted
> to make this API change anyway for future-proofing purposes.
> Thoughts?


This change looks good to me.

Thanks
Richard


Re: Assert failure of the cross-check for nullingrels

2023-05-18 Thread Richard Guo
On Thu, May 18, 2023 at 3:34 AM Tom Lane  wrote:

> After some poking at it I hit on what seems like a really simple
> solution: we should be checking syn_righthand not min_righthand
> to see whether a Var should be considered nullable by a given OJ.
> Maybe that's still not quite right, but it seems like it might be
> right given that the last fix reaffirmed our conviction that Vars
> should be marked according to the syntactic structure.


I thought about this solution before but proved it was not right in
https://www.postgresql.org/message-id/CAMbWs48fObJJ%3DYVb4ip8tnwxwixUNKUThfnA1eGfPzJxJRRgZQ%40mail.gmail.com

I checked the query shown there and it still fails with v3 patch.

explain (costs off)
select * from t1
left join (select t2.x from t2
  left join t3 on t2.x where t3.x is null) s
left join t4 on s.x
on s.x = t1.x;
server closed the connection unexpectedly

The failure happens when we are forming the join of (t1/t2) to t3.
Consider qual 't3.x is null'.  It's a non-clone filter clause so
clause_is_computable_at is supposed to think it's applicable here.  We
have an Assert for that.  However, when checking outer join t1/t2, which
has been performed but is not listed in the qual's nullingrels,
clause_is_computable_at would think it'd null vars of the qual if we
check syn_righthand not min_righthand, and conclude that the
qual is not applicable here.  This is how the Assert is triggered.

Thanks
Richard


Re: Should CSV parsing be stricter about mid-field quotes?

2023-05-18 Thread Pavel Stehule
čt 18. 5. 2023 v 8:01 odesílatel Joel Jacobson  napsal:

> On Thu, May 18, 2023, at 00:18, Kirk Wolak wrote:
> > Here you go. Not horrible handling.  (I use DataGrip so I saved it from
> > there directly as TSV, just for an extra datapoint).
> >
> > FWIW, if you copy/paste in windows, the data, the field with the tab gets
> > split into another column in Excel. But saving it as a file, and opening
> > it. Saving it as XLSX, and then having Excel save it as a TSV (versus
> > opening a text file, and saving it back)
>
> Very useful, thanks.
>
> Interesting, DataGrip contrary to Excel doesn't quote fields with commas
> in TSV.
> All the DataGrip/Excel TSV variants uses quoting when necessary,
> contrary to Google Sheets's TSV-format, that doesn't quote fields at all.
>

Maybe there is another third implementation in Libre Office.

Generally TSV is not well specified, and so the implementations are not
consistent.



>
> DataGrip/Excel terminate also the last record with newline,
> while Google Sheets omit the newline for the last record,
> (which is bad, since then a streaming reader wouldn't know
> if the last record is completed or not.)
>
> This makes me think we probably shouldn't add a new TSV format,
> since there is no consistency between vendors.
> It's impossible to deduce with certainty if a TSV-field that
> begins with a double quotation mark is quoted or unquoted.
>
> Two alternative ideas:
>
> 1. How about adding a `WITHOUT QUOTE` or `QUOTE NONE` option in conjunction
> with `COPY ... WITH CSV`?
>
> Internally, it would just set
>
> quotec = '\0';`
>
> so it would't affect performance at all.
>
> 2. How about adding a note on the complexities of dealing with TSV files
> in the
> COPY documentation?
>
> /Joel
>
>


Re: Should CSV parsing be stricter about mid-field quotes?

2023-05-18 Thread Joel Jacobson
On Thu, May 18, 2023, at 08:00, Joel Jacobson wrote:
> 1. How about adding a `WITHOUT QUOTE` or `QUOTE NONE` option in conjunction
> with `COPY ... WITH CSV`?

More ideas:
[ QUOTE 'quote_character' | UNQUOTED ]
or
[ QUOTE 'quote_character' | NO_QUOTE ]

Thinking about it, I recall another hack;
specifying a non-existing char as the delimiter to force the entire line into a
single column table. For that use-case, we could also provide an option that
would internally set:

delimc = '\0';

How about:

[DELIMITER 'delimiter_character' | UNDELIMITED ]
or
[DELIMITER 'delimiter_character' | NO_DELIMITER ]
or, if it should be more use-case-based and intuitive:
[DELIMITER 'delimiter_character' | WHOLE_LINE_AS_RECORD ]

/Joel

Re: Should CSV parsing be stricter about mid-field quotes?

2023-05-18 Thread Joel Jacobson
On Thu, May 18, 2023, at 00:18, Kirk Wolak wrote:
> Here you go. Not horrible handling.  (I use DataGrip so I saved it from there
> directly as TSV, just for an extra datapoint).
>
> FWIW, if you copy/paste in windows, the data, the field with the tab gets
> split into another column in Excel. But saving it as a file, and opening it.
> Saving it as XLSX, and then having Excel save it as a TSV (versus opening a
> text file, and saving it back)

Very useful, thanks.

Interesting, DataGrip, contrary to Excel, doesn't quote fields with commas in TSV.
All the DataGrip/Excel TSV variants use quoting when necessary,
contrary to Google Sheets's TSV format, which doesn't quote fields at all.

DataGrip/Excel also terminate the last record with a newline,
while Google Sheets omits the newline for the last record
(which is bad, since a streaming reader then wouldn't know
whether the last record is complete or not).

This makes me think we probably shouldn't add a new TSV format,
since there is no consistency between vendors.
It's impossible to deduce with certainty whether a TSV field that
begins with a double quotation mark is quoted or unquoted.

Two alternative ideas:

1. How about adding a `WITHOUT QUOTE` or `QUOTE NONE` option in conjunction
with `COPY ... WITH CSV`?

Internally, it would just set

quotec = '\0';

so it wouldn't affect performance at all.

2. How about adding a note on the complexities of dealing with TSV files in the
COPY documentation?

/Joel