Re: Added missing invalidations for all tables publication

2021-09-05 Thread Amit Kapila
On Tue, Aug 31, 2021 at 8:54 PM vignesh C  wrote:
>
> On Tue, Aug 31, 2021 at 2:00 PM Kyotaro Horiguchi
>  wrote:
> >
>
> Thanks for the comments, the attached v3 patch has the changes for the same.
>

I think this bug should be fixed in the back branches (down to v10).
OTOH, as this is not reported by any user and we found it during code
review, it seems either users don't have an exact use case or they
haven't noticed it yet. What do you think about
back-patching?

Attached, please find a slightly updated patch with minor changes. I
have adjusted the test to make it more meaningful. If we
decide to back-patch this, can you please test it on the back branches?
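For context, a sketch of the kind of scenario the patch addresses (table name, session layout, and publication name are invented for illustration; the point is that a session whose relcache entry was built before the publication existed must be invalidated):

```sql
-- Session 1: build a relcache entry for the table by using it.
CREATE TABLE tab1 (a int);
INSERT INTO tab1 VALUES (1);

-- Session 2: publish every table.
CREATE PUBLICATION pub_all FOR ALL TABLES;

-- Session 1 again: without the relcache invalidation added by the patch,
-- this session's cached publication information for tab1 can be stale,
-- so this change may not be recognized as published when decoded.
INSERT INTO tab1 VALUES (2);
```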

-- 
With Regards,
Amit Kapila.


Attachment: v4-0001-Invalidate-relcache-for-publications-defined-for-.patch


Re: Skipping logical replication transactions on subscriber side

2021-09-05 Thread Masahiko Sawada
On Sat, Sep 4, 2021 at 12:24 PM Amit Kapila  wrote:
>
> On Fri, Sep 3, 2021 at 2:15 AM Mark Dilger  
> wrote:
> >
> > > On Aug 30, 2021, at 12:06 AM, Masahiko Sawada  
> > > wrote:
> > >
> > > I've attached rebased patches.
> > For the v12-0003 patch:
> >
> > I believe this feature is needed, but it also seems like a very powerful 
> > foot-gun.  Can we do anything to make it less likely that users will hurt 
> > themselves with this tool?
> >
>
> This won't do any more harm than what is currently possible: users can
> already do this via pg_replication_slot_advance, and the same is
> documented as well, see [1]. This will be allowed only for superusers.
> Its effect will be documented with a precautionary note to use it only
> when the other available ways can't be used.

Right.
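For reference, the existing escape hatch mentioned above is the built-in function pg_replication_slot_advance(); a minimal sketch of its use on the publisher (slot name and LSN are invented values):

```sql
-- Advance the subscription's replication slot past the problematic
-- transaction, so its changes are never sent to the subscriber:
SELECT pg_replication_slot_advance('mysub_slot', '0/1573658');
```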

>
> > I am thinking back to support calls I have attended.  When a production 
> > system is down, there is often some hesitancy to perform ad-hoc operations 
> > on the database, but once the decision has been made to do so, people try 
> > to get the whole process done as quickly as possible.  If multiple 
> > transactions on the publisher fail on the subscriber, they will do so in 
> > series, not in parallel.
> >
>
> The subscriber will know only one transaction failure at a time; until
> that is resolved, the apply won't move ahead, and it won't know whether
> there are other transactions that are going to fail in the future.
>
> >
> > If the user could instead clear all failed transactions of the same type, 
> > that might make it less likely that they unthinkingly also skip subsequent 
> > errors of some different type.  Perhaps something like ALTER SUBSCRIPTION 
> > ... SET (skip_failures = 'duplicate key value violates unique constraint 
> > "test_pkey"')?
> >
>
> I think if we want, we can allow skipping a particular error via
> skip_error_code instead of via the error message, but I'm not sure it
> would be better to skip a particular operation of the transaction
> rather than the entire transaction. Normally, for atomicity purposes, a
> transaction can be either committed or rolled back but not partially
> done, so I think it would be preferable to skip the entire transaction
> rather than skipping it partially.

I think the suggestion by Mark is to skip the entire transaction if
the kind of error matches the specified error.

I think my proposed feature is meant to be a tool to cover situations
where something that should not happen has happened, rather than a
conflict resolution mechanism. If users fall into a difficult
situation where they need to skip a lot of transactions with this
skip_xid feature, they should rebuild the logical replication setup
from scratch. It seems to me that skipping all transactions that failed
due to the same type of failure is problematic, for example, if
the user forgets to reset it. If we want to skip a particular
operation that failed due to the specified error, we should have a
proper conflict resolution feature that can handle various types of
conflicts with various types of resolution methods, as other RDBMSs
do.
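For illustration, usage of the proposed option might look like the following (the exact option name and syntax are still under discussion in this thread, so treat this as a sketch, not the patch's final interface):

```sql
-- Skip the transaction with the XID reported in the apply worker's error:
ALTER SUBSCRIPTION mysub SET (skip_xid = '740');

-- ... and, per the concern above, reset it once the failing transaction
-- has been skipped, so later failures are not skipped unintentionally:
ALTER SUBSCRIPTION mysub RESET (skip_xid);
```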

>
> >  This is arguably a different feature request, and not something your patch 
> > is required to address, but I wonder how much we should limit people 
> > shooting themselves in the foot?  If we built something like this using 
> > your skip_xid feature, rather than instead of your skip_xid feature, would 
> > your feature need to be modified?
> >
>
> Sawada-San can answer better but I don't see any problem building any
> such feature on top of what is currently proposed.

If the feature you proposed is to skip the entire transaction, I also
don't see any problem building that feature on top of my patch. The
patch adds the mechanism to skip an entire transaction, so what we
would need to do for that feature is to extend how the skipping
behavior is triggered.

>
> >
> > I'm having trouble thinking of an example conflict where skipping a 
> > transaction would be better than writing a BEFORE INSERT trigger on the 
> > conflicting table which suppresses or redirects conflicting rows somewhere 
> > else.  Particularly for larger transactions containing multiple statements, 
> > suppressing the conflicting rows using a trigger would be less messy than 
> > skipping the transaction.  I think your patch adds a useful tool to the 
> > toolkit, but maybe we should mention more alternatives in the docs?  
> > Something like, "changing the data on the subscriber so that it doesn't 
> > conflict with incoming changes, or dropping the conflicting constraint or 
> > unique index, or writing a trigger on the subscriber to suppress or 
> > redirect conflicting incoming changes, or as a last resort, by skipping the 
> > whole transaction"?
> >
>
> +1 for extending the docs as per this suggestion.

Agreed. I'll add such a description to the docs.
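To make the trigger-based alternative discussed above concrete, here is a sketch on the subscriber (the thread only mentions a table "test" with constraint "test_pkey"; the column name `id` and function name are invented):

```sql
-- Hypothetical subscriber-side trigger that silently drops incoming rows
-- which would violate the primary key, instead of letting apply fail:
CREATE OR REPLACE FUNCTION suppress_conflicting_rows() RETURNS trigger AS $$
BEGIN
    IF EXISTS (SELECT 1 FROM test WHERE id = NEW.id) THEN
        RETURN NULL;            -- suppress the conflicting incoming row
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER suppress_conflicts
    BEFORE INSERT ON test
    FOR EACH ROW EXECUTE FUNCTION suppress_conflicting_rows();

-- By default, triggers do not fire for rows applied by logical replication;
-- this makes the trigger fire for the subscription's apply worker as well:
ALTER TABLE test ENABLE ALWAYS TRIGGER suppress_conflicts;
```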

Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

2021-09-05 Thread Dilip Kumar
On Mon, Sep 6, 2021 at 1:58 AM Andres Freund  wrote:

> On 2021-09-05 14:22:51 +0530, Dilip Kumar wrote:

> > But these directly operate on the buffers and in my patch, whether we
> > are reading pg_class for identifying the relfilenode or we are copying
> > the relation block by block, we are always holding the lock on the
> > buffer.
>
> I don't think a buffer lock is really sufficient. See e.g. code like:
>

I agree that the buffer lock alone is not sufficient, but here we are
talking about the case where we are already holding the exclusive lock on
the database plus the buffer lock.  So cases like the one below, which
should be called only from a relation drop, must be protected by the
database-level exclusive lock, and other cases like buffer
reclaim/checkpointing should be protected by the buffer pin + lock.
Having said that, I am not against the point that we should acquire the
relation lock in our case.  I agree that if there is an assumption that
to hold a buffer pin we must be holding the relation lock, then it is
better not to break that.


> static void
> InvalidateBuffer(BufferDesc *buf)
> {
> ...
> 	/*
> 	 * We assume the only reason for it to be pinned is that someone else is
> 	 * flushing the page out.  Wait for them to finish.  (This could be an
> 	 * infinite loop if the refcount is messed up... it would be nice to time
> 	 * out after awhile, but there seems no way to be sure how many loops may
> 	 * be needed.  Note that if the other guy has pinned the buffer but not
> 	 * yet done StartBufferIO, WaitIO will fall through and we'll effectively
> 	 * be busy-looping here.)
> 	 */
> 	if (BUF_STATE_GET_REFCOUNT(buf_state) != 0)
> 	{
> 		UnlockBufHdr(buf, buf_state);
> 		LWLockRelease(oldPartitionLock);
> 		/* safety check: should definitely not be our *own* pin */
> 		if (GetPrivateRefCount(BufferDescriptorGetBuffer(buf)) > 0)
> 			elog(ERROR, "buffer is pinned in InvalidateBuffer");
> 		WaitIO(buf);
> 		goto retry;
> 	}
>
> IOW, currently we assume that you're only allowed to pin a block in a
> relation while you hold a lock on the relation. It might be a good idea
> to change that, but it's not as trivial as one might think - consider
> e.g. dropping a relation's buffers while holding an exclusive lock: If
> there's potential concurrent reads of that buffer we'd be in trouble.
>

> > 3. While copying the relation, whether to use the bufmgr or directly
> > use the smgr?
> >
> > If we use the bufmgr then maybe we can avoid flushing some of the
> > buffers to the disk and save some I/O but in general we copy from the
> > template database so there might not be a lot of dirty buffers and we
> > might not save anything.
>
> I would assume the big benefit would be that the *target* database does
> not have to be written out / shared buffer is immediately populated.

Okay, that makes sense.  In fact, for using the shared buffers for the
destination database's relations we don't even have the locking issue,
because that database is not yet accessible to anyone, right?

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


Re: Postgres perl module namespace

2021-09-05 Thread Michael Paquier
On Sat, Sep 04, 2021 at 09:58:08AM -0400, Andrew Dunstan wrote:
> On 9/4/21 2:19 AM, Noah Misch wrote:
>> plperl uses PostgreSQL:: as the first component of its Perl module namespace.
>> We shouldn't use both PostgreSQL:: and Postgres:: in the same source tree, so
>> this change should not use Postgres::. 
>
> Good point. Here's the same thing using PostgreSQL::Test

A minor point: this introduces PostgreSQL::Test::PostgresVersion.
Could this be stripped down to PostgreSQL::Test::Version instead?
--
Michael




RE: [PATCH] Add tab-complete for backslash commands

2021-09-05 Thread tanghy.f...@fujitsu.com
On Sunday, September 5, 2021 1:42 AM, Tom Lane  wrote:
>I particularly question why we'd offer both
>single- and multiple-character versions, as the single-character
>version seems entirely useless from a completion standpoint.

I generally agree with your opinion. But I'm not sure whether there are
users who'd like to see the list of backslash commands and choose one to
use. I mean, someone may type '\[tab][tab]' to check all supported
backslash commands.
postgres=# \
Display all 105 possibilities? (y or n)
\! \dc \dL \dx \h \r
...

In the above scenario, both the single- and multiple-character versions
could be helpful, don't you think?

Regards,
Tang


RE: Added schema level support for publication.

2021-09-05 Thread tanghy.f...@fujitsu.com
On Thursday, September 2, 2021 2:36 PM vignesh C  wrote:
> 
> On Wed, Sep 1, 2021 at 11:14 AM tanghy.f...@fujitsu.com
>  wrote:
> >
> > 3. When using '\dn+', I noticed that the list of publications only
> > contains the publications for "SCHEMA"; "FOR ALL TABLES" publications
> > are not shown. Is it designed on purpose?
> > (The result of '\d+' lists the publications of "SCHEMA" and "FOR ALL
> > TABLES".)
> >
> > For example:
> > create schema sch1;
> > create table sch1.tbl(a int);
> > create publication pub_schema for all tables in schema sch1;
> > create publication pub_all_tables for all tables;
> 
> I'm not sure if it is intentional or not, Do you want to post the
> question in a separate thread and see if that should be handled?
> 

Sorry, maybe I didn't make my last question clear enough.

In HEAD (where schema level support is not available for publications),
there is no publication information in the result of '\dn+'.

With this schema patch, '\dn+' shows the publications related to the
schema, but ALL TABLES publications are not shown. Do you think we
should add ALL TABLES publications, too?

Regards
Tang


Re: pg_receivewal starting position

2021-09-05 Thread Michael Paquier
On Fri, Sep 03, 2021 at 11:58:27AM +0200, Ronan Dunklau wrote:
> 
> 0002 for pg_receivewal tried to simplify the logic of information to
> return, by using a dedicated struct for this. This accounts for
> Bharath's demands to return every possible field.
> In particular, an is_logical field simplifies the detection of the type
> of slot.
> In my opinion it makes sense to simplify it like this on the client
> side while being more open-minded on the server side if we ever need to
> provide a new type of slot. Also, GetSlotInformation now returns an
> enum to be able to handle the different modes of failures, which differ
> between pg_receivewal and pg_basebackup.

+   if (PQserverVersion(conn) < 15)
+   return READ_REPLICATION_SLOT_UNSUPPORTED;
[...]
+typedef enum {
+   READ_REPLICATION_SLOT_OK,
+   READ_REPLICATION_SLOT_UNSUPPORTED,
+   READ_REPLICATION_SLOT_ERROR,
+   READ_REPLICATION_SLOT_NONEXISTENT
+} ReadReplicationSlotStatus;

Do you really need this much complication?  We could treat the
unsupported case and the non-existent case the same way: we don't know,
so just assign InvalidXLogRecPtr or NULL to the fields of the
structure, and make GetSlotInformation() return false only on error,
with some pg_log_error() where appropriate in its internals.

> There is still the concern raised by Bharath about being able to select
> the way to fetch the replication slot information for the user, and
> what should or should not be included in the documentation. I am in
> favor of documenting the process of selecting the WAL start position,
> and maybe including a --start-lsn option for the user to override it,
> but that's maybe for another patch.

The behavior of pg_receivewal that you are implementing should be
documented.  We also don't say how the start location is selected
when working on an existing directory, so I would recommend adding a
paragraph in the description section to detail all that, as in:
- First, a scan of the existing archives is done.
- If nothing is found, and if there is a slot, request the slot
information.
- If still nothing (aka slot undefined, or command not supported), use
the last flush location.

As a whole, I am not really convinced that we need a new option for
that as long as we rely on a slot with pg_receivewal as these are used
to make sure that we avoid holes in the WAL archives.

Regarding pg_basebackup, Daniel has proposed a couple of days ago a
different solution to trap errors earlier, which would cover the case
dealt with here:
https://www.postgresql.org/message-id/0f69e282-97f9-4db7-8d6d-f927aa634...@yesql.se
We should not mimic in the frontend errors that are safely trapped in
the backend with the proper locks, in any case.

While on it, READ_REPLICATION_SLOT returns a confirmed LSN when
grabbing the data of a logical slot.  We are not going to use that
with pg_recvlogical as by default START_STREAMING 0/0 would just use
the confirmed LSN.  Do we have use cases where this information would
help?  There is the argument of consistency with physical slots and
that this can be helpful to do sanity checks for clients, but that's
rather thin.
--
Michael




Re: Estimating HugePages Requirements?

2021-09-05 Thread Bossart, Nathan
On 9/5/21, 7:28 PM, "Michael Paquier"  wrote:
> On Fri, Sep 03, 2021 at 05:36:43PM +, Bossart, Nathan wrote:
>> On 9/2/21, 6:46 PM, "Michael Paquier"  wrote:
>>> 0001, that refactors the calculation of the shmem size into a
>>> different routine, is fine as-is, so I'd like to move on and apply
>>> it.
>> 
>> Sounds good to me.
>
> Applied this one.

Thanks!

> Without concluding on 0002 yet, another thing that we could do is to
> just add the GUCs.  These sound rather useful on their own (mixed
> feelings about huge_pages_required but I can see why it is useful to
> avoid the setup steps and the backend already grabs this information),
> particularly when it comes to cloned setups that share a lot of
> properties.

I think this is a good starting point, but I'd like to follow up on
making them visible without completely starting the server.  The main
purpose for adding these GUCs is to be able to set up huge pages
before server startup.  Disallowing "-C huge_pages_required" on a
running server to enable this use-case seems like a modest tradeoff.

Anyway, I'll restructure the remaining patches to add the GUCs first
and then address the 0002 business separately.

Nathan



RE: Allow escape in application_name

2021-09-05 Thread kuroda.hay...@fujitsu.com
Dear Ikeda-san,

> I agree that this feature is useful. Thanks for working on this.

Thanks :-).

> 1)
> 
> Why does postgres_fdw.application_name override the server's option?
> 
> > +  establishes a connection to a foreign server.  This overrides
> > +  application_name option of the server object.
> > +  Note that change of this parameter doesn't affect any existing
> > +  connections until they are re-established.
> 
> My expected behavior was that application_name option of server object
> overrides the GUC because other GUCs seem to behave so. For example,
> log_statement in postgresql.conf can be overridden by ALTER ROLE for
> only a specific user. Should the narrower-scope setting take precedence?

Hmm... I didn't know the policy for other options, thank you.
I gave the GUC higher priority for the following reason:
if the server option took precedence, users who set the application
name as a server option and create remote connections from the same
backend process would always get the same name even if the GUC is set.
In that case users would have to execute ALTER SERVER in every session,
which I think is inconvenient.

I think both approaches are reasonable, so I also want to ask a
committer. Fujii-san, what do you think?
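To illustrate the precedence being discussed (the GUC is the one proposed in this patch, so this is a sketch of the patch's semantics; 'loopback' is an invented server name):

```sql
-- Existing per-server option:
ALTER SERVER loopback OPTIONS (ADD application_name 'per_server_name');

-- Proposed GUC; under the current patch it overrides the server option,
-- and it can be changed per session without running ALTER SERVER:
SET postgres_fdw.application_name TO 'per_session_name';
```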

> 2)
> 
> Is it better to specify that we can use the escaped format
> for application_name option of the server object in the document?
> I think it's new feature too.

Yeah, I agree. I will add it in the `Connection Options` section in the
0002 patch. Is that OK, or do we have a better place?

> 3)
> 
> Is the following expected behavior?
> 
> * 1. [coordinator] configure "postgres_fdw.application_name" as "[%d]".
> * 2. [coordinator] connect to a *Non-ASCII* database.
> * 3. [coordinator] execute queries for the remote server.
> * 4. [remote] execute the following query.
> 
> ```
> postgres(2654193)=# SELECT application_name FROM pg_stat_activity WHERE
> backend_type = 'client backend'  ;
>   application_name
> ---
>   psql
>   [?]
> ```
> 
> One of the difference between log_line_prefix and application_name
> is that whether it only allow clean ASCII chars or not. So, the above
> behavior is expected.
> 
> I think there are three choices for handling the above case.
> 
> 1) Accept the above case because Non-ASCII is rarely used(?), and
> describe the limitation in the document.
> 2) Add a new escape character for the OID of the database.
> 3) Relax the limitation of application_name and make it accept
> Non-ASCII.
> 
> As a user, 3) is best for me.
> But that seems out of the scope of this thread, so will you select 1)?

Actually, I had not considered such a situation...
But I want to choose (1), because the limitation is caused by the libpq
layer and the modification is almost unrelated to this patch.
I'm not sure why application_name has such a limitation,
so at first we should investigate it. What do you think?


Best Regards,
Hayato Kuroda
FUJITSU LIMITED





Re: Unused variable in TAP tests file

2021-09-05 Thread Amul Sul
Thank you!

Regards,
Amul

On Mon, Sep 6, 2021 at 7:58 AM Michael Paquier  wrote:
>
> On Fri, Sep 03, 2021 at 04:03:36PM +0900, Michael Paquier wrote:
> > Indeed.  Let's clean up that.  Thanks!
>
> And done.
> --
> Michael




Re: Column Filtering in Logical Replication

2021-09-05 Thread Amit Kapila
On Sat, Sep 4, 2021 at 8:11 PM Alvaro Herrera  wrote:
>
> On 2021-Sep-04, Amit Kapila wrote:
>
> > On Thu, Sep 2, 2021 at 2:19 PM Alvaro Herrera  
> > wrote:
> > >
> > > On 2021-Sep-02, Rahila Syed wrote:
> > >
> > > > After thinking about this, I think it is best to remove the entire table
> > > > from publication,
> > > > if a column specified in the column filter is dropped from the table.
> > >
> > > Hmm, I think it would be cleanest to give responsibility to the user: if
> > > the column to be dropped is in the filter, then raise an error, aborting
> > > the drop.
> >
> > Do you think that will make sense if the user used Cascade (Alter
> > Table ... Drop Column ... Cascade)?
>
> ... ugh.  Since CASCADE is already defined to be a potentially-data-loss
> operation, then that may be acceptable behavior.  For sure the default
> RESTRICT behavior shouldn't do it, though.
>

That makes sense to me.

-- 
With Regards,
Amit Kapila.




Re: PG Docs - CREATE SUBSCRIPTION option list order

2021-09-05 Thread Amit Kapila
On Sun, Sep 5, 2021 at 12:23 AM Tom Lane  wrote:
>
> Amit Kapila  writes:
> > On Mon, Apr 19, 2021 at 10:32 AM Peter Smith  wrote:
> >> Yes, if there were dozens of list items then I would agree that they
> >> should be grouped somehow. But there aren't.
>
> > I think this list is going to grow in the future as we enhance this
> > subsystem. For example, the pending 2PC patch will add another
> > parameter to this list.
>
> Well, we've got nine now; growing to ten wouldn't be a lot.  However,
> I think that grouping the options would be helpful because the existing
> presentation seems extremely confusing.  In particular, I think it might
> help to separate the options that only determine what happens during
> CREATE SUBSCRIPTION from those that control how replication behaves later.
>

+1. I think we can group them as (a) create_slot, slot_name, enabled,
connect, and (b) copy_data, synchronous_commit, binary, streaming,
two_phase. The former control what happens during CREATE SUBSCRIPTION
and the latter control the replication behavior afterwards.

> (Are the latter set the same ones that are shared with ALTER
> SUBSCRIPTION?)
>

If we agree with the above categorization then not all of them fall
into the latter category.
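As an illustration of the grouping above, using options documented today (illustrative names and values; two_phase is from the pending patch and is omitted):

```sql
CREATE SUBSCRIPTION mysub
    CONNECTION 'host=publisher dbname=postgres'
    PUBLICATION mypub
    WITH (
        -- group (a): only affect what CREATE SUBSCRIPTION itself does
        connect = true, create_slot = true, slot_name = 'mysub', enabled = true,
        -- group (b): control how replication behaves afterwards
        copy_data = true, synchronous_commit = 'off', binary = false,
        streaming = false
    );
```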

-- 
With Regards,
Amit Kapila.




Re: Don't clean up LLVM state when exiting in a bad way

2021-09-05 Thread Justin Pryzby
On Wed, Aug 18, 2021 at 03:00:59PM +, Jelte Fennema wrote:
> I ran into some segfaults when using Postgres that was compiled with LLVM 7. 
> According to the backtraces these crashes happened during the call to 
> llvm_shutdown, during cleanup after another out of memory condition. It seems 
> that calls to LLVMOrcDisposeInstance, can crash (at least on LLVM 7) when 
> LLVM is left in bad state. I attached the relevant part of the stacktrace to 
> this email.
> 
> With the attached patch these segfaults went away. The patch turns 
> llvm_shutdown into a no-op whenever the backend is exiting with an error. 
> Based on my understanding of the code this should be totally fine. No memory 
> should be leaked, since all memory will be cleaned up anyway once the backend 
> exits shortly after. The only reason this cleanup code even seems to exist at 
> all is to get useful LLVM profiling data. To me it seems be acceptable if the 
> profiling data is incorrect/missing when the backend exits with an error.

Andres, could you comment on this?

This seems to explain the crash I reported to you when testing your WIP patches
for the JIT memory leak.  I realize now that the crash happens without your
patches.
https://www.postgresql.org/message-id/20210419164130.byegpfrw46mza...@alap3.anarazel.de

I can reproduce the crash on master (not just v13, as I said before) compiled
on centos7, with:
LLVM_CONFIG=/usr/lib64/llvm7.0/bin/llvm-config 
CLANG=/opt/rh/llvm-toolset-7.0/root/usr/bin/clang

I cannot reproduce the crash after applying Jelte's patch.

I couldn't crash on ubuntu either, so maybe they have a patch which fixes this,
or maybe RH applied a patch which caused it...

postgres=# CREATE TABLE t AS SELECT i FROM generate_series(1,99)i; VACUUM ANALYZE t;
postgres=# SET client_min_messages=debug; SET statement_timeout=333;
postgres=# SET jit_above_cost=0; SET jit_optimize_above_cost=-1; SET jit_inline_above_cost=-1;
postgres=# explain analyze SELECT sum(i) FROM t a NATURAL JOIN t b;
2021-09-05 22:47:12.807 ADT client backend[7563] psql ERROR:  canceling statement due to statement timeout
2021-09-05 22:47:12.880 ADT postmaster[7272] LOG:  background worker "parallel worker" (PID 8212) was terminated by signal 11: Segmentation fault

-- 
Justin




Re: Unused variable in TAP tests file

2021-09-05 Thread Michael Paquier
On Fri, Sep 03, 2021 at 04:03:36PM +0900, Michael Paquier wrote:
> Indeed.  Let's clean up that.  Thanks!

And done.
--
Michael




Re: Estimating HugePages Requirements?

2021-09-05 Thread Michael Paquier
On Fri, Sep 03, 2021 at 05:36:43PM +, Bossart, Nathan wrote:
> On 9/2/21, 6:46 PM, "Michael Paquier"  wrote:
> You bring up an interesting point about _PG_init().  Presently, you
> can safely assume that the data directory is locked during _PG_init(),
> so there's no need to worry about breaking something on a running
> server.  I don't know how important this is.  Most _PG_init()
> functions that I've seen will define some GUCs, request some shared
> memory, register some background workers, and/or install some hooks.
> Those all seem like safe things to do, but I wouldn't be at all
> surprised to hear examples to the contrary.  In any case, it looks
> like the current ordering of these two steps has been there for 15+
> years.

Yeah.  What you are describing here matches what I have seen in the
past and what we do in core for _PG_init().  Now extension developers
could do fancier things, like dropping things on-disk to check the
load state, for whatever reason.  And things could break in such
cases.  Perhaps people should not do that, but it is no fun either to
break code that has been working for years, even if that's just a major
upgrade.

+	 * We skip this step if we are just going to print a GUC's value and
+	 * exit a few steps down.
 	 */
-	CreateDataDirLockFile(true);
+	if (output_config_variable == NULL)
+		CreateDataDirLockFile(true);

Anyway, 0002 gives me shivers.

> If this is a concern, one option would be to disallow running "-C
> shared_memory_size" on running servers.  That would have to extend to
> GUCs like data_checksums and huge_pages_required, too.

Just noting this bit from 0003 that would break without 0002:
-$ pmap 4170 | awk '/rw-s/  /zero/ {print $2}'
-6490428K
+$ postgres -D $PGDATA -C shared_memory_size
+6339

>> 0001, that refactors the calculation of the shmem size into a
>> different routine, is fine as-is, so I'd like to move on and apply
>> it.
> 
> Sounds good to me.

Applied this one.

Without concluding on 0002 yet, another thing that we could do is to
just add the GUCs.  These sound rather useful on their own (mixed
feelings about huge_pages_required but I can see why it is useful to
avoid the setup steps and the backend already grabs this information),
particularly when it comes to cloned setups that share a lot of
properties.
--
Michael




Re: Allow escape in application_name

2021-09-05 Thread Masahiro Ikeda

Hi, Kuroda-san, Fujii-san

I agree that this feature is useful. Thanks for working on this.
I've tried to use the patches and read them. I have some questions.

1)

Why does postgres_fdw.application_name override the server's option?


+  establishes a connection to a foreign server.  This overrides
+  application_name option of the server object.
+  Note that change of this parameter doesn't affect any existing
+  connections until they are re-established.


My expected behavior was that the application_name option of the server
object overrides the GUC because other GUCs seem to behave so. For
example, log_statement in postgresql.conf can be overridden by ALTER
ROLE for only a specific user. Should the narrower-scope setting take
precedence?

2)

Is it better to specify that we can use the escaped format
for application_name option of the server object in the document?
I think it's new feature too.

3)

Is the following expected behavior?

* 1. [coordinator] configure "postgres_fdw.application_name" as "[%d]".
* 2. [coordinator] connect to a *Non-ASCII* database.
* 3. [coordinator] execute queries for the remote server.
* 4. [remote] execute the following query.

```
postgres(2654193)=# SELECT application_name FROM pg_stat_activity WHERE 
backend_type = 'client backend'  ;

 application_name
---
 psql
 [?]
```

One of the difference between log_line_prefix and application_name
is that whether it only allow clean ASCII chars or not. So, the above
behavior is expected.

I think there are three choices for handling the above case.

1) Accept the above case because Non-ASCII is rarely used(?), and
   describe the limitation in the document.
2) Add a new escape character for the OID of the database.
3) Relax the limitation of application_name and make it accept
   Non-ASCII.


As a user, 3) is best for me.
But that seems out of the scope of this thread, so will you select 1)?

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION




Re: storing an explicit nonce

2021-09-05 Thread Sasasu



On 2021/9/5 10:51 PM, Sasasu wrote:

> For AES-GCM, a predictable IV is fine. I think we can decrypt and
> re-encrypt the user data in pg_upgrade; this would allow us to use
> relfilenode OID + block number as the nonce.

Relfilenode OID + block number + some counter for the heap table IV, I
mean.




RE: Allow escape in application_name (was: [postgres_fdw] add local pid to fallback_application_name)

2021-09-05 Thread kuroda.hay...@fujitsu.com
Dear Fujii-san,

Thank you for updating! Your modification is very interesting and
I learn something new.

> Attached is the updated version of the patch. I removed the test
> for case (1). And I arranged the regression tests so that they are based
> on debug_discard_caches, to simplify them. Also I added and updated
> some comments and docs. Could you review this version?

I agree removing (1) because it is obvious case.

```diff
+-- If appname is set both as GUC and as options of server object,
+-- the GUC setting overrides appname of server object and is used.
+SET postgres_fdw.application_name TO 'fdw_guc_appname';
+SELECT 1 FROM postgres_fdw_disconnect_all();
+ ?column? 
+--
+1
+(1 row)
+
+SELECT 1 FROM ft1 LIMIT 1;
+ ?column? 
+--
+1
+(1 row)
+
+SELECT application_name FROM pg_stat_activity
+   WHERE application_name IN ('loopback2', 'fdw_guc_appname');
+ application_name 
+--
+ fdw_guc_appname
+(1 row)
```

I think we should SELECT from ft6, because the foreign server
'loopback' doesn't have the application_name server option.

I have no comments anymore.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED





RE: Skipping logical replication transactions on subscriber side

2021-09-05 Thread houzj.f...@fujitsu.com
From Sun, Sep 5, 2021 9:58 PM Masahiko Sawada :
> On Thu, Sep 2, 2021 at 8:37 PM houzj.f...@fujitsu.com 
>  wrote:
> >
> > From Mon, Aug 30, 2021 3:07 PM Masahiko Sawada  
> > wrote:
> > > I've attached rebased patches. 0004 patch is not the scope of this
> > > patch. It's borrowed from another thread[1] to fix the assertion
> > > failure for newly added tests. Please review them.
> >
> > Hi,
> >
> > I reviewed the 0002 patch and have a suggestion for it.
> @@ -3672,11 +3671,12 @@ typedef enum AlterSubscriptionType
> typedef struct AlterSubscriptionStmt
> {
>     NodeTag     type;
> -   AlterSubscriptionType kind; /* ALTER_SUBSCRIPTION_SET_OPTIONS, etc */
> +   AlterSubscriptionType kind; /* ALTER_SUBSCRIPTION_OPTIONS, etc */
>     char       *subname;        /* Name of the subscription */
>     char       *conninfo;       /* Connection string to publisher */
>     List       *publication;    /* One or more publication to subscribe to */
>     List       *options;        /* List of DefElem nodes */
> +   bool        isReset;        /* true if RESET option */
> } AlterSubscriptionStmt;
> 
> It's unnatural to me that AlterSubscriptionStmt has an isReset flag in
> spite of having 'kind' indicating the command. How about having the
> RESET command use the same logic as SET, as you suggested, while
> having both ALTER_SUBSCRIPTION_SET_OPTIONS and
> ALTER_SUBSCRIPTION_RESET_OPTIONS?

Yes, I agree with you; it will look more natural with
ALTER_SUBSCRIPTION_RESET_OPTIONS.

Best regards,
Hou zj


RE: Added schema level support for publication.

2021-09-05 Thread houzj.f...@fujitsu.com
From Thur, Sep 2, 2021 7:42 PM Amit Kapila  wrote:
> On Thu, Sep 2, 2021 at 11:58 AM vignesh C  wrote:
> >
> 
> Below are a few comments on v23. If you have already addressed anything in v24,
> then ignore it.
> 
> 1. The commit message says: A new system table "pg_publication_schema"
> has been added, to maintain the schemas that the user wants to publish
> through the publication.". The alternative name for this system table could be
> "pg_publication_namespace". The reason why this alternative comes to my
> mind is that the current system table to store schema information is named
> pg_namespace. So shouldn't we be consistent here?
> What do others think about this?

Since the new system table refers to pg_namespace, I am personally +1 for
"pg_publication_namespace".

Best regards,
Hou zj


RE: Added schema level support for publication.

2021-09-05 Thread houzj.f...@fujitsu.com
From Thur, Sep 2, 2021 2:33 PM vignesh C  wrote:
> On Wed, Sep 1, 2021 at 6:58 AM houzj.f...@fujitsu.com 
>  wrote:
> >
> > Here are some other comments for v23-000x patches.
> > 3)
> >
> > + .description =
> "PUBLICATION SCHEMA",
> > + .section =
> SECTION_POST_DATA,
> > + .createStmt
> > + = query->data));
> >
> > Is it better to use something like 'PUBLICATION TABLES IN SCHEMA' to
> > describe the schema-level table publication? Because there could be
> > some other publication type such as 'ALL SEQUENCES IN SCHEMA' in the
> > future, it would be better to make it clear that we only publish
> > tables in a schema in this patch.
> 
> Modified

Thanks for updating the patch.

I think we might also need to mention the publication object 'table' in the
following types:

1)
+   /* OCLASS_PUBLICATION_SCHEMA */
+   {
+   "publication schema", OBJECT_PUBLICATION_SCHEMA
+   },

2)
+   PUBLICATIONOBJ_SCHEMA,  /* Schema type */
+   PUBLICATIONOBJ_UNKNOWN  /* Unknown type */
+} PublicationObjSpecType;

3)
+   DO_PUBLICATION_SCHEMA,

I think it might be better to change the type names to something like
XX_REL_IN_SCHEMA, and adjust the comments accordingly.

Best regards,
Hou zj



Timeout failure in 019_replslot_limit.pl

2021-09-05 Thread Michael Paquier
Hi all,

Running the recovery tests in parallel, with enough load to exhaust the
machine's resources, sometimes leads me to the following failure:
ok 19 - walsender termination logged
# poll_query_until timed out executing this query:
# SELECT wal_status FROM pg_replication_slots WHERE slot_name = 'rep3'

This corresponds to the following part of the test, where a WAL sender
is SIGSTOP'd and SIGCONT'd:
$node_primary3->poll_query_until('postgres',
"SELECT wal_status FROM pg_replication_slots WHERE slot_name = 'rep3'",
"lost")
  or die "timed out waiting for slot to be lost";

There is already a default timeout of 180s applied as of the default
of PostgresNode::poll_query_until(), so it seems to me that there
could be a different issue hiding here.

Thanks,
--
Michael


signature.asc
Description: PGP signature


Re: strange case of "if ((a & b))"

2021-09-05 Thread Justin Pryzby
On Wed, Aug 18, 2021 at 11:08:57PM -0400, Tom Lane wrote:
> Peter Smith  writes:
> > On Thu, Aug 19, 2021 at 4:29 AM Justin Pryzby  wrote:
> >> -   state->oneCol = (origTupdesc->natts == 1) ? true : false;
> >> +   state->oneCol = origTupdesc->natts == 1;
> 
> FWIW, I am definitely not a fan of removing the parentheses in this
> context, because readers might wonder if you meant an "a = b = 1"
> multiple-assignment, or even misread it as that and be confused.
> So I'd prefer
> 
>   state->oneCol = (origTupdesc->natts == 1);
> 
> In the context of "return (a == b)", I'm about neutral on whether
> to keep the parens or not, but I wonder why this patch does some
> of one and some of the other.
> 
> I do agree that "x ? true : false" is silly in contexts where x
> is guaranteed to yield zero or one.  What you need to be careful
> about is where x might yield other bitpatterns, for example
> "(flags & SOMEFLAG) ? true : false".  Pre-C99, this type of coding
> was often *necessary*.  With C99, it's only necessary if you're
> not sure that the compiler will cast the result to boolean.

I revised the patch based on these comments.  I think my ternary patch already
excluded the cases that test something other than a boolean.

Peter: you quoted my patch but didn't comment on it.  Your regex finds a lot of
conditional boolean assignments, but I agree that they're best left alone.  My
patches are to clean up silly cases, not to rewrite things in a way that's
arguably better (but arguably not worth changing and so also not worth arguing
that it's better).

-- 
Justin
From 51dab66eb3cbeccfa5bad0f1b8a26b94523edb65 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Wed, 28 Apr 2021 14:12:49 -0500
Subject: [PATCH 1/2] Avoid double parens

git grep -l '\tuphdr->t_hoff));
-		else if ((infomask & HEAP_HASNULL))
+		else if (infomask & HEAP_HASNULL)
 			report_corruption(ctx,
 			  psprintf("tuple data should begin at byte %u, but actually begins at byte %u (%u attributes, has nulls)",
 	   expected_hoff, ctx->tuphdr->t_hoff, ctx->natts));
diff --git a/contrib/ltree/ltree_io.c b/contrib/ltree/ltree_io.c
index 15115cb29f..b75f35d5b5 100644
--- a/contrib/ltree/ltree_io.c
+++ b/contrib/ltree/ltree_io.c
@@ -661,17 +661,17 @@ deparse_lquery(const lquery *in)
 }
 memcpy(ptr, curtlevel->name, curtlevel->len);
 ptr += curtlevel->len;
-if ((curtlevel->flag & LVAR_SUBLEXEME))
+if (curtlevel->flag & LVAR_SUBLEXEME)
 {
 	*ptr = '%';
 	ptr++;
 }
-if ((curtlevel->flag & LVAR_INCASE))
+if (curtlevel->flag & LVAR_INCASE)
 {
 	*ptr = '@';
 	ptr++;
 }
-if ((curtlevel->flag & LVAR_ANYEND))
+if (curtlevel->flag & LVAR_ANYEND)
 {
 	*ptr = '*';
 	ptr++;
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 387f80419a..1ba9dbf966 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -2417,7 +2417,7 @@ PrepareTransaction(void)
 	 * cases, such as a temp table created and dropped all within the
 	 * transaction.  That seems to require much more bookkeeping though.
 	 */
-	if ((MyXactFlags & XACT_FLAGS_ACCESSEDTEMPNAMESPACE))
+	if (MyXactFlags & XACT_FLAGS_ACCESSEDTEMPNAMESPACE)
 		ereport(ERROR,
 (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
  errmsg("cannot PREPARE a transaction that has operated on temporary objects")));
@@ -5530,7 +5530,7 @@ XactLogCommitRecord(TimestampTz commit_time,
 		xl_xinfo.xinfo |= XACT_COMPLETION_UPDATE_RELCACHE_FILE;
 	if (forceSyncCommit)
 		xl_xinfo.xinfo |= XACT_COMPLETION_FORCE_SYNC_COMMIT;
-	if ((xactflags & XACT_FLAGS_ACQUIREDACCESSEXCLUSIVELOCK))
+	if (xactflags & XACT_FLAGS_ACQUIREDACCESSEXCLUSIVELOCK)
 		xl_xinfo.xinfo |= XACT_XINFO_HAS_AE_LOCKS;
 
 	/*
@@ -5681,7 +5681,7 @@ XactLogAbortRecord(TimestampTz abort_time,
 
 	xlrec.xact_time = abort_time;
 
-	if ((xactflags & XACT_FLAGS_ACQUIREDACCESSEXCLUSIVELOCK))
+	if (xactflags & XACT_FLAGS_ACQUIREDACCESSEXCLUSIVELOCK)
 		xl_xinfo.xinfo |= XACT_XINFO_HAS_AE_LOCKS;
 
 	if (nsubxacts > 0)
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index dbee6ae199..e018cdfd9e 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -16309,7 +16309,7 @@ PreCommit_on_commit_actions(void)
  * relations, we can skip truncating ON COMMIT DELETE ROWS
  * tables, as they must still be empty.
  */
-if ((MyXactFlags & XACT_FLAGS_ACCESSEDTEMPNAMESPACE))
+if (MyXactFlags & XACT_FLAGS_ACCESSEDTEMPNAMESPACE)
 	oids_to_truncate = lappend_oid(oids_to_truncate, oc->relid);
 break;
 			case ONCOMMIT_DROP:
diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index ff3dcc7b18..ae3364dfdc 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -2390,7 +2390,7 @@ query_tree_walker(Query *query,
 	 * don't contain actual expressions. However they do contain OIDs which
 	 * may be 

RE: New predefined roles- 'pg_read/write_all_data'

2021-09-05 Thread Shinoda, Noriyoshi (PN Japan FSIP)
Thank you for your quick response.
I understood the specifications from your explanation.

Regards,
Noriyoshi Shinoda

From: Stephen Frost [mailto:sfr...@snowman.net]
Sent: Sunday, September 5, 2021 8:50 PM
To: Shinoda, Noriyoshi (PN Japan FSIP) 
Cc: Anastasia Lubennikova ; Michael Banck 
; gkokola...@pm.me; 
pgsql-hackers@lists.postgresql.org
Subject: Re: New predefined roles- 'pg_read/write_all_data'

Greetings,

On Sun, Sep 5, 2021 at 07:43 Shinoda, Noriyoshi (PN Japan FSIP) wrote:
I have tested this new feature in a PostgreSQL 14 Beta 3 environment.
I created a user granted the pg_write_all_data role and executed UPDATE and
DELETE statements on tables owned by other users.
If there is no WHERE clause, they execute as expected, but if a WHERE
clause is specified, a permission-denied error occurs.
Is this the expected behavior?

A WHERE clause requires SELECT rights on the table/columns referenced and if no 
SELECT rights were granted then a permission denied error is the correct 
result, yes. Note that pg_write_all_data, as documented, does not include 
SELECT rights.

Thanks,

Stephen


Re: [PATCH] postgres_fdw: suppress explicit casts in text:text comparisons (was: column option to override foreign types)

2021-09-05 Thread Tom Lane
"Dian M Fay"  writes:
> [ 0001-Suppress-explicit-casts-of-text-constants-in-postgre.patch ]

I took a quick look at this.  The restriction to type text seems like
very obviously a hack rather than something we actually want; wouldn't
it mean we fail to act in a large fraction of the cases where we'd
like to suppress the cast?

A second problem is that I don't think the business with a const_showtype
context field is safe at all.  As you've implemented it here, it would
affect the entire RHS tree, including constants far down inside complex
expressions that have nothing to do with the top-level semantics.
(I didn't look closely, but I wonder if the regression failure you
mentioned is associated with that.)

I think that we only want to suppress the cast in cases where
(1) the constant is directly an operand of the operator we're
expecting the remote parser to use its same-type heuristic for, and
(2) the constant will be deparsed as a string literal.  (If it's
deparsed as a number, boolean, etc, then it won't be initially
UNKNOWN, so that heuristic won't be applied.)

Now point 1 means that we don't really need to mess with keeping
state in the recursion context.  If we've determined at the level
of the OpExpr that we can do this, including checking that the
RHS operand IsA(Const), then we can just invoke deparseConst() on
it directly instead of recursing via deparseExpr().

Meanwhile, I suspect that point 2 might be best checked within
deparseConst() itself, as that contains both the decision and the
mechanism about how the Const will be printed.  So that suggests
that we should invent a new showtype code telling deparseConst()
to act this way, and then supply that code directly when we
invoke deparseConst directly from deparseOpExpr.

BTW, don't we also want to be able to optimize cases where the Const
is on the LHS rather than the RHS?

regards, tom lane




Re: stat() vs ERROR_DELETE_PENDING, round N + 1

2021-09-05 Thread Thomas Munro
On Mon, Sep 6, 2021 at 9:55 AM Tom Lane  wrote:
> Andres Freund  writes:
> > I'd be ok to only fix these bugs on e.g. Win10, Win Server 2019, Win Server
> > 2016 or such.
>
> Yeah, particularly if the fix is trivial on the newer systems and
> incredibly complicated otherwise.  Between the effort needed and
> the risk of introducing new bugs, I'm really not excited about
> an invasive fix for this.

If DeleteFile() is automatically using
FILE_DISPOSITION_POSIX_SEMANTICS by default when possible on recent
releases as per the SO link that Andres posted above ("18363.657
definitely has the new behavior"), then that's great news and maybe we
shouldn't even bother to try to request that mode ourselves explicitly
(eg in some kind of unlink wrapper).  Then we'd need just one
accommodation for older systems and non-NTFS systems, not two, and I
currently think that should be the short and sweet approach shown in
0001-Handle-STATUS_DELETE_PENDING-on-Windows.patch, with some tidying
and adjustments per feedback.




Re: stat() vs ERROR_DELETE_PENDING, round N + 1

2021-09-05 Thread Tom Lane
Andres Freund  writes:
> On 2021-09-06 01:32:55 +1200, Thomas Munro wrote:
>> The best idea is probably to set FILE_DISPOSITION_DELETE |
>> FILE_DISPOSITION_POSIX_SEMANTICS before unlinking.  This appears to be
>> a silver bullet, but isn't available on ancient Windows releases that
>> we support, or file systems other than local NTFS.  So maybe we need a
>> combination of that + STATUS_DELETE_PENDING as shown in the attached.
>> I'll look into that next.

> When was that introduced?

Googling says that it was introduced in Win10, although in RS2 (version
1703, general availability in 4/2017) not the initial release.

> I'd be ok to only fix these bugs on e.g. Win10, Win Server 2019, Win Server
> 2016 or such.

Yeah, particularly if the fix is trivial on the newer systems and
incredibly complicated otherwise.  Between the effort needed and
the risk of introducing new bugs, I'm really not excited about
an invasive fix for this.

regards, tom lane




Re: stat() vs ERROR_DELETE_PENDING, round N + 1

2021-09-05 Thread Andres Freund
Hi,

On 2021-09-06 01:32:55 +1200, Thomas Munro wrote:
> It's a good idea.  I tested it and it certainly does fix the
> basebackup problem I've seen (experimental patch attached).  But,
> yeah, I'm also a bit worried that that path could be fragile and need
> special handling in lots of places.

It's also expensive-ish.


> I also tried writing a new open() wrapper using the lower level
> NtCreateFile() interface, and then an updated stat() wrapper built on
> top of that.  As a non-Windows person, getting that to (mostly) work
> involved a fair amount of suffering.  I can share that if someone is
> interested, but while learning about that family of interfaces, I
> realised we could keep the existing Win32-based code, but also
> retrieve the NT status, leading to a very small change (experimental
> patch attached).

Is it guaranteed, or at least reliable, that the status we fetch with
RtlGetLastNtStatus is actually from the underlying filesystem operation,
rather than some other work that happens during the win32->nt translation?
E.g. a memory allocation or such? Presumably most of such work happens before
the actual nt "syscall", but ...


> The best idea is probably to set FILE_DISPOSITION_DELETE |
> FILE_DISPOSITION_POSIX_SEMANTICS before unlinking.  This appears to be
> a silver bullet, but isn't available on ancient Windows releases that
> we support, or file systems other than local NTFS.  So maybe we need a
> combination of that + STATUS_DELETE_PENDING as shown in the attached.
> I'll look into that next.

When was that introduced?

I'd be ok to only fix these bugs on e.g. Win10, Win Server 2019, Win Server
2016 or such. I don't think we need to support OSs that the vendor doesn't
support - and I wouldn't count "only security fixes" as support in this
context.
                     main            extended
Windows 10           Oct 14, 2025
Windows Server 2019  Jan 9, 2024     Jan 9, 2029
Windows Server 2016  Jan 11, 2022    Jan 12, 2027
Windows 7            Jan 13, 2015    Jan 14, 2020
Windows Vista        Apr 10, 2012    Apr 11, 2017


One absurd detail here is that the default behaviour changed sometime in
Windows 10's lifetime:
https://stackoverflow.com/questions/60424732/did-the-behaviour-of-deleted-files-open-with-fileshare-delete-change-on-windows

"The behavior changed in recent releases of Windows 10 -- without notice
AFAIK. DeleteFileW now tries to use POSIX semantics if the filesystem supports
it. NTFS does."


>  #ifndef FRONTEND
> -	Assert(pgwin32_signal_event != NULL);	/* small chance of pg_usleep() */
> +	/* XXX When called by stat very early on, this fails! */
> +	//Assert(pgwin32_signal_event != NULL);	/* small chance of pg_usleep() */
>  #endif

Perhaps we should move the win32 signal initialization into StartupHacks()?
There's some tension around it using ereport(), and MemoryContextInit() only
being called a tad later, but that seems resolvable.


> +	 * Our open wrapper will report STATUS_DELETE_PENDING as ENOENT.  We pass
> +  * in a special private flag to say that it's _pgstat64() calling, to
> +  * activate a mode that allows directories to be opened for limited
> +  * purposes.
> +  *
> +  * XXX Think about fd pressure, since we're opening an fd?
>*/

If I understand 
https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/getmaxstdio?view=msvc-160
etc correctly, it looks like there is. But only at the point we do
_open_osfhandle(). So perhaps we should add a pgwin32_open() version returning
a handle and make pgwin32_open() a wrapper around that?


Greetings,

Andres Freund




Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

2021-09-05 Thread Andres Freund
On 2021-09-05 14:22:51 +0530, Dilip Kumar wrote:
> On Sat, Sep 4, 2021 at 3:24 AM Andres Freund  wrote:
> 
> > Hi,
> >
> > On 2021-09-03 14:25:10 +0530, Dilip Kumar wrote:
> > > Yeah, we can surely lock the relation as described by Robert, but IMHO,
> > > while creating the database we are already holding the exclusive lock on
> > > the database and there is no one else allowed to be connected to the
> > > database, so do we actually need to bother about the lock for the
> > > correctness?
> >
> > The problem is that checkpointer, bgwriter, buffer reclaim don't care about
> > the database of the buffer they're working on... The exclusive lock on the
> > database doesn't change anything about that.
> 
> 
> But these directly operate on the buffers, and in my patch, whether we are
> reading pg_class to identify the relfilenode or copying the relation block
> by block, we are always holding the lock on the buffer.

I don't think a buffer lock is really sufficient. See e.g. code like:

static void
InvalidateBuffer(BufferDesc *buf)
{
...
/*
 * We assume the only reason for it to be pinned is that someone else is
 * flushing the page out.  Wait for them to finish.  (This could be an
	 * infinite loop if the refcount is messed up... it would be nice to time
	 * out after awhile, but there seems no way to be sure how many loops may
	 * be needed.  Note that if the other guy has pinned the buffer but not
	 * yet done StartBufferIO, WaitIO will fall through and we'll effectively
	 * be busy-looping here.)
 */
if (BUF_STATE_GET_REFCOUNT(buf_state) != 0)
{
UnlockBufHdr(buf, buf_state);
LWLockRelease(oldPartitionLock);
/* safety check: should definitely not be our *own* pin */
if (GetPrivateRefCount(BufferDescriptorGetBuffer(buf)) > 0)
elog(ERROR, "buffer is pinned in InvalidateBuffer");
WaitIO(buf);
goto retry;
}

IOW, currently we assume that you're only allowed to pin a block in a relation
while you hold a lock on the relation. It might be a good idea to change that,
but it's not as trivial as one might think - consider e.g. dropping a
relation's buffers while holding an exclusive lock: If there's potential
concurrent reads of that buffer we'd be in trouble.


> 3. While copying the relation whether to use the bufmgr or directly use the
> smgr?
> 
> If we use the bufmgr then maybe we can avoid flushing some of the buffers
> to disk and save some I/O, but in general we copy from the template
> database, so there might not be a lot of dirty buffers and we might not
> save anything.

I would assume the big benefit would be that the *target* database does not
have to be written out / shared buffer is immediately populated.

Greetings,

Andres Freund




Re: use-regular-expressions-to-simplify-less_greater-and-not_equals.patch

2021-09-05 Thread Tom Lane
Andrew Dunstan  writes:
> On 8/10/21 11:27 PM, 孙诗浩(思才) wrote:
>>      we can use regular expressions (<>|!=) to cover "<>" and "!=".

> I don't understand the problem being solved here. This looks like a
> change with no discernable benefit. I don't agree that the current code
> is substantially more likely to confuse developers. And the lexer is not
> something that gets a lot of change.

By my count, the response to this has been two definite "no"s, one
leaning-to-no, and nobody speaking in favor.  Nor has the submitter
provided a stronger argument for it.  I'm going to mark this
rejected in the CF app.

regards, tom lane




Re: Fix pkg-config file for static linking

2021-09-05 Thread Tom Lane
Peter Eisentraut  writes:
> Makes sense.  I think we could do it without hardcoding those library 
> names, as in the attached patch.  But it comes out to the same result 
> AFAICT.

This has been pushed, but the CF entry is still open, which is
making the cfbot unhappy.  Were you leaving it open pending
pushing to back branches as well?  I'm not sure what the point
of waiting is --- the buildfarm isn't going to exercise the
troublesome scenario.

regards, tom lane




Re: pg_dump handling of ALTER DEFAULT PRIVILEGES IN SCHEMA

2021-09-05 Thread Bossart, Nathan
On 9/5/21, 12:14 PM, "Tom Lane"  wrote:
> "Bossart, Nathan"  writes:
>> Is there a specific style you have in mind for this change, or is your
>> point that the CASE statement should only be reserved for when
>> is_default_acl is true?
>
> I'd be inclined to split this logic out into a separate if-statement,
> along the lines of

Got it, thanks.

Nathan



Re: psql: \dl+ to list large objects privileges

2021-09-05 Thread Pavel Luzanov

Hi,

On 03.09.2021 15:25, Georgios Kokolatos wrote:

> On a high level I will recommend the addition of tests. There are similar tests


Tests added.


> Applying the patch generates several whitespace warnings. It would be helpful
> if those warnings are removed.


I know this is a silly mistake, and after reading this article[1] I 
tried to remove the extra spaces.


Can you tell me, please, how you got those warnings?


> The patch contains:
>
>  case 'l':
> -   success = do_lo_list();
> +   success = listLargeObjects(show_verbose);


> It might be of some interest in the above to check the value of the next
> character in the command and emit an error if it is not valid. Such a
> pattern can be found in the same switch block, for example:
>
>	switch (cmd[2])
>	{
>		case '\0':
>		case '+':
>			success = ...
>			break;
>		default:
>			status = PSQL_CMD_UNKNOWN;
>			break;
>	}


Check added.


> The patch contains:
>
>  else if (strcmp(cmd + 3, "list") == 0)
> -   success = do_lo_list();
> +   success = listLargeObjects(false);
> +
> +   else if (strcmp(cmd + 3, "list+") == 0)
> +   success = listLargeObjects(true);


> In a fashion similar to `exec_command_list`, it might be interesting to
> consider expressing the above as:
>
>	show_verbose = strchr(cmd, '+') ? true : false;
>
>	else if (strcmp(cmd + 3, "list") == 0)
>		success = do_lo_list(show_verbose);


I rewrote this part.

New version attached.

[1] https://wiki.postgresql.org/wiki/Creating_Clean_Patches

--
Pavel Luzanov
Postgres Professional: https://postgrespro.com
The Russian Postgres Company

diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index e0ffb020bf..374515bcb2 100644
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml
@@ -2094,7 +2094,7 @@ REVOKE ALL ON accounts FROM PUBLIC;
   LARGE OBJECT
   rw
   none
-  
+  \dl+
  
  
   SCHEMA
diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index fcab5c0d51..de47ef528e 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -1681,11 +1681,13 @@ testdb=
 
 
   
-\dl
+\dl[+]
 
 
 This is an alias for \lo_list, which shows a
-list of large objects.
+list of large objects. If + is appended 
+to the command name, each large object is listed with its 
+associated permissions, if any.
 
 
   
@@ -2588,12 +2590,15 @@ lo_import 152801
   
 
   
-\lo_list
+\lo_list[+]
 
 
 Shows a list of all PostgreSQL
 large objects currently stored in the database,
 along with any comments provided for them.
+If + is appended to the command name, 
+each large object is listed with its associated permissions,
+if any. 
 
 
   
diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index 49d4c0e3ce..f3645cab96 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -807,7 +807,16 @@ exec_command_d(PsqlScanState scan_state, bool active_branch, const char *cmd)
 success = describeRoles(pattern, show_verbose, show_system);
 break;
 			case 'l':
-success = do_lo_list();
+switch (cmd[2])
+{
+	case '\0':
+	case '+':
+		success = listLargeObjects(show_verbose);
+		break;
+	default:
+		status = PSQL_CMD_UNKNOWN;
+		break;
+}
 break;
 			case 'L':
 success = listLanguages(pattern, show_verbose, show_system);
@@ -1901,11 +1910,13 @@ exec_command_lo(PsqlScanState scan_state, bool active_branch, const char *cmd)
 	{
 		char	   *opt1,
    *opt2;
+		bool		show_verbose;
 
 		opt1 = psql_scan_slash_option(scan_state,
 	  OT_NORMAL, NULL, true);
 		opt2 = psql_scan_slash_option(scan_state,
 	  OT_NORMAL, NULL, true);
+		show_verbose = strchr(cmd, '+') ? true : false;
 
 		if (strcmp(cmd + 3, "export") == 0)
 		{
@@ -1935,8 +1946,8 @@ exec_command_lo(PsqlScanState scan_state, bool active_branch, const char *cmd)
 			}
 		}
 
-		else if (strcmp(cmd + 3, "list") == 0)
-			success = do_lo_list();
+		else if (strcmp(cmd + 3, "list") == 0 || strcmp(cmd + 3, "list+") == 0)
+			success = listLargeObjects(show_verbose);
 
 		else if (strcmp(cmd + 3, "unlink") == 0)
 		{
diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index 90ff649be7..a079ce9e36 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -6828,3 +6828,64 @@ listOpFamilyFunctions(const char *access_method_pattern,
 	PQclear(res);
 	return 

Re: pg_dump handling of ALTER DEFAULT PRIVILEGES IN SCHEMA

2021-09-05 Thread Tom Lane
"Bossart, Nathan"  writes:
> On 9/5/21, 10:08 AM, "Tom Lane"  wrote:
>> I'm kind of allergic to this SQL coding style, too.  It expects the
>> backend to expend many thousands of cycles parsing and then optimizing
>> away a useless CASE, to save a couple of lines of code and a few cycles
>> on the client side.  Nor is doing the query this way even particularly
>> readable on the client side.

> Is there a specific style you have in mind for this change, or is your
> point that the CASE statement should only be reserved for when
> is_default_acl is true?

I'd be inclined to split this logic out into a separate if-statement,
along the lines of

... build some of the query ...

if (is_default_acl)
{
/* The reference ACL is empty for schema-local default ACLs */
        appendPQExpBuffer(..., "CASE WHEN ... THEN pg_catalog.acldefault(%s,%s) ELSE NULL END", ...);
}
else
{
/* For other cases, the reference is always acldefault() */
appendPQExpBuffer(..., "pg_catalog.acldefault(%s,%s)", ...);
}

... build rest of the query ...

I think this is more readable than one giant printf, and not incidentally
it allows room for some helpful comments.

(I kind of wonder if we shouldn't try to refactor buildACLQueries to
reduce code duplication and add comments while we're at it.  The existing
code seems pretty awful from a readability standpoint already.  I don't
have any clear idea about what to do instead, though.)

regards, tom lane




Re: Patch: shouldn't timezone(text, timestamp[tz]) be STABLE?

2021-09-05 Thread Tom Lane
Aleksander Alekseev  writes:
>>> [ timetz_zone is VOLATILE ]
>> I wasn't excited enough about it personally to change it, and I'm
>> still not --- but if you want to, send a patch.

> Here is the patch.

I looked at this patch, and felt unhappy about the fact that it left
timetz_zone() still depending on pg_time_t and pg_localtime, which
nothing else in date.c does.  Poking at it closer, I realized that
the DYNTZ code path is actually completely broken, and has been
for years.  Observe:

regression=# select timezone('America/Santiago', '12:34 -02'::timetz);
  timezone   
-------------
 11:34:00-03
(1 row)

That's fine.  But CLT, which should be entirely equivalent
to America/Santiago, produces seeming garbage:

regression=# select timezone('CLT', '12:34 -02'::timetz);
     timezone      
-------------------
 09:51:14-04:42:46
(1 row)


What's happening there is that pg_localtime produces a struct tm
containing POSIX-style values, in particular tm_year is relative
to 1900.  But DetermineTimeZoneAbbrevOffset expects a struct using
the PG convention that tm_year is relative to "AD 0".  So it sees
a date in the second century AD, decides that that's way out of
range, and falls back to the "LMT" offset provided by the tzdb
database.  That lines up with what you'd get from

regression=# set timezone = 'America/Santiago';
SET
regression=# select '0121-09-03 12:34'::timestamptz;
         timestamptz          
------------------------------
 0121-09-03 12:34:00-04:42:46
(1 row)



Basically the problem here is that this is incredibly hoary code
that's never been touched or tested as we revised datetime-related
APIs elsewhere.  I'm fairly unhappy now that we don't have any
regression test coverage for this function.  However, I see no
very good way to make that happen, because the interesting code
paths will (by definition) produce different results at different
times of year.  I suppose we could carry two variant expected-files,
but ick.  The DYNTZ path is particularly problematic here, because
that's only used for timezones that have changed definitions over
time, meaning they're particularly likely to change again.

Anyway, attached is a revised patch that gets rid of the antique
code, and it produces correct results AFAICT.

BTW, it's customary to *not* include catversion bumps in submitted
patches, because that accomplishes little except to ensure that
your patch will soon fail to apply.  (This one already is failing.)
If you feel a need to remind the committer that a catversion bump
is needed, just comment to that effect in the submission email.

I'm not entirely sure what to do about the discovery that the
DYNTZ path has pre-existing breakage.  Perhaps it'd be sensible
to back-patch this patch, minus the catalog change.  I doubt that
anyone would have a problem with the nominal change of behavior
near DST boundaries.  Or we could just ignore the bug in the back
branches, since it's fairly clear that basically no one uses this
function.

regards, tom lane

diff --git a/src/backend/utils/adt/date.c b/src/backend/utils/adt/date.c
index 0bea16cb67..cf85a61778 100644
--- a/src/backend/utils/adt/date.c
+++ b/src/backend/utils/adt/date.c
@@ -3029,7 +3029,7 @@ extract_timetz(PG_FUNCTION_ARGS)
 
 /* timetz_zone()
  * Encode time with time zone type with specified time zone.
- * Applies DST rules as of the current date.
+ * Applies DST rules as of the transaction start time.
  */
 Datum
 timetz_zone(PG_FUNCTION_ARGS)
@@ -3068,12 +3068,11 @@ timetz_zone(PG_FUNCTION_ARGS)
 	}
 	else if (type == DYNTZ)
 	{
-		/* dynamic-offset abbreviation, resolve using current time */
-		pg_time_t	now = (pg_time_t) time(NULL);
-		struct pg_tm *tm;
+		/* dynamic-offset abbreviation, resolve using transaction start time */
+		TimestampTz now = GetCurrentTransactionStartTimestamp();
+		int			isdst;
 
-		tm = pg_localtime(&now, tzp);
-		tz = DetermineTimeZoneAbbrevOffset(tm, tzname, tzp);
+		tz = DetermineTimeZoneAbbrevOffsetTS(now, tzname, tzp, &isdst);
 	}
 	else
 	{
@@ -3081,12 +3080,15 @@ timetz_zone(PG_FUNCTION_ARGS)
 		tzp = pg_tzset(tzname);
 		if (tzp)
 		{
-			/* Get the offset-from-GMT that is valid today for the zone */
-			pg_time_t	now = (pg_time_t) time(NULL);
-			struct pg_tm *tm;
+			/* Get the offset-from-GMT that is valid now for the zone */
+			TimestampTz now = GetCurrentTransactionStartTimestamp();
+			struct pg_tm tm;
+			fsec_t		fsec;
 
-			tm = pg_localtime(&now, tzp);
-			tz = -tm->tm_gmtoff;
+			if (timestamp2tm(now, &tz, &tm, &fsec, NULL, tzp) != 0)
+ereport(ERROR,
+		(errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE),
+		 errmsg("timestamp out of range")));
 		}
 		else
 		{
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index c07b2c6a55..d068d6532e 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5953,7 +5953,7 @@
   proname => 'timestamp_larger', prorettype => 'timestamp',
   proargtypes => 'timestamp timestamp', prosrc => 

Re: corruption of WAL page header is never reported

2021-09-05 Thread Alvaro Herrera
On 2021-Sep-03, Kyotaro Horiguchi wrote:

> diff --git a/src/backend/access/transam/xlog.c 
> b/src/backend/access/transam/xlog.c
> index 24165ab03e..b621ad6b0f 100644
> --- a/src/backend/access/transam/xlog.c
> +++ b/src/backend/access/transam/xlog.c
> @@ -12496,9 +12496,21 @@ retry:
>*
>* Validating the page header is cheap enough that doing it twice
>* shouldn't be a big deal from a performance point of view.
> +  *
> +  * Don't call XLogReaderValidatePageHeader here while not in standby 
> mode
> +  * so that this function won't return with a valid errmsg_buf.
>*/
> - if (!XLogReaderValidatePageHeader(xlogreader, targetPagePtr, readBuf))
> + if (StandbyMode &&
> + !XLogReaderValidatePageHeader(xlogreader, targetPagePtr, 
> readBuf))

OK, but I don't understand why we have a comment that says (referring to
non-standby mode) "doing it twice shouldn't be a big deal", followed by
"Don't do it twice while not in standby mode" -- that seems quite
contradictory.  I think the new comment should overwrite the previous
one, something like this:

-* Validating the page header is cheap enough that doing it twice
-* shouldn't be a big deal from a performance point of view.
+*
+* We do this in standby mode only,
+* so that this function won't return with a valid errmsg_buf.
 */
-   if (!XLogReaderValidatePageHeader(xlogreader, targetPagePtr, readBuf))
+   if (StandbyMode &&
+   !XLogReaderValidatePageHeader(xlogreader, targetPagePtr, 
readBuf))

-- 
Álvaro Herrera  Valdivia, Chile  —  https://www.EnterpriseDB.com/




Re: pg_dump handling of ALTER DEFAULT PRIVILEGES IN SCHEMA

2021-09-05 Thread Bossart, Nathan
On 9/5/21, 10:08 AM, "Tom Lane"  wrote:
> "Bossart, Nathan"  writes:
>> I've attached a quick hack that seems to fix this by adjusting the
>> pg_dump query to use NULL instead of acldefault() for non-global
>> entries.  I'm posting this early in order to gather thoughts on the
>> approach and to make sure I'm not missing something obvious.
>
> I find this impossible to comment on as to correctness, because the patch
> is nigh unreadable.  "case_stmt" is a pretty darn opaque variable name,
> and the lack of comments doesn't help, and I don't really think that you
> chose good semantics for it anyway.  Probably it would be better for the
> new argument to be along the lines of "bool is_default_acl", and allow
> buildACLQueries to know what it should put in when that's true.

My apologies.  This definitely shouldn't have been marked as ready-
for-committer.  FWIW this is exactly the sort of feedback I was hoping
to get before I expended more effort on this patch.

> I'm kind of allergic to this SQL coding style, too.  It expects the
> backend to expend many thousands of cycles parsing and then optimizing
> away a useless CASE, to save a couple of lines of code and a few cycles
> on the client side.  Nor is doing the query this way even particularly
> readable on the client side.

Is there a specific style you have in mind for this change, or is your
point that the CASE statement should only be reserved for when
is_default_acl is true?

> Lastly, there probably should be a test case or two.

Of course.  That will be in the next revision.

Nathan



Re: pg_dump handling of ALTER DEFAULT PRIVILEGES IN SCHEMA

2021-09-05 Thread Tom Lane
"Bossart, Nathan"  writes:
> The problem appears to be that pg_dump is comparing the entries in
> pg_default_acl to the default ACL (i.e., acldefault()).  This is fine
> for "global" entries (i.e., entries with no schema specified), but it
> doesn't work for "non-global" entries (i.e., entries with a schema
> specified).  This is because the default for a non-global entry is
> actually an empty ACL.

Good point.

> I've attached a quick hack that seems to fix this by adjusting the
> pg_dump query to use NULL instead of acldefault() for non-global
> entries.  I'm posting this early in order to gather thoughts on the
> approach and to make sure I'm not missing something obvious.

I find this impossible to comment on as to correctness, because the patch
is nigh unreadable.  "case_stmt" is a pretty darn opaque variable name,
and the lack of comments doesn't help, and I don't really think that you
chose good semantics for it anyway.  Probably it would be better for the
new argument to be along the lines of "bool is_default_acl", and allow
buildACLQueries to know what it should put in when that's true.

I'm kind of allergic to this SQL coding style, too.  It expects the
backend to expend many thousands of cycles parsing and then optimizing
away a useless CASE, to save a couple of lines of code and a few cycles
on the client side.  Nor is doing the query this way even particularly
readable on the client side.

Lastly, there probably should be a test case or two.

regards, tom lane




Re: storing an explicit nonce

2021-09-05 Thread Sasasu

Hi, community,

It looks like we are still considering AES-CBC, AES-XTS, and
AES-GCM(-SIV). I want to point out a few things we haven't considered yet.


For AES-CBC, the IV must not be predictable. I think the LSN, or a
HASH(LSN, block number, or similar), is predictable. There are many CVEs
related to AES-CBC with a predictable IV.


https://cwe.mitre.org/data/definitions/329.html
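
As a minimal sketch of why predictability matters (a toy: the block cipher
below is a hash-based, encrypt-only stand-in, not real AES, so the example
stays self-contained — but the structure is the standard predictable-IV
chosen-plaintext confirmation attack):

```python
import hashlib, os

BS = 16

def ecb_encrypt_block(key: bytes, block: bytes) -> bytes:
    # stand-in for AES single-block encryption (hash-based toy, encrypt-only)
    return hashlib.sha256(key + block).digest()[:BS]

def cbc_encrypt(key: bytes, iv: bytes, plaintext: bytes) -> bytes:
    assert len(plaintext) % BS == 0
    out, prev = [], iv
    for i in range(0, len(plaintext), BS):
        block = bytes(a ^ b for a, b in zip(prev, plaintext[i:i + BS]))
        prev = ecb_encrypt_block(key, block)
        out.append(prev)
    return b"".join(out)

key = os.urandom(BS)
# Victim encrypts a secret block under IV iv1; attacker observes both.
iv1, secret = os.urandom(BS), b"attack at dawn!!"
c_secret = cbc_encrypt(key, iv1, secret)
# Attacker knows the NEXT IV (iv2) in advance -- e.g. because it is an LSN --
# and submits a chosen plaintext crafted to test a guess of the secret.
iv2 = os.urandom(BS)
guess = b"attack at dawn!!"
chosen = bytes(a ^ b ^ c for a, b, c in zip(iv2, iv1, guess))
c_probe = cbc_encrypt(key, iv2, chosen)
# E(iv2 XOR chosen) == E(iv1 XOR guess), so equality confirms the guess.
print(c_probe == c_secret)  # True
```

One chosen plaintext thus confirms or refutes a guess about an earlier
encrypted block whenever the attacker can predict the upcoming IV.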

For AES-XTS, using the block number or any other fixed value as the tweak
still has weaknesses similar to IV reuse (in CBC, not GCM): the attacker
can decrypt a block if he knows an earlier plaintext of that block.
In LUKS/BitLocker/hardware-based solutions, the physical location is not
available to the user; the filesystem runs in kernel space, and they do
not encrypt at the moment the filesystem allocates a data block.
But in PostgreSQL, the attacker can capture an encrypted 'ALL-ZERO' page
in `mdextend`, and with that he can decrypt the ciphertext of all data
subsequently written to that block.


For AES-GCM, a predictable IV is fine. I think we can decrypt and
re-encrypt the user data in pg_upgrade; this would allow us to use the
relfilenode OID + block number as the nonce.
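
A hypothetical sketch of such a nonce layout (the field widths, byte order,
and padding here are assumptions for illustration only, not an actual or
proposed PostgreSQL format):

```python
import struct

def gcm_nonce(relfilenode_oid: int, block_number: int) -> bytes:
    # Hypothetical 96-bit GCM nonce: 4-byte OID || 4-byte block number ||
    # 4 zero bytes of padding.  Unique per (relation, block) under one key,
    # which is the property GCM requires; predictability is harmless.
    return struct.pack(">III", relfilenode_oid, block_number, 0)
```

GCM only requires that a (key, nonce) pair never be reused; this layout
makes the nonce unique per relation block, which is why the OIDs must be
kept stable (or the data re-encrypted) across pg_upgrade.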




Re: Skipping logical replication transactions on subscriber side

2021-09-05 Thread Masahiko Sawada
On Fri, Sep 3, 2021 at 3:46 PM Greg Nancarrow  wrote:
>
> On Mon, Aug 30, 2021 at 5:07 PM Masahiko Sawada  wrote:
> >
> > I've attached rebased patches. 0004 patch is not the scope of this
> > patch. It's borrowed from another thread[1] to fix the assertion
> > failure for newly added tests. Please review them.
> >
>
> BTW, these patches need rebasing (broken by recent commits, patches
> 0001, 0003 and 0004 no longer apply, and it's failing in the cfbot).

Thanks! I'll submit the updated patches early this week.

Regards,


--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: Skipping logical replication transactions on subscriber side

2021-09-05 Thread Masahiko Sawada
On Thu, Sep 2, 2021 at 8:37 PM houzj.f...@fujitsu.com
 wrote:
>
> From Mon, Aug 30, 2021 3:07 PM Masahiko Sawada  wrote:
> > I've attached rebased patches. 0004 patch is not the scope of this
> > patch. It's borrowed from another thread[1] to fix the assertion
> > failure for newly added tests. Please review them.
>
> Hi,
>
> I reviewed the 0002 patch and have a suggestion for it.
>
> +   if (IsSet(opts.specified_opts, 
> SUBOPT_SYNCHRONOUS_COMMIT))
> +   {
> +   
> values[Anum_pg_subscription_subsynccommit - 1] =
> +   CStringGetTextDatum("off");
> +   
> replaces[Anum_pg_subscription_subsynccommit - 1] = true;
> +   }
>
> Currently, the patch sets the default value outside of
> parse_subscription_options(), but I think it might be more standard to
> set the value in parse_subscription_options(). Like:
>
> if (!is_reset)
> {
> ...
> +   }
> +   else
> +   opts->synchronous_commit = "off";
>
> And then, we can set the value like:
>
> 
> values[Anum_pg_subscription_subsynccommit - 1] =
> 
> CStringGetTextDatum(opts.synchronous_commit);

You're right. Fixed.

>
>
> Besides, instead of adding a switch case like the following:
> +   case ALTER_SUBSCRIPTION_RESET_OPTIONS:
> +   {
>
> We can add a bool flag (isReset) to AlterSubscriptionStmt and check the flag
> when invoking parse_subscription_options(). With this approach, the code can be
> shorter.
>
> Attach a diff file based on the v12-0002 which change the code like the above
> suggestion.

Thank you for the patch!

@@ -3672,11 +3671,12 @@ typedef enum AlterSubscriptionType
 typedef struct AlterSubscriptionStmt
 {
NodeTag type;
-   AlterSubscriptionType kind; /* ALTER_SUBSCRIPTION_SET_OPTIONS, etc */
+   AlterSubscriptionType kind; /* ALTER_SUBSCRIPTION_OPTIONS, etc */
char   *subname;/* Name of the subscription */
char   *conninfo;   /* Connection string to publisher */
List   *publication;/* One or more publication to
subscribe to */
List   *options;/* List of DefElem nodes */
+   boolisReset;/* true if RESET option */
 } AlterSubscriptionStmt;

It's unnatural to me that AlterSubscriptionStmt has an isReset flag when
it already has 'kind' indicating the command. How about having the RESET
command use the same logic as SET, as you suggested, while keeping both
ALTER_SUBSCRIPTION_SET_OPTIONS and
ALTER_SUBSCRIPTION_RESET_OPTIONS?

Regards,






Re: Skipping logical replication transactions on subscriber side

2021-09-05 Thread Masahiko Sawada
On Thu, Sep 2, 2021 at 5:41 PM houzj.f...@fujitsu.com
 wrote:
>
> From Mon, Aug 30, 2021 3:07 PM Masahiko Sawada  wrote:
> > I've attached rebased patches. 0004 patch is not the scope of this patch. 
> > It's
> > borrowed from another thread[1] to fix the assertion failure for newly added
> > tests. Please review them.
>
> Hi,
>
> I reviewed the v12-0001 patch, here are some comments:

Thank you for the comments!

>
> 1)
> --- a/src/backend/utils/error/elog.c
> +++ b/src/backend/utils/error/elog.c
> @@ -1441,7 +1441,6 @@ getinternalerrposition(void)
> return edata->internalpos;
>  }
>
> -
>
> It seems a miss change in elog.c

Fixed.

>
> 2)
>
> +   TupleDescInitEntry(tupdesc, (AttrNumber) 10, "stats_reset",
> +  TIMESTAMPTZOID, -1, 0);
>
> The document doesn't mention the column "stats_reset".

Added.

> 3)
>
> +typedef struct PgStat_StatSubErrEntry
> +{
> +   Oid subrelid;   /* InvalidOid if the 
> apply worker, otherwise
> +* the table 
> sync worker. hash table key. */
>
> From the comments on subrelid, I think one subscription only has one apply
> worker error entry, right? If so, I was wondering whether we can move the apply
> error entry to PgStat_StatSubEntry. With that approach, we don't need to build an
> inner hash table when there are no table sync error entries.

I wanted to avoid having unnecessary error entry fields when there is
no apply worker error but there is a table sync worker error. But
after more thought, the apply worker is more likely to raise an error
than the table sync workers are. So it might be better to have both
the PgStat_StatSubErrEntry for the apply worker error and the hash
table for table sync worker errors in PgStat_StatSubEntry.

>
> 4)
> Is it possible to add some test cases to test subscription error entry
> deletion?

Do you mean the tests checking if subscription error entry is deleted
after DROP SUBSCRIPTION?

Those comments are incorporated into local branches. I'll submit the
updated patches after incorporating other comments.

Regards,






Re: Skipping logical replication transactions on subscriber side

2021-09-05 Thread Masahiko Sawada
On Thu, Sep 2, 2021 at 9:03 PM Greg Nancarrow  wrote:
>
> On Mon, Aug 30, 2021 at 5:07 PM Masahiko Sawada  wrote:
> >
> > I've attached rebased patches. 0004 patch is not the scope of this
> > patch. It's borrowed from another thread[1] to fix the assertion
> > failure for newly added tests. Please review them.
> >
>

Thank you for the comments!

> Some initial comments for the v12-0003 patch:
>
> (1) Patch comment
> "This commit introduces another way to skip the transaction in question."
>
> I think it should further explain: "This commit introduces another way
> to skip the transaction in question, other than manually updating the
> subscriber's database or using pg_replication_origin_advance()."

Updated.

>
>
> doc/src/sgml/logical-replication.sgml
> (2)
>
> Suggested minor update:
>
> BEFORE:
> +   transaction that conflicts with the existing data.  When a conflict 
> produce
> +   an error, it is shown in
> pg_stat_subscription_errors
> +   view as follows:
> AFTER:
> +   transaction that conflicts with the existing data.  When a conflict 
> produces
> +   an error, it is recorded in the
> pg_stat_subscription_errors
> +   view as follows:

Fixed.

>
> (3)
> +   found from those outputs (transaction ID 740 in the above case).
> The transaction
>
> Shouldn't it be transaction ID 716?

Right, fixed.

>
> (4)
> +   can be skipped by setting skip_xid to
> the subscription
>
> Is it better to say here ... "on the subscription" ?

Okay, fixed.

>
> (5)
> Just skipping a transaction could make a subscriber inconsistent, right?
>
> Would it be better as follows?
>
> BEFORE:
> +   In either way, those should be used as a last resort. They skip the whole
> +   transaction including changes that may not violate any constraint and 
> easily
> +   make subscriber inconsistent if a user specifies the wrong transaction ID 
> or
> +   the position of origin.
>
> AFTER:
> +   Either way, those transaction skipping methods should be used as a
> last resort.
> +   They skip the whole transaction, including changes that may not violate 
> any
> +   constraint.  They may easily make the subscriber inconsistent,
> especially if a
> +   user specifies the wrong transaction ID or the position of origin.

Agreed, fixed.

>
> (6)
> The grammar is not great in the following description, so here's a
> suggested improvement:
>
> BEFORE:
> +  incoming change or by skipping the whole transaction.  This option
> +  specifies transaction ID that logical replication worker skips to
> +  apply.  The logical replication worker skips all data modification
>
> AFTER:
> +  incoming changes or by skipping the whole transaction.  This option
> +  specifies the ID of the transaction whose application is to
> be skipped
> +  by the logical replication worker.  The logical replication worker
> +  skips all data modification

Fixed.

>
>
> src/backend/postmaster/pgstat.c
> (7)
> BEFORE:
> + * Tell the collector about clear the error of subscription.
> AFTER:
> + * Tell the collector to clear the subscription error.

Fixed.

>
>
> src/backend/replication/logical/worker.c
> (8)
> + * subscription is invalidated and* MySubscription->skipxid gets
> changed or reset.
>
> There is a "*" after "and".

Fixed.

>
> (9)
> Do these lines really need to be moved up?
>
> + /* extract XID of the top-level transaction */
> + stream_xid = logicalrep_read_stream_start(s, &first_segment);
> +

I had missed to revert this change, fixed.

>
> src/backend/postmaster/pgstat.c
> (10)
>
> + bool m_clear; /* clear all fields except for last_failure and
> + * last_errmsg */
>
> I think it should say: clear all fields except for last_failure,
> last_errmsg and stat_reset_timestamp.

Fixed.

Those comments including your comments on the v12-0001 and v12-0002
are incorporated into local branch. I'll submit the updated patches
after incorporating all other comments.

Regards,






Re: Skipping logical replication transactions on subscriber side

2021-09-05 Thread Masahiko Sawada
On Thu, Sep 2, 2021 at 2:55 PM Greg Nancarrow  wrote:
>
> On Mon, Aug 30, 2021 at 5:07 PM Masahiko Sawada  wrote:
> >
> >
> > I've attached rebased patches. 0004 patch is not the scope of this
> > patch. It's borrowed from another thread[1] to fix the assertion
> > failure for newly added tests. Please review them.
> >
>
> I have a few comments on the v12-0002 patch:

Thank you for the comments!

>
> (1) Patch comment
>
> Has a typo and could be expressed a bit better.
>
> Suggestion:
>
> BEFORE:
> RESET command is reuiqred by follow-up commit introducing to a new
> parameter skip_xid to reset.
> AFTER:
> The RESET parameter for ALTER SUBSCRIPTION is required by the
> follow-up commit that introduces a new resettable subscription
> parameter "skip_xid".

Fixed.

>
>
> doc/src/sgml/ref/alter_subscription.sgml
>
> (2)
> I don't think "RESET" is sufficiently described in
> alter_subscription.sgml. Just putting it under "SET" and changing
> "altered" to "set" doesn't explain what resetting does. It should say
> something about setting the parameter back to its original (default)
> value.

Doesn't "RESET" normally mean to change the parameter back to its default value?

>
>
> (3)
> case ALTER_SUBSCRIPTION_RESET_OPTIONS
>
> Some comments here would be helpful e.g. Reset the specified
> parameters back to their default values.

Okay, added.

Regards,






Re: Skipping logical replication transactions on subscriber side

2021-09-05 Thread Masahiko Sawada
On Thu, Sep 2, 2021 at 12:06 PM Greg Nancarrow  wrote:
>
> On Mon, Aug 30, 2021 at 5:07 PM Masahiko Sawada  wrote:
> >
> >
> > I've attached rebased patches. 0004 patch is not the scope of this
> > patch. It's borrowed from another thread[1] to fix the assertion
> > failure for newly added tests. Please review them.
> >
>
> I have some initial feedback on the v12-0001 patch.
> Most of these are suggested improvements to wording, and some typo fixes.

Thank you for the comments!

>
>
> (0) Patch comment
>
> Suggestion to improve the patch comment:
>
> BEFORE:
> Add pg_stat_subscription_errors statistics view.
>
> This commits adds new system view pg_stat_logical_replication_error,

Oops, I realized that it should be pg_stat_subscription_errors.

> showing errors happening during applying logical replication changes
> as well as during performing initial table synchronization.
>
> The subscription error entries are removed by autovacuum workers when
> the table synchronization competed in table sync worker cases and when
> dropping the subscription in apply worker cases.
>
> It also adds SQL function pg_stat_reset_subscription_error() to
> reset the single subscription error.
>
> AFTER:
> Add a subscription errors statistics view "pg_stat_subscription_errors".
>
> This commits adds a new system view pg_stat_logical_replication_errors,
> that records information about any errors which occur during application
> of logical replication changes as well as during performing initial table
> synchronization.

I think that views don't have any data so "show information" seems
appropriate to me here. Thoughts?

>
> The subscription error entries are removed by autovacuum workers after
> table synchronization completes in table sync worker cases and after
> dropping the subscription in apply worker cases.
>
> It also adds an SQL function pg_stat_reset_subscription_error() to
> reset a single subscription error.
>
>
>
> doc/src/sgml/monitoring.sgml:
>
> (1)
> BEFORE:
> +  One row per error that happened on subscription, showing
> information about
> +   the subscription errors.
> AFTER:
> +  One row per error that occurred on subscription,
> providing information about
> +   each subscription error.

Fixed.

>
> (2)
> BEFORE:
> +   The pg_stat_subscription_errors view will
> contain one
> AFTER:
> +   The pg_stat_subscription_errors view contains one
>

I think that the descriptions of other statistics views also say "XXX
view will contain ...".

>
> (3)
> BEFORE:
> +Name of the database in which the subscription is created.
> AFTER:
> +Name of the database in which the subscription was created.

Fixed.

>
> (4)
> BEFORE:
> +   OID of the relation that the worker is processing when the
> +   error happened.
> AFTER:
> +   OID of the relation that the worker was processing when the
> +   error occurred.
>

Fixed.

>
> (5)
> BEFORE:
> +Name of command being applied when the error happened.  This
> +field is always NULL if the error is reported by
> +tablesync worker.
> AFTER:
> +Name of command being applied when the error occurred.  This
> +field is always NULL if the error is reported by a
> +tablesync worker.

Fixed.

> (6)
> BEFORE:
> +Transaction ID of publisher node being applied when the error
> +happened.  This field is always NULL if the error is reported
> +by tablesync worker.
> AFTER:
> +Transaction ID of the publisher node being applied when the error
> +happened.  This field is always NULL if the error is reported
> +by a tablesync worker.

Fixed.

> (7)
> BEFORE:
> +Type of worker reported the error: apply or
> +tablesync.
> AFTER:
> +Type of worker reporting the error: apply or
> +tablesync.

Fixed.

>
> (8)
> BEFORE:
> +   Number of times error happened on the worker.
> AFTER:
> +   Number of times the error occurred in the worker.
>
> [or "Number of times the worker reported the error" ?]

I prefer "Number of times the error occurred in the worker."

>
> (9)
> BEFORE:
> +   Time at which the last error happened.
> AFTER:
> +   Time at which the last error occurred.

Fixed.

>
> (10)
> BEFORE:
> +   Error message which is reported last failure time.
> AFTER:
> +   Error message which was reported at the last failure time.
>
> Maybe this should just say "Last reported error message" ?

Fixed.

>
>
> (11)
> You shouldn't call hash_get_num_entries() on a NULL pointer.
>
> Suggest swappling lines, as shown below:
>
> BEFORE:
> + int32 nerrors = hash_get_num_entries(subent->suberrors);
> +
> + /* Skip this subscription if not have any errors */
> + if (subent->suberrors == NULL)
> +continue;
> AFTER:
> + int32 nerrors;
> +
> + /* Skip this subscription if not have any errors */
> + if (subent->suberrors == NULL)
> +continue;
> + nerrors = hash_get_num_entries(subent->suberrors);

Right. Fixed.

>
>
> (12)
> 

Re: stat() vs ERROR_DELETE_PENDING, round N + 1

2021-09-05 Thread Thomas Munro
On Fri, Sep 3, 2021 at 2:01 PM Kyotaro Horiguchi
 wrote:
> Might be stupid, but if a delete-pending'ed file can obstruct something,
> couldn't we change unlink on Windows to rename the file to a temporary
> random name and then remove it?  We do something like that explicitly
> during WAL file removal. (It may cause degradation on bulk file deletion,
> and we may need a further fix so that such being-deleted files are
> excluded from directory scans, though..)
>
> However, looking [1], with that strategy there may be a case where
> such "deleted" files may be left alone forever, though.

It's a good idea.  I tested it and it certainly does fix the
basebackup problem I've seen (experimental patch attached).  But,
yeah, I'm also a bit worried that that path could be fragile and need
special handling in lots of places.

I also tried writing a new open() wrapper using the lower level
NtCreateFile() interface, and then an updated stat() wrapper built on
top of that.  As a non-Windows person, getting that to (mostly) work
involved a fair amount of suffering.  I can share that if someone is
interested, but while learning about that family of interfaces, I
realised we could keep the existing Win32-based code, but also
retrieve the NT status, leading to a very small change (experimental
patch attached).

The best idea is probably to set FILE_DISPOSITION_DELETE |
FILE_DISPOSITION_POSIX_SEMANTICS before unlinking.  This appears to be
a silver bullet, but isn't available on ancient Windows releases that
we support, or file systems other than local NTFS.  So maybe we need a
combination of that + STATUS_DELETE_PENDING as shown in the attached.
I'll look into that next.
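
The rename-before-unlink idea can be summarized in portable form (Python
purely for illustration; the helper name and retry loop mirror the attached
C sketch, everything else here is an assumption):

```python
import os, random

def pg_unlink(path: str) -> None:
    """Sketch of rename-before-unlink: move the file to a random temporary
    name first, so `path` becomes available for a new file immediately even
    if other processes still hold handles to the old one, then unlink."""
    while True:
        tmp = "%s.%d.unlinked" % (path, random.randrange(2**31))
        # On Windows, MoveFileEx fails with ERROR_FILE_EXISTS on collision;
        # POSIX rename() silently overwrites, so check first as a stand-in.
        if os.path.exists(tmp):
            continue                # unlucky random suffix; pick another
        os.rename(path, tmp)
        os.unlink(tmp)
        return
```

Because the victim file is moved out of the way first, a new file can be
created at `path` right away — at the cost of possible `*.unlinked`
leftovers after a crash, which is why startup cleanup and basebackup must
also learn to skip them.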
From 5990d14bb4f9a0ca83fd1b6c7c6e0a6a23786a0d Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Sun, 5 Sep 2021 17:07:44 +1200
Subject: [PATCH] Fix Windows basebackup by renaming before unlinking.

Rename files before unlinking on Windows.  This solves a problem where
basebackup could fail because other backends have a file handle to a
deleted file, so stat() and open() fail with ERROR_ACCESS_DENIED.

Suggested by Kyotaro Horiguchi .

XXX This is only a rough sketch to try the idea out!

Discussion: https://postgr.es/m/CA%2BhUKGJz_pZTF9mckn6XgSv69%2BjGwdgLkxZ6b3NWGLBCVjqUZA%40mail.gmail.com
---
 src/backend/replication/basebackup.c |  7 +
 src/backend/storage/file/fd.c| 38 +---
 src/backend/storage/smgr/md.c|  6 ++---
 src/include/storage/fd.h |  1 +
 4 files changed, 46 insertions(+), 6 deletions(-)

diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index e71d7406ed..7961a7041b 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -19,6 +19,7 @@
 #include "access/xlog_internal.h"	/* for pg_start/stop_backup */
 #include "catalog/pg_type.h"
 #include "common/file_perm.h"
+#include "common/string.h"
 #include "commands/progress.h"
 #include "lib/stringinfo.h"
 #include "libpq/libpq.h"
@@ -1268,6 +1269,12 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
 	strlen(PG_TEMP_FILE_PREFIX)) == 0)
 			continue;
 
+#ifdef WIN32
+		/* Skip unlinked files */
+		if (pg_str_endswith(de->d_name, ".unlinked"))
+			continue;
+#endif
+
 		/*
 		 * Check if the postmaster has signaled us to exit, and abort with an
 		 * error in that case. The error handler further up will call
diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c
index 501652654e..43f24d4ceb 100644
--- a/src/backend/storage/file/fd.c
+++ b/src/backend/storage/file/fd.c
@@ -72,6 +72,7 @@
 
 #include "postgres.h"
 
+#include 
 #include 
 #include 
 #include 
@@ -683,6 +684,35 @@ pg_truncate(const char *path, off_t length)
 #endif
 }
 
+/*
+ * Unlink a file.  On Windows, rename to a temporary filename before unlinking
+ * so that "path" is available for new files immediately even if other backends
+ * hold descriptors to the one we unlink.
+ */
+int
+pg_unlink(const char *path)
+{
+#ifdef WIN32
+	for (;;)
+	{
+		char		tmp_path[MAXPGPATH];
+
+		snprintf(tmp_path, sizeof(tmp_path), "%s.%ld.unlinked", path, random());
+		if (!MoveFileEx(path, tmp_path, 0))
+		{
+			if (GetLastError() == ERROR_FILE_EXISTS)
+continue;		/* try a new random number */
+			_dosmaperr(GetLastError());
+			return -1;
+		}
+
+		return unlink(tmp_path);
+	}
+#else
+	return unlink(path);
+#endif
+}
+
 /*
  * fsync_fname -- fsync a file or directory, handling errors properly
  *
@@ -3269,8 +3299,9 @@ CleanupTempFiles(bool isCommit, bool isProcExit)
  * postmaster session
  *
  * This should be called during postmaster startup.  It will forcibly
- * remove any leftover files created by OpenTemporaryFile and any leftover
- * temporary relation files created by mdcreate.
+ * remove any leftover files created by OpenTemporaryFile, any leftover
+ * temporary relation files created by mdcreate, and anything left
+ * behind by pg_unlink() on crash.
  *
  * During 

Fwd: Problem with Unix sockets when porting MobilityDB for Windows

2021-09-05 Thread Esteban Zimanyi
Windows 10 supports Unix sockets as reported, e.g., here
https://devblogs.microsoft.com/commandline/af_unix-comes-to-windows/

We run the tests on MobilityDB using an ephemeral instance that is created
by the test suite and torn down afterwards.
https://github.com/MobilityDB/MobilityDB/blob/develop/test/scripts/test.sh
For this we use Unix sockets and thus the pg_ctl command is configured as
follows

PGCTL="${BIN_DIR}/pg_ctl -w -D ${DBDIR} -l ${WORKDIR}/log/postgres.log -o
-k -o ${WORKDIR}/lock -o -h -o ''"

The log file reports things are working as expected

2021-09-05 14:10:53.366 CEST [32170] LOG:  starting PostgreSQL 13.3 on
x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0,
64-bit
2021-09-05 14:10:53.372 CEST [32170] LOG:  listening on Unix socket
"/home/esteban/src/MobilityDB/build/tmptest/lock/.s.PGSQL.5432"
2021-09-05 14:10:53.394 CEST [32171] LOG:  database system was shut down at
2021-09-05 14:10:52 CEST
2021-09-05 14:10:53.412 CEST [32170] LOG:  database system is ready to
accept connections

We are trying to port MobilityDB on Windows using msys2. In this case the
above command does not work as reported in the corresponding log

2021-09-05 14:34:10.553 CEST [19060] LOG:  starting PostgreSQL 13.4 on
x86_64-w64-mingw32, compiled by gcc.exe (Rev5, Built by MSYS2 project)
10.3.0, 64-bit
2021-09-05 14:34:10.558 CEST [19060] LOG:  could not translate host name
"''", service "5432" to address: Unknown host
2021-09-05 14:34:10.558 CEST [19060] WARNING:  could not create listen
socket for "''"
2021-09-05 14:34:10.558 CEST [19060] FATAL:  could not create any TCP/IP
sockets
2021-09-05 14:34:10.560 CEST [19060] LOG:  database system is shut down

Any ideas on how to solve this ?

Esteban
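
One possible explanation (an assumption, not a confirmed diagnosis): along
the msys2 invocation path the option string is word-split without shell
quote removal, so postgres receives the two literal quote characters '' as
a host name — which matches the `could not translate host name "''"` error
in the log. The difference between naive word-splitting and POSIX-style
splitting can be seen in miniature with Python:

```python
import shlex

opts = "-h ''"
naive = opts.split()        # plain word-splitting: the quote characters survive
posix = shlex.split(opts)   # POSIX shell rules: quote removal is applied
print(naive, posix)         # ['-h', "''"] ['-h', '']
```

If that is what is happening, the place to look would be how the '' travels
through pg_ctl under msys2, arranging for exactly one layer of quote
removal to apply before postgres parses its arguments.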


Re: New predefined roles- 'pg_read/write_all_data'

2021-09-05 Thread Stephen Frost
Greetings,

On Sun, Sep 5, 2021 at 07:43 Shinoda, Noriyoshi (PN Japan FSIP) <
noriyoshi.shin...@hpe.com> wrote:

> I have tested this new feature with PostgreSQL 14 Beta 3 environment.
> I created a user granted with pg_write_all_data role and executed UPDATE
> and DELETE statements on tables owned by other users.
> If there is no WHERE clause, it can be executed as expected, but if the
> WHERE clause is specified, an error of permission denied will occur.
> Is this the expected behavior?


A WHERE clause requires SELECT rights on the table/columns referenced and
if no SELECT rights were granted then a permission denied error is the
correct result, yes. Note that pg_write_all_data, as documented, does not
include SELECT rights.

Thanks,

Stephen


RE: New predefined roles- 'pg_read/write_all_data'

2021-09-05 Thread Shinoda, Noriyoshi (PN Japan FSIP)
Hi hackers,

I have tested this new feature with PostgreSQL 14 Beta 3 environment.
I created a user granted the pg_write_all_data role and executed UPDATE and
DELETE statements on tables owned by other users.
If there is no WHERE clause, they are executed as expected, but if a WHERE
clause is specified, a permission-denied error occurs.
Is this the expected behavior?
The WHERE clause is not specified in the regression test (privileges.sql).

Below is the execution log.

postgres=# CREATE USER owner1 PASSWORD 'owner1';
CREATE ROLE
postgres=# CREATE USER write1 PASSWORD 'write1';
CREATE ROLE
postgres=# GRANT pg_write_all_data TO write1;
GRANT ROLE
postgres=# SET SESSION AUTHORIZATION owner1;
SET
postgres=> CREATE TABLE data1(c1 INT, c2 VARCHAR(10));
CREATE TABLE
postgres=> INSERT INTO data1 VALUES (generate_series(1, 10), 'data1');
INSERT 0 10
postgres=> SET SESSION AUTHORIZATION write1;
SET
postgres=> INSERT INTO data1 VALUES (0, 'data1');   -- success
INSERT 0 1
postgres=> UPDATE data1 SET c2='update' WHERE c1=0; -- fail
ERROR:  permission denied for table data1
postgres=> DELETE FROM data1 WHERE c1=0;-- fail
ERROR:  permission denied for table data1
postgres=> UPDATE data1 SET c2='update';-- success
UPDATE 11
postgres=> DELETE FROM data1;   -- success
DELETE 11
postgres=> SELECT version();
  version

 PostgreSQL 14beta3 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 
20150623 (Red Hat 4.8.5-39), 64-bit
(1 row)
-

Regards,
Noriyoshi Shinoda

-Original Message-
From: Stephen Frost [mailto:sfr...@snowman.net] 
Sent: Saturday, August 28, 2021 7:34 AM
To: Michael Banck 
Cc: gkokola...@pm.me; Anastasia Lubennikova ; 
pgsql-hackers@lists.postgresql.org
Subject: Re: New predefined roles- 'pg_read/write_all_data'

Greetings,

* Michael Banck (michael.ba...@credativ.de) wrote:
> On Thu, Apr 01, 2021 at 04:00:06PM -0400, Stephen Frost wrote:
> > diff --git a/doc/src/sgml/user-manag.sgml 
> > b/doc/src/sgml/user-manag.sgml index d171b13236..fe0bdb7599 100644
> > --- a/doc/src/sgml/user-manag.sgml
> > +++ b/doc/src/sgml/user-manag.sgml
> > @@ -518,6 +518,24 @@ DROP ROLE doomed_role;
> >
> >   
> >   
> > +  
> > +   pg_read_all_data
> > +   Read all data (tables, views, sequences), as if having SELECT
> > +   rights on those objects, and USAGE rights on all schemas, even 
> > without
> > +   having it explicitly.  This role does not have the role attribute
> > +   BYPASSRLS set.  If RLS is being used, an 
> > administrator
> > +   may wish to set BYPASSRLS on roles which this 
> > role is
> > +   GRANTed to.
> > +  
> > +  
> > +   pg_write_all_data
> > +   Write all data (tables, views, sequences), as if having 
> > INSERT,
> > +   UPDATE, and DELETE rights on those objects, and USAGE rights on all
> > +   schemas, even without having it explicitly.  This role does not 
> > have the
> > +   role attribute BYPASSRLS set.  If RLS is being 
> > used,
> > +   an administrator may wish to set BYPASSRLS on 
> > roles
> > +   which this role is GRANTed to.
> > +  
> 
> Shouldn't those "SELECT", "INSERT" etc. be wrapped in  tags?

Yeah, good point, fixed.

Thanks!

Stephen




Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

2021-09-05 Thread Dilip Kumar
On Sat, Sep 4, 2021 at 3:24 AM Andres Freund  wrote:

> Hi,
>
> On 2021-09-03 14:25:10 +0530, Dilip Kumar wrote:
> > Yeah, we can surely lock the relation as described by Robert, but IMHO,
> > while creating the database we are already holding the exclusive lock on
> > the database and there is no one else allowed to be connected to the
> > database, so do we actually need to bother about the lock for the
> > correctness?
>
> The problem is that checkpointer, bgwriter, buffer reclaim don't care about
> the database of the buffer they're working on... The exclusive lock on the
> database doesn't change anything about that.


But these directly operate on the buffers, and in my patch, whether we are
reading pg_class to identify the relfilenode or copying the relation block
by block, we always hold the lock on the buffer.


> Perhaps you can justify it's safe
> because there can't be any dirty buffers or such though.
>
>
> > I think we already have such a code in multiple places where we bypass
> the
> > shared buffers for copying the relation
> > e.g. index_copy_data(), heapam_relation_copy_data().
>
> That's not at all comparable. We hold an exclusive lock on the relation at
> that point, and we don't have a separate implementation of reading tuples
> from
> the table or something like that.
>

Okay, but my example was in response to Robert's point that he feels
bypassing the shared buffers anywhere is hackish.  That said, I agree his
point may be that even though existing code does this, that alone does not
justify it.

To move forward, I think the main open concerns as of now are:

1. The special-purpose code that scans pg_class.  We could instead solve
this by scanning the source database directory, but I think Robert dislikes
that approach because we would be reading the directory directly and
bypassing the shared buffers.  However, that is no worse than what we have
now: we already scan the directory directly, so the only change is that
instead of copying the files wholesale we would read each file and copy it
block by block.

2. Another problem: while copying the relation we access its buffers
without holding a relation lock.  Since we do hold the buffer lock, I am
not sure whether this is really a problem w.r.t. the checkpointer and
bgwriter, but if it is, we can construct the lock tag and acquire the
relation lock.
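If the relation lock does turn out to be needed, the backend already has
helpers for locking a relation by OID without opening it. A pseudocode
sketch only (src_relid is a placeholder; this is backend-only code, not
compilable standalone):

```c
/* Pseudocode sketch: lock the source relation by OID before copying it. */
LockRelationOid(src_relid, AccessShareLock);

/* ... copy the relation block by block ... */

UnlockRelationOid(src_relid, AccessShareLock);
```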

3. While copying the relation, should we go through the bufmgr or use the
smgr directly?

If we use the bufmgr we might avoid flushing some buffers to disk and save
some I/O, but since we generally copy from the template database there may
not be many dirty buffers, so we might not save anything.  OTOH, if we use
the smgr directly for copying the relation data, we can reuse some existing
code (RelationCopyStorage()) and the patch will be simpler.  Beyond code
simplicity and I/O, there is also Robert's concern about bypassing the
bufmgr, which applies to point #1 as well as #3.
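In pseudocode, the smgr-level reuse in point 3 would look roughly like what
index_copy_data() does today (src, dst, and relpersistence are
placeholders; backend-only code, not compilable standalone):

```c
/* Pseudocode sketch: copy every fork of the source relation at the smgr
 * level, reusing the existing block-by-block copy routine. */
for (ForkNumber fork = 0; fork <= MAX_FORKNUM; fork++)
{
    if (smgrexists(src, fork))
    {
        smgrcreate(dst, fork, false);
        RelationCopyStorage(src, dst, fork, relpersistence);
    }
}
```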

Thoughts?

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


Re: dup(0) fails on Ubuntu 20.04 and macOS 10.15 with 13.0

2021-09-05 Thread Mario Emmenlauer
On 03.09.21 01:00, Tom Lane wrote:
> Mario Emmenlauer  writes:
>> The idea to switch to dup(2) sounds very good to me.
> 
> I poked at this some more, and verified that adding "fclose(stdin);"
> at the head of PostmasterMain is enough to trigger the reported
> failure.  However, after changing fd.c to dup stderr not stdin,
> we can pass check-world even with that in place.  So that confirms
> that there is no very good reason for the postmaster to require
> stdin to be available.
> 
> Hence, I pushed the fix to make fd.c use stderr here.  I only
> back-patched to v14, because given the lack of other complaints,
> I couldn't quite justify touching stable branches for this.
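For anyone curious, the failure mode is easy to reproduce outside the
server. A minimal standalone sketch (a hypothetical helper, not the actual
fd.c code) that probes a descriptor the way fd.c counts available fds:

```c
#include <errno.h>
#include <unistd.h>

/*
 * Probe whether a file descriptor can be used as the template for counting
 * available descriptors: dup() it and close the duplicate.  Returns 1 on
 * success, 0 on failure (errno is set by dup, e.g. EBADF when fd is closed).
 */
int fd_probe_works(int fd)
{
    int dupfd = dup(fd);

    if (dupfd < 0)
        return 0;
    close(dupfd);
    return 1;
}
```

With stdin closed, probing fd 0 fails with EBADF while probing stderr still
works, which matches the fix of making fd.c dup stderr instead of stdin.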

Thanks a lot Tom, it's appreciated!

All the best,

Mario Emmenlauer